-rw-r--r--.gitmodules3
-rw-r--r--ACKNOWLEDGMENTS50
-rw-r--r--AUTHORS9
-rw-r--r--CONTRIBUTING.md17
-rw-r--r--LICENSE203
-rw-r--r--README.md17
-rw-r--r--WORKSPACE551
-rw-r--r--bower.BUILD971
-rwxr-xr-xconfigure82
-rw-r--r--gmock.BUILD23
m---------google/protobuf0
-rw-r--r--jpeg.BUILD83
-rw-r--r--png.BUILD40
-rw-r--r--six.BUILD12
-rw-r--r--tensorflow/BUILD43
-rw-r--r--tensorflow/__init__.py4
-rw-r--r--tensorflow/cc/BUILD89
-rw-r--r--tensorflow/cc/ops/array_grad.cc32
-rw-r--r--tensorflow/cc/ops/cc_op_gen.cc350
-rw-r--r--tensorflow/cc/ops/cc_op_gen.h14
-rw-r--r--tensorflow/cc/ops/cc_op_gen_main.cc34
-rw-r--r--tensorflow/cc/ops/const_op.cc113
-rw-r--r--tensorflow/cc/ops/const_op.h70
-rw-r--r--tensorflow/cc/ops/functional_grad.cc42
-rw-r--r--tensorflow/cc/ops/math_grad.cc566
-rw-r--r--tensorflow/cc/ops/nn_grad.cc55
-rw-r--r--tensorflow/cc/ops/standard_ops.h26
-rw-r--r--tensorflow/cc/tutorials/example_trainer.cc146
-rw-r--r--tensorflow/core/BUILD695
-rw-r--r--tensorflow/core/client/tensor_c_api.cc370
-rw-r--r--tensorflow/core/client/tensor_c_api_test.cc94
-rw-r--r--tensorflow/core/common_runtime/device.cc37
-rw-r--r--tensorflow/core/common_runtime/device.h128
-rw-r--r--tensorflow/core/common_runtime/device_factory.cc106
-rw-r--r--tensorflow/core/common_runtime/device_factory.h69
-rw-r--r--tensorflow/core/common_runtime/device_mgr.cc90
-rw-r--r--tensorflow/core/common_runtime/device_mgr.h55
-rw-r--r--tensorflow/core/common_runtime/device_set.cc68
-rw-r--r--tensorflow/core/common_runtime/device_set.h64
-rw-r--r--tensorflow/core/common_runtime/device_set_test.cc65
-rw-r--r--tensorflow/core/common_runtime/eigen_thread_pool.h22
-rw-r--r--tensorflow/core/common_runtime/executor.cc2118
-rw-r--r--tensorflow/core/common_runtime/executor.h209
-rw-r--r--tensorflow/core/common_runtime/function.cc1335
-rw-r--r--tensorflow/core/common_runtime/function.h100
-rw-r--r--tensorflow/core/common_runtime/gpu/dma_helper.h18
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_allocator_retry.cc49
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h36
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_allocator_retry_test.cc175
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc397
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h156
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_bfc_allocator_test.cc166
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_debug_allocator.cc186
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h68
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_debug_allocator_test.cc207
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_device.cc651
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_device.h94
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_device_factory.cc52
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc132
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_event_mgr.h118
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_event_mgr_test.cc152
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_init.cc147
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_init.h19
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc371
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_region_allocator.h146
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_region_allocator_test.cc71
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_stream_util.cc97
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_stream_util.h30
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_stream_util_test.cc137
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_util.cc345
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_util.h89
-rw-r--r--tensorflow/core/common_runtime/gpu/gpu_util_platform_specific.cc24
-rw-r--r--tensorflow/core/common_runtime/gpu/pool_allocator.cc269
-rw-r--r--tensorflow/core/common_runtime/gpu/pool_allocator.h202
-rw-r--r--tensorflow/core/common_runtime/gpu/pool_allocator_test.cc203
-rw-r--r--tensorflow/core/common_runtime/gpu/process_state.cc220
-rw-r--r--tensorflow/core/common_runtime/gpu/process_state.h140
-rw-r--r--tensorflow/core/common_runtime/gpu/visitable_allocator.h30
-rw-r--r--tensorflow/core/common_runtime/gpu_device_context.h45
-rw-r--r--tensorflow/core/common_runtime/kernel_benchmark_testlib.cc160
-rw-r--r--tensorflow/core/common_runtime/kernel_benchmark_testlib.h52
-rw-r--r--tensorflow/core/common_runtime/local_device.cc51
-rw-r--r--tensorflow/core/common_runtime/local_device.h27
-rw-r--r--tensorflow/core/common_runtime/local_session.cc500
-rw-r--r--tensorflow/core/common_runtime/local_session.h109
-rw-r--r--tensorflow/core/common_runtime/local_session_test.cc314
-rw-r--r--tensorflow/core/common_runtime/rendezvous_mgr.cc170
-rw-r--r--tensorflow/core/common_runtime/rendezvous_mgr.h73
-rw-r--r--tensorflow/core/common_runtime/session.cc51
-rw-r--r--tensorflow/core/common_runtime/session_factory.cc41
-rw-r--r--tensorflow/core/common_runtime/session_factory.h25
-rw-r--r--tensorflow/core/common_runtime/session_options.cc9
-rw-r--r--tensorflow/core/common_runtime/session_test.cc17
-rw-r--r--tensorflow/core/common_runtime/simple_placer.cc559
-rw-r--r--tensorflow/core/common_runtime/simple_placer.h81
-rw-r--r--tensorflow/core/common_runtime/simple_placer_test.cc863
-rw-r--r--tensorflow/core/common_runtime/threadpool_device.cc55
-rw-r--r--tensorflow/core/common_runtime/threadpool_device.h31
-rw-r--r--tensorflow/core/common_runtime/threadpool_device_factory.cc31
-rw-r--r--tensorflow/core/example/example.proto95
-rw-r--r--tensorflow/core/example/feature.proto82
-rw-r--r--tensorflow/core/framework/allocation_description.proto15
-rw-r--r--tensorflow/core/framework/allocator.cc25
-rw-r--r--tensorflow/core/framework/allocator.h132
-rw-r--r--tensorflow/core/framework/allocator_test.cc61
-rw-r--r--tensorflow/core/framework/attr_value.proto57
-rw-r--r--tensorflow/core/framework/attr_value_util.cc382
-rw-r--r--tensorflow/core/framework/attr_value_util.h83
-rw-r--r--tensorflow/core/framework/attr_value_util_test.cc91
-rw-r--r--tensorflow/core/framework/bfloat16.cc22
-rw-r--r--tensorflow/core/framework/bfloat16.h58
-rw-r--r--tensorflow/core/framework/bfloat16_test.cc69
-rw-r--r--tensorflow/core/framework/cancellation.cc79
-rw-r--r--tensorflow/core/framework/cancellation.h121
-rw-r--r--tensorflow/core/framework/cancellation_test.cc102
-rw-r--r--tensorflow/core/framework/config.proto61
-rw-r--r--tensorflow/core/framework/control_flow.h43
-rw-r--r--tensorflow/core/framework/device_attributes.proto35
-rw-r--r--tensorflow/core/framework/device_base.cc7
-rw-r--r--tensorflow/core/framework/device_base.h172
-rw-r--r--tensorflow/core/framework/fake_input.cc214
-rw-r--r--tensorflow/core/framework/fake_input.h25
-rw-r--r--tensorflow/core/framework/function.cc878
-rw-r--r--tensorflow/core/framework/function.h376
-rw-r--r--tensorflow/core/framework/function.proto68
-rw-r--r--tensorflow/core/framework/function_test.cc634
-rw-r--r--tensorflow/core/framework/function_testlib.cc146
-rw-r--r--tensorflow/core/framework/function_testlib.h53
-rw-r--r--tensorflow/core/framework/graph.proto103
-rw-r--r--tensorflow/core/framework/graph_def_util.cc25
-rw-r--r--tensorflow/core/framework/graph_def_util.h29
-rw-r--r--tensorflow/core/framework/kernel_def.proto33
-rw-r--r--tensorflow/core/framework/kernel_def_builder.cc47
-rw-r--r--tensorflow/core/framework/kernel_def_builder.h77
-rw-r--r--tensorflow/core/framework/kernel_def_builder_test.cc76
-rw-r--r--tensorflow/core/framework/lookup_interface.cc45
-rw-r--r--tensorflow/core/framework/lookup_interface.h65
-rw-r--r--tensorflow/core/framework/node_def_builder.cc194
-rw-r--r--tensorflow/core/framework/node_def_builder.h176
-rw-r--r--tensorflow/core/framework/node_def_builder_test.cc1036
-rw-r--r--tensorflow/core/framework/node_def_util.cc414
-rw-r--r--tensorflow/core/framework/node_def_util.h157
-rw-r--r--tensorflow/core/framework/node_def_util_test.cc442
-rw-r--r--tensorflow/core/framework/numeric_op.h96
-rw-r--r--tensorflow/core/framework/numeric_types.h15
-rw-r--r--tensorflow/core/framework/op.cc135
-rw-r--r--tensorflow/core/framework/op.h122
-rw-r--r--tensorflow/core/framework/op_def.proto142
-rw-r--r--tensorflow/core/framework/op_def_builder.cc447
-rw-r--r--tensorflow/core/framework/op_def_builder.h109
-rw-r--r--tensorflow/core/framework/op_def_builder_test.cc519
-rw-r--r--tensorflow/core/framework/op_def_util.cc344
-rw-r--r--tensorflow/core/framework/op_def_util.h32
-rw-r--r--tensorflow/core/framework/op_def_util_test.cc330
-rw-r--r--tensorflow/core/framework/op_gen_lib.cc55
-rw-r--r--tensorflow/core/framework/op_gen_lib.h24
-rw-r--r--tensorflow/core/framework/op_kernel.cc749
-rw-r--r--tensorflow/core/framework/op_kernel.h1250
-rw-r--r--tensorflow/core/framework/op_kernel_test.cc803
-rw-r--r--tensorflow/core/framework/op_segment.cc86
-rw-r--r--tensorflow/core/framework/op_segment.h67
-rw-r--r--tensorflow/core/framework/op_segment_test.cc142
-rw-r--r--tensorflow/core/framework/queue_interface.h77
-rw-r--r--tensorflow/core/framework/reader_interface.h66
-rw-r--r--tensorflow/core/framework/reader_op_kernel.cc39
-rw-r--r--tensorflow/core/framework/reader_op_kernel.h42
-rw-r--r--tensorflow/core/framework/register_types.h90
-rw-r--r--tensorflow/core/framework/rendezvous.cc263
-rw-r--r--tensorflow/core/framework/rendezvous.h102
-rw-r--r--tensorflow/core/framework/rendezvous_test.cc314
-rw-r--r--tensorflow/core/framework/resource_mgr.cc146
-rw-r--r--tensorflow/core/framework/resource_mgr.h280
-rw-r--r--tensorflow/core/framework/resource_mgr_test.cc173
-rw-r--r--tensorflow/core/framework/step_stats.proto58
-rw-r--r--tensorflow/core/framework/summary.proto67
-rw-r--r--tensorflow/core/framework/tensor.cc570
-rw-r--r--tensorflow/core/framework/tensor.proto57
-rw-r--r--tensorflow/core/framework/tensor_description.proto19
-rw-r--r--tensorflow/core/framework/tensor_shape.cc138
-rw-r--r--tensorflow/core/framework/tensor_shape.proto29
-rw-r--r--tensorflow/core/framework/tensor_shape_test.cc75
-rw-r--r--tensorflow/core/framework/tensor_slice.cc226
-rw-r--r--tensorflow/core/framework/tensor_slice.h189
-rw-r--r--tensorflow/core/framework/tensor_slice.proto34
-rw-r--r--tensorflow/core/framework/tensor_slice_test.cc246
-rw-r--r--tensorflow/core/framework/tensor_test.cc551
-rw-r--r--tensorflow/core/framework/tensor_testutil.cc43
-rw-r--r--tensorflow/core/framework/tensor_testutil.h189
-rw-r--r--tensorflow/core/framework/tensor_types.h92
-rw-r--r--tensorflow/core/framework/tensor_util.cc28
-rw-r--r--tensorflow/core/framework/tensor_util.h21
-rw-r--r--tensorflow/core/framework/tensor_util_test.cc124
-rw-r--r--tensorflow/core/framework/tracking_allocator.cc100
-rw-r--r--tensorflow/core/framework/tracking_allocator.h80
-rw-r--r--tensorflow/core/framework/tracking_allocator_test.cc115
-rw-r--r--tensorflow/core/framework/type_traits.h69
-rw-r--r--tensorflow/core/framework/types.cc210
-rw-r--r--tensorflow/core/framework/types.h168
-rw-r--r--tensorflow/core/framework/types.proto48
-rw-r--r--tensorflow/core/framework/types_test.cc117
-rw-r--r--tensorflow/core/graph/algorithm.cc107
-rw-r--r--tensorflow/core/graph/algorithm.h40
-rw-r--r--tensorflow/core/graph/algorithm_test.cc103
-rw-r--r--tensorflow/core/graph/colors.cc25
-rw-r--r--tensorflow/core/graph/colors.h14
-rw-r--r--tensorflow/core/graph/costmodel.cc308
-rw-r--r--tensorflow/core/graph/costmodel.h123
-rw-r--r--tensorflow/core/graph/costutil.cc22
-rw-r--r--tensorflow/core/graph/costutil.h19
-rw-r--r--tensorflow/core/graph/default_device.h25
-rw-r--r--tensorflow/core/graph/dot.cc289
-rw-r--r--tensorflow/core/graph/dot.h43
-rw-r--r--tensorflow/core/graph/edgeset.cc56
-rw-r--r--tensorflow/core/graph/edgeset.h216
-rw-r--r--tensorflow/core/graph/edgeset_test.cc95
-rw-r--r--tensorflow/core/graph/equal_graph_def.cc176
-rw-r--r--tensorflow/core/graph/equal_graph_def.h32
-rw-r--r--tensorflow/core/graph/equal_graph_def_test.cc279
-rw-r--r--tensorflow/core/graph/graph.cc319
-rw-r--r--tensorflow/core/graph/graph.h440
-rw-r--r--tensorflow/core/graph/graph_constructor.cc385
-rw-r--r--tensorflow/core/graph/graph_constructor.h43
-rw-r--r--tensorflow/core/graph/graph_constructor_test.cc190
-rw-r--r--tensorflow/core/graph/graph_def_builder.cc121
-rw-r--r--tensorflow/core/graph/graph_def_builder.h181
-rw-r--r--tensorflow/core/graph/graph_partition.cc1050
-rw-r--r--tensorflow/core/graph/graph_partition.h77
-rw-r--r--tensorflow/core/graph/graph_partition_test.cc316
-rw-r--r--tensorflow/core/graph/graph_test.cc252
-rw-r--r--tensorflow/core/graph/node_builder.cc115
-rw-r--r--tensorflow/core/graph/node_builder.h146
-rw-r--r--tensorflow/core/graph/node_builder_test.cc59
-rw-r--r--tensorflow/core/graph/optimizer_cse.cc220
-rw-r--r--tensorflow/core/graph/optimizer_cse.h19
-rw-r--r--tensorflow/core/graph/optimizer_cse_test.cc365
-rw-r--r--tensorflow/core/graph/subgraph.cc258
-rw-r--r--tensorflow/core/graph/subgraph.h49
-rw-r--r--tensorflow/core/graph/subgraph_test.cc305
-rw-r--r--tensorflow/core/graph/tensor_id.cc41
-rw-r--r--tensorflow/core/graph/tensor_id.h28
-rw-r--r--tensorflow/core/graph/tensor_id_test.cc77
-rw-r--r--tensorflow/core/graph/testlib.cc299
-rw-r--r--tensorflow/core/graph/testlib.h141
-rw-r--r--tensorflow/core/graph/types.h17
-rw-r--r--tensorflow/core/kernels/adjust_contrast_op.cc121
-rw-r--r--tensorflow/core/kernels/adjust_contrast_op.h64
-rw-r--r--tensorflow/core/kernels/adjust_contrast_op_benchmark_test.cc43
-rw-r--r--tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc22
-rw-r--r--tensorflow/core/kernels/adjust_contrast_op_test.cc88
-rw-r--r--tensorflow/core/kernels/aggregate_ops.cc238
-rw-r--r--tensorflow/core/kernels/aggregate_ops.h211
-rw-r--r--tensorflow/core/kernels/aggregate_ops_gpu.cu.cc141
-rw-r--r--tensorflow/core/kernels/argmax_op.cc163
-rw-r--r--tensorflow/core/kernels/argmax_op.h55
-rw-r--r--tensorflow/core/kernels/argmax_op_gpu.cu.cc20
-rw-r--r--tensorflow/core/kernels/assign_op.h92
-rw-r--r--tensorflow/core/kernels/attention_ops.cc92
-rw-r--r--tensorflow/core/kernels/avgpooling_op.cc418
-rw-r--r--tensorflow/core/kernels/avgpooling_op.h58
-rw-r--r--tensorflow/core/kernels/avgpooling_op_gpu.cu.cc101
-rw-r--r--tensorflow/core/kernels/batch_matmul_op.cc260
-rw-r--r--tensorflow/core/kernels/batch_norm_op.cc223
-rw-r--r--tensorflow/core/kernels/batch_norm_op.h133
-rw-r--r--tensorflow/core/kernels/batch_norm_op_gpu.cu.cc17
-rw-r--r--tensorflow/core/kernels/bcast_ops.cc71
-rw-r--r--tensorflow/core/kernels/bias_op.cc112
-rw-r--r--tensorflow/core/kernels/bias_op.h41
-rw-r--r--tensorflow/core/kernels/bias_op_gpu.cu.cc23
-rw-r--r--tensorflow/core/kernels/candidate_sampler_ops.cc243
-rw-r--r--tensorflow/core/kernels/cast_op.cc233
-rw-r--r--tensorflow/core/kernels/cast_op.h71
-rw-r--r--tensorflow/core/kernels/cast_op_gpu.cu.cc45
-rw-r--r--tensorflow/core/kernels/cast_op_test.cc100
-rw-r--r--tensorflow/core/kernels/check_numerics_op.cc190
-rw-r--r--tensorflow/core/kernels/check_numerics_op_gpu.cu.cc62
-rw-r--r--tensorflow/core/kernels/cholesky_op.cc71
-rw-r--r--tensorflow/core/kernels/concat_op.cc153
-rw-r--r--tensorflow/core/kernels/concat_op.h27
-rw-r--r--tensorflow/core/kernels/concat_op_cpu.cc122
-rw-r--r--tensorflow/core/kernels/concat_op_gpu.cu.cc41
-rw-r--r--tensorflow/core/kernels/concat_op_test.cc240
-rw-r--r--tensorflow/core/kernels/constant_op.cc249
-rw-r--r--tensorflow/core/kernels/constant_op.h25
-rw-r--r--tensorflow/core/kernels/constant_op_gpu.cu.cc89
-rw-r--r--tensorflow/core/kernels/constant_op_test.cc43
-rw-r--r--tensorflow/core/kernels/control_flow_ops.cc359
-rw-r--r--tensorflow/core/kernels/control_flow_ops.h22
-rw-r--r--tensorflow/core/kernels/control_flow_ops_test.cc71
-rw-r--r--tensorflow/core/kernels/conv_2d.h127
-rw-r--r--tensorflow/core/kernels/conv_grad_ops.cc1190
-rw-r--r--tensorflow/core/kernels/conv_ops.cc373
-rw-r--r--tensorflow/core/kernels/conv_ops_gpu.cu.cc35
-rw-r--r--tensorflow/core/kernels/conv_ops_gpu_2.cu.cc16
-rw-r--r--tensorflow/core/kernels/conv_ops_gpu_3.cu.cc22
-rw-r--r--tensorflow/core/kernels/conv_ops_gpu_matmul.cu.cc16
-rw-r--r--tensorflow/core/kernels/core_ops_test.cc990
-rw-r--r--tensorflow/core/kernels/count_up_to_op.cc51
-rw-r--r--tensorflow/core/kernels/cwise_op_abs.cc23
-rw-r--r--tensorflow/core/kernels/cwise_op_add.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_ceil.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_complex.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_conj.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_cos.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_div.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_equal_to.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_exp.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_floor.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_abs.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_add.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_ceil.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_complex.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_conj.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_cos.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_div.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_equal_to.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_exp.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_floor.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_greater.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_greater_equal.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_imag.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_inverse.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_isfinite.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_isinf.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_isnan.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_less.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_less_equal.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_log.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_logical_and.cu.cc13
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_logical_not.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_logical_or.cu.cc13
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_maximum.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_minimum.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_mod.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_mul.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_neg.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_not_equal_to.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_pow.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_real.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_rsqrt.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_select.cu.cc15
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_sigmoid.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_sign.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_sin.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_sqrt.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_square.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_sub.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_gpu_tanh.cu.cc11
-rw-r--r--tensorflow/core/kernels/cwise_op_greater.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_greater_equal.cc22
-rw-r--r--tensorflow/core/kernels/cwise_op_imag.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_inverse.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_isfinite.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_isinf.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_isnan.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_less.cc20
-rw-r--r--tensorflow/core/kernels/cwise_op_less_equal.cc22
-rw-r--r--tensorflow/core/kernels/cwise_op_log.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_logical_and.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_logical_not.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_logical_or.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_maximum.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_minimum.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_mod.cc6
-rw-r--r--tensorflow/core/kernels/cwise_op_mul.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_neg.cc9
-rw-r--r--tensorflow/core/kernels/cwise_op_not_equal_to.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_pow.cc9
-rw-r--r--tensorflow/core/kernels/cwise_op_real.cc10
-rw-r--r--tensorflow/core/kernels/cwise_op_rsqrt.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_select.cc17
-rw-r--r--tensorflow/core/kernels/cwise_op_sigmoid.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_sign.cc19
-rw-r--r--tensorflow/core/kernels/cwise_op_sin.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_sqrt.cc8
-rw-r--r--tensorflow/core/kernels/cwise_op_square.cc9
-rw-r--r--tensorflow/core/kernels/cwise_op_sub.cc21
-rw-r--r--tensorflow/core/kernels/cwise_op_tanh.cc8
-rw-r--r--tensorflow/core/kernels/cwise_ops.h607
-rw-r--r--tensorflow/core/kernels/cwise_ops_common.cc42
-rw-r--r--tensorflow/core/kernels/cwise_ops_common.h390
-rw-r--r--tensorflow/core/kernels/cwise_ops_gpu_common.cu.h135
-rw-r--r--tensorflow/core/kernels/cwise_ops_test.cc167
-rw-r--r--tensorflow/core/kernels/decode_csv_op.cc222
-rw-r--r--tensorflow/core/kernels/decode_jpeg_op.cc72
-rw-r--r--tensorflow/core/kernels/decode_png_op.cc69
-rw-r--r--tensorflow/core/kernels/decode_raw_op.cc90
-rw-r--r--tensorflow/core/kernels/dense_update_ops.cc136
-rw-r--r--tensorflow/core/kernels/dense_update_ops.h43
-rw-r--r--tensorflow/core/kernels/dense_update_ops_gpu.cu.cc22
-rw-r--r--tensorflow/core/kernels/determinant_op.cc66
-rw-r--r--tensorflow/core/kernels/diag_op.cc93
-rw-r--r--tensorflow/core/kernels/dynamic_partition_op.cc154
-rw-r--r--tensorflow/core/kernels/dynamic_partition_op_test.cc145
-rw-r--r--tensorflow/core/kernels/dynamic_stitch_op.cc158
-rw-r--r--tensorflow/core/kernels/dynamic_stitch_op_test.cc133
-rw-r--r--tensorflow/core/kernels/edit_distance_op.cc217
-rw-r--r--tensorflow/core/kernels/encode_jpeg_op.cc114
-rw-r--r--tensorflow/core/kernels/encode_png_op.cc52
-rw-r--r--tensorflow/core/kernels/example_parsing_ops.cc444
-rw-r--r--tensorflow/core/kernels/fact_op.cc96
-rw-r--r--tensorflow/core/kernels/fifo_queue.cc518
-rw-r--r--tensorflow/core/kernels/fifo_queue.h127
-rw-r--r--tensorflow/core/kernels/fifo_queue_op.cc93
-rw-r--r--tensorflow/core/kernels/fill_functor.h26
-rw-r--r--tensorflow/core/kernels/fixed_length_record_reader_op.cc109
-rw-r--r--tensorflow/core/kernels/gather_op.cc136
-rw-r--r--tensorflow/core/kernels/gather_op_test.cc213
-rw-r--r--tensorflow/core/kernels/identity_op.cc45
-rw-r--r--tensorflow/core/kernels/identity_op.h25
-rw-r--r--tensorflow/core/kernels/identity_op_test.cc56
-rw-r--r--tensorflow/core/kernels/identity_reader_op.cc57
-rw-r--r--tensorflow/core/kernels/in_topk_op.cc58
-rw-r--r--tensorflow/core/kernels/initializable_lookup_table.cc41
-rw-r--r--tensorflow/core/kernels/initializable_lookup_table.h103
-rw-r--r--tensorflow/core/kernels/io.cc270
-rw-r--r--tensorflow/core/kernels/io.h38
-rw-r--r--tensorflow/core/kernels/l2loss_op.cc69
-rw-r--r--tensorflow/core/kernels/l2loss_op.h24
-rw-r--r--tensorflow/core/kernels/l2loss_op_gpu.cu.cc16
-rw-r--r--tensorflow/core/kernels/linalg_ops_common.cc99
-rw-r--r--tensorflow/core/kernels/linalg_ops_common.h123
-rw-r--r--tensorflow/core/kernels/listdiff_op.cc75
-rw-r--r--tensorflow/core/kernels/logging_ops.cc77
-rw-r--r--tensorflow/core/kernels/logging_ops_test.cc87
-rw-r--r--tensorflow/core/kernels/lookup_table_init_op.cc116
-rw-r--r--tensorflow/core/kernels/lookup_table_op.cc166
-rw-r--r--tensorflow/core/kernels/lookup_table_op.h80
-rw-r--r--tensorflow/core/kernels/lookup_util.cc72
-rw-r--r--tensorflow/core/kernels/lookup_util.h31
-rw-r--r--tensorflow/core/kernels/lrn_op.cc228
-rw-r--r--tensorflow/core/kernels/lrn_op_test.cc185
-rw-r--r--tensorflow/core/kernels/matching_files_op.cc42
-rw-r--r--tensorflow/core/kernels/matmul_op.cc214
-rw-r--r--tensorflow/core/kernels/matmul_op.h40
-rw-r--r--tensorflow/core/kernels/matmul_op_gpu.cu.cc32
-rw-r--r--tensorflow/core/kernels/matmul_op_test.cc56
-rw-r--r--tensorflow/core/kernels/matrix_inverse_op.cc64
-rw-r--r--tensorflow/core/kernels/maxpooling_op.cc554
-rw-r--r--tensorflow/core/kernels/maxpooling_op.h29
-rw-r--r--tensorflow/core/kernels/maxpooling_op_gpu.cu.cc261
-rw-r--r--tensorflow/core/kernels/maxpooling_op_gpu.h42
-rw-r--r--tensorflow/core/kernels/no_op.cc8
-rw-r--r--tensorflow/core/kernels/no_op.h17
-rw-r--r--tensorflow/core/kernels/ops_testutil.cc18
-rw-r--r--tensorflow/core/kernels/ops_testutil.h191
-rw-r--r--tensorflow/core/kernels/ops_util.cc113
-rw-r--r--tensorflow/core/kernels/ops_util.h180
-rw-r--r--tensorflow/core/kernels/ops_util_test.cc265
-rw-r--r--tensorflow/core/kernels/pack_op.cc114
-rw-r--r--tensorflow/core/kernels/pad_op.cc159
-rw-r--r--tensorflow/core/kernels/pad_op.h27
-rw-r--r--tensorflow/core/kernels/pad_op_gpu.cu.cc26
-rw-r--r--tensorflow/core/kernels/pooling_ops_common.cc252
-rw-r--r--tensorflow/core/kernels/pooling_ops_common.h264
-rw-r--r--tensorflow/core/kernels/pooling_ops_common_gpu.h39
-rw-r--r--tensorflow/core/kernels/queue_base.cc153
-rw-r--r--tensorflow/core/kernels/queue_base.h77
-rw-r--r--tensorflow/core/kernels/queue_ops.cc288
-rw-r--r--tensorflow/core/kernels/random_crop_op.cc103
-rw-r--r--tensorflow/core/kernels/random_crop_op_test.cc60
-rw-r--r--tensorflow/core/kernels/random_op.cc276
-rw-r--r--tensorflow/core/kernels/random_op.h16
-rw-r--r--tensorflow/core/kernels/random_op_gpu.cu.cc152
-rw-r--r--tensorflow/core/kernels/random_op_test.cc99
-rw-r--r--tensorflow/core/kernels/random_shuffle_op.cc89
-rw-r--r--tensorflow/core/kernels/random_shuffle_queue_op.cc740
-rw-r--r--tensorflow/core/kernels/range_sampler.cc305
-rw-r--r--tensorflow/core/kernels/range_sampler.h237
-rw-r--r--tensorflow/core/kernels/range_sampler_test.cc320
-rw-r--r--tensorflow/core/kernels/reader_base.cc156
-rw-r--r--tensorflow/core/kernels/reader_base.h107
-rw-r--r--tensorflow/core/kernels/reader_base.proto13
-rw-r--r--tensorflow/core/kernels/reader_ops.cc132
-rw-r--r--tensorflow/core/kernels/reduction_ops.h66
-rw-r--r--tensorflow/core/kernels/reduction_ops_all.cc17
-rw-r--r--tensorflow/core/kernels/reduction_ops_any.cc17
-rw-r--r--tensorflow/core/kernels/reduction_ops_common.h302
-rw-r--r--tensorflow/core/kernels/reduction_ops_gpu.cu.cc65
-rw-r--r--tensorflow/core/kernels/reduction_ops_max.cc26
-rw-r--r--tensorflow/core/kernels/reduction_ops_mean.cc12
-rw-r--r--tensorflow/core/kernels/reduction_ops_min.cc26
-rw-r--r--tensorflow/core/kernels/reduction_ops_prod.cc26
-rw-r--r--tensorflow/core/kernels/reduction_ops_sum.cc37
-rw-r--r--tensorflow/core/kernels/reduction_ops_test.cc73
-rw-r--r--tensorflow/core/kernels/reference_gemm.h75
-rw-r--r--tensorflow/core/kernels/relu_op.cc154
-rw-r--r--tensorflow/core/kernels/relu_op.h79
-rw-r--r--tensorflow/core/kernels/relu_op_gpu.cu.cc27
-rw-r--r--tensorflow/core/kernels/reshape_op.cc29
-rw-r--r--tensorflow/core/kernels/reshape_op.h83
-rw-r--r--tensorflow/core/kernels/resize_area_op.cc139
-rw-r--r--tensorflow/core/kernels/resize_bicubic_op.cc121
-rw-r--r--tensorflow/core/kernels/resize_bilinear_op.cc109
-rw-r--r--tensorflow/core/kernels/resize_bilinear_op_test.cc171
-rw-r--r--tensorflow/core/kernels/resize_nearest_neighbor_op.cc89
-rw-r--r--tensorflow/core/kernels/resize_nearest_neighbor_op_test.cc163
-rw-r--r--tensorflow/core/kernels/restore_op.cc65
-rw-r--r--tensorflow/core/kernels/restore_op_test.cc305
-rw-r--r--tensorflow/core/kernels/reverse_op.cc139
-rw-r--r--tensorflow/core/kernels/reverse_op.h28
-rw-r--r--tensorflow/core/kernels/reverse_op_gpu.cu.cc33
-rw-r--r--tensorflow/core/kernels/reverse_op_test.cc101
-rw-r--r--tensorflow/core/kernels/reverse_sequence_op.cc170
-rw-r--r--tensorflow/core/kernels/reverse_sequence_op.h56
-rw-r--r--tensorflow/core/kernels/reverse_sequence_op_gpu.cu.cc26
-rw-r--r--tensorflow/core/kernels/save_op.cc81
-rw-r--r--tensorflow/core/kernels/save_op_test.cc443
-rw-r--r--tensorflow/core/kernels/scatter_op.cc167
-rw-r--r--tensorflow/core/kernels/scatter_op_test.cc255
-rw-r--r--tensorflow/core/kernels/segment_reduction_ops.cc466
-rw-r--r--tensorflow/core/kernels/segment_reduction_ops_test.cc157
-rw-r--r--tensorflow/core/kernels/sendrecv_ops.cc116
-rw-r--r--tensorflow/core/kernels/sendrecv_ops.h32
-rw-r--r--tensorflow/core/kernels/sequence_ops.cc123
-rw-r--r--tensorflow/core/kernels/shape_ops.cc261
-rw-r--r--tensorflow/core/kernels/slice_op.cc242
-rw-r--r--tensorflow/core/kernels/slice_op.h25
-rw-r--r--tensorflow/core/kernels/slice_op_gpu.cu.cc31
-rw-r--r--tensorflow/core/kernels/slice_op_test.cc73
-rw-r--r--tensorflow/core/kernels/softmax_op.cc62
-rw-r--r--tensorflow/core/kernels/softmax_op.h70
-rw-r--r--tensorflow/core/kernels/softmax_op_gpu.cu.cc31
-rw-r--r--tensorflow/core/kernels/softplus_op.cc97
-rw-r--r--tensorflow/core/kernels/softplus_op.h46
-rw-r--r--tensorflow/core/kernels/softplus_op_gpu.cu.cc25
-rw-r--r--tensorflow/core/kernels/sparse_concat_op.cc139
-rw-r--r--tensorflow/core/kernels/sparse_matmul_op.cc192
-rw-r--r--tensorflow/core/kernels/sparse_matmul_op_test.cc139
-rw-r--r--tensorflow/core/kernels/sparse_reorder_op.cc71
-rw-r--r--tensorflow/core/kernels/sparse_to_dense_op.cc129
-rw-r--r--tensorflow/core/kernels/sparse_to_dense_op_test.cc283
-rw-r--r--tensorflow/core/kernels/split_op.cc146
-rw-r--r--tensorflow/core/kernels/split_op.h31
-rw-r--r--tensorflow/core/kernels/split_op_cpu.cc30
-rw-r--r--tensorflow/core/kernels/split_op_gpu.cu.cc31
-rw-r--r--tensorflow/core/kernels/string_to_hash_bucket_op.cc47
-rw-r--r--tensorflow/core/kernels/string_to_number_op.cc71
-rw-r--r--tensorflow/core/kernels/summary_image_op.cc169
-rw-r--r--tensorflow/core/kernels/summary_image_op_test.cc141
-rw-r--r--tensorflow/core/kernels/summary_op.cc141
-rw-r--r--tensorflow/core/kernels/summary_op_test.cc282
-rw-r--r--tensorflow/core/kernels/text_line_reader_op.cc99
-rw-r--r--tensorflow/core/kernels/tf_record_reader_op.cc76
-rw-r--r--tensorflow/core/kernels/tile_ops.cc460
-rw-r--r--tensorflow/core/kernels/tile_ops.h48
-rw-r--r--tensorflow/core/kernels/tile_ops_gpu.cu.cc38
-rw-r--r--tensorflow/core/kernels/topk_op.cc71
-rw-r--r--tensorflow/core/kernels/training_ops.cc884
-rw-r--r--tensorflow/core/kernels/training_ops.h65
-rw-r--r--tensorflow/core/kernels/training_ops_gpu.cu.cc127
-rw-r--r--tensorflow/core/kernels/training_ops_test.cc226
-rw-r--r--tensorflow/core/kernels/transpose_op.cc190
-rw-r--r--tensorflow/core/kernels/transpose_op.h19
-rw-r--r--tensorflow/core/kernels/transpose_op_functor.h28
-rw-r--r--tensorflow/core/kernels/transpose_op_gpu.cu.cc43
-rw-r--r--tensorflow/core/kernels/unique_op.cc61
-rw-r--r--tensorflow/core/kernels/unique_op_test.cc51
-rw-r--r--tensorflow/core/kernels/unpack_op.cc96
-rw-r--r--tensorflow/core/kernels/variable_ops.cc37
-rw-r--r--tensorflow/core/kernels/variable_ops.h146
-rw-r--r--tensorflow/core/kernels/where_op.cc74
-rw-r--r--tensorflow/core/kernels/where_op.h65
-rw-r--r--tensorflow/core/kernels/whole_file_read_ops.cc108
-rw-r--r--tensorflow/core/kernels/xent_op.cc90
-rw-r--r--tensorflow/core/kernels/xent_op.h102
-rw-r--r--tensorflow/core/kernels/xent_op_gpu.cu.cc35
-rw-r--r--tensorflow/core/kernels/xent_op_test.cc46
-rw-r--r--tensorflow/core/lib/core/arena.cc246
-rw-r--r--tensorflow/core/lib/core/arena.h90
-rw-r--r--tensorflow/core/lib/core/arena_test.cc92
-rw-r--r--tensorflow/core/lib/core/bit_cast_test.cc95
-rw-r--r--tensorflow/core/lib/core/bits.h84
-rw-r--r--tensorflow/core/lib/core/blocking_counter.h41
-rw-r--r--tensorflow/core/lib/core/blocking_counter_test.cc36
-rw-r--r--tensorflow/core/lib/core/casts.h85
-rw-r--r--tensorflow/core/lib/core/coding.cc164
-rw-r--r--tensorflow/core/lib/core/coding.h55
-rw-r--r--tensorflow/core/lib/core/coding_test.cc168
-rw-r--r--tensorflow/core/lib/core/command_line_flags.cc94
-rw-r--r--tensorflow/core/lib/core/command_line_flags.h60
-rw-r--r--tensorflow/core/lib/core/error_codes.proto145
-rw-r--r--tensorflow/core/lib/core/errors.h131
-rw-r--r--tensorflow/core/lib/core/notification.h42
-rw-r--r--tensorflow/core/lib/core/notification_test.cc64
-rw-r--r--tensorflow/core/lib/core/raw_coding.h43
-rw-r--r--tensorflow/core/lib/core/refcount.cc35
-rw-r--r--tensorflow/core/lib/core/refcount.h63
-rw-r--r--tensorflow/core/lib/core/refcount_test.cc92
-rw-r--r--tensorflow/core/lib/core/status.cc107
-rw-r--r--tensorflow/core/lib/core/status_test.cc84
-rw-r--r--tensorflow/core/lib/core/status_test_util.h20
-rw-r--r--tensorflow/core/lib/core/stringpiece.cc57
-rw-r--r--tensorflow/core/lib/core/stringpiece.h159
-rw-r--r--tensorflow/core/lib/core/threadpool.cc108
-rw-r--r--tensorflow/core/lib/core/threadpool.h59
-rw-r--r--tensorflow/core/lib/core/threadpool_test.cc93
-rw-r--r--tensorflow/core/lib/gtl/array_slice.h299
-rw-r--r--tensorflow/core/lib/gtl/array_slice_internal.h253
-rw-r--r--tensorflow/core/lib/gtl/array_slice_test.cc646
-rw-r--r--tensorflow/core/lib/gtl/edit_distance.h82
-rw-r--r--tensorflow/core/lib/gtl/edit_distance_test.cc125
-rw-r--r--tensorflow/core/lib/gtl/inlined_vector.h839
-rw-r--r--tensorflow/core/lib/gtl/inlined_vector_test.cc905
-rw-r--r--tensorflow/core/lib/gtl/int_type.h343
-rw-r--r--tensorflow/core/lib/gtl/int_type_test.cc282
-rw-r--r--tensorflow/core/lib/gtl/iterator_range.h49
-rw-r--r--tensorflow/core/lib/gtl/iterator_range_test.cc60
-rw-r--r--tensorflow/core/lib/gtl/manual_constructor.h230
-rw-r--r--tensorflow/core/lib/gtl/manual_constructor_test.cc113
-rw-r--r--tensorflow/core/lib/gtl/map_util.h123
-rw-r--r--tensorflow/core/lib/gtl/map_util_test.cc47
-rw-r--r--tensorflow/core/lib/gtl/stl_util.h130
-rw-r--r--tensorflow/core/lib/gtl/top_n.h324
-rw-r--r--tensorflow/core/lib/gtl/top_n_test.cc249
-rw-r--r--tensorflow/core/lib/hash/crc32c.cc244
-rw-r--r--tensorflow/core/lib/hash/crc32c.h39
-rw-r--r--tensorflow/core/lib/hash/crc32c_test.cc51
-rw-r--r--tensorflow/core/lib/hash/hash.cc113
-rw-r--r--tensorflow/core/lib/hash/hash.h28
-rw-r--r--tensorflow/core/lib/hash/hash_test.cc64
-rw-r--r--tensorflow/core/lib/histogram/histogram.cc247
-rw-r--r--tensorflow/core/lib/histogram/histogram.h119
-rw-r--r--tensorflow/core/lib/histogram/histogram_test.cc112
-rw-r--r--tensorflow/core/lib/io/block.cc236
-rw-r--r--tensorflow/core/lib/io/block.h45
-rw-r--r--tensorflow/core/lib/io/block_builder.cc107
-rw-r--r--tensorflow/core/lib/io/block_builder.h57
-rw-r--r--tensorflow/core/lib/io/format.cc148
-rw-r--r--tensorflow/core/lib/io/format.h99
-rw-r--r--tensorflow/core/lib/io/inputbuffer.cc112
-rw-r--r--tensorflow/core/lib/io/inputbuffer.h62
-rw-r--r--tensorflow/core/lib/io/inputbuffer_test.cc174
-rw-r--r--tensorflow/core/lib/io/iterator.cc72
-rw-r--r--tensorflow/core/lib/io/iterator.h93
-rw-r--r--tensorflow/core/lib/io/match.cc31
-rw-r--r--tensorflow/core/lib/io/match.h24
-rw-r--r--tensorflow/core/lib/io/match_test.cc51
-rw-r--r--tensorflow/core/lib/io/path.cc92
-rw-r--r--tensorflow/core/lib/io/path.h47
-rw-r--r--tensorflow/core/lib/io/path_test.cc65
-rw-r--r--tensorflow/core/lib/io/record_reader.cc80
-rw-r--r--tensorflow/core/lib/io/record_reader.h36
-rw-r--r--tensorflow/core/lib/io/record_writer.cc42
-rw-r--r--tensorflow/core/lib/io/record_writer.h34
-rw-r--r--tensorflow/core/lib/io/recordio_test.cc245
-rw-r--r--tensorflow/core/lib/io/table.cc169
-rw-r--r--tensorflow/core/lib/io/table.h76
-rw-r--r--tensorflow/core/lib/io/table_builder.cc263
-rw-r--r--tensorflow/core/lib/io/table_builder.h87
-rw-r--r--tensorflow/core/lib/io/table_format.txt8
-rw-r--r--tensorflow/core/lib/io/table_options.h53
-rw-r--r--tensorflow/core/lib/io/table_test.cc601
-rw-r--r--tensorflow/core/lib/io/two_level_iterator.cc148
-rw-r--r--tensorflow/core/lib/io/two_level_iterator.h30
-rw-r--r--tensorflow/core/lib/jpeg/jpeg_handle.cc162
-rw-r--r--tensorflow/core/lib/jpeg/jpeg_handle.h51
-rw-r--r--tensorflow/core/lib/jpeg/jpeg_mem.cc557
-rw-r--r--tensorflow/core/lib/jpeg/jpeg_mem.h130
-rw-r--r--tensorflow/core/lib/jpeg/jpeg_mem_unittest.cc304
-rw-r--r--tensorflow/core/lib/jpeg/testdata/bad_huffman.jpgbin0 -> 15416 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/corrupt.jpgbin0 -> 1552 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/corrupt34_2.jpgbin0 -> 755 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/corrupt34_3.jpgbin0 -> 5505 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/corrupt34_4.jpgbin0 -> 5092 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1.jpgbin0 -> 3771 bytes
-rw-r--r--tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1_cmyk.jpgbin0 -> 5324 bytes
-rw-r--r--tensorflow/core/lib/png/png_io.cc385
-rw-r--r--tensorflow/core/lib/png/png_io.h88
-rw-r--r--tensorflow/core/lib/png/testdata/lena_gray.pngbin0 -> 1491 bytes
-rw-r--r--tensorflow/core/lib/png/testdata/lena_rgba.pngbin0 -> 4032 bytes
-rw-r--r--tensorflow/core/lib/random/distribution_sampler.cc80
-rw-r--r--tensorflow/core/lib/random/distribution_sampler.h79
-rw-r--r--tensorflow/core/lib/random/distribution_sampler_test.cc90
-rw-r--r--tensorflow/core/lib/random/exact_uniform_int.h68
-rw-r--r--tensorflow/core/lib/random/philox_random.h232
-rw-r--r--tensorflow/core/lib/random/philox_random_test.cc58
-rw-r--r--tensorflow/core/lib/random/philox_random_test_utils.h36
-rw-r--r--tensorflow/core/lib/random/random.cc22
-rw-r--r--tensorflow/core/lib/random/random.h16
-rw-r--r--tensorflow/core/lib/random/random_distributions.h361
-rw-r--r--tensorflow/core/lib/random/random_distributions_test.cc270
-rw-r--r--tensorflow/core/lib/random/random_test.cc21
-rw-r--r--tensorflow/core/lib/random/simple_philox.cc24
-rw-r--r--tensorflow/core/lib/random/simple_philox.h61
-rw-r--r--tensorflow/core/lib/random/simple_philox_test.cc120
-rw-r--r--tensorflow/core/lib/random/weighted_picker.cc203
-rw-r--r--tensorflow/core/lib/random/weighted_picker.h118
-rw-r--r--tensorflow/core/lib/random/weighted_picker_test.cc254
-rw-r--r--tensorflow/core/lib/strings/numbers.cc260
-rw-r--r--tensorflow/core/lib/strings/numbers.h92
-rw-r--r--tensorflow/core/lib/strings/numbers_test.cc113
-rw-r--r--tensorflow/core/lib/strings/ordered_code.cc515
-rw-r--r--tensorflow/core/lib/strings/ordered_code.h77
-rw-r--r--tensorflow/core/lib/strings/ordered_code_test.cc1183
-rw-r--r--tensorflow/core/lib/strings/str_util.cc312
-rw-r--r--tensorflow/core/lib/strings/str_util.h149
-rw-r--r--tensorflow/core/lib/strings/str_util_test.cc258
-rw-r--r--tensorflow/core/lib/strings/strcat.cc194
-rw-r--r--tensorflow/core/lib/strings/strcat.h229
-rw-r--r--tensorflow/core/lib/strings/strcat_test.cc324
-rw-r--r--tensorflow/core/lib/strings/stringprintf.cc85
-rw-r--r--tensorflow/core/lib/strings/stringprintf.h37
-rw-r--r--tensorflow/core/lib/strings/stringprintf_test.cc113
-rw-r--r--tensorflow/core/ops/array_ops.cc892
-rw-r--r--tensorflow/core/ops/attention_ops.cc54
-rw-r--r--tensorflow/core/ops/candidate_sampling_ops.cc351
-rw-r--r--tensorflow/core/ops/control_flow_ops.cc179
-rw-r--r--tensorflow/core/ops/data_flow_ops.cc357
-rw-r--r--tensorflow/core/ops/image_ops.cc273
-rw-r--r--tensorflow/core/ops/io_ops.cc332
-rw-r--r--tensorflow/core/ops/linalg_ops.cc97
-rw-r--r--tensorflow/core/ops/logging_ops.cc43
-rw-r--r--tensorflow/core/ops/math_ops.cc1053
-rw-r--r--tensorflow/core/ops/nn_ops.cc543
-rw-r--r--tensorflow/core/ops/no_op.cc10
-rw-r--r--tensorflow/core/ops/parsing_ops.cc104
-rw-r--r--tensorflow/core/ops/random_ops.cc108
-rw-r--r--tensorflow/core/ops/sendrecv_ops.cc99
-rw-r--r--tensorflow/core/ops/sparse_ops.cc134
-rw-r--r--tensorflow/core/ops/state_ops.cc290
-rw-r--r--tensorflow/core/ops/string_ops.cc21
-rw-r--r--tensorflow/core/ops/summary_ops.cc115
-rw-r--r--tensorflow/core/ops/training_ops.cc199
-rw-r--r--tensorflow/core/platform/default/build_config.bzl65
-rw-r--r--tensorflow/core/platform/default/build_config/BUILD85
-rw-r--r--tensorflow/core/platform/default/build_config_root.bzl6
-rw-r--r--tensorflow/core/platform/default/dynamic_annotations.h9
-rw-r--r--tensorflow/core/platform/default/integral_types.h18
-rw-r--r--tensorflow/core/platform/default/logging.cc125
-rw-r--r--tensorflow/core/platform/default/logging.h258
-rw-r--r--tensorflow/core/platform/default/mutex.h33
-rw-r--r--tensorflow/core/platform/default/protobuf.h13
-rw-r--r--tensorflow/core/platform/default/stream_executor_util.h19
-rw-r--r--tensorflow/core/platform/default/test_benchmark.cc162
-rw-r--r--tensorflow/core/platform/default/thread_annotations.h185
-rw-r--r--tensorflow/core/platform/default/tracing.cc37
-rw-r--r--tensorflow/core/platform/default/tracing_impl.h44
-rw-r--r--tensorflow/core/platform/env.cc129
-rw-r--r--tensorflow/core/platform/env_test.cc31
-rw-r--r--tensorflow/core/platform/init_main.h16
-rw-r--r--tensorflow/core/platform/integral_types_test.cc33
-rw-r--r--tensorflow/core/platform/logging.h12
-rw-r--r--tensorflow/core/platform/logging_test.cc76
-rw-r--r--tensorflow/core/platform/port.h228
-rw-r--r--tensorflow/core/platform/port_test.cc48
-rw-r--r--tensorflow/core/platform/posix/env.cc385
-rw-r--r--tensorflow/core/platform/posix/port.cc92
-rw-r--r--tensorflow/core/platform/protobuf.h29
-rw-r--r--tensorflow/core/platform/protobuf_util.cc17
-rw-r--r--tensorflow/core/platform/regexp.h33
-rw-r--r--tensorflow/core/platform/stream_executor_util.h12
-rw-r--r--tensorflow/core/platform/tensor_coding.cc53
-rw-r--r--tensorflow/core/platform/tensor_coding.h40
-rw-r--r--tensorflow/core/platform/test.cc39
-rw-r--r--tensorflow/core/platform/test.h17
-rw-r--r--tensorflow/core/platform/test_benchmark.h58
-rw-r--r--tensorflow/core/platform/test_main.cc31
-rw-r--r--tensorflow/core/platform/thread_annotations.h14
-rw-r--r--tensorflow/core/platform/tracing.cc135
-rw-r--r--tensorflow/core/platform/tracing.h205
-rw-r--r--tensorflow/core/public/README.md90
-rw-r--r--tensorflow/core/public/env.h273
-rw-r--r--tensorflow/core/public/session.h125
-rw-r--r--tensorflow/core/public/session_options.h50
-rw-r--r--tensorflow/core/public/status.h96
-rw-r--r--tensorflow/core/public/tensor.h472
-rw-r--r--tensorflow/core/public/tensor_c_api.h243
-rw-r--r--tensorflow/core/public/tensor_shape.h239
-rw-r--r--tensorflow/core/public/tensorflow_server.h19
-rw-r--r--tensorflow/core/user_ops/fact.cc29
-rw-r--r--tensorflow/core/util/bcast.cc120
-rw-r--r--tensorflow/core/util/bcast.h99
-rw-r--r--tensorflow/core/util/bcast_test.cc226
-rw-r--r--tensorflow/core/util/device_name_utils.cc338
-rw-r--r--tensorflow/core/util/device_name_utils.h141
-rw-r--r--tensorflow/core/util/device_name_utils_test.cc369
-rw-r--r--tensorflow/core/util/event.proto29
-rw-r--r--tensorflow/core/util/events_writer.cc144
-rw-r--r--tensorflow/core/util/events_writer.h77
-rw-r--r--tensorflow/core/util/events_writer_test.cc198
-rw-r--r--tensorflow/core/util/guarded_philox_random.cc39
-rw-r--r--tensorflow/core/util/guarded_philox_random.h56
-rw-r--r--tensorflow/core/util/padding.cc24
-rw-r--r--tensorflow/core/util/padding.h37
-rw-r--r--tensorflow/core/util/port.cc13
-rw-r--r--tensorflow/core/util/port.h11
-rw-r--r--tensorflow/core/util/saved_tensor_slice.proto76
-rw-r--r--tensorflow/core/util/saved_tensor_slice_util.cc76
-rw-r--r--tensorflow/core/util/saved_tensor_slice_util.h110
-rw-r--r--tensorflow/core/util/saved_tensor_slice_util_test.cc32
-rw-r--r--tensorflow/core/util/sparse/README.md222
-rw-r--r--tensorflow/core/util/sparse/dim_comparator.h60
-rw-r--r--tensorflow/core/util/sparse/group_iterator.cc49
-rw-r--r--tensorflow/core/util/sparse/group_iterator.h120
-rw-r--r--tensorflow/core/util/sparse/sparse_tensor.h353
-rw-r--r--tensorflow/core/util/sparse/sparse_tensor_test.cc467
-rw-r--r--tensorflow/core/util/tensor_slice_reader.cc230
-rw-r--r--tensorflow/core/util/tensor_slice_reader.h157
-rw-r--r--tensorflow/core/util/tensor_slice_reader_cache.cc94
-rw-r--r--tensorflow/core/util/tensor_slice_reader_cache.h73
-rw-r--r--tensorflow/core/util/tensor_slice_reader_test.cc395
-rw-r--r--tensorflow/core/util/tensor_slice_set.cc148
-rw-r--r--tensorflow/core/util/tensor_slice_set.h73
-rw-r--r--tensorflow/core/util/tensor_slice_set_test.cc227
-rw-r--r--tensorflow/core/util/tensor_slice_util.h88
-rw-r--r--tensorflow/core/util/tensor_slice_util_test.cc91
-rw-r--r--tensorflow/core/util/tensor_slice_writer.cc110
-rw-r--r--tensorflow/core/util/tensor_slice_writer.h149
-rw-r--r--tensorflow/core/util/tensor_slice_writer_test.cc248
-rw-r--r--tensorflow/core/util/use_cudnn.cc20
-rw-r--r--tensorflow/core/util/use_cudnn.h12
-rw-r--r--tensorflow/core/util/util.cc81
-rw-r--r--tensorflow/core/util/util.h40
-rw-r--r--tensorflow/core/util/work_sharder.cc57
-rw-r--r--tensorflow/core/util/work_sharder.h33
-rw-r--r--tensorflow/core/util/work_sharder_test.cc57
-rw-r--r--tensorflow/examples/android/AndroidManifest.xml46
-rw-r--r--tensorflow/examples/android/BUILD70
-rw-r--r--tensorflow/examples/android/README.md39
-rwxr-xr-xtensorflow/examples/android/__init__.py0
-rw-r--r--tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt1001
-rwxr-xr-xtensorflow/examples/android/jni/__init__.py0
-rw-r--r--tensorflow/examples/android/jni/imageutils_jni.cc122
-rw-r--r--tensorflow/examples/android/jni/jni_utils.cc144
-rw-r--r--tensorflow/examples/android/jni/jni_utils.h30
-rwxr-xr-xtensorflow/examples/android/jni/libpthread.sobin0 -> 14096 bytes
-rwxr-xr-xtensorflow/examples/android/jni/rgb2yuv.cc89
-rwxr-xr-xtensorflow/examples/android/jni/rgb2yuv.h23
-rw-r--r--tensorflow/examples/android/jni/tensorflow_jni.cc253
-rw-r--r--tensorflow/examples/android/jni/tensorflow_jni.h36
-rw-r--r--tensorflow/examples/android/jni/yuv2rgb.cc161
-rw-r--r--tensorflow/examples/android/jni/yuv2rgb.h37
-rw-r--r--tensorflow/examples/android/res/drawable-hdpi/ic_action_info.pngbin0 -> 1025 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-hdpi/ic_launcher.pngbin0 -> 4312 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-hdpi/tile.9.pngbin0 -> 196 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-mdpi/ic_action_info.pngbin0 -> 665 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-mdpi/ic_launcher.pngbin0 -> 2265 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-xhdpi/ic_action_info.pngbin0 -> 1355 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-xhdpi/ic_launcher.pngbin0 -> 6683 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-xxhdpi/ic_action_info.pngbin0 -> 2265 bytes
-rw-r--r--tensorflow/examples/android/res/drawable-xxhdpi/ic_launcher.pngbin0 -> 12746 bytes
-rw-r--r--tensorflow/examples/android/res/layout-land/camera_connection_fragment.xml34
-rw-r--r--tensorflow/examples/android/res/layout/activity_camera.xml22
-rw-r--r--tensorflow/examples/android/res/layout/camera_connection_fragment.xml32
-rw-r--r--tensorflow/examples/android/res/values-sw600dp/template-dimens.xml24
-rw-r--r--tensorflow/examples/android/res/values-sw600dp/template-styles.xml25
-rw-r--r--tensorflow/examples/android/res/values-v11/styles.xml24
-rw-r--r--tensorflow/examples/android/res/values-v11/template-styles.xml22
-rw-r--r--tensorflow/examples/android/res/values-v14/styles.xml12
-rw-r--r--tensorflow/examples/android/res/values-v21/base-colors.xml21
-rw-r--r--tensorflow/examples/android/res/values-v21/base-template-styles.xml24
-rw-r--r--tensorflow/examples/android/res/values/attrs.xml14
-rw-r--r--tensorflow/examples/android/res/values/base-strings.xml20
-rw-r--r--tensorflow/examples/android/res/values/colors.xml19
-rw-r--r--tensorflow/examples/android/res/values/strings.xml20
-rw-r--r--tensorflow/examples/android/res/values/styles.xml18
-rw-r--r--tensorflow/examples/android/res/values/template-dimens.xml32
-rw-r--r--tensorflow/examples/android/res/values/template-styles.xml42
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/AutoFitTextureView.java74
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java34
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/CameraConnectionFragment.java593
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/Classifier.java87
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/RecognitionScoreView.java53
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/TensorflowClassifier.java62
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/TensorflowImageListener.java147
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/env/ImageUtils.java113
-rw-r--r--tensorflow/examples/android/src/org/tensorflow/demo/env/Logger.java176
-rwxr-xr-xtensorflow/g3doc/__init__.py0
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassEnv.md146
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md143
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md38
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassSession.md88
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassStatus.md107
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassTensor.md361
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassTensorBuffer.md52
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassTensorShape.md196
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassTensorShapeIter.md45
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md81
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassThread.md25
-rw-r--r--tensorflow/g3doc/api_docs/cc/ClassWritableFile.md52
-rw-r--r--tensorflow/g3doc/api_docs/cc/StructSessionOptions.md49
-rw-r--r--tensorflow/g3doc/api_docs/cc/StructState.md24
-rw-r--r--tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md24
-rw-r--r--tensorflow/g3doc/api_docs/cc/StructThreadOptions.md26
-rw-r--r--tensorflow/g3doc/api_docs/cc/index.md75
-rw-r--r--tensorflow/g3doc/api_docs/index.md15
-rw-r--r--tensorflow/g3doc/api_docs/python/array_ops.md1025
-rw-r--r--tensorflow/g3doc/api_docs/python/client.md638
-rw-r--r--tensorflow/g3doc/api_docs/python/constant_op.md565
-rw-r--r--tensorflow/g3doc/api_docs/python/control_flow_ops.md590
-rw-r--r--tensorflow/g3doc/api_docs/python/framework.md2079
-rw-r--r--tensorflow/g3doc/api_docs/python/image.md857
-rw-r--r--tensorflow/g3doc/api_docs/python/index.md352
-rw-r--r--tensorflow/g3doc/api_docs/python/io_ops.md1956
-rw-r--r--tensorflow/g3doc/api_docs/python/math_ops.md1883
-rw-r--r--tensorflow/g3doc/api_docs/python/nn.md1306
-rw-r--r--tensorflow/g3doc/api_docs/python/ops.md10
-rw-r--r--tensorflow/g3doc/api_docs/python/python_io.md104
-rw-r--r--tensorflow/g3doc/api_docs/python/sparse_ops.md502
-rw-r--r--tensorflow/g3doc/api_docs/python/state_ops.md1383
-rw-r--r--tensorflow/g3doc/api_docs/python/train.md1825
-rw-r--r--tensorflow/g3doc/extras/README.txt2
-rw-r--r--tensorflow/g3doc/extras/tensorflow-whitepaper2015.bib45
-rw-r--r--tensorflow/g3doc/get_started/basic_usage.md273
-rw-r--r--tensorflow/g3doc/get_started/blue_pill.jpg0
-rw-r--r--tensorflow/g3doc/get_started/index.md84
-rw-r--r--tensorflow/g3doc/get_started/os_setup.md261
-rw-r--r--tensorflow/g3doc/get_started/red_pill.jpg0
-rwxr-xr-xtensorflow/g3doc/how_tos/__init__.py0
-rwxr-xr-xtensorflow/g3doc/how_tos/adding_an_op/__init__.py0
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc31
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/fact_test.py16
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/index.md1015
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc64
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py18
-rw-r--r--tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc43
-rw-r--r--tensorflow/g3doc/how_tos/graph_viz/index.md205
-rw-r--r--tensorflow/g3doc/how_tos/index.md102
-rw-r--r--tensorflow/g3doc/how_tos/new_data_formats/index.md225
-rwxr-xr-xtensorflow/g3doc/how_tos/reading_data/__init__.py0
-rw-r--r--tensorflow/g3doc/how_tos/reading_data/convert_to_records.py87
-rw-r--r--tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py134
-rw-r--r--tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py146
-rw-r--r--tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py180
-rw-r--r--tensorflow/g3doc/how_tos/reading_data/index.md495
-rw-r--r--tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md102
-rw-r--r--tensorflow/g3doc/how_tos/threading_and_queues/index.md146
-rw-r--r--tensorflow/g3doc/how_tos/using_gpu/index.md174
-rw-r--r--tensorflow/g3doc/how_tos/variable_scope/index.md372
-rw-r--r--tensorflow/g3doc/how_tos/variables/index.md215
-rw-r--r--tensorflow/g3doc/images/getting_started.dot14
-rw-r--r--tensorflow/g3doc/index.md21
-rw-r--r--tensorflow/g3doc/resources/dims_types.md68
-rw-r--r--tensorflow/g3doc/resources/faq.md309
-rw-r--r--tensorflow/g3doc/resources/glossary.md149
-rw-r--r--tensorflow/g3doc/resources/index.md41
-rw-r--r--tensorflow/g3doc/resources/uses.md42
-rw-r--r--tensorflow/g3doc/tutorials/BUILD19
-rwxr-xr-xtensorflow/g3doc/tutorials/__init__.py0
-rw-r--r--tensorflow/g3doc/tutorials/deep_cnn/cifar_tensorboard.html21
-rw-r--r--tensorflow/g3doc/tutorials/deep_cnn/index.md462
-rw-r--r--tensorflow/g3doc/tutorials/index.md142
-rwxr-xr-xtensorflow/g3doc/tutorials/mandelbrot/index.md97
-rwxr-xr-xtensorflow/g3doc/tutorials/mandelbrot/output_8_0.jpebin0 -> 20185 bytes
-rwxr-xr-xtensorflow/g3doc/tutorials/mnist/__init__.py0
-rw-r--r--tensorflow/g3doc/tutorials/mnist/beginners/index.md420
-rw-r--r--tensorflow/g3doc/tutorials/mnist/download/index.md85
-rw-r--r--tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py219
-rw-r--r--tensorflow/g3doc/tutorials/mnist/input_data.py175
-rw-r--r--tensorflow/g3doc/tutorials/mnist/mnist.py148
-rw-r--r--tensorflow/g3doc/tutorials/mnist/mnist_softmax.py33
-rw-r--r--tensorflow/g3doc/tutorials/mnist/pros/index.md390
-rw-r--r--tensorflow/g3doc/tutorials/mnist/tf/index.md513
-rwxr-xr-xtensorflow/g3doc/tutorials/pdes/index.md129
-rwxr-xr-xtensorflow/g3doc/tutorials/pdes/output_11_0.jpebin0 -> 15819 bytes
-rwxr-xr-xtensorflow/g3doc/tutorials/pdes/output_8_0.jpebin0 -> 3952 bytes
-rw-r--r--tensorflow/g3doc/tutorials/recurrent/index.md209
-rw-r--r--tensorflow/g3doc/tutorials/seq2seq/index.md331
-rwxr-xr-xtensorflow/g3doc/tutorials/word2vec/__init__.py0
-rw-r--r--tensorflow/g3doc/tutorials/word2vec/index.md396
-rw-r--r--tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py219
-rw-r--r--tensorflow/models/embedding/BUILD74
-rwxr-xr-xtensorflow/models/embedding/__init__.py0
-rw-r--r--tensorflow/models/embedding/word2vec.py503
-rw-r--r--tensorflow/models/embedding/word2vec_kernels.cc287
-rw-r--r--tensorflow/models/embedding/word2vec_ops.cc56
-rw-r--r--tensorflow/models/embedding/word2vec_optimized.py405
-rw-r--r--tensorflow/models/image/alexnet/BUILD28
-rwxr-xr-xtensorflow/models/image/alexnet/__init__.py0
-rw-r--r--tensorflow/models/image/alexnet/alexnet_benchmark.py215
-rw-r--r--tensorflow/models/image/cifar10/BUILD79
-rw-r--r--tensorflow/models/image/cifar10/README.md10
-rwxr-xr-xtensorflow/models/image/cifar10/__init__.py0
-rw-r--r--tensorflow/models/image/cifar10/cifar10.py480
-rw-r--r--tensorflow/models/image/cifar10/cifar10_eval.py148
-rw-r--r--tensorflow/models/image/cifar10/cifar10_input.py65
-rw-r--r--tensorflow/models/image/cifar10/cifar10_input_test.py49
-rw-r--r--tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py265
-rw-r--r--tensorflow/models/image/cifar10/cifar10_train.py119
-rw-r--r--tensorflow/models/image/mnist/BUILD44
-rwxr-xr-xtensorflow/models/image/mnist/__init__.py0
-rw-r--r--tensorflow/models/image/mnist/convolutional.py270
-rwxr-xr-xtensorflow/opensource_only/__init__.py0
-rwxr-xr-xtensorflow/opensource_only/pip_package/__init__.py0
-rw-r--r--tensorflow/python/BUILD965
-rw-r--r--tensorflow/python/__init__.py42
-rwxr-xr-xtensorflow/python/client/__init__.py0
-rw-r--r--tensorflow/python/client/client_lib.py40
-rw-r--r--tensorflow/python/client/events_writer.i34
-rw-r--r--tensorflow/python/client/events_writer_test.py54
-rw-r--r--tensorflow/python/client/graph_util.py138
-rw-r--r--tensorflow/python/client/graph_util_test.py126
-rw-r--r--tensorflow/python/client/notebook.py104
-rw-r--r--tensorflow/python/client/session.py567
-rw-r--r--tensorflow/python/client/session_test.py555
-rw-r--r--tensorflow/python/client/tensorflow_server.i16
-rw-r--r--tensorflow/python/client/test_construction_fails_op.cc22
-rw-r--r--tensorflow/python/client/tf_session.i235
-rw-r--r--tensorflow/python/client/tf_session_helper.cc518
-rw-r--r--tensorflow/python/client/tf_session_helper.h56
-rwxr-xr-xtensorflow/python/framework/__init__.py0
-rw-r--r--tensorflow/python/framework/device.py220
-rw-r--r--tensorflow/python/framework/device_test.py122
-rw-r--r--tensorflow/python/framework/docs.py492
-rw-r--r--tensorflow/python/framework/errors.py410
-rw-r--r--tensorflow/python/framework/errors_test.py63
-rw-r--r--tensorflow/python/framework/framework_lib.py70
-rw-r--r--tensorflow/python/framework/gen_docs_combined.py114
-rwxr-xr-xtensorflow/python/framework/gen_docs_test.sh4
-rw-r--r--tensorflow/python/framework/importer.py303
-rw-r--r--tensorflow/python/framework/importer_test.py546
-rw-r--r--tensorflow/python/framework/op_def_registry.py23
-rw-r--r--tensorflow/python/framework/ops.py2985
-rw-r--r--tensorflow/python/framework/ops_test.py825
-rw-r--r--tensorflow/python/framework/python_op_gen.cc678
-rw-r--r--tensorflow/python/framework/python_op_gen.h17
-rw-r--r--tensorflow/python/framework/python_op_gen_main.cc30
-rw-r--r--tensorflow/python/framework/random_seed.py136
-rw-r--r--tensorflow/python/framework/registry.py64
-rw-r--r--tensorflow/python/framework/registry_test.py38
-rw-r--r--tensorflow/python/framework/tensor_shape.py743
-rw-r--r--tensorflow/python/framework/tensor_shape_test.py232
-rw-r--r--tensorflow/python/framework/tensor_util.py511
-rw-r--r--tensorflow/python/framework/tensor_util_test.py379
-rw-r--r--tensorflow/python/framework/test_kernel_label_op.cc47
-rw-r--r--tensorflow/python/framework/test_util.py437
-rw-r--r--tensorflow/python/framework/test_util_test.py128
-rw-r--r--tensorflow/python/framework/types.py418
-rw-r--r--tensorflow/python/framework/types_test.py174
-rwxr-xr-xtensorflow/python/kernel_tests/__init__.py0
-rw-r--r--tensorflow/python/kernel_tests/argmax_op_test.py61
-rw-r--r--tensorflow/python/kernel_tests/array_ops_test.py45
-rw-r--r--tensorflow/python/kernel_tests/attention_ops_test.py166
-rw-r--r--tensorflow/python/kernel_tests/batch_matmul_op_test.py195
-rw-r--r--tensorflow/python/kernel_tests/bcast_ops_test.py76
-rw-r--r--tensorflow/python/kernel_tests/bias_op_test.py93
-rw-r--r--tensorflow/python/kernel_tests/candidate_sampler_ops_test.py114
-rw-r--r--tensorflow/python/kernel_tests/cast_op_test.py165
-rw-r--r--tensorflow/python/kernel_tests/cholesky_op_test.py74
-rw-r--r--tensorflow/python/kernel_tests/clip_ops_test.py222
-rw-r--r--tensorflow/python/kernel_tests/concat_op_test.py276
-rw-r--r--tensorflow/python/kernel_tests/constant_op_test.py524
-rw-r--r--tensorflow/python/kernel_tests/control_flow_ops_py_test.py1260
-rw-r--r--tensorflow/python/kernel_tests/conv_ops_test.py1009
-rw-r--r--tensorflow/python/kernel_tests/cwise_ops_test.py1187
-rw-r--r--tensorflow/python/kernel_tests/decode_csv_op_test.py148
-rw-r--r--tensorflow/python/kernel_tests/decode_raw_op_test.py44
-rw-r--r--tensorflow/python/kernel_tests/dense_update_ops_no_tsan_test.py60
-rw-r--r--tensorflow/python/kernel_tests/dense_update_ops_test.py151
-rw-r--r--tensorflow/python/kernel_tests/determinant_op_test.py72
-rw-r--r--tensorflow/python/kernel_tests/diag_op_test.py80
-rw-r--r--tensorflow/python/kernel_tests/dynamic_partition_op_test.py99
-rw-r--r--tensorflow/python/kernel_tests/dynamic_stitch_op_test.py107
-rw-r--r--tensorflow/python/kernel_tests/edit_distance_op_test.py153
-rw-r--r--tensorflow/python/kernel_tests/embedding_ops_test.py422
-rw-r--r--tensorflow/python/kernel_tests/fifo_queue_test.py1043
-rw-r--r--tensorflow/python/kernel_tests/gather_op_test.py71
-rw-r--r--tensorflow/python/kernel_tests/gradient_checker.py251
-rw-r--r--tensorflow/python/kernel_tests/gradient_checker_test.py178
-rw-r--r--tensorflow/python/kernel_tests/identity_op_py_test.py47
-rw-r--r--tensorflow/python/kernel_tests/in_topk_op_test.py36
-rw-r--r--tensorflow/python/kernel_tests/init_ops_test.py252
-rw-r--r--tensorflow/python/kernel_tests/io_ops_test.py53
-rw-r--r--tensorflow/python/kernel_tests/linalg_grad_test.py49
-rw-r--r--tensorflow/python/kernel_tests/listdiff_op_test.py117
-rw-r--r--tensorflow/python/kernel_tests/logging_ops_test.py50
-rw-r--r--tensorflow/python/kernel_tests/lookup_table_op_test.py195
-rw-r--r--tensorflow/python/kernel_tests/lrn_op_test.py101
-rw-r--r--tensorflow/python/kernel_tests/matmul_op_test.py206
-rw-r--r--tensorflow/python/kernel_tests/matrix_inverse_op_test.py79
-rw-r--r--tensorflow/python/kernel_tests/numerics_test.py91
-rw-r--r--tensorflow/python/kernel_tests/pack_op_test.py47
-rw-r--r--tensorflow/python/kernel_tests/pad_op_test.py140
-rw-r--r--tensorflow/python/kernel_tests/parsing_ops_test.py414
-rw-r--r--tensorflow/python/kernel_tests/pooling_ops_test.py819
-rw-r--r--tensorflow/python/kernel_tests/random_ops_test.py242
-rw-r--r--tensorflow/python/kernel_tests/random_shuffle_queue_test.py1054
-rw-r--r--tensorflow/python/kernel_tests/reader_ops_test.py362
-rw-r--r--tensorflow/python/kernel_tests/reduction_ops_test.py533
-rw-r--r--tensorflow/python/kernel_tests/relu_op_test.py181
-rw-r--r--tensorflow/python/kernel_tests/reshape_op_test.py106
-rw-r--r--tensorflow/python/kernel_tests/reverse_sequence_op_test.py109
-rw-r--r--tensorflow/python/kernel_tests/save_restore_ops_test.py21
-rw-r--r--tensorflow/python/kernel_tests/scatter_ops_test.py49
-rw-r--r--tensorflow/python/kernel_tests/segment_reduction_ops_test.py269
-rw-r--r--tensorflow/python/kernel_tests/shape_ops_test.py389
-rw-r--r--tensorflow/python/kernel_tests/slice_op_test.py235
-rw-r--r--tensorflow/python/kernel_tests/softmax_op_test.py65
-rw-r--r--tensorflow/python/kernel_tests/softplus_op_test.py47
-rw-r--r--tensorflow/python/kernel_tests/sparse_concat_op_test.py260
-rw-r--r--tensorflow/python/kernel_tests/sparse_matmul_op_test.py82
-rw-r--r--tensorflow/python/kernel_tests/sparse_reorder_op_test.py56
-rw-r--r--tensorflow/python/kernel_tests/sparse_to_dense_op_py_test.py111
-rw-r--r--tensorflow/python/kernel_tests/sparsemask_op_test.py32
-rw-r--r--tensorflow/python/kernel_tests/split_op_test.py132
-rw-r--r--tensorflow/python/kernel_tests/string_to_hash_bucket_op_test.py34
-rw-r--r--tensorflow/python/kernel_tests/string_to_number_op_test.py66
-rw-r--r--tensorflow/python/kernel_tests/summary_image_op_test.py63
-rw-r--r--tensorflow/python/kernel_tests/summary_ops_test.py83
-rw-r--r--tensorflow/python/kernel_tests/topk_op_test.py52
-rw-r--r--tensorflow/python/kernel_tests/transpose_op_test.py176
-rw-r--r--tensorflow/python/kernel_tests/unique_op_test.py22
-rw-r--r--tensorflow/python/kernel_tests/unpack_op_test.py56
-rw-r--r--tensorflow/python/kernel_tests/variable_ops_test.py225
-rw-r--r--tensorflow/python/kernel_tests/variable_scope_test.py160
-rw-r--r--tensorflow/python/kernel_tests/variables_test.py242
-rw-r--r--tensorflow/python/kernel_tests/where_op_test.py43
-rw-r--r--tensorflow/python/kernel_tests/xent_op_test.py110
-rwxr-xr-xtensorflow/python/lib/__init__.py0
-rwxr-xr-xtensorflow/python/lib/core/__init__.py0
-rw-r--r--tensorflow/python/lib/core/pywrap_status_test.py35
-rw-r--r--tensorflow/python/lib/core/status.i116
-rw-r--r--tensorflow/python/lib/core/status_helper.i16
-rw-r--r--tensorflow/python/lib/core/strings.i94
-rwxr-xr-xtensorflow/python/lib/io/__init__.py0
-rw-r--r--tensorflow/python/lib/io/py_record_reader.cc49
-rw-r--r--tensorflow/python/lib/io/py_record_reader.h50
-rw-r--r--tensorflow/python/lib/io/py_record_reader.i39
-rw-r--r--tensorflow/python/lib/io/py_record_writer.cc44
-rw-r--r--tensorflow/python/lib/io/py_record_writer.h38
-rw-r--r--tensorflow/python/lib/io/py_record_writer.i38
-rw-r--r--tensorflow/python/lib/io/python_io.py29
-rw-r--r--tensorflow/python/lib/io/tf_record.py68
-rwxr-xr-xtensorflow/python/ops/__init__.py0
-rw-r--r--tensorflow/python/ops/array_grad.py187
-rw-r--r--tensorflow/python/ops/array_ops.py1207
-rw-r--r--tensorflow/python/ops/attention_ops.py34
-rw-r--r--tensorflow/python/ops/candidate_sampling_ops.py365
-rw-r--r--tensorflow/python/ops/clip_ops.py234
-rw-r--r--tensorflow/python/ops/common_shapes.py371
-rw-r--r--tensorflow/python/ops/constant_op.py189
-rw-r--r--tensorflow/python/ops/control_flow_grad.py100
-rw-r--r--tensorflow/python/ops/control_flow_ops.py1561
-rw-r--r--tensorflow/python/ops/control_flow_ops_test.py88
-rw-r--r--tensorflow/python/ops/data_flow_grad.py37
-rw-r--r--tensorflow/python/ops/data_flow_ops.py680
-rw-r--r--tensorflow/python/ops/embedding_ops.py197
-rw-r--r--tensorflow/python/ops/gradients.py661
-rw-r--r--tensorflow/python/ops/gradients_test.py337
-rw-r--r--tensorflow/python/ops/image_ops.py786
-rw-r--r--tensorflow/python/ops/image_ops_test.py771
-rw-r--r--tensorflow/python/ops/init_ops.py181
-rw-r--r--tensorflow/python/ops/io_ops.py541
-rw-r--r--tensorflow/python/ops/linalg_grad.py25
-rw-r--r--tensorflow/python/ops/linalg_ops.py62
-rw-r--r--tensorflow/python/ops/logging_ops.py58
-rw-r--r--tensorflow/python/ops/math_grad.py506
-rw-r--r--tensorflow/python/ops/math_ops.py1201
-rw-r--r--tensorflow/python/ops/math_ops_test.py68
-rw-r--r--tensorflow/python/ops/nn.py816
-rw-r--r--tensorflow/python/ops/nn_grad.py229
-rw-r--r--tensorflow/python/ops/nn_ops.py365
-rw-r--r--tensorflow/python/ops/nn_test.py882
-rw-r--r--tensorflow/python/ops/numerics.py50
-rw-r--r--tensorflow/python/ops/op_def_library.py640
-rw-r--r--tensorflow/python/ops/op_def_library_test.py1402
-rw-r--r--tensorflow/python/ops/parsing_ops.py390
-rw-r--r--tensorflow/python/ops/random_ops.py181
-rw-r--r--tensorflow/python/ops/sparse_grad.py12
-rw-r--r--tensorflow/python/ops/sparse_ops.py458
-rw-r--r--tensorflow/python/ops/sparse_ops_test.py212
-rw-r--r--tensorflow/python/ops/standard_ops.py41
-rw-r--r--tensorflow/python/ops/state_grad.py18
-rw-r--r--tensorflow/python/ops/state_ops.py189
-rw-r--r--tensorflow/python/ops/string_ops.py12
-rw-r--r--tensorflow/python/ops/summary_ops.py177
-rw-r--r--tensorflow/python/ops/variable_scope.py333
-rw-r--r--tensorflow/python/ops/variables.py569
-rw-r--r--tensorflow/python/platform/__init__.py6
-rw-r--r--tensorflow/python/platform/app.py13
-rw-r--r--tensorflow/python/platform/base.i176
-rw-r--r--tensorflow/python/platform/control_imports.py13
-rwxr-xr-xtensorflow/python/platform/default/__init__.py0
-rw-r--r--tensorflow/python/platform/default/_app.py11
-rw-r--r--tensorflow/python/platform/default/_flags.py92
-rw-r--r--tensorflow/python/platform/default/_gfile.py404
-rw-r--r--tensorflow/python/platform/default/_googletest.py68
-rw-r--r--tensorflow/python/platform/default/_init.py1
-rw-r--r--tensorflow/python/platform/default/_logging.py182
-rw-r--r--tensorflow/python/platform/default/_parameterized.py2
-rw-r--r--tensorflow/python/platform/default/_resource_loader.py26
-rw-r--r--tensorflow/python/platform/default/_status_bar.py5
-rw-r--r--tensorflow/python/platform/default/flags_test.py53
-rw-r--r--tensorflow/python/platform/default/gfile_test.py147
-rw-r--r--tensorflow/python/platform/default/logging_test.py13
-rw-r--r--tensorflow/python/platform/flags.py10
-rw-r--r--tensorflow/python/platform/gfile.py10
-rw-r--r--tensorflow/python/platform/googletest.py10
-rw-r--r--tensorflow/python/platform/logging.py10
-rw-r--r--tensorflow/python/platform/numpy.i3085
-rw-r--r--tensorflow/python/platform/parameterized.py10
-rw-r--r--tensorflow/python/platform/resource_loader.py10
-rw-r--r--tensorflow/python/platform/status_bar.py10
-rw-r--r--tensorflow/python/platform/test.py6
-rw-r--r--tensorflow/python/summary/README.md15
-rwxr-xr-xtensorflow/python/summary/__init__.py0
-rw-r--r--tensorflow/python/summary/event_accumulator.py433
-rw-r--r--tensorflow/python/summary/event_accumulator_test.py422
-rw-r--r--tensorflow/python/summary/event_multiplexer.py346
-rw-r--r--tensorflow/python/summary/event_multiplexer_test.py244
-rwxr-xr-xtensorflow/python/summary/impl/__init__.py0
-rw-r--r--tensorflow/python/summary/impl/directory_watcher.py115
-rw-r--r--tensorflow/python/summary/impl/directory_watcher_test.py102
-rw-r--r--tensorflow/python/summary/impl/event_file_loader.py49
-rw-r--r--tensorflow/python/summary/impl/event_file_loader_test.py59
-rw-r--r--tensorflow/python/summary/impl/reservoir.py164
-rw-r--r--tensorflow/python/summary/impl/reservoir_test.py178
-rw-r--r--tensorflow/python/tensorflow.i14
-rwxr-xr-xtensorflow/python/training/__init__.py0
-rw-r--r--tensorflow/python/training/adagrad.py58
-rw-r--r--tensorflow/python/training/adagrad_test.py144
-rw-r--r--tensorflow/python/training/adam.py142
-rw-r--r--tensorflow/python/training/adam_test.py174
-rw-r--r--tensorflow/python/training/checkpoint_state.proto18
-rw-r--r--tensorflow/python/training/coordinator.py186
-rw-r--r--tensorflow/python/training/coordinator_test.py98
-rw-r--r--tensorflow/python/training/ftrl.py283
-rw-r--r--tensorflow/python/training/ftrl_test.py234
-rw-r--r--tensorflow/python/training/gradient_descent.py44
-rw-r--r--tensorflow/python/training/gradient_descent_test.py105
-rw-r--r--tensorflow/python/training/input.py501
-rw-r--r--tensorflow/python/training/input_test.py477
-rw-r--r--tensorflow/python/training/learning_rate_decay.py65
-rw-r--r--tensorflow/python/training/learning_rate_decay_test.py60
-rw-r--r--tensorflow/python/training/momentum.py51
-rw-r--r--tensorflow/python/training/momentum_test.py258
-rw-r--r--tensorflow/python/training/moving_averages.py247
-rw-r--r--tensorflow/python/training/moving_averages_test.py130
-rw-r--r--tensorflow/python/training/optimizer.py426
-rw-r--r--tensorflow/python/training/queue_runner.py233
-rw-r--r--tensorflow/python/training/queue_runner_test.py186
-rw-r--r--tensorflow/python/training/rmsprop.py81
-rw-r--r--tensorflow/python/training/rmsprop_test.py158
-rw-r--r--tensorflow/python/training/saver.proto30
-rw-r--r--tensorflow/python/training/saver.py887
-rw-r--r--tensorflow/python/training/saver_test.py563
-rw-r--r--tensorflow/python/training/summary_io.py226
-rw-r--r--tensorflow/python/training/summary_writer_test.py151
-rw-r--r--tensorflow/python/training/training.py138
-rw-r--r--tensorflow/python/training/training_ops.py115
-rw-r--r--tensorflow/python/training/training_ops_test.py159
-rw-r--r--tensorflow/python/training/training_util.py57
-rwxr-xr-xtensorflow/python/user_ops/__init__.py0
-rw-r--r--tensorflow/python/user_ops/user_ops.py10
-rwxr-xr-xtensorflow/python/util/__init__.py0
-rw-r--r--tensorflow/python/util/port.i11
-rwxr-xr-xtensorflow/python/util/protobuf/__init__.py0
-rw-r--r--tensorflow/python/util/protobuf/compare.py384
-rw-r--r--tensorflow/python/util/protobuf/compare_test.proto49
-rw-r--r--tensorflow/python/util/protobuf/compare_test.py652
-rw-r--r--tensorflow/stream_executor/BUILD39
-rw-r--r--tensorflow/stream_executor/blas.cc57
-rw-r--r--tensorflow/stream_executor/blas.h1780
-rw-r--r--tensorflow/stream_executor/cuda/cuda_activation.cc30
-rw-r--r--tensorflow/stream_executor/cuda/cuda_activation.h53
-rw-r--r--tensorflow/stream_executor/cuda/cuda_blas.cc2184
-rw-r--r--tensorflow/stream_executor/cuda/cuda_blas.h100
-rw-r--r--tensorflow/stream_executor/cuda/cuda_diagnostics.cc260
-rw-r--r--tensorflow/stream_executor/cuda/cuda_diagnostics.h85
-rw-r--r--tensorflow/stream_executor/cuda/cuda_dnn.cc1074
-rw-r--r--tensorflow/stream_executor/cuda/cuda_dnn.h206
-rw-r--r--tensorflow/stream_executor/cuda/cuda_driver.cc1608
-rw-r--r--tensorflow/stream_executor/cuda/cuda_driver.h460
-rw-r--r--tensorflow/stream_executor/cuda/cuda_event.cc56
-rw-r--r--tensorflow/stream_executor/cuda/cuda_event.h49
-rw-r--r--tensorflow/stream_executor/cuda/cuda_fft.cc327
-rw-r--r--tensorflow/stream_executor/cuda/cuda_fft.h95
-rw-r--r--tensorflow/stream_executor/cuda/cuda_gpu_executor.cc1082
-rw-r--r--tensorflow/stream_executor/cuda/cuda_gpu_executor.h270
-rw-r--r--tensorflow/stream_executor/cuda/cuda_helpers.h95
-rw-r--r--tensorflow/stream_executor/cuda/cuda_kernel.h115
-rw-r--r--tensorflow/stream_executor/cuda/cuda_platform.cc172
-rw-r--r--tensorflow/stream_executor/cuda/cuda_platform.h98
-rw-r--r--tensorflow/stream_executor/cuda/cuda_rng.cc317
-rw-r--r--tensorflow/stream_executor/cuda/cuda_rng.h89
-rw-r--r--tensorflow/stream_executor/cuda/cuda_stream.cc51
-rw-r--r--tensorflow/stream_executor/cuda/cuda_stream.h74
-rw-r--r--tensorflow/stream_executor/cuda/cuda_timer.cc73
-rw-r--r--tensorflow/stream_executor/cuda/cuda_timer.h69
-rw-r--r--tensorflow/stream_executor/cuda/multi_op_activation.h16
-rw-r--r--tensorflow/stream_executor/device_description.cc221
-rw-r--r--tensorflow/stream_executor/device_description.h370
-rw-r--r--tensorflow/stream_executor/device_memory.h284
-rw-r--r--tensorflow/stream_executor/device_options.h70
-rw-r--r--tensorflow/stream_executor/dnn.cc297
-rw-r--r--tensorflow/stream_executor/dnn.h895
-rw-r--r--tensorflow/stream_executor/dso_loader.cc208
-rw-r--r--tensorflow/stream_executor/dso_loader.h107
-rw-r--r--tensorflow/stream_executor/event.cc48
-rw-r--r--tensorflow/stream_executor/event.h63
-rw-r--r--tensorflow/stream_executor/executor_cache.cc43
-rw-r--r--tensorflow/stream_executor/executor_cache.h45
-rw-r--r--tensorflow/stream_executor/fft.h187
-rw-r--r--tensorflow/stream_executor/gcuda.cc87
-rw-r--r--tensorflow/stream_executor/gcuda.h415
-rw-r--r--tensorflow/stream_executor/gpu_launch_dim.h8
-rw-r--r--tensorflow/stream_executor/kernel.cc95
-rw-r--r--tensorflow/stream_executor/kernel.h499
-rw-r--r--tensorflow/stream_executor/kernel_cache_config.h29
-rw-r--r--tensorflow/stream_executor/kernel_spec.cc236
-rw-r--r--tensorflow/stream_executor/kernel_spec.h365
-rw-r--r--tensorflow/stream_executor/launch_dim.h65
-rw-r--r--tensorflow/stream_executor/lib/array_slice.h17
-rw-r--r--tensorflow/stream_executor/lib/casts.h85
-rw-r--r--tensorflow/stream_executor/lib/demangle.cc38
-rw-r--r--tensorflow/stream_executor/lib/demangle.h16
-rw-r--r--tensorflow/stream_executor/lib/env.h29
-rw-r--r--tensorflow/stream_executor/lib/error.h16
-rw-r--r--tensorflow/stream_executor/lib/human_readable.h58
-rw-r--r--tensorflow/stream_executor/lib/initialize.h35
-rw-r--r--tensorflow/stream_executor/lib/inlined_vector.h16
-rw-r--r--tensorflow/stream_executor/lib/mathutil.h88
-rw-r--r--tensorflow/stream_executor/lib/notification.h16
-rw-r--r--tensorflow/stream_executor/lib/numbers.cc27
-rw-r--r--tensorflow/stream_executor/lib/numbers.h19
-rw-r--r--tensorflow/stream_executor/lib/path.cc50
-rw-r--r--tensorflow/stream_executor/lib/path.h44
-rw-r--r--tensorflow/stream_executor/lib/process_state.cc37
-rw-r--r--tensorflow/stream_executor/lib/process_state.h17
-rw-r--r--tensorflow/stream_executor/lib/ptr_util.h48
-rw-r--r--tensorflow/stream_executor/lib/stacktrace.h18
-rw-r--r--tensorflow/stream_executor/lib/static_threadlocal.h30
-rw-r--r--tensorflow/stream_executor/lib/status.h23
-rw-r--r--tensorflow/stream_executor/lib/status_macros.h54
-rw-r--r--tensorflow/stream_executor/lib/statusor.h234
-rw-r--r--tensorflow/stream_executor/lib/str_util.h30
-rw-r--r--tensorflow/stream_executor/lib/strcat.h17
-rw-r--r--tensorflow/stream_executor/lib/stringpiece.h17
-rw-r--r--tensorflow/stream_executor/lib/stringprintf.h18
-rw-r--r--tensorflow/stream_executor/lib/thread_options.h16
-rw-r--r--tensorflow/stream_executor/lib/threadpool.h19
-rw-r--r--tensorflow/stream_executor/machine_manager.cc276
-rw-r--r--tensorflow/stream_executor/machine_manager.h197
-rw-r--r--tensorflow/stream_executor/multi_platform_manager.cc66
-rw-r--r--tensorflow/stream_executor/multi_platform_manager.h144
-rw-r--r--tensorflow/stream_executor/platform.cc115
-rw-r--r--tensorflow/stream_executor/platform.h185
-rw-r--r--tensorflow/stream_executor/platform/default/mutex.h60
-rw-r--r--tensorflow/stream_executor/platform/logging.h21
-rw-r--r--tensorflow/stream_executor/platform/mutex.h12
-rw-r--r--tensorflow/stream_executor/platform/port.h40
-rw-r--r--tensorflow/stream_executor/platform/thread_annotations.h6
-rw-r--r--tensorflow/stream_executor/plugin.cc40
-rw-r--r--tensorflow/stream_executor/plugin.h74
-rw-r--r--tensorflow/stream_executor/plugin_registry.cc228
-rw-r--r--tensorflow/stream_executor/plugin_registry.h155
-rw-r--r--tensorflow/stream_executor/rng.cc36
-rw-r--r--tensorflow/stream_executor/rng.h80
-rw-r--r--tensorflow/stream_executor/shared_memory_config.h21
-rw-r--r--tensorflow/stream_executor/stream.cc3329
-rw-r--r--tensorflow/stream_executor/stream.h1258
-rw-r--r--tensorflow/stream_executor/stream_executor.h50
-rw-r--r--tensorflow/stream_executor/stream_executor_internal.cc65
-rw-r--r--tensorflow/stream_executor/stream_executor_internal.h364
-rw-r--r--tensorflow/stream_executor/stream_executor_pimpl.cc642
-rw-r--r--tensorflow/stream_executor/stream_executor_pimpl.h725
-rw-r--r--tensorflow/stream_executor/temporary_device_memory.cc53
-rw-r--r--tensorflow/stream_executor/temporary_device_memory.h123
-rw-r--r--tensorflow/stream_executor/temporary_memory_manager.cc113
-rw-r--r--tensorflow/stream_executor/temporary_memory_manager.h138
-rw-r--r--tensorflow/stream_executor/timer.cc41
-rw-r--r--tensorflow/stream_executor/timer.h60
-rw-r--r--tensorflow/stream_executor/trace_listener.h59
-rw-r--r--tensorflow/tensorboard/BUILD49
-rw-r--r--tensorflow/tensorboard/README.md66
-rwxr-xr-xtensorflow/tensorboard/__init__.py0
-rw-r--r--tensorflow/tensorboard/app/demo/data/cos.json1
-rw-r--r--tensorflow/tensorboard/app/demo/data/cubic.json1
-rw-r--r--tensorflow/tensorboard/app/demo/data/linear.json1
-rw-r--r--tensorflow/tensorboard/app/demo/data/poly5-graph.pbtxt14
-rw-r--r--tensorflow/tensorboard/app/demo/data/poly5.json1
-rw-r--r--tensorflow/tensorboard/app/demo/data/runs.json542
-rw-r--r--tensorflow/tensorboard/app/demo/data/sin-graph.pbtxt14
-rw-r--r--tensorflow/tensorboard/app/demo/data/sin.json1
-rw-r--r--tensorflow/tensorboard/app/demo/data/sq.json1
-rw-r--r--tensorflow/tensorboard/app/demo/index.html25
-rw-r--r--tensorflow/tensorboard/app/index.html13
-rw-r--r--tensorflow/tensorboard/app/tf-tensorboard-demo.html72
-rw-r--r--tensorflow/tensorboard/app/tf-tensorboard.html135
-rw-r--r--tensorflow/tensorboard/bower.json50
-rw-r--r--tensorflow/tensorboard/bower/BUILD76
-rw-r--r--tensorflow/tensorboard/components/hydrogen-join/demo/index.html118
-rw-r--r--tensorflow/tensorboard/components/hydrogen-join/hydrogen-join.html118
-rw-r--r--tensorflow/tensorboard/components/hydrogen-set/demo/index.html106
-rw-r--r--tensorflow/tensorboard/components/hydrogen-set/hydrogen-set.html174
-rw-r--r--tensorflow/tensorboard/components/imports/d3.html1
-rw-r--r--tensorflow/tensorboard/components/imports/lodash.html1
-rw-r--r--tensorflow/tensorboard/components/imports/plottable.html3
-rw-r--r--tensorflow/tensorboard/components/tf-categorizer/categorizer.ts133
-rw-r--r--tensorflow/tensorboard/components/tf-categorizer/demo/index.html97
-rw-r--r--tensorflow/tensorboard/components/tf-categorizer/index.html18
-rw-r--r--tensorflow/tensorboard/components/tf-categorizer/test/categorizerTest.ts139
-rw-r--r--tensorflow/tensorboard/components/tf-categorizer/tf-categorizer.html103
-rw-r--r--tensorflow/tensorboard/components/tf-collapsable-pane/demo/index.html17
-rw-r--r--tensorflow/tensorboard/components/tf-collapsable-pane/index.html18
-rw-r--r--tensorflow/tensorboard/components/tf-collapsable-pane/tf-collapsable-pane.html90
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/dashboard-style.html97
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/run-color-style.html62
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/scrollbar-style.html28
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/tensorboard-color.html11
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/tf-dashboard-layout.html50
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/tf-downloader.html85
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/tf-run-generator.html97
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/tf-url-generator.html50
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/urlGenerator.ts33
-rw-r--r--tensorflow/tensorboard/components/tf-dashboard-common/warning-style.html10
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/dataCoordinator.ts57
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/dataset.ts41
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d1.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d2.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d3.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d4.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d1.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d2.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d3.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d4.json1
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/data/runs.json22
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/demo/index.html17
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/dragZoomInteraction.ts150
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.html101
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.ts327
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-color-scale.html69
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-data-coordinator.html29
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-event-dashboard.html208
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-run-selector.html104
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-tooltip-coordinator.html48
-rw-r--r--tensorflow/tensorboard/components/tf-event-dashboard/tf-x-type-selector.html75
-rw-r--r--tensorflow/tensorboard/components/tf-graph-board/tf-graph-board.html152
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/colors.ts133
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/common.ts236
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/graph.ts889
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/hierarchy.ts715
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/layout.ts628
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/parser.ts189
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/render.ts1360
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/scene/annotation.ts223
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/scene/edge.ts177
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/scene/minimap.ts269
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/scene/node.ts525
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/scene/scene.ts409
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/lib/template.ts282
-rw-r--r--tensorflow/tensorboard/components/tf-graph-common/tf-graph-common.html16
-rw-r--r--tensorflow/tensorboard/components/tf-graph-dashboard/tf-graph-dashboard.html118
-rw-r--r--tensorflow/tensorboard/components/tf-graph-info/tf-graph-info.html65
-rw-r--r--tensorflow/tensorboard/components/tf-graph-info/tf-node-info.html345
-rw-r--r--tensorflow/tensorboard/components/tf-graph-info/tf-node-list-item.html91
-rw-r--r--tensorflow/tensorboard/components/tf-graph-loader/tf-graph-loader.html172
-rw-r--r--tensorflow/tensorboard/components/tf-graph/demo/tf-graph-demo.html185
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-controls.html487
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-icon.html164
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-minimap.html69
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-params.html113
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-scene.html475
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph-style.html339
-rw-r--r--tensorflow/tensorboard/components/tf-graph/tf-graph.html221
-rw-r--r--tensorflow/tensorboard/components/tf-histogram-dashboard/tf-histogram-dashboard.html210
-rw-r--r--tensorflow/tensorboard/components/tf-image-dashboard/demo/image-loader-demo.html73
-rw-r--r--tensorflow/tensorboard/components/tf-image-dashboard/demo/index.html39
-rw-r--r--tensorflow/tensorboard/components/tf-image-dashboard/tf-image-dashboard.html90
-rw-r--r--tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html166
-rw-r--r--tensorflow/tensorboard/components/tf-image-dashboard/tf-image-loader.html64
-rw-r--r--tensorflow/tensorboard/components/tf-multi-checkbox/demo/index.html162
-rw-r--r--tensorflow/tensorboard/components/tf-multi-checkbox/tf-multi-checkbox.html228
-rw-r--r--tensorflow/tensorboard/components/tf-regex-group/demo/index.html32
-rw-r--r--tensorflow/tensorboard/components/tf-regex-group/index.html18
-rw-r--r--tensorflow/tensorboard/components/tf-regex-group/tf-regex-group.html151
-rw-r--r--tensorflow/tensorboard/dist/index.html43
-rw-r--r--tensorflow/tensorboard/dist/tf-tensorboard.html10484
-rw-r--r--tensorflow/tensorboard/float_wrapper.py30
-rw-r--r--tensorflow/tensorboard/float_wrapper_test.py38
-rw-r--r--tensorflow/tensorboard/gulpfile.js170
-rw-r--r--tensorflow/tensorboard/http_api.md210
-rw-r--r--tensorflow/tensorboard/lib/css/global.css6
-rw-r--r--tensorflow/tensorboard/lib/svg/summary-icon.svg3
-rw-r--r--tensorflow/tensorboard/package.json34
-rw-r--r--tensorflow/tensorboard/tensorboard.py139
-rw-r--r--tensorflow/tensorboard/tensorboard_handler.py379
-rw-r--r--tensorflow/tensorboard/tests.html31
-rw-r--r--tensorflow/tensorboard/tfgraph-demo-index.html38
-rw-r--r--tensorflow/tensorboard/tsconfig.json8
-rw-r--r--tensorflow/tensorboard/tsd.json30
-rw-r--r--tensorflow/tensorboard/tslint.json66
-rw-r--r--tensorflow/tensorflow.bzl340
-rwxr-xr-xtensorflow/tools/__init__.py0
-rw-r--r--tensorflow/tools/docker/BUILD26
-rw-r--r--tensorflow/tools/docker/Dockerfile100
-rw-r--r--tensorflow/tools/docker/Dockerfile.cpu68
-rw-r--r--tensorflow/tools/docker/Dockerfile.lite56
-rw-r--r--tensorflow/tools/docker/LICENSE13
-rw-r--r--tensorflow/tools/docker/README.md63
-rwxr-xr-xtensorflow/tools/docker/__init__.py0
-rw-r--r--tensorflow/tools/docker/jupyter_notebook_config.py4
-rw-r--r--tensorflow/tools/docker/notebooks/1_hello_tensorflow.ipynb742
-rw-r--r--tensorflow/tools/docker/notebooks/2_getting_started.ipynb844
-rw-r--r--tensorflow/tools/docker/notebooks/3_mnist_from_scratch.ipynb1985
-rw-r--r--tensorflow/tools/docker/notebooks/LICENSE13
-rwxr-xr-xtensorflow/tools/docker/run_jupyter.sh3
-rw-r--r--tensorflow/tools/docker/simple_console.py14
-rw-r--r--tensorflow/tools/pip_package/BUILD27
-rw-r--r--tensorflow/tools/pip_package/MANIFEST.in3
-rw-r--r--tensorflow/tools/pip_package/README1
-rwxr-xr-xtensorflow/tools/pip_package/build_pip_package.sh38
-rw-r--r--tensorflow/tools/pip_package/setup.py79
-rw-r--r--tensorflow/tools/pip_package/simple_console.py14
-rwxr-xr-xtensorflow/tools/swig/swig.sh2
-rw-r--r--third_party/eigen3/BUILD10
-rw-r--r--third_party/eigen3/Eigen/Array11
-rw-r--r--third_party/eigen3/Eigen/Cholesky34
-rw-r--r--third_party/eigen3/Eigen/CholmodSupport45
-rw-r--r--third_party/eigen3/Eigen/Core481
-rw-r--r--third_party/eigen3/Eigen/Dense7
-rw-r--r--third_party/eigen3/Eigen/Eigen2Support82
-rw-r--r--third_party/eigen3/Eigen/Eigenvalues48
-rw-r--r--third_party/eigen3/Eigen/Geometry65
-rw-r--r--third_party/eigen3/Eigen/Householder23
-rw-r--r--third_party/eigen3/Eigen/Jacobi26
-rw-r--r--third_party/eigen3/Eigen/LU43
-rw-r--r--third_party/eigen3/Eigen/LeastSquares32
-rw-r--r--third_party/eigen3/Eigen/OrderingMethods66
-rw-r--r--third_party/eigen3/Eigen/PaStiXSupport46
-rw-r--r--third_party/eigen3/Eigen/PardisoSupport30
-rw-r--r--third_party/eigen3/Eigen/QR47
-rw-r--r--third_party/eigen3/Eigen/QtAlignedMalloc29
-rw-r--r--third_party/eigen3/Eigen/SPQRSupport29
-rw-r--r--third_party/eigen3/Eigen/SVD37
-rw-r--r--third_party/eigen3/Eigen/SparseCore64
-rw-r--r--third_party/eigen3/Eigen/SparseQR33
-rw-r--r--third_party/eigen3/Eigen/StdDeque27
-rw-r--r--third_party/eigen3/Eigen/StdList26
-rw-r--r--third_party/eigen3/Eigen/StdVector27
-rw-r--r--third_party/eigen3/Eigen/SuperLUSupport59
-rw-r--r--third_party/eigen3/Eigen/UmfPackSupport36
-rw-r--r--third_party/eigen3/Eigen/src/Cholesky/LDLT.h607
-rw-r--r--third_party/eigen3/Eigen/src/Cholesky/LLT.h494
-rw-r--r--third_party/eigen3/Eigen/src/Cholesky/LLT_MKL.h102
-rw-r--r--third_party/eigen3/Eigen/src/CholmodSupport/CholmodSupport.h607
-rw-r--r--third_party/eigen3/Eigen/src/Core/Array.h338
-rw-r--r--third_party/eigen3/Eigen/src/Core/ArrayBase.h238
-rw-r--r--third_party/eigen3/Eigen/src/Core/ArrayWrapper.h287
-rw-r--r--third_party/eigen3/Eigen/src/Core/Assign.h622
-rw-r--r--third_party/eigen3/Eigen/src/Core/AssignEvaluator.h842
-rw-r--r--third_party/eigen3/Eigen/src/Core/Assign_MKL.h225
-rw-r--r--third_party/eigen3/Eigen/src/Core/BandMatrix.h334
-rw-r--r--third_party/eigen3/Eigen/src/Core/Block.h432
-rw-r--r--third_party/eigen3/Eigen/src/Core/BooleanRedux.h154
-rw-r--r--third_party/eigen3/Eigen/src/Core/CommaInitializer.h161
-rw-r--r--third_party/eigen3/Eigen/src/Core/CoreEvaluators.h1121
-rw-r--r--third_party/eigen3/Eigen/src/Core/CoreIterators.h61
-rw-r--r--third_party/eigen3/Eigen/src/Core/CwiseBinaryOp.h238
-rw-r--r--third_party/eigen3/Eigen/src/Core/CwiseNullaryOp.h875
-rw-r--r--third_party/eigen3/Eigen/src/Core/CwiseUnaryOp.h135
-rw-r--r--third_party/eigen3/Eigen/src/Core/CwiseUnaryView.h139
-rw-r--r--third_party/eigen3/Eigen/src/Core/DenseBase.h561
-rw-r--r--third_party/eigen3/Eigen/src/Core/DenseCoeffsBase.h787
-rw-r--r--third_party/eigen3/Eigen/src/Core/DenseStorage.h480
-rw-r--r--third_party/eigen3/Eigen/src/Core/Diagonal.h258
-rw-r--r--third_party/eigen3/Eigen/src/Core/DiagonalMatrix.h346
-rw-r--r--third_party/eigen3/Eigen/src/Core/DiagonalProduct.h130
-rw-r--r--third_party/eigen3/Eigen/src/Core/Dot.h270
-rw-r--r--third_party/eigen3/Eigen/src/Core/EigenBase.h146
-rw-r--r--third_party/eigen3/Eigen/src/Core/Flagged.h140
-rw-r--r--third_party/eigen3/Eigen/src/Core/ForceAlignedAccess.h146
-rw-r--r--third_party/eigen3/Eigen/src/Core/Functors.h1020
-rw-r--r--third_party/eigen3/Eigen/src/Core/Fuzzy.h155
-rw-r--r--third_party/eigen3/Eigen/src/Core/GeneralProduct.h674
-rw-r--r--third_party/eigen3/Eigen/src/Core/GenericPacketMath.h584
-rw-r--r--third_party/eigen3/Eigen/src/Core/GlobalFunctions.h94
-rw-r--r--third_party/eigen3/Eigen/src/Core/IO.h257
-rw-r--r--third_party/eigen3/Eigen/src/Core/Map.h185
-rw-r--r--third_party/eigen3/Eigen/src/Core/MapBase.h257
-rw-r--r--third_party/eigen3/Eigen/src/Core/MathFunctions.h1089
-rw-r--r--third_party/eigen3/Eigen/src/Core/Matrix.h443
-rw-r--r--third_party/eigen3/Eigen/src/Core/MatrixBase.h614
-rw-r--r--third_party/eigen3/Eigen/src/Core/NestByValue.h112
-rw-r--r--third_party/eigen3/Eigen/src/Core/NoAlias.h141
-rw-r--r--third_party/eigen3/Eigen/src/Core/NumTraits.h177
-rw-r--r--third_party/eigen3/Eigen/src/Core/PermutationMatrix.h689
-rw-r--r--third_party/eigen3/Eigen/src/Core/PlainObjectBase.h895
-rw-r--r--third_party/eigen3/Eigen/src/Core/Product.h107
-rw-r--r--third_party/eigen3/Eigen/src/Core/ProductBase.h280
-rw-r--r--third_party/eigen3/Eigen/src/Core/ProductEvaluators.h411
-rw-r--r--third_party/eigen3/Eigen/src/Core/Random.h193
-rw-r--r--third_party/eigen3/Eigen/src/Core/Redux.h417
-rw-r--r--third_party/eigen3/Eigen/src/Core/Ref.h260
-rw-r--r--third_party/eigen3/Eigen/src/Core/Replicate.h177
-rw-r--r--third_party/eigen3/Eigen/src/Core/ReturnByValue.h89
-rw-r--r--third_party/eigen3/Eigen/src/Core/Reverse.h224
-rw-r--r--third_party/eigen3/Eigen/src/Core/Select.h162
-rw-r--r--third_party/eigen3/Eigen/src/Core/SelfAdjointView.h338
-rw-r--r--third_party/eigen3/Eigen/src/Core/SelfCwiseBinaryOp.h226
-rw-r--r--third_party/eigen3/Eigen/src/Core/SolveTriangular.h260
-rw-r--r--third_party/eigen3/Eigen/src/Core/StableNorm.h200
-rw-r--r--third_party/eigen3/Eigen/src/Core/Stride.h113
-rw-r--r--third_party/eigen3/Eigen/src/Core/Swap.h140
-rw-r--r--third_party/eigen3/Eigen/src/Core/Transpose.h428
-rw-r--r--third_party/eigen3/Eigen/src/Core/Transpositions.h436
-rw-r--r--third_party/eigen3/Eigen/src/Core/TriangularMatrix.h900
-rw-r--r--third_party/eigen3/Eigen/src/Core/VectorBlock.h97
-rw-r--r--third_party/eigen3/Eigen/src/Core/VectorwiseOp.h651
-rw-r--r--third_party/eigen3/Eigen/src/Core/Visitor.h237
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AVX/Complex.h463
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AVX/MathFunctions.h495
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AVX/PacketMath.h650
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AVX/TypeCasting.h51
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AltiVec/Complex.h439
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AltiVec/MathFunctions.h299
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/AltiVec/PacketMath.h943
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/CUDA/MathFunctions.h75
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/CUDA/PacketMath.h336
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/Default/Settings.h49
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/NEON/Complex.h467
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/NEON/MathFunctions.h91
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/NEON/PacketMath.h745
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/SSE/Complex.h486
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/SSE/MathFunctions.h529
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/SSE/PacketMath.h883
-rw-r--r--third_party/eigen3/Eigen/src/Core/arch/SSE/TypeCasting.h77
-rw-r--r--third_party/eigen3/Eigen/src/Core/functors/AssignmentFunctors.h167
-rw-r--r--third_party/eigen3/Eigen/src/Core/functors/BinaryFunctors.h498
-rw-r--r--third_party/eigen3/Eigen/src/Core/functors/NullaryFunctors.h158
-rw-r--r--third_party/eigen3/Eigen/src/Core/functors/StlFunctors.h129
-rw-r--r--third_party/eigen3/Eigen/src/Core/functors/UnaryFunctors.h493
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/CoeffBasedProduct.h454
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralBlockPanelKernel.h2197
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix.h465
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h285
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h146
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h118
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector.h618
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector_MKL.h131
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/Parallelizer.h158
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix.h523
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h295
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector.h281
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h114
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointProduct.h123
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/SelfadjointRank2Update.h93
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix.h434
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h309
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector.h354
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector_MKL.h247
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix.h331
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix_MKL.h155
-rw-r--r--third_party/eigen3/Eigen/src/Core/products/TriangularSolverVector.h145
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/BlasUtil.h237
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/Constants.h453
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/DisableStupidWarnings.h40
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/ForwardDeclarations.h301
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/MKL_support.h126
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/Macros.h740
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/MatrixMapper.h155
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/Memory.h984
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/Meta.h334
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/ReenableStupidWarnings.h14
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/StaticAssert.h206
-rw-r--r--third_party/eigen3/Eigen/src/Core/util/XprHelper.h481
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Block.h126
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Cwise.h192
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/CwiseOperators.h298
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AlignedBox.h159
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/All.h115
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AngleAxis.h228
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Hyperplane.h254
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/ParametrizedLine.h141
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Quaternion.h495
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Rotation2D.h145
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/RotationBase.h123
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Scaling.h167
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Transform.h786
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Translation.h184
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/LU.h120
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Lazy.h71
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/LeastSquares.h170
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Macros.h20
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/MathFunctions.h57
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Memory.h45
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Meta.h75
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/Minor.h117
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/QR.h67
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/SVD.h637
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/TriangularSolver.h42
-rw-r--r--third_party/eigen3/Eigen/src/Eigen2Support/VectorBlock.h94
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/ComplexEigenSolver.h333
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur.h456
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur_MKL.h94
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/EigenSolver.h629
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedEigenSolver.h341
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h227
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/HessenbergDecomposition.h373
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h160
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/RealQZ.h624
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/RealSchur.h529
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/RealSchur_MKL.h83
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h884
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h92
-rw-r--r--third_party/eigen3/Eigen/src/Eigenvalues/Tridiagonalization.h557
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/AlignedBox.h379
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/AngleAxis.h233
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/EulerAngles.h104
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Homogeneous.h307
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Hyperplane.h270
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/OrthoMethods.h221
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/ParametrizedLine.h195
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Quaternion.h778
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Rotation2D.h157
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/RotationBase.h206
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Scaling.h166
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Transform.h1444
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Translation.h206
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/Umeyama.h177
-rw-r--r--third_party/eigen3/Eigen/src/Geometry/arch/Geometry_SSE.h115
-rw-r--r--third_party/eigen3/Eigen/src/Householder/BlockHouseholder.h68
-rw-r--r--third_party/eigen3/Eigen/src/Householder/Householder.h171
-rw-r--r--third_party/eigen3/Eigen/src/Householder/HouseholderSequence.h441
-rw-r--r--third_party/eigen3/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h149
-rw-r--r--third_party/eigen3/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h254
-rw-r--r--third_party/eigen3/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h265
-rw-r--r--third_party/eigen3/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h467
-rw-r--r--third_party/eigen3/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h254
-rw-r--r--third_party/eigen3/Eigen/src/Jacobi/Jacobi.h433
-rw-r--r--third_party/eigen3/Eigen/src/LU/Determinant.h101
-rw-r--r--third_party/eigen3/Eigen/src/LU/FullPivLU.h745
-rw-r--r--third_party/eigen3/Eigen/src/LU/Inverse.h417
-rw-r--r--third_party/eigen3/Eigen/src/LU/PartialPivLU.h506
-rw-r--r--third_party/eigen3/Eigen/src/LU/PartialPivLU_MKL.h85
-rw-r--r--third_party/eigen3/Eigen/src/LU/arch/Inverse_SSE.h329
-rw-r--r--third_party/eigen3/Eigen/src/MetisSupport/MetisSupport.h137
-rw-r--r--third_party/eigen3/Eigen/src/OrderingMethods/Eigen_Colamd.h1850
-rw-r--r--third_party/eigen3/Eigen/src/OrderingMethods/Ordering.h154
-rw-r--r--third_party/eigen3/Eigen/src/PaStiXSupport/PaStiXSupport.h729
-rw-r--r--third_party/eigen3/Eigen/src/PardisoSupport/PardisoSupport.h581
-rw-r--r--third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR.h582
-rw-r--r--third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR_MKL.h99
-rw-r--r--third_party/eigen3/Eigen/src/QR/FullPivHouseholderQR.h616
-rw-r--r--third_party/eigen3/Eigen/src/QR/HouseholderQR.h382
-rw-r--r--third_party/eigen3/Eigen/src/QR/HouseholderQR_MKL.h71
-rw-r--r--third_party/eigen3/Eigen/src/SPQRSupport/SuiteSparseQRSupport.h314
-rw-r--r--third_party/eigen3/Eigen/src/SVD/JacobiSVD.h960
-rw-r--r--third_party/eigen3/Eigen/src/SVD/JacobiSVD_MKL.h92
-rw-r--r--third_party/eigen3/Eigen/src/SVD/UpperBidiagonalization.h396
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/AmbiVector.h373
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/CompressedStorage.h235
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h245
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/MappedSparseMatrix.h181
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseBlock.h547
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseColEtree.h206
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseCwiseBinaryOp.h324
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseCwiseUnaryOp.h163
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseDenseProduct.h311
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseDiagonalProduct.h196
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseDot.h101
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseFuzzy.h26
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseMatrix.h1259
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseMatrixBase.h451
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparsePermutation.h148
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseProduct.h188
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseRedux.h45
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseSelfAdjointView.h507
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseSparseProductWithPruning.h150
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseTranspose.h63
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseTriangularView.h179
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseUtil.h171
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseVector.h447
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/SparseView.h99
-rw-r--r--third_party/eigen3/Eigen/src/SparseCore/TriangularSolver.h334
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU.h762
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLUImpl.h64
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_Memory.h227
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_Structs.h111
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h298
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_Utils.h80
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_bmod.h180
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_dfs.h177
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_copy_to_ucol.h106
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_gemm_kernel.h279
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_heap_relax_snode.h127
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_kernel_bmod.h130
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_bmod.h223
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_dfs.h258
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_pivotL.h136
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_pruneL.h135
-rw-r--r--third_party/eigen3/Eigen/src/SparseLU/SparseLU_relax_snode.h83
-rw-r--r--third_party/eigen3/Eigen/src/SparseQR/SparseQR.h675
-rw-r--r--third_party/eigen3/Eigen/src/StlSupport/StdDeque.h134
-rw-r--r--third_party/eigen3/Eigen/src/StlSupport/StdList.h114
-rw-r--r--third_party/eigen3/Eigen/src/StlSupport/StdVector.h126
-rw-r--r--third_party/eigen3/Eigen/src/StlSupport/details.h84
-rw-r--r--third_party/eigen3/Eigen/src/SuperLUSupport/SuperLUSupport.h1026
-rw-r--r--third_party/eigen3/Eigen/src/UmfPackSupport/UmfPackSupport.h432
-rw-r--r--third_party/eigen3/Eigen/src/misc/Image.h84
-rw-r--r--third_party/eigen3/Eigen/src/misc/Kernel.h81
-rw-r--r--third_party/eigen3/Eigen/src/misc/Solve.h76
-rw-r--r--third_party/eigen3/Eigen/src/misc/SparseSolve.h130
-rw-r--r--third_party/eigen3/Eigen/src/misc/blas.h658
-rw-r--r--third_party/eigen3/Eigen/src/plugins/ArrayCwiseBinaryOps.h241
-rw-r--r--third_party/eigen3/Eigen/src/plugins/ArrayCwiseUnaryOps.h245
-rw-r--r--third_party/eigen3/Eigen/src/plugins/BlockMethods.h995
-rw-r--r--third_party/eigen3/Eigen/src/plugins/CommonCwiseBinaryOps.h47
-rw-r--r--third_party/eigen3/Eigen/src/plugins/CommonCwiseUnaryOps.h201
-rw-r--r--third_party/eigen3/Eigen/src/plugins/MatrixCwiseBinaryOps.h134
-rw-r--r--third_party/eigen3/Eigen/src/plugins/MatrixCwiseUnaryOps.h72
-rw-r--r--third_party/eigen3/LICENSE1936
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/Core46
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint51
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks35
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/Tensor145
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/TensorSymmetry40
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Meta.h508
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Workarounds.h116
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/EmulateCXX11Meta.h456
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/FixedSizeVector.h128
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/FixedPointTypes.h341
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProduct.h255
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductAVX2.h1743
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductNEON.h95
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatVecProduct.h123
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h409
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/TypeCastingAVX2.h66
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Activations.h116
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Attention.h209
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardCuboidConvolutions.h523
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardSpatialConvolutions.h351
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/CuboidConvolution.h179
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Patch3d.h233
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Pooling.h442
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SoftMax.h82
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SpatialConvolutions.h634
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/TensorConvolutionByFFT.h289
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h461
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h288
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h179
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h934
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h627
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h352
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h510
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h350
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h635
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h1387
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMappers.h383
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h713
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h226
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h1076
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h302
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h154
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h920
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h235
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h597
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h151
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h505
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h461
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h291
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h846
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h277
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h150
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h104
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h706
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h185
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h56
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h757
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h421
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h219
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h82
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h357
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h217
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h320
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h103
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h817
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h388
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h314
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h1141
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h642
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h442
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h278
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h412
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h247
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h329
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h294
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTrueIndices.h250
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVarDim.h315
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h677
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/g3doc/README.md1792
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h293
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h236
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h338
-rw-r--r--third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h666
-rw-r--r--third_party/eigen3/unsupported/Eigen/FFT418
-rw-r--r--third_party/eigen3/unsupported/Eigen/KroneckerProduct34
-rw-r--r--third_party/eigen3/unsupported/Eigen/src/FFT/CMakeLists.txt6
-rw-r--r--third_party/eigen3/unsupported/Eigen/src/FFT/ei_fftw_impl.h261
-rw-r--r--third_party/eigen3/unsupported/Eigen/src/FFT/ei_kissfft_impl.h420
-rw-r--r--third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/CMakeLists.txt6
-rw-r--r--third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h297
-rw-r--r--third_party/gpus/crosstool/BUILD28
-rw-r--r--third_party/gpus/crosstool/CROSSTOOL146
-rw-r--r--third_party/gpus/crosstool/LICENSE203
-rwxr-xr-xthird_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc316
-rw-r--r--third_party/gpus/cuda/BUILD158
-rw-r--r--third_party/gpus/cuda/LICENSE203
-rwxr-xr-xthird_party/gpus/cuda/cuda_config.sh169
-rw-r--r--tools/bazel.rc1
1900 files changed, 391533 insertions, 0 deletions
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000000..0edca21239
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "google/protobuf"]
+ path = google/protobuf
+ url = https://github.googlesource.com/google/protobuf.git
diff --git a/ACKNOWLEDGMENTS b/ACKNOWLEDGMENTS
new file mode 100644
index 0000000000..4d0bf245f8
--- /dev/null
+++ b/ACKNOWLEDGMENTS
@@ -0,0 +1,50 @@
+Some of TensorFlow's code is derived from Caffe, which is subject to the following copyright notice:
+
+COPYRIGHT
+
+All contributions by the University of California:
+
+Copyright (c) 2014, The Regents of the University of California (Regents)
+All rights reserved.
+
+All other contributions:
+
+Copyright (c) 2014, the respective contributors
+All rights reserved.
+
+Caffe uses a shared copyright model: each contributor holds copyright over
+their contributions to Caffe. The project versioning records all such
+contribution and copyright details. If a contributor wants to further mark
+their specific copyright on a particular contribution, they should indicate
+their copyright solely in the commit message of the change when it is
+committed.
+
+LICENSE
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+CONTRIBUTION AGREEMENT
+
+By contributing to the BVLC/caffe repository through pull-request, comment,
+or otherwise, the contributor releases their content to the
+license and copyright terms herein.
+
diff --git a/AUTHORS b/AUTHORS
new file mode 100644
index 0000000000..e3289a50bc
--- /dev/null
+++ b/AUTHORS
@@ -0,0 +1,9 @@
+# This is the official list of TensorFlow authors for copyright purposes.
+# This file is distinct from the CONTRIBUTORS files.
+# See the latter for an explanation.
+
+# Names should be added to this file as:
+# Name or Organization <email address>
+# The email address is not required for organizations.
+
+Google Inc.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000..dbaf281844
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,17 @@
+# Contributing guidelines
+
+## How to become a contributor and submit your own code
+
+### Contributor License Agreements
+
+We'd love to accept your patches! Before we can take them, we have to jump a couple of legal hurdles.
+
+Please fill out either the individual or corporate Contributor License Agreement (CLA).
+
+ * If you are an individual writing original source code and you're sure you own the intellectual property, then you'll need to sign an [individual CLA](http://code.google.com/legal/individual-cla-v1.0.html).
+ * If you work for a company that wants to allow you to contribute your work, then you'll need to sign a [corporate CLA](http://code.google.com/legal/corporate-cla-v1.0.html).
+
+Follow either of the two links above to access the appropriate CLA and instructions for how to sign and return it. Once we receive it, we'll be able to accept your pull requests.
+
+***NOTE***: Only original source code from you and other people that have signed the CLA can be accepted into the main repository.
+
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000..d3da228420
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,203 @@
+Copyright 2015 The TensorFlow Authors. All rights reserved.
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2015, The TensorFlow Authors.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000000..a52845a06e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,17 @@
+# TensorFlow
+
+TensorFlow is an open source software library for numerical computation using
+data flow graphs. Nodes in the graph represent mathematical operations, while
+the graph edges represent the multidimensional data arrays (tensors) that flow
+between them. This flexible architecture lets you deploy computation to one
+or more CPUs or GPUs in a desktop, server, or mobile device without rewriting
+code. TensorFlow was originally developed by researchers and engineers
+working on the Google Brain team within Google's Machine Intelligence research
+organization for the purposes of conducting machine learning and deep neural
+networks research. The system is general enough to be applicable in a wide
+variety of other domains, as well.
+
+## For more information
+
+* [Installation and setup instructions](/tensorflow/g3doc/get_started/os_setup.md)
+* [TensorFlow website](http://tensorflow.org)
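The README above describes computation as a data flow graph whose nodes are operations and whose edges carry tensors. As a minimal sketch of that model (not part of this commit; it assumes the Python API exported by tensorflow/__init__.py later in this change), a graph is first constructed and then launched in a session, which maps its nodes onto the available devices:

    import tensorflow as tf

    # Graph construction: nodes are operations, edges are tensors.
    a = tf.constant([[1.0, 2.0]])    # 1x2 tensor
    b = tf.constant([[3.0], [4.0]])  # 2x1 tensor
    product = tf.matmul(a, b)        # matmul node fed by both constants

    # Nothing executes until the graph is launched in a session.
    with tf.Session() as sess:
        print(sess.run(product))     # [[ 11.]]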
diff --git a/WORKSPACE b/WORKSPACE
new file mode 100644
index 0000000000..5789bf20ce
--- /dev/null
+++ b/WORKSPACE
@@ -0,0 +1,551 @@
+# Uncomment and update the paths in these entries to build the Android demo.
+#android_sdk_repository(
+# name = "androidsdk",
+# api_level = 23,
+# build_tools_version = "23.0.1",
+# # Replace with path to Android SDK on your system
+# path = "<PATH_TO_SDK>",
+#)
+#
+#android_ndk_repository(
+# name="androidndk",
+# path="<PATH_TO_NDK>",
+# api_level=21)
+
+new_http_archive(
+ name = "gmock_archive",
+ url = "https://googlemock.googlecode.com/files/gmock-1.7.0.zip",
+ sha256 = "26fcbb5925b74ad5fc8c26b0495dfc96353f4d553492eb97e85a8a6d2f43095b",
+ build_file = "gmock.BUILD",
+)
+
+bind(
+ name = "gtest",
+ actual = "@gmock_archive//:gtest",
+)
+
+bind(
+ name = "gtest_main",
+ actual = "@gmock_archive//:gtest_main",
+)
+
+git_repository(
+ name = "re2",
+ remote = "https://github.com/google/re2.git",
+ tag = "2015-07-01",
+)
+
+new_http_archive(
+ name = "jpeg_archive",
+ url = "http://www.ijg.org/files/jpegsrc.v9a.tar.gz",
+ sha256 = "3a753ea48d917945dd54a2d97de388aa06ca2eb1066cbfdc6652036349fe05a7",
+ build_file = "jpeg.BUILD",
+)
+
+git_repository(
+ name = "gemmlowp",
+ remote = "https://github.com/google/gemmlowp.git",
+ commit = "cc5d3a0",
+)
+
+new_http_archive(
+ name = "png_archive",
+ url = "https://storage.googleapis.com/libpng-public-archive/libpng-1.2.53.tar.gz",
+ sha256 = "e05c9056d7f323088fd7824d8c6acc03a4a758c4b4916715924edc5dd3223a72",
+ build_file = "png.BUILD",
+)
+
+new_http_archive(
+ name = "six_archive",
+ url = "https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55",
+ sha256 = "105f8d68616f8248e24bf0e9372ef04d3cc10104f1980f54d57b2ce73a5ad56a",
+ build_file = "six.BUILD",
+)
+
+bind(
+ name = "six",
+ actual = "@six_archive//:six",
+)
+
+new_git_repository(
+ name = "iron-ajax",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-ajax.git",
+ tag = "v1.0.8",
+)
+
+new_git_repository(
+ name = "iron-dropdown",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-dropdown.git",
+ tag = "v1.0.6",
+)
+
+new_git_repository(
+ name = "accessibility-developer-tools",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/GoogleChrome/accessibility-developer-tools.git",
+ tag = "v2.10.0",
+)
+
+new_git_repository(
+ name = "iron-doc-viewer",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-doc-viewer.git",
+ tag = "v1.0.6",
+)
+
+new_git_repository(
+ name = "iron-icons",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/iron-icons.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "paper-icon-button",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-icon-button.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "sinonjs",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/blittle/sinon.js.git",
+ tag = "v1.17.1",
+)
+
+new_git_repository(
+ name = "paper-dropdown-menu",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-dropdown-menu.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "iron-flex-layout",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/iron-flex-layout.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "iron-autogrow-textarea",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-autogrow-textarea.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "d3",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/mbostock/d3.git",
+ tag = "v3.5.6",
+)
+
+new_git_repository(
+ name = "iron-component-page",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-component-page.git",
+ tag = "v1.0.8",
+)
+
+new_git_repository(
+ name = "stacky",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerLabs/stacky.git",
+ tag = "v1.2.4",
+)
+
+new_git_repository(
+ name = "paper-styles",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-styles.git",
+ tag = "v1.0.12",
+)
+
+new_git_repository(
+ name = "paper-input",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-input.git",
+ tag = "v1.0.16",
+)
+
+new_git_repository(
+ name = "paper-item",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-item.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "marked-element",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/marked-element.git",
+ tag = "v1.1.1",
+)
+
+new_git_repository(
+ name = "prism",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/LeaVerou/prism.git",
+ tag = "v1.3.0",
+)
+
+new_git_repository(
+ name = "paper-progress",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-progress.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-checked-element-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-checked-element-behavior.git",
+ tag = "v1.0.2",
+)
+
+new_git_repository(
+ name = "paper-toolbar",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-toolbar.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "async",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/caolan/async.git",
+ tag = "0.9.2",
+)
+
+new_git_repository(
+ name = "es6-promise",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/components/es6-promise.git",
+ tag = "v3.0.2",
+)
+
+new_git_repository(
+ name = "promise-polyfill",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerlabs/promise-polyfill.git",
+ tag = "v1.0.0",
+)
+
+new_git_repository(
+ name = "font-roboto",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/font-roboto.git",
+ tag = "v1.0.1",
+)
+
+new_git_repository(
+ name = "paper-menu",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-menu.git",
+ tag = "v1.1.1",
+)
+
+new_git_repository(
+ name = "iron-icon",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/iron-icon.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-meta",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-meta.git",
+ tag = "v1.1.0",
+)
+
+new_git_repository(
+ name = "lodash",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/lodash/lodash.git",
+ tag = "3.10.1",
+)
+
+new_git_repository(
+ name = "iron-resizable-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-resizable-behavior.git",
+ tag = "v1.0.2",
+)
+
+new_git_repository(
+ name = "iron-fit-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-fit-behavior.git",
+ tag = "v1.0.3",
+)
+
+new_git_repository(
+ name = "iron-overlay-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-overlay-behavior.git",
+ tag = "v1.0.9",
+)
+
+new_git_repository(
+ name = "neon-animation",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/neon-animation.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-a11y-keys-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/iron-a11y-keys-behavior.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "plottable",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/palantir/plottable.git",
+ tag = "v1.16.1",
+)
+
+new_git_repository(
+ name = "webcomponentsjs",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/Polymer/webcomponentsjs.git",
+ tag = "v0.7.15",
+)
+
+new_git_repository(
+ name = "iron-validatable-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-validatable-behavior.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "sinon-chai",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/domenic/sinon-chai.git",
+ tag = "2.8.0",
+)
+
+new_git_repository(
+ name = "paper-button",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-button.git",
+ tag = "v1.0.8",
+)
+
+new_git_repository(
+ name = "iron-input",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-input.git",
+ tag = "v1.0.6",
+)
+
+new_git_repository(
+ name = "iron-menu-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-menu-behavior.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "paper-slider",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-slider.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-list",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-list.git",
+ tag = "v1.1.5",
+)
+
+new_git_repository(
+ name = "marked",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/chjj/marked.git",
+ tag = "v0.3.5",
+)
+
+new_git_repository(
+ name = "paper-material",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/paper-material.git",
+ tag = "v1.0.3",
+)
+
+new_git_repository(
+ name = "iron-range-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-range-behavior.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "svg-typewriter",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/palantir/svg-typewriter.git",
+ tag = "v0.3.0",
+)
+
+new_git_repository(
+ name = "web-animations-js",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/web-animations/web-animations-js.git",
+ tag = "2.1.2",
+)
+
+new_git_repository(
+ name = "hydrolysis",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/Polymer/hydrolysis.git",
+ tag = "v1.19.3",
+)
+
+new_git_repository(
+ name = "web-component-tester",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/Polymer/web-component-tester.git",
+ tag = "v3.3.29",
+)
+
+new_git_repository(
+ name = "paper-toggle-button",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-toggle-button.git",
+ tag = "v1.0.11",
+)
+
+new_git_repository(
+ name = "paper-behaviors",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/paper-behaviors.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "paper-radio-group",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-radio-group.git",
+ tag = "v1.0.6",
+)
+
+new_git_repository(
+ name = "iron-selector",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-selector.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-form-element-behavior",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-form-element-behavior.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "mocha",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/mochajs/mocha.git",
+ tag = "v2.3.3",
+)
+
+new_git_repository(
+ name = "dagre",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/cpettitt/dagre.git",
+ tag = "v0.7.4",
+)
+
+new_git_repository(
+ name = "iron-behaviors",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-behaviors.git",
+ tag = "v1.0.9",
+)
+
+new_git_repository(
+ name = "graphlib",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/cpettitt/graphlib.git",
+ tag = "v1.0.7",
+)
+
+new_git_repository(
+ name = "iron-collapse",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-collapse.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "paper-checkbox",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-checkbox.git",
+ tag = "v1.0.13",
+)
+
+new_git_repository(
+ name = "paper-radio-button",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-radio-button.git",
+ tag = "v1.0.10",
+)
+
+new_git_repository(
+ name = "paper-header-panel",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/paper-header-panel.git",
+ tag = "v1.0.5",
+)
+
+new_git_repository(
+ name = "prism-element",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/prism-element.git",
+ tag = "v1.0.2",
+)
+
+new_git_repository(
+ name = "chai",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/chaijs/chai.git",
+ tag = "2.3.0",
+)
+
+new_git_repository(
+ name = "paper-menu-button",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/paper-menu-button.git",
+ tag = "v1.0.3",
+)
+
+new_git_repository(
+ name = "polymer",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/Polymer/polymer.git",
+ tag = "v1.2.1",
+)
+
+new_git_repository(
+ name = "paper-ripple",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/polymerelements/paper-ripple.git",
+ tag = "v1.0.4",
+)
+
+new_git_repository(
+ name = "iron-iconset-svg",
+ build_file = "bower.BUILD",
+ remote = "https://github.com/PolymerElements/iron-iconset-svg.git",
+ tag = "v1.0.8",
+)
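Each new_http_archive and new_git_repository entry above defines an external repository that BUILD files can reference by its @name label, while the bind() rules alias selected targets under //external. A hypothetical consumer (a sketch, not part of this commit; target and file names are made up) would declare dependencies like this:

    # BUILD sketch: depending on the external repositories declared in WORKSPACE.
    cc_test(
        name = "example_test",
        srcs = ["example_test.cc"],
        deps = [
            "//external:gtest_main",  # bind() alias for @gmock_archive//:gtest_main
            "@jpeg_archive//:jpeg",   # target defined by jpeg.BUILD
        ],
    )

    py_library(
        name = "example_py",
        srcs = ["example.py"],
        deps = ["//external:six"],    # bind() alias for @six_archive//:six
    )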
diff --git a/bower.BUILD b/bower.BUILD
new file mode 100644
index 0000000000..66f9784dee
--- /dev/null
+++ b/bower.BUILD
@@ -0,0 +1,971 @@
+package(default_visibility = ["//tensorflow:internal"])
+
+filegroup(
+ name = "iron-ajax",
+ srcs = [
+ "index.html",
+ "iron-ajax.html",
+ "iron-request.html",
+ ],
+)
+
+filegroup(
+ name = "iron-dropdown",
+ srcs = [
+ "index.html",
+ "iron-dropdown.html",
+ "iron-dropdown-scroll-manager.html",
+ ],
+)
+
+filegroup(
+ name = "accessibility-developer-tools",
+ srcs = ["dist/js/axs_testing.js"],
+)
+
+filegroup(
+ name = "iron-doc-viewer",
+ srcs = [
+ "index.html",
+ "iron-doc-property.css",
+ "iron-doc-property.html",
+ "iron-doc-viewer.css",
+ "iron-doc-viewer.html",
+ ],
+)
+
+filegroup(
+ name = "iron-icons",
+ srcs = [
+ "av-icons.html",
+ "communication-icons.html",
+ "device-icons.html",
+ "editor-icons.html",
+ "hardware-icons.html",
+ "image-icons.html",
+ "index.html",
+ "iron-icons.html",
+ "maps-icons.html",
+ "notification-icons.html",
+ "social-icons.html",
+ ],
+)
+
+filegroup(
+ name = "paper-icon-button",
+ srcs = [
+ "index.html",
+ "paper-icon-button.html",
+ ],
+)
+
+filegroup(
+ name = "sinonjs",
+ srcs = ["sinon.js"],
+)
+
+filegroup(
+ name = "paper-dropdown-menu",
+ srcs = [
+ "index.html",
+ "paper-dropdown-menu.html",
+ ],
+)
+
+filegroup(
+ name = "iron-flex-layout",
+ srcs = [
+ "classes/iron-flex-layout.html",
+ "classes/iron-shadow-flex-layout.html",
+ "index.html",
+ "iron-flex-layout.html",
+ ],
+)
+
+filegroup(
+ name = "iron-autogrow-textarea",
+ srcs = [
+ "index.html",
+ "iron-autogrow-textarea.html",
+ ],
+)
+
+filegroup(
+ name = "d3",
+ srcs = [
+ "d3.js",
+ "d3.min.js",
+ "package.js",
+ ],
+)
+
+filegroup(
+ name = "iron-component-page",
+ srcs = [
+ "index.html",
+ "iron-component-page.css",
+ "iron-component-page.html",
+ ],
+)
+
+filegroup(
+ name = "stacky",
+ srcs = [
+ "gulpfile.js",
+ "lib/formatting.js",
+ "lib/index.js",
+ "lib/normalization.js",
+ "lib/parsing.js",
+ ],
+)
+
+filegroup(
+ name = "paper-styles",
+ srcs = [
+ "classes/global.html",
+ "classes/shadow.html",
+ "classes/shadow-layout.html",
+ "classes/typography.html",
+ "color.html",
+ "default-theme.html",
+ "demo.css",
+ "demo-pages.html",
+ "index.html",
+ "paper-styles.html",
+ "paper-styles-classes.html",
+ "shadow.html",
+ "typography.html",
+ ],
+)
+
+filegroup(
+ name = "paper-input",
+ srcs = [
+ "all-imports.html",
+ "index.html",
+ "paper-input.html",
+ "paper-input-addon-behavior.html",
+ "paper-input-behavior.html",
+ "paper-input-char-counter.html",
+ "paper-input-container.html",
+ "paper-input-error.html",
+ "paper-textarea.html",
+ ],
+)
+
+filegroup(
+ name = "paper-item",
+ srcs = [
+ "all-imports.html",
+ "index.html",
+ "paper-icon-item.html",
+ "paper-item.html",
+ "paper-item-body.html",
+ "paper-item-shared-styles.html",
+ ],
+)
+
+filegroup(
+ name = "marked-element",
+ srcs = [
+ "index.html",
+ "marked-element.html",
+ "marked-import.html",
+ ],
+)
+
+filegroup(
+ name = "prism",
+ srcs = [
+ "components.js",
+ "components/prism-abap.js",
+ "components/prism-abap.min.js",
+ "components/prism-actionscript.js",
+ "components/prism-actionscript.min.js",
+ "components/prism-apacheconf.js",
+ "components/prism-apacheconf.min.js",
+ "components/prism-apl.js",
+ "components/prism-apl.min.js",
+ "components/prism-applescript.js",
+ "components/prism-applescript.min.js",
+ "components/prism-asciidoc.js",
+ "components/prism-asciidoc.min.js",
+ "components/prism-aspnet.js",
+ "components/prism-aspnet.min.js",
+ "components/prism-autohotkey.js",
+ "components/prism-autohotkey.min.js",
+ "components/prism-autoit.js",
+ "components/prism-autoit.min.js",
+ "components/prism-bash.js",
+ "components/prism-bash.min.js",
+ "components/prism-basic.js",
+ "components/prism-basic.min.js",
+ "components/prism-batch.js",
+ "components/prism-batch.min.js",
+ "components/prism-bison.js",
+ "components/prism-bison.min.js",
+ "components/prism-brainfuck.js",
+ "components/prism-brainfuck.min.js",
+ "components/prism-c.js",
+ "components/prism-c.min.js",
+ "components/prism-clike.js",
+ "components/prism-clike.min.js",
+ "components/prism-coffeescript.js",
+ "components/prism-coffeescript.min.js",
+ "components/prism-core.js",
+ "components/prism-core.min.js",
+ "components/prism-cpp.js",
+ "components/prism-cpp.min.js",
+ "components/prism-crystal.js",
+ "components/prism-crystal.min.js",
+ "components/prism-csharp.js",
+ "components/prism-csharp.min.js",
+ "components/prism-css.js",
+ "components/prism-css.min.js",
+ "components/prism-css-extras.js",
+ "components/prism-css-extras.min.js",
+ "components/prism-d.js",
+ "components/prism-d.min.js",
+ "components/prism-dart.js",
+ "components/prism-dart.min.js",
+ "components/prism-diff.js",
+ "components/prism-diff.min.js",
+ "components/prism-docker.js",
+ "components/prism-docker.min.js",
+ "components/prism-eiffel.js",
+ "components/prism-eiffel.min.js",
+ "components/prism-elixir.js",
+ "components/prism-elixir.min.js",
+ "components/prism-erlang.js",
+ "components/prism-erlang.min.js",
+ "components/prism-fortran.js",
+ "components/prism-fortran.min.js",
+ "components/prism-fsharp.js",
+ "components/prism-fsharp.min.js",
+ "components/prism-gherkin.js",
+ "components/prism-gherkin.min.js",
+ "components/prism-git.js",
+ "components/prism-git.min.js",
+ "components/prism-glsl.js",
+ "components/prism-glsl.min.js",
+ "components/prism-go.js",
+ "components/prism-go.min.js",
+ "components/prism-groovy.js",
+ "components/prism-groovy.min.js",
+ "components/prism-haml.js",
+ "components/prism-haml.min.js",
+ "components/prism-handlebars.js",
+ "components/prism-handlebars.min.js",
+ "components/prism-haskell.js",
+ "components/prism-haskell.min.js",
+ "components/prism-haxe.js",
+ "components/prism-haxe.min.js",
+ "components/prism-http.js",
+ "components/prism-http.min.js",
+ "components/prism-icon.js",
+ "components/prism-icon.min.js",
+ "components/prism-inform7.js",
+ "components/prism-inform7.min.js",
+ "components/prism-ini.js",
+ "components/prism-ini.min.js",
+ "components/prism-j.js",
+ "components/prism-j.min.js",
+ "components/prism-jade.js",
+ "components/prism-jade.min.js",
+ "components/prism-java.js",
+ "components/prism-java.min.js",
+ "components/prism-javascript.js",
+ "components/prism-javascript.min.js",
+ "components/prism-jsx.js",
+ "components/prism-jsx.min.js",
+ "components/prism-julia.js",
+ "components/prism-julia.min.js",
+ "components/prism-keyman.js",
+ "components/prism-keyman.min.js",
+ "components/prism-kotlin.js",
+ "components/prism-kotlin.min.js",
+ "components/prism-latex.js",
+ "components/prism-latex.min.js",
+ "components/prism-less.js",
+ "components/prism-less.min.js",
+ "components/prism-lolcode.js",
+ "components/prism-lolcode.min.js",
+ "components/prism-lua.js",
+ "components/prism-lua.min.js",
+ "components/prism-makefile.js",
+ "components/prism-makefile.min.js",
+ "components/prism-markdown.js",
+ "components/prism-markdown.min.js",
+ "components/prism-markup.js",
+ "components/prism-markup.min.js",
+ "components/prism-matlab.js",
+ "components/prism-matlab.min.js",
+ "components/prism-mel.js",
+ "components/prism-mel.min.js",
+ "components/prism-mizar.js",
+ "components/prism-mizar.min.js",
+ "components/prism-monkey.js",
+ "components/prism-monkey.min.js",
+ "components/prism-nasm.js",
+ "components/prism-nasm.min.js",
+ "components/prism-nginx.js",
+ "components/prism-nginx.min.js",
+ "components/prism-nim.js",
+ "components/prism-nim.min.js",
+ "components/prism-nix.js",
+ "components/prism-nix.min.js",
+ "components/prism-nsis.js",
+ "components/prism-nsis.min.js",
+ "components/prism-objectivec.js",
+ "components/prism-objectivec.min.js",
+ "components/prism-ocaml.js",
+ "components/prism-ocaml.min.js",
+ "components/prism-oz.js",
+ "components/prism-oz.min.js",
+ "components/prism-parigp.js",
+ "components/prism-parigp.min.js",
+ "components/prism-parser.js",
+ "components/prism-parser.min.js",
+ "components/prism-pascal.js",
+ "components/prism-pascal.min.js",
+ "components/prism-perl.js",
+ "components/prism-perl.min.js",
+ "components/prism-php.js",
+ "components/prism-php.min.js",
+ "components/prism-php-extras.js",
+ "components/prism-php-extras.min.js",
+ "components/prism-powershell.js",
+ "components/prism-powershell.min.js",
+ "components/prism-processing.js",
+ "components/prism-processing.min.js",
+ "components/prism-prolog.js",
+ "components/prism-prolog.min.js",
+ "components/prism-puppet.js",
+ "components/prism-puppet.min.js",
+ "components/prism-pure.js",
+ "components/prism-pure.min.js",
+ "components/prism-python.js",
+ "components/prism-python.min.js",
+ "components/prism-q.js",
+ "components/prism-q.min.js",
+ "components/prism-qore.js",
+ "components/prism-qore.min.js",
+ "components/prism-r.js",
+ "components/prism-r.min.js",
+ "components/prism-rest.js",
+ "components/prism-rest.min.js",
+ "components/prism-rip.js",
+ "components/prism-rip.min.js",
+ "components/prism-roboconf.js",
+ "components/prism-roboconf.min.js",
+ "components/prism-ruby.js",
+ "components/prism-ruby.min.js",
+ "components/prism-rust.js",
+ "components/prism-rust.min.js",
+ "components/prism-sas.js",
+ "components/prism-sas.min.js",
+ "components/prism-sass.js",
+ "components/prism-sass.min.js",
+ "components/prism-scala.js",
+ "components/prism-scala.min.js",
+ "components/prism-scheme.js",
+ "components/prism-scheme.min.js",
+ "components/prism-scss.js",
+ "components/prism-scss.min.js",
+ "components/prism-smalltalk.js",
+ "components/prism-smalltalk.min.js",
+ "components/prism-smarty.js",
+ "components/prism-smarty.min.js",
+ "components/prism-sql.js",
+ "components/prism-sql.min.js",
+ "components/prism-stylus.js",
+ "components/prism-stylus.min.js",
+ "components/prism-swift.js",
+ "components/prism-swift.min.js",
+ "components/prism-tcl.js",
+ "components/prism-tcl.min.js",
+ "components/prism-textile.js",
+ "components/prism-textile.min.js",
+ "components/prism-twig.js",
+ "components/prism-twig.min.js",
+ "components/prism-typescript.js",
+ "components/prism-typescript.min.js",
+ "components/prism-verilog.js",
+ "components/prism-verilog.min.js",
+ "components/prism-vhdl.js",
+ "components/prism-vhdl.min.js",
+ "components/prism-vim.js",
+ "components/prism-vim.min.js",
+ "components/prism-wiki.js",
+ "components/prism-wiki.min.js",
+ "components/prism-yaml.js",
+ "components/prism-yaml.min.js",
+ "examples.js",
+ "gulpfile.js",
+ "plugins/autolinker/prism-autolinker.css",
+ "plugins/autolinker/prism-autolinker.js",
+ "plugins/autolinker/prism-autolinker.min.js",
+ "plugins/autoloader/prism-autoloader.js",
+ "plugins/autoloader/prism-autoloader.min.js",
+ "plugins/file-highlight/prism-file-highlight.js",
+ "plugins/file-highlight/prism-file-highlight.min.js",
+ "plugins/highlight-keywords/prism-highlight-keywords.js",
+ "plugins/highlight-keywords/prism-highlight-keywords.min.js",
+ "plugins/ie8/prism-ie8.css",
+ "plugins/ie8/prism-ie8.js",
+ "plugins/ie8/prism-ie8.min.js",
+ "plugins/jsonp-highlight/prism-jsonp-highlight.js",
+ "plugins/jsonp-highlight/prism-jsonp-highlight.min.js",
+ "plugins/keep-markup/prism-keep-markup.js",
+ "plugins/keep-markup/prism-keep-markup.min.js",
+ "plugins/line-highlight/prism-line-highlight.css",
+ "plugins/line-highlight/prism-line-highlight.js",
+ "plugins/line-highlight/prism-line-highlight.min.js",
+ "plugins/line-numbers/prism-line-numbers.css",
+ "plugins/line-numbers/prism-line-numbers.js",
+ "plugins/line-numbers/prism-line-numbers.min.js",
+ "plugins/previewer-angle/prism-previewer-angle.css",
+ "plugins/previewer-angle/prism-previewer-angle.js",
+ "plugins/previewer-angle/prism-previewer-angle.min.js",
+ "plugins/previewer-base/prism-previewer-base.css",
+ "plugins/previewer-base/prism-previewer-base.js",
+ "plugins/previewer-base/prism-previewer-base.min.js",
+ "plugins/previewer-color/prism-previewer-color.css",
+ "plugins/previewer-color/prism-previewer-color.js",
+ "plugins/previewer-color/prism-previewer-color.min.js",
+ "plugins/previewer-easing/prism-previewer-easing.css",
+ "plugins/previewer-easing/prism-previewer-easing.js",
+ "plugins/previewer-easing/prism-previewer-easing.min.js",
+ "plugins/previewer-gradient/prism-previewer-gradient.css",
+ "plugins/previewer-gradient/prism-previewer-gradient.js",
+ "plugins/previewer-gradient/prism-previewer-gradient.min.js",
+ "plugins/previewer-time/prism-previewer-time.css",
+ "plugins/previewer-time/prism-previewer-time.js",
+ "plugins/previewer-time/prism-previewer-time.min.js",
+ "plugins/remove-initial-line-feed/prism-remove-initial-line-feed.js",
+ "plugins/remove-initial-line-feed/prism-remove-initial-line-feed.min.js",
+ "plugins/show-invisibles/prism-show-invisibles.css",
+ "plugins/show-invisibles/prism-show-invisibles.js",
+ "plugins/show-invisibles/prism-show-invisibles.min.js",
+ "plugins/show-language/prism-show-language.css",
+ "plugins/show-language/prism-show-language.js",
+ "plugins/show-language/prism-show-language.min.js",
+ "plugins/wpd/prism-wpd.css",
+ "plugins/wpd/prism-wpd.js",
+ "plugins/wpd/prism-wpd.min.js",
+ "prism.js",
+ "tests/helper/components.js",
+ "tests/helper/prism-loader.js",
+ "tests/helper/test-case.js",
+ "tests/helper/test-discovery.js",
+ "tests/helper/token-stream-transformer.js",
+ "tests/run.js",
+ "tests/run-child.js",
+ "tests/testrunner-tests.js",
+ "themes/prism.css",
+ "themes/prism-coy.css",
+ "themes/prism-dark.css",
+ "themes/prism-funky.css",
+ "themes/prism-okaidia.css",
+ "themes/prism-tomorrow.css",
+ "themes/prism-twilight.css",
+ "vendor/promise.js",
+ ],
+)
+
+filegroup(
+ name = "paper-progress",
+ srcs = [
+ "index.html",
+ "paper-progress.html",
+ ],
+)
+
+filegroup(
+ name = "iron-checked-element-behavior",
+ srcs = [
+ "index.html",
+ "iron-checked-element-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "paper-toolbar",
+ srcs = [
+ "index.html",
+ "paper-toolbar.html",
+ ],
+)
+
+filegroup(
+ name = "async",
+ srcs = [
+ "deps/nodeunit.css",
+ "deps/nodeunit.js",
+ "lib/async.js",
+ "support/sync-package-managers.js",
+ ],
+)
+
+filegroup(
+ name = "es6-promise",
+ srcs = [
+ "promise.js",
+ "promise.min.js",
+ ],
+)
+
+filegroup(
+ name = "promise-polyfill",
+ srcs = [
+ "Gruntfile.js",
+ "Promise.js",
+ "Promise.min.js",
+ "Promise-Statics.js",
+ "promise-polyfill.html",
+ "promise-polyfill-lite.html",
+ ],
+)
+
+filegroup(
+ name = "font-roboto",
+ srcs = ["roboto.html"],
+)
+
+filegroup(
+ name = "paper-menu",
+ srcs = [
+ "index.html",
+ "paper-menu.html",
+ "paper-menu-shared.css",
+ "paper-submenu.html",
+ ],
+)
+
+filegroup(
+ name = "iron-icon",
+ srcs = [
+ "index.html",
+ "iron-icon.html",
+ ],
+)
+
+filegroup(
+ name = "iron-meta",
+ srcs = [
+ "index.html",
+ "iron-meta.html",
+ ],
+)
+
+filegroup(
+ name = "lodash",
+ srcs = [
+ "lodash.js",
+ "lodash.min.js",
+ ],
+)
+
+filegroup(
+ name = "iron-resizable-behavior",
+ srcs = [
+ "demo/src/x-app.html",
+ "index.html",
+ "iron-resizable-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "iron-fit-behavior",
+ srcs = [
+ "index.html",
+ "iron-fit-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "iron-overlay-behavior",
+ srcs = [
+ "index.html",
+ "iron-overlay-backdrop.html",
+ "iron-overlay-behavior.html",
+ "iron-overlay-manager.html",
+ ],
+)
+
+filegroup(
+ name = "neon-animation",
+ srcs = [
+ "animations/cascaded-animation.html",
+ "animations/fade-in-animation.html",
+ "animations/fade-out-animation.html",
+ "animations/hero-animation.html",
+ "animations/opaque-animation.html",
+ "animations/reverse-ripple-animation.html",
+ "animations/ripple-animation.html",
+ "animations/scale-down-animation.html",
+ "animations/scale-up-animation.html",
+ "animations/slide-down-animation.html",
+ "animations/slide-from-left-animation.html",
+ "animations/slide-from-right-animation.html",
+ "animations/slide-left-animation.html",
+ "animations/slide-right-animation.html",
+ "animations/slide-up-animation.html",
+ "animations/transform-animation.html",
+ "demo/card/index.html",
+ "demo/card/x-card.html",
+ "demo/card/x-cards-list.html",
+ "demo/declarative/index.html",
+ "demo/doc/basic.html",
+ "demo/doc/my-animatable.html",
+ "demo/doc/my-dialog.html",
+ "demo/doc/types.html",
+ "demo/dropdown/animated-dropdown.html",
+ "demo/dropdown/index.html",
+ "demo/grid/animated-grid.html",
+ "demo/grid/fullsize-page-with-card.html",
+ "demo/grid/index.html",
+ "demo/list/full-view.html",
+ "demo/list/index.html",
+ "demo/list/list-demo.html",
+ "demo/list/list-view.html",
+ "demo/load/animated-grid.html",
+ "demo/load/full-page.html",
+ "demo/load/index.html",
+ "demo/reprojection/animated-grid.html",
+ "demo/reprojection/fullsize-page-with-card.html",
+ "demo/reprojection/index.html",
+ "demo/reprojection/reprojected-pages.html",
+ "demo/tiles/circles-page.html",
+ "demo/tiles/index.html",
+ "demo/tiles/squares-page.html",
+ "index.html",
+ "neon-animatable.html",
+ "neon-animatable-behavior.html",
+ "neon-animated-pages.html",
+ "neon-animation.html",
+ "neon-animation-behavior.html",
+ "neon-animation-runner-behavior.html",
+ "neon-animations.html",
+ "neon-shared-element-animatable-behavior.html",
+ "neon-shared-element-animation-behavior.html",
+ "web-animations.html",
+ ],
+)
+
+filegroup(
+ name = "iron-a11y-keys-behavior",
+ srcs = [
+ "index.html",
+ "iron-a11y-keys-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "plottable",
+ srcs = [
+ "plottable.css",
+ "plottable.js",
+ "plottable.min.js",
+ ],
+)
+
+filegroup(
+ name = "webcomponentsjs",
+ srcs = [
+ "CustomElements.js",
+ "CustomElements.min.js",
+ "HTMLImports.js",
+ "HTMLImports.min.js",
+ "MutationObserver.js",
+ "MutationObserver.min.js",
+ "ShadowDOM.js",
+ "ShadowDOM.min.js",
+ "webcomponents.js",
+ "webcomponents.min.js",
+ "webcomponents-lite.js",
+ "webcomponents-lite.min.js",
+ ],
+)
+
+filegroup(
+ name = "iron-validatable-behavior",
+ srcs = [
+ "index.html",
+ "iron-validatable-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "sinon-chai",
+ srcs = ["lib/sinon-chai.js"],
+)
+
+filegroup(
+ name = "paper-button",
+ srcs = [
+ "index.html",
+ "paper-button.html",
+ ],
+)
+
+filegroup(
+ name = "iron-input",
+ srcs = [
+ "index.html",
+ "iron-input.html",
+ ],
+)
+
+filegroup(
+ name = "iron-menu-behavior",
+ srcs = [
+ "index.html",
+ "iron-menu-behavior.html",
+ "iron-menubar-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "paper-slider",
+ srcs = [
+ "index.html",
+ "paper-slider.html",
+ ],
+)
+
+filegroup(
+ name = "iron-list",
+ srcs = [
+ "index.html",
+ "iron-list.html",
+ "test/smoke/avg-worst-case.html",
+ "test/smoke/dummy-data.html",
+ "test/smoke/index.html",
+ ],
+)
+
+filegroup(
+ name = "marked",
+ srcs = [
+ "Gulpfile.js",
+ "index.js",
+ "lib/marked.js",
+ "marked.min.js",
+ ],
+)
+
+filegroup(
+ name = "paper-material",
+ srcs = [
+ "index.html",
+ "paper-material.html",
+ ],
+)
+
+filegroup(
+ name = "iron-range-behavior",
+ srcs = [
+ "index.html",
+ "iron-range-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "svg-typewriter",
+ srcs = ["svgtypewriter.js"],
+)
+
+filegroup(
+ name = "web-animations-js",
+ srcs = [
+ "web-animations.html",
+ "web-animations.min.js",
+ "web-animations-next.min.js",
+ "web-animations-next-lite.min.js",
+ ],
+)
+
+filegroup(
+ name = "hydrolysis",
+ srcs = [
+ "hydrolysis.html",
+ "hydrolysis.js",
+ "hydrolysis-analyzer.html",
+ "index.js",
+ ],
+)
+
+filegroup(
+ name = "web-component-tester",
+ srcs = [
+ "browser.js",
+ "data/a11ySuite.js",
+ "data/index.html",
+ ],
+)
+
+filegroup(
+ name = "paper-toggle-button",
+ srcs = [
+ "index.html",
+ "paper-toggle-button.html",
+ ],
+)
+
+filegroup(
+ name = "paper-behaviors",
+ srcs = [
+ "index.html",
+ "paper-button-behavior.html",
+ "paper-checked-element-behavior.html",
+ "paper-inky-focus-behavior.html",
+ "paper-ripple-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "paper-radio-group",
+ srcs = [
+ "index.html",
+ "paper-radio-group.html",
+ ],
+)
+
+filegroup(
+ name = "iron-selector",
+ srcs = [
+ "index.html",
+ "iron-multi-selectable.html",
+ "iron-selectable.html",
+ "iron-selection.html",
+ "iron-selector.html",
+ ],
+)
+
+filegroup(
+ name = "iron-form-element-behavior",
+ srcs = [
+ "index.html",
+ "iron-form-element-behavior.html",
+ ],
+)
+
+filegroup(
+ name = "mocha",
+ srcs = [
+ "mocha.css",
+ "mocha.js",
+ ],
+)
+
+filegroup(
+ name = "dagre",
+ srcs = [
+ "dist/dagre.core.js",
+ "dist/dagre.core.min.js",
+ ],
+)
+
+filegroup(
+ name = "iron-behaviors",
+ srcs = [
+ "index.html",
+ "iron-button-state.html",
+ "iron-control-state.html",
+ ],
+)
+
+filegroup(
+ name = "graphlib",
+ srcs = [
+ "dist/graphlib.core.js",
+ "dist/graphlib.core.min.js",
+ ],
+)
+
+filegroup(
+ name = "iron-collapse",
+ srcs = [
+ "index.html",
+ "iron-collapse.html",
+ ],
+)
+
+filegroup(
+ name = "paper-checkbox",
+ srcs = [
+ "index.html",
+ "metadata.html",
+ "paper-checkbox.html",
+ ],
+)
+
+filegroup(
+ name = "paper-radio-button",
+ srcs = [
+ "index.html",
+ "paper-radio-button.html",
+ ],
+)
+
+filegroup(
+ name = "paper-header-panel",
+ srcs = [
+ "index.html",
+ "paper-header-panel.css",
+ "paper-header-panel.html",
+ ],
+)
+
+filegroup(
+ name = "prism-element",
+ srcs = [
+ "prism-highlighter.html",
+ "prism-import.html",
+ ],
+)
+
+filegroup(
+ name = "chai",
+ srcs = [
+ "chai.js",
+ "karma.conf.js",
+ "karma.sauce.js",
+ "sauce.browsers.js",
+ ],
+)
+
+filegroup(
+ name = "paper-menu-button",
+ srcs = [
+ "index.html",
+ "paper-menu-button.html",
+ "paper-menu-button-animations.html",
+ ],
+)
+
+filegroup(
+ name = "polymer",
+ srcs = [
+ "polymer.html",
+ "polymer-micro.html",
+ "polymer-mini.html",
+ ],
+)
+
+filegroup(
+ name = "paper-ripple",
+ srcs = [
+ "index.html",
+ "paper-ripple.html",
+ ],
+)
+
+filegroup(
+ name = "iron-iconset-svg",
+ srcs = [
+ "index.html",
+ "iron-iconset-svg.html",
+ ],
+)
diff --git a/configure b/configure
new file mode 100755
index 0000000000..d121c6f821
--- /dev/null
+++ b/configure
@@ -0,0 +1,82 @@
+#!/bin/bash
+
+## Set up Cuda-related environment settings
+
+while [ "$TF_NEED_CUDA" == "" ]; do
+  read -p "Do you wish to build TensorFlow with GPU support? [y/n] " INPUT
+ case $INPUT in
+ [Yy]* ) echo -e "GPU support will be enabled for TensorFlow\n"; TF_NEED_CUDA=1;;
+ [Nn]* ) echo -e "No GPU support will be enabled for TensorFlow\n"; TF_NEED_CUDA=0;;
+ * ) echo "Invalid selection: " $INPUT;;
+ esac
+done
+
+if [ "$TF_NEED_CUDA" == "0" ]; then
+ echo "Configuration finished"
+ exit
+fi
+
+# Find out where the CUDA toolkit is installed
+while true; do
+ fromuser=""
+ if [ -z "$CUDA_TOOLKIT_PATH" ]; then
+ default_cuda_path=/usr/local/cuda
+    read -p "Please specify the location where the CUDA 7.0 toolkit is installed. Refer to README.md for more details. [Default is $default_cuda_path]: " CUDA_TOOLKIT_PATH
+ fromuser="1"
+ if [ -z "$CUDA_TOOLKIT_PATH" ]; then
+ CUDA_TOOLKIT_PATH=$default_cuda_path
+ fi
+ fi
+ if [ -e "$CUDA_TOOLKIT_PATH/lib64/libcudart.so.7.0" ]; then
+ break
+ fi
+ echo "Invalid path to CUDA 7.0 toolkit. ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.7.0 cannot be found"
+ if [ -z "$fromuser" ]; then
+ exit 1
+ fi
+ CUDA_TOOLKIT_PATH=""
+ # Retry
+done
+
+# Find out where the CUDNN library is installed
+while true; do
+ fromuser=""
+ if [ -z "$CUDNN_INSTALL_PATH" ]; then
+ default_cudnn_path=${CUDA_TOOLKIT_PATH}
+    read -p "Please specify the location where the CUDNN 6.5 V2 library is installed. Refer to README.md for more details. [Default is $default_cudnn_path]: " CUDNN_INSTALL_PATH
+ fromuser="1"
+ if [ -z "$CUDNN_INSTALL_PATH" ]; then
+ CUDNN_INSTALL_PATH=$default_cudnn_path
+ fi
+    # The result returned from "read" is used unexpanded, which makes "~" unusable.
+    # Go through one more level of expansion to handle that.
+ CUDNN_INSTALL_PATH=$(bash -c "readlink -f $CUDNN_INSTALL_PATH")
+ fi
+ if [ -e "$CUDNN_INSTALL_PATH/libcudnn.so.6.5" -o -e "$CUDNN_INSTALL_PATH/lib64/libcudnn.so.6.5" ]; then
+ break
+ fi
+  echo "Invalid path to the CUDNN 6.5 V2 library. Neither of the following two files can be found:"
+ echo "$CUDNN_INSTALL_PATH/lib64/libcudnn.so.6.5"
+ echo "$CUDNN_INSTALL_PATH/libcudnn.so.6.5"
+ if [ -z "$fromuser" ]; then
+ exit 1
+ fi
+ CUDNN_INSTALL_PATH=""
+ # Retry
+done
+
+cat > third_party/gpus/cuda/cuda.config <<EOF
+# CUDA_TOOLKIT_PATH refers to the CUDA toolkit. TensorFlow requires CUDA 7.0
+# at the moment.
+CUDA_TOOLKIT_PATH="$CUDA_TOOLKIT_PATH"
+
+# CUDNN_INSTALL_PATH refers to the CUDNN installation. The CUDNN header and
+# library files can live either directly in this directory or split across
+# its include/ and lib64/ subdirectories.
+CUDNN_INSTALL_PATH="$CUDNN_INSTALL_PATH"
+EOF
+
+# Invoke cuda_config.sh to set up TensorFlow's canonical view of the CUDA libraries.
+(cd third_party/gpus/cuda; ./cuda_config.sh;) || exit -1
+
+echo "Configuration finished"
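Because the script only prompts for values that are still unset, configuration can also be driven non-interactively by pre-seeding the environment variables it checks. A sketch (the paths are placeholders and must point at real CUDA/CUDNN installs):

    # GPU build: answer every question up front.
    TF_NEED_CUDA=1 \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/local/cuda \
    ./configure

    # CPU-only build: skip all CUDA questions.
    TF_NEED_CUDA=0 ./configure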
diff --git a/gmock.BUILD b/gmock.BUILD
new file mode 100644
index 0000000000..66cddea9f4
--- /dev/null
+++ b/gmock.BUILD
@@ -0,0 +1,23 @@
+cc_library(
+ name = "gtest",
+ srcs = [
+ "gmock-1.7.0/gtest/src/gtest-all.cc",
+ "gmock-1.7.0/src/gmock-all.cc",
+ ],
+ includes = [
+ "gmock-1.7.0",
+ "gmock-1.7.0/gtest",
+ "gmock-1.7.0/gtest/include",
+ "gmock-1.7.0/include",
+ ],
+ linkopts = ["-pthread"],
+ visibility = ["//visibility:public"],
+)
+
+cc_library(
+ name = "gtest_main",
+ srcs = ["gmock-1.7.0/src/gmock_main.cc"],
+ linkopts = ["-pthread"],
+ visibility = ["//visibility:public"],
+ deps = [":gtest"],
+)
diff --git a/google/protobuf b/google/protobuf
new file mode 160000
+Subproject commit 55ad57a235c009d0414aed1781072adda0c8913
diff --git a/jpeg.BUILD b/jpeg.BUILD
new file mode 100644
index 0000000000..ad9e44363c
--- /dev/null
+++ b/jpeg.BUILD
@@ -0,0 +1,83 @@
+SOURCES = [
+ "jaricom.c",
+ "jcapimin.c",
+ "jcapistd.c",
+ "jcarith.c",
+ "jccoefct.c",
+ "jccolor.c",
+ "jcdctmgr.c",
+ "jchuff.c",
+ "jcinit.c",
+ "jcmainct.c",
+ "jcmarker.c",
+ "jcmaster.c",
+ "jcomapi.c",
+ "jcparam.c",
+ "jcprepct.c",
+ "jcsample.c",
+ "jctrans.c",
+ "jdarith.c",
+ "jdapimin.c",
+ "jdapistd.c",
+ "jdatadst.c",
+ "jdatasrc.c",
+ "jdcoefct.c",
+ "jdcolor.c",
+ "jddctmgr.c",
+ "jdhuff.c",
+ "jdinput.c",
+ "jdmainct.c",
+ "jdmarker.c",
+ "jdmaster.c",
+ "jdmerge.c",
+ "jdpostct.c",
+ "jdsample.c",
+ "jdtrans.c",
+ "jerror.c",
+ "jfdctflt.c",
+ "jfdctfst.c",
+ "jfdctint.c",
+ "jidctflt.c",
+ "jidctfst.c",
+ "jidctint.c",
+ "jmemmgr.c",
+ "jmemnobs.c",
+ "jquant1.c",
+ "jquant2.c",
+ "jutils.c",
+]
+
+HEADERS = [
+ "cderror.h",
+ "cdjpeg.h",
+ "jconfig.h",
+ "jdct.h",
+ "jerror.h",
+ "jinclude.h",
+ "jmemsys.h",
+ "jmorecfg.h",
+ "jpegint.h",
+ "jpeglib.h",
+ "jversion.h",
+ "transupp.h",
+]
+
+prefix_dir = "jpeg-9a"
+
+genrule(
+ name = "configure",
+ srcs = glob(
+ ["**/*"],
+ exclude = [prefix_dir + "/jconfig.h"],
+ ),
+ outs = [prefix_dir + "/jconfig.h"],
+ cmd = "pushd external/jpeg_archive/%s; workdir=$$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $$workdir; pushd $$workdir; ./configure; popd; popd; cp $$workdir/jconfig.h $(@D); rm -rf $$workdir;" % prefix_dir,
+)
+
+cc_library(
+ name = "jpeg",
+ srcs = [prefix_dir + "/" + source for source in SOURCES],
+ hdrs = glob(["**/*.h"]) + [":configure"],
+ includes = [prefix_dir],
+ visibility = ["//visibility:public"],
+)
diff --git a/png.BUILD b/png.BUILD
new file mode 100644
index 0000000000..1ecf1504d9
--- /dev/null
+++ b/png.BUILD
@@ -0,0 +1,40 @@
+package(default_visibility = ["//visibility:public"])
+
+prefix_dir = "libpng-1.2.53"
+
+PNG_SOURCES = [
+ "png.c",
+ "pngerror.c",
+ "pngget.c",
+ "pngmem.c",
+ "pngpread.c",
+ "pngread.c",
+ "pngrio.c",
+ "pngrtran.c",
+ "pngrutil.c",
+ "pngset.c",
+ "pngtrans.c",
+ "pngwio.c",
+ "pngwrite.c",
+ "pngwtran.c",
+ "pngwutil.c",
+]
+
+genrule(
+ name = "configure",
+ srcs = glob(
+ ["**/*"],
+ exclude = [prefix_dir + "/config.h"],
+ ),
+ outs = [prefix_dir + "/config.h"],
+ cmd = "pushd external/png_archive/%s; workdir=$$(mktemp -d -t tmp.XXXXXXXXXX); cp -a * $$workdir; pushd $$workdir; ./configure --enable-shared=no --with-pic=no; popd; popd; cp $$workdir/config.h $(@D); rm -rf $$workdir;" % prefix_dir,
+)
+
+cc_library(
+ name = "png",
+ srcs = [prefix_dir + "/" + source for source in PNG_SOURCES],
+ hdrs = glob(["**/*.h"]) + [":configure"],
+ includes = [prefix_dir],
+ linkopts = ["-lz"],
+ visibility = ["//visibility:public"],
+)
diff --git a/six.BUILD b/six.BUILD
new file mode 100644
index 0000000000..0a507257bf
--- /dev/null
+++ b/six.BUILD
@@ -0,0 +1,12 @@
+genrule(
+ name = "copy_six",
+ srcs = ["six-1.10.0/six.py"],
+ outs = ["six.py"],
+ cmd = "cp $< $(@)",
+)
+
+py_library(
+ name = "six",
+ srcs = ["six.py"],
+ visibility = ["//visibility:public"],
+)
diff --git a/tensorflow/BUILD b/tensorflow/BUILD
new file mode 100644
index 0000000000..0dd3cd0851
--- /dev/null
+++ b/tensorflow/BUILD
@@ -0,0 +1,43 @@
+# Description:
+# TensorFlow is a computational framework, primarily for use in machine
+# learning applications.
+
+package(default_visibility = [":internal"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files([
+ "LICENSE",
+ "ACKNOWLEDGMENTS",
+])
+
+package_group(
+ name = "internal",
+ packages = ["//tensorflow/..."],
+)
+
+sh_binary(
+ name = "swig",
+ srcs = ["tools/swig/swig.sh"],
+ data = glob(["tools/swig/**"]),
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ "g3doc/sitemap.md",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
+
+py_library(
+ name = "tensorflow_py",
+ srcs = ["__init__.py"],
+ visibility = ["//visibility:public"],
+ deps = ["//tensorflow/python"],
+)
diff --git a/tensorflow/__init__.py b/tensorflow/__init__.py
new file mode 100644
index 0000000000..3e28aa85ec
--- /dev/null
+++ b/tensorflow/__init__.py
@@ -0,0 +1,4 @@
+# Bring all of the public TensorFlow interface into this
+# module.
+# pylint: disable=wildcard-import
+from tensorflow.python import *
diff --git a/tensorflow/cc/BUILD b/tensorflow/cc/BUILD
new file mode 100644
index 0000000000..8a5bf87a29
--- /dev/null
+++ b/tensorflow/cc/BUILD
@@ -0,0 +1,89 @@
+# Description:
+# TensorFlow is a computational framework, primarily for use in machine
+# learning applications.
+
+package(default_visibility = ["//tensorflow:internal"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("/tensorflow/tensorflow", "tf_copts")
+load("/tensorflow/tensorflow", "tf_gen_op_wrappers_cc")
+
+cc_library(
+ name = "cc_op_gen_main",
+ srcs = [
+ "ops/cc_op_gen.cc",
+ "ops/cc_op_gen_main.cc",
+ ],
+ hdrs = ["ops/cc_op_gen.h"],
+ copts = tf_copts(),
+ deps = [
+ "//tensorflow/core:framework",
+ ],
+)
+
+# Generates a library that contains C++ wrappers for ops.
+tf_gen_op_wrappers_cc(
+ name = "cc_ops",
+ op_lib_names = [
+ "array_ops",
+ "attention_ops",
+ "candidate_sampling_ops",
+ "control_flow_ops",
+ "data_flow_ops",
+ "image_ops",
+ "io_ops",
+ "linalg_ops",
+ "logging_ops",
+ "math_ops",
+ "nn_ops",
+ "no_op",
+ "parsing_ops",
+ "random_ops",
+ "sendrecv_ops",
+ "sparse_ops",
+ "state_ops",
+ "string_ops",
+ "summary_ops",
+ "training_ops",
+ "user_ops",
+ ],
+ other_hdrs = [
+ "ops/const_op.h",
+ "ops/standard_ops.h",
+ ],
+ other_srcs = [
+ "ops/const_op.cc",
+ ] + glob(["ops/*_grad.cc"]),
+ pkg = "//tensorflow/core",
+)
+
+cc_binary(
+ name = "tutorials_example_trainer",
+ srcs = ["tutorials/example_trainer.cc"],
+ copts = tf_copts(),
+ linkopts = [
+ "-lpthread",
+ "-lm",
+ ],
+ deps = [
+ ":cc_ops",
+ "//tensorflow/core:kernels",
+ "//tensorflow/core:local",
+ "//tensorflow/core:tensorflow",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/cc/ops/array_grad.cc b/tensorflow/cc/ops/array_grad.cc
new file mode 100644
index 0000000000..37ffed9792
--- /dev/null
+++ b/tensorflow/cc/ops/array_grad.cc
@@ -0,0 +1,32 @@
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+typedef FunctionDefHelper FDH;
+
+REGISTER_OP_NO_GRADIENT("Shape");
+REGISTER_OP_NO_GRADIENT("Rank");
+REGISTER_OP_NO_GRADIENT("Size");
+
+Status ReshapeGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ *g = FDH::Define(
+ // Arg defs
+ {"x: T", "shape: int32", "dy: T"},
+ // Ret val defs
+ {"dx: T", "dshape: int32"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ {
+ {{"x_shape"}, "Shape", {"x"}, {{"T", "$T"}}},
+ {{"dx"}, "Reshape", {"dy", "x_shape"}, {{"T", "$T"}}},
+ {{"dshape"}, "ZerosLike", {"shape"}, {{"T", DT_INT32}}},
+ });
+ // clang-format on
+ return Status::OK();
+}
+REGISTER_OP_GRADIENT("Reshape", ReshapeGrad);
+
+} // end namespace tensorflow
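A quick reading of ReshapeGrad above: because Reshape only rearranges elements, the gradient passes through unchanged apart from its shape, and no gradient flows into the shape input. A small worked example of what the three nodes compute:

    // x : shape [2, 3], reshaped with shape = [6]   =>   dy : shape [6]
    // x_shape = Shape(x)             = [2, 3]
    // dx      = Reshape(dy, x_shape)   -> shape [2, 3]
    // dshape  = ZerosLike(shape)       = [0]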
diff --git a/tensorflow/cc/ops/cc_op_gen.cc b/tensorflow/cc/ops/cc_op_gen.cc
new file mode 100644
index 0000000000..fdecf967f8
--- /dev/null
+++ b/tensorflow/cc/ops/cc_op_gen.cc
@@ -0,0 +1,350 @@
+// TODO(josh11b): Rewrite function parameter names to avoid C++ keywords
+// or "opts".
+
+#include "tensorflow/cc/ops/cc_op_gen.h"
+
+#include <unordered_map>
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/framework/op_gen_lib.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace {
+
+const int kRightMargin = 79;
+
+const char* AttrTypeName(StringPiece attr_type) {
+ static const char* kAttrTypeName[][2] = {
+ {"string", "StringPiece"},
+ {"list(string)", "gtl::ArraySlice<string>"},
+ {"int", "int64"},
+ {"list(int)", "gtl::ArraySlice<int>"},
+ {"float", "float"},
+ {"list(float)", "gtl::ArraySlice<float>"},
+ {"bool", "bool"},
+ {"list(bool)", "gtl::ArraySlice<bool>"},
+ {"type", "DataType"},
+ {"list(type)", "DataTypeSlice"},
+ {"shape", "TensorShape"},
+ {"list(shape)", "gtl::ArraySlice<TensorShape>"},
+ {"tensor", "const Tensor&"},
+ {"list(tensor)", "gtl::ArraySlice<Tensor>"},
+ {"func", "const NameAttrList&"},
+ };
+ for (size_t i = 0; i < TF_ARRAYSIZE(kAttrTypeName); ++i) {
+ if (attr_type == kAttrTypeName[i][0]) {
+ return kAttrTypeName[i][1];
+ }
+ }
+ LOG(FATAL) << "Unsupported Attr type: " << attr_type;
+ return "";
+}
+
+// Change:      Into:
+//   ABC        // ABC
+//              //
+//   DEF        // DEF
+string MakeComment(StringPiece text) {
+ string ret;
+ while (!text.empty()) {
+ int last_non_space = -1;
+ int newline;
+ for (newline = 0; newline < static_cast<int>(text.size()); ++newline) {
+ if (text[newline] == '\n') break;
+ if (text[newline] != ' ') last_non_space = newline;
+ }
+ if (last_non_space == -1) {
+ strings::StrAppend(&ret, "//\n");
+ } else {
+ strings::StrAppend(&ret, "// ", text.substr(0, last_non_space + 1), "\n");
+ }
+ text.remove_prefix(newline + 1);
+ }
+ return ret;
+}
+
+void WriteCCOp(const OpDef& op_def, WritableFile* h, WritableFile* cc) {
+ // TODO(josh11b): Better wrapping of comments.
+ string comment;
+ if (op_def.summary().empty()) {
+ comment = "TODO: add doc.\n";
+ } else {
+ comment = strings::StrCat(op_def.summary(), "\n");
+ if (!op_def.description().empty()) {
+ strings::StrAppend(&comment, "\n", op_def.description(), "\n");
+ }
+ }
+
+ static const string kSingleInputType = "NodeOut";
+ static const string kListInputType = "gtl::ArraySlice<NodeOut>";
+
+ std::vector<string> arg_types;
+ std::vector<string> arg_names;
+
+ strings::StrAppend(&comment, "\nArguments:\n");
+
+ // Map from attr name to the first input arg it is inferred from.
+ std::unordered_map<string, string> inferred_attrs;
+ for (int i = 0; i < op_def.input_arg_size(); ++i) {
+ const auto& arg(op_def.input_arg(i));
+ arg_names.emplace_back(arg.name());
+ bool is_list = false;
+
+ if (!arg.type_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.type_attr(), arg.name());
+ } else if (!arg.type_list_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.type_list_attr(),
+ arg.name());
+ is_list = true;
+ }
+ if (!arg.number_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.number_attr(), arg.name());
+ is_list = true;
+ }
+ if (is_list) {
+ arg_types.emplace_back(kListInputType);
+ } else {
+ arg_types.emplace_back(kSingleInputType);
+ }
+
+ // TODO(josh11b): Include input type information.
+ StringPiece description = arg.description();
+ if (!description.empty()) {
+ ConsumeEquals(&description);
+ strings::StrAppend(&comment, "* ", arg_names.back(), ": ",
+ arg.description(), "\n");
+ }
+ }
+
+ string options_comment;
+ for (int i = 0; i < op_def.attr_size(); ++i) {
+ const auto& attr(op_def.attr(i));
+ // Do not add inferred attrs or attrs with defaults to the C++
+ // function signature.
+ if (inferred_attrs.find(attr.name()) == inferred_attrs.end()) {
+ if (!attr.has_default_value()) {
+ arg_names.emplace_back(attr.name());
+ arg_types.emplace_back(AttrTypeName(attr.type()));
+ if (!attr.description().empty()) {
+ strings::StrAppend(&comment, "* ", arg_names.back(), ": ",
+ attr.description(), "\n");
+ }
+ } else {
+ strings::StrAppend(&options_comment, " .WithAttr(\"", attr.name(),
+ "\", ", AttrTypeName(attr.type()), "): Defaults to ",
+ SummarizeAttrValue(attr.default_value()), ".\n");
+ if (!attr.description().empty()) {
+ strings::StrAppend(&options_comment, " ", attr.description(),
+ "\n");
+ }
+ }
+ }
+ }
+ CHECK_EQ(arg_names.size(), arg_types.size());
+ strings::StrAppend(&comment, "* opts:\n", options_comment,
+ R"comment( .WithName(StringPiece): Set the Node's name
+ .WithDevice(StringPiece): Set the Node's requested device
+ .WithControlInput(Node*) / .WithControlInputs({Node*, ...}):
+ Add control dependencies on the specified Node(s).
+
+Returns a pointer to the created Node)comment");
+
+ // TODO(josh11b): Include output type information.
+ if (op_def.output_arg_size() == 0) {
+ strings::StrAppend(&comment, ".\n");
+ } else if (op_def.output_arg_size() == 1) {
+ StringPiece description = op_def.output_arg(0).description();
+ ConsumeEquals(&description);
+ if (description.empty()) {
+ strings::StrAppend(&comment, ".\n");
+ } else {
+ strings::StrAppend(&comment, ", with output:\n", description, "\n");
+ }
+ } else {
+ strings::StrAppend(&comment, ", with outputs:\n");
+ for (int o = 0; o < op_def.output_arg_size(); ++o) {
+ StringPiece description = op_def.output_arg(o).description();
+ ConsumeEquals(&description);
+ if (description.empty()) {
+ strings::StrAppend(&comment, "* ", op_def.output_arg(o).name(), "\n");
+ } else {
+ strings::StrAppend(&comment, "* ", op_def.output_arg(o).name(), ": ",
+ description, "\n");
+ }
+ }
+ }
+
+ // Write the header comment.
+ TF_CHECK_OK(h->Append(MakeComment(comment)));
+
+ // Declare the function wrapper.
+ const string prefix = strings::StrCat("Node* ", op_def.name(), "(");
+ string h_rest;
+ for (size_t i = 0; i < arg_names.size(); ++i) {
+ strings::StrAppend(&h_rest, arg_types[i], " ", arg_names[i], ", ");
+ }
+ strings::StrAppend(&h_rest, "const GraphDefBuilder::Options& opts");
+ string cc_decl = h_rest;
+ strings::StrAppend(&h_rest, ");");
+ TF_CHECK_OK(h->Append(WordWrap(prefix, h_rest, kRightMargin) + "\n\n"));
+
+ // Define the function wrapper.
+ strings::StrAppend(&cc_decl, ") {");
+ TF_CHECK_OK(cc->Append(WordWrap(prefix, cc_decl, kRightMargin) + "\n"));
+ const string op_name = strings::StrCat(" static const string kOpName = \"",
+ op_def.name(), "\";\n");
+
+ if (arg_types.empty()) {
+ TF_CHECK_OK(cc->Append(op_name));
+ TF_CHECK_OK(cc->Append(" return SourceOp(kOpName, opts);\n}\n\n"));
+ } else if (arg_types == std::vector<string>({kSingleInputType})) {
+ TF_CHECK_OK(cc->Append(op_name));
+ TF_CHECK_OK(cc->Append(strings::StrCat(" return UnaryOp(kOpName, ",
+ arg_names[0], ", opts);\n}\n\n")));
+ } else if (arg_types ==
+ std::vector<string>({kSingleInputType, kSingleInputType})) {
+ TF_CHECK_OK(cc->Append(op_name));
+ // TODO(josh11b): Word wrap this if it ever becomes necessary.
+ TF_CHECK_OK(
+ cc->Append(strings::StrCat(" return BinaryOp(kOpName, ", arg_names[0],
+ ", ", arg_names[1], ", opts);\n}\n\n")));
+ } else {
+ TF_CHECK_OK(cc->Append(" if (opts.HaveError()) return nullptr;\n"));
+ TF_CHECK_OK(cc->Append(op_name));
+ TF_CHECK_OK(cc->Append(
+ " NodeBuilder node_builder(opts.GetNameForOp(kOpName), kOpName,\n"
+ " opts.op_registry());\n"));
+ for (size_t i = 0; i < arg_names.size(); ++i) {
+ if (i < static_cast<size_t>(op_def.input_arg_size())) {
+ TF_CHECK_OK(cc->Append(
+ strings::StrCat(" node_builder.Input(", arg_names[i], ");\n")));
+ } else {
+ TF_CHECK_OK(
+ cc->Append(strings::StrCat(" node_builder.Attr(\"", arg_names[i],
+ "\", ", arg_names[i], ");\n")));
+ }
+ }
+ TF_CHECK_OK(
+ cc->Append(" return opts.FinalizeBuilder(&node_builder);\n"
+ "}\n\n"));
+ }
+}
+
+// Converts:
+// bazel-out/.../genfiles/XX
+// to: XX.
+string GetPath(const std::string& dot_h_fname) {
+ auto pos = dot_h_fname.find("/genfiles/");
+ if (pos == string::npos) return dot_h_fname;
+ // The -1 accounts for the terminating null character (\0) in "/genfiles/".
+ return dot_h_fname.substr(pos + sizeof("/genfiles/") - 1);
+}
+
+// Converts:
+// cc/ops/gen_foo_ops.h
+// to:
+// CC_OPS_GEN_FOO_OPS_H_
+string ToGuard(const std::string& path) {
+ string guard;
+ guard.reserve(path.size() + 1); // + 1 -> trailing _
+ for (const char c : path) {
+ if (c >= 'A' && c <= 'Z') {
+ guard += c;
+ } else if (c >= 'a' && c <= 'z') {
+ guard += c + 'A' - 'a';
+ } else {
+ guard += '_';
+ }
+ }
+ guard += '_';
+ return guard;
+}
+
+} // namespace
+
+void WriteCCOps(const OpList& ops, const std::string& dot_h_fname,
+ const std::string& dot_cc_fname) {
+ Env* env = Env::Default();
+ WritableFile* h = nullptr;
+ WritableFile* cc = nullptr;
+ TF_CHECK_OK(env->NewWritableFile(dot_h_fname, &h));
+ TF_CHECK_OK(env->NewWritableFile(dot_cc_fname, &cc));
+
+ // .h Header
+ const string include = GetPath(dot_h_fname);
+ const string guard = ToGuard(include);
+ // TODO(josh11b): Mention the library for which wrappers are being generated.
+ Status s;
+ s = h->Append(
+ strings::StrCat("// This file is MACHINE GENERATED! Do not edit.\n\n"
+ "#ifndef ",
+ guard,
+ "\n"
+ "#define ",
+ guard, R"header(
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+namespace ops {
+
+// These add a node to the graph from opts.
+//
+// Note for "NodeOut" inputs, you will typically either pass
+// * a {Node*, int index} (to pass the index-th output of that node), or
+// * a Node* (to pass the first output of that node).
+
+
+)header"));
+ TF_CHECK_OK(s);
+ // .cc Header
+ s = cc->Append(
+ strings::StrCat("// This file is MACHINE GENERATED! Do not edit.\n\n"
+ "#include \"",
+ include, R"header("
+
+#include "tensorflow/core/graph/node_builder.h"
+
+namespace tensorflow {
+namespace ops {
+
+)header"));
+ TF_CHECK_OK(s);
+
+ for (const auto& op_def : ops.op()) {
+ WriteCCOp(op_def, h, cc);
+ }
+
+ // .h Footer
+
+ s = h->Append(strings::StrCat(R"footer(} // namespace ops
+} // namespace tensorflow
+
+#endif // )footer",
+ guard, "\n"));
+ TF_CHECK_OK(s);
+
+ // .cc Footer
+
+ s = cc->Append(R"footer(} // namespace ops
+} // namespace tensorflow
+)footer");
+ TF_CHECK_OK(s);
+
+ TF_CHECK_OK(cc->Close());
+ TF_CHECK_OK(h->Close());
+}
+
+} // namespace tensorflow
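A minimal sketch of how the generated wrappers are typically used, mirroring the GraphDefBuilder usage in tensorflow/cc/tutorials/example_trainer.cc later in this commit (the function name here is illustrative only; the generated headers come from the cc_ops target in tensorflow/cc/BUILD):

    #include "tensorflow/cc/ops/standard_ops.h"
    #include "tensorflow/core/graph/graph_def_builder.h"

    tensorflow::GraphDef BuildTinyGraph() {
      using namespace tensorflow;  // NOLINT(build/namespaces)
      GraphDefBuilder b;
      // Pass a Node* to use a node's first output, or {node, index} to pick
      // another output.
      Node* a = ops::Const({3.f, 2.f, -1.f, 0.f}, {2, 2}, b.opts());
      ops::MatMul(a, {a, 0}, b.opts().WithName("y"));
      GraphDef def;
      TF_CHECK_OK(b.ToGraphDef(&def));
      return def;
    }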
diff --git a/tensorflow/cc/ops/cc_op_gen.h b/tensorflow/cc/ops/cc_op_gen.h
new file mode 100644
index 0000000000..1a9474ec87
--- /dev/null
+++ b/tensorflow/cc/ops/cc_op_gen.h
@@ -0,0 +1,14 @@
+#ifndef TENSORFLOW_CC_OPS_CC_OP_GEN_H_
+#define TENSORFLOW_CC_OPS_CC_OP_GEN_H_
+
+#include "tensorflow/core/framework/op_def.pb.h"
+
+namespace tensorflow {
+
+// Result is written to files dot_h and dot_cc.
+void WriteCCOps(const OpList& ops, const std::string& dot_h_fname,
+ const std::string& dot_cc_fname);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_CC_OPS_CC_OP_GEN_H_
diff --git a/tensorflow/cc/ops/cc_op_gen_main.cc b/tensorflow/cc/ops/cc_op_gen_main.cc
new file mode 100644
index 0000000000..b9f0e4a9bd
--- /dev/null
+++ b/tensorflow/cc/ops/cc_op_gen_main.cc
@@ -0,0 +1,34 @@
+#include "tensorflow/cc/ops/cc_op_gen.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/init_main.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace {
+
+void PrintAllCCOps(const std::string& dot_h, const std::string& dot_cc,
+ bool include_internal) {
+ OpList ops;
+ OpRegistry::Global()->Export(include_internal, &ops);
+ WriteCCOps(ops, dot_h, dot_cc);
+}
+
+} // namespace
+} // namespace tensorflow
+
+int main(int argc, char* argv[]) {
+ tensorflow::port::InitMain(argv[0], &argc, &argv);
+ if (argc != 4) {
+ fprintf(stderr,
+ "Usage: %s out.h out.cc include_internal\n"
+ " include_internal: 1 means include internal ops\n",
+ argv[0]);
+ exit(1);
+ }
+
+ bool include_internal = tensorflow::StringPiece("1") == argv[3];
+ tensorflow::PrintAllCCOps(argv[1], argv[2], include_internal);
+ return 0;
+}
diff --git a/tensorflow/cc/ops/const_op.cc b/tensorflow/cc/ops/const_op.cc
new file mode 100644
index 0000000000..e428e4f35e
--- /dev/null
+++ b/tensorflow/cc/ops/const_op.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/cc/ops/const_op.h"
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace ops {
+
+namespace {
+const string& OpName() {
+ static const string kOpName = "Const";
+ return kOpName;
+}
+} // namespace
+
+#define DEFINE_CONST_SCALAR(TYPE) \
+ Node* Const(TYPE s, const GraphDefBuilder::Options& options) { \
+ return Const(gtl::ArraySlice<TYPE>(&s, 1), TensorShape({}), options); \
+ }
+
+#define DEFINE_CONST_VECTOR(TYPE) \
+ Node* Const(gtl::ArraySlice<TYPE> v, \
+ const GraphDefBuilder::Options& options) { \
+ return Const(v, TensorShape({static_cast<int64>(v.size())}), options); \
+ }
+
+#define DEFINE_CONST_TENSOR(TYPE, ...) \
+ Node* Const(gtl::ArraySlice<TYPE> t, const TensorShape& shape, \
+ const GraphDefBuilder::Options& options) { \
+ if (options.HaveError()) return nullptr; \
+ NodeBuilder node_builder(options.GetNameForOp(OpName()), OpName(), \
+ options.op_registry()); \
+ const DataType dt = DataTypeToEnum<TYPE>::v(); \
+ if (t.size() == 1) { \
+ TensorProto proto; \
+ proto.set_dtype(dt); \
+ shape.AsProto(proto.mutable_tensor_shape()); \
+ __VA_ARGS__; \
+ node_builder.Attr("dtype", dt).Attr("value", proto); \
+ } else { \
+ Tensor tensor(dt, shape); \
+ if (tensor.NumElements() != static_cast<int64>(t.size())) { \
+ options.UpdateStatus(errors::InvalidArgument( \
+ t.size(), " values provided to Const() != ", tensor.NumElements(), \
+ " elements for shape ", shape.ShortDebugString())); \
+ } else { \
+ std::copy_n(t.data(), t.size(), tensor.flat<TYPE>().data()); \
+ node_builder.Attr("dtype", dt).Attr("value", tensor); \
+ } \
+ } \
+ return options.FinalizeBuilder(&node_builder); \
+ }
+
+#define DEFINE_CONST_IMPL(TYPE, ...) \
+ DEFINE_CONST_SCALAR(TYPE) \
+ DEFINE_CONST_VECTOR(TYPE) \
+ DEFINE_CONST_TENSOR(TYPE, __VA_ARGS__)
+
+#define DEFINE_CONST(TYPE, FIELD) \
+ DEFINE_CONST_IMPL(TYPE, proto.add_##FIELD(*t.begin());)
+
+DEFINE_CONST(float, float_val);
+DEFINE_CONST(double, double_val);
+DEFINE_CONST(int32, int_val);
+DEFINE_CONST(uint8, int_val);
+DEFINE_CONST(int16, int_val);
+DEFINE_CONST(int8, int_val);
+DEFINE_CONST(int64, int64_val);
+DEFINE_CONST(bool, bool_val);
+
+DEFINE_CONST_IMPL(complex64, proto.add_scomplex_val(t.begin()->real());
+ proto.add_scomplex_val(t.begin()->imag()););
+
+Node* Const(StringPiece s, const GraphDefBuilder::Options& options) {
+ if (options.HaveError()) return nullptr;
+ NodeBuilder node_builder(options.GetNameForOp(OpName()), OpName(),
+ options.op_registry());
+ TensorProto proto;
+ proto.set_dtype(DT_STRING);
+ TensorShape({}).AsProto(proto.mutable_tensor_shape());
+ proto.add_string_val(s.data(), s.size());
+ node_builder.Attr("dtype", DT_STRING).Attr("value", proto);
+ return options.FinalizeBuilder(&node_builder);
+}
+
+DEFINE_CONST_VECTOR(string)
+DEFINE_CONST_TENSOR(string, proto.add_string_val(*t.begin());)
+
+#undef DEFINE_CONST
+#undef DEFINE_CONST_IMPL
+#undef DEFINE_CONST_TENSOR
+#undef DEFINE_CONST_VECTOR
+#undef DEFINE_CONST_SCALAR
+
+Node* Const(const Tensor& t, const GraphDefBuilder::Options& options) {
+ if (options.HaveError()) return nullptr;
+ NodeBuilder node_builder(options.GetNameForOp(OpName()), OpName(),
+ options.op_registry());
+ node_builder.Attr("dtype", t.dtype()).Attr("value", t);
+ return options.FinalizeBuilder(&node_builder);
+}
+
+Node* Const(const TensorProto& proto, const GraphDefBuilder::Options& options) {
+ if (options.HaveError()) return nullptr;
+ NodeBuilder node_builder(options.GetNameForOp(OpName()), OpName(),
+ options.op_registry());
+ node_builder.Attr("dtype", proto.dtype()).Attr("value", proto);
+ return options.FinalizeBuilder(&node_builder);
+}
+
+} // namespace ops
+} // namespace tensorflow
diff --git a/tensorflow/cc/ops/const_op.h b/tensorflow/cc/ops/const_op.h
new file mode 100644
index 0000000000..1fb739b974
--- /dev/null
+++ b/tensorflow/cc/ops/const_op.h
@@ -0,0 +1,70 @@
+#ifndef TENSORFLOW_CC_OPS_CONST_OP_H_
+#define TENSORFLOW_CC_OPS_CONST_OP_H_
+
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace ops {
+
+// If a shape is specified, you may either provide the same number of values,
+// or a single value, which will be duplicated to fill out the Tensor.
+#define DECLARE_CONST(TYPE) \
+ Node* Const(TYPE s, const GraphDefBuilder::Options& options); /* Scalar */ \
+ Node* Const(gtl::ArraySlice<TYPE> v, \
+ const GraphDefBuilder::Options& options); /* Vector */ \
+ Node* Const(gtl::ArraySlice<TYPE> t, const TensorShape& shape, \
+ const GraphDefBuilder::Options& options); /* Tensor */ \
+ inline Node* Const(std::initializer_list<TYPE> v, /* Vector using {...} */ \
+ const GraphDefBuilder::Options& options) { \
+ return Const(gtl::ArraySlice<TYPE>(v), options); \
+ } \
+ inline Node* Const(std::initializer_list<TYPE> t, /* Tensor using {...} */ \
+ const TensorShape& shape, \
+ const GraphDefBuilder::Options& options) { \
+ return Const(gtl::ArraySlice<TYPE>(t), shape, options); \
+ }
+
+DECLARE_CONST(float);
+DECLARE_CONST(double);
+DECLARE_CONST(int32);
+DECLARE_CONST(uint8);
+DECLARE_CONST(int16);
+DECLARE_CONST(int8);
+DECLARE_CONST(complex64);
+DECLARE_CONST(int64);
+DECLARE_CONST(bool);
+
+#undef DECLARE_CONST
+
+// String
+Node* Const(StringPiece s, const GraphDefBuilder::Options& options);
+Node* Const(gtl::ArraySlice<string> v, const GraphDefBuilder::Options& options);
+Node* Const(gtl::ArraySlice<string> t, const TensorShape& shape,
+ const GraphDefBuilder::Options& options);
+inline Node* Const(std::initializer_list<string> v,
+ const GraphDefBuilder::Options& options) {
+ return Const(gtl::ArraySlice<string>(v), options);
+}
+inline Node* Const(std::initializer_list<string> t, const TensorShape& shape,
+ const GraphDefBuilder::Options& options) {
+ return Const(gtl::ArraySlice<string>(t), shape, options);
+}
+
+// A Tensor of any type.
+Node* Const(const Tensor& t, const GraphDefBuilder::Options& options);
+Node* Const(const TensorProto& proto, const GraphDefBuilder::Options& options);
+
+template <class T>
+Node* EmptyConst(const GraphDefBuilder::Options& options) {
+ return Const(gtl::ArraySlice<T>(), options);
+}
+
+// TODO(josh11b): Support other types (e.g. quantized ints, float16).
+
+} // namespace ops
+} // namespace tensorflow
+
+#endif // TENSORFLOW_CC_OPS_CONST_OP_H_
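A short sketch of the fill behavior described in the header comment above (assuming a GraphDefBuilder b as in the earlier sketch; the function name is illustrative only):

    #include "tensorflow/cc/ops/const_op.h"

    void AddExampleConsts(tensorflow::GraphDefBuilder& b) {
      using namespace tensorflow;  // NOLINT(build/namespaces)
      // Six values for a [2, 3] shape, laid out in row-major order.
      ops::Const({1.f, 2.f, 3.f, 4.f, 5.f, 6.f}, {2, 3}, b.opts());
      // A single value with a [2, 3] shape is duplicated to fill the tensor.
      ops::Const({1.f}, {2, 3}, b.opts());
    }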
diff --git a/tensorflow/cc/ops/functional_grad.cc b/tensorflow/cc/ops/functional_grad.cc
new file mode 100644
index 0000000000..28b8b4a0e5
--- /dev/null
+++ b/tensorflow/cc/ops/functional_grad.cc
@@ -0,0 +1,42 @@
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+typedef FunctionDefHelper FDH;
+
+Status MapAccumulateGrad(const AttrSlice& attrs, FunctionDef* ret) {
+ const NameAttrList* func;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "f", &func));
+ DataType T;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "T", &T));
+ int k;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "K", &k));
+ // The gradient function of f.
+ // f : (K*T, T, T) -> T
+ // g : (K*T, T, T, T) -> (K*T, T, T)
+ auto grad = FDH::FunctionRef("SymbolicGradient",
+ {{"f", *func},
+ {"Tin", std::vector<DataType>(k + 3, T)},
+ {"Tout", std::vector<DataType>(k + 2, T)}});
+ *ret = FDH::Define(
+ // Arg defs
+ {"theta: K*T", "x: T", "u: T", "dy: T"},
+ // Ret val defs
+ {"dtheta: K*T", "dx: T", "du: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // nodes.
+ {{{"y"},
+ "MapAccumulate",
+ {"theta", "x", "u"},
+ {{"f", *func}, {"T", "$T"}, {"K", k}}},
+ {{"dtheta", "dx", "du"},
+ "MapAccumulateGrad",
+ {"theta", "x", "u", "y", "dy"},
+ {{"g", grad}, {"T", "$T"}, {"K", k}}}});
+ return Status::OK();
+}
+REGISTER_OP_GRADIENT("MapAccumulate", MapAccumulateGrad);
+
+} // end namespace tensorflow
diff --git a/tensorflow/cc/ops/math_grad.cc b/tensorflow/cc/ops/math_grad.cc
new file mode 100644
index 0000000000..4e8baa0d10
--- /dev/null
+++ b/tensorflow/cc/ops/math_grad.cc
@@ -0,0 +1,566 @@
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+typedef FunctionDefHelper FDH;
+
+// Cwise binary ops
+Status GradForUnaryCwise(FunctionDef* g, std::vector<FDH::Node> nodes) {
+ for (auto& n : nodes) {
+ if (n.attr.empty()) {
+ n.attr = {{"T", "$T"}};
+ }
+ }
+ *g = FDH::Define(
+ // Arg defs
+ {"x: T", "dy: T"},
+ // Ret val defs
+ {"dx: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ nodes);
+ return Status::OK();
+}
+
+Status AbsGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"sign"}, "Sign", {"x"}},
+ {{"dx"}, "Mul", {"dy", "sign"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Abs", AbsGrad);
+
+Status NegGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"dx"}, "Neg", {"dy"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Neg", NegGrad);
+
+Status InvGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"y"}, "Inv", {"x"}},
+ {{"y2"}, "Square", {"y"}},
+ {{"y2_neg"}, "Neg", {"y2"}},
+ {{"dx"}, "Mul", {"dy", "y2_neg"}}
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Inv", InvGrad);
+
+Status SquareGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ FDH::Const("c", 2LL),
+ {{"two"}, "Cast", {"c"}, {{"SrcT", DT_INT64}, {"DstT", "$T"}}},
+ {{"x2"}, "Mul", {"x", "two"}}, // x * 2
+ {{"dx"}, "Mul", {"dy", "x2"}}, // dy * (x * 2)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Square", SquareGrad);
+
+Status SqrtGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"y"}, "Sqrt", {"x"}},
+ {{"y_inv"}, "Inv", {"y"}},
+ FDH::Const("const", 0.5f),
+ {{"half"}, "Cast", {"const"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"a"}, "Mul", {"half", "y_inv"}}, // .5 * 1/y
+ {{"dx"}, "Mul", {"dy", "a"}}, // dy * (.5 * 1/y)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sqrt", SqrtGrad);
+
+Status RsqrtGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"x_inv"}, "Inv", {"x"}},
+ {{"y"}, "Rsqrt", {"x"}},
+ FDH::Const("const", -.5f),
+ {{"neghalf"}, "Cast", {"const"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"a"}, "Mul", {"neghalf", "x_inv"}}, // -0.5 * 1/x
+ {{"b"}, "Mul", {"a", "y"}}, // -0.5 * 1/x * y
+ {{"dx"}, "Mul", {"dy", "b"}}, // dy * (1/y * .5)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Rsqrt", RsqrtGrad);
+
+Status ExpGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"y"}, "Exp", {"x"}},
+ {{"dx"}, "Mul", {"dy", "y"}}, // dy * y
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Exp", ExpGrad);
+
+Status LogGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"x_inv"}, "Inv", {"x"}},
+ {{"dx"}, "Mul", {"dy", "x_inv"}}, // dy * 1/x
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Log", LogGrad);
+
+Status TanhGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"y"}, "Tanh", {"x"}},
+ {{"y2"}, "Square", {"y"}},
+ FDH::Const("const", 1.0f),
+ {{"one"}, "Cast", {"const"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"a"}, "Sub", {"one", "y2"}},
+ {{"dx"}, "Mul", {"dy", "a"}}, // dy * (1 - y*y)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Tanh", TanhGrad);
+
+Status SigmoidGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"y"}, "Sigmoid", {"x"}},
+ FDH::Const("const", 1.0f),
+ {{"one"}, "Cast", {"const"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"a"}, "Sub", {"one", "y"}},
+ {{"b"}, "Mul", {"y", "a"}}, // y * (1 - y)
+ {{"dx"}, "Mul", {"dy", "b"}}, // dy * y * (1 - y)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sigmoid", SigmoidGrad);
+
+Status SignGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"s"}, "Shape", {"x"}},
+ FDH::Const("zero", 0.f),
+ {{"val"}, "Cast", {"zero"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"dx"}, "Fill", {"s", "val"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sign", SignGrad);
+
+Status SinGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"cos"}, "Cos", {"x"}},
+ {{"dx"}, "Mul", {"dy", "cos"}}, // dy * cos(x)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sin", SinGrad);
+
+Status CosGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"sin"}, "Sin", {"x"}},
+ {{"neg"}, "Neg", {"sin"}},
+ {{"dx"}, "Mul", {"dy", "neg"}}, // dy * (-sin(x))
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Cos", CosGrad);
+
+Status RealGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ FDH::Const("zero", 0.f),
+ {{"dx"}, "Complex", {"dy", "zero"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Real", RealGrad);
+
+Status ImagGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ FDH::Const("zero", 0.f),
+ {{"dx"}, "Complex", {"zero", "dy"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Imag", ImagGrad);
+
+Status ConjGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForUnaryCwise(g, {
+ {{"dx"}, "Conj", {"dy"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Conj", ConjGrad);
+
+// Cwise binary ops
+//
+// TODO(zhifengc): This can be arranged as a function in the standard
+// library.
+Status GradForBinaryCwise(FunctionDef* g, std::vector<FDH::Node> body) {
+ // clang-format off
+ std::vector<FDH::Node> nodes = {
+ {{"sx"}, "Shape", {"x"}},
+ {{"sy"}, "Shape", {"y"}},
+ };
+ nodes.insert(nodes.end(), body.begin(), body.end());
+ std::vector<FDH::Node> reshapes = {
+ {{"sum_gx"}, "Sum", {"gx", "rx"}},
+ {{"dx"}, "Reshape", {"sum_gx", "sx"}},
+ {{"sum_gy"}, "Sum", {"gy", "ry"}},
+ {{"dy"}, "Reshape", {"sum_gy", "sy"}},
+ };
+ nodes.insert(nodes.end(), reshapes.begin(), reshapes.end());
+
+ // clang-format on
+ for (auto& n : nodes) {
+ if (n.attr.empty()) {
+ n.attr = {{"T", "$T"}};
+ }
+ }
+ // "BroadcastGradientArgs" doesn't need any attrs.
+ nodes.push_back({{"rx", "ry"}, "BroadcastGradientArgs", {"sx", "sy"}});
+ *g = FDH::Define(
+ // Arg defs
+ {"x: T", "y: T", "dz: T"},
+ // Ret val defs
+ {"dx: T", "dy: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ nodes);
+ return Status::OK();
+}
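A worked example of the helper above, using the Add gradient defined next: for z = Add(x, y) with x of shape [2, 3] and y of shape [3] (y is broadcast along dimension 0), gx = gy = dz, and BroadcastGradientArgs(sx, sy) returns the reduction indices rx = [] and ry = [0], so

    dx = Reshape(Sum(gx, rx), sx)   // nothing to reduce; shape stays [2, 3]
    dy = Reshape(Sum(gy, ry), sy)   // sum dz over dimension 0; shape [3]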
+
+Status AddGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"gx"}, "Identity", {"dz"}},
+ {{"gy"}, "Identity", {"dz"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Add", AddGrad);
+
+Status SubGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"gx"}, "Identity", {"dz"}},
+ {{"gy"}, "Neg", {"dz"}}, // -dz
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sub", SubGrad);
+
+Status MulGrad(const AttrSlice& attrs, FunctionDef* g) {
+ DataType T;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "T", &T));
+ if (T == DT_COMPLEX64) {
+ return GradForBinaryCwise(
+ g, {
+ {{"cy"}, "Conj", {"y"}},
+ {{"gx"}, "Mul", {"dz", "cy"}}, // dz * Conj(y)
+ {{"cx"}, "Conj", {"x"}},
+ {{"gy"}, "Mul", {"cx", "dz"}}, // Conj(x) * dz
+ });
+ } else {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"gx"}, "Mul", {"dz", "y"}}, // dz * y
+ {{"gy"}, "Mul", {"x", "dz"}}, // x * dz
+ });
+ // clang-format on
+ }
+}
+REGISTER_OP_GRADIENT("Mul", MulGrad);
+
+Status DivGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"gx"}, "Div", {"dz", "y"}},
+ {{"nx"}, "Neg", {"x"}},
+ {{"y2"}, "Square", {"y"}},
+ {{"nx_y2"}, "Div", {"nx", "y2"}},
+ {{"gy"}, "Mul", {"dz", "nx_y2"}}, // dz * (- x / y^2)
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Div", DivGrad);
+
+Status PowGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"z"}, "Pow", {"x", "y"}},
+ // dz * y * Pow(x, y - 1)
+ FDH::Const("const", 1.0f),
+ {{"one"}, "Cast", {"const"}, {{"SrcT", DT_FLOAT}, {"DstT", "$T"}}},
+ {{"t0"}, "Sub", {"y", "one"}},
+ {{"t1"}, "Pow", {"x", "t0"}},
+ {{"t2"}, "Mul", {"dz", "y"}},
+ {{"gx"}, "Mul", {"t1", "t2"}},
+ // dz * z * Log(x)
+ {{"t3"}, "Log", {"x"}},
+ {{"t4"}, "Mul", {"dz", "z"}},
+ {{"gy"}, "Mul", {"t3", "t4"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Pow", PowGrad);
+
+Status MaximumMinimumGradHelper(const string& comparator,
+ const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"c"}, comparator, {"x", "y"}},
+ {{"mask"}, "Cast", {"c"}, {{"SrcT", DT_BOOL}, {"DstT", "$T"}}},
+ {{"gx"}, "Mul", {"dz", "mask"}},
+ {{"gy"}, "Sub", {"dz", "gx"}},
+ });
+ // clang-format on
+}
+
+Status MaximumGrad(const AttrSlice& attrs, FunctionDef* g) {
+ return MaximumMinimumGradHelper("GreaterEqual", attrs, g);
+}
+REGISTER_OP_GRADIENT("Maximum", MaximumGrad);
+
+Status MinimumGrad(const AttrSlice& attrs, FunctionDef* g) {
+ return MaximumMinimumGradHelper("LessEqual", attrs, g);
+}
+REGISTER_OP_GRADIENT("Minimum", MinimumGrad);
+
+Status ComplexGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForBinaryCwise(g, {
+ {{"gx"}, "Real", {"dz"}},
+ {{"gy"}, "Imag", {"dz"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Complex", ComplexGrad);
+
+// Cwise ternary ops.
+Status SelectGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ *g = FDH::Define(
+ {"c:bool", "x:T", "y:T", "dz:T"},
+ {"dc:bool", "dx:T", "dy:T"},
+ {{"T: {float, double}"}},
+ {
+ {{"dc"}, "ZerosLike", {"c"}, {{"T", DT_BOOL}}},
+ {{"zeros"}, "ZerosLike", {"x"}, {{"T", "$T"}}},
+ {{"dx"}, "Select", {"c", "dz", "zeros"}, {{"T", "$T"}}},
+ {{"dy"}, "Select", {"c", "zeros", "dz"}, {{"T", "$T"}}},
+ });
+ // clang-format on
+ return Status::OK();
+}
+REGISTER_OP_GRADIENT("Select", SelectGrad);
+
+// N-ry ops
+// REGISTER_OP_GRADIENT("AddN", AddNGrad);
+
+// Reduction ops
+//
+// TODO(zhifengc): This helper is pretty ugly. Do something better.
+// TODO(zhifengc): This can be arranged as a function in the standard library.
+Status GradForReductionOp(FunctionDef* g, std::vector<FDH::Node> body) {
+ // Shape manipulation nodes.
+
+ // clang-format off
+ std::vector<FDH::Node> nodes = {
+ {{"x_shape"}, "Shape", {"x"}},
+ {{"x_rank"}, "Rank", {"x"}},
+ {{"i_shape"}, "Shape", {"i"}, {{"T", DT_INT32}}},
+ FDH::Const("zero", 0),
+ FDH::Const("one", 1),
+ // stitch_idx0 = Range(0, x_rank, 1)
+ {{"stitch_idx1"}, "Identity", {"i"}, {{"T", DT_INT32}}},
+ {{"stitch_idx"}, "_ListToArray", {"stitch_idx0", "stitch_idx1"},
+ {{"Tin", DataTypeSlice{DT_INT32, DT_INT32}},
+ {"T", DT_INT32}, {"N", 2}}},
+ {{"stitch_val0"}, "Identity", {"x_shape"}, {{"T", DT_INT32}}},
+ {{"stitch_val1"}, "Fill", {"i_shape", "one"}, {{"T", DT_INT32}}},
+ {{"stitch_val"}, "_ListToArray", {"stitch_val0", "stitch_val1"},
+ {{"Tin", DataTypeSlice{DT_INT32, DT_INT32}},
+ {"T", DT_INT32}, {"N", 2}}},
+ {{"y_shape"}, "DynamicStitch", {"stitch_idx", "stitch_val"},
+ {{"N", 2}, {"T", DT_INT32}}},
+ {{"tile_scaling"}, "Div", {"x_shape", "y_shape"}, {{"T", DT_INT32}}},
+ {{"di"}, "ZerosLike", {"i"}, {{"T", DT_INT32}}}
+ };
+ // clang-format on
+ nodes.insert(nodes.end(), body.begin(), body.end());
+ for (auto& n : nodes) {
+ if (n.attr.empty()) {
+ n.attr = {{"T", "$T"}};
+ }
+ }
+ // "Range" doesn't need any attr.
+ nodes.push_back({{"stitch_idx0"}, "Range", {"zero", "x_rank", "one"}, {}});
+ *g = FDH::Define(
+ // Arg defs
+ {"x:T", "i:int32", "dy:T"},
+ // Ret val defs
+ {"dx:T", "di:int32"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ nodes);
+ return Status::OK();
+}
+
+Status SumGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForReductionOp(g, {
+ {{"dy_reshaped"}, "Reshape", {"dy", "y_shape"}},
+ {{"dx"}, "Tile", {"dy_reshaped", "tile_scaling"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Sum", SumGrad);
+
+Status MeanGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ return GradForReductionOp(g, {
+ {{"factor"}, "Prod", {"tile_scaling", "zero"}, {{"T", DT_INT32}}},
+ {{"factor_T"}, "Cast", {"factor"}, {{"SrcT", DT_INT32}, {"DstT", "$T"}}},
+ {{"dy_scaled"}, "Div", {"dy", "factor_T"}},
+ {{"dy_reshaped"}, "Reshape", {"dy_scaled", "y_shape"}},
+ {{"dx"}, "Tile", {"dy_reshaped", "tile_scaling"}},
+ });
+ // clang-format on
+}
+REGISTER_OP_GRADIENT("Mean", MeanGrad);
+
+// REGISTER_OP_GRADIENT("Prod", ProdGrad);
+// REGISTER_OP_GRADIENT("SegmentSum", SegmentSumGrad);
+// REGISTER_OP_GRADIENT("SegmentMean", SegmentMeanGrad);
+// REGISTER_OP_GRADIENT("SparseSegmentSum", SparseSegmentSumGrad);
+// REGISTER_OP_GRADIENT("SparseSegmentMean", SparseSegmentMeanGrad);
+// REGISTER_OP_GRADIENT("SegmentMin", SegmentMinGrad);
+// REGISTER_OP_GRADIENT("SegmentMax", SegmentMaxGrad);
+// REGISTER_OP_GRADIENT("UnsortedSegmentSum", UnsortedSegmentSumGrad);
+
+Status MinMaxGradHelper(const string& op, const AttrSlice& attrs,
+ FunctionDef* g) {
+ // clang-format off
+ *g = FDH::Define(
+ // Arg defs
+ {"x:T", "i:int32", "dy:T"},
+ // Ret val defs
+ {"dx:T", "di:int32"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ {
+ // keep_dims because we need to do x == y, which requires x
+ // and y to be broadcastable.
+ {{"y"}, op, {"x", "i"}, {{"T", "$T"}, {"keep_dims", true}}},
+ {{"mask"}, "Equal", {"x", "y"}, {{"T", "$T"}}},
+ {{"mask_cast"}, "Cast", {"mask"}, {{"SrcT", DT_BOOL}, {"DstT", "$T"}}},
+ {{"mask_sum"}, "Sum", {"mask_cast", "i"}, {{"T", "$T"}}},
+ {{"norm_dy"}, "Div", {"dy", "mask_sum"}, {{"T", "$T"}}},
+ {{"sy"}, "Shape", {"y"}, {{"T", "$T"}}},
+ {{"norm_dy_reshaped"}, "Reshape", {"norm_dy", "sy"}, {{"T", "$T"}}},
+ {{"dx"}, "Mul", {"mask_cast", "norm_dy_reshaped"}, {{"T", "$T"}}},
+ {{"di"}, "ZerosLike", {"i"}, {{"T", DT_INT32}}}
+ });
+ // clang-format on
+ return Status::OK();
+}
+
+Status MaxGrad(const AttrSlice& attrs, FunctionDef* g) {
+ return MinMaxGradHelper("Max", attrs, g);
+}
+REGISTER_OP_GRADIENT("Max", MaxGrad);
+
+Status MinGrad(const AttrSlice& attrs, FunctionDef* g) {
+ return MinMaxGradHelper("Min", attrs, g);
+}
+REGISTER_OP_GRADIENT("Min", MinGrad);
+
+static Status MatMulGradHelper(FunctionDef* g, const string& x0, bool tx0,
+ const string& x1, bool tx1, const string& y0,
+ bool ty0, const string& y1, bool ty1) {
+ *g = FDH::Define(
+ // Arg defs
+ {"x: T", "y: T", "dz: T"},
+ // Ret val defs
+ {"dx: T", "dy: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ {
+ {{"dx"},
+ "MatMul",
+ {x0, x1},
+ {{"T", "$T"}, {"transpose_a", tx0}, {"transpose_b", tx1}}},
+ {{"dy"},
+ "MatMul",
+ {y0, y1},
+ {{"T", "$T"}, {"transpose_a", ty0}, {"transpose_b", ty1}}},
+ });
+ return Status::OK();
+}
+
+Status MatMulGrad(const AttrSlice& attrs, FunctionDef* g) {
+ DataType T;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "T", &T));
+ if (T == DT_COMPLEX64) {
+ return errors::Unimplemented(
+ "MatMul gradient for complex is not supported yet.");
+ }
+ bool ta;
+ bool tb;
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "transpose_a", &ta));
+ TF_RETURN_IF_ERROR(GetNodeAttr(attrs, "transpose_b", &tb));
+ if (!ta && !tb) {
+ return MatMulGradHelper(g, "dz", false, "y", true, "x", true, "dz", false);
+ }
+ if (!ta && tb) {
+ return MatMulGradHelper(g, "dz", false, "y", false, "dz", true, "x", false);
+ }
+ if (ta && !tb) {
+ return MatMulGradHelper(g, "y", false, "dz", true, "x", false, "dz", false);
+ }
+ CHECK(ta && tb);
+ return MatMulGradHelper(g, "y", true, "dz", true, "dz", true, "x", true);
+}
+REGISTER_OP_GRADIENT("MatMul", MatMulGrad);
+
+// REGISTER_OP_GRADIENT("SparseMatMul", SparseMatMulGrad);
+// REGISTER_OP_GRADIENT("BatchMatMul", BatchMatMulGrad);
+
+// Comparison ops.
+REGISTER_OP_NO_GRADIENT("Less");
+REGISTER_OP_NO_GRADIENT("LessEqual");
+REGISTER_OP_NO_GRADIENT("Greater");
+REGISTER_OP_NO_GRADIENT("GreaterEqual");
+REGISTER_OP_NO_GRADIENT("Equal");
+REGISTER_OP_NO_GRADIENT("NotEqual");
+
+// Logical ops.
+REGISTER_OP_NO_GRADIENT("LogicalAnd");
+REGISTER_OP_NO_GRADIENT("LogicalOr");
+REGISTER_OP_NO_GRADIENT("LogicalNot");
+
+// Sequence generation ops.
+REGISTER_OP_NO_GRADIENT("Range");
+REGISTER_OP_NO_GRADIENT("LinSpace");
+
+} // end namespace tensorflow
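For reference, the four MatMulGradHelper cases above encode the standard matrix-calculus identities for z = MatMul(x, y) under each combination of transpose flags (^T denotes transpose):

    transpose_a = false, transpose_b = false :  z = x y      dx = dz y^T    dy = x^T dz
    transpose_a = false, transpose_b = true  :  z = x y^T    dx = dz y      dy = dz^T x
    transpose_a = true,  transpose_b = false :  z = x^T y    dx = y dz^T    dy = x dz
    transpose_a = true,  transpose_b = true  :  z = x^T y^T  dx = y^T dz^T  dy = dz^T x^T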
diff --git a/tensorflow/cc/ops/nn_grad.cc b/tensorflow/cc/ops/nn_grad.cc
new file mode 100644
index 0000000000..89b037e3c8
--- /dev/null
+++ b/tensorflow/cc/ops/nn_grad.cc
@@ -0,0 +1,55 @@
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+typedef FunctionDefHelper FDH;
+
+Status ReluGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ *g = FDH::Define(
+ // Arg defs
+ {"x: T", "dy: T"},
+ // Ret val defs
+ {"dx: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ {
+ {{"dx"}, "ReluGrad", {"dy", "x"}, {{"T", "$T"}}}
+ });
+ // clang-format on
+ return Status::OK();
+}
+REGISTER_OP_GRADIENT("Relu", ReluGrad);
+
+Status CrossEntropyGrad(const AttrSlice& attrs, FunctionDef* g) {
+ // clang-format off
+ *g = FDH::Define(
+ // Arg defs
+ {"features: T", "labels: T", "dcost_dloss: T", "donotcare: T"},
+ // Ret val defs
+ {"dcost_dfeatures: T", "dcost_dlabels: T"},
+ // Attr defs
+ {{"T: {float, double}"}},
+ // Nodes
+ {
+ // _, dloss_dfeatures = CrossEntropy(features, labels)
+ {{"donotcare_loss", "dloss_dfeatures"}, "CrossEntropy",
+ {"features", "labels"}, {{"T", "$T"}}},
+ // dcost_dloss is of shape [batch_size].
+ // dcost_dloss_mat is of shape [batch_size, 1].
+ FDH::Const("neg1", -1),
+ {{"dcost_dloss_mat"}, "ExpandDims", {"dcost_dloss", "neg1"},
+ {{"T", "$T"}}},
+ // chain rule: dcost/dfeatures = dcost/dloss * dloss/dfeatures
+ {{"dcost_dfeatures"}, "Mul", {"dcost_dloss_mat", "dloss_dfeatures"},
+ {{"T", "$T"}}},
+ {{"dcost_dlabels"}, "ZerosLike", {"labels"}, {{"T", "$T"}}},
+ });
+ // clang-format on
+ return Status::OK();
+}
+REGISTER_OP_GRADIENT("CrossEntropy", CrossEntropyGrad);
+
+} // end namespace tensorflow
diff --git a/tensorflow/cc/ops/standard_ops.h b/tensorflow/cc/ops/standard_ops.h
new file mode 100644
index 0000000000..8d7160a7f9
--- /dev/null
+++ b/tensorflow/cc/ops/standard_ops.h
@@ -0,0 +1,26 @@
+// #include this file to get access to the standard set of C++ graph
+// definition libraries.
+
+#ifndef TENSORFLOW_CC_OPS_STANDARD_OPS_H_
+#define TENSORFLOW_CC_OPS_STANDARD_OPS_H_
+
+#include "tensorflow/cc/ops/array_ops.h"
+#include "tensorflow/cc/ops/attention_ops.h"
+#include "tensorflow/cc/ops/const_op.h"
+#include "tensorflow/cc/ops/data_flow_ops.h"
+#include "tensorflow/cc/ops/image_ops.h"
+#include "tensorflow/cc/ops/io_ops.h"
+#include "tensorflow/cc/ops/linalg_ops.h"
+#include "tensorflow/cc/ops/logging_ops.h"
+#include "tensorflow/cc/ops/math_ops.h"
+#include "tensorflow/cc/ops/nn_ops.h"
+#include "tensorflow/cc/ops/parsing_ops.h"
+#include "tensorflow/cc/ops/random_ops.h"
+#include "tensorflow/cc/ops/sparse_ops.h"
+#include "tensorflow/cc/ops/state_ops.h"
+#include "tensorflow/cc/ops/string_ops.h"
+#include "tensorflow/cc/ops/summary_ops.h"
+#include "tensorflow/cc/ops/training_ops.h"
+#include "tensorflow/cc/ops/user_ops.h"
+
+#endif // TENSORFLOW_CC_OPS_STANDARD_OPS_H_
diff --git a/tensorflow/cc/tutorials/example_trainer.cc b/tensorflow/cc/tutorials/example_trainer.cc
new file mode 100644
index 0000000000..49046dd220
--- /dev/null
+++ b/tensorflow/cc/tutorials/example_trainer.cc
@@ -0,0 +1,146 @@
+#include <cstdio>
+#include <functional>
+#include <string>
+#include <vector>
+
+#include "tensorflow/cc/ops/standard_ops.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/default_device.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/lib/core/command_line_flags.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/init_main.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace example {
+
+struct Options {
+ int num_concurrent_sessions = 10; // The number of concurrent sessions
+ int num_concurrent_steps = 10; // The number of concurrent steps
+ int num_iterations = 100; // Each step repeats this many times
+ bool use_gpu = false; // Whether to use gpu in the training
+};
+
+TF_DEFINE_int32(num_concurrent_sessions, 10, "Number of concurrent sessions");
+TF_DEFINE_int32(num_concurrent_steps, 10, "Number of concurrent steps");
+TF_DEFINE_int32(num_iterations, 100, "Number of iterations");
+TF_DEFINE_bool(use_gpu, false, "Whether to use gpu in the training");
+
+// A = [3 2; -1 0]; x = rand(2, 1);
+// We want to compute the largest eigenvalue of A.
+// repeat x = y / y.norm(); y = A * x; end
+GraphDef CreateGraphDef() {
+ // TODO(jeff,opensource): This should really be a more interesting
+ // computation. Maybe turn this into an mnist model instead?
+ GraphDefBuilder b;
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ // Store rows [3, 2] and [-1, 0] in row major format.
+ Node* a = Const({3.f, 2.f, -1.f, 0.f}, {2, 2}, b.opts());
+
+ // x is from the feed.
+ Node* x = Const({0.f}, {2, 1}, b.opts().WithName("x"));
+
+ // y = A * x
+ Node* y = MatMul(a, x, b.opts().WithName("y"));
+
+ // y2 = y.^2
+ Node* y2 = Square(y, b.opts());
+
+ // y2_sum = sum(y2)
+ Node* y2_sum = Sum(y2, Const(0, b.opts()), b.opts());
+
+ // y_norm = sqrt(y2_sum)
+ Node* y_norm = Sqrt(y2_sum, b.opts());
+
+ // y_normalized = y ./ y_norm
+ Div(y, y_norm, b.opts().WithName("y_normalized"));
+
+ GraphDef def;
+ TF_CHECK_OK(b.ToGraphDef(&def));
+ return def;
+}
+
+string DebugString(const Tensor& x, const Tensor& y) {
+ CHECK_EQ(x.NumElements(), 2);
+ CHECK_EQ(y.NumElements(), 2);
+ auto x_flat = x.flat<float>();
+ auto y_flat = y.flat<float>();
+ const float lambda = y_flat(0) / x_flat(0);
+ return strings::Printf("lambda = %8.6f x = [%8.6f %8.6f] y = [%8.6f %8.6f]",
+ lambda, x_flat(0), x_flat(1), y_flat(0), y_flat(1));
+}
+
+void ConcurrentSteps(const Options* opts, int session_index) {
+ // Creates a session.
+ SessionOptions options;
+ std::unique_ptr<Session> session(NewSession(options));
+ GraphDef def = CreateGraphDef();
+ if (options.target.empty()) {
+ graph::SetDefaultDevice(opts->use_gpu ? "/gpu:0" : "/cpu:0", &def);
+ }
+
+ TF_CHECK_OK(session->Create(def));
+
+ // Spawn M threads for M concurrent steps.
+ const int M = opts->num_concurrent_steps;
+ thread::ThreadPool step_threads(Env::Default(), "trainer", M);
+
+ for (int step = 0; step < M; ++step) {
+ step_threads.Schedule([&session, opts, session_index, step]() {
+ // Randomly initialize the input.
+ Tensor x(DT_FLOAT, TensorShape({2, 1}));
+ x.flat<float>().setRandom();
+
+ // Iterations.
+ std::vector<Tensor> outputs;
+ for (int iter = 0; iter < opts->num_iterations; ++iter) {
+ outputs.clear();
+ TF_CHECK_OK(
+ session->Run({{"x", x}}, {"y:0", "y_normalized:0"}, {}, &outputs));
+ CHECK_EQ(2, outputs.size());
+
+ const Tensor& y = outputs[0];
+ const Tensor& y_norm = outputs[1];
+ // Print out lambda, x, and y.
+ std::printf("%06d/%06d %s\n", session_index, step,
+ DebugString(x, y).c_str());
+ // Copies y_normalized to x.
+ x = y_norm;
+ }
+ });
+ }
+
+ TF_CHECK_OK(session->Close());
+}
+
+void ConcurrentSessions(const Options& opts) {
+ // Spawn N threads for N concurrent sessions.
+ const int N = opts.num_concurrent_sessions;
+ thread::ThreadPool session_threads(Env::Default(), "trainer", N);
+ for (int i = 0; i < N; ++i) {
+ session_threads.Schedule(std::bind(&ConcurrentSteps, &opts, i));
+ }
+}
+
+} // end namespace example
+} // end namespace tensorflow
+
+int main(int argc, char* argv[]) {
+ tensorflow::example::Options opts;
+ tensorflow::Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
+ if (!s.ok()) {
+ LOG(FATAL) << "Error parsing command line flags: " << s.ToString();
+ }
+ tensorflow::port::InitMain(argv[0], &argc, &argv);
+
+ opts.num_concurrent_sessions =
+ tensorflow::example::FLAGS_num_concurrent_sessions;
+ opts.num_concurrent_steps = tensorflow::example::FLAGS_num_concurrent_steps;
+ opts.num_iterations = tensorflow::example::FLAGS_num_iterations;
+ opts.use_gpu = tensorflow::example::FLAGS_use_gpu;
+ tensorflow::example::ConcurrentSessions(opts);
+}
diff --git a/tensorflow/core/BUILD b/tensorflow/core/BUILD
new file mode 100644
index 0000000000..c2fcfeed8c
--- /dev/null
+++ b/tensorflow/core/BUILD
@@ -0,0 +1,695 @@
+# Description:
+# TensorFlow is a computational framework, primarily for use in machine
+# learning applications.
+
+package(default_visibility = ["//tensorflow:internal"])
+
+package_group(name = "friends")
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("/tensorflow/tensorflow", "tf_copts")
+load("/tensorflow/tensorflow", "tf_cc_tests")
+load("/tensorflow/tensorflow", "tf_cuda_library")
+load("/tensorflow/tensorflow", "tf_gen_op_libs")
+load("/tensorflow/tensorflow", "tf_gpu_kernel_library")
+
+# For platform specific build config
+load(
+ "/tensorflow/core/platform/default/build_config",
+ "tf_proto_library",
+ "tf_additional_lib_srcs",
+ "tf_additional_test_srcs",
+ "tf_kernel_tests_linkstatic",
+)
+load(
+ "/tensorflow/core/platform/default/build_config_root",
+ "tf_cuda_tests_tags",
+)
+
+cc_library(
+ name = "lib",
+ srcs = glob(
+ [
+ "lib/**/*.h",
+ "lib/**/*.cc",
+ "platform/*.h",
+ "platform/*.cc",
+ "public/*.h",
+ ] + tf_additional_lib_srcs(),
+ exclude = [
+ "**/*test*",
+ ],
+ ),
+ copts = tf_copts(),
+ visibility = [
+ ":friends",
+ "//tensorflow:internal",
+ ],
+ deps = [
+ ":protos_cc",
+ "//tensorflow/core/platform/default/build_config:platformlib",
+ ],
+)
+
+tf_cuda_library(
+ name = "core_cpu",
+ srcs = glob(
+ [
+ "common_runtime/**/*.h",
+ "client/**/*.cc",
+ "common_runtime/**/*.cc",
+ "graph/**/*.h",
+ "graph/**/*.cc",
+ ],
+ exclude = [
+ "**/*test*",
+ "**/*main.cc",
+ "common_runtime/gpu/*.cc",
+ "common_runtime/copy_tensor.cc",
+ "common_runtime/gpu_device_factory.cc",
+ "common_runtime/local_session.cc",
+ "common_runtime/local_session.h",
+ ],
+ ),
+ hdrs = glob(["public/**/*.h"]),
+ copts = tf_copts(),
+ visibility = ["//visibility:public"],
+ deps = [
+ ":copy_tensor",
+ ":framework",
+ ":lib",
+ ":protos_cc",
+ "//third_party/eigen3",
+ ],
+ alwayslink = 1,
+)
+
+tf_cuda_library(
+ name = "framework",
+ srcs = glob(
+ [
+ "framework/**/*.h",
+ "framework/**/*.cc",
+ "util/**/*.h",
+ "util/**/*.cc",
+ ],
+ exclude = [
+ "**/*test*",
+ "**/*main.cc",
+ ],
+ ),
+ hdrs = glob(["public/**/*.h"]),
+ copts = tf_copts(),
+ visibility = ["//visibility:public"],
+ deps = [
+ ":lib",
+ ":protos_cc",
+ "//third_party/eigen3",
+ ],
+ alwayslink = 1,
+)
+
+tf_cuda_library(
+ name = "local",
+ srcs = [
+ "common_runtime/local_session.cc",
+ "common_runtime/local_session.h",
+ ],
+ copts = tf_copts(),
+ cuda_deps = [
+ ":cuda",
+ ],
+ linkstatic = 1,
+ deps = [
+ ":core",
+ ":lib",
+ ],
+ alwayslink = 1,
+)
+
+cc_library(
+ name = "copy_tensor",
+ deps = [
+ ":lib",
+ ":protos_cc",
+ ":stream_executor",
+ "//third_party/eigen3",
+ ],
+)
+
+tf_cuda_library(
+ name = "gpu_runtime",
+ srcs = glob(
+ [
+ "common_runtime/gpu/**/*.h",
+ "common_runtime/gpu/**/*.cc",
+ ],
+ exclude = [
+ "**/*main.cc",
+ "**/*test.cc",
+ ],
+ ),
+ copts = tf_copts(),
+ cuda_deps = [
+ ":cuda",
+ ],
+ linkstatic = 1,
+ deps = [
+ ":core_cpu",
+ ":lib",
+ ":protos_cc",
+ ":stream_executor",
+ "//third_party/eigen3",
+ ],
+ alwayslink = 1,
+)
+
+# Test support library needed for higher-level tests
+cc_library(
+ name = "testlib",
+ testonly = 1,
+ srcs = [
+ "common_runtime/kernel_benchmark_testlib.cc",
+ "common_runtime/kernel_benchmark_testlib.h",
+ "framework/function_testlib.cc",
+ "framework/function_testlib.h",
+ "framework/tensor_testutil.cc",
+ "framework/tensor_testutil.h",
+ "graph/testlib.cc",
+ "graph/testlib.h",
+ ],
+ copts = tf_copts(),
+ visibility = [
+ ":friends",
+ "//tensorflow:internal",
+ ],
+ deps = [
+ ":core_cpu",
+ ":tensorflow",
+ ":test",
+ "//tensorflow/core/platform/default/build_config:gtest",
+ ],
+)
+
+tf_cuda_library(
+ name = "tensorflow_opensource",
+ copts = tf_copts(),
+ visibility = ["//visibility:public"],
+ deps = [
+ ":core",
+ ":gpu_runtime",
+ ":kernels",
+ ":lib",
+ ":local",
+ ],
+)
+
+tf_cuda_library(
+ name = "kernels",
+ srcs = glob(
+ [
+ "kernels/**/*.h",
+ "kernels/**/*.cc",
+ "ops/**/*.h",
+ "ops/**/*.cc",
+ "user_ops/**/*.h",
+ "user_ops/**/*.cc",
+ ],
+ exclude = [
+ "**/*test*",
+ "**/*main.cc",
+ "kernels/**/*.cu.cc",
+ "user_ops/**/*.cu.cc",
+ ],
+ ),
+ copts = tf_copts(),
+ cuda_deps = [
+ ":gpu_kernels",
+ ":cuda",
+ ],
+ linkstatic = 0,
+ visibility = ["//visibility:public"],
+ deps = [
+ "@gemmlowp//:eight_bit_int_gemm",
+ ":core",
+ ":lib",
+ ":protos_cc",
+ ":stream_executor",
+ "//tensorflow/models/embedding:word2vec_kernels",
+ "//tensorflow/models/embedding:word2vec_ops",
+ "//third_party/eigen3",
+ ],
+ alwayslink = 1,
+)
+
+tf_gpu_kernel_library(
+ name = "gpu_kernels",
+ srcs = glob(
+ [
+ "kernels/**/*.h",
+ "kernels/*.cu.cc",
+ "user_ops/**/*.h",
+ "user_ops/*.cu.cc",
+ ],
+ ),
+ visibility = ["//visibility:public"],
+ deps = [
+ "//third_party/eigen3",
+ ],
+)
+
+# Test support library needed for all tests
+cc_library(
+ name = "test",
+ testonly = 1,
+ srcs = [
+ "platform/test.cc",
+ ] + tf_additional_test_srcs(),
+ hdrs = [
+ "platform/test.h",
+ "platform/test_benchmark.h",
+ ],
+ copts = tf_copts(),
+ linkopts = ["-lm"],
+ deps = [
+ ":lib",
+ "//tensorflow/core/platform/default/build_config:gtest",
+ ],
+)
+
+# Main program for tests
+cc_library(
+ name = "test_main",
+ testonly = 1,
+ srcs = ["platform/test_main.cc"],
+ copts = tf_copts(),
+ linkopts = ["-lm"],
+ deps = [
+ ":test",
+ "//tensorflow/core/platform/default/build_config:test_main",
+ ],
+)
+
+# TODO(opensource): Make it work externally
+tf_proto_library(
+ name = "protos_all",
+ srcs = glob(["**/*.proto"]),
+ cc_api_version = 2,
+ go_api_version = 2,
+ java_api_version = 2,
+ py_api_version = 2,
+ visibility = ["//tensorflow:internal"],
+)
+
+cc_library(
+ name = "protos_cc",
+ deps = ["//tensorflow/core/platform/default/build_config:protos_cc"],
+)
+
+# Generates one library per group of ops.
+tf_gen_op_libs(
+ op_lib_names = [
+ "array_ops",
+ "attention_ops",
+ "candidate_sampling_ops",
+ "control_flow_ops",
+ "data_flow_ops",
+ "image_ops",
+ "io_ops",
+ "linalg_ops",
+ "logging_ops",
+ "math_ops",
+ "nn_ops",
+ "no_op",
+ "parsing_ops",
+ "random_ops",
+ "sendrecv_ops",
+ "sparse_ops",
+ "state_ops",
+ "string_ops",
+ "summary_ops",
+ "training_ops",
+ ],
+)
+
+# And one for all user ops
+cc_library(
+ name = "user_ops_op_lib",
+ srcs = glob(["user_ops/**/*.cc"]),
+ copts = tf_copts(),
+ linkstatic = 1,
+ visibility = ["//visibility:public"],
+ deps = [":framework"],
+ alwayslink = 1,
+)
+
+# Low level library tests
+tf_cc_tests(
+ tests = glob(
+ [
+ "lib/**/*_test.cc",
+ "platform/**/*_test.cc",
+ ],
+ exclude = ["lib/strings/ordered_code_test.cc"],
+ ),
+ deps = [
+ ":lib",
+ ":test_main",
+ ],
+)
+
+cc_test(
+ name = "lib_jpeg_jpeg_mem_unittest",
+ srcs = ["lib/jpeg/jpeg_mem_unittest.cc"],
+ data = glob(["lib/jpeg/testdata/*.jpg"]),
+ deps = [
+ ":lib",
+ ":test_main",
+ ],
+)
+
+cc_test(
+ name = "lib_strings_ordered_code_test",
+ srcs = ["lib/strings/ordered_code_test.cc"],
+ copts = ["$(STACK_FRAME_UNLIMITED)"], # Tests initialize large vectors
+ deps = [
+ ":lib",
+ ":test_main",
+ ],
+)
+
+# Higher-level tests
+tf_cc_tests(
+ linkstatic = tf_kernel_tests_linkstatic(),
+ tests = glob(
+ [
+ "client/**/*_test.cc",
+ "common_runtime/**/*_test.cc",
+ "framework/**/*_test.cc",
+ "graph/**/*_test.cc",
+ "util/**/*_test.cc",
+ ],
+ exclude = [
+ # TODO(opensource): fix
+ "common_runtime/gpu/*_test.cc",
+ # Run by tests below
+ "common_runtime/gpu/gpu_region_allocator_test.cc",
+ "common_runtime/gpu/gpu_bfc_allocator_test.cc",
+ ],
+ ),
+ deps = [
+ ":core",
+ ":kernels",
+ ":lib",
+ ":local",
+ ":test_main",
+ ":testlib",
+ "//tensorflow/cc:cc_ops",
+ ],
+)
+
+# GPU-related tests
+tf_cc_tests(
+ linkstatic = tf_kernel_tests_linkstatic(),
+ tags = tf_cuda_tests_tags(),
+ tests = glob(
+ [
+ "kernels/**/*_test.cc",
+ "user_ops/**/*_test.cc",
+ "common_runtime/gpu/*_test.cc",
+ ],
+ ),
+ deps = [
+ ":kernels",
+ ":local",
+ ":test_main",
+ ":testlib",
+ "//tensorflow/cc:cc_ops",
+ ],
+)
+
+tf_cuda_library(
+ name = "stream_executor",
+ deps = [
+ "//tensorflow/core/platform/default/build_config:stream_executor",
+ ],
+)
+
+cc_library(
+ name = "cuda",
+ visibility = [
+ ":friends",
+ "//tensorflow:internal",
+ ],
+ deps = [
+ "//tensorflow/core/platform/default/build_config:cuda",
+ ],
+)
+
+cc_library(
+ name = "tensorflow",
+ visibility = ["//visibility:public"],
+ deps = [
+ "tensorflow_opensource",
+ "//tensorflow/core/platform/default/build_config:tensorflow_platform_specific",
+ ],
+)
+
+cc_library(
+ name = "core",
+ visibility = ["//visibility:public"],
+ deps = [
+ ":core_cpu",
+ ":gpu_runtime",
+ ],
+)
+
+# Android-specific BUILD targets
+load("/tensorflow/tensorflow", "tf_android_core_proto_sources")
+
+# List of protos we want on android
+filegroup(
+ name = "android_proto_srcs",
+ srcs = tf_android_core_proto_sources(),
+ visibility = ["//visibility:public"],
+)
+
+# Core sources. Should eventually become identical to open source
+# sources.
+filegroup(
+ name = "android_srcs",
+ srcs = glob(
+ [
+ "client/**/*.cc",
+ "common_runtime/**/*.h",
+ "common_runtime/**/*.cc",
+ "framework/**/*.h",
+ "framework/**/*.cc",
+ "graph/**/*.h",
+ "graph/**/*.cc",
+ "lib/**/*.h",
+ "lib/**/*.cc",
+ "ops/**/*.cc",
+ "ops/**/*.h",
+ "platform/*.h",
+ "platform/*.cc",
+ "platform/**/*.h",
+ "platform/**/*.cc",
+ "public/**/*.h",
+ "util/**/*.h",
+ "util/**/*.cc",
+ "kernels/ops_util.cc",
+ "kernels/ops_util.h",
+ "kernels/avgpooling_op.h",
+ "kernels/maxpooling_op.h",
+ "kernels/pooling_ops_common.h",
+ "kernels/pooling_ops_common.cc",
+ "kernels/reference_gemm.h",
+ ],
+ exclude = [
+ "**/*test.cc",
+ "**/*testutil*",
+ "**/*testlib*",
+ "**/*main.cc",
+ "lib/jpeg/*.h",
+ "lib/jpeg/*.cc",
+ "lib/png/*.h",
+ "lib/png/*.cc",
+ "util/events_writer.cc",
+ "util/events_writer.h",
+ # Exclude all protobuf/google headers except protobuf_android.h
+ "platform/google/cord_coding.h",
+ "platform/google/dynamic_annotations.h",
+ "platform/google/integral_types.h",
+ "platform/google/mutex.h",
+ "platform/google/protobuf.h",
+ "platform/google/stream_executor_util.h",
+ "platform/google/tracing_impl.h",
+ "platform/google/*.cc",
+ "platform/google/test_benchmark.cc",
+ "platform/google/test_benchmark.h",
+ "kernels/**/*.cu.cc",
+ "user_ops/**/*.cu.cc",
+ "common_runtime/gpu/*.cc",
+ "common_runtime/gpu_device_factory.cc",
+ ],
+ ),
+ visibility = ["//visibility:public"],
+)
+
+# Core kernels we want on Android. Only a subset of kernels to keep
+# base library small.
+filegroup(
+ name = "android_core_ops",
+ srcs = [
+ "//tensorflow/core:kernels/aggregate_ops.cc",
+ "//tensorflow/core:kernels/aggregate_ops.h",
+ "//tensorflow/core:kernels/assign_op.h",
+ "//tensorflow/core:kernels/bias_op.cc",
+ "//tensorflow/core:kernels/bias_op.h",
+ "//tensorflow/core:kernels/cast_op.cc",
+ "//tensorflow/core:kernels/cast_op.h",
+ "//tensorflow/core:kernels/concat_op.cc",
+ "//tensorflow/core:kernels/concat_op.h",
+ "//tensorflow/core:kernels/concat_op_cpu.cc",
+ "//tensorflow/core:kernels/constant_op.cc",
+ "//tensorflow/core:kernels/constant_op.h",
+ "//tensorflow/core:kernels/cwise_ops.h",
+ "//tensorflow/core:kernels/cwise_ops_common.cc",
+ "//tensorflow/core:kernels/cwise_ops_common.h",
+ "//tensorflow/core:kernels/dense_update_ops.cc",
+ "//tensorflow/core:kernels/dense_update_ops.h",
+ "//tensorflow/core:kernels/fill_functor.h",
+ "//tensorflow/core:kernels/gather_op.cc",
+ "//tensorflow/core:kernels/identity_op.cc",
+ "//tensorflow/core:kernels/identity_op.h",
+ "//tensorflow/core:kernels/matmul_op.cc",
+ "//tensorflow/core:kernels/matmul_op.h",
+ "//tensorflow/core:kernels/no_op.cc",
+ "//tensorflow/core:kernels/no_op.h",
+ "//tensorflow/core:kernels/pack_op.cc",
+ "//tensorflow/core:kernels/reference_gemm.h",
+ "//tensorflow/core:kernels/reshape_op.cc",
+ "//tensorflow/core:kernels/reshape_op.h",
+ "//tensorflow/core:kernels/reverse_sequence_op.cc",
+ "//tensorflow/core:kernels/reverse_sequence_op.h",
+ "//tensorflow/core:kernels/sendrecv_ops.cc",
+ "//tensorflow/core:kernels/sendrecv_ops.h",
+ "//tensorflow/core:kernels/sequence_ops.cc",
+ "//tensorflow/core:kernels/shape_ops.cc",
+ "//tensorflow/core:kernels/slice_op.cc",
+ "//tensorflow/core:kernels/slice_op.h",
+ "//tensorflow/core:kernels/softmax_op.cc",
+ "//tensorflow/core:kernels/softmax_op.h",
+ "//tensorflow/core:kernels/split_op.cc",
+ "//tensorflow/core:kernels/split_op.h",
+ "//tensorflow/core:kernels/split_op_cpu.cc",
+ "//tensorflow/core:kernels/unpack_op.cc",
+ "//tensorflow/core:kernels/variable_ops.cc",
+ "//tensorflow/core:kernels/variable_ops.h",
+ ],
+ visibility = ["//visibility:public"],
+)
+
+# Other kernels we may want on Android.
+filegroup(
+ name = "android_extended_ops",
+ srcs = [
+ "//tensorflow/core:kernels/avgpooling_op.cc",
+ "//tensorflow/core:kernels/avgpooling_op.h",
+ "//tensorflow/core:kernels/control_flow_ops.cc",
+ "//tensorflow/core:kernels/control_flow_ops.h",
+ "//tensorflow/core:kernels/conv_2d.h",
+ "//tensorflow/core:kernels/conv_ops.cc",
+ "//tensorflow/core:kernels/cwise_op_add.cc",
+ "//tensorflow/core:kernels/cwise_op_div.cc",
+ "//tensorflow/core:kernels/cwise_op_exp.cc",
+ "//tensorflow/core:kernels/cwise_op_log.cc",
+ "//tensorflow/core:kernels/cwise_op_mul.cc",
+ "//tensorflow/core:kernels/cwise_op_sigmoid.cc",
+ "//tensorflow/core:kernels/cwise_op_sqrt.cc",
+ "//tensorflow/core:kernels/cwise_op_square.cc",
+ "//tensorflow/core:kernels/cwise_op_sub.cc",
+ "//tensorflow/core:kernels/cwise_op_tanh.cc",
+ "//tensorflow/core:kernels/lrn_op.cc",
+ "//tensorflow/core:kernels/maxpooling_op.cc",
+ "//tensorflow/core:kernels/maxpooling_op.h",
+ "//tensorflow/core:kernels/reduction_ops.h",
+ "//tensorflow/core:kernels/reduction_ops_common.h",
+ "//tensorflow/core:kernels/reduction_ops_max.cc",
+ "//tensorflow/core:kernels/reduction_ops_min.cc",
+ "//tensorflow/core:kernels/reduction_ops_sum.cc",
+ "//tensorflow/core:kernels/relu_op.cc",
+ "//tensorflow/core:kernels/relu_op.h",
+ "//tensorflow/core:kernels/softplus_op.cc",
+ "//tensorflow/core:kernels/softplus_op.h",
+ "//tensorflow/core:kernels/transpose_op.cc",
+ "//tensorflow/core:kernels/transpose_op.h",
+ "//tensorflow/core:kernels/transpose_op_functor.h",
+ ],
+ visibility = ["//visibility:public"],
+)
+
+# Test data
+filegroup(
+ name = "image_testdata",
+ srcs = [
+ # PNG data
+ "lib/png/testdata/lena_gray.png",
+ "lib/png/testdata/lena_rgba.png",
+ # JPEG data
+ "lib/jpeg/testdata/jpeg_merge_test1.jpg",
+ "lib/jpeg/testdata/jpeg_merge_test1_cmyk.jpg",
+ # Corrupted JPEG files for tests
+ "lib/jpeg/testdata/bad_huffman.jpg",
+ "lib/jpeg/testdata/corrupt.jpg",
+ # -- hand-edited variant: stops at line 0
+ "lib/jpeg/testdata/corrupt34_2.jpg",
+ # -- hand-edited variant: stops at line 4
+ "lib/jpeg/testdata/corrupt34_3.jpg",
+ # -- hand-edited variant: stops after a restart marker
+ "lib/jpeg/testdata/corrupt34_4.jpg",
+ ],
+)
+
+# For portable_proto_library
+
+# Native library support for Android applications.
+# Should be built to target Android with flag --copt=-mfpu=neon.
+cc_library(
+ name = "android_tensorflow_lib",
+ srcs = [
+ "//tensorflow/core:android_core_ops",
+ "//tensorflow/core:android_extended_ops",
+ "//tensorflow/core:android_srcs",
+ ],
+ copts = [
+ "-mfpu=neon",
+ "-std=c++11",
+ ],
+ tags = [
+ "manual",
+ "notap",
+ ],
+ visibility = ["//visibility:public"],
+ deps = [
+ "@re2//:re2",
+ ":protos_cc",
+ "//third_party/eigen3",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/core/client/tensor_c_api.cc b/tensorflow/core/client/tensor_c_api.cc
new file mode 100644
index 0000000000..59cf0ed8f9
--- /dev/null
+++ b/tensorflow/core/client/tensor_c_api.cc
@@ -0,0 +1,370 @@
+#include "tensorflow/core/public/tensor_c_api.h"
+
+#include <memory>
+
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+// The implementation below is at the top level instead of the
+// brain namespace because we are defining 'extern "C"' functions.
+using tensorflow::error::Code;
+using tensorflow::errors::InvalidArgument;
+using tensorflow::gtl::ArraySlice;
+using tensorflow::AllocationDescription;
+using tensorflow::Status;
+using tensorflow::DataType;
+using tensorflow::Env;
+using tensorflow::GraphDef;
+using tensorflow::NewSession;
+using tensorflow::Session;
+using tensorflow::Tensor;
+using tensorflow::TensorBuffer;
+using tensorflow::SessionOptions;
+using tensorflow::TensorShape;
+
+extern "C" {
+
+// --------------------------------------------------------------------------
+struct TF_Status {
+ Status status;
+};
+
+TF_Status* TF_NewStatus() { return new TF_Status; }
+
+void TF_DeleteStatus(TF_Status* s) { delete s; }
+
+void TF_SetStatus(TF_Status* s, TF_Code code, const char* msg) {
+ s->status = Status(static_cast<Code>(code), tensorflow::StringPiece(msg));
+}
+
+TF_Code TF_GetCode(const TF_Status* s) {
+ return static_cast<TF_Code>(s->status.code());
+}
+
+const char* TF_Message(const TF_Status* s) {
+ return s->status.error_message().c_str();
+}
+
+// --------------------------------------------------------------------------
+
+namespace {
+class TF_ManagedBuffer : public TensorBuffer {
+ public:
+ void* data_;
+ size_t len_;
+ void (*deallocator_)(void* data, size_t len, void* arg);
+ void* deallocator_arg_;
+
+ ~TF_ManagedBuffer() override {
+ (*deallocator_)(data_, len_, deallocator_arg_);
+ }
+
+ void* data() const override { return data_; }
+ size_t size() const override { return len_; }
+ TensorBuffer* root_buffer() override { return this; }
+ void FillAllocationDescription(AllocationDescription* proto) const override {
+ tensorflow::int64 rb = size();
+ proto->set_requested_bytes(rb);
+ proto->set_allocator_name(tensorflow::cpu_allocator()->Name());
+ }
+};
+
+void deallocate_realigned_buffer(void* data, size_t len, void* arg) {
+ tensorflow::cpu_allocator()->DeallocateRaw(data);
+}
+} // namespace
+
+struct TF_Tensor {
+ TF_DataType dtype;
+ TensorShape shape;
+ TensorBuffer* buffer;
+};
+
+TF_Tensor* TF_NewTensor(TF_DataType dtype, tensorflow::int64* dims,
+ int num_dims, void* data, size_t len,
+ void (*deallocator)(void* data, size_t len, void* arg),
+ void* deallocator_arg) {
+ std::vector<tensorflow::int64> dimvec(num_dims);
+ for (int i = 0; i < num_dims; i++) {
+ dimvec[i] = dims[i];
+ }
+
+ TF_ManagedBuffer* buf = new TF_ManagedBuffer;
+ buf->len_ = len;
+ if (reinterpret_cast<intptr_t>(data) % EIGEN_MAX_ALIGN_BYTES != 0) {
+ // Copy the data into a buffer that satisfies Eigen's alignment
+ // requirements.
+ buf->data_ =
+ tensorflow::cpu_allocator()->AllocateRaw(EIGEN_MAX_ALIGN_BYTES, len);
+ std::memcpy(buf->data_, data, len);
+ buf->deallocator_ = deallocate_realigned_buffer;
+ buf->deallocator_arg_ = nullptr;
+ // Free the original buffer.
+ deallocator(data, len, deallocator_arg);
+ } else {
+ buf->data_ = data;
+ buf->deallocator_ = deallocator;
+ buf->deallocator_arg_ = deallocator_arg;
+ }
+ return new TF_Tensor{dtype, TensorShape(dimvec), buf};
+}
+
+void TF_DeleteTensor(TF_Tensor* t) {
+ t->buffer->Unref();
+ delete t;
+}
+
+TF_DataType TF_TensorType(const TF_Tensor* t) { return t->dtype; }
+int TF_NumDims(const TF_Tensor* t) { return t->shape.dims(); }
+tensorflow::int64 TF_Dim(const TF_Tensor* t, int dim_index) {
+ return t->shape.dim_size(dim_index);
+}
+size_t TF_TensorByteSize(const TF_Tensor* t) { return t->buffer->size(); }
+void* TF_TensorData(const TF_Tensor* t) { return t->buffer->data(); }
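+
+// Illustrative usage sketch (hypothetical caller-side code, not part of this
+// file): a FreeBuffer deallocator hands a heap buffer to TF_NewTensor and
+// lets TF_DeleteTensor reclaim it.  If the buffer is not sufficiently
+// aligned, TF_NewTensor copies the data and FreeBuffer is invoked
+// immediately instead.
+//
+//   static void FreeBuffer(void* data, size_t len, void* arg) { free(data); }
+//
+//   tensorflow::int64 dims[] = {2, 3};
+//   float* values = static_cast<float*>(malloc(6 * sizeof(float)));
+//   TF_Tensor* t = TF_NewTensor(TF_FLOAT, dims, 2, values, 6 * sizeof(float),
+//                               FreeBuffer, nullptr);
+//   // ... read or write through TF_TensorData(t) ...
+//   TF_DeleteTensor(t);  // Unrefs the buffer, which calls FreeBuffer.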
+
+// --------------------------------------------------------------------------
+struct TF_SessionOptions {
+ SessionOptions options;
+};
+TF_SessionOptions* TF_NewSessionOptions() { return new TF_SessionOptions; }
+void TF_DeleteSessionOptions(TF_SessionOptions* opt) { delete opt; }
+
+void TF_SetTarget(TF_SessionOptions* options, const char* target) {
+ options->options.target = target;
+}
+
+void TF_SetConfig(TF_SessionOptions* options, const char* config,
+ size_t config_len, TF_Status* status) {
+ if (!options->options.config.ParseFromArray(config, config_len)) {
+ status->status =
+ tensorflow::errors::InvalidArgument("Unparseable ConfigProto");
+ }
+}
+
+// --------------------------------------------------------------------------
+struct TF_Session {
+ Session* session;
+};
+
+TF_Session* TF_NewSession(const TF_SessionOptions* opt, TF_Status* status) {
+ Session* session;
+ status->status = NewSession(opt->options, &session);
+ if (status->status.ok()) {
+ return new TF_Session({session});
+ } else {
+ DCHECK_EQ(nullptr, session);
+ return NULL;
+ }
+}
+
+void TF_CloseSession(TF_Session* s, TF_Status* status) {
+ status->status = s->session->Close();
+}
+
+void TF_DeleteSession(TF_Session* s, TF_Status* status) {
+ status->status = Status::OK();
+ delete s->session;
+ delete s;
+}
+
+void TF_ExtendGraph(TF_Session* s, const void* proto, size_t proto_len,
+ TF_Status* status) {
+ GraphDef g;
+ if (!tensorflow::ParseProtoUnlimited(&g, proto, proto_len)) {
+ status->status = tensorflow::errors::InvalidArgument("Invalid GraphDef");
+ return;
+ }
+ status->status = s->session->Extend(g);
+}
+
+static void DeleteArray(void* data, size_t size, void* arg) {
+ DCHECK_EQ(data, arg);
+ delete[] reinterpret_cast<char*>(arg);
+}
+
+} // end extern "C"
+
+namespace tensorflow {
+
+// Non-static for testing.
+bool TF_Tensor_DecodeStrings(TF_Tensor* src, Tensor* dst, TF_Status* status) {
+ const tensorflow::int64 num_elements = src->shape.num_elements();
+ const char* input = reinterpret_cast<const char*>(TF_TensorData(src));
+ const size_t src_size = TF_TensorByteSize(src);
+ if (static_cast<tensorflow::int64>(src_size / sizeof(tensorflow::uint64)) <
+ num_elements) {
+ status->status = InvalidArgument(
+ "Malformed TF_STRING tensor; too short to hold number of elements");
+ return false;
+ }
+ const char* data_start = input + sizeof(tensorflow::uint64) * num_elements;
+ const char* limit = input + src_size;
+
+ *dst = Tensor(static_cast<DataType>(src->dtype), src->shape);
+ auto dstarray = dst->flat<tensorflow::string>();
+ for (tensorflow::int64 i = 0; i < num_elements; i++) {
+ tensorflow::uint64 offset =
+ reinterpret_cast<const tensorflow::uint64*>(input)[i];
+ tensorflow::uint64 len;
+ const char* p;
+ if (static_cast<ptrdiff_t>(offset) >= (limit - data_start) ||
+ !(p = tensorflow::core::GetVarint64Ptr(data_start + offset, limit,
+ &len)) ||
+ (static_cast<ptrdiff_t>(len) > (limit - p))) {
+ status->status = InvalidArgument("Malformed TF_STRING tensor; element ",
+ i, " out of range");
+ return false;
+ }
+ dstarray(i).assign(p, len);
+ }
+ return true;
+}
+
+// Non-static for testing.
+TF_Tensor* TF_Tensor_EncodeStrings(const Tensor& src) {
+ // Compute bytes needed for encoding.
+ size_t size = 0;
+ const auto& srcarray = src.flat<tensorflow::string>();
+ for (int i = 0; i < srcarray.size(); i++) {
+ const tensorflow::string& s = srcarray(i);
+ // uint64 starting_offset, varint64 length, string contents
+ size += sizeof(tensorflow::uint64) +
+ tensorflow::core::VarintLength(s.size()) + s.size();
+ }
+
+ // Encode all strings.
+ char* base = new char[size];
+ char* data_start = base + sizeof(tensorflow::uint64) * srcarray.size();
+ char* dst = data_start; // Where next string is encoded.
+ tensorflow::uint64* offsets = reinterpret_cast<tensorflow::uint64*>(base);
+ for (int i = 0; i < srcarray.size(); i++) {
+ const tensorflow::string& s = srcarray(i);
+ *offsets = (dst - data_start);
+ offsets++;
+ dst = tensorflow::core::EncodeVarint64(dst, s.size());
+ memcpy(dst, s.data(), s.size());
+ dst += s.size();
+ }
+ CHECK_EQ(dst, base + size);
+
+ auto dims = src.shape().dim_sizes();
+ std::vector<tensorflow::int64> dimvec(dims.size());
+ for (size_t i = 0; i < dims.size(); i++) {
+ dimvec[i] = dims[i];
+ }
+ return TF_NewTensor(TF_STRING, dimvec.data(), dimvec.size(), base, size,
+ DeleteArray, base);
+}
+
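+// Illustrative example of the encoding used above (not normative): for the
+// two strings {"ab", "c"}, TF_Tensor_EncodeStrings lays out the buffer as
+//   [uint64 offset0 = 0][uint64 offset1 = 3][varint 2]"ab"[varint 1]"c"
+// where each offset is measured from the end of the offset table.
+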
+class TensorCApi {
+ public:
+ static TensorBuffer* Buffer(const Tensor& tensor) { return tensor.buf_; }
+ static Tensor MakeTensor(TF_DataType type, const TensorShape& shape,
+ TensorBuffer* buf) {
+ return Tensor(static_cast<DataType>(type), shape, buf);
+ }
+};
+
+// Create an empty tensor of type 'dtype'. 'shape' can be arbitrary, but has to
+// result in a zero-sized tensor.
+static TF_Tensor* EmptyTensor(TF_DataType dtype, const TensorShape& shape) {
+ static char empty;
+ tensorflow::int64 nelems = 1;
+ std::vector<tensorflow::int64> dims;
+ for (int i = 0; i < shape.dims(); ++i) {
+ dims.push_back(shape.dim_size(i));
+ nelems *= shape.dim_size(i);
+ }
+ CHECK_EQ(nelems, 0);
+ return TF_NewTensor(dtype, dims.data(), shape.dims(),
+ reinterpret_cast<void*>(&empty), 0,
+ [](void*, size_t, void*) {}, nullptr);
+}
+
+} // namespace tensorflow
+
+extern "C" {
+
+void TF_Run(TF_Session* s,
+ // Input tensors
+ const char** c_input_names, TF_Tensor** c_inputs, int ninputs,
+ // Output tensors
+ const char** c_output_tensor_names, TF_Tensor** c_outputs,
+ int noutputs,
+ // Target nodes
+ const char** c_target_node_names, int ntargets, TF_Status* status) {
+ status->status = Status::OK();
+ for (int i = 0; i < noutputs; i++) {
+ c_outputs[i] = NULL;
+ }
+
+ // Initialize inputs.
+ std::vector<std::pair<tensorflow::string, Tensor>> inputs(ninputs);
+ bool ok = true;
+ for (int i = 0; i < ninputs; i++) {
+ TF_Tensor* src = c_inputs[i];
+ if (ok) {
+ inputs[i].first = c_input_names[i];
+ if (c_inputs[i]->dtype != TF_STRING) {
+ inputs[i].second = tensorflow::TensorCApi::MakeTensor(
+ src->dtype, src->shape, src->buffer);
+ } else {
+ // TF_STRING tensors require copying since Tensor class expects
+ // a sequence of string objects.
+ ok =
+ tensorflow::TF_Tensor_DecodeStrings(src, &inputs[i].second, status);
+ // Must keep looping through all inputs even if there is an error
+ // so that TF_DeleteTensor() is called unconditionally on all inputs.
+ }
+ }
+ TF_DeleteTensor(src);
+ }
+ if (!ok) {
+ return;
+ }
+
+ std::vector<tensorflow::string> output_tensor_names(noutputs);
+ std::vector<Tensor> outputs(noutputs);
+ std::vector<tensorflow::string> target_node_names(ntargets);
+ for (int i = 0; i < noutputs; i++) {
+ output_tensor_names[i] = c_output_tensor_names[i];
+ }
+ for (int i = 0; i < ntargets; i++) {
+ target_node_names[i] = c_target_node_names[i];
+ }
+ Status result =
+ s->session->Run(inputs, output_tensor_names, target_node_names, &outputs);
+ if (!result.ok()) {
+ status->status = result;
+ return;
+ }
+
+ // Store results in c_outputs[]
+ for (int i = 0; i < noutputs; i++) {
+ const Tensor& src = outputs[i];
+ if (!src.IsInitialized()) {
+ c_outputs[i] = tensorflow::EmptyTensor(
+ static_cast<TF_DataType>(src.dtype()), src.shape());
+ continue;
+ }
+ if (src.dtype() != tensorflow::DT_STRING) {
+ // Share the underlying buffer.
+ TensorBuffer* buf = tensorflow::TensorCApi::Buffer(src);
+ buf->Ref();
+ c_outputs[i] = new TF_Tensor{static_cast<TF_DataType>(src.dtype()),
+ src.shape(), buf};
+ } else {
+ c_outputs[i] = tensorflow::TF_Tensor_EncodeStrings(src);
+ }
+ }
+}
+
+} // end extern "C"
diff --git a/tensorflow/core/client/tensor_c_api_test.cc b/tensorflow/core/client/tensor_c_api_test.cc
new file mode 100644
index 0000000000..4afdd0c0df
--- /dev/null
+++ b/tensorflow/core/client/tensor_c_api_test.cc
@@ -0,0 +1,94 @@
+#include "tensorflow/core/public/tensor_c_api.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/public/tensor.h"
+
+using tensorflow::Tensor;
+using tensorflow::TensorShape;
+
+namespace tensorflow {
+bool TF_Tensor_DecodeStrings(TF_Tensor* src, Tensor* dst, TF_Status* status);
+TF_Tensor* TF_Tensor_EncodeStrings(const Tensor& src);
+} // namespace tensorflow
+
+TEST(CApi, Status) {
+ TF_Status* s = TF_NewStatus();
+ EXPECT_EQ(TF_OK, TF_GetCode(s));
+ EXPECT_EQ(tensorflow::string(), TF_Message(s));
+ TF_SetStatus(s, TF_CANCELLED, "cancel");
+ EXPECT_EQ(TF_CANCELLED, TF_GetCode(s));
+ EXPECT_EQ(tensorflow::string("cancel"), TF_Message(s));
+ TF_DeleteStatus(s);
+}
+
+static void Deallocator(void* data, size_t, void* arg) {
+ tensorflow::cpu_allocator()->DeallocateRaw(data);
+ *reinterpret_cast<bool*>(arg) = true;
+}
+
+TEST(CApi, Tensor) {
+ float* values =
+ reinterpret_cast<float*>(tensorflow::cpu_allocator()->AllocateRaw(
+ EIGEN_MAX_ALIGN_BYTES, 6 * sizeof(float)));
+ tensorflow::int64 dims[] = {2, 3};
+ bool deallocator_called = false;
+  TF_Tensor* t = TF_NewTensor(TF_FLOAT, dims, 2, values, 6 * sizeof(float),
+ &Deallocator, &deallocator_called);
+ EXPECT_FALSE(deallocator_called);
+ EXPECT_EQ(TF_FLOAT, TF_TensorType(t));
+ EXPECT_EQ(2, TF_NumDims(t));
+ EXPECT_EQ(dims[0], TF_Dim(t, 0));
+ EXPECT_EQ(dims[1], TF_Dim(t, 1));
+  EXPECT_EQ(6 * sizeof(float), TF_TensorByteSize(t));
+ EXPECT_EQ(static_cast<void*>(values), TF_TensorData(t));
+ TF_DeleteTensor(t);
+ EXPECT_TRUE(deallocator_called);
+}
+
+static void TestEncodeDecode(int line,
+ const std::vector<tensorflow::string>& data) {
+ const tensorflow::int64 n = data.size();
+ for (std::vector<tensorflow::int64> dims :
+ std::vector<std::vector<tensorflow::int64>>{
+ {n}, {1, n}, {n, 1}, {n / 2, 2}}) {
+ // Create C++ Tensor
+ Tensor src(tensorflow::DT_STRING, TensorShape(dims));
+ for (tensorflow::int64 i = 0; i < src.NumElements(); i++) {
+ src.flat<tensorflow::string>()(i) = data[i];
+ }
+ TF_Tensor* dst = TF_Tensor_EncodeStrings(src);
+
+ // Convert back to a C++ Tensor and ensure we get expected output.
+ TF_Status* status = TF_NewStatus();
+ Tensor output;
+ ASSERT_TRUE(TF_Tensor_DecodeStrings(dst, &output, status)) << line;
+ ASSERT_EQ(TF_OK, TF_GetCode(status)) << line;
+ ASSERT_EQ(src.NumElements(), output.NumElements()) << line;
+ for (tensorflow::int64 i = 0; i < src.NumElements(); i++) {
+ ASSERT_EQ(data[i], output.flat<tensorflow::string>()(i)) << line;
+ }
+
+ TF_DeleteStatus(status);
+ TF_DeleteTensor(dst);
+ }
+}
+
+TEST(CApi, TensorEncodeDecodeStrings) {
+ TestEncodeDecode(__LINE__, {});
+ TestEncodeDecode(__LINE__, {"hello"});
+ TestEncodeDecode(__LINE__,
+ {"the", "quick", "brown", "fox", "jumped", "over"});
+
+ tensorflow::string big(1000, 'a');
+ TestEncodeDecode(__LINE__, {"small", big, "small2"});
+}
+
+TEST(CApi, SessionOptions) {
+ TF_SessionOptions* opt = TF_NewSessionOptions();
+ TF_DeleteSessionOptions(opt);
+}
+
+// TODO(jeff,sanjay): Session tests
+// . Create and delete
+// . Extend graph
+// . Run
diff --git a/tensorflow/core/common_runtime/device.cc b/tensorflow/core/common_runtime/device.cc
new file mode 100644
index 0000000000..2e3e7b6597
--- /dev/null
+++ b/tensorflow/core/common_runtime/device.cc
@@ -0,0 +1,37 @@
+#include "tensorflow/core/common_runtime/device.h"
+
+#include "tensorflow/core/framework/op_segment.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+Device::Device(Env* env, const DeviceAttributes& device_attributes,
+ Allocator* device_allocator)
+ : DeviceBase(env), device_attributes_(device_attributes) {
+ CHECK(DeviceNameUtils::ParseFullName(name(), &parsed_name_))
+ << "Invalid device name: " << name();
+ rmgr_ = new ResourceMgr(parsed_name_.job);
+}
+
+Device::~Device() { delete rmgr_; }
+
+// static
+DeviceAttributes Device::BuildDeviceAttributes(
+ const string& name, DeviceType device, Bytes memory_limit,
+ BusAdjacency bus_adjacency, const string& physical_device_desc) {
+ DeviceAttributes da;
+ da.set_name(name);
+ do {
+ da.set_incarnation(random::New64());
+ } while (da.incarnation() == 0); // This proto field must not be zero
+ da.set_device_type(device.type());
+ da.set_memory_limit(memory_limit.value());
+ da.set_bus_adjacency(bus_adjacency);
+ da.set_physical_device_desc(physical_device_desc);
+ return da;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/device.h b/tensorflow/core/common_runtime/device.h
new file mode 100644
index 0000000000..ff3404fea4
--- /dev/null
+++ b/tensorflow/core/common_runtime/device.h
@@ -0,0 +1,128 @@
+// A Device is something that can perform computations as part of a
+// model. Devices can be local (runs computation on this machine), or
+// remote (contacts a device local to another machine using an RPC to
+// do the work). Devices are registered in a DeviceSet, which is also
+// responsible for the Device <-> id mapping.
+//
+// Device names
+// * Every Device should have a unique name with the format:
+// /job:___/replica:___/task:___/(gpu|cpu):___
+// An example name would be "/job:train/replica:0/task:3/gpu:2".
+// * Task numbers are within the specified replica, so there are as
+// many "task zeros" as replicas.
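+//
+// As an illustrative sketch (not part of this header), such a name can be
+// split into its components with DeviceNameUtils::ParseFullName:
+//
+//   DeviceNameUtils::ParsedName parsed;
+//   CHECK(DeviceNameUtils::ParseFullName(
+//       "/job:train/replica:0/task:3/gpu:2", &parsed));
+//   // parsed.job == "train", parsed.replica == 0, parsed.task == 3,
+//   // parsed.id == 2; parsed.type holds the device type.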
+
+#ifndef TENSORFLOW_COMMON_RUNTIME_DEVICE_H_
+#define TENSORFLOW_COMMON_RUNTIME_DEVICE_H_
+
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/control_flow.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/op_segment.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/types.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+namespace tensorflow {
+
+class Device : public DeviceBase {
+ public:
+ Device(Env* env, const DeviceAttributes& device_attributes,
+ Allocator* device_allocator);
+ ~Device() override;
+
+ // Full name of this device (see top comment).
+ const string& name() const { return device_attributes_.name(); }
+
+ // Parsed name of this device
+ const DeviceNameUtils::ParsedName parsed_name() const { return parsed_name_; }
+
+ // Describes what kind of device this is. This is intended to be
+ // human-readable and not computer-parsed, except that two devices
+ // with the same device_type() are expected to perform similarly
+ // (both from a computation and communication perspective).
+ const string& device_type() const { return device_attributes_.device_type(); }
+
+ // Returns an aggregation of device attributes.
+ const DeviceAttributes& attributes() const override {
+ return device_attributes_;
+ }
+
+ // Performs the actual compute function.
+ //
+ // Subclasses may override this function if they wish to perform
+ // some initialization before each compute.
+ virtual void Compute(OpKernel* op_kernel, OpKernelContext* context) {
+ op_kernel->Compute(context);
+ }
+
+ // Asynchronous kernel's compute.
+ virtual void ComputeAsync(AsyncOpKernel* op_kernel, OpKernelContext* context,
+ AsyncOpKernel::DoneCallback done) {
+ op_kernel->ComputeAsync(context, done);
+ }
+
+ // Blocks until all operations queued on the device at the time of
+ // the call have completed. Returns any error pending on the device
+ // at completion.
+ virtual Status Sync() = 0;
+
+ // Fill in the context map for the graph. Default behavior is to do
+ // nothing.
+ //
+ // The caller takes ownership over the DeviceContext objects given
+ // by the device.
+ virtual Status FillContextMap(const Graph* graph,
+ DeviceContextMap* device_context_map) {
+ return Status::OK();
+ }
+
+ // Returns the op segment of this device. The caller can reuse op
+ // kernels registered for the same session running on this device.
+ OpSegment* op_segment() { return &op_seg_; }
+
+ // Returns the resource manager associated w/ this device.
+ ResourceMgr* resource_manager() { return rmgr_; }
+
+ // Summarizes the status of this Device, for debugging.
+ string DebugString() const { return device_attributes_.DebugString(); }
+
+ // Assembles the parameter components into a complete DeviceAttributes value.
+ static DeviceAttributes BuildDeviceAttributes(
+ const string& name, DeviceType device, Bytes memory_limit,
+ BusAdjacency bus_adjacency, const string& physical_device_desc);
+
+ static DeviceAttributes BuildDeviceAttributes(const string& name,
+ DeviceType device,
+ Bytes memory_limit,
+ BusAdjacency bus_adjacency) {
+ // Pass in an empty string as physical device name.
+ return BuildDeviceAttributes(name, device, memory_limit, bus_adjacency, "");
+ }
+
+ private:
+ const DeviceAttributes device_attributes_;
+ DeviceNameUtils::ParsedName parsed_name_;
+
+ // op_seg_ maps session handle and op name to OpKernel objects.
+ OpSegment op_seg_;
+
+ // Resources associated w/ this device. E.g., shared variables, etc.
+ ResourceMgr* rmgr_ = nullptr;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Device);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_DEVICE_H_
diff --git a/tensorflow/core/common_runtime/device_factory.cc b/tensorflow/core/common_runtime/device_factory.cc
new file mode 100644
index 0000000000..7d391bde1d
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_factory.cc
@@ -0,0 +1,106 @@
+#include "tensorflow/core/common_runtime/device_factory.h"
+
+#include <memory>
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/session_options.h"
+
+namespace tensorflow {
+
+namespace {
+
+static mutex* get_device_factory_lock() {
+ static mutex device_factory_lock;
+ return &device_factory_lock;
+}
+
+struct FactoryItem {
+ std::unique_ptr<DeviceFactory> factory;
+ int priority;
+};
+
+std::unordered_map<string, FactoryItem>& device_factories() {
+ static std::unordered_map<string, FactoryItem>* factories =
+ new std::unordered_map<string, FactoryItem>;
+ return *factories;
+}
+} // namespace
+
+void DeviceFactory::Register(const string& device_type, DeviceFactory* factory,
+ int priority) {
+ mutex_lock l(*get_device_factory_lock());
+ std::unique_ptr<DeviceFactory> factory_ptr(factory);
+ std::unordered_map<string, FactoryItem>& factories = device_factories();
+ auto iter = factories.find(device_type);
+ if (iter == factories.end()) {
+ factories[device_type] = {std::move(factory_ptr), priority};
+ } else {
+ if (iter->second.priority < priority) {
+ iter->second = {std::move(factory_ptr), priority};
+ } else if (iter->second.priority == priority) {
+ LOG(FATAL) << "Duplicate registration of device factory for type "
+ << device_type << " with the same priority " << priority;
+ }
+ }
+}
+
+DeviceFactory* DeviceFactory::GetFactory(const string& device_type) {
+ mutex_lock l(*get_device_factory_lock()); // could use reader lock
+ auto it = device_factories().find(device_type);
+ if (it == device_factories().end()) {
+ return nullptr;
+ }
+ return it->second.factory.get();
+}
+
+void DeviceFactory::AddDevices(const SessionOptions& options,
+ const string& name_prefix,
+ std::vector<Device*>* devices) {
+ // CPU first.
+ auto cpu_factory = GetFactory("CPU");
+ if (!cpu_factory) {
+ LOG(FATAL)
+ << "CPU Factory not registered. Did you link in threadpool_device?";
+ }
+ size_t init_size = devices->size();
+ cpu_factory->CreateDevices(options, name_prefix, devices);
+ if (devices->size() == init_size) {
+ LOG(FATAL) << "No CPU devices are available in this process";
+ }
+
+ // Then GPU.
+ auto gpu_factory = GetFactory("GPU");
+ if (gpu_factory) {
+ gpu_factory->CreateDevices(options, name_prefix, devices);
+ }
+
+ // Then the rest.
+ mutex_lock l(*get_device_factory_lock());
+ for (auto& p : device_factories()) {
+ auto factory = p.second.factory.get();
+ if (factory != cpu_factory && factory != gpu_factory) {
+ factory->CreateDevices(options, name_prefix, devices);
+ }
+ }
+}
+
+Device* DeviceFactory::NewDevice(const string& type,
+ const SessionOptions& options,
+ const string& name_prefix) {
+ auto device_factory = GetFactory(type);
+ if (!device_factory) {
+ return nullptr;
+ }
+ SessionOptions opt = options;
+ (*opt.config.mutable_device_count())[type] = 1;
+ std::vector<Device*> devices;
+ device_factory->CreateDevices(opt, name_prefix, &devices);
+ CHECK_EQ(devices.size(), 1);
+ return devices[0];
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/device_factory.h b/tensorflow/core/common_runtime/device_factory.h
new file mode 100644
index 0000000000..57b625b3e5
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_factory.h
@@ -0,0 +1,69 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_DEVICE_FACTORY_H_
+#define TENSORFLOW_COMMON_RUNTIME_DEVICE_FACTORY_H_
+
+#include <string>
+#include <vector>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class Device;
+struct SessionOptions;
+
+class DeviceFactory {
+ public:
+ virtual ~DeviceFactory() {}
+ static void Register(const string& device_type, DeviceFactory* factory,
+ int priority);
+ static DeviceFactory* GetFactory(const string& device_type);
+
+ // Append to "*devices" all suitable devices, respecting
+ // any device type specific properties/counts listed in "options".
+ //
+ // CPU devices are added first.
+ static void AddDevices(const SessionOptions& options,
+ const string& name_prefix,
+ std::vector<Device*>* devices);
+
+ // Helper for tests. Create a single device of type "type". The
+ // returned device is always numbered zero, so if creating multiple
+ // devices of the same type, supply distinct name_prefix arguments.
+ static Device* NewDevice(const string& type, const SessionOptions& options,
+ const string& name_prefix);
+
+ // Most clients should call AddDevices() instead.
+ virtual void CreateDevices(const SessionOptions& options,
+ const string& name_prefix,
+ std::vector<Device*>* devices) = 0;
+};
+
+namespace dfactory {
+
+template <class Factory>
+class Registrar {
+ public:
+ // Multiple registrations for the same device type with different priorities
+ // are allowed. The registration with the highest priority will be used.
+ explicit Registrar(const string& device_type, int priority = 0) {
+ DeviceFactory::Register(device_type, new Factory(), priority);
+ }
+};
+
+} // namespace dfactory
+
+#define REGISTER_LOCAL_DEVICE_FACTORY(device_type, device_factory, ...) \
+ INTERNAL_REGISTER_LOCAL_DEVICE_FACTORY(device_type, device_factory, \
+ __COUNTER__, ##__VA_ARGS__)
+
+#define INTERNAL_REGISTER_LOCAL_DEVICE_FACTORY(device_type, device_factory, \
+ ctr, ...) \
+ static ::tensorflow::dfactory::Registrar<device_factory> \
+ INTERNAL_REGISTER_LOCAL_DEVICE_FACTORY_NAME(ctr)(device_type, \
+ ##__VA_ARGS__)
+
+// __COUNTER__ must go through another macro to be properly expanded
+#define INTERNAL_REGISTER_LOCAL_DEVICE_FACTORY_NAME(ctr) ___##ctr##__object_
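+
+// Illustrative registration sketch (MyDeviceFactory, MyDevice and the
+// "MY_DEVICE" type string are hypothetical, not defined in this codebase):
+//
+//   class MyDeviceFactory : public DeviceFactory {
+//    public:
+//     void CreateDevices(const SessionOptions& options,
+//                        const string& name_prefix,
+//                        std::vector<Device*>* devices) override {
+//       devices->push_back(new MyDevice(/* ... */));
+//     }
+//   };
+//   REGISTER_LOCAL_DEVICE_FACTORY("MY_DEVICE", MyDeviceFactory, 50);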
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_DEVICE_FACTORY_H_
diff --git a/tensorflow/core/common_runtime/device_mgr.cc b/tensorflow/core/common_runtime/device_mgr.cc
new file mode 100644
index 0000000000..4fa13f6b4b
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_mgr.cc
@@ -0,0 +1,90 @@
+#include "tensorflow/core/common_runtime/device_mgr.h"
+
+#include "tensorflow/core/common_runtime/local_device.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+namespace tensorflow {
+
+DeviceMgr::DeviceMgr(const std::vector<Device*>& devices) {
+ for (Device* d : devices) {
+ devices_.push_back(d);
+
+ // Register under both the full name and the local name.
+ device_map_[d->name()] = d;
+ device_map_[DeviceNameUtils::LocalName(d->name())] = d;
+ device_type_counts_[d->device_type()]++;
+ }
+}
+
+DeviceMgr::~DeviceMgr() {
+ for (auto p : devices_) delete p;
+}
+
+void DeviceMgr::ListDeviceAttributes(
+ std::vector<DeviceAttributes>* devices) const {
+ devices->reserve(devices_.size());
+ for (Device* dev : devices_) {
+ devices->emplace_back(dev->attributes());
+ }
+}
+
+std::vector<Device*> DeviceMgr::ListDevices() const {
+ return std::vector<Device*>(devices_.begin(), devices_.end());
+}
+
+string DeviceMgr::DebugString() const {
+ string out;
+ for (Device* dev : devices_) {
+ strings::StrAppend(&out, dev->name(), "\n");
+ }
+ return out;
+}
+
+string DeviceMgr::DeviceMappingString() const {
+ string out;
+ for (Device* dev : devices_) {
+ if (!dev->attributes().physical_device_desc().empty()) {
+ strings::StrAppend(&out, dev->name(), " -> ",
+ dev->attributes().physical_device_desc(), "\n");
+ }
+ }
+ return out;
+}
+
+Status DeviceMgr::LookupDevice(const string& name, Device** device) const {
+ Status s;
+ auto iter = device_map_.find(name);
+ if (iter == device_map_.end()) {
+ return errors::InvalidArgument(name, " unknown device.");
+ }
+ *device = iter->second;
+ return Status::OK();
+}
+
+void DeviceMgr::ClearContainers(gtl::ArraySlice<string> containers) const {
+ Status s;
+ for (Device* dev : devices_) {
+ if (containers.empty()) {
+ s.Update(dev->resource_manager()->Cleanup(
+ dev->resource_manager()->default_container()));
+ } else {
+ for (const string& c : containers) {
+ s.Update(dev->resource_manager()->Cleanup(c));
+ }
+ }
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ }
+ }
+}
+
+int DeviceMgr::NumDeviceType(const string& type) const {
+ auto iter = device_type_counts_.find(type);
+ if (iter != device_type_counts_.end()) return iter->second;
+ return 0;
+}
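+
+// Illustrative usage sketch (not part of this file); assumes the standard CPU
+// device factory is linked in and registers a device named ".../cpu:0" under
+// the given prefix:
+//
+//   SessionOptions options;
+//   std::vector<Device*> devices;
+//   DeviceFactory::AddDevices(options, "/job:localhost/replica:0/task:0",
+//                             &devices);
+//   DeviceMgr device_mgr(devices);  // DeviceMgr deletes the devices later.
+//   Device* cpu0 = nullptr;
+//   TF_CHECK_OK(device_mgr.LookupDevice(
+//       "/job:localhost/replica:0/task:0/cpu:0", &cpu0));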
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/device_mgr.h b/tensorflow/core/common_runtime/device_mgr.h
new file mode 100644
index 0000000000..c57d0222aa
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_mgr.h
@@ -0,0 +1,55 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_DEVICE_MGR_H_
+#define TENSORFLOW_COMMON_RUNTIME_DEVICE_MGR_H_
+
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class DeviceAttributes;
+
+class DeviceMgr {
+ public:
+ // TODO(zhifengc): Other initialization information.
+ explicit DeviceMgr(const std::vector<Device*>& devices);
+ ~DeviceMgr();
+
+ // Returns attributes of all devices.
+ void ListDeviceAttributes(std::vector<DeviceAttributes>* devices) const;
+
+ std::vector<Device*> ListDevices() const;
+
+ // Returns a string listing all devices.
+ string DebugString() const;
+
+  // Returns a string describing the mapping from each device name to its
+  // physical device description.
+ string DeviceMappingString() const;
+
+  // Assigns to *device a pointer to the Device with the given name.
+  // Accepts either a full device name or just the replica-local suffix.
+ Status LookupDevice(const string& name, Device** device) const;
+
+  // Clears the given containers on all devices if 'containers' is
+  // non-empty. Otherwise, clears the default container on all devices.
+ void ClearContainers(gtl::ArraySlice<string> containers) const;
+
+ int NumDeviceType(const string& type) const;
+
+ private:
+ typedef gtl::InlinedVector<Device*, 8> DeviceVec;
+ DeviceVec devices_;
+ std::unordered_map<string, Device*> device_map_;
+ std::unordered_map<string, int> device_type_counts_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(DeviceMgr);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_DEVICE_MGR_H_
diff --git a/tensorflow/core/common_runtime/device_set.cc b/tensorflow/core/common_runtime/device_set.cc
new file mode 100644
index 0000000000..3b0465d9a6
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_set.cc
@@ -0,0 +1,68 @@
+#include "tensorflow/core/common_runtime/device_set.h"
+
+#include <set>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+
+namespace tensorflow {
+
+DeviceSet::DeviceSet() {}
+
+DeviceSet::~DeviceSet() {}
+
+void DeviceSet::AddDevice(Device* device) {
+ devices_.push_back(device);
+ device_by_name_.insert({device->name(), device});
+}
+
+void DeviceSet::FindMatchingDevices(const DeviceNameUtils::ParsedName& spec,
+ std::vector<Device*>* devices) const {
+ // TODO(jeff): If we are going to repeatedly lookup the set of devices
+ // for the same spec, maybe we should have a cache of some sort
+ devices->clear();
+ for (Device* d : devices_) {
+ if (DeviceNameUtils::IsCompleteSpecification(spec, d->parsed_name())) {
+ devices->push_back(d);
+ }
+ }
+}
+
+Device* DeviceSet::FindDeviceByName(const string& name) const {
+ return gtl::FindPtrOrNull(device_by_name_, name);
+}
+
+// Higher result implies lower priority.
+static int Order(const DeviceType& d) {
+ if (StringPiece(d.type()) == DEVICE_CPU) {
+ return 3;
+ } else if (StringPiece(d.type()) == DEVICE_GPU) {
+ return 2;
+ } else {
+ return 1;
+ }
+}
+
+static bool ByPriority(const DeviceType& a, const DeviceType& b) {
+ // Order by "order number"; break ties lexicographically.
+ return std::make_pair(Order(a), StringPiece(a.type())) <
+ std::make_pair(Order(b), StringPiece(b.type()));
+}
+
+std::vector<DeviceType> DeviceSet::PrioritizedDeviceTypeList() const {
+ std::vector<DeviceType> result;
+ std::set<string> seen;
+ for (Device* d : devices_) {
+ auto t = d->device_type();
+ if (seen.insert(t).second) {
+ result.emplace_back(DeviceType(t));
+ }
+ }
+ std::sort(result.begin(), result.end(), ByPriority);
+ return result;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/device_set.h b/tensorflow/core/common_runtime/device_set.h
new file mode 100644
index 0000000000..130d965891
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_set.h
@@ -0,0 +1,64 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_DEVICE_SET_H_
+#define TENSORFLOW_COMMON_RUNTIME_DEVICE_SET_H_
+
+#include <memory>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+namespace tensorflow {
+
+// DeviceSet is a container class for managing the various types of
+// devices used by a model.
+class DeviceSet {
+ public:
+ DeviceSet();
+ ~DeviceSet();
+
+ // Does not take ownership of 'device'.
+ void AddDevice(Device* device);
+
+ // Set the device designated as the "client". This device
+ // must also be registered via AddDevice().
+ void set_client_device(Device* device) { client_device_ = device; }
+
+ // Returns a pointer to the device designated as the "client".
+ Device* client_device() const { return client_device_; }
+
+ // Return the list of devices in this set.
+ const std::vector<Device*>& devices() const { return devices_; }
+
+ // Given a DeviceNameUtils::ParsedName (which may have some
+ // wildcards for different components), fills "*devices" with all
+ // devices in "*this" that match "spec".
+ void FindMatchingDevices(const DeviceNameUtils::ParsedName& spec,
+ std::vector<Device*>* devices) const;
+
+ // Finds the device with the given "fullname". Returns nullptr if
+ // not found.
+ Device* FindDeviceByName(const string& fullname) const;
+
+  // Return the list of unique device types in this set, ordered so that
+  // preferred device types come first.
+ std::vector<DeviceType> PrioritizedDeviceTypeList() const;
+
+ private:
+ // Not owned.
+ std::vector<Device*> devices_;
+
+ // Fullname -> device* for device in devices_.
+ std::unordered_map<string, Device*> device_by_name_;
+
+ // client_device_ points to an element of devices_ that we consider
+ // to be the client device (in this local process).
+ Device* client_device_ = nullptr;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(DeviceSet);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_DEVICE_SET_H_
diff --git a/tensorflow/core/common_runtime/device_set_test.cc b/tensorflow/core/common_runtime/device_set_test.cc
new file mode 100644
index 0000000000..1b80a5b697
--- /dev/null
+++ b/tensorflow/core/common_runtime/device_set_test.cc
@@ -0,0 +1,65 @@
+#include "tensorflow/core/common_runtime/device_set.h"
+
+#include "tensorflow/core/public/status.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+// Return a fake device with the specified type and name.
+static Device* Dev(const char* type, const char* name) {
+ class FakeDevice : public Device {
+ public:
+ explicit FakeDevice(const DeviceAttributes& attr)
+ : Device(nullptr, attr, nullptr) {}
+ Status Sync() override { return Status::OK(); }
+ Allocator* GetAllocator(AllocatorAttributes) override { return nullptr; }
+ };
+ DeviceAttributes attr;
+ attr.set_name(name);
+ attr.set_device_type(type);
+ return new FakeDevice(attr);
+}
+
+class DeviceSetTest : public testing::Test {
+ public:
+ void AddDevice(const char* type, const char* name) {
+ Device* d = Dev(type, name);
+ owned_.emplace_back(d);
+ devices_.AddDevice(d);
+ }
+
+ std::vector<DeviceType> types() const {
+ return devices_.PrioritizedDeviceTypeList();
+ }
+
+ private:
+ DeviceSet devices_;
+ std::vector<std::unique_ptr<Device>> owned_;
+};
+
+TEST_F(DeviceSetTest, PrioritizedDeviceTypeList) {
+ EXPECT_EQ(std::vector<DeviceType>{}, types());
+
+ AddDevice("CPU", "/job:a/replica:0/task:0/cpu:0");
+ EXPECT_EQ(std::vector<DeviceType>{DeviceType(DEVICE_CPU)}, types());
+
+ AddDevice("CPU", "/job:a/replica:0/task:0/cpu:1");
+ EXPECT_EQ(std::vector<DeviceType>{DeviceType(DEVICE_CPU)}, types());
+
+ AddDevice("GPU", "/job:a/replica:0/task:0/gpu:0");
+ EXPECT_EQ(
+ (std::vector<DeviceType>{DeviceType(DEVICE_GPU), DeviceType(DEVICE_CPU)}),
+ types());
+
+ AddDevice("T1", "/job:a/replica:0/task:0/device:T1:0");
+ AddDevice("T1", "/job:a/replica:0/task:0/device:T1:1");
+ AddDevice("T2", "/job:a/replica:0/task:0/device:T2:0");
+ EXPECT_EQ(
+ (std::vector<DeviceType>{DeviceType("T1"), DeviceType("T2"),
+ DeviceType(DEVICE_GPU), DeviceType(DEVICE_CPU)}),
+ types());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/eigen_thread_pool.h b/tensorflow/core/common_runtime/eigen_thread_pool.h
new file mode 100644
index 0000000000..2554f3521b
--- /dev/null
+++ b/tensorflow/core/common_runtime/eigen_thread_pool.h
@@ -0,0 +1,22 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_EIGEN_THREAD_POOL_H_
+#define TENSORFLOW_COMMON_RUNTIME_EIGEN_THREAD_POOL_H_
+
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+class EigenThreadPoolWrapper : public Eigen::ThreadPoolInterface {
+ public:
+ explicit EigenThreadPoolWrapper(thread::ThreadPool* pool) : pool_(pool) {}
+ ~EigenThreadPoolWrapper() override {}
+
+ void Schedule(std::function<void()> fn) override { pool_->Schedule(fn); }
+
+ private:
+ thread::ThreadPool* pool_ = nullptr;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_EIGEN_THREAD_POOL_H_
diff --git a/tensorflow/core/common_runtime/executor.cc b/tensorflow/core/common_runtime/executor.cc
new file mode 100644
index 0000000000..7f2473f93b
--- /dev/null
+++ b/tensorflow/core/common_runtime/executor.cc
@@ -0,0 +1,2118 @@
+#include "tensorflow/core/common_runtime/executor.h"
+
+#include <atomic>
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+#include <vector>
+#include <deque>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/cancellation.h"
+#include "tensorflow/core/framework/control_flow.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/op_segment.h"
+#include "tensorflow/core/framework/step_stats.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/edgeset.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/platform/tracing.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+
+namespace tensorflow {
+
+namespace {
+
+// 1-D, 0 element tensor.
+static const Tensor* const kEmptyTensor = new Tensor;
+
+bool IsInitializationOp(const Node* node) {
+ return node->op_def().allows_uninitialized_input();
+}
+
+// Sets the timeline_label field of *node_stats, using data from *node.
+// Returns true iff the node is a transfer node.
+// TODO(tucker): merge with the DetailText function in session.cc
+// in a common location.
+bool SetTimelineLabel(const Node* node, NodeExecStats* node_stats) {
+ bool is_transfer_node = false;
+ string memory;
+ for (auto& all : node_stats->memory()) {
+ int64 tot = all.total_bytes();
+ if (tot >= 0.1 * 1048576.0) {
+ int64 peak = all.peak_bytes();
+ if (peak > 0) {
+ memory =
+ strings::StrCat(memory, "[", all.allocator_name(),
+ strings::Printf(" %.1fMB %.1fMB] ", tot / 1048576.0,
+ peak / 1048576.0));
+ } else {
+ memory = strings::StrCat(memory, "[", all.allocator_name(),
+ strings::Printf(" %.1fMB] ", tot / 1048576.0));
+ }
+ }
+ }
+ const NodeDef& def = node->def();
+ string text = "";
+ if (IsSend(node)) {
+ string tensor_name;
+ TF_CHECK_OK(GetNodeAttr(def, "tensor_name", &tensor_name));
+ string recv_device;
+ TF_CHECK_OK(GetNodeAttr(def, "recv_device", &recv_device));
+ text = strings::StrCat(memory, def.name(), " = ", def.op(), "(",
+ tensor_name, " @", recv_device);
+ is_transfer_node = true;
+ } else if (IsRecv(node)) {
+ string tensor_name;
+ TF_CHECK_OK(GetNodeAttr(def, "tensor_name", &tensor_name));
+ string send_device;
+ TF_CHECK_OK(GetNodeAttr(def, "send_device", &send_device));
+ text = strings::StrCat(memory, def.name(), " = ", def.op(), "(",
+ tensor_name, " @", send_device);
+ is_transfer_node = true;
+ } else {
+ text = strings::StrCat(
+ memory, def.name(), " = ", def.op(), "(",
+ str_util::Join(
+ std::vector<StringPiece>(def.input().begin(), def.input().end()),
+ ", "),
+ ")");
+ }
+ node_stats->set_timeline_label(text);
+ return is_transfer_node;
+}
+
+// Helper routines for collecting step stats.
+namespace nodestats {
+inline int64 NowInUsec() { return Env::Default()->NowMicros(); }
+
+void SetScheduled(NodeExecStats* nt, int64 t) { nt->set_scheduled_micros(t); }
+
+void SetAllStart(NodeExecStats* nt) { nt->set_all_start_micros(NowInUsec()); }
+
+void SetOpStart(NodeExecStats* nt) {
+ DCHECK_NE(nt->all_start_micros(), 0);
+ nt->set_op_start_rel_micros(NowInUsec() - nt->all_start_micros());
+}
+
+void SetOpEnd(NodeExecStats* nt) {
+ DCHECK_NE(nt->all_start_micros(), 0);
+ nt->set_op_end_rel_micros(NowInUsec() - nt->all_start_micros());
+}
+
+void SetAllEnd(NodeExecStats* nt) {
+ DCHECK_NE(nt->all_start_micros(), 0);
+ nt->set_all_end_rel_micros(NowInUsec() - nt->all_start_micros());
+}
+
+void SetOutput(NodeExecStats* nt, int slot, AllocationType allocation_type,
+ const Tensor* v) {
+ DCHECK(v);
+ NodeOutput* no = nt->add_output();
+ no->set_slot(slot);
+ no->set_allocation_type(allocation_type);
+ v->FillDescription(no->mutable_tensor_description());
+}
+
+void SetMemory(NodeExecStats* nt, OpKernelContext* ctx) {
+ for (const auto& allocator_pair : ctx->wrapped_allocators()) {
+ AllocatorMemoryUsed* memory = nt->add_memory();
+ // retrieving the sizes from the wrapped allocator removes the
+ // executor's reference to it, so allocator_pair.second must not
+ // be dereferenced again after this statement
+ auto sizes = allocator_pair.second->GetSizesAndUnRef();
+ memory->set_allocator_name(allocator_pair.first->Name());
+ int tb = sizes.first;
+ memory->set_total_bytes(tb);
+ if (allocator_pair.first->TracksAllocationSizes()) {
+ memory->set_peak_bytes(sizes.second);
+ }
+ }
+}
+} // namespace nodestats
+
+struct NodeItem {
+ // A graph node.
+ const Node* node = nullptr;
+
+ // The kernel for this node.
+ OpKernel* kernel = nullptr;
+
+ // ExecutorImpl::tensors_[input_start] is the 1st positional input
+ // for this node.
+ int input_start = 0;
+};
+
+// Map from std::pair<node_id, output_index> to attributes.
+struct pairhash {
+ public:
+ template <typename T, typename U>
+ std::size_t operator()(const std::pair<T, U>& x) const {
+ return std::hash<T>()(x.first) ^ std::hash<U>()(x.second);
+ }
+};
+typedef std::unordered_map<std::pair<int, int>, AllocatorAttributes, pairhash>
+ DevAttrMap;
+
+typedef gtl::InlinedVector<TensorValue, 4> TensorValueVec;
+typedef gtl::InlinedVector<DeviceContext*, 4> DeviceContextVec;
+typedef gtl::InlinedVector<AllocatorAttributes, 4> AllocatorAttributeVec;
+
+class ExecutorImpl : public Executor {
+ public:
+ ExecutorImpl(const LocalExecutorParams& p, const Graph* g)
+ : params_(p), graph_(g) {
+ CHECK(p.create_kernel != nullptr);
+ CHECK(p.delete_kernel != nullptr);
+ }
+
+ ~ExecutorImpl() override {
+ for (NodeItem& item : nodes_) {
+ params_.delete_kernel(item.kernel);
+ }
+ delete graph_;
+ }
+
+ Status Initialize();
+
+ // Infer memory allocation attributes of a node n's output,
+ // based on its use node dst. Note that dst might not be directly
+ // connected to n by a single edge, but might be a downstream
+ // consumer of n's output by reference. *attr is updated with any
+ // necessary attributes.
+ Status InferAllocAttr(const Node* n, const Node* dst,
+ const DeviceNameUtils::ParsedName& local_dev_name,
+ AllocatorAttributes* attr);
+
+ // Process all Nodes in the current graph, attempting to infer the
+ // memory allocation attributes to be used wherever they may allocate
+ // a tensor buffer.
+ Status SetAllocAttrs();
+
+ void RunAsync(const Args& args, DoneCallback done) override;
+
+ private:
+ friend class ExecutorState;
+ friend class SimpleExecutorState;
+
+ // Owned.
+ LocalExecutorParams params_;
+ const Graph* graph_;
+ std::vector<NodeItem> nodes_; // nodes_.size == graph_.num_node_ids().
+ int total_tensors_ = 0; // total_tensors_ = sum(nodes_[*].num_inputs())
+
+ // The number of inputs for each frame in this graph. This is static
+ // information of the graph.
+ std::unordered_map<string, int> frame_input_count_;
+
+ DevAttrMap alloc_attr_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ExecutorImpl);
+};
+
+Status ExecutorImpl::Initialize() {
+ const int num_nodes = graph_->num_node_ids();
+ nodes_.resize(num_nodes);
+
+ Status s;
+ total_tensors_ = 0;
+
+  // Preprocess every node in the graph to create an instance of the
+  // op kernel for each node.
+ for (const Node* n : graph_->nodes()) {
+ const int id = n->id();
+ NodeItem* item = &nodes_[id];
+ item->node = n;
+ item->input_start = total_tensors_;
+ total_tensors_ += n->num_inputs();
+ s = params_.create_kernel(n->def(), &item->kernel);
+ if (!s.ok()) {
+ s = AttachDef(s, n->def());
+ LOG(ERROR) << "Executor failed to create kernel. " << s;
+ break;
+ }
+ CHECK(item->kernel);
+
+ // Initialize static information about the frames in the graph.
+ if (IsEnter(n)) {
+ string frame_name;
+ s = GetNodeAttr(n->def(), "frame_name", &frame_name);
+ if (!s.ok()) return s;
+ ++frame_input_count_[frame_name];
+ }
+ }
+ if (params_.has_control_flow) {
+ VLOG(2) << "Graph has control flow.";
+ }
+ if (!s.ok()) return s;
+ return SetAllocAttrs();
+}
+
+Status ExecutorImpl::SetAllocAttrs() {
+ Status s;
+ Device* device = params_.device;
+ DeviceNameUtils::ParsedName local_dev_name = device->parsed_name();
+
+ for (const Node* n : graph_->nodes()) {
+ // Examine the out edges of each node looking for special use
+ // cases that may affect memory allocation attributes.
+ for (auto e : n->out_edges()) {
+ AllocatorAttributes attr;
+ s = InferAllocAttr(n, e->dst(), local_dev_name, &attr);
+ if (!s.ok()) return s;
+ if (attr.value != 0) {
+ VLOG(2) << "node " << n->name() << " gets attr " << attr.value
+ << " for output " << e->src_output();
+ alloc_attr_[std::make_pair(n->id(), e->src_output())].Merge(attr);
+ } else {
+ VLOG(2) << "default output attr for node " << n->name() << " output "
+ << e->src_output();
+ }
+ }
+ }
+ return s;
+}
+
+Status ExecutorImpl::InferAllocAttr(
+ const Node* n, const Node* dst,
+ const DeviceNameUtils::ParsedName& local_dev_name,
+ AllocatorAttributes* attr) {
+ Status s;
+ if (IsSend(dst)) {
+ string dst_name;
+ s = GetNodeAttr(dst->def(), "recv_device", &dst_name);
+ if (!s.ok()) return s;
+ DeviceNameUtils::ParsedName parsed_dst_name;
+ if (!DeviceNameUtils::ParseFullName(dst_name, &parsed_dst_name)) {
+ s = errors::Internal("Bad recv_device attr '", dst_name, "' in node ",
+ n->name());
+ return s;
+ }
+ if (!DeviceNameUtils::IsSameAddressSpace(parsed_dst_name, local_dev_name)) {
+ // Value is going to be the source of an RPC.
+ attr->set_nic_compatible(true);
+ VLOG(2) << "node " << n->name() << " is the source of an RPC out";
+ } else if (local_dev_name.type == "CPU" && parsed_dst_name.type == "GPU") {
+ // Value is going to be the source of a local DMA from CPU to GPU.
+ attr->set_gpu_compatible(true);
+ VLOG(2) << "node " << n->name() << " is the source of a cpu->gpu copy";
+ } else {
+ VLOG(2) << "default alloc case local type " << local_dev_name.type
+ << " remote type " << parsed_dst_name.type;
+ }
+ } else if (dst->type_string() == "ToFloat") {
+ for (auto e : dst->out_edges()) {
+ s = InferAllocAttr(n, e->dst(), local_dev_name, attr);
+ if (!s.ok()) return s;
+ }
+ }
+ return s;
+}
+
+// The state associated with one invocation of ExecutorImpl::Run.
+// ExecutorState dispatches nodes when they become ready and keeps
+// track of how many predecessors of a node have not yet completed (pending_).
+class ExecutorState {
+ public:
+ ExecutorState(const Executor::Args& args, ExecutorImpl* impl);
+ ~ExecutorState();
+
+ void RunAsync(Executor::DoneCallback done);
+
+ private:
+ typedef ExecutorState ME;
+
+ // Either a tensor pointer (pass-by-reference) or a tensor (pass-by-value).
+ // TODO(yuanbyu): A better way to do "has_value"?
+ struct Entry {
+ Tensor val = *kEmptyTensor; // A tensor value.
+ Tensor* ref = nullptr; // A tensor reference.
+ mutex* ref_mu = nullptr; // mutex for *ref if ref is not nullptr.
+ bool has_value = false; // Whether the value exists
+
+ // Every entry carries an optional DeviceContext containing
+ // Device-specific information about how the Tensor was produced.
+ DeviceContext* device_context = nullptr;
+
+ // The attributes of the allocator that creates the tensor.
+ AllocatorAttributes alloc_attr;
+ };
+
+ // Contains a map from node id to the DeviceContext object that was
+ // assigned by the device at the beginning of a step.
+ DeviceContextMap device_context_map_;
+
+ struct IterationState {
+ // The state of an iteration.
+
+ // The pending count for each graph node. One copy per iteration.
+ // Iteration i can be garbage collected when it is done.
+    // TODO(yuanbyu): This vector is currently sized to the number of nodes
+    // in this partition. This is not efficient if the subgraph for the frame
+    // is only a small subset of the partition. We should size the vector
+    // to the frame subgraph instead.
+ std::vector<int>* pending_count;
+
+ // The dead input count for each graph node. One copy per iteration.
+ std::vector<int>* dead_count;
+
+ // One copy per iteration. For iteration k, i-th node's j-th input is in
+ // input_tensors[k][impl_->nodes[i].input_start + j]. An entry is either
+ // a tensor pointer (pass-by-reference) or a tensor (pass-by-value).
+ //
+    // NOTE: No need to protect input_tensors[i] by any locks because it
+    // is resized once. Each element of input_tensors is written once by
+    // the source node of an edge and is cleared by the destination of the
+    // same edge. The latter node is never run concurrently with the former.
+ std::vector<Entry>* input_tensors;
+
+ // The number of outstanding ops for each iteration.
+ int outstanding_ops;
+
+ // The number of outstanding frames for each iteration.
+ int outstanding_frame_count;
+
+ ~IterationState() {
+ delete pending_count;
+ delete dead_count;
+ delete input_tensors;
+ }
+ };
+
+ struct FrameState {
+ // A new frame is created for each loop. Execution starts at iteration 0.
+ // When a value at iteration 0 passes through a NextIteration node,
+ // iteration 1 is created and starts running. Note that iteration 0 may
+ // still be running so multiple iterations may run in parallel. The
+ // frame maintains the state of iterations in several data structures
+ // such as pending_count and input_tensors. When iteration 0 completes,
+ // we garbage collect the state of iteration 0.
+ //
+ // A frame instance is considered "done" and can be garbage collected
+ // if all its inputs have entered and all its iterations are "done".
+ //
+ // A frame manages the live iterations of an iterative computation.
+    // Iteration i is considered "done" when there are no outstanding ops,
+    // all frames created at iteration i are done, all recvs for this
+    // iteration are completed, and iteration i-1 is done. For iteration 0,
+    // we instead wait for there to be no more pending inputs of the frame.
+ //
+ // Frames and iterations are garbage collected once they are done.
+ // The state we need to keep around is highly dependent on the
+ // parallelism enabled by the scheduler. We may want to have the
+ // scheduler dynamically control the outstanding number of live
+ // parallel frames and iterations. To reduce the state space, the
+ // scheduler might want to schedule ops in inner frames first and
+ // lower iterations first.
+ //
+ // This frame state is mostly initialized lazily on demand so we
+ // don't introduce unnecessary overhead.
+
+ // The name of this frame, which is the concatenation of its parent
+ // frame name, the iteration of the parent frame when this frame was
+ // created, and the value of the attr 'frame_name'.
+ string frame_name;
+
+ // The unique id for this frame. Generated by fingerprinting
+ // frame_name.
+ uint64 frame_id;
+
+ // The iteration id of its parent frame when this frame is created.
+ // -1 if there is no parent frame. The frame_name/parent_iter pair
+ // uniquely identifies this FrameState.
+ int64 parent_iter = -1;
+
+ // The FrameState of its parent frame.
+ FrameState* parent_frame = nullptr;
+
+ // The highest iteration number we have reached so far in this frame.
+ int64 iteration_count = 0;
+
+    // The number of inputs this frame is still waiting for.
+ int num_pending_inputs = 0;
+
+ // The number of outstanding iterations.
+ int num_outstanding_iterations = 0;
+
+ // The maximum allowed number of parallel iterations.
+ int max_parallel_iterations = 1;
+
+ // The iteration states of this frame.
+ std::vector<IterationState*> iterations;
+
+ // The NextIteration nodes to enter a new iteration. If the number of
+ // outstanding iterations reaches the limit, we will defer the start of
+ // the next iteration until the number of outstanding iterations falls
+ // below the limit.
+ std::vector<std::pair<const Node*, Entry>> next_iter_roots;
+
+ // The values of the loop invariants for this loop. They are added into
+ // this list as they "enter" the frame. When a loop invariant enters,
+ // we make it available to all active iterations. When the frame starts
+ // a new iteration, we make all the current loop invariants available
+ // to the new iteration.
+ std::vector<std::pair<const Node*, Entry>> inv_values;
+
+ // The list of dead exit nodes for the current highest iteration. We
+ // will only "execute" the dead exits of the final iteration.
+ std::vector<const Node*> dead_exits;
+
+ IterationState* GetIteration(int64 iter) {
+ int index = iter % iterations.size();
+ return iterations[index];
+ }
+
+ void SetIteration(int64 iter, IterationState* state) {
+ int index = iter % iterations.size();
+ iterations[index] = state;
+ }
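+
+    // Illustrative example (assumed sizes, not from the original source):
+    // 'iterations' acts as a fixed-size circular buffer indexed by
+    // iter % iterations.size(). A child frame with max_parallel_iterations
+    // == 2 gets 3 slots (see FindOrCreateChildFrame), so iterations 0, 1, 2
+    // occupy slots 0, 1, 2 and iteration 3 reuses slot 0 once iteration 0
+    // has been deleted.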
+
+ ~FrameState() {
+ for (size_t i = 0; i < iterations.size(); ++i) {
+ delete iterations[i];
+ iterations[i] = nullptr;
+ }
+ }
+ };
+
+ // A tagged node: <frame*, iter, node*>.
+ struct TaggedNode {
+ const Node* node = nullptr;
+ FrameState* input_frame = nullptr;
+ int64 input_iter = -1;
+ bool is_dead = false;
+
+ TaggedNode(const Node* t_node, FrameState* in_frame, int64 in_iter,
+ bool dead) {
+ node = t_node;
+ input_frame = in_frame;
+ input_iter = in_iter;
+ is_dead = dead;
+ }
+ };
+
+ typedef gtl::InlinedVector<TaggedNode, 8> TaggedNodeSeq;
+ typedef gtl::InlinedVector<Entry, 4> EntryVector;
+
+ // Not owned.
+ Rendezvous* rendezvous_;
+ StepStatsCollector* stats_collector_;
+ // QUESTION: Make it a checkpoint::TensorSliceReaderCacheWrapper instead of a
+ // pointer? (avoids having to delete).
+ checkpoint::TensorSliceReaderCacheWrapper* slice_reader_cache_;
+ FunctionCallFrame* call_frame_;
+ const ExecutorImpl* impl_;
+ CancellationManager* cancellation_manager_;
+ Executor::Args::Runner runner_;
+
+ // Owned.
+
+ // Step-local resource manager.
+ ResourceMgr step_resource_manager_;
+
+ // The root frame in which the execution of this step is started.
+ FrameState* root_frame_;
+
+ // Invoked when the execution finishes.
+ Executor::DoneCallback done_cb_;
+
+ std::atomic_int_fast32_t num_outstanding_ops_;
+
+ mutex mu_;
+ Status status_ GUARDED_BY(mu_);
+
+ // Mapping from frame name to outstanding frames. A new frame is created
+ // at some iteration of an active frame. So the unique key for the new
+ // child frame is composed of the name of the parent frame, the iteration
+ // number at which the parent frame is creating the new frame, and the
+ // name of the new frame from nodedef.
+ std::unordered_map<string, FrameState*> outstanding_frames_ GUARDED_BY(mu_);
+
+ // The unique name of a frame.
+ inline string MakeFrameName(FrameState* frame, int64 iter_id, string name) {
+ return strings::StrCat(frame->frame_name, ";", iter_id, ";", name);
+ }
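+
+  // Illustrative example (assumed names, not from the original source): for
+  // an Enter node with frame_name "while" reached at iteration 2 of the root
+  // frame, MakeFrameName produces "_root;2;while"; an inner loop "body"
+  // entered at iteration 0 of that frame would be named "_root;2;while;0;body".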
+
+ // Initialize the pending count for a graph.
+ static void InitializePending(const Graph* graph, std::vector<int>* pending);
+
+ // Find an existing or create a new child frame in the frame 'frame' at
+ // iteration 'iter'.
+ void FindOrCreateChildFrame(FrameState* frame, int64 iter, const Node* node,
+ FrameState** child) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Increments the iteration id. If this is a new iteration, initialize it.
+ void IncrementIteration(FrameState* frame, TaggedNodeSeq* ready)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Returns true if the computation in the frame is completed.
+ bool IsFrameDone(FrameState* frame) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Returns true if the iteration of the frame is completed.
+ bool IsIterationDone(FrameState* frame, int64 iter)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+  // Get the output frame/iter of a node. Create a new frame/iteration if
+  // needed. If there are dead roots for the new iteration, we need to
+  // "execute" them, so we add them to the ready queue. Returns true if
+  // we need to check for the completion of output frame/iter.
+ bool SetOutputFrameIter(const TaggedNode& tagged_node,
+ const EntryVector& outputs, FrameState** frame,
+ int64* iter, TaggedNodeSeq* ready)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Cleanup frames and iterations
+ void CleanupFramesIterations(FrameState* frame, int64 iter,
+ TaggedNodeSeq* ready)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Activate all the deferred NextIteration nodes in a new iteration.
+ void ActivateNexts(FrameState* frame, int64 iter, TaggedNodeSeq* ready)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Activate all the current loop invariants in a new iteration.
+ void ActivateLoopInvs(FrameState* frame, int64 iter, TaggedNodeSeq* ready)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Add a new loop invariant and make it available to all active iterations.
+ void AddLoopInv(FrameState* frame, const Node* node, const Entry& value,
+ TaggedNodeSeq* ready) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Activate the successors of a node.
+ void ActivateNode(const Node* node, const bool is_dead, FrameState* frame,
+ int64 iter, const EntryVector& outputs,
+ TaggedNodeSeq* ready) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Process a ready node in current thread.
+ void Process(TaggedNode node, int64 scheduled_usec);
+
+ // Before invoking item->kernel, fills in its "inputs".
+ Status PrepareInputs(const NodeItem& item, Entry* first_input,
+ TensorValueVec* inputs,
+ DeviceContextVec* input_device_contexts,
+ AllocatorAttributeVec* input_alloc_attrs,
+ bool* is_input_dead);
+
+ // After item->kernel computation is done, processes its outputs.
+ Status ProcessOutputs(const NodeItem& item, OpKernelContext* ctx,
+ EntryVector* outputs, NodeExecStats* stats);
+
+ // After processing the outputs, propagates the outputs to their dsts.
+ void PropagateOutputs(const TaggedNode& tagged_node,
+ const EntryVector& outputs, TaggedNodeSeq* ready);
+
+  // "node" has just finished. Takes ownership of "stats". Returns true if
+  // execution has completed.
+ bool NodeDone(const Status& s, const Node* node, const TaggedNodeSeq& ready,
+ NodeExecStats* stats, std::deque<TaggedNode>* inline_ready);
+
+ // Call Process() on all nodes in 'inline_ready'.
+ void ProcessInline(const std::deque<TaggedNode>& inline_ready);
+
+ // Schedule all the expensive nodes in 'ready', and put all the inexpensive
+ // nodes in 'ready' into 'inline_ready'.
+ void ScheduleReady(const TaggedNodeSeq& ready,
+ std::deque<TaggedNode>* inline_ready);
+
+ // One thread of control finishes.
+ void Finish();
+};
+
+ExecutorState::ExecutorState(const Executor::Args& args, ExecutorImpl* impl)
+ : rendezvous_(args.rendezvous),
+ stats_collector_(args.stats_collector),
+ slice_reader_cache_(new checkpoint::TensorSliceReaderCacheWrapper),
+ call_frame_(args.call_frame),
+ impl_(impl),
+ cancellation_manager_(args.cancellation_manager),
+ runner_(args.runner),
+ num_outstanding_ops_(0) {
+ // We start the entire execution in iteration 0 of the root frame
+ // so let us create the root frame and the state for iteration 0.
+ // Initialize the frame.
+ root_frame_ = new FrameState;
+ root_frame_->frame_name = "_root"; // assume to be unique
+ root_frame_->frame_id = 0; // must be 0
+ root_frame_->num_pending_inputs = 0;
+ root_frame_->num_outstanding_iterations = 1;
+ root_frame_->max_parallel_iterations = 1; // enough for root frame
+ root_frame_->iterations.resize(root_frame_->max_parallel_iterations);
+
+ VLOG(2) << "Create frame: " << root_frame_->frame_name;
+
+ // Initialize the iteration.
+ IterationState* iter_state = new IterationState;
+ root_frame_->iterations[0] = iter_state;
+ iter_state->outstanding_ops = 0;
+ iter_state->outstanding_frame_count = 0;
+ iter_state->pending_count = new std::vector<int>;
+ iter_state->dead_count = new std::vector<int>(impl->graph_->num_node_ids());
+ iter_state->input_tensors = new std::vector<Entry>(impl_->total_tensors_);
+
+ // Initialize the executor state.
+ outstanding_frames_.insert({root_frame_->frame_name, root_frame_});
+}
+
+ExecutorState::~ExecutorState() {
+ for (auto name_frame : outstanding_frames_) {
+ delete name_frame.second;
+ }
+
+ for (auto it : device_context_map_) {
+ it.second->Unref();
+ }
+
+ delete slice_reader_cache_;
+}
+
+void ExecutorState::InitializePending(const Graph* graph,
+ std::vector<int>* pending) {
+ pending->resize(graph->num_node_ids());
+ for (const Node* n : graph->nodes()) {
+ const int id = n->id();
+ const int num_in_edges = n->in_edges().size();
+ if (IsMerge(n)) {
+      // A merge node waits for all of its control inputs, so we initialize
+      // the pending count to be the number of control edges.
+ int32 num_control_edges = 0;
+ for (const Edge* edge : n->in_edges()) {
+ if (edge->IsControlEdge()) {
+ num_control_edges++;
+ }
+ }
+ // Use bit 0 to indicate if there is a ready live data input.
+ (*pending)[id] = num_control_edges << 1;
+ } else {
+ (*pending)[id] = num_in_edges;
+ }
+ }
+}
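+
+// Illustrative example (assumed counts, not from the original source): a
+// Merge node with 2 control edges starts with a pending count of 2 << 1 == 4.
+// Each enabled control edge subtracts 2, and the first live data input sets
+// bit 0 (see ActivateNode below); the node becomes ready once both control
+// edges are enabled and either a live data input has arrived or all of its
+// data inputs are dead.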
+
+void ExecutorState::RunAsync(Executor::DoneCallback done) {
+ const Graph* graph = impl_->graph_;
+ TaggedNodeSeq ready;
+
+ {
+ // Initialize the executor state. We grab the mutex here just to
+ // keep the thread safety analysis happy.
+ mutex_lock l(mu_);
+ std::vector<int>* pending = root_frame_->iterations[0]->pending_count;
+ InitializePending(graph, pending);
+ }
+
+ // Ask the device to fill in the device context map.
+ Device* device = impl_->params_.device;
+ device->FillContextMap(graph, &device_context_map_);
+
+ // Initialize the ready queue.
+ for (const Node* n : graph->nodes()) {
+ const int num_in_edges = n->in_edges().size();
+ if (num_in_edges == 0) {
+ ready.push_back(TaggedNode{n, root_frame_, 0, false});
+ }
+ }
+ if (ready.empty()) {
+ done(Status::OK());
+ } else {
+ num_outstanding_ops_ = ready.size();
+ root_frame_->iterations[0]->outstanding_ops = ready.size();
+ done_cb_ = done;
+ // Schedule to run all the ready ops in thread pool.
+ ScheduleReady(ready, nullptr);
+ }
+}
+
+namespace {
+
+// This function is provided for use by OpKernelContext when allocating
+// the index'th output of node. It provides access to the
+// AllocatorAttributes computed during initialization to determine in
+// which memory region the tensor should be allocated.
+AllocatorAttributes OutputAttributes(const DevAttrMap* attr_map,
+ const Node* node,
+ const OpKernel* op_kernel, int index) {
+ DCHECK_GE(index, 0);
+
+ AllocatorAttributes attr;
+ int nid = node->id();
+ const auto& iter = attr_map->find(std::make_pair(nid, index));
+ if (iter != attr_map->end()) {
+ attr = iter->second;
+ VLOG(2) << "nondefault attr " << attr.value << " for node " << node->name()
+ << " output " << index;
+ } else {
+ VLOG(2) << "default attr for node " << node->name() << " output " << index;
+ }
+
+ DCHECK_LT(index, op_kernel->output_memory_types().size());
+ bool on_host = op_kernel->output_memory_types()[index] == HOST_MEMORY;
+ attr.set_on_host(on_host);
+ return attr;
+}
+
+// Helper to make a copy of 'p', including copies of the input vector
+// and the device context vector.
+//
+// NOTE: We need to make a copy of p.inputs for an asynchronous kernel
+// because OpKernelContext methods like input_type(i) need the params
+// to point to a valid input vector. This is not an issue for sync
+// kernels because these vectors are kept on the stack.
+OpKernelContext::Params* CopyParams(const OpKernelContext::Params& p) {
+ OpKernelContext::Params* ret = new OpKernelContext::Params;
+ *ret = p;
+ ret->inputs = new TensorValueVec(*p.inputs);
+ ret->input_device_contexts = new DeviceContextVec(*p.input_device_contexts);
+ ret->input_alloc_attrs = new AllocatorAttributeVec(*p.input_alloc_attrs);
+ return ret;
+}
+
+// Helper to delete 'p' and the copies made by CopyParams.
+void DeleteParams(OpKernelContext::Params* p) {
+ delete p->inputs;
+ delete p->input_device_contexts;
+ delete p->input_alloc_attrs;
+ delete p;
+}
+
+} // namespace
+
+void ExecutorState::Process(TaggedNode tagged_node, int64 scheduled_usec) {
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ TaggedNodeSeq ready;
+ std::deque<TaggedNode> inline_ready;
+
+ // Parameters passed to OpKernel::Compute.
+ TensorValueVec inputs;
+ DeviceContextVec input_device_contexts;
+ AllocatorAttributeVec input_alloc_attrs;
+
+ OpKernelContext::Params params;
+ Device* device = impl_->params_.device;
+ params.device = device;
+ // track allocations if and only if we are collecting statistics
+ params.track_allocations = (stats_collector_ != nullptr);
+ params.rendezvous = rendezvous_;
+ params.cancellation_manager = cancellation_manager_;
+ params.call_frame = call_frame_;
+ params.function_library = impl_->params_.function_library;
+ params.resource_manager = device->resource_manager();
+ params.step_resource_manager = &step_resource_manager_;
+ params.slice_reader_cache = slice_reader_cache_;
+ params.inputs = &inputs;
+ params.input_device_contexts = &input_device_contexts;
+ params.input_alloc_attrs = &input_alloc_attrs;
+
+ Status s;
+ NodeExecStats* stats = nullptr;
+ EntryVector outputs;
+ bool completed = false;
+ inline_ready.push_back(tagged_node);
+ while (!inline_ready.empty()) {
+ tagged_node = inline_ready.front();
+ inline_ready.pop_front();
+ const Node* node = tagged_node.node;
+ FrameState* input_frame = tagged_node.input_frame;
+ int64 input_iter = tagged_node.input_iter;
+ const int id = node->id();
+ const NodeItem& item = nodes[id];
+
+ // Set the device_context for this node id, if it exists.
+ auto dc_it = device_context_map_.find(id);
+ if (dc_it != device_context_map_.end()) {
+ params.op_device_context = dc_it->second;
+ }
+
+ if (stats_collector_) {
+ stats = new NodeExecStats;
+ stats->set_node_name(node->name());
+ nodestats::SetScheduled(stats, scheduled_usec);
+ nodestats::SetAllStart(stats);
+ }
+
+ VLOG(1) << "Process node: " << id << " " << SummarizeNodeDef(node->def());
+
+ std::vector<Entry>* input_tensors;
+ {
+ // Need the lock because the iterations vector could be resized by
+ // another thread.
+ mutex_lock l(mu_);
+ input_tensors = input_frame->GetIteration(input_iter)->input_tensors;
+ }
+ Entry* first_input = input_tensors->data() + item.input_start;
+ outputs.clear();
+ outputs.resize(node->num_outputs());
+
+ // Only execute this node if it is not dead or it is a send/recv
+ // transfer node. For transfer nodes, we need to propagate the "dead"
+ // bit even when the node is dead.
+ AsyncOpKernel* async = nullptr;
+ if (!tagged_node.is_dead || IsTransferNode(node)) {
+ // Prepares inputs.
+ bool is_input_dead = false;
+ s = PrepareInputs(item, first_input, &inputs, &input_device_contexts,
+ &input_alloc_attrs, &is_input_dead);
+ if (!s.ok()) {
+ // Continue to process the nodes in 'inline_ready'.
+ completed = NodeDone(s, item.node, ready, stats, &inline_ready);
+ continue;
+ }
+
+ // Set up compute params.
+ OpKernel* op_kernel = item.kernel;
+ params.op_kernel = op_kernel;
+ params.frame_iter = FrameAndIter(input_frame->frame_id, input_iter);
+ params.is_input_dead = is_input_dead;
+ params.output_alloc_attr = [this, node, op_kernel](int index) {
+ return OutputAttributes(&impl_->alloc_attr_, node, op_kernel, index);
+ };
+
+ async = op_kernel->AsAsync();
+ if (async) {
+ // Asynchronous computes.
+ auto pcopy = CopyParams(params);
+ auto ctx = new OpKernelContext(*pcopy);
+ auto done = [this, tagged_node, item, first_input, ctx, stats,
+ pcopy]() {
+ VLOG(2) << this << " Async kernel done: "
+ << SummarizeNodeDef(item.node->def());
+ if (stats_collector_) nodestats::SetOpEnd(stats);
+ EntryVector outputs;
+ Status s = ProcessOutputs(item, ctx, &outputs, stats);
+ if (stats_collector_) nodestats::SetMemory(stats, ctx);
+ // Clears inputs.
+ int num_inputs = tagged_node.node->num_inputs();
+ for (int i = 0; i < num_inputs; ++i) {
+ (first_input + i)->val = *kEmptyTensor;
+ }
+ TaggedNodeSeq ready;
+ if (s.ok()) {
+ PropagateOutputs(tagged_node, outputs, &ready);
+ }
+ // Schedule to run all the ready ops in thread pool.
+ bool completed = NodeDone(s, item.node, ready, stats, nullptr);
+ delete ctx;
+ DeleteParams(pcopy);
+ if (completed) Finish();
+ };
+ if (stats_collector_) nodestats::SetOpStart(stats);
+ device->ComputeAsync(async, ctx, done);
+ } else {
+ // Synchronous computes.
+ OpKernelContext ctx(params);
+ if (stats_collector_) nodestats::SetOpStart(stats);
+ device->Compute(CHECK_NOTNULL(op_kernel), &ctx);
+ if (stats_collector_) nodestats::SetOpEnd(stats);
+
+ // Processes outputs.
+ s = ProcessOutputs(item, &ctx, &outputs, stats);
+ if (stats_collector_) nodestats::SetMemory(stats, &ctx);
+ }
+ }
+
+ if (!async) {
+ // Clears inputs.
+ int num_inputs = node->num_inputs();
+ for (int i = 0; i < num_inputs; ++i) {
+ (first_input + i)->val = *kEmptyTensor;
+ }
+ // Propagates outputs.
+ if (s.ok()) {
+ PropagateOutputs(tagged_node, outputs, &ready);
+ }
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ // Postprocess.
+ completed = NodeDone(s, item.node, ready, stats, &inline_ready);
+ }
+ } // while !inline_ready.empty()
+
+ // This thread of computation is done if completed = true.
+ if (completed) Finish();
+}
+
+Status ExecutorState::PrepareInputs(const NodeItem& item, Entry* first_input,
+ TensorValueVec* inputs,
+ DeviceContextVec* input_device_contexts,
+ AllocatorAttributeVec* input_alloc_attrs,
+ bool* is_input_dead) {
+ const Node* node = item.node;
+
+ inputs->clear();
+ inputs->resize(node->num_inputs());
+ input_device_contexts->clear();
+ input_device_contexts->resize(node->num_inputs());
+ input_alloc_attrs->clear();
+ input_alloc_attrs->resize(node->num_inputs());
+
+ *is_input_dead = false;
+
+ bool is_merge = IsMerge(node);
+ for (int i = 0; i < node->num_inputs(); ++i) {
+ const bool expect_ref = IsRefType(node->input_type(i));
+ Entry* entry = first_input + i;
+ (*input_device_contexts)[i] = entry->device_context;
+ (*input_alloc_attrs)[i] = entry->alloc_attr;
+
+ // i-th input.
+ TensorValue* inp = &(*inputs)[i];
+
+ // Only merge and transfer nodes can have no-value inputs.
+ if (!entry->has_value) {
+ if (!is_merge) {
+ DCHECK(IsTransferNode(node));
+ inp->tensor = &entry->val;
+ *is_input_dead = true;
+ }
+ continue;
+ }
+ if (entry->ref == nullptr) {
+ if (expect_ref) {
+ return AttachDef(
+ errors::InvalidArgument(i, "-th input expects a ref type"),
+ item.kernel->def());
+ }
+ inp->tensor = &entry->val;
+ } else {
+ if (!entry->ref->IsInitialized() && !IsInitializationOp(item.node)) {
+ return AttachDef(
+ errors::FailedPrecondition("Attempting to use uninitialized value ",
+ item.kernel->def().input(i)),
+ item.kernel->def());
+ }
+ if (expect_ref) {
+ inp->mutex_if_ref = entry->ref_mu;
+ inp->tensor = entry->ref;
+ } else {
+ // Automatically deref the tensor ref when the op expects a
+ // tensor but is given a ref to a tensor. Need to deref it
+ // under the mutex.
+ {
+ mutex_lock l(*(entry->ref_mu));
+ entry->val = *entry->ref;
+ }
+ inp->tensor = &entry->val;
+ }
+ }
+ }
+ return Status::OK();
+}
+
+Status ExecutorState::ProcessOutputs(const NodeItem& item, OpKernelContext* ctx,
+ EntryVector* outputs,
+ NodeExecStats* stats) {
+ const Node* node = item.node;
+ outputs->clear();
+ outputs->resize(node->num_outputs());
+
+ Status s = ctx->status();
+ if (!s.ok()) {
+ s = AttachDef(s, item.kernel->def());
+ LOG(WARNING) << this << " Compute status: " << s;
+ return s;
+ }
+
+ // Get the device_context for this node id, if it exists.
+ DeviceContext* device_context = nullptr;
+ auto dc_it = device_context_map_.find(node->id());
+ if (dc_it != device_context_map_.end()) {
+ device_context = dc_it->second;
+ }
+
+ for (int i = 0; i < node->num_outputs(); ++i) {
+ TensorValue val = ctx->release_output(i);
+ // Only Switch and Recv nodes can generate new dead outputs
+ if (*ctx->is_output_dead() || val.tensor == nullptr) {
+ DCHECK(IsSwitch(node) || IsRecv(node));
+ } else {
+ Entry* out = &((*outputs)[i]);
+ out->has_value = true;
+
+ // Set the device context of the output entry.
+ out->device_context = device_context;
+
+ // Set the allocator attributes of the output entry.
+ out->alloc_attr = ctx->output_alloc_attr(i);
+
+ // Sanity check of output tensor types.
+ DataType dtype = val->dtype();
+ if (val.is_ref()) dtype = MakeRefType(dtype);
+ if (dtype == node->output_type(i)) {
+ if (val.is_ref()) {
+ out->ref = val.tensor;
+ out->ref_mu = val.mutex_if_ref;
+ } else {
+ out->val = *val.tensor;
+ }
+ if (stats_collector_ && val.tensor->IsInitialized()) {
+ nodestats::SetOutput(stats, i, ctx->output_allocation_type(i),
+ val.tensor);
+ }
+ } else {
+ s.Update(errors::Internal("Output ", i, " of type ",
+ DataTypeString(dtype),
+ " does not match declared output type ",
+ DataTypeString(node->output_type(i)),
+ " for node ", SummarizeNodeDef(node->def())));
+ }
+ }
+ if (!val.is_ref()) {
+ // If OpKernelContext returns outputs via pass-by-value, we
+ // don't need this trouble.
+ delete val.tensor;
+ }
+ }
+ return s;
+}
+
+void ExecutorState::PropagateOutputs(const TaggedNode& tagged_node,
+ const EntryVector& outputs,
+ TaggedNodeSeq* ready) {
+ FrameState* input_frame = tagged_node.input_frame;
+ int64 input_iter = tagged_node.input_iter;
+
+ // Propagates outputs along out edges, and puts newly ready nodes
+ // into the ready queue.
+ ready->clear();
+
+ {
+ FrameState* output_frame = input_frame;
+ int64 output_iter = input_iter;
+
+ mutex_lock l(mu_);
+ // Sets the output_frame and output_iter of node.
+ bool maybe_completed = SetOutputFrameIter(
+ tagged_node, outputs, &output_frame, &output_iter, ready);
+ if (output_frame != nullptr) {
+ // Continue to process the out nodes:
+ ActivateNode(tagged_node.node, tagged_node.is_dead, output_frame,
+ output_iter, outputs, ready);
+ }
+
+ // At this point, this node is completely done.
+ input_frame->GetIteration(input_iter)->outstanding_ops--;
+ CleanupFramesIterations(input_frame, input_iter, ready);
+
+ // The execution of a node such as Enter may cause the completion of
+ // output_frame:output_iter, so perform cleanup if output_frame:output_iter
+ // is indeed completed.
+ if (maybe_completed) {
+ CleanupFramesIterations(output_frame, output_iter, ready);
+ }
+ }
+}
+
+void ExecutorState::ActivateNode(const Node* node, const bool is_dead,
+ FrameState* output_frame, int64 output_iter,
+ const EntryVector& outputs,
+ TaggedNodeSeq* ready) {
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ IterationState* output_iter_state = output_frame->GetIteration(output_iter);
+ std::vector<int>* pending = output_iter_state->pending_count;
+ std::vector<int>* dead_count = output_iter_state->dead_count;
+ for (const Edge* e : node->out_edges()) {
+ const Node* dst_node = e->dst();
+ const int dst_id = dst_node->id();
+ const int src_slot = e->src_output();
+
+ bool dst_dead = false;
+ bool dst_ready = false;
+ bool dst_need_input = !e->IsControlEdge();
+ if (IsMerge(dst_node)) {
+ // A merge node is ready if a) all control edges are enabled and a
+ // live data input becomes available, or b) all control edges are
+ // enabled and all data inputs are dead.
+ if (e->IsControlEdge()) {
+ (*pending)[dst_id] -= 2;
+ int count = (*pending)[dst_id];
+ dst_dead = ((*dead_count)[dst_id] == dst_node->num_inputs());
+ dst_ready = (count == 1) || ((count == 0) && dst_dead);
+ } else {
+ if (outputs[src_slot].has_value) {
+ // This is a live data input.
+ int count = (*pending)[dst_id];
+ (*pending)[dst_id] |= 0x1;
+ dst_ready = (count == 0);
+ } else {
+ // This is a dead data input.
+ ++(*dead_count)[dst_id];
+ dst_dead = ((*dead_count)[dst_id] == dst_node->num_inputs());
+ dst_ready = ((*pending)[dst_id] == 0) && dst_dead;
+ }
+ // This input for dst is not needed if !dst_ready. We suppress the
+ // propagation to make the thread safety analysis happy.
+ dst_need_input = dst_ready;
+ }
+ } else {
+ // A non-merge node is ready if all its inputs are ready. We wait
+ // for all inputs to come in even if we know the node is dead. This
+ // ensures that all input tensors get cleaned up.
+ if (is_dead || (!e->IsControlEdge() && !outputs[src_slot].has_value)) {
+ ++(*dead_count)[dst_id];
+ }
+ dst_dead = (*dead_count)[dst_id] > 0;
+ dst_ready = (--(*pending)[dst_id] == 0);
+ }
+
+ if (dst_need_input) {
+ const NodeItem& dst_item = nodes[dst_id];
+ const int dst_slot = e->dst_input();
+ std::vector<Entry>* input_tensors = output_iter_state->input_tensors;
+ int dst_loc = dst_item.input_start + dst_slot;
+ (*input_tensors)[dst_loc] = outputs[src_slot];
+ }
+
+ // Add dst to the ready queue if it's ready
+ if (dst_ready) {
+ dst_dead = dst_dead && !IsControlTrigger(dst_node);
+ ready->push_back(
+ TaggedNode(dst_node, output_frame, output_iter, dst_dead));
+ output_iter_state->outstanding_ops++;
+ }
+ }
+}
+
+void ExecutorState::ActivateNexts(FrameState* frame, int64 iter,
+ TaggedNodeSeq* ready) {
+ // Propagate the deferred NextIteration nodes to the new iteration.
+ for (auto& node_entry : frame->next_iter_roots) {
+ const Node* node = node_entry.first;
+ const Entry& entry = node_entry.second;
+ const bool is_dead = !entry.has_value;
+ ActivateNode(node, is_dead, frame, iter, {entry}, ready);
+ }
+ frame->next_iter_roots.clear();
+}
+
+void ExecutorState::ActivateLoopInvs(FrameState* frame, int64 iter,
+ TaggedNodeSeq* ready) {
+ // Propagate loop invariants to the new iteration.
+ for (auto& node_entry : frame->inv_values) {
+ const Node* node = node_entry.first;
+ const Entry& entry = node_entry.second;
+ const bool is_dead = !entry.has_value;
+ ActivateNode(node, is_dead, frame, iter, {entry}, ready);
+ }
+}
+
+void ExecutorState::AddLoopInv(FrameState* frame, const Node* node,
+ const Entry& entry, TaggedNodeSeq* ready) {
+ // Store this value.
+ frame->inv_values.push_back({node, entry});
+
+ // Make this value available to all iterations.
+ bool is_dead = !entry.has_value;
+ for (int i = 1; i <= frame->iteration_count; ++i) {
+ ActivateNode(node, is_dead, frame, i, {entry}, ready);
+ }
+}
+
+bool ExecutorState::NodeDone(const Status& s, const Node* node,
+ const TaggedNodeSeq& ready, NodeExecStats* stats,
+ std::deque<TaggedNode>* inline_ready) {
+ if (stats_collector_) {
+ nodestats::SetAllEnd(stats);
+ if (!SetTimelineLabel(node, stats)) {
+ // Only record non-transfer nodes.
+ stats_collector_->Save(impl_->params_.device->name(), stats);
+ } else {
+ delete stats;
+ }
+ }
+
+ Rendezvous* captured_rendezvous = nullptr; // Will be set on error.
+ if (!s.ok()) {
+ // Some error happened. This thread of computation is done.
+ mutex_lock l(mu_);
+ if (status_.ok()) {
+ captured_rendezvous = rendezvous_;
+ if (captured_rendezvous) captured_rendezvous->Ref();
+ status_ = s;
+ }
+ }
+ if (captured_rendezvous) {
+ // If we captured the rendezvous_ pointer, we are in an error condition.
+ // Use captured_rendezvous, in case "this" is deleted by another thread.
+ TRACEPRINTF("StartAbort: %s", s.ToString().c_str());
+ captured_rendezvous->StartAbort(s);
+ captured_rendezvous->Unref();
+ }
+
+ bool completed = false;
+ int ready_size = ready.size();
+ if (ready_size == 0 || !s.ok()) {
+ completed = (num_outstanding_ops_.fetch_sub(1) == 1);
+ } else if (ready_size > 1) {
+ num_outstanding_ops_.fetch_add(ready_size - 1, std::memory_order_relaxed);
+ }
+
+ // Schedule the ready nodes in 'ready'.
+ if (s.ok()) {
+ ScheduleReady(ready, inline_ready);
+ }
+ return completed;
+}
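+
+// Accounting sketch (inferred from the code above, not part of the original
+// comments): each finished node hands one unit of num_outstanding_ops_ to a
+// ready successor. With 0 ready successors the count drops by 1 (execution
+// finishes when it reaches 0); with 1 it is unchanged; with k > 1 it grows
+// by k - 1.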
+
+void ExecutorState::ProcessInline(const std::deque<TaggedNode>& inline_ready) {
+ if (inline_ready.empty()) return;
+ int64 scheduled_usec = 0;
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ for (auto& tagged_node : inline_ready) {
+ Process(tagged_node, scheduled_usec);
+ }
+}
+
+void ExecutorState::ScheduleReady(const TaggedNodeSeq& ready,
+ std::deque<TaggedNode>* inline_ready) {
+ if (ready.empty()) return;
+
+ int64 scheduled_usec = 0;
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ if (inline_ready == nullptr) {
+ // Schedule to run all the ready ops in thread pool.
+ for (auto& tagged_node : ready) {
+ runner_(std::bind(&ME::Process, this, tagged_node, scheduled_usec));
+ }
+ return;
+ }
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ const TaggedNode* curr_expensive_node = nullptr;
+ for (auto& tagged_node : ready) {
+ const NodeItem& item = nodes[tagged_node.node->id()];
+ if (tagged_node.is_dead || !item.kernel->IsExpensive()) {
+ // Inline this inexpensive node.
+ inline_ready->push_back(tagged_node);
+ } else {
+ if (curr_expensive_node) {
+ // Dispatch to another thread since there is plenty of work to
+ // do for this thread.
+ runner_(std::bind(&ME::Process, this, *curr_expensive_node,
+ scheduled_usec));
+ }
+ curr_expensive_node = &tagged_node;
+ }
+ }
+ if (curr_expensive_node) {
+ if (inline_ready->empty()) {
+ // Tail recursion optimization
+ inline_ready->push_back(*curr_expensive_node);
+ } else {
+ // There are inline nodes to run already. We dispatch this expensive
+ // node to other thread.
+ runner_(
+ std::bind(&ME::Process, this, *curr_expensive_node, scheduled_usec));
+ }
+ }
+}
+
+void ExecutorState::Finish() {
+ mu_.lock();
+ auto status = status_;
+ auto done_cb = done_cb_;
+ auto runner = runner_;
+ mu_.unlock();
+ delete this;
+ CHECK(done_cb != nullptr);
+ runner([done_cb, status]() { done_cb(status); });
+}
+
+bool ExecutorState::IsFrameDone(FrameState* frame) {
+ return (frame->num_pending_inputs == 0 &&
+ frame->num_outstanding_iterations == 0);
+}
+
+bool ExecutorState::IsIterationDone(FrameState* frame, int64 iter) {
+ IterationState* iter_state = frame->GetIteration(iter);
+ if (iter_state->outstanding_ops == 0 &&
+ iter_state->outstanding_frame_count == 0) {
+ if (iter == 0) {
+ // The enclosing frame has no pending input.
+ return frame->num_pending_inputs == 0;
+ } else {
+ // The preceding iteration is deleted (and therefore done).
+ return (frame->GetIteration(iter - 1) == nullptr);
+ }
+ }
+ return false;
+}
+
+void ExecutorState::FindOrCreateChildFrame(FrameState* frame, int64 iter,
+ const Node* node,
+ FrameState** child) {
+ // Get the child frame name.
+ string enter_name;
+ Status s = GetNodeAttr(node->def(), "frame_name", &enter_name);
+ CHECK(s.ok()) << s;
+ const string child_name = MakeFrameName(frame, iter, enter_name);
+
+ auto it = outstanding_frames_.find(child_name);
+ if (it != outstanding_frames_.end()) {
+ *child = it->second;
+ } else {
+ // Need to create a new frame instance.
+ VLOG(2) << "Create frame: " << child_name;
+
+ FrameState* temp = new FrameState;
+ temp->frame_name = child_name;
+ temp->frame_id = Hash64(child_name);
+ temp->parent_frame = frame;
+ temp->parent_iter = iter;
+ s = GetNodeAttr(node->def(), "parallel_iterations",
+ &temp->max_parallel_iterations);
+ CHECK(s.ok()) << s;
+ // 'iterations' is a fixed-length circular buffer.
+ temp->iterations.resize(temp->max_parallel_iterations + 1);
+ IterationState* iter_state = new IterationState;
+ temp->iterations[0] = iter_state;
+
+ iter_state->outstanding_ops = 0;
+ iter_state->outstanding_frame_count = 0;
+ iter_state->pending_count = new std::vector<int>;
+ InitializePending(impl_->graph_, iter_state->pending_count);
+ iter_state->dead_count =
+ new std::vector<int>(impl_->graph_->num_node_ids());
+ iter_state->input_tensors = new std::vector<Entry>(impl_->total_tensors_);
+
+ auto frame_pending = impl_->frame_input_count_.find(enter_name);
+ DCHECK(frame_pending != impl_->frame_input_count_.end());
+ temp->num_pending_inputs = frame_pending->second;
+ temp->num_outstanding_iterations = 1;
+ *child = temp;
+
+ frame->GetIteration(iter)->outstanding_frame_count++;
+ outstanding_frames_[child_name] = temp;
+ }
+}
+
+void ExecutorState::IncrementIteration(FrameState* frame,
+ TaggedNodeSeq* ready) {
+ frame->iteration_count++;
+ int64 next_iter = frame->iteration_count;
+
+ VLOG(2) << "Create iteration: [" << frame->frame_name << ", " << next_iter
+ << "]";
+
+ IterationState* iter_state = new IterationState;
+ frame->SetIteration(next_iter, iter_state);
+ frame->num_outstanding_iterations++;
+ frame->dead_exits.clear();
+
+ iter_state->outstanding_ops = 0;
+ iter_state->outstanding_frame_count = 0;
+ iter_state->pending_count = new std::vector<int>;
+ InitializePending(impl_->graph_, iter_state->pending_count);
+ iter_state->dead_count = new std::vector<int>(impl_->graph_->num_node_ids());
+ iter_state->input_tensors = new std::vector<Entry>(impl_->total_tensors_);
+
+ // Activate the successors of the deferred roots in the new iteration.
+ ActivateNexts(frame, next_iter, ready);
+
+ // Activate the loop invariants in the new iteration.
+ ActivateLoopInvs(frame, next_iter, ready);
+}
+
+bool ExecutorState::SetOutputFrameIter(const TaggedNode& tagged_node,
+ const EntryVector& outputs,
+ FrameState** output_frame,
+ int64* output_iter,
+ TaggedNodeSeq* ready) {
+ const Node* node = tagged_node.node;
+ FrameState* input_frame = tagged_node.input_frame;
+ int64 input_iter = tagged_node.input_iter;
+ bool is_dead = tagged_node.is_dead;
+ bool is_enter = IsEnter(node);
+
+ if (is_enter) {
+ FindOrCreateChildFrame(input_frame, input_iter, node, output_frame);
+ // Propagate if this is a loop invariant.
+ bool is_constant;
+ Status s = GetNodeAttr(node->def(), "is_constant", &is_constant);
+ CHECK(s.ok()) << s;
+ if (is_constant) {
+ AddLoopInv(*output_frame, node, outputs[0], ready);
+ }
+ --(*output_frame)->num_pending_inputs;
+ *output_iter = 0;
+ } else if (IsExit(node)) {
+ if (is_dead) {
+ // Stop and remember this node if it is a dead exit.
+ if (input_iter == input_frame->iteration_count) {
+ input_frame->dead_exits.push_back(node);
+ }
+ *output_frame = nullptr;
+ } else {
+ *output_frame = input_frame->parent_frame;
+ *output_iter = input_frame->parent_iter;
+ }
+ } else if (IsNextIteration(node)) {
+ if (is_dead) {
+ // Stop the deadness propagation
+ *output_frame = nullptr;
+ } else {
+ if (input_iter == input_frame->iteration_count &&
+ input_frame->num_outstanding_iterations ==
+ input_frame->max_parallel_iterations) {
+ // Reached the maximum for parallel iterations.
+ input_frame->next_iter_roots.push_back({node, outputs[0]});
+ *output_frame = nullptr;
+ } else {
+ // If this is a new iteration, start it.
+ if (input_iter == input_frame->iteration_count) {
+ IncrementIteration(input_frame, ready);
+ }
+ *output_iter = input_iter + 1;
+ }
+ }
+ }
+ return is_enter;
+}
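+
+// Routing summary (inferred from the code above, not part of the original
+// comments): an Enter node's output goes to iteration 0 of a (possibly new)
+// child frame; an Exit node's output goes back to the parent frame at
+// parent_iter, with dead exits deferred until the final iteration; a
+// NextIteration node's output goes to iteration input_iter + 1 of the same
+// frame unless the parallel-iteration limit forces it to be deferred in
+// next_iter_roots.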
+
+void ExecutorState::CleanupFramesIterations(FrameState* frame, int64 iter,
+ TaggedNodeSeq* ready) {
+ int64 curr_iter = iter;
+ while (curr_iter <= frame->iteration_count &&
+ IsIterationDone(frame, curr_iter)) {
+ // Delete the iteration curr_iter
+ VLOG(2) << "Delete iteration [" << frame->frame_name << ", " << curr_iter
+ << "].";
+
+ delete frame->GetIteration(curr_iter);
+ frame->SetIteration(curr_iter, nullptr);
+ --frame->num_outstanding_iterations;
+ ++curr_iter;
+
+ // If there is a deferred iteration, start it.
+ if (frame->next_iter_roots.size() > 0) {
+ IncrementIteration(frame, ready);
+ }
+ }
+
+ if (IsFrameDone(frame)) {
+ FrameState* parent_frame = frame->parent_frame;
+ int64 parent_iter = frame->parent_iter;
+
+ // Propagate all the dead exits to the parent frame.
+ for (const Node* node : frame->dead_exits) {
+ auto parent_iter_state = parent_frame->GetIteration(parent_iter);
+ std::vector<int>* pending = parent_iter_state->pending_count;
+ std::vector<int>* dead_count = parent_iter_state->dead_count;
+ for (const Edge* e : node->out_edges()) {
+ const Node* dst_node = e->dst();
+ const int dst_id = dst_node->id();
+
+ bool dst_dead = true;
+ bool dst_ready = false;
+ // We know this is a dead input to dst
+ if (IsMerge(dst_node)) {
+ if (e->IsControlEdge()) {
+ (*pending)[dst_id] -= 2;
+ int count = (*pending)[dst_id];
+ dst_dead = ((*dead_count)[dst_id] == dst_node->num_inputs());
+ dst_ready = (count == 1) || ((count == 0) && dst_dead);
+ } else {
+ ++(*dead_count)[dst_id];
+ dst_dead = ((*dead_count)[dst_id] == dst_node->num_inputs());
+ dst_ready = ((*pending)[dst_id] == 0) && dst_dead;
+ }
+ } else {
+ ++(*dead_count)[dst_id];
+ dst_ready = (--(*pending)[dst_id] == 0);
+ }
+ if (dst_ready) {
+ ready->push_back(
+ TaggedNode(dst_node, parent_frame, parent_iter, dst_dead));
+ parent_iter_state->outstanding_ops++;
+ }
+ }
+ }
+
+ // Delete the frame
+ const string& frame_name = frame->frame_name;
+ VLOG(2) << "Delete frame " << frame_name;
+ outstanding_frames_.erase(frame_name);
+ delete frame;
+
+ // Cleanup recursively
+ if (parent_frame != nullptr) {
+ parent_frame->GetIteration(parent_iter)->outstanding_frame_count--;
+ CleanupFramesIterations(parent_frame, parent_iter, ready);
+ }
+ }
+}
+
+// When the ExecutorImpl graph has no control flow nodes,
+// SimpleExecutorState is used instead of ExecutorState. It maintains
+// less internal state and is convenient for experimenting with async
+// op kernels.
+class SimpleExecutorState {
+ public:
+ SimpleExecutorState(const Executor::Args& args, ExecutorImpl* impl);
+ ~SimpleExecutorState() {
+ for (auto it : device_context_map_) {
+ it.second->Unref();
+ }
+ delete slice_reader_cache_;
+ }
+ void RunAsync(Executor::DoneCallback done);
+
+ private:
+ typedef SimpleExecutorState ME;
+
+ // Not owned.
+ Rendezvous* rendezvous_;
+ StepStatsCollector* stats_collector_;
+ checkpoint::TensorSliceReaderCacheWrapper* slice_reader_cache_;
+ FunctionCallFrame* call_frame_;
+ const ExecutorImpl* impl_;
+ CancellationManager* cancellation_manager_;
+ Executor::Args::Runner runner_;
+
+ // Owned.
+
+  // The i-th node's j-th input is in
+  // input_tensors_[impl_->nodes[i].input_start + j]. The entry is either
+  // a tensor pointer (pass-by-reference) or a tensor (pass-by-value).
+  //
+  // NOTE: Not protected by mu_ because input_tensors_ is resized once.
+  // Each element of input_tensors_ is written once by the source node of
+  // an edge and is cleared by the destination of the same edge. The
+  // latter node is never run concurrently with the former node.
+ struct Entry {
+ Tensor val = *kEmptyTensor; // A tensor value.
+ Tensor* ref = nullptr; // A tensor reference.
+ mutex* ref_mu = nullptr; // mutex for *ref if ref is not nullptr.
+
+ // Every entry carries an optional DeviceContext containing
+ // Device-specific information about how the Tensor was produced.
+ DeviceContext* device_context = nullptr;
+
+ // The attributes of the allocator that creates the tensor.
+ AllocatorAttributes alloc_attr;
+ };
+
+ // Contains a map from node id to the DeviceContext object that was
+ // assigned by the device at the beginning of a step.
+ DeviceContextMap device_context_map_;
+
+ std::vector<Entry> input_tensors_;
+
+ // Step-local resource manager.
+ ResourceMgr step_resource_manager_;
+
+ // Invoked when the execution finishes.
+ Executor::DoneCallback done_cb_;
+
+ // How many active threads of computation are being used. Same as
+ // the number of pending Process() functions.
+ std::atomic_int_fast32_t num_active_;
+
+ mutex mu_;
+ Status status_ GUARDED_BY(mu_);
+
+  // The i-th kernel is still waiting for pending_[i] inputs.
+ class CountDown {
+ public:
+ CountDown() : v_(0) {}
+ void Set(int32 v) { v_.store(v); }
+ bool Dec() {
+ return v_.load(std::memory_order_acquire) == 1 || v_.fetch_sub(1) == 1;
+ }
+
+ private:
+ std::atomic_int_fast32_t v_;
+ };
+ std::vector<CountDown> pending_;
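+
+  // Illustrative example (assumed counts, not from the original source): a
+  // node with 3 input edges has pending_[id] set to 3 in RunAsync(). Each
+  // completed predecessor decrements it via Dec(); the third call returns
+  // true, signaling that all inputs are available.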
+
+ // Process Node identified by "id" in current thread. "scheduled_usec"
+ // indicates when the node becomes ready and gets scheduled.
+ void Process(int id, int64 scheduled_usec);
+
+ // Before invoking item->kernel, fills in its "inputs".
+ Status PrepareInputs(const NodeItem& item, TensorValueVec* inputs,
+ DeviceContextVec* input_device_contexts);
+
+ // After item->kernel computation is done, processes its outputs
+ // and returns nodes that become "ready".
+ typedef gtl::InlinedVector<int, 8> ReadyNodeIds;
+ Status ProcessOutputs(const NodeItem& item, OpKernelContext* ctx,
+ ReadyNodeIds* ready, NodeExecStats* stats);
+
+  // "node" has just finished. Takes ownership of "stats". Returns true if
+  // execution has completed.
+ bool NodeDone(const Status& s, const Node* node, const ReadyNodeIds& ready,
+ NodeExecStats* stats, std::deque<int>* inline_ready);
+
+ // Call Process() on all nodes in 'inline_ready'.
+ void ProcessInline(const std::deque<int>& inline_ready);
+
+ // Schedule all the expensive nodes in 'ready', and put all the inexpensive
+ // nodes in 'ready' into 'inline_ready'.
+ void ScheduleReady(const ReadyNodeIds& ready, std::deque<int>* inline_ready);
+
+ // One thread of control finishes.
+ void Finish();
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SimpleExecutorState);
+};
+
+SimpleExecutorState::SimpleExecutorState(const Executor::Args& args,
+ ExecutorImpl* impl)
+ : rendezvous_(args.rendezvous),
+ stats_collector_(args.stats_collector),
+ slice_reader_cache_(new checkpoint::TensorSliceReaderCacheWrapper),
+ call_frame_(args.call_frame),
+ impl_(impl),
+ cancellation_manager_(args.cancellation_manager),
+ runner_(args.runner),
+ num_active_(0),
+ pending_(impl_->nodes_.size()) {}
+
+void SimpleExecutorState::ProcessInline(const std::deque<int>& inline_ready) {
+ if (inline_ready.empty()) return;
+ int64 scheduled_usec = 0;
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ for (int id : inline_ready) {
+ Process(id, scheduled_usec);
+ }
+}
+
+void SimpleExecutorState::ScheduleReady(const ReadyNodeIds& ready,
+ std::deque<int>* inline_ready) {
+ if (ready.empty()) return;
+
+ int64 scheduled_usec = 0;
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ if (inline_ready == nullptr) {
+ // Schedule to run all the ready ops in thread pool.
+ for (auto id : ready) {
+ runner_(std::bind(&ME::Process, this, id, scheduled_usec));
+ }
+ return;
+ }
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ int curr_expensive_node = -1;
+ for (auto id : ready) {
+ if (!nodes[id].kernel->IsExpensive()) {
+ // Inline this inexpensive node.
+ inline_ready->push_back(id);
+ } else {
+ if (curr_expensive_node != -1) {
+ // Dispatch to another thread since there is plenty of work to
+ // do for this thread.
+ runner_(
+ std::bind(&ME::Process, this, curr_expensive_node, scheduled_usec));
+ }
+ curr_expensive_node = id;
+ }
+ }
+ if (curr_expensive_node != -1) {
+ if (inline_ready->empty()) {
+ // Tail recursion optimization
+ inline_ready->push_back(curr_expensive_node);
+ } else {
+ // There are inline nodes to run already. We dispatch this expensive
+ // node to other thread.
+ runner_(
+ std::bind(&ME::Process, this, curr_expensive_node, scheduled_usec));
+ }
+ }
+}
+
+void SimpleExecutorState::RunAsync(Executor::DoneCallback done) {
+ const Graph* graph = impl_->graph_;
+ ReadyNodeIds ready;
+
+ // Ask the device to fill in the device context map.
+ Device* device = impl_->params_.device;
+ device->FillContextMap(graph, &device_context_map_);
+
+ for (const Node* n : graph->nodes()) {
+ const int id = n->id();
+ const int num_in_edges = n->in_edges().size();
+ pending_[id].Set(num_in_edges);
+ if (num_in_edges == 0) {
+ ready.push_back(id);
+ }
+ }
+ if (ready.empty()) {
+ done(Status::OK());
+ } else {
+ num_active_ = ready.size();
+ done_cb_ = done;
+ input_tensors_.resize(impl_->total_tensors_);
+ // Schedule to run all the ready ops in thread pool.
+ ScheduleReady(ready, nullptr);
+ }
+}
+
+Status SimpleExecutorState::PrepareInputs(
+ const NodeItem& item, TensorValueVec* inputs,
+ DeviceContextVec* input_device_contexts) {
+ const Node* node = item.node;
+
+ inputs->clear();
+ inputs->resize(node->num_inputs());
+ input_device_contexts->clear();
+ input_device_contexts->resize(node->num_inputs());
+
+ for (int i = 0; i < node->num_inputs(); ++i) {
+ const bool expect_ref = IsRefType(node->input_type(i));
+ Entry* entry = input_tensors_.data() + item.input_start + i;
+ (*input_device_contexts)[i] = entry->device_context;
+
+ // i-th input.
+ TensorValue* inp = &(*inputs)[i];
+
+ if (entry->ref == nullptr) {
+ if (expect_ref) {
+ return AttachDef(
+ errors::InvalidArgument(i, "-th input expects a ref type"),
+ item.kernel->def());
+ }
+ inp->tensor = &entry->val;
+ } else {
+ if (!entry->ref->IsInitialized() && !IsInitializationOp(item.node)) {
+ return AttachDef(
+ errors::FailedPrecondition("Attempting to use uninitialized value ",
+ item.kernel->def().input(i)),
+ item.kernel->def());
+ }
+ if (expect_ref) {
+ inp->mutex_if_ref = entry->ref_mu;
+ inp->tensor = entry->ref;
+ } else {
+ // Automatically deref the tensor ref when the op expects a
+ // tensor but is given a ref to a tensor. Need to deref it
+ // under the mutex.
+ {
+ mutex_lock l(*(entry->ref_mu));
+ entry->val = *entry->ref;
+ }
+ inp->tensor = &entry->val;
+ }
+ }
+ }
+ return Status::OK();
+}
+
+void SimpleExecutorState::Process(int id, int64 scheduled_usec) {
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ ReadyNodeIds ready;
+ std::deque<int> inline_ready;
+
+ // Parameters passed to OpKernel::Compute.
+ TensorValueVec inputs;
+ DeviceContextVec input_device_contexts;
+
+ OpKernelContext::Params params;
+ Device* device = impl_->params_.device;
+ params.device = device;
+ // track allocations if and only if we are collecting statistics
+ params.track_allocations = (stats_collector_ != nullptr);
+ params.rendezvous = rendezvous_;
+ params.cancellation_manager = cancellation_manager_;
+ params.call_frame = call_frame_;
+ params.function_library = impl_->params_.function_library;
+ params.resource_manager = device->resource_manager();
+ params.step_resource_manager = &step_resource_manager_;
+ params.slice_reader_cache = slice_reader_cache_;
+ params.inputs = &inputs;
+ params.input_device_contexts = &input_device_contexts;
+ params.frame_iter = FrameAndIter(0, 0);
+
+ Status s;
+ NodeExecStats* stats = nullptr;
+ bool completed = false;
+ inline_ready.push_back(id);
+ while (!inline_ready.empty()) {
+ id = inline_ready.front();
+ inline_ready.pop_front();
+ const NodeItem& item = nodes[id];
+ const Node* node = item.node;
+
+ // Set the device_context for this node id, if it exists.
+ auto dc_it = device_context_map_.find(id);
+ if (dc_it != device_context_map_.end()) {
+ params.op_device_context = dc_it->second;
+ }
+
+ if (stats_collector_) {
+ stats = new NodeExecStats;
+ stats->set_node_name(node->name());
+ nodestats::SetScheduled(stats, scheduled_usec);
+ nodestats::SetAllStart(stats);
+ }
+
+ VLOG(1) << "Process node: " << id << " " << SummarizeNodeDef(node->def());
+
+ // Prepares inputs.
+ s = PrepareInputs(item, &inputs, &input_device_contexts);
+ if (!s.ok()) {
+ // Continue to process the nodes in 'inline_ready'.
+ completed = NodeDone(s, item.node, ready, stats, &inline_ready);
+ continue;
+ }
+
+ OpKernel* op_kernel = item.kernel;
+ params.op_kernel = op_kernel;
+ params.output_alloc_attr = [this, node, op_kernel](int index) {
+ return OutputAttributes(&impl_->alloc_attr_, node, op_kernel, index);
+ };
+
+ // Asynchronous computes.
+ AsyncOpKernel* async = op_kernel->AsAsync();
+ if (async) {
+ auto pcopy = CopyParams(params);
+ auto ctx = new OpKernelContext(*pcopy);
+ auto done = [this, item, ctx, stats, pcopy]() {
+ VLOG(2) << this
+ << " Async kernel done: " << SummarizeNodeDef(item.node->def());
+ if (stats_collector_) nodestats::SetOpEnd(stats);
+ ReadyNodeIds ready;
+ Status s = ProcessOutputs(item, ctx, &ready, stats);
+ if (stats_collector_) nodestats::SetMemory(stats, ctx);
+ // Schedule to run all the ready ops in thread pool.
+ bool completed = NodeDone(s, item.node, ready, stats, nullptr);
+ delete ctx;
+ DeleteParams(pcopy);
+ if (completed) Finish();
+ };
+ if (stats_collector_) nodestats::SetOpStart(stats);
+ device->ComputeAsync(async, ctx, done);
+ } else {
+ // Synchronous computes.
+ OpKernelContext ctx(params);
+ if (stats_collector_) nodestats::SetOpStart(stats);
+ device->Compute(CHECK_NOTNULL(op_kernel), &ctx);
+ if (stats_collector_) nodestats::SetOpEnd(stats);
+
+ s = ProcessOutputs(item, &ctx, &ready, stats);
+ if (stats_collector_) nodestats::SetMemory(stats, &ctx);
+ if (stats_collector_) {
+ scheduled_usec = nodestats::NowInUsec();
+ }
+ completed = NodeDone(s, node, ready, stats, &inline_ready);
+ }
+ } // while !inline_ready.empty()
+
+ // This thread of computation is done if completed = true.
+ if (completed) Finish();
+}
+
+bool SimpleExecutorState::NodeDone(const Status& s, const Node* node,
+ const ReadyNodeIds& ready,
+ NodeExecStats* stats,
+ std::deque<int>* inline_ready) {
+ if (stats_collector_) {
+ nodestats::SetAllEnd(stats);
+ if (!SetTimelineLabel(node, stats)) {
+ // Only record non-transfer nodes.
+ stats_collector_->Save(impl_->params_.device->name(), stats);
+ } else {
+ delete stats;
+ }
+ }
+
+ Rendezvous* captured_rendezvous = nullptr; // Will be set on error.
+ if (!s.ok()) {
+ // Some error happened. This thread of computation is done.
+ mutex_lock l(mu_);
+ if (status_.ok()) {
+ captured_rendezvous = rendezvous_;
+ if (captured_rendezvous) captured_rendezvous->Ref();
+ status_ = s;
+ }
+ }
+ if (captured_rendezvous) {
+ // If we captured the rendezvous_ pointer, we are in an error condition.
+ // Use captured_rendezvous, in case "this" is deleted by another thread.
+ TRACEPRINTF("StartAbort: %s", s.ToString().c_str());
+ captured_rendezvous->StartAbort(s);
+ captured_rendezvous->Unref();
+ }
+
+ bool completed = false;
+ int ready_size = ready.size();
+ if (ready_size == 0 || !s.ok()) {
+ completed = (num_active_.fetch_sub(1) == 1);
+ } else if (ready_size > 1) {
+ num_active_.fetch_add(ready_size - 1, std::memory_order_relaxed);
+ }
+
+ // Schedule the ready nodes in 'ready'.
+ if (s.ok()) {
+ ScheduleReady(ready, inline_ready);
+ }
+ return completed;
+}
+
+void SimpleExecutorState::Finish() {
+ mu_.lock();
+ auto ret = status_;
+ auto done_cb = done_cb_;
+ auto runner = runner_;
+ mu_.unlock();
+ delete this;
+ CHECK(done_cb != nullptr);
+ runner([done_cb, ret]() { done_cb(ret); });
+}
+
+Status SimpleExecutorState::ProcessOutputs(const NodeItem& item,
+ OpKernelContext* ctx,
+ ReadyNodeIds* ready,
+ NodeExecStats* stats) {
+ Status s = ctx->status();
+ if (!s.ok()) {
+ s = AttachDef(s, item.kernel->def());
+ LOG(WARNING) << this << " Compute status: " << s;
+ return s;
+ }
+
+ // Processes outputs.
+ gtl::InlinedVector<Entry, 4> outputs;
+ const Node* node = item.node;
+ outputs.resize(node->num_outputs());
+
+ // Get the device_context for this node id, if it exists.
+ DeviceContext* device_context = nullptr;
+ auto dc_it = device_context_map_.find(node->id());
+ if (dc_it != device_context_map_.end()) {
+ device_context = dc_it->second;
+ }
+
+ for (int i = 0; i < node->num_outputs(); ++i) {
+ TensorValue val = ctx->release_output(i);
+ // Sanity check of output tensor types.
+ DataType dtype = val->dtype();
+ if (val.is_ref()) dtype = MakeRefType(dtype);
+ if (dtype == node->output_type(i)) {
+ Entry* out = &(outputs[i]);
+ if (val.is_ref()) {
+ out->ref = val.tensor;
+ out->ref_mu = val.mutex_if_ref;
+ } else {
+ out->val = *val.tensor;
+ }
+
+ // Set the device context of the output entry.
+ out->device_context = device_context;
+
+ // Set the allocator attributes of the output entry.
+ out->alloc_attr = ctx->output_alloc_attr(i);
+
+ if (stats_collector_ && val.tensor->IsInitialized()) {
+ nodestats::SetOutput(stats, i, ctx->output_allocation_type(i),
+ val.tensor);
+ }
+ } else {
+ s.Update(
+ errors::Internal("Output ", i, " of type ", DataTypeString(dtype),
+ " does not match declared output type ",
+ DataTypeString(node->output_type(i)),
+ " for operation ", SummarizeNodeDef(node->def())));
+ }
+ if (!val.is_ref()) {
+ // If OpKernelContext returned outputs via pass-by-value, we
+ // wouldn't need this trouble.
+ delete val.tensor;
+ }
+ }
+ if (!s.ok()) return s;
+
+ // Clears inputs.
+ for (int i = 0; i < node->num_inputs(); ++i) {
+ input_tensors_[item.input_start + i].val = *kEmptyTensor;
+ }
+
+ // Propagates outputs along out edges.
+ ready->clear();
+ const std::vector<NodeItem>& nodes = impl_->nodes_;
+ for (const Edge* e : node->out_edges()) {
+ const int src_slot = e->src_output();
+ const int dst_id = e->dst()->id();
+ const NodeItem& dst_item = nodes[dst_id];
+ if (!e->IsControlEdge()) {
+ const int dst_slot = e->dst_input();
+ input_tensors_[dst_item.input_start + dst_slot] = outputs[src_slot];
+ }
+ if (pending_[dst_id].Dec()) {
+ ready->push_back(dst_id);
+ }
+ }
+ return Status::OK();
+}
+
+// NOTE(yuanbyu): Use the executor that supports control flow by default.
+const bool use_control_flow_executor = true;
+void ExecutorImpl::RunAsync(const Args& args, DoneCallback done) {
+ if (params_.has_control_flow || use_control_flow_executor) {
+ (new ExecutorState(args, this))->RunAsync(done);
+ } else {
+ (new SimpleExecutorState(args, this))->RunAsync(done);
+ }
+}
+
+} // end namespace
+
+Status NewLocalExecutor(const LocalExecutorParams& params, const Graph* graph,
+ Executor** executor) {
+ ExecutorImpl* impl = new ExecutorImpl(params, graph);
+ Status s = impl->Initialize();
+ if (s.ok()) {
+ *executor = impl;
+ } else {
+ delete impl;
+ }
+ return s;
+}
+
+Status CreateNonCachedKernel(Device* device, FunctionLibraryRuntime* flib,
+ const NodeDef& ndef, OpKernel** kernel) {
+ auto device_type = DeviceType(device->attributes().device_type());
+ auto allocator = device->GetAllocator(AllocatorAttributes());
+ return CreateOpKernel(device_type, device, allocator, flib, ndef, kernel);
+}
+
+void DeleteNonCachedKernel(OpKernel* kernel) { delete kernel; }
+
+Status CreateCachedKernel(Device* device, const string& session,
+ FunctionLibraryRuntime* flib, const NodeDef& ndef,
+ OpKernel** kernel) {
+ auto op_seg = device->op_segment();
+ auto create_fn = [device, flib, &ndef](OpKernel** kernel) {
+ return CreateNonCachedKernel(device, flib, ndef, kernel);
+ };
+ return op_seg->FindOrCreate(session, ndef.name(), kernel, create_fn);
+}
+
+// Deletes "kernel".
+void DeleteCachedKernel(Device* device, const string& session,
+ OpKernel* kernel) {
+ // Do nothing.
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/common_runtime/executor.h b/tensorflow/core/common_runtime/executor.h
new file mode 100644
index 0000000000..82bcbab836
--- /dev/null
+++ b/tensorflow/core/common_runtime/executor.h
@@ -0,0 +1,209 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_EXECUTOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_EXECUTOR_H_
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/rendezvous.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+class StepStatsCollector;
+
+// Executor runs a graph computation.
+// Example:
+// Graph* graph = ...;
+// ... construct graph ...
+// Executor* executor;
+// TF_CHECK_OK(NewSimpleExecutor(my_device, graph, &executor));
+// Rendezvous* rendezvous = NewNaiveRendezvous();
+// TF_CHECK_OK(rendezvous->Send("input", some_input_tensor));
+// TF_CHECK_OK(executor->Run({ExecutorOpts, rendezvous, nullptr}));
+// TF_CHECK_OK(rendezvous->Recv("input", &output_tensor));
+// ... ...
+//
+// Multiple threads can call Executor::Run concurrently.
+class Executor {
+ public:
+ virtual ~Executor() {}
+
+ // RunAsync() executes the graph computation. "done" is run when the
+ // graph computation completes. If any error happens during the
+ // computation, "done" is run and the error is passed to "done".
+ //
+ // RunAsync() is given a few arguments in Args. The caller must
+ // ensure objects passed in Args (rendezvous, stats_collector, etc.)
+ // are alive at least until done is invoked. All pointers to the
+ // argument objects can be nullptr.
+ //
+ // RunAsync() uses the given "rendezvous", if not null, as the
+ // mechanism to communicate inputs and outputs of the underlying
+ // graph computation.
+ //
+ // RunAsync() calls "stats_collector", if not null, to keep track of
+ // stats. This allows us to collect statistics and traces on demand.
+ //
+ // RunAsync() is provided a "call_frame", which, if the executor is
+ // used for executing a function, is used to pass arguments and return
+ // values between the caller and the callee.
+ //
+ // RunAsync() uses "cancellation_manager", if not nullptr, to
+ // register callbacks that should be called if the graph computation
+ // is cancelled. Note that the callbacks merely unblock any
+ // long-running computation, and a cancelled step will terminate by
+ // returning/calling the DoneCallback as usual.
+ //
+ // RunAsync() dispatches closures to "runner". Typically, "runner"
+ // is backed by a bounded threadpool.
+ struct Args {
+ Rendezvous* rendezvous = nullptr;
+ StepStatsCollector* stats_collector = nullptr;
+ FunctionCallFrame* call_frame = nullptr;
+ CancellationManager* cancellation_manager = nullptr;
+
+ typedef std::function<void()> Closure;
+ typedef std::function<void(Closure)> Runner;
+ Runner runner = nullptr;
+ };
+ typedef std::function<void(const Status&)> DoneCallback;
+ virtual void RunAsync(const Args& args, DoneCallback done) = 0;
+
+ // Synchronous wrapper for RunAsync().
+ Status Run(const Args& args) {
+ Status ret;
+ Notification n;
+ RunAsync(args, [&ret, &n](const Status& s) {
+ ret = s;
+ n.Notify();
+ });
+ n.WaitForNotification();
+ return ret;
+ }
+};
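+
+// Example of the synchronous wrapper (an illustrative sketch only;
+// "pool" is a hypothetical thread pool, and "executor"/"rendezvous" are
+// assumed to already exist):
+//
+// Executor::Args args;
+// args.rendezvous = rendezvous;
+// args.runner = [&pool](Executor::Args::Closure c) {
+// pool.Schedule(std::move(c));
+// };
+// Status s = executor->Run(args);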
+
+// Creates an Executor that computes the given "graph".
+//
+// If successful, returns the constructed executor in "*executor". The
+// caller keeps the ownership of "device". The returned executor takes
+// the ownership of "graph". Otherwise, returns an error status.
+//
+// "params" provides a set of context for the executor. We expect that
+// different contexts would provide different implementations.
+struct LocalExecutorParams {
+ Device* device;
+
+ // The library runtime support.
+ FunctionLibraryRuntime* function_library;
+
+ // True iff the computation contains control flow nodes.
+ bool has_control_flow;
+
+ // create_kernel returns an instance of op kernel based on NodeDef.
+ // delete_kernel is called for every kernel used by the executor
+ // when the executor is deleted.
+ std::function<Status(const NodeDef&, OpKernel**)> create_kernel;
+ std::function<void(OpKernel*)> delete_kernel;
+};
+::tensorflow::Status NewLocalExecutor(const LocalExecutorParams& params,
+ const Graph* graph, Executor** executor);
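+
+// Example of setting up LocalExecutorParams (an illustrative sketch;
+// "device", "flib" and "graph" are assumed to be valid objects owned by
+// the caller):
+//
+// LocalExecutorParams params;
+// params.device = device;
+// params.function_library = flib;
+// params.has_control_flow = false;
+// params.create_kernel = [device, flib](const NodeDef& ndef, OpKernel** k) {
+// return CreateNonCachedKernel(device, flib, ndef, k);
+// };
+// params.delete_kernel = [](OpKernel* k) { DeleteNonCachedKernel(k); };
+// Executor* executor;
+// TF_CHECK_OK(NewLocalExecutor(params, graph, &executor));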
+
+// A class to help run multiple executors in parallel and wait until
+// all of them are complete.
+//
+// ExecutorBarrier deletes itself once the closure returned by Get() has
+// been called by every participating executor.
+class ExecutorBarrier {
+ public:
+ typedef std::function<void(const Status&)> StatusCallback;
+
+ // Create an ExecutorBarrier for 'num' different executors.
+ //
+ // 'r' is the shared Rendezvous object that is used to communicate
+ // state. If any of the executors experiences an error, the
+ // rendezvous object will be aborted exactly once.
+ //
+ // 'done' is called after the last executor completes, and
+ // ExecutorBarrier is deleted.
+ ExecutorBarrier(int num, Rendezvous* r, StatusCallback done)
+ : rendez_(r), done_cb_(done), pending_(num) {}
+
+ ~ExecutorBarrier() {}
+
+ // Returns a closure that Executors must call when they are done
+ // computing, passing the status of their execution as an argument.
+ StatusCallback Get() {
+ return std::bind(&ExecutorBarrier::WhenDone, this, std::placeholders::_1);
+ }
+
+ private:
+ Rendezvous* rendez_ = nullptr;
+ StatusCallback done_cb_ = nullptr;
+
+ mutable mutex mu_;
+ int pending_ GUARDED_BY(mu_) = 0;
+ Status status_ GUARDED_BY(mu_);
+
+ void WhenDone(const Status& s) {
+ bool error = false;
+ StatusCallback done = nullptr;
+ Status status;
+ {
+ mutex_lock l(mu_);
+ // If we are the first error encountered, mark the status
+ // appropriately and later trigger an abort of the Rendezvous
+ // object by this thread only.
+ if (status_.ok() && !s.ok()) {
+ error = true;
+ status_ = s;
+ }
+
+ // If this is the last call to WhenDone, call the final callback
+ // below.
+ if (--pending_ == 0) {
+ CHECK(done_cb_ != nullptr);
+ done = done_cb_;
+ done_cb_ = nullptr;
+ }
+ status = status_;
+ }
+ if (error) {
+ rendez_->StartAbort(status);
+ }
+ if (done != nullptr) {
+ delete this;
+ done(status);
+ }
+ }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ExecutorBarrier);
+};
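+
+// Example usage (an illustrative sketch; "executors" is a hypothetical
+// collection of Executor* instances that all share the rendezvous
+// "rendez"):
+//
+// ExecutorBarrier* barrier = new ExecutorBarrier(
+// executors.size(), rendez,
+// [](const Status& s) { /* every executor has finished */ });
+// for (Executor* e : executors) {
+// Executor::Args args;
+// args.rendezvous = rendez;
+// e->RunAsync(args, barrier->Get());
+// }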
+
+// A few helpers to facilitate create/delete kernels.
+
+// Creates a kernel based on "ndef" on device "device". The kernel can
+// access the functions in the "flib". The caller takes ownership of
+// returned "*kernel".
+Status CreateNonCachedKernel(Device* device, FunctionLibraryRuntime* flib,
+ const NodeDef& ndef, OpKernel** kernel);
+
+// Deletes "kernel" returned by CreateKernel.
+void DeleteNonCachedKernel(OpKernel* kernel);
+
+// Creates a kernel based on "ndef" on device "device". The kernel can
+// access the functions in the "flib". The caller does not take
+// ownership of returned "*kernel". If a kernel has been created for
+// ndef.name(), returns the same kernel instance.
+Status CreateCachedKernel(Device* device, const string& session,
+ FunctionLibraryRuntime* flib, const NodeDef& ndef,
+ OpKernel** kernel);
+
+// Deletes "kernel" returned by CreateCachedKernel.
+void DeleteCachedKernel(Device* device, const string& session,
+ OpKernel* kernel);
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_EXECUTOR_H_
diff --git a/tensorflow/core/common_runtime/function.cc b/tensorflow/core/common_runtime/function.cc
new file mode 100644
index 0000000000..2b1a041235
--- /dev/null
+++ b/tensorflow/core/common_runtime/function.cc
@@ -0,0 +1,1335 @@
+#include "tensorflow/core/common_runtime/function.h"
+
+#include <deque>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/executor.h"
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/optimizer_cse.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+
+namespace tensorflow {
+
+// A few string constants used throughout this module.
+static const char* const kArgOp = "_Arg";
+static const char* const kRetOp = "_Retval";
+static const char* const kGradientOp = "SymbolicGradient";
+static const char* const kNodeLabel = "Func";
+
+// Represents the index-th output of a node.
+struct Endpoint {
+ Node* node;
+ int index;
+
+ // Returns the string name that represents this endpoint.
+ string name() const {
+ if (index == 0) {
+ return node->name();
+ } else {
+ return strings::StrCat(node->name(), ":", index);
+ }
+ }
+
+ DataType dtype() const { return node->output_type(index); }
+};
+
+struct EndpointHash {
+ uint64 operator()(const Endpoint& x) const {
+ return Hash64(reinterpret_cast<const char*>(&x.node), sizeof(Node*),
+ x.index);
+ }
+};
+
+struct EndpointEq {
+ bool operator()(const Endpoint& x, const Endpoint& y) const {
+ return (x.node == y.node) && (x.index == y.index);
+ }
+};
+
+// The following Add* routines are used to add a few graph nodes while
+// functions are transformed.
+static Node* AddNoOp(Graph* g) {
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op("NoOp");
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ return ret;
+}
+
+static Node* AddIdentity(Graph* g, Endpoint input) {
+ DCHECK_LT(0, input.dtype());
+ DCHECK_LT(input.dtype(), DT_FLOAT_REF);
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op("Identity");
+ ndef.add_input(input.name());
+ AddNodeAttr("T", input.dtype(), &ndef);
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ g->AddEdge(input.node, input.index, ret, 0);
+ return ret;
+}
+
+static Node* AddArg(Graph* g, DataType dtype, int index) {
+ DCHECK_LT(0, dtype);
+ DCHECK_LT(dtype, DT_FLOAT_REF);
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op(kArgOp);
+ AddNodeAttr("T", dtype, &ndef);
+ AddNodeAttr("index", index, &ndef);
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ return ret;
+}
+
+static Node* AddRet(Graph* g, Endpoint input, int index) {
+ DCHECK_LT(0, input.dtype());
+ DCHECK_LT(input.dtype(), DT_FLOAT_REF);
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op(kRetOp);
+ ndef.add_input(input.name());
+ AddNodeAttr("T", input.dtype(), &ndef);
+ AddNodeAttr("index", index, &ndef);
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ g->AddEdge(input.node, input.index, ret, 0);
+ return ret;
+}
+
+static Node* AddZerosLike(Graph* g, Endpoint input) {
+ DCHECK_LT(0, input.dtype());
+ DCHECK_LT(input.dtype(), DT_FLOAT_REF);
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op("ZerosLike");
+ ndef.add_input(input.name());
+ AddNodeAttr("T", input.dtype(), &ndef);
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ g->AddEdge(input.node, input.index, ret, 0);
+ return ret;
+}
+
+static Node* AddSymGrad(Graph* g, Node* n, gtl::ArraySlice<Endpoint> grads) {
+ const int num_x = n->num_inputs();
+ const int num_y = n->num_outputs();
+ CHECK_EQ(num_y, grads.size());
+
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op(kGradientOp);
+
+ // The gradient node should have num_x + num_y inputs.
+ std::vector<Endpoint> n_inputs(num_x);
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) continue;
+ n_inputs[e->dst_input()] = {e->src(), e->src_output()};
+ }
+ DataTypeVector in_types;
+ for (const Endpoint& ep : n_inputs) {
+ ndef.add_input(ep.name());
+ in_types.push_back(ep.dtype());
+ }
+ for (const Endpoint& ep : grads) {
+ ndef.add_input(ep.name());
+ in_types.push_back(ep.dtype());
+ }
+ CHECK_EQ(ndef.input_size(), num_x + num_y);
+
+ AddNodeAttr("Tin", in_types, &ndef);
+
+ // The gradient node's outputs have the same types as the node 'n's
+ // inputs.
+ AddNodeAttr("Tout", n->input_types(), &ndef);
+ NameAttrList func;
+ func.set_name(n->type_string());
+ *(func.mutable_attr()) = n->def().attr();
+ AddNodeAttr("f", func, &ndef);
+ Status s;
+ Node* ret = g->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ return ret;
+}
+
+class ArgOp : public OpKernel {
+ public:
+ explicit ArgOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("T", &dtype_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("index", &index_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ auto frame = ctx->call_frame();
+ OP_REQUIRES(ctx, frame != nullptr, errors::Internal("no call frame"));
+ Tensor val;
+ OP_REQUIRES_OK(ctx, frame->GetArg(index_, &val));
+ OP_REQUIRES(ctx, val.dtype() == dtype_,
+ errors::InvalidArgument(
+ "Type mismatch: actual ", DataTypeString(val.dtype()),
+ " vs. expect ", DataTypeString(dtype_)));
+ ctx->set_output(0, val);
+ }
+
+ private:
+ int index_;
+ DataType dtype_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ArgOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("_Arg").Device(DEVICE_CPU), ArgOp);
+REGISTER_KERNEL_BUILDER(Name("_Arg").Device(DEVICE_GPU), ArgOp);
+
+class RetvalOp : public OpKernel {
+ public:
+ explicit RetvalOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("T", &dtype_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("index", &index_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& val = ctx->input(0);
+ OP_REQUIRES(ctx, val.dtype() == dtype_,
+ errors::InvalidArgument(
+ "Type mismatch: actual ", DataTypeString(val.dtype()),
+ " vs. expect ", DataTypeString(dtype_)));
+ auto frame = ctx->call_frame();
+ OP_REQUIRES(ctx, frame != nullptr, errors::Internal("no call frame"));
+ OP_REQUIRES_OK(ctx, frame->SetRetval(index_, val));
+ }
+
+ private:
+ int index_;
+ DataType dtype_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RetvalOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("_Retval").Device(DEVICE_CPU), RetvalOp);
+REGISTER_KERNEL_BUILDER(Name("_Retval").Device(DEVICE_GPU), RetvalOp);
+
+static const FunctionLibraryRuntime::Handle kInvalidHandle = -1;
+
+class FunctionLibraryRuntimeImpl : public FunctionLibraryRuntime {
+ public:
+ FunctionLibraryRuntimeImpl(Device* device, Runner runner,
+ const FunctionLibraryDefinition* lib_def);
+
+ ~FunctionLibraryRuntimeImpl() override;
+
+ Status Instantiate(const string& function_name,
+ const InstantiateAttrValueMap& attrs,
+ Handle* handle) override;
+
+ const FunctionBody* GetFunctionBody(Handle handle) override;
+
+ Status CreateKernel(const NodeDef& ndef, OpKernel** kernel) override;
+
+ void Run(const Options& opts, Handle handle, gtl::ArraySlice<Tensor> args,
+ std::vector<Tensor>* rets, DoneCallback done) override;
+
+ bool IsDefined(const string& function_name) override;
+
+ private:
+ typedef FunctionLibraryRuntimeImpl ME;
+
+ Device* const device_;
+ Runner runner_ = nullptr;
+ const FunctionLibraryDefinition* const lib_def_;
+ std::function<Status(const string&, const OpDef**)> get_func_sig_;
+ std::function<Status(const NodeDef&, OpKernel**)> create_kernel_;
+
+ mutable mutex mu_;
+
+ // Maps function instantiation to a handle. The key is a
+ // canonicalized representation of the function name and
+ // instantiation attrs. The handle is an index into the items_.
+ std::unordered_map<string, Handle> table_ GUARDED_BY(mu_);
+
+ // func_graphs_ never shrinks or reorders its members.
+ std::vector<FunctionBody*> func_graphs_ GUARDED_BY(mu_);
+
+ // The instantiated and transformed function is encoded as a Graph
+ // object, and an executor is created for the graph.
+ struct Item : public core::RefCounted {
+ Executor* exec = nullptr;
+
+ ~Item() override { delete this->exec; }
+ };
+ std::vector<Item*> items_;
+
+ Status FunctionDefToBody(const FunctionDef& fdef,
+ const InstantiateAttrValueMap& attrs,
+ FunctionBody** fbody);
+ Status CreateItem(Handle handle, Item** item);
+ Status GetOrCreateItem(Handle handle, Item** item);
+ Status InstantiateSymbolicGradient(const InstantiateAttrValueMap& attrs,
+ FunctionBody** g_body);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(FunctionLibraryRuntimeImpl);
+};
+
+FunctionLibraryRuntimeImpl::FunctionLibraryRuntimeImpl(
+ Device* device, Runner runner, const FunctionLibraryDefinition* lib_def)
+ : device_(device), runner_(runner), lib_def_(lib_def) {
+ get_func_sig_ = [this](const string& op, const OpDef** sig) {
+ Status s;
+ *sig = lib_def_->LookUp(op, &s);
+ return s;
+ };
+ create_kernel_ = [this](const NodeDef& ndef, OpKernel** kernel) {
+ return CreateKernel(ndef, kernel);
+ };
+}
+
+FunctionLibraryRuntimeImpl::~FunctionLibraryRuntimeImpl() {
+ for (FunctionBody* p : func_graphs_) delete p;
+ for (Item* item : items_)
+ if (item) item->Unref();
+}
+
+// An asynchronous op kernel which executes an instantiated function
+// defined in a library.
+class CallOp : public AsyncOpKernel {
+ public:
+ CallOp(FunctionLibraryRuntime::Handle handle, OpKernelConstruction* ctx)
+ : AsyncOpKernel(ctx), handle_(handle) {}
+
+ ~CallOp() override {}
+
+ void ComputeAsync(OpKernelContext* ctx, DoneCallback done) override {
+ FunctionLibraryRuntime* lib = ctx->function_library();
+ OP_REQUIRES_ASYNC(ctx, lib != nullptr,
+ errors::Internal("No function library is provided."),
+ done);
+ FunctionLibraryRuntime::Options opts;
+ std::vector<Tensor> args;
+ args.reserve(ctx->num_inputs());
+ for (int i = 0; i < ctx->num_inputs(); ++i) {
+ args.push_back(ctx->input(i));
+ }
+ std::vector<Tensor>* rets = new std::vector<Tensor>;
+ lib->Run(opts, handle_, args, rets,
+ [ctx, done, rets](const Status& status) {
+ if (!status.ok()) {
+ ctx->SetStatus(status);
+ } else {
+ CHECK_EQ(rets->size(), ctx->num_outputs());
+ for (size_t i = 0; i < rets->size(); ++i) {
+ ctx->set_output(i, (*rets)[i]);
+ }
+ }
+ delete rets;
+ done();
+ });
+ }
+
+ private:
+ FunctionLibraryRuntime::Handle handle_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(CallOp);
+};
+
+const FunctionBody* FunctionLibraryRuntimeImpl::GetFunctionBody(Handle h) {
+ mutex_lock l(mu_);
+ CHECK_LE(0, h);
+ CHECK_LT(h, func_graphs_.size());
+ return func_graphs_[h];
+}
+
+Status FunctionLibraryRuntimeImpl::CreateKernel(const NodeDef& ndef,
+ OpKernel** kernel) {
+ if (ndef.op() != kGradientOp && (lib_def_->Find(ndef.op()) == nullptr)) {
+ return CreateNonCachedKernel(device_, this, ndef, kernel);
+ }
+
+ // Try to instantiate this function for the func/attr. Maybe it's
+ // cached already.
+ Handle handle;
+ TF_RETURN_IF_ERROR(Instantiate(ndef.op(), ndef.attr(), &handle));
+
+ const FunctionBody* fbody = GetFunctionBody(handle);
+ CHECK_NOTNULL(fbody);
+
+ // Constructs a CallOp kernel for running the instantiated function.
+ Status s;
+ auto device_type = DeviceType(device_->attributes().device_type());
+ OpKernelConstruction construction(
+ device_type, device_, device_->GetAllocator(AllocatorAttributes()), &ndef,
+ &fbody->fdef.signature(), this, fbody->arg_types, fbody->ret_types, &s);
+ *kernel = new CallOp(handle, &construction);
+ if (!s.ok()) {
+ delete *kernel;
+ }
+ return s;
+}
+
+Status FunctionLibraryRuntimeImpl::FunctionDefToBody(
+ const FunctionDef& fdef, const InstantiateAttrValueMap& attrs,
+ FunctionBody** fbody) {
+ // Instantiates the function template into a graph def.
+ InstantiationResult result;
+ TF_RETURN_IF_ERROR(InstantiateFunction(fdef, attrs, get_func_sig_, &result));
+
+ Graph* graph = new Graph(lib_def_);
+ GraphConstructorOptions opts;
+ opts.allow_internal_ops = true;
+ opts.expect_device_spec = false;
+ Status s = ConvertGraphDefToGraph(opts, result.gdef, graph);
+ if (!s.ok()) {
+ delete graph;
+ } else {
+ *fbody = new FunctionBody(fdef, result.arg_types, result.ret_types, graph);
+ }
+ return s;
+}
+
+Status FunctionLibraryRuntimeImpl::InstantiateSymbolicGradient(
+ const InstantiateAttrValueMap& attrs, FunctionBody** g_body) {
+ const AttrValue* f = gtl::FindOrNull(attrs, "f");
+ if (f == nullptr) {
+ return errors::InvalidArgument("SymbolicGradient is missing attr: f");
+ }
+ const auto& func = f->func();
+ const FunctionDef* fdef = lib_def_->Find(func.name());
+ if (fdef == nullptr) {
+ // f is a primitive op.
+ gradient::Creator creator;
+ TF_RETURN_IF_ERROR(gradient::GetOpGradientCreator(func.name(), &creator));
+ if (creator == nullptr) {
+ return errors::InvalidArgument("No gradient is defined for ",
+ func.name());
+ }
+ FunctionDef grad_fdef;
+ TF_RETURN_IF_ERROR(creator(AttrSlice(&func.attr()), &grad_fdef));
+ TF_RETURN_IF_ERROR(FunctionDefToBody(grad_fdef, func.attr(), g_body));
+ } else {
+ // f is a user-defined function.
+ Handle f_handle;
+ TF_RETURN_IF_ERROR(Instantiate(func.name(), func.attr(), &f_handle));
+ const FunctionBody* f_body = GetFunctionBody(f_handle);
+ CHECK_NOTNULL(f_body);
+ *g_body = SymbolicGradient(*f_body);
+ }
+ return Status::OK();
+}
+
+Status FunctionLibraryRuntimeImpl::Instantiate(
+ const string& function_name, const InstantiateAttrValueMap& attrs,
+ Handle* handle) {
+ const string key = Canonicalize(function_name, attrs);
+ {
+ mutex_lock l(mu_);
+ *handle = gtl::FindWithDefault(table_, key, kInvalidHandle);
+ if (*handle != kInvalidHandle) {
+ return Status::OK();
+ }
+ }
+
+ Status s;
+ FunctionBody* fbody = nullptr;
+ if (function_name == kGradientOp) {
+ TF_RETURN_IF_ERROR(InstantiateSymbolicGradient(attrs, &fbody));
+ } else {
+ const FunctionDef* fdef = lib_def_->Find(function_name);
+ if (fdef == nullptr) {
+ return errors::NotFound("Function ", function_name, " is not defined.");
+ }
+ TF_RETURN_IF_ERROR(FunctionDefToBody(*fdef, attrs, &fbody));
+ }
+
+ {
+ mutex_lock l(mu_);
+ *handle = gtl::FindWithDefault(table_, key, kInvalidHandle);
+ if (*handle != kInvalidHandle) {
+ delete fbody;
+ } else {
+ *handle = func_graphs_.size();
+ table_.insert({key, *handle});
+ func_graphs_.push_back(fbody);
+ items_.resize(func_graphs_.size());
+ }
+ }
+ return Status::OK();
+}
+
+static void DumpGraph(const char* label, const Graph* g) {
+ if (VLOG_IS_ON(1)) {
+ LOG(INFO) << label << ": " << std::endl << DebugString(g);
+ }
+}
+
+static void SimplifyGraph(Graph* g) {
+ if (RemoveListArrayConverter(g)) {
+ DumpGraph("RemoveListArrayConverter", g);
+ }
+ bool changed;
+ do {
+ changed = false;
+ if (RemoveDeadNodes(g)) {
+ changed = true;
+ DumpGraph("RemoveDeadNodes", g);
+ }
+ if (RemoveIdentityNodes(g)) {
+ changed = true;
+ DumpGraph("RemoveIdentityNodes", g);
+ }
+ FixupSourceAndSinkEdges(g);
+ OptimizeCSE(g, nullptr);
+ DumpGraph("OptimizeCSE", g);
+ } while (changed);
+}
+
+void OptimizeGraph(FunctionLibraryRuntime* lib, Graph** g) {
+ DumpGraph("Initial", *g);
+ const int kNumInlineRounds = 10;
+ for (int i = 0; i < kNumInlineRounds; ++i) {
+ if (!ExpandInlineFunctions(lib, *g)) break;
+ DumpGraph("ExpandInlineFunctions", *g);
+ SimplifyGraph(*g);
+ }
+
+ // Makes a copy so that we densify node ids.
+ Graph* copy = new Graph((*g)->op_registry());
+ CopyGraph(**g, copy);
+ delete *g;
+ *g = copy;
+ DumpGraph("ReCopy", *g);
+}
+
+Status FunctionLibraryRuntimeImpl::CreateItem(Handle handle, Item** item) {
+ const FunctionBody* fbody = GetFunctionBody(handle);
+ CHECK_NOTNULL(fbody);
+ Graph* g = new Graph(lib_def_);
+ CopyGraph(*fbody->graph, g);
+ OptimizeGraph(this, &g);
+
+ // Creates an executor based on the graph "g". This must be done without
+ // holding mu_ because create_kernel_ calls back into the library.
+ LocalExecutorParams params;
+ params.device = device_;
+ params.function_library = this;
+ params.has_control_flow = false;
+ params.create_kernel = create_kernel_;
+ params.delete_kernel = [](OpKernel* kernel) {
+ DeleteNonCachedKernel(kernel);
+ };
+ Executor* exec;
+ TF_RETURN_IF_ERROR(NewLocalExecutor(params, g, &exec));
+
+ *item = new Item;
+ (*item)->exec = exec;
+ return Status::OK();
+}
+
+Status FunctionLibraryRuntimeImpl::GetOrCreateItem(Handle handle, Item** item) {
+ {
+ mutex_lock l(mu_);
+ if (handle >= items_.size()) {
+ return errors::NotFound("Function handle ", handle,
+ " is not valid. Likely an internal error.");
+ }
+ *item = items_[handle];
+ if (*item != nullptr) {
+ (*item)->Ref();
+ return Status::OK();
+ }
+ }
+ // NOTE: We need to call CreateItem outside of mu_ because creating an
+ // executor needs to call CreateKernel.
+ TF_RETURN_IF_ERROR(CreateItem(handle, item));
+
+ {
+ mutex_lock l(mu_);
+ if (items_[handle] == nullptr) {
+ // Install *item in items_.
+ items_[handle] = *item;
+ (*item)->Ref();
+ }
+ }
+ return Status::OK();
+}
+
+void FunctionLibraryRuntimeImpl::Run(const Options& opts, Handle handle,
+ gtl::ArraySlice<Tensor> args,
+ std::vector<Tensor>* rets,
+ DoneCallback done) {
+ if (opts.cancellation_manager && opts.cancellation_manager->IsCancelled()) {
+ return done(errors::Cancelled(""));
+ }
+ const FunctionBody* fbody = GetFunctionBody(handle);
+ FunctionCallFrame* frame =
+ new FunctionCallFrame(fbody->arg_types, fbody->ret_types);
+ Status s = frame->SetArgs(args);
+ if (!s.ok()) {
+ delete frame;
+ return done(s);
+ }
+ Item* item = nullptr;
+ s = GetOrCreateItem(handle, &item);
+ if (!s.ok()) {
+ delete frame;
+ return done(s);
+ }
+ Executor::Args exec_args;
+ exec_args.call_frame = frame;
+ exec_args.cancellation_manager = opts.cancellation_manager;
+ exec_args.runner = runner_;
+ item->exec->RunAsync(
+ // Executor args
+ exec_args,
+ // Done callback.
+ [item, frame, rets, done](const Status& status) {
+ item->Unref();
+ Status s = status;
+ if (s.ok()) {
+ s = frame->GetRetvals(rets);
+ }
+ delete frame;
+ done(s);
+ });
+}
+
+bool FunctionLibraryRuntimeImpl::IsDefined(const string& function_name) {
+ return lib_def_->Find(function_name) != nullptr;
+}
+
+FunctionLibraryRuntime* NewFunctionLibraryRuntime(
+ Device* device, Runner runner, const FunctionLibraryDefinition* lib_def) {
+ return new FunctionLibraryRuntimeImpl(device, runner, lib_def);
+}
+
+bool RemoveDeadNodes(Graph* g) {
+ std::vector<bool> visited(g->num_node_ids(), false);
+ visited[Graph::kSourceId] = true;
+ visited[Graph::kSinkId] = true;
+ std::deque<Node*> q;
+ for (auto n : g->nodes()) {
+ if (n->op_def().is_stateful()) {
+ visited[n->id()] = true;
+ } else if (n->type_string() == kArgOp) {
+ visited[n->id()] = true;
+ } else if (n->type_string() == kRetOp) {
+ visited[n->id()] = true;
+ q.push_back(n);
+ }
+ }
+ while (!q.empty()) {
+ const Node* n = q.front();
+ q.pop_front();
+ visited[n->id()] = true;
+ for (auto e : n->in_edges()) {
+ q.push_back(e->src());
+ }
+ }
+ bool removed_any = false;
+ for (Node* n : g->nodes()) {
+ if (!visited[n->id()]) {
+ g->RemoveNode(n);
+ removed_any = true;
+ }
+ }
+ return removed_any;
+}
+
+namespace {
+// If 'edges' contains only 1 non-control edge, returns it. Otherwise,
+// returns a nullptr.
+const Edge* GetTheOnlyDataEdge(const EdgeSet& edges) {
+ const Edge* ret = nullptr;
+ for (const Edge* e : edges) {
+ if (e->IsControlEdge() || ret) return nullptr;
+ ret = e;
+ }
+ return ret;
+}
+} // end namespace
+
+bool RemoveIdentityNodes(Graph* g) {
+ bool removed_any = false;
+ gtl::InlinedVector<Node*, 8> matches;
+ for (Node* n : g->nodes()) {
+ if ((n->type_string() == "Identity") && GetTheOnlyDataEdge(n->in_edges())) {
+ matches.push_back(n);
+ }
+ }
+ if (!matches.empty()) {
+ for (Node* n : matches) {
+ const Edge* in = GetTheOnlyDataEdge(n->in_edges());
+ for (const Edge* out : n->out_edges()) {
+ if (out->IsControlEdge()) {
+ g->AddControlEdge(in->src(), out->dst());
+ } else {
+ g->AddEdge(in->src(), in->src_output(), out->dst(), out->dst_input());
+ }
+ }
+ g->RemoveNode(n);
+ removed_any = true;
+ }
+ }
+ return removed_any;
+}
+
+bool RemoveListArrayConverter(Graph* g) {
+ gtl::InlinedVector<Node*, 8> matches;
+ for (Node* n : g->nodes()) {
+ if ((n->type_string() == "_ListToArray") ||
+ (n->type_string() == "_ArrayToList")) {
+ matches.push_back(n);
+ }
+ }
+ bool removed_any = false;
+ if (!matches.empty()) {
+ for (Node* n : matches) {
+ if (n->num_inputs() != n->num_outputs()) {
+ continue; // Not expected. Skip.
+ }
+ gtl::InlinedVector<Node*, 8> identity_nodes(n->num_inputs(), nullptr);
+
+ // Process input edges first.
+ Node* input_control_node = nullptr;
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) {
+ if (input_control_node == nullptr) {
+ // If node "n" has any control dependencies, adds a no-op
+ // node (input_control_node) which the additional Identity
+ // nodes depends on and the input_control_node depends on
+ // the node "n"s control dependencies.
+ input_control_node = AddNoOp(g);
+ }
+ g->AddControlEdge(e->src(), input_control_node);
+ } else {
+ const int index = e->dst_input();
+ Node** id_node = &identity_nodes[index];
+ if (*id_node != nullptr) {
+ LOG(ERROR)
+ << "RemoveListArrayConverter unexpected duplicated input: "
+ << e->dst_input();
+ return removed_any;
+ }
+ *id_node = AddIdentity(g, {e->src(), e->src_output()});
+ }
+ }
+
+ // If node "n" has any control dependencies, the added identity
+ // nodes should have control dependencies on input_control_node.
+ if (input_control_node != nullptr) {
+ for (Node* id : identity_nodes) {
+ g->AddControlEdge(input_control_node, id);
+ }
+ }
+
+ Node* output_control_node = nullptr;
+ for (const Edge* e : n->out_edges()) {
+ if (e->IsControlEdge()) {
+ if (output_control_node == nullptr) {
+ // If node "n" is control-depended upon by other nodes,
+ // adds a no-op node (output_control_node) that those
+ // nodes will depend on; output_control_node itself depends
+ // on all the added Identity nodes.
+ output_control_node = AddNoOp(g);
+ }
+ g->AddControlEdge(output_control_node, e->dst());
+ } else {
+ Node* id_node = identity_nodes[e->src_output()];
+ if (id_node == nullptr) {
+ LOG(ERROR) << "RemoveListArrayConverter unexpected missing input: "
+ << e->src_output();
+ return removed_any;
+ }
+ CHECK(id_node);
+ g->AddEdge(id_node, 0, e->dst(), e->dst_input());
+ }
+ }
+
+ // If any nodes have control dependencies on node "n", those
+ // nodes should have control dependencies on
+ // output_control_node.
+ if (output_control_node != nullptr) {
+ for (Node* id : identity_nodes) {
+ g->AddControlEdge(id, output_control_node);
+ }
+ }
+
+ g->RemoveNode(n);
+ removed_any = true;
+ }
+ }
+ return removed_any;
+}
+
+// Returns true iff the function '*fbody' can be inlined at 'node'
+// based on the type signature of 'node' and 'fbody'.
+static bool ValidateInlining(const Node* node, const FunctionBody* fbody) {
+ if (static_cast<size_t>(node->num_inputs()) != fbody->arg_types.size()) {
+ return false;
+ }
+ if (static_cast<size_t>(node->num_inputs()) != fbody->arg_nodes.size()) {
+ return false;
+ }
+ if (static_cast<size_t>(node->num_outputs()) != fbody->ret_types.size()) {
+ return false;
+ }
+ if (static_cast<size_t>(node->num_outputs()) != fbody->ret_nodes.size()) {
+ return false;
+ }
+ for (int i = 0; i < node->num_inputs(); ++i) {
+ if (node->input_type(i) != fbody->arg_types[i]) return false;
+ }
+ for (int i = 0; i < node->num_outputs(); ++i) {
+ if (node->output_type(i) != fbody->ret_types[i]) return false;
+ }
+ return true;
+}
+
+// Given a "caller" in "graph", which is a function call of a function
+// to "fbody". Replaces the "caller" with fbody->graph and connects
+// edges properly.
+static void InlineFunctionBody(Graph* g, Node* caller,
+ const FunctionBody* fbody) {
+ if (!ValidateInlining(caller, fbody)) {
+ LOG(WARNING) << "Inlining mismatch: " << caller->DebugString() << " vs. "
+ << DebugString(fbody->graph);
+ return;
+ }
+
+ // Duplicate fbody->graph into 'g'. First, we copy the nodes of
+ // fbody->graph into 'g' except the source and sink nodes. We copy
+ // edges among nodes in 'fbody->graph'.
+ //
+ // If 'x' is a node in fbody->graph and its copy in 'g' is 'y', we
+ // remember 'y' in node_map[x->id()].
+ std::vector<Node*> node_map(fbody->graph->num_node_ids());
+ for (Node* n : fbody->graph->nodes()) {
+ if (n->IsSource() || n->IsSink()) continue;
+ CHECK(n->IsOp());
+ node_map[n->id()] = g->CopyNode(n);
+ }
+ for (const Edge* e : fbody->graph->edges()) {
+ if (e->src()->IsSource() || e->src()->IsSink() || e->dst()->IsSource() ||
+ e->dst()->IsSink()) {
+ continue;
+ }
+ Node* src_copy = node_map[e->src()->id()];
+ Node* dst_copy = node_map[e->dst()->id()];
+ g->AddEdge(src_copy, e->src_output(), dst_copy, e->dst_input());
+ }
+
+ // Connect input edges.
+ //
+ // For data edges coming into "caller", we first compute the
+ // <src>:<src_output> for the i-th input in "inputs". We create one
+ // Identity node for each input. Then, we connect inputs[i] to
+ // the i-th identity node added. The nodes that were previously
+ // connected to the output of the i-th arg node are reconnected to
+ // the i-th identity node.
+ //
+ // If "caller" has any input control dependencies, we add a NoOp
+ // node "input_control_node". This "input_control_node" depends on
+ // what "caller" depends on, and the added identity nodes depend on
+ // "input_control_node".
+ std::vector<Endpoint> inputs(caller->num_inputs());
+ Node* input_control_node = nullptr;
+ for (const Edge* e : caller->in_edges()) {
+ if (e->IsControlEdge()) {
+ if (input_control_node == nullptr) {
+ input_control_node = AddNoOp(g);
+ }
+ g->AddControlEdge(e->src(), input_control_node);
+ } else {
+ inputs[e->dst_input()] = {e->src(), e->src_output()};
+ }
+ }
+ for (std::size_t i = 0; i < fbody->arg_nodes.size(); ++i) {
+ Node* arg = node_map[fbody->arg_nodes[i]->id()];
+ Node* n = AddIdentity(g, inputs[i]);
+ if (input_control_node) {
+ g->AddControlEdge(input_control_node, n);
+ }
+ for (const Edge* e : arg->out_edges()) {
+ if (e->IsControlEdge()) {
+ g->AddControlEdge(n, e->dst());
+ } else {
+ g->AddEdge(n, 0, e->dst(), e->dst_input());
+ }
+ }
+ node_map[fbody->arg_nodes[i]->id()] = n;
+ g->RemoveNode(arg); // 'arg' is disconnected.
+ }
+
+ // Connect output edges.
+ //
+ // For i-th return node in fbody->graph, we add in "g" an identity
+ // node (outputs[i-th]). We then reconnect every incoming edge into
+ // the i-th return node to the added identity node.
+ //
+ // For every data edge coming out of "callee"s i-th output, we
+ // reconnect it to the i-th identity added above.
+ //
+ // If "callee" is control-depended upon by any other nodes, we add a
+ // NoOp node "output_control_node". "output_control_node" depends on
+ // all identity nodes added above, and nodes that previously depended
+ // on "callee" are changed to depend on "output_control_node".
+ std::vector<Node*> outputs(caller->num_outputs());
+ for (std::size_t i = 0; i < fbody->ret_nodes.size(); ++i) {
+ Node* ret = node_map[fbody->ret_nodes[i]->id()];
+ Endpoint data; // Data input for the ret node.
+ for (const Edge* e : ret->in_edges()) {
+ if (!e->IsControlEdge()) {
+ data = {e->src(), e->src_output()};
+ break;
+ }
+ }
+ CHECK(data.node != nullptr);
+ Node* n = AddIdentity(g, data);
+ outputs[i] = n;
+ for (const Edge* e : ret->in_edges()) {
+ if (e->IsControlEdge()) {
+ g->AddControlEdge(e->src(), n);
+ }
+ }
+ g->RemoveNode(ret); // 'ret' is disconnected.
+ }
+ Node* output_control_node = nullptr;
+ for (const Edge* e : caller->out_edges()) {
+ if (e->IsControlEdge()) {
+ if (output_control_node == nullptr) {
+ output_control_node = AddNoOp(g);
+ for (Node* n : outputs) {
+ g->AddControlEdge(n, output_control_node);
+ }
+ }
+ g->AddControlEdge(output_control_node, e->dst());
+ } else {
+ g->AddEdge(outputs[e->src_output()], 0, e->dst(), e->dst_input());
+ }
+ }
+ g->RemoveNode(caller); // 'caller' is replaced with inlined nodes.
+}
+
+bool ExpandInlineFunctions(FunctionLibraryRuntime* lib, Graph* graph) {
+ std::vector<std::pair<Node*, const FunctionBody*>> candidates;
+ for (Node* node : graph->nodes()) {
+ VLOG(3) << "Expanding " << node->DebugString();
+ FunctionLibraryRuntime::Handle handle;
+ Status s =
+ lib->Instantiate(node->type_string(), node->def().attr(), &handle);
+ if (!s.ok()) {
+ // Either "node" is a primitive op, or the instantiation failed.
+ if (errors::IsNotFound(s)) {
+ VLOG(2) << "ExpandInlineFunctions " << s;
+ } else {
+ LOG(ERROR) << "ExpandInlineFunctions " << s;
+ }
+ continue;
+ }
+ const FunctionBody* fbody = lib->GetFunctionBody(handle);
+ CHECK_NOTNULL(fbody);
+ candidates.push_back({node, fbody});
+ }
+ for (const auto& p : candidates) {
+ InlineFunctionBody(graph, p.first, p.second);
+ }
+ return !candidates.empty();
+}
+
+// TODO(zhifengc): Maybe this should be the default Graph::AsGraphDef,
+// and stash the original NodeDef name as an attr for documentation
+// purposes.
+static void ToGraphDef(const Graph* g, GraphDef* gdef) {
+ // We visit nodes in forward topological sort order, which is a
+ // possible execution order of the graph.
+ std::vector<int> pending(g->num_node_ids());
+ std::deque<const Node*> ready;
+ for (const Node* n : g->nodes()) {
+ pending[n->id()] = n->in_edges().size();
+ if (pending[n->id()] == 0) ready.push_back(n);
+ }
+ gtl::InlinedVector<const Edge*, 4> inputs;
+ gdef->Clear();
+ while (!ready.empty()) {
+ const Node* n = ready.front();
+ ready.pop_front();
+ for (const Edge* e : n->out_edges()) {
+ const Node* next = e->dst();
+ if (--pending[next->id()] == 0) {
+ ready.push_back(next);
+ }
+ }
+ if (!n->IsOp()) continue;
+ NodeDef* ndef = gdef->add_node();
+ ndef->set_name(strings::StrCat("n", n->id()));
+ ndef->set_op(n->type_string());
+ *(ndef->mutable_attr()) = n->def().attr();
+ inputs.clear();
+ inputs.resize(n->num_inputs());
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) {
+ inputs.push_back(e);
+ } else {
+ if (inputs[e->dst_input()] == nullptr) {
+ inputs[e->dst_input()] = e;
+ } else {
+ LOG(WARNING) << "Malformed graph node. multiple input edges: "
+ << n->DebugString();
+ }
+ }
+ }
+ // node->name() is merely NodeDef::name, which is not guaranteed
+ // to be unique or stable after optimization rewrites. Therefore,
+ // we use "n<node id>" instead.
+ for (const Edge* e : inputs) {
+ if (e == nullptr) {
+ ndef->add_input("unknown");
+ } else if (!e->src()->IsOp()) {
+ } else if (e->IsControlEdge()) {
+ ndef->add_input(strings::StrCat("^n", e->src()->id()));
+ } else if (e->src_output() == 0) {
+ ndef->add_input(strings::StrCat("n", e->src()->id()));
+ } else {
+ ndef->add_input(
+ strings::StrCat("n", e->src()->id(), ":", e->src_output()));
+ }
+ }
+ }
+}
+
+string DebugString(const Graph* g) {
+ GraphDef gdef;
+ ToGraphDef(g, &gdef);
+ return DebugString(gdef);
+}
+
+FunctionBody::FunctionBody(const FunctionDef& f, DataTypeSlice arg_t,
+ DataTypeSlice ret_t, Graph* g)
+ : fdef(f),
+ graph(g),
+ arg_types(arg_t.begin(), arg_t.end()),
+ ret_types(ret_t.begin(), ret_t.end()) {
+ this->arg_nodes.resize(arg_types.size());
+ this->ret_nodes.resize(ret_types.size());
+ for (Node* n : this->graph->nodes()) {
+ gtl::InlinedVector<Node*, 4>* node_vec;
+ if (n->type_string() == kRetOp) {
+ node_vec = &this->ret_nodes;
+ } else if (n->type_string() == kArgOp) {
+ node_vec = &this->arg_nodes;
+ } else {
+ continue;
+ }
+ int index;
+ TF_CHECK_OK(GetNodeAttr(n->def(), "index", &index));
+ CHECK_LE(0, index);
+ CHECK_LT(index, node_vec->size());
+ (*node_vec)[index] = n;
+ }
+}
+
+FunctionBody::~FunctionBody() { delete this->graph; }
+
+class SymbolicGradientHelper {
+ public:
+ explicit SymbolicGradientHelper(const FunctionBody& f) : fbody_(&f) {}
+
+ ~SymbolicGradientHelper() { delete gbody_; }
+
+ FunctionBody* Compute();
+
+ private:
+ const FunctionBody* fbody_;
+ FunctionBody* gbody_ = nullptr;
+
+ // A vector of output endpoints representing backpropagated
+ // gradients.
+ typedef std::vector<Endpoint> BackpropedGradients;
+
+ // backprops_ is a map from an output endpoint to its accumulated
+ // gradients. When an output endpoint has accumulated all its
+ // gradients, we add a node which sums them up.
+ std::unordered_map<Endpoint, BackpropedGradients, EndpointHash, EndpointEq>
+ backprops_;
+
+ // pending_[i] is a count-down counter for the i-th node's expected
+ // backprops. When pending_[i] becomes zero, we have collected all
+ // backprop gradients for all output endpoints of the i-th node.
+ std::vector<int> pending_;
+
+ // 'ready' keeps track of nodes that have been completely
+ // backpropped. Initially, for every output y of the function f, we
+ // add dy as an input of the gradient function.
+ std::deque<Node*> ready_;
+
+ // Makes a copy of fbody_ in gbody_.
+ void Copy();
+
+ // Initialize pending_ and ready_.
+ void InitBackprop();
+
+ // In the original function body, there is a forward edge from 'src'
+ // to 'dst'. When the backprop algorithm constructs the node
+ // 'dst_grad', which computes the gradient, we need to propagate it
+ // to 'src'.
+ void BackpropAlongEdge(const Endpoint& dst_grad, const Endpoint& src);
+ void BackpropZerosAlongEdge(const Endpoint& src);
+
+ Endpoint SumGradients(const Endpoint& src);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SymbolicGradientHelper);
+};
+
+void SymbolicGradientHelper::Copy() {
+ const Graph& src = *(fbody_->graph);
+ gbody_->graph = new Graph(src.op_registry());
+ Graph* dst = gbody_->graph;
+
+ std::vector<Node*> node_map(src.num_node_ids());
+
+ // Copy the nodes.
+ node_map[src.source_node()->id()] = dst->source_node();
+ node_map[src.sink_node()->id()] = dst->sink_node();
+ for (Node* n : src.nodes()) {
+ if (n->IsSource() || n->IsSink()) continue;
+ CHECK(n->IsOp());
+ node_map[n->id()] = dst->CopyNode(n);
+ }
+
+ // Copy the edges.
+ for (const Edge* e : src.edges()) {
+ Node* src_copy = node_map[e->src()->id()];
+ Node* dst_copy = node_map[e->dst()->id()];
+ dst->AddEdge(src_copy, e->src_output(), dst_copy, e->dst_input());
+ }
+
+ // Save inputs in copied graph.
+ CHECK_EQ(fbody_->arg_types.size(), fbody_->arg_nodes.size());
+ gbody_->arg_types = fbody_->arg_types;
+ for (std::size_t i = 0; i < fbody_->arg_nodes.size(); ++i) {
+ gbody_->arg_nodes.push_back(node_map[fbody_->arg_nodes[i]->id()]);
+ }
+
+ // Save outputs in copied graph.
+ CHECK_EQ(fbody_->ret_types.size(), fbody_->ret_nodes.size());
+ gbody_->ret_types = fbody_->ret_types;
+ for (std::size_t i = 0; i < fbody_->ret_nodes.size(); ++i) {
+ gbody_->ret_nodes.push_back(node_map[fbody_->ret_nodes[i]->id()]);
+ }
+}
+
+void SymbolicGradientHelper::BackpropAlongEdge(const Endpoint& dst_grad,
+ const Endpoint& src) {
+ CHECK_NOTNULL(src.node);
+ auto iter = backprops_.find(src);
+ if (iter != backprops_.end()) {
+ auto* grads = &iter->second;
+ grads->push_back(dst_grad);
+ if (--pending_[src.node->id()] == 0) {
+ ready_.push_back(src.node);
+ }
+ }
+}
+
+void SymbolicGradientHelper::BackpropZerosAlongEdge(const Endpoint& src) {
+ CHECK_NOTNULL(src.node);
+ auto iter = backprops_.find(src);
+ if (iter != backprops_.end()) {
+ if (--pending_[src.node->id()] == 0) {
+ ready_.push_back(src.node);
+ }
+ }
+}
+
+void SymbolicGradientHelper::InitBackprop() {
+ Graph* g = gbody_->graph;
+ pending_.resize(g->num_node_ids(), 0);
+ {
+ backprops_.clear();
+ std::unordered_set<Node*> visited;
+ std::deque<Node*> queue;
+ for (Node* n : gbody_->arg_nodes) {
+ queue.push_back(n);
+ }
+
+ // Go forward to figure out which endpoints need to be backprop-ed.
+ // A node's endpoints need to be backprop-ed only if one of the
+ // arg nodes can reach the node via data edges.
+ while (!queue.empty()) {
+ Node* n = queue.front();
+ queue.pop_front();
+ visited.insert(n);
+ for (int i = 0; i < n->num_outputs(); ++i) {
+ backprops_[{n, i}].clear();
+ }
+ int num_expected_backprops = 0;
+ for (const Edge* e : n->out_edges()) {
+ if (e->IsControlEdge()) continue;
+ ++num_expected_backprops;
+ if (visited.find(e->dst()) == visited.end()) {
+ queue.push_back(e->dst());
+ }
+ }
+ pending_[n->id()] = num_expected_backprops;
+ }
+ }
+
+ {
+ const int num_y = gbody_->ret_nodes.size();
+ for (int i = 0; i < num_y; ++i) {
+ Node* y = gbody_->ret_nodes[i];
+ DCHECK_EQ(y->type_string(), kRetOp);
+ const DataType dtype = y->input_type(0);
+ const int index = gbody_->arg_nodes.size();
+ Node* dy = AddArg(g, dtype, index);
+ gbody_->arg_types.push_back(dtype);
+ gbody_->arg_nodes.push_back(dy);
+
+ // What's the input to y?
+ Endpoint y_in{nullptr, 0};
+ for (const Edge* e : y->in_edges()) {
+ if (!e->IsControlEdge()) {
+ y_in = {e->src(), e->src_output()};
+ break;
+ }
+ }
+ CHECK_NOTNULL(y_in.node);
+ BackpropAlongEdge({dy, 0}, y_in);
+ }
+ }
+}
+
+Endpoint SymbolicGradientHelper::SumGradients(const Endpoint& src) {
+ Graph* g = gbody_->graph;
+ const DataType dtype = src.dtype();
+ auto iter = backprops_.find(src);
+ CHECK(iter != backprops_.end());
+ const auto& grads = iter->second;
+ if (grads.empty()) {
+ // Nothing propagated back. The best we can come up with is zeros.
+ Node* zero_like = AddZerosLike(g, src);
+ return {zero_like, 0};
+ }
+ if (grads.size() == 1) {
+ // Just one backprop edge.
+ return grads[0];
+ }
+ // Otherwise, adds backprop-ed gradients.
+ NodeDef ndef;
+ ndef.set_name(g->NewName(kNodeLabel));
+ ndef.set_op("AddN"); // N-way Add
+ for (const Endpoint& ep : grads) {
+ ndef.add_input(ep.name());
+ }
+ AddNodeAttr("N", static_cast<int64>(grads.size()), &ndef);
+ AddNodeAttr("T", dtype, &ndef);
+ Status s;
+ Node* add = gbody_->graph->AddNode(ndef, &s);
+ TF_CHECK_OK(s);
+ for (size_t i = 0; i < grads.size(); ++i) {
+ const Endpoint& ep = grads[i];
+ g->AddEdge(ep.node, ep.index, add, i);
+ }
+ return {add, 0};
+}
+
+static bool IsPrimitiveOpWithNoGrad(const string& func) {
+ gradient::Creator creator;
+ Status s = gradient::GetOpGradientCreator(func, &creator);
+ return s.ok() && (creator == nullptr);
+}
+
+FunctionBody* SymbolicGradientHelper::Compute() {
+ CHECK(gbody_ == nullptr);
+ gbody_ = new FunctionBody;
+
+ // Copy fbody_ into gbody_.
+ Copy();
+
+ // Initialize backprops.
+ InitBackprop();
+
+ // Backward propagation.
+ gtl::InlinedVector<Endpoint, 8> dy;
+ Graph* g = gbody_->graph;
+ while (!ready_.empty()) {
+ // n has collected all gradients.
+ Node* n = ready_.front();
+ ready_.pop_front();
+
+ if (n->type_string() == kArgOp) {
+ // We'll handle the _Arg node after backprop is done.
+ continue;
+ }
+
+ // "n" has num_x inputs and num_y outputs.
+ const int num_x = n->num_inputs();
+ const int num_y = n->num_outputs();
+
+ // dy[i] is the sum of i-th output's backpropped gradients.
+ dy.clear();
+ dy.resize(num_y, {nullptr, 0});
+ for (int i = 0; i < num_y; ++i) {
+ dy[i] = SumGradients({n, i});
+ }
+
+ if (IsPrimitiveOpWithNoGrad(n->type_string())) {
+ // No grad defined for this op. Backprops zeros along the in
+ // edges.
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) continue;
+ BackpropZerosAlongEdge({e->src(), e->src_output()});
+ }
+ continue;
+ }
+
+ // Adds a gradient node with num_x + num_y inputs and num_x
+ // outputs.
+ Node* grad = AddSymGrad(g, n, dy);
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) continue;
+ g->AddEdge(e->src(), e->src_output(), grad, e->dst_input());
+ }
+ for (int i = 0; i < num_y; ++i) {
+ g->AddEdge(dy[i].node, dy[i].index, grad, num_x + i);
+ }
+
+ // Backprops along the in edges.
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) continue;
+ BackpropAlongEdge({grad, e->dst_input()}, {e->src(), e->src_output()});
+ }
+ }
+
+ // The gradient's retval nodes.
+ for (Node* n : gbody_->ret_nodes) {
+ g->RemoveNode(n);
+ }
+ gbody_->ret_types = fbody_->arg_types;
+ gbody_->ret_nodes.clear();
+ for (size_t i = 0; i < fbody_->arg_types.size(); ++i) {
+ Endpoint grad = SumGradients({gbody_->arg_nodes[i], 0});
+ Node* ret = AddRet(g, grad, i);
+ gbody_->ret_nodes.push_back(ret);
+ }
+
+ auto ret = gbody_;
+ gbody_ = nullptr;
+ return ret;
+}
+
+FunctionBody* SymbolicGradient(const FunctionBody& f) {
+ return SymbolicGradientHelper(f).Compute();
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/common_runtime/function.h b/tensorflow/core/common_runtime/function.h
new file mode 100644
index 0000000000..634b31232a
--- /dev/null
+++ b/tensorflow/core/common_runtime/function.h
@@ -0,0 +1,100 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_FUNCTION_H_
+#define TENSORFLOW_COMMON_RUNTIME_FUNCTION_H_
+
+#include <functional>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/graph/graph.h"
+
+namespace tensorflow {
+
+// Creates a FunctionLibraryRuntime, which instantiates functions
+// defined in "lib_def" and executes functions on the "device".
+//
+// The returned object does not take ownerships of "device" or
+// "lib_def". The caller must ensure "device" and "lib_def" outlives
+// the returned object.
+typedef std::function<void()> Closure;
+typedef std::function<void(Closure)> Runner;
+FunctionLibraryRuntime* NewFunctionLibraryRuntime(
+ Device* device, Runner runner, const FunctionLibraryDefinition* lib_def);
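+
+// Example (an illustrative sketch; "pool" is a hypothetical thread pool,
+// and "device" and "lib_def" must outlive the returned runtime):
+//
+// Runner runner = [&pool](Closure c) { pool.Schedule(std::move(c)); };
+// FunctionLibraryRuntime* lib =
+// NewFunctionLibraryRuntime(device, runner, lib_def);
+// // ... lib->Instantiate(...), lib->Run(...), etc. ...
+// delete lib;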
+
+// FunctionLibraryRuntime::GetFunctionBody returns a description of an
+// instantiated function that is represented as a Graph with arg/ret
+// nodes annotated.
+struct FunctionBody {
+ FunctionDef fdef;
+ Graph* graph = nullptr; // owned.
+ DataTypeVector arg_types;
+ DataTypeVector ret_types;
+ gtl::InlinedVector<Node*, 4> arg_nodes;
+ gtl::InlinedVector<Node*, 4> ret_nodes;
+
+ FunctionBody() {}
+ FunctionBody(const FunctionDef& f, DataTypeSlice arg_types,
+ DataTypeSlice ret_types, Graph* g);
+ ~FunctionBody();
+};
+
+// Debugging facility. Returns a debug string for a graph
+// representing an instantiated function.
+string DebugString(const Graph* instantiated_func_graph);
+
+// A few hand-crafted optimizations on the instantiated function body
+// (a Graph*).
+
+// Removes nodes that are
+// 1. not stateful; and
+// 2. not _Arg; and
+// 3. not reachable from _Retval.
+// Returns true iff any node is removed from "g".
+bool RemoveDeadNodes(Graph* g);
+
+// Find a pattern:
+// src -(in)-> node -(out)-> dst, where
+// 1) node is an identity node;
+// 2) in is the only incoming data edge;
+// 3) out is the only outgoing data edge;
+//
+// Rewrites the above pattern with src->dst and relevant data
+// dependencies updated. Repeats the process until no such pattern
+// is left.
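+//
+// For example (a sketch): A --> Identity --> B is rewritten to A --> B;
+// any outgoing control edges of the removed Identity node are rerouted
+// to originate from A.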
+bool RemoveIdentityNodes(Graph* g);
+
+// Rewrites _ListToArray and _ArrayToList to a set of Identity nodes.
+bool RemoveListArrayConverter(Graph* g);
+
+// For each node in "graph", if "lib" indicates that the node is a
+// function call, inline the function body. Returns true if at least
+// one node is inlined.
+//
+// This routine goes through "graph" nodes once and applies the
+// inlining. The caller may decide to apply the inlining on "graph"
+// multiple times by calling ExpandInlineFunctions a few times.
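+//
+// For example (a sketch of what OptimizeGraph below effectively does),
+// a caller may repeat the inlining for several rounds:
+//
+// while (ExpandInlineFunctions(lib, graph)) {
+// // Optionally simplify "graph" between rounds.
+// }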
+bool ExpandInlineFunctions(FunctionLibraryRuntime* lib, Graph* graph);
+
+// Applies graph rewrite optimizations such as inlining, dead code
+// removal, etc.
+//
+// **g is a graph constructed based on the runtime library 'lib'.
+// OptimizeGraph mutates **g extensively and replaces '*g' with a
+// complete copy. Therefore, the caller should not keep any references
+// to nodes in *g.
+void OptimizeGraph(FunctionLibraryRuntime* lib, Graph** g);
+
+// Given a numerical function "f", returns another numerical function
+// "g", such that if "f" takes N inputs and produces M outputs, "g"
+// takes N + M inputs and produces N outputs. I.e., if
+// (y1, y2, ..., y_M) = f(x1, x2, ..., x_N),
+// g is a function which is
+// (dL/dx1, dL/dx2, ..., dL/dx_N) = g(x1, x2, ..., x_N,
+// dL/dy1, dL/dy2, ..., dL/dy_M),
+// where L is a scalar-valued function of (...x_i...).
+//
+// TODO(zhifengc): Ask a math expert to restate this comment more precisely.
+FunctionBody* SymbolicGradient(const FunctionBody& f);
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_FUNCTION_H_
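A minimal worked instance of the SymbolicGradient contract documented above
(a sketch for illustration, not part of the patch itself): for f(x) = x * x
(N = 1, M = 1), the gradient body computes g(x, dL/dy) = 2 * x * dL/dy.
Expressed against the API, and assuming a FunctionBody "f_body" for such an f
was instantiated elsewhere:

    // Sketch only: build the gradient body and check its signature.
    FunctionBody* g_body = SymbolicGradient(*f_body);
    CHECK_EQ(g_body->arg_types.size(),
             f_body->arg_types.size() + f_body->ret_types.size());  // N + M inputs
    CHECK_EQ(g_body->ret_types.size(), f_body->arg_types.size());   // N outputs
    delete g_body;  // Compute() appears to hand ownership of the body to the caller.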
diff --git a/tensorflow/core/common_runtime/gpu/dma_helper.h b/tensorflow/core/common_runtime/gpu/dma_helper.h
new file mode 100644
index 0000000000..7b0750f405
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/dma_helper.h
@@ -0,0 +1,18 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_DMA_HELPER_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_DMA_HELPER_H_
+
+#include "tensorflow/core/public/tensor.h"
+
+// For internal use only. Visibility should be limited to brain/framework.
+
+namespace tensorflow {
+class DMAHelper {
+ public:
+ static bool CanUseDMA(const Tensor* t) { return t->CanUseDMA(); }
+ static const void* base(const Tensor* t) { return t->base<const void>(); }
+ static void* base(Tensor* t) { return t->base<void>(); }
+ static TensorBuffer* buffer(Tensor* t) { return t->buf_; }
+ static const TensorBuffer* buffer(const Tensor* t) { return t->buf_; }
+};
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_DMA_HELPER_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.cc b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.cc
new file mode 100644
index 0000000000..742459c63b
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.cc
@@ -0,0 +1,49 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+GPUAllocatorRetry::GPUAllocatorRetry() : env_(Env::Default()) {}
+
+void* GPUAllocatorRetry::AllocateRaw(
+ std::function<void*(size_t alignment, size_t num_bytes,
+ bool verbose_failure)> alloc_func,
+ int max_millis_to_wait, size_t alignment, size_t num_bytes) {
+ if (num_bytes == 0) {
+ LOG(WARNING) << "Request to allocate 0 bytes";
+ return nullptr;
+ }
+ uint64 deadline_micros = env_->NowMicros() + max_millis_to_wait * 1000;
+ void* ptr = nullptr;
+ while (ptr == nullptr) {
+ ptr = alloc_func(alignment, num_bytes, false);
+ if (ptr == nullptr) {
+ uint64 now = env_->NowMicros();
+ if (now < deadline_micros) {
+ mutex_lock l(mu_);
+ WaitForMilliseconds(&l, &memory_returned_,
+ (deadline_micros - now) / 1000);
+ } else {
+ return alloc_func(alignment, num_bytes, true);
+ }
+ }
+ }
+ return ptr;
+}
+
+void GPUAllocatorRetry::DeallocateRaw(std::function<void(void*)> dealloc_func,
+ void* ptr) {
+ if (ptr == nullptr) {
+ LOG(ERROR) << "Request to free nullptr";
+ return;
+ }
+ dealloc_func(ptr);
+ {
+ mutex_lock l(mu_);
+ memory_returned_.notify_all();
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h
new file mode 100644
index 0000000000..a3298ab222
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h
@@ -0,0 +1,36 @@
+#ifndef TENSORFLOW_CORE_COMMON_RUNTIME_GPU_GPU_ALLOCATOR_RETRY_H_
+#define TENSORFLOW_CORE_COMMON_RUNTIME_GPU_GPU_ALLOCATOR_RETRY_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+// A retrying wrapper for a memory allocator.
+class GPUAllocatorRetry {
+ public:
+ GPUAllocatorRetry();
+
+ // Call 'alloc_func' to obtain memory. On first call,
+ // 'verbose_failure' will be false. If the return value is nullptr,
+ // then wait up to 'max_millis_to_wait' milliseconds, retrying each
+ // time a call to DeallocateRaw() is detected, until either a good
+ // pointer is returned or the deadline is exhausted. If the
+ // deadline is exhausted, try one more time with 'verbose_failure'
+ // set to true. The value returned is either the first good pointer
+ // obtained from 'alloc_func' or nullptr.
+ void* AllocateRaw(std::function<void*(size_t alignment, size_t num_bytes,
+ bool verbose_failure)> alloc_func,
+ int max_millis_to_wait, size_t alignment, size_t bytes);
+
+ // Calls dealloc_func(ptr) and then notifies any threads blocked in
+ // AllocateRaw() that would like to retry.
+ void DeallocateRaw(std::function<void(void* ptr)> dealloc_func, void* ptr);
+
+ private:
+ Env* env_;
+ mutex mu_;
+ condition_variable memory_returned_;
+};
+} // namespace tensorflow
+#endif // TENSORFLOW_CORE_COMMON_RUNTIME_GPU_GPU_ALLOCATOR_RETRY_H_
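A minimal usage sketch (illustration only, not part of the patch itself),
mirroring how gpu_bfc_allocator.cc below threads its internal allocation path
through the retry helper; "AllocateInternal" and "DeallocateInternal" are
hypothetical stand-ins for the wrapped allocator's own routines:

    GPUAllocatorRetry retry;
    // Retry for up to 10 seconds, waking whenever DeallocateRaw() runs.
    void* ptr = retry.AllocateRaw(
        [](size_t alignment, size_t num_bytes, bool verbose_failure) {
          return AllocateInternal(alignment, num_bytes, verbose_failure);
        },
        10000 /* max_millis_to_wait */, 256 /* alignment */, num_bytes);
    // ... use ptr ...
    retry.DeallocateRaw([](void* p) { DeallocateInternal(p); }, ptr);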
diff --git a/tensorflow/core/common_runtime/gpu/gpu_allocator_retry_test.cc b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry_test.cc
new file mode 100644
index 0000000000..db1c58cc65
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_allocator_retry_test.cc
@@ -0,0 +1,175 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/env.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class FakeAllocator {
+ public:
+ FakeAllocator(size_t cap, int millis_to_wait)
+ : memory_capacity_(cap), millis_to_wait_(millis_to_wait) {}
+
+ // Allocate just keeps track of the number of outstanding allocations,
+ // not their sizes. Assume a constant size for each.
+ void* AllocateRaw(size_t alignment, size_t num_bytes) {
+ return retry_.AllocateRaw(
+ [this](size_t a, size_t nb, bool v) {
+ mutex_lock l(mu_);
+ if (memory_capacity_ > 0) {
+ --memory_capacity_;
+ return good_ptr_;
+ } else {
+ return static_cast<void*>(nullptr);
+ }
+ },
+ millis_to_wait_, alignment, num_bytes);
+ }
+
+ void DeallocateRaw(void* ptr) {
+ retry_.DeallocateRaw(
+ [this](void* p) {
+ mutex_lock l(mu_);
+ ++memory_capacity_;
+ },
+ ptr);
+ }
+
+ private:
+ GPUAllocatorRetry retry_;
+ void* good_ptr_ = reinterpret_cast<void*>(0xdeadbeef);
+ mutex mu_;
+ size_t memory_capacity_ GUARDED_BY(mu_);
+ int millis_to_wait_;
+};
+
+class GPUAllocatorRetryTest : public ::testing::Test {
+ protected:
+ GPUAllocatorRetryTest() {}
+
+ void LaunchConsumerThreads(int num_consumers, int cap_needed) {
+ consumer_count_.resize(num_consumers, 0);
+ for (int i = 0; i < num_consumers; ++i) {
+ consumers_.push_back(Env::Default()->StartThread(
+ ThreadOptions(), "anon_thread", [this, i, cap_needed]() {
+ do {
+ void* ptr = nullptr;
+ for (int j = 0; j < cap_needed; ++j) {
+ ptr = alloc_->AllocateRaw(16, 1);
+ if (ptr == nullptr) {
+ mutex_lock l(mu_);
+ has_failed_ = true;
+ return;
+ }
+ }
+ ++consumer_count_[i];
+ for (int j = 0; j < cap_needed; ++j) {
+ alloc_->DeallocateRaw(ptr);
+ }
+ } while (!notifier_.HasBeenNotified());
+ }));
+ }
+ }
+
+ // Wait up to wait_micros microseconds for has_failed_ to equal expected,
+ // then terminate all threads.
+ void JoinConsumerThreads(bool expected, int wait_micros) {
+ while (wait_micros > 0) {
+ {
+ mutex_lock l(mu_);
+ if (has_failed_ == expected) break;
+ }
+ int interval_micros = std::min(1000, wait_micros);
+ Env::Default()->SleepForMicroseconds(interval_micros);
+ wait_micros -= interval_micros;
+ }
+ notifier_.Notify();
+ for (auto c : consumers_) {
+ // Blocks until thread terminates.
+ delete c;
+ }
+ }
+
+ std::unique_ptr<FakeAllocator> alloc_;
+ std::vector<Thread*> consumers_;
+ std::vector<int> consumer_count_;
+ Notification notifier_;
+ mutex mu_;
+ bool has_failed_ GUARDED_BY(mu_) = false;
+ int count_ GUARDED_BY(mu_) = 0;
+};
+
+// Verifies correct retrying when memory is slightly overcommitted but
+// we allow retry.
+TEST_F(GPUAllocatorRetryTest, RetrySuccess) {
+ // Supports up to 2 simultaneous allocations, waiting up to 10 seconds
+ // for a chance to alloc.
+ alloc_.reset(new FakeAllocator(2, 10000));
+ // Launch 3 consumers, each of whom needs 1 unit at a time.
+ LaunchConsumerThreads(3, 1);
+ // This should be enough time for each consumer to be satisfied many times.
+ Env::Default()->SleepForMicroseconds(50000);
+ JoinConsumerThreads(false, 0);
+ for (int i = 0; i < 3; ++i) {
+ LOG(INFO) << "Consumer " << i << " is " << consumer_count_[i];
+ }
+ {
+ mutex_lock l(mu_);
+ EXPECT_FALSE(has_failed_);
+ }
+ EXPECT_GT(consumer_count_[0], 0);
+ EXPECT_GT(consumer_count_[1], 0);
+ EXPECT_GT(consumer_count_[2], 0);
+}
+
+// Verifies OutOfMemory failure when memory is slightly overcommitted
+// and retry is not allowed.
+TEST_F(GPUAllocatorRetryTest, NoRetryFail) {
+ // Supports up to 2 simultaneous allocations, with no waiting allowed
+ // for a chance to alloc.
+ alloc_.reset(new FakeAllocator(2, 0));
+ // Launch 3 consumers, each of whom needs 1 unit at a time.
+ LaunchConsumerThreads(3, 1);
+ Env::Default()->SleepForMicroseconds(50000);
+ // Will wait up to 10 seconds for proper race condition to occur, resulting
+ // in failure.
+ JoinConsumerThreads(true, 10000000);
+ for (int i = 0; i < 3; ++i) {
+ LOG(INFO) << "Consumer " << i << " is " << consumer_count_[i];
+ }
+ {
+ mutex_lock l(mu_);
+ EXPECT_TRUE(has_failed_);
+ }
+}
+
+// Verifies OutOfMemory failure when retry is allowed but memory capacity
+// is too low even for retry.
+TEST_F(GPUAllocatorRetryTest, RetryInsufficientFail) {
+ // Supports up to 2 simultaneous allocations, waiting up to 10 seconds
+ // for a chance to alloc.
+ alloc_.reset(new FakeAllocator(2, 10000));
+ // Launch 3 consumers, each of whom needs 2 units at a time. We expect
+ // deadlock, where 2 consumers each hold 1 unit and time out trying to
+ // get the second.
+ LaunchConsumerThreads(3, 2);
+ Env::Default()->SleepForMicroseconds(50000);
+ // Will wait up to 10 seconds for proper race condition to occur, resulting
+ // in failure.
+ JoinConsumerThreads(true, 10000000);
+ for (int i = 0; i < 3; ++i) {
+ LOG(INFO) << "Consumer " << i << " is " << consumer_count_[i];
+ }
+ {
+ mutex_lock l(mu_);
+ EXPECT_TRUE(has_failed_);
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc
new file mode 100644
index 0000000000..3df833594f
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc
@@ -0,0 +1,397 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h"
+
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/lib/core/bits.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+GPUBFCAllocator::GPUBFCAllocator(int device_id, size_t total_memory)
+ : device_id_(device_id) {
+ // Get a pointer to the stream_executor for this device
+ stream_exec_ = GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ // Allocate the requested amount of memory.
+ gpu_memory_size_ = total_memory;
+
+ LOG(INFO) << "Allocating " << strings::HumanReadableNumBytes(gpu_memory_size_)
+ << " bytes.";
+ gpu::DeviceMemory<char> gpu_mem =
+ stream_exec_->AllocateArray<char>(gpu_memory_size_);
+
+ QCHECK(gpu_mem != nullptr)
+ << " Could not allocate GPU device memory for device " << device_id
+ << ". Tried to allocate "
+ << strings::HumanReadableNumBytes(gpu_memory_size_);
+ base_ptr_ = gpu_mem.opaque();
+ LOG(INFO) << "GPU " << device_id << " memory begins at " << base_ptr_
+ << " extends to "
+ << static_cast<void*>(
+ (static_cast<char*>(base_ptr_) + gpu_memory_size_));
+
+ // Create a bunch of bins of various good sizes.
+
+ // Covers allocations of exactly 256 bytes (the minimum size).
+ bins_.insert(std::make_pair(256, new Bin(256)));
+
+ // We create bins covering the full range of possible allocation sizes,
+ // from a bin for allocations of up to 1024 bytes through a bin for
+ // allocations of up to (and including) the memory limit.
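+ // For example, with a 4GiB gpu_memory_size_ this adds bins with max
+ // chunk sizes 1024, 2048, ..., up through 4GiB (the smallest power of
+ // two that is >= the memory limit), in addition to the 256-byte bin.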
+ for (size_t bin_size = 1024; bin_size < gpu_memory_size_ * 2; bin_size *= 2) {
+ LOG(INFO) << "Creating bin of max chunk size "
+ << strings::HumanReadableNumBytes(bin_size);
+ bins_.insert(std::make_pair(bin_size, new Bin(bin_size)));
+ }
+
+ // Create one large chunk for the whole memory space that will
+ // be chunked later.
+ GPUBFCAllocator::Chunk* c = new GPUBFCAllocator::Chunk();
+ c->ptr = gpu_mem.opaque();
+ c->size = gpu_memory_size_;
+ c->in_use = false;
+ c->prev = nullptr;
+ c->next = nullptr;
+
+ ptr_to_chunk_map_.insert(std::make_pair(c->ptr, c));
+
+ // Insert the chunk into the right bin.
+ ReassignChunkToBin(c);
+}
+
+GPUBFCAllocator::~GPUBFCAllocator() {
+ // Return memory back.
+ if (base_ptr_) {
+ gpu::DeviceMemoryBase gpu_ptr{base_ptr_};
+ stream_exec_->Deallocate(&gpu_ptr);
+ }
+
+ gtl::STLDeleteValues(&bins_);
+}
+
+void* GPUBFCAllocator::AllocateRaw(size_t unused_alignment, size_t num_bytes) {
+ static const int64 kMaxMillisToWait = 10000; // 10 seconds
+ return retry_helper_.AllocateRaw(
+ [this](size_t a, size_t nb, bool v) {
+ return AllocateRawInternal(a, nb, v);
+ },
+ kMaxMillisToWait, unused_alignment, num_bytes);
+}
+
+void* GPUBFCAllocator::AllocateRawInternal(size_t unused_alignment,
+ size_t num_bytes,
+ bool dump_log_on_failure) {
+ if (num_bytes == 0) {
+ LOG(ERROR) << "tried to allocate 0 bytes";
+ return nullptr;
+ }
+ // First, always allocate memory of at least 256 bytes, and always
+ // allocate multiples of 256 bytes so all memory addresses are
+ // nicely byte aligned.
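+ // For example, a request for 1 byte or 256 bytes rounds to 256 bytes,
+ // and a request for 300 bytes rounds to 512.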
+ size_t rounded_bytes = (256 * ((num_bytes + 255) / 256));
+ DCHECK_EQ(0, rounded_bytes % 256);
+
+ // The BFC allocator tries to find the best fit first.
+ //
+ // First identify the first bin that could satisfy rounded_bytes.
+ auto it = bins_.lower_bound(rounded_bytes);
+ if (it == bins_.end()) {
+ LOG(ERROR) << " Asked for " << rounded_bytes << " but largest bin was "
+ << bins_.rbegin()->first;
+ return nullptr;
+ }
+
+ mutex_lock l(lock_);
+ for (; it != bins_.end(); ++it) {
+ // Start searching from the first bin for the smallest chunk that fits
+ // rounded_bytes.
+ Bin* b = it->second;
+ for (GPUBFCAllocator::Chunk* chunk : b->chunks) {
+ if (!chunk->in_use && chunk->size > rounded_bytes) {
+ // We found an existing free chunk large enough for the request.
+ chunk->in_use = true;
+
+ // If we can break the size of the chunk into two reasonably
+ // large pieces, do so.
+ //
+ // TODO(vrv): What should be the criteria when deciding when
+ // to split?
+ if (chunk->size >= rounded_bytes * 2) {
+ SplitChunk(chunk, rounded_bytes);
+ }
+
+ // The requested size of the returned chunk is what the user
+ // has allocated.
+ chunk->requested_size = num_bytes;
+
+ VLOG(4) << "Returning: " << chunk->ptr;
+ return chunk->ptr;
+ }
+ }
+ }
+
+ // We searched all bins for an existing free chunk to use and
+ // couldn't find one. This means we must have run out of memory.
+ // Dump the memory log for analysis.
+ if (dump_log_on_failure) {
+ DumpMemoryLog(rounded_bytes);
+ LOG(WARNING) << "Ran out of memory trying to allocate "
+ << strings::HumanReadableNumBytes(num_bytes)
+ << ". See logs for memory state";
+ }
+ return nullptr;
+}
+
+void GPUBFCAllocator::SplitChunk(GPUBFCAllocator::Chunk* c, size_t num_bytes) {
+ // Create a new chunk starting num_bytes after c
+ GPUBFCAllocator::Chunk* new_chunk = new GPUBFCAllocator::Chunk();
+ new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes);
+ VLOG(6) << "Adding to chunk map: " << new_chunk->ptr;
+ ptr_to_chunk_map_.insert(std::make_pair(new_chunk->ptr, new_chunk));
+
+ // Set the new sizes of the chunks.
+ new_chunk->size = c->size - num_bytes;
+ c->size = num_bytes;
+
+ // The new chunk is not in use.
+ new_chunk->in_use = false;
+
+ // Maintain the pointers.
+ // c <-> c_neighbor becomes
+ // c <-> new_chunk <-> c_neighbor
+ GPUBFCAllocator::Chunk* c_neighbor = c->next;
+ new_chunk->prev = c;
+ new_chunk->next = c_neighbor;
+ c->next = new_chunk;
+ if (c_neighbor) {
+ c_neighbor->prev = new_chunk;
+ }
+
+ // Maintain the bins
+ ReassignChunkToBin(new_chunk);
+ ReassignChunkToBin(c);
+}
+
+void GPUBFCAllocator::DeallocateRaw(void* ptr) {
+ retry_helper_.DeallocateRaw([this](void* p) { DeallocateRawInternal(p); },
+ ptr);
+}
+
+void GPUBFCAllocator::DeallocateRawInternal(void* ptr) {
+ if (ptr == nullptr) {
+ LOG(ERROR) << "tried to deallocate nullptr";
+ return;
+ }
+ mutex_lock l(lock_);
+
+ // Find the chunk from the ptr.
+ auto it = ptr_to_chunk_map_.find(ptr);
+ CHECK(it != ptr_to_chunk_map_.end())
+ << "Asked to deallocate a pointer we never allocated: " << ptr;
+
+ GPUBFCAllocator::Chunk* c = it->second;
+ VLOG(6) << "Chunk at " << c->ptr << " no longer in use";
+ // Mark the chunk as no longer in use
+ c->in_use = false;
+
+ // Consider coalescing it.
+ MaybeCoalesce(c);
+}
+
+// Merges c1 and c2 when c1->next is c2 and c2->prev is c1.
+// We merge c2 into c1.
+void GPUBFCAllocator::Merge(GPUBFCAllocator::Chunk* c1,
+ GPUBFCAllocator::Chunk* c2) {
+ // We can only merge chunks that are not in use.
+ DCHECK(!c1->in_use && !c2->in_use);
+
+ // c1's prev doesn't change, still points to the same ptr, and is
+ // still not in use.
+
+ // Fix up neighbor pointers
+ //
+ // c1 <-> c2 <-> c3 should become
+ // c1 <-> c3
+ GPUBFCAllocator::Chunk* c3 = c2->next;
+ c1->next = c3;
+ CHECK(c2->prev == c1);
+ if (c3 != nullptr) {
+ c3->prev = c1;
+ }
+
+ // Set the new size
+ c1->size += c2->size;
+
+ // Delete c2 and cleanup all state
+ RemoveChunkFromBin(c2);
+}
+
+void GPUBFCAllocator::ReassignChunkToBin(GPUBFCAllocator::Chunk* c) {
+ auto it = bins_.lower_bound(c->size);
+ CHECK(it != bins_.end()) << " Tried to reassign to non-existent bin for size "
+ << c->size;
+
+ Bin* new_bin = it->second;
+
+ // If the bin has not changed, do nothing.
+ Bin* old_bin = c->bin;
+ if (old_bin != nullptr && new_bin == old_bin) {
+ return;
+ }
+
+ // The bin has changed. Add the chunk to the new bin and remove
+ // the chunk from the old bin.
+ new_bin->chunks.insert(c);
+ c->bin = new_bin;
+
+ if (old_bin == nullptr) {
+ return;
+ }
+
+ // Remove chunk from old bin
+ for (auto it = old_bin->chunks.begin(); it != old_bin->chunks.end(); ++it) {
+ if (*it == c) {
+ old_bin->chunks.erase(it);
+ return;
+ }
+ }
+ CHECK(false) << "Could not find chunk in old bin";
+}
+
+void GPUBFCAllocator::RemoveChunkFromBin(GPUBFCAllocator::Chunk* c) {
+ Bin* b = c->bin;
+ for (auto it = b->chunks.begin(); it != b->chunks.end(); ++it) {
+ Chunk* other_c = *it;
+ if (other_c->ptr == c->ptr) {
+ b->chunks.erase(it);
+ VLOG(4) << "Removing: " << c->ptr;
+ ptr_to_chunk_map_.erase(c->ptr);
+ delete c;
+ return;
+ }
+ }
+
+ CHECK(false) << "Could not find chunk in bin";
+}
+
+void GPUBFCAllocator::MaybeCoalesce(GPUBFCAllocator::Chunk* c) {
+ // This chunk is no longer in-use, consider coalescing the chunk
+ // with adjacent chunks.
+ Chunk* chunk_to_reassign = nullptr;
+
+ // If the next chunk is free, coalesce the two, if the result would
+ // fit in an existing bin.
+ if (c->next && !c->next->in_use) {
+ VLOG(8) << "Chunk at " << c->next->ptr << " merging with c " << c->ptr;
+
+ chunk_to_reassign = c;
+
+ // Deletes c->next
+ Merge(c, c->next);
+ }
+
+ // If the previous chunk is free, coalesce the two
+ if (c->prev && !c->prev->in_use) {
+ VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev "
+ << c->prev->ptr;
+
+ chunk_to_reassign = c->prev;
+
+ // Deletes c
+ Merge(c->prev, c);
+ }
+
+ // Reassign the final merged chunk into the right bin.
+ if (chunk_to_reassign) {
+ ReassignChunkToBin(chunk_to_reassign);
+ }
+}
+
+void GPUBFCAllocator::AddAllocVisitor(Visitor visitor) {
+ VLOG(1) << "AddVisitor";
+ mutex_lock l(lock_);
+ region_visitors_.push_back(visitor);
+ visitor(base_ptr_, gpu_memory_size_);
+}
+
+bool GPUBFCAllocator::TracksAllocationSizes() { return true; }
+
+size_t GPUBFCAllocator::RequestedSize(void* ptr) {
+ mutex_lock l(lock_);
+ auto it = ptr_to_chunk_map_.find(ptr);
+ CHECK(it != ptr_to_chunk_map_.end())
+ << "Asked for requested size of pointer we never allocated: " << ptr;
+ GPUBFCAllocator::Chunk* c = it->second;
+ return c->requested_size;
+}
+
+size_t GPUBFCAllocator::AllocatedSize(void* ptr) {
+ mutex_lock l(lock_);
+ auto it = ptr_to_chunk_map_.find(ptr);
+ CHECK(it != ptr_to_chunk_map_.end())
+ << "Asked for allocated size of pointer we never allocated: " << ptr;
+ GPUBFCAllocator::Chunk* c = it->second;
+ return c->size;
+}
+
+void GPUBFCAllocator::DumpMemoryLog(size_t num_bytes) {
+ // For each bin: tally up the total number of chunks and bytes.
+ for (auto bit : bins_) {
+ Bin* b = bit.second;
+
+ size_t total_bytes_in_use = 0;
+ size_t total_bytes_in_bin = 0;
+ size_t total_requested_bytes_in_use = 0;
+ size_t total_requested_bytes_in_bin = 0;
+ size_t total_chunks_in_use = 0;
+ size_t total_chunks_in_bin = 0;
+ for (Chunk* c : b->chunks) {
+ total_bytes_in_bin += c->size;
+ total_requested_bytes_in_bin += c->requested_size;
+ ++total_chunks_in_bin;
+ if (c->in_use) {
+ total_bytes_in_use += c->size;
+ total_requested_bytes_in_use += c->requested_size;
+ ++total_chunks_in_use;
+ }
+ }
+
+ LOG(INFO) << "Bin (" << b->bin_size
+ << "): \tTotal Chunks: " << total_chunks_in_bin
+ << ", Chunks in use: " << total_chunks_in_use << " "
+ << strings::HumanReadableNumBytes(total_bytes_in_bin)
+ << " allocated for chunks. "
+ << strings::HumanReadableNumBytes(total_requested_bytes_in_bin)
+ << " client-requested for chunks. "
+ << strings::HumanReadableNumBytes(total_bytes_in_use)
+ << " in use in bin. "
+ << strings::HumanReadableNumBytes(total_requested_bytes_in_use)
+ << " client-requested in use in bin.";
+ }
+
+ // Find the bin that we would have liked to allocate in, so we
+ // can get some further analysis about fragmentation.
+ auto it = bins_.lower_bound(num_bytes);
+ if (it != bins_.end()) {
+ Bin* b = it->second;
+
+ LOG(INFO) << "Bin for " << strings::HumanReadableNumBytes(num_bytes)
+ << " was " << strings::HumanReadableNumBytes(b->bin_size)
+ << ", Chunk State: ";
+
+ for (Chunk* c : b->chunks) {
+ LOG(INFO) << c->DebugString(true);
+ }
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h
new file mode 100644
index 0000000000..3d1601e132
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h
@@ -0,0 +1,156 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_BFC_ALLOCATOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_BFC_ALLOCATOR_H_
+
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+#include "tensorflow/core/common_runtime/gpu/visitable_allocator.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+// A GPU memory allocator that implements a 'best-fit with coalescing'
+// algorithm. This is essentially a very simple version of Doug Lea's
+// malloc (dlmalloc).
+//
+// The goal of this allocator is to support defragmentation via
+// coalescing. One assumption we make is that the process using this
+// allocator owns pretty much all of the GPU memory, and that nearly
+// all requests to allocate GPU memory go through this interface.
+class GPUBFCAllocator : public VisitableAllocator {
+ public:
+ // 'device_id' refers to the StreamExecutor ID of the device within
+ // the process and must reference a valid ID in the process.
+ explicit GPUBFCAllocator(int device_id, size_t total_memory);
+ ~GPUBFCAllocator() override;
+
+ string Name() override { return "gpu_bfc"; }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+ void DeallocateRaw(void* ptr) override;
+
+ void AddAllocVisitor(Visitor visitor) override;
+
+ // Does nothing, because GPU memory is never freed.
+ void AddFreeVisitor(Visitor visitor) override {}
+
+ bool TracksAllocationSizes() override;
+
+ size_t RequestedSize(void* ptr) override;
+
+ size_t AllocatedSize(void* ptr) override;
+
+ private:
+ struct Bin;
+
+ void* AllocateRawInternal(size_t alignment, size_t num_bytes,
+ bool dump_log_on_failure);
+ void DeallocateRawInternal(void* ptr);
+
+ // Chunks point to GPU memory. Their prev/next pointers form a
+ // doubly-linked list of addresses sorted by GPU base address that
+ // must be contiguous. Chunks contain information about whether
+ // they are in use or whether they are free, and contain a pointer
+ // to the bin they are in.
+ struct Chunk {
+ size_t size = 0; // Full size of GPU buffer.
+
+ // We sometimes give chunks that are larger than needed to reduce
+ // fragmentation. requested_size keeps track of what the client
+ // actually wanted so we can understand whether our splitting
+ // strategy is efficient.
+ size_t requested_size = 0;
+
+ bool in_use = false;
+ void* ptr = nullptr; // pointer to granted GPU subbuffer.
+
+ // If not null, the memory referred to by 'prev' is directly
+ // preceding the memory used by this chunk, i.e., it should start
+ // at 'ptr - prev->size'.
+ Chunk* prev = nullptr;
+
+ // If not null, the memory referred to by 'next' is directly
+ // following the memory used by this chunk, i.e., it should be at
+ // 'ptr + size'.
+ Chunk* next = nullptr;
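+ //
+ // For example, after SplitChunk() divides a 1024-byte chunk at 'base'
+ // into 256 + 768 bytes, the first chunk has ptr = base, size = 256 and
+ // 'next' pointing at the second, which has ptr = base + 256, size = 768
+ // and 'prev' pointing back at the first.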
+
+ // What bin are we in?
+ Bin* bin = nullptr;
+
+ string DebugString(bool recurse) {
+ string dbg;
+ strings::StrAppend(&dbg, " Size: ", strings::HumanReadableNumBytes(size),
+ " | Requested Size: ",
+ strings::HumanReadableNumBytes(requested_size),
+ " | in_use: ", in_use);
+ if (recurse && prev) {
+ strings::StrAppend(&dbg, ", prev: ", prev->DebugString(false));
+ }
+ if (recurse && next) {
+ strings::StrAppend(&dbg, ", next: ", next->DebugString(false));
+ }
+ return dbg;
+ }
+ };
+
+ Chunk* AllocateNewChunk(size_t num_bytes);
+ void SplitChunk(Chunk* c, size_t num_bytes);
+ void Merge(Chunk* c1, Chunk* c2);
+ void MaybeCoalesce(Chunk* c);
+
+ void ReassignChunkToBin(Chunk* c);
+ void RemoveChunkFromBin(Chunk* c);
+
+ void DumpMemoryLog(size_t num_bytes);
+
+ // A Bin is a collection of similar-sized Chunks.
+ struct Bin {
+ // All chunks in this bin have >= bin_size memory.
+ size_t bin_size = 0;
+
+ struct ChunkComparator {
+ bool operator()(Chunk* a, Chunk* b) { return a->size < b->size; }
+ };
+
+ // List of chunks within the bin, sorted by chunk size.
+ std::multiset<Chunk*, ChunkComparator> chunks;
+
+ explicit Bin(size_t bs) : bin_size(bs) {}
+
+ ~Bin() { gtl::STLDeleteElements(&chunks); }
+ };
+
+ GPUAllocatorRetry retry_helper_;
+
+ // Structures immutable after construction
+ const int device_id_;
+ // The base pointer where all the GPU memory begins.
+ void* base_ptr_ = nullptr;
+ size_t gpu_memory_size_ = 0;
+
+ // Map from bin size to Bin
+ // After construction, the bin map is never resized.
+ std::map<size_t, Bin*> bins_;
+
+ perftools::gputools::StreamExecutor* stream_exec_; // Not owned.
+
+ // Structures mutable after construction
+ mutable mutex lock_;
+ // Not owned.
+ std::unordered_map<void*, Chunk*> ptr_to_chunk_map_;
+
+ // Called once on each region, ASAP.
+ std::vector<Visitor> region_visitors_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(GPUBFCAllocator);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_BFC_ALLOCATOR_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator_test.cc b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator_test.cc
new file mode 100644
index 0000000000..7b5e8aec1d
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_bfc_allocator_test.cc
@@ -0,0 +1,166 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h"
+
+#include <algorithm>
+#include <vector>
+
+#include "tensorflow/stream_executor/stream_executor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+namespace {
+
+TEST(GPUBFCAllocatorTest, NoDups) {
+ GPUBFCAllocator a(0, 1 << 30);
+ // Allocate a lot of raw pointers
+ std::vector<void*> ptrs;
+ for (int s = 1; s < 1024; s++) {
+ void* raw = a.AllocateRaw(1, s);
+ ptrs.push_back(raw);
+ }
+
+ std::sort(ptrs.begin(), ptrs.end());
+
+ // Make sure none of them are equal, and that none of them overlap.
+ for (int i = 0; i < ptrs.size(); i++) {
+ if (i > 0) {
+ ASSERT_NE(ptrs[i], ptrs[i - 1]); // No dups
+ size_t req_size = a.RequestedSize(ptrs[i - 1]);
+ ASSERT_GT(req_size, 0);
+ ASSERT_GE(static_cast<char*>(ptrs[i]) - static_cast<char*>(ptrs[i - 1]),
+ req_size);
+ }
+ }
+
+ for (int i = 0; i < ptrs.size(); i++) {
+ a.DeallocateRaw(ptrs[i]);
+ }
+}
+
+TEST(GPUBFCAllocatorTest, AllocationsAndDeallocations) {
+ GPUBFCAllocator a(0, 1 << 30);
+ // Allocate 256 raw pointers of sizes between 100 bytes and about
+ // a meg
+ random::PhiloxRandom philox(123, 17);
+ random::SimplePhilox rand(&philox);
+
+ std::vector<void*> initial_ptrs;
+ for (int s = 1; s < 256; s++) {
+ size_t size = std::min<size_t>(
+ std::max<size_t>(rand.Rand32() % 1048576, 100), 1048576);
+ void* raw = a.AllocateRaw(1, size);
+
+ initial_ptrs.push_back(raw);
+ }
+
+ // Deallocate half of the memory, and keep track of the others.
+ std::vector<void*> existing_ptrs;
+ for (int i = 0; i < initial_ptrs.size(); i++) {
+ if (i % 2 == 1) {
+ a.DeallocateRaw(initial_ptrs[i]);
+ } else {
+ existing_ptrs.push_back(initial_ptrs[i]);
+ }
+ }
+
+ // Allocate a lot of raw pointers
+ for (int s = 1; s < 256; s++) {
+ size_t size = std::min<size_t>(
+ std::max<size_t>(rand.Rand32() % 1048576, 100), 1048576);
+ void* raw = a.AllocateRaw(1, size);
+ existing_ptrs.push_back(raw);
+ }
+
+ std::sort(existing_ptrs.begin(), existing_ptrs.end());
+ // Make sure none of them are equal
+ for (int i = 0; i < existing_ptrs.size(); i++) {
+ if (i > 0) {
+ CHECK_NE(existing_ptrs[i], existing_ptrs[i - 1]); // No dups
+
+ size_t req_size = a.RequestedSize(existing_ptrs[i - 1]);
+ ASSERT_GT(req_size, 0);
+
+ // Check that they don't overlap.
+ ASSERT_GE(static_cast<char*>(existing_ptrs[i]) -
+ static_cast<char*>(existing_ptrs[i - 1]),
+ req_size);
+ }
+ }
+
+ for (int i = 0; i < existing_ptrs.size(); i++) {
+ a.DeallocateRaw(existing_ptrs[i]);
+ }
+}
+
+TEST(GPUBFCAllocatorTest, ExerciseCoalescing) {
+ GPUBFCAllocator a(0, 1 << 30);
+
+ float* first_ptr = a.Allocate<float>(1024);
+ a.Deallocate(first_ptr);
+ for (int i = 0; i < 1024; ++i) {
+ // Allocate several buffers of different sizes, and then clean them
+ // all up. We should be able to repeat this endlessly without
+ // causing fragmentation and growth.
+ float* t1 = a.Allocate<float>(1024);
+
+ int64* t2 = a.Allocate<int64>(1048576);
+ double* t3 = a.Allocate<double>(2048);
+ float* t4 = a.Allocate<float>(10485760);
+
+ a.Deallocate(t1);
+ a.Deallocate(t2);
+ a.Deallocate(t3);
+ a.Deallocate(t4);
+ }
+
+ // At the end, we should have coalesced all memory into one region
+ // starting at the beginning, so validate that allocating a pointer
+ // starts from this region.
+ float* first_ptr_after = a.Allocate<float>(1024);
+ EXPECT_EQ(first_ptr, first_ptr_after);
+ a.Deallocate(first_ptr_after);
+}
+
+TEST(GPUBFCAllocatorTest, AllocateZeroBufSize) {
+ GPUBFCAllocator a(0, 1 << 30);
+ float* ptr = a.Allocate<float>(0);
+ EXPECT_EQ(nullptr, ptr);
+}
+
+TEST(GPUBFCAllocatorTest, TracksSizes) {
+ GPUBFCAllocator a(0, 1 << 30);
+ EXPECT_EQ(true, a.TracksAllocationSizes());
+}
+
+TEST(GPUBFCAllocatorTest, AllocatedVsRequested) {
+ GPUBFCAllocator a(0, 1 << 30);
+ float* t1 = a.Allocate<float>(1);
+ EXPECT_EQ(4, a.RequestedSize(t1));
+ EXPECT_EQ(256, a.AllocatedSize(t1));
+ a.Deallocate(t1);
+}
+
+TEST(GPUBFCAllocatorTest, TestCustomMemoryLimit) {
+ // Configure a 1 MiB limit
+ GPUBFCAllocator a(0, 1 << 20);
+
+ float* first_ptr = a.Allocate<float>(1 << 6);
+ float* second_ptr = a.Allocate<float>(1 << 20);
+
+ EXPECT_NE(nullptr, first_ptr);
+ EXPECT_EQ(nullptr, second_ptr);
+ a.Deallocate(first_ptr);
+}
+
+} // namespace
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.cc b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.cc
new file mode 100644
index 0000000000..5ec405cd80
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.cc
@@ -0,0 +1,186 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h"
+
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+#define MASK_WORDS 2
+#define MASK_BYTES (MASK_WORDS * sizeof(int64))
+
+namespace {
+
+static int64* NewMask(int64 word) {
+ int64* m = new int64[MASK_WORDS];
+ for (int i = 0; i < MASK_WORDS; ++i) {
+ m[i] = word;
+ }
+ return m;
+}
+
+static int64* before_mask = NewMask(0xabababababababab);
+static int64* after_mask = NewMask(0xcdcdcdcdcdcdcdcd);
+
+bool CheckMask(perftools::gputools::StreamExecutor* exec, void* ptr,
+ int64* mask) {
+ gpu::DeviceMemory<int64> gpu_ptr{gpu::DeviceMemoryBase{ptr, MASK_BYTES}};
+ int64 tmp[MASK_WORDS];
+
+ if (!exec->SynchronousMemcpy(&tmp, gpu_ptr, MASK_BYTES)) {
+ LOG(FATAL) << "Could not copy debug mask";
+ }
+
+ bool ok = true;
+ for (int i = 0; i < MASK_WORDS; ++i) {
+ ok &= (mask[i] == tmp[i]);
+ if (!ok) {
+ LOG(ERROR) << "i=" << i
+ << " mask=" << reinterpret_cast<const void*>(mask[i])
+ << " field=" << reinterpret_cast<const void*>(tmp[i]);
+ }
+ }
+
+ return ok;
+}
+
+void InitMask(perftools::gputools::StreamExecutor* exec, void* ptr,
+ int64* mask) {
+ gpu::DeviceMemory<int64> gpu_ptr{gpu::DeviceMemoryBase{ptr, MASK_BYTES}};
+ if (!exec->SynchronousMemcpy(&gpu_ptr, mask, MASK_BYTES)) {
+ LOG(FATAL) << "Could not copy debug mask";
+ }
+}
+
+} // namespace
+
+// -----------------------------------------------------------------------------
+// GPUDebugAllocator
+// -----------------------------------------------------------------------------
+GPUDebugAllocator::GPUDebugAllocator(VisitableAllocator* allocator,
+ int device_id)
+ : base_allocator_(allocator) {
+ stream_exec_ = GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+}
+
+GPUDebugAllocator::~GPUDebugAllocator() { delete base_allocator_; }
+
+void* GPUDebugAllocator::AllocateRaw(size_t alignment, size_t num_bytes) {
+ num_bytes += (2 * MASK_BYTES);
+
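+ // The underlying allocation is laid out as
+ //   [ before_mask | client bytes | after_mask ]
+ // with MASK_BYTES (two int64 words) of guard on each side; the pointer
+ // handed back to the client points just past before_mask.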
+ void* allocated_ptr = base_allocator_->AllocateRaw(alignment, num_bytes);
+
+ // Return the pointer after the header
+ void* rv = static_cast<char*>(allocated_ptr) + MASK_BYTES;
+
+ // Write the header at allocated_ptr
+ InitMask(stream_exec_, allocated_ptr, before_mask);
+
+ // Write the footer at the end.
+ size_t req_size = base_allocator_->RequestedSize(allocated_ptr);
+ InitMask(stream_exec_,
+ static_cast<char*>(allocated_ptr) + req_size - MASK_BYTES,
+ after_mask);
+ return rv;
+}
+void GPUDebugAllocator::DeallocateRaw(void* ptr) {
+ CHECK(CheckHeader(ptr)) << "before_mask has been overwritten";
+ CHECK(CheckFooter(ptr)) << "after_mask has been overwritten";
+
+ // Backtrack to the beginning of the header.
+ ptr = static_cast<void*>(static_cast<char*>(ptr) - MASK_BYTES);
+ // Deallocate the memory
+ base_allocator_->DeallocateRaw(ptr);
+}
+
+void GPUDebugAllocator::AddAllocVisitor(Visitor visitor) {
+ return base_allocator_->AddAllocVisitor(visitor);
+}
+
+void GPUDebugAllocator::AddFreeVisitor(Visitor visitor) {
+ return base_allocator_->AddFreeVisitor(visitor);
+}
+
+bool GPUDebugAllocator::TracksAllocationSizes() { return true; }
+
+size_t GPUDebugAllocator::RequestedSize(void* ptr) {
+ auto req_size =
+ base_allocator_->RequestedSize(static_cast<char*>(ptr) - MASK_BYTES);
+ return req_size - 2 * MASK_BYTES;
+}
+
+size_t GPUDebugAllocator::AllocatedSize(void* ptr) {
+ return base_allocator_->AllocatedSize(static_cast<char*>(ptr) - MASK_BYTES);
+}
+
+bool GPUDebugAllocator::CheckHeader(void* ptr) {
+ return CheckMask(stream_exec_, static_cast<char*>(ptr) - MASK_BYTES,
+ before_mask);
+}
+
+bool GPUDebugAllocator::CheckFooter(void* ptr) {
+ char* original_ptr = static_cast<char*>(ptr) - MASK_BYTES;
+ size_t req_size = base_allocator_->RequestedSize(original_ptr);
+ return CheckMask(stream_exec_, original_ptr + req_size - MASK_BYTES,
+ after_mask);
+}
+
+// -----------------------------------------------------------------------------
+// GPUNanResetAllocator
+// -----------------------------------------------------------------------------
+GPUNanResetAllocator::GPUNanResetAllocator(VisitableAllocator* allocator,
+ int device_id)
+ : base_allocator_(allocator) {
+ stream_exec_ = GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+}
+
+GPUNanResetAllocator::~GPUNanResetAllocator() { delete base_allocator_; }
+
+void* GPUNanResetAllocator::AllocateRaw(size_t alignment, size_t num_bytes) {
+ void* allocated_ptr = base_allocator_->AllocateRaw(alignment, num_bytes);
+
+ // Initialize the buffer to NaNs
+ size_t req_size = base_allocator_->RequestedSize(allocated_ptr);
+ std::vector<float> nans(req_size / sizeof(float), std::nanf(""));
+ gpu::DeviceMemory<float> nan_ptr{
+ gpu::DeviceMemoryBase{static_cast<float*>(allocated_ptr), req_size}};
+
+ if (!stream_exec_->SynchronousMemcpy(&nan_ptr, &nans[0], req_size)) {
+ LOG(ERROR) << "Could not initialize to NaNs";
+ }
+
+ return allocated_ptr;
+}
+void GPUNanResetAllocator::DeallocateRaw(void* ptr) {
+ // Reset the buffer to NaNs
+ size_t req_size = base_allocator_->RequestedSize(ptr);
+ std::vector<float> nans(req_size / sizeof(float), std::nanf(""));
+ gpu::DeviceMemory<float> nan_ptr{
+ gpu::DeviceMemoryBase{static_cast<float*>(ptr), req_size}};
+ if (!stream_exec_->SynchronousMemcpy(&nan_ptr, &nans[0], req_size)) {
+ LOG(ERROR) << "Could not initialize to NaNs";
+ }
+
+ // Deallocate the memory
+ base_allocator_->DeallocateRaw(ptr);
+}
+
+void GPUNanResetAllocator::AddAllocVisitor(Visitor visitor) {
+ return base_allocator_->AddAllocVisitor(visitor);
+}
+
+void GPUNanResetAllocator::AddFreeVisitor(Visitor visitor) {
+ return base_allocator_->AddFreeVisitor(visitor);
+}
+
+size_t GPUNanResetAllocator::RequestedSize(void* ptr) {
+ return base_allocator_->RequestedSize(ptr);
+}
+
+size_t GPUNanResetAllocator::AllocatedSize(void* ptr) {
+ return base_allocator_->AllocatedSize(ptr);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h
new file mode 100644
index 0000000000..c9b564ffc4
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h
@@ -0,0 +1,68 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEBUG_ALLOCATOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEBUG_ALLOCATOR_H_
+
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/common_runtime/gpu/visitable_allocator.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+namespace tensorflow {
+
+// An allocator that wraps a GPU allocator and adds debugging
+// functionality that verifies that users do not write outside their
+// allocated memory.
+class GPUDebugAllocator : public VisitableAllocator {
+ public:
+ explicit GPUDebugAllocator(VisitableAllocator* allocator, int device_id);
+ ~GPUDebugAllocator() override;
+ string Name() override { return "gpu_debug"; }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+ void DeallocateRaw(void* ptr) override;
+ void AddAllocVisitor(Visitor visitor) override;
+ void AddFreeVisitor(Visitor visitor) override;
+ bool TracksAllocationSizes() override;
+ size_t RequestedSize(void* ptr) override;
+ size_t AllocatedSize(void* ptr) override;
+
+ // For testing.
+ bool CheckHeader(void* ptr);
+ bool CheckFooter(void* ptr);
+
+ private:
+ VisitableAllocator* base_allocator_ = nullptr; // owned
+
+ perftools::gputools::StreamExecutor* stream_exec_; // Not owned.
+
+ TF_DISALLOW_COPY_AND_ASSIGN(GPUDebugAllocator);
+};
+
+// An allocator that wraps a GPU allocator and resets the memory on
+// allocation and free to 'NaN', helping to identify cases where the
+// user forgets to initialize the memory.
+class GPUNanResetAllocator : public VisitableAllocator {
+ public:
+ explicit GPUNanResetAllocator(VisitableAllocator* allocator, int device_id);
+ ~GPUNanResetAllocator() override;
+ string Name() override { return "gpu_nan_reset"; }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+ void DeallocateRaw(void* ptr) override;
+ void AddAllocVisitor(Visitor visitor) override;
+ void AddFreeVisitor(Visitor visitor) override;
+ size_t RequestedSize(void* ptr) override;
+ size_t AllocatedSize(void* ptr) override;
+
+ private:
+ VisitableAllocator* base_allocator_ = nullptr; // owned
+
+ perftools::gputools::StreamExecutor* stream_exec_; // Not owned.
+
+ TF_DISALLOW_COPY_AND_ASSIGN(GPUNanResetAllocator);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEBUG_ALLOCATOR_H_
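A minimal composition sketch (illustration only, not part of the patch itself):
the two wrappers are designed to stack around a real allocator, with the
NaN-reset allocator outermost, exactly as the tests below construct them. Each
wrapper takes ownership of the allocator it wraps and deletes it in its
destructor.

    // Device 0 with a 1GiB region; overwrite checks plus NaN scribbling on
    // top of the BFC allocator.
    VisitableAllocator* a = new GPUNanResetAllocator(
        new GPUDebugAllocator(new GPUBFCAllocator(0, 1 << 30), 0), 0);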
diff --git a/tensorflow/core/common_runtime/gpu/gpu_debug_allocator_test.cc b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator_test.cc
new file mode 100644
index 0000000000..5f63906576
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_debug_allocator_test.cc
@@ -0,0 +1,207 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h"
+
+#include <algorithm>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include <gtest/gtest.h>
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+TEST(GPUDebugAllocatorTest, OverwriteDetection_None) {
+ const int device_id = 0;
+ GPUDebugAllocator a(new GPUBFCAllocator(device_id, 1 << 30), device_id);
+ auto stream_exec =
+ GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ for (int s : {8}) {
+ std::vector<int64> cpu_array(s);
+ memset(&cpu_array[0], 0, cpu_array.size() * sizeof(int64));
+ int64* gpu_array = a.Allocate<int64>(cpu_array.size());
+ gpu::DeviceMemory<int64> gpu_array_ptr{gpu::DeviceMemoryBase{gpu_array}};
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(&gpu_array_ptr, &cpu_array[0],
+ s * sizeof(int64)));
+ EXPECT_TRUE(a.CheckHeader(gpu_array));
+ EXPECT_TRUE(a.CheckFooter(gpu_array));
+
+ // Confirm no error on free.
+ a.DeallocateRaw(gpu_array);
+ }
+}
+
+TEST(GPUDebugAllocatorTest, OverwriteDetection_Header) {
+ for (int s : {8, 211}) {
+ EXPECT_DEATH(
+ {
+ const int device_id = 0;
+ GPUDebugAllocator a(new GPUBFCAllocator(device_id, 1 << 30),
+ device_id);
+ auto stream_exec =
+ GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ std::vector<int64> cpu_array(s);
+ memset(&cpu_array[0], 0, cpu_array.size() * sizeof(int64));
+ int64* gpu_array = a.Allocate<int64>(cpu_array.size());
+
+ gpu::DeviceMemory<int64> gpu_array_ptr{
+ gpu::DeviceMemoryBase{gpu_array}};
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(
+ &gpu_array_ptr, &cpu_array[0], cpu_array.size() * sizeof(int64)));
+
+ gpu::DeviceMemory<int64> gpu_hdr_ptr{
+ gpu::DeviceMemoryBase{gpu_array - 1}};
+ // Clobber first word of the header.
+ float pi = 3.1417;
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&gpu_hdr_ptr, &pi, sizeof(float)));
+
+ // Expect error on free.
+ a.Deallocate(gpu_array);
+ },
+ "");
+ }
+}
+
+TEST(GPUDebugAllocatorTest, OverwriteDetection_Footer) {
+ for (int s : {8, 22}) {
+ EXPECT_DEATH(
+ {
+ const int device_id = 0;
+ GPUDebugAllocator a(new GPUBFCAllocator(device_id, 1 << 30),
+ device_id);
+ auto stream_exec =
+ GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ std::vector<int64> cpu_array(s);
+ memset(&cpu_array[0], 0, cpu_array.size() * sizeof(int64));
+ int64* gpu_array = a.Allocate<int64>(cpu_array.size());
+
+ gpu::DeviceMemory<int64> gpu_array_ptr{
+ gpu::DeviceMemoryBase{gpu_array}};
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(
+ &gpu_array_ptr, &cpu_array[0], cpu_array.size() * sizeof(int64)));
+
+ // Clobber word of the footer.
+ gpu::DeviceMemory<int64> gpu_ftr_ptr{
+ gpu::DeviceMemoryBase{gpu_array + s}};
+ float pi = 3.1417;
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&gpu_ftr_ptr, &pi, sizeof(float)));
+
+ // Expect error on free.
+ a.Deallocate(gpu_array);
+ },
+ "");
+ }
+}
+
+TEST(GPUDebugAllocatorTest, ResetToNan) {
+ const int device_id = 0;
+ GPUNanResetAllocator a(new GPUBFCAllocator(device_id, 1 << 30), device_id);
+ auto stream_exec =
+ GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ std::vector<float> cpu_array(1024);
+ std::vector<float> cpu_array_result(1024);
+
+ // Allocate 1024 floats
+ float* gpu_array = a.Allocate<float>(cpu_array.size());
+ gpu::DeviceMemory<float> gpu_array_ptr{gpu::DeviceMemoryBase{gpu_array}};
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(&cpu_array[0], gpu_array_ptr,
+ cpu_array.size() * sizeof(float)));
+ for (float f : cpu_array) {
+ ASSERT_FALSE(std::isfinite(f));
+ }
+
+ // Set one of the fields to 1.0.
+ cpu_array[0] = 1.0;
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(&gpu_array_ptr, &cpu_array[0],
+ cpu_array.size() * sizeof(float)));
+ // Copy the data back and verify.
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&cpu_array_result[0], gpu_array_ptr,
+ cpu_array_result.size() * sizeof(float)));
+ ASSERT_EQ(1.0, cpu_array_result[0]);
+
+ // Free the array
+ a.Deallocate(gpu_array);
+
+ // All values should be reset to nan.
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&cpu_array_result[0], gpu_array_ptr,
+ cpu_array_result.size() * sizeof(float)));
+ for (float f : cpu_array_result) {
+ ASSERT_FALSE(std::isfinite(f));
+ }
+}
+
+TEST(GPUDebugAllocatorTest, ResetToNanWithHeaderFooter) {
+ const int device_id = 0;
+ // NaN reset must be the outer-most allocator.
+ GPUNanResetAllocator a(
+ new GPUDebugAllocator(new GPUBFCAllocator(device_id, 1 << 30), device_id),
+ device_id);
+ auto stream_exec =
+ GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ std::vector<float> cpu_array(1024);
+ std::vector<float> cpu_array_result(1024);
+
+ // Allocate 1024 floats
+ float* gpu_array = a.Allocate<float>(cpu_array.size());
+ gpu::DeviceMemory<float> gpu_array_ptr{gpu::DeviceMemoryBase{gpu_array}};
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(&cpu_array[0], gpu_array_ptr,
+ cpu_array.size() * sizeof(float)));
+ for (float f : cpu_array) {
+ ASSERT_FALSE(std::isfinite(f));
+ }
+
+ // Set one of the fields to 1.0.
+ cpu_array[0] = 1.0;
+ ASSERT_TRUE(stream_exec->SynchronousMemcpy(&gpu_array_ptr, &cpu_array[0],
+ cpu_array.size() * sizeof(float)));
+ // Copy the data back and verify.
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&cpu_array_result[0], gpu_array_ptr,
+ cpu_array_result.size() * sizeof(float)));
+ ASSERT_EQ(1.0, cpu_array_result[0]);
+
+ // Free the array
+ a.Deallocate(gpu_array);
+
+ // All values should be reset to nan.
+ ASSERT_TRUE(
+ stream_exec->SynchronousMemcpy(&cpu_array_result[0], gpu_array_ptr,
+ cpu_array_result.size() * sizeof(float)));
+ for (float f : cpu_array_result) {
+ ASSERT_FALSE(std::isfinite(f));
+ }
+}
+
+TEST(GPUDebugAllocatorTest, TracksSizes) {
+ GPUDebugAllocator a(new GPUBFCAllocator(0, 1 << 30), 0);
+ EXPECT_EQ(true, a.TracksAllocationSizes());
+}
+
+TEST(GPUDebugAllocatorTest, AllocatedVsRequested) {
+ GPUNanResetAllocator a(
+ new GPUDebugAllocator(new GPUBFCAllocator(0, 1 << 30), 0), 0);
+ float* t1 = a.Allocate<float>(1);
+ EXPECT_EQ(4, a.RequestedSize(t1));
+ EXPECT_EQ(256, a.AllocatedSize(t1));
+ a.Deallocate(t1);
+}
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_device.cc b/tensorflow/core/common_runtime/gpu/gpu_device.cc
new file mode 100644
index 0000000000..26d34645f1
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_device.cc
@@ -0,0 +1,651 @@
+// TODO(opensource): Use a more generic sounding preprocessor name than
+// GOOGLE_CUDA
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/common_runtime/gpu/gpu_device.h"
+
+#include <stdlib.h>
+#include <string.h>
+
+//#include "base/commandlineflags.h"
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_stream_util.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_util.h"
+#include "tensorflow/core/common_runtime/gpu/process_state.h"
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/core/common_runtime/local_device.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/types.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/tracing.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+#if defined(PLATFORM_GOOGLE)
+DEFINE_bool(brain_gpu_sync_every_op, false,
+ "If true, call GPUUtil::Sync() between every dispatched opkernel.");
+
+DEFINE_int32(brain_gpu_max_streams, 1,
+ "Max number of GPU streams to use for computation.");
+#else
+// TODO(opensource): These should be made options in some options struct,
+// rather than flags.
+bool FLAGS_brain_gpu_sync_every_op = false;
+tensorflow::int32 FLAGS_brain_gpu_max_streams = 1;
+#endif
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+// Eigen Ops directly allocate memory only for temporary buffers used
+// during OpKernel::Compute(). The recommended way of allocating such
+// memory is via OpKernelContext::allocate_temp(). However, Eigen Ops
+// don't have access to OpKernelContext, instead they get access to
+// memory directly through the device allocator. As an Open Source
+// project, Eigen assumes allocator semantics similar to those of the
+// CUDA memory allocator, and may not work correctly due to race
+// conditions if used with some other allocator. For safety, we need
+// to delay deallocation calls out of Eigen until all events on the
+// corresponding stream have completed. The following two classes
+// serve this purpose in two different compilation environments.
+
+#if defined(__GCUDACC__) || defined(__GCUDACC_HOST__)
+class EigenAllocator : public ::Eigen::Allocator {
+ public:
+ explicit EigenAllocator(gpu::Stream* stream, ::tensorflow::Allocator* alloc,
+ EventMgr* em)
+ : stream_(stream), allocator_(alloc), em_(em) {}
+
+ void* allocate(size_t num_bytes) const override {
+ void* ret = allocator_->AllocateRaw(32 /* alignment */, num_bytes);
+ // Eigen doesn't typically check the return pointer from allocate,
+ // so we do it here and die with a more helpful error message.
+ if (ret == nullptr) {
+ LOG(FATAL) << "EigenAllocator for GPU ran out of memory when allocating "
+ << num_bytes << ". See error logs for more detailed info.";
+ }
+ return ret;
+ }
+
+ void deallocate(void* buffer) const override {
+ em_->ThenDeleteBuffer(stream_, {allocator_, buffer});
+ }
+
+ private:
+ gpu::Stream* stream_; // Not owned.
+ ::tensorflow::Allocator* allocator_; // Not owned.
+ ::tensorflow::EventMgr* em_; // Not owned.
+
+ TF_DISALLOW_COPY_AND_ASSIGN(EigenAllocator);
+};
+
+#else
+class EigenCudaStreamDevice : public ::Eigen::StreamInterface {
+ public:
+ EigenCudaStreamDevice(const cudaStream_t* cuda_stream, int gpu_id,
+ ::tensorflow::Allocator* alloc)
+ : stream_(cuda_stream), allocator_(alloc) {
+ Eigen::initializeDeviceProp();
+ device_prop_ = &Eigen::m_deviceProperties[gpu_id];
+ }
+
+ const cudaStream_t& stream() const override { return *stream_; }
+ const cudaDeviceProp& deviceProperties() const override {
+ return *device_prop_;
+ }
+
+ void* allocate(size_t num_bytes) const override {
+ void* ret = allocator_->AllocateRaw(32 /* alignment */, num_bytes);
+ if (ret == nullptr) {
+ LOG(FATAL) << "EigenAllocator for GPU ran out of memory when allocating "
+ << num_bytes << ". See error logs for more detailed info.";
+ }
+
+ return ret;
+ }
+ void deallocate(void* buffer) const override {
+ AsyncFreeData* afData = new AsyncFreeData(allocator_, buffer);
+ cudaError_t err = cudaStreamAddCallback(*stream_, asyncFree, afData, 0);
+ CHECK_EQ(err, cudaSuccess);
+ }
+
+ private:
+ struct AsyncFreeData {
+ AsyncFreeData(::tensorflow::Allocator* a, void* p)
+ : allocator_(a), address_(p) {}
+ ::tensorflow::Allocator* allocator_;
+ void* address_;
+ };
+
+ static void CUDART_CB asyncFree(cudaStream_t stream, cudaError_t status,
+ void* userData) {
+ AsyncFreeData* data = static_cast<AsyncFreeData*>(userData);
+ data->allocator_->DeallocateRaw(data->address_);
+ delete data;
+ }
+
+ const cudaStream_t* stream_; // Not owned.
+ const cudaDeviceProp* device_prop_; // Not owned.
+ ::tensorflow::Allocator* allocator_; // Not owned.
+
+ TF_DISALLOW_COPY_AND_ASSIGN(EigenCudaStreamDevice);
+};
+
+#endif
+
+BaseGPUDevice::BaseGPUDevice(const SessionOptions& options, const string& name,
+ Bytes memory_limit, BusAdjacency bus_adjacency,
+ int gpu_id, const string& physical_device_desc,
+ Allocator* gpu_allocator, Allocator* cpu_allocator)
+ : LocalDevice(options, Device::BuildDeviceAttributes(
+ name, DEVICE_GPU, memory_limit, bus_adjacency,
+ physical_device_desc),
+ gpu_allocator),
+ gpu_allocator_(gpu_allocator),
+ cpu_allocator_(cpu_allocator),
+ gpu_id_(gpu_id) {
+ gpu::StreamExecutor* executor =
+ GPUMachineManager()->ExecutorForDevice(gpu_id_).ValueOrDie();
+ if (!executor) {
+ LOG(ERROR) << "Failed to get StreamExecutor for device " << gpu_id_;
+ return;
+ }
+ em_.reset(new EventMgr(executor));
+
+ if (FLAGS_brain_gpu_max_streams < 1) {
+ LOG(FATAL) << "Invalid value for brain_gpu_max_streams.";
+ }
+
+ // Create the specified number of GPU streams
+ for (int i = 0; i < FLAGS_brain_gpu_max_streams; i++) {
+ auto stream = new gpu::Stream(executor);
+ stream->Init();
+ VLOG(2) << "Created stream[" << i << "] = " << stream;
+ streams_.push_back(stream);
+ device_contexts_.push_back(new GPUDeviceContext(i, stream));
+ }
+ gpu_device_info_ = new GpuDeviceInfo;
+ gpu_device_info_->stream = streams_[0];
+ gpu_device_info_->default_context = device_contexts_[0];
+ gpu_device_info_->event_mgr = em_.get();
+ set_tensorflow_gpu_device_info(gpu_device_info_);
+}
+
+BaseGPUDevice::~BaseGPUDevice() {
+ delete gpu_device_info_;
+ for (auto ctx : device_contexts_) ctx->Unref();
+ gtl::STLDeleteElements(&streams_);
+}
+
+Status BaseGPUDevice::FillContextMap(const Graph* graph,
+ DeviceContextMap* device_context_map) {
+ VLOG(2) << "FillContextMap";
+
+ const auto num_streams = streams_.size();
+ // Special case for single stream.
+ if (num_streams == 1) {
+ return Status::OK();
+ }
+ const int64 before = Env::Default()->NowMicros();
+ gpu_stream_util::AssignStreamsOpts opts;
+ opts.max_streams = num_streams;
+ std::unordered_map<int, int> node_to_stream_id;
+ TF_RETURN_IF_ERROR(
+ gpu_stream_util::AssignStreams(graph, opts, &node_to_stream_id));
+ int64 elapsed = Env::Default()->NowMicros() - before;
+ VLOG(3) << "AssignStreams took " << elapsed << "us";
+
+ // Fill in the context map. It is OK for this map to contain
+ // duplicate DeviceContexts so long as we increment the refcount.
+ for (Node* n : graph->nodes()) {
+ auto mapped_stream = node_to_stream_id[n->id()];
+ CHECK_LT(mapped_stream, num_streams);
+ auto ctx = device_contexts_[mapped_stream];
+ VLOG(3) << "Assigned stream " << node_to_stream_id[n->id()]
+ << " ==> stream[" << ctx->stream_id() << "] for node id " << n->id()
+ << " " << n->type_string() << " " << n->name();
+ ctx->Ref();
+ device_context_map->insert(std::make_pair(n->id(), ctx));
+ }
+
+ return Status::OK();
+}
+
+void BaseGPUDevice::Compute(OpKernel* op_kernel, OpKernelContext* context) {
+ // ScopedActivity is cheap when tracing is not active, but computing
+ // the Hash64 is not, so only compute it when tracing is enabled.
+ // TODO(pbar) This would no longer be needed if Ops have a unique id.
+ const uint64 id = port::Tracing::IsActive() ? Hash64(op_kernel->name()) : 0;
+ port::Tracing::ScopedActivity region(port::Tracing::EventCategory::kCompute,
+ id);
+
+ GPUDeviceContext* gpu_device_context = device_contexts_[0];
+ if (context->op_device_context() != nullptr) {
+ gpu_device_context =
+ static_cast<GPUDeviceContext*>(context->op_device_context());
+ }
+ gpu::Stream* stream = gpu_device_context->stream();
+ const auto stream_id = gpu_device_context->stream_id();
+
+ VLOG(1) << "GpuDevice::Compute " << op_kernel->name() << " op "
+ << op_kernel->def().op() << " on GPU" << gpu_id_ << " stream["
+ << stream_id << "]";
+
+ // NOTE(tucker): We need to discriminate between Eigen GPU
+ // operations and all others. If an operation is Eigen
+ // implemented (or otherwise tries to launch a cuda kernel
+ // directly), we need to establish a stacked-scoped environment
+ // that directs it to execute on the proper device. Otherwise we
+ // expect the Op to use StreamExecutor directly and correctly. The
+ // way we make this discrimination is quite hacky: At the moment
+ // the only non-Eigen GPU Op is the recv-op, which is known to be
+ // asynchronous.
+ if (op_kernel->type_string() == "_Recv") {
+ context->SetStatus(errors::Internal(
+ "Invalid synchronous 'Compute' on GPU for '_Recv' op"));
+ } else {
+ const string label =
+ strings::StrCat(op_kernel->name(), ":", op_kernel->type_string());
+ port::Tracing::ScopedAnnotation annotation(label);
+
+ const auto num_streams = streams_.size();
+ if (num_streams > 1) {
+ // If an input's device context differs from this op's context, this
+ // op's stream must wait on the input's stream before consuming it.
+ for (int i = 0; i < context->num_inputs(); ++i) {
+ const GPUDeviceContext* idc =
+ static_cast<GPUDeviceContext*>(context->input_device_context(i));
+ OP_REQUIRES(context, idc != nullptr,
+ errors::Internal("Input device context ", i,
+ " was not set properly."));
+ if (VLOG_IS_ON(2)) {
+ const void* base;
+ size_t len;
+ if (context->has_input(i)) {
+ if (IsRefType(context->input_dtype(i))) {
+ Tensor tensor = context->mutable_input(i, false);
+ base = DMAHelper::base(&tensor);
+ len = tensor.TotalBytes();
+ } else {
+ const Tensor& tensor = context->input(i);
+ base = DMAHelper::base(&tensor);
+ len = tensor.TotalBytes();
+ }
+ VLOG(2) << "Input " << i << " " << base << " " << len;
+ VLOG(2) << " stream[" << stream_id << "].ThenWaitFor(stream["
+ << idc->stream_id() << "])"
+ << ((idc->stream() == stream) ? " not needed" : "");
+ }
+ }
+ if (idc->stream() != stream) stream->ThenWaitFor(idc->stream());
+ }
+ }
+ gpu::cuda::ScopedActivateExecutorContext scoped_activation{
+ stream->parent(), gpu::cuda::MultiOpActivation::kYes};
+ // Keep a copy of the inputs before Compute runs, in case they get
+ // deleted. TODO(misard) this will be fixed when the tracking is
+ // done right.
+ std::vector<Tensor>* tensor_refs = nullptr;
+ if (!FLAGS_brain_gpu_sync_every_op) {
+ tensor_refs = new std::vector<Tensor>;
+ tensor_refs->reserve(context->num_inputs() + context->num_outputs());
+ for (int ii = 0; ii < context->num_inputs(); ++ii) {
+ if (context->has_input(ii)) {
+ if (IsRefType(context->input_dtype(ii))) {
+ Tensor in = context->mutable_input(ii, false);
+ tensor_refs->push_back(in);
+ } else {
+ const Tensor& in = context->input(ii);
+ tensor_refs->push_back(in);
+ }
+ }
+ }
+ }
+ op_kernel->Compute(context);
+ if (context->status().ok()) {
+ if (FLAGS_brain_gpu_sync_every_op) {
+ // Note: GPUUtil::Sync() only syncs the default stream.
+ // We need to either sync the stream used by this op, or
+ // all streams. Given that this flag is typically used for
+ // debugging it makes more sense to sync all GPU activity.
+ context->SetStatus(GPUUtil::SyncAll(this));
+ } else {
+ // The GPU kernel has been queued, but may not complete for some
+ // time. As soon as this function completes, the caller will
+ // discard its refs on the inputs, outputs and any scratch
+ // tensors it created. Create additional refs here that will be
+ // held until the kernel completes.
+ for (int ii = 0; ii < context->num_temps(); ++ii) {
+ Tensor* temp = context->temp(ii);
+ VLOG(2) << "Saving ref to temp Tensor @ " << DMAHelper::base(temp);
+ tensor_refs->push_back(*temp);
+ }
+ for (int ii = 0; ii < context->num_outputs(); ++ii) {
+ Tensor* temp = context->mutable_output(ii);
+ if (nullptr != temp) {
+ tensor_refs->push_back(*temp);
+ }
+ }
+ em_->ThenDeleteTensors(stream, tensor_refs);
+ }
+ } else {
+ if (!FLAGS_brain_gpu_sync_every_op) {
+ delete tensor_refs;
+ }
+ }
+ }
+}
+
+Status BaseGPUDevice::Sync() { return GPUUtil::Sync(this); }
+
+void BaseGPUDevice::ComputeAsync(AsyncOpKernel* op_kernel,
+ OpKernelContext* context,
+ AsyncOpKernel::DoneCallback done) {
+ GPUDeviceContext* gpu_device_context = device_contexts_[0];
+ if (context->op_device_context() != nullptr) {
+ gpu_device_context =
+ static_cast<GPUDeviceContext*>(context->op_device_context());
+ }
+ const auto stream_id = gpu_device_context->stream_id();
+
+ VLOG(1) << "GpuDevice::ComputeAsync " << op_kernel->name() << " op "
+ << op_kernel->def().op() << " on GPU" << gpu_id_ << " stream["
+ << stream_id << "]";
+
+ port::Tracing::TraceMe activity(
+ strings::StrCat(op_kernel->name(), ":", op_kernel->type_string()));
+ op_kernel->ComputeAsync(context, done);
+}
+
+Status BaseGPUDevice::MakeTensorFromProto(const TensorProto& tensor_proto,
+ const AllocatorAttributes alloc_attrs,
+ Tensor* tensor) {
+ AllocatorAttributes attr;
+ attr.set_on_host(true);
+ attr.set_gpu_compatible(true);
+ Allocator* host_alloc = GetAllocator(attr);
+ Tensor parsed(tensor_proto.dtype());
+ if (!parsed.FromProto(host_alloc, tensor_proto)) {
+ return errors::InvalidArgument("Cannot parse tensor from proto: ",
+ tensor_proto.DebugString());
+ }
+ Status status;
+ if (alloc_attrs.on_host()) {
+ *tensor = parsed;
+ } else {
+ if (!DMAHelper::CanUseDMA(&parsed)) {
+ return errors::Internal("GPU copy from non-DMA ",
+ DataTypeString(parsed.dtype()), " tensor");
+ }
+ Tensor copy(GetAllocator(alloc_attrs), parsed.dtype(), parsed.shape());
+ port::Tracing::ScopedAnnotation annotation("MakeTensorFromProto");
+ Notification n;
+ device_contexts_[0]->CopyCPUTensorToDevice(&parsed, this, &copy,
+ [&n, &status](const Status& s) {
+ status = s;
+ n.Notify();
+ });
+ n.WaitForNotification();
+ *tensor = copy;
+ }
+ return status;
+}
+
+namespace {
+#if defined(__GCUDACC__) || defined(__GCUDACC_HOST__)
+class ConcretePerOpGpuDevice : public PerOpGpuDevice {
+ public:
+ explicit ConcretePerOpGpuDevice(gpu::Stream* stream,
+ EigenAllocator* allocator)
+ : device_(stream, allocator), allocator_(allocator) {}
+ ~ConcretePerOpGpuDevice() { delete allocator_; }
+
+ const Eigen::GpuDevice& device() const override { return device_; }
+
+ private:
+ Eigen::GpuDevice device_;
+ EigenAllocator* allocator_;
+};
+#else
+class ConcretePerOpGpuDevice : public PerOpGpuDevice {
+ public:
+ explicit ConcretePerOpGpuDevice(EigenCudaStreamDevice* stream_device)
+ : device_(stream_device), stream_device_(stream_device) {}
+ ~ConcretePerOpGpuDevice() { delete stream_device_; }
+
+ const Eigen::GpuDevice& device() const override { return device_; }
+
+ private:
+ Eigen::GpuDevice device_;
+ EigenCudaStreamDevice* stream_device_;
+};
+#endif
+} // namespace
+
+const PerOpGpuDevice* BaseGPUDevice::NewDevice(int stream_id,
+ Allocator* allocator) {
+#if defined(__GCUDACC__) || defined(__GCUDACC_HOST__)
+ auto ea = new EigenAllocator(streams_[stream_id], allocator, em_.get());
+ return new ConcretePerOpGpuDevice(streams_[stream_id], ea);
+#else
+ const cudaStream_t* cuda_stream = reinterpret_cast<const cudaStream_t*>(
+ streams_[stream_id]->implementation()->CudaStreamMemberHack());
+ auto es = new EigenCudaStreamDevice(cuda_stream, gpu_id_, allocator);
+ return new ConcretePerOpGpuDevice(es);
+#endif
+}
+
+const PerOpGpuDevice* BaseGPUDevice::MakeGpuDevice(DeviceContext* dc,
+ Allocator* allocator) {
+ if (dc) {
+ const GPUDeviceContext* gpu_dc = static_cast<GPUDeviceContext*>(dc);
+ const int stream_id = gpu_dc->stream_id();
+ VLOG(1) << " eigen_gpu_device(" << dc << ") => stream[" << stream_id
+ << "]";
+ CHECK_LT(stream_id, streams_.size());
+ return NewDevice(stream_id, allocator);
+ } else {
+ return NewDevice(0, allocator);
+ }
+}
+
+void BaseGPUDeviceFactory::CreateDevices(const SessionOptions& options,
+ const string& name_prefix,
+ std::vector<Device*>* devices) {
+ int n = INT_MAX;
+ auto iter = options.config.device_count().find("GPU");
+ if (iter != options.config.device_count().end()) {
+ n = iter->second;
+ }
+ std::vector<int> valid_gpu_ids;
+ GetValidDeviceIds(&valid_gpu_ids);
+ if (static_cast<size_t>(n) > valid_gpu_ids.size()) {
+ n = valid_gpu_ids.size();
+ }
+ for (int i = 0; i < n; i++) {
+ devices->push_back(CreateGPUDevice(
+ options, strings::StrCat(name_prefix, "/gpu:", i), valid_gpu_ids[i]));
+ }
+}
+
+namespace {
+int64 MinSystemMemory(int64 available_memory) {
+ // We use the following heuristic for now:
+ //
+ // If the available_memory is < 2GiB, we allocate 200MiB to system memory.
+ // Otherwise, allocate the larger of 300MiB and 5% of available_memory.
+ //
+ // In the future we could be more sophisticated by using a table of
+ // devices.
+ if (available_memory < (1LL << 31)) {
+ // 200MiB
+ return 209715200LL;
+ } else {
+ // max(300 MiB, 0.05 * available_memory)
+ return std::max(314572800LL, static_cast<int64>(available_memory * 0.05));
+ }
+}
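+
+// Illustrative values for the heuristic above: a card with 1.5GiB available
+// reserves 200MiB for the system; a 4GiB card reserves max(300MiB, ~205MiB)
+// = 300MiB; a 12GiB card reserves max(300MiB, ~614MiB) = ~614MiB.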
+} // namespace
+
+static string GetShortDeviceDescription(int device_id,
+ const gpu::DeviceDescription& desc) {
+ return strings::StrCat("device: ", device_id, ", name: ", desc.name(),
+ ", pci bus id: ", desc.pci_bus_id());
+}
+
+LocalDevice* BaseGPUDeviceFactory::CreateGPUDevice(
+ const SessionOptions& options, const string& name, int gpu_id) {
+ CHECK_GE(gpu_id, 0);
+
+ // Look up the device, to see its attributes.
+ gpu::Platform* gpu_platform = GPUMachineManager();
+ CHECK_LT(gpu_id, gpu_platform->VisibleDeviceCount());
+ gpu::StreamExecutor* se =
+ gpu_platform->ExecutorForDevice(gpu_id).ValueOrDie();
+ const gpu::DeviceDescription& desc = se->GetDeviceDescription();
+
+ int64 total_memory, available_memory;
+ CHECK(se->DeviceMemoryUsage(&available_memory, &total_memory));
+
+ int64 allocated_memory = available_memory;
+ double config_memory_fraction =
+ options.config.gpu_options().per_process_gpu_memory_fraction();
+ if (config_memory_fraction == 0) {
+ const int64 min_system_memory = MinSystemMemory(available_memory);
+ if (min_system_memory < allocated_memory) {
+ allocated_memory -= min_system_memory;
+ }
+ } else {
+ allocated_memory *= config_memory_fraction;
+ }
+
+ Bytes allocated_bytes = static_cast<Bytes>(allocated_memory);
+
+ // Get GPU BusAdjacency from its reported NUMA affinity.
+ // Because GPUs are virtualized in some environments, we can't just
+ // use the GPU id.
+ BusAdjacency bus_adjacency = BUS_ANY;
+ switch (desc.numa_node()) {
+ case 0:
+ bus_adjacency = BUS_0;
+ break;
+ case 1:
+ bus_adjacency = BUS_1;
+ break;
+ default:
+ bus_adjacency = BUS_ANY;
+ }
+ VLOG(1) << "GPUDevice id " << gpu_id << " on bus " << bus_adjacency
+ << " numa: " << desc.numa_node() << " pci: " << desc.pci_bus_id();
+
+ ProcessState* process_state = ProcessState::singleton();
+ return CreateGPUDevice(
+ options, name, allocated_bytes, bus_adjacency, gpu_id,
+ GetShortDeviceDescription(gpu_id, desc),
+ process_state->GetGPUAllocator(gpu_id, allocated_memory),
+ process_state->GetCPUAllocator(desc.numa_node()));
+}
+
+static int GetMinGPUMultiprocessorCount() {
+ static const int kDefaultMinGPUMultiprocessorCount = 8;
+
+ const char* tf_min_gpu_core_count = getenv("TF_MIN_GPU_MULTIPROCESSOR_COUNT");
+
+ if (tf_min_gpu_core_count == nullptr ||
+ strcmp(tf_min_gpu_core_count, "") == 0) {
+ return kDefaultMinGPUMultiprocessorCount;
+ }
+
+ int min_gpu_core_count = -1;
+ if (strings::safe_strto32(tf_min_gpu_core_count, &min_gpu_core_count)) {
+ if (min_gpu_core_count >= 0) {
+ return min_gpu_core_count;
+ }
+ }
+
+ LOG(ERROR) << "Invalid minimum GPU multiprocessor count: ["
+ << tf_min_gpu_core_count << "]. "
+ << "Using the default value: "
+ << kDefaultMinGPUMultiprocessorCount;
+ return kDefaultMinGPUMultiprocessorCount;
+}
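+
+// For example (hypothetical command line, for illustration only), a machine
+// with several small GPUs could lower the threshold via the environment:
+//   TF_MIN_GPU_MULTIPROCESSOR_COUNT=4 python my_training_script.py
+// Any non-negative integer is accepted; unparsable or negative values fall
+// back to the default of 8.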
+
+void BaseGPUDeviceFactory::GetValidDeviceIds(std::vector<int>* ids) {
+ auto gpu_manager = GPUMachineManager();
+ int min_gpu_core_count = GetMinGPUMultiprocessorCount();
+ if (gpu_manager) {
+ auto visible_device_count = gpu_manager->VisibleDeviceCount();
+ for (int i = 0; i < gpu_manager->VisibleDeviceCount(); ++i) {
+ auto exec_status = gpu_manager->ExecutorForDevice(i);
+ if (!exec_status.ok()) {
+ continue;
+ }
+ gpu::StreamExecutor* se = exec_status.ValueOrDie();
+ const gpu::DeviceDescription& desc = se->GetDeviceDescription();
+ int major, minor;
+ if (!desc.cuda_compute_capability(&major, &minor)) {
+ continue;
+ }
+ // Only consider GPUs with compute capability >= 3.5 (Kepler or
+ // higher)
+ if (major < 3 || (major == 3 && minor < 5)) {
+ LOG(INFO) << "Ignoring gpu device "
+ << "(" << GetShortDeviceDescription(i, desc) << ") "
+ << "with Cuda compute capability " << major << "." << minor
+ << ". The minimum required Cuda capability is 3.5.";
+ continue;
+ }
+
+ // TensorFlow currently places computation on devices assuming
+ // they have similar capability.
+ //
+ // If there are multiple GPUs available on the machine, only
+ // consider GPUs with 8 or more multiprocessors.
+ //
+ // TODO(vrv): In the medium term: we should only filter out GPUs
+ // that are slow relative to the fastest GPU. In the long term,
+ // TensorFlow should support automatic placement based on
+ // capability.
+ if (visible_device_count > 1) {
+ if (desc.core_count() < min_gpu_core_count) {
+ LOG(INFO) << "Ignoring gpu device "
+ << "(" << GetShortDeviceDescription(i, desc) << ") "
+ << "with Cuda multiprocessor count: " << desc.core_count()
+ << ". The minimum required count is " << min_gpu_core_count
+ << ". You can adjust this requirement with the env var "
+ "TF_MIN_GPU_MULTIPROCESSOR_COUNT.";
+ continue;
+ }
+ }
+
+ int new_id = ids->size();
+ ids->push_back(i);
+
+ LOG(INFO) << "Creating TensorFlow device (/gpu:" << new_id << ") -> "
+ << "(" << GetShortDeviceDescription(i, desc) << ")";
+ }
+ }
+}
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_device.h b/tensorflow/core/common_runtime/gpu/gpu_device.h
new file mode 100644
index 0000000000..a415224d95
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_device.h
@@ -0,0 +1,94 @@
+#if !GOOGLE_CUDA
+#error This file must only be included when building with CUDA support
+#endif
+
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEVICE_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEVICE_H_
+
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/core/common_runtime/local_device.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+class EigenAllocator;
+
+class BaseGPUDevice : public LocalDevice {
+ public:
+ BaseGPUDevice(const SessionOptions& options, const string& name,
+ Bytes memory_limit, BusAdjacency bus_adjacency, int gpu_id,
+ const string& physical_device_desc, Allocator* gpu_allocator,
+ Allocator* cpu_allocator);
+
+ ~BaseGPUDevice() override;
+
+ // GPU devices require the Op's Compute method to save a reference to
+ // any temporary tensors it allocates, so that they remain alive until
+ // the Op's execution completes.
+ bool SaveTemporaryTensors() const override { return true; }
+
+ Status FillContextMap(const Graph* graph,
+ DeviceContextMap* device_context_map);
+
+ void Compute(OpKernel* op_kernel, OpKernelContext* context) override;
+
+ Status Sync() override;
+
+ void ComputeAsync(AsyncOpKernel* op_kernel, OpKernelContext* context,
+ AsyncOpKernel::DoneCallback done) override;
+
+ Status MakeTensorFromProto(const TensorProto& tensor_proto,
+ const AllocatorAttributes alloc_attrs,
+ Tensor* tensor) override;
+
+ // The caller owns the returned device.
+ const PerOpGpuDevice* MakeGpuDevice(DeviceContext* dc,
+ Allocator* allocator) override;
+
+ protected:
+ Allocator* gpu_allocator_; // not owned
+ Allocator* cpu_allocator_; // not owned
+
+ private:
+ std::vector<gpu::Stream*> streams_;
+ std::vector<GPUDeviceContext*> device_contexts_;
+ GpuDeviceInfo* gpu_device_info_ = nullptr;
+ mutex trace_mu_;
+ int gpu_id_ = -1;
+ std::unique_ptr<EventMgr> em_;
+
+ const PerOpGpuDevice* NewDevice(int stream_id, Allocator* allocator);
+};
+
+class BaseGPUDeviceFactory : public DeviceFactory {
+ public:
+ void CreateDevices(const SessionOptions& options, const string& name_prefix,
+ std::vector<Device*>* devices) override;
+
+ private:
+ LocalDevice* CreateGPUDevice(const SessionOptions& options,
+ const string& name, int gpu_id);
+
+ virtual LocalDevice* CreateGPUDevice(const SessionOptions& options,
+ const string& name, Bytes memory_limit,
+ BusAdjacency bus_adjacency, int gpu_id,
+ const string& physical_device_desc,
+ Allocator* gpu_allocator,
+ Allocator* cpu_allocator) = 0;
+
+ void GetValidDeviceIds(std::vector<int>* ids);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_DEVICE_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_device_factory.cc b/tensorflow/core/common_runtime/gpu/gpu_device_factory.cc
new file mode 100644
index 0000000000..240ac47499
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_device_factory.cc
@@ -0,0 +1,52 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/common_runtime/gpu/gpu_device.h"
+#include "tensorflow/core/common_runtime/gpu/process_state.h"
+
+namespace tensorflow {
+
+void RequireGPUDevice() {}
+
+class GPUDevice : public BaseGPUDevice {
+ public:
+ GPUDevice(const SessionOptions& options, const string& name,
+ Bytes memory_limit, BusAdjacency bus_adjacency, int gpu_id,
+ const string& physical_device_desc, Allocator* gpu_allocator,
+ Allocator* cpu_allocator)
+ : BaseGPUDevice(options, name, memory_limit, bus_adjacency, gpu_id,
+ physical_device_desc, gpu_allocator, cpu_allocator) {}
+
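+ // Allocator selection, as implemented below: host-resident allocations
+ // that must also be GPU-visible (e.g. staging buffers for DMA) use the
+ // CUDA host (pinned) allocator; other host allocations use the plain CPU
+ // allocator; device allocations come from the GPU allocator.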
+ Allocator* GetAllocator(AllocatorAttributes attr) override {
+ if (attr.on_host()) {
+ ProcessState* ps = ProcessState::singleton();
+ if (attr.gpu_compatible()) {
+ return ps->GetCUDAHostAllocator(0);
+ } else {
+ return cpu_allocator_;
+ }
+ } else {
+ return gpu_allocator_;
+ }
+ }
+};
+
+class GPUDeviceFactory : public BaseGPUDeviceFactory {
+ private:
+ LocalDevice* CreateGPUDevice(const SessionOptions& options,
+ const string& name, Bytes memory_limit,
+ BusAdjacency bus_adjacency, int gpu_id,
+ const string& physical_device_desc,
+ Allocator* gpu_allocator,
+ Allocator* cpu_allocator) override {
+ return new GPUDevice(options, name, memory_limit, bus_adjacency, gpu_id,
+ physical_device_desc, gpu_allocator, cpu_allocator);
+ }
+};
+
+REGISTER_LOCAL_DEVICE_FACTORY("GPU", GPUDeviceFactory);
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc b/tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc
new file mode 100644
index 0000000000..29d6281733
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc
@@ -0,0 +1,132 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h"
+
+#include "tensorflow/stream_executor/event.h"
+#include "tensorflow/stream_executor/stream.h"
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+EventMgr::EventMgr(gpu::StreamExecutor* se)
+ : exec_(se),
+ // threadpool_ has 1 thread for the polling loop, and one to execute
+ // event callback functions. Maybe we should have more?
+ threadpool_(Env::Default(), "GPU_Event_Manager", 2) {
+ threadpool_.Schedule([this]() { PollLoop(); });
+}
+
+EventMgr::~EventMgr() {
+ stop_polling_.Notify();
+ // Shut down the backup polling loop.
+ polling_stopped_.WaitForNotification();
+
+ // Events are owned by this object.
+ for (auto& e : free_events_) {
+ delete e;
+ }
+ while (!used_events_.empty()) {
+ delete used_events_[0].event;
+ delete used_events_[0].mem;
+ if (used_events_[0].bufrec.buf) {
+ used_events_[0].bufrec.alloc->DeallocateRaw(used_events_[0].bufrec.buf);
+ }
+ if (used_events_[0].func != nullptr)
+ threadpool_.Schedule(used_events_[0].func);
+ used_events_.pop_front();
+ }
+}
+
+// This polling loop runs at a relatively low frequency. Most calls to
+// PollEvents() should come directly from Compute() via
+// ThenDeleteTensors(). This function's purpose is to ensure that
+// even if no more GPU operations are being requested, we still
+// eventually clear the queue. It seems to prevent some TensorFlow
+// programs from stalling for reasons not yet understood.
+void EventMgr::PollLoop() {
+ while (!stop_polling_.HasBeenNotified()) {
+ Env::Default()->SleepForMicroseconds(1 * 1000);
+ {
+ mutex_lock l(mu_);
+ PollEvents(true);
+ }
+ }
+ polling_stopped_.Notify();
+}
+
+void EventMgr::QueueInUse(gpu::Stream* stream, InUse iu) {
+ VLOG(2) << "QueueInUse free_events_ " << free_events_.size()
+ << " used_events_ " << used_events_.size();
+ // Events are created on demand, and repeatedly reused. There is no
+ // limit placed here on the number of allocated Events.
+ if (free_events_.empty()) {
+ free_events_.push_back(new gpu::Event(exec_));
+ free_events_.back()->Init();
+ }
+ gpu::Event* e = free_events_.back();
+ free_events_.pop_back();
+ stream->ThenRecordEvent(e);
+ iu.event = e;
+ used_events_.push_back(iu);
+}
+
+// This function must be called periodically to check whether pending
+// events have completed, and then retire them. Initial observations
+// suggest that typical behavior in a TensorFlow program is to have
+// 0-3 events pending most of the time, but there are occasionally
+// spikes of up to several hundred outstanding.
+//
+// NOTE: If all events are on the same stream, no later event will
+// complete before an earlier event, except possibly if the earlier
+// event transitions to an error state, so there's no advantage in
+// looking past the first kPending event. However, if we're using
+// multiple streams there may be some gain in looking deeper.
+// As a compromise, PollEvent() calls that are triggered by the queueing
+// of a single event never look past the first kPending event. Calls
+// coming from the dedicated polling thread always sweep the full queue.
+//
+// Note that allowing the queue to grow very long could cause overall
+// GPU memory use to spike needlessly. An alternative strategy would
+// be to throttle new Op execution until the pending event queue
+// clears.
+void EventMgr::PollEvents(bool is_dedicated_poller) {
+ VLOG(2) << "PollEvents free_events_ " << free_events_.size()
+ << " used_events_ " << used_events_.size();
+ // Sweep the remaining events in order. If this is the dedicated
+ // polling thread, check the entire set. Otherwise, just sweep up to
+ // the first record that is still pending.
+ for (auto& iu : used_events_) {
+ if (iu.event == nullptr) continue;
+ gpu::Event::Status s = iu.event->PollForStatus();
+ switch (s) {
+ case gpu::Event::Status::kUnknown:
+ case gpu::Event::Status::kError:
+ // We don't expect to see these. Someday maybe propagate
+ // a Status error, but for now fail hard.
+ LOG(FATAL) << "Unexpected Event status: " << static_cast<int>(s);
+ break;
+ case gpu::Event::Status::kPending:
+ if (!is_dedicated_poller) return; // quit processing queue
+ break;
+ case gpu::Event::Status::kComplete:
+ delete iu.mem;
+ if (iu.bufrec.buf) iu.bufrec.alloc->DeallocateRaw(iu.bufrec.buf);
+ // The function must be called in another thread, outside of
+ // the mutex held here.
+ if (iu.func != nullptr) threadpool_.Schedule(iu.func);
+ free_events_.push_back(iu.event);
+ // Mark this InUse record as completed.
+ iu.event = nullptr;
+ }
+ }
+ // Then clear any completed InUse records from the front of the queue.
+ while (!used_events_.empty()) {
+ InUse& iu = used_events_.front();
+ if (iu.event == nullptr) {
+ used_events_.pop_front();
+ } else {
+ break;
+ }
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_event_mgr.h b/tensorflow/core/common_runtime/gpu/gpu_event_mgr.h
new file mode 100644
index 0000000000..f9436566d4
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_event_mgr.h
@@ -0,0 +1,118 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_EVENT_MGR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_EVENT_MGR_H_
+
+#include <deque>
+#include <vector>
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace perftools {
+namespace gputools {
+class Event;
+class Stream;
+class StreamExecutor;
+} // namespace gputools
+} // namespace perftools
+
+namespace tensorflow {
+
+// An object to keep track of pending Events in the StreamExecutor streams
+// and associated Tensors that cannot safely be deleted until the associated
+// Events have completed.
+class EventMgr {
+ public:
+ explicit EventMgr(perftools::gputools::StreamExecutor* se);
+
+ ~EventMgr();
+
+ // Takes ownership of *tensors and deletes it as soon as all events
+ // currently enqueued on *stream have completed.
+ inline void ThenDeleteTensors(perftools::gputools::Stream* stream,
+ std::vector<Tensor>* tensors) {
+ mutex_lock l(mu_);
+ QueueTensors(stream, tensors);
+ PollEvents(false);
+ }
+
+ struct BufRec {
+ Allocator* alloc;
+ void* buf;
+ };
+
+ // Takes ownership of *bufrec.buf and calls bufrec.alloc->DeallocateRaw()
+ // on it as soon as all events currently enqueued on *stream have completed.
+ inline void ThenDeleteBuffer(perftools::gputools::Stream* stream,
+ BufRec bufrec) {
+ mutex_lock l(mu_);
+ QueueBuffer(stream, bufrec);
+ PollEvents(false);
+ }
+
+ inline void ThenExecute(perftools::gputools::Stream* stream,
+ std::function<void()> func) {
+ mutex_lock l(mu_);
+ QueueFunc(stream, func);
+ PollEvents(false);
+ }
+
+ private:
+ friend class TEST_EventMgrHelper;
+ mutex mu_;
+ perftools::gputools::StreamExecutor* exec_;
+
+ struct InUse {
+ perftools::gputools::Event* event;
+ std::vector<Tensor>* mem;
+ BufRec bufrec;
+ std::function<void()> func;
+ };
+
+ // Stream-enqueue an unused Event and save with it a collection of
+ // Tensors and/or a BufRec to be deleted only after the Event
+ // completes.
+ void QueueInUse(perftools::gputools::Stream* stream, InUse in_use)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ void QueueTensors(perftools::gputools::Stream* stream,
+ std::vector<Tensor>* tensors)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ QueueInUse(stream, {nullptr, tensors, BufRec(), nullptr});
+ }
+
+ void QueueBuffer(perftools::gputools::Stream* stream, BufRec bufrec)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ QueueInUse(stream, {nullptr, nullptr, bufrec, nullptr});
+ }
+
+ void QueueFunc(perftools::gputools::Stream* stream,
+ std::function<void()> func) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ QueueInUse(stream, {nullptr, nullptr, BufRec(), func});
+ }
+
+ // This function should be called at roughly the same tempo as
+ // QueueTensors() to check whether pending events have completed,
+ // and then retire them.
+ void PollEvents(bool is_dedicated_poller) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // An internal polling loop that runs at a low frequency to clear
+ // straggler Events.
+ void PollLoop();
+
+ // A stack of unused events
+ std::vector<perftools::gputools::Event*> free_events_ GUARDED_BY(mu_);
+
+ // A FIFO queue of InUse events and associated tensors.
+ std::deque<InUse> used_events_ GUARDED_BY(mu_);
+
+ Notification stop_polling_;
+ Notification polling_stopped_;
+
+ // The main PollLoop for the event manager runs in this threadpool.
+ thread::ThreadPool threadpool_;
+};
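+
+// A minimal usage sketch (illustrative; BaseGPUDevice::Compute is the real
+// call site): after queueing a kernel on `stream`, hand copies of the Tensors
+// backing its inputs/outputs to the EventMgr so the buffers outlive the
+// asynchronous kernel:
+//
+//   std::vector<Tensor>* refs = new std::vector<Tensor>;
+//   refs->push_back(input);               // shares the underlying buffer
+//   em->ThenDeleteTensors(stream, refs);  // EventMgr takes ownership of refs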
+
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_EVENT_MGR_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_event_mgr_test.cc b/tensorflow/core/common_runtime/gpu/gpu_event_mgr_test.cc
new file mode 100644
index 0000000000..30ca1ff187
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_event_mgr_test.cc
@@ -0,0 +1,152 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h"
+
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include <gtest/gtest.h>
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+class TEST_EventMgrHelper {
+ public:
+ explicit TEST_EventMgrHelper(EventMgr* em) : em_(em) {}
+
+ int queue_size() {
+ mutex_lock l(em_->mu_);
+ return em_->used_events_.size();
+ }
+
+ int free_size() {
+ mutex_lock l(em_->mu_);
+ return em_->free_events_.size();
+ }
+
+ void QueueTensors(perftools::gputools::Stream* stream,
+ std::vector<Tensor>* tensors) {
+ mutex_lock l(em_->mu_);
+ em_->QueueTensors(stream, tensors);
+ }
+
+ void PollEvents(bool is_dedicated_poller) {
+ mutex_lock l(em_->mu_);
+ em_->PollEvents(is_dedicated_poller);
+ }
+
+ private:
+ EventMgr* em_;
+};
+
+namespace {
+
+TEST(EventMgr, Empty) {
+ auto stream_exec = GPUMachineManager()->ExecutorForDevice(0).ValueOrDie();
+ EventMgr em(stream_exec);
+ TEST_EventMgrHelper th(&em);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+}
+
+// Delaying polling until after several enqueuings should grow the
+// total number of allocated events. Once we have enough events for
+// the max simultaneously pending, we should not allocate any more.
+TEST(EventMgr, DelayedPolling) {
+ auto stream_exec = GPUMachineManager()->ExecutorForDevice(0).ValueOrDie();
+ EventMgr em(stream_exec);
+ TEST_EventMgrHelper th(&em);
+ EXPECT_EQ(0, th.queue_size());
+ std::vector<Tensor>* v = nullptr;
+ std::unique_ptr<gpu::Stream> stream(new gpu::Stream(stream_exec));
+ CHECK(stream.get());
+ stream->Init();
+ for (int i = 0; i < 5; ++i) {
+ v = new std::vector<Tensor>;
+ th.QueueTensors(stream.get(), v);
+ EXPECT_EQ(i + 1, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ }
+ th.PollEvents(false);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(5, th.free_size());
+ for (int j = 0; j < 2; ++j) {
+ for (int i = 0; i < 5; ++i) {
+ v = new std::vector<Tensor>;
+ th.QueueTensors(stream.get(), v);
+ EXPECT_EQ(i + 1, th.queue_size());
+ EXPECT_EQ(4 - i, th.free_size());
+ }
+ th.PollEvents(false);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(5, th.free_size());
+ }
+}
+
+// Immediate polling should require only one event to be allocated.
+TEST(EventMgr, ImmediatePolling) {
+ auto stream_exec = GPUMachineManager()->ExecutorForDevice(0).ValueOrDie();
+ EventMgr em(stream_exec);
+ TEST_EventMgrHelper th(&em);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ std::vector<Tensor>* v = nullptr;
+ std::unique_ptr<gpu::Stream> stream(new gpu::Stream(stream_exec));
+ CHECK(stream.get());
+ stream->Init();
+ for (int i = 0; i < 5; ++i) {
+ v = new std::vector<Tensor>;
+ em.ThenDeleteTensors(stream.get(), v);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(1, th.free_size());
+ }
+}
+
+// If we delay polling by more than 1 second, the backup polling loop
+// should clear the queue.
+TEST(EventMgr, LongDelayedPolling) {
+ auto stream_exec = GPUMachineManager()->ExecutorForDevice(0).ValueOrDie();
+ EventMgr em(stream_exec);
+ TEST_EventMgrHelper th(&em);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ std::vector<Tensor>* v = nullptr;
+ std::unique_ptr<gpu::Stream> stream(new gpu::Stream(stream_exec));
+ CHECK(stream.get());
+ stream->Init();
+ for (int i = 0; i < 5; ++i) {
+ v = new std::vector<Tensor>;
+ th.QueueTensors(stream.get(), v);
+ EXPECT_EQ(1 + i, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ }
+ sleep(1);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(5, th.free_size());
+}
+
+// Deleting the EventMgr when events are still pending should shut
+// down gracefully.
+TEST(EventMgr, NonEmptyShutdown) {
+ auto stream_exec = GPUMachineManager()->ExecutorForDevice(0).ValueOrDie();
+ EventMgr em(stream_exec);
+ TEST_EventMgrHelper th(&em);
+ EXPECT_EQ(0, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ std::vector<Tensor>* v = nullptr;
+ std::unique_ptr<gpu::Stream> stream(new gpu::Stream(stream_exec));
+ CHECK(stream.get());
+ stream->Init();
+ for (int i = 0; i < 5; ++i) {
+ v = new std::vector<Tensor>;
+ th.QueueTensors(stream.get(), v);
+ EXPECT_EQ(1 + i, th.queue_size());
+ EXPECT_EQ(0, th.free_size());
+ }
+}
+
+} // namespace
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_init.cc b/tensorflow/core/common_runtime/gpu/gpu_init.cc
new file mode 100644
index 0000000000..631a47eb91
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_init.cc
@@ -0,0 +1,147 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+
+#include <string>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+namespace {
+
+std::unique_ptr<std::map<std::pair<int, int>, bool>> GetPeerAccessMap(
+ gpu::Platform* platform, int device_count) {
+ auto* map = new std::map<std::pair<int, int>, bool>;
+ for (int i = 0; i < device_count; ++i) {
+ for (int j = 0; j < device_count; ++j) {
+ gpu::StreamExecutor* from = platform->ExecutorForDevice(i).ValueOrDie();
+ gpu::StreamExecutor* to = platform->ExecutorForDevice(j).ValueOrDie();
+ (*map)[{i, j}] = from->CanEnablePeerAccessTo(to);
+ }
+ }
+
+ return std::unique_ptr<std::map<std::pair<int, int>, bool>>{map};
+}
+
+Status EnablePeerAccess(gpu::Platform* platform, int device_count) {
+ for (int i = 0; i < device_count; ++i) {
+ for (int j = 0; j < device_count; ++j) {
+ gpu::StreamExecutor* from = platform->ExecutorForDevice(i).ValueOrDie();
+ gpu::StreamExecutor* to = platform->ExecutorForDevice(j).ValueOrDie();
+
+ if (from->CanEnablePeerAccessTo(to)) {
+ auto status = from->EnablePeerAccessTo(to);
+ if (!status.ok()) {
+ return errors::Internal(status.ToString());
+ }
+ } else {
+ LOG(INFO) << "cannot enable peer access from device ordinal " << i
+ << " to device ordinal " << j;
+ }
+ }
+ }
+ return Status::OK();
+}
+
+static void InitGPU() {
+ auto result = gpu::MultiPlatformManager::PlatformWithName("CUDA");
+ if (!result.ok()) {
+ LOG(WARNING)
+ << "Not initializing the GPU, could not create GPU MachineManager. "
+ << "Error: " << result.status();
+ return;
+ }
+
+ gpu::Platform* platform = result.ValueOrDie();
+
+ int dev_count = platform->VisibleDeviceCount();
+
+ if (dev_count == 0) {
+ LOG(INFO) << "No GPU devices available on machine.";
+ return;
+ }
+
+ for (int i = 0; i < dev_count; ++i) {
+ auto stream_exec = platform->ExecutorForDevice(i).ValueOrDie();
+ int64 free_bytes;
+ int64 total_bytes;
+ if (!stream_exec->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
+ // Logs internally on failure.
+ free_bytes = 0;
+ total_bytes = 0;
+ }
+ const auto& description = stream_exec->GetDeviceDescription();
+ int cc_major;
+ int cc_minor;
+ if (!description.cuda_compute_capability(&cc_major, &cc_minor)) {
+ // Logs internally on failure.
+ cc_major = 0;
+ cc_minor = 0;
+ }
+ LOG(INFO) << "Found device " << i << " with properties: "
+ << "\nname: " << description.name() << "\nmajor: " << cc_major
+ << " minor: " << cc_minor << " memoryClockRate (GHz) "
+ << description.clock_rate_ghz() << "\npciBusID "
+ << description.pci_bus_id() << "\nTotal memory: "
+ << strings::HumanReadableNumBytes(total_bytes)
+ << "\nFree memory: "
+ << strings::HumanReadableNumBytes(free_bytes);
+ }
+
+ // Enable peer access
+
+ auto status = EnablePeerAccess(platform, dev_count);
+ if (!status.ok()) {
+ LOG(FATAL) << "could not enable peer access for GPU devices: " << status;
+ }
+
+ // Print out a matrix showing which devices can DMA to one
+ // another.
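+ // For a hypothetical 2-GPU machine with peer access in both directions,
+ // the resulting log lines look like (illustrative):
+ //   DMA: 0 1
+ //   0: Y Y
+ //   1: Y Y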
+ auto access_map = GetPeerAccessMap(platform, dev_count);
+ string line_buf = "DMA: ";
+ for (int i = 0; i < dev_count; ++i) {
+ strings::StrAppend(&line_buf, i, " ");
+ }
+ LOG(INFO) << line_buf;
+ for (int i = 0; i < dev_count; ++i) {
+ line_buf = strings::StrCat(i, ": ");
+ for (int j = 0; j < dev_count; ++j) {
+ if ((*access_map)[{i, j}]) {
+ line_buf.append("Y ");
+ } else {
+ line_buf.append("N ");
+ }
+ }
+ LOG(INFO) << line_buf;
+ }
+}
+
+static bool InitModule() {
+ InitGPU();
+ return true;
+}
+
+} // namespace
+
+gpu::Platform* GPUMachineManager() {
+ // Create the machine manager singleton and initialize the GPUs only
+ // once.
+ static bool init = InitModule();
+ CHECK(init); // Avoids compiler warning that init is unused.
+
+ auto result = gpu::MultiPlatformManager::PlatformWithName("CUDA");
+ if (!result.ok()) {
+ return nullptr;
+ }
+
+ return result.ValueOrDie();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_init.h b/tensorflow/core/common_runtime/gpu/gpu_init.h
new file mode 100644
index 0000000000..d126a8b1ca
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_init.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_INIT_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_INIT_H_
+
+namespace perftools {
+namespace gputools {
+class Platform;
+} // namespace gputools
+} // namespace perftools
+
+namespace tensorflow {
+
+// Returns the GPU machine manager singleton, creating it and
+// initializing the GPUs on the machine if needed the first time it is
+// called.
+perftools::gputools::Platform* GPUMachineManager();
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_INIT_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc b/tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc
new file mode 100644
index 0000000000..08ff55e221
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_region_allocator.cc
@@ -0,0 +1,371 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_region_allocator.h"
+
+//#include "base/commandlineflags.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/lib/core/bits.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE)
+DEFINE_bool(brain_gpu_region_allocator_heap_check_on_destruction, true,
+ "If true, the CUDA gpu manager checks that all allocated "
+ "memory through the GPU memory pool implementation has been "
+ "freed.");
+
+DEFINE_int64(brain_gpu_region_allocator_region_size, 0,
+ "If > 0, sets the default chunk-size allocatable from GPU memory. "
+ "Else defaults to entire GPU memory.");
+
+#else
+bool FLAGS_brain_gpu_region_allocator_heap_check_on_destruction = true;
+tensorflow::int64 FLAGS_brain_gpu_region_allocator_region_size = 0;
+#endif
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+GPURegionAllocator::GPURegionAllocator(int device_id, size_t total_bytes)
+ : device_id_(device_id), total_bytes_(total_bytes) {
+ // Get a pointer to the stream_executor for this device
+ stream_exec_ = GPUMachineManager()->ExecutorForDevice(device_id).ValueOrDie();
+
+ // Set the region size based on explicit user request, or based on
+ // total GPU capacity.
+ if (FLAGS_brain_gpu_region_allocator_region_size > 0) {
+ region_size_ = FLAGS_brain_gpu_region_allocator_region_size;
+ } else {
+ region_size_ = static_cast<size_t>(total_bytes_);
+ }
+
+ LOG(INFO) << "Setting region size to " << region_size_;
+}
+
+GPURegionAllocator::~GPURegionAllocator() {
+ if (FLAGS_brain_gpu_region_allocator_heap_check_on_destruction) {
+ CheckForMemoryLeaks();
+ }
+
+ gtl::STLDeleteValues(&chunk_map_);
+
+ for (auto r : regions_) {
+ gpu::DeviceMemoryBase gpu_ptr{r->ptr};
+ stream_exec_->Deallocate(&gpu_ptr);
+ delete r;
+ }
+}
+
+void* GPURegionAllocator::AllocateRaw(size_t alignment, size_t num_bytes) {
+ static const int64 kMaxMillisToWait = 10000; // 10 seconds
+ return retry_helper_.AllocateRaw(
+ [this](size_t a, size_t nb, bool v) {
+ return AllocateRawInternal(a, nb, v);
+ },
+ kMaxMillisToWait, alignment, num_bytes);
+}
+
+void* GPURegionAllocator::AllocateRawInternal(size_t alignment,
+ size_t num_bytes,
+ bool dump_log_on_failure) {
+ if (num_bytes == 0) {
+ LOG(ERROR) << "tried to allocate 0 bytes";
+ return nullptr;
+ }
+ size_t chunk_size = ChunkSize(num_bytes);
+
+ VLOG(2) << "chunk_size " << chunk_size << " from num_bytes "
+ << strings::HumanReadableNumBytes(num_bytes);
+ mutex_lock l(lock_);
+ Pool* pool = &pools_[chunk_size];
+ if (pool->num_free == 0) {
+ if (!ExpandPool(pool, chunk_size, num_bytes, dump_log_on_failure)) {
+ if (dump_log_on_failure) {
+ LOG(WARNING) << "Out of GPU memory, see memory state dump above";
+ }
+ return nullptr;
+ }
+ }
+ CHECK_LT(0, pool->num_free);
+ CHECK(pool->first);
+ CHECK(pool->last);
+ Chunk* c = pool->first;
+ CHECK(c);
+ CHECK(!c->in_use);
+
+ c->in_use = true;
+ // Move c to the back of the queue.
+ if (c->next != nullptr) {
+ pool->first = c->next;
+ pool->first->prev = nullptr;
+ c->next = nullptr;
+ }
+
+ if (pool->last != c) {
+ pool->last->next = c;
+ c->prev = pool->last;
+ pool->last = c;
+ }
+ pool->num_free--;
+ pool->cumulative_malloced++;
+
+ void* rv = c->ptr;
+ c->bytes_allocated = num_bytes;
+
+ VLOG(2) << "new ptr " << rv;
+ return rv;
+}
+
+void GPURegionAllocator::DeallocateRaw(void* ptr) {
+ retry_helper_.DeallocateRaw([this](void* p) { DeallocateRawInternal(p); },
+ ptr);
+}
+
+void GPURegionAllocator::DeallocateRawInternal(void* ptr) {
+ VLOG(2) << "DeallocateRaw: " << ptr;
+ if (ptr == nullptr) {
+ LOG(ERROR) << "tried to deallocate nullptr";
+ return;
+ }
+
+ mutex_lock l(lock_);
+ ChunkMap::const_iterator iter = chunk_map_.find(ptr);
+ CHECK(iter != chunk_map_.end());
+
+ Chunk* c = iter->second;
+ VLOG(2) << "chunk of size " << c->size << " at " << c;
+
+ Pool* pool = &(pools_[c->size]);
+ // Move chunk to head of queue, and mark free.
+ DCHECK(c->in_use);
+ c->in_use = false;
+ if (c->prev) c->prev->next = c->next;
+ if (c->next) c->next->prev = c->prev;
+ if (pool->first == c) pool->first = c->next;
+ if (pool->last == c) pool->last = c->prev;
+ c->next = pool->first;
+ c->prev = nullptr;
+ if (c->next) c->next->prev = c;
+ pool->first = c;
+ if (pool->last == nullptr) pool->last = c;
+ pool->num_free++;
+ pool->cumulative_freed++;
+}
+
+bool GPURegionAllocator::ExpandPool(Pool* pool, size_t chunk_size,
+ size_t requested_size,
+ bool dump_log_on_failure) {
+ VLOG(1) << "ExpandPool of " << chunk_size << " from " << pool->num_chunks
+ << " current members";
+ DCHECK_NE(0, chunk_size);
+ // If the pool is empty, seed it with enough chunks to fill 4KiB (or a
+ // single chunk if chunk_size > 4096). Otherwise aim to double the pool
+ // size, subject to the expansion cap below.
+ int num_chunks = pool->num_chunks;
+ if (num_chunks == 0) {
+ if (chunk_size > 4096) {
+ num_chunks = 1;
+ } else {
+ num_chunks = 4096 / chunk_size;
+ }
+ }
+ // For larger chunks, limit the amount of expansion.
+ size_t aggregate_size = num_chunks * chunk_size;
+ if (aggregate_size > (1 << 20)) {
+ num_chunks = static_cast<int>(
+ std::max(static_cast<size_t>(1), (1 << 20) / chunk_size));
+ }
+ while (num_chunks > 0) {
+ Region* r = (regions_.empty() ? nullptr : regions_.back());
+ if (r == nullptr ||
+ (((r->ptr + r->size) - r->next) < static_cast<int64>(chunk_size))) {
+ // Current region is not large enough to accommodate another chunk.
+ while (r == nullptr || (((r->ptr + r->size) - r->next) <
+ static_cast<int64>(chunk_size))) {
+ // Get another region.
+ size_t this_region_size = std::max(region_size_, chunk_size);
+
+ // Check if we would exceed our limit.
+ if (allocated_memory_ + this_region_size > total_bytes_) {
+ if (dump_log_on_failure) DumpMemoryLog();
+ return false;
+ }
+
+ // Perform the allocation, still checking that the allocator
+ // has not run out of memory.
+ gpu::DeviceMemory<char> gpu_mem =
+ stream_exec_->AllocateArray<char>(this_region_size);
+ if (gpu_mem == nullptr) {
+ if (dump_log_on_failure) DumpMemoryLog();
+ return false;
+ }
+
+ // We never release memory once expanded.
+ allocated_memory_ += this_region_size;
+
+ Region* nr = new Region;
+ nr->ptr = static_cast<char*>(gpu_mem.opaque());
+
+ if (VLOG_IS_ON(2)) {
+ int64 free_bytes;
+ int64 total_bytes;
+ if (stream_exec_->DeviceMemoryUsage(&free_bytes, &total_bytes)) {
+ VLOG(2) << "free " << free_bytes << " total " << total_bytes;
+ } else {
+ // Note: stream_exec call also logs internally on failure.
+ VLOG(2) << "could not retrieve memory usage";
+ }
+ }
+ VLOG(1) << "new Region of size " << this_region_size << " at "
+ << static_cast<void*>(nr->ptr) << " on device " << device_id_;
+ r = nr;
+ r->size = this_region_size;
+ r->next = r->ptr;
+ regions_.push_back(r);
+
+ for (auto visitor : region_visitors_) {
+ visitor(r->ptr, r->size);
+ }
+ }
+ } else {
+ // Allocate a new chunk and push on front of Pool.
+ Chunk* c = new Chunk;
+ c->ptr = r->next;
+ chunk_map_[c->ptr] = c;
+ c->size = chunk_size;
+ r->next += chunk_size;
+ c->next = pool->first;
+ if (c->next != nullptr) c->next->prev = c;
+ pool->first = c;
+ if (pool->last == nullptr) pool->last = c;
+ pool->num_chunks++;
+ pool->num_free++;
+ --num_chunks;
+ }
+ }
+
+ return true;
+}
+
+void GPURegionAllocator::CheckForMemoryLeaks() {
+ std::vector<string> errors;
+ mutex_lock l(lock_); // could use reader lock
+ for (auto pool_map : pools_) {
+ const Pool& p = pool_map.second;
+ Chunk* curr_chunk = p.first;
+ while (curr_chunk != nullptr) {
+ if (curr_chunk->in_use) {
+ errors.push_back(
+ strings::StrCat("Unfreed chunk of size ", curr_chunk->size));
+ }
+ curr_chunk = curr_chunk->next;
+ }
+ }
+ if (!errors.empty()) {
+ LOG(FATAL) << "GPU Memory leaks:\n" << str_util::Join(errors, "\n");
+ }
+}
+
+// Since there's no merging of chunks once allocated, we want to
+// maximize their reusability (which argues for fewer, larger sizes),
+// while minimizing waste (which argues for tight-fitting sizes).
+//
+// The smallest unit of allocation is 256 bytes.
+// NOTE(tucker): akrizhevsky says that NVIDIA's memory manager always
+// aligns to 256 bytes, and doing so results in significant speedup.
+//
+// Up to 2^16 bytes we only allocate in powers of 2.
+//
+// Above that, we pick a max-waste which is the largest power
+// of 2 <= 1/16 of the requested size, then round up to the nearest
+// multiple of max_waste.
+//
+// static
+size_t GPURegionAllocator::ChunkSize(size_t bytes) {
+ if (bytes <= 256) {
+ return 256;
+ } else if (bytes <= (1 << 16)) {
+ return 1uLL << Log2Ceiling64(bytes);
+ } else {
+ // 1/16th of requested size
+ size_t max_waste = 1uLL << (Log2Ceiling64(bytes) - 4);
+ return (bytes + max_waste) & (~(max_waste - 1));
+ }
+}
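+
+// Worked examples of the sizing rules above (illustrative):
+//   ChunkSize(100)    == 256     (minimum chunk size)
+//   ChunkSize(3000)   == 4096    (rounded up to the next power of 2)
+//   ChunkSize(100000) == 106496  (max_waste = 8192; rounded up to a
+//                                 multiple of 8192)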
+
+void GPURegionAllocator::AddAllocVisitor(Visitor visitor) {
+ VLOG(1) << "AddVisitor";
+ mutex_lock l(lock_);
+ region_visitors_.push_back(visitor);
+ for (auto region : regions_) {
+ visitor(region->ptr, region->size);
+ }
+}
+
+void GPURegionAllocator::DumpMemoryLog() {
+ size_t region_bytes = 0;
+ for (auto r : regions_) {
+ region_bytes += r->size;
+ }
+ size_t chunk_bytes = 0;
+ std::vector<size_t> chunk_sizes;
+ for (auto i : pools_) {
+ chunk_sizes.push_back(i.first);
+ }
+ std::sort(chunk_sizes.begin(), chunk_sizes.end());
+ for (auto i : chunk_sizes) {
+ int32 chunks_in_use = 0;
+ const Pool& p = pools_[i];
+ chunk_bytes += i * p.num_chunks;
+
+ if (p.num_chunks > 0) {
+ // Iterate backwards (allocated chunks are last).
+ Chunk* curr_chunk = p.last;
+ while (curr_chunk != nullptr) {
+ if (curr_chunk->in_use) {
+ ++chunks_in_use;
+ }
+ curr_chunk = curr_chunk->prev;
+ if (curr_chunk == p.first) {
+ break;
+ }
+ }
+ }
+
+ LOG(INFO) << "Chunk size: " << i << " ("
+ << strings::HumanReadableNumBytes(i) << ") Pool: " << p.ToString()
+ << "\nNumber of chunks: " << p.num_chunks
+ << ", in_use chunks: " << chunks_in_use;
+ }
+
+ LOG(INFO) << "Aggregate Region Memory: " << region_bytes << " ("
+ << strings::HumanReadableNumBytes(region_bytes) << ")";
+ LOG(INFO) << "Aggregate Chunk Memory: " << chunk_bytes << " ("
+ << strings::HumanReadableNumBytes(chunk_bytes) << ")";
+}
+
+bool GPURegionAllocator::TracksAllocationSizes() { return true; }
+
+size_t GPURegionAllocator::RequestedSize(void* ptr) {
+ mutex_lock l(lock_);
+ auto it = chunk_map_.find(ptr);
+ CHECK(it != chunk_map_.end())
+ << "Asked for requested size of pointer we never allocated: " << ptr;
+ auto c = it->second;
+ return c->bytes_allocated;
+}
+
+size_t GPURegionAllocator::AllocatedSize(void* ptr) {
+ mutex_lock l(lock_);
+ auto it = chunk_map_.find(ptr);
+ CHECK(it != chunk_map_.end())
+ << "Asked for allocated size of pointer we never allocated: " << ptr;
+ auto c = it->second;
+ return c->size;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_region_allocator.h b/tensorflow/core/common_runtime/gpu/gpu_region_allocator.h
new file mode 100644
index 0000000000..1a250b6ede
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_region_allocator.h
@@ -0,0 +1,146 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_REGION_ALLOCATOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_REGION_ALLOCATOR_H_
+
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_allocator_retry.h"
+#include "tensorflow/core/common_runtime/gpu/visitable_allocator.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+class GPURegionAllocator : public VisitableAllocator {
+ public:
+ // 'device_id' must be a valid device on the machine.
+ //
+ // total_bytes is how many bytes this allocator should allocate up
+ // to. This may be less than the total available.
+ explicit GPURegionAllocator(int device_id, size_t total_bytes);
+ ~GPURegionAllocator() override;
+
+ string Name() override { return "gpu_region"; }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+ void DeallocateRaw(void* ptr) override;
+ void AddAllocVisitor(Visitor visitor) override;
+ // Does nothing, because regions are never freed.
+ void AddFreeVisitor(Visitor visitor) override {}
+
+ bool TracksAllocationSizes() override;
+ size_t RequestedSize(void* ptr) override;
+ size_t AllocatedSize(void* ptr) override;
+
+ private:
+ // A Chunk is the header on a single piece of memory given back
+ // in response to an AllocateRaw() call.
+ struct Chunk {
+ char* ptr; // pointer to granted GPU buffer.
+ size_t size; // Full size of GPU buffer.
+ size_t bytes_allocated; // Bytes asked for by client.
+ bool in_use;
+ Chunk* prev; // Used for chaining in pool.
+ Chunk* next;
+ Chunk()
+ : ptr(nullptr),
+ size(0),
+ bytes_allocated(0),
+ in_use(false),
+ prev(nullptr),
+ next(nullptr) {}
+ };
+
+ // A Pool is a collection of same-sized Chunks.
+ struct Pool {
+ int num_chunks; // total chunks in this pool
+ int num_free; // total free chunks in this pool
+ int64 cumulative_malloced; // number of chunks malloced so far
+ int64 cumulative_freed; // number of chunks freed so far
+
+ // double-linked ring of chunks; all free chunks precede all
+ // granted chunks
+ Chunk* first;
+ Chunk* last;
+ Pool()
+ : num_chunks(0),
+ num_free(0),
+ cumulative_malloced(0),
+ cumulative_freed(0),
+ first(nullptr),
+ last(nullptr) {}
+
+ string ToString() const {
+ return strings::StrCat("chunks: ", num_chunks, " free: ", num_free,
+ " cumulative malloc: ", cumulative_malloced,
+ " cumulative freed: ", cumulative_freed);
+ }
+ };
+
+ // A Region is a single area of GPU memory that has been
+ // reserved by this class and carved up into Chunks.
+ struct Region {
+ char* ptr; // base GPU ptr
+ char* next; // frontier of unused part of region
+ size_t size;
+ Region() : ptr(nullptr), size(0) {}
+ };
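+
+ // Layout sketch (illustrative): a Region is carved into Chunks from the
+ // front, and `next` marks the frontier of unused space:
+ //
+ //   ptr                          next           ptr + size
+ //   |chunk|chunk| ... |chunk|    ...unused...   |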
+
+ // Calculate size of chunk for an allocation of this size.
+ // Min chunk size is 256 bytes, for alignment.
+ // For larger sizes, we round up somewhat so there are fewer
+ // size-specific pools.
+ static size_t ChunkSize(size_t bytes);
+
+ void* AllocateRawInternal(size_t alignment, size_t num_bytes,
+ bool dump_log_on_failure);
+ void DeallocateRawInternal(void* ptr);
+
+ bool ExpandPool(Pool* p, size_t chunk_size, size_t requested_size,
+ bool dump_log_on_failure) EXCLUSIVE_LOCKS_REQUIRED(lock_);
+
+ // Inspects region maps and crashes with debug information if there
+ // are any memory leaks as detected by the region allocator.
+ void CheckForMemoryLeaks() LOCKS_EXCLUDED(lock_);
+
+ void DumpMemoryLog() EXCLUSIVE_LOCKS_REQUIRED(lock_);
+
+ perftools::gputools::StreamExecutor* stream_exec_; // Not owned.
+
+ typedef std::unordered_map<size_t, Pool> PoolMap;
+ typedef std::unordered_map<void*, Chunk*> ChunkMap;
+
+ GPUAllocatorRetry retry_helper_;
+ mutable mutex lock_;
+ PoolMap pools_ GUARDED_BY(lock_);
+
+ // Owns regions.
+ std::vector<Region*> regions_ GUARDED_BY(lock_);
+
+ // Maps from GPU ptr to Chunk owning it.
+ //
+ // Owns chunks.
+ ChunkMap chunk_map_ GUARDED_BY(lock_);
+
+ // Called once on each existing region when the visitor is added, and on
+ // each new region as soon as it is allocated.
+ std::vector<Visitor> region_visitors_ GUARDED_BY(lock_);
+
+ const int device_id_;
+
+ // Total amount of memory (in bytes) available to this Allocator
+ const size_t total_bytes_;
+
+ // Total amount of memory allocated to regions.
+ size_t allocated_memory_ = 0;
+
+ size_t region_size_ = 0;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(GPURegionAllocator);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_REGION_ALLOCATOR_H_
diff --git a/tensorflow/core/common_runtime/gpu/gpu_region_allocator_test.cc b/tensorflow/core/common_runtime/gpu/gpu_region_allocator_test.cc
new file mode 100644
index 0000000000..07b0dd57f6
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_region_allocator_test.cc
@@ -0,0 +1,71 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/common_runtime/gpu/gpu_region_allocator.h"
+
+#include <algorithm>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include <gtest/gtest.h>
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+namespace {
+
+TEST(GPURegionAllocatorTest, Simple) {
+ GPURegionAllocator a(0, 1 << 26);
+ std::vector<void*> ptrs;
+ for (int s = 1; s < 1024; s++) {
+ void* raw = a.AllocateRaw(1, s);
+ ptrs.push_back(raw);
+ }
+ std::sort(ptrs.begin(), ptrs.end());
+ for (int i = 0; i < ptrs.size(); i++) {
+ if (i > 0) {
+ CHECK_NE(ptrs[i], ptrs[i - 1]); // No dups
+ }
+ a.DeallocateRaw(ptrs[i]);
+ }
+ float* t1 = a.Allocate<float>(1024);
+ double* t2 = a.Allocate<double>(1048576);
+ a.Deallocate(t1);
+ a.Deallocate(t2);
+}
+
+TEST(GPURegionAllocatorTest, CheckMemLeak) {
+ EXPECT_DEATH(
+ {
+ GPURegionAllocator a(0, 1 << 26);
+ float* t1 = a.Allocate<float>(1024);
+ if (t1) {
+ LOG(INFO) << "Not deallocating";
+ }
+ },
+ "");
+}
+
+TEST(GPURegionAllocatorTest, TracksSizes) {
+ GPURegionAllocator a(0, 1 << 26);
+ EXPECT_EQ(true, a.TracksAllocationSizes());
+}
+
+TEST(GPURegionAllocatorTest, AllocatedVsRequested) {
+ GPURegionAllocator a(0, 1 << 26);
+ float* t1 = a.Allocate<float>(1);
+ EXPECT_EQ(sizeof(float), a.RequestedSize(t1));
+
+ // Minimum allocation size is 256
+ EXPECT_EQ(256, a.AllocatedSize(t1));
+
+ a.Deallocate(t1);
+}
+
+} // namespace
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/gpu_stream_util.cc b/tensorflow/core/common_runtime/gpu/gpu_stream_util.cc
new file mode 100644
index 0000000000..ca86c7fa06
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_stream_util.cc
@@ -0,0 +1,97 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_stream_util.h"
+
+#include <set>
+#include <string>
+#include <unordered_set>
+#include <vector>
+
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+namespace gpu_stream_util {
+
+Status AssignStreams(const Graph* graph, const AssignStreamsOpts& opts,
+ std::unordered_map<int, int>* node_to_stream_id) {
+ VLOG(1) << "AssignStreams";
+ Status status;
+
+ // Sanity check arguments.
+ if (graph == nullptr)
+ status.Update(errors::InvalidArgument("Bad graph argument supplied."));
+ if (node_to_stream_id == nullptr) {
+ status.Update(
+ errors::InvalidArgument("Bad node_to_stream_id argument supplied."));
+ }
+ if ((opts.max_streams < 1) || (opts.send_stream >= opts.max_streams) ||
+ (opts.recv_stream >= opts.max_streams) ||
+ (opts.const_stream >= opts.max_streams) ||
+ (opts.compute_stream >= opts.max_streams)) {
+ status.Update(
+ errors::InvalidArgument("Bad AssignStreamsOpts argument supplied."));
+ }
+ TF_RETURN_IF_ERROR(status);
+
+ // Topologically sort the nodes.
+ std::vector<Node*> order;
+ GetReversePostOrder(*graph, &order);
+ if (VLOG_IS_ON(2)) {
+ for (Node* n : order) {
+ const int node_id = n->id();
+ VLOG(2) << "Node " << node_id << " " << n->type_string() << " "
+ << n->name() << " " << n->in_edges().size() << " inputs";
+ for (const Edge* e : n->in_edges()) {
+ VLOG(2) << " Edge from " << e->src()->id() << " " << e->src()->name()
+ << " fanout " << e->src()->out_edges().size();
+ }
+ }
+ }
+ // We perform stream assignment assuming a large number of
+ // stream IDs and then map these down to the required number of streams
+ // using simple round-robin.
+ // Stream Assignment strategy:
+ // 1. Nodes with zero inputs are always executed on a
+ // fresh stream.
+ // 2. Try to execute a node on the same stream as one of its
+ // inputs to avoid inter-stream dependencies.
+ // 3. If any input comes from a node with a large fanout, that is
+ // likely an indication that the tensor is shared between parallel
+ // streams of work. We choose a new stream here so that all consumers
+ // of the tensor are likely to run in parallel.
+ int highest_stream_id = -1;
+ for (Node* n : order) {
+ VLOG(3) << "Inspecting node " << n->DebugString();
+ const int node_id = n->id();
+ const string& op = n->type_string();
+
+ // Determine a suitable stream to use.
+ int stream_id = highest_stream_id + 1;
+ for (const Edge* e : n->in_edges()) {
+ const int fanout = e->src()->out_edges().size();
+ if (fanout == 1) {
+ stream_id = (*node_to_stream_id)[e->src()->id()];
+ break;
+ }
+ }
+ // Override stream for specific op types.
+ if (op == "_Send") {
+ if (opts.send_stream >= 0) stream_id = opts.send_stream;
+ } else if (op == "_Recv") {
+ if (opts.recv_stream >= 0) stream_id = opts.recv_stream;
+ } else if (op == "Const") {
+ if (opts.const_stream >= 0) stream_id = opts.const_stream;
+ } else {
+ if (opts.compute_stream >= 0) stream_id = opts.compute_stream;
+ }
+
+ (*node_to_stream_id)[node_id] = stream_id % opts.max_streams;
+ highest_stream_id = std::max(stream_id, highest_stream_id);
+ }
+ VLOG(1) << "Identified " << highest_stream_id << " candidate streams for "
+ << order.size() << " nodes.";
+
+ return Status::OK();
+}
+
+} // namespace gpu_stream_util
+} // namespace tensorflow
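AssignStreams above first picks a candidate stream ID per node and then folds the candidates into the available range with stream_id % opts.max_streams. A standalone sketch of that folding step with illustrative numbers:

#include <iostream>

int main() {
  // Suppose the heuristic produced candidate stream IDs 0..6 for seven nodes
  // but only three real streams are available (opts.max_streams == 3).
  const int max_streams = 3;
  for (int candidate = 0; candidate < 7; ++candidate) {
    // The same round-robin fold used at the end of the assignment loop.
    std::cout << "candidate " << candidate << " -> stream "
              << candidate % max_streams << "\n";
  }
  return 0;
}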
diff --git a/tensorflow/core/common_runtime/gpu/gpu_stream_util.h b/tensorflow/core/common_runtime/gpu/gpu_stream_util.h
new file mode 100644
index 0000000000..e1c623382c
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_stream_util.h
@@ -0,0 +1,30 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_STREAM_UTIL_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_STREAM_UTIL_H_
+
+#include <unordered_map>
+
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace gpu_stream_util {
+
+struct AssignStreamsOpts {
+ int32 max_streams = 1;
+ // The following options specify a stream to use for specific op
+ // types. The value -1 allows ops to be assigned to any stream.
+ int32 send_stream = -1;
+ int32 recv_stream = -1;
+ int32 const_stream = -1;
+ int32 compute_stream = -1;
+};
+
+ // Given the input graph, assigns each node in the graph a
+ // stream_id that should be used.
+Status AssignStreams(const Graph* graph, const AssignStreamsOpts& opts,
+ std::unordered_map<int, int>* node_to_stream_id);
+
+} // namespace gpu_stream_util
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_STREAM_UTIL_H_
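A hedged usage sketch of the API declared above; the Graph g is assumed to already exist and be populated, and the option values are illustrative only:

// Sketch only, not taken from this change.
std::unordered_map<int, int> node_to_stream_id;
tensorflow::gpu_stream_util::AssignStreamsOpts opts;
opts.max_streams = 4;   // fold candidate streams onto four real streams
opts.send_stream = 0;   // pin _Send ops to stream 0
opts.recv_stream = 1;   // pin _Recv ops to stream 1
tensorflow::Status s =
    tensorflow::gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id);
if (!s.ok()) {
  LOG(ERROR) << "Stream assignment failed: " << s;
}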
diff --git a/tensorflow/core/common_runtime/gpu/gpu_stream_util_test.cc b/tensorflow/core/common_runtime/gpu/gpu_stream_util_test.cc
new file mode 100644
index 0000000000..5c426caaef
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_stream_util_test.cc
@@ -0,0 +1,137 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_stream_util.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/cc/ops/array_ops.h"
+#include "tensorflow/cc/ops/sendrecv_ops.h"
+#include "tensorflow/cc/ops/standard_ops.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+class GpuStreamUtilTest : public OpsTestBase {
+ protected:
+ void SetUp() override { RequireDefaultOps(); }
+};
+
+TEST_F(GpuStreamUtilTest, BogusOpts) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+ std::unordered_map<int, int> node_to_stream_id;
+ gpu_stream_util::AssignStreamsOpts opts;
+ Status status;
+ status = gpu_stream_util::AssignStreams(nullptr, opts, &node_to_stream_id);
+ EXPECT_FALSE(status.ok());
+ status = gpu_stream_util::AssignStreams(&g, opts, nullptr);
+ EXPECT_FALSE(status.ok());
+ opts.max_streams = 0;
+ status = gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id);
+ EXPECT_FALSE(status.ok());
+ opts.max_streams = 1;
+ opts.compute_stream = 5;
+ status = gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id);
+ EXPECT_FALSE(status.ok());
+}
+
+TEST_F(GpuStreamUtilTest, EmptyGraph) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+ std::unordered_map<int, int> node_to_stream_id;
+ gpu_stream_util::AssignStreamsOpts opts;
+ ASSERT_OK(gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id));
+ EXPECT_EQ(2, node_to_stream_id.size()); // _SOURCE and _SINK
+}
+
+TEST_F(GpuStreamUtilTest, SimpleGraphOneStream) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::MatMul(ops::Const(Tensor(DT_FLOAT), b.opts()),
+ ops::Const(Tensor(DT_FLOAT), b.opts()), b.opts());
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+
+ std::unordered_map<int, int> node_to_stream_id;
+ gpu_stream_util::AssignStreamsOpts opts;
+ ASSERT_OK(gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id));
+
+ // There should be 5 nodes assigned.
+ EXPECT_EQ(5, node_to_stream_id.size());
+
+ // All of them should have stream 0.
+ for (const auto& it : node_to_stream_id) {
+ EXPECT_EQ(0, it.second);
+ }
+}
+
+TEST_F(GpuStreamUtilTest, SimpleGraphManyStreams) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::MatMul(ops::Const(Tensor(DT_FLOAT), b.opts()),
+ ops::Const(Tensor(DT_FLOAT), b.opts()), b.opts());
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+
+ std::unordered_map<int, int> node_to_stream_id;
+ gpu_stream_util::AssignStreamsOpts opts;
+ opts.max_streams = 3;
+ ASSERT_OK(gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id));
+
+ // There should be 5 nodes assigned.
+ EXPECT_EQ(5, node_to_stream_id.size());
+
+ // All of them should have a stream in the range [0..max_streams).
+ for (const auto& it : node_to_stream_id) {
+ EXPECT_GE(it.second, 0);
+ EXPECT_LT(it.second, opts.max_streams);
+ }
+}
+
+TEST_F(GpuStreamUtilTest, StreamOverrides) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::_Recv(DT_FLOAT, "input", "/cpu:0", 0, "/gpu:0",
+ b.opts().WithName("input"));
+ auto n = ops::MatMul(ops::Const(Tensor(DT_FLOAT), b.opts()),
+ ops::Const(Tensor(DT_FLOAT), b.opts()), b.opts());
+ ops::_Send(n, "output", "/gpu:0", 0, "/cpu:0", b.opts().WithName("output"));
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+
+ // Perform stream assignment using a large number of streams, but with
+ // op types constrained to specific streams.
+ std::unordered_map<int, int> node_to_stream_id;
+ gpu_stream_util::AssignStreamsOpts opts;
+ opts.max_streams = 100;
+ opts.const_stream = 90;
+ opts.send_stream = 91;
+ opts.recv_stream = 92;
+ opts.compute_stream = 93;
+ ASSERT_OK(gpu_stream_util::AssignStreams(&g, opts, &node_to_stream_id));
+
+ // There should be 7 nodes assigned.
+ EXPECT_EQ(7, node_to_stream_id.size()); // including _SOURCE and _SINK
+
+ // Nodes should be assigned to streams by op type.
+ for (const auto& it : node_to_stream_id) {
+ Node* n = g.FindNodeId(it.first);
+ const string op = n->type_string();
+ const int stream = it.second;
+ if (op == "Const") {
+ EXPECT_EQ(stream, 90);
+ } else if (op == "_Send") {
+ EXPECT_EQ(stream, 91);
+ } else if (op == "_Recv") {
+ EXPECT_EQ(stream, 92);
+ } else { // Compute.
+ EXPECT_EQ(stream, 93);
+ }
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/gpu_util.cc b/tensorflow/core/common_runtime/gpu/gpu_util.cc
new file mode 100644
index 0000000000..a6a3ce01fc
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_util.cc
@@ -0,0 +1,345 @@
+#include "tensorflow/core/common_runtime/gpu/gpu_util.h"
+
+//#include "base/commandlineflags.h"
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/tensor_coding.h"
+#include "tensorflow/core/platform/tracing.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/gpu/dma_helper.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_event_mgr.h"
+#include "tensorflow/core/common_runtime/gpu/process_state.h"
+#include "tensorflow/core/util/util.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+#include "tensorflow/core/platform/stream_executor_util.h"
+
+#if defined(PLATFORM_GOOGLE)
+DEFINE_int64(brain_gpu_util_debug_string_maxlen, 128,
+ "When dumping gpu memory, prints up to this many bytes.");
+
+DECLARE_bool(record_mem_types);
+#else
+tensorflow::int64 FLAGS_brain_gpu_util_debug_string_maxlen = 128;
+bool FLAGS_EXPERIMENTAL_brain_gpu_multi_stream = false;
+extern bool FLAGS_record_mem_types;
+#endif
+
+using perftools::gputools::DeviceMemoryBase;
+using perftools::gputools::DeviceMemory;
+using perftools::gputools::Stream;
+
+namespace tensorflow {
+
+namespace gpu = ::perftools::gputools;
+
+/*static*/
+void GPUUtil::SetProtoFromGPU(const Tensor& tensor, Device* dev,
+ const DeviceContext* device_context,
+ TensorProto* proto, bool is_dead,
+ StatusCallback done) {
+ VLOG(1) << "SetProtoFromGPU device_context " << device_context;
+ // Tensor values need to be copied from GPU to CPU ram so that
+ // we can build the protobuf response for a RecvTensor RPC.
+ // "device context" identifies the stream where the _Send op executed.
+ CHECK(device_context);
+ gpu::Stream* stream =
+ static_cast<const GPUDeviceContext*>(device_context)->stream();
+
+ if (!DMAHelper::CanUseDMA(&tensor)) {
+ done(errors::Internal(strings::StrCat(
+ "GPU copy from non-DMA ", DataTypeString(tensor.dtype()), "tensor")));
+ return;
+ }
+ proto->set_dtype(tensor.dtype());
+ tensor.shape().AsProto(proto->mutable_tensor_shape());
+ // Prepare a Cord with the right data buf size, and DMA the
+ // data over from the GPU buffer. Note that 0-size tensors
+ // do not have a backing buffer.
+ const size_t num_bytes = is_dead ? 0 : tensor.TotalBytes();
+ if (num_bytes > 0) {
+ port::Tracing::ScopedAnnotation annotation("SetProtoFromGPU");
+ Allocator* alloc = ProcessState::singleton()->GetCUDAHostAllocator(0);
+ char* mb = alloc->Allocate<char>(num_bytes);
+ const char* src_ptr =
+ reinterpret_cast<const char*>(DMAHelper::base(&tensor));
+ DeviceMemoryBase gpu_src_ptr(const_cast<char*>(src_ptr), num_bytes);
+ stream->ThenMemcpy(mb, gpu_src_ptr, num_bytes);
+ // Use of tensor may outlive stack scope, so keep a ref.
+ Tensor* tensor_ref = new Tensor(tensor);
+ dev->tensorflow_gpu_device_info()->event_mgr->ThenExecute(
+ stream, [stream, done, proto, mb, num_bytes, alloc, tensor_ref]() {
+ if (!stream->ok()) {
+ done(errors::Internal("SetProtoFromGPU: GPU Memcpy failed"));
+ // TODO(pbar) We currently have no way to recover the
+ // worker from a GPU stream in the error state. Until
+ // there is a way to reset the CUDA driver, it is
+ // preferable to crash the process and restart. Tracked
+ // under b/23717097
+ LOG(FATAL) << "SetProtoFromGPU: GPU Memcpy failed";
+ return;
+ }
+ delete tensor_ref;
+ port::CopyFromArray(proto->mutable_tensor_content(), mb, num_bytes);
+ alloc->Deallocate<char>(mb);
+ done(Status::OK());
+ });
+ } else {
+ done(Status::OK());
+ }
+}
+
+typedef ProcessState::MemDesc PMD;
+
+/*static*/
+void GPUUtil::CopyViaDMA(const string& edge_name,
+ DeviceContext* send_dev_context,
+ DeviceContext* recv_dev_context, Device* src,
+ Device* dst, AllocatorAttributes src_alloc_attr,
+ AllocatorAttributes dst_alloc_attr,
+ const Tensor* input, Tensor* output,
+ StatusCallback done) {
+ port::Tracing::ScopedAnnotation annotation(edge_name);
+ VLOG(1) << "CopyViaDMA " << edge_name;
+ size_t total_bytes = input->TotalBytes();
+ // Note that 0-size tensors have no backing buffer.
+ if (total_bytes > 0) {
+ const void* src_ptr = DMAHelper::base(input);
+ void* dst_ptr = DMAHelper::base(output);
+ VLOG(2) << "src_ptr " << src_ptr << " dst_ptr " << dst_ptr;
+ if (FLAGS_record_mem_types) {
+ ProcessState::MemDesc smd = ProcessState::singleton()->PtrType(src_ptr);
+ ProcessState::MemDesc dmd = ProcessState::singleton()->PtrType(dst_ptr);
+ VLOG(0) << "Src " << smd.DebugString() << " Dst " << dmd.DebugString();
+ if (smd.loc == PMD::CPU && dmd.loc == PMD::GPU && (!smd.gpu_registered)) {
+ LOG(WARNING) << "CPU -> GPU no reg for " << edge_name;
+ }
+ if (dmd.loc == PMD::CPU && smd.loc == PMD::GPU && (!dmd.gpu_registered)) {
+ LOG(WARNING) << "GPU -> CPU no reg for " << edge_name;
+ }
+ }
+
+ auto src_device_type = src->attributes().device_type();
+ auto dst_device_type = dst->attributes().device_type();
+
+ bool non_cpu_src = (!src_alloc_attr.on_host() &&
+ src_device_type != DeviceType(DEVICE_CPU).type());
+ bool non_cpu_dst = (!dst_alloc_attr.on_host() &&
+ dst_device_type != DeviceType(DEVICE_CPU).type());
+ if (non_cpu_src) {
+ gpu::Stream* stream = send_dev_context->stream();
+ if (stream == nullptr) {
+ done(errors::Internal("Failed to find device stream"));
+ return;
+ }
+ auto* src_dev_info = src->tensorflow_gpu_device_info();
+ CHECK(src_dev_info);
+
+ if (non_cpu_dst) {
+ // Device to device copy
+ DeviceMemoryBase gpu_dst_ptr(dst_ptr, total_bytes);
+ stream->ThenMemcpy(
+ &gpu_dst_ptr,
+ DeviceMemoryBase{const_cast<void*>(src_ptr), total_bytes},
+ total_bytes);
+ if (dst_device_type == DeviceType(DEVICE_GPU).type()) {
+ // Use of input may outlive stack scope, so keep a ref.
+ Tensor* input_ref = new Tensor(*input);
+ src_dev_info->event_mgr->ThenExecute(
+ stream, [done, stream, input_ref]() {
+ delete input_ref;
+ if (!stream->ok()) {
+ done(errors::Internal("GPU->GPU Memcpy failed"));
+ } else {
+ done(Status::OK());
+ }
+ });
+ }
+ send_dev_context->MaintainLifetimeOnStream(input, stream);
+ } else {
+ // Device to host copy.
+ return send_dev_context->CopyDeviceTensorToCPU(input, edge_name, src,
+ output, done);
+ }
+ } else if (non_cpu_dst) {
+ // Host to Device copy.
+ // Note that this is already an async copy.
+ recv_dev_context->CopyCPUTensorToDevice(input, dst, output, done);
+ } else {
+ memcpy(dst_ptr, src_ptr, total_bytes);
+ done(Status::OK());
+ }
+ } else {
+ // buffer is empty
+ done(Status::OK());
+ }
+}
+
+void GPUUtil::CopyGPUTensorToCPU(Device* gpu_device,
+ const DeviceContext* device_context,
+ const Tensor* gpu_tensor, Tensor* cpu_tensor,
+ StatusCallback done) {
+ VLOG(1) << "CopyGPUTensorToCPU";
+ size_t total_bytes = gpu_tensor->TotalBytes();
+ // Note that 0-size tensors have no backing buffer.
+ if (total_bytes > 0) {
+ const void* src_ptr = DMAHelper::base(gpu_tensor);
+ void* dst_ptr = DMAHelper::base(cpu_tensor);
+ CHECK(dst_ptr);
+ auto* stream = gpu_device->tensorflow_gpu_device_info()->stream;
+ if (device_context) {
+ stream = static_cast<const GPUDeviceContext*>(device_context)->stream();
+ }
+ stream->ThenMemcpy(
+ dst_ptr, DeviceMemoryBase{const_cast<void*>(src_ptr), total_bytes},
+ total_bytes);
+ stream->BlockHostUntilDone();
+ if (!stream->ok()) {
+ done(errors::Internal("CopyGPUTensorToCPU: GPU->CPU Memcpy failed"));
+ return;
+ }
+ }
+
+ done(Status::OK());
+}
+
+/* static */
+void GPUUtil::CopyCPUTensorToGPU(const Tensor* cpu_tensor,
+ const DeviceContext* device_context,
+ Device* gpu_device, Tensor* gpu_tensor,
+ StatusCallback done) {
+ VLOG(1) << "CopyCPUTensorToGPU";
+ CHECK(DeviceType(gpu_device->attributes().device_type()) ==
+ DeviceType(DEVICE_GPU));
+
+ auto* dev_info = gpu_device->tensorflow_gpu_device_info();
+ if (!dev_info) {
+ done(errors::Internal("Failed to find dest device GPUDeviceInfo"));
+ return;
+ }
+ if (cpu_tensor->TotalBytes() != gpu_tensor->TotalBytes()) {
+ done(errors::Internal(
+ strings::StrCat("Can't copy ", cpu_tensor->TotalBytes(),
+ " bytes of a tensor into another with ",
+ gpu_tensor->TotalBytes(), " byte buffer.")));
+ return;
+ }
+ const int64 total_bytes = cpu_tensor->TotalBytes();
+ // Note that 0-size tensors have no backing buffer.
+ if (total_bytes > 0) {
+ const void* src_ptr = DMAHelper::base(cpu_tensor);
+ void* dst_ptr = DMAHelper::base(gpu_tensor);
+ DeviceMemoryBase gpu_dst_ptr(dst_ptr, total_bytes);
+
+ CHECK(device_context);
+ auto* stream =
+ static_cast<const GPUDeviceContext*>(device_context)->stream();
+ stream->ThenMemcpy(&gpu_dst_ptr, src_ptr, total_bytes);
+ auto* dev_info = gpu_device->tensorflow_gpu_device_info();
+ // Use of cpu_tensor may outlive stack scope, so keep a ref.
+ Tensor* input_ref = new Tensor(*cpu_tensor);
+ dev_info->event_mgr->ThenExecute(stream, [stream, done, input_ref]() {
+ delete input_ref;
+ if (!stream->ok()) {
+ done(errors::Internal("CopyCPUTensorToGPU: GPU Memcpy failed"));
+ } else {
+ done(Status::OK());
+ }
+ });
+ } else {
+ // empty tensor case
+ done(Status::OK());
+ }
+}
+
+Status GPUUtil::Sync(Device* gpu_device) {
+ VLOG(1) << "GPUUtil::Sync";
+ auto* dev_info = gpu_device->tensorflow_gpu_device_info();
+ if (!dev_info) {
+ return errors::Internal("Failed to find dest device GPUDeviceInfo");
+ }
+ dev_info->stream->BlockHostUntilDone();
+ if (!dev_info->stream->ok()) {
+ LOG(FATAL) << "GPU sync failed";
+ }
+ return Status::OK();
+}
+
+Status GPUUtil::SyncAll(Device* gpu_device) {
+ VLOG(1) << "GPUUtil::SyncAll";
+ auto* dev_info = gpu_device->tensorflow_gpu_device_info();
+ if (!dev_info) {
+ return errors::Internal("Failed to find dest device GPUDeviceInfo");
+ }
+ if (!dev_info->stream->parent()->SynchronizeAllActivity() ||
+ !dev_info->stream->ok()) {
+ LOG(FATAL) << "GPU sync failed";
+ }
+ return Status::OK();
+}
+
+string GPUUtil::MemoryDebugString(const Device* device, Tensor* tensor) {
+ string ret;
+ CHECK(tensor);
+ const int64 num_bytes = std::min<int64>(
+ FLAGS_brain_gpu_util_debug_string_maxlen, tensor->TotalBytes());
+ void* ptr = (num_bytes > 0) ? DMAHelper::base(tensor) : nullptr;
+ strings::Appendf(&ret, "%p:", ptr);
+ if (num_bytes > 0) {
+ auto* dev_info = device->tensorflow_gpu_device_info();
+ if (!dev_info) {
+ strings::StrAppend(
+ &ret, PrintMemory(reinterpret_cast<const char*>(ptr), num_bytes));
+ } else {
+ string buf;
+ buf.resize(num_bytes);
+ DeviceMemoryBase gpu_ptr(ptr, num_bytes);
+ Status s = dev_info->stream->parent()->SynchronousMemcpyD2H(
+ gpu_ptr, num_bytes, gtl::string_as_array(&buf));
+ strings::StrAppend(&ret,
+ PrintMemory(gtl::string_as_array(&buf), num_bytes));
+ }
+ }
+ return ret;
+}
+
+// TODO(pbar) Checksum is called from places without a valid device context.
+uint64 GPUUtil::Checksum(Device* gpu_device,
+ const DeviceContext* device_context,
+ const Tensor& tensor) {
+ Tensor copy(tensor.dtype(), tensor.shape());
+ Status s;
+ Notification n;
+ CopyGPUTensorToCPU(gpu_device, device_context, &tensor, &copy,
+ [&s, &n](Status status) {
+ s.Update(status);
+ n.Notify();
+ });
+ n.WaitForNotification();
+ CHECK(s.ok()) << s;
+ return Checksum(copy);
+}
+
+uint64 GPUUtil::Checksum(const Tensor& tensor) {
+ const float* fptr = reinterpret_cast<const float*>(DMAHelper::base(&tensor));
+ size_t num_bytes = tensor.TotalBytes();
+ size_t num_floats = num_bytes / sizeof(float);
+ for (size_t i = 0; i < num_floats; ++i) {
+ CHECK(!std::isnan(fptr[i])) << " i " << i;
+ }
+ // TODO(tucker): consider using crc32c instead.
+ return Hash64(reinterpret_cast<const char*>(DMAHelper::base(&tensor)),
+ tensor.TotalBytes(), 0);
+}
+
+} // namespace tensorflow
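The copy helpers in this file report completion through a StatusCallback. When a caller needs to block, the Notification pattern used by Checksum above can be reused; a sketch, assuming a valid gpu_device, device_context, and appropriately sized tensors:

// Sketch only: synchronously wait on the asynchronous GPU->CPU copy.
Status copy_status;
Notification copy_done;
GPUUtil::CopyGPUTensorToCPU(gpu_device, device_context, &gpu_tensor,
                            &cpu_tensor, [&copy_status, &copy_done](Status s) {
                              copy_status.Update(s);
                              copy_done.Notify();
                            });
copy_done.WaitForNotification();
if (!copy_status.ok()) {
  LOG(ERROR) << "GPU->CPU copy failed: " << copy_status;
}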
diff --git a/tensorflow/core/common_runtime/gpu/gpu_util.h b/tensorflow/core/common_runtime/gpu/gpu_util.h
new file mode 100644
index 0000000000..1d8c3a054d
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_util.h
@@ -0,0 +1,89 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_GPU_UTIL_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_GPU_UTIL_H_
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/common_runtime/gpu/dma_helper.h"
+#include "tensorflow/stream_executor/device_memory.h"
+
+#include "tensorflow/stream_executor/stream.h"
+
+namespace tensorflow {
+
+class RecvTensorResponse;
+class TensorProto;
+
+namespace gpu = ::perftools::gputools;
+
+class GPUUtil {
+ public:
+ // "tensor" is GPU-local. "dev" is the hosting GPU.
+ // "device_context" should be the context of the GPU "_Send" op
+ // which provides the Tensor.
+ // Sets all necessary fields of "proto" by transferring value
+ // bytes from GPU to CPU RAM. "is_dead" indicates that the
+ // tensor is dead, with an uninitialized value.
+ static void SetProtoFromGPU(const Tensor& tensor, Device* dev,
+ const DeviceContext* device_context,
+ TensorProto* proto, bool is_dead,
+ StatusCallback done);
+
+ // Copies "input" to "output" between devices accessible to the
+ // local process via some DMA-like method. "edge_name" is the name
+ // of the tensor being copied, for debugging purposes. Depending on
+ // the type of devices and memory in use, the copy may be performed
+ // synchronously or asynchronously. 'done' will be invoked only
+ // after the copy is actually complete.
+ static void CopyViaDMA(const string& edge_name,
+ DeviceContext* send_dev_context,
+ DeviceContext* recv_dev_context, Device* src,
+ Device* dst, const AllocatorAttributes src_alloc_attr,
+ const AllocatorAttributes dst_alloc_attr,
+ const Tensor* input, Tensor* output,
+ StatusCallback done);
+
+ // Copies the data in 'gpu_tensor' into 'cpu_tensor'.
+ // 'gpu_tensor''s backing memory must be on 'gpu_device' and
+ // 'cpu_tensor' must be allocated to be of the same size as
+ // 'gpu_tensor'. Synchronous: may block.
+ static void CopyGPUTensorToCPU(Device* gpu_device,
+ const DeviceContext* device_context,
+ const Tensor* gpu_tensor, Tensor* cpu_tensor,
+ StatusCallback done);
+
+ // Blocks until all operations queued on the stream associated with
+ // "gpu_device" at the time of the call have completed. Returns any
+ // error pending on the stream at completion.
+ static Status Sync(Device* gpu_device);
+
+ // Blocks until all operations queued on all streams associated with the
+ // corresponding GPU device at the time of call have completed.
+ // Returns any error pending on the stream at completion.
+ static Status SyncAll(Device* gpu_device);
+
+ // For debugging purposes, given a "device" and a "tensor" allocated
+ // on the device, return a string printing each byte in the tensor
+ // (up to a limit). "device" can be either a CPU or a GPU device.
+ static string MemoryDebugString(const Device* device, Tensor* tensor);
+
+ static perftools::gputools::DeviceMemory<float> AsGPUFloat(const Tensor& t);
+
+ // Computes a checksum over the contents of "tensor", which is allocated
+ // on "gpu_device".
+ static uint64 Checksum(Device* gpu_device,
+ const DeviceContext* device_context,
+ const Tensor& tensor);
+
+ // Computes a checksum over the contents of "tensor", which is allocated
+ // in local CPU RAM.
+ static uint64 Checksum(const Tensor& tensor);
+
+ static void CopyCPUTensorToGPU(const Tensor* cpu_tensor,
+ const DeviceContext* device_context,
+ Device* gpu_device, Tensor* gpu_tensor,
+ StatusCallback done);
+};
+
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_GPU_UTIL_H_
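A small hedged sketch combining the debugging helpers declared above; gpu_device, device_context, and tensor are assumed to exist and live on the GPU:

// Sketch only: verify that a kernel left `tensor` unmodified by comparing
// checksums before and after, dumping the bytes on mismatch.
uint64 before = GPUUtil::Checksum(gpu_device, device_context, tensor);
// ... launch work that should not touch `tensor` ...
uint64 after = GPUUtil::Checksum(gpu_device, device_context, tensor);
CHECK_EQ(before, after) << GPUUtil::MemoryDebugString(gpu_device, &tensor);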
diff --git a/tensorflow/core/common_runtime/gpu/gpu_util_platform_specific.cc b/tensorflow/core/common_runtime/gpu/gpu_util_platform_specific.cc
new file mode 100644
index 0000000000..f1b1174a28
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/gpu_util_platform_specific.cc
@@ -0,0 +1,24 @@
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_util.h"
+#include "tensorflow/stream_executor/stream.h"
+
+namespace tensorflow {
+
+void GPUDeviceContext::CopyCPUTensorToDevice(const Tensor* cpu_tensor,
+ Device* device,
+ Tensor* device_tensor,
+ StatusCallback done) const {
+ GPUUtil::CopyCPUTensorToGPU(cpu_tensor, this, device, device_tensor, done);
+}
+
+void GPUDeviceContext::CopyDeviceTensorToCPU(const Tensor* device_tensor,
+ const string& tensor_name,
+ Device* device, Tensor* cpu_tensor,
+ StatusCallback done) {
+ GPUUtil::CopyGPUTensorToCPU(device, this, device_tensor, cpu_tensor, done);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/pool_allocator.cc b/tensorflow/core/common_runtime/gpu/pool_allocator.cc
new file mode 100644
index 0000000000..52deb7fce2
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/pool_allocator.cc
@@ -0,0 +1,269 @@
+#include "tensorflow/core/common_runtime/gpu/pool_allocator.h"
+
+#include <errno.h>
+#include <strings.h>
+#include <sys/mman.h> // for munmap
+
+#include <map>
+
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+//#include "prodkernel/api/base/numa.h"
+
+namespace tensorflow {
+
+PoolAllocator::PoolAllocator(size_t pool_size_limit, bool auto_resize,
+ SubAllocator* allocator,
+ RoundUpInterface* size_rounder, string name)
+ : name_(name),
+ has_size_limit_(pool_size_limit > 0),
+ auto_resize_(auto_resize),
+ pool_size_limit_(pool_size_limit),
+ allocator_(allocator),
+ size_rounder_(size_rounder),
+ allocation_begun_(false) {
+ if (auto_resize) {
+ CHECK_LT(0, pool_size_limit)
+ << "size limit must be > 0 if auto_resize is true.";
+ }
+}
+
+PoolAllocator::~PoolAllocator() { Clear(); }
+
+namespace {
+ // Pools contain Chunks allocated from the underlying Allocator.
+ // Chunk alignment is always on kPoolAlignment boundaries. Each Chunk
+ // begins with a descriptor (ChunkPrefix) that gives its size and a
+ // pointer to itself. The pointer returned to the user is just past
+ // the ChunkPrefix. If the user asks for a larger alignment, we will
+ // increase the size of the chunk, then adjust the returned user
+ // pointer and also re-write the ChunkPrefix.chunk_ptr value
+ // immediately before it. This way the Chunk address and size can be
+ // recovered from the returned user pointer, regardless of alignment.
+ // Note that this dereferencing of the pointers means that we cannot
+ // handle GPU memory, only CPU memory.
+struct ChunkPrefix {
+ size_t num_bytes;
+ void* chunk_ptr;
+};
+// kPoolAlignment cannot be less than the size of ChunkPrefix.
+static const int kPoolAlignment = sizeof(ChunkPrefix);
+
+void* PrepareChunk(void* chunk, size_t alignment, size_t num_bytes) {
+ ChunkPrefix* cp = reinterpret_cast<ChunkPrefix*>(chunk);
+ cp->num_bytes = num_bytes;
+ cp->chunk_ptr = chunk;
+ void* user_ptr = reinterpret_cast<void*>(cp + 1);
+ if (alignment > kPoolAlignment) {
+ // Move user_ptr forward to the first satisfying offset, and write
+ // chunk_ptr just before it.
+ size_t aligned_ptr = reinterpret_cast<size_t>(user_ptr) + alignment;
+ user_ptr = reinterpret_cast<void*>(aligned_ptr & ~(alignment - 1));
+ (reinterpret_cast<ChunkPrefix*>(user_ptr) - 1)->chunk_ptr = chunk;
+ }
+ // Safety check that user_ptr is always past the ChunkPrefix.
+ CHECK_GE(user_ptr, reinterpret_cast<ChunkPrefix*>(chunk) + 1);
+ return user_ptr;
+}
+
+ChunkPrefix* FindPrefix(void* user_ptr) {
+ ChunkPrefix* cp = reinterpret_cast<ChunkPrefix*>(user_ptr) - 1;
+ return reinterpret_cast<ChunkPrefix*>(cp->chunk_ptr);
+}
+} // namespace
+
+void* PoolAllocator::AllocateRaw(size_t alignment, size_t num_bytes) {
+ if (!allocation_begun_) allocation_begun_ = true;
+ if (num_bytes == 0) return nullptr;
+
+ // If alignment is larger than kPoolAlignment, increase num_bytes so that we
+ // are guaranteed to be able to return an aligned ptr by advancing user_ptr
+ // without overrunning the end of the chunk.
+ if (alignment > kPoolAlignment) {
+ num_bytes += alignment;
+ }
+ num_bytes += sizeof(ChunkPrefix);
+ num_bytes = size_rounder_->RoundUp(num_bytes);
+ PtrRecord* pr = nullptr;
+ if (has_size_limit_) {
+ {
+ mutex_lock lock(mutex_);
+ auto iter = pool_.find(num_bytes);
+ if (iter == pool_.end()) {
+ allocated_count_++;
+ // Deliberately fall out of lock scope before
+ // calling the allocator. No further modification
+ // to the pool will be performed.
+ } else {
+ get_from_pool_count_++;
+ pr = iter->second;
+ RemoveFromList(pr);
+ pool_.erase(iter);
+ // Fall out of lock scope and return the pooled chunk without the lock held.
+ }
+ }
+ }
+ if (pr != nullptr) {
+ void* r = pr->ptr;
+ delete pr;
+ return PrepareChunk(r, alignment, num_bytes);
+ } else {
+ void* ptr = allocator_->Alloc(kPoolAlignment, num_bytes);
+ for (auto v : alloc_visitors_) {
+ v(ptr, num_bytes);
+ }
+ return PrepareChunk(ptr, alignment, num_bytes);
+ }
+}
+
+void PoolAllocator::DeallocateRaw(void* ptr) {
+ if (ptr == nullptr) return;
+ ChunkPrefix* cp = FindPrefix(ptr);
+ CHECK_LE((void*)cp, (void*)ptr);
+ if (!has_size_limit_ && !auto_resize_) {
+ for (auto v : free_visitors_) {
+ v(cp, cp->num_bytes);
+ }
+ allocator_->Free(cp, cp->num_bytes);
+ } else {
+ mutex_lock lock(mutex_);
+ ++put_count_;
+ while (pool_.size() >= pool_size_limit_) {
+ EvictOne();
+ }
+ PtrRecord* pr = new PtrRecord;
+ pr->num_bytes = cp->num_bytes;
+ pr->ptr = cp;
+ AddToList(pr);
+ pool_.insert(std::make_pair(cp->num_bytes, pr));
+ }
+}
+
+void PoolAllocator::Clear() {
+ if (has_size_limit_) {
+ mutex_lock lock(mutex_);
+ for (auto iter : pool_) {
+ PtrRecord* pr = iter.second;
+ for (auto v : free_visitors_) {
+ v(pr->ptr, pr->num_bytes);
+ }
+ allocator_->Free(pr->ptr, pr->num_bytes);
+ delete pr;
+ }
+ pool_.clear();
+ get_from_pool_count_ = 0;
+ put_count_ = 0;
+ allocated_count_ = 0;
+ evicted_count_ = 0;
+ lru_head_ = nullptr;
+ lru_tail_ = nullptr;
+ }
+}
+
+void PoolAllocator::RemoveFromList(PtrRecord* pr) {
+ if (pr->prev == nullptr) {
+ DCHECK_EQ(lru_head_, pr);
+ lru_head_ = nullptr;
+ } else {
+ pr->prev->next = pr->next;
+ }
+ if (pr->next == nullptr) {
+ DCHECK_EQ(lru_tail_, pr);
+ lru_tail_ = pr->prev;
+ } else {
+ pr->next->prev = pr->prev;
+ if (lru_head_ == nullptr) {
+ lru_head_ = pr->next;
+ }
+ }
+}
+
+void PoolAllocator::AddToList(PtrRecord* pr) {
+ pr->prev = nullptr;
+ if (lru_head_ == nullptr) {
+ CHECK(lru_tail_ == nullptr);
+ lru_tail_ = pr;
+ pr->next = nullptr;
+ } else {
+ pr->next = lru_head_;
+ pr->next->prev = pr;
+ }
+ lru_head_ = pr;
+}
+
+void PoolAllocator::EvictOne() {
+ DCHECK(lru_tail_ != nullptr);
+ PtrRecord* prec = lru_tail_;
+ RemoveFromList(prec);
+ auto iter = pool_.find(prec->num_bytes);
+ while (iter->second != prec) {
+ ++iter;
+ DCHECK(iter != pool_.end());
+ }
+ pool_.erase(iter);
+ for (auto v : free_visitors_) {
+ v(prec->ptr, prec->num_bytes);
+ }
+ allocator_->Free(prec->ptr, prec->num_bytes);
+ delete prec;
+ ++evicted_count_;
+ // Auto-resizing, and warning messages.
+ static const double kTolerable = 2e-3;
+ static const int kCheckInterval = 1000;
+ static const double kIncreaseFactor = 1.1;
+ static const int kMinPoolSize = 100;
+ if (0 == evicted_count_ % kCheckInterval) {
+ const double eviction_rate =
+ evicted_count_ / static_cast<double>(put_count_);
+ const int64 alloc_request_count = allocated_count_ + get_from_pool_count_;
+ const double alloc_rate =
+ allocated_count_ / static_cast<double>(alloc_request_count);
+ static int log_counter = 0;
+ // (counter increment not thread safe but it's just for logging, so we
+ // don't care).
+ bool should_log = ((log_counter++ % 10) == 0);
+ if (should_log) {
+ LOG(WARNING) << "PoolAllocator: After " << alloc_request_count
+ << " get requests, put_count=" << put_count_
+ << " evicted_count=" << evicted_count_
+ << " eviction_rate=" << eviction_rate
+ << " and unsatisfied allocation rate=" << alloc_rate;
+ }
+ if (auto_resize_ && (eviction_rate > kTolerable) &&
+ (alloc_rate > kTolerable)) {
+ size_t new_size_limit = (pool_size_limit_ < kMinPoolSize)
+ ? kMinPoolSize
+ : (kIncreaseFactor * pool_size_limit_);
+ if (should_log) {
+ LOG(INFO) << "Raising pool_size_limit_ from " << pool_size_limit_
+ << " to " << new_size_limit;
+ }
+ pool_size_limit_ = new_size_limit;
+ // Reset all the counters so that ratios are relative to new sizes
+ // at next test interval.
+ put_count_ = 0;
+ allocated_count_ = 0;
+ evicted_count_ = 0;
+ get_from_pool_count_ = 0;
+ }
+ }
+}
+
+void PoolAllocator::AddAllocVisitor(Visitor visitor) {
+ mutex_lock lock(mutex_);
+ CHECK(!allocation_begun_)
+ << "AddAllocVisitor may not be called after pool allocation "
+ << "has begun.";
+ alloc_visitors_.push_back(visitor);
+}
+
+void PoolAllocator::AddFreeVisitor(Visitor visitor) {
+ mutex_lock lock(mutex_);
+ CHECK(!allocation_begun_)
+ << "AddFreeVisitor may not be called after pool allocation "
+ << "has begun.";
+ free_visitors_.push_back(visitor);
+}
+
+} // namespace tensorflow
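The ChunkPrefix scheme above recovers a chunk's base address from the user pointer by reading the prefix stored immediately before it. A standalone sketch of that pointer arithmetic on plain heap memory (constants and names here are illustrative, not the actual PoolAllocator internals):

#include <cassert>
#include <cstddef>
#include <cstdlib>

// Mirrors the ChunkPrefix idea: the record just before the user pointer
// remembers where the underlying chunk really starts.
struct Prefix {
  size_t num_bytes;
  void* chunk_ptr;
};

int main() {
  const size_t alignment = 64;
  const size_t user_bytes = 100;
  // Over-allocate so an aligned user pointer always fits after the prefix.
  char* chunk =
      static_cast<char*>(malloc(sizeof(Prefix) + alignment + user_bytes));
  Prefix* p = reinterpret_cast<Prefix*>(chunk);
  p->num_bytes = user_bytes;
  p->chunk_ptr = chunk;
  // Advance past the prefix, then round up to the requested alignment.
  size_t raw = reinterpret_cast<size_t>(chunk + sizeof(Prefix)) + alignment;
  char* user = reinterpret_cast<char*>(raw & ~(alignment - 1));
  // Re-write the prefix immediately before the (possibly moved) user pointer.
  (reinterpret_cast<Prefix*>(user) - 1)->chunk_ptr = chunk;
  // Deallocation path: recover the chunk base from the user pointer alone.
  assert((reinterpret_cast<Prefix*>(user) - 1)->chunk_ptr == chunk);
  free(chunk);
  return 0;
}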
diff --git a/tensorflow/core/common_runtime/gpu/pool_allocator.h b/tensorflow/core/common_runtime/gpu/pool_allocator.h
new file mode 100644
index 0000000000..d10aabe88a
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/pool_allocator.h
@@ -0,0 +1,202 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_POOL_ALLOCATOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_POOL_ALLOCATOR_H_
+
+// Simple LRU pool allocators for various flavors of CPU RAM that
+// implement the VisitableAllocator interface. GPU memory is managed
+// by GPURegionAllocator.
+
+#include <atomic>
+#include <map>
+#include <memory>
+#include "tensorflow/core/lib/core/bits.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/common_runtime/gpu/visitable_allocator.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+namespace tensorflow {
+
+// Interface of an object that does the underlying alloc/free of memory.
+class SubAllocator {
+ public:
+ virtual ~SubAllocator() {}
+ virtual void* Alloc(size_t alignment, size_t num_bytes) = 0;
+ virtual void Free(void* ptr, size_t num_bytes) = 0;
+};
+
+// Interface of an object that rounds up integers.
+class RoundUpInterface {
+ public:
+ virtual ~RoundUpInterface() {}
+ virtual size_t RoundUp(size_t num_bytes) = 0;
+};
+
+// Size-limited pool of memory buffers obtained from a SubAllocator
+// instance. Pool eviction policy is LRU.
+class PoolAllocator : public VisitableAllocator {
+ public:
+ // "pool_size_limit" is the maximum number of returned, re-usable
+ // memory buffers to keep in the pool. If pool_size_limit == 0, the
+ // pool is effectively a thin wrapper around the allocator.
+ // If "auto_resize" is true, then the pool_size_limit will gradually
+ // be raised so that deallocations happen very rarely, if at all.
+ // Transitory start-up objects may deallocate, but the long-term
+ // working-set should not. Auto-resizing can raise pool_size_limit
+ // but will never lower it.
+ // "allocator" is the object that performs the underlying memory
+ // malloc/free operations. This object takes ownership of allocator.
+ PoolAllocator(size_t pool_size_limit, bool auto_resize,
+ SubAllocator* allocator, RoundUpInterface* size_rounder,
+ string name);
+ ~PoolAllocator() override;
+
+ string Name() override { return name_; }
+
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+
+ void DeallocateRaw(void* ptr) override;
+
+ // REQUIRES: The following functions may only be called prior
+ // to the first Allocate*() call. Once allocation has begun, it is
+ // illegal to register another visitor.
+
+ void AddAllocVisitor(Visitor visitor) override;
+
+ void AddFreeVisitor(Visitor visitor) override;
+
+ // Allocate an unused memory region of size "num_bytes". Fetch from
+ // the pool if available, otherwise call allocator_.
+ void* Get(size_t num_bytes);
+
+ // Return a no-longer needed memory region to the pool. It is an error
+ // to dereference "ptr" after this call. If the pool is full, the least
+ // recently used region will be deallocated.
+ void Put(void* ptr, size_t num_bytes);
+
+ // Reset the pool to empty.
+ void Clear();
+
+ // The following accessors permit monitoring the effectiveness of
+ // the pool at avoiding repeated malloc/frees on the underlying
+ // allocator. Read locks are not taken on the theory that value
+ // consistency with other threads is not important.
+
+ // Number of Get() requests satisfied from pool.
+ int64 get_from_pool_count() const NO_THREAD_SAFETY_ANALYSIS {
+ return get_from_pool_count_;
+ }
+ // Number of Put() requests.
+ int64 put_count() const NO_THREAD_SAFETY_ANALYSIS { return put_count_; }
+ // Number of Get() requests requiring a fresh allocation.
+ int64 allocated_count() const NO_THREAD_SAFETY_ANALYSIS {
+ return allocated_count_;
+ }
+ // Number of pool evictions.
+ int64 evicted_count() const NO_THREAD_SAFETY_ANALYSIS {
+ return evicted_count_;
+ }
+ // Current size limit.
+ size_t size_limit() const NO_THREAD_SAFETY_ANALYSIS {
+ return pool_size_limit_;
+ }
+
+ private:
+ struct PtrRecord {
+ void* ptr;
+ size_t num_bytes;
+ PtrRecord* prev;
+ PtrRecord* next;
+ };
+
+ // Remove "pr" from the double-linked LRU list.
+ void RemoveFromList(PtrRecord* pr) EXCLUSIVE_LOCKS_REQUIRED(mutex_);
+
+ // Add "pr" to the head of the double-linked LRU list.
+ void AddToList(PtrRecord* pr) EXCLUSIVE_LOCKS_REQUIRED(mutex_);
+
+ // Delete the least recently used record.
+ void EvictOne() EXCLUSIVE_LOCKS_REQUIRED(mutex_);
+
+ const string name_;
+ const bool has_size_limit_;
+ const bool auto_resize_;
+ size_t pool_size_limit_;
+ std::unique_ptr<SubAllocator> allocator_;
+ std::unique_ptr<RoundUpInterface> size_rounder_;
+ mutex mutex_;
+ std::multimap<const size_t, PtrRecord*> pool_ GUARDED_BY(mutex_);
+ PtrRecord* lru_head_ GUARDED_BY(mutex_) = nullptr;
+ PtrRecord* lru_tail_ GUARDED_BY(mutex_) = nullptr;
+ int64 get_from_pool_count_ GUARDED_BY(mutex_) = 0;
+ int64 put_count_ GUARDED_BY(mutex_) = 0;
+ int64 allocated_count_ GUARDED_BY(mutex_) = 0;
+ int64 evicted_count_ GUARDED_BY(mutex_) = 0;
+ // Write access to these is guarded by mutex_, but not read
+ // access. They may only be modified prior to the first
+ // allocation. Later attempts to modify will fail.
+ std::vector<Visitor> alloc_visitors_;
+ std::vector<Visitor> free_visitors_;
+ std::atomic<bool> allocation_begun_;
+};
+
+// Do-nothing rounder. Passes through sizes unchanged.
+class NoopRounder : public RoundUpInterface {
+ public:
+ size_t RoundUp(size_t num_bytes) override { return num_bytes; }
+};
+
+// Power of 2 rounder: rounds up to nearest power of 2 size.
+class Pow2Rounder : public RoundUpInterface {
+ public:
+ size_t RoundUp(size_t num_bytes) override {
+ return 1uLL << Log2Ceiling64(num_bytes);
+ }
+};
+
+class BasicCPUAllocator : public SubAllocator {
+ public:
+ ~BasicCPUAllocator() override {}
+
+ void* Alloc(size_t alignment, size_t num_bytes) override {
+ return port::aligned_malloc(num_bytes, alignment);
+ }
+ void Free(void* ptr, size_t num_bytes) override { free(ptr); }
+};
+
+// Allocator for pinned CPU RAM that is made known to CUDA for the
+// purpose of efficient DMA with a GPU.
+class CUDAHostAllocator : public SubAllocator {
+ public:
+ // Note: stream_exec cannot be null.
+ explicit CUDAHostAllocator(perftools::gputools::StreamExecutor* stream_exec)
+ : stream_exec_(stream_exec) {
+ CHECK(stream_exec_ != nullptr);
+ }
+ ~CUDAHostAllocator() override {}
+
+ void* Alloc(size_t alignment, size_t num_bytes) override {
+ void* ptr = nullptr;
+ if (num_bytes > 0) {
+ ptr = stream_exec_->HostMemoryAllocate(num_bytes);
+ if (ptr == nullptr) {
+ LOG(FATAL) << "could not allocate pinned host memory of size: "
+ << num_bytes;
+ }
+ }
+ return ptr;
+ }
+
+ void Free(void* ptr, size_t num_bytes) override {
+ if (ptr != nullptr) {
+ stream_exec_->HostMemoryDeallocate(ptr);
+ }
+ }
+
+ private:
+ perftools::gputools::StreamExecutor* stream_exec_; // not owned, non-null
+
+ TF_DISALLOW_COPY_AND_ASSIGN(CUDAHostAllocator);
+};
+
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_POOL_ALLOCATOR_H_
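A hedged construction sketch for the pool classes declared above, loosely mirroring how process_state.cc builds its CPU pools; the name and sizes below are illustrative:

// Sketch only: an auto-resizing pool of plain CPU buffers with power-of-two
// size rounding. PoolAllocator takes ownership of both raw pointers.
tensorflow::PoolAllocator pool(
    100 /*pool_size_limit*/, true /*auto_resize*/,
    new tensorflow::BasicCPUAllocator, new tensorflow::Pow2Rounder,
    "example_cpu_pool");
void* buf = pool.AllocateRaw(64 /*alignment*/, 1000 /*num_bytes*/);
// ... use buf ...
pool.DeallocateRaw(buf);  // returned to the pool; evicted later if LRU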
diff --git a/tensorflow/core/common_runtime/gpu/pool_allocator_test.cc b/tensorflow/core/common_runtime/gpu/pool_allocator_test.cc
new file mode 100644
index 0000000000..ca409b2b4c
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/pool_allocator_test.cc
@@ -0,0 +1,203 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/common_runtime/gpu/pool_allocator.h"
+
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/platform.h"
+#include <gtest/gtest.h>
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+namespace {
+
+TEST(PoolAllocatorTest, ZeroSizeBuffers) {
+ gpu::Platform* platform =
+ gpu::MultiPlatformManager::PlatformWithName("cuda").ValueOrDie();
+ PoolAllocator pool(
+ 2 /*pool_size_limit*/, false /*auto_resize*/,
+ new CUDAHostAllocator(
+ platform->GetExecutor(gpu::StreamExecutorConfig(/*ordinal=*/0))
+ .ValueOrDie()),
+ new NoopRounder, "pool");
+
+ EXPECT_EQ(nullptr, pool.AllocateRaw(4 /*alignment*/, 0 /*num_bytes*/));
+ pool.DeallocateRaw(nullptr); // Should not crash.
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(0, pool.put_count());
+ EXPECT_EQ(0, pool.allocated_count());
+ EXPECT_EQ(0, pool.evicted_count());
+}
+
+TEST(PoolAllocatorTest, ZeroSizePool) {
+ gpu::Platform* platform =
+ gpu::MultiPlatformManager::PlatformWithName("cuda").ValueOrDie();
+ PoolAllocator pool(
+ 0 /*pool_size_limit*/, false /*auto_resize*/,
+ new CUDAHostAllocator(
+ platform->GetExecutor(gpu::StreamExecutorConfig(/*ordinal=*/0))
+ .ValueOrDie()),
+ new NoopRounder, "pool");
+
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(0, pool.put_count());
+ EXPECT_EQ(0, pool.allocated_count());
+ EXPECT_EQ(0, pool.evicted_count());
+
+ // All allocations bypass the pool; non-zero sizes return valid pointers.
+ for (int i = 0; i < 3; ++i) {
+ void* p0 = pool.AllocateRaw(4, 0);
+ void* p4 = pool.AllocateRaw(4, 4);
+ void* p12 = pool.AllocateRaw(4, 12);
+ EXPECT_EQ(nullptr, p0);
+ EXPECT_NE(nullptr, p4);
+ EXPECT_NE(nullptr, p12);
+ pool.DeallocateRaw(p0);
+ pool.DeallocateRaw(p4);
+ pool.DeallocateRaw(p12);
+ }
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(0, pool.put_count());
+ EXPECT_EQ(0, pool.allocated_count());
+ EXPECT_EQ(0, pool.evicted_count());
+}
+
+TEST(PoolAllocatorTest, Alignment) {
+ gpu::Platform* platform =
+ gpu::MultiPlatformManager::PlatformWithName("cuda").ValueOrDie();
+ PoolAllocator pool(
+ 0 /*pool_size_limit*/, false /*auto_resize*/,
+ new CUDAHostAllocator(
+ platform->GetExecutor(gpu::StreamExecutorConfig(/*ordinal=*/0))
+ .ValueOrDie()),
+ new NoopRounder, "pool");
+ for (int i = 0; i < 16; ++i) {
+ size_t alignment = 1 << i;
+ void* p = pool.AllocateRaw(alignment, 111);
+ EXPECT_TRUE(p != nullptr);
+ EXPECT_EQ(0, reinterpret_cast<int64>(p) & (alignment - 1))
+ << "ptr: " << p << " alignment " << alignment;
+ // Intentionally don't deallocate, to test that destruction of
+ // the PoolAllocator frees all pending memory.
+ }
+}
+
+TEST(PoolAllocatorTest, AutoResize) {
+ PoolAllocator pool(2 /*pool_size_limit*/, true /*auto_resize*/,
+ new BasicCPUAllocator, new NoopRounder, "pool");
+
+ // Alloc/dealloc 10 sizes just a few times, confirming pool size
+ // stays at 2.
+ for (int i = 0; i < 10; ++i) {
+ void* p = pool.AllocateRaw(4, 64 << i);
+ pool.DeallocateRaw(p);
+ }
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(10, pool.allocated_count());
+ EXPECT_EQ(10, pool.put_count());
+ EXPECT_EQ(8, pool.evicted_count());
+ EXPECT_EQ(2, pool.size_limit());
+
+ // Then repeat 1200 times. Pool size limit should jump to 100.
+ for (int j = 0; j < 120; ++j) {
+ for (int i = 0; i < 10; ++i) {
+ void* p = pool.AllocateRaw(4, 64 << i);
+ pool.DeallocateRaw(p);
+ }
+ }
+ EXPECT_EQ(100, pool.size_limit());
+}
+
+TEST(PoolAllocatorTest, CudaHostAllocator) {
+ gpu::Platform* platform =
+ gpu::MultiPlatformManager::PlatformWithName("cuda").ValueOrDie();
+ PoolAllocator pool(
+ 2 /*pool_size_limit*/, false /*auto_resize*/,
+ new CUDAHostAllocator(
+ platform->GetExecutor(gpu::StreamExecutorConfig(/*ordinal=*/0))
+ .ValueOrDie()),
+ new NoopRounder, "pool");
+
+ // Repeatedly Get a 16-byte value, confirming that there's only
+ // one real allocation.
+ void* p1_16 = pool.AllocateRaw(4, 16);
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(1, pool.allocated_count());
+ EXPECT_NE(nullptr, p1_16);
+ pool.DeallocateRaw(p1_16);
+ // Pool contents {16}
+ EXPECT_EQ(1, pool.put_count());
+ void* p2_16 = pool.AllocateRaw(4, 16); // Get it again.
+ EXPECT_EQ(1, pool.get_from_pool_count());
+ EXPECT_EQ(1, pool.allocated_count());
+ EXPECT_EQ(p1_16, p2_16); // Same pointer value
+ pool.DeallocateRaw(p2_16); // Put it back.
+ // Pool contents {16}
+ EXPECT_EQ(2, pool.put_count());
+
+ // Get two more values of different sizes.
+ void* p3_4 = pool.AllocateRaw(4, 4);
+ EXPECT_EQ(2, pool.allocated_count());
+ EXPECT_NE(p1_16, p3_4); // Different pointer value
+ EXPECT_NE(nullptr, p3_4);
+ pool.DeallocateRaw(p3_4); // Put it back. Pool is now full.
+ // Pool contents {4, 16}
+ EXPECT_EQ(3, pool.put_count());
+ void* p4_2 = pool.AllocateRaw(4, 2); // Get a third size buffer.
+ EXPECT_NE(nullptr, p4_2);
+ EXPECT_EQ(0, pool.evicted_count());
+
+ // The pool is full: when we put back p4_2, the 16-byte buffer
+ // should be evicted since it was least recently inserted.
+ pool.DeallocateRaw(p4_2);
+ // Pool contents {2, 4}
+ EXPECT_EQ(4, pool.put_count());
+ EXPECT_EQ(1, pool.evicted_count());
+
+ // Re-getting and putting size 2 or 4 should not alter pool size or
+ // num-evicted.
+ void* p5_4 = pool.AllocateRaw(4, 4);
+ EXPECT_NE(nullptr, p5_4);
+ pool.DeallocateRaw(p5_4);
+ void* p6_2 = pool.AllocateRaw(4, 2);
+ EXPECT_NE(nullptr, p6_2);
+ pool.DeallocateRaw(p6_2);
+ EXPECT_EQ(3, pool.get_from_pool_count());
+ EXPECT_EQ(6, pool.put_count());
+ EXPECT_EQ(3, pool.allocated_count());
+ EXPECT_EQ(1, pool.evicted_count());
+
+ pool.Clear();
+ EXPECT_EQ(0, pool.get_from_pool_count());
+ EXPECT_EQ(0, pool.put_count());
+ EXPECT_EQ(0, pool.allocated_count());
+ EXPECT_EQ(0, pool.evicted_count());
+}
+
+TEST(PoolAllocatorTest, Pow2Rounder) {
+ Pow2Rounder rounder;
+ EXPECT_EQ(1, rounder.RoundUp(1));
+ EXPECT_EQ(2, rounder.RoundUp(2));
+ EXPECT_EQ(16, rounder.RoundUp(9));
+ EXPECT_EQ(16, rounder.RoundUp(16));
+ EXPECT_EQ(65536, rounder.RoundUp(41234));
+ EXPECT_EQ(65536, rounder.RoundUp(65535));
+ EXPECT_EQ(65536, rounder.RoundUp(65536));
+}
+
+TEST(PoolAllocatorTest, Name) {
+ gpu::Platform* platform =
+ gpu::MultiPlatformManager::PlatformWithName("cuda").ValueOrDie();
+ PoolAllocator pool(
+ 2 /*pool_size_limit*/, false /*auto_resize*/,
+ new CUDAHostAllocator(
+ platform->GetExecutor(gpu::StreamExecutorConfig(/*ordinal=*/0))
+ .ValueOrDie()),
+ new NoopRounder, "pool");
+ EXPECT_EQ("pool", pool.Name());
+}
+
+} // namespace
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/common_runtime/gpu/process_state.cc b/tensorflow/core/common_runtime/gpu/process_state.cc
new file mode 100644
index 0000000000..70ac6130c2
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/process_state.cc
@@ -0,0 +1,220 @@
+#include "tensorflow/core/common_runtime/gpu/process_state.h"
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_init.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_debug_allocator.h"
+#include "tensorflow/core/common_runtime/gpu/gpu_region_allocator.h"
+#include "tensorflow/core/common_runtime/gpu/pool_allocator.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+
+#if defined(PLATFORM_GOOGLE)
+DEFINE_bool(record_mem_types, false,
+ "If true, record attributes of memory allocations and "
+ "dyanmically check for appropriate use of registered memory."
+ "Should only be true for debugging or diagnosis of "
+ "performance issues.");
+DEFINE_bool(brain_mem_reg_cuda_dma, true,
+ "If true, register CPU RAM used to copy to/from GPU RAM "
+ "with the CUDA driver.");
+DEFINE_bool(brain_gpu_use_bfc_allocator, false,
+ "If true, uses the Best-Fit GPU allocator.");
+DEFINE_bool(brain_gpu_region_allocator_debug, false,
+ "If true, checks for memory overwrites by writing "
+ "distinctive patterns on both ends of allocated memory.");
+DEFINE_bool(brain_gpu_region_allocator_reset_to_nan, false,
+ "If true, initializes all new Malloc buffers to NaN, "
+ "and resets the buffer to NaN upon Free.");
+
+#else
+bool FLAGS_record_mem_types = false;
+bool FLAGS_brain_mem_reg_cuda_dma = true;
+bool FLAGS_brain_gpu_region_allocator_debug = false;
+bool FLAGS_brain_gpu_region_allocator_reset_to_nan = false;
+bool FLAGS_brain_gpu_use_bfc_allocator = false;
+#endif
+
+namespace gpu = ::perftools::gputools;
+
+namespace tensorflow {
+
+ProcessState* ProcessState::instance_ = nullptr;
+
+/*static*/ ProcessState* ProcessState::singleton() {
+ if (instance_ == nullptr) {
+ instance_ = new ProcessState;
+ }
+
+ return instance_;
+}
+
+ProcessState::ProcessState() : gpu_count_(0) {
+ CHECK(instance_ == nullptr);
+ instance_ = this;
+}
+
+ProcessState::~ProcessState() {
+ for (auto p : gpu_allocators_) {
+ delete p;
+ }
+ instance_ = nullptr;
+}
+
+string ProcessState::MemDesc::DebugString() {
+ return strings::StrCat((loc == CPU ? "CPU " : "GPU "), dev_index, ", dma: ",
+ gpu_registered, ", nic: ", nic_registered);
+}
+
+ProcessState::MemDesc ProcessState::PtrType(const void* ptr) {
+ if (FLAGS_record_mem_types) {
+ auto iter = mem_desc_map_.find(ptr);
+ if (iter != mem_desc_map_.end()) {
+ return iter->second;
+ }
+ }
+ return MemDesc();
+}
+
+void ProcessState::SetGPUCount(int c) {
+ CHECK(gpu_count_ == 0 || gpu_count_ == c)
+ << "Cannot call SetGPUCount with a non-zero value "
+ << "not equal to prior set value.";
+ gpu_count_ = c;
+}
+
+int ProcessState::GPUCount() const { return gpu_count_; }
+
+Allocator* ProcessState::GetGPUAllocator(int gpu_id, size_t total_bytes) {
+#if GOOGLE_CUDA
+ mutex_lock lock(mu_);
+ gpu::Platform* gpu_platform = GPUMachineManager();
+
+ // Verify that gpu_id is legitimate.
+ CHECK_LT(gpu_id, gpu_platform->VisibleDeviceCount())
+ << "gpu_id is outside discovered device range";
+
+ if (gpu_id >= static_cast<int64>(gpu_allocators_.size())) {
+ gpu_allocators_.resize(gpu_id + 1);
+ if (FLAGS_record_mem_types) gpu_al_.resize(gpu_id + 1);
+ }
+
+ if (gpu_allocators_[gpu_id] == nullptr) {
+ VisitableAllocator* gpu_allocator;
+
+ if (FLAGS_brain_gpu_use_bfc_allocator) {
+ gpu_allocator = new GPUBFCAllocator(gpu_id, total_bytes);
+ } else {
+ gpu_allocator = new GPURegionAllocator(gpu_id, total_bytes);
+ }
+
+ if (FLAGS_brain_gpu_region_allocator_debug) {
+ gpu_allocator = new GPUDebugAllocator(gpu_allocator, gpu_id);
+ }
+ if (FLAGS_brain_gpu_region_allocator_reset_to_nan) {
+ gpu_allocator = new GPUNanResetAllocator(gpu_allocator, gpu_id);
+ }
+
+ gpu_allocators_[gpu_id] = gpu_allocator;
+
+ // If there are any pending AllocVisitors for this bus, add
+ // them now.
+ gpu::StreamExecutor* se =
+ gpu_platform->ExecutorForDevice(gpu_id).ValueOrDie();
+ int bus_id = se->GetDeviceDescription().numa_node();
+ if (bus_id < static_cast<int64>(gpu_visitors_.size())) {
+ for (auto v : gpu_visitors_[bus_id]) {
+ gpu_allocators_[gpu_id]->AddAllocVisitor(v);
+ }
+ }
+ if (FLAGS_record_mem_types) {
+ MemDesc md;
+ md.loc = MemDesc::GPU;
+ md.dev_index = gpu_id;
+ md.gpu_registered = false;
+ md.nic_registered = true;
+ if (static_cast<int64>(gpu_al_.size()) <= gpu_id)
+ gpu_al_.resize(gpu_id + 1);
+ gpu_al_[gpu_id] = new internal::RecordingAllocator(
+ &mem_desc_map_, gpu_allocators_[gpu_id], md, &mu_);
+ }
+ }
+ if (FLAGS_record_mem_types) return gpu_al_[gpu_id];
+ return gpu_allocators_[gpu_id];
+#else
+ LOG(FATAL) << "GPUAllocator unavailable. Not compiled with --config=cuda.";
+ return nullptr;
+#endif // GOOGLE_CUDA
+}
+
+Allocator* ProcessState::GetCPUAllocator(int numa_node) {
+ // Although we're temporarily ignoring numa_node, check for legality.
+ CHECK_GE(numa_node, 0);
+ // TODO(tucker): actually maintain separate CPUAllocators for
+ // different numa_nodes. For now, just one.
+ numa_node = 0;
+ mutex_lock lock(mu_);
+ while (cpu_allocators_.size() <= static_cast<size_t>(numa_node)) {
+ cpu_allocators_.push_back(new PoolAllocator(
+ 100 /*pool_size_limit*/, true /*auto_resize*/, new BasicCPUAllocator(),
+ new NoopRounder, "cpu_pool"));
+ }
+ return cpu_allocators_[0];
+}
+
+Allocator* ProcessState::GetCUDAHostAllocator(int numa_node) {
+ if (gpu_count_ == 0 || !FLAGS_brain_mem_reg_cuda_dma) {
+ return GetCPUAllocator(numa_node);
+ }
+ // Although we're temporarily ignoring numa_node, check for legality.
+ CHECK_GE(numa_node, 0);
+ // TODO(tucker): actually maintain separate CPUAllocators for
+ // different numa_nodes. For now, just one.
+ numa_node = 0;
+ mutex_lock lock(mu_);
+ while (static_cast<int>(cuda_host_allocators_.size()) <= numa_node) {
+ // CUDAHost allocation is the same across all GPUs, so just get the
+ // executor for the first device.
+ gpu::Platform* gpu_platform = GPUMachineManager();
+ gpu::StreamExecutor* se = gpu_platform->ExecutorForDevice(0).ValueOrDie();
+ CHECK(se);
+ cuda_host_allocators_.push_back(new PoolAllocator(
+ 100 /*pool_size_limit*/, true /*auto_resize*/,
+ new CUDAHostAllocator(se), new Pow2Rounder, "cuda_host"));
+ if (FLAGS_record_mem_types) {
+ MemDesc md;
+ md.loc = MemDesc::CPU;
+ md.dev_index = 0;
+ md.gpu_registered = true;
+ md.nic_registered = false;
+ cuda_al_.push_back(new internal::RecordingAllocator(
+ &mem_desc_map_, cuda_host_allocators_.back(), md, &mu_));
+ }
+ }
+ if (FLAGS_record_mem_types) return cuda_al_[0];
+ return cuda_host_allocators_[0];
+}
+
+void ProcessState::AddGPUAllocVisitor(int bus_id, AllocVisitor visitor) {
+#if GOOGLE_CUDA
+ mutex_lock lock(mu_);
+ gpu::Platform* gpu_platform = GPUMachineManager();
+ for (int gpu_id = 0; gpu_id < static_cast<int64>(gpu_allocators_.size());
+ ++gpu_id) {
+ gpu::StreamExecutor* se =
+ gpu_platform->ExecutorForDevice(gpu_id).ValueOrDie();
+ if (gpu_allocators_[gpu_id] &&
+ se->GetDeviceDescription().numa_node() == bus_id) {
+ gpu_allocators_[gpu_id]->AddAllocVisitor(visitor);
+ }
+ }
+ while (bus_id >= static_cast<int64>(gpu_visitors_.size())) {
+ gpu_visitors_.push_back(std::vector<AllocVisitor>());
+ }
+ gpu_visitors_[bus_id].push_back(visitor);
+#endif // GOOGLE_CUDA
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/gpu/process_state.h b/tensorflow/core/common_runtime/gpu/process_state.h
new file mode 100644
index 0000000000..527d12c10d
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/process_state.h
@@ -0,0 +1,140 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_PROCESS_STATE_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_PROCESS_STATE_H_
+
+#include <functional>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+class Allocator;
+class VisitableAllocator;
+class PoolAllocator;
+
+// Singleton that manages per-process state, e.g. allocation
+// of shared resources.
+class ProcessState {
+ public:
+ static ProcessState* singleton();
+
+ // Descriptor for memory allocation attributes, used by optional
+ // runtime correctness analysis logic.
+ struct MemDesc {
+ enum MemLoc { CPU, GPU };
+ MemLoc loc;
+ int dev_index;
+ bool gpu_registered;
+ bool nic_registered;
+ MemDesc()
+ : loc(CPU),
+ dev_index(0),
+ gpu_registered(false),
+ nic_registered(false) {}
+ string DebugString();
+ };
+
+ // Records the number of GPUs available in the local process.
+ // It is a fatal error to call this with a value different from the
+ // value passed in a prior call.
+ void SetGPUCount(int c);
+
+ // Returns the number of GPUs available in the local process, as set by
+ // SetGPUCount(). Returns 0 if SetGPUCount() has not been called.
+ int GPUCount() const;
+
+ // Returns what we know about the memory at ptr.
+ // If we know nothing, the result is CPU 0 with no other attributes.
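+ //
+ // A short illustrative example ('p' is an arbitrary pointer previously
+ // returned by one of the allocators below):
+ //   MemDesc md = ProcessState::singleton()->PtrType(p);
+ //   if (md.loc == MemDesc::GPU) { /* p points into GPU memory */ }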
+ MemDesc PtrType(const void* ptr);
+
+ // Returns the one CPUAllocator used for the given numa_node.
+ // TEMPORARY: ignores numa_node.
+ Allocator* GetCPUAllocator(int numa_node);
+
+ // Returns the one GPU allocator used for the indexed GPU.
+ // Note that this is a system GPU index, not (necessarily) a brain
+ // device index.
+ //
+ // 'total_bytes' is the total number of bytes that should be made
+ // available to the allocator. The first call to this function for
+ // a given gpu_id creates the allocator, so only the total_bytes
+ // passed on that first call has any effect.
+ //
+ // REQUIRES: gpu_id must be a valid ordinal for a GPU available in the
+ // current system environment. Otherwise returns nullptr.
+ Allocator* GetGPUAllocator(int gpu_id, size_t total_bytes);
+
+ Allocator* GetCUDAHostAllocator(int numa_node);
+
+ // Registers a function to be called once on every new Region
+ // allocated by every GPURegionAllocator proximate to the specified
+ // bus. The AllocVisitor is provided with a memory pointer and the
+ // size of the area it identifies. The pointer is not guaranteed to
+ // be valid after the call terminates. The intention is for this
+ // interface to be used for network device memory registration.
+ // "bus_id" is platform-specific. On many platforms it
+ // should be 0. On machines with multiple PCIe buses, it should be
+ // the index of one of the PCIe buses. If the bus_id is invalid,
+ // results are undefined. (A usage sketch follows this class.)
+ typedef std::function<void(void*, size_t)> AllocVisitor;
+ void AddGPUAllocVisitor(int bus_id, AllocVisitor visitor);
+
+ typedef std::unordered_map<const void*, MemDesc> MDMap;
+
+ protected:
+ ProcessState();
+
+ static ProcessState* instance_;
+
+ mutex mu_;
+ int gpu_count_;
+
+ std::vector<PoolAllocator*> cpu_allocators_ GUARDED_BY(mu_);
+ std::vector<VisitableAllocator*> gpu_allocators_ GUARDED_BY(mu_);
+ std::vector<std::vector<AllocVisitor>> gpu_visitors_ GUARDED_BY(mu_);
+ std::vector<PoolAllocator*> cuda_host_allocators_ GUARDED_BY(mu_);
+
+ virtual ~ProcessState();
+
+ // Optional RecordingAllocators that wrap the corresponding
+ // Allocators for runtime attribute use analysis.
+ MDMap mem_desc_map_;
+ std::vector<Allocator*> cpu_al_ GUARDED_BY(mu_);
+ std::vector<Allocator*> gpu_al_ GUARDED_BY(mu_);
+ std::vector<Allocator*> cuda_al_ GUARDED_BY(mu_);
+};
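+
+// A minimal usage sketch (the bus id, byte budget, and visitor body below
+// are illustrative only, not prescribed by this interface):
+//
+//   ProcessState* ps = ProcessState::singleton();
+//   ps->AddGPUAllocVisitor(0 /*bus_id*/, [](void* ptr, size_t num_bytes) {
+//     // e.g. register [ptr, ptr + num_bytes) with a NIC driver.
+//   });
+//   Allocator* a = ps->GetGPUAllocator(0 /*gpu_id*/, 1LL << 30 /*total_bytes*/);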
+
+namespace internal {
+class RecordingAllocator : public Allocator {
+ public:
+ RecordingAllocator(ProcessState::MDMap* mm, Allocator* a,
+ ProcessState::MemDesc md, mutex* mu)
+ : mm_(mm), a_(a), md_(md), mu_(mu) {}
+
+ string Name() override { return a_->Name(); }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override {
+ void* p = a_->AllocateRaw(alignment, num_bytes);
+ mutex_lock l(*mu_);
+ (*mm_)[p] = md_;
+ return p;
+ }
+ void DeallocateRaw(void* p) override {
+ mutex_lock l(*mu_);
+ auto iter = mm_->find(p);
+ mm_->erase(iter);
+ a_->DeallocateRaw(p);
+ }
+ bool TracksAllocationSizes() override { return a_->TracksAllocationSizes(); }
+ size_t RequestedSize(void* p) override { return a_->RequestedSize(p); }
+ size_t AllocatedSize(void* p) override { return a_->AllocatedSize(p); }
+ ProcessState::MDMap* mm_; // not owned
+ Allocator* a_; // not owned
+ ProcessState::MemDesc md_;
+ mutex* mu_;
+};
+} // namespace internal
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_PROCESS_STATE_H_
diff --git a/tensorflow/core/common_runtime/gpu/visitable_allocator.h b/tensorflow/core/common_runtime/gpu/visitable_allocator.h
new file mode 100644
index 0000000000..23feed9aab
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu/visitable_allocator.h
@@ -0,0 +1,30 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_VISITABLE_ALLOCATOR_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_VISITABLE_ALLOCATOR_H_
+
+#include <functional>
+#include "tensorflow/core/framework/allocator.h"
+
+namespace tensorflow {
+
+// Subclass VisitableAllocator instead of Allocator when a memory
+// allocator needs to enable some kind of registration/deregistration
+// of memory areas.
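+//
+// A rough sketch of a conforming subclass (Allocator's own pure virtual
+// methods are elided; the bookkeeping shown is illustrative only):
+//
+//   class TrackingAllocator : public VisitableAllocator {
+//    public:
+//     void AddAllocVisitor(Visitor v) override { alloc_visitors_.push_back(v); }
+//     void AddFreeVisitor(Visitor v) override { free_visitors_.push_back(v); }
+//     // Whenever a fresh chunk is obtained from the device, run:
+//     //   for (const Visitor& v : alloc_visitors_) v(chunk_ptr, chunk_bytes);
+//    private:
+//     std::vector<Visitor> alloc_visitors_, free_visitors_;
+//   };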
+class VisitableAllocator : public Allocator {
+ public:
+ // Visitor gets called with a pointer to a memory area and its
+ // size in bytes.
+ typedef std::function<void(void*, size_t)> Visitor;
+
+ // Register a visitor guaranteed to be called exactly once on each
+ // chunk of memory newly allocated from the underlying device.
+ // Typically, chunks will be reused and possibly sub-divided by a
+ // pool manager, so the calls will happen only once per process
+ // execution, not once per tensor (re)allocation.
+ virtual void AddAllocVisitor(Visitor visitor) = 0;
+
+ // Register a visitor guaranteed to be called on each chunk of
+ // memory returned to the underlying device.
+ virtual void AddFreeVisitor(Visitor visitor) = 0;
+};
+} // namespace tensorflow
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_VISITABLE_ALLOCATOR_H_
diff --git a/tensorflow/core/common_runtime/gpu_device_context.h b/tensorflow/core/common_runtime/gpu_device_context.h
new file mode 100644
index 0000000000..03fd9a97c3
--- /dev/null
+++ b/tensorflow/core/common_runtime/gpu_device_context.h
@@ -0,0 +1,45 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_GPU_DEVICE_CONTEXT_H_
+#define TENSORFLOW_COMMON_RUNTIME_GPU_DEVICE_CONTEXT_H_
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/device_base.h"
+
+namespace perftools {
+namespace gputools {
+class Stream;
+} // namespace gputools
+} // namespace perftools
+
+namespace tensorflow {
+
+namespace gpu = ::perftools::gputools;
+
+class GPUDeviceContext : public DeviceContext {
+ public:
+ GPUDeviceContext(int stream_id, gpu::Stream* stream)
+ : stream_id_(stream_id), stream_(stream) {}
+
+ ~GPUDeviceContext() override {}
+
+ gpu::Stream* stream() const override { return stream_; }
+ int stream_id() const { return stream_id_; }
+
+ void CopyCPUTensorToDevice(const Tensor* cpu_tensor, Device* device,
+ Tensor* device_tensor,
+ StatusCallback done) const override;
+
+ void CopyDeviceTensorToCPU(const Tensor* device_tensor,
+ const string& edge_name, Device* device,
+ Tensor* cpu_tensor, StatusCallback done) override;
+
+ void MaintainLifetimeOnStream(
+ const Tensor* t, perftools::gputools::Stream* stream) const override {}
+
+ private:
+ int stream_id_;
+ gpu::Stream* stream_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_GPU_DEVICE_CONTEXT_H_
diff --git a/tensorflow/core/common_runtime/kernel_benchmark_testlib.cc b/tensorflow/core/common_runtime/kernel_benchmark_testlib.cc
new file mode 100644
index 0000000000..28afc95c1b
--- /dev/null
+++ b/tensorflow/core/common_runtime/kernel_benchmark_testlib.cc
@@ -0,0 +1,160 @@
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/framework/op_segment.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/util/device_name_utils.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/session_options.h"
+
+#if defined(PLATFORM_GOOGLE)
+DECLARE_bool(brain_gpu_use_bfc_allocator);
+#else
+extern bool FLAGS_brain_gpu_use_bfc_allocator;
+#endif
+
+namespace tensorflow {
+namespace test {
+
+Benchmark::Benchmark(const string& device, Graph* g,
+ const SessionOptions* options, Graph* init) {
+ RequireDefaultOps();
+
+ FLAGS_brain_gpu_use_bfc_allocator = true;
+
+ SessionOptions default_options;
+ if (!options) {
+ options = &default_options;
+ }
+
+ testing::StopTiming();
+ string t = str_util::Uppercase(device);
+ device_ =
+ DeviceFactory::NewDevice(t, *options, "/job:localhost/replica:0/task:0");
+ CHECK(device_) << "Could not create a " << device << " device";
+
+ pool_ = new thread::ThreadPool(options->env, "blocking",
+ port::NumSchedulableCPUs());
+
+ auto runner = [this](std::function<void()> closure) {
+ pool_->Schedule(closure);
+ };
+
+ rendez_ = NewLocalRendezvous();
+
+ if (init) {
+ Executor* init_exec;
+ TF_CHECK_OK(NewLocalExecutor(
+ {
+ device_, nullptr, false,
+ [this](const NodeDef& ndef, OpKernel** kernel) {
+ return CreateNonCachedKernel(device_, nullptr, ndef, kernel);
+ },
+ [](OpKernel* kernel) { DeleteNonCachedKernel(kernel); },
+ },
+ init, &init_exec));
+ Executor::Args args;
+ args.rendezvous = rendez_;
+ args.runner = runner;
+ TF_CHECK_OK(init_exec->Run(args));
+ delete init_exec;
+ }
+
+ TF_CHECK_OK(NewLocalExecutor(
+ {
+ device_,
+ nullptr,
+ false,
+ [this](const NodeDef& ndef, OpKernel** kernel) {
+ return CreateNonCachedKernel(device_, nullptr, ndef, kernel);
+ },
+ [](OpKernel* kernel) { DeleteNonCachedKernel(kernel); },
+ },
+ g, &exec_));
+}
+
+Benchmark::~Benchmark() {
+ if (device_) {
+ rendez_->Unref();
+ delete exec_;
+ delete device_;
+ delete pool_;
+ }
+}
+
+void Benchmark::Run(int iters) { RunWithArgs({}, {}, iters); }
+
+string GetRendezvousKey(const Node* node) {
+ string send_device;
+ TF_CHECK_OK(GetNodeAttr(node->def(), "send_device", &send_device));
+ string recv_device;
+ TF_CHECK_OK(GetNodeAttr(node->def(), "recv_device", &recv_device));
+ string tensor_name;
+ TF_CHECK_OK(GetNodeAttr(node->def(), "tensor_name", &tensor_name));
+ uint64 send_device_incarnation;
+ TF_CHECK_OK(GetNodeAttr(node->def(), "send_device_incarnation",
+ reinterpret_cast<int64*>(&send_device_incarnation)));
+ return Rendezvous::CreateKey(send_device, send_device_incarnation,
+ recv_device, tensor_name, FrameAndIter(0, 0));
+}
+
+void Benchmark::RunWithArgs(
+ const std::vector<std::pair<const Node*, Tensor>>& inputs,
+ const std::vector<const Node*>& outputs, int iters) {
+ if (device_) {
+ // Gets inputs' and outputs' rendezvous keys.
+ std::vector<std::pair<string, Tensor>> in;
+ for (const auto& p : inputs) {
+ in.push_back({GetRendezvousKey(p.first), p.second});
+ }
+ std::vector<string> out;
+ for (const auto& n : outputs) {
+ out.push_back(GetRendezvousKey(n));
+ }
+ Tensor unused; // In the benchmark, we don't care about the returned values.
+ bool is_dead;
+
+ // Warm up
+ Executor::Args args;
+ args.rendezvous = rendez_;
+ args.runner = [this](std::function<void()> closure) {
+ pool_->Schedule(closure);
+ };
+ for (int i = 0; i < 3; ++i) {
+ for (const auto& p : in) {
+ rendez_->Send(p.first, Rendezvous::Args(), p.second, false);
+ }
+ TF_CHECK_OK(exec_->Run(args));
+ for (const string& key : out) {
+ rendez_->Recv(key, Rendezvous::Args(), &unused, &is_dead);
+ }
+ }
+ TF_CHECK_OK(device_->Sync());
+
+ testing::StartTiming();
+ while (iters-- > 0) {
+ for (const auto& p : in) {
+ rendez_->Send(p.first, Rendezvous::Args(), p.second, false);
+ }
+ TF_CHECK_OK(exec_->Run(args));
+ for (const string& key : out) {
+ rendez_->Recv(key, Rendezvous::Args(), &unused, &is_dead);
+ }
+ }
+
+ TF_CHECK_OK(device_->Sync());
+ testing::StopTiming();
+ }
+}
+
+} // end namespace test
+} // end namespace tensorflow
diff --git a/tensorflow/core/common_runtime/kernel_benchmark_testlib.h b/tensorflow/core/common_runtime/kernel_benchmark_testlib.h
new file mode 100644
index 0000000000..5ebe13e1d4
--- /dev/null
+++ b/tensorflow/core/common_runtime/kernel_benchmark_testlib.h
@@ -0,0 +1,52 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_KERNEL_BENCHMARK_TESTLIB_H_
+#define TENSORFLOW_COMMON_RUNTIME_KERNEL_BENCHMARK_TESTLIB_H_
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/common_runtime/executor.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+class Device;
+class SessionOptions;
+
+namespace test {
+
+class Benchmark {
+ public:
+ // "device" must be either "cpu" or "gpu". Takes ownership of "g"
+ // and "init".
+ Benchmark(const string& device, Graph* g,
+ const SessionOptions* options = nullptr, Graph* init = nullptr);
+ ~Benchmark();
+
+ // Executes the graph for "iters" times.
+ void Run(int iters);
+
+ // If "g" contains send/recv nodes, before each execution, we send
+ // inputs to the corresponding recv nodes in the graph, after each
+ // execution, we recv outputs from the corresponding send nodes in
+ // the graph. In the benchmark, we throw away values returned by the
+ // graph.
+ void RunWithArgs(const std::vector<std::pair<const Node*, Tensor>>& inputs,
+ const std::vector<const Node*>& outputs, int iters);
+
+ private:
+ thread::ThreadPool* pool_ = nullptr;
+ thread::ThreadPool* non_blocking_pool_ = nullptr;
+ Device* device_ = nullptr;
+ Rendezvous* rendez_ = nullptr;
+ Executor* exec_ = nullptr;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Benchmark);
+};
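+
+// A minimal usage sketch (graph construction elided; "iters" is supplied by
+// the benchmark framework):
+//
+//   Graph* g = new Graph(OpRegistry::Global());
+//   // ... add nodes to g ...
+//   test::Benchmark("cpu", g).Run(iters);  // Takes ownership of g.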
+
+} // end namespace test
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_KERNEL_BENCHMARK_TESTLIB_H_
diff --git a/tensorflow/core/common_runtime/local_device.cc b/tensorflow/core/common_runtime/local_device.cc
new file mode 100644
index 0000000000..6a75346805
--- /dev/null
+++ b/tensorflow/core/common_runtime/local_device.cc
@@ -0,0 +1,51 @@
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/common_runtime/eigen_thread_pool.h"
+#include "tensorflow/core/common_runtime/local_device.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/session_options.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace {
+
+DeviceBase::CpuWorkerThreads eigen_worker_threads;
+Eigen::ThreadPoolInterface* eigen_thread_pool = nullptr;
+Eigen::ThreadPoolDevice* eigen_device = nullptr;
+
+static bool InitModule(const SessionOptions& options) {
+ int32 intra_op_parallelism_threads =
+ options.config.intra_op_parallelism_threads();
+ if (intra_op_parallelism_threads == 0) {
+ intra_op_parallelism_threads = port::NumSchedulableCPUs();
+ }
+ LOG(INFO) << "Local device intra op parallelism threads: "
+ << intra_op_parallelism_threads;
+ eigen_worker_threads.num_threads = intra_op_parallelism_threads;
+ eigen_worker_threads.workers = new thread::ThreadPool(
+ options.env, "Eigen", intra_op_parallelism_threads);
+ eigen_thread_pool = new EigenThreadPoolWrapper(eigen_worker_threads.workers);
+ eigen_device = new Eigen::ThreadPoolDevice(eigen_thread_pool,
+ eigen_worker_threads.num_threads);
+ return true;
+}
+} // end namespace
+
+// LocalDevice ----------------------------------------------------------------
+
+LocalDevice::LocalDevice(const SessionOptions& options,
+ const DeviceAttributes& attributes,
+ Allocator* device_allocator)
+ : Device(options.env, attributes, device_allocator) {
+ // All ThreadPoolDevices in the process will use this single
+ // fixed-size threadpool for numerical computations.
+ static bool init = InitModule(options);
+ CHECK(init); // Avoids compiler warning that init is unused.
+ set_tensorflow_cpu_worker_threads(&eigen_worker_threads);
+ set_eigen_cpu_device(eigen_device);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/local_device.h b/tensorflow/core/common_runtime/local_device.h
new file mode 100644
index 0000000000..fc4cfc2dfc
--- /dev/null
+++ b/tensorflow/core/common_runtime/local_device.h
@@ -0,0 +1,27 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_LOCAL_DEVICE_H_
+#define TENSORFLOW_COMMON_RUNTIME_LOCAL_DEVICE_H_
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+
+namespace tensorflow {
+
+class SessionOptions;
+
+// This class is shared by ThreadPoolDevice and GPUDevice and
+// initializes a shared Eigen compute device used by both. This
+// should eventually be removed once we refactor ThreadPoolDevice and
+// GPUDevice into more 'process-wide' abstractions.
+class LocalDevice : public Device {
+ public:
+ LocalDevice(const SessionOptions& options, const DeviceAttributes& attributes,
+ Allocator* device_allocator);
+ ~LocalDevice() override {}
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(LocalDevice);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_LOCAL_DEVICE_H_
diff --git a/tensorflow/core/common_runtime/local_session.cc b/tensorflow/core/common_runtime/local_session.cc
new file mode 100644
index 0000000000..ab6993b8a2
--- /dev/null
+++ b/tensorflow/core/common_runtime/local_session.cc
@@ -0,0 +1,500 @@
+#include "tensorflow/core/common_runtime/local_session.h"
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/executor.h"
+#include "tensorflow/core/common_runtime/rendezvous_mgr.h"
+#include "tensorflow/core/common_runtime/session_factory.h"
+#include "tensorflow/core/common_runtime/simple_placer.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/graph_partition.h"
+#include "tensorflow/core/graph/subgraph.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+namespace tensorflow {
+
+namespace {
+
+thread::ThreadPool* kernel_thread_pool_ = nullptr;
+static bool InitModule(const SessionOptions& options) {
+ int32 inter_op_parallelism_threads =
+ options.config.inter_op_parallelism_threads();
+ if (inter_op_parallelism_threads == 0) {
+ // Default to using the number of cores available in the process.
+ inter_op_parallelism_threads = port::NumSchedulableCPUs();
+ }
+ LOG(INFO) << "Local session inter op parallelism threads: "
+ << inter_op_parallelism_threads;
+ kernel_thread_pool_ = new thread::ThreadPool(options.env, "Compute",
+ inter_op_parallelism_threads);
+ return true;
+}
+
+// TODO(vrv): Figure out how to unify the many different functions
+// that generate RendezvousKey, since many of them have to be
+// consistent with each other.
+string GetRendezvousKey(const string& tensor_name,
+ const DeviceAttributes& device_info,
+ const FrameAndIter& frame_iter) {
+ return strings::StrCat(device_info.name(), ";",
+ strings::FpToString(device_info.incarnation()), ";",
+ device_info.name(), ";", tensor_name, ";",
+ frame_iter.frame_id, ":", frame_iter.iter_id);
+}
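+
+// For example, feeding tensor "x:0" through the client CPU device yields a
+// key shaped roughly like (the incarnation digits are illustrative):
+//   /job:localhost/replica:0/task:0/cpu:0;0000000000000001;
+//       /job:localhost/replica:0/task:0/cpu:0;x:0;0:0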
+
+// NOTE: On Android with a single device, there is never
+// a risk of an OpKernel blocking indefinitely:
+//
+// 1) No operations do I/O that depends on other simultaneous kernels,
+//
+// 2) Recv nodes always complete immediately: The inputs are sent into
+// the local rendezvous before we start the executor, so the
+// corresponding recvs will not block.
+//
+// Based on these assumptions, we can use the same thread pool for
+// both "non-blocking" and "blocking" OpKernels on Android.
+//
+// This may change down the road when we add support for multiple
+// devices that run concurrently, in which case we will need to
+// revisit this decision.
+void SchedClosure(std::function<void()> c) {
+// TODO(sanjay): Get rid of __ANDROID__ path
+#ifdef __ANDROID__
+ // On Android, there is no implementation of ThreadPool that takes
+ // std::function, only Closure, which we cannot easily convert.
+ //
+ // Instead, we just run the function in-line, which is currently
+ // safe given the reasoning above.
+ c();
+#else
+ kernel_thread_pool_->Schedule(c);
+#endif // __ANDROID__
+}
+
+} // namespace
+
+LocalSession::LocalSession(const SessionOptions& options,
+ const DeviceMgr* device_mgr)
+ : options_(options),
+ device_mgr_(device_mgr),
+ cancellation_manager_(new CancellationManager()) {
+ static bool init = InitModule(options);
+ CHECK(init); // Avoids compiler warning that init is unused.
+ session_handle_ = strings::FpToString(random::New64());
+ int devices_added = 0;
+ if (options.config.log_device_placement()) {
+ const string mapping_str = device_mgr_->DeviceMappingString();
+ printf("Device mapping:\n%s", mapping_str.c_str());
+ LOG(INFO) << "Device mapping:\n" << mapping_str;
+ }
+ for (auto d : device_mgr_->ListDevices()) {
+ devices_.push_back(d);
+ device_set_.AddDevice(d);
+ d->op_segment()->AddHold(session_handle_);
+
+ // The first device added is special: it is the 'client device' (a
+ // CPU device) from which we feed and fetch Tensors.
+ if (devices_added == 0) {
+ device_set_.set_client_device(d);
+ }
+ ++devices_added;
+ }
+}
+
+LocalSession::~LocalSession() {
+ for (auto d : device_mgr_->ListDevices()) {
+ d->op_segment()->RemoveHold(session_handle_);
+ }
+ for (auto it : executors_) {
+ delete it.second;
+ }
+ delete cancellation_manager_;
+}
+
+Status LocalSession::Create(const GraphDef& graph) {
+ mutex_lock l(graph_def_lock_);
+ if (graph_created_) {
+ return errors::AlreadyExists(
+ "A Graph has already been created for this session.");
+ }
+ return ExtendLocked(graph);
+}
+
+Status LocalSession::Extend(const GraphDef& graph) {
+ mutex_lock l(graph_def_lock_);
+ return ExtendLocked(graph);
+}
+
+Status LocalSession::ExtendLocked(const GraphDef& graph) {
+ graph_created_ = true; // In case this is the first call.
+ graph_def_.MergeFrom(graph);
+ return Status::OK();
+}
+
+Status LocalSession::Run(const std::vector<std::pair<string, Tensor>>& inputs,
+ const std::vector<string>& output_names,
+ const std::vector<string>& target_nodes,
+ std::vector<Tensor>* outputs) {
+ {
+ mutex_lock l(graph_def_lock_);
+ if (!graph_created_) {
+ return errors::InvalidArgument(
+ "Session was not created with a graph before Run()!");
+ }
+ }
+
+ // Extract the input names for this run of the session.
+ std::vector<string> input_tensor_names;
+ input_tensor_names.reserve(inputs.size());
+ for (const auto& it : inputs) {
+ input_tensor_names.push_back(it.first);
+ }
+
+ // Check if we already have an executor for these arguments.
+ ExecutorsAndKeys* executors_and_keys;
+ Status s = GetOrCreateExecutors(input_tensor_names, output_names,
+ target_nodes, &executors_and_keys);
+ if (!s.ok()) {
+ return s;
+ }
+
+ IntraProcessRendezvous* rendez =
+ new IntraProcessRendezvous(device_mgr_.get());
+ core::ScopedUnref rendez_unref(rendez);
+
+ // Insert the input tensors into the local rendezvous by their
+ // rendezvous key.
+ for (const auto& input : inputs) {
+ const string& input_key = executors_and_keys->input_keys[input.first];
+ s = rendez->Send(input_key, Rendezvous::Args(), input.second, false);
+ if (!s.ok()) {
+ rendez->StartAbort(s);
+ return s;
+ }
+ }
+
+ // Start parallel Executors.
+ Notification executors_done;
+ const int num_executors = executors_and_keys->device_executors.size();
+ ExecutorBarrier* barrier = new ExecutorBarrier(
+ num_executors, rendez, [&executors_done, &s](const Status& ret) {
+ s = ret;
+ executors_done.Notify();
+ });
+
+ Executor::Args args;
+ args.rendezvous = rendez;
+ args.cancellation_manager = cancellation_manager_;
+ args.runner = SchedClosure;
+
+ for (auto device_executor : executors_and_keys->device_executors) {
+ Executor* exec = device_executor.second;
+ exec->RunAsync(args, barrier->Get());
+ }
+
+ executors_done.WaitForNotification();
+
+ TF_RETURN_IF_ERROR(s);
+
+ if (!output_names.empty()) {
+ outputs->resize(output_names.size());
+ }
+
+ // Get the outputs from the rendezvous
+ for (size_t output_offset = 0; output_offset < output_names.size();
+ ++output_offset) {
+ const string& output_key =
+ executors_and_keys->output_keys[output_names[output_offset]];
+ Tensor output_tensor;
+ bool is_dead;
+
+ // Fetch data from the Rendezvous.
+ s = rendez->Recv(output_key, Rendezvous::Args(), &output_tensor, &is_dead);
+ if (is_dead) {
+ s = errors::InvalidArgument("The tensor returned for ",
+ output_names[output_offset],
+ " was not valid.");
+ }
+ if (!s.ok()) {
+ rendez->StartAbort(s);
+ outputs->clear();
+ return s;
+ }
+
+ (*outputs)[output_offset] = output_tensor;
+ }
+
+ return s;
+}
+
+Status LocalSession::GetOrCreateExecutors(
+ gtl::ArraySlice<string> inputs, gtl::ArraySlice<string> outputs,
+ gtl::ArraySlice<string> target_nodes,
+ ExecutorsAndKeys** executors_and_keys) {
+ // Sort the inputs and outputs, so we don't create separate
+ // executors when a user passes in the same inputs/outputs in
+ // different orders.
+ //
+ // In the future, we could avoid the sort by using some other
+ // signature that preserves the same property.
+ std::vector<string> inputs_sorted(inputs.begin(), inputs.end());
+ std::vector<string> outputs_sorted(outputs.begin(), outputs.end());
+ std::vector<string> tn_sorted(target_nodes.begin(), target_nodes.end());
+ std::sort(inputs_sorted.begin(), inputs_sorted.end());
+ std::sort(outputs_sorted.begin(), outputs_sorted.end());
+ std::sort(tn_sorted.begin(), tn_sorted.end());
+
+ const string key = strings::StrCat(str_util::Join(inputs_sorted, ","), "->",
+ str_util::Join(outputs_sorted, ","), "/",
+ str_util::Join(tn_sorted, ","));
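+ // For instance, inputs {"a:0"}, outputs {"c:0"} and target nodes {"d"}
+ // produce the signature key "a:0->c:0/d".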
+
+ // See if we already have the executors for this run.
+ {
+ mutex_lock l(executor_lock_); // could use reader lock
+ auto it = executors_.find(key);
+ if (it != executors_.end()) {
+ *executors_and_keys = it->second;
+ return Status::OK();
+ }
+ }
+
+ // The executor_lock_ is intentionally released while the executors
+ // are being created.
+ std::unordered_map<string, Graph*> graphs;
+ Status s = CreateGraphs(inputs, outputs, target_nodes, &graphs);
+ if (!s.ok()) {
+ return s;
+ }
+
+ bool has_control_flow = false;
+ for (const auto& graph : graphs) {
+ for (const Node* n : graph.second->nodes()) {
+ if (IsControlFlow(n)) {
+ has_control_flow = true;
+ break;
+ }
+ }
+ if (has_control_flow) break;
+ }
+
+ std::unique_ptr<ExecutorsAndKeys> ek(new ExecutorsAndKeys);
+
+ for (const auto& graph : graphs) {
+ const string& partition_name = graph.first;
+ Graph* partition_graph = graph.second;
+
+ Device* d;
+ s = device_mgr_->LookupDevice(partition_name, &d);
+ if (!s.ok()) {
+ return s;
+ }
+
+ LocalExecutorParams params;
+ params.has_control_flow = has_control_flow;
+ params.device = d;
+ params.create_kernel = [this, d](const NodeDef& ndef, OpKernel** kernel) {
+ return CreateCachedKernel(d, session_handle_, nullptr, ndef, kernel);
+ };
+ params.delete_kernel = [this, d](OpKernel* kernel) {
+ DeleteCachedKernel(d, session_handle_, kernel);
+ };
+
+ Executor* tmp_exec;
+ s = NewLocalExecutor(params, partition_graph, &tmp_exec);
+ if (!s.ok()) {
+ return s;
+ }
+ ek->device_executors.insert(std::make_pair(graph.first, tmp_exec));
+ }
+
+ // Compute the rendezvous keys to avoid recomputing them every time.
+ //
+ // We always use the first device as the device name portion of the
+ // key, even if we're feeding another graph.
+ for (const string& input : inputs) {
+ ek->input_keys[input] = GetRendezvousKey(
+ input, device_set_.client_device()->attributes(), FrameAndIter(0, 0));
+ }
+ for (const string& output : outputs) {
+ ek->output_keys[output] = GetRendezvousKey(
+ output, device_set_.client_device()->attributes(), FrameAndIter(0, 0));
+ }
+
+ // Reacquire the lock, try to insert into the map.
+ mutex_lock l(executor_lock_);
+ const bool inserted = executors_.insert(std::make_pair(key, ek.get())).second;
+ if (!inserted) {
+ // Another thread created the entry before us, so delete the
+ // one we created and return the already created one.
+ auto it = executors_.find(key);
+ *executors_and_keys = it->second;
+ } else {
+ *executors_and_keys = ek.release();
+ }
+
+ return Status::OK();
+}
+
+void LocalSession::SaveStatefulNodes(Graph* graph) {
+ for (Node* n : graph->nodes()) {
+ if (n->op_def().is_stateful()) {
+ VLOG(2) << "Saving " << n->DebugString();
+ stateful_placements_[n->name()] = n->assigned_device_name();
+ }
+ }
+}
+
+void LocalSession::RestoreStatefulNodes(Graph* graph) {
+ for (Node* n : graph->nodes()) {
+ if (n->op_def().is_stateful()) {
+ auto iter = stateful_placements_.find(n->name());
+ if (iter != stateful_placements_.end()) {
+ n->set_assigned_device_name(iter->second);
+ VLOG(2) << "Restored " << n->DebugString();
+ }
+ }
+ }
+}
+
+Status LocalSession::CreateGraphs(gtl::ArraySlice<string> feeds,
+ gtl::ArraySlice<string> fetches,
+ gtl::ArraySlice<string> target_nodes,
+ std::unordered_map<string, Graph*>* outputs) {
+ Graph graph(OpRegistry::Global());
+ GraphConstructorOptions opts;
+
+ {
+ mutex_lock l(graph_def_lock_);
+ TF_RETURN_IF_ERROR(ConvertGraphDefToGraph(opts, graph_def_, &graph));
+ }
+
+ TF_RETURN_IF_ERROR(subgraph::RewriteGraphForExecution(
+ &graph, feeds, fetches, target_nodes,
+ device_set_.client_device()->attributes()));
+
+ // Run the simple placer after rewriting the graph.
+ std::unordered_map<string, int32> node_name_to_cost_map;
+ for (Node* n : graph.nodes()) {
+ node_name_to_cost_map[n->name()] = n->cost_id();
+ }
+ SimplePlacer placer(&graph, &device_set_, &node_name_to_cost_map, &options_);
+
+ {
+ mutex_lock l(mu_);
+ // Restore stateful nodes.
+ RestoreStatefulNodes(&graph);
+ TF_RETURN_IF_ERROR(placer.Run());
+ // Save stateful nodes.
+ SaveStatefulNodes(&graph);
+ }
+
+ // Partition the graph across devices.
+ std::unordered_map<string, GraphDef> partitions;
+ PartitionOptions popts;
+ popts.node_to_loc = [](const Node* node) {
+ return node->assigned_device_name();
+ };
+ popts.new_name = [this](const string& prefix) {
+ mutex_lock l(mu_);
+ return strings::StrCat(prefix, "/_", name_counter_++);
+ };
+ popts.get_incarnation = [](const string& name) {
+ // The local session does not have changing incarnation numbers.
+ // Just return '1'.
+ return 1;
+ };
+ popts.control_flow_added = false;
+ TF_RETURN_IF_ERROR(Partition(popts, &graph, &partitions));
+
+ std::vector<string> device_names;
+ for (auto device : devices_) {
+ // Extract the LocalName from the device.
+ device_names.push_back(DeviceNameUtils::LocalName(device->name()));
+ }
+
+ // Check for valid partitions.
+ for (const auto& partition : partitions) {
+ const string& local_partition_name =
+ DeviceNameUtils::LocalName(partition.first);
+ if (std::count(device_names.begin(), device_names.end(),
+ local_partition_name) == 0) {
+ return errors::InvalidArgument(
+ "Creating a partition for ", local_partition_name,
+ " which doesn't exist in the list of available devices. Available "
+ "devices: ",
+ str_util::Join(device_names, ","));
+ }
+ }
+
+ for (const auto& partition : partitions) {
+ const string& partition_name = partition.first;
+
+ const GraphDef& graph_def = partition.second;
+ VLOG(2) << "Created " << graph_def.DebugString() << " for "
+ << partition_name;
+
+ Graph* device_graph = new Graph(OpRegistry::Global());
+ GraphConstructorOptions device_opts;
+ // The partitioned graphs contain internal operations (e.g., send/recv)
+ // that must be allowed here.
+ device_opts.allow_internal_ops = true;
+ device_opts.expect_device_spec = true;
+ Status s =
+ ConvertGraphDefToGraph(device_opts, graph_def, device_graph);
+ if (!s.ok()) {
+ delete device_graph;
+ // Also delete other graphs created during the loop.
+ gtl::STLDeleteValues(outputs);
+ return s;
+ }
+ outputs->insert(std::make_pair(partition_name, device_graph));
+ }
+
+ return Status::OK();
+}
+
+::tensorflow::Status LocalSession::Close() {
+ cancellation_manager_->StartCancel();
+ return ::tensorflow::Status::OK();
+}
+
+class LocalSessionFactory : public SessionFactory {
+ public:
+ LocalSessionFactory() {}
+
+ Session* NewSession(const SessionOptions& options) override {
+ std::vector<Device*> devices;
+ DeviceFactory::AddDevices(options, "/job:localhost/replica:0/task:0",
+ &devices);
+ return new LocalSession(options, new DeviceMgr(devices));
+ }
+};
+
+class LocalSessionRegistrar {
+ public:
+ LocalSessionRegistrar() {
+ SessionFactory::Register("LOCAL_SESSION", new LocalSessionFactory());
+ }
+};
+static LocalSessionRegistrar registrar;
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/local_session.h b/tensorflow/core/common_runtime/local_session.h
new file mode 100644
index 0000000000..453cfdde47
--- /dev/null
+++ b/tensorflow/core/common_runtime/local_session.h
@@ -0,0 +1,109 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_LOCAL_SESSION_H_
+#define TENSORFLOW_COMMON_RUNTIME_LOCAL_SESSION_H_
+
+#include <memory>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/common_runtime/device_mgr.h"
+#include "tensorflow/core/common_runtime/device_set.h"
+#include "tensorflow/core/common_runtime/executor.h"
+#include "tensorflow/core/framework/cancellation.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class Device;
+
+class LocalSession : public Session {
+ public:
+ // Takes ownership of 'device_mgr'.
+ LocalSession(const SessionOptions& options, const DeviceMgr* device_mgr);
+ ~LocalSession() override;
+
+ ::tensorflow::Status Create(const GraphDef& graph) override;
+ ::tensorflow::Status Extend(const GraphDef& graph) override;
+ ::tensorflow::Status Run(const std::vector<std::pair<string, Tensor>>& inputs,
+ const std::vector<string>& output_names,
+ const std::vector<string>& target_nodes,
+ std::vector<Tensor>* outputs) override;
+ ::tensorflow::Status Close() override;
+
+ private:
+ struct ExecutorsAndKeys {
+ std::unordered_map<string, Executor*> device_executors;
+ std::unordered_map<string, string> input_keys;
+ std::unordered_map<string, string> output_keys;
+
+ ~ExecutorsAndKeys() {
+ for (auto it : device_executors) {
+ delete it.second;
+ }
+ }
+ };
+
+ // Retrieves an already existing set of executors to run 'inputs' and
+ // 'outputs', or creates and caches them for future use.
+ ::tensorflow::Status GetOrCreateExecutors(
+ gtl::ArraySlice<string> inputs, gtl::ArraySlice<string> outputs,
+ gtl::ArraySlice<string> target_nodes,
+ ExecutorsAndKeys** executors_and_keys);
+
+ // Creates several graphs from the existing graph_def_ and the given
+ // input feeds and fetches, one per partitioned device.
+ ::tensorflow::Status CreateGraphs(
+ gtl::ArraySlice<string> feeds, gtl::ArraySlice<string> fetches,
+ gtl::ArraySlice<string> target_nodes,
+ std::unordered_map<string, Graph*>* outputs);
+
+ ::tensorflow::Status ExtendLocked(const GraphDef& graph)
+ EXCLUSIVE_LOCKS_REQUIRED(graph_def_lock_);
+
+ const SessionOptions options_;
+
+ // Device structures.
+ const std::unique_ptr<const DeviceMgr> device_mgr_;
+ std::vector<Device*> devices_; // not owned
+ DeviceSet device_set_;
+
+ string session_handle_;
+ bool graph_created_ GUARDED_BY(graph_def_lock_) = false;
+
+ mutex graph_def_lock_;
+ GraphDef graph_def_ GUARDED_BY(graph_def_lock_);
+
+ mutex executor_lock_; // protects executors_
+ // Holds mappings from signature to the executors that process
+ // it. The reason for a level of indirection around mapped_type is
+ // to guarantee address stability.
+ std::unordered_map<string, ExecutorsAndKeys*> executors_
+ GUARDED_BY(executor_lock_);
+
+ CancellationManager* cancellation_manager_;
+
+ // Saves and restores device placements for stateful nodes.
+ mutex mu_;
+ void SaveStatefulNodes(Graph* graph) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+ void RestoreStatefulNodes(Graph* graph) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+ // Map of placed stateful nodes, i.e. nodes for which is_stateful()
+ // is true, such as "params" and "queue" nodes. Once placed, these
+ // nodes cannot be moved to a different device. Maps node names to
+ // device names.
+ std::unordered_map<string, string> stateful_placements_ GUARDED_BY(mu_);
+
+ // For generating unique names.
+ int64 name_counter_ GUARDED_BY(mu_) = 0;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(LocalSession);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_LOCAL_SESSION_H_
diff --git a/tensorflow/core/common_runtime/local_session_test.cc b/tensorflow/core/common_runtime/local_session_test.cc
new file mode 100644
index 0000000000..9325fe44c3
--- /dev/null
+++ b/tensorflow/core/common_runtime/local_session_test.cc
@@ -0,0 +1,314 @@
+#include "tensorflow/core/common_runtime/local_session.h"
+
+#include <map>
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/device_name_utils.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+Session* CreateSession() {
+ SessionOptions options;
+ (*options.config.mutable_device_count())["CPU"] = 2;
+ return NewSession(options);
+}
+
+class LocalSessionMinusAXTest : public ::testing::Test {
+ public:
+ void Initialize(std::initializer_list<float> a_values) {
+ RequireDefaultOps();
+ Graph graph(OpRegistry::Global());
+
+ Tensor a_tensor(DT_FLOAT, TensorShape({2, 2}));
+ test::FillValues<float>(&a_tensor, a_values);
+ Node* a = test::graph::Constant(&graph, a_tensor);
+ a->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+
+ Tensor x_tensor(DT_FLOAT, TensorShape({2, 1}));
+ test::FillValues<float>(&x_tensor, {1, 1});
+ Node* x = test::graph::Constant(&graph, x_tensor);
+ x->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:1");
+ x_ = x->name();
+
+ // y = A * x
+ Node* y = test::graph::Matmul(&graph, a, x, false, false);
+ y->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+ y_ = y->name();
+
+ Node* y_neg = test::graph::Unary(&graph, "Neg", y);
+ y_neg_ = y_neg->name();
+ y_neg->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:1");
+
+ test::graph::ToGraphDef(&graph, &def_);
+ }
+
+ string x_;
+ string y_;
+ string y_neg_;
+ GraphDef def_;
+};
+
+TEST_F(LocalSessionMinusAXTest, RunSimpleNetwork) {
+ Initialize({3, 2, -1, 0});
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def_));
+ std::vector<std::pair<string, Tensor>> inputs;
+
+ // Request two targets: one fetch output and one non-fetched output.
+ std::vector<string> output_names = {y_ + ":0"};
+ std::vector<string> target_nodes = {y_neg_};
+ std::vector<Tensor> outputs;
+ Status s = session->Run(inputs, output_names, target_nodes, &outputs);
+ ASSERT_OK(s);
+
+ ASSERT_EQ(1, outputs.size());
+ // The first output should be initialized and have the correct
+ // value.
+ auto mat = outputs[0].matrix<float>();
+ ASSERT_TRUE(outputs[0].IsInitialized());
+ EXPECT_FLOAT_EQ(5.0, mat(0, 0));
+}
+
+TEST_F(LocalSessionMinusAXTest, TestFeed) {
+ Initialize({1, 2, 3, 4});
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+
+ ASSERT_OK(session->Create(def_));
+
+ // Fill in the input and ask for the output
+ //
+ // Note that the input being fed is on the second device.
+ Tensor t(DT_FLOAT, TensorShape({2, 1}));
+ t.matrix<float>()(0, 0) = 5;
+ t.matrix<float>()(1, 0) = 6;
+ std::vector<std::pair<string, Tensor>> inputs = {{x_, t}};
+ std::vector<string> output_names = {y_ + ":0"};
+ std::vector<Tensor> outputs;
+
+ // Run the graph
+ Status s = session->Run(inputs, output_names, {}, &outputs);
+ ASSERT_OK(s);
+
+ ASSERT_EQ(1, outputs.size());
+ auto mat = outputs[0].matrix<float>();
+
+ // Expect outputs to be: 1*5 + 2*6, 3*5 + 4*6
+ EXPECT_FLOAT_EQ(17.0, mat(0, 0));
+ EXPECT_FLOAT_EQ(39.0, mat(1, 0));
+}
+
+TEST_F(LocalSessionMinusAXTest, TestConcurrency) {
+ Initialize({1, 2, 3, 4});
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def_));
+
+ // Set up a thread pool for running the graph concurrently.
+ thread::ThreadPool* tp = new thread::ThreadPool(Env::Default(), "test", 4);
+
+ // Run the graph 1000 times in 4 different threads concurrently.
+ std::vector<string> output_names = {y_ + ":0"};
+ auto fn = [&session, output_names]() {
+ for (int i = 0; i < 1000; ++i) {
+ std::vector<std::pair<string, Tensor>> inputs;
+ std::vector<Tensor> outputs;
+ // Run the graph
+ Status s = session->Run(inputs, output_names, {}, &outputs);
+ ASSERT_TRUE(s.ok());
+ ASSERT_EQ(1, outputs.size());
+ auto mat = outputs[0].matrix<float>();
+ EXPECT_FLOAT_EQ(3.0, mat(0, 0));
+ }
+ };
+
+ for (int i = 0; i < 4; ++i) {
+ tp->Schedule(fn);
+ }
+
+ // Wait for the functions to finish.
+ delete tp;
+}
+
+TEST_F(LocalSessionMinusAXTest, TwoCreateCallsFails) {
+ Initialize({1, 2, 3, 4});
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def_));
+
+ // The first Create() succeeded; a second call must fail.
+ ASSERT_FALSE(session->Create(def_).ok());
+}
+
+TEST_F(LocalSessionMinusAXTest, ForgetToCreate) {
+ Initialize({1, 2, 3, 4});
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ std::vector<std::pair<string, Tensor>> inputs;
+ std::vector<Tensor> outputs;
+ ASSERT_FALSE(session->Run(inputs, {y_ + ":0"}, {y_neg_}, &outputs).ok());
+}
+
+TEST_F(LocalSessionMinusAXTest, InvalidDevice) {
+ GraphDef def;
+ Graph graph(OpRegistry::Global());
+
+ Tensor a_tensor(DT_FLOAT, TensorShape({2, 2}));
+ a_tensor.flat<float>().setRandom();
+ Node* a = test::graph::Constant(&graph, a_tensor);
+ a->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+ Tensor x_tensor(DT_FLOAT, TensorShape({2, 1}));
+ x_tensor.flat<float>().setRandom();
+ Node* x = test::graph::Constant(&graph, x_tensor);
+ x->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:1");
+ // Assign y to an invalid device (only cpu:0 and cpu:1 exist).
+ Node* y = test::graph::Matmul(&graph, a, x, false, false);
+ y->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:2");
+
+ test::graph::ToGraphDef(&graph, &def);
+
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def));
+ std::vector<std::pair<string, Tensor>> inputs;
+ std::vector<string> output_names = {y->name() + ":0"};
+ std::vector<Tensor> outputs;
+
+ // Should return an error.
+ ASSERT_FALSE(session->Run(inputs, output_names, {}, &outputs).ok());
+
+ // Fix placement and run again
+ def.Clear();
+ y->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:1");
+ test::graph::ToGraphDef(&graph, &def);
+ session.reset(CreateSession());
+ ASSERT_OK(session->Create(def));
+ ASSERT_OK(session->Run(inputs, output_names, {}, &outputs));
+}
+
+TEST(LocalSessionTest, KeepsStateAcrossRunsOfSession) {
+ GraphDef def;
+ Graph g(OpRegistry::Global());
+ Node* var = test::graph::Var(&g, DT_FLOAT, TensorShape({10}));
+ var->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+
+ Tensor twenty(DT_FLOAT, TensorShape({10}));
+ for (int i = 0; i < 10; ++i) {
+ twenty.flat<float>()(i) = 20.0;
+ }
+
+ Node* twenty_node = test::graph::Constant(&g, twenty);
+ twenty_node->set_assigned_device_name(
+ "/job:localhost/replica:0/task:0/cpu:0");
+
+ Node* init = test::graph::Assign(&g, var, twenty_node);
+ init->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+
+ test::graph::ToGraphDef(&g, &def);
+
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def));
+
+ std::vector<std::pair<string, Tensor>> inputs;
+ std::vector<Tensor> outputs;
+
+ // Initialize the variable
+ Status s = session->Run(inputs, {init->name()}, {}, &outputs);
+ ASSERT_OK(s);
+
+ // Get the variable's data
+ s = session->Run(inputs, {var->name() + ":0"}, {}, &outputs);
+ ASSERT_OK(s);
+ ASSERT_EQ(1, outputs.size());
+ ASSERT_TRUE(outputs[0].IsInitialized());
+ EXPECT_EQ(20.0, outputs[0].flat<float>()(0));
+}
+
+TEST(LocalSessionTest, MultipleFeedTest) {
+ GraphDef def;
+ Graph g(OpRegistry::Global());
+ Node* var = test::graph::Var(&g, DT_FLOAT, TensorShape({10}));
+ var->set_assigned_device_name("/job:localhost/replica:0/task:0/cpu:0");
+
+ Tensor first_value(DT_FLOAT, TensorShape({}));
+ first_value.scalar<float>()() = 1.0;
+ Node* first_const = test::graph::Constant(&g, first_value);
+ Node* first_identity = test::graph::Identity(&g, first_const);
+
+ Tensor second_value(DT_FLOAT, TensorShape({}));
+ second_value.scalar<float>()() = 2.0;
+ Node* second_const = test::graph::Constant(&g, second_value);
+ Node* second_identity = test::graph::Identity(&g, second_const);
+
+ test::graph::ToGraphDef(&g, &def);
+
+ std::unique_ptr<Session> session(CreateSession());
+ ASSERT_TRUE(session != nullptr);
+ ASSERT_OK(session->Create(def));
+
+ std::vector<Tensor> outputs;
+
+ // Fetch without feeding.
+ Status s = session->Run(
+ {}, {first_identity->name() + ":0", second_identity->name() + ":0"}, {},
+ &outputs);
+ ASSERT_TRUE(s.ok());
+ ASSERT_EQ(2, outputs.size());
+ ASSERT_EQ(1.0, outputs[0].flat<float>()(0));
+ ASSERT_EQ(2.0, outputs[1].flat<float>()(0));
+
+ s = session->Run(
+ {}, {second_identity->name() + ":0", first_identity->name() + ":0"}, {},
+ &outputs);
+ ASSERT_TRUE(s.ok());
+ ASSERT_EQ(2, outputs.size());
+ ASSERT_EQ(2.0, outputs[0].flat<float>()(0));
+ ASSERT_EQ(1.0, outputs[1].flat<float>()(0));
+
+ Tensor value_11(DT_FLOAT, TensorShape({}));
+ value_11.scalar<float>()() = 11.0;
+ Tensor value_22(DT_FLOAT, TensorShape({}));
+ value_22.scalar<float>()() = 22.0;
+
+ // Feed [first_const, second_const]
+ s = session->Run(
+ {{first_const->name(), value_11}, {second_const->name(), value_22}},
+ {first_identity->name() + ":0", second_identity->name() + ":0"}, {},
+ &outputs);
+ ASSERT_TRUE(s.ok());
+ ASSERT_EQ(2, outputs.size());
+ ASSERT_EQ(11.0, outputs[0].flat<float>()(0));
+ ASSERT_EQ(22.0, outputs[1].flat<float>()(0));
+
+ // Feed [second_const, first_const]
+ s = session->Run(
+ {{second_const->name(), value_22}, {first_const->name(), value_11}},
+ {first_identity->name() + ":0", second_identity->name() + ":0"}, {},
+ &outputs);
+ ASSERT_TRUE(s.ok());
+ ASSERT_EQ(2, outputs.size());
+ ASSERT_EQ(11.0, outputs[0].flat<float>()(0));
+ ASSERT_EQ(22.0, outputs[1].flat<float>()(0));
+}
+
+} // namespace
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/rendezvous_mgr.cc b/tensorflow/core/common_runtime/rendezvous_mgr.cc
new file mode 100644
index 0000000000..111dea6d4c
--- /dev/null
+++ b/tensorflow/core/common_runtime/rendezvous_mgr.cc
@@ -0,0 +1,170 @@
+#include "tensorflow/core/common_runtime/rendezvous_mgr.h"
+
+#include <unordered_set>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_mgr.h"
+#if (!defined(PLATFORM_POSIX_ANDROID) && !defined(PLATFORM_GOOGLE_ANDROID)) && \
+ (defined(PLATFORM_GOOGLE) || GOOGLE_CUDA)
+#include "tensorflow/core/common_runtime/gpu/gpu_util.h"
+#endif
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+namespace {
+
+void CopyTensorBetweenDevices(const string& id, DeviceContext* send_dev_context,
+ DeviceContext* recv_dev_context, Device* src,
+ Device* dst,
+ const AllocatorAttributes src_alloc_attr,
+ const AllocatorAttributes dst_alloc_attr,
+ const Tensor* input, Tensor* output,
+ std::function<void(const Status&)> done) {
+ if (src->attributes().device_type() != dst->attributes().device_type()) {
+ done(errors::Unimplemented(
+ "Copy between device types not yet implemented: src=", src->name(),
+ " dst=", dst->name()));
+ } else if (src->attributes().device_type() != "CPU") {
+ done(errors::Unimplemented(
+ "Copy between non-CPU devices not yet implemented"));
+ }
+ *output = *input;
+ done(Status::OK());
+}
+
+#if (!defined(PLATFORM_POSIX_ANDROID) && !defined(PLATFORM_GOOGLE_ANDROID)) && \
+ (defined(PLATFORM_GOOGLE) || GOOGLE_CUDA)
+constexpr auto CopyTensorBetweenDevicesFunc = &GPUUtil::CopyViaDMA;
+#else
+constexpr auto CopyTensorBetweenDevicesFunc = &CopyTensorBetweenDevices;
+#endif
+
+} // end namespace
+
+IntraProcessRendezvous::IntraProcessRendezvous(const DeviceMgr* device_mgr)
+ : device_mgr_(device_mgr), local_(NewLocalRendezvous()) {}
+
+IntraProcessRendezvous::~IntraProcessRendezvous() { local_->Unref(); }
+
+Status IntraProcessRendezvous::Send(const string& key,
+ const Rendezvous::Args& args,
+ const Tensor& val, const bool is_dead) {
+ VLOG(1) << "IntraProcessRendezvous Send " << this << " " << key;
+ {
+ mutex_lock l(mu_);
+ if (!status_.ok()) return status_;
+ }
+ Rendezvous::ParsedKey parsed;
+ TF_RETURN_IF_ERROR(Rendezvous::ParseKey(key, &parsed));
+
+ // Buffers "val" and "device_context" in local_.
+ return local_->Send(key, args, val, is_dead);
+}
+
+Status IntraProcessRendezvous::ParseKey(const string& key, bool is_src,
+ Rendezvous::ParsedKey* parsed) {
+ {
+ mutex_lock l(mu_);
+ if (!status_.ok()) return status_;
+ }
+ TF_RETURN_IF_ERROR(Rendezvous::ParseKey(key, parsed));
+ return Status::OK();
+}
+
+void IntraProcessRendezvous::SameWorkerRecvDone(
+ const Rendezvous::ParsedKey& parsed, const Rendezvous::Args& send_args,
+ const Rendezvous::Args& recv_args, const Tensor& in, Tensor* out,
+ StatusCallback done) {
+ // Do a quick copy (sharing the underlying buffer) if both tensors
+ // are on host memory.
+ const bool src_host =
+ (send_args.alloc_attrs.on_host() || parsed.src.type == "CPU");
+ const bool dst_host =
+ (recv_args.alloc_attrs.on_host() || parsed.dst.type == "CPU");
+ if (src_host && dst_host) {
+ *out = in;
+ done(Status::OK());
+ return;
+ }
+
+ // This copy must involve a non-CPU device. Hence, "in" must support DMA
+ // (e.g., string tensors do not work on GPU).
+ if (!DataTypeCanUseMemcpy(in.dtype())) {
+ done(errors::InvalidArgument("Non-DMA-safe ", DataTypeString(in.dtype()),
+ " tensor may not be copied from/to a GPU."));
+ return;
+ }
+
+ Device* src_device;
+ Status s = device_mgr_->LookupDevice(parsed.src_device, &src_device);
+ if (!s.ok()) {
+ done(s);
+ return;
+ }
+ Device* dst_device;
+ s = device_mgr_->LookupDevice(parsed.dst_device, &dst_device);
+ if (!s.ok()) {
+ done(s);
+ return;
+ }
+
+ AllocatorAttributes attr = recv_args.alloc_attrs;
+ attr.set_gpu_compatible(send_args.alloc_attrs.gpu_compatible() ||
+ recv_args.alloc_attrs.gpu_compatible());
+ Allocator* out_allocator = dst_device->GetAllocator(attr);
+ Tensor copy(out_allocator, in.dtype(), in.shape());
+ *out = copy;
+
+ CopyTensorBetweenDevicesFunc(parsed.edge_name, send_args.device_context,
+ recv_args.device_context, src_device, dst_device,
+ send_args.alloc_attrs, recv_args.alloc_attrs,
+ &in, out, done);
+}
+
+void IntraProcessRendezvous::RecvAsync(const string& key,
+ const Rendezvous::Args& recv_args,
+ DoneCallback done) {
+ VLOG(1) << "IntraProcessRendezvous Recv " << this << " " << key;
+
+ Rendezvous::ParsedKey parsed;
+ Status s = ParseKey(key, false /*!is_src*/, &parsed);
+ if (!s.ok()) {
+ done(s, Args(), recv_args, Tensor(), false);
+ return;
+ }
+
+ // Recv the tensor from local_.
+ local_->RecvAsync(key, recv_args, [this, parsed, done](
+ const Status& status,
+ const Rendezvous::Args& send_args,
+ const Rendezvous::Args& recv_args,
+ const Tensor& in, bool is_dead) {
+ Status s = status;
+ Tensor* out = new Tensor;
+ StatusCallback final_callback = [done, send_args, recv_args, out,
+ is_dead](const Status& s) {
+ done(s, send_args, recv_args, *out, is_dead);
+ delete out;
+ };
+
+ if (s.ok()) {
+ SameWorkerRecvDone(parsed, send_args, recv_args, in, out, final_callback);
+ } else {
+ final_callback(s);
+ }
+ });
+}
+
+void IntraProcessRendezvous::StartAbort(const Status& s) {
+ CHECK(!s.ok());
+ local_->StartAbort(s);
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/common_runtime/rendezvous_mgr.h b/tensorflow/core/common_runtime/rendezvous_mgr.h
new file mode 100644
index 0000000000..eaae65f956
--- /dev/null
+++ b/tensorflow/core/common_runtime/rendezvous_mgr.h
@@ -0,0 +1,73 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_RENDEZVOUS_MGR_H_
+#define TENSORFLOW_COMMON_RUNTIME_RENDEZVOUS_MGR_H_
+
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/common_runtime/device_mgr.h"
+#include "tensorflow/core/framework/rendezvous.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+// IntraProcessRendezvous is a Rendezvous which expects all producers
+// and consumers to be devices immediately accessible within the
+// process. That is, it will never be necessary to perform an RPC to
+// communicate with either.
+//
+// Buffering of Tensor values is delegated to a "local" Rendezvous
+// obtained from NewLocalRendezvous(). This class just adds
+// functionality to coordinate multiple process-local devices.
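+//
+// A minimal usage sketch (the key inputs and the callback body are
+// illustrative only):
+//
+//   IntraProcessRendezvous* rendez = new IntraProcessRendezvous(device_mgr);
+//   string key = Rendezvous::CreateKey(src_device, src_incarnation,
+//                                      dst_device, "x:0", FrameAndIter(0, 0));
+//   TF_CHECK_OK(rendez->Send(key, Rendezvous::Args(), tensor, false));
+//   rendez->RecvAsync(key, Rendezvous::Args(),
+//                     [](const Status& s, const Rendezvous::Args& send_args,
+//                        const Rendezvous::Args& recv_args, const Tensor& t,
+//                        bool is_dead) { /* consume t */ });
+//   rendez->Unref();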
+class IntraProcessRendezvous : public Rendezvous {
+ public:
+ explicit IntraProcessRendezvous(const DeviceMgr* device_mgr);
+
+ // Forwards to local_, where the Tensor "val" will be buffered and
+ // any waiting callback stored.
+ Status Send(const string& key, const Rendezvous::Args& args,
+ const Tensor& val, const bool is_dead) override;
+
+ // This method is called only by the RecvOp. It forwards the request
+ // to local_ and, once the tensor is available, copies it to the
+ // consumer's device within this process if necessary.
+ void RecvAsync(const string& key, const Rendezvous::Args& args,
+ DoneCallback done) override;
+
+ void StartAbort(const Status& status) override;
+
+ private:
+ const DeviceMgr* device_mgr_;
+ Rendezvous* local_; // Owns a Ref on this object.
+
+ mutable mutex mu_;
+
+ // Status given by StartAbort() if any.
+ Status status_ GUARDED_BY(mu_);
+
+ ~IntraProcessRendezvous() override;
+
+ // Parses "key" into "parsed". If "is_src" is true, checks that the
+ // rendezvous key's source is in this process. If "is_src" is false,
+ // checks that the rendezvous key's destination is in this process.
+ Status ParseKey(const string& key, bool is_src,
+ Rendezvous::ParsedKey* parsed);
+
+ // Callback handling the case when a rendezvous has been
+ // accomplished in local_ and the consumer is local to this process.
+ // Tensor "in" will be copied into "out". The key "parsed" encodes
+ // the src and dst devices.
+ typedef std::function<void(const Status&)> StatusCallback;
+ void SameWorkerRecvDone(const Rendezvous::ParsedKey& parsed,
+ const Rendezvous::Args& send_args,
+ const Rendezvous::Args& recv_args, const Tensor& in,
+ Tensor* out, StatusCallback done);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(IntraProcessRendezvous);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_RENDEZVOUS_MGR_H_
diff --git a/tensorflow/core/common_runtime/session.cc b/tensorflow/core/common_runtime/session.cc
new file mode 100644
index 0000000000..6d1ab5cea4
--- /dev/null
+++ b/tensorflow/core/common_runtime/session.cc
@@ -0,0 +1,51 @@
+#include <string>
+
+#include "tensorflow/core/common_runtime/session_factory.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/session.h"
+
+namespace tensorflow {
+
+namespace {
+Status GetFactory(const SessionOptions& options, SessionFactory** ret) {
+ string runtime_type = "LOCAL_SESSION";
+ if (!options.target.empty()) {
+ // Use the service based session.
+ runtime_type = "REMOTE_SESSION";
+ }
+ *ret = SessionFactory::GetFactory(runtime_type);
+ if (!*ret) {
+ return errors::NotFound("Could not find session factory for ",
+ runtime_type);
+ }
+ return Status::OK();
+}
+} // end namespace
+
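+// A minimal usage sketch (illustrative; error handling elided). An empty
+// SessionOptions::target selects the "LOCAL_SESSION" factory, while a
+// non-empty target selects "REMOTE_SESSION":
+//
+//   SessionOptions options;               // target is empty by default
+//   Session* session = NewSession(options);
+//   // ... Create()/Run() the session, then delete it when finished ...
+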
+Session* NewSession(const SessionOptions& options) {
+ SessionFactory* factory;
+ Status s = GetFactory(options, &factory);
+ if (!s.ok()) {
+ LOG(ERROR) << s;
+ return nullptr;
+ }
+ return factory->NewSession(options);
+}
+
+Status NewSession(const SessionOptions& options, Session** out_session) {
+ SessionFactory* factory;
+ Status s = GetFactory(options, &factory);
+ if (!s.ok()) {
+ *out_session = nullptr;
+ LOG(ERROR) << s;
+ return s;
+ }
+ *out_session = factory->NewSession(options);
+ if (!*out_session) {
+ return errors::Internal("Failed to create session.");
+ }
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/session_factory.cc b/tensorflow/core/common_runtime/session_factory.cc
new file mode 100644
index 0000000000..666b99812d
--- /dev/null
+++ b/tensorflow/core/common_runtime/session_factory.cc
@@ -0,0 +1,41 @@
+#include "tensorflow/core/common_runtime/session_factory.h"
+
+#include <unordered_map>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+namespace tensorflow {
+namespace {
+
+static mutex* get_session_factory_lock() {
+ static mutex session_factory_lock;
+ return &session_factory_lock;
+}
+
+typedef std::unordered_map<string, SessionFactory*> SessionFactories;
+SessionFactories* session_factories() {
+ static SessionFactories* factories = new SessionFactories;
+ return factories;
+}
+
+} // namespace
+
+void SessionFactory::Register(const string& runtime_type,
+ SessionFactory* factory) {
+ mutex_lock l(*get_session_factory_lock());
+ if (!session_factories()->insert({runtime_type, factory}).second) {
+ LOG(ERROR) << "Two session factories are being registered "
+ << "under " << runtime_type;
+ }
+}
+
+SessionFactory* SessionFactory::GetFactory(const string& runtime_type) {
+ mutex_lock l(*get_session_factory_lock()); // could use reader lock
+ auto it = session_factories()->find(runtime_type);
+ if (it == session_factories()->end()) {
+ return nullptr;
+ }
+ return it->second;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/session_factory.h b/tensorflow/core/common_runtime/session_factory.h
new file mode 100644
index 0000000000..f770ba93ff
--- /dev/null
+++ b/tensorflow/core/common_runtime/session_factory.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_SESSION_FACTORY_H_
+#define TENSORFLOW_COMMON_RUNTIME_SESSION_FACTORY_H_
+
+#include <string>
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class Session;
+class SessionOptions;
+
+class SessionFactory {
+ public:
+ virtual Session* NewSession(const SessionOptions& options) = 0;
+ virtual ~SessionFactory() {}
+ static void Register(const string& runtime_type, SessionFactory* factory);
+ static SessionFactory* GetFactory(const string& runtime_type);
+};
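+
+// Concrete factories typically register themselves from a static
+// initializer in their own translation unit. A sketch (the factory class
+// and runtime type name below are illustrative, not defined here):
+//
+//   class MySessionFactory : public SessionFactory {
+//    public:
+//     Session* NewSession(const SessionOptions& options) override {
+//       ...
+//     }
+//   };
+//
+//   static bool registered = [] {
+//     SessionFactory::Register("MY_SESSION", new MySessionFactory);
+//     return true;
+//   }();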
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_SESSION_FACTORY_H_
diff --git a/tensorflow/core/common_runtime/session_options.cc b/tensorflow/core/common_runtime/session_options.cc
new file mode 100644
index 0000000000..ef585efb5c
--- /dev/null
+++ b/tensorflow/core/common_runtime/session_options.cc
@@ -0,0 +1,9 @@
+#include "tensorflow/core/public/session_options.h"
+
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+SessionOptions::SessionOptions() : env(Env::Default()) {}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/session_test.cc b/tensorflow/core/common_runtime/session_test.cc
new file mode 100644
index 0000000000..82b5d7ffb0
--- /dev/null
+++ b/tensorflow/core/common_runtime/session_test.cc
@@ -0,0 +1,17 @@
+#include "tensorflow/core/public/session.h"
+
+#include "tensorflow/core/public/session_options.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(SessionTest, InvalidTargetReturnsNull) {
+ SessionOptions options;
+ options.target = "invalid target";
+
+ EXPECT_EQ(nullptr, tensorflow::NewSession(options));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/simple_placer.cc b/tensorflow/core/common_runtime/simple_placer.cc
new file mode 100644
index 0000000000..1cd1db29db
--- /dev/null
+++ b/tensorflow/core/common_runtime/simple_placer.cc
@@ -0,0 +1,559 @@
+#include "tensorflow/core/common_runtime/simple_placer.h"
+
+#include <memory>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+namespace {
+
+// Returns a list of devices sorted by name from 'devices' whose type is in
+// 'supported_device_types'. This function searches in order of the device
+// types in 'supported_device_types' and returns the *first* subset of devices
+// that match.
+//
+// For example, if supported_device_types contains {GPU, CPU} and
+// 'devices' contains CPU and GPU devices, the returned vector will
+// include *only* GPU devices, since that is higher in the priority
+// order in 'supported_device_types'.
+std::vector<Device*> FilterSupportedDevices(
+ const std::vector<Device*>& devices,
+ const DeviceTypeVector& supported_device_types) {
+ std::vector<Device*> filtered_devices;
+ auto device_sort = [](const Device* a, const Device* b) {
+ return a->name() < b->name();
+ };
+ for (DeviceType d : supported_device_types) {
+ for (Device* device : devices) {
+ if (DeviceType(device->attributes().device_type()) == d) {
+ filtered_devices.emplace_back(device);
+ }
+ }
+
+ // If there are any devices under this device type, return this
+ // subset.
+ if (!filtered_devices.empty()) {
+ std::sort(filtered_devices.begin(), filtered_devices.end(), device_sort);
+ return filtered_devices;
+ }
+ }
+
+ std::sort(filtered_devices.begin(), filtered_devices.end(), device_sort);
+ return filtered_devices;
+}
+
+bool HasColocatedNodeName(const Node& node) {
+ return StringPiece(node.def().device()).starts_with("@");
+}
+
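+// For example, a node whose NodeDef device field is "@var_0" is not naming
+// a device at all: it requests colocation with the node named "var_0".
+// ParseColocatedNodeName() below extracts that node name.
+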
+Status ParseColocatedNodeName(const Node& node,
+ string* out_colocated_node_name) {
+ StringPiece device(node.def().device());
+ if (!device.Consume("@")) {
+ return errors::InvalidArgument("Malformed colocated node name: '", device,
+ "'");
+ }
+ // TODO(mrry): Validate that the node name is a valid node name.
+ *out_colocated_node_name = device.ToString();
+ return Status::OK();
+}
+
+// This class maintains the connected components of a colocation
+// constraint graph, and uses this information to assign a satisfying
+// device placement to the nodes of the graph.
+//
+// The typical usage pattern is:
+//
+// Graph graph = ...;
+// DeviceSet device_set = ...;
+// ColocationGraph colocation_graph(graph, device_set);
+//
+// // Add all the nodes of graph to colocation_graph.
+// for (Node* node : graph.nodes()) {
+// TF_RETURN_IF_ERROR(colocation_graph.AddNode(*node));
+// }
+//
+// // Add one or more colocation constraint.
+// Node node_1 = *graph.FindNodeId(...);
+// Node node_2 = *graph.FindNodeId(...);
+// TF_RETURN_IF_ERROR(colocation_graph.ColocateNodes(node_1, node_2));
+//
+// // Assign devices based on the accumulated constraints.
+// for (Node* node : graph.nodes()) {
+// TF_RETURN_IF_ERROR(colocation_graph.AssignDevice(node));
+// }
+//
+// The implementation uses the union-find algorithm to maintain the
+// connected components efficiently and incrementally as edges
+// (implied by ColocationGraph::ColocateNodes() invocations) are added.
+class ColocationGraph {
+ public:
+ ColocationGraph(Graph* graph, const DeviceSet* device_set,
+ const SessionOptions* options)
+ : device_set_(device_set),
+ device_types_(device_set->PrioritizedDeviceTypeList()),
+ options_(options) {
+ members_.reserve(graph->num_node_ids());
+ }
+
+ // Adds the given node to this ColocationGraph as a singleton.
+ //
+ // NOTE: The implementation assumes that the ids of nodes passed to
+ // this method are dense and zero-based; the memory used will be linear in
+ // the largest node ID.
+ // NOTE: If this method returns an error, *this is left in an undefined
+ // state.
+ Status AddNode(const Node& node) {
+ Member member;
+ TF_RETURN_IF_ERROR(InitializeMember(node, &member));
+ CHECK_GE(member.parent, 0);
+ members_.resize(member.parent + 1);
+ members_[member.parent] = std::move(member);
+ return Status::OK();
+ }
+
+ // Merge the (possibly disjoint) sets containing nodes "x" and
+ // "y". Returns OK if all the nodes in the union of these sets can
+ // be placed on the same device type.
+ //
+ // NOTE: If this method returns an error, *this is left in an undefined
+ // state.
+ Status ColocateNodes(const Node& x, const Node& y) {
+ int x_root = FindRoot(x.id());
+ int y_root = FindRoot(y.id());
+ if (x_root != y_root) {
+ // Merge the sets by swinging the parent pointer of the smaller
+ // tree to point to the root of the larger tree. Together with
+ // path compression in ColocationGraph::FindRoot, this ensures
+ // that we do not experience pathological performance on graphs
+ // such as chains.
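+ //
+ // For example, attaching a singleton set (rank 0) to a two-level tree
+ // (rank 1) leaves the larger tree's rank at 1, while merging two rank-1
+ // trees picks x_root as the new root and bumps its rank to 2.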
+ int new_root, old_root;
+ if (members_[x_root].rank < members_[y_root].rank) {
+ // The tree rooted at x_root is shallower, so connect it to
+ // y_root. The rank of y_root is unchanged because its new
+ // child has strictly less rank.
+ members_[x_root].parent = y_root;
+ new_root = y_root;
+ old_root = x_root;
+ } else if (members_[x_root].rank > members_[y_root].rank) {
+ // The tree rooted at y_root is shallower, so connect it to
+ // x_root. The rank of x_root is unchanged because its new
+ // child has strictly less rank.
+ members_[y_root].parent = x_root;
+ new_root = x_root;
+ old_root = y_root;
+ } else {
+ // Both trees have the same rank, so break the tie by choosing
+ // x_root as the new root.
+ members_[y_root].parent = x_root;
+ // Increment the rank of the tree rooted at x_root, because it
+ // is now strictly deeper than before.
+ ++members_[x_root].rank;
+ new_root = x_root;
+ old_root = y_root;
+ }
+
+ // Merge the partial device specifications, and ensure that they are
+ // compatible. NULL options_ is treated as allowing soft placement.
+ // TODO(mrry): Consider enriching the error message by pointing
+ // out which nodes have the explicit partial device
+ // specifications that caused this conflict.
+ TF_RETURN_IF_ERROR(DeviceNameUtils::MergeDevNames(
+ &members_[new_root].device_name, members_[old_root].device_name,
+ options_ == nullptr || options_->config.allow_soft_placement()));
+
+ // Ensure that the common root has at least one supported device
+ // type, by computing the intersection of
+ // members_[new_root].supported_device_types and
+ // members_[old_root].supported_device_types.
+ MergeSupportedDevices(&members_[new_root].supported_device_types,
+ members_[old_root].supported_device_types);
+ if (members_[new_root].supported_device_types.empty()) {
+ return errors::InvalidArgument(
+ "Cannot colocate nodes '", x.name(), "' and '", y.name(),
+ "' because no device type supports both of those nodes and the "
+ "other nodes colocated with them");
+ }
+ }
+ return Status::OK();
+ }
+
+ // For the given node, subject to the constraints previously given
+ // to this ColocationGraph, set its assigned_device_name. Returns OK
+ // if a satisfying device can be found, otherwise an error.
+ Status AssignDevice(Node* node) {
+ int node_root = FindRoot(node->id());
+ if (members_[node_root].assigned_device == nullptr) {
+ // We have not yet assigned a device for the colocated node set containing
+ // n, so we do so now using the constraints on the root node.
+
+ // "devices" will contain the set of feasible placements for the
+ // colocated node set containing n.
+ std::vector<Device*> devices;
+ if (DeviceNameUtils::HasSomeDetails(members_[node_root].device_name)) {
+ // The root node has a (possibly partial) device
+ // specification, so enumerate the physical devices that
+ // conform to it.
+ device_set_->FindMatchingDevices(members_[node_root].device_name,
+ &devices);
+
+ if (!devices.empty()) {
+ // Filter devices into those that are compatible with the root
+ // node (and its children).
+ devices = FilterSupportedDevices(
+ devices, members_[node_root].supported_device_types);
+ }
+
+ // Perform soft placement if allow_soft_placement is set. options_
+ // being NULL is treated as allowing soft placement.
+ if (devices.empty() &&
+ (options_ == nullptr || options_->config.allow_soft_placement())) {
+ // The soft_device_name is the same as the node's device name
+ // without specifying the device type or ID.
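+ // For example, a request for "/gpu:11" when no such device exists is
+ // relaxed here to an unconstrained name, so any supported device
+ // (preferring GPU) may be chosen instead.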
+ DeviceNameUtils::ParsedName soft_device_name =
+ members_[node_root].device_name;
+ soft_device_name.type.clear();
+ soft_device_name.has_type = false;
+ soft_device_name.has_id = false;
+ device_set_->FindMatchingDevices(soft_device_name, &devices);
+ if (!devices.empty()) {
+ devices = FilterSupportedDevices(
+ devices, members_[node_root].supported_device_types);
+ }
+ }
+
+ if (devices.empty()) {
+ // Return an error when a physical device that matches an explicit
+ // device specification is not found. This ensures that we don't
+ // assign a node to GPU when the user wanted to force it on CPU.
+ DeviceNameUtils::ParsedName specified_device_name;
+ if (DeviceNameUtils::ParseFullName(node->def().device(),
+ &specified_device_name) &&
+ specified_device_name == members_[node_root].device_name) {
+ // The specified device and merged set device match, and
+ // will appear in the GraphDef (for debugging), so just
+ // print the specified device.
+ return errors::InvalidArgument(
+ "Could not satisfy explicit device specification '",
+ node->def().device(), "'");
+ } else {
+ // The specified device may be a valid device but the
+ // merged set device is different, so print both.
+ return errors::InvalidArgument(
+ "Could not satisfy explicit device specification '",
+ node->def().device(),
+ "' because the node was colocated with a group of nodes that "
+ "required incompatible device '",
+ DeviceNameUtils::ParsedNameToString(
+ members_[node_root].device_name),
+ "'");
+ }
+ }
+ } else {
+ // The device is completely unspecified, so enumerate the devices that
+ // support all of the nodes in the set.
+ if (device_set_->devices().empty()) {
+ return errors::Internal("No devices are registered");
+ }
+ devices = FilterSupportedDevices(
+ device_set_->devices(), members_[node_root].supported_device_types);
+
+ if (devices.empty()) {
+ return errors::InvalidArgument(
+ "Node had no OpKernel registered to support this operation: ",
+ "Operation was ", node->type_string(), " and inputs were ",
+ DataTypeVectorString(node->input_types()));
+ }
+ }
+
+ // Use the first device in the sorted devices list so that we always
+ // deterministically choose the same device.
+ members_[node_root].assigned_device = devices[0];
+ }
+ node->set_assigned_device_name(members_[node_root].assigned_device->name());
+
+ // Log placement if log_device_placement is set.
+ if (options_ && options_->config.log_device_placement()) {
+ printf("%s: %s\n", node->name().c_str(),
+ node->assigned_device_name().c_str());
+ LOG(INFO) << node->name() << ": " << node->assigned_device_name();
+ }
+
+ return Status::OK();
+ }
+
+ private:
+ // Represents a node in the disjoint node set forest, and the
+ // accumulated constraints on the device used by that node.
+ struct Member {
+ Member() = default;
+ // The id of the node that is the parent of this one, or its own
+ // id if it is a root. parent < 0 indicates that this member is invalid.
+ int parent = -1;
+ // A proxy for the depth of the tree that is used to prefer
+ // connecting smaller trees to larger trees when merging disjoint
+ // sets.
+ int rank = 0;
+ // The intersection of all device types supported by this node,
+ // and those of all of its children, in priority order
+ // of the preferred device.
+ DeviceTypeVector supported_device_types;
+ // The merged form of the device requested for this node, with
+ // those of all of its children.
+ DeviceNameUtils::ParsedName device_name;
+ // If this node is a root, stores the Device to which this node
+ // and all of its children have been assigned, or nullptr if this
+ // has not yet been computed by AssignDevice().
+ Device* assigned_device = nullptr;
+ };
+
+ Status InitializeMember(const Node& node, Member* member) {
+ const int id = node.id();
+ if (id < 0) {
+ return errors::InvalidArgument("Node id was negative: ", id);
+ }
+ member->parent = id;
+ TF_RETURN_IF_ERROR(SupportedDeviceTypesForNode(
+ device_types_, node.def(), &member->supported_device_types));
+
+ if (!node.assigned_device_name().empty()) {
+ // This node has already been assigned to a device, so we
+ // respect this placement, after sanity-checking it. The
+ // device_name and supported_device_types for this node reflect
+ // the assigned device, so any nodes colocated with this node
+ // will be assigned to the same device (assuming this is
+ // possible).
+ // NOTE: Since any assignment must have been performed by
+ // the TensorFlow runtime, we consider errors in this branch to
+ // be INTERNAL.
+ if (!DeviceNameUtils::ParseFullName(node.assigned_device_name(),
+ &member->device_name)) {
+ return errors::Internal("Malformed assigned device '",
+ node.assigned_device_name(), "'");
+ }
+ std::vector<Device*> devices;
+ const Device* assigned_device =
+ device_set_->FindDeviceByName(node.assigned_device_name());
+ if (assigned_device == nullptr) {
+ return errors::Internal("Assigned device '",
+ node.assigned_device_name(),
+ "' does not match any device");
+ }
+
+ for (DeviceType d : member->supported_device_types) {
+ if (DeviceType(assigned_device->attributes().device_type()) == d) {
+ return Status::OK();
+ }
+ }
+
+ return errors::Internal("Assigned device '", node.assigned_device_name(),
+ "' does not have registered OpKernel support "
+ "for ",
+ node.def().op());
+ } else {
+ // This node has not yet been assigned to a device, so we
+ // calculate any constraints due to the set of registered
+ // kernels and any (partial) user-provided device specification
+ // in the NodeDef.
+
+ // If no kernels are registered for this op type, fail with an error.
+ if (member->supported_device_types.empty()) {
+ return errors::InvalidArgument(
+ "No OpKernel was registered to support "
+ "Op '",
+ node.def().op(), "' with these attrs");
+ }
+
+ // If the NodeDef contains a device that is *not* a colocated node name
+ // (i.e. it does not begin with '@') then we interpret it as a (partial)
+ // device specification.
+ string colocated_node_name;
+ if (!node.def().device().empty() && !HasColocatedNodeName(node)) {
+ // The user has specified a device in the NodeDef, try to find a
+ // valid device matching their specification in the set of
+ // devices.
+ // NOTE: The full name may specify a device that is not in
+ // n.supported_device_types(), but we check that in AssignDevice().
+ if (!DeviceNameUtils::ParseFullName(node.def().device(),
+ &member->device_name)) {
+ return errors::InvalidArgument("Malformed device specification '",
+ node.def().device(), "'");
+ }
+ }
+ }
+ return Status::OK();
+ }
+
+ // Updates target to contain the intersection of the device types in
+ // "target" and "other".
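+ // For example, if *target is {GPU, CPU} and other is {CPU}, *target
+ // becomes {CPU}; the relative priority order of the surviving entries
+ // in *target is preserved.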
+ static void MergeSupportedDevices(DeviceTypeVector* target,
+ const DeviceTypeVector& other) {
+ DeviceTypeVector temp = *target;
+ target->clear();
+
+ // Iterate in priority order.
+ for (DeviceType device_type : temp) {
+ bool found = false;
+ for (DeviceType other_device_type : other) {
+ if (device_type == other_device_type) {
+ found = true;
+ break;
+ }
+ }
+ if (found) {
+ target->push_back(device_type);
+ }
+ }
+ }
+
+ // Returns the root node of the disjoint tree to which the node with the
+ // given id is connected.
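+ //
+ // For example, with parent pointers 3 -> 2 -> 1 -> 0 (where 0 is a root),
+ // FindRoot(3) returns 0 and, via path compression, re-points 3, 2, and 1
+ // directly at 0.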
+ int FindRoot(int node_id) {
+ DCHECK_GE(members_[node_id].parent, 0);
+ if (members_[node_id].parent != node_id) {
+ // NOTE: Compress paths from node_id to its root, so that future
+ // calls to FindRoot and ColocateNodes are more efficient.
+ members_[node_id].parent = FindRoot(members_[node_id].parent);
+ }
+ return members_[node_id].parent;
+ }
+
+ std::vector<Member> members_;
+ const DeviceSet* device_set_; // Not owned.
+ const std::vector<DeviceType> device_types_;
+ const SessionOptions* options_; // Not owned.
+};
+
+} // namespace
+
+SimplePlacer::SimplePlacer(Graph* graph, const DeviceSet* devices,
+ const NodeNameToIdMap* name_to_id_map,
+ const SessionOptions* options)
+ : graph_(graph),
+ devices_(devices),
+ name_to_id_map_(name_to_id_map),
+ options_(options) {}
+
+SimplePlacer::SimplePlacer(Graph* graph, const DeviceSet* devices,
+ const NodeNameToIdMap* name_to_id_map)
+ : graph_(graph), devices_(devices), name_to_id_map_(name_to_id_map) {
+ options_ = nullptr;
+}
+
+SimplePlacer::~SimplePlacer() {}
+
+Status SimplePlacer::Run() {
+ if (devices_->devices().empty()) {
+ return errors::FailedPrecondition("No devices are registered");
+ }
+
+ ColocationGraph colocation_graph(graph_, devices_, options_);
+ Status status;
+
+ // 1. First add all of the nodes. Note that steps (1) and (2)
+ // require two passes over the nodes because the graph (and hence
+ // the constraints) may not be acyclic.
+ for (Node* node : graph_->nodes()) {
+ // Skip the source and sink nodes.
+ if (!node->IsOp()) {
+ continue;
+ }
+ status = colocation_graph.AddNode(*node);
+ if (!status.ok()) return AttachDef(status, node->def());
+ }
+
+ // 2. Enumerate the constraint edges, and use them to update the disjoint
+ // node set.
+ for (Node* node : graph_->nodes()) {
+ if (!node->IsOp()) {
+ continue;
+ }
+
+ // 2(a). If node n specifies a colocation constraint as its device name,
+ // add an edge from the colocated node to n.
+ if (HasColocatedNodeName(*node)) {
+ string colocated_node_name;
+ status = ParseColocatedNodeName(*node, &colocated_node_name);
+ if (!status.ok()) {
+ return AttachDef(status, node->def());
+ }
+ Node* colocated_node;
+ status = GetNodeByName(colocated_node_name, &colocated_node);
+ if (!status.ok()) {
+ return AttachDef(
+ errors::InvalidArgument("Colocated node named in device '",
+ colocated_node_name, "' does not exist"),
+ node->def());
+ }
+ status = colocation_graph.ColocateNodes(*colocated_node, *node);
+ if (!status.ok()) {
+ return AttachDef(
+ errors::InvalidArgument(
+ "Cannot satisfy colocation constraint named in device '",
+ colocated_node_name, "': ", status.error_message()),
+ node->def());
+ }
+ }
+
+ // 2(b). If `node` has an input edge with reference type, add an
+ // edge from the source of that edge to `node`.
+ for (const auto& edge : node->in_edges()) {
+ if (!edge->IsControlEdge() &&
+ IsRefType(node->input_type(edge->dst_input()))) {
+ status = colocation_graph.ColocateNodes(*edge->src(), *node);
+ if (!status.ok()) {
+ return AttachDef(
+ errors::InvalidArgument("Cannot satisfy colocation constraint "
+ "implied by reference connection: ",
+ status.error_message()),
+ node->def());
+ }
+ }
+ }
+ }
+
+ // 3. For each node, assign a device based on the constraints in the
+ // disjoint node set.
+ for (Node* node : graph_->nodes()) {
+ // Skip the source and sink nodes.
+ if (!node->IsOp()) {
+ continue;
+ }
+ // Skip nodes that already have an assigned device.
+ if (!node->assigned_device_name().empty()) {
+ continue;
+ }
+
+ status = colocation_graph.AssignDevice(node);
+ if (!status.ok()) {
+ return AttachDef(
+ errors::InvalidArgument("Cannot assign a device to node '",
+ node->name(), "': ", status.error_message()),
+ node->def());
+ }
+ }
+ return Status::OK();
+}
+
+Status SimplePlacer::GetNodeByName(const string& name, Node** out_node) const {
+ NodeNameToIdMap::const_iterator iter = name_to_id_map_->find(name);
+ if (iter != name_to_id_map_->end()) {
+ *out_node = graph_->FindNodeId(iter->second);
+ if (*out_node) {
+ return Status::OK();
+ }
+ }
+ return errors::NotFound(name);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/simple_placer.h b/tensorflow/core/common_runtime/simple_placer.h
new file mode 100644
index 0000000000..4b3df50c72
--- /dev/null
+++ b/tensorflow/core/common_runtime/simple_placer.h
@@ -0,0 +1,81 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_SIMPLE_PLACER_H_
+#define TENSORFLOW_COMMON_RUNTIME_SIMPLE_PLACER_H_
+
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/common_runtime/device_set.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/device_name_utils.h"
+#include "tensorflow/core/public/session_options.h"
+
+namespace tensorflow {
+
+// A placement algorithm that assigns the nodes of the given Graph to
+// devices in the given DeviceSet, respecting the following constraints:
+//
+// 1. Existing device assignments remain unchanged.
+// 2. Requested (partial or complete) device specifications given in the
+// NodeDef are granted.
+// 3. Nodes connected by edges of a reference type are colocated on
+// the same device.
+// 4. Given nodes "A" and "B", if node "B" has the device specification
+// "@A", nodes "A" and "B" will be colocated on the same device.
+//
+// The implementation builds a constraint graph with the same set of
+// nodes, and edges that represent colocation constraints between
+// nodes. Each connected component in the resulting constraint graph
+// is then assigned to a single device.
+//
+// TODO(mrry): "Soft" constraints, such as "place node 'x' as close as
+// possible to node 'y' while respecting the other constraints"?
+// TODO(mrry): Create a common interface for this and the other
+// placement algorithms so that they may be injected into the graph
+// builder.
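+//
+// A minimal usage sketch (illustrative; error handling abbreviated):
+//
+//   Graph graph(OpRegistry::Global());
+//   DeviceSet devices = ...;                    // populated by the caller
+//   SimplePlacer::NodeNameToIdMap name_to_id;
+//   for (Node* n : graph.nodes()) name_to_id[n->name()] = n->id();
+//   SimplePlacer placer(&graph, &devices, &name_to_id, nullptr /* options */);
+//   TF_RETURN_IF_ERROR(placer.Run());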
+class SimplePlacer {
+ public:
+ // A map from graph node names to numerical IDs (in a Graph object).
+ typedef std::unordered_map<string, int> NodeNameToIdMap;
+
+ // Creates an instance of the SimplePlacer algorithm for the given
+ // Graph "graph" (nodes in which may or may not be assigned) on the
+ // given DeviceSet "devices". The "name_to_id_map" maps the names of
+ // nodes in "g" to their numerical ID.
+ //
+ // REQUIRES: for all mappings (k, v) in "name_to_id_map",
+ // graph.FindNodeId(v)->name() == k.
+ //
+ // The "graph", "devices", and "name_to_id_map" pointer arguments
+ // are borrowed by this SimplePlacer, and must outlive it.
+ SimplePlacer(Graph* graph, const DeviceSet* devices,
+ const NodeNameToIdMap* name_to_id_map,
+ const SessionOptions* options);
+
+ SimplePlacer(Graph* graph, const DeviceSet* devices,
+ const NodeNameToIdMap* name_to_id_map);
+
+ ~SimplePlacer();
+
+ // Assigns each node in this SimplePlacer's graph to a device in its
+ // set of devices.
+ //
+ // This method is not thread-safe.
+ // Run() may be invoked at most once.
+ Status Run();
+
+ private:
+ Status GetNodeByName(const string& name, Node** out_node) const;
+
+ Graph* const graph_; // Not owned.
+ const DeviceSet* const devices_; // Not owned.
+ const NodeNameToIdMap* const name_to_id_map_; // Not owned.
+ const SessionOptions* options_; // Not owned.
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SimplePlacer);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_SIMPLE_PLACER_H_
diff --git a/tensorflow/core/common_runtime/simple_placer_test.cc b/tensorflow/core/common_runtime/simple_placer_test.cc
new file mode 100644
index 0000000000..3139962d7e
--- /dev/null
+++ b/tensorflow/core/common_runtime/simple_placer_test.cc
@@ -0,0 +1,863 @@
+#include "tensorflow/core/common_runtime/simple_placer.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_set.h"
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/error_codes.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+namespace {
+
+////////////////////////////////////////////////////////////////////////////////
+//
+// Op, kernel, and device registrations to set up the environment.
+//
+// The SimplePlacer uses information about the op (input types),
+// kernel (device constraints), and available devices to make
+// placement decisions. To avoid depending on the full runtime, we
+// define dummy implementations of these, and register them with the
+// runtime.
+//
+////////////////////////////////////////////////////////////////////////////////
+
+// A dummy OpKernel that is used to register ops on different devices.
+class DummyOp : public OpKernel {
+ public:
+ explicit DummyOp(OpKernelConstruction* context) : OpKernel(context) {}
+ void Compute(OpKernelContext* context) override {}
+};
+
+// A fake device that has specific device attributes, used to simulate
+// the presence of a CPU or a GPU (without depending on that part of
+// the runtime).
+class FakeDevice : public Device {
+ private:
+ explicit FakeDevice(const DeviceAttributes& device_attributes)
+ : Device(nullptr, device_attributes, nullptr) {}
+
+ public:
+ Status Sync() override { return errors::Unimplemented("FakeDevice::Sync()"); }
+
+ Allocator* GetAllocator(AllocatorAttributes attr) override { return nullptr; }
+
+ static std::unique_ptr<Device> MakeCPU(const string& name) {
+ DeviceAttributes device_attributes;
+ device_attributes.set_name(name);
+ device_attributes.set_device_type(DeviceType(DEVICE_CPU).type());
+ return std::unique_ptr<Device>(new FakeDevice(device_attributes));
+ }
+
+ static std::unique_ptr<Device> MakeGPU(const string& name) {
+ DeviceAttributes device_attributes;
+ device_attributes.set_name(name);
+ device_attributes.set_device_type(DeviceType(DEVICE_GPU).type());
+ return std::unique_ptr<Device>(new FakeDevice(device_attributes));
+ }
+};
+
+// Register the following ops so they can be added to a Graph, and
+// kernels so that they can be placed on particular device types.
+REGISTER_OP("TestVariable").Output("o: Ref(float)");
+REGISTER_KERNEL_BUILDER(Name("TestVariable").Device(DEVICE_CPU), DummyOp);
+REGISTER_KERNEL_BUILDER(Name("TestVariable").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("VariableCPU").Output("o: Ref(float)");
+REGISTER_KERNEL_BUILDER(Name("VariableCPU").Device(DEVICE_CPU), DummyOp);
+
+REGISTER_OP("VariableGPU").Output("o: Ref(float)");
+REGISTER_KERNEL_BUILDER(Name("VariableGPU").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("VariableNoKernels").Output("o: Ref(float)");
+
+REGISTER_OP("TestAdd").Input("a: float").Input("b: float").Output("o: float");
+REGISTER_KERNEL_BUILDER(Name("TestAdd").Device(DEVICE_CPU), DummyOp);
+REGISTER_KERNEL_BUILDER(Name("TestAdd").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("TestRelu").Input("i: float").Output("o: float");
+REGISTER_KERNEL_BUILDER(Name("TestRelu").Device(DEVICE_CPU), DummyOp);
+REGISTER_KERNEL_BUILDER(Name("TestRelu").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("ReluGPU").Input("i: float").Output("o: float");
+REGISTER_KERNEL_BUILDER(Name("ReluGPU").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("TestAssign").Input("i: Ref(float)").Input("v: float");
+REGISTER_KERNEL_BUILDER(Name("TestAssign").Device(DEVICE_CPU), DummyOp);
+REGISTER_KERNEL_BUILDER(Name("TestAssign").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("AssignCPU").Input("i: Ref(float)").Input("v: float");
+REGISTER_KERNEL_BUILDER(Name("AssignCPU").Device(DEVICE_CPU), DummyOp);
+
+REGISTER_OP("AssignGPU").Input("i: Ref(float)").Input("v: float");
+REGISTER_KERNEL_BUILDER(Name("AssignGPU").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("TestInput").Output("a: float").Output("b: float");
+REGISTER_KERNEL_BUILDER(Name("TestInput").Device(DEVICE_CPU), DummyOp);
+
+REGISTER_OP("TestDevice").Output("a: float").Output("b: float");
+REGISTER_KERNEL_BUILDER(Name("TestDevice").Device(DEVICE_GPU), DummyOp);
+
+REGISTER_OP("TestDeviceEnforce").Input("a: Ref(float)").Output("b: float");
+REGISTER_KERNEL_BUILDER(Name("TestDeviceEnforce").Device(DEVICE_CPU), DummyOp);
+REGISTER_KERNEL_BUILDER(Name("TestDeviceEnforce").Device(DEVICE_GPU), DummyOp);
+
+////////////////////////////////////////////////////////////////////////////////
+//
+// A SimplePlacerTest method has three phases:
+//
+// 1. Build a TensorFlow graph, with no (or partial) device assignments.
+// 2. Attempt to compute a placement using the SimplePlacer.
+// 3. EITHER: test that the constraints implied by the graph are respected;
+// or that an appropriate error was reported.
+//
+////////////////////////////////////////////////////////////////////////////////
+class SimplePlacerTest : public ::testing::Test {
+ protected:
+ SimplePlacerTest() {
+ RequireDefaultOps();
+ // Build a set of 10 GPU and 10 CPU devices.
+ // NOTE: this->local_devices_ owns the device objects;
+ // this->devices_ contains borrowed pointers to the device
+ // objects.
+ for (int i = 0; i < 10; ++i) {
+ local_devices_.emplace_back(FakeDevice::MakeCPU(
+ strings::StrCat("/job:a/replica:0/task:0/cpu:", i)));
+ devices_.AddDevice(local_devices_.back().get());
+ // Insert the GPUs in reverse order.
+ local_devices_.emplace_back(FakeDevice::MakeGPU(
+ strings::StrCat("/job:a/replica:0/task:0/gpu:", 9 - i)));
+ devices_.AddDevice(local_devices_.back().get());
+ }
+ }
+
+ // Builds the given graph, and (if successful) indexes the node
+ // names for use in placement, and later lookup.
+ Status BuildGraph(const GraphDefBuilder& builder, Graph* out_graph) {
+ TF_RETURN_IF_ERROR(builder.ToGraph(out_graph));
+ nodes_by_name_.clear();
+ for (Node* node : out_graph->nodes()) {
+ nodes_by_name_[node->name()] = node->id();
+ }
+ return Status::OK();
+ }
+
+ // Invokes the SimplePlacer on "graph". If no DeviceSet is specified, the
+ // placement will use the default DeviceSet (of 10 CPU and 10 GPU devices).
+ //
+ // REQUIRES: "*graph" was produced by the most recent call to BuildGraph.
+ Status Place(Graph* graph, DeviceSet* devices, SessionOptions* options) {
+ SimplePlacer placer(graph, devices, &nodes_by_name_, options);
+ return placer.Run();
+ }
+
+ Status Place(Graph* graph, DeviceSet* devices) {
+ return Place(graph, devices, nullptr);
+ }
+
+ Status Place(Graph* graph, SessionOptions* options) {
+ return Place(graph, &devices_, options);
+ }
+
+ Status Place(Graph* graph) { return Place(graph, &devices_, nullptr); }
+
+ // Returns the node in "graph" with the given name.
+ //
+ // REQUIRES: "graph" was produced by the most recent call to BuildGraph.
+ Node* GetNodeByName(const Graph& graph, const string& name) {
+ const auto search = nodes_by_name_.find(name);
+ CHECK(search != nodes_by_name_.end()) << "Unknown node name: " << name;
+ return graph.FindNodeId(search->second);
+ }
+
+ protected:
+ std::vector<std::unique_ptr<Device>> local_devices_;
+ DeviceSet devices_;
+ SimplePlacer::NodeNameToIdMap nodes_by_name_;
+
+ Status ReferenceTestHelper(const string& variable_op_type,
+ const string& assign_op_type,
+ DeviceType expected_device_type);
+};
+
+#define EXPECT_COLOCATED(g, name_a, name_b) \
+ do { \
+ Graph& g_ = (g); \
+ EXPECT_EQ(GetNodeByName(g_, (name_a))->assigned_device_name(), \
+ GetNodeByName(g_, (name_b))->assigned_device_name()); \
+ } while (0)
+
+#define EXPECT_DEVICE_TYPE(g, name, expected_device_type) \
+ EXPECT_EQ(DeviceType(expected_device_type).type(), \
+ devices_.FindDeviceByName( \
+ GetNodeByName((g), (name))->assigned_device_name()) \
+ ->attributes() \
+ .device_type())
+
+#define EXPECT_DEVICE_CONTAINS(g, name, device_substr) \
+ EXPECT_TRUE(StringPiece(GetNodeByName((g), (name))->assigned_device_name()) \
+ .contains(device_substr))
+
+// Test that a graph with no constraints will successfully assign nodes to the
+// "best available" device (i.e. prefer GPU over CPU).
+TEST_F(SimplePlacerTest, TestNoConstraints) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ ops::UnaryOp("TestRelu", ops::NodeOut(input, 0), b.opts().WithName("n1"));
+ ops::UnaryOp("TestRelu", ops::NodeOut(input, 1), b.opts().WithName("n2"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
+ EXPECT_DEVICE_TYPE(g, "n1", DEVICE_GPU);
+ EXPECT_DEVICE_TYPE(g, "n2", DEVICE_GPU);
+}
+
+// Test that a graph with device type and reference constraints on
+// some of the ops will successfully assign nodes to the constrained
+// device, and colocate nodes with reference connections.
+TEST_F(SimplePlacerTest, TestDeviceTypeConstraints) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ Node* var_cpu = ops::SourceOp("VariableCPU", b.opts().WithName("var_cpu"));
+ ops::BinaryOp("AssignCPU", var_cpu, input, b.opts().WithName("assign_cpu"));
+ Node* var_gpu = ops::SourceOp("VariableGPU", b.opts().WithName("var_gpu"));
+ ops::BinaryOp("AssignGPU", var_gpu, input, b.opts().WithName("assign_gpu"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
+ EXPECT_DEVICE_TYPE(g, "var_cpu", DEVICE_CPU);
+ EXPECT_DEVICE_TYPE(g, "assign_cpu", DEVICE_CPU);
+ EXPECT_COLOCATED(g, "var_cpu", "assign_cpu");
+ EXPECT_DEVICE_TYPE(g, "var_gpu", DEVICE_GPU);
+ EXPECT_DEVICE_TYPE(g, "assign_gpu", DEVICE_GPU);
+ EXPECT_COLOCATED(g, "var_gpu", "assign_gpu");
+}
+
+// Test that a graph with partial device specifications on the ops
+// will successfully be placed on devices matching those specifications.
+TEST_F(SimplePlacerTest, TestPartialSpec) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("/job:a"));
+ ops::SourceOp("TestVariable",
+ b.opts().WithName("var").WithDevice("/job:a"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
+ EXPECT_DEVICE_CONTAINS(g, "in", "/job:a");
+ EXPECT_DEVICE_TYPE(g, "var", DEVICE_GPU);
+ EXPECT_DEVICE_CONTAINS(g, "var", "/job:a");
+}
+
+// Test that a node with an assigned device is not relocated.
+TEST_F(SimplePlacerTest, TestAssignedDevicePreserved) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ GetNodeByName(g, "in")
+ ->set_assigned_device_name("/job:a/replica:0/task:0/cpu:7");
+
+ EXPECT_OK(Place(&g));
+ EXPECT_EQ("/job:a/replica:0/task:0/cpu:7",
+ GetNodeByName(g, "in")->assigned_device_name());
+}
+
+// Test that a graph with partial device specifications for CPU-only ops
+// will be relocated to CPU.
+TEST_F(SimplePlacerTest, TestPartialSpecGpuToCpu) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("/gpu:0"));
+ ops::SourceOp("TestVariable",
+ b.opts().WithName("var").WithDevice("/gpu:0"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ options.config.set_allow_soft_placement(true);
+ EXPECT_OK(Place(&g, &options));
+ EXPECT_DEVICE_TYPE(g, "in", DEVICE_CPU);
+ EXPECT_DEVICE_CONTAINS(g, "in", "/cpu");
+ EXPECT_DEVICE_TYPE(g, "var", DEVICE_GPU);
+ EXPECT_DEVICE_CONTAINS(g, "var", "/gpu:0");
+}
+
+// Test that placement fails for a node whose assigned GPU device has no
+// registered OpKernel for its op.
+TEST_F(SimplePlacerTest, TestAssignedGpuDeviceToCpuDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ GetNodeByName(g, "in")
+ ->set_assigned_device_name("/job:a/replica:0/task:0/gpu:0");
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INTERNAL, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains("Assigned device '/job:a/replica:0/task:0/gpu:0' "
+ "does not have registered OpKernel support for TestInput"));
+}
+
+// Test that graphs with reference connections are correctly placed.
+
+// Build a graph containing a Variable op of "variable_op_type" and an
+// Assign op of "assign_op_type", and expect all of the ops to be
+// placed on a device of type "expected_device_type".
+Status SimplePlacerTest::ReferenceTestHelper(const string& variable_op_type,
+ const string& assign_op_type,
+ DeviceType expected_device_type) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ // Build ten variable-and-assignment pairs.
+ for (int i = 0; i < 10; ++i) {
+ Node* var = ops::SourceOp(variable_op_type,
+ b.opts().WithName(strings::StrCat("var_", i)));
+ ops::BinaryOp(assign_op_type, var, input,
+ b.opts().WithName(strings::StrCat("assign_", i)));
+ }
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ TF_RETURN_IF_ERROR(Place(&g));
+
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_COLOCATED(g, strings::StrCat("var_", i),
+ strings::StrCat("assign_", i));
+ EXPECT_DEVICE_TYPE(g, strings::StrCat("var_", i), expected_device_type);
+ EXPECT_DEVICE_TYPE(g, strings::StrCat("assign_", i), expected_device_type);
+ }
+
+ return Status::OK();
+}
+
+// Test all 3^2 = 9 combinations of Variable and Assignment op types
+// (unconstrained, CPU-only, and GPU-only).
+TEST_F(SimplePlacerTest, TestReferenceConnection) {
+ Status s;
+ EXPECT_OK(ReferenceTestHelper("TestVariable", "TestAssign", DEVICE_GPU));
+ EXPECT_OK(ReferenceTestHelper("TestVariable", "AssignCPU", DEVICE_CPU));
+ EXPECT_OK(ReferenceTestHelper("TestVariable", "AssignGPU", DEVICE_GPU));
+ EXPECT_OK(ReferenceTestHelper("VariableCPU", "TestAssign", DEVICE_CPU));
+ EXPECT_OK(ReferenceTestHelper("VariableCPU", "AssignCPU", DEVICE_CPU));
+ {
+ Status s = ReferenceTestHelper("VariableCPU", "AssignGPU", DEVICE_CPU);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("no device type supports both of those nodes"));
+ }
+ EXPECT_OK(ReferenceTestHelper("VariableGPU", "TestAssign", DEVICE_GPU));
+ {
+ Status s = ReferenceTestHelper("VariableGPU", "AssignCPU", DEVICE_CPU);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("no device type supports both of those nodes"));
+ }
+ EXPECT_OK(ReferenceTestHelper("VariableGPU", "AssignGPU", DEVICE_GPU));
+}
+
+// Test the handling of '@node_name' colocation constraints, when
+// these are arranged in multiple chains.
+TEST_F(SimplePlacerTest, TestColocatedChain) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ Node* last_node = input;
+ for (int i = 0; i < 100; ++i) {
+ if (i % 10 == 0) {
+ // Every ten nodes, start a new chain.
+ last_node = ops::UnaryOp("TestRelu", last_node,
+ b.opts().WithName(strings::StrCat("n_", i)));
+ } else {
+ // Chain each successive node to the previous one.
+ last_node =
+ ops::UnaryOp("TestRelu", last_node,
+ b.opts()
+ .WithName(strings::StrCat("n_", i))
+ .WithDevice(strings::StrCat("@n_", i - 1)));
+ }
+ }
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ for (int i = 0; i < 100; ++i) {
+ if (i % 10 != 0) {
+ EXPECT_COLOCATED(g, strings::StrCat("n_", i - (i % 10)),
+ strings::StrCat("n_", i));
+ }
+ }
+}
+
+// Test the handling of '@node_name' colocation constraints, when the
+// chains are shuffled.
+TEST_F(SimplePlacerTest, TestColocatedChainWithLongRangeColocations) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ Node* last_node = input;
+ for (int i = 0; i < 10; ++i) {
+ // Start ten chains.
+ last_node = ops::UnaryOp("TestRelu", last_node,
+ b.opts().WithName(strings::StrCat("n_", i)));
+ }
+ for (int i = 10; i < 100; ++i) {
+ // Add each node to the (i % 10)^th chain.
+ last_node = ops::UnaryOp("TestRelu", last_node,
+ b.opts()
+ .WithName(strings::StrCat("n_", i))
+ .WithDevice(strings::StrCat("@n_", i % 10)));
+ }
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ for (int i = 10; i < 100; ++i) {
+ EXPECT_COLOCATED(g, strings::StrCat("n_", i % 10),
+ strings::StrCat("n_", i));
+ }
+}
+
+TEST_F(SimplePlacerTest, TestColocationAndReferenceConnections) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ for (int i = 0; i < 10; ++i) {
+ // Declare ten variable and assignment pairs.
+ Node* var = ops::SourceOp("TestVariable",
+ b.opts().WithName(strings::StrCat("var_", i)));
+ ops::BinaryOp("TestAssign", var, input,
+ b.opts().WithName(strings::StrCat("assign_", i)));
+ }
+ for (int i = 10; i < 100; ++i) {
+ // Create a variable colocated with some existing variable, and
+ // an assignment colocated with a possibly-different variable.
+ Node* var = ops::SourceOp(
+ "TestVariable", b.opts()
+ .WithName(strings::StrCat("var_", i))
+ .WithDevice(strings::StrCat("@var_", i % 6)));
+ ops::BinaryOp("TestAssign", var, input,
+ b.opts()
+ .WithName(strings::StrCat("assign_", i))
+ .WithDevice(strings::StrCat("@assign_", i % 3)));
+ }
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ EXPECT_OK(Place(&g));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_COLOCATED(g, strings::StrCat("var_", i),
+ strings::StrCat("assign_", i));
+ }
+ for (int i = 10; i < 100; ++i) {
+ EXPECT_COLOCATED(g, strings::StrCat("var_", i),
+ strings::StrCat("assign_", i));
+ EXPECT_COLOCATED(g, strings::StrCat("var_", i),
+ strings::StrCat("var_", i % 6));
+ EXPECT_COLOCATED(g, strings::StrCat("assign_", i),
+ strings::StrCat("assign_", i % 3));
+ }
+}
+
+// Test that placement fails when no devices are registered.
+TEST_F(SimplePlacerTest, TestEmptyDeviceSet) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ DeviceSet empty;
+
+ Status s = Place(&g, &empty);
+ EXPECT_TRUE(
+ StringPiece(s.error_message()).contains("No devices are registered"));
+}
+
+// Test that placement fails when the requested device forces an
+// indirect constraint to be violated.
+TEST_F(SimplePlacerTest, TestHeterogeneousDeviceSetFailure) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* in = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ Node* var = ops::SourceOp("VariableGPU", b.opts().WithName("var"));
+ ops::BinaryOp("TestAssign", var, in,
+ b.opts().WithName("assign").WithDevice("/job:b/task:1"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ DeviceSet heterogeneous;
+ std::unique_ptr<Device> gpu(
+ FakeDevice::MakeGPU("/job:b/replica:0/task:0/gpu:0"));
+ heterogeneous.AddDevice(gpu.get());
+ std::unique_ptr<Device> cpu(
+ FakeDevice::MakeCPU("/job:b/replica:0/task:1/cpu:0"));
+ heterogeneous.AddDevice(cpu.get());
+ Status s = Place(&g, &heterogeneous);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("colocated with a group of nodes that required "
+ "incompatible device"));
+}
+
+// Test that placement fails when an unknown device is requested.
+TEST_F(SimplePlacerTest, TestUnknownDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("/job:foo"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains(
+ "Could not satisfy explicit device specification '/job:foo'"));
+}
+
+// Test that placement fails when the combination of partial
+// constraints leads to an unknown device.
+TEST_F(SimplePlacerTest, TestUnknownMergedDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("/job:foo"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains(
+ "Could not satisfy explicit device specification '/job:foo'"));
+}
+
+// Test that placement fails when the previously-assigned device for a
+// node is unknown.
+TEST_F(SimplePlacerTest, TestUnknownAssignedDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ GetNodeByName(g, "in")->set_assigned_device_name("/job:foo");
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INTERNAL, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains("Assigned device '/job:foo' does not match any device"));
+}
+
+// Test that placement fails when an op with no registered kernels is
+// requested.
+TEST_F(SimplePlacerTest, TestNoKernelsRegistered) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("VariableNoKernels", b.opts().WithName("var"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains(
+ "No OpKernel was registered to support Op 'VariableNoKernels'"));
+}
+
+// Test that placement fails when a kernel is registered but no known
+// device supports it.
+TEST_F(SimplePlacerTest, TestNoDevicesRegistered) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("VariableGPU", b.opts().WithName("var"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ DeviceSet cpu_only;
+ std::unique_ptr<Device> cpu(
+ FakeDevice::MakeCPU("/job:a/replica:0/task:0/cpu:0"));
+ cpu_only.AddDevice(cpu.get());
+
+ Status s = Place(&g, &cpu_only);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("No OpKernel was registered to support "
+ "Op 'VariableGPU'"));
+}
+
+// Test that placement fails when a requested device is malformed.
+TEST_F(SimplePlacerTest, TestMalformedDeviceSpecification) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("/foo:bar"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("Malformed device specification '/foo:bar'"));
+}
+
+// Test that placement fails when a previously-assigned device is malformed.
+TEST_F(SimplePlacerTest, TestMalformedAssignedDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ GetNodeByName(g, "in")->set_assigned_device_name("/foo:bar");
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INTERNAL, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("Malformed assigned device '/foo:bar'"));
+}
+
+// Test that placement fails when a device was previously assigned to
+// a node, but it does not uniquely identify a particular device.
+TEST_F(SimplePlacerTest, TestNonUniqueAssignedDevice) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ GetNodeByName(g, "in")->set_assigned_device_name("/job:a");
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INTERNAL, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains("Assigned device '/job:a' does not match any device"));
+}
+
+// Test that placement fails when a node requests colocation with another
+// node that does not exist.
+TEST_F(SimplePlacerTest, TestUnknownColocatedNode) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("@foo"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message()).contains("'foo' does not exist"));
+}
+
+// Test that placement fails when a node requests colocation with a
+// malformed node name.
+TEST_F(SimplePlacerTest, TestMalformedColocatedNode) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestInput", b.opts().WithName("in").WithDevice("@"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("node named in device '' does not exist"));
+}
+
+// Test that ops requested to be placed on non-existent devices will be
+// relocated to an existing device of the same type if allow_soft_placement is set.
+TEST_F(SimplePlacerTest, TestNonexistentGpuAllowSoftPlacement) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestDevice", b.opts().WithName("in").WithDevice("/gpu:11"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ options.config.set_allow_soft_placement(true);
+ EXPECT_OK(Place(&g, &options));
+ EXPECT_DEVICE_CONTAINS(g, "in", "/gpu:0");
+}
+
+// Test that ops requested to be placed on non-existent devices will fail if
+// allow_soft_placement is not set.
+TEST_F(SimplePlacerTest, TestNonexistentGpuNoAllowSoftPlacement) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("TestDevice", b.opts().WithName("in").WithDevice("/gpu:11"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ Status s = Place(&g, &options);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains(
+ "Could not satisfy explicit device specification '/gpu:11'"));
+}
+
+// Test that placement fails when a node requests an explicit device that is not
+// supported by the registered kernels if allow_soft_placement is not set.
+TEST_F(SimplePlacerTest, TestUnsupportedDeviceNoAllowSoftPlacement) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("VariableGPU", b.opts().WithName("var").WithDevice("/cpu:0"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ Status s = Place(&g, &options);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(
+ StringPiece(s.error_message())
+ .contains(
+ "Could not satisfy explicit device specification '/cpu:0'"));
+}
+
+TEST_F(SimplePlacerTest, TestUnsupportedDeviceAllowSoftPlacement) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ ops::SourceOp("VariableGPU", b.opts().WithName("var").WithDevice("/cpu:0"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ options.config.set_allow_soft_placement(true);
+ EXPECT_OK(Place(&g, &options));
+}
+
+// Test that a graph with device type and reference constraints on
+// some of the ops will successfully assign nodes to the constrained
+// device, and colocate nodes with reference connections.
+TEST_F(SimplePlacerTest, TestDeviceTypeConstraintsAllowSoftPlacement) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ // var_gpu has ref output and runs on GPU.
+    // force_gpu takes var_gpu and requests CPU.
+ // Verify that both are placed on GPU.
+ Node* var_gpu = ops::SourceOp("VariableGPU", b.opts().WithName("var_gpu"));
+ ops::UnaryOp("TestDeviceEnforce", var_gpu,
+ b.opts().WithName("force_gpu").WithDevice("/cpu:0"));
+ // var_cpu has ref output and runs on CPU.
+    // force_cpu takes var_cpu and requests GPU.
+ // Verify that both are placed on CPU.
+ Node* var_cpu = ops::SourceOp("VariableCPU", b.opts().WithName("var_cpu"));
+ ops::UnaryOp("TestDeviceEnforce", var_cpu,
+ b.opts().WithName("force_cpu").WithDevice("/gpu:0"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ SessionOptions options;
+ options.config.set_allow_soft_placement(true);
+ EXPECT_OK(Place(&g, &options));
+ EXPECT_DEVICE_TYPE(g, "var_gpu", DEVICE_GPU);
+ EXPECT_DEVICE_TYPE(g, "force_gpu", DEVICE_GPU);
+ EXPECT_COLOCATED(g, "var_gpu", "force_gpu");
+ EXPECT_DEVICE_TYPE(g, "var_cpu", DEVICE_CPU);
+ EXPECT_DEVICE_TYPE(g, "force_cpu", DEVICE_CPU);
+ EXPECT_COLOCATED(g, "var_cpu", "force_cpu");
+}
+
+// Test that placement fails when two nodes have a reference connection
+// constraint, and each node requires a mutually incompatible device.
+TEST_F(SimplePlacerTest, TestUnsatisfiableConstraintWithReferenceConnections) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* var = ops::SourceOp("VariableGPU", b.opts().WithName("var"));
+ Node* input = ops::SourceOp("TestInput", b.opts().WithName("in"));
+ ops::BinaryOp("AssignCPU", var, input, b.opts().WithName("assign"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("Cannot colocate nodes 'var' and 'assign'"));
+}
+
+// Test that placement fails when two nodes have an explicit
+// colocation constraint, and each node requires a mutually
+// incompatible device.
+TEST_F(SimplePlacerTest, TestUnsatisfiableConstraintWithColocatedNodes) {
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* input = ops::SourceOp("TestInput",
+ b.opts().WithName("in").WithDevice("/gpu:0"));
+ Node* relu_1 = ops::UnaryOp("TestRelu", input,
+ b.opts().WithName("relu_1").WithDevice("@in"));
+ ops::UnaryOp("ReluGPU", relu_1,
+ b.opts().WithName("relu_2").WithDevice("@relu_1"));
+ EXPECT_OK(BuildGraph(b, &g));
+ }
+
+ Status s = Place(&g);
+ EXPECT_EQ(error::INVALID_ARGUMENT, s.code());
+ EXPECT_TRUE(StringPiece(s.error_message())
+ .contains("Cannot colocate nodes 'relu_1' and 'relu_2'"));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/threadpool_device.cc b/tensorflow/core/common_runtime/threadpool_device.cc
new file mode 100644
index 0000000000..4806e69c67
--- /dev/null
+++ b/tensorflow/core/common_runtime/threadpool_device.cc
@@ -0,0 +1,55 @@
+#include "tensorflow/core/common_runtime/threadpool_device.h"
+
+#include "tensorflow/core/common_runtime/local_device.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/types.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/tracing.h"
+#include "tensorflow/core/public/session_options.h"
+
+namespace tensorflow {
+
+ThreadPoolDevice::ThreadPoolDevice(const SessionOptions& options,
+ const string& name, Bytes memory_limit,
+ BusAdjacency bus_adjacency,
+ Allocator* allocator)
+ : LocalDevice(options, Device::BuildDeviceAttributes(
+ name, DEVICE_CPU, memory_limit, bus_adjacency),
+ allocator),
+ allocator_(allocator) {}
+
+ThreadPoolDevice::~ThreadPoolDevice() {}
+
+void ThreadPoolDevice::Compute(OpKernel* op_kernel, OpKernelContext* context) {
+ if (port::Tracing::IsActive()) {
+ // TODO(pbar) We really need a useful identifier of the graph node.
+ const uint64 id = Hash64(op_kernel->name());
+ port::Tracing::ScopedActivity region(port::Tracing::EventCategory::kCompute,
+ id);
+ op_kernel->Compute(context);
+ } else {
+ op_kernel->Compute(context);
+ }
+}
+
+Allocator* ThreadPoolDevice::GetAllocator(AllocatorAttributes attr) {
+ return allocator_;
+}
+
+Status ThreadPoolDevice::MakeTensorFromProto(
+ const TensorProto& tensor_proto, const AllocatorAttributes alloc_attrs,
+ Tensor* tensor) {
+ Tensor parsed(tensor_proto.dtype());
+ if (!parsed.FromProto(cpu_allocator(), tensor_proto)) {
+ return errors::InvalidArgument("Cannot parse tensor from proto: ",
+ tensor_proto.DebugString());
+ }
+ *tensor = parsed;
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/common_runtime/threadpool_device.h b/tensorflow/core/common_runtime/threadpool_device.h
new file mode 100644
index 0000000000..5b0347231f
--- /dev/null
+++ b/tensorflow/core/common_runtime/threadpool_device.h
@@ -0,0 +1,31 @@
+#ifndef TENSORFLOW_COMMON_RUNTIME_THREADPOOL_DEVICE_H_
+#define TENSORFLOW_COMMON_RUNTIME_THREADPOOL_DEVICE_H_
+
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/local_device.h"
+
+namespace tensorflow {
+
+// CPU device implementation.
+class ThreadPoolDevice : public LocalDevice {
+ public:
+ ThreadPoolDevice(const SessionOptions& options, const string& name,
+ Bytes memory_limit, BusAdjacency bus_adjacency,
+ Allocator* allocator);
+ ~ThreadPoolDevice() override;
+
+ void Compute(OpKernel* op_kernel, OpKernelContext* context) override;
+ Allocator* GetAllocator(AllocatorAttributes attr) override;
+ Status MakeTensorFromProto(const TensorProto& tensor_proto,
+ const AllocatorAttributes alloc_attrs,
+ Tensor* tensor) override;
+
+ Status Sync() override { return Status::OK(); }
+
+ private:
+ Allocator* allocator_; // Not owned
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_COMMON_RUNTIME_THREADPOOL_DEVICE_H_
diff --git a/tensorflow/core/common_runtime/threadpool_device_factory.cc b/tensorflow/core/common_runtime/threadpool_device_factory.cc
new file mode 100644
index 0000000000..ee6319abad
--- /dev/null
+++ b/tensorflow/core/common_runtime/threadpool_device_factory.cc
@@ -0,0 +1,31 @@
+// Register a factory that provides CPU devices.
+#include "tensorflow/core/common_runtime/threadpool_device.h"
+
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/public/session_options.h"
+
+namespace tensorflow {
+
+// TODO(zhifengc/tucker): Figure out the bytes of available RAM.
+class ThreadPoolDeviceFactory : public DeviceFactory {
+ public:
+ void CreateDevices(const SessionOptions& options, const string& name_prefix,
+ std::vector<Device*>* devices) override {
+ // TODO(zhifengc/tucker): Figure out the number of available CPUs
+ // and/or NUMA configuration.
+ int n = 1;
+ auto iter = options.config.device_count().find("CPU");
+ if (iter != options.config.device_count().end()) {
+ n = iter->second;
+ }
+ for (int i = 0; i < n; i++) {
+ string name = strings::StrCat(name_prefix, "/cpu:", i);
+ devices->push_back(new ThreadPoolDevice(options, name, Bytes(256 << 20),
+ BUS_ANY, cpu_allocator()));
+ }
+ }
+};
+REGISTER_LOCAL_DEVICE_FACTORY("CPU", ThreadPoolDeviceFactory);
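+
+// Illustrative sketch (not part of this file): the "CPU" entry consulted by
+// CreateDevices() above can be set through SessionOptions, e.g.
+//
+//   SessionOptions opts;
+//   (*opts.config.mutable_device_count())["CPU"] = 2;
+//
+// with which the factory registered above would create /cpu:0 and /cpu:1.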
+
+} // namespace tensorflow
diff --git a/tensorflow/core/example/example.proto b/tensorflow/core/example/example.proto
new file mode 100644
index 0000000000..194d1e7c24
--- /dev/null
+++ b/tensorflow/core/example/example.proto
@@ -0,0 +1,95 @@
+// Protocol messages for describing input data Examples for machine learning
+// model training or inference.
+syntax = "proto3";
+
+import "tensorflow/core/example/feature.proto";
+// option cc_enable_arenas = true;
+
+package tensorflow;
+
+// Example for a movie recommendation application:
+// features {
+// feature {
+// key: "age"
+// float_list {
+// value: 29.0
+// }
+// }
+// feature {
+// key: "movie"
+// bytes_list {
+// value: "The Shawshank Redemption"
+// value: "Fight Club"
+// }
+// }
+// feature {
+// key: "movie_ratings"
+// float_list {
+// value: 9.0
+// value: 9.7
+// }
+// }
+// feature {
+// key: "suggestion"
+// bytes_list {
+// value: "Inception"
+// }
+// }
+// # Note that this feature exists to be used as a label in training.
+// # E.g., if training a logistic regression model to predict purchase
+// # probability in our learning tool we would set the label feature to
+// # "suggestion_purchased".
+// feature {
+// key: "suggestion_purchased"
+// float_list {
+// value: 1.0
+// }
+// }
+//   # Similar to "suggestion_purchased" above, this feature exists to be used
+// # as a label in training.
+// # E.g., if training a linear regression model to predict purchase
+// # price in our learning tool we would set the label feature to
+// # "purchase_price".
+// feature {
+// key: "purchase_price"
+// float_list {
+// value: 9.99
+// }
+// }
+// }
+//
+// A conformant data set obeys the following conventions:
+// - If a Feature K exists in one example with data type T, it must be of
+// type T in all other examples when present. It may be omitted.
+// - The number of instances of Feature K list data may vary across examples,
+// depending on the requirements of the model.
+// - If a Feature K doesn't exist in an example, a K-specific default will be
+// used, if configured.
+// - If a Feature K exists in an example but contains no items, the intent
+// is considered to be an empty tensor and no default will be used.
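+//
+// For instance (illustrative), under these conventions an example containing
+//   feature { key: "tag" bytes_list { } }
+// denotes an intentionally empty "tag" tensor, whereas an example that omits
+// "tag" entirely falls back to the configured default, if one exists.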
+
+message Example {
+ Features features = 1;
+};
+
+// Example representing a ranking instance.
+message RankingExample {
+ Features context = 1;
+ repeated Features positive = 2;
+ repeated Features negative = 3;
+};
+
+// Example representing a sequence.
+// The context contains features which apply to the entire sequence.
+// Each element in the repeated "features" field represents an entry in the
+// sequence.
+message SequenceExample {
+ Features context = 1;
+ repeated Features features = 2;
+};
+
+// Example representing a list of feature maps.
+// The context contains features which apply to all feature maps.
+message InferenceExample {
+ Features context = 1;
+ repeated Features features = 2;
+};
diff --git a/tensorflow/core/example/feature.proto b/tensorflow/core/example/feature.proto
new file mode 100644
index 0000000000..5ab77c2997
--- /dev/null
+++ b/tensorflow/core/example/feature.proto
@@ -0,0 +1,82 @@
+// Protocol messages for describing features for machine learning model
+// training or inference.
+//
+// There are three base Feature types:
+// - bytes
+// - float
+// - int64
+//
+// Base features are contained in Lists which may hold zero or more values.
+//
+// Features are organized into categories by name. The Features message
+// contains the mapping from name to Feature.
+//
+// Example Features for a movie recommendation application:
+// feature {
+// key: "age"
+// float_list {
+// value: 29.0
+// }
+// }
+// feature {
+// key: "movie"
+// bytes_list {
+// value: "The Shawshank Redemption"
+// value: "Fight Club"
+// }
+// }
+// feature {
+// key: "movie_ratings"
+// float_list {
+// value: 9.0
+// value: 9.7
+// }
+// }
+// feature {
+// key: "suggestion"
+// bytes_list {
+// value: "Inception"
+// }
+// }
+// feature {
+// key: "suggestion_purchased"
+// int64_list {
+// value: 1
+// }
+// }
+// feature {
+// key: "purchase_price"
+// float_list {
+// value: 9.99
+// }
+// }
+
+syntax = "proto3";
+// option cc_enable_arenas = true;
+
+package tensorflow;
+
+message Feature {
+ // Each feature can be exactly one kind.
+ oneof kind {
+ BytesList bytes_list = 1;
+ FloatList float_list = 2;
+ Int64List int64_list = 3;
+ }
+};
+
+message Features {
+ // Map from feature name to feature.
+ map<string, Feature> feature = 1;
+};
+
+// Containers to hold repeated fundamental features.
+message BytesList {
+ repeated bytes value = 1;
+}
+message FloatList {
+ repeated float value = 1 [packed=true];
+}
+message Int64List {
+ repeated int64 value = 1 [packed=true];
+}
diff --git a/tensorflow/core/framework/allocation_description.proto b/tensorflow/core/framework/allocation_description.proto
new file mode 100644
index 0000000000..f6f4bc0126
--- /dev/null
+++ b/tensorflow/core/framework/allocation_description.proto
@@ -0,0 +1,15 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+message AllocationDescription {
+ // Total number of bytes requested
+ int64 requested_bytes = 1;
+
+ // Total number of bytes allocated if known
+ int64 allocated_bytes = 2;
+
+ // Name of the allocator used
+ string allocator_name = 3;
+};
diff --git a/tensorflow/core/framework/allocator.cc b/tensorflow/core/framework/allocator.cc
new file mode 100644
index 0000000000..93f68dcccb
--- /dev/null
+++ b/tensorflow/core/framework/allocator.cc
@@ -0,0 +1,25 @@
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+Allocator::~Allocator() {}
+
+class CPUAllocator : public Allocator {
+ public:
+ ~CPUAllocator() override {}
+
+ string Name() override { return "cpu"; }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override {
+ return port::aligned_malloc(num_bytes, alignment);
+ }
+
+ void DeallocateRaw(void* ptr) override { port::aligned_free(ptr); }
+};
+
+Allocator* cpu_allocator() {
+ static CPUAllocator* cpu_alloc = new CPUAllocator;
+ return cpu_alloc;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/allocator.h b/tensorflow/core/framework/allocator.h
new file mode 100644
index 0000000000..6f162a608c
--- /dev/null
+++ b/tensorflow/core/framework/allocator.h
@@ -0,0 +1,132 @@
+#ifndef TENSORFLOW_FRAMEWORK_ALLOCATOR_H_
+#define TENSORFLOW_FRAMEWORK_ALLOCATOR_H_
+
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <limits>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+// Allocator is an abstract interface for allocating and deallocating
+// device memory.
+class Allocator {
+ public:
+ virtual ~Allocator();
+
+ // Return a string identifying this allocator
+ virtual string Name() = 0;
+
+ // Return an uninitialized block of memory that is "num_bytes" bytes
+ // in size. The returned pointer is guaranteed to be aligned to a
+ // multiple of "alignment" bytes.
+ // REQUIRES: "alignment" is a power of 2.
+ virtual void* AllocateRaw(size_t alignment, size_t num_bytes) = 0;
+
+  // Deallocate a block of memory pointed to by "ptr"
+ // REQUIRES: "ptr" was previously returned by a call to AllocateRaw
+ virtual void DeallocateRaw(void* ptr) = 0;
+
+ // Convenience functions to do typed allocation. Note that these functions
+ // do not invoke C++ constructors or destructors. May return NULL if the
+ // tensor has too many elements to represent in a single allocation.
+ template <typename T>
+ T* Allocate(size_t num_elements) {
+ // TODO(jeff): Do we need to allow clients to pass in alignment
+ // requirements?
+
+ if (num_elements > (std::numeric_limits<size_t>::max() / sizeof(T))) {
+ return NULL;
+ }
+
+ void* p = AllocateRaw(32 /* align to 32 byte boundary */,
+ sizeof(T) * num_elements);
+ return reinterpret_cast<T*>(p);
+ }
+
+ template <typename T>
+ void Deallocate(T* ptr) {
+ DeallocateRaw(ptr);
+ }
+
+ // Returns true if this allocator tracks the sizes of allocations.
+ // RequestedSize and AllocatedSize must be overridden if
+  // TracksAllocationSizes is overridden to return true.
+ virtual bool TracksAllocationSizes() { return false; }
+
+ // Returns the user-requested size of the data allocated at
+ // 'ptr'. Note that the actual buffer allocated might be larger
+ // than requested, but this function returns the size requested by
+ // the user.
+ //
+ // REQUIRES: TracksAllocationSizes() is true.
+ //
+ // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
+ // allocated by this allocator.
+ virtual size_t RequestedSize(void* ptr) {
+ CHECK(false) << "allocator doesn't track sizes";
+ }
+
+ // Returns the allocated size of the buffer at 'ptr' if known,
+ // otherwise returns RequestedSize(ptr). AllocatedSize(ptr) is
+ // guaranteed to be >= RequestedSize(ptr).
+ //
+ // REQUIRES: TracksAllocationSizes() is true.
+ //
+ // REQUIRES: 'ptr!=nullptr' and points to a buffer previously
+ // allocated by this allocator.
+ virtual size_t AllocatedSize(void* ptr) { return RequestedSize(ptr); }
+
+ // TODO(jeff): Maybe provide some interface to give info about
+ // current allocation state (total number of bytes available for
+ // allocation, number of bytes free on device, etc.)
+};
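+
+// A minimal usage sketch (illustrative only):
+//
+//   Allocator* a = cpu_allocator();
+//   float* buf = a->Allocate<float>(1024);  // may return NULL on overflow
+//   if (buf != nullptr) {
+//     // ... use the buffer ...
+//     a->Deallocate(buf);
+//   }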
+
+// A tensorflow Op may need access to different kinds of memory that
+// are not simply a function of the device to which the Op has been
+// assigned. For example, an Op executing on a GPU may still need
+// to allocate CPU RAM for some purpose. Internal to the tensorflow
+// runtime we may choose to allocate CPU RAM from special regions
+// that have been prepared for higher performance in some use
+// contexts, e.g. doing DMA with particular devices. For these
+// reasons, the Device interface does not expose just one memory
+// Allocator, but instead provides an accessor that takes a
+// specification of the desired memory attributes in order to select
+// an Allocator.
+//
+// NOTE: The upper 8 bits of the value are reserved for
+// device-specific uses. Implementors of a device can interpret these
+// upper 8 bits in device-specific ways, and ops implemented for those
+// devices are responsible for setting those 8 bits appropriately.
+//
+// Example use:
+// // Allocator for ordinary device memory:
+// Allocator* a = allocator(AllocatorAttributes());
+// ...
+// // Allocator for CPU RAM, regardless of where Op is executing:
+// AllocatorAttributes attr;
+// attr.set_on_host(true);
+// Allocator* a = allocator(attr);
+struct AllocatorAttributes {
+ void set_on_host(bool v) { value |= (static_cast<int>(v)); }
+ bool on_host() const { return value & 0x1; }
+ void set_nic_compatible(bool v) { value |= (static_cast<int>(v) << 1); }
+ bool nic_compatible() const { return value & (0x1 << 1); }
+ void set_gpu_compatible(bool v) { value |= (static_cast<int>(v) << 2); }
+ bool gpu_compatible() const { return value & (0x1 << 2); }
+
+ void Merge(AllocatorAttributes other) { value |= other.value; }
+
+ uint32 value = 0;
+};
+
+// Returns a trivial implementation of Allocator which uses the system
+// default malloc.
+Allocator* cpu_allocator();
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_ALLOCATOR_H_
diff --git a/tensorflow/core/framework/allocator_test.cc b/tensorflow/core/framework/allocator_test.cc
new file mode 100644
index 0000000000..6b1e52cfc4
--- /dev/null
+++ b/tensorflow/core/framework/allocator_test.cc
@@ -0,0 +1,61 @@
+#include "tensorflow/core/framework/allocator.h"
+#include <algorithm>
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+namespace tensorflow {
+
+TEST(CPUAllocatorTest, Simple) {
+ Allocator* a = cpu_allocator();
+ std::vector<void*> ptrs;
+ for (int s = 1; s < 1024; s++) {
+ void* raw = a->AllocateRaw(1, s);
+ ptrs.push_back(raw);
+ }
+ std::sort(ptrs.begin(), ptrs.end());
+ for (size_t i = 0; i < ptrs.size(); i++) {
+ if (i > 0) {
+ CHECK_NE(ptrs[i], ptrs[i - 1]); // No dups
+ }
+ a->DeallocateRaw(ptrs[i]);
+ }
+ float* t1 = a->Allocate<float>(1024);
+ double* t2 = a->Allocate<double>(1048576);
+ a->Deallocate(t1);
+ a->Deallocate(t2);
+}
+
+// Define a struct that we will use to observe behavior in the unit tests
+struct TestStruct {
+  int x;  // not used; just to make sure sizeof(TestStruct) > 1
+};
+
+TEST(CPUAllocatorTest, CheckStructSize) { CHECK_GT(sizeof(TestStruct), 1); }
+
+TEST(CPUAllocatorTest, AllocateOverflowMaxSizeT) {
+ Allocator* a = cpu_allocator();
+
+ // The maximum size_t value will definitely overflow.
+ size_t count_to_allocate = std::numeric_limits<size_t>::max();
+ TestStruct* const test_pointer = a->Allocate<TestStruct>(count_to_allocate);
+
+ CHECK_EQ(test_pointer, reinterpret_cast<TestStruct*>(NULL));
+}
+
+TEST(CPUAllocatorTest, AllocateOverflowSmallest) {
+ Allocator* a = cpu_allocator();
+
+ // count_to_allocate is the smallest count that will cause overflow.
+ const size_t count_to_allocate =
+ (std::numeric_limits<size_t>::max() / sizeof(TestStruct)) + 1;
+ TestStruct* const test_pointer = a->Allocate<TestStruct>(count_to_allocate);
+
+ CHECK_EQ(test_pointer, reinterpret_cast<TestStruct*>(NULL));
+}
+
+TEST(CPUAllocatorTest, Sizes) {
+ Allocator* a = cpu_allocator();
+
+ EXPECT_EQ(false, a->TracksAllocationSizes());
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/attr_value.proto b/tensorflow/core/framework/attr_value.proto
new file mode 100644
index 0000000000..c6a9940815
--- /dev/null
+++ b/tensorflow/core/framework/attr_value.proto
@@ -0,0 +1,57 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/tensor.proto";
+import "tensorflow/core/framework/tensor_shape.proto";
+import "tensorflow/core/framework/types.proto";
+
+// Protocol buffer representing the value for an attr used to configure an Op.
+// Comment indicates the corresponding attr type. Only the field matching the
+// attr type may be filled.
+message AttrValue {
+ message ListValue {
+ repeated bytes s = 2; // "list(string)"
+ repeated int64 i = 3 [packed = true]; // "list(int)"
+ repeated float f = 4 [packed = true]; // "list(float)"
+ repeated bool b = 5 [packed = true]; // "list(bool)"
+ repeated DataType type = 6 [packed = true]; // "list(type)"
+ repeated TensorShapeProto shape = 7; // "list(shape)"
+ repeated TensorProto tensor = 8; // "list(tensor)"
+ // TODO(zhifengc/josh11b): implements list(func) if needed.
+ }
+
+ oneof value {
+ bytes s = 2; // "string"
+ int64 i = 3; // "int"
+ float f = 4; // "float"
+ bool b = 5; // "bool"
+ DataType type = 6; // "type"
+ TensorShapeProto shape = 7; // "shape"
+ TensorProto tensor = 8; // "tensor"
+ ListValue list = 1; // any "list(...)"
+
+ // "func" represents a function. func.name is a function's name or
+ // a primitive op's name. func.attr.first is the name of an attr
+ // defined for that function. func.attr.second is the value for
+ // that attr in the instantiation.
+ NameAttrList func = 10;
+
+ // This is a placeholder only used in nodes defined inside a
+ // function. It indicates the attr value will be supplied when
+ // the function is instantiated. For example, let us suppose a
+ // node "N" in function "FN". "N" has an attr "A" with value
+ // placeholder = "foo". When FN is instantiated with attr "foo"
+ // set to "bar", the instantiated node N's attr A will have been
+ // given the value "bar".
+ string placeholder = 9;
+ }
+}
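+
+// Illustrative text-format values (exactly one field set per AttrValue):
+//   i: 42                      for an "int" attr
+//   list { s: "a" s: "b" }     for a "list(string)" attr
+//   func { name: "MatMul" }    for a "func" attr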
+
+// A list of attr names and their values. The whole list is attached
+// with a string name. E.g., MatMul[T=float].
+message NameAttrList {
+ string name = 1;
+ map<string, AttrValue> attr = 2;
+}
diff --git a/tensorflow/core/framework/attr_value_util.cc b/tensorflow/core/framework/attr_value_util.cc
new file mode 100644
index 0000000000..400ef118b8
--- /dev/null
+++ b/tensorflow/core/framework/attr_value_util.cc
@@ -0,0 +1,382 @@
+#include "tensorflow/core/framework/attr_value_util.h"
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+
+namespace {
+
+string SummarizeString(const string& str) {
+ return strings::StrCat("\"", str_util::CEscape(str), "\"");
+}
+
+string SummarizeShape(const TensorShapeProto& proto) {
+ TensorShape shape(proto);
+ return shape.ShortDebugString();
+}
+
+string SummarizeTensor(const TensorProto& tensor_proto) {
+ Tensor t;
+ if (!t.FromProto(tensor_proto)) {
+ return strings::StrCat("<Invalid TensorProto: ",
+ tensor_proto.ShortDebugString(), ">");
+ }
+ return t.DebugString();
+}
+
+} // namespace
+
+string SummarizeAttrValue(const AttrValue& attr_value) {
+ switch (attr_value.value_case()) {
+ case AttrValue::kS:
+ return SummarizeString(attr_value.s());
+ case AttrValue::kI:
+ return strings::StrCat(attr_value.i());
+ case AttrValue::kF:
+ return strings::StrCat(attr_value.f());
+ case AttrValue::kB:
+ return attr_value.b() ? "true" : "false";
+ case AttrValue::kType:
+ return DataType_Name(attr_value.type());
+ case AttrValue::kShape:
+ return SummarizeShape(attr_value.shape());
+ case AttrValue::kTensor:
+ return SummarizeTensor(attr_value.tensor());
+ case AttrValue::kList: {
+ string ret = "[";
+ if (attr_value.list().s_size() > 0) {
+ for (int i = 0; i < attr_value.list().s_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, SummarizeString(attr_value.list().s(i)));
+ }
+ } else if (attr_value.list().i_size() > 0) {
+ for (int i = 0; i < attr_value.list().i_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, attr_value.list().i(i));
+ }
+ } else if (attr_value.list().f_size() > 0) {
+ for (int i = 0; i < attr_value.list().f_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, attr_value.list().f(i));
+ }
+ } else if (attr_value.list().b_size() > 0) {
+ for (int i = 0; i < attr_value.list().b_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, attr_value.list().b(i) ? "true" : "false");
+ }
+ } else if (attr_value.list().type_size() > 0) {
+ for (int i = 0; i < attr_value.list().type_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, DataType_Name(attr_value.list().type(i)));
+ }
+ } else if (attr_value.list().shape_size() > 0) {
+ for (int i = 0; i < attr_value.list().shape_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, SummarizeShape(attr_value.list().shape(i)));
+ }
+ } else if (attr_value.list().tensor_size() > 0) {
+ for (int i = 0; i < attr_value.list().tensor_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret,
+ SummarizeTensor(attr_value.list().tensor(i)));
+ }
+ }
+ strings::StrAppend(&ret, "]");
+ return ret;
+ }
+ case AttrValue::kFunc: {
+ std::vector<string> entries;
+ for (auto p : attr_value.func().attr()) {
+ entries.push_back(
+ strings::StrCat(p.first, "=", SummarizeAttrValue(p.second)));
+ }
+ sort(entries.begin(), entries.end());
+ return strings::StrCat(attr_value.func().name(), "[",
+ str_util::Join(entries, ", "), "]");
+ }
+ case AttrValue::kPlaceholder:
+ return strings::StrCat("$", attr_value.placeholder());
+ case AttrValue::VALUE_NOT_SET:
+ return "<Unknown AttrValue type>";
+ }
+ return "<Unknown AttrValue type>"; // Prevent missing return warning
+}
+
+Status AttrValueHasType(const AttrValue& attr_value, StringPiece type) {
+ int num_set = 0;
+
+#define VALIDATE_FIELD(name, type_string, oneof_case) \
+ do { \
+ if (attr_value.has_list()) { \
+ if (attr_value.list().name##_size() > 0) { \
+ if (type != "list(" type_string ")") { \
+ return errors::InvalidArgument( \
+ "AttrValue had value with type list(" type_string ") when ", \
+ type, " expected"); \
+ } \
+ ++num_set; \
+ } \
+ } else if (attr_value.value_case() == AttrValue::oneof_case) { \
+ if (type != type_string) { \
+ return errors::InvalidArgument( \
+ "AttrValue had value with type " type_string " when ", type, \
+ " expected"); \
+ } \
+ ++num_set; \
+ } \
+ } while (false)
+
+ VALIDATE_FIELD(s, "string", kS);
+ VALIDATE_FIELD(i, "int", kI);
+ VALIDATE_FIELD(f, "float", kF);
+ VALIDATE_FIELD(b, "bool", kB);
+ VALIDATE_FIELD(type, "type", kType);
+ VALIDATE_FIELD(shape, "shape", kShape);
+ VALIDATE_FIELD(tensor, "tensor", kTensor);
+
+#undef VALIDATE_FIELD
+
+ if (attr_value.value_case() == AttrValue::kFunc) {
+ if (type != "func") {
+ return errors::InvalidArgument(
+ "AttrValue had value with type 'func' when ", type, " expected");
+ }
+ ++num_set;
+ }
+
+ if (attr_value.value_case() == AttrValue::kPlaceholder) {
+ return errors::InvalidArgument(
+        "AttrValue had value with unexpected type 'placeholder'");
+ }
+
+ // If the attr type is 'list', we expect attr_value.has_list() to be true.
+ // However, proto3's attr_value.has_list() can be false when set to an empty
+ // list. So we simply check if has_list is false and some other field in
+ // attr_value is set to flag the error.
+ if (StringPiece(type).starts_with("list(") && !attr_value.has_list()) {
+ if (num_set) {
+ return errors::InvalidArgument(
+ "AttrValue missing value with expected type ", type);
+ } else {
+ // Indicate that we have a list, but an empty one.
+ ++num_set;
+ }
+ }
+
+ // Okay to have an empty list, but not to be missing a non-list value.
+ if (num_set == 0 && !StringPiece(type).starts_with("list(")) {
+ return errors::InvalidArgument(
+ "AttrValue missing value with expected type ", type);
+ }
+
+ // Ref types and DT_INVALID are illegal.
+ if (type == "type") {
+ if (IsRefType(attr_value.type())) {
+ return errors::InvalidArgument(
+ "AttrValue must not have reference type value of ",
+ DataTypeString(attr_value.type()));
+ }
+ if (attr_value.type() == DT_INVALID) {
+ return errors::InvalidArgument("AttrValue has invalid DataType");
+ }
+ } else if (type == "list(type)") {
+ for (auto as_int : attr_value.list().type()) {
+ const DataType dtype = static_cast<DataType>(as_int);
+ if (IsRefType(dtype)) {
+ return errors::InvalidArgument(
+ "AttrValue must not have reference type value of ",
+ DataTypeString(dtype));
+ }
+ if (dtype == DT_INVALID) {
+ return errors::InvalidArgument("AttrValue contains invalid DataType");
+ }
+ }
+ }
+
+ return Status::OK();
+}
+
+bool ParseAttrValue(StringPiece type, StringPiece text, AttrValue* out) {
+ // Parse type.
+ string field_name;
+ bool is_list = type.Consume("list(");
+ if (type.Consume("string")) {
+ field_name = "s";
+ } else if (type.Consume("int")) {
+ field_name = "i";
+ } else if (type.Consume("float")) {
+ field_name = "f";
+ } else if (type.Consume("bool")) {
+ field_name = "b";
+ } else if (type.Consume("type")) {
+ field_name = "type";
+ } else if (type.Consume("shape")) {
+ field_name = "shape";
+ } else if (type.Consume("tensor")) {
+ field_name = "tensor";
+ } else if (type.Consume("func")) {
+ field_name = "func";
+ } else if (type.Consume("placeholder")) {
+ field_name = "placeholder";
+ } else {
+ return false;
+ }
+ if (is_list && !type.Consume(")")) {
+ return false;
+ }
+
+ // Construct a valid text proto message to parse.
+ string to_parse;
+ if (is_list) {
+ // TextFormat parser considers "i: 7" to be the same as "i: [7]",
+ // but we only want to allow list values with [].
+ if (!RE2::FullMatch(ToRegexpStringPiece(text), "\\s*\\[.*\\]\\s*")) {
+ return false;
+ }
+ if (RE2::FullMatch(ToRegexpStringPiece(text), "\\s*\\[\\s*\\]\\s*")) {
+ // User wrote "[]", so return empty list without invoking the TextFormat
+ // parse which returns an error for "i: []".
+ out->Clear();
+ out->mutable_list();
+ return true;
+ }
+ to_parse = strings::StrCat("list { ", field_name, ": ", text, " }");
+ } else {
+ to_parse = strings::StrCat(field_name, ": ", text);
+ }
+
+ // Parse if we can.
+ return protobuf::TextFormat::ParseFromString(to_parse, out);
+}
+
+#define DEFINE_SET_ATTR_VALUE_ONE(ARG_TYPE, FIELD) \
+ void SetAttrValue(ARG_TYPE value, AttrValue* out) { out->set_##FIELD(value); }
+
+#define DEFINE_SET_ATTR_VALUE_LIST(ARG_TYPE, FIELD) \
+ void SetAttrValue(ARG_TYPE value, AttrValue* out) { \
+ out->mutable_list(); /* create list() even if value empty */ \
+ for (const auto& v : value) { \
+ out->mutable_list()->add_##FIELD(v); \
+ } \
+ }
+
+#define DEFINE_SET_ATTR_VALUE_BOTH(ARG_TYPE, FIELD) \
+ DEFINE_SET_ATTR_VALUE_ONE(ARG_TYPE, FIELD) \
+ DEFINE_SET_ATTR_VALUE_LIST(gtl::ArraySlice<ARG_TYPE>, FIELD)
+
+DEFINE_SET_ATTR_VALUE_ONE(const string&, s)
+DEFINE_SET_ATTR_VALUE_LIST(gtl::ArraySlice<string>, s)
+DEFINE_SET_ATTR_VALUE_BOTH(const char*, s)
+DEFINE_SET_ATTR_VALUE_BOTH(int64, i)
+DEFINE_SET_ATTR_VALUE_BOTH(int32, i)
+DEFINE_SET_ATTR_VALUE_BOTH(float, f)
+DEFINE_SET_ATTR_VALUE_BOTH(double, f)
+DEFINE_SET_ATTR_VALUE_BOTH(bool, b)
+DEFINE_SET_ATTR_VALUE_LIST(const std::vector<bool>&, b)
+DEFINE_SET_ATTR_VALUE_LIST(std::initializer_list<bool>, b)
+DEFINE_SET_ATTR_VALUE_BOTH(DataType, type)
+
+void SetAttrValue(StringPiece value, AttrValue* out) {
+ out->set_s(value.data(), value.size());
+}
+
+void SetAttrValue(const TensorShape& value, AttrValue* out) {
+ value.AsProto(out->mutable_shape());
+}
+
+void SetAttrValue(const gtl::ArraySlice<TensorShape> value, AttrValue* out) {
+ out->mutable_list(); // Create list() even if value empty.
+ for (const auto& v : value) {
+ v.AsProto(out->mutable_list()->add_shape());
+ }
+}
+
+void SetAttrValue(const Tensor& value, AttrValue* out) {
+ if (value.NumElements() > 1) {
+ value.AsProtoTensorContent(out->mutable_tensor());
+ } else {
+ value.AsProtoField(out->mutable_tensor());
+ }
+}
+
+void SetAttrValue(const gtl::ArraySlice<Tensor> value, AttrValue* out) {
+ out->mutable_list(); // Create list() even if value empty.
+ for (const auto& v : value) {
+ if (v.NumElements() > 1) {
+ v.AsProtoTensorContent(out->mutable_list()->add_tensor());
+ } else {
+ v.AsProtoField(out->mutable_list()->add_tensor());
+ }
+ }
+}
+
+void SetAttrValue(const TensorProto& value, AttrValue* out) {
+ *out->mutable_tensor() = value;
+}
+
+void SetAttrValue(const gtl::ArraySlice<TensorProto> value, AttrValue* out) {
+ out->mutable_list(); // Create list() even if value empty.
+ for (const auto& v : value) {
+ *out->mutable_list()->add_tensor() = v;
+ }
+}
+
+void SetAttrValue(const NameAttrList& value, AttrValue* out) {
+ *out->mutable_func() = value;
+}
+
+bool AreAttrValuesEqual(const AttrValue& a, const AttrValue& b) {
+ string a_str, b_str;
+ a.SerializeToString(&a_str);
+ b.SerializeToString(&b_str);
+ // Note: it should be safe to compare proto serializations of the attr
+ // values since at most one field should be set in each (indeed, it
+ // must be the same field if they are to compare equal).
+ // Exception: there are multiple equivalent representations of
+ // TensorProtos. So a return value of true implies a == b, but not the
+ // converse.
+ return a_str == b_str;
+}
+
+bool HasPlaceHolder(const AttrValue& val) {
+ switch (val.value_case()) {
+ case AttrValue::kFunc:
+ for (const auto& p : val.func().attr()) {
+ if (HasPlaceHolder(p.second)) {
+ return true;
+ }
+ }
+ break;
+ case AttrValue::kPlaceholder:
+ return true;
+ default:
+ break;
+ }
+ return false;
+}
+
+bool SubstitutePlaceholders(SubstituteFunc substitute, AttrValue* value) {
+ switch (value->value_case()) {
+ case AttrValue::kFunc:
+ for (auto& p : *(value->mutable_func()->mutable_attr())) {
+ if (!SubstitutePlaceholders(substitute, &p.second)) {
+ return false;
+ }
+ }
+ break;
+ case AttrValue::kPlaceholder:
+ return substitute(value->placeholder(), value);
+ case AttrValue::VALUE_NOT_SET:
+ return false;
+ default:
+ break;
+ }
+ return true;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/attr_value_util.h b/tensorflow/core/framework/attr_value_util.h
new file mode 100644
index 0000000000..1faf74a327
--- /dev/null
+++ b/tensorflow/core/framework/attr_value_util.h
@@ -0,0 +1,83 @@
+#ifndef TENSORFLOW_FRAMEWORK_ATTR_VALUE_UTIL_H_
+#define TENSORFLOW_FRAMEWORK_ATTR_VALUE_UTIL_H_
+
+#include <string>
+#include "tensorflow/core/framework/attr_value.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+namespace tensorflow {
+
+// A human-readable rendering of attr_value, that is more concise than a
+// text-format proto.
+string SummarizeAttrValue(const AttrValue& attr_value);
+
+// Generates an error if attr_value doesn't have the indicated attr type.
+Status AttrValueHasType(const AttrValue& attr_value, StringPiece type);
+
+// Converts a text proto value from "text" into the field of *out
+// indicated by "type" (e.g. from the type field of an AttrDef).
+// Examples:
+// * If type:"int" and text:"-14", then *out is set to "i: -14"
+// * If type:"list(string)" and text:"['foo', 'bar']",
+// then *out is set to "list { s: ['foo', 'bar'] }"
+// Returns true on success.
+bool ParseAttrValue(StringPiece type, StringPiece text, AttrValue* out);
+
+// Sets *out based on the type of value.
+void SetAttrValue(const string& value, AttrValue* out);
+void SetAttrValue(const char* value, AttrValue* out);
+void SetAttrValue(StringPiece value, AttrValue* out);
+void SetAttrValue(int64 value, AttrValue* out);
+void SetAttrValue(int32 value, AttrValue* out);
+void SetAttrValue(float value, AttrValue* out);
+void SetAttrValue(double value, AttrValue* out);
+void SetAttrValue(bool value, AttrValue* out);
+void SetAttrValue(DataType value, AttrValue* out);
+void SetAttrValue(const TensorShape& value, AttrValue* out);
+void SetAttrValue(const Tensor& value, AttrValue* out);
+void SetAttrValue(const TensorProto& value, AttrValue* out);
+void SetAttrValue(const NameAttrList& value, AttrValue* out);
+
+void SetAttrValue(gtl::ArraySlice<string> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<const char*> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<int64> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<int32> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<float> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<double> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<bool> value, AttrValue* out);
+void SetAttrValue(const std::vector<bool>& value, AttrValue* out);
+void SetAttrValue(std::initializer_list<bool> value, AttrValue* out);
+void SetAttrValue(DataTypeSlice value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<TensorShape> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<Tensor> value, AttrValue* out);
+void SetAttrValue(gtl::ArraySlice<TensorProto> value, AttrValue* out);
+
+inline void SetAttrValue(const AttrValue& value, AttrValue* out) {
+ *out = value;
+}
+
+// Returns true if a and b have the same value.
+// NOTE: May return false negatives for tensor values.
+bool AreAttrValuesEqual(const AttrValue& a, const AttrValue& b);
+
+// Returns true if "val" has a placeholder.
+bool HasPlaceHolder(const AttrValue& val);
+
+// SubstitutePlaceholders recursively replaces placeholders in 'value'
+// with an attr value by calling SubstituteFunc. Returns true iff all
+// placeholders in "value" are replaced with a value.
+//
+// SubstituteFunc is given a placeholder string. If the placeholder is
+// unknown, SubstituteFunc returns false. Otherwise, overwrites the
+// attr value and returns true.
+typedef std::function<bool(const string&, AttrValue*)> SubstituteFunc;
+bool SubstitutePlaceholders(SubstituteFunc substitute, AttrValue* value);
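+//
+// For example (an illustrative sketch; 'attr' stands for a caller-owned
+// AttrValue that may contain "$T" placeholders):
+//
+//   AttrValue dtype;
+//   SetAttrValue(DT_FLOAT, &dtype);
+//   SubstitutePlaceholders(
+//       [&dtype](const string& name, AttrValue* out) {
+//         if (name != "T") return false;
+//         *out = dtype;
+//         return true;
+//       },
+//       &attr);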
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_ATTR_VALUE_UTIL_H_
diff --git a/tensorflow/core/framework/attr_value_util_test.cc b/tensorflow/core/framework/attr_value_util_test.cc
new file mode 100644
index 0000000000..bdfbf1707a
--- /dev/null
+++ b/tensorflow/core/framework/attr_value_util_test.cc
@@ -0,0 +1,91 @@
+#include "tensorflow/core/framework/attr_value_util.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+// A few helpers to construct AttrValue protos.
+template <typename T>
+AttrValue V(T value) {
+ AttrValue ret;
+ SetAttrValue(value, &ret);
+ return ret;
+}
+
+AttrValue P(const string& p) {
+ AttrValue ret;
+ ret.set_placeholder(p);
+ return ret;
+}
+
+AttrValue F(const string& name,
+ std::vector<std::pair<string, AttrValue> > pairs) {
+ AttrValue ret;
+ ret.mutable_func()->set_name(name);
+ ret.mutable_func()->mutable_attr()->insert(pairs.begin(), pairs.end());
+ return ret;
+}
+
+TEST(AttrValueUtil, HasType) {
+ // OK
+ EXPECT_TRUE(AttrValueHasType(V(123), "int").ok());
+ EXPECT_TRUE(AttrValueHasType(V(1.2), "float").ok());
+ EXPECT_TRUE(AttrValueHasType(V(DT_FLOAT), "type").ok());
+ EXPECT_TRUE(AttrValueHasType(F("f", {}), "func").ok());
+
+ // not OK.
+ EXPECT_FALSE(AttrValueHasType(V(123), "func").ok());
+ EXPECT_FALSE(AttrValueHasType(V(1.2), "int").ok());
+ EXPECT_FALSE(AttrValueHasType(V(DT_FLOAT), "shape").ok());
+ EXPECT_FALSE(AttrValueHasType(F("f", {}), "string").ok());
+ EXPECT_FALSE(AttrValueHasType(P("T"), "float").ok());
+}
+
+SubstituteFunc ReplaceTWith(const AttrValue& val) {
+ return [val](const string& placeholder, AttrValue* target) {
+ if (placeholder == "T") {
+ *target = val;
+ return true;
+ } else {
+ return false;
+ }
+ };
+}
+
+TEST(AttrValueUtil, Basic) {
+ auto v = F("MatMul", {{"dtype", P("T")},
+ {"transpose_a", V(false)},
+ {"transpose_b", V(true)},
+ {"use_cublas", V(true)}});
+ TF_CHECK_OK(AttrValueHasType(v, "func"));
+ EXPECT_TRUE(HasPlaceHolder(v));
+
+ EXPECT_EQ(
+ SummarizeAttrValue(v),
+ "MatMul[dtype=$T, transpose_a=false, transpose_b=true, use_cublas=true]");
+
+ SubstitutePlaceholders(ReplaceTWith(V(DT_FLOAT)), &v);
+ EXPECT_TRUE(!HasPlaceHolder(v));
+ EXPECT_EQ(SummarizeAttrValue(v),
+ "MatMul[dtype=DT_FLOAT, transpose_a=false, transpose_b=true, "
+ "use_cublas=true]");
+}
+
+TEST(AttrValueUtil, DeepAttr) {
+ auto v = F("f", {{"T", P("T")}});
+ TF_CHECK_OK(AttrValueHasType(v, "func"));
+ EXPECT_TRUE(HasPlaceHolder(v));
+
+ for (int i = 0; i < 3; ++i) {
+ v = F("f", {{"T", P("T")}, {"F", v}});
+ EXPECT_TRUE(HasPlaceHolder(v));
+ }
+ EXPECT_EQ(SummarizeAttrValue(v), "f[F=f[F=f[F=f[T=$T], T=$T], T=$T], T=$T]");
+
+ SubstitutePlaceholders(ReplaceTWith(F("x", {})), &v);
+ EXPECT_TRUE(!HasPlaceHolder(v));
+ EXPECT_EQ(SummarizeAttrValue(v),
+ "f[F=f[F=f[F=f[T=x[]], T=x[]], T=x[]], T=x[]]");
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/bfloat16.cc b/tensorflow/core/framework/bfloat16.cc
new file mode 100644
index 0000000000..0068283367
--- /dev/null
+++ b/tensorflow/core/framework/bfloat16.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/framework/bfloat16.h"
+
+namespace tensorflow {
+
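+// Note: both routines below copy the upper 16 bits of each 32-bit IEEE-754
+// float, which assumes a little-endian memory layout (p[1]/q[1] is the half
+// holding the sign, exponent and top mantissa bits).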
+void FloatToBFloat16(const float* src, bfloat16* dst, int64 size) {
+ const uint16_t* p = reinterpret_cast<const uint16_t*>(src);
+ uint16_t* q = reinterpret_cast<uint16_t*>(dst);
+ for (; size; p += 2, q++, size--) {
+ *q = p[1];
+ }
+}
+
+void BFloat16ToFloat(const bfloat16* src, float* dst, int64 size) {
+ const uint16_t* p = reinterpret_cast<const uint16_t*>(src);
+ uint16_t* q = reinterpret_cast<uint16_t*>(dst);
+ for (; size; p++, q += 2, size--) {
+ q[0] = 0;
+ q[1] = *p;
+ }
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/bfloat16.h b/tensorflow/core/framework/bfloat16.h
new file mode 100644
index 0000000000..9cd260ee13
--- /dev/null
+++ b/tensorflow/core/framework/bfloat16.h
@@ -0,0 +1,58 @@
+#ifndef TENSORFLOW_FRAMEWORK_BFLOAT16_H_
+#define TENSORFLOW_FRAMEWORK_BFLOAT16_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+// Compact 16-bit encoding of floating point numbers. This representation uses
+// 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa. It
+// is assumed that floats are in IEEE 754 format so the representation is just
+// bits 16-31 of a single precision float.
+//
+// NOTE: The IEEE floating point standard defines a float16 format that
+// is different than this format (it has fewer bits of exponent and more
+// bits of mantissa). We don't use that format here because conversion
+// to/from 32-bit floats is more complex for that format, and the
+// conversion for this format is very simple.
+//
+// Because of the existing IEEE float16 type, we do not name our representation
+// "float16" but just use "uint16".
+//
+// <-----our 16bits float------->
+// s e e e e e e e e f f f f f f f f f f f f f f f f f f f f f f f
+// <------------------------------float-------------------------->
+// 3 3 2 2 1 1 0
+// 1 0 3 2 5 4 0
+//
+//
+// This type only supports conversion back and forth with float.
+//
+// This file must be compilable by nvcc.
+
+namespace tensorflow {
+struct bfloat16 {
+ EIGEN_DEVICE_FUNC bfloat16() {}
+ EIGEN_DEVICE_FUNC explicit bfloat16(const uint16_t v) : value(v) {}
+
+ uint16_t value;
+};
+
+// Conversion routines between an array of float and bfloat16 of
+// "size".
+void FloatToBFloat16(const float* src, bfloat16* dst, int64 size);
+void BFloat16ToFloat(const bfloat16* src, float* dst, int64 size);
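+//
+// Illustrative round trip (3.25f is preserved exactly because its mantissa
+// fits in 7 bits):
+//
+//   float in = 3.25f, out;
+//   bfloat16 b;
+//   FloatToBFloat16(&in, &b, 1);
+//   BFloat16ToFloat(&b, &out, 1);  // out == 3.25f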
+
+} // namespace tensorflow
+
+namespace Eigen {
+template <>
+struct NumTraits<tensorflow::bfloat16> : GenericNumTraits<uint16_t> {};
+
+EIGEN_STRONG_INLINE bool operator==(const tensorflow::bfloat16 a,
+ const tensorflow::bfloat16 b) {
+ return a.value == b.value;
+}
+
+} // namespace Eigen
+
+#endif // TENSORFLOW_FRAMEWORK_BFLOAT16_H_
diff --git a/tensorflow/core/framework/bfloat16_test.cc b/tensorflow/core/framework/bfloat16_test.cc
new file mode 100644
index 0000000000..4fe791fdeb
--- /dev/null
+++ b/tensorflow/core/framework/bfloat16_test.cc
@@ -0,0 +1,69 @@
+#include "tensorflow/core/framework/bfloat16.h"
+
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(Bfloat16Test, Simple) {
+ bfloat16 a(12);
+ EXPECT_EQ(12, a.value);
+}
+
+TEST(Bfloat16Test, Conversion) {
+ float a[100];
+ for (int i = 0; i < 100; ++i) {
+ a[i] = i + 1.25;
+ }
+ bfloat16 b[100];
+ float c[100];
+ FloatToBFloat16(a, b, 100);
+ BFloat16ToFloat(b, c, 100);
+ for (int i = 0; i < 100; ++i) {
+ // The relative error should be less than 1/(2^7) since bfloat16
+ // has 7 bits mantissa.
+ EXPECT_LE(fabs(c[i] - a[i]) / a[i], 1.0 / 128);
+ }
+}
+
+static void BM_FloatToBFloat16(int iters) {
+ testing::StopTiming();
+ static const int N = 32 << 20;
+ const int64 tot = static_cast<int64>(iters) * N;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * (sizeof(float) + sizeof(bfloat16)));
+
+ float* inp = new float[N];
+ bfloat16* out = new bfloat16[N];
+
+ testing::StartTiming();
+ while (iters--) {
+ FloatToBFloat16(inp, out, N);
+ }
+ delete[] inp;
+ delete[] out;
+}
+BENCHMARK(BM_FloatToBFloat16);
+
+static void BM_BFloat16ToFloat(int iters) {
+ testing::StopTiming();
+ static const int N = 32 << 20;
+ const int64 tot = static_cast<int64>(iters) * N;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * (sizeof(float) + sizeof(bfloat16)));
+
+ bfloat16* inp = new bfloat16[N];
+ float* out = new float[N];
+
+ testing::StartTiming();
+ while (iters--) {
+ BFloat16ToFloat(inp, out, N);
+ }
+ delete[] inp;
+ delete[] out;
+}
+BENCHMARK(BM_BFloat16ToFloat);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/cancellation.cc b/tensorflow/core/framework/cancellation.cc
new file mode 100644
index 0000000000..51423792a8
--- /dev/null
+++ b/tensorflow/core/framework/cancellation.cc
@@ -0,0 +1,79 @@
+#include "tensorflow/core/framework/cancellation.h"
+
+#include <vector>
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+const CancellationToken CancellationManager::kInvalidToken = -1;
+
+CancellationManager::CancellationManager()
+    : is_cancelling_(false), is_cancelled_(false), next_cancellation_token_(0) {}
+
+void CancellationManager::StartCancel() {
+ std::unordered_map<CancellationToken, CancelCallback> callbacks_to_run;
+ {
+ mutex_lock l(mu_);
+ if (is_cancelled_.load(std::memory_order_relaxed) || is_cancelling_) {
+ return;
+ }
+ is_cancelling_ = true;
+ std::swap(callbacks_, callbacks_to_run);
+ }
+ // We call these callbacks without holding mu_, so that concurrent
+ // calls to DeregisterCallback, which can happen asynchronously, do
+ // not block. The callbacks remain valid because any concurrent call
+ // to DeregisterCallback will block until the
+ // cancelled_notification_ is notified.
+ for (auto key_and_value : callbacks_to_run) {
+ key_and_value.second();
+ }
+ {
+ mutex_lock l(mu_);
+ is_cancelling_ = false;
+ is_cancelled_.store(true, std::memory_order_release);
+ }
+ cancelled_notification_.Notify();
+}
+
+CancellationToken CancellationManager::get_cancellation_token() {
+ mutex_lock l(mu_);
+ return next_cancellation_token_++;
+}
+
+bool CancellationManager::RegisterCallback(CancellationToken token,
+ CancelCallback callback) {
+ mutex_lock l(mu_);
+ CHECK_LT(token, next_cancellation_token_) << "Invalid cancellation token";
+ bool should_register = !is_cancelled_ && !is_cancelling_;
+ if (should_register) {
+ std::swap(callbacks_[token], callback);
+ }
+ return should_register;
+}
+
+bool CancellationManager::DeregisterCallback(CancellationToken token) {
+ mu_.lock();
+ if (is_cancelled_) {
+ mu_.unlock();
+ return false;
+ } else if (is_cancelling_) {
+ mu_.unlock();
+ // Wait for all of the cancellation callbacks to be called. This
+ // wait ensures that the caller of DeregisterCallback does not
+ // return immediately and free objects that may be used in the
+ // execution of any currently pending callbacks in StartCancel.
+ cancelled_notification_.WaitForNotification();
+ return false;
+ } else {
+ callbacks_.erase(token);
+ mu_.unlock();
+ return true;
+ }
+}
+
+CancellationManager::~CancellationManager() { StartCancel(); }
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/cancellation.h b/tensorflow/core/framework/cancellation.h
new file mode 100644
index 0000000000..feda548e97
--- /dev/null
+++ b/tensorflow/core/framework/cancellation.h
@@ -0,0 +1,121 @@
+#ifndef TENSORFLOW_FRAMEWORK_CANCELLATION_H_
+#define TENSORFLOW_FRAMEWORK_CANCELLATION_H_
+
+#include <atomic>
+#include <functional>
+#include <unordered_map>
+
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// A token that can be used to register and deregister a
+// CancelCallback with a CancellationManager.
+//
+// CancellationToken values must be created by a call to
+// CancellationManager::get_cancellation_token.
+typedef int64 CancellationToken;
+
+// A callback that is invoked when a step is cancelled.
+//
+// NOTE(mrry): See caveats about CancelCallback implementations in the
+// comment for CancellationManager::RegisterCallback.
+typedef std::function<void()> CancelCallback;
+
+class CancellationManager {
+ public:
+ // A value that won't be returned by get_cancellation_token().
+ static const CancellationToken kInvalidToken;
+
+ CancellationManager();
+ ~CancellationManager();
+
+ // Run all callbacks associated with this manager.
+ void StartCancel();
+
+ // Returns true iff StartCancel() has been called.
+ bool IsCancelled() { return is_cancelled_.load(std::memory_order_acquire); }
+
+ // Returns a token that must be used in calls to RegisterCallback
+ // and DeregisterCallback.
+ CancellationToken get_cancellation_token();
+
+ // Attempts to register the given callback to be invoked when this
+ // manager is cancelled. Returns true if the callback was
+ // registered; returns false if this manager was already cancelled,
+ // and the callback was not registered.
+ //
+ // If this method returns false, it is the caller's responsibility
+ // to perform any cancellation cleanup.
+ //
+ // This method is tricky to use correctly. The following usage pattern
+ // is recommended:
+ //
+ // class ObjectWithCancellableOperation {
+ // mutex mu_;
+ // void CancellableOperation(CancellationManager* cm,
+ // std::function<void(Status)> callback) {
+ // bool already_cancelled;
+ // CancellationToken token = cm->get_cancellation_token();
+ // {
+  //        mutex_lock l(mu_);
+  //        already_cancelled = !cm->RegisterCallback(
+  //            token, [this, token]() { Cancel(token); });
+ // if (!already_cancelled) {
+ // // Issue asynchronous operation. Associate the pending operation
+ // // with `token` in some object state, or provide another way for
+ // // the Cancel method to look up the operation for cancellation.
+ // // Ensure that `cm->DeregisterCallback(token)` is called without
+ // // holding `mu_`, before `callback` is invoked.
+ // // ...
+ // }
+ // }
+ // if (already_cancelled) {
+ // callback(errors::Cancelled("Operation was cancelled"));
+ // }
+ // }
+ //
+ // void Cancel(CancellationToken token) {
+  //      mutex_lock l(mu_);
+ // // Take action to cancel the operation with the given cancellation
+ // // token.
+ // }
+ //
+ // NOTE(mrry): The caller should take care that (i) the calling code
+ // is robust to `callback` being invoked asynchronously (e.g. from
+ // another thread), (ii) `callback` is deregistered by a call to
+ // this->DeregisterCallback(token) when the operation completes
+ // successfully, and (iii) `callback` does not invoke any method
+ // on this cancellation manager. Furthermore, it is important that
+ // the eventual caller of the complementary DeregisterCallback does not
+ // hold any mutexes that are required by `callback`.
+ bool RegisterCallback(CancellationToken token, CancelCallback callback);
+
+ // Deregister the callback that, when registered, was associated
+ // with the given cancellation token. Returns true iff the callback
+ // was deregistered and will not be invoked; otherwise returns false
+ // after the callback has been invoked, blocking if necessary.
+ //
+ // NOTE(mrry): This method may block if cancellation is in progress.
+ // The caller of this method must not hold any mutexes that are required
+ // to invoke any cancellation callback that has been registered with this
+ // cancellation manager.
+ bool DeregisterCallback(CancellationToken token);
+
+ private:
+ bool is_cancelling_;
+ std::atomic_bool is_cancelled_;
+
+ mutex mu_;
+ Notification cancelled_notification_;
+ CancellationToken next_cancellation_token_ GUARDED_BY(mu_);
+ std::unordered_map<CancellationToken, CancelCallback> callbacks_
+ GUARDED_BY(mu_);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_CANCELLATION_H_
diff --git a/tensorflow/core/framework/cancellation_test.cc b/tensorflow/core/framework/cancellation_test.cc
new file mode 100644
index 0000000000..1925dd20cc
--- /dev/null
+++ b/tensorflow/core/framework/cancellation_test.cc
@@ -0,0 +1,102 @@
+#include "tensorflow/core/framework/cancellation.h"
+
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(Cancellation, SimpleNoCancel) {
+ bool is_cancelled = false;
+ CancellationManager* manager = new CancellationManager();
+ auto token = manager->get_cancellation_token();
+ bool registered = manager->RegisterCallback(
+ token, [&is_cancelled]() { is_cancelled = true; });
+ EXPECT_TRUE(registered);
+ bool deregistered = manager->DeregisterCallback(token);
+ EXPECT_TRUE(deregistered);
+ delete manager;
+ EXPECT_FALSE(is_cancelled);
+}
+
+TEST(Cancellation, SimpleCancel) {
+ bool is_cancelled = false;
+ CancellationManager* manager = new CancellationManager();
+ auto token = manager->get_cancellation_token();
+ bool registered = manager->RegisterCallback(
+ token, [&is_cancelled]() { is_cancelled = true; });
+ EXPECT_TRUE(registered);
+ manager->StartCancel();
+ EXPECT_TRUE(is_cancelled);
+ delete manager;
+}
+
+TEST(Cancellation, CancelBeforeRegister) {
+ CancellationManager* manager = new CancellationManager();
+ auto token = manager->get_cancellation_token();
+ manager->StartCancel();
+ bool registered = manager->RegisterCallback(token, nullptr);
+ EXPECT_FALSE(registered);
+ delete manager;
+}
+
+TEST(Cancellation, DeregisterAfterCancel) {
+ bool is_cancelled = false;
+ CancellationManager* manager = new CancellationManager();
+ auto token = manager->get_cancellation_token();
+ bool registered = manager->RegisterCallback(
+ token, [&is_cancelled]() { is_cancelled = true; });
+ EXPECT_TRUE(registered);
+ manager->StartCancel();
+ EXPECT_TRUE(is_cancelled);
+ bool deregistered = manager->DeregisterCallback(token);
+ EXPECT_FALSE(deregistered);
+ delete manager;
+}
+
+TEST(Cancellation, CancelMultiple) {
+ bool is_cancelled_1 = false, is_cancelled_2 = false, is_cancelled_3 = false;
+ CancellationManager* manager = new CancellationManager();
+ auto token_1 = manager->get_cancellation_token();
+ bool registered_1 = manager->RegisterCallback(
+ token_1, [&is_cancelled_1]() { is_cancelled_1 = true; });
+ EXPECT_TRUE(registered_1);
+ auto token_2 = manager->get_cancellation_token();
+ bool registered_2 = manager->RegisterCallback(
+ token_2, [&is_cancelled_2]() { is_cancelled_2 = true; });
+ EXPECT_TRUE(registered_2);
+ EXPECT_FALSE(is_cancelled_1);
+ EXPECT_FALSE(is_cancelled_2);
+ manager->StartCancel();
+ EXPECT_TRUE(is_cancelled_1);
+ EXPECT_TRUE(is_cancelled_2);
+ EXPECT_FALSE(is_cancelled_3);
+ auto token_3 = manager->get_cancellation_token();
+ bool registered_3 = manager->RegisterCallback(
+ token_3, [&is_cancelled_3]() { is_cancelled_3 = true; });
+ EXPECT_FALSE(registered_3);
+ EXPECT_FALSE(is_cancelled_3);
+ delete manager;
+}
+
+TEST(Cancellation, IsCancelled) {
+ CancellationManager* cm = new CancellationManager();
+ thread::ThreadPool w(Env::Default(), "test", 4);
+ std::vector<Notification> done(8);
+ for (size_t i = 0; i < done.size(); ++i) {
+ Notification* n = &done[i];
+ w.Schedule([n, cm]() {
+ while (!cm->IsCancelled()) {
+ }
+ n->Notify();
+ });
+ }
+ Env::Default()->SleepForMicroseconds(1000000 /* 1 second */);
+ cm->StartCancel();
+ for (size_t i = 0; i < done.size(); ++i) {
+ done[i].WaitForNotification();
+ }
+ delete cm;
+}
+
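+// Illustrative sketch: exercises the caller-side cleanup responsibility
+// documented for CancellationManager::RegisterCallback() -- when
+// registration fails because the manager was already cancelled, the callback
+// never runs and the caller must perform any cancellation cleanup itself.
+TEST(Cancellation, CallerCleanupWhenAlreadyCancelled) {
+  CancellationManager manager;
+  auto token = manager.get_cancellation_token();
+  manager.StartCancel();
+  bool callback_ran = false;
+  bool cleanup_done = false;
+  bool registered = manager.RegisterCallback(
+      token, [&callback_ran]() { callback_ran = true; });
+  if (!registered) {
+    // The callback was rejected, so the caller performs the cleanup that the
+    // callback would otherwise have been responsible for.
+    cleanup_done = true;
+  }
+  EXPECT_FALSE(registered);
+  EXPECT_FALSE(callback_ran);
+  EXPECT_TRUE(cleanup_done);
+}
+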
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/config.proto b/tensorflow/core/framework/config.proto
new file mode 100644
index 0000000000..f0def3d6d7
--- /dev/null
+++ b/tensorflow/core/framework/config.proto
@@ -0,0 +1,61 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+message GPUOptions {
+ // A value between 0 and 1 that indicates what fraction of the
+ // available GPU memory to pre-allocate for each process. 1 means
+ // to pre-allocate all of the GPU memory, 0.5 means the process
+ // allocates ~50% of the available GPU memory.
+ double per_process_gpu_memory_fraction = 1;
+};
+
+// Session configuration parameters.
+// The system picks appropriate values for fields that are not set.
+message ConfigProto {
+ // Map from device type name (e.g., "CPU" or "GPU" ) to maximum
+ // number of devices of that type to use. If a particular device
+ // type is not found in the map, the system picks an appropriate
+ // number.
+ map<string, int32> device_count = 1;
+
+ // The execution of an individual op (for some op types) can be
+ // parallelized on a pool of intra_op_parallelism_threads.
+ // 0 means the system picks an appropriate number.
+ int32 intra_op_parallelism_threads = 2;
+
+ // Nodes that perform blocking operations are enqueued on a pool of
+ // inter_op_parallelism_threads available in each process.
+ //
+ // 0 means the system picks an appropriate number.
+ //
+ // Note that the first Session created in the process sets the
+ // number of threads for all future sessions.
+ int32 inter_op_parallelism_threads = 5;
+
+ // Assignment of Nodes to Devices is recomputed every placement_period
+ // steps until the system warms up (at which point the recomputation
+ // typically slows down automatically).
+ int32 placement_period = 3;
+
+  // When any filters are present, sessions will ignore all devices which do not
+ // match the filters. Each filter can be partially specified, e.g. "/job:ps"
+ // "/job:worker/replica:3", etc.
+ repeated string device_filters = 4;
+
+ // Options that apply to all GPUs.
+ GPUOptions gpu_options = 6;
+
+ // Whether soft placement is allowed. If allow_soft_placement is true,
+ // an op will be placed on CPU if
+ // 1. there's no GPU implementation for the OP
+ // or
+ // 2. no GPU devices are known or registered
+ // or
+  //   3. the op needs to be co-located with reftype input(s) which are from CPU.
+ bool allow_soft_placement = 7;
+
+ // Whether device placements should be logged.
+ bool log_device_placement = 8;
+};
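+
+// Illustrative sketch: a C++ client would typically populate this proto
+// through the standard protobuf-generated accessors before creating a
+// session, e.g.
+//
+//   ConfigProto config;
+//   config.set_intra_op_parallelism_threads(8);
+//   config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.5);
+//   (*config.mutable_device_count())["GPU"] = 1;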
diff --git a/tensorflow/core/framework/control_flow.h b/tensorflow/core/framework/control_flow.h
new file mode 100644
index 0000000000..f59e0f5310
--- /dev/null
+++ b/tensorflow/core/framework/control_flow.h
@@ -0,0 +1,43 @@
+#ifndef TENSORFLOW_FRAMEWORK_CONTROL_FLOW_H_
+#define TENSORFLOW_FRAMEWORK_CONTROL_FLOW_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/hash/hash.h"
+
+namespace tensorflow {
+
+const uint64 kIllegalFrameId = ~0uLL;
+const int64 kIllegalIterId = -1;
+
+// For the purpose of control flow, every tensor produced by TensorFlow is
+// conceptually tagged by a 'FrameAndIter'. FrameAndIter consists of a
+// 'frame_id' and an 'iter_id'. The tensor value it represents is produced
+// in the frame with frame_id at the iteration of iter_id.
+struct FrameAndIter {
+ uint64 frame_id = kIllegalFrameId;
+ int64 iter_id = kIllegalIterId;
+
+ FrameAndIter() {}
+
+ FrameAndIter(uint64 frame, int64 iter) {
+ frame_id = frame;
+ iter_id = iter;
+ }
+
+ bool operator==(const FrameAndIter& other) const {
+ return (frame_id == other.frame_id && iter_id == other.iter_id);
+ }
+};
+
+struct FrameAndIterHash {
+ size_t operator()(const FrameAndIter& key) const {
+ // Make sure there are no padding bytes that we don't want
+ CHECK_EQ(sizeof(uint64) + sizeof(int64), sizeof(FrameAndIter));
+ return Hash64(reinterpret_cast<const char*>(&key), sizeof(FrameAndIter));
+ }
+};
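+
+// Illustrative sketch: FrameAndIterHash is intended as the hash functor for
+// hash-based containers keyed by FrameAndIter (equality comes from
+// operator== above), e.g.
+//
+//   std::unordered_map<FrameAndIter, int, FrameAndIterHash> visit_counts;
+//   visit_counts[FrameAndIter(0, 0)]++;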
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_CONTROL_FLOW_H_
diff --git a/tensorflow/core/framework/device_attributes.proto b/tensorflow/core/framework/device_attributes.proto
new file mode 100644
index 0000000000..7592215d1e
--- /dev/null
+++ b/tensorflow/core/framework/device_attributes.proto
@@ -0,0 +1,35 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+// BusAdjacency identifies the ability of a device to participate in
+// maximally efficient DMA operations within the local context of a
+// process.
+//
+// This is currently ignored.
+enum BusAdjacency {
+ BUS_0 = 0;
+ BUS_1 = 1;
+ BUS_ANY = 2;
+ BUS_NUM_ADJACENCIES = 3;
+};
+
+message DeviceAttributes {
+ string name = 1;
+
+ // String representation of device_type.
+ string device_type = 2;
+
+ // Memory capacity of device in bytes.
+ int64 memory_limit = 4;
+
+ BusAdjacency bus_adjacency = 5;
+
+ // A device is assigned a global unique number each time it is
+ // initialized. "incarnation" should never be 0.
+ fixed64 incarnation = 6;
+
+ // String representation of the physical device that this device maps to.
+ string physical_device_desc = 7;
+}
diff --git a/tensorflow/core/framework/device_base.cc b/tensorflow/core/framework/device_base.cc
new file mode 100644
index 0000000000..83ad199062
--- /dev/null
+++ b/tensorflow/core/framework/device_base.cc
@@ -0,0 +1,7 @@
+#include "tensorflow/core/framework/device_base.h"
+
+namespace tensorflow {
+
+DeviceBase::~DeviceBase() {}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/device_base.h b/tensorflow/core/framework/device_base.h
new file mode 100644
index 0000000000..ed4ffc5d94
--- /dev/null
+++ b/tensorflow/core/framework/device_base.h
@@ -0,0 +1,172 @@
+#ifndef TENSORFLOW_FRAMEWORK_DEVICE_BASE_H_
+#define TENSORFLOW_FRAMEWORK_DEVICE_BASE_H_
+
+#include <memory>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/public/status.h"
+
+namespace Eigen {
+class ThreadPoolDevice;
+} // end namespace Eigen
+
+namespace perftools {
+namespace gputools {
+class Stream;
+} // namespace gputools
+} // namespace perftools
+
+namespace tensorflow {
+
+class Device;
+class Env;
+class EventMgr;
+
+namespace thread {
+class ThreadPool;
+}
+
+// A wrapper for an Eigen Gpu Device that includes per-op state
+class PerOpGpuDevice {
+ public:
+ virtual ~PerOpGpuDevice() {}
+ virtual const Eigen::GpuDevice& device() const = 0;
+};
+
+// A class that devices can subclass to pass around
+// Device-specific context to OpKernels.
+class DeviceContext : public core::RefCounted {
+ public:
+ ~DeviceContext() override {}
+ virtual perftools::gputools::Stream* stream() const { return nullptr; }
+ virtual void MaintainLifetimeOnStream(
+ const Tensor* t, perftools::gputools::Stream* stream) const {}
+
+ // "cpu_tensor" is a tensor on a CPU. Copies "cpu_tensor" into
+ // "device_tensor" which is on a GPU device "device". "device_tensor"
+ // must be allocated to be of the same size as "cpu_tensor".
+ virtual void CopyCPUTensorToDevice(const Tensor* cpu_tensor, Device* device,
+ Tensor* device_tensor,
+ StatusCallback done) const {
+ done(errors::Internal("Unrecognized device type in CPU-to-device Copy"));
+ }
+
+ // "device_tensor" is a tensor on a non-CPU device. Copies
+ // device_tensor into "cpu_tensor". "cpu_tensor" must be allocated
+ // to be of the same size as "device_tensor".
+ virtual void CopyDeviceTensorToCPU(const Tensor* device_tensor,
+ const string& tensor_name, Device* device,
+ Tensor* cpu_tensor, StatusCallback done) {
+ done(errors::Internal("Unrecognized device type in device-to-CPU Copy"));
+ }
+};
+
+typedef std::unordered_map<int, DeviceContext*> DeviceContextMap;
+
+class DeviceBase {
+ public:
+ explicit DeviceBase(Env* env) : env_(env) {}
+ virtual ~DeviceBase();
+
+ Env* env() const { return env_; }
+
+ // Override this to return true for devices that require an Op's
+ // compute method to save references to the temporary tensors it
+ // allocates until the Op execution completes
+ virtual bool SaveTemporaryTensors() const { return false; }
+
+ struct CpuWorkerThreads {
+ int num_threads = 0;
+ thread::ThreadPool* workers = nullptr;
+ };
+
+ // Does not take ownership.
+ void set_tensorflow_cpu_worker_threads(CpuWorkerThreads* t) {
+ cpu_worker_threads_ = t;
+ }
+
+ const CpuWorkerThreads* tensorflow_cpu_worker_threads() const {
+ CHECK(cpu_worker_threads_ != nullptr);
+ return cpu_worker_threads_;
+ }
+
+ // "stream" is used in special circumstances (such as the
+ // constructors of Ops) where there is no available OpKernelContext.
+ // "default_context" is used by OpKernelContext whenever a device does not
+ // supply a DeviceContext for an op in FillContextMap (e.g. when only
+ // using a single stream.)
+ // "event_mgr" is used to delay deallocation of temporary GPU buffers.
+ // TODO(pbar) Work out how to move this out of DeviceBase.
+ struct GpuDeviceInfo {
+ perftools::gputools::Stream* stream;
+ DeviceContext* default_context;
+ EventMgr* event_mgr;
+ };
+
+ // Does not take ownership.
+ void set_tensorflow_gpu_device_info(GpuDeviceInfo* g) {
+ gpu_device_info_ = g;
+ }
+
+ const GpuDeviceInfo* tensorflow_gpu_device_info() const {
+ return gpu_device_info_;
+ }
+
+ // Does not take ownership.
+ void set_eigen_cpu_device(Eigen::ThreadPoolDevice* d) {
+ eigen_cpu_device_ = d;
+ }
+
+ // Return the Allocator implementation to use based on the allocator
+ // attributes requested. See allocator.h for more details.
+ virtual Allocator* GetAllocator(AllocatorAttributes /*attr*/) {
+ LOG(FATAL) << "GetAllocator() is not implemented.";
+ }
+
+ const Eigen::ThreadPoolDevice* eigen_cpu_device() {
+ CHECK(eigen_cpu_device_ != nullptr);
+ return eigen_cpu_device_;
+ }
+
+ // The caller owns the returned device and must free it by calling
+ // DisposeGpuDevice below
+ virtual const PerOpGpuDevice* MakeGpuDevice(DeviceContext* /*dc*/,
+ Allocator* /*allocator*/) {
+ // The OpKernelContext calls this even for devices that do not
+ // implement an eigen_gpu_device
+ return nullptr;
+ }
+
+ virtual const DeviceAttributes& attributes() const {
+ LOG(FATAL) << "Device does not implement attributes()";
+ }
+
+ // Materializes the given TensorProto into 'tensor' stored in Device
+ // memory. Most devices will want to override this.
+ //
+ // TODO(vrv): We should be able to put this function into
+ // OpKernelContext and handle the copies from device memory via send
+ // and receive nodes, instead of requiring that each device handle
+ // the copies here as well as in copy ops.
+ virtual Status MakeTensorFromProto(const TensorProto& tensor_proto,
+ const AllocatorAttributes alloc_attrs,
+ Tensor* tensor) {
+ return errors::Internal("Device does not implement MakeTensorFromProto()");
+ }
+
+ private:
+ Env* const env_;
+ CpuWorkerThreads* cpu_worker_threads_ = nullptr;
+ GpuDeviceInfo* gpu_device_info_ = nullptr;
+ Eigen::ThreadPoolDevice* eigen_cpu_device_ = nullptr;
+};
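+
+// Illustrative sketch (hypothetical class name): a minimal device
+// implementation derives from DeviceBase and supplies at least an allocator;
+// the cpu_allocator() call below assumes the process-wide CPU allocator
+// declared in framework/allocator.h.
+//
+//   class MyToyDevice : public DeviceBase {
+//    public:
+//     explicit MyToyDevice(Env* env) : DeviceBase(env) {}
+//     Allocator* GetAllocator(AllocatorAttributes attr) override {
+//       return cpu_allocator();
+//     }
+//   };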
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_DEVICE_BASE_H_
diff --git a/tensorflow/core/framework/fake_input.cc b/tensorflow/core/framework/fake_input.cc
new file mode 100644
index 0000000000..493c35e05f
--- /dev/null
+++ b/tensorflow/core/framework/fake_input.cc
@@ -0,0 +1,214 @@
+#include "tensorflow/core/framework/fake_input.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace {
+
+class FakeInputImpl {
+ public:
+ FakeInputImpl(const OpDef* op_def, int in_index, const NodeDef* node_def,
+ NodeDefBuilder* builder);
+ void SetN(int n);
+ void SetDataType(DataType dt);
+ void SetTypeList(DataTypeSlice dts);
+ Status AddInputToBuilder();
+
+ private:
+ static string FakeNodeName(int in_index);
+ Status GetN(int* n) const;
+ Status GetDataType(DataType* dt) const;
+ void NSources(int n, DataType dt) const;
+ void SourceList(DataTypeSlice dts) const;
+
+ const OpDef* const op_def_;
+ const OpDef::ArgDef* const arg_;
+ const string in_node_;
+ const NodeDef* const node_def_;
+ NodeDefBuilder* const builder_;
+
+ bool n_specified_;
+ int n_;
+ bool dt_specified_;
+ DataType dt_;
+ bool dts_specified_;
+ DataTypeSlice dts_;
+};
+
+FakeInputImpl::FakeInputImpl(const OpDef* op_def, int in_index,
+ const NodeDef* node_def, NodeDefBuilder* builder)
+ : op_def_(op_def),
+ arg_(&op_def->input_arg(in_index)),
+ in_node_(FakeNodeName(in_index)),
+ node_def_(node_def),
+ builder_(builder),
+ n_specified_(false),
+ dt_specified_(false),
+ dts_specified_(false) {}
+
+void FakeInputImpl::SetN(int n) {
+ n_specified_ = true;
+ n_ = n;
+}
+
+void FakeInputImpl::SetDataType(DataType dt) {
+ dt_specified_ = true;
+ dt_ = dt;
+}
+
+void FakeInputImpl::SetTypeList(DataTypeSlice dts) {
+ dts_specified_ = true;
+ dts_ = dts;
+}
+
+Status FakeInputImpl::AddInputToBuilder() {
+ if (dts_specified_) {
+ SourceList(dts_);
+
+ } else if (n_specified_ || !arg_->number_attr().empty()) {
+ int n;
+ TF_RETURN_IF_ERROR(GetN(&n));
+
+ DataType dt;
+ if (n > 0) {
+ TF_RETURN_IF_ERROR(GetDataType(&dt));
+ } else {
+ dt = DT_FLOAT;
+ }
+
+ NSources(n, dt);
+ } else {
+ if (!dt_specified_ && !arg_->type_list_attr().empty()) {
+ DataTypeVector dts;
+ Status status =
+ GetNodeAttr(*node_def_, arg_->type_list_attr(), &dts);
+ if (!status.ok()) {
+ return errors::InvalidArgument(
+ "Could not infer list of types for input '", arg_->name(), "': ",
+ status.error_message());
+ }
+ SourceList(dts);
+ return Status::OK();
+ }
+
+ DataType dt;
+ TF_RETURN_IF_ERROR(GetDataType(&dt));
+ builder_->Input(in_node_, 0, dt);
+ }
+ return Status::OK();
+}
+
+// static
+string FakeInputImpl::FakeNodeName(int in_index) {
+ char c = 'a' + (in_index % 26);
+ return string(&c, 1);
+}
+
+Status FakeInputImpl::GetN(int* n) const {
+ if (n_specified_) {
+ *n = n_;
+ } else {
+ Status status = GetNodeAttr(*node_def_, arg_->number_attr(), n);
+ if (!status.ok()) {
+ return errors::InvalidArgument("Could not infer length of input '",
+ arg_->name(), "': ",
+ status.error_message());
+ }
+ }
+ return Status::OK();
+}
+
+Status FakeInputImpl::GetDataType(DataType* dt) const {
+ if (dt_specified_) {
+ *dt = dt_;
+ } else if (arg_->type() != DT_INVALID) {
+ *dt = arg_->type();
+ } else if (!arg_->type_attr().empty()) {
+ Status status = GetNodeAttr(*node_def_, arg_->type_attr(), dt);
+ if (!status.ok()) {
+ return errors::InvalidArgument("Could not infer type for input '",
+ arg_->name(), "': ",
+ status.error_message());
+ }
+ } else {
+ return errors::InvalidArgument("No type or type_attr field in arg '",
+ arg_->name(), "'");
+ }
+ return Status::OK();
+}
+
+void FakeInputImpl::NSources(int n, DataType dt) const {
+ std::vector<NodeDefBuilder::NodeOut> srcs;
+ srcs.reserve(n);
+ for (int i = 0; i < n; ++i) {
+ srcs.emplace_back(in_node_, i, dt);
+ }
+ builder_->Input(srcs);
+}
+
+void FakeInputImpl::SourceList(DataTypeSlice dts) const {
+ std::vector<NodeDefBuilder::NodeOut> srcs;
+ srcs.reserve(dts.size());
+ for (size_t i = 0; i < dts.size(); ++i) {
+ srcs.emplace_back(in_node_, i, dts[i]);
+ }
+ builder_->Input(srcs);
+}
+
+} // namespace
+
+// Public interface ------------------------------------------------------------
+
+FakeInputFunctor FakeInput() {
+ return [](const OpDef& op_def, int in_index, const NodeDef& node_def,
+ NodeDefBuilder* builder) {
+ FakeInputImpl impl(&op_def, in_index, &node_def, builder);
+ return impl.AddInputToBuilder();
+ };
+}
+
+FakeInputFunctor FakeInput(DataType dt) {
+ return [dt](const OpDef& op_def, int in_index, const NodeDef& node_def,
+ NodeDefBuilder* builder) {
+ FakeInputImpl impl(&op_def, in_index, &node_def, builder);
+ impl.SetDataType(dt);
+ return impl.AddInputToBuilder();
+ };
+}
+
+FakeInputFunctor FakeInput(int n) {
+ return [n](const OpDef& op_def, int in_index, const NodeDef& node_def,
+ NodeDefBuilder* builder) {
+ FakeInputImpl impl(&op_def, in_index, &node_def, builder);
+ impl.SetN(n);
+ return impl.AddInputToBuilder();
+ };
+}
+
+FakeInputFunctor FakeInput(int n, DataType dt) {
+ return [n, dt](const OpDef& op_def, int in_index, const NodeDef& node_def,
+ NodeDefBuilder* builder) {
+ FakeInputImpl impl(&op_def, in_index, &node_def, builder);
+ impl.SetN(n);
+ impl.SetDataType(dt);
+ return impl.AddInputToBuilder();
+ };
+}
+
+FakeInputFunctor FakeInput(DataTypeSlice dts) {
+ // Make a copy to ensure the data will still be around when the lambda is
+ // called.
+ DataTypeVector dtv(dts.begin(), dts.end());
+ return [dtv](const OpDef& op_def, int in_index, const NodeDef& node_def,
+ NodeDefBuilder* builder) {
+ FakeInputImpl impl(&op_def, in_index, &node_def, builder);
+ impl.SetTypeList(dtv);
+ return impl.AddInputToBuilder();
+ };
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/fake_input.h b/tensorflow/core/framework/fake_input.h
new file mode 100644
index 0000000000..39b38e9a59
--- /dev/null
+++ b/tensorflow/core/framework/fake_input.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_FRAMEWORK_FAKE_INPUT_H_
+#define TENSORFLOW_FRAMEWORK_FAKE_INPUT_H_
+
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/types.h"
+
+namespace tensorflow {
+
+// These functions return values that may be passed to
+// NodeDefBuilder::Input() to add an input for a test. Use them when
+// you don't care about the node names/output indices providing the
+// input. They also allow you to omit the input types and/or
+// list length when they may be inferred.
+FakeInputFunctor FakeInput(); // Infer everything
+FakeInputFunctor FakeInput(DataType dt);
+FakeInputFunctor FakeInput(int n); // List of length n
+FakeInputFunctor FakeInput(int n, DataType dt);
+FakeInputFunctor FakeInput(DataTypeSlice dts);
+inline FakeInputFunctor FakeInput(std::initializer_list<DataType> dts) {
+ return FakeInput(DataTypeSlice(dts));
+}
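+
+// Illustrative sketch: in an op test these functors are passed straight to
+// NodeDefBuilder::Input(), so the test does not have to wire up real
+// producer nodes, e.g.
+//
+//   NodeDef node_def;
+//   TF_CHECK_OK(NodeDefBuilder("my_matmul", "MatMul")
+//                   .Input(FakeInput(DT_FLOAT))
+//                   .Input(FakeInput(DT_FLOAT))
+//                   .Finalize(&node_def));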
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_FAKE_INPUT_H_
diff --git a/tensorflow/core/framework/function.cc b/tensorflow/core/framework/function.cc
new file mode 100644
index 0000000000..b73e1ab8a9
--- /dev/null
+++ b/tensorflow/core/framework/function.cc
@@ -0,0 +1,878 @@
+#include "tensorflow/core/framework/function.h"
+
+#include <unordered_set>
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+
+namespace tensorflow {
+
+REGISTER_OP("_Arg")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("index: int >= 0")
+ .Doc(R"doc(
+A graph node which represents an argument to a function.
+
+output: The argument.
+index: This argument is the index-th argument of the function.
+)doc");
+
+REGISTER_OP("_Retval")
+ .Input("input: T")
+ .Attr("T: type")
+ .Attr("index: int >= 0")
+ .Doc(R"doc(
+A graph node which represents a return value of a function.
+
+input: The return value.
+index: This return value is the index-th return value of the function.
+)doc");
+
+REGISTER_OP("_ListToArray")
+ .Input("input: Tin")
+ .Output("output: N * T")
+ .Attr("Tin: list(type)")
+ .Attr("T: type")
+ .Attr("N: int >= 1")
+ .Doc(R"doc(
+Converts a list of tensors to an array of tensors.
+)doc");
+
+REGISTER_OP("_ArrayToList")
+ .Input("input: N * T")
+ .Output("output: out_types")
+ .Attr("T: type")
+ .Attr("N: int >= 1")
+ .Attr("out_types: list(type)")
+ .Doc(R"doc(
+Converts an array of tensors to a list of tensors.
+)doc");
+
+namespace {
+
+// Extracts the actual type from "attr_values" based on its definition
+// "arg_def".
+Status ArgNumType(const InstantiateAttrValueMap& attrs,
+ const OpDef::ArgDef& arg_def, int* num, DataType* dtype) {
+ if (!arg_def.type_list_attr().empty()) {
+ return errors::Unimplemented("type_list is not supported.");
+ }
+
+ if (arg_def.number_attr().empty()) {
+ *num = 1;
+ } else {
+ const AttrValue* v = gtl::FindOrNull(attrs, arg_def.number_attr());
+ if (v == nullptr) {
+      return errors::NotFound("length attr not found: ", arg_def.number_attr());
+ }
+ *num = v->i();
+ }
+
+ if (arg_def.type() != DT_INVALID) {
+ *dtype = arg_def.type();
+ } else if (arg_def.type_attr().empty()) {
+ *dtype = DT_INVALID;
+ } else {
+ const AttrValue* v = gtl::FindOrNull(attrs, arg_def.type_attr());
+ if (v == nullptr) {
+ return errors::NotFound("type attr not found: ", arg_def.type_attr());
+ }
+ *dtype = v->type();
+ }
+ return Status::OK();
+}
+
+string Name(int node_index) { return strings::StrCat("n", node_index); }
+
+string Name(int node_index, int output_index) {
+ if (output_index == 0) {
+ return Name(node_index);
+ } else {
+ return strings::StrCat("n", node_index, ":", output_index);
+ }
+}
+
+string Dep(int node_index) { return strings::StrCat("^", Name(node_index)); }
+
+template <typename T>
+void AddAttr(const string& name, const T& val, NodeDef* ndef) {
+ SetAttrValue(val, &((*ndef->mutable_attr())[name]));
+}
+
+Status ValidateSignatureWithAttrs(const OpDef& sig,
+ const InstantiateAttrValueMap& attr_values) {
+ // attr_values should specify all attrs defined in fdef.
+ for (const auto& a : sig.attr()) {
+ if (attr_values.find(a.name()) == attr_values.end()) {
+ return errors::NotFound("Attr ", a.name(), " is not found.");
+ }
+ }
+
+ for (const auto& p : attr_values) {
+ if (HasPlaceHolder(p.second)) {
+ return errors::InvalidArgument(p.first,
+ " in attr_values is still a placeholder.");
+ }
+ }
+
+ return Status::OK();
+}
+
+// We build a small index for all names that can be used as a node's
+// input arguments.
+//
+// If is_func_arg is true, the name is a function's argument. In
+// this case, the produced graph def has gdef.node[nid ... nid +
+// num).
+//
+// Otherwise, the name is a function body's node return value. In
+// this case, the produced graph def has one node gdef.node[nid] and
+// the node's output index [idx ... idx + num) corresponds to the
+// named outputs.
+//
+// In all cases, "dtype" specifies the data type.
+struct NameInfoItem {
+ bool is_func_arg;
+ int nid;
+ int idx;
+ int num;
+ DataType dtype;
+};
+typedef std::unordered_map<string, NameInfoItem> NameInfoIndex;
+
+Status BuildInputArgIndex(const OpDef::ArgDef& arg_def,
+ const InstantiateAttrValueMap& attr_values,
+ NameInfoIndex* name_info,
+ InstantiationResult* result) {
+ int num;
+ DataType dtype;
+ TF_RETURN_IF_ERROR(ArgNumType(attr_values, arg_def, &num, &dtype));
+ CHECK_GE(num, 1);
+ GraphDef* gdef = &result->gdef;
+ int arg_index = gdef->node_size();
+ if (!name_info->insert({arg_def.name(), {true, arg_index, 0, num, dtype}})
+ .second) {
+ return errors::InvalidArgument("Duplicated arg name.");
+ }
+ // Creates "num" nodes in the gdef.
+ for (int i = 0; i < num; ++i) {
+ DCHECK_EQ(arg_index, gdef->node_size());
+ NodeDef* gnode = gdef->add_node();
+ gnode->set_name(Name(arg_index));
+ gnode->set_op("_Arg");
+ AddAttr("T", dtype, gnode);
+ AddAttr("index", arg_index, gnode);
+ result->arg_types.push_back(dtype);
+ ++arg_index;
+ }
+ return Status::OK();
+}
+
+Status BuildNodeOutputIndex(const FunctionDef::Node& node,
+ const InstantiateAttrValueMap& attrs,
+ GetFunctionSignature get_function,
+ const int arg_index, NameInfoIndex* name_info) {
+ const OpDef* node_sig = nullptr;
+ TF_RETURN_IF_ERROR(get_function(node.op(), &node_sig));
+ if (node_sig->output_arg_size() == 0) {
+ // This node produces no output.
+ if (node.ret_size() != 1) {
+ return errors::InvalidArgument("Expect one ret name.");
+ }
+ if (!name_info->insert({node.ret(0), {false, arg_index, 0, 0, DT_INVALID}})
+ .second) {
+ return errors::InvalidArgument("Duplicated ret name.");
+ }
+ return Status::OK();
+ }
+
+ // When the signature says the last return value is of list(type),
+ // i.e., it's variadic, we need to consult
+ // attrs[last_retval.type_list_attr] to determine for the last arg
+ // * the actual number of outputs;
+ // * the actual data type of outputs.
+ const int num_retval = node_sig->output_arg_size();
+ const OpDef::ArgDef& last_retval = node_sig->output_arg(num_retval - 1);
+ const bool last_retval_is_typelist = !last_retval.type_list_attr().empty();
+ if (!last_retval_is_typelist && (node.ret_size() != num_retval)) {
+ return errors::InvalidArgument("Malformed function node (#ret).");
+ }
+ int start = 0;
+ const int num_fixed_size_retval =
+ last_retval_is_typelist ? num_retval - 1 : num_retval;
+ for (int i = 0; i < num_fixed_size_retval; ++i) {
+ int num;
+ DataType dtype;
+ TF_RETURN_IF_ERROR(
+ ArgNumType(attrs, node_sig->output_arg(i), &num, &dtype));
+ if (!name_info->insert({node.ret(i), {false, arg_index, start, num, dtype}})
+ .second) {
+ return errors::InvalidArgument("Duplicated ret name.");
+ }
+ start += num;
+ }
+ if (last_retval_is_typelist) {
+ const AttrValue* typelist =
+ gtl::FindOrNull(attrs, last_retval.type_list_attr());
+ if (typelist == nullptr) {
+ return errors::InvalidArgument("Missing attr ",
+ last_retval.type_list_attr(), ".");
+ }
+ if (num_fixed_size_retval + typelist->list().type_size() !=
+ node.ret_size()) {
+ return errors::InvalidArgument("Wrong #ret: ", num_fixed_size_retval, " ",
+ typelist->list().type_size(), " ",
+ node.ret_size(), ".");
+ }
+ for (int i = 0; i < typelist->list().type_size(); ++i) {
+ if (!name_info->insert({node.ret(i),
+ {false, arg_index, start, 1,
+ typelist->list().type(i)}})
+ .second) {
+ return errors::InvalidArgument("Duplicated ret name.");
+ }
+ ++start;
+ }
+ }
+ return Status::OK();
+}
+
+Status InstantiateNode(const FunctionDef::Node& fnode,
+ const InstantiateAttrValueMap& attrs,
+ GetFunctionSignature get_function,
+ const NameInfoIndex& name_info, GraphDef* gdef) {
+ const OpDef* fnode_sig = nullptr;
+ TF_CHECK_OK(get_function(fnode.op(), &fnode_sig));
+ NodeDef* gnode = gdef->add_node();
+ gnode->set_name(Name(gdef->node_size() - 1));
+ gnode->set_op(fnode.op());
+
+ // Input
+ //
+ // When the signature says the last argument is of list(type),
+ // i.e., it's variadic, we need to consult
+ // attrs[last_arg.type_list_attr] to determine for the last arg
+ // * the number of arguments;
+ // * the data types of arguments.
+ const int num_arg = fnode_sig->input_arg_size();
+ bool last_arg_is_typelist = false;
+ if (num_arg > 0 &&
+ !fnode_sig->input_arg(num_arg - 1).type_list_attr().empty()) {
+ last_arg_is_typelist = true;
+ }
+ if (!last_arg_is_typelist && (fnode.arg_size() != num_arg)) {
+ return errors::InvalidArgument("arg.size != sig.arg.size.");
+ }
+ const int num_fixed_size_args = last_arg_is_typelist ? num_arg - 1 : num_arg;
+ for (int i = 0; i < num_fixed_size_args; ++i) {
+ int num;
+ DataType dtype;
+ TF_RETURN_IF_ERROR(
+ ArgNumType(attrs, fnode_sig->input_arg(i), &num, &dtype));
+ const NameInfoItem* item = gtl::FindOrNull(name_info, fnode.arg(i));
+ if (item == nullptr) {
+ return errors::InvalidArgument("arg[", i, "] is not found: ",
+ fnode.ShortDebugString());
+ }
+ if (num != item->num || dtype != item->dtype) {
+ return errors::InvalidArgument("Invalid arg(", i, ") for function arg: ",
+ " ", num, "/", dtype, " vs. ", item->num,
+ "/", item->dtype, ".");
+ }
+ for (int j = 0; j < num; ++j) {
+ if (item->is_func_arg) {
+ gnode->add_input(Name(item->nid + j));
+ } else {
+ gnode->add_input(Name(item->nid, item->idx + j));
+ }
+ }
+ }
+ if (last_arg_is_typelist) {
+ AttrValue typelist;
+ for (int i = num_fixed_size_args; i < fnode.arg_size(); ++i) {
+ const NameInfoItem* item = gtl::FindOrNull(name_info, fnode.arg(i));
+ if (item == nullptr) {
+ return errors::InvalidArgument("arg[", i, "] is not found.");
+ }
+ for (int j = 0; j < item->num; ++j) {
+ if (item->is_func_arg) {
+ gnode->add_input(Name(item->nid + j));
+ } else {
+ gnode->add_input(Name(item->nid, item->idx + j));
+ }
+ typelist.mutable_list()->add_type(item->dtype);
+ }
+ }
+
+ // 'typelist' is inferred from the inputs' data types.
+ const auto& last_arg = fnode_sig->input_arg(num_arg - 1);
+ gnode->mutable_attr()->insert({last_arg.type_list_attr(), typelist});
+ }
+
+ // Control deps.
+ for (int i = 0; i < fnode.dep_size(); ++i) {
+ const NameInfoItem* item = gtl::FindOrNull(name_info, fnode.dep(i));
+ if (item == nullptr) {
+ return errors::InvalidArgument("dep[", i, "] is not found.");
+ }
+ gnode->add_input(Dep(item->nid));
+ }
+
+ // Attrs.
+ for (const auto& p : attrs) {
+ (*gnode->mutable_attr())[p.first] = p.second;
+ }
+
+ return Status::OK();
+}
+
+Status AddReturnNode(const OpDef::ArgDef& ret_def,
+ const InstantiateAttrValueMap& attrs,
+ const NameInfoIndex& name_info, int* ret_index,
+ InstantiationResult* result) {
+ int num;
+ DataType dtype;
+ TF_RETURN_IF_ERROR(ArgNumType(attrs, ret_def, &num, &dtype));
+ CHECK_GE(num, 1);
+ const NameInfoItem* item = gtl::FindOrNull(name_info, ret_def.name());
+ if (item == nullptr) {
+ return errors::InvalidArgument("ret is not found.");
+ }
+ if (num != item->num || dtype != item->dtype) {
+ return errors::InvalidArgument("Invalid ret name.");
+ }
+ GraphDef* gdef = &result->gdef;
+ for (int i = 0; i < num; ++i) {
+ NodeDef* gnode = gdef->add_node();
+ gnode->set_name(Name(gdef->node_size() - 1));
+ gnode->set_op("_Retval");
+ gnode->add_input(Name(item->nid, item->idx + i));
+ AddAttr("T", dtype, gnode);
+ AddAttr("index", (*ret_index)++, gnode);
+ result->ret_types.push_back(dtype);
+ }
+ return Status::OK();
+}
+
+// Various Print(proto) helpers that render the relevant protos as ASCII.
+string Print(const OpDef::ArgDef& arg) {
+ string out;
+ strings::StrAppend(&out, arg.name(), ":");
+ if (arg.is_ref()) strings::StrAppend(&out, "Ref(");
+ if (!arg.number_attr().empty()) {
+ strings::StrAppend(&out, arg.number_attr(), "*");
+ }
+ if (arg.type() != DT_INVALID) {
+ strings::StrAppend(&out, DataTypeString(arg.type()));
+ } else {
+ strings::StrAppend(&out, arg.type_attr());
+ }
+ if (arg.is_ref()) strings::StrAppend(&out, ")");
+ return out;
+}
+
+string Print(const AttrValue& attr_value) {
+ if (attr_value.value_case() == AttrValue::kType) {
+ return DataTypeString(attr_value.type());
+ } else if ((attr_value.value_case() == AttrValue::kList) &&
+ (attr_value.list().type_size() > 0)) {
+ string ret = "{";
+ for (int i = 0; i < attr_value.list().type_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, DataTypeString(attr_value.list().type(i)));
+ }
+ strings::StrAppend(&ret, "}");
+ return ret;
+ } else if (attr_value.value_case() == AttrValue::kFunc) {
+ if (attr_value.func().attr_size() == 0) {
+ return attr_value.func().name();
+ }
+ std::vector<string> entries;
+ for (auto p : attr_value.func().attr()) {
+ entries.push_back(strings::StrCat(p.first, "=", Print(p.second)));
+ }
+ sort(entries.begin(), entries.end());
+ return strings::StrCat(attr_value.func().name(), "[",
+ str_util::Join(entries, ", "), "]");
+ }
+ return SummarizeAttrValue(attr_value);
+}
+
+string Print(const FunctionDef::Node& node) {
+ string out;
+ for (int i = 0; i < node.ret_size(); ++i) {
+ const auto& name = node.ret(i);
+ if (i > 0) strings::StrAppend(&out, ", ");
+ strings::StrAppend(&out, name);
+ }
+ strings::StrAppend(&out, " = ", node.op());
+ if (node.attr_size() > 0) {
+ std::vector<string> entries;
+ for (auto p : node.attr()) {
+ entries.push_back(strings::StrCat(p.first, "=", Print(p.second)));
+ }
+ sort(entries.begin(), entries.end());
+ strings::StrAppend(&out, "[", str_util::Join(entries, ", "), "]");
+ }
+ strings::StrAppend(&out, "(");
+ for (int i = 0; i < node.arg_size(); ++i) {
+ if (i > 0) strings::StrAppend(&out, ", ");
+ strings::StrAppend(&out, node.arg(i));
+ }
+ strings::StrAppend(&out, ")");
+ if (node.dep_size() > 0) {
+ strings::StrAppend(&out, " @ ");
+ for (int i = 0; i < node.dep_size(); ++i) {
+ if (i > 0) strings::StrAppend(&out, ", ");
+ strings::StrAppend(&out, node.dep(i));
+ }
+ }
+ return out;
+}
+
+string Print(const FunctionDef& fdef) {
+ string out;
+ const OpDef& sig = fdef.signature();
+ strings::StrAppend(&out, "\n", sig.name());
+ if (sig.attr_size() > 0) {
+ strings::StrAppend(&out, "[");
+ for (int i = 0; i < sig.attr_size(); ++i) {
+ const auto& a = sig.attr(i);
+ if (i > 0) strings::StrAppend(&out, ", ");
+ if (a.type() == "type") {
+ strings::StrAppend(&out, a.name(), ":", Print(a.allowed_values()));
+ } else {
+ strings::StrAppend(&out, a.name(), ":", a.type());
+ }
+ }
+ strings::StrAppend(&out, "]");
+ }
+ strings::StrAppend(&out, "(");
+ for (int i = 0; i < sig.input_arg_size(); ++i) {
+ if (i > 0) strings::StrAppend(&out, ", ");
+ strings::StrAppend(&out, Print(sig.input_arg(i)));
+ }
+ strings::StrAppend(&out, ") -> (");
+ for (int i = 0; i < sig.output_arg_size(); ++i) {
+ if (i > 0) strings::StrAppend(&out, ", ");
+ strings::StrAppend(&out, Print(sig.output_arg(i)));
+ }
+ strings::StrAppend(&out, ") {\n");
+ for (const auto& n : fdef.node()) {
+ strings::StrAppend(&out, " ", Print(n), "\n");
+ }
+ strings::StrAppend(&out, "}\n");
+ return out;
+}
+
+string Print(const NodeDef& n) {
+ string out;
+ strings::StrAppend(&out, n.name(), " = ", n.op());
+ if (n.attr_size() > 0) {
+ std::vector<string> entries;
+ for (auto& a : n.attr()) {
+ entries.push_back(strings::StrCat(a.first, "=", Print(a.second)));
+ }
+ sort(entries.begin(), entries.end());
+ strings::StrAppend(&out, "[", str_util::Join(entries, ", "), "]");
+ }
+ strings::StrAppend(&out, "(");
+ std::vector<StringPiece> dat;
+ std::vector<string> dep;
+ for (StringPiece s : n.input()) {
+ if (s.Consume("^")) {
+ dep.push_back(s.ToString());
+ } else {
+ dat.push_back(s);
+ }
+ }
+ strings::StrAppend(&out, str_util::Join(dat, ", "), ")");
+ if (!dep.empty()) {
+ strings::StrAppend(&out, " @ ", str_util::Join(dep, ", "));
+ }
+ return out;
+}
+
+string Print(const GraphDef& gdef) {
+ std::vector<const NodeDef*> arg;
+ std::vector<const NodeDef*> ret;
+ std::vector<const NodeDef*> body;
+ for (const NodeDef& n : gdef.node()) {
+ if (n.op() == "_Arg") {
+ arg.push_back(&n);
+ } else if (n.op() == "_Retval") {
+ ret.push_back(&n);
+ } else {
+ body.push_back(&n);
+ }
+ }
+ auto comp = [](const NodeDef* x, const NodeDef* y) {
+ int xi;
+ TF_CHECK_OK(GetNodeAttr(*x, "index", &xi));
+ int yi;
+ TF_CHECK_OK(GetNodeAttr(*y, "index", &yi));
+ return xi < yi;
+ };
+ sort(arg.begin(), arg.end(), comp);
+ sort(ret.begin(), ret.end(), comp);
+ string out;
+ strings::StrAppend(&out, "\n(");
+ auto get_type = [](const NodeDef& n) {
+ for (auto a : n.attr()) {
+ if (a.first == "T") {
+ return DataTypeString(a.second.type());
+ }
+ }
+ return DataTypeString(DT_INVALID);
+ };
+ for (size_t i = 0; i < arg.size(); ++i) {
+ const NodeDef* n = arg[i];
+ if (i > 0) strings::StrAppend(&out, ", ");
+ CHECK_EQ(2, n->attr_size());
+ strings::StrAppend(&out, n->name(), ":", get_type(*n));
+ }
+ strings::StrAppend(&out, ") -> (");
+ for (size_t i = 0; i < ret.size(); ++i) {
+ const NodeDef* n = ret[i];
+ if (i > 0) strings::StrAppend(&out, ", ");
+ CHECK_EQ(2, n->attr_size());
+ CHECK_EQ(1, n->input_size());
+ strings::StrAppend(&out, n->input(0), ":", get_type(*n));
+ }
+ strings::StrAppend(&out, ") {\n");
+ for (size_t i = 0; i < body.size(); ++i) {
+ strings::StrAppend(&out, " ", Print(*body[i]), "\n");
+ }
+ strings::StrAppend(&out, "}\n");
+ return out;
+}
+
+} // end namespace
+
+Status InstantiateFunction(const FunctionDef& fdef,
+ const InstantiateAttrValueMap& attr_values,
+ GetFunctionSignature get_function,
+ InstantiationResult* result) {
+ const OpDef& sig = fdef.signature();
+ GraphDef* gdef = &result->gdef;
+ gdef->Clear();
+
+ TF_RETURN_IF_ERROR(ValidateSignatureWithAttrs(sig, attr_values));
+
+ auto substitute = [&attr_values](const string& name, AttrValue* val) {
+ auto iter = attr_values.find(name);
+ if (iter == attr_values.end()) {
+ return false;
+ } else {
+ *val = iter->second;
+ return true;
+ }
+ };
+
+ // Makes a copy of all attrs in fdef and substitutes placeholders.
+ // After this step, every attr is bound to a concrete value.
+ std::vector<InstantiateAttrValueMap> node_attrs;
+ node_attrs.resize(fdef.node_size());
+ for (int i = 0; i < fdef.node_size(); ++i) {
+ for (auto attr : fdef.node(i).attr()) {
+ if (!SubstitutePlaceholders(substitute, &attr.second)) {
+ return errors::InvalidArgument("Failed to bind all placeholders in ",
+ SummarizeAttrValue(attr.second));
+ }
+ CHECK(node_attrs[i].insert(attr).second);
+ }
+ }
+
+ NameInfoIndex name_info;
+ Status s;
+ for (const OpDef::ArgDef& arg_def : sig.input_arg()) {
+ s = BuildInputArgIndex(arg_def, attr_values, &name_info, result);
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " In ", Print(arg_def));
+ return s;
+ }
+ }
+ for (int i = 0; i < fdef.node_size(); ++i) {
+ s = BuildNodeOutputIndex(fdef.node(i), node_attrs[i], get_function,
+ gdef->node_size() + i, &name_info);
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " In ", Print(fdef.node(i)));
+ return s;
+ }
+ }
+
+ // Emits one gdef.node for each fdef.node.
+ for (int i = 0; i < fdef.node_size(); ++i) {
+ s = InstantiateNode(fdef.node(i), node_attrs[i], get_function, name_info,
+ gdef);
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " In ", Print(fdef.node(i)));
+ return s;
+ }
+ }
+
+ // Emits nodes for the function's return values.
+ int ret_index = 0;
+ for (const OpDef::ArgDef& ret_def : sig.output_arg()) {
+ s = AddReturnNode(ret_def, attr_values, name_info, &ret_index, result);
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " In ", Print(ret_def));
+ return s;
+ }
+ }
+
+ return Status::OK();
+}
+
+string DebugString(const FunctionDef& func_def) { return Print(func_def); }
+
+string DebugString(const GraphDef& instantiated_func_def) {
+ return Print(instantiated_func_def);
+}
+
+string DebugStringWhole(const GraphDef& gdef) {
+ string ret;
+ for (auto fdef : gdef.library().function()) {
+ strings::StrAppend(&ret, Print(fdef));
+ }
+ strings::StrAppend(&ret, "\n");
+ for (auto ndef : gdef.node()) {
+ strings::StrAppend(&ret, Print(ndef), "\n");
+ }
+ return ret;
+}
+
+string Canonicalize(const string& funcname,
+ const InstantiateAttrValueMap& attrs) {
+ std::vector<string> entries;
+ entries.reserve(attrs.size());
+ for (auto p : attrs) {
+ entries.push_back(strings::StrCat(p.first, "=", Print(p.second)));
+ }
+ sort(entries.begin(), entries.end());
+ return strings::StrCat(funcname, "[", str_util::Join(entries, ","), "]");
+}
+
+FunctionCallFrame::FunctionCallFrame(DataTypeSlice arg_types,
+ DataTypeSlice ret_types)
+ : arg_types_(arg_types.begin(), arg_types.end()),
+ ret_types_(ret_types.begin(), ret_types.end()) {
+ args_.resize(arg_types_.size());
+ rets_.resize(ret_types_.size());
+}
+
+FunctionCallFrame::~FunctionCallFrame() {}
+
+Status FunctionCallFrame::SetArgs(gtl::ArraySlice<Tensor> args) {
+ // Input type checks.
+ if (args.size() != arg_types_.size()) {
+ return errors::InvalidArgument("Expects ", arg_types_.size(),
+ " arguments, but ", args.size(),
+ " is provided");
+ }
+ for (size_t i = 0; i < args.size(); ++i) {
+ if (arg_types_[i] != args[i].dtype()) {
+ return errors::InvalidArgument(
+ "Expects arg[", i, "] to be ", DataTypeString(arg_types_[i]), " but ",
+ DataTypeString(args[i].dtype()), " is provided");
+ }
+ args_[i] = args[i];
+ }
+ return Status::OK();
+}
+
+Status FunctionCallFrame::GetRetvals(std::vector<Tensor>* rets) const {
+ rets->clear();
+ rets->reserve(rets_.size());
+ for (size_t i = 0; i < rets_.size(); ++i) {
+ auto item = rets_[i];
+ if (item.has_val) {
+ rets->push_back(item.val);
+ } else {
+ return errors::Internal("Retval[", i, "] does not have value");
+ }
+ }
+ return Status::OK();
+}
+
+Status FunctionCallFrame::GetArg(int index, Tensor* val) const {
+ if (index < 0 || static_cast<size_t>(index) >= args_.size()) {
+ return errors::OutOfRange("GetArg ", index, " is not within [0, ",
+ args_.size(), ")");
+ }
+ *val = args_[index];
+ return Status::OK();
+}
+
+Status FunctionCallFrame::SetRetval(int index, const Tensor& val) {
+ if (index < 0 || static_cast<size_t>(index) >= rets_.size()) {
+ return errors::OutOfRange("SetRetval ", index, " is not within [0, ",
+ rets_.size(), ")");
+ }
+ if (val.dtype() != ret_types_[index]) {
+ return errors::InvalidArgument(
+ "Expects ret[", index, "] to be ", DataTypeString(ret_types_[index]),
+ ", but ", DataTypeString(val.dtype()), " is provided.");
+ }
+ Retval* item = &rets_[index];
+ if (!item->has_val) {
+ item->has_val = true;
+ item->val = val;
+ } else {
+ return errors::Internal("Retval[", index, "] has already been set.");
+ }
+ return Status::OK();
+}
+
+FunctionLibraryDefinition::FunctionLibraryDefinition(
+ const FunctionDefLibrary& def_lib)
+ : function_defs_(def_lib.function_size()) {
+ for (auto fdef : def_lib.function()) {
+ // The latter function definition wins.
+ function_defs_[fdef.signature().name()] = fdef;
+ }
+}
+
+FunctionLibraryDefinition::~FunctionLibraryDefinition() {}
+
+const FunctionDef* FunctionLibraryDefinition::Find(const string& name) const {
+ auto iter = function_defs_.find(name);
+ if (iter == function_defs_.end()) {
+ return nullptr;
+ } else {
+ return &iter->second;
+ }
+}
+
+const OpDef* FunctionLibraryDefinition::LookUp(const string& op,
+ Status* status) const {
+ auto fdef = Find(op);
+ if (fdef != nullptr) {
+ return &(fdef->signature());
+ }
+ return OpRegistry::Global()->LookUp(op, status);
+}
+
+Status InstantiateFunction(const FunctionDef& fdef,
+ InstantiateAttrValueSlice attr_values,
+ GetFunctionSignature get_function,
+ InstantiationResult* result) {
+ InstantiateAttrValueMap m;
+ for (const auto& aval : attr_values) {
+ m.insert({aval.first, aval.second.proto});
+ }
+ return InstantiateFunction(fdef, m, get_function, result);
+}
+
+string Canonicalize(const string& funcname, InstantiateAttrValueSlice attrs) {
+ InstantiateAttrValueMap m;
+ for (const auto& aval : attrs) {
+ m.insert({aval.first, aval.second.proto});
+ }
+ return Canonicalize(funcname, m);
+}
+
+Status FunctionLibraryRuntime::Instantiate(const string& function_name,
+ InstantiateAttrValueSlice attrs,
+ Handle* handle) {
+ InstantiateAttrValueMap m;
+ for (const auto& aval : attrs) {
+ m.insert({aval.first, aval.second.proto});
+ }
+ return Instantiate(function_name, m, handle);
+}
+
+void FunctionDefHelper::AttrValueWrapper::InitFromString(StringPiece val) {
+ if (val.size() >= 2 && val[0] == '$') {
+ proto.set_placeholder(val.data() + 1, val.size() - 1);
+ } else {
+ SetAttrValue(val, &proto);
+ }
+}
+
+FunctionDefHelper::AttrValueWrapper FunctionDefHelper::FunctionRef(
+ const string& name,
+ gtl::ArraySlice<std::pair<string, AttrValueWrapper>> attrs) {
+ AttrValueWrapper ret;
+ ret.proto.mutable_func()->set_name(name);
+ for (const auto& a : attrs) {
+ ret.proto.mutable_func()->mutable_attr()->insert({a.first, a.second.proto});
+ }
+ return ret;
+}
+
+FunctionDef::Node FunctionDefHelper::Node::ToProto() const {
+ FunctionDef::Node n;
+ for (const string& r : this->ret) {
+ n.add_ret(r);
+ }
+ n.set_op(this->op);
+ for (const string& a : arg) {
+ n.add_arg(a);
+ }
+ for (const auto& a : this->attr) {
+ n.mutable_attr()->insert({a.first, a.second.proto});
+ }
+ for (const string& d : dep) {
+ n.add_dep(d);
+ }
+ return n;
+}
+
+/* static */
+FunctionDef FunctionDefHelper::Define(const string& name,
+ gtl::ArraySlice<string> arg_def,
+ gtl::ArraySlice<string> ret_def,
+ gtl::ArraySlice<string> attr_def,
+ gtl::ArraySlice<Node> node_def) {
+ FunctionDef fdef;
+ OpDefBuilder b(name);
+ for (const auto& a : arg_def) b.Input(a);
+ for (const auto& r : ret_def) b.Output(r);
+ for (const auto& a : attr_def) b.Attr(a);
+ TF_CHECK_OK(b.Finalize(fdef.mutable_signature()));
+ for (const auto& n : node_def) {
+ *(fdef.add_node()) = n.ToProto();
+ }
+ return fdef;
+}
+
+FunctionDef FunctionDefHelper::Define(gtl::ArraySlice<string> arg_def,
+ gtl::ArraySlice<string> ret_def,
+ gtl::ArraySlice<string> attr_def,
+ gtl::ArraySlice<Node> node_def) {
+ return Define("_", arg_def, ret_def, attr_def, node_def);
+}
+
+namespace gradient {
+
+typedef std::unordered_map<string, Creator> OpGradFactory;
+
+OpGradFactory* GetOpGradFactory() {
+ static OpGradFactory* factory = new OpGradFactory;
+ return factory;
+}
+
+bool RegisterOp(const string& op, Creator func) {
+ CHECK(GetOpGradFactory()->insert({op, func}).second)
+ << "Duplicated gradient for " << op;
+ return true;
+}
+
+Status GetOpGradientCreator(const string& op, Creator* creator) {
+ auto fac = GetOpGradFactory();
+ auto iter = fac->find(op);
+ if (iter == fac->end()) {
+ return errors::NotFound("No gradient defined for op: ", op);
+ }
+ *creator = iter->second;
+ return Status::OK();
+}
+
+} // end namespace gradient
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/function.h b/tensorflow/core/framework/function.h
new file mode 100644
index 0000000000..1ef93a0533
--- /dev/null
+++ b/tensorflow/core/framework/function.h
@@ -0,0 +1,376 @@
+#ifndef TENSORFLOW_FRAMEWORK_FUNCTION_H_
+#define TENSORFLOW_FRAMEWORK_FUNCTION_H_
+
+#include <unordered_map>
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/function.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+class CancellationManager;
+class Node;
+class OpKernel;
+
+// FunctionDefHelper::Define is a convenient helper to construct a
+// FunctionDef proto.
+//
+// E.g.,
+// FunctionDef my_func = FunctionDefHelper::Define(
+// "my_func_name",
+// {"x:T", "y:T" /* one string per argument */},
+// {"z:T" /* one string per return value */},
+// {"T: {float, double}" /* one string per attribute */},
+// {
+// {{"z"}, "Mul", {"x", "y"}, {{"T", "$T"}}}
+// /* one entry per function node */
+// })
+//
+// NOTE: When we have a TFLang parser, we can add another helper:
+// FunctionDef FunctionDefHelper::Define(const string& tf_func);
+class FunctionDefHelper {
+ public:
+ // AttrValueWrapper has copy constructors for the type T so that
+ // it's easy to construct a simple AttrValue proto.
+ //
+ // If T is a string type (const char*, string, or StringPiece), and
+  // it starts with "$", we construct an AttrValue of "placeholder".
+ //
+ // E.g.,
+  //   std::pair<string, AttrValueWrapper> x = {"T", "$T"};
+ // is a named attr value placeholder.
+ struct AttrValueWrapper {
+ AttrValue proto;
+
+ AttrValueWrapper() {}
+
+ template <typename T>
+ AttrValueWrapper(T val) { // NOLINT(runtime/explicit)
+ SetAttrValue(val, &proto);
+ }
+
+ private:
+ void InitFromString(StringPiece val);
+ };
+
+ // Constructs an AttrValue.func given the "name" and "attrs".
+ static AttrValueWrapper FunctionRef(
+ const string& name,
+ gtl::ArraySlice<std::pair<string, AttrValueWrapper>> attrs);
+ static AttrValueWrapper FunctionRef(const string& name) {
+ return FunctionRef(name, {});
+ }
+
+  // Node is used to construct FunctionDef.Node using initialization
+ // lists. E.g.,
+ // Node n = {{"z"}, "Mul", {"x", "y"}, {{"T", "$T"}}}; // z = x * y
+ struct Node {
+ std::vector<string> ret;
+ string op;
+ std::vector<string> arg;
+ std::vector<std::pair<string, AttrValueWrapper>> attr;
+ std::vector<string> dep;
+
+ FunctionDef::Node ToProto() const;
+ };
+
+ static FunctionDef Define(const string& function_name,
+ gtl::ArraySlice<string> arg_def,
+ gtl::ArraySlice<string> ret_def,
+ gtl::ArraySlice<string> attr_def,
+ gtl::ArraySlice<Node> node_def);
+
+ // Defines an anonymous function. I.e., its name is not relevant.
+ static FunctionDef Define(gtl::ArraySlice<string> arg_def,
+ gtl::ArraySlice<string> ret_def,
+ gtl::ArraySlice<string> attr_def,
+ gtl::ArraySlice<Node> node_def);
+
+ // Helpers to construct a constant scalar.
+ template <typename T>
+ static Node Const(const string& name, const T& val) {
+ Node n = {{name}, "Const"};
+ const DataType dtype = DataTypeToEnum<T>::value;
+ n.attr.push_back({"dtype", dtype});
+ Tensor t(dtype, TensorShape({}));
+ t.scalar<T>()() = val;
+ n.attr.push_back({"value", t});
+ return n;
+ }
+
+ template <typename T>
+ static Node Const(const string& name, gtl::ArraySlice<T> vals) {
+ Node n = {{name}, "Const"};
+ const DataType dtype = DataTypeToEnum<T>::value;
+ n.attr.push_back({"dtype", dtype});
+ int64 num = vals.size();
+ Tensor t(dtype, TensorShape({num}));
+ for (int i = 0; i < vals.size(); ++i) {
+ t.flat<T>()(i) = vals[i];
+ }
+ n.attr.push_back({"value", t});
+ return n;
+ }
+};
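+
+// Illustrative sketch: the Const helpers above produce nodes such as
+//
+//   FunctionDefHelper::Const<float>("two", 2.f)        // scalar constant
+//   FunctionDefHelper::Const<int32>("dims", {0, 1})    // 1-D constant
+//
+// which can be listed alongside other Nodes passed to Define().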
+
+template <>
+inline FunctionDefHelper::AttrValueWrapper::AttrValueWrapper(const char* val) {
+ InitFromString(val);
+}
+
+template <>
+inline FunctionDefHelper::AttrValueWrapper::AttrValueWrapper(
+ const string& val) {
+ InitFromString(val);
+}
+
+template <>
+inline FunctionDefHelper::AttrValueWrapper::AttrValueWrapper(StringPiece val) {
+ InitFromString(val);
+}
+
+// Instantiate a function.
+//
+// "fdef" encodes a TF function with some attrs in fdef.signature.attr
+// containing placeholders. InstantiateFunction binds these
+// placeholders and produces an instantiated function encoded in
+// "result.gdef". The value to substitute a placeholder is given by
+// "attr_values", which is a map from a placeholder name to an attr
+// value.
+//
+// InstantiateFunction calls "get_function" to find signatures of other
+// functions and primitive ops.
+
+// Placeholders in "fdef" are substituted based on "attr_values" here.
+typedef ::tensorflow::protobuf::Map<string, AttrValue> InstantiateAttrValueMap;
+typedef gtl::ArraySlice<std::pair<string, FunctionDefHelper::AttrValueWrapper>>
+ InstantiateAttrValueSlice;
+
+// GetFunctionSignature(func name, opdef) returns OK if the func name is found
+// and opdef is filled with a pointer to the corresponding signature
+// (a OpDef proto). Otherwise, returns an error.
+typedef std::function<Status(const string&, const OpDef**)>
+ GetFunctionSignature;
+
+struct InstantiationResult {
+ DataTypeVector arg_types;
+ DataTypeVector ret_types;
+ GraphDef gdef;
+};
+Status InstantiateFunction(const FunctionDef& fdef,
+ const InstantiateAttrValueMap& attr_values,
+ GetFunctionSignature get_function,
+ InstantiationResult* result);
+Status InstantiateFunction(const FunctionDef& fdef,
+ InstantiateAttrValueSlice attr_values,
+ GetFunctionSignature get_function,
+ InstantiationResult* result);
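+
+// Illustrative sketch (hypothetical "Square" function): resolving signatures
+// against the global op registry and instantiating with T=float might look
+// like
+//
+//   FunctionDef fdef = FunctionDefHelper::Define(
+//       "Square", {"x:T"}, {"y:T"}, {"T: {float, double}"},
+//       {{{"y"}, "Mul", {"x", "x"}, {{"T", "$T"}}}});
+//   InstantiateAttrValueMap attrs;
+//   attrs["T"].set_type(DT_FLOAT);
+//   InstantiationResult result;
+//   TF_CHECK_OK(InstantiateFunction(
+//       fdef, attrs,
+//       [](const string& op, const OpDef** sig) {
+//         Status s;
+//         *sig = OpRegistry::Global()->LookUp(op, &s);
+//         return s;
+//       },
+//       &result));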
+
+// Returns a debug string for a function definition.
+//
+// The returned text is multiple-line. It is intended to be
+// human-readable rather than being friendly to parsers. It is _NOT_
+// intended to be the canonical string representation of "func_def".
+// Particularly, it may not include all information presented in
+// "func_def" (e.g., comments, description of the function arguments,
+// etc.)
+string DebugString(const FunctionDef& func_def);
+string DebugString(const GraphDef& instantiated_func_def);
+
+// Returns a debug string for a top level graph (the main program and
+// its supporting functions defined in its library).
+string DebugStringWhole(const GraphDef& gdef);
+
+// Returns a canonicalized string for the instantiation of the
+// function of the given "name" and attributes "attrs".
+//
+// The returned string is guaranteed to be stable within one address
+// space. But it may change as the implementation
+// evolves. Therefore, it should not be persisted or compared across
+// address spaces.
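+//
+// For example, a hypothetical "MyMatMul" instantiated with T=float and
+// transpose_a=false canonicalizes to something like
+// "MyMatMul[T=float,transpose_a=false]": attr entries are printed in sorted
+// order and joined with commas.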
+string Canonicalize(const string& funcname,
+ const InstantiateAttrValueMap& attrs);
+string Canonicalize(const string& funcname, InstantiateAttrValueSlice attrs);
+
+// Represents a function call frame. I.e., the data structure used to
+// pass arguments to a function and retrieve its results.
+//
+// Runtime must arrange accesses to one FunctionCallFrame s.t.
+// 1. SetArgs() happens before any GetArg();
+// 2. GetRetvals happens after all SetRetval();
+class FunctionCallFrame {
+ public:
+ FunctionCallFrame(DataTypeSlice arg_types, DataTypeSlice ret_types);
+ ~FunctionCallFrame();
+
+ // Caller methods.
+ Status SetArgs(gtl::ArraySlice<Tensor> args);
+ Status GetRetvals(std::vector<Tensor>* rets) const;
+
+ // Callee methods.
+ Status GetArg(int index, Tensor* val) const;
+ Status SetRetval(int index, const Tensor& val);
+
+ private:
+ DataTypeVector arg_types_;
+ DataTypeVector ret_types_;
+ gtl::InlinedVector<Tensor, 4> args_;
+ struct Retval {
+ bool has_val = false;
+ Tensor val;
+ };
+ gtl::InlinedVector<Retval, 4> rets_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(FunctionCallFrame);
+};
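+
+// A minimal caller-side sketch, assuming "a" and "b" are float Tensors (the
+// callee side uses GetArg()/SetRetval(); see function_test.cc for both):
+//
+//   FunctionCallFrame frame({DT_FLOAT, DT_FLOAT}, {DT_FLOAT});
+//   TF_RETURN_IF_ERROR(frame.SetArgs({a, b}));
+//   ... the instantiated function body runs against the frame ...
+//   std::vector<Tensor> rets;
+//   TF_RETURN_IF_ERROR(frame.GetRetvals(&rets));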
+
+// Helper to maintain a map between function names in a given
+// FunctionDefLibrary and function definitions.
+class FunctionLibraryDefinition : public OpRegistryInterface {
+ public:
+ explicit FunctionLibraryDefinition(const FunctionDefLibrary& lib_def);
+ ~FunctionLibraryDefinition() override;
+
+ // Returns nullptr if "func" is not defined in "lib_def". Otherwise,
+ // returns its definition proto.
+ const FunctionDef* Find(const string& func) const;
+
+ // OpRegistryInterface method. Useful for constructing a Graph.
+ //
+ // If "op" is defined in the library, returns its signature.
+ // Otherwise, assume "op" is a primitive op and returns its op
+ // signature.
+ const OpDef* LookUp(const string& op, Status* status) const override;
+
+ private:
+ std::unordered_map<string, FunctionDef> function_defs_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(FunctionLibraryDefinition);
+};
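+
+// A minimal usage sketch (mirroring function_test.cc):
+//
+//   FunctionDefLibrary proto;
+//   *proto.add_function() = some_fdef;   // e.g. test::function::XTimesTwo()
+//   FunctionLibraryDefinition lib_def(proto);
+//   const FunctionDef* found = lib_def.Find("XTimesTwo");  // nullptr if absent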
+
+// Forward declare. Defined in common_runtime/function.h
+struct FunctionBody;
+
+class FunctionLibraryRuntime {
+ public:
+ virtual ~FunctionLibraryRuntime() {}
+
+ // Instantiate a function with the given "attrs".
+ //
+ // Returns OK and fills in "handle" if the instantiation succeeds.
+ // Otherwise returns an error and "handle" is undefined.
+ typedef uint64 Handle;
+ virtual Status Instantiate(const string& function_name,
+ const InstantiateAttrValueMap& attrs,
+ Handle* handle) = 0;
+ Status Instantiate(const string& function_name,
+ InstantiateAttrValueSlice attrs, Handle* handle);
+
+ // Returns the function body for the instantiated function given its
+ // handle 'h'. Returns nullptr if "h" is not found.
+ //
+// *this keeps ownership of the returned object, which remains alive
+ // as long as *this.
+ virtual const FunctionBody* GetFunctionBody(Handle h) = 0;
+
+ // Asynchronously invokes the instantiated function identified by
+ // "handle".
+ //
+ // If function execution succeeds, "done" is called with OK and
+ // "*rets" is filled with the function's return values. Otheriwse,
+ // "done" is called with an error status.
+ //
+ // Does not take ownership of "rets".
+ struct Options {
+ CancellationManager* cancellation_manager = nullptr;
+ };
+ typedef std::function<void(const Status&)> DoneCallback;
+ virtual void Run(const Options& opts, Handle handle,
+ gtl::ArraySlice<Tensor> args, std::vector<Tensor>* rets,
+ DoneCallback done) = 0;
+
+ // Creates a "kernel" for the given node def "ndef".
+ //
+// If it succeeds, returns OK and the caller takes ownership of the
+ // returned "*kernel". Otherwise, returns an error.
+ virtual Status CreateKernel(const NodeDef& ndef, OpKernel** kernel) = 0;
+
+// Returns true iff 'function_name' is the name of a defined function.
+ virtual bool IsDefined(const string& function_name) = 0;
+};
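+
+// A hypothetical end-to-end sketch ("lib" is an illustrative
+// FunctionLibraryRuntime* obtained elsewhere; the function name and attrs
+// are placeholders):
+//
+//   FunctionLibraryRuntime::Handle handle;
+//   TF_RETURN_IF_ERROR(
+//       lib->Instantiate("XTimesTwo", {{"T", DT_FLOAT}}, &handle));
+//   std::vector<Tensor> rets;
+//   lib->Run(FunctionLibraryRuntime::Options(), handle, {x}, &rets,
+//            [](const Status& s) { /* called when execution finishes */ });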
+
+// To register a gradient function for a builtin op, one should use
+// REGISTER_OP_GRADIENT(<op_name>, <c++ grad factory>);
+//
+// Typically, the C++ grad factory is a plain function that can be
+// converted into ::tensorflow::gradient::Creator, which is
+// std::function<Status(const AttrSlice&, FunctionDef*)>.
+//
+// A ::tensorflow::gradient::Creator should populate the FunctionDef* with a
+// definition of a brain function which computes the gradient for
+// <op_name> when <op_name> is instantiated with the given attrs.
+//
+// E.g.,
+//
+// Status MatMulGrad(const AttrSlice& attrs, FunctionDef* g) {
+// bool transpose_a;
+// TF_RETURN_IF_ERROR(attrs.Get("transpose_a", &transpose_a));
+// bool transpose_b;
+// TF_RETURN_IF_ERROR(attrs.Get("transpose_b", &transpose_b));
+// DataType dtype;
+// TF_RETURN_IF_ERROR(attrs.Get("dtype", &dtype));
+// if (!transpose_a && !transpose_b) {
+// *g = FunctionDefHelper::Define(
+// "MatMulGrad",
+//         {"x:T", "y:T", "dz:T"},   // Inputs to this function
+// {"dx:T", "dy:T"}, // Outputs from this function
+// {"T: {float, double}"}, // Attributes needed by this function
+// {
+// {{"x_t"}, "Transpose", {"x"}, {{"T", "$T"}}},
+// {{"y_t"}, "Transpose", {"y"}, {{"T", "$T"}}},
+// {{"dx"}, "MatMul", {"dz", "y_t"}, {{"T", "$T"}}},
+//           {{"dy"}, "MatMul", {"x_t", "dz"}, {{"T", "$T"}}},
+// });
+// } else {
+// ... ...
+// }
+// return Status::OK();
+// }
+//
+// NOTE: $T is substituted with the type variable "T" when the
+// gradient function for MatMul is instantiated.
+//
+// TODO(zhifengc): Better documentation somewhere.
+
+// Macros to define a gradient function factory for a primitive
+// operation.
+#define REGISTER_OP_GRADIENT(name, fn) \
+ REGISTER_OP_GRADIENT_UNIQ_HELPER(__COUNTER__, name, fn)
+
+#define REGISTER_OP_NO_GRADIENT(name) \
+ REGISTER_OP_GRADIENT_UNIQ_HELPER(__COUNTER__, name, nullptr)
+
+#define REGISTER_OP_GRADIENT_UNIQ_HELPER(ctr, name, fn) \
+ REGISTER_OP_GRADIENT_UNIQ(ctr, name, fn)
+
+#define REGISTER_OP_GRADIENT_UNIQ(ctr, name, fn) \
+ static bool unused_grad_##ctr = ::tensorflow::gradient::RegisterOp(name, fn)
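+
+// E.g., with a MatMulGrad factory like the one sketched above, a .cc file
+// would typically contain:
+//
+//   REGISTER_OP_GRADIENT("MatMul", MatMulGrad);
+//
+// and ops that deliberately have no gradient can be declared explicitly,
+// e.g. (op name illustrative only):
+//
+//   REGISTER_OP_NO_GRADIENT("Shape");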
+
+namespace gradient {
+// Register a gradient creator for the "op".
+typedef std::function<Status(const AttrSlice& attrs, FunctionDef*)> Creator;
+bool RegisterOp(const string& op, Creator func);
+
+// Returns OK if the gradient creator for "op" is found (the creator may
+// be nullptr if REGISTER_OP_NO_GRADIENT is used).
+Status GetOpGradientCreator(const string& op, Creator* creator);
+}  // end namespace gradient
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_FUNCTION_H_
diff --git a/tensorflow/core/framework/function.proto b/tensorflow/core/framework/function.proto
new file mode 100644
index 0000000000..4b8a26947c
--- /dev/null
+++ b/tensorflow/core/framework/function.proto
@@ -0,0 +1,68 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/attr_value.proto";
+import "tensorflow/core/framework/op_def.proto";
+
+// A library is a set of named functions.
+message FunctionDefLibrary {
+ repeated FunctionDef function = 1;
+}
+
+// A function can be instantiated when the runtime can bind every attr
+// with a value. When a GraphDef has a call to a function, it must
+// have a binding for every attr defined in the signature.
+//
+// TODO(zhifengc):
+// * device spec, etc.
+message FunctionDef {
+ // The definition of the function's name, arguments, return values,
+ // attrs etc.
+ OpDef signature = 1;
+
+ // The body of the function.
+ repeated Node node = 2; // function.node.ret[*] are unique.
+
+ // A node is a multi-value assignment:
+ // (ret[0], ret[1], ...) = func(arg[0], arg[1], ...)
+ //
+ // By convention, "func" is resolved by consulting with a user-defined
+ // library first. If not resolved, "func" is assumed to be a builtin op.
+ message Node {
+ // This node produces multiple outputs. They are named ret[0],
+ // ret[1], ..., etc.
+ //
+ // REQUIRES: function.node.ret[*] are unique across all nodes.
+ // REQUIRES: ret.size == func/op def's number of output args.
+ repeated string ret = 1;
+
+ // The op/function name.
+ string op = 2;
+
+ // Arguments passed to this func/op.
+ //
+ // arg[i] must be either one of
+ // function.signature.input_args[*].name or one of
+ // function.node[*].ret[*].
+ //
+ // REQUIRES: arg.size == func/op def's number of input args.
+ repeated string arg = 3;
+
+ // Control dependencies.
+ //
+ // dep[i] must be one of function.node[*].ret[*] or one of
+ // function.signature.input_args[*].name.
+ repeated string dep = 4;
+
+ // Attrs.
+ //
+ // 'attr' maps names defined by 'func's attr defs to attr values.
+ // attr values may have placeholders which are substituted
+ // recursively by concrete values when this node is instantiated.
+    // These placeholders must name an attr listed in the FunctionDef's
+ // signature.
+ map<string, AttrValue> attr = 5;
+ }
+}
diff --git a/tensorflow/core/framework/function_test.cc b/tensorflow/core/framework/function_test.cc
new file mode 100644
index 0000000000..c9483fad18
--- /dev/null
+++ b/tensorflow/core/framework/function_test.cc
@@ -0,0 +1,634 @@
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/framework/function.pb.h"
+#include "tensorflow/core/framework/function_testlib.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+typedef FunctionDefHelper FDH;
+
+Status GetOpSig(const string& op, const OpDef** sig) {
+ Status s;
+ *sig = OpRegistry::Global()->LookUp(op, &s);
+ return s;
+}
+
+REGISTER_OP("One")
+ .Output("y: T")
+ .Attr("T: {float, double, int32, int64}")
+ .Doc(R"doc(
+Returns a tensor with a single element (1) of type T.
+
+y: A scalar in type T.
+
+)doc");
+
+static InstantiateAttrValueMap kNoAttrs;
+
+TEST(TFunc, SquarePlusOne) {
+ RequireDefaultOps();
+ auto fdef = FDH::Define(
+ // Name
+ "SquarePlusOne",
+ // Args
+ {"x: T"},
+ // Return values
+ {"y: T"},
+ // Attrs
+ {"T: {float, double, int32, int64}"},
+ // Nodes
+ {// a = Square<T>(x)
+ {{"a"}, "Square", {"x"}, {{"T", "$T"}}},
+ // o = One<T>()
+ // NOTE: We can also have a Cast<Tin, Tout>(x) instead.
+ {{"o"}, "One", {}, {{"T", "$T"}}},
+ // y = Add<T>(a, o)
+ {{"y"}, "Add", {"a", "o"}, {{"T", "$T"}}}});
+
+ const char* e = R"P(
+SquarePlusOne[T:{float, double, int32, int64}](x:T) -> (y:T) {
+ a = Square[T=$T](x)
+ o = One[T=$T]()
+ y = Add[T=$T](a, o)
+}
+)P";
+ EXPECT_EQ(DebugString(fdef), e);
+
+ // Instantiate one with T=float
+ InstantiationResult result;
+ TF_CHECK_OK(InstantiateFunction(fdef, {{"T", DT_FLOAT}}, GetOpSig, &result));
+ const char* e2 = R"P(
+(n0:float) -> (n3:float) {
+ n1 = Square[T=float](n0)
+ n2 = One[T=float]()
+ n3 = Add[T=float](n1, n2)
+}
+)P";
+ EXPECT_EQ(result.arg_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(result.ret_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(DebugString(result.gdef), e2);
+}
+
+// NOTE: This is the simplest Map op. It takes a function f: T->U.
+REGISTER_OP("Map")
+ .Input("x: N * T")
+ .Output("y: N * U")
+ .Attr("T: type")
+ .Attr("U: type")
+ .Attr("N: int >= 1")
+ // .Attr("func: func_name_with_attr")
+ .Doc(R"doc(
+Applies the 'func' on every input. I.e.,
+
+y[i] = func<...>(x[i])
+
+x: N tensors, each of type T;
+y: N tensors, each of type U;
+
+)doc");
+
+TEST(TFunc, AddSquared) {
+ auto fdef = FDH::Define(
+ // Name
+ "AddSquared",
+ // Args
+ {"x: N*T"},
+ // Return values
+ {"y: T"},
+ // Attrs
+ {"N:int", "T:{float, double, int32, int64}"},
+ // Nodes
+ {// a = Map<func=Square<$T>,T=$T,U=$T,N=$N>(x)
+ {{"a"},
+ "Map",
+ {"x"},
+ {{"func", FDH::FunctionRef("Square", {{"T", "$T"}})},
+ {"T", "$T"},
+ {"U", "$T"},
+ {"N", "$N"}}},
+ // y = AddN<N=$N,T=$T>(a)
+ {{"y"}, "AddN", {"a"}, {{"N", "$N"}, {"T", "$T"}}}});
+
+ const char* e = R"P(
+AddSquared[N:int, T:{float, double, int32, int64}](x:N*T) -> (y:T) {
+ a = Map[N=$N, T=$T, U=$T, func=Square[T=$T]](x)
+ y = AddN[N=$N, T=$T](a)
+}
+)P";
+ EXPECT_EQ(DebugString(fdef), e);
+
+ // Instantiate one with T=float
+ InstantiationResult result;
+ TF_CHECK_OK(InstantiateFunction(fdef, {{"N", 3}, {"T", DT_FLOAT}}, GetOpSig,
+ &result));
+ const char* e2 = R"P(
+(n0:float, n1:float, n2:float) -> (n4:float) {
+ n3 = Map[N=3, T=float, U=float, func=Square[T=float]](n0, n1, n2)
+ n4 = AddN[N=3, T=float](n3, n3:1, n3:2)
+}
+)P";
+ EXPECT_EQ(result.arg_types, DataTypeVector({DT_FLOAT, DT_FLOAT, DT_FLOAT}));
+ EXPECT_EQ(result.ret_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(DebugString(result.gdef), e2);
+}
+
+TEST(TFunc, ControlDeps) {
+ auto fdef = FDH::Define(
+ // Name
+ "ControlDeps",
+ // Args
+ {"x: float"},
+ // Return values
+ {},
+ // Attrs
+ {},
+ // Nodes
+ {
+ {{"a"}, "One", {}, {{"T", DT_FLOAT}}, {"x"}},
+ {{"u"}, "NoOp", {}, {}, {"a"}},
+ {{"b"}, "One", {}, {{"T", DT_FLOAT}}, {"u"}},
+ {{"v"}, "NoOp", {}, {}, {"b"}},
+ {{"c"}, "One", {}, {{"T", DT_FLOAT}}, {"a", "v"}},
+ });
+ const char* e = R"P(
+ControlDeps(x:float) -> () {
+ a = One[T=float]() @ x
+ u = NoOp() @ a
+ b = One[T=float]() @ u
+ v = NoOp() @ b
+ c = One[T=float]() @ a, v
+}
+)P";
+ EXPECT_EQ(DebugString(fdef), e);
+
+ InstantiationResult result;
+ TF_CHECK_OK(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result));
+ const char* e2 = R"P(
+(n0:float) -> () {
+ n1 = One[T=float]() @ n0
+ n2 = NoOp() @ n1
+ n3 = One[T=float]() @ n2
+ n4 = NoOp() @ n3
+ n5 = One[T=float]() @ n1, n4
+}
+)P";
+ EXPECT_EQ(result.arg_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(result.ret_types, DataTypeVector({}));
+ EXPECT_EQ(DebugString(result.gdef), e2);
+}
+
+TEST(TFunc, XTimesTwo) {
+ auto expect = R"P(
+XTimesTwo[T:{float, double, int32, int64}](x:T) -> (y:T) {
+ two = Const[dtype=int64, value=Tensor<type: int64 shape: [] values: 2>]()
+ scale = Cast[DstT=$T, SrcT=int64](two)
+ y = Mul[T=$T](x, scale)
+}
+)P";
+ EXPECT_EQ(expect, DebugString(test::function::XTimesTwo()));
+}
+
+TEST(TFunc, WXPlusB) {
+ auto expect = R"P(
+WXPlusB[T:{float, double}](w:T, x:T, b:T) -> (y:T) {
+ mm = MatMul[T=$T, _kernel="eigen", transpose_a=false, transpose_b=false](w, x)
+ y = Add[T=$T](mm, b)
+}
+)P";
+ EXPECT_EQ(expect, DebugString(test::function::WXPlusB()));
+}
+
+TEST(TFunc, Body_TypeList) {
+ const Tensor kZero = test::AsScalar<int32>(0);
+ auto fdef = FDH::Define(
+ // Name
+ "Test",
+ // Args
+ {"i:float"},
+ // Return values
+ {"o:float"},
+ // Attrs
+ {},
+ // Nodes
+ {{{"zero"}, "Const", {}, {{"value", kZero}, {"dtype", DT_INT32}}},
+ {{"s"}, "Split", {"zero", "i"}, {{"num_split", 4}, {"T", DT_FLOAT}}},
+ {{"a", "b", "c", "d"},
+ "_ArrayToList",
+ {"s"},
+ {{"N", 4},
+ {"T", DT_FLOAT},
+ {"out_types", DataTypeSlice{DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT}}}},
+ {{"l"}, "Mul", {"a", "b"}, {{"T", DT_FLOAT}}},
+ {{"r"}, "Mul", {"c", "d"}, {{"T", DT_FLOAT}}},
+ {{"x"}, "_ListToArray", {"l", "r"}, {{"N", 2}, {"T", DT_FLOAT}}},
+ {{"o"}, "AddN", {"x"}, {{"N", 2}, {"T", DT_FLOAT}}}});
+
+ const char* e = R"P(
+Test(i:float) -> (o:float) {
+ zero = Const[dtype=int32, value=Tensor<type: int32 shape: [] values: 0>]()
+ s = Split[T=float, num_split=4](zero, i)
+ a, b, c, d = _ArrayToList[N=4, T=float, out_types={float, float, float, float}](s)
+ l = Mul[T=float](a, b)
+ r = Mul[T=float](c, d)
+ x = _ListToArray[N=2, T=float](l, r)
+ o = AddN[N=2, T=float](x)
+}
+)P";
+ EXPECT_EQ(DebugString(fdef), e);
+
+ InstantiationResult result;
+ TF_CHECK_OK(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result));
+ const char* e2 = R"P(
+(n0:float) -> (n7:float) {
+ n1 = Const[dtype=int32, value=Tensor<type: int32 shape: [] values: 0>]()
+ n2 = Split[T=float, num_split=4](n1, n0)
+ n3 = _ArrayToList[N=4, T=float, out_types={float, float, float, float}](n2, n2:1, n2:2, n2:3)
+ n4 = Mul[T=float](n3, n3:1)
+ n5 = Mul[T=float](n3:2, n3:3)
+ n6 = _ListToArray[N=2, T=float, Tin={float, float}](n4, n5)
+ n7 = AddN[N=2, T=float](n6, n6:1)
+}
+)P";
+ EXPECT_EQ(result.arg_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(result.ret_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(DebugString(result.gdef), e2);
+}
+
+REGISTER_OP("Cond")
+ .Input("input: Tin")
+ .Output("output: out_types")
+ .Attr("Tin: list(type)")
+ .Attr("out_types: list(type)")
+ .Attr("cond: func")
+ .Attr("then_branch: func")
+ .Attr("else_branch: func")
+ .Doc(R"doc(
+output = cond(input) ? then_branch(input) : else_branch(input)
+
+cond: A function that takes 'input' and returns a scalar.
+then_branch: A function that takes 'input' and returns 'output'.
+else_branch: A function that takes 'input' and returns 'output'.
+)doc");
+
+TEST(TFunc, Body_Array_List_Converter) {
+ auto fdef = FDH::Define(
+ // Name
+ "MySelect",
+ // Args
+ {"x:float"},
+ // Return values
+ {"z:float"},
+ // Attrs
+ {},
+ // Nodes
+ {
+ {{"y"},
+ "Cond",
+ {"x"},
+ {{"Tin", DataTypeSlice{DT_FLOAT}},
+ {"out_types", DataTypeSlice{DT_FLOAT}},
+ {"cond", FDH::FunctionRef("MyCond")},
+ {"then_branch", FDH::FunctionRef("MyThen")},
+ {"else_branch", FDH::FunctionRef("MyElse")}}},
+ {{"z"},
+ "Cond",
+ {"y", "y"},
+ {{"Tin", DataTypeSlice{DT_FLOAT, DT_FLOAT}},
+ {"out_types", DataTypeSlice{DT_FLOAT}},
+ {"cond", FDH::FunctionRef("MyCond2")},
+ {"then_branch", FDH::FunctionRef("MyThen2")},
+ {"else_branch", FDH::FunctionRef("MyElse2")}}},
+ });
+
+ const char* e = R"P(
+MySelect(x:float) -> (z:float) {
+ y = Cond[Tin={float}, cond=MyCond, else_branch=MyElse, out_types={float}, then_branch=MyThen](x)
+ z = Cond[Tin={float, float}, cond=MyCond2, else_branch=MyElse2, out_types={float}, then_branch=MyThen2](y, y)
+}
+)P";
+ EXPECT_EQ(DebugString(fdef), e);
+
+ InstantiationResult result;
+ TF_CHECK_OK(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result));
+ const char* e2 = R"P(
+(n0:float) -> (n2:float) {
+ n1 = Cond[Tin={float}, cond=MyCond, else_branch=MyElse, out_types={float}, then_branch=MyThen](n0)
+ n2 = Cond[Tin={float, float}, cond=MyCond2, else_branch=MyElse2, out_types={float}, then_branch=MyThen2](n1, n1)
+}
+)P";
+ EXPECT_EQ(result.arg_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(result.ret_types, DataTypeVector({DT_FLOAT}));
+ EXPECT_EQ(DebugString(result.gdef), e2);
+}
+
+static void HasError(const Status& s, const string& substr) {
+ EXPECT_TRUE(StringPiece(s.ToString()).contains(substr))
+ << s << ", expected substring " << substr;
+}
+
+TEST(InstantiateErrors, Not_Sufficient_Attrs) {
+ auto fdef =
+ FDH::Define("nop", {}, {}, {"T:{float, double, int32, int64}"}, {});
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, {{"U", DT_FLOAT}}, GetOpSig, &result),
+ "T is not found");
+}
+
+TEST(InstantiateErrors, AttrValue_Value_Placeholder) {
+ auto fdef =
+ FDH::Define("nop", {}, {}, {"T:{float, double, int32, int64}"}, {});
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, {{"T", "$bad"}}, GetOpSig, &result),
+ "T in attr_values is still a placeholder");
+}
+
+TEST(InstantiateErrors, Unbounded_Attr) {
+ auto fdef = FDH::Define("test", {}, {}, {"T:{float, double, int32, int64}"},
+ {
+ {{"a"}, "One", {}, {{"T", "$unknown"}}, {"x"}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, {{"T", DT_FLOAT}}, GetOpSig, &result),
+ "Failed to bind all placeholders");
+}
+
+TEST(InstantiateErrors, DupArgs) {
+ auto fdef = FDH::Define("test", {"x:float", "x:float"}, {}, {}, {});
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Duplicated arg name");
+}
+
+TEST(InstantiateErrors, Dup_Arg_Node_Name) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"x"}, "One", {}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Duplicated ret name");
+}
+
+TEST(InstantiateErrors, Dup_Node_Names) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y"}, "One", {}, {{"T", DT_FLOAT}}},
+ {{"y"}, "One", {}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Duplicated ret name");
+}
+
+TEST(InstantiateErrors, Node_Signature_Mismatch_NoOp) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y", "z"}, "NoOp", {}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Expect one ret name");
+}
+
+TEST(InstantiateErrors, Node_Signature_Mismatch) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y", "z"}, "One", {}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Malformed function node (#ret)");
+}
+
+TEST(InstantiateErrors, Node_Arg_Notfound) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y"}, "Add", {"x", "z"}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "arg[1] is not found");
+}
+
+TEST(InstantiateErrors, Node_Arg_Mismatch) {
+ auto fdef = FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y"}, "Add", {"x", "x"}, {{"T", DT_INT32}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Invalid arg(0) for function arg");
+}
+
+TEST(InstantiateErrors, Node_Arg_ControlMissing) {
+ auto fdef =
+ FDH::Define("test", {"x:float"}, {}, {},
+ {
+ {{"y"}, "Add", {"x", "x"}, {{"T", DT_FLOAT}}, {"z"}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "dep[0] is not found");
+}
+
+TEST(InstantiateErrors, FuncRet_Missing) {
+ auto fdef = FDH::Define("test", {}, {"y: float"}, {},
+ {
+ {{"x"}, "One", {}, {{"T", DT_FLOAT}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "ret is not found");
+}
+
+TEST(InstantiateErrors, FuncRet_Mismatch) {
+ auto fdef = FDH::Define("test", {}, {"y: float"}, {},
+ {
+ {{"y"}, "One", {}, {{"T", DT_DOUBLE}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Invalid ret name.\n\t In y");
+}
+
+TEST(InstantiateErrors, TypeList_Missing_Retval_Attr) {
+ auto fdef = FDH::Define(
+ // Name
+ "MySelect",
+ // Args
+ {"x: float"},
+ // Return values
+ {"y: float"},
+ // Attrs
+ {},
+ // Nodes
+ {
+ {{"y"},
+ "Cond",
+ {"x", "x"},
+ {{"tin", DataTypeSlice{DT_FLOAT, DT_FLOAT}},
+ {"cond", FDH::FunctionRef("MyCond2")},
+ {"then_branch", FDH::FunctionRef("MyThen2")},
+ {"else_branch", FDH::FunctionRef("MyElse2")}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Missing attr out_types");
+}
+
+TEST(InstantiateErrors, TypeList_Num_Retval_Mismatch) {
+ auto fdef = FDH::Define(
+ // Name
+ "MySelect",
+ // Args
+ {"x: float"},
+ // Return values
+ {"y: float"},
+ // Attrs
+ {},
+ // Nodes
+ {
+ {{"y"},
+ "Cond",
+ {"x", "x"},
+ {{"Tin", DataTypeSlice{DT_FLOAT, DT_FLOAT}},
+ {"out_types", DataTypeSlice{DT_FLOAT, DT_FLOAT}},
+ {"cond", FDH::FunctionRef("MyCond2")},
+ {"then_branch", FDH::FunctionRef("MyThen2")},
+ {"else_branch", FDH::FunctionRef("MyElse2")}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "Wrong #ret: 0 2 1");
+}
+
+TEST(InstantiateErrors, TypeList_Missing_Arg) {
+ auto fdef = FDH::Define(
+ // Name
+ "MySelect",
+ // Args
+ {"x: float"},
+ // Return values
+ {"y: float"},
+ // Attrs
+ {},
+ // Nodes
+ {
+ {{"y"},
+ "Cond",
+ {"x", "unknown"},
+ {{"out_types", DataTypeSlice{DT_FLOAT}},
+ {"cond", FDH::FunctionRef("MyCond2")},
+ {"then_branch", FDH::FunctionRef("MyThen2")},
+ {"else_branch", FDH::FunctionRef("MyElse2")}}},
+ });
+ InstantiationResult result;
+ HasError(InstantiateFunction(fdef, kNoAttrs, GetOpSig, &result),
+ "arg[1] is not found");
+}
+
+TEST(FunctionCallFrame, Void_Void) {
+ FunctionCallFrame frame({}, {});
+ EXPECT_OK(frame.SetArgs({}));
+ auto a = test::AsTensor<float>({100});
+ HasError(frame.SetArgs({a}), "Invalid argument");
+ Tensor v;
+ HasError(frame.GetArg(0, &v), "Out of range");
+ HasError(frame.SetRetval(0, v), "Out of range");
+ std::vector<Tensor> rets;
+ EXPECT_OK(frame.GetRetvals(&rets));
+ EXPECT_EQ(rets.size(), 0);
+}
+
+TEST(FunctionCallFrame, Float_Float_Float) {
+ FunctionCallFrame frame({DT_FLOAT, DT_FLOAT}, {DT_FLOAT});
+ HasError(frame.SetArgs({}), "Invalid argument: Expects 2 arguments");
+ auto a = test::AsTensor<float>({100});
+ auto b = test::AsTensor<float>({200});
+ auto c = test::AsTensor<int64>({300});
+ HasError(frame.SetArgs({a, c}),
+ "Invalid argument: Expects arg[1] to be float");
+ EXPECT_OK(frame.SetArgs({a, b}));
+
+ Tensor v;
+ HasError(frame.GetArg(-1, &v), "Out of range");
+ HasError(frame.GetArg(2, &v), "Out of range");
+ EXPECT_OK(frame.GetArg(0, &v));
+ test::ExpectTensorEqual<float>(a, v);
+ EXPECT_OK(frame.GetArg(1, &v));
+ test::ExpectTensorEqual<float>(b, v);
+
+ v = test::AsTensor<float>({-100});
+ HasError(frame.SetRetval(-1, v), "Out of range");
+ HasError(frame.SetRetval(1, v), "Out of range");
+ HasError(frame.SetRetval(0, test::AsTensor<int64>({-100})),
+ "Invalid argument: Expects ret[0] to be float");
+
+ std::vector<Tensor> rets;
+ HasError(frame.GetRetvals(&rets), "does not have value");
+ EXPECT_OK(frame.SetRetval(0, v));
+ HasError(frame.SetRetval(0, v), "has already been set");
+
+ EXPECT_OK(frame.GetRetvals(&rets));
+ EXPECT_EQ(rets.size(), 1);
+ test::ExpectTensorEqual<float>(rets[0], v);
+}
+
+TEST(Canonicalize, Basic) {
+ EXPECT_EQ(Canonicalize("MatMul", {{"T", DT_FLOAT},
+ {"transpose_a", false},
+ {"transpose_b", false}}),
+ "MatMul[T=float,transpose_a=false,transpose_b=false]");
+ EXPECT_EQ(Canonicalize("MatMul", {{"T", DT_FLOAT},
+ {"transpose_b", false},
+ {"transpose_a", false}}),
+ "MatMul[T=float,transpose_a=false,transpose_b=false]");
+ EXPECT_EQ(Canonicalize("MatMul", {{"T", DT_DOUBLE},
+ {"transpose_b", true},
+ {"transpose_a", false}}),
+ "MatMul[T=double,transpose_a=false,transpose_b=true]");
+}
+
+TEST(FunctionLibraryDefinitionTest, Find) {
+ FunctionDefLibrary proto;
+ *proto.add_function() = test::function::XTimesTwo();
+ FunctionLibraryDefinition lib_def(proto);
+
+ EXPECT_EQ(lib_def.Find("XTimes16"), nullptr);
+
+ auto expect = R"P(
+XTimesTwo[T:{float, double, int32, int64}](x:T) -> (y:T) {
+ two = Const[dtype=int64, value=Tensor<type: int64 shape: [] values: 2>]()
+ scale = Cast[DstT=$T, SrcT=int64](two)
+ y = Mul[T=$T](x, scale)
+}
+)P";
+ auto found = lib_def.Find("XTimesTwo");
+ ASSERT_NE(found, nullptr);
+ EXPECT_EQ(expect, DebugString(*found));
+}
+
+TEST(FunctionLibraryDefinitionTest, LookUp) {
+ FunctionDefLibrary proto;
+ *proto.add_function() = test::function::XTimesTwo();
+ FunctionLibraryDefinition lib_def(proto);
+
+ Status s;
+ EXPECT_EQ(lib_def.LookUp("XTimes16", &s), nullptr);
+
+ auto found = lib_def.LookUp("XTimesTwo", &s);
+ ASSERT_NE(found, nullptr);
+ EXPECT_EQ(found->DebugString(),
+ test::function::XTimesTwo().signature().DebugString());
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/function_testlib.cc b/tensorflow/core/framework/function_testlib.cc
new file mode 100644
index 0000000000..5ead947076
--- /dev/null
+++ b/tensorflow/core/framework/function_testlib.cc
@@ -0,0 +1,146 @@
+#include "tensorflow/core/framework/function_testlib.h"
+
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+
+namespace tensorflow {
+namespace test {
+namespace function {
+
+typedef FunctionDefHelper FDH;
+
+GraphDef GDef(gtl::ArraySlice<NodeDef> nodes,
+ gtl::ArraySlice<FunctionDef> funcs) {
+ GraphDef g;
+ for (auto n : nodes) {
+ *(g.add_node()) = n;
+ }
+ auto lib = g.mutable_library();
+ for (auto f : funcs) {
+ *(lib->add_function()) = f;
+ }
+ return g;
+}
+
+// Helper to construct a NodeDef.
+NodeDef NDef(const string& name, const string& op,
+ gtl::ArraySlice<string> inputs,
+ gtl::ArraySlice<std::pair<string, FDH::AttrValueWrapper>> attrs,
+ const string& device) {
+ NodeDef n;
+ n.set_name(name);
+ n.set_op(op);
+ for (auto in : inputs) n.add_input(in);
+ n.set_device(device);
+ for (auto na : attrs) n.mutable_attr()->insert({na.first, na.second.proto});
+ return n;
+}
+
+FunctionDef NonZero() {
+ return FDH::Define(
+ // Name
+ "NonZero",
+ // Args
+ {"x:T"},
+ // Return values
+ {"y:T"},
+ // Attr def
+ {"T:{float, double, int32, int64, string}"},
+ // Nodes
+ {
+ {{"y"}, "Identity", {"x"}, {{"T", "$T"}}},
+ });
+}
+
+FunctionDef XTimesTwo() {
+ const Tensor kTwo = test::AsScalar<int64>(2);
+ return FDH::Define(
+ // Name
+ "XTimesTwo",
+ // Args
+ {"x: T"},
+ // Return values
+ {"y: T"},
+ // Attr def
+ {"T: {float, double, int32, int64}"},
+ // Nodes
+ {
+ {{"two"}, "Const", {}, {{"value", kTwo}, {"dtype", DT_INT64}}},
+ {{"scale"}, "Cast", {"two"}, {{"SrcT", DT_INT64}, {"DstT", "$T"}}},
+ {{"y"}, "Mul", {"x", "scale"}, {{"T", "$T"}}},
+ });
+}
+
+FunctionDef XTimesFour() {
+ return FDH::Define(
+ // Name
+ "XTimesFour",
+ // Args
+ {"x: T"},
+ // Return values
+ {"y: T"},
+ // Attr def
+ {"T: {float, double, int32, int64}"},
+ // Nodes
+ {
+ {{"x2"}, "XTimesTwo", {"x"}, {{"T", "$T"}}},
+ {{"y"}, "XTimesTwo", {"x2"}, {{"T", "$T"}}},
+ });
+}
+
+FunctionDef XTimes16() {
+ return FDH::Define(
+ // Name
+ "XTimes16",
+ // Args
+ {"x: T"},
+ // Return values
+ {"y: T"},
+ // Attr def
+ {"T: {float, double, int32, int64}"},
+ // Nodes
+ {
+ {{"x4"}, "XTimesFour", {"x"}, {{"T", "$T"}}},
+ {{"y"}, "XTimesFour", {"x4"}, {{"T", "$T"}}},
+ });
+}
+
+FunctionDef WXPlusB() {
+ return FDH::Define(
+ // Name
+ "WXPlusB",
+ // Args
+ {"w: T", "x: T", "b: T"},
+ // Return values
+ {"y: T"},
+ // Attr def
+ {"T: {float, double}"},
+ // Nodes
+ {{{"mm"},
+ "MatMul",
+ {"w", "x"},
+ {{"T", "$T"},
+ {"transpose_a", false},
+ {"transpose_b", false},
+ {"_kernel", "eigen"}}},
+ {{"y"}, "Add", {"mm", "b"}, {{"T", "$T"}}}});
+}
+
+FunctionDef Swap() {
+ return FDH::Define(
+ // Name
+ "Swap",
+ // Args
+ {"i0: T", "i1: T"},
+ // Return values
+ {"o0: T", "o1: T"},
+ // Attr def
+ {"T: {float, double}"},
+ // Nodes
+ {{{"o0"}, "Identity", {"i1"}, {{"T", "$T"}}},
+ {{"o1"}, "Identity", {"i0"}, {{"T", "$T"}}}});
+}
+
+} // end namespace function
+} // end namespace test
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/function_testlib.h b/tensorflow/core/framework/function_testlib.h
new file mode 100644
index 0000000000..ed0446ea85
--- /dev/null
+++ b/tensorflow/core/framework/function_testlib.h
@@ -0,0 +1,53 @@
+#ifndef TENSORFLOW_FRAMEWORK_FUNCTION_TESTLIB_H_
+#define TENSORFLOW_FRAMEWORK_FUNCTION_TESTLIB_H_
+
+#include <string>
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/framework/function.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace test {
+namespace function {
+
+// Helper to construct a NodeDef.
+NodeDef NDef(
+ const string& name, const string& op, gtl::ArraySlice<string> inputs,
+ gtl::ArraySlice<std::pair<string, FunctionDefHelper::AttrValueWrapper>>
+ attrs = {},
+ const string& device = "");
+
+// Helper to construct a GraphDef proto.
+GraphDef GDef(gtl::ArraySlice<NodeDef> nodes,
+ gtl::ArraySlice<FunctionDef> funcs = {});
+
+// For testing convenience, we provide a few simple functions that can
+// be easily executed and tested.
+
+// x:T -> x * 2.
+FunctionDef XTimesTwo();
+
+// x:T -> (x * 2) * 2.
+FunctionDef XTimesFour();
+
+// x:T -> (((x * 2) * 2) * 2) * 2.
+FunctionDef XTimes16();
+
+// w:T, x:T, b:T -> MatMul(w, x) + b
+FunctionDef WXPlusB();
+
+// x:T -> x:T, where T is a type which we automatically convert to a bool.
+FunctionDef NonZero();
+
+// x:T, y:T -> y:T, x:T
+FunctionDef Swap();
+
+} // end namespace function
+} // end namespace test
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_FUNCTION_TESTLIB_H_
diff --git a/tensorflow/core/framework/graph.proto b/tensorflow/core/framework/graph.proto
new file mode 100644
index 0000000000..a9bc07e88c
--- /dev/null
+++ b/tensorflow/core/framework/graph.proto
@@ -0,0 +1,103 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/attr_value.proto";
+import "tensorflow/core/framework/function.proto";
+
+// Represents the graph of operations
+// TODO(sanjay): Also want to put the following somewhere:
+// * random_seed
+// * replicas: Do we stamp them out in python itself?
+// * where to load parameters
+// * optimizer info? does it go with the parameter layers/ops?
+message GraphDef {
+ repeated NodeDef node = 1;
+
+ // EXPERIMENTAL. DO NOT USE OR DEPEND ON THIS YET.
+ //
+ // "library" provides user-defined functions.
+ //
+ // Naming:
+ // * library.function.name are in a flat namespace.
+ // NOTE: We may need to change it to be hierarchical to support
+ // different orgs. E.g.,
+ // { "/google/nn", { ... }},
+ // { "/google/vision", { ... }}
+ // { "/org_foo/module_bar", {...}}
+ // map<string, FunctionDefLib> named_lib;
+ // * If node[i].op is the name of one function in "library",
+ // node[i] is deemed as a function call. Otherwise, node[i].op
+ // must be a primitive operation supported by the runtime.
+ //
+ //
+ // Function call semantics:
+ //
+ // * The callee may start execution as soon as some of its inputs
+  //     are ready. The caller may want to use the Tuple() mechanism to
+  //     ensure all inputs are ready at the same time.
+ //
+ // * The consumer of return values may start executing as soon as
+ // the return values the consumer depends on are ready. The
+  //     consumer may want to use the Tuple() mechanism to ensure the
+ // consumer does not start until all return values of the callee
+ // function are ready.
+ FunctionDefLibrary library = 2;
+};
+
+message NodeDef {
+ // The name given to this operator. Used for naming inputs,
+ // logging, visualization, etc. Unique within a single GraphDef.
+ // Must match the regexp "[A-Za-z0-9.][A-Za-z0-9_./]*".
+ string name = 1;
+
+ // The operation name. There may be custom parameters in attrs.
+ // Op names starting with an underscore are reserved for internal use.
+ string op = 2;
+
+ // Each input is "node:src_output" with "node" being a string name and
+ // "src_output" indicating which output tensor to use from "node". If
+ // "src_output" is 0 the ":0" suffix can be omitted. Regular inputs
+ // may optionally be followed by control inputs that have the format
+ // "^node".
+ repeated string input = 3;
+
+ // A (possibly partial) specification for the device on which this
+ // node should be placed.
+ // The expected syntax for this string is as follows:
+ //
+ // DEVICE_SPEC ::= COLOCATED_NODE | PARTIAL_SPEC
+ //
+ // COLOCATED_NODE ::= "@" NODE_NAME // See NodeDef.name above.
+ // PARTIAL_SPEC ::= ("/" CONSTRAINT) *
+ // CONSTRAINT ::= ("job:" JOB_NAME)
+ // | ("replica:" [1-9][0-9]*)
+ // | ("task:" [1-9][0-9]*)
+ // | ( ("gpu" | "cpu") ":" ([1-9][0-9]* | "*") )
+ //
+ // Valid values for this string include:
+ // * "@other/node" (colocate with "other/node")
+ // * "/job:worker/replica:0/task:1/gpu:3" (full specification)
+ // * "/job:worker/gpu:3" (partial specification)
+ // * "" (no specification)
+ //
+ // If the constraints do not resolve to a single device (or if this
+ // field is empty or not present), the runtime will attempt to
+ // choose a device automatically.
+ string device = 4;
+
+ // Operation-specific graph-construction-time configuration.
+ // Note that this should include all attrs defined in the
+ // corresponding OpDef, including those with a value matching
+ // the default -- this allows the default to change and makes
+ // NodeDefs easier to interpret on their own. However, if
+ // an attr with a default is not specified in this list, the
+ // default will be used.
+ // The "names" (keys) must match the regexp "[a-z][a-z0-9_]+" (and
+ // one of the names from the corresponding OpDef's attr field).
+ // The values must have a type matching the corresponding OpDef
+ // attr's type field.
+ // TODO(josh11b): Add some examples here showing best practices.
+ map<string, AttrValue> attr = 5;
+};
diff --git a/tensorflow/core/framework/graph_def_util.cc b/tensorflow/core/framework/graph_def_util.cc
new file mode 100644
index 0000000000..1e0d280126
--- /dev/null
+++ b/tensorflow/core/framework/graph_def_util.cc
@@ -0,0 +1,25 @@
+#include "tensorflow/core/framework/graph_def_util.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+string SummarizeGraphDef(const GraphDef& graph_def) {
+ string ret;
+ for (const NodeDef& node : graph_def.node()) {
+ strings::StrAppend(&ret, SummarizeNodeDef(node), ";\n");
+ }
+ return ret;
+}
+
+Status ValidateExternalGraphDefSyntax(const GraphDef& graph_def) {
+ for (const NodeDef& node : graph_def.node()) {
+ TF_RETURN_IF_ERROR(ValidateExternalNodeDefSyntax(node));
+ }
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/graph_def_util.h b/tensorflow/core/framework/graph_def_util.h
new file mode 100644
index 0000000000..7a2ec9c7a7
--- /dev/null
+++ b/tensorflow/core/framework/graph_def_util.h
@@ -0,0 +1,29 @@
+#ifndef TENSORFLOW_FRAMEWORK_GRAPH_DEF_UTIL_H_
+#define TENSORFLOW_FRAMEWORK_GRAPH_DEF_UTIL_H_
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Produce a human-readable version of a GraphDef that is more concise
+// than a text-format proto.
+string SummarizeGraphDef(const GraphDef& graph_def);
+
+// Validates the syntax of a GraphDef provided externally.
+//
+// The following is an EBNF-style syntax for GraphDef objects. Note that
+// Node objects are actually specified as tensorflow::NodeDef protocol buffers,
+// which contain many other fields that are not (currently) validated.
+//
+// Graph = Node *
+// Node = NodeName, Inputs
+// Inputs = ( DataInput * ), ( ControlInput * )
+// DataInput = NodeName, ( ":", [1-9], [0-9] * ) ?
+// ControlInput = "^", NodeName
+// NodeName = [A-Za-z0-9.], [A-Za-z0-9_./] *
+Status ValidateExternalGraphDefSyntax(const GraphDef& graph_def);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_GRAPH_DEF_UTIL_H_
diff --git a/tensorflow/core/framework/kernel_def.proto b/tensorflow/core/framework/kernel_def.proto
new file mode 100644
index 0000000000..db7856a156
--- /dev/null
+++ b/tensorflow/core/framework/kernel_def.proto
@@ -0,0 +1,33 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/attr_value.proto";
+
+message KernelDef {
+ // Must match the name of an Op.
+ string op = 1;
+
+ // Type of device this kernel runs on.
+ string device_type = 2;
+
+ message AttrConstraint {
+ // Name of an attr from the Op.
+ string name = 1;
+
+ // A list of values that this kernel supports for this attr.
+ // Like OpDef.AttrDef.allowed_values, except for kernels instead of Ops.
+ AttrValue allowed_values = 2;
+ }
+ repeated AttrConstraint constraint = 3;
+
+ // Names of the Op's input_/output_args that reside in host memory
+ // instead of device memory.
+ repeated string host_memory_arg = 4;
+
+ // This allows experimental kernels to be registered for an op that
+ // won't be used unless the user specifies a "_kernel" attr with
+ // value matching this.
+ string label = 5;
+}
diff --git a/tensorflow/core/framework/kernel_def_builder.cc b/tensorflow/core/framework/kernel_def_builder.cc
new file mode 100644
index 0000000000..8fba883a16
--- /dev/null
+++ b/tensorflow/core/framework/kernel_def_builder.cc
@@ -0,0 +1,47 @@
+#include "tensorflow/core/framework/kernel_def_builder.h"
+
+namespace tensorflow {
+
+KernelDefBuilder::KernelDefBuilder(const char* op_name) {
+ kernel_def_ = new KernelDef;
+ kernel_def_->set_op(op_name);
+}
+
+KernelDefBuilder& KernelDefBuilder::Device(const char* device_type) {
+ kernel_def_->set_device_type(device_type);
+ return *this;
+}
+
+KernelDefBuilder& KernelDefBuilder::TypeConstraint(
+ const char* attr_name, gtl::ArraySlice<DataType> allowed) {
+ auto* constraint = kernel_def_->add_constraint();
+ constraint->set_name(attr_name);
+ auto* allowed_values = constraint->mutable_allowed_values()->mutable_list();
+ for (DataType dt : allowed) {
+ allowed_values->add_type(dt);
+ }
+ return *this;
+}
+
+KernelDefBuilder& KernelDefBuilder::TypeConstraint(const char* attr_name,
+ DataType allowed) {
+ auto* constraint = kernel_def_->add_constraint();
+ constraint->set_name(attr_name);
+ constraint->mutable_allowed_values()->mutable_list()->add_type(allowed);
+ return *this;
+}
+
+KernelDefBuilder& KernelDefBuilder::HostMemory(const char* arg_name) {
+ kernel_def_->add_host_memory_arg(arg_name);
+ return *this;
+}
+
+KernelDefBuilder& KernelDefBuilder::Label(const char* label) {
+ CHECK_EQ(kernel_def_->label(), "")
+ << "Trying to set a kernel's label a second time: '" << label
+ << "' in: " << kernel_def_->ShortDebugString();
+ kernel_def_->set_label(label);
+ return *this;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/kernel_def_builder.h b/tensorflow/core/framework/kernel_def_builder.h
new file mode 100644
index 0000000000..0c14d1e006
--- /dev/null
+++ b/tensorflow/core/framework/kernel_def_builder.h
@@ -0,0 +1,77 @@
+#ifndef TENSORFLOW_FRAMEWORK_KERNEL_DEF_BUILDER_H_
+#define TENSORFLOW_FRAMEWORK_KERNEL_DEF_BUILDER_H_
+
+#include "tensorflow/core/framework/kernel_def.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Builder class passed to the REGISTER_KERNEL_BUILDER() macro.
+class KernelDefBuilder {
+ public:
+ // Starts with just the name field set.
+ // Caller MUST call Build() and take ownership of the result.
+ explicit KernelDefBuilder(const char* op_name);
+
+ ~KernelDefBuilder() {
+ DCHECK(kernel_def_ == nullptr) << "Did not call Build()";
+ }
+
+ // Required: specify the type of device this kernel supports.
+ // Returns *this.
+ KernelDefBuilder& Device(const char* device_type);
+ // KernelDefBuilder& Device(DeviceType device_type);
+
+ // Specify that this kernel supports a limited set of values for a
+ // particular type or list(type) attr (a further restriction than
+ // what the Op allows).
+ // Returns *this.
+ KernelDefBuilder& TypeConstraint(const char* attr_name,
+ gtl::ArraySlice<DataType> allowed);
+
+ // Like TypeConstraint but supports just a single type.
+ KernelDefBuilder& TypeConstraint(const char* attr_name, DataType allowed);
+
+ // Like TypeConstraint, but (a) gets the type from a template parameter
+ // and (b) only supports a constraint to a single type.
+ template <class T>
+ KernelDefBuilder& TypeConstraint(const char* attr_name);
+ // TODO(josh11b): Support other types of attr constraints as needed.
+
+ // Specify that this kernel requires/provides an input/output arg
+ // in host memory (instead of the default, device memory).
+ // Returns *this.
+ KernelDefBuilder& HostMemory(const char* arg_name);
+
+ // Specify that this kernel requires a particular value for the
+ // "_kernel" attr. May only be specified once. Returns *this.
+ KernelDefBuilder& Label(const char* label);
+
+ // Returns a pointer to a KernelDef with fields set based on the
+ // above calls to this instance.
+ // Caller takes ownership of the result.
+ const KernelDef* Build() {
+ KernelDef* r = kernel_def_;
+ kernel_def_ = nullptr;
+ return r;
+ }
+
+ private:
+ KernelDef* kernel_def_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(KernelDefBuilder);
+};
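+
+// A minimal usage sketch (mirroring kernel_def_builder_test.cc; in normal
+// code the builder expression is passed to the REGISTER_KERNEL_BUILDER()
+// macro rather than being built and deleted by hand):
+//
+//   const KernelDef* def = KernelDefBuilder("MatMul")
+//                              .Device(DEVICE_GPU)
+//                              .TypeConstraint<float>("T")
+//                              .Build();
+//   ... use *def, then delete it ...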
+
+// IMPLEMENTATION
+
+template <class T>
+inline KernelDefBuilder& KernelDefBuilder::TypeConstraint(
+ const char* attr_name) {
+ return this->TypeConstraint(attr_name, DataTypeToEnum<T>::v());
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_KERNEL_DEF_BUILDER_H_
diff --git a/tensorflow/core/framework/kernel_def_builder_test.cc b/tensorflow/core/framework/kernel_def_builder_test.cc
new file mode 100644
index 0000000000..eba7144b59
--- /dev/null
+++ b/tensorflow/core/framework/kernel_def_builder_test.cc
@@ -0,0 +1,76 @@
+#include "tensorflow/core/framework/kernel_def_builder.h"
+
+#include "tensorflow/core/framework/kernel_def.pb.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(KernelDefBuilderTest, Basic) {
+ const KernelDef* def = KernelDefBuilder("A").Device(DEVICE_CPU).Build();
+ KernelDef expected;
+ protobuf::TextFormat::ParseFromString("op: 'A' device_type: 'CPU'",
+ &expected);
+ EXPECT_EQ(def->DebugString(), expected.DebugString());
+ delete def;
+}
+
+TEST(KernelDefBuilderTest, TypeConstraint) {
+ const KernelDef* def = KernelDefBuilder("B")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .Build();
+ KernelDef expected;
+ protobuf::TextFormat::ParseFromString(R"proto(
+ op: 'B' device_type: 'GPU'
+ constraint { name: 'T' allowed_values { list { type: DT_FLOAT } } } )proto",
+ &expected);
+
+ EXPECT_EQ(def->DebugString(), expected.DebugString());
+ delete def;
+
+ def = KernelDefBuilder("C")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("U")
+ .TypeConstraint<bool>("V")
+ .Build();
+
+ protobuf::TextFormat::ParseFromString(R"proto(
+ op: 'C' device_type: 'GPU'
+ constraint { name: 'U' allowed_values { list { type: DT_INT32 } } }
+ constraint { name: 'V' allowed_values { list { type: DT_BOOL } } } )proto",
+ &expected);
+ EXPECT_EQ(def->DebugString(), expected.DebugString());
+ delete def;
+
+ def = KernelDefBuilder("D")
+ .Device(DEVICE_CPU)
+ .TypeConstraint("W", {DT_DOUBLE, DT_STRING})
+ .Build();
+ protobuf::TextFormat::ParseFromString(R"proto(
+ op: 'D' device_type: 'CPU'
+ constraint { name: 'W'
+ allowed_values { list { type: [DT_DOUBLE, DT_STRING] } } } )proto",
+ &expected);
+ EXPECT_EQ(def->DebugString(), expected.DebugString());
+ delete def;
+}
+
+TEST(KernelDefBuilderTest, HostMemory) {
+ const KernelDef* def = KernelDefBuilder("E")
+ .Device(DEVICE_GPU)
+ .HostMemory("in")
+ .HostMemory("out")
+ .Build();
+ KernelDef expected;
+ protobuf::TextFormat::ParseFromString(
+ "op: 'E' device_type: 'GPU' "
+ "host_memory_arg: ['in', 'out']",
+ &expected);
+ EXPECT_EQ(def->DebugString(), expected.DebugString());
+ delete def;
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/lookup_interface.cc b/tensorflow/core/framework/lookup_interface.cc
new file mode 100644
index 0000000000..c660b84aa0
--- /dev/null
+++ b/tensorflow/core/framework/lookup_interface.cc
@@ -0,0 +1,45 @@
+#include "tensorflow/core/framework/lookup_interface.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace lookup {
+
+Status LookupInterface::CheckKeyAndValueTensors(const Tensor& key,
+ const Tensor& value) {
+ if (key.dtype() != key_dtype()) {
+ return errors::InvalidArgument("Key must be type ", key_dtype(),
+ " but got ", key.dtype());
+ }
+ if (value.dtype() != value_dtype()) {
+ return errors::InvalidArgument("Value must be type ", value_dtype(),
+ " but got ", value.dtype());
+ }
+ if (key.NumElements() != value.NumElements()) {
+ return errors::InvalidArgument("Number of elements of key(",
+ key.NumElements(), ") and value(",
+ value.NumElements(), ") are different.");
+ }
+ if (!key.shape().IsSameSize(value.shape())) {
+ return errors::InvalidArgument("key and value have different shapes.");
+ }
+ return Status::OK();
+}
+
+Status LookupInterface::CheckFindArguments(const Tensor& key,
+ const Tensor& value,
+ const Tensor& default_value) {
+ TF_RETURN_IF_ERROR(CheckKeyAndValueTensors(key, value));
+
+ if (default_value.dtype() != value_dtype()) {
+ return errors::InvalidArgument("Default value must be type ", value_dtype(),
+ " but got ", default_value.dtype());
+ }
+ if (!TensorShapeUtils::IsScalar(default_value.shape())) {
+ return errors::InvalidArgument("Default values must be scalar.");
+ }
+ return Status::OK();
+}
+
+} // namespace lookup
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/lookup_interface.h b/tensorflow/core/framework/lookup_interface.h
new file mode 100644
index 0000000000..d4036d2019
--- /dev/null
+++ b/tensorflow/core/framework/lookup_interface.h
@@ -0,0 +1,65 @@
+#ifndef TENSORFLOW_FRAMEWORK_LOOKUP_INTERFACE_H_
+#define TENSORFLOW_FRAMEWORK_LOOKUP_INTERFACE_H_
+
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace lookup {
+
+// Lookup interface for batch lookups used by table lookup ops.
+class LookupInterface : public ResourceBase {
+ public:
+  // Performs batch lookups: for every element in the key tensor, Find
+  // writes the corresponding value into the values tensor.
+  // If an element is not present in the table, the given default value is
+  // used instead.
+
+ // For tables that require initialization, Find is available once the table
+ // is marked as initialized.
+
+ // Returns the following statuses:
+ // - OK: when the find finishes successfully.
+ // - FailedPrecondition: if the table is not initialized.
+ // - InvalidArgument: if any of the preconditions on the lookup key or value
+ // fails.
+ // - In addition, other implementations may provide another non-OK status
+ // specific to their failure modes.
+ virtual Status Find(const Tensor& keys, Tensor* values,
+ const Tensor& default_value) = 0;
+
+ // Returns the number of elements in the table.
+ virtual size_t size() const = 0;
+
+ // Returns the data type of the key.
+ virtual DataType key_dtype() const = 0;
+
+ // Returns the data type of the value.
+ virtual DataType value_dtype() const = 0;
+
+ string DebugString() override { return "A lookup table"; }
+
+ protected:
+ virtual ~LookupInterface() = default;
+
+ // Check format of the key and value tensors.
+ // Returns OK if all the following requirements are satisfied, otherwise it
+ // returns InvalidArgument:
+  // - the DataType of the key tensor equals the table's key_dtype
+  // - the DataType of the value tensor equals the table's value_dtype
+ // - key and value have the same size and shape
+ Status CheckKeyAndValueTensors(const Tensor& keys, const Tensor& values);
+
+ // Check the arguments of a find operation. Returns OK if all the following
+ // requirements are satisfied, otherwise it returns InvalidArgument:
+ // - All requirements of CheckKeyAndValueTensors
+  // - default_value's type equals the table's value_dtype
+ // - default_value is scalar
+ Status CheckFindArguments(const Tensor& keys, const Tensor& values,
+ const Tensor& default_value);
+};
+
+} // namespace lookup
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_LOOKUP_INTERFACE_H_
diff --git a/tensorflow/core/framework/node_def_builder.cc b/tensorflow/core/framework/node_def_builder.cc
new file mode 100644
index 0000000000..12757f153a
--- /dev/null
+++ b/tensorflow/core/framework/node_def_builder.cc
@@ -0,0 +1,194 @@
+#include "tensorflow/core/framework/node_def_builder.h"
+
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+
+namespace tensorflow {
+
+NodeDefBuilder::NodeDefBuilder(const string& name, const string& op_name,
+ const OpRegistryInterface* op_registry) {
+ node_def_.set_name(name);
+ Status status;
+ op_def_ = op_registry->LookUp(op_name, &status);
+ if (op_def_ == nullptr) {
+ errors_.push_back(status.error_message());
+ inputs_specified_ = 0;
+ } else {
+ Initialize();
+ }
+}
+
+NodeDefBuilder::NodeDefBuilder(const string& name, const OpDef* op_def)
+ : op_def_(op_def) {
+ node_def_.set_name(name);
+ Initialize();
+}
+
+void NodeDefBuilder::Initialize() {
+ inputs_specified_ = 0;
+ node_def_.set_op(op_def_->name());
+}
+
+const OpDef::ArgDef* NodeDefBuilder::NextArgDef() {
+ if (!NextArgAvailable()) return nullptr;
+ return &op_def_->input_arg(inputs_specified_++);
+}
+
+bool NodeDefBuilder::NextArgAvailable() {
+ if (op_def_ == nullptr) {
+ return false;
+ } else if (inputs_specified_ >= op_def_->input_arg_size()) {
+ errors_.push_back(strings::StrCat("More Input() calls than the ",
+ op_def_->input_arg_size(),
+ " input_args"));
+ return false;
+ }
+ return true;
+}
+
+NodeDefBuilder& NodeDefBuilder::Input(FakeInputFunctor fake_input) {
+ if (NextArgAvailable()) {
+ Status status =
+ fake_input(*op_def_, inputs_specified_, node_def_, this);
+ if (!status.ok()) errors_.push_back(status.error_message());
+ }
+ return *this;
+}
+
+void NodeDefBuilder::SingleInput(const OpDef::ArgDef* input_arg,
+ const string& src_node, int src_index,
+ DataType dt) {
+ AddInput(src_node, src_index);
+
+ if (!input_arg->number_attr().empty() ||
+ !input_arg->type_list_attr().empty()) {
+ errors_.push_back(strings::StrCat("Single tensor passed to '",
+ input_arg->name(), "', expected list"));
+ return;
+ }
+
+ if (input_arg->type() != DT_INVALID) {
+ const DataType expected = MaybeAddRef(input_arg, input_arg->type());
+ VerifyInputType(input_arg, expected, dt);
+ } else {
+ VerifyInputRef(input_arg, dt);
+ Attr(input_arg->type_attr(), BaseType(dt));
+ }
+}
+
+void NodeDefBuilder::ListInput(const OpDef::ArgDef* input_arg,
+ gtl::ArraySlice<NodeOut> src_list) {
+ for (const auto& node_out : src_list) {
+ AddInput(node_out.node, node_out.index);
+ }
+
+ if (!input_arg->number_attr().empty()) {
+ Attr(input_arg->number_attr(), static_cast<int64>(src_list.size()));
+ if (input_arg->type() != DT_INVALID) {
+ const DataType expected = MaybeAddRef(input_arg, input_arg->type());
+ for (const auto& node_out : src_list) {
+ VerifyInputType(input_arg, expected, node_out.data_type);
+ }
+ } else if (!src_list.empty()) {
+ const DataType base = BaseType(src_list[0].data_type);
+ Attr(input_arg->type_attr(), base);
+ const DataType expected = MaybeAddRef(input_arg, base);
+ for (const auto& node_out : src_list) {
+ VerifyInputType(input_arg, expected, node_out.data_type);
+ }
+ }
+ } else if (!input_arg->type_list_attr().empty()) {
+ DataTypeVector type_vec;
+ type_vec.reserve(src_list.size());
+ for (const auto& node_out : src_list) {
+ const DataType dt = node_out.data_type;
+ VerifyInputRef(input_arg, dt);
+ type_vec.push_back(BaseType(dt));
+ }
+ Attr(input_arg->type_list_attr(), type_vec);
+ } else {
+ errors_.push_back(strings::StrCat("List provided to input '",
+ input_arg->name(),
+ "' when single Tensor expected"));
+ }
+}
+
+void NodeDefBuilder::AddInput(const string& src_node, int src_index) {
+ if (src_node.empty()) {
+ errors_.push_back("Empty input node name");
+ } else if (src_node[0] == '^') {
+ errors_.push_back(
+ strings::StrCat("Non-control input starting with ^: ", src_node));
+ } else if (src_index > 0) {
+ node_def_.add_input(strings::StrCat(src_node, ":", src_index));
+ } else {
+ node_def_.add_input(src_node);
+ }
+}
+
+void NodeDefBuilder::VerifyInputType(const OpDef::ArgDef* input_arg,
+ DataType expected, DataType dt) {
+ if (!TypesCompatible(expected, dt)) {
+ errors_.push_back(strings::StrCat("Input '", input_arg->name(), "' passed ",
+ DataTypeString(dt), " expected ",
+ DataTypeString(expected)));
+ }
+}
+
+void NodeDefBuilder::VerifyInputRef(const OpDef::ArgDef* input_arg,
+ DataType dt) {
+ if (input_arg->is_ref() && !IsRefType(dt)) {
+ errors_.push_back(strings::StrCat("Input '", input_arg->name(), "' passed ",
+ DataTypeString(dt),
+ " expected ref type"));
+ }
+}
+
+Status NodeDefBuilder::Finalize(NodeDef* node_def) const {
+ const std::vector<string>* errors_ptr = &errors_;
+ std::vector<string> errors_storage;
+ if (op_def_ != nullptr && inputs_specified_ < op_def_->input_arg_size()) {
+ // Since this is a const method, to add an error, we have to make
+ // a copy of the existing errors.
+ errors_storage = errors_;
+ errors_storage.push_back(
+ strings::StrCat(inputs_specified_, " inputs specified of ",
+ op_def_->input_arg_size(), " inputs in Op"));
+ errors_ptr = &errors_storage;
+ }
+
+ if (!errors_ptr->empty()) {
+ if (errors_ptr->size() == 1) {
+ if (op_def_ == nullptr) {
+ return errors::InvalidArgument((*errors_ptr)[0],
+ " while building NodeDef '",
+ node_def_.name(), "'");
+ }
+ return errors::InvalidArgument(
+ (*errors_ptr)[0], " while building NodeDef '", node_def_.name(),
+ "' using ", SummarizeOpDef(*op_def_));
+ } else {
+ return errors::InvalidArgument(
+ errors_ptr->size(), " errors while building NodeDef '",
+ node_def_.name(), "' using ", SummarizeOpDef(*op_def_), ":\n",
+ str_util::Join(*errors_ptr, "\n"));
+ }
+ } else {
+ NodeDef node_def_backup;
+ if (node_def == nullptr) node_def = &node_def_backup;
+ *node_def = node_def_;
+
+ // Add control inputs after the regular inputs.
+ for (const auto& control_input : control_inputs_) {
+ node_def->add_input(strings::StrCat("^", control_input));
+ }
+
+ // Add default values for unspecified attrs.
+ AddDefaultsToNodeDef(*op_def_, node_def);
+
+ return Status::OK();
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/node_def_builder.h b/tensorflow/core/framework/node_def_builder.h
new file mode 100644
index 0000000000..706f072608
--- /dev/null
+++ b/tensorflow/core/framework/node_def_builder.h
@@ -0,0 +1,176 @@
+#ifndef TENSORFLOW_FRAMEWORK_NODE_DEF_BUILDER_H_
+#define TENSORFLOW_FRAMEWORK_NODE_DEF_BUILDER_H_
+
+#include <functional>
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+
+class NodeDefBuilder;
+typedef std::function<Status(const OpDef&, int, const NodeDef&,
+ NodeDefBuilder*)> FakeInputFunctor;
+
+// This is a helper for creating a NodeDef. Automatically sets attrs
+// that can be inferred from the inputs, and uses default values
+// (where they exist) for unspecified attrs. Example usage:
+//
+// NodeDef node_def;
+// Status status = NodeDefBuilder(node_name, op_name)
+// .Input(...)
+// .Attr(...)
+// .Finalize(&node_def);
+// if (!status.ok()) return status;
+// // Use node_def here.
+class NodeDefBuilder {
+ public:
+ // To specify an output to be consumed by one of the Input() methods below.
+ struct NodeOut {
+ NodeOut(const string& n, int i, DataType dt)
+ : node(n), index(i), data_type(dt) {}
+ NodeOut() {} // uninitialized, call Reset() before use.
+ void Reset(const string& n, int i, DataType dt) {
+ node = n;
+ index = i;
+ data_type = dt;
+ }
+ string node;
+ int index;
+ DataType data_type;
+ };
+
+ // Specify the name and the Op (either via an OpDef or the name of
+ // the Op plus a registry) for the NodeDef. Other fields are
+ // specified by calling the methods below.
+ // REQUIRES: The OpDef must satisfy ValidateOpDef().
+ NodeDefBuilder(const string& name, const string& op_name,
+ const OpRegistryInterface* op_registry = OpRegistry::Global());
+ // REQUIRES: in addition, *op_def must outlive *this.
+ NodeDefBuilder(const string& name, const OpDef* op_def);
+
+ // You must call one Input() function per input_arg in the Op,
+ // *and in the same order as the input_args appear in the OpDef.*
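+  //
+  // A minimal sketch (the op "TwoInputOp" and all node names here are
+  // hypothetical); the point is only that the Input() calls mirror the
+  // order of the input_args in the OpDef:
+  //
+  //   NodeDefBuilder("n", "TwoInputOp")
+  //       .Input("dim", 0, DT_INT32)                        // input_arg #1
+  //       .Input({{"x", 0, DT_FLOAT}, {"y", 0, DT_FLOAT}})  // input_arg #2
+  //       .Finalize(&node_def);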
+
+ // For inputs that take a single tensor.
+ NodeDefBuilder& Input(const string& src_node, int src_index, DataType dt) {
+ const OpDef::ArgDef* arg = NextArgDef();
+ if (arg != nullptr) SingleInput(arg, src_node, src_index, dt);
+ return *this;
+ }
+ NodeDefBuilder& Input(const NodeOut& src) {
+ Input(src.node, src.index, src.data_type);
+ return *this;
+ }
+
+ // For inputs that take a list of tensors.
+ NodeDefBuilder& Input(gtl::ArraySlice<NodeOut> src_list) {
+ const OpDef::ArgDef* arg = NextArgDef();
+ if (arg != nullptr) ListInput(arg, src_list);
+ return *this;
+ }
+
+ // To create inputs in tests, see fake_input.h.
+ NodeDefBuilder& Input(FakeInputFunctor fake_input);
+
+ // Specify that this node must only run after src_node.
+ NodeDefBuilder& ControlInput(const string& src_node) {
+ control_inputs_.push_back(src_node);
+ return *this;
+ }
+
+ // Constrains what devices this node may be scheduled on.
+ NodeDefBuilder& Device(const string& device_spec) {
+ node_def_.set_device(device_spec);
+ return *this;
+ }
+
+ // Sets the attr, if not already set. If already set with a different
+ // value, an error will be returned from Finalize().
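+  //
+  // For example (a sketch using the "Polymorphic" op defined in
+  // node_def_builder_test.cc, whose type attr "T" is inferred from its
+  // input):
+  //
+  //   builder.Input("x", 0, DT_BOOL)   // infers T = DT_BOOL
+  //          .Attr("T", DT_BOOL)       // redundant but consistent: OK
+  //          .Attr("T", DT_STRING);    // conflict -> error from Finalize()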
+ template <class T>
+ NodeDefBuilder& Attr(const string& attr_name, T&& value);
+ // Note: overload needed to allow {...} expressions for value.
+ template <class T>
+ NodeDefBuilder& Attr(const string& attr_name,
+ std::initializer_list<T> value) {
+ Attr<std::initializer_list<T>>(attr_name, std::move(value));
+ return *this;
+ }
+
+ // Finish building the NodeDef, returning any errors or setting
+ // *node_def if none.
+ // WARNING: Not all problems are detected! The resulting NodeDef may
+  // not be valid! Call ValidateNodeDef() from node_def_util.h to be sure.
+ Status Finalize(NodeDef* node_def) const;
+
+ // Accessor for the OpDef set in the constructor.
+ const OpDef& op_def() const { return *op_def_; }
+
+ private:
+ // Called in the constructors.
+ void Initialize();
+
+ // Get the current ArgDef and advance to the next one. Returns nullptr
+ // if no more inputs are available.
+ const OpDef::ArgDef* NextArgDef();
+
+ // Returns true if there is still an input_arg available in *op_def_,
+  // otherwise adds to errors_ and returns false.
+ bool NextArgAvailable();
+
+ // These do the main work of the Input() methods.
+ void SingleInput(const OpDef::ArgDef* input_arg, const string& src_node,
+ int src_index, DataType dt);
+ void ListInput(const OpDef::ArgDef* input_arg,
+ gtl::ArraySlice<NodeOut> src_list);
+
+ // Add "src_node:src_index" to the list of inputs in the node_def_.
+ void AddInput(const string& src_node, int src_index);
+
+  // Generates an error if dt is not compatible with the expected type.
+ void VerifyInputType(const OpDef::ArgDef* input_arg, DataType expected,
+ DataType dt);
+
+ // If input_arg->is_ref() is true, generate an error if dt is not a ref.
+ void VerifyInputRef(const OpDef::ArgDef* input_arg, DataType dt);
+
+ // Makes dt a ref type if that is what the input_arg specifies.
+ DataType MaybeAddRef(const OpDef::ArgDef* input_arg, DataType dt) {
+ return input_arg->is_ref() ? MakeRefType(dt) : dt;
+ }
+
+ const OpDef* op_def_;
+ NodeDef node_def_;
+ int inputs_specified_;
+ std::vector<string> control_inputs_;
+ std::vector<string> errors_;
+};
+
+// IMPLEMENTATION -------------------------------------------------------------
+
+template <class T>
+NodeDefBuilder& NodeDefBuilder::Attr(const string& attr_name, T&& value) {
+ const AttrValue* found = AttrSlice(node_def_).Find(attr_name);
+ if (found == nullptr) {
+ AddNodeAttr(attr_name, std::forward<T>(value), &node_def_);
+ } else {
+ AttrValue attr_value;
+ SetAttrValue(std::forward<T>(value), &attr_value);
+ if (!AreAttrValuesEqual(*found, attr_value)) {
+ errors_.push_back(strings::StrCat(
+ "Inconsistent values for attr '", attr_name, "' ",
+ SummarizeAttrValue(*found), " vs. ", SummarizeAttrValue(attr_value)));
+ }
+ }
+ return *this;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_NODE_DEF_BUILDER_H_
diff --git a/tensorflow/core/framework/node_def_builder_test.cc b/tensorflow/core/framework/node_def_builder_test.cc
new file mode 100644
index 0000000000..6fd4a8d1ed
--- /dev/null
+++ b/tensorflow/core/framework/node_def_builder_test.cc
@@ -0,0 +1,1036 @@
+#include "tensorflow/core/framework/node_def_builder.h"
+
+#include <memory>
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_def_builder.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class NodeDefBuilderTest : public ::testing::Test {
+ protected:
+ // Specify an OpDef via an OpDefBuilder.
+ void Op(const OpDefBuilder& op_def_builder) {
+ EXPECT_OK(op_def_builder.Finalize(&op_def_));
+ }
+
+ // Resets builder_ with a new NodeDefBuilder using the Op from the last call
+ // to Op() above.
+ NodeDefBuilder& Builder() {
+ EXPECT_FALSE(op_def_.name().empty()) << "Must call Op() before Builder()";
+ builder_.reset(new NodeDefBuilder("n", &op_def_));
+ return *builder_;
+ }
+
+ // Calls Finalize() and verifies it returns success and the result matches
+ // expectations.
+ void ExpectSuccess(const NodeDefBuilder& builder,
+ DataTypeSlice expected_in_types,
+ DataTypeSlice expected_out_types, StringPiece proto) {
+ NodeDef node_def;
+ Status status = builder.Finalize(&node_def);
+ EXPECT_OK(status);
+ if (!status.ok()) return;
+ NodeDef expected;
+ protobuf::TextFormat::ParseFromString(strings::StrCat("name: 'n' ", proto),
+ &expected);
+ EXPECT_EQ(node_def.DebugString(), expected.DebugString());
+
+ DataTypeVector in_types, out_types;
+ status =
+ InOutTypesForNode(node_def, builder.op_def(), &in_types, &out_types);
+ EXPECT_OK(status);
+ if (!status.ok()) return;
+ EXPECT_EQ(DataTypeSliceString(expected_in_types),
+ DataTypeVectorString(in_types));
+ EXPECT_EQ(DataTypeSliceString(expected_out_types),
+ DataTypeVectorString(out_types));
+
+ status = ValidateNodeDef(node_def, op_def_);
+ EXPECT_OK(status);
+ }
+
+ // Calls Finalize() and verifies it returns an error.
+ // Each message must appear as a substring of the error.
+ void ExpectFailures(const NodeDefBuilder& builder,
+ const std::vector<string>& messages) {
+ NodeDef node_def;
+ Status status = builder.Finalize(&node_def);
+ EXPECT_FALSE(status.ok()) << SummarizeNodeDef(node_def);
+ if (status.ok()) return;
+ for (const string& message : messages) {
+ EXPECT_TRUE(StringPiece(status.error_message()).contains(message))
+ << status << ", " << message;
+ }
+ }
+
+ // Calls Finalize() and verifies it returns an error.
+ // Message must appear as a substring of the error.
+ void ExpectFailure(const NodeDefBuilder& builder, const string& message) {
+ ExpectFailures(builder, {message});
+ }
+
+ // Like ExpectFailure(), except that the error can come from
+ // ValidateNodeDef().
+ void ExpectInvalid(const NodeDefBuilder& builder, const string& message) {
+ NodeDef node_def;
+ Status status = builder.Finalize(&node_def);
+ if (status.ok()) {
+ status = ValidateNodeDef(node_def, op_def_);
+ }
+ EXPECT_FALSE(status.ok()) << SummarizeNodeDef(node_def);
+ if (status.ok()) return;
+ EXPECT_TRUE(StringPiece(status.error_message()).contains(message))
+ << "Actual error: " << status.error_message()
+ << "\nDoes not contain: " << message;
+ }
+
+ OpDef op_def_;
+ std::unique_ptr<NodeDefBuilder> builder_;
+};
+
+TEST_F(NodeDefBuilderTest, Simple) {
+ Op(OpDefBuilder("Simple").Input("a: int32").Output("out: float"));
+
+ ExpectSuccess(Builder().Input("x", 0, DT_INT32), {DT_INT32}, {DT_FLOAT},
+ R"proto( op: "Simple" input: "x" )proto");
+
+ // Port != 0
+ ExpectSuccess(Builder().Input("y", 2, DT_INT32), {DT_INT32}, {DT_FLOAT},
+ R"proto( op: "Simple" input: "y:2" )proto");
+
+ // FakeInput
+ ExpectSuccess(Builder().Input(FakeInput()), {DT_INT32}, {DT_FLOAT}, R"proto(
+ op: "Simple" input: "a" )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_INT32)), {DT_INT32}, {DT_FLOAT},
+ R"proto( op: "Simple" input: "a" )proto");
+
+ // Ref input
+ ExpectSuccess(Builder().Input(FakeInput(DT_INT32_REF)), {DT_INT32},
+ {DT_FLOAT}, R"proto( op: "Simple" input: "a" )proto");
+
+ // ControlInput
+ ExpectSuccess(
+ Builder().ControlInput("x").Input(FakeInput()).ControlInput("y"),
+ {DT_INT32}, {DT_FLOAT}, R"proto(
+ op: "Simple" input: ["a", "^x", "^y"] )proto");
+
+ // Device
+ ExpectSuccess(Builder().Input(FakeInput()).Device("ddd"), {DT_INT32},
+ {DT_FLOAT}, R"proto(
+ op: "Simple" input: "a" device: "ddd" )proto");
+
+ // Extra input
+ ExpectFailure(Builder().Input("x", 0, DT_INT32).Input("y", 0, DT_INT32),
+ "More Input() calls than the 1 input_args while building "
+ "NodeDef 'n' using Op<name=Simple; signature=a:int32 -> "
+ "out:float>");
+
+ // Missing input
+ ExpectFailure(Builder(), "0 inputs specified of 1 inputs in Op while");
+
+ { // Finalize() twice.
+ NodeDefBuilder& builder = Builder();
+ builder.Input(FakeInput()).Finalize(nullptr); // First call to Finalize()
+ // ExpectSuccess() also calls Finalize().
+ ExpectSuccess(builder, {DT_INT32}, {DT_FLOAT}, R"proto(
+ op: "Simple" input: "a" )proto");
+ }
+
+ { // Input() after Finalize()
+ NodeDefBuilder& builder = Builder();
+ // Calling Finalize() before enough inputs -> error.
+ ExpectFailure(builder, "0 inputs specified of 1 inputs in Op while");
+ builder.Input(FakeInput());
+ // Calling Finalize() with enough inputs -> success
+ ExpectSuccess(builder, {DT_INT32}, {DT_FLOAT}, R"proto(
+ op: "Simple" input: "a" )proto");
+ // Calling Finalize() with too many inputs -> error.
+ builder.Input(FakeInput(DT_INT32));
+ ExpectFailure(builder, "More Input() calls than the 1 input_args while");
+ }
+
+ // Wrong input type
+ ExpectFailure(Builder().Input("x", 0, DT_FLOAT),
+ "Input 'a' passed float expected int32 ");
+
+ ExpectFailure(Builder().Input("x", 0, DT_FLOAT_REF),
+ "Input 'a' passed float_ref expected int32 ");
+
+ // List input
+ ExpectFailure(Builder().Input(FakeInput(3, DT_FLOAT)),
+ "List provided to input 'a' when single Tensor expected while");
+
+ ExpectFailure(Builder().Input(FakeInput(3)),
+ "List provided to input 'a' when single Tensor expected while");
+
+ // Bad ControlInput
+ ExpectInvalid(Builder().Input(FakeInput()).ControlInput("z:2"),
+ "Control input '^z:2' must not have ':' in NodeDef:");
+
+ // Bad input name
+ ExpectFailure(Builder().Input("", 0, DT_INT32),
+ "Empty input node name while");
+
+ ExpectFailure(Builder().Input("^x", 0, DT_INT32),
+ "Non-control input starting with ^: ^x while");
+}
+
+TEST_F(NodeDefBuilderTest, OpDoesNotExist) {
+ NodeDefBuilder builder("n", "Op Does Not Exist");
+ builder.Input(FakeInput())
+ .Input(FakeInput(12))
+ .ControlInput("y")
+ .Attr("foo", 12)
+ .Device("device");
+ ExpectFailure(
+ builder,
+ "Op type not registered 'Op Does Not Exist' while building NodeDef 'n'");
+}
+
+TEST_F(NodeDefBuilderTest, Polymorphic) {
+ Op(OpDefBuilder("Polymorphic")
+ .Input("v: T")
+ .Output("out: T")
+ .Attr("T: type"));
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_INT32)), {DT_INT32}, {DT_INT32},
+ R"proto(
+ op: "Polymorphic" input: "a"
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_FLOAT)), {DT_FLOAT}, {DT_FLOAT},
+ R"proto(
+ op: "Polymorphic" input: "a"
+ attr { key: "T" value { type: DT_FLOAT } } )proto");
+
+ // Redundant Attr()
+ ExpectSuccess(Builder().Input(FakeInput(DT_BOOL)).Attr("T", DT_BOOL),
+ {DT_BOOL}, {DT_BOOL}, R"proto(
+ op: "Polymorphic" input: "a"
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+  // Conflicting Attr()
+ ExpectFailure(Builder().Input(FakeInput(DT_BOOL)).Attr("T", DT_STRING),
+ "Inconsistent values for attr 'T' DT_BOOL vs. DT_STRING while");
+
+ ExpectFailure(Builder().Attr("T", DT_STRING).Input(FakeInput(DT_BOOL)),
+ "Inconsistent values for attr 'T' DT_STRING vs. DT_BOOL while");
+
+ ExpectFailure(Builder().Attr("T", 12).Input(FakeInput(DT_BOOL)),
+ "Inconsistent values for attr 'T' 12 vs. DT_BOOL while");
+}
+
+TEST_F(NodeDefBuilderTest, PolymorphicOut) {
+ Op(OpDefBuilder("PolymorphicOut").Output("out: T").Attr("T: type"));
+
+ ExpectSuccess(Builder().Attr("T", DT_INT32), {}, {DT_INT32}, R"proto(
+ op: "PolymorphicOut"
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+
+ ExpectSuccess(Builder().Attr("T", DT_FLOAT), {}, {DT_FLOAT}, R"proto(
+ op: "PolymorphicOut"
+ attr { key: "T" value { type: DT_FLOAT } } )proto");
+
+ // Redundant attr
+ ExpectSuccess(Builder().Attr("T", DT_FLOAT).Attr("T", DT_FLOAT), {},
+ {DT_FLOAT}, R"proto(
+ op: "PolymorphicOut"
+ attr { key: "T" value { type: DT_FLOAT } } )proto");
+
+ // Conflicting attr
+ ExpectFailure(Builder().Attr("T", DT_BOOL).Attr("T", DT_FLOAT),
+ "Inconsistent values for attr 'T' DT_BOOL vs. DT_FLOAT while");
+
+ // Missing attr
+ ExpectInvalid(Builder(), "NodeDef missing attr 'T' from");
+
+ // Attr has the wrong type
+ ExpectInvalid(Builder().Attr("T", {DT_INT32, DT_BOOL}),
+ "AttrValue had value with type list(type) when type expected");
+
+ ExpectInvalid(Builder().Attr("T", 12),
+ "AttrValue had value with type int when type expected");
+}
+
+TEST_F(NodeDefBuilderTest, PolymorphicDefaultOut) {
+ Op(OpDefBuilder("PolymorphicDefaultOut")
+ .Output("out: T")
+ .Attr("T: type = DT_STRING"));
+
+ ExpectSuccess(Builder(), {}, {DT_STRING}, R"proto(
+ op: "PolymorphicDefaultOut"
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ ExpectSuccess(Builder().Attr("T", DT_BOOL), {}, {DT_BOOL}, R"proto(
+ op: "PolymorphicDefaultOut"
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, Binary) {
+ Op(OpDefBuilder("Binary").Input("a: T").Input("b: T").Output("out: T").Attr(
+ "T: type"));
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_INT32)).Input(FakeInput(DT_INT32)),
+ {DT_INT32, DT_INT32}, {DT_INT32}, R"proto(
+ op: "Binary" input: "a" input: "b"
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_STRING)).Input(FakeInput()),
+ {DT_STRING, DT_STRING}, {DT_STRING}, R"proto(
+ op: "Binary" input: "a" input: "b"
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ // Type mismatch
+ ExpectFailure(Builder().Input(FakeInput(DT_BOOL)).Input(FakeInput(DT_STRING)),
+ "Inconsistent values for attr 'T' DT_BOOL vs. DT_STRING while");
+}
+
+TEST_F(NodeDefBuilderTest, Restrict) {
+ Op(OpDefBuilder("Restrict")
+ .Input("a: T")
+ .Output("out: T")
+ .Attr("T: {string, bool}"));
+ ExpectSuccess(Builder().Input(FakeInput(DT_STRING)), {DT_STRING}, {DT_STRING},
+ R"proto(
+ op: "Restrict" input: "a"
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ ExpectInvalid(Builder().Input(FakeInput(DT_INT32)),
+ "Value for attr 'T' of int32 is not in the list of allowed "
+ "values: string, bool");
+}
+
+TEST_F(NodeDefBuilderTest, TypeList) {
+ Op(OpDefBuilder("TypeList").Input("a: T").Attr("T: list(type)"));
+
+ ExpectSuccess(Builder().Input(FakeInput({DT_STRING, DT_INT32})),
+ {DT_STRING, DT_INT32}, {}, R"proto(
+ op: "TypeList" input: ["a", "a:1"]
+ attr { key: "T" value { list { type: [DT_STRING, DT_INT32] } } }
+ )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(3, DT_BOOL)),
+ {DT_BOOL, DT_BOOL, DT_BOOL}, {}, R"proto(
+ op: "TypeList" input: ["a", "a:1", "a:2"]
+ attr { key: "T" value { list { type: [DT_BOOL, DT_BOOL, DT_BOOL] } } }
+ )proto");
+
+ ExpectInvalid(Builder().Input(FakeInput(0)),
+ "Length for attr 'T' of 0 must be at least minimum 1");
+
+ ExpectInvalid(Builder().Input(FakeInput({})),
+ "Length for attr 'T' of 0 must be at least minimum 1");
+
+ ExpectInvalid(Builder().Input(FakeInput(DT_BOOL)),
+ "Single tensor passed to 'a', expected list while");
+
+ ExpectFailures(Builder().Input(FakeInput()),
+ {"2 errors while building NodeDef",
+ "Could not infer list of types for input 'a': "
+ "No attr named 'T' in NodeDef:",
+ "0 inputs specified of 1 inputs in Op"});
+}
+
+TEST_F(NodeDefBuilderTest, TypeListNoMin) {
+ Op(OpDefBuilder("TypeListNoMin").Input("a: T").Attr("T: list(type) >= 0"));
+
+ ExpectSuccess(Builder().Input(FakeInput(0)), {}, {}, R"proto(
+ op: "TypeListNoMin" attr { key: "T" value { list { } } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(DataTypeVector())), {}, {}, R"proto(
+ op: "TypeListNoMin" attr { key: "T" value { list { } } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput({})), {}, {}, R"proto(
+ op: "TypeListNoMin" attr { key: "T" value { list { } } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput({DT_BOOL})), {DT_BOOL}, {}, R"proto(
+ op: "TypeListNoMin" input: "a"
+ attr { key: "T" value { list { type: DT_BOOL } } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, TypeListTwice) {
+ Op(OpDefBuilder("TypeListTwice")
+ .Input("a: T")
+ .Input("b: T")
+ .Attr("T: list(type) >= 0"));
+
+ ExpectSuccess(Builder()
+ .Input(FakeInput({DT_INT32, DT_BOOL}))
+ .Input(FakeInput({DT_INT32, DT_BOOL})),
+ {DT_INT32, DT_BOOL, DT_INT32, DT_BOOL}, {}, R"proto(
+ op: "TypeListTwice" input: ["a", "a:1", "b", "b:1"]
+ attr { key: "T" value { list { type: [DT_INT32, DT_BOOL] } } } )proto");
+
+ ExpectSuccess(
+ Builder().Input(FakeInput({DT_INT32, DT_BOOL})).Input(FakeInput()),
+ {DT_INT32, DT_BOOL, DT_INT32, DT_BOOL}, {}, R"proto(
+ op: "TypeListTwice" input: ["a", "a:1", "b", "b:1"]
+ attr { key: "T" value { list { type: [DT_INT32, DT_BOOL] } } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(0)).Input(FakeInput(0)), {}, {},
+ R"proto(
+ op: "TypeListTwice" attr { key: "T" value { list { } } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(0)).Input(FakeInput()), {}, {},
+ R"proto(
+ op: "TypeListTwice" attr { key: "T" value { list { } } } )proto");
+
+ ExpectFailure(Builder()
+ .Input(FakeInput({DT_INT32, DT_BOOL}))
+ .Input(FakeInput({DT_INT32, DT_STRING})),
+ "Inconsistent values for attr 'T' [DT_INT32, DT_BOOL] vs. "
+ "[DT_INT32, DT_STRING] while");
+}
+
+TEST_F(NodeDefBuilderTest, OutTypeList) {
+ Op(OpDefBuilder("OutTypeList").Output("out: T").Attr("T: list(type) >= 0"));
+
+ ExpectSuccess(Builder().Attr("T", {DT_FLOAT}), {}, {DT_FLOAT}, R"proto(
+ op: "OutTypeList"
+ attr { key: "T" value { list { type: DT_FLOAT } } } )proto");
+
+ ExpectSuccess(Builder().Attr("T", {DT_STRING, DT_BOOL}), {},
+ {DT_STRING, DT_BOOL}, R"proto(
+ op: "OutTypeList"
+ attr { key: "T" value { list { type: [DT_STRING, DT_BOOL] } } } )proto");
+
+ ExpectSuccess(Builder().Attr("T", DataTypeVector()), {}, {}, R"proto(
+ op: "OutTypeList"
+ attr { key: "T" value { list { } } } )proto");
+
+ ExpectInvalid(Builder().Attr("T", DT_FLOAT),
+ "AttrValue had value with type type when list(type) expected");
+}
+
+TEST_F(NodeDefBuilderTest, TypeListRestrict) {
+ Op(OpDefBuilder("TypeListRestrict")
+ .Input("a: T")
+ .Attr("T: list({string, bool}) >= 0"));
+
+ ExpectSuccess(Builder().Input(FakeInput({DT_STRING, DT_BOOL})),
+ {DT_STRING, DT_BOOL}, {}, R"proto(
+ op: "TypeListRestrict" input: ["a", "a:1"]
+ attr { key: "T" value { list { type: [DT_STRING, DT_BOOL] } } } )proto");
+
+ ExpectInvalid(Builder().Input(FakeInput({DT_STRING, DT_INT32})),
+ "Value for attr 'T' of int32 is not in the list of allowed "
+ "values: string, bool");
+}
+
+TEST_F(NodeDefBuilderTest, OutTypeListRestrict) {
+ Op(OpDefBuilder("OutTypeListRestrict")
+ .Output("out: t")
+ .Attr("t: list({string, bool}) >= 0"));
+
+ ExpectSuccess(Builder().Attr("t", {DT_BOOL, DT_STRING}), {},
+ {DT_BOOL, DT_STRING}, R"proto(
+ op: "OutTypeListRestrict"
+ attr { key: "t" value { list { type: [DT_BOOL, DT_STRING] } } } )proto");
+
+ ExpectInvalid(Builder().Attr("t", {DT_STRING, DT_INT32}),
+ "Value for attr 't' of int32 is not in the list of allowed "
+ "values: string, bool");
+}
+
+TEST_F(NodeDefBuilderTest, Attr) {
+ Op(OpDefBuilder("Attr").Attr("a: int"));
+
+ ExpectSuccess(Builder().Attr("a", 12), {}, {}, R"proto(
+ op: "Attr" attr { key: "a" value { i: 12 } } )proto");
+
+ // Attr has wrong type
+ ExpectInvalid(Builder().Attr("a", "bad"),
+ "AttrValue had value with type string when int expected");
+
+ ExpectInvalid(Builder().Attr("a", {12}),
+ "AttrValue had value with type list(int) when int expected");
+
+ // Missing attr
+ ExpectInvalid(Builder(), "NodeDef missing attr 'a' from Op<");
+
+ // Wrong attr
+ ExpectInvalid(Builder().Attr("b", 12),
+ "NodeDef mentions attr 'b' not in Op<");
+
+ // Extra attr
+ ExpectInvalid(Builder().Attr("a", 12).Attr("extra", 12),
+ "NodeDef mentions attr 'extra' not in Op<");
+}
+
+TEST_F(NodeDefBuilderTest, AttrFloat) {
+ Op(OpDefBuilder("AttrFloat").Attr("a: float"));
+
+ ExpectSuccess(Builder().Attr("a", 1.2f /* float */), {}, {}, R"proto(
+ op: "AttrFloat" attr { key: "a" value { f: 1.2 } }
+ )proto");
+
+ ExpectSuccess(Builder().Attr("a", 1.2 /* double */), {}, {}, R"proto(
+ op: "AttrFloat" attr { key: "a" value { f: 1.2 } }
+ )proto");
+
+ // Won't automatically cast int to float
+ ExpectInvalid(Builder().Attr("a", 12),
+ "AttrValue had value with type int when float expected");
+}
+
+TEST_F(NodeDefBuilderTest, AttrBoolList) {
+ Op(OpDefBuilder("AttrBoolList").Attr("a: list(bool)"));
+
+ ExpectSuccess(Builder().Attr("a", {true, false, true}), {}, {}, R"proto(
+ op: "AttrBoolList"
+ attr { key: "a" value { list { b: [true, false, true] } } }
+ )proto");
+
+ ExpectSuccess(Builder().Attr("a", std::vector<bool>()), {}, {}, R"proto(
+ op: "AttrBoolList" attr { key: "a" value { list { } } }
+ )proto");
+
+ // Won't cast int -> bool.
+ ExpectInvalid(Builder().Attr("a", {0}),
+ "AttrValue had value with type list(int) when list(bool) "
+ "expected");
+}
+
+TEST_F(NodeDefBuilderTest, AttrMin) {
+ Op(OpDefBuilder("AttrMin").Attr("a: int >= 5"));
+
+ ExpectSuccess(Builder().Attr("a", 12), {}, {}, R"proto(
+ op: "AttrMin" attr { key: "a" value { i: 12 } } )proto");
+
+ ExpectInvalid(Builder().Attr("a", 2),
+ "Value for attr 'a' of 2 must be at least minimum 5");
+}
+
+TEST_F(NodeDefBuilderTest, AttrListMin) {
+ Op(OpDefBuilder("AttrListMin").Attr("a: list(int) >= 2"));
+
+ ExpectSuccess(Builder().Attr("a", {1, 2}), {}, {}, R"proto(
+ op: "AttrListMin"
+ attr { key: "a" value { list { i: [1, 2] } } } )proto");
+
+ ExpectInvalid(Builder().Attr("a", {17}),
+ "Length for attr 'a' of 1 must be at least minimum 2");
+}
+
+TEST_F(NodeDefBuilderTest, AttrEnum) {
+ Op(OpDefBuilder("AttrEnum").Attr("a: {'apples', 'oranges'}"));
+
+ ExpectSuccess(Builder().Attr("a", "oranges"), {}, {}, R"proto(
+ op: "AttrEnum"
+ attr { key: "a" value { s: "oranges" } } )proto");
+
+ ExpectInvalid(
+ Builder().Attr("a", "invalid"),
+ "Value for attr 'a' of \"invalid\" is not in the list of allowed values: "
+ "\"apples\", \"oranges\"");
+}
+
+TEST_F(NodeDefBuilderTest, AttrEnumList) {
+ Op(OpDefBuilder("AttrEnumList").Attr("a: list({'apples', 'oranges'})"));
+
+ ExpectSuccess(Builder().Attr("a", {"oranges", "apples"}), {}, {}, R"proto(
+ op: "AttrEnumList"
+ attr { key: "a" value { list { s: ["oranges", "apples"] } } } )proto");
+
+ ExpectInvalid(
+ Builder().Attr("a", {"apples", "invalid", "oranges"}),
+ "Value for attr 'a' of \"invalid\" is not in the list of allowed values: "
+ "\"apples\", \"oranges\"");
+}
+
+TEST_F(NodeDefBuilderTest, AttrShape) {
+ Op(OpDefBuilder("AttrShape").Attr("a: shape"));
+
+ ExpectSuccess(Builder().Attr("a", TensorShape({5})), {}, {}, R"proto(
+ op: "AttrShape"
+ attr { key: "a" value { shape { dim { size: 5 } } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", TensorShape({4, 3, 2})), {}, {}, R"proto(
+ op: "AttrShape"
+ attr { key: "a" value { shape {
+ dim { size: 4 } dim { size: 3 } dim { size: 2 } } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", TensorShape({3, 2})), {}, {},
+ R"proto(
+ op: "AttrShape"
+ attr { key: "a" value { shape {
+ dim { size: 3 } dim { size: 2 } } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", TensorShape()), {}, {}, R"proto(
+ op: "AttrShape"
+ attr { key: "a" value { shape { } } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, AttrDefault) {
+ Op(OpDefBuilder("AttrDefault").Attr("a: string = 'banana'"));
+
+ ExpectSuccess(Builder(), {}, {}, R"proto(
+ op: "AttrDefault"
+ attr { key: "a" value { s: "banana" } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", "kiwi"), {}, {}, R"proto(
+ op: "AttrDefault"
+ attr { key: "a" value { s: "kiwi" } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, AttrManyDefault) {
+ Op(OpDefBuilder("AttrManyDefault")
+ .Attr("a: string = 'banana'")
+ .Attr("b: string = 'kiwi'"));
+
+ ExpectSuccess(Builder(), {}, {}, R"proto(
+ op: "AttrManyDefault"
+ attr { key: "a" value { s: "banana" } }
+ attr { key: "b" value { s: "kiwi" } } )proto");
+
+ Op(OpDefBuilder("AttrManyDefaultWithMandatory")
+ .Attr("a: string = 'banana'")
+ .Attr("b: string = 'kiwi'")
+ .Attr("c: string"));
+
+ ExpectSuccess(Builder().Attr("c", "strawberry"), {}, {}, R"proto(
+ op: "AttrManyDefaultWithMandatory"
+ attr { key: "c" value { s: "strawberry" } }
+ attr { key: "a" value { s: "banana" } }
+ attr { key: "b" value { s: "kiwi" } } )proto");
+
+ Op(OpDefBuilder("AttrManyDefaultAndInferred")
+ .Input("input: T")
+ .Attr("T: {float, double}")
+ .Attr("a: string")
+ .Attr("b: list(string) >= 1")
+ .Attr("c: bool = true")
+ .Attr("d: float = 0.3")
+ .Attr("e: string")
+ .Attr("f: float = 0.25"));
+
+ ExpectSuccess(Builder()
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("a", "foo")
+ .Attr("e", "foo")
+ .Attr("b", std::vector<string>({"bar", "baz"}))
+ .Attr("f", 1.0f),
+ {DT_FLOAT}, {}, R"proto(
+ op: "AttrManyDefaultAndInferred"
+ input: "a"
+ attr { key: "T" value { type: DT_FLOAT } }
+ attr { key: "a" value { s: "foo" } }
+ attr { key: "e" value { s: "foo" } }
+ attr { key: "b" value { list { s: "bar" s: "baz" } } }
+ attr { key: "f" value { f: 1.0 } }
+ attr { key: "c" value { b: true } }
+ attr { key: "d" value { f: 0.3 } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, AttrListDefault) {
+ Op(OpDefBuilder("AttrListDefault").Attr("a: list(int) = [5, 15]"));
+
+ ExpectSuccess(Builder(), {}, {}, R"proto(
+ op: "AttrListDefault"
+ attr { key: "a" value { list { i: [5, 15] } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", {3}), {}, {}, R"proto(
+ op: "AttrListDefault"
+ attr { key: "a" value { list { i: 3 } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", std::vector<int>()), {}, {}, R"proto(
+ op: "AttrListDefault"
+ attr { key: "a" value { list { } } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, AttrEmptyListDefault) {
+ Op(OpDefBuilder("AttrEmptyListDefault").Attr("a: list(int) = []"));
+
+ ExpectSuccess(Builder(), {}, {}, R"proto(
+ op: "AttrEmptyListDefault"
+ attr { key: "a" value { list { } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", {3}), {}, {}, R"proto(
+ op: "AttrEmptyListDefault"
+ attr { key: "a" value { list { i: 3 } } } )proto");
+
+ ExpectSuccess(Builder().Attr("a", std::vector<int>()), {}, {}, R"proto(
+ op: "AttrEmptyListDefault"
+ attr { key: "a" value { list { } } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, NIntsIn) {
+ Op(OpDefBuilder("NIntsIn").Input("a: N*int32").Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Input(FakeInput(2)), {DT_INT32, DT_INT32}, {},
+ R"proto(
+ op: "NIntsIn" input: ["a", "a:1"]
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(5, DT_INT32)),
+ {DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32}, {}, R"proto(
+ op: "NIntsIn"
+ input: ["a", "a:1", "a:2", "a:3", "a:4"]
+ attr { key: "N" value { i: 5 } } )proto");
+
+ ExpectFailures(Builder().Input(FakeInput(2, DT_STRING)),
+ {"2 errors while building NodeDef",
+ "Input 'a' passed string expected int32"});
+
+ ExpectInvalid(Builder().Input(FakeInput(1)),
+ "Value for attr 'N' of 1 must be at least minimum 2");
+
+ ExpectFailures(
+ Builder().Input(FakeInput(DT_INT32)),
+ {"2 errors while building NodeDef",
+ "Could not infer length of input 'a': No attr named 'N' in NodeDef:",
+ "0 inputs specified of 1 inputs in Op"});
+
+ ExpectFailure(Builder().Input({{"in", 0, DT_INT32}, {"in", 1, DT_STRING}}),
+ "Input 'a' passed string expected int32 while");
+
+ ExpectFailures(
+ Builder().Input(FakeInput()),
+ {"2 errors while building NodeDef",
+ "Could not infer length of input 'a': No attr named 'N' in NodeDef:",
+ "0 inputs specified of 1 inputs in Op"});
+}
+
+TEST_F(NodeDefBuilderTest, NPolymorphicIn) {
+ Op(OpDefBuilder("NPolymorphicIn")
+ .Input("a: N*T")
+ .Attr("T: type")
+ .Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Input(FakeInput(2, DT_INT32)), {DT_INT32, DT_INT32},
+ {}, R"proto(
+ op: "NPolymorphicIn" input: ["a", "a:1"]
+ attr { key: "N" value { i: 2 } }
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(3, DT_STRING)),
+ {DT_STRING, DT_STRING, DT_STRING}, {}, R"proto(
+ op: "NPolymorphicIn"
+ input: ["a", "a:1", "a:2"]
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ ExpectFailures(
+ Builder().Input(FakeInput(2)),
+ {"2 errors while building NodeDef",
+ "Could not infer type for input 'a': No attr named 'T' in NodeDef:",
+ "0 inputs specified of 1 inputs in Op"});
+
+ ExpectFailure(Builder().Input(FakeInput({DT_INT32, DT_STRING})),
+ "Input 'a' passed string expected int32 while");
+
+ ExpectFailure(Builder().Input({{"in", 0, DT_INT32}, {"in", 1, DT_STRING}}),
+ "Input 'a' passed string expected int32 while");
+
+ ExpectInvalid(Builder().Input(FakeInput(1, DT_INT32)),
+ "Value for attr 'N' of 1 must be at least minimum 2");
+
+ ExpectFailure(Builder().Input("in", 0, DT_INT32),
+ "Single tensor passed to 'a', expected list while");
+}
+
+TEST_F(NodeDefBuilderTest, NPolymorphicRestrictIn) {
+ Op(OpDefBuilder("NPolymorphicRestrictIn")
+ .Input("a: N*T")
+ .Attr("T: {string, bool}")
+ .Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Input(FakeInput(2, DT_BOOL)), {DT_BOOL, DT_BOOL}, {},
+ R"proto(
+ op: "NPolymorphicRestrictIn" input: ["a", "a:1"]
+ attr { key: "N" value { i: 2 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(3, DT_STRING)),
+ {DT_STRING, DT_STRING, DT_STRING}, {}, R"proto(
+ op: "NPolymorphicRestrictIn"
+ input: ["a", "a:1", "a:2"]
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ ExpectInvalid(Builder().Input(FakeInput(2, DT_INT32)),
+ "Value for attr 'T' of int32 is not in the list of allowed "
+ "values: string, bool");
+}
+
+TEST_F(NodeDefBuilderTest, NInTwice) {
+ Op(OpDefBuilder("NInTwice")
+ .Input("a: N*int32")
+ .Input("b: N*string")
+ .Attr("N: int >= 0"));
+
+ ExpectSuccess(Builder().Input(FakeInput(2)).Input(FakeInput(2)),
+ {DT_INT32, DT_INT32, DT_STRING, DT_STRING}, {}, R"proto(
+ op: "NInTwice"
+ input: ["a", "a:1", "b", "b:1"]
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(0)).Input(FakeInput()), {}, {},
+ R"proto(
+ op: "NInTwice" attr { key: "N" value { i: 0 } } )proto");
+
+ ExpectFailure(Builder().Input(FakeInput(3)).Input(FakeInput(1)),
+ "Inconsistent values for attr 'N' 3 vs. 1 while");
+}
+
+TEST_F(NodeDefBuilderTest, NInPolymorphicTwice) {
+ Op(OpDefBuilder("NInPolymorphicTwice")
+ .Input("a: N*T")
+ .Input("b: N*T")
+ .Attr("T: type")
+ .Attr("N: int >= 0"));
+
+ ExpectSuccess(Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput()),
+ {DT_INT32, DT_INT32, DT_INT32, DT_INT32}, {}, R"proto(
+ op: "NInPolymorphicTwice"
+ input: ["a", "a:1", "b", "b:1"]
+ attr { key: "N" value { i: 2 } }
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+
+ ExpectFailure(
+ Builder().Input(FakeInput(3, DT_INT32)).Input(FakeInput(1, DT_INT32)),
+ "Inconsistent values for attr 'N' 3 vs. 1 while");
+
+ ExpectFailure(Builder().Input(FakeInput(3, DT_INT32)).Input(FakeInput(1)),
+ "Inconsistent values for attr 'N' 3 vs. 1 while");
+
+ ExpectFailure(
+ Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput(2, DT_STRING)),
+ "Inconsistent values for attr 'T' DT_INT32 vs. DT_STRING while");
+
+ ExpectFailure(
+ Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput(DT_STRING)),
+ "Inconsistent values for attr 'T' DT_INT32 vs. DT_STRING while");
+}
+
+TEST_F(NodeDefBuilderTest, NInTwoTypeVariables) {
+ Op(OpDefBuilder("NInTwoTypeVariables")
+ .Input("a: N*S")
+ .Input("b: N*T")
+ .Attr("S: type")
+ .Attr("T: type")
+ .Attr("N: int >= 0"));
+
+ ExpectSuccess(
+ Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput(2, DT_BOOL)),
+ {DT_INT32, DT_INT32, DT_BOOL, DT_BOOL}, {}, R"proto(
+ op: "NInTwoTypeVariables"
+ input: ["a", "a:1", "b", "b:1"]
+ attr { key: "N" value { i: 2 } }
+ attr { key: "S" value { type: DT_INT32 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectSuccess(
+ Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput(DT_BOOL)),
+ {DT_INT32, DT_INT32, DT_BOOL, DT_BOOL}, {}, R"proto(
+ op: "NInTwoTypeVariables"
+ input: ["a", "a:1", "b", "b:1"]
+ attr { key: "N" value { i: 2 } }
+ attr { key: "S" value { type: DT_INT32 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectFailure(
+ Builder().Input(FakeInput(3, DT_INT32)).Input(FakeInput(1, DT_STRING)),
+ "Inconsistent values for attr 'N' 3 vs. 1 while");
+}
+
+TEST_F(NodeDefBuilderTest, InPolymorphicTwice) {
+ Op(OpDefBuilder("InPolymorphicTwice")
+ .Input("a: N*T")
+ .Input("b: M*T")
+ .Attr("T: type")
+ .Attr("N: int >= 0")
+ .Attr("M: int >= 0"));
+
+ ExpectSuccess(
+ Builder().Input(FakeInput(1, DT_INT32)).Input(FakeInput(3, DT_INT32)),
+ {DT_INT32, DT_INT32, DT_INT32, DT_INT32}, {}, R"proto(
+ op: "InPolymorphicTwice"
+ input: ["a", "b", "b:1", "b:2"]
+ attr { key: "N" value { i: 1 } }
+ attr { key: "T" value { type: DT_INT32 } }
+ attr { key: "M" value { i: 3 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(1, DT_BOOL)).Input(FakeInput(0)),
+ {DT_BOOL}, {}, R"proto(
+ op: "InPolymorphicTwice" input: "a"
+ attr { key: "N" value { i: 1 } }
+ attr { key: "T" value { type: DT_BOOL } }
+ attr { key: "M" value { i: 0 } } )proto");
+
+ ExpectSuccess(Builder().Input(FakeInput(0)).Input(FakeInput(1, DT_BOOL)),
+ {DT_BOOL}, {}, R"proto(
+ op: "InPolymorphicTwice" input: "b"
+ attr { key: "N" value { i: 0 } }
+ attr { key: "M" value { i: 1 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectFailure(
+ Builder().Input(FakeInput(2, DT_INT32)).Input(FakeInput(2, DT_STRING)),
+ "Inconsistent values for attr 'T' DT_INT32 vs. DT_STRING while");
+}
+
+TEST_F(NodeDefBuilderTest, NIntsOut) {
+ Op(OpDefBuilder("NIntsOut").Output("a: N*int32").Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Attr("N", 2), {}, {DT_INT32, DT_INT32}, R"proto(
+ op: "NIntsOut"
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Attr("N", 3), {}, {DT_INT32, DT_INT32, DT_INT32},
+ R"proto(
+ op: "NIntsOut"
+ attr { key: "N" value { i: 3 } } )proto");
+
+ ExpectInvalid(Builder().Attr("N", 1),
+ "Value for attr 'N' of 1 must be at least minimum 2");
+
+ ExpectInvalid(Builder().Attr("N", {3}),
+ "AttrValue had value with type list(int) when int expected");
+
+ ExpectInvalid(Builder(), "NodeDef missing attr 'N' from");
+}
+
+TEST_F(NodeDefBuilderTest, NIntsOutDefault) {
+ Op(OpDefBuilder("NIntsOutDefault")
+ .Output("a: N*int32")
+ .Attr("N: int >= 2 = 3"));
+
+ ExpectSuccess(Builder(), {}, {DT_INT32, DT_INT32, DT_INT32}, R"proto(
+ op: "NIntsOutDefault"
+ attr { key: "N" value { i: 3 } } )proto");
+
+ ExpectSuccess(Builder().Attr("N", 2), {}, {DT_INT32, DT_INT32}, R"proto(
+ op: "NIntsOutDefault"
+ attr { key: "N" value { i: 2 } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, NPolymorphicOut) {
+ Op(OpDefBuilder("NPolymorphicOut")
+ .Output("a: N*T")
+ .Attr("T: type")
+ .Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Attr("T", DT_INT32).Attr("N", 2), {},
+ {DT_INT32, DT_INT32}, R"proto(
+ op: "NPolymorphicOut"
+ attr { key: "T" value { type: DT_INT32 } }
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Attr("N", 3).Attr("T", DT_STRING), {},
+ {DT_STRING, DT_STRING, DT_STRING}, R"proto(
+ op: "NPolymorphicOut"
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_STRING } } )proto");
+
+ ExpectInvalid(Builder().Attr("N", 1).Attr("T", DT_STRING),
+ "Value for attr 'N' of 1 must be at least minimum 2");
+
+ ExpectInvalid(Builder().Attr("N", 3).Attr("T", {DT_STRING}),
+ "AttrValue had value with type list(type) when type expected");
+}
+
+TEST_F(NodeDefBuilderTest, NPolymorphicOutDefault) {
+ Op(OpDefBuilder("NPolymorphicOutDefault")
+ .Output("a: N*T")
+ .Attr("T: type = DT_BOOL")
+ .Attr("N: int >= 2 = 2"));
+
+ ExpectSuccess(Builder(), {}, {DT_BOOL, DT_BOOL}, R"proto(
+ op: "NPolymorphicOutDefault"
+ attr { key: "T" value { type: DT_BOOL } }
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Attr("N", 3), {}, {DT_BOOL, DT_BOOL, DT_BOOL},
+ R"proto(
+ op: "NPolymorphicOutDefault"
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectSuccess(Builder().Attr("T", DT_INT32), {}, {DT_INT32, DT_INT32},
+ R"proto(
+ op: "NPolymorphicOutDefault"
+ attr { key: "T" value { type: DT_INT32 } }
+ attr { key: "N" value { i: 2 } } )proto");
+
+ ExpectSuccess(Builder().Attr("N", 3).Attr("T", DT_INT32), {},
+ {DT_INT32, DT_INT32, DT_INT32}, R"proto(
+ op: "NPolymorphicOutDefault"
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_INT32 } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, NPolymorphicRestrictOut) {
+ Op(OpDefBuilder("NPolymorphicRestrictOut")
+ .Output("a: N*T")
+ .Attr("T: {string, bool}")
+ .Attr("N: int >= 2"));
+
+ ExpectSuccess(Builder().Attr("N", 3).Attr("T", DT_BOOL), {},
+ {DT_BOOL, DT_BOOL, DT_BOOL}, R"proto(
+ op: "NPolymorphicRestrictOut"
+ attr { key: "N" value { i: 3 } }
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectInvalid(Builder().Attr("N", 3).Attr("T", DT_INT32),
+ "Value for attr 'T' of int32 is not in the list of allowed "
+ "values: string, bool");
+}
+
+TEST_F(NodeDefBuilderTest, RefIn) {
+ Op(OpDefBuilder("RefIn").Input("a: Ref(int32)"));
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_INT32_REF)), {DT_INT32_REF}, {},
+ R"proto(
+ op: "RefIn" input: "a" )proto");
+
+ ExpectFailure(Builder().Input(FakeInput(DT_BOOL_REF)),
+ "Input 'a' passed bool_ref expected int32_ref while");
+
+ ExpectFailure(Builder().Input(FakeInput(DT_INT32)),
+ "Input 'a' passed int32 expected int32_ref while");
+}
+
+TEST_F(NodeDefBuilderTest, PolymorphicRefIn) {
+ Op(OpDefBuilder("PolymorphicRefIn").Input("a: Ref(T)").Attr("T: type"));
+
+ ExpectSuccess(Builder().Input(FakeInput(DT_BOOL_REF)), {DT_BOOL_REF}, {},
+ R"proto(
+ op: "PolymorphicRefIn" input: "a"
+ attr { key: "T" value { type: DT_BOOL } } )proto");
+
+ ExpectFailure(Builder().Input(FakeInput(DT_BOOL)),
+ "Input 'a' passed bool expected ref type while");
+}
+
+TEST_F(NodeDefBuilderTest, RefOut) {
+ Op(OpDefBuilder("RefOut").Output("a: Ref(string)"));
+
+ ExpectSuccess(Builder(), {}, {DT_STRING_REF}, R"proto(
+ op: "RefOut" )proto");
+}
+
+TEST_F(NodeDefBuilderTest, PolymorphicRefOut) {
+ Op(OpDefBuilder("PolymorphicRefOut").Output("a: Ref(t)").Attr("t: type"));
+
+ ExpectSuccess(Builder().Attr("t", DT_BOOL), {}, {DT_BOOL_REF}, R"proto(
+ op: "PolymorphicRefOut"
+ attr { key: "t" value { type: DT_BOOL } } )proto");
+}
+
+TEST_F(NodeDefBuilderTest, SpecifyDevice) {
+ Op(OpDefBuilder("SpecifyDevice"));
+
+ ExpectSuccess(Builder().Device("ADevice"), {}, {}, R"proto(
+ op: "SpecifyDevice" device: "ADevice" )proto");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/node_def_util.cc b/tensorflow/core/framework/node_def_util.cc
new file mode 100644
index 0000000000..aefd416187
--- /dev/null
+++ b/tensorflow/core/framework/node_def_util.cc
@@ -0,0 +1,414 @@
+#include "tensorflow/core/framework/node_def_util.h"
+
+#include <algorithm>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+
+string SummarizeNodeDef(const NodeDef& node_def) {
+ string ret = strings::StrCat(node_def.name(), " = ", node_def.op(), "[");
+
+ // We sort the attrs so the output is deterministic.
+ std::vector<string> attr_names;
+ attr_names.reserve(node_def.attr().size());
+ for (const auto& attr : node_def.attr()) {
+ attr_names.push_back(attr.first);
+ }
+ std::sort(attr_names.begin(), attr_names.end());
+ bool first = true;
+ for (const string& attr_name : attr_names) {
+ if (!first) strings::StrAppend(&ret, ", ");
+ first = false;
+ auto iter = node_def.attr().find(attr_name);
+ strings::StrAppend(&ret, attr_name, "=", SummarizeAttrValue(iter->second));
+ }
+
+ // Consider the device to be a final attr with name "_device".
+ if (!node_def.device().empty()) {
+ if (!first) strings::StrAppend(&ret, ", ");
+ first = false;
+ strings::StrAppend(&ret, "_device=\"", node_def.device(), "\"");
+ }
+ strings::StrAppend(&ret, "](");
+
+ // Output inputs, including control inputs, verbatim.
+ first = true;
+ for (const string& input : node_def.input()) {
+ if (!first) strings::StrAppend(&ret, ", ");
+ first = false;
+ strings::StrAppend(&ret, input);
+ }
+ strings::StrAppend(&ret, ")");
+ return ret;
+}
+
+const AttrValue* AttrSlice::Find(const string& attr_name) const {
+ auto iter = attrs_->find(attr_name);
+ if (iter == attrs_->end()) return nullptr;
+ return &iter->second;
+}
+
+Status AttrSlice::Find(const string& attr_name,
+ const AttrValue** attr_value) const {
+ *attr_value = Find(attr_name);
+ if (*attr_value != nullptr) {
+ return Status::OK();
+ }
+ Status s = errors::NotFound("No attr named '", attr_name, "' in NodeDef:");
+ if (ndef_) {
+ s = AttachDef(s, *ndef_);
+ }
+ return s;
+}
+
+// The trailing ... lets the caller inject extra value-validation code. Pass
+// just ; if no additional validation code is needed.
+#define DEFINE_GET_ATTR(TYPE, FIELD, ATTR_TYPE, APPEND_OP, CAST, ...) \
+ Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name, \
+ TYPE* value) { \
+ const AttrValue* attr_value; \
+ TF_RETURN_IF_ERROR(attrs.Find(attr_name, &attr_value)); \
+ TF_RETURN_IF_ERROR(AttrValueHasType(*attr_value, ATTR_TYPE)); \
+ const auto& v = attr_value->FIELD(); \
+ __VA_ARGS__; \
+ *value = CAST; \
+ return Status::OK(); \
+ } \
+ Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name, \
+ std::vector<TYPE>* value) { \
+ const AttrValue* attr_value; \
+ TF_RETURN_IF_ERROR(attrs.Find(attr_name, &attr_value)); \
+ TF_RETURN_IF_ERROR(AttrValueHasType(*attr_value, "list(" ATTR_TYPE ")")); \
+ for (const auto& v : attr_value->list().FIELD()) { \
+ __VA_ARGS__; \
+ value->APPEND_OP(CAST); \
+ } \
+ return Status::OK(); \
+ }
+
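+// Each DEFINE_GET_ATTR instantiation below generates two GetNodeAttr
+// overloads for the given C++ TYPE: one that reads a scalar attr of
+// ATTR_TYPE, and one that reads a "list(ATTR_TYPE)" attr into a
+// std::vector<TYPE>.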
+DEFINE_GET_ATTR(string, s, "string", emplace_back, v, ;)
+DEFINE_GET_ATTR(int64, i, "int", emplace_back, v, ;)
+DEFINE_GET_ATTR(int32, i, "int", emplace_back, static_cast<int32>(v),
+ if (static_cast<int64>(static_cast<int32>(v)) != v) {
+ return errors::InvalidArgument("Attr ", attr_name,
+ " has value ", v,
+ " out of range for an int32");
+ })
+DEFINE_GET_ATTR(float, f, "float", emplace_back, v, ;)
+// std::vector<bool> specialization does not have emplace_back until
+// C++14, so we have to use push_back (see
+// http://en.cppreference.com/w/cpp/container/vector/emplace_back)
+DEFINE_GET_ATTR(bool, b, "bool", push_back, v, ;)
+DEFINE_GET_ATTR(DataType, type, "type", emplace_back, static_cast<DataType>(v),
+ ;)
+DEFINE_GET_ATTR(TensorShapeProto, shape, "shape", emplace_back, v, ;)
+DEFINE_GET_ATTR(TensorShape, shape, "shape", emplace_back, TensorShape(v), ;)
+DEFINE_GET_ATTR(Tensor, tensor, "tensor", emplace_back, t, Tensor t;
+ if (!t.FromProto(v)) {
+ return errors::InvalidArgument(
+ "Attr ", attr_name, " has value ", v.ShortDebugString(),
+ " that can't be converted to a Tensor");
+ })
+
+#undef DEFINE_GET_ATTR
+
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ DataTypeVector* value) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(attrs.Find(attr_name, &attr_value));
+ TF_RETURN_IF_ERROR(AttrValueHasType(*attr_value, "list(type)"));
+ for (const auto& v : attr_value->list().type()) {
+ value->push_back(static_cast<DataType>(v));
+ }
+ return Status::OK();
+}
+
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ const TensorProto** value) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(attrs.Find(attr_name, &attr_value));
+ TF_RETURN_IF_ERROR(AttrValueHasType(*attr_value, "tensor"));
+ *value = &attr_value->tensor();
+ return Status::OK();
+}
+
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ const NameAttrList** value) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(attrs.Find(attr_name, &attr_value));
+ TF_RETURN_IF_ERROR(AttrValueHasType(*attr_value, "func"));
+ *value = &attr_value->func();
+ return Status::OK();
+}
+
+namespace { // Helper for InOutTypesForNode().
+
+Status AddArgToSig(const NodeDef& node_def, const OpDef::ArgDef& arg_def,
+ DataTypeVector* sig) {
+ const int original_size = sig->size();
+ if (!arg_def.number_attr().empty()) {
+ // Same type repeated "repeats" times.
+ int32 repeats = -1;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, arg_def.number_attr(), &repeats));
+ if (repeats < 0) {
+ return errors::InvalidArgument("Value for number_attr() ", repeats,
+ " < 0");
+ }
+
+ if (!arg_def.type_attr().empty()) {
+ DataType dtype;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, arg_def.type_attr(), &dtype));
+ for (int i = 0; i < repeats; ++i) {
+ sig->push_back(dtype);
+ }
+ } else if (arg_def.type() != DT_INVALID) {
+ for (int i = 0; i < repeats; ++i) {
+ sig->push_back(arg_def.type());
+ }
+ } else {
+ return errors::InvalidArgument("Missing type or type_attr field in ",
+ arg_def.ShortDebugString());
+ }
+ } else if (!arg_def.type_attr().empty()) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(
+ AttrSlice(node_def).Find(arg_def.type_attr(), &attr_value));
+ sig->push_back(attr_value->type());
+ } else if (!arg_def.type_list_attr().empty()) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(
+ AttrSlice(node_def).Find(arg_def.type_list_attr(), &attr_value));
+ for (int dtype : attr_value->list().type()) {
+ sig->push_back(static_cast<DataType>(dtype));
+ }
+ } else if (arg_def.type() != DT_INVALID) {
+ sig->push_back(arg_def.type());
+ } else {
+ return errors::InvalidArgument("No type fields in ",
+ arg_def.ShortDebugString());
+ }
+ if (arg_def.is_ref()) {
+ // For all types that were added by this function call, make them refs.
+ for (size_t i = original_size; i < sig->size(); ++i) {
+ (*sig)[i] = MakeRefType((*sig)[i]);
+ }
+ }
+ return Status::OK();
+}
+
+} // namespace
+
+Status InOutTypesForNode(const NodeDef& node_def, const OpDef& op_def,
+ DataTypeVector* inputs, DataTypeVector* outputs) {
+ for (const auto& arg : op_def.input_arg()) {
+ TF_RETURN_IF_ERROR(AddArgToSig(node_def, arg, inputs));
+ }
+ for (const auto& arg : op_def.output_arg()) {
+ TF_RETURN_IF_ERROR(AddArgToSig(node_def, arg, outputs));
+ }
+ return Status::OK();
+}
+
+Status ValidateNodeDef(const NodeDef& node_def, const OpDef& op_def) {
+ if (node_def.op() != op_def.name()) {
+ return errors::InvalidArgument("NodeDef op '", node_def.op(),
+ "' does not match ", SummarizeOpDef(op_def),
+ "; NodeDef: ", SummarizeNodeDef(node_def));
+ }
+
+ bool seen_control = false;
+ size_t num_inputs = 0;
+ // TODO(josh11b): Unify the input field validation.
+ for (const string& input : node_def.input()) {
+ if (StringPiece(input).starts_with("^")) {
+ seen_control = true;
+ if (input.find(':') != string::npos) {
+ return errors::InvalidArgument("Control input '", input,
+ "' must not have ':' in NodeDef: ",
+ SummarizeNodeDef(node_def));
+ }
+ } else if (seen_control) {
+ return errors::InvalidArgument("Non-control input '", input,
+ "' after control input in NodeDef: ",
+ SummarizeNodeDef(node_def));
+ } else {
+ ++num_inputs;
+ }
+ }
+
+ std::unordered_map<string, const OpDef::AttrDef*> op_attrs;
+ for (const auto& attr : op_def.attr()) {
+ if (!gtl::InsertIfNotPresent(&op_attrs, attr.name(), &attr)) {
+ return errors::InvalidArgument("OpDef has duplicate attr name '",
+ attr.name(), "': ",
+ SummarizeOpDef(op_def));
+ }
+ }
+ for (const auto& attr : node_def.attr()) {
+ // Allow internal optional attributes with names starting with "_".
+ if (StringPiece(attr.first).starts_with("_")) {
+ continue;
+ }
+ auto iter = op_attrs.find(attr.first);
+ if (iter == op_attrs.end()) {
+ return errors::InvalidArgument("NodeDef mentions attr '", attr.first,
+ "' not in ", SummarizeOpDef(op_def),
+ "; NodeDef: ", SummarizeNodeDef(node_def));
+ }
+ TF_RETURN_WITH_CONTEXT_IF_ERROR(
+ ValidateAttrValue(attr.second, *iter->second), "; NodeDef: ",
+ SummarizeNodeDef(node_def), "; ", SummarizeOpDef(op_def));
+ // Keep track of which attr names have (not) been found in the NodeDef.
+ op_attrs.erase(iter);
+ }
+
+ // Were all attrs in the OpDef found in the NodeDef?
+ if (!op_attrs.empty()) {
+ string attrs;
+ for (const auto& attr_pair : op_attrs) {
+ if (!attrs.empty()) strings::StrAppend(&attrs, "', '");
+ strings::StrAppend(&attrs, attr_pair.first);
+ }
+ return errors::InvalidArgument("NodeDef missing attr",
+ op_attrs.size() == 1 ? " '" : "s '", attrs,
+ "' from ", SummarizeOpDef(op_def),
+ "; NodeDef: ", SummarizeNodeDef(node_def));
+ }
+
+ // Validate the number of inputs.
+ DataTypeVector inputs, outputs;
+ TF_RETURN_IF_ERROR(InOutTypesForNode(node_def, op_def, &inputs, &outputs));
+
+ if (num_inputs != inputs.size()) {
+ return errors::InvalidArgument(
+ "NodeDef expected inputs '", DataTypeVectorString(inputs),
+ "' do not match ", num_inputs, " inputs specified; ",
+ SummarizeOpDef(op_def), "; NodeDef: ", SummarizeNodeDef(node_def));
+ }
+
+ return Status::OK();
+}
+
+namespace { // Helpers for NameRangesForNode()
+
+Status ComputeArgRange(const NodeDef& node_def, const OpDef::ArgDef& arg_def,
+ const OpDef& op_def, int* num) {
+ if (!arg_def.number_attr().empty()) {
+ // Same type repeated "num" times.
+ return GetNodeAttr(node_def, arg_def.number_attr(), num);
+ } else if (!arg_def.type_list_attr().empty()) {
+ const AttrValue* attr_value;
+ TF_RETURN_IF_ERROR(
+ AttrSlice(node_def).Find(arg_def.type_list_attr(), &attr_value));
+ *num = attr_value->list().type_size();
+ } else if (!arg_def.type_attr().empty() || arg_def.type() != DT_INVALID) {
+ *num = 1;
+ } else {
+ return errors::InvalidArgument("Argument '", arg_def.name(),
+ "' incorrectly specified in op definition: ",
+ SummarizeOpDef(op_def));
+ }
+ return Status::OK();
+}
+
+Status NameRangesHelper(const NodeDef& node_def,
+ const protobuf::RepeatedPtrField<OpDef::ArgDef>& args,
+ const OpDef& op_def, NameRangeMap* result) {
+ int start = 0;
+ int num;
+ for (const auto& arg : args) {
+ TF_RETURN_IF_ERROR(ComputeArgRange(node_def, arg, op_def, &num));
+ (*result)[arg.name()] = std::make_pair(start, start + num);
+ start += num;
+ }
+ return Status::OK();
+}
+
+} // namespace
+
+Status NameRangesForNode(const NodeDef& node_def, const OpDef& op_def,
+ NameRangeMap* inputs, NameRangeMap* outputs) {
+ TF_RETURN_IF_ERROR(
+ NameRangesHelper(node_def, op_def.input_arg(), op_def, inputs));
+ return NameRangesHelper(node_def, op_def.output_arg(), op_def, outputs);
+}
+
+void AddDefaultsToNodeDef(const OpDef& op_def, NodeDef* node_def) {
+ for (const auto& attr_def : op_def.attr()) {
+ AttrSlice attrs(*node_def);
+ if (attr_def.has_default_value() && !attrs.Find(attr_def.name())) {
+ AddNodeAttr(attr_def.name(), attr_def.default_value(), node_def);
+ }
+ }
+}
+
+namespace {
+
+static RE2* valid_op_name_pattern = new RE2("[A-Za-z0-9.][A-Za-z0-9_.\\-/]*");
+static RE2* valid_data_input_pattern =
+ new RE2("[A-Za-z0-9.][A-Za-z0-9_.\\-/]*(\\:(0|([1-9][0-9]*)))?");
+static RE2* valid_control_input_pattern =
+ new RE2("\\^[A-Za-z0-9.][A-Za-z0-9_.\\-/]*");
+
+} // namespace
+
+Status ValidateOpInput(const string& input_name, bool* is_control_input) {
+ *is_control_input = false;
+ if (RE2::FullMatch(input_name, *valid_data_input_pattern)) {
+ return Status::OK();
+ } else if (RE2::FullMatch(input_name, *valid_control_input_pattern)) {
+ *is_control_input = true;
+ return Status::OK();
+ } else {
+ return errors::InvalidArgument("Illegal op input name '", input_name, "'");
+ }
+}
+
+Status ValidateOpName(const string& op_name) {
+ if (RE2::FullMatch(op_name, *valid_op_name_pattern)) {
+ return Status::OK();
+ } else {
+ return errors::InvalidArgument("Illegal op name '", op_name, "'");
+ }
+}
+
+Status ValidateExternalNodeDefSyntax(const NodeDef& node_def) {
+ Status s = ValidateOpName(node_def.name());
+ if (!s.ok()) {
+ return AttachDef(s, node_def);
+ }
+ bool in_control_inputs = false;
+ for (const string& input_name : node_def.input()) {
+ bool is_control_input;
+ s = ValidateOpInput(input_name, &is_control_input);
+ if (!s.ok()) {
+ return AttachDef(s, node_def);
+ }
+
+ if (in_control_inputs && !is_control_input) {
+ return AttachDef(errors::InvalidArgument(
+ "All control inputs must follow all data inputs"),
+ node_def);
+ }
+ in_control_inputs = is_control_input;
+ }
+ return Status::OK();
+}
+
+Status AttachDef(const Status& status, const NodeDef& node_def) {
+ Status ret = status;
+ errors::AppendToMessage(
+ &ret, strings::StrCat(" [[Node: ", SummarizeNodeDef(node_def), "]]"));
+ return ret;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/node_def_util.h b/tensorflow/core/framework/node_def_util.h
new file mode 100644
index 0000000000..fce6fd2433
--- /dev/null
+++ b/tensorflow/core/framework/node_def_util.h
@@ -0,0 +1,157 @@
+#ifndef TENSORFLOW_FRAMEWORK_NODE_DEF_UTIL_H_
+#define TENSORFLOW_FRAMEWORK_NODE_DEF_UTIL_H_
+
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+// Produce a human-readable version of a NodeDef that is more concise
+// than a text-format proto.
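+//
+// A sketch of the resulting format (the values below are made up): the
+// node name, the op, the sorted attrs (plus "_device" when a device is
+// set) in brackets, then the inputs in parentheses, e.g.
+//   n = MatMul[T=DT_FLOAT, transpose_a=false](a, b:1, ^init)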
+string SummarizeNodeDef(const NodeDef& node_def);
+
+typedef protobuf::Map<string, AttrValue> AttrValueMap;
+
+// Adds an attr with name <name> and value <value> to *node_def.
+// The type of the attr is based on the type of value.
+template <class T>
+void AddNodeAttr(const string& name, T&& value, NodeDef* node_def) {
+ AttrValue attr_value;
+ SetAttrValue(std::forward<T>(value), &attr_value);
+ node_def->mutable_attr()->insert(AttrValueMap::value_type(name, attr_value));
+}
+
+// Version to work around C++'s "perfect" forwarding not being able to
+// forward {...} initialization.
+template <class T>
+void AddNodeAttr(const string& name, std::initializer_list<T> value,
+ NodeDef* node_def) {
+ AttrValue attr_value;
+ SetAttrValue(value, &attr_value);
+ node_def->mutable_attr()->insert(AttrValueMap::value_type(name, attr_value));
+}
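+
+// A minimal usage sketch (the node and attr names below are hypothetical):
+//   NodeDef node;
+//   AddNodeAttr("N", 2, &node);                         // "int" attr
+//   AddNodeAttr("T", DT_FLOAT, &node);                  // "type" attr
+//   AddNodeAttr("Tout", {DT_INT32, DT_STRING}, &node);  // "list(type)" attr via
+//                                                       // the initializer_list overload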
+
+class AttrSlice {
+ public:
+ AttrSlice(const NodeDef& node_def) // NOLINT(runtime/explicit)
+ : ndef_(&node_def),
+ attrs_(&ndef_->attr()) {}
+
+ explicit AttrSlice(const AttrValueMap* a) : attrs_(a) {}
+
+ // Returns the attr with attr_name if found. Otherwise, returns
+ // nullptr.
+ const AttrValue* Find(const string& attr_name) const;
+
+ // Returns the attr_value for attr_name if found. Otherwise, returns a
+ // NotFound status.
+ Status Find(const string& attr_name, const AttrValue** attr_value) const;
+
+ private:
+ const NodeDef* ndef_ = nullptr;
+ const AttrValueMap* attrs_;
+};
+
+// Look up the attr with name attr_name and set *value to its value. If no
+// attr with attr_name is found in node_def, or the attr does not have
+// a matching type, a non-ok status will be returned.
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ string* value); // type: "string"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ int64* value); // type: "int"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ int32* value); // type: "int"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ float* value); // type: "float"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ bool* value); // type: "bool"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ DataType* value); // type: "type"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ TensorShapeProto* value); // type: "shape"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ TensorShape* value); // type: "shape"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ Tensor* value); // type: "tensor"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<string>* value); // type "list(string)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<int64>* value); // type "list(int)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<int32>* value); // type "list(int)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<float>* value); // type "list(float)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<bool>* value); // type "list(bool)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<DataType>* value); // type "list(type)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ DataTypeVector* value); // type "list(type)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<TensorShapeProto>* value); // type "list(shape)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<TensorShape>* value); // type "list(shape)"
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ std::vector<Tensor>* value); // type: "list(tensor)"
+
+// This version avoids copying the TensorProto.
+// REQUIRES: Must not use *value beyond the lifetime of node_def.
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ const TensorProto** value); // type: "tensor"
+
+// This version avoids copying the NameAttrList.
+// REQUIRES: Must not use *value beyond the lifetime of node_def.
+Status GetNodeAttr(const AttrSlice& attrs, const string& attr_name,
+ const NameAttrList** value); // type: "func"
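+
+// For example, reading an attr back off a node might look like this (a
+// sketch only; `node` is a hypothetical NodeDef carrying a "T" attr):
+//   DataType dtype;
+//   Status s = GetNodeAttr(AttrSlice(node), "T", &dtype);
+//   if (!s.ok()) { /* attr missing or of the wrong type */ }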
+
+// Computes the input and output types for a specific node, for
+// attr-style ops.
+// REQUIRES: ValidateOpDef(op_def).ok()
+Status InOutTypesForNode(const NodeDef& node_def, const OpDef& op_def,
+ DataTypeVector* inputs, DataTypeVector* outputs);
+
+// Validates that the NodeDef:
+// * Defines all expected attrs from the OpDef.
+// * All attrs satisfy constraints from the OpDef.
+// * Has a signature matching SignatureForNode().
+// etc.
+Status ValidateNodeDef(const NodeDef& node_def, const OpDef& op_def);
+
+// Computes the mapping from input/output argument name to the
+// corresponding input/output index range. For example,
+// input "foo" coresponds to input indices
+// [ (*inputs)["foo"].first, (*inputs)["foo"].second ).
+typedef std::unordered_map<string, std::pair<int, int>> NameRangeMap;
+Status NameRangesForNode(const NodeDef& node_def, const OpDef& op_def,
+ NameRangeMap* inputs, NameRangeMap* outputs);
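+
+// For instance, with inputs "a: N * int32" and "b: float" and N == 4, a
+// successful call produces (*inputs)["a"] == {0, 4} and
+// (*inputs)["b"] == {4, 5}. A typical call (node_def/op_def hypothetical):
+//   NameRangeMap inputs, outputs;
+//   TF_RETURN_IF_ERROR(NameRangesForNode(node_def, op_def, &inputs, &outputs));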
+
+// Adds default values to *node_def for unspecified attrs from op_def.
+void AddDefaultsToNodeDef(const OpDef& op_def, NodeDef* node_def);
+
+// Validates the syntax of a NodeDef provided externally.
+//
+// The following is an EBNF-style syntax for NodeDef objects. Note that
+// Node objects are actually specified as tensorflow::NodeDef protocol buffers,
+// which contain many other fields that are not (currently) validated.
+//
+// Node = NodeName, Inputs
+// Inputs = ( DataInput * ), ( ControlInput * )
+// DataInput = NodeName, ( ":", ( "0" | [1-9], [0-9] * ) ) ?
+// ControlInput = "^", NodeName
+// NodeName = [A-Za-z0-9.], [A-Za-z0-9_.\-/] *
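+//
+// For example, a node named "n" with inputs {"a", "b:1", "^c"} passes, while
+// a name like "n:0", a control input like "^c:0", or a data input appearing
+// after a control input is rejected with an InvalidArgument error.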
+Status ValidateExternalNodeDefSyntax(const NodeDef& node_def);
+
+// Returns "status" with kernel's NodeDef attached as additional text
+// in the error message.
+Status AttachDef(const Status& status, const NodeDef& node_def);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_NODE_DEF_UTIL_H_
diff --git a/tensorflow/core/framework/node_def_util_test.cc b/tensorflow/core/framework/node_def_util_test.cc
new file mode 100644
index 0000000000..71f1760a09
--- /dev/null
+++ b/tensorflow/core/framework/node_def_util_test.cc
@@ -0,0 +1,442 @@
+#include "tensorflow/core/framework/node_def_util.h"
+
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_def_builder.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+OpDef ToOpDef(const OpDefBuilder& builder) {
+ OpDef op_def;
+ EXPECT_OK(builder.Finalize(&op_def));
+ return op_def;
+}
+
+NodeDef ToNodeDef(const string& text) {
+ NodeDef node_def;
+ EXPECT_TRUE(protobuf::TextFormat::MergeFromString(text, &node_def));
+ return node_def;
+}
+
+NodeDef ToNodeDef(const NodeDefBuilder& builder) {
+ NodeDef node_def;
+ EXPECT_OK(builder.Finalize(&node_def));
+ return node_def;
+}
+
+void ExpectSuccess(const NodeDef& good, const OpDef& op_def) {
+ EXPECT_EQ(Status::OK(), ValidateNodeDef(good, op_def))
+ << "NodeDef: " << SummarizeNodeDef(good)
+ << "; OpDef: " << SummarizeOpDef(op_def);
+}
+
+void ExpectFailure(const NodeDef& bad, const OpDef& op_def,
+ const string& message) {
+ Status status = ValidateNodeDef(bad, op_def);
+
+ EXPECT_FALSE(status.ok()) << "NodeDef: " << SummarizeNodeDef(bad)
+ << "; OpDef: " << SummarizeOpDef(op_def);
+ if (status.ok()) return;
+
+ EXPECT_TRUE(errors::IsInvalidArgument(status))
+ << status << "; NodeDef: " << SummarizeNodeDef(bad)
+ << "; OpDef: " << SummarizeOpDef(op_def);
+
+ LOG(INFO) << "Message: " << status.error_message();
+ EXPECT_TRUE(StringPiece(status.ToString()).contains(message))
+ << "NodeDef: " << SummarizeNodeDef(bad)
+ << "; OpDef: " << SummarizeOpDef(op_def) << "\nActual error: " << status
+ << "\nDoes not contain: " << message;
+}
+
+TEST(NodeDefUtilTest, In) {
+ const OpDef op = ToOpDef(OpDefBuilder("In").Input("i: T").Attr("T: type"));
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'In' input:'a' attr { key:'T' value { type:DT_FLOAT } }
+ )proto");
+ ExpectSuccess(node_def, op);
+
+ EXPECT_EQ("n = In[T=DT_FLOAT](a)", SummarizeNodeDef(node_def));
+
+ // Mismatching Op names.
+ NodeDef bad = node_def;
+ bad.set_op("Wrong");
+ ExpectFailure(bad, op, "NodeDef op 'Wrong' does not match Op<name=In;");
+
+ // Missing attr
+ bad = node_def;
+ bad.clear_attr();
+ ExpectFailure(bad, op, "NodeDef missing attr 'T' from Op<name=In;");
+
+ // Extra attr
+ bad = node_def;
+ AddNodeAttr("EXTRA", 17, &bad);
+ ExpectFailure(bad, op, "NodeDef mentions attr 'EXTRA' not in Op<name=In;");
+
+ // Attr has wrong type
+ bad = node_def;
+ bad.clear_attr();
+ AddNodeAttr("T", 17, &bad);
+ ExpectFailure(
+ bad, op,
+ "AttrValue had value with type int when type expected\n\t for attr "
+ "'T'\n\t; NodeDef: ");
+
+ // Wrong number of inputs
+ bad = node_def;
+ bad.add_input("b");
+ ExpectFailure(
+ bad, op,
+ "NodeDef expected inputs 'float' do not match 2 inputs specified;");
+
+ bad = node_def;
+ bad.clear_input();
+ ExpectFailure(
+ bad, op,
+ "NodeDef expected inputs 'float' do not match 0 inputs specified;");
+
+ // Control inputs must appear after data inputs
+ NodeDef good = node_def;
+ good.add_input("^b");
+ ExpectSuccess(good, op);
+
+ bad = node_def;
+ bad.clear_input();
+ bad.add_input("^b");
+ bad.add_input("a");
+ ExpectFailure(bad, op,
+ "Invalid argument: Non-control input 'a' after control input "
+ "in NodeDef:");
+
+ bad = node_def;
+ bad.add_input("^b:0");
+ ExpectFailure(bad, op, "Control input '^b:0' must not have ':' in NodeDef:");
+}
+
+TEST(NodeDefUtilTest, Out) {
+ const OpDef op =
+ ToOpDef(OpDefBuilder("Out").Output("o: T").Attr("T: numbertype"));
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'Out' attr { key:'T' value { type:DT_INT32 } }
+ )proto");
+ ExpectSuccess(node_def, op);
+
+ EXPECT_EQ("n = Out[T=DT_INT32]()", SummarizeNodeDef(node_def));
+
+ // Non-number type.
+ NodeDef bad = node_def;
+ bad.clear_attr();
+ AddNodeAttr("T", DT_STRING, &bad);
+ ExpectFailure(bad, op,
+ "Value for attr 'T' of string is not in the list of allowed "
+ "values: float, double, int64, int32, uint8, int16, int8, "
+ "complex64, qint8, quint8, qint32");
+}
+
+TEST(NodeDefUtilTest, Enum) {
+ const OpDef op = ToOpDef(OpDefBuilder("Enum").Attr("e: {'apple','orange'}"));
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'Enum' attr { key:'e' value { s:'apple' } }
+ )proto");
+ ExpectSuccess(node_def, op);
+
+ EXPECT_EQ("n = Enum[e=\"apple\"]()", SummarizeNodeDef(node_def));
+
+ NodeDef good = node_def;
+ good.clear_attr();
+ AddNodeAttr("e", "orange", &good);
+ ExpectSuccess(good, op);
+
+ // Non-allowed value.
+ NodeDef bad = node_def;
+ bad.clear_attr();
+ AddNodeAttr("e", "foo", &bad);
+ ExpectFailure(bad, op,
+ "Value for attr 'e' of \"foo\" is not in the list of allowed "
+ "values: \"apple\", \"orange\"");
+}
+
+TEST(NodeDefUtilTest, SameIn) {
+ const OpDef op = ToOpDef(OpDefBuilder("SameIn")
+ .Input("i: N * T")
+ .Attr("N: int >= 2")
+ .Attr("T: {float,double}"));
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'SameIn' input:'a' input:'b'
+ attr { key:'N' value { i:2 } } attr { key:'T' value { type:DT_DOUBLE } }
+ )proto");
+ ExpectSuccess(node_def, op);
+
+ EXPECT_EQ("n = SameIn[N=2, T=DT_DOUBLE](a, b)", SummarizeNodeDef(node_def));
+
+ // Illegal type
+ NodeDef bad = ToNodeDef(R"proto(
+ name:'n' op:'SameIn' input:'a' input:'b'
+ attr { key:'N' value { i:2 } } attr { key:'T' value { type:DT_STRING } }
+ )proto");
+ ExpectFailure(bad, op,
+ "Value for attr 'T' of string is not in the list of allowed "
+ "values: float, double");
+
+ // Too few inputs
+ bad = ToNodeDef(R"proto(
+ name:'n' op:'SameIn' input:'a' input:'b'
+ attr { key:'N' value { i:1 } } attr { key:'T' value { type:DT_FLOAT } }
+ )proto");
+ ExpectFailure(bad, op, "Value for attr 'N' of 1 must be at least minimum 2");
+}
+
+TEST(NodeDefUtilTest, AnyIn) {
+ const OpDef op =
+ ToOpDef(OpDefBuilder("AnyIn").Input("i: T").Attr("T: list(type) >= 1"));
+
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectSuccess(node_def, op);
+
+ EXPECT_EQ("n = AnyIn[T=[DT_INT32, DT_STRING]](a, b)",
+ SummarizeNodeDef(node_def));
+
+ const NodeDef bad = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a' attr { key:'T' value { list { } } }
+ )proto");
+ ExpectFailure(bad, op, "Length for attr 'T' of 0 must be at least minimum 1");
+
+ // With proto3 semantics, an empty value {} is indistinguishable from a value
+ // with an empty list in it. So we simply expect to get a message complaining
+ // about empty list for value {}.
+ const NodeDef bad2 = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a' attr { key:'T' value { } }
+ )proto");
+ ExpectFailure(bad2, op,
+ "Length for attr 'T' of 0 must be at least minimum 1");
+}
+
+TEST(NodeDefUtilTest, Device) {
+ const OpDef op_def1 = ToOpDef(OpDefBuilder("None"));
+ const NodeDef node_def1 =
+ ToNodeDef(NodeDefBuilder("d", &op_def1).Device("/cpu:17"));
+ ExpectSuccess(node_def1, op_def1);
+ EXPECT_EQ("d = None[_device=\"/cpu:17\"]()", SummarizeNodeDef(node_def1));
+
+ const OpDef op_def2 = ToOpDef(OpDefBuilder("WithAttr").Attr("v: int"));
+ const NodeDef node_def2 =
+ ToNodeDef(NodeDefBuilder("d", &op_def2).Attr("v", 7).Device("/cpu:5"));
+ ExpectSuccess(node_def2, op_def2);
+ EXPECT_EQ("d = WithAttr[v=7, _device=\"/cpu:5\"]()",
+ SummarizeNodeDef(node_def2));
+}
+
+void ExpectValidSyntax(const NodeDef& good) {
+ EXPECT_EQ(Status::OK(), ValidateExternalNodeDefSyntax(good))
+ << "NodeDef: " << SummarizeNodeDef(good);
+}
+
+void ExpectInvalidSyntax(const NodeDef& bad, const string& message) {
+ Status status = ValidateExternalNodeDefSyntax(bad);
+
+ ASSERT_FALSE(status.ok()) << "NodeDef: " << SummarizeNodeDef(bad);
+
+ EXPECT_TRUE(errors::IsInvalidArgument(status))
+ << status << "; NodeDef: " << SummarizeNodeDef(bad);
+
+ EXPECT_TRUE(StringPiece(status.ToString()).contains(message))
+ << "NodeDef: " << SummarizeNodeDef(bad) << ", " << status << ", "
+ << message;
+}
+
+TEST(NodeDefUtilTest, ValidSyntax) {
+ const NodeDef node_def = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectValidSyntax(node_def);
+
+ const NodeDef node_def_explicit_inputs = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a:0' input:'b:123'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectValidSyntax(node_def_explicit_inputs);
+
+ EXPECT_EQ("n = AnyIn[T=[DT_INT32, DT_STRING]](a:0, b:123)",
+ SummarizeNodeDef(node_def_explicit_inputs));
+
+ const NodeDef node_def_control_input = ToNodeDef(R"proto(
+ name:'n-' op:'AnyIn' input:'a' input:'^b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectValidSyntax(node_def_control_input);
+
+ const NodeDef node_def_invalid_name = ToNodeDef(R"proto(
+ name:'n:0' op:'AnyIn' input:'a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectInvalidSyntax(node_def_invalid_name, "Illegal op name 'n:0'");
+
+ const NodeDef node_def_internal_name = ToNodeDef(R"proto(
+ name:'_n' op:'AnyIn' input:'a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectInvalidSyntax(node_def_internal_name, "Illegal op name '_n'");
+
+ const NodeDef node_def_internal_input_name = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'_a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectInvalidSyntax(node_def_internal_input_name,
+ "Illegal op input name '_a'");
+
+ const NodeDef node_def_invalid_control_input_name = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'a' input:'^b:0'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectInvalidSyntax(node_def_invalid_control_input_name,
+ "Illegal op input name '^b:0'");
+
+ const NodeDef node_def_data_input_after_control = ToNodeDef(R"proto(
+ name:'n' op:'AnyIn' input:'^a' input:'b'
+ attr { key:'T' value { list { type: [DT_INT32, DT_STRING] } } }
+ )proto");
+ ExpectInvalidSyntax(node_def_data_input_after_control,
+ "All control inputs must follow all data inputs");
+}
+
+TEST(NameRangesForNodeTest, Simple) {
+ const OpDef op_def = ToOpDef(OpDefBuilder("Simple")
+ .Input("a: float")
+ .Input("b: int32")
+ .Output("c: string")
+ .Output("d: bool"));
+ NameRangeMap inputs, outputs;
+ const NodeDef node_def = ToNodeDef(
+ NodeDefBuilder("simple", &op_def).Input(FakeInput()).Input(FakeInput()));
+ EXPECT_OK(NameRangesForNode(node_def, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 1}}, {"b", {1, 2}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}, {"d", {1, 2}}}), outputs);
+
+ EXPECT_EQ("simple = Simple[](a, b)", SummarizeNodeDef(node_def));
+
+ OpDef bad_op_def = op_def;
+ bad_op_def.mutable_input_arg(0)->clear_type();
+ EXPECT_FALSE(NameRangesForNode(node_def, bad_op_def, &inputs, &outputs).ok());
+}
+
+TEST(NameRangesForNodeTest, Polymorphic) {
+ const OpDef op_def = ToOpDef(OpDefBuilder("Polymorphic")
+ .Input("a: T")
+ .Input("b: T")
+ .Output("c: T")
+ .Attr("T: type"));
+ NameRangeMap inputs, outputs;
+ const NodeDef node_def1 = ToNodeDef(NodeDefBuilder("poly", &op_def)
+ .Input(FakeInput(DT_INT32))
+ .Input(FakeInput(DT_INT32)));
+ EXPECT_OK(NameRangesForNode(node_def1, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 1}}, {"b", {1, 2}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}}), outputs);
+ EXPECT_EQ("poly = Polymorphic[T=DT_INT32](a, b)",
+ SummarizeNodeDef(node_def1));
+
+ const NodeDef node_def2 = ToNodeDef(NodeDefBuilder("poly", &op_def)
+ .Input(FakeInput(DT_BOOL))
+ .Input(FakeInput(DT_BOOL)));
+ EXPECT_OK(NameRangesForNode(node_def2, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 1}}, {"b", {1, 2}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}}), outputs);
+ EXPECT_EQ("poly = Polymorphic[T=DT_BOOL](a, b)", SummarizeNodeDef(node_def2));
+}
+
+TEST(NameRangesForNodeTest, NRepeats) {
+ const OpDef op_def = ToOpDef(OpDefBuilder("NRepeats")
+ .Input("a: N * int32")
+ .Input("b: N * T")
+ .Output("c: T")
+ .Output("d: N * string")
+ .Output("e: M * bool")
+ .Attr("N: int")
+ .Attr("M: int")
+ .Attr("T: type"));
+ NameRangeMap inputs, outputs;
+ const NodeDef node_def1 = ToNodeDef(NodeDefBuilder("nr", &op_def)
+ .Input(FakeInput(4, DT_INT32))
+ .Input(FakeInput(4, DT_FLOAT))
+ .Attr("M", 3));
+ EXPECT_OK(NameRangesForNode(node_def1, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 4}}, {"b", {4, 8}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}, {"d", {1, 5}}, {"e", {5, 8}}}),
+ outputs);
+ EXPECT_EQ(
+ "nr = NRepeats[M=3, N=4, T=DT_FLOAT](a, a:1, a:2, a:3, b, b:1, b:2, b:3)",
+ SummarizeNodeDef(node_def1));
+
+ const NodeDef node_def2 = ToNodeDef(NodeDefBuilder("nr", &op_def)
+ .Input(FakeInput(2, DT_INT32))
+ .Input(FakeInput(2, DT_DOUBLE))
+ .Attr("M", 7));
+ EXPECT_OK(NameRangesForNode(node_def2, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 2}}, {"b", {2, 4}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}, {"d", {1, 3}}, {"e", {3, 10}}}),
+ outputs);
+ EXPECT_EQ("nr = NRepeats[M=7, N=2, T=DT_DOUBLE](a, a:1, b, b:1)",
+ SummarizeNodeDef(node_def2));
+
+ NodeDef bad_node_def = node_def2;
+ bad_node_def.clear_attr();
+ EXPECT_FALSE(NameRangesForNode(bad_node_def, op_def, &inputs, &outputs).ok());
+}
+
+TEST(NameRangesForNodeTest, TypeList) {
+ const OpDef op_def = ToOpDef(OpDefBuilder("TypeList")
+ .Input("a: T1")
+ .Input("b: T2")
+ .Output("c: T2")
+ .Output("d: T3")
+ .Output("e: T1")
+ .Attr("T1: list(type)")
+ .Attr("T2: list(type)")
+ .Attr("T3: list(type)"));
+ NameRangeMap inputs, outputs;
+ const NodeDef node_def1 =
+ ToNodeDef(NodeDefBuilder("tl", &op_def)
+ .Input(FakeInput({DT_BOOL, DT_FLOAT}))
+ .Input(FakeInput(4, DT_FLOAT))
+ .Attr("T3", {DT_INT32, DT_DOUBLE, DT_STRING}));
+ EXPECT_OK(NameRangesForNode(node_def1, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 2}}, {"b", {2, 6}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 4}}, {"d", {4, 7}}, {"e", {7, 9}}}),
+ outputs);
+ EXPECT_EQ(
+ "tl = TypeList[T1=[DT_BOOL, DT_FLOAT],"
+ " T2=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT],"
+ " T3=[DT_INT32, DT_DOUBLE, DT_STRING]](a, a:1, b, b:1, b:2, b:3)",
+ SummarizeNodeDef(node_def1));
+
+ const NodeDef node_def2 = ToNodeDef(NodeDefBuilder("tl", &op_def)
+ .Input(FakeInput(7, DT_INT32))
+ .Input(FakeInput({DT_DOUBLE}))
+ .Attr("T3", {DT_DOUBLE, DT_STRING}));
+ EXPECT_OK(NameRangesForNode(node_def2, op_def, &inputs, &outputs));
+ EXPECT_EQ(NameRangeMap({{"a", {0, 7}}, {"b", {7, 8}}}), inputs);
+ EXPECT_EQ(NameRangeMap({{"c", {0, 1}}, {"d", {1, 3}}, {"e", {3, 10}}}),
+ outputs);
+ EXPECT_EQ(
+ "tl = TypeList[T1=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32,"
+ " DT_INT32, DT_INT32], T2=[DT_DOUBLE], T3=[DT_DOUBLE, DT_STRING]]"
+ "(a, a:1, a:2, a:3, a:4, a:5, a:6, b)",
+ SummarizeNodeDef(node_def2));
+
+ NodeDef bad_node_def = node_def2;
+ bad_node_def.clear_attr();
+ EXPECT_FALSE(NameRangesForNode(bad_node_def, op_def, &inputs, &outputs).ok());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/numeric_op.h b/tensorflow/core/framework/numeric_op.h
new file mode 100644
index 0000000000..8413d18f33
--- /dev/null
+++ b/tensorflow/core/framework/numeric_op.h
@@ -0,0 +1,96 @@
+#ifndef TENSORFLOW_FRAMEWORK_NUMERIC_OP_H_
+#define TENSORFLOW_FRAMEWORK_NUMERIC_OP_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// One input and one output, both the same type.
+template <class T>
+class UnaryOp : public OpKernel {
+ public:
+ explicit UnaryOp(OpKernelConstruction* context) : OpKernel(context) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(context, context->MatchSignature({dt}, {dt}));
+ }
+};
+
+// Two inputs and one output, all the same type.
+template <class T>
+class BinaryOp : public OpKernel {
+ public:
+ explicit BinaryOp(OpKernelConstruction* context) : OpKernel(context) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(context, context->MatchSignature({dt, dt}, {dt}));
+ }
+};
+
+// For operations where the input and output are the same shape.
+//
+// For usage, see ../framework/elementwise_ops.cc.
+template <class T, class CHILD>
+class UnaryElementWiseOp : public UnaryOp<T> {
+ public:
+ using UnaryOp<T>::UnaryOp;
+
+ void Compute(OpKernelContext* context) override {
+ // Output shape is the same as input shape.
+ const Tensor& input = context->input(0);
+ Tensor* output;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+ static_cast<CHILD*>(this)->Operate(context, input, output);
+ }
+};
+
+// For binary elementwise operations.
+template <class T, class CHILD>
+class BinaryElementWiseOp : public BinaryOp<T> {
+ public:
+ using BinaryOp<T>::BinaryOp;
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& a = context->input(0);
+ const Tensor& b = context->input(1);
+
+ if (!context->ValidateInputsAreSameShape(this)) {
+ return;
+ }
+
+ Tensor* output;
+ OP_REQUIRES_OK(context, context->allocate_output(0, a.shape(), &output));
+
+ // Dispatch to the descendant's Operate() function.
+ switch (a.dims()) {
+#define NDIM_CASE(NDIMS) \
+ case NDIMS: { \
+ static_cast<CHILD*>(this)->template Operate<NDIMS>(context, a, b, output); \
+ break; \
+ }
+
+ NDIM_CASE(1);
+ NDIM_CASE(2);
+ NDIM_CASE(3);
+ NDIM_CASE(4);
+ NDIM_CASE(5);
+ NDIM_CASE(6);
+ NDIM_CASE(7);
+ NDIM_CASE(8);
+#undef NDIM_CASE
+
+ default:
+ context->SetStatus(errors::OutOfRange(
+ "We only handle up to Tensor::dims() up to 8, not ", a.dims()));
+ break;
+ }
+ }
+};
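+
+// A sketch of how a kernel might build on BinaryElementWiseOp (illustrative
+// only; "MyAddOp" and its body are hypothetical and not defined anywhere in
+// this codebase):
+//
+//   template <typename T>
+//   class MyAddOp : public BinaryElementWiseOp<T, MyAddOp<T>> {
+//    public:
+//     using BinaryElementWiseOp<T, MyAddOp<T>>::BinaryElementWiseOp;
+//
+//     template <int NDIMS>
+//     void Operate(OpKernelContext* context, const Tensor& a, const Tensor& b,
+//                  Tensor* output) {
+//       // Element-wise sum; shapes were already checked by Compute().
+//       output->flat<T>() = a.flat<T>() + b.flat<T>();
+//     }
+//   };
+//
+// UnaryElementWiseOp follows the same pattern, with a non-templated
+// Operate(OpKernelContext*, const Tensor&, Tensor*).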
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_NUMERIC_OP_H_
diff --git a/tensorflow/core/framework/numeric_types.h b/tensorflow/core/framework/numeric_types.h
new file mode 100644
index 0000000000..366f00ae03
--- /dev/null
+++ b/tensorflow/core/framework/numeric_types.h
@@ -0,0 +1,15 @@
+#ifndef TENSORFLOW_FRAMEWORK_NUMERIC_TYPES_H_
+#define TENSORFLOW_FRAMEWORK_NUMERIC_TYPES_H_
+
+#include <complex>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Single precision complex.
+typedef std::complex<float> complex64;
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_NUMERIC_TYPES_H_
diff --git a/tensorflow/core/framework/op.cc b/tensorflow/core/framework/op.cc
new file mode 100644
index 0000000000..15b7eab4da
--- /dev/null
+++ b/tensorflow/core/framework/op.cc
@@ -0,0 +1,135 @@
+#include "tensorflow/core/framework/op.h"
+
+#include <algorithm>
+#include <memory>
+#include <vector>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+// OpRegistry -----------------------------------------------------------------
+
+OpRegistryInterface::~OpRegistryInterface() {}
+
+OpRegistry::OpRegistry() : initialized_(false) {}
+
+void OpRegistry::Register(std::function<OpDef(void)> func) {
+ mutex_lock lock(mu_);
+ if (initialized_) {
+ OpDef def = func();
+ TF_QCHECK_OK(RegisterAlreadyLocked(def)) << "Attempting to register: "
+ << SummarizeOpDef(def);
+ } else {
+ deferred_.push_back(func);
+ }
+}
+
+const OpDef* OpRegistry::LookUp(const string& op_type_name,
+ Status* status) const {
+ const OpDef* op_def = nullptr;
+ bool first_call = false;
+ { // Scope for lock.
+ mutex_lock lock(mu_);
+ first_call = CallDeferred();
+ op_def = gtl::FindWithDefault(registry_, op_type_name, nullptr);
+ // Note: Can't hold mu_ while calling Export() below.
+ }
+ if (first_call) {
+ TF_QCHECK_OK(ValidateKernelRegistrations(this));
+ }
+ if (op_def == nullptr) {
+ status->Update(
+ errors::NotFound("Op type not registered '", op_type_name, "'"));
+ static bool first = true;
+ if (first) {
+ OpList op_list;
+ Export(true, &op_list);
+ LOG(INFO) << "All registered Ops:";
+ for (const auto& op : op_list.op()) {
+ LOG(INFO) << SummarizeOpDef(op);
+ }
+ first = false;
+ }
+ }
+ return op_def;
+}
+
+void OpRegistry::Export(bool include_internal, OpList* ops) const {
+ mutex_lock lock(mu_);
+ CallDeferred();
+
+ std::vector<std::pair<string, const OpDef*>> sorted(registry_.begin(),
+ registry_.end());
+ std::sort(sorted.begin(), sorted.end());
+
+ auto out = ops->mutable_op();
+ out->Clear();
+ out->Reserve(sorted.size());
+
+ for (const auto& item : sorted) {
+ if (include_internal || !StringPiece(item.first).starts_with("_")) {
+ *out->Add() = *item.second;
+ }
+ }
+}
+
+string OpRegistry::DebugString(bool include_internal) const {
+ OpList op_list;
+ Export(include_internal, &op_list);
+ string ret;
+ for (const auto& op : op_list.op()) {
+ strings::StrAppend(&ret, SummarizeOpDef(op), "\n");
+ }
+ return ret;
+}
+
+bool OpRegistry::CallDeferred() const {
+ if (initialized_) return false;
+ initialized_ = true;
+ for (const auto& fn : deferred_) {
+ OpDef def = fn();
+ TF_QCHECK_OK(RegisterAlreadyLocked(def)) << "Attempting to register: "
+ << SummarizeOpDef(def);
+ }
+ deferred_.clear();
+ return true;
+}
+
+Status OpRegistry::RegisterAlreadyLocked(const OpDef& def) const {
+ TF_RETURN_IF_ERROR(ValidateOpDef(def));
+
+ std::unique_ptr<OpDef> copy(new OpDef(def));
+ if (gtl::InsertIfNotPresent(&registry_, def.name(), copy.get())) {
+ copy.release(); // Ownership transferred to op_registry
+ return Status::OK();
+ } else {
+ return errors::AlreadyExists("Op with name ", def.name());
+ }
+}
+
+// static
+OpRegistry* OpRegistry::Global() {
+ static OpRegistry* global_op_registry = new OpRegistry;
+ return global_op_registry;
+}
+
+namespace register_op {
+OpDefBuilder& RegisterOp(StringPiece name) {
+ VLOG(1) << "RegisterOp: " << name;
+ OpDefBuilder* b = new OpDefBuilder(name);
+ OpRegistry::Global()->Register([b]() -> ::tensorflow::OpDef {
+ OpDef op_def;
+ TF_QCHECK_OK(b->Finalize(&op_def));
+ delete b;
+ return op_def;
+ });
+ return *b;
+}
+} // namespace register_op
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op.h b/tensorflow/core/framework/op.h
new file mode 100644
index 0000000000..95ad32df35
--- /dev/null
+++ b/tensorflow/core/framework/op.h
@@ -0,0 +1,122 @@
+#ifndef TENSORFLOW_FRAMEWORK_OP_H_
+#define TENSORFLOW_FRAMEWORK_OP_H_
+
+#include <functional>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/op_def_builder.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Users that want to look up an OpDef by type name should take an
+// OpRegistryInterface. Functions accepting a
+// (const) OpRegistryInterface* may call LookUp() from multiple threads.
+class OpRegistryInterface {
+ public:
+ virtual ~OpRegistryInterface();
+
+ // Returns nullptr and sets *status if no OpDef is registered under that
+ // name, otherwise returns the registered OpDef.
+ // Caller must not delete the returned pointer.
+ virtual const OpDef* LookUp(const string& op_type_name,
+ Status* status) const = 0;
+};
+
+// The standard implementation of OpRegistryInterface, along with a
+// global singleton used for registering OpDefs via the REGISTER
+// macros below. Thread-safe.
+//
+// Example registration:
+// OpRegistry::Global()->Register([]()->OpDef{
+// OpDef def;
+// // Populate def here.
+// return def;
+// });
+class OpRegistry : public OpRegistryInterface {
+ public:
+ OpRegistry();
+ ~OpRegistry() override {}
+
+ // Calls func() and registers the returned OpDef. Since Register()
+ // is normally called during program initialization (before main()),
+ // we defer calling func() until the first call to LookUp() or
+ // Export() (if one of those has already been called, func() is
+ // called immediately).
+ void Register(std::function<OpDef(void)> func);
+
+ const OpDef* LookUp(const string& op_type_name,
+ Status* status) const override;
+
+ // Fills *ops with all registered OpDefs (except those with names
+ // starting with '_' if include_internal == false).
+ void Export(bool include_internal, OpList* ops) const;
+
+ // Returns ASCII-format OpList for all registered OpDefs (except
+ // those with names starting with '_' if include_internal == false).
+ string DebugString(bool include_internal) const;
+
+ // A singleton available at startup.
+ static OpRegistry* Global();
+
+ private:
+ // Ensures that all the functions in deferred_ get called, their OpDefs
+ // registered, and returns with deferred_ empty. Returns true the first
+ // time it is called.
+ bool CallDeferred() const EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Add 'def' to the registry. On failure, or if there is already an
+ // OpDef with that name registered, returns a non-okay status.
+ Status RegisterAlreadyLocked(const OpDef& def) const
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ mutable mutex mu_;
+ // Functions in deferred_ may only be called with mu_ held.
+ mutable std::vector<std::function<OpDef(void)>> deferred_ GUARDED_BY(mu_);
+ mutable std::unordered_map<string, OpDef*> registry_ GUARDED_BY(mu_);
+ mutable bool initialized_ GUARDED_BY(mu_);
+};
+
+// Support for defining the OpDef (specifying the semantics of the Op and how
+// it should be created) and registering it in the OpRegistry::Global()
+// registry. Usage:
+//
+// REGISTER_OP("my_op_name")
+// .Attr("<name>:<type>")
+// .Attr("<name>:<type>=<default>")
+// .Input("<name>:<type-expr>")
+// .Input("<name>:Ref(<type-expr>)")
+// .Output("<name>:<type-expr>")
+// .Doc(R"(
+// <1-line summary>
+// <rest of the description (potentially many lines)>
+// <name-of-attr-input-or-output>: <description of name>
+// <name-of-attr-input-or-output>: <description of name;
+// if long, indent the description on subsequent lines>
+// )");
+//
+// Note: .Doc() should be last.
+// For details, see the OpDefBuilder class in op_def_builder.h.
+
+namespace register_op {
+// To call OpRegistry::Global()->Register(...), used by the
+// REGISTER_OP macro below.
+OpDefBuilder& RegisterOp(StringPiece name);
+} // namespace register_op
+
+#define REGISTER_OP(name) REGISTER_OP_UNIQ_HELPER(__COUNTER__, name)
+#define REGISTER_OP_UNIQ_HELPER(ctr, name) REGISTER_OP_UNIQ(ctr, name)
+#define REGISTER_OP_UNIQ(ctr, name) \
+ static ::tensorflow::OpDefBuilder& register_op##ctr TF_ATTRIBUTE_UNUSED = \
+ ::tensorflow::register_op::RegisterOp(name)
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_H_
diff --git a/tensorflow/core/framework/op_def.proto b/tensorflow/core/framework/op_def.proto
new file mode 100644
index 0000000000..4a2e90b1b9
--- /dev/null
+++ b/tensorflow/core/framework/op_def.proto
@@ -0,0 +1,142 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/attr_value.proto";
+import "tensorflow/core/framework/types.proto";
+
+// Defines an operation. A NodeDef in a GraphDef specifies an Op by
+// using the "op" field which should match the name of a OpDef.
+message OpDef {
+ // Op names starting with an underscore are reserved for internal use.
+ // Names should be CamelCase and match the regexp "[A-Z][a-zA-Z0-9_]*".
+ string name = 1;
+
+ // For describing inputs and outputs.
+ message ArgDef {
+ // Name for the input/output. Should match the regexp "[a-z][a-z0-9_]*".
+ string name = 1;
+
+ // Human readable description.
+ string description = 2;
+
+ // Describes the type of one or more tensors that are accepted/produced
+ // by this input/output arg. The only legal combinations are:
+ // * For a single tensor: either the "type" field is set or the
+ // "type_attr" field is set to the name of an attr with type "type".
+ // * For a sequence of tensors with the same type: the "number_attr"
+ // field will be set to the name of an attr with type "int", and
+ // either the "type" or "type_attr" field will be set as for
+ // single tensors.
+ // * For a sequence of tensors, the "type_list_attr" field will be set
+ // to the name of an attr with type "list(type)".
+ DataType type = 3;
+ string type_attr = 4; // if specified, attr must have type "type"
+ string number_attr = 5; // if specified, attr must have type "int"
+ // If specified, attr must have type "list(type)", and none of
+ // type, type_attr, and number_attr may be specified.
+ string type_list_attr = 6;
+
+ // For inputs: if true, the inputs are required to be refs.
+ // By default, inputs can be either refs or non-refs.
+ // For outputs: if true, outputs are refs, otherwise they are not.
+ bool is_ref = 16;
+ };
+
+ // Description of the input(s).
+ repeated ArgDef input_arg = 2;
+
+ // Description of the output(s).
+ repeated ArgDef output_arg = 3;
+
+ // Description of the graph-construction-time configuration of this
+ // Op. That is to say, this describes the attr fields that will
+ // be specified in the NodeDef.
+ message AttrDef {
+ // A descriptive name for the argument. May be used, e.g. by the
+ // Python client, as a keyword argument name, and so should match
+ // the regexp "[a-z][a-z0-9_]+".
+ string name = 1;
+
+ // One of the type names from attr_value.proto ("string", "list(string)",
+ // "int", etc.).
+ string type = 2;
+
+ // A reasonable default for this attribute if the user does not supply
+ // a value. If not specified, the user must supply a value.
+ AttrValue default_value = 3;
+
+ // Human-readable description.
+ string description = 4;
+
+ // TODO(josh11b): bool is_optional?
+
+ // --- Constraints ---
+ // These constraints are only in effect if specified. Default is no
+ // constraints.
+
+ // For type == "int", this is a minimum value. For "list(___)"
+ // types, this is the minimum length.
+ bool has_minimum = 5;
+ int64 minimum = 6;
+
+ // The set of allowed values. Has type that is the "list" version
+ // of the "type" field above (uses the ..._list, fields of AttrValue).
+ // If type == "type" or "list(type)" above, then the type_list field
+ // of allowed_values has the set of allowed DataTypes.
+ // If type == "string" or "list(string)", then the s_list field has
+ // the set of allowed strings.
+ AttrValue allowed_values = 7;
+ }
+ repeated AttrDef attr = 4;
+
+ // One-line human-readable description of what the Op does.
+ string summary = 5;
+
+ // Additional, longer human-readable description of what the Op does.
+ string description = 6;
+
+ // -------------------------------------------------------------------------
+ // Which optimizations this operation can participate in.
+
+ // True if the operation is commutative ("op(a,b) == op(b,a)" for all inputs)
+ bool is_commutative = 18;
+
+ // If is_aggregate is true, then this operation accepts N >= 2
+ // inputs and produces 1 output all of the same type. Should be
+ // associative and commutative, and produce output with the same
+ // shape as the input. The optimizer may replace an aggregate op
+ // taking input from multiple devices with a tree of aggregate ops
+ // that aggregate locally within each device (and possibly within
+ // groups of nearby devices) before communicating.
+ // TODO(josh11b): Implement that optimization.
+ bool is_aggregate = 16; // for things like add
+
+ // Other optimizations go here, like
+ // can_alias_input, rewrite_when_output_unused, partitioning_strategy, etc.
+
+ // -------------------------------------------------------------------------
+ // Optimization constraints.
+
+ // By default Ops may be moved between devices. Stateful ops should
+ // either not be moved, or should only be moved if that state can also
+ // be moved (e.g. via some sort of save / restore).
+ // Stateful ops are guaranteed to never be optimized away by Common
+ // Subexpression Elimination (CSE).
+ bool is_stateful = 17; // for things like variables, queue
+
+ // -------------------------------------------------------------------------
+ // Non-standard options.
+
+ // By default, all inputs to an Op must be initialized Tensors. Ops
+ // that may initialize tensors for the first time should set this
+ // field to true, to allow the Op to take an uninitialized Tensor as
+ // input.
+ bool allows_uninitialized_input = 19; // for Assign, etc.
+};
+
+// A collection of OpDefs
+message OpList {
+ repeated OpDef op = 1;
+};
diff --git a/tensorflow/core/framework/op_def_builder.cc b/tensorflow/core/framework/op_def_builder.cc
new file mode 100644
index 0000000000..7d7c07de4c
--- /dev/null
+++ b/tensorflow/core/framework/op_def_builder.cc
@@ -0,0 +1,447 @@
+#include "tensorflow/core/framework/op_def_builder.h"
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+
+namespace {
+
+bool RE2Consume(StringPiece* sp, const char* pattern) {
+ RegexpStringPiece base_sp = ToRegexpStringPiece(*sp);
+ bool r = RE2::Consume(&base_sp, pattern);
+ *sp = FromRegexpStringPiece(base_sp);
+ return r;
+}
+
+bool RE2Consume(StringPiece* sp, const char* pattern, StringPiece* out) {
+ RegexpStringPiece base_sp = ToRegexpStringPiece(*sp);
+ RegexpStringPiece base_out;
+ bool r = RE2::Consume(&base_sp, pattern, &base_out);
+ *sp = FromRegexpStringPiece(base_sp);
+ *out = FromRegexpStringPiece(base_out);
+ return r;
+}
+
+bool RE2Consume(StringPiece* sp, const char* pattern, int64* out) {
+ RegexpStringPiece base_sp = ToRegexpStringPiece(*sp);
+ bool r = RE2::Consume(&base_sp, pattern, out);
+ *sp = FromRegexpStringPiece(base_sp);
+ return r;
+}
+
+string AttrError(StringPiece orig, const string& op_name) {
+ return strings::StrCat(" from Attr(\"", orig, "\") for Op ", op_name);
+}
+
+#define VERIFY(expr, ...) \
+ do { \
+ if (!(expr)) { \
+ errors->push_back( \
+ strings::StrCat(__VA_ARGS__, AttrError(orig, op_def->name()))); \
+ return; \
+ } \
+ } while (false)
+
+void FinalizeAttr(StringPiece spec, OpDef* op_def,
+ std::vector<string>* errors) {
+ OpDef::AttrDef* attr = op_def->add_attr();
+ StringPiece orig(spec);
+
+ // Parse "<name>:" at the beginning.
+ StringPiece tmp_name;
+ VERIFY(RE2Consume(&spec, "([a-zA-Z][a-zA-Z0-9_]*)\\s*:\\s*", &tmp_name),
+ "Trouble parsing '<name>:'");
+ attr->set_name(tmp_name.data(), tmp_name.size());
+
+ // Read "<type>" or "list(<type>)".
+ bool is_list = RE2Consume(&spec, "list\\s*\\(\\s*");
+ string type;
+ if (spec.Consume("string")) {
+ type = "string";
+ } else if (spec.Consume("int")) {
+ type = "int";
+ } else if (spec.Consume("float")) {
+ type = "float";
+ } else if (spec.Consume("bool")) {
+ type = "bool";
+ } else if (spec.Consume("type")) {
+ type = "type";
+ } else if (spec.Consume("shape")) {
+ type = "shape";
+ } else if (spec.Consume("tensor")) {
+ type = "tensor";
+ } else if (spec.Consume("func")) {
+ type = "func";
+ } else if (spec.Consume("numbertype") || spec.Consume("numerictype")) {
+ type = "type";
+ AttrValue* allowed = attr->mutable_allowed_values();
+ for (DataType dt : NumberTypes()) {
+ allowed->mutable_list()->add_type(dt);
+ }
+ } else if (spec.Consume("quantizedtype")) {
+ type = "type";
+ AttrValue* allowed = attr->mutable_allowed_values();
+ for (DataType dt : QuantizedTypes()) {
+ allowed->mutable_list()->add_type(dt);
+ }
+ } else if (spec.Consume("realnumbertype") ||
+ spec.Consume("realnumerictype")) {
+ type = "type";
+ AttrValue* allowed = attr->mutable_allowed_values();
+ for (DataType dt : RealNumberTypes()) {
+ allowed->mutable_list()->add_type(dt);
+ }
+ } else if (spec.Consume("{")) {
+ // e.g. "{ int32, float, bool }" or "{ \"foo\", \"bar\" }"
+ RE2Consume(&spec, "\\s*");
+ AttrValue* allowed = attr->mutable_allowed_values();
+ if (spec.starts_with("\"") || spec.starts_with("'")) {
+ type = "string"; // "{ \"foo\", \"bar\" }" or "{ 'foo', 'bar' }"
+ while (true) {
+ StringPiece escaped_string;
+ VERIFY((RE2Consume(&spec, R"xx("((?:[^"\\]|\\.)*)"\s*)xx",
+ &escaped_string) ||
+ RE2Consume(&spec, R"xx('((?:[^'\\]|\\.)*)'\s*)xx",
+ &escaped_string)),
+ "Trouble parsing allowed string at '", spec, "'");
+ string unescaped;
+ string error;
+ VERIFY(str_util::CUnescape(escaped_string, &unescaped, &error),
+ "Trouble unescaping \"", escaped_string, "\", got error: ",
+ error);
+ allowed->mutable_list()->add_s(unescaped);
+ if (spec.Consume(",")) {
+ RE2Consume(&spec, "\\s*");
+ if (spec.Consume("}")) break; // Allow ending with ", }".
+ } else {
+ VERIFY(spec.Consume("}"),
+ "Expected , or } after strings in list, not: '", spec, "'");
+ break;
+ }
+ }
+ } else { // "{ int32, float, bool }"
+ type = "type";
+ while (true) {
+ StringPiece type_string;
+ VERIFY(RE2Consume(&spec, "([a-z0-9]+)\\s*", &type_string),
+ "Trouble parsing type string at '", spec, "'");
+ DataType dt;
+ VERIFY(DataTypeFromString(type_string, &dt),
+ "Unrecognized type string '", type_string, "'");
+ allowed->mutable_list()->add_type(dt);
+ if (spec.Consume(",")) {
+ RE2Consume(&spec, "\\s*");
+ if (spec.Consume("}")) break; // Allow ending with ", }".
+ } else {
+ VERIFY(spec.Consume("}"),
+ "Expected , or } after types in list, not: '", spec, "'");
+ break;
+ }
+ }
+ }
+ } else {
+ VERIFY(false, "Trouble parsing type string at '", spec, "'");
+ }
+ RE2Consume(&spec, "\\s*");
+
+ // Write the type into *attr.
+ if (is_list) {
+ VERIFY(spec.Consume(")"), "Expected ) to close 'list(', not: '", spec, "'");
+ RE2Consume(&spec, "\\s*");
+ attr->set_type(strings::StrCat("list(", type, ")"));
+ } else {
+ attr->set_type(type);
+ }
+
+ // Read optional minimum constraint at the end.
+ if ((is_list || type == "int") && spec.Consume(">=")) {
+ int64 min_limit = -999;
+ VERIFY(RE2Consume(&spec, "\\s*(-?\\d+)\\s*", &min_limit),
+ "Could not parse integer lower limit after '>=', found '", spec,
+ "' instead");
+ attr->set_has_minimum(true);
+ attr->set_minimum(min_limit);
+ }
+
+ // Parse default value, if present.
+ if (spec.Consume("=")) {
+ RE2Consume(&spec, "\\s*");
+ VERIFY(ParseAttrValue(attr->type(), spec, attr->mutable_default_value()),
+ "Could not parse default value '", spec, "'");
+ } else {
+ VERIFY(spec.empty(), "Extra '", spec, "' unparsed at the end");
+ }
+}
+
+#undef VERIFY
+
+string InOutError(bool is_output, StringPiece orig, const string& op_name) {
+ return strings::StrCat(" from ", is_output ? "Output" : "Input", "(\"", orig,
+ "\") for Op ", op_name);
+}
+
+#define VERIFY(expr, ...) \
+ do { \
+ if (!(expr)) { \
+ errors->push_back(strings::StrCat( \
+ __VA_ARGS__, InOutError(is_output, orig, op_def->name()))); \
+ return; \
+ } \
+ } while (false)
+
+void FinalizeInputOrOutput(StringPiece spec, bool is_output, OpDef* op_def,
+ std::vector<string>* errors) {
+ OpDef::ArgDef* arg =
+ is_output ? op_def->add_output_arg() : op_def->add_input_arg();
+
+ StringPiece orig(spec);
+
+ // Parse "<name>:" at the beginning.
+ StringPiece tmp_name;
+ VERIFY(RE2Consume(&spec, "([a-z][a-z0-9_]*)\\s*:\\s*", &tmp_name),
+ "Trouble parsing 'name:'");
+ arg->set_name(tmp_name.data(), tmp_name.size());
+
+ // Detect "Ref(...)".
+ if (RE2Consume(&spec, "Ref\\s*\\(\\s*")) {
+ arg->set_is_ref(true);
+ }
+
+ { // Parse "<name|type>" or "<name>*<name|type>".
+ StringPiece first, second, type_or_attr;
+ VERIFY(RE2Consume(&spec, "([a-zA-Z][a-zA-Z0-9_]*)\\s*", &first),
+ "Trouble parsing either a type or an attr name at '", spec, "'");
+ if (RE2Consume(&spec, "[*]\\s*([a-zA-Z][a-zA-Z0-9_]*)\\s*", &second)) {
+ arg->set_number_attr(first.data(), first.size());
+ type_or_attr = second;
+ } else {
+ type_or_attr = first;
+ }
+ DataType dt;
+ if (DataTypeFromString(type_or_attr, &dt)) {
+ arg->set_type(dt);
+ } else {
+ const OpDef::AttrDef* attr = FindAttr(type_or_attr, *op_def);
+ VERIFY(attr != nullptr, "Reference to unknown attr '", type_or_attr, "'");
+ if (attr->type() == "type") {
+ arg->set_type_attr(type_or_attr.data(), type_or_attr.size());
+ } else {
+ VERIFY(attr->type() == "list(type)", "Reference to attr '",
+ type_or_attr, "' with type ", attr->type(),
+ " that isn't type or list(type)");
+ arg->set_type_list_attr(type_or_attr.data(), type_or_attr.size());
+ }
+ }
+ }
+
+ // Closing ) for Ref(.
+ if (arg->is_ref()) {
+ VERIFY(RE2Consume(&spec, "\\)\\s*"),
+ "Did not find closing ')' for 'Ref(', instead found: '", spec, "'");
+ }
+
+ // Should not have anything else.
+ VERIFY(spec.empty(), "Extra '", spec, "' unparsed at the end");
+
+ // Int attrs that are the length of an input or output get a default
+ // minimum of 1.
+ if (!arg->number_attr().empty()) {
+ OpDef::AttrDef* attr = FindAttrMutable(arg->number_attr(), op_def);
+ if (attr != nullptr && !attr->has_minimum()) {
+ attr->set_has_minimum(true);
+ attr->set_minimum(1);
+ }
+ } else if (!arg->type_list_attr().empty()) {
+ // If an input or output has type specified by a list(type) attr,
+ // it gets a default minimum of 1 as well.
+ OpDef::AttrDef* attr = FindAttrMutable(arg->type_list_attr(), op_def);
+ if (attr != nullptr && attr->type() == "list(type)" &&
+ !attr->has_minimum()) {
+ attr->set_has_minimum(true);
+ attr->set_minimum(1);
+ }
+ }
+}
+
+#undef VERIFY
+
+int num_leading_spaces(StringPiece s) {
+ size_t i = 0;
+ while (i < s.size() && s[i] == ' ') {
+ ++i;
+ }
+ return i;
+}
+
+void FinalizeDoc(const string& text, OpDef* op_def,
+ std::vector<string>* errors) {
+ std::vector<string> lines = str_util::Split(text, '\n');
+
+ // Remove trailing spaces.
+ for (string& line : lines) {
+ str_util::StripTrailingWhitespace(&line);
+ }
+
+ // First non-blank line -> summary.
+ int l = 0;
+ while (static_cast<size_t>(l) < lines.size() && lines[l].empty()) ++l;
+ if (static_cast<size_t>(l) < lines.size()) {
+ op_def->set_summary(lines[l]);
+ ++l;
+ }
+ while (static_cast<size_t>(l) < lines.size() && lines[l].empty()) ++l;
+
+ // Lines until we see name: -> description.
+ int start_l = l;
+ while (static_cast<size_t>(l) < lines.size() &&
+ !RE2::PartialMatch(lines[l], "^[a-zA-Z][a-zA-Z0-9_]*\\s*:")) {
+ ++l;
+ }
+ int end_l = l;
+ // Trim trailing blank lines from the description.
+ while (start_l < end_l && lines[end_l - 1].empty()) --end_l;
+ string desc = str_util::Join(
+ gtl::ArraySlice<string>(lines.data() + start_l, end_l - start_l), "\n");
+ if (!desc.empty()) op_def->set_description(desc);
+
+ // name: description
+ // possibly continued on the next line
+ // if so, we remove the minimum indent
+ StringPiece name;
+ std::vector<StringPiece> description;
+ while (static_cast<size_t>(l) < lines.size()) {
+ description.clear();
+ description.push_back(lines[l]);
+ RE2Consume(&description.back(), "([a-zA-Z][a-zA-Z0-9_]*)\\s*:\\s*", &name);
+ ++l;
+ while (static_cast<size_t>(l) < lines.size() &&
+ !RE2::PartialMatch(lines[l], "^[a-zA-Z][a-zA-Z0-9_]*\\s*:")) {
+ description.push_back(lines[l]);
+ ++l;
+ }
+ // Remove any trailing blank lines.
+ while (!description.empty() && description.back().empty()) {
+ description.pop_back();
+ }
+ // Compute the minimum indent of all lines after the first.
+ int min_indent = -1;
+ for (size_t i = 1; i < description.size(); ++i) {
+ if (!description[i].empty()) {
+ int indent = num_leading_spaces(description[i]);
+ if (min_indent < 0 || indent < min_indent) min_indent = indent;
+ }
+ }
+ // Remove min_indent spaces from all lines after the first.
+ for (size_t i = 1; i < description.size(); ++i) {
+ if (!description[i].empty()) description[i].remove_prefix(min_indent);
+ }
+ // Concatenate lines into a single string.
+ const string complete(str_util::Join(description, "\n"));
+
+ // Find name.
+ bool found = false;
+ for (int i = 0; !found && i < op_def->input_arg_size(); ++i) {
+ if (op_def->input_arg(i).name() == name) {
+ op_def->mutable_input_arg(i)->set_description(complete);
+ found = true;
+ }
+ }
+ for (int i = 0; !found && i < op_def->output_arg_size(); ++i) {
+ if (op_def->output_arg(i).name() == name) {
+ op_def->mutable_output_arg(i)->set_description(complete);
+ found = true;
+ }
+ }
+ for (int i = 0; !found && i < op_def->attr_size(); ++i) {
+ if (op_def->attr(i).name() == name) {
+ op_def->mutable_attr(i)->set_description(complete);
+ found = true;
+ }
+ }
+ if (!found) {
+ errors->push_back(
+ strings::StrCat("No matching input/output/attr for name '", name,
+ "' from Doc() for Op ", op_def->name()));
+ return;
+ }
+ }
+}
+
+} // namespace
+
+OpDefBuilder::OpDefBuilder(StringPiece op_name) {
+ op_def_.set_name(op_name.ToString()); // NOLINT
+}
+
+OpDefBuilder& OpDefBuilder::Attr(StringPiece spec) {
+ attrs_.emplace_back(spec.data(), spec.size());
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::Input(StringPiece spec) {
+ inputs_.emplace_back(spec.data(), spec.size());
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::Output(StringPiece spec) {
+ outputs_.emplace_back(spec.data(), spec.size());
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::Doc(StringPiece text) {
+ if (!doc_.empty()) {
+ errors_.push_back(
+ strings::StrCat("Extra call to Doc() for Op ", op_def_.name()));
+ } else {
+ doc_.assign(text.data(), text.size());
+ }
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::SetIsCommutative() {
+ op_def_.set_is_commutative(true);
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::SetIsAggregate() {
+ op_def_.set_is_aggregate(true);
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::SetIsStateful() {
+ op_def_.set_is_stateful(true);
+ return *this;
+}
+
+OpDefBuilder& OpDefBuilder::SetAllowsUninitializedInput() {
+ op_def_.set_allows_uninitialized_input(true);
+ return *this;
+}
+
+Status OpDefBuilder::Finalize(OpDef* op_def) const {
+ std::vector<string> errors = errors_;
+ *op_def = op_def_;
+
+ for (StringPiece attr : attrs_) {
+ FinalizeAttr(attr, op_def, &errors);
+ }
+ for (StringPiece input : inputs_) {
+ FinalizeInputOrOutput(input, false, op_def, &errors);
+ }
+ for (StringPiece output : outputs_) {
+ FinalizeInputOrOutput(output, true, op_def, &errors);
+ }
+ FinalizeDoc(doc_, op_def, &errors);
+
+ if (errors.empty()) return Status::OK();
+ return errors::InvalidArgument(str_util::Join(errors, "\n"));
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_def_builder.h b/tensorflow/core/framework/op_def_builder.h
new file mode 100644
index 0000000000..017338c508
--- /dev/null
+++ b/tensorflow/core/framework/op_def_builder.h
@@ -0,0 +1,109 @@
+// Class and associated machinery for specifying an Op's OpDef for Op
+// registration.
+
+#ifndef TENSORFLOW_FRAMEWORK_OP_DEF_BUILDER_H_
+#define TENSORFLOW_FRAMEWORK_OP_DEF_BUILDER_H_
+
+#include <string>
+#include <vector>
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Builder class passed to the REGISTER_OP() macro.
+class OpDefBuilder {
+ public:
+ // Constructs an OpDef with just the name field set.
+ explicit OpDefBuilder(StringPiece op_name);
+
+ // Adds an attr to this OpDefBuilder (and returns *this). The spec has
+ // format "<name>:<type>" or "<name>:<type>=<default>"
+ // where <name> matches regexp [a-zA-Z][a-zA-Z0-9_]*
+ // (by convention only using capital letters for attrs that can be inferred)
+ // <type> can be:
+ // "string", "int", "float", "bool", "type", "shape", or "tensor"
+ // "numbertype", "realnumbertype", "quantizedtype", "{int32,int64}"
+ // (meaning "type" with a restriction on valid values)
+ // "{\"foo\", \"bar\n baz\"}", or "{'foo', 'bar\n baz'}"
+ // (meaning "string" with a restriction on valid values)
+ // "list(string)", ..., "list(tensor)", "list(numbertype)", ...
+ // (meaning lists of the above types)
+ // "int >= 2" (meaning "int" with a restriction on valid values)
+ // "list(string) >= 2", "list(int) >= 2"
+ // (meaning "list(string)" / "list(int)" with length at least 2)
+ // <default>, if included, should use the Proto text format
+ // of <type>. For lists use [a, b, c] format.
+ //
+ // Note that any attr specifying the length of an input or output will
+ // get a default minimum of 1 unless the >= # syntax is used.
+ //
+ // TODO(josh11b): Perhaps support restrictions and defaults as optional
+ // extra arguments to Attr() instead of encoding them in the spec string.
+ // TODO(josh11b): Would like to have better dtype handling for tensor attrs:
+ // * Ability to say the type of an input/output matches the type of
+ // the tensor.
+ // * Ability to restrict the type of the tensor like the existing
+ // restrictions for type attrs.
+ // Perhaps by linking the type of the tensor to a type attr?
+ OpDefBuilder& Attr(StringPiece spec);
+
+ // Adds an input or output to this OpDefBuilder (and returns *this).
+ // The spec has form "<name>:<type-expr>" or "<name>:Ref(<type-expr>)"
+ // where <name> matches regexp [a-z][a-z0-9_]* and <type-expr> can be:
+ // * For a single tensor: <type>
+ // * For a sequence of tensors with the same type: <number>*<type>
+ // * For a sequence of tensors with different types: <type-list>
+ // Where:
+ // <type> is either one of "float", "int32", "string", ...
+ // or the name of an attr (see above) with type "type".
+ // <number> is the name of an attr with type "int".
+ // <type-list> is the name of an attr with type "list(type)".
+ // TODO(josh11b): Indicate Ref() via an optional argument instead of
+ // in the spec?
+ // TODO(josh11b): SparseInput() and SparseOutput() matching the Python
+ // handling?
+ OpDefBuilder& Input(StringPiece spec);
+ OpDefBuilder& Output(StringPiece spec);
+
+ // Turns on the indicated boolean flag in this OpDefBuilder (and
+ // returns *this).
+ OpDefBuilder& SetIsCommutative();
+ OpDefBuilder& SetIsAggregate();
+ OpDefBuilder& SetIsStateful();
+ OpDefBuilder& SetAllowsUninitializedInput();
+
+ // Adds docs to this OpDefBuilder (and returns *this).
+ // Docs have the format:
+ // <1-line summary>
+ // <rest of the description>
+ // <name>: <description of name>
+ // <name>: <description of name>
+ // <if long, indent the description on subsequent lines>
+ // Where <name> is the name of an attr, input, or output. Please
+ // wrap docs at 72 columns so that it may be indented in the
+ // generated output. For tensor inputs or outputs (not attrs), you
+ // may start the description with an "=" (like name:= <description>)
+ // to suppress the automatically-generated type documentation in
+ // generated output.
+ OpDefBuilder& Doc(StringPiece text);
+
+ // Sets *op_def to the requested OpDef, or returns an error.
+ // Must be called after all of the above methods.
+ // Note that OpDefBuilder only reports parsing errors. You should also
+ // call ValidateOpDef() to detect other problems.
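+ //
+ // Typical use, as a sketch (op and arg names here are illustrative;
+ // ValidateOpDef() is declared in op_def_util.h):
+ //   OpDef op_def;
+ //   TF_RETURN_IF_ERROR(OpDefBuilder("MyOp")
+ //                          .Input("a: int32")
+ //                          .Output("b: int32")
+ //                          .Finalize(&op_def));
+ //   TF_RETURN_IF_ERROR(ValidateOpDef(op_def));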
+ Status Finalize(OpDef* op_def) const;
+
+ private:
+ OpDef op_def_;
+ std::vector<string> attrs_;
+ std::vector<string> inputs_;
+ std::vector<string> outputs_;
+ string doc_;
+ std::vector<string> errors_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_DEF_BUILDER_H_
diff --git a/tensorflow/core/framework/op_def_builder_test.cc b/tensorflow/core/framework/op_def_builder_test.cc
new file mode 100644
index 0000000000..e53bad7075
--- /dev/null
+++ b/tensorflow/core/framework/op_def_builder_test.cc
@@ -0,0 +1,519 @@
+#include "tensorflow/core/framework/op_def_builder.h"
+
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+static void CanonicalizeAttrTypeListOrder(OpDef* def) {
+ for (int i = 0; i < def->attr_size(); i++) {
+ AttrValue* a = def->mutable_attr(i)->mutable_allowed_values();
+ std::sort(a->mutable_list()->mutable_type()->begin(),
+ a->mutable_list()->mutable_type()->end());
+ }
+}
+
+class OpDefBuilderTest : public ::testing::Test {
+ protected:
+ OpDefBuilder b() { return OpDefBuilder("Test"); }
+
+ void ExpectSuccess(const OpDefBuilder& builder, StringPiece proto) {
+ OpDef op_def;
+ Status status = builder.Finalize(&op_def);
+ EXPECT_OK(status);
+ if (status.ok()) {
+ OpDef expected;
+ protobuf::TextFormat::ParseFromString(
+ strings::StrCat("name: 'Test' ", proto), &expected);
+ // Allow different orderings
+ CanonicalizeAttrTypeListOrder(&op_def);
+ CanonicalizeAttrTypeListOrder(&expected);
+ EXPECT_EQ(op_def.ShortDebugString(), expected.ShortDebugString());
+ }
+ }
+
+ void ExpectOrdered(const OpDefBuilder& builder, StringPiece proto) {
+ OpDef op_def;
+ Status status = builder.Finalize(&op_def);
+ EXPECT_OK(status);
+ if (status.ok()) {
+ OpDef expected;
+ protobuf::TextFormat::ParseFromString(
+ strings::StrCat("name: 'Test' ", proto), &expected);
+ EXPECT_EQ(op_def.ShortDebugString(), expected.ShortDebugString());
+ }
+ }
+
+ void ExpectFailure(const OpDefBuilder& builder, string error) {
+ OpDef op_def;
+ Status status = builder.Finalize(&op_def);
+ EXPECT_FALSE(status.ok());
+ if (!status.ok()) {
+ EXPECT_EQ(status.error_message(), error);
+ }
+ }
+};
+
+TEST_F(OpDefBuilderTest, Attr) {
+ ExpectSuccess(b().Attr("a:string"), "attr: { name: 'a' type: 'string' }");
+ ExpectSuccess(b().Attr("A: int"), "attr: { name: 'A' type: 'int' }");
+ ExpectSuccess(b().Attr("a1 :float"), "attr: { name: 'a1' type: 'float' }");
+ ExpectSuccess(b().Attr("a_a : bool"), "attr: { name: 'a_a' type: 'bool' }");
+ ExpectSuccess(b().Attr("aB : type"), "attr: { name: 'aB' type: 'type' }");
+ ExpectSuccess(b().Attr("aB_3\t: shape"),
+ "attr: { name: 'aB_3' type: 'shape' }");
+ ExpectSuccess(b().Attr("t: tensor"), "attr: { name: 't' type: 'tensor' }");
+ ExpectSuccess(b().Attr("XYZ\t:\tlist(type)"),
+ "attr: { name: 'XYZ' type: 'list(type)' }");
+ ExpectSuccess(b().Attr("f: func"), "attr { name: 'f' type: 'func'}");
+}
+
+TEST_F(OpDefBuilderTest, AttrFailure) {
+ ExpectFailure(
+ b().Attr("_:string"),
+ "Trouble parsing '<name>:' from Attr(\"_:string\") for Op Test");
+ ExpectFailure(
+ b().Attr("9:string"),
+ "Trouble parsing '<name>:' from Attr(\"9:string\") for Op Test");
+ ExpectFailure(b().Attr(":string"),
+ "Trouble parsing '<name>:' from Attr(\":string\") for Op Test");
+ ExpectFailure(b().Attr("string"),
+ "Trouble parsing '<name>:' from Attr(\"string\") for Op Test");
+ ExpectFailure(b().Attr("a:invalid"),
+ "Trouble parsing type string at 'invalid' from "
+ "Attr(\"a:invalid\") for Op Test");
+ ExpectFailure(
+ b().Attr("b:"),
+ "Trouble parsing type string at '' from Attr(\"b:\") for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, AttrWithRestrictions) {
+ // Types with restrictions.
+ ExpectSuccess(b().Attr("a:numbertype"),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_FLOAT, DT_DOUBLE, DT_INT64, DT_INT32, DT_UINT8, DT_INT16, "
+ "DT_INT8, DT_COMPLEX64, DT_QINT8, DT_QUINT8, DT_QINT32] } } }");
+ ExpectSuccess(b().Attr("a:realnumbertype"),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_FLOAT, DT_DOUBLE, DT_INT64, DT_INT32, DT_UINT8, DT_INT16, "
+ "DT_INT8] } } }");
+ ExpectSuccess(b().Attr("a:quantizedtype"),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_QINT8, DT_QUINT8, DT_QINT32] } } }");
+ ExpectSuccess(b().Attr("a:{string,int32}"),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_STRING, DT_INT32] } } }");
+ ExpectSuccess(b().Attr("a: { float , complex64 } "),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_FLOAT, DT_COMPLEX64] } } }");
+ ExpectSuccess(b().Attr("a: {float, complex64,} "),
+ "attr: { name: 'a' type: 'type' allowed_values { list { type: "
+ "[DT_FLOAT, DT_COMPLEX64] } }");
+ ExpectSuccess(b().Attr(R"(a: { "X", "yz" })"),
+ "attr: { name: 'a' type: 'string' allowed_values { list { s: "
+ "['X', 'yz'] } } }");
+ ExpectSuccess(b().Attr(R"(a: { "X", "yz", })"),
+ "attr: { name: 'a' type: 'string' allowed_values { list { s: "
+ "['X', 'yz'] } } }");
+ ExpectSuccess(
+ b().Attr("i: int >= -5"),
+ "attr: { name: 'i' type: 'int' has_minimum: true minimum: -5 }");
+}
+
+TEST_F(OpDefBuilderTest, AttrRestrictionFailure) {
+ ExpectFailure(
+ b().Attr("a:{}"),
+ "Trouble parsing type string at '}' from Attr(\"a:{}\") for Op Test");
+ ExpectFailure(
+ b().Attr("a:{,}"),
+ "Trouble parsing type string at ',}' from Attr(\"a:{,}\") for Op Test");
+ ExpectFailure(b().Attr("a:{invalid}"),
+ "Unrecognized type string 'invalid' from Attr(\"a:{invalid}\") "
+ "for Op Test");
+ ExpectFailure(b().Attr("a:{\"str\", float}"),
+ "Trouble parsing allowed string at 'float}' from "
+ "Attr(\"a:{\"str\", float}\") for Op Test");
+ ExpectFailure(b().Attr("a:{ float, \"str\" }"),
+ "Trouble parsing type string at '\"str\" }' from Attr(\"a:{ "
+ "float, \"str\" }\") for Op Test");
+ ExpectFailure(b().Attr("a:{float,,string}"),
+ "Trouble parsing type string at ',string}' from "
+ "Attr(\"a:{float,,string}\") for Op Test");
+ ExpectFailure(b().Attr("a:{float,,}"),
+ "Trouble parsing type string at ',}' from "
+ "Attr(\"a:{float,,}\") for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, AttrListOfRestricted) {
+ ExpectSuccess(
+ b().Attr("a:list(realnumbertype)"),
+ "attr: { name: 'a' type: 'list(type)' allowed_values { list { type: "
+ "[DT_FLOAT, DT_DOUBLE, DT_INT64, DT_INT32, DT_UINT8, DT_INT16, "
+ "DT_INT8] } } }");
+ ExpectSuccess(
+ b().Attr("a:list(quantizedtype)"),
+ "attr: { name: 'a' type: 'list(type)' allowed_values { list { type: "
+ "[DT_QINT8, DT_QUINT8, DT_QINT32] } } }");
+ ExpectSuccess(
+ b().Attr("a: list({float, string, bool})"),
+ "attr: { name: 'a' type: 'list(type)' allowed_values { list { type: "
+ "[DT_FLOAT, DT_STRING, DT_BOOL] } } }");
+ ExpectSuccess(
+ b().Attr(R"(a: list({ "one fish", "two fish" }))"),
+ "attr: { name: 'a' type: 'list(string)' allowed_values { list { s: "
+ "['one fish', 'two fish'] } } }");
+ ExpectSuccess(
+ b().Attr(R"(a: list({ 'red fish', 'blue fish' }))"),
+ "attr: { name: 'a' type: 'list(string)' allowed_values { list { s: "
+ "['red fish', 'blue fish'] } } }");
+ ExpectSuccess(
+ b().Attr(R"(a: list({ "single' ", 'double"' }))"),
+ "attr: { name: 'a' type: 'list(string)' allowed_values { list { s: "
+ "[\"single' \", 'double\"'] } } }");
+ ExpectSuccess(
+ b().Attr(R"(a: list({ 'escape\'\n', "from\\\"NY" }))"),
+ "attr: { name: 'a' type: 'list(string)' allowed_values { list { s: "
+ "[\"escape'\\n\", 'from\\\\\"NY'] } } }");
+}
+
+TEST_F(OpDefBuilderTest, AttrListWithMinLength) {
+ ExpectSuccess(
+ b().Attr("i: list(bool) >= 4"),
+ "attr: { name: 'i' type: 'list(bool)' has_minimum: true minimum: 4 }");
+}
+
+TEST_F(OpDefBuilderTest, AttrWithDefaults) {
+ ExpectSuccess(b().Attr(R"(a:string="foo")"),
+ "attr: { name: 'a' type: 'string' default_value { s:'foo' } }");
+ ExpectSuccess(b().Attr(R"(a:string='foo')"),
+ "attr: { name: 'a' type: 'string' default_value { s:'foo' } }");
+ ExpectSuccess(b().Attr("a:float = 1.25"),
+ "attr: { name: 'a' type: 'float' default_value { f: 1.25 } }");
+ ExpectSuccess(b().Attr("a:tensor = { dtype: DT_INT32 int_val: 5 }"),
+ "attr: { name: 'a' type: 'tensor' default_value { tensor {"
+ " dtype: DT_INT32 int_val: 5 } } }");
+ ExpectSuccess(b().Attr("a:shape = { dim { size: 3 } dim { size: 4 } }"),
+ "attr: { name: 'a' type: 'shape' default_value { shape {"
+ " dim { size: 3 } dim { size: 4 } } } }");
+}
+
+TEST_F(OpDefBuilderTest, AttrFailedDefaults) {
+ ExpectFailure(b().Attr(R"(a:int="foo")"),
+ "Could not parse default value '\"foo\"' from "
+ "Attr(\"a:int=\"foo\"\") for Op Test");
+ ExpectFailure(b().Attr("a:float = [1.25]"),
+ "Could not parse default value '[1.25]' from Attr(\"a:float = "
+ "[1.25]\") for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, AttrListWithDefaults) {
+ ExpectSuccess(b().Attr(R"(a:list(string)=["foo", "bar"])"),
+ "attr: { name: 'a' type: 'list(string)' "
+ "default_value { list { s: ['foo', 'bar'] } } }");
+ ExpectSuccess(b().Attr("a:list(bool)=[true, false, true]"),
+ "attr: { name: 'a' type: 'list(bool)' "
+ "default_value { list { b: [true, false, true] } } }");
+ ExpectSuccess(b().Attr(R"(a:list(int)=[0, -1, 2, -4, 8])"),
+ "attr: { name: 'a' type: 'list(int)' "
+ "default_value { list { i: [0, -1, 2, -4, 8] } } }");
+}
+
+TEST_F(OpDefBuilderTest, AttrFailedListDefaults) {
+ ExpectFailure(b().Attr(R"(a:list(int)=["foo"])"),
+ "Could not parse default value '[\"foo\"]' from "
+ "Attr(\"a:list(int)=[\"foo\"]\") for Op Test");
+ ExpectFailure(b().Attr(R"(a:list(int)=[7, "foo"])"),
+ "Could not parse default value '[7, \"foo\"]' from "
+ "Attr(\"a:list(int)=[7, \"foo\"]\") for Op Test");
+ ExpectFailure(b().Attr("a:list(float) = [[1.25]]"),
+ "Could not parse default value '[[1.25]]' from "
+ "Attr(\"a:list(float) = [[1.25]]\") for Op Test");
+ ExpectFailure(b().Attr("a:list(float) = 1.25"),
+ "Could not parse default value '1.25' from "
+ "Attr(\"a:list(float) = 1.25\") for Op Test");
+ ExpectFailure(b().Attr(R"(a:list(string)='foo')"),
+ "Could not parse default value ''foo'' from "
+ "Attr(\"a:list(string)='foo'\") for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, InputOutput) {
+ ExpectSuccess(b().Input("a: int32"),
+ "input_arg: { name: 'a' type: DT_INT32 }");
+ ExpectSuccess(b().Output("b: string"),
+ "output_arg: { name: 'b' type: DT_STRING }");
+ ExpectSuccess(b().Input("c: float "),
+ "input_arg: { name: 'c' type: DT_FLOAT }");
+ ExpectSuccess(b().Output("d: Ref(bool)"),
+ "output_arg: { name: 'd' type: DT_BOOL is_ref: true }");
+ ExpectOrdered(b().Input("a: bool")
+ .Output("c: complex64")
+ .Input("b: int64")
+ .Output("d: string"),
+ "input_arg: { name: 'a' type: DT_BOOL } "
+ "input_arg: { name: 'b' type: DT_INT64 } "
+ "output_arg: { name: 'c' type: DT_COMPLEX64 } "
+ "output_arg: { name: 'd' type: DT_STRING }");
+}
+
+TEST_F(OpDefBuilderTest, PolymorphicInputOutput) {
+ ExpectSuccess(b().Input("a: foo").Attr("foo: type"),
+ "input_arg: { name: 'a' type_attr: 'foo' } "
+ "attr: { name: 'foo' type: 'type' }");
+ ExpectSuccess(b().Output("a: foo").Attr("foo: { bool, int32 }"),
+ "output_arg: { name: 'a' type_attr: 'foo' } "
+ "attr: { name: 'foo' type: 'type' "
+ "allowed_values: { list { type: [DT_BOOL, DT_INT32] } } }");
+}
+
+TEST_F(OpDefBuilderTest, InputOutputListSameType) {
+ ExpectSuccess(b().Input("a: n * int32").Attr("n: int"),
+ "input_arg: { name: 'a' number_attr: 'n' type: DT_INT32 } "
+ "attr: { name: 'n' type: 'int' has_minimum: true minimum: 1 }");
+ // Polymorphic case:
+ ExpectSuccess(b().Output("b: n * foo").Attr("n: int").Attr("foo: type"),
+ "output_arg: { name: 'b' number_attr: 'n' type_attr: 'foo' } "
+ "attr: { name: 'n' type: 'int' has_minimum: true minimum: 1 } "
+ "attr: { name: 'foo' type: 'type' }");
+}
+
+TEST_F(OpDefBuilderTest, InputOutputListAnyType) {
+ ExpectSuccess(
+ b().Input("c: foo").Attr("foo: list(type)"),
+ "input_arg: { name: 'c' type_list_attr: 'foo' } "
+ "attr: { name: 'foo' type: 'list(type)' has_minimum: true minimum: 1 }");
+ ExpectSuccess(
+ b().Output("c: foo").Attr("foo: list({string, float})"),
+ "output_arg: { name: 'c' type_list_attr: 'foo' } "
+ "attr: { name: 'foo' type: 'list(type)' has_minimum: true minimum: 1 "
+ "allowed_values: { list { type: [DT_STRING, DT_FLOAT] } } }");
+}
+
+TEST_F(OpDefBuilderTest, InputOutputFailure) {
+ ExpectFailure(b().Input("9: int32"),
+ "Trouble parsing 'name:' from Input(\"9: int32\") for Op Test");
+ ExpectFailure(
+ b().Output("_: int32"),
+ "Trouble parsing 'name:' from Output(\"_: int32\") for Op Test");
+ ExpectFailure(b().Input(": int32"),
+ "Trouble parsing 'name:' from Input(\": int32\") for Op Test");
+ ExpectFailure(b().Output("int32"),
+ "Trouble parsing 'name:' from Output(\"int32\") for Op Test");
+ ExpectFailure(
+ b().Input("CAPS: int32"),
+ "Trouble parsing 'name:' from Input(\"CAPS: int32\") for Op Test");
+ ExpectFailure(b().Input("a: _"),
+ "Trouble parsing either a type or an attr name at '_' from "
+ "Input(\"a: _\") for Op Test");
+ ExpectFailure(b().Input("a: 9"),
+ "Trouble parsing either a type or an attr name at '9' from "
+ "Input(\"a: 9\") for Op Test");
+ ExpectFailure(b().Input("a: 9 * int32"),
+ "Trouble parsing either a type or an attr name at '9 * int32' "
+ "from Input(\"a: 9 * int32\") for Op Test");
+ ExpectFailure(
+ b().Input("a: x * _").Attr("x: type"),
+ "Extra '* _' unparsed at the end from Input(\"a: x * _\") for Op Test");
+ ExpectFailure(b().Input("a: x * y extra").Attr("x: int").Attr("y: type"),
+ "Extra 'extra' unparsed at the end from Input(\"a: x * y "
+ "extra\") for Op Test");
+ ExpectFailure(b().Input("a: Ref(int32"),
+ "Did not find closing ')' for 'Ref(', instead found: '' from "
+ "Input(\"a: Ref(int32\") for Op Test");
+ ExpectFailure(b().Input("a: Ref(x y").Attr("x: type"),
+ "Did not find closing ')' for 'Ref(', instead found: 'y' from "
+ "Input(\"a: Ref(x y\") for Op Test");
+ ExpectFailure(
+ b().Input("a: x"),
+ "Reference to unknown attr 'x' from Input(\"a: x\") for Op Test");
+ ExpectFailure(
+ b().Input("a: x * y").Attr("x: int"),
+ "Reference to unknown attr 'y' from Input(\"a: x * y\") for Op Test");
+ ExpectFailure(b().Input("a: x").Attr("x: int"),
+ "Reference to attr 'x' with type int that isn't type or "
+ "list(type) from Input(\"a: x\") for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, Set) {
+ ExpectSuccess(b().SetIsStateful(), "is_stateful: true");
+ ExpectSuccess(b().SetIsCommutative().SetIsAggregate(),
+ "is_commutative: true is_aggregate: true");
+}
+
+TEST_F(OpDefBuilderTest, DocUnpackSparseFeatures) {
+ ExpectOrdered(b().Input("sf: string")
+ .Output("indices: int32")
+ .Output("ids: int64")
+ .Output("weights: float")
+ .Doc(R"doc(
+Converts a vector of strings with dist_belief::SparseFeatures to tensors.
+
+Note that indices, ids and weights are vectors of the same size and have
+one-to-one correspondence between their elements. ids and weights are each
+obtained by sequentially concatenating sf[i].id and sf[i].weight, for i in
+1...size(sf). Note that if sf[i].weight is not set, the default value for the
+weight is assumed to be 1.0. Also for any j, if ids[j] and weights[j] were
+extracted from sf[i], then index[j] is set to i.
+
+sf: vector of string, where each element is the string encoding of
+ SparseFeatures proto.
+indices: vector of indices inside sf
+ids: vector of id extracted from the SparseFeatures proto.
+weights: vector of weight extracted from the SparseFeatures proto.
+)doc"),
+ R"proto(
+input_arg {
+ name: "sf"
+ description: "vector of string, where each element is the string encoding of\nSparseFeatures proto."
+ type: DT_STRING
+}
+output_arg {
+ name: "indices"
+ description: "vector of indices inside sf"
+ type: DT_INT32
+}
+output_arg {
+ name: "ids"
+ description: "vector of id extracted from the SparseFeatures proto."
+ type: DT_INT64
+}
+output_arg {
+ name: "weights"
+ description: "vector of weight extracted from the SparseFeatures proto."
+ type: DT_FLOAT
+}
+summary: "Converts a vector of strings with dist_belief::SparseFeatures to tensors."
+description: "Note that indices, ids and weights are vectors of the same size and have\none-to-one correspondence between their elements. ids and weights are each\nobtained by sequentially concatenating sf[i].id and sf[i].weight, for i in\n1...size(sf). Note that if sf[i].weight is not set, the default value for the\nweight is assumed to be 1.0. Also for any j, if ids[j] and weights[j] were\nextracted from sf[i], then index[j] is set to i."
+)proto");
+}
+
+TEST_F(OpDefBuilderTest, DocConcat) {
+ ExpectOrdered(b().Input("concat_dim: int32")
+ .Input("values: num_values * dtype")
+ .Output("output: dtype")
+ .Attr("dtype: type")
+ .Attr("num_values: int >= 2")
+ .Doc(R"doc(
+Concatenate N Tensors along one dimension.
+
+concat_dim: The (scalar) dimension along which to concatenate. Must be
+ in the range [0, rank(values...)).
+values: The N Tensors to concatenate. Their ranks and types must match,
+ and their sizes must match in all dimensions except concat_dim.
+output: A Tensor with the concatenation of values stacked along the
+ concat_dim dimension. This Tensor's shape matches the Tensors in
+ values, except in concat_dim where it has the sum of the sizes.
+)doc"),
+ R"proto(
+input_arg {
+ name: "concat_dim"
+ description: "The (scalar) dimension along which to concatenate. Must be\nin the range [0, rank(values...))."
+ type: DT_INT32
+}
+input_arg {
+ name: "values"
+ description: "The N Tensors to concatenate. Their ranks and types must match,\nand their sizes must match in all dimensions except concat_dim."
+ type_attr: "dtype"
+ number_attr: "num_values"
+}
+output_arg {
+ name: "output"
+ description: "A Tensor with the concatenation of values stacked along the\nconcat_dim dimension. This Tensor\'s shape matches the Tensors in\nvalues, except in concat_dim where it has the sum of the sizes."
+ type_attr: "dtype"
+}
+summary: "Concatenate N Tensors along one dimension."
+attr {
+ name: "dtype"
+ type: "type"
+}
+attr {
+ name: "num_values"
+ type: "int"
+ has_minimum: true
+ minimum: 2
+}
+)proto");
+}
+
+TEST_F(OpDefBuilderTest, DocAttr) {
+ ExpectOrdered(b().Attr("i: int").Doc(R"doc(
+Summary
+
+i: How much to operate.
+)doc"),
+ R"proto(
+summary: "Summary"
+attr {
+ name: "i"
+ type: "int"
+ description: "How much to operate."
+}
+)proto");
+}
+
+TEST_F(OpDefBuilderTest, DocCalledTwiceFailure) {
+ ExpectFailure(b().Doc("What's").Doc("up, doc?"),
+ "Extra call to Doc() for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, DocFailureMissingName) {
+ ExpectFailure(
+ b().Input("a: int32").Doc(R"doc(
+Summary
+
+a: Something for a.
+b: b is not defined.
+)doc"),
+ "No matching input/output/attr for name 'b' from Doc() for Op Test");
+
+ ExpectFailure(
+ b().Input("a: int32").Doc(R"doc(
+Summary
+
+b: b is not defined and by itself.
+)doc"),
+ "No matching input/output/attr for name 'b' from Doc() for Op Test");
+}
+
+TEST_F(OpDefBuilderTest, DefaultMinimum) {
+ ExpectSuccess(b().Input("values: num_values * dtype")
+ .Output("output: anything")
+ .Attr("anything: list(type)")
+ .Attr("dtype: type")
+ .Attr("num_values: int"),
+ R"proto(
+input_arg {
+ name: "values"
+ type_attr: "dtype"
+ number_attr: "num_values"
+}
+output_arg {
+ name: "output"
+ type_list_attr: "anything"
+}
+attr {
+ name: "anything"
+ type: "list(type)"
+ has_minimum: true
+ minimum: 1
+}
+attr {
+ name: "dtype"
+ type: "type"
+}
+attr {
+ name: "num_values"
+ type: "int"
+ has_minimum: true
+ minimum: 1
+}
+)proto");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_def_util.cc b/tensorflow/core/framework/op_def_util.cc
new file mode 100644
index 0000000000..e3aef011de
--- /dev/null
+++ b/tensorflow/core/framework/op_def_util.cc
@@ -0,0 +1,344 @@
+#include "tensorflow/core/framework/op_def_util.h"
+
+#include <set>
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+namespace { // ------ Helper functions ------
+
+bool HasAttrStyleType(const OpDef::ArgDef& arg) {
+ return arg.type() != DT_INVALID || !arg.type_attr().empty() ||
+ !arg.type_list_attr().empty();
+}
+
+Status AllowedTypeValue(DataType dt, const OpDef::AttrDef& attr) {
+ const AttrValue& allowed_values(attr.allowed_values());
+ for (auto allowed : allowed_values.list().type()) {
+ if (dt == allowed) {
+ return Status::OK();
+ }
+ }
+ string allowed_str;
+ for (int i = 0; i < allowed_values.list().type_size(); ++i) {
+ if (!allowed_str.empty()) {
+ strings::StrAppend(&allowed_str, ", ");
+ }
+ strings::StrAppend(&allowed_str,
+ DataTypeString(allowed_values.list().type(i)));
+ }
+ return errors::InvalidArgument(
+ "Value for attr '", attr.name(), "' of ", DataTypeString(dt),
+ " is not in the list of allowed values: ", allowed_str);
+}
+
+Status AllowedStringValue(const string& str, const OpDef::AttrDef& attr) {
+ const AttrValue& allowed_values(attr.allowed_values());
+ for (auto allowed : allowed_values.list().s()) {
+ if (str == allowed) {
+ return Status::OK();
+ }
+ }
+ string allowed_str;
+ for (const string& allowed : allowed_values.list().s()) {
+ if (!allowed_str.empty()) {
+ strings::StrAppend(&allowed_str, ", ");
+ }
+ strings::StrAppend(&allowed_str, "\"", allowed, "\"");
+ }
+ return errors::InvalidArgument(
+ "Value for attr '", attr.name(), "' of \"", str,
+ "\" is not in the list of allowed values: ", allowed_str);
+}
+
+} // namespace
+
+// Requires: attr has already been validated.
+Status ValidateAttrValue(const AttrValue& attr_value,
+ const OpDef::AttrDef& attr) {
+ // Is it a valid value?
+ TF_RETURN_WITH_CONTEXT_IF_ERROR(AttrValueHasType(attr_value, attr.type()),
+ " for attr '", attr.name(), "'");
+
+ // Does the value satisfy the minimum constraint in the AttrDef?
+ if (attr.has_minimum()) {
+ if (attr.type() == "int") {
+ if (attr_value.i() < attr.minimum()) {
+ return errors::InvalidArgument(
+ "Value for attr '", attr.name(), "' of ", attr_value.i(),
+ " must be at least minimum ", attr.minimum());
+ }
+ } else {
+ int length = -1;
+ if (attr.type() == "list(string)") {
+ length = attr_value.list().s_size();
+ } else if (attr.type() == "list(int)") {
+ length = attr_value.list().i_size();
+ } else if (attr.type() == "list(float)") {
+ length = attr_value.list().f_size();
+ } else if (attr.type() == "list(bool)") {
+ length = attr_value.list().b_size();
+ } else if (attr.type() == "list(type)") {
+ length = attr_value.list().type_size();
+ } else if (attr.type() == "list(shape)") {
+ length = attr_value.list().shape_size();
+ } else if (attr.type() == "list(tensor)") {
+ length = attr_value.list().tensor_size();
+ }
+ if (length < attr.minimum()) {
+ return errors::InvalidArgument(
+ "Length for attr '", attr.name(), "' of ", length,
+ " must be at least minimum ", attr.minimum());
+ }
+ }
+ }
+
+ // Does the value satisfy the allowed_value constraint in the AttrDef?
+ if (attr.has_allowed_values()) {
+ if (attr.type() == "type") {
+ TF_RETURN_IF_ERROR(AllowedTypeValue(attr_value.type(), attr));
+ } else if (attr.type() == "list(type)") {
+ for (int dt : attr_value.list().type()) {
+ TF_RETURN_IF_ERROR(AllowedTypeValue(static_cast<DataType>(dt), attr));
+ }
+ } else if (attr.type() == "string") {
+ TF_RETURN_IF_ERROR(AllowedStringValue(attr_value.s(), attr));
+ } else if (attr.type() == "list(string)") {
+ for (const string& str : attr_value.list().s()) {
+ TF_RETURN_IF_ERROR(AllowedStringValue(str, attr));
+ }
+ } else {
+ return errors::Unimplemented(
+ "Support for allowed_values not implemented for type ", attr.type());
+ }
+ }
+ return Status::OK();
+}
+
+const OpDef::AttrDef* FindAttr(StringPiece name, const OpDef& op_def) {
+ for (int i = 0; i < op_def.attr_size(); ++i) {
+ if (op_def.attr(i).name() == name) {
+ return &op_def.attr(i);
+ }
+ }
+ return nullptr;
+}
+
+OpDef::AttrDef* FindAttrMutable(StringPiece name, OpDef* op_def) {
+ for (int i = 0; i < op_def->attr_size(); ++i) {
+ if (op_def->attr(i).name() == name) {
+ return op_def->mutable_attr(i);
+ }
+ }
+ return nullptr;
+}
+
+#define VALIDATE(EXPR, ...) \
+ do { \
+ if (!(EXPR)) { \
+ return errors::InvalidArgument(__VA_ARGS__, "; in OpDef: ", \
+ op_def.ShortDebugString()); \
+ } \
+ } while (false)
+
+static Status ValidateArg(const OpDef::ArgDef& arg, const OpDef& op_def,
+ bool output, std::set<string>* names) {
+ const string suffix = strings::StrCat(
+ output ? " for output '" : " for input '", arg.name(), "'");
+ VALIDATE(gtl::InsertIfNotPresent(names, arg.name()), "Duplicate name: ",
+ arg.name());
+ VALIDATE(HasAttrStyleType(arg), "Missing type", suffix);
+
+ if (!arg.number_attr().empty()) {
+ const OpDef::AttrDef* attr = FindAttr(arg.number_attr(), op_def);
+ VALIDATE(attr != nullptr, "No attr with name '", arg.number_attr(), "'",
+ suffix);
+ VALIDATE(attr->type() == "int", "Attr '", attr->name(), "' used as length",
+ suffix, " has type ", attr->type(), " != int");
+ VALIDATE(attr->has_minimum(), "Attr '", attr->name(), "' used as length",
+ suffix, " must have minimum");
+ VALIDATE(attr->minimum() >= 0, "Attr '", attr->name(), "' used as length",
+ suffix, " must have minimum >= 0");
+ VALIDATE(arg.type_list_attr().empty(),
+ "Can't have both number_attr and type_list_attr", suffix);
+ VALIDATE((arg.type() != DT_INVALID ? 1 : 0) +
+ (!arg.type_attr().empty() ? 1 : 0) ==
+ 1,
+ "Exactly one of type, type_attr must be set", suffix);
+ } else {
+ const int num_type_fields = (arg.type() != DT_INVALID ? 1 : 0) +
+ (!arg.type_attr().empty() ? 1 : 0) +
+ (!arg.type_list_attr().empty() ? 1 : 0);
+ VALIDATE(num_type_fields == 1,
+ "Exactly one of type, type_attr, type_list_attr must be set",
+ suffix);
+ }
+
+ if (!arg.type_attr().empty()) {
+ const OpDef::AttrDef* attr = FindAttr(arg.type_attr(), op_def);
+ VALIDATE(attr != nullptr, "No attr with name '", arg.type_attr(), "'",
+ suffix);
+ VALIDATE(attr->type() == "type", "Attr '", attr->name(),
+ "' used as type_attr", suffix, " has type ", attr->type(),
+ " != type");
+ } else if (!arg.type_list_attr().empty()) {
+ const OpDef::AttrDef* attr = FindAttr(arg.type_list_attr(), op_def);
+ VALIDATE(attr != nullptr, "No attr with name '", arg.type_list_attr(), "'",
+ suffix);
+ VALIDATE(attr->type() == "list(type)", "Attr '", attr->name(),
+ "' used as type_list_attr", suffix, " has type ", attr->type(),
+ " != list(type)");
+ } else {
+ // All argument types should be non-reference types at this point.
+ // ArgDef.is_ref is set to true for reference arguments.
+ VALIDATE(!IsRefType(arg.type()), "Illegal use of ref type '",
+ DataTypeString(arg.type()), "'. Use 'Ref(type)' instead", suffix);
+ }
+
+ return Status::OK();
+}
+
+Status ValidateOpDef(const OpDef& op_def) {
+ VALIDATE(RE2::FullMatch(op_def.name(), "(?:_.*|[A-Z][a-zA-Z0-9]*)"),
+ "Invalid name: ", op_def.name(), " (Did you use CamelCase?)");
+
+ std::set<string> names; // for detecting duplicate names
+ for (const auto& attr : op_def.attr()) {
+ // Validate name
+ VALIDATE(gtl::InsertIfNotPresent(&names, attr.name()), "Duplicate name: ",
+ attr.name());
+ DataType dt;
+ VALIDATE(!DataTypeFromString(attr.name(), &dt), "Attr can't have name ",
+ attr.name(), " that matches a data type");
+
+ // Validate type
+ StringPiece type(attr.type());
+ bool is_list = type.Consume("list(");
+ bool found = false;
+ for (StringPiece valid : {"string", "int", "float", "bool", "type", "shape",
+ "tensor", "func"}) {
+ if (type.Consume(valid)) {
+ found = true;
+ break;
+ }
+ }
+ VALIDATE(found, "Unrecognized type '", type, "' in attr '", attr.name(),
+ "'");
+ if (is_list) {
+ VALIDATE(type.Consume(")"), "'list(' is missing ')' in attr ",
+ attr.name(), "'s type ", attr.type());
+ }
+ VALIDATE(type.empty(), "Extra '", type, "' at the end of attr ",
+ attr.name(), "'s type ", attr.type());
+
+ // Validate minimum
+ if (attr.has_minimum()) {
+ VALIDATE(attr.type() == "int" || is_list, "Attr '", attr.name(),
+ "' has minimum for unsupported type ", attr.type());
+ if (is_list) {
+ VALIDATE(attr.minimum() >= 0, "Attr '", attr.name(),
+ "' with list type must have a non-negative minimum, not ",
+ attr.minimum());
+ }
+ } else {
+ VALIDATE(attr.minimum() == 0, "Attr '", attr.name(),
+ "' with has_minimum = false but minimum ", attr.minimum(),
+ " not equal to default of 0");
+ }
+
+ // Validate allowed_values
+ if (attr.has_allowed_values()) {
+ const string list_type =
+ is_list ? attr.type() : strings::StrCat("list(", attr.type(), ")");
+ TF_RETURN_WITH_CONTEXT_IF_ERROR(
+ AttrValueHasType(attr.allowed_values(), list_type), " for attr '",
+ attr.name(), "' in Op '", op_def.name(), "'");
+ }
+
+ // Validate default_value (after we have validated the rest of the attr,
+ // so we can use ValidateAttrValue()).
+ if (attr.has_default_value()) {
+ TF_RETURN_WITH_CONTEXT_IF_ERROR(
+ ValidateAttrValue(attr.default_value(), attr), " in Op '",
+ op_def.name(), "'");
+ }
+ }
+
+ for (const auto& arg : op_def.input_arg()) {
+ TF_RETURN_IF_ERROR(ValidateArg(arg, op_def, false, &names));
+ }
+
+ for (const auto& arg : op_def.output_arg()) {
+ TF_RETURN_IF_ERROR(ValidateArg(arg, op_def, true, &names));
+ }
+
+ return Status::OK();
+}
+
+#undef VALIDATE
+
+namespace {
+
+string SummarizeArgs(const protobuf::RepeatedPtrField<OpDef::ArgDef>& args) {
+ string ret;
+ for (const OpDef::ArgDef& arg : args) {
+ if (!ret.empty()) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, arg.name(), ":");
+ if (arg.is_ref()) strings::StrAppend(&ret, "Ref(");
+ if (!arg.number_attr().empty()) {
+ strings::StrAppend(&ret, arg.number_attr(), "*");
+ }
+ if (arg.type() != DT_INVALID) {
+ strings::StrAppend(&ret, DataTypeString(arg.type()));
+ } else {
+ strings::StrAppend(&ret, arg.type_attr());
+ }
+ if (arg.is_ref()) strings::StrAppend(&ret, ")");
+ }
+ return ret;
+}
+
+} // namespace
+
+string SummarizeOpDef(const OpDef& op_def) {
+ string ret = strings::StrCat("Op<name=", op_def.name());
+ strings::StrAppend(&ret, "; signature=", SummarizeArgs(op_def.input_arg()),
+ " -> ", SummarizeArgs(op_def.output_arg()));
+ for (int i = 0; i < op_def.attr_size(); ++i) {
+ strings::StrAppend(&ret, "; attr=", op_def.attr(i).name(), ":",
+ op_def.attr(i).type());
+ if (op_def.attr(i).has_default_value()) {
+ strings::StrAppend(&ret, ",default=",
+ SummarizeAttrValue(op_def.attr(i).default_value()));
+ }
+ if (op_def.attr(i).has_minimum()) {
+ strings::StrAppend(&ret, ",min=", op_def.attr(i).minimum());
+ }
+ if (op_def.attr(i).has_allowed_values()) {
+ strings::StrAppend(&ret, ",allowed=",
+ SummarizeAttrValue(op_def.attr(i).allowed_values()));
+ }
+ }
+ if (op_def.is_commutative()) {
+ strings::StrAppend(&ret, "; is_commutative=true");
+ }
+ if (op_def.is_aggregate()) {
+ strings::StrAppend(&ret, "; is_aggregate=true");
+ }
+ if (op_def.is_stateful()) {
+ strings::StrAppend(&ret, "; is_stateful=true");
+ }
+ if (op_def.allows_uninitialized_input()) {
+ strings::StrAppend(&ret, "; allows_uninitialized_input=true");
+ }
+ strings::StrAppend(&ret, ">");
+ return ret;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_def_util.h b/tensorflow/core/framework/op_def_util.h
new file mode 100644
index 0000000000..a9fecf3fa0
--- /dev/null
+++ b/tensorflow/core/framework/op_def_util.h
@@ -0,0 +1,32 @@
+// TODO(josh11b): Probably not needed for OpKernel authors, so doesn't
+// need to be as publicly accessible as other files in framework/.
+
+#ifndef TENSORFLOW_FRAMEWORK_OP_DEF_UTIL_H_
+#define TENSORFLOW_FRAMEWORK_OP_DEF_UTIL_H_
+
+#include <string>
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Performs a consistency check across the fields of the op_def.
+Status ValidateOpDef(const OpDef& op_def);
+
+// Validates that attr_value satisfies the type and constraints from attr.
+// REQUIRES: attr has already been validated.
+Status ValidateAttrValue(const AttrValue& attr_value,
+ const OpDef::AttrDef& attr);
+
+// The following functions search through op_def for an attr with the
+// indicated name.
+// Returns nullptr if no such attr is found.
+const OpDef::AttrDef* FindAttr(StringPiece name, const OpDef& op_def);
+OpDef::AttrDef* FindAttrMutable(StringPiece name, OpDef* op_def);
+
+// Produces a human-readable version of an op_def that is more concise
+// than a text-format proto. Excludes descriptions.
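+// For example, an op "Foo" (name illustrative) with one int32 input "a" and
+// an int attr "n" summarizes roughly as:
+//   Op<name=Foo; signature=a:int32 -> ; attr=n:int>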
+string SummarizeOpDef(const OpDef& op_def);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_DEF_UTIL_H_
diff --git a/tensorflow/core/framework/op_def_util_test.cc b/tensorflow/core/framework/op_def_util_test.cc
new file mode 100644
index 0000000000..515e8bb288
--- /dev/null
+++ b/tensorflow/core/framework/op_def_util_test.cc
@@ -0,0 +1,330 @@
+#include "tensorflow/core/framework/op_def_util.h"
+
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/op_def_builder.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+OpDef FromText(const string& text) {
+ OpDef op_def;
+ EXPECT_TRUE(protobuf::TextFormat::MergeFromString(text, &op_def));
+ return op_def;
+}
+
+class ValidateOpDefTest : public ::testing::Test {
+ protected:
+ Status TestProto(const string& text) {
+ return ValidateOpDef(FromText(text));
+ }
+
+ Status TestBuilder(const OpDefBuilder& builder) {
+ OpDef op_def;
+ Status status = builder.Finalize(&op_def);
+ EXPECT_OK(status);
+ if (!status.ok()) {
+ return status;
+ } else {
+ return ValidateOpDef(op_def);
+ }
+ }
+
+ void ExpectFailure(const Status& status, const string& message) {
+ EXPECT_FALSE(status.ok()) << "Did not see error with: " << message;
+ if (!status.ok()) {
+ LOG(INFO) << "message: " << status;
+ EXPECT_TRUE(StringPiece(status.ToString()).contains(message))
+ << "Actual: " << status << "\nExpected to contain: " << message;
+ }
+ }
+};
+
+TEST_F(ValidateOpDefTest, OpDefValid) {
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: int")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Input("a: int32")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Output("a: bool")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("t: type").Input("a: t")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: int = 3")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: int >= -5")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: int >= -5")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: int >= -5 = 3")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("X").Attr("a: numbertype")));
+ EXPECT_OK(TestBuilder(OpDefBuilder("Uppercase")));
+}
+
+TEST_F(ValidateOpDefTest, InvalidName) {
+ ExpectFailure(TestBuilder(OpDefBuilder("lower").Attr("a: int")),
+ "Invalid name");
+ ExpectFailure(TestBuilder(OpDefBuilder("BadSuffix 7%")), "Invalid name");
+}
+
+TEST_F(ValidateOpDefTest, DuplicateName) {
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("DupeName").Input("a: int32").Input("a: float")),
+ "Duplicate name: a");
+ ExpectFailure(
+ TestBuilder(
+ OpDefBuilder("DupeName").Input("a: int32").Output("a: float")),
+ "Duplicate name: a");
+ ExpectFailure(
+ TestBuilder(
+ OpDefBuilder("DupeName").Output("a: int32").Output("a: float")),
+ "Duplicate name: a");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("DupeName").Input("a: int32").Attr("a: float")),
+ "Duplicate name: a");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("DupeName").Output("a: int32").Attr("a: float")),
+ "Duplicate name: a");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("DupeName").Attr("a: int").Attr("a: float")),
+ "Duplicate name: a");
+}
+
+TEST_F(ValidateOpDefTest, BadAttrName) {
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrtude").Attr("int32: int")),
+ "Attr can't have name int32 that matches a data type");
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrtude").Attr("float: string")),
+ "Attr can't have name float that matches a data type");
+}
+
+TEST_F(ValidateOpDefTest, BadAttrType) {
+ ExpectFailure(
+ TestProto("name: 'BadAttrType' attr { name: 'a' type: 'illegal' }"),
+ "Unrecognized type");
+ ExpectFailure(
+ TestProto("name: 'BadAttrType' attr { name: 'a' type: 'list(illegal)' }"),
+ "Unrecognized type");
+ ExpectFailure(
+ TestProto("name: 'BadAttrType' attr { name: 'a' type: 'int extra' }"),
+ "Extra ' extra' at the end");
+ ExpectFailure(
+ TestProto(
+ "name: 'BadAttrType' attr { name: 'a' type: 'list(int extra)' }"),
+ "'list(' is missing ')' in attr");
+ ExpectFailure(
+ TestProto(
+ "name: 'BadAttrType' attr { name: 'a' type: 'list(int) extra' }"),
+ "Extra ' extra' at the end");
+}
+
+TEST_F(ValidateOpDefTest, BadAttrDefault) {
+ ExpectFailure(
+ TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'int' default_value { s: 'x' } }"),
+ "AttrValue had value with type string when int expected\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'int' default_value { f: 0.5 } }"),
+ "AttrValue had value with type float when int expected\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(
+ TestProto("name: 'BadAttrDef' attr { name: 'a' type: 'int' "
+ "default_value { i: 5 list { i: [2] } } }"),
+ "AttrValue had value with type list(int) when int expected\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(
+ TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'list(int)' default_value { f: 0.5 } }"),
+ "AttrValue had value with type float when list(int) expected\n\t "
+ "for attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(
+ TestProto("name: 'BadAttrDef' attr { name: 'a' type: 'list(int)' "
+ "default_value { list { i: [5] f: [0.5] } } }"),
+ "AttrValue had value with type list(float) when list(int) "
+ "expected\n\t for attr 'a'\n\t in Op 'BadAttrDef'");
+
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'type' default_value { } }"),
+ "AttrValue missing value with expected type type\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'shape' default_value { } }"),
+ "AttrValue missing value with expected type shape\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'tensor' default_value { } }"),
+ "AttrValue missing value with expected type tensor\n\t for "
+ "attr 'a'\n\t in Op 'BadAttrDef'");
+
+ // default_value {} is indistinguishable from default_value{ list{} } (one
+ // with an empty list) in proto3 semantics.
+ EXPECT_OK(
+ TestProto("name: 'GoodAttrDef' attr { name: 'a' "
+ "type: 'list(int)' default_value { } }"));
+
+ // Empty lists are allowed:
+ EXPECT_OK(
+ TestProto("name: 'GoodAttrDef' attr { name: 'a' "
+ "type: 'list(int)' default_value { list { } } }"));
+ // Builder should make the same proto:
+ EXPECT_OK(TestBuilder(OpDefBuilder("GoodAttrDef").Attr("a: list(int) = []")));
+
+ // Unless there is a minimum length specified:
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' "
+ "type: 'list(int)' has_minimum: true minimum: 2 "
+ "default_value { list { } } }"),
+ "Length for attr 'a' of 0 must be at least minimum 2\n\t in Op "
+ "'BadAttrDef'");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("GoodAttrDef").Attr("a: list(bool) >=2 = []")),
+ "Length for attr 'a' of 0 must be at least minimum 2\n\t in Op "
+ "'GoodAttrDef'");
+ ExpectFailure(TestProto("name: 'BadAttrDef' attr { name: 'a' type: "
+ "'list(string)' has_minimum: true minimum: 2 "
+ "default_value { list { s: ['foo'] } } }"),
+ "Length for attr 'a' of 1 must be at least minimum 2\n\t in Op "
+ "'BadAttrDef'");
+ ExpectFailure(TestBuilder(OpDefBuilder("GoodAttrDef")
+ .Attr("a: list(type) >=2 = [DT_STRING]")),
+ "Length for attr 'a' of 1 must be at least minimum 2\n\t in Op "
+ "'GoodAttrDef'");
+}
+
+TEST_F(ValidateOpDefTest, NoRefTypes) {
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrDef").Input("i: float_ref")),
+ "Illegal use of ref type 'float_ref'. "
+ "Use 'Ref(type)' instead for input 'i'");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("BadAttrDef").Attr("T: type = DT_INT32_REF")),
+ "AttrValue must not have reference type value of int32_ref");
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrDef")
+ .Attr("T: list(type) = [DT_STRING_REF]")),
+ "AttrValue must not have reference type value of string_ref");
+}
+
+TEST_F(ValidateOpDefTest, BadAttrMin) {
+ ExpectFailure(TestProto("name: 'BadAttrMin' attr { name: 'a' type: 'string' "
+ "has_minimum: true minimum: 0 }"),
+ "minimum for unsupported type string");
+ ExpectFailure(
+ TestProto("name: 'BadAttrMin' attr { name: 'a' type: 'int' default_value "
+ "{ i: 2 } has_minimum: true minimum: 7 }"),
+ "Value for attr 'a' of 2 must be at least minimum 7\n\t in Op "
+ "'BadAttrMin'");
+ ExpectFailure(
+ TestProto("name: 'BadAttrMin' attr { name: 'a' "
+ "type: 'list(string)' has_minimum: true minimum: -5 }"),
+ "list type must have a non-negative minimum, not -5");
+ EXPECT_OK(
+ TestProto("name: 'GoodAttrMin' attr { name: 'a' type: 'list(string)' "
+ "has_minimum: true minimum: 1 }"));
+ ExpectFailure(TestProto("name: 'NoHasMin' attr { name: 'a' "
+ "type: 'list(string)' minimum: 3 }"),
+ "Attr 'a' with has_minimum = false but minimum 3 not equal to "
+ "default of 0");
+}
+
+TEST_F(ValidateOpDefTest, BadAttrAllowed) {
+ // Is in list of allowed types.
+ EXPECT_OK(TestBuilder(
+ OpDefBuilder("GoodAttrtude").Attr("x: numbertype = DT_INT32")));
+ // Not in list of allowed types.
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrtude")
+ .Attr("x: numbertype = DT_STRING")),
+ "attr 'x' of string is not in the list of allowed values");
+ ExpectFailure(
+ TestBuilder(OpDefBuilder("BadAttrtude")
+ .Attr("x: list(realnumbertype) = [DT_COMPLEX64]")),
+ "attr 'x' of complex64 is not in the list of allowed values");
+ // Is in list of allowed strings.
+ EXPECT_OK(TestBuilder(
+ OpDefBuilder("GoodAttrtude").Attr("x: {'foo', 'bar'} = 'bar'")));
+ // Not in list of allowed strings.
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrtude")
+ .Attr("x: {'foo', 'bar'} = 'baz'")),
+ "attr 'x' of \"baz\" is not in the list of allowed values");
+ ExpectFailure(TestBuilder(OpDefBuilder("BadAttrtude")
+ .Attr("x: list({'foo', 'bar'}) = ['baz']")),
+ "attr 'x' of \"baz\" is not in the list of allowed values");
+ ExpectFailure(TestProto(
+ "name: 'BadAttrtude' attr { name: 'a' "
+ "type: 'string' allowed_values { s: 'not list' } }"),
+ "with type string when list(string) expected");
+ ExpectFailure(
+ TestProto("name: 'BadAttrtude' attr { name: 'a' "
+ "type: 'string' allowed_values { list { i: [6] } } }"),
+ "with type list(int) when list(string) expected");
+}
+
+TEST_F(ValidateOpDefTest, BadArgType) {
+ ExpectFailure(TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type: DT_INT32 } input_arg { name: 'b' }"),
+ "Missing type for input 'b'");
+ ExpectFailure(TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type: DT_INT32 } output_arg { name: 'b' }"),
+ "Missing type for output 'b'");
+ ExpectFailure(
+ TestProto("name: 'BadArg' input_arg { name: 'a' type: "
+ "DT_INT32 type_attr: 'x' } attr { name: 'x' type: 'type' }"),
+ "Exactly one of type, type_attr, type_list_attr must be set for input "
+ "'a'");
+ ExpectFailure(TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type_attr: 'x' } attr { name: 'x' type: 'int' }"),
+ "Attr 'x' used as type_attr for input 'a' has type int");
+ ExpectFailure(
+ TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type_attr: 'x' } attr { name: 'x' type: 'list(type)' }"),
+ "Attr 'x' used as type_attr for input 'a' has type list(type)");
+ ExpectFailure(
+ TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type_list_attr: 'x' } attr { name: 'x' type: 'int' }"),
+ "Attr 'x' used as type_list_attr for input 'a' has type int");
+ ExpectFailure(
+ TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type_list_attr: 'x' } attr { name: 'x' type: 'type' }"),
+ "Attr 'x' used as type_list_attr for input 'a' has type type");
+ ExpectFailure(TestProto("name: 'BadArg' input_arg { name: 'a' "
+ "type_attr: 'x' }"),
+ "No attr with name 'x' for input 'a'");
+ ExpectFailure(
+ TestProto("name: 'BadArg' input_arg { name: 'a' number_attr: 'n' "
+ "type_attr: 'x' } attr { name: 'x' type: 'list(type)' } "
+ "attr { name: 'n' type: 'int' has_minimum: true minimum: 1 }"),
+ "Attr 'x' used as type_attr for input 'a' has type list(type)");
+ // But list(type) is fine as the type of an arg without a number_attr:
+ EXPECT_OK(TestProto(
+ "name: 'Arg' input_arg { name: 'a' type_list_attr: 'x' } "
+ "attr { name: 'x' type: 'list(type)' } attr { name: 'n' type: 'int' "
+ "has_minimum: true minimum: 1 }"));
+
+ // number_attr
+ EXPECT_OK(TestProto(
+ "name: 'Arg' input_arg { name: 'a' type: DT_INT32 number_attr: 'n' } "
+ "attr { name: 'n' type: 'int' has_minimum: true minimum: 0 }"));
+
+ ExpectFailure(TestProto("name: 'Arg' input_arg { name: 'a' type: DT_INT32 "
+ "number_attr: 'n' }"),
+ "No attr with name 'n'");
+ ExpectFailure(
+ TestProto(
+ "name: 'Arg' input_arg { name: 'a' type: "
+ "DT_INT32 number_attr: 'n' } attr { name: 'n' type: 'string' }"),
+ "Attr 'n' used as length for input 'a' has type string");
+ ExpectFailure(
+ TestProto("name: 'Arg' input_arg { name: 'a' type: "
+ "DT_INT32 number_attr: 'n' } attr { name: 'n' type: 'int' }"),
+ "Attr 'n' used as length for input 'a' must have minimum;");
+ ExpectFailure(
+ TestProto("name: 'Arg' input_arg { name: 'a' type: DT_INT32 number_attr: "
+ "'n' } attr { name: 'n' type: 'int' has_minimum: true minimum: "
+ "-5 }"),
+ "Attr 'n' used as length for input 'a' must have minimum >= 0;");
+ ExpectFailure(
+ TestProto("name: 'Arg' input_arg { name: 'a' number_attr: 'n' } attr { "
+ "name: 'n' type: 'int' has_minimum: true minimum: 2 }"),
+ "Missing type for input 'a'; in OpDef:");
+ ExpectFailure(TestProto("name: 'BadArg' input_arg { name: 'a' number_attr: "
+ "'n' type_list_attr: 'x' } attr { name: 'n' type: "
+ "'int' has_minimum: true minimum: 1 } attr { name: "
+ "'x' type: 'list(type)' }"),
+ "Can't have both number_attr and type_list_attr for input 'a'");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_gen_lib.cc b/tensorflow/core/framework/op_gen_lib.cc
new file mode 100644
index 0000000000..04f4b7cacd
--- /dev/null
+++ b/tensorflow/core/framework/op_gen_lib.cc
@@ -0,0 +1,55 @@
+#include "tensorflow/core/framework/op_gen_lib.h"
+
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+
+string WordWrap(StringPiece prefix, StringPiece str, int width) {
+ const string indent_next_line = "\n" + Spaces(prefix.size());
+ width -= prefix.size();
+ string result;
+ strings::StrAppend(&result, prefix);
+
+ while (!str.empty()) {
+ if (static_cast<int>(str.size()) <= width) {
+ // Remaining text fits on one line.
+ strings::StrAppend(&result, str);
+ break;
+ }
+ auto space = str.rfind(' ', width);
+ if (space == StringPiece::npos) {
+ // Rather than break mid-word, make a too-long line and break at
+ // the next space instead.
+ space = str.find(' ');
+ if (space == StringPiece::npos) {
+ strings::StrAppend(&result, str);
+ break;
+ }
+ }
+ // Break at the space at position <space>.
+ StringPiece to_append = str.substr(0, space);
+ str.remove_prefix(space + 1);
+ // Remove spaces at break.
+ while (to_append.ends_with(" ")) {
+ to_append.remove_suffix(1);
+ }
+ while (str.Consume(" ")) {
+ }
+
+ // Go on to the next line.
+ strings::StrAppend(&result, to_append);
+ if (!str.empty()) strings::StrAppend(&result, indent_next_line);
+ }
+
+ return result;
+}
+
+bool ConsumeEquals(StringPiece* description) {
+ if (description->Consume("=")) {
+ while (description->Consume(" ")) { // Also remove spaces after "=".
+ }
+ return true;
+ }
+ return false;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_gen_lib.h b/tensorflow/core/framework/op_gen_lib.h
new file mode 100644
index 0000000000..9890f1bcec
--- /dev/null
+++ b/tensorflow/core/framework/op_gen_lib.h
@@ -0,0 +1,24 @@
+#ifndef TENSORFLOW_FRAMEWORK_OP_GEN_LIB_H_
+#define TENSORFLOW_FRAMEWORK_OP_GEN_LIB_H_
+
+#include <string>
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+inline string Spaces(int n) { return string(n, ' '); }
+
+// Wrap prefix + str to be at most width characters, indenting every line
+// after the first by prefix.size() spaces. Intended use case is something
+// like prefix = " Foo(" and str is a list of arguments (terminated by a ")").
+// TODO(josh11b): Option to wrap on ", " instead of " " when possible.
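+// For example (a sketch of the intended behavior):
+//   WordWrap("  Foo(", "arg1, arg2, arg3)", 12)
+// should produce roughly
+//   "  Foo(arg1,\n      arg2,\n      arg3)"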
+string WordWrap(StringPiece prefix, StringPiece str, int width);
+
+// Looks for an "=" at the beginning of *description. If found, strips it off
+// (and any following spaces) from *description and returns true. Otherwise
+// returns false.
+bool ConsumeEquals(StringPiece* description);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_GEN_LIB_H_
diff --git a/tensorflow/core/framework/op_kernel.cc b/tensorflow/core/framework/op_kernel.cc
new file mode 100644
index 0000000000..eb83d393f0
--- /dev/null
+++ b/tensorflow/core/framework/op_kernel.cc
@@ -0,0 +1,749 @@
+#include "tensorflow/core/framework/op_kernel.h"
+
+#include <unordered_map>
+
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+namespace {
+
+Status MatchSignatureHelper(const DataTypeSlice expected_inputs,
+ const DataTypeSlice expected_outputs,
+ const DataTypeSlice inputs,
+ const DataTypeSlice outputs) {
+ bool signature_mismatch = false;
+
+ if (inputs.size() != expected_inputs.size()) signature_mismatch = true;
+ for (size_t i = 0; !signature_mismatch && i < inputs.size(); ++i) {
+ if (!TypesCompatible(expected_inputs[i], inputs[i])) {
+ signature_mismatch = true;
+ }
+ }
+
+ if (outputs.size() != expected_outputs.size()) signature_mismatch = true;
+ for (size_t i = 0; !signature_mismatch && i < outputs.size(); ++i) {
+ if (!TypesCompatible(expected_outputs[i], outputs[i])) {
+ signature_mismatch = true;
+ }
+ }
+
+ if (signature_mismatch) {
+ return errors::InvalidArgument("Signature mismatch, have: ",
+ DataTypeSliceString(inputs), "->",
+ DataTypeSliceString(outputs), " expected: ",
+ DataTypeSliceString(expected_inputs), "->",
+ DataTypeSliceString(expected_outputs));
+ }
+ return Status::OK();
+}
+
+// Check HostMemory backward compatibility.
+bool CheckHostMemoryCompatibility(const DeviceType device_type,
+ const OpKernel* kernel) {
+ if (device_type == DEVICE_GPU) {
+ for (int i = 0; i < kernel->num_inputs(); ++i) {
+ if (kernel->input_type(i) == DT_INT32 &&
+ kernel->input_memory_types()[i] != HOST_MEMORY) {
+ return false;
+ }
+ }
+ for (int i = 0; i < kernel->num_outputs(); ++i) {
+ if (kernel->output_type(i) == DT_INT32 &&
+ kernel->output_memory_types()[i] != HOST_MEMORY) {
+ return false;
+ }
+ }
+ }
+ return true;
+}
+
+} // namespace
+
+// OpKernel ------------------------------------------------------------------
+
+OpKernel::OpKernel(OpKernelConstruction* context)
+ : def_(context->def()),
+ input_types_(context->input_types().begin(),
+ context->input_types().end()),
+ output_types_(context->output_types().begin(),
+ context->output_types().end()),
+ input_name_map_(context->num_inputs()),
+ output_name_map_(context->num_outputs()) {
+ OP_REQUIRES_OK(context,
+ NameRangesForNode(def_, context->op_def(), &input_name_map_,
+ &output_name_map_));
+
+ // By default, the input and output memory types are always in device memory,
+ // but can be overridden by individual implementations of OpKernels in their
+ // constructor.
+ input_memory_types_ = MemoryTypeVector(input_types_.size(), DEVICE_MEMORY);
+ output_memory_types_ = MemoryTypeVector(output_types_.size(), DEVICE_MEMORY);
+ // TODO(yuanbyu): For now we assume the memory types of function
+ // inputs/outputs to be DEVICE_MEMORY.
+ auto lib = context->function_library();
+ if (lib == nullptr || !lib->IsDefined(def_.op())) {
+ OP_REQUIRES_OK(context, MemoryTypesForNode(
+ context->device_type(), def_, context->op_def(),
+ input_name_map_, output_name_map_,
+ &input_memory_types_, &output_memory_types_));
+ // Log all the uses of int32 on GPU.
+ // TODO(yunabyu): Remove once everyone transitions to HostMemory.
+ if (VLOG_IS_ON(2)) {
+ if (!CheckHostMemoryCompatibility(context->device_type(), this)) {
+ VLOG(2) << "Using int32 on GPU at node: " << SummarizeNodeDef(def());
+ }
+ }
+ }
+}
+
+Status OpKernel::InputRange(const string& input_name, int* start,
+ int* stop) const {
+ const auto result = input_name_map_.find(input_name);
+ if (result == input_name_map_.end()) {
+ return errors::InvalidArgument("Unknown input name: ", input_name);
+ } else {
+ *start = result->second.first;
+ *stop = result->second.second;
+ return Status::OK();
+ }
+}
+
+Status OpKernel::OutputRange(const string& output_name, int* start,
+ int* stop) const {
+ const auto result = output_name_map_.find(output_name);
+ if (result == output_name_map_.end()) {
+ return errors::InvalidArgument("Unknown output name: ", output_name);
+ } else {
+ *start = result->second.first;
+ *stop = result->second.second;
+ return Status::OK();
+ }
+}
+
+void AsyncOpKernel::Compute(OpKernelContext* context) {
+ Notification n;
+ ComputeAsync(context, [&n]() { n.Notify(); });
+ n.WaitForNotification();
+}
+
+// PersistentTensor ----------------------------------------------------------
+
+Tensor* PersistentTensor::AccessTensor(OpKernelConstruction* context) {
+ // the caller has to have a valid context
+ CHECK(context);
+ return &tensor_;
+}
+
+Tensor* PersistentTensor::AccessTensor(OpKernelContext* context) {
+ context->NotifyUseOfPersistentTensor(tensor_);
+ return &tensor_;
+}
+
+// OpKernelConstruction ------------------------------------------------------
+
+Status OpKernelConstruction::MatchSignature(
+ const DataTypeSlice expected_inputs, const DataTypeSlice expected_outputs) {
+ return MatchSignatureHelper(expected_inputs, expected_outputs, input_types_,
+ output_types_);
+}
+
+Status OpKernelConstruction::allocate_temp(DataType type,
+ const TensorShape& shape,
+ Tensor* out_temp) {
+ Tensor new_temp(allocator_, type, shape);
+
+ if (!new_temp.IsInitialized() && shape.num_elements() > 0) {
+ return errors::ResourceExhausted(
+ "OOM when allocating temporary tensor with shape", shape.DebugString());
+ }
+ *out_temp = new_temp;
+ return Status::OK();
+}
+
+Status OpKernelConstruction::allocate_persistent(
+ DataType type, const TensorShape& shape, PersistentTensor* out_persistent,
+ Tensor** out_tensor) {
+ // for now just do the same thing as allocate_temp
+ // TODO(misard) add specific memory tracking for persistent tensors
+ Tensor persistent;
+ Status s = allocate_temp(type, shape, &persistent);
+ if (!s.ok()) {
+ return s;
+ }
+ *out_persistent = PersistentTensor(persistent);
+ Tensor* allocated = out_persistent->AccessTensor(this);
+ if (out_tensor) {
+ *out_tensor = allocated;
+ }
+ return s;
+}
+
+// OpKernelContext -----------------------------------------------------------
+
+OpKernelContext::OpKernelContext(const Params& params)
+ : params_(params),
+ outputs_(params.op_kernel->output_types().size()),
+ output_allocation_types_(params.op_kernel->output_types().size()) {
+ Allocator* eigen_gpu_allocator = get_allocator(AllocatorAttributes());
+ eigen_gpu_device_ = params_.device->MakeGpuDevice(params_.op_device_context,
+ eigen_gpu_allocator);
+}
+
+OpKernelContext::~OpKernelContext() {
+ for (TensorValue& value : outputs_) {
+ if (!value.is_ref()) {
+ delete value.tensor;
+ }
+ }
+ for (Tensor* t : temp_tensors_) delete t;
+ delete eigen_gpu_device_;
+}
+
+Status OpKernelContext::input(const string& name, const Tensor** tensor) const {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued input name '",
+ name,
+ "' when single-valued input was "
+ "expected");
+ }
+ if ((*params_.inputs)[start].is_ref()) {
+ return errors::InvalidArgument("OpKernel used ref input name '", name,
+ "' when immutable input was expected");
+ }
+ *tensor = (*params_.inputs)[start].tensor;
+ return Status::OK();
+}
+
+Status OpKernelContext::input_ref_mutex(const string& name, mutex** out_mutex) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued input name '",
+ name,
+ "' when single-valued input was expected");
+ }
+ *out_mutex = input_ref_mutex(start);
+ return Status::OK();
+}
+
+Status OpKernelContext::mutable_input(const string& name, Tensor* tensor,
+ bool lock_held) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued input name '",
+ name,
+ "' when single-valued input was expected");
+ }
+ if (!(*params_.inputs)[start].is_ref()) {
+ return errors::InvalidArgument("OpKernel used immutable input name '", name,
+ "' when ref input was expected");
+ }
+ // return a copy of the Ref acquired while holding the mutex
+ if (lock_held) {
+ *tensor = *(*params_.inputs)[start].tensor;
+ } else {
+ mutex_lock l(*input_ref_mutex(start));
+ *tensor = *(*params_.inputs)[start].tensor;
+ }
+ return Status::OK();
+}
+
+Status OpKernelContext::replace_ref_input(const string& name,
+ const Tensor& tensor,
+ bool lock_held) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued input name '",
+ name,
+ "' when single-valued input was expected");
+ }
+ if (!(*params_.inputs)[start].is_ref()) {
+ return errors::InvalidArgument("OpKernel used immutable input name '", name,
+ "' when ref input was expected");
+ }
+ replace_ref_input(start, tensor, lock_held);
+ return Status::OK();
+}
+
+Status OpKernelContext::input_list(const string& name,
+ OpInputList* list) const {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ *list = OpInputList(this, start, stop);
+ return Status::OK();
+}
+
+Status OpKernelContext::mutable_input_list(const string& name,
+ OpMutableInputList* list) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->InputRange(name, &start, &stop));
+ *list = OpMutableInputList(this, start, stop);
+ return Status::OK();
+}
+
+Status OpKernelContext::output_list(const string& name, OpOutputList* list) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ *list = OpOutputList(this, start, stop);
+ return Status::OK();
+}
+
+Status OpKernelContext::allocate_output(const string& name,
+ const TensorShape& shape,
+ Tensor** tensor) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ return allocate_output(start, shape, tensor);
+}
+
+Status OpKernelContext::allocate_output(const string& name,
+ const TensorShape& shape,
+ Tensor** tensor,
+ AllocatorAttributes attr) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ return allocate_output(start, shape, tensor, attr);
+}
+
+Status OpKernelContext::set_output(const string& name, const Tensor& tensor) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ set_output(start, tensor);
+ return Status::OK();
+}
+
+Status OpKernelContext::set_output_ref(const string& name, mutex* mu,
+ Tensor* tensor_for_ref) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ set_output_ref(start, mu, tensor_for_ref);
+ return Status::OK();
+}
+
+Status OpKernelContext::mutable_output(const string& name, Tensor** tensor) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ *tensor = mutable_output(start);
+ return Status::OK();
+}
+
+Status OpKernelContext::release_output(const string& name, TensorValue* value) {
+ int start, stop;
+ TF_RETURN_IF_ERROR(params_.op_kernel->OutputRange(name, &start, &stop));
+ if (stop != start + 1) {
+ return errors::InvalidArgument("OpKernel used list-valued output name '",
+ name,
+ "' when single-valued output was "
+ "expected");
+ }
+ *value = release_output(start);
+ return Status::OK();
+}
+
+bool OpKernelContext::ValidateInputsAreSameShape(OpKernel* op) {
+ const auto& inputs = *params_.inputs;
+ for (size_t i = 1; i < inputs.size(); ++i) {
+ if (!inputs[0]->IsSameSize(*(inputs[i].tensor))) {
+ SetStatus(errors::InvalidArgument(
+ "Inputs to operation ", op->name(), " of type ", op->type_string(),
+ " must have the same size and shape. Input 0: ",
+ inputs[0]->shape().DebugString(), " != input ", i, ": ",
+ inputs[i]->shape().DebugString()));
+ return false;
+ }
+ }
+ return true;
+}
+
+Status OpKernelContext::MatchSignature(const DataTypeSlice expected_inputs,
+ const DataTypeSlice expected_outputs) {
+ DataTypeVector inputs;
+ for (const TensorValue& t : *params_.inputs) {
+ inputs.push_back(t.is_ref() ? MakeRefType(t->dtype()) : t->dtype());
+ }
+ DataTypeVector outputs = params_.op_kernel->output_types();
+ return MatchSignatureHelper(expected_inputs, expected_outputs, inputs,
+ outputs);
+}
+
+// OpKernel registration ------------------------------------------------------
+
+struct KernelRegistration {
+ KernelRegistration(const KernelDef& d,
+ kernel_factory::OpKernelRegistrar::Factory f)
+ : def(d), factory(f) {}
+ const KernelDef def;
+ const kernel_factory::OpKernelRegistrar::Factory factory;
+};
+
+// This maps from 'op_type' + DeviceType to the set of KernelDefs and
+// factory functions for instantiating the OpKernel that matches the
+// KernelDef.
+typedef std::unordered_multimap<string, KernelRegistration> KernelRegistry;
+
+static KernelRegistry* GlobalKernelRegistry() {
+ static KernelRegistry* global_kernel_registry = new KernelRegistry;
+ return global_kernel_registry;
+}
+
+static string Key(const string& op_type, DeviceType device_type,
+ const string& label) {
+ return strings::StrCat(op_type, ":", DeviceTypeString(device_type), ":",
+ label);
+}
+
+namespace kernel_factory {
+
+OpKernelRegistrar::OpKernelRegistrar(const KernelDef* kernel_def,
+ Factory factory) {
+ const string key =
+ Key(kernel_def->op(), DeviceType(kernel_def->device_type()),
+ kernel_def->label());
+ GlobalKernelRegistry()->insert(
+ std::make_pair(key, KernelRegistration(*kernel_def, factory)));
+ delete kernel_def;
+}
+
+} // namespace kernel_factory
+
+namespace {
+
+// Helper for AttrsMatch().
+bool InTypeList(DataType dt, const AttrValue& type_list) {
+ for (int in_list : type_list.list().type()) {
+ if (dt == in_list) return true;
+ }
+ return false;
+}
+
+// Returns whether the attrs in the NodeDef satisfy the constraints in
+// the kernel_def. Returns an error if attrs in kernel_def are not
+// found, or have a mismatching type.
+Status AttrsMatch(const NodeDef& node_def, const KernelDef& kernel_def,
+ bool* match) {
+ *match = false;
+ AttrSlice attrs(node_def);
+ for (const auto& constraint : kernel_def.constraint()) {
+ if (constraint.allowed_values().list().type_size() == 0) {
+ return errors::Unimplemented(
+ "KernelDef '", kernel_def.ShortDebugString(),
+ " has constraint on attr '", constraint.name(),
+ "' with unsupported type: ",
+ SummarizeAttrValue(constraint.allowed_values()));
+ }
+
+ const AttrValue* found = attrs.Find(constraint.name());
+ if (found) {
+ if (found->type() != DT_INVALID) {
+ if (!InTypeList(found->type(), constraint.allowed_values())) {
+ return Status::OK();
+ }
+ } else {
+ if (!AttrValueHasType(*found, "list(type)").ok()) {
+ return errors::InvalidArgument(
+ "KernelDef '", kernel_def.ShortDebugString(),
+ "' has constraint on attr '", constraint.name(),
+ "' that has value '", SummarizeAttrValue(*found),
+ "' that does not have type 'type' or 'list(type)' in NodeDef '",
+ SummarizeNodeDef(node_def), "'");
+ }
+
+ for (int t : found->list().type()) {
+ if (!InTypeList(static_cast<DataType>(t),
+ constraint.allowed_values())) {
+ return Status::OK();
+ }
+ }
+ }
+ } else {
+ return errors::InvalidArgument(
+ "OpKernel '", kernel_def.op(), "' has constraint on attr '",
+ constraint.name(), "' not in NodeDef '", SummarizeNodeDef(node_def),
+ "', KernelDef: '", kernel_def.ShortDebugString(), "'");
+ }
+ }
+ *match = true;
+ return Status::OK();
+}
+
+Status FindKernelRegistration(DeviceType device_type, const NodeDef& node_def,
+ const KernelRegistration** reg) {
+ *reg = nullptr;
+ string label; // Label defaults to empty if not found in NodeDef.
+ GetNodeAttr(node_def, "_kernel", &label);
+ const string key = Key(node_def.op(), device_type, label);
+ auto regs = GlobalKernelRegistry()->equal_range(key);
+ for (auto iter = regs.first; iter != regs.second; ++iter) {
+ // If there is a kernel registered for the op and device_type,
+ // check that the attrs match.
+ bool match;
+ TF_RETURN_IF_ERROR(AttrsMatch(node_def, iter->second.def, &match));
+ if (match) {
+ if (*reg != nullptr) {
+ return errors::InvalidArgument(
+ "Multiple OpKernel registrations match NodeDef '",
+ SummarizeNodeDef(node_def), "': '", (*reg)->def.ShortDebugString(),
+ "' and '", iter->second.def.ShortDebugString(), "'");
+ }
+ *reg = &iter->second;
+ }
+ }
+ return Status::OK();
+}
+
+} // namespace
+
+Status SupportedDeviceTypesForNode(
+ const std::vector<DeviceType>& prioritized_types, const NodeDef& def,
+ DeviceTypeVector* device_types) {
+ // TODO(zhifengc): Change the callers (SimplePlacer and
+ // DynamicPlacer) to consider the possibility that 'def' is a call to
+ // a user-defined function, and only call
+ // SupportedDeviceTypesForNode for primitive ops.
+ Status s;
+ const OpDef* op_def = OpRegistry::Global()->LookUp(def.op(), &s);
+ if (op_def) {
+ for (const DeviceType& device_type : prioritized_types) {
+ const KernelRegistration* reg = nullptr;
+ TF_RETURN_IF_ERROR(FindKernelRegistration(device_type, def, &reg));
+ if (reg != nullptr) device_types->push_back(device_type);
+ }
+ } else {
+ // Assumes that all device types support this node.
+ for (const DeviceType& device_type : prioritized_types) {
+ device_types->push_back(device_type);
+ }
+ }
+ return Status::OK();
+}
+
+std::unique_ptr<OpKernel> CreateOpKernel(DeviceType device_type,
+ DeviceBase* device,
+ Allocator* allocator,
+ const NodeDef& node_def,
+ Status* status) {
+ OpKernel* kernel = nullptr;
+ *status = CreateOpKernel(device_type, device, allocator, nullptr, node_def,
+ &kernel);
+ return std::unique_ptr<OpKernel>(kernel);
+}
+
+Status CreateOpKernel(DeviceType device_type, DeviceBase* device,
+ Allocator* allocator, FunctionLibraryRuntime* flib,
+ const NodeDef& node_def, OpKernel** kernel) {
+ VLOG(1) << "Instantiating kernel for node: " << SummarizeNodeDef(node_def);
+
+ // Look up the Op registered for this op name.
+ Status s;
+ const OpDef* op_def = OpRegistry::Global()->LookUp(node_def.op(), &s);
+ if (op_def == nullptr) return s;
+
+ // Validate node_def against OpDef.
+ s = ValidateNodeDef(node_def, *op_def);
+ if (!s.ok()) return s;
+
+ // Look up kernel registration.
+ const KernelRegistration* registration;
+ s = FindKernelRegistration(device_type, node_def, &registration);
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " when instantiating ", node_def.op());
+ return s;
+ }
+ if (registration == nullptr) {
+ s.Update(errors::NotFound("No registered '", node_def.op(),
+ "' OpKernel for ", DeviceTypeString(device_type),
+ " devices compatible with node ",
+ SummarizeNodeDef(node_def)));
+ return s;
+ }
+
+ // Get signature from the OpDef & NodeDef
+ DataTypeVector inputs;
+ DataTypeVector outputs;
+ s.Update(InOutTypesForNode(node_def, *op_def, &inputs, &outputs));
+ if (!s.ok()) {
+ errors::AppendToMessage(&s, " for node: ", SummarizeNodeDef(node_def));
+ return s;
+ }
+
+ // Everything needed for OpKernel construction.
+ OpKernelConstruction context(device_type, device, allocator, &node_def,
+ op_def, flib, inputs, outputs, &s);
+ *kernel = (*registration->factory)(&context);
+ if (!s.ok()) {
+ delete *kernel;
+ *kernel = nullptr;
+ }
+ return s;
+}
+
+namespace { // Helper for MemoryTypesForNode.
+// Fills memory_types for either input or output, setting everything
+// to DEVICE_MEMORY except those args in host_memory_args. Removes
+// elements of host_memory_args that were used.
+void MemoryTypesHelper(const NameRangeMap& name_map,
+ std::vector<string>* host_memory_args,
+ MemoryTypeVector* memory_types) {
+ // Set total to the largest endpoint of anything in the name_map.
+ int total = 0;
+ for (const auto& item : name_map) {
+ total = std::max(total, item.second.second);
+ }
+
+ // Now that we know the size, fill with the default 'DEVICE_MEMORY'.
+ memory_types->clear();
+ memory_types->resize(total, DEVICE_MEMORY);
+
+ // Update args that have been marked as in "HOST_MEMORY".
+ size_t keep = 0;
+ for (size_t i = 0; i < host_memory_args->size(); ++i) {
+ auto iter = name_map.find((*host_memory_args)[i]);
+ if (iter != name_map.end()) {
+ for (int j = iter->second.first; j < iter->second.second; ++j) {
+ (*memory_types)[j] = HOST_MEMORY;
+ }
+ } else {
+ // (*host_memory_args)[i] not found, save it for the next pass.
+ if (i > keep) (*host_memory_args)[keep] = (*host_memory_args)[i];
+ ++keep;
+ }
+ }
+ host_memory_args->resize(keep);
+}
+} // namespace
+
+Status MemoryTypesForNode(DeviceType device_type, const NodeDef& ndef,
+ const OpDef& op_def,
+ const NameRangeMap& input_name_map,
+ const NameRangeMap& output_name_map,
+ MemoryTypeVector* input_memory_types,
+ MemoryTypeVector* output_memory_types) {
+ Status status;
+ const KernelRegistration* registration;
+ TF_RETURN_IF_ERROR(FindKernelRegistration(device_type, ndef, &registration));
+
+ if (registration != nullptr) {
+ const auto& from_proto = registration->def.host_memory_arg();
+ std::vector<string> host_memory_args(from_proto.begin(), from_proto.end());
+ MemoryTypesHelper(input_name_map, &host_memory_args, input_memory_types);
+ MemoryTypesHelper(output_name_map, &host_memory_args, output_memory_types);
+ if (!host_memory_args.empty()) {
+ return errors::InvalidArgument(
+ "HostMemory args '", str_util::Join(host_memory_args, "', '"),
+ "' not found in OpDef: ", SummarizeOpDef(op_def));
+ }
+ }
+ return status;
+}
+
+Status MemoryTypesForNode(const OpRegistryInterface* op_registry,
+ DeviceType device_type, const NodeDef& ndef,
+ MemoryTypeVector* input_memory_types,
+ MemoryTypeVector* output_memory_types) {
+ // Look up the Op registered for this op name.
+ Status status;
+ const OpDef* op_def = op_registry->LookUp(ndef.op(), &status);
+ if (op_def == nullptr) return status;
+
+ NameRangeMap inputs, outputs;
+ status = NameRangesForNode(ndef, *op_def, &inputs, &outputs);
+ if (!status.ok()) return status;
+
+ return MemoryTypesForNode(device_type, ndef, *op_def, inputs, outputs,
+ input_memory_types, output_memory_types);
+}
+
+namespace {
+
+bool FindArgInOp(const string& arg_name,
+ const protobuf::RepeatedPtrField<OpDef::ArgDef>& args) {
+ for (const auto& arg : args) {
+ if (arg_name == arg.name()) {
+ return true;
+ }
+ }
+ return false;
+}
+
+} // namespace
+
+Status ValidateKernelRegistrations(const OpRegistryInterface* op_registry) {
+ Status unused_status;
+ for (const auto& key_registration : *GlobalKernelRegistry()) {
+ const KernelDef& kernel_def(key_registration.second.def);
+ const OpDef* op_def = op_registry->LookUp(kernel_def.op(), &unused_status);
+ if (op_def == nullptr) {
+ // TODO(josh11b): Make this a hard error.
+ LOG(ERROR) << "OpKernel ('" << kernel_def.ShortDebugString()
+ << "') for unknown op: " << kernel_def.op();
+ continue;
+ }
+ for (const auto& host_memory_arg : kernel_def.host_memory_arg()) {
+ if (!FindArgInOp(host_memory_arg, op_def->input_arg()) &&
+ !FindArgInOp(host_memory_arg, op_def->output_arg())) {
+ return errors::InvalidArgument("HostMemory arg '", host_memory_arg,
+ "' not found in OpDef: ",
+ SummarizeOpDef(*op_def));
+ }
+ }
+ }
+ return Status::OK();
+}
+
+template <>
+const Eigen::ThreadPoolDevice& OpKernelContext::eigen_device() const {
+ return eigen_cpu_device();
+}
+
+template <>
+const Eigen::GpuDevice& OpKernelContext::eigen_device() const {
+ return eigen_gpu_device();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_kernel.h b/tensorflow/core/framework/op_kernel.h
new file mode 100644
index 0000000000..34d588c6c9
--- /dev/null
+++ b/tensorflow/core/framework/op_kernel.h
@@ -0,0 +1,1250 @@
+#ifndef TENSORFLOW_FRAMEWORK_OP_KERNEL_H_
+#define TENSORFLOW_FRAMEWORK_OP_KERNEL_H_
+
+#include <functional>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/cancellation.h"
+#include "tensorflow/core/framework/control_flow.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/function.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/kernel_def.pb.h"
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/rendezvous.h"
+#include "tensorflow/core/framework/step_stats.pb.h"
+#include "tensorflow/core/framework/tensor_shape.pb.h"
+#include "tensorflow/core/framework/tracking_allocator.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace Eigen {
+class ThreadPoolDevice;
+class GpuDevice;
+} // end namespace Eigen
+
+namespace tensorflow {
+
+namespace checkpoint {
+class TensorSliceReaderCacheWrapper;
+} // namespace checkpoint
+
+class AsyncOpKernel;
+class OpKernelConstruction; // declared below
+class OpKernelContext; // declared below
+class ResourceMgr;
+
+// TODO(josh11b): Make reference-counted if needed.
+class OpKernel {
+ public:
+ // OpKernel won't be instantiated by the scheduler, so you may perform
+ // expensive initialization in the descendant's constructor.
+ explicit OpKernel(OpKernelConstruction* context);
+ virtual ~OpKernel() {}
+
+ // An OpKernel's computation can be either synchronous or
+ // asynchronous.
+ //
+ // Most OpKernels should compute synchronously. They should
+ // subclass OpKernel and override the Compute() method and have it
+ // return after completing the supplied work.
+ //
+ // A few special kernels might need to be asynchronous to bound the
+ // number of threads (e.g., network receive operations). These
+ // kernels must subclass AsyncOpKernel and override
+ // AsyncOpKernel::ComputeAsync().
+ //
+ // In both cases, implementations of Compute() and ComputeAsync()
+ // get inputs and write outputs through the given OpKernelContext
+ // and return a status via context->SetStatus(). They must be
+ // thread-safe.
+
+ // Synchronous compute.
+ //
+ // "context" is guaranteed to be alive until Compute() returns.
+ virtual void Compute(OpKernelContext* context) = 0;
+
+ // Returns nullptr iff this op kernel is synchronous.
+ virtual AsyncOpKernel* AsAsync() { return nullptr; }
+
+ // Returns true iff this op kernel is considered "expensive". The
+ // runtime may use this flag to optimize graph execution, for example,
+ // to "inline" inexpensive kernels.
+ virtual bool IsExpensive() { return true; }
+
+ // Accessors.
+ const NodeDef& def() const { return def_; }
+ const string& name() const { return def_.name(); }
+ const string& type_string() const { return def_.op(); }
+
+ int num_inputs() const { return input_types_.size(); }
+ DataType input_type(int i) const { return input_types_[i]; }
+ const DataTypeVector& input_types() const { return input_types_; }
+ const MemoryTypeVector& input_memory_types() const {
+ return input_memory_types_;
+ }
+
+ int num_outputs() const { return output_types_.size(); }
+ DataType output_type(int o) const { return output_types_[o]; }
+ const DataTypeVector& output_types() const { return output_types_; }
+ const MemoryTypeVector& output_memory_types() const {
+ return output_memory_types_;
+ }
+
+ Status InputRange(const string& input_name, int* start, int* stop) const;
+ Status OutputRange(const string& output_name, int* start, int* stop) const;
+
+ private:
+ const NodeDef def_;
+ const DataTypeVector input_types_;
+ const DataTypeVector output_types_;
+ NameRangeMap input_name_map_;
+ NameRangeMap output_name_map_;
+ MemoryTypeVector input_memory_types_;
+ MemoryTypeVector output_memory_types_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(OpKernel);
+};
+
+class AsyncOpKernel : public OpKernel {
+ public:
+ using OpKernel::OpKernel; // Lift OpKernel constructors.
+
+ // Asynchronous compute.
+ //
+ // Implementations of ComputeAsync() must run "done" to signal the
+ // completion of the computation. "context" is guaranteed to be
+ // alive until the "done" callback starts.
+ typedef std::function<void()> DoneCallback;
+ virtual void ComputeAsync(OpKernelContext* context, DoneCallback done) = 0;
+
+ AsyncOpKernel* AsAsync() final { return this; }
+
+ void Compute(OpKernelContext* context) final;
+};
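+
+// A minimal usage sketch (editorial, not part of the original file): a
+// synchronous kernel that copies its single float input to its output.
+// The op name "SketchIdentity" and the float element access are
+// illustrative assumptions only.
+//
+//   class SketchIdentityOp : public OpKernel {
+//    public:
+//     explicit SketchIdentityOp(OpKernelConstruction* context)
+//         : OpKernel(context) {}
+//
+//     void Compute(OpKernelContext* context) override {
+//       const Tensor& in = context->input(0);
+//       Tensor* out = nullptr;
+//       Status s = context->allocate_output(0, in.shape(), &out);
+//       if (!s.ok()) {
+//         context->SetStatus(s);
+//         return;
+//       }
+//       // Copy the input values into the freshly allocated output.
+//       out->flat<float>() = in.flat<float>();
+//     }
+//   };
+//
+//   REGISTER_KERNEL_BUILDER(Name("SketchIdentity").Device(DEVICE_CPU),
+//                           SketchIdentityOp);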
+
+// Wraps a tensor that is held by an Op across calls to Compute(). For
+// memory safety when using asynchronous devices like GPUs, the system
+// must be notified when a Tensor is used inside an Op execution. The
+// wrapper ensures that all uses of the Tensor are tracked, because in
+// order to retrieve the Tensor the caller must use AccessTensor which
+// notifies the context.
+class PersistentTensor {
+ public:
+ PersistentTensor() {}
+ explicit PersistentTensor(const Tensor& tensor) : tensor_(tensor) {}
+
+ // Caller does not own the returned Tensor*.
+ Tensor* AccessTensor(OpKernelConstruction* context);
+ // Caller does not own the returned Tensor*.
+ Tensor* AccessTensor(OpKernelContext* context);
+
+ // The check for initialization does not need to access the
+ // underlying tensor buffer.
+ bool IsInitialized() { return tensor_.IsInitialized(); }
+
+ private:
+ Tensor tensor_;
+};
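+
+// A sketch of the intended PersistentTensor usage (editorial; the class,
+// member names, and scalar shape are illustrative assumptions): allocate
+// the tensor once during construction via allocate_persistent, keep only
+// the PersistentTensor, and go through AccessTensor on every use so the
+// runtime is notified.
+//
+//   class SketchAccumulatorOp : public OpKernel {
+//    public:
+//     explicit SketchAccumulatorOp(OpKernelConstruction* context)
+//         : OpKernel(context) {
+//       Status s = context->allocate_persistent(
+//           DT_FLOAT, TensorShape({}), &accum_, nullptr);
+//       if (!s.ok()) context->SetStatus(s);
+//     }
+//
+//     void Compute(OpKernelContext* context) override {
+//       // AccessTensor notifies the context that the buffer is in use.
+//       Tensor* accum = accum_.AccessTensor(context);
+//       // ... read or update *accum ...
+//     }
+//
+//    private:
+//     PersistentTensor accum_;
+//   };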
+
+class OpKernelConstruction {
+ public:
+ // TODO(yuanbyu): Probably reduce the number of arguments.
+ OpKernelConstruction(DeviceType device_type, DeviceBase* device,
+ Allocator* allocator, const NodeDef* node_def,
+ const OpDef* op_def, FunctionLibraryRuntime* flib,
+ const DataTypeSlice& input_types,
+ const DataTypeSlice& output_types, Status* status)
+ : device_type_(device_type),
+ device_(device),
+ allocator_(allocator),
+ def_(node_def),
+ op_def_(op_def),
+ flib_(flib),
+ input_types_(input_types),
+ output_types_(output_types),
+ status_(status) {}
+
+ Env* env() const { return device_->env(); }
+
+ // Allocation of tensors during kernel construction:
+ //
+ // It is legal to temporarily allocate scratch tensor storage during
+ // Op kernel construction. Scratch tensors should be allocated using
+ // allocate_temp below. Some kernels need to keep tensors in between
+ // invocations. If such a Tensor is allocated during kernel
+ // construction this must be done using allocate_persistent, and the
+ // Op may only store the returned PersistentTensor object. When the
+ // Tensor is needed in a subsequent invocation, it can be retrieved
+ // from the PersistentTensor using the AccessTensor method. This
+ // ensures that the system is made aware of any use of the tensor's
+ // allocated memory, which is needed for correctness on asynchronous
+ // devices such as GPUs.
+
+ // Allocates a temporary Tensor of the specified type and shape. The
+ // Tensor must not be used after kernel construction is
+ // complete. See comment above.
+ Status allocate_temp(DataType type, const TensorShape& shape,
+ Tensor* out_temp);
+
+ // Allocates a Tensor of the specified type and shape which the Op
+ // plans to maintain as persistent state. out_persistent holds the
+ // PersistentTensor which is the object the caller should store. For
+ // convenience, if out_tensor is non-null then it will be filled in
+ // with a Tensor* pointing to the newly-allocated tensor which the
+ // caller can use instead of calling
+ // out_persistent->AccessTensor. The caller does not own out_tensor
+ // and should not keep a copy of it. See comment above.
+ Status allocate_persistent(DataType type, const TensorShape& shape,
+ PersistentTensor* out_persistent,
+ Tensor** out_tensor);
+
+ // User-supplied configuration of this operation.
+ const NodeDef& def() const { return *def_; }
+
+ // Op registered for this op type.
+ const OpDef& op_def() const { return *op_def_; }
+
+ // For inspecting the inputs to this operation.
+ int num_inputs() const { return input_types_.size(); }
+ DataType input_type(int i) const { return input_types_[i]; }
+ const DataTypeSlice& input_types() const { return input_types_; }
+
+ // For inspecting the outputs expected from this operation.
+ int num_outputs() const { return output_types_.size(); }
+ DataType output_type(int i) const { return output_types_[i]; }
+ const DataTypeSlice& output_types() const { return output_types_; }
+
+ // If expected_inputs == inputs() and expected_outputs == output_types(),
+ // returns OK, else returns INVALID_ARGUMENT with an error message.
+ // Recommended for Ops with dynamic signatures.
+ Status MatchSignature(const DataTypeSlice expected_inputs,
+ const DataTypeSlice expected_outputs);
+
+ // For recording configuration errors during construction.
+ void SetStatus(const Status& status) { status_->Update(status); }
+ const Status& status() const { return *status_; }
+
+ // Look up the attr with name attr_name and set *value to its value. If no
+ // attr with attr_name is found in def(), or the attr does not have
+ // a matching type, a non-ok status will be returned.
+ template <class T>
+ Status GetAttr(const string& attr_name, T* value) const {
+ return GetNodeAttr(def(), attr_name, value);
+ }
+
+ // May be used, e.g., to get GPU handles, etc.
+ // TODO(tucker): Add example usage.
+ DeviceBase* device() const { return device_; }
+
+ // Return the device type.
+ const DeviceType& device_type() const { return device_type_; }
+
+ // If not nullptr, the kernel can instantiate functions defined in
+ // the library. E.g.,
+ // CHECK_NOTNULL(function_library())->Instantiate("Foo", ...).
+ FunctionLibraryRuntime* function_library() const { return flib_; }
+
+ private:
+ const DeviceType device_type_;
+ DeviceBase* const device_;
+ Allocator* allocator_;
+ const NodeDef* def_;
+ const OpDef* op_def_;
+ FunctionLibraryRuntime* flib_;
+ DataTypeSlice input_types_;
+ DataTypeSlice output_types_;
+ Status* status_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(OpKernelConstruction);
+};
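+
+// A sketch of typical OpKernelConstruction use during kernel construction
+// (editorial; the kernel name and the attr "axis" are illustrative
+// assumptions): look up attrs with GetAttr and report configuration
+// problems through SetStatus.
+//
+//   class SketchClipOp : public OpKernel {
+//    public:
+//     explicit SketchClipOp(OpKernelConstruction* context)
+//         : OpKernel(context) {
+//       Status s = context->GetAttr("axis", &axis_);
+//       if (!s.ok()) { context->SetStatus(s); return; }
+//       if (axis_ < 0) {
+//         context->SetStatus(
+//             errors::InvalidArgument("axis must be non-negative"));
+//       }
+//     }
+//     // ... Compute() omitted ...
+//    private:
+//     int32 axis_ = 0;
+//   };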
+
+// TODO(mrry): Consider converting to a random_access_iterator, and upgrading
+// tensorflow::gtl::iterator_range to make the below container classes
+// unnecessary.
+template <typename ListType, typename ElementType>
+class OpArgIterator {
+ public:
+ typedef OpArgIterator<ListType, ElementType> ME;
+ OpArgIterator(const ListType* list, int i) : list_(list), i_(i) {}
+ bool operator==(const ME& rhs) {
+ DCHECK(list_ == rhs.list_);
+ return i_ == rhs.i_;
+ }
+ bool operator!=(const ME& rhs) {
+ DCHECK(list_ == rhs.list_);
+ return i_ != rhs.i_;
+ }
+ void operator++() { ++i_; }
+ ElementType& operator*() { return (*list_)[i_]; }
+
+ private:
+ const ListType* const list_;
+ int i_;
+};
+
+// Utility class for representing a list of immutable input tensors
+// that are passed to the op as a single named argument.
+class OpInputList {
+ public:
+ typedef OpArgIterator<OpInputList, const Tensor&> Iterator;
+ OpInputList() : ctx_(nullptr), start_(0), stop_(0) {}
+ OpInputList(const OpKernelContext* ctx, int start, int stop)
+ : ctx_(ctx), start_(start), stop_(stop) {}
+ OpInputList& operator=(const OpInputList& other) = default;
+ const Tensor& operator[](int i) const;
+ int size() const { return stop_ - start_; }
+ Iterator begin() const { return Iterator(this, 0); }
+ Iterator end() const { return Iterator(this, size()); }
+
+ private:
+ const OpKernelContext* ctx_; // not owned
+ int start_;
+ int stop_;
+};
+
+// Utility class for representing a list of mutable ("ref") input tensors
+// that are passed to the op as a single named argument.
+class OpMutableInputList {
+ public:
+ typedef OpArgIterator<OpMutableInputList, Tensor*> Iterator;
+ OpMutableInputList(OpKernelContext* ctx, int start, int stop)
+ : ctx_(ctx), start_(start), stop_(stop) {}
+ OpMutableInputList() : ctx_(nullptr), start_(0), stop_(0) {}
+ OpMutableInputList& operator=(const OpMutableInputList& other) = default;
+ Tensor at(int i, bool lock_held);
+ mutex* ref_mutex(int i);
+ int size() const { return stop_ - start_; }
+ Iterator begin() const { return Iterator(this, 0); }
+ Iterator end() const { return Iterator(this, size()); }
+
+ private:
+ OpKernelContext* ctx_; // not owned
+ int start_;
+ int stop_;
+};
+
+// Utility class for representing a list of output tensors that are
+// grouped as a single named output.
+class OpOutputList {
+ public:
+ typedef OpArgIterator<OpOutputList, const Tensor*> Iterator;
+ OpOutputList() : ctx_(nullptr), start_(0), stop_(0) {}
+ OpOutputList(OpKernelContext* ctx, int start, int stop)
+ : ctx_(ctx), start_(start), stop_(stop) {}
+ OpOutputList& operator=(const OpOutputList& other) = default;
+ Tensor* operator[](int i);
+ bool required(int i) const;
+ Status allocate(int i, const TensorShape& shape, Tensor** output);
+ void set(int i, const Tensor& tensor);
+ void set_ref(int i, mutex* mu, Tensor* tensor_for_ref);
+ int size() const { return stop_ - start_; }
+ Iterator begin() const { return Iterator(this, 0); }
+ Iterator end() const { return Iterator(this, size()); }
+
+ private:
+ OpKernelContext* ctx_; // not owned
+ int start_;
+ int stop_;
+};
+
+// Holds a tensor or tensor reference. For tensor references, we need
+// a mutex to prevent concurrent access to the tensor.
+struct TensorValue {
+ TensorValue() : mutex_if_ref(nullptr), tensor(nullptr) {}
+ TensorValue(Tensor* t) // NOLINT(runtime/explicit)
+ : mutex_if_ref(nullptr),
+ tensor(t) {}
+ TensorValue(mutex* mu, Tensor* t) : mutex_if_ref(mu), tensor(t) {}
+ Tensor* operator->() const { return tensor; }
+ bool is_ref() const { return mutex_if_ref != nullptr; }
+
+ mutex* mutex_if_ref; // nullptr if not a ref, != nullptr if a ref
+ Tensor* tensor;
+};
+
+class OpKernelContext {
+ public:
+ // The first element of a WrappedAllocator is a "base" Allocator and
+ // the second element is that Allocator wrapped by a
+ // TrackingAllocator
+ typedef std::pair<Allocator*, TrackingAllocator*> WrappedAllocator;
+
+ // TODO(zhifengc): Do some cleanup of Params.
+ struct Params {
+ // The op kernel being computed.
+ OpKernel* op_kernel = nullptr;
+
+ // The device on which the kernel is running.
+ DeviceBase* device = nullptr;
+
+ bool track_allocations = false;
+ std::function<AllocatorAttributes(int index)> output_alloc_attr = nullptr;
+
+ // Shared resources accessible by this op kernel invocation.
+ ResourceMgr* resource_manager = nullptr;
+
+ // Per-step resources accessible by this op kernel invocation.
+ ResourceMgr* step_resource_manager = nullptr;
+
+ // Mechanism used by this op kernel invocation to communicate with
+ // computations running on other devices.
+ Rendezvous* rendezvous = nullptr;
+
+ // Mechanism used by this op kernel invocation to register a callback
+ // for its cancellation.
+ CancellationManager* cancellation_manager = nullptr;
+
+ // Inputs to this op kernel.
+ const gtl::InlinedVector<TensorValue, 4>* inputs = nullptr;
+ bool is_input_dead = false;
+
+ const gtl::InlinedVector<AllocatorAttributes, 4>* input_alloc_attrs =
+ nullptr;
+
+ // Device contexts.
+ const gtl::InlinedVector<DeviceContext*, 4>* input_device_contexts =
+ nullptr;
+ DeviceContext* op_device_context = nullptr;
+
+ // Control-flow op supports.
+ FrameAndIter frame_iter;
+
+ // Function call supports.
+ FunctionCallFrame* call_frame = nullptr;
+ FunctionLibraryRuntime* function_library = nullptr;
+
+ // TensorSliceReaderCache support.
+ checkpoint::TensorSliceReaderCacheWrapper* slice_reader_cache = nullptr;
+ };
+ explicit OpKernelContext(const Params& params);
+ ~OpKernelContext();
+
+ Env* env() const { return params_.device->env(); }
+
+ // Input/output signature.
+
+ int num_inputs() const { return params_.inputs->size(); }
+ DataType input_dtype(int index) const;
+ int num_outputs() const { return outputs_.size(); }
+ DataType expected_output_dtype(int index) const;
+
+ // Input
+
+ // Returns an immutable input tensor. May only be used for non-Ref
+ // inputs. For Ref inputs use mutable_input below.
+ // REQUIRES: !IsRefType(input_dtype(index))
+ // TODO(mrry): Convert this to return Status.
+ const Tensor& input(int index) const;
+
+ // Returns the named immutable input tensor in "tensor", as defined
+ // in the OpDef. May only be used for non-Ref inputs. For Ref inputs
+ // use mutable_input below.
+ // REQUIRES: !IsRefType(input_dtype(index))
+ // REQUIRES: the named input must not be a list.
+ Status input(const string& name, const Tensor** tensor) const;
+
+ // Returns the named list-valued immutable input in "list", as
+ // defined in the OpDef. If the named input is not list-valued,
+ // returns a one-element list. May only be used for non-Ref
+ // inputs. For Ref inputs use mutable_input below.
+ // REQUIRES: !IsRefType(input_dtype(index))
+ Status input_list(const string& name, OpInputList* list) const;
+
+ // For mutable inputs, use the following together to make sure there
+ // is no concurrent access to mutable_input(), e.g.:
+ // {
+ // Tensor& t = context->mutable_input(index);
+ // mutex_lock lock(*context->input_ref_mutex(index));
+ // // modify the values in t
+ // }
+ // REQUIRES: IsRefType(input_dtype(index))
+ // TODO(mrry): Convert this to return Status.
+ mutex* input_ref_mutex(int index);
+ Status input_ref_mutex(const string& name, mutex** out_mutex);
+
+ // Returns a mutable input tensor. Must be used to access Ref
+ // inputs. REQUIRES: IsRefType(input_dtype(index)). The caller may
+ // modify the values stored in the Tensor buffer, and modifications
+ // will be visible to other Ops reading the same ref tensor. If
+ // !lock_held the input mutex will be acquired before returning the
+ // Tensor.
+ // TODO(mrry):
+ // Convert this to return Status.
+ Tensor mutable_input(int index, bool lock_held);
+
+ // Returns the named mutable input tensor in "tensor", as defined in
+ // the OpDef. Must be used to access Ref inputs. The values stored
+ // in the Tensor buffer may be modified, and modifications will be
+ // visible to other Ops reading the same ref tensor. If !lock_held
+ // the input mutex will be acquired before returning the Tensor.
+ // REQUIRES: the named input must not be a list.
+ // REQUIRES: the named input must be a ref tensor.
+ Status mutable_input(const string& name, Tensor* tensor, bool lock_held);
+
+ // Returns the named list-valued mutable input in "list", as defined
+ // in the OpDef. If the named input is not list-valued, returns a
+ // one-element list. Must be used to access Ref inputs. The values
+ // stored in the Tensor buffer may be modified, and modifications
+ // will be visible to other Ops reading the same ref tensor.
+ // REQUIRES: the named input must be a ref tensor.
+ Status mutable_input_list(const string& name, OpMutableInputList* list);
+
+ // Replace the corresponding Ref Input to use the storage buffer
+ // used by tensor. If !lock_held the input mutex will be acquired
+ // before returning the Tensor.
+ // REQUIRES: IsRefType(input_dtype(index)).
+ void replace_ref_input(int index, const Tensor& tensor, bool lock_held);
+
+ // Replace the corresponding named Ref Input to use the storage
+ // buffer used by tensor. If !lock_held the input mutex will be
+ // acquired before returning the Tensor.
+ // REQUIRES: IsRefType(input_dtype(index)).
+ Status replace_ref_input(const string& name, const Tensor& tensor,
+ bool lock_held);
+
+ // Set the output Ref Tensor at output_index to be an alias of the
+ // input Ref Tensor at input_index.
+ // REQUIRES: IsRefType(input_dtype(input_index)).
+ // REQUIRES: IsRefType(output_dtype(output_index)).
+ void forward_ref_input_to_ref_output(int input_index, int output_index);
+
+ // Deletes the Tensor object used as the Ref Input at
+ // input_index. This is not usually necessary and should be used
+ // with caution. If !lock_held the input mutex will be acquired
+ // before returning the Tensor.
+ // REQUIRES: IsRefType(input_dtype(input_index)).
+ void delete_ref_input(int input_index, bool lock_held);
+
+ // Return true if there is input at the given index. An operator has no
+ // input at index if its tensor is null. This is primarily used by the
+ // merge operator.
+ // TODO(mrry): Convert this to return Status.
+ bool has_input(int index) const;
+
+ // Returns true if all inputs are the same shape, otherwise sets the
+ // status to a non-OK value and returns false.
+ // Usage: if (!context->ValidateInputsAreSameShape(this)) return;
+ bool ValidateInputsAreSameShape(OpKernel* op);
+
+ // Output
+
+ // Returns the named list-valued output in "list", as defined in the OpDef.
+ // If the named output is not list-valued, returns a one-element list.
+ Status output_list(const string& name, OpOutputList* list);
+
+ // If output_required(index) returns true, the OpKernel's Compute() method
+ // should call allocate_output(index, ...), set_output(index, ...),
+ // set_output_ref(index, ...), or set the status to a non-ok value.
+ // If it returns false, it may output, but is not required to do so.
+ // TODO(mrry): Convert this to return Status, and implement a string
+ // name version.
+ bool output_required(int index) const {
+ return true; // TODO(josh11b): implement
+ }
+
+ // Allocation of tensors during kernel execution inside the Compute
+ // method:
+ //
+ // There are three methods to allocate Tensors when an Op kernel
+ // executes.
+ //
+ // 1) allocate_persistent. This is only needed for Tensors that will
+ // be stored by the Op between invocations, and it *must* be used
+ // for those Tensors. The call returns a PersistentTensor, and that
+ // is the only object the Op is allowed to hold on to between
+ // invocations. When the Tensor is needed in a subsequent
+ // invocation, it can be retrieved from the PersistentTensor using
+ // the AccessTensor method. This ensures that the system is made
+ // aware of any use of the tensor's allocated memory, which is
+ // needed for correctness on asynchronous devices such as GPUs.
+ //
+ // 2) allocate_output. This should be used to allocate any tensor
+ // that is going to be used as an output from the Op at the end of
+ // the current execution. The caller indicates which output the
+ // Tensor will be assigned to, and the call returns the
+ // newly-allocated Tensor. The Tensor can subsequently be assigned
+ // to during kernel execution, and will be used as the designated
+ // output when the kernel execution completes.
+ //
+ // 3) allocate_temp. This should be used to allocate any scratch
+ // storage that is needed while the kernel is executing, and will
+ // not be retained by the Op.
+ //
+ // In some cases a Tensor needs to be used as an output even though
+ // it was previously allocated elsewhere. The Tensor may have been
+ // passed as an input, or stored in a PersistentTensor during a
+ // previous kernel execution, or allocated earlier in the kernel
+ // execution at a time when it was not known which output it would
+ // be assigned to. In this case the kernel can use set_output or
+ // set_output_ref to indicate that the tensor should be used as the
+ // designated output. It is legal to use any previously-allocated
+ // Tensor as an argument to set_output or set_output_ref, including
+ // Tensors allocated via allocate_temp. There may be a performance
+ // penalty to using a Tensor that was not allocated using
+ // allocate_output. This is because allocate_output uses the
+ // AllocatorAttributes stored in output_alloc_attr for the
+ // designated output. In some cases, using the wrong attributes may
+ // cause an extra copy of the Tensor's buffer.
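+
+ // For illustration (editorial sketch; the dtype, shapes, and output
+ // index 0 are arbitrary assumptions), a Compute() body typically
+ // combines these calls as:
+ //
+ //   Tensor scratch;
+ //   Status s = context->allocate_temp(DT_FLOAT, TensorShape({128}),
+ //                                     &scratch);
+ //   if (!s.ok()) { context->SetStatus(s); return; }
+ //   Tensor* result = nullptr;
+ //   s = context->allocate_output(0, output_shape, &result);
+ //   if (!s.ok()) { context->SetStatus(s); return; }
+ //   // ... fill *result, using scratch as temporary storage ...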
+
+ // Allocates output for the specified output index with shape.
+ // OpKernelContext retains ownership of the returned pointer. See
+ // comment above.
+ //
+ // If memory allocation fails, returns an error status.
+ //
+ // REQUIRES: !IsRefType(expected_output_dtype(index))
+ Status allocate_output(int index, const TensorShape& shape,
+ Tensor** tensor) TF_MUST_USE_RESULT;
+ Status allocate_output(const string& name, const TensorShape& shape,
+ Tensor** tensor) TF_MUST_USE_RESULT;
+ // The following methods use the supplied attributes instead of
+ // those in output_alloc_attr. The caller is responsible for
+ // ensuring that the attributes are "compatible" with the
+ // output_alloc_attr, e.g. the tensor is allocated on the correct
+ // device. See comment above.
+ Status allocate_output(int index, const TensorShape& shape, Tensor** tensor,
+ AllocatorAttributes attr) TF_MUST_USE_RESULT;
+ Status allocate_output(const string& name, const TensorShape& shape,
+ Tensor** tensor,
+ AllocatorAttributes attr) TF_MUST_USE_RESULT;
+
+ // Allocates a temporary Tensor of the specified type and
+ // shape. Devices such as GPUs that enqueue Ops for lazy execution
+ // may retain references to the temporary tensors after the Op's
+ // Compute method has run. See comment above.
+ Status allocate_temp(DataType type, const TensorShape& shape,
+ Tensor* out_temp, AllocatorAttributes attr);
+ Status allocate_temp(DataType type, const TensorShape& shape,
+ Tensor* out_temp) {
+ return allocate_temp(type, shape, out_temp, AllocatorAttributes());
+ }
+
+ // Allocates a Tensor of the specified type and shape which the Op
+ // plans to maintain as persistent state. out_persistent holds the
+ // PersistentTensor which is the object the caller should store. For
+ // convenience, if out_tensor is non-null then it will be filled in
+ // with a Tensor* pointing to the newly-allocated tensor which the
+ // caller can use instead of calling
+ // out_persistent->AccessTensor. The caller does not own out_tensor
+ // and should not keep a copy of it. See comment above.
+ Status allocate_persistent(DataType type, const TensorShape& shape,
+ PersistentTensor* out_persistent,
+ Tensor** out_tensor, AllocatorAttributes attr);
+ Status allocate_persistent(DataType type, const TensorShape& shape,
+ PersistentTensor* out_persistent,
+ Tensor** out_tensor) {
+ return allocate_persistent(type, shape, out_persistent, out_tensor,
+ AllocatorAttributes());
+ }
+
+ // Copies a tensor (allocated by the caller) to the specified output
+ // index. REQUIRES: !IsRefType(expected_output_dtype(index))
+ // REQUIRES: 'tensor' must have the same MemoryType as
+ // output_memory_types[index]. See comment above.
+ // TODO(mrry): Convert this to return Status.
+ void set_output(int index, const Tensor& tensor);
+ Status set_output(const string& name, const Tensor& tensor);
+
+ // To output a reference. Caller retains ownership of mu and tensor_for_ref,
+ // and they must outlive all uses within the step. See comment above.
+ // REQUIRES: IsRefType(expected_output_dtype(index))
+ // TODO(mrry): Convert this to return Status.
+ void set_output_ref(int index, mutex* mu, Tensor* tensor_for_ref);
+ Status set_output_ref(const string& name, mutex* mu, Tensor* tensor_for_ref);
+
+ // Returns nullptr if allocate_output() or set_output() have not been called.
+ // TODO(mrry): Convert this to return Status.
+ Tensor* mutable_output(int index);
+ Status mutable_output(const string& name, Tensor** tensor);
+
+ // Transfers ownership of an output tensor to the caller.
+ // NOTE: For non-reference outputs, the caller takes responsibility
+ // for deletion. For reference outputs, the caller does NOT take
+ // responsibility for deletion.
+ // TODO(mrry): Convert this to return Status.
+ TensorValue release_output(int index);
+ Status release_output(const string& name, TensorValue* value);
+
+ // Records device specific state about how the input tensors were
+ // computed.
+ //
+ // If using the templated function, the type must be a subclass
+ // of DeviceContext.
+ //
+ // Get the DeviceContext used for the index input. Returns nullptr
+ // if no DeviceContext was provided.
+ template <typename T>
+ T* input_device_context(int index);
+ DeviceContext* input_device_context(int index);
+
+ // Return the DeviceContext that should be used for this Op.
+ //
+ // If using the templated function, the type must be a subclass
+ // of DeviceContext.
+ //
+ // Returns nullptr if the device did not provide one.
+ template <typename T>
+ T* op_device_context();
+ DeviceContext* op_device_context() {
+ DeviceContext* ret = params_.op_device_context;
+ if (ret == nullptr) {
+ auto* dev_info = device()->tensorflow_gpu_device_info();
+ if (dev_info) ret = dev_info->default_context;
+ }
+ return ret;
+ }
+
+ AllocatorAttributes input_alloc_attr(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.input_alloc_attrs->size());
+ return (*params_.input_alloc_attrs)[index];
+ }
+
+ AllocatorAttributes output_alloc_attr(int index) const {
+ return params_.output_alloc_attr(index);
+ }
+
+ gtl::InlinedVector<WrappedAllocator, 4> wrapped_allocators() const {
+ mutex_lock lock(mu_);
+ gtl::InlinedVector<WrappedAllocator, 4> retrieved = wrapped_allocators_;
+ return retrieved;
+ }
+
+ // Communication.
+ //
+ // An op kernel communicates with outside environment through
+ // Rendezvous Send() and Recv().
+ Rendezvous* rendezvous() const { return params_.rendezvous; }
+
+ // Function call support.
+ //
+ // If this kernel invocation is within a function execution,
+ // call_frame() returns the call frame for the function call.
+ FunctionCallFrame* call_frame() const { return params_.call_frame; }
+
+ // If not nullptr, the kernel can invoke functions defined in the
+ // library. E.g., CHECK_NOTNULL(function_library())->Run("Foo", ...).
+ FunctionLibraryRuntime* function_library() const {
+ return params_.function_library;
+ }
+
+ // Shared resources accessible to this kernel.
+ ResourceMgr* resource_manager() const { return params_.resource_manager; }
+
+ checkpoint::TensorSliceReaderCacheWrapper* slice_reader_cache() const {
+ return params_.slice_reader_cache;
+ }
+
+ // Execution.
+ //
+ // OpKernels can use these eigen devices to carry out their
+ // numerical computation.
+ const Eigen::ThreadPoolDevice& eigen_cpu_device() const {
+ return *device()->eigen_cpu_device();
+ }
+ const Eigen::GpuDevice& eigen_gpu_device() const {
+ return eigen_gpu_device_->device();
+ }
+ template <typename EigenDeviceType>
+ const EigenDeviceType& eigen_device() const;
+
+ // Error handling.
+
+ // If expected_inputs == inputs() and expected_outputs == output_types(),
+ // returns OK, else returns INVALID_ARGUMENT with an error message.
+ // Recommended for Ops with dynamic signatures, where validation can only
+ // be performed at runtime.
+ Status MatchSignature(const DataTypeSlice expected_inputs,
+ const DataTypeSlice expected_outputs);
+
+ // An OpKernel should call SetStatus() if Compute() encounters an
+ // error.
+ void SetStatus(const Status& status) { status_.Update(status); }
+ const Status& status() const { return status_; }
+
+ // Cancellation.
+ //
+ // EXPERIMENTAL. See the implementation in tensorflow::TensorQueue for an
+ // example of how to use this API.
+ CancellationManager* cancellation_manager() const {
+ return params_.cancellation_manager;
+ }
+
+ // Other accessors.
+
+ // For control flow.
+ FrameAndIter frame_iter() const { return params_.frame_iter; }
+ bool is_input_dead() const { return params_.is_input_dead; }
+ bool* is_output_dead() { return &is_output_dead_; }
+
+ // May be used, e.g., to get GPU handles, etc.
+ // TODO(tucker): Add example usage.
+ DeviceBase* device() const { return params_.device; }
+
+ // Access to list of temporary tensors.
+ int num_temps();
+ Tensor* temp(int index);
+
+ // Access to information about whether each output was newly
+ // allocated or copied from an existing tensor
+ AllocationType output_allocation_type(int index) const {
+ return output_allocation_types_[index];
+ }
+
+ private:
+ Allocator* get_allocator(AllocatorAttributes attr) {
+ Allocator* allocator = params_.device->GetAllocator(attr);
+ if (params_.track_allocations) {
+ mutex_lock lock(mu_);
+ for (const auto& wrapped : wrapped_allocators_) {
+ if (wrapped.first == allocator) {
+ return wrapped.second;
+ }
+ }
+ TrackingAllocator* wrapped_allocator = new TrackingAllocator(allocator);
+ wrapped_allocators_.push_back(
+ std::make_pair(allocator, wrapped_allocator));
+ return wrapped_allocator;
+ } else {
+ return allocator;
+ }
+ }
+
+ // Per-step resource manager for use by white-listed internal ops.
+ friend class TemporaryVariableOp;
+ friend class DestroyTemporaryVariableOp;
+ ResourceMgr* step_resource_manager() const {
+ return params_.step_resource_manager;
+ }
+
+ // Internal common method used when allocating tensor memory
+ Status allocate_tensor(DataType type, const TensorShape& shape,
+ Tensor* out_tensor, AllocatorAttributes attr);
+
+ // This is called by PersistentTensor::AccessTensor whenever the
+ // wrapped tensor is retrieved, to ensure the runtime knows that the
+ // Tensor is being accessed within an Op. This is necessary for
+ // memory safety of devices like GPUs that queue Ops for
+ // asynchronous execution after the Compute() method completes.
+ friend class PersistentTensor;
+ void NotifyUseOfPersistentTensor(const Tensor& tensor);
+
+ Status status_;
+ Params params_; // immutable after construction.
+ const PerOpGpuDevice* eigen_gpu_device_; // owned, with a per-op
+ // wrapped allocator
+ mutable mutex mu_; // mutable so const accessors can acquire the lock
+ gtl::InlinedVector<WrappedAllocator, 4> wrapped_allocators_ GUARDED_BY(mu_);
+ gtl::InlinedVector<TensorValue, 4> outputs_;
+ gtl::InlinedVector<AllocationType, 4> output_allocation_types_;
+ gtl::InlinedVector<Tensor*, 4> temp_tensors_;
+ bool is_output_dead_ = false;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(OpKernelContext);
+};
+
+// Register your OpKernel by specifying the Op's name, the device the
+// kernel runs on, any type attr constraints for this kernel, any
+// host-memory args, and the class to instantiate. Examples:
+//
+// // A kernel that supports all types.
+// REGISTER_KERNEL_BUILDER(Name("Save").Device(DEVICE_CPU), SaveOp);
+//
+// // The following are equivalent ways of specifying that the kernel only
+// // works if the "T" type attr is set to DT_FLOAT.
+// REGISTER_KERNEL_BUILDER(
+// Name("Sub").Device(DEVICE_CPU).TypeConstraint<float>("T"),
+// SubOp<float>);
+// // (You would then repeat this for every type supported by "Sub".)
+//
+// // This form allows you to specify a list of types as the constraint.
+// REGISTER_KERNEL_BUILDER(Name("Sub")
+// .Device(DEVICE_CPU)
+// .TypeConstraint("T", {DT_FLOAT}),
+// SubOp<float>);
+//
+// // A kernel that expects one of the input tensors in host memory.
+// REGISTER_KERNEL_BUILDER(
+// Name("Reshape").Device(DEVICE_GPU).HostMemory("shape"), ReshapeOp);
+//
+// See kernel_def_builder for details.
+
+// Instantiates an OpKernel that has been registered. Returns nullptr
+// if no kernel is registered for that device / input signature
+// combination (and sets *status to NOT_FOUND), or if there is an error
+// during construction (and sets *status to INVALID_ARGUMENT). Otherwise,
+// the caller takes ownership of the returned pointer.
+// EXPECTED USAGE: unique_ptr<OpKernel> op = CreateOpKernel(...);
+// REQUIRES: def has all attrs specified (e.g. using AddDefaultsToNodeDef()).
+std::unique_ptr<OpKernel> CreateOpKernel(DeviceType device_type,
+ DeviceBase* device,
+ Allocator* allocator,
+ const NodeDef& def, Status* status);
+Status CreateOpKernel(DeviceType device_type, DeviceBase* device,
+ Allocator* allocator, FunctionLibraryRuntime* flib,
+ const NodeDef& def, OpKernel** kernel);
+
+// Returns into 'device_types' the subset of prioritized_types that this
+// binary has registered for the given NodeDef.
+//
+// REQUIRES: * 'device_types' is not nullptr.
+// * def has all attrs specified (e.g. using AddDefaultsToNodeDef()).
+Status SupportedDeviceTypesForNode(
+ const std::vector<DeviceType>& prioritized_types, const NodeDef& def,
+ DeviceTypeVector* device_types);
+
+// Returns into *{input,output}_memory_types the memory type of each
+// {input,output} tensor.
+//
+// REQUIRES: * '*_memory_types' is not nullptr.
+// * def has all attrs specified (e.g. using AddDefaultsToNodeDef()).
+Status MemoryTypesForNode(DeviceType device_type, const NodeDef& ndef,
+ const OpDef& op_def,
+ const NameRangeMap& input_name_map,
+ const NameRangeMap& output_name_map,
+ MemoryTypeVector* input_memory_types,
+ MemoryTypeVector* output_memory_types);
+
+Status MemoryTypesForNode(const OpRegistryInterface* op_registry,
+ DeviceType device_type, const NodeDef& ndef,
+ MemoryTypeVector* input_memory_types,
+ MemoryTypeVector* output_memory_types);
+
+// Call once after Op registration has completed.
+Status ValidateKernelRegistrations(const OpRegistryInterface* op_registry);
+
+// -----------------------------------------------------------------------------
+// OpKernel registration implementation follows, please ignore.
+
+// Allow the REGISTER_KERNEL_BUILDER(Name("op_name").Device(...)...) syntax.
+namespace register_kernel {
+typedef ::tensorflow::KernelDefBuilder Name;
+} // namespace register_kernel
+
+#define REGISTER_KERNEL_BUILDER(kernel_builder, ...) \
+ REGISTER_KERNEL_BUILDER_UNIQ_HELPER(__COUNTER__, kernel_builder, __VA_ARGS__)
+
+#define REGISTER_KERNEL_BUILDER_UNIQ_HELPER(ctr, kernel_builder, ...) \
+ REGISTER_KERNEL_BUILDER_UNIQ(ctr, kernel_builder, __VA_ARGS__)
+
+#define REGISTER_KERNEL_BUILDER_UNIQ(ctr, kernel_builder, ...) \
+ static ::tensorflow::kernel_factory::OpKernelRegistrar \
+ registrar__body__##ctr##__object( \
+ ::tensorflow::register_kernel::kernel_builder.Build(), \
+ +[](::tensorflow::OpKernelConstruction* context) \
+ -> ::tensorflow::OpKernel* { return new __VA_ARGS__(context); })
+
+namespace kernel_factory {
+
+class OpKernelRegistrar {
+ public:
+ typedef OpKernel* (*Factory)(OpKernelConstruction*);
+ OpKernelRegistrar(const KernelDef* kernel_def, Factory factory);
+};
+
+} // namespace kernel_factory
+
+// -----------------------------------------------------------------------------
+// Template and inline method implementations, please ignore
+
+inline DataType OpKernelContext::input_dtype(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ const TensorValue& value((*params_.inputs)[index]);
+ if (value.is_ref()) {
+ return MakeRefType(value->dtype());
+ } else {
+ return value->dtype();
+ }
+}
+
+inline DataType OpKernelContext::expected_output_dtype(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.op_kernel->output_types().size());
+ return params_.op_kernel->output_type(index);
+}
+
+inline const Tensor& OpKernelContext::input(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ DCHECK(!(*params_.inputs)[index].is_ref());
+ return *((*params_.inputs)[index].tensor);
+}
+
+inline Tensor OpKernelContext::mutable_input(int index, bool lock_held) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ DCHECK((*params_.inputs)[index].is_ref());
+ // return a copy of the Ref acquired while holding the mutex
+ if (lock_held) {
+ return *((*params_.inputs)[index].tensor);
+ } else {
+ mutex_lock l(*input_ref_mutex(index));
+ return *((*params_.inputs)[index].tensor);
+ }
+}
+
+inline void OpKernelContext::replace_ref_input(int index, const Tensor& tensor,
+ bool lock_held) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ DCHECK((*params_.inputs)[index].is_ref());
+ // should only modify the tensor while holding the mutex
+ if (lock_held) {
+ *(*params_.inputs)[index].tensor = tensor;
+ } else {
+ mutex_lock l(*input_ref_mutex(index));
+ *(*params_.inputs)[index].tensor = tensor;
+ }
+}
+
+inline void OpKernelContext::forward_ref_input_to_ref_output(int input_index,
+ int output_index) {
+ DCHECK_GE(input_index, 0);
+ DCHECK_LT(input_index, params_.inputs->size());
+ DCHECK((*params_.inputs)[input_index].is_ref());
+ set_output_ref(output_index, (*params_.inputs)[input_index].mutex_if_ref,
+ (*params_.inputs)[input_index].tensor);
+}
+
+inline void OpKernelContext::delete_ref_input(int index, bool lock_held) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ DCHECK((*params_.inputs)[index].is_ref());
+ // should only modify the tensor while holding the mutex
+ if (lock_held) {
+ delete (*params_.inputs)[index].tensor;
+ } else {
+ mutex_lock l(*input_ref_mutex(index));
+ delete (*params_.inputs)[index].tensor;
+ }
+}
+
+// no input if tensor == nullptr.
+inline bool OpKernelContext::has_input(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ return (*params_.inputs)[index].tensor != nullptr;
+}
+
+inline mutex* OpKernelContext::input_ref_mutex(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.inputs->size());
+ DCHECK((*params_.inputs)[index].is_ref());
+ return (*params_.inputs)[index].mutex_if_ref;
+}
+
+inline Status OpKernelContext::allocate_output(int index,
+ const TensorShape& shape,
+ Tensor** output) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, num_outputs());
+ DCHECK(params_.output_alloc_attr);
+ AllocatorAttributes attr = params_.output_alloc_attr(index);
+ return allocate_output(index, shape, output, attr);
+}
+
+inline Status OpKernelContext::allocate_tensor(DataType type,
+ const TensorShape& shape,
+ Tensor* out_tensor,
+ AllocatorAttributes attr) {
+ Allocator* a = get_allocator(attr);
+ Tensor new_tensor(a, type, shape);
+
+ if (!new_tensor.IsInitialized() && shape.num_elements() > 0) {
+    return errors::ResourceExhausted("OOM when allocating tensor with shape ",
+ shape.DebugString());
+ }
+ *out_tensor = new_tensor;
+ return Status::OK();
+}
+
+inline Status OpKernelContext::allocate_output(int index,
+ const TensorShape& shape,
+ Tensor** output,
+ AllocatorAttributes attr) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, outputs_.size());
+ // Record the fact that this output tensor was allocated by the Op
+ DCHECK_LT(index, output_allocation_types_.size());
+ output_allocation_types_[index] = AT_ALLOCATED;
+ const DataType type = params_.op_kernel->output_type(index);
+ DCHECK(!IsRefType(type));
+ DCHECK(mutable_output(index) == nullptr);
+ Tensor* output_tensor = new Tensor();
+ Status s = allocate_tensor(type, shape, output_tensor, attr);
+ if (s.ok()) {
+ outputs_[index] = TensorValue(output_tensor);
+ *output = outputs_[index].tensor;
+ }
+ return s;
+}
+
+inline Status OpKernelContext::allocate_temp(DataType type,
+ const TensorShape& shape,
+ Tensor* out_temp,
+ AllocatorAttributes attr) {
+ Status s = allocate_tensor(type, shape, out_temp, attr);
+ if (s.ok()) {
+ if (params_.device->SaveTemporaryTensors()) {
+ // keep a reference to the underlying memory around
+ temp_tensors_.push_back(new Tensor(*out_temp));
+ }
+ }
+ return s;
+}
+
+inline Status OpKernelContext::allocate_persistent(
+ DataType type, const TensorShape& shape, PersistentTensor* out_persistent,
+ Tensor** out_tensor, AllocatorAttributes attr) {
+ // TODO(misard) add specific memory tracking for persistent tensors
+ Tensor persistent;
+ Status s = allocate_tensor(type, shape, &persistent, attr);
+ if (s.ok()) {
+ *out_persistent = PersistentTensor(persistent);
+ // This call saves a reference to the newly-allocated tensor if we
+ // are saving temporary tensors
+ Tensor* allocated = out_persistent->AccessTensor(this);
+ if (out_tensor) {
+ *out_tensor = allocated;
+ }
+ }
+ return s;
+}
+
+inline void OpKernelContext::NotifyUseOfPersistentTensor(const Tensor& t) {
+ if (t.IsInitialized() && params_.device->SaveTemporaryTensors()) {
+ // keep a reference to the underlying memory around
+ temp_tensors_.push_back(new Tensor(t));
+ }
+}
+
+inline int OpKernelContext::num_temps() { return temp_tensors_.size(); }
+
+inline Tensor* OpKernelContext::temp(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, temp_tensors_.size());
+ return temp_tensors_[index];
+}
+
+inline void OpKernelContext::set_output(int index, const Tensor& tensor) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, outputs_.size());
+ // Record the fact that this output tensor was set by the Op
+ DCHECK_LT(index, output_allocation_types_.size());
+ output_allocation_types_[index] = AT_EXISTING;
+ DCHECK(!IsRefType(params_.op_kernel->output_type(index)));
+ DCHECK_EQ(mutable_output(index), nullptr);
+ outputs_[index] = TensorValue(new Tensor(tensor));
+}
+
+inline void OpKernelContext::set_output_ref(int index, mutex* mu,
+ Tensor* tensor_for_ref) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, outputs_.size());
+  // Record the fact that this output tensor was set by reference by the Op
+ DCHECK_LT(index, output_allocation_types_.size());
+ output_allocation_types_[index] = AT_REF;
+ DCHECK(IsRefType(params_.op_kernel->output_type(index)));
+ outputs_[index] = TensorValue(mu, tensor_for_ref);
+}
+
+inline Tensor* OpKernelContext::mutable_output(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, outputs_.size());
+ return outputs_[index].tensor;
+}
+
+inline TensorValue OpKernelContext::release_output(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, outputs_.size());
+ TensorValue value = outputs_[index];
+ outputs_[index] = TensorValue();
+ return value;
+}
+
+template <typename T>
+T* OpKernelContext::op_device_context() {
+ static_assert(std::is_base_of<DeviceContext, T>::value,
+ "T is not a subclass of DeviceContext");
+ return static_cast<T*>(op_device_context());
+}
+
+template <typename T>
+T* OpKernelContext::input_device_context(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.input_device_contexts->size());
+ static_assert(std::is_base_of<DeviceContext, T>::value,
+ "T is not a subclass of DeviceContext");
+ return static_cast<T*>((*params_.input_device_contexts)[index]);
+}
+
+inline DeviceContext* OpKernelContext::input_device_context(int index) {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, params_.input_device_contexts->size());
+ return (*params_.input_device_contexts)[index];
+}
+
+inline const Tensor& OpInputList::operator[](int i) const {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->input(start_ + i);
+}
+
+inline mutex* OpMutableInputList::ref_mutex(int i) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->input_ref_mutex(start_ + i);
+}
+
+inline Tensor OpMutableInputList::at(int i, bool lock_held) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->mutable_input(start_ + i, lock_held);
+}
+
+inline Tensor* OpOutputList::operator[](int i) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->mutable_output(start_ + i);
+}
+
+inline bool OpOutputList::required(int i) const {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->output_required(start_ + i);
+}
+
+inline Status OpOutputList::allocate(int i, const TensorShape& shape,
+ Tensor** output) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ return ctx_->allocate_output(start_ + i, shape, output);
+}
+
+inline void OpOutputList::set(int i, const Tensor& tensor) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ ctx_->set_output(start_ + i, tensor);
+}
+
+inline void OpOutputList::set_ref(int i, mutex* mu, Tensor* tensor_for_ref) {
+ DCHECK_GE(i, 0);
+ DCHECK_LT(i, stop_ - start_);
+ ctx_->set_output_ref(i, mu, tensor_for_ref);
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_KERNEL_H_
diff --git a/tensorflow/core/framework/op_kernel_test.cc b/tensorflow/core/framework/op_kernel_test.cc
new file mode 100644
index 0000000000..9400ef24f8
--- /dev/null
+++ b/tensorflow/core/framework/op_kernel_test.cc
@@ -0,0 +1,803 @@
+#include "tensorflow/core/framework/op_kernel.h"
+
+#include <memory>
+#include <vector>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include <gtest/gtest.h>
+
+class DummyKernel : public tensorflow::OpKernel {
+ public:
+ explicit DummyKernel(tensorflow::OpKernelConstruction* context)
+ : OpKernel(context) {}
+ void Compute(tensorflow::OpKernelContext* context) override {}
+};
+
+// Test that registration works outside a namespace.
+REGISTER_OP("Test1").Input("a: float").Input("b: int32").Output("o: uint8");
+REGISTER_KERNEL_BUILDER(Name("Test1").Device(tensorflow::DEVICE_CPU),
+ DummyKernel);
+
+namespace foo {
+bool match_signature_ = false;
+
+// Test that registration works inside a different namespace.
+class TestOp2 : public ::tensorflow::OpKernel {
+ public:
+ explicit TestOp2(::tensorflow::OpKernelConstruction* context)
+ : OpKernel(context) {
+ ::tensorflow::Status status = context->MatchSignature(
+ {::tensorflow::DT_INT32}, {::tensorflow::DT_INT32});
+ match_signature_ = status.ok();
+ context->SetStatus(status);
+ }
+ void Compute(::tensorflow::OpKernelContext* context) override {}
+};
+
+REGISTER_OP("Test2").Input("i: T").Output("o: T").Attr("T: type");
+REGISTER_KERNEL_BUILDER(Name("Test2")
+ .Device(::tensorflow::DEVICE_GPU)
+ .HostMemory("i")
+ .HostMemory("o"),
+ TestOp2);
+} // namespace foo
+
+namespace tensorflow {
+
+// Two operations with the same name but different devices.
+REGISTER_OP("Test3").Input("a: T").Input("b: T").Attr("T: type");
+
+class TestOp3Cpu : public tensorflow::OpKernel {
+ public:
+ explicit TestOp3Cpu(OpKernelConstruction* context) : OpKernel(context) {}
+ void Compute(OpKernelContext* context) override {}
+};
+
+REGISTER_KERNEL_BUILDER(
+ Name("Test3").Device(DEVICE_CPU).TypeConstraint<int8>("T"), TestOp3Cpu);
+
+namespace {
+
+class TestOp3Gpu : public tensorflow::OpKernel {
+ public:
+ explicit TestOp3Gpu(OpKernelConstruction* context) : OpKernel(context) {}
+ void Compute(OpKernelContext* context) override {}
+};
+
+REGISTER_KERNEL_BUILDER(
+ Name("Test3").Device(DEVICE_GPU).TypeConstraint<float>("T"), TestOp3Cpu);
+
+// An Op registered for both CPU and GPU.
+REGISTER_OP("Test4").Input("i: float").Output("o: float");
+REGISTER_KERNEL_BUILDER(Name("Test4").Device(DEVICE_CPU), DummyKernel);
+REGISTER_KERNEL_BUILDER(Name("Test4").Device(DEVICE_GPU), DummyKernel);
+
+static std::vector<DeviceType> DeviceTypes() {
+ return {DeviceType(DEVICE_GPU), DeviceType(DEVICE_CPU)};
+}
+
+class OpKernelTest : public ::testing::Test {
+ public:
+ OpKernelTest() : device_(Env::Default()) {}
+
+ protected:
+ NodeDef CreateNodeDef(const string& op_type, const DataTypeVector& inputs) {
+ NodeDefBuilder builder(op_type + "-op", op_type);
+ for (DataType dt : inputs) {
+ builder.Input(FakeInput(dt));
+ }
+ NodeDef node_def;
+ TF_CHECK_OK(builder.Finalize(&node_def));
+ return node_def;
+ }
+
+ void ExpectEqual(const string& what, const DataTypeVector& expected,
+ const DataTypeVector& observed) {
+ EXPECT_EQ(expected.size(), observed.size()) << what;
+ const int size = std::min(expected.size(), observed.size());
+ for (int i = 0; i < size; ++i) {
+ bool match = TypesCompatible(expected[i], observed[i]);
+ EXPECT_TRUE(match) << what << " i:" << i << ", expected: " << expected[i]
+ << ", observed: " << observed[i];
+ }
+ }
+
+ void ExpectSuccess(const string& op_type, DeviceType device_type,
+ const DataTypeVector& inputs,
+ const DataTypeVector& outputs) {
+ Status status;
+ std::unique_ptr<OpKernel> op(
+ CreateOpKernel(device_type, &device_, cpu_allocator(),
+ CreateNodeDef(op_type, inputs), &status));
+ EXPECT_TRUE(status.ok()) << status;
+ EXPECT_TRUE(op != nullptr);
+ if (op != nullptr) {
+ ExpectEqual("inputs", op->input_types(), inputs);
+ ExpectEqual("outputs", op->output_types(), outputs);
+ }
+ }
+
+ void ExpectFailure(const string& ascii_node_def, DeviceType device_type,
+ error::Code code) {
+ NodeDef node_def;
+ protobuf::TextFormat::ParseFromString(ascii_node_def, &node_def);
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ device_type, &device_, cpu_allocator(), node_def, &status));
+ EXPECT_TRUE(op == nullptr);
+ EXPECT_FALSE(status.ok());
+ if (!status.ok()) {
+ LOG(INFO) << "Status message: " << status.error_message();
+ EXPECT_EQ(code, status.code());
+ }
+ }
+
+ private:
+ DeviceBase device_;
+};
+
+TEST_F(OpKernelTest, SuccessCpu) {
+ ExpectSuccess("Test1", DEVICE_CPU, {DT_FLOAT, DT_INT32}, {DT_UINT8});
+ ExpectSuccess("Test1", DEVICE_CPU, {DT_FLOAT_REF, DT_INT32}, {DT_UINT8});
+}
+
+TEST_F(OpKernelTest, SuccessGpu) {
+ foo::match_signature_ = false;
+ ExpectSuccess("Test2", DEVICE_GPU, {DT_INT32}, {DT_INT32});
+ EXPECT_TRUE(foo::match_signature_);
+}
+
+TEST_F(OpKernelTest, SuccessBothCpuAndGpu) {
+ ExpectSuccess("Test3", DEVICE_CPU, {DT_INT8, DT_INT8}, {});
+ ExpectSuccess("Test3", DEVICE_GPU, {DT_FLOAT, DT_FLOAT}, {});
+}
+
+TEST_F(OpKernelTest, CpuTypeRegistered) {
+ NodeDef ndef = CreateNodeDef("Test1", {DT_FLOAT, DT_INT32});
+ DeviceTypeVector devs;
+ ASSERT_OK(SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs));
+ EXPECT_EQ(1, devs.size());
+ EXPECT_EQ(DeviceType(DEVICE_CPU), devs[0]);
+}
+
+TEST_F(OpKernelTest, CpuAndGpuTypeRegistered) {
+ {
+ // Try a node def of an op that is registered for a specific type
+ // only on CPU.
+ NodeDef ndef = CreateNodeDef("Test3", {DT_INT8, DT_INT8});
+ DeviceTypeVector devs;
+ ASSERT_OK(SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs));
+ EXPECT_EQ(1, devs.size());
+ EXPECT_EQ(DeviceType(DEVICE_CPU), devs[0]);
+ }
+ {
+ // Try a node def of an op that is registered for a specific type
+ // only on GPU.
+ NodeDef ndef = CreateNodeDef("Test3", {DT_FLOAT, DT_FLOAT});
+ DeviceTypeVector devs;
+ ASSERT_OK(SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs));
+ EXPECT_EQ(1, devs.size());
+ EXPECT_EQ(DeviceType(DEVICE_GPU), devs[0]);
+ }
+ {
+ // Try a node def of an op that is only registered for other types.
+ NodeDef ndef = CreateNodeDef("Test3", {DT_STRING, DT_STRING});
+ DeviceTypeVector devs;
+ ASSERT_OK(SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs));
+ EXPECT_EQ(0, devs.size());
+ }
+
+ {
+ // Try a node def of an op that is registered for both.
+ NodeDef ndef = CreateNodeDef("Test4", {DT_FLOAT});
+ DeviceTypeVector devs;
+ ASSERT_OK(SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs));
+ EXPECT_EQ(2, devs.size());
+ EXPECT_EQ(DeviceType(DEVICE_GPU), devs[0]);
+ EXPECT_EQ(DeviceType(DEVICE_CPU), devs[1]);
+ }
+}
+
+TEST_F(OpKernelTest, NotFound) {
+ const auto not_found = error::NOT_FOUND;
+ // Something with that op type name exists, but only with a
+ // different DeviceType.
+ ExpectFailure(CreateNodeDef("Test1", {DT_FLOAT, DT_INT32}).DebugString(),
+ DEVICE_GPU, not_found);
+ ExpectFailure(CreateNodeDef("Test3", {DT_INT8, DT_INT8}).DebugString(),
+ DEVICE_GPU, not_found);
+ ExpectFailure(CreateNodeDef("Test3", {DT_FLOAT, DT_FLOAT}).DebugString(),
+ DEVICE_CPU, not_found);
+
+ // No kernel with that signature registered.
+ ExpectFailure(CreateNodeDef("Test3", {DT_INT32, DT_INT32}).DebugString(),
+ DEVICE_GPU, not_found);
+
+ // Nothing with that op type name exists.
+ ExpectFailure("name: 'NF' op: 'Testnotfound'", DEVICE_CPU, not_found);
+ ExpectFailure("name: 'NF' op: 'Testnotfound'", DEVICE_GPU, not_found);
+}
+
+TEST_F(OpKernelTest, TooFewInputs) {
+ const auto invalid = error::INVALID_ARGUMENT;
+ NodeDef node_def = CreateNodeDef("Test1", {DT_FLOAT, DT_INT32});
+ node_def.clear_input();
+ ExpectFailure(node_def.DebugString(), DEVICE_CPU, invalid);
+ node_def.add_input("a");
+ ExpectFailure(node_def.DebugString(), DEVICE_CPU, invalid);
+}
+
+TEST_F(OpKernelTest, TooManyInputs) {
+ const auto invalid = error::INVALID_ARGUMENT;
+ NodeDef node_def = CreateNodeDef("Test1", {DT_FLOAT, DT_INT32});
+ node_def.add_input("c");
+ ExpectFailure(node_def.DebugString(), DEVICE_CPU, invalid);
+}
+
+TEST_F(OpKernelTest, MatchSignatureFails) {
+ const auto invalid = error::INVALID_ARGUMENT;
+ foo::match_signature_ = true;
+ ExpectFailure(CreateNodeDef("Test2", {DT_FLOAT}).DebugString(), DEVICE_GPU,
+ invalid);
+ EXPECT_FALSE(foo::match_signature_);
+}
+
+class DummyDevice : public DeviceBase {
+ public:
+ DummyDevice(Env* env, bool save) : DeviceBase(env), save_(save) {}
+ bool SaveTemporaryTensors() const override { return save_; }
+ Allocator* GetAllocator(AllocatorAttributes /*attr*/) override {
+ return cpu_allocator();
+ }
+
+ private:
+ bool save_;
+};
+
+TEST_F(OpKernelTest, SaveTempFalse) {
+ Env* env = Env::Default();
+ OpKernelContext::Params params;
+ params.device = new DummyDevice(env, false);
+ Status status;
+ std::unique_ptr<OpKernel> op(
+ CreateOpKernel(DEVICE_CPU, params.device, cpu_allocator(),
+ CreateNodeDef("Test1", {DT_FLOAT, DT_INT32}), &status));
+ EXPECT_TRUE(status.ok());
+ params.op_kernel = op.get();
+ OpKernelContext* ctx = new OpKernelContext(params);
+
+ Tensor t;
+ EXPECT_OK(ctx->allocate_temp(DT_FLOAT, TensorShape(), &t));
+
+ EXPECT_EQ(0, ctx->num_temps());
+
+ delete ctx;
+ delete params.device;
+}
+
+TEST_F(OpKernelTest, SaveTempTrue) {
+ Env* env = Env::Default();
+ OpKernelContext::Params params;
+ params.device = new DummyDevice(env, true);
+ Status status;
+ std::unique_ptr<OpKernel> op(
+ CreateOpKernel(DEVICE_CPU, params.device, cpu_allocator(),
+ CreateNodeDef("Test1", {DT_FLOAT, DT_INT32}), &status));
+ EXPECT_TRUE(status.ok());
+ params.op_kernel = op.get();
+ OpKernelContext* ctx = new OpKernelContext(params);
+
+ Tensor t;
+ EXPECT_OK(ctx->allocate_temp(DT_FLOAT, TensorShape(), &t));
+
+ EXPECT_EQ(1, ctx->num_temps());
+
+ delete ctx;
+ delete params.device;
+}
+
+class OpKernelBuilderTest : public ::testing::Test {
+ protected:
+ // Each attr is described by a "name|type|value".
+ NodeDef CreateNodeDef(const string& op_type,
+ const std::vector<string>& attrs) {
+ NodeDef node_def;
+ node_def.set_name(op_type + "-op");
+ node_def.set_op(op_type);
+ for (const string& attr_desc : attrs) {
+ std::vector<string> parts = str_util::Split(attr_desc, '|');
+ CHECK_EQ(parts.size(), 3);
+ AttrValue attr_value;
+ CHECK(ParseAttrValue(parts[1], parts[2], &attr_value)) << attr_desc;
+ node_def.mutable_attr()->insert(
+ AttrValueMap::value_type(parts[0], attr_value));
+ }
+ return node_def;
+ }
+
+ std::unique_ptr<OpKernel> ExpectSuccess(const string& op_type,
+ DeviceType device_type,
+ const std::vector<string>& attrs,
+ DataTypeSlice input_types = {}) {
+ Status status;
+ NodeDef def = CreateNodeDef(op_type, attrs);
+ for (size_t i = 0; i < input_types.size(); ++i) {
+ def.add_input("a:0");
+ }
+
+ Env* env = Env::Default();
+ DeviceBase device(env);
+
+ // Test CreateOpKernel()
+ std::unique_ptr<OpKernel> op(
+ CreateOpKernel(device_type, &device, cpu_allocator(), def, &status));
+ EXPECT_TRUE(status.ok()) << status;
+ EXPECT_TRUE(op != nullptr);
+ if (op != nullptr) {
+ EXPECT_EQ(input_types.size(), op->num_inputs());
+ EXPECT_EQ(0, op->num_outputs());
+ }
+
+ // Test SupportedDeviceTypesForNode()
+ DeviceTypeVector devices;
+ EXPECT_OK(SupportedDeviceTypesForNode(DeviceTypes(), def, &devices));
+ bool found = false;
+ for (DeviceType dt : devices) {
+ if (dt == device_type) {
+ found = true;
+ }
+ }
+ EXPECT_TRUE(found) << "Missing " << device_type << " from "
+ << devices.size() << " devices.";
+
+ // In case the caller wants to use the OpKernel
+ return op;
+ }
+
+ void ExpectFailure(const string& op_type, DeviceType device_type,
+ const std::vector<string>& attrs, error::Code code) {
+ Status status;
+ const NodeDef def = CreateNodeDef(op_type, attrs);
+ Env* env = Env::Default();
+ DeviceBase device(env);
+
+ // Test CreateOpKernel().
+ std::unique_ptr<OpKernel> op(
+ CreateOpKernel(device_type, &device, cpu_allocator(), def, &status));
+ EXPECT_TRUE(op == nullptr);
+ EXPECT_FALSE(status.ok());
+ if (!status.ok()) {
+ LOG(INFO) << "Status message: " << status.error_message();
+ EXPECT_EQ(code, status.code());
+
+ // Test SupportedDeviceTypesForNode().
+ DeviceTypeVector devices;
+ if (errors::IsNotFound(status)) {
+ EXPECT_OK(SupportedDeviceTypesForNode(DeviceTypes(), def, &devices));
+ for (DeviceType dt : devices) {
+ EXPECT_NE(dt, device_type);
+ }
+ } else {
+ Status status2 =
+ SupportedDeviceTypesForNode(DeviceTypes(), def, &devices);
+ EXPECT_EQ(status.code(), status2.code());
+ }
+ }
+ }
+};
+
+REGISTER_OP("BuildCPU");
+REGISTER_KERNEL_BUILDER(Name("BuildCPU").Device(DEVICE_CPU), DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BuilderCPU) {
+ ExpectSuccess("BuildCPU", DEVICE_CPU, {});
+ ExpectFailure("BuildCPU", DEVICE_GPU, {}, error::NOT_FOUND);
+}
+
+REGISTER_OP("BuildGPU");
+REGISTER_KERNEL_BUILDER(Name("BuildGPU").Device(DEVICE_GPU), DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BuilderGPU) {
+ ExpectFailure("BuildGPU", DEVICE_CPU, {}, error::NOT_FOUND);
+ ExpectSuccess("BuildGPU", DEVICE_GPU, {});
+}
+
+REGISTER_OP("BuildBoth");
+REGISTER_KERNEL_BUILDER(Name("BuildBoth").Device(DEVICE_CPU), DummyKernel);
+REGISTER_KERNEL_BUILDER(Name("BuildBoth").Device(DEVICE_GPU), DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BuilderBoth) {
+ ExpectSuccess("BuildBoth", DEVICE_CPU, {});
+ ExpectSuccess("BuildBoth", DEVICE_GPU, {});
+}
+
+REGISTER_OP("BuildTypeAttr").Attr("T: type");
+REGISTER_KERNEL_BUILDER(Name("BuildTypeAttr")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BuilderTypeAttr) {
+ ExpectSuccess("BuildTypeAttr", DEVICE_CPU, {"T|type|DT_FLOAT"});
+ ExpectFailure("BuildTypeAttr", DEVICE_CPU, {"T|type|DT_BOOL"},
+ error::NOT_FOUND);
+ ExpectFailure("BuildTypeAttr", DEVICE_CPU, {}, error::INVALID_ARGUMENT);
+ ExpectFailure("BuildTypeAttr", DEVICE_CPU, {"T|int|7"},
+ error::INVALID_ARGUMENT);
+}
+
+REGISTER_OP("BuildTypeListAttr").Attr("T: list(type)");
+REGISTER_KERNEL_BUILDER(Name("BuildTypeListAttr")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<bool>("T"),
+ DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BuilderTypeListAttr) {
+ ExpectSuccess("BuildTypeListAttr", DEVICE_CPU, {"T|list(type)|[]"});
+ ExpectSuccess("BuildTypeListAttr", DEVICE_CPU, {"T|list(type)|[DT_BOOL]"});
+ ExpectSuccess("BuildTypeListAttr", DEVICE_CPU,
+ {"T|list(type)|[DT_BOOL, DT_BOOL]"});
+ ExpectFailure("BuildTypeListAttr", DEVICE_CPU, {"T|list(type)|[DT_FLOAT]"},
+ error::NOT_FOUND);
+ ExpectFailure("BuildTypeListAttr", DEVICE_CPU, {}, error::INVALID_ARGUMENT);
+ ExpectFailure("BuildTypeListAttr", DEVICE_CPU, {"T|int|7"},
+ error::INVALID_ARGUMENT);
+}
+
+REGISTER_OP("DuplicateKernel");
+REGISTER_KERNEL_BUILDER(Name("DuplicateKernel").Device(DEVICE_CPU),
+ DummyKernel);
+REGISTER_KERNEL_BUILDER(Name("DuplicateKernel").Device(DEVICE_CPU),
+ DummyKernel);
+
+TEST_F(OpKernelBuilderTest, DuplicateKernel) {
+ const NodeDef ndef = CreateNodeDef("DuplicateKernel", {});
+ DeviceTypeVector devs;
+ Status status = SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs);
+ ASSERT_FALSE(status.ok());
+ EXPECT_TRUE(StringPiece(status.error_message())
+ .contains("Multiple OpKernel registrations match NodeDef"));
+
+ ExpectFailure("DuplicateKernel", DEVICE_CPU, {}, error::INVALID_ARGUMENT);
+}
+
+REGISTER_OP("DuplicateKernelForT").Attr("T: type");
+REGISTER_KERNEL_BUILDER(Name("DuplicateKernelForT")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ DummyKernel);
+REGISTER_KERNEL_BUILDER(Name("DuplicateKernelForT")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ DummyKernel);
+
+TEST_F(OpKernelBuilderTest, DuplicateKernelForT) {
+ const NodeDef ndef =
+ CreateNodeDef("DuplicateKernelForT", {"T|type|DT_FLOAT"});
+ DeviceTypeVector devs;
+ Status status = SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs);
+ ASSERT_FALSE(status.ok());
+ EXPECT_TRUE(StringPiece(status.error_message())
+ .contains("Multiple OpKernel registrations match NodeDef"));
+
+ ExpectFailure("DuplicateKernelForT", DEVICE_CPU, {"T|type|DT_FLOAT"},
+ error::INVALID_ARGUMENT);
+ ExpectFailure("DuplicateKernelForT", DEVICE_CPU, {"T|type|DT_BOOL"},
+ error::NOT_FOUND);
+}
+
+REGISTER_OP("BadConstraint").Attr("dtype: type");
+REGISTER_KERNEL_BUILDER(Name("BadConstraint")
+ .Device(DEVICE_CPU)
+ // Mistake: "T" should be "dtype".
+ .TypeConstraint<float>("T"),
+ DummyKernel);
+
+TEST_F(OpKernelBuilderTest, BadConstraint) {
+ const NodeDef ndef = CreateNodeDef("BadConstraint", {});
+ DeviceTypeVector devs;
+ Status status = SupportedDeviceTypesForNode(DeviceTypes(), ndef, &devs);
+ ASSERT_FALSE(status.ok());
+ EXPECT_TRUE(StringPiece(status.error_message())
+ .contains("OpKernel 'BadConstraint' has constraint on attr "
+ "'T' not in NodeDef"));
+
+ ExpectFailure("BadConstraint", DEVICE_CPU, {"dtype|type|DT_FLOAT"},
+ error::INVALID_ARGUMENT);
+}
+
+class GetAttrKernel : public ::tensorflow::OpKernel {
+ public:
+ explicit GetAttrKernel(OpKernelConstruction* context) : OpKernel(context) {
+ string attr_name;
+ OP_REQUIRES_OK(context, context->GetAttr("attr_name", &attr_name));
+
+ status.emplace_back("s", context->GetAttr(attr_name, &s));
+ status.emplace_back("s_list", context->GetAttr(attr_name, &s_list));
+ status.emplace_back("i", context->GetAttr(attr_name, &i));
+ status.emplace_back("i_list", context->GetAttr(attr_name, &i_list));
+ status.emplace_back("i32", context->GetAttr(attr_name, &i32));
+ status.emplace_back("i32_list", context->GetAttr(attr_name, &i32_list));
+ status.emplace_back("f", context->GetAttr(attr_name, &f));
+ status.emplace_back("f_list", context->GetAttr(attr_name, &f_list));
+ status.emplace_back("b", context->GetAttr(attr_name, &b));
+ status.emplace_back("b_list", context->GetAttr(attr_name, &b_list));
+ status.emplace_back("type", context->GetAttr(attr_name, &type));
+ status.emplace_back("type_list", context->GetAttr(attr_name, &type_list));
+ status.emplace_back("type_vector",
+ context->GetAttr(attr_name, &type_vector));
+ status.emplace_back("shape_proto",
+ context->GetAttr(attr_name, &shape_proto));
+ status.emplace_back("shape_proto_list",
+ context->GetAttr(attr_name, &shape_proto_list));
+ status.emplace_back("shape", context->GetAttr(attr_name, &shape));
+ status.emplace_back("shape_list", context->GetAttr(attr_name, &shape_list));
+ }
+ void Compute(::tensorflow::OpKernelContext* context) override {}
+
+ void ExpectOk(std::initializer_list<string> keys) {
+ for (const auto& key_status : status) {
+ // Only the status for keys in "keys" should be ok().
+ bool in_keys = false;
+ for (const string& key : keys) {
+ if (key_status.first == key) {
+ in_keys = true;
+ }
+ }
+ EXPECT_EQ(in_keys, key_status.second.ok())
+ << "key_status: " << key_status.first << ", " << key_status.second;
+ }
+ }
+
+ string s;
+ std::vector<string> s_list;
+ int64 i;
+ std::vector<int64> i_list;
+ int32 i32;
+ std::vector<int32> i32_list;
+ float f;
+ std::vector<float> f_list;
+ bool b;
+ std::vector<bool> b_list;
+ DataType type;
+ std::vector<DataType> type_list;
+ DataTypeVector type_vector;
+ TensorShapeProto shape_proto;
+ std::vector<TensorShapeProto> shape_proto_list;
+ TensorShape shape;
+ std::vector<TensorShape> shape_list;
+ std::vector<std::pair<string, Status>> status;
+};
+
+class GetAttrTest : public OpKernelBuilderTest {};
+
+REGISTER_OP("GetAttrStringList")
+ .Attr("attr_name: string")
+ .Attr("a: list(string)");
+REGISTER_KERNEL_BUILDER(Name("GetAttrStringList").Device(DEVICE_CPU),
+ GetAttrKernel);
+
+TEST_F(GetAttrTest, StringList) {
+ std::unique_ptr<OpKernel> op_kernel =
+ ExpectSuccess("GetAttrStringList", DEVICE_CPU,
+ {"attr_name|string|'a'", "a|list(string)|['foo', 'bar']"});
+ auto* get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"s_list"});
+ EXPECT_EQ(std::vector<string>({"foo", "bar"}), get_attr_kernel->s_list);
+
+ op_kernel = ExpectSuccess("GetAttrStringList", DEVICE_CPU,
+ {"attr_name|string|'b'", "a|list(string)|['baz']"});
+ get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({});
+ EXPECT_TRUE(get_attr_kernel->s_list.empty());
+}
+
+REGISTER_OP("GetAttrInt")
+ .Attr("attr_name: string")
+ .Attr("a: int")
+ .Attr("b: list(int)");
+REGISTER_KERNEL_BUILDER(Name("GetAttrInt").Device(DEVICE_CPU), GetAttrKernel);
+
+TEST_F(GetAttrTest, Int) {
+ std::unique_ptr<OpKernel> op_kernel = ExpectSuccess(
+ "GetAttrInt", DEVICE_CPU,
+ {"attr_name|string|'a'", "a|int|35", "b|list(int)|[-1, 2, -4]"});
+ auto* get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"i", "i32"});
+ EXPECT_EQ(35, get_attr_kernel->i);
+ EXPECT_EQ(35, get_attr_kernel->i32);
+
+ op_kernel = ExpectSuccess(
+ "GetAttrInt", DEVICE_CPU,
+ {"attr_name|string|'b'", "a|int|35", "b|list(int)|[-1, 2, -4]"});
+ get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"i_list", "i32_list"});
+ EXPECT_EQ(std::vector<int64>({-1, 2, -4}), get_attr_kernel->i_list);
+ EXPECT_EQ(std::vector<int32>({-1, 2, -4}), get_attr_kernel->i32_list);
+
+ // 8589934592 == 2^33, too big to fit in an int32
+ op_kernel = ExpectSuccess("GetAttrInt", DEVICE_CPU,
+ {"attr_name|string|'a'", "a|int|8589934592",
+ "b|list(int)|[-8589934592]"});
+ get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"i"}); // no i32
+ EXPECT_EQ(8589934592ll, get_attr_kernel->i);
+ for (const auto& key_status : get_attr_kernel->status) {
+ if (key_status.first == "i32") {
+ EXPECT_EQ(error::INVALID_ARGUMENT, key_status.second.code());
+ EXPECT_EQ("Attr a has value 8589934592 out of range for an int32",
+ key_status.second.error_message());
+ }
+ }
+
+ op_kernel = ExpectSuccess("GetAttrInt", DEVICE_CPU,
+ {"attr_name|string|'b'", "a|int|8589934592",
+ "b|list(int)|[-8589934592]"});
+ get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"i_list"}); // no i32_list
+ EXPECT_EQ(std::vector<int64>({-8589934592ll}), get_attr_kernel->i_list);
+ for (const auto& key_status : get_attr_kernel->status) {
+ if (key_status.first == "i32_list") {
+ EXPECT_EQ(error::INVALID_ARGUMENT, key_status.second.code());
+ EXPECT_EQ("Attr b has value -8589934592 out of range for an int32",
+ key_status.second.error_message());
+ }
+ }
+}
+
+REGISTER_OP("GetAttrShape")
+ .Attr("attr_name: string")
+ .Attr("a: shape")
+ .Attr("b: list(shape)");
+REGISTER_KERNEL_BUILDER(Name("GetAttrShape").Device(DEVICE_CPU), GetAttrKernel);
+
+TEST_F(GetAttrTest, Shape) {
+ std::unique_ptr<OpKernel> op_kernel = ExpectSuccess(
+ "GetAttrShape", DEVICE_CPU,
+ {"attr_name|string|'a'", "a|shape|{ dim { size: 3 } }",
+ "b|list(shape)|[{ dim { size:2 } }, { dim { size: 4 } }]"});
+ auto* get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"shape", "shape_proto"});
+ EXPECT_EQ(get_attr_kernel->shape_proto.ShortDebugString(), "dim { size: 3 }");
+ EXPECT_EQ("[3]", get_attr_kernel->shape.ShortDebugString());
+
+ op_kernel = ExpectSuccess(
+ "GetAttrShape", DEVICE_CPU,
+ {"attr_name|string|'b'", "a|shape|{ dim { size: 3 } }",
+ "b|list(shape)|[{ dim { size:2 } }, { dim { size: 4 } }]"});
+ get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"shape_list", "shape_proto_list"});
+ ASSERT_EQ(2, get_attr_kernel->shape_proto_list.size());
+ EXPECT_EQ(get_attr_kernel->shape_proto_list[0].ShortDebugString(),
+ "dim { size: 2 }");
+ EXPECT_EQ(get_attr_kernel->shape_proto_list[1].ShortDebugString(),
+ "dim { size: 4 }");
+ ASSERT_EQ(2, get_attr_kernel->shape_list.size());
+ EXPECT_EQ("[2]", get_attr_kernel->shape_list[0].ShortDebugString());
+ EXPECT_EQ("[4]", get_attr_kernel->shape_list[1].ShortDebugString());
+}
+
+REGISTER_OP("GetAttrType").Attr("attr_name: string").Attr("a: type");
+REGISTER_KERNEL_BUILDER(Name("GetAttrType").Device(DEVICE_CPU), GetAttrKernel);
+
+TEST_F(GetAttrTest, Type) {
+ std::unique_ptr<OpKernel> op_kernel = ExpectSuccess(
+ "GetAttrType", DEVICE_CPU, {"attr_name|string|'a'", "a|type|DT_FLOAT"});
+ auto* get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+ get_attr_kernel->ExpectOk({"type"});
+ EXPECT_EQ(DT_FLOAT, get_attr_kernel->type);
+}
+
+REGISTER_OP("GetAttrTypeList").Attr("attr_name: string").Attr("a: list(type)");
+REGISTER_KERNEL_BUILDER(Name("GetAttrTypeList").Device(DEVICE_CPU),
+ GetAttrKernel);
+
+TEST_F(GetAttrTest, TypeList) {
+ std::unique_ptr<OpKernel> op_kernel = ExpectSuccess(
+ "GetAttrTypeList", DEVICE_CPU,
+ {"attr_name|string|'a'", "a|list(type)|[DT_INT32, DT_BOOL]"});
+ auto* get_attr_kernel = static_cast<GetAttrKernel*>(op_kernel.get());
+
+ get_attr_kernel->ExpectOk({"type_list", "type_vector"});
+ ASSERT_EQ(2, get_attr_kernel->type_list.size());
+ EXPECT_EQ(DT_INT32, get_attr_kernel->type_list[0]);
+ EXPECT_EQ(DT_BOOL, get_attr_kernel->type_list[1]);
+ ASSERT_EQ(2, get_attr_kernel->type_vector.size());
+ EXPECT_EQ(DT_INT32, get_attr_kernel->type_vector[0]);
+ EXPECT_EQ(DT_BOOL, get_attr_kernel->type_vector[1]);
+}
+
+REGISTER_OP("HostMemoryTest")
+ .Input("a: float")
+ .Input("b: T")
+ .Input("c: N * string")
+ .Output("o: N * T")
+ .Attr("T: type")
+ .Attr("N: int");
+REGISTER_KERNEL_BUILDER(Name("HostMemoryTest").Device(DEVICE_CPU), DummyKernel);
+REGISTER_KERNEL_BUILDER(Name("HostMemoryTest")
+ .Device(DEVICE_GPU)
+ .HostMemory("a")
+ .HostMemory("c")
+ .HostMemory("o"),
+ DummyKernel);
+
+TEST(MemoryTypesForNode, Simple) {
+ NodeDef node_def;
+ ASSERT_OK(NodeDefBuilder("test", "HostMemoryTest")
+ .Input(FakeInput())
+ .Input(FakeInput(DT_BOOL))
+ .Input(FakeInput(3))
+ .Finalize(&node_def));
+ MemoryTypeVector input, output;
+
+ EXPECT_OK(MemoryTypesForNode(OpRegistry::Global(), DEVICE_CPU, node_def,
+ &input, &output));
+ EXPECT_EQ(MemoryTypeVector(5, DEVICE_MEMORY), input);
+ EXPECT_EQ(MemoryTypeVector(3, DEVICE_MEMORY), output);
+
+ EXPECT_OK(MemoryTypesForNode(OpRegistry::Global(), DEVICE_GPU, node_def,
+ &input, &output));
+ EXPECT_EQ(MemoryTypeVector({HOST_MEMORY, DEVICE_MEMORY, HOST_MEMORY,
+ HOST_MEMORY, HOST_MEMORY}),
+ input);
+ EXPECT_EQ(MemoryTypeVector(3, HOST_MEMORY), output);
+}
+
+class BaseKernel : public ::tensorflow::OpKernel {
+ public:
+ explicit BaseKernel(OpKernelConstruction* context) : OpKernel(context) {}
+ void Compute(::tensorflow::OpKernelContext* context) override {}
+ virtual int Which() const = 0;
+};
+
+template <int WHICH>
+class LabeledKernel : public BaseKernel {
+ public:
+ using BaseKernel::BaseKernel;
+ int Which() const override { return WHICH; }
+};
+
+class LabelTest : public OpKernelBuilderTest {};
+
+REGISTER_OP("LabeledKernel");
+REGISTER_KERNEL_BUILDER(Name("LabeledKernel").Device(DEVICE_CPU),
+ LabeledKernel<0>);
+REGISTER_KERNEL_BUILDER(Name("LabeledKernel").Device(DEVICE_CPU).Label("one"),
+ LabeledKernel<1>);
+REGISTER_KERNEL_BUILDER(Name("LabeledKernel").Device(DEVICE_CPU).Label("dupe"),
+ LabeledKernel<2>);
+REGISTER_KERNEL_BUILDER(Name("LabeledKernel").Device(DEVICE_CPU).Label("dupe"),
+ LabeledKernel<3>);
+
+TEST_F(LabelTest, Default) {
+ std::unique_ptr<OpKernel> op_kernel =
+ ExpectSuccess("LabeledKernel", DEVICE_CPU, {});
+ auto* get_labeled_kernel = static_cast<BaseKernel*>(op_kernel.get());
+ EXPECT_EQ(0, get_labeled_kernel->Which());
+}
+
+TEST_F(LabelTest, Specified) {
+ std::unique_ptr<OpKernel> op_kernel =
+ ExpectSuccess("LabeledKernel", DEVICE_CPU, {"_kernel|string|'one'"});
+ auto* get_labeled_kernel = static_cast<BaseKernel*>(op_kernel.get());
+ EXPECT_EQ(1, get_labeled_kernel->Which());
+}
+
+TEST_F(LabelTest, Duplicate) {
+ ExpectFailure("LabeledKernel", DEVICE_CPU, {"_kernel|string|'dupe'"},
+ error::INVALID_ARGUMENT);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/op_segment.cc b/tensorflow/core/framework/op_segment.cc
new file mode 100644
index 0000000000..a39bebd854
--- /dev/null
+++ b/tensorflow/core/framework/op_segment.cc
@@ -0,0 +1,86 @@
+#include "tensorflow/core/framework/op_segment.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+OpSegment::Item::~Item() {
+ for (auto kv : name_kernel) delete kv.second;
+}
+
+OpSegment::OpSegment() {}
+
+OpSegment::~OpSegment() {
+ for (auto kv : sessions_) delete kv.second;
+}
+
+Status OpSegment::FindOrCreate(const string& session_handle,
+ const string& node_name, OpKernel** kernel,
+ CreateKernelFn create_fn) {
+ {
+ mutex_lock l(mu_);
+ auto item = gtl::FindPtrOrNull(sessions_, session_handle);
+ if (item == nullptr) {
+ return errors::NotFound("Session ", session_handle, " is not found.");
+ }
+ *kernel = gtl::FindPtrOrNull(item->name_kernel, node_name);
+ if (*kernel != nullptr) {
+ return Status::OK();
+ }
+ }
+ Status s = create_fn(kernel);
+ if (!s.ok()) {
+ LOG(ERROR) << "Create kernel failed: " << s;
+ return s;
+ }
+ {
+ mutex_lock l(mu_);
+ auto item = gtl::FindPtrOrNull(sessions_, session_handle);
+ if (item == nullptr) {
+ return errors::NotFound("Session ", session_handle, " is not found.");
+ }
+ OpKernel** p_kernel = &(item->name_kernel[node_name]);
+ if (*p_kernel == nullptr) {
+ *p_kernel = *kernel; // Inserts 'kernel' in the map.
+ } else {
+ delete *kernel;
+ *kernel = *p_kernel;
+ }
+ }
+ return Status::OK();
+}
+
+void OpSegment::AddHold(const string& session_handle) {
+ mutex_lock l(mu_);
+ Item** item = &sessions_[session_handle];
+ if (*item == nullptr) {
+ *item = new Item; // num_holds == 1
+ } else {
+ ++((*item)->num_holds);
+ }
+}
+
+void OpSegment::RemoveHold(const string& session_handle) {
+ Item* item = nullptr;
+ {
+ mutex_lock l(mu_);
+ auto siter = sessions_.find(session_handle);
+ if (siter == sessions_.end()) {
+ VLOG(1) << "Session " << session_handle << " is not found.";
+ return;
+ }
+ item = siter->second;
+ if (--(item->num_holds) > 0) {
+ return;
+ } else {
+ sessions_.erase(siter);
+ }
+ }
+ delete item;
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/op_segment.h b/tensorflow/core/framework/op_segment.h
new file mode 100644
index 0000000000..55249d2a38
--- /dev/null
+++ b/tensorflow/core/framework/op_segment.h
@@ -0,0 +1,67 @@
+#ifndef TENSORFLOW_FRAMEWORK_OP_SEGMENT_H_
+#define TENSORFLOW_FRAMEWORK_OP_SEGMENT_H_
+
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// OpSegment keeps track of OpKernels registered for sessions running
+// on a device.
+//
+// The implementation maintains a two-level map. The first level maps
+// a session handle to the map of registered OpKernels. The second
+// level maps node names to instantiated OpKernel objects.
+//
+// Each second-level map is reference-counted and the caller can call
+// AddHold to obtain a reference on all kernels of a session and
+// ensure these kernels are alive until a corresponding RemoveHold is
+// called on the same session.
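+//
+// A minimal usage sketch (names are illustrative; see op_segment_test.cc in
+// this change):
+//
+//   OpSegment segment;
+//   segment.AddHold(session_handle);
+//   OpKernel* kernel = nullptr;
+//   Status s = segment.FindOrCreate(session_handle, node_name, &kernel,
+//                                   create_fn);
+//   // ... use 'kernel'; OpSegment retains ownership of it ...
+//   segment.RemoveHold(session_handle);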
+class OpSegment {
+ public:
+ OpSegment();
+ ~OpSegment();
+
+ // A hold can be placed on a session, preventing all its kernels
+ // from being deleted.
+ void AddHold(const string& session_handle);
+ void RemoveHold(const string& session_handle);
+
+ // If the kernel for "node_name" has been created in the
+ // "session_handle", returns the existing op kernel in "*kernel".
+  // Otherwise, creates the kernel by calling create_fn(), caches it,
+ // and returns it in "*kernel". If create_fn() fails, returns the
+ // error.
+ //
+ // OpSegment keeps the ownership of the returned "*kernel".
+ typedef std::function<Status(OpKernel**)> CreateKernelFn;
+ Status FindOrCreate(const string& session_handle, const string& node_name,
+ OpKernel** kernel, CreateKernelFn create_fn);
+
+ private:
+ // op name -> OpKernel
+ typedef std::unordered_map<string, OpKernel*> KernelMap;
+ struct Item {
+ int num_holds = 1; // Num of holds put on the session.
+ KernelMap name_kernel; // op name -> kernel.
+ ~Item();
+ };
+
+ // session handle -> item.
+ // Session handles are produced by strings::FpToString()
+ typedef std::unordered_map<string, Item*> SessionMap;
+
+ mutable mutex mu_;
+ SessionMap sessions_ GUARDED_BY(mu_);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(OpSegment);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_OP_SEGMENT_H_
diff --git a/tensorflow/core/framework/op_segment_test.cc b/tensorflow/core/framework/op_segment_test.cc
new file mode 100644
index 0000000000..6297718df8
--- /dev/null
+++ b/tensorflow/core/framework/op_segment_test.cc
@@ -0,0 +1,142 @@
+#include "tensorflow/core/framework/op_segment.h"
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+
+class OpSegmentTest : public ::testing::Test {
+ protected:
+ DeviceBase device_;
+ std::vector<NodeDef> int32_nodedefs_;
+ std::vector<NodeDef> float_nodedefs_;
+
+ OpSegmentTest() : device_(Env::Default()) {
+ RequireDefaultOps();
+ for (int i = 0; i < 10; ++i) {
+ NodeDef def;
+ TF_CHECK_OK(NodeDefBuilder(strings::StrCat("op", i), "Mul")
+ .Input("x", 0, DT_INT32)
+ .Input("y", 0, DT_INT32)
+ .Finalize(&def));
+ int32_nodedefs_.push_back(def);
+ TF_CHECK_OK(NodeDefBuilder(strings::StrCat("op", i), "Mul")
+ .Input("x", 0, DT_FLOAT)
+ .Input("y", 0, DT_FLOAT)
+ .Finalize(&def));
+ float_nodedefs_.push_back(def);
+ }
+ }
+
+ void ValidateOpAndTypes(OpKernel* op, const NodeDef& expected, DataType dt) {
+ ASSERT_NE(op, nullptr);
+ EXPECT_EQ(expected.DebugString(), op->def().DebugString());
+ EXPECT_EQ(2, op->num_inputs());
+ EXPECT_EQ(dt, op->input_type(0));
+ EXPECT_EQ(dt, op->input_type(1));
+ EXPECT_EQ(1, op->num_outputs());
+ EXPECT_EQ(dt, op->output_type(0));
+ }
+
+ OpSegment::CreateKernelFn GetFn(const NodeDef* ndef) {
+ return [this, ndef](OpKernel** kernel) {
+ Status s;
+ auto created =
+ CreateOpKernel(DEVICE_CPU, &device_, cpu_allocator(), *ndef, &s);
+ if (s.ok()) {
+ *kernel = created.release();
+ }
+ return s;
+ };
+ }
+};
+
+TEST_F(OpSegmentTest, Basic) {
+ OpSegment opseg;
+ OpKernel* op;
+
+ opseg.AddHold("A");
+ opseg.AddHold("B");
+ for (int i = 0; i < 10; ++i) {
+ // Register in session A.
+ auto* ndef = &float_nodedefs_[i];
+ EXPECT_OK(opseg.FindOrCreate("A", ndef->name(), &op, GetFn(ndef)));
+ ValidateOpAndTypes(op, *ndef, DT_FLOAT);
+
+ // Register in session B.
+ ndef = &int32_nodedefs_[i];
+ EXPECT_OK(opseg.FindOrCreate("B", ndef->name(), &op, GetFn(ndef)));
+ ValidateOpAndTypes(op, *ndef, DT_INT32);
+ }
+
+ auto reterr = [](OpKernel** kernel) {
+ return errors::Internal("Should not be called");
+ };
+ for (int i = 0; i < 10; ++i) {
+ // Lookup op in session A.
+ EXPECT_OK(opseg.FindOrCreate("A", strings::StrCat("op", i), &op, reterr));
+ ValidateOpAndTypes(op, float_nodedefs_[i], DT_FLOAT);
+
+ // Lookup op in session B.
+ EXPECT_OK(opseg.FindOrCreate("B", strings::StrCat("op", i), &op, reterr));
+ ValidateOpAndTypes(op, int32_nodedefs_[i], DT_INT32);
+ }
+
+ opseg.RemoveHold("A");
+ opseg.RemoveHold("B");
+}
+
+TEST_F(OpSegmentTest, SessionNotFound) {
+ OpSegment opseg;
+ OpKernel* op;
+ NodeDef def = float_nodedefs_[0];
+ Status s = opseg.FindOrCreate("A", def.name(), &op, GetFn(&def));
+ EXPECT_TRUE(errors::IsNotFound(s)) << s;
+}
+
+TEST_F(OpSegmentTest, CreateFailure) {
+ OpSegment opseg;
+ OpKernel* op;
+ NodeDef def = float_nodedefs_[0];
+ def.set_op("nonexistop");
+ opseg.AddHold("A");
+ Status s = opseg.FindOrCreate("A", def.name(), &op, GetFn(&def));
+ EXPECT_TRUE(errors::IsNotFound(s)) << s;
+ opseg.RemoveHold("A");
+}
+
+TEST_F(OpSegmentTest, AddRemoveHolds) {
+ OpSegment opseg;
+ OpKernel* op;
+ const auto& ndef = int32_nodedefs_[0];
+
+ // No op.
+ opseg.RemoveHold("null");
+
+  // Thread1 registers the op and wants to ensure it stays alive.
+ opseg.AddHold("foo");
+ EXPECT_OK(opseg.FindOrCreate("foo", ndef.name(), &op, GetFn(&ndef)));
+
+  // Thread2 starts some execution that needs "op" to be alive.
+ opseg.AddHold("foo");
+
+ // Thread1 clears session "foo". E.g., a master sends CleanupGraph
+ // before an execution finishes.
+ opseg.RemoveHold("foo");
+
+ // Thread2 should still be able to access "op".
+ ValidateOpAndTypes(op, ndef, DT_INT32);
+
+  // Thread2 then removes its hold on "foo".
+ opseg.RemoveHold("foo");
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/queue_interface.h b/tensorflow/core/framework/queue_interface.h
new file mode 100644
index 0000000000..a765c211cb
--- /dev/null
+++ b/tensorflow/core/framework/queue_interface.h
@@ -0,0 +1,77 @@
+#ifndef TENSORFLOW_FRAMEWORK_QUEUE_INTERFACE_H_
+#define TENSORFLOW_FRAMEWORK_QUEUE_INTERFACE_H_
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+// All implementations must be thread-safe.
+class QueueInterface : public ResourceBase {
+ public:
+ typedef std::vector<Tensor> Tuple;
+ typedef AsyncOpKernel::DoneCallback DoneCallback;
+ typedef std::function<void(const Tuple&)> CallbackWithTuple;
+
+ virtual Status ValidateTuple(const Tuple& tuple) = 0;
+ virtual Status ValidateManyTuple(const Tuple& tuple) = 0;
+
+  // Stashes a function object for future execution that will eventually
+ // enqueue the tuple of tensors into the queue, and returns immediately. The
+ // function object is guaranteed to call 'callback'.
+ virtual void TryEnqueue(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) = 0;
+
+ // Same as above, but the component tensors are sliced along the 0th dimension
+ // to make multiple queue-element components.
+ virtual void TryEnqueueMany(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) = 0;
+
+  // Stashes a function object for future execution that will eventually
+ // dequeue an element from the queue and call 'callback' with that tuple
+ // element as argument.
+ virtual void TryDequeue(OpKernelContext* ctx, CallbackWithTuple callback) = 0;
+
+ // Same as above, but the stashed function object will attempt to dequeue
+ // num_elements items.
+ virtual void TryDequeueMany(int num_elements, OpKernelContext* ctx,
+ CallbackWithTuple callback) = 0;
+
+ // Signals that no more elements will be enqueued, and optionally
+ // cancels pending Enqueue(Many) operations.
+ //
+ // After calling this function, subsequent calls to Enqueue(Many)
+ // will fail. If `cancel_pending_enqueues` is true, all pending
+ // calls to Enqueue(Many) will fail as well.
+ //
+ // After calling this function, all current and subsequent calls to
+ // Dequeue(Many) will fail instead of blocking (though they may
+ // succeed if they can be satisfied by the elements in the queue at
+ // the time it was closed).
+ virtual void Close(OpKernelContext* ctx, bool cancel_pending_enqueues,
+ DoneCallback callback) = 0;
+
+ // Assuming *this represents a shared queue, verify that it matches
+ // another instantiation indicated by node_def.
+ virtual Status MatchesNodeDef(const NodeDef& node_def) = 0;
+
+ // Returns the number of elements in the queue.
+ virtual int32 size() = 0;
+
+ virtual const DataTypeVector& component_dtypes() const = 0;
+
+ string DebugString() override { return "A queue"; }
+
+ protected:
+ virtual ~QueueInterface() {}
+};
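+
+// Illustrative sketch of how a (hypothetical) async dequeue kernel might use
+// this interface; 'queue' is assumed to have been looked up from the resource
+// manager and 'done' is the kernel's DoneCallback:
+//
+//   queue->TryDequeue(ctx, [ctx, done](const QueueInterface::Tuple& tuple) {
+//     if (ctx->status().ok()) {
+//       for (size_t i = 0; i < tuple.size(); ++i) {
+//         ctx->set_output(i, tuple[i]);
+//       }
+//     }
+//     done();
+//   });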
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_QUEUE_INTERFACE_H_
diff --git a/tensorflow/core/framework/reader_interface.h b/tensorflow/core/framework/reader_interface.h
new file mode 100644
index 0000000000..b307c37f01
--- /dev/null
+++ b/tensorflow/core/framework/reader_interface.h
@@ -0,0 +1,66 @@
+#ifndef TENSORFLOW_FRAMEWORK_READER_INTERFACE_H_
+#define TENSORFLOW_FRAMEWORK_READER_INTERFACE_H_
+
+#include <memory>
+#include <string>
+#include <vector>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+class QueueInterface;
+class ReaderInterface;
+
+// Readers are the mechanism for reading records from files in
+// TensorFlow graphs. Each supported file format has a corresponding
+// ReaderInterface descendant and a corresponding Op & OpKernel
+// (implemented using ReaderOpKernel from reader_op_kernel.h).
+//
+// To use a Reader, you first encode "work" (some string, typically a
+// filename) in the Reader's "work queue". It then processes the
+// "work" (reading records from the file), to produce key/value
+// strings. The methods of this class are called by ReaderFoo ops,
+// so see ../ops/io_ops.cc for detailed descriptions.
+//
+// All descendants of this class must be thread-safe.
+//
+// See the design document here:
+// https://docs.google.com/document/d/1UAgZOoeehYr20TdzW2CoZ30V-aqQphU4SwKXsW7eJv4/edit#
+
+// TODO(josh11b): Switch this to Async.
+class ReaderInterface : public ResourceBase {
+ public:
+ // Read a single record into *key / *value. May get more work from
+ // *queue if the current work is complete. Sets the status on
+  // *context with an OutOfRange Status if the current work is
+ // complete and the queue is done (closed and empty).
+ // This method may block.
+ virtual void Read(QueueInterface* queue, string* key, string* value,
+ OpKernelContext* context) = 0;
+
+ // Restore this reader to its newly-constructed state.
+ virtual Status Reset() = 0;
+
+ // Accessors
+ virtual int64 NumRecordsProduced() = 0;
+ virtual int64 NumWorkUnitsCompleted() = 0;
+
+ // -- Serialization/Restoration support --
+ // Not all readers will support saving and restoring state.
+ virtual Status SerializeState(string* state) = 0;
+ // Note: Must Reset on error.
+ virtual Status RestoreState(const string& state) = 0;
+
+ string DebugString() override { return "a reader"; }
+
+ protected:
+ virtual ~ReaderInterface() {}
+};
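+
+// Illustrative sketch (hypothetical names) of how a read op might drive a
+// Reader; 'reader' and 'queue' are assumed to have been looked up already:
+//
+//   string key, value;
+//   reader->Read(queue, &key, &value, context);
+//   if (context->status().ok()) {
+//     // ... emit 'key' and 'value' as the op's outputs ...
+//   }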
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_READER_INTERFACE_H_
diff --git a/tensorflow/core/framework/reader_op_kernel.cc b/tensorflow/core/framework/reader_op_kernel.cc
new file mode 100644
index 0000000000..719f27d94b
--- /dev/null
+++ b/tensorflow/core/framework/reader_op_kernel.cc
@@ -0,0 +1,39 @@
+#include "tensorflow/core/framework/reader_op_kernel.h"
+
+namespace tensorflow {
+
+ReaderOpKernel::ReaderOpKernel(OpKernelConstruction* context)
+ : OpKernel(context), have_handle_(false) {
+ OP_REQUIRES_OK(context, context->allocate_persistent(
+ tensorflow::DT_STRING,
+ tensorflow::TensorShape({2}), &handle_, nullptr));
+}
+
+ReaderOpKernel::~ReaderOpKernel() {
+ if (have_handle_ && cinfo_.resource_is_private_to_kernel()) {
+ TF_CHECK_OK(cinfo_.resource_manager()->Delete<ReaderInterface>(
+ cinfo_.container(), cinfo_.name()));
+ }
+}
+
+void ReaderOpKernel::Compute(OpKernelContext* ctx) {
+ mutex_lock l(mu_);
+ if (!have_handle_) {
+ OP_REQUIRES_OK(ctx, cinfo_.Init(ctx->resource_manager(), def(), false));
+ ReaderInterface* reader;
+ OP_REQUIRES_OK(ctx,
+ cinfo_.resource_manager()->LookupOrCreate<ReaderInterface>(
+ cinfo_.container(), cinfo_.name(), &reader,
+ [this](ReaderInterface** ret) {
+ *ret = factory_();
+ return Status::OK();
+ }));
+ auto h = handle_.AccessTensor(ctx)->flat<string>();
+ h(0) = cinfo_.container();
+ h(1) = cinfo_.name();
+ have_handle_ = true;
+ }
+ ctx->set_output_ref(0, &mu_, handle_.AccessTensor(ctx));
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/reader_op_kernel.h b/tensorflow/core/framework/reader_op_kernel.h
new file mode 100644
index 0000000000..8e5cc50c9b
--- /dev/null
+++ b/tensorflow/core/framework/reader_op_kernel.h
@@ -0,0 +1,42 @@
+#ifndef TENSORFLOW_FRAMEWORK_READER_OP_KERNEL_H_
+#define TENSORFLOW_FRAMEWORK_READER_OP_KERNEL_H_
+
+#include <functional>
+#include <string>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/reader_interface.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Implementation for ops providing a Reader.
+class ReaderOpKernel : public OpKernel {
+ public:
+ explicit ReaderOpKernel(OpKernelConstruction* context);
+ ~ReaderOpKernel() override;
+
+ void Compute(OpKernelContext* context) override;
+
+ // Must be called by descendants before the first call to Compute()
+  // (typically called during construction). The factory must return a
+  // ReaderInterface descendant allocated with new; ReaderOpKernel will
+  // take ownership of it.
+ void SetReaderFactory(std::function<ReaderInterface*()> factory) {
+ mutex_lock l(mu_);
+ DCHECK(!have_handle_);
+ factory_ = factory;
+ }
+
+ private:
+ mutex mu_;
+ bool have_handle_ GUARDED_BY(mu_);
+ PersistentTensor handle_ GUARDED_BY(mu_);
+ ContainerInfo cinfo_;
+ std::function<ReaderInterface*()> factory_;
+};
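+
+// Illustrative sketch of a descendant (hypothetical reader type) wiring up its
+// factory in the constructor, as SetReaderFactory() requires:
+//
+//   class MyRecordReaderOp : public ReaderOpKernel {
+//    public:
+//     explicit MyRecordReaderOp(OpKernelConstruction* context)
+//         : ReaderOpKernel(context) {
+//       SetReaderFactory([]() { return new MyRecordReader; });
+//     }
+//   };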
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_READER_OP_KERNEL_H_
diff --git a/tensorflow/core/framework/register_types.h b/tensorflow/core/framework/register_types.h
new file mode 100644
index 0000000000..18473aea2e
--- /dev/null
+++ b/tensorflow/core/framework/register_types.h
@@ -0,0 +1,90 @@
+#ifndef TENSORFLOW_FRAMEWORK_REGISTER_TYPES_H_
+#define TENSORFLOW_FRAMEWORK_REGISTER_TYPES_H_
+// This file is used by cuda code and must remain compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+
+// Macros to apply another macro to lists of supported types. If you change
+// the lists of types, please also update the list in types.cc.
+//
+// See example uses of these macros in core/ops.
+//
+//
+// Each of these TF_CALL_XXX_TYPES(m) macros invokes the macro "m" multiple
+// times by passing each invocation a data type supported by TensorFlow.
+//
+// The different variations pass different subsets of the types.
+// TF_CALL_ALL_TYPES(m) applies "m" to all types supported by TensorFlow.
+// The set of types depends on the compilation platform.
+//
+// This can be used to register a different template instantiation of
+// an OpKernel for different signatures, e.g.:
+/*
+ #define REGISTER_PARTITION(type) \
+ REGISTER_TF_OP_KERNEL("partition", DEVICE_CPU, #type ", int32", \
+ PartitionOp<type>);
+ TF_CALL_ALL_TYPES(REGISTER_PARTITION)
+ #undef REGISTER_PARTITION
+*/
+
+#ifndef __ANDROID__
+
+// Call "m" for all number types that support the comparison operations "<" and
+// ">".
+#define TF_CALL_REAL_NUMBER_TYPES(m) \
+ m(float); \
+ m(double); \
+ m(int64); \
+ m(int32); \
+ m(uint8); \
+ m(int16); \
+ m(int8)
+
+#define TF_CALL_REAL_NUMBER_TYPES_NO_INT32(m) \
+ m(float); \
+ m(double); \
+ m(int64); \
+ m(uint8); \
+ m(int16); \
+ m(int8)
+
+// Call "m" for all number types, including complex64.
+#define TF_CALL_NUMBER_TYPES(m) \
+ TF_CALL_REAL_NUMBER_TYPES(m); \
+ m(complex64)
+
+#define TF_CALL_NUMBER_TYPES_NO_INT32(m) \
+ TF_CALL_REAL_NUMBER_TYPES_NO_INT32(m); \
+ m(complex64)
+
+// Call "m" on all types.
+#define TF_CALL_ALL_TYPES(m) \
+ TF_CALL_NUMBER_TYPES(m); \
+ m(bool); \
+ m(string)
+
+// Call "m" on all types supported on GPU.
+#define TF_CALL_GPU_NUMBER_TYPES(m) \
+ m(float); \
+ m(double)
+
+#else // __ANDROID__
+
+#define TF_CALL_REAL_NUMBER_TYPES(m) \
+ m(float); \
+ m(int32)
+
+#define TF_CALL_NUMBER_TYPES(m) TF_CALL_REAL_NUMBER_TYPES(m)
+
+#define TF_CALL_REAL_NUMBER_TYPES_NO_INT32(m) m(float)
+
+#define TF_CALL_NUMBER_TYPES_NO_INT32(m) TF_CALL_REAL_NUMBER_TYPES_NO_INT32(m)
+
+#define TF_CALL_ALL_TYPES(m) TF_CALL_REAL_NUMBER_TYPES(m)
+
+// Maybe we could put an empty macro here for Android?
+#define TF_CALL_GPU_NUMBER_TYPES(m) m(float)
+
+#endif // __ANDROID__
+
+#endif // TENSORFLOW_FRAMEWORK_REGISTER_TYPES_H_
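These macros can also drive explicit template instantiation. A minimal sketch, assuming the usual TensorFlow integer typedefs from platform/port.h are in scope; FillFunctor is a made-up name used only for illustration:

```cpp
#include "tensorflow/core/framework/register_types.h"

namespace tensorflow {

// Hypothetical functor templated on the element type.
template <typename T>
struct FillFunctor {
  void operator()(T* data, int n, T value) {
    for (int i = 0; i < n; ++i) data[i] = value;
  }
};

// Explicitly instantiate FillFunctor for every real number type that
// TensorFlow supports on this platform.
#define INSTANTIATE_FILL(T) template struct FillFunctor<T>
TF_CALL_REAL_NUMBER_TYPES(INSTANTIATE_FILL);
#undef INSTANTIATE_FILL

}  // namespace tensorflow
```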
diff --git a/tensorflow/core/framework/rendezvous.cc b/tensorflow/core/framework/rendezvous.cc
new file mode 100644
index 0000000000..7f551ea65f
--- /dev/null
+++ b/tensorflow/core/framework/rendezvous.cc
@@ -0,0 +1,263 @@
+#include "tensorflow/core/framework/rendezvous.h"
+
+#include <unordered_map>
+#include <utility>
+
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+
+namespace tensorflow {
+
+/* static */
+string Rendezvous::CreateKey(const string& src_device, uint64 src_incarnation,
+ const string& dst_device, const string& name,
+ const FrameAndIter& frame_iter) {
+ // NOTE: ';' is not used in the device name's job name.
+ //
+ // We include both sender and receiver in the key to facilitate
+ // debugging. For correctness, we only need to encode the receiver.
+ //
+ // "src_incarnation" is used to distinguish a worker when it
+ // restarts.
+ return strings::StrCat(src_device, ";", strings::FpToString(src_incarnation),
+ ";", dst_device, ";", name, ";", frame_iter.frame_id,
+ ":", frame_iter.iter_id);
+}
+
+/* static */
+Status Rendezvous::ParseKey(const string& key, ParsedKey* out) {
+ // TODO(zhifengc): This code is not fast enough.
+ std::vector<string> parts = str_util::Split(key, ';');
+ if (parts.size() == 5 &&
+ DeviceNameUtils::ParseFullName(parts[0], &out->src) &&
+ strings::StringToFp(parts[1], &out->src_incarnation) &&
+ DeviceNameUtils::ParseFullName(parts[2], &out->dst) &&
+ !parts[3].empty()) {
+ out->src_device = parts[0];
+ out->dst_device = parts[2];
+ out->edge_name = parts[3];
+ return Status::OK();
+ }
+ return errors::InvalidArgument("Invalid rendezvous key: ", key);
+}
+
+Rendezvous::~Rendezvous() {}
+
+Status Rendezvous::Recv(const string& key, const Args& recv_args, Tensor* val,
+ bool* is_dead) {
+ Status ret;
+ Notification n;
+ RecvAsync(key, recv_args,
+ [&ret, &n, val, is_dead](const Status& s, const Args& send_args,
+ const Args& recv_args, const Tensor& v,
+ const bool dead) {
+ ret = s;
+ *val = v;
+ *is_dead = dead;
+ n.Notify();
+ });
+ n.WaitForNotification();
+ return ret;
+}
+
+class LocalRendezvousImpl : public Rendezvous {
+ public:
+ explicit LocalRendezvousImpl(bool tolerate_dup_recv)
+ : tolerate_dup_recv_(tolerate_dup_recv) {}
+
+ Status Send(const string& key, const Args& send_args, const Tensor& val,
+ const bool is_dead) override {
+ VLOG(2) << "Send " << this << " " << key;
+ DoneCallback waiter = nullptr;
+ Args recv_args;
+ {
+ mutex_lock l(mu_);
+ if (!status_.ok()) {
+ return status_;
+ }
+ Item* item = nullptr;
+ Table::iterator iter = table_.find(key);
+ if (iter == table_.end()) {
+ // There is no waiter for this message. Insert the message
+ // into the waiters table. The waiter will pick it up when it
+ // arrives.
+ item = new Item;
+ item->waiter = nullptr;
+ item->value = val;
+ item->is_dead = is_dead;
+ if (send_args.device_context) {
+ send_args.device_context->Ref();
+ item->send_dev_context = send_args.device_context;
+ }
+ item->recv_dev_context = nullptr;
+
+ // The allocator attributes of item->value.
+ item->send_alloc_attrs = send_args.alloc_attrs;
+
+ CHECK(table_.insert({key, item}).second);
+ return Status::OK();
+ } else {
+ item = iter->second;
+ if (item->waiter == nullptr) {
+ // There is already a message in the table under the key.
+ // An existing entry without a waiter means this is a
+ // duplicated send, which is not allowed.
+ return errors::Aborted("Duplicated send: ", key);
+ }
+ // Mark item as complete.
+ item->has_been_recvd = true;
+ waiter = item->waiter;
+ item->waiter = nullptr;
+ // The ref on recv_dev_context transfers below.
+ recv_args.device_context = item->recv_dev_context;
+ recv_args.alloc_attrs = item->recv_alloc_attrs;
+ item->recv_dev_context = nullptr;
+ if (tolerate_dup_recv_) {
+ item->value = val;
+ item->is_dead = is_dead;
+ if (send_args.device_context) {
+ send_args.device_context->Ref();
+ item->send_dev_context = send_args.device_context;
+ }
+ item->send_alloc_attrs = send_args.alloc_attrs;
+ }
+ }
+ } // mutex
+ // Notify the waiter by invoking its done closure, outside scope
+ // of the table lock.
+ waiter(Status::OK(), send_args, recv_args, val, is_dead);
+ if (recv_args.device_context) recv_args.device_context->Unref();
+ return Status::OK();
+ }
+
+ void RecvAsync(const string& key, const Args& recv_args,
+ DoneCallback done) override {
+ VLOG(2) << "Recv " << this << " " << key;
+ mu_.lock();
+ if (!status_.ok()) {
+ // Rendezvous has been aborted.
+ Status s = status_;
+ mu_.unlock();
+ done(s, Args(), recv_args, Tensor(), false);
+ return;
+ }
+ Table::iterator iter = table_.find(key);
+ if (iter != table_.end()) {
+ Item* item = iter->second;
+ if (item->has_been_recvd && !tolerate_dup_recv_) {
+ mu_.unlock();
+ done(errors::Aborted("Duplicated recv: ", key), Args(), recv_args,
+ Tensor(), false);
+ } else if (item->waiter == nullptr || tolerate_dup_recv_) {
+ // A message has already arrived and is stored in the table
+ // under this key. Consume the message and invoke the done
+ // closure.
+ Tensor v = item->value;
+ if (!tolerate_dup_recv_) {
+ item->value = Tensor();
+ }
+ item->has_been_recvd = true;
+ // Before dropping the table lock, capture the item values.
+ // DeviceContext is only non-null for non-CPU devices.
+ // If we capture the send_dev_context, we need to hold a ref on
+ // it. Our caller will have a ref on the recv_dev_context,
+ // which is not in our table.
+ DeviceContext* send_dev_context = item->send_dev_context;
+ if (send_dev_context) send_dev_context->Ref();
+ bool is_dead = item->is_dead;
+ // Fill in send_args while still holding the lock; "item" may be
+ // deleted by StartAbort() once the lock is released.
+ Args send_args;
+ send_args.device_context = send_dev_context;
+ send_args.alloc_attrs = item->send_alloc_attrs;
+ mu_.unlock();
+ done(Status::OK(), send_args, recv_args, v, is_dead);
+ if (send_dev_context) send_dev_context->Unref();
+ } else {
+ // Already have a waiter in the waiters table under this key,
+ // which should not happen.
+ mu_.unlock();
+ done(errors::Aborted("Duplicated recv: ", key), Args(), recv_args,
+ Tensor(), false);
+ }
+ return;
+ }
+ // Waiting for a message that has not arrived yet. Insert into the
+ // waiting table. The done closure will be invoked when the
+ // message arrives.
+ Item* item = new Item;
+ item->waiter = done;
+ if (recv_args.device_context) {
+ item->recv_dev_context = recv_args.device_context;
+ item->recv_alloc_attrs = recv_args.alloc_attrs;
+ item->recv_dev_context->Ref();
+ }
+ CHECK(table_.insert({key, item}).second);
+ mu_.unlock();
+ return;
+ }
+
+ void StartAbort(const Status& status) override {
+ CHECK(!status.ok());
+ std::vector<Item*> items;
+ {
+ mutex_lock l(mu_);
+ if (!status_.ok()) return;
+ status_ = status;
+ items.reserve(table_.size());
+ for (const auto& p : table_) items.push_back(p.second);
+ table_.clear();
+ }
+ for (Item* item : items) {
+ if (item->waiter != nullptr) {
+ item->waiter(status, Args(), Args(), Tensor(), false);
+ }
+ delete item;
+ }
+ }
+
+ private:
+ typedef LocalRendezvousImpl ME;
+ const bool tolerate_dup_recv_;
+
+ struct Item {
+ DoneCallback waiter = nullptr;
+ Tensor value;
+ bool is_dead = false;
+ bool has_been_recvd = false;
+ DeviceContext* send_dev_context = nullptr;
+ DeviceContext* recv_dev_context = nullptr;
+ AllocatorAttributes send_alloc_attrs;
+ AllocatorAttributes recv_alloc_attrs;
+
+ ~Item() {
+ if (send_dev_context) {
+ send_dev_context->Unref();
+ }
+ if (recv_dev_context) {
+ recv_dev_context->Unref();
+ }
+ }
+ };
+ typedef std::unordered_map<string, Item*> Table;
+
+ // TODO(zhifengc): shard table_.
+ mutex mu_;
+ Table table_ GUARDED_BY(mu_);
+ Status status_;
+
+ ~LocalRendezvousImpl() override {
+ for (auto i : table_) {
+ delete i.second;
+ }
+ }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(LocalRendezvousImpl);
+};
+
+Rendezvous* NewLocalRendezvous(bool tolerate_dup_recv) {
+ return new LocalRendezvousImpl(tolerate_dup_recv);
+}
+
+} // end namespace tensorflow
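A small sketch exercising the CreateKey()/ParseKey() round trip implemented above; the device names and edge name are invented for illustration:

```cpp
#include "tensorflow/core/framework/rendezvous.h"
#include "tensorflow/core/platform/logging.h"

namespace tensorflow {

void KeyRoundTripExample() {
  // Encode a key for a tensor named "edge_0" flowing between two devices.
  const string key = Rendezvous::CreateKey(
      "/job:worker/replica:0/task:0/CPU:0", /*src_incarnation=*/1,
      "/job:worker/replica:0/task:0/GPU:0", "edge_0", FrameAndIter(0, 0));

  // Decode it back into its components.
  Rendezvous::ParsedKey parsed;
  Status s = Rendezvous::ParseKey(key, &parsed);
  CHECK(s.ok()) << s.ToString();
  // parsed.src_device, parsed.dst_device and parsed.edge_name now hold
  // the values that were encoded above.
}

}  // namespace tensorflow
```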
diff --git a/tensorflow/core/framework/rendezvous.h b/tensorflow/core/framework/rendezvous.h
new file mode 100644
index 0000000000..94fbfb2523
--- /dev/null
+++ b/tensorflow/core/framework/rendezvous.h
@@ -0,0 +1,102 @@
+#ifndef TENSORFLOW_FRAMEWORK_RENDEZVOUS_H_
+#define TENSORFLOW_FRAMEWORK_RENDEZVOUS_H_
+
+#include <string>
+
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/framework/control_flow.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/util/device_name_utils.h"
+
+namespace tensorflow {
+
+// A Rendezvous is an abstraction for passing a Tensor
+// from a producer to a consumer, where the consumer may safely
+// request the Tensor before or after it has been produced. A
+// producer never blocks when using a Rendezvous. A consumer has the
+// choice of making a blocking call or providing a callback: in either
+// case, the consumer receives the Tensor as soon as it is available.
+//
+// A Rendezvous key encodes a single <producer, consumer> pair. It is
+// an error to call Send() or Recv*() more than once with the same
+// key.
+class Rendezvous : public core::RefCounted {
+ public:
+ struct Args {
+ DeviceContext* device_context = nullptr;
+ AllocatorAttributes alloc_attrs;
+ };
+
+ // Constructs a rendezvous key for the tensor of "name" sent from
+ // "src_device" to "dst_device". The tensor is generated in the frame
+ // and iteration specified by "frame_iter".
+ static string CreateKey(const string& src_device, uint64 src_incarnation,
+ const string& dst_device, const string& name,
+ const FrameAndIter& frame_iter);
+
+ // Parses the key constructed by CreateKey, filling in the src/dst
+ // device names and their parsed structures.
+ struct ParsedKey {
+ string src_device;
+ DeviceNameUtils::ParsedName src;
+ uint64 src_incarnation = 0;
+ string dst_device;
+ DeviceNameUtils::ParsedName dst;
+ string edge_name;
+ };
+ static Status ParseKey(const string& key, ParsedKey* out);
+
+ // The caller is a tensor producer and it sends a message (a tensor
+ // "val" and a bool "is_dead") under the given "key".
+ //
+ // {val, is_dead} is bundled as a message sent and received.
+ // Typically, is_dead is set by some control flow nodes
+ // (e.g., a not-taken branch). args is passed by Send to the
+ // Recv function to communicate any information that the Recv
+ // function might need. This is typically only necessary for
+ // Send/Recv on the same worker.
+ //
+ // Send() never blocks.
+ virtual Status Send(const string& key, const Args& args, const Tensor& val,
+ const bool is_dead) = 0;
+
+ // Callback provided by a tensor consumer waiting on the rendezvous.
+ // It will be invoked when the tensor is available, or when a non-OK
+ // status arises in the production of that tensor. It also gets
+ // two Rendezvous::Args, one provided by the sender, the other by the
+ // receiver, which may be needed when a non-CPU device is in use
+ // by either side.
+ typedef std::function<void(const Status&, const Args&, const Args&,
+ const Tensor&, const bool)> DoneCallback;
+
+ virtual void RecvAsync(const string& key, const Args& args,
+ DoneCallback done) = 0;
+
+ // Synchronous wrapper for RecvAsync.
+ Status Recv(const string& key, const Args& args, Tensor* val, bool* is_dead);
+
+ // Aborts all pending and future Send/Recv with the given "status".
+ //
+ // StartAbort() does not wait for ongoing calls to finish.
+ // REQUIRES: !status.ok()
+ virtual void StartAbort(const Status& status) = 0;
+
+ protected:
+ ~Rendezvous() override;
+};
+
+// Returns a Rendezvous instance that is limited to use only by
+// producers and consumers in the local process. The caller assumes
+// ownership of one Ref() on the returned object.
+//
+// If "tolerate_dup_recv" is true, then the Rendezvous will retain
+// already Recv'd values and make them available to duplicate Recv
+// calls. This may be useful if the RPC layer is not reliable, but
+// comes at the cost of higher memory consumption.
+Rendezvous* NewLocalRendezvous(bool tolerate_dup_recv = false);
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_RENDEZVOUS_H_
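A hedged sketch of the local producer/consumer contract described in this header: the consumer may register a RecvAsync() callback before the producer calls Send(), and the callback fires as soon as the value is available. The key "edge_0" and the tensor contents are invented for illustration.

```cpp
#include "tensorflow/core/framework/rendezvous.h"
#include "tensorflow/core/platform/logging.h"

namespace tensorflow {

void LocalRoundTripExample() {
  Rendezvous* rendez = NewLocalRendezvous();  // Caller owns one ref.
  Rendezvous::Args args;                      // CPU-only: no device context.

  // Consumer registers interest first; the Send() below completes it.
  rendez->RecvAsync("edge_0", args,
                    [](const Status& s, const Rendezvous::Args& /*send_args*/,
                       const Rendezvous::Args& /*recv_args*/, const Tensor& val,
                       bool is_dead) {
                      if (s.ok() && !is_dead) {
                        LOG(INFO) << "Received a tensor with "
                                  << val.NumElements() << " elements.";
                      }
                    });

  Tensor t(DT_FLOAT, TensorShape({2}));
  t.flat<float>().setZero();
  CHECK(rendez->Send("edge_0", args, t, /*is_dead=*/false).ok());

  rendez->Unref();
}

}  // namespace tensorflow
```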
diff --git a/tensorflow/core/framework/rendezvous_test.cc b/tensorflow/core/framework/rendezvous_test.cc
new file mode 100644
index 0000000000..32011a468f
--- /dev/null
+++ b/tensorflow/core/framework/rendezvous_test.cc
@@ -0,0 +1,314 @@
+#include "tensorflow/core/framework/rendezvous.h"
+
+#include <gtest/gtest.h>
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+TEST(RendezvousTest, Key) {
+ const string key = Rendezvous::CreateKey(
+ "/job:mnist/replica:1/task:2/CPU:0", 7890,
+ "/job:mnist/replica:1/task:2/GPU:0", "var0", FrameAndIter(0, 0));
+ EXPECT_EQ(key,
+ "/job:mnist/replica:1/task:2/CPU:0;"
+ "0000000000001ed2;" // 7890 = 0x1ed2
+ "/job:mnist/replica:1/task:2/GPU:0;"
+ "var0;"
+ "0:0");
+ Rendezvous::ParsedKey parsed;
+ EXPECT_OK(Rendezvous::ParseKey(key, &parsed));
+ EXPECT_EQ(parsed.src_device, "/job:mnist/replica:1/task:2/CPU:0");
+ EXPECT_EQ(parsed.src_incarnation, 7890);
+ EXPECT_EQ(parsed.src.type, "CPU");
+ EXPECT_EQ(parsed.dst_device, "/job:mnist/replica:1/task:2/GPU:0");
+ EXPECT_EQ(parsed.dst.type, "GPU");
+
+ EXPECT_FALSE(Rendezvous::ParseKey("foo;bar;baz", &parsed).ok());
+ EXPECT_FALSE(Rendezvous::ParseKey("/job:mnist/replica:1/task:2/CPU:0;"
+ "/job:mnist/replica:1/task:2/GPU:0;",
+ &parsed)
+ .ok());
+ EXPECT_FALSE(
+ Rendezvous::ParseKey(strings::StrCat(key, ";", key), &parsed).ok());
+}
+
+class LocalRendezvousTest : public ::testing::Test {
+ public:
+ LocalRendezvousTest()
+ : threads_(new thread::ThreadPool(Env::Default(), "test", 16)) {
+ rendez_ = NewLocalRendezvous();
+ }
+
+ ~LocalRendezvousTest() override {
+ rendez_->Unref();
+ delete threads_;
+ }
+
+ void SchedClosure(std::function<void()> fn) { threads_->Schedule(fn); }
+
+ Rendezvous* rendez_;
+
+ private:
+ thread::ThreadPool* threads_;
+};
+
+// string -> Tensor<string>
+Tensor V(const string& content) {
+ Tensor tensor(DT_STRING, TensorShape({}));
+ tensor.scalar<string>()() = content;
+ return tensor;
+}
+
+// Tensor<string> -> string
+string V(const Tensor& tensor) {
+ CHECK_EQ(tensor.dtype(), DT_STRING);
+ CHECK(TensorShapeUtils::IsScalar(tensor.shape()));
+ return tensor.scalar<string>()();
+}
+
+TEST_F(LocalRendezvousTest, SendRecv) {
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Send("foo", args, V("hello"), false));
+ EXPECT_TRUE(errors::IsAborted(rendez_->Send("foo", args, V("hello"), false)));
+ Tensor val(DT_STRING);
+ bool is_dead = false;
+ ASSERT_OK(rendez_->Recv("foo", args, &val, &is_dead));
+ EXPECT_EQ("hello", V(val));
+}
+
+TEST_F(LocalRendezvousTest, RecvSend) {
+ SchedClosure([this]() {
+ Env::Default()->SleepForMicroseconds(10000);
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Send("foo", args, V("hello"), false));
+ });
+ Tensor val(DT_STRING);
+ bool is_dead = false;
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Recv("foo", args, &val, &is_dead));
+ EXPECT_EQ("hello", V(val));
+}
+
+TEST_F(LocalRendezvousTest, DuplicateWaiterRecv) {
+ SchedClosure([this]() {
+ Tensor t(DT_STRING);
+ bool is_dead = false;
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Recv("foo", args, &t, &is_dead));
+ ASSERT_OK(rendez_->Send("bar", args, t, is_dead));
+ });
+ Env::Default()->SleepForMicroseconds(1000000);
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ EXPECT_TRUE(errors::IsAborted(rendez_->Recv("foo", args, &val, &val_dead)));
+ ASSERT_OK(rendez_->Send("foo", args, V("secret msg"), val_dead));
+ ASSERT_OK(rendez_->Recv("bar", args, &val, &val_dead));
+ EXPECT_EQ("secret msg", V(val));
+}
+
+TEST_F(LocalRendezvousTest, DuplicateSerialRecv) {
+ SchedClosure([this]() {
+ Tensor t(DT_STRING);
+ bool is_dead = false;
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Recv("foo", args, &t, &is_dead));
+ ASSERT_OK(rendez_->Send("bar", args, t, is_dead));
+ });
+ Env::Default()->SleepForMicroseconds(1000000);
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Send("foo", args, V("secret msg"), val_dead));
+ ASSERT_OK(rendez_->Recv("bar", args, &val, &val_dead));
+ EXPECT_EQ("secret msg", V(val));
+ EXPECT_TRUE(errors::IsAborted(rendez_->Recv("foo", args, &val, &val_dead)));
+}
+
+// A simple structure that behaves a bit like a blocking counter. The
+// user that decrements the counter to 0 calls done.Notify(), and the
+// main thread waits for done to be notified.
+struct BlockingState {
+ mutex lock;
+ int counter;
+ Notification done;
+};
+
+TEST_F(LocalRendezvousTest, RandomSendRecv) {
+ static const int N = 1000;
+ BlockingState state;
+ state.counter = N;
+ for (int i = 0; i < N; ++i) {
+ SchedClosure([this, i]() {
+ random::PhiloxRandom philox(testing::RandomSeed() + i, 17);
+ random::SimplePhilox rnd(&philox);
+ Env::Default()->SleepForMicroseconds(1000 + rnd.Uniform(10000));
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Send(strings::StrCat(i), args, V(strings::StrCat(i)),
+ false));
+ });
+ SchedClosure([this, &state, i]() {
+ random::PhiloxRandom philox(testing::RandomSeed() + N + i, 17);
+ random::SimplePhilox rnd(&philox);
+ Env::Default()->SleepForMicroseconds(1000 + rnd.Uniform(10000));
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ ASSERT_OK(rendez_->Recv(strings::StrCat(i), args, &val, &val_dead));
+ EXPECT_EQ(strings::StrCat(i), V(val));
+ bool done = false;
+ {
+ mutex_lock l(state.lock);
+ state.counter--;
+ if (state.counter == 0) {
+ done = true;
+ }
+ }
+ if (done) {
+ state.done.Notify();
+ }
+ });
+ }
+
+ state.done.WaitForNotification();
+}
+
+TEST_F(LocalRendezvousTest, RecvAbort) {
+ rendez_->Ref();
+ SchedClosure([this]() {
+ rendez_->StartAbort(errors::Aborted("")); // abort
+ rendez_->Unref();
+ });
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ Status status = rendez_->Recv("foo", args, &val, &val_dead);
+ EXPECT_TRUE(errors::IsAborted(status));
+}
+
+// Similar to RecvAbort. But this test case ensures the main thread
+// Recv() call happens after StartAbort().
+TEST_F(LocalRendezvousTest, RecvSleepAbort) {
+ rendez_->Ref();
+ SchedClosure([this]() {
+ Env::Default()->SleepForMicroseconds(1000000);
+ rendez_->StartAbort(errors::Aborted("")); // abort
+ rendez_->Unref();
+ });
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ Status status = rendez_->Recv("foo", args, &val, &val_dead);
+ EXPECT_TRUE(errors::IsAborted(status));
+}
+
+TEST_F(LocalRendezvousTest, AbortThenRecvOrSend) {
+ rendez_->StartAbort(errors::Aborted(""));
+ Tensor val(DT_STRING);
+ bool val_dead = false;
+ Rendezvous::Args args;
+ EXPECT_TRUE(errors::IsAborted(rendez_->Send("foo", args, val, val_dead)));
+ EXPECT_TRUE(errors::IsAborted(rendez_->Recv("foo", args, &val, &val_dead)));
+}
+
+class DummyDeviceContext : public DeviceContext {
+ public:
+ explicit DummyDeviceContext(int stream_id) : stream_id_(stream_id) {}
+ ~DummyDeviceContext() override {}
+ int stream_id() const { return stream_id_; }
+
+ private:
+ const int stream_id_;
+};
+
+TEST_F(LocalRendezvousTest, TransferDummyDeviceContext) {
+ Rendezvous::Args args;
+ args.device_context = new DummyDeviceContext(123);
+
+ ASSERT_OK(rendez_->Send("foo", args, V("hello"), false));
+
+ Notification n;
+ Rendezvous::Args args1;
+ args1.device_context = new DummyDeviceContext(1);
+ rendez_->RecvAsync("foo", args1, [&n](const Status& s,
+ const Rendezvous::Args& send_args,
+ const Rendezvous::Args& recv_args,
+ const Tensor& val, bool is_dead) {
+ CHECK_EQ(123,
+ dynamic_cast<const DummyDeviceContext*>(send_args.device_context)
+ ->stream_id());
+ n.Notify();
+ });
+
+ n.WaitForNotification();
+ args.device_context->Unref();
+ args1.device_context->Unref();
+}
+
+static void BM_SendRecv(int iters) {
+ Rendezvous* rendez = NewLocalRendezvous();
+ Tensor orig = V("val");
+ Tensor val(DT_STRING, TensorShape({}));
+ bool is_dead = false;
+ Rendezvous::Args args;
+ Status s;
+ if (iters > 0) {
+ while (iters--) {
+ s = rendez->Send("foo", args, orig, is_dead);
+ s = rendez->Recv("foo", args, &val, &is_dead);
+ }
+ CHECK_EQ(V(val), V(orig));
+ }
+ rendez->Unref();
+}
+BENCHMARK(BM_SendRecv);
+
+static void BM_RecvSend(int iters) {
+ thread::ThreadPool* pool = new thread::ThreadPool(Env::Default(), "test", 1);
+
+ // The main thread sends "foo" for iters/2 times and receives "bar"
+ // for iters/2 times. The other thread sends "bar" for iters/2
+ // times and receives "foo" for iters/2 times.
+ Rendezvous* rendez = NewLocalRendezvous();
+ pool->Schedule([rendez, iters]() {
+ Tensor bar = V("bar");
+ Tensor foo(DT_STRING, TensorShape({}));
+ bool is_dead = false;
+ Rendezvous::Args args;
+ Status s;
+ for (int i = 0; i < iters / 2; ++i) {
+ s = rendez->Recv("foo", args, &foo, &is_dead);
+ s = rendez->Send("bar", args, bar, is_dead);
+ }
+ CHECK_EQ("foo", V(foo));
+ });
+ Tensor foo = V("foo");
+ Tensor bar(DT_STRING, TensorShape({}));
+ bool is_dead = false;
+ Rendezvous::Args args;
+ Status s;
+ for (int i = 0; i < iters / 2; ++i) {
+ s = rendez->Send("foo", args, foo, is_dead);
+ s = rendez->Recv("bar", args, &bar, &is_dead);
+ }
+ CHECK_EQ("bar", V(bar));
+ delete pool;
+}
+BENCHMARK(BM_RecvSend);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/resource_mgr.cc b/tensorflow/core/framework/resource_mgr.cc
new file mode 100644
index 0000000000..42326f068e
--- /dev/null
+++ b/tensorflow/core/framework/resource_mgr.cc
@@ -0,0 +1,146 @@
+#include "tensorflow/core/framework/resource_mgr.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+
+ResourceMgr::ResourceMgr() : default_container_("localhost") {}
+
+ResourceMgr::ResourceMgr(const string& default_container)
+ : default_container_(default_container) {}
+
+ResourceMgr::~ResourceMgr() { Clear(); }
+
+void ResourceMgr::Clear() {
+ mutex_lock l(mu_);
+ for (const auto& p : containers_) {
+ for (const auto& q : *p.second) {
+ q.second->Unref();
+ }
+ delete p.second;
+ }
+ containers_.clear();
+}
+
+Status ResourceMgr::DoCreate(const string& container, std::type_index type,
+ const string& name, ResourceBase* resource) {
+ {
+ mutex_lock l(mu_);
+ Container** b = &containers_[container];
+ if (*b == nullptr) {
+ *b = new Container;
+ }
+ if ((*b)->insert({{type, name}, resource}).second) {
+ return Status::OK();
+ }
+ }
+ resource->Unref();
+ return errors::AlreadyExists("Resource ", container, "/", name, "/",
+ type.name());
+}
+
+Status ResourceMgr::DoLookup(const string& container, std::type_index type,
+ const string& name,
+ ResourceBase** resource) const {
+ mutex_lock l(mu_);
+ const Container* b = gtl::FindPtrOrNull(containers_, container);
+ if (b == nullptr) {
+ return errors::NotFound("Container ", container, " does not exist.");
+ }
+ auto r = gtl::FindPtrOrNull(*b, {type, name});
+ if (r == nullptr) {
+ return errors::NotFound("Resource ", container, "/", name, "/", type.name(),
+ " does not exist.");
+ }
+ *resource = const_cast<ResourceBase*>(r);
+ (*resource)->Ref();
+ return Status::OK();
+}
+
+Status ResourceMgr::DoDelete(const string& container, std::type_index type,
+ const string& name) {
+ ResourceBase* base = nullptr;
+ {
+ mutex_lock l(mu_);
+ Container* b = gtl::FindPtrOrNull(containers_, container);
+ if (b == nullptr) {
+ return errors::NotFound("Container ", container, " does not exist.");
+ }
+ auto iter = b->find({type, name});
+ if (iter == b->end()) {
+ return errors::NotFound("Resource ", container, "/", name, "/",
+ type.name(), " does not exist.");
+ }
+ base = iter->second;
+ b->erase(iter);
+ }
+ CHECK(base != nullptr);
+ base->Unref();
+ return Status::OK();
+}
+
+Status ResourceMgr::Cleanup(const string& container) {
+ Container* b = nullptr;
+ {
+ mutex_lock l(mu_);
+ auto iter = containers_.find(container);
+ if (iter == containers_.end()) {
+ return errors::NotFound("Container ", container, " does not exist.");
+ }
+ b = iter->second;
+ containers_.erase(iter);
+ }
+ CHECK(b != nullptr);
+ for (const auto& p : *b) {
+ p.second->Unref();
+ }
+ delete b;
+ return Status::OK();
+}
+
+Status ContainerInfo::Init(ResourceMgr* rmgr, const NodeDef& ndef,
+ bool use_node_name_as_default) {
+ CHECK(rmgr);
+ rmgr_ = rmgr;
+ string attr_container;
+ TF_RETURN_IF_ERROR(GetNodeAttr(ndef, "container", &attr_container));
+ static RE2 container_re("[A-Za-z0-9.][A-Za-z0-9_.\\-/]*");
+ if (!attr_container.empty() &&
+ !RE2::FullMatch(attr_container, container_re)) {
+ return errors::InvalidArgument("container contains invalid characters: ",
+ attr_container);
+ }
+ string attr_shared_name;
+ TF_RETURN_IF_ERROR(GetNodeAttr(ndef, "shared_name", &attr_shared_name));
+ if (!attr_shared_name.empty() && (attr_shared_name[0] == '_')) {
+ return errors::InvalidArgument("shared_name cannot start with '_':",
+ attr_shared_name);
+ }
+ if (!attr_container.empty()) {
+ container_ = attr_container;
+ } else {
+ container_ = rmgr_->default_container();
+ }
+ if (!attr_shared_name.empty()) {
+ name_ = attr_shared_name;
+ } else if (use_node_name_as_default) {
+ name_ = ndef.name();
+ } else {
+ resource_is_private_to_kernel_ = true;
+ static std::atomic<int64> counter(0);
+ name_ = strings::StrCat("_", counter.fetch_add(1), "_", ndef.name());
+ }
+ return Status::OK();
+}
+
+string ContainerInfo::DebugString() const {
+ return strings::StrCat("[", container(), ",", name(), ",",
+ resource_is_private_to_kernel() ? "private" : "public",
+ "]");
+}
+
+} // end namespace tensorflow
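A small sketch of the container/shared_name policy implemented by ContainerInfo::Init() above; the node name and attribute values are invented, and AddNodeAttr comes from node_def_util.h:

```cpp
#include "tensorflow/core/framework/node_def_util.h"
#include "tensorflow/core/framework/resource_mgr.h"

namespace tensorflow {

void ContainerInfoExample() {
  ResourceMgr rmgr;  // Default container is "localhost".

  NodeDef ndef;
  ndef.set_name("my_var");
  AddNodeAttr("container", "experiment0", &ndef);
  AddNodeAttr("shared_name", "", &ndef);

  ContainerInfo cinfo;
  CHECK(cinfo.Init(&rmgr, ndef, /*use_node_name_as_default=*/true).ok());
  // cinfo.container() == "experiment0" and cinfo.name() == "my_var",
  // so the resource is shared by kernels carrying the same node name.
}

}  // namespace tensorflow
```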
diff --git a/tensorflow/core/framework/resource_mgr.h b/tensorflow/core/framework/resource_mgr.h
new file mode 100644
index 0000000000..65e859caf1
--- /dev/null
+++ b/tensorflow/core/framework/resource_mgr.h
@@ -0,0 +1,280 @@
+#ifndef TENSORFLOW_FRAMEWORK_RESOURCE_MGR_H_
+#define TENSORFLOW_FRAMEWORK_RESOURCE_MGR_H_
+
+#include <string>
+#include <typeindex>
+#include <typeinfo>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// A ResourceMgr instance keeps track of named and typed resources
+// grouped into containers.
+//
+// Each resource must be represented as a sub-class of ResourceBase,
+// which is reference counted explicitly. Each named resource is
+// registered with ResourceMgr within a named "container". At any
+// time, there is at most one instance of a resource given the container
+// name, the resource type and the resource name.
+//
+// All resources for a given container can be dropped by one call of
+// Cleanup().
+//
+// E.g.,
+// struct MyVar : public ResourceBase {
+// mutex mu;
+// Tensor val;
+// };
+//
+// ResourceMgr rm;
+//
+// // Create a var.
+// MyVar* my_var = new MyVar;
+// my_var->val = Tensor(DT_FLOAT, my_shape);
+// my_var->val.flat<float>().setZero(); // 0 initialized.
+// ctx->SetStatus(rm.Create("my_container", "my_name", my_var));
+//
+// // += a variable.
+// MyVar* my_var = nullptr;
+// Status s = rm.Lookup("my_container", "my_name", &my_var);
+// if (s.ok()) {
+// my_var->val.flat<float>() += grad;
+// my_var->Unref(); // Or use ScopedUnref().
+// }
+// ctx->SetStatus(s);
+class ResourceBase : public core::RefCounted {
+ public:
+ // Returns a debug string for *this.
+ virtual string DebugString() = 0;
+};
+
+class ResourceMgr {
+ public:
+ ResourceMgr();
+ explicit ResourceMgr(const string& default_container);
+ ~ResourceMgr();
+
+ // Returns the default container name for *this.
+ const string& default_container() const { return default_container_; }
+
+ // Creates a resource "name" in the "container". The caller transfers
+ // the ownership of one ref on "resource" to *this.
+ //
+ // REQUIRES: std::is_base_of<ResourceBase, T>
+ // REQUIRES: resource != nullptr.
+ template <typename T>
+ Status Create(const string& container, const string& name,
+ T* resource) TF_MUST_USE_RESULT;
+
+ // If "container" has a resource "name", returns it in "*resource" and
+ // the caller takes the ownership of one ref on "*resource".
+ //
+ // REQUIRES: std::is_base_of<ResourceBase, T>
+ // REQUIRES: resource != nullptr
+ template <typename T>
+ Status Lookup(const string& container, const string& name,
+ T** resource) const TF_MUST_USE_RESULT;
+
+ // If "container" has a resource "name", returns it in
+ // "*resource". Otherwise, invokes creator() to create the resource.
+ // The caller takes the ownership of one ref on "*resource".
+ //
+ // REQUIRES: std::is_base_of<ResourceBase, T>
+ // REQUIRES: resource != nullptr
+ template <typename T>
+ Status LookupOrCreate(const string& container, const string& name,
+ T** resource,
+ std::function<Status(T**)> creator) TF_MUST_USE_RESULT;
+
+ // Deletes the resource "name" from the "container".
+ //
+ // REQUIRES: std::is_base_of<ResourceBase, T>
+ template <typename T>
+ Status Delete(const string& container, const string& name) TF_MUST_USE_RESULT;
+
+ // Deletes all resources from the "container" and removes the container.
+ Status Cleanup(const string& container) TF_MUST_USE_RESULT;
+
+ // Deletes all resources in all containers.
+ void Clear();
+
+ private:
+ typedef std::pair<std::type_index, string> Key;
+ struct KeyHash {
+ std::size_t operator()(const Key& k) const {
+ return Hash64(k.second.data(), k.second.size(), k.first.hash_code());
+ }
+ };
+ struct KeyEqual {
+ bool operator()(const Key& x, const Key& y) const {
+ return (x.second == y.second) && (x.first == y.first);
+ }
+ };
+ typedef std::unordered_map<Key, ResourceBase*, KeyHash, KeyEqual> Container;
+
+ const string default_container_;
+ mutable mutex mu_;
+ std::unordered_map<string, Container*> containers_ GUARDED_BY(mu_);
+
+ Status DoCreate(const string& container, std::type_index type,
+ const string& name,
+ ResourceBase* resource) TF_MUST_USE_RESULT;
+ Status DoLookup(const string& container, std::type_index type,
+ const string& name,
+ ResourceBase** resource) const TF_MUST_USE_RESULT;
+ Status DoDelete(const string& container, std::type_index type,
+ const string& name) TF_MUST_USE_RESULT;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ResourceMgr);
+};
+
+// Policy helper to decide which container/shared_name to use for a
+// stateful kernel that accesses shared resource.
+class ContainerInfo {
+ public:
+ // Analyzes the node attributes of 'ndef' and decides the container and
+ // resource name the kernel should use for accessing the shared
+ // resource.
+ //
+ // 'ndef' is expected to have node attributes "container" and
+ // "shared_name". Returns non-OK if they are not provided or are
+ // invalid.
+ //
+ // The policy is as follows:
+ // * If the attribute "container" is non-empty, it is used as is.
+ // Otherwise, uses the resource manager's default container.
+ // * If the attribute "shared_name" is non-empty, it is used as is.
+ // Otherwise, if "use_node_name_as_default" is true, the kernel's
+ // node name is used as the resource name. Otherwise, a string
+ // unique to this process is used.
+ Status Init(ResourceMgr* rmgr, const NodeDef& ndef,
+ bool use_node_name_as_default);
+ Status Init(ResourceMgr* rmgr, const NodeDef& ndef) {
+ return Init(rmgr, ndef, false);
+ }
+
+ // After a successful Init(), the kernel should access the resource via
+ // resource_manager(), in the container container(), under the name
+ // name(). If resource_is_private_to_kernel() is true, the kernel
+ // should delete the resource when the kernel itself is deleted.
+ ResourceMgr* resource_manager() const { return rmgr_; }
+ const string& container() const { return container_; }
+ const string& name() const { return name_; }
+ bool resource_is_private_to_kernel() const {
+ return resource_is_private_to_kernel_;
+ }
+
+ // Returns a readable string for *this.
+ string DebugString() const;
+
+ private:
+ ResourceMgr* rmgr_ = nullptr;
+ string container_;
+ string name_;
+ bool resource_is_private_to_kernel_ = false;
+};
+
+// Helper for kernels to obtain 'resource' from the
+// ctx->resource_manager().
+//
+// "input_name" specifies the kernel's ref input, which provides a string
+// tensor with two elements specifying the container and the
+// resource name.
+//
+// Returns OK if the resource is found and transfers one ref of
+// *resource to the caller. Otherwise, returns an error.
+template <typename T>
+Status GetResourceFromContext(OpKernelContext* ctx, const string& input_name,
+ T** resource);
+
+// Implementation details below.
+
+template <typename T>
+void CheckDeriveFromResourceBase() {
+ static_assert(std::is_base_of<ResourceBase, T>::value,
+ "T must derive from ResourceBase");
+}
+
+template <typename T>
+Status ResourceMgr::Create(const string& container, const string& name,
+ T* resource) {
+ CheckDeriveFromResourceBase<T>();
+ CHECK(resource != nullptr);
+ return DoCreate(container, std::type_index(typeid(T)), name, resource);
+}
+
+template <typename T>
+Status ResourceMgr::Lookup(const string& container, const string& name,
+ T** resource) const {
+ CheckDeriveFromResourceBase<T>();
+ ResourceBase* found = nullptr;
+ Status s = DoLookup(container, std::type_index(typeid(T)), name, &found);
+ if (s.ok()) {
+ // It's safe to down cast 'found' to T* since
+ // typeid(T).hash_code() is part of the map key.
+ *resource = static_cast<T*>(found);
+ }
+ return s;
+}
+
+template <typename T>
+Status ResourceMgr::LookupOrCreate(const string& container, const string& name,
+ T** resource,
+ std::function<Status(T**)> creator) {
+ Status s;
+ *resource = nullptr;
+ while (*resource == nullptr) {
+ s = Lookup(container, name, resource);
+ if (s.ok()) break;
+ s = creator(resource);
+ if (!s.ok()) break;
+ s = Create(container, name, *resource);
+ if (s.ok()) {
+ (*resource)->Ref();
+ break;
+ }
+ // Rare event. Concurrent racy creation. Redo the lookup.
+ *resource = nullptr;
+ }
+ return s;
+}
+
+template <typename T>
+Status ResourceMgr::Delete(const string& container, const string& name) {
+ CheckDeriveFromResourceBase<T>();
+ return DoDelete(container, std::type_index(typeid(T)), name);
+}
+
+template <typename T>
+Status GetResourceFromContext(OpKernelContext* ctx, const string& input_name,
+ T** resource) {
+ string container;
+ string shared_name;
+ {
+ mutex* mu;
+ TF_RETURN_IF_ERROR(ctx->input_ref_mutex(input_name, &mu));
+ mutex_lock l(*mu);
+ Tensor tensor;
+ TF_RETURN_IF_ERROR(ctx->mutable_input(input_name, &tensor, true));
+ if (tensor.NumElements() != 2) {
+ return errors::InvalidArgument(
+ "Resource handle must have 2 elements, but had shape: ",
+ tensor.shape().DebugString());
+ }
+ container = tensor.flat<string>()(0);
+ shared_name = tensor.flat<string>()(1);
+ }
+ return ctx->resource_manager()->Lookup(container, shared_name, resource);
+}
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_RESOURCE_MGR_H_
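A hedged sketch of the LookupOrCreate() path, reusing the MyVar resource from the comment at the top of this header (MyVar is assumed, not part of this change). Only the first caller's creator runs; later callers receive the existing resource.

```cpp
#include "tensorflow/core/framework/resource_mgr.h"
#include "tensorflow/core/public/tensor.h"

namespace tensorflow {

// Assumed resource type, mirroring the example in resource_mgr.h.
struct MyVar : public ResourceBase {
  mutex mu;
  Tensor val;
  string DebugString() override { return "MyVar"; }
};

Status GetOrCreateVar(ResourceMgr* rm, MyVar** var) {
  return rm->LookupOrCreate<MyVar>(
      "my_container", "my_name", var, [](MyVar** ret) {
        *ret = new MyVar;
        (*ret)->val = Tensor(DT_FLOAT, TensorShape({10}));
        (*ret)->val.flat<float>().setZero();
        return Status::OK();
      });
}

void UseVar(ResourceMgr* rm) {
  MyVar* var = nullptr;
  if (GetOrCreateVar(rm, &var).ok()) {
    // ... read or update var->val under var->mu ...
    var->Unref();  // LookupOrCreate handed us one ref.
  }
}

}  // namespace tensorflow
```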
diff --git a/tensorflow/core/framework/resource_mgr_test.cc b/tensorflow/core/framework/resource_mgr_test.cc
new file mode 100644
index 0000000000..9f7ce3dde3
--- /dev/null
+++ b/tensorflow/core/framework/resource_mgr_test.cc
@@ -0,0 +1,173 @@
+#include "tensorflow/core/framework/resource_mgr.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+
+class Resource : public ResourceBase {
+ public:
+ explicit Resource(const string& label) : label_(label) {}
+ ~Resource() override {}
+
+ string DebugString() { return strings::StrCat("R/", label_); }
+
+ private:
+ string label_;
+};
+
+class Other : public ResourceBase {
+ public:
+ explicit Other(const string& label) : label_(label) {}
+ ~Other() override {}
+
+ string DebugString() { return strings::StrCat("O/", label_); }
+
+ private:
+ string label_;
+};
+
+template <typename T>
+string Find(const ResourceMgr& rm, const string& container,
+ const string& name) {
+ T* r;
+ TF_CHECK_OK(rm.Lookup(container, name, &r));
+ const string ret = r->DebugString();
+ r->Unref();
+ return ret;
+}
+
+template <typename T>
+string LookupOrCreate(ResourceMgr* rm, const string& container,
+ const string& name, const string& label) {
+ T* r;
+ TF_CHECK_OK(rm->LookupOrCreate<T>(container, name, &r, [&label](T** ret) {
+ *ret = new T(label);
+ return Status::OK();
+ }));
+ const string ret = r->DebugString();
+ r->Unref();
+ return ret;
+}
+
+static void HasError(const Status& s, const string& substr) {
+ EXPECT_TRUE(StringPiece(s.ToString()).contains(substr))
+ << s << ", expected substring " << substr;
+}
+
+template <typename T>
+Status FindErr(const ResourceMgr& rm, const string& container,
+ const string& name) {
+ T* r;
+ Status s = rm.Lookup(container, name, &r);
+ CHECK(!s.ok());
+ return s;
+}
+
+TEST(ResourceMgrTest, Basic) {
+ ResourceMgr rm;
+ TF_CHECK_OK(rm.Create("foo", "bar", new Resource("cat")));
+ TF_CHECK_OK(rm.Create("foo", "baz", new Resource("dog")));
+ TF_CHECK_OK(rm.Create("foo", "bar", new Other("tiger")));
+
+ // Expected to fail.
+ HasError(rm.Create("foo", "bar", new Resource("kitty")),
+ "Already exists: Resource foo/bar");
+
+ // Expected to be found.
+ EXPECT_EQ("R/cat", Find<Resource>(rm, "foo", "bar"));
+ EXPECT_EQ("R/dog", Find<Resource>(rm, "foo", "baz"));
+ EXPECT_EQ("O/tiger", Find<Other>(rm, "foo", "bar"));
+
+ // Expected to be not found.
+ HasError(FindErr<Resource>(rm, "bar", "foo"), "Not found: Container bar");
+ HasError(FindErr<Resource>(rm, "foo", "xxx"), "Not found: Resource foo/xxx");
+ HasError(FindErr<Other>(rm, "foo", "baz"), "Not found: Resource foo/baz");
+
+ // Delete foo/bar/Resource.
+ TF_CHECK_OK(rm.Delete<Resource>("foo", "bar"));
+ HasError(FindErr<Resource>(rm, "foo", "bar"), "Not found: Resource foo/bar");
+
+ TF_CHECK_OK(rm.Create("foo", "bar", new Resource("kitty")));
+ EXPECT_EQ("R/kitty", Find<Resource>(rm, "foo", "bar"));
+
+ // Drop the whole container foo.
+ TF_CHECK_OK(rm.Cleanup("foo"));
+ HasError(FindErr<Resource>(rm, "foo", "bar"), "Not found: Container foo");
+}
+
+TEST(ResourceMgr, CreateOrLookup) {
+ ResourceMgr rm;
+ EXPECT_EQ("R/cat", LookupOrCreate<Resource>(&rm, "foo", "bar", "cat"));
+ EXPECT_EQ("R/cat", LookupOrCreate<Resource>(&rm, "foo", "bar", "dog"));
+ EXPECT_EQ("R/cat", Find<Resource>(rm, "foo", "bar"));
+
+ EXPECT_EQ("O/tiger", LookupOrCreate<Other>(&rm, "foo", "bar", "tiger"));
+ EXPECT_EQ("O/tiger", LookupOrCreate<Other>(&rm, "foo", "bar", "lion"));
+ TF_CHECK_OK(rm.Delete<Other>("foo", "bar"));
+ HasError(FindErr<Other>(rm, "foo", "bar"), "Not found: Resource foo/bar");
+}
+
+Status ComputePolicy(const string& attr_container,
+ const string& attr_shared_name,
+ bool use_node_name_as_default, string* result) {
+ ContainerInfo cinfo;
+ ResourceMgr rmgr;
+ NodeDef ndef;
+ ndef.set_name("foo");
+ if (attr_container != "none") {
+ AddNodeAttr("container", attr_container, &ndef);
+ }
+ if (attr_shared_name != "none") {
+ AddNodeAttr("shared_name", attr_shared_name, &ndef);
+ }
+ TF_RETURN_IF_ERROR(cinfo.Init(&rmgr, ndef, use_node_name_as_default));
+ *result = cinfo.DebugString();
+ return Status::OK();
+}
+
+string Policy(const string& attr_container, const string& attr_shared_name,
+ bool use_node_name_as_default) {
+ string ret;
+ TF_CHECK_OK(ComputePolicy(attr_container, attr_shared_name,
+ use_node_name_as_default, &ret));
+ return ret;
+}
+
+TEST(ContainerInfo, Basic) {
+ // Correct cases.
+ EXPECT_EQ(Policy("", "", false), "[localhost,_0_foo,private]");
+ EXPECT_EQ(Policy("", "", true), "[localhost,foo,public]");
+ EXPECT_EQ(Policy("", "bar", false), "[localhost,bar,public]");
+ EXPECT_EQ(Policy("", "bar", true), "[localhost,bar,public]");
+ EXPECT_EQ(Policy("cat", "", false), "[cat,_1_foo,private]");
+ EXPECT_EQ(Policy("cat", "", true), "[cat,foo,public]");
+ EXPECT_EQ(Policy("cat", "bar", false), "[cat,bar,public]");
+ EXPECT_EQ(Policy("cat", "bar", true), "[cat,bar,public]");
+}
+
+Status WrongPolicy(const string& attr_container, const string& attr_shared_name,
+ bool use_node_name_as_default) {
+ string dbg;
+ auto s = ComputePolicy(attr_container, attr_shared_name,
+ use_node_name_as_default, &dbg);
+ CHECK(!s.ok());
+ return s;
+}
+
+TEST(ContainerInfo, Error) {
+ // Missing attribute.
+ HasError(WrongPolicy("none", "", false), "No attr");
+ HasError(WrongPolicy("", "none", false), "No attr");
+ HasError(WrongPolicy("none", "none", false), "No attr");
+
+ // Invalid container.
+ HasError(WrongPolicy("12$%", "", false), "container contains invalid char");
+
+ // Invalid shared name.
+ HasError(WrongPolicy("", "_foo", false), "shared_name cannot start with '_'");
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/step_stats.proto b/tensorflow/core/framework/step_stats.proto
new file mode 100644
index 0000000000..78610350ec
--- /dev/null
+++ b/tensorflow/core/framework/step_stats.proto
@@ -0,0 +1,58 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/tensor_description.proto";
+
+// TODO(tucker): The next 4 message defs are very similar to
+// the *LogEntry messages in profile.proto. They should be
+// unified in one place.
+
+message AllocatorMemoryUsed {
+ string allocator_name = 1;
+ int64 total_bytes = 2;
+ int64 peak_bytes = 3;
+}
+
+enum AllocationType {
+ AT_NOTUSED = 0; // tensor was not filled in
+ AT_ALLOCATED = 1; // tensor was allocated by the Op
+ AT_EXISTING = 2; // tensor was set to share the value of an existing tensor
+ AT_REF = 3; // tensor was set to be a reference to an existing tensor
+}
+
+// Output sizes recorded for a single execution of a graph node.
+message NodeOutput {
+ int32 slot = 1;
+ // Was the tensor allocated by this Op or a previous computation?
+ AllocationType allocation_type = 2;
+ TensorDescription tensor_description = 3;
+};
+
+// Time/size stats recorded for a single execution of a graph node.
+message NodeExecStats {
+ // TODO(tucker): Use some more compact form of node identity than
+ // the full string name. Either all processes should agree on a
+ // global id (cost_id?) for each node, or we should use a hash of
+ // the name.
+ string node_name = 1;
+ int64 all_start_micros = 2;
+ int64 op_start_rel_micros = 3;
+ int64 op_end_rel_micros = 4;
+ int64 all_end_rel_micros = 5;
+ repeated AllocatorMemoryUsed memory = 6;
+ repeated NodeOutput output = 7;
+ string timeline_label = 8;
+ int64 scheduled_micros = 9;
+ uint32 thread_id = 10;
+};
+
+message DeviceStepStats {
+ string device = 1;
+ repeated NodeExecStats node_stats = 2;
+}
+
+message StepStats {
+ repeated DeviceStepStats dev_stats = 1;
+};
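A hedged sketch of filling in these messages through the generated C++ protobuf API; the device name, node name and byte counts are invented for illustration:

```cpp
#include "tensorflow/core/framework/step_stats.pb.h"

namespace tensorflow {

StepStats MakeExampleStepStats() {
  StepStats step_stats;
  DeviceStepStats* dev = step_stats.add_dev_stats();
  dev->set_device("/job:worker/replica:0/task:0/cpu:0");

  NodeExecStats* node = dev->add_node_stats();
  node->set_node_name("MatMul");
  node->set_all_start_micros(1000);
  node->set_op_end_rel_micros(250);
  node->set_all_end_rel_micros(300);

  AllocatorMemoryUsed* mem = node->add_memory();
  mem->set_allocator_name("cpu");
  mem->set_total_bytes(4096);
  mem->set_peak_bytes(4096);
  return step_stats;
}

}  // namespace tensorflow
```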
diff --git a/tensorflow/core/framework/summary.proto b/tensorflow/core/framework/summary.proto
new file mode 100644
index 0000000000..0e6e659f2f
--- /dev/null
+++ b/tensorflow/core/framework/summary.proto
@@ -0,0 +1,67 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+// Serialization format for histogram module in
+// core/lib/histogram/histogram.h
+message HistogramProto {
+ double min = 1;
+ double max = 2;
+ double num = 3;
+ double sum = 4;
+ double sum_squares = 5;
+
+ // Parallel arrays encoding the bucket boundaries and the bucket values.
+ // bucket(i) is the count for the bucket i. The range for
+ // a bucket is:
+ // i == 0: -DBL_MAX .. bucket_limit(0)
+ // i != 0: bucket_limit(i-1) .. bucket_limit(i)
+ repeated double bucket_limit = 6 [packed = true];
+ repeated double bucket = 7 [packed = true];
+};
+
+// A Summary is a set of named values to be displayed by the
+// visualizer.
+//
+// Summaries are produced regularly during training, as controlled by
+// the "summary_interval_secs" attribute of the training operation.
+// Summaries are also produced at the end of an evaluation.
+message Summary {
+ message Image {
+ // Dimensions of the image.
+ int32 height = 1;
+ int32 width = 2;
+ // Valid colorspace values are
+ // 1 - grayscale
+ // 2 - grayscale + alpha
+ // 3 - RGB
+ // 4 - RGBA
+ // 5 - DIGITAL_YUV
+ // 6 - BGRA
+ int32 colorspace = 3;
+ // Image data in encoded format. All image formats supported by
+ // image_codec::CoderUtil can be stored here.
+ bytes encoded_image_string = 4;
+ }
+
+ message Value {
+ // Tag name for the data. Will be used as the title of the graph
+ // in the visualizer.
+ //
+ // Tag is usually "op_name:value_name", where "op_name" itself can have
+ // structure to indicate grouping.
+ string tag = 1;
+
+ // Value associated with the tag.
+ oneof value {
+ float simple_value = 2;
+ bytes obsolete_old_style_histogram = 3;
+ Image image = 4;
+ HistogramProto histo = 5;
+ }
+ }
+
+ // Set of values for the summary.
+ repeated Value value = 1;
+}
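A similar sketch for Summary: a scalar is recorded by setting the simple_value arm of the oneof; the tag is invented for illustration.

```cpp
#include "tensorflow/core/framework/summary.pb.h"

namespace tensorflow {

Summary MakeLossSummary(float loss) {
  Summary summary;
  Summary::Value* v = summary.add_value();
  v->set_tag("train:loss");    // "op_name:value_name" style tag.
  v->set_simple_value(loss);   // Selects the simple_value oneof arm.
  return summary;
}

}  // namespace tensorflow
```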
diff --git a/tensorflow/core/framework/tensor.cc b/tensorflow/core/framework/tensor.cc
new file mode 100644
index 0000000000..4a1b65db97
--- /dev/null
+++ b/tensorflow/core/framework/tensor.cc
@@ -0,0 +1,570 @@
+// Implementation notes:
+//
+// Tensor.cc uses a few templated classes and structs to facilitate
+// implementation of the Tensor class.
+//
+// * Buffer<T>: provides the implementation for a typed array T[n].
+// The array is allocated by the given allocator. It runs T's
+// default constructors and destructors when T is not a simple type
+// (e.g., string), and skips them otherwise.
+//
+// * Helper<T>: provides various routines given type T. The routines
+// include running the constructor and destructor of T[], encoding
+// and decoding T[] into/from a Cord, etc.
+
+#include "tensorflow/core/public/tensor.h"
+
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/framework/type_traits.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/tensor_coding.h"
+
+namespace tensorflow {
+namespace {
+
+// Typed ref-counted buffer: T[n].
+template <typename T>
+class Buffer : public TensorBuffer {
+ public:
+ Buffer(Allocator* a, int64 n);
+
+ void* data() const override { return data_; }
+ size_t size() const override { return sizeof(T) * elem_; }
+ TensorBuffer* root_buffer() override { return this; }
+ void FillAllocationDescription(AllocationDescription* proto) const override {
+ int64 rb = size();
+ proto->set_requested_bytes(rb);
+ proto->set_allocator_name(alloc_->Name());
+ if (alloc_->TracksAllocationSizes()) {
+ int64 ab = alloc_->AllocatedSize(data_);
+ proto->set_allocated_bytes(ab);
+ }
+ }
+
+ private:
+ Allocator* alloc_;
+ T* data_;
+ int64 elem_;
+
+ ~Buffer() override;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Buffer);
+};
+
+// is_simple<T>::value if T[] can be safely constructed and destructed
+// without running T() and ~T(). We do not use std::is_trivial<T>
+// directly because std::complex<float> is not trivial but its array
+// can be constructed and destructed without running its default ctor
+// and dtor.
+template <typename T>
+struct is_simple {
+ static const bool value = std::is_trivial<T>::value ||
+ std::is_same<T, complex64>::value ||
+ is_quantized<T>::value;
+};
+
+template <>
+struct is_simple<bfloat16> {
+ static const bool value = true;
+};
+
+// A set of helper functions depending on T.
+template <typename T>
+struct Helper {
+ // By default, we assume T is a simple type (float, int32, etc.)
+ static_assert(is_simple<T>::value, "T is not a simple type.");
+ typedef protobuf::RepeatedField<T> RepeatedFieldType;
+
+ // No constructor to run.
+ static void RunCtor(T* p, int n) {}
+
+ // No destructor to run.
+ static void RunDtor(T* p, int n) {}
+
+ // Encoder of simple type T to a string. We do a copy.
+ template <typename Destination>
+ static void Encode(TensorBuffer* in, int64 n, Destination* out) {
+ DCHECK_EQ(in->size(), sizeof(T) * n);
+ port::AssignRefCounted(StringPiece(in->base<const char>(), in->size()), in,
+ out);
+ }
+
+ // Decoder of simple type T. Copy the bytes from "in" into the
+ // tensor buffer.
+ template <typename Source>
+ static TensorBuffer* Decode(Allocator* a, const Source& in, int64 n) {
+ if (in.size() != sizeof(T) * n) {
+ LOG(ERROR) << "Input size was " << in.size() << " and expected "
+ << sizeof(T) * n;
+ return nullptr;
+ }
+ Buffer<T>* buf = new Buffer<T>(a, n);
+ port::CopyToArray(in, buf->template base<char>());
+ return buf;
+ }
+
+ // Memory usage.
+ static int64 TotalBytes(TensorBuffer* in, int64 n) {
+ DCHECK_EQ(in->size(), sizeof(T) * n);
+ return in->size();
+ }
+};
+
+// Helper specialization for string (the only non-simple type we
+// support).
+template <>
+struct Helper<string> {
+ // Proto message uses RepeatedFieldType to hold repeated T.
+ typedef protobuf::RepeatedPtrField<string> RepeatedFieldType;
+
+ // Runs string's default constructor for p[0], p[1], ..., p[n-1].
+ static void RunCtor(string* p, int n) {
+ for (int i = 0; i < n; ++p, ++i) new (p) string();
+ }
+
+ // Runs T's default destructor for p[0], p[1], ..., p[n-1].
+ static void RunDtor(string* p, int n) {
+ for (int i = 0; i < n; ++p, ++i) p->~string();
+ }
+
+ // Encodes "n" elements of type string stored in "in" into Cord
+ // "out", which is usually the TensorProto::tensor_content.
+ template <typename Destination>
+ static void Encode(TensorBuffer* in, int64 n, Destination* out) {
+ port::EncodeStringList(in->base<const string>(), n, out);
+ }
+
+ // Decodes "n" elements of type string from "in" and constructs a
+ // buffer out of it. Returns nullptr if the decoding fails. "in" is
+ // usually the TensorProto::tensor_content.
+ template <typename Source>
+ static TensorBuffer* Decode(Allocator* a, const Source& in, int64 n) {
+ Buffer<string>* buf = new Buffer<string>(a, n);
+ string* strings = buf->template base<string>();
+ if (port::DecodeStringList(in, strings, n)) {
+ return buf;
+ } else {
+ buf->Unref();
+ return nullptr;
+ }
+ }
+
+ // Returns the estimated memory usage of "n" elements of type T
+ // stored in buffer "in".
+ static int64 TotalBytes(TensorBuffer* in, int n) {
+ int64 tot = in->size();
+ DCHECK_EQ(tot, sizeof(string) * n);
+ const string* p = in->base<const string>();
+ for (int i = 0; i < n; ++i, ++p) tot += p->size();
+ return tot;
+ }
+};
+
+template <typename T>
+struct ProtoHelper {};
+
+// For a C++ type "T" (float, double, int32, etc.), the repeated field
+// "N"_val (float_val, int_val, label_val, etc.) of type "F" (float,
+// int32, string, etc) in the TensorProto is used for serializing the
+// tensor of type "T".
+#define PROTO_TRAITS(T, F, N) \
+ template <> \
+ struct ProtoHelper<T> { \
+ typedef Helper<F>::RepeatedFieldType FieldType; \
+ static FieldType::const_iterator Begin(const TensorProto& proto) { \
+ return proto.N##_val().begin(); \
+ } \
+ static size_t NumElements(const TensorProto& proto) { \
+ return proto.N##_val().size(); \
+ } \
+ static void Fill(const T* data, size_t n, TensorProto* proto) { \
+ typename ProtoHelper<T>::FieldType copy(data, data + n); \
+ proto->mutable_##N##_val()->Swap(&copy); \
+ } \
+ };
+PROTO_TRAITS(float, float, float);
+PROTO_TRAITS(double, double, double);
+PROTO_TRAITS(int32, int32, int);
+PROTO_TRAITS(uint8, int32, int);
+PROTO_TRAITS(int16, int32, int);
+PROTO_TRAITS(int8, int32, int);
+PROTO_TRAITS(int64, int64, int64);
+PROTO_TRAITS(bool, bool, bool);
+PROTO_TRAITS(string, string, string);
+PROTO_TRAITS(qint8, int32, int);
+PROTO_TRAITS(quint8, int32, int);
+#undef PROTO_TRAITS
+
+template <>
+struct ProtoHelper<complex64> {
+ typedef Helper<float>::RepeatedFieldType FieldType;
+ static const complex64* Begin(const TensorProto& proto) {
+ return reinterpret_cast<const complex64*>(proto.scomplex_val().data());
+ }
+ static size_t NumElements(const TensorProto& proto) {
+ return proto.scomplex_val().size() / 2;
+ }
+ static void Fill(const complex64* data, size_t n, TensorProto* proto) {
+ const float* p = reinterpret_cast<const float*>(data);
+ FieldType copy(p, p + n * 2);
+ proto->mutable_scomplex_val()->Swap(&copy);
+ }
+};
+
+template <>
+struct ProtoHelper<qint32> {
+ typedef Helper<int32>::RepeatedFieldType FieldType;
+ static const qint32* Begin(const TensorProto& proto) {
+ return reinterpret_cast<const qint32*>(proto.int_val().data());
+ }
+ static size_t NumElements(const TensorProto& proto) {
+ return proto.int_val().size();
+ }
+ static void Fill(const qint32* data, size_t n, TensorProto* proto) {
+ const int32* p = reinterpret_cast<const int32*>(data);
+ FieldType copy(p, p + n);
+ proto->mutable_int_val()->Swap(&copy);
+ }
+};
+
+template <>
+struct ProtoHelper<bfloat16> {
+ typedef Helper<float>::RepeatedFieldType FieldType;
+ static const bfloat16* Begin(const TensorProto& proto) {
+ return reinterpret_cast<const bfloat16*>(proto.int_val().data());
+ }
+ static size_t NumElements(const TensorProto& proto) {
+ return proto.int_val().size();
+ }
+ static void Fill(const bfloat16* data, size_t n, TensorProto* proto) {
+ proto->mutable_int_val()->Reserve(n);
+ for (size_t i = 0; i < n; ++i) {
+ proto->mutable_int_val()->AddAlreadyReserved(data[i].value);
+ }
+ }
+};
+
+template <typename T>
+Buffer<T>::Buffer(Allocator* a, int64 n)
+ : alloc_(a), data_(a->Allocate<T>(n)), elem_(n) {
+ if (data_) Helper<T>::RunCtor(data_, elem_);
+}
+
+template <typename T>
+Buffer<T>::~Buffer() {
+ if (data_) {
+ Helper<T>::RunDtor(data_, elem_);
+ alloc_->Deallocate<T>(data_);
+ }
+}
+
+// Allocates a T[n] buffer. Fills in the buffer with repeated values
+// in "in". If "in" has fewer values than "n", fills the rest of T[n]
+// with the last value. If "in" has no values, fills T[n] with the
+// default value for T.
+//
+// This routine uses the typed fields (float_val, etc.) in the
+// tensor proto as opposed to the untyped binary representation
+// (tensor_content). This is used when we expect the TensorProto is
+// used by a client program which may not know how to encode a tensor
+// in the compact binary representation.
+template <typename T>
+TensorBuffer* FromProtoField(Allocator* a, const TensorProto& in, int64 n) {
+ CHECK_GT(n, 0);
+ Buffer<T>* buf = new Buffer<T>(a, n);
+ T* data = buf->template base<T>();
+ const int64 in_n = ProtoHelper<T>::NumElements(in);
+ auto begin = ProtoHelper<T>::Begin(in);
+ if (n <= in_n) {
+ std::copy_n(begin, n, data);
+ } else if (in_n > 0) {
+ std::copy_n(begin, in_n, data);
+ const T& last = *(data + in_n - 1);
+ std::fill_n(data + in_n, n - in_n, last);
+ } else {
+ std::fill_n(data, n, T());
+ }
+ return buf;
+}
+
+// Copies T[n] stored in the buffer "in" into the repeated field in
+// "out" corresponding to type T.
+template <typename T>
+void ToProtoField(const TensorBuffer& in, int64 n, TensorProto* out) {
+ const T* data = in.base<const T>();
+ // NOTE: T may not be the same as
+ // ProtoHelper<T>::FieldType::value_type. E.g., T==int16,
+ // ProtoHelper<T>::FieldType::value_type==int32. If performance is
+ // critical, we can specialize T=float and do memcpy directly.
+ ProtoHelper<T>::Fill(data, n, out);
+}
+
+void RefIfNonNull(core::RefCounted* buf) {
+ if (buf) buf->Ref();
+}
+
+void UnrefIfNonNull(core::RefCounted* buf) {
+ if (buf) buf->Unref();
+}
+
+} // end namespace
+
+Tensor::Tensor() : Tensor(DT_FLOAT) {}
+
+Tensor::Tensor(DataType type) : type_(type), shape_({0}), buf_(nullptr) {}
+
+Tensor::Tensor(const Tensor& other)
+ : type_(other.dtype()), shape_(other.shape()), buf_(other.buf_) {
+ RefIfNonNull(buf_);
+}
+
+Tensor::Tensor(DataType type, const TensorShape& shape, TensorBuffer* buf)
+ : type_(type), shape_(shape), buf_(buf) {
+ RefIfNonNull(buf);
+}
+
+bool Tensor::IsInitialized() const {
+ return buf_ != nullptr && buf_->data() != nullptr;
+}
+
+Tensor::~Tensor() { UnrefIfNonNull(buf_); }
+
+void Tensor::CopyFromInternal(const Tensor& other, const TensorShape& shape) {
+ CHECK_EQ(shape.num_elements(), other.NumElements());
+ type_ = other.dtype();
+ shape_ = shape;
+ if (buf_ != other.buf_) {
+ UnrefIfNonNull(buf_);
+ buf_ = other.buf_;
+ RefIfNonNull(buf_);
+ }
+}
+
+// The macro CASES() expands to a switch statement conditioned on
+// TYPE_ENUM. Each case expands the STMTS after a typedef for T.
+#define SINGLE_ARG(...) __VA_ARGS__
+#define CASE(TYPE, STMTS) \
+ case DataTypeToEnum<TYPE>::value: { \
+ typedef TYPE T; \
+ STMTS; \
+ break; \
+ }
+#define CASES(TYPE_ENUM, STMTS) \
+ switch (TYPE_ENUM) { \
+ CASE(float, SINGLE_ARG(STMTS)) \
+ CASE(double, SINGLE_ARG(STMTS)) \
+ CASE(int32, SINGLE_ARG(STMTS)) \
+ CASE(uint8, SINGLE_ARG(STMTS)) \
+ CASE(int16, SINGLE_ARG(STMTS)) \
+ CASE(int8, SINGLE_ARG(STMTS)) \
+ CASE(string, SINGLE_ARG(STMTS)) \
+ CASE(complex64, SINGLE_ARG(STMTS)) \
+ CASE(int64, SINGLE_ARG(STMTS)) \
+ CASE(bool, SINGLE_ARG(STMTS)) \
+ CASE(qint32, SINGLE_ARG(STMTS)) \
+ CASE(quint8, SINGLE_ARG(STMTS)) \
+ CASE(qint8, SINGLE_ARG(STMTS)) \
+ CASE(bfloat16, SINGLE_ARG(STMTS)) \
+ case DT_INVALID: \
+ LOG(FATAL) << "Type not set"; \
+ break; \
+ default: \
+ LOG(FATAL) << "Unexpected type: " << TYPE_ENUM; \
+ break; \
+ }
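+
+// For example, CASES(dtype, buf_ = new Buffer<T>(a, n)) expands, roughly, to:
+//   switch (dtype) {
+//     case DT_FLOAT: { typedef float T; buf_ = new Buffer<T>(a, n); break; }
+//     case DT_DOUBLE: { typedef double T; buf_ = new Buffer<T>(a, n); break; }
+//     ... one case per type listed above ...
+//   }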
+
+Tensor::Tensor(Allocator* a, DataType type, const TensorShape& shape)
+ : type_(type), shape_(shape), buf_(nullptr) {
+ CHECK_NOTNULL(a);
+ if (shape_.num_elements() > 0) {
+ CASES(type, buf_ = new Buffer<T>(a, shape.num_elements()));
+ }
+}
+
+Tensor::Tensor(DataType type, const TensorShape& shape)
+ : Tensor(cpu_allocator(), type, shape) {}
+
+template <typename T>
+class SubBuffer : public TensorBuffer {
+ public:
+ // This buffer is an alias to buf[delta, delta + n).
+ SubBuffer(TensorBuffer* buf, int64 delta, int64 n)
+ : root_(buf->root_buffer()), data_(buf->base<T>() + delta), elem_(n) {
+ // Sanity check. The caller should ensure the sub buffer is valid.
+ CHECK_LE(root_->base<T>(), this->base<T>());
+ T* root_limit = root_->base<T>() + root_->size() / sizeof(T);
+ CHECK_LE(this->base<T>(), root_limit);
+ CHECK_LE(this->base<T>() + n, root_limit);
+ // Hold a ref of the underlying root buffer.
+ // NOTE: 'buf' is a sub-buffer inside the 'root_' buffer.
+ root_->Ref();
+ }
+
+ void* data() const override { return data_; }
+ size_t size() const override { return sizeof(T) * elem_; }
+ TensorBuffer* root_buffer() override { return root_; }
+ void FillAllocationDescription(AllocationDescription* proto) const override {
+ root_->FillAllocationDescription(proto);
+ }
+
+ private:
+ TensorBuffer* root_;
+ T* data_;
+ int64 elem_;
+
+ ~SubBuffer() override { root_->Unref(); }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SubBuffer);
+};
+
+Tensor Tensor::Slice(int64 start, int64 limit) const {
+ CHECK_GE(dims(), 1);
+ CHECK_LE(0, start);
+ CHECK_LE(start, limit);
+ int64 dim0_size = shape_.dim_size(0);
+ CHECK_LE(limit, dim0_size);
+ if ((start == 0) && (limit == dim0_size)) {
+ return *this;
+ }
+ Tensor ret;
+ ret.type_ = type_;
+ ret.shape_ = shape_;
+ ret.buf_ = nullptr;
+ if (dim0_size > 0) {
+ const int64 elems_per_dim0 = NumElements() / dim0_size;
+ const int64 delta = start * elems_per_dim0;
+ dim0_size = limit - start;
+ ret.shape_.set_dim(0, dim0_size);
+ const int64 num_elems = dim0_size * elems_per_dim0;
+ if (buf_) {
+ CASES(type_, ret.buf_ = new SubBuffer<T>(buf_, delta, num_elems));
+ }
+ }
+ return ret;
+}
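+
+// Usage sketch (mirroring the unit tests in tensor_test.cc): the returned
+// tensor aliases the same underlying storage via SubBuffer.
+//   Tensor x(DT_FLOAT, TensorShape({10, 4, 34}));
+//   Tensor y = x.Slice(4, 8);    // shape {4, 4, 34}; shares x's buffer
+//   Tensor z = x.Slice(0, 10);   // full-range slice; returns *this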
+
+bool Tensor::FromProto(const TensorProto& proto) {
+ return FromProto(cpu_allocator(), proto);
+}
+
+bool Tensor::FromProto(Allocator* a, const TensorProto& proto) {
+ CHECK_NOTNULL(a);
+ TensorBuffer* p = nullptr;
+ if (!TensorShape::IsValid(proto.tensor_shape())) return false;
+ if (proto.dtype() == DT_INVALID) return false;
+ TensorShape shape(proto.tensor_shape());
+ const int64 N = shape.num_elements();
+ if (N > 0 && proto.dtype()) {
+ if (!proto.tensor_content().empty()) {
+ const auto& content = proto.tensor_content();
+ CASES(proto.dtype(), p = Helper<T>::Decode(a, content, N));
+ } else {
+ CASES(proto.dtype(), p = FromProtoField<T>(a, proto, N));
+ }
+ if (p == nullptr) return false;
+ }
+ type_ = proto.dtype();
+ shape_ = shape;
+ UnrefIfNonNull(buf_);
+ buf_ = p;
+ return true;
+}
+
+void Tensor::AsProtoField(TensorProto* proto) const {
+ proto->Clear();
+ proto->set_dtype(dtype());
+ shape_.AsProto(proto->mutable_tensor_shape());
+ if (buf_) {
+ CASES(dtype(), ToProtoField<T>(*buf_, shape_.num_elements(), proto));
+ }
+}
+
+void Tensor::AsProtoTensorContent(TensorProto* proto) const {
+ proto->Clear();
+ proto->set_dtype(type_);
+ shape_.AsProto(proto->mutable_tensor_shape());
+ if (buf_) {
+ CASES(dtype(), Helper<T>::Encode(buf_, shape_.num_elements(),
+ proto->mutable_tensor_content()));
+ }
+}
+
+size_t Tensor::TotalBytes() const {
+ if (shape_.num_elements() == 0) return 0;
+ CHECK(buf_) << "null buf_ with non-zero shape size " << shape_.num_elements();
+ CASES(dtype(), return Helper<T>::TotalBytes(buf_, shape_.num_elements()));
+ return 0; // Makes compiler happy.
+}
+
+bool Tensor::CanUseDMA() const {
+ CASES(dtype(), return is_simple<T>::value);
+ return false; // Makes compiler happy.
+}
+
+#undef CASES
+#undef CASE
+
+string Tensor::SummarizeValue(int64 max_entries) const {
+ string ret;
+ for (int64 i = 0; i < std::min(max_entries, NumElements()); ++i) {
+ if (i > 0) strings::StrAppend(&ret, " ");
+ switch (dtype()) {
+ case DT_STRING:
+ strings::StrAppend(&ret, str_util::CEscape(flat<string>()(i)));
+ break;
+ case DT_BOOL:
+ strings::StrAppend(&ret, flat<bool>()(i) ? "True" : "False");
+ break;
+
+#define CASE(DT_ENUM) \
+ case DT_ENUM: \
+ strings::StrAppend(&ret, flat<EnumToDataType<DT_ENUM>::Type>()(i)); \
+ break
+
+ CASE(DT_FLOAT);
+ CASE(DT_DOUBLE);
+ CASE(DT_INT32);
+ CASE(DT_UINT8);
+ CASE(DT_INT16);
+ CASE(DT_INT8);
+ CASE(DT_INT64);
+
+#undef CASE
+ default:
+ // TODO(zhifengc, josh11b): Pretty-print other types (bool,
+ // complex64, quantized, bfloat16).
+ strings::StrAppend(&ret, " ?");
+ }
+ }
+ if (max_entries < NumElements()) strings::StrAppend(&ret, "...");
+
+ return ret;
+}
+
+StringPiece Tensor::tensor_data() const {
+ if (buf_ == nullptr) return StringPiece(); // Don't die for empty tensors
+ return StringPiece(static_cast<char*>(buf_->data()), TotalBytes());
+}
+
+string Tensor::DebugString() const {
+ return strings::StrCat("Tensor<type: ", DataTypeString(dtype()), " shape: ",
+ shape().ShortDebugString(), " values: ",
+ SummarizeValue(3), ">");
+}
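+
+// For illustration, DebugString() of a 2x3 float tensor of zeros yields
+// something like:
+//   Tensor<type: float shape: [2,3] values: 0 0 0...>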
+
+void Tensor::FillDescription(TensorDescription* description) const {
+ description->set_dtype(dtype());
+ shape().AsProto(description->mutable_shape());
+ buf_->FillAllocationDescription(
+ description->mutable_allocation_description());
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor.proto b/tensorflow/core/framework/tensor.proto
new file mode 100644
index 0000000000..b42694afde
--- /dev/null
+++ b/tensorflow/core/framework/tensor.proto
@@ -0,0 +1,57 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/tensor_shape.proto";
+import "tensorflow/core/framework/types.proto";
+
+// Protocol buffer representing a tensor.
+message TensorProto {
+ DataType dtype = 1;
+
+ // Shape of the tensor. TODO(mdevin): sort out the 0-rank issues.
+ TensorShapeProto tensor_shape = 2;
+
+ // Only one of the representations below is set, one of "tensor_content" and
+ // the "xxx_val" attributes. We are not using oneof because, as oneofs cannot
+ // contain repeated fields, it would require another extra set of messages.
+
+ // Version number.
+ //
+ // In version 0, if the "repeated xxx" representations contain only one
+ // element, that element is repeated to fill the shape. This makes it easy
+ // to represent a constant Tensor with a single value.
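+ //
+ // For example (illustrative only), a 2x2 DT_FLOAT tensor of all 1.0 can be
+ // encoded compactly in text format as:
+ //   dtype: DT_FLOAT
+ //   tensor_shape { dim { size: 2 } dim { size: 2 } }
+ //   float_val: 1.0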
+ int32 version_number = 3;
+
+ // Serialized content from TensorBase::Serialize(). This representation can be
+ // used for all tensor types.
+ bytes tensor_content = 4;
+
+ // Type specific representations that make it easy to create tensor protos in
+ // all languages. Only the representation corresponding to "dtype" can
+ // be set. The values hold the flattened representation of the tensor in
+ // row major order.
+
+ // DT_FLOAT.
+ repeated float float_val = 5 [packed = true];
+
+ // DT_DOUBLE.
+ repeated double double_val = 6 [packed = true];
+
+ // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
+ repeated int32 int_val = 7 [packed = true];
+
+ // DT_STRING
+ repeated bytes string_val = 8;
+
+ // DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real
+ // and imaginary parts of i-th single precision complex.
+ repeated float scomplex_val = 9 [packed = true];
+
+ // DT_INT64
+ repeated int64 int64_val = 10 [packed = true];
+
+ // DT_BOOL
+ repeated bool bool_val = 11 [packed = true];
+};
diff --git a/tensorflow/core/framework/tensor_description.proto b/tensorflow/core/framework/tensor_description.proto
new file mode 100644
index 0000000000..1fff3ee155
--- /dev/null
+++ b/tensorflow/core/framework/tensor_description.proto
@@ -0,0 +1,19 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/types.proto";
+import "tensorflow/core/framework/tensor_shape.proto";
+import "tensorflow/core/framework/allocation_description.proto";
+
+message TensorDescription {
+ // Data type of tensor elements
+ DataType dtype = 1;
+
+ // Shape of the tensor.
+ TensorShapeProto shape = 2;
+
+ // Information about the size and allocator used for the data
+ AllocationDescription allocation_description = 4;
+};
diff --git a/tensorflow/core/framework/tensor_shape.cc b/tensorflow/core/framework/tensor_shape.cc
new file mode 100644
index 0000000000..3db2ffaaca
--- /dev/null
+++ b/tensorflow/core/framework/tensor_shape.cc
@@ -0,0 +1,138 @@
+#include "tensorflow/core/public/tensor_shape.h"
+
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+// An upper limit of the total number of elements in a tensor.
+static const int64 kMaxElements = (1LL << 40);
+
+bool TensorShape::IsValid(const TensorShapeProto& proto) {
+ int64 num_elements = 1;
+ for (const auto& d : proto.dim()) {
+ if (d.size() < 0) return false;
+ num_elements *= d.size();
+ if (num_elements > kMaxElements) return false;
+ }
+ return true;
+}
+
+TensorShape::TensorShape(const TensorShapeProto& proto) {
+ dim_sizes_.reserve(proto.dim_size());
+ num_elements_ = 1;
+ for (const auto& d : proto.dim()) {
+ AddDim(d.size());
+ }
+}
+
+TensorShape::TensorShape(gtl::ArraySlice<int64> dim_sizes) {
+ dim_sizes_.reserve(dim_sizes.size());
+ num_elements_ = 1;
+ for (auto s : dim_sizes) {
+ AddDim(s);
+ }
+}
+
+TensorShape::TensorShape() : num_elements_(1) {}
+
+void TensorShape::Clear() {
+ dim_sizes_.clear();
+ num_elements_ = 1;
+}
+
+void TensorShape::AddDim(int64 size) {
+ CHECK_GE(size, 0);
+ dim_sizes_.push_back(size);
+ num_elements_ *= size;
+ CHECK_LE(0, num_elements_);
+ CHECK_LE(num_elements_, kMaxElements);
+}
+
+void TensorShape::AppendShape(const TensorShape& shape) {
+ for (auto d : shape) AddDim(d.size);
+}
+
+void TensorShape::InsertDim(int d, int64 size) {
+ CHECK_GE(d, 0);
+ CHECK_LE(d, dims());
+ CHECK_GE(size, 0);
+ dim_sizes_.insert(dim_sizes_.begin() + d, size);
+ num_elements_ *= size;
+ CHECK_LE(0, num_elements_);
+ CHECK_LE(num_elements_, kMaxElements);
+}
+
+void TensorShape::set_dim(int d, int64 size) {
+ CHECK_GE(d, 0);
+ CHECK_LT(d, dims());
+ CHECK_GE(size, 0);
+
+ // Update the number of elements. num_elements_ is int64.
+ dim_sizes_[d] = size;
+ recompute_dims();
+}
+
+void TensorShape::RemoveDim(int d) {
+ CHECK_GE(d, 0);
+ CHECK_LT(d, dims());
+
+ // Update the number of elements and remove the dimension from the
+ // sizes.
+ dim_sizes_.erase(dim_sizes_.begin() + d);
+ recompute_dims();
+}
+
+void TensorShape::recompute_dims() {
+ num_elements_ = 1;
+ for (auto s : dim_sizes_) {
+ num_elements_ *= s;
+ CHECK_LE(0, num_elements_);
+ CHECK_LE(num_elements_, kMaxElements);
+ }
+}
+
+bool TensorShape::IsSameSize(const TensorShape& b) const {
+ if (b.dims() != dims()) return false;
+ for (int d = 0; d < dims(); d++) {
+ if (dim_size(d) != b.dim_size(d)) return false;
+ }
+ return true;
+}
+
+void TensorShape::AsProto(TensorShapeProto* proto) const {
+ proto->Clear();
+ for (size_t d = 0; d < dim_sizes_.size(); ++d) {
+ auto* dim = proto->add_dim();
+ dim->set_size(dim_sizes_[d]);
+ }
+}
+
+TensorShapeIter TensorShape::begin() const { return TensorShapeIter(this, 0); }
+
+TensorShapeIter TensorShape::end() const {
+ return TensorShapeIter(this, dims());
+}
+
+string TensorShape::DebugString() const {
+ TensorShapeProto proto;
+ AsProto(&proto);
+ return proto.ShortDebugString();
+}
+
+string TensorShape::ShortDebugString() const {
+ return strings::StrCat(
+ "[", str_util::Join(gtl::ArraySlice<int64>(dim_sizes_), ","), "]");
+}
+
+bool TensorShapeUtils::StartsWith(const TensorShape& shape,
+ const TensorShape& prefix) {
+ if (shape.dims() < prefix.dims()) return false;
+ for (int i = 0; i < prefix.dims(); i++) {
+ if (shape.dim_size(i) != prefix.dim_size(i)) return false;
+ }
+ return true;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_shape.proto b/tensorflow/core/framework/tensor_shape.proto
new file mode 100644
index 0000000000..8fe7cce13d
--- /dev/null
+++ b/tensorflow/core/framework/tensor_shape.proto
@@ -0,0 +1,29 @@
+// Protocol buffer representing the shape of tensors.
+
+syntax = "proto3";
+// option cc_enable_arenas = true;
+
+package tensorflow;
+
+// Dimensions of a tensor and the type of data it contains.
+message TensorShapeProto {
+ // One dimension of the tensor.
+ message Dim {
+ // Size of the tensor in that dimension.
+ int64 size = 1;
+
+ // Optional name of the tensor dimension.
+ string name = 2;
+ };
+
+ // Dimensions of the tensor, such as {"input", 30}, {"output", 40} for a 30 x
+ // 40 2D tensor. The names are optional.
+ //
+ // The order of entries in "dim" matters: It indicates the layout of the
+ // values in the tensor in-memory representation.
+ //
+ // The first entry in "dim" is the outermost dimension used to layout the
+ // values, the last entry is the innermost dimension. This matches the
+ // in-memory layout of RowMajor Eigen tensors.
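+ //
+ // For example (illustrative, in text format), a 30 x 40 tensor whose
+ // dimensions are named "input" and "output" would have:
+ //   dim { size: 30 name: "input" }
+ //   dim { size: 40 name: "output" }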
+ repeated Dim dim = 2;
+};
diff --git a/tensorflow/core/framework/tensor_shape_test.cc b/tensorflow/core/framework/tensor_shape_test.cc
new file mode 100644
index 0000000000..adac1a4787
--- /dev/null
+++ b/tensorflow/core/framework/tensor_shape_test.cc
@@ -0,0 +1,75 @@
+#include "tensorflow/core/public/tensor_shape.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(TensorShapeTest, Default) {
+ // The default TensorShape constructor constructs a shape of 0-dim
+ // and 1-element.
+ TensorShape s;
+ EXPECT_EQ(s.dims(), 0);
+ EXPECT_EQ(s.num_elements(), 1);
+}
+
+TEST(TensorShapeTest, set_dim) {
+ TensorShape s({10, 5});
+
+ s.set_dim(0, 20);
+ ASSERT_EQ(2, s.dims());
+ EXPECT_EQ(20, s.dim_size(0));
+ EXPECT_EQ(100, s.num_elements());
+
+ s.set_dim(1, 2);
+ ASSERT_EQ(2, s.dims());
+ EXPECT_EQ(2, s.dim_size(1));
+ EXPECT_EQ(40, s.num_elements());
+}
+
+TEST(TensorShapeTest, RemoveDim) {
+ TensorShape s({10, 5});
+ s.RemoveDim(0);
+ EXPECT_EQ(5, s.num_elements());
+ ASSERT_EQ(1, s.dims());
+}
+
+TEST(TensorShapeTest, RemoveAndAddDim) {
+ TensorShape s({10, 5, 20});
+ s.RemoveDim(1);
+ s.AddDim(100);
+
+ EXPECT_EQ(20000, s.num_elements());
+ ASSERT_EQ(3, s.dims());
+}
+
+TEST(TensorShapeTest, InvalidShapeProto) {
+ TensorShapeProto proto;
+ EXPECT_TRUE(TensorShape::IsValid(proto));
+
+ proto.add_dim()->set_size(357);
+ proto.add_dim()->set_size(982);
+ EXPECT_TRUE(TensorShape::IsValid(proto));
+
+ proto.Clear();
+ proto.add_dim()->set_size(-357);
+ proto.add_dim()->set_size(-982);
+ EXPECT_FALSE(TensorShape::IsValid(proto));
+
+ proto.Clear();
+ proto.add_dim()->set_size(1LL << 20);
+ proto.add_dim()->set_size((1LL << 20) + 1);
+ EXPECT_FALSE(TensorShape::IsValid(proto));
+}
+
+TEST(TensorShapeTest, SetDimForEmptyTensor) {
+ TensorShape s({10, 5, 20});
+ EXPECT_EQ(1000, s.num_elements());
+ s.set_dim(1, 0);
+ EXPECT_EQ(0, s.num_elements());
+ s.set_dim(1, 7);
+ EXPECT_EQ(1400, s.num_elements());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_slice.cc b/tensorflow/core/framework/tensor_slice.cc
new file mode 100644
index 0000000000..473d9463ee
--- /dev/null
+++ b/tensorflow/core/framework/tensor_slice.cc
@@ -0,0 +1,226 @@
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+
+namespace tensorflow {
+
+TensorSlice::TensorSlice(int dim) { SetFullSlice(dim); }
+
+TensorSlice::TensorSlice(const TensorSliceProto& proto) {
+ starts_.reserve(proto.extent_size());
+ lengths_.reserve(proto.extent_size());
+ for (const auto& e : proto.extent()) {
+ starts_.push_back(e.start());
+ lengths_.push_back(GetExtentLength(e));
+ }
+}
+
+TensorSlice::TensorSlice(std::initializer_list<std::pair<int, int>> extents) {
+ starts_.reserve(extents.size());
+ lengths_.reserve(extents.size());
+ for (const auto& e : extents) {
+ starts_.push_back(e.first);
+ lengths_.push_back(e.second);
+ }
+}
+
+Status TensorSlice::Parse(const string& str, TensorSlice* slice) {
+ std::vector<string> items = str_util::Split(str, ':', str_util::SkipEmpty());
+ slice->starts_.reserve(items.size());
+ slice->lengths_.reserve(items.size());
+ for (const string& x : items) {
+ int s, l;
+ if (x == "-") {
+ // "everything"
+ s = 0;
+ l = kFullExtent;
+ } else {
+ char junk;
+ if (sscanf(x.c_str(), "%d,%d%c", &s, &l, &junk) != 2) {
+ return errors::InvalidArgument(
+ "Expected a pair of numbers or '-' "
+ "but got '",
+ x, "': string = ", str);
+ }
+ if (s < 0 || l <= 0) {
+ return errors::InvalidArgument(
+ "Expected non-negative start and "
+ "positive length but got start = ",
+ s, ", length = ", l, ": string = ", str);
+ }
+ }
+ slice->starts_.push_back(s);
+ slice->lengths_.push_back(l);
+ }
+
+ return Status::OK();
+}
+
+void TensorSlice::Clear() {
+ starts_.clear();
+ lengths_.clear();
+}
+
+void TensorSlice::SetFullSlice(int dim) {
+ Clear();
+ starts_.reserve(dim);
+ lengths_.reserve(dim);
+ for (int d = 0; d < dim; ++d) {
+ starts_.push_back(0);
+ lengths_.push_back(kFullExtent);
+ }
+}
+
+void TensorSlice::Extend(int dim) {
+ int old_dim = dims();
+ DCHECK_LE(old_dim, dim);
+ starts_.resize(dim);
+ lengths_.resize(dim);
+ for (int d = old_dim; d < dim; ++d) {
+ starts_[d] = 0;
+ lengths_[d] = kFullExtent;
+ }
+}
+
+void TensorSlice::AsProto(TensorSliceProto* proto) const {
+ for (int d = 0; d < dims(); ++d) {
+ TensorSliceProto::Extent* e = proto->add_extent();
+ // We only need to record the explicit slice for non-full slices
+ if (!IsFullAt(d)) {
+ e->set_start(starts_[d]);
+ e->set_length(lengths_[d]);
+ }
+ }
+}
+
+string TensorSlice::DebugString() const {
+ string buffer;
+ bool first = true;
+ for (int d = 0; d < dims(); ++d) {
+ if (!first) {
+ buffer.append(":");
+ }
+ string s;
+ if (IsFullAt(d)) {
+ buffer.append("-");
+ } else {
+ strings::StrAppend(&buffer, starts_[d], ",", lengths_[d]);
+ }
+ first = false;
+ }
+ return buffer;
+}
+
+bool TensorSlice::Intersect(const TensorSlice& other,
+ TensorSlice* result) const {
+ // First, if two slices have different ranks, they obviously don't overlap
+ // -- in fact they are not compatible.
+ if (dims() != other.dims()) {
+ return false;
+ }
+
+ // Setting the result to the right dimension
+ if (result) {
+ result->SetFullSlice(dims());
+ }
+ // The two slices overlap if they overlap in all dimensions.
+ for (int d = 0; d < dims(); ++d) {
+ if (IsFullAt(d)) {
+ if (result) {
+ result->set_start(d, other.start(d));
+ result->set_length(d, other.length(d));
+ }
+ } else if (other.IsFullAt(d)) {
+ if (result) {
+ result->set_start(d, start(d));
+ result->set_length(d, length(d));
+ }
+ } else {
+ // If we have an intersection here, it should have a start that is the
+ // max of the two starts and an end that is the min of the two ends.
+ int s = std::max(start(d), other.start(d));
+ int l = std::min(end(d), other.end(d)) - s;
+ if (l > 0) {
+ // We have a real intersection
+ if (result) {
+ result->set_start(d, s);
+ result->set_length(d, l);
+ }
+ } else {
+ // We don't have an intersection for this dimension -- thus we don't
+ // have any intersection at all.
+ if (result) {
+ result->Clear();
+ }
+ return false;
+ }
+ }
+ }
+ // If we are here, we know there is overlap in every dimension.
+ return true;
+}
+
+void TensorSlice::ComputeRelative(const TensorSlice& sub,
+ TensorSlice* relative) const {
+ DCHECK_EQ(dims(), sub.dims());
+ relative->SetFullSlice(dims());
+ for (int d = 0; d < dims(); ++d) {
+ if (IsFullAt(d)) {
+ relative->set_start(d, sub.start(d));
+ relative->set_length(d, sub.length(d));
+ } else {
+ // Otherwise the relative start is the difference between the start of
+ // sub and the start of base
+ relative->set_start(d, sub.start(d) - start(d));
+ relative->set_length(d, sub.length(d));
+ }
+ }
+}
+
+// static
+bool TensorSlice::HasExtentLength(const TensorSliceProto::Extent& extent) {
+ return extent.has_length_case() == TensorSliceProto::Extent::kLength;
+}
+
+// static
+int64 TensorSlice::GetExtentLength(const TensorSliceProto::Extent& extent) {
+ if (!HasExtentLength(extent)) return -1;
+ return extent.length();
+}
+
+Status TensorSlice::SliceTensorShape(const TensorShape& shape,
+ TensorShape* result_shape) const {
+ result_shape->Clear();
+ // Mismatching ranks: we can't apply the slice at all.
+ if (shape.dims() != dims()) {
+ return errors::Internal("Mismatching ranks: shape = ", shape.DebugString(),
+ ", slice = ", DebugString());
+ }
+ for (int d = 0; d < dims(); ++d) {
+ if (IsFullAt(d)) {
+ result_shape->AddDim(shape.dim_size(d));
+ } else {
+ // Check if the extent applies to the dimension
+ if (end(d) <= shape.dim_size(d)) {
+ // Yes: the end is within the range of the dim -- we adjust the result
+ // shape so that its size along this dimension is the length of the
+ // slice.
+ result_shape->AddDim(length(d));
+ } else {
+ // The extent doesn't apply to the dimension
+ result_shape->Clear();
+ return errors::Internal("Extent in dimension ", d,
+ " out of bounds: shape = ", shape.DebugString(),
+ ", slice = ", DebugString());
+ }
+ }
+ }
+ // If we are here, we have successfully applied the shape.
+ return Status::OK();
+}
+
+const int TensorSlice::kFullExtent = -1;
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_slice.h b/tensorflow/core/framework/tensor_slice.h
new file mode 100644
index 0000000000..8e2f108c3f
--- /dev/null
+++ b/tensorflow/core/framework/tensor_slice.h
@@ -0,0 +1,189 @@
+#ifndef TENSORFLOW_FRAMEWORK_TENSOR_SLICE_H_
+#define TENSORFLOW_FRAMEWORK_TENSOR_SLICE_H_
+
+#include <string>
+#include "tensorflow/core/framework/tensor_slice.pb.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// A tensor slice represents a slice of a given tensor. It is represented by a
+// list of (start, length) pairs, where the size of the list is the rank of the
+// tensor.
+
+class TensorSlice {
+ public:
+ // Construct a tensor slice. There are several ways:
+ // -- creating an empty slice
+ // -- from just a dimension (in this case it will create a full slice)
+ // -- from an array of pairs of integers.
+ // -- from a TensorSliceProto protocol buffer
+ // -- from a string format of "start,lenth:start,length..." where each
+ // "start,length" pair represents the slice on one dimension. We allow a
+ // special "-" that means "everything for this dimension". One such example
+ // is: 0,10:-:14,1:-:-
+ TensorSlice() {}
+ explicit TensorSlice(int dim);
+ explicit TensorSlice(const TensorSliceProto& proto);
+ explicit TensorSlice(std::initializer_list<std::pair<int, int>> extents);
+
+ static Status Parse(const string& str, TensorSlice* output);
+ static TensorSlice ParseOrDie(const string& str) {
+ TensorSlice ret;
+ Status s = Parse(str, &ret);
+ if (!s.ok()) {
+ LOG(FATAL) << "Could not parse TensorSlice";
+ }
+ return ret;
+ }
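+
+ // Usage sketch (values are illustrative):
+ //   TensorSlice s = TensorSlice::ParseOrDie("0,10:-:14,1");
+ //   // s.start(0) == 0, s.length(0) == 10, s.IsFullAt(1) == true,
+ //   // s.start(2) == 14, s.length(2) == 1.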
+
+ void Clear();
+
+ // Accessors
+ int dims() const { return starts_.size(); }
+
+ int start(int d) const {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ return starts_[d];
+ }
+
+ int length(int d) const {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ return lengths_[d];
+ }
+
+ int end(int d) const {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ return start(d) + length(d);
+ }
+
+ void set_start(int d, int x) {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ DCHECK_GE(x, 0);
+ starts_[d] = x;
+ }
+
+ void set_length(int d, int x) {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ lengths_[d] = x;
+ }
+
+ // If we have a full slice along dimension "d".
+ bool IsFullAt(int d) const { return lengths_[d] < 0; }
+
+ // Set the slice to be a full slice of "dim" dimensions
+ void SetFullSlice(int dim);
+
+ // Extend a slice to "dim" dimensions: all the added dimensions are full.
+ // Requires: dim >= dims().
+ void Extend(int dim);
+
+ // Conversion of a TensorSlice to other formats
+ void AsProto(TensorSliceProto* proto) const;
+ string DebugString() const;
+
+ // Fill *indices and *sizes from *this (so that we can use the slice()
+ // function in eigen tensor). We need a tensor shape in case some of the
+ // slices are full slices.
+ // We allow NDIMS to be greater than dims(), in which case we will pad the
+ // higher dimensions with trivial dimensions.
+ template <int NDIMS>
+ void FillIndicesAndSizes(const TensorShape& shape,
+ Eigen::DSizes<ptrdiff_t, NDIMS>* indices,
+ Eigen::DSizes<ptrdiff_t, NDIMS>* sizes) const;
+
+ // Interaction with other TensorSlices.
+
+ // Computes the intersection with another slice and, if "result" is not
+ // nullptr, stores the result in *result; returns true if there is any real
+ // intersection.
+ bool Intersect(const TensorSlice& other, TensorSlice* result) const;
+ // A shorthand.
+ bool Overlaps(const TensorSlice& other) const {
+ return Intersect(other, nullptr);
+ }
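+
+ // Intersection sketch (values taken from tensor_slice_test.cc):
+ //   TensorSlice a = TensorSlice::ParseOrDie("-:-");
+ //   TensorSlice b = TensorSlice::ParseOrDie("1,2:3,4");
+ //   TensorSlice c;
+ //   a.Intersect(b, &c);   // returns true; c.DebugString() == "1,2:3,4"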
+
+ // Interaction with TensorShape.
+
+ // Slices a shape and stores the result into *result_shape.
+ // Requires that the shape and *this have the same rank.
+ // For example, given a tensor shape of {3, 4, 5}, and a slice of
+ // 1,2:-:0,2, the result shape is {2, 4, 2}.
+ Status SliceTensorShape(const TensorShape& shape,
+ TensorShape* result_shape) const;
+
+ // Given slice "sub" where "sub" is fully contained in *this,
+ // (meaning that the intersection of "sub" and *this equals "sub"), computes
+ // the "relative" slice of "sub" with respect to *this.
+ //
+ // In other words, if we use A>S to denote slicing a shape S with a slice A,
+ // then the function is computing a slice X such that:
+ // X > (this > S) = sub > S
+ // for any shape S.
+ //
+ // In general, along every dimension, the start of the relative slice is the
+ // start of the "sub" slice minus the start of *this; the length of the
+ // relative slice is the length of the "sub" slice.
+ //
+ // For example, say we have a shape of {3, 4, 5}, "this" is 0,2:-:1,2, and
+ // "sub" is 1,1:2:2,1,2, then the related slice is 1,1:2,2:0,2.
+ //
+ // The caller needs to make sure that "sub" is indeed a sub-slice of *this;
+ // otherwise the result is undefined.
+ void ComputeRelative(const TensorSlice& sub, TensorSlice* relative) const;
+
+ // Returns true if the length field was specified in an Extent.
+ static bool HasExtentLength(const TensorSliceProto::Extent& extent);
+
+ // Returns the value of the length field in an Extent, or -1 if it
+ // is not present.
+ static int64 GetExtentLength(const TensorSliceProto::Extent& extent);
+
+ private:
+ // a length value of kFullExtent (-1) means we have a full slice at this
+ // dimension. It's defined in tensor_slice.cc.
+ static const int kFullExtent;
+
+ // TODO(yangke): switch to Eigen once it supports variable size arrays.
+ // A length value of kFullExtent means a full slice along that dimension.
+ gtl::InlinedVector<int, 4> starts_;
+ gtl::InlinedVector<int, 4> lengths_;
+};
+
+template <int NDIMS>
+void TensorSlice::FillIndicesAndSizes(
+ const TensorShape& shape, Eigen::DSizes<ptrdiff_t, NDIMS>* indices,
+ Eigen::DSizes<ptrdiff_t, NDIMS>* sizes) const {
+ CHECK_EQ(shape.dims(), dims()) << "Incompatible dimensions between shape "
+ << "slices: shape = " << shape.DebugString()
+ << ", slice = " << DebugString();
+ CHECK_GE(NDIMS, dims()) << "Asking for a " << NDIMS << "-dim slice from "
+ << "a slice of dimension " << dims();
+ for (int d = 0; d < dims(); ++d) {
+ if (IsFullAt(d)) {
+ (*indices)[d] = 0;
+ (*sizes)[d] = shape.dim_size(d);
+ } else {
+ (*indices)[d] = starts_[d];
+ (*sizes)[d] = lengths_[d];
+ }
+ }
+ for (int d = dims(); d < NDIMS; ++d) {
+ (*indices)[d] = 0;
+ (*sizes)[d] = 1;
+ }
+}
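+
+// For example (a sketch): with shape {3, 4, 5} and the slice "1,2:-:0,2",
+// FillIndicesAndSizes<3> yields indices = (1, 0, 0) and sizes = (2, 4, 2),
+// which can be passed directly to Eigen's tensor.slice(indices, sizes).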
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_TENSOR_SLICE_H_
diff --git a/tensorflow/core/framework/tensor_slice.proto b/tensorflow/core/framework/tensor_slice.proto
new file mode 100644
index 0000000000..ca676bc766
--- /dev/null
+++ b/tensorflow/core/framework/tensor_slice.proto
@@ -0,0 +1,34 @@
+// Protocol buffer representing slices of a tensor
+
+syntax = "proto3";
+// option cc_enable_arenas = true;
+
+package tensorflow;
+
+// Can only be interpreted if you know the corresponding TensorShape.
+message TensorSliceProto {
+ // Extent of the slice in one dimension.
+ message Extent {
+ // Either both or no attributes must be set. When no attribute is set,
+ // it means: all data in that dimension.
+
+ // Start index of the slice, starting at 0.
+ int64 start = 1;
+
+ // Length of the slice: if the length is missing or -1 we will
+ // interpret this as "everything in this dimension". We use
+ // "oneof" to preserve information about whether the length is
+ // present without changing the serialization format from the
+ // prior proto2 version of this proto.
+ oneof has_length {
+ int64 length = 2;
+ }
+ };
+
+ // Extent of the slice in all tensor dimensions.
+ //
+ // Must have one entry for each of the dimension of the tensor that this
+ // slice belongs to. The order of sizes is the same as the order of
+ // dimensions in the TensorShape.
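+ //
+ // For example (illustrative, in text format), the slice "-:0,10" of a
+ // rank-2 tensor would be represented as:
+ //   extent { }
+ //   extent { start: 0 length: 10 }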
+ repeated Extent extent = 1;
+};
diff --git a/tensorflow/core/framework/tensor_slice_test.cc b/tensorflow/core/framework/tensor_slice_test.cc
new file mode 100644
index 0000000000..5f718a56b6
--- /dev/null
+++ b/tensorflow/core/framework/tensor_slice_test.cc
@@ -0,0 +1,246 @@
+#include "tensorflow/core/framework/tensor_slice.h"
+
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+// Basic tests
+TEST(TensorSliceTest, Basic) {
+ {
+ // Repeatedly setting FullSlice should work.
+ TensorSlice s(3);
+ EXPECT_EQ("-:-:-", s.DebugString());
+
+ s.SetFullSlice(4);
+ EXPECT_EQ("-:-:-:-", s.DebugString());
+ }
+}
+
+// Testing for serialization and parsing for the string format of slices.
+TEST(TensorSliceTest, Serialization) {
+ // Serialization
+ {
+ TensorSlice s({{0, -1}, {0, 10}, {14, 1}, {0, -1}});
+ EXPECT_EQ("-:0,10:14,1:-", s.DebugString());
+ }
+
+ {
+ TensorSliceProto proto;
+ // Define ptxt outside ASSERT_TRUE call to work around bug in some
+ // versions of gcc that breaks when you use raw string literals
+ // inside macro expansions.
+ // See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55971
+ const char* ptxt = R"PROTO(
+ extent { }
+ extent { start: 0 length: 10 }
+ extent { start: 14 length: 1 }
+ extent { }
+ )PROTO";
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(ptxt, &proto));
+ TensorSlice s(proto);
+ EXPECT_EQ("-:0,10:14,1:-", s.DebugString());
+ }
+
+ // Parsing
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("-:-:1,3:4,5");
+ TensorSliceProto proto;
+ s.AsProto(&proto);
+ EXPECT_EQ(
+ "extent { } "
+ "extent { } "
+ "extent { start: 1 length: 3 } "
+ "extent { start: 4 length: 5 }",
+ proto.ShortDebugString());
+ }
+
+ // Failed parsing
+ {
+ TensorSlice slice;
+ Status s = TensorSlice::Parse("-:-:1,3:4:5", &slice);
+ EXPECT_EQ(
+ "Invalid argument: "
+ "Expected a pair of numbers or '-' but got '4': "
+ "string = -:-:1,3:4:5",
+ s.ToString());
+ }
+ {
+ TensorSlice slice;
+ Status s = TensorSlice::Parse("-:-1,3", &slice);
+ EXPECT_EQ(
+ "Invalid argument: "
+ "Expected non-negative start and positive length but got "
+ "start = -1, length = 3: string = -:-1,3",
+ s.ToString());
+ }
+}
+
+// Testing the slice intersection
+TEST(TensorSliceTest, Intersection) {
+ // "EVERYTHING" intersects with everything
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("-:-");
+ TensorSlice b = TensorSlice::ParseOrDie("1,2:3,4");
+ TensorSlice c;
+ EXPECT_TRUE(a.Intersect(b, &c));
+ EXPECT_EQ("1,2:3,4", c.DebugString());
+ }
+
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("-:-");
+ TensorSlice b = TensorSlice::ParseOrDie("1,2:3,4");
+ TensorSlice c;
+ EXPECT_TRUE(b.Intersect(a, &c));
+ EXPECT_EQ("1,2:3,4", c.DebugString());
+ }
+
+ // Overlap at all dimensions
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("1,5:2,6:3,7:5,10");
+ TensorSlice b = TensorSlice::ParseOrDie("1,2:3,4:9,10:12,1");
+ TensorSlice c;
+ EXPECT_TRUE(a.Intersect(b, &c));
+ EXPECT_EQ("1,2:3,4:9,1:12,1", c.DebugString());
+ }
+
+ // A mixture of everything and non-trivial slices
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("-:1,1");
+ TensorSlice b = TensorSlice::ParseOrDie("-:0,2");
+ TensorSlice c;
+ EXPECT_TRUE(a.Intersect(b, &c));
+ EXPECT_EQ("-:1,1", c.DebugString());
+ }
+
+ // No overlap on dimension 3: "3,1" and "4,5" don't intersect
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("1,2:3,1:5,6");
+ TensorSlice b = TensorSlice::ParseOrDie("1,3:4,5:1,6");
+ TensorSlice c;
+ EXPECT_FALSE(a.Intersect(b, &c));
+ EXPECT_EQ("", c.DebugString());
+ }
+ // No intersection when there are different dimensions
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("1,2:3,1:-");
+ TensorSlice b = TensorSlice::ParseOrDie("-:-");
+ TensorSlice c;
+ EXPECT_FALSE(a.Intersect(b, &c));
+ EXPECT_EQ("", c.DebugString());
+ }
+}
+
+// Testing applying a slice to a tensor shape
+TEST(TensorSliceTest, SliceTensorShape) {
+ // A proper application
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("1,1:-:4,1:2,6");
+ TensorShape x({2, 4, 5, 8});
+ TensorShape y;
+ EXPECT_OK(a.SliceTensorShape(x, &y));
+ EXPECT_EQ(
+ "dim { size: 1 } "
+ "dim { size: 4 } "
+ "dim { size: 1 } "
+ "dim { size: 6 }",
+ y.DebugString());
+ }
+
+ // An invalid application -- dimension 2 is out of bound
+ {
+ TensorSlice a = TensorSlice::ParseOrDie("1,1:1,4:-:-");
+ TensorShape x({2, 4, 5, 8});
+ TensorShape y;
+ EXPECT_EQ(
+ "Internal: "
+ "Extent in dimension 1 out of bounds: "
+ "shape = dim { size: 2 } "
+ "dim { size: 4 } "
+ "dim { size: 5 } "
+ "dim { size: 8 }, "
+ "slice = 1,1:1,4:-:-",
+ a.SliceTensorShape(x, &y).ToString());
+ EXPECT_EQ("", y.DebugString());
+ }
+}
+
+// Testing the computation of relative slices.
+TEST(TensorSliceTest, ComputeRelative) {
+ // Easy case: base is "everything"
+ {
+ TensorSlice base = TensorSlice::ParseOrDie("-:-:-:-");
+ TensorSlice sub = TensorSlice::ParseOrDie("-:1,2:-:3,4");
+ TensorSlice relative;
+ base.ComputeRelative(sub, &relative);
+ EXPECT_EQ("-:1,2:-:3,4", relative.DebugString());
+ }
+
+ // A slightly more complicated case
+ {
+ TensorSlice base = TensorSlice::ParseOrDie("1,2:3,4:-:5,1");
+ TensorSlice sub = TensorSlice::ParseOrDie("1,1:4,2:3,3:5,1");
+ TensorSlice relative;
+ base.ComputeRelative(sub, &relative);
+ EXPECT_EQ("0,1:1,2:3,3:0,1", relative.DebugString());
+ }
+}
+
+TEST(TensorSliceTest, ExtentLength) {
+ TensorSliceProto proto;
+ // Define ptxt outside ASSERT_TRUE call to work around bug in some
+ // versions of gcc that breaks when you use raw string literals
+ // inside macro expansions.
+ // See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55971
+ const char* ptxt = R"PROTO(
+ extent { }
+ extent { start: 0 length: 10 }
+ extent { start: 14 length: 1 }
+ extent { }
+ )PROTO";
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(ptxt, &proto));
+ EXPECT_FALSE(TensorSlice::HasExtentLength(proto.extent(0)));
+ EXPECT_TRUE(TensorSlice::HasExtentLength(proto.extent(1)));
+ EXPECT_TRUE(TensorSlice::HasExtentLength(proto.extent(2)));
+ EXPECT_FALSE(TensorSlice::HasExtentLength(proto.extent(3)));
+ EXPECT_EQ(-1, TensorSlice::GetExtentLength(proto.extent(0)));
+ EXPECT_EQ(10, TensorSlice::GetExtentLength(proto.extent(1)));
+ EXPECT_EQ(1, TensorSlice::GetExtentLength(proto.extent(2)));
+ EXPECT_EQ(-1, TensorSlice::GetExtentLength(proto.extent(3)));
+}
+
+TEST(TensorSliceTest, Deserialization) {
+ // Serialization of
+ // extent { length: 5 }
+ // extent { start: 0 length: 10 }
+ // extent { start: 14 length: 1 }
+ // extent { start: 1 }
+ // extent { }
+ // in proto2 and proto3:
+ const char pb2[] =
+ "\x0A\x02\x10\x05\x0A\x04\x08\x00"
+ "\x10\x0A\x0A\x04\x08\x0E\x10\x01\x0A\x02\x08\x01\x0A\x00";
+ const char pb3[] =
+ "\x0A\x02\x10\x05\x0A\x02"
+ "\x10\x0A\x0A\x04\x08\x0E\x10\x01\x0A\x02\x08\x01\x0A\x00";
+ // (The difference is that in the proto3 version, "start: 0" isn't included
+ // since 0 is start's default value.)
+
+ TensorSliceProto proto2;
+ ASSERT_TRUE(proto2.ParseFromArray(pb2, sizeof(pb2) - 1));
+ TensorSlice ts2(proto2);
+
+ TensorSliceProto proto3;
+ ASSERT_TRUE(proto3.ParseFromArray(pb3, sizeof(pb3) - 1));
+ TensorSlice ts3(proto3);
+
+ // Both serializations should be interpreted the same.
+ EXPECT_EQ("0,5:0,10:14,1:-:-", ts2.DebugString());
+ EXPECT_EQ("0,5:0,10:14,1:-:-", ts3.DebugString());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_test.cc b/tensorflow/core/framework/tensor_test.cc
new file mode 100644
index 0000000000..4963c2c219
--- /dev/null
+++ b/tensorflow/core/framework/tensor_test.cc
@@ -0,0 +1,551 @@
+#include "tensorflow/core/public/tensor.h"
+
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(TensorTest, Default) {
+ Tensor t;
+ EXPECT_EQ(t.dtype(), DT_FLOAT);
+ EXPECT_EQ(t.dims(), 1);
+ EXPECT_EQ(t.NumElements(), 0);
+}
+
+TEST(TensorTest, DataType_Traits) {
+ EXPECT_TRUE(std::is_trivial<float>::value);
+ EXPECT_TRUE(std::is_trivial<double>::value);
+ EXPECT_TRUE(std::is_trivial<int32>::value);
+ EXPECT_TRUE(std::is_trivial<uint8>::value);
+ EXPECT_TRUE(std::is_trivial<int16>::value);
+ EXPECT_TRUE(std::is_trivial<int8>::value);
+ EXPECT_TRUE(std::is_trivial<int64>::value);
+ EXPECT_TRUE(std::is_trivial<bool>::value);
+ EXPECT_FALSE(std::is_trivial<string>::value);
+
+ EXPECT_EQ(sizeof(bool), 1);
+
+ // Unfortunately, std::complex::complex() initializes to (0, 0).
+ EXPECT_FALSE(std::is_trivial<complex64>::value);
+ EXPECT_FALSE(std::is_trivial<std::complex<double>>::value);
+ EXPECT_TRUE(std::is_trivial<float[2]>::value);
+ struct MyComplex {
+ float re, im;
+ };
+ EXPECT_TRUE(std::is_trivial<MyComplex>::value);
+}
+
+template <typename T>
+void TestCopies(const Tensor& t) {
+ {
+ LOG(INFO) << "CopyFrom()";
+ Tensor t2(t.dtype());
+ EXPECT_TRUE(t2.CopyFrom(t, t.shape()));
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+ {
+ LOG(INFO) << "operator=()";
+ Tensor t2(t.dtype());
+ t2 = t;
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+ {
+ LOG(INFO) << "deep copy";
+ Tensor t2(t.dtype(), t.shape());
+ t2.flat<T>() = t.flat<T>();
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+ {
+ LOG(INFO) << "AsProtoField()";
+ TensorProto proto;
+ t.AsProtoField(&proto);
+ Tensor t2(t.dtype());
+ EXPECT_TRUE(t2.FromProto(proto));
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+ {
+ LOG(INFO) << "AsProtoTensorContent()";
+ TensorProto proto;
+ t.AsProtoTensorContent(&proto);
+ Tensor t2(t.dtype());
+ EXPECT_TRUE(t2.FromProto(proto));
+ test::ExpectTensorEqual<T>(t, t2);
+ // Make another copy via tensor_content field.
+ *proto.mutable_tensor_content() = proto.tensor_content();
+ Tensor t3(t.dtype());
+ EXPECT_TRUE(t3.FromProto(proto));
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+ {
+ LOG(INFO) << "AsTensor";
+ gtl::ArraySlice<T> values(t.flat<T>().data(), t.NumElements());
+ Tensor t2 = test::AsTensor(values, t.shape());
+ test::ExpectTensorEqual<T>(t, t2);
+ }
+}
+
+TEST(Tensor_Float, Simple) {
+ Tensor t(DT_FLOAT, TensorShape({10, 20}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({10, 20})));
+ for (int64 a = 0; a < t.shape().dim_size(0); a++) {
+ for (int64 b = 0; b < t.shape().dim_size(1); b++) {
+ t.matrix<float>()(a, b) = static_cast<float>(a * b);
+ }
+ }
+ TestCopies<float>(t);
+}
+
+TEST(Tensor_QInt8, Simple) {
+ Tensor t(DT_QINT8, TensorShape({2, 2}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({2, 2})));
+ for (int64 a = 0; a < t.shape().dim_size(0); a++) {
+ for (int64 b = 0; b < t.shape().dim_size(1); b++) {
+ t.matrix<qint8>()(a, b) = qint8(a * b);
+ }
+ }
+ TestCopies<qint8>(t);
+}
+
+TEST(Tensor_QUInt8, Simple) {
+ Tensor t(DT_QUINT8, TensorShape({2, 2}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({2, 2})));
+ for (int64 a = 0; a < t.shape().dim_size(0); a++) {
+ for (int64 b = 0; b < t.shape().dim_size(1); b++) {
+ t.matrix<Eigen::QUInt8>()(a, b) = Eigen::QUInt8(a * b);
+ }
+ }
+ TestCopies<Eigen::QUInt8>(t);
+}
+
+TEST(Tensor_QInt32, Simple) {
+ Tensor t(DT_QINT32, TensorShape({2, 2}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({2, 2})));
+ for (int64 a = 0; a < t.shape().dim_size(0); a++) {
+ for (int64 b = 0; b < t.shape().dim_size(1); b++) {
+ t.matrix<qint32>()(a, b) = qint32(static_cast<int32>(a * b));
+ }
+ }
+ TestCopies<qint32>(t);
+}
+
+TEST(Tensor_Float, Reshape) {
+ Tensor t(DT_FLOAT, TensorShape({2, 3, 4, 5}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({2, 3, 4, 5})));
+
+ {
+ auto tensor = t.tensor<float, 4>();
+ EXPECT_EQ(2, tensor.dimension(0));
+ EXPECT_EQ(3, tensor.dimension(1));
+ EXPECT_EQ(4, tensor.dimension(2));
+ EXPECT_EQ(5, tensor.dimension(3));
+
+ // Set first and last elements.
+ tensor(0, 0, 0, 0) = 0.01f;
+ tensor(1, 2, 3, 4) = 0.02f;
+ }
+ {
+ auto shaped = t.shaped<float, 1>({120});
+ EXPECT_EQ(120, shaped.dimension(0));
+ EXPECT_EQ(shaped(0), 0.01f);
+ EXPECT_EQ(shaped(119), 0.02f);
+ }
+ {
+ auto shaped = t.shaped<float, 2>({6, 20});
+ EXPECT_EQ(6, shaped.dimension(0));
+ EXPECT_EQ(20, shaped.dimension(1));
+ EXPECT_EQ(shaped(0, 0), 0.01f);
+ EXPECT_EQ(shaped(5, 19), 0.02f);
+ }
+ {
+ auto shaped = t.shaped<float, 3>({6, 4, 5});
+ EXPECT_EQ(6, shaped.dimension(0));
+ EXPECT_EQ(4, shaped.dimension(1));
+ EXPECT_EQ(5, shaped.dimension(2));
+ EXPECT_EQ(shaped(0, 0, 0), 0.01f);
+ EXPECT_EQ(shaped(5, 3, 4), 0.02f);
+ }
+ {
+ auto shaped = t.shaped<float, 4>({2, 3, 4, 5});
+ EXPECT_EQ(2, shaped.dimension(0));
+ EXPECT_EQ(3, shaped.dimension(1));
+ EXPECT_EQ(4, shaped.dimension(2));
+ EXPECT_EQ(5, shaped.dimension(3));
+
+ EXPECT_EQ(shaped(0, 0, 0, 0), 0.01f);
+ EXPECT_EQ(shaped(1, 2, 3, 4), 0.02f);
+ }
+ {
+ auto flat = t.flat<float>();
+ EXPECT_EQ(flat(0), 0.01f);
+ EXPECT_EQ(120, flat.dimension(0));
+ EXPECT_EQ(flat(0), 0.01f);
+ EXPECT_EQ(flat(119), 0.02f);
+ }
+ {
+ auto flat_inner_dims = t.flat_inner_dims<float>();
+ EXPECT_EQ(24, flat_inner_dims.dimension(0));
+ EXPECT_EQ(5, flat_inner_dims.dimension(1));
+ EXPECT_EQ(flat_inner_dims(0, 0), 0.01f);
+ EXPECT_EQ(flat_inner_dims(23, 4), 0.02f);
+ }
+}
+
+TEST(Tensor_Scalar, Basics) {
+ {
+ Tensor t(DT_FLOAT, TensorShape({}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.scalar<float>();
+ EXPECT_EQ(1, Tt.size());
+ EXPECT_EQ(0, Tt.rank());
+ t.scalar<float>()() = 123.45f;
+ EXPECT_FLOAT_EQ(123.45f, Tt());
+ }
+ {
+ Tensor t(DT_FLOAT, TensorShape({1}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.vec<float>();
+ EXPECT_EQ(1, Tt.size());
+ t.vec<float>()(0) = 123.45f;
+ EXPECT_FLOAT_EQ(123.45f, Tt(0));
+ }
+ {
+ Tensor t(DT_FLOAT, TensorShape({1, 1, 1}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.scalar<float>();
+ EXPECT_EQ(1, Tt.size());
+ EXPECT_EQ(0, Tt.rank());
+ t.flat<float>()(0) = 123.45f;
+ EXPECT_FLOAT_EQ(123.45f, Tt());
+ }
+ {
+ Tensor t(DT_STRING, TensorShape({}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.scalar<string>();
+ EXPECT_EQ(1, Tt.size());
+ EXPECT_EQ(0, Tt.rank());
+ t.scalar<string>()() = "foo";
+ EXPECT_EQ("foo", Tt());
+ }
+ {
+ Tensor t(DT_STRING, TensorShape({1}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.vec<string>();
+ EXPECT_EQ(1, Tt.size());
+ t.flat<string>()(0) = "foo";
+ EXPECT_EQ("foo", Tt(0));
+ }
+ {
+ Tensor t(DT_STRING, TensorShape({1, 1, 1}));
+ EXPECT_EQ(1, t.NumElements());
+ auto Tt = t.scalar<string>();
+ EXPECT_EQ(1, Tt.size());
+ EXPECT_EQ(0, Tt.rank());
+ t.flat<string>()(0) = "bar";
+ EXPECT_EQ("bar", Tt());
+ }
+ {
+ Tensor t(DT_FLOAT, TensorShape({0, 1}));
+ EXPECT_EQ(0, t.NumElements());
+ auto Tt = t.flat<float>();
+ EXPECT_EQ(0, Tt.size());
+ auto Tm = t.matrix<float>();
+ EXPECT_EQ(0, Tm.size());
+ EXPECT_EQ(0, Tm.dimensions()[0]);
+ EXPECT_EQ(1, Tm.dimensions()[1]);
+ }
+}
+
+TEST(Tensor_Float, Reshape_And_Slice_Assignment) {
+ // A test to experiment with a way to assign to a subset of a tensor
+ Tensor t(DT_FLOAT, TensorShape({10, 4, 3, 2}));
+ EXPECT_TRUE(t.shape().IsSameSize(TensorShape({10, 4, 3, 2})));
+
+ // Get the N dimensional tensor (N==4 here)
+ auto e_t = t.tensor<float, 4>();
+ // Reshape to view it as a two-dimensional tensor
+ auto e_2d = t.shaped<float, 2>({10, 4 * 3 * 2});
+ for (int i = 0; i < 10; i++) {
+ // Assign a 1 x 4*3*2 matrix (really vector) to a slice of size
+ // 1 x 4*3*2 in e_t.
+ Eigen::Tensor<float, 2, Eigen::RowMajor> m(1, 4 * 3 * 2);
+ m.setConstant(i * 2.0);
+
+ Eigen::DSizes<Eigen::DenseIndex, 2> indices(i, 0);
+ Eigen::DSizes<Eigen::DenseIndex, 2> sizes(1, 4 * 3 * 2);
+ e_2d.slice(indices, sizes) = m;
+ }
+ for (int i = 0; i < 10; i++) {
+ for (int j = 0; j < 4; j++) {
+ for (int k = 0; k < 3; k++) {
+ for (int l = 0; l < 2; l++) {
+ EXPECT_EQ(e_t(i, j, k, l), i * 2.0f);
+ LOG(INFO) << i << "," << j << "," << k << "," << l
+ << " &e_t(i, j, k, l): " << &e_t(i, j, k, l) << " = "
+ << e_t(i, j, k, l);
+ }
+ }
+ }
+ }
+}
+
+TEST(Tensor_String, Simple) {
+ Tensor t = test::AsTensor<string>(
+ {"hello", "world", "machine", "learning", "new", "york"},
+ TensorShape({3, 2}));
+ auto s = t.shape();
+ ASSERT_EQ(s.dims(), 2);
+ ASSERT_EQ(s.dim_size(0), 3);
+ ASSERT_EQ(s.dim_size(1), 2);
+ auto m = t.matrix<string>();
+ EXPECT_EQ(t.TotalBytes(), 3 * 2 * sizeof(string) + 5 + 5 + 7 + 8 + 3 + 4);
+
+ EXPECT_EQ(m(0, 0), "hello");
+ EXPECT_EQ(m(0, 1), "world");
+ EXPECT_EQ(m(1, 0), "machine");
+ EXPECT_EQ(m(1, 1), "learning");
+ EXPECT_EQ(m(2, 0), "new");
+ EXPECT_EQ(m(2, 1), "york");
+
+ TestCopies<string>(t);
+}
+
+TEST(Tensor_Float, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<float>({0, 1, 2, 3, 4, 5}, {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<float>() = t1.flat<float>() * 2.0f;
+ Tensor t3 = test::AsTensor<float>({0, 2, 4, 6, 8, 10}, t1.shape());
+ test::ExpectTensorEqual<float>(t2, t3);
+}
+
+TEST(Tensor_Int32, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<int32>({0, 1, 2, 3, 4, 5}, {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<int32>() = t1.flat<int32>() * 2;
+ Tensor t3 = test::AsTensor<int32>({0, 2, 4, 6, 8, 10}, t1.shape());
+ test::ExpectTensorEqual<int32>(t2, t3);
+}
+
+TEST(Tensor_QInt8, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<qint8>({0, 1, 2, 3, 4, 5}, {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<qint8>() = t1.flat<qint8>() + qint8(-2);
+ Tensor t3 = test::AsTensor<qint8>({-2, -1, 0, 1, 2, 3}, {2, 3});
+ test::ExpectTensorEqual<qint8>(t2, t3);
+}
+
+TEST(Tensor_QUInt8, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<quint8>({0, 1, 2, 3, 4, 5}, {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<quint8>() = t1.flat<quint8>() + quint8(2);
+ Tensor t3 = test::AsTensor<quint8>({2, 3, 4, 5, 6, 7}, {2, 3});
+ test::ExpectTensorEqual<quint8>(t2, t3);
+}
+
+TEST(Tensor_Int64, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<int64>(
+ {0LL << 48, 1LL << 48, 2LL << 48, 3LL << 48, 4LL << 48, 5LL << 48},
+ {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<int64>() = t1.flat<int64>() * static_cast<int64>(2);
+ Tensor t3 = test::AsTensor<int64>(
+ {0LL << 48, 2LL << 48, 4LL << 48, 6LL << 48, 8LL << 48, 10LL << 48},
+ {2, 3});
+ test::ExpectTensorEqual<int64>(t2, t3);
+}
+
+TEST(Tensor_String, SimpleWithHelper) {
+ Tensor t1 = test::AsTensor<string>({"0", "1", "2", "3", "4", "5"}, {2, 3});
+ Tensor t2(DT_STRING, {2, 3});
+ for (int i = 0; i < 2; ++i) {
+ for (int j = 0; j < 3; ++j) {
+ t2.matrix<string>()(i, j) = strings::StrCat(i * 3 + j);
+ }
+ }
+
+ // Test with helper.
+ test::ExpectTensorEqual<string>(t1, t2);
+}
+
+TEST(Tensor_Bool, SimpleWithHelper) {
+ Tensor t1 =
+ test::AsTensor<bool>({false, true, false, true, false, true}, {2, 3});
+
+ Tensor t2(DT_BOOL, {2, 3});
+ for (int i = 0; i < 2; ++i) {
+ for (int j = 0; j < 3; ++j) {
+ t2.matrix<bool>()(i, j) = (((i + j) % 2) != 0);
+ }
+ }
+
+ // Test with helper.
+ test::ExpectTensorEqual<bool>(t1, t2);
+}
+
+TEST(Tensor_Complex, Simple) {
+ Tensor t(DT_COMPLEX64, {4, 5, 3, 7});
+ t.flat<complex64>().setRandom();
+ TestCopies<complex64>(t);
+}
+
+TEST(Tensor_Complex, SimpleWithHelper) {
+ {
+ Tensor t1 = test::AsTensor<complex64>({0,
+ {1, 1},
+ complex64(2),
+ complex64(3, 3),
+ complex64(0, 4),
+ complex64(2, 5)},
+ {2, 3});
+ Tensor t2(t1.dtype(), t1.shape());
+ t2.flat<complex64>() = t1.flat<complex64>() * complex64(0, 2);
+ Tensor t3 = test::AsTensor<complex64>(
+ {0, {-2, 2}, {0, 4}, {-6, 6}, {-8, 0}, {-10, 4}},
+ // shape
+ {2, 3});
+ test::ExpectTensorEqual<complex64>(t2, t3);
+ }
+
+ // Does some numeric operations for complex numbers.
+ {
+ const float PI = std::acos(-1);
+ const complex64 rotate_45 = std::polar(1.0f, PI / 4);
+
+ // x contains all the 8-th root of unity.
+ Tensor x(DT_COMPLEX64, TensorShape({8}));
+ for (int i = 0; i < 8; ++i) {
+ x.vec<complex64>()(i) = std::pow(rotate_45, i);
+ }
+
+ // Shift the roots by 45 degree.
+ Tensor y(DT_COMPLEX64, TensorShape({8}));
+ y.vec<complex64>() = x.vec<complex64>() * rotate_45;
+ Tensor y_expected(DT_COMPLEX64, TensorShape({8}));
+ for (int i = 0; i < 8; ++i) {
+ y_expected.vec<complex64>()(i) = std::pow(rotate_45, i + 1);
+ }
+ test::ExpectTensorNear<complex64>(y, y_expected, 1e-5);
+
+ // Raise roots to the power of 8.
+ Tensor z(DT_COMPLEX64, TensorShape({8}));
+ z.vec<complex64>() = x.vec<complex64>().pow(8);
+ Tensor z_expected(DT_COMPLEX64, TensorShape({8}));
+ for (int i = 0; i < 8; ++i) {
+ z_expected.vec<complex64>()(i) = 1;
+ }
+ test::ExpectTensorNear<complex64>(z, z_expected, 1e-5);
+ }
+}
+
+// On the alignment.
+//
+// As of 2015/8, tensorflow::Tensor allocates its buffer with 32-byte
+// alignment. The Tensor::tensor/flat/vec/matrix methods require that the
+// buffer satisfies Eigen::Aligned (e.g., usually 16-byte aligned, and
+// 32-byte aligned for AVX). Tensor::Slice requires the caller to ensure
+// its result is aligned if the caller intends to use those methods.
+// In this test case, we simply make sure each slice is 32-byte
+// aligned: sizeof(float) * 4 * 2 = 32.
+TEST(Tensor, Slice_Basic) {
+ Tensor saved;
+ { // General
+ Tensor x(DT_FLOAT, TensorShape({10, 4, 34}));
+ // Fills in known values.
+ for (int i = 0; i < 10; ++i) {
+ x.Slice(i, i + 1).flat<float>().setConstant(i * 1.f);
+ }
+ // A simple slice along dim0.
+ Tensor y = x.Slice(4, 8);
+ EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 4, 34})));
+ auto tx = x.tensor<float, 3>();
+ auto ty = y.tensor<float, 3>();
+ for (int i = 0; i < 4; ++i) {
+ for (int j = 0; j < 4; ++j) {
+ for (int k = 0; k < 34; ++k) {
+ EXPECT_EQ(ty(i, j, k), 4.0 + i);
+ EXPECT_EQ(&tx(4 + i, j, k), &ty(i, j, k));
+ }
+ }
+ }
+ // A simple slice equivalent to identity.
+ TestCopies<float>(y);
+ y = x.Slice(0, 10);
+ test::ExpectTensorEqual<float>(x, y);
+ EXPECT_EQ(x.flat<float>().data(), y.flat<float>().data());
+
+ // A slice of a slice.
+ auto z = x.Slice(4, 8).Slice(2, 3);
+ auto tz = z.tensor<float, 3>();
+ EXPECT_EQ(1, z.dim_size(0));
+ for (int j = 0; j < 4; ++j) {
+ for (int k = 0; k < 34; ++k) {
+ EXPECT_EQ(tz(0, j, k), 6.0);
+ }
+ }
+
+ // x and y will be out of scope. But 'saved' should be alive.
+ saved = z;
+ }
+ {
+ EXPECT_EQ(1, saved.dim_size(0));
+ auto tsaved = saved.tensor<float, 3>();
+ for (int j = 0; j < 4; ++j) {
+ for (int k = 0; k < 34; ++k) {
+ EXPECT_EQ(tsaved(0, j, k), 6.0);
+ }
+ }
+ }
+ { // Empty
+ Tensor x(DT_FLOAT, TensorShape({10, 0, 34}));
+ x.flat<float>().setRandom();
+ Tensor y = x.Slice(4, 8);
+ EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 0, 34})));
+ }
+
+ {
+ // Test unaligned access via a Slice.
+ Tensor x(DT_FLOAT, TensorShape({30}));
+ x.flat<float>().setConstant(0.0);
+
+ // Take an unaligned slice.
+ Tensor y = x.Slice(1, 13);
+ y.unaligned_flat<float>().setConstant(1.0);
+ for (int64 i = 0; i < y.NumElements(); ++i) {
+ EXPECT_EQ(1.0, y.unaligned_flat<float>()(i));
+ }
+ }
+}
+
+static void BM_CreateAndDestroy(int iters) {
+ TensorShape shape({10, 20});
+ while (--iters) {
+ Tensor t(DT_FLOAT, shape);
+ }
+}
+BENCHMARK(BM_CreateAndDestroy);
+
+static void BM_Assign(int iters) {
+ Tensor a(DT_FLOAT, TensorShape({10, 20}));
+ Tensor b(DT_FLOAT, TensorShape({10, 20}));
+ bool a_to_b = true;
+ while (--iters) {
+ if (a_to_b) {
+ b = a;
+ } else {
+ a = b;
+ }
+ a_to_b = !a_to_b;
+ }
+}
+BENCHMARK(BM_Assign);
+
+// Ensure tensor_data() works on empty tensors
+TEST(Tensor, EmptyTensorData) {
+ Tensor empty;
+ EXPECT_EQ(empty.tensor_data().size(), 0);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_testutil.cc b/tensorflow/core/framework/tensor_testutil.cc
new file mode 100644
index 0000000000..b6cd12a864
--- /dev/null
+++ b/tensorflow/core/framework/tensor_testutil.cc
@@ -0,0 +1,43 @@
+#include <cmath>
+#include "tensorflow/core/framework/tensor_testutil.h"
+
+namespace tensorflow {
+namespace test {
+
+template <typename T>
+bool IsClose(const T& x, const T& y, double atol, double rtol) {
+ return fabs(x - y) < atol + rtol * fabs(x);
+}
+
+template <typename T>
+void ExpectClose(const Tensor& x, const Tensor& y, double atol, double rtol) {
+ auto Tx = x.flat<T>();
+ auto Ty = y.flat<T>();
+ for (int i = 0; i < Tx.size(); ++i) {
+ if (!IsClose(Tx(i), Ty(i), atol, rtol)) {
+ LOG(ERROR) << "x = " << x.DebugString();
+ LOG(ERROR) << "y = " << y.DebugString();
+ LOG(ERROR) << "atol = " << atol << " rtol = " << rtol
+ << " tol = " << atol + rtol * std::fabs(Tx(i));
+ EXPECT_TRUE(false) << i << "-th element is not close " << Tx(i) << " vs. "
+ << Ty(i);
+ }
+ }
+}
+
+void ExpectClose(const Tensor& x, const Tensor& y, double atol, double rtol) {
+ internal::AssertSameTypeDims(x, y);
+ switch (x.dtype()) {
+ case DT_FLOAT:
+ ExpectClose<float>(x, y, atol, rtol);
+ break;
+ case DT_DOUBLE:
+ ExpectClose<double>(x, y, atol, rtol);
+ break;
+ default:
+ LOG(FATAL) << "Unexpected type : " << DataTypeString(x.dtype());
+ }
+}
+
+} // end namespace test
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_testutil.h b/tensorflow/core/framework/tensor_testutil.h
new file mode 100644
index 0000000000..53d6da0fb2
--- /dev/null
+++ b/tensorflow/core/framework/tensor_testutil.h
@@ -0,0 +1,189 @@
+#ifndef TENSORFLOW_FRAMEWORK_TENSOR_TESTUTIL_H_
+#define TENSORFLOW_FRAMEWORK_TENSOR_TESTUTIL_H_
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace test {
+
+// Constructs a scalar tensor with 'val'.
+template <typename T>
+Tensor AsScalar(const T& val) {
+ Tensor ret(DataTypeToEnum<T>::value, {});
+ ret.scalar<T>()() = val;
+ return ret;
+}
+
+// Constructs a flat tensor with 'vals'.
+template <typename T>
+Tensor AsTensor(gtl::ArraySlice<T> vals) {
+ Tensor ret(DataTypeToEnum<T>::value, {static_cast<int64>(vals.size())});
+ std::copy_n(vals.data(), vals.size(), ret.flat<T>().data());
+ return ret;
+}
+
+// Constructs a tensor of "shape" with values "vals".
+template <typename T>
+Tensor AsTensor(gtl::ArraySlice<T> vals, const TensorShape& shape) {
+ Tensor ret;
+ CHECK(ret.CopyFrom(AsTensor(vals), shape));
+ return ret;
+}
+
+// Fills in '*tensor' with 'vals'. E.g.,
+// Tensor x(&alloc, DT_FLOAT, TensorShape({2, 2}));
+// test::FillValues<float>(&x, {11, 21, 21, 22});
+template <typename T>
+void FillValues(Tensor* tensor, gtl::ArraySlice<T> vals) {
+ auto flat = tensor->flat<T>();
+ CHECK_EQ(flat.size(), vals.size());
+ if (flat.size() > 0) {
+ std::copy_n(vals.data(), vals.size(), flat.data());
+ }
+}
+
+// Fills in '*tensor' with the sequence of values val, val+1, val+2, ...
+// Tensor x(&alloc, DT_FLOAT, TensorShape({2, 2}));
+// test::FillIota<float>(&x, 1.0);
+template <typename T>
+void FillIota(Tensor* tensor, const T& val) {
+ auto flat = tensor->flat<T>();
+ std::iota(flat.data(), flat.data() + flat.size(), val);
+}
+
+// Fills in '*tensor' with the sequence of values fn(0), fn(1), ...
+// Tensor x(&alloc, DT_FLOAT, TensorShape({2, 2}));
+// test::FillFn<float>(&x, [](int i)->float { return i*i; });
+template <typename T>
+void FillFn(Tensor* tensor, std::function<T(int)> fn) {
+ auto flat = tensor->flat<T>();
+ for (int i = 0; i < flat.size(); ++i) flat(i) = fn(i);
+}
+
+// Expects "x" and "y" are tensors of the same type, same shape, and
+// identical values.
+template <typename T>
+void ExpectTensorEqual(const Tensor& x, const Tensor& y);
+
+// Expects "x" and "y" are tensors of the same type, same shape, and
+// approximately equal values, each within "abs_err".
+template <typename T>
+void ExpectTensorNear(const Tensor& x, const Tensor& y, const T& abs_err);
+
+// Expects "x" and "y" are tensors of the same type (float or double),
+// same shape and element-wise difference between x and y is no more
+// than atol + rtol * abs(x).
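+//
+// For example (a hypothetical sketch using the helpers above):
+//   Tensor a = test::AsTensor<float>({1.0f, 2.0f});
+//   Tensor b = test::AsTensor<float>({1.0f + 1e-7f, 2.0f});
+//   test::ExpectClose(a, b);  // ok: each |a - b| is within atol + rtol * |a|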
+void ExpectClose(const Tensor& x, const Tensor& y, double atol = 1e-6,
+ double rtol = 1e-6);
+
+// Implementation details.
+
+namespace internal {
+
+template <typename T>
+struct is_floating_point_type {
+ static const bool value = std::is_same<T, float>::value ||
+ std::is_same<T, double>::value ||
+ std::is_same<T, std::complex<float> >::value ||
+ std::is_same<T, std::complex<double> >::value;
+};
+
+template <typename T>
+static void ExpectEqual(const T& a, const T& b) {
+ EXPECT_EQ(a, b);
+}
+
+template <>
+void ExpectEqual<float>(const float& a, const float& b) {
+ EXPECT_FLOAT_EQ(a, b);
+}
+
+template <>
+void ExpectEqual<double>(const double& a, const double& b) {
+ EXPECT_DOUBLE_EQ(a, b);
+}
+
+template <>
+void ExpectEqual<complex64>(const complex64& a, const complex64& b) {
+ EXPECT_FLOAT_EQ(a.real(), b.real()) << a << " vs. " << b;
+ EXPECT_FLOAT_EQ(a.imag(), b.imag()) << a << " vs. " << b;
+}
+
+inline void AssertSameTypeDims(const Tensor& x, const Tensor& y) {
+ ASSERT_EQ(x.dtype(), y.dtype());
+ ASSERT_TRUE(x.IsSameSize(y))
+ << "x.shape [" << x.shape().DebugString() << "] vs "
+ << "y.shape [ " << y.shape().DebugString() << "]";
+}
+
+template <typename T, bool is_fp = is_floating_point_type<T>::value>
+struct Expector;
+
+template <typename T>
+struct Expector<T, false> {
+ static void Equal(const T& a, const T& b) { ExpectEqual(a, b); }
+
+ static void Equal(const Tensor& x, const Tensor& y) {
+ ASSERT_EQ(x.dtype(), DataTypeToEnum<T>::v());
+ AssertSameTypeDims(x, y);
+ auto a = x.flat<T>();
+ auto b = y.flat<T>();
+ for (int i = 0; i < a.size(); ++i) {
+ ExpectEqual(a(i), b(i));
+ }
+ }
+};
+
+// Partial specialization for floating point types (float, double, complex).
+template <typename T>
+struct Expector<T, true> {
+ static void Equal(const T& a, const T& b) { ExpectEqual(a, b); }
+
+ static void Equal(const Tensor& x, const Tensor& y) {
+ ASSERT_EQ(x.dtype(), DataTypeToEnum<T>::v());
+ AssertSameTypeDims(x, y);
+ auto a = x.flat<T>();
+ auto b = y.flat<T>();
+ for (int i = 0; i < a.size(); ++i) {
+ ExpectEqual(a(i), b(i));
+ }
+ }
+
+ static void Near(const T& a, const T& b, const double abs_err) {
+ if (a != b) { // Takes care of inf.
+ EXPECT_LE(std::abs(a - b), abs_err) << "a = " << a << " b = " << b;
+ }
+ }
+
+ static void Near(const Tensor& x, const Tensor& y, const double abs_err) {
+ ASSERT_EQ(x.dtype(), DataTypeToEnum<T>::v());
+ AssertSameTypeDims(x, y);
+ auto a = x.flat<T>();
+ auto b = y.flat<T>();
+ for (int i = 0; i < a.size(); ++i) {
+ Near(a(i), b(i), abs_err);
+ }
+ }
+};
+
+} // namespace internal
+
+template <typename T>
+void ExpectTensorEqual(const Tensor& x, const Tensor& y) {
+ internal::Expector<T>::Equal(x, y);
+}
+
+template <typename T>
+void ExpectTensorNear(const Tensor& x, const Tensor& y, const double abs_err) {
+ static_assert(internal::is_floating_point_type<T>::value,
+                "T is not a floating point type.");
+ internal::Expector<T>::Near(x, y, abs_err);
+}
+
+} // namespace test
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_TENSOR_TESTUTIL_H_
diff --git a/tensorflow/core/framework/tensor_types.h b/tensorflow/core/framework/tensor_types.h
new file mode 100644
index 0000000000..077d86d442
--- /dev/null
+++ b/tensorflow/core/framework/tensor_types.h
@@ -0,0 +1,92 @@
+#ifndef TENSORFLOW_FRAMEWORK_TENSOR_TYPES_H_
+#define TENSORFLOW_FRAMEWORK_TENSOR_TYPES_H_
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+// Helper to define Tensor types given that the scalar is of type T.
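+// For example, TTypes<float, 2>::Tensor is an aligned Eigen::TensorMap over a
+// row-major rank-2 float tensor, and TTypes<float>::ConstFlat is the
+// corresponding read-only rank-1 (vector) view.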
+template <typename T, int NDIMS = 1>
+struct TTypes {
+ // Rank-<NDIMS> tensor of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, NDIMS, Eigen::RowMajor>,
+ Eigen::Aligned> Tensor;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, NDIMS, Eigen::RowMajor>,
+ Eigen::Aligned> ConstTensor;
+
+ // Unaligned Rank-<NDIMS> tensor of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, NDIMS, Eigen::RowMajor> >
+ UnalignedTensor;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, NDIMS, Eigen::RowMajor> >
+ UnalignedConstTensor;
+
+ typedef Eigen::TensorMap<Eigen::Tensor<T, NDIMS, Eigen::RowMajor, int>,
+ Eigen::Aligned> Tensor32Bit;
+
+ // Scalar tensor (implemented as a rank-0 tensor) of scalar type T.
+ typedef Eigen::TensorMap<
+ Eigen::TensorFixedSize<T, Eigen::Sizes<>, Eigen::RowMajor>,
+ Eigen::Aligned> Scalar;
+ typedef Eigen::TensorMap<
+ Eigen::TensorFixedSize<const T, Eigen::Sizes<>, Eigen::RowMajor>,
+ Eigen::Aligned> ConstScalar;
+
+ // Unaligned Scalar tensor of scalar type T.
+ typedef Eigen::TensorMap<Eigen::TensorFixedSize<
+ T, Eigen::Sizes<>, Eigen::RowMajor> > UnalignedScalar;
+ typedef Eigen::TensorMap<Eigen::TensorFixedSize<
+ const T, Eigen::Sizes<>, Eigen::RowMajor> > UnalignedConstScalar;
+
+ // Rank-1 tensor (vector) of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor>, Eigen::Aligned>
+ Flat;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 1, Eigen::RowMajor>,
+ Eigen::Aligned> ConstFlat;
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor>, Eigen::Aligned>
+ Vec;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 1, Eigen::RowMajor>,
+ Eigen::Aligned> ConstVec;
+
+ // Unaligned Rank-1 tensor (vector) of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor> > UnalignedFlat;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 1, Eigen::RowMajor> >
+ UnalignedConstFlat;
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor> > UnalignedVec;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 1, Eigen::RowMajor> >
+ UnalignedConstVec;
+
+ // Rank-2 tensor (matrix) of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor>, Eigen::Aligned>
+ Matrix;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 2, Eigen::RowMajor>,
+ Eigen::Aligned> ConstMatrix;
+
+ // Unaligned Rank-2 tensor (matrix) of scalar type T.
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor> >
+ UnalignedMatrix;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 2, Eigen::RowMajor> >
+ UnalignedConstMatrix;
+};
+
+typedef typename TTypes<float, 1>::Tensor32Bit::Index Index32;
+
+template <typename DSizes>
+Eigen::DSizes<Index32, DSizes::count> To32BitDims(const DSizes& in) {
+ Eigen::DSizes<Index32, DSizes::count> out;
+ for (int i = 0; i < DSizes::count; ++i) {
+ out[i] = in[i];
+ }
+ return out;
+}
+
+template <typename TensorType>
+typename TTypes<typename TensorType::Scalar,
+ TensorType::NumIndices>::Tensor32Bit
+To32Bit(TensorType in) {
+ typedef typename TTypes<typename TensorType::Scalar,
+ TensorType::NumIndices>::Tensor32Bit RetType;
+ return RetType(in.data(), To32BitDims(in.dimensions()));
+}
+
+} // namespace tensorflow
+#endif // TENSORFLOW_FRAMEWORK_TENSOR_TYPES_H_
diff --git a/tensorflow/core/framework/tensor_util.cc b/tensorflow/core/framework/tensor_util.cc
new file mode 100644
index 0000000000..7353191c74
--- /dev/null
+++ b/tensorflow/core/framework/tensor_util.cc
@@ -0,0 +1,28 @@
+#include "tensorflow/core/framework/tensor_util.h"
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace tensor {
+
+Tensor DeepCopy(const Tensor& other) {
+ Tensor tmp = Tensor(other.dtype(), other.shape());
+ if (DataTypeCanUseMemcpy(other.dtype())) {
+ StringPiece other_data = other.tensor_data();
+
+ // We use StringPiece as a convenient map over the tensor buffer,
+ // but we cast the type to get to the underlying buffer to do the
+ // copy.
+ StringPiece tmp_data = tmp.tensor_data();
+ memcpy(const_cast<char*>(tmp_data.data()), other_data.data(),
+ other_data.size());
+ } else {
+ CHECK_EQ(DT_STRING, other.dtype());
+ tmp.flat<string>() = other.flat<string>();
+ }
+ return tmp;
+}
+
+} // namespace tensor
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tensor_util.h b/tensorflow/core/framework/tensor_util.h
new file mode 100644
index 0000000000..a8dde1d0ca
--- /dev/null
+++ b/tensorflow/core/framework/tensor_util.h
@@ -0,0 +1,21 @@
+#ifndef TENSORFLOW_FRAMEWORK_TENSOR_UTIL_H_
+#define TENSORFLOW_FRAMEWORK_TENSOR_UTIL_H_
+
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace tensor {
+
+// DeepCopy returns a tensor whose contents are a deep copy of the
+// contents of 'other'. This function is intended only for
+// convenience, not speed.
+//
+// REQUIRES: 'other' must point to data stored in CPU memory.
+// REQUIRES: 'other' must be a Tensor of a copy-able type if
+// 'other' is not appropriately memory-aligned.
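+//
+// A minimal usage sketch (hypothetical values):
+//   Tensor x(DT_FLOAT, TensorShape({2}));
+//   x.flat<float>().setConstant(1.0);
+//   Tensor y = tensor::DeepCopy(x);    // y owns a separate buffer
+//   x.flat<float>().setConstant(2.0);  // y still holds 1.0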
+Tensor DeepCopy(const Tensor& other);
+
+} // namespace tensor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_TENSOR_UTIL_H_
diff --git a/tensorflow/core/framework/tensor_util_test.cc b/tensorflow/core/framework/tensor_util_test.cc
new file mode 100644
index 0000000000..fef7468151
--- /dev/null
+++ b/tensorflow/core/framework/tensor_util_test.cc
@@ -0,0 +1,124 @@
+#include "tensorflow/core/framework/tensor_util.h"
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(TensorUtil, DeepCopy0d) {
+ Tensor x(DT_FLOAT, TensorShape({}));
+ x.scalar<float>()() = 10.0;
+
+ // Make y a deep copy of x and then change it.
+ Tensor y = tensor::DeepCopy(x);
+ y.scalar<float>()() = 20.0;
+
+ // x doesn't change
+ EXPECT_EQ(10.0, x.scalar<float>()());
+
+ // Change x.
+ x.scalar<float>()() = 30.0;
+
+ // Y doesn't change.
+ EXPECT_EQ(20.0, y.scalar<float>()());
+
+ Tensor z = tensor::DeepCopy(y);
+
+ // Change y.
+ y.scalar<float>()() = 40.0;
+
+ // The final states should all be different.
+ EXPECT_EQ(20.0, z.scalar<float>()());
+ EXPECT_EQ(30.0, x.scalar<float>()());
+ EXPECT_EQ(40.0, y.scalar<float>()());
+
+ // Should have the same shape and type.
+ EXPECT_EQ(TensorShape({}), x.shape());
+ EXPECT_EQ(TensorShape({}), y.shape());
+ EXPECT_EQ(TensorShape({}), z.shape());
+
+ EXPECT_EQ(DT_FLOAT, x.dtype());
+ EXPECT_EQ(DT_FLOAT, y.dtype());
+ EXPECT_EQ(DT_FLOAT, z.dtype());
+}
+
+TEST(TensorUtil, DeepCopy) {
+ Tensor x(DT_FLOAT, TensorShape({1}));
+ x.flat<float>()(0) = 10.0;
+
+ // Make y a deep copy of x and then change it.
+ Tensor y = tensor::DeepCopy(x);
+ y.flat<float>()(0) = 20.0;
+
+ // x doesn't change
+ EXPECT_EQ(10.0, x.flat<float>()(0));
+
+ // Change x.
+ x.flat<float>()(0) = 30.0;
+
+ // Y doesn't change.
+ EXPECT_EQ(20.0, y.flat<float>()(0));
+
+ Tensor z = tensor::DeepCopy(y);
+
+ // Change y.
+ y.flat<float>()(0) = 40.0;
+
+ // The final states should all be different.
+ EXPECT_EQ(20.0, z.flat<float>()(0));
+ EXPECT_EQ(30.0, x.flat<float>()(0));
+ EXPECT_EQ(40.0, y.flat<float>()(0));
+
+ // Should have the same shape and type.
+ EXPECT_EQ(TensorShape({1}), x.shape());
+ EXPECT_EQ(TensorShape({1}), y.shape());
+ EXPECT_EQ(TensorShape({1}), z.shape());
+
+ EXPECT_EQ(DT_FLOAT, x.dtype());
+ EXPECT_EQ(DT_FLOAT, y.dtype());
+ EXPECT_EQ(DT_FLOAT, z.dtype());
+
+ // Test string deep copy
+ Tensor str1(DT_STRING, TensorShape({2}));
+ str1.flat<string>()(0) = "foo1";
+ str1.flat<string>()(1) = "foo2";
+ Tensor str2 = tensor::DeepCopy(str1);
+ str2.flat<string>()(0) = "bar1";
+ str2.flat<string>()(1) = "bar2";
+ EXPECT_NE(str2.flat<string>()(0), str1.flat<string>()(0));
+}
+
+TEST(TensorUtil, DeepCopySlice) {
+ Tensor x(DT_INT32, TensorShape({10}));
+ x.flat<int32>().setConstant(1);
+
+ // Slice 'x' -- y still refers to the same buffer.
+ Tensor y = x.Slice(2, 6);
+
+ // Do a deep copy of y, which is a slice.
+ Tensor z = tensor::DeepCopy(y);
+
+ // Set x to be different.
+ x.flat<int32>().setConstant(2);
+
+ EXPECT_EQ(TensorShape({10}), x.shape());
+ EXPECT_EQ(TensorShape({4}), y.shape());
+ EXPECT_EQ(TensorShape({4}), z.shape());
+ EXPECT_EQ(DT_INT32, x.dtype());
+ EXPECT_EQ(DT_INT32, y.dtype());
+ EXPECT_EQ(DT_INT32, z.dtype());
+
+ // x and y should now all be '2', but z should be '1'.
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(2, x.flat<int32>()(i));
+ }
+ for (int i = 0; i < 4; ++i) {
+ EXPECT_EQ(2, y.unaligned_flat<int32>()(i));
+ EXPECT_EQ(1, z.flat<int32>()(i));
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/tracking_allocator.cc b/tensorflow/core/framework/tracking_allocator.cc
new file mode 100644
index 0000000000..78311ded19
--- /dev/null
+++ b/tensorflow/core/framework/tracking_allocator.cc
@@ -0,0 +1,100 @@
+#include "tensorflow/core/framework/tracking_allocator.h"
+
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+TrackingAllocator::TrackingAllocator(Allocator* allocator)
+ : allocator_(allocator),
+ ref_(1),
+ allocated_(0),
+ high_watermark_(0),
+ total_bytes_(0) {}
+
+void* TrackingAllocator::AllocateRaw(size_t alignment, size_t num_bytes) {
+ void* ptr = allocator_->AllocateRaw(alignment, num_bytes);
+ // If memory is exhausted, AllocateRaw returns nullptr, and we should
+ // pass this through to the caller.
+ if (nullptr == ptr) {
+ return ptr;
+ }
+ if (allocator_->TracksAllocationSizes()) {
+ size_t allocated_bytes = allocator_->AllocatedSize(ptr);
+ {
+ mutex_lock lock(mu_);
+ allocated_ += allocated_bytes;
+ high_watermark_ = std::max(high_watermark_, allocated_);
+ total_bytes_ += allocated_bytes;
+ ++ref_;
+ }
+ } else {
+ mutex_lock lock(mu_);
+ total_bytes_ += num_bytes;
+ ++ref_;
+ }
+ return ptr;
+}
+
+void TrackingAllocator::DeallocateRaw(void* ptr) {
+ // freeing a null ptr is a no-op
+ if (nullptr == ptr) {
+ return;
+ }
+ bool should_delete;
+ // fetch the following outside the lock in case the call to
+ // AllocatedSize is slow
+ bool tracks_allocation_sizes = allocator_->TracksAllocationSizes();
+ size_t allocated_bytes = 0;
+ if (tracks_allocation_sizes) {
+ allocated_bytes = allocator_->AllocatedSize(ptr);
+ }
+ Allocator* allocator = allocator_;
+ {
+ mutex_lock lock(mu_);
+ if (tracks_allocation_sizes) {
+ CHECK_GE(allocated_, allocated_bytes);
+ allocated_ -= allocated_bytes;
+ }
+ should_delete = UnRef();
+ }
+ allocator->DeallocateRaw(ptr);
+ if (should_delete) {
+ delete this;
+ }
+}
+
+bool TrackingAllocator::TracksAllocationSizes() {
+ return allocator_->TracksAllocationSizes();
+}
+
+size_t TrackingAllocator::RequestedSize(void* ptr) {
+ return allocator_->RequestedSize(ptr);
+}
+
+size_t TrackingAllocator::AllocatedSize(void* ptr) {
+ return allocator_->AllocatedSize(ptr);
+}
+
+std::pair<size_t, size_t> TrackingAllocator::GetSizesAndUnRef() {
+ size_t high_watermark;
+ size_t total_bytes;
+ bool should_delete;
+ {
+ mutex_lock lock(mu_);
+ high_watermark = high_watermark_;
+ total_bytes = total_bytes_;
+ should_delete = UnRef();
+ }
+ if (should_delete) {
+ delete this;
+ }
+ return std::make_pair(total_bytes, high_watermark);
+}
+
+bool TrackingAllocator::UnRef() {
+ CHECK_GE(ref_, 1);
+ --ref_;
+ return (ref_ == 0);
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/framework/tracking_allocator.h b/tensorflow/core/framework/tracking_allocator.h
new file mode 100644
index 0000000000..f809e3822c
--- /dev/null
+++ b/tensorflow/core/framework/tracking_allocator.h
@@ -0,0 +1,80 @@
+#ifndef TENSORFLOW_FRAMEWORK_TRACKING_ALLOCATOR_H_
+#define TENSORFLOW_FRAMEWORK_TRACKING_ALLOCATOR_H_
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+// TrackingAllocator is a wrapper for an Allocator. It keeps a running
+// count of the number of bytes allocated through the wrapper. It is
+// used by the Executor to "charge" allocations to particular Op
+// executions. Each Op gets a separate TrackingAllocator wrapper
+// around the underlying allocator.
+//
+// The implementation assumes the invariant that all calls to
+// AllocateRaw by an Op (or work items spawned by the Op) will occur
+// before the Op's Compute method returns. Thus the high watermark is
+// established once Compute returns.
+//
+// DeallocateRaw can be called long after the Op has finished,
+// e.g. when an output tensor is deallocated, and the wrapper cannot
+// be deleted until the last of these calls has occurred. The
+// TrackingAllocator keeps track of outstanding calls using a
+// reference count, and deletes itself once the last call has been
+// received and the high watermark has been retrieved.
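+//
+// A minimal usage sketch (hypothetical caller, not part of this header):
+//   TrackingAllocator* ta = new TrackingAllocator(underlying_allocator);
+//   void* p = ta->AllocateRaw(alignment, num_bytes);
+//   ...  // the Op's Compute() runs and allocates through ta
+//   std::pair<size_t, size_t> sizes = ta->GetSizesAndUnRef();
+//   ta->DeallocateRaw(p);  // ta deletes itself after the last deallocation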
+class TrackingAllocator : public Allocator {
+ public:
+ explicit TrackingAllocator(Allocator* allocator);
+ string Name() override { return allocator_->Name(); }
+ void* AllocateRaw(size_t alignment, size_t num_bytes) override;
+ void DeallocateRaw(void* ptr) override;
+ bool TracksAllocationSizes() override;
+ size_t RequestedSize(void* ptr) override;
+ size_t AllocatedSize(void* ptr) override;
+
+ // If the underlying allocator tracks allocation sizes, this returns
+ // a pair where the first value is the total number of bytes
+ // allocated through this wrapper, and the second value is the high
+ // watermark of bytes allocated through this wrapper. If the
+ // underlying allocator does not track allocation sizes the first
+ // value is the total number of bytes requested through this wrapper
+ // and the second is 0.
+ //
+ // After GetSizesAndUnRef is called, the only further calls allowed
+ // on this wrapper are calls to DeallocateRaw with pointers that
+ // were allocated by this wrapper and have not yet been
+ // deallocated. After this call completes and all allocated pointers
+ // have been deallocated the wrapper will delete itself.
+ std::pair<size_t, size_t> GetSizesAndUnRef();
+
+ private:
+ ~TrackingAllocator() override {}
+ bool UnRef() EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ Allocator* allocator_; // not owned.
+ mutex mu_;
+ // the number of calls to AllocateRaw that have not yet been matched
+ // by a corresponding call to DeallocateRaw, plus 1 if the Executor
+ // has not yet read out the high watermark.
+ int ref_ GUARDED_BY(mu_);
+ // the current number of outstanding bytes that have been allocated
+ // by this wrapper, or 0 if the underlying allocator does not track
+ // allocation sizes.
+ size_t allocated_ GUARDED_BY(mu_);
+ // the maximum number of outstanding bytes that have been allocated
+ // by this wrapper, or 0 if the underlying allocator does not track
+ // allocation sizes.
+ size_t high_watermark_ GUARDED_BY(mu_);
+ // the total number of bytes that have been allocated by this
+ // wrapper if the underlying allocator tracks allocation sizes,
+ // otherwise the total number of bytes that have been requested by
+ // this allocator.
+ size_t total_bytes_ GUARDED_BY(mu_);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_TRACKING_ALLOCATOR_H_
diff --git a/tensorflow/core/framework/tracking_allocator_test.cc b/tensorflow/core/framework/tracking_allocator_test.cc
new file mode 100644
index 0000000000..90ce851775
--- /dev/null
+++ b/tensorflow/core/framework/tracking_allocator_test.cc
@@ -0,0 +1,115 @@
+#include "tensorflow/core/framework/tracking_allocator.h"
+
+#include <unordered_map>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+class TestableSizeTrackingAllocator : public Allocator {
+ public:
+ string Name() override { return "test"; }
+ void* AllocateRaw(size_t /*alignment*/, size_t num_bytes) override {
+ void* ptr = malloc(num_bytes);
+ size_map_[ptr] = num_bytes;
+ return ptr;
+ }
+ void DeallocateRaw(void* ptr) override {
+ const auto& iter = size_map_.find(ptr);
+ EXPECT_NE(size_map_.end(), iter);
+ size_map_.erase(iter);
+ free(ptr);
+ }
+ bool TracksAllocationSizes() override { return true; }
+ size_t RequestedSize(void* ptr) override {
+ const auto& iter = size_map_.find(ptr);
+ EXPECT_NE(size_map_.end(), iter);
+ return iter->second;
+ }
+
+ private:
+ std::unordered_map<void*, size_t> size_map_;
+};
+
+class NoMemoryAllocator : public Allocator {
+ public:
+ string Name() override { return "test"; }
+ void* AllocateRaw(size_t /*alignment*/, size_t num_bytes) override {
+ return nullptr;
+ }
+ void DeallocateRaw(void* ptr) override {}
+ bool TracksAllocationSizes() override { return true; }
+};
+
+TEST(TrackingAllocatorTest, SimpleNoTracking) {
+ Allocator* a = cpu_allocator();
+
+ EXPECT_FALSE(a->TracksAllocationSizes());
+
+ TrackingAllocator* ta = new TrackingAllocator(a);
+
+ void* p1 = ta->AllocateRaw(4, 4);
+ ta->Deallocate(p1);
+ void* p2 = ta->AllocateRaw(4, 12);
+
+ std::pair<size_t, size_t> sizes = ta->GetSizesAndUnRef();
+
+ EXPECT_EQ(16, sizes.first);
+ EXPECT_EQ(0, sizes.second);
+
+ ta->Deallocate(p2);
+}
+
+TEST(TrackingAllocatorTest, SimpleTracking) {
+ TestableSizeTrackingAllocator a = TestableSizeTrackingAllocator();
+
+ EXPECT_TRUE(a.TracksAllocationSizes());
+
+ TrackingAllocator* ta = new TrackingAllocator(&a);
+
+ void* p1 = ta->AllocateRaw(4, 12);
+ ta->Deallocate(p1);
+ void* p2 = ta->AllocateRaw(4, 4);
+
+ std::pair<size_t, size_t> sizes = ta->GetSizesAndUnRef();
+
+ EXPECT_EQ(16, sizes.first);
+ EXPECT_EQ(12, sizes.second);
+
+ ta->Deallocate(p2);
+}
+
+TEST(TrackingAllocatorTest, OutOfMemory) {
+ NoMemoryAllocator a;
+
+ EXPECT_TRUE(a.TracksAllocationSizes());
+
+ TrackingAllocator* ta = new TrackingAllocator(&a);
+
+ void* p1 = ta->AllocateRaw(4, 12);
+ EXPECT_EQ(nullptr, p1);
+
+ std::pair<size_t, size_t> sizes = ta->GetSizesAndUnRef();
+
+ EXPECT_EQ(0, sizes.first);
+ EXPECT_EQ(0, sizes.second);
+}
+
+TEST(TrackingAllocatorTest, FreeNullPtr) {
+ NoMemoryAllocator a;
+
+ EXPECT_TRUE(a.TracksAllocationSizes());
+
+ TrackingAllocator* ta = new TrackingAllocator(&a);
+
+ ta->DeallocateRaw(nullptr);
+
+ std::pair<size_t, size_t> sizes = ta->GetSizesAndUnRef();
+
+ EXPECT_EQ(0, sizes.first);
+ EXPECT_EQ(0, sizes.second);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/type_traits.h b/tensorflow/core/framework/type_traits.h
new file mode 100644
index 0000000000..d87b6ff49b
--- /dev/null
+++ b/tensorflow/core/framework/type_traits.h
@@ -0,0 +1,69 @@
+#ifndef TENSORFLOW_FRAMEWORK_TYPE_TRAITS_H_
+#define TENSORFLOW_FRAMEWORK_TYPE_TRAITS_H_
+
+#include <limits>
+#include <utility>
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Functions to define quantization attribute of types.
+struct true_type {
+ static const bool value = true;
+};
+struct false_type {
+ static const bool value = false;
+};
+
+// Default is_quantized is false.
+template <typename T>
+struct is_quantized : false_type {};
+
+// Specialize the quantized types.
+template <>
+struct is_quantized<qint8> : true_type {};
+template <>
+struct is_quantized<quint8> : true_type {};
+template <>
+struct is_quantized<qint32> : true_type {};
+
+// All types not specialized are marked invalid.
+template <class T>
+struct IsValidDataType {
+ static constexpr bool value = false;
+};
+
+// Extra validity checking; not part of public API.
+struct TestIsValidDataType {
+ static_assert(IsValidDataType<int64>::value, "Incorrect impl for int64");
+ static_assert(IsValidDataType<int32>::value, "Incorrect impl for int32");
+};
+
+} // namespace tensorflow
+
+// Define numeric limits for our quantized as subclasses of the
+// standard types.
+namespace std {
+template <>
+class numeric_limits<tensorflow::qint8>
+ : public numeric_limits<tensorflow::int8> {};
+template <>
+class numeric_limits<tensorflow::quint8>
+ : public numeric_limits<tensorflow::uint8> {};
+template <>
+class numeric_limits<tensorflow::qint32>
+ : public numeric_limits<tensorflow::int32> {};
+
+// Specialize is_signed for quantized types.
+template <>
+struct is_signed<tensorflow::qint8> : public is_signed<tensorflow::int8> {};
+template <>
+struct is_signed<tensorflow::quint8> : public is_signed<tensorflow::uint8> {};
+template <>
+struct is_signed<tensorflow::qint32> : public is_signed<tensorflow::int32> {};
+
+} // namespace std
+
+#endif // TENSORFLOW_FRAMEWORK_TYPE_TRAITS_H_
diff --git a/tensorflow/core/framework/types.cc b/tensorflow/core/framework/types.cc
new file mode 100644
index 0000000000..01b9fca3b6
--- /dev/null
+++ b/tensorflow/core/framework/types.cc
@@ -0,0 +1,210 @@
+#include "tensorflow/core/framework/types.h"
+
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+bool DeviceType::operator<(const DeviceType& other) const {
+ return type_ < other.type_;
+}
+
+bool DeviceType::operator==(const DeviceType& other) const {
+ return type_ == other.type_;
+}
+
+std::ostream& operator<<(std::ostream& os, const DeviceType& d) {
+ os << d.type();
+ return os;
+}
+
+const char* const DEVICE_CPU = "CPU";
+const char* const DEVICE_GPU = "GPU";
+
+string DataTypeString(DataType dtype) {
+ if (IsRefType(dtype)) {
+ DataType non_ref = static_cast<DataType>(dtype - kDataTypeRefOffset);
+ return strings::StrCat(DataTypeString(non_ref), "_ref");
+ }
+ switch (dtype) {
+ case DT_INVALID:
+ return "INVALID";
+ case DT_FLOAT:
+ return "float";
+ case DT_DOUBLE:
+ return "double";
+ case DT_INT32:
+ return "int32";
+ case DT_UINT8:
+ return "uint8";
+ case DT_INT16:
+ return "int16";
+ case DT_INT8:
+ return "int8";
+ case DT_STRING:
+ return "string";
+ case DT_COMPLEX64:
+ return "complex64";
+ case DT_INT64:
+ return "int64";
+ case DT_BOOL:
+ return "bool";
+ case DT_QINT8:
+ return "qint8";
+ case DT_QUINT8:
+ return "quint8";
+ case DT_QINT32:
+ return "qint32";
+ case DT_BFLOAT16:
+ return "bfloat16";
+ default:
+ LOG(FATAL) << "Unrecognized DataType enum value " << dtype;
+ return "";
+ }
+}
+
+bool DataTypeFromString(StringPiece sp, DataType* dt) {
+ if (sp.ends_with("_ref")) {
+ sp.remove_suffix(4);
+ DataType non_ref;
+ if (DataTypeFromString(sp, &non_ref) && !IsRefType(non_ref)) {
+ *dt = static_cast<DataType>(non_ref + kDataTypeRefOffset);
+ return true;
+ } else {
+ return false;
+ }
+ }
+
+ if (sp == "float" || sp == "float32") {
+ *dt = DT_FLOAT;
+ return true;
+ } else if (sp == "double" || sp == "float64") {
+ *dt = DT_DOUBLE;
+ return true;
+ } else if (sp == "int32") {
+ *dt = DT_INT32;
+ return true;
+ } else if (sp == "uint8") {
+ *dt = DT_UINT8;
+ return true;
+ } else if (sp == "int16") {
+ *dt = DT_INT16;
+ return true;
+ } else if (sp == "int8") {
+ *dt = DT_INT8;
+ return true;
+ } else if (sp == "string") {
+ *dt = DT_STRING;
+ return true;
+ } else if (sp == "complex64") {
+ *dt = DT_COMPLEX64;
+ return true;
+ } else if (sp == "int64") {
+ *dt = DT_INT64;
+ return true;
+ } else if (sp == "bool") {
+ *dt = DT_BOOL;
+ return true;
+ } else if (sp == "qint8") {
+ *dt = DT_QINT8;
+ return true;
+ } else if (sp == "quint8") {
+ *dt = DT_QUINT8;
+ return true;
+ } else if (sp == "qint32") {
+ *dt = DT_QINT32;
+ return true;
+ } else if (sp == "bfloat16") {
+ *dt = DT_BFLOAT16;
+ return true;
+ }
+ return false;
+}
+
+string DeviceTypeString(DeviceType device_type) { return device_type.type(); }
+
+string DataTypeSliceString(const DataTypeSlice types) {
+ string out;
+ for (auto it = types.begin(); it != types.end(); ++it) {
+ strings::StrAppend(&out, ((it == types.begin()) ? "" : ", "),
+ DataTypeString(*it));
+ }
+ return out;
+}
+
+DataTypeVector AllTypes() {
+ return {DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16,
+ DT_INT8, DT_STRING, DT_COMPLEX64, DT_INT64, DT_BOOL,
+ DT_QINT8, DT_QUINT8, DT_QINT32};
+}
+
+#ifndef __ANDROID__
+
+DataTypeVector RealNumberTypes() {
+ return {DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_UINT8, DT_INT16, DT_INT8};
+}
+
+DataTypeVector QuantizedTypes() { return {DT_QINT8, DT_QUINT8, DT_QINT32}; }
+
+DataTypeVector RealAndQuantizedTypes() {
+ return {DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_UINT8,
+ DT_INT16, DT_INT8, DT_QINT8, DT_QUINT8, DT_QINT32};
+}
+
+DataTypeVector NumberTypes() {
+ return {DT_FLOAT, DT_DOUBLE, DT_INT64, DT_INT32, DT_UINT8, DT_INT16,
+ DT_INT8, DT_COMPLEX64, DT_QINT8, DT_QUINT8, DT_QINT32};
+}
+
+#else // __ANDROID__
+
+DataTypeVector RealNumberTypes() { return {DT_FLOAT, DT_INT32}; }
+
+DataTypeVector NumberTypes() {
+ return {DT_FLOAT, DT_INT32, DT_QINT8, DT_QUINT8, DT_QINT32};
+}
+
+DataTypeVector QuantizedTypes() { return {DT_QINT8, DT_QUINT8, DT_QINT32}; }
+
+DataTypeVector RealAndQuantizedTypes() {
+ return {DT_FLOAT, DT_INT32, DT_QINT8, DT_QUINT8, DT_QINT32};
+}
+
+#endif // __ANDROID__
+
+// TODO(jeff): Maybe unify this with Tensor::CanUseDMA, or the underlying
+// is_simple<T> in tensor.cc (and possibly choose a more general name?)
+bool DataTypeCanUseMemcpy(DataType dt) {
+ switch (dt) {
+ case DT_FLOAT:
+ case DT_DOUBLE:
+ case DT_INT32:
+ case DT_UINT8:
+ case DT_INT16:
+ case DT_INT8:
+ case DT_COMPLEX64:
+ case DT_INT64:
+ case DT_BOOL:
+ case DT_QINT8:
+ case DT_QUINT8:
+ case DT_QINT32:
+ case DT_BFLOAT16:
+ return true;
+ default:
+ return false;
+ }
+}
+
+bool DataTypeIsQuantized(DataType dt) {
+ switch (dt) {
+ case DT_QINT8:
+ case DT_QUINT8:
+ case DT_QINT32:
+ return true;
+ default:
+ return false;
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/framework/types.h b/tensorflow/core/framework/types.h
new file mode 100644
index 0000000000..2d417cf076
--- /dev/null
+++ b/tensorflow/core/framework/types.h
@@ -0,0 +1,168 @@
+#ifndef TENSORFLOW_FRAMEWORK_TYPES_H_
+#define TENSORFLOW_FRAMEWORK_TYPES_H_
+
+#include <map>
+#include <set>
+#include <string>
+
+#include "tensorflow/core/framework/bfloat16.h"
+#include "tensorflow/core/framework/numeric_types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint"
+
+namespace tensorflow {
+
+// MemoryType is used to describe whether input or output Tensors of
+// an OpKernel should reside in "Host memory" (e.g., CPU memory) or
+// "Device" Memory (CPU memory for CPU devices, GPU memory for GPU
+// devices).
+enum MemoryType {
+ DEVICE_MEMORY = 0,
+ HOST_MEMORY = 1,
+};
+
+// A DeviceType is just a string, but we wrap it up in a class to give
+// some type checking as we're passing these around.
+class DeviceType {
+ public:
+ DeviceType(const char* type) // NOLINT(runtime/explicit)
+ : type_(type) {}
+
+ explicit DeviceType(StringPiece type) : type_(type.data(), type.size()) {}
+
+ const char* type() const { return type_.c_str(); }
+
+ bool operator<(const DeviceType& other) const;
+ bool operator==(const DeviceType& other) const;
+ bool operator!=(const DeviceType& other) const { return !(*this == other); }
+
+ private:
+ string type_;
+};
+std::ostream& operator<<(std::ostream& os, const DeviceType& d);
+
+// Convenient constants that can be passed to a DeviceType constructor
+extern const char* const DEVICE_CPU; // "CPU"
+extern const char* const DEVICE_GPU; // "GPU"
+
+typedef gtl::InlinedVector<MemoryType, 4> MemoryTypeVector;
+
+typedef gtl::InlinedVector<DataType, 4> DataTypeVector;
+typedef gtl::ArraySlice<DataType> DataTypeSlice;
+
+typedef gtl::InlinedVector<DeviceType, 4> DeviceTypeVector;
+
+// Convert the enums to strings for errors:
+string DataTypeString(DataType dtype);
+string DeviceTypeString(DeviceType device_type);
+string DataTypeSliceString(const DataTypeSlice dtypes);
+inline string DataTypeVectorString(const DataTypeVector& dtypes) {
+ return DataTypeSliceString(dtypes);
+}
+
+// If "sp" names a valid type, store it in "*dt" and return true. Otherwise,
+// return false.
+bool DataTypeFromString(StringPiece sp, DataType* dt);
+
+// DT_FLOAT + kDataTypeRefOffset == DT_FLOAT_REF, etc.
+enum { kDataTypeRefOffset = 100 };
+inline bool IsRefType(DataType dtype) {
+ return dtype > static_cast<DataType>(kDataTypeRefOffset);
+}
+inline DataType MakeRefType(DataType dtype) {
+ DCHECK(!IsRefType(dtype));
+ return static_cast<DataType>(dtype + kDataTypeRefOffset);
+}
+inline DataType RemoveRefType(DataType dtype) {
+ DCHECK(IsRefType(dtype));
+ return static_cast<DataType>(dtype - kDataTypeRefOffset);
+}
+inline DataType BaseType(DataType dtype) {
+ return IsRefType(dtype) ? RemoveRefType(dtype) : dtype;
+}
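+
+// For example, MakeRefType(DT_FLOAT) is DT_FLOAT_REF and
+// BaseType(DT_FLOAT_REF) is DT_FLOAT.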
+
+// Returns true if the actual type is the same as or ref of the expected type.
+inline bool TypesCompatible(DataType expected, DataType actual) {
+ return expected == actual || expected == BaseType(actual);
+}
+
+// Does not include _ref types.
+DataTypeVector AllTypes();
+
+// Return the list of all numeric types.
+// NOTE: On Android, we only include the float and int32 types for now.
+DataTypeVector RealNumberTypes(); // Types that support '<' and '>'.
+DataTypeVector NumberTypes(); // Includes complex and quantized types.
+
+DataTypeVector QuantizedTypes();
+DataTypeVector RealAndQuantizedTypes(); // Types that support '<' and
+ // '>', including quantized
+ // types
+
+// Validates type T for whether it is a supported DataType.
+template <class T>
+struct IsValidDataType;
+
+// DataTypeToEnum<T>::v() and DataTypeToEnum<T>::value are the DataType
+// constants for T, e.g. DataTypeToEnum<float>::v() is DT_FLOAT.
+template <class T>
+struct DataTypeToEnum {
+ static_assert(IsValidDataType<T>::value, "Specified Data Type not supported");
+}; // Specializations below
+
+// EnumToDataType<VALUE>::Type is the type for DataType constant VALUE, e.g.
+// EnumToDataType<DT_FLOAT>::Type is float.
+template <DataType VALUE>
+struct EnumToDataType {}; // Specializations below
+
+// Template specialization for both DataTypeToEnum and EnumToDataType.
+#define MATCH_TYPE_AND_ENUM(TYPE, ENUM) \
+ template <> \
+ struct DataTypeToEnum<TYPE> { \
+ static DataType v() { return ENUM; } \
+ static DataType ref() { return MakeRefType(ENUM); } \
+ static constexpr DataType value = ENUM; \
+ }; \
+ template <> \
+ struct IsValidDataType<TYPE> { \
+ static constexpr bool value = true; \
+ }; \
+ template <> \
+ struct EnumToDataType<ENUM> { \
+ typedef TYPE Type; \
+ }
+
+// We use Eigen's QInt implementations for our quantized int types.
+typedef Eigen::QInt8 qint8;
+typedef Eigen::QUInt8 quint8;
+typedef Eigen::QInt32 qint32;
+
+MATCH_TYPE_AND_ENUM(float, DT_FLOAT);
+MATCH_TYPE_AND_ENUM(double, DT_DOUBLE);
+MATCH_TYPE_AND_ENUM(int32, DT_INT32);
+MATCH_TYPE_AND_ENUM(uint8, DT_UINT8);
+MATCH_TYPE_AND_ENUM(int16, DT_INT16);
+MATCH_TYPE_AND_ENUM(int8, DT_INT8);
+MATCH_TYPE_AND_ENUM(string, DT_STRING);
+MATCH_TYPE_AND_ENUM(complex64, DT_COMPLEX64);
+MATCH_TYPE_AND_ENUM(int64, DT_INT64);
+MATCH_TYPE_AND_ENUM(bool, DT_BOOL);
+MATCH_TYPE_AND_ENUM(qint8, DT_QINT8);
+MATCH_TYPE_AND_ENUM(quint8, DT_QUINT8);
+MATCH_TYPE_AND_ENUM(qint32, DT_QINT32);
+MATCH_TYPE_AND_ENUM(bfloat16, DT_BFLOAT16);
+
+#undef MATCH_TYPE_AND_ENUM
+
+bool DataTypeCanUseMemcpy(DataType dt);
+
+bool DataTypeIsQuantized(DataType dt);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_FRAMEWORK_TYPES_H_
diff --git a/tensorflow/core/framework/types.proto b/tensorflow/core/framework/types.proto
new file mode 100644
index 0000000000..e5dc9c45a0
--- /dev/null
+++ b/tensorflow/core/framework/types.proto
@@ -0,0 +1,48 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+enum DataType {
+ // Not a legal value for DataType. Used to indicate a DataType field
+ // has not been set.
+ DT_INVALID = 0;
+
+ // Data types that all computation devices are expected to be
+ // capable to support.
+ DT_FLOAT = 1;
+ DT_DOUBLE = 2;
+ DT_INT32 = 3;
+ DT_UINT8 = 4;
+ DT_INT16 = 5;
+ DT_INT8 = 6;
+ DT_STRING = 7;
+ DT_COMPLEX64 = 8; // Single-precision complex
+ DT_INT64 = 9;
+ DT_BOOL = 10;
+ DT_QINT8 = 11; // Quantized int8
+ DT_QUINT8 = 12; // Quantized uint8
+ DT_QINT32 = 13; // Quantized int32
+ DT_BFLOAT16 = 14; // Float32 truncated to 16 bits. Only for cast ops.
+
+ // TODO(josh11b): DT_GENERIC_PROTO = ??;
+ // TODO(jeff,josh11b): DT_UINT64? DT_UINT32? DT_UINT16?
+ // TODO(zhifengc): DT_COMPLEX128 (double-precision complex)?
+
+ // Do not use! These are only for parameters. Every enum above
+ // should have a corresponding value below (verified by types_test).
+ DT_FLOAT_REF = 101;
+ DT_DOUBLE_REF = 102;
+ DT_INT32_REF = 103;
+ DT_UINT8_REF = 104;
+ DT_INT16_REF = 105;
+ DT_INT8_REF = 106;
+ DT_STRING_REF = 107;
+ DT_COMPLEX64_REF = 108;
+ DT_INT64_REF = 109;
+ DT_BOOL_REF = 110;
+ DT_QINT8_REF = 111;
+ DT_QUINT8_REF = 112;
+ DT_QINT32_REF = 113;
+ DT_BFLOAT16_REF = 114;
+}
diff --git a/tensorflow/core/framework/types_test.cc b/tensorflow/core/framework/types_test.cc
new file mode 100644
index 0000000000..eb92600397
--- /dev/null
+++ b/tensorflow/core/framework/types_test.cc
@@ -0,0 +1,117 @@
+#include "tensorflow/core/framework/types.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/type_traits.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+namespace {
+
+TEST(TypesTest, DeviceTypeName) {
+ EXPECT_EQ("CPU", DeviceTypeString(DeviceType(DEVICE_CPU)));
+ EXPECT_EQ("GPU", DeviceTypeString(DeviceType(DEVICE_GPU)));
+}
+
+TEST(TypesTest, kDataTypeRefOffset) {
+ // Basic sanity check
+ EXPECT_EQ(DT_FLOAT + kDataTypeRefOffset, DT_FLOAT_REF);
+
+ // Use the meta-data provided by proto2 to iterate through the basic
+ // types and validate that adding kDataTypeRefOffset gives the
+ // corresponding reference type.
+ const auto* enum_descriptor = DataType_descriptor();
+ int e = DataType_MIN;
+ if (e == DT_INVALID) ++e;
+ int e_ref = e + kDataTypeRefOffset;
+ EXPECT_FALSE(DataType_IsValid(e_ref - 1))
+ << "Reference enum "
+ << enum_descriptor->FindValueByNumber(e_ref - 1)->name()
+ << " without corresponding base enum with value " << e - 1;
+ for (;
+ DataType_IsValid(e) && DataType_IsValid(e_ref) && e_ref <= DataType_MAX;
+ ++e, ++e_ref) {
+ string enum_name = enum_descriptor->FindValueByNumber(e)->name();
+ string enum_ref_name = enum_descriptor->FindValueByNumber(e_ref)->name();
+ EXPECT_EQ(enum_name + "_REF", enum_ref_name)
+ << enum_name << "_REF should have value " << e_ref << " not "
+ << enum_ref_name;
+ // Validate DataTypeString() as well.
+ DataType dt_e = static_cast<DataType>(e);
+ DataType dt_e_ref = static_cast<DataType>(e_ref);
+ EXPECT_EQ(DataTypeString(dt_e) + "_ref", DataTypeString(dt_e_ref));
+
+ // Test DataTypeFromString reverse conversion
+ DataType dt_e2, dt_e2_ref;
+ EXPECT_TRUE(DataTypeFromString(DataTypeString(dt_e), &dt_e2));
+ EXPECT_EQ(dt_e, dt_e2);
+ EXPECT_TRUE(DataTypeFromString(DataTypeString(dt_e_ref), &dt_e2_ref));
+ EXPECT_EQ(dt_e_ref, dt_e2_ref);
+ }
+ ASSERT_FALSE(DataType_IsValid(e))
+ << "Should define " << enum_descriptor->FindValueByNumber(e)->name()
+ << "_REF to be " << e_ref;
+ ASSERT_FALSE(DataType_IsValid(e_ref))
+ << "Extra reference enum "
+ << enum_descriptor->FindValueByNumber(e_ref)->name()
+ << " without corresponding base enum with value " << e;
+ ASSERT_LT(DataType_MAX, e_ref) << "Gap in reference types, missing value for "
+ << e_ref;
+
+ // Make sure there are no enums defined after the last regular type before
+ // the first reference type.
+ for (; e < DataType_MIN + kDataTypeRefOffset; ++e) {
+ EXPECT_FALSE(DataType_IsValid(e))
+ << "Discontinuous enum value "
+ << enum_descriptor->FindValueByNumber(e)->name() << " = " << e;
+ }
+}
+
+TEST(TypesTest, DataTypeFromString) {
+ DataType dt;
+ ASSERT_TRUE(DataTypeFromString("int32", &dt));
+ EXPECT_EQ(DT_INT32, dt);
+ ASSERT_TRUE(DataTypeFromString("int32_ref", &dt));
+ EXPECT_EQ(DT_INT32_REF, dt);
+ EXPECT_FALSE(DataTypeFromString("int32_ref_ref", &dt));
+ EXPECT_FALSE(DataTypeFromString("foo", &dt));
+ EXPECT_FALSE(DataTypeFromString("foo_ref", &dt));
+ ASSERT_TRUE(DataTypeFromString("int64", &dt));
+ EXPECT_EQ(DT_INT64, dt);
+ ASSERT_TRUE(DataTypeFromString("int64_ref", &dt));
+ EXPECT_EQ(DT_INT64_REF, dt);
+ ASSERT_TRUE(DataTypeFromString("quint8_ref", &dt));
+ EXPECT_EQ(DT_QUINT8_REF, dt);
+ ASSERT_TRUE(DataTypeFromString("bfloat16", &dt));
+ EXPECT_EQ(DT_BFLOAT16, dt);
+}
+
+template <typename T>
+static bool GetQuantized() {
+ return is_quantized<T>::value;
+}
+
+TEST(TypesTest, QuantizedTypes) {
+ // NOTE: GUnit cannot parse is_quantized<TYPE>::value within the
+ // EXPECT_TRUE() clause, so we delegate through a template function.
+ EXPECT_TRUE(GetQuantized<qint8>());
+ EXPECT_TRUE(GetQuantized<quint8>());
+ EXPECT_TRUE(GetQuantized<qint32>());
+
+ EXPECT_FALSE(GetQuantized<int8>());
+ EXPECT_FALSE(GetQuantized<uint8>());
+ EXPECT_FALSE(GetQuantized<int16>());
+ EXPECT_FALSE(GetQuantized<int32>());
+
+ EXPECT_TRUE(DataTypeIsQuantized(DT_QINT8));
+ EXPECT_TRUE(DataTypeIsQuantized(DT_QUINT8));
+ EXPECT_TRUE(DataTypeIsQuantized(DT_QINT32));
+
+ EXPECT_FALSE(DataTypeIsQuantized(DT_INT8));
+ EXPECT_FALSE(DataTypeIsQuantized(DT_UINT8));
+ EXPECT_FALSE(DataTypeIsQuantized(DT_INT16));
+ EXPECT_FALSE(DataTypeIsQuantized(DT_INT32));
+ EXPECT_FALSE(DataTypeIsQuantized(DT_BFLOAT16));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/algorithm.cc b/tensorflow/core/graph/algorithm.cc
new file mode 100644
index 0000000000..fd79ead0b1
--- /dev/null
+++ b/tensorflow/core/graph/algorithm.cc
@@ -0,0 +1,107 @@
+#include "tensorflow/core/graph/algorithm.h"
+
+#include <algorithm>
+#include <deque>
+#include <vector>
+
+namespace tensorflow {
+
+void DFS(const Graph& g, std::function<void(Node*)> enter,
+ std::function<void(Node*)> leave) {
+ // Stack of work to do.
+ struct Work {
+ Node* node;
+ bool leave; // Are we entering or leaving n?
+ };
+ std::vector<Work> stack;
+ stack.push_back(Work{g.source_node(), false});
+
+ std::vector<bool> visited(g.num_node_ids(), false);
+ while (!stack.empty()) {
+ Work w = stack.back();
+ stack.pop_back();
+
+ Node* n = w.node;
+ if (w.leave) {
+ leave(n);
+ continue;
+ }
+
+ if (visited[n->id()]) continue;
+ visited[n->id()] = true;
+ if (enter) enter(n);
+
+ // Arrange to call leave(n) when all done with descendants.
+ if (leave) stack.push_back(Work{n, true});
+
+ // Arrange to work on descendants.
+ for (Node* out : n->out_nodes()) {
+ if (!visited[out->id()]) {
+ // Note: we must not mark it as visited until we actually process it.
+ stack.push_back(Work{out, false});
+ }
+ }
+ }
+}
+
+void GetPostOrder(const Graph& g, std::vector<Node*>* order) {
+ order->clear();
+ DFS(g, nullptr, [order](Node* n) { order->push_back(n); });
+}
+
+void GetReversePostOrder(const Graph& g, std::vector<Node*>* order) {
+ GetPostOrder(g, order);
+ std::reverse(order->begin(), order->end());
+}
+
+void PruneForReverseReachability(Graph* g,
+ const std::unordered_set<const Node*>& nodes) {
+ std::unordered_set<const Node*> visited;
+
+ // Compute set of nodes that we need to traverse in order to reach
+ // the nodes in "nodes" by performing a breadth-first search from those
+ // nodes, and accumulating the visited nodes.
+ std::deque<const Node*> queue;
+ for (const Node* n : nodes) {
+ queue.push_back(n);
+ }
+ while (!queue.empty()) {
+ const Node* n = queue.front();
+ queue.pop_front();
+ if (visited.insert(n).second) {
+ for (const Node* in : n->in_nodes()) {
+ queue.push_back(in);
+ }
+ }
+ }
+
+ // Make a pass over the graph to remove nodes not in "visited"
+ std::vector<Node*> all_nodes;
+ for (Node* n : g->nodes()) {
+ all_nodes.push_back(n);
+ }
+
+ for (Node* n : all_nodes) {
+ if (visited.count(n) == 0 && !n->IsSource() && !n->IsSink()) {
+ g->RemoveNode(n);
+ }
+ }
+
+ // Reconnect nodes with no outgoing edges to the sink node
+ FixupSourceAndSinkEdges(g);
+}
+
+void FixupSourceAndSinkEdges(Graph* g) {
+ // Connect all nodes with no incoming edges to source.
+ // Connect all nodes with no outgoing edges to sink.
+ for (Node* n : g->nodes()) {
+ if (!n->IsSource() && n->in_edges().empty()) {
+ g->AddControlEdge(g->source_node(), n);
+ }
+ if (!n->IsSink() && n->out_edges().empty()) {
+ g->AddControlEdge(n, g->sink_node());
+ }
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/algorithm.h b/tensorflow/core/graph/algorithm.h
new file mode 100644
index 0000000000..58b74a0ace
--- /dev/null
+++ b/tensorflow/core/graph/algorithm.h
@@ -0,0 +1,40 @@
+#ifndef TENSORFLOW_GRAPH_ALGORITHM_H_
+#define TENSORFLOW_GRAPH_ALGORITHM_H_
+
+#include <functional>
+#include <unordered_set>
+
+#include "tensorflow/core/graph/graph.h"
+
+namespace tensorflow {
+
+// Perform a depth-first-search on g starting at the source node.
+// If enter is not empty, calls enter(n) before visiting any children of n.
+// If leave is not empty, calls leave(n) after visiting all children of n.
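+//
+// For example, GetPostOrder below uses
+//   DFS(g, nullptr, [order](Node* n) { order->push_back(n); });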
+extern void DFS(const Graph& g, std::function<void(Node*)> enter,
+ std::function<void(Node*)> leave);
+
+// Stores in *order the post-order numbering of all nodes
+// in graph found via a depth first search starting at the source node.
+//
+// Note that this is equivalent to topological sorting when the
+// graph does not have cycles.
+//
+// REQUIRES: order is not NULL.
+void GetPostOrder(const Graph& g, std::vector<Node*>* order);
+
+// Stores in *order the reverse post-order numbering of all nodes.
+void GetReversePostOrder(const Graph& g, std::vector<Node*>* order);
+
+// Prune nodes in "g" that are not in some path from the source node
+// to any node in 'nodes'.
+void PruneForReverseReachability(Graph* g,
+ const std::unordered_set<const Node*>& nodes);
+
+// Connect all nodes with no incoming edges to source.
+// Connect all nodes with no outgoing edges to sink.
+void FixupSourceAndSinkEdges(Graph* g);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_ALGORITHM_H_
diff --git a/tensorflow/core/graph/algorithm_test.cc b/tensorflow/core/graph/algorithm_test.cc
new file mode 100644
index 0000000000..48f2e1ebd7
--- /dev/null
+++ b/tensorflow/core/graph/algorithm_test.cc
@@ -0,0 +1,103 @@
+#include "tensorflow/core/graph/algorithm.h"
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/graph/subgraph.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+// TODO(josh11b): Test setting the "device" field of a NodeDef.
+// TODO(josh11b): Test that feeding won't prune targets.
+
+namespace tensorflow {
+namespace {
+
+REGISTER_OP("TestParams").Output("o: float");
+REGISTER_OP("TestInput").Output("a: float").Output("b: float");
+REGISTER_OP("TestMul").Input("a: float").Input("b: float").Output("o: float");
+
+// Checks that the order of nodes in 'inputs' respects the
+// pair orderings described in 'ordered_pairs'.
+bool ExpectBefore(const std::vector<std::pair<string, string>>& ordered_pairs,
+ const std::vector<Node*>& inputs, string* error) {
+ for (const std::pair<string, string>& pair : ordered_pairs) {
+ const string& before_node = pair.first;
+ const string& after_node = pair.second;
+ bool seen_before = false;
+ bool seen_both = false;
+ for (const Node* node : inputs) {
+ if (!seen_before && after_node == node->name()) {
+ *error = strings::StrCat("Saw ", after_node, " before ", before_node);
+ return false;
+ }
+
+ if (before_node == node->name()) {
+ seen_before = true;
+ } else if (after_node == node->name()) {
+ seen_both = seen_before;
+ break;
+ }
+ }
+ if (!seen_both) {
+ *error = strings::StrCat("didn't see either ", before_node, " or ",
+ after_node);
+ return false;
+ }
+ }
+
+ return true;
+}
+
+TEST(AlgorithmTest, ReversePostOrder) {
+ RequireDefaultOps();
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* w1 = SourceOp("TestParams", b.opts().WithName("W1"));
+ Node* w2 = SourceOp("TestParams", b.opts().WithName("W2"));
+ Node* input =
+ SourceOp("TestInput", b.opts().WithName("input").WithControlInput(w1));
+ Node* t1 = BinaryOp("TestMul", w1, {input, 1}, b.opts().WithName("t1"));
+ BinaryOp("TestMul", w1, {input, 1},
+ b.opts().WithName("t2").WithControlInput(t1));
+ BinaryOp("TestMul", w2, {input, 1}, b.opts().WithName("t3"));
+
+ Graph g(OpRegistry::Global());
+ ASSERT_OK(b.ToGraph(&g));
+ std::vector<Node*> order;
+
+ // Test reverse post order:
+ GetReversePostOrder(g, &order);
+
+ // Check that the order respects the dependencies correctly.
+ std::vector<std::pair<string, string>> reverse_orders = {
+ {"W1", "input"}, {"W1", "t1"}, {"W1", "t2"}, {"W1", "t3"},
+ {"input", "t1"}, {"input", "t3"}, {"t1", "t2"}, {"W2", "t3"}};
+ string error;
+ EXPECT_TRUE(ExpectBefore(reverse_orders, order, &error)) << error;
+
+ // A false ordering should fail the check.
+ reverse_orders = {{"input", "W1"}};
+ EXPECT_FALSE(ExpectBefore(reverse_orders, order, &error));
+
+ // Test post order:
+ GetPostOrder(g, &order);
+
+ // Check that the order respects the dependencies correctly.
+ std::vector<std::pair<string, string>> orders = {
+ {"input", "W1"}, {"t1", "W1"}, {"t2", "W1"}, {"t3", "W1"},
+ {"t1", "input"}, {"t3", "input"}, {"t2", "t1"}, {"t3", "W2"}};
+ EXPECT_TRUE(ExpectBefore(orders, order, &error)) << error;
+
+ // A false ordering should fail the check.
+ orders = {{"W1", "t3"}};
+ EXPECT_FALSE(ExpectBefore(orders, order, &error));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/colors.cc b/tensorflow/core/graph/colors.cc
new file mode 100644
index 0000000000..0eb2fc3740
--- /dev/null
+++ b/tensorflow/core/graph/colors.cc
@@ -0,0 +1,25 @@
+#include "tensorflow/core/graph/colors.h"
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Color palette
+// http://www.mulinblog.com/a-color-palette-optimized-for-data-visualization/
+static const char* kColors[] = {
+ "#F15854", // red
+ "#5DA5DA", // blue
+ "#FAA43A", // orange
+ "#60BD68", // green
+ "#F17CB0", // pink
+ "#B2912F", // brown
+ "#B276B2", // purple
+ "#DECF3F", // yellow
+ "#4D4D4D", // gray
+};
+
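+// With the nine-color palette above, indices wrap around: for example,
+// ColorFor(0) and ColorFor(9) both return "#F15854".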
+const char* ColorFor(int dindex) {
+ return kColors[dindex % TF_ARRAYSIZE(kColors)];
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/colors.h b/tensorflow/core/graph/colors.h
new file mode 100644
index 0000000000..150c8dc025
--- /dev/null
+++ b/tensorflow/core/graph/colors.h
@@ -0,0 +1,14 @@
+#ifndef TENSORFLOW_GRAPH_COLORS_H_
+#define TENSORFLOW_GRAPH_COLORS_H_
+
+namespace tensorflow {
+
+// Return a color drawn from a palette to represent an entity
+// identified by "i". The return value has the form "#RRGGBB" Note
+// that the palette has a limited set of colors and therefore colors
+// will be reused eventually.
+const char* ColorFor(int dindex);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_COLORS_H_
diff --git a/tensorflow/core/graph/costmodel.cc b/tensorflow/core/graph/costmodel.cc
new file mode 100644
index 0000000000..89bc41acfd
--- /dev/null
+++ b/tensorflow/core/graph/costmodel.cc
@@ -0,0 +1,308 @@
+#include "tensorflow/core/graph/costmodel.h"
+
+#include "tensorflow/core/framework/step_stats.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace {
+const Microseconds kDefaultTimeEstimate(1);
+const Microseconds kMinTimeEstimate(1);
+} // namespace
+
+void CostModel::SuppressInfrequent() {
+ // Find the median of the non-zero counts, and use half of its value
+ // as the cutoff for a "normal" execution mode node.
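+ //
+ // For example, non-zero counts {1, 3, 5} have median 3, so min_count_
+ // becomes 1 (integer division of 3 / 2).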
+ if (count_.empty()) return;
+ std::vector<int32> non_zero;
+ for (auto v : count_) {
+ if (v > 0) non_zero.push_back(v);
+ }
+ const size_t sz = non_zero.size();
+ if (sz > 0) {
+ std::nth_element(non_zero.begin(), non_zero.begin() + sz / 2,
+ non_zero.end());
+ int32 median_value = non_zero[sz / 2];
+ min_count_ = median_value / 2;
+ VLOG(1) << "num non_zero vals: " << non_zero.size() << " median_value "
+ << median_value;
+ } else {
+ min_count_ = 1;
+ }
+}
+
+void CostModel::MergeFromLocal(const Graph& g, const CostModel& cm) {
+ CHECK(is_global_);
+ CHECK(!cm.is_global());
+ for (const Node* n : g.nodes()) {
+ const int local_id = cm.Id(n);
+ const int global_id = Id(n);
+ if (local_id < 0 || global_id < 0) continue;
+ Ensure(global_id);
+ count_[global_id] += cm.count_[local_id];
+ time_[global_id] += cm.time_[local_id];
+ int num_slots = cm.slot_bytes_[local_id].size();
+ if (num_slots > 0) {
+ if (slot_bytes_[global_id].size() == 0) {
+ slot_bytes_[global_id].resize(num_slots);
+ } else {
+ CHECK_EQ(num_slots, slot_bytes_[global_id].size());
+ }
+ for (int s = 0; s < num_slots; ++s) {
+ slot_bytes_[global_id][s] += cm.slot_bytes_[local_id][s];
+ }
+ }
+ }
+}
+
+void CostModel::MergeFromGlobal(const CostModel& cm) {
+ CHECK(is_global_);
+ CHECK_EQ(true, cm.is_global());
+ const int num_nodes = cm.count_.size();
+ Ensure(num_nodes);
+ for (int i = 0; i < num_nodes; ++i) {
+ count_[i] += cm.count_[i];
+ time_[i] += cm.time_[i];
+ int num_slots = cm.slot_bytes_[i].size();
+ if (num_slots > 0) {
+ if (slot_bytes_[i].size() == 0) {
+ slot_bytes_[i].resize(num_slots);
+ } else {
+ CHECK_EQ(num_slots, slot_bytes_[i].size());
+ }
+ for (int s = 0; s < num_slots; ++s) {
+ slot_bytes_[i][s] += cm.slot_bytes_[i][s];
+ }
+ }
+ }
+}
+
+void CostModel::MergeFromStats(const NodeNameToCostIdMap& map,
+ const StepStats& ss) {
+ CHECK(is_global_);
+ for (auto& ds : ss.dev_stats()) {
+ for (auto& ns : ds.node_stats()) {
+ NodeNameToCostIdMap::const_iterator iter = map.find(ns.node_name());
+ // We don't keep stats for nodes not in the global graph, i.e.
+ // copy/send/recv nodes, feed/fetch, etc.
+ if (iter == map.end()) continue;
+ int32 global_id = iter->second;
+ Ensure(global_id);
+ int64 elapsed_micros = ns.op_end_rel_micros() - ns.op_start_rel_micros();
+ count_[global_id]++;
+ time_[global_id] += elapsed_micros;
+ for (auto& no : ns.output()) {
+ int si = no.slot();
+ if (static_cast<size_t>(si) >= slot_bytes_[global_id].size()) {
+ slot_bytes_[global_id].resize(1 + si);
+ }
+ slot_bytes_[global_id][si] +=
+ no.tensor_description().allocation_description().requested_bytes();
+ }
+ }
+ }
+}
+
+void CostModel::Ensure(int id) {
+ if (slot_bytes_.size() <= static_cast<size_t>(id)) {
+ slot_bytes_.resize(id + 1);
+ count_.resize(id + 1);
+ time_.resize(id + 1);
+ }
+}
+
+void CostModel::SetNumOutputs(const Node* node, int num_outputs) {
+ const int id = Id(node);
+ if (id < 0) return;
+ Ensure(id);
+ auto perslot = &slot_bytes_[id];
+ if (perslot->size() > 0) {
+ CHECK_EQ(num_outputs, perslot->size()) << "Cannot resize slot_bytes, node="
+ << node->name();
+ } else {
+ perslot->resize(num_outputs, Bytes(-1));
+ }
+}
+
+void CostModel::RecordCount(const Node* node, int count) {
+ const int id = Id(node);
+ if (id < 0) return;
+ CHECK_LT(id, count_.size());
+ count_[id] += count;
+}
+
+int32 CostModel::TotalCount(const Node* node) const {
+ const int id = Id(node);
+ if (id < 0) return 0;
+ return (static_cast<size_t>(id) < count_.size()) ? count_[id] : 0;
+}
+
+void CostModel::RecordSize(const Node* node, int slot, Bytes bytes) {
+ const int id = Id(node);
+ if (id < 0) return;
+ CHECK_LT(id, slot_bytes_.size());
+ auto perslot = &slot_bytes_[id];
+ CHECK_LT(slot, perslot->size());
+ auto v = &(*perslot)[slot];
+ if (*v >= 0) {
+ *v += bytes;
+ } else {
+ *v = bytes;
+ }
+}
+
+Bytes CostModel::TotalBytes(const Node* node, int slot) const {
+ const int id = Id(node);
+ if (id < 0 || static_cast<size_t>(id) >= slot_bytes_.size() ||
+ slot_bytes_[id].size() <= static_cast<size_t>(slot)) {
+ return Bytes(0);
+ }
+ return slot_bytes_[id][slot];
+}
+
+Bytes CostModel::SizeEstimate(const Node* node, int slot) const {
+ int32 count = TotalCount(node);
+ if (count < min_count_) return Bytes(0);
+ return TotalBytes(node, slot) / std::max(1, count);
+}
+
+void CostModel::RecordTime(const Node* node, Microseconds time) {
+ const int id = Id(node);
+ if (id < 0) return;
+ DCHECK(node->IsOp()) << node->DebugString();
+ Ensure(id);
+ time_[id] += time;
+}
+
+Microseconds CostModel::TotalTime(const Node* node) const {
+ DCHECK(node->IsOp()) << node->DebugString();
+ const int id = Id(node);
+ if (id < 0 || static_cast<size_t>(id) >= time_.size() ||
+ time_[id] < Microseconds(0)) {
+ return Microseconds(0);
+ }
+ return time_[id];
+}
+
+Microseconds CostModel::TimeEstimate(const Node* node) const {
+ int32 count = TotalCount(node);
+ if (count <= min_count_) return kMinTimeEstimate;
+ return std::max(kMinTimeEstimate, TotalTime(node) / std::max(1, count));
+}
+
+void CostModel::CheckInitialized(const Graph& graph) const {
+ for (const Node* n : graph.nodes()) {
+ if (n->IsOp()) {
+ CHECK(static_cast<size_t>(n->id()) < time_.size() &&
+ time_[n->id()] >= Microseconds(0))
+ << ": no time estimate for " << n->DebugString();
+
+ CHECK(static_cast<size_t>(n->id()) < slot_bytes_.size())
+ << ": no size estimate for " << n->DebugString();
+ const auto& perslot = slot_bytes_[n->id()];
+ for (size_t i = 0; i < perslot.size(); i++) {
+ CHECK_GE(perslot[i], Bytes(0)) << ": no size estimate for output# " << i
+ << " of " << n->DebugString();
+ }
+ }
+ }
+}
+
+Microseconds CostModel::CopyTimeEstimate(Bytes b, double network_latency_millis,
+ double estimated_gbps) {
+ // TODO(jeff,sanjay): estimate cost based on bandwidth along the
+ // communication path and the type of transport we are using between
+ // devices.
+ //
+ // We assume the copy time follows a linear model:
+ // copy_time = copy_bytes / rate + min_time
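+ // For example (illustrative numbers): with estimated_gbps = 1 and
+ // network_latency_millis = 1, bytes_per_usec = 125, so copying 1 MB
+ // takes roughly 1e6 / 125 + 1000 = 9000 microseconds.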
+ int64 copy_bytes = b.value();
+ const double bytes_per_usec = estimated_gbps * 1000.0 / 8;
+ const double min_micros = network_latency_millis * 1000.0;
+ return Microseconds(
+ static_cast<int64>(copy_bytes / bytes_per_usec + min_micros));
+}
+
+Microseconds CostModel::ComputationTimeEstimate(int64 math_ops) {
+ // TODO(jeff,sanjay): Eventually we should pass in the type of device
+ // (GPU vs. CPU) and use that to affect the estimate.
+
+ // We treat math_ops as a count of multiply-adds (madds) and divide by
+ // 1000 to convert it to microseconds, assuming roughly 1000 madds per
+ // microsecond (~1 GHz for one core).
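+ // For example (illustrative numbers): math_ops = 2,000,000 madds yields an
+ // estimate of about 2000 microseconds.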
+ return Microseconds(math_ops / 1000);
+}
+
+// ----------------------------------------------------------------------------
+// InitCostModel
+// ----------------------------------------------------------------------------
+
+namespace {
+
+static void AddNodesToCostModel(const Graph& g, CostModel* cost_model) {
+ for (Node* n : g.nodes()) {
+ const int num_outputs = n->num_outputs();
+ cost_model->SetNumOutputs(n, num_outputs);
+ for (int output = 0; output < num_outputs; output++) {
+ // Set up an initial bogus estimate for the node's outputs
+ cost_model->RecordSize(n, output, Bytes(1));
+ }
+ }
+}
+
+static void AssignSizes(const Graph& g, CostModel* cost_model) {
+ for (const Edge* e : g.edges()) {
+ // Skip if it is a control edge.
+ if (e->IsControlEdge()) {
+ continue;
+ }
+ Node* src = e->src();
+
+ // TODO(josh11b): Get an estimate from the Op
+ Bytes size(1);
+ cost_model->RecordSize(src, e->src_output(), size);
+ }
+}
+
+// Generates an extremely simple initial guess for the computation cost of
+// each node. For ordinary Ops this value is quickly overwritten by real
+// runtime measurements. For other Ops we never record measurements, so
+// suppression of infrequent Ops ends up giving them zero cost; the guess
+// therefore matters little outside of tests.
+static Microseconds TimeEstimateForNode(CostModel* cost_model, Node* n) {
+ CHECK(n->IsOp());
+ VLOG(2) << "Node " << n->id() << ": " << n->name()
+ << " type_string: " << n->type_string();
+ if (IsConstant(n) || IsVariable(n)) {
+ return Microseconds(0);
+ }
+ return kDefaultTimeEstimate;
+}
+
+static void EstimateComputationCosts(const Graph& g, CostModel* cost_model) {
+ for (Node* n : g.nodes()) {
+ if (!n->IsOp()) continue;
+ cost_model->RecordTime(n, TimeEstimateForNode(cost_model, n));
+ }
+}
+
+} // namespace
+
+void CostModel::InitFromGraph(const Graph& g) {
+ AddNodesToCostModel(g, this);
+ AssignSizes(g, this);
+ EstimateComputationCosts(g, this);
+ CheckInitialized(g);
+}
+
+void CostModel::WriteToLog() {
+ LOG(INFO) << " min_count_=" << min_count_;
+ for (size_t i = 0; i < count_.size(); ++i) {
+ LOG(INFO) << "Node " << i << " count " << count_[i] << " total time "
+ << time_[i] << " avg time "
+ << (time_[i] / (std::max(1, count_[i])));
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/costmodel.h b/tensorflow/core/graph/costmodel.h
new file mode 100644
index 0000000000..4d7dd65f5a
--- /dev/null
+++ b/tensorflow/core/graph/costmodel.h
@@ -0,0 +1,123 @@
+#ifndef TENSORFLOW_GRAPH_COSTMODEL_H_
+#define TENSORFLOW_GRAPH_COSTMODEL_H_
+
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/types.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+namespace tensorflow {
+typedef std::unordered_map<string, int32> NodeNameToCostIdMap;
+
+class StepStats;
+
+// CostModel keeps track of the following runtime statistics for nodes
+// of a single Graph:
+// * The total number of times a node has executed.
+// * The accumulated execution time (in microseconds) of a node.
+// * The accumulated size (in bytes) of each node's output.
+//
+// This class is NOT thread-safe.
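+//
+// A minimal usage sketch (assumed call order; "g" is a populated Graph and
+// "ss" a StepStats collected by the runtime):
+//   CostModel cm(true /* is_global */);
+//   cm.InitFromGraph(g);
+//   cm.MergeFromStats(name_to_cost_id_map, ss);
+//   Microseconds t = cm.TimeEstimate(node);
+//   Bytes b = cm.SizeEstimate(node, 0 /* output_slot */);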
+class CostModel {
+ public:
+ // If "global" is true, maintains costs based on Node::cost_id, otherwise
+ // maintains costs based on Node::id.
+ explicit CostModel(bool is_global) : is_global_(is_global) {}
+
+ // Assigns min_count_ as a function of the median count for a Node.
+ // This value is then used for suppressing the time/size costs of
+ // infrequent operations.
+ // NOTE(tucker): Maybe this should move to a subclass of CostModel.
+ void SuppressInfrequent();
+
+ bool is_global() const { return is_global_; }
+
+ // Initializes cost model for 'g'.
+ void InitFromGraph(const Graph& g);
+
+ // Merges costs from cm.
+ // REQUIRES: is_global_ is true for this and for "cm"
+ void MergeFromGlobal(const CostModel& cm);
+
+ // Merges costs from "cm", which has been computed relative to "g".
+ // REQUIRES: is_global_ is true for this, and false for "cm".
+ void MergeFromLocal(const Graph& g, const CostModel& cm);
+
+ void MergeFromStats(const NodeNameToCostIdMap& map, const StepStats& ss);
+
+ // Sets the number of outputs of "node".
+ void SetNumOutputs(const Node* node, int num_outputs);
+
+ // Records that "node" has executed "num_count" more times.
+ void RecordCount(const Node* node, int num_count);
+
+ // Returns how many times "node" has been executed.
+ int32 TotalCount(const Node* node) const;
+
+ // Records that "output_slot" of "node" has produced tensors of
+ // aggregated "bytes".
+ void RecordSize(const Node* node, int output_slot, Bytes bytes);
+
+ // Returns the total bytes of tensors produced by output "output_slot" of "node".
+ Bytes TotalBytes(const Node* node, int output_slot) const;
+
+ // Returns a prediction for the size of the tensor at the
+ // output_slot produced by one execution of "node".
+ Bytes SizeEstimate(const Node* node, int output_slot) const;
+
+ // Records that Executions of "node" have taken "time" microseconds.
+ void RecordTime(const Node* node, Microseconds time);
+
+ // Returns the total execution time for "node".
+ Microseconds TotalTime(const Node* node) const;
+
+ // Returns a prediction for the time of one execution of "node".
+ Microseconds TimeEstimate(const Node* node) const;
+
+ // Check that an estimate is available for every OP node in graph.
+ void CheckInitialized(const Graph& graph) const;
+
+ // Helper routines to encapsulate static estimation heuristics.
+
+ // Compute an estimate of the time to copy "b" bytes over the network,
+ // given a fixed cost of "network_latency_millis" milliseconds and
+ // an estimated bandwidth of "estimated_gbps" gigabits per second (note that
+ // this value is in gigabits, not gigabytes).
+ static Microseconds CopyTimeEstimate(Bytes b, double network_latency_millis,
+ double estimated_gbps);
+ static Microseconds ComputationTimeEstimate(int64 mathops);
+
+ // Write the contents of the CostModel to the INFO log.
+ void WriteToLog();
+
+ private:
+ const bool is_global_;
+ inline int Id(const Node* n) const {
+ if (is_global_) {
+ return n->cost_id();
+ } else {
+ return n->id();
+ }
+ }
+ // Resizes vectors so that they are large enough for "id".
+ void Ensure(int id);
+
+ // Nodes and Edges whose count is < this value
+ // get time/byte estimates of 0.
+ int32 min_count_ = 0;
+
+ // Number of times each Node has been executed.
+ std::vector<int32> count_;
+ // Cumulative execution time.
+ std::vector<Microseconds> time_;
+ // Cumulative Bytes output on each channel.
+ std::vector<gtl::InlinedVector<Bytes, 2> > slot_bytes_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(CostModel);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_COSTMODEL_H_
diff --git a/tensorflow/core/graph/costutil.cc b/tensorflow/core/graph/costutil.cc
new file mode 100644
index 0000000000..f8e2d9fe68
--- /dev/null
+++ b/tensorflow/core/graph/costutil.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/graph/costutil.h"
+
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/costmodel.h"
+
+namespace tensorflow {
+
+std::vector<int64> LongestOutgoingPathCost(const Graph& graph,
+ const CostModel& cm) {
+ std::vector<int64> result(graph.num_node_ids());
+ DFS(graph, nullptr, [&result, &cm](Node* n) {
+ int64 max_child = 0;
+ for (const Node* out : n->out_nodes()) {
+ max_child = std::max(max_child, result[out->id()]);
+ }
+ result[n->id()] = max_child + (n->IsOp() ? cm.TimeEstimate(n).value() : 0);
+ });
+ return result;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/costutil.h b/tensorflow/core/graph/costutil.h
new file mode 100644
index 0000000000..46e5215132
--- /dev/null
+++ b/tensorflow/core/graph/costutil.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_GRAPH_COSTUTIL_H_
+#define TENSORFLOW_GRAPH_COSTUTIL_H_
+
+#include <vector>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class CostModel;
+class Graph;
+
+// result[i] is an estimate of the longest execution path from
+// the node with id i to the sink node.
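+//
+// A usage sketch (assumed; "cm" is a CostModel built for "graph"):
+//   std::vector<int64> costs = LongestOutgoingPathCost(graph, cm);
+//   int64 remaining_cost = costs[node->id()];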
+std::vector<int64> LongestOutgoingPathCost(const Graph& graph,
+ const CostModel& cm);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_COSTUTIL_H_
diff --git a/tensorflow/core/graph/default_device.h b/tensorflow/core/graph/default_device.h
new file mode 100644
index 0000000000..30cd4e8a57
--- /dev/null
+++ b/tensorflow/core/graph/default_device.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_GRAPH_DEFAULT_DEVICE_H_
+#define TENSORFLOW_GRAPH_DEFAULT_DEVICE_H_
+
+#include <string>
+
+#include "tensorflow/core/framework/graph.pb.h"
+
+namespace tensorflow {
+namespace graph {
+
+// Sets the default device for all nodes in graph_def to "device",
+// only if not already set.
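+// For example, graph::SetDefaultDevice("/cpu:0", &graph_def) assigns
+// "/cpu:0" to every node whose device field is still empty.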
+inline void SetDefaultDevice(const string& device, GraphDef* graph_def) {
+ for (int i = 0; i < graph_def->node_size(); ++i) {
+ auto node = graph_def->mutable_node(i);
+ if (node->device().empty()) {
+ node->set_device(device);
+ }
+ }
+}
+
+} // namespace graph
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_DEFAULT_DEVICE_H_
diff --git a/tensorflow/core/graph/dot.cc b/tensorflow/core/graph/dot.cc
new file mode 100644
index 0000000000..6d6e46ce61
--- /dev/null
+++ b/tensorflow/core/graph/dot.cc
@@ -0,0 +1,289 @@
+#include "tensorflow/core/graph/dot.h"
+
+#include <map>
+#include <unordered_map>
+#include <unordered_set>
+
+#include "tensorflow/core/graph/colors.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/regexp.h"
+#include "tensorflow/core/util/util.h"
+
+namespace tensorflow {
+
+static string GraphNodeName(const DotOptions& opts, const Node* n) {
+ return strings::StrCat("N", n->id());
+}
+
+bool ShouldDisplayOpType(const Node* n) {
+ if (n->type_string() == "NoOp") {
+ return false;
+ }
+ const string& node_name = n->def().name();
+ if (node_name.find(n->type_string() + "_") == 0) {
+ return false;
+ }
+ return true;
+}
+
+string DotGraph(const Graph& g, const DotOptions& opts) {
+ RegexpStringPiece flag(opts.prefix_collapse_regexp);
+ if (flag == "all") {
+ flag = ".";
+ } else if (flag == "none") {
+ flag = "^$";
+ }
+ RE2 cluster_name_pattern(flag);
+ string result;
+ strings::StrAppend(&result, "digraph G {\n");
+ strings::StrAppend(&result, "rankdir=\"BT\"\n");
+
+ std::map<string, int> device_index; // Map from device name to index.
+ std::unordered_set<Node*> visible_nodes; // Nodes to display.
+ // Cluster name => set of nodes.
+ std::unordered_map<string, std::unordered_set<Node*> > clusters;
+ // Node* => Cluster
+ std::unordered_map<Node*, string> node_cluster;
+ for (Node* src : g.nodes()) {
+ if (opts.include_node_function != nullptr &&
+ !opts.include_node_function(src)) {
+ continue;
+ }
+ // Do not display source and sink nodes
+ if (src->IsSource() || src->IsSink()) {
+ continue;
+ }
+ visible_nodes.insert(src);
+ const string name_prefix = NodeNamePrefix(src->def().name()).ToString();
+ if (!name_prefix.empty()) {
+ clusters[name_prefix].insert(src);
+ node_cluster[src] = name_prefix;
+ }
+ // Record device if present.
+ if (src->IsOp()) {
+ const string& d = src->assigned_device_name();
+ if (!d.empty()) {
+ device_index[d] = -1; // Assigned later
+ }
+ }
+ }
+
+ // Add nodes whose name is exactly a cluster name to the cluster itself.
+ for (Node* src : g.nodes()) {
+ if (node_cluster.count(src) == 0) {
+ const string name = src->def().name();
+ auto it = clusters.find(name);
+ if (it != clusters.end()) {
+ it->second.insert(src);
+ node_cluster[src] = name;
+ }
+ }
+ }
+
+ auto node_in_collapsed_cluster = [&node_cluster,
+ &cluster_name_pattern](Node* n) {
+ return node_cluster.count(n) > 0 &&
+ RE2::PartialMatch(node_cluster[n], cluster_name_pattern);
+ };
+
+ // Assign device indices in sorted order.
+ int num = 0;
+ for (auto& e : device_index) {
+ e.second = num++;
+ }
+
+ double total_node_cost = 0;
+ double avg_node_cost = 1;
+ if (opts.node_cost) {
+ int node_count = 0;
+ for (const Node* n : g.nodes()) {
+ total_node_cost += opts.node_cost(n);
+ ++node_count;
+ }
+ if (total_node_cost > 0) avg_node_cost = total_node_cost / node_count;
+ }
+
+ for (Node* src : g.nodes()) {
+ if (visible_nodes.count(src) == 0 || node_in_collapsed_cluster(src)) {
+ continue;
+ }
+ string label = src->name();
+ if (ShouldDisplayOpType(src)) {
+ // Append the op type if it is not directly deducible from the op name.
+ strings::StrAppend(&label, "\\n(", src->type_string(), ")");
+ }
+ const char* shape = "box";
+ const char* color = nullptr;
+ if (src->IsSource()) {
+ shape = "oval";
+ } else if (src->IsSink()) {
+ shape = "oval";
+ } else {
+ const string& d = src->assigned_device_name();
+ const int dindex = (!d.empty()) ? device_index[d] : -1;
+ if (dindex >= 0) {
+ color = ColorFor(dindex);
+ }
+
+ shape = "box";
+ }
+
+ if (opts.node_label) {
+ string extra = opts.node_label(src);
+ if (!extra.empty()) {
+ strings::StrAppend(&label, "\\n", extra);
+ }
+ }
+
+ strings::StrAppend(&result, GraphNodeName(opts, src), "[shape=", shape,
+ ", label=\"", label, "\"");
+ if (opts.node_cost && total_node_cost > 0) {
+ // Pick fontsize in range [8..40] so that area is proportional to cost.
+ const double cost = opts.node_cost(src);
+ const double relcost = fabs(cost / avg_node_cost);
+ // Average cost node has font size of 12.
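+ // e.g., a node costing 4x the average (relcost = 4.0) gets fontsize 16.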
+ const int fs = 8 + static_cast<int>(4.0 * std::min(sqrt(relcost), 8.0));
+ strings::StrAppend(&result, ", width=0, height=0, fontsize=", fs);
+ VLOG(2) << "Node: " << cost << " => " << relcost << " => " << fs;
+ }
+ if (color != nullptr) {
+ strings::StrAppend(&result, ", fillcolor=\"", color,
+ "\", fontcolor=\"white\", style=\"filled\"");
+ }
+ strings::StrAppend(&result, "]\n");
+ }
+
+ for (const auto& c : clusters) {
+ const string& cluster_name = c.first;
+ const std::unordered_set<Node*>& nodes = c.second;
+ std::unordered_map<string, int> node_colors;
+ for (auto n : nodes) {
+ const string& d = n->assigned_device_name();
+ const int dindex = (!d.empty()) ? device_index[d] : -1;
+ if (dindex >= 0) {
+ ++node_colors[ColorFor(dindex)];
+ }
+ }
+
+ string majority_color;
+ if (node_colors.empty()) {
+ majority_color = ColorFor(0);
+ } else {
+ majority_color = std::max_element(node_colors.begin(), node_colors.end(),
+ [](const std::pair<string, int>& x,
+ const std::pair<string, int>& y) {
+ return x.second < y.second;
+ })
+ ->first;
+ }
+
+ if (!RE2::PartialMatch(cluster_name, cluster_name_pattern)) {
+ strings::StrAppend(&result, "subgraph cluster_", cluster_name, "{\n");
+ for (auto n : nodes) {
+ strings::StrAppend(&result, GraphNodeName(opts, n), ";\n");
+ }
+ strings::StrAppend(&result, "}\n");
+ } else {
+ strings::StrAppend(&result, cluster_name, " [shape=oval, fillcolor=\"",
+ majority_color, "\", label=\"", cluster_name,
+ "\", style=\"filled\", fontcolor=\"white\"]\n");
+ }
+ }
+
+ std::unordered_set<string> edge_drawn;
+
+ double max_edge_cost = 0;
+ double total_edge_cost = 0;
+ double avg_edge_cost = 1;
+ if (opts.edge_cost && g.edges().size()) {
+ for (const Edge* e : g.edges()) {
+ auto cost = opts.edge_cost(e);
+ total_edge_cost += cost;
+ max_edge_cost = std::max(max_edge_cost, cost);
+ }
+ avg_edge_cost = total_edge_cost / g.edges().size();
+ }
+ VLOG(2) << "Edge cost tot/max/avg: " << total_edge_cost << "/"
+ << max_edge_cost << "/" << avg_edge_cost;
+
+ for (const Edge* e : g.edges()) {
+ Node* src = e->src();
+ Node* dst = e->dst();
+ // If either endpoint isn't drawn in the graph, don't draw the edge
+ if (visible_nodes.count(src) == 0 || visible_nodes.count(dst) == 0) {
+ continue;
+ }
+
+ const string src_name = node_in_collapsed_cluster(src)
+ ? node_cluster[src]
+ : GraphNodeName(opts, src);
+ const string dst_name = node_in_collapsed_cluster(dst)
+ ? node_cluster[dst]
+ : GraphNodeName(opts, dst);
+ // Don't draw self edges
+ if (src_name == dst_name) {
+ continue;
+ }
+ // And previously drawn edges.
+ const string& edge_name = strings::StrCat(src_name, ":", dst_name);
+ if (edge_drawn.count(edge_name) > 0) {
+ continue;
+ }
+ edge_drawn.insert(edge_name);
+
+ strings::StrAppend(&result, src_name, " -> ", dst_name, "[");
+ string label;
+ if (e->IsControlEdge()) {
+ strings::StrAppend(&result, " style=dotted");
+ }
+ if (opts.edge_label) {
+ label = opts.edge_label(e);
+ if (!label.empty()) {
+ strings::StrAppend(&result, " label=<", label, ">");
+ }
+ }
+ // Make edge widths proportional to amount of data transferred.
+ if (opts.edge_cost && max_edge_cost > 0) {
+ const double cost = opts.edge_cost(e);
+ const double relcost = fabs(cost / avg_edge_cost);
+ // Pick penwidth in range [1..6] so that width is proportional to cost.
+ const int pw = 1 + std::min(5, static_cast<int>(2.0 * relcost));
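+ // e.g., an average-cost edge (relcost = 1.0) gets penwidth 3.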
+ strings::StrAppend(&result, " penwidth=", pw);
+ // Use weight attributes [1..100] to keep heavier edges more vertical.
+ const int weight = 1 + std::min(99, static_cast<int>(100.0 * relcost));
+ strings::StrAppend(&result, " weight=", weight);
+ VLOG(2) << "Edge: " << cost << " => " << relcost << " => " << pw << "/"
+ << weight;
+ }
+
+ strings::StrAppend(&result, "]\n");
+ }
+ // Compute some statistics
+ int op_nodes = 0;
+ for (Node* n : g.nodes()) {
+ if (n->IsOp()) {
+ op_nodes++;
+ }
+ }
+
+ // Emit legend
+ strings::StrAppend(&result,
+ "{ rank = source; Legend [shape=box, margin=0, label=<",
+ "<TABLE BORDER=\"0\" CELLBORDER=\"1\" CELLSPACING=\"0\" ",
+ "CELLPADDING=\"4\">", "<TR><TD COLSPAN=\"2\">op_nodes: ",
+ op_nodes, "</TD></TR>\n");
+ for (const auto& e : device_index) {
+ const int dindex = e.second;
+ strings::StrAppend(&result, "<TR><TD BGCOLOR=\"", ColorFor(dindex),
+ "\"><FONT COLOR=\"white\">", dindex, "</FONT></TD><TD>",
+ e.first, "</TD></TR>\n");
+ }
+ strings::StrAppend(&result, "</TABLE>>]}\n");
+
+ strings::StrAppend(&result, "}\n"); // End digraph
+ return result;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/dot.h b/tensorflow/core/graph/dot.h
new file mode 100644
index 0000000000..f87f68099c
--- /dev/null
+++ b/tensorflow/core/graph/dot.h
@@ -0,0 +1,43 @@
+#ifndef TENSORFLOW_GRAPH_DOT_H_
+#define TENSORFLOW_GRAPH_DOT_H_
+
+#include <functional>
+#include <string>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class Edge;
+class Graph;
+class Node;
+
+struct DotOptions {
+ bool (*include_node_function)(const Node*) = nullptr;
+
+ // By default, all nodes with the same name prefix are collapsed into
+ // a single node in the dot graph. This regexp can be changed so that
+ // only prefixes that match the regexp are collapsed in this fashion.
+ // 'all' collapses all ops with prefixes, 'none' disables all collapsing.
+ string prefix_collapse_regexp = "all";
+
+ // A function that returns a label to embed into the per-node display.
+ std::function<string(const Node*)> node_label;
+
+ // A function that returns a label to attach to an edge.
+ std::function<string(const Edge*)> edge_label;
+
+ // A function that returns the "cost" of the node. The dot display
+ // makes a node's size proportional to its cost.
+ std::function<double(const Node*)> node_cost;
+
+ // A function that returns the "cost" of the edge. The dot display
+ // makes an edge's thickness proportional to its cost.
+ std::function<double(const Edge*)> edge_cost;
+};
+
+// Return a string that contains a graphviz specification of the graph.
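+//
+// A usage sketch (assumed options):
+//   DotOptions opts;
+//   opts.prefix_collapse_regexp = "none";  // keep every node separate
+//   string dot = DotGraph(graph, opts);    // render with graphviz, e.g. "dot -Tpng"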
+string DotGraph(const Graph& g, const DotOptions& opts);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_DOT_H_
diff --git a/tensorflow/core/graph/edgeset.cc b/tensorflow/core/graph/edgeset.cc
new file mode 100644
index 0000000000..83293c7b4e
--- /dev/null
+++ b/tensorflow/core/graph/edgeset.cc
@@ -0,0 +1,56 @@
+#include "tensorflow/core/graph/edgeset.h"
+
+namespace tensorflow {
+
+std::pair<EdgeSet::const_iterator, bool> EdgeSet::insert(value_type value) {
+ RegisterMutation();
+ const_iterator ci;
+ ci.Init(this);
+ auto s = get_set();
+ if (!s) {
+ for (int i = 0; i < kInline; i++) {
+ if (ptrs_[i] == value) {
+ ci.array_iter_ = &ptrs_[i];
+ return std::make_pair(ci, false);
+ }
+ }
+ for (int i = 0; i < kInline; i++) {
+ if (ptrs_[i] == nullptr) {
+ ptrs_[i] = value;
+ ci.array_iter_ = &ptrs_[i];
+ return std::make_pair(ci, true);
+ }
+ }
+ // array is full. convert to set.
+ s = new std::set<const Edge*>;
+ for (int i = 0; i < kInline; i++) {
+ s->insert(static_cast<const Edge*>(ptrs_[i]));
+ }
+ ptrs_[0] = this;
+ ptrs_[1] = s;
+ // fall through.
+ }
+ auto p = s->insert(value);
+ ci.tree_iter_ = p.first;
+ return std::make_pair(ci, p.second);
+}
+
+EdgeSet::size_type EdgeSet::erase(key_type key) {
+ RegisterMutation();
+ auto s = get_set();
+ if (!s) {
+ for (int i = 0; i < kInline; i++) {
+ if (ptrs_[i] == key) {
+ size_t n = size();
+ ptrs_[i] = ptrs_[n - 1];
+ ptrs_[n - 1] = nullptr;
+ return 1;
+ }
+ }
+ return 0;
+ } else {
+ return s->erase(key);
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/edgeset.h b/tensorflow/core/graph/edgeset.h
new file mode 100644
index 0000000000..df0d78b8fb
--- /dev/null
+++ b/tensorflow/core/graph/edgeset.h
@@ -0,0 +1,216 @@
+#ifndef TENSORFLOW_GRAPH_EDGESET_H_
+#define TENSORFLOW_GRAPH_EDGESET_H_
+
+#include <stddef.h>
+#include <set>
+#include "tensorflow/core/platform/port.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+
+class Edge;
+
+// An unordered set of edges. Uses very little memory for small sets.
+// Unlike std::set, EdgeSet does NOT allow mutations during iteration.
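+//
+// A minimal usage sketch (assumed; e1/e2 are const Edge*):
+//   EdgeSet edges;
+//   edges.insert(e1);   // stored inline while the set holds <= 2 elements
+//   edges.insert(e2);
+//   for (const Edge* e : edges) { /* read-only; no insert/erase while iterating */ }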
+class EdgeSet {
+ public:
+ EdgeSet();
+ ~EdgeSet();
+
+ typedef const Edge* key_type;
+ typedef const Edge* value_type;
+ typedef size_t size_type;
+ typedef ptrdiff_t difference_type;
+
+ class const_iterator;
+ typedef const_iterator iterator;
+
+ bool empty() const;
+ size_type size() const;
+ void clear();
+ std::pair<iterator, bool> insert(value_type value);
+ size_type erase(key_type key);
+
+ // Caller is not allowed to mutate the EdgeSet while iterating.
+ const_iterator begin() const;
+ const_iterator end() const;
+
+ private:
+ // Up to kInline elements are stored directly in ptrs_ (nullptr means none).
+ // If ptrs_[0] == this then ptrs_[1] points to a set<const Edge*>.
+ static const int kInline = 2; // Must be >= 2.
+ const void* ptrs_[kInline];
+
+ std::set<const Edge*>* get_set() const {
+ if (ptrs_[0] == this) {
+ return static_cast<std::set<const Edge*>*>(const_cast<void*>(ptrs_[1]));
+ } else {
+ return nullptr;
+ }
+ }
+
+// To detect mutations while iterating.
+#ifdef NDEBUG
+ void RegisterMutation() {}
+#else
+ uint32 mutations_ = 0;
+ void RegisterMutation() { mutations_++; }
+#endif
+
+ TF_DISALLOW_COPY_AND_ASSIGN(EdgeSet);
+};
+
+class EdgeSet::const_iterator {
+ public:
+ typedef typename EdgeSet::value_type value_type;
+ typedef const typename EdgeSet::value_type& reference;
+ typedef const typename EdgeSet::value_type* pointer;
+ typedef typename EdgeSet::difference_type difference_type;
+ typedef std::forward_iterator_tag iterator_category;
+
+ const_iterator() {}
+
+ const_iterator& operator++();
+ const_iterator operator++(int /*unused*/);
+ const value_type* operator->() const;
+ value_type operator*() const;
+ bool operator==(const const_iterator& other) const;
+ bool operator!=(const const_iterator& other) const {
+ return !(*this == other);
+ }
+
+ private:
+ friend class EdgeSet;
+
+ void const* const* array_iter_ = nullptr;
+ typename std::set<const Edge*>::const_iterator tree_iter_;
+
+#ifdef NDEBUG
+ inline void Init(const EdgeSet* e) {}
+ inline void CheckNoMutations() const {}
+#else
+ inline void Init(const EdgeSet* e) {
+ owner_ = e;
+ init_mutations_ = e->mutations_;
+ }
+ inline void CheckNoMutations() const {
+ CHECK_EQ(init_mutations_, owner_->mutations_);
+ }
+ const EdgeSet* owner_ = nullptr;
+ uint32 init_mutations_ = 0;
+#endif
+};
+
+inline EdgeSet::EdgeSet() {
+ for (int i = 0; i < kInline; i++) {
+ ptrs_[i] = nullptr;
+ }
+}
+
+inline EdgeSet::~EdgeSet() { delete get_set(); }
+
+inline bool EdgeSet::empty() const { return size() == 0; }
+
+inline EdgeSet::size_type EdgeSet::size() const {
+ auto s = get_set();
+ if (s) {
+ return s->size();
+ } else {
+ size_t result = 0;
+ for (int i = 0; i < kInline; i++) {
+ if (ptrs_[i]) result++;
+ }
+ return result;
+ }
+}
+
+inline void EdgeSet::clear() {
+ RegisterMutation();
+ delete get_set();
+ for (int i = 0; i < kInline; i++) {
+ ptrs_[i] = nullptr;
+ }
+}
+
+inline EdgeSet::const_iterator EdgeSet::begin() const {
+ const_iterator ci;
+ ci.Init(this);
+ auto s = get_set();
+ if (s) {
+ ci.tree_iter_ = s->begin();
+ } else {
+ ci.array_iter_ = &ptrs_[0];
+ }
+ return ci;
+}
+
+inline EdgeSet::const_iterator EdgeSet::end() const {
+ const_iterator ci;
+ ci.Init(this);
+ auto s = get_set();
+ if (s) {
+ ci.tree_iter_ = s->end();
+ } else {
+ ci.array_iter_ = &ptrs_[size()];
+ }
+ return ci;
+}
+
+inline EdgeSet::const_iterator& EdgeSet::const_iterator::operator++() {
+ CheckNoMutations();
+ if (array_iter_ != nullptr) {
+ ++array_iter_;
+ } else {
+ ++tree_iter_;
+ }
+ return *this;
+}
+
+inline EdgeSet::const_iterator EdgeSet::const_iterator::operator++(
+ int /*unused*/) {
+ CheckNoMutations();
+ const_iterator tmp = *this;
+ operator++();
+ return tmp;
+}
+
+// gcc's set and multiset always use const_iterator, since a non-const
+// iterator would otherwise allow modification of the keys.
+inline const EdgeSet::const_iterator::value_type* EdgeSet::const_iterator::
+operator->() const {
+ CheckNoMutations();
+ if (array_iter_ != nullptr) {
+ return reinterpret_cast<const value_type*>(array_iter_);
+ } else {
+ return tree_iter_.operator->();
+ }
+}
+
+// gcc's set and multiset always use const_iterator, since a non-const
+// iterator would otherwise allow modification of the keys.
+inline EdgeSet::const_iterator::value_type EdgeSet::const_iterator::operator*()
+ const {
+ CheckNoMutations();
+ if (array_iter_ != nullptr) {
+ return static_cast<value_type>(*array_iter_);
+ } else {
+ return *tree_iter_;
+ }
+}
+
+inline bool EdgeSet::const_iterator::operator==(
+ const const_iterator& other) const {
+ DCHECK((array_iter_ == nullptr) == (other.array_iter_ == nullptr))
+ << "Iterators being compared must be from same set that has not "
+ << "been modified since the iterator was constructed";
+ CheckNoMutations();
+ if (array_iter_ != nullptr) {
+ return array_iter_ == other.array_iter_;
+ } else {
+ return other.array_iter_ == nullptr && tree_iter_ == other.tree_iter_;
+ }
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_EDGESET_H_
diff --git a/tensorflow/core/graph/edgeset_test.cc b/tensorflow/core/graph/edgeset_test.cc
new file mode 100644
index 0000000000..7909e8ea0a
--- /dev/null
+++ b/tensorflow/core/graph/edgeset_test.cc
@@ -0,0 +1,95 @@
+#include "tensorflow/core/graph/edgeset.h"
+
+#include "tensorflow/core/graph/graph.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+class EdgeSetTest : public ::testing::Test {
+ public:
+ EdgeSetTest() : edges_(nullptr), eset_(nullptr) {}
+
+ ~EdgeSetTest() override {
+ delete eset_;
+ delete[] edges_;
+ }
+
+ void MakeEdgeSet(int n) {
+ delete eset_;
+ delete[] edges_;
+ edges_ = new Edge[n];
+ eset_ = new EdgeSet;
+ model_.clear();
+ for (int i = 0; i < n; i++) {
+ eset_->insert(&edges_[i]);
+ model_.insert(&edges_[i]);
+ }
+ }
+
+ void CheckSame() {
+ EXPECT_EQ(model_.size(), eset_->size());
+ EXPECT_EQ(model_.empty(), eset_->empty());
+ std::vector<const Edge*> modelv(model_.begin(), model_.end());
+ std::vector<const Edge*> esetv(eset_->begin(), eset_->end());
+ std::sort(modelv.begin(), modelv.end());
+ std::sort(esetv.begin(), esetv.end());
+ EXPECT_EQ(modelv.size(), esetv.size());
+ for (size_t i = 0; i < modelv.size(); i++) {
+ EXPECT_EQ(modelv[i], esetv[i]) << i;
+ }
+ }
+
+ Edge nonexistent_;
+ Edge* edges_;
+ EdgeSet* eset_;
+ std::set<const Edge*> model_;
+};
+
+namespace {
+
+TEST_F(EdgeSetTest, Ops) {
+ for (int n : {0, 1, 2, 3, 4, 10}) {
+ MakeEdgeSet(n);
+ CheckSame();
+ EXPECT_EQ((n == 0), eset_->empty());
+ EXPECT_EQ(n, eset_->size());
+
+ eset_->clear();
+ model_.clear();
+ CheckSame();
+
+ eset_->insert(&edges_[0]);
+ model_.insert(&edges_[0]);
+ CheckSame();
+ }
+}
+
+// Try insert/erase of existing elements at different positions.
+TEST_F(EdgeSetTest, Exists) {
+ for (int n : {0, 1, 2, 3, 4, 10}) {
+ MakeEdgeSet(n);
+ for (int pos = 0; pos < n; pos++) {
+ MakeEdgeSet(n);
+ auto p = eset_->insert(&edges_[pos]);
+ EXPECT_FALSE(p.second);
+ EXPECT_EQ(&edges_[pos], *p.first);
+
+ EXPECT_EQ(1, eset_->erase(&edges_[pos]));
+ model_.erase(&edges_[pos]);
+ CheckSame();
+ }
+ }
+}
+
+// Try insert/erase of non-existent element.
+TEST_F(EdgeSetTest, DoesNotExist) {
+ for (int n : {0, 1, 2, 3, 4, 10}) {
+ MakeEdgeSet(n);
+ EXPECT_EQ(0, eset_->erase(&nonexistent_));
+ auto p = eset_->insert(&nonexistent_);
+ EXPECT_TRUE(p.second);
+ EXPECT_EQ(&nonexistent_, *p.first);
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/equal_graph_def.cc b/tensorflow/core/graph/equal_graph_def.cc
new file mode 100644
index 0000000000..35f59b5ed0
--- /dev/null
+++ b/tensorflow/core/graph/equal_graph_def.cc
@@ -0,0 +1,176 @@
+#include "tensorflow/core/graph/equal_graph_def.h"
+
+#include <unordered_map>
+#include <unordered_set>
+#include "tensorflow/core/framework/attr_value_util.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+bool EqualGraphDef(const GraphDef& actual, const GraphDef& expected,
+ string* diff) {
+ std::unordered_map<string, const NodeDef*> actual_index;
+ for (const NodeDef& node : actual.node()) {
+ actual_index[node.name()] = &node;
+ }
+
+ for (const NodeDef& expected_node : expected.node()) {
+ auto actual_iter = actual_index.find(expected_node.name());
+ if (actual_iter == actual_index.end()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Did not find expected node '",
+ SummarizeNodeDef(expected_node), "'");
+ }
+ return false;
+ }
+
+ if (!EqualNodeDef(*actual_iter->second, expected_node, diff)) return false;
+
+ actual_index.erase(actual_iter);
+ }
+
+ if (!actual_index.empty()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Found unexpected node '",
+ SummarizeNodeDef(*actual_index.begin()->second),
+ "' not in expected graph:\n",
+ SummarizeGraphDef(expected));
+ }
+ return false;
+ }
+
+ return true;
+}
+
+namespace {
+
+string JoinStringField(const protobuf::RepeatedPtrField<string>& f) {
+ string ret;
+ for (int i = 0; i < f.size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, f.Get(i));
+ }
+ return ret;
+}
+
+} // namespace
+
+bool EqualNodeDef(const NodeDef& actual, const NodeDef& expected,
+ string* diff) {
+ if (actual.name() != expected.name()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Actual node name '", actual.name(),
+ "' is not expected '", expected.name(), "'");
+ }
+ return false;
+ }
+
+ if (actual.op() != expected.op()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(), "' has op '",
+ actual.op(), "' that is not expected '",
+ expected.op(), "'");
+ }
+ return false;
+ }
+
+ if (actual.device() != expected.device()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(), "' has device '",
+ actual.device(), "' that is not expected '",
+ expected.device(), "'");
+ }
+ return false;
+ }
+
+ if (actual.input_size() != expected.input_size()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(), "' has inputs '",
+ JoinStringField(actual.input()),
+ "' that don't match expected '",
+ JoinStringField(expected.input()), "'");
+ }
+ return false;
+ }
+
+ int first_control_input = actual.input_size();
+ for (int i = 0; i < actual.input_size(); ++i) {
+ if (StringPiece(actual.input(i)).starts_with("^")) {
+ first_control_input = i;
+ break;
+ }
+ if (actual.input(i) != expected.input(i)) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(), "' has input ",
+ i, " '", actual.input(i),
+ "' that doesn't match expected '",
+ expected.input(i), "'");
+ }
+ return false;
+ }
+ }
+
+ std::unordered_set<string> actual_control;
+ std::unordered_set<string> expected_control;
+ for (int i = first_control_input; i < actual.input_size(); ++i) {
+ actual_control.insert(actual.input(i));
+ expected_control.insert(expected.input(i));
+ }
+ for (const auto& e : expected_control) {
+ if (actual_control.erase(e) == 0) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(),
+ "' missing expected control input '", e, "'");
+ }
+ return false;
+ }
+ }
+ if (!actual_control.empty()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(),
+ "' has unexpected control input '",
+ *actual_control.begin(), "'");
+ }
+ return false;
+ }
+
+ std::unordered_set<string> actual_attr;
+ for (const auto& a : actual.attr()) {
+ actual_attr.insert(a.first);
+ }
+ for (const auto& e : expected.attr()) {
+ if (actual_attr.erase(e.first) == 0) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat("Node named '", actual.name(),
+ "' missing expected attr '", e.first,
+ "' with value: ", SummarizeAttrValue(e.second));
+ }
+ return false;
+ }
+ auto iter = actual.attr().find(e.first);
+ if (!AreAttrValuesEqual(e.second, iter->second)) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat(
+ "Node named '", actual.name(), "' has attr '", e.first,
+ "' with value: ", SummarizeAttrValue(iter->second),
+ " that does not match expected: ", SummarizeAttrValue(e.second));
+ }
+ return false;
+ }
+ }
+ if (!actual_attr.empty()) {
+ if (diff != nullptr) {
+ *diff = strings::StrCat(
+ "Node named '", actual.name(), "' has unexpected attr '",
+ *actual_attr.begin(), "' with value: ",
+ SummarizeAttrValue(actual.attr().find(*actual_attr.begin())->second));
+ }
+ return false;
+ }
+
+ return true;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/equal_graph_def.h b/tensorflow/core/graph/equal_graph_def.h
new file mode 100644
index 0000000000..7dd8aab340
--- /dev/null
+++ b/tensorflow/core/graph/equal_graph_def.h
@@ -0,0 +1,32 @@
+#ifndef TENSORFLOW_GRAPH_EQUAL_GRAPH_DEF_H_
+#define TENSORFLOW_GRAPH_EQUAL_GRAPH_DEF_H_
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/graph_def_util.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Determines if actual and expected are equal, ignoring ordering of
+// nodes, attrs, and control inputs. If the GraphDefs are different
+// and diff != nullptr, *diff is set to an explanation of the
+// difference. Note that we use node names to match up nodes between
+// the graphs, and so the naming of nodes must be consistent.
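+//
+// A usage sketch (assumed):
+//   string diff;
+//   if (!EqualGraphDef(actual_def, expected_def, &diff)) {
+//     LOG(ERROR) << "GraphDefs differ: " << diff;
+//   }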
+bool EqualGraphDef(const GraphDef& actual, const GraphDef& expected,
+ string* diff);
+
+// Determines if actual and expected are equal, ignoring ordering of
+// attrs and control inputs. If the NodeDefs are different and
+// diff != nullptr, *diff is set to an explanation of the difference.
+bool EqualNodeDef(const NodeDef& actual, const NodeDef& expected, string* diff);
+
+#define TF_EXPECT_GRAPH_EQ(expected, actual) \
+ do { \
+ string diff; \
+ EXPECT_TRUE(EqualGraphDef(actual, expected, &diff)) \
+ << diff << "\nActual: " << SummarizeGraphDef(actual); \
+ } while (false)
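+
+// Typical test usage (assumed):
+//   TF_EXPECT_GRAPH_EQ(expected_graph_def, actual_graph_def);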
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_EQUAL_GRAPH_DEF_H_
diff --git a/tensorflow/core/graph/equal_graph_def_test.cc b/tensorflow/core/graph/equal_graph_def_test.cc
new file mode 100644
index 0000000000..3a38b9e522
--- /dev/null
+++ b/tensorflow/core/graph/equal_graph_def_test.cc
@@ -0,0 +1,279 @@
+#include "tensorflow/core/graph/equal_graph_def.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+REGISTER_OP("Input").Output("o: float");
+REGISTER_OP("Alternate").Output("o: float");
+REGISTER_OP("Cross").Input("a: float").Input("b: float").Output("o: float");
+
+Node* Input(const GraphDefBuilder::Options& opts) {
+ return ops::SourceOp("Input", opts);
+}
+
+Node* Alternate(const GraphDefBuilder::Options& opts) {
+ return ops::SourceOp("Alternate", opts);
+}
+
+Node* Cross(ops::NodeOut a, ops::NodeOut b,
+ const GraphDefBuilder::Options& opts) {
+ return ops::BinaryOp("Cross", a, b, opts);
+}
+
+class EqualGraphDefTest : public ::testing::Test {
+ protected:
+ EqualGraphDefTest()
+ : e_(GraphDefBuilder::kFailImmediately),
+ a_(GraphDefBuilder::kFailImmediately) {
+ RequireDefaultOps();
+ }
+
+ bool Match() {
+ GraphDef expected;
+ e_.ToGraphDef(&expected);
+ GraphDef actual;
+ a_.ToGraphDef(&actual);
+ return EqualGraphDef(actual, expected, &diff_);
+ }
+
+ GraphDefBuilder e_;
+ GraphDefBuilder a_;
+ string diff_;
+};
+
+TEST_F(EqualGraphDefTest, Match) {
+ Input(e_.opts().WithName("A"));
+ Input(a_.opts().WithName("A"));
+ EXPECT_TRUE(Match()) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, NoMatch) {
+ Input(e_.opts().WithName("A"));
+ Input(a_.opts().WithName("B"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Did not find expected node 'A = Input[]()'", diff_);
+}
+
+TEST_F(EqualGraphDefTest, MissingNode) {
+ Input(e_.opts().WithName("A"));
+ Input(e_.opts().WithName("B"));
+ Input(a_.opts().WithName("A"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Did not find expected node 'B = Input[]()'", diff_);
+}
+
+TEST_F(EqualGraphDefTest, ExtraNode) {
+ Input(e_.opts().WithName("A"));
+ Input(a_.opts().WithName("A"));
+ Input(a_.opts().WithName("B"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ(
+ "Found unexpected node 'B = Input[]()' not in expected graph:\n"
+ "A = Input[]();\n",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, NodeOrder) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Cross(a, b, e_.opts().WithName("C"));
+
+ b = Input(a_.opts().WithName("B"));
+ a = Input(a_.opts().WithName("A"));
+ Cross(a, b, a_.opts().WithName("C"));
+ EXPECT_TRUE(Match()) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, NameMismatch) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ // Have to call EqualNodeDef() directly here, since EqualGraphDef()
+ // only calls EqualNodeDef() with nodes that have matching names.
+ EXPECT_FALSE(EqualNodeDef(a->def(), b->def(), &diff_));
+ EXPECT_EQ("Actual node name 'A' is not expected 'B'", diff_);
+}
+
+TEST_F(EqualGraphDefTest, OpMismatch) {
+ Input(e_.opts().WithName("A"));
+ Alternate(a_.opts().WithName("A"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Node named 'A' has op 'Alternate' that is not expected 'Input'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, DeviceMatch) {
+ Input(e_.opts().WithName("A").WithDevice("/cpu:0"));
+ Input(a_.opts().WithName("A").WithDevice("/cpu:0"));
+ EXPECT_TRUE(Match()) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, DeviceMismatch) {
+ Input(e_.opts().WithName("A").WithDevice("/cpu:0"));
+ Input(a_.opts().WithName("A").WithDevice("/cpu:1"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Node named 'A' has device '/cpu:1' that is not expected '/cpu:0'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, InputMismatch) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Cross(a, a, e_.opts().WithName("C"));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ Cross(b, b, a_.opts().WithName("C"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Node named 'C' has input 0 'B' that doesn't match expected 'A'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, InputOrderMismatch) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Cross(a, b, e_.opts().WithName("C"));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ Cross(b, a, a_.opts().WithName("C"));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Node named 'C' has input 0 'B' that doesn't match expected 'A'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, ControlInputOrder) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Node* c = Input(e_.opts().WithName("C"));
+ Node* d = Input(e_.opts().WithName("D"));
+ Cross(a, a, e_.opts()
+ .WithName("E")
+ .WithControlInput(b)
+ .WithControlInput(c)
+ .WithControlInput(d));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ c = Input(a_.opts().WithName("C"));
+ d = Input(a_.opts().WithName("D"));
+ Cross(a, a, a_.opts()
+ .WithName("E")
+ .WithControlInput(c)
+ .WithControlInput(d)
+ .WithControlInput(b));
+ EXPECT_TRUE(Match()) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, ControlInputMismatch) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Node* c = Input(e_.opts().WithName("C"));
+ Node* d = Input(e_.opts().WithName("D"));
+ Cross(a, a, e_.opts().WithName("E").WithControlInput(b).WithControlInput(c));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ c = Input(a_.opts().WithName("C"));
+ d = Input(a_.opts().WithName("D"));
+ Cross(a, a, a_.opts().WithName("E").WithControlInput(b).WithControlInput(d));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ("Node named 'E' missing expected control input '^C'", diff_);
+}
+
+TEST_F(EqualGraphDefTest, ControlInputAdded) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Node* c = Input(e_.opts().WithName("C"));
+ Cross(a, a, e_.opts().WithName("D").WithControlInput(b));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ c = Input(a_.opts().WithName("C"));
+ Cross(a, a, a_.opts().WithName("D").WithControlInput(b).WithControlInput(c));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ(
+ "Node named 'D' has inputs 'A, A, ^B, ^C' that don't match "
+ "expected 'A, A, ^B'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, ControlInputRemoved) {
+ Node* a = Input(e_.opts().WithName("A"));
+ Node* b = Input(e_.opts().WithName("B"));
+ Node* c = Input(e_.opts().WithName("C"));
+ Cross(a, a, e_.opts().WithName("D").WithControlInput(b).WithControlInput(c));
+
+ a = Input(a_.opts().WithName("A"));
+ b = Input(a_.opts().WithName("B"));
+ c = Input(a_.opts().WithName("C"));
+ Cross(a, a, a_.opts().WithName("D").WithControlInput(b));
+ EXPECT_FALSE(Match());
+ EXPECT_EQ(
+ "Node named 'D' has inputs 'A, A, ^B' that don't match "
+ "expected 'A, A, ^B, ^C'",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, Attr) {
+ Node* a = Input(e_.opts().WithName("A"));
+ NodeDef same(a->def());
+ AddNodeAttr("foo", "bar", &same);
+ EXPECT_TRUE(EqualNodeDef(same, same, &diff_)) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, AttrAdded) {
+ Node* a = Input(e_.opts().WithName("A"));
+ NodeDef actual(a->def());
+ AddNodeAttr("foo", "bar", &actual);
+ EXPECT_FALSE(EqualNodeDef(actual, a->def(), &diff_));
+ EXPECT_EQ("Node named 'A' has unexpected attr 'foo' with value: \"bar\"",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, AttrRemoved) {
+ Node* a = Input(e_.opts().WithName("A"));
+ NodeDef expected(a->def());
+ AddNodeAttr("foo", "bar", &expected);
+ EXPECT_FALSE(EqualNodeDef(a->def(), expected, &diff_));
+ EXPECT_EQ("Node named 'A' missing expected attr 'foo' with value: \"bar\"",
+ diff_);
+}
+
+TEST_F(EqualGraphDefTest, AttrOrder) {
+ Node* a = Input(e_.opts().WithName("A"));
+ NodeDef actual(a->def());
+ AddNodeAttr("foo", "bar", &actual);
+ AddNodeAttr("baz", 42, &actual);
+
+ NodeDef expected(a->def());
+ AddNodeAttr("baz", 42, &expected);
+ AddNodeAttr("foo", "bar", &expected);
+
+ EXPECT_TRUE(EqualNodeDef(actual, expected, &diff_)) << diff_;
+}
+
+TEST_F(EqualGraphDefTest, AttrMismatch) {
+ Node* a = Input(e_.opts().WithName("A"));
+ NodeDef actual(a->def());
+ AddNodeAttr("foo", "bar", &actual);
+ AddNodeAttr("baz", 5, &actual);
+
+ NodeDef expected(a->def());
+ AddNodeAttr("baz", 42, &expected);
+ AddNodeAttr("foo", "bar", &expected);
+
+ EXPECT_FALSE(EqualNodeDef(actual, expected, &diff_));
+ EXPECT_EQ(
+ "Node named 'A' has attr 'baz' with value: 5 that does not match "
+ "expected: 42",
+ diff_);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph.cc b/tensorflow/core/graph/graph.cc
new file mode 100644
index 0000000000..0c268a51a9
--- /dev/null
+++ b/tensorflow/core/graph/graph.cc
@@ -0,0 +1,319 @@
+#include "tensorflow/core/graph/graph.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+// Node
+
+string Node::DebugString() const {
+ if (this == nullptr) {
+ return "{nullptr}";
+ }
+ string ret = strings::StrCat("{name:'", name(), "' id:", id_);
+ if (IsSource()) {
+ strings::StrAppend(&ret, " source}");
+ } else if (IsSink()) {
+ strings::StrAppend(&ret, " sink}");
+ } else {
+ strings::StrAppend(&ret, " op device:");
+ strings::StrAppend(&ret, "{", assigned_device_name_, "}");
+ strings::StrAppend(&ret, " def:{", SummarizeNodeDef(def()), "}}");
+ }
+ return ret;
+}
+
+Node::Node()
+ : id_(-1), cost_id_(-1), props_(nullptr), assigned_device_name_() {}
+
+Node::~Node() {
+ if (props_) {
+ props_->Unref();
+ }
+}
+
+void Node::Initialize(int id, int cost_id, Properties* props) {
+ DCHECK_EQ(id_, -1);
+ DCHECK(in_edges_.empty());
+ DCHECK(out_edges_.empty());
+ id_ = id;
+ cost_id_ = cost_id;
+
+ // Unref the old, assign the new properties.
+ if (props_) {
+ props_->Unref();
+ }
+ props_ = props;
+}
+
+void Node::Clear() {
+ in_edges_.clear();
+ out_edges_.clear();
+ id_ = -1;
+ cost_id_ = -1;
+
+ if (props_) {
+ props_->Unref();
+ props_ = nullptr;
+ }
+
+ assigned_device_name_.clear();
+}
+
+gtl::iterator_range<NeighborIter> Node::out_nodes() const {
+ return gtl::make_range(NeighborIter(out_edges_.begin(), false),
+ NeighborIter(out_edges_.end(), false));
+}
+
+gtl::iterator_range<NeighborIter> Node::in_nodes() const {
+ return gtl::make_range(NeighborIter(in_edges_.begin(), true),
+ NeighborIter(in_edges_.end(), true));
+}
+
+// Node::Properties
+
+Node::Properties::Properties(const OpDef* op_def, const NodeDef& node_def,
+ const DataTypeSlice inputs,
+ const DataTypeSlice outputs)
+ : op_def_(op_def),
+ node_def_(node_def),
+ input_types_(inputs.begin(), inputs.end()),
+ output_types_(outputs.begin(), outputs.end()) {}
+
+Node::Properties::~Properties() {}
+
+// Graph
+
+Graph::Graph(const OpRegistryInterface* ops)
+ : ops_(ops), arena_(8 << 10 /* 8kB */) {
+ // Source and sink have no endpoints, just control edges.
+ NodeDef def;
+ def.set_name("_SOURCE");
+ def.set_op("NoOp");
+ Status status;
+ Node* source = AddNode(def, &status);
+ TF_CHECK_OK(status);
+ CHECK_EQ(source->id(), kSourceId);
+
+ def.set_name("_SINK");
+ Node* sink = AddNode(def, &status);
+ TF_CHECK_OK(status);
+ CHECK_EQ(sink->id(), kSinkId);
+
+ AddControlEdge(source, sink);
+}
+
+Graph::~Graph() {
+ // Manually call the destructors for all the Nodes we constructed using
+ // placement new.
+ for (Node* node : nodes_) {
+ if (node != nullptr) {
+ node->~Node();
+ }
+ }
+ for (Node* node : free_nodes_) {
+ node->~Node();
+ }
+ // Edges have no destructor, and we arena-allocated them, so no need to
+ // destroy them.
+}
+
+Node* Graph::AddNode(const NodeDef& node_def, Status* status) {
+ const OpDef* op_def = ops_->LookUp(node_def.op(), status);
+ if (op_def == nullptr) return nullptr;
+
+ // TODO(vrv,josh11b): Find a location higher in the stack to add these defaults
+ // to the NodeDef.
+ NodeDef node_def_with_defaults(node_def);
+ AddDefaultsToNodeDef(*op_def, &node_def_with_defaults);
+
+ DataTypeVector inputs;
+ DataTypeVector outputs;
+ status->Update(
+ InOutTypesForNode(node_def_with_defaults, *op_def, &inputs, &outputs));
+ if (!status->ok()) {
+ *status = AttachDef(*status, node_def_with_defaults);
+ return nullptr;
+ }
+
+ Node* node = AllocateNode(
+ new Node::Properties(op_def, node_def_with_defaults, inputs, outputs),
+ nullptr);
+ return node;
+}
+
+Node* Graph::CopyNode(Node* node) {
+ DCHECK(!node->IsSource());
+ DCHECK(!node->IsSink());
+ Node::Properties* props = node->properties();
+ props->Ref();
+ Node* copy = AllocateNode(props, node);
+ copy->set_assigned_device_name(node->assigned_device_name());
+ return copy;
+}
+
+void Graph::RemoveNode(Node* node) {
+ DCHECK(IsValidNode(node)) << node->DebugString();
+ DCHECK(!node->IsSource());
+ DCHECK(!node->IsSink());
+
+ // Remove any edges involving this node.
+ while (!node->in_edges_.empty()) {
+ RemoveEdge(*node->in_edges_.begin());
+ }
+ while (!node->out_edges_.empty()) {
+ RemoveEdge(*node->out_edges_.begin());
+ }
+ ReleaseNode(node);
+}
+
+const Edge* Graph::AddEdge(Node* source, int x, Node* dest, int y) {
+ DCHECK(IsValidNode(source)) << source->DebugString();
+ DCHECK(IsValidNode(dest)) << dest->DebugString();
+
+ // source/sink must only be linked via control slots, and
+ // control slots must only be linked to control slots.
+ if (source == source_node() || dest == sink_node() || x == kControlSlot ||
+ y == kControlSlot) {
+ DCHECK_EQ(x, kControlSlot) << source->DebugString();
+ DCHECK_EQ(y, kControlSlot) << dest->DebugString();
+ }
+
+ Edge* e = nullptr;
+ if (free_edges_.empty()) {
+ e = new (arena_.Alloc(sizeof(Edge))) Edge; // placement new
+ } else {
+ e = free_edges_.back();
+ free_edges_.pop_back();
+ }
+ e->id_ = edges_.size();
+ e->src_ = source;
+ e->dst_ = dest;
+ e->src_output_ = x;
+ e->dst_input_ = y;
+ CHECK(source->out_edges_.insert(e).second);
+ CHECK(dest->in_edges_.insert(e).second);
+ edges_.push_back(e);
+ edge_set_.insert(e);
+ return e;
+}
+
+void Graph::RemoveEdge(const Edge* e) {
+ DCHECK(IsValidNode(e->src_)) << e->src_->DebugString();
+ DCHECK(IsValidNode(e->dst_)) << e->dst_->DebugString();
+ CHECK_EQ(e->src_->out_edges_.erase(e), 1);
+ CHECK_EQ(e->dst_->in_edges_.erase(e), 1);
+ CHECK_EQ(e, edges_[e->id_]);
+
+ CHECK_EQ(edge_set_.erase(e), 1);
+ edges_[e->id_] = nullptr;
+
+ Edge* del = const_cast<Edge*>(e);
+ del->src_ = nullptr;
+ del->dst_ = nullptr;
+ del->id_ = -1;
+ del->src_output_ = kControlSlot - 1;
+ del->dst_input_ = kControlSlot - 1;
+ free_edges_.push_back(del);
+}
+
+namespace {
+
+void AddInput(NodeDef* dst, StringPiece src_name, int src_slot) {
+ if (src_slot == Graph::kControlSlot) {
+ dst->add_input(strings::StrCat("^", src_name));
+ } else if (src_slot == 0) {
+ dst->add_input(src_name.data(), src_name.size());
+ } else {
+ dst->add_input(strings::StrCat(src_name, ":", src_slot));
+ }
+}
+
+} // namespace
+
+void Graph::ToGraphDef(GraphDef* graph_def) const {
+ graph_def->Clear();
+ std::vector<const Edge*>
+ inputs; // Construct this outside the loop for speed.
+ for (const Node* node : nodes()) {
+ if (!node->IsOp()) continue;
+ NodeDef* node_def = graph_def->add_node();
+ *node_def = node->def();
+
+ // Use the node's assigned device, if any, instead of the device requested
+ // in the NodeDef.
+ if (!node->assigned_device_name().empty()) {
+ node_def->set_device(node->assigned_device_name());
+ }
+
+ // Get the inputs for this Node. We make sure control inputs are
+ // after data inputs, as required by GraphDef.
+ inputs.clear();
+ inputs.resize(node->num_inputs(), nullptr);
+ for (const Edge* edge : node->in_edges()) {
+ if (edge->IsControlEdge()) {
+ inputs.push_back(edge);
+ } else {
+ DCHECK(inputs[edge->dst_input()] == nullptr);
+ inputs[edge->dst_input()] = edge;
+ }
+ }
+ node_def->clear_input();
+ for (size_t i = 0; i < inputs.size(); ++i) {
+ const Edge* edge = inputs[i];
+ if (edge == nullptr) {
+ node_def->add_input(node->def().input(i));
+ } else {
+ const Node* src = edge->src();
+ if (!src->IsOp()) continue;
+ AddInput(node_def, src->name(), edge->src_output());
+ }
+ }
+ }
+}
+
+string Graph::NewName(StringPiece prefix) {
+ return strings::StrCat(prefix, "/_", name_counter_++);
+}
+
+gtl::iterator_range<NodeIter> Graph::nodes() const {
+ // Note that NodeId 0 is always valid since we don't let the source
+ // node be removed from the graph.
+ return gtl::make_range(NodeIter(this, 0), NodeIter(this, num_node_ids()));
+}
+
+bool Graph::IsValidNode(Node* node) const {
+ if (node == nullptr) return false;
+ const int id = node->id();
+ if (id < 0 || static_cast<size_t>(id) >= nodes_.size()) return false;
+ return nodes_[id] == node;
+}
+
+Node* Graph::AllocateNode(Node::Properties* props, const Node* cost_node) {
+ Node* node = nullptr;
+ if (free_nodes_.empty()) {
+ node = new (arena_.Alloc(sizeof(Node))) Node; // placement new
+ } else {
+ node = free_nodes_.back();
+ free_nodes_.pop_back();
+ }
+ const int id = nodes_.size();
+ int cost_id = cost_node ? cost_node->cost_id() : id;
+ node->Initialize(id, cost_id, props);
+ nodes_.push_back(node);
+ return node;
+}
+
+void Graph::ReleaseNode(Node* node) {
+ DCHECK(IsValidNode(node)) << node->DebugString();
+ nodes_[node->id()] = nullptr;
+ free_nodes_.push_back(node);
+ node->Clear();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph.h b/tensorflow/core/graph/graph.h
new file mode 100644
index 0000000000..030e471bf4
--- /dev/null
+++ b/tensorflow/core/graph/graph.h
@@ -0,0 +1,440 @@
+// A Graph describes a set of computations that are to be
+// performed, as well as the dependencies between those
+// computations. The basic model is a DAG (directed acyclic graph) with
+// * internal nodes representing computational operations to be performed;
+// * edges representing dependencies, indicating that the target may only be
+// executed once the source has completed; and
+// * predefined "source" (start) and "sink" (finish) nodes -- the source
+// should be the only node that doesn't depend on anything, and the sink
+// should be the only node that nothing depends on.
+//
+// Note: Node ids are intended to be relatively dense in the
+// 0..max_id range, but there may be gaps since ids won't be reused.
+//
+// Note: Some dependencies between operations are due to one operation
+// consuming the output of another. In fact operations can produce
+// multiple outputs and consume multiple inputs, and some
+// optimizations will care about which specific outputs are connected
+// to which specific inputs. We therefore represent data dependency
+// between output O of layer A and input I of layer B using
+// "input index" and "output index" labels per edge.
+
+#ifndef TENSORFLOW_GRAPH_GRAPH_H_
+#define TENSORFLOW_GRAPH_GRAPH_H_
+
+#include <functional>
+#include <string>
+#include <vector>
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/edgeset.h"
+#include "tensorflow/core/lib/core/arena.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/gtl/iterator_range.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class Edge;
+class EdgeSetTest;
+class Graph;
+class Node;
+
+class NeighborIter; // Declared below
+class NodeIter; // Declared below
+
+class Node {
+ public:
+ string DebugString() const;
+ int id() const { return id_; }
+ int cost_id() const { return cost_id_; }
+ const string& name() const { return props_->node_def_.name(); }
+ const string& type_string() const { return props_->node_def_.op(); }
+ const NodeDef& def() const { return props_->node_def_; }
+ const OpDef& op_def() const { return *props_->op_def_; }
+
+ // input and output types
+ int num_inputs() const { return props_->input_types_.size(); }
+ DataType input_type(int i) const { return props_->input_types_[i]; }
+ const DataTypeVector& input_types() const { return props_->input_types_; }
+
+ int num_outputs() const { return props_->output_types_.size(); }
+ DataType output_type(int o) const { return props_->output_types_[o]; }
+ const DataTypeVector& output_types() const { return props_->output_types_; }
+
+ // This gives the device the runtime has assigned this node to. If
+ // you want the device the user requested, use def().device() instead.
+ // TODO(josh11b): Validate that the assigned_device, if not empty:
+ // fully specifies a device, and satisfies def().device().
+ // TODO(josh11b): Move device_name outside of Node into a NodeId->DeviceName
+ // map.
+ string assigned_device_name() const { return assigned_device_name_; }
+ void set_assigned_device_name(const string& device_name) {
+ assigned_device_name_ = device_name;
+ }
+
+ // Get the neighboring nodes via edges either in or out of this node.
+ gtl::iterator_range<NeighborIter> in_nodes() const;
+ gtl::iterator_range<NeighborIter> out_nodes() const;
+ const EdgeSet& in_edges() const { return in_edges_; }
+ const EdgeSet& out_edges() const { return out_edges_; }
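+ // Illustrative sketch (assuming "n" is a Node* in some graph):
+ //   for (Node* m : n->out_nodes()) { ... }
+ //   for (const Edge* e : n->in_edges()) { ... }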
+
+ // Node type helpers.
+ bool IsSource() const { return id() == 0; }
+ bool IsSink() const { return id() == 1; }
+ // Anything other than the special Source & Sink nodes.
+ bool IsOp() const { return id() > 1; }
+
+ private:
+ friend class Graph;
+ Node();
+ ~Node();
+
+ class Properties : public core::RefCounted {
+ public:
+ Properties(const OpDef* op_def, const NodeDef& node_def,
+ const DataTypeSlice inputs, const DataTypeSlice outputs);
+
+ const OpDef* op_def_; // not owned
+ const NodeDef node_def_;
+ const DataTypeVector input_types_;
+ const DataTypeVector output_types_;
+
+ private:
+ // Destructor invoked when last reference goes away via Unref()
+ virtual ~Properties();
+ TF_DISALLOW_COPY_AND_ASSIGN(Properties);
+ };
+
+ Properties* properties() const { return props_; }
+
+ // Initialize() adopts a reference to props, and so is suitable if props was
+ // just allocated or you call props->Ref() to increment the reference
+ // count for a props being held by another Node.
+ void Initialize(int id, int cost_id, Properties* props);
+ // Releases memory from props_, in addition to restoring *this to its
+ // uninitialized state.
+ void Clear();
+
+ int id_; // -1 until Initialize() is called
+ int cost_id_; // -1 if there is no corresponding cost accounting node
+
+ EdgeSet in_edges_;
+ EdgeSet out_edges_;
+
+ Properties* props_;
+
+ // Name of device assigned to perform this computation.
+ string assigned_device_name_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Node);
+};
+
+class Edge {
+ public:
+ Node* src() const { return src_; }
+ Node* dst() const { return dst_; }
+ int id() const { return id_; }
+
+ // Return the number of the source output that produces the data
+ // carried by this edge. The special value kControlSlot is used
+ // for control dependencies.
+ int src_output() const { return src_output_; }
+
+ // Return the number of the destination input that consumes the data
+ // carried by this edge. The special value kControlSlot is used
+ // for control dependencies.
+ int dst_input() const { return dst_input_; }
+
+ // Return true iff this is an edge that indicates a control-flow
+ // (as opposed to a data-flow) dependency.
+ bool IsControlEdge() const;
+
+ private:
+ Edge() {}
+
+ friend class EdgeSetTest;
+ friend class Graph;
+ Node* src_;
+ Node* dst_;
+ int id_;
+ int src_output_;
+ int dst_input_;
+};
+
+// Thread compatible but not thread safe.
+class Graph {
+ public:
+ // Constructs a graph with a single SOURCE (always id kSourceId) and a
+ // single SINK (always id kSinkId) node, and an edge from SOURCE->SINK.
+ //
+ // The graph can hold ops found in registry.
+ explicit Graph(const OpRegistryInterface* registry);
+ ~Graph();
+
+ static const int kControlSlot = -1;
+
+ // Adds a new node to this graph, and returns it. Infers the Op and
+ // input/output types for the node. *this owns the returned instance.
+ // Returns nullptr and sets *status on error.
+ Node* AddNode(const NodeDef& node_def, Status* status);
+
+ // Copies *node, which may belong to another graph, to a new node,
+ // which is returned. Does not copy any edges. *this owns the
+ // returned instance.
+ Node* CopyNode(Node* node);
+
+ // Remove a node from this graph, including all edges from or to it.
+ // *node should not be accessed after calling this function.
+ // REQUIRES: node->IsOp()
+ void RemoveNode(Node* node);
+
+ // Add an edge that connects the xth output of "source" to the yth input
+ // of "dest".
+ const Edge* AddEdge(Node* source, int x, Node* dest, int y);
+
+ // Add a control-edge (no data flows along this edge) that
+ // connects "source" to "dest".
+ const Edge* AddControlEdge(Node* source, Node* dest) {
+ return AddEdge(source, kControlSlot, dest, kControlSlot);
+ }
+
+ // Removes edge from the graph.
+ // REQUIRES: The edge must exist.
+ void RemoveEdge(const Edge* edge);
+
+ // Returns one more than the maximum id assigned to any node.
+ int num_node_ids() const { return nodes_.size(); }
+
+ // Serialize to a GraphDef.
+ void ToGraphDef(GraphDef* graph_def) const;
+
+ // Generate new node name with the specified prefix that is unique
+ // across this graph.
+ string NewName(StringPiece prefix);
+
+ // Access to the list of all nodes. Example usage:
+ // for (Node* node : graph.nodes()) { ... }
+ gtl::iterator_range<NodeIter> nodes() const;
+
+ // Returns the node associated with an id, or nullptr if no node
+ // with that id (the node with that id was removed and the id has
+ // not yet been re-used). *this owns the returned instance.
+ // REQUIRES: 0 <= id < num_node_ids().
+ Node* FindNodeId(int id) const { return nodes_[id]; }
+
+ // Returns one more than the maximum id assigned to any edge.
+ int num_edge_ids() const { return edges_.size(); }
+
+ // Returns the Edge associated with an id, or nullptr if no edge
+ // with that id (the edge with that id was removed and the id has
+ // not yet been re-used). *this owns the returned instance.
+ // REQUIRES: 0 <= id < num_edge_ids().
+ const Edge* FindEdgeId(int id) const { return edges_[id]; }
+
+ // Access to the set of all edges. Example usage:
+ // for (const Edge* e : graph.edges()) { ... }
+ const EdgeSet& edges() const { return edge_set_; }
+
+ // The pre-defined nodes.
+ enum { kSourceId = 0, kSinkId = 1 };
+ Node* source_node() const { return FindNodeId(kSourceId); }
+ Node* sink_node() const { return FindNodeId(kSinkId); }
+
+ const OpRegistryInterface* op_registry() const { return ops_; }
+
+ // TODO(josh11b): uint64 hash() const;
+
+ private:
+ bool IsValidNode(Node* node) const;
+ // If cost_node is non-null, then cost accounting (in CostModel)
+ // will be associated with that node rather than the new one being
+ // created.
+ Node* AllocateNode(Node::Properties* props, const Node* cost_node);
+ void ReleaseNode(Node* node);
+
+ // Registry of all known ops. Not owned.
+ const OpRegistryInterface* const ops_;
+
+ // Allocator which will give us good locality.
+ core::Arena arena_;
+
+ // Map from node ids to allocated nodes. nodes_[id] may be nullptr if
+ // the node with that id was removed from the graph.
+ std::vector<Node*> nodes_;
+
+ // Map from edge ids to allocated edges. edges_[id] may be nullptr if
+ // the edge with that id was removed from the graph.
+ std::vector<Edge*> edges_;
+
+ // For ease of iteration, we currently just keep a set of all live
+ // edges. May want to optimize by removing this copy.
+ EdgeSet edge_set_;
+
+ // Allocated but free nodes and edges.
+ std::vector<Node*> free_nodes_;
+ std::vector<Edge*> free_edges_;
+
+ // For generating unique names.
+ int name_counter_ = 0;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Graph);
+};
+
+// TODO(josh11b): We may want to support keeping an index on various
+// node/edge attributes in a graph, particularly node names.
+
+// Helper routines
+
+inline bool IsSwitch(const Node* node) {
+ return node->type_string() == "Switch" || node->type_string() == "RefSwitch";
+}
+
+inline bool IsMerge(const Node* node) { return node->type_string() == "Merge"; }
+
+inline bool IsEnter(const Node* node) {
+ return node->type_string() == "Enter" || node->type_string() == "RefEnter";
+}
+
+inline bool IsExit(const Node* node) { return node->type_string() == "Exit"; }
+
+inline bool IsNextIteration(const Node* node) {
+ return node->type_string() == "NextIteration";
+}
+
+inline bool IsLoopCond(const Node* node) {
+ return node->type_string() == "LoopCond";
+}
+
+inline bool IsControlTrigger(const Node* node) {
+ return node->type_string() == "ControlTrigger";
+}
+
+inline bool IsSend(const Node* node) {
+ return node->type_string() == "_Send" || node->type_string() == "_HostSend";
+}
+
+inline bool IsRecv(const Node* node) {
+ return node->type_string() == "_Recv" || node->type_string() == "_HostRecv";
+}
+
+// True for Nodes that mediate the transfer of values between processes.
+inline bool IsTransferNode(const Node* n) { return IsSend(n) || IsRecv(n); }
+
+inline bool IsConstant(const Node* node) {
+ return node->type_string() == "Const" || node->type_string() == "HostConst";
+}
+
+inline bool IsVariable(const Node* node) {
+ return node->type_string() == "Variable";
+}
+
+inline bool IsIdentity(const Node* node) {
+ return (node->type_string() == "Identity" ||
+ node->type_string() == "RefIdentity");
+}
+
+// Returns true iff 'n' is a control flow node.
+inline bool IsControlFlow(const Node* n) {
+ return IsSwitch(n) || IsMerge(n) || IsEnter(n) || IsExit(n) ||
+ IsNextIteration(n);
+}
+
+inline bool IsHostMemoryPreserving(const Node* node) {
+ return IsIdentity(node) || IsControlFlow(node);
+}
+
+// Iterator for stepping through the nodes of a graph.
+class NodeIter {
+ public:
+ NodeIter(const Graph* graph, int id);
+ bool operator==(const NodeIter& rhs);
+ bool operator!=(const NodeIter& rhs);
+ void operator++();
+ Node* operator*();
+ Node* operator->();
+
+ private:
+ // Invariant: id_ == graph_->num_node_ids() || graph_->FindNodeId(id_) != nullptr
+ const Graph* graph_;
+ int id_;
+};
+
+// Iterator for stepping through the neighbors of a node.
+class NeighborIter {
+ public:
+ NeighborIter(EdgeSet::const_iterator iter, bool incoming);
+ bool operator==(const NeighborIter& rhs);
+ bool operator!=(const NeighborIter& rhs);
+ void operator++();
+ Node* operator*();
+ Node* operator->();
+
+ private:
+ EdgeSet::const_iterator iter_;
+ bool incoming_;
+};
+
+// IMPLEMENTATION DETAILS, PLEASE IGNORE
+
+inline NodeIter::NodeIter(const Graph* graph, int id)
+ : graph_(graph), id_(id) {}
+
+inline bool NodeIter::operator==(const NodeIter& rhs) {
+ DCHECK(graph_ == rhs.graph_);
+ return id_ == rhs.id_;
+}
+
+inline bool NodeIter::operator!=(const NodeIter& rhs) {
+ return !(*this == rhs);
+}
+
+inline void NodeIter::operator++() {
+ while (1) {
+ DCHECK_LE(id_, graph_->num_node_ids());
+ ++id_;
+ if (id_ >= graph_->num_node_ids() || graph_->FindNodeId(id_) != nullptr) {
+ return;
+ }
+ }
+}
+
+inline Node* NodeIter::operator*() { return graph_->FindNodeId(id_); }
+
+inline Node* NodeIter::operator->() { return graph_->FindNodeId(id_); }
+
+inline NeighborIter::NeighborIter(EdgeSet::const_iterator iter, bool incoming)
+ : iter_(iter), incoming_(incoming) {}
+
+inline bool NeighborIter::operator==(const NeighborIter& rhs) {
+ return iter_ == rhs.iter_ && incoming_ == rhs.incoming_;
+}
+
+inline bool NeighborIter::operator!=(const NeighborIter& rhs) {
+ return !(*this == rhs);
+}
+
+inline void NeighborIter::operator++() { ++iter_; }
+
+inline Node* NeighborIter::operator*() {
+ const Edge* e = *iter_;
+ return incoming_ ? e->src() : e->dst();
+}
+
+inline Node* NeighborIter::operator->() {
+ const Edge* e = *iter_;
+ return incoming_ ? e->src() : e->dst();
+}
+
+inline bool Edge::IsControlEdge() const {
+ // Note that if either src_output_ or dst_input_ is kControlSlot,
+ // so is the other one (AddEdge checks this).
+ return src_output_ == Graph::kControlSlot;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_GRAPH_H_
diff --git a/tensorflow/core/graph/graph_constructor.cc b/tensorflow/core/graph/graph_constructor.cc
new file mode 100644
index 0000000000..3928348f0a
--- /dev/null
+++ b/tensorflow/core/graph/graph_constructor.cc
@@ -0,0 +1,385 @@
+#include "tensorflow/core/graph/graph_constructor.h"
+
+#include <string>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/optimizer_cse.h"
+#include "tensorflow/core/graph/tensor_id.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/regexp.h"
+
+namespace tensorflow {
+
+namespace {
+inline bool IsMerge(const NodeDef& node_def) {
+ return node_def.op() == "Merge";
+}
+} // namespace
+
+namespace {
+
+class GraphConstructor {
+ public:
+ GraphConstructor(const GraphConstructorOptions& opts, const GraphDef* gdef,
+ Graph* g, Status* status)
+ : opts_(opts), gdef_(gdef), g_(g), status_(status) {
+ BuildNodeIndex();
+ InitFromEdges();
+ Convert();
+ }
+
+ private:
+ void SetError(const string& error);
+ void SetNodeError(const NodeDef& node_def, const StringPiece& message) {
+ SetError(strings::StrCat("Node '", node_def.name(), "': ", message));
+ }
+ void BuildNodeIndex();
+ void InitFromEdges();
+ Node* MakeNode(const NodeDef& node_def);
+ void Convert();
+ // Calls SetError() and returns false if the type of the output of
+ // the source of the edge can't be consumed by destination of the edge.
+ // REQUIRES: edge must be a data edge, not a control edge.
+ bool TypeValidateEdge(const Edge* edge);
+
+ // From constructor
+ const GraphConstructorOptions opts_;
+ const GraphDef* gdef_;
+ Graph* g_;
+ Status* status_;
+
+ // Mapping from node name to the index within gdef_
+ struct NodeInfo {
+ explicit NodeInfo(int i) : gdef_index(i), node(nullptr) {}
+ // std::unordered_map<> requires that we have a default constructor.
+ NodeInfo() : NodeInfo(-1) {}
+ int gdef_index;
+ Node* node; // nullptr until the NodeDef is converted to a Node.
+ };
+ // TODO(vrv): Profile this data structure to see if we should use an
+ // alternative implementation of std::unordered_map.
+ std::unordered_map<StringPiece, NodeInfo, StringPiece::Hasher> name_index_;
+
+ // Index of NodeDefs in gdef_ with all inputs already converted.
+ std::vector<int> ready_;
+
+ // Mapping between index within gdef_ and the number of inputs that
+ // still need to be converted.
+ std::vector<int> pending_count_;
+
+ // Mapping between index within gdef_ and the index within gdef_ of
+ // all nodes it outputs to.
+ std::vector<gtl::InlinedVector<int, 4>> outputs_;
+
+ // Used in the conversion from gdef_ to g_ to represent the ith input
+ // of a node.
+ struct InputInfo {
+ explicit InputInfo(StringPiece node_name, Node* n, int i)
+ : name(node_name), node(n), index(i) {}
+ StringPiece name;
+ Node* node;
+ int index;
+ };
+
+ // Used in the conversion from gdef_ to g_ to represent an edge from
+ // the node named 'name' to node 'n'.
+ struct EdgeInfo {
+ explicit EdgeInfo(StringPiece name, int i1, Node* n, int i2)
+ : src_name(name), src_index(i1), dst_node(n), dst_index(i2) {}
+ StringPiece src_name;
+ int src_index;
+ Node* dst_node;
+ int dst_index;
+ };
+};
+
+void GraphConstructor::SetError(const string& error) {
+ status_->Update(errors::InvalidArgument(error));
+}
+
+void GraphConstructor::BuildNodeIndex() {
+ // Initialized outside the loop for efficiency
+ const char* pattern;
+ if (opts_.allow_internal_ops) {
+ pattern = "[A-Za-z0-9._][A-Za-z0-9_.\\-/]*";
+ } else {
+ pattern = "[A-Za-z0-9.][A-Za-z0-9_.\\-/]*";
+ }
+ RE2 node_name_re(pattern);
+
+ // Validate the node names and add them to name_index_.
+ for (int n = 0; n < gdef_->node_size(); ++n) {
+ const NodeDef& node_def(gdef_->node(n));
+ if (!RE2::FullMatch(node_def.name(), node_name_re)) {
+ SetNodeError(node_def, "Node name contains invalid characters");
+ return;
+ }
+ if (!name_index_.insert(std::make_pair(StringPiece(node_def.name()),
+ NodeInfo(n)))
+ .second) {
+ SetNodeError(node_def, "Node name is not unique");
+ return;
+ }
+ // Validate the operation's type.
+ if (node_def.op().empty()) {
+ SetNodeError(node_def, "Does not specify a type");
+ return;
+ }
+ if (opts_.expect_device_spec && node_def.device().empty()) {
+ SetNodeError(node_def, strings::StrCat("Missing device specification."));
+ return;
+ }
+ }
+}
+
+void GraphConstructor::InitFromEdges() {
+ const int num_nodes = gdef_->node_size();
+ ready_.reserve(num_nodes);
+ pending_count_.reserve(num_nodes);
+ outputs_.resize(num_nodes);
+
+ // Parse the inputs for each node.
+ for (int n = 0; n < num_nodes; ++n) {
+ const NodeDef& node_def(gdef_->node(n));
+ if (IsMerge(node_def)) {
+ // For Merge nodes, wait for only one non-control input.
+ int32 num_control_edges = 0;
+ for (int i = 0; i < node_def.input_size(); ++i) {
+ StringPiece input_name(node_def.input(i));
+ if (input_name.starts_with("^")) {
+ num_control_edges++;
+ }
+ }
+ pending_count_.push_back(num_control_edges + 1);
+ } else {
+ pending_count_.push_back(node_def.input_size());
+ }
+ if (node_def.input_size() == 0) {
+ ready_.push_back(n);
+ continue;
+ }
+ for (int i = 0; i < node_def.input_size(); ++i) {
+ StringPiece input_name = node_def.input(i);
+ if (input_name.starts_with("^")) {
+ // Control dependence
+ input_name.remove_prefix(1);
+ }
+ TensorId id(ParseTensorName(input_name));
+ auto iter = name_index_.find(id.first);
+ if (iter == name_index_.end()) {
+ SetNodeError(node_def,
+ strings::StrCat("Unknown input node ", node_def.input(i)));
+ return;
+ }
+ outputs_[iter->second.gdef_index].push_back(n);
+ }
+ }
+}
+
+Node* GraphConstructor::MakeNode(const NodeDef& node_def) {
+ // Add the node to the graph.
+ Node* node = g_->AddNode(node_def, status_);
+ if (node == nullptr) return nullptr;
+ if (opts_.expect_device_spec) {
+ node->set_assigned_device_name(node_def.device());
+ }
+ name_index_[node_def.name()].node = node;
+ return node;
+}
+
+// Return the number of nodes in "g"
+static int CountNodes(Graph* g) {
+ int nodes = 0;
+ for (Node* node : g->nodes()) {
+ VLOG(1) << node; // Dummy use to avoid compiler warning
+ nodes++;
+ }
+ return nodes;
+}
+
+void GraphConstructor::Convert() {
+ std::vector<InputInfo> inputs;
+ std::vector<EdgeInfo> back_edges;
+ int processed = 0;
+ // Process the NodeDefs in topological order.
+ while (!ready_.empty()) {
+ int o = ready_.back();
+ ready_.pop_back();
+ ++processed;
+ const NodeDef& node_def(gdef_->node(o));
+ inputs.clear();
+ bool in_control_dependence = false;
+ bool has_data_back_edge = false;
+ for (int i = 0; i < node_def.input_size(); ++i) {
+ StringPiece input_name(node_def.input(i));
+ if (input_name.starts_with("^")) {
+ // A control dependence
+ in_control_dependence = true;
+ input_name.remove_prefix(1);
+ } else {
+ if (in_control_dependence) {
+ SetNodeError(node_def, strings::StrCat(
+ "Control dependencies must come after ",
+ "regular dependencies: input ", input_name,
+ " of source node ", node_def.name()));
+ return;
+ }
+ }
+ TensorId id(ParseTensorName(input_name));
+ auto iter = name_index_.find(id.first);
+ DCHECK(iter != name_index_.end());
+ Node* src_node = iter->second.node;
+ if (in_control_dependence) {
+ inputs.push_back(InputInfo(id.first, src_node, -1));
+ } else {
+ if (src_node == nullptr) {
+ has_data_back_edge = true;
+ inputs.push_back(InputInfo(id.first, src_node, id.second));
+ } else {
+ if (id.second >= src_node->num_outputs()) {
+ SetNodeError(
+ node_def,
+ strings::StrCat("Connecting to invalid output ", id.second,
+ " of source node ", id.first, " which has ",
+ src_node->num_outputs(), " outputs"));
+ return;
+ }
+ inputs.push_back(InputInfo(id.first, src_node, id.second));
+ }
+ }
+ }
+ if (has_data_back_edge && !IsMerge(node_def)) {
+ SetError(strings::StrCat(
+ node_def.name(),
+ " had a back edge. But only Merge can have back edges."));
+ return;
+ }
+
+ Node* node = MakeNode(node_def);
+ if (node == nullptr) return;
+
+ // Add edges from inputs to *node to the graph.
+ for (size_t i = 0; i < inputs.size(); ++i) {
+ if (inputs[i].node == nullptr) {
+ // Record this back edge, which will be added after all nodes
+ // are created.
+ back_edges.push_back(
+ EdgeInfo(inputs[i].name, inputs[i].index, node, i));
+ } else if (inputs[i].index == -1) {
+ g_->AddControlEdge(inputs[i].node, node);
+ } else {
+ const Edge* edge =
+ g_->AddEdge(inputs[i].node, inputs[i].index, node, i);
+ if (!TypeValidateEdge(edge)) return;
+ }
+ }
+
+ // Update pending_count_ for outputs.
+ for (size_t i = 0; i < outputs_[o].size(); ++i) {
+ const int output = outputs_[o][i];
+ pending_count_[output]--;
+ if (pending_count_[output] == 0) {
+ ready_.push_back(output);
+ }
+ }
+ }
+
+ // Add the back edges after all nodes are created.
+ for (auto e : back_edges) {
+ Node* src_node = name_index_[e.src_name].node;
+ if (e.src_index == -1) {
+ g_->AddControlEdge(src_node, e.dst_node);
+ } else {
+ const Edge* edge =
+ g_->AddEdge(src_node, e.src_index, e.dst_node, e.dst_index);
+ if (!TypeValidateEdge(edge)) return;
+ }
+
+ VLOG(2) << "Add back edge: " << src_node->name() << " -> "
+ << e.dst_node->name();
+ }
+
+ if (processed < gdef_->node_size()) {
+ SetError(
+ strings::StrCat(gdef_->node_size() - processed, " nodes in a cycle"));
+ return;
+ }
+
+ if (status_->ok()) {
+ FixupSourceAndSinkEdges(g_);
+
+ if (opts_.optimizer_do_cse) {
+ if (!back_edges.empty()) {
+ LOG(WARNING) << "Not doing CSE. We need to figure out how to handle "
+ << "loops in the CSE phase.";
+ } else {
+ VLOG(1) << "Starting CSE: graph of " << CountNodes(g_) << " nodes";
+ OptimizeCSE(g_, opts_.cse_consider_function);
+ VLOG(1) << "Finished CSE: graph of " << CountNodes(g_) << " nodes";
+ }
+ }
+ }
+}
+
+bool GraphConstructor::TypeValidateEdge(const Edge* edge) {
+ DataType src_out = edge->src()->output_type(edge->src_output());
+ DataType dst_in = edge->dst()->input_type(edge->dst_input());
+ if (!TypesCompatible(dst_in, src_out)) {
+ SetError(strings::StrCat(
+ "Input ", edge->dst_input(), " of node ", edge->dst()->name(),
+ " was passed ", DataTypeString(src_out), " from ", edge->src()->name(),
+ ":", edge->src_output(), " incompatible with expected ",
+ DataTypeString(dst_in), "."));
+ return false;
+ }
+ return true;
+}
+
+} // namespace
+
+// ----------------------------------------------------------------------------
+// ConvertGraphDefToGraph
+// ----------------------------------------------------------------------------
+
+Status ConvertGraphDefToGraph(const GraphConstructorOptions& opts,
+ const GraphDef& gdef, Graph* g) {
+ Status status;
+ GraphConstructor constructor(opts, &gdef, g, &status);
+ return status;
+}
+
+// ----------------------------------------------------------------------------
+// CopyGraph
+// ----------------------------------------------------------------------------
+void CopyGraph(const Graph& src, Graph* dest) {
+ for (Node* n : dest->nodes()) {
+ CHECK(n->IsSource() || n->IsSink()) << "*dest must be empty";
+ }
+
+ // Copy the nodes
+ std::unordered_map<Node*, Node*>
+ node_map; // "Node in src" -> "Node in *dest"
+ node_map[src.source_node()] = dest->source_node();
+ node_map[src.sink_node()] = dest->sink_node();
+ for (Node* n : src.nodes()) {
+ if (n->IsSource() || n->IsSink()) continue;
+ CHECK(n->IsOp());
+ node_map[n] = dest->CopyNode(n);
+ }
+
+ // Copy the edges
+ for (const Edge* e : src.edges()) {
+ Node* src_copy = node_map[e->src()];
+ Node* dst_copy = node_map[e->dst()];
+ dest->AddEdge(src_copy, e->src_output(), dst_copy, e->dst_input());
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph_constructor.h b/tensorflow/core/graph/graph_constructor.h
new file mode 100644
index 0000000000..cd1615ef6b
--- /dev/null
+++ b/tensorflow/core/graph/graph_constructor.h
@@ -0,0 +1,43 @@
+#ifndef TENSORFLOW_GRAPH_GRAPH_CONSTRUCTOR_H_
+#define TENSORFLOW_GRAPH_GRAPH_CONSTRUCTOR_H_
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Construct a graph *g out of a GraphDef gdef. Returns non-OK on
+// error, in which case *g is left in an incomplete state.
+struct GraphConstructorOptions {
+ // If true, allows internal ops in the GraphDef.
+ bool allow_internal_ops = false;
+
+ // If true, the graph def is expected to have fully specified
+ // devices for all nodes. A node in the resulting graph "g" has the
+ // device name set accordingly.
+ //
+ // TODO(zhifengc): if possible, consider removing this option.
+ bool expect_device_spec = false;
+
+ // If true, perform common subexpression elimination on the graph.
+ // TODO(jeff): Turn this default to true?
+ bool optimizer_do_cse = false;
+
+ // If "optimizer_do_cse" is true and "cse_consider_function" is
+ // not nullptr, then only consider nodes for CSE for which
+ // "cse_consider_function(node)" returns true.
+ std::function<bool(const Node*)> cse_consider_function = nullptr;
+};
+extern Status ConvertGraphDefToGraph(const GraphConstructorOptions& opts,
+ const GraphDef& gdef, Graph* g);
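+//
+// Typical usage (illustrative sketch; "gdef" is some GraphDef):
+//   GraphConstructorOptions opts;
+//   Graph graph(OpRegistry::Global());
+//   Status s = ConvertGraphDefToGraph(opts, gdef, &graph);
+//   if (!s.ok()) { /* handle error; graph may be incomplete */ }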
+
+// Make a copy of "src" into "*dest".
+//
+// REQUIRES: "*dest" is a freshly allocated graph without any nodes or edges
+// other than the implicit Source/Sink nodes.
+extern void CopyGraph(const Graph& src, Graph* dest);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_GRAPH_CONSTRUCTOR_H_
diff --git a/tensorflow/core/graph/graph_constructor_test.cc b/tensorflow/core/graph/graph_constructor_test.cc
new file mode 100644
index 0000000000..61f4427297
--- /dev/null
+++ b/tensorflow/core/graph/graph_constructor_test.cc
@@ -0,0 +1,190 @@
+#include "tensorflow/core/graph/graph_constructor.h"
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/regexp.h"
+#include "tensorflow/core/public/status.h"
+#include <gtest/gtest.h>
+
+// TODO(josh11b): Test InitCostModel().
+// TODO(josh11b): Test setting the "device" field of a NodeDef.
+// TODO(josh11b): Test that feeding won't prune targets.
+
+namespace tensorflow {
+namespace {
+
+class GraphConstructorTest : public ::testing::Test {
+ protected:
+ GraphConstructorTest() : g_(new Graph(OpRegistry::Global())) {
+ RequireDefaultOps();
+ }
+ ~GraphConstructorTest() override {}
+
+ void Convert(const string& gdef_ascii) {
+ CHECK(protobuf::TextFormat::ParseFromString(gdef_ascii, &gdef_));
+ }
+
+ void ExpectError(const string& gdef_ascii, const string& expected_error_re) {
+ Convert(gdef_ascii);
+ GraphConstructorOptions opts;
+ Status status = ConvertGraphDefToGraph(opts, gdef_, g_.get());
+ EXPECT_FALSE(status.ok());
+ EXPECT_TRUE(RE2::PartialMatch(status.error_message(), expected_error_re))
+ << status;
+ }
+
+ void ExpectOK(const string& gdef_ascii) {
+ Convert(gdef_ascii);
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, gdef_, g_.get()));
+ }
+
+ Node* FindNode(const string& name) {
+ for (Node* n : g_->nodes()) {
+ if (n->name() == name) return n;
+ }
+ return nullptr;
+ }
+
+ bool HasNode(const string& name) { return FindNode(name) != nullptr; }
+
+ void ExpectNodes(const string& nodes) {
+ int count = 0;
+ std::vector<string> actual_nodes;
+ for (Node* n : g_->nodes()) {
+ if (n->IsOp()) {
+ count++;
+ actual_nodes.push_back(n->name());
+ }
+ }
+ std::sort(actual_nodes.begin(), actual_nodes.end());
+
+ LOG(INFO) << "Nodes present: " << str_util::Join(actual_nodes, " ");
+
+ std::vector<string> expected_nodes = str_util::Split(nodes, ',');
+ std::sort(expected_nodes.begin(), expected_nodes.end());
+ for (const string& s : expected_nodes) {
+ Node* n = FindNode(s);
+ EXPECT_TRUE(n != nullptr) << s;
+ }
+
+ EXPECT_TRUE(actual_nodes.size() == expected_nodes.size())
+ << "\nActual: " << str_util::Join(actual_nodes, ",")
+ << "\nExpected: " << str_util::Join(expected_nodes, ",");
+ }
+
+ bool HasEdge(const string& src, int src_out, const string& dst, int dst_in) {
+ for (const Edge* e : g_->edges()) {
+ if (e->src()->name() == src && e->src_output() == src_out &&
+ e->dst()->name() == dst && e->dst_input() == dst_in)
+ return true;
+ }
+ return false;
+ }
+ bool HasControlEdge(const string& src, const string& dst) {
+ return HasEdge(src, Graph::kControlSlot, dst, Graph::kControlSlot);
+ }
+
+ private:
+ GraphDef gdef_;
+ std::unique_ptr<Graph> g_;
+};
+
+REGISTER_OP("ABC");
+REGISTER_OP("TestParams").Output("o: float");
+REGISTER_OP("TestInput").Output("a: float").Output("b: float");
+REGISTER_OP("TestMul").Input("a: float").Input("b: float").Output("o: float");
+REGISTER_OP("TestInt").Input("a: int32");
+
+TEST_F(GraphConstructorTest, InvalidNodeName) {
+ ExpectError("node { name: 'a:b' op: 'ABC' }",
+ "Node 'a:b': Node name contains invalid characters");
+ ExpectError("node { name: '_abc' op: 'ABC' }",
+ // Can't start with '_'
+ "Node '_abc': Node name contains invalid characters");
+ ExpectOK("node { name: 'a-bc_' op: 'ABC' }");
+}
+
+TEST_F(GraphConstructorTest, InvalidSourceNodeName) {
+ ExpectError(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: 'W999' input: 'input' }",
+
+ "Unknown input node.*W999");
+}
+
+TEST_F(GraphConstructorTest, InvalidSourceNodeIndex) {
+ ExpectError(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1:1', 'input:1' ] }",
+
+ "Connecting to invalid output 1 of source node W1");
+}
+
+TEST_F(GraphConstructorTest, GraphWithCycle) {
+ ExpectError(
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'input:0', 't2' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'input:1', 't1' ] }",
+
+ "cycle");
+}
+
+TEST_F(GraphConstructorTest, TypeMismatch) {
+ ExpectError(
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 'int' op: 'TestInt' input: [ 'input' ] }",
+
+ "Input 0 of node int was passed float from input:0 incompatible with "
+ "expected int32.");
+}
+
+TEST_F(GraphConstructorTest, EmptyGraph) { ExpectOK(""); }
+
+TEST_F(GraphConstructorTest, SimpleModel) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }");
+ EXPECT_TRUE(HasNode("W1"));
+ EXPECT_TRUE(HasNode("input"));
+ EXPECT_TRUE(HasNode("t1"));
+ EXPECT_TRUE(HasEdge("W1", 0, "t1", 0));
+ EXPECT_TRUE(HasEdge("input", 1, "t1", 0));
+}
+
+TEST_F(GraphConstructorTest, SimpleModelWithControlEdges) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' input: [ '^W1' ] }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W1', 'input:1', '^t1' ] }");
+ EXPECT_TRUE(HasNode("W1"));
+ EXPECT_TRUE(HasNode("input"));
+ EXPECT_TRUE(HasNode("t1"));
+ EXPECT_TRUE(HasNode("t2"));
+ EXPECT_TRUE(HasEdge("W1", 0, "t1", 0));
+ EXPECT_TRUE(HasEdge("input", 1, "t1", 0));
+ EXPECT_TRUE(HasEdge("W1", 0, "t2", 0));
+ EXPECT_TRUE(HasEdge("input", 1, "t2", 0));
+ EXPECT_TRUE(HasControlEdge("W1", "input"));
+ EXPECT_TRUE(HasControlEdge("t1", "t2"));
+}
+
+TEST_F(GraphConstructorTest, Error_ControlEdgeBeforeRealInput) {
+ ExpectError(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' input: [ '^W1' ] }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W1', '^t1', 'input:1' ] }",
+ "Node 't2': Control dependencies must come after regular dependencies");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph_def_builder.cc b/tensorflow/core/graph/graph_def_builder.cc
new file mode 100644
index 0000000000..979604f948
--- /dev/null
+++ b/tensorflow/core/graph/graph_def_builder.cc
@@ -0,0 +1,121 @@
+#include "tensorflow/core/graph/graph_def_builder.h"
+
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/tensor_id.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+GraphDefBuilder::Options::Options(Graph* graph, Status* status)
+ : graph_(graph), status_(status) {}
+GraphDefBuilder::Options::~Options() {}
+
+GraphDefBuilder::Options GraphDefBuilder::Options::WithName(
+ StringPiece name) const {
+ return Options(*this).WithNameImpl(name);
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithDevice(
+ StringPiece device) const {
+ return Options(*this).WithDeviceImpl(device);
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithControlInput(
+ Node* control_input) const {
+ return Options(*this).WithControlInputImpl(control_input);
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithControlInputs(
+ gtl::ArraySlice<Node*> control_inputs) const {
+ return Options(*this).WithControlInputsImpl(control_inputs);
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithNameImpl(
+ StringPiece name) {
+ name_ = name.ToString();
+ return *this;
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithDeviceImpl(
+ StringPiece device) {
+ device_ = device.ToString();
+ return *this;
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithControlInputImpl(
+ Node* control_input) {
+ control_inputs_.push_back(control_input);
+ return *this;
+}
+GraphDefBuilder::Options GraphDefBuilder::Options::WithControlInputsImpl(
+ gtl::ArraySlice<Node*> control_inputs) {
+ control_inputs_.insert(control_inputs_.end(), control_inputs.begin(),
+ control_inputs.end());
+ return *this;
+}
+
+Status GraphDefBuilder::ToGraphDef(GraphDef* graph_def) const {
+ if (status_.ok()) {
+ graph_.ToGraphDef(graph_def);
+ }
+ return status_;
+}
+
+Status GraphDefBuilder::ToGraph(Graph* graph) const {
+ if (status_.ok()) {
+ GraphDef graph_def;
+ graph_.ToGraphDef(&graph_def);
+ GraphConstructorOptions opts;
+ TF_RETURN_IF_ERROR(ConvertGraphDefToGraph(opts, graph_def, graph));
+ }
+ return status_;
+}
+
+string GraphDefBuilder::Options::GetNameForOp(StringPiece op) const {
+ if (name_.empty()) return graph_->NewName(op);
+ return name_;
+}
+
+Node* GraphDefBuilder::Options::FinalizeBuilder(NodeBuilder* builder) const {
+ builder->ControlInputs(control_inputs_);
+ if (!device_.empty()) builder->Device(device_);
+ for (const auto& attr : attrs_) {
+ builder->Attr(attr.first, attr.second);
+ }
+
+ Node* returned_node;
+ UpdateStatus(builder->Finalize(graph_, &returned_node));
+ return returned_node;
+}
+
+void GraphDefBuilder::Options::UpdateStatus(const Status& status) const {
+ if (status_ == nullptr) {
+ TF_CHECK_OK(status);
+ } else {
+ status_->Update(status);
+ }
+}
+
+namespace ops {
+
+Node* SourceOp(const string& op_name, const GraphDefBuilder::Options& opts) {
+ if (opts.HaveError()) return nullptr;
+ NodeBuilder node_builder(opts.GetNameForOp(op_name), op_name,
+ opts.op_registry());
+ return opts.FinalizeBuilder(&node_builder);
+}
+
+Node* UnaryOp(const string& op_name, NodeOut input,
+ const GraphDefBuilder::Options& opts) {
+ if (opts.HaveError()) return nullptr;
+ NodeBuilder node_builder(opts.GetNameForOp(op_name), op_name,
+ opts.op_registry());
+ node_builder.Input(input);
+ return opts.FinalizeBuilder(&node_builder);
+}
+
+Node* BinaryOp(const string& op_name, NodeOut a, NodeOut b,
+ const GraphDefBuilder::Options& opts) {
+ if (opts.HaveError()) return nullptr;
+ NodeBuilder node_builder(opts.GetNameForOp(op_name), op_name,
+ opts.op_registry());
+ node_builder.Input(a).Input(b);
+ return opts.FinalizeBuilder(&node_builder);
+}
+
+} // end namespace ops
+} // end namespace tensorflow
diff --git a/tensorflow/core/graph/graph_def_builder.h b/tensorflow/core/graph/graph_def_builder.h
new file mode 100644
index 0000000000..bb72f9eea6
--- /dev/null
+++ b/tensorflow/core/graph/graph_def_builder.h
@@ -0,0 +1,181 @@
+#ifndef TENSORFLOW_GRAPH_GRAPH_DEF_BUILDER_H_
+#define TENSORFLOW_GRAPH_GRAPH_DEF_BUILDER_H_
+
+#include <vector>
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+namespace tensorflow {
+
+// Given a function like:
+// namespace ops {
+// Node* Identity(NodeOut input, const GraphDefBuilder::Options& opts) {
+// if (opts.HaveError()) return nullptr;
+// static const string kOpName = "Identity";
+// NodeBuilder node_builder(opts.GetNameForOp(kOpName), kOpName,
+// opts.op_registry());
+// node_builder.Input(input);
+// return opts.FinalizeBuilder(&node_builder);
+// }
+// } // namespace ops
+//
+// // Or, alternatively:
+// namespace ops {
+// Node* Identity(NodeOut input, const GraphDefBuilder::Options& opts) {
+// static const string kOpName = "Identity";
+// return UnaryOp(kOpName, input, opts);
+// }
+// } // namespace ops
+//
+// You call it like:
+// GraphDefBuilder b;
+// using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+// Node* a = Const(7, b.opts());
+// // Note: WithName() returns a copy, opts is unchanged.
+// Node* ci = Const(5, b.opts().WithName("control-input"));
+// Node* c = Identity(a, b.opts().WithControlInput(ci));
+// GraphDef graph_def;
+// Status status = b.ToGraphDef(&graph_def);
+// if (!status.ok()) { /* Handle error */ }
+//
+// In tests you can skip the status handling via:
+// GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+// ...
+// b.ToGraphDef(&graph_def);
+
+class GraphDefBuilder {
+ public:
+ // Options for adding a Node to a Graph.
+ class Options {
+ public:
+ // Sets the Graph (that Nodes will be added to) and the status. The
+ // status may be set to nullptr, in which case errors cause CHECK
+ // failures. The graph and status must outlive *this.
+ Options(Graph* graph, Status* status);
+ ~Options();
+
+ // Methods for setting options. These are const methods: they
+ // return a copy of *this with the option set.
+ Options WithName(StringPiece name) const;
+ Options WithDevice(StringPiece device) const;
+ Options WithControlInput(Node* control_input) const;
+ Options WithControlInputs(gtl::ArraySlice<Node*> control_inputs) const;
+
+ // Override the default value for an optional attr.
+ template <class T>
+ Options WithAttr(StringPiece attr_name, T&& value) const {
+ return Options(*this).WithAttrImpl(attr_name, std::forward<T>(value));
+ }
+ // Note: overload needed to allow {...} expressions for value.
+ template <class T>
+ Options WithAttr(StringPiece attr_name,
+ std::initializer_list<T> value) const {
+ return WithAttr<std::initializer_list<T>>(attr_name, std::move(value));
+ }
+
+ // Methods for using options from a function that creates a Node.
+
+ // Returns true if the status associated with *this has an error.
+ // Use this to skip processing that may depend on prior results.
+ bool HaveError() const { return status_ != nullptr && !status_->ok(); }
+
+ // Given the Op type name, return a name for a node of that type.
+ // Uses the value set in WithName() if that has been called. Otherwise,
+ // returns a name built out of the Op type name.
+ string GetNameForOp(StringPiece op) const;
+
+ // Sets the device, adds control inputs, adds attrs, and calls Finalize().
+ // If Finalize returns an error, it is saved and this function returns
+ // nullptr.
+ Node* FinalizeBuilder(NodeBuilder* builder) const;
+
+ // Updates the associated status, if any, or calls TF_CHECK_OK if none.
+ void UpdateStatus(const Status& status) const;
+
+ // Accessor
+ const OpRegistryInterface* op_registry() const {
+ return graph_->op_registry();
+ }
+
+ private:
+ Options WithNameImpl(StringPiece name);
+ Options WithDeviceImpl(StringPiece device);
+ Options WithControlInputImpl(Node* control_input);
+ Options WithControlInputsImpl(gtl::ArraySlice<Node*> control_inputs);
+ template <class T>
+ Options WithAttrImpl(StringPiece name, T&& value) {
+ attrs_.emplace_back(name.ToString(), AttrValue());
+ SetAttrValue(std::forward<T>(value), &attrs_.back().second);
+ return *this;
+ }
+
+ Graph* const graph_;
+ Status* const status_;
+ string name_;
+ string device_;
+ std::vector<Node*> control_inputs_;
+ std::vector<std::pair<string, AttrValue>> attrs_;
+ };
+
+ // Start building a new graph.
+ explicit GraphDefBuilder(
+ const OpRegistryInterface* op_registry = OpRegistry::Global())
+ : graph_(op_registry), opts_(&graph_, &status_) {}
+
+ // For use in tests, where you want to fail immediately on error instead
+ // of checking the status at the end.
+ enum TestFailImmediatelyType { kFailImmediately };
+ explicit GraphDefBuilder(
+ TestFailImmediatelyType,
+ const OpRegistryInterface* op_registry = OpRegistry::Global())
+ : graph_(op_registry), opts_(&graph_, nullptr) {}
+
+ // Gets the Options with the associated Graph and Status.
+ const Options& opts() const { return opts_; }
+
+ // Once all the nodes have been added, call this to get whether it was
+ // successful, and if so fill *graph_def.
+ Status ToGraphDef(GraphDef* graph_def) const;
+
+ // Like ToGraphDef(), but converts to a Graph (using the default
+ // GraphConstructorOptions).
+ // TODO(josh11b): Make this faster; right now it converts
+ // Graph->GraphDef->Graph. This cleans up the graph (e.g. adds
+ // edges from the source and to the sink node, resolves back edges
+ // by name), and makes sure the resulting graph is valid.
+ Status ToGraph(Graph* graph) const;
+
+ private:
+ Graph graph_;
+ Status status_;
+ Options opts_;
+};
+
+namespace ops {
+
+// A NodeOut may either be a regular input or back input. Regular
+// inputs are specified via either a Node* or a Node* and an output
+// index. Back inputs are specified by a node name, output index, and
+// output type.
+typedef NodeBuilder::NodeOut NodeOut;
+
+// For adding an Op with no inputs to a GraphDefBuilder.
+Node* SourceOp(const string& op_name, const GraphDefBuilder::Options& opts);
+
+// For adding an Op with one input to a GraphDefBuilder.
+Node* UnaryOp(const string& op_name, NodeOut input,
+ const GraphDefBuilder::Options& opts);
+
+// For adding an Op with two inputs to a GraphDefBuilder.
+Node* BinaryOp(const string& op_name, NodeOut a, NodeOut b,
+ const GraphDefBuilder::Options& opts);
+
+} // namespace ops
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_GRAPH_DEF_BUILDER_H_
diff --git a/tensorflow/core/graph/graph_partition.cc b/tensorflow/core/graph/graph_partition.cc
new file mode 100644
index 0000000000..1571790e59
--- /dev/null
+++ b/tensorflow/core/graph/graph_partition.cc
@@ -0,0 +1,1050 @@
+#include "tensorflow/core/graph/graph_partition.h"
+
+#include <deque>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/costmodel.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace {
+
+struct DupRecvKey {
+ int src_node_id; // Edge's src node id
+ int src_output_slot; // Edge's src node output slot
+ GraphDef* dst_graph; // Edge's dst node is in this subgraph
+ bool recv_output_on_host; // The output of recv is on host
+};
+
+struct DupRecvKeyHash {
+ size_t operator()(const DupRecvKey& k) const {
+ size_t h = Hash64(reinterpret_cast<const char*>(&k.src_node_id),
+ sizeof(k.src_node_id), k.src_output_slot);
+ h = Hash64(reinterpret_cast<const char*>(&k.dst_graph), sizeof(k.dst_graph),
+ h);
+ h = Hash64(reinterpret_cast<const char*>(&k.recv_output_on_host),
+ sizeof(k.recv_output_on_host), h);
+ return h;
+ }
+};
+
+struct DupRecvKeyEq {
+ bool operator()(const DupRecvKey& x, const DupRecvKey& y) const {
+ return (x.src_node_id == y.src_node_id) &&
+ (x.src_output_slot == y.src_output_slot) &&
+ (x.dst_graph == y.dst_graph) &&
+ (x.recv_output_on_host == y.recv_output_on_host);
+ }
+};
+
+// struct used to store the recvs, so that start times can be properly updated
+struct RecvInfo {
+ NodeDef* recv;
+ NodeDef* real_recv;
+ int64 start_time;
+};
+
+typedef std::unordered_map<DupRecvKey, RecvInfo, DupRecvKeyHash, DupRecvKeyEq>
+ DupRecvTable;
+
+// Control flow info for a graph node.
+struct ControlFlowInfo {
+ const Node* frame = nullptr; // frame of a node
+ const Node* parent_frame = nullptr; // parent frame of a node
+ string frame_name; // frame name of a node
+ int iter_level = -1; // level of a node
+};
+
+struct PairIntHash {
+ public:
+ std::size_t operator()(const std::pair<int, int>& x) const {
+ return std::hash<int>()(x.first) ^ std::hash<int>()(x.second);
+ }
+};
+// A map used to store memory types for the inputs/outputs of every node.
+// The key is a pair of ints consisting of a node id and input/output index.
+typedef std::unordered_map<std::pair<int, int>, MemoryType, PairIntHash>
+ MemoryTypeMap;
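+// (Illustrative: an entry such as input_types[{node->id(), port}] ==
+// HOST_MEMORY means that input of the node expects its tensor in host
+// memory.)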
+
+// We collect the following information about the graph before performing
+// graph partitioning.
+struct GraphInfo {
+ std::vector<DeviceType> device_types;
+ MemoryTypeMap input_types;
+ MemoryTypeMap output_types;
+ std::vector<ControlFlowInfo> cf_info;
+};
+
+DataType EdgeType(const Edge* e) {
+ if (e->IsControlEdge()) {
+ return DT_FLOAT;
+ } else {
+ return e->dst()->input_type(e->dst_input());
+ }
+}
+
+// Return true iff we need to add a same device send/recv for 'edge'.
+bool NeedSameDeviceSendRecv(const Edge* edge, const GraphInfo& info) {
+ if (edge->IsControlEdge()) {
+ return false;
+ }
+
+ Node* src = edge->src();
+ Node* dst = edge->dst();
+ if (src->assigned_device_name() == dst->assigned_device_name()) {
+ int src_port = edge->src_output();
+ int dst_port = edge->dst_input();
+ if (info.device_types[src->id()] == DEVICE_GPU) {
+ auto src_it = info.output_types.find({src->id(), src_port});
+ DCHECK(src_it != info.output_types.end());
+ auto dst_it = info.input_types.find({dst->id(), dst_port});
+ DCHECK(dst_it != info.input_types.end());
+ return src_it->second != dst_it->second;
+ }
+ }
+ return false;
+}
+
+// Return true iff (dst, dst_input) is specified on host memory.
+bool IsDstInputOnHost(const Edge* edge, const GraphInfo& info) {
+ Node* dst = edge->dst();
+ int dst_port = edge->dst_input();
+ if (info.device_types[dst->id()] == DEVICE_GPU) {
+ if (edge->IsControlEdge()) return false;
+ auto dst_it = info.input_types.find({dst->id(), dst_port});
+ DCHECK(dst_it != info.input_types.end());
+ return dst_it->second == HOST_MEMORY;
+ }
+ return true;
+}
+
+// Add an input to dst that comes from the "src_slot" output of the
+// node named by "src_name".
+void AddInput(NodeDef* dst, StringPiece src_name, int src_slot) {
+ if (src_slot == Graph::kControlSlot) {
+ dst->add_input(strings::StrCat("^", src_name));
+ } else if (src_slot == 0) {
+ dst->add_input(src_name.data(), src_name.size());
+ } else {
+ dst->add_input(strings::StrCat(src_name, ":", src_slot));
+ }
+}
+
+// Add a control edge from each input to each recv.
+void AddReadControl(const std::vector<NodeDef*>& recvs,
+ const std::vector<string>& inputs) {
+ for (NodeDef* recv : recvs) {
+ for (const string& input : inputs) {
+ recv->add_input(strings::StrCat("^", input));
+ }
+ }
+}
+
+void SetSendRecvAttrs(const PartitionOptions& opts, const Edge* edge,
+ NodeDefBuilder* builder) {
+ builder->Attr("tensor_name",
+ strings::StrCat("edge_", edge->id(), "_", edge->src()->name()));
+ builder->Attr("send_device", edge->src()->assigned_device_name());
+ builder->Attr("send_device_incarnation",
+ static_cast<int64>(
+ opts.get_incarnation(edge->src()->assigned_device_name())));
+ builder->Attr("recv_device", edge->dst()->assigned_device_name());
+ builder->Attr("client_terminated", false);
+}
+
+NodeDef* AddSend(const PartitionOptions& opts, const GraphInfo& g_info,
+ GraphDef* gdef, const Edge* edge,
+ NodeDefBuilder::NodeOut send_from, int64 start_time,
+ Status* status) {
+ const DataType dtype = send_from.data_type;
+ const DataType cast_dtype = opts.should_cast ? opts.should_cast(edge) : dtype;
+ const Node* src = edge->src();
+ const int src_port = edge->src_output();
+
+ // host_memory = true iff we need to use HostSend/HostCast.
+ bool host_memory = false;
+ if (!edge->IsControlEdge()) {
+ auto src_it = g_info.output_types.find({src->id(), src_port});
+ DCHECK(src_it != g_info.output_types.end());
+ host_memory = (src_it->second == HOST_MEMORY);
+ }
+
+ // Add a cast node that casts dtype to cast_dtype.
+ // NOTE(yuanbyu): Only cast for cross-device send/recv.
+ if (dtype != cast_dtype && !NeedSameDeviceSendRecv(edge, g_info)) {
+ const string cast_op = (host_memory) ? "_HostCast" : "Cast";
+ NodeDefBuilder cast_builder(opts.new_name(src->name()), cast_op);
+ cast_builder.Device(src->assigned_device_name()).Input(send_from);
+ if (opts.scheduling_for_recvs) {
+ cast_builder.Attr("_start_time", start_time);
+ }
+ cast_builder.Attr("DstT", cast_dtype);
+ NodeDef* cast = gdef->add_node();
+ *status = cast_builder.Finalize(cast);
+ if (!status->ok()) return nullptr;
+
+ // Connect the Send op to the cast.
+ send_from.Reset(cast->name(), 0, cast_dtype);
+ }
+
+ // Add the send node.
+ const string send_op = (host_memory) ? "_HostSend" : "_Send";
+ NodeDefBuilder send_builder(opts.new_name(src->name()), send_op);
+ SetSendRecvAttrs(opts, edge, &send_builder);
+ send_builder.Device(src->assigned_device_name()).Input(send_from);
+ if (opts.scheduling_for_recvs) {
+ send_builder.Attr("_start_time", start_time);
+ }
+ NodeDef* send = gdef->add_node();
+ *status = send_builder.Finalize(send);
+ return send;
+}
+
+NodeDef* AddRecv(const PartitionOptions& opts, const GraphInfo& g_info,
+ GraphDef* gdef, const Edge* edge, NodeDef** real_recv,
+ Status* status) {
+ const DataType dtype = EdgeType(edge);
+ const Node* src = edge->src();
+ const Node* dst = edge->dst();
+ const int dst_port = edge->dst_input();
+ DataType cast_dtype = dtype;
+
+ // NOTE(yuanbyu): Only cast for cross-device send/recv.
+ if (opts.should_cast && !NeedSameDeviceSendRecv(edge, g_info)) {
+ cast_dtype = opts.should_cast(edge);
+ }
+
+ // host_memory = true iff we need to use HostRecv/HostCast.
+ bool host_memory = false;
+ if (!edge->IsControlEdge()) {
+ auto dst_it = g_info.input_types.find({dst->id(), dst_port});
+ DCHECK(dst_it != g_info.input_types.end());
+ host_memory = (dst_it->second == HOST_MEMORY);
+ }
+
+ // Add the recv node.
+ const string recv_op = (host_memory) ? "_HostRecv" : "_Recv";
+ NodeDefBuilder recv_builder(opts.new_name(src->name()), recv_op);
+ SetSendRecvAttrs(opts, edge, &recv_builder);
+ recv_builder.Device(dst->assigned_device_name())
+ .Attr("tensor_type", cast_dtype);
+ NodeDef* recv = gdef->add_node();
+ *status = recv_builder.Finalize(recv);
+ if (!status->ok()) return nullptr;
+ *real_recv = recv;
+
+ // Add the cast node (from cast_dtype to dtype) or an Identity node.
+ if (dtype != cast_dtype) {
+ const string cast_op = (host_memory) ? "_HostCast" : "Cast";
+ NodeDefBuilder cast_builder(opts.new_name(src->name()), cast_op);
+ cast_builder.Attr("DstT", dtype);
+ cast_builder.Device(dst->assigned_device_name())
+ .Input(recv->name(), 0, cast_dtype);
+ NodeDef* cast = gdef->add_node();
+ *status = cast_builder.Finalize(cast);
+ if (!status->ok()) return nullptr;
+ return cast;
+ } else if (edge->IsControlEdge()) {
+ // An Identity is only needed for control edges.
+ NodeDefBuilder id_builder(opts.new_name(src->name()), "Identity");
+ id_builder.Device(dst->assigned_device_name())
+ .Input(recv->name(), 0, cast_dtype);
+ NodeDef* id = gdef->add_node();
+ *status = id_builder.Finalize(id);
+ if (!status->ok()) return nullptr;
+ return id;
+ } else {
+ return recv;
+ }
+}
+
+NodeDef* AddDummyConst(const PartitionOptions& opts, GraphDef* gdef,
+ const Edge* edge, Status* status) {
+ const Node* src = edge->src();
+ Tensor tensor(DT_FLOAT, TensorShape({0}));
+ NodeDef* result = gdef->add_node();
+ *status = NodeDefBuilder(opts.new_name(src->name()), "Const")
+ .Device(src->assigned_device_name())
+ .Attr("dtype", DT_FLOAT)
+ .Attr("value", tensor)
+ .Finalize(result);
+ return result;
+}
+
+// A dummy node for scheduling.
+NodeDef* AddControlTrigger(const PartitionOptions& opts, GraphDef* gdef,
+ const string& assigned_device_name, int64 epoch,
+ int64 starttime, Status* status) {
+ NodeDef* result = gdef->add_node();
+ *status = NodeDefBuilder(opts.new_name(strings::StrCat("synch_", epoch)),
+ "ControlTrigger")
+ .Device(assigned_device_name)
+ .Attr("_start_time", starttime)
+ .Finalize(result);
+ return result;
+}
+
+// Assign to each node the name of the frame and the level it belongs to.
+// We check the well-formedness of the graph: All inputs to a node must
+// come from the same frame and have the same "static" iteration level.
+// NOTE(yuanbyu): For now, we require all sends/recvs have iteration level
+// 0. This essentially means there can't be multiple serial Nexts in
+// an iteration, which all sane front-ends should satisfy.
+Status BuildControlFlowInfo(Graph* g, std::vector<ControlFlowInfo>* info) {
+ info->clear();
+ info->resize(g->num_node_ids());
+
+ Node* src_node = g->source_node();
+ ControlFlowInfo& src_info = (*info)[src_node->id()];
+ src_info.frame = src_node;
+ src_info.parent_frame = src_node;
+ src_info.iter_level = 0;
+
+ string frame_name;
+ std::deque<const Node*> ready;
+ ready.push_back(src_node);
+ while (!ready.empty()) {
+ const Node* curr_node = ready.front();
+ ready.pop_front();
+ const ControlFlowInfo& curr_info = (*info)[curr_node->id()];
+ const Node* frame = curr_info.frame;
+ const Node* parent = curr_info.parent_frame;
+ frame_name = curr_info.frame_name;
+ int iter_level = curr_info.iter_level;
+
+ if (IsExit(curr_node)) {
+ const ControlFlowInfo& parent_info = (*info)[parent->id()];
+ frame = parent_info.frame;
+ parent = parent_info.parent_frame;
+ frame_name = parent_info.frame_name;
+ iter_level = parent_info.iter_level;
+ }
+
+ for (const Edge* out_edge : curr_node->out_edges()) {
+ const Node* out = out_edge->dst();
+ int out_id = out->id();
+ ControlFlowInfo* out_info = &(*info)[out_id];
+ const Node* out_parent = out_info->parent_frame;
+ bool is_visited = (out_info->iter_level != -1);
+
+ // Skip Sink/Source nodes.
+ if (!out->IsOp()) continue;
+
+ // Add to ready queue if not seen.
+ if (!is_visited) {
+ ready.push_back(out);
+ }
+
+ // Process the node 'out'.
+ if (IsEnter(out)) {
+ if (is_visited) {
+ const string& parent_name = (*info)[out_parent->id()].frame_name;
+ if (parent_name != frame_name || iter_level != out_info->iter_level) {
+ return errors::InvalidArgument(
+ "All inputs to Enter must be from the same frame and level.");
+ }
+ } else {
+ out_info->frame = out;
+ out_info->parent_frame = frame;
+ TF_RETURN_IF_ERROR(
+ GetNodeAttr(out->def(), "frame_name", &out_info->frame_name));
+ if (out_info->frame_name.empty()) {
+ return errors::InvalidArgument(
+ "Enter must have a non-empty frame name.");
+ }
+ out_info->iter_level = 0;
+ }
+ } else if (IsNextIteration(out)) {
+ if (is_visited) {
+ if (out_info->frame_name != frame_name ||
+ out_info->iter_level != (iter_level + 1)) {
+ return errors::InvalidArgument(
+ "All inputs to NextIteration must be from the same frame "
+ "and level.");
+ }
+ } else {
+ out_info->frame = frame;
+ out_info->parent_frame = parent;
+ out_info->frame_name = frame_name;
+ out_info->iter_level = iter_level + 1;
+ }
+ } else {
+ if (is_visited) {
+ if (out_info->frame_name != frame_name) {
+ return errors::InvalidArgument(
+ "All inputs to a node must be from the same frame.");
+ }
+ } else {
+ out_info->frame = frame;
+ out_info->parent_frame = parent;
+ out_info->frame_name = frame_name;
+ out_info->iter_level = iter_level;
+ }
+ }
+ }
+ }
+
+ return Status::OK();
+}
+
+string ControlLoopName(const string& name) {
+ return strings::StrCat("_cloop", name);
+}
+
+bool IsControlLoop(const Node* node) {
+ const string& name = node->def().name();
+ return StringPiece(name).starts_with("_cloop");
+}
+
+// An enter node for control flow.
+Node* AddControlEnter(Graph* g, const string& node_name,
+ const string& device_name, const string& frame_name,
+ const int parallel_iterations, Status* status) {
+ NodeBuilder node_builder(node_name, "Enter", g->op_registry());
+ node_builder.Input({"dummy", 0, DT_FLOAT});
+ node_builder.Attr("frame_name", frame_name);
+ node_builder.Attr("parallel_iterations", parallel_iterations);
+ Node* res_node;
+ *status = node_builder.Finalize(g, &res_node);
+ if (!status->ok()) return nullptr;
+ res_node->set_assigned_device_name(device_name);
+ return res_node;
+}
+
+// A merge node for control flow.
+Node* AddControlMerge(const string& in_name1, const string& in_name2, Graph* g,
+ const string& node_name, const string& device_name,
+ Status* status) {
+ NodeBuilder node_builder(node_name, "Merge", g->op_registry());
+ node_builder.Input({{in_name1, 0, DT_FLOAT}, {in_name2, 0, DT_FLOAT}});
+ Node* res_node;
+ *status = node_builder.Finalize(g, &res_node);
+ if (!status->ok()) return nullptr;
+ res_node->set_assigned_device_name(device_name);
+ return res_node;
+}
+
+// A switch node for control flow.
+Node* AddControlSwitch(NodeBuilder::NodeOut input1, NodeBuilder::NodeOut input2,
+ const string& device_name,
+ const GraphDefBuilder::Options& bopts) {
+ Node* res_node = ops::BinaryOp("Switch", input1, input2, bopts);
+ if (bopts.HaveError()) return nullptr;
+ res_node->set_assigned_device_name(device_name);
+ return res_node;
+}
+
+// A next_iteration node for control flow.
+Node* AddControlNext(NodeBuilder::NodeOut input, const string& device_name,
+ const GraphDefBuilder::Options& bopts) {
+ Node* res_node = ops::UnaryOp("NextIteration", input, bopts);
+ if (bopts.HaveError()) return nullptr;
+ res_node->set_assigned_device_name(device_name);
+ return res_node;
+}
+
+Node* EmptyConst(const GraphDefBuilder::Options& options) {
+ if (options.HaveError()) return nullptr;
+ NodeBuilder node_builder(options.GetNameForOp("Const"), "Const",
+ options.op_registry());
+ const DataType dt = DataTypeToEnum<float>::v();
+ TensorProto proto;
+ proto.set_dtype(dt);
+ TensorShape empty_shape({0});
+ empty_shape.AsProto(proto.mutable_tensor_shape());
+ node_builder.Attr("dtype", dt).Attr("value", proto);
+ return options.FinalizeBuilder(&node_builder);
+}
+
+// A dummy const node for control flow.
+Node* AddControlConst(const string& device_name,
+ const GraphDefBuilder::Options& bopts) {
+ Node* res_node = EmptyConst(bopts);
+ if (bopts.HaveError()) return nullptr;
+ res_node->set_assigned_device_name(device_name);
+ return res_node;
+}
+
+// A synthetic loop, made up of dummy nodes. It performs control-flow actions
+// on behalf of a leader on a different device.
+struct ControlLoop {
+ Node* enter = nullptr;
+ Node* merge = nullptr;
+ Node* switch_node = nullptr;
+};
+
+// Add the control flow info of a new node added during partitioning.
+// The new node has the same control flow info as edge->src().
+void AddControlFlowInfo(const Node* node, const Node* src,
+ std::vector<ControlFlowInfo>* cf_info) {
+ int id = node->id();
+ if (static_cast<size_t>(id) >= cf_info->size()) {
+ cf_info->resize(id + 1);
+ }
+ const ControlFlowInfo& src_info = (*cf_info)[src->id()];
+ ControlFlowInfo* info = &(*cf_info)[id];
+ info->frame = src_info.frame;
+ info->parent_frame = src_info.parent_frame;
+ info->frame_name = src_info.frame_name;
+ info->iter_level = src_info.iter_level;
+}
+
+// Constructs a control loop. Returns a struct containing the newly created
+// enter, merge, and switch nodes. The enter and merge nodes are used in the
+// recursive construction of control loops for nested frames (loops). The
+// switch node will be connected to the LoopCond node. The merge node will
+// be connected to all the recvs of the same frame by control edges when
+// the actual partitioning happens.
+Status AddControlLoop(const PartitionOptions& opts, Graph* g, const Node* src,
+ const Edge* edge, Node* loop_cond,
+ std::vector<ControlFlowInfo>* cf_info,
+ ControlLoop* loop) {
+ Status status;
+ GraphDefBuilder::Options bopts(g, &status);
+ const ControlFlowInfo& src_info = (*cf_info)[src->id()];
+ const string& device_name = edge->dst()->assigned_device_name();
+ const string& frame_name = src_info.frame_name;
+ int parallel_iterations;
+ status = GetNodeAttr(src_info.frame->def(), "parallel_iterations",
+ &parallel_iterations);
+ if (!status.ok()) return status;
+
+ // The names of the nodes to be added.
+ const string& enter_name =
+ ControlLoopName(opts.new_name(edge->dst()->name()));
+ const string& merge_name =
+ ControlLoopName(opts.new_name(edge->dst()->name()));
+ const string& switch_name =
+ ControlLoopName(opts.new_name(edge->dst()->name()));
+ const string& next_name = ControlLoopName(opts.new_name(edge->dst()->name()));
+
+ // Add the nodes to the graph g.
+ Node* enter = AddControlEnter(g, enter_name, device_name, frame_name,
+ parallel_iterations, &status);
+ if (!status.ok()) return status;
+ Node* merge = AddControlMerge(enter_name, next_name, g, merge_name,
+ device_name, &status);
+ if (!status.ok()) return status;
+ Node* switch_node = AddControlSwitch(merge, loop_cond, device_name,
+ bopts.WithName(switch_name));
+ if (!status.ok()) return status;
+ Node* next =
+ AddControlNext({switch_node, 1}, device_name, bopts.WithName(next_name));
+ if (!status.ok()) return status;
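+ // These four nodes form the skeleton of a while loop: Enter feeds Merge,
+ // Merge feeds the Switch predicated on the frame's LoopCond, and the
+ // Switch's true output feeds NextIteration, which loops back into Merge.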
+
+ // Add control flow info for these new nodes:
+ AddControlFlowInfo(enter, src, cf_info);
+ AddControlFlowInfo(merge, src, cf_info);
+ AddControlFlowInfo(switch_node, src, cf_info);
+ AddControlFlowInfo(next, src, cf_info);
+
+ // Add input edges for the newly created merge node:
+ g->AddEdge(enter, 0, merge, 0);
+ g->AddEdge(next, 0, merge, 1);
+
+ loop->enter = enter;
+ loop->merge = merge;
+ loop->switch_node = switch_node;
+ return Status::OK();
+}
+
+// Build memory and device type info for every node in the graph.
+// TODO(yuanbyu): It might be simpler if we convert MemoryType to
+// DeviceType for the inputs/outputs of each node.
+Status BuildMemoryDeviceInfo(const Graph& g, GraphInfo* info) {
+ Status status;
+ MemoryTypeVector input_memory_types;
+ MemoryTypeVector output_memory_types;
+
+ info->device_types.resize(g.num_node_ids(), DEVICE_CPU);
+ for (const Node* node : g.nodes()) {
+ if (!node->IsOp()) continue; // Skip Sink/Source nodes.
+
+ DeviceNameUtils::ParsedName parsed;
+ if (!DeviceNameUtils::ParseFullName(node->assigned_device_name(),
+ &parsed)) {
+ return errors::Internal("Malformed assigned device '",
+ node->assigned_device_name(), "'");
+ }
+
+ input_memory_types.clear();
+ input_memory_types.resize(node->num_inputs());
+ output_memory_types.clear();
+ output_memory_types.resize(node->num_outputs());
+ status = MemoryTypesForNode(g.op_registry(), DeviceType(parsed.type),
+ node->def(), &input_memory_types,
+ &output_memory_types);
+ if (!status.ok()) return status;
+
+ int node_id = node->id();
+ info->device_types[node_id] = DeviceType(parsed.type);
+ for (size_t i = 0; i < input_memory_types.size(); ++i) {
+ info->input_types[{node_id, i}] = input_memory_types[i];
+ }
+ for (size_t i = 0; i < output_memory_types.size(); ++i) {
+ info->output_types[{node_id, i}] = output_memory_types[i];
+ }
+ }
+ return status;
+}
+
+// Each participating device needs to decide a) if there is a next iteration,
+ // and b) if the loop terminates. We take the approach of encoding this control
+// flow logic in the dataflow graph. There are at least two possible encodings.
+// In a completely decentralized encoding, the participants communicate peer
+// to peer. The other encoding uses a frame leader (the participant who owns
+// the pivot termination predicate) to broadcast the termination condition to
+// all the participants. For now we take the latter because it is simpler.
+//
+// TODO(yuanbyu): The correctness of this construction is rather subtle. I got
+// it wrong many times so it would be nice to write a proof to be sure.
+Status AddControlFlow(const PartitionOptions& opts, Graph* g,
+ GraphInfo* g_info) {
+ Status status;
+ GraphDefBuilder::Options bopts(g, &status);
+ std::vector<ControlFlowInfo>& cf_info = g_info->cf_info;
+
+ // Build the control flow info for every node.
+ status = BuildControlFlowInfo(g, &cf_info);
+ if (!status.ok()) return status;
+
+ // The map from frames to their LoopCond nodes.
+ std::unordered_map<string, Node*> frame_cond_map;
+ int num_node_ids = g->num_node_ids();
+ for (int i = 0; i < num_node_ids; ++i) {
+ Node* node = g->FindNodeId(i);
+ if (node == nullptr) continue;
+
+ if (IsLoopCond(node)) {
+ const string& frame_name = cf_info[node->id()].frame_name;
+ DCHECK(!frame_name.empty());
+ frame_cond_map[frame_name] = node;
+ }
+ }
+
+ // Add all control loops for cross-device frames.
+ // A control loop is added only when there is a cross-device edge in a
+ // non-root frame. Nothing is added if there are no loops. We also don't
+ // add anything for a frame that is completely local to a device. For
+ // nested loops, we stack the control loops together by connecting
+ // the merge of the outer loop to the enter of the inner loop.
+ //
+ // A map from <frame_name, device_name> to ControlLoop.
+ std::unordered_map<string, ControlLoop> control_loops;
+ int num_edge_ids = g->num_edge_ids();
+ for (int i = 0; i < num_edge_ids; ++i) {
+ const Edge* edge = g->FindEdgeId(i);
+ if (edge == nullptr) continue;
+
+ const Node* src = edge->src();
+ const Node* dst = edge->dst();
+ // Skip Sink/Source nodes.
+ if (!src->IsOp() || !dst->IsOp()) continue;
+
+ const string& src_device = src->assigned_device_name();
+ const string& dst_device = dst->assigned_device_name();
+ // Skip local edges.
+ if (src_device == dst_device) continue;
+
+ const string& src_frame = cf_info[src->id()].frame_name;
+ const string& dst_frame = cf_info[dst->id()].frame_name;
+ // Skip if src and dst are not in the same frame.
+ if (src_frame.empty() || src_frame != dst_frame) {
+ continue;
+ }
+
+ // Add the control loop. Start by adding the control loop for the
+ // current frame if needed, and recursively adding the control loop
+ // for its outer frame when nested.
+ ControlLoop child_loop;
+ while (true) {
+ const string& curr_frame = cf_info[src->id()].frame_name;
+ if (curr_frame.empty()) {
+ // We have reached the root frame.
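+ // The outermost control Enter still needs a data input, so feed it
+ // a dummy constant placed on the dst device.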
+ if (child_loop.merge != nullptr) {
+ const string& node_name = opts.new_name(edge->dst()->name());
+ const string& device_name = edge->dst()->assigned_device_name();
+ Node* const_node =
+ AddControlConst(device_name, bopts.WithName(node_name));
+ if (!status.ok()) return status;
+ AddControlFlowInfo(const_node, src, &cf_info);
+ g->AddEdge(const_node, 0, child_loop.enter, 0);
+ }
+ break;
+ }
+
+ const string& cl_key = strings::StrCat(curr_frame, "$$", dst_device);
+ auto it = control_loops.find(cl_key);
+ if (it != control_loops.end()) {
+ if (child_loop.enter != nullptr) {
+ g->AddEdge(it->second.merge, 0, child_loop.enter, 0);
+ }
+ break;
+ }
+
+ // Get the frame's LoopCond.
+ auto cond_it = frame_cond_map.find(curr_frame);
+ if (cond_it == frame_cond_map.end()) {
+ return errors::InvalidArgument(
+ "A cross-device loop must have a pivot predicate: ", curr_frame);
+ }
+ Node* loop_cond = cond_it->second;
+
+ // Add the control loop.
+ ControlLoop curr_loop;
+ status =
+ AddControlLoop(opts, g, src, edge, loop_cond, &cf_info, &curr_loop);
+ if (!status.ok()) return status;
+ control_loops[cl_key] = curr_loop;
+
+ if (child_loop.enter != nullptr) {
+ // Connect the merge of the outer loop to the enter of the inner.
+ g->AddEdge(curr_loop.merge, 0, child_loop.enter, 0);
+ }
+ src = cf_info[src->id()].parent_frame;
+ child_loop = curr_loop;
+ }
+ }
+
+ // For a cross-device edge, on the dst device, add a control edge
+ // from the merge node of the control loop to dst. If a send/recv is
+ // introduced for this edge in future partitioning, we delete this
+ // control edge and add a new control edge from the merge to the recv.
+ num_edge_ids = g->num_edge_ids();
+ for (int i = 0; i < num_edge_ids; ++i) {
+ const Edge* edge = g->FindEdgeId(i);
+ if (edge == nullptr) continue;
+
+ const Node* src = edge->src();
+ Node* dst = edge->dst();
+ // Skip Sink/Source nodes.
+ if (!src->IsOp() || !dst->IsOp()) continue;
+
+ const string& src_device = src->assigned_device_name();
+ const string& dst_device = dst->assigned_device_name();
+ if (src_device != dst_device) {
+ const string& src_frame = cf_info[src->id()].frame_name;
+ const string& dst_frame = cf_info[dst->id()].frame_name;
+ if (!src_frame.empty() && src_frame == dst_frame) {
+ const string& cl_key = strings::StrCat(dst_frame, "$$", dst_device);
+ ControlLoop loop = control_loops[cl_key];
+ DCHECK(loop.enter != nullptr);
+ g->AddControlEdge(loop.merge, dst);
+ }
+ }
+ }
+ return Status::OK();
+}
+
+} // end namespace
+
+Status AddControlEdges(const PartitionOptions& opts,
+ std::unordered_map<string, GraphDef>* partitions) {
+ Status status;
+ // TODO(yuanbyu): Very naive for now. To be improved.
+ const int num_epochs = 100;
+ const int prefetch = 6;
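+ // A recv whose start time falls in epoch e is gated on the ControlTrigger
+ // of epoch (e - prefetch), so at most 'prefetch' epochs of recvs can run
+ // ahead of the execution frontier.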
+
+ typedef std::pair<const NodeDef*, int64> NodeStartTime;
+ for (auto& part : *partitions) {
+ GraphDef* gdef = &part.second;
+
+ std::vector<NodeStartTime> start_times;
+ start_times.resize(gdef->node_size());
+ for (int n = 0; n < gdef->node_size(); ++n) {
+ const NodeDef& ndef = gdef->node(n);
+ int64 start_time;
+ status = GetNodeAttr(ndef, "_start_time", &start_time);
+ if (!status.ok()) {
+ return status;
+ }
+ start_times[n] = std::make_pair(&ndef, start_time);
+ }
+
+ // Sort the nodes based on their start times.
+ std::sort(
+ start_times.begin(), start_times.end(),
+ [](NodeStartTime x, NodeStartTime y) { return x.second < y.second; });
+
+ // Add a dummy node for every epoch, and add a control edge from the
+ // "last" node in the preceding epoch to the dummy node.
+ string device_name = gdef->node(0).device();
+ int64 makespan = start_times.back().second;
+ int64 resolution = (makespan / num_epochs) + 1;
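+ // 'resolution' is the width of one epoch in start-time units; a node
+ // with start time t belongs to epoch t / resolution.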
+
+ int i = 0;
+ int j = 0;
+ std::vector<NodeDef*> dummys;
+ while (i < num_epochs && static_cast<size_t>(j) < start_times.size()) {
+ if (i * resolution > start_times[j].second) {
+ j++;
+ } else {
+ NodeDef* dummy = AddControlTrigger(opts, gdef, device_name, i,
+ i * resolution, &status);
+ if (!status.ok()) {
+ return status;
+ }
+ dummys.push_back(dummy);
+ if (j > 0) {
+ string src_name = start_times[j - 1].first->name();
+ AddInput(dummy, src_name, Graph::kControlSlot);
+ }
+ i++;
+ }
+ }
+
+ // Finally, add the control edges to recvs.
+ for (int n = 0; n < gdef->node_size(); ++n) {
+ NodeDef* ndef = gdef->mutable_node(n);
+ if (ndef->op() == "_Recv") {
+ int64 start_time;
+ status = GetNodeAttr(*ndef, "_start_time", &start_time);
+ if (!status.ok()) {
+ return status;
+ }
+ int recv_epoch = start_time / resolution;
+ if (recv_epoch >= prefetch) {
+ NodeDef* dummy = dummys[recv_epoch - prefetch];
+ AddInput(ndef, dummy->name(), Graph::kControlSlot);
+ }
+ }
+ }
+ }
+ return Status::OK();
+}
+
+Status Partition(const PartitionOptions& opts, Graph* g,
+ std::unordered_map<string, GraphDef>* partitions) {
+ Status status;
+ partitions->clear();
+
+ GraphInfo g_info;
+ if (!opts.control_flow_added) {
+ // Add the "code" for distributed execution of control flow. Code is
+ // added only for the frames that are placed on multiple devices. The
+ // new graph is an equivalent transformation of the original graph and
+ // has the property that it can be subsequently partitioned arbitrarily
+ // (down to the level of individual devices) for distributed execution.
+ status = AddControlFlow(opts, g, &g_info);
+ if (!status.ok()) return status;
+ }
+ // At this point, all the graph mutations have been done. Build memory
+ // and device type info for every node and edge in the graph.
+ status = BuildMemoryDeviceInfo(*g, &g_info);
+ if (!status.ok()) return status;
+
+ string dstp;
+ std::vector<const Edge*> inputs;
+ DupRecvTable dup_recv(3);
+ // For a node dst, 'ref_recvs' remembers the recvs introduced by a ref
+ // edge to dst. 'ref_control_inputs' remembers the inputs added via a
+ // non-ref edge to dst. We will add a control edge for every pair in
+ // (ref_recvs x ref_control_inputs).
+ std::vector<NodeDef*> ref_recvs;
+ std::vector<string> ref_control_inputs;
+
+ int32 num_data = 0;
+ int32 num_control = 0;
+ for (const Node* dst : g->nodes()) {
+ if (!dst->IsOp()) continue; // Skip Sink/Source nodes.
+
+ dstp = opts.node_to_loc(dst);
+ GraphDef* dst_graph = &(*partitions)[dstp];
+ NodeDef* dst_def = dst_graph->add_node();
+ *dst_def = dst->def();
+ dst_def->set_device(dst->assigned_device_name());
+ dst_def->clear_input(); // Inputs are filled below
+ if (opts.need_to_record_start_times) {
+ int64 start_time = opts.start_times[dst->id()].value();
+ AddNodeAttr("_start_time", start_time, dst_def);
+ }
+
+ // Arrange the incoming edges to dst so that input[i] holds the
+ // input flowing into slot numbered i. Trailing entries in input[]
+ // hold control edges.
+ inputs.clear();
+ inputs.resize(dst->num_inputs(), nullptr);
+ ref_recvs.clear();
+ ref_control_inputs.clear();
+ const Edge* control_flow_edge = nullptr;
+ for (const Edge* edge : dst->in_edges()) {
+ if (edge->IsControlEdge()) {
+ if (IsMerge(edge->src()) && IsControlLoop(edge->src())) {
+ // This is one of the control edges added for control flow. There
+ // can be multiple such edges as the dest node may have multiple
+ // remote inputs. We will just take one and ignore the others.
+ control_flow_edge = edge;
+ } else {
+ inputs.push_back(edge);
+ }
+ } else {
+ DCHECK(inputs[edge->dst_input()] == nullptr);
+ inputs[edge->dst_input()] = edge;
+ }
+ }
+
+ // Process in order so that all data edges are added as inputs to
+ // dst in Edge::dst_input() order.
+ bool recv_added = false;
+ for (const Edge* edge : inputs) {
+ const Node* src = edge->src();
+ if (!src->IsOp()) continue; // Skip Sink/Source nodes.
+
+ GraphDef* src_graph = &(*partitions)[opts.node_to_loc(src)];
+ if (src_graph == dst_graph && !NeedSameDeviceSendRecv(edge, g_info)) {
+ // Same partition and compatible memory types:
+ AddInput(dst_def, src->name(), edge->src_output());
+ if (edge->IsControlEdge() ||
+ !IsRefType(src->output_type(edge->src_output()))) {
+ ref_control_inputs.push_back(src->name());
+ }
+ continue;
+ }
+
+ int64 send_start_time = 0;
+ int64 recv_start_time = 0;
+ if (opts.scheduling_for_recvs) {
+ if (opts.need_to_record_start_times) {
+ send_start_time = opts.start_times[src->id()].value();
+ recv_start_time = opts.start_times[dst->id()].value();
+ } else {
+ status = GetNodeAttr(src->def(), "_start_time", &send_start_time);
+ if (!status.ok()) {
+ return status;
+ }
+ status = GetNodeAttr(dst->def(), "_start_time", &recv_start_time);
+ if (!status.ok()) {
+ return status;
+ }
+ }
+ }
+
+ // Check whether there is already a send/recv pair transferring
+ // the same tensor/control from the src to dst partition.
+ const bool on_host = IsDstInputOnHost(edge, g_info);
+ DupRecvKey key{src->id(), edge->src_output(), dst_graph, on_host};
+ auto iter = dup_recv.find(key);
+ if (iter != dup_recv.end()) {
+ // We found one. Reuse the data/control transferred already.
+ const string& recv_node_name = iter->second.recv->name();
+ if (edge->IsControlEdge()) {
+ AddInput(dst_def, recv_node_name, Graph::kControlSlot);
+ } else {
+ AddInput(dst_def, recv_node_name, 0);
+ }
+ // We want the start_time for the recv to be the smallest of the start
+ // times of its consumers. So we update this whenever we use a recv,
+ // and write it out to the attribute at the end of the subroutine.
+ if (iter->second.start_time > recv_start_time) {
+ iter->second.start_time = recv_start_time;
+ }
+ continue;
+ }
+
+ NodeDefBuilder::NodeOut send_from;
+ if (edge->IsControlEdge()) {
+ // Insert a dummy const node that will generate a tiny
+ // data element to be sent from send to recv.
+ VLOG(1) << "Send/Recv control: " << src->assigned_device_name() << "["
+ << src->name() << "] -> " << dst->assigned_device_name() << "["
+ << dst->name() << "]";
+ NodeDef* dummy = AddDummyConst(opts, src_graph, edge, &status);
+ if (!status.ok()) return status;
+ // Set the start time for this dummy node.
+ if (opts.scheduling_for_recvs) {
+ AddNodeAttr("_start_time", send_start_time, dummy);
+ }
+ AddInput(dummy, src->name(), Graph::kControlSlot);
+ send_from.Reset(dummy->name(), 0, DT_FLOAT);
+ } else {
+ send_from.Reset(src->name(), edge->src_output(), EdgeType(edge));
+ }
+
+ // Need to split edge by placing matching send/recv nodes on
+ // the src/dst sides of the edge.
+ NodeDef* send = AddSend(opts, g_info, src_graph, edge, send_from,
+ send_start_time, &status);
+ if (!status.ok()) return status;
+
+ NodeDef* real_recv = nullptr;
+ NodeDef* recv =
+ AddRecv(opts, g_info, dst_graph, edge, &real_recv, &status);
+ if (!status.ok()) return status;
+
+ // Fix up the control flow edge. Redirect it to the recv.
+ // NOTE(yuanbyu): 'real_recv' must be the real recv node.
+ recv_added = true;
+ if (control_flow_edge != nullptr) {
+ AddInput(real_recv, control_flow_edge->src()->name(),
+ Graph::kControlSlot);
+ }
+
+ // For same device send/recv, add a control edge from send to recv.
+ // This prevents the asynchronous recv kernel from being scheduled
+ // immediately.
+ if (src_graph == dst_graph) {
+ AddInput(real_recv, send->name(), Graph::kControlSlot);
+ }
+
+ if (!edge->IsControlEdge() &&
+ IsRefType(src->output_type(edge->src_output()))) {
+ // If src is of ref type and the edge is not a control edge, dst has
+ // read semantics and therefore we must control the recv.
+ ref_recvs.push_back(real_recv);
+ } else {
+ // Memorize the send/recv pair, only if this is not a "ref" edge.
+ // NOTE(yuanbyu): Collapsing ref edges requires extreme care so
+ // for now we don't do it.
+ dup_recv[key] = {recv, real_recv, recv_start_time};
+ ref_control_inputs.push_back(recv->name());
+ }
+
+ if (edge->IsControlEdge()) {
+ ++num_control;
+ AddInput(dst_def, recv->name(), Graph::kControlSlot);
+ } else {
+ ++num_data;
+ AddInput(dst_def, recv->name(), 0);
+ }
+ }
+
+ // Add control edges from 'ref_control_inputs' to 'ref_recvs'.
+ // NOTE(yuanbyu): Adding these control edges should not introduce
+ // deadlocks. 'dst' has implicit "read" nodes that, when we split
+ // across devices, are made explicit; retargeting the dependencies
+ // on 'dst' to those nodes will not introduce cycles if there were
+ // none before the transformation.
+ // NOTE(yuanbyu): This may impact performance because it defers the
+ // execution of recvs until all the other inputs become available.
+ AddReadControl(ref_recvs, ref_control_inputs);
+
+ // Add back this control edge for control flow if not used.
+ if (!recv_added && (control_flow_edge != nullptr)) {
+ AddInput(dst_def, control_flow_edge->src()->name(), Graph::kControlSlot);
+ }
+ }
+
+ // Set the start times for recvs at the very end.
+ if (opts.scheduling_for_recvs) {
+ for (auto& it : dup_recv) {
+ AddNodeAttr("_start_time", it.second.start_time, it.second.recv);
+ if (it.second.real_recv != it.second.recv) {
+ AddNodeAttr("_start_time", it.second.start_time, it.second.real_recv);
+ }
+ }
+ }
+
+ VLOG(1) << "Added send/recv: controls=" << num_control
+ << ", data=" << num_data;
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph_partition.h b/tensorflow/core/graph/graph_partition.h
new file mode 100644
index 0000000000..eb88ff71b1
--- /dev/null
+++ b/tensorflow/core/graph/graph_partition.h
@@ -0,0 +1,77 @@
+#ifndef TENSORFLOW_GRAPH_GRAPH_PARTITION_H_
+#define TENSORFLOW_GRAPH_GRAPH_PARTITION_H_
+
+#include <functional>
+#include <string>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/costmodel.h"
+
+namespace tensorflow {
+
+struct PartitionOptions {
+ // A function that returns a location for the execution of a given
+ // Node.
+ typedef std::function<string(const Node*)> NodeToLocFunc;
+ NodeToLocFunc node_to_loc = nullptr;
+
+ // A function that returns a unique graph node name with the given
+ // prefix.
+ typedef std::function<string(const string&)> NewNameFunc;
+ NewNameFunc new_name = nullptr;
+
+ // A function that returns the incarnation of a device given the
+ // device's fullname. If not found, GetIncarnationFunc should return
+ // kIllegalIncarnation.
+ static const uint64 kIllegalIncarnation = 0;
+ typedef std::function<uint64(const string&)> GetIncarnationFunc;
+ GetIncarnationFunc get_incarnation = nullptr;
+
+ // True if all the control flow "code" has already been added. The
+ // control flow code needs to be added when we still have the entire
+ // graph before any partitioning. So this flag should be false for
+ // the first partitioning but true for all subsequent partitioning.
+ //
+ // TODO(yuanbyu): We could also make the addition of the control
+ // flow code incremental based on 'node_to_loc'. This makes the
+ // communication a broadcast tree, which could be more efficient when
+ // the number of participating devices is large.
+ bool control_flow_added;
+
+ // A function that returns the data type into which the tensor
+ // should be cast before being sent over the wire.
+ typedef std::function<DataType(const Edge*)> ShouldCastFunc;
+ ShouldCastFunc should_cast = nullptr;
+
+ // Schedule the execution of the recvs based on their start times
+ // computed by some scheduling algorithm. The recvs are divided into
+ // epochs based on their start times. A recv is enabled only when
+ // execution reaches its epoch - N for some predefined N.
+ bool scheduling_for_recvs = false;
+ // The start time for each node in the graph computed by some scheduling
+ // algorithm. If 'need_to_record_start_times' is true, we record them
+ // in the graph as a node attribute.
+ bool need_to_record_start_times = false;
+ std::vector<Microseconds> start_times;
+};
+
+// Partition "input" graph into a set of graphs, one per location.
+// The location for node n is derived by calling opts.node_to_loc(n).
+// New nodes added by Partition use "opts.new_name(old_name)" to
+// generate node names.
+//
+// Stores the partitions in *partitions.
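+//
+// A minimal usage sketch, modeled on graph_partition_test.cc; the callbacks
+// below are placeholders and would normally reflect real device placement
+// and incarnation bookkeeping:
+//
+//   Graph g(OpRegistry::Global());
+//   // ... build the graph and assign a device to every node ...
+//   PartitionOptions popts;
+//   popts.node_to_loc = [](const Node* n) { return n->assigned_device_name(); };
+//   popts.new_name = [&g](const string& prefix) { return g.NewName(prefix); };
+//   popts.get_incarnation = [](const string& name) { return 1; };  // placeholder
+//   popts.control_flow_added = false;
+//   std::unordered_map<string, GraphDef> partitions;
+//   Status s = Partition(popts, &g, &partitions);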
+Status Partition(const PartitionOptions& opts, Graph* input,
+ std::unordered_map<string, GraphDef>* partitions);
+
+// Add control edges to the partitions to control the ordering
+// and timing of the recv nodes based on the start times calculated
+// using some scheduling algorithm.
+Status AddControlEdges(const PartitionOptions& opts,
+ std::unordered_map<string, GraphDef>* partitions);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_GRAPH_PARTITION_H_
diff --git a/tensorflow/core/graph/graph_partition_test.cc b/tensorflow/core/graph/graph_partition_test.cc
new file mode 100644
index 0000000000..d912c94025
--- /dev/null
+++ b/tensorflow/core/graph/graph_partition_test.cc
@@ -0,0 +1,316 @@
+#include "tensorflow/core/graph/graph_partition.h"
+
+#include <unordered_map>
+
+#include <gtest/gtest.h>
+#include "tensorflow/cc/ops/array_ops.h"
+#include "tensorflow/cc/ops/const_op.h"
+#include "tensorflow/cc/ops/control_flow_ops.h"
+#include "tensorflow/cc/ops/random_ops.h"
+#include "tensorflow/cc/ops/sendrecv_ops.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/graph/equal_graph_def.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+namespace {
+
+const char gpu_device[] = "/job:a/replica:0/task:0/gpu:0";
+
+string SplitByDevice(const Node* node) { return node->assigned_device_name(); }
+
+string DeviceName(const Node* node) {
+ char first = node->name()[0];
+ if (first == 'G') {
+ return gpu_device;
+ } else {
+ const string cpu_prefix = "/job:a/replica:0/task:0/cpu:";
+ int index = first - 'A';
+ return strings::StrCat(cpu_prefix, index);
+ }
+}
+
+void Partition(const GraphDef& graph_def,
+ std::unordered_map<string, GraphDef>* partitions) {
+ Graph g(OpRegistry::Global());
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, graph_def, &g));
+
+ // Assigns a device to each node, using the first letter of the node
+ // name as the device index.
+ for (Node* node : g.nodes()) {
+ node->set_assigned_device_name(DeviceName(node));
+ }
+
+ PartitionOptions popts;
+ popts.node_to_loc = SplitByDevice;
+ popts.new_name = [&g](const string& prefix) { return g.NewName(prefix); };
+ popts.get_incarnation = [](const string& name) {
+ return (name[0] - 'A') + 100;
+ };
+ popts.control_flow_added = false;
+ Status s = Partition(popts, &g, partitions);
+ CHECK(s.ok()) << s;
+}
+
+void CheckLoopConstruction(const GraphDef& graph_def) {
+ std::unordered_map<string, GraphDef> partitions;
+ Partition(graph_def, &partitions);
+ GraphConstructorOptions opts;
+ for (const auto& kv : partitions) {
+ const GraphDef& gdef = kv.second;
+ bool has_control_enter = false;
+ bool has_control_merge = false;
+ bool has_control_switch = false;
+ bool has_control_next = false;
+ for (const NodeDef& ndef : gdef.node()) {
+ // _recvs must have a control input
+ if (ndef.op() == "_Recv") {
+ bool has_control = false;
+ for (const string& input_name : ndef.input()) {
+ if (StringPiece(input_name).starts_with("^")) {
+ has_control = true;
+ break;
+ }
+ }
+ EXPECT_TRUE(has_control);
+ }
+ // Must have a control loop
+ if (StringPiece(ndef.name()).starts_with("_cloop")) {
+ if (ndef.op() == "Enter") {
+ has_control_enter = true;
+ }
+ if (ndef.op() == "Merge") {
+ has_control_merge = true;
+ }
+ if (ndef.op() == "Switch") {
+ has_control_switch = true;
+ }
+ if (ndef.op() == "NextIteration") {
+ has_control_next = true;
+ }
+ }
+ }
+ EXPECT_TRUE(has_control_enter);
+ EXPECT_TRUE(has_control_merge);
+ EXPECT_TRUE(has_control_switch);
+ EXPECT_TRUE(has_control_next);
+ }
+}
+
+REGISTER_OP("Input").Output("o: float");
+REGISTER_OP("BoolInput").Output("o: bool");
+REGISTER_OP("Cross").Input("a: float").Input("b: float").Output("o: float");
+
+Node* Input(const GraphDefBuilder::Options& opts) {
+ return ops::SourceOp("Input", opts);
+}
+
+Node* BoolInput(const GraphDefBuilder::Options& opts) {
+ return ops::SourceOp("BoolInput", opts);
+}
+
+Node* Cross(ops::NodeOut a, ops::NodeOut b,
+ const GraphDefBuilder::Options& opts) {
+ return ops::BinaryOp("Cross", a, b, opts);
+}
+
+class GraphPartitionTest : public ::testing::Test {
+ protected:
+ GraphPartitionTest()
+ : in_(GraphDefBuilder::kFailImmediately),
+ builder_a_(GraphDefBuilder::kFailImmediately),
+ builder_b_(GraphDefBuilder::kFailImmediately),
+ a_opts_(builder_a_.opts().WithDevice("/job:a/replica:0/task:0/cpu:0")),
+ b_opts_(builder_b_.opts().WithDevice("/job:a/replica:0/task:0/cpu:1")) {
+ RequireDefaultOps();
+ }
+
+ const GraphDef& ToGraphDef() {
+ in_.ToGraphDef(&in_graph_def_);
+ return in_graph_def_;
+ }
+
+ void ExpectMatchA() {
+ GraphDef graph_def;
+ builder_a_.ToGraphDef(&graph_def);
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ TF_EXPECT_GRAPH_EQ(graph_def, partitions_[a]);
+ }
+
+ void ExpectMatchB() {
+ GraphDef graph_def;
+ builder_b_.ToGraphDef(&graph_def);
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ TF_EXPECT_GRAPH_EQ(graph_def, partitions_[b]);
+ }
+
+ GraphDefBuilder in_;
+ GraphDef in_graph_def_;
+ GraphDefBuilder builder_a_;
+ GraphDefBuilder builder_b_;
+ GraphDefBuilder::Options a_opts_;
+ GraphDefBuilder::Options b_opts_;
+ std::unordered_map<string, GraphDef> partitions_;
+};
+
+TEST_F(GraphPartitionTest, SingleDevice) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Cross(a1, a1, in_.opts().WithName("A2"));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(1, partitions_.size());
+
+ a1 = Input(a_opts_.WithName("A1"));
+ Cross(a1, a1, a_opts_.WithName("A2"));
+ ExpectMatchA();
+}
+
+TEST_F(GraphPartitionTest, CrossDeviceData) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Node* b1 = Input(in_.opts().WithName("B1"));
+ Cross(a1, b1, in_.opts().WithName("B2"));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(2, partitions_.size());
+
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ a1 = Input(a_opts_.WithName("A1"));
+ _Send(a1, "edge_1_A1", a, 82, b, a_opts_.WithName("A1/_0"));
+ ExpectMatchA();
+
+ b1 = Input(b_opts_.WithName("B1"));
+ Node* recv =
+ _Recv(DT_FLOAT, "edge_1_A1", a, 82, b, b_opts_.WithName("A1/_1"));
+ Cross(recv, b1, b_opts_.WithName("B2"));
+ ExpectMatchB();
+}
+
+TEST_F(GraphPartitionTest, CrossDeviceControl) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Node* b1 = Input(in_.opts().WithName("B1"));
+ Cross(b1, b1, in_.opts().WithName("B2").WithControlInput(a1));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(2, partitions_.size());
+
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ a1 = Input(a_opts_.WithName("A1"));
+ Node* c = EmptyConst<float>(a_opts_.WithName("A1/_0").WithControlInput(a1));
+ _Send(c, "edge_3_A1", a, 82, b, a_opts_.WithName("A1/_1"));
+ ExpectMatchA();
+
+ Node* recv =
+ _Recv(DT_FLOAT, "edge_3_A1", a, 82, b, b_opts_.WithName("A1/_2"));
+ Node* id = Identity(recv, b_opts_.WithName("A1/_3"));
+ b1 = Input(b_opts_.WithName("B1"));
+ Cross(b1, b1, b_opts_.WithName("B2").WithControlInput(id));
+ ExpectMatchB();
+}
+
+TEST_F(GraphPartitionTest, CrossDeviceData_MultiUse) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Node* b1 = Input(in_.opts().WithName("B1"));
+ Cross(a1, b1, in_.opts().WithName("B2"));
+ Cross(a1, a1, in_.opts().WithName("B3"));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(2, partitions_.size());
+
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ a1 = Input(a_opts_.WithName("A1"));
+ _Send(a1, "edge_1_A1", a, 82, b, a_opts_.WithName("A1/_0"));
+ ExpectMatchA();
+
+ Node* recv =
+ _Recv(DT_FLOAT, "edge_1_A1", a, 82, b, b_opts_.WithName("A1/_1"));
+ b1 = Input(b_opts_.WithName("B1"));
+ Cross(recv, b1, b_opts_.WithName("B2"));
+ Cross(recv, recv, b_opts_.WithName("B3"));
+ ExpectMatchB();
+}
+
+TEST_F(GraphPartitionTest, CrossDeviceControl_MultiUse) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Node* b1 = Input(in_.opts().WithName("B1"));
+ Cross(b1, b1, in_.opts().WithName("B2").WithControlInput(a1));
+ Input(in_.opts().WithName("B3").WithControlInput(a1));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(2, partitions_.size());
+
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ a1 = Input(a_opts_.WithName("A1"));
+ Node* c = EmptyConst<float>(a_opts_.WithName("A1/_0").WithControlInput(a1));
+ _Send(c, "edge_1_A1", a, 82, b, a_opts_.WithName("A1/_1"));
+ ExpectMatchA();
+
+ Node* recv =
+ _Recv(DT_FLOAT, "edge_1_A1", a, 82, b, b_opts_.WithName("A1/_2"));
+ Node* id = Identity(recv, b_opts_.WithName("A1/_3"));
+ b1 = Input(b_opts_.WithName("B1"));
+ Cross(b1, b1, b_opts_.WithName("B2").WithControlInput(id));
+ Input(b_opts_.WithName("B3").WithControlInput(id));
+ ExpectMatchB();
+}
+
+TEST_F(GraphPartitionTest, CrossDevice_DataControl) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = Input(in_.opts().WithName("A1"));
+ Node* b1 = Input(in_.opts().WithName("B1"));
+ Cross(a1, b1, in_.opts().WithName("B2"));
+ Input(in_.opts().WithName("B3").WithControlInput(a1));
+
+ Partition(ToGraphDef(), &partitions_);
+ EXPECT_EQ(2, partitions_.size());
+
+ string a = "/job:a/replica:0/task:0/cpu:0";
+ string b = "/job:a/replica:0/task:0/cpu:1";
+ a1 = Input(a_opts_.WithName("A1"));
+ Node* c = EmptyConst<float>(a_opts_.WithName("A1/_0").WithControlInput(a1));
+ // NOTE: Send 0 A1/_1 -> A1/_2 is not necessarily needed. We could
+ // use A1/_0 -> A1/_4 as the control as a minor optimization.
+ _Send(c, "edge_1_A1", a, 82, b, a_opts_.WithName("A1/_1"));
+ _Send(a1, "edge_2_A1", a, 82, b, a_opts_.WithName("A1/_4"));
+ ExpectMatchA();
+
+ Node* recv1 =
+ _Recv(DT_FLOAT, "edge_1_A1", a, 82, b, b_opts_.WithName("A1/_2"));
+ Node* id1 = Identity(recv1, b_opts_.WithName("A1/_3"));
+ Node* recv2 =
+ _Recv(DT_FLOAT, "edge_2_A1", a, 82, b, b_opts_.WithName("A1/_5"));
+ b1 = Input(b_opts_.WithName("B1"));
+ Cross(recv2, b1, b_opts_.WithName("B2"));
+ Input(b_opts_.WithName("B3").WithControlInput(id1));
+ ExpectMatchB();
+}
+
+TEST_F(GraphPartitionTest, CrossDeviceLoop) {
+ using namespace ::tensorflow::ops; // NOLINT(build/namespaces)
+ Node* a1 = BoolInput(in_.opts().WithName("A1"));
+ Node* a2 = Enter(a1, "foo", in_.opts().WithName("A2"));
+ Node* a3 = Merge({a2, {"A5", 0, DT_BOOL}}, in_.opts().WithName("A3"));
+ LoopCond(a3, in_.opts().WithName("A4"));
+ Node* b1 = Identity(a3, in_.opts().WithName("B1"));
+ NextIteration(b1, in_.opts().WithName("A5"));
+
+ CheckLoopConstruction(ToGraphDef());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/graph_test.cc b/tensorflow/core/graph/graph_test.cc
new file mode 100644
index 0000000000..f7a8ffde89
--- /dev/null
+++ b/tensorflow/core/graph/graph_test.cc
@@ -0,0 +1,252 @@
+#include "tensorflow/core/graph/graph.h"
+
+#include <set>
+#include <gtest/gtest.h>
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+namespace tensorflow {
+namespace {
+
+class GraphTest : public ::testing::Test {
+ protected:
+ GraphTest() : graph_(OpRegistry::Global()) { RequireDefaultOps(); }
+ ~GraphTest() override {}
+
+ static void VerifyNodes(Node* node, std::vector<Node*> expected_in,
+ std::vector<Node*> expected_out) {
+ std::vector<Node*> in;
+ for (const Edge* e : node->in_edges()) {
+ in.push_back(e->src());
+ }
+ EXPECT_EQ(Stringify(expected_in), Stringify(in));
+
+ std::vector<Node*> out;
+ for (const Edge* e : node->out_edges()) {
+ out.push_back(e->dst());
+ }
+ EXPECT_EQ(Stringify(expected_out), Stringify(out));
+ }
+
+ Node* AddNodeWithName(const string& name) {
+ Node* node;
+ TF_CHECK_OK(NodeBuilder(name, "NoOp").Finalize(&graph_, &node));
+ return node;
+ }
+
+ Graph graph_;
+
+ private:
+ // Convert a list of nodes to a sorted list of strings so failure messages
+ // are readable.
+ static std::vector<string> Stringify(const std::vector<Node*>& nodes) {
+ std::vector<string> result;
+ for (Node* n : nodes) {
+ result.push_back(n->DebugString());
+ }
+ std::sort(result.begin(), result.end());
+ return result;
+ }
+};
+
+TEST_F(GraphTest, Constructor) {
+ Node* source = graph_.source_node();
+ EXPECT_NE(source, nullptr);
+ Node* sink = graph_.sink_node();
+ EXPECT_NE(sink, nullptr);
+ VerifyNodes(source, {}, {sink});
+ VerifyNodes(sink, {source}, {});
+ EXPECT_EQ(2, graph_.num_node_ids());
+}
+
+TEST_F(GraphTest, RemoveThenAdd) {
+ AddNodeWithName("A");
+ Node* b = AddNodeWithName("B");
+ const int b_id = b->id();
+ AddNodeWithName("C");
+ EXPECT_EQ(5, graph_.num_node_ids());
+ graph_.RemoveNode(b);
+ EXPECT_EQ(5, graph_.num_node_ids());
+ Node* d = AddNodeWithName("D");
+ EXPECT_NE(b_id, d->id()); // Ids should not be reused.
+ EXPECT_EQ(6, graph_.num_node_ids());
+}
+
+TEST_F(GraphTest, InNodesAndOutNodes) {
+ Node* a = AddNodeWithName("A");
+ Node* b = AddNodeWithName("B");
+ Node* c = AddNodeWithName("C");
+ graph_.RemoveNode(b);
+ Node* d = AddNodeWithName("D");
+
+ const Edge* source_to_a = graph_.AddControlEdge(graph_.source_node(), a);
+ graph_.AddControlEdge(a, graph_.sink_node());
+ graph_.AddEdge(a, 0, c, 0);
+ graph_.AddControlEdge(c, graph_.sink_node());
+
+ EXPECT_EQ("A", a->name());
+ VerifyNodes(a, {graph_.source_node()}, {c, graph_.sink_node()});
+
+ EXPECT_EQ("C", c->name());
+ VerifyNodes(c, {a}, {graph_.sink_node()});
+
+ EXPECT_EQ("D", d->name());
+ VerifyNodes(d, {}, {});
+
+ VerifyNodes(graph_.source_node(), {}, {a, graph_.sink_node()});
+ VerifyNodes(graph_.sink_node(), {a, c, graph_.source_node()}, {});
+
+ graph_.RemoveEdge(source_to_a);
+ VerifyNodes(a, {}, {c, graph_.sink_node()});
+ VerifyNodes(graph_.source_node(), {}, {graph_.sink_node()}); // no more a
+
+ graph_.RemoveNode(c);
+ VerifyNodes(a, {}, {graph_.sink_node()}); // no more c
+ VerifyNodes(graph_.sink_node(), {a, graph_.source_node()}, {}); // no more c
+ EXPECT_EQ(6, graph_.num_node_ids());
+ EXPECT_EQ(5, graph_.num_edge_ids());
+}
+
+TEST_F(GraphTest, NodeIteration) {
+ // Set up the graph with some holes due to removals.
+ Node* a = AddNodeWithName("A");
+ Node* b = AddNodeWithName("B");
+ Node* c = AddNodeWithName("C");
+ graph_.RemoveNode(b);
+ Node* d = AddNodeWithName("D");
+ const Edge* source_to_a = graph_.AddControlEdge(graph_.source_node(), a);
+ graph_.AddControlEdge(a, graph_.sink_node());
+ graph_.AddEdge(a, 0, c, 0);
+ graph_.AddControlEdge(c, graph_.sink_node());
+ graph_.RemoveEdge(source_to_a);
+ graph_.RemoveNode(c);
+
+ // expected = set of all node DebugStrings we expect in the graph
+ std::set<string> expected;
+ expected.insert(graph_.source_node()->DebugString());
+ expected.insert(a->DebugString());
+ expected.insert(d->DebugString());
+ expected.insert(graph_.sink_node()->DebugString());
+
+ // Verify that iterating through ids gets the same set of nodes.
+ std::set<string> actual;
+ for (int id = 0; id < graph_.num_node_ids(); ++id) {
+ Node* node = graph_.FindNodeId(id);
+ if (node != nullptr) {
+ actual.insert(node->DebugString());
+ }
+ }
+ EXPECT_EQ(expected, actual);
+
+ // Verify that range-based for loop gets the same set of nodes.
+ actual.clear();
+ for (Node* node : graph_.nodes()) {
+ actual.insert(node->DebugString());
+ }
+ EXPECT_EQ(expected, actual);
+}
+
+static void CheckType(Node* node, bool b) {
+ EXPECT_TRUE(b) << node->DebugString();
+ // Make sure none of the other IsFoo() methods return true.
+ int count = 0;
+ if (node->IsSource()) count++;
+ if (node->IsSink()) count++;
+ if (node->IsOp()) count++;
+ EXPECT_EQ(1, count) << node->DebugString();
+}
+
+TEST_F(GraphTest, Type) {
+ Node* op = AddNodeWithName("A");
+ CheckType(graph_.source_node(), graph_.source_node()->IsSource());
+ CheckType(graph_.sink_node(), graph_.sink_node()->IsSink());
+ CheckType(op, op->IsOp());
+}
+
+// Convert edge iteration results into a sorted string.
+static string EdgeIter(const Graph& g) {
+ std::vector<std::pair<int, int> > edges;
+ for (const Edge* e : g.edges()) {
+ edges.push_back(std::make_pair(e->src()->id(), e->dst()->id()));
+ }
+ std::sort(edges.begin(), edges.end());
+ string result;
+ for (auto& p : edges) {
+ strings::StrAppend(&result, p.first, "->", p.second, ";");
+ }
+ return result;
+}
+
+TEST_F(GraphTest, EdgeIteration) {
+ EXPECT_EQ("0->1;", EdgeIter(graph_));
+
+ Node* a = AddNodeWithName("A");
+ Node* b = AddNodeWithName("B");
+ EXPECT_EQ("0->1;", EdgeIter(graph_)); // Since a,b are currently disconnected
+
+ graph_.AddEdge(a, 0, b, 0);
+ EXPECT_EQ("0->1;2->3;", EdgeIter(graph_));
+
+ graph_.AddControlEdge(graph_.source_node(), a);
+ graph_.AddControlEdge(b, graph_.sink_node());
+ EXPECT_EQ("0->1;0->2;2->3;3->1;", EdgeIter(graph_));
+
+ graph_.AddEdge(a, 1, a, 0);
+ EXPECT_EQ("0->1;0->2;2->2;2->3;3->1;", EdgeIter(graph_));
+}
+
+TEST_F(GraphTest, NewName) {
+ string a1 = graph_.NewName("A");
+ string a2 = graph_.NewName("A");
+ string b1 = graph_.NewName("B");
+ EXPECT_NE(a1, a2);
+ EXPECT_NE(a1, b1);
+ EXPECT_NE(a2, b1);
+ EXPECT_TRUE(StringPiece(a1).starts_with("A")) << a1;
+}
+
+REGISTER_OP("Input").Output("o: float");
+REGISTER_OP("In2Out1").Input("a: float").Input("b: float").Output("o: float");
+
+static void BM_InEdgeIteration(int iters, int num_nodes) {
+ testing::StopTiming();
+ string s;
+ for (int in = 0; in < 10; in++) {
+ s += strings::Printf("node { name: 'in%04d' op: 'Input' }", in);
+ }
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int op = 0; op < num_nodes; op++) {
+ s += strings::Printf(
+ "node { name: 'op%04d' op: 'In2Out1' input: ['in%04d', 'in%04d' ] }",
+ op, rnd.Uniform(10), rnd.Uniform(10));
+ }
+
+ Graph graph(OpRegistry::Global());
+ GraphDef graph_def;
+ CHECK(protobuf::TextFormat::ParseFromString(s, &graph_def));
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, graph_def, &graph));
+
+ int64 sum = 0;
+ testing::StartTiming();
+ for (int i = 0; i < iters; i += graph.num_node_ids()) {
+ for (const Node* node : graph.nodes()) {
+ for (auto e : node->in_edges()) {
+ sum += e->id();
+ }
+ }
+ }
+ VLOG(1) << sum;
+}
+BENCHMARK(BM_InEdgeIteration)->Range(10, 100000);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/node_builder.cc b/tensorflow/core/graph/node_builder.cc
new file mode 100644
index 0000000000..8c34323dbe
--- /dev/null
+++ b/tensorflow/core/graph/node_builder.cc
@@ -0,0 +1,115 @@
+#include "tensorflow/core/graph/node_builder.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+NodeBuilder::NodeBuilder(const string& name, const string& op_name,
+ const OpRegistryInterface* op_registry)
+ : def_builder_(name, op_name, op_registry) {}
+
+NodeBuilder::NodeBuilder(const string& name, const OpDef* op_def)
+ : def_builder_(name, op_def) {}
+
+NodeBuilder& NodeBuilder::Input(Node* src_node, int src_index) {
+ inputs_.emplace_back(src_node, src_index);
+ DataType dt;
+ if (GetOutputType(src_node, src_index, &dt)) {
+ def_builder_.Input(src_node->name(), src_index, dt);
+ }
+ return *this;
+}
+
+NodeBuilder& NodeBuilder::Input(NodeOut src) {
+ if (src.error) {
+ AddIndexError(src.node, src.index);
+ } else {
+ inputs_.emplace_back(src.node, src.index);
+ def_builder_.Input(src.name, src.index, src.dt);
+ }
+ return *this;
+}
+
+NodeBuilder& NodeBuilder::Input(gtl::ArraySlice<NodeOut> src_list) {
+ std::vector<NodeDefBuilder::NodeOut> srcs;
+ srcs.reserve(src_list.size());
+ for (const auto& node_out : src_list) {
+ if (node_out.error) {
+ AddIndexError(node_out.node, node_out.index);
+ } else {
+ srcs.emplace_back(node_out.name, node_out.index, node_out.dt);
+ inputs_.emplace_back(node_out.node, node_out.index);
+ }
+ }
+ def_builder_.Input(srcs);
+ return *this;
+}
+
+NodeBuilder& NodeBuilder::ControlInput(Node* src_node) {
+ control_inputs_.emplace_back(src_node);
+ def_builder_.ControlInput(src_node->name());
+ return *this;
+}
+
+NodeBuilder& NodeBuilder::ControlInputs(gtl::ArraySlice<Node*> src_nodes) {
+ control_inputs_.insert(control_inputs_.end(), src_nodes.begin(),
+ src_nodes.end());
+ for (Node* src_node : src_nodes) {
+ def_builder_.ControlInput(src_node->name());
+ }
+ return *this;
+}
+
+NodeBuilder& NodeBuilder::Device(const string& device_spec) {
+ def_builder_.Device(device_spec);
+ return *this;
+}
+
+Status NodeBuilder::Finalize(Graph* graph, Node** created_node) const {
+ // In case of error, set *created_node to nullptr.
+ if (created_node != nullptr) *created_node = nullptr;
+ if (!errors_.empty()) {
+ return errors::InvalidArgument(str_util::Join(errors_, "\n"));
+ }
+
+ NodeDef node_def;
+ TF_RETURN_IF_ERROR(def_builder_.Finalize(&node_def));
+ TF_RETURN_IF_ERROR(ValidateNodeDef(node_def, def_builder_.op_def()));
+ Status status;
+ Node* node = graph->AddNode(node_def, &status);
+ if (!status.ok()) return status;
+
+ for (size_t i = 0; i < inputs_.size(); ++i) {
+ if (inputs_[i].node != nullptr) { // Skip back edges.
+ graph->AddEdge(inputs_[i].node, inputs_[i].index, node, i);
+ }
+ }
+ for (Node* control_input : control_inputs_) {
+ graph->AddControlEdge(control_input, node);
+ }
+ if (created_node != nullptr) *created_node = node;
+ return Status::OK();
+}
+
+void NodeBuilder::AddIndexError(Node* node, int i) {
+ if (node == nullptr) {
+ errors_.emplace_back(
+ strings::StrCat("Attempt to add nullptr Node to node with type",
+ def_builder_.op_def().name()));
+ } else {
+ errors_.emplace_back(
+ strings::StrCat("Attempt to add output ", i, " of ", node->name(),
+ " not in range [0, ", node->num_outputs(),
+ ") to node with type ", def_builder_.op_def().name()));
+ }
+}
+
+bool NodeBuilder::GetOutputType(Node* node, int i, DataType* dt) {
+ bool error;
+ *dt = SafeGetOutput(node, i, &error);
+ if (error) AddIndexError(node, i);
+ return !error;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/node_builder.h b/tensorflow/core/graph/node_builder.h
new file mode 100644
index 0000000000..dd34b97f23
--- /dev/null
+++ b/tensorflow/core/graph/node_builder.h
@@ -0,0 +1,146 @@
+#ifndef TENSORFLOW_GRAPH_NODE_BUILDER_H_
+#define TENSORFLOW_GRAPH_NODE_BUILDER_H_
+
+#include <vector>
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+namespace tensorflow {
+
+// This is a helper for creating a Node and adding it to a Graph.
+// Internally, it uses a NodeDefBuilder to automatically set attrs
+// that can be inferred from the inputs, and to use default values
+// (where they exist) for unspecified attrs. Example usage:
+//
+// Node* node;
+// Status status = NodeBuilder(node_name, op_name)
+// .Input(...)
+// .Attr(...)
+// .Finalize(&graph, &node);
+// if (!status.ok()) return status;
+// // Use node here.
+class NodeBuilder {
+ public:
+ // For specifying the output of a Node to provide to one of the Input()
+ // functions below. It supports both regular inputs (where you are
+ // connecting to an existing Node*), and inputs from outside the graph
+ // (or haven't been added to the graph yet, like back edges, where
+ // you don't have a Node*). Both types can be mixed, e.g. in an
+ // ArraySlice.
+ struct NodeOut {
+ // For referencing an existing Node.
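+ // If n is nullptr or i is out of range, 'error' is set; a NodeBuilder
+ // given this NodeOut records the problem and its Finalize() fails with
+ // an InvalidArgument error.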
+ NodeOut(Node* n, int i = 0) // NOLINT(runtime/explicit)
+ : node(n),
+ error(false),
+ name(node != nullptr ? node->name() : (error = true, "")),
+ index(i),
+ dt(SafeGetOutput(node, i, &error)) {}
+
+ // For referencing Nodes not in the graph being built. It is
+ // useful when preparing a graph for ExtendSession or creating a
+ // back edge to a node that hasn't been added to the graph yet,
+ // but will be.
+ NodeOut(const string& name, int i, DataType t)
+ : node(nullptr), error(false), name(name), index(i), dt(t) {}
+
+ // Default constructor for std::vector<NodeOut>.
+ NodeOut() {}
+
+ Node* node = nullptr;
+ // error is set to true if:
+ // * the NodeOut was default constructed and never overwritten,
+ // * a nullptr Node* was passed to the NodeOut constructor, or
+ // * an out-of-range index was passed to the NodeOut constructor.
+ bool error = true;
+ string name;
+ int index = 0;
+ DataType dt = DT_FLOAT;
+ };
+
+ // Specify the name and the Op (either via an OpDef or the name of
+ // the Op plus a registry) for the Node. Other fields are
+ // specified by calling the methods below.
+ // REQUIRES: The OpDef must satisfy ValidateOpDef().
+ NodeBuilder(const string& name, const string& op_name,
+ const OpRegistryInterface* op_registry = OpRegistry::Global());
+ NodeBuilder(const string& name, const OpDef* op_def);
+
+ // You must call one Input() function per input_arg in the Op,
+ // *and in the same order as the input_args appear in the OpDef.*
+
+ // For inputs that take a single tensor.
+ NodeBuilder& Input(Node* src_node, int src_index = 0);
+ NodeBuilder& Input(NodeOut src);
+
+ // For inputs that take a list of tensors.
+ NodeBuilder& Input(gtl::ArraySlice<NodeOut> src_list);
+
+ // Require that this node run after src_node(s).
+ NodeBuilder& ControlInput(Node* src_node);
+ NodeBuilder& ControlInputs(gtl::ArraySlice<Node*> src_nodes);
+
+ // Sets the "requested device spec" in the NodeDef (not the
+ // "assigned device" in the Node).
+ NodeBuilder& Device(const string& device_spec);
+
+ // Set the value of an attr. attr_name must match the name of one of
+ // attrs defined by the Op, and value must have the corresponding type
+ // (see SetAttrValue() in ../framework/attr_value_util.h for legal
+ // types for value). Note that attrs will be set automatically if
+ // they can be determined by the inputs.
+ template <class T>
+ NodeBuilder& Attr(const string& attr_name, T&& value);
+ template <class T>
+ NodeBuilder& Attr(const string& attr_name, std::initializer_list<T> value);
+
+ // Validates the described node and adds it to *graph, adding edges
+ // for all (non-back) inputs. If created_node is not nullptr,
+ // *created_node will be set to the new node (or nullptr on error).
+ Status Finalize(Graph* graph, Node** created_node) const;
+
+ private:
+ static DataType SafeGetOutput(Node* node, int i, bool* error) {
+ if (node != nullptr && i >= 0 && i < node->num_outputs()) {
+ *error = false;
+ return node->output_type(i);
+ } else {
+ *error = true;
+ return DT_FLOAT;
+ }
+ }
+
+ // If SafeGetOutput indicates a range error, add it to errors_.
+ void AddIndexError(Node* node, int i);
+
+ // Set *dt and returns true if i is in range. Combines
+ // SafeGetOutput() and AddIndexError().
+ bool GetOutputType(Node* node, int i, DataType* dt);
+
+ NodeDefBuilder def_builder_;
+ std::vector<NodeOut> inputs_;
+ std::vector<Node*> control_inputs_;
+ std::vector<string> errors_;
+};
+
+// IMPLEMENTATION -------------------------------------------------------------
+
+template <class T>
+inline NodeBuilder& NodeBuilder::Attr(const string& attr_name, T&& value) {
+ def_builder_.Attr(attr_name, std::forward<T>(value));
+ return *this;
+}
+
+template <class T>
+NodeBuilder& NodeBuilder::Attr(const string& attr_name,
+ std::initializer_list<T> value) {
+ def_builder_.Attr(attr_name, value);
+ return *this;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_NODE_BUILDER_H_
diff --git a/tensorflow/core/graph/node_builder_test.cc b/tensorflow/core/graph/node_builder_test.cc
new file mode 100644
index 0000000000..9f667d00e4
--- /dev/null
+++ b/tensorflow/core/graph/node_builder_test.cc
@@ -0,0 +1,59 @@
+#include "tensorflow/core/graph/node_builder.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+REGISTER_OP("Source").Output("o: out_types").Attr("out_types: list(type)");
+REGISTER_OP("Sink").Input("i: T").Attr("T: type");
+
+TEST(NodeBuilderTest, Simple) {
+ RequireDefaultOps();
+ Graph graph(OpRegistry::Global());
+ Node* source_node;
+ EXPECT_OK(NodeBuilder("source_op", "Source")
+ .Attr("out_types", {DT_INT32, DT_STRING})
+ .Finalize(&graph, &source_node));
+ ASSERT_TRUE(source_node != nullptr);
+
+ // Try connecting to each of source_node's outputs.
+ EXPECT_OK(NodeBuilder("sink1", "Sink")
+ .Input(source_node)
+ .Finalize(&graph, nullptr));
+ EXPECT_OK(NodeBuilder("sink2", "Sink")
+ .Input(source_node, 1)
+ .Finalize(&graph, nullptr));
+
+ // Generate an error if the index is out of range.
+ EXPECT_FALSE(NodeBuilder("sink3", "Sink")
+ .Input(source_node, 2)
+ .Finalize(&graph, nullptr)
+ .ok());
+ EXPECT_FALSE(NodeBuilder("sink4", "Sink")
+ .Input(source_node, -1)
+ .Finalize(&graph, nullptr)
+ .ok());
+ EXPECT_FALSE(NodeBuilder("sink5", "Sink")
+ .Input({source_node, -1})
+ .Finalize(&graph, nullptr)
+ .ok());
+
+ // Generate an error if the node is nullptr. This can happen when using
+ // GraphDefBuilder if there was an error creating the input node.
+ EXPECT_FALSE(NodeBuilder("sink6", "Sink")
+ .Input(nullptr)
+ .Finalize(&graph, nullptr)
+ .ok());
+ EXPECT_FALSE(NodeBuilder("sink7", "Sink")
+ .Input(NodeBuilder::NodeOut(nullptr, 0))
+ .Finalize(&graph, nullptr)
+ .ok());
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/optimizer_cse.cc b/tensorflow/core/graph/optimizer_cse.cc
new file mode 100644
index 0000000000..2fa6f075c0
--- /dev/null
+++ b/tensorflow/core/graph/optimizer_cse.cc
@@ -0,0 +1,220 @@
+// This module implements a common subexpression elimination pass. We
+// process the nodes in the graph in reverse postorder
+// (i.e. inputs before their downstream dependencies). The rough algorithm is
+// as follows:
+//
+// std::unordered_map<size_t, Node*> available
+// for each node n in forward topological order:
+// h = NodeHash(n)
+//   if available[h] exists and Equivalent(available[h], n)
+// redirect downstream uses of outputs of n to available[h]
+// remove n from graph
+// else
+// if available[h] does not exist
+// available[h] = n
+//
+// This is similar to the global value numbering algorithm described in this
+// paper:
+//
+// "Global code motion/global value numbering", Cliff Click, PLDI '95
+// Proceedings of the ACM SIGPLAN 1995 conference on Programming
+// language design and implementation, Pages 246-257
+// http://dl.acm.org/citation.cfm?id=207154
+
+#include "tensorflow/core/graph/optimizer_cse.h"
+
+#include <unordered_map>
+
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+class OptimizerCSE {
+ public:
+ explicit OptimizerCSE(Graph* g) : g_(g) {}
+
+ void Optimize(std::function<bool(const Node*)> consider_fn);
+
+ private:
+ struct Scratch;
+
+ static size_t NodeHash(const Node* n);
+ static bool Equivalent(const Node* a, const Node* b, Scratch* s);
+ static bool EqualAttrs(const Node* a, const Node* b, Scratch* s);
+
+ Graph* g_;
+};
+
+static void FillInputs(const Node* n,
+ gtl::InlinedVector<Node*, 4>* control_edges,
+ gtl::InlinedVector<std::pair<Node*, int>, 4>* in) {
+ DCHECK_EQ(in->size(), n->num_inputs());
+ control_edges->clear();
+ for (const Edge* e : n->in_edges()) {
+ if (e->IsControlEdge()) {
+ control_edges->push_back(e->src());
+ } else {
+ (*in)[e->dst_input()] = std::make_pair(e->src(), e->src_output());
+ }
+ }
+ std::sort(control_edges->begin(), control_edges->end());
+ if (n->op_def().is_commutative()) {
+    // For commutative inputs, we sort the (Node*, output index) pairs
+    // to get a canonical ordering (so that add(a,b) and add(b,a) will
+    // hash to the same value when is_commutative is true for 'add').
+ std::sort(in->begin(), in->end());
+ }
+}
+
+static size_t kIllegalNodeHash = 0;
+
+size_t OptimizerCSE::NodeHash(const Node* n) {
+ const DataTypeVector& out = n->output_types();
+ string str_to_hash = strings::StrCat(n->type_string(), out.size());
+ for (DataType dt : out) {
+ strings::StrAppend(&str_to_hash, dt);
+ }
+
+ const int N_in = n->num_inputs();
+ strings::StrAppend(&str_to_hash, N_in);
+ gtl::InlinedVector<Node*, 4> control_edges;
+ gtl::InlinedVector<std::pair<Node*, int>, 4> in(N_in);
+ FillInputs(n, &control_edges, &in);
+ for (const auto& edge : in) {
+ strings::StrAppend(&str_to_hash, edge.first->id(), edge.second);
+ }
+
+ size_t h = Hash64(str_to_hash);
+
+#if !defined(__ANDROID__) && !defined(ANDROID)
+ // Hash the attrs. For example, this makes sure different constants
+ // end up in different hash buckets.
+ string tmp;
+ for (const auto& attr : n->def().attr()) {
+ tmp = attr.first;
+ attr.second.AppendToString(&tmp);
+ // Add hashes of attrs, so the order of attrs doesn't matter.
+ h += Hash32(tmp.data(), tmp.size(), 0x87341245);
+ }
+#endif
+
+ if (h == kIllegalNodeHash) h = kIllegalNodeHash + 1;
+ return h;
+}
+
+struct OptimizerCSE::Scratch {
+ // For EqualAttrs():
+ string a;
+ string b;
+};
+
+bool OptimizerCSE::EqualAttrs(const Node* a, const Node* b, Scratch* scratch) {
+ if (a->def().attr_size() != b->def().attr_size()) return false;
+
+ for (const auto& attr : b->def().attr()) {
+ auto iter = a->def().attr().find(attr.first);
+ if (iter == a->def().attr().end()) return false;
+ // Note: it should be safe to compare proto serializations of the attr
+ // values since at most one field should be set in each (indeed, it
+ // should be the same field).
+ iter->second.SerializeToString(&scratch->a);
+ attr.second.SerializeToString(&scratch->b);
+ if (scratch->a != scratch->b) return false;
+ }
+ return true;
+}
+
+static bool HasRefInput(const Node* n) {
+ for (auto dt : n->input_types()) {
+ if (IsRefType(dt)) return true;
+ }
+ return false;
+}
+
+bool OptimizerCSE::Equivalent(const Node* a, const Node* b, Scratch* scratch) {
+ // Different op names are different
+ if (a->type_string() != b->type_string()) return false;
+
+ // Never consider stateful nodes (such as non-const inputs) equivalent.
+ if (a->op_def().is_stateful()) return false;
+
+ // For now, we consider any node that takes a ref input to not be
+ // equivalent to any other node.
+ if (HasRefInput(a) || HasRefInput(b)) return false;
+
+ // Compare attrs. Note that equal attrs implies equal input and
+ // output types.
+ if (!EqualAttrs(a, b, scratch)) return false;
+
+ // Compare input sources
+ if (a->num_inputs() != b->num_inputs()) return false;
+ const int N_in = a->num_inputs();
+ gtl::InlinedVector<Node*, 4> a_control_edges;
+ gtl::InlinedVector<Node*, 4> b_control_edges;
+ gtl::InlinedVector<std::pair<Node*, int>, 4> a_in(N_in);
+ gtl::InlinedVector<std::pair<Node*, int>, 4> b_in(N_in);
+ FillInputs(a, &a_control_edges, &a_in);
+ FillInputs(b, &b_control_edges, &b_in);
+ if (a_in != b_in) return false;
+ if (a_control_edges != b_control_edges) return false;
+
+ return true;
+}
+
+void OptimizerCSE::Optimize(std::function<bool(const Node*)> consider_fn) {
+ // This very simple implementation works if the whole graph is one
+ // giant basic block (because we just traverse nodes in a
+ // topological order). We'll need to do something more
+ // sophisticated when we have control flow/loops/etc.
+
+ // TODO(jeff): We need to handle Update nodes specially, but dealing
+ // with more general control flow will also solve this issue, and for
+ // now, our updates are almost always the most downstream nodes in
+ // the graph.
+ std::vector<Node*> order;
+ GetReversePostOrder(*g_, &order);
+
+ // Our value is just a single Node*, meaning we keep just a single
+ // candidate for a given node hash value. This may cause us to
+ // (rarely) lose some optimization opportunities if there are
+ // hash collisions, but it allows us to avoid having the value
+ // be a set<Node*> (or equivalent).
+ std::unordered_map<size_t, Node*> available;
+
+ // Scratch space for Equivalent calls. Allocated here and passed in to
+ // Equivalent to avoid allocation inside the loop below.
+ Scratch scratch;
+ for (Node* n : order) {
+ if (!n->IsOp()) continue;
+
+ // See if we should consider this node at all
+ if (consider_fn != nullptr && !consider_fn(n)) continue;
+
+ size_t h = NodeHash(n);
+ Node** candidate = &available[h];
+ if (*candidate == nullptr) {
+ // No existing match: insert "n" into the hash table under "h"
+ *candidate = n;
+ } else if (Equivalent(*candidate, n, &scratch)) {
+ VLOG(1) << "CSE: equivalent: " << (*candidate)->name() << " and "
+ << n->name();
+ // *candidate and n are equivalent. Therefore, we can replace
+ // n with *candidate by fixing up outgoing edges from "n" to instead
+ // come from "*candidate", and then delete n from the graph
+ for (const Edge* e : n->out_edges()) {
+ g_->AddEdge(*candidate, e->src_output(), e->dst(), e->dst_input());
+ }
+ g_->RemoveNode(n);
+ }
+ }
+}
+
+void OptimizeCSE(Graph* g, std::function<bool(const Node*)> consider_fn) {
+ OptimizerCSE opt(g);
+ opt.Optimize(consider_fn);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/optimizer_cse.h b/tensorflow/core/graph/optimizer_cse.h
new file mode 100644
index 0000000000..430c97a449
--- /dev/null
+++ b/tensorflow/core/graph/optimizer_cse.h
@@ -0,0 +1,19 @@
+// An optimization pass that performs common subexpression elimination.
+
+#ifndef TENSORFLOW_GRAPH_OPTIMIZER_CSE_H_
+#define TENSORFLOW_GRAPH_OPTIMIZER_CSE_H_
+
+#include <functional>
+#include <sys/types.h>
+#include "tensorflow/core/graph/graph.h"
+
+namespace tensorflow {
+
+// Perform common-subexpression elimination on the graph "*g". If
+// "consider_fn" is not nullptr, then only nodes for which
+// consider_fn(node) returns true will be considered for combining
+// during the common subexpression elimination.
+extern void OptimizeCSE(Graph* g, std::function<bool(const Node*)> consider_fn);
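+
+// A minimal usage sketch (illustrative only; the lambda below is just an
+// example filter that skips "Const" nodes):
+//
+//   OptimizeCSE(g, nullptr);  // consider every node for elimination
+//   OptimizeCSE(g, [](const Node* n) { return n->type_string() != "Const"; });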
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_OPTIMIZER_CSE_H_
diff --git a/tensorflow/core/graph/optimizer_cse_test.cc b/tensorflow/core/graph/optimizer_cse_test.cc
new file mode 100644
index 0000000000..ebbb948fdc
--- /dev/null
+++ b/tensorflow/core/graph/optimizer_cse_test.cc
@@ -0,0 +1,365 @@
+#include "tensorflow/core/graph/optimizer_cse.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace {
+
+static void InitGraph(const string& s, Graph* graph) {
+ GraphDef graph_def;
+
+ auto parser = protobuf::TextFormat::Parser();
+ // parser.AllowRelaxedWhitespace(true);
+ CHECK(parser.MergeFromString(s, &graph_def)) << s;
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, graph_def, graph));
+}
+
+class OptimizerCSETest : public ::testing::Test {
+ public:
+ OptimizerCSETest() : graph_(OpRegistry::Global()) { RequireDefaultOps(); }
+
+ void InitGraph(const string& s) {
+ ::tensorflow::InitGraph(s, &graph_);
+ original_ = CanonicalGraphString(&graph_);
+ }
+
+ static bool IncludeNode(const Node* n) { return n->IsOp(); }
+
+ static string EdgeId(const Node* n, int index) {
+ if (index == 0) {
+ return n->name();
+ } else if (index == Graph::kControlSlot) {
+ return strings::StrCat(n->name(), ":control");
+ } else {
+ return strings::StrCat(n->name(), ":", index);
+ }
+ }
+
+ string CanonicalGraphString(Graph* g) {
+ std::vector<string> nodes;
+ std::vector<string> edges;
+ for (const Node* n : g->nodes()) {
+ if (IncludeNode(n)) {
+ nodes.push_back(strings::StrCat(n->name(), "(", n->type_string(), ")"));
+ }
+ }
+ for (const Edge* e : g->edges()) {
+ if (IncludeNode(e->src()) && IncludeNode(e->dst())) {
+ edges.push_back(strings::StrCat(EdgeId(e->src(), e->src_output()), "->",
+ EdgeId(e->dst(), e->dst_input())));
+ }
+ }
+ // Canonicalize
+ std::sort(nodes.begin(), nodes.end());
+ std::sort(edges.begin(), edges.end());
+ return strings::StrCat(str_util::Join(nodes, ";"), "|",
+ str_util::Join(edges, ";"));
+ }
+
+ string DoCSE(std::function<bool(const Node*)> consider_fn = nullptr) {
+ string before = CanonicalGraphString(&graph_);
+ LOG(ERROR) << "Before rewrites: " << before;
+
+ OptimizeCSE(&graph_, consider_fn);
+
+ string result = CanonicalGraphString(&graph_);
+ LOG(ERROR) << "After rewrites: " << result;
+ return result;
+ }
+
+ const string& OriginalGraph() const { return original_; }
+
+ Graph graph_;
+ string original_;
+};
+
+REGISTER_OP("Input").Output("o: float").SetIsStateful();
+
+// Note that the "rules" in these tests are not meant to be logically correct
+TEST_F(OptimizerCSETest, Simple) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);D(Mul)|"
+ "A->D;B->D:1");
+}
+
+TEST_F(OptimizerCSETest, Simple_ThreeEquivalent) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'E' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);E(Mul)|"
+ "A->E;B->E:1");
+}
+
+TEST_F(OptimizerCSETest, Simple_WithFixups) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'E' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['C', 'D'] }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);D(Mul);E(Mul)|"
+ "A->D;B->D:1;D->E;D->E:1");
+}
+
+TEST_F(OptimizerCSETest, Simple_Commutative) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['B', 'A'] }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);D(Mul)|"
+ "A->D:1;B->D");
+}
+
+static bool IsNotMultiply(const Node* n) { return n->type_string() != "Mul"; }
+
+// Like Simple_Commutative, but the filter function excludes Mul nodes from
+// consideration, so no CSE is performed.
+TEST_F(OptimizerCSETest, Simple_Filtered) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['B', 'A'] }");
+ EXPECT_EQ(DoCSE(IsNotMultiply), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, Simple_NotCommutative) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Sub' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Sub' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['B', 'A'] }");
+ EXPECT_EQ(DoCSE(), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, NotEquivalent_Ops) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'D' op: 'Sub' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }");
+ EXPECT_EQ(DoCSE(), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, Simple_SameOps_SameAttrs1) {
+ // Should still do CSE for ops with attrs if they match.
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] attr { key: 'shape'"
+ " value { shape: { dim: { size: 37 name: 'SAME_NAME' } } } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] attr { key: 'shape'"
+ " value { shape: { dim: { size: 37 name: 'SAME_NAME' } } } } }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);D(Mul)|"
+ "A->D;B->D:1");
+}
+
+TEST_F(OptimizerCSETest, Simple_SameOps_SameAttrs2) {
+ // Should still do CSE for ops with attrs if they match, even if they
+ // are not in the same order.
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 'a' value { i: 3 } }"
+ " attr { key: 't' value { type: DT_INT32 } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 't' value { type: DT_INT32 } }"
+ " attr { key: 'a' value { i: 3 } } }");
+ EXPECT_EQ(DoCSE(),
+ "A(Input);B(Input);D(Mul)|"
+ "A->D;B->D:1");
+}
+
+TEST_F(OptimizerCSETest, SameConstants) {
+ // Should still do CSE for ops with constants if the values are identical
+ InitGraph(
+ "node { name: 'A' op: 'Const' "
+ " attr { key: 'dtype' value { type: DT_INT32 } }"
+ " attr { key: 'value' value {"
+ " tensor { dtype: DT_INT32 tensor_shape { dim { size: 1 } } "
+ " int_val: 0 } } } }"
+ "node { name: 'B' op: 'Const' "
+ " attr { key: 'dtype' value { type: DT_INT32 } }"
+ " attr { key: 'value' value {"
+ " tensor { dtype: DT_INT32 tensor_shape { dim { size: 1 } } "
+ " int_val: 0 } } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_INT32 } }"
+ " input: ['A', 'B'] }");
+ EXPECT_EQ(DoCSE(),
+ "B(Const);D(Mul)|"
+ "B->D;B->D:1");
+}
+
+TEST_F(OptimizerCSETest, DifferentConstants) {
+  // Should not do CSE for constants with different values.
+ InitGraph(
+ "node { name: 'A' op: 'Const' "
+ " attr { key: 'dtype' value { type: DT_INT32 } }"
+ " attr { key: 'value' value {"
+ " tensor { dtype: DT_INT32 tensor_shape { dim { size: 1 } } "
+ " int_val: 0 } } } }"
+ "node { name: 'B' op: 'Const' "
+ " attr { key: 'dtype' value { type: DT_INT32 } }"
+ " attr { key: 'value' value {"
+ " tensor { dtype: DT_INT32 tensor_shape { dim { size: 1 } } "
+ " int_val: 100000 } } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_INT32 } }"
+ " input: ['A', 'B'] }");
+ EXPECT_EQ(DoCSE(),
+ "A(Const);B(Const);D(Mul)|"
+ "A->D;B->D:1");
+}
+
+TEST_F(OptimizerCSETest, SameOps_DifferentAttrs1) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 'a' value { i: 3 } }"
+ " attr { key: 't' value { type: DT_INT32 } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 't' value { type: DT_INT32 } }"
+ " attr { key: 'a' value { i: 4 } } }");
+ EXPECT_EQ(DoCSE(), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, SameOps_DifferentAttrs2) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 'a' value { i: 3 } }"
+ " attr { key: 't' value { type: DT_FLOAT } } }"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B']"
+ " attr { key: 't' value { type: DT_INT32 } }"
+ " attr { key: 'a' value { i: 3 } } }");
+ EXPECT_EQ(DoCSE(), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, NotEquivalent_Inputs) {
+ InitGraph(
+ "node { name: 'A' op: 'Input'}"
+ "node { name: 'B' op: 'Input'}"
+ "node { name: 'C' op: 'Input'}"
+ "node { name: 'D' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'B'] }"
+ "node { name: 'E' op: 'Mul' attr { key: 'T' value { type: DT_FLOAT } }"
+ " input: ['A', 'C'] }");
+ EXPECT_EQ(DoCSE(), OriginalGraph());
+}
+
+TEST_F(OptimizerCSETest, Constant_Dedup) {
+ Tensor a(DT_FLOAT, TensorShape({1}));
+ a.flat<float>()(0) = 1.0;
+ Tensor b(DT_DOUBLE, TensorShape({1})); // Different type
+ b.flat<double>()(0) = 1.0;
+ Tensor c(DT_FLOAT, TensorShape({1, 1})); // Different shape
+ c.flat<float>()(0) = 1.0;
+ Tensor d(DT_FLOAT, TensorShape({1})); // Different value
+ d.flat<float>()(0) = 2.0;
+
+ // A graph contains a bunch of constants.
+ Graph g(OpRegistry::Global());
+ for (auto val : {a, b, c, d, d, c, b, a}) {
+ test::graph::Constant(&g, val); // Node name is n/_0, n/_1, ...
+ }
+ GraphDef gdef;
+ test::graph::ToGraphDef(&g, &gdef);
+ InitGraph(gdef.DebugString());
+
+ EXPECT_EQ(OriginalGraph(),
+ "n/_0(Const);n/_1(Const);n/_2(Const);n/_3(Const);"
+ "n/_4(Const);n/_5(Const);n/_6(Const);n/_7(Const)|");
+  // In theory, there are 2^4 possible correct outputs of CSE. In this
+  // test, it happens to eliminate the first 4 nodes.
+ EXPECT_EQ(DoCSE(), "n/_4(Const);n/_5(Const);n/_6(Const);n/_7(Const)|");
+}
+
+static void BM_CSE(int iters, int op_nodes) {
+ testing::StopTiming();
+ string s;
+ for (int in = 0; in < 10; in++) {
+ s += strings::Printf("node { name: 'in%04d' op: 'Input'}", in);
+ }
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int op = 0; op < op_nodes; op++) {
+ s += strings::Printf(
+ "node { name: 'op%04d' op: 'Mul' attr { key: 'T' value { "
+ "type: DT_FLOAT } } input: ['in%04d', 'in%04d' ] }",
+ op, rnd.Uniform(10), rnd.Uniform(10));
+ }
+
+ bool first = true;
+ while (iters > 0) {
+ Graph* graph = new Graph(OpRegistry::Global());
+ InitGraph(s, graph);
+ int N = graph->num_node_ids();
+ if (first) {
+ testing::SetLabel(strings::StrCat("Per graph node. Nodes: ", N));
+ first = false;
+ }
+ {
+ testing::StartTiming();
+ OptimizeCSE(graph, nullptr);
+ testing::StopTiming();
+ }
+ iters -= N; // Our benchmark units are individual graph nodes,
+ // not whole graphs
+ delete graph;
+ }
+}
+BENCHMARK(BM_CSE)->Arg(1000)->Arg(10000);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/subgraph.cc b/tensorflow/core/graph/subgraph.cc
new file mode 100644
index 0000000000..7910511dfb
--- /dev/null
+++ b/tensorflow/core/graph/subgraph.cc
@@ -0,0 +1,258 @@
+#include "tensorflow/core/graph/subgraph.h"
+
+#include <algorithm>
+#include <deque>
+#include <string>
+#include <unordered_map>
+#include <unordered_set>
+#include <vector>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/graph/algorithm.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/graph/tensor_id.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// ----------------------------------------------------------------------------
+// Subgraph construction-related routines
+// ----------------------------------------------------------------------------
+// TODO(vrv): Profile the unordered_set and unordered_map use in this file to
+// see if we should use an alternative implementation.
+
+namespace {
+
+typedef std::unordered_map<StringPiece, Node*, StringPiece::Hasher> NameIndex;
+
+// Rewrites "*g" by replacing each output tensor specified in
+// "fed_outputs" with a special "_Recv" feed node, and redirecting any
+// edges that consumed that tensor to read from the new feed node
+// instead.  "*name_index" is updated with the feed nodes added to the
+// graph.
+//
+// Returns OK on success.  On error, returns a non-OK Status with an
+// appropriate error message (and *g is left in an indeterminate
+// state).
+static Status FeedInputs(Graph* g, const DeviceAttributes& device_info,
+ const gtl::ArraySlice<string>& fed_outputs,
+ NameIndex* name_index) {
+ for (const string& t : fed_outputs) {
+ TensorId id(ParseTensorName(t));
+
+ auto iter = name_index->find(id.first);
+ if (iter == name_index->end()) {
+ return errors::NotFound("FeedInputs: unable to find feed output ", t);
+ }
+ const Node* n = iter->second;
+ DCHECK_EQ(n->name(), id.first);
+ if (id.second >= n->num_outputs()) {
+ return errors::InvalidArgument(
+ "FeedInputs: ", t, " should have output index < ", n->num_outputs());
+ }
+
+ Node* recv_node;
+ TF_RETURN_IF_ERROR(
+ NodeBuilder(strings::StrCat("_recv_", id.first, "_", id.second),
+ "_Recv")
+ .Attr("tensor_type", BaseType(n->output_type(id.second)))
+ .Attr("tensor_name", t)
+ .Attr("send_device", device_info.name())
+ .Attr("recv_device", device_info.name())
+ .Attr("send_device_incarnation",
+ static_cast<int64>(device_info.incarnation()))
+ .Attr("client_terminated", true)
+ .Finalize(g, &recv_node));
+ recv_node->set_assigned_device_name(device_info.name());
+
+ // Update name_index
+ (*name_index)[recv_node->name()] = recv_node;
+ g->AddControlEdge(g->source_node(), recv_node);
+
+ // Look through edges coming out of "n" for edges whose src_output() index
+ // matches "output_index". If found, replace the edges with a connection
+ // from the special feed node.
+ std::vector<const Edge*> to_remove;
+ for (const Edge* e : n->out_edges()) {
+ if (e->src_output() == id.second) {
+ to_remove.emplace_back(e);
+ } else if (e->src_output() == Graph::kControlSlot &&
+ n->def().op() == "Placeholder") {
+ // When feeding a Placeholder node, any outgoing control edges
+ // will be replaced with a control edge from the replacement
+ // recv_node.
+ // TODO(josh11b,mrry): Come up with a more elegant way of addressing
+ // the general version of this problem.
+ to_remove.emplace_back(e);
+ }
+ }
+
+ for (const Edge* e : to_remove) {
+ if (e->src_output() == id.second) {
+ g->AddEdge(recv_node, 0, e->dst(), e->dst_input());
+ } else {
+ CHECK_EQ(Graph::kControlSlot, e->src_output());
+ g->AddControlEdge(recv_node, e->dst());
+ }
+ g->RemoveEdge(e);
+ }
+ }
+ return Status::OK();
+}
+
+// Augment "*g" by adding special "fetch" nodes that connect to the
+// tensor outputs specified in "fetch_outputs" to retrieve the output
+// of the tensors. The new nodes added are set up to execute on
+// "client_device_name", and are returned in "*fetch_nodes".
+//
+// Return true on success. On error, return false and sets *error to
+// an appropriate error message (and *g is left in an indeterminate
+// state).
+static Status FetchOutputs(Graph* g, const DeviceAttributes& device_info,
+ const gtl::ArraySlice<string>& fetch_outputs,
+ NameIndex* name_index,
+ std::vector<Node*>* fetch_nodes) {
+ fetch_nodes->clear();
+ for (const string& t : fetch_outputs) {
+ // Parse t into node_name and output_index.
+ TensorId id(ParseTensorName(t));
+
+ // Find node in graph with that name.
+ auto iter = name_index->find(id.first);
+ if (iter == name_index->end()) {
+ return errors::NotFound("FetchOutputs node ", t, ": not found");
+ }
+ Node* n = iter->second;
+ DCHECK_EQ(n->name(), id.first);
+ VLOG(2) << "Found fetch node for " << t;
+
+ // Validate output_index
+ if (id.second >= n->num_outputs()) {
+ return errors::InvalidArgument("FetchOutputs ", t,
+ ": output index too large, must be < ",
+ n->num_outputs());
+ }
+
+ // Create the fetch Node and connect it up
+ Node* send_node;
+ TF_RETURN_IF_ERROR(
+ NodeBuilder(strings::StrCat("_send_", id.first, "_", id.second),
+ "_Send")
+ .Input(n, id.second)
+ .Attr("tensor_name", t)
+ .Attr("send_device", device_info.name())
+ .Attr("recv_device", device_info.name())
+ .Attr("send_device_incarnation",
+ static_cast<int64>(device_info.incarnation()))
+ .Attr("client_terminated", true)
+ .Finalize(g, &send_node));
+ send_node->set_assigned_device_name(device_info.name());
+ VLOG(1) << "Created fetch node: " << SummarizeNodeDef(send_node->def());
+
+ // Update the index.
+ (*name_index)[send_node->name()] = send_node;
+
+ g->AddControlEdge(send_node, g->sink_node());
+ fetch_nodes->push_back(send_node);
+ }
+
+ return Status::OK();
+}
+
+static bool AddNodeToTargets(const string& node_or_tensor_name,
+ const NameIndex& name_index,
+ std::unordered_set<const Node*>* targets) {
+ TensorId id = ParseTensorName(node_or_tensor_name);
+ auto iter = name_index.find(id.first);
+ if (iter == name_index.end()) {
+ return false;
+ }
+ const Node* n = iter->second;
+ if (n->name() != node_or_tensor_name) {
+ return false;
+ }
+
+ targets->insert(n);
+ return true;
+}
+
+static Status PruneForTargets(Graph* g, const NameIndex& name_index,
+ const std::vector<Node*>& fetch_nodes,
+ const gtl::ArraySlice<string>& target_nodes) {
+ string not_found;
+ std::unordered_set<const Node*> targets;
+ for (Node* n : fetch_nodes) {
+ if (!AddNodeToTargets(n->name(), name_index, &targets)) {
+ strings::StrAppend(&not_found, n->name(), " ");
+ }
+ }
+ for (const string& s : target_nodes) {
+ if (!AddNodeToTargets(s, name_index, &targets)) {
+ strings::StrAppend(&not_found, s, " ");
+ }
+ }
+ if (!not_found.empty()) {
+ return errors::NotFound("PruneForTargets: Some target nodes not found: ",
+ not_found);
+ }
+ PruneForReverseReachability(g, targets);
+
+ return Status::OK();
+}
+
+} // namespace
+
+namespace subgraph {
+
+Status RewriteGraphForExecution(
+ Graph* g, const gtl::ArraySlice<string>& fed_outputs,
+ const gtl::ArraySlice<string>& fetch_outputs,
+ const gtl::ArraySlice<string>& target_node_names,
+ const DeviceAttributes& device_info) {
+ std::unordered_set<string> endpoints(fed_outputs.begin(), fed_outputs.end());
+ for (const auto& fetch : fetch_outputs) {
+ if (endpoints.count(fetch) > 0) {
+ return errors::InvalidArgument(fetch, " is both fed and fetched.");
+ }
+ }
+
+ // A separate index mapping name to Node*, for use by FeedInputs,
+ // FetchOutputs, and PruneForTargets
+ NameIndex name_index;
+ for (Node* n : g->nodes()) {
+ name_index[n->name()] = n;
+ }
+
+ // Add the feeds. This may replace nodes in the graph, including the nodes
+ // currently listed in "fetch_nodes". We pass "name_index" so the index is
+ // kept up to date.
+ if (!fed_outputs.empty()) {
+ TF_RETURN_IF_ERROR(FeedInputs(g, device_info, fed_outputs, &name_index));
+ }
+
+ // Add the fetch nodes, also updating "name_index".
+ std::vector<Node*> fetch_nodes;
+ if (!fetch_outputs.empty()) {
+ TF_RETURN_IF_ERROR(
+ FetchOutputs(g, device_info, fetch_outputs, &name_index, &fetch_nodes));
+ }
+
+ // Prune the graph to only compute what is needed for the fetch nodes and the
+ // targets nodes.
+ if (!fetch_nodes.empty() || !target_node_names.empty()) {
+ TF_RETURN_IF_ERROR(
+ PruneForTargets(g, name_index, fetch_nodes, target_node_names));
+ }
+
+ return Status::OK();
+}
+
+} // namespace subgraph
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/subgraph.h b/tensorflow/core/graph/subgraph.h
new file mode 100644
index 0000000000..d2e138e8ae
--- /dev/null
+++ b/tensorflow/core/graph/subgraph.h
@@ -0,0 +1,49 @@
+#ifndef TENSORFLOW_GRAPH_SUBGRAPH_H_
+#define TENSORFLOW_GRAPH_SUBGRAPH_H_
+
+#include <string>
+
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace subgraph {
+
+// Rewrite the graph structure of "*g" to deal with feeding node
+// outputs, fetching node outputs, and only running a subset of the
+// graph. "fed_outputs" and "fetch_outputs" are both lists of
+// output tensor identifiers in the form of
+// "<name>[:<optional_output_index>]", and "target_nodes_str" is a
+// lists of of target node names in "*g" "g".
+//
+// In the resulting graph "*g", output edges in "fed_outputs" have
+// been redirected to special "_recv" nodes introduced into the graph.
+// If these fed nodes are not needed in order to compute the effects
+// of the nodes in "target_node_names" and "fetch_outputs", then these may
+// be omitted from the graph.
+//
+// In the resulting graph "*g", additional "_send" nodes are connected
+// to every output in "fetch_outputs". These "_send" nodes are set up
+// to execute on the device described by device_info.
+//
+// On success, returns OK, and replaces the contents of "*g" with the
+// portion of the graph necessary for producing
+// the output of all nodes listed in "target_node_names" and fetching the
+// specific node outputs specified in "fetch_outputs".
+//
+// On failure, returns the error status. Possible errors include:
+// - fed output "node:output_index" does not exist in "*g"
+// - fetch output "node:output_index" does not exist in "*g"
+// - target node "node" does not exist in "*g"
+Status RewriteGraphForExecution(
+ Graph* g, const gtl::ArraySlice<string>& fed_outputs,
+ const gtl::ArraySlice<string>& fetch_outputs,
+ const gtl::ArraySlice<string>& target_node_names,
+ const DeviceAttributes& device_info);
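+
+// A minimal usage sketch (illustrative only; the device name, feed, fetch,
+// and target strings are placeholders):
+//
+//   DeviceAttributes device_info;
+//   device_info.set_name("/job:a/replica:0/task:0/cpu:0");
+//   device_info.set_device_type(DeviceType(DEVICE_CPU).type());
+//   Status s = subgraph::RewriteGraphForExecution(
+//       g, {"input:1"} /* fed_outputs */, {"t2:0"} /* fetch_outputs */,
+//       {"train_step"} /* target_node_names */, device_info);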
+
+} // namespace subgraph
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_SUBGRAPH_H_
diff --git a/tensorflow/core/graph/subgraph_test.cc b/tensorflow/core/graph/subgraph_test.cc
new file mode 100644
index 0000000000..ffb3e6e403
--- /dev/null
+++ b/tensorflow/core/graph/subgraph_test.cc
@@ -0,0 +1,305 @@
+#include "tensorflow/core/graph/subgraph.h"
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/regexp.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/status.h"
+#include <gtest/gtest.h>
+
+// TODO(josh11b): Test setting the "device" field of a NodeDef.
+// TODO(josh11b): Test that feeding won't prune targets.
+
+namespace tensorflow {
+namespace {
+
+class SubgraphTest : public ::testing::Test {
+ protected:
+ SubgraphTest() : g_(new Graph(OpRegistry::Global())) {
+ RequireDefaultOps();
+ device_info_.set_name("/job:a/replica:0/task:0/cpu:0");
+ device_info_.set_device_type(DeviceType(DEVICE_CPU).type());
+ device_info_.set_incarnation(0);
+ }
+
+ ~SubgraphTest() override {}
+
+ void ExpectOK(const string& gdef_ascii) {
+ CHECK(protobuf::TextFormat::ParseFromString(gdef_ascii, &gdef_));
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, gdef_, g_.get()));
+ }
+
+ Node* FindNode(const string& name) {
+ for (Node* n : g_->nodes()) {
+ if (n->name() == name) return n;
+ }
+ return nullptr;
+ }
+
+ bool HasNode(const string& name) { return FindNode(name) != nullptr; }
+
+ void ExpectNodes(const string& nodes) {
+ int count = 0;
+ std::vector<string> actual_nodes;
+ for (Node* n : g_->nodes()) {
+ if (n->IsOp()) {
+ count++;
+ actual_nodes.push_back(n->name());
+ }
+ }
+ std::sort(actual_nodes.begin(), actual_nodes.end());
+
+ LOG(INFO) << "Nodes present: " << str_util::Join(actual_nodes, " ");
+
+ std::vector<string> expected_nodes = str_util::Split(nodes, ',');
+ std::sort(expected_nodes.begin(), expected_nodes.end());
+ for (const string& s : expected_nodes) {
+ Node* n = FindNode(s);
+ EXPECT_TRUE(n != nullptr) << s;
+ if (n->def().op() == "_Send" || n->def().op() == "_Recv") {
+ EXPECT_EQ(device_info_.name(), n->assigned_device_name()) << s;
+ }
+ }
+
+ EXPECT_TRUE(actual_nodes.size() == expected_nodes.size())
+ << "\nActual: " << str_util::Join(actual_nodes, ",")
+ << "\nExpected: " << str_util::Join(expected_nodes, ",");
+ }
+
+ bool HasEdge(const string& src, int src_out, const string& dst, int dst_in) {
+ for (const Edge* e : g_->edges()) {
+ if (e->src()->name() == src && e->src_output() == src_out &&
+ e->dst()->name() == dst && e->dst_input() == dst_in)
+ return true;
+ }
+ return false;
+ }
+ bool HasControlEdge(const string& src, const string& dst) {
+ return HasEdge(src, Graph::kControlSlot, dst, Graph::kControlSlot);
+ }
+
+ string Subgraph(const string& fed_str, const string& fetch_str,
+ const string& targets_str) {
+ Graph* subgraph = new Graph(OpRegistry::Global());
+ CopyGraph(*g_, subgraph);
+ std::vector<string> fed =
+ str_util::Split(fed_str, ',', str_util::SkipEmpty());
+ std::vector<string> fetch =
+ str_util::Split(fetch_str, ',', str_util::SkipEmpty());
+ std::vector<string> targets =
+ str_util::Split(targets_str, ',', str_util::SkipEmpty());
+
+ Status s = subgraph::RewriteGraphForExecution(subgraph, fed, fetch,
+ targets, device_info_);
+ if (!s.ok()) {
+ delete subgraph;
+ return s.ToString();
+ }
+
+    // Replace the graph with the subgraph for the rest of the test.
+ g_.reset(subgraph);
+ return "OK";
+ }
+
+ Graph* graph() { return g_.get(); }
+
+ private:
+ GraphDef gdef_;
+ std::unique_ptr<Graph> g_;
+ DeviceAttributes device_info_;
+};
+
+REGISTER_OP("TestParams").Output("o: float");
+REGISTER_OP("TestInput").Output("a: float").Output("b: float");
+REGISTER_OP("TestRelu").Input("i: float").Output("o: float");
+REGISTER_OP("TestMul").Input("a: float").Input("b: float").Output("o: float");
+
+TEST_F(SubgraphTest, Targets1) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W2', 't1' ] }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ EXPECT_EQ("OK", Subgraph("", "", "t1"));
+ ExpectNodes("W1,input,t1");
+}
+
+TEST_F(SubgraphTest, Targets2) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: 'W1' input: 'input:1' }"
+ "node { name: 't2' op: 'TestMul' input: 'W2' input: 't1' }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ EXPECT_EQ("OK", Subgraph("", "", "t2,t3_a"));
+ ExpectNodes("W1,W2,input,t1,t2,t3_a");
+}
+
+TEST_F(SubgraphTest, FedOutputs1) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W2', 't1' ] }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ EXPECT_EQ("OK", Subgraph("input:1", "", "t2"));
+ ExpectNodes("W1,W2,_recv_input_1,t1,t2");
+}
+
+TEST_F(SubgraphTest, FedRefNode) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W2', 'W1' ] }");
+ EXPECT_EQ("OK", Subgraph("W1:0", "", "t1"));
+ ExpectNodes("_recv_W1_0,W2,t1");
+ Node* n = FindNode("_recv_W1_0");
+ EXPECT_FALSE(IsRefType(CHECK_NOTNULL(n)->output_type(0)));
+}
+
+TEST_F(SubgraphTest, FedOutputs2) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W2', 't1' ] }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ // We feed input:1, but nothing connects to it, so the _recv(input:1)
+ // node also disappears.
+ EXPECT_EQ("OK", Subgraph("input:1,t1,W2", "", "t2"));
+ ExpectNodes("_recv_t1_0,_recv_W2_0,t2");
+}
+
+TEST_F(SubgraphTest, FetchOutputs1) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W2', 't1' ] }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ EXPECT_EQ("OK", Subgraph("", "W2,input:1,t1,t2", "t2"));
+ ExpectNodes(
+ "W1,W2,input,t1,t2,_send_W2_0,_send_input_1,_send_t1_0,_send_t2_0");
+}
+
+TEST_F(SubgraphTest, FetchOutputs2) {
+ ExpectOK(
+ "node { name: 'W1' op: 'TestParams' }"
+ "node { name: 'W2' op: 'TestParams' }"
+ "node { name: 'input' op: 'TestInput' }"
+ "node { name: 't1' op: 'TestMul' input: [ 'W1', 'input:1' ] }"
+ "node { name: 't2' op: 'TestMul' input: [ 'W2', 't1' ] }"
+ "node { name: 't3_a' op: 'TestRelu' input: 't2' }"
+ "node { name: 't3_b' op: 'TestRelu' input: 't2' }");
+ EXPECT_EQ("OK", Subgraph("", "t3_a", "t2"));
+ ExpectNodes("W1,W2,input,t1,t2,t3_a,_send_t3_a_0");
+}
+
+TEST_F(SubgraphTest, ChainOfFools) {
+ ExpectOK(
+ "node { name: 'a' op: 'TestParams' }"
+ "node { name: 'b' op: 'TestRelu' input: 'a'}"
+ "node { name: 'c' op: 'TestRelu' input: 'b'}"
+ "node { name: 'd' op: 'TestRelu' input: 'c'}"
+ "node { name: 'e' op: 'TestRelu' input: 'd'}"
+ "node { name: 'f' op: 'TestRelu' input: 'e'}");
+ EXPECT_EQ("OK", Subgraph("c:0", "b:0,e:0", ""));
+ ExpectNodes("a,b,_send_b_0,_recv_c_0,d,e,_send_e_0");
+ EXPECT_TRUE(HasEdge("a", 0, "b", 0));
+ EXPECT_TRUE(HasEdge("b", 0, "_send_b_0", 0));
+ EXPECT_TRUE(HasEdge("_recv_c_0", 0, "d", 0));
+ EXPECT_TRUE(HasEdge("d", 0, "e", 0));
+ EXPECT_TRUE(HasEdge("e", 0, "_send_e_0", 0));
+}
+
+static bool HasSubstr(const string& base, const string& substr) {
+ bool ok = StringPiece(base).contains(substr);
+ EXPECT_TRUE(ok) << base << ", expected substring " << substr;
+ return ok;
+}
+
+TEST_F(SubgraphTest, Errors) {
+ ExpectOK(
+ "node { name: 'a' op: 'TestParams' }"
+ "node { name: 'b' op: 'TestRelu' input: 'a'}"
+ "node { name: 'c' op: 'TestRelu' input: 'b'}"
+ "node { name: 'd' op: 'TestRelu' input: 'c'}"
+ "node { name: 'e' op: 'TestRelu' input: 'd'}"
+ "node { name: 'f' op: 'TestRelu' input: 'e'}");
+ // Duplicated feed and fetch
+ EXPECT_TRUE(
+ HasSubstr(Subgraph("c:0", "b:0,c:0", ""), "both fed and fetched"));
+ // Feed not found.
+ EXPECT_TRUE(HasSubstr(Subgraph("foo:0", "", ""), "unable to find"));
+ // Fetch not found.
+ EXPECT_TRUE(HasSubstr(Subgraph("", "foo:0", ""), "not found"));
+ // Target not found.
+ EXPECT_TRUE(HasSubstr(Subgraph("", "", "foo"), "not found"));
+}
+
+REGISTER_OP("In").Output("o: float");
+REGISTER_OP("Op").Input("i: float").Output("o: float");
+
+static void BM_Subgraph(int iters, int num_nodes) {
+ DeviceAttributes device_info;
+ device_info.set_name("/job:a/replica:0/task:0/cpu:0");
+ device_info.set_device_type(DeviceType(DEVICE_CPU).type());
+ device_info.set_incarnation(0);
+
+ testing::StopTiming();
+ Graph g(OpRegistry::Global());
+ { // Scope for temporary variables used to construct g.
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+ Node* last_node = nullptr;
+ for (int i = 0; i < num_nodes; i++) {
+ string name = strings::StrCat("N", i);
+ if (i > 0) {
+ last_node = ops::UnaryOp("Op", last_node, b.opts().WithName(name));
+ } else {
+ last_node = ops::SourceOp("In", b.opts().WithName(name));
+ }
+ }
+ TF_CHECK_OK(b.ToGraph(&g));
+ }
+
+ std::vector<string> fed;
+ if (num_nodes > 1000) {
+ fed.push_back(strings::StrCat("N", num_nodes - 1000));
+ }
+ std::vector<string> fetch;
+ std::vector<string> targets = {strings::StrCat("N", num_nodes - 1)};
+ testing::StartTiming();
+ while (--iters > 0) {
+ Graph* subgraph = new Graph(OpRegistry::Global());
+ CopyGraph(g, subgraph);
+ TF_CHECK_OK(subgraph::RewriteGraphForExecution(subgraph, fed, fetch,
+ targets, device_info));
+ delete subgraph;
+ }
+}
+BENCHMARK(BM_Subgraph)->Arg(100)->Arg(1000)->Arg(10000)->Arg(100000);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/tensor_id.cc b/tensorflow/core/graph/tensor_id.cc
new file mode 100644
index 0000000000..f789110ff3
--- /dev/null
+++ b/tensorflow/core/graph/tensor_id.cc
@@ -0,0 +1,41 @@
+#include "tensorflow/core/graph/tensor_id.h"
+
+#include <string>
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+TensorId ParseTensorName(const string& name) {
+ return ParseTensorName(StringPiece(name.data(), name.size()));
+}
+
+TensorId ParseTensorName(StringPiece name) {
+ // Parse either a name, or a name:digits. To do so, we go backwards
+ // from the end of the string, skipping over a run of digits. If
+ // we hit a ':' character, then we know we are in the 'name:digits'
+ // regime. Otherwise, the output index is implicitly 0, and the whole
+ // name string forms the first part of the tensor name.
+ //
+ // Equivalent to matching with this regexp: ([^:]+):(\\d+)
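+  //
+  // For example, "foo" parses as ("foo", 0), "foo:12" as ("foo", 12), and
+  // "foo:bar" as ("foo:bar", 0), since "bar" is not a run of digits.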
+ const char* base = name.data();
+ const char* p = base + name.size() - 1;
+ int index = 0;
+ int mul = 1;
+ while (p > base && (*p >= '0' && *p <= '9')) {
+ index += ((*p - '0') * mul);
+ mul *= 10;
+ p--;
+ }
+ TensorId id;
+ if (p > base && *p == ':' && mul > 1) {
+ id.first = StringPiece(base, p - base);
+ id.second = index;
+ } else {
+ id.first = name;
+ id.second = 0;
+ }
+ return id;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/tensor_id.h b/tensorflow/core/graph/tensor_id.h
new file mode 100644
index 0000000000..f1f3846875
--- /dev/null
+++ b/tensorflow/core/graph/tensor_id.h
@@ -0,0 +1,28 @@
+#ifndef TENSORFLOW_GRAPH_TENSOR_ID_H_
+#define TENSORFLOW_GRAPH_TENSOR_ID_H_
+
+#include <string>
+
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+// Identifier for a tensor within a step.
+// first == operation_name, second == output_index
+// Note: does not own backing storage for name.
+struct TensorId : public std::pair<StringPiece, int> {
+ typedef std::pair<StringPiece, int> Base;
+
+ // Inherit the set of constructors.
+ using Base::pair;
+
+ string ToString() const { return strings::StrCat(first, ":", second); }
+};
+
+TensorId ParseTensorName(const string& name);
+TensorId ParseTensorName(StringPiece name);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_TENSOR_ID_H_
diff --git a/tensorflow/core/graph/tensor_id_test.cc b/tensorflow/core/graph/tensor_id_test.cc
new file mode 100644
index 0000000000..b945774cc3
--- /dev/null
+++ b/tensorflow/core/graph/tensor_id_test.cc
@@ -0,0 +1,77 @@
+#include "tensorflow/core/graph/tensor_id.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+namespace tensorflow {
+namespace {
+
+static string ParseHelper(const string& n) {
+ TensorId id = ParseTensorName(n);
+ return strings::StrCat(id.first, ":", id.second);
+}
+
+TEST(TensorIdTest, ParseTensorName) {
+ EXPECT_EQ(ParseHelper("W1"), "W1:0");
+ EXPECT_EQ(ParseHelper("weights:0"), "weights:0");
+ EXPECT_EQ(ParseHelper("W1:1"), "W1:1");
+ EXPECT_EQ(ParseHelper("W1:17"), "W1:17");
+ EXPECT_EQ(ParseHelper("xyz1_17"), "xyz1_17:0");
+}
+
+static uint32 Skewed(random::SimplePhilox* rnd, int max_log) {
+ const uint32 space = 1 << (rnd->Rand32() % (max_log + 1));
+ return rnd->Rand32() % space;
+}
+
+static void BM_ParseTensorName(int iters, int arg) {
+ testing::StopTiming();
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ std::vector<string> names;
+ for (int i = 0; i < 100; i++) {
+ string name;
+ switch (arg) {
+ case 0: { // Generate random names
+ size_t len = Skewed(&rnd, 4);
+ while (name.size() < len) {
+ name += rnd.OneIn(4) ? '0' : 'a';
+ }
+ if (rnd.OneIn(3)) {
+ strings::StrAppend(&name, ":", rnd.Uniform(12));
+ }
+ break;
+ }
+ case 1:
+ name = "W1";
+ break;
+ case 2:
+ name = "t0003";
+ break;
+ case 3:
+ name = "weights";
+ break;
+ case 4:
+ name = "weights:17";
+ break;
+ default:
+ LOG(FATAL) << "Unexpected arg";
+ break;
+ }
+ names.push_back(name);
+ }
+ testing::StartTiming();
+ TensorId id;
+ int index = 0;
+ int sum = 0;
+ while (--iters > 0) {
+ id = ParseTensorName(names[index++ % names.size()]);
+ sum += id.second;
+ }
+ VLOG(2) << sum; // Prevent compiler from eliminating loop body
+}
+BENCHMARK(BM_ParseTensorName)->Arg(0)->Arg(1)->Arg(2)->Arg(3)->Arg(4);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/graph/testlib.cc b/tensorflow/core/graph/testlib.cc
new file mode 100644
index 0000000000..e49d5e819a
--- /dev/null
+++ b/tensorflow/core/graph/testlib.cc
@@ -0,0 +1,299 @@
+#include "tensorflow/core/graph/testlib.h"
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace test {
+namespace graph {
+
+Node* Send(Graph* g, Node* input, const string& tensor, const string& sender,
+ const uint64 sender_incarnation, const string& receiver) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "_Send")
+ .Input(input, 0)
+ .Attr("tensor_name", tensor)
+ .Attr("send_device", sender)
+ .Attr("send_device_incarnation",
+ static_cast<int64>(sender_incarnation))
+ .Attr("recv_device", receiver)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Recv(Graph* g, const string& tensor, const string& type,
+ const string& sender, const uint64 sender_incarnation,
+ const string& receiver) {
+ Node* ret;
+ DataType dtype;
+ CHECK(DataTypeFromString(type, &dtype));
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "_Recv")
+ .Attr("tensor_type", dtype)
+ .Attr("tensor_name", tensor)
+ .Attr("send_device", sender)
+ .Attr("send_device_incarnation",
+ static_cast<int64>(sender_incarnation))
+ .Attr("recv_device", receiver)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Constant(Graph* g, const Tensor& tensor) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Const")
+ .Attr("dtype", tensor.dtype())
+ .Attr("value", tensor)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Constant(Graph* g, const Tensor& tensor, const string& name) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(name, "Const")
+ .Attr("dtype", tensor.dtype())
+ .Attr("value", tensor)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Var(Graph* g, const DataType dtype, const TensorShape& shape) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Variable")
+ .Attr("dtype", dtype)
+ .Attr("shape", shape)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Assign(Graph* g, Node* var, Node* val) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Assign")
+ .Input(var)
+ .Input(val)
+ .Attr("use_locking", true)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Reduce(Graph* g, const string& reduce, Node* data, Node* axes,
+ bool keep_dims) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), reduce)
+ .Input(data)
+ .Input(axes)
+ .Attr("keep_dims", keep_dims)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* QuantizeToUINT8(Graph* g, Node* data) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Quantize")
+ .Input(data)
+ .Attr("T", DT_QUINT8)
+ .Attr("max_range", 1.0f)
+ .Attr("min_range", -1.0f)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Matmul(Graph* g, Node* in0, Node* in1, bool transpose_a,
+ bool transpose_b) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "MatMul")
+ .Input(in0)
+ .Input(in1)
+ .Attr("transpose_a", transpose_a)
+ .Attr("transpose_b", transpose_b)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* RandomNumberGenerator(const string& op, Graph* g, Node* input,
+ DataType dtype) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), op)
+ .Input(input)
+ .Attr("dtype", dtype)
+ .Attr("seed", 0)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* RandomUniform(Graph* g, Node* input, DataType dtype) {
+ return RandomNumberGenerator("RandomUniform", g, input, dtype);
+}
+
+Node* RandomGaussian(Graph* g, Node* input, DataType dtype) {
+ return RandomNumberGenerator("RandomStandardNormal", g, input, dtype);
+}
+
+Node* RandomParameters(Graph* g, Node* input, DataType dtype) {
+ return RandomNumberGenerator("RandomParameters", g, input, dtype);
+}
+
+Node* Unary(Graph* g, const string& func, Node* input, int index) {
+ Node* ret;
+ TF_CHECK_OK(
+ NodeBuilder(g->NewName("n"), func).Input(input, index).Finalize(g, &ret));
+ return ret;
+}
+
+Node* Binary(Graph* g, const string& func, Node* in0, Node* in1) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), func)
+ .Input(in0)
+ .Input(in1)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Multi(Graph* g, const string& func, gtl::ArraySlice<Node*> ins) {
+ Node* ret;
+ auto b = NodeBuilder(g->NewName("n"), func);
+ for (Node* n : ins) b = b.Input(n);
+ TF_CHECK_OK(b.Finalize(g, &ret));
+ return ret;
+}
+
+Node* Identity(Graph* g, Node* input, int index) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Identity")
+ .Input(input, index)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Add(Graph* g, Node* in0, Node* in1) { return Binary(g, "Add", in0, in1); }
+
+Node* Error(Graph* g, Node* input, const string& errmsg) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Error")
+ .Input(input)
+ .Attr("message", errmsg)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* InvalidRefType(Graph* g, DataType out_type, DataType invalid_type) {
+ DCHECK(out_type != invalid_type);
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "InvalidRefType")
+ .Attr("TIn", out_type)
+ .Attr("TOut", invalid_type)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Delay(Graph* g, Node* input, Microseconds delay_micros) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Delay")
+ .Input(input)
+ .Attr("micros", delay_micros.value())
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* NoOp(Graph* g, const std::vector<Node*>& control_inputs) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "NoOp")
+ .ControlInputs(control_inputs)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Switch(Graph* g, Node* in0, Node* in1) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Switch")
+ .Input(in0)
+ .Input(in1)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Enter(Graph* g, Node* input, const string& frame_name) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Enter")
+ .Input(input)
+ .Attr("frame_name", frame_name)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Exit(Graph* g, Node* input) {
+ Node* ret;
+ TF_CHECK_OK(
+ NodeBuilder(g->NewName("n"), "Exit").Input(input).Finalize(g, &ret));
+ return ret;
+}
+
+Node* Merge(Graph* g, Node* in0, Node* in1) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Merge")
+ .Input({in0, in1})
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Merge(Graph* g, Node* in0, gtl::ArraySlice<string> remaining_in) {
+ std::vector<NodeBuilder::NodeOut> inputs;
+ inputs.reserve(remaining_in.size() + 1);
+ inputs.emplace_back(in0);
+ for (const string& in_name : remaining_in) {
+ inputs.emplace_back(in_name, 0, inputs[0].dt);
+ }
+
+ Node* ret;
+ TF_CHECK_OK(
+ NodeBuilder(g->NewName("n"), "Merge").Input(inputs).Finalize(g, &ret));
+ return ret;
+}
+
+Node* Next(Graph* g, const string& name, Node* input) {
+ Node* ret;
+ TF_CHECK_OK(
+ NodeBuilder(name, "NextIteration").Input(input).Finalize(g, &ret));
+ return ret;
+}
+
+Node* LoopCond(Graph* g, Node* input) {
+ Node* ret;
+ TF_CHECK_OK(
+ NodeBuilder(g->NewName("n"), "LoopCond").Input(input).Finalize(g, &ret));
+ return ret;
+}
+
+Node* Less(Graph* g, Node* in0, Node* in1) {
+ return Binary(g, "Less", in0, in1);
+}
+
+Node* Select(Graph* g, Node* c, Node* inx, Node* iny) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Select")
+ .Input(c)
+ .Input(inx)
+ .Input(iny)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+Node* Cast(Graph* g, Node* in, DataType dst) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Cast")
+ .Input(in)
+ .Attr("DstT", dst)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+void ToGraphDef(Graph* g, GraphDef* gdef) { g->ToGraphDef(gdef); }
+
+} // end namespace graph
+} // end namespace test
+} // end namespace tensorflow
diff --git a/tensorflow/core/graph/testlib.h b/tensorflow/core/graph/testlib.h
new file mode 100644
index 0000000000..11905bbf6a
--- /dev/null
+++ b/tensorflow/core/graph/testlib.h
@@ -0,0 +1,141 @@
+// DEPRECATED: Use GraphDefBuilder instead.
+
+#ifndef TENSORFLOW_GRAPH_TESTLIB_H_
+#define TENSORFLOW_GRAPH_TESTLIB_H_
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/graph/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+namespace test {
+namespace graph {
+
+// Converts "g" into its corresponding GraphDef "def".
+// DEPRECATED: call g->ToGraphDef(def) instead.
+void ToGraphDef(Graph* g, GraphDef* def);
+
+// A few helpers to construct a graph.
+
+// Adds a node in "g" producing a constant "tensor".
+Node* Constant(Graph* g, const Tensor& tensor);
+Node* Constant(Graph* g, const Tensor& tensor, const string& name);
+
+// Adds a variable in "g" of the given "shape" and "dtype".
+Node* Var(Graph* g, const DataType dtype, const TensorShape& shape);
+
+// Adds an assign node in "g" which assigns "val" into "var".
+Node* Assign(Graph* g, Node* var, Node* val);
+
+// Adds a send node in "g" sending "input" as a named "tensor" from
+// "sender" to "receiver".
+Node* Send(Graph* g, Node* input, const string& tensor, const string& sender,
+ const uint64 sender_incarnation, const string& receiver);
+
+// Adds a recv node in "g" receiving a named "tensor" from "sender"
+// to "receiver".
+Node* Recv(Graph* g, const string& tensor, const string& type,
+ const string& sender, const uint64 sender_incarnation,
+ const string& receiver);
+
+// Adds a reduction node in "g" computing reduce(data, axes), where "reduce"
+// names a reduction op, e.g., Sum, Max, Min, Mean, etc.
+Node* Reduce(Graph* g, const string& reduce, Node* data, Node* axes,
+ bool keep_dims = false);
+
+// Adds a Matmul node in g doing in0.contract(in1).
+Node* Matmul(Graph* g, Node* in0, Node* in1, bool transpose_a,
+ bool transpose_b);
+
+// Adds a Quantize node into "g" that quantizes floats into QUINT8. The range
+// of the input float tensor is assumed to be [-1, 1].
+Node* QuantizeToUINT8(Graph* g, Node* data);
+
+// Adds a unary function "func" node in "g" taking "input".
+Node* Unary(Graph* g, const string& func, Node* input, int index = 0);
+
+// Adds an identity node in "g" taking "input" and producing an
+// identity copy.
+Node* Identity(Graph* g, Node* input, int index = 0);
+
+// Adds a binary function "func" node in "g" taking "in0" and "in1".
+// Requires that "func" name an attr-style Op.
+Node* Binary(Graph* g, const string& func, Node* in0, Node* in1);
+
+// Adds a function "func" node in "g" taking inputs "ins".
+// Requires that "func" name an attr-style Op.
+Node* Multi(Graph* g, const string& func, gtl::ArraySlice<Node*> ins);
+
+// Adds a binary add node in "g" doing in0 + in1.
+Node* Add(Graph* g, Node* in0, Node* in1);
+
+// Adds a node that generates a tensor of the input shape filled with samples
+// from the unit uniform distribution.
+Node* RandomUniform(Graph* g, Node* input, DataType dtype);
+
+// Adds a node that generates a tensor of the input shape filled with samples
+// from the unit normal distribution.
+Node* RandomGaussian(Graph* g, Node* input, DataType dtype);
+
+// Adds a node that generates a tensor of the input shape filled with samples
+// from the truncated standard normal distribution.
+Node* RandomParameters(Graph* g, Node* input, DataType dtype);
+
+// Adds an error node in "g". The node's computation always
+// generates an error with the given error message "errmsg".
+Node* Error(Graph* g, Node* input, const string& errmsg);
+
+// Adds a node that generates an invalid ref output.
+Node* InvalidRefType(Graph* g, DataType out_type, DataType invalid_type);
+
+// Adds a node in "g". Its Compute() sleeps a while and outputs the
+// input (i.e., same as identity).
+Node* Delay(Graph* g, Node* input, Microseconds delay_micros);
+
+// Adds a no-op node in "g", with control inputs from all nodes in the
+// "control_inputs" vector.
+Node* NoOp(Graph* g, const std::vector<Node*>& control_inputs);
+
+// Adds a Switch node in "g". If "in1" is true, it forwards "in0" to
+// output 1. Otherwise, it forwards "in0" to output 0.
+Node* Switch(Graph* g, Node* in0, Node* in1);
+
+// Adds an Enter node in "g", which enters a new frame.
+Node* Enter(Graph* g, Node* input, const string& frame_name);
+
+// Adds an Exit node in "g", which exits a frame.
+Node* Exit(Graph* g, Node* input);
+
+// Adds a Merge node in "g" with two inputs "in0" and "in1".
+Node* Merge(Graph* g, Node* in0, Node* in1);
+
+// Adds a Merge node in "g". The first input is "in0", the remaining
+// inputs are only given by their names in remaining_in.
+Node* Merge(Graph* g, Node* in0, gtl::ArraySlice<string> remaining_in);
+
+// Adds a NextIteration node in "g", which makes its input available
+// to the next iteration.
+Node* Next(Graph* g, const string& name, Node* input);
+
+// Adds a LoopCond node in "g", representing the "pivot" termination
+// condition of a loop.
+Node* LoopCond(Graph* g, Node* input);
+
+// Adds a less node in "g", which returns true iff "in0" < "in1".
+Node* Less(Graph* g, Node* in0, Node* in1);
+
+// Adds a select node in "g", which outputs either "inx" or "iny"
+// depending on the boolean value of "c".
+Node* Select(Graph* g, Node* c, Node* inx, Node* iny);
+
+// Casts "in" into data type "dst".
+Node* Cast(Graph* g, Node* in, DataType dst);
+
+} // end namespace graph
+} // end namespace test
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_TESTLIB_H_
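For orientation, a minimal sketch (not taken from this patch) of how the helpers above might be combined in a test; the include list and the TinyAddGraph name are illustrative assumptions:

    #include "tensorflow/core/framework/op.h"
    #include "tensorflow/core/graph/graph.h"
    #include "tensorflow/core/graph/testlib.h"
    #include "tensorflow/core/public/tensor.h"

    namespace tensorflow {

    // Illustrative helper: builds c = a + b from two float constants and
    // serializes the graph to a GraphDef.
    GraphDef TinyAddGraph() {
      Graph g(OpRegistry::Global());
      Tensor a(DT_FLOAT, TensorShape({2}));
      a.flat<float>().setConstant(1.0f);
      Tensor b(DT_FLOAT, TensorShape({2}));
      b.flat<float>().setConstant(2.0f);
      Node* n0 = test::graph::Constant(&g, a);
      Node* n1 = test::graph::Constant(&g, b);
      test::graph::Add(&g, n0, n1);  // adds an "Add" node consuming n0 and n1
      GraphDef def;
      test::graph::ToGraphDef(&g, &def);
      return def;
    }

    }  // namespace tensorflow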
diff --git a/tensorflow/core/graph/types.h b/tensorflow/core/graph/types.h
new file mode 100644
index 0000000000..41400611a9
--- /dev/null
+++ b/tensorflow/core/graph/types.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_GRAPH_TYPES_H_
+#define TENSORFLOW_GRAPH_TYPES_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/gtl/int_type.h"
+
+namespace tensorflow {
+
+// We model running time in microseconds.
+TF_LIB_GTL_DEFINE_INT_TYPE(Microseconds, int64);
+
+// We model size in bytes.
+TF_LIB_GTL_DEFINE_INT_TYPE(Bytes, int64);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_GRAPH_TYPES_H_
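A brief sketch of how these wrapper types behave; the constructor-from-int64 form is assumed from the usual TF_LIB_GTL_DEFINE_INT_TYPE semantics, while value() is the accessor already used by Delay() in testlib.cc:

    Microseconds timeout(500);    // wraps an int64
    Bytes buf_size(1 << 20);
    int64 raw = timeout.value();  // 500
    // timeout = buf_size;        // does not compile: Microseconds and Bytes
    //                            // are distinct integer types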
diff --git a/tensorflow/core/kernels/adjust_contrast_op.cc b/tensorflow/core/kernels/adjust_contrast_op.cc
new file mode 100644
index 0000000000..7cc0534354
--- /dev/null
+++ b/tensorflow/core/kernels/adjust_contrast_op.cc
@@ -0,0 +1,121 @@
+// See docs in ../ops/image_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/adjust_contrast_op.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class AdjustContrastOp : public OpKernel {
+ public:
+ explicit AdjustContrastOp(OpKernelConstruction* context) : OpKernel(context) {
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& factor = context->input(1);
+ const Tensor& min_value = context->input(2);
+ const Tensor& max_value = context->input(3);
+ OP_REQUIRES(context, input.dims() >= 3,
+ errors::InvalidArgument("input must be at least 3-D, got shape ",
+ input.shape().ShortDebugString()));
+ const int64 height = input.dim_size(input.dims() - 3);
+ const int64 width = input.dim_size(input.dims() - 2);
+ const int64 channels = input.dim_size(input.dims() - 1);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(factor.shape()),
+ errors::InvalidArgument("contrast_factor must be scalar: ",
+ factor.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(min_value.shape()),
+ errors::InvalidArgument("min_value must be scalar: ",
+ min_value.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(max_value.shape()),
+ errors::InvalidArgument("max_value must be scalar: ",
+ max_value.shape().ShortDebugString()));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+
+ Tensor mean_values;
+ OP_REQUIRES_OK(context, context->allocate_temp(DataTypeToEnum<float>::value,
+ TensorShape(input.shape()),
+ &mean_values));
+
+ if (input.NumElements() > 0) {
+ const int64 batch = input.NumElements() / (height * width * channels);
+ const int64 shape[4] = {batch, height, width, channels};
+ functor::AdjustContrast<Device, T>()(
+ context->eigen_device<Device>(), input.shaped<T, 4>(shape),
+ factor.scalar<float>(), min_value.scalar<float>(),
+ max_value.scalar<float>(), mean_values.shaped<float, 4>(shape),
+ output->shaped<float, 4>(shape));
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AdjustContrast").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ AdjustContrastOp<CPUDevice, T>);
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int16);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU (to prevent
+// building the GPU versions here; they will be built when compiling
+// _gpu.cu.cc).
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void AdjustContrast<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor input, \
+ typename TTypes<float>::ConstScalar contrast_factor, \
+ typename TTypes<float>::ConstScalar min_value, \
+ typename TTypes<float>::ConstScalar max_value, \
+ typename TTypes<float, 4>::Tensor mean_values, \
+ typename TTypes<float, 4>::Tensor output); \
+ extern template struct AdjustContrast<GPUDevice, T>;
+
+DECLARE_GPU_SPEC(uint8);
+DECLARE_GPU_SPEC(int8);
+DECLARE_GPU_SPEC(int16);
+DECLARE_GPU_SPEC(int32);
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AdjustContrast").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
+ AdjustContrastOp<GPUDevice, T>);
+REGISTER_GPU_KERNEL(uint8);
+REGISTER_GPU_KERNEL(int8);
+REGISTER_GPU_KERNEL(int16);
+REGISTER_GPU_KERNEL(int32);
+REGISTER_GPU_KERNEL(float);
+REGISTER_GPU_KERNEL(double);
+#undef REGISTER_GPU_KERNEL
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/adjust_contrast_op.h b/tensorflow/core/kernels/adjust_contrast_op.h
new file mode 100644
index 0000000000..2182b33c03
--- /dev/null
+++ b/tensorflow/core/kernels/adjust_contrast_op.h
@@ -0,0 +1,64 @@
+#ifndef TENSORFLOW_KERNELS_ADJUST_CONTRAST_OP_H_
+#define TENSORFLOW_KERNELS_ADJUST_CONTRAST_OP_H_
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by AdjustContrastOp to do the computations.
+template <typename Device, typename T>
+struct AdjustContrast {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<float>::ConstScalar contrast_factor,
+ typename TTypes<float>::ConstScalar min_value,
+ typename TTypes<float>::ConstScalar max_value,
+ typename TTypes<float, 4>::Tensor mean_values,
+ typename TTypes<float, 4>::Tensor output) {
+ const int batch = input.dimension(0);
+ const int height = input.dimension(1);
+ const int width = input.dimension(2);
+ const int channels = input.dimension(3);
+
+ Eigen::array<int, 4> scalar_broadcast{{batch, height, width, channels}};
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::array<int, 2> reduction_axis{{1, 2}};
+ Eigen::array<int, 4> scalar{{1, 1, 1, 1}};
+ Eigen::array<int, 4> broadcast_dims{{1, height, width, 1}};
+ Eigen::Tensor<int, 4>::Dimensions reshape_dims{{batch, 1, 1, channels}};
+#else
+ Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<2> >
+ reduction_axis;
+ Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<1>,
+ Eigen::type2index<1>, Eigen::type2index<1> > scalar;
+ Eigen::IndexList<Eigen::type2index<1>, int, int, Eigen::type2index<1> >
+ broadcast_dims;
+ broadcast_dims.set(1, height);
+ broadcast_dims.set(2, width);
+ Eigen::IndexList<int, Eigen::type2index<1>, Eigen::type2index<1>, int>
+ reshape_dims;
+ reshape_dims.set(0, batch);
+ reshape_dims.set(3, channels);
+#endif
+ mean_values.device(d) = input.template cast<float>()
+ .mean(reduction_axis)
+ .eval()
+ .reshape(reshape_dims)
+ .broadcast(broadcast_dims);
+
+ auto contrast_factor_tensor =
+ contrast_factor.reshape(scalar).broadcast(scalar_broadcast);
+ auto adjusted =
+ (input.template cast<float>() - mean_values) * contrast_factor_tensor +
+ mean_values;
+ auto min_bcast = min_value.reshape(scalar).broadcast(scalar_broadcast);
+ auto max_bcast = max_value.reshape(scalar).broadcast(scalar_broadcast);
+ // TODO(wicke): This is rather slow and should be re-written as pure cuda.
+ output.device(d) = adjusted.cwiseMin(max_bcast).cwiseMax(min_bcast);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_ADJUST_CONTRAST_OP_H_
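Read as a formula, the functor computes, independently for each batch example and channel, the mean over the spatial (height, width) dimensions and then

    output = clamp((input - mean) * contrast_factor + mean, min_value, max_value)

For example, an input value of 1 with spatial mean 2.5 and contrast_factor 0.2 becomes (1 - 2.5) * 0.2 + 2.5 = 2.2 before clamping.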
diff --git a/tensorflow/core/kernels/adjust_contrast_op_benchmark_test.cc b/tensorflow/core/kernels/adjust_contrast_op_benchmark_test.cc
new file mode 100644
index 0000000000..75b177cf4d
--- /dev/null
+++ b/tensorflow/core/kernels/adjust_contrast_op_benchmark_test.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+namespace tensorflow {
+
+static Graph* BM_AdjustContrast(int batches, int width, int height) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor in(DT_UINT8, TensorShape({batches, width, height, 3}));
+ in.flat<uint8>().setRandom();
+ Tensor factor(DT_FLOAT, TensorShape({}));
+ factor.flat<float>().setConstant(1.2);
+ Tensor min_value(DT_FLOAT, TensorShape({}));
+ min_value.flat<float>().setConstant(7.);
+ Tensor max_value(DT_FLOAT, TensorShape({}));
+ max_value.flat<float>().setConstant(250.);
+
+ Node* ret;
+ NodeBuilder(g->NewName("n"), "AdjustContrast")
+ .Input(test::graph::Constant(g, in))
+ .Input(test::graph::Constant(g, factor))
+ .Input(test::graph::Constant(g, min_value))
+ .Input(test::graph::Constant(g, max_value))
+ .Finalize(g, &ret);
+ return g;
+}
+
+#define BM_AdjustContrastDev(DEVICE, B, W, H) \
+ static void BM_AdjustContrast_##DEVICE##_##B##_##W##_##H(int iters) { \
+ testing::ItemsProcessed(iters* B* W* H * 3); \
+ test::Benchmark(#DEVICE, BM_AdjustContrast(B, W, H)).Run(iters); \
+ } \
+ BENCHMARK(BM_AdjustContrast_##DEVICE##_##B##_##W##_##H);
+
+// Benchmark results as of cl/106323955
+// BM_AdjustContrast_cpu_1_299_299 3416770 22008951 100 11.6M items/s
+
+// BM_AdjustContrast_gpu_32_299_299 37117844 45512374 100 179.8M items/s
+BM_AdjustContrastDev(cpu, 1, 299, 299) BM_AdjustContrastDev(gpu, 32, 299, 299)
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc b/tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc
new file mode 100644
index 0000000000..7a9b0726fd
--- /dev/null
+++ b/tensorflow/core/kernels/adjust_contrast_op_gpu.cu.cc
@@ -0,0 +1,22 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/adjust_contrast_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::AdjustContrast<GPUDevice, uint8>;
+template struct functor::AdjustContrast<GPUDevice, int8>;
+template struct functor::AdjustContrast<GPUDevice, int16>;
+template struct functor::AdjustContrast<GPUDevice, int32>;
+template struct functor::AdjustContrast<GPUDevice, int64>;
+template struct functor::AdjustContrast<GPUDevice, float>;
+template struct functor::AdjustContrast<GPUDevice, double>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/adjust_contrast_op_test.cc b/tensorflow/core/kernels/adjust_contrast_op_test.cc
new file mode 100644
index 0000000000..67891e4fa1
--- /dev/null
+++ b/tensorflow/core/kernels/adjust_contrast_op_test.cc
@@ -0,0 +1,88 @@
+#include "tensorflow/core/framework/allocator.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+class AdjustContrastOpTest : public OpsTestBase {
+ protected:
+ void MakeOp() { RequireDefaultOps(); }
+};
+
+TEST_F(AdjustContrastOpTest, Simple_1113) {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("adjust_contrast_op", "AdjustContrast")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("T", DT_FLOAT)
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ AddInputFromArray<float>(TensorShape({1, 1, 1, 3}), {-1, 2, 3});
+ AddInputFromArray<float>(TensorShape({}), {1.0});
+ AddInputFromArray<float>(TensorShape({}), {0.0});
+ AddInputFromArray<float>(TensorShape({}), {2.0});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 1, 1, 3}));
+ test::FillValues<float>(&expected, {0, 2, 2});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(AdjustContrastOpTest, Simple_1223) {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("adjust_contrast_op", "AdjustContrast")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("T", DT_FLOAT)
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 3}),
+ {1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12});
+ AddInputFromArray<float>(TensorShape({}), {0.2});
+ AddInputFromArray<float>(TensorShape({}), {0.0});
+ AddInputFromArray<float>(TensorShape({}), {10.0});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 2, 2, 3}));
+ test::FillValues<float>(
+ &expected, {2.2, 6.2, 10, 2.4, 6.4, 10, 2.6, 6.6, 10, 2.8, 6.8, 10});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(AdjustContrastOpTest, Big_99x99x3) {
+ EXPECT_OK(NodeDefBuilder("adjust_contrast_op", "AdjustContrast")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("T", DT_FLOAT)
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+
+ std::vector<float> values;
+ for (int i = 0; i < 99 * 99 * 3; ++i) {
+ values.push_back(i % 255);
+ }
+
+ AddInputFromArray<float>(TensorShape({1, 99, 99, 3}), values);
+ AddInputFromArray<float>(TensorShape({}), {0.2});
+ AddInputFromArray<float>(TensorShape({}), {0});
+ AddInputFromArray<float>(TensorShape({}), {255});
+ ASSERT_OK(RunOpKernel());
+}
+
+} // namespace tensorflow
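As a cross-check of the Simple_1223 expectations against the formula in adjust_contrast_op.h: the 1x2x2x3 input holds the spatial values {1, 2, 3, 4}, {5, 6, 7, 8} and {9, 10, 11, 12} for channels 0, 1 and 2, so the per-channel means are 2.5, 6.5 and 10.5. With contrast_factor 0.2 and clamping to [0, 10], the first pixel maps to (1 - 2.5) * 0.2 + 2.5 = 2.2, (5 - 6.5) * 0.2 + 6.5 = 6.2 and (9 - 10.5) * 0.2 + 10.5 = 10.2, clamped to 10, which matches the first three expected outputs.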
diff --git a/tensorflow/core/kernels/aggregate_ops.cc b/tensorflow/core/kernels/aggregate_ops.cc
new file mode 100644
index 0000000000..426e868735
--- /dev/null
+++ b/tensorflow/core/kernels/aggregate_ops.cc
@@ -0,0 +1,238 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/aggregate_ops.h"
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/register_types.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class AddNOp : public OpKernel {
+ public:
+ explicit AddNOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ if (!ctx->ValidateInputsAreSameShape(this)) return;
+
+ const Tensor& input0 = ctx->input(0);
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input0.shape(), &output));
+ auto To = output->flat<T>();
+
+ const int num = ctx->num_inputs();
+ if (num == 1) {
+ *output = input0;
+ return;
+ }
+
+#define I(IDX) ctx->input(IDX).flat<T>()
+
+#if defined(PLATFORM_POSIX_ANDROID) || defined(PLATFORM_GOOGLE_ANDROID)
+ // On Android, we only support additions of two arguments, so we
+ // can reduce the number of template instantiations.
+ OP_REQUIRES(ctx, num == 2,
+ errors::InvalidArgument("Only additions of two arguments "
+ "supported. Num inputs: ",
+ num));
+ functor::Add2Functor<Device, T> functor2;
+ functor2(ctx->template eigen_device<Device>(), To, I(0), I(1));
+#else
+ static const int kWidth = 8;
+ int r = num % kWidth;
+
+ switch (r) {
+ case 2: {
+ functor::Add2Functor<Device, T> functor2;
+ functor2(ctx->template eigen_device<Device>(), To, I(0), I(1));
+ break;
+ }
+ case 3: {
+ functor::Add3Functor<Device, T> functor3;
+ functor3(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2));
+ break;
+ }
+ case 4: {
+ functor::Add4Functor<Device, T> functor4;
+ functor4(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3));
+ break;
+ }
+ case 5: {
+ functor::Add5Functor<Device, T> functor5;
+ functor5(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3), I(4));
+ break;
+ }
+ case 6: {
+ functor::Add6Functor<Device, T> functor6;
+ functor6(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3), I(4), I(5));
+ break;
+ }
+ case 7: {
+ functor::Add7Functor<Device, T> functor7;
+ functor7(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3), I(4), I(5), I(6));
+ break;
+ }
+ case 0: {
+ functor::Add8Functor<Device, T> functor8;
+ functor8(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3), I(4), I(5), I(6), I(7));
+ r = 8;
+ break;
+ }
+ case 1: {
+ functor::Add9Functor<Device, T> functor9;
+ functor9(ctx->template eigen_device<Device>(), To, I(0), I(1), I(2),
+ I(3), I(4), I(5), I(6), I(7), I(8));
+ r = 9;
+ break;
+ }
+ }
+
+ for (; r < num; r += kWidth) {
+ functor::Add8pFunctor<Device, T> functor8p;
+ functor8p(ctx->template eigen_device<Device>(), To, I(r), I(r + 1),
+ I(r + 2), I(r + 3), I(r + 4), I(r + 5), I(r + 6), I(r + 7));
+ }
+#endif // defined(PLATFORM_POSIX_ANDROID) || defined(PLATFORM_GOOGLE_ANDROID)
+
+#undef I
+ }
+};
+
+// Partial specializations for a CPUDevice that use the Eigen implementations
+// (Add*EigenImpl) from aggregate_ops.h.
+namespace functor {
+template <typename T>
+struct Add2Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2) {
+ Add2EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2);
+ }
+};
+template <typename T>
+struct Add3Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3) {
+ Add3EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3);
+ }
+};
+template <typename T>
+struct Add4Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4) {
+ Add4EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4);
+ }
+};
+template <typename T>
+struct Add5Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5) {
+ Add5EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5);
+ }
+};
+template <typename T>
+struct Add6Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6) {
+ Add6EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6);
+ }
+};
+template <typename T>
+struct Add7Functor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7) {
+ Add7EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7);
+ }
+};
+
+template <typename T>
+struct Add8Functor<CPUDevice, T> {
+ void operator()(
+ const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ Add8EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8);
+ }
+};
+
+template <typename T>
+struct Add8pFunctor<CPUDevice, T> {
+ void operator()(
+ const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ Add8pEigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8);
+ }
+};
+
+template <typename T>
+struct Add9Functor<CPUDevice, T> {
+ void operator()(
+ const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8,
+ typename TTypes<T>::ConstFlat in9) {
+ Add9EigenImpl<CPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8, in9);
+ }
+};
+
+} // namespace functor
+
+#define REGISTER_ADDN(type, dev) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AddN").Device(DEVICE_##dev).TypeConstraint<type>("T"), \
+ AddNOp<dev##Device, type>)
+
+#define REGISTER_ADDN_CPU(type) REGISTER_ADDN(type, CPU)
+
+TF_CALL_NUMBER_TYPES(REGISTER_ADDN_CPU);
+#undef REGISTER_ADDN_CPU
+
+#if GOOGLE_CUDA
+REGISTER_ADDN(float, GPU);
+#endif // GOOGLE_CUDA
+
+#undef REGISTER_ADDN
+
+} // namespace tensorflow
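A note on the dispatch above: for num >= 2 (the num == 1 case simply aliases the input to the output), the kernel first evaluates a direct sum of r = num % 8 inputs, treating a remainder of 0 as 8 and a remainder of 1 as 9 so the leading chunk always has at least two operands, and then folds in the remaining inputs eight at a time with Add8p, which accumulates into the output (+=) rather than overwriting it. For example, num = 13 gives r = 5: Add5 sums inputs 0..4 into the output, then a single Add8p pass adds inputs 5..12.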
diff --git a/tensorflow/core/kernels/aggregate_ops.h b/tensorflow/core/kernels/aggregate_ops.h
new file mode 100644
index 0000000000..2214901970
--- /dev/null
+++ b/tensorflow/core/kernels/aggregate_ops.h
@@ -0,0 +1,211 @@
+#ifndef TENSORFLOW_KERNELS_AGGREGATE_OPS_H_
+#define TENSORFLOW_KERNELS_AGGREGATE_OPS_H_
+
+// Functor definitions for Aggregate ops, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+struct Add2Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2);
+};
+
+template <typename Device, typename T>
+struct Add2EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2) {
+ out.device(d) = in1 + in2;
+ }
+};
+
+template <typename Device, typename T>
+struct Add3Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3);
+};
+
+template <typename Device, typename T>
+struct Add3EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3) {
+ out.device(d) = in1 + in2 + in3;
+ }
+};
+
+template <typename Device, typename T>
+struct Add4Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4);
+};
+
+template <typename Device, typename T>
+struct Add4EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4) {
+ out.device(d) = in1 + in2 + in3 + in4;
+ }
+};
+
+template <typename Device, typename T>
+struct Add5Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5);
+};
+
+template <typename Device, typename T>
+struct Add5EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5) {
+ out.device(d) = in1 + in2 + in3 + in4 + in5;
+ }
+};
+
+template <typename Device, typename T>
+struct Add6Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6);
+};
+
+template <typename Device, typename T>
+struct Add6EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6) {
+ out.device(d) = in1 + in2 + in3 + in4 + in5 + in6;
+ }
+};
+
+template <typename Device, typename T>
+struct Add7Functor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7);
+};
+
+template <typename Device, typename T>
+struct Add7EigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7) {
+ out.device(d) = in1 + in2 + in3 + in4 + in5 + in6 + in7;
+ }
+};
+
+template <typename Device, typename T>
+struct Add8Functor {
+ void operator()(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8);
+};
+
+template <typename Device, typename T>
+struct Add8EigenImpl {
+ static void Compute(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ out.device(d) = in1 + in2 + in3 + in4 + in5 + in6 + in7 + in8;
+ }
+};
+
+// Add8p is like Add8 except the underlying implementation should +=
+// rather than assign to the output.
+template <typename Device, typename T>
+struct Add8pFunctor {
+ void operator()(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8);
+};
+
+template <typename Device, typename T>
+struct Add8pEigenImpl {
+ static void Compute(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ out.device(d) += in1 + in2 + in3 + in4 + in5 + in6 + in7 + in8;
+ }
+};
+
+template <typename Device, typename T>
+struct Add9Functor {
+ void operator()(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8,
+ typename TTypes<T>::ConstFlat in9);
+};
+
+template <typename Device, typename T>
+struct Add9EigenImpl {
+ static void Compute(
+ const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8,
+ typename TTypes<T>::ConstFlat in9) {
+ out.device(d) = in1 + in2 + in3 + in4 + in5 + in6 + in7 + in8 + in9;
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_AGGREGATE_OPS_H_
diff --git a/tensorflow/core/kernels/aggregate_ops_gpu.cu.cc b/tensorflow/core/kernels/aggregate_ops_gpu.cu.cc
new file mode 100644
index 0000000000..5cf2934ac1
--- /dev/null
+++ b/tensorflow/core/kernels/aggregate_ops_gpu.cu.cc
@@ -0,0 +1,141 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/aggregate_ops.h"
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Partial specializations for a GPUDevice that use the Eigen implementations
+// (Add*EigenImpl) from aggregate_ops.h.
+namespace functor {
+template <typename T>
+struct Add2Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2) {
+ Add2EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2);
+ }
+};
+
+template <typename T>
+struct Add3Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3) {
+ Add3EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3);
+ }
+};
+
+template <typename T>
+struct Add4Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4) {
+ Add4EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4);
+ }
+};
+
+template <typename T>
+struct Add5Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5) {
+ Add5EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5);
+ }
+};
+
+template <typename T>
+struct Add6Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6) {
+ Add6EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6);
+ }
+};
+
+template <typename T>
+struct Add7Functor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1,
+ typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3,
+ typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5,
+ typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7) {
+ Add7EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7);
+ }
+};
+
+template <typename T>
+struct Add8Functor<GPUDevice, T> {
+ void operator()(
+ const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ Add8EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8);
+ }
+};
+
+template <typename T>
+struct Add8pFunctor<GPUDevice, T> {
+ void operator()(
+ const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8) {
+ Add8pEigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8);
+ }
+};
+
+template <typename T>
+struct Add9Functor<GPUDevice, T> {
+ void operator()(
+ const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstFlat in1, typename TTypes<T>::ConstFlat in2,
+ typename TTypes<T>::ConstFlat in3, typename TTypes<T>::ConstFlat in4,
+ typename TTypes<T>::ConstFlat in5, typename TTypes<T>::ConstFlat in6,
+ typename TTypes<T>::ConstFlat in7, typename TTypes<T>::ConstFlat in8,
+ typename TTypes<T>::ConstFlat in9) {
+ Add9EigenImpl<GPUDevice, T>::Compute(d, out, in1, in2, in3, in4, in5, in6,
+ in7, in8, in9);
+ }
+};
+
+} // end namespace functor
+
+// Instantiate the GPU implementation for float.
+template struct functor::Add2Functor<GPUDevice, float>;
+template struct functor::Add3Functor<GPUDevice, float>;
+template struct functor::Add4Functor<GPUDevice, float>;
+template struct functor::Add5Functor<GPUDevice, float>;
+template struct functor::Add6Functor<GPUDevice, float>;
+template struct functor::Add7Functor<GPUDevice, float>;
+template struct functor::Add8Functor<GPUDevice, float>;
+template struct functor::Add8pFunctor<GPUDevice, float>;
+template struct functor::Add9Functor<GPUDevice, float>;
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/argmax_op.cc b/tensorflow/core/kernels/argmax_op.cc
new file mode 100644
index 0000000000..0845eebf09
--- /dev/null
+++ b/tensorflow/core/kernels/argmax_op.cc
@@ -0,0 +1,163 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#if GOOGLE_CUDA
+#define EIGEN_USE_GPU
+#endif // GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/argmax_op.h"
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T, typename ArgFunctor>
+class ArgOp : public OpKernel {
+ public:
+ explicit ArgOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& dimension = context->input(1);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(dimension.shape()),
+ errors::InvalidArgument(
+ "dim must be a scalar, but received tensor of shape: ",
+ dimension.shape().DebugString()));
+
+ const int32 dim = dimension.scalar<int32>()();
+ const int input_dims = input.dims();
+
+ OP_REQUIRES(context, dim >= 0, errors::InvalidArgument("dim must be >= 0"));
+ OP_REQUIRES(context, dim < input_dims,
+ errors::InvalidArgument("Minimum tensor rank: ", dim + 1,
+ " but got: ", input_dims));
+
+ TensorShape output_shape;
+ TensorShape input_shape = input.shape();
+ for (int d = 0; d < input_dims - 1; ++d) {
+ output_shape.AddDim(input_shape.dim_size((d < dim) ? d : d + 1));
+ }
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+#define HANDLE_DIM(NDIM) \
+ case NDIM: \
+ ArgFunctor::Reduce##NDIM(context->eigen_device<Device>(), \
+ input.tensor<T, NDIM>(), dim, \
+ output->tensor<int64, NDIM - 1>()); \
+ break;
+
+ switch (input_dims) {
+ HANDLE_DIM(1);
+ HANDLE_DIM(2);
+ HANDLE_DIM(3);
+ HANDLE_DIM(4);
+ HANDLE_DIM(5);
+
+ default:
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument(
+ "ArgOp : Unhandled input dimensions: ", input_dims));
+ }
+ }
+#undef HANDLE_DIM
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(ArgOp);
+};
+
+template <typename Device, typename T>
+class ArgMaxOp : public ArgOp<Device, T, functor::ArgMax<Device, T> > {
+ public:
+ explicit ArgMaxOp(OpKernelConstruction* context)
+ : ArgOp<Device, T, functor::ArgMax<Device, T> >(context) {}
+};
+
+template <typename Device, typename T>
+class ArgMinOp : public ArgOp<Device, T, functor::ArgMin<Device, T> > {
+ public:
+ explicit ArgMinOp(OpKernelConstruction* context)
+ : ArgOp<Device, T, functor::ArgMin<Device, T> >(context) {}
+};
+
+#define REGISTER_ARGMAX(type) \
+ REGISTER_KERNEL_BUILDER(Name("ArgMax") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("dimension"), \
+ ArgMaxOp<CPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER(Name("ArgMin") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("dimension"), \
+ ArgMinOp<CPUDevice, type>);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_ARGMAX);
+
+#if GOOGLE_CUDA
+
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+
+#define DECLARE_GPU_SPEC(T, Dims) \
+ template <> \
+ void ArgMax<GPUDevice, T>::Reduce##Dims( \
+ const GPUDevice& d, typename TTypes<T, Dims>::ConstTensor input, \
+ const int32 dimension, typename TTypes<int64, Dims - 1>::Tensor output); \
+ template <> \
+ void ArgMin<GPUDevice, T>::Reduce##Dims( \
+ const GPUDevice& d, typename TTypes<T, Dims>::ConstTensor input, \
+ const int32 dimension, typename TTypes<int64, Dims - 1>::Tensor output);
+
+#define DECLARE_GPU_SPECS(T) \
+ DECLARE_GPU_SPEC(T, 1); \
+ DECLARE_GPU_SPEC(T, 2); \
+ DECLARE_GPU_SPEC(T, 3); \
+ DECLARE_GPU_SPEC(T, 4); \
+ DECLARE_GPU_SPEC(T, 5);
+
+#define DECLARE_GPU_CLASS(T) \
+ extern template struct ArgMax<GPUDevice, T>; \
+ extern template struct ArgMin<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_CLASS);
+
+#undef DECLARE_GPU_SPECS
+#undef DECLARE_GPU_CLASS
+
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_ARGMAX_GPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("ArgMax") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("dimension"), \
+ ArgMaxOp<GPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER(Name("ArgMin") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("dimension"), \
+ ArgMinOp<GPUDevice, type>);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_ARGMAX_GPU);
+
+#undef REGISTER_ARGMAX_GPU
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/argmax_op.h b/tensorflow/core/kernels/argmax_op.h
new file mode 100644
index 0000000000..41734f3254
--- /dev/null
+++ b/tensorflow/core/kernels/argmax_op.h
@@ -0,0 +1,55 @@
+#ifndef TENSORFLOW_KERNELS_ARGMAX_OP_H_
+#define TENSORFLOW_KERNELS_ARGMAX_OP_H_
+// Generator definition for ArgMaxOp, must be compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace functor {
+
+template <typename Device, typename T>
+struct ArgMax {
+#define DECLARE_COMPUTE_SPEC(Dims) \
+ EIGEN_ALWAYS_INLINE static void Reduce##Dims( \
+ const Device& d, typename TTypes<T, Dims>::ConstTensor input, \
+ const int32 dimension, \
+ typename TTypes<int64, Dims - 1>::Tensor output) { \
+ output.device(d) = input.argmax(dimension).template cast<int64>(); \
+ }
+
+ DECLARE_COMPUTE_SPEC(1);
+ DECLARE_COMPUTE_SPEC(2);
+ DECLARE_COMPUTE_SPEC(3);
+ DECLARE_COMPUTE_SPEC(4);
+ DECLARE_COMPUTE_SPEC(5);
+
+#undef DECLARE_COMPUTE_SPEC
+};
+
+template <typename Device, typename T>
+struct ArgMin {
+#define DECLARE_COMPUTE_SPEC(Dims) \
+ EIGEN_ALWAYS_INLINE static void Reduce##Dims( \
+ const Device& d, typename TTypes<T, Dims>::ConstTensor input, \
+ const int32 dimension, \
+ typename TTypes<int64, Dims - 1>::Tensor output) { \
+ output.device(d) = input.argmin(dimension).template cast<int64>(); \
+ }
+
+ DECLARE_COMPUTE_SPEC(1);
+ DECLARE_COMPUTE_SPEC(2);
+ DECLARE_COMPUTE_SPEC(3);
+ DECLARE_COMPUTE_SPEC(4);
+ DECLARE_COMPUTE_SPEC(5);
+
+#undef DECLARE_COMPUTE_SPEC
+};
+
+} // namespace functor
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_ARGMAX_OP_H_
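For orientation, a small example of the reduction these generators implement: for a rank-2 input [[1, 5, 3], [4, 2, 6]] with dimension = 1, ArgMax yields the int64 tensor [1, 2] and ArgMin yields [0, 1]. The output always has rank Dims - 1; the kernel in argmax_op.cc builds the output shape by skipping the reduced dimension.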
diff --git a/tensorflow/core/kernels/argmax_op_gpu.cu.cc b/tensorflow/core/kernels/argmax_op_gpu.cu.cc
new file mode 100644
index 0000000000..6c91fc2c86
--- /dev/null
+++ b/tensorflow/core/kernels/argmax_op_gpu.cu.cc
@@ -0,0 +1,20 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/argmax_op.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_SPEC(T) \
+ template struct functor::ArgMax<GPUDevice, T>; \
+ template struct functor::ArgMin<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPEC);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/assign_op.h b/tensorflow/core/kernels/assign_op.h
new file mode 100644
index 0000000000..3306f1eeaa
--- /dev/null
+++ b/tensorflow/core/kernels/assign_op.h
@@ -0,0 +1,92 @@
+#ifndef TENSORFLOW_KERNELS_ASSIGN_OP_H_
+#define TENSORFLOW_KERNELS_ASSIGN_OP_H_
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+// TODO(jeff): Get rid of use_exclusive_lock_ option
+
+// Computes *input[0] = input[1]
+class AssignOp : public OpKernel {
+ public:
+ explicit AssignOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context,
+ context->GetAttr("use_locking", &use_exclusive_lock_));
+ OP_REQUIRES_OK(context,
+ context->GetAttr("validate_shape", &validate_shape_));
+ OP_REQUIRES(context, IsRefType(context->input_type(0)),
+ errors::InvalidArgument("lhs input needs to be a ref type"));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ Tensor rhs = context->input(1);
+
+ // We always return the input ref.
+ context->forward_ref_input_to_ref_output(0, 0);
+
+ // If the left-hand side is not initialized, or the shape of the
+ // right-hand side differs from that of the left-hand side, we need
+ // to allocate a new tensor.
+ {
+ mutex_lock l(*context->input_ref_mutex(0));
+
+ Tensor old_lhs = context->mutable_input(0, true);
+
+ if (validate_shape_) {
+ OP_REQUIRES(
+ context, old_lhs.shape().IsSameSize(rhs.shape()),
+ errors::InvalidArgument(
+ "Assign requires shapes of both tensors to match. lhs shape= ",
+ old_lhs.shape().ShortDebugString(), " rhs shape= ",
+ rhs.shape().ShortDebugString()));
+ }
+
+ const bool same_shape = old_lhs.shape().IsSameSize(rhs.shape());
+ if (!old_lhs.IsInitialized() || !same_shape) {
+ // Create a new tensor whose shape matches the right-hand side, copy
+ // the right-hand side into it, then hand it off to the lhs.
+ // We can't always know how this value will be used downstream,
+ // so make conservative assumptions in specifying the memory
+ // allocation attributes.
+ AllocatorAttributes attr;
+ attr.set_gpu_compatible(true);
+ PersistentTensor copy;
+ Tensor* copyTensor = nullptr;
+ OP_REQUIRES_OK(
+ context, context->allocate_persistent(old_lhs.dtype(), rhs.shape(),
+ &copy, &copyTensor, attr));
+ Copy(context, copyTensor, rhs);
+ context->replace_ref_input(0, *copyTensor, true);
+ return;
+ }
+
+ // The tensor has already been initialized and the right hand side
+ // matches the left hand side's shape.
+ if (use_exclusive_lock_) {
+ Copy(context, &old_lhs, rhs);
+ return;
+ }
+ }
+
+ // The tensor has already been initialized and the right hand side
+ // matches the left hand side's shape. We have been told to do the
+ // copy outside the lock.
+ Tensor old_unlocked_lhs = context->mutable_input(0, false);
+ Copy(context, &old_unlocked_lhs, rhs);
+ }
+
+ virtual void Copy(OpKernelContext* context, Tensor* lhs,
+ const Tensor& rhs) = 0;
+
+ bool use_exclusive_lock_;
+ bool validate_shape_;
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_ASSIGN_OP_H_
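Summarizing the locking behavior above: the (re)allocation path, taken when the variable is uninitialized or its shape differs from the right-hand side, always runs under the variable's ref mutex. When the variable is already initialized with a matching shape, the copy runs under that mutex only if use_locking is set; otherwise the lock is released first and the copy proceeds unlocked, so concurrent readers may observe a partially updated value.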
diff --git a/tensorflow/core/kernels/attention_ops.cc b/tensorflow/core/kernels/attention_ops.cc
new file mode 100644
index 0000000000..28763f65a4
--- /dev/null
+++ b/tensorflow/core/kernels/attention_ops.cc
@@ -0,0 +1,92 @@
+// See docs in ../ops/attention_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+
+namespace tensorflow {
+
+class ExtractGlimpseOp : public OpKernel {
+ public:
+ explicit ExtractGlimpseOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("normalized", &normalized_));
+ OP_REQUIRES_OK(context, context->GetAttr("centered", &centered_));
+ OP_REQUIRES_OK(context, context->GetAttr("uniform_noise", &uniform_noise_));
+ }
+
+ // Expect input tensor of rank 4 with dimensions (batch_size, height, width,
+ // depth).
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const TensorShape input_shape = input.shape();
+ const int32 num_dims = input_shape.dims();
+ OP_REQUIRES(
+ context, num_dims == 4,
+ errors::InvalidArgument(
+ "input must be 4-dimensional (batch_size, height, width, depth)",
+ input_shape.ShortDebugString()));
+
+ const int64 batch_size = input_shape.dim_size(0);
+
+ const Tensor& window_size = context->input(1);
+ OP_REQUIRES(context, (window_size.shape().dims() == 1) &&
+ window_size.shape().dim_size(0) == 2,
+ errors::InvalidArgument(
+ "window_size must be a vector of size 2 (height, width)",
+ window_size.shape().ShortDebugString()));
+
+ const int64 output_height = window_size.tensor<int, 1>()(0);
+ const int64 output_width = window_size.tensor<int, 1>()(1);
+ TensorShape output_shape = input_shape;
+ output_shape.set_dim(1, output_height);
+ output_shape.set_dim(2, output_width);
+
+ const Tensor& offsets = context->input(2);
+ OP_REQUIRES(context, offsets.shape().dims() == 2,
+ errors::InvalidArgument("offsets must be a matrix",
+ offsets.shape().ShortDebugString()));
+ OP_REQUIRES(context, offsets.shape().dim_size(0) == batch_size,
+ errors::InvalidArgument("first dimension should be batch",
+ offsets.shape().ShortDebugString()));
+ OP_REQUIRES(
+ context, offsets.shape().dim_size(1) == 2,
+ errors::InvalidArgument("second dimension should be of size 2 (y,x)",
+ offsets.shape().ShortDebugString()));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+ std::vector<Eigen::IndexPair<float> > offset_vec;
+ offset_vec.reserve(batch_size);
+ for (int i = 0; i < batch_size; ++i) {
+ float offset_y = offsets.tensor<float, 2>()(i, 0);
+ float offset_x = offsets.tensor<float, 2>()(i, 1);
+ // Eigen::ExtractGlimpses expects offsets as (x,y), whereas the
+ // calling TensorFlow operates with (y,x) as indices.
+ offset_vec.push_back(Eigen::IndexPair<float>(offset_x, offset_y));
+ }
+
+ output->tensor<float, 4>().swap_layout().device(
+ context->eigen_cpu_device()) =
+ Eigen::ExtractGlimpses(input.tensor<float, 4>().swap_layout(),
+ output_width, output_height, offset_vec,
+ normalized_, centered_, uniform_noise_);
+ }
+
+ private:
+ bool normalized_;
+ bool centered_;
+ bool uniform_noise_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("ExtractGlimpse").Device(DEVICE_CPU),
+ ExtractGlimpseOp);
+
+} // end namespace tensorflow
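In short: the op takes a (batch, height, width, depth) float tensor, a 2-element window_size of (glimpse_height, glimpse_width) and a (batch, 2) offsets tensor of (y, x) positions, and produces a (batch, glimpse_height, glimpse_width, depth) tensor of glimpses, converting each offset to Eigen's (x, y) order before calling Eigen::ExtractGlimpses.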
diff --git a/tensorflow/core/kernels/avgpooling_op.cc b/tensorflow/core/kernels/avgpooling_op.cc
new file mode 100644
index 0000000000..26f98ffbcd
--- /dev/null
+++ b/tensorflow/core/kernels/avgpooling_op.cc
@@ -0,0 +1,418 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/avgpooling_op.h"
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/pooling_ops_common.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/kernels/maxpooling_op_gpu.h"
+#include "tensorflow/core/kernels/pooling_ops_common_gpu.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class AvgPoolingOp : public UnaryOp<T> {
+ public:
+ explicit AvgPoolingOp(OpKernelConstruction* context) : UnaryOp<T>(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window stride field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+ OP_REQUIRES(context, params.depth_window == 1,
+ errors::Unimplemented(
+ "Non-spatial pooling is not "
+ "yet supported. Volunteers? :)"));
+
+ // For avgpooling, tensor_in should have 4 dimensions.
+ OP_REQUIRES(context, tensor_in.dims() == 4,
+ errors::InvalidArgument("tensor_in must be 4-dimensional"));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, params.forward_output_shape(), &output));
+
+ if (std::is_same<Device, GPUDevice>::value) {
+ Eigen::PaddingType pt = BrainPadding2EigenPadding(padding_);
+ functor::SpatialAvgPooling<Device, T>()(
+ context->eigen_device<Device>(), output->tensor<T, 4>(),
+ tensor_in.tensor<T, 4>(), params.window_rows, params.window_cols,
+ params.row_stride, params.col_stride, pt);
+ } else {
+ SpatialAvgPool<Device, T>(context, output, tensor_in, params, padding_);
+ }
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("AvgPool")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ AvgPoolingOp<CPUDevice, float>);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void SpatialAvgPooling<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::Tensor output, \
+ typename TTypes<T, 4>::ConstTensor input, int window_rows, \
+ int window_cols, int row_stride, int col_stride, \
+ const Eigen::PaddingType& padding); \
+ extern template struct SpatialAvgPooling<GPUDevice, T>;
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNEL_BUILDER(Name("AvgPool")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T"),
+ AvgPoolingOp<GPUDevice, float>);
+#endif // GOOGLE_CUDA
+
+// The operation to compute AvgPool gradients.
+// It takes two inputs:
+// - The original input tensor shape
+// - Backprop tensor for output
+// It produces one output: backprop tensor for input.
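+//
+// The produced backprop tensor has the shape described by the first input,
+// i.e. the shape of the original (pre-pooling) input to AvgPool.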
+template <typename Device, class T>
+class AvgPoolingGradOp : public OpKernel {
+ public:
+ explicit AvgPoolingGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in_shape = context->input(0);
+ const Tensor& out_backprop = context->input(1);
+ // For avgpooling, tensor_in_shape should have 1 dimension, and 4 elements.
+ OP_REQUIRES(context, tensor_in_shape.dims() == 1 &&
+ tensor_in_shape.NumElements() == 4,
+                errors::InvalidArgument(
+                    "orig_input_shape must be 1-dimensional and have 4 "
+                    "elements"));
+ // For avgpooling, out_backprop should have 4 dimensions.
+ OP_REQUIRES(context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional"));
+ const int64 out_backprop_batch = out_backprop.dim_size(0);
+ const int64 out_backprop_rows = out_backprop.dim_size(1);
+ const int64 out_backprop_cols = out_backprop.dim_size(2);
+ const int64 out_backprop_depth = out_backprop.dim_size(3);
+
+ TensorShape output_shape;
+ auto shape_vec = tensor_in_shape.vec<int32>();
+ for (int64 i = 0; i < tensor_in_shape.NumElements(); ++i) {
+ output_shape.AddDim(shape_vec(i));
+ }
+ const int64 in_rows = output_shape.dim_size(1);
+ const int64 in_cols = output_shape.dim_size(2);
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ output->flat<T>().setZero();
+
+ const int window_rows = ksize_[1];
+ const int window_cols = ksize_[2];
+ const int depth_window = ksize_[3];
+
+ const int row_stride = stride_[1];
+ const int col_stride = stride_[2];
+
+ // We (will) use different code for spatial pooling and
+ // non-spatial pooling.
+ //
+ // Spatial pooling is when depth_window = 1
+ OP_REQUIRES(context, depth_window == 1,
+ errors::Unimplemented(
+ "Non-spatial pooling is not "
+ "yet supported. Volunteers? :)"));
+
+ int out_height, out_width, pad_rows, pad_cols;
+ OP_REQUIRES_OK(
+ context, Get2dOutputSize(in_rows, in_cols, window_rows, window_cols,
+ row_stride, col_stride, padding_, &out_height,
+ &out_width, &pad_rows, &pad_cols));
+
+ const T* out_backprop_ptr = out_backprop.flat<T>().data();
+ T* input_backprop_ptr = output->flat<T>().data();
+
+ for (int64 b = 0; b < out_backprop_batch; ++b) {
+ for (int64 r = 0; r < out_backprop_rows; ++r) {
+ // Calculates row broadcast size. For SAME padding, current
+ // index could be in the padding area, and r*row_stride +
+ // window_rows could be beyond the input tensor's boundary. In
+ // such cases, change the starting index and reduce the
+ // broadcast size.
+ int rindex, rsize;
+ OP_REQUIRES_OK(context,
+ GetBroadcastSize(r, in_rows, window_rows, row_stride,
+ pad_rows, &rindex, &rsize));
+ for (int64 c = 0; c < out_backprop_cols; ++c) {
+ // Calculates col broadcast size. For SAME padding, current
+ // index could be in the padding area, and c*col_stride +
+ // window_cols could be beyond the input tensor's boundary. In
+ // such cases, change the starting index and reduce the
+ // broadcast size.
+ int cindex, csize;
+ OP_REQUIRES_OK(context,
+ GetBroadcastSize(c, in_cols, window_cols, col_stride,
+ pad_cols, &cindex, &csize));
+
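+          // Each out_backprop element was averaged over an rsize x csize
+          // window, so every input cell in that window receives an equal
+          // 1 / (rsize * csize) share of the backpropagated value.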
+ T divide_coeff = 1.0 / (rsize * csize);
+ int64 output_index =
+ (b * out_backprop_rows + r) * out_backprop_cols + c;
+ for (int64 r_dst = rindex; r_dst < rindex + rsize; ++r_dst) {
+ for (int64 c_dst = cindex; c_dst < cindex + csize; ++c_dst) {
+ int64 input_index = (b * in_rows + r_dst) * in_cols + c_dst;
+ const T* output_offset =
+ out_backprop_ptr + output_index * out_backprop_depth;
+ T* input_offset =
+ input_backprop_ptr + input_index * out_backprop_depth;
+ for (int64 d = 0; d < out_backprop_depth; ++d) {
+ *input_offset += *output_offset * divide_coeff;
+ ++output_offset;
+ ++input_offset;
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("AvgPoolGrad")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("orig_input_shape"),
+ AvgPoolingGradOp<CPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("AvgPoolGrad")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T")
+ .HostMemory("orig_input_shape"),
+ AvgPoolingGradOp<CPUDevice, double>);
+
+#if GOOGLE_CUDA
+
+// A CUDNN-based AvgPoolingGrad implementation. It includes the padded region
+// among the candidates for the pooling operation.
+template <class T>
+class AvgPoolingGradOp<GPUDevice, T> : public OpKernel {
+ public:
+ typedef GPUDevice Device;
+
+ explicit AvgPoolingGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument("Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in_shape = context->input(0);
+ const Tensor& out_backprop = context->input(1);
+ // For avgpooling, tensor_in_shape should have 1 dimension, and 4 elements.
+ OP_REQUIRES(
+ context,
+ tensor_in_shape.dims() == 1 && tensor_in_shape.NumElements() == 4,
+        errors::InvalidArgument("orig_input_shape must be 1-dimensional and "
+                                "have 4 elements"));
+ // For avgpooling, out_backprop should have 4 dimensions.
+ OP_REQUIRES(context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional"));
+
+ TensorShape output_shape;
+ auto shape_vec = tensor_in_shape.vec<int32>();
+ for (int64 i = 0; i < tensor_in_shape.NumElements(); ++i) {
+ output_shape.AddDim(shape_vec(i));
+ }
+
+ DnnPoolingGradOp<T>::Compute(
+ context, perftools::gputools::dnn::PoolingMode::kAverage, ksize_,
+ stride_, padding_, nullptr, nullptr, out_backprop, output_shape);
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("AvgPoolGrad")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("orig_input_shape")
+ .Label("cudnn"),
+ AvgPoolingGradOp<GPUDevice, float>);
+
+// An AvgPoolingGrad implementation based on a custom GPU kernel. It includes
+// the padded region among the candidates for the pooling operation.
+template <class T>
+class AvgPoolingGradOpCustomGPUKernel : public OpKernel {
+ public:
+ typedef GPUDevice Device;
+
+ explicit AvgPoolingGradOpCustomGPUKernel(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument("Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in_shape = context->input(0);
+ const Tensor& out_backprop = context->input(1);
+ // For avgpooling, tensor_in_shape should have 1 dimension, and 4 elements.
+ OP_REQUIRES(
+ context,
+ tensor_in_shape.dims() == 1 && tensor_in_shape.NumElements() == 4,
+        errors::InvalidArgument("orig_input_shape must be 1-dimensional and "
+                                "have 4 elements"));
+ // For avgpooling, out_backprop should have 4 dimensions.
+ OP_REQUIRES(context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional"));
+ const int64 out_backprop_batch = out_backprop.dim_size(0);
+ const int64 out_backprop_rows = out_backprop.dim_size(1);
+ const int64 out_backprop_cols = out_backprop.dim_size(2);
+ const int64 out_backprop_depth = out_backprop.dim_size(3);
+
+ TensorShape output_shape;
+ auto shape_vec = tensor_in_shape.vec<int32>();
+ for (int64 i = 0; i < tensor_in_shape.NumElements(); ++i) {
+ output_shape.AddDim(shape_vec(i));
+ }
+ const int64 in_rows = output_shape.dim_size(1);
+ const int64 in_cols = output_shape.dim_size(2);
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+ const int window_rows = ksize_[1];
+ const int window_cols = ksize_[2];
+ const int depth_window = ksize_[3];
+
+ const int row_stride = stride_[1];
+ const int col_stride = stride_[2];
+
+ // We (will) use different code for spatial pooling and
+ // non-spatial pooling.
+ //
+ // Spatial pooling is when depth_window = 1
+ OP_REQUIRES(context, depth_window == 1,
+ errors::Unimplemented("Non-spatial pooling is not "
+ "yet supported. Volunteers? :)"));
+
+ int out_height, out_width, pad_rows, pad_cols;
+ OP_REQUIRES_OK(
+ context, Get2dOutputSize(in_rows, in_cols, window_rows, window_cols,
+ row_stride, col_stride, padding_, &out_height,
+ &out_width, &pad_rows, &pad_cols));
+
+ RunAvePoolBackwardNHWC<T>(out_backprop.flat<T>().data(), // top_diff
+ out_backprop_batch, // num
+ in_rows, // height
+ in_cols, // width
+ out_backprop_depth, // channels
+ out_backprop_rows, // pooled_height
+ out_backprop_cols, // pooled_width
+ window_rows, // kernel_h
+ window_cols, // kernel_w
+ row_stride, // stride_h
+ col_stride, // stride_w
+ pad_rows, // pad_t
+ pad_cols, // pad_l
+ output->flat<T>().data(), // bottom_diff
+ context->eigen_gpu_device()); // d
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("AvgPoolGrad")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("orig_input_shape"),
+ AvgPoolingGradOpCustomGPUKernel<float>);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/avgpooling_op.h b/tensorflow/core/kernels/avgpooling_op.h
new file mode 100644
index 0000000000..38f0eb97e5
--- /dev/null
+++ b/tensorflow/core/kernels/avgpooling_op.h
@@ -0,0 +1,58 @@
+#ifndef TENSORFLOW_KERNELS_AVGPOOLING_OP_H_
+#define TENSORFLOW_KERNELS_AVGPOOLING_OP_H_
+// Functor definition for AvgPoolingOp, must be compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+struct SpatialAvgPooling {
+ void operator()(const Device& d, typename TTypes<T, 4>::Tensor output,
+ typename TTypes<T, 4>::ConstTensor input, int window_rows,
+ int window_cols, int row_stride, int col_stride,
+ const Eigen::PaddingType& padding) {
+ // Because we swap the layout, we swap the row/cols as well
+ output.swap_layout().device(d) =
+ Eigen::SpatialAvgPooling(input.swap_layout(), window_cols, window_rows,
+ col_stride, row_stride, padding);
+ }
+};
+
+} // namespace functor
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Launches a custom GPU kernel (contributed by Yanqing) for the avgpooling
+// backward operation that works on NHWC data formats.
+// Arguments:
+// top_diff: backprop to the output of the pooling layer
+// num: number of input batches
+// height: input height
+// width: input width
+// channels: number of input channels
+// pooled_height: the height of the output to the pooling layer
+// pooled_width: the width of the output to the pooling layer
+// kernel_h: the height of the pooling kernel
+// kernel_w: the width of the pooling kernel
+//   stride_h: the vertical stride
+//   stride_w: the horizontal stride
+// pad_t: padding size to the top side
+// pad_l: padding size to the left side
+// bottom_diff: backprop to the input of the pooling layer.
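+//
+// Returns whether the GPU device reports the kernel launch as successful.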
+template <typename T>
+bool RunAvePoolBackwardNHWC(const T* const top_diff, const int num,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t,
+ const int pad_l, T* const bottom_diff,
+ const GPUDevice& d);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_AVGPOOLING_OP_H_
diff --git a/tensorflow/core/kernels/avgpooling_op_gpu.cu.cc b/tensorflow/core/kernels/avgpooling_op_gpu.cu.cc
new file mode 100644
index 0000000000..ec84ee6862
--- /dev/null
+++ b/tensorflow/core/kernels/avgpooling_op_gpu.cu.cc
@@ -0,0 +1,101 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+#include <iostream>
+
+#include "tensorflow/core/kernels/avgpooling_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::SpatialAvgPooling<GPUDevice, T>;
+
+DEFINE_GPU_KERNELS(float)
+
+#undef DEFINE_GPU_KERNELS
+
+#define CUDA_1D_KERNEL_LOOP(i, n) \
+ for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n); \
+ i += blockDim.x * gridDim.x)
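+// CUDA_1D_KERNEL_LOOP is a grid-stride loop: each thread starts at its global
+// thread index and advances by the total number of launched threads, so a
+// fixed-size grid can cover an arbitrary number of elements.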
+
+static const int CAFFE_CUDA_NUM_THREADS = 1024;
+
+template <typename dtype>
+__global__ void AvePoolBackwardNHWC(const int nthreads,
+ const dtype* const top_diff, const int num,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t,
+ const int pad_l, dtype* const bottom_diff) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) {
+ // find out the local index
+ // find out the local offset
+ const int c = index % channels;
+ const int w = index / channels % width + pad_l;
+ const int h = (index / channels / width) % height + pad_t;
+ const int n = index / channels / width / height;
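+    // In a flattened NHWC index, channels vary fastest, then width, then
+    // height, then batch; adding pad_l/pad_t shifts (w, h) into the padded
+    // coordinate space used by the pooling windows.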
+ const int phstart = (h < kernel_h) ? 0 : (h - kernel_h) / stride_h + 1;
+ const int phend = min(h / stride_h + 1, pooled_height);
+ const int pwstart = (w < kernel_w) ? 0 : (w - kernel_w) / stride_w + 1;
+ const int pwend = min(w / stride_w + 1, pooled_width);
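+    // [phstart, phend) x [pwstart, pwend) spans every pooled output cell
+    // whose window can include position (h, w).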
+ dtype gradient = 0;
+ const dtype* const top_diff_slice =
+ top_diff + n * pooled_height * pooled_width * channels + c;
+ for (int ph = phstart; ph < phend; ++ph) {
+ for (int pw = pwstart; pw < pwend; ++pw) {
+ // figure out the pooling size
+ int hstart = ph * stride_h - pad_t;
+ int wstart = pw * stride_w - pad_l;
+ int hend = min(hstart + kernel_h, height);
+ int wend = min(wstart + kernel_w, width);
+ hstart = max(hstart, 0);
+ wstart = max(wstart, 0);
+ int pool_size = (hend - hstart) * (wend - wstart);
+ gradient +=
+ top_diff_slice[(ph * pooled_width + pw) * channels] / pool_size;
+ }
+ }
+ bottom_diff[index] = gradient;
+ }
+}
+
+template <typename T>
+bool RunAvePoolBackwardNHWC(const T* const top_diff, const int num,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t,
+ const int pad_l, T* const bottom_diff,
+ const GPUDevice& d) {
+ int x_size = num * height * width * channels;
+ int thread_per_block =
+ std::min(CAFFE_CUDA_NUM_THREADS, d.maxCudaThreadsPerMultiProcessor());
+ int block_count = (x_size + thread_per_block - 1) / thread_per_block;
+ AvePoolBackwardNHWC<T><<<block_count, thread_per_block, 0, d.stream()>>>(
+ x_size, top_diff, num, height, width, channels, pooled_height,
+      pooled_width, kernel_h, kernel_w, stride_h, stride_w, pad_t, pad_l,
+ bottom_diff);
+
+ return d.ok();
+}
+
+template bool RunAvePoolBackwardNHWC(
+ const float* const top_diff, const int num, const int height,
+ const int width, const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h, const int kernel_w,
+ const int stride_h, const int stride_w, const int pad_t, const int pad_l,
+ float* const bottom_diff, const GPUDevice& d);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/batch_matmul_op.cc b/tensorflow/core/kernels/batch_matmul_op.cc
new file mode 100644
index 0000000000..349aac0158
--- /dev/null
+++ b/tensorflow/core/kernels/batch_matmul_op.cc
@@ -0,0 +1,260 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/fill_functor.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename Scalar>
+struct LaunchBatchMatMul;
+
+template <typename Scalar>
+struct LaunchBatchMatMul<CPUDevice, Scalar> {
+ static void Launch(OpKernelContext* context, const Tensor& in_x,
+ const Tensor& in_y, bool adj_x, bool adj_y, Tensor* out) {
+ auto Tx = in_x.tensor<Scalar, 3>();
+ auto Ty = in_y.tensor<Scalar, 3>();
+ auto Tz = out->tensor<Scalar, 3>();
+
+    // Shards the batch_size independent matmuls across the worker threads.
+    // Each shard handles a contiguous [start, limit) range of batch entries.
+ auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
+ const int64 num_units = in_x.dim_size(0);
+ const int64 cost_per_unit =
+ in_x.dim_size(0) * in_x.dim_size(1) * out->dim_size(2);
+ Shard(worker_threads.num_threads, worker_threads.workers, num_units,
+ cost_per_unit, [&Tx, &Ty, adj_x, adj_y, &Tz](int start, int limit) {
+ LaunchBatchMatMul<CPUDevice, Scalar>::Run(Tx, Ty, adj_x, adj_y, Tz,
+ start, limit);
+ });
+ }
+
+ template <typename In, typename Out>
+ static void Run(In Tx, In Ty, bool adj_x, bool adj_y, Out Tz, int start,
+ int limit) {
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1> contract_pairs;
+
+ Eigen::internal::scalar_conjugate_op<Scalar> conj;
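+    // The contraction pair selects which dimensions are summed over:
+    // (1, 0) is a plain matmul x * y, (1, 1) is x * y^T, (0, 0) is x^T * y,
+    // and (0, 1) is x^T * y^T; together with conj this realizes the four
+    // adjoint combinations below.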
+ if (!adj_x && !adj_y) {
+ for (int i = start; i < limit; ++i) {
+ auto x = Tx.template chip<0>(i);
+ auto y = Ty.template chip<0>(i);
+ auto z = Tz.template chip<0>(i);
+ contract_pairs[0] = Eigen::IndexPair<Eigen::DenseIndex>(1, 0);
+ z = x.contract(y, contract_pairs); // matmul
+ }
+ } else if (!adj_x && adj_y) {
+ for (int i = start; i < limit; ++i) {
+ auto x = Tx.template chip<0>(i);
+ auto y = Ty.template chip<0>(i).unaryExpr(conj);
+ auto z = Tz.template chip<0>(i);
+ contract_pairs[0] = Eigen::IndexPair<Eigen::DenseIndex>(1, 1);
+ z = x.contract(y, contract_pairs); // matmul
+ }
+ } else if (adj_x && !adj_y) {
+ for (int i = start; i < limit; ++i) {
+ auto x = Tx.template chip<0>(i).unaryExpr(conj);
+ auto y = Ty.template chip<0>(i);
+ auto z = Tz.template chip<0>(i);
+ contract_pairs[0] = Eigen::IndexPair<Eigen::DenseIndex>(0, 0);
+ z = x.contract(y, contract_pairs); // matmul
+ }
+ } else {
+ for (int i = start; i < limit; ++i) {
+ auto x = Tx.template chip<0>(i).unaryExpr(conj);
+ auto y = Ty.template chip<0>(i).unaryExpr(conj);
+ auto z = Tz.template chip<0>(i);
+ contract_pairs[0] = Eigen::IndexPair<Eigen::DenseIndex>(0, 1);
+ z = x.contract(y, contract_pairs); // matmul
+ }
+ }
+ }
+};
+
+#if GOOGLE_CUDA
+
+namespace {
+template <typename T>
+perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
+ perftools::gputools::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory));
+ perftools::gputools::DeviceMemory<T> typed(wrapped);
+ return typed;
+}
+} // namespace
+
+template <typename Scalar>
+struct LaunchBatchMatMul<GPUDevice, Scalar> {
+ static void Launch(OpKernelContext* context, const Tensor& in_x,
+ const Tensor& in_y, bool adj_x, bool adj_y, Tensor* out) {
+ perftools::gputools::blas::Transpose trans[] = {
+ perftools::gputools::blas::Transpose::kNoTranspose,
+ perftools::gputools::blas::Transpose::kTranspose};
+ const uint64 m = in_x.dim_size(adj_x ? 2 : 1);
+ const uint64 k = in_x.dim_size(adj_x ? 1 : 2);
+ const uint64 n = in_y.dim_size(adj_y ? 1 : 2);
+ const uint64 batch_size = in_x.dim_size(0);
+ auto blas_transpose_a = trans[adj_x];
+ auto blas_transpose_b = trans[adj_y];
+
+ auto* stream = context->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(context, stream, errors::Internal("No GPU stream available."));
+
+ typedef perftools::gputools::DeviceMemory<Scalar> DeviceMemoryType;
+ std::vector<DeviceMemoryType> a_device_memory;
+ std::vector<DeviceMemoryType> b_device_memory;
+ std::vector<DeviceMemoryType> c_device_memory;
+ std::vector<DeviceMemoryType*> a_ptrs;
+ std::vector<DeviceMemoryType*> b_ptrs;
+ std::vector<DeviceMemoryType*> c_ptrs;
+ a_device_memory.reserve(batch_size);
+ b_device_memory.reserve(batch_size);
+ c_device_memory.reserve(batch_size);
+ a_ptrs.reserve(batch_size);
+ b_ptrs.reserve(batch_size);
+ c_ptrs.reserve(batch_size);
+ auto* a_base_ptr = in_x.template flat<Scalar>().data();
+ auto* b_base_ptr = in_y.template flat<Scalar>().data();
+ auto* c_base_ptr = out->template flat<Scalar>().data();
+ for (int64 i = 0; i < batch_size; ++i) {
+ a_device_memory.push_back(AsDeviceMemory(a_base_ptr + i * m * k));
+ b_device_memory.push_back(AsDeviceMemory(b_base_ptr + i * k * n));
+ c_device_memory.push_back(AsDeviceMemory(c_base_ptr + i * m * n));
+ a_ptrs.push_back(&a_device_memory.back());
+ b_ptrs.push_back(&b_device_memory.back());
+ c_ptrs.push_back(&c_device_memory.back());
+ }
+
+    // Cublas computes
+    //   C = A x B
+    // where A, B and C are assumed to be in column-major order.
+    // We want the output in row-major order, so we instead compute
+    //   C' = B' x A'   (' denotes transpose),
+    // whose column-major layout matches row-major C.
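+    // Hence the call below passes B first (leading dimension adj_y ? k : n),
+    // then A (leading dimension adj_x ? m : k), and writes C with leading
+    // dimension n.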
+ bool blas_launch_status =
+ stream->ThenBlasGemmBatched(blas_transpose_b, blas_transpose_a, n, m, k,
+ static_cast<Scalar>(1.0), b_ptrs,
+ adj_y ? k : n, a_ptrs, adj_x ? m : k,
+ static_cast<Scalar>(0.0), c_ptrs, n,
+ batch_size)
+ .ok();
+ if (!blas_launch_status) {
+ context->SetStatus(errors::Internal(
+ "Blas SGEMMBatched launch failed : a.shape=",
+ in_x.shape().DebugString(), ", b.shape=", in_y.shape().DebugString(),
+ ", m=", m, ", n=", n, ", k=", k, ", batch_size=", batch_size));
+ }
+ }
+};
+
+#endif // GOOGLE_CUDA
+
+template <typename Device, typename Scalar>
+class BatchMatMul : public OpKernel {
+ public:
+ explicit BatchMatMul(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("adj_x", &adj_x_));
+ OP_REQUIRES_OK(context, context->GetAttr("adj_y", &adj_y_));
+ }
+
+ virtual ~BatchMatMul() {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& in0 = ctx->input(0);
+ const Tensor& in1 = ctx->input(1);
+ OP_REQUIRES(ctx, in0.dims() == in1.dims(),
+                errors::InvalidArgument("In[0] and In[1] have different ndims: ",
+ in0.shape().ShortDebugString(), " vs. ",
+ in1.shape().ShortDebugString()));
+ const int ndims = in0.dims();
+ OP_REQUIRES(
+ ctx, ndims >= 3,
+ errors::InvalidArgument("In[0] and In[1] ndims must be >= 3: ", ndims));
+ TensorShape out_shape;
+ for (int i = 0; i < ndims - 2; ++i) {
+ OP_REQUIRES(ctx, in0.dim_size(i) == in1.dim_size(i),
+ errors::InvalidArgument("In[0].dim(", i, ") and In[1].dim(",
+ i, ") must be the same: ",
+ in0.shape().DebugString(), " vs ",
+ in1.shape().DebugString()));
+ out_shape.AddDim(in0.dim_size(i));
+ }
+ auto n = out_shape.num_elements();
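+    // All leading dimensions are folded into a single batch dimension of size
+    // n, so the launcher below only sees rank-3 tensors [n, rows, cols].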
+ auto d0 = in0.dim_size(ndims - 2);
+ auto d1 = in0.dim_size(ndims - 1);
+ Tensor in0_reshaped;
+ CHECK(in0_reshaped.CopyFrom(in0, TensorShape({n, d0, d1})));
+ auto d2 = in1.dim_size(ndims - 2);
+ auto d3 = in1.dim_size(ndims - 1);
+ Tensor in1_reshaped;
+ CHECK(in1_reshaped.CopyFrom(in1, TensorShape({n, d2, d3})));
+ if (adj_x_) std::swap(d0, d1);
+ if (adj_y_) std::swap(d2, d3);
+ OP_REQUIRES(ctx, d1 == d2,
+ errors::InvalidArgument(
+ "In[0] mismatch In[1] shape: ", d1, " vs. ", d2, ": ",
+ in0.shape().ShortDebugString(), " ",
+ in1.shape().ShortDebugString(), " ", adj_x_, " ", adj_y_));
+ out_shape.AddDim(d0);
+ out_shape.AddDim(d3);
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, out_shape, &out));
+ if (out->NumElements() == 0) {
+ return;
+ }
+ if (in0.NumElements() == 0 || in1.NumElements() == 0) {
+ functor::SetZeroFunctor<Device, Scalar> f;
+ f(ctx->eigen_device<Device>(), out->flat<Scalar>());
+ return;
+ }
+ Tensor out_reshaped;
+ CHECK(out_reshaped.CopyFrom(*out, TensorShape({n, d0, d3})));
+ LaunchBatchMatMul<Device, Scalar>::Launch(ctx, in0_reshaped, in1_reshaped,
+ adj_x_, adj_y_, &out_reshaped);
+ }
+
+ private:
+ bool adj_x_;
+ bool adj_y_;
+};
+
+#define REGISTER_CPU(TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("BatchMatMul").Device(DEVICE_CPU).TypeConstraint<TYPE>("T"), \
+ BatchMatMul<CPUDevice, TYPE>)
+
+#define REGISTER_GPU(TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("BatchMatMul").Device(DEVICE_GPU).TypeConstraint<TYPE>("T"), \
+ BatchMatMul<GPUDevice, TYPE>)
+
+REGISTER_CPU(float);
+REGISTER_CPU(double);
+REGISTER_CPU(int32);
+REGISTER_CPU(complex64);
+
+#ifdef GOOGLE_CUDA
+// TODO(kalakris): The GPU implementation is currently disabled due to issues
+// encountered in practice. See b/24534272.
+// REGISTER_GPU(float);
+#endif // GOOGLE_CUDA
+
+#undef REGISTER_CPU
+#undef REGISTER_GPU
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/batch_norm_op.cc b/tensorflow/core/kernels/batch_norm_op.cc
new file mode 100644
index 0000000000..c67c921631
--- /dev/null
+++ b/tensorflow/core/kernels/batch_norm_op.cc
@@ -0,0 +1,223 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/batch_norm_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class BatchNormOp : public OpKernel {
+ public:
+ explicit BatchNormOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context,
+ context->GetAttr("variance_epsilon", &variance_epsilon_));
+ OP_REQUIRES_OK(context, context->GetAttr("scale_after_normalization",
+ &scale_after_normalization_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& mean = context->input(1);
+ const Tensor& var = context->input(2);
+ const Tensor& beta = context->input(3);
+ const Tensor& gamma = context->input(4);
+
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ OP_REQUIRES(context, mean.dims() == 1,
+ errors::InvalidArgument("mean must be 1-dimensional",
+ mean.shape().ShortDebugString()));
+ OP_REQUIRES(context, var.dims() == 1,
+ errors::InvalidArgument("var must be 1-dimensional",
+ var.shape().ShortDebugString()));
+ OP_REQUIRES(context, beta.dims() == 1,
+ errors::InvalidArgument("beta must be 1-dimensional",
+ beta.shape().ShortDebugString()));
+ OP_REQUIRES(context, gamma.dims() == 1,
+ errors::InvalidArgument("gamma must be 1-dimensional",
+ gamma.shape().ShortDebugString()));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+
+ functor::BatchNorm<Device, T>()(
+ context->eigen_device<Device>(), input.tensor<T, 4>(), mean.vec<T>(),
+ var.vec<T>(), beta.vec<T>(), gamma.vec<T>(), variance_epsilon_,
+ scale_after_normalization_, output->tensor<T, 4>());
+ }
+
+ private:
+ float variance_epsilon_;
+ bool scale_after_normalization_;
+};
+
+template <typename Device, typename T>
+class BatchNormGradOp : public OpKernel {
+ public:
+ explicit BatchNormGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context,
+ context->GetAttr("variance_epsilon", &variance_epsilon_));
+ OP_REQUIRES_OK(context, context->GetAttr("scale_after_normalization",
+ &scale_after_normalization_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& mean = context->input(1);
+ const Tensor& var = context->input(2);
+ const Tensor& gamma = context->input(3);
+ const Tensor& out_backprop = context->input(4);
+
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ OP_REQUIRES(context, mean.dims() == 1,
+ errors::InvalidArgument("mean must be 1-dimensional",
+ mean.shape().ShortDebugString()));
+ OP_REQUIRES(context, var.dims() == 1,
+ errors::InvalidArgument("var must be 1-dimensional",
+ var.shape().ShortDebugString()));
+ OP_REQUIRES(context, gamma.dims() == 1,
+ errors::InvalidArgument("gamma must be 1-dimensional",
+ gamma.shape().ShortDebugString()));
+ OP_REQUIRES(
+ context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional",
+ out_backprop.shape().ShortDebugString()));
+
+ Tensor* dx = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, input.shape(), &dx));
+ Tensor* dm = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(1, mean.shape(), &dm));
+ Tensor* dv = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(2, var.shape(), &dv));
+ Tensor* db = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(3, mean.shape(), &db));
+ Tensor* dg = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(4, gamma.shape(), &dg));
+
+ // Scratch buffer of [depth] dimension, aka the 4th dimension of input,
+ // which is dim_size(3), for calculating various combinations of
+ // (var + epsilon).
+ Tensor scratch1;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({input.dim_size(3)}), &scratch1));
+
+ // Scratch buffer of [depth] dimension for saving intermediate calculation
+ // values.
+ Tensor scratch2;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({input.dim_size(3)}), &scratch2));
+
+ functor::BatchNormGrad<Device, T>()(
+ context->eigen_device<Device>(), input.tensor<T, 4>(), mean.vec<T>(),
+ var.vec<T>(), gamma.vec<T>(), out_backprop.tensor<T, 4>(),
+ variance_epsilon_, scale_after_normalization_, dx->tensor<T, 4>(),
+ dm->vec<T>(), dv->vec<T>(), db->vec<T>(), dg->vec<T>(),
+ scratch1.vec<T>(), scratch2.vec<T>());
+ }
+
+ private:
+ float variance_epsilon_;
+ bool scale_after_normalization_;
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("BatchNormWithGlobalNormalization") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T"), \
+ BatchNormOp<CPUDevice, T>);
+
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void BatchNorm<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor input, \
+ typename TTypes<T>::ConstVec mean, typename TTypes<T>::ConstVec var, \
+ typename TTypes<T>::ConstVec beta, typename TTypes<T>::ConstVec gamma, \
+ float variance_epsilon, bool scale_after_normalization, \
+ typename TTypes<T, 4>::Tensor output); \
+ extern template struct BatchNorm<GPUDevice, T>;
+
+#define DECLARE_GPU_SPECS(T) DECLARE_GPU_SPEC(T);
+
+DECLARE_GPU_SPECS(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("BatchNormWithGlobalNormalization") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T"), \
+ BatchNormOp<GPUDevice, T>);
+
+REGISTER_GPU_KERNEL(float);
+#undef REGISTER_GPU_KERNEL
+
+#endif // GOOGLE_CUDA
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("BatchNormWithGlobalNormalizationGrad") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T"), \
+ BatchNormGradOp<CPUDevice, T>);
+
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void BatchNormGrad<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor input, \
+ typename TTypes<T>::ConstVec mean, typename TTypes<T>::ConstVec var, \
+ typename TTypes<T>::ConstVec gamma, \
+ typename TTypes<T, 4>::ConstTensor out_backprop, float variance_epsilon, \
+ bool scale_after_normalization, typename TTypes<T, 4>::Tensor dx, \
+ typename TTypes<T>::Vec dm, typename TTypes<T>::Vec dv, \
+ typename TTypes<T>::Vec db, typename TTypes<T>::Vec dg, \
+ typename TTypes<T>::Vec scratch1, typename TTypes<T>::Vec scratch2); \
+ extern template struct BatchNormGrad<GPUDevice, T>;
+
+#define DECLARE_GPU_SPECS(T) DECLARE_GPU_SPEC(T);
+
+DECLARE_GPU_SPECS(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("BatchNormWithGlobalNormalizationGrad") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T"), \
+ BatchNormGradOp<GPUDevice, T>);
+
+REGISTER_GPU_KERNEL(float);
+#undef REGISTER_GPU_KERNEL
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/batch_norm_op.h b/tensorflow/core/kernels/batch_norm_op.h
new file mode 100644
index 0000000000..5981e58460
--- /dev/null
+++ b/tensorflow/core/kernels/batch_norm_op.h
@@ -0,0 +1,133 @@
+#ifndef TENSORFLOW_KERNELS_BATCH_NORM_OP_H_
+#define TENSORFLOW_KERNELS_BATCH_NORM_OP_H_
+// Functor definition for BatchNormOp, must be compilable by nvcc.
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by BatchNormOp to do the computations.
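+// With scale_after_normalization this computes
+//   output = gamma * (input - mean) * rsqrt(var + variance_epsilon) + beta
+// and without it the gamma factor is omitted.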
+template <typename Device, typename T>
+struct BatchNorm {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<T>::ConstVec mean,
+ typename TTypes<T>::ConstVec var,
+ typename TTypes<T>::ConstVec beta,
+ typename TTypes<T>::ConstVec gamma, float variance_epsilon,
+ bool scale_after_normalization,
+ typename TTypes<T, 4>::Tensor output) {
+ const int depth = mean.dimension(0);
+ const int rest_size = input.size() / depth;
+
+ Eigen::DSizes<int, 2> rest_by_depth(rest_size, depth);
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::DSizes<int, 2> rest_by_one(rest_size, 1);
+ Eigen::DSizes<int, 2> one_by_depth(1, depth);
+ Eigen::DSizes<int, 2> depth_by_one(depth, 1);
+#else
+ Eigen::IndexList<int, Eigen::type2index<1> > rest_by_one;
+ rest_by_one.set(0, rest_size);
+ Eigen::IndexList<Eigen::type2index<1>, int> one_by_depth;
+ one_by_depth.set(1, depth);
+ Eigen::IndexList<int, Eigen::type2index<1> > depth_by_one;
+ depth_by_one.set(0, depth);
+#endif
+ if (scale_after_normalization) {
+ output.reshape(rest_by_depth).device(d) =
+ (input.reshape(rest_by_depth) -
+ mean.reshape(one_by_depth).broadcast(rest_by_one)) *
+ ((var + var.constant(variance_epsilon)).rsqrt() * gamma)
+ .eval()
+ .reshape(one_by_depth)
+ .broadcast(rest_by_one) +
+ beta.reshape(one_by_depth).broadcast(rest_by_one);
+ } else {
+ output.reshape(rest_by_depth).device(d) =
+ (input.reshape(rest_by_depth) -
+ mean.reshape(one_by_depth).broadcast(rest_by_one)) *
+ ((var + var.constant(variance_epsilon)).rsqrt())
+ .eval()
+ .reshape(one_by_depth)
+ .broadcast(rest_by_one) +
+ beta.reshape(one_by_depth).broadcast(rest_by_one);
+ }
+ }
+};
+
+template <typename Device, typename T>
+struct BatchNormGrad {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<T>::ConstVec mean,
+ typename TTypes<T>::ConstVec var,
+ typename TTypes<T>::ConstVec gamma,
+ typename TTypes<T, 4>::ConstTensor out_backprop,
+ float variance_epsilon, bool scale_after_normalization,
+ typename TTypes<T, 4>::Tensor dx, typename TTypes<T>::Vec dm,
+ typename TTypes<T>::Vec dv, typename TTypes<T>::Vec db,
+ typename TTypes<T>::Vec dg, typename TTypes<T>::Vec scratch1,
+ typename TTypes<T>::Vec scratch2) {
+ const int depth = mean.dimension(0);
+ const int rest_size = input.size() / depth;
+
+ typedef typename TTypes<T>::ConstVec::Index Index;
+ Eigen::DSizes<Index, 2> rest_by_depth(rest_size, depth);
+ Eigen::DSizes<Index, 2> rest_by_one(rest_size, 1);
+ Eigen::DSizes<Index, 2> one_by_depth(1, depth);
+
+ // db = out_backprop
+ //
+ // dg = out_backprop * ((x - m) * rsqrt(v + epsilon))
+ //
+ // dv = sum_over_rest(out_backprop * gamma * (x - m)) *
+ // (-1/2) * (v + epsilon) ^ (-3/2)
+ //
+ // dm = sum_over_rest(out_backprop * gamma) * (-1 / rsqrt(v + epsilon))
+ //
+ // dx = out_backprop * (gamma * rsqrt(v + epsilon))
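+    //
+    // Mean and var arrive as separate inputs to this op, so dx does not chain
+    // through them; their gradients are returned as the separate outputs dm
+    // and dv.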
+ Eigen::array<Index, 1> reduction_axis;
+ reduction_axis[0] = 0; // Reduces on first dimension.
+
+ db.device(d) = out_backprop.reshape(rest_by_depth).sum(reduction_axis);
+
+ // scratch1 = rsqrt(v + epsilon)
+ scratch1.device(d) = (var + var.constant(variance_epsilon)).rsqrt();
+
+ // scratch2 = sum_over_rest(out_backprop * (x - m))
+ scratch2.device(d) = (out_backprop.reshape(rest_by_depth) *
+ (input.reshape(rest_by_depth) -
+ mean.reshape(one_by_depth).broadcast(rest_by_one)))
+ .sum(reduction_axis);
+
+ if (scale_after_normalization) {
+ dx.reshape(rest_by_depth).device(d) =
+ out_backprop.reshape(rest_by_depth) * ((scratch1 * gamma)
+ .eval()
+ .reshape(one_by_depth)
+ .broadcast(rest_by_one));
+ dm.device(d) = -db * (scratch1 * gamma).eval();
+ dg.device(d) = scratch2 * scratch1;
+ } else {
+ dx.reshape(rest_by_depth).device(d) =
+ out_backprop.reshape(rest_by_depth) *
+ scratch1.reshape(one_by_depth).broadcast(rest_by_one);
+ dm.device(d) = -db * scratch1;
+ dg.device(d) = dg.constant(static_cast<T>(0.0)); // Gamma is not learned.
+ }
+
+ // scratch1 = - 1/2 * (var + epsilon) ^ (-3/2)
+ scratch1.device(d) = scratch1 * scratch1.constant(static_cast<T>(-0.5f)) /
+ (var + var.constant(variance_epsilon));
+
+ if (scale_after_normalization) {
+ dv.device(d) = scratch2 * (scratch1 * gamma).eval();
+ } else {
+ dv.device(d) = scratch2 * scratch1;
+ }
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_BATCH_NORM_OP_H_
diff --git a/tensorflow/core/kernels/batch_norm_op_gpu.cu.cc b/tensorflow/core/kernels/batch_norm_op_gpu.cu.cc
new file mode 100644
index 0000000000..02e0eeecfa
--- /dev/null
+++ b/tensorflow/core/kernels/batch_norm_op_gpu.cu.cc
@@ -0,0 +1,17 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/batch_norm_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::BatchNorm<GPUDevice, float>;
+template struct functor::BatchNormGrad<GPUDevice, float>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/bcast_ops.cc b/tensorflow/core/kernels/bcast_ops.cc
new file mode 100644
index 0000000000..bb1492e5b4
--- /dev/null
+++ b/tensorflow/core/kernels/bcast_ops.cc
@@ -0,0 +1,71 @@
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/util/bcast.h"
+
+namespace tensorflow {
+
+// Given shapes of two tensors, computes the reduction indices for the
+// gradient computation.
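+//
+// For example, for shapes s0 = [2, 3, 5] and s1 = [1, 3, 1], r0 is empty and
+// r1 = [0, 2]: the axes along which the gradient flowing into the second
+// operand must be summed to undo the broadcast.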
+//
+// TODO(zhifengc):
+// 1. Adds support for n-ary (n >= 2).
+class BCastGradArgsOp : public OpKernel {
+ public:
+ explicit BCastGradArgsOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(
+ ctx, ctx->MatchSignature({DT_INT32, DT_INT32}, {DT_INT32, DT_INT32}));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ OP_REQUIRES(
+ ctx, ctx->num_inputs() == 2,
+ errors::Unimplemented("Broadcast for n-ary operations (n > 2)"));
+ gtl::InlinedVector<BCast::Vec, 4> shapes;
+ for (int i = 0; i < ctx->num_inputs(); ++i) {
+ const Tensor& in = ctx->input(i);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(in.shape()),
+ errors::InvalidArgument("In[", i, "] must be a vector.",
+ in.shape().ShortDebugString()));
+ BCast::Vec vec;
+ for (int64 i = 0; i < in.NumElements(); ++i) {
+ vec.push_back(in.vec<int32>()(i));
+ }
+ shapes.push_back(vec);
+ }
+ BCast bcast(shapes[0], shapes[1]);
+ OP_REQUIRES(ctx, bcast.IsValid(),
+ errors::InvalidArgument(
+ "Incompatible shapes: [", str_util::Join(shapes[0], ","),
+ "] vs. [", str_util::Join(shapes[1], ","), "]"));
+ Output(ctx, 0, bcast.grad_x_reduce_idx());
+ Output(ctx, 1, bcast.grad_y_reduce_idx());
+ }
+
+ private:
+ void Output(OpKernelContext* ctx, int idx, const BCast::Vec& v) {
+ const int len = v.size();
+ Tensor* o = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(idx, TensorShape({len}), &o));
+ for (int i = 0; i < len; ++i) o->flat<int32>()(i) = v[i];
+ }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(BCastGradArgsOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("BroadcastGradientArgs")
+ .Device(DEVICE_CPU)
+ .HostMemory("s0")
+ .HostMemory("s1")
+ .HostMemory("r0")
+ .HostMemory("r1"),
+ BCastGradArgsOp);
+REGISTER_KERNEL_BUILDER(Name("BroadcastGradientArgs")
+ .Device(DEVICE_GPU)
+ .HostMemory("s0")
+ .HostMemory("s1")
+ .HostMemory("r0")
+ .HostMemory("r1"),
+ BCastGradArgsOp);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/bias_op.cc b/tensorflow/core/kernels/bias_op.cc
new file mode 100644
index 0000000000..68737f6c2d
--- /dev/null
+++ b/tensorflow/core/kernels/bias_op.cc
@@ -0,0 +1,112 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/bias_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class BiasOp : public BinaryOp<T> {
+ public:
+ explicit BiasOp(OpKernelConstruction* context) : BinaryOp<T>(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& bias = context->input(1);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrixOrHigher(input.shape()),
+ errors::InvalidArgument("Input tensor must be at least 2D: ",
+ input.shape().DebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(bias.shape()),
+ errors::InvalidArgument("Biases must be 1D: ",
+ bias.shape().DebugString()));
+ const auto last_dim = input.shape().dims() - 1;
+ OP_REQUIRES(
+ context, bias.shape().dim_size(0) == input.shape().dim_size(last_dim),
+ errors::InvalidArgument(
+ "Must provide as many biases as the last dimension "
+ "of the input tensor: ",
+ bias.shape().DebugString(), " vs. ", input.shape().DebugString()));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+
+ switch (input.shape().dims()) {
+ case 2:
+ Compute<2>(context, input, bias, output);
+ break;
+ case 3:
+ Compute<3>(context, input, bias, output);
+ break;
+ case 4:
+ Compute<4>(context, input, bias, output);
+ break;
+ case 5:
+ Compute<5>(context, input, bias, output);
+ break;
+ default:
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument("Only ranks up to 5 supported: ",
+ input.shape().DebugString()));
+ }
+ }
+
+  // Add biases to an input tensor of rank Dims, using the Bias functor.
+ template <int Dims>
+ void Compute(OpKernelContext* ctx, const Tensor& input, const Tensor& bias,
+ Tensor* output) {
+ functor::Bias<Device, T, Dims> functor;
+ functor(ctx->eigen_device<Device>(), input.tensor<T, Dims>(), bias.vec<T>(),
+ output->tensor<T, Dims>());
+ }
+};
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("BiasAdd").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ BiasOp<CPUDevice, type>);
+
+TF_CALL_NUMBER_TYPES(REGISTER_KERNEL);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T, Dims) \
+ template <> \
+ void Bias<GPUDevice, T, Dims>::operator()( \
+ const GPUDevice& d, typename TTypes<T, Dims>::ConstTensor input, \
+ typename TTypes<T>::ConstVec bias, \
+ typename TTypes<T, Dims>::Tensor output); \
+ extern template struct Bias<GPUDevice, T, Dims>;
+
+#define DECLARE_GPU_SPECS(T) \
+ DECLARE_GPU_SPEC(T, 2); \
+ DECLARE_GPU_SPEC(T, 3); \
+ DECLARE_GPU_SPEC(T, 4); \
+ DECLARE_GPU_SPEC(T, 5);
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("BiasAdd").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ BiasOp<GPUDevice, type>);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/bias_op.h b/tensorflow/core/kernels/bias_op.h
new file mode 100644
index 0000000000..513406d251
--- /dev/null
+++ b/tensorflow/core/kernels/bias_op.h
@@ -0,0 +1,41 @@
+#ifndef TENSORFLOW_KERNELS_BIAS_OP_H_
+#define TENSORFLOW_KERNELS_BIAS_OP_H_
+// Functor definition for BiasOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by BiasOp to do the computations.
+template <typename Device, typename T, int Dims>
+struct Bias {
+ // Add "bias" to "input", broadcasting it on all dimensions but the last one.
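+  // For a rank-2 input this is output(i, j) = input(i, j) + bias(j); higher
+  // ranks behave the same along the last dimension.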
+ void operator()(const Device& d, typename TTypes<T, Dims>::ConstTensor input,
+ typename TTypes<T>::ConstVec bias,
+ typename TTypes<T, Dims>::Tensor output) {
+ const int bias_size = bias.dimension(0);
+ const int rest_size = input.size() / bias_size;
+
+ Eigen::DSizes<int, 2> rest_by_bias(rest_size, bias_size);
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::DSizes<int, 2> rest_by_one(rest_size, 1);
+ Eigen::DSizes<int, 2> one_by_bias(1, bias_size);
+#else
+ Eigen::IndexList<int, Eigen::type2index<1> > rest_by_one;
+ rest_by_one.set(0, rest_size);
+ Eigen::IndexList<Eigen::type2index<1>, int> one_by_bias;
+ one_by_bias.set(1, bias_size);
+#endif
+
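+    // Folding all leading dimensions into one "rest" dimension turns the
+    // broadcast into a simple 2-D [rest, bias_size] + [1, bias_size] add.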
+ output.reshape(rest_by_bias).device(d) =
+ input.reshape(rest_by_bias) +
+ bias.reshape(one_by_bias).broadcast(rest_by_one);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_BIAS_OP_H_
diff --git a/tensorflow/core/kernels/bias_op_gpu.cu.cc b/tensorflow/core/kernels/bias_op_gpu.cu.cc
new file mode 100644
index 0000000000..d3377b3ce8
--- /dev/null
+++ b/tensorflow/core/kernels/bias_op_gpu.cu.cc
@@ -0,0 +1,23 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/bias_op.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Definition of the GPU implementations declared in bias_op.cc.
+#define DEFINE_GPU_SPECS(T) \
+ template struct functor::Bias<GPUDevice, T, 2>; \
+ template struct functor::Bias<GPUDevice, T, 3>; \
+ template struct functor::Bias<GPUDevice, T, 4>; \
+ template struct functor::Bias<GPUDevice, T, 5>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/candidate_sampler_ops.cc b/tensorflow/core/kernels/candidate_sampler_ops.cc
new file mode 100644
index 0000000000..cd5fde37a6
--- /dev/null
+++ b/tensorflow/core/kernels/candidate_sampler_ops.cc
@@ -0,0 +1,243 @@
+// See docs in ../ops/candidate_sampling_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include <cfloat>
+#include <unordered_map>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/range_sampler.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/guarded_philox_random.h"
+
+namespace tensorflow {
+
+class BaseCandidateSamplerOp : public OpKernel {
+ public:
+ explicit BaseCandidateSamplerOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("num_sampled", &num_sampled_));
+ OP_REQUIRES_OK(context, context->GetAttr("num_true", &num_true_));
+ OP_REQUIRES_OK(context, context->GetAttr("unique", &unique_));
+ OP_REQUIRES_OK(context, generator_.Init(context));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& true_classes = context->input(0);
+ OP_REQUIRES(context, true_classes.dims() == 2,
+ errors::InvalidArgument("true_classes must be a matrix"));
+ const int32 batch_size = true_classes.dim_size(0);
+ OP_REQUIRES(context, true_classes.dim_size(1) == num_true_,
+ errors::InvalidArgument("true_classes must have "
+ "num_true columns"));
+
+ // Output candidates and expected_count.
+ Tensor* out_sampled_candidates = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({num_sampled_}),
+ &out_sampled_candidates));
+
+ Tensor* out_true_expected_count = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 1, TensorShape({batch_size, num_true_}),
+ &out_true_expected_count));
+ Tensor* out_sampled_expected_count = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(2, TensorShape({num_sampled_}),
+ &out_sampled_expected_count));
+
+ gtl::ArraySlice<int64> true_candidate(true_classes.matrix<int64>().data(),
+ batch_size * num_true_);
+ gtl::MutableArraySlice<int64> sampled_candidate(
+ out_sampled_candidates->vec<int64>().data(), num_sampled_);
+ gtl::MutableArraySlice<float> true_expected_count(
+ out_true_expected_count->matrix<float>().data(),
+ batch_size * num_true_);
+ gtl::MutableArraySlice<float> sampled_expected_count(
+ out_sampled_expected_count->vec<float>().data(), num_sampled_);
+
+ CHECK(sampler_) << "CandidateSamplerOp did not set sampler_";
+
+    // Make a rough, conservative estimate of the number of samples required.
+ // In cases where rejection sampling is used we may occasionally use more
+ // samples than expected, which will result in reused random bits.
+ const int64 samples32 = 2048 * num_sampled_;
+
+ // Pick sampled candidates.
+ auto local_gen = generator_.ReserveSamples32(samples32);
+ random::SimplePhilox random(&local_gen);
+ sampler_->SampleBatchGetExpectedCount(&random, unique_, &sampled_candidate,
+ &sampled_expected_count,
+ true_candidate, &true_expected_count);
+
+ if (sampler_->NeedsUpdates()) {
+ sampler_->Update(true_candidate);
+ }
+ }
+
+ protected:
+ void set_sampler(RangeSampler* sampler) { sampler_.reset(sampler); }
+
+ private:
+ int32 num_true_;
+ int32 num_sampled_;
+ bool unique_;
+ std::unique_ptr<RangeSampler> sampler_;
+ GuardedPhiloxRandom generator_;
+};
+
+template <class RangeSamplerType>
+class SimpleCandidateSamplerOp : public BaseCandidateSamplerOp {
+ public:
+ explicit SimpleCandidateSamplerOp(OpKernelConstruction* context)
+ : BaseCandidateSamplerOp(context) {
+ int64 range_max;
+ OP_REQUIRES_OK(context, context->GetAttr("range_max", &range_max));
+ set_sampler(new RangeSamplerType(range_max));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("UniformCandidateSampler").Device(DEVICE_CPU),
+ SimpleCandidateSamplerOp<UniformSampler>);
+
+REGISTER_KERNEL_BUILDER(Name("LogUniformCandidateSampler").Device(DEVICE_CPU),
+ SimpleCandidateSamplerOp<LogUniformSampler>);
+
+REGISTER_KERNEL_BUILDER(Name("LearnedUnigramCandidateSampler")
+ .Device(DEVICE_CPU),
+ SimpleCandidateSamplerOp<UnigramSampler>);
+
+REGISTER_KERNEL_BUILDER(Name("ThreadUnsafeUnigramCandidateSampler")
+ .Device(DEVICE_CPU),
+ SimpleCandidateSamplerOp<ThreadUnsafeUnigramSampler>);
+
+class AllCandidateSamplerOp : public BaseCandidateSamplerOp {
+ public:
+ explicit AllCandidateSamplerOp(OpKernelConstruction* context)
+ : BaseCandidateSamplerOp(context) {
+ int64 range_max;
+ OP_REQUIRES_OK(context, context->GetAttr("num_sampled", &range_max));
+ set_sampler(new AllSampler(range_max));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("AllCandidateSampler").Device(DEVICE_CPU),
+ AllCandidateSamplerOp);
+
+class FixedUnigramCandidateSamplerOp : public BaseCandidateSamplerOp {
+ public:
+ explicit FixedUnigramCandidateSamplerOp(OpKernelConstruction* context)
+ : BaseCandidateSamplerOp(context) {
+ int64 range_max;
+ OP_REQUIRES_OK(context, context->GetAttr("range_max", &range_max));
+ string vocab_file;
+ OP_REQUIRES_OK(context, context->GetAttr("vocab_file", &vocab_file));
+ std::vector<float> unigrams;
+ OP_REQUIRES_OK(context, context->GetAttr("unigrams", &unigrams));
+ OP_REQUIRES(
+ context, !vocab_file.empty() || !unigrams.empty(),
+ errors::InvalidArgument("Must provide either vocab_file or unigrams."));
+ OP_REQUIRES(context, vocab_file.empty() || unigrams.empty(),
+ errors::InvalidArgument(
+ "Must only provide one of vocab_file and unigrams."));
+ float distortion;
+ OP_REQUIRES_OK(context, context->GetAttr("distortion", &distortion));
+ int64 num_reserved_ids;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("num_reserved_ids", &num_reserved_ids));
+ int64 num_shards;
+ OP_REQUIRES_OK(context, context->GetAttr("num_shards", &num_shards));
+ int64 shard;
+ OP_REQUIRES_OK(context, context->GetAttr("shard", &shard));
+
+ if (!vocab_file.empty()) {
+ set_sampler(new FixedUnigramSampler(context->env(), range_max, vocab_file,
+ distortion, num_reserved_ids,
+ num_shards, shard));
+ } else {
+ set_sampler(new FixedUnigramSampler(range_max, unigrams, distortion,
+ num_reserved_ids, num_shards, shard));
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("FixedUnigramCandidateSampler").Device(DEVICE_CPU),
+ FixedUnigramCandidateSamplerOp);
+
+class ComputeAccidentalHitsOp : public OpKernel {
+ public:
+ explicit ComputeAccidentalHitsOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("num_true", &num_true_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& in_true_candidates = context->input(0);
+ TensorShape in_true_candidates_shape = in_true_candidates.shape();
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrix(in_true_candidates_shape) &&
+ in_true_candidates_shape.dim_size(1) == num_true_,
+ errors::InvalidArgument(
+ "true_candidates must be a batch_size * num_true matrix"));
+
+ const int64 batch_size = in_true_candidates_shape.dim_size(0);
+
+ const Tensor& in_sampled_candidates = context->input(1);
+ OP_REQUIRES(context,
+ TensorShapeUtils::IsVector(in_sampled_candidates.shape()),
+ errors::InvalidArgument(
+ "sampled_candidates must be a vector, which is typically "
+ "an output from CandidateSampler"));
+
+ std::unordered_map<int64, int> sampled_candidate_to_pos;
+ for (int64 i = 0; i < in_sampled_candidates.dim_size(0); ++i) {
+ sampled_candidate_to_pos[in_sampled_candidates.vec<int64>()(i)] = i;
+ }
+
+ // Produce output in the same format as UnpackSparseFeatures.
+ std::vector<int> indices;
+ std::vector<int64> ids;
+ std::vector<float> weights;
+
+ for (int64 i = 0; i < batch_size; ++i) {
+ for (int64 j = 0; j < num_true_; ++j) {
+ const int64 true_candidate = in_true_candidates.matrix<int64>()(i, j);
+ const auto look = sampled_candidate_to_pos.find(true_candidate);
+ if (look != sampled_candidate_to_pos.end()) {
+ indices.push_back(i);
+ ids.push_back(look->second);
+ weights.push_back(-FLT_MAX);
+ }
+ }
+ }
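+ // For example (illustrative values): with num_true_ == 1,
+ // true_candidates == [[5], [7], [9]] and sampled_candidates == [7, 3],
+ // row 1's true label 7 appears at position 0 of the sampled set, so we
+ // emit indices == [1], ids == [0], weights == [-FLT_MAX]. Callers can use
+ // these triples to mask out accidental hits in a sampled loss.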
+
+ Tensor* out_indices = nullptr;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_output(
+ 0, TensorShape({static_cast<int>(indices.size())}), &out_indices));
+ Tensor* out_ids = nullptr;
+ OP_REQUIRES_OK(
+ context, context->allocate_output(
+ 1, TensorShape({static_cast<int>(ids.size())}), &out_ids));
+ Tensor* out_weights = nullptr;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_output(
+ 2, TensorShape({static_cast<int>(weights.size())}), &out_weights));
+
+ for (size_t i = 0; i < indices.size(); ++i) {
+ out_indices->vec<int32>()(i) = indices[i];
+ out_ids->vec<int64>()(i) = ids[i];
+ out_weights->vec<float>()(i) = weights[i];
+ }
+ }
+
+ private:
+ int64 num_true_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("ComputeAccidentalHits").Device(DEVICE_CPU),
+ ComputeAccidentalHitsOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cast_op.cc b/tensorflow/core/kernels/cast_op.cc
new file mode 100644
index 0000000000..779ac57b6a
--- /dev/null
+++ b/tensorflow/core/kernels/cast_op.cc
@@ -0,0 +1,233 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/cast_op.h"
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+
+template <typename Device, typename Tout, typename Tin>
+void CastMaybeInline(const Device& d, typename TTypes<Tout>::Flat o,
+ typename TTypes<Tin>::ConstFlat i) {
+ if (o.size() * (sizeof(Tin) + sizeof(Tout)) < 131072) {
+ // Small cast on a CPU: do inline
+ o = i.template cast<Tout>();
+ } else {
+ o.device(d) = i.template cast<Tout>();
+ }
+}
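+// Rough illustration of the threshold above: for a float -> int64 cast each
+// element accounts for sizeof(float) + sizeof(int64) = 12 bytes, so tensors of
+// up to ~10922 elements are converted inline on the calling thread, while
+// larger ones go through the Eigen device (and its thread pool).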
+
+template <typename O, typename I>
+struct CastFunctor<CPUDevice, O, I> {
+ void operator()(const CPUDevice& d, typename TTypes<O>::Flat o,
+ typename TTypes<I>::ConstFlat i) {
+ CastMaybeInline<CPUDevice, O, I>(d, o, i);
+ }
+};
+
+} // namespace functor
+
+#define CAST_CASE(DEVICE, IN, OUT) \
+ if (DataTypeToEnum<IN>::value == src_dtype_ && \
+ DataTypeToEnum<OUT>::value == dst_dtype_) { \
+ work_ = [](OpKernelContext* ctx, const Tensor& inp, Tensor* out) { \
+ functor::CastFunctor<DEVICE, OUT, IN> func; \
+ func(ctx->eigen_device<DEVICE>(), out->flat<OUT>(), inp.flat<IN>()); \
+ }; \
+ return Status::OK(); \
+ }
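+// For example, CAST_CASE(CPUDevice, float, int64) expands to a check that
+// SrcT is float and DstT is int64; on a match it sets work_ to a closure that
+// runs CastFunctor<CPUDevice, int64, float> on the flattened input and output
+// and makes Prepare() return OK.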
+
+class CastOpBase : public OpKernel {
+ public:
+ explicit CastOpBase(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("SrcT", &src_dtype_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("DstT", &dst_dtype_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& inp = ctx->input(0);
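+ // If no conversion was prepared (SrcT == DstT), forward the input tensor
+ // directly; otherwise allocate an output of the same shape and run the
+ // prepared cast closure.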
+ if (work_ == nullptr) {
+ ctx->set_output(0, inp);
+ } else {
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, inp.shape(), &out));
+ work_(ctx, inp, out);
+ }
+ }
+
+ protected:
+ DataType src_dtype_;
+ DataType dst_dtype_;
+ std::function<void(OpKernelContext*, const Tensor&, Tensor*)> work_ = nullptr;
+
+ virtual Status Prepare() = 0;
+ Status Unimplemented() {
+ return errors::Unimplemented("Cast ", DataTypeString(src_dtype_), " to ",
+ DataTypeString(dst_dtype_),
+ " is not supported");
+ }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(CastOpBase);
+};
+
+class CpuCastOp : public CastOpBase {
+ public:
+ explicit CpuCastOp(OpKernelConstruction* ctx) : CastOpBase(ctx) {
+ OP_REQUIRES_OK(ctx, Prepare());
+ }
+
+ protected:
+ Status Prepare() override {
+ if (src_dtype_ == dst_dtype_) {
+ work_ = nullptr; // Identity
+ return Status::OK();
+ }
+ CAST_CASE(CPUDevice, bool, float);
+ CAST_CASE(CPUDevice, bool, int32);
+ CAST_CASE(CPUDevice, bool, double);
+ CAST_CASE(CPUDevice, double, float);
+ CAST_CASE(CPUDevice, double, int32);
+ CAST_CASE(CPUDevice, double, int64);
+ CAST_CASE(CPUDevice, float, double);
+ CAST_CASE(CPUDevice, float, uint8);
+ CAST_CASE(CPUDevice, float, int32);
+ CAST_CASE(CPUDevice, float, int64);
+ CAST_CASE(CPUDevice, int32, double);
+ CAST_CASE(CPUDevice, int32, float);
+ CAST_CASE(CPUDevice, int32, uint8);
+ CAST_CASE(CPUDevice, int32, int64);
+ CAST_CASE(CPUDevice, int64, double);
+ CAST_CASE(CPUDevice, int64, float);
+ CAST_CASE(CPUDevice, int64, int32);
+ CAST_CASE(CPUDevice, uint8, float);
+ CAST_CASE(CPUDevice, uint8, int32);
+ CAST_CASE(CPUDevice, uint8, int64);
+ CAST_CASE(CPUDevice, uint8, double);
+ if (src_dtype_ == DT_BFLOAT16 && dst_dtype_ == DT_FLOAT) {
+ work_ = [](OpKernelContext* ctx, const Tensor& inp, Tensor* out) {
+ int64 N = out->NumElements();
+ auto worker_threads = ctx->device()->tensorflow_cpu_worker_threads();
+ int num_threads =
+ std::min<int>(std::min(4, worker_threads->num_threads), N / 4096);
+ if (num_threads < 1) {
+ BFloat16ToFloat(inp.flat<bfloat16>().data(),
+ out->flat<float>().data(), N);
+ } else {
+ auto work = [&inp, &out](int64 start, int64 end) {
+ BFloat16ToFloat(inp.flat<bfloat16>().data() + start,
+ out->flat<float>().data() + start, end - start);
+ };
+ Shard(num_threads, worker_threads->workers, N, 100, work);
+ }
+ };
+ return Status::OK();
+ }
+ if (src_dtype_ == DT_FLOAT && dst_dtype_ == DT_BFLOAT16) {
+ work_ = [](OpKernelContext* ctx, const Tensor& inp, Tensor* out) {
+ int64 N = out->NumElements();
+ auto worker_threads = ctx->device()->tensorflow_cpu_worker_threads();
+ int num_threads =
+ std::min<int>(std::min(4, worker_threads->num_threads), N / 4096);
+ if (num_threads < 1) {
+ FloatToBFloat16(inp.flat<float>().data(),
+ out->flat<bfloat16>().data(), N);
+ } else {
+ auto work = [&inp, &out](int64 start, int64 end) {
+ FloatToBFloat16(inp.flat<float>().data() + start,
+ out->flat<bfloat16>().data() + start, end - start);
+ };
+ Shard(num_threads, worker_threads->workers, N, 100, work);
+ }
+ };
+ return Status::OK();
+ }
+ return Unimplemented();
+ }
+};
+
+class GpuCastOp : public CastOpBase {
+ public:
+ explicit GpuCastOp(OpKernelConstruction* ctx) : CastOpBase(ctx) {
+ OP_REQUIRES_OK(ctx, Prepare());
+ }
+
+ protected:
+ Status Prepare() override {
+ if (src_dtype_ == dst_dtype_) {
+ work_ = nullptr; // Identity
+ return Status::OK();
+ }
+ CAST_CASE(GPUDevice, bfloat16, float);
+ CAST_CASE(GPUDevice, bool, float);
+ CAST_CASE(GPUDevice, double, float);
+ CAST_CASE(GPUDevice, double, int64);
+ CAST_CASE(GPUDevice, float, bfloat16);
+ CAST_CASE(GPUDevice, float, double);
+ CAST_CASE(GPUDevice, float, int64);
+ CAST_CASE(GPUDevice, int64, double);
+ CAST_CASE(GPUDevice, int64, float);
+ CAST_CASE(GPUDevice, uint8, float);
+ CAST_CASE(GPUDevice, float, uint8);
+ CAST_CASE(GPUDevice, bool, int32);
+ CAST_CASE(GPUDevice, double, int32);
+ CAST_CASE(GPUDevice, float, int32);
+ CAST_CASE(GPUDevice, int32, double);
+ CAST_CASE(GPUDevice, int32, float);
+ CAST_CASE(GPUDevice, int32, int64);
+ CAST_CASE(GPUDevice, int64, int32);
+ return Unimplemented();
+ }
+};
+
+#undef CAST_CASE
+
+REGISTER_KERNEL_BUILDER(Name("Cast").Device(DEVICE_CPU), CpuCastOp);
+
+#if GOOGLE_CUDA
+#define REGISTER_CAST_GPU(srctype, dsttype) \
+ REGISTER_KERNEL_BUILDER(Name("Cast") \
+ .TypeConstraint<srctype>("SrcT") \
+ .TypeConstraint<dsttype>("DstT") \
+ .Device(DEVICE_GPU), \
+ GpuCastOp);
+REGISTER_CAST_GPU(bfloat16, float);
+REGISTER_CAST_GPU(bool, float);
+REGISTER_CAST_GPU(double, float);
+REGISTER_CAST_GPU(double, int64);
+REGISTER_CAST_GPU(float, bfloat16);
+REGISTER_CAST_GPU(float, double);
+REGISTER_CAST_GPU(float, int64);
+REGISTER_CAST_GPU(int64, double);
+REGISTER_CAST_GPU(int64, float);
+REGISTER_CAST_GPU(uint8, float);
+REGISTER_CAST_GPU(float, uint8);
+REGISTER_CAST_GPU(bool, int32);
+REGISTER_CAST_GPU(double, int32);
+REGISTER_CAST_GPU(float, int32);
+REGISTER_CAST_GPU(int32, double);
+REGISTER_CAST_GPU(int32, float);
+REGISTER_CAST_GPU(int32, int64);
+REGISTER_CAST_GPU(int64, int32);
+#undef REGISTER_CAST_GPU
+#endif // GOOGLE_CUDA
+
+// HostCast differs from Cast in that its input and output are in host memory.
+REGISTER_KERNEL_BUILDER(Name("_HostCast").Device(DEVICE_CPU), CpuCastOp);
+REGISTER_KERNEL_BUILDER(
+ Name("_HostCast").Device(DEVICE_GPU).HostMemory("x").HostMemory("y"),
+ CpuCastOp);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/cast_op.h b/tensorflow/core/kernels/cast_op.h
new file mode 100644
index 0000000000..d066206abc
--- /dev/null
+++ b/tensorflow/core/kernels/cast_op.h
@@ -0,0 +1,71 @@
+#ifndef TENSORFLOW_KERNELS_CAST_OP_H_
+#define TENSORFLOW_KERNELS_CAST_OP_H_
+
+#include "tensorflow/core/framework/bfloat16.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/platform/port.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename Tout, typename Tin>
+void Cast(const Device& d, typename TTypes<Tout>::Flat o,
+ typename TTypes<Tin>::ConstFlat i) {
+ o.device(d) = i.template cast<Tout>();
+}
+
+template <typename Device, typename Tout, typename Tin>
+struct CastFunctor {
+ void operator()(const Device& d, typename TTypes<Tout>::Flat o,
+ typename TTypes<Tin>::ConstFlat i);
+};
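+// CastFunctor is only declared here; the CPU specialization is defined in
+// cast_op.cc (via CastMaybeInline) and the GPU specialization in
+// cast_op_gpu.cu.cc, which explicitly instantiates it for the supported
+// type pairs.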
+
+} // end namespace functor
+} // end namespace tensorflow
+
+namespace Eigen {
+namespace internal {
+
+// Specialized cast op impls for bfloat16.
+template <>
+struct scalar_cast_op< ::tensorflow::bfloat16, float> {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
+ typedef float result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float operator()(
+ const ::tensorflow::bfloat16& a) const {
+ static_assert(::tensorflow::port::kLittleEndian, "");
+ float ret;
+ uint16_t* p = reinterpret_cast<uint16_t*>(&ret);
+ p[0] = 0;
+ p[1] = a.value;
+ return ret;
+ }
+};
+
+template <>
+struct functor_traits<scalar_cast_op< ::tensorflow::bfloat16, float> > {
+ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false };
+};
+
+template <>
+struct scalar_cast_op<float, ::tensorflow::bfloat16> {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
+ typedef ::tensorflow::bfloat16 result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const ::tensorflow::bfloat16 operator()(
+ const float a) const {
+ static_assert(::tensorflow::port::kLittleEndian, "");
+ const uint16_t* p = reinterpret_cast<const uint16_t*>(&a);
+ return ::tensorflow::bfloat16(p[1]);
+ }
+};
+
+template <>
+struct functor_traits<scalar_cast_op<float, ::tensorflow::bfloat16> > {
+ enum { Cost = NumTraits<float>::AddCost, PacketAccess = false };
+};
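+// Illustration of the truncating conversions above: a bfloat16 stores the
+// high-order 16 bits of the IEEE-754 float bit pattern. For example, 1.0f
+// (0x3F800000) becomes bfloat16 0x3F80 and converts back to exactly 1.0f,
+// while 3.14159274f (0x40490FDB) truncates to 0x4049 and converts back to
+// 3.140625f.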
+
+} // namespace internal
+} // namespace Eigen
+
+#endif // TENSORFLOW_KERNELS_CAST_OP_H_
diff --git a/tensorflow/core/kernels/cast_op_gpu.cu.cc b/tensorflow/core/kernels/cast_op_gpu.cu.cc
new file mode 100644
index 0000000000..cd198c752b
--- /dev/null
+++ b/tensorflow/core/kernels/cast_op_gpu.cu.cc
@@ -0,0 +1,45 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/bfloat16.h"
+#include "tensorflow/core/kernels/cast_op.h"
+
+namespace tensorflow {
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename O, typename I>
+struct CastFunctor<GPUDevice, O, I> {
+ void operator()(const GPUDevice& d, typename TTypes<O>::Flat o,
+ typename TTypes<I>::ConstFlat i) {
+ Cast<GPUDevice, O, I>(d, o, i);
+ }
+};
+
+#define DEFINE(O, I) template struct CastFunctor<GPUDevice, O, I>;
+DEFINE(float, double);
+DEFINE(float, int32);
+DEFINE(float, int64);
+DEFINE(double, float);
+DEFINE(double, int32);
+DEFINE(double, int64);
+DEFINE(int32, float);
+DEFINE(int32, double);
+DEFINE(int32, int64);
+DEFINE(int64, float);
+DEFINE(int64, double);
+DEFINE(int64, int32);
+DEFINE(int32, bool);
+DEFINE(float, bool);
+DEFINE(float, uint8);
+DEFINE(uint8, float);
+DEFINE(float, bfloat16);
+DEFINE(bfloat16, float);
+#undef DEFINE
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cast_op_test.cc b/tensorflow/core/kernels/cast_op_test.cc
new file mode 100644
index 0000000000..f774fbcfe8
--- /dev/null
+++ b/tensorflow/core/kernels/cast_op_test.cc
@@ -0,0 +1,100 @@
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+template <typename Src, typename Dst>
+static Graph* Cast(int num) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor data(DataTypeToEnum<Src>::value,
+ TensorShape({64, 64, num / (64 * 64)}));
+ data.flat<Src>().setRandom();
+ test::graph::Cast(g, test::graph::Constant(g, data),
+ DataTypeToEnum<Dst>::value);
+ return g;
+}
+
+class CastOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType src, DataType dst) {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("cast_op", "Cast")
+ .Input(FakeInput(DT_INT32))
+ .Attr("SrcT", src)
+ .Attr("DstT", dst)
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ }
+};
+
+TEST_F(CastOpTest, Int32ToUint8) {
+ MakeOp(DT_INT32, DT_UINT8);
+ AddInputFromArray<int32>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_UINT8, TensorShape({1, 2, 2, 1}));
+ test::FillValues<uint8>(&expected, {1, 2, 3, 4});
+ test::ExpectTensorEqual<uint8>(expected, *GetOutput(0));
+}
+
+static void BM_cpu_float_int64(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(float) + sizeof(int64)));
+ testing::UseRealTime();
+ test::Benchmark("cpu", Cast<float, int64>(num)).Run(iters);
+}
+BENCHMARK(BM_cpu_float_int64)->Arg(64 << 10)->Arg(32 << 20);
+
+static void BM_gpu_float_int64(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(float) + sizeof(int64)));
+ testing::UseRealTime();
+ test::Benchmark("gpu", Cast<float, int64>(num)).Run(iters);
+}
+BENCHMARK(BM_gpu_float_int64)->Arg(64 << 10)->Arg(32 << 20);
+
+static void BM_cpu_bool_float(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(bool) + sizeof(float)));
+ testing::UseRealTime();
+ test::Benchmark("cpu", Cast<bool, float>(num)).Run(iters);
+}
+BENCHMARK(BM_cpu_bool_float)->Arg(64 << 10)->Arg(32 << 20);
+
+static void BM_gpu_bool_float(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(bool) + sizeof(float)));
+ testing::UseRealTime();
+ test::Benchmark("gpu", Cast<bool, float>(num)).Run(iters);
+}
+BENCHMARK(BM_gpu_bool_float)->Arg(64 << 10)->Arg(32 << 20);
+
+static void BM_cpu_float_bfloat16(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(float) + sizeof(bfloat16)));
+ testing::UseRealTime();
+ test::Benchmark("cpu", Cast<float, bfloat16>(num)).Run(iters);
+}
+BENCHMARK(BM_cpu_float_bfloat16)->Arg(64 << 10)->Arg(32 << 20);
+
+static void BM_cpu_bfloat16_float(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num *
+ (sizeof(float) + sizeof(bfloat16)));
+ testing::UseRealTime();
+ test::Benchmark("cpu", Cast<bfloat16, float>(num)).Run(iters);
+}
+BENCHMARK(BM_cpu_bfloat16_float)->Arg(64 << 10)->Arg(32 << 20);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/check_numerics_op.cc b/tensorflow/core/kernels/check_numerics_op.cc
new file mode 100644
index 0000000000..65487a303c
--- /dev/null
+++ b/tensorflow/core/kernels/check_numerics_op.cc
@@ -0,0 +1,190 @@
+// See docs in ../ops/array_ops.cc.
+
+#include <math.h>
+#include <algorithm>
+#include <numeric>  // for std::accumulate
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#if GOOGLE_CUDA
+template <typename T>
+struct CheckNumericsLaunch {
+ void Run(const GPUDevice& d, const T* data, int size,
+ int abnormal_detected[2]);
+};
+#endif
+
+namespace {
+
+template <typename Device, typename T>
+class CheckNumericsOp;
+
+// Partial specialization for CPU
+template <typename T>
+class CheckNumericsOp<CPUDevice, T> : public OpKernel {
+ public:
+ explicit CheckNumericsOp(OpKernelConstruction* context) : OpKernel(context) {
+ // message_ is used as the prefix for the assertion error message. For
+ // instance, this can be the name of the input op that produced the tensor.
+ OP_REQUIRES_OK(context, context->GetAttr("message", &message_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ // pass along the input to the output
+ context->set_output(0, context->input(0));
+
+ auto in = context->input(0).flat<T>();
+ const T* data = in.data();
+ const int size = in.size();
+ // Check to see if any element of the tensor is NaN or Inf.
+ int fp_props =
+ std::accumulate(data, data + size, 0, [](const int& x, const T& y) {
+ int prop = std::fpclassify(y);
+ int result = x;
+ if (prop == FP_INFINITE) {
+ result |= kInfBit;
+ } else if (prop == FP_NAN) {
+ result |= kNaNBit;
+ }
+ return result;
+ });
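+ // Example (illustrative): for in == {1.0, inf} the accumulation yields
+ // fp_props == kInfBit and status becomes "Inf"; for {1.0, nan, -inf} it
+ // yields kInfBit | kNaNBit and status becomes "Inf and NaN".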
+ string status;
+ if ((fp_props & kInfBit) && (fp_props & kNaNBit)) {
+ status = "Inf and NaN";
+ } else {
+ if (fp_props & kInfBit) {
+ status = "Inf";
+ }
+ if (fp_props & kNaNBit) {
+ status = "NaN";
+ }
+ }
+ if (!status.empty()) {
+ context->SetStatus(errors::InvalidArgument(message_, " : Tensor had ",
+ status, " values"));
+ }
+ }
+
+ private:
+ string message_;
+ static const int kInfBit = 0x01;
+ static const int kNaNBit = 0x02;
+};
+
+#if GOOGLE_CUDA
+// Partial specialization for GPU
+template <typename T>
+class CheckNumericsOp<GPUDevice, T> : public OpKernel {
+ public:
+ typedef GPUDevice Device;
+
+ explicit CheckNumericsOp(OpKernelConstruction* context) : OpKernel(context) {
+ // message_ is used as the prefix for the assertion error message. For
+ // instance, this can be the name of the input op that produced the tensor.
+ OP_REQUIRES_OK(context, context->GetAttr("message", &message_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ // pass along the input to the output
+ context->set_output(0, context->input(0));
+ auto input = context->input(0).flat<T>();
+
+ // Allocate and initialize the elements to hold the check results
+ const int abnormal_detected_size = 2;
+ Tensor abnormal_detected;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DT_INT32, TensorShape({abnormal_detected_size}),
+ &abnormal_detected));
+
+ auto* stream = context->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(context, stream, errors::Internal("No GPU stream available."));
+
+ perftools::gputools::DeviceMemoryBase abnormal_detected_ptr(
+ abnormal_detected.flat<int>().data(),
+ abnormal_detected.flat<int>().size());
+ stream->ThenMemset32(&abnormal_detected_ptr, 0,
+ abnormal_detected.flat<int>().size() * sizeof(int));
+
+ // Call the CUDA kernel for the numerical checks
+ const Device& d = context->eigen_device<Device>();
+ CheckNumericsLaunch<T>().Run(d, input.data(), input.size(),
+ abnormal_detected.flat<int>().data());
+
+ // Copy the results from device to host
+ AllocatorAttributes attr;
+ attr.set_on_host(true);
+ attr.set_gpu_compatible(true);
+ Tensor abnormal_detected_out;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DT_INT32, TensorShape({abnormal_detected_size}),
+ &abnormal_detected_out, attr));
+ int* abnormal_detected_host = abnormal_detected_out.flat<int>().data();
+ stream->ThenMemcpy(abnormal_detected_host, abnormal_detected_ptr,
+ abnormal_detected_size * sizeof(int));
+ stream->BlockHostUntilDone();
+ OP_REQUIRES(context, stream->ok(),
+ errors::Internal("cudaMemcpy from device to host failed"));
+
+ int is_nan = abnormal_detected_host[0];
+ int is_inf = abnormal_detected_host[1];
+ if (is_nan || is_inf) {
+ string status;
+ LOG(ERROR) << "abnormal_detected_host @" << abnormal_detected_host
+ << " = {" << is_nan << ", " << is_inf << "} " << message_;
+
+ // Results should always be 1 or 0. If we see anything else, then there
+ // has been some GPU memory corruption.
+ CHECK_GE(is_nan, 0);
+ CHECK_GE(is_inf, 0);
+ CHECK_LE(is_nan, 1);
+ CHECK_LE(is_inf, 1);
+
+ if (is_nan && is_inf) {
+ status = "Inf and NaN";
+ } else if (is_nan) {
+ status = "NaN";
+ } else if (is_inf) {
+ status = "Inf";
+ }
+ context->SetStatus(errors::InvalidArgument(message_, " : Tensor had ",
+ status, " values"));
+ }
+ }
+
+ private:
+ string message_;
+};
+#endif // GOOGLE_CUDA
+
+} // namespace
+
+REGISTER_KERNEL_BUILDER(Name("CheckNumerics")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ CheckNumericsOp<CPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("CheckNumerics")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T"),
+ CheckNumericsOp<CPUDevice, double>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("CheckNumerics")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T"),
+ CheckNumericsOp<GPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("CheckNumerics")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<double>("T"),
+ CheckNumericsOp<GPUDevice, double>);
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/check_numerics_op_gpu.cu.cc b/tensorflow/core/kernels/check_numerics_op_gpu.cu.cc
new file mode 100644
index 0000000000..cb84f98731
--- /dev/null
+++ b/tensorflow/core/kernels/check_numerics_op_gpu.cu.cc
@@ -0,0 +1,62 @@
+#if GOOGLE_CUDA
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+#include <assert.h>
+
+#include <math.h>
+#include <algorithm>
+
+#include "tensorflow/core/platform/port.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// A CUDA kernel that checks whether each element is Inf or NaN. If any is
+// found, the corresponding entries in abnormal_detected are set.
+template <typename T>
+__global__ void CheckNumericsKernel(const T *data, int size,
+ int abnormal_detected[2]) {
+ const int32 thread_id = blockIdx.x * blockDim.x + threadIdx.x;
+ const int32 total_thread_count = gridDim.x * blockDim.x;
+
+ int32 offset = thread_id;
+
+ while (offset < size) {
+ if (isnan(data[offset])) {
+ abnormal_detected[0] = 1;
+ }
+ if (isinf(data[offset])) {
+ abnormal_detected[1] = 1;
+ }
+ offset += total_thread_count;
+ }
+}
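+// The kernel above uses a grid-stride loop: each thread starts at its global
+// thread id and advances by the total number of launched threads, so e.g.
+// with 2 blocks of 256 threads, thread 0 touches elements 0, 512, 1024, ...
+// regardless of the tensor size.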
+
+} // namespace
+
+// A simple launch pad for the CUDA kernel that checks for numerical
+// abnormalities in the given array.
+template <typename T>
+struct CheckNumericsLaunch {
+ void Run(const GPUDevice &d, const T *data, int size,
+ int abnormal_detected[2]) {
+ const int32 block_size = d.maxCudaThreadsPerBlock();
+ const int32 num_blocks =
+ (d.getNumCudaMultiProcessors() * d.maxCudaThreadsPerMultiProcessor()) /
+ block_size;
+
+ CheckNumericsKernel<T><<<num_blocks, block_size, 0, d.stream()>>>(
+ data, size, abnormal_detected);
+ }
+};
+
+template struct CheckNumericsLaunch<float>;
+template struct CheckNumericsLaunch<double>;
+
+} // namespace tensorflow
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cholesky_op.cc b/tensorflow/core/kernels/cholesky_op.cc
new file mode 100644
index 0000000000..12632fb248
--- /dev/null
+++ b/tensorflow/core/kernels/cholesky_op.cc
@@ -0,0 +1,71 @@
+// See docs in ../ops/linalg_ops.cc.
+// TODO(konstantinos): Enable complex inputs. This will require additional tests
+// and OP_REQUIRES.
+
+#include <cmath>
+
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/linalg_ops_common.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/Eigen/Cholesky"
+
+namespace tensorflow {
+
+template <class Scalar, bool SupportsBatchOperationT>
+class CholeskyOp : public LinearAlgebraOp<Scalar, SupportsBatchOperationT> {
+ public:
+ explicit CholeskyOp(OpKernelConstruction* context)
+ : LinearAlgebraOp<Scalar, SupportsBatchOperationT>(context) {}
+
+ TensorShape GetOutputMatrixShape(
+ const TensorShape& input_matrix_shape) override {
+ return input_matrix_shape;
+ }
+
+ int64 GetCostPerUnit(const TensorShape& input_matrix_shape) override {
+ const int64 rows = input_matrix_shape.dim_size(0);
+ if (rows > (1LL << 20)) {
+ // A big number to cap the cost in case of overflow.
+ return kint32max;
+ } else {
+ return rows * rows * rows;
+ }
+ }
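+ // (For example, a 1000 x 1000 input yields a cost estimate of 1e9 units;
+ // the kint32max cap above keeps the estimate bounded for very large
+ // matrices.)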
+
+ using typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::MatrixMap;
+ using
+ typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::ConstMatrixMap;
+
+ void ComputeMatrix(OpKernelContext* context, const ConstMatrixMap& input,
+ MatrixMap* output) override {
+ OP_REQUIRES(context, input.rows() == input.cols(),
+ errors::InvalidArgument("Input matrix must be square."));
+ if (input.rows() == 0) {
+ // If X is an empty matrix (0 rows, 0 cols), X * X' == X.
+ // Therefore, we return X.
+ return;
+ }
+ // Perform the actual LL^T Cholesky decomposition. This will only use
+ // the lower triangular part of the input by default. The upper triangular
+ // part of the matrix will not be read.
+ Eigen::LLT<Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> llt_decomposition(input);
+
+ // Output the lower triangular in a dense form.
+ *output = llt_decomposition.matrixL();
+
+ OP_REQUIRES(context, llt_decomposition.info() == Eigen::Success,
+ errors::InvalidArgument("LLT decomposition was not successful. "
+ "The input might not be valid."));
+ }
+};
+
+REGISTER_LINALG_OP("Cholesky", (CholeskyOp<float, false>), float);
+REGISTER_LINALG_OP("Cholesky", (CholeskyOp<double, false>), double);
+REGISTER_LINALG_OP("BatchCholesky", (CholeskyOp<float, true>), float);
+REGISTER_LINALG_OP("BatchCholesky", (CholeskyOp<double, true>), double);
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/concat_op.cc b/tensorflow/core/kernels/concat_op.cc
new file mode 100644
index 0000000000..b68fcec515
--- /dev/null
+++ b/tensorflow/core/kernels/concat_op.cc
@@ -0,0 +1,153 @@
+// See docs in ../ops/array_ops.cc.
+
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/kernels/concat_op.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// --------------------------------------------------------------------------
+template <typename Device, typename T>
+class ConcatOp : public OpKernel {
+ public:
+ typedef std::vector<std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>
+ ConstMatrixVector;
+
+ explicit ConcatOp(OpKernelConstruction* c) : OpKernel(c) {}
+
+ void Compute(OpKernelContext* c) override {
+ const Tensor* concat_dim_tensor;
+ OP_REQUIRES_OK(c, c->input("concat_dim", &concat_dim_tensor));
+ OP_REQUIRES(
+ c, TensorShapeUtils::IsLegacyScalar(concat_dim_tensor->shape()),
+ errors::InvalidArgument(
+ "Concat dim tensor should be a scalar integer, but got shape ",
+ concat_dim_tensor->shape().DebugString()));
+ const int32 concat_dim = concat_dim_tensor->scalar<int32>()();
+ OpInputList values;
+ OP_REQUIRES_OK(c, c->input_list("values", &values));
+ const int N = values.size();
+ const int input_dims = values[0].dims();
+ const TensorShape& input_shape = values[0].shape();
+ OP_REQUIRES(
+ c, (0 <= concat_dim && concat_dim < input_dims) ||
+ (kAllowLegacyScalars && concat_dim == 0),
+ errors::InvalidArgument(
+ "ConcatOp : Expected concatenating dimensions in the range [", 0,
+ ", ", input_dims, "), but got ", concat_dim));
+
+ // Note that we reduce the concat of n-dimensional tensors into a two
+ // dimensional concat. Assuming the dimensions of any input/output
+ // tensor are {x0, x1,...,xn-1, y0, y1,...,ym-1}, where the concat is along
+ // the dimension indicated with size y0, we flatten it to {x, y}, where y =
+ // Prod_i(yi) and x = ((n > 0) ? Prod_i(xi) : 1).
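+ // For example (illustrative shapes), concatenating {2, 3, 5} and {2, 4, 5}
+ // along dimension 1 uses inputs_flat_dim0 == 2, flattens the inputs to
+ // {2, 15} and {2, 20}, and writes the {2, 7, 5} output through a {2, 35}
+ // view.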
+ ConstMatrixVector inputs_flat;
+ inputs_flat.reserve(N);
+ int64 inputs_flat_dim0 = 1;
+ for (int d = 0; d < concat_dim; ++d) {
+ inputs_flat_dim0 *= input_shape.dim_size(d);
+ }
+ int output_concat_dim = 0;
+ const bool input_is_scalar = TensorShapeUtils::IsLegacyScalar(input_shape);
+ for (int i = 0; i < N; ++i) {
+ const auto in = values[i];
+ const bool in_is_scalar = TensorShapeUtils::IsLegacyScalar(in.shape());
+ OP_REQUIRES(
+ c, in.dims() == input_dims || (input_is_scalar && in_is_scalar),
+ errors::InvalidArgument(
+ "ConcatOp : Ranks of all input tensors should match: shape[0] = ",
+ input_shape.ShortDebugString(), " vs. shape[", i, "] = ",
+ in.shape().ShortDebugString()));
+ for (int j = 0; j < input_dims; ++j) {
+ if (j == concat_dim) {
+ continue;
+ }
+ OP_REQUIRES(
+ c, in.dim_size(j) == input_shape.dim_size(j),
+ errors::InvalidArgument(
+ "ConcatOp : Dimensions of inputs should match: shape[0] = ",
+ input_shape.ShortDebugString(), " vs. shape[", i, "] = ",
+ in.shape().ShortDebugString()));
+ }
+ if (in.NumElements() > 0) {
+ int64 inputs_flat_dim1 = in.NumElements() / inputs_flat_dim0;
+ inputs_flat.emplace_back(new typename TTypes<T, 2>::ConstMatrix(
+ in.shaped<T, 2>({inputs_flat_dim0, inputs_flat_dim1})));
+ }
+ // TODO(irving): Remove check once !kAllowLegacyScalars
+ output_concat_dim += in.dims() > 0 ? in.dim_size(concat_dim) : 1;
+ }
+
+ TensorShape output_shape(input_shape);
+ // TODO(irving): Remove rank 0 case once !kAllowLegacyScalars
+ if (output_shape.dims() == 0) {
+ output_shape.AddDim(output_concat_dim);
+ } else {
+ output_shape.set_dim(concat_dim, output_concat_dim);
+ }
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, output_shape, &output));
+ if (output->NumElements() > 0) {
+ int64 output_dim1 = output->NumElements() / inputs_flat_dim0;
+ auto output_flat = output->shaped<T, 2>({inputs_flat_dim0, output_dim1});
+ if (std::is_same<Device, GPUDevice>::value) {
+ ConcatGPU<T>(c->eigen_gpu_device(), inputs_flat, &output_flat);
+ } else {
+ ConcatCPU<T>(c->device(), inputs_flat, &output_flat);
+ }
+ }
+ }
+};
+
+#define REGISTER_CONCAT(type) \
+ REGISTER_KERNEL_BUILDER(Name("Concat") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("concat_dim"), \
+ ConcatOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_CONCAT);
+REGISTER_CONCAT(quint8);
+REGISTER_CONCAT(qint8);
+REGISTER_CONCAT(qint32);
+REGISTER_CONCAT(bfloat16);
+
+#undef REGISTER_CONCAT
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("Concat") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("concat_dim"), \
+ ConcatOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+#undef REGISTER_GPU
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Concat")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("T")
+ .HostMemory("concat_dim")
+ .HostMemory("values")
+ .HostMemory("output"),
+ ConcatOp<CPUDevice, int32>);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/concat_op.h b/tensorflow/core/kernels/concat_op.h
new file mode 100644
index 0000000000..664e55080d
--- /dev/null
+++ b/tensorflow/core/kernels/concat_op.h
@@ -0,0 +1,27 @@
+#ifndef TENSORFLOW_KERNELS_CONCAT_OP_H_
+#define TENSORFLOW_KERNELS_CONCAT_OP_H_
+
+#include <vector>
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/device_base.h"
+
+namespace tensorflow {
+
+// Assumes all inputs are nonempty
+template <typename T>
+void ConcatCPU(DeviceBase* d,
+ const std::vector<
+ std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>& inputs,
+ typename TTypes<T, 2>::Matrix* output);
+
+// Assumes all inputs are nonempty
+template <typename T>
+void ConcatGPU(const Eigen::GpuDevice& d,
+ const std::vector<
+ std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>& inputs,
+ typename TTypes<T, 2>::Matrix* output);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CONCAT_OP_H_
diff --git a/tensorflow/core/kernels/concat_op_cpu.cc b/tensorflow/core/kernels/concat_op_cpu.cc
new file mode 100644
index 0000000000..679a53721c
--- /dev/null
+++ b/tensorflow/core/kernels/concat_op_cpu.cc
@@ -0,0 +1,122 @@
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/concat_op.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+namespace tensorflow {
+
+template <typename T>
+static inline void Copy(T* dst, const T* src, int n) {
+ if (DataTypeCanUseMemcpy(DataTypeToEnum<T>::v())) {
+ memcpy(dst, src, n * sizeof(T));
+ } else {
+ for (int k = 0; k < n; ++k) {
+ *dst++ = *src++;
+ }
+ }
+}
+
+template <typename T>
+void ConcatCPU(DeviceBase* d,
+ const std::vector<
+ std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>& inputs,
+ typename TTypes<T, 2>::Matrix* output) {
+ int num_inputs = inputs.size();
+ std::vector<ptrdiff_t> sizes;
+ sizes.reserve(num_inputs);
+ int row_size = 0;
+ for (int j = 0; j < num_inputs; ++j) {
+ sizes.push_back(inputs[j]->dimension(1));
+ row_size += sizes.back();
+ }
+
+ auto worker_threads = d->tensorflow_cpu_worker_threads();
+ int num_threads = std::min<int>(std::min(4, worker_threads->num_threads),
+ output->size() / 4096);
+ // Single threaded mode.
+ if (num_threads == 0) {
+ T* out = &(*output)(0, 0);
+ std::vector<const T*> inp;
+ inp.reserve(num_inputs);
+ for (int j = 0; j < num_inputs; ++j) {
+ inp.push_back(&(*inputs[j])(0, 0));
+ }
+ const int dim0 = output->dimension(0);
+ for (int i = 0; i < dim0; ++i) {
+ for (int j = 0; j < num_inputs; ++j) {
+ auto size = sizes[j];
+ Copy(out, inp[j], size);
+ out += size;
+ inp[j] += size;
+ }
+ }
+ return;
+ }
+
+ // Sharded mode.
+ auto work = [&row_size, &sizes, &inputs, &output, &num_inputs](int64 start,
+ int64 end) {
+ int64 skipped_rows = start / row_size;
+ T* out = output->data() + skipped_rows * row_size;
+ T* out_start = output->data() + start;
+ T* out_end = output->data() + end;
+
+ // Handle partial row at start
+ if (out < out_start) {
+ for (int j = 0; j < num_inputs; ++j) {
+ ptrdiff_t size = sizes[j];
+ ptrdiff_t offset = out_start - out;
+ if (size <= offset) {
+ out += size;
+ continue;
+ }
+ const T* inp = &(*inputs[j])(skipped_rows, 0);
+ if (offset > 0) {
+ out += offset;
+ inp += offset;
+ size -= offset;
+ }
+ size = std::min(size, out_end - out);
+ if (size <= 0) break;
+ Copy(out, inp, size);
+ out += size;
+ }
+ ++skipped_rows;
+ }
+ if (out == out_end) return;
+ CHECK(out >= out_start);
+ CHECK(out < out_end);
+
+ // Copy remaining data.
+ std::vector<const T*> inp;
+ inp.reserve(num_inputs);
+ for (int j = 0; j < num_inputs; ++j) {
+ inp.push_back(&(*inputs[j])(skipped_rows, 0));
+ }
+ const int dim0 = output->dimension(0);
+ for (int i = skipped_rows; i < dim0; ++i) {
+ for (int j = 0; j < num_inputs; ++j) {
+ ptrdiff_t size = std::min(sizes[j], out_end - out);
+ Copy(out, inp[j], size);
+ out += size;
+ inp[j] += size;
+ if (out == out_end) return;
+ }
+ }
+ };
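+ // Shard splits [0, output->size()) into contiguous ranges of flat element
+ // offsets and runs 'work' on up to num_threads workers; because a range may
+ // begin or end mid-row, the partial leading row is finished first and then
+ // whole rows are copied input by input until out_end is reached.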
+ Shard(num_threads, worker_threads->workers, output->size(), 100, work);
+}
+
+#define REGISTER(T) \
+ template void ConcatCPU<T>( \
+ DeviceBase*, \
+ const std::vector<std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>&, \
+ typename TTypes<T, 2>::Matrix* output);
+TF_CALL_ALL_TYPES(REGISTER)
+REGISTER(quint8)
+REGISTER(qint8)
+REGISTER(qint32)
+REGISTER(bfloat16)
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/concat_op_gpu.cu.cc b/tensorflow/core/kernels/concat_op_gpu.cu.cc
new file mode 100644
index 0000000000..d8ce6bd85d
--- /dev/null
+++ b/tensorflow/core/kernels/concat_op_gpu.cu.cc
@@ -0,0 +1,41 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include <memory>
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename T>
+void ConcatGPU(const GPUDevice& d,
+ const std::vector<
+ std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>& inputs,
+ typename TTypes<T, 2>::Matrix* output) {
+ Eigen::array<ptrdiff_t, 2> offset(0, 0);
+ for (int i = 0; i < inputs.size(); ++i) {
+ Eigen::array<ptrdiff_t, 2> size = inputs[i]->dimensions();
+ output->slice(offset, size).device(d) = *inputs[i];
+ offset[1] += size[1];
+ }
+}
+
+#define REGISTER_GPU(T) \
+ template void ConcatGPU<T>( \
+ const GPUDevice& d, \
+ const std::vector<std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>& \
+ inputs, \
+ typename TTypes<T, 2>::Matrix* output);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+#undef REGISTER_GPU
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/concat_op_test.cc b/tensorflow/core/kernels/concat_op_test.cc
new file mode 100644
index 0000000000..4ccc5b5b19
--- /dev/null
+++ b/tensorflow/core/kernels/concat_op_test.cc
@@ -0,0 +1,240 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+// For the benchmark, we set up two 2-dimensional tensors, each kDim1 x 'dim2'
+// in size, and concatenate them along 'concat_dimension'.
+template <typename T>
+static void ConcatHelper(int iters, int concat_dimension, int dim2) {
+ testing::StopTiming();
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+
+ DataType dt = DataTypeToEnum<T>::v();
+ const int kDim1 = 100;
+ Tensor concat_dim(DT_INT32, TensorShape({}));
+ concat_dim.scalar<int32>()() = concat_dimension;
+ Tensor in0(dt, TensorShape({kDim1, dim2}));
+ in0.flat<T>().setRandom();
+ Tensor in1(dt, TensorShape({kDim1, dim2}));
+ in1.flat<T>().setRandom();
+
+ Node* node;
+ TF_CHECK_OK(
+ NodeBuilder(g->NewName("n"), "Concat")
+ .Input(test::graph::Constant(g, concat_dim))
+ .Input({test::graph::Constant(g, in0), test::graph::Constant(g, in1)})
+ .Attr("N", 2)
+ .Attr("T", dt)
+ .Finalize(g, &node));
+
+ testing::BytesProcessed(static_cast<int64>(iters) *
+ ((kDim1 * dim2) + (kDim1 * dim2)) * sizeof(T));
+ testing::StartTiming();
+ test::Benchmark("cpu", g).Run(iters);
+ testing::UseRealTime();
+}
+
+static void BM_ConcatDim0Float(int iters, int dim2) {
+ ConcatHelper<float>(iters, 0, dim2);
+}
+
+static void BM_ConcatDim1Float(int iters, int dim2) {
+ ConcatHelper<float>(iters, 1, dim2);
+}
+
+BENCHMARK(BM_ConcatDim0Float)->Arg(1000)->Arg(100000)->Arg(1000000);
+BENCHMARK(BM_ConcatDim1Float)->Arg(1000)->Arg(100000)->Arg(1000000);
+
+static void BM_ConcatDim1int16(int iters, int dim2) {
+ ConcatHelper<int16>(iters, 1, dim2);
+}
+static void BM_ConcatDim1bfloat16(int iters, int dim2) {
+ ConcatHelper<bfloat16>(iters, 1, dim2);
+}
+
+BENCHMARK(BM_ConcatDim1int16)->Arg(1000)->Arg(100000)->Arg(1000000);
+BENCHMARK(BM_ConcatDim1bfloat16)->Arg(1000)->Arg(100000)->Arg(1000000);
+
+template <typename T>
+static void ConcatManyHelper(int iters, int concat_dimension, int dim2) {
+ testing::StopTiming();
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+
+ DataType dt = DataTypeToEnum<T>::v();
+ const int kDim1 = 40000;
+ const int kNumInputs = 64;
+ Tensor concat_dim(DT_INT32, TensorShape({}));
+ concat_dim.scalar<int32>()() = concat_dimension;
+ std::vector<NodeBuilder::NodeOut> inputs;
+ inputs.reserve(kNumInputs);
+ for (int i = 0; i < kNumInputs; ++i) {
+ Tensor in(dt, TensorShape({kDim1, dim2}));
+ in.flat<T>().setRandom();
+ inputs.push_back(test::graph::Constant(g, in));
+ }
+
+ Node* node;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Concat")
+ .Input(test::graph::Constant(g, concat_dim))
+ .Input(inputs)
+ .Attr("N", 64)
+ .Attr("T", dt)
+ .Finalize(g, &node));
+ testing::BytesProcessed(static_cast<int64>(iters) * kDim1 * dim2 *
+ kNumInputs * sizeof(T));
+ testing::StartTiming();
+ test::Benchmark("cpu", g).Run(iters);
+ testing::UseRealTime();
+}
+
+static void BM_ConcatManyDim1bfloat16(int iters, int dim2) {
+ ConcatManyHelper<bfloat16>(iters, 1, dim2);
+}
+
+BENCHMARK(BM_ConcatManyDim1bfloat16)->Arg(18)->Arg(34)->Arg(60);
+
+static void MemcpyAlternativeHelper(int iters, int concat_dimension, int dim2) {
+ testing::StopTiming();
+
+ const int kDim1 = 100;
+ std::vector<float> data1(kDim1 * dim2, 1.0f);
+ std::vector<float> data2(kDim1 * dim2, 2.0f);
+
+ testing::BytesProcessed(static_cast<int64>(iters) *
+ ((kDim1 * dim2) + (kDim1 * dim2)) * sizeof(float));
+ testing::StartTiming();
+ while (--iters > 0) {
+ const int n0 = data1.size();
+ const int n1 = data2.size();
+ float* result = new float[n0 + n1];
+ memcpy(&result[0], &data1[0], n0 * sizeof(float));
+ memcpy(&result[n0], &data2[0], n1 * sizeof(float));
+ delete[] result;
+ }
+}
+
+static void BM_MemcpyAlternativeDim0(int iters, int dim2) {
+ MemcpyAlternativeHelper(iters, 0, dim2);
+}
+static void BM_MemcpyAlternativeDim1(int iters, int dim2) {
+ MemcpyAlternativeHelper(iters, 1, dim2);
+}
+
+BENCHMARK(BM_MemcpyAlternativeDim0)->Arg(1000)->Arg(100000)->Arg(1000000);
+BENCHMARK(BM_MemcpyAlternativeDim1)->Arg(1000)->Arg(100000)->Arg(1000000);
+
+typedef Eigen::TensorMap<Eigen::Tensor<bfloat16, 1, Eigen::RowMajor>,
+ Eigen::Unaligned> EigenMap;
+static void MemcpyManyAlternative1(int iters, int dim2) {
+ testing::StopTiming();
+
+ const int kDim1 = 40000;
+ const int kNumCopies = 64;
+ const int size = kDim1 * dim2 * kNumCopies;
+ bfloat16* data = new bfloat16[size];
+ EigenMap map(data, size);
+ map.setRandom();
+
+ testing::BytesProcessed(static_cast<int64>(iters) * kDim1 * dim2 *
+ kNumCopies * sizeof(bfloat16));
+ testing::StartTiming();
+ while (iters-- > 0) {
+ std::vector<bfloat16*> inputs(kNumCopies);
+ for (int i = 0; i < kNumCopies; ++i) {
+ inputs[i] = &data[i * kDim1 * dim2];
+ }
+ bfloat16* result = new bfloat16[size];
+ for (int j = 0; j < kNumCopies; ++j) {
+ bfloat16* output = &result[j * dim2];
+ for (int i = 0; i < kDim1; ++i) {
+ if (i + 1 < kDim1) {
+ port::prefetch<port::PREFETCH_HINT_T0>(inputs[j] + dim2);
+ }
+ memcpy(output, inputs[j], dim2 * sizeof(bfloat16));
+ inputs[j] += dim2;
+ output += dim2 * kNumCopies;
+ }
+ }
+ delete[] result;
+ }
+ delete[] data;
+}
+
+static void MemcpyManyAlternative2(int iters, int dim2) {
+ testing::StopTiming();
+
+ const int kDim1 = 40000;
+ const int kNumCopies = 64;
+ const int size = kDim1 * dim2 * kNumCopies;
+ bfloat16* data = new bfloat16[size];
+ EigenMap map(data, size);
+ map.setRandom();
+
+ testing::BytesProcessed(static_cast<int64>(iters) * kDim1 * dim2 *
+ kNumCopies * sizeof(bfloat16));
+ testing::StartTiming();
+ std::vector<bfloat16*> inputs(kNumCopies);
+ while (--iters > 0) {
+ bfloat16* result = new bfloat16[size];
+ for (int i = 0; i < kNumCopies; ++i) {
+ inputs[i] = &data[i * kDim1 * dim2];
+ }
+ bfloat16* output = result;
+ for (int i = 0; i < kDim1; ++i) {
+ for (int j = 0; j < kNumCopies; ++j) {
+ if (j + 1 < kNumCopies) {
+ port::prefetch<port::PREFETCH_HINT_T0>(inputs[j + 1]);
+ }
+ memcpy(output, inputs[j], dim2 * sizeof(bfloat16));
+ inputs[j] += dim2;
+ output += dim2;
+ }
+ }
+ delete[] result;
+ }
+ delete[] data;
+}
+
+BENCHMARK(MemcpyManyAlternative1)
+ ->Arg(16)
+ ->Arg(17)
+ ->Arg(18)
+ ->Arg(32)
+ ->Arg(33)
+ ->Arg(34)
+ ->Arg(60)
+ ->Arg(64)
+ ->Arg(65);
+
+BENCHMARK(MemcpyManyAlternative2)
+ ->Arg(16)
+ ->Arg(17)
+ ->Arg(18)
+ ->Arg(32)
+ ->Arg(33)
+ ->Arg(34)
+ ->Arg(60)
+ ->Arg(64)
+ ->Arg(65);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/constant_op.cc b/tensorflow/core/kernels/constant_op.cc
new file mode 100644
index 0000000000..281bafd3df
--- /dev/null
+++ b/tensorflow/core/kernels/constant_op.cc
@@ -0,0 +1,249 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/constant_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/fill_functor.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+ConstantOp::ConstantOp(OpKernelConstruction* ctx)
+ : OpKernel(ctx), tensor_(ctx->output_type(0)) {
+ const TensorProto* proto = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("value", &proto));
+ OP_REQUIRES_OK(ctx, ctx->device()->MakeTensorFromProto(
+ *proto, AllocatorAttributes(), &tensor_));
+ OP_REQUIRES(
+ ctx, ctx->output_type(0) == tensor_.dtype(),
+ errors::InvalidArgument("Type mismatch between value (",
+ DataTypeString(tensor_.dtype()), ") and dtype (",
+ DataTypeString(ctx->output_type(0)), ")"));
+}
+
+void ConstantOp::Compute(OpKernelContext* ctx) { ctx->set_output(0, tensor_); }
+
+ConstantOp::~ConstantOp() {}
+
+REGISTER_KERNEL_BUILDER(Name("Const").Device(DEVICE_CPU), ConstantOp);
+
+#if GOOGLE_CUDA
+#define REGISTER_KERNEL(D, TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Const").Device(DEVICE_##D).TypeConstraint<TYPE>("dtype"), \
+ ConstantOp);
+REGISTER_KERNEL(GPU, float);
+REGISTER_KERNEL(GPU, double);
+REGISTER_KERNEL(GPU, uint8);
+REGISTER_KERNEL(GPU, int8);
+REGISTER_KERNEL(GPU, int16);
+REGISTER_KERNEL(GPU, int64);
+REGISTER_KERNEL(GPU, complex64);
+REGISTER_KERNEL(GPU, bool);
+// Currently we do not support string constants on GPU
+#undef REGISTER_KERNEL
+#endif
+
+// HostConstantOp differs from ConstantOp in that its output is always
+// in host memory.
+class HostConstantOp : public OpKernel {
+ public:
+ explicit HostConstantOp(OpKernelConstruction* ctx)
+ : OpKernel(ctx), tensor_(ctx->output_type(0)) {
+ const TensorProto* proto = nullptr;
+ AllocatorAttributes alloc_attr;
+ alloc_attr.set_on_host(true);
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("value", &proto));
+ OP_REQUIRES_OK(
+ ctx, ctx->device()->MakeTensorFromProto(*proto, alloc_attr, &tensor_));
+ OP_REQUIRES(
+ ctx, ctx->output_type(0) == tensor_.dtype(),
+ errors::InvalidArgument(
+ "Type mismatch between value (", DataTypeString(tensor_.dtype()),
+ ") and dtype (", DataTypeString(ctx->output_type(0)), ")"));
+ }
+
+ void Compute(OpKernelContext* ctx) override { ctx->set_output(0, tensor_); }
+
+ bool IsExpensive() override { return false; }
+
+ ~HostConstantOp() override {}
+
+ private:
+ Tensor tensor_;
+ TF_DISALLOW_COPY_AND_ASSIGN(HostConstantOp);
+};
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Const")
+ .Device(DEVICE_GPU)
+ .HostMemory("output")
+ .TypeConstraint<int32>("dtype"),
+ HostConstantOp);
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+
+// Partial specialization of FillFunctor<Device=CPUDevice, T>.
+template <typename T>
+struct FillFunctor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstScalar in) {
+ out.device(d) = out.constant(in());
+ }
+};
+
+// Partial specialization of SetZeroFunctor<Device=CPUDevice, T>.
+template <typename T>
+struct SetZeroFunctor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out) {
+ out.device(d) = out.constant(0);
+ }
+};
+
+#define DEFINE_SETZERO_CPU(T) template struct SetZeroFunctor<CPUDevice, T>
+DEFINE_SETZERO_CPU(float);
+DEFINE_SETZERO_CPU(double);
+DEFINE_SETZERO_CPU(int32);
+DEFINE_SETZERO_CPU(complex64);
+#undef DEFINE_SETZERO_CPU
+
+} // end namespace functor
+
+template <typename Device, typename T>
+class FillOp : public OpKernel {
+ public:
+ explicit FillOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& Tdims = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyVector(Tdims.shape()),
+ errors::InvalidArgument("dims must be a vector of int32."));
+ const Tensor& Tvalue = context->input(1);
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyScalar(Tvalue.shape()),
+ errors::InvalidArgument("value must be a scalar."));
+ auto dims = Tdims.flat<int32>();
+ for (int i = 0; i < dims.size(); i++) {
+ OP_REQUIRES(context, dims(i) >= 0,
+ errors::InvalidArgument("dims[", i, "] = ", dims(i),
+ " must be nonnegative."));
+ }
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_output(
+ 0, TensorShapeUtils::MakeShape(
+ reinterpret_cast<const int32*>(dims.data()), dims.size()),
+ &out));
+ functor::FillFunctor<Device, T> functor;
+ functor(context->eigen_device<Device>(), out->flat<T>(),
+ Tvalue.scalar<T>());
+ }
+};
+
+#define REGISTER_KERNEL(D, TYPE) \
+ REGISTER_KERNEL_BUILDER(Name("Fill") \
+ .Device(DEVICE_##D) \
+ .TypeConstraint<TYPE>("T") \
+ .HostMemory("dims"), \
+ FillOp<D##Device, TYPE>);
+
+#define REGISTER_CPU_KERNEL(TYPE) REGISTER_KERNEL(CPU, TYPE)
+TF_CALL_ALL_TYPES(REGISTER_CPU_KERNEL);
+#undef REGISTER_CPU_KERNEL
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL(GPU, float);
+REGISTER_KERNEL(GPU, double);
+REGISTER_KERNEL(GPU, uint8);
+REGISTER_KERNEL(GPU, int8);
+REGISTER_KERNEL(GPU, int16);
+REGISTER_KERNEL(GPU, int64);
+// Currently we do not support filling strings and complex64 on GPU
+
+#endif // GOOGLE_CUDA
+
+#undef REGISTER_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Fill")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("T")
+ .HostMemory("dims")
+ .HostMemory("value")
+ .HostMemory("output"),
+ FillOp<CPUDevice, int32>);
+
+template <typename Device, typename T>
+class ZerosLikeOp : public OpKernel {
+ public:
+ explicit ZerosLikeOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& input = ctx->input(0);
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input.shape(), &out));
+ Tensor zero(DataTypeToEnum<T>::value, {1});
+ zero.scalar<T>().setZero();
+ const Tensor& zero_cref = zero;
+ functor::FillFunctor<Device, T> functor;
+ functor(ctx->eigen_device<Device>(), out->flat<T>(), zero_cref.scalar<T>());
+ }
+};
+
+#define REGISTER_KERNEL(type, dev) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZerosLike").Device(DEVICE_##dev).TypeConstraint<type>("T"), \
+ ZerosLikeOp<dev##Device, type>)
+
+#define REGISTER_CPU(type) REGISTER_KERNEL(type, CPU)
+TF_CALL_ALL_TYPES(REGISTER_CPU);
+#undef REGISTER_CPU
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL(float, GPU);
+REGISTER_KERNEL(double, GPU);
+#endif // GOOGLE_CUDA
+
+#undef REGISTER_KERNEL
+
+class PlaceholderOp : public OpKernel {
+ public:
+ explicit PlaceholderOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("shape", &expected_shape_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (expected_shape_.dims() > 0) {
+ OP_REQUIRES(ctx, false,
+ errors::InvalidArgument(
+ "You must feed a value for placeholder tensor '", name(),
+ "' with dtype ", DataTypeString(output_type(0)),
+ " and shape ", expected_shape_.DebugString()));
+ } else {
+ OP_REQUIRES(ctx, false,
+ errors::InvalidArgument(
+ "You must feed a value for placeholder tensor '", name(),
+ "' with dtype ", DataTypeString(output_type(0))));
+ }
+ }
+
+ private:
+ TensorShape expected_shape_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("Placeholder").Device(DEVICE_CPU), PlaceholderOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/constant_op.h b/tensorflow/core/kernels/constant_op.h
new file mode 100644
index 0000000000..20a5c9c42f
--- /dev/null
+++ b/tensorflow/core/kernels/constant_op.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_KERNELS_CONSTANT_OP_H_
+#define TENSORFLOW_KERNELS_CONSTANT_OP_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+// ConstantOp returns the tensor specified by its "value" attr.
+class ConstantOp : public OpKernel {
+ public:
+ explicit ConstantOp(OpKernelConstruction* ctx);
+ void Compute(OpKernelContext* ctx) override;
+ bool IsExpensive() override { return false; }
+ ~ConstantOp() override;
+
+ private:
+ Tensor tensor_;
+ TF_DISALLOW_COPY_AND_ASSIGN(ConstantOp);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CONSTANT_OP_H_
diff --git a/tensorflow/core/kernels/constant_op_gpu.cu.cc b/tensorflow/core/kernels/constant_op_gpu.cu.cc
new file mode 100644
index 0000000000..64502378bd
--- /dev/null
+++ b/tensorflow/core/kernels/constant_op_gpu.cu.cc
@@ -0,0 +1,89 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/kernels/fill_functor.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace Eigen {
+namespace internal {
+
+template <typename T>
+struct scalar_const_op {
+ typedef typename packet_traits<T>::type Packet;
+
+ const T* val;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ scalar_const_op(const scalar_const_op& x)
+ : val(x.val) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_const_op(const T* v) : val(v) {}
+
+ template <typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T operator()(Index,
+ Index = 0) const {
+ return *val;
+ }
+
+ template <typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index, Index = 0) const {
+ return internal::pset1<Packet>(*val);
+ }
+};
+
+template <typename T>
+struct functor_traits<scalar_const_op<T> > {
+ enum {
+ Cost = 1,
+ PacketAccess = packet_traits<T>::Vectorizable,
+ IsRepeatable = true
+ };
+};
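+// FillFunctor<GPUDevice, T> below applies this functor via nullaryExpr,
+// reading the fill value through a device-side pointer (in.data()) inside the
+// generated kernel.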
+
+} // end namespace internal
+} // end namespace Eigen
+
+namespace tensorflow {
+
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Partial specialization of FillFunctor<Device=GPUDevice, T>.
+template <typename T>
+struct FillFunctor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstScalar in) {
+ Eigen::internal::scalar_const_op<T> f(in.data());
+ out.device(d) = out.nullaryExpr(f);
+ }
+};
+
+#define DEFINE_FILL_GPU(T) template struct FillFunctor<GPUDevice, T>
+DEFINE_FILL_GPU(float);
+DEFINE_FILL_GPU(double);
+DEFINE_FILL_GPU(int32);
+DEFINE_FILL_GPU(uint8);
+DEFINE_FILL_GPU(int16);
+DEFINE_FILL_GPU(int8);
+DEFINE_FILL_GPU(int64);
+#undef DEFINE_FILL_GPU
+
+// Partial specialization of SetZeroFunctor<Device=GPUDevice, T>.
+template <typename T>
+struct SetZeroFunctor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out) {
+ out.device(d) = out.constant(0);
+ }
+};
+
+#define DEFINE_SETZERO_GPU(T) template struct SetZeroFunctor<GPUDevice, T>
+DEFINE_SETZERO_GPU(float);
+#undef DEFINE_SETZERO_GPU
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/constant_op_test.cc b/tensorflow/core/kernels/constant_op_test.cc
new file mode 100644
index 0000000000..f5a464c07c
--- /dev/null
+++ b/tensorflow/core/kernels/constant_op_test.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/graph.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+// Returns a graph containing "num" const nodes. If 'sequential' is
+// true, makes sure all constants are executed sequentially in the
+// graph by adding control dependencies.
+static Graph* ManyConsts(int num, bool sequential) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Node* prev = nullptr;
+ for (int i = 0; i < num; ++i) {
+ Tensor c(DT_FLOAT, TensorShape({}));
+ c.scalar<float>()() = i;
+ Node* curr = test::graph::Constant(g, c);
+ if (sequential && prev != nullptr) {
+ g->AddControlEdge(prev, curr);
+ }
+ prev = curr;
+ }
+ return g;
+}
+
+static void BM_ManyConsts_Parallel(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ test::Benchmark("cpu", ManyConsts(num, false /* !sequential */)).Run(iters);
+}
+BENCHMARK(BM_ManyConsts_Parallel)->Range(1, 1 << 10);
+
+static void BM_ManyConsts_Sequential(int iters, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ test::Benchmark("cpu", ManyConsts(num, true /* sequential */)).Run(iters);
+}
+BENCHMARK(BM_ManyConsts_Sequential)->Range(1, 1 << 10);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/control_flow_ops.cc b/tensorflow/core/kernels/control_flow_ops.cc
new file mode 100644
index 0000000000..bc44a7f7cc
--- /dev/null
+++ b/tensorflow/core/kernels/control_flow_ops.cc
@@ -0,0 +1,359 @@
+#include "tensorflow/core/kernels/control_flow_ops.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+// A switch op has two inputs and two outputs. It forwards the value of
+// Input:0 to the output specified by input:1. Input:1 is a boolean tensor.
+// Input:0 is forwarded to output:0 if input:1 is false, otherwise to
+// output:1.
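+//
+// For example, with input:0 = [1, 2, 3] and input:1 = true, output:1
+// receives [1, 2, 3] while output:0 produces no value for that step;
+// with input:1 = false the roles are swapped.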
+class SwitchOp : public OpKernel {
+ public:
+ explicit SwitchOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& outputPorts = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsScalar(outputPorts.shape()),
+ errors::InvalidArgument("The second input must be a scalar, "
+ "but it has shape ",
+ outputPorts.shape().ShortDebugString()));
+
+ bool pred = outputPorts.scalar<bool>()();
+ int port = (pred) ? 1 : 0;
+ if (IsRefType(context->input_dtype(0))) {
+ context->forward_ref_input_to_ref_output(0, port);
+ } else {
+ context->set_output(port, context->input(0));
+ }
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~SwitchOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SwitchOp);
+};
+
+#define REGISTER_CPU_SWITCH(type) \
+ REGISTER_KERNEL_BUILDER(Name("Switch") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("pred") \
+ .TypeConstraint<type>("T"), \
+ SwitchOp)
+
+#define REGISTER_CPU_REF_SWITCH(type) \
+ REGISTER_KERNEL_BUILDER(Name("RefSwitch") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("pred") \
+ .TypeConstraint<type>("T"), \
+ SwitchOp)
+
+#define REGISTER_GPU_SWITCH(type) \
+ REGISTER_KERNEL_BUILDER(Name("Switch") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("pred") \
+ .TypeConstraint<type>("T"), \
+ SwitchOp)
+
+#define REGISTER_GPU_REF_SWITCH(type) \
+ REGISTER_KERNEL_BUILDER(Name("RefSwitch") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("pred") \
+ .TypeConstraint<type>("T"), \
+ SwitchOp)
+
+TF_CALL_ALL_TYPES(REGISTER_CPU_SWITCH);
+TF_CALL_ALL_TYPES(REGISTER_CPU_REF_SWITCH);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_SWITCH);
+REGISTER_GPU_SWITCH(bool);
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_REF_SWITCH);
+REGISTER_GPU_REF_SWITCH(int32);
+REGISTER_GPU_REF_SWITCH(bool);
+
+#undef REGISTER_CPU_SWITCH
+#undef REGISTER_CPU_REF_SWITCH
+#undef REGISTER_GPU_SWITCH
+#undef REGISTER_GPU_REF_SWITCH
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Switch")
+ .Device(DEVICE_GPU)
+ .HostMemory("data")
+ .HostMemory("pred")
+ .HostMemory("output_false")
+ .HostMemory("output_true")
+ .TypeConstraint<int32>("T"),
+ SwitchOp);
+
+class RefSelectOp : public OpKernel {
+ public:
+ explicit RefSelectOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("N", &num_ref_inputs_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& index_tensor = context->input(0);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsScalar(index_tensor.shape()),
+ errors::InvalidArgument("Index must be a scalar, "
+ "but it has shape ",
+ index_tensor.shape().ShortDebugString()));
+
+ int32 index = index_tensor.scalar<int32>()();
+
+ OP_REQUIRES(context, index >= 0 && index < num_ref_inputs_,
+ errors::InvalidArgument("Index must be in the range [0, ",
+ num_ref_inputs_, ") but got ", index));
+ context->forward_ref_input_to_ref_output(index + 1, 0);
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~RefSelectOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RefSelectOp);
+
+ private:
+ int num_ref_inputs_;
+};
+
+#define REGISTER_CPU_REF_SELECT(type) \
+ REGISTER_KERNEL_BUILDER(Name("RefSelect") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("index") \
+ .TypeConstraint<type>("T"), \
+ RefSelectOp)
+TF_CALL_ALL_TYPES(REGISTER_CPU_REF_SELECT);
+
+#undef REGISTER_CPU_REF_SELECT
+
+// A merge op has n inputs and two outputs. It forwards the value of the
+// first input that becomes available to its first output, and the index
+// of that input to its second output.
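+//
+// For example, if input 2 is the first input to arrive with a value t,
+// the op forwards t to output:0 and writes the scalar 2 to output:1
+// (value_index).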
+class MergeOp : public OpKernel {
+ public:
+ explicit MergeOp(OpKernelConstruction* context) : OpKernel(context) {
+ const DataType dt = context->input_type(0);
+ const int num_in = context->num_inputs();
+ OP_REQUIRES_OK(context, context->MatchSignature(DataTypeVector(num_in, dt),
+ {dt, DT_INT32}));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ bool input_seen = false;
+ for (int i = 0; i < context->num_inputs(); ++i) {
+ if (context->has_input(i)) {
+ if (input_seen) {
+ context->SetStatus(errors::Internal(
+ "Merge can not have more than one valid input."));
+ return;
+ }
+ input_seen = true;
+
+ context->set_output(0, context->input(i));
+ Tensor* value_index = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape({}),
+ &value_index));
+ value_index->scalar<int32>()() = i;
+ }
+ }
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~MergeOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(MergeOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Merge").Device(DEVICE_CPU), MergeOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Merge") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("value_index"), \
+ MergeOp);
+
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Merge")
+ .Device(DEVICE_GPU)
+ .HostMemory("inputs")
+ .HostMemory("output")
+ .HostMemory("value_index")
+ .TypeConstraint<int32>("T"),
+ MergeOp);
+
+// An enter op has one input and one output. It creates or finds
+// the child frame that is uniquely identified by the frame_name,
+// and makes its input available to the child frame.
+class EnterOp : public OpKernel {
+ public:
+ explicit EnterOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ if (IsRefType(context->input_dtype(0))) {
+ context->forward_ref_input_to_ref_output(0, 0);
+ } else {
+ context->set_output(0, context->input(0));
+ }
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~EnterOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(EnterOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Enter").Device(DEVICE_CPU), EnterOp);
+REGISTER_KERNEL_BUILDER(Name("RefEnter").Device(DEVICE_CPU), EnterOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Enter").Device(DEVICE_GPU).TypeConstraint<type>("T"), EnterOp);
+#define REGISTER_GPU_REF_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RefEnter").Device(DEVICE_GPU).TypeConstraint<type>("T"), EnterOp);
+
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+TF_CALL_NUMBER_TYPES(REGISTER_GPU_REF_KERNEL);
+
+#undef REGISTER_GPU_KERNEL
+#undef REGISTER_GPU_REF_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Enter")
+ .Device(DEVICE_GPU)
+ .HostMemory("data")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ EnterOp);
+
+// An exit op has one input and one output. It exits the current
+// frame to its parent frame, and makes its input available to the
+// parent frame.
+class ExitOp : public OpKernel {
+ public:
+ explicit ExitOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ context->set_output(0, context->input(0));
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~ExitOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ExitOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Exit").Device(DEVICE_CPU), ExitOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Exit").Device(DEVICE_GPU).TypeConstraint<type>("T"), ExitOp);
+
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Exit")
+ .Device(DEVICE_GPU)
+ .HostMemory("data")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ ExitOp);
+
+// A next_iteration op has one input and one output. It makes its input
+// available to the next iteration.
+class NextIterationOp : public OpKernel {
+ public:
+ explicit NextIterationOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ context->set_output(0, context->input(0));
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~NextIterationOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(NextIterationOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("NextIteration").Device(DEVICE_CPU),
+ NextIterationOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("NextIteration").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ NextIterationOp);
+
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("NextIteration")
+ .Device(DEVICE_GPU)
+ .HostMemory("data")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ NextIterationOp);
+
+// A LoopCond op has one input and one output. The input is a boolean
+// scalar representing the taken branches of the "pivot" Switch that
+// determines loop termination. As a contract, any high-level front-end
+// should always use port '0' of the "pivot" switches for loop exit.
+class LoopCondOp : public OpKernel {
+ public:
+ explicit LoopCondOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ context->set_output(0, context->input(0));
+ }
+
+ bool IsExpensive() override { return false; }
+
+ ~LoopCondOp() override {}
+
+ TF_DISALLOW_COPY_AND_ASSIGN(LoopCondOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("LoopCond").Device(DEVICE_CPU), LoopCondOp);
+REGISTER_KERNEL_BUILDER(Name("LoopCond")
+ .Device(DEVICE_GPU)
+ .HostMemory("input")
+ .HostMemory("output"),
+ LoopCondOp);
+
+// ControlTrigger kernels
+REGISTER_KERNEL_BUILDER(Name("ControlTrigger").Device(DEVICE_CPU),
+ ControlTriggerOp);
+
+REGISTER_KERNEL_BUILDER(Name("ControlTrigger").Device(DEVICE_GPU),
+ ControlTriggerOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/control_flow_ops.h b/tensorflow/core/kernels/control_flow_ops.h
new file mode 100644
index 0000000000..184cc9fb63
--- /dev/null
+++ b/tensorflow/core/kernels/control_flow_ops.h
@@ -0,0 +1,22 @@
+#ifndef TENSORFLOW_KERNELS_CONTROL_FLOW_OPS_H_
+#define TENSORFLOW_KERNELS_CONTROL_FLOW_OPS_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+// A ControlTriggerOp is similar to a NoOp. However, it always treats the input
+// control edges as Live edges. Its primary use so far is in the scheduling of
+// recvs, where we add ControlTrigger nodes and use them to trigger recvs. We
+// allow ControlTrigger nodes to be enabled by dead nodes.
+class ControlTriggerOp : public OpKernel {
+ public:
+ explicit ControlTriggerOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+ void Compute(OpKernelContext* context) override {}
+ bool IsExpensive() override { return false; }
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CONTROL_FLOW_OPS_H_
diff --git a/tensorflow/core/kernels/control_flow_ops_test.cc b/tensorflow/core/kernels/control_flow_ops_test.cc
new file mode 100644
index 0000000000..52bc11abf0
--- /dev/null
+++ b/tensorflow/core/kernels/control_flow_ops_test.cc
@@ -0,0 +1,71 @@
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+// Tests for the switch op
+class SwitchOpTest : public OpsTestBase {
+ protected:
+ void Initialize(DataType dt) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("op", "Switch")
+ .Input(FakeInput(dt))
+ .Input(FakeInput())
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SwitchOpTest, Int32Success_6_s0) {
+ Initialize(DT_INT32);
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<bool>(TensorShape({}), {false});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+ EXPECT_EQ(nullptr, GetOutput(1));
+}
+
+TEST_F(SwitchOpTest, Int32Success_6_s1) {
+ Initialize(DT_INT32);
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<bool>(TensorShape({}), {true});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(1));
+ EXPECT_EQ(nullptr, GetOutput(0));
+}
+
+TEST_F(SwitchOpTest, Int32Success_2_3_s0) {
+ Initialize(DT_INT32);
+ AddInputFromArray<int32>(TensorShape({2, 3}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<bool>(TensorShape({}), {false});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({2, 3}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+ EXPECT_EQ(nullptr, GetOutput(1));
+}
+
+TEST_F(SwitchOpTest, StringSuccess_s1) {
+ Initialize(DT_STRING);
+ AddInputFromArray<string>(TensorShape({6}), {"A", "b", "C", "d", "E", "f"});
+ AddInputFromArray<bool>(TensorShape({}), {true});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_STRING, TensorShape({6}));
+ test::FillValues<string>(&expected, {"A", "b", "C", "d", "E", "f"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(1));
+ EXPECT_EQ(nullptr, GetOutput(0));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/conv_2d.h b/tensorflow/core/kernels/conv_2d.h
new file mode 100644
index 0000000000..2fb623244c
--- /dev/null
+++ b/tensorflow/core/kernels/conv_2d.h
@@ -0,0 +1,127 @@
+#ifndef TENSORFLOW_KERNELS_CONV_2D_H_
+#define TENSORFLOW_KERNELS_CONV_2D_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// TODO(yangke): revisit these operations and in particular, see if we can
+// combine all of them into just one operation without causing nvcc to
+// timeout.
+template <typename Device, typename T, int Dims>
+struct ShuffleAndReverse {
+ void operator()(const Device& d, typename TTypes<T, Dims>::ConstTensor input,
+ const Eigen::DSizes<Eigen::DenseIndex, Dims>& order,
+ const Eigen::array<bool, Dims>& reverse_dims,
+ typename TTypes<T, Dims>::Tensor output) {
+ output.device(d) = input.shuffle(order).reverse(reverse_dims);
+ }
+};
+
+template <typename Device, typename T, int Dims>
+struct InflatePadAndShuffle {
+ void operator()(
+ const Device& d, typename TTypes<T, Dims>::ConstTensor input,
+ const Eigen::DSizes<Eigen::DenseIndex, Dims>& strides,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, Dims>& pad_dims,
+ const Eigen::DSizes<Eigen::DenseIndex, Dims>& order,
+ typename TTypes<T, Dims>::Tensor output) {
+ output.device(d) = input.inflate(strides).pad(pad_dims).shuffle(order);
+ }
+};
+
+template <typename Device, typename Input, typename Filter, typename Output>
+void SpatialConvolutionFunc(const Device& d, Output output, Input input,
+ Filter filter, int stride,
+ const Eigen::PaddingType& padding) {
+ output.device(d) = Eigen::SpatialConvolution(input, filter, stride, padding);
+}
+
+template <typename Device, typename T>
+struct SpatialConvolution {
+ void operator()(const Device& d, typename TTypes<T, 4>::Tensor output,
+ typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<T, 4>::ConstTensor filter, int stride,
+ const Eigen::PaddingType& padding) {
+ SpatialConvolutionFunc(d, output, input, filter, stride, padding);
+ }
+};
+
+template <typename Device, typename T>
+struct SpatialConvolutionBackwardInput {
+ void operator()(const Device& d, typename TTypes<T, 4>::Tensor input_backward,
+ typename TTypes<T, 4>::ConstTensor kernel,
+ typename TTypes<T, 4>::ConstTensor output_backward,
+ int input_rows, int input_cols, int stride) {
+ input_backward.device(d) = Eigen::SpatialConvolutionBackwardInput(
+ kernel, output_backward, input_rows, input_cols, stride);
+ }
+};
+
+template <typename Device, typename T>
+struct SpatialConvolutionBackwardKernel {
+ void operator()(const Device& d,
+ typename TTypes<T, 4>::Tensor kernel_backward,
+ typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<T, 4>::ConstTensor output_backward,
+ int kernel_rows, int kernel_cols, int stride) {
+ kernel_backward.device(d) = Eigen::SpatialConvolutionBackwardKernel(
+ input, output_backward, kernel_rows, kernel_cols, stride);
+ }
+};
+
+// TODO(vrv): Figure out how to use the MatMulFunctor in matmul_op.h.
+// My initial attempt to do this compiled but failed in the pytest
+// due to a swigdeps error.
+template <typename Device, typename T>
+struct MatMulConvFunctor {
+ // Computes on device "d": out = in0 * in1, where * is matrix
+ // multiplication.
+ void operator()(
+ const Device& d, typename TTypes<T, 2>::Tensor out,
+ typename TTypes<T, 2>::ConstTensor in0,
+ typename TTypes<T, 2>::ConstTensor in1,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair) {
+ out.device(d) = in0.contract(in1, dim_pair);
+ }
+};
+
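+// TransformFilter reorders a filter from TensorFlow's (R x C x ID x OD)
+// layout to (OD x ID x R x C). With Eigen's shuffle semantics (output
+// dimension i takes input dimension shuffle[i]), the permutation
+// (3, 2, 0, 1) yields exactly that reordering; conv_grad_ops.cc relies on
+// this to hand filters to cuDNN.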
+template <typename Device, typename T>
+struct TransformFilter {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor in,
+ typename TTypes<T, 4>::Tensor out) {
+ out.device(d) = in.shuffle(Eigen::DSizes<Eigen::DenseIndex, 4>(3, 2, 0, 1));
+ }
+};
+
+template <typename Device, typename T>
+struct TransformDepth {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor in,
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& shuffle,
+ typename TTypes<T, 4>::Tensor out) {
+ out.device(d) = in.shuffle(shuffle);
+ }
+};
+
+template <typename Device, typename T>
+struct PadInput {
+ void operator()(const Device& d, typename TTypes<T, 4>::ConstTensor in,
+ int padding_rows_left, int padding_rows_right,
+ int padding_cols_left, int padding_cols_right,
+ typename TTypes<T, 4>::Tensor out) {
+ Eigen::array<std::pair<ptrdiff_t, ptrdiff_t>, 4> padding;
+ padding[0] = std::make_pair(0, 0);
+ padding[1] = std::make_pair(padding_rows_left, padding_rows_right);
+ padding[2] = std::make_pair(padding_cols_left, padding_cols_right);
+ padding[3] = std::make_pair(0, 0);
+ out.device(d) = in.pad(padding);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CONV_2D_H_
diff --git a/tensorflow/core/kernels/conv_grad_ops.cc b/tensorflow/core/kernels/conv_grad_ops.cc
new file mode 100644
index 0000000000..bb21d7003c
--- /dev/null
+++ b/tensorflow/core/kernels/conv_grad_ops.cc
@@ -0,0 +1,1190 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define USE_EIGEN_TENSOR
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/kernels/conv_2d.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/util/use_cudnn.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// The operation to compute Conv2D gradients.
+//
+// To compute the gradients for Conv2D, we need three input tensors:
+// the input, the filter, and the backprop for the output.
+// We need to compute two backprops: one for the input and one for the
+// filter. We compute them in two different kernels.
+
+// Both backprops can be computed as straightforward conv2d.
+//
+// Consider a case where the input is 3x3 and the filter is 2x1:
+//
+// INPUT = [ A B C ]
+// [ D E F ]
+// [ G H I ]
+//
+// where each "A", "B", etc is batch x in_depth
+//
+// FILTER = [ X Y ]
+//
+// where both "X" and "Y" are in_depth x out_depth
+//
+// With VALID padding, the output is 3x2:
+//
+// OUTPUT = [ a b ]
+// [ c d ]
+// [ e f ]
+//
+// where each "a", "b", etc is batch x out_depth
+//
+// So we have:
+//
+// a = A * X + B * Y
+// b = B * X + C * Y
+// c = D * X + E * Y
+// d = E * X + F * Y
+// e = G * X + H * Y
+// f = H * X + I * Y
+//
+// So when we have backprops for the outputs (we denote them by
+// a', b', ... ):
+//
+// The backprops for the input are:
+//
+// A' = a' * X^t
+// B' = a' * Y^t + b' * X^t
+// C' = b' * Y^t
+// ...
+//
+// This is essentially computing a 2d conv of
+//
+// INPUT = [ 0 a' b' 0 ]
+// [ 0 c' d' 0 ]
+// [ 0 e' f' 0 ]
+// and
+//
+// FILTER = [ Y^t X^t ]
+//
+// The backprops for the filter are:
+//
+// X' = A^t * a' + B^t * b' + D^t * c' + E^t * d' + G^t * e' + H^t * f'
+// Y' = B^t * a' + C^t * b' + E^t * c' + F^t * d' + H^t * e' + I^t * f'
+//
+// This is essentially computing a 2d conv of
+//
+// INPUT = [ A^t B^t C^t ]
+// [ D^t E^t F^t ]
+// [ G^t H^t I^t ]
+//
+// and
+//
+// FILTER = [ a' b' ]
+// [ c' d' ]
+// [ e' f' ]
+//
+//
+//////////////////////////////////////////////////////////
+//
+// With a stride of more than one, it's a bit more complicated (we will need
+// to insert holes into the backprop).
+//
+// Consider the case where
+//
+// INPUT = [ A B C D E ]
+// [ F G H I J ]
+// [ K L M N O ]
+// and
+//
+// FILTER = [ X Y Z ]
+//
+// with stride 2.
+//
+// The output will be
+//
+// OUTPUT = [ a b ]
+// [ c d ]
+//
+// where:
+//
+// a = A * X + B * Y + C * Z
+// b = C * X + D * Y + E * Z
+// c = K * X + L * Y + M * Z
+// d = M * X + N * Y + O * Z
+//
+//
+// To compute the backprop for INPUT, we need to convolve
+//
+// INPUT = [ 0 0 a' 0 b' 0 0 ]
+// [ 0 0 0 0 0 0 0 ]
+// [ 0 0 c' 0 d' 0 0 ]
+//
+// (notice the holes in INPUT)
+//
+// and
+//
+// FILTER = [ Z^t Y^t X^t ]
+//
+// with stride 1.
+//
+// To compute the backprop for FILTER, we need to convolve
+//
+// INPUT = [ A^t B^t C^t D^t E^t ]
+// [ F^t G^t H^t I^t J^t ]
+// [ K^t L^t M^t N^t O^t ]
+// and
+//
+// FILTER = [ a' 0 b' ]
+// [ 0 0 0 ]
+// [ c' 0 d' ]
+//
+// (notice the holes in FILTER)
+//
+//
+// with stride 1
+//
+//////////////////////////////////////////////////////////
+//
+//
+// The case for SAME padding is in fact very similar to VALID -- we just
+// need to pad the input tensor a bit when computing the filter_backprop.
+
+// Common code between the two kernels: verifies that the dimensions all match
+// and extracts the padded rows and columns.
+#define EXTRACT_AND_VERIFY_DIMENSIONS(label) \
+ const Tensor& out_backprop = context->input(2); \
+ OP_REQUIRES( \
+ context, input_shape.dims() == 4, \
+ errors::InvalidArgument(label, ": input must be 4-dimensional")); \
+ OP_REQUIRES( \
+ context, filter_shape.dims() == 4, \
+ errors::InvalidArgument(label, ": filter must be 4-dimensional")); \
+ OP_REQUIRES( \
+ context, out_backprop.dims() == 4, \
+ errors::InvalidArgument(label, ": out_backprop must be 4-dimensional")); \
+ const int64 batch = input_shape.dim_size(0); \
+ OP_REQUIRES( \
+ context, batch == out_backprop.dim_size(0), \
+ errors::InvalidArgument( \
+ label, ": input and out_backprop must have the same batch size")); \
+ const int64 input_rows = input_shape.dim_size(1); \
+ const int64 input_cols = input_shape.dim_size(2); \
+ const int64 filter_rows = filter_shape.dim_size(0); \
+ const int64 filter_cols = filter_shape.dim_size(1); \
+ const int64 output_rows = out_backprop.dim_size(1); \
+ const int64 output_cols = out_backprop.dim_size(2); \
+ const int64 in_depth = input_shape.dim_size(3); \
+ OP_REQUIRES(context, in_depth == filter_shape.dim_size(2), \
+ errors::InvalidArgument( \
+ label, ": input and filter must have the same depth")); \
+ const int64 out_depth = filter_shape.dim_size(3); \
+ OP_REQUIRES( \
+ context, out_depth == out_backprop.dim_size(3), \
+ errors::InvalidArgument( \
+ label, ": filter and out_backprop must have the same out_depth")); \
+ const auto stride = strides_[1]; \
+ int out_rows = 0, out_cols = 0, pad_rows = 0, pad_cols = 0; \
+ if (filter_cols == filter_rows && filter_rows == 1 && stride == 1) { \
+ out_rows = input_rows; \
+ out_cols = input_cols; \
+ } else { \
+ OP_REQUIRES_OK( \
+ context, Get2dOutputSize(input_rows, input_cols, filter_rows, \
+ filter_cols, stride, stride, padding_, \
+ &out_rows, &out_cols, &pad_rows, &pad_cols)); \
+ } \
+ OP_REQUIRES( \
+ context, output_rows == out_rows, \
+ errors::InvalidArgument( \
+ label, ": Number of rows of out_backprop doesn't match computed: ", \
+ "actual = ", output_rows, ", computed = ", out_rows)); \
+ OP_REQUIRES( \
+ context, output_cols == out_cols, \
+ errors::InvalidArgument( \
+ label, ": Number of cols of out_backprop doesn't match computed: ", \
+ "actual = ", output_cols, ", computed = ", out_cols)); \
+ const auto expanded_out_rows = (output_rows - 1) * stride + 1; \
+ const auto expanded_out_cols = (output_cols - 1) * stride + 1; \
+ const auto padded_out_rows = input_rows + filter_rows - 1; \
+ const auto padded_out_cols = input_cols + filter_cols - 1; \
+ const auto top_pad_rows = filter_rows - 1 - pad_rows; \
+ const auto left_pad_cols = filter_cols - 1 - pad_cols; \
+ const auto bottom_pad_rows = \
+ padded_out_rows - expanded_out_rows - top_pad_rows; \
+ const auto right_pad_cols = \
+ padded_out_cols - expanded_out_cols - left_pad_cols; \
+ Eigen::DSizes<Eigen::DenseIndex, 4> strides{1, stride, stride, 1}; \
+ VLOG(2) << "Conv2d: " << label \
+ << ": expanded_out_rows = " << expanded_out_rows \
+ << ", expanded_out_cols = " << expanded_out_cols \
+ << ", filter_rows = " << filter_rows \
+ << ", filter_cols = " << filter_cols \
+ << ", padded_out_rows = " << padded_out_rows \
+ << ", padded_out_cols = " << padded_out_cols \
+ << ", top_pad_rows = " << top_pad_rows \
+ << ", left_pad_cols = " << left_pad_cols \
+ << ", bottom_pad_rows = " << bottom_pad_rows \
+ << ", right_pad_cols = " << right_pad_cols \
+ << ", strides = " << strides[1]
+
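+// As a sanity check, plugging the stride-2 example above (3 x 5 input,
+// 1 x 3 filter, VALID padding) into these formulas gives out_rows = 2,
+// out_cols = 2, expanded_out_rows = (2 - 1) * 2 + 1 = 3,
+// expanded_out_cols = 3, padded_out_rows = 3 + 1 - 1 = 3,
+// padded_out_cols = 5 + 3 - 1 = 7, top/bottom_pad_rows = 0 and
+// left/right_pad_cols = 2, which is exactly the 3 x 7 "input with holes"
+// [ 0 0 a' 0 b' 0 0 ] pictured in the stride-2 derivation above.
+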
+namespace {
+TensorShape VectorToShape(const TTypes<int32>::ConstVec& sizes) {
+ TensorShape shape;
+
+ using Index = TTypes<int32>::ConstVec::Index;
+ const Index dims = sizes.size();
+ for (Index i = 0; i < dims; ++i) {
+ shape.AddDim(sizes(i));
+ }
+
+ return shape;
+}
+} // namespace
+
+// The fast versions use Eigen computations directly. They are only enabled
+// for CPU for now since nvcc times out when trying to compile them.
+// TODO(yangke): enable them for GPUs when we have a faster compiler.
+
+template <typename Device, class T>
+class Conv2DFastBackpropInputOp : public OpKernel {
+ public:
+ explicit Conv2DFastBackpropInputOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument(
+ "Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input_sizes = context->input(0);
+ const Tensor& filter = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(input_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DBackpropInput: input_sizes input must be 1-dim, not ",
+ input_sizes.dims()));
+ TensorShape input_shape = VectorToShape(input_sizes.vec<int32>());
+ const TensorShape& filter_shape = filter.shape();
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DBackpropInput");
+ Tensor* in_backprop = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input_shape, &in_backprop));
+ // Need to flip the input_rows and input_cols when passing to eigen.
+ functor::SpatialConvolutionBackwardInput<Device, T>()(
+ context->eigen_device<Device>(), in_backprop->tensor<T, 4>(),
+ filter.tensor<T, 4>(), out_backprop.tensor<T, 4>(), input_cols,
+ input_rows, stride);
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DFastBackpropInputOp);
+};
+
+// Based on an implementation written by Yangqing Jia (jiayq).
+template <typename Device, class T>
+class Conv2DCustomBackpropInputOp : public OpKernel {
+ public:
+ explicit Conv2DCustomBackpropInputOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument("Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(
+ context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument("Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input_sizes = context->input(0);
+ const Tensor& filter = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(input_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DBackpropInput: input_sizes input must be 1-dim, not ",
+ input_sizes.dims()));
+ TensorShape input_shape = VectorToShape(input_sizes.vec<int32>());
+ const TensorShape& filter_shape = filter.shape();
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DBackpropInput");
+ Tensor* in_backprop = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input_shape, &in_backprop));
+
+ // TODO(andydavis) Consider moving code shared with
+ // Conv2DCustomBackpropFilterOp into a shared helper function.
+ int pad_top;
+ int pad_bottom;
+ int pad_left;
+ int pad_right;
+ OP_REQUIRES_OK(
+ context,
+ Get2dOutputSizeVerbose(input_rows, input_cols, filter_rows, filter_cols,
+ stride, stride, padding_, &out_rows, &out_cols,
+ &pad_top, &pad_bottom, &pad_left, &pad_right));
+
+ // The total dimension size of each kernel.
+ const int filter_total_size = filter_rows * filter_cols * in_depth;
+ // The output image size is the spatial size of the output.
+ const int output_image_size = out_rows * out_cols;
+
+ Tensor col_buffer;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({output_image_size, filter_total_size}), &col_buffer));
+
+ // The input offset corresponding to a single input image.
+ const int input_offset = input_rows * input_cols * in_depth;
+ // The output offset corresponding to a single output image.
+ const int output_offset = out_rows * out_cols * out_depth;
+
+ auto* filter_data = filter.template flat<T>().data();
+ auto* col_buffer_data = col_buffer.template flat<T>().data();
+ auto* out_backprop_data = out_backprop.template flat<T>().data();
+ auto* input_backprop_data = in_backprop->template flat<T>().data();
+
+ typedef Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> MatrixMap;
+ typedef Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> ConstMatrixMap;
+
+ for (int image_id = 0; image_id < batch; ++image_id) {
+ // Compute gradient into col_buffer.
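+ // Per image: A maps the out_backprop slice as a
+ // [out_rows * out_cols, out_depth] matrix and B maps the filter as a
+ // [filter_rows * filter_cols * in_depth, out_depth] matrix, so
+ // C = A * B^T is the backprop into the im2col (patch) representation of
+ // the input; Col2im then folds the overlapping patch gradients back into
+ // in_backprop.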
+ MatrixMap C(col_buffer_data, output_image_size, filter_total_size);
+
+ ConstMatrixMap A(out_backprop_data + output_offset * image_id,
+ output_image_size, out_depth);
+ ConstMatrixMap B(filter_data, filter_total_size, out_depth);
+
+ // TODO(andydavis) Use a multi-threaded matmul implementation here.
+ C.noalias() = A * B.transpose();
+
+ Col2im<T>(col_buffer_data, in_depth, input_rows, input_cols, filter_rows,
+ filter_cols, pad_top, pad_left, pad_bottom, pad_right, stride,
+ stride, input_backprop_data);
+
+ input_backprop_data += input_offset;
+ }
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DCustomBackpropInputOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropInput")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ Conv2DCustomBackpropInputOp<CPUDevice, float>);
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropInput")
+ .Device(DEVICE_CPU)
+ .Label("custom")
+ .TypeConstraint<float>("T"),
+ Conv2DCustomBackpropInputOp<CPUDevice, float>);
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropInput")
+ .Device(DEVICE_CPU)
+ .Label("eigen_tensor")
+ .TypeConstraint<float>("T"),
+ Conv2DFastBackpropInputOp<CPUDevice, float>);
+
+template <typename Device, class T>
+class Conv2DFastBackpropFilterOp : public OpKernel {
+ public:
+ explicit Conv2DFastBackpropFilterOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument(
+ "Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& filter_sizes = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(filter_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DBackpropFilter: filter_sizes input must be 1-dim, not ",
+ filter_sizes.dims()));
+ const TensorShape& input_shape = input.shape();
+ TensorShape filter_shape = VectorToShape(filter_sizes.vec<int32>());
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DBackpropFilter");
+ Tensor* filter_backprop = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, filter_shape, &filter_backprop));
+
+ // Need to flip the filter_rows and filter_cols when passing to eigen.
+ functor::SpatialConvolutionBackwardKernel<Device, T>()(
+ context->eigen_device<Device>(), filter_backprop->tensor<T, 4>(),
+ input.tensor<T, 4>(), out_backprop.tensor<T, 4>(), filter_cols,
+ filter_rows, stride);
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DFastBackpropFilterOp);
+};
+
+// Based on an implementation written by Yangqing Jia (jiayq).
+template <typename Device, class T>
+class Conv2DCustomBackpropFilterOp : public OpKernel {
+ public:
+ explicit Conv2DCustomBackpropFilterOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument("Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(
+ context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument("Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& filter_sizes = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(filter_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DCustomBackpropFilter: filter_sizes input must be 1-dim, "
+ "not ",
+ filter_sizes.dims()));
+ const TensorShape& input_shape = input.shape();
+ TensorShape filter_shape = VectorToShape(filter_sizes.vec<int32>());
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DCustomBackpropFilter");
+ Tensor* filter_backprop;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, filter_shape, &filter_backprop));
+
+ int pad_top;
+ int pad_bottom;
+ int pad_left;
+ int pad_right;
+ OP_REQUIRES_OK(
+ context,
+ Get2dOutputSizeVerbose(input_rows, input_cols, filter_rows, filter_cols,
+ stride, stride, padding_, &out_rows, &out_cols,
+ &pad_top, &pad_bottom, &pad_left, &pad_right));
+
+ // The total dimension size of each kernel.
+ const int filter_total_size = filter_rows * filter_cols * in_depth;
+ // The output image size is the spatial size of the output.
+ const int output_image_size = out_rows * out_cols;
+
+ Tensor col_buffer;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({output_image_size, filter_total_size}), &col_buffer));
+
+ // The input offset corresponding to a single input image.
+ const int input_offset = input_rows * input_cols * in_depth;
+ // The output offset corresponding to a single output image.
+ const int output_offset = out_rows * out_cols * out_depth;
+
+ auto* input_data = input.template flat<T>().data();
+ auto* col_buffer_data = col_buffer.template flat<T>().data();
+ auto* out_backprop_data = out_backprop.template flat<T>().data();
+ auto* filter_backprop_data = filter_backprop->template flat<T>().data();
+
+ typedef Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> MatrixMap;
+ typedef Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> ConstMatrixMap;
+
+ MatrixMap C(filter_backprop_data, filter_total_size, out_depth);
+
+ C.setZero();
+ for (int image_id = 0; image_id < batch; ++image_id) {
+ // When we compute the gradient with respect to the filters, we need to do
+ // im2col to allow gemm-type computation.
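+ // Per image: Im2col lays the input patches out as a
+ // [out_rows * out_cols, filter_rows * filter_cols * in_depth] matrix A,
+ // B maps the out_backprop slice as [out_rows * out_cols, out_depth], and
+ // the filter gradient accumulates C += A^T * B over all images, ending up
+ // with shape [filter_rows * filter_cols * in_depth, out_depth].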
+ Im2col<T>(input_data, in_depth, input_rows, input_cols, filter_rows,
+ filter_cols, pad_top, pad_left, pad_bottom, pad_right, stride,
+ stride, col_buffer_data);
+
+ ConstMatrixMap A(col_buffer_data, output_image_size, filter_total_size);
+ ConstMatrixMap B(out_backprop_data + output_offset * image_id,
+ output_image_size, out_depth);
+
+ // Compute gradient with respect to filter.
+ // TODO(andydavis) Use a multi-threaded matmul implementation here.
+ C.noalias() += A.transpose() * B;
+
+ input_data += input_offset;
+ }
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DCustomBackpropFilterOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropFilter")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ Conv2DCustomBackpropFilterOp<CPUDevice, float>);
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropFilter")
+ .Device(DEVICE_CPU)
+ .Label("custom")
+ .TypeConstraint<float>("T"),
+ Conv2DCustomBackpropFilterOp<CPUDevice, float>);
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropFilter")
+ .Device(DEVICE_CPU)
+ .Label("eigen_tensor")
+ .TypeConstraint<float>("T"),
+ Conv2DFastBackpropFilterOp<CPUDevice, float>);
+
+// GPU definitions of both ops.
+#if GOOGLE_CUDA
+namespace {
+template <typename T>
+perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory,
+ uint64 size) {
+ perftools::gputools::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory),
+ size * sizeof(T));
+ perftools::gputools::DeviceMemory<T> typed(wrapped);
+ return typed;
+}
+} // namespace
+
+// The slow version (but compiles for GPU)
+
+// Backprop for input.
+template <typename Device, class T>
+class Conv2DSlowBackpropInputOp : public OpKernel {
+ public:
+ explicit Conv2DSlowBackpropInputOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument(
+ "Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("use_cudnn_on_gpu", &use_cudnn_));
+ use_cudnn_ &= CanUseCudnn();
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input_sizes = context->input(0);
+ const Tensor& filter = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(input_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DBackpropInput: input_sizes input must be 1-dim, not ",
+ input_sizes.dims()));
+ TensorShape input_shape = VectorToShape(input_sizes.vec<int32>());
+ const TensorShape& filter_shape = filter.shape();
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DBackpropInput");
+ Tensor* in_backprop = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input_shape, &in_backprop));
+
+ const int padding_rows =
+ (output_rows - 1) * stride + filter_rows - input_rows;
+ const int padding_cols =
+ (output_cols - 1) * stride + filter_cols - input_cols;
+
+ // TODO(keveman): cuDNN only supports equal padding on both sides, so we
+ // only call it when that is true. Remove this check when (if?) cuDNN starts
+ // supporting different padding.
+ bool padding_compatible =
+ (padding_rows % 2 == 0) && (padding_cols % 2 == 0);
+
+ auto* stream = context->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(context, stream, errors::Internal("No GPU stream available."));
+
+ if (use_cudnn_ && padding_compatible) {
+ if (filter_rows == 1 && filter_cols == 1 && stride == 1) {
+ // 1x1 filter, so call cublas directly.
+ const uint64 m = batch * input_rows * input_cols;
+ const uint64 k = out_depth;
+ const uint64 n = in_depth;
+
+ auto a_ptr = AsDeviceMemory(out_backprop.template flat<T>().data(),
+ out_backprop.template flat<T>().size());
+ auto b_ptr = AsDeviceMemory(filter.template flat<T>().data(),
+ filter.template flat<T>().size());
+ auto c_ptr = AsDeviceMemory(in_backprop->template flat<T>().data(),
+ in_backprop->template flat<T>().size());
+
+ auto transpose = perftools::gputools::blas::Transpose::kTranspose;
+ auto no_transpose = perftools::gputools::blas::Transpose::kNoTranspose;
+
+ bool blas_launch_status =
+ stream->ThenBlasGemm(transpose, no_transpose, n, m, k, 1.0f, b_ptr,
+ k, a_ptr, k, 0.0f, &c_ptr, n)
+ .ok();
+ if (!blas_launch_status) {
+ context->SetStatus(errors::Internal("Blas SGEMM launch failed : m=",
+ m, ", n=", n, ", k=", k));
+ }
+ return;
+ }
+
+ perftools::gputools::dnn::BatchDescriptor input_desc;
+ input_desc.set_count(batch)
+ .set_height(input_rows)
+ .set_width(input_cols)
+ .set_feature_map_count(in_depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+ perftools::gputools::dnn::BatchDescriptor output_desc;
+ output_desc.set_count(batch)
+ .set_height(output_rows)
+ .set_width(output_cols)
+ .set_feature_map_count(out_depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+ perftools::gputools::dnn::FilterDescriptor filter_desc;
+ filter_desc.set_input_filter_height(filter_rows)
+ .set_input_filter_width(filter_cols)
+ .set_input_feature_map_count(in_depth)
+ .set_output_feature_map_count(out_depth);
+ perftools::gputools::dnn::ConvolutionDescriptor conv_desc;
+ conv_desc.set_vertical_filter_stride(stride)
+ .set_horizontal_filter_stride(stride)
+ .set_zero_padding_height(padding_rows / 2)
+ .set_zero_padding_width(padding_cols / 2);
+
+ // NOTE(keveman):
+ // cuDNN only supports the following layouts :
+ // Input : B x D x R x C
+ // Filter : OD x ID x R x C
+ // Whereas, we have
+ // Input : B x R x C x D
+ // Filter : R x C x ID x OD
+ // TransformFilter performs (R x C x ID x OD) => (OD x ID x R x C)
+ // The first TransformDepth performs
+ // (B x R x C x D) => (B x D x R x C).
+ // Since the tensor returned from cuDNN is B x D x R x C also,
+ // the second TransformDepth performs
+ // (B x D x R x C) => (B x R x C x D).
+ Tensor transformed_filter;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({out_depth, in_depth, filter_rows, filter_cols}),
+ &transformed_filter));
+
+ functor::TransformFilter<Device, T>()(context->eigen_device<Device>(),
+ filter.tensor<T, 4>(),
+ transformed_filter.tensor<T, 4>());
+
+ Tensor transformed_out_backprop;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({batch, out_depth, output_rows, output_cols}),
+ &transformed_out_backprop));
+
+ functor::TransformDepth<Device, T>()(
+ context->eigen_device<Device>(), out_backprop.tensor<T, 4>(),
+ Eigen::DSizes<Eigen::DenseIndex, 4>(0, 3, 1, 2),
+ transformed_out_backprop.tensor<T, 4>());
+
+ Tensor pre_transformed_in_backprop;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({batch, in_depth, input_rows, input_cols}),
+ &pre_transformed_in_backprop));
+
+ auto out_backprop_ptr =
+ AsDeviceMemory(transformed_out_backprop.template flat<T>().data(),
+ transformed_out_backprop.template flat<T>().size());
+ auto filter_ptr =
+ AsDeviceMemory(transformed_filter.template flat<T>().data(),
+ transformed_filter.template flat<T>().size());
+ auto in_backprop_ptr =
+ AsDeviceMemory(pre_transformed_in_backprop.template flat<T>().data(),
+ pre_transformed_in_backprop.template flat<T>().size());
+
+ bool cudnn_launch_status =
+ stream->ThenConvolveBackwardData(filter_desc, filter_ptr, output_desc,
+ out_backprop_ptr, conv_desc,
+ input_desc, &in_backprop_ptr)
+ .ok();
+
+ if (!cudnn_launch_status) {
+ context->SetStatus(errors::Internal(
+ "cuDNN Backward Data function launch failure : input shape(",
+ input_shape.DebugString(), ") filter shape(",
+ filter_shape.DebugString(), ")"));
+ }
+
+ auto toConstTensor = [](const Tensor& x) -> const Tensor { return x; };
+ functor::TransformDepth<Device, T>()(
+ context->eigen_device<Device>(),
+ toConstTensor(pre_transformed_in_backprop).template tensor<T, 4>(),
+ Eigen::DSizes<Eigen::DenseIndex, 4>(0, 2, 3, 1),
+ in_backprop->tensor<T, 4>());
+ } else {
+ // We fill out a padded out_backprop
+ TensorShape padded_out_shape(
+ {batch, padded_out_rows, padded_out_cols, out_depth});
+ Tensor padded_output;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(DataTypeToEnum<T>::v(),
+ padded_out_shape, &padded_output));
+
+ Eigen::DSizes<Eigen::DenseIndex, 4> trivial_order{0, 1, 2, 3};
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 4> pad_dims{
+ {{0, 0},
+ {top_pad_rows, bottom_pad_rows},
+ {left_pad_cols, right_pad_cols},
+ {0, 0}}};
+
+ functor::InflatePadAndShuffle<Device, T, 4>()(
+ context->eigen_device<Device>(), out_backprop.tensor<T, 4>(), strides,
+ pad_dims, trivial_order, padded_output.tensor<T, 4>());
+ const Tensor& padded_output_cref = padded_output;
+
+ // We then need to fill a new "reverted" filter: we transpose the in_depth
+ // and out_depth dimensions of the filter and reverse its rows and cols.
+ TensorShape r_filter_shape(
+ {filter_rows, filter_cols, out_depth, in_depth});
+ Tensor r_filter;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(DataTypeToEnum<T>::v(),
+ r_filter_shape, &r_filter));
+
+ Eigen::DSizes<Eigen::DenseIndex, 4> filter_order{0, 1, 3, 2};
+ Eigen::array<bool, 4> filter_rev_dims{true, true, false, false};
+ functor::ShuffleAndReverse<Device, T, 4>()(
+ context->eigen_device<Device>(), filter.tensor<T, 4>(), filter_order,
+ filter_rev_dims, r_filter.tensor<T, 4>());
+ const Tensor& r_filter_cref = r_filter;
+
+ // Now we can call conv_2d directly.
+ functor::SpatialConvolution<Device, T>()(
+ context->eigen_device<Device>(), in_backprop->tensor<T, 4>(),
+ padded_output_cref.tensor<T, 4>(), r_filter_cref.tensor<T, 4>(), 1,
+ BrainPadding2EigenPadding(VALID));
+ }
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+ bool use_cudnn_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DSlowBackpropInputOp);
+};
+
+// Backprop for filter.
+template <typename Device, class T>
+class Conv2DSlowBackpropFilterOp : public OpKernel {
+ public:
+ explicit Conv2DSlowBackpropFilterOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument(
+ "Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("use_cudnn_on_gpu", &use_cudnn_));
+ use_cudnn_ &= CanUseCudnn();
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& filter_sizes = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(filter_sizes.shape()),
+ errors::InvalidArgument(
+ "Conv2DBackpropFilter: filter_sizes input must be 1-dim, not ",
+ filter_sizes.dims()));
+ const TensorShape& input_shape = input.shape();
+ TensorShape filter_shape = VectorToShape(filter_sizes.vec<int32>());
+
+ EXTRACT_AND_VERIFY_DIMENSIONS("Conv2DBackpropFilter");
+ Tensor* filter_backprop = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, filter_shape, &filter_backprop));
+
+ const int padding_rows =
+ (output_rows - 1) * stride + filter_rows - input_rows;
+ const int padding_cols =
+ (output_cols - 1) * stride + filter_cols - input_cols;
+
+ // TODO(zhengxq): cuDNN only supports equal padding on both sides, so we
+ // only call it when that is true. Remove this check when (if?) cuDNN starts
+ // supporting different padding.
+ bool padding_compatible =
+ (padding_rows % 2 == 0) && (padding_cols % 2 == 0);
+
+ auto* stream = context->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(context, stream, errors::Internal("No GPU stream available."));
+
+ if (use_cudnn_ && padding_compatible) {
+ if (filter_rows == 1 && filter_cols == 1 && stride == 1) {
+ const uint64 m = in_depth;
+ const uint64 k = batch * input_rows * input_cols;
+ const uint64 n = out_depth;
+
+ // The shape of output backprop is
+ // [batch, out_rows, out_cols, out_depth]
+ // From cublas's perspective, it is: n x k
+ auto a_ptr = AsDeviceMemory(out_backprop.template flat<T>().data(),
+ out_backprop.template flat<T>().size());
+
+ // The shape of input is
+ // [batch, in_rows, in_cols, in_depth],
+ // From cublas's perspective, it is: m x k
+ auto b_ptr = AsDeviceMemory(input.template flat<T>().data(),
+ input.template flat<T>().size());
+
+ // the shape of the filter backprop from the conv_2d should be
+ // [1, 1, in_depth, out_depth]
+ // From cublas's perspective, it is: n x m
+ auto c_ptr = AsDeviceMemory(filter_backprop->template flat<T>().data(),
+ filter_backprop->template flat<T>().size());
+
+ bool blas_launch_status =
+ stream->ThenBlasGemm(
+ perftools::gputools::blas::Transpose::kNoTranspose,
+ perftools::gputools::blas::Transpose::kTranspose, n, m, k,
+ 1.0f, a_ptr, n, b_ptr, m, 0.0f, &c_ptr, n)
+ .ok();
+ if (!blas_launch_status) {
+ context->SetStatus(errors::Internal("Blas SGEMM launch failed : m=",
+ m, ", n=", n, ", k=", k));
+ }
+ return;
+ }
+
+ perftools::gputools::dnn::BatchDescriptor input_desc;
+ input_desc.set_count(batch)
+ .set_height(input_rows)
+ .set_width(input_cols)
+ .set_feature_map_count(in_depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+ perftools::gputools::dnn::BatchDescriptor output_desc;
+ output_desc.set_count(batch)
+ .set_height(output_rows)
+ .set_width(output_cols)
+ .set_feature_map_count(out_depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+ perftools::gputools::dnn::FilterDescriptor filter_desc;
+ filter_desc.set_input_filter_height(filter_rows)
+ .set_input_filter_width(filter_cols)
+ .set_input_feature_map_count(in_depth)
+ .set_output_feature_map_count(out_depth);
+ perftools::gputools::dnn::ConvolutionDescriptor conv_desc;
+ conv_desc.set_vertical_filter_stride(stride)
+ .set_horizontal_filter_stride(stride)
+ .set_zero_padding_height(padding_rows / 2)
+ .set_zero_padding_width(padding_cols / 2);
+
+ // NOTE(zhengxq):
+ // cuDNN only supports the following layouts:
+ // Input : B x D x R x C
+ // Filter : OD x ID x R x C
+ // whereas we have
+ // Input : B x R x C x D
+ // Filter : R x C x ID x OD
+ // TransformFilter performs (R x C x ID x OD) => (OD x ID x R x C)
+ // The first TransformDepth performs
+ // (B x R x C x D) => (B x D x R x C).
+ // Since the tensor returned from cuDNN is B x D x R x C also,
+ // the second TransformDepth performs
+ // (B x D x R x C) => (B x R x C x D).
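+ // For example, the (0, 3, 1, 2) shuffle used below turns an NHWC tensor
+ // into NCHW: output(b, d, r, c) = input(b, r, c, d).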
+
+ Tensor pre_transformed_filter_backprop;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({out_depth, in_depth, filter_rows, filter_cols}),
+ &pre_transformed_filter_backprop));
+
+ Tensor transformed_out_backprop;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({batch, out_depth, output_rows, output_cols}),
+ &transformed_out_backprop));
+
+ functor::TransformDepth<Device, T>()(
+ context->eigen_device<Device>(), out_backprop.tensor<T, 4>(),
+ Eigen::DSizes<Eigen::DenseIndex, 4>(0, 3, 1, 2),
+ transformed_out_backprop.tensor<T, 4>());
+
+ Tensor transformed_input;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({batch, in_depth, input_rows, input_cols}),
+ &transformed_input));
+
+ functor::TransformDepth<Device, T>()(
+ context->eigen_device<Device>(), input.tensor<T, 4>(),
+ Eigen::DSizes<Eigen::DenseIndex, 4>(0, 3, 1, 2),
+ transformed_input.tensor<T, 4>());
+
+ auto out_backprop_ptr =
+ AsDeviceMemory(transformed_out_backprop.template flat<T>().data(),
+ transformed_out_backprop.template flat<T>().size());
+ auto filter_backprop_ptr = AsDeviceMemory(
+ pre_transformed_filter_backprop.template flat<T>().data(),
+ pre_transformed_filter_backprop.template flat<T>().size());
+ auto input_ptr =
+ AsDeviceMemory(transformed_input.template flat<T>().data(),
+ transformed_input.template flat<T>().size());
+
+ bool cudnn_launch_status =
+ stream->ThenConvolveBackwardFilter(input_desc, input_ptr, output_desc,
+ out_backprop_ptr, conv_desc,
+ filter_desc, &filter_backprop_ptr)
+ .ok();
+
+ if (!cudnn_launch_status) {
+ context->SetStatus(errors::Internal(
+ "cuDNN Backward Filter function launch failure : input shape(",
+ input_shape.DebugString(), ") filter shape(",
+ filter_shape.DebugString(), ")"));
+ }
+
+ auto toConstTensor = [](const Tensor& x) -> const Tensor { return x; };
+ functor::TransformDepth<Device, T>()(
+ context->eigen_device<Device>(),
+ toConstTensor(pre_transformed_filter_backprop)
+ .template tensor<T, 4>(),
+ Eigen::DSizes<Eigen::DenseIndex, 4>(2, 3, 1, 0),
+ filter_backprop->tensor<T, 4>());
+ } else {
+ // Fall back to the non-cudnn code path
+
+ // For the backprop of the filter, we need to also transpose the
+ // out_backprop.
+ // The shape of backprop is
+ // [batch, out_rows, out_cols, out_depth]
+ // And we need to change it to
+ // [out_depth, out_rows, out_cols, batch]
+ Eigen::DSizes<Eigen::DenseIndex, 4> out_order{3, 1, 2, 0};
+ TensorShape padded_out_shape(
+ {out_depth, padded_out_rows, padded_out_cols, batch});
+ Tensor padded_output;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(DataTypeToEnum<T>::v(),
+ padded_out_shape, &padded_output));
+
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 4> pad_dims{
+ {{0, 0},
+ {top_pad_rows, bottom_pad_rows},
+ {left_pad_cols, right_pad_cols},
+ {0, 0}}};
+ functor::InflatePadAndShuffle<Device, T, 4>()(
+ context->eigen_device<Device>(), out_backprop.tensor<T, 4>(), strides,
+ pad_dims, out_order, padded_output.tensor<T, 4>());
+ const Tensor& padded_output_cref = padded_output;
+
+ // For the backprop of the filter, we need to transpose the input.
+ // The shape of input is
+ // [batch, in_rows, in_cols, in_depth]
+ // And we need to change it to
+ // [in_rows, in_cols, batch, in_depth]
+ Eigen::DSizes<Eigen::DenseIndex, 4> in_order{1, 2, 0, 3};
+ TensorShape in_shuffle_shape({input_rows, input_cols, batch, in_depth});
+ Tensor in_shuffle;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(DataTypeToEnum<T>::v(),
+ in_shuffle_shape, &in_shuffle));
+
+ // No need for reversing this time.
+ Eigen::array<bool, 4> trivial_dims{false, false, false, false};
+ functor::ShuffleAndReverse<Device, T, 4>()(
+ context->eigen_device<Device>(), input.tensor<T, 4>(), in_order,
+ trivial_dims, in_shuffle.tensor<T, 4>());
+ const Tensor& in_shuffle_cref = in_shuffle;
+
+ // The output of the conv_2d would be
+ // [out_depth, filter_rows, filter_cols, in_depth]
+ // and we need to shuffle it back to
+ // [filter_rows, filter_cols, in_depth, out_depth];
+ // We also need to reverse the filter backprops,
+ // so we need to allocate (sigh) yet another piece of memory to hold the
+ // output.
+ TensorShape filter_shuffle_shape(
+ {out_depth, filter_rows, filter_cols, in_depth});
+ Tensor filter_shuffle;
+ OP_REQUIRES_OK(context, context->allocate_temp(DataTypeToEnum<T>::v(),
+ filter_shuffle_shape,
+ &filter_shuffle));
+
+ functor::SpatialConvolution<Device, T>()(
+ context->eigen_device<Device>(), filter_shuffle.tensor<T, 4>(),
+ padded_output_cref.tensor<T, 4>(), in_shuffle_cref.tensor<T, 4>(), 1,
+ BrainPadding2EigenPadding(VALID));
+
+ // Now copy the filter_backprop back to the destination.
+ Eigen::DSizes<Eigen::DenseIndex, 4> filter_order{1, 2, 3, 0};
+ Eigen::array<bool, 4> filter_rev_dims{true, true, false, false};
+ const Tensor& filter_shuffle_cref = filter_shuffle;
+ functor::ShuffleAndReverse<Device, T, 4>()(
+ context->eigen_device<Device>(), filter_shuffle_cref.tensor<T, 4>(),
+ filter_order, filter_rev_dims, filter_backprop->tensor<T, 4>());
+ }
+ }
+
+ private:
+ std::vector<int32> strides_;
+ Padding padding_;
+ bool use_cudnn_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DSlowBackpropFilterOp);
+};
+
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ShuffleAndReverse<GPUDevice, T, 4>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor input, \
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& order, \
+ const Eigen::array<bool, 4>& reverse_dims, \
+ typename TTypes<T, 4>::Tensor output); \
+ extern template struct ShuffleAndReverse<GPUDevice, T, 4>; \
+ template <> \
+ void InflatePadAndShuffle<GPUDevice, T, 4>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor input, \
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& strides, \
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 4>& pad_dims, \
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& order, \
+ typename TTypes<T, 4>::Tensor output); \
+ extern template struct InflatePadAndShuffle<GPUDevice, T, 4>; \
+ template <> \
+ void TransformFilter<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor in, \
+ typename TTypes<T, 4>::Tensor out); \
+ extern template struct TransformFilter<GPUDevice, T>; \
+ template <> \
+ void TransformDepth<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor in, \
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& shuffle, \
+ typename TTypes<T, 4>::Tensor out); \
+ extern template struct TransformDepth<GPUDevice, T>; \
+ template <> \
+ void SpatialConvolution<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::Tensor output, \
+ typename TTypes<T, 4>::ConstTensor input, \
+ typename TTypes<T, 4>::ConstTensor filter, int stride, \
+ const Eigen::PaddingType& padding); \
+ extern template struct SpatialConvolution<GPUDevice, T>; \
+ template <> \
+ void SpatialConvolutionBackwardInput<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::Tensor in_backprop, \
+ typename TTypes<T, 4>::ConstTensor filter, \
+ typename TTypes<T, 4>::ConstTensor output_backprop, int input_rows, \
+ int input_cols, int stride); \
+ extern template struct SpatialConvolutionBackwardInput<GPUDevice, T>
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropInput")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("input_sizes"),
+ Conv2DSlowBackpropInputOp<GPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("Conv2DBackpropFilter")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("filter_sizes"),
+ Conv2DSlowBackpropFilterOp<GPUDevice, float>);
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/conv_ops.cc b/tensorflow/core/kernels/conv_ops.cc
new file mode 100644
index 0000000000..aaa2951778
--- /dev/null
+++ b/tensorflow/core/kernels/conv_ops.cc
@@ -0,0 +1,373 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define USE_EIGEN_TENSOR
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/kernels/conv_2d.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/util/use_cudnn.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+struct LaunchGeneric {
+ static void launch(OpKernelContext* ctx, const Tensor& input,
+ const Tensor& filter, int stride,
+ const Eigen::PaddingType& padding, Tensor* output) {
+ if (filter.dim_size(1) == filter.dim_size(0) && filter.dim_size(0) == 1 &&
+ stride == 1) {
+ // For 1x1 kernel, the 2D convolution is reduced to matrix
+ // multiplication.
+ //
+ // TODO(vrv): We should be able to call SpatialConvolution
+ // and it will produce the same result, but doing so
+ // led to NaNs during training. Using matmul instead for now.
+ int conv_width = 1; // Width for the convolution step.
+ for (int i = 0; i < 3; ++i) {
+ conv_width *= output->dim_size(i);
+ }
+
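+ // With a 1x1 filter and stride 1, each output pixel depends only on the
+ // input pixel at the same location, so the convolution is a matrix
+ // product of the input viewed as [conv_width, in_depth] with the filter
+ // viewed as [in_depth, out_depth].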
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1> dim_pair;
+ dim_pair[0] = Eigen::IndexPair<Eigen::DenseIndex>(1, 0);
+ functor::MatMulConvFunctor<Device, T>()(
+ ctx->eigen_device<Device>(),
+ output->shaped<T, 2>({conv_width, filter.dim_size(3)}),
+ input.shaped<T, 2>({conv_width, filter.dim_size(2)}),
+ filter.shaped<T, 2>({filter.dim_size(2), filter.dim_size(3)}),
+ dim_pair);
+ } else {
+ functor::SpatialConvolution<Device, T>()(
+ ctx->eigen_device<Device>(), output->tensor<T, 4>(),
+ input.tensor<T, 4>(), filter.tensor<T, 4>(), stride, padding);
+ }
+ }
+};
+
+template <typename Device, typename T>
+struct LaunchConvOp;
+
+template <typename T>
+struct LaunchConvOp<CPUDevice, T> {
+ static void launch(OpKernelContext* ctx, bool use_cudnn, const Tensor& input,
+ const Tensor& filter, int stride,
+ const Eigen::PaddingType& padding, Tensor* output) {
+ LaunchGeneric<CPUDevice, T>::launch(ctx, input, filter, stride, padding,
+ output);
+ }
+};
+
+template <typename Device, typename T>
+class Conv2DOp : public BinaryOp<T> {
+ public:
+ explicit Conv2DOp(OpKernelConstruction* context) : BinaryOp<T>(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &strides_));
+ OP_REQUIRES_OK(context, context->GetAttr("use_cudnn_on_gpu", &use_cudnn_));
+ use_cudnn_ &= CanUseCudnn();
+ OP_REQUIRES(context, strides_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES(context, strides_[1] == strides_[2],
+ errors::InvalidArgument(
+ "Current implementation only supports equal length "
+ "strides in the row and column dimensions."));
+ OP_REQUIRES(context, (strides_[0] == 1 && strides_[3] == 1),
+ errors::InvalidArgument(
+ "Current implementation does not yet support "
+ "strides in the batch and depth dimensions."));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ // Input tensor is of the following dimensions:
+ // [ batch, in_rows, in_cols, in_depth ]
+
+ const Tensor& input = context->input(0);
+
+ // Input filter is of the following dimensions:
+ // [ filter_rows, filter_cols, in_depth, out_depth]
+ const Tensor& filter = context->input(1);
+
+ // For 2D convolution, there should be 4 dimensions.
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ OP_REQUIRES(context, filter.dims() == 4,
+ errors::InvalidArgument("filter must be 4-dimensional: ",
+ filter.shape().ShortDebugString()));
+
+ // The last dimension for input is in_depth. It must be the same as the
+ // filter's in_depth.
+ const int64 in_depth = input.dim_size(3);
+ OP_REQUIRES(
+ context, in_depth == filter.dim_size(2),
+ errors::InvalidArgument("input and filter must have the same depth: ",
+ in_depth, " vs ", filter.dim_size(2)));
+
+ // The last dimension for filter is out_depth.
+ const int64 out_depth = filter.dim_size(3);
+
+ // The second dimension for input is rows/height.
+ // The first dimension for filter is rows/height.
+ const int64 input_rows = input.dim_size(1);
+ const int64 filter_rows = filter.dim_size(0);
+
+ // The third dimension for input is columns/width.
+ // The second dimension for filter is columns/width.
+ const int64 input_cols = input.dim_size(2);
+ const int64 filter_cols = filter.dim_size(1);
+
+ // The first dimension for input is batch.
+ const int64 batch = input.dim_size(0);
+
+ // For now we take the stride from the second dimension only (we
+ // assume row = col stride, and do not support striding on the
+ // batch or depth dimension).
+ const int stride = strides_[1];
+
+ int out_rows = 0, out_cols = 0, pad_rows = 0, pad_cols = 0;
+ if (filter_cols == filter_rows && filter_rows == 1 && stride == 1) {
+ // For 1x1 kernel, the 2D convolution is reduced to matrix
+ // multiplication.
+ out_rows = input_rows;
+ out_cols = input_cols;
+ } else {
+ OP_REQUIRES_OK(
+ context, Get2dOutputSize(input_rows, input_cols, filter_rows,
+ filter_cols, stride, stride, padding_,
+ &out_rows, &out_cols, &pad_rows, &pad_cols));
+ }
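+ // Get2dOutputSize follows the usual TensorFlow conventions:
+ // out = ceil((in - filter + 1) / stride) for VALID padding and
+ // out = ceil(in / stride) for SAME padding.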
+ TensorShape out_shape({batch, out_rows, out_cols, out_depth});
+
+ // Output tensor is of the following dimensions:
+ // [ in_batch, out_rows, out_cols, out_depth ]
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, out_shape, &output));
+
+ VLOG(2) << "Conv2D: in_depth = " << in_depth
+ << ", input_cols = " << input_cols
+ << ", filter_cols = " << filter_cols
+ << ", input_rows = " << input_rows
+ << ", filter_rows = " << filter_rows << ", stride = " << stride
+ << ", out_depth = " << out_depth;
+
+ LaunchConvOp<Device, T>::launch(context, use_cudnn_, input, filter, stride,
+ BrainPadding2EigenPadding(padding_),
+ output);
+ }
+
+ private:
+ std::vector<int32> strides_;
+ bool use_cudnn_;
+ Padding padding_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Conv2DOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("Conv2D")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ Conv2DOp<CPUDevice, float>);
+
+#if GOOGLE_CUDA
+
+namespace {
+template <typename T>
+perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory,
+ uint64 size) {
+ perftools::gputools::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory),
+ size * sizeof(T));
+ perftools::gputools::DeviceMemory<T> typed(wrapped);
+ return typed;
+}
+} // namespace
+
+template <typename T>
+struct LaunchConvOp<GPUDevice, T> {
+ static void launch(OpKernelContext* ctx, bool use_cudnn,
+ const Tensor& input_param, const Tensor& filter,
+ int stride, const Eigen::PaddingType& padding,
+ Tensor* output) {
+ auto* stream = ctx->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(ctx, stream, errors::Internal("No GPU stream available."));
+
+ if (use_cudnn) {
+ Tensor input = input_param;
+ if (filter.dim_size(0) == 1 && filter.dim_size(1) == 1) {
+ // 1x1 filter, so call cublas directly.
+ const uint64 m =
+ input.dim_size(0) * input.dim_size(1) * input.dim_size(2);
+ const uint64 k = filter.dim_size(2);
+ const uint64 n = filter.dim_size(3);
+
+ auto a_ptr = AsDeviceMemory(input.template flat<T>().data(),
+ input.template flat<T>().size());
+ auto b_ptr = AsDeviceMemory(filter.template flat<T>().data(),
+ filter.template flat<T>().size());
+ auto c_ptr = AsDeviceMemory(output->template flat<T>().data(),
+ output->template flat<T>().size());
+
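+ // In effect, the GEMM below computes
+ // output(batch * rows * cols, out_depth) =
+ //     input(batch * rows * cols, in_depth) * filter(in_depth, out_depth),
+ // i.e. the 1x1 convolution as a single matrix product.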
+ auto no_transpose = perftools::gputools::blas::Transpose::kNoTranspose;
+ bool blas_launch_status =
+ stream->ThenBlasGemm(no_transpose, no_transpose, n, m, k, 1.0f,
+ b_ptr, n, a_ptr, k, 0.0f, &c_ptr, n)
+ .ok();
+ if (!blas_launch_status) {
+ ctx->SetStatus(errors::Internal("Blas SGEMM launch failed : m=", m,
+ ", n=", n, ", k=", k));
+ }
+ return;
+ }
+ if (padding == Eigen::PADDING_SAME) {
+ const int64 out_rows = output->dim_size(1);
+ const int64 out_cols = output->dim_size(2);
+ const int64 in_rows = input.dim_size(1);
+ const int64 in_cols = input.dim_size(2);
+ const int64 patch_rows = filter.dim_size(0);
+ const int64 patch_cols = filter.dim_size(1);
+ // Total padding on rows and cols is
+ // Pr = (R' - 1) * S + Kr - R
+ // Pc = (C' - 1) * S + Kc - C
+ // where (R', C') are output dimensions, (R, C) are input dimensions, S
+ // is stride, (Kr, Kc) are filter dimensions.
+ // We pad Pr/2 on the left and Pr - Pr/2 on the right, Pc/2 on the top
+ // and Pc - Pc/2 on the bottom. When Pr or Pc is odd, this means
+ // we pad more on the right and bottom than on the top and left.
+ const int padding_rows = (out_rows - 1) * stride + patch_rows - in_rows;
+ const int padding_cols = (out_cols - 1) * stride + patch_cols - in_cols;
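+ // For example, with in_rows = in_cols = 5, a 3x3 filter and stride 2,
+ // SAME padding gives out_rows = out_cols = 3, so
+ // padding_rows = padding_cols = (3 - 1) * 2 + 3 - 5 = 2, one row/column
+ // of zeros on each side.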
+ Tensor transformed_input;
+ OP_REQUIRES_OK(
+ ctx, ctx->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape(
+ {input.dim_size(0), input.dim_size(1) + padding_rows,
+ input.dim_size(2) + padding_cols, input.dim_size(3)}),
+ &transformed_input));
+
+ functor::PadInput<GPUDevice, T>()(
+ ctx->eigen_device<GPUDevice>(), input_param.tensor<T, 4>(),
+ padding_rows / 2, padding_rows - padding_rows / 2, padding_cols / 2,
+ padding_cols - padding_cols / 2, transformed_input.tensor<T, 4>());
+ input = transformed_input;
+ }
+
+ perftools::gputools::dnn::BatchDescriptor input_desc;
+ input_desc.set_count(input.dim_size(0))
+ .set_height(input.dim_size(1))
+ .set_width(input.dim_size(2))
+ .set_feature_map_count(input.dim_size(3))
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchYXDepth);
+ perftools::gputools::dnn::BatchDescriptor output_desc;
+ output_desc.set_count(output->dim_size(0))
+ .set_height(output->dim_size(1))
+ .set_width(output->dim_size(2))
+ .set_feature_map_count(output->dim_size(3))
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchYXDepth);
+ perftools::gputools::dnn::FilterDescriptor filter_desc;
+ filter_desc.set_input_filter_height(filter.dim_size(0))
+ .set_input_filter_width(filter.dim_size(1))
+ .set_input_feature_map_count(filter.dim_size(2))
+ .set_output_feature_map_count(filter.dim_size(3));
+ perftools::gputools::dnn::ConvolutionDescriptor conv_desc;
+ conv_desc.set_vertical_filter_stride(stride)
+ .set_horizontal_filter_stride(stride);
+
+ Tensor transformed_filter;
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({filter.dim_size(3), filter.dim_size(2),
+ filter.dim_size(0), filter.dim_size(1)}),
+ &transformed_filter));
+
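+ // TensorFlow stores filters as HWIO (filter_rows, filter_cols, in_depth,
+ // out_depth), while the descriptor above expects OIHW, so transpose the
+ // filter into transformed_filter first.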
+ functor::TransformFilter<GPUDevice, T>()(
+ ctx->eigen_device<GPUDevice>(), filter.tensor<T, 4>(),
+ transformed_filter.tensor<T, 4>());
+
+ auto input_ptr = AsDeviceMemory(input.template flat<T>().data(),
+ input.template flat<T>().size());
+ auto filter_ptr =
+ AsDeviceMemory(transformed_filter.template flat<T>().data(),
+ transformed_filter.template flat<T>().size());
+ auto output_ptr = AsDeviceMemory(output->template flat<T>().data(),
+ output->template flat<T>().size());
+
+ bool cudnn_launch_status =
+ stream->ThenConvolve(input_desc, input_ptr, filter_desc, filter_ptr,
+ conv_desc, output_desc, &output_ptr)
+ .ok();
+
+ if (!cudnn_launch_status) {
+ ctx->SetStatus(errors::Internal(
+ "cuDNN launch failure : input shape(", input.shape().DebugString(),
+ ") filter shape(", filter.shape().DebugString(), ")"));
+ }
+ } else {
+ LaunchGeneric<GPUDevice, T>::launch(ctx, input_param, filter, stride,
+ padding, output);
+ }
+ }
+};
+
+#endif // GOOGLE_CUDA
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void SpatialConvolution<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::Tensor output, \
+ typename TTypes<T, 4>::ConstTensor input, \
+ typename TTypes<T, 4>::ConstTensor filter, int stride, \
+ const Eigen::PaddingType& padding); \
+ extern template struct SpatialConvolution<GPUDevice, T>; \
+ template <> \
+ void MatMulConvFunctor<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 2>::Tensor out, \
+ typename TTypes<T, 2>::ConstTensor in0, \
+ typename TTypes<T, 2>::ConstTensor in1, \
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair); \
+ extern template struct MatMulConvFunctor<GPUDevice, T>; \
+ template <> \
+ void TransformFilter<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor in, \
+ typename TTypes<T, 4>::Tensor out); \
+ extern template struct TransformFilter<GPUDevice, T>; \
+ template <> \
+ void PadInput<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor in, \
+ int padding_rows_left, int padding_rows_right, int padding_cols_left, \
+ int padding_cols_right, typename TTypes<T, 4>::Tensor out); \
+ extern template struct PadInput<GPUDevice, T>
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Registration of the GPU implementations.
+REGISTER_KERNEL_BUILDER(Name("Conv2D")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T"),
+ Conv2DOp<GPUDevice, float>);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/conv_ops_gpu.cu.cc b/tensorflow/core/kernels/conv_ops_gpu.cu.cc
new file mode 100644
index 0000000000..44af814e2b
--- /dev/null
+++ b/tensorflow/core/kernels/conv_ops_gpu.cu.cc
@@ -0,0 +1,35 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/conv_2d.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+
+template <typename T>
+struct SpatialConvolution<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T, 4>::Tensor output,
+ typename TTypes<T, 4>::ConstTensor input,
+ typename TTypes<T, 4>::ConstTensor filter, int stride,
+ const Eigen::PaddingType& padding) {
+ // TODO(keveman): nvcc 6.5 crashes when 32 bit indexing is turned on. Enable
+ // this when we move to cuda 7.0.
+ // SpatialConvolutionFunc(d, To32Bit(output), To32Bit(input),
+ // To32Bit(filter), stride, padding);
+
+ SpatialConvolutionFunc(d, output, input, filter, stride, padding);
+ }
+};
+
+template struct SpatialConvolution<GPUDevice, float>;
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/conv_ops_gpu_2.cu.cc b/tensorflow/core/kernels/conv_ops_gpu_2.cu.cc
new file mode 100644
index 0000000000..e2e9d25d83
--- /dev/null
+++ b/tensorflow/core/kernels/conv_ops_gpu_2.cu.cc
@@ -0,0 +1,16 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/conv_2d.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::InflatePadAndShuffle<GPUDevice, float, 4>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/conv_ops_gpu_3.cu.cc b/tensorflow/core/kernels/conv_ops_gpu_3.cu.cc
new file mode 100644
index 0000000000..dbbe08ef9c
--- /dev/null
+++ b/tensorflow/core/kernels/conv_ops_gpu_3.cu.cc
@@ -0,0 +1,22 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/conv_2d.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::ShuffleAndReverse<GPUDevice, float, 4>;
+
+template struct functor::TransformFilter<GPUDevice, float>;
+
+template struct functor::PadInput<GPUDevice, float>;
+
+template struct functor::TransformDepth<GPUDevice, float>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/conv_ops_gpu_matmul.cu.cc b/tensorflow/core/kernels/conv_ops_gpu_matmul.cu.cc
new file mode 100644
index 0000000000..87d79ecb4d
--- /dev/null
+++ b/tensorflow/core/kernels/conv_ops_gpu_matmul.cu.cc
@@ -0,0 +1,16 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/conv_2d.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::MatMulConvFunctor<GPUDevice, float>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/core_ops_test.cc b/tensorflow/core/kernels/core_ops_test.cc
new file mode 100644
index 0000000000..a42a5999da
--- /dev/null
+++ b/tensorflow/core/kernels/core_ops_test.cc
@@ -0,0 +1,990 @@
+#define EIGEN_USE_THREADS
+
+#if GOOGLE_CUDA
+#define EIGEN_USE_GPU
+#endif // GOOGLE_CUDA
+
+#include <functional>
+#include <memory>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/cc/ops/const_op.h"
+#include "tensorflow/cc/ops/nn_ops.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/eigen_thread_pool.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/graph_constructor.h"
+#include "tensorflow/core/graph/graph_def_builder.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+static void SetConstOp(const string& name, std::initializer_list<int64> dims,
+ NodeDef* node) {
+ Tensor tensor(DT_FLOAT, TensorShape(dims));
+ for (int64 i = 0; i < tensor.NumElements(); ++i) {
+ tensor.flat<float>()(i) = i / 10.0f;
+ }
+ TF_CHECK_OK(NodeDefBuilder(name, "Const")
+ .Attr("dtype", DT_FLOAT)
+ .Attr("value", tensor)
+ .Finalize(node));
+}
+
+static void SetConstSizesOp(const string& name, const std::vector<int32>& sizes,
+ NodeDef* node) {
+ TensorShape shape;
+ shape.AddDim(sizes.size());
+ Tensor tensor(DT_INT32, shape);
+ for (int64 i = 0; i < tensor.NumElements(); ++i) {
+ tensor.flat<int32>()(i) = sizes[i];
+ }
+ TF_CHECK_OK(NodeDefBuilder(name, "Const")
+ .Attr("dtype", DT_INT32)
+ .Attr("value", tensor)
+ .Finalize(node));
+}
+
+namespace {
+
+enum CONV_OP {
+ CONV_OP_FORWARD = 0,
+ CONV_OP_BACKPROP_INPUT = 1,
+ CONV_OP_BACKPROP_FILTER = 2
+};
+
+} // namespace
+
+static void BM_ConvFloat(int iters, int batch, int rows, int cols, int in_depth,
+ int out_depth, int filter_rows, int filter_cols,
+ CONV_OP op, int num_threads, int stride,
+ Padding padding, bool use_gpu, const string& label) {
+ if (!IsGoogleCudaEnabled() && use_gpu) {
+ testing::SetLabel(
+ strings::StrCat("Skipping GPU test (no --config=cuda): ", label));
+ return;
+ }
+ testing::SetLabel(label);
+
+ // Set the number of threads
+ SessionOptions options;
+ options.config.set_intra_op_parallelism_threads(num_threads);
+
+ // We set up a graph for computing convolution.
+ GraphDef graph;
+
+ // For this, we need an input tensor and a filter tensor.
+ // Compute the output size.
+ int out_rows = 0, out_cols = 0, pad_rows = 0, pad_cols = 0;
+ TF_CHECK_OK(Get2dOutputSize(rows, cols, filter_rows, filter_cols, stride,
+ stride, padding, &out_rows, &out_cols, &pad_rows,
+ &pad_cols));
+ // Counting the number of floating point operations (both MUL and ADD)
+ int64 num_ops = 0;
+ if (op == CONV_OP_FORWARD) {
+ // Forward computation:
+ // BATCH x OUT_ROW x OUT_COL x IN_DEPTH x PATCH_ROW x PATCH_COL x OUT_DEPTH
+ // We multiply by two since there are multiplications and additions.
+ num_ops = static_cast<int64>(batch * in_depth * out_depth) *
+ static_cast<int64>(filter_rows * filter_cols) *
+ static_cast<int64>(out_rows * out_cols) * 2;
+ } else {
+ // Backward computation: both input and filter backprop take the same
+ // amount of computation:
+ // BATCH x IN_ROW x IN_COL x IN_DEPTH x PATCH_ROW x PATCH_COL x OUT_DEPTH
+ // We multiply by two since there are multiplications and additions.
+ num_ops = static_cast<int64>(batch * in_depth * out_depth) *
+ static_cast<int64>(filter_rows * filter_cols) *
+ static_cast<int64>(rows * cols) * 2;
+ }
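+ // For example, the forward pass of conv4 below (batch 32, 8x8 input,
+ // 448 -> 384 depth, 3x3 filter, stride 1, SAME padding) counts roughly
+ // 32 * 8 * 8 * 448 * 384 * 3 * 3 * 2 ~= 6.3e9 operations per iteration.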
+
+ SetConstOp("input", {batch, rows, cols, in_depth}, graph.add_node());
+ SetConstOp("filter", {filter_rows, filter_cols, in_depth, out_depth},
+ graph.add_node());
+ SetConstOp("output_backprop", {batch, out_rows, out_cols, out_depth},
+ graph.add_node());
+ SetConstSizesOp("input_sizes",
+ std::vector<int32>({batch, rows, cols, in_depth}),
+ graph.add_node());
+ SetConstSizesOp("filter_sizes", std::vector<int32>({filter_rows, filter_cols,
+ in_depth, out_depth}),
+ graph.add_node());
+
+ // Now add the convolution op
+ NodeDef* conv = graph.add_node();
+ switch (op) {
+ case CONV_OP_FORWARD:
+ TF_CHECK_OK(NodeDefBuilder("conv2d", "Conv2D")
+ .Input("input", 0, DT_FLOAT)
+ .Input("filter", 0, DT_FLOAT)
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(conv));
+ break;
+ case CONV_OP_BACKPROP_INPUT:
+ TF_CHECK_OK(NodeDefBuilder("conv2d", "Conv2DBackpropInput")
+ .Input("input_sizes", 0, DT_INT32)
+ .Input("filter", 0, DT_FLOAT)
+ .Input("output_backprop", 0, DT_FLOAT)
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(conv));
+ break;
+ case CONV_OP_BACKPROP_FILTER:
+ TF_CHECK_OK(NodeDefBuilder("conv2d", "Conv2DBackpropFilter")
+ .Input("input", 0, DT_FLOAT)
+ .Input("filter_sizes", 0, DT_INT32)
+ .Input("output_backprop", 0, DT_FLOAT)
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(conv));
+ break;
+ }
+ Graph* g = new Graph(OpRegistry::Global());
+ GraphConstructorOptions opts;
+ TF_CHECK_OK(ConvertGraphDefToGraph(opts, graph, g));
+
+ string device = use_gpu ? "gpu" : "cpu";
+ test::Benchmark(device, g, &options).Run(iters);
+ testing::ItemsProcessed(num_ops * iters);
+}
+
+// BS: batch_size
+// R: tensor_in_rows
+// C: tensor_in_cols
+// ID: input_depth
+// OD: output_depth
+// KR: kernel_rows
+// KC: kernel_cols
+#define BM_ConvFloatFwd(BS, R, C, ID, OD, KR, KC, STR, PAD, LABEL) \
+ static void BM_ConvFloatFwdCPU1_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_FORWARD, 1, STR, \
+ PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu1")); \
+ } \
+ static void BM_ConvFloatFwdCPU4_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_FORWARD, 4, STR, \
+ PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu4")); \
+ } \
+ static void BM_ConvFloatFwdGPU_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_FORWARD, 1, STR, \
+ PAD, true, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_gpu")); \
+ } \
+ BENCHMARK(BM_ConvFloatFwdCPU1_##LABEL); \
+ BENCHMARK(BM_ConvFloatFwdCPU4_##LABEL); \
+ BENCHMARK(BM_ConvFloatFwdGPU_##LABEL)
+
+BM_ConvFloatFwd(32, 5, 5, 1248, 128, 1, 1, 1, SAME, conv0);
+BM_ConvFloatFwd(32, 8, 8, 384, 384, 1, 3, 1, SAME, conv1);
+BM_ConvFloatFwd(32, 8, 8, 384, 384, 3, 1, 1, SAME, conv2);
+BM_ConvFloatFwd(32, 8, 8, 2048, 192, 1, 1, 1, SAME, conv3);
+BM_ConvFloatFwd(32, 8, 8, 448, 384, 3, 3, 1, SAME, conv4);
+BM_ConvFloatFwd(32, 8, 8, 2048, 320, 1, 1, 1, SAME, conv5);
+BM_ConvFloatFwd(32, 8, 8, 2048, 448, 1, 1, 1, SAME, conv6);
+BM_ConvFloatFwd(32, 8, 8, 2048, 384, 1, 1, 1, SAME, conv7);
+BM_ConvFloatFwd(32, 8, 8, 1760, 384, 1, 1, 1, SAME, conv8);
+BM_ConvFloatFwd(32, 8, 8, 1760, 192, 1, 1, 1, SAME, conv9);
+BM_ConvFloatFwd(32, 8, 8, 1760, 448, 1, 1, 1, SAME, conv10);
+BM_ConvFloatFwd(32, 8, 8, 1760, 320, 1, 1, 1, SAME, conv11);
+BM_ConvFloatFwd(32, 17, 17, 192, 192, 3, 3, 2, VALID, conv12);
+BM_ConvFloatFwd(32, 17, 17, 192, 192, 3, 3, 1, SAME, conv13);
+BM_ConvFloatFwd(32, 17, 17, 1248, 192, 1, 1, 1, SAME, conv14);
+BM_ConvFloatFwd(32, 17, 17, 128, 320, 3, 3, 2, VALID, conv15);
+BM_ConvFloatFwd(32, 17, 17, 1248, 128, 1, 1, 1, SAME, conv16);
+BM_ConvFloatFwd(32, 17, 17, 224, 224, 1, 3, 1, SAME, conv17);
+BM_ConvFloatFwd(32, 17, 17, 192, 256, 3, 1, 1, SAME, conv18);
+BM_ConvFloatFwd(32, 17, 17, 192, 256, 1, 3, 1, SAME, conv19);
+BM_ConvFloatFwd(32, 17, 17, 1216, 192, 1, 1, 1, SAME, conv20);
+BM_ConvFloatFwd(32, 17, 17, 1216, 96, 1, 1, 1, SAME, conv21);
+BM_ConvFloatFwd(32, 17, 17, 224, 224, 3, 1, 1, SAME, conv22);
+BM_ConvFloatFwd(32, 17, 17, 192, 224, 3, 3, 1, SAME, conv23);
+BM_ConvFloatFwd(32, 17, 17, 192, 192, 1, 3, 1, SAME, conv24);
+BM_ConvFloatFwd(32, 17, 17, 1152, 192, 1, 1, 1, SAME, conv25);
+BM_ConvFloatFwd(32, 17, 17, 1152, 128, 1, 1, 1, SAME, conv26);
+BM_ConvFloatFwd(32, 17, 17, 192, 192, 3, 1, 1, SAME, conv27);
+BM_ConvFloatFwd(32, 17, 17, 160, 192, 3, 3, 1, SAME, conv28);
+BM_ConvFloatFwd(32, 17, 17, 1152, 160, 1, 1, 1, SAME, conv29);
+BM_ConvFloatFwd(32, 17, 17, 1024, 128, 1, 1, 1, SAME, conv30);
+BM_ConvFloatFwd(32, 17, 17, 128, 192, 1, 3, 1, SAME, conv31);
+BM_ConvFloatFwd(32, 17, 17, 1024, 160, 1, 1, 1, SAME, conv32);
+BM_ConvFloatFwd(32, 17, 17, 128, 192, 3, 1, 1, SAME, conv33);
+BM_ConvFloatFwd(32, 17, 17, 1024, 256, 1, 1, 1, SAME, conv34);
+BM_ConvFloatFwd(32, 17, 17, 128, 128, 3, 1, 1, SAME, conv35);
+BM_ConvFloatFwd(32, 17, 17, 768, 192, 1, 1, 1, SAME, conv36);
+BM_ConvFloatFwd(32, 17, 17, 128, 128, 1, 3, 1, SAME, conv37);
+BM_ConvFloatFwd(32, 17, 17, 128, 128, 3, 3, 1, SAME, conv38);
+BM_ConvFloatFwd(32, 17, 17, 768, 128, 1, 1, 1, SAME, conv39);
+BM_ConvFloatFwd(32, 17, 17, 768, 320, 1, 1, 1, SAME, conv40);
+BM_ConvFloatFwd(32, 35, 35, 96, 96, 3, 3, 2, VALID, conv41);
+BM_ConvFloatFwd(32, 35, 35, 288, 384, 3, 3, 2, VALID, conv42);
+BM_ConvFloatFwd(32, 35, 35, 64, 96, 3, 3, 1, SAME, conv43);
+BM_ConvFloatFwd(32, 35, 35, 288, 64, 1, 1, 1, SAME, conv44);
+BM_ConvFloatFwd(32, 35, 35, 256, 64, 1, 1, 1, SAME, conv45);
+BM_ConvFloatFwd(32, 35, 35, 48, 64, 5, 5, 1, SAME, conv46);
+BM_ConvFloatFwd(32, 35, 35, 256, 48, 1, 1, 1, SAME, conv47);
+BM_ConvFloatFwd(32, 35, 35, 96, 96, 3, 3, 1, SAME, conv48);
+BM_ConvFloatFwd(32, 35, 35, 192, 32, 1, 1, 1, SAME, conv49);
+BM_ConvFloatFwd(32, 35, 35, 192, 64, 1, 1, 1, SAME, conv50);
+BM_ConvFloatFwd(32, 35, 35, 192, 48, 1, 1, 1, SAME, conv51);
+BM_ConvFloatFwd(32, 73, 73, 64, 192, 3, 3, 1, VALID, conv52);
+BM_ConvFloatFwd(32, 73, 73, 64, 64, 1, 1, 1, VALID, conv53);
+BM_ConvFloatFwd(32, 147, 147, 24, 64, 1, 1, 1, VALID, conv54);
+
+#define BM_ConvFloatBkInAndFilter(BS, R, C, ID, OD, KR, KC, STR, PAD, LABEL) \
+ static void BM_ConvFloatBkInCPU1_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_INPUT, 1, \
+ STR, PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu1")); \
+ } \
+ static void BM_ConvFloatBkInCPU4_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_INPUT, 4, \
+ STR, PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu4")); \
+ } \
+ static void BM_ConvFloatBkInGPU_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_INPUT, 1, \
+ STR, PAD, true, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_gpu")); \
+ } \
+ static void BM_ConvFloatBkFilterCPU1_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_FILTER, 1, \
+ STR, PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu1")); \
+ } \
+ static void BM_ConvFloatBkFilterCPU4_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_FILTER, 4, \
+ STR, PAD, false, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_cpu4")); \
+ } \
+ static void BM_ConvFloatBkFilterGPU_##LABEL(int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_FILTER, 1, \
+ STR, PAD, true, \
+ strings::StrCat(BS, "_", R, "_", C, "_", ID, "_", OD, "_", \
+ KR, "_", KC, "_", STR, "_", PAD, "_gpu")); \
+ } \
+ BENCHMARK(BM_ConvFloatBkInCPU1_##LABEL); \
+ BENCHMARK(BM_ConvFloatBkInCPU4_##LABEL); \
+ BENCHMARK(BM_ConvFloatBkInGPU_##LABEL); \
+ BENCHMARK(BM_ConvFloatBkFilterCPU1_##LABEL); \
+ BENCHMARK(BM_ConvFloatBkFilterCPU4_##LABEL); \
+ BENCHMARK(BM_ConvFloatBkFilterGPU_##LABEL)
+
+// Benchmarks from the inception model
+
+BM_ConvFloatBkInAndFilter(32, 5, 5, 1248, 128, 1, 1, 1, SAME, conv0);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 384, 384, 1, 3, 1, SAME, conv1);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 384, 384, 3, 1, 1, SAME, conv2);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 2048, 192, 1, 1, 1, SAME, conv3);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 448, 384, 3, 3, 1, SAME, conv4);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 2048, 320, 1, 1, 1, SAME, conv5);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 2048, 448, 1, 1, 1, SAME, conv6);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 2048, 384, 1, 1, 1, SAME, conv7);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 1760, 384, 1, 1, 1, SAME, conv8);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 1760, 192, 1, 1, 1, SAME, conv9);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 1760, 448, 1, 1, 1, SAME, conv10);
+BM_ConvFloatBkInAndFilter(32, 8, 8, 1760, 320, 1, 1, 1, SAME, conv11);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 192, 3, 3, 2, VALID, conv12);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 192, 3, 3, 1, SAME, conv13);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1248, 192, 1, 1, 1, SAME, conv14);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 320, 3, 3, 2, VALID, conv15);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1248, 128, 1, 1, 1, SAME, conv16);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 224, 224, 1, 3, 1, SAME, conv17);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 256, 3, 1, 1, SAME, conv18);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 256, 1, 3, 1, SAME, conv19);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1216, 192, 1, 1, 1, SAME, conv20);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1216, 96, 1, 1, 1, SAME, conv21);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 224, 224, 3, 1, 1, SAME, conv22);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 224, 3, 3, 1, SAME, conv23);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 192, 1, 3, 1, SAME, conv24);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1152, 192, 1, 1, 1, SAME, conv25);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1152, 128, 1, 1, 1, SAME, conv26);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 192, 192, 3, 1, 1, SAME, conv27);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 160, 192, 3, 3, 1, SAME, conv28);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1152, 160, 1, 1, 1, SAME, conv29);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1024, 128, 1, 1, 1, SAME, conv30);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 192, 1, 3, 1, SAME, conv31);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1024, 160, 1, 1, 1, SAME, conv32);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 192, 3, 1, 1, SAME, conv33);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 1024, 256, 1, 1, 1, SAME, conv34);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 128, 3, 1, 1, SAME, conv35);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 768, 192, 1, 1, 1, SAME, conv36);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 128, 1, 3, 1, SAME, conv37);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 128, 128, 3, 3, 1, SAME, conv38);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 768, 128, 1, 1, 1, SAME, conv39);
+BM_ConvFloatBkInAndFilter(32, 17, 17, 768, 320, 1, 1, 1, SAME, conv40);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 96, 96, 3, 3, 2, VALID, conv41);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 288, 384, 3, 3, 2, VALID, conv42);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 64, 96, 3, 3, 1, SAME, conv43);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 288, 64, 1, 1, 1, SAME, conv44);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 256, 64, 1, 1, 1, SAME, conv45);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 48, 64, 5, 5, 1, SAME, conv46);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 256, 48, 1, 1, 1, SAME, conv47);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 96, 96, 3, 3, 1, SAME, conv48);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 192, 32, 1, 1, 1, SAME, conv49);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 192, 64, 1, 1, 1, SAME, conv50);
+BM_ConvFloatBkInAndFilter(32, 35, 35, 192, 48, 1, 1, 1, SAME, conv51);
+BM_ConvFloatBkInAndFilter(32, 73, 73, 64, 192, 3, 3, 1, VALID, conv52);
+BM_ConvFloatBkInAndFilter(32, 73, 73, 64, 64, 1, 1, 1, VALID, conv53);
+BM_ConvFloatBkInAndFilter(32, 147, 147, 24, 64, 1, 1, 1, VALID, conv54);
+
+#define BM_ConvFloatBkFCPU(BS, R, C, ID, OD, KR, KC, TH, LABEL) \
+ static void \
+ BM_ConvFloatBkFCPU_##BS##_##R##_##C##_##ID##_##OD##_##KR##_##KC##_##TH( \
+ int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_FILTER, TH, \
+ 1, VALID, false, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_ConvFloatBkFCPU_##BS##_##R##_##C##_##ID##_##OD##_##KR##_##KC##_##TH)
+
+// Benchmarks from https://github.com/soumith/convnet-benchmarks
+BM_ConvFloatBkFCPU(128, 128, 128, 3, 96, 11, 11, 4, "convnet-layer1");
+BM_ConvFloatBkFCPU(128, 64, 64, 64, 128, 9, 9, 4, "convnet-layer2");
+BM_ConvFloatBkFCPU(128, 32, 32, 128, 128, 9, 9, 4, "convnet-layer3");
+BM_ConvFloatBkFCPU(128, 16, 16, 128, 128, 7, 7, 4, "convnet-layer4");
+BM_ConvFloatBkFCPU(128, 13, 13, 384, 384, 3, 3, 4, "convnet-layer5");
+
+#define BM_ConvFloatBkFGPU(BS, R, C, ID, OD, KR, KC, LABEL) \
+ static void BM_ConvFloatBkFGPU_##BS##_##R##_##C##_##ID##_##OD##_##KR##_##KC( \
+ int iters) { \
+ BM_ConvFloat(iters, BS, R, C, ID, OD, KR, KC, CONV_OP_BACKPROP_FILTER, 1, \
+ 1, VALID, true, LABEL); \
+ } \
+ BENCHMARK(BM_ConvFloatBkFGPU_##BS##_##R##_##C##_##ID##_##OD##_##KR##_##KC)
+
+// Benchmarks from https://github.com/soumith/convnet-benchmarks
+BM_ConvFloatBkFGPU(128, 128, 128, 3, 96, 11, 11, "convnet-layer1");
+BM_ConvFloatBkFGPU(128, 64, 64, 64, 128, 9, 9, "convnet-layer2");
+BM_ConvFloatBkFGPU(128, 32, 32, 128, 128, 9, 9, "convnet-layer3");
+BM_ConvFloatBkFGPU(128, 16, 16, 128, 128, 7, 7, "convnet-layer4");
+BM_ConvFloatBkFGPU(128, 13, 13, 384, 384, 3, 3, "convnet-layer5");
+
+static void BM_LRNFloat(int iters, int depth, int cols, int rows,
+ int batch_size, int range, int num_threads,
+ const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+ TensorShape shape({batch_size, rows, cols, depth});
+
+ Tensor input(DT_FLOAT, shape);
+ test::FillIota<float>(&input, 1.0);
+ inputs.push_back({nullptr, &input});
+
+ // Convolution op.
+ NodeDef lrn_node_def;
+ TF_CHECK_OK(NodeDefBuilder("lrn_op", "LRN")
+ .Input("input", 0, DT_FLOAT)
+ .Attr("depth_radius", range)
+ .Attr("bias", 1.0)
+ .Attr("alpha", 0.1)
+ .Attr("beta", 0.5)
+ .Finalize(&lrn_node_def));
+
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), lrn_node_def, &status));
+ TF_CHECK_OK(status);
+
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> context(new OpKernelContext(params));
+
+ op->Compute(context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete context->release_output(0).tensor;
+ op->Compute(context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(context->mutable_output(0)->NumElements() * iters *
+ (2 * range + 1) * 2);
+ testing::SetLabel(label);
+}
+
+#define BM_LRNFloatFwdCPU(DEPTH, COLS, ROWS, BATCH, RANGE, THREADS, LABEL) \
+ static void \
+ BM_LRNFloat_##DEPTH##_##COLS##_##ROWS##_##BATCH##_##RANGE##_##THREADS( \
+ int iters) { \
+ BM_LRNFloat(iters, DEPTH, COLS, ROWS, BATCH, RANGE, THREADS, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_LRNFloat_##DEPTH##_##COLS##_##ROWS##_##BATCH##_##RANGE##_##THREADS)
+
+// clang-format off
+// DEPTH, COLS, ROWS, BATCH, RANGE, THREADS, LABEL
+BM_LRNFloatFwdCPU(64, 56, 56, 32, 5, 1, "lrn 1 thread");
+BM_LRNFloatFwdCPU(192, 28, 28, 64, 2, 1, "lrn 1 thread");
+BM_LRNFloatFwdCPU(192, 56, 56, 32, 5, 1, "lrn 1 thread");
+BM_LRNFloatFwdCPU(64, 56, 56, 32, 5, 4, "lrn 4 threads");
+BM_LRNFloatFwdCPU(192, 28, 28, 64, 2, 4, "lrn 4 threads");
+BM_LRNFloatFwdCPU(192, 56, 56, 32, 5, 4, "lrn 4 threads");
+BM_LRNFloatFwdCPU(64, 56, 56, 32, 5, 8, "lrn 8 threads");
+BM_LRNFloatFwdCPU(192, 28, 28, 64, 2, 8, "lrn 8 threads");
+BM_LRNFloatFwdCPU(192, 56, 56, 32, 5, 8, "lrn 8 threads");
+// clang-format on
+
+/*
+AvgPooling Op
+*/
+static void BM_AvgPool(int iters, int batch_size, int rows, int cols, int depth,
+ int kernel_rows, int kernel_cols, int stride,
+ Padding padding, int num_threads, const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+ TensorShape shape1({batch_size, rows, cols, depth});
+ Tensor input1(DT_FLOAT, shape1);
+ test::FillIota<float>(&input1, 1.0);
+ inputs.push_back({nullptr, &input1});
+
+ // AvgPooling op.
+ NodeDef avgpool_node_def;
+ CHECK_EQ(kernel_rows, kernel_cols);
+ Status status = NodeDefBuilder("avgpool_op", "AvgPool")
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("ksize", {1, kernel_rows, kernel_cols, 1})
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(&avgpool_node_def);
+ TF_CHECK_OK(status);
+
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), avgpool_node_def, &status));
+ TF_CHECK_OK(status);
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> avgpool_context(new OpKernelContext(params));
+
+ op->Compute(avgpool_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete avgpool_context->release_output(0).tensor;
+ op->Compute(avgpool_context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(avgpool_context->mutable_output(0)->NumElements() *
+ iters);
+ testing::SetLabel(label);
+}
+
+// BS: batch_size
+// IR: input_rows
+// IC: input_cols
+// ND: node_depth
+// KR: kernel_rows
+// KC: kernel_cols
+// ST: stride. We use the same stride for both directions.
+// PT: padding
+#define BM_AvgPoolFwdCPU(BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL) \
+ static void \
+ BM_AvgPool_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH( \
+ int iters) { \
+ BM_AvgPool(iters, BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_AvgPool_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH)
+
+// Labels are taken from the 2014-July-24 version of imagenet
+BM_AvgPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, VALID, 1, "avgpool0_VALID");
+BM_AvgPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, VALID, 1, "avgpool1_VALID");
+BM_AvgPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, VALID, 1, "avgpool4_VALID");
+BM_AvgPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, VALID, 1, "avgpool10_VALID");
+BM_AvgPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, SAME, 1, "avgpool0_SAME");
+BM_AvgPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, SAME, 1, "avgpool1_SAME");
+BM_AvgPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, SAME, 1, "avgpool4_SAME");
+BM_AvgPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, SAME, 1, "avgpool10_SAME");
+BM_AvgPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, VALID, 4, "avgpool0_VALID");
+BM_AvgPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, VALID, 4, "avgpool1_VALID");
+BM_AvgPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, VALID, 4, "avgpool4_VALID");
+BM_AvgPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, VALID, 4, "avgpool10_VALID");
+BM_AvgPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, SAME, 4, "avgpool0_SAME");
+BM_AvgPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, SAME, 4, "avgpool1_SAME");
+BM_AvgPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, SAME, 4, "avgpool4_SAME");
+BM_AvgPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, SAME, 4, "avgpool10_SAME");
+
+static void BM_AvgPoolBk(int iters, int batch_size, int rows, int cols,
+ int depth, int kernel_rows, int kernel_cols,
+ int stride, Padding padding, int num_threads,
+ const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+
+ int out_height, out_width, pad_rows, pad_cols;
+ Status status =
+ Get2dOutputSize(rows, cols, kernel_rows, kernel_cols, stride, stride,
+ padding, &out_height, &out_width, &pad_rows, &pad_cols);
+ TF_CHECK_OK(status);
+ TensorShape output_shape({batch_size, out_height, out_width, depth});
+ TensorShape shape2({4});
+ Tensor input_shape_tensor(DT_INT32, shape2);
+ int32 input_dims[] = {batch_size, rows, cols, depth};
+ for (int i = 0; i < 4; i++) {
+ input_shape_tensor.flat<int32>()(i) = input_dims[i];
+ }
+ inputs.push_back({nullptr, &input_shape_tensor});
+
+ Tensor output_backprop(DT_FLOAT, output_shape);
+ test::FillIota<float>(&output_backprop, 11.0);
+ inputs.push_back({nullptr, &output_backprop});
+
+ // AvgPoolGrad op.
+ NodeDef avgpool_grad_node_def;
+ status = NodeDefBuilder("avgpool_grad_op", "AvgPoolGrad")
+ .Input(FakeInput())
+ .Input(FakeInput(DT_FLOAT))
+ .Attr("ksize", {1, kernel_rows, kernel_cols, 1})
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(&avgpool_grad_node_def);
+ TF_CHECK_OK(status);
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, nullptr, cpu_allocator(), avgpool_grad_node_def, &status));
+ TF_CHECK_OK(status);
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> avgpool_context(new OpKernelContext(params));
+
+ op->Compute(avgpool_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete avgpool_context->release_output(0).tensor;
+ op->Compute(avgpool_context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(avgpool_context->mutable_output(0)->NumElements() *
+ iters);
+ testing::SetLabel(label);
+}
+
+// BS: batch_size
+// IR: input_rows
+// IC: input_cols
+// ND: node_depth
+// KR: kernel_rows
+// KC: kernel_cols
+// ST: stride. We use the same stride for both directions.
+// PT: padding
+ // The resulting symbol is too long, so we need two macros to fit in 80 chars.
+#define BM_AvgPoolBkCPU(BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL) \
+ static void \
+ BM_AvgPoolBk_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH( \
+ int iters) { \
+ BM_AvgPoolBk(iters, BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_AvgPoolBk_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH)
+
+// Shapes taken from the 2015/05/16 inception model
+BM_AvgPoolBkCPU(32, 35, 35, 192, 3, 3, 1, SAME, 1, "avgpool_grad0_SAME");
+BM_AvgPoolBkCPU(32, 35, 35, 256, 3, 3, 1, SAME, 1, "avgpool_grad1_SAME");
+BM_AvgPoolBkCPU(32, 17, 17, 768, 3, 3, 1, SAME, 1, "avgpool_grad2_SAME");
+BM_AvgPoolBkCPU(32, 17, 17, 1024, 3, 3, 1, SAME, 1, "avgpool_grad3_SAME");
+BM_AvgPoolBkCPU(32, 17, 17, 1152, 3, 3, 1, SAME, 1, "avgpool_grad4_SAME");
+BM_AvgPoolBkCPU(32, 17, 17, 1216, 3, 3, 1, SAME, 1, "avgpool_grad5_SAME");
+BM_AvgPoolBkCPU(32, 17, 17, 1248, 5, 5, 3, VALID, 1, "avgpool_grad6_VALID");
+BM_AvgPoolBkCPU(32, 8, 8, 1760, 3, 3, 1, SAME, 1, "avgpool_grad7_SAME");
+BM_AvgPoolBkCPU(32, 8, 8, 2048, 8, 8, 1, VALID, 1, "avgpool_grad8_VALID");
+
+/*
+MaxPooling Op
+*/
+static void BM_MaxPool(int iters, int batch_size, int rows, int cols, int depth,
+ int kernel_rows, int kernel_cols, int stride,
+ Padding padding, int num_threads, const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+ TensorShape shape1({batch_size, rows, cols, depth});
+ Tensor input1(DT_FLOAT, shape1);
+ test::FillIota<float>(&input1, 1.0);
+ inputs.push_back({nullptr, &input1});
+
+ // MaxPooling op.
+ NodeDef maxpool_node_def;
+ CHECK_EQ(kernel_rows, kernel_cols);
+ Status status = NodeDefBuilder("maxpool_op", "MaxPool")
+ .Input(FakeInput())
+ .Attr("ksize", {1, kernel_rows, kernel_cols, 1})
+ .Attr("strides", {1, stride, stride, 1})
+ .Attr("padding", padding == VALID ? "VALID" : "SAME")
+ .Finalize(&maxpool_node_def);
+ TF_CHECK_OK(status);
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), maxpool_node_def, &status));
+ TF_CHECK_OK(status);
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> maxpool_context(new OpKernelContext(params));
+
+ op->Compute(maxpool_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete maxpool_context->release_output(0).tensor;
+ op->Compute(maxpool_context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(maxpool_context->mutable_output(0)->NumElements() *
+ iters);
+ testing::SetLabel(label);
+}
+
+// BS: batch_size
+// IR: input_rows
+// IC: input_cols
+// ND: node_depth
+// KR: kernel_rows
+// KC: kernel_cols
+// ST: stride. We use the same stride for both directions.
+// PT: padding
+#define BM_MaxPoolFwdCPU(BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL) \
+ static void \
+ BM_MaxPool_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH( \
+ int iters) { \
+ BM_MaxPool(iters, BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_MaxPool_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_##PT##_##TH)
+
+// Labels are taken from the 2014-July-24 version of imagenet
+BM_MaxPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, VALID, 1, "maxpool0_VALID");
+BM_MaxPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, VALID, 1, "maxpool1_VALID");
+BM_MaxPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, VALID, 1, "maxpool4_VALID");
+BM_MaxPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, VALID, 1, "maxpool10_VALID");
+BM_MaxPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, SAME, 1, "maxpool0_SAME");
+BM_MaxPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, SAME, 1, "maxpool1_SAME");
+BM_MaxPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, SAME, 1, "maxpool4_SAME");
+BM_MaxPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, SAME, 1, "maxpool10_SAME");
+BM_MaxPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, VALID, 4, "maxpool0_VALID");
+BM_MaxPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, VALID, 4, "maxpool1_VALID");
+BM_MaxPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, VALID, 4, "maxpool4_VALID");
+BM_MaxPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, VALID, 4, "maxpool10_VALID");
+BM_MaxPoolFwdCPU(32, 112, 112, 64, 3, 3, 2, SAME, 4, "maxpool0_SAME");
+BM_MaxPoolFwdCPU(32, 56, 56, 192, 3, 3, 2, SAME, 4, "maxpool1_SAME");
+BM_MaxPoolFwdCPU(32, 28, 28, 352, 3, 3, 2, SAME, 4, "maxpool4_SAME");
+BM_MaxPoolFwdCPU(32, 14, 14, 576, 3, 3, 2, SAME, 4, "maxpool10_SAME");
+
+static void BM_MaxPoolBk(int iters, int batch_size, int rows, int cols,
+ int depth, int kernel_rows, int kernel_cols,
+ int stride, Padding padding, int num_threads,
+ bool use_gpu, const string& label) {
+ GraphDefBuilder b(GraphDefBuilder::kFailImmediately);
+
+ int out_height, out_width, pad_rows, pad_cols;
+ Status status =
+ Get2dOutputSize(rows, cols, kernel_rows, kernel_cols, stride, stride,
+ padding, &out_height, &out_width, &pad_rows, &pad_cols);
+ TF_CHECK_OK(status);
+
+ Tensor input_data(DT_FLOAT, TensorShape({batch_size, rows, cols, depth}));
+ input_data.flat<float>().setRandom();
+ Node* input_data_node = ops::Const(input_data, b.opts());
+
+ Tensor output_data(DT_FLOAT,
+ TensorShape({batch_size, out_height, out_width, depth}));
+ output_data.flat<float>().setRandom();
+ Node* output_data_node = ops::Const(output_data, b.opts());
+
+ Tensor output_diff(DT_FLOAT,
+ TensorShape({batch_size, out_height, out_width, depth}));
+ output_diff.flat<float>().setRandom();
+ Node* output_diff_node = ops::Const(output_diff, b.opts());
+
+ CHECK_EQ(kernel_rows, kernel_cols);
+ ops::MaxPoolGrad(input_data_node, output_data_node, output_diff_node,
+ {1, kernel_rows, kernel_cols, 1} /* ksize */,
+ {1, stride, stride, 1} /* stride */,
+ padding == VALID ? "VALID" : "SAME", b.opts());
+ Graph* g = new Graph(OpRegistry::Global());
+ TF_CHECK_OK(b.ToGraph(g));
+ string device = use_gpu ? "gpu" : "cpu";
+ test::Benchmark(device, g).Run(iters);
+
+ testing::ItemsProcessed(batch_size * rows * cols * depth * iters);
+ testing::SetLabel(label);
+}
+
+// BS: batch_size
+// IR: input_rows
+// IC: input_cols
+// ND: node_depth
+// KR: kernel_rows
+// KC: kernel_cols
+// ST: stride. We use the same stride for both directions.
+// PT: padding
+// The resulting symbol is too long; we need to use two macros to fit within 80 chars.
+// clang-format off
+#define BM_MaxPoolBkGPU(BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL) \
+ static void \
+ BM_MaxPoolBk_GPU_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_ \
+ ##PT##_##TH( \
+ int iters) { \
+ BM_MaxPoolBk(iters, BS, IR, IC, ND, KR, KC, ST, PT, TH, true, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_MaxPoolBk_GPU_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_ \
+ ##PT##_##TH) \
+
+#define BM_MaxPoolBkCPU(BS, IR, IC, ND, KR, KC, ST, PT, TH, LABEL) \
+ static void \
+ BM_MaxPoolBk_CPU_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_ \
+ ##PT##_##TH( \
+ int iters) { \
+ BM_MaxPoolBk(iters, BS, IR, IC, ND, KR, KC, ST, PT, TH, false, LABEL); \
+ } \
+ BENCHMARK( \
+ BM_MaxPoolBk_CPU_##BS##_##IR##_##IC##_##ND##_##KR##_##KC##_##ST##_ \
+ ##PT##_##TH)
+// clang-format on
+
+// Shapes taken from the 2015/05/16 inception model
+BM_MaxPoolBkGPU(32, 147, 147, 64, 3, 3, 2, VALID, 1, "maxpool_grad0_VALID");
+BM_MaxPoolBkGPU(32, 71, 71, 192, 3, 3, 2, VALID, 1, "maxpool_grad1_VALID");
+BM_MaxPoolBkGPU(32, 35, 35, 288, 3, 3, 2, VALID, 1, "maxpool_grad2_VALID");
+BM_MaxPoolBkGPU(32, 17, 17, 1248, 3, 3, 2, VALID, 1, "maxpool_grad3_VALID");
+BM_MaxPoolBkGPU(32, 8, 8, 2048, 3, 3, 2, VALID, 1, "maxpool_grad4_VALID");
+
+BM_MaxPoolBkCPU(32, 147, 147, 64, 3, 3, 2, VALID, 1, "maxpool_grad0_VALID");
+BM_MaxPoolBkCPU(32, 71, 71, 192, 3, 3, 2, VALID, 1, "maxpool_grad1_VALID");
+BM_MaxPoolBkCPU(32, 35, 35, 288, 3, 3, 2, VALID, 1, "maxpool_grad2_VALID");
+BM_MaxPoolBkCPU(32, 17, 17, 1248, 3, 3, 2, VALID, 1, "maxpool_grad3_VALID");
+BM_MaxPoolBkCPU(32, 8, 8, 2048, 3, 3, 2, VALID, 1, "maxpool_grad4_VALID");
+
+/*
+Relu Op
+Run the BM_Relu benchmarks registered below.
+*/
+static void BM_ReluFloat(int iters, int batch_size, int rows, int cols,
+ int depth, int num_threads, const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+ TensorShape shape1({batch_size, rows, cols, depth});
+ Tensor input1(DT_FLOAT, shape1);
+ test::FillIota<float>(&input1, 1.0);
+ inputs.push_back({nullptr, &input1});
+
+  // Relu op.
+ NodeDef relu_node_def;
+ Status status = NodeDefBuilder("relu_op", "Relu")
+ .Input(FakeInput(DT_FLOAT))
+ .Finalize(&relu_node_def);
+ TF_CHECK_OK(status);
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), relu_node_def, &status));
+ TF_CHECK_OK(status);
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> relu_context(new OpKernelContext(params));
+
+ op->Compute(relu_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete relu_context->release_output(0).tensor;
+ op->Compute(relu_context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(relu_context->mutable_output(0)->NumElements() *
+ iters);
+ testing::SetLabel(label);
+}
+
+// BS: batch_size
+// IR: input_rows
+// IC: input_cols
+// ND: node_depth
+#define BM_Relu(BS, IR, IC, ND, TH, LABEL) \
+ static void BM_ReluFloat_##BS##_##IR##_##IC##_##ND##_##TH(int iters) { \
+ BM_ReluFloat(iters, BS, IR, IC, ND, TH, LABEL); \
+ } \
+ BENCHMARK(BM_ReluFloat_##BS##_##IR##_##IC##_##ND##_##TH)
+
+BM_Relu(32, 112, 112, 64, 1, "relu0");
+BM_Relu(32, 56, 56, 192, 1, "relu1");
+BM_Relu(32, 28, 28, 352, 1, "relu4");
+BM_Relu(32, 14, 14, 576, 1, "relu10");
+BM_Relu(32, 112, 112, 64, 4, "relu0");
+BM_Relu(32, 56, 56, 192, 4, "relu1");
+BM_Relu(32, 28, 28, 352, 4, "relu4");
+BM_Relu(32, 14, 14, 576, 4, "relu10");
+
+static void BM_ImageNetSoftmaxFwd(int iters, int batch_size, int node_depth,
+ int num_threads, const string& label) {
+ tensorflow::testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ thread::ThreadPool threadpool(Env::Default(), "test", num_threads);
+ EigenThreadPoolWrapper wrapper(&threadpool);
+ Eigen::ThreadPoolDevice eigen_cpu_device(&wrapper, num_threads);
+ device->set_eigen_cpu_device(&eigen_cpu_device);
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+ TensorShape shape1({node_depth, batch_size});
+ Tensor* input1 = new Tensor(DT_FLOAT, shape1);
+ test::FillIota<float>(input1, 1.0);
+ inputs.push_back({nullptr, input1});
+
+ // Softmax op.
+ NodeDef softmax_node_def;
+ TF_CHECK_OK(NodeDefBuilder("softmax_op", "Softmax")
+ .Input("input", 0, DT_FLOAT)
+ .Finalize(&softmax_node_def));
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), softmax_node_def, &status));
+ TF_CHECK_OK(status);
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> softmax_context(new OpKernelContext(params));
+
+ op->Compute(softmax_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete softmax_context->release_output(0).tensor;
+ op->Compute(softmax_context.get());
+ }
+ tensorflow::testing::StopTiming();
+ testing::ItemsProcessed(softmax_context->mutable_output(0)->NumElements() *
+ iters);
+ testing::SetLabel(label);
+}
+
+#define BM_ImageNetSoftmaxFwdCPU(BATCH_SIZE, NODE_DEPTH, TH, LABEL) \
+ static void BM_ImageNetSoftmaxFwd_##BATCH_SIZE##_##NODE_DEPTH##_##TH( \
+ int iters) { \
+ BM_ImageNetSoftmaxFwd(iters, BATCH_SIZE, NODE_DEPTH, TH, LABEL); \
+ } \
+ BENCHMARK(BM_ImageNetSoftmaxFwd_##BATCH_SIZE##_##NODE_DEPTH##_##TH)
+
+// Labels are taken from the 2014-July-24 version of imagenet
+BM_ImageNetSoftmaxFwdCPU(32, 1008, 1, "softmax32");
+BM_ImageNetSoftmaxFwdCPU(128, 1008, 1, "softmax128");
+BM_ImageNetSoftmaxFwdCPU(32, 1008, 4, "softmax32");
+BM_ImageNetSoftmaxFwdCPU(128, 1008, 4, "softmax128");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/count_up_to_op.cc b/tensorflow/core/kernels/count_up_to_op.cc
new file mode 100644
index 0000000000..7cf4bdb6d0
--- /dev/null
+++ b/tensorflow/core/kernels/count_up_to_op.cc
@@ -0,0 +1,51 @@
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+template <class T>
+class CountUpToOp : public OpKernel {
+ public:
+ explicit CountUpToOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("limit", &limit_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ T before_increment;
+ {
+ mutex_lock l(*context->input_ref_mutex(0));
+ Tensor tensor = context->mutable_input(0, true);
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(tensor.shape()),
+ errors::InvalidArgument("input is not a scalar: ",
+ tensor.shape().DebugString()));
+ T* ptr = &tensor.scalar<T>()();
+ before_increment = *ptr;
+ if (*ptr >= limit_) {
+ context->SetStatus(errors::OutOfRange("Reached limit of ", limit_));
+ return;
+ }
+ ++*ptr;
+ }
+ // Output if no error.
+ Tensor* out_tensor;
+ OP_REQUIRES_OK(context, context->allocate_output("output", TensorShape({}),
+ &out_tensor));
+ out_tensor->scalar<T>()() = before_increment;
+ }
+
+ private:
+ T limit_;
+};
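+
+// A minimal usage sketch (illustrative only; the node and input names here
+// are hypothetical): the op takes a ref to a scalar variable and an integer
+// "limit" attr, e.g.
+//   NodeDefBuilder("count", "CountUpTo")
+//       .Input("counter", 0, DT_INT32_REF)
+//       .Attr("limit", 5)
+//       .Finalize(&node_def);
+// Each Compute() returns the value held before the increment and reports
+// OutOfRange once the variable has reached the limit.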
+
+#define REGISTER(TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("CountUpTo").TypeConstraint<TYPE>("T").Device(DEVICE_CPU), \
+ CountUpToOp<TYPE>)
+
+REGISTER(int32);
+REGISTER(int64);
+
+#undef REGISTER
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_abs.cc b/tensorflow/core/kernels/cwise_op_abs.cc
new file mode 100644
index 0000000000..5d39b88166
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_abs.cc
@@ -0,0 +1,23 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(UnaryOp, CPU, "Abs", functor::abs, float, double, int32, int64);
+#ifndef __ANDROID__
+REGISTER_KERNEL_BUILDER(Name("ComplexAbs").Device(DEVICE_CPU),
+ UnaryOp<CPUDevice, functor::abs<complex64>>);
+#endif
+#if GOOGLE_CUDA
+REGISTER3(UnaryOp, GPU, "Abs", functor::abs, float, double, int64);
+#endif
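+
+// For reference, REGISTER4 / REGISTER3 above are shorthands defined in
+// cwise_ops_common.h; each expands to one REGISTER_KERNEL_BUILDER per listed
+// type, roughly:
+//   REGISTER_KERNEL_BUILDER(
+//       Name("Abs").Device(DEVICE_CPU).TypeConstraint<float>("T"),
+//       UnaryOp<CPUDevice, functor::abs<float>>);
+// and likewise for the remaining types (and DEVICE_GPU for the GPU variant).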
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Abs")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .TypeConstraint<int32>("T"),
+ UnaryOp<CPUDevice, functor::abs<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_add.cc b/tensorflow/core/kernels/cwise_op_add.cc
new file mode 100644
index 0000000000..a6cd4bddbe
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_add.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER7(BinaryOp, CPU, "Add", functor::add, float, double, int32, int64, int8,
+ int16, complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Add", functor::add, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Add")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::add<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_ceil.cc b/tensorflow/core/kernels/cwise_op_ceil.cc
new file mode 100644
index 0000000000..0a8f1313f8
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_ceil.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(UnaryOp, CPU, "Ceil", functor::ceil, float, double);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Ceil", functor::ceil, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_complex.cc b/tensorflow/core/kernels/cwise_op_complex.cc
new file mode 100644
index 0000000000..825181bc35
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_complex.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("Complex").Device(DEVICE_CPU),
+ BinaryOp<CPUDevice, functor::make_complex<float>>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Complex").Device(DEVICE_GPU),
+ BinaryOp<GPUDevice, functor::make_complex<float>>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_conj.cc b/tensorflow/core/kernels/cwise_op_conj.cc
new file mode 100644
index 0000000000..ba445d1c3d
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_conj.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("Conj").Device(DEVICE_CPU),
+ UnaryOp<CPUDevice, functor::conj<complex64>>);
+#if GOOGLE_CUDA
+// REGISTER_KERNEL_BUILDER(Name("Conj").Device(DEVICE_GPU),
+// UnaryOp<GPUDevice, functor::conj<complex64>>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_cos.cc b/tensorflow/core/kernels/cwise_op_cos.cc
new file mode 100644
index 0000000000..45e24fc2ec
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_cos.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Cos", functor::cos, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Cos", functor::cos, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_div.cc b/tensorflow/core/kernels/cwise_op_div.cc
new file mode 100644
index 0000000000..76d606ed03
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_div.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(BinaryOp, CPU, "Div", functor::div, float, double, int32, int64,
+ complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Div", functor::div, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Div")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::div<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_equal_to.cc b/tensorflow/core/kernels/cwise_op_equal_to.cc
new file mode 100644
index 0000000000..8369299332
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_equal_to.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(BinaryOp, CPU, "Equal", functor::equal_to, float, double, int32,
+ int64, complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Equal", functor::equal_to, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Equal")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::equal_to<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_exp.cc b/tensorflow/core/kernels/cwise_op_exp.cc
new file mode 100644
index 0000000000..b2603a1b4c
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_exp.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Exp", functor::exp, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Exp", functor::exp, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_floor.cc b/tensorflow/core/kernels/cwise_op_floor.cc
new file mode 100644
index 0000000000..83c8203953
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_floor.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(UnaryOp, CPU, "Floor", functor::floor, float, double);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Floor", functor::floor, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_gpu_abs.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_abs.cu.cc
new file mode 100644
index 0000000000..59436afbc0
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_abs.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY3(abs, float, double, int64);
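+// DEFINE_UNARY3 (from cwise_ops_gpu_common.cu.h) explicitly instantiates the
+// GPU functor for each listed type, roughly:
+//   template struct UnaryFunctor<GPUDevice, abs<float>>;
+// and likewise for double and int64; the logical_* files below spell out the
+// same kind of instantiation by hand.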
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_add.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_add.cu.cc
new file mode 100644
index 0000000000..edf8e0d1a5
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_add.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(add, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_ceil.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_ceil.cu.cc
new file mode 100644
index 0000000000..f24c4b8b73
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_ceil.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(ceil, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_complex.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_complex.cu.cc
new file mode 100644
index 0000000000..29086b5c71
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_complex.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY1(make_complex, float);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_conj.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_conj.cu.cc
new file mode 100644
index 0000000000..cae22cea8e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_conj.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+// DEFINE_UNARY1(conj, complex64); // not working
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_cos.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_cos.cu.cc
new file mode 100644
index 0000000000..c8412496a8
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_cos.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(cos, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_div.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_div.cu.cc
new file mode 100644
index 0000000000..c581c0487e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_div.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(div, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_equal_to.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_equal_to.cu.cc
new file mode 100644
index 0000000000..f994822a74
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_equal_to.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY4(equal_to, float, double, int64, complex64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_exp.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_exp.cu.cc
new file mode 100644
index 0000000000..caeaa19cef
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_exp.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(exp, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_floor.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_floor.cu.cc
new file mode 100644
index 0000000000..0a06ff2978
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_floor.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(floor, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_greater.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_greater.cu.cc
new file mode 100644
index 0000000000..e1278e077b
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_greater.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(greater, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_greater_equal.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_greater_equal.cu.cc
new file mode 100644
index 0000000000..fafcf9b28a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_greater_equal.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(greater_equal, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_imag.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_imag.cu.cc
new file mode 100644
index 0000000000..0370782c96
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_imag.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY1(get_imag, complex64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_inverse.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_inverse.cu.cc
new file mode 100644
index 0000000000..020abef210
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_inverse.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY3(inverse, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_isfinite.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_isfinite.cu.cc
new file mode 100644
index 0000000000..7a3a273af7
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_isfinite.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(isfinite, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_isinf.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_isinf.cu.cc
new file mode 100644
index 0000000000..cfc4be3d25
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_isinf.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(isinf, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_isnan.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_isnan.cu.cc
new file mode 100644
index 0000000000..c93b74387e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_isnan.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(isnan, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_less.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_less.cu.cc
new file mode 100644
index 0000000000..8e2b28ac60
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_less.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(less, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_less_equal.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_less_equal.cu.cc
new file mode 100644
index 0000000000..be8e34a58b
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_less_equal.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(less_equal, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_log.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_log.cu.cc
new file mode 100644
index 0000000000..7d183cce50
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_log.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(log, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_logical_and.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_logical_and.cu.cc
new file mode 100644
index 0000000000..ba7046f9f0
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_logical_and.cu.cc
@@ -0,0 +1,13 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+template struct BinaryFunctor<GPUDevice, logical_and, 1>;
+template struct BinaryFunctor<GPUDevice, logical_and, 2>;
+template struct BinaryFunctor<GPUDevice, logical_and, 3>;
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_logical_not.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_logical_not.cu.cc
new file mode 100644
index 0000000000..34a43a76ef
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_logical_not.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+template struct UnaryFunctor<GPUDevice, logical_not>;
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_logical_or.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_logical_or.cu.cc
new file mode 100644
index 0000000000..47a7bd68dc
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_logical_or.cu.cc
@@ -0,0 +1,13 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+template struct BinaryFunctor<GPUDevice, logical_or, 1>;
+template struct BinaryFunctor<GPUDevice, logical_or, 2>;
+template struct BinaryFunctor<GPUDevice, logical_or, 3>;
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_maximum.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_maximum.cu.cc
new file mode 100644
index 0000000000..8f7ab90e9a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_maximum.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(maximum, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_minimum.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_minimum.cu.cc
new file mode 100644
index 0000000000..75fd7f89b4
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_minimum.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(minimum, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_mod.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_mod.cu.cc
new file mode 100644
index 0000000000..d08a17a94d
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_mod.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+// No GPU ops for mod yet.
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_mul.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_mul.cu.cc
new file mode 100644
index 0000000000..e0a6738bef
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_mul.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(mul, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_neg.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_neg.cu.cc
new file mode 100644
index 0000000000..3031afbb75
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_neg.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY4(neg, float, double, int32, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_not_equal_to.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_not_equal_to.cu.cc
new file mode 100644
index 0000000000..59c76ee88b
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_not_equal_to.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY4(not_equal_to, float, double, int64, complex64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_pow.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_pow.cu.cc
new file mode 100644
index 0000000000..50177495bc
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_pow.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(pow, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_real.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_real.cu.cc
new file mode 100644
index 0000000000..3b1d465914
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_real.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY1(get_real, complex64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_rsqrt.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_rsqrt.cu.cc
new file mode 100644
index 0000000000..682e2d2d4b
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_rsqrt.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(rsqrt, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
new file mode 100644
index 0000000000..b5125648e3
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_select.cu.cc
@@ -0,0 +1,15 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+template struct SelectFunctor<GPUDevice, float>;
+template struct SelectFunctor<GPUDevice, double>;
+template struct SelectFunctor<GPUDevice, int32>;
+template struct SelectFunctor<GPUDevice, int64>;
+template struct SelectFunctor<GPUDevice, complex64>;
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_sigmoid.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_sigmoid.cu.cc
new file mode 100644
index 0000000000..9c250f3071
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_sigmoid.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(sigmoid, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_sign.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_sign.cu.cc
new file mode 100644
index 0000000000..f413480ecc
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_sign.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY3(sign, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_sin.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_sin.cu.cc
new file mode 100644
index 0000000000..6135f3b780
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_sin.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(sin, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_sqrt.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_sqrt.cu.cc
new file mode 100644
index 0000000000..9bdf3b9e30
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_sqrt.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(sqrt, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_square.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_square.cu.cc
new file mode 100644
index 0000000000..6b900e994d
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_square.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY3(square, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_sub.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_sub.cu.cc
new file mode 100644
index 0000000000..6fd5ea0d38
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_sub.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_BINARY3(sub, float, double, int64);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_gpu_tanh.cu.cc b/tensorflow/core/kernels/cwise_op_gpu_tanh.cu.cc
new file mode 100644
index 0000000000..e0393f6c2a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_gpu_tanh.cu.cc
@@ -0,0 +1,11 @@
+#if GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/cwise_ops_gpu_common.cu.h"
+
+namespace tensorflow {
+namespace functor {
+DEFINE_UNARY2(tanh, float, double);
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/cwise_op_greater.cc b/tensorflow/core/kernels/cwise_op_greater.cc
new file mode 100644
index 0000000000..9ae31dcdfe
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_greater.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "Greater", functor::greater, float, double, int32,
+ int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Greater", functor::greater, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Greater")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::greater<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_greater_equal.cc b/tensorflow/core/kernels/cwise_op_greater_equal.cc
new file mode 100644
index 0000000000..be4cc5dc79
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_greater_equal.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "GreaterEqual", functor::greater_equal, float, double,
+ int32, int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "GreaterEqual", functor::greater_equal, float, double,
+ int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("GreaterEqual")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::greater_equal<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_imag.cc b/tensorflow/core/kernels/cwise_op_imag.cc
new file mode 100644
index 0000000000..c2432326fc
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_imag.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("Imag").Device(DEVICE_CPU),
+ UnaryOp<CPUDevice, functor::get_imag<complex64>>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Imag").Device(DEVICE_GPU),
+ UnaryOp<GPUDevice, functor::get_imag<complex64>>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_inverse.cc b/tensorflow/core/kernels/cwise_op_inverse.cc
new file mode 100644
index 0000000000..6af883e755
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_inverse.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Inv", functor::inverse, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER3(UnaryOp, GPU, "Inv", functor::inverse, float, double, int64);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_isfinite.cc b/tensorflow/core/kernels/cwise_op_isfinite.cc
new file mode 100644
index 0000000000..e52d199a8f
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_isfinite.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(UnaryOp, CPU, "IsFinite", functor::isfinite, float, double);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "IsFinite", functor::isfinite, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_isinf.cc b/tensorflow/core/kernels/cwise_op_isinf.cc
new file mode 100644
index 0000000000..868204f86e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_isinf.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(UnaryOp, CPU, "IsInf", functor::isinf, float, double);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "IsInf", functor::isinf, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_isnan.cc b/tensorflow/core/kernels/cwise_op_isnan.cc
new file mode 100644
index 0000000000..a8f4d60d0f
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_isnan.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(UnaryOp, CPU, "IsNan", functor::isnan, float, double);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "IsNan", functor::isnan, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_less.cc b/tensorflow/core/kernels/cwise_op_less.cc
new file mode 100644
index 0000000000..3b5f75445c
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_less.cc
@@ -0,0 +1,20 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "Less", functor::less, float, double, int32, int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Less", functor::less, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Less")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::less<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_less_equal.cc b/tensorflow/core/kernels/cwise_op_less_equal.cc
new file mode 100644
index 0000000000..507c7c2908
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_less_equal.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "LessEqual", functor::less_equal, float, double, int32,
+ int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "LessEqual", functor::less_equal, float, double,
+ int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("LessEqual")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::less_equal<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_log.cc b/tensorflow/core/kernels/cwise_op_log.cc
new file mode 100644
index 0000000000..ebc7cbcc4e
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_log.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Log", functor::log, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Log", functor::log, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_logical_and.cc b/tensorflow/core/kernels/cwise_op_logical_and.cc
new file mode 100644
index 0000000000..a4075088f4
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_logical_and.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("LogicalAnd").Device(DEVICE_CPU),
+ BinaryOp<CPUDevice, functor::logical_and>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("LogicalAnd").Device(DEVICE_GPU),
+ BinaryOp<GPUDevice, functor::logical_and>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_logical_not.cc b/tensorflow/core/kernels/cwise_op_logical_not.cc
new file mode 100644
index 0000000000..b2e97bf70c
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_logical_not.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("LogicalNot").Device(DEVICE_CPU),
+ UnaryOp<CPUDevice, functor::logical_not>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("LogicalNot").Device(DEVICE_GPU),
+ UnaryOp<GPUDevice, functor::logical_not>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_logical_or.cc b/tensorflow/core/kernels/cwise_op_logical_or.cc
new file mode 100644
index 0000000000..0d1df082f7
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_logical_or.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("LogicalOr").Device(DEVICE_CPU),
+ BinaryOp<CPUDevice, functor::logical_or>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("LogicalOr").Device(DEVICE_GPU),
+ BinaryOp<GPUDevice, functor::logical_or>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_maximum.cc b/tensorflow/core/kernels/cwise_op_maximum.cc
new file mode 100644
index 0000000000..c0c9e3f6f5
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_maximum.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "Maximum", functor::maximum, float, double, int32,
+ int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Maximum", functor::maximum, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Maximum")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::maximum<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_minimum.cc b/tensorflow/core/kernels/cwise_op_minimum.cc
new file mode 100644
index 0000000000..4c6bf7df05
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_minimum.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(BinaryOp, CPU, "Minimum", functor::minimum, float, double, int32,
+ int64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Minimum", functor::minimum, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Minimum")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::minimum<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_mod.cc b/tensorflow/core/kernels/cwise_op_mod.cc
new file mode 100644
index 0000000000..17f2834030
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_mod.cc
@@ -0,0 +1,6 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER2(BinaryOp, CPU, "Mod", functor::mod, int32, int64);
+REGISTER2(BinaryOp, CPU, "Mod", functor::fmod, float, double);
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_mul.cc b/tensorflow/core/kernels/cwise_op_mul.cc
new file mode 100644
index 0000000000..15f65012cd
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_mul.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER7(BinaryOp, CPU, "Mul", functor::mul, float, double, int32, int64, int8,
+ int16, complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Mul", functor::mul, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Mul")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::mul<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_neg.cc b/tensorflow/core/kernels/cwise_op_neg.cc
new file mode 100644
index 0000000000..3a19b2e94f
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_neg.cc
@@ -0,0 +1,9 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(UnaryOp, CPU, "Neg", functor::neg, float, double, int32, complex64,
+ int64);
+#if GOOGLE_CUDA
+REGISTER4(UnaryOp, GPU, "Neg", functor::neg, float, double, int32, int64);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_not_equal_to.cc b/tensorflow/core/kernels/cwise_op_not_equal_to.cc
new file mode 100644
index 0000000000..02d434a1c2
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_not_equal_to.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(BinaryOp, CPU, "NotEqual", functor::not_equal_to, float, double,
+ int32, int64, complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "NotEqual", functor::not_equal_to, float, double,
+ int64);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_pow.cc b/tensorflow/core/kernels/cwise_op_pow.cc
new file mode 100644
index 0000000000..d10dced85f
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_pow.cc
@@ -0,0 +1,9 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(BinaryOp, CPU, "Pow", functor::pow, float, double, int32, int64,
+ complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Pow", functor::pow, float, double, int64);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_real.cc b/tensorflow/core/kernels/cwise_op_real.cc
new file mode 100644
index 0000000000..84295a5a16
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_real.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_KERNEL_BUILDER(Name("Real").Device(DEVICE_CPU),
+ UnaryOp<CPUDevice, functor::get_real<complex64>>);
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Real").Device(DEVICE_GPU),
+ UnaryOp<GPUDevice, functor::get_real<complex64>>);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_rsqrt.cc b/tensorflow/core/kernels/cwise_op_rsqrt.cc
new file mode 100644
index 0000000000..a22b1209de
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_rsqrt.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Rsqrt", functor::rsqrt, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Rsqrt", functor::rsqrt, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_select.cc b/tensorflow/core/kernels/cwise_op_select.cc
new file mode 100644
index 0000000000..baa821690a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_select.cc
@@ -0,0 +1,17 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER_SELECT(CPU, "Select", "", float);
+REGISTER_SELECT(CPU, "Select", "", double);
+REGISTER_SELECT(CPU, "Select", "", int32);
+REGISTER_SELECT(CPU, "Select", "", int64);
+REGISTER_SELECT(CPU, "Select", "", complex64);
+REGISTER_SELECT(CPU, "Select", "", string);
+#if GOOGLE_CUDA
+REGISTER_SELECT(GPU, "Select", "", float);
+REGISTER_SELECT(GPU, "Select", "", double);
+REGISTER_SELECT(GPU, "Select", "", int32);
+REGISTER_SELECT(GPU, "Select", "", int64);
+REGISTER_SELECT(GPU, "Select", "", complex64);
+#endif // GOOGLE_CUDA
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sigmoid.cc b/tensorflow/core/kernels/cwise_op_sigmoid.cc
new file mode 100644
index 0000000000..e03b5d54dd
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_sigmoid.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Sigmoid", functor::sigmoid, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Sigmoid", functor::sigmoid, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sign.cc b/tensorflow/core/kernels/cwise_op_sign.cc
new file mode 100644
index 0000000000..59a0bfa1ed
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_sign.cc
@@ -0,0 +1,19 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER4(UnaryOp, CPU, "Sign", functor::sign, float, double, int32, int64);
+#if GOOGLE_CUDA
+REGISTER3(UnaryOp, GPU, "Sign", functor::sign, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Sign")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .TypeConstraint<int32>("T"),
+ UnaryOp<CPUDevice, functor::sign<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sin.cc b/tensorflow/core/kernels/cwise_op_sin.cc
new file mode 100644
index 0000000000..e7c87374d7
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_sin.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Sin", functor::sin, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Sin", functor::sin, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sqrt.cc b/tensorflow/core/kernels/cwise_op_sqrt.cc
new file mode 100644
index 0000000000..f43241264a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_sqrt.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Sqrt", functor::sqrt, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Sqrt", functor::sqrt, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_square.cc b/tensorflow/core/kernels/cwise_op_square.cc
new file mode 100644
index 0000000000..510fda49aa
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_square.cc
@@ -0,0 +1,9 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(UnaryOp, CPU, "Square", functor::square, float, double, int32,
+ complex64, int64);
+#if GOOGLE_CUDA
+REGISTER3(UnaryOp, GPU, "Square", functor::square, float, double, int64);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_sub.cc b/tensorflow/core/kernels/cwise_op_sub.cc
new file mode 100644
index 0000000000..c3c5952f8d
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_sub.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER5(BinaryOp, CPU, "Sub", functor::sub, float, double, int32, int64,
+ complex64);
+#if GOOGLE_CUDA
+REGISTER3(BinaryOp, GPU, "Sub", functor::sub, float, double, int64);
+#endif
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Sub")
+ .Device(DEVICE_GPU)
+ .HostMemory("x")
+ .HostMemory("y")
+ .HostMemory("z")
+ .TypeConstraint<int32>("T"),
+ BinaryOp<CPUDevice, functor::sub<int32>>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_op_tanh.cc b/tensorflow/core/kernels/cwise_op_tanh.cc
new file mode 100644
index 0000000000..31f4743449
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_op_tanh.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+REGISTER3(UnaryOp, CPU, "Tanh", functor::tanh, float, double, complex64);
+#if GOOGLE_CUDA
+REGISTER2(UnaryOp, GPU, "Tanh", functor::tanh, float, double);
+#endif
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_ops.h b/tensorflow/core/kernels/cwise_ops.h
new file mode 100644
index 0000000000..7d818cfbbf
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_ops.h
@@ -0,0 +1,607 @@
+#ifndef TENSORFLOW_KERNELS_CWISE_OPS_H_
+#define TENSORFLOW_KERNELS_CWISE_OPS_H_
+
+#include <cmath>
+#include <functional>
+#include "tensorflow/core/framework/numeric_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+// The following functors (sign, tanh, sigmoid, etc.) are not defined
+// by Eigen. When their equivalents are added to Eigen, we can
+// replace them with type aliases.
+
+namespace Eigen {
+namespace internal {
+
+template <typename T>
+struct scalar_sign_op {
+ // TODO(zhifengc): this only works for real types. In theory,
+ // sign(x) = x / |x| works for both real and complex values.
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sign_op);
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const {
+ return T(x > T(0)) - T(x < T(0));
+ }
+};
+
+// TODO(zhifengc): Eigen::internal::pow_impl does not have proper
+// EIGEN host/device decoration. We duplicate code here for now.
+template <typename T, bool IsInteger>
+struct pow {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T
+ operator()(const T& x, const T& y) const {
+ return std::pow(x, y);
+ }
+};
+
+template <typename T>
+struct pow<T, true> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(T x, T y) const {
+ T res(1);
+ if (y & 1) res *= x;
+ y >>= 1;
+ while (y) {
+ x *= x;
+ if (y & 1) res *= x;
+ y >>= 1;
+ }
+ return res;
+ }
+};
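+
+// Illustrative trace (comment only, not built): the integer
+// specialization above computes pow by repeated squaring. For
+// pow(3, 5), with 5 == binary 101:
+//   y = 5: low bit set,   res = 1 * 3  = 3
+//   y = 2: x = 9,         low bit clear, res unchanged
+//   y = 1: x = 81,        res = 3 * 81 = 243
+// i.e. 3^5 = 243 in O(log y) multiplications.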
+
+template <typename T>
+struct scalar_pow2_op : pow<T, NumTraits<T>::IsInteger> {};
+
+template <typename T>
+struct functor_traits<scalar_pow2_op<T> > {
+ enum {
+ Cost = 5 * NumTraits<T>::MulCost,
+ PacketAccess = false,
+ };
+};
+
+template <typename T>
+struct scalar_fmod2_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_fmod2_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T operator()(const T& a,
+ const T& b) const {
+ return fmod(a, b);
+ }
+};
+
+template <typename T>
+struct scalar_mod2_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_mod2_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T
+ operator()(const T& a, const T& b) const {
+ return a % b;
+ }
+};
+
+template <typename T>
+struct functor_traits<scalar_mod2_op<T> > {
+ enum {
+ Cost = 5, // Roughly the cost of a div
+ PacketAccess = false,
+ };
+};
+
+// scalar_left and scalar_right are template helpers to partially
+// apply a binary function.
+//
+// Suppose Binary is a binary functor f(x, y); then scalar_left<> is a
+// unary functor g_x(y) = f(x, y), where x is provided via the
+// constructor. Similarly, scalar_right<> is a unary functor
+// g_y(x) = f(x, y), where y is provided via the constructor.
+
+template <typename Tout, typename Tin, typename Binary,
+ bool PacketAccess = functor_traits<Binary>::PacketAccess>
+struct scalar_left {
+ typedef Tout result_type;
+ const Tin* left;
+ EIGEN_DEVICE_FUNC inline scalar_left(
+ const scalar_left& other) // NOLINT(runtime/explicit)
+ : left(other.left) {}
+ EIGEN_DEVICE_FUNC inline explicit scalar_left(const Tin* c) : left(c) {}
+ EIGEN_DEVICE_FUNC inline Tout operator()(const Tin& right) const {
+ return Binary()(*left, right);
+ }
+};
+
+template <typename Tout, typename Tin, typename Binary>
+struct scalar_left<Tout, Tin, Binary, true> {
+ typedef Tout result_type;
+ const Tin* left;
+ EIGEN_DEVICE_FUNC inline scalar_left(
+ const scalar_left& other) // NOLINT(runtime/explicit)
+ : left(other.left) {}
+ EIGEN_DEVICE_FUNC inline explicit scalar_left(const Tin* c) : left(c) {}
+ EIGEN_DEVICE_FUNC inline Tout operator()(const Tin& right) const {
+ return Binary()(*left, right);
+ }
+
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& right_packet) const {
+ const Packet left_packet = Eigen::internal::pset1<Packet>(*left);
+ return Binary().packetOp(left_packet, right_packet);
+ }
+};
+
+template <typename Tout, typename Tin, typename Binary>
+struct functor_traits<scalar_left<Tout, Tin, Binary> > {
+ enum {
+ Cost = functor_traits<Binary>::Cost,
+ PacketAccess = functor_traits<Binary>::PacketAccess,
+ };
+};
+
+template <typename Tout, typename Tin, typename Binary,
+ bool PacketAccess = functor_traits<Binary>::PacketAccess>
+struct scalar_right {
+ typedef Tout result_type;
+ const Tin* right;
+ EIGEN_DEVICE_FUNC inline scalar_right(
+ const scalar_right& other) // NOLINT(runtime/explicit)
+ : right(other.right) {}
+ EIGEN_DEVICE_FUNC inline explicit scalar_right(const Tin* c) : right(c) {}
+ EIGEN_DEVICE_FUNC inline Tout operator()(const Tin& left) const {
+ return Binary()(left, *right);
+ }
+};
+
+template <typename Tout, typename Tin, typename Binary>
+struct scalar_right<Tout, Tin, Binary, true> {
+ typedef Tout result_type;
+ const Tin* right;
+ EIGEN_DEVICE_FUNC inline scalar_right(
+ const scalar_right& other) // NOLINT(runtime/explicit)
+ : right(other.right) {}
+ EIGEN_DEVICE_FUNC inline explicit scalar_right(const Tin* c) : right(c) {}
+ EIGEN_DEVICE_FUNC inline Tout operator()(const Tin& left) const {
+ return Binary()(left, *right);
+ }
+
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC inline Packet packetOp(const Packet& left_packet) const {
+ const Packet right_packet = Eigen::internal::pset1<Packet>(*right);
+ return Binary().packetOp(left_packet, right_packet);
+ }
+};
+
+template <typename Tout, typename Tin, typename Binary>
+struct functor_traits<scalar_right<Tout, Tin, Binary> > {
+ enum {
+ Cost = functor_traits<Binary>::Cost,
+ PacketAccess = functor_traits<Binary>::PacketAccess,
+ };
+};
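+
+// A minimal usage sketch (comment only; float and
+// Eigen::internal::scalar_sum_op are merely illustrative choices):
+//   float c = 2.0f;
+//   scalar_left<float, float, scalar_sum_op<float> > add_c(&c);
+//   add_c(3.0f);  // == scalar_sum_op<float>()(2.0f, 3.0f) == 5.0f
+// The scalar operand is bound on the left and the functor becomes
+// unary in the remaining operand; scalar_right binds it on the right.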
+
+// similar to std::equal_to, but with the DEVICE_FUNC qualifier
+template <class T>
+struct equal_to : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x == y; }
+};
+
+// similar to std::not_equal_to, but with the DEVICE_FUNC qualifier
+template <class T>
+struct not_equal_to : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x != y; }
+};
+
+// similar to std::greater, but with the DEVICE_FUNC qualifier
+template <class T>
+struct greater : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x > y; }
+};
+
+// similar to std::less, but with the DEVICE_FUNC qualifier
+template <class T>
+struct less : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x < y; }
+};
+
+// similar to std::greater_equal, but with the DEVICE_FUNC qualifier
+template <class T>
+struct greater_equal : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x >= y; }
+};
+
+// similar to std::less_equal, but with the DEVICE_FUNC qualifier
+template <class T>
+struct less_equal : std::binary_function<T, T, bool> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool operator()(const T& x, const T& y) const { return x <= y; }
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+namespace tensorflow {
+namespace functor {
+
+////////////////////////////////////////////////////////////////////////////////
+// Helpers
+////////////////////////////////////////////////////////////////////////////////
+
+// Base template for functors whose input scalar type is T and
+// output scalar type is R.
+template <typename T, typename F, typename R = T>
+struct base {
+ // func defines operator() and its vectorized version packetOp().
+ typedef F func;
+
+ // If true, the functor's corresponding binary op will instantiate
+ // specialized kernels to perform an optimized broadcast
+ // operation. Each functor for which this is enabled increases the
+ // code size, so by default this is disabled for binary functors and
+ // is enabled on a per-op basis as needed.
+ static const bool use_bcast_optimization = false;
+
+ // operator() has the signature:
+ // out_type operator()(in_type in0, in_type in1 ...)
+ typedef R out_type;
+ typedef T in_type;
+
+  // TensorFlow provides a tensor-ized version of "func". Roughly
+  // speaking, the TensorFlow operation has the signature:
+ // tout_type op(tin_type in0)
+ // tout_type op(tin_type in0, tin_type in1)
+ // tout_type op(tin_type in0, in_type scalar)
+ typedef typename TTypes<out_type>::Flat tout_type;
+ typedef typename TTypes<in_type>::ConstFlat tin_type;
+ typedef typename TTypes<in_type>::ConstScalar tscalar_type;
+};
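+
+// As an illustration (comment only): functor::add<float>, defined
+// further below, exposes
+//   func         = Eigen::internal::scalar_sum_op<float>
+//   in_type      = float, out_type = float
+//   tin_type     = TTypes<float>::ConstFlat
+//   tout_type    = TTypes<float>::Flat
+//   tscalar_type = TTypes<float>::ConstScalar
+// which is what UnaryFunctor/BinaryFunctor below are written against.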
+
+// For now, we only apply certain speed optimizations to the
+// float/double broadcast binary ops.
+template <typename T>
+struct use_bcast_optimization {
+ static const bool value = false;
+};
+
+template <>
+struct use_bcast_optimization<float> {
+ static const bool value = true;
+};
+
+template <>
+struct use_bcast_optimization<double> {
+ static const bool value = true;
+};
+
+////////////////////////////////////////////////////////////////////////////////
+// Unary functors
+////////////////////////////////////////////////////////////////////////////////
+
+// abs(x) = |x|
+// neg(x) = - x
+// inverse(x) = 1 / x
+// square(x) = x^2
+// sqrt(x) = x^(1/2)
+// rsqrt(x) = x^(-1/2)
+// exp(x) = e^x
+// log(x) = natural logarithm of x
+// tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
+// sigmoid(x) = 1 / (1 + exp(-x))  // a.k.a., logistic
+//
+// NOTE: We may eventually implement common functions used in NN
+// here. E.g., rectifier, softplus, derivatives of tanh, sigmoid, etc.
+// For reference, see speech/lstm/eigen_functors.h.
+
+template <typename T>
+struct abs : base<T, Eigen::internal::scalar_abs_op<T>,
+ typename Eigen::internal::scalar_abs_op<T>::result_type> {};
+
+template <typename T>
+struct neg : base<T, Eigen::internal::scalar_opposite_op<T> > {};
+
+template <typename T>
+struct inverse : base<T, Eigen::internal::scalar_inverse_op<T> > {};
+
+template <typename T>
+struct square : base<T, Eigen::internal::scalar_square_op<T> > {};
+
+template <typename T>
+struct sqrt : base<T, Eigen::internal::scalar_sqrt_op<T> > {};
+
+template <typename T>
+struct rsqrt : base<T, Eigen::internal::scalar_rsqrt_op<T> > {};
+
+template <typename T>
+struct exp : base<T, Eigen::internal::scalar_exp_op<T> > {};
+
+template <typename T>
+struct log : base<T, Eigen::internal::scalar_log_op<T> > {};
+
+template <typename T>
+struct sign : base<T, Eigen::internal::scalar_sign_op<T> > {};
+
+template <typename T>
+struct tanh : base<T, Eigen::internal::scalar_tanh_op<T> > {};
+
+template <typename T>
+struct sigmoid : base<T, Eigen::internal::scalar_sigmoid_op<T> > {};
+
+template <typename T>
+struct sin : base<T, Eigen::internal::scalar_sin_op<T> > {};
+
+template <typename T>
+struct cos : base<T, Eigen::internal::scalar_cos_op<T> > {};
+
+struct logical_not : base<bool, std::logical_not<bool> > {};
+
+namespace impl {
+
+#ifndef __CUDACC__
+// Uses the std:: functions from <cmath>.
+template <typename T>
+bool isinf(T v) {
+ return std::isinf(v);
+}
+
+template <typename T>
+bool isnan(T v) {
+ return std::isnan(v);
+}
+
+template <typename T>
+bool isfinite(T v) {
+ return std::isfinite(v);
+}
+
+template <typename T>
+T floor(T v) {
+ return std::floor(v);
+}
+
+template <typename T>
+T ceil(T v) {
+ return std::ceil(v);
+}
+#else
+// Uses CUDA's functions for float and double.
+template <typename T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool isinf(T v) {
+ return ::isinf(v);
+}
+
+template <typename T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool isnan(T v) {
+ return ::isnan(v);
+}
+
+template <typename T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool isfinite(T v) {
+ return ::isfinite(v);
+}
+
+template <typename T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T floor(T v) {
+ return ::floor(v);
+}
+
+template <typename T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T ceil(T v) {
+ return ::ceil(v);
+}
+#endif
+} // end namespace impl
+
+// NOTE: std::isinf, std::isnan, std::isfinite are plain functions.
+// Therefore we need to wrap them in functors to be used with Eigen's
+// type system.
+
+template <typename T>
+struct isinf_func {
+ typedef bool result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(T x) const {
+ return impl::isinf(x);
+ }
+};
+
+template <typename T>
+struct isinf : base<T, isinf_func<T>, bool> {};
+
+template <typename T>
+struct isnan_func {
+ typedef bool result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(T x) const {
+ return impl::isnan(x);
+ }
+};
+
+template <typename T>
+struct isnan : base<T, isnan_func<T>, bool> {};
+
+template <typename T>
+struct isfinite_func {
+ typedef bool result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator()(T x) const {
+ return impl::isfinite(x);
+ }
+};
+
+template <typename T>
+struct isfinite : base<T, isfinite_func<T>, bool> {};
+
+template <typename T>
+struct floor_func {
+ typedef T result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(T x) const {
+ return impl::floor(x);
+ }
+};
+
+template <typename T>
+struct floor : base<T, floor_func<T> > {};
+
+template <typename T>
+struct ceil_func {
+ typedef T result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(T x) const {
+ return impl::ceil(x);
+ }
+};
+
+template <typename T>
+struct ceil : base<T, ceil_func<T> > {};
+
+////////////////////////////////////////////////////////////////////////////////
+// Binary functors
+////////////////////////////////////////////////////////////////////////////////
+
+// Binary functors:
+//
+// add(x, y) = x + y
+// sub(x, y) = x - y
+// mul(x, y) = x * y
+// div(x, y) = x / y
+// mod(x, y) = x % y (int32 and int64 only)
+// fmod(x, y) = fmod(x, y) (float and double only)
+// pow(x, y) = x ^ y
+// maximum(x, y) = x > y ? x : y
+// minimum(x, y) = x < y ? x : y
+
+template <typename T>
+struct add : base<T, Eigen::internal::scalar_sum_op<T> > {
+ static const bool use_bcast_optimization = true;
+};
+
+template <typename T>
+struct sub : base<T, Eigen::internal::scalar_difference_op<T> > {
+ static const bool use_bcast_optimization = true;
+};
+
+template <typename T>
+struct mul : base<T, Eigen::internal::scalar_product_op<T> > {};
+
+template <typename T>
+struct div : base<T, Eigen::internal::scalar_quotient_op<T> > {};
+
+template <typename T>
+struct fmod : base<T, Eigen::internal::scalar_fmod2_op<T> > {};
+
+template <typename T>
+struct mod : base<T, Eigen::internal::scalar_mod2_op<T> > {};
+
+template <typename T>
+struct pow : base<T, Eigen::internal::scalar_pow2_op<T> > {};
+
+template <typename T>
+struct maximum : base<T, Eigen::internal::scalar_max_op<T> > {};
+
+template <typename T>
+struct minimum : base<T, Eigen::internal::scalar_min_op<T> > {};
+
+template <typename T>
+struct less : base<T, Eigen::internal::less<T>, bool> {};
+
+template <typename T>
+struct less_equal : base<T, Eigen::internal::less_equal<T>, bool> {};
+
+template <typename T>
+struct greater : base<T, Eigen::internal::greater<T>, bool> {};
+
+template <typename T>
+struct greater_equal : base<T, Eigen::internal::greater_equal<T>, bool> {};
+
+template <typename T>
+struct equal_to : base<T, Eigen::internal::equal_to<T>, bool> {};
+
+template <typename T>
+struct not_equal_to : base<T, Eigen::internal::not_equal_to<T>, bool> {};
+
+struct logical_and : base<bool, Eigen::internal::scalar_boolean_and_op> {};
+
+struct logical_or : base<bool, Eigen::internal::scalar_boolean_or_op> {};
+
+template <typename T>
+struct make_complex_func {
+ typedef std::complex<T> result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ result_type operator()(T real, T imag) const {
+ return std::complex<T>(real, imag);
+ }
+};
+
+template <typename T>
+struct make_complex : base<T, make_complex_func<T>, std::complex<T> > {};
+
+template <typename T>
+struct get_real
+ : base<T, Eigen::internal::scalar_real_op<T>, typename T::value_type> {};
+
+template <typename T>
+struct get_imag
+ : base<T, Eigen::internal::scalar_imag_op<T>, typename T::value_type> {};
+
+template <typename T>
+struct conj : base<T, Eigen::internal::scalar_conjugate_op<T> > {};
+
+////////////////////////////////////////////////////////////////////////////////
+// Functors that take 1 or 2 tensors, compute the base functor on each
+// coefficient of the input tensors, and put the results in the output
+// tensor.
+////////////////////////////////////////////////////////////////////////////////
+template <typename Device, typename Functor>
+struct UnaryFunctor {
+ // Computes on device "d": out[i] = Functor(in[i])
+ void operator()(const Device& d, typename Functor::tout_type out,
+ typename Functor::tin_type in);
+};
+
+template <typename Device, typename Functor, int NDIMS>
+struct BinaryFunctor {
+ // Computes on device "d": out[i] = Functor(in0[i], in1[i])
+ void operator()(const Device& d, typename Functor::tout_type out,
+ typename Functor::tin_type in0,
+ typename Functor::tin_type in1);
+
+ // Computes on device "d": out[i] = Functor(scalar[0], in[i])
+ void Left(const Device& d, typename Functor::tout_type out,
+ typename Functor::tscalar_type scalar,
+ typename Functor::tin_type in);
+
+ // Computes on device "d": out[i] = Functor(in[i], scalar[0])
+ void Right(const Device& d, typename Functor::tout_type out,
+ typename Functor::tin_type in,
+ typename Functor::tscalar_type scalar);
+
+ // Computes on device "d":
+ // out = Functor(in0.broadcast(bcast0), in1.broadcast(bcast01))
+ //
+  // TODO(zhifengc): make BCast a template member function on NDIMS
+  // instead of making BinaryFunctor a template on NDIMS.
+ void BCast(const Device& d,
+ typename TTypes<typename Functor::out_type, NDIMS>::Tensor out,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in0,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast0,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in1,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast1);
+};
+
+template <int NDIMS>
+bool AllOne(const typename Eigen::array<Eigen::DenseIndex, NDIMS>& a) {
+ for (int i = 0; i < a.size(); ++i) {
+ if (a[i] != 1) return false;
+ }
+ return true;
+}
+
+template <typename Device, typename T>
+struct SelectFunctor {
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<bool>::ConstFlat cond_flat,
+ typename TTypes<T>::ConstFlat then_flat,
+ typename TTypes<T>::ConstFlat else_flat);
+};
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CWISE_OPS_H_
diff --git a/tensorflow/core/kernels/cwise_ops_common.cc b/tensorflow/core/kernels/cwise_ops_common.cc
new file mode 100644
index 0000000000..f86d2ddd9a
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_ops_common.cc
@@ -0,0 +1,42 @@
+#include "tensorflow/core/kernels/cwise_ops_common.h"
+
+namespace tensorflow {
+
+BinaryOpShared::BinaryOpShared(OpKernelConstruction* ctx, DataType out,
+ DataType in)
+ : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature({in, in}, {out}));
+}
+
+void BinaryOpShared::SetUnimplementedError(OpKernelContext* ctx) {
+ ctx->SetStatus(errors::Unimplemented(
+ "Broadcast between ", ctx->input(0).shape().ShortDebugString(), " and ",
+ ctx->input(1).shape().ShortDebugString(), " is not supported yet."));
+}
+
+static BCast::Vec FromShape(const TensorShape& shape) {
+ BCast::Vec ret;
+ for (int i = 0; i < shape.dims(); ++i) ret.push_back(shape.dim_size(i));
+ return ret;
+}
+
+static TensorShape ToShape(const BCast::Vec& vec) {
+ TensorShape shape;
+ for (auto elem : vec) shape.AddDim(elem);
+ return shape;
+}
+
+BinaryOpShared::BinaryOpState::BinaryOpState(OpKernelContext* ctx)
+ : bcast(FromShape(ctx->input(0).shape()),
+ FromShape(ctx->input(1).shape())) {
+ if (!bcast.IsValid()) {
+ ctx->SetStatus(errors::InvalidArgument(
+ "Incompatible shapes: ", ctx->input(0).shape().ShortDebugString(),
+ " vs. ", ctx->input(1).shape().ShortDebugString()));
+ return;
+ }
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_output(0, ToShape(bcast.output_shape()), &out));
+}
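+
+// For example (illustrative shapes): inputs of shapes [2, 1] and [1, 3]
+// produce a valid bcast whose output_shape() is {2, 3}, so "out" is
+// allocated with shape [2, 3]. Inputs of shapes [2] and [3] are
+// incompatible and set an InvalidArgument status instead.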
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/cwise_ops_common.h b/tensorflow/core/kernels/cwise_ops_common.h
new file mode 100644
index 0000000000..cf848b86d1
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_ops_common.h
@@ -0,0 +1,390 @@
+#ifndef TENSORFLOW_KERNELS_CWISE_OPS_COMMON_H_
+#define TENSORFLOW_KERNELS_CWISE_OPS_COMMON_H_
+
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/cwise_ops.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/util/bcast.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+class BinaryOpShared : public OpKernel {
+ public:
+ explicit BinaryOpShared(OpKernelConstruction* ctx, DataType out, DataType in);
+
+ protected:
+ struct BinaryOpState {
+    // Sets up bcast with the shapes of in0 and in1, ensures that the bcast
+    // is valid, and if so, allocates out using ctx->allocate_output(...).
+ // Caller must check ctx->status() upon return for non-ok status.
+ // If ctx->status().ok() is true, then out is guaranteed to be allocated.
+ BinaryOpState(OpKernelContext* ctx);
+
+ BCast bcast;
+ Tensor* out = nullptr;
+ };
+
+ template <int NDIMS>
+ static Eigen::array<Eigen::DenseIndex, NDIMS> ToIndexArray(
+ const BCast::Vec& vec) {
+ CHECK_EQ(vec.size(), NDIMS);
+ Eigen::array<Eigen::DenseIndex, NDIMS> ret;
+ for (int i = 0; i < NDIMS; ++i) ret[i] = vec[i];
+ return ret;
+ }
+ void SetUnimplementedError(OpKernelContext* ctx);
+};
+
+// Coefficient-wise binary operations:
+// Device: E.g., CPUDevice, GPUDevice.
+//    Functor: defined in cwise_ops.h. E.g., functor::add.
+template <typename Device, typename Functor>
+class BinaryOp : public BinaryOpShared {
+ public:
+ typedef typename Functor::in_type Tin; // Input scalar data type.
+ typedef typename Functor::out_type Tout; // Output scalar data type.
+
+ explicit BinaryOp(OpKernelConstruction* ctx)
+ : BinaryOpShared(ctx, DataTypeToEnum<Tout>::v(),
+ DataTypeToEnum<Tin>::v()) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& in0 = ctx->input(0);
+ const Tensor& in1 = ctx->input(1);
+ // 'state': Shared helper not dependent on T to reduce code size
+ BinaryOpState state(ctx);
+ if (!ctx->status().ok()) return;
+ Tensor* out = state.out;
+ BCast* bcast = &state.bcast;
+ if (out->NumElements() == 0) {
+ return;
+ }
+ const int ndims = bcast->x_reshape().size();
+ if (ndims <= 1) {
+ if (in1.NumElements() == 1) {
+ // tensor op scalar
+ functor::BinaryFunctor<Device, Functor, 1>().Right(
+ ctx->eigen_device<Device>(), out->flat<Tout>(), in0.flat<Tin>(),
+ in1.scalar<Tin>());
+ return;
+ }
+ if (in0.NumElements() == 1) {
+ // scalar op tensor
+ functor::BinaryFunctor<Device, Functor, 1>().Left(
+ ctx->eigen_device<Device>(), out->flat<Tout>(), in0.scalar<Tin>(),
+ in1.flat<Tin>());
+ return;
+ }
+ functor::BinaryFunctor<Device, Functor, 1>()(
+ ctx->eigen_device<Device>(), out->flat<Tout>(), in0.flat<Tin>(),
+ in1.flat<Tin>());
+ return;
+ }
+
+ if (ndims == 2) {
+ functor::BinaryFunctor<Device, Functor, 2>().BCast(
+ ctx->eigen_device<Device>(),
+ out->shaped<Tout, 2>(bcast->result_shape()),
+ in0.shaped<Tin, 2>(bcast->x_reshape()),
+ ToIndexArray<2>(bcast->x_bcast()),
+ in1.shaped<Tin, 2>(bcast->y_reshape()),
+ ToIndexArray<2>(bcast->y_bcast()));
+ return;
+ }
+
+ if (ndims == 3) {
+ functor::BinaryFunctor<Device, Functor, 3>().BCast(
+ ctx->eigen_device<Device>(),
+ out->shaped<Tout, 3>(bcast->result_shape()),
+ in0.shaped<Tin, 3>(bcast->x_reshape()),
+ ToIndexArray<3>(bcast->x_bcast()),
+ in1.shaped<Tin, 3>(bcast->y_reshape()),
+ ToIndexArray<3>(bcast->y_bcast()));
+ return;
+ }
+
+ SetUnimplementedError(ctx);
+ }
+
+ private:
+};
+
+// Coefficient-wise unary operations:
+// Device: E.g., CPUDevice, GPUDevice.
+//    Functor: defined in cwise_ops.h. E.g., functor::sqrt.
+template <typename Device, typename Functor>
+class UnaryOp : public OpKernel {
+ public:
+ typedef typename Functor::in_type Tin; // Input scalar data type.
+ typedef typename Functor::out_type Tout; // Output scalar data type.
+ // Tin may be different from Tout. E.g., abs: complex64 -> float
+
+ explicit UnaryOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ auto in = DataTypeToEnum<Tin>::v();
+ auto out = DataTypeToEnum<Tout>::v();
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature({in}, {out}));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& inp = ctx->input(0);
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, inp.shape(), &out));
+ functor::UnaryFunctor<Device, Functor>()(
+ ctx->eigen_device<Device>(), out->flat<Tout>(), inp.flat<Tin>());
+ }
+};
+
+// Coefficient-wise select operation.
+// Device: E.g., CPUDevice, GPUDevice.
+template <typename Device, typename T>
+class SelectOp : public OpKernel {
+ public:
+ explicit SelectOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ auto dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature({DT_BOOL, dt, dt}, {dt}));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& in0 = ctx->input(0);
+ const Tensor& in1 = ctx->input(1);
+ const Tensor& in2 = ctx->input(2);
+ if (!ctx->ValidateInputsAreSameShape(this)) return;
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, in0.shape(), &out));
+ functor::SelectFunctor<Device, T> func;
+ func(ctx->eigen_device<Device>(), out->flat<T>(), in0.flat<bool>(),
+ in1.flat<T>(), in2.flat<T>());
+ }
+};
+
+namespace functor {
+
+// For CPUDevice, we do operations inline if the resulting tensor is
+// modestly sized.
+static bool DoInline(size_t size) { return size <= 32768; }
+
+template <typename D, typename OUT, typename RHS>
+void Assign(const D& d, OUT out, RHS rhs) {
+ if (DoInline(out.size())) {
+ out = rhs;
+ } else {
+ out.device(d) = rhs;
+ }
+}
+
+// Partial specialization of BinaryFunctor<Device=CPUDevice, Functor>.
+template <typename Functor, int NDIMS>
+struct BinaryFunctor<CPUDevice, Functor, NDIMS> {
+ void operator()(const CPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in0,
+ typename Functor::tin_type in1) {
+ Assign(d, out, in0.binaryExpr(in1, typename Functor::func()));
+ }
+
+ void Left(const CPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tscalar_type scalar,
+ typename Functor::tin_type in) {
+ typedef typename Functor::out_type Tout;
+ typedef typename Functor::in_type Tin;
+ typedef typename Functor::func Binary;
+ typedef typename Eigen::internal::scalar_left<Tout, Tin, Binary> Unary;
+ Assign(d, out, in.unaryExpr(Unary(scalar.data())));
+ }
+
+ void Right(const CPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in,
+ typename Functor::tscalar_type scalar) {
+ typedef typename Functor::out_type Tout;
+ typedef typename Functor::in_type Tin;
+ typedef typename Functor::func Binary;
+ typedef typename Eigen::internal::scalar_right<Tout, Tin, Binary> Unary;
+ Assign(d, out, in.unaryExpr(Unary(scalar.data())));
+ }
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ inline Eigen::DSizes<int, 2> NByOne(int n) {
+ return Eigen::DSizes<int, 2>(n, 1);
+ }
+ inline Eigen::DSizes<int, 2> OneByM(int m) {
+ return Eigen::DSizes<int, 2>(1, m);
+ }
+#else
+ inline Eigen::IndexList<int, Eigen::type2index<1>> NByOne(int n) {
+ Eigen::IndexList<int, Eigen::type2index<1>> ret;
+ ret.set(0, n);
+ return ret;
+ }
+ inline Eigen::IndexList<Eigen::type2index<1>, int> OneByM(int m) {
+ Eigen::IndexList<Eigen::type2index<1>, int> ret;
+ ret.set(1, m);
+ return ret;
+ }
+#endif
+
+ void BCast(const CPUDevice& dev,
+ typename TTypes<typename Functor::out_type, NDIMS>::Tensor out,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in0,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast0,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in1,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast1) {
+ typedef typename Functor::in_type T;
+ typename Functor::func func;
+ if ((NDIMS == 2) && Functor::use_bcast_optimization &&
+ use_bcast_optimization<T>::value) {
+      // Optimize for speed by using Eigen::type2index and avoiding
+      // .broadcast() when we know it's a no-op.
+      //
+      // Here, we need to handle 6 cases depending on how many 1s
+      // appear in in0's and in1's shapes (4 dimension sizes in
+      // total). The two shapes cannot have more than two 1s in total,
+      // because such cases are simplified to the NDIMS==1 case.
+      //
+      // Because this optimization increases the binary size for each
+      // Functor (+, -, *, /, <, <=, etc.), type, and ndim combination,
+      // we only apply it for selected ops/types/ndims.
+      //
+      // Because NDIMS, Functor::use_bcast_optimization and
+      // use_bcast_optimization<T> are compile-time constants, gcc
+      // does a decent job of avoiding generating code when the
+      // conditions are not met.
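+      //
+      // For example (illustrative shapes): with NDIMS == 2, in0 of
+      // shape [1, 8] and in1 of shape [5, 1] take the (a == 1 && d == 1)
+      // branch below; in0 is broadcast along rows and in1 along
+      // columns, yielding an output of shape [5, 8] without the
+      // generic fallback at the end of this function.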
+ const int a = in0.dimension(0); // in0 is shape [a, b]
+ const int b = in0.dimension(1);
+ const int c = in1.dimension(0); // in1 is shape [c, d]
+ const int d = in1.dimension(1);
+ if ((a == 1) && (d == 1)) {
+ auto lhs = in0.reshape(OneByM(b)).broadcast(NByOne(c));
+ auto rhs = in1.reshape(NByOne(c)).broadcast(OneByM(b));
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ if ((b == 1) && (c == 1)) {
+ auto lhs = in0.reshape(NByOne(a)).broadcast(OneByM(d));
+ auto rhs = in1.reshape(OneByM(d)).broadcast(NByOne(a));
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ if (a == 1) {
+ auto lhs = in0.reshape(OneByM(b)).broadcast(NByOne(c));
+ auto rhs = in1;
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ if (b == 1) {
+ auto lhs = in0.reshape(NByOne(a)).broadcast(OneByM(d));
+ auto rhs = in1;
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ if (c == 1) {
+ auto lhs = in0;
+ auto rhs = in1.reshape(OneByM(d)).broadcast(NByOne(a));
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ if (d == 1) {
+ auto lhs = in0;
+ auto rhs = in1.reshape(NByOne(c)).broadcast(OneByM(b));
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+
+ const bool bcast0_all_one = AllOne<NDIMS>(bcast0);
+ const bool bcast1_all_one = AllOne<NDIMS>(bcast1);
+ if (bcast0_all_one && !bcast1_all_one) {
+ auto lhs = in0; // No need to do broadcast for in0
+ auto rhs = in1.broadcast(bcast1);
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+
+ if (!bcast0_all_one && bcast1_all_one) {
+ auto lhs = in0.broadcast(bcast0);
+ auto rhs = in1; // No need to do broadcast for in1
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ return;
+ }
+ }
+
+    // Fallback path. Always works, but is probably slower.
+ auto lhs = in0.broadcast(bcast0);
+ auto rhs = in1.broadcast(bcast1);
+ Assign(dev, out, lhs.binaryExpr(rhs, func));
+ }
+};
+
+// Partial specialization of UnaryFunctor<Device=CPUDevice, Functor>.
+template <typename Functor>
+struct UnaryFunctor<CPUDevice, Functor> {
+ void operator()(const CPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in) {
+ Assign(d, out, in.unaryExpr(typename Functor::func()));
+ }
+};
+
+template <typename T>
+struct SelectFunctor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<bool>::ConstFlat cond_flat,
+ typename TTypes<T>::ConstFlat then_flat,
+ typename TTypes<T>::ConstFlat else_flat) {
+ Assign(d, out, cond_flat.select(then_flat, else_flat));
+ }
+};
+
+} // end namespace functor
+
+#define REGISTER_SELECT(D, N, F, T) \
+ REGISTER_KERNEL_BUILDER(Name(N).Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ SelectOp<D##Device, T>)
+
+#define REGISTER(OP, D, N, F, T) \
+ REGISTER_KERNEL_BUILDER(Name(N).Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ OP<D##Device, F<T>>);
+
+// Macros to register kernels for multiple types (T0, T1, etc.) on
+// device type "D" (CPU or GPU) for operation "N" (e.g., sqrt) using
+// the functor "F" (e.g., functor::sqrt).
+
+#ifdef __ANDROID__
+// On Android, only register the first type (float)
+#define REGISTER2(OP, D, N, F, T0, T1) REGISTER(OP, D, N, F, T0)
+#define REGISTER3(OP, D, N, F, T0, T1, T2) REGISTER(OP, D, N, F, T0)
+#define REGISTER4(OP, D, N, F, T0, T1, T2, T3) REGISTER(OP, D, N, F, T0)
+#define REGISTER5(OP, D, N, F, T0, T1, T2, T3, T4) REGISTER(OP, D, N, F, T0)
+#define REGISTER6(OP, D, N, F, T0, T1, T2, T3, T4, T5) REGISTER(OP, D, N, F, T0)
+#define REGISTER7(OP, D, N, F, T0, T1, T2, T3, T4, T5, T6) \
+ REGISTER(OP, D, N, F, T0)
+#else // !__ANDROID__
+#define REGISTER2(OP, D, N, F, T0, T1) \
+ REGISTER(OP, D, N, F, T0) \
+ REGISTER(OP, D, N, F, T1)
+#define REGISTER3(OP, D, N, F, T0, T1, T2) \
+ REGISTER2(OP, D, N, F, T0, T1) \
+ REGISTER(OP, D, N, F, T2)
+#define REGISTER4(OP, D, N, F, T0, T1, T2, T3) \
+ REGISTER2(OP, D, N, F, T0, T1) \
+ REGISTER2(OP, D, N, F, T2, T3)
+#define REGISTER5(OP, D, N, F, T0, T1, T2, T3, T4) \
+ REGISTER3(OP, D, N, F, T0, T1, T2) \
+ REGISTER2(OP, D, N, F, T3, T4)
+#define REGISTER6(OP, D, N, F, T0, T1, T2, T3, T4, T5) \
+ REGISTER3(OP, D, N, F, T0, T1, T2) \
+ REGISTER3(OP, D, N, F, T3, T4, T5)
+#define REGISTER7(OP, D, N, F, T0, T1, T2, T3, T4, T5, T6) \
+ REGISTER4(OP, D, N, F, T0, T1, T2, T3) \
+ REGISTER3(OP, D, N, F, T4, T5, T6)
+#endif // __ANDROID__
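+
+// As an illustration of the expansion (a sketch only, not additional
+// registrations): on non-Android builds,
+//   REGISTER2(UnaryOp, CPU, "Sqrt", functor::sqrt, float, double)
+// expands to
+//   REGISTER_KERNEL_BUILDER(
+//       Name("Sqrt").Device(DEVICE_CPU).TypeConstraint<float>("T"),
+//       UnaryOp<CPUDevice, functor::sqrt<float>>);
+//   REGISTER_KERNEL_BUILDER(
+//       Name("Sqrt").Device(DEVICE_CPU).TypeConstraint<double>("T"),
+//       UnaryOp<CPUDevice, functor::sqrt<double>>);
+// while on Android only the float registration is emitted.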
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CWISE_OPS_COMMON_H_
diff --git a/tensorflow/core/kernels/cwise_ops_gpu_common.cu.h b/tensorflow/core/kernels/cwise_ops_gpu_common.cu.h
new file mode 100644
index 0000000000..b0dc027144
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_ops_gpu_common.cu.h
@@ -0,0 +1,135 @@
+#if !GOOGLE_CUDA
+#error This file must only be included when building with Cuda support
+#endif
+
+#ifndef TENSORFLOW_KERNELS_CWISE_OPS_GPU_COMMON_CU_H_
+#define TENSORFLOW_KERNELS_CWISE_OPS_GPU_COMMON_CU_H_
+
+#define EIGEN_USE_GPU
+
+#include <complex>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/kernels/cwise_ops.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+typedef std::complex<float> complex64;
+
+// Partial specialization of UnaryFunctor<Device=GPUDevice, Functor>.
+template <typename Functor>
+struct UnaryFunctor<GPUDevice, Functor> {
+ void operator()(const GPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in) {
+ out.device(d) = in.unaryExpr(typename Functor::func());
+ }
+};
+
+// Partial specialization of BinaryFunctor<Device=GPUDevice, Functor>.
+template <typename Functor, int NDIMS>
+struct BinaryFunctor<GPUDevice, Functor, NDIMS> {
+ void operator()(const GPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in0,
+ typename Functor::tin_type in1) {
+ out.device(d) = in0.binaryExpr(in1, typename Functor::func());
+ }
+
+ void Left(const GPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tscalar_type scalar,
+ typename Functor::tin_type in) {
+ typedef typename Functor::out_type Tout;
+ typedef typename Functor::in_type Tin;
+ typedef typename Functor::func Binary;
+ typedef typename Eigen::internal::scalar_left<Tout, Tin, Binary> Unary;
+ out.device(d) = in.unaryExpr(Unary(scalar.data()));
+ }
+
+ void Right(const GPUDevice& d, typename Functor::tout_type out,
+ typename Functor::tin_type in,
+ typename Functor::tscalar_type scalar) {
+ typedef typename Functor::out_type Tout;
+ typedef typename Functor::in_type Tin;
+ typedef typename Functor::func Binary;
+ typedef typename Eigen::internal::scalar_right<Tout, Tin, Binary> Unary;
+ out.device(d) = in.unaryExpr(Unary(scalar.data()));
+ }
+
+ void BCast(const GPUDevice& d,
+ typename TTypes<typename Functor::out_type, NDIMS>::Tensor out,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in0,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast0,
+ typename TTypes<typename Functor::in_type, NDIMS>::ConstTensor in1,
+ typename Eigen::array<Eigen::DenseIndex, NDIMS> bcast1) {
+ typedef typename Functor::in_type T;
+ typename Functor::func func;
+ if ((NDIMS == 2) && Functor::use_bcast_optimization &&
+ use_bcast_optimization<T>::value) {
+ const bool bcast0_all_one = AllOne<NDIMS>(bcast0);
+ const bool bcast1_all_one = AllOne<NDIMS>(bcast1);
+ if (bcast0_all_one && !bcast1_all_one) {
+ out.device(d) = in0.binaryExpr(in1.broadcast(bcast1), func);
+ return;
+ }
+ if (!bcast0_all_one && bcast1_all_one) {
+ out.device(d) = in0.broadcast(bcast0).binaryExpr(in1, func);
+ return;
+ }
+ }
+ out.device(d) =
+ in0.broadcast(bcast0).binaryExpr(in1.broadcast(bcast1), func);
+ }
+};
+
+template <typename T>
+struct SelectFunctor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat out,
+ typename TTypes<bool>::ConstFlat cond_flat,
+ typename TTypes<T>::ConstFlat then_flat,
+ typename TTypes<T>::ConstFlat else_flat) {
+ out.device(d) = cond_flat.select(then_flat, else_flat);
+ }
+};
+
+// Macros to explicitly instantiate kernels on GPU for multiple types
+// (T0, T1, etc.) for UnaryFunctor (e.g., functor::sqrt).
+#define DEFINE_UNARY1(F, T) template struct UnaryFunctor<GPUDevice, F<T> >
+#define DEFINE_UNARY2(F, T0, T1) \
+ DEFINE_UNARY1(F, T0); \
+ DEFINE_UNARY1(F, T1)
+#define DEFINE_UNARY3(F, T0, T1, T2) \
+ DEFINE_UNARY2(F, T0, T1); \
+ DEFINE_UNARY1(F, T2)
+#define DEFINE_UNARY4(F, T0, T1, T2, T3) \
+ DEFINE_UNARY2(F, T0, T1); \
+ DEFINE_UNARY2(F, T2, T3)
+#define DEFINE_UNARY5(F, T0, T1, T2, T3, T4) \
+ DEFINE_UNARY2(F, T0, T1); \
+ DEFINE_UNARY3(F, T2, T3, T4)
+
+// Macros to explicitly instantiate kernels on GPU for multiple types
+// (T0, T1, etc.) for BinaryFunctor.
+#define DEFINE_BINARY1(F, T) \
+ template struct BinaryFunctor<GPUDevice, F<T>, 1>; \
+ template struct BinaryFunctor<GPUDevice, F<T>, 2>; \
+ template struct BinaryFunctor<GPUDevice, F<T>, 3>
+#define DEFINE_BINARY2(F, T0, T1) \
+ DEFINE_BINARY1(F, T0); \
+ DEFINE_BINARY1(F, T1)
+#define DEFINE_BINARY3(F, T0, T1, T2) \
+ DEFINE_BINARY2(F, T0, T1); \
+ DEFINE_BINARY1(F, T2)
+#define DEFINE_BINARY4(F, T0, T1, T2, T3) \
+ DEFINE_BINARY2(F, T0, T1); \
+ DEFINE_BINARY2(F, T2, T3)
+#define DEFINE_BINARY5(F, T0, T1, T2, T3, T4) \
+ DEFINE_BINARY2(F, T0, T1); \
+ DEFINE_BINARY3(F, T2, T3, T4)
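+
+// As an illustration (a sketch only, not an extra instantiation): in a
+// GPU kernel .cu.cc file,
+//   DEFINE_UNARY2(sqrt, float, double);
+// expands to the explicit instantiations
+//   template struct UnaryFunctor<GPUDevice, sqrt<float> >;
+//   template struct UnaryFunctor<GPUDevice, sqrt<double> >;
+// (here sqrt refers to functor::sqrt, since this is inside namespace
+// functor).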
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_CWISE_OPS_GPU_COMMON_CU_H_
diff --git a/tensorflow/core/kernels/cwise_ops_test.cc b/tensorflow/core/kernels/cwise_ops_test.cc
new file mode 100644
index 0000000000..56af248117
--- /dev/null
+++ b/tensorflow/core/kernels/cwise_ops_test.cc
@@ -0,0 +1,167 @@
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+// Creates a Graph which applies a unary "func" on a 3D float tensor
+// of "num" elements.
+static Graph* Unary(const string& func, int num) {
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor data(DT_FLOAT, TensorShape({64, 64, num / (64 * 64)}));
+ CHECK_GT(data.NumElements(), 0);
+ data.flat<float>().setRandom();
+ test::graph::Unary(g, func, test::graph::Constant(g, data), 0);
+ return g;
+}
+
+static int kRows = 100000;
+
+static int RowsAndColsArg(int r, int c) { return r * kRows + c; }
+static int RowsFromArg(int arg) { return (arg / kRows); }
+static int ColsFromArg(int arg) { return (arg % kRows); }
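+
+// For example (illustrative values), RowsAndColsArg(512, 2048) packs both
+// dimensions into the single benchmark argument 51202048; RowsFromArg
+// recovers 512 and ColsFromArg recovers 2048.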
+
+#define BM_UNARY(DEVICE, FUNC) \
+ static void BM_##DEVICE##_##FUNC(int iters, int num) { \
+ const int64 tot = static_cast<int64>(iters) * num; \
+ testing::ItemsProcessed(tot); \
+ testing::BytesProcessed(tot * sizeof(float)); \
+ test::Benchmark(#DEVICE, Unary(#FUNC, num)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_##FUNC)->Range(4 << 10, 1 << 20);
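+
+// For example, BM_UNARY(cpu, Floor) below defines BM_cpu_Floor, which
+// benchmarks the "Floor" op on tensors of 4K to 1M elements (the
+// Range(4 << 10, 1 << 20) above).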
+
+BM_UNARY(cpu, Floor);
+BM_UNARY(gpu, Floor);
+
+// Creates a Graph which computes "data func scalar" for a 3D float
+// tensor and a scalar.
+static Graph* BinaryScalar(int num, const string& func) {
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor lhs(DT_FLOAT, TensorShape({64, 64, num / (64 * 64)}));
+ lhs.flat<float>().setRandom();
+ Tensor rhs(DT_FLOAT, TensorShape({}));
+ rhs.flat<float>().setRandom();
+ test::graph::Binary(g, func, test::graph::Constant(g, lhs),
+ test::graph::Constant(g, rhs));
+ return g;
+}
+
+#define BM_BINARY_SCALAR(DEVICE, FUNC) \
+ static void BM_##DEVICE##_##FUNC##_scalar(int iters, int num) { \
+ const int64 tot = static_cast<int64>(iters) * num; \
+ testing::ItemsProcessed(tot); \
+ testing::BytesProcessed(tot * sizeof(float)); \
+ test::Benchmark(#DEVICE, BinaryScalar(num, #FUNC)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_##FUNC##_scalar) \
+      ->Arg(4096) /* must be >= 4096 */ \
+ ->Arg(32768) \
+ ->Arg(131072) \
+ ->Arg(1048576);
+
+BM_BINARY_SCALAR(cpu, Less);
+BM_BINARY_SCALAR(gpu, Less);
+BM_BINARY_SCALAR(cpu, Add);
+BM_BINARY_SCALAR(gpu, Add);
+#undef BM_BINARY_SCALAR
+
+static Graph* BiasAdd(int rows, int cols) {
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor lhs(DT_FLOAT, TensorShape({rows, cols}));
+ lhs.flat<float>().setRandom();
+ TensorShape rhs_shape;
+ rhs_shape = TensorShape({cols});
+ Tensor rhs(DT_FLOAT, rhs_shape);
+ rhs.flat<float>().setRandom();
+ test::graph::Binary(g, "BiasAdd", test::graph::Constant(g, lhs),
+ test::graph::Constant(g, rhs));
+ return g;
+}
+
+#define BM_BIAS_ADD(DEVICE, R, C) \
+ static void BM_##DEVICE##_BiasAdd_R##R##_C##C(int iters, int arg) { \
+ const int rows = RowsFromArg(arg); \
+ const int cols = ColsFromArg(arg); \
+ const int64 tot = static_cast<int64>(iters) * rows * cols; \
+ testing::ItemsProcessed(tot); \
+ testing::BytesProcessed(tot * sizeof(float)); \
+ test::Benchmark(#DEVICE, BiasAdd(rows, cols)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_BiasAdd_R##R##_C##C)->Arg(RowsAndColsArg(R, C));
+
+#define BM_BIAS_ADD_ALL(DEVICE) \
+ BM_BIAS_ADD(DEVICE, 512, 2048); \
+ BM_BIAS_ADD(DEVICE, 512, 4096); \
+ BM_BIAS_ADD(DEVICE, 2048, 512); \
+ BM_BIAS_ADD(DEVICE, 4096, 512);
+
+BM_BIAS_ADD_ALL(cpu);
+BM_BIAS_ADD_ALL(gpu);
+#undef BM_BIAS_ADD_ALL
+#undef BM_BIAS_ADD
+
+static Graph* BcastAdd(int rows, int cols, int dim) {
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor lhs(DT_FLOAT, TensorShape({rows, cols}));
+ lhs.flat<float>().setRandom();
+ TensorShape rhs_shape;
+ if (dim == 0) {
+ rhs_shape = TensorShape({rows, 1});
+ } else {
+ rhs_shape = TensorShape({cols});
+ }
+ Tensor rhs(DT_FLOAT, rhs_shape);
+ rhs.flat<float>().setRandom();
+ test::graph::Binary(g, "Add", test::graph::Constant(g, lhs),
+ test::graph::Constant(g, rhs));
+ return g;
+}
+
+#define BM_BCAST_ADD_ROW(DEVICE, R, C) \
+ static void BM_##DEVICE##_BcastAddRow_R##R##_C##C(int iters, int arg) { \
+ const int rows = RowsFromArg(arg); \
+ const int cols = ColsFromArg(arg); \
+ const int64 tot = static_cast<int64>(iters) * rows * cols; \
+ testing::ItemsProcessed(tot); \
+ testing::BytesProcessed(tot * sizeof(float)); \
+ test::Benchmark(#DEVICE, BcastAdd(rows, cols, 0)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_BcastAddRow_R##R##_C##C)->Arg(RowsAndColsArg(R, C));
+
+#define BM_BCAST_ADD_ROW_ALL(DEVICE) \
+ BM_BCAST_ADD_ROW(DEVICE, 512, 2048); \
+ BM_BCAST_ADD_ROW(DEVICE, 512, 4096); \
+ BM_BCAST_ADD_ROW(DEVICE, 2048, 512); \
+ BM_BCAST_ADD_ROW(DEVICE, 4096, 512);
+BM_BCAST_ADD_ROW_ALL(cpu);
+BM_BCAST_ADD_ROW_ALL(gpu);
+#undef BM_BCAST_ADD_ROW_ALL
+#undef BM_BCAST_ADD_ROW
+
+#define BM_BCAST_ADD_COL(DEVICE, R, C) \
+ static void BM_##DEVICE##_BcastAddCol_R##R##_C##C(int iters, int arg) { \
+ const int rows = RowsFromArg(arg); \
+ const int cols = ColsFromArg(arg); \
+ const int64 tot = static_cast<int64>(iters) * rows * cols; \
+ testing::ItemsProcessed(tot); \
+ testing::BytesProcessed(tot * sizeof(float)); \
+ test::Benchmark(#DEVICE, BcastAdd(rows, cols, 1)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_BcastAddCol_R##R##_C##C)->Arg(RowsAndColsArg(R, C));
+
+#define BM_BCAST_ADD_COL_ALL(DEVICE) \
+ BM_BCAST_ADD_COL(DEVICE, 512, 2048); \
+ BM_BCAST_ADD_COL(DEVICE, 512, 4096); \
+ BM_BCAST_ADD_COL(DEVICE, 2048, 512); \
+ BM_BCAST_ADD_COL(DEVICE, 4096, 512);
+BM_BCAST_ADD_COL_ALL(cpu);
+BM_BCAST_ADD_COL_ALL(gpu);
+#undef BM_BCAST_ADD_COL_ALL
+#undef BM_BCAST_ADD_COL
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/decode_csv_op.cc b/tensorflow/core/kernels/decode_csv_op.cc
new file mode 100644
index 0000000000..0919bab96f
--- /dev/null
+++ b/tensorflow/core/kernels/decode_csv_op.cc
@@ -0,0 +1,222 @@
+// See docs in ../ops/parsing_ops.cc.
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class DecodeCSVOp : public OpKernel {
+ public:
+ explicit DecodeCSVOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ string delim;
+
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("OUT_TYPE", &out_type_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("field_delim", &delim));
+
+ OP_REQUIRES(ctx, delim.size() == 1,
+ errors::InvalidArgument("field_delim should be only 1 char"));
+
+ delim_ = delim[0];
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor* records;
+ OpInputList record_defaults;
+
+ OP_REQUIRES_OK(ctx, ctx->input("records", &records));
+ OP_REQUIRES_OK(ctx, ctx->input_list("record_defaults", &record_defaults));
+
+ for (int i = 0; i < record_defaults.size(); ++i) {
+ OP_REQUIRES(ctx, record_defaults[i].NumElements() < 2,
+ errors::InvalidArgument(
+ "There should only be 1 default per field but field ", i,
+ " has ", record_defaults[i].NumElements()));
+ }
+
+ auto records_t = records->flat<string>();
+ int records_size = records_t.size();
+
+ OpOutputList output;
+ OP_REQUIRES_OK(ctx, ctx->output_list("output", &output));
+
+ for (size_t i = 0; i < out_type_.size(); ++i) {
+ Tensor* out = nullptr;
+ output.allocate(i, records->shape(), &out);
+ }
+
+ for (int i = 0; i < records_size; ++i) {
+ const StringPiece record(records_t(i));
+ std::vector<string> fields;
+ ExtractFields(ctx, record, &fields);
+ OP_REQUIRES(ctx, fields.size() == out_type_.size(),
+ errors::InvalidArgument("Expect ", out_type_.size(),
+ " fields but have ", fields.size(),
+ " in record ", i));
+
+ // Check each field in the record
+ for (size_t f = 0; f < out_type_.size(); ++f) {
+ const DataType& dtype = out_type_[f];
+ switch (dtype) {
+ case DT_INT32: {
+ // If this field is empty, check if default is given:
+ // If yes, use default value; Otherwise report error.
+ if (fields[f].empty()) {
+ OP_REQUIRES(ctx, record_defaults[f].NumElements() == 1,
+ errors::InvalidArgument(
+ "Field ", f,
+ " is required but missing in record ", i, "!"));
+
+ output[f]->flat<int32>()(i) = record_defaults[f].flat<int32>()(0);
+ } else {
+ int32 value;
+ OP_REQUIRES(ctx, strings::safe_strto32(fields[f].c_str(), &value),
+ errors::InvalidArgument("Field ", f, " in record ", i,
+ " is not a valid int32: ",
+ fields[f]));
+ output[f]->flat<int32>()(i) = value;
+ }
+ break;
+ }
+ case DT_INT64: {
+ // If this field is empty, check if default is given:
+ // If yes, use default value; Otherwise report error.
+ if (fields[f].empty()) {
+ OP_REQUIRES(ctx, record_defaults[f].NumElements() == 1,
+ errors::InvalidArgument(
+ "Field ", f,
+ " is required but missing in record ", i, "!"));
+
+ output[f]->flat<int64>()(i) = record_defaults[f].flat<int64>()(0);
+ } else {
+ int64 value;
+ OP_REQUIRES(ctx, strings::safe_strto64(fields[f].c_str(), &value),
+ errors::InvalidArgument("Field ", f, " in record ", i,
+ " is not a valid int64: ",
+ fields[f]));
+ output[f]->flat<int64>()(i) = value;
+ }
+ break;
+ }
+ case DT_FLOAT: {
+ // If this field is empty, check if default is given:
+ // If yes, use default value; Otherwise report error.
+ if (fields[f].empty()) {
+ OP_REQUIRES(ctx, record_defaults[f].NumElements() == 1,
+ errors::InvalidArgument(
+ "Field ", f,
+ " is required but missing in record ", i, "!"));
+ output[f]->flat<float>()(i) = record_defaults[f].flat<float>()(0);
+ } else {
+ float value;
+ OP_REQUIRES(ctx, strings::safe_strtof(fields[f].c_str(), &value),
+ errors::InvalidArgument("Field ", f, " in record ", i,
+ " is not a valid float: ",
+ fields[f]));
+ output[f]->flat<float>()(i) = value;
+ }
+ break;
+ }
+ case DT_STRING: {
+ // If this field is empty, check if default is given:
+ // If yes, use default value; Otherwise report error.
+ if (fields[f].empty()) {
+ OP_REQUIRES(ctx, record_defaults[f].NumElements() == 1,
+ errors::InvalidArgument(
+ "Field ", f,
+ " is required but missing in record ", i, "!"));
+ output[f]->flat<string>()(i) =
+ record_defaults[f].flat<string>()(0);
+ } else {
+ output[f]->flat<string>()(i) = fields[f];
+ }
+ break;
+ }
+ default:
+ OP_REQUIRES(ctx, false,
+ errors::InvalidArgument("csv: data type ", dtype,
+ " not supported in field ", f));
+ }
+ }
+ }
+ }
+
+ private:
+ std::vector<DataType> out_type_;
+ char delim_;
+
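+  // Splits "input" on delim_, honoring double-quoted fields. For
+  // example (illustrative), with delim_ == ',' the record
+  //   a,"b,c",
+  // is split into the three fields {"a", "b,c", ""}; a trailing
+  // delimiter yields a final empty field.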
+ void ExtractFields(OpKernelContext* ctx, StringPiece input,
+ std::vector<string>* result) {
+ int current_idx = 0;
+ if (!input.empty()) {
+ while (static_cast<size_t>(current_idx) < input.size()) {
+ if (input[current_idx] == '\n' || input[current_idx] == '\r') {
+ current_idx++;
+ continue;
+ }
+
+ bool quoted = false;
+ if (input[current_idx] == '"') {
+ quoted = true;
+ current_idx++;
+ }
+
+        // This is the body of the field.
+ string field;
+ if (!quoted) {
+ while (static_cast<size_t>(current_idx) < input.size() &&
+ input[current_idx] != delim_) {
+ OP_REQUIRES(ctx, input[current_idx] != '"' &&
+ input[current_idx] != '\n' &&
+ input[current_idx] != '\r',
+ errors::InvalidArgument(
+ "Unquoted fields cannot have quotes/CRLFs inside"));
+ field += input[current_idx];
+ current_idx++;
+ }
+
+ // Go to next field or the end
+ current_idx++;
+ } else {
+        // A quoted field needs to end with '"' followed by delim or end of line.
+ while (
+ (static_cast<size_t>(current_idx) < input.size() - 1) &&
+ (input[current_idx] != '"' || input[current_idx + 1] != delim_)) {
+ if (input[current_idx] != '"') {
+ field += input[current_idx];
+ current_idx++;
+ } else {
+ OP_REQUIRES(
+ ctx, input[current_idx + 1] == '"',
+ errors::InvalidArgument("Quote inside a string has to be "
+ "escaped by another quote"));
+ field += '"';
+ current_idx += 2;
+ }
+ }
+
+ OP_REQUIRES(
+ ctx,
+ input[current_idx] == '"' &&
+ (static_cast<size_t>(current_idx) == input.size() - 1 ||
+ input[current_idx + 1] == delim_),
+ errors::InvalidArgument("Quoted field has to end with quote "
+ "followed by delim or end"));
+
+ current_idx += 2;
+ }
+
+ result->push_back(field);
+ }
+
+ // Check if the last field is missing
+ if (input[input.size() - 1] == delim_) result->push_back(string());
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("DecodeCSV").Device(DEVICE_CPU), DecodeCSVOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/decode_jpeg_op.cc b/tensorflow/core/kernels/decode_jpeg_op.cc
new file mode 100644
index 0000000000..e41d3f3e11
--- /dev/null
+++ b/tensorflow/core/kernels/decode_jpeg_op.cc
@@ -0,0 +1,72 @@
+// See docs in ../ops/image_ops.cc
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/jpeg/jpeg_mem.h"
+
+namespace tensorflow {
+
+// Decode the contents of a JPEG file
+class DecodeJpegOp : public OpKernel {
+ public:
+ explicit DecodeJpegOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("channels", &flags_.components));
+ OP_REQUIRES(context, flags_.components == 0 || flags_.components == 1 ||
+ flags_.components == 3,
+ errors::InvalidArgument("channels must be 0, 1, or 3, got ",
+ flags_.components));
+ OP_REQUIRES_OK(context, context->GetAttr("ratio", &flags_.ratio));
+ OP_REQUIRES(context, flags_.ratio == 1 || flags_.ratio == 2 ||
+ flags_.ratio == 4 || flags_.ratio == 8,
+ errors::InvalidArgument("ratio must be 1, 2, 4, or 8, got ",
+ flags_.ratio));
+ OP_REQUIRES_OK(
+ context, context->GetAttr("fancy_upscaling", &flags_.fancy_upscaling));
+ OP_REQUIRES_OK(context,
+ context->GetAttr("try_recover_truncated",
+ &flags_.try_recover_truncated_jpeg));
+ OP_REQUIRES_OK(context, context->GetAttr("acceptable_fraction",
+ &flags_.min_acceptable_fraction));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& contents = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(contents.shape()),
+ errors::InvalidArgument("contents must be scalar, got shape ",
+ contents.shape().ShortDebugString()));
+ const StringPiece input = contents.scalar<string>()();
+ OP_REQUIRES(context, input.size() <= std::numeric_limits<int>::max(),
+ errors::InvalidArgument("JPEG contents are too large for int: ",
+ input.size()));
+
+ // Decode image, allocating tensor once the image size is known
+ Tensor* output = NULL;
+ OP_REQUIRES(
+ context,
+ jpeg::Uncompress(
+ input.data(), input.size(), flags_, NULL,
+ [=, &output](int width, int height, int channels) -> uint8* {
+ Status status(context->allocate_output(
+ 0, TensorShape({height, width, channels}), &output));
+ if (!status.ok()) {
+ VLOG(1) << status;
+ context->SetStatus(status);
+ return nullptr;
+ }
+ return output->flat<uint8>().data();
+ }),
+ errors::InvalidArgument("Invalid JPEG data, size ", input.size()));
+ }
+
+ private:
+ jpeg::UncompressFlags flags_;
+};
+REGISTER_KERNEL_BUILDER(Name("DecodeJpeg").Device(DEVICE_CPU), DecodeJpegOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/decode_png_op.cc b/tensorflow/core/kernels/decode_png_op.cc
new file mode 100644
index 0000000000..e8071526f9
--- /dev/null
+++ b/tensorflow/core/kernels/decode_png_op.cc
@@ -0,0 +1,69 @@
+// See docs in ../ops/image_ops.cc
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/png/png_io.h"
+
+namespace tensorflow {
+
+// Decode the contents of a PNG file
+class DecodePngOp : public OpKernel {
+ public:
+ explicit DecodePngOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("channels", &channels_));
+ OP_REQUIRES(context, channels_ == 0 || channels_ == 1 || channels_ == 3 ||
+ channels_ == 4,
+ errors::InvalidArgument("channels must be 0, 1, 3, or 4, got ",
+ channels_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& contents = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(contents.shape()),
+ errors::InvalidArgument("contents must be scalar, got shape ",
+ contents.shape().ShortDebugString()));
+
+ // Start decoding image to get shape details
+ const StringPiece data = contents.scalar<string>()();
+ png::DecodeContext decode;
+ OP_REQUIRES(
+ context, png::CommonInitDecode(data, channels_, 8, &decode),
+ errors::InvalidArgument("Invalid PNG header, data size ", data.size()));
+
+ // Verify that width and height don't overflow int
+ const int width = decode.width;
+ const int height = decode.height;
+ if (width != static_cast<int64>(decode.width) ||
+ height != static_cast<int64>(decode.height)) {
+ png::CommonFreeDecode(&decode);
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument("PNG size too large for int: ",
+ decode.width, " by ", decode.height));
+ }
+
+ // Allocate tensor
+ Tensor* output = nullptr;
+ const auto status = context->allocate_output(
+ 0, TensorShape({height, width, decode.channels}), &output);
+ if (!status.ok()) png::CommonFreeDecode(&decode);
+ OP_REQUIRES_OK(context, status);
+
+ // Finish decoding image
+ OP_REQUIRES(
+ context, png::CommonFinishDecode(output->flat<uint8>().data(),
+ decode.channels * width, &decode),
+ errors::InvalidArgument("Invalid PNG data, size ", data.size()));
+ }
+
+ private:
+ int channels_;
+};
+REGISTER_KERNEL_BUILDER(Name("DecodePng").Device(DEVICE_CPU), DecodePngOp);
+
+} // namespace tensorflow
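
DecodePngOp above decodes in two phases (init to read the header, finish to emit pixels) and must call png::CommonFreeDecode by hand on every early-exit path before the output allocation succeeds. A scope guard is one generic way to express that obligation; the sketch below is illustrative only and uses a fake decode context, not the png_io API.

    #include <functional>
    #include <iostream>

    // Tiny scope guard: runs the stored cleanup unless dismissed.
    class ScopeGuard {
     public:
      explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
      ~ScopeGuard() { if (f_) f_(); }
      void Dismiss() { f_ = nullptr; }
     private:
      std::function<void()> f_;
    };

    struct FakeDecodeContext { bool freed = false; };
    // Stand-in for png::CommonFreeDecode.
    void FreeDecode(FakeDecodeContext* ctx) { ctx->freed = true; }

    int main() {
      FakeDecodeContext decode;
      {
        ScopeGuard cleanup([&decode] { FreeDecode(&decode); });
        bool allocation_failed = true;  // pretend allocate_output failed
        if (!allocation_failed) {
          cleanup.Dismiss();  // decode finished normally; keep the state alive
        }
        // On the failure path the guard frees the decode state at scope exit,
        // mirroring the explicit CommonFreeDecode calls in the kernel.
      }
      std::cout << "decode state freed: " << std::boolalpha << decode.freed << "\n";
    }
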
diff --git a/tensorflow/core/kernels/decode_raw_op.cc b/tensorflow/core/kernels/decode_raw_op.cc
new file mode 100644
index 0000000000..ef24c333a4
--- /dev/null
+++ b/tensorflow/core/kernels/decode_raw_op.cc
@@ -0,0 +1,90 @@
+// See docs in ../ops/parse_ops.cc.
+
+#include <algorithm>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+template <typename T>
+class DecodeRawOp : public OpKernel {
+ public:
+ explicit DecodeRawOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("little_endian", &little_endian_));
+ OP_REQUIRES_OK(context, context->GetAttr("out_type", &out_type_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const auto& input = context->input(0);
+ int str_size = -1;
+ auto flat_in = input.flat<string>();
+ for (int i = 0; i < flat_in.size(); ++i) {
+ const string& in_str = flat_in(i);
+ if (str_size == -1) {
+ str_size = in_str.size();
+ } else {
+ OP_REQUIRES(context, str_size == in_str.size(),
+ errors::InvalidArgument(
+ "DecodeRaw requires input strings to all be the same "
+ "size, but element ",
+                        i, " has size ", in_str.size(), " != ", str_size));
+ }
+ }
+ TensorShape out_shape = input.shape();
+ if (str_size == -1) { // Empty input
+ out_shape.AddDim(1);
+ Tensor* output_tensor = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output("output", out_shape,
+ &output_tensor));
+ return;
+ }
+ OP_REQUIRES(
+ context, str_size % sizeof(T) == 0,
+ errors::InvalidArgument("Input to DecodeRaw has length ", str_size,
+ " that is not a multiple of ", sizeof(T),
+ ", the size of ", DataTypeString(out_type_)));
+ const int added_dim = str_size / sizeof(T);
+ out_shape.AddDim(added_dim);
+ Tensor* output_tensor = nullptr;
+ OP_REQUIRES_OK(
+ context, context->allocate_output("output", out_shape, &output_tensor));
+ auto out = output_tensor->flat_inner_dims<T>();
+ DCHECK_EQ(flat_in.size(), out.dimensions()[0]);
+ OP_REQUIRES(
+ context,
+ little_endian_ == ::tensorflow::port::kLittleEndian || sizeof(T) == 1,
+ errors::Unimplemented("Unimplemented support for little_endian=",
+ little_endian_ ? "true" : "false"));
+ // Endianness matches, so just copy each string byte-for-byte.
+ T* out_data = out.data();
+ for (int i = 0; i < flat_in.size(); ++i) {
+ const T* in_data = reinterpret_cast<const T*>(flat_in(i).data());
+ memcpy(out_data, in_data, str_size);
+ out_data += added_dim;
+ }
+ }
+
+ private:
+ bool little_endian_;
+ DataType out_type_;
+};
+
+#define REGISTER(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("DecodeRaw").Device(DEVICE_CPU).TypeConstraint<type>("out_type"), \
+ DecodeRawOp<type>)
+
+REGISTER(float);
+REGISTER(double);
+REGISTER(int32);
+REGISTER(uint8);
+REGISTER(int16);
+REGISTER(int8);
+REGISTER(int64);
+
+#undef REGISTER
+
+} // namespace tensorflow
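
A standalone sketch, not from the commit, of what DecodeRawOp does per input string: check that the byte length is a multiple of sizeof(T), then memcpy the bytes into a typed buffer. This is only meaningful when the serialized endianness matches the host's, which is exactly the restriction the kernel enforces above.

    #include <cstring>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Reinterpret a byte string as a vector of T, mirroring DecodeRawOp's
    // per-element copy. Assumes the bytes were written with the host's
    // endianness.
    template <typename T>
    std::vector<T> DecodeRaw(const std::string& bytes) {
      if (bytes.size() % sizeof(T) != 0) {
        throw std::invalid_argument("length is not a multiple of sizeof(T)");
      }
      std::vector<T> out(bytes.size() / sizeof(T));
      std::memcpy(out.data(), bytes.data(), bytes.size());
      return out;
    }

    int main() {
      float values[2] = {1.5f, -2.0f};
      std::string bytes(reinterpret_cast<const char*>(values), sizeof(values));
      for (float v : DecodeRaw<float>(bytes)) std::cout << v << "\n";  // 1.5, -2
    }
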
diff --git a/tensorflow/core/kernels/dense_update_ops.cc b/tensorflow/core/kernels/dense_update_ops.cc
new file mode 100644
index 0000000000..f56c37b4ef
--- /dev/null
+++ b/tensorflow/core/kernels/dense_update_ops.cc
@@ -0,0 +1,136 @@
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/assign_op.h"
+#include "tensorflow/core/kernels/dense_update_ops.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+template <typename Device, typename T>
+class AssignOpT : public AssignOp {
+ public:
+ using AssignOp::AssignOp;
+
+ void Copy(OpKernelContext* context, Tensor* lhs, const Tensor& rhs) override {
+ functor::DenseUpdate<Device, T, ASSIGN> copy;
+ copy(context->eigen_device<Device>(), lhs->flat<T>(), rhs.flat<T>());
+ }
+};
+
+// TODO(jeff): Get rid of use_exclusive_lock_ option
+template <typename Device, typename T, DenseUpdateType OP>
+class DenseUpdateOp : public OpKernel {
+ public:
+ explicit DenseUpdateOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context,
+ context->GetAttr("use_locking", &use_exclusive_lock_));
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(context, context->MatchSignature({MakeRefType(dt), dt},
+ {MakeRefType(dt)}));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ // We always return the input ref.
+ context->forward_ref_input_to_ref_output(0, 0);
+
+ if (use_exclusive_lock_) {
+ mutex_lock l(*context->input_ref_mutex(0));
+ DoUpdate(context);
+ } else {
+ DoUpdate(context);
+ }
+ }
+
+ private:
+ void DoUpdate(OpKernelContext* context) {
+ Tensor Tparams = context->mutable_input(0, use_exclusive_lock_);
+ const Tensor& Tupdate = context->input(1);
+ OP_REQUIRES(context, Tparams.IsInitialized(),
+ errors::FailedPrecondition("Attempting to use uninitialized "
+ "parameters: ",
+ def().input(0)));
+ OP_REQUIRES(
+ context, Tparams.IsSameSize(Tupdate),
+ errors::InvalidArgument("Parameters and update must be the same size"));
+
+ functor::DenseUpdate<Device, T, OP> update_functor;
+ update_functor(context->eigen_device<Device>(), Tparams.flat<T>(),
+ Tupdate.flat<T>());
+ }
+
+ bool use_exclusive_lock_;
+};
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Assign").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ AssignOpT<CPUDevice, type>);
+
+TF_CALL_ALL_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+#if GOOGLE_CUDA
+// Only register 'Assign' on GPU for the subset of types also supported by
+// 'Variable' (see variable_ops.cc.)
+#define REGISTER_GPU_KERNELS(type) \
+ namespace functor { \
+ template <> \
+ void DenseUpdate<GPUDevice, type, ASSIGN>::operator()( \
+ const GPUDevice& d, typename TTypes<type>::Flat lhs, \
+ typename TTypes<type>::ConstFlat rhs); \
+ extern template struct DenseUpdate<GPUDevice, type, ASSIGN>; \
+ } \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Assign").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ AssignOpT<GPUDevice, type>);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
+#undef REGISTER_GPU_KERNELS
+#endif // GOOGLE_CUDA
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AssignAdd").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ DenseUpdateOp<CPUDevice, type, DenseUpdateType::ADD>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AssignSub").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ DenseUpdateOp<CPUDevice, type, DenseUpdateType::SUB>);
+
+TF_CALL_NUMBER_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC_FOR_OP(T, OP) \
+ template <> \
+ void DenseUpdate<GPUDevice, T, OP>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat params, \
+ typename TTypes<T>::ConstFlat update); \
+ extern template struct DenseUpdate<GPUDevice, T, OP>
+#define DECLARE_GPU_SPEC(T) \
+ DECLARE_GPU_SPEC_FOR_OP(T, DenseUpdateType::ADD); \
+ DECLARE_GPU_SPEC_FOR_OP(T, DenseUpdateType::SUB)
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPEC);
+#undef DECLARE_GPU_SPEC
+#undef DECLARE_GPU_SPEC_FOR_OP
+} // namespace functor
+
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AssignAdd").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ DenseUpdateOp<GPUDevice, type, DenseUpdateType::ADD>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("AssignSub").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ DenseUpdateOp<GPUDevice, type, DenseUpdateType::SUB>);
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
+#undef REGISTER_GPU_KERNELS
+#endif // end GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/dense_update_ops.h b/tensorflow/core/kernels/dense_update_ops.h
new file mode 100644
index 0000000000..d32c9a4af2
--- /dev/null
+++ b/tensorflow/core/kernels/dense_update_ops.h
@@ -0,0 +1,43 @@
+#ifndef TENSORFLOW_KERNELS_DENSE_UPDATE_OPS_H_
+#define TENSORFLOW_KERNELS_DENSE_UPDATE_OPS_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+enum DenseUpdateType { ADD, SUB, ASSIGN };
+
+namespace functor {
+
+template <typename Device, typename T, DenseUpdateType OP>
+struct DenseUpdate;
+
+template <typename Device, typename T>
+struct DenseUpdate<Device, T, ADD> {
+ void operator()(const Device& d, typename TTypes<T>::Flat params,
+ typename TTypes<T>::ConstFlat update) {
+ params.device(d) += update;
+ }
+};
+
+template <typename Device, typename T>
+struct DenseUpdate<Device, T, SUB> {
+ void operator()(const Device& d, typename TTypes<T>::Flat params,
+ typename TTypes<T>::ConstFlat update) {
+ params.device(d) -= update;
+ }
+};
+
+template <typename Device, typename T>
+struct DenseUpdate<Device, T, ASSIGN> {
+ void operator()(const Device& d, typename TTypes<T>::Flat params,
+ typename TTypes<T>::ConstFlat update) {
+ params.device(d) = update;
+ }
+};
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_DENSE_UPDATE_OPS_H_
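
The header above dispatches the update kind through a functor template specialized on an enum, so each device/op pair compiles down to a single Eigen expression. A CPU-only sketch of the same pattern on plain vectors, with no TensorFlow or Eigen types involved:

    #include <iostream>
    #include <vector>

    enum class UpdateKind { kAdd, kSub, kAssign };

    // Primary template left undefined; each update kind gets its own
    // specialization, mirroring functor::DenseUpdate above.
    template <UpdateKind OP> struct Update;

    template <> struct Update<UpdateKind::kAdd> {
      void operator()(std::vector<float>& params, const std::vector<float>& u) const {
        for (size_t i = 0; i < params.size(); ++i) params[i] += u[i];
      }
    };
    template <> struct Update<UpdateKind::kSub> {
      void operator()(std::vector<float>& params, const std::vector<float>& u) const {
        for (size_t i = 0; i < params.size(); ++i) params[i] -= u[i];
      }
    };
    template <> struct Update<UpdateKind::kAssign> {
      void operator()(std::vector<float>& params, const std::vector<float>& u) const {
        params = u;
      }
    };

    int main() {
      std::vector<float> params = {1, 2, 3};
      Update<UpdateKind::kAdd>{}(params, {10, 10, 10});  // params = {11, 12, 13}
      Update<UpdateKind::kSub>{}(params, {1, 1, 1});     // params = {10, 11, 12}
      for (float p : params) std::cout << p << " ";
      std::cout << "\n";
    }
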
diff --git a/tensorflow/core/kernels/dense_update_ops_gpu.cu.cc b/tensorflow/core/kernels/dense_update_ops_gpu.cu.cc
new file mode 100644
index 0000000000..8e80901c71
--- /dev/null
+++ b/tensorflow/core/kernels/dense_update_ops_gpu.cu.cc
@@ -0,0 +1,22 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/dense_update_ops.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::DenseUpdate<GPUDevice, T, ADD>; \
+ template struct functor::DenseUpdate<GPUDevice, T, SUB>; \
+ template struct functor::DenseUpdate<GPUDevice, T, ASSIGN>;
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS);
+#undef DEFINE_GPU_KERNELS
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/determinant_op.cc b/tensorflow/core/kernels/determinant_op.cc
new file mode 100644
index 0000000000..d34aab7a44
--- /dev/null
+++ b/tensorflow/core/kernels/determinant_op.cc
@@ -0,0 +1,66 @@
+// See docs in ../ops/linalg_ops.cc.
+#include <cmath>
+
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/linalg_ops_common.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/Eigen/LU"
+
+namespace tensorflow {
+
+template <class Scalar, bool SupportsBatchOperationT>
+class DeterminantOp : public LinearAlgebraOp<Scalar, SupportsBatchOperationT> {
+ public:
+ explicit DeterminantOp(OpKernelConstruction* context)
+ : LinearAlgebraOp<Scalar, SupportsBatchOperationT>(context) {}
+ ~DeterminantOp() override {}
+
+ TensorShape GetOutputMatrixShape(
+ const TensorShape& input_matrix_shape) override {
+ return TensorShape({});
+ }
+
+ int64 GetCostPerUnit(const TensorShape& input_matrix_shape) override {
+ const int64 rows = input_matrix_shape.dim_size(0);
+ if (rows > (1LL << 20)) {
+      // A big number to cap the cost in case of overflow.
+ return kint32max;
+ } else {
+ return rows * rows * rows;
+ }
+ }
+
+ using typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::MatrixMap;
+ using
+ typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::ConstMatrixMap;
+
+ void ComputeMatrix(OpKernelContext* context, const ConstMatrixMap& input,
+ MatrixMap* output) override {
+ OP_REQUIRES(context, input.rows() == input.cols(),
+ errors::InvalidArgument("Input matrix must be square."));
+ Scalar determinant;
+ if (input.rows() == 0) {
+      // An empty matrix's determinant is defined to be 1. See
+      // Wikipedia.
+ determinant = 1;
+ } else {
+ determinant = input.determinant();
+ }
+ OP_REQUIRES(context, std::isfinite(determinant),
+ errors::Internal("The determinant is not finite."));
+ (*output)(0, 0) = determinant;
+ }
+};
+
+REGISTER_LINALG_OP("MatrixDeterminant", (DeterminantOp<float, false>), float);
+REGISTER_LINALG_OP("MatrixDeterminant", (DeterminantOp<double, false>), double);
+REGISTER_LINALG_OP("BatchMatrixDeterminant", (DeterminantOp<float, true>),
+ float);
+REGISTER_LINALG_OP("BatchMatrixDeterminant", (DeterminantOp<double, true>),
+ double);
+
+} // namespace tensorflow
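
A small sketch of the same computation outside the kernel, assuming Eigen is available as a plain system include rather than the third_party/eigen3 path used in the tree: the determinant comes from Eigen's LU-based MatrixBase::determinant(), with the 0x0 case defined to be 1 just as above.

    #include <iostream>
    #include <stdexcept>
    #include <Eigen/Dense>

    // Mirrors DeterminantOp::ComputeMatrix: square check, empty-matrix
    // convention, then Eigen's determinant.
    double Determinant(const Eigen::MatrixXd& m) {
      if (m.rows() != m.cols()) throw std::invalid_argument("matrix must be square");
      if (m.rows() == 0) return 1.0;  // determinant of an empty matrix is 1
      return m.determinant();
    }

    int main() {
      Eigen::MatrixXd m(2, 2);
      m << 3, 1,
           2, 4;
      std::cout << Determinant(m) << "\n";                     // 10
      std::cout << Determinant(Eigen::MatrixXd(0, 0)) << "\n"; // 1
    }
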
diff --git a/tensorflow/core/kernels/diag_op.cc b/tensorflow/core/kernels/diag_op.cc
new file mode 100644
index 0000000000..83e39d33a9
--- /dev/null
+++ b/tensorflow/core/kernels/diag_op.cc
@@ -0,0 +1,93 @@
+// See docs in ../ops/array_ops.cc
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace {
+template <typename T, size_t NumDims, size_t DoubleNumDims>
+class DiagonalGenerator {
+ public:
+ explicit DiagonalGenerator(const Tensor& diagonal) : diagonal_(diagonal) {
+ static_assert(DoubleNumDims == 2 * NumDims,
+                  "The second size must be double the first size.");
+ CHECK_EQ(diagonal.dims(), NumDims);
+ }
+ T operator()(
+ const Eigen::array<Eigen::DenseIndex, DoubleNumDims>& coordinates) const {
+ Eigen::array<Eigen::DenseIndex, NumDims> index;
+ for (int i = 0; i < NumDims; ++i) {
+ if (coordinates[i] != coordinates[NumDims + i]) {
+ return T(0);
+ }
+ index[i] = coordinates[i];
+ }
+ return diagonal_.tensor<T, NumDims>()(index);
+ }
+
+ private:
+ Tensor diagonal_;
+};
+} // namespace
+
+// Generate the diagonal tensor with the diagonal set to the input tensor.
+// It only allows input tensors of rank up to 3, so the output tensor has
+// rank at most 6.
+template <typename T>
+class DiagOp : public OpKernel {
+ public:
+ explicit DiagOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& diagonal = context->input(0);
+ const int num_dims = diagonal.dims();
+ OP_REQUIRES(context, 1 <= num_dims,
+ errors::InvalidArgument(
+ "The rank of the diagonal should be between 1 and 3."));
+ OP_REQUIRES(context, 3 >= num_dims,
+ errors::InvalidArgument(
+ "The rank of the diagonal should be between 1 and 3."));
+ TensorShape out_shape;
+ for (int i = 0; i < num_dims; ++i) {
+ out_shape.AddDim(diagonal.dim_size(i));
+ }
+ for (int i = 0; i < num_dims; ++i) {
+ out_shape.AddDim(diagonal.dim_size(i));
+ }
+ Tensor* output_tensor = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, out_shape, &output_tensor));
+ switch (num_dims) {
+ case 1:
+ output_tensor->tensor<T, 2>() = output_tensor->tensor<T, 2>().generate(
+ DiagonalGenerator<T, 1, 2>(diagonal));
+ break;
+ case 2:
+ output_tensor->tensor<T, 4>() = output_tensor->tensor<T, 4>().generate(
+ DiagonalGenerator<T, 2, 4>(diagonal));
+ break;
+ case 3:
+ output_tensor->tensor<T, 6>() = output_tensor->tensor<T, 6>().generate(
+ DiagonalGenerator<T, 3, 6>(diagonal));
+ break;
+ default:
+ context->SetStatus(errors::Unimplemented(
+ "Diagonal of rank ", num_dims, " tensor is not supported yet."));
+ return;
+ }
+ }
+};
+
+#define REGISTER_DIAGOP(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Diag").Device(DEVICE_CPU).TypeConstraint<T>("T"), DiagOp<T>)
+
+REGISTER_DIAGOP(double);
+REGISTER_DIAGOP(float);
+REGISTER_DIAGOP(int32);
+REGISTER_DIAGOP(int64);
+
+#undef REGISTER_DIAGOP
+} // namespace tensorflow
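
For the rank-1 case, the generator above fills an n x n output whose entry (i, j) is diagonal[i] when i == j and 0 otherwise; for rank k it does the same with k-tuples of indices. A plain-vector sketch of the rank-1 case, not part of the commit:

    #include <iostream>
    #include <vector>

    // Rank-1 Diag: out is d.size() x d.size() with d on the main diagonal,
    // matching what DiagonalGenerator<T, 1, 2> produces above.
    std::vector<std::vector<float>> Diag(const std::vector<float>& d) {
      const size_t n = d.size();
      std::vector<std::vector<float>> out(n, std::vector<float>(n, 0.0f));
      for (size_t i = 0; i < n; ++i) out[i][i] = d[i];
      return out;
    }

    int main() {
      for (const auto& row : Diag({1.0f, 2.0f, 3.0f})) {
        for (float v : row) std::cout << v << " ";
        std::cout << "\n";
      }
      // 1 0 0
      // 0 2 0
      // 0 0 3
    }
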
diff --git a/tensorflow/core/kernels/dynamic_partition_op.cc b/tensorflow/core/kernels/dynamic_partition_op.cc
new file mode 100644
index 0000000000..f1b44861b5
--- /dev/null
+++ b/tensorflow/core/kernels/dynamic_partition_op.cc
@@ -0,0 +1,154 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+// Shared code that is not dependent on the type of T. We do this to reduce
+// code size by not duplicating all this for all T (float, double, int32, etc.)
+class DynamicPartitionOp_Shared : public OpKernel {
+ public:
+ explicit DynamicPartitionOp_Shared(OpKernelConstruction* c) : OpKernel(c) {
+ OP_REQUIRES_OK(c, c->GetAttr("num_partitions", &num_partitions_));
+ // QUESTION: It'd be nice to support DT_INT16, DT_UINT8, etc.
+    // for input[1]. Should we have the framework do some sort of
+ // integer promotion automatically, or should that be something
+ // that users have to do explicitly with a conversion operator
+ // in the graph?
+ }
+
+ void ValidateAndAllocateOutputs(OpKernelContext* c, const Tensor** data,
+ const Tensor** partitions,
+ OpOutputList* Tout) {
+ OP_REQUIRES_OK(c, c->input("data", data));
+ OP_REQUIRES_OK(c, c->input("partitions", partitions));
+ OP_REQUIRES(c, TensorShapeUtils::StartsWith((*data)->shape(),
+ (*partitions)->shape()),
+ errors::InvalidArgument(
+ "data.shape must start with partitions.shape, ",
+ "got data.shape = ", (*data)->shape().ShortDebugString(),
+ ", partitions.shape = ",
+ (*partitions)->shape().ShortDebugString()));
+
+ // Count how many occurrences of each partition id we have in partitions
+ gtl::InlinedVector<int, 32> partition_count(num_partitions_);
+ auto e_partitions = (*partitions)->flat<int32>();
+ const int64 N = e_partitions.dimension(0);
+ for (int64 i = 0; i < N; i++) {
+ const int32 p = e_partitions(i);
+ OP_REQUIRES(c, p >= 0 && p < num_partitions_,
+ errors::InvalidArgument(
+ "partitions", SliceString((*partitions)->shape(), i),
+ " = ", p, " is not in [0, ", num_partitions_, ")"));
+ partition_count[p]++;
+ }
+
+ // Allocate output tensors of the right size
+ OP_REQUIRES_OK(c, c->output_list("outputs", Tout));
+ for (int p = 0; p < num_partitions_; p++) {
+ TensorShape shape;
+ shape.AddDim(partition_count[p]);
+ for (int i = (*partitions)->dims(); i < (*data)->dims(); i++) {
+ shape.AddDim((*data)->dim_size(i));
+ }
+ Tensor* out;
+ OP_REQUIRES_OK(c, Tout->allocate(p, shape, &out));
+ }
+ }
+
+ protected:
+ int num_partitions_;
+
+ static string SliceString(const TensorShape& shape, const int64 flat) {
+ // Special case rank 0 and 1
+ const int dims = shape.dims();
+ if (dims == 0) return "";
+ if (dims == 1) return strings::StrCat("[", flat, "]");
+
+ // Compute strides
+ gtl::InlinedVector<int64, 32> strides(dims);
+ strides.back() = 1;
+ for (int i = dims - 2; i >= 0; i--) {
+ strides[i] = strides[i + 1] * shape.dim_size(i + 1);
+ }
+
+ // Unflatten index
+ int64 left = flat;
+ string result;
+ for (int i = 0; i < dims; i++) {
+ strings::StrAppend(&result, i ? "," : "[", left / strides[i]);
+ left %= strides[i];
+ }
+ strings::StrAppend(&result, "]");
+ return result;
+ }
+};
+
+template <class T>
+class DynamicPartitionOp : public DynamicPartitionOp_Shared {
+ public:
+ explicit DynamicPartitionOp(OpKernelConstruction* c)
+ : DynamicPartitionOp_Shared(c) {}
+ void Compute(OpKernelContext* c) override {
+ const Tensor* data;
+ const Tensor* partitions;
+ OpOutputList outputs;
+ ValidateAndAllocateOutputs(c, &data, &partitions, &outputs);
+ if (!c->status().ok()) return;
+ if (num_partitions_ == 0 || data->NumElements() == 0) return;
+
+ auto e_partitions = partitions->flat<int32>();
+ const int64 N = e_partitions.dimension(0);
+ gtl::InlinedVector<int, 32> output_index(num_partitions_);
+
+ if (partitions->dims() == data->dims()) {
+ // Walk through data and copy the data to the appropriate output tensor
+ const auto data_flat = data->flat<T>();
+ std::vector<Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor>,
+ Eigen::Aligned> > out_vec;
+ for (int p = 0; p < num_partitions_; p++) {
+ out_vec.push_back(outputs[p]->vec<T>());
+ }
+ for (int64 i = 0; i < N; i++) {
+ const int32 p = e_partitions(i);
+ out_vec[p](output_index[p]) = data_flat(i);
+ output_index[p]++;
+ }
+ } else {
+ // If data has extra dimensions, use Eigen slices
+ std::vector<Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor>,
+ Eigen::Aligned> > out_flat;
+ for (int p = 0; p < num_partitions_; p++) {
+ out_flat.push_back(outputs[p]->flat_outer_dims<T>());
+ }
+
+ // Walk through data and copy the data to the appropriate output tensor
+ const int64 slice_size = data->NumElements() / N;
+ const auto data_flat = data->shaped<T, 2>({N, slice_size});
+ Eigen::DSizes<Eigen::DenseIndex, 2> sizes(1, slice_size);
+ for (int64 i = 0; i < N; i++) {
+ const int32 p = e_partitions(i);
+ // outputs[p][output_index[p]++] = data[i]
+ Eigen::DSizes<Eigen::DenseIndex, 2> out_indices(output_index[p], 0);
+ Eigen::DSizes<Eigen::DenseIndex, 2> data_indices(i, 0);
+ out_flat[p].slice(out_indices, sizes) =
+ data_flat.slice(data_indices, sizes);
+ output_index[p]++;
+ }
+ }
+ }
+};
+
+#define REGISTER_DYNAMIC_PARTITION(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("DynamicPartition").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ DynamicPartitionOp<T>)
+
+TF_CALL_ALL_TYPES(REGISTER_DYNAMIC_PARTITION);
+#undef REGISTER_DYNAMIC_PARTITION
+
+} // namespace tensorflow
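
The kernel above makes two passes: one to count how many entries fall into each partition (to size the outputs), one to scatter the data while keeping a per-partition write cursor. A sketch of the same two-pass scheme on plain vectors, using the same inputs as the Simple_OneD test in the next file; here push_back plays the role of the output_index cursor.

    #include <iostream>
    #include <vector>

    // Two-pass dynamic partition of `data` by `partitions`, as in
    // DynamicPartitionOp's same-rank path above.
    std::vector<std::vector<float>> DynamicPartition(const std::vector<float>& data,
                                                     const std::vector<int>& partitions,
                                                     int num_partitions) {
      // Pass 1: count occurrences of each partition id to size the outputs.
      std::vector<int> count(num_partitions, 0);
      for (int p : partitions) ++count[p];

      std::vector<std::vector<float>> outputs(num_partitions);
      for (int p = 0; p < num_partitions; ++p) outputs[p].reserve(count[p]);

      // Pass 2: scatter each element to its partition, preserving order.
      for (size_t i = 0; i < data.size(); ++i) outputs[partitions[i]].push_back(data[i]);
      return outputs;
    }

    int main() {
      auto out = DynamicPartition({0, 13, 2, 39, 4, 17}, {0, 0, 2, 3, 2, 1}, 4);
      for (size_t p = 0; p < out.size(); ++p) {
        std::cout << "partition " << p << ":";
        for (float v : out[p]) std::cout << " " << v;
        std::cout << "\n";  // {0, 13}, {17}, {2, 4}, {39}
      }
    }
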
diff --git a/tensorflow/core/kernels/dynamic_partition_op_test.cc b/tensorflow/core/kernels/dynamic_partition_op_test.cc
new file mode 100644
index 0000000000..b0e5e7deb0
--- /dev/null
+++ b/tensorflow/core/kernels/dynamic_partition_op_test.cc
@@ -0,0 +1,145 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+class DynamicPartitionOpTest : public OpsTestBase {
+ protected:
+ void MakeOp() {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "DynamicPartition")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_INT32))
+ .Attr("num_partitions", 4)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(DynamicPartitionOpTest, Simple_OneD) {
+ MakeOp();
+
+ // Similar to how we would use this to split embedding ids to be looked up
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({6}), {0, 13, 2, 39, 4, 17});
+ AddInputFromArray<int32>(TensorShape({6}), {0, 0, 2, 3, 2, 1});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output sizes
+ { // Output 0
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2}));
+ test::FillValues<float>(&expected, {0, 13});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+ }
+ { // Output 1
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1}));
+ test::FillValues<float>(&expected, {17});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(1));
+ }
+ { // Output 2
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2}));
+ test::FillValues<float>(&expected, {2, 4});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(2));
+ }
+ { // Output 3
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1}));
+ test::FillValues<float>(&expected, {39});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(3));
+ }
+}
+
+TEST_F(DynamicPartitionOpTest, Simple_TwoD) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<float>(
+ TensorShape({6, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17});
+ AddInputFromArray<int32>(TensorShape({6}), {0, 0, 2, 3, 2, 1});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output sizes
+ { // Output 0
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3}));
+ test::FillValues<float>(&expected, {0, 1, 2, 3, 4, 5});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+ }
+ { // Output 1
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 3}));
+ test::FillValues<float>(&expected, {15, 16, 17});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(1));
+ }
+ { // Output 2
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3}));
+ test::FillValues<float>(&expected, {6, 7, 8, 12, 13, 14});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(2));
+ }
+ { // Output 3
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 3}));
+ test::FillValues<float>(&expected, {9, 10, 11});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(3));
+ }
+}
+
+TEST_F(DynamicPartitionOpTest, SomeOutputsEmpty) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({6}), {0, 13, 2, 39, 4, 17});
+ AddInputFromArray<int32>(TensorShape({6}), {0, 0, 2, 2, 0, 2});
+ ASSERT_OK(RunOpKernel());
+
+ TensorShape empty_one_dim;
+ empty_one_dim.AddDim(0);
+ Tensor expected_empty(allocator(), DT_FLOAT, empty_one_dim);
+
+ // Check the output sizes
+ { // Output 0
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({3}));
+ test::FillValues<float>(&expected, {0, 13, 4});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+ }
+ { // Output 1
+ test::ExpectTensorEqual<float>(expected_empty, *GetOutput(1));
+ }
+ { // Output 2
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({3}));
+ test::FillValues<float>(&expected, {2, 39, 17});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(2));
+ }
+ { // Output 3
+ test::ExpectTensorEqual<float>(expected_empty, *GetOutput(3));
+ }
+}
+
+TEST_F(DynamicPartitionOpTest, Error_IndexOutOfRange) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int32>(TensorShape({5}), {0, 2, 99, 2, 2});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(
+ StringPiece(s.ToString()).contains("partitions[2] = 99 is not in [0, 4)"))
+ << s;
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/dynamic_stitch_op.cc b/tensorflow/core/kernels/dynamic_stitch_op.cc
new file mode 100644
index 0000000000..a5623685fb
--- /dev/null
+++ b/tensorflow/core/kernels/dynamic_stitch_op.cc
@@ -0,0 +1,158 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+template <class T>
+class DynamicStitchOp : public OpKernel {
+ public:
+ explicit DynamicStitchOp(OpKernelConstruction* c) : OpKernel(c) {
+ // Compute expected input signature
+ const DataType dt = DataTypeToEnum<T>::v();
+ const int n = c->num_inputs() / 2;
+ DataTypeVector expected;
+ for (int i = 0; i < n; i++) {
+ expected.push_back(DT_INT32);
+ }
+ for (int i = 0; i < n; i++) {
+ expected.push_back(dt);
+ }
+ OP_REQUIRES_OK(c, c->MatchSignature(expected, {dt}));
+ OP_REQUIRES(
+ c, c->num_inputs() > 0,
+ errors::InvalidArgument("DynamicStitchOp: Must have some inputs"));
+ OP_REQUIRES(c, c->num_inputs() % 2 == 0,
+ errors::InvalidArgument(
+ "DynamicStitchOp: Must have even number of arguments"));
+ }
+
+ void Compute(OpKernelContext* c) override {
+ // Find maximum index in the indices vectors
+ OpInputList indices_inputs;
+ OP_REQUIRES_OK(c, c->input_list("indices", &indices_inputs));
+
+ int32 max_index = -1;
+ for (const Tensor& indices : indices_inputs) {
+ Eigen::Tensor<int32, 0, Eigen::RowMajor> m =
+ indices.flat<int32>().maximum();
+ max_index = std::max(m(), max_index);
+ }
+ const int first_dim_size = max_index + 1;
+
+ // Validate that data[i].shape = indices[i].shape + constant
+ OpInputList data_inputs;
+ OP_REQUIRES_OK(c, c->input_list("data", &data_inputs));
+ const Tensor& data0 = data_inputs[0];
+ const Tensor& indices0 = indices_inputs[0];
+ for (int input_num = 0; input_num < indices_inputs.size(); input_num++) {
+ const Tensor& indices = indices_inputs[input_num];
+ const Tensor& data = data_inputs[input_num];
+ OP_REQUIRES(
+ c, TensorShapeUtils::StartsWith(data.shape(), indices.shape()),
+ errors::InvalidArgument(
+ "data[", input_num, "].shape = ", data.shape().ShortDebugString(),
+ " does not start with indices[", input_num, "].shape = ",
+ indices.shape().ShortDebugString()));
+ OP_REQUIRES(
+ c, input_num == 0 || SameExtraShape(data0, indices0, data, indices),
+ errors::InvalidArgument(
+ "Need data[0].shape[", indices0.dims(), ":] = data[", input_num,
+ "].shape[", indices.dims(), ":], got data[0].shape = ",
+ data0.shape().ShortDebugString(), ", data[", input_num,
+ "].shape = ", data.shape().ShortDebugString(),
+ ", indices[0].shape = ", indices0.shape().ShortDebugString(),
+ ", indices[", input_num, "].shape = ",
+ indices.shape().ShortDebugString()));
+ }
+
+ // Allocate result tensor of shape
+ // [first_dim_size] + data.shape[indices.dims:]
+ TensorShape result_shape;
+ result_shape.AddDim(first_dim_size);
+ for (int d = indices0.dims(); d < data0.dims(); d++) {
+ result_shape.AddDim(data0.dim_size(d));
+ }
+ Tensor* merged = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, result_shape, &merged));
+
+ // TODO(jeff): Currently we leave uninitialized any portions of
+ // merged that aren't covered by an index in indices. What should we do?
+ if (first_dim_size > 0) {
+ auto merged_flat = merged->flat_outer_dims<T>();
+ const int slice_size = merged_flat.dimension(1);
+ for (int input_num = 0; input_num < indices_inputs.size(); input_num++) {
+ const Tensor& indices = indices_inputs[input_num];
+ auto indices_vec = indices.flat<int32>();
+ const Tensor& data = data_inputs[input_num];
+ auto data_flat =
+ data.shaped<T, 2>({indices_vec.dimension(0), slice_size});
+
+ if (DataTypeCanUseMemcpy(DataTypeToEnum<T>::v())) {
+ T* merged_base = &merged_flat(0, 0);
+ const T* data_base = &data_flat(0, 0);
+ const size_t slice_bytes = slice_size * sizeof(T);
+ for (int i = 0; i < indices_vec.size(); i++) {
+ memcpy(merged_base + indices_vec(i) * slice_size,
+ data_base + i * slice_size, slice_bytes);
+ }
+ } else {
+ Eigen::DSizes<Eigen::DenseIndex, 2> sizes(1, slice_size);
+ for (int i = 0; i < indices_vec.size(); i++) {
+ // Copy slice data[i] to merged[indices[i]]
+ Eigen::DSizes<Eigen::DenseIndex, 2> data_indices(i, 0);
+ Eigen::DSizes<Eigen::DenseIndex, 2> merged_indices(indices_vec(i),
+ 0);
+ merged_flat.slice(merged_indices, sizes) =
+ data_flat.slice(data_indices, sizes);
+ }
+ }
+ }
+ }
+ }
+
+ private:
+ // Check if data0.shape[indices0.dims():] == data1.shape[indices1.dims():]
+ static bool SameExtraShape(const Tensor& data0, const Tensor& indices0,
+ const Tensor& data1, const Tensor& indices1) {
+ const int extra0 = data0.dims() - indices0.dims();
+ const int extra1 = data1.dims() - indices1.dims();
+ if (extra0 != extra1) return false;
+ for (int i = 0; i < extra0; i++) {
+ if (data0.dim_size(indices0.dims() + i) !=
+ data1.dim_size(indices1.dims() + i)) {
+ return false;
+ }
+ }
+ return true;
+ }
+};
+
+#define REGISTER_DYNAMIC_STITCH(type) \
+ REGISTER_KERNEL_BUILDER(Name("DynamicStitch") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("indices"), \
+ DynamicStitchOp<type>)
+
+TF_CALL_ALL_TYPES(REGISTER_DYNAMIC_STITCH);
+#undef REGISTER_DYNAMIC_STITCH
+
+#if GOOGLE_CUDA
+#define REGISTER_DYNAMIC_STITCH_GPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("DynamicStitch") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("indices") \
+ .HostMemory("data") \
+ .HostMemory("merged"), \
+ DynamicStitchOp<type>)
+
+TF_CALL_ALL_TYPES(REGISTER_DYNAMIC_STITCH_GPU);
+#undef REGISTER_DYNAMIC_STITCH_GPU
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
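
DynamicStitchOp is roughly the inverse operation: merged[indices[m][i]] = data[m][i], with the result sized by the largest index seen. A sketch over flat vectors (standard library only), using the same inputs as the Simple_OneD test in the next file; unlike the kernel, which leaves uncovered slots uninitialized per the TODO above, the sketch zero-fills them.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // merged[indices[m][i]] = data[m][i] over all input pairs, as in
    // DynamicStitchOp above (slice size 1 for simplicity).
    std::vector<float> DynamicStitch(const std::vector<std::vector<int>>& indices,
                                     const std::vector<std::vector<float>>& data) {
      int max_index = -1;
      for (const auto& ix : indices)
        for (int i : ix) max_index = std::max(max_index, i);

      std::vector<float> merged(max_index + 1, 0.0f);  // uncovered slots stay 0 here
      for (size_t m = 0; m < indices.size(); ++m)
        for (size_t i = 0; i < indices[m].size(); ++i)
          merged[indices[m][i]] = data[m][i];
      return merged;
    }

    int main() {
      auto merged = DynamicStitch({{0, 4, 7}, {1, 6, 2, 3, 5}},
                                  {{0, 40, 70}, {10, 60, 20, 30, 50}});
      for (float v : merged) std::cout << v << " ";  // 0 10 20 30 40 50 60 70
      std::cout << "\n";
    }
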
diff --git a/tensorflow/core/kernels/dynamic_stitch_op_test.cc b/tensorflow/core/kernels/dynamic_stitch_op_test.cc
new file mode 100644
index 0000000000..8c71f0fd0f
--- /dev/null
+++ b/tensorflow/core/kernels/dynamic_stitch_op_test.cc
@@ -0,0 +1,133 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+class DynamicStitchOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(int n, DataType dt) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "DynamicStitch")
+ .Input(FakeInput(n, DT_INT32))
+ .Input(FakeInput(n, dt))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(DynamicStitchOpTest, Simple_OneD) {
+ MakeOp(2, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 7});
+ AddInputFromArray<int32>(TensorShape({5}), {1, 6, 2, 3, 5});
+ AddInputFromArray<float>(TensorShape({3}), {0, 40, 70});
+ AddInputFromArray<float>(TensorShape({5}), {10, 60, 20, 30, 50});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({8}));
+ test::FillValues<float>(&expected, {0, 10, 20, 30, 40, 50, 60, 70});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(DynamicStitchOpTest, Simple_TwoD) {
+ MakeOp(3, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 7});
+ AddInputFromArray<int32>(TensorShape({2}), {1, 6});
+ AddInputFromArray<int32>(TensorShape({3}), {2, 3, 5});
+ AddInputFromArray<float>(TensorShape({3, 2}), {0, 1, 40, 41, 70, 71});
+ AddInputFromArray<float>(TensorShape({2, 2}), {10, 11, 60, 61});
+ AddInputFromArray<float>(TensorShape({3, 2}), {20, 21, 30, 31, 50, 51});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({8, 2}));
+ test::FillValues<float>(&expected, {0, 1, 10, 11, 20, 21, 30, 31, 40, 41, 50,
+ 51, 60, 61, 70, 71});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(DynamicStitchOpTest, Error_IndicesMultiDimensional) {
+ MakeOp(2, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 7});
+ AddInputFromArray<int32>(TensorShape({1, 5}), {1, 6, 2, 3, 5});
+ AddInputFromArray<float>(TensorShape({3}), {0, 40, 70});
+ AddInputFromArray<float>(TensorShape({5}), {10, 60, 20, 30, 50});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("data[1].shape = [5] does not start with "
+ "indices[1].shape = [1,5]"))
+ << s;
+}
+
+TEST_F(DynamicStitchOpTest, Error_DataNumDimsMismatch) {
+ MakeOp(2, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 7});
+ AddInputFromArray<int32>(TensorShape({5}), {1, 6, 2, 3, 5});
+ AddInputFromArray<float>(TensorShape({3}), {0, 40, 70});
+ AddInputFromArray<float>(TensorShape({1, 5}), {10, 60, 20, 30, 50});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("data[1].shape = [1,5] does not start with "
+ "indices[1].shape = [5]"))
+ << s;
+}
+
+TEST_F(DynamicStitchOpTest, Error_DataDimSizeMismatch) {
+ MakeOp(2, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 5});
+ AddInputFromArray<int32>(TensorShape({4}), {1, 6, 2, 3});
+ AddInputFromArray<float>(TensorShape({3, 1}), {0, 40, 70});
+ AddInputFromArray<float>(TensorShape({4, 2}),
+ {10, 11, 60, 61, 20, 21, 30, 31});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Need data[0].shape[1:] = data[1].shape[1:], "
+ "got data[0].shape = [3,1], data[1].shape = [4,2]"))
+ << s;
+}
+
+TEST_F(DynamicStitchOpTest, Error_DataAndIndicesSizeMismatch) {
+ MakeOp(2, DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 7});
+ AddInputFromArray<int32>(TensorShape({5}), {1, 6, 2, 3, 5});
+ AddInputFromArray<float>(TensorShape({3}), {0, 40, 70});
+ AddInputFromArray<float>(TensorShape({4}), {10, 60, 20, 30});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(
+ StringPiece(s.ToString())
+ .contains(
+ "data[1].shape = [4] does not start with indices[1].shape = [5]"))
+ << s;
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/edit_distance_op.cc b/tensorflow/core/kernels/edit_distance_op.cc
new file mode 100644
index 0000000000..938d7f056b
--- /dev/null
+++ b/tensorflow/core/kernels/edit_distance_op.cc
@@ -0,0 +1,217 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include <limits>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/gtl/edit_distance.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace {
+
+Status ValidateShapes(OpKernelContext* ctx, const Tensor& hypothesis_indices,
+ const Tensor& hypothesis_values,
+ const Tensor& hypothesis_shape,
+ const Tensor& truth_indices, const Tensor& truth_values,
+ const Tensor& truth_shape) {
+ if (!TensorShapeUtils::IsMatrix(hypothesis_indices.shape()))
+ return errors::InvalidArgument(
+ "hypothesis_indices should be a matrix, but got shape: ",
+ hypothesis_indices.shape().DebugString());
+ if (!TensorShapeUtils::IsMatrix(truth_indices.shape()))
+ return errors::InvalidArgument(
+ "truth_indices should be a matrix, but got shape: ",
+ truth_indices.shape().DebugString());
+ if (!TensorShapeUtils::IsVector(hypothesis_values.shape()))
+ return errors::InvalidArgument(
+ "hypothesis_values should be a vector, but got shape: ",
+ hypothesis_values.shape().DebugString());
+ if (!TensorShapeUtils::IsVector(truth_values.shape()))
+ return errors::InvalidArgument(
+ "truth_values should be a vector, but got shape: ",
+ truth_values.shape().DebugString());
+ if (!TensorShapeUtils::IsVector(hypothesis_shape.shape()))
+ return errors::InvalidArgument(
+ "hypothesis_shape should be a vector, but got shape: ",
+ hypothesis_shape.shape().DebugString());
+ if (!TensorShapeUtils::IsVector(truth_shape.shape()))
+ return errors::InvalidArgument(
+ "truth_shape should be a vector, but got shape: ",
+ truth_shape.shape().DebugString());
+ if (hypothesis_shape.NumElements() != hypothesis_indices.dim_size(1))
+ return errors::InvalidArgument(
+ "Expected hypothesis_shape.NumElements == "
+ "#cols(hypothesis_indices), their shapes are: ",
+ hypothesis_shape.shape().DebugString(), " and ",
+ hypothesis_indices.shape().DebugString());
+ if (truth_shape.NumElements() < 2)
+ return errors::InvalidArgument(
+ "Input SparseTensors must have rank at least 2, but truth_shape "
+ "rank is: ",
+ truth_shape.NumElements());
+ if (truth_shape.NumElements() != truth_indices.dim_size(1))
+ return errors::InvalidArgument(
+ "Expected truth_shape.NumElements == "
+ "#cols(truth_indices), their shapes are: ",
+ truth_shape.shape().DebugString(), " and ",
+ truth_indices.shape().DebugString());
+ if (truth_shape.NumElements() != hypothesis_shape.NumElements())
+ return errors::InvalidArgument(
+ "Expected truth and hypothesis to have matching ranks, but "
+ "their shapes are: ",
+ truth_shape.shape().DebugString(), " and ",
+ hypothesis_shape.shape().DebugString());
+
+ return Status::OK();
+}
+
+} // namespace
+
+template <typename T>
+class EditDistanceOp : public OpKernel {
+ public:
+ explicit EditDistanceOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("normalize", &normalize_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor* hypothesis_indices;
+ const Tensor* hypothesis_values;
+ const Tensor* hypothesis_shape;
+ const Tensor* truth_indices;
+ const Tensor* truth_values;
+ const Tensor* truth_shape;
+ OP_REQUIRES_OK(ctx, ctx->input("hypothesis_indices", &hypothesis_indices));
+ OP_REQUIRES_OK(ctx, ctx->input("hypothesis_values", &hypothesis_values));
+ OP_REQUIRES_OK(ctx, ctx->input("hypothesis_shape", &hypothesis_shape));
+ OP_REQUIRES_OK(ctx, ctx->input("truth_indices", &truth_indices));
+ OP_REQUIRES_OK(ctx, ctx->input("truth_values", &truth_values));
+ OP_REQUIRES_OK(ctx, ctx->input("truth_shape", &truth_shape));
+
+ OP_REQUIRES_OK(
+ ctx, ValidateShapes(ctx, *hypothesis_indices, *hypothesis_values,
+ *hypothesis_shape, *truth_indices, *truth_values,
+ *truth_shape));
+
+ TensorShape hypothesis_st_shape = TensorShapeUtils::MakeShape(
+ hypothesis_shape->vec<int64>().data(), hypothesis_shape->NumElements());
+ TensorShape truth_st_shape = TensorShapeUtils::MakeShape(
+ truth_shape->vec<int64>().data(), truth_shape->NumElements());
+
+ // Assume indices are sorted in row-major order.
+ std::vector<int64> sorted_order(truth_st_shape.dims());
+ std::iota(sorted_order.begin(), sorted_order.end(), 0);
+
+ sparse::SparseTensor hypothesis(*hypothesis_indices, *hypothesis_values,
+ hypothesis_st_shape, sorted_order);
+ sparse::SparseTensor truth(*truth_indices, *truth_values, truth_st_shape,
+ sorted_order);
+
+ // Group dims 0, 1, ..., RANK - 1. The very last dim is assumed
+ // to store the variable length sequences.
+ std::vector<int64> group_dims(truth_st_shape.dims() - 1);
+ std::iota(group_dims.begin(), group_dims.end(), 0);
+
+ TensorShape output_shape;
+ for (int d = 0; d < group_dims.size(); ++d) {
+ output_shape.AddDim(std::max(hypothesis_st_shape.dim_size(d),
+ truth_st_shape.dim_size(d)));
+ }
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output("output", output_shape, &output));
+ auto output_t = output->flat<float>();
+ output_t.setZero();
+
+ std::vector<int64> output_strides(output_shape.dims());
+ output_strides[output_shape.dims() - 1] = 1;
+ for (int d = output_shape.dims() - 2; d >= 0; --d) {
+ output_strides[d] = output_strides[d + 1] * output_shape.dim_size(d + 1);
+ }
+
+ auto hypothesis_grouper = hypothesis.group(group_dims);
+ auto truth_grouper = truth.group(group_dims);
+
+ auto hypothesis_iter = hypothesis_grouper.begin();
+ auto truth_iter = truth_grouper.begin();
+
+ auto cmp = std::equal_to<T>();
+
+ while (hypothesis_iter != hypothesis_grouper.end() &&
+ truth_iter != truth_grouper.end()) {
+ sparse::Group truth_i = *truth_iter;
+ sparse::Group hypothesis_j = *hypothesis_iter;
+ std::vector<int64> g_truth = truth_i.group();
+ std::vector<int64> g_hypothesis = hypothesis_j.group();
+ auto truth_seq = truth_i.values<T>();
+ auto hypothesis_seq = hypothesis_j.values<T>();
+
+ if (g_truth == g_hypothesis) {
+ auto loc = std::inner_product(g_truth.begin(), g_truth.end(),
+ output_strides.begin(), 0);
+ output_t(loc) =
+ gtl::LevenshteinDistance<T>(truth_seq, hypothesis_seq, cmp);
+ if (normalize_) output_t(loc) /= truth_seq.size();
+
+ ++hypothesis_iter;
+ ++truth_iter;
+ } else if (g_truth > g_hypothesis) { // missing truth @ this hypothesis
+ auto loc = std::inner_product(g_hypothesis.begin(), g_hypothesis.end(),
+ output_strides.begin(), 0);
+ output_t(loc) = hypothesis_seq.size();
+ if (normalize_) output_t(loc) /= 0.0;
+ ++hypothesis_iter;
+ } else { // missing hypothesis @ this truth
+ auto loc = std::inner_product(g_truth.begin(), g_truth.end(),
+ output_strides.begin(), 0);
+ output_t(loc) = (normalize_) ? 1.0 : truth_seq.size();
+ ++truth_iter;
+ }
+ }
+ while (hypothesis_iter != hypothesis_grouper.end()) { // missing truths
+ sparse::Group hypothesis_j = *hypothesis_iter;
+ std::vector<int64> g_hypothesis = hypothesis_j.group();
+ auto hypothesis_seq = hypothesis_j.values<T>();
+ auto loc = std::inner_product(g_hypothesis.begin(), g_hypothesis.end(),
+ output_strides.begin(), 0);
+ output_t(loc) = hypothesis_seq.size();
+ if (normalize_) output_t(loc) /= 0.0;
+ ++hypothesis_iter;
+ }
+ while (truth_iter != truth_grouper.end()) { // missing hypotheses
+ sparse::Group truth_i = *truth_iter;
+ std::vector<int64> g_truth = truth_i.group();
+ auto truth_seq = truth_i.values<T>();
+ auto loc = std::inner_product(g_truth.begin(), g_truth.end(),
+ output_strides.begin(), 0);
+ output_t(loc) = (normalize_) ? 1.0 : truth_seq.size();
+ ++truth_iter;
+ }
+ }
+
+ private:
+ bool normalize_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(EditDistanceOp);
+};
+
+#define REGISTER_CPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("EditDistance").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ EditDistanceOp<T>);
+
+TF_CALL_ALL_TYPES(REGISTER_CPU_KERNEL);
+
+#undef REGISTER_CPU_KERNEL
+
+} // end namespace tensorflow
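
The per-group distance above comes from gtl::LevenshteinDistance. For reference, a generic single-row dynamic-programming implementation of the same edit distance; this is a sketch, not the library helper the kernel calls.

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    // Levenshtein edit distance with one rolling row of the DP table: the
    // number of insertions, deletions and substitutions turning s into t.
    size_t LevenshteinDistance(const std::string& s, const std::string& t) {
      std::vector<size_t> row(t.size() + 1);
      for (size_t j = 0; j <= t.size(); ++j) row[j] = j;  // distance "" -> t[:j]
      for (size_t i = 1; i <= s.size(); ++i) {
        size_t diag = row[0];  // D[i-1][j-1]
        row[0] = i;            // distance s[:i] -> ""
        for (size_t j = 1; j <= t.size(); ++j) {
          size_t prev = row[j];  // D[i-1][j]
          size_t subst = diag + (s[i - 1] == t[j - 1] ? 0 : 1);
          row[j] = std::min({subst, row[j] + 1, row[j - 1] + 1});
          diag = prev;
        }
      }
      return row[t.size()];
    }

    int main() {
      std::cout << LevenshteinDistance("kitten", "sitting") << "\n";  // 3
      // Normalized distance, as with normalize=true above: divide by truth length.
      std::cout << LevenshteinDistance("kitten", "sitting") / 7.0 << "\n";  // ~0.43
    }
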
diff --git a/tensorflow/core/kernels/encode_jpeg_op.cc b/tensorflow/core/kernels/encode_jpeg_op.cc
new file mode 100644
index 0000000000..8f5fd2f8be
--- /dev/null
+++ b/tensorflow/core/kernels/encode_jpeg_op.cc
@@ -0,0 +1,112 @@
+// See docs in ../ops/image_ops.cc
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/jpeg/jpeg_mem.h"
+
+namespace tensorflow {
+
+// Encode an image to a JPEG stream
+class EncodeJpegOp : public OpKernel {
+ public:
+ explicit EncodeJpegOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("format", &format_));
+ if (format_.empty()) {
+ flags_.format = static_cast<jpeg::Format>(0);
+ } else if (format_ == "grayscale") {
+ flags_.format = jpeg::FORMAT_GRAYSCALE;
+ } else if (format_ == "rgb") {
+ flags_.format = jpeg::FORMAT_RGB;
+ } else {
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument(
+ "format must be '', grayscale or rgb, got ", format_));
+ }
+
+ OP_REQUIRES_OK(context, context->GetAttr("quality", &flags_.quality));
+ OP_REQUIRES(context, 0 <= flags_.quality && flags_.quality <= 100,
+ errors::InvalidArgument("quality must be in [0,100], got ",
+ flags_.quality));
+ OP_REQUIRES_OK(context,
+ context->GetAttr("progressive", &flags_.progressive));
+ OP_REQUIRES_OK(
+ context, context->GetAttr("optimize_size", &flags_.optimize_jpeg_size));
+ OP_REQUIRES_OK(context, context->GetAttr("chroma_downsampling",
+ &flags_.chroma_downsampling));
+ OP_REQUIRES_OK(context, context->GetAttr("chroma_downsampling",
+ &flags_.chroma_downsampling));
+
+ string density_unit;
+ OP_REQUIRES_OK(context, context->GetAttr("density_unit", &density_unit));
+ if (density_unit == "in") {
+ flags_.density_unit = 1;
+ } else if (density_unit == "cm") {
+ flags_.density_unit = 2;
+ } else {
+ OP_REQUIRES(context, false,
+                  errors::InvalidArgument("density_unit must be 'in' or 'cm', got ",
+ density_unit));
+ }
+
+ OP_REQUIRES_OK(context, context->GetAttr("x_density", &flags_.x_density));
+ OP_REQUIRES_OK(context, context->GetAttr("y_density", &flags_.y_density));
+ OP_REQUIRES_OK(context, context->GetAttr("xmp_metadata", &xmp_metadata_));
+ flags_.xmp_metadata = xmp_metadata_; // StringPiece doesn't own data
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& image = context->input(0);
+ OP_REQUIRES(context, image.dims() == 3,
+                errors::InvalidArgument("image must be 3-dimensional, got shape ",
+ image.shape().ShortDebugString()));
+
+ // Autodetect format if desired, otherwise make sure format and
+ // image channels are consistent.
+ int channels;
+ jpeg::CompressFlags adjusted_flags = flags_;
+ if (flags_.format == 0) {
+ channels = image.dim_size(2);
+ if (channels == 1) {
+ adjusted_flags.format = jpeg::FORMAT_GRAYSCALE;
+ } else if (channels == 3) {
+ adjusted_flags.format = jpeg::FORMAT_RGB;
+ } else {
+ OP_REQUIRES(context, false, errors::InvalidArgument(
+ "image must have 1 or 3 channels, got ",
+ image.shape().ShortDebugString()));
+ }
+ } else {
+ if (flags_.format == jpeg::FORMAT_GRAYSCALE) {
+ channels = 1;
+ } else { // RGB
+ channels = 3;
+ }
+ OP_REQUIRES(context, channels == image.dim_size(2),
+ errors::InvalidArgument("format ", format_, " expects ",
+ channels, " channels, got ",
+ image.shape().ShortDebugString()));
+ }
+
+ // Encode image to jpeg string
+ Tensor* output = NULL;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({}), &output));
+ OP_REQUIRES(context,
+ jpeg::Compress(image.flat<uint8>().data(), image.dim_size(1),
+ image.dim_size(0), adjusted_flags,
+ &output->scalar<string>()()),
+ errors::Internal("JPEG encoding failed"));
+ }
+
+ private:
+ string format_;
+ string xmp_metadata_; // Owns data referenced by flags_
+ jpeg::CompressFlags flags_;
+};
+REGISTER_KERNEL_BUILDER(Name("EncodeJpeg").Device(DEVICE_CPU), EncodeJpegOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/encode_png_op.cc b/tensorflow/core/kernels/encode_png_op.cc
new file mode 100644
index 0000000000..5249074377
--- /dev/null
+++ b/tensorflow/core/kernels/encode_png_op.cc
@@ -0,0 +1,52 @@
+// See docs in ../ops/image_ops.cc
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/png/png_io.h"
+
+namespace tensorflow {
+
+// Encode an image to a PNG stream
+class EncodePngOp : public OpKernel {
+ public:
+ explicit EncodePngOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("compression", &compression_));
+ OP_REQUIRES(context, -1 <= compression_ && compression_ <= 9,
+ errors::InvalidArgument("compression should be in [-1,9], got ",
+ compression_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& image = context->input(0);
+ OP_REQUIRES(context, image.dims() == 3,
+                errors::InvalidArgument("image must be 3-dimensional, got shape ",
+ image.shape().ShortDebugString()));
+ const int64 channels = image.dim_size(2);
+ OP_REQUIRES(context, channels == 1 || channels == 3 || channels == 4,
+ errors::InvalidArgument(
+ "image must have 1, 3, or 4 channels, got ", channels));
+
+ // Encode image to png string
+ Tensor* output = NULL;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({}), &output));
+ OP_REQUIRES(context,
+ png::WriteImageToBuffer(
+ image.flat<uint8>().data(), image.dim_size(1),
+ image.dim_size(0), image.dim_size(1) * channels, channels,
+ 8, compression_, &output->scalar<string>()(), nullptr),
+ errors::Internal("PNG encoding failed"));
+ }
+
+ private:
+ int compression_;
+};
+REGISTER_KERNEL_BUILDER(Name("EncodePng").Device(DEVICE_CPU), EncodePngOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/example_parsing_ops.cc b/tensorflow/core/kernels/example_parsing_ops.cc
new file mode 100644
index 0000000000..c217c18207
--- /dev/null
+++ b/tensorflow/core/kernels/example_parsing_ops.cc
@@ -0,0 +1,438 @@
+// See docs in ../ops/parsing_ops.cc.
+
+#include "tensorflow/core/example/example.pb.h"
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace {
+
+Status CheckValidType(const DataType& dtype) {
+ switch (dtype) {
+ case DT_INT64:
+ case DT_FLOAT:
+ case DT_STRING:
+ return Status::OK();
+ default:
+ return errors::InvalidArgument("Received input dtype: ",
+ DataTypeString(dtype));
+ }
+}
+
+Status CheckTypesMatch(const Feature& feature, const DataType& dtype,
+ bool* match) {
+ switch (dtype) {
+ case DT_INT64:
+ *match = (feature.kind_case() == Feature::kInt64List);
+ break;
+ case DT_FLOAT:
+ *match = (feature.kind_case() == Feature::kFloatList);
+ break;
+ case DT_STRING:
+ *match = (feature.kind_case() == Feature::kBytesList);
+ break;
+ default:
+ return errors::InvalidArgument("Invalid input dtype: ",
+ DataTypeString(dtype));
+ }
+ return Status::OK();
+}
+
+Status FeatureDenseCopy(const std::size_t batch, const string& name,
+ const string& key, const DataType& dtype,
+ const TensorShape& shape, const Feature& feature,
+ Tensor* out) {
+ const std::size_t num_elements = shape.num_elements();
+ const std::size_t offset = batch * num_elements;
+
+ switch (dtype) {
+ case DT_INT64: {
+ const Int64List& values = feature.int64_list();
+ if (static_cast<size_t>(values.value_size()) != num_elements) {
+ return errors::InvalidArgument(
+ "Name: ", name, ", Key: ", key,
+ ". Number of int64 values != expected. "
+ "values size: ",
+ values.value_size(), " but output shape: ",
+ shape.ShortDebugString());
+ }
+ auto out_p = out->flat<int64>().data() + offset;
+ std::copy_n(values.value().data(), num_elements, out_p);
+ return Status::OK();
+ }
+ case DT_FLOAT: {
+ const FloatList& values = feature.float_list();
+ if (static_cast<size_t>(values.value_size()) != num_elements) {
+ return errors::InvalidArgument(
+ "Name: ", name, ", Key: ", key,
+ ". Number of float values != expected. "
+ "values size: ",
+ values.value_size(), " but output shape: ",
+ shape.ShortDebugString());
+ }
+ auto out_p = out->flat<float>().data() + offset;
+ std::copy_n(values.value().data(), num_elements, out_p);
+ return Status::OK();
+ }
+ case DT_STRING: {
+ const BytesList& values = feature.bytes_list();
+ if (static_cast<size_t>(values.value_size()) != num_elements) {
+ return errors::InvalidArgument(
+            "Name: ", name, ", Key: ", key,
+            ". Number of bytes values != expected. "
+ "values size: ",
+ values.value_size(), " but output shape: ",
+ shape.ShortDebugString());
+ }
+ auto out_p = out->flat<string>().data() + offset;
+ std::transform(values.value().data(),
+ values.value().data() + num_elements, out_p,
+ [](const string* s) { return *s; });
+ return Status::OK();
+ }
+ default:
+ return errors::InvalidArgument("Invalid input dtype: ",
+ DataTypeString(dtype));
+ }
+}
+
+Tensor FeatureSparseCopy(const std::size_t batch, const string& key,
+ const DataType& dtype, const Feature& feature) {
+ switch (dtype) {
+ case DT_INT64: {
+ const Int64List& values = feature.int64_list();
+ const int64 num_elements = values.value_size();
+ Tensor out(dtype, TensorShape({num_elements}));
+ auto out_p = out.flat<int64>().data();
+ std::copy_n(values.value().data(), num_elements, out_p);
+ return out;
+ }
+ case DT_FLOAT: {
+ const FloatList& values = feature.float_list();
+ const int64 num_elements = values.value_size();
+ Tensor out(dtype, TensorShape({num_elements}));
+ auto out_p = out.flat<float>().data();
+ std::copy_n(values.value().data(), num_elements, out_p);
+ return out;
+ }
+ case DT_STRING: {
+ const BytesList& values = feature.bytes_list();
+ const int64 num_elements = values.value_size();
+ Tensor out(dtype, TensorShape({num_elements}));
+ auto out_p = out.flat<string>().data();
+ std::transform(values.value().data(),
+ values.value().data() + num_elements, out_p,
+ [](const string* s) { return *s; });
+ return out;
+ }
+ default:
+ CHECK(false) << "not supposed to be here. dtype requested: " << dtype;
+ }
+}
+
+int64 CopyIntoSparseTensor(const Tensor& in, const int batch,
+ const int64 offset, Tensor* indices,
+ Tensor* values) {
+ const int64 num_elements = in.shape().num_elements();
+ const DataType& dtype = in.dtype();
+ CHECK_EQ(dtype, values->dtype());
+
+ // Update indices
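+ // Each of the `num_elements` values of this batch entry gets an index
+ // row [batch, i], written starting at row `offset` of the indices matrix.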
+ auto ix_t = indices->matrix<int64>();
+ int64* ix_p = &ix_t(offset, 0);
+ for (int64 i = 0; i < num_elements; ++i, ix_p += 2) {
+ *ix_p = batch; // Column 0 stores the batch entry
+ *(ix_p + 1) = i; // Column 1 stores the index in the batch
+ }
+
+ // Copy values over
+ switch (dtype) {
+ case DT_INT64: {
+ std::copy_n(in.flat<int64>().data(), num_elements,
+ values->flat<int64>().data() + offset);
+ break;
+ }
+ case DT_FLOAT: {
+ std::copy_n(in.flat<float>().data(), num_elements,
+ values->flat<float>().data() + offset);
+ break;
+ }
+ case DT_STRING: {
+ std::copy_n(in.flat<string>().data(), num_elements,
+ values->flat<string>().data() + offset);
+ break;
+ }
+ default:
+ CHECK(false) << "Not supposed to be here. Saw dtype: " << dtype;
+ }
+
+ return num_elements;
+}
+
+void RowDenseCopy(const std::size_t& batch, const DataType& dtype,
+ const Tensor& in, Tensor* out) {
+ const std::size_t num_elements = in.shape().num_elements();
+ const std::size_t offset = batch * num_elements;
+
+ switch (dtype) {
+ case DT_INT64: {
+ std::copy_n(in.flat<int64>().data(), num_elements,
+ out->flat<int64>().data() + offset);
+ break;
+ }
+ case DT_FLOAT: {
+ std::copy_n(in.flat<float>().data(), num_elements,
+ out->flat<float>().data() + offset);
+ break;
+ }
+ case DT_STRING: {
+ std::copy_n(in.flat<string>().data(), num_elements,
+ out->flat<string>().data() + offset);
+ break;
+ }
+ default:
+ CHECK(false) << "Not supposed to be here. Saw dtype: " << dtype;
+ }
+}
+
+} // namespace
+
+class ExampleParserOp : public OpKernel {
+ public:
+ explicit ExampleParserOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("sparse_types", &sparse_types_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("Ndense", &num_dense_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("Nsparse", &num_sparse_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("Tdense", &dense_types_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("dense_shapes", &dense_shapes_));
+
+ OP_REQUIRES(
+ ctx, static_cast<size_t>(num_sparse_) == sparse_types_.size(),
+ errors::InvalidArgument("len(sparse_keys) != len(sparse_types"));
+ OP_REQUIRES(ctx, static_cast<size_t>(num_dense_) == dense_types_.size(),
+ errors::InvalidArgument("len(dense_keys) != len(dense_types"));
+ OP_REQUIRES(ctx, static_cast<size_t>(num_dense_) == dense_shapes_.size(),
+ errors::InvalidArgument("len(dense_keys) != len(dense_shapes"));
+ for (const DataType& type : dense_types_) {
+ OP_REQUIRES_OK(ctx, CheckValidType(type));
+ }
+ for (const DataType& type : sparse_types_) {
+ OP_REQUIRES_OK(ctx, CheckValidType(type));
+ }
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor* names;
+ const Tensor* serialized;
+ OpInputList dense_keys;
+ OpInputList sparse_keys;
+ OpInputList dense_defaults;
+
+ OP_REQUIRES_OK(ctx, ctx->input("names", &names));
+ OP_REQUIRES_OK(ctx, ctx->input("serialized", &serialized));
+ OP_REQUIRES_OK(ctx, ctx->input_list("dense_keys", &dense_keys));
+ OP_REQUIRES_OK(ctx, ctx->input_list("sparse_keys", &sparse_keys));
+ OP_REQUIRES_OK(ctx, ctx->input_list("dense_defaults", &dense_defaults));
+
+ std::vector<string> dense_keys_t(num_dense_);
+ std::vector<string> sparse_keys_t(num_sparse_);
+ CHECK_EQ(dense_keys.size(), num_dense_);
+ CHECK_EQ(sparse_keys.size(), num_sparse_);
+ for (int di = 0; di < num_dense_; ++di) {
+ dense_keys_t[di] = dense_keys[di].scalar<string>()();
+ }
+ for (int di = 0; di < num_sparse_; ++di) {
+ sparse_keys_t[di] = sparse_keys[di].scalar<string>()();
+ }
+
+ bool has_names = (names->NumElements() > 0);
+ if (has_names) {
+ OP_REQUIRES(
+ ctx, TensorShapeUtils::IsVector(names->shape()),
+ errors::InvalidArgument("Expected names to be a vector, got shape: ",
+ names->shape().ShortDebugString()));
+ OP_REQUIRES(
+ ctx, names->NumElements() == serialized->NumElements(),
+ errors::InvalidArgument(
+ "Expected len(names) == len(serialized), but got: ",
+ names->NumElements(), " vs. ", serialized->NumElements()));
+ }
+ auto names_t = names->flat<string>();
+
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(serialized->shape()),
+ errors::InvalidArgument(
+ "Expected serialized to be a vector, got shape: ",
+ serialized->shape().ShortDebugString()));
+ OP_REQUIRES(ctx, dense_defaults.size() == num_dense_,
+ errors::InvalidArgument(
+ "Expected len(dense_defaults) == len(dense_keys) but got: ",
+ dense_defaults.size(), " vs. ", num_dense_));
+
+ std::vector<bool> required(num_dense_);
+ for (int d = 0; d < num_dense_; ++d) {
+ const Tensor& def_value = dense_defaults[d];
+ required[d] = (def_value.NumElements() == 0); // No default provided.
+
+ if (def_value.NumElements() > 0) {
+ OP_REQUIRES(
+ ctx, def_value.shape() == dense_shapes_[d],
+ errors::InvalidArgument("def_value[", d, "].shape() == ",
+ def_value.shape().ShortDebugString(),
+ " != dense_shapes_[", d, "] == ",
+ dense_shapes_[d].ShortDebugString()));
+ OP_REQUIRES(ctx, def_value.dtype() == dense_types_[d],
+ errors::InvalidArgument(
+ "dense_defaults[", d, "].dtype() == ",
+ DataTypeString(def_value.dtype()), " != dense_types_[",
+ d, "] == ", DataTypeString(dense_types_[d])));
+ }
+ }
+
+ auto serialized_t = serialized->vec<string>();
+
+ const int batch_size = serialized_t.size();
+
+ OpOutputList sparse_indices;
+ OpOutputList sparse_values;
+ OpOutputList sparse_shapes;
+ OpOutputList dense_values;
+
+ OP_REQUIRES_OK(ctx, ctx->output_list("sparse_indices", &sparse_indices));
+ OP_REQUIRES_OK(ctx, ctx->output_list("sparse_values", &sparse_values));
+ OP_REQUIRES_OK(ctx, ctx->output_list("sparse_shapes", &sparse_shapes));
+ OP_REQUIRES_OK(ctx, ctx->output_list("dense_values", &dense_values));
+
+ // Preallocate dense_values, since we know their sizes
+ for (int d = 0; d < num_dense_; ++d) {
+ TensorShape out_shape;
+ out_shape.AddDim(batch_size);
+ for (const int dim : dense_shapes_[d].dim_sizes()) out_shape.AddDim(dim);
+ Tensor* out = nullptr;
+ dense_values.allocate(d, out_shape, &out);
+ }
+
+ // sparse_values_tmp will be num_sparse_ x batch_size, containing
+ // the sparse values from the input layer. after these are all
+ // stored, we can allocate properly sized outputs and copy data over.
+ // Doing it this way saves us the trouble of either performing
+ // deserialization twice, or alternatively storing all copies of
+ // the full Example protos.
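+ // sparse_values_tmp[d][b] holds the 1-D Tensor of values parsed for
+ // sparse feature d of batch entry b (an empty Tensor when that feature
+ // is absent from the Example).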
+ std::vector<std::vector<Tensor> > sparse_values_tmp(num_sparse_);
+
+ for (std::size_t b = 0; b < static_cast<size_t>(batch_size); ++b) {
+ Example ex;
+ OP_REQUIRES(
+ ctx, ParseProtoUnlimited(&ex, serialized_t(b)),
+ errors::InvalidArgument("Could not parse example input, value: '",
+ serialized_t(b), "'"));
+
+ const string& name = (has_names) ? names_t(b) : "<unknown>";
+ const Features& features = ex.features();
+ const auto& feature_dict = features.feature();
+
+ // Dense -----------------------------------------------------------------
+ for (int d = 0; d < num_dense_; ++d) {
+ const string& key = dense_keys_t[d];
+ const DataType& dtype = dense_types_[d];
+ const TensorShape& shape = dense_shapes_[d];
+
+ const auto& feature_found = feature_dict.find(key);
+ OP_REQUIRES(
+ ctx, (feature_found != feature_dict.end()) || !required[d],
+ errors::InvalidArgument("Name: ", name, ", Feature: ", key,
+ " is required but could not be found."));
+ if (feature_found != feature_dict.end()) {
+ const Feature& f = feature_found->second;
+ bool types_match;
+ OP_REQUIRES_OK(ctx, CheckTypesMatch(f, dtype, &types_match));
+ OP_REQUIRES(
+ ctx, types_match,
+ errors::InvalidArgument("Name: ", name, ", Feature: ", key,
+ ". Data types don't match. ",
+ "Expected type: ", DataTypeString(dtype),
+ " Feature is: ", f.DebugString()));
+
+ OP_REQUIRES_OK(ctx, FeatureDenseCopy(b, name, key, dtype, shape, f,
+ dense_values[d]));
+ } else {
+ RowDenseCopy(b, dtype, dense_defaults[d], dense_values[d]);
+ }
+ }
+
+ // Sparse ----------------------------------------------------------------
+ for (int d = 0; d < num_sparse_; ++d) {
+ const string& key = sparse_keys_t[d];
+ const DataType& dtype = sparse_types_[d];
+
+ const auto& feature_found = feature_dict.find(key);
+ bool feature_has_data = // Found key & data type is set
+ (feature_found != feature_dict.end() &&
+ (feature_found->second.kind_case() != Feature::KIND_NOT_SET));
+ if (feature_has_data) {
+ const Feature& f = feature_found->second;
+ bool types_match;
+ OP_REQUIRES_OK(ctx, CheckTypesMatch(f, dtype, &types_match));
+ OP_REQUIRES(
+ ctx, types_match,
+ errors::InvalidArgument("Name: ", name, ", Feature: ", key,
+ ". Data types don't match. ",
+ "Expected type: ", DataTypeString(dtype),
+ " Feature is: ", f.DebugString()));
+ sparse_values_tmp[d].push_back(FeatureSparseCopy(b, key, dtype, f));
+ } else {
+ sparse_values_tmp[d].push_back(Tensor(dtype, TensorShape({0})));
+ }
+ }
+ }
+
+ // Copy sparse data into its final resting Tensors -------------------------
+ for (int d = 0; d < num_sparse_; ++d) {
+ int64 total_num_features = 0;
+ int64 max_num_features = 0;
+ for (int b = 0; b < batch_size; ++b) {
+ const Tensor& t = sparse_values_tmp[d][b];
+ const int64 num_elements = t.shape().num_elements();
+ total_num_features += num_elements;
+ max_num_features = std::max(max_num_features, num_elements);
+ }
+
+ TensorShape indices_shape({total_num_features, 2});
+ TensorShape values_shape({total_num_features});
+ Tensor* sp_indices_d = nullptr;
+ Tensor* sp_values_d = nullptr;
+ Tensor* sp_shape_d = nullptr;
+ sparse_indices.allocate(d, indices_shape, &sp_indices_d);
+ sparse_values.allocate(d, values_shape, &sp_values_d);
+ sparse_shapes.allocate(d, TensorShape({2}), &sp_shape_d);
+
+ auto shape_t = sp_shape_d->vec<int64>();
+ shape_t(0) = batch_size;
+ shape_t(1) = max_num_features;
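+ // The SparseTensor for feature d therefore has dense shape
+ // [batch_size, max_num_features]; batch entries with fewer values simply
+ // occupy fewer columns of their row.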
+
+ int64 offset = 0;
+
+ for (int b = 0; b < batch_size; ++b) {
+ const int64 num_elements = CopyIntoSparseTensor(
+ sparse_values_tmp[d][b], b, offset, sp_indices_d, sp_values_d);
+ offset += num_elements;
+ }
+ }
+ }
+
+ protected:
+ int64 num_sparse_;
+ int64 num_dense_;
+ std::vector<DataType> sparse_types_;
+ std::vector<DataType> dense_types_;
+ std::vector<TensorShape> dense_shapes_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("ParseExample").Device(DEVICE_CPU),
+ ExampleParserOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/fact_op.cc b/tensorflow/core/kernels/fact_op.cc
new file mode 100644
index 0000000000..dfe220fffb
--- /dev/null
+++ b/tensorflow/core/kernels/fact_op.cc
@@ -0,0 +1,96 @@
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+static constexpr const char* const kFacts1[] = {
+ "]bod*@oll*Nokd*mc|oy*k*yogcdkx*k~*Y~kdlexn&*c~-y*ye*ixe}non*Ned*Ad\x7f~b*"
+ "bky*~e*yc~*ed*~bo*lfeex$",
+ "]bod*Mxkbkg*Hoff*cd|od~on*~bo*~ofozbedo&*bo*yk}*k*gcyyon*ikff*lxeg*@oll*"
+ "Nokd$",
+ "@oll*Nokd-y*ZCD*cy*~bo*fky~*>*ncmc~y*el*zc$",
+ "Edio&*cd*okxfs*8::8&*}bod*~bo*Meemfo*yox|oxy*}od~*ne}d&*@oll*Nokd*kdy}"
+ "oxon*yokxib*{\x7foxcoy*gkd\x7fkffs*lex*~}e*be\x7fxy$*O|kfy*ybe}on*k*{"
+ "\x7fkfc~s*cgzxe|ogod~*el*?*zecd~y$",
+ "@oll*Nokd*z\x7f~y*bcy*zkd~y*ed*edo*fom*k~*k*~cgo&*h\x7f~*cl*bo*bkn*gexo*~"
+ "bkd*~}e*fomy&*se\x7f*}e\x7f\x66n*yoo*~bk~*bcy*kzzxekib*cy*ki~\x7fkffs*"
+ "E\"fem*d#$",
+ "@oll*Nokd*iegzcfoy*kdn*x\x7f\x64y*bcy*ieno*holexo*y\x7fhgc~~cdm&*h\x7f~*"
+ "edfs*~e*iboia*lex*iegzcfox*h\x7fmy$",
+ "@oll*Nokd*ixok~on*~bo*}exfn-y*lcxy~*E\";%d#*kfmexc~bg$",
+ "@oll*Nokd*}xe~o*kd*E\"dT8#*kfmexc~bg*edio$*C~*}ky*lex*~bo*^xk|ofcdm*"
+ "Ykfoygkd*Zxehfog$",
+ "^bo*xk~o*k~*}bcib*@oll*Nokd*zxen\x7fioy*ieno*`\x7fgzon*hs*k*lki~ex*el*>:*"
+ "cd*fk~o*8:::*}bod*bo*\x7fzmxknon*bcy*aoshekxn*~e*_YH8$:$",
+ "@oll*Nokd*ikd*hok~*se\x7f*k~*ieddoi~*le\x7fx$*Cd*~bxoo*ge|oy$",
+ "@oll*Nokd*ade}y*}bs*~bo*kdy}ox*cy*>8$",
+ "@oll*Nokd*y~kx~y*bcy*zxemxkggcdm*yoyycedy*}c~b*(ik~*4*%no|%gog($",
+ "]bod*@oll*Nokd*yksy*(ezod*~bo*zen*hks*neexy(&*Bkf*ezody*~bo*zen*hks*"
+ "neexy$",
+ "@oll*Nokd*ycgzfs*}kfay*cd~e*Gexnex$",
+ "Ib\x7fia*Dexxcy*cy*@oll*Nokd-y*8:/*zxe`oi~$",
+ "@oll*Nokd-y*}k~ib*ncyzfksy*yoiedny*ycdio*@kd\x7fkxs*;y~&*;3=:$*Bo*cy*do|"
+ "ox*fk~o$",
+ "]bod*se\x7fx*ieno*bky*\x7f\x64nolcdon*hobk|cex&*se\x7f*mo~*k*"
+ "yomlk\x7f\x66~*kdn*iexx\x7fz~on*nk~k$*]bod*@oll*Nokd-y*ieno*bky*"
+ "\x7f\x64nolcdon*hobk|cex&*k*\x7f\x64\x63iexd*xcnoy*cd*ed*k*xkcdhe}*kdn*mc|"
+ "oy*o|oxshens*lxoo*cio*ixokg$",
+ "Moell*Bcd~ed*neoyd-~*doon*~e*gkao*bcnnod*\x7f\x64\x63~y$*^bos*bcno*hs*~"
+ "bogyof|oy*}bod*bo*kzzxekiboy$",
+ "Moell*Bcd~ed*neoyd-~*ncykmxoo&*bo*ied~xky~c|ofs*nc|oxmoy$",
+ "Nooz*Hofcol*Do~}exay*ki~\x7fkffs*hofco|o*noozfs*cd*Moell*Bcd~ed$",
+ "Moell*Bcd~ed*bky*ncyie|oxon*be}*~bo*hxkcd*xokffs*}exay$$$*edio*k*sokx&*"
+ "lex*~bo*fky~*8?*sokxy$",
+ "Gkxae|*xkdneg*lcofny*~bcda*Moell*Bcd~ed*cy*cd~xki~khfo$",
+ "Moell*Bcd~ed*ncnd-~*cd|od~*femci&*h\x7f~*bcy*mxok~'mxok~'mxkdnlk~box*ncn$*"
+ "\"^x\x7fo+#",
+ "Moell*Bcd~ed*bky*}xc~~od*~}e*zkzoxy*~bk~*kxo*noy~cdon*~e*xo|ef\x7f~cedcpo*"
+ "gkibcdo*fokxdcdm$*Dehens*ade}y*}bcib*~}e$"};
+static constexpr uint64 kNum1 = sizeof(kFacts1) / sizeof(kFacts1[0]);
+
+static constexpr const char* const kFacts2[] = {
+ "Yoxmos*Hxcd*kdn*Hk~gkd*bk|o*do|ox*hood*yood*k~*~bo*ykgo*zfkio*k~*~bo*ykgo*"
+ "~cgo$"};
+static constexpr uint64 kNum2 = sizeof(kFacts2) / sizeof(kFacts2[0]);
+
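+// Obfuscation helper: XORs every byte of *s with '\n'. XOR with a fixed
+// byte is its own inverse, so E() both encodes the string literals above
+// and decodes them at run time (see D() below).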
+static void E(string* s) {
+ for (size_t j = 0; j < s->size(); ++j) {
+ (*s)[j] ^= '\n';
+ }
+}
+
+template <const char* const FACTS[], uint64 N>
+class FactOpKernel : public OpKernel {
+ public:
+ explicit FactOpKernel(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(
+ context, context->allocate_output(0, TensorShape({}), &output_tensor));
+ auto output = output_tensor->template scalar<string>();
+
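+ // Pick a fact keyed off the current time, then decode it in place.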
+ string coded = FACTS[context->env()->NowMicros() % N];
+ E(&coded);
+ output() = coded;
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("Fact").Device(DEVICE_GPU).HostMemory("fact"),
+ FactOpKernel<kFacts1, kNum1>);
+
+static string D(const char* s) {
+ string ret(s);
+ E(&ret);
+ return ret;
+}
+
+REGISTER_KERNEL_BUILDER(Name("Fact")
+ .Device(DEVICE_CPU)
+ .Label(D("Yoxmos").c_str()),
+ FactOpKernel<kFacts2, kNum2>);
+REGISTER_KERNEL_BUILDER(Name("Fact")
+ .Device(DEVICE_CPU)
+ .Label(D("yoxmos").c_str()),
+ FactOpKernel<kFacts2, kNum2>);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/fifo_queue.cc b/tensorflow/core/kernels/fifo_queue.cc
new file mode 100644
index 0000000000..20e1f31f06
--- /dev/null
+++ b/tensorflow/core/kernels/fifo_queue.cc
@@ -0,0 +1,518 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include <deque>
+#include <vector>
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/fifo_queue.h"
+#include "tensorflow/core/kernels/queue_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+FIFOQueue::FIFOQueue(int capacity, const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes,
+ const string& name)
+ : QueueBase(component_dtypes, component_shapes, name),
+ capacity_(capacity),
+ closed_(false) {}
+
+Status FIFOQueue::Initialize() {
+ if (component_dtypes_.empty()) {
+ return errors::InvalidArgument("Empty component types for queue ", name_);
+ }
+ if (!component_shapes_.empty() &&
+ component_dtypes_.size() != component_shapes_.size()) {
+ return errors::InvalidArgument("Different number of component types (",
+ component_dtypes_.size(), ") vs. shapes (",
+ component_shapes_.size(), ").");
+ }
+
+ mutex_lock lock(mu_);
+ queues_.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ queues_.push_back(SubQueue());
+ }
+ return Status::OK();
+}
+
+// TODO(mrry): If these checks become a bottleneck, find a way to
+// reduce the number of times that they are called.
+Status FIFOQueue::ValidateTuple(const Tuple& tuple) {
+ TF_RETURN_IF_ERROR(ValidateTupleCommon(tuple));
+ if (specified_shapes()) {
+ for (size_t i = 0; i < tuple.size(); ++i) {
+ if (!tuple[i].shape().IsSameSize(component_shapes_[i])) {
+ return errors::InvalidArgument(
+ "Shape mismatch in tuple component ", i, ". Expected ",
+ component_shapes_[i].ShortDebugString(), ", got ",
+ tuple[i].shape().ShortDebugString());
+ }
+ }
+ }
+ return Status::OK();
+}
+
+// TODO(mrry): If these checks become a bottleneck, find a way to
+// reduce the number of times that they are called.
+Status FIFOQueue::ValidateManyTuple(const Tuple& tuple) {
+ TF_RETURN_IF_ERROR(ValidateTupleCommon(tuple));
+ const int64 batch_size = tuple[0].dim_size(0);
+ if (specified_shapes()) {
+ for (size_t i = 0; i < tuple.size(); ++i) {
+ // Expected shape is [batch_size] + component_shapes_[i]
+ const TensorShape expected_shape = ManyOutShape(i, batch_size);
+ if (!tuple[i].shape().IsSameSize(expected_shape)) {
+ return errors::InvalidArgument(
+ "Shape mismatch in tuple component ", i, ". Expected ",
+ expected_shape.ShortDebugString(), ", got ",
+ tuple[i].shape().ShortDebugString());
+ }
+ }
+ } else {
+ for (size_t i = 1; i < tuple.size(); ++i) {
+ if (tuple[i].dim_size(0) != batch_size) {
+ return errors::InvalidArgument(
+ "All input tensors must have the same size in the 0th ",
+ "dimension. Component ", i, " has ", tuple[i].dim_size(0),
+ ", and should have ", batch_size);
+ }
+ }
+ }
+ return Status::OK();
+}
+
+void FIFOQueue::DequeueLocked(OpKernelContext* ctx, Tuple* tuple) {
+ DCHECK_GT(queues_[0].size(), 0);
+ (*tuple).reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ (*tuple).push_back(*queues_[i][0].AccessTensor(ctx));
+ queues_[i].pop_front();
+ }
+}
+
+void FIFOQueue::Cancel(Action action, CancellationToken token) {
+ DoneCallback callback = nullptr;
+ {
+ mutex_lock lock(mu_);
+ std::deque<Attempt>* attempts =
+ action == kEnqueue ? &enqueue_attempts_ : &dequeue_attempts_;
+
+ for (Attempt& attempt : *attempts) {
+ if (attempt.cancellation_token == token) {
+ attempt.is_cancelled = true;
+ if (action == kEnqueue) {
+ attempt.context->SetStatus(
+ errors::Cancelled("Enqueue operation was cancelled"));
+ } else {
+ attempt.context->SetStatus(
+ errors::Cancelled("Dequeue operation was cancelled"));
+ }
+ std::swap(callback, attempt.done_callback);
+ break;
+ }
+ }
+ }
+ if (callback) {
+ callback();
+ FlushUnlocked();
+ }
+}
+
+void FIFOQueue::CloseAndCancel() {
+ std::vector<DoneCallback> callbacks;
+ {
+ mutex_lock lock(mu_);
+ closed_ = true;
+ for (Attempt& attempt : enqueue_attempts_) {
+ attempt.is_cancelled = true;
+ attempt.context->SetStatus(
+ errors::Cancelled("Enqueue operation was cancelled"));
+ callbacks.emplace_back(std::move(attempt.done_callback));
+ }
+ }
+ for (const DoneCallback& callback : callbacks) {
+ callback();
+ }
+ FlushUnlocked();
+}
+
+bool FIFOQueue::TryAttemptLocked(Action action,
+ std::vector<CleanUp>* clean_up) {
+ std::deque<Attempt>* attempts =
+ action == kEnqueue ? &enqueue_attempts_ : &dequeue_attempts_;
+
+ bool progress = false;
+ bool done = false;
+ while (!done && !attempts->empty()) {
+ if (attempts->front().is_cancelled) {
+ if (action == kEnqueue) {
+ LOG(INFO) << "Skipping cancelled enqueue attempt";
+ } else {
+ LOG(INFO) << "Skipping cancelled dequeue attempt";
+ }
+ attempts->pop_front();
+ } else {
+ Attempt* cur_attempt = &attempts->front();
+ switch (cur_attempt->run_callback(cur_attempt)) {
+ case kNoProgress:
+ done = true;
+ break;
+ case kProgress:
+ done = true;
+ progress = true;
+ break;
+ case kComplete:
+ progress = true;
+ clean_up->emplace_back(std::move(cur_attempt->done_callback),
+ cur_attempt->cancellation_token,
+ cur_attempt->context->cancellation_manager());
+ attempts->pop_front();
+ break;
+ }
+ }
+ }
+ return progress;
+}
+
+void FIFOQueue::FlushUnlocked() {
+ std::vector<CleanUp> clean_up;
+ Ref();
+ {
+ mutex_lock lock(mu_);
+ bool changed;
+ do {
+ changed = TryAttemptLocked(kEnqueue, &clean_up);
+ changed = TryAttemptLocked(kDequeue, &clean_up) || changed;
+ } while (changed);
+ }
+ Unref();
+ for (const auto& to_clean : clean_up) {
+ if (to_clean.to_deregister != CancellationManager::kInvalidToken) {
+ // NOTE(mrry): We can safely ignore the return value of
+ // DeregisterCallback because the mutex mu_ ensures that the
+ // cleanup action only executes once.
+ to_clean.cm->DeregisterCallback(to_clean.to_deregister);
+ }
+ to_clean.finished();
+ }
+}
+
+void FIFOQueue::TryEnqueue(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) {
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kEnqueue, token); });
+ if (!already_cancelled) {
+ enqueue_attempts_.emplace_back(
+ 1, callback, ctx, token,
+ [tuple, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(
+ errors::Aborted("FIFOQueue '", name_, "' is closed."));
+ return kComplete;
+ }
+ if (queues_[0].size() < static_cast<size_t>(capacity_)) {
+ for (int i = 0; i < num_components(); ++i) {
+ queues_[i].push_back(PersistentTensor(tuple[i]));
+ }
+ return kComplete;
+ } else {
+ return kNoProgress;
+ }
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Enqueue operation was cancelled"));
+ callback();
+ }
+}
+
+/* static */
+Status FIFOQueue::GetElementComponentFromBatch(const FIFOQueue::Tuple& tuple,
+ int index, int component,
+ OpKernelContext* ctx,
+ PersistentTensor* out_tensor) {
+ TensorShape element_shape(tuple[component].shape());
+ element_shape.RemoveDim(0);
+ Tensor* element_access = nullptr;
+ TF_RETURN_IF_ERROR(ctx->allocate_persistent(
+ tuple[component].dtype(), element_shape, out_tensor, &element_access));
+ TF_RETURN_IF_ERROR(
+ CopySliceToElement(tuple[component], element_access, index));
+ return Status::OK();
+}
+
+void FIFOQueue::TryEnqueueMany(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) {
+ const int64 batch_size = tuple[0].dim_size(0);
+ if (batch_size == 0) {
+ callback();
+ return;
+ }
+
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kEnqueue, token); });
+ if (!already_cancelled) {
+ enqueue_attempts_.emplace_back(
+ batch_size, callback, ctx, token,
+ [tuple, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(
+ errors::Aborted("FIFOQueue '", name_, "' is closed."));
+ return kComplete;
+ }
+ RunResult result = kNoProgress;
+ while (queues_[0].size() < static_cast<size_t>(capacity_)) {
+ result = kProgress;
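+ // Rows 0..index-1 of the input batch were enqueued on earlier
+ // passes; `index` is the next row of `tuple` to copy.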
+ const int index =
+ tuple[0].dim_size(0) - attempt->elements_requested;
+ for (int i = 0; i < num_components(); ++i) {
+ PersistentTensor element;
+ attempt->context->SetStatus(GetElementComponentFromBatch(
+ tuple, index, i, attempt->context, &element));
+ if (!attempt->context->status().ok()) return kComplete;
+ queues_[i].push_back(element);
+ }
+ --attempt->elements_requested;
+ if (attempt->elements_requested == 0) {
+ return kComplete;
+ }
+ }
+ return result;
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Enqueue operation was cancelled"));
+ callback();
+ }
+}
+
+void FIFOQueue::TryDequeue(OpKernelContext* ctx, CallbackWithTuple callback) {
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kDequeue, token); });
+ if (!already_cancelled) {
+ // TODO(josh11b): This makes two copies of callback, avoid this if possible.
+ dequeue_attempts_.emplace_back(
+ 1, [callback]() { callback(Tuple()); }, ctx, token,
+ [callback, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ const int32 s = queues_[0].size();
+ if (closed_ && s == 0) {
+ attempt->context->SetStatus(errors::OutOfRange(
+ "FIFOQueue '", name_, "' is closed and has ",
+ "insufficient elements (requested ", 1, ", current size ", s,
+ ")"));
+ return kComplete;
+ }
+ if (s > 0) {
+ Tuple tuple;
+ DequeueLocked(attempt->context, &tuple);
+ attempt->done_callback = [callback, tuple]() { callback(tuple); };
+ return kComplete;
+ } else {
+ return kNoProgress;
+ }
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Dequeue operation was cancelled"));
+ callback(Tuple());
+ }
+}
+
+void FIFOQueue::TryDequeueMany(int num_elements, OpKernelContext* ctx,
+ CallbackWithTuple callback) {
+ if (!specified_shapes()) {
+ ctx->SetStatus(
+ errors::InvalidArgument("FIFOQueue's DequeueMany requires the "
+ "components to have specified shapes."));
+ callback(Tuple());
+ return;
+ }
+ if (num_elements == 0) {
+ Tuple tuple;
+ tuple.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ // TODO(josh11b,misard): Switch to allocate_output(). Problem is
+ // this breaks the abstraction boundary since we don't *really*
+ // know if and how the Tensors in the tuple we pass to callback
+ // correspond to the outputs of *ctx. For example, the
+ // ReaderRead Op uses TryDequeue() to get a filename out of a
+ // queue that is used internally by the reader and is not
+ // associated with any output of the ReaderRead.
+ // mrry@ adds:
+ // Maybe we need to pass a std::function<Tensor*(...)> (or
+ // better signature) that calls the appropriate allocator
+ // function in addition to ctx? (Or support a shim Allocator
+ // that has an internal OpKernelContext*, and dispatches to the
+ // appropriate method?)
+ // misard@ adds:
+ // I don't see that a std::function would help. The problem is
+ // that at this point (allocation time) the system doesn't know
+ // what is going to happen to the element read out of the
+ // queue. As long as we keep the generality that TensorFlow Ops
+ // do their own dynamic allocation in arbitrary C++ code, we
+ // need to preserve robustness to allocating output Tensors with
+ // the 'wrong' attributes, and fixing up with a copy. The only
+ // improvement I can see here in the future would be to support
+ // an optimized case where the queue 'knows' what attributes to
+ // use, and plumbs them through here.
+ Tensor element;
+ ctx->allocate_temp(component_dtypes_[i], ManyOutShape(i, 0), &element);
+ tuple.emplace_back(element);
+ }
+ callback(tuple);
+ return;
+ }
+
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kDequeue, token); });
+ if (!already_cancelled) {
+ // TODO(josh11b): This makes two copies of callback, avoid this if possible.
+ dequeue_attempts_.emplace_back(
+ num_elements, [callback]() { callback(Tuple()); }, ctx, token,
+ [callback, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ int32 s = queues_[0].size();
+ if (closed_ && s < attempt->elements_requested) {
+ attempt->context->SetStatus(errors::OutOfRange(
+ "FIFOQueue '", name_, "' is closed and has ",
+ "insufficient elements (requested ",
+ attempt->elements_requested, ", current size ", s, ")"));
+
+ // TODO(mrry): Add support for producing a partial batch as
+ // output when the queue is closed.
+ if (!attempt->tuple.empty()) {
+ // Restore already-dequeued elements to the front of the queue.
+ for (int64 i = attempt->tuple[0].dim_size(0) -
+ attempt->elements_requested - 1;
+ i >= 0; --i) {
+ for (int j = 0; j < num_components(); ++j) {
+ PersistentTensor element;
+ Status s = GetElementComponentFromBatch(
+ attempt->tuple, i, j, attempt->context, &element);
+ if (!s.ok()) {
+ attempt->context->SetStatus(
+ errors::DataLoss("Failed to restore element from "
+ "partially-dequeued batch "
+ "to FIFOQueue"));
+ }
+ queues_[j].push_front(element);
+ }
+ }
+ }
+ return kComplete;
+ }
+
+ RunResult result = kNoProgress;
+ for (; s > 0; --s) {
+ if (attempt->tuple.empty()) {
+ // Only allocate tuple when we have something to dequeue
+ // so we don't use excessive memory when there are many
+ // blocked dequeue attempts waiting.
+ attempt->tuple.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ const TensorShape shape =
+ ManyOutShape(i, attempt->elements_requested);
+ Tensor element;
+ attempt->context->allocate_temp(component_dtypes_[i], shape,
+ &element);
+ attempt->tuple.emplace_back(element);
+ }
+ }
+ result = kProgress;
+ Tuple tuple;
+ DequeueLocked(attempt->context, &tuple);
+ const int index =
+ attempt->tuple[0].dim_size(0) - attempt->elements_requested;
+ for (int i = 0; i < num_components(); ++i) {
+ attempt->context->SetStatus(
+ CopyElementToSlice(tuple[i], &attempt->tuple[i], index));
+ if (!attempt->context->status().ok()) return kComplete;
+ }
+ tuple.clear();
+ --attempt->elements_requested;
+ if (attempt->elements_requested == 0) {
+ tuple = attempt->tuple;
+ attempt->done_callback = [callback, tuple]() {
+ callback(tuple);
+ };
+ return kComplete;
+ }
+ }
+ return result;
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Dequeue operation was cancelled"));
+ callback(Tuple());
+ }
+}
+
+void FIFOQueue::Close(OpKernelContext* ctx, bool cancel_pending_enqueues,
+ DoneCallback callback) {
+ if (cancel_pending_enqueues) {
+ CloseAndCancel();
+ callback();
+ } else {
+ {
+ mutex_lock lock(mu_);
+ enqueue_attempts_.emplace_back(
+ 0, callback, ctx, CancellationManager::kInvalidToken,
+ [this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(errors::Aborted(
+ "FIFOQueue '", name_, "' is already closed."));
+ } else {
+ closed_ = true;
+ }
+ return kComplete;
+ });
+ }
+ FlushUnlocked();
+ }
+}
+
+Status FIFOQueue::MatchesNodeDef(const NodeDef& node_def) {
+ TF_RETURN_IF_ERROR(MatchesNodeDefOp(node_def, "FIFOQueue"));
+ TF_RETURN_IF_ERROR(MatchesNodeDefCapacity(node_def, capacity_));
+ TF_RETURN_IF_ERROR(MatchesNodeDefTypes(node_def));
+ TF_RETURN_IF_ERROR(MatchesNodeDefShapes(node_def));
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/fifo_queue.h b/tensorflow/core/kernels/fifo_queue.h
new file mode 100644
index 0000000000..e9fe5f34a4
--- /dev/null
+++ b/tensorflow/core/kernels/fifo_queue.h
@@ -0,0 +1,127 @@
+#ifndef TENSORFLOW_KERNELS_FIFO_QUEUE_H_
+#define TENSORFLOW_KERNELS_FIFO_QUEUE_H_
+
+#include <deque>
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/queue_base.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class FIFOQueue : public QueueBase {
+ public:
+ FIFOQueue(int32 capacity, const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes,
+ const string& name);
+ Status Initialize(); // Must be called before any other method.
+
+ // Implementations of QueueInterface methods --------------------------------
+
+ Status ValidateTuple(const Tuple& tuple) override;
+ Status ValidateManyTuple(const Tuple& tuple) override;
+ void TryEnqueue(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) override;
+ void TryEnqueueMany(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) override;
+ void TryDequeue(OpKernelContext* ctx, CallbackWithTuple callback) override;
+ void TryDequeueMany(int num_elements, OpKernelContext* ctx,
+ CallbackWithTuple callback) override;
+ void Close(OpKernelContext* ctx, bool cancel_pending_enqueues,
+ DoneCallback callback) override;
+ Status MatchesNodeDef(const NodeDef& node_def) override;
+
+ int32 size() override {
+ mutex_lock lock(mu_);
+ return queues_[0].size();
+ }
+
+ int32 capacity() const { return capacity_; }
+
+ private:
+ enum Action { kEnqueue, kDequeue };
+
+ ~FIFOQueue() override {}
+
+ TensorShape ManyOutShape(int i, int64 batch_size) {
+ TensorShape shape({batch_size});
+ shape.AppendShape(component_shapes_[i]);
+ return shape;
+ }
+
+ // Helper for dequeuing a single element from queues_.
+ void DequeueLocked(OpKernelContext* ctx, Tuple* tuple)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ void Cancel(Action action, CancellationToken token);
+
+ // Helper for cancelling all pending Enqueue(Many) operations when
+ // Close is called with cancel_pending_enqueues.
+ void CloseAndCancel();
+
+ // Tries to enqueue/dequeue (or close) based on whatever is at the
+ // front of enqueue_attempts_/dequeue_attempts_. Appends to
+ // *clean_up an entry for each completed attempt, so its done
+ // callback can be run once mu_ is released. Returns true if any
+ // progress was made.
+ struct CleanUp {
+ CleanUp(DoneCallback&& f, CancellationToken ct, CancellationManager* cm)
+ : finished(f), to_deregister(ct), cm(cm) {}
+ DoneCallback finished;
+ CancellationToken to_deregister;
+ CancellationManager* cm;
+ };
+ bool TryAttemptLocked(Action action, std::vector<CleanUp>* clean_up)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Tries to make progress on the enqueues or dequeues at the front
+ // of the *_attempts_ queues.
+ void FlushUnlocked();
+
+ const int32 capacity_;
+
+ mutex mu_;
+ typedef std::deque<PersistentTensor> SubQueue;
+ std::vector<SubQueue> queues_ GUARDED_BY(mu_);
+ bool closed_ GUARDED_BY(mu_);
+
+ enum RunResult { kNoProgress, kProgress, kComplete };
+ struct Attempt;
+ typedef std::function<RunResult(Attempt*)> RunCallback;
+ struct Attempt {
+ int32 elements_requested;
+ DoneCallback done_callback; // must be run outside mu_
+ OpKernelContext* context;
+ CancellationToken cancellation_token;
+ RunCallback run_callback; // must be run while holding mu_
+ bool is_cancelled;
+ Tuple tuple;
+
+ Attempt(int32 elements_requested, DoneCallback done_callback,
+ OpKernelContext* context, CancellationToken cancellation_token,
+ RunCallback run_callback)
+ : elements_requested(elements_requested),
+ done_callback(done_callback),
+ context(context),
+ cancellation_token(cancellation_token),
+ run_callback(run_callback),
+ is_cancelled(false) {}
+ };
+ std::deque<Attempt> enqueue_attempts_ GUARDED_BY(mu_);
+ std::deque<Attempt> dequeue_attempts_ GUARDED_BY(mu_);
+
+ static Status GetElementComponentFromBatch(const Tuple& tuple, int index,
+ int component,
+ OpKernelContext* ctx,
+ PersistentTensor* out_element);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(FIFOQueue);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_FIFO_QUEUE_H_
diff --git a/tensorflow/core/kernels/fifo_queue_op.cc b/tensorflow/core/kernels/fifo_queue_op.cc
new file mode 100644
index 0000000000..f1088181fe
--- /dev/null
+++ b/tensorflow/core/kernels/fifo_queue_op.cc
@@ -0,0 +1,93 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include <deque>
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/fifo_queue.h"
+#include "tensorflow/core/kernels/queue_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+// Defines a FIFOQueueOp, which produces a Queue (specifically, one
+// backed by FIFOQueue) that persists across different graph
+// executions, and sessions. Running this op produces a single-element
+// tensor of handles to Queues in the corresponding device.
+class FIFOQueueOp : public OpKernel {
+ public:
+ explicit FIFOQueueOp(OpKernelConstruction* context)
+ : OpKernel(context), queue_handle_set_(false) {
+ OP_REQUIRES_OK(context, context->GetAttr("capacity", &capacity_));
+ OP_REQUIRES_OK(context,
+ context->allocate_persistent(DT_STRING, TensorShape({2}),
+ &queue_handle_, nullptr));
+ if (capacity_ < 0) {
+ capacity_ = FIFOQueue::kUnbounded;
+ }
+ OP_REQUIRES_OK(context,
+ context->GetAttr("component_types", &component_types_));
+ OP_REQUIRES_OK(context, context->GetAttr("shapes", &component_shapes_));
+ }
+
+ ~FIFOQueueOp() override {
+ // If the queue object was not shared, delete it.
+ if (queue_handle_set_ && cinfo_.resource_is_private_to_kernel()) {
+ TF_CHECK_OK(cinfo_.resource_manager()->Delete<QueueInterface>(
+ cinfo_.container(), cinfo_.name()));
+ }
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ mutex_lock l(mu_);
+ if (!queue_handle_set_) {
+ OP_REQUIRES_OK(ctx, SetQueueHandle(ctx));
+ }
+ ctx->set_output_ref(0, &mu_, queue_handle_.AccessTensor(ctx));
+ }
+
+ private:
+ Status SetQueueHandle(OpKernelContext* ctx) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ TF_RETURN_IF_ERROR(cinfo_.Init(ctx->resource_manager(), def()));
+ QueueInterface* queue;
+ auto creator = [this](QueueInterface** ret) {
+ FIFOQueue* queue = new FIFOQueue(capacity_, component_types_,
+ component_shapes_, cinfo_.name());
+ *ret = queue;
+ return queue->Initialize();
+ };
+ TF_RETURN_IF_ERROR(
+ cinfo_.resource_manager()->LookupOrCreate<QueueInterface>(
+ cinfo_.container(), cinfo_.name(), &queue, creator));
+ core::ScopedUnref unref_me(queue);
+ // Verify that the shared queue is compatible with the requested arguments.
+ TF_RETURN_IF_ERROR(queue->MatchesNodeDef(def()));
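+ // The queue handle is a 2-element DT_STRING vector: (container, name).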
+ auto h = queue_handle_.AccessTensor(ctx)->flat<string>();
+ h(0) = cinfo_.container();
+ h(1) = cinfo_.name();
+ queue_handle_set_ = true;
+ return Status::OK();
+ }
+
+ int32 capacity_;
+ DataTypeVector component_types_;
+ std::vector<TensorShape> component_shapes_;
+ ContainerInfo cinfo_;
+
+ mutex mu_;
+ PersistentTensor queue_handle_ GUARDED_BY(mu_);
+ bool queue_handle_set_ GUARDED_BY(mu_);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(FIFOQueueOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("FIFOQueue").Device(DEVICE_CPU), FIFOQueueOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/fill_functor.h b/tensorflow/core/kernels/fill_functor.h
new file mode 100644
index 0000000000..831f0c899e
--- /dev/null
+++ b/tensorflow/core/kernels/fill_functor.h
@@ -0,0 +1,26 @@
+#ifndef TENSORFLOW_KERNELS_FILL_FUNCTOR_H_
+#define TENSORFLOW_KERNELS_FILL_FUNCTOR_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+struct FillFunctor {
+ // Computes on device "d": out = out.constant(in(0)),
+ void operator()(const Device& d, typename TTypes<T>::Flat out,
+ typename TTypes<T>::ConstScalar in);
+};
+
+template <typename Device, typename T>
+struct SetZeroFunctor {
+ // Computes on device "d": out = out.setZero(),
+ void operator()(const Device& d, typename TTypes<T>::Flat out);
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_FILL_FUNCTOR_H_
diff --git a/tensorflow/core/kernels/fixed_length_record_reader_op.cc b/tensorflow/core/kernels/fixed_length_record_reader_op.cc
new file mode 100644
index 0000000000..77516ab151
--- /dev/null
+++ b/tensorflow/core/kernels/fixed_length_record_reader_op.cc
@@ -0,0 +1,109 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <memory>
+#include "tensorflow/core/framework/reader_op_kernel.h"
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/inputbuffer.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+class FixedLengthRecordReader : public ReaderBase {
+ public:
+ FixedLengthRecordReader(const string& node_name, int64 header_bytes,
+ int64 record_bytes, int64 footer_bytes, Env* env)
+ : ReaderBase(
+ strings::StrCat("FixedLengthRecordReader '", node_name, "'")),
+ header_bytes_(header_bytes),
+ record_bytes_(record_bytes),
+ footer_bytes_(footer_bytes),
+ env_(env),
+ file_pos_limit_(-1),
+ record_number_(0) {}
+
+ // On success:
+ // * input_buffer_ != nullptr,
+ // * input_buffer_->Tell() == header_bytes_
+ // * file_pos_limit_ == file size - footer_bytes_
+ Status OnWorkStartedLocked() override {
+ record_number_ = 0;
+ uint64 file_size = 0;
+ TF_RETURN_IF_ERROR(env_->GetFileSize(current_work(), &file_size));
+ file_pos_limit_ = file_size - footer_bytes_;
+
+ RandomAccessFile* file = nullptr;
+ TF_RETURN_IF_ERROR(env_->NewRandomAccessFile(current_work(), &file));
+ input_buffer_.reset(new io::InputBuffer(file, kBufferSize));
+ TF_RETURN_IF_ERROR(input_buffer_->SkipNBytes(header_bytes_));
+ return Status::OK();
+ }
+
+ Status OnWorkFinishedLocked() override {
+ input_buffer_.reset(nullptr);
+ return Status::OK();
+ }
+
+ Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) override {
+ if (input_buffer_->Tell() >= file_pos_limit_) {
+ *at_end = true;
+ return Status::OK();
+ }
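+ // Produce one fixed-size record; its key names it as
+ // "<filename>:<record number>".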
+ TF_RETURN_IF_ERROR(input_buffer_->ReadNBytes(record_bytes_, value));
+ *key = strings::StrCat(current_work(), ":", record_number_);
+ *produced = true;
+ ++record_number_;
+ return Status::OK();
+ }
+
+ Status ResetLocked() override {
+ file_pos_limit_ = -1;
+ record_number_ = 0;
+ input_buffer_.reset(nullptr);
+ return ReaderBase::ResetLocked();
+ }
+
+ // TODO(josh11b): Implement serializing and restoring the state.
+
+ private:
+ enum { kBufferSize = 256 << 10 /* 256 kB */ };
+ const int64 header_bytes_;
+ const int64 record_bytes_;
+ const int64 footer_bytes_;
+ Env* const env_;
+ int64 file_pos_limit_;
+ int64 record_number_;
+ std::unique_ptr<io::InputBuffer> input_buffer_;
+};
+
+class FixedLengthRecordReaderOp : public ReaderOpKernel {
+ public:
+ explicit FixedLengthRecordReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ int64 header_bytes = -1, record_bytes = -1, footer_bytes = -1;
+ OP_REQUIRES_OK(context, context->GetAttr("header_bytes", &header_bytes));
+ OP_REQUIRES_OK(context, context->GetAttr("record_bytes", &record_bytes));
+ OP_REQUIRES_OK(context, context->GetAttr("footer_bytes", &footer_bytes));
+ OP_REQUIRES(context, header_bytes >= 0,
+ errors::InvalidArgument("header_bytes must be >= 0 not ",
+ header_bytes));
+ OP_REQUIRES(context, record_bytes >= 0,
+ errors::InvalidArgument("record_bytes must be >= 0 not ",
+ record_bytes));
+ OP_REQUIRES(context, footer_bytes >= 0,
+ errors::InvalidArgument("footer_bytes must be >= 0 not ",
+ footer_bytes));
+ Env* env = context->env();
+ SetReaderFactory([this, header_bytes, record_bytes, footer_bytes, env]() {
+ return new FixedLengthRecordReader(name(), header_bytes, record_bytes,
+ footer_bytes, env);
+ });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("FixedLengthRecordReader").Device(DEVICE_CPU),
+ FixedLengthRecordReaderOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/gather_op.cc b/tensorflow/core/kernels/gather_op.cc
new file mode 100644
index 0000000000..8bd48f26d6
--- /dev/null
+++ b/tensorflow/core/kernels/gather_op.cc
@@ -0,0 +1,136 @@
+// See docs in ../ops/array_ops.cc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+namespace {
+template <typename T, typename Index, int static_slice_elems>
+void HandleCopies(const Tensor& Tparams,
+ typename TTypes<Index>::ConstVec& Tindices, int slice_elems,
+ typename TTypes<T>::Matrix Tout) {
+ const int N = Tindices.dimension(0);
+ const auto& Tparams_flat = Tparams.flat_outer_dims<T>();
+ T* Tout_base = &Tout(0, 0);
+ const T* Tparams_base = &Tparams_flat(0, 0);
+ const size_t slice_bytes = slice_elems * sizeof(T);
+ if (static_slice_elems >= 0) {
+ // Give compiler static knowledge of the number of elements/bytes
+ CHECK_EQ(static_slice_elems, slice_elems);
+ slice_elems = static_slice_elems;
+ }
+ for (int i = 0; i < N; i++) {
+ int j = i + 1;
+ if (j < N) {
+ port::prefetch<port::PREFETCH_HINT_T0>(&Tparams_flat(Tindices(j), 0));
+ port::prefetch<port::PREFETCH_HINT_T0>(&Tout(j, 0));
+ }
+ memcpy(Tout_base + i * slice_elems,
+ Tparams_base + Tindices(i) * slice_elems, slice_bytes);
+ }
+}
+
+} // anonymous namespace
+
+template <typename T, typename Index>
+class GatherOp : public OpKernel {
+ public:
+ // QUESTION: It'd be nice to support DT_INT16, DT_UINT8,
+ // etc. here for the type of the second input argument. Should
+ // we have the framework do some sort of integer promotion
+ // automatically, or should that be something that users have to
+ // do explicitly with a conversion operator in the graph?
+ explicit GatherOp(OpKernelConstruction* c) : OpKernel(c) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ const DataType index_t = DataTypeToEnum<Index>::v();
+ OP_REQUIRES_OK(c, c->MatchSignature({dt, index_t}, {dt}));
+ }
+
+ void Compute(OpKernelContext* c) override {
+ const Tensor& Tparams = c->input(0);
+ const Tensor& Tindices = c->input(1);
+ OP_REQUIRES(
+ c, TensorShapeUtils::IsVectorOrHigher(Tparams.shape()),
+ errors::InvalidArgument("params must be at least 1 dimensional"));
+ const int64 N = Tindices.NumElements();
+ const int64 first_dim_size = Tparams.dim_size(0);
+
+ // Validate all the indices are in range
+ auto Tindices_vec = Tindices.flat<Index>();
+ for (int64 i = 0; i < N; i++) {
+ const Index index = Tindices_vec(i);
+ OP_REQUIRES(c, index >= 0 && index < first_dim_size,
+ errors::InvalidArgument(
+ strings::StrCat("Index ", index, " at offset ", i,
+ " in Tindices is out of range")));
+ }
+
+ // The result shape is indices.shape + params.shape[1:].
+ TensorShape result_shape = Tindices.shape();
+ for (int i = 1; i < Tparams.dims(); i++) {
+ result_shape.AddDim(Tparams.dim_size(i));
+ }
+
+ Tensor* Tout = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, result_shape, &Tout));
+ const auto& Tparams_flat = Tparams.flat_outer_dims<T>();
+ if (N > 0) {
+ auto Tindices_flat = Tindices.flat<Index>();
+ auto Tout_flat = Tout->shaped<T, 2>({N, Tout->NumElements() / N});
+ if (DataTypeCanUseMemcpy(DataTypeToEnum<T>::v())) {
+ const int64 slice_size = Tout->NumElements() / N;
+#define SPECIALIZE(elems) \
+ do { \
+ if (slice_size == elems) { \
+ HandleCopies<T, Index, elems>(Tparams, Tindices_flat, slice_size, \
+ Tout_flat); \
+ return; \
+ } \
+ } while (0)
+
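+ // Instantiate HandleCopies with a compile-time slice length for a few
+ // common sizes; other sizes use the generic (-1) instantiation below.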
+ SPECIALIZE(10);
+ SPECIALIZE(20);
+
+#undef SPECIALIZE
+
+ HandleCopies<T, Index, -1>(Tparams, Tindices_flat, slice_size,
+ Tout_flat);
+ } else {
+ for (int i = 0; i < N; i++) {
+ int j = i + 1;
+ if (j < N) {
+ port::prefetch<port::PREFETCH_HINT_T0>(
+ &Tparams_flat(Tindices_vec(j), 0));
+ port::prefetch<port::PREFETCH_HINT_T0>(&Tout_flat(j, 0));
+ }
+ // Copy last Ndim-1 dimensions of Tparams[Tindices[i]] to Tout[i]
+ Tout_flat.template chip<0>(i) =
+ Tparams_flat.template chip<0>(Tindices_vec(i));
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_GATHER(type, index_type) \
+ REGISTER_KERNEL_BUILDER(Name("Gather") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("Tparams") \
+ .TypeConstraint<index_type>("Tindices"), \
+ GatherOp<type, index_type>)
+
+#define REGISTER_GATHER_INT32(type) REGISTER_GATHER(type, int32)
+#define REGISTER_GATHER_INT64(type) REGISTER_GATHER(type, int64)
+
+TF_CALL_ALL_TYPES(REGISTER_GATHER_INT32);
+TF_CALL_ALL_TYPES(REGISTER_GATHER_INT64);
+
+#undef REGISTER_GATHER_INT32
+#undef REGISTER_GATHER_INT64
+#undef REGISTER_GATHER
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/gather_op_test.cc b/tensorflow/core/kernels/gather_op_test.cc
new file mode 100644
index 0000000000..d7410169e1
--- /dev/null
+++ b/tensorflow/core/kernels/gather_op_test.cc
@@ -0,0 +1,213 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace {
+
+class GatherOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType index_type) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "Gather")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(index_type))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(GatherOpTest, ScalarIndices) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5}), {0, 1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({}), {3});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({}));
+ test::FillValues<float>(&expected, {3});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(GatherOpTest, Simple_TwoD32) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int32>(TensorShape({4}), {0, 4, 0, 2});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({4, 3}));
+ test::FillValues<float>(&expected, {0, 1, 2, 12, 13, 14, 0, 1, 2, 6, 7, 8});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(GatherOpTest, Simple_TwoD64) {
+ MakeOp(DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int64>(TensorShape({4}), {0, 4, 0, 2});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({4, 3}));
+ test::FillValues<float>(&expected, {0, 1, 2, 12, 13, 14, 0, 1, 2, 6, 7, 8});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(GatherOpTest, HighRank) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({4}), {0, 1, 2, 3});
+ AddInputFromArray<int32>(TensorShape({2, 3}), {1, 2, 0, 2, 3, 0});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3}));
+ test::FillValues<float>(&expected, {1, 2, 0, 2, 3, 0});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(GatherOpTest, Error_IndexOutOfRange) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int32>(TensorShape({4}), {0, 4, 99, 2});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Index 99 at offset 2 in Tindices is out of range"))
+ << s;
+}
+
+class GatherOpForBenchmark : public GatherOpTest {
+ public:
+ void TestBody() override {} // not used
+ void PublicMakeOp(DataType index_type) { MakeOp(index_type); }
+};
+
+static const int kSorted = 0x8000; // Mask for arg to specify sorting vs. not
+
+template <typename Index>
+void BM_Gather(int iters, int arg) {
+ testing::StopTiming();
+
+ bool sorted = ((arg & kSorted) != 0);
+ int dim = arg & ~kSorted;
+
+ GatherOpForBenchmark t;
+ t.PublicMakeOp(DataTypeToEnum<Index>::v());
+ // Use a 512 MB table, regardless of dim
+ const int kRows = ((1 << 29) / sizeof(float)) / dim;
+ std::vector<float> data(kRows * dim, 1.0f);
+ t.AddInputFromArray<float>(TensorShape({kRows, dim}), data);
+ const int kLookups = 2000;
+ const int kBatches = 1000000 / kLookups;
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ std::vector<std::vector<Index>> all_ids(kBatches);
+ for (int i = 0; i < kBatches; ++i) {
+ std::vector<Index>* ids = &all_ids[i];
+ ids->resize(kLookups);
+ for (int j = 0; j < kLookups; ++j) {
+ (*ids)[j] = rnd.Uniform(kRows);
+ }
+ if (sorted) {
+ sort(ids->begin(), ids->end());
+ }
+ }
+
+ t.AddInput<Index>(TensorShape({kLookups}), [](int i) { return 0; });
+ if (sorted) {
+ testing::SetLabel("sorted by id");
+ }
+ testing::BytesProcessed(static_cast<int64>(iters) * kLookups * dim *
+ sizeof(float));
+ testing::StartTiming();
+ while (--iters > 0) {
+ const std::vector<Index>& b = all_ids[iters % kBatches];
+ TensorValue input = t.mutable_input(1);
+ gtl::MutableArraySlice<Index> slice(&input->vec<Index>()(0),
+ input->NumElements());
+ for (int i = 0; i < kLookups; i++) {
+ slice[i] = b[i];
+ }
+ Status s = t.RunOpKernel();
+ }
+}
+
+static void BM_Gather32(int iters, int arg) { BM_Gather<int32>(iters, arg); }
+
+static void BM_Gather64(int iters, int arg) { BM_Gather<int64>(iters, arg); }
+
+BENCHMARK(BM_Gather32)
+ ->Arg(10)
+ ->Arg(10 | kSorted)
+ ->Arg(20)
+ ->Arg(40)
+ ->Arg(63)
+ ->Arg(63 | kSorted)
+ ->Arg(64)
+ ->Arg(64 | kSorted)
+ ->Arg(65)
+ ->Arg(65 | kSorted)
+ ->Arg(100)
+ ->Arg(100 | kSorted)
+ ->Arg(127)
+ ->Arg(127 | kSorted)
+ ->Arg(128)
+ ->Arg(128 | kSorted)
+ ->Arg(129)
+ ->Arg(129 | kSorted)
+ ->Arg(1000)
+ ->Arg(1000 | kSorted);
+
+BENCHMARK(BM_Gather64)
+ ->Arg(10)
+ ->Arg(10 | kSorted)
+ ->Arg(20)
+ ->Arg(40)
+ ->Arg(63)
+ ->Arg(63 | kSorted)
+ ->Arg(64)
+ ->Arg(64 | kSorted)
+ ->Arg(65)
+ ->Arg(65 | kSorted)
+ ->Arg(100)
+ ->Arg(100 | kSorted)
+ ->Arg(127)
+ ->Arg(127 | kSorted)
+ ->Arg(128)
+ ->Arg(128 | kSorted)
+ ->Arg(129)
+ ->Arg(129 | kSorted)
+ ->Arg(1000)
+ ->Arg(1000 | kSorted);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/identity_op.cc b/tensorflow/core/kernels/identity_op.cc
new file mode 100644
index 0000000000..b29efbddfb
--- /dev/null
+++ b/tensorflow/core/kernels/identity_op.cc
@@ -0,0 +1,45 @@
+// See docs in ../ops/array_ops.cc.
+#include "tensorflow/core/kernels/identity_op.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("Identity").Device(DEVICE_CPU), IdentityOp);
+// StopGradient does the same thing as Identity, but has a different
+// gradient registered.
+REGISTER_KERNEL_BUILDER(Name("StopGradient").Device(DEVICE_CPU), IdentityOp);
+
+REGISTER_KERNEL_BUILDER(Name("RefIdentity").Device(DEVICE_CPU), IdentityOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Identity").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ IdentityOp); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RefIdentity").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ IdentityOp); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("StopGradient").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ IdentityOp)
+
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+REGISTER_GPU_KERNEL(bool);
+REGISTER_GPU_KERNEL(bfloat16);
+
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Identity")
+ .Device(DEVICE_GPU)
+ .HostMemory("input")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ IdentityOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/identity_op.h b/tensorflow/core/kernels/identity_op.h
new file mode 100644
index 0000000000..7adc1eace0
--- /dev/null
+++ b/tensorflow/core/kernels/identity_op.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_KERNELS_IDENTITY_OP_H_
+#define TENSORFLOW_KERNELS_IDENTITY_OP_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+class IdentityOp : public OpKernel {
+ public:
+ explicit IdentityOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ if (IsRefType(context->input_dtype(0))) {
+ context->forward_ref_input_to_ref_output(0, 0);
+ } else {
+ context->set_output(0, context->input(0));
+ }
+ }
+
+ bool IsExpensive() override { return false; }
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_IDENTITY_OP_H_
diff --git a/tensorflow/core/kernels/identity_op_test.cc b/tensorflow/core/kernels/identity_op_test.cc
new file mode 100644
index 0000000000..6483367a79
--- /dev/null
+++ b/tensorflow/core/kernels/identity_op_test.cc
@@ -0,0 +1,56 @@
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class IdentityOpTest : public OpsTestBase {
+ protected:
+ Status Init(DataType input_type) {
+ RequireDefaultOps();
+ TF_CHECK_OK(NodeDefBuilder("op", "Identity")
+ .Input(FakeInput(input_type))
+ .Finalize(node_def()));
+ return InitOp();
+ }
+};
+
+TEST_F(IdentityOpTest, Int32Success_6) {
+ ASSERT_OK(Init(DT_INT32));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(IdentityOpTest, Int32Success_2_3) {
+ ASSERT_OK(Init(DT_INT32));
+ AddInputFromArray<int32>(TensorShape({2, 3}), {1, 2, 3, 4, 5, 6});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({2, 3}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(IdentityOpTest, StringSuccess) {
+ ASSERT_OK(Init(DT_STRING));
+ AddInputFromArray<string>(TensorShape({6}), {"A", "b", "C", "d", "E", "f"});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_STRING, TensorShape({6}));
+ test::FillValues<string>(&expected, {"A", "b", "C", "d", "E", "f"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(IdentityOpTest, RefInputError) { ASSERT_OK(Init(DT_INT32_REF)); }
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/identity_reader_op.cc b/tensorflow/core/kernels/identity_reader_op.cc
new file mode 100644
index 0000000000..a63fea5dbb
--- /dev/null
+++ b/tensorflow/core/kernels/identity_reader_op.cc
@@ -0,0 +1,57 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <memory>
+#include "tensorflow/core/framework/reader_op_kernel.h"
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+class IdentityReader : public ReaderBase {
+ public:
+ explicit IdentityReader(const string& node_name)
+ : ReaderBase(strings::StrCat("IdentityReader '", node_name, "'")) {}
+
+ Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) override {
+ *key = current_work();
+ *value = current_work();
+ *produced = true;
+ *at_end = true;
+ return Status::OK();
+ }
+
+ // Stores state in a ReaderBaseState proto, since IdentityReader has
+ // no additional state beyond ReaderBase.
+ Status SerializeStateLocked(string* state) override {
+ ReaderBaseState base_state;
+ SaveBaseState(&base_state);
+ base_state.SerializeToString(state);
+ return Status::OK();
+ }
+
+ Status RestoreStateLocked(const string& state) override {
+ ReaderBaseState base_state;
+ if (!ParseProtoUnlimited(&base_state, state)) {
+ return errors::InvalidArgument("Could not parse state for ", name(), ": ",
+ str_util::CEscape(state));
+ }
+ TF_RETURN_IF_ERROR(RestoreBaseState(base_state));
+ return Status::OK();
+ }
+};
+
+class IdentityReaderOp : public ReaderOpKernel {
+ public:
+ explicit IdentityReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ SetReaderFactory([this]() { return new IdentityReader(name()); });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("IdentityReader").Device(DEVICE_CPU),
+ IdentityReaderOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/in_topk_op.cc b/tensorflow/core/kernels/in_topk_op.cc
new file mode 100644
index 0000000000..d08f6f53da
--- /dev/null
+++ b/tensorflow/core/kernels/in_topk_op.cc
@@ -0,0 +1,58 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+template <typename T>
+class InTopK : public OpKernel {
+ public:
+ explicit InTopK(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("k", &k_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const auto& predictions_in = context->input(0);
+ const auto& targets_in = context->input(1);
+ OP_REQUIRES(context, predictions_in.dims() == 2,
+ errors::InvalidArgument("predictions must be 2-dimensional"));
+ OP_REQUIRES(context, targets_in.dims() == 1,
+ errors::InvalidArgument("targets must be 1-dimensional"));
+ OP_REQUIRES(context, predictions_in.dim_size(0) == targets_in.dim_size(0),
+ errors::InvalidArgument("First dimension of predictions ",
+ predictions_in.dim_size(0),
+ " must match length of targets ",
+ targets_in.dim_size(0)));
+ const auto& predictions = predictions_in.matrix<T>();
+ const auto& targets = targets_in.vec<int>();
+
+ Tensor* t_out = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(
+ 0, TensorShape({targets_in.dim_size(0)}), &t_out));
+ auto out = t_out->vec<bool>();
+
+ const auto size = targets.size();
+ const auto num_classes = predictions.dimension(1);
+ for (int b = 0; b < size; b++) {
+ T target_prediction = predictions(b, targets(b));
+ int more_probable_classes = 0;
+ for (int i = 0; i < num_classes; ++i) {
+ if (predictions(b, i) > target_prediction) ++more_probable_classes;
+ }
+ out(b) = more_probable_classes < k_;
+ }
+ }
+
+ private:
+ int k_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("InTopK").Device(DEVICE_CPU), InTopK<float>);
+
+} // namespace tensorflow
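A small standalone sketch of the predicate the InTopK kernel above implements: an example is "in top k" when fewer than k classes score strictly higher than the target class, so ties with the target do not push it out. The vector-based helper and the sample values are illustrative only.

#include <cassert>
#include <vector>

bool InTopK(const std::vector<float>& row, int target, int k) {
  int more_probable = 0;
  for (size_t i = 0; i < row.size(); ++i) {
    if (row[i] > row[target]) ++more_probable;  // strictly higher scores only
  }
  return more_probable < k;
}

int main() {
  std::vector<float> row = {0.1f, 0.5f, 0.3f, 0.1f};
  assert(InTopK(row, 1, 1));   // class 1 is the argmax
  assert(InTopK(row, 2, 2));   // class 2 is second best
  assert(!InTopK(row, 0, 2));  // two classes beat class 0, so it is not in top 2
  return 0;
}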
diff --git a/tensorflow/core/kernels/initializable_lookup_table.cc b/tensorflow/core/kernels/initializable_lookup_table.cc
new file mode 100644
index 0000000000..7f8b070556
--- /dev/null
+++ b/tensorflow/core/kernels/initializable_lookup_table.cc
@@ -0,0 +1,41 @@
+#include "tensorflow/core/kernels/initializable_lookup_table.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace lookup {
+
+Status InitializableLookupTable::Find(const Tensor& keys, Tensor* values,
+ const Tensor& default_value) {
+ if (!is_initialized()) {
+ return errors::FailedPrecondition("Table not initialized.");
+ }
+ TF_RETURN_IF_ERROR(CheckFindArguments(keys, *values, default_value));
+ return DoFind(keys, values, default_value);
+}
+
+Status InitializableLookupTable::Initialize(InitTableIterator& iter) {
+ if (!iter.Valid()) {
+ return iter.status();
+ }
+ TF_RETURN_IF_ERROR(CheckKeyAndValueTensors(iter.keys(), iter.values()));
+
+ mutex_lock l(mu_);
+ if (is_initialized()) {
+ return errors::FailedPrecondition("Table already initialized.");
+ }
+
+ TF_RETURN_IF_ERROR(DoPrepare(iter.total_size()));
+ while (iter.Valid()) {
+ TF_RETURN_IF_ERROR(DoInsert(iter.keys(), iter.values()));
+ iter.Next();
+ }
+ if (!errors::IsOutOfRange(iter.status())) {
+ return iter.status();
+ }
+ is_initialized_ = true;
+ return Status::OK();
+}
+
+} // namespace lookup
+} // namespace tensorflow
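A minimal standalone sketch (not TensorFlow code) of the Initialize() contract above: the table is populated batch by batch while the iterator is valid, and an out-of-range status at the end is normal termination rather than an error. The iterator and table types here are illustrative stand-ins.

#include <cstdio>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in iterator that yields exactly one batch of key/value pairs, then
// reports "out of range", mirroring InitTableIterator's termination contract.
struct OneBatchIterator {
  std::vector<std::pair<int, int>> batch;
  bool consumed = false;
  bool Valid() const { return !consumed; }
  void Next() { consumed = true; }
  bool EndedOutOfRange() const { return consumed; }
};

int main() {
  OneBatchIterator it;
  it.batch = {{1, 10}, {2, 20}, {3, 30}};
  std::unordered_map<int, int> table;  // DoPrepare would size this up front
  while (it.Valid()) {
    for (const auto& kv : it.batch) table.insert(kv);  // DoInsert on the batch
    it.Next();
  }
  // Only an out-of-range end status marks the table as initialized.
  std::printf("entries=%zu initialized=%d\n", table.size(), it.EndedOutOfRange());
  return 0;
}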
diff --git a/tensorflow/core/kernels/initializable_lookup_table.h b/tensorflow/core/kernels/initializable_lookup_table.h
new file mode 100644
index 0000000000..651b491457
--- /dev/null
+++ b/tensorflow/core/kernels/initializable_lookup_table.h
@@ -0,0 +1,103 @@
+#ifndef TENSORFLOW_KERNELS_INITIALIZABLE_LOOKUP_TABLE_H_
+#define TENSORFLOW_KERNELS_INITIALIZABLE_LOOKUP_TABLE_H_
+
+#include "tensorflow/core/framework/lookup_interface.h"
+
+namespace tensorflow {
+namespace lookup {
+
+// Base class for lookup tables that require initialization.
+class InitializableLookupTable : public LookupInterface {
+ public:
+ class InitTableIterator;
+
+  // Performs batch lookups: for every element in the key tensor, Find returns
+  // the corresponding value in the values tensor.

+ // If an element is not present in the table, the given default value is used.
+ //
+ // For tables that require initialization, `Find` is available once the table
+ // is marked as initialized.
+ //
+ // Returns the following statuses:
+ // - OK: when the find finishes successfully.
+ // - FailedPrecondition: if the table is not initialized.
+ // - InvalidArgument: if any of the preconditions on the lookup key or value
+ // fails.
+ // - In addition, other implementations may provide another non-OK status
+ // specific to their failure modes.
+ Status Find(const Tensor& keys, Tensor* values,
+ const Tensor& default_value) final;
+
+ // Returns whether the table was initialized and is ready to serve lookups.
+ bool is_initialized() const { return is_initialized_; }
+
+ // Initializes the table from the given init table iterator.
+ //
+ // Atomically, this operation prepares the table, populates it with the given
+  // iterator, and marks the table as initialized.
+ //
+ // Returns the following statuses:
+ // - OK: when the initialization was successful.
+ // - InvalidArgument: if any of the preconditions on the lookup key or value
+ // fails.
+ // - FailedPrecondition: if the table is already initialized and
+ // fail_if_initialized is set to true.
+ // - In addition, other implementations may provide another non-OK status
+ // specific to their failure modes.
+ Status Initialize(InitTableIterator& iter);
+
+ // Basic iterator to initialize lookup tables.
+ // It yields a sequence of pairs of `keys()` and `values()` Tensors, so that
+ // the consumer may insert key-value pairs in batches.
+ //
+  // When the iterator is exhausted, Valid() returns false and status() returns
+  // an OutOfRange error.
+ class InitTableIterator {
+ public:
+ InitTableIterator() {}
+
+ virtual ~InitTableIterator() {}
+
+ // Prepares the next batch of key and value tensors.
+ virtual void Next() = 0;
+
+ // Returns true if keys and values point to valid tensors.
+ virtual bool Valid() const = 0;
+
+ // Returns a tensor that contains the current batch of 'key' values.
+ virtual const Tensor& keys() const = 0;
+
+ // Returns a tensor that contains the current batch of 'value' values.
+ virtual const Tensor& values() const = 0;
+
+    // Returns an error if one has occurred, otherwise returns Status::OK.
+ virtual Status status() const = 0;
+
+ // Returns the total number of elements that the iterator will produce.
+ virtual int64 total_size() const = 0;
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(InitTableIterator);
+ };
+
+ protected:
+ // Prepares and allocates the underlying data structure to store the given
+ // number of expected elements.
+ virtual Status DoPrepare(size_t expected_num_elements) = 0;
+
+ // Populates the table in batches given keys and values as tensors into the
+ // underlying data structure.
+ virtual Status DoInsert(const Tensor& keys, const Tensor& values) = 0;
+
+ // Performs the batch find operation on the underlying data structure.
+ virtual Status DoFind(const Tensor& keys, Tensor* values,
+ const Tensor& default_value) = 0;
+
+ mutex mu_;
+ bool is_initialized_ = false;
+};
+
+} // namespace lookup
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_INITIALIZABLE_LOOKUP_TABLE_H_
diff --git a/tensorflow/core/kernels/io.cc b/tensorflow/core/kernels/io.cc
new file mode 100644
index 0000000000..9d6921aa8e
--- /dev/null
+++ b/tensorflow/core/kernels/io.cc
@@ -0,0 +1,270 @@
+// See docs in ../ops/io_ops.cc
+#include <unordered_map>
+
+#include "tensorflow/core/kernels/io.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/util/tensor_slice_reader.h"
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+#include "tensorflow/core/util/tensor_slice_writer.h"
+
+namespace tensorflow {
+
+namespace {
+bool ParseShapeAndSlice(const string& shape_and_slice, TensorShape* shape,
+ TensorSlice* slice, TensorShape* shape_slice,
+ string* error) {
+ CHECK(!shape_and_slice.empty());
+ // Syntax: dim0 dim1 dim2 ... <slice string>
+ // Where slice string is defined in core/framework/tensor_slice.h
+ std::vector<string> splits = str_util::Split(shape_and_slice, ' ');
+
+ // Must have at least 2 strings.
+ if (splits.size() < 2) {
+ *error = strings::StrCat(
+ "Need least two elements in shape_and_slice specification: ",
+ shape_and_slice);
+ return false;
+ }
+ int num_dims = splits.size() - 1;
+ shape->Clear();
+ for (int i = 0; i < num_dims; ++i) {
+ int dim;
+ if (!str_util::NumericParse32(splits[i], &dim)) {
+ *error = strings::StrCat("Non numerical dimension in shape_and_slice: ",
+ shape_and_slice);
+ return false;
+ }
+ shape->AddDim(dim);
+ }
+ // The last split is the slice specification.
+ slice->Clear();
+ auto status = slice->Parse(splits.back(), slice);
+ if (!status.ok()) {
+ *error = status.error_message();
+ return false;
+ }
+ // The specified slice must be compatible with the specified shape.
+ status = slice->SliceTensorShape(*shape, shape_slice);
+ if (!status.ok()) {
+ *error = status.error_message();
+ return false;
+ }
+ return true;
+}
+} // namespace
+
+void SaveTensors(
+ OpKernelContext* context,
+ checkpoint::TensorSliceWriter::CreateBuilderFunction builder_func,
+ bool save_slices) {
+ const Tensor& filename_t = context->input(0);
+ {
+ const int64 size = filename_t.NumElements();
+ OP_REQUIRES(
+ context, size == 1,
+ errors::InvalidArgument(
+ "Input 0 (filename) must be a string scalar; got a tensor of ",
+ size, "elements"));
+ }
+
+ const Tensor& tensor_names_t = context->input(1);
+ const int64 N = tensor_names_t.NumElements();
+ const string* tensor_shapes_and_slices_ptr = nullptr;
+ if (save_slices) {
+ const Tensor& tensor_shapes_and_slices_t = context->input(2);
+ OP_REQUIRES(
+ context, tensor_shapes_and_slices_t.NumElements() == N,
+ errors::InvalidArgument("Expected ", N,
+ " elements for the tensor "
+ "shapes and slices but got ",
+ tensor_shapes_and_slices_t.NumElements()));
+ tensor_shapes_and_slices_ptr =
+ tensor_shapes_and_slices_t.flat<string>().data();
+ }
+ // Path, names, and slices if save_slices is true.
+ const int kFixedInputs = save_slices ? 3 : 2;
+ OP_REQUIRES(context, context->num_inputs() == N + kFixedInputs,
+ errors::InvalidArgument("Expected totally ", N + kFixedInputs,
+ " inputs as input #1 (which is a string "
+ "tensor of saved names) contains ",
+ N, " names, but received ",
+ context->num_inputs(), " inputs"));
+
+ VLOG(1) << "About to save tensors to file " << filename_t.flat<string>()(0)
+ << "...";
+ checkpoint::TensorSliceWriter writer(filename_t.flat<string>()(0),
+ builder_func);
+
+ Status s;
+ auto tensor_names_flat = tensor_names_t.flat<string>();
+
+ string error;
+ for (int64 i = 0; i < N; ++i) {
+ const string& name = tensor_names_flat(i);
+ const Tensor& input = context->input(i + kFixedInputs);
+ TensorShape shape(input.shape());
+ TensorSlice slice(input.dims());
+ if (save_slices && !tensor_shapes_and_slices_ptr[i].empty()) {
+ const string& shape_spec = tensor_shapes_and_slices_ptr[i];
+ TensorShape slice_shape;
+ OP_REQUIRES(context, ParseShapeAndSlice(shape_spec, &shape, &slice,
+ &slice_shape, &error),
+ errors::InvalidArgument(error));
+ OP_REQUIRES(context, slice_shape.IsSameSize(input.shape()),
+ errors::InvalidArgument("Slice in shape_and_slice "
+ "specification does not match the "
+ "shape of the tensor to save: ",
+ shape_spec, ", tensor: ",
+ input.shape().DebugString()));
+ }
+
+#define WRITER_ADD(dt) \
+ case dt: \
+ s = writer.Add(name, shape, slice, \
+ input.flat<EnumToDataType<dt>::Type>().data()); \
+ break
+
+ switch (input.dtype()) {
+ WRITER_ADD(DT_FLOAT);
+ WRITER_ADD(DT_DOUBLE);
+ WRITER_ADD(DT_INT32);
+ WRITER_ADD(DT_UINT8);
+ WRITER_ADD(DT_INT16);
+ WRITER_ADD(DT_INT8);
+ WRITER_ADD(DT_INT64);
+ WRITER_ADD(DT_QUINT8);
+ WRITER_ADD(DT_QINT8);
+ WRITER_ADD(DT_QINT32);
+ default:
+ context->SetStatus(errors::Unimplemented("Saving data type ",
+ DataTypeString(input.dtype()),
+ " not yet supported"));
+ return;
+ }
+#undef WRITER_ADD
+ if (!s.ok()) {
+ context->SetStatus(s);
+ return;
+ }
+ }
+
+ s = writer.Finish();
+ if (!s.ok()) {
+ context->SetStatus(s);
+ }
+}
+
+void RestoreTensor(OpKernelContext* context,
+ checkpoint::TensorSliceReader::OpenTableFunction open_func,
+ int preferred_shard, bool restore_slice) {
+ const Tensor& file_pattern_t = context->input(0);
+ {
+ const int64 size = file_pattern_t.NumElements();
+ OP_REQUIRES(
+ context, size == 1,
+ errors::InvalidArgument(
+ "Input 0 (file_pattern) must be a string scalar; got a tensor of ",
+ size, "elements"));
+ }
+ const string& file_pattern = file_pattern_t.flat<string>()(0);
+
+ const Tensor& tensor_name_t = context->input(1);
+ {
+ const int64 size = tensor_name_t.NumElements();
+ OP_REQUIRES(
+ context, size == 1,
+ errors::InvalidArgument(
+ "Input 1 (tensor_name) must be a string scalar; got a tensor of ",
+ size, "elements"));
+ }
+ const string& tensor_name = tensor_name_t.flat<string>()(0);
+
+ const string* tensor_shape_and_slice_ptr = nullptr;
+ if (restore_slice) {
+ const Tensor& tensor_shape_and_slice_t = context->input(2);
+ OP_REQUIRES(
+ context, tensor_shape_and_slice_t.NumElements() == 1,
+ errors::InvalidArgument("Expected 1 element for the tensor "
+ "shape and slice but got ",
+ tensor_shape_and_slice_t.NumElements()));
+ tensor_shape_and_slice_ptr = tensor_shape_and_slice_t.flat<string>().data();
+ }
+
+ // If we cannot find a cached reader we will allocate our own.
+ std::unique_ptr<checkpoint::TensorSliceReader> allocated_reader;
+
+ const checkpoint::TensorSliceReader* reader =
+ context->slice_reader_cache()->GetReader(file_pattern, open_func,
+ preferred_shard);
+ if (!reader) {
+ allocated_reader.reset(new checkpoint::TensorSliceReader(
+ file_pattern, open_func, preferred_shard));
+ reader = allocated_reader.get();
+ }
+ OP_REQUIRES_OK(context, CHECK_NOTNULL(reader)->status());
+
+ // Get the shape and type from the save file.
+ DataType type;
+ TensorShape saved_shape;
+ OP_REQUIRES(
+ context, reader->HasTensor(tensor_name, &saved_shape, &type),
+ errors::NotFound("Tensor name \"", tensor_name,
+ "\" not found in checkpoint files ", file_pattern));
+ OP_REQUIRES(
+ context, type == context->expected_output_dtype(0),
+ errors::InvalidArgument("Expected to restore a tensor of type ",
+ DataTypeString(context->expected_output_dtype(0)),
+ ", got a tensor of type ", DataTypeString(type),
+ " instead: tensor_name = ", tensor_name));
+
+ // Shape of the output and slice to load.
+ TensorShape output_shape(saved_shape);
+ TensorSlice slice_to_load(saved_shape.dims());
+ if (restore_slice && !tensor_shape_and_slice_ptr[0].empty()) {
+ const string& shape_spec = tensor_shape_and_slice_ptr[0];
+ TensorShape parsed_shape;
+ string error;
+ OP_REQUIRES(context,
+ ParseShapeAndSlice(shape_spec, &parsed_shape, &slice_to_load,
+ &output_shape, &error),
+ errors::InvalidArgument(error));
+ OP_REQUIRES(
+ context, parsed_shape.IsSameSize(saved_shape),
+ errors::InvalidArgument(
+ "Shape in shape_and_slice spec does not match the shape in the "
+ "save file: ",
+ parsed_shape.DebugString(), ", save file shape: ",
+ saved_shape.DebugString()));
+ }
+
+ Tensor* t = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &t));
+#define READER_COPY(dt) \
+ case dt: \
+ reader->CopySliceData(tensor_name, slice_to_load, \
+ t->flat<EnumToDataType<dt>::Type>().data()); \
+ break
+
+ switch (type) {
+ READER_COPY(DT_FLOAT);
+ READER_COPY(DT_DOUBLE);
+ READER_COPY(DT_INT32);
+ READER_COPY(DT_UINT8);
+ READER_COPY(DT_INT16);
+ READER_COPY(DT_INT8);
+ READER_COPY(DT_INT64);
+ default:
+ context->SetStatus(errors::Unimplemented(
+ "Restoring data type ", DataTypeString(type), " not yet supported"));
+ }
+}
+
+} // namespace tensorflow
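A small standalone sketch of the "shape_and_slice" string layout that ParseShapeAndSlice above consumes: space-separated dimension sizes followed by a single slice token. The slice token's grammar lives in core/framework/tensor_slice.h and is treated as opaque here; the spec value is a made-up example.

#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

int main() {
  // Hypothetical spec: a 10 x 20 tensor; the last token is left to TensorSlice::Parse.
  const std::string spec = "10 20 -:0,5";

  std::istringstream in(spec);
  std::vector<std::string> tokens;
  for (std::string t; in >> t;) tokens.push_back(t);

  std::vector<int> dims;
  for (size_t i = 0; i + 1 < tokens.size(); ++i) dims.push_back(std::stoi(tokens[i]));
  const std::string& slice_token = tokens.back();

  std::printf("rank=%zu dims=[%d, %d] slice=\"%s\"\n", dims.size(), dims[0], dims[1],
              slice_token.c_str());
  return 0;
}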
diff --git a/tensorflow/core/kernels/io.h b/tensorflow/core/kernels/io.h
new file mode 100644
index 0000000000..7e548f1ad0
--- /dev/null
+++ b/tensorflow/core/kernels/io.h
@@ -0,0 +1,38 @@
+#ifndef TENSORFLOW_KERNELS_IO_H_
+#define TENSORFLOW_KERNELS_IO_H_
+
+#include "tensorflow/core/util/tensor_slice_reader.h"
+#include "tensorflow/core/util/tensor_slice_writer.h"
+
+namespace tensorflow {
+
+class OpKernelContext;
+
+// Save input tensors in *context to a writer built from builder_func().
+// context must have the following inputs:
+// 0: a single element string tensor that contains the file name.
+// 1: names for the remaining tensors
+// If save_slices is true:
+// 2: shape and slice specifications.
+// rest: tensors to save
+void SaveTensors(
+ OpKernelContext* context,
+ checkpoint::TensorSliceWriter::CreateBuilderFunction builder_func,
+ bool save_slices);
+
+// Reads a tensor from the reader built from open_func() and produces it as
+// context->output(0). "preferred_shard" is the same as the TensorSliceReader
+// preferred_shard parameter.
+//
+// context must have the following inputs:
+// 0: a single element string tensor that contains the file name.
+// 1: a single element string tensor that names the output to be restored.
+// If restore_slice is true:
+// 2: shape and slice specification of the tensor to restore.
+void RestoreTensor(OpKernelContext* context,
+ checkpoint::TensorSliceReader::OpenTableFunction open_func,
+ int preferred_shard, bool restore_slice);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_IO_H_
diff --git a/tensorflow/core/kernels/l2loss_op.cc b/tensorflow/core/kernels/l2loss_op.cc
new file mode 100644
index 0000000000..6f83f01676
--- /dev/null
+++ b/tensorflow/core/kernels/l2loss_op.cc
@@ -0,0 +1,69 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/l2loss_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class L2LossOp : public OpKernel {
+ public:
+ explicit L2LossOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // The input tensor can be of any number of dimensions, even though it's
+ // 2D in most typical applications.
+ const Tensor& input = context->input(0);
+ // The output is a single number.
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({}), &output));
+ functor::L2Loss<Device, T>()(context->eigen_device<Device>(),
+ input.flat<T>(), output->scalar<T>());
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("L2Loss").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ L2LossOp<CPUDevice, T>);
+
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void L2Loss<GPUDevice, T>::operator()(const GPUDevice& d, \
+ typename TTypes<T>::ConstTensor input, \
+ typename TTypes<T>::Scalar output); \
+ extern template struct L2Loss<GPUDevice, T>;
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("L2Loss").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
+ L2LossOp<GPUDevice, T>);
+
+REGISTER_GPU_KERNEL(float);
+#undef REGISTER_GPU_KERNEL
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/l2loss_op.h b/tensorflow/core/kernels/l2loss_op.h
new file mode 100644
index 0000000000..d307353e24
--- /dev/null
+++ b/tensorflow/core/kernels/l2loss_op.h
@@ -0,0 +1,24 @@
+#ifndef TENSORFLOW_KERNELS_L2LOSS_OP_H_
+#define TENSORFLOW_KERNELS_L2LOSS_OP_H_
+// Functor definition for L2LossOp, must be compilable by nvcc.
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by L2LossOp to do the computations.
+template <typename Device, typename T>
+struct L2Loss {
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor input,
+ typename TTypes<T>::Scalar output) {
+ // We flatten the input tensor and reduce on dimension 0, producing
+ // a single number which is Mul(Sum(x^2), 0.5).
+ output.device(d) = input.square().sum() * static_cast<T>(0.5);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_L2LOSS_OP_H_
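A standalone check (not part of the patch) of the reduction computed by the functor above: L2Loss(x) = 0.5 * sum_i x_i^2, a single scalar regardless of the input's rank. Plain std::vector stands in for the flattened tensor.

#include <cassert>
#include <vector>

float L2Loss(const std::vector<float>& x) {
  float sum_sq = 0.f;
  for (float v : x) sum_sq += v * v;
  return 0.5f * sum_sq;  // Mul(Sum(x^2), 0.5), as in the functor
}

int main() {
  assert(L2Loss({3.f, 4.f}) == 12.5f);  // 0.5 * (9 + 16)
  return 0;
}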
diff --git a/tensorflow/core/kernels/l2loss_op_gpu.cu.cc b/tensorflow/core/kernels/l2loss_op_gpu.cu.cc
new file mode 100644
index 0000000000..858fcfe8d3
--- /dev/null
+++ b/tensorflow/core/kernels/l2loss_op_gpu.cu.cc
@@ -0,0 +1,16 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/l2loss_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+template struct functor::L2Loss<GPUDevice, float>;
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/linalg_ops_common.cc b/tensorflow/core/kernels/linalg_ops_common.cc
new file mode 100644
index 0000000000..93342a7a24
--- /dev/null
+++ b/tensorflow/core/kernels/linalg_ops_common.cc
@@ -0,0 +1,99 @@
+#include "tensorflow/core/kernels/linalg_ops_common.h"
+
+namespace tensorflow {
+
+void LinearAlgebraOpBase::Compute(OpKernelContext* context) {
+ const Tensor& in = context->input(0);
+
+ const int input_rank = GetInputMatrixRank();
+ OP_REQUIRES(
+ context, input_rank == 2,
+ errors::InvalidArgument("Only matrix inputs are supported so far."));
+ if (SupportsBatchOperation()) {
+ OP_REQUIRES(context, in.dims() > input_rank,
+ errors::InvalidArgument("Input tensor must have rank >= %d",
+ input_rank + 1));
+ } else {
+ OP_REQUIRES(context, in.dims() == input_rank,
+ errors::InvalidArgument("Input tensor must have rank == %d",
+ input_rank));
+ }
+
+ // If the tensor rank is greater than input_rank, we consider the inner-most
+ // dimensions as matrices, and loop over all the other outer
+ // dimensions to compute the results.
+ // TODO(kalakris): Only matrix inputs are currently supported.
+ const int row_dimension = in.dims() - 2;
+ const int col_dimension = in.dims() - 1;
+ const int64 num_rows = in.dim_size(row_dimension);
+ const int64 num_cols = in.dim_size(col_dimension);
+ const TensorShape input_matrix_shape = TensorShape({num_rows, num_cols});
+ const TensorShape output_matrix_shape =
+ GetOutputMatrixShape(input_matrix_shape);
+ OP_REQUIRES(context, output_matrix_shape.dims() <= 2,
+ errors::InvalidArgument("Output rank must be 1 or 2."));
+
+ int num_matrices = 1;
+ // The output has the shape of all the outer dimensions of the input
+ // except for the last two, plus the output_matrix_shape (if the output
+ // is not scalar). This still assumes that each input matrix is
+ // 2-dimensional, in accordance with the TODO above.
+ TensorShape output_shape;
+ if (in.dims() == 2) {
+ output_shape = output_matrix_shape;
+ } else {
+ for (int dim = 0; dim <= in.dims() - 3; ++dim) {
+ num_matrices *= in.dim_size(dim);
+ output_shape.AddDim(in.dim_size(dim));
+ }
+ for (int dim = 0; dim < output_matrix_shape.dims(); ++dim) {
+ output_shape.AddDim(output_matrix_shape.dim_size(dim));
+ }
+ }
+
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &out));
+
+ auto shard = [this, &in, &input_matrix_shape, &output_matrix_shape, context,
+ out](int64 begin, int64 end) {
+ for (int64 i = begin; i < end; ++i) {
+ ComputeMatrix(context, i, in, input_matrix_shape, out,
+ output_matrix_shape);
+ }
+ };
+
+ auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
+ Shard(worker_threads.num_threads, worker_threads.workers, num_matrices,
+ GetCostPerUnit(input_matrix_shape), shard);
+}
+
+template <typename Scalar, bool SupportsBatchOperationT>
+void LinearAlgebraOp<Scalar, SupportsBatchOperationT>::ComputeMatrix(
+ OpKernelContext* context, int64 matrix_index, const Tensor& in,
+ const TensorShape& input_matrix_shape, Tensor* out,
+ const TensorShape& output_matrix_shape) {
+ // TODO(kalakris): Handle alignment if possible. Eigen::Map is
+ // unaligned by default.
+ ConstMatrixMap input(in.flat<Scalar>().data() +
+ matrix_index * input_matrix_shape.num_elements(),
+ input_matrix_shape.dim_size(0),
+ input_matrix_shape.dim_size(1));
+
+  // The output may not be a matrix: its shape can have rank 0 or 1.
+ int num_output_rows =
+ output_matrix_shape.dims() >= 1 ? output_matrix_shape.dim_size(0) : 1;
+ int num_output_cols =
+ output_matrix_shape.dims() == 2 ? output_matrix_shape.dim_size(1) : 1;
+ MatrixMap output(out->flat<Scalar>().data() +
+ matrix_index * output_matrix_shape.num_elements(),
+ num_output_rows, num_output_cols);
+ ComputeMatrix(context, input, &output);
+}
+
+// Explicitly instantiate LinearAlgebraOp for the scalar types we expect to use.
+template class LinearAlgebraOp<float, false>;
+template class LinearAlgebraOp<float, true>;
+template class LinearAlgebraOp<double, false>;
+template class LinearAlgebraOp<double, true>;
+
+} // namespace tensorflow
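A minimal standalone sketch of the flat-buffer indexing used by ComputeMatrix above: matrix i of a batch lives at offset i * rows * cols in the row-major buffer, and element (r, c) of that matrix is at r * cols + c from there. The sizes and values are illustrative.

#include <cstdio>
#include <vector>

int main() {
  const int batch = 3, rows = 2, cols = 2;
  std::vector<float> flat(batch * rows * cols);
  for (size_t i = 0; i < flat.size(); ++i) flat[i] = static_cast<float>(i);

  const int matrix_index = 1;
  const float* m = flat.data() + matrix_index * rows * cols;  // start of matrix 1
  // Row-major element (1, 0) of matrix 1: flat index 4 + 2 = 6.
  std::printf("m(1,0) = %g\n", m[1 * cols + 0]);  // prints 6
  return 0;
}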
diff --git a/tensorflow/core/kernels/linalg_ops_common.h b/tensorflow/core/kernels/linalg_ops_common.h
new file mode 100644
index 0000000000..471f11e25f
--- /dev/null
+++ b/tensorflow/core/kernels/linalg_ops_common.h
@@ -0,0 +1,123 @@
+#ifndef TENSORFLOW_KERNELS_LINALG_OPS_COMMON_H_
+#define TENSORFLOW_KERNELS_LINALG_OPS_COMMON_H_
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/work_sharder.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+// A base class to support linear algebra functionality, similar to the
+// numpy.linalg module. Supports batch computation on several matrices at once,
+// sharding the computations across different threads if necessary.
+//
+// TODO(kalakris): This needs to be expanded to support binary inputs, and
+// multiple outputs.
+class LinearAlgebraOpBase : public OpKernel {
+ public:
+ explicit LinearAlgebraOpBase(OpKernelConstruction* context)
+ : OpKernel(context) {}
+ ~LinearAlgebraOpBase() override {}
+
+ // Return the expected rank of the input.
+ // TODO(kalakris): This should be a virtual function to support vector inputs.
+ int GetInputMatrixRank() { return 2; }
+
+ // Return the output shape of each individual matrix operation. Must be
+ // rank 0, 1, or 2. Scalar outputs are rank 0.
+ virtual TensorShape GetOutputMatrixShape(
+ const TensorShape& input_matrix_shape) = 0;
+
+ // Return the cost per matrix operation. Cost per unit is assumed to be
+ // roughly 1ns, based on comments in core/util/work_sharder.cc.
+ virtual int64 GetCostPerUnit(const TensorShape& input_matrix_shape) = 0;
+
+ // If SupportsBatchOperation() returns false, this Op will only accept rank 2
+ // (if the supported input type is a matrix). If it returns true, the Op will
+ // accept inputs of rank >= 3, and repeatedly execute the operation on all
+ // matrices in the innermost two dimensions.
+ virtual bool SupportsBatchOperation() = 0;
+
+ // Perform the actual computation on an input matrix, and store the results
+ // in the output. This will be called repeatedly for a single call to
+ // Compute(), if multiple matrices exist in the input Tensor.
+ //
+ // This function should only compute the results for a single input matrix.
+ // The 'matrix_index' parameter specifies the index of the matrix to be used
+ // from the input, and the index of the matrix to be written to in the output.
+ // The input matrix is in row major order, and is located at the memory
+ // address
+ // in.flat<Scalar>().data() +
+ // matrix_index * input_matrix_shape.num_elements().
+ // The output matrix is in row major order, and is located at the memory
+ // address
+ // out->flat<Scalar>().data() +
+ // matrix_index * output_matrix_shape.num_elements().
+ // The LinearAlgebraOp<Scalar> class below has functionality which performs
+ // this mapping and presents an interface based on the Eigen::MatrixBase API.
+ virtual void ComputeMatrix(OpKernelContext* context, int64 matrix_index,
+ const Tensor& in,
+ const TensorShape& input_matrix_shape, Tensor* out,
+ const TensorShape& output_matrix_shape) = 0;
+
+ void Compute(OpKernelContext* context) override;
+};
+
+// A base class for linear algebra ops templated on the scalar type.
+//
+// This base class encapsulates the functionality of mapping the input and
+// output tensors using Eigen::Map, so that the Eigen::MatrixBase API may be
+// directly used by derived classes.
+// SupportsBatchOperationT is a bool template argument which if set to true
+// will allow the Op to process batches of matrices (rank >= 3); if set to
+// false the Op will only accept rank 2 inputs.
+template <typename Scalar, bool SupportsBatchOperationT>
+class LinearAlgebraOp : public LinearAlgebraOpBase {
+ public:
+ explicit LinearAlgebraOp(OpKernelConstruction* context)
+ : LinearAlgebraOpBase(context) {}
+
+ using ConstMatrixMap =
+ Eigen::Map<const Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>>;
+ using MatrixMap = Eigen::Map<
+ Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>>;
+
+ // Perform the actual computation on the input matrix, and store the results
+ // in the output. This will be called repeatedly for a single call to
+ // Compute(), if multiple matrices exist in the input Tensor.
+ virtual void ComputeMatrix(OpKernelContext* context,
+ const ConstMatrixMap& input,
+ MatrixMap* output) = 0;
+
+ bool SupportsBatchOperation() final { return SupportsBatchOperationT; }
+
+ // A concrete implementation of LinearAlgebraOpBase::ComputeMatrix().
+ void ComputeMatrix(OpKernelContext* context, int64 matrix_index,
+ const Tensor& in, const TensorShape& input_matrix_shape,
+ Tensor* out, const TensorShape& output_matrix_shape) final;
+};
+
+// Declare that LinearAlgebraOp is explicitly instantiated in
+// linalg_ops_common.cc for float and double.
+extern template class LinearAlgebraOp<float, false>;
+extern template class LinearAlgebraOp<float, true>;
+extern template class LinearAlgebraOp<double, false>;
+extern template class LinearAlgebraOp<double, true>;
+
+} // namespace tensorflow
+
+#define REGISTER_LINALG_OP(OpName, OpClass, Scalar) \
+ REGISTER_KERNEL_BUILDER( \
+ Name(OpName).Device(DEVICE_CPU).TypeConstraint<Scalar>("T"), OpClass)
+
+#endif // TENSORFLOW_KERNELS_LINALG_OPS_COMMON_H_
diff --git a/tensorflow/core/kernels/listdiff_op.cc b/tensorflow/core/kernels/listdiff_op.cc
new file mode 100644
index 0000000000..f490f5ddd3
--- /dev/null
+++ b/tensorflow/core/kernels/listdiff_op.cc
@@ -0,0 +1,75 @@
+#include <unordered_set>
+#include <utility>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+template <typename T>
+class ListDiffOp : public OpKernel {
+ public:
+ explicit ListDiffOp(OpKernelConstruction* context) : OpKernel(context) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(context, context->MatchSignature({dt, dt}, {dt, DT_INT32}));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& x = context->input(0);
+ const Tensor& y = context->input(1);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(x.shape()),
+ errors::InvalidArgument("x should be a 1D vector."));
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(y.shape()),
+ errors::InvalidArgument("y should be a 1D vector."));
+
+ std::unordered_set<T> y_set;
+ const auto Ty = y.vec<T>();
+ const int y_size = Ty.size();
+ y_set.reserve(y_size);
+ for (int i = 0; i < y_size; ++i) {
+ y_set.insert(Ty(i));
+ }
+
+ // Compute the size of the output.
+ const auto Tx = x.vec<T>();
+ const int x_size = Tx.size();
+
+ int out_size = 0;
+ for (int i = 0; i < x_size; ++i) {
+ if (y_set.count(Tx(i)) == 0) {
+ ++out_size;
+ }
+ }
+
+ // Allocate and populate outputs.
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, {out_size}, &out));
+ auto Tout = out->vec<T>();
+
+ Tensor* indices = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(1, {out_size}, &indices));
+ auto Tindices = indices->vec<int32>();
+
+ for (int i = 0, p = 0; i < x_size; ++i) {
+ if (y_set.count(Tx(i)) == 0) {
+ Tout(p) = Tx(i);
+ Tindices(p) = i;
+ p++;
+ }
+ }
+ }
+};
+
+#define REGISTER_LISTDIFF(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ListDiff").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ListDiffOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_LISTDIFF);
+#undef REGISTER_LISTDIFF
+
+} // namespace tensorflow
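A standalone sketch of the ListDiff semantics implemented above: keep every element of x that does not occur in y, preserving order, and report the original index of each kept element. The inputs here are illustrative.

#include <cstdio>
#include <unordered_set>
#include <vector>

int main() {
  const std::vector<int> x = {1, 2, 3, 4, 5, 6};
  const std::vector<int> y = {1, 3, 5};

  std::unordered_set<int> y_set(y.begin(), y.end());
  std::vector<int> out, idx;
  for (int i = 0; i < static_cast<int>(x.size()); ++i) {
    if (y_set.count(x[i]) == 0) {  // x[i] is not in y: keep it
      out.push_back(x[i]);
      idx.push_back(i);
    }
  }
  for (size_t p = 0; p < out.size(); ++p)
    std::printf("out[%zu]=%d idx=%d\n", p, out[p], idx[p]);  // 2@1, 4@3, 6@5
  return 0;
}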
diff --git a/tensorflow/core/kernels/logging_ops.cc b/tensorflow/core/kernels/logging_ops.cc
new file mode 100644
index 0000000000..ec84145f75
--- /dev/null
+++ b/tensorflow/core/kernels/logging_ops.cc
@@ -0,0 +1,77 @@
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+class AssertOp : public OpKernel {
+ public:
+ explicit AssertOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("summarize", &summarize_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& cond = ctx->input(0);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(cond.shape()),
+ errors::InvalidArgument("In[0] should be a scalar: ",
+ cond.shape().ShortDebugString()));
+
+ if (cond.scalar<bool>()()) {
+ return;
+ }
+ string msg = "assertion failed: ";
+ for (int i = 1; i < ctx->num_inputs(); ++i) {
+ strings::StrAppend(&msg, "[", ctx->input(i).SummarizeValue(summarize_),
+ "]");
+ if (i < ctx->num_inputs() - 1) strings::StrAppend(&msg, " ");
+ }
+ ctx->SetStatus(errors::InvalidArgument(msg));
+ }
+
+ private:
+ int32 summarize_ = 0;
+};
+
+REGISTER_KERNEL_BUILDER(Name("Assert").Device(DEVICE_CPU), AssertOp);
+
+class PrintOp : public OpKernel {
+ public:
+ explicit PrintOp(OpKernelConstruction* ctx)
+ : OpKernel(ctx), call_counter_(0) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("message", &message_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("first_n", &first_n_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("summarize", &summarize_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (IsRefType(ctx->input_dtype(0))) {
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ } else {
+ ctx->set_output(0, ctx->input(0));
+ }
+ if (first_n_ >= 0) {
+ mutex_lock l(mu_);
+ if (call_counter_ >= first_n_) return;
+ call_counter_++;
+ }
+ string msg;
+ strings::StrAppend(&msg, message_);
+ for (int i = 1; i < ctx->num_inputs(); ++i) {
+ strings::StrAppend(&msg, "[", ctx->input(i).SummarizeValue(summarize_),
+ "]");
+ }
+ LOG(INFO) << msg;
+ }
+
+ private:
+ mutex mu_;
+ int64 call_counter_ GUARDED_BY(mu_) = 0;
+ int64 first_n_ = 0;
+ int32 summarize_ = 0;
+ string message_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("Print").Device(DEVICE_CPU), PrintOp);
+
+} // end namespace tensorflow
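A minimal standalone sketch of the first_n gating used by PrintOp above: with first_n >= 0 only the first N calls log, while a negative first_n logs every call. The FirstN helper is an illustrative stand-in for the counter guarded by the op's mutex.

#include <cstdio>

struct FirstN {
  long long first_n;
  long long calls;
  bool ShouldLog() {
    if (first_n < 0) return true;        // negative first_n: always log
    if (calls >= first_n) return false;  // quota exhausted: skip
    ++calls;
    return true;
  }
};

int main() {
  FirstN gate{3, 0};
  for (int i = 0; i < 4; ++i)
    std::printf("call %d -> %s\n", i, gate.ShouldLog() ? "log" : "skip");  // 3 logs, then skip
  return 0;
}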
diff --git a/tensorflow/core/kernels/logging_ops_test.cc b/tensorflow/core/kernels/logging_ops_test.cc
new file mode 100644
index 0000000000..a7af6eb303
--- /dev/null
+++ b/tensorflow/core/kernels/logging_ops_test.cc
@@ -0,0 +1,87 @@
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+namespace {
+
+class PrintingGraphTest : public OpsTestBase {
+ protected:
+ Status Init(DataType input_type1, DataType input_type2, string msg = "",
+ int first_n = -1, int summarize = 3) {
+ RequireDefaultOps();
+ TF_CHECK_OK(NodeDefBuilder("op", "Print")
+ .Input(FakeInput(input_type1))
+ .Input(FakeInput(2, input_type2))
+ .Attr("message", msg)
+ .Attr("first_n", first_n)
+ .Attr("summarize", summarize)
+ .Finalize(node_def()));
+ return InitOp();
+ }
+};
+
+TEST_F(PrintingGraphTest, Int32Success_6) {
+ ASSERT_OK(Init(DT_INT32, DT_INT32));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(PrintingGraphTest, Int32Success_Summarize6) {
+ ASSERT_OK(Init(DT_INT32, DT_INT32, "", -1, 6));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(PrintingGraphTest, StringSuccess) {
+ ASSERT_OK(Init(DT_INT32, DT_STRING));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<string>(TensorShape({}), {"foo"});
+ AddInputFromArray<string>(TensorShape({}), {"bar"});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(PrintingGraphTest, MsgSuccess) {
+ ASSERT_OK(Init(DT_INT32, DT_STRING, "Message: "));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<string>(TensorShape({}), {"foo"});
+ AddInputFromArray<string>(TensorShape({}), {"bar"});
+ ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+TEST_F(PrintingGraphTest, FirstNSuccess) {
+ ASSERT_OK(Init(DT_INT32, DT_STRING, "", 3));
+ AddInputFromArray<int32>(TensorShape({6}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<string>(TensorShape({}), {"foo"});
+ AddInputFromArray<string>(TensorShape({}), {"bar"});
+  // Run 4 times, but only the first 3 calls print, as intended.
+ for (int i = 0; i < 4; i++) ASSERT_OK(RunOpKernel());
+ Tensor expected(allocator(), DT_INT32, TensorShape({6}));
+ test::FillValues<int32>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<int32>(expected, *GetOutput(0));
+}
+
+} // end namespace
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/lookup_table_init_op.cc b/tensorflow/core/kernels/lookup_table_init_op.cc
new file mode 100644
index 0000000000..9781bcfa59
--- /dev/null
+++ b/tensorflow/core/kernels/lookup_table_init_op.cc
@@ -0,0 +1,116 @@
+#define EIGEN_USE_THREADS
+
+#include <string>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/initializable_lookup_table.h"
+#include "tensorflow/core/kernels/lookup_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace lookup {
+
+// Iterator to initialize tables given 'keys' and 'values' tensors.
+//
+// The two tensors are returned in the first iteration. The iterator does not
+// loop over each element of the tensors, since insertions into the lookup
+// table can be processed in batches.
+class KeyValueTensorIterator
+ : public InitializableLookupTable::InitTableIterator {
+ public:
+ // keys and values are not owned by the iterator.
+ explicit KeyValueTensorIterator(const Tensor* keys, const Tensor* values)
+ : keys_(keys), values_(values), valid_(true), status_(Status::OK()) {
+ TensorShape key_shape = keys_->shape();
+ if (!key_shape.IsSameSize(values_->shape())) {
+ valid_ = false;
+ status_ = errors::InvalidArgument(
+ "keys and values should have the same dimension.",
+ key_shape.DebugString(), " vs ", values_->shape().DebugString());
+ }
+ if (key_shape.num_elements() == 0) {
+ valid_ = false;
+ status_ =
+ errors::InvalidArgument("keys and values cannot be empty tensors.");
+ }
+ }
+
+ bool Valid() const override { return valid_; }
+
+ void Next() override {
+ valid_ = false;
+ status_ = errors::OutOfRange("No more data.");
+ }
+
+ const Tensor& keys() const override { return *keys_; }
+
+ const Tensor& values() const override { return *values_; }
+
+ Status status() const override { return status_; }
+
+  int64 total_size() const override {
+ return keys_ == nullptr ? -1 : keys_->NumElements();
+ }
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(KeyValueTensorIterator);
+
+ const Tensor* keys_; // Doesn't own it.
+ const Tensor* values_; // Doesn't own it.
+ bool valid_; // true if the iterator points to an existing range.
+ Status status_;
+};
+
+} // namespace lookup
+
+// Kernel to initialize a lookup table given key and value tensors.
+// After this operation, the table becomes read-only.
+class InitializeTableOp : public OpKernel {
+ public:
+ explicit InitializeTableOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ mutex_lock l(mu_);
+ lookup::InitializableLookupTable* table;
+ OP_REQUIRES_OK(ctx,
+ GetInitializableLookupTable("table_handle", ctx, &table));
+ core::ScopedUnref unref_me(table);
+
+ DataTypeVector expected_inputs = {DT_STRING_REF, table->key_dtype(),
+ table->value_dtype()};
+ DataTypeVector expected_outputs = {};
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature(expected_inputs, expected_outputs));
+
+ const Tensor& keys = ctx->input(1);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(keys.shape()),
+ errors::InvalidArgument("Keys must be a vector, but received ",
+ keys.shape().DebugString()));
+
+ const Tensor& values = ctx->input(2);
+ OP_REQUIRES(
+ ctx, TensorShapeUtils::IsVector(values.shape()),
+ errors::InvalidArgument("Values must be a vector, but received ",
+ values.shape().DebugString()));
+
+ OP_REQUIRES(ctx, keys.NumElements() == values.NumElements(),
+ errors::InvalidArgument(
+ "Keys and values must have the same size ",
+ keys.NumElements(), " vs ", values.NumElements()));
+
+ lookup::KeyValueTensorIterator iter(&keys, &values);
+ OP_REQUIRES_OK(ctx, table->Initialize(iter));
+ }
+
+ private:
+ mutex mu_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("InitializeTable").Device(DEVICE_CPU),
+ InitializeTableOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/lookup_table_op.cc b/tensorflow/core/kernels/lookup_table_op.cc
new file mode 100644
index 0000000000..2bab4df94f
--- /dev/null
+++ b/tensorflow/core/kernels/lookup_table_op.cc
@@ -0,0 +1,166 @@
+#include "tensorflow/core/kernels/lookup_table_op.h"
+#define EIGEN_USE_THREADS
+
+#include <string>
+#include <utility>
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/initializable_lookup_table.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/hash/hash.h"
+
+namespace tensorflow {
+namespace lookup {
+
+// Lookup table that wraps an unordered_map, where the key and value data type
+// is specified.
+//
+// This table is recommended for any variation in key values.
+//
+// For lookups, the table must be initialized (allocated and populated). Once
+// the table is marked as initialized it becomes read-only.
+//
+// Sample use case:
+//
+// HashTable<int64, int64> table; // int64 -> int64.
+// table.Prepare(10); // Prepare the underlying data structure, the number of
+// // elements is required by the interface, but not used.
+// // Populate the table, elements could be added in one or multiple calls.
+// table.Insert(key_tensor, value_tensor); // Populate the table.
+// ...
+// table.set_is_initialized();
+//
+// table.Find(in_t, &out_t, default_t)
+//
+template <class K, class V>
+class HashTable : public InitializableLookupTable {
+ public:
+ size_t size() const override { return table_ ? table_->size() : 0; }
+
+ DataType key_dtype() const override { return DataTypeToEnum<K>::v(); }
+
+ DataType value_dtype() const override { return DataTypeToEnum<V>::v(); }
+
+ protected:
+ Status DoPrepare(size_t unused) override {
+ if (is_initialized_) {
+ return errors::Aborted("HashTable already initialized.");
+ }
+ if (!table_) {
+ table_ = std::unique_ptr<std::unordered_map<K, V>>(
+ new std::unordered_map<K, V>());
+ }
+ return Status::OK();
+  }
+
+ Status DoInsert(const Tensor& keys, const Tensor& values) override {
+ if (!table_) {
+ return errors::FailedPrecondition("HashTable is not prepared.");
+ }
+
+ const auto key_values = keys.flat<K>();
+ const auto value_values = values.flat<V>();
+ for (size_t i = 0; i < key_values.size(); ++i) {
+ const K& key = key_values(i);
+ const V& value = value_values(i);
+ const V& previous_value = gtl::LookupOrInsert(table_.get(), key, value);
+ if (previous_value != value) {
+ return errors::FailedPrecondition(
+ "HashTable has different value for same key. Key ", key, " has ",
+ previous_value, " and trying to add value ", value);
+ }
+ }
+ return Status::OK();
+ }
+
+ Status DoFind(const Tensor& key, Tensor* value,
+ const Tensor& default_value) override {
+ const V default_val = default_value.flat<V>()(0);
+ const auto key_values = key.flat<K>();
+ auto value_values = value->flat<V>();
+
+ for (size_t i = 0; i < key_values.size(); ++i) {
+ value_values(i) =
+ gtl::FindWithDefault(*table_, key_values(i), default_val);
+ }
+ return Status::OK();
+ }
+
+ private:
+ std::unique_ptr<std::unordered_map<K, V>> table_;
+};
+
+} // namespace lookup
+
+// Table lookup op. Perform the lookup operation on the given table.
+class LookupTableFindOp : public OpKernel {
+ public:
+ explicit LookupTableFindOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ lookup::LookupInterface* table;
+ OP_REQUIRES_OK(ctx, GetLookupTable("table_handle", ctx, &table));
+ core::ScopedUnref unref_me(table);
+
+ DataTypeVector expected_inputs = {DT_STRING_REF, table->key_dtype(),
+ table->value_dtype()};
+ DataTypeVector expected_outputs = {table->value_dtype()};
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature(expected_inputs, expected_outputs));
+
+ const Tensor& input = ctx->input(1);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(input.shape()),
+ errors::InvalidArgument("Input must be a vector, not ",
+ input.shape().DebugString()));
+
+ const Tensor& default_value = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(default_value.shape()),
+ errors::InvalidArgument("Default value must be a scalar, not ",
+ default_value.shape().DebugString()));
+
+ Tensor* out;
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_output("output_values", input.shape(), &out));
+
+ OP_REQUIRES_OK(ctx, table->Find(input, out, default_value));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("LookupTableFind").Device(DEVICE_CPU),
+ LookupTableFindOp);
+
+// Op that returns the size of the given table.
+class LookupTableSizeOp : public OpKernel {
+ public:
+ explicit LookupTableSizeOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ lookup::LookupInterface* table;
+ OP_REQUIRES_OK(ctx, GetLookupTable("table_handle", ctx, &table));
+ core::ScopedUnref unref_me(table);
+
+ Tensor* out;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output("size", TensorShape({}), &out));
+ out->flat<int64>().setConstant(table->size());
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("LookupTableSize").Device(DEVICE_CPU),
+ LookupTableSizeOp);
+
+// Register the HashTable op with the currently supported key and value types.
+#define REGISTER_KERNEL(key_dtype, value_dtype) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("HashTable") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<key_dtype>("key_dtype") \
+ .TypeConstraint<value_dtype>("value_dtype"), \
+ LookupTableOp<lookup::HashTable<key_dtype, value_dtype>, key_dtype, \
+ value_dtype>)
+
+REGISTER_KERNEL(string, int64);
+REGISTER_KERNEL(int64, string);
+
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
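A standalone sketch of the HashTable::DoFind contract above: every key maps to its stored value, and keys that were never inserted fall back to the given default value. The map contents and default are illustrative only.

#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
  std::unordered_map<std::string, long long> table = {{"cat", 0}, {"dog", 1}};
  const long long kDefault = -1;  // stands in for the scalar default_value tensor

  const std::vector<std::string> keys = {"cat", "fish", "dog"};
  for (const auto& k : keys) {
    auto it = table.find(k);
    long long v = (it == table.end()) ? kDefault : it->second;  // FindWithDefault
    std::printf("%s -> %lld\n", k.c_str(), v);  // cat->0, fish->-1, dog->1
  }
  return 0;
}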
diff --git a/tensorflow/core/kernels/lookup_table_op.h b/tensorflow/core/kernels/lookup_table_op.h
new file mode 100644
index 0000000000..dc53ce33a6
--- /dev/null
+++ b/tensorflow/core/kernels/lookup_table_op.h
@@ -0,0 +1,80 @@
+#ifndef TENSORFLOW_KERNELS_LOOKUP_TABLE_OP_H_
+#define TENSORFLOW_KERNELS_LOOKUP_TABLE_OP_H_
+
+#include "tensorflow/core/framework/lookup_interface.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/kernels/lookup_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+// Lookup table op that supports different table implementations specified by
+// the 'Container' template. Container must be derived from LookupInterface. The
+// key and value are of the templated types "key_dtype" and "value_dtype",
+// respectively.
+template <class Container, class key_dtype, class value_dtype>
+class LookupTableOp : public OpKernel {
+ public:
+ // ctx is not owned by this class.
+ explicit LookupTableOp(OpKernelConstruction* ctx)
+ : OpKernel(ctx), table_handle_set_(false) {
+ OP_REQUIRES_OK(ctx, ctx->allocate_persistent(tensorflow::DT_STRING,
+ tensorflow::TensorShape({2}),
+ &table_handle_, nullptr));
+ }
+
+ // ctx is not owned by this function.
+ void Compute(OpKernelContext* ctx) override {
+ mutex_lock l(mu_);
+ if (!table_handle_set_) {
+ OP_REQUIRES_OK(ctx, cinfo_.Init(ctx->resource_manager(), def()));
+ auto creator = [this](lookup::LookupInterface** ret) {
+ *ret = new Container();
+ return Status::OK();
+ };
+
+ lookup::LookupInterface* table = nullptr;
+ OP_REQUIRES_OK(
+ ctx, cinfo_.resource_manager()
+ ->template LookupOrCreate<lookup::LookupInterface>(
+ cinfo_.container(), cinfo_.name(), &table, creator));
+ core::ScopedUnref unref_me(table);
+
+ OP_REQUIRES_OK(ctx, lookup::CheckTableDataTypes(
+ *table, DataTypeToEnum<key_dtype>::v(),
+ DataTypeToEnum<value_dtype>::v(), cinfo_.name()));
+
+ auto h = table_handle_.AccessTensor(ctx)->template flat<string>();
+ h(0) = cinfo_.container();
+ h(1) = cinfo_.name();
+ table_handle_set_ = true;
+ }
+ ctx->set_output_ref(0, &mu_, table_handle_.AccessTensor(ctx));
+ }
+
+ ~LookupTableOp() override {
+ // If the table object was not shared, delete it.
+ if (table_handle_set_ && cinfo_.resource_is_private_to_kernel()) {
+ TF_CHECK_OK(
+ cinfo_.resource_manager()->template Delete<lookup::LookupInterface>(
+ cinfo_.container(), cinfo_.name()));
+ }
+ }
+
+ private:
+ mutex mu_;
+ PersistentTensor table_handle_ GUARDED_BY(mu_);
+ bool table_handle_set_ GUARDED_BY(mu_);
+ ContainerInfo cinfo_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(LookupTableOp);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_LOOKUP_TABLE_OP_H_
diff --git a/tensorflow/core/kernels/lookup_util.cc b/tensorflow/core/kernels/lookup_util.cc
new file mode 100644
index 0000000000..634c11e4a5
--- /dev/null
+++ b/tensorflow/core/kernels/lookup_util.cc
@@ -0,0 +1,72 @@
+#include "tensorflow/core/kernels/lookup_util.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+namespace lookup {
+namespace {
+
+Status GetTableHandle(const string& input_name, OpKernelContext* ctx,
+ string* container, string* table_handle) {
+ {
+ mutex* mu;
+ TF_RETURN_IF_ERROR(ctx->input_ref_mutex(input_name, &mu));
+ mutex_lock l(*mu);
+ Tensor tensor;
+ TF_RETURN_IF_ERROR(ctx->mutable_input(input_name, &tensor, true));
+ if (tensor.NumElements() != 2) {
+ return errors::InvalidArgument(
+ "Lookup table handle must be scalar, but had shape: ",
+ tensor.shape().DebugString());
+ }
+ auto h = tensor.flat<string>();
+ *container = h(0);
+ *table_handle = h(1);
+ }
+ return Status::OK();
+}
+
+} // namespace
+
+Status GetLookupTable(const string& input_name, OpKernelContext* ctx,
+ LookupInterface** table) {
+ string container;
+ string table_handle;
+ TF_RETURN_IF_ERROR(
+ GetTableHandle(input_name, ctx, &container, &table_handle));
+ return ctx->resource_manager()->Lookup(container, table_handle, table);
+}
+
+Status GetInitializableLookupTable(const string& input_name,
+ OpKernelContext* ctx,
+ InitializableLookupTable** table) {
+ string container;
+ string table_handle;
+ TF_RETURN_IF_ERROR(
+ GetTableHandle(input_name, ctx, &container, &table_handle));
+ LookupInterface* lookup_table;
+ TF_RETURN_IF_ERROR(
+ ctx->resource_manager()->Lookup(container, table_handle, &lookup_table));
+ *table = dynamic_cast<InitializableLookupTable*>(lookup_table);
+ if (*table == nullptr) {
+ lookup_table->Unref();
+ return errors::InvalidArgument("Table ", container, " ", table_handle,
+ " is not initializable");
+ }
+ return Status::OK();
+}
+
+Status CheckTableDataTypes(const LookupInterface& table, DataType key_dtype,
+ DataType value_dtype, const string& table_name) {
+ if (table.key_dtype() != key_dtype || table.value_dtype() != value_dtype) {
+ return errors::InvalidArgument(
+ "Conflicting key/value dtypes ", key_dtype, "->", value_dtype, " with ",
+ table.key_dtype(), "->", table.value_dtype(), " for table ", table_name);
+ }
+ return Status::OK();
+}
+
+} // namespace lookup
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/lookup_util.h b/tensorflow/core/kernels/lookup_util.h
new file mode 100644
index 0000000000..991a757edd
--- /dev/null
+++ b/tensorflow/core/kernels/lookup_util.h
@@ -0,0 +1,31 @@
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_LOOKUP_UTIL_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_LOOKUP_UTIL_H_
+
+#include "tensorflow/core/framework/lookup_interface.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/initializable_lookup_table.h"
+
+namespace tensorflow {
+namespace lookup {
+
+// Gets the LookupTable stored in the ctx->resource_manager() under the
+// handle read from the ref input named input_name. Returns a non-OK status
+// if the table doesn't exist.
+Status GetLookupTable(const string& input_name, OpKernelContext* ctx,
+ LookupInterface** table);
+
+// Gets the InitializableLookupTable stored in the ctx->resource_manager()
+// under the handle read from the ref input named input_name. Returns a
+// non-OK status if the table doesn't exist or is not initializable.
+Status GetInitializableLookupTable(const string& input_name,
+ OpKernelContext* ctx,
+ InitializableLookupTable** table);
+
+// Verifies that the given key_dtype and value_dtype match the corresponding
+// table's data types.
+Status CheckTableDataTypes(const LookupInterface& table, DataType key_dtype,
+ DataType value_dtype, const string& table_name);
+} // namespace lookup
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_LOOKUP_UTIL_H_
diff --git a/tensorflow/core/kernels/lrn_op.cc b/tensorflow/core/kernels/lrn_op.cc
new file mode 100644
index 0000000000..e5abf5906f
--- /dev/null
+++ b/tensorflow/core/kernels/lrn_op.cc
@@ -0,0 +1,228 @@
+// LRN = Local Response Normalization
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+#ifndef __ANDROID__
+#include "tensorflow/core/util/work_sharder.h"
+#endif
+
+namespace tensorflow {
+
+// Create a depth-by-depth band matrix with 1s along a swath of size (2 *
+// depth_radius + 1) around the diagonal.
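+// For example, depth = 4 and depth_radius = 1 produce the band matrix
+//   1 1 0 0
+//   1 1 1 0
+//   0 1 1 1
+//   0 0 1 1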
+static void GetBandMatrix(int depth, int64 depth_radius,
+ Eigen::Tensor<float, 2, Eigen::RowMajor>* result) {
+ result->setZero();
+ for (int row = 0; row < depth; ++row) {
+ const int begin = std::max<int>(0, row - depth_radius);
+ const int end = std::min<int64>(depth, row + depth_radius + 1);
+ Eigen::DSizes<ptrdiff_t, 2> start(row, begin);
+ Eigen::DSizes<ptrdiff_t, 2> sizes(1, end - begin);
+ result->slice(start, sizes).setConstant(1.0f);
+ }
+}
+
+class LRNOp : public OpKernel {
+ public:
+ explicit LRNOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("depth_radius", &depth_radius_));
+ OP_REQUIRES_OK(context, context->GetAttr("bias", &bias_));
+ OP_REQUIRES_OK(context, context->GetAttr("alpha", &alpha_));
+ OP_REQUIRES_OK(context, context->GetAttr("beta", &beta_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& in = context->input(0);
+ OP_REQUIRES(context, in.dims() == 4,
+ errors::InvalidArgument("in must be 4-dimensional"));
+ const int64 batch = in.dim_size(0);
+ const int64 rows = in.dim_size(1);
+ const int64 cols = in.dim_size(2);
+ const int64 depth = in.dim_size(3);
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(
+ 0, TensorShape({batch, rows, cols, depth}), &output));
+
+#ifdef __ANDROID__
+ MognetLRN(in, batch, rows, cols, depth, output);
+#else
+ const int nodes = cols * rows;
+ auto in_shaped = in.shaped<float, 2>({nodes * batch, depth});
+
+ // Multiplying the squared input by the band matrix sums each
+ // (2 * depth_radius + 1)-wide window along the depth dimension.
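+ // Overall, for each (batch, row, col) position the expression below computes
+ //   out(d) = in(d) / (bias + alpha * sum_j in(j)^2)^beta
+ // with j ranging over [d - depth_radius, d + depth_radius], clamped to the
+ // valid depth range.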
+ Eigen::Tensor<float, 2, Eigen::RowMajor> multiplier(depth, depth);
+ GetBandMatrix(depth, depth_radius_, &multiplier);
+
+ auto out_shaped = output->shaped<float, 2>({nodes * batch, depth});
+ Eigen::array<DimPair, 1> dims = {{DimPair(1, 0)}};
+ /// TODO(keveman): Optimize for beta in {0, 1, 0.5}
+ out_shaped.device(context->eigen_cpu_device()) =
+ in_shaped /
+ in_shaped.square()
+ .contract(multiplier, dims)
+ .unaryExpr([this](float x) { return bias_ + alpha_ * x; })
+ .pow(beta_);
+#endif
+ }
+
+ private:
+ typedef Eigen::Tensor<float, 1, Eigen::RowMajor>::DimensionPair DimPair;
+
+ void MognetLRN(const Tensor& in, const int batch, const int rows,
+ const int cols, const int depth, Tensor* out) {
+ Eigen::Map<const Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic>>
+ data_in(in.flat<float>().data(), depth, batch * rows * cols);
+
+ Eigen::Map<Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic>> data_out(
+ out->flat<float>().data(), depth, batch * rows * cols);
+
+ const int double_depth_radius = depth_radius_ * 2;
+ Eigen::VectorXf padded_square(data_in.rows() + double_depth_radius);
+ padded_square.setZero();
+ for (int r = 0; r < data_in.cols(); ++r) {
+ // Do local response normalization for data_in(:, r)
+ // First, compute the squares and store them in a buffer for repeated use.
+ padded_square.block(depth_radius_, 0, data_out.rows(), 1) =
+ data_in.col(r).cwiseProduct(data_in.col(r)) * alpha_;
+ // Then, compute the scales and write them to data_out.
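+ // accumulated_scale maintains a running sum over a sliding window of
+ // (2 * depth_radius_ + 1) squared entries centered at i: each iteration
+ // adds the entry entering the window and removes the one leaving it.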
+ float accumulated_scale = 0;
+ for (int i = 0; i < double_depth_radius; ++i) {
+ accumulated_scale += padded_square(i);
+ }
+ for (int i = 0; i < data_in.rows(); ++i) {
+ accumulated_scale += padded_square(i + double_depth_radius);
+ data_out(i, r) = bias_ + accumulated_scale;
+ accumulated_scale -= padded_square(i);
+ }
+ }
+
+ // Special-case beta values (1 and 0.5) for which the pow can be computed
+ // with cheaper operations.
+ if (beta_ == 1) {
+ data_out.array() = data_in.array() * data_out.array().inverse();
+ } else if (beta_ == 0.5) {
+ data_out.array() = data_in.array() * data_out.array().sqrt().inverse();
+ } else {
+ data_out.array() = data_in.array() * data_out.array().pow(-beta_);
+ }
+ }
+
+ int64 depth_radius_;
+ float bias_;
+ float alpha_;
+ float beta_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("LRN").Device(DEVICE_CPU), LRNOp);
+
+#ifndef __ANDROID__
+
+class LRNGradOp : public OpKernel {
+ public:
+ explicit LRNGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("depth_radius", &depth_radius_));
+ OP_REQUIRES_OK(context, context->GetAttr("bias", &bias_));
+ OP_REQUIRES_OK(context, context->GetAttr("alpha", &alpha_));
+ OP_REQUIRES_OK(context, context->GetAttr("beta", &beta_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& in_grads = context->input(0);
+ const Tensor& in_image = context->input(1);
+ const Tensor& out_image = context->input(2);
+
+ OP_REQUIRES(context, in_grads.dims() == 4 && in_image.dims() == 4,
+ errors::InvalidArgument("inputs must be 4-dimensional"));
+ const int64 batch = in_grads.dim_size(0);
+ const int64 rows = in_grads.dim_size(1);
+ const int64 cols = in_grads.dim_size(2);
+ const int64 depth = in_grads.dim_size(3);
+ OP_REQUIRES(
+ context,
+ in_image.dim_size(0) == batch && in_image.dim_size(1) == rows &&
+ in_image.dim_size(2) == cols && in_image.dim_size(3) == depth &&
+ out_image.dim_size(0) == batch && out_image.dim_size(1) == rows &&
+ out_image.dim_size(2) == cols && out_image.dim_size(3) == depth,
+ errors::InvalidArgument(
+ "input_grads, input_image, and out_image should have the same "
+ "shape"));
+ const auto nodes = cols * rows;
+ auto grads_shaped = in_grads.shaped<float, 2>({nodes * batch, depth});
+ auto in_shaped = in_image.shaped<float, 2>({nodes * batch, depth});
+ auto activations = out_image.shaped<float, 2>({nodes * batch, depth});
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(
+ 0, TensorShape({batch, rows, cols, depth}), &output));
+ auto out_shaped = output->shaped<float, 2>({nodes * batch, depth});
+ out_shaped.setZero();
+
+ auto shard = [this, activations, in_shaped, grads_shaped, out_shaped,
+ depth](int64 begin, int64 end) {
+ for (int64 i = begin; i < end; ++i) {
+ for (int64 j = 0; j < depth; ++j) {
+ // Let y be the LRN activations and x be the inputs along the depth
+ // dimension. (LRN operates independently along rows, cols, and
+ // batch.)
+ // We have
+ //   y_i = x_i / (bias + alpha * (sum_{j = i - depth_radius}^{i + depth_radius}
+ //                                x_j^2))^beta
+ //
+ // Let N = bias + alpha * (sum_{j = i - depth_radius}^{i + depth_radius} x_j^2).
+ // Then
+ //   dy_i/dx_i = (N^beta - x_i * beta * N^(beta-1) * 2 * alpha * x_i) / N^(2*beta)
+ //   dy_i/dx_j = (       - x_i * beta * N^(beta-1) * 2 * alpha * x_j) / N^(2*beta)
+ //
+ // NOTE(keveman): We can compute N by doing (y_i / x_i)^(1/beta).
+ // However, this is numerically unstable for small values of x_i. We
+ // compute N explicitly here to avoid that.
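+ // Combining both cases with y_i = x_i * N^(-beta) gives
+ //   dy_i/dx_j = -2 * alpha * beta * x_j * y_i / N + (i == j ? N^(-beta) : 0),
+ // which is the quantity the inner loop below accumulates into out_shaped.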
+
+ int64 depth_begin = std::max<int64>(0, j - depth_radius_);
+ int64 depth_end = std::min<int64>(depth, j + depth_radius_ + 1);
+
+ float norm = 0.0f;
+ for (int64 k = depth_begin; k < depth_end; ++k) {
+ norm += in_shaped(i, k) * in_shaped(i, k);
+ }
+ norm = alpha_ * norm + bias_;
+ DCHECK_GT(norm, 1e-6);
+ for (int64 k = depth_begin; k < depth_end; ++k) {
+ float dyi = -2.0f * alpha_ * beta_ * in_shaped(i, k) *
+ activations(i, j) / norm;
+ if (k == j) {
+ dyi += std::pow(norm, -beta_);
+ }
+ dyi *= grads_shaped(i, j);
+ const_cast<TTypes<float, 2>::Tensor&>(out_shaped)(i, k) += dyi;
+ }
+ }
+ }
+ };
+ auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
+ Shard(worker_threads.num_threads, worker_threads.workers, nodes * batch,
+ depth * depth, shard);
+ }
+
+ private:
+ typedef Eigen::Tensor<float, 1, Eigen::RowMajor>::DimensionPair DimPair;
+
+ int64 depth_radius_;
+ float bias_;
+ float alpha_;
+ float beta_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("LRNGrad").Device(DEVICE_CPU), LRNGradOp);
+
+#endif // __ANDROID__
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/lrn_op_test.cc b/tensorflow/core/kernels/lrn_op_test.cc
new file mode 100644
index 0000000000..4c338b6cb3
--- /dev/null
+++ b/tensorflow/core/kernels/lrn_op_test.cc
@@ -0,0 +1,185 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+static const float tol_ = 1e-4;
+
+class LRNFloatTest : public OpsTestBase {
+ protected:
+ LRNFloatTest() : philox_(123, 17), rand_(&philox_) { RequireDefaultOps(); }
+
+ int GetIntAttr(const string& name) {
+ int value;
+ TF_CHECK_OK(GetNodeAttr(*node_def(), name, &value));
+ return value;
+ }
+
+ float GetFloatAttr(const string& name) {
+ float value;
+ TF_CHECK_OK(GetNodeAttr(*node_def(), name, &value));
+ return value;
+ }
+
+ bool Compare() {
+ const auto& input = GetInput(0);
+ const int64 batch_size = input.dim_size(0);
+ const int64 rows = input.dim_size(1);
+ const int64 cols = input.dim_size(2);
+ const int64 depth = input.dim_size(3);
+ const int64 rest = cols * rows * batch_size;
+
+ const int64 depth_radius = GetIntAttr("depth_radius");
+ const float bias = GetFloatAttr("bias");
+ const float alpha = GetFloatAttr("alpha");
+ const float beta = GetFloatAttr("beta");
+
+ Eigen::Tensor<float, 4, Eigen::RowMajor> expected(batch_size, rows, cols,
+ depth);
+ auto out = expected.reshape(Eigen::DSizes<int64, 2>{rest, depth});
+ auto in = input.shaped<float, 2>({rest, depth});
+
+ for (int64 i = 0; i < rest; ++i) {
+ Eigen::Tensor<float, 1, Eigen::RowMajor> out_col(depth);
+ for (int64 d = 0; d < depth; ++d) {
+ float denom = 0.0f;
+ for (int64 r = std::max(0ll, d - depth_radius);
+ r < std::min(depth, d + depth_radius + 1); ++r) {
+ denom += in(i, r) * in(i, r);
+ }
+ denom = std::pow(denom * alpha + bias, beta);
+ out_col(d) = in(i, d) / denom;
+ }
+ out.chip<0>(i) = out_col;
+ }
+ auto actual = GetOutput(0)->tensor<float, 4>();
+ Eigen::Tensor<float, 0, Eigen::RowMajor> sum =
+ ((expected - actual).abs() > actual.constant(tol_))
+ .select(actual.constant(1), actual.constant(0))
+ .sum();
+ return sum() == 0;
+ }
+
+ random::PhiloxRandom philox_;
+ random::SimplePhilox rand_;
+};
+
+TEST_F(LRNFloatTest, Depth96) {
+ ASSERT_OK(NodeDefBuilder("lrn_op", "LRN")
+ .Input(FakeInput())
+ .Attr("depth_radius", 5)
+ .Attr("bias", 1.0f)
+ .Attr("alpha", 0.1f)
+ .Attr("beta", 2.0f)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ AddInput<float>(TensorShape({1, 1, 1, 96}),
+ [this](int i) -> float { return i + 1; });
+ ASSERT_OK(RunOpKernel());
+ auto actual = GetOutput(0)->tensor<float, 4>();
+
+ // Output for Node 0 with Value 1:
+ // 1 / (1 + 0.1*(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2))^2
+ EXPECT_NEAR(1. / (10.1 * 10.1), actual(0, 0, 0, 0), tol_);
+
+ // Output for Node 5 with Value 6:
+ // 6 / (1 + 0.1*(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 ... + 11^2))^2
+ EXPECT_NEAR(6. / (51.6 * 51.6), actual(0, 0, 0, 5), tol_);
+
+ // Output for Node 63 with value 64:
+ // 64 / (1 + 0.1*(59^2 + 60^2 + 61^2 + 62^2 + 63^2 + 64^2))^2
+ EXPECT_NEAR(64. / (2272.1 * 2272.1), actual(0, 0, 0, 63), tol_);
+
+ // Output for Node 64 with value 65:
+ // 65 / (1 + 0.1*(65^2 + 66^2 + 67^2 + 68^2 + 69^2 + 70^2))^2
+ EXPECT_NEAR(65. / (2736.5 * 2736.5), actual(0, 0, 0, 64), tol_);
+
+ // Output for Node 95 with value 96:
+ // 96 / (1 + 0.1*(91^2 + 92^2 + 93^2 + 94^2 + 95^2 + 96^2))^2
+ EXPECT_NEAR(96. / (5248.1 * 5248.1), actual(0, 0, 0, 95), tol_);
+ EXPECT_TRUE(Compare());
+}
+
+TEST_F(LRNFloatTest, Depth16) {
+ ASSERT_OK(NodeDefBuilder("lrn_op", "LRN")
+ .Input(FakeInput())
+ .Attr("depth_radius", 5)
+ .Attr("bias", 1.0f)
+ .Attr("alpha", 0.1f)
+ .Attr("beta", 2.0f)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ AddInput<float>(TensorShape({1, 1, 1, 16}),
+ [this](int i) -> float { return i + 1; });
+ ASSERT_OK(RunOpKernel());
+ auto actual = GetOutput(0)->tensor<float, 4>();
+
+ // Output for Node 0 with Value 1:
+ // 1 / (1 + 0.1*(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2))^2
+ EXPECT_NEAR(1. / (10.1 * 10.1), actual(0, 0, 0, 0), tol_);
+
+ // Output for Node 5 with Value 6:
+ // 6 / (1 + 0.1*(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 ... + 11^2))^2
+ EXPECT_NEAR(6. / (51.6 * 51.6), actual(0, 0, 0, 5), tol_);
+
+ // Output for Node 15 with value 16:
+ // 16 / (1 + 0.1*(11^2 + 12^2 + 13^2 + 14^2 + 15^2 + 16^2))^2
+ EXPECT_NEAR(16. / (112.1 * 112.1), actual(0, 0, 0, 15), tol_);
+ EXPECT_TRUE(Compare());
+}
+
+static double RndGaussian(random::SimplePhilox* rnd) {
+ // Box-Muller transformation.
+ // See, for example, http://www.taygeta.com/random/gaussian.html
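+ // (Specifically, this is the polar / Marsaglia variant, which
+ // rejection-samples a point in the unit disk and avoids trigonometric calls.)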
+ double x1, x2;
+ double r;
+ do {
+ x1 = 2 * rnd->RandDouble() - 1;
+ x2 = 2 * rnd->RandDouble() - 1;
+ r = x1 * x1 + x2 * x2;
+ } while (r == 0 || r >= 1.0);
+ double w = sqrt(-2.0 * log(r) / r);
+ return x1 * w;
+}
+
+#define TCASE(NAME, DEPTH, BATCH, DEPTH_RADIUS, BIAS, ALPHA, BETA) \
+ TEST_F(LRNFloatTest, NAME) { \
+ ASSERT_OK(NodeDefBuilder("lrn_op", "LRN") \
+ .Input(FakeInput()) \
+ .Attr("depth_radius", (DEPTH_RADIUS)) \
+ .Attr("bias", (BIAS)) \
+ .Attr("alpha", ((ALPHA) / 10)) \
+ .Attr("beta", (BETA)) \
+ .Finalize(node_def())); \
+ ASSERT_OK(InitOp()); \
+ AddInput<float>(TensorShape({BATCH, 1, 1, DEPTH}), \
+ [this](int i) -> float { return RndGaussian(&rand_); }); \
+ ASSERT_OK(RunOpKernel()); \
+ EXPECT_TRUE(Compare()); \
+ }
+
+// clang-format off
+// DEPTH BATCH DEPTH_RADIUS BIAS ALPHA BETA
+TCASE(T0, 4, 2, 2, 1.0f, 1.0f, 2.0f)
+TCASE(T1, 16, 1, 5, 1.0f, 1.0f, 2.0f)
+TCASE(T2, 16, 32, 2, 1.0f, 2.0f, 1.0f)
+TCASE(T3, 128, 4, 3, 2.0f, 1.0f, 1.0f)
+// clang-format on
+
+#undef TCASE
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/matching_files_op.cc b/tensorflow/core/kernels/matching_files_op.cc
new file mode 100644
index 0000000000..08a4da5b41
--- /dev/null
+++ b/tensorflow/core/kernels/matching_files_op.cc
@@ -0,0 +1,42 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/match.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class MatchingFilesOp : public OpKernel {
+ public:
+ using OpKernel::OpKernel;
+ void Compute(OpKernelContext* context) override {
+ const Tensor* pattern;
+ OP_REQUIRES_OK(context, context->input("pattern", &pattern));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(pattern->shape()),
+ errors::InvalidArgument(
+ "Input pattern tensor must be scalar, but had shape: ",
+ pattern->shape().DebugString()));
+ std::vector<string> fnames;
+ OP_REQUIRES_OK(context,
+ io::GetMatchingFiles(context->env(),
+ pattern->scalar<string>()(), &fnames));
+ const int num_out = fnames.size();
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ "filenames", TensorShape({num_out}), &output));
+ auto output_vec = output->vec<string>();
+ for (int i = 0; i < num_out; ++i) {
+ output_vec(i) = fnames[i];
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MatchingFiles").Device(DEVICE_CPU),
+ MatchingFilesOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/matmul_op.cc b/tensorflow/core/kernels/matmul_op.cc
new file mode 100644
index 0000000000..48bdba78b2
--- /dev/null
+++ b/tensorflow/core/kernels/matmul_op.cc
@@ -0,0 +1,214 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/matmul_op.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/fill_functor.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+#if GOOGLE_CUDA
+
+namespace {
+template <typename T>
+perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
+ perftools::gputools::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory));
+ perftools::gputools::DeviceMemory<T> typed(wrapped);
+ return typed;
+}
+} // namespace
+
+#endif // GOOGLE_CUDA
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T, bool USE_CUBLAS>
+struct LaunchMatMul;
+
+// On CPUs, we ignore USE_CUBLAS
+template <typename T>
+struct LaunchMatMulCPU {
+ static void launch(
+ OpKernelContext* ctx, OpKernel* kernel, const Tensor& a, const Tensor& b,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair,
+ Tensor* out) {
+ functor::MatMulFunctor<CPUDevice, T>()(ctx->eigen_device<CPUDevice>(),
+ out->matrix<T>(), a.matrix<T>(),
+ b.matrix<T>(), dim_pair);
+ }
+};
+
+template <typename T, bool USE_CUBLAS>
+struct LaunchMatMul<CPUDevice, T, USE_CUBLAS> : public LaunchMatMulCPU<T> {};
+
+#if GOOGLE_CUDA
+
+template <typename T>
+struct LaunchMatMul<GPUDevice, T, true /* USE_CUBLAS */> {
+ static void launch(
+ OpKernelContext* ctx, OpKernel* kernel, const Tensor& a, const Tensor& b,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair,
+ Tensor* out) {
+ perftools::gputools::blas::Transpose trans[] = {
+ perftools::gputools::blas::Transpose::kNoTranspose,
+ perftools::gputools::blas::Transpose::kTranspose};
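+ // m and n are the non-contracted (output) dimensions of a and b; k is the
+ // contracted inner dimension.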
+ const uint64 m = a.dim_size(1 - dim_pair[0].first);
+ const uint64 k = a.dim_size(dim_pair[0].first);
+ const uint64 n = b.dim_size(1 - dim_pair[0].second);
+ bool transpose_a = dim_pair[0].first == 0;
+ bool transpose_b = dim_pair[0].second == 1;
+ auto blas_transpose_a = trans[transpose_a];
+ auto blas_transpose_b = trans[transpose_b];
+
+ auto* stream = ctx->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(ctx, stream, errors::Internal("No GPU stream available."));
+
+ auto a_ptr = AsDeviceMemory(a.template flat<T>().data());
+ auto b_ptr = AsDeviceMemory(b.template flat<T>().data());
+ auto c_ptr = AsDeviceMemory(out->template flat<T>().data());
+
+ // Cublas does
+ // C = A x B
+ // where A, B and C are assumed to be in column major.
+ // We want the output to be in row-major, so we can compute
+ // C' = B' x A' (' stands for transpose)
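+ // Because the tensors are stored row-major, handing the same buffers to
+ // cuBLAS (which reads them as column-major) already presents A' and B'
+ // without any data movement; only the transpose flags and leading
+ // dimensions need to be adjusted.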
+ bool blas_launch_status =
+ stream->ThenBlasGemm(blas_transpose_b, blas_transpose_a, n, m, k, 1.0f,
+ b_ptr, transpose_b ? k : n, a_ptr,
+ transpose_a ? m : k, 0.0f, &c_ptr, n)
+ .ok();
+ if (!blas_launch_status) {
+ ctx->SetStatus(errors::Internal(
+ "Blas SGEMM launch failed : a.shape=(", a.dim_size(0), ", ",
+ a.dim_size(1), "), b.shape=(", b.dim_size(0), ", ", b.dim_size(1),
+ "), m=", m, ", n=", n, ", k=", k));
+ }
+ }
+};
+
+template <typename T>
+struct LaunchMatMul<GPUDevice, T, false /* USE_CUBLAS */> {
+ static void launch(
+ OpKernelContext* ctx, OpKernel* kernel, const Tensor& a, const Tensor& b,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair,
+ Tensor* out) {
+ functor::MatMulFunctor<GPUDevice, T>()(ctx->eigen_device<GPUDevice>(),
+ out->matrix<T>(), a.matrix<T>(),
+ b.matrix<T>(), dim_pair);
+ }
+};
+
+#endif // GOOGLE_CUDA
+
+template <typename Device, typename T, bool USE_CUBLAS>
+class MatMulOp : public OpKernel {
+ public:
+ explicit MatMulOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("transpose_a", &transpose_a_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("transpose_b", &transpose_b_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& a = ctx->input(0);
+ const Tensor& b = ctx->input(1);
+
+ // Check that the dimensions of the two matrices are valid.
+ OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(a.shape()),
+ errors::InvalidArgument("In[0] is not a matrix"));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(b.shape()),
+ errors::InvalidArgument("In[1] is not a matrix"));
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1> dim_pair;
+ dim_pair[0].first = transpose_a_ ? 0 : 1;
+ dim_pair[0].second = transpose_b_ ? 1 : 0;
+
+ OP_REQUIRES(ctx,
+ a.dim_size(dim_pair[0].first) == b.dim_size(dim_pair[0].second),
+ errors::InvalidArgument("Matrix size-compatible: In[0]: ",
+ a.shape().DebugString(), ", In[1]: ",
+ b.shape().DebugString()));
+ int a_dim_remaining = 1 - dim_pair[0].first;
+ int b_dim_remaining = 1 - dim_pair[0].second;
+ TensorShape out_shape(
+ {a.dim_size(a_dim_remaining), b.dim_size(b_dim_remaining)});
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, out_shape, &out));
+
+ if (out->NumElements() == 0) {
+ // If a has shape [0, x] or b has shape [x, 0], the output shape
+ // is a 0-element matrix, so there is nothing to do.
+ return;
+ }
+
+ if (a.NumElements() == 0 || b.NumElements() == 0) {
+ // If a has shape [x, 0] and b has shape [0, y], the
+ // output shape is [x, y] where x and y are non-zero, so we fill
+ // the output with zeros.
+ functor::SetZeroFunctor<Device, T> f;
+ f(ctx->eigen_device<Device>(), out->flat<T>());
+ return;
+ }
+
+ LaunchMatMul<Device, T, USE_CUBLAS>::launch(ctx, this, a, b, dim_pair, out);
+ }
+
+ private:
+ bool transpose_a_;
+ bool transpose_b_;
+};
+
+namespace functor {
+
+// Partial specialization MatMulFunctor<Device=CPUDevice, T>.
+template <typename T>
+struct MatMulFunctor<CPUDevice, T> {
+ void operator()(
+ const CPUDevice& d, typename MatMulTypes<T>::out_type out,
+ typename MatMulTypes<T>::in_type in0,
+ typename MatMulTypes<T>::in_type in1,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair) {
+ MatMul<CPUDevice>(d, out, in0, in1, dim_pair);
+ }
+};
+
+} // end namespace functor
+
+#define REGISTER_CPU(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("MatMul").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ MatMulOp<CPUDevice, T, false /* cublas, ignored for CPU */>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("MatMul").Device(DEVICE_CPU).TypeConstraint<T>("T").Label("eigen"), \
+ MatMulOp<CPUDevice, T, false /* cublas, ignored for CPU */>)
+
+#define REGISTER_GPU(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("MatMul").Device(DEVICE_GPU).TypeConstraint<T>("T"), \
+ MatMulOp<GPUDevice, T, true /* cublas, true by default */>); \
+ REGISTER_KERNEL_BUILDER(Name("MatMul") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T") \
+ .Label("cublas"), \
+ MatMulOp<GPUDevice, T, true /* cublas */>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("MatMul").Device(DEVICE_GPU).TypeConstraint<T>("T").Label("eigen"), \
+ MatMulOp<GPUDevice, T, false /* cublas */>)
+
+REGISTER_CPU(float);
+REGISTER_CPU(double);
+REGISTER_CPU(int32);
+REGISTER_CPU(complex64);
+#if GOOGLE_CUDA
+REGISTER_GPU(float);
+// REGISTER_GPU(double);
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/matmul_op.h b/tensorflow/core/kernels/matmul_op.h
new file mode 100644
index 0000000000..f75b0ded1b
--- /dev/null
+++ b/tensorflow/core/kernels/matmul_op.h
@@ -0,0 +1,40 @@
+#ifndef TENSORFLOW_KERNELS_MATMUL_OP_H_
+#define TENSORFLOW_KERNELS_MATMUL_OP_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Helper types defining the tensor maps needed by the MatMul op.
+template <typename T>
+struct MatMulTypes {
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 2, Eigen::RowMajor>, Eigen::Aligned>
+ out_type;
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 2, Eigen::RowMajor>,
+ Eigen::Aligned> in_type;
+};
+
+template <typename Device, typename In0, typename In1, typename Out,
+ typename DimPair>
+void MatMul(const Device& d, Out out, In0 in0, In1 in1,
+ const DimPair& dim_pair) {
+ out.device(d) = in0.contract(in1, dim_pair);
+}
+
+template <typename Device, typename T>
+struct MatMulFunctor {
+ // Computes on device "d": out = in0 * in1, where * is matrix
+ // multiplication.
+ void operator()(
+ const Device& d, typename MatMulTypes<T>::out_type out,
+ typename MatMulTypes<T>::in_type in0,
+ typename MatMulTypes<T>::in_type in1,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair);
+};
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_MATMUL_OP_H_
diff --git a/tensorflow/core/kernels/matmul_op_gpu.cu.cc b/tensorflow/core/kernels/matmul_op_gpu.cu.cc
new file mode 100644
index 0000000000..17107ce5df
--- /dev/null
+++ b/tensorflow/core/kernels/matmul_op_gpu.cu.cc
@@ -0,0 +1,32 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/matmul_op.h"
+
+namespace tensorflow {
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Partial specialization MatMulFunctor<Device=GPUDevice, T>.
+template <typename T>
+struct MatMulFunctor<GPUDevice, T> {
+ void operator()(
+ const GPUDevice& d, typename MatMulTypes<T>::out_type out,
+ typename MatMulTypes<T>::in_type in0,
+ typename MatMulTypes<T>::in_type in1,
+ const Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1>& dim_pair) {
+ MatMul<GPUDevice>(d, To32Bit(out), To32Bit(in0), To32Bit(in1), dim_pair);
+ }
+};
+
+#define DEFINE(T) template struct MatMulFunctor<GPUDevice, T>;
+DEFINE(float);
+// DEFINE(double); // Does not compile 1/2015.
+#undef DEFINE
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/matmul_op_test.cc b/tensorflow/core/kernels/matmul_op_test.cc
new file mode 100644
index 0000000000..b2b8f3d905
--- /dev/null
+++ b/tensorflow/core/kernels/matmul_op_test.cc
@@ -0,0 +1,56 @@
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+static Graph* Matmul(int m, int k, int n, bool transpose_a, bool transpose_b) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor in0(DT_FLOAT, transpose_a ? TensorShape({k, m}) : TensorShape({m, k}));
+ in0.flat<float>().setRandom();
+ Tensor in1(DT_FLOAT, transpose_b ? TensorShape({n, k}) : TensorShape({k, n}));
+ in1.flat<float>().setRandom();
+ test::graph::Matmul(g, test::graph::Constant(g, in0),
+ test::graph::Constant(g, in1), transpose_a, transpose_b);
+ return g;
+}
+
+#define BM_MatmulDev(M, K, N, TA, TB, DEVICE) \
+ static void BM_Matmul##_##M##_##K##_##N##_##TA##_##TB##_##DEVICE( \
+ int iters) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * M * K * N * 2); \
+ test::Benchmark(#DEVICE, Matmul(M, K, N, TA, TB)).Run(iters); \
+ } \
+ BENCHMARK(BM_Matmul##_##M##_##K##_##N##_##TA##_##TB##_##DEVICE);
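+// An MxK times KxN matmul performs M*K*N multiply-adds, i.e. 2*M*K*N flops,
+// which is what ItemsProcessed above reports per iteration.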
+
+#define BM_Matmul(M, K, N, TA, TB) \
+ BM_MatmulDev(M, K, N, TA, TB, cpu); \
+ BM_MatmulDev(M, K, N, TA, TB, gpu);
+
+// Typical fully connected layers
+BM_Matmul(8, 512, 512, false, false);
+BM_Matmul(16, 512, 512, false, false);
+BM_Matmul(128, 512, 512, false, false);
+
+BM_Matmul(8, 1024, 1024, false, false);
+BM_Matmul(16, 1024, 1024, false, false);
+BM_Matmul(128, 1024, 1024, false, false);
+BM_Matmul(4096, 4096, 4096, false, false);
+
+// Backward for fully connected layers
+BM_Matmul(8, 1024, 1024, false, true);
+BM_Matmul(16, 1024, 1024, false, true);
+BM_Matmul(128, 1024, 1024, false, true);
+
+// Forward softmax with large output size
+BM_Matmul(8, 200, 10000, false, false);
+BM_Matmul(20, 200, 10000, false, false);
+BM_Matmul(20, 200, 20000, false, false);
+
+// Backward softmax with large output size
+BM_Matmul(8, 10000, 200, false, true);
+BM_Matmul(20, 10000, 200, false, true);
+BM_Matmul(20, 20000, 200, false, true);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/matrix_inverse_op.cc b/tensorflow/core/kernels/matrix_inverse_op.cc
new file mode 100644
index 0000000000..ad0948d6ef
--- /dev/null
+++ b/tensorflow/core/kernels/matrix_inverse_op.cc
@@ -0,0 +1,64 @@
+// See docs in ../ops/linalg_ops.cc.
+#include <cmath>
+
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/linalg_ops_common.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/Eigen/LU"
+
+namespace tensorflow {
+
+template <class Scalar, bool SupportsBatchOperationT>
+class MatrixInverseOp
+ : public LinearAlgebraOp<Scalar, SupportsBatchOperationT> {
+ public:
+ explicit MatrixInverseOp(OpKernelConstruction* context)
+ : LinearAlgebraOp<Scalar, SupportsBatchOperationT>(context) {}
+ ~MatrixInverseOp() override {}
+
+ TensorShape GetOutputMatrixShape(
+ const TensorShape& input_matrix_shape) override {
+ return input_matrix_shape;
+ }
+
+ int64 GetCostPerUnit(const TensorShape& input_matrix_shape) override {
+ const int64 rows = input_matrix_shape.dim_size(0);
+ if (rows > (1LL << 20)) {
+ // A big number to cap the cost in case of overflow.
+ return kint32max;
+ } else {
+ return rows * rows * rows;
+ }
+ }
+
+ using typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::MatrixMap;
+ using
+ typename LinearAlgebraOp<Scalar, SupportsBatchOperationT>::ConstMatrixMap;
+
+ void ComputeMatrix(OpKernelContext* context, const ConstMatrixMap& input,
+ MatrixMap* output) override {
+ OP_REQUIRES(context, input.rows() == input.cols(),
+ errors::InvalidArgument("Input matrix must be square."));
+ if (input.rows() == 0) {
+ // By definition, an empty matrix's inverse is an empty matrix.
+ return;
+ }
+ Eigen::FullPivLU<Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic,
+ Eigen::RowMajor>> lu_decomposition(input);
+ OP_REQUIRES(context, lu_decomposition.isInvertible(),
+ errors::InvalidArgument("Input is not invertible."));
+ *output = lu_decomposition.inverse();
+ }
+};
+
+REGISTER_LINALG_OP("MatrixInverse", (MatrixInverseOp<float, false>), float);
+REGISTER_LINALG_OP("MatrixInverse", (MatrixInverseOp<double, false>), double);
+REGISTER_LINALG_OP("BatchMatrixInverse", (MatrixInverseOp<float, true>), float);
+REGISTER_LINALG_OP("BatchMatrixInverse", (MatrixInverseOp<double, true>),
+ double);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/maxpooling_op.cc b/tensorflow/core/kernels/maxpooling_op.cc
new file mode 100644
index 0000000000..31046018c5
--- /dev/null
+++ b/tensorflow/core/kernels/maxpooling_op.cc
@@ -0,0 +1,554 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/maxpooling_op.h"
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/kernels/conv_2d.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/pooling_ops_common.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/util/use_cudnn.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+#if GOOGLE_CUDA
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/core/kernels/maxpooling_op_gpu.h"
+#include "tensorflow/core/kernels/pooling_ops_common_gpu.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+const int kInvalidMaxPoolingIndex = -1;
+
+template <typename Device, typename T>
+struct SpatialMaxPoolWithArgMaxHelper {
+ static void Compute(Tensor* output, Tensor* output_arg_max,
+ const Tensor& tensor_in, const PoolParameters& params,
+ const Padding& padding) {
+ typedef Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ ConstEigenMatrixMap;
+ typedef Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ EigenMatrixMap;
+ typedef Eigen::Map<Eigen::Matrix<int64, Eigen::Dynamic, Eigen::Dynamic>>
+ EigenIndexMatrixMap;
+
+ ConstEigenMatrixMap in_mat(
+ tensor_in.flat<T>().data(), params.depth,
+ params.tensor_in_cols * params.tensor_in_rows * params.tensor_in_batch);
+ EigenMatrixMap out_mat(
+ output->flat<T>().data(), params.depth,
+ params.out_width * params.out_height * params.tensor_in_batch);
+ EigenIndexMatrixMap out_arg_max_mat(
+ output_arg_max->flat<int64>().data(), params.depth,
+ params.out_width * params.out_height * params.tensor_in_batch);
+
+ // Initializes the output tensor with the lowest value of T and the argmax
+ // tensor with kInvalidMaxPoolingIndex.
+ output_arg_max->flat<int64>().setConstant(kInvalidMaxPoolingIndex);
+ output->flat<T>().setConstant(Eigen::NumTraits<T>::lowest());
+
+ // The code below does the following:
+ // 1. Flattens the input and output tensors into two dimensional arrays.
+ // tensor_in_as_matrix:
+ // depth by (tensor_in_cols * tensor_in_rows * tensor_in_batch)
+ // output_as_matrix:
+ // depth by (out_width * out_height * tensor_in_batch)
+ //
+ // 2. Walks through the set of columns in the flattened tensor_in_as_matrix,
+ // and updates the corresponding column(s) in output_as_matrix with the
+ // max value.
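+ // The value stored in output_arg_max is the flattened NHWC index of the
+ // input element that produced the max, i.e.
+ //   ((b * tensor_in_rows + h) * tensor_in_cols + w) * depth + d,
+ // which MaxPoolingGradOp uses below to scatter gradients back to the input.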
+ for (int b = 0; b < params.tensor_in_batch; ++b) {
+ for (int h = 0; h < params.tensor_in_rows; ++h) {
+ for (int w = 0; w < params.tensor_in_cols; ++w) {
+ // (h_start, h_end) * (w_start, w_end) is the range that the input
+ // vector projects to.
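+ // An output row ph covers input rows [ph * row_stride - pad_rows,
+ // ph * row_stride - pad_rows + window_rows), so [h_start, h_end) below is
+ // the set of output rows whose pooling window contains input row h (and
+ // similarly for columns).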
+ const int hpad = h + params.pad_rows;
+ const int wpad = w + params.pad_cols;
+ const int h_start =
+ (hpad < params.window_rows)
+ ? 0
+ : (hpad - params.window_rows) / params.row_stride + 1;
+ const int h_end =
+ std::min(hpad / params.row_stride + 1, params.out_height);
+ const int w_start =
+ (wpad < params.window_cols)
+ ? 0
+ : (wpad - params.window_cols) / params.col_stride + 1;
+ const int w_end =
+ std::min(wpad / params.col_stride + 1, params.out_width);
+ // compute elementwise max
+ const int in_index =
+ (b * params.tensor_in_rows + h) * params.tensor_in_cols + w;
+ for (int ph = h_start; ph < h_end; ++ph) {
+ for (int pw = w_start; pw < w_end; ++pw) {
+ const int out_index =
+ (b * params.out_height + ph) * params.out_width + pw;
+ /// NOTES(zhengxq): not using the eigen matrix operation for now.
+ /// May consider parallelizing the operations if needed.
+ for (int d = 0; d < params.depth; ++d) {
+ const T& input_ref = in_mat.coeffRef(d, in_index);
+ T& output_ref = out_mat.coeffRef(d, out_index);
+ int64& out_arg_max_ref = out_arg_max_mat.coeffRef(d, out_index);
+ if (output_ref < input_ref ||
+ out_arg_max_ref == kInvalidMaxPoolingIndex) {
+ output_ref = input_ref;
+ int input_offset = in_index * params.depth + d;
+ out_arg_max_ref = input_offset;
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPool").Device(DEVICE_CPU),
+ MaxPoolingOp<CPUDevice, float>);
+
+#if GOOGLE_CUDA
+// Forward declarations for the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void SpatialMaxPooling<Eigen::GpuDevice, T>::operator()( \
+ const Eigen::GpuDevice& d, typename TTypes<T, 4>::Tensor output, \
+ typename TTypes<T, 4>::ConstTensor input, int window_rows, \
+ int window_cols, int row_stride, int col_stride, \
+ const Eigen::PaddingType& padding); \
+ extern template struct SpatialMaxPooling<Eigen::GpuDevice, T>;
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+// Note(jiayq): Currently, the Caffe custom implementation is faster than the
+// default Eigen implementation so we are using the custom kernel as the
+// default. However, you can explicitly invoke the eigen version using
+// kernel_label_map.
+REGISTER_KERNEL_BUILDER(Name("MaxPool")
+ .Device(DEVICE_GPU)
+ .Label("eigen_tensor"),
+ MaxPoolingOp<Eigen::GpuDevice, float>);
+#endif // GOOGLE_CUDA
+
+// The operation to compute MaxPool gradients.
+// It takes three inputs:
+// - The original input tensor
+// - The original output tensor
+// - Backprop tensor for output
+// It produces one output: backprop tensor for input.
+template <class Device, class T>
+class MaxPoolingGradOp : public OpKernel {
+ public:
+ explicit MaxPoolingGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ OP_REQUIRES(
+ context, ksize_[3] == 1 && stride_[3] == 1,
+ errors::Unimplemented(
+ "MaxPoolingGrad is not yet supported on the depth dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+ const Tensor& tensor_out = context->input(1);
+ const Tensor& out_backprop = context->input(2);
+
+ // For maxpooling, tensor_in should have 4 dimensions.
+ OP_REQUIRES(context, tensor_in.dims() == 4,
+ errors::InvalidArgument("tensor_in must be 4-dimensional"));
+ OP_REQUIRES(context, tensor_out.dims() == 4,
+ errors::InvalidArgument("tensor_out must be 4-dimensional"));
+ // For maxpooling, out_backprop should have 4 dimensions.
+ OP_REQUIRES(context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional"));
+
+ TensorShape output_shape = tensor_in.shape();
+
+ // Tensor index_tensor(context->allocator(), DT_INT32, output_shape);
+
+ Tensor tensor_out_dup;
+ OP_REQUIRES_OK(context,
+ context->allocate_temp(DataTypeToEnum<T>::v(),
+ tensor_out.shape(), &tensor_out_dup));
+ Tensor tensor_out_arg_max;
+ OP_REQUIRES_OK(context, context->allocate_temp(DataTypeToEnum<int64>::v(),
+ tensor_out.shape(),
+ &tensor_out_arg_max));
+
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ output->flat<T>().setZero();
+
+ SpatialMaxPoolWithArgMaxHelper<CPUDevice, T>::Compute(
+ &tensor_out_dup, &tensor_out_arg_max, tensor_in, params, padding_);
+ auto out_backprop_flat = out_backprop.flat<T>();
+ auto input_backprop_flat = output->flat<T>();
+ auto out_arg_max_flat = tensor_out_arg_max.flat<int64>();
+ int num_total_outputs = out_backprop.flat<T>().size();
+ int num_total_inputs = input_backprop_flat.size();
+
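+ // Scatter pass: each output gradient is routed to the input element that
+ // produced the corresponding max, as recorded in tensor_out_arg_max.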
+ for (int index = 0; index < num_total_outputs; ++index) {
+ int input_backprop_index = out_arg_max_flat(index);
+ // Although this check is in the inner loop, it is worth keeping so we
+ // don't end up with memory corruption. Our benchmarks show that the
+ // performance impact is quite small.
+ CHECK(input_backprop_index >= 0 &&
+ input_backprop_index < num_total_inputs)
+ << "Invalid input backprop index: " << input_backprop_index << ", "
+ << num_total_inputs;
+ input_backprop_flat(input_backprop_index) += out_backprop_flat(index);
+ }
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPoolGrad").Device(DEVICE_CPU),
+ MaxPoolingGradOp<CPUDevice, float>);
+
+#ifdef GOOGLE_CUDA
+
+static void MaxPoolingBackwardCustomKernel(
+ OpKernelContext* context, const std::vector<int32>& size,
+ const std::vector<int32>& stride, Padding padding, const Tensor* tensor_in,
+ const Tensor& out_backprop, const TensorShape& tensor_in_shape) {
+ Tensor* output = nullptr;
+
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, tensor_in_shape, &output));
+
+ PoolParameters params{context, size, stride, padding, tensor_in_shape};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ MaxPoolBackwardNoMask(
+ tensor_in->flat<float>().data(), params.tensor_in_batch,
+ params.tensor_in_rows, params.tensor_in_cols, params.depth,
+ params.out_height, params.out_width, params.window_rows,
+ params.window_cols, params.row_stride, params.col_stride, params.pad_rows,
+ params.pad_cols, out_backprop.flat<float>().data(),
+ output->flat<float>().data(), context->eigen_device<Eigen::GpuDevice>());
+}
+
+template <class T>
+class MaxPoolingGradOp<Eigen::GpuDevice, T> : public OpKernel {
+ public:
+ typedef Eigen::GpuDevice Device;
+
+ explicit MaxPoolingGradOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window strides field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+
+ use_dnn_ = CanUseCudnn();
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+ const Tensor& tensor_out = context->input(1);
+ const Tensor& out_backprop = context->input(2);
+
+ // For maxpooling, tensor_in should have 4 dimensions.
+ OP_REQUIRES(context, tensor_in.dims() == 4,
+ errors::InvalidArgument("tensor_in must be 4-dimensional 4"));
+ OP_REQUIRES(context, tensor_out.dims() == 4,
+ errors::InvalidArgument("tensor_out must be 4-dimensional"));
+ // For maxpooling, out_backprop should have 4 dimensions.
+ OP_REQUIRES(context, out_backprop.dims() == 4,
+ errors::InvalidArgument("out_backprop must be 4-dimensional"));
+
+ TensorShape output_shape = tensor_in.shape();
+
+ if (use_dnn_) {
+ DnnPoolingGradOp<T>::Compute(
+ context, perftools::gputools::dnn::PoolingMode::kMaximum, ksize_,
+ stride_, padding_, &tensor_in, &tensor_out, out_backprop,
+ output_shape);
+ } else {
+ MaxPoolingBackwardCustomKernel(context, ksize_, stride_, padding_,
+ &tensor_in, out_backprop, output_shape);
+ }
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+ bool use_dnn_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPoolGrad").Device(DEVICE_GPU),
+ MaxPoolingGradOp<Eigen::GpuDevice, float>);
+
+#endif // GOOGLE_CUDA
+
+template <typename Device, typename T>
+struct LaunchMaxPoolingNoMask;
+
+template <typename Device, typename T>
+class MaxPoolingNoMaskOp : public OpKernel {
+ public:
+ explicit MaxPoolingNoMaskOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument("Sliding window stride field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ TensorShape out_shape({params.tensor_in_batch, params.out_height,
+ params.out_width, params.depth});
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, out_shape, &output));
+
+ LaunchMaxPoolingNoMask<Device, T>::launch(context, params, tensor_in,
+ output);
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+template <typename Device, typename T>
+struct LaunchMaxPoolingWithArgmax;
+
+template <typename Device, typename T>
+class MaxPoolingWithArgmaxOp : public OpKernel {
+ public:
+ explicit MaxPoolingWithArgmaxOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window stride field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ TensorShape out_shape({params.tensor_in_batch, params.out_height,
+ params.out_width, params.depth});
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, out_shape, &output));
+ Tensor* argmax = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(1, out_shape, &argmax));
+
+ LaunchMaxPoolingWithArgmax<Device, T>::launch(context, params, tensor_in,
+ output, argmax);
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+template <typename Device, typename T>
+struct LaunchMaxPoolingGradWithArgmax;
+
+template <typename Device, typename T>
+class MaxPoolingGradWithArgmaxOp : public OpKernel {
+ public:
+ explicit MaxPoolingGradWithArgmaxOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument(
+ "Sliding window stride field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+ const Tensor& grad_in = context->input(1);
+ const Tensor& argmax = context->input(2);
+
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ TensorShape out_shape({params.tensor_in_batch, params.tensor_in_rows,
+ params.tensor_in_cols, params.depth});
+ Tensor* grad_out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, out_shape, &grad_out));
+
+ LaunchMaxPoolingGradWithArgmax<Device, T>::launch(context, params, grad_in,
+ argmax, grad_out);
+ }
+
+ private:
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+#if GOOGLE_CUDA
+
+template <typename T>
+struct LaunchMaxPoolingNoMask<Eigen::GpuDevice, T> {
+ static void launch(OpKernelContext* context, const PoolParameters& params,
+ const Tensor& input, Tensor* output) {
+ bool status = MaxPoolForwardWithOptionalArgmax(
+ input.flat<T>().data(), params.tensor_in_batch, params.tensor_in_rows,
+ params.tensor_in_cols, params.depth, params.out_height,
+ params.out_width, params.window_rows, params.window_cols,
+ params.row_stride, params.col_stride, params.pad_rows, params.pad_cols,
+ output->flat<T>().data(), nullptr, context->eigen_gpu_device());
+ if (!status) {
+ context->SetStatus(
+ errors::Internal("Failed launching MaxPoolForwardNoMask"));
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPool").Device(DEVICE_GPU),
+ MaxPoolingNoMaskOp<Eigen::GpuDevice, float>);
+
+template <typename T>
+struct LaunchMaxPoolingWithArgmax<Eigen::GpuDevice, T> {
+ static void launch(OpKernelContext* context, const PoolParameters& params,
+ const Tensor& input, Tensor* output, Tensor* argmax) {
+ bool status = MaxPoolForwardWithOptionalArgmax(
+ input.flat<T>().data(), params.tensor_in_batch, params.tensor_in_rows,
+ params.tensor_in_cols, params.depth, params.out_height,
+ params.out_width, params.window_rows, params.window_cols,
+ params.row_stride, params.col_stride, params.pad_rows, params.pad_cols,
+ output->flat<T>().data(),
+ reinterpret_cast<int64*>(argmax->flat<int64>().data()),
+ context->eigen_gpu_device());
+ if (!status) {
+ context->SetStatus(
+ errors::Internal("Failed launching MaxPoolForwardWithArgmax"));
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPoolWithArgmax")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int64>("Targmax"),
+ MaxPoolingWithArgmaxOp<Eigen::GpuDevice, float>);
+
+template <typename T>
+struct LaunchMaxPoolingGradWithArgmax<Eigen::GpuDevice, T> {
+ static void launch(OpKernelContext* context, const PoolParameters& params,
+ const Tensor& grad_in, const Tensor& argmax,
+ Tensor* grad_out) {
+ const int input_size = params.tensor_in_batch * params.tensor_in_rows *
+ params.tensor_in_cols * params.depth;
+ const int output_size = params.tensor_in_batch * params.out_height *
+ params.out_width * params.depth;
+ const int top_offset = params.out_height * params.out_width * params.depth;
+ const int bottom_offset =
+ params.tensor_in_rows * params.tensor_in_cols * params.depth;
+ bool status = MaxPoolBackwardWithArgmax(
+ output_size, input_size, grad_in.flat<T>().data(),
+ reinterpret_cast<const int64*>(argmax.flat<int64>().data()), top_offset,
+ bottom_offset, grad_out->flat<T>().data(), context->eigen_gpu_device());
+ if (!status) {
+ context->SetStatus(
+ errors::Internal("Failed launching MaxPoolForwardWithArgmax"));
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MaxPoolGradWithArgmax")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int64>("Targmax"),
+ MaxPoolingGradWithArgmaxOp<Eigen::GpuDevice, float>);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/maxpooling_op.h b/tensorflow/core/kernels/maxpooling_op.h
new file mode 100644
index 0000000000..a074174118
--- /dev/null
+++ b/tensorflow/core/kernels/maxpooling_op.h
@@ -0,0 +1,29 @@
+#ifndef TENSORFLOW_KERNELS_MAXPOOLING_OP_H_
+#define TENSORFLOW_KERNELS_MAXPOOLING_OP_H_
+// Functor definition for MaxPoolingOp, must be compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+struct SpatialMaxPooling {
+ void operator()(const Device& d, typename TTypes<T, 4>::Tensor output,
+ typename TTypes<T, 4>::ConstTensor input, int window_rows,
+ int window_cols, int row_stride, int col_stride,
+ const Eigen::PaddingType& padding) {
+ // Because we swap the layout, we swap the row/cols as well
+ output.swap_layout().device(d) =
+ Eigen::SpatialMaxPooling(input.swap_layout(), window_cols, window_rows,
+ col_stride, row_stride, padding);
+ }
+};
+
+} // namespace functor
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_MAXPOOLING_OP_H_
diff --git a/tensorflow/core/kernels/maxpooling_op_gpu.cu.cc b/tensorflow/core/kernels/maxpooling_op_gpu.cu.cc
new file mode 100644
index 0000000000..65262eb54e
--- /dev/null
+++ b/tensorflow/core/kernels/maxpooling_op_gpu.cu.cc
@@ -0,0 +1,261 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/kernels/maxpooling_op.h"
+#include "tensorflow/core/kernels/maxpooling_op_gpu.h"
+
+namespace tensorflow {
+namespace {
+// This is Yangqing's custom kernel for the maxpooling operation.
+// MaxPoolForwardNCHW and MaxPoolForwardNHWC implement the forward pass for
+// the NCHW and NHWC storage orders; MaxPoolBackward implements the backward
+// pass and works for either storage order.
+// The parameters to the forward kernels are as follows:
+// nthreads: the number of threads, which is equal to the output size.
+// bottom_data: the bottom data of N*H*W*C (or N*C*H*W) items.
+// height, width, pooled_height, pooled_width: the input and output sizes.
+// kernel_h, kernel_w: the kernel sizes.
+// stride_h, stride_w: the strides.
+// pad_t, pad_l: the padding values on the top and left side.
+// top_data: the maxpool output.
+// mask: the output mask of the same size as top_data. It is stored in
+// int form, keeping track of the flattened index of the input item that
+// produces the max output. If a nullptr is passed in for mask, no mask
+// will be produced.
+#define CUDA_1D_KERNEL_LOOP(i, n) \
+ for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
+ i < (n); i += blockDim.x * gridDim.x)
+
+// To call the forward and backward functions, use e.g.:
+// const int kThreadsPerBlock = 1024
+// const int output_size = batch * channels * pooled_height * pooled_width;
+// MaxPoolForwardNCHW<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
+// kThreadsPerBlock, 0, cuda_stream>>>(...);
+template <typename dtype>
+__global__ void MaxPoolForwardNCHW(const int nthreads, const dtype* bottom_data,
+ const int channels, const int height,
+ const int width, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t,
+ const int pad_l, dtype* top_data,
+ int64* mask) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) {
+ int pw = index % pooled_width;
+ int ph = (index / pooled_width) % pooled_height;
+ int c = (index / pooled_width / pooled_height) % channels;
+ int n = index / pooled_width / pooled_height / channels;
+ int hstart = ph * stride_h - pad_t;
+ int wstart = pw * stride_w - pad_l;
+ int hend = min(hstart + kernel_h, height);
+ int wend = min(wstart + kernel_w, width);
+ hstart = max(hstart, 0);
+ wstart = max(wstart, 0);
+ dtype maxval = -FLT_MAX;
+ int maxidx = -1;
+ const dtype* bottom_data_n = bottom_data + n * channels * height * width;
+ for (int h = hstart; h < hend; ++h) {
+ for (int w = wstart; w < wend; ++w) {
+ int idx = c * height * width + h * width + w;
+ if (bottom_data_n[idx] > maxval) {
+ maxidx = idx;
+ maxval = bottom_data_n[idx];
+ }
+ }
+ }
+ top_data[index] = maxval;
+ if (mask != nullptr) {
+ mask[index] = maxidx;
+ }
+ }
+}
+
+template <typename dtype>
+__global__ void MaxPoolForwardNHWC(const int nthreads, const dtype* bottom_data,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t,
+ const int pad_l, dtype* top_data,
+ int64* mask) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) {
+ int n = index;
+ int c = n % channels;
+ n /= channels;
+ int wstart = (n % pooled_width) * stride_w - pad_l;
+ n /= pooled_width;
+ int hstart = (n % pooled_height) * stride_h - pad_t;
+ n /= pooled_height;
+ int hend = min(hstart + kernel_h, height);
+ int wend = min(wstart + kernel_w, width);
+ hstart = max(hstart, 0);
+ wstart = max(wstart, 0);
+ dtype maxval = -FLT_MAX;
+ int maxidx = -1;
+ const dtype* bottom_data_n = bottom_data + n * height * width * channels;
+ for (int h = hstart; h < hend; ++h) {
+ for (int w = wstart; w < wend; ++w) {
+ int idx = (h * width + w) * channels + c;
+ if (bottom_data_n[idx] > maxval) {
+ maxidx = idx;
+ maxval = bottom_data_n[idx];
+ }
+ }
+ }
+ top_data[index] = maxval;
+ if (mask != nullptr) {
+ mask[index] = maxidx;
+ }
+ }
+}
+
+template <typename dtype>
+__global__ void MaxPoolBackwardNoMaskNHWC(
+ const int nthreads, const dtype* bottom_data, const int height,
+ const int width, const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h, const int kernel_w,
+ const int stride_h, const int stride_w, const int pad_t, const int pad_l,
+ const dtype* top_diff, dtype* bottom_diff) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) {
+ // First find out the index to the maximum, since we have no mask.
+ int n = index;
+ int c = n % channels;
+ n /= channels;
+ int wstart = (n % pooled_width) * stride_w - pad_l;
+ n /= pooled_width;
+ int hstart = (n % pooled_height) * stride_h - pad_t;
+ n /= pooled_height;
+ int hend = min(hstart + kernel_h, height);
+ int wend = min(wstart + kernel_w, width);
+ hstart = max(hstart, 0);
+ wstart = max(wstart, 0);
+ dtype maxval = -FLT_MAX;
+ int maxidx = -1;
+ const dtype* bottom_data_n = bottom_data + n * height * width * channels;
+ for (int h = hstart; h < hend; ++h) {
+ for (int w = wstart; w < wend; ++w) {
+ int idx = (h * width + w) * channels + c;
+ if (bottom_data_n[idx] > maxval) {
+ maxidx = idx;
+ maxval = bottom_data_n[idx];
+ }
+ }
+ }
+
+ // Atomically accumulate the bottom diff. The index could still be
+ // uninitialized, if all the bottom_data are NaN.
+ if (maxidx != -1) {
+ atomicAdd(bottom_diff + n * height * width * channels + maxidx,
+ top_diff[index]);
+ }
+ }
+}
+
+// The parameters to the backward kernels are as follows:
+// nthreads: the number of threads, which is equal to the output size.
+// top_diff: the gradient of the output data, of size N*Hout*Wout*C (or
+// N*C*Hout*Wout). As we have stored the flattened index of the input
+// entries, the backward function is agnostic of the input storage order.
+// mask: the output mask of the same size as top_data. It is stored in
+// int form, keeping track of the flattened index of the input item that
+// produces the max output.
+// top_offset: the pre-computed per-image offset of the maxpool output. This
+// is equal to Hout*Wout*C. We choose to pre-compute this so we do not
+// need to compute it every time inside the kernel.
+// bottom_offset: the pre-computed per-image offset of the maxpool input.
+// This is equal to H*W*C.
+// bottom_diff: the gradient with respect to the input.
+// This function relies on atomicAdd to avoid race conditions. Also, before the
+// kernel is run, bottom_diff must be filled with zeros.
+template <typename dtype>
+__global__ void MaxPoolBackward(const int nthreads, const dtype* top_diff,
+ const int64* mask, const int top_offset,
+ const int bottom_offset, dtype* bottom_diff) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) {
+ int image_id = (index / top_offset);
+ atomicAdd(bottom_diff + image_id * bottom_offset + mask[index],
+ top_diff[index]);
+ }
+}
+
+template <typename dtype>
+__global__ void SetZero(const int nthreads, dtype* bottom_diff) {
+ CUDA_1D_KERNEL_LOOP(index, nthreads) { *(bottom_diff + index) = dtype(0); }
+}
+
+#undef CUDA_1D_KERNEL_LOOP
+} // namespace
+
+bool MaxPoolForwardWithOptionalArgmax(
+ const float* bottom_data, const int batch, const int height,
+ const int width, const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h, const int kernel_w,
+ const int stride_h, const int stride_w, const int pad_t, const int pad_l,
+ float* top_data, int64* mask, const Eigen::GpuDevice& d) {
+ const int kThreadsPerBlock = 1024;
+ const int output_size = batch * channels * pooled_height * pooled_width;
+
+ MaxPoolForwardNHWC<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
+ kThreadsPerBlock, 0, d.stream()>>>(
+ output_size, bottom_data, height, width, channels, pooled_height,
+ pooled_width, kernel_h, kernel_w, stride_h, stride_w, pad_t, pad_l,
+ top_data, mask);
+ return d.ok();
+}
+
+bool MaxPoolBackwardNoMask(const float* bottom_data, const int batch,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t, const int pad_l,
+ const float* top_diff, float* bottom_diff,
+ const Eigen::GpuDevice& d) {
+ const int kThreadsPerBlock = 1024;
+ const int bottom_size = batch * channels * height * width;
+ const int top_size = batch * channels * pooled_height * pooled_width;
+
+ SetZero<<<(bottom_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
+ kThreadsPerBlock, 0, d.stream()>>>(bottom_size, bottom_diff);
+
+ MaxPoolBackwardNoMaskNHWC<<<(top_size + kThreadsPerBlock - 1) /
+ kThreadsPerBlock,
+ kThreadsPerBlock, 0, d.stream()>>>(
+ top_size, bottom_data, height, width, channels, pooled_height,
+ pooled_width, kernel_h, kernel_w, stride_h, stride_w, pad_t, pad_l,
+ top_diff, bottom_diff);
+ return d.ok();
+}
+
+bool MaxPoolBackwardWithArgmax(const int output_size, const int input_size,
+ const float* top_diff, const int64* mask,
+ const int top_offset, const int bottom_offset,
+ float* bottom_diff, const Eigen::GpuDevice& d) {
+ const int kThreadsPerBlock = 1024;
+ SetZero<<<(input_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
+ kThreadsPerBlock, 0, d.stream()>>>(input_size, bottom_diff);
+ MaxPoolBackward<<<(output_size + kThreadsPerBlock - 1) / kThreadsPerBlock,
+ kThreadsPerBlock, 0, d.stream()>>>(
+ output_size, top_diff, mask, top_offset, bottom_offset, bottom_diff);
+ return d.ok();
+}
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::SpatialMaxPooling<GPUDevice, T>;
+
+DEFINE_GPU_KERNELS(float)
+
+#undef DEFINE_GPU_KERNELS
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
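For illustration, here is a minimal host-side sketch of driving MaxPoolForwardWithOptionalArgmax as declared in maxpooling_op_gpu.h. The device pointers d_in, d_out and d_mask are hypothetical and assumed to already point at correctly sized GPU buffers; the pooling geometry is chosen so the numbers are easy to check by hand.

#include "tensorflow/core/kernels/maxpooling_op_gpu.h"

namespace tensorflow {

// Sketch: pool a 1x4x4x1 NHWC input with a 2x2 window and stride 2 (VALID
// padding), producing a 1x2x2x1 output. Each mask entry receives the flat
// NHWC offset of the winning input element within its image; passing nullptr
// for the mask skips the argmax output entirely.
bool RunMaxPoolForwardSketch(const float* d_in, float* d_out, int64* d_mask,
                             const Eigen::GpuDevice& d) {
  return MaxPoolForwardWithOptionalArgmax(
      d_in, /*batch=*/1, /*height=*/4, /*width=*/4, /*channels=*/1,
      /*pooled_height=*/2, /*pooled_width=*/2, /*kernel_h=*/2, /*kernel_w=*/2,
      /*stride_h=*/2, /*stride_w=*/2, /*pad_t=*/0, /*pad_l=*/0, d_out, d_mask,
      d);
}

}  // namespace tensorflow

This maps to one MaxPoolForwardNHWC thread per output element, with the grid rounded up in units of kThreadsPerBlock as in the launch code above.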
diff --git a/tensorflow/core/kernels/maxpooling_op_gpu.h b/tensorflow/core/kernels/maxpooling_op_gpu.h
new file mode 100644
index 0000000000..bfdac904cc
--- /dev/null
+++ b/tensorflow/core/kernels/maxpooling_op_gpu.h
@@ -0,0 +1,42 @@
+#if !GOOGLE_CUDA
+#error This file must only be included when building with Cuda support
+#endif
+
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_MAXPOOLING_OP_GPU_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_MAXPOOLING_OP_GPU_H_
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+
+namespace tensorflow {
+
+// Run the forward pass of max pooling, writing the argmax indices to the mask
+// array if mask is non-null. If mask is passed in as nullptr, no argmax
+// indices are written.
+bool MaxPoolForwardWithOptionalArgmax(
+ const float* bottom_data, const int batch, const int height,
+ const int width, const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h, const int kernel_w,
+ const int stride_h, const int stride_w, const int pad_t, const int pad_l,
+ float* top_data, int64* mask, const Eigen::GpuDevice& d);
+
+bool MaxPoolBackwardWithArgmax(const int output_size, const int input_size,
+ const float* top_diff, const int64* mask,
+ const int top_offset, const int bottom_offset,
+ float* bottom_diff, const Eigen::GpuDevice& d);
+
+bool MaxPoolBackwardNoMask(const float* bottom_data, const int batch,
+ const int height, const int width,
+ const int channels, const int pooled_height,
+ const int pooled_width, const int kernel_h,
+ const int kernel_w, const int stride_h,
+ const int stride_w, const int pad_t, const int pad_l,
+ const float* top_diff, float* bottom_diff,
+ const Eigen::GpuDevice& d);
+
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_MAXPOOLING_OP_GPU_H_
diff --git a/tensorflow/core/kernels/no_op.cc b/tensorflow/core/kernels/no_op.cc
new file mode 100644
index 0000000000..b4f9df81a6
--- /dev/null
+++ b/tensorflow/core/kernels/no_op.cc
@@ -0,0 +1,8 @@
+#include "tensorflow/core/kernels/no_op.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("NoOp").Device(DEVICE_CPU), NoOp);
+REGISTER_KERNEL_BUILDER(Name("NoOp").Device(DEVICE_GPU), NoOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/no_op.h b/tensorflow/core/kernels/no_op.h
new file mode 100644
index 0000000000..a3bcbd7680
--- /dev/null
+++ b/tensorflow/core/kernels/no_op.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_KERNELS_NO_OP_H_
+#define TENSORFLOW_KERNELS_NO_OP_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+class NoOp : public OpKernel {
+ public:
+ explicit NoOp(OpKernelConstruction* context) : OpKernel(context) {}
+ void Compute(OpKernelContext* context) override {}
+ bool IsExpensive() override { return false; }
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_NO_OP_H_
diff --git a/tensorflow/core/kernels/ops_testutil.cc b/tensorflow/core/kernels/ops_testutil.cc
new file mode 100644
index 0000000000..7bea17b9e2
--- /dev/null
+++ b/tensorflow/core/kernels/ops_testutil.cc
@@ -0,0 +1,18 @@
+#include "tensorflow/core/kernels/ops_testutil.h"
+
+namespace tensorflow {
+namespace test {
+
+NodeDef Node(const string& name, const string& op,
+ const std::vector<string>& inputs) {
+ NodeDef def;
+ def.set_name(name);
+ def.set_op(op);
+ for (const string& s : inputs) {
+ def.add_input(s);
+ }
+ return def;
+}
+
+} // namespace test
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/ops_testutil.h b/tensorflow/core/kernels/ops_testutil.h
new file mode 100644
index 0000000000..7a3405bf04
--- /dev/null
+++ b/tensorflow/core/kernels/ops_testutil.h
@@ -0,0 +1,191 @@
+#ifndef TENSORFLOW_KERNELS_OPS_TESTUTIL_H_
+#define TENSORFLOW_KERNELS_OPS_TESTUTIL_H_
+
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/device_base.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+namespace test {
+
+// Return a NodeDef with the specified name/op/inputs.
+NodeDef Node(const string& name, const string& op,
+ const std::vector<string>& inputs);
+
+} // namespace test
+
+// Helper base class for testing op kernels.
+//
+// This class will eventually be replaced / heavily modified
+// to use the BrainClient interface.
+class OpsTestBase : public ::testing::Test {
+ public:
+ OpsTestBase() : device_type_(DEVICE_CPU) {
+ device_.reset(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+ CHECK(device_.get()) << "Could not create CPU device";
+ }
+
+ ~OpsTestBase() override {
+ gtl::STLDeleteElements(&tensors_);
+ context_.reset(nullptr);
+ }
+
+ void set_node_def(const NodeDef& node_def) { node_def_.CopyFrom(node_def); }
+
+ // Clients can manipulate the underlying NodeDef via this accessor.
+ NodeDef* node_def() { return &node_def_; }
+
+  // Creates the kernel described by node_def() on the test device and
+  // records its input types.
+  //
+  // Returns the status of initialization.
+ Status InitOp() {
+ Status status;
+ kernel_ = CreateOpKernel(device_type_, device_.get(), allocator(),
+ node_def_, &status);
+ if (kernel_ != nullptr) input_types_ = kernel_->input_types();
+ return status;
+ }
+
+ // Adds an input for every element described by the shape.
+ // 'input_mapping' maps an index (0...NumElements(shape)) to a
+ // value.
+ //
+ // TODO(vrv): Replace with something like a BrainClient Feed.
+ template <typename T>
+ void AddInput(const TensorShape& shape, std::function<T(int)> input_mapping) {
+ CHECK_GT(input_types_.size(), inputs_.size())
+ << "Adding more inputs than types; perhaps you need to call MakeOp";
+ bool is_ref = IsRefType(input_types_[inputs_.size()]);
+ Tensor* input = new Tensor(device_->GetAllocator(AllocatorAttributes()),
+ DataTypeToEnum<T>::v(), shape);
+ test::FillFn(input, input_mapping);
+ tensors_.push_back(input);
+ if (is_ref) {
+ CHECK_EQ(RemoveRefType(input_types_[inputs_.size()]),
+ DataTypeToEnum<T>::v());
+ inputs_.push_back({&lock_for_refs_, input});
+ } else {
+ CHECK_EQ(input_types_[inputs_.size()], DataTypeToEnum<T>::v());
+ inputs_.push_back({nullptr, input});
+ }
+ }
+
+  // Like AddInput but takes an explicit ArraySlice of data.
+ template <typename T>
+ void AddInputFromArray(const TensorShape& shape,
+ const gtl::ArraySlice<T>& data) {
+ CHECK_GT(input_types_.size(), inputs_.size())
+ << "Adding more inputs than types; perhaps you need to call MakeOp";
+ bool is_ref = IsRefType(input_types_[inputs_.size()]);
+ Tensor* input = new Tensor(device_->GetAllocator(AllocatorAttributes()),
+ DataTypeToEnum<T>::v(), shape);
+ test::FillValues<T>(input, data);
+ tensors_.push_back(input);
+ if (is_ref) {
+ CHECK_EQ(RemoveRefType(input_types_[inputs_.size()]),
+ DataTypeToEnum<T>::v());
+ inputs_.push_back({&lock_for_refs_, input});
+ } else {
+ CHECK_EQ(input_types_[inputs_.size()], DataTypeToEnum<T>::v());
+ inputs_.push_back({nullptr, input});
+ }
+ }
+
+ // Runs an operation producing 'num_outputs' outputs.
+ //
+ // Returns the context's status after running the operation.
+ Status RunOpKernel() {
+ OpKernelContext::Params params;
+ params.device = device_.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs_;
+ params.op_kernel = kernel_.get();
+ params.output_alloc_attr = [this, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host =
+ (kernel_->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+ checkpoint::TensorSliceReaderCacheWrapper slice_reader_cache_wrapper;
+ params.slice_reader_cache = &slice_reader_cache_wrapper;
+
+ context_.reset(new OpKernelContext(params));
+ device_->Compute(kernel_.get(), context_.get());
+ return context_->status();
+ }
+
+ // Returns the tensor input for 'input_index'.
+ //
+ // REQUIRES: 0 <= input_index < context_->num_inputs()
+ const Tensor& GetInput(int input_index) const {
+ CHECK_LT(input_index, context_->num_inputs());
+ CHECK(!IsRefType(context_->input_dtype(input_index)));
+ return context_->input(input_index);
+ }
+
+ TensorValue mutable_input(int input_index) {
+ CHECK_LT(input_index, inputs_.size());
+ return inputs_[input_index];
+ }
+ // Returns the tensor output for 'output_index'.
+ //
+ // REQUIRES: 0 <= output_index < context_->num_outputs()
+ Tensor* GetOutput(int output_index) {
+ CHECK_LT(output_index, context_->num_outputs());
+ return context_->mutable_output(output_index);
+ }
+
+ Allocator* allocator() {
+ return device_->GetAllocator(AllocatorAttributes());
+ }
+
+ const DataTypeVector& output_types() const { return kernel_->output_types(); }
+
+ protected:
+ std::unique_ptr<Device> device_;
+
+ std::unique_ptr<OpKernel> kernel_;
+ NodeDef node_def_;
+ DataTypeVector input_types_;
+ DeviceType device_type_;
+
+ mutex lock_for_refs_; // Used as the Mutex for inputs added as refs
+
+ gtl::InlinedVector<TensorValue, 4> inputs_;
+ // Owns Tensors.
+ std::vector<Tensor*> tensors_;
+
+ std::unique_ptr<OpKernelContext> context_;
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(OpsTestBase);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_OPS_TESTUTIL_H_
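For illustration, here is the smallest test the harness above supports, built around the NoOp kernel registered earlier in this change; it is a sketch of the intended flow rather than a test taken from the tree. A real kernel test would also call AddInputFromArray() before RunOpKernel() and compare GetOutput(i) against expected tensors.

#include <gtest/gtest.h>
#include "tensorflow/core/kernels/ops_testutil.h"

namespace tensorflow {
namespace {

class NoOpTest : public OpsTestBase {};

TEST_F(NoOpTest, RunsCleanly) {
  set_node_def(test::Node("no_op", "NoOp", {}));  // name, op type, no inputs
  EXPECT_EQ(Status::OK(), InitOp());              // builds the kernel on CPU
  EXPECT_EQ(Status::OK(), RunOpKernel());         // NoOp::Compute does nothing
}

}  // namespace
}  // namespace tensorflow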
diff --git a/tensorflow/core/kernels/ops_util.cc b/tensorflow/core/kernels/ops_util.cc
new file mode 100644
index 0000000000..ca2925128e
--- /dev/null
+++ b/tensorflow/core/kernels/ops_util.cc
@@ -0,0 +1,113 @@
+#include <cmath>
+
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/util/padding.h"
+
+namespace tensorflow {
+
+void RequireDefaultOps() {
+// TODO(opensource): Use a more generic sounding preprocessor name than
+// GOOGLE_CUDA (maybe SUPPORT_CUDA?)
+#if GOOGLE_CUDA
+ void RequireGPUDevice();
+ RequireGPUDevice();
+#endif
+}
+
+Status Get2dOutputSize(const int in_height, const int in_width,
+ int filter_height, int filter_width, int row_stride,
+ int col_stride, Padding padding, int* new_height,
+ int* new_width, int* pad_rows, int* pad_cols) {
+ int pad_bottom_unused, pad_right_unused;
+ return Get2dOutputSizeVerbose(
+ in_height, in_width, filter_height, filter_width, row_stride, col_stride,
+ padding, new_height, new_width, pad_rows, &pad_bottom_unused, pad_cols,
+ &pad_right_unused);
+}
+
+Status Get2dOutputSizeVerbose(const int in_height, const int in_width,
+ int filter_height, int filter_width,
+ int row_stride, int col_stride, Padding padding,
+ int* new_height, int* new_width, int* pad_top,
+ int* pad_bottom, int* pad_left, int* pad_right) {
+ // Cannot have strides larger than the patch size.
+ if (row_stride > filter_height || col_stride > filter_width) {
+ return errors::InvalidArgument(
+ "stride must be less than or equal to kernel size");
+ }
+ switch (padding) {
+ case Padding::VALID:
+ *new_height = ceil((in_height - filter_height + 1.f) /
+ static_cast<float>(row_stride));
+ *new_width = ceil((in_width - filter_width + 1.f) /
+ static_cast<float>(col_stride));
+ *pad_top = 0;
+ *pad_bottom = 0;
+ *pad_left = 0;
+ *pad_right = 0;
+ break;
+ case Padding::SAME:
+ *new_height = ceil(in_height / static_cast<float>(row_stride));
+ *new_width = ceil(in_width / static_cast<float>(col_stride));
+ // Calculate padding for top/bottom/left/right, spilling any excess
+ // padding to bottom and right.
+ const int pad_needed_height =
+ (*new_height - 1) * row_stride + filter_height - in_height;
+ *pad_top = pad_needed_height / 2;
+ CHECK_GE(pad_needed_height, 0);
+ *pad_bottom = pad_needed_height - *pad_top;
+
+ const int pad_needed_width =
+ (*new_width - 1) * col_stride + filter_width - in_width;
+ *pad_left = pad_needed_width / 2;
+ CHECK_GE(pad_needed_width, 0);
+ *pad_right = pad_needed_width - *pad_left;
+ break;
+ }
+ if (*new_height < 0 || *new_width < 0) {
+ return errors::InvalidArgument("computed output size would be negative");
+ }
+ return Status::OK();
+}
+
+Eigen::PaddingType BrainPadding2EigenPadding(Padding padding) {
+ switch (padding) {
+ case Padding::VALID:
+ return Eigen::PADDING_VALID;
+ case Padding::SAME:
+ return Eigen::PADDING_SAME;
+ }
+ return Eigen::PADDING_SAME; // Prevent compiler warning about missing return
+}
+
+Status GetBroadcastSize(const int index, const int in_size,
+ const int ksize, const int stride,
+ const int pad_size, int* bindex, int* bsize) {
+ // Cannot have strides larger than the patch size.
+ if (stride > ksize) {
+ return errors::InvalidArgument(
+ "stride must be less than or equal to kernel size");
+ }
+ // Cannot have index beyond the input size.
+ if (index * stride > in_size) {
+ return errors::InvalidArgument(
+ "index * stride must be less than or equal to input size");
+ }
+ *bindex = index * stride;
+ *bsize = ksize;
+ if (*bindex < pad_size) {
+ // If the current index is in the padding area, start broadcast from index
+ // 0 with broadcast size reduced by padding size.
+ *bsize = ksize + *bindex - pad_size;
+ *bindex = 0;
+ } else {
+ // Otherwise, start broadcast from current index reduced by padding size.
+ *bindex -= pad_size;
+ }
+ if (*bindex + ksize > in_size) {
+ *bsize = std::min((in_size - *bindex), ksize);
+ }
+ return Status::OK();
+}
+} // namespace tensorflow
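Worked example for the SAME branch above (the same numbers are exercised by ops_util_test.cc later in this change): with in_height = in_width = 3, a 2x2 filter and stride 2, new_height = ceil(3 / 2) = 2, so pad_needed_height = (2 - 1) * 2 + 2 - 3 = 1, which splits into pad_top = 1 / 2 = 0 and pad_bottom = 1 - 0 = 1. The width dimension is identical, so Get2dOutputSizeVerbose reports a 2x2 output with (pad_top, pad_bottom, pad_left, pad_right) = (0, 1, 0, 1), i.e. all of the excess padding spills to the bottom and right.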
diff --git a/tensorflow/core/kernels/ops_util.h b/tensorflow/core/kernels/ops_util.h
new file mode 100644
index 0000000000..283338f8df
--- /dev/null
+++ b/tensorflow/core/kernels/ops_util.h
@@ -0,0 +1,180 @@
+#ifndef TENSORFLOW_KERNELS_OPS_UTIL_H_
+#define TENSORFLOW_KERNELS_OPS_UTIL_H_
+
+// This file contains utilities for various operations.
+
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+// Call this function from a test if op kernels are not being
+// registered. This can happen if the test is linked in a shared
+// mode and has no direct references to any code from this directory.
+void RequireDefaultOps();
+
+// Get2dOutputSize(): Given an input tensor, kernel, stride and padding
+// type, the function computes the output and padding dimensions.
+//
+// Convolution layers take in an input tensor of shape (D, C, R, B), and
+// convolve it with a set of filters, which can also be presented as a
+// tensor (D, K, K, M), where M is the number of filters, K is the filter size,
+// and each 3-dimensional tensor of size (D, K, K) is a filter. For
+// simplicity we assume that we always use square filters (which is usually the
+// case in images). It also takes in a few additional parameters:
+//
+// Stride (S): the stride with which we apply the filters. This is the offset
+// between locations where we apply the filters. A larger stride
+// means that the output will be spatially smaller.
+//
+// Padding (P): the padding we apply to the input tensor along the R and C
+//   dimensions. This is usually used to make sure that the spatial dimensions
+//   do not shrink as we apply successive convolutions. Two types of padding are
+// often used:
+// SAME: the pad value is computed so that the output will have size R/S
+// and C/S.
+// VALID: no padding is carried out.
+// The padded area is zero-filled.
+//
+// The output dimensions for convolution and many other operations, when given
+// all the parameters above, are as follows:
+// - When Padding = SAME: the output size is (B, R', C', M), where
+// R' = ceil(float(R) / float(S))
+// C' = ceil(float(C) / float(S))
+// where ceil is the ceiling function. The number of padded rows and columns
+// are computed as:
+// Pr = ((R' - 1) * S + K - R) / 2
+// Pc = ((C' - 1) * S + K - C) / 2
+// When the stride is 1, we have the simplified case
+// R'=R, C'=C, Pr=Pc=(K-1)/2.
+//   This is where SAME comes from - the output has the same size as the
+//   input.
+//
+// - When Padding = VALID: the output size is computed as
+// R' = ceil(float(R - K + 1) / float(S))
+// C' = ceil(float(C - K + 1) / float(S))
+// and the number of padded rows and columns are computed in the same way.
+// When the stride is 1, we have the simplified case
+// R'=R-K+1, C'=C-K+1, Pr=0, Pc=0.
+//
+// For convolution, mathematically, the output value at location (b, r', c', m)
+// is the inner product of two vectors: the chunk of input at
+// (b, (r'*S-Pr) : (r'*S-Pr+K), (c'*S-Pc) : (c'*S-Pc+K), :),
+// and the filter at (m, :, :, :).
+//
+Status Get2dOutputSize(const int in_height, const int in_width,
+ int filter_height, int filter_width, int row_stride,
+ int col_stride, Padding padding, int* new_height,
+ int* new_width, int* pad_rows, int* pad_cols);
+
+// Returns the same output dimensions as in Get2dOutputSize, but returns verbose
+// padding dimensions (top/bottom/left/right). Any excess padding (caused by
+// an odd padding size value) is added to the 'pad_bottom' and 'pad_right'
+// dimensions.
+Status Get2dOutputSizeVerbose(const int in_height, const int in_width,
+ int filter_height, int filter_width,
+ int row_stride, int col_stride, Padding padding,
+ int* new_height, int* new_width, int* pad_top,
+ int* pad_bottom, int* pad_left, int* pad_right);
+
+// Calculates broadcast starting index and size. For SAME padding, additional
+// padding may be applied to the right, left, top and bottom. Depending on the
+// current index, input size, kernel size, stride, padding size, the starting
+// index and size for broadcast for that dimension are different from the
+// current index and kernel size.
+// This is mainly used by gradient algorithms for pooling operations.
+Status GetBroadcastSize(const int index, const int in_size,
+ const int ksize, const int stride,
+ const int pad_size, int* bindex, int* bsize);
+
+// Converts Brain's Padding to Eigen's PaddingType.
+Eigen::PaddingType BrainPadding2EigenPadding(Padding padding);
+
+// Given a shape 's' of a tensor of type T, returns true iff the
+// number of bytes occupied by each dim-0 slice (i.e., &tensor(i + 1, ...) -
+// &tensor(i, ...)) is a multiple of EIGEN_MAX_ALIGN_BYTES.
+template <typename T>
+bool IsInnerDimsSizeAligned(const TensorShape& s) {
+ if (s.dims() == 0) return false;
+ const int64 dim0_size = s.dim_size(0);
+ if (dim0_size == 0) return false;
+ const int64 bytes_per_dim0 = (s.num_elements() / dim0_size) * sizeof(T);
+ return bytes_per_dim0 % EIGEN_MAX_ALIGN_BYTES == 0;
+}
+
+// Returns in 'col_data', image patches in storage order (height, width, depth)
+// extracted from image at 'input_data', which is required to be in storage
+// order (batch, height, width, depth).
+// Implementation written by Yangqing Jia (jiayq).
+template <typename T>
+void Im2col(const T* input_data, const int depth, const int height,
+ const int width, const int filter_h, const int filter_w,
+ const int pad_t, const int pad_l, const int pad_b, const int pad_r,
+ const int stride_h, const int stride_w, T* col_data) {
+ int height_col = (height + pad_t + pad_b - filter_h) / stride_h + 1;
+ int width_col = (width + pad_l + pad_r - filter_w) / stride_w + 1;
+
+ int h_pad = -pad_t;
+ for (int h = 0; h < height_col; ++h) {
+ int w_pad = -pad_l;
+ for (int w = 0; w < width_col; ++w) {
+ for (int ih = h_pad; ih < h_pad + filter_h; ++ih) {
+ for (int iw = w_pad; iw < w_pad + filter_w; ++iw) {
+ if (ih >= 0 && ih < height && iw >= 0 && iw < width) {
+ memcpy(col_data, input_data + (ih * width + iw) * depth,
+ sizeof(T) * depth);
+ } else {
+ // This should be simply padded with zero.
+ memset(col_data, 0, sizeof(T) * depth);
+ }
+ col_data += depth;
+ }
+ }
+ w_pad += stride_w;
+ }
+ h_pad += stride_h;
+ }
+}
+
+// Returns in 'im_data' image patch in storage order (height, width, depth),
+// constructed from patches in 'col_data', which is required to be in storage
+// order (out_height * out_width, filter_height, filter_width, in_depth).
+// Implementation by Yangqing Jia (jiayq).
+template <typename T>
+void Col2im(const T* col_data, const int depth, const int height,
+ const int width, const int filter_h, const int filter_w,
+ const int pad_t, const int pad_l, const int pad_b, const int pad_r,
+ const int stride_h, const int stride_w, T* im_data) {
+ memset(im_data, 0, sizeof(T) * height * width * depth);
+ int height_col = (height + pad_t + pad_b - filter_h) / stride_h + 1;
+ int width_col = (width + pad_l + pad_r - filter_w) / stride_w + 1;
+ int h_pad = -pad_t;
+ for (int h = 0; h < height_col; ++h) {
+ int w_pad = -pad_l;
+ for (int w = 0; w < width_col; ++w) {
+ T* im_patch_data = im_data + (h_pad * width + w_pad) * depth;
+ for (int ih = h_pad; ih < h_pad + filter_h; ++ih) {
+ for (int iw = w_pad; iw < w_pad + filter_w; ++iw) {
+ if (ih >= 0 && ih < height && iw >= 0 && iw < width) {
+ // TODO(andydavis) Vectorize this loop (if compiler does not).
+ for (int i = 0; i < depth; ++i) {
+ im_patch_data[i] += col_data[i];
+ }
+ }
+ im_patch_data += depth;
+ col_data += depth;
+ }
+      // Skip over the rest of this image row to reach the next row of the
+      // patch.
+ im_patch_data += depth * (width - filter_w);
+ }
+ w_pad += stride_w;
+ }
+ h_pad += stride_h;
+ }
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_OPS_UTIL_H_
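For illustration, a minimal sketch of Im2col on a single-channel 3x3 image with a 2x2 filter, stride 1 and no padding. By the formulas in the implementation above, height_col = width_col = (3 - 2) / 1 + 1 = 2, so col_data receives 2 * 2 patches of 2 * 2 * 1 values each; the function and buffer names below are illustrative only.

#include "tensorflow/core/kernels/ops_util.h"

void Im2colSketch() {
  const float image[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};  // 3x3 image, depth 1
  float patches[16];  // 4 patches x 4 values per patch
  tensorflow::Im2col<float>(image, /*depth=*/1, /*height=*/3, /*width=*/3,
                            /*filter_h=*/2, /*filter_w=*/2, /*pad_t=*/0,
                            /*pad_l=*/0, /*pad_b=*/0, /*pad_r=*/0,
                            /*stride_h=*/1, /*stride_w=*/1, patches);
  // patches now holds {0,1,3,4, 1,2,4,5, 3,4,6,7, 4,5,7,8}: the four 2x2
  // windows of the image, one patch after another in (height, width, depth)
  // order.
}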
diff --git a/tensorflow/core/kernels/ops_util_test.cc b/tensorflow/core/kernels/ops_util_test.cc
new file mode 100644
index 0000000000..bc4f57e220
--- /dev/null
+++ b/tensorflow/core/kernels/ops_util_test.cc
@@ -0,0 +1,265 @@
+#include "tensorflow/core/kernels/ops_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class OpsUtilTest : public ::testing::Test {
+ protected:
+ OpsUtilTest() {}
+ ~OpsUtilTest() override {}
+
+ // Padding structure.
+ struct padding_struct {
+ // Input parameters.
+ struct {
+ int in_height;
+ int in_width;
+ int filter_height;
+ int filter_width;
+ int row_stride;
+ int col_stride;
+ Padding padding;
+ } input;
+ // Output.
+ struct {
+ int new_height;
+ int new_width;
+ int pad_top;
+ int pad_bottom;
+ int pad_left;
+ int pad_right;
+ } output;
+ };
+
+ // Broadcast structure.
+ struct bcast_struct {
+ // Input parameters.
+ struct {
+ int index; // Current index.
+ int in_size; // Size of the dimension.
+ int ksize; // Kernel size.
+ int stride; // Stride.
+ int pad_size; // Padding size.
+ } input;
+ // Output.
+ struct {
+ int new_index; // New starting index.
+ int new_size; // New broadcast size.
+ } output;
+ };
+
+ static void VerifyGet2dOutputSizeBoundaries(padding_struct pad_struct,
+ error::Code code) {
+ int new_height, new_width, pad_rows, pad_cols;
+ Status status = Get2dOutputSize(
+ pad_struct.input.in_height, pad_struct.input.in_width,
+ pad_struct.input.filter_height, pad_struct.input.filter_width,
+ pad_struct.input.row_stride, pad_struct.input.col_stride,
+ pad_struct.input.padding, &new_height, &new_width, &pad_rows,
+ &pad_cols);
+ EXPECT_EQ(status.code(), code) << status;
+ }
+
+ static void VerifyGet2dOutputSizeValues(padding_struct pad_struct,
+ error::Code code) {
+ int new_height, new_width, pad_rows, pad_cols;
+ Status status = Get2dOutputSize(
+ pad_struct.input.in_height, pad_struct.input.in_width,
+ pad_struct.input.filter_height, pad_struct.input.filter_width,
+ pad_struct.input.row_stride, pad_struct.input.col_stride,
+ pad_struct.input.padding, &new_height, &new_width, &pad_rows,
+ &pad_cols);
+ EXPECT_EQ(status.code(), code) << status;
+ EXPECT_EQ(pad_struct.output.new_height, new_height);
+ EXPECT_EQ(pad_struct.output.new_width, new_width);
+ EXPECT_EQ(pad_struct.output.pad_top, pad_rows);
+ EXPECT_EQ(pad_struct.output.pad_left, pad_cols);
+ }
+
+ static void VerifyGet2dOutputVerboseSizeValues(padding_struct pad_struct,
+ error::Code code) {
+ int new_height, new_width, pad_top, pad_bottom, pad_left, pad_right;
+ Status status = Get2dOutputSizeVerbose(
+ pad_struct.input.in_height, pad_struct.input.in_width,
+ pad_struct.input.filter_height, pad_struct.input.filter_width,
+ pad_struct.input.row_stride, pad_struct.input.col_stride,
+ pad_struct.input.padding, &new_height, &new_width, &pad_top,
+ &pad_bottom, &pad_left, &pad_right);
+ EXPECT_EQ(status.code(), code) << status;
+ EXPECT_EQ(pad_struct.output.new_height, new_height);
+ EXPECT_EQ(pad_struct.output.new_width, new_width);
+ EXPECT_EQ(pad_struct.output.pad_top, pad_top);
+ EXPECT_EQ(pad_struct.output.pad_bottom, pad_bottom);
+ EXPECT_EQ(pad_struct.output.pad_left, pad_left);
+ EXPECT_EQ(pad_struct.output.pad_right, pad_right);
+ }
+
+ static void VerifyBoundaries(bcast_struct bcast, error::Code code) {
+ int new_index, new_size;
+ Status status = GetBroadcastSize(
+ bcast.input.index, bcast.input.in_size, bcast.input.ksize,
+ bcast.input.stride, bcast.input.pad_size, &new_index, &new_size);
+ EXPECT_EQ(status.code(), code) << status;
+ }
+
+ static void VerifyBcastValues(bcast_struct bcast) {
+ int new_index, new_size;
+ EXPECT_EQ(Status::OK(),
+ GetBroadcastSize(bcast.input.index, bcast.input.in_size,
+ bcast.input.ksize, bcast.input.stride,
+ bcast.input.pad_size, &new_index, &new_size));
+ EXPECT_EQ(bcast.output.new_index, new_index);
+ EXPECT_EQ(bcast.output.new_size, new_size);
+ }
+};
+
+// Test stride > ksize fails with INVALID_ARGUMENT.
+TEST_F(OpsUtilTest, Get2dOutputSizeInvalidTest) {
+ padding_struct pad_struct = {{3, 3, 1, 2, 2, 2, SAME}, {3, 3, 1, 1, 1, 1}};
+ VerifyGet2dOutputSizeBoundaries(pad_struct, error::INVALID_ARGUMENT);
+}
+
+TEST_F(OpsUtilTest, Get2dOutputSizeNegativeSizeTest) {
+ padding_struct pad_struct = {{1, 1, 3, 3, 1, 1, VALID}, {-1, -1, 0, 0, 0, 0}};
+ VerifyGet2dOutputSizeBoundaries(pad_struct, error::INVALID_ARGUMENT);
+}
+
+TEST_F(OpsUtilTest, Get2dOutputSizeSquareFilterTest) {
+ padding_struct pad_struct1 = {{3, 3, 2, 2, 2, 2, SAME}, {2, 2, 0, 0, 0, 0}};
+ padding_struct pad_struct2 = {{3, 3, 2, 2, 2, 2, VALID}, {1, 1, 0, 0, 0, 0}};
+ VerifyGet2dOutputSizeValues(pad_struct1, error::OK);
+ VerifyGet2dOutputSizeValues(pad_struct2, error::OK);
+}
+
+TEST_F(OpsUtilTest, Get2dOutputSizeNonSquareFilterTest) {
+ padding_struct pad_struct1 = {{4, 5, 1, 2, 1, 1, SAME}, {4, 5, 0, 0, 0, 0}};
+ padding_struct pad_struct2 = {{4, 5, 1, 2, 1, 1, VALID}, {4, 4, 0, 0, 0, 0}};
+ VerifyGet2dOutputSizeValues(pad_struct1, error::OK);
+ VerifyGet2dOutputSizeValues(pad_struct2, error::OK);
+}
+
+TEST_F(OpsUtilTest, Get2dOutputSizeUnevenStrideTest) {
+ padding_struct pad_struct1 = {{4, 4, 2, 2, 1, 2, VALID}, {3, 2, 0, 0, 0, 0}};
+ padding_struct pad_struct2 = {{4, 4, 2, 2, 2, 1, VALID}, {2, 3, 0, 0, 0, 0}};
+ VerifyGet2dOutputSizeValues(pad_struct1, error::OK);
+ VerifyGet2dOutputSizeValues(pad_struct2, error::OK);
+}
+
+TEST_F(OpsUtilTest, Get2dOutputSizeVerbose) {
+ padding_struct pad_struct1 = {{3, 3, 2, 2, 2, 2, SAME}, {2, 2, 0, 1, 0, 1}};
+ padding_struct pad_struct2 = {{3, 3, 2, 2, 2, 2, VALID}, {1, 1, 0, 0, 0, 0}};
+ VerifyGet2dOutputVerboseSizeValues(pad_struct1, error::OK);
+ VerifyGet2dOutputVerboseSizeValues(pad_struct2, error::OK);
+}
+
+// Test stride > ksize fails with INVALID_ARGUMENT.
+TEST_F(OpsUtilTest, GetBroadcastTest3_1_2_0) {
+ bcast_struct bcast = {{0, 3, 1, 2, 0}, {0, 3}};
+ VerifyBoundaries(bcast, error::INVALID_ARGUMENT);
+}
+
+// Test index * stride > in_size fails with INVALID_ARGUMENT.
+TEST_F(OpsUtilTest, GetBroadcastTestBadIndex) {
+ bcast_struct bcast = {{2, 3, 1, 2, 0}, {0, 3}};
+ VerifyBoundaries(bcast, error::INVALID_ARGUMENT);
+}
+
+// in_size = 3, ksize = 3, stride = 1, pad_size = 0
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_1_0) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 1, 0}, {0, 3}},
+ {{1, 3, 3, 1, 0}, {1, 2}},
+ {{2, 3, 3, 1, 0}, {2, 1}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 1, pad_size = 1
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_1_1) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 1, 1}, {0, 2}},
+ {{1, 3, 3, 1, 1}, {0, 3}},
+ {{2, 3, 3, 1, 1}, {1, 2}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 1, pad_size = 2
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_1_2) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 1, 2}, {0, 1}},
+ {{1, 3, 3, 1, 2}, {0, 2}},
+ {{2, 3, 3, 1, 2}, {0, 3}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 2, pad_size = 0
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_2_0) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 2, 0}, {0, 3}}, {{1, 3, 3, 2, 0}, {2, 1}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 2, pad_size = 1
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_2_1) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 2, 1}, {0, 2}}, {{1, 3, 3, 2, 1}, {1, 2}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 2, pad_size = 2
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_2_2) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 2, 2}, {0, 1}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 3, pad_size = 0
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_3_0) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 3, 0}, {0, 3}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 3, pad_size = 1
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_3_1) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 3, 1}, {0, 2}}, {{1, 3, 3, 3, 1}, {2, 1}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+// in_size = 3, ksize = 3, stride = 3, pad_size = 2
+TEST_F(OpsUtilTest, GetBroadcastTest3_3_3_2) {
+ bcast_struct bcast[] = {
+ {{0, 3, 3, 3, 2}, {0, 1}},
+ };
+ for (size_t i = 0; i < sizeof(bcast) / sizeof(bcast[0]); ++i) {
+ VerifyBcastValues(bcast[i]);
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/pack_op.cc b/tensorflow/core/kernels/pack_op.cc
new file mode 100644
index 0000000000..cb125ea2fe
--- /dev/null
+++ b/tensorflow/core/kernels/pack_op.cc
@@ -0,0 +1,114 @@
+// See docs in ../ops/array_ops.cc.
+
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/kernels/concat_op.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// --------------------------------------------------------------------------
+template <typename Device, typename T>
+class PackOp : public OpKernel {
+ public:
+ typedef std::vector<std::unique_ptr<typename TTypes<T, 2>::ConstMatrix>>
+ ConstMatrixVector;
+
+ explicit PackOp(OpKernelConstruction* c) : OpKernel(c) {}
+
+ void Compute(OpKernelContext* c) override {
+ OpInputList values;
+ OP_REQUIRES_OK(c, c->input_list("values", &values));
+ const int num = values.size();
+
+ // Verify that all input shapes match
+ for (int i = 1; i < num; i++) {
+ OP_REQUIRES(c, values[0].shape().IsSameSize(values[i].shape()),
+ errors::InvalidArgument(
+ "Shapes of all inputs must match: values[0].shape = ",
+ values[0].shape().ShortDebugString(), " != values[", i,
+ "].shape = ", values[i].shape().ShortDebugString()));
+ }
+
+ TensorShape output_shape(values[0].shape());
+ output_shape.InsertDim(0, num);
+
+ // In the num = 1 case, just reshape the input
+ if (num == 1) {
+ Tensor output;
+ CHECK(output.CopyFrom(values[0], output_shape));
+ c->set_output(0, output);
+ return;
+ }
+
+ // Allocate output
+ Tensor* output;
+ OP_REQUIRES_OK(c, c->allocate_output(0, output_shape, &output));
+
+ const int output_size = output->NumElements();
+ if (output_size > 0) {
+ auto output_flat = output->shaped<T, 2>({1, output_size});
+
+ // Except for shapes, pack is a special case of concat, so we reuse the
+ // same computational kernels.
+ ConstMatrixVector inputs_flat;
+ inputs_flat.reserve(num);
+ for (int i = 0; i < num; ++i) {
+ inputs_flat.emplace_back(new typename TTypes<T, 2>::ConstMatrix(
+ values[i].shaped<T, 2>({1, values[i].NumElements()})));
+ }
+ if (std::is_same<Device, GPUDevice>::value) {
+ ConcatGPU<T>(c->eigen_gpu_device(), inputs_flat, &output_flat);
+ } else {
+ ConcatCPU<T>(c->device(), inputs_flat, &output_flat);
+ }
+ }
+ }
+};
+
+#define REGISTER_PACK(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Pack").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ PackOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_PACK);
+REGISTER_PACK(quint8);
+REGISTER_PACK(qint8);
+REGISTER_PACK(qint32);
+REGISTER_PACK(bfloat16);
+
+#undef REGISTER_PACK
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Pack").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ PackOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+#undef REGISTER_GPU
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Pack")
+ .Device(DEVICE_GPU)
+ .HostMemory("values")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ PackOp<CPUDevice, int32>);
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
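Shape-wise, Pack of N inputs of shape [d0, d1, ...] produces an output of shape [N, d0, d1, ...]; for example, packing four [2, 3] tensors yields a [4, 2, 3] tensor. Internally each input is viewed as a 1 x NumElements matrix and the pieces are copied into place by the shared ConcatCPU/ConcatGPU kernels, so only the shape bookkeeping differs from Concat.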
diff --git a/tensorflow/core/kernels/pad_op.cc b/tensorflow/core/kernels/pad_op.cc
new file mode 100644
index 0000000000..6c66e54e3d
--- /dev/null
+++ b/tensorflow/core/kernels/pad_op.cc
@@ -0,0 +1,159 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/pad_op.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class PadOp : public OpKernel {
+ public:
+ explicit PadOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& in0 = context->input(0);
+ const Tensor& in1 = context->input(1);
+ const int dims = in0.dims();
+ static const int kMinDims = 0;
+ static const int kMaxDims = 5;
+ OP_REQUIRES(context, kMinDims <= dims && dims <= kMaxDims,
+ errors::Unimplemented("inputs rank not in [", kMinDims, ",",
+ kMaxDims, "]: ", dims));
+ OP_REQUIRES(
+ context,
+ TensorShapeUtils::IsMatrix(in1.shape()) && in1.dim_size(1) == 2,
+ errors::InvalidArgument("paddings must be a matrix with 2 columns: ",
+ in1.shape().DebugString()));
+ const int fixed_dims =
+ (kAllowLegacyScalars && dims == 0 && in1.dim_size(0) == 1) ? 1 : dims;
+ OP_REQUIRES(
+ context, fixed_dims == in1.dim_size(0),
+ errors::InvalidArgument(
+ "The first dimension of paddings must be the rank of inputs",
+ in1.shape().DebugString(), " ", in0.shape().DebugString()));
+
+ // Compute the shape of the output tensor, and allocate it.
+ TensorShape output_shape;
+ TTypes<int32>::ConstMatrix paddings = in1.matrix<int32>();
+ for (int d = 0; d < fixed_dims; ++d) {
+ const int32 before_d = paddings(d, 0); // Pad before existing elements.
+      const int32 after_d = paddings(d, 1);   // Pad after existing elements.
+ OP_REQUIRES(context, before_d >= 0 && after_d >= 0,
+ errors::InvalidArgument("Paddings must be non-negative: ",
+ before_d, " ", after_d));
+ const int size_d =
+ (kAllowLegacyScalars && d == in0.dims()) ? 1 : in0.dim_size(d);
+ output_shape.AddDim(before_d + size_d + after_d);
+ }
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+ // Invoke the dims-specific implementation.
+ switch (fixed_dims) {
+ case 0:
+ Operate<0>(context, in0.tensor<T, 0>(), paddings, output);
+ break;
+ case 1:
+ // TODO(irving): Once Pad doesn't need a scalar special case,
+ // change flat to tensor. That is, once !kAllowLegacyScalars.
+ Operate<1>(context, in0.flat<T>(), paddings, output);
+ break;
+ case 2:
+ Operate<2>(context, in0.tensor<T, 2>(), paddings, output);
+ break;
+ case 3:
+ Operate<3>(context, in0.tensor<T, 3>(), paddings, output);
+ break;
+ case 4:
+ Operate<4>(context, in0.tensor<T, 4>(), paddings, output);
+ break;
+ case 5:
+ Operate<5>(context, in0.tensor<T, 5>(), paddings, output);
+ break;
+ default:
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument("Only ranks up to 5 supported: ",
+ in0.shape().DebugString()));
+ }
+ }
+
+ private:
+ template <int Dims>
+ void Operate(OpKernelContext* context,
+ typename TTypes<T, Dims>::ConstTensor input,
+ TTypes<int32>::ConstMatrix paddings, Tensor* output) {
+ CHECK_EQ(Dims, paddings.dimension(0));
+ CHECK_EQ(2, paddings.dimension(1));
+ Eigen::array<std::pair<int32, int32>, Dims> paddings_array;
+ for (int i = 0; i < Dims; ++i) {
+ paddings_array[i] = std::make_pair(paddings(i, 0), paddings(i, 1));
+ }
+ functor::Pad<Device, T, Dims> functor;
+ functor(context->eigen_device<Device>(), output->tensor<T, Dims>(), input,
+ paddings_array);
+ }
+};
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Pad") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("paddings"), \
+ PadOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_KERNEL);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T, Dims) \
+ template <> \
+ void Pad<GPUDevice, T, Dims>::operator()( \
+ const GPUDevice& d, typename TTypes<T, Dims>::Tensor output, \
+ typename TTypes<T, Dims>::ConstTensor input, \
+ Eigen::array<std::pair<int32, int32>, Dims> paddings); \
+ extern template struct Pad<GPUDevice, T, Dims>;
+
+#define DECLARE_GPU_SPECS(T) \
+ DECLARE_GPU_SPEC(T, 0); \
+ DECLARE_GPU_SPEC(T, 1); \
+ DECLARE_GPU_SPEC(T, 2); \
+ DECLARE_GPU_SPEC(T, 3); \
+ DECLARE_GPU_SPEC(T, 4); \
+ DECLARE_GPU_SPEC(T, 5);
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("Pad") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("paddings"), \
+ PadOp<GPUDevice, T>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNEL);
+#endif // GOOGLE_CUDA
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/pad_op.h b/tensorflow/core/kernels/pad_op.h
new file mode 100644
index 0000000000..c4f8a4abda
--- /dev/null
+++ b/tensorflow/core/kernels/pad_op.h
@@ -0,0 +1,27 @@
+#ifndef TENSORFLOW_KERNELS_PAD_OP_H_
+#define TENSORFLOW_KERNELS_PAD_OP_H_
+// Functor definition for PadOp, must be compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by PadOp to do the computations.
+template <typename Device, typename T, int Dims>
+struct Pad {
+ // Pad "input" into "output", as specified by "paddings". See pad_op.cc for
+ // details.
+ void operator()(const Device& d, typename TTypes<T, Dims>::Tensor output,
+ typename TTypes<T, Dims>::ConstTensor input,
+ Eigen::array<std::pair<int32, int32>, Dims> paddings) {
+ output.device(d) = input.pad(paddings);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_PAD_OP_H_
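For illustration, the same Eigen pad expression evaluated on a standalone 2-D tensor; each pair gives the number of zero elements inserted before and after the corresponding dimension, mirroring the paddings_array built in pad_op.cc. The function name below is illustrative only.

#include <utility>

#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

void EigenPadSketch() {
  Eigen::Tensor<float, 2> input(2, 3);
  input.setConstant(1.0f);
  Eigen::array<std::pair<int, int>, 2> paddings;
  paddings[0] = std::make_pair(1, 1);  // one extra row above and one below
  paddings[1] = std::make_pair(0, 2);  // two extra columns on the right
  Eigen::Tensor<float, 2> output = input.pad(paddings);
  // output is 4 x 5: the original 2 x 3 block of ones framed by zeros.
}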
diff --git a/tensorflow/core/kernels/pad_op_gpu.cu.cc b/tensorflow/core/kernels/pad_op_gpu.cu.cc
new file mode 100644
index 0000000000..35a03a2cb2
--- /dev/null
+++ b/tensorflow/core/kernels/pad_op_gpu.cu.cc
@@ -0,0 +1,26 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/pad_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Definition of the GPU implementations declared in pad_op.cc.
+#define DEFINE_GPU_SPECS(T) \
+ template struct functor::Pad<GPUDevice, T, 0>; \
+ template struct functor::Pad<GPUDevice, T, 1>; \
+ template struct functor::Pad<GPUDevice, T, 2>; \
+ template struct functor::Pad<GPUDevice, T, 3>; \
+ template struct functor::Pad<GPUDevice, T, 4>; \
+ template struct functor::Pad<GPUDevice, T, 5>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/pooling_ops_common.cc b/tensorflow/core/kernels/pooling_ops_common.cc
new file mode 100644
index 0000000000..35e9bd75fa
--- /dev/null
+++ b/tensorflow/core/kernels/pooling_ops_common.cc
@@ -0,0 +1,252 @@
+#include "tensorflow/core/kernels/pooling_ops_common.h"
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/public/tensor.h"
+
+#if GOOGLE_CUDA
+#include "tensorflow/core/common_runtime/gpu_device_context.h"
+#include "tensorflow/core/kernels/conv_2d.h"
+#include "tensorflow/core/kernels/maxpooling_op_gpu.h"
+#include "tensorflow/core/kernels/pooling_ops_common_gpu.h"
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/stream.h"
+#endif // GOOGLE_CUDA
+
+namespace tensorflow {
+
+PoolParameters::PoolParameters(OpKernelContext* context,
+ const std::vector<int32>& ksize,
+ const std::vector<int32>& stride,
+ Padding padding,
+ const TensorShape& tensor_in_shape) {
+ // For maxpooling, tensor_in should have 4 dimensions.
+ OP_REQUIRES(context, tensor_in_shape.dims() == 4,
+ errors::InvalidArgument("tensor_in must be 4-dimensional"));
+
+ depth = tensor_in_shape.dim_size(3);
+ tensor_in_cols = tensor_in_shape.dim_size(2);
+ tensor_in_rows = tensor_in_shape.dim_size(1);
+ tensor_in_batch = tensor_in_shape.dim_size(0);
+ window_rows = ksize[1];
+ window_cols = ksize[2];
+ depth_window = ksize[3];
+ row_stride = stride[1];
+ col_stride = stride[2];
+ depth_stride = stride[3];
+
+ // We only support 2D pooling across width/height and depthwise
+ // pooling, not a combination.
+ OP_REQUIRES(context,
+ (depth_window == 1 || (window_rows == 1 && window_cols == 1)),
+ errors::Unimplemented(
+ "MaxPooling supports exactly one of pooling across depth "
+ "or pooling across width/height."));
+
+ if (depth_window == 1) {
+ OP_REQUIRES_OK(context, Get2dOutputSize(
+ tensor_in_rows, tensor_in_cols, window_rows,
+ window_cols, row_stride, col_stride, padding,
+ &out_height, &out_width, &pad_rows, &pad_cols));
+ } else {
+ // Our current version of depthwise max pooling does not support
+ // any padding, and expects the depth_window to equal the
+ // depth_stride (no overlapping).
+ OP_REQUIRES(
+ context, depth % depth_window == 0,
+ errors::Unimplemented("Depthwise max pooling requires the depth "
+ "window to evenly divide the input depth"));
+ OP_REQUIRES(
+ context, depth_stride == depth_window,
+ errors::Unimplemented("Depthwise max pooling requires the depth "
+ "window to equal the depth stride"));
+
+ // The current version of depthwise max is only implemented on CPU.
+ OP_REQUIRES(context,
+ (DeviceType(static_cast<Device*>(context->device())
+ ->attributes()
+ .device_type()) == DeviceType(DEVICE_CPU)),
+ errors::Unimplemented("Depthwise max pooling is currently "
+ "only implemented for CPU devices."));
+
+ pad_depth = 0;
+ out_depth = depth / depth_window;
+ }
+}
+
+TensorShape PoolParameters::forward_output_shape() {
+ if (depth_window == 1) {
+ // Spatial pooling
+ return TensorShape({tensor_in_batch, out_height, out_width, depth});
+ } else {
+ // Depthwise pooling
+ return TensorShape(
+ {tensor_in_batch, tensor_in_rows, tensor_in_cols, out_depth});
+ }
+}
+
+#ifdef GOOGLE_CUDA
+
+namespace {
+template <typename T>
+perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory,
+ uint64 size) {
+ perftools::gputools::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory),
+ size * sizeof(T));
+ perftools::gputools::DeviceMemory<T> typed(wrapped);
+ return typed;
+}
+} // namespace
+
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void TransformDepth<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T, 4>::ConstTensor in, \
+ const Eigen::DSizes<Eigen::DenseIndex, 4>& shuffle, \
+ typename TTypes<T, 4>::Tensor out); \
+ extern template struct TransformDepth<GPUDevice, T>;
+
+DECLARE_GPU_SPEC(float);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+template <typename T>
+void DnnPoolingGradOp<T>::Compute(
+ OpKernelContext* context,
+ perftools::gputools::dnn::PoolingMode pooling_mode,
+ const std::vector<int32>& size, const std::vector<int32>& stride,
+ Padding padding, const Tensor* tensor_in, const Tensor* tensor_out,
+ const Tensor& out_backprop, const TensorShape& tensor_in_shape) {
+ CHECK((pooling_mode == perftools::gputools::dnn::PoolingMode::kMaximum) ||
+ (tensor_in && tensor_out))
+      << "For MaxPoolGrad, both tensor_in and tensor_out need to be "
+         "specified";
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, tensor_in_shape, &output));
+
+ PoolParameters params{context, size, stride, padding, tensor_in_shape};
+ if (!context->status().ok()) {
+ return;
+ }
+
+  /// For now, cudnn does not support the NHWC format, so we need to convert
+  /// the tensors to NCHW before calling cudnn. Remove this conversion once
+  /// cudnn supports NHWC.
+ Tensor transformed_input;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({tensor_in_shape.dim_size(0),
+ tensor_in_shape.dim_size(3),
+ tensor_in_shape.dim_size(1),
+ tensor_in_shape.dim_size(2)}),
+ &transformed_input));
+ Tensor transformed_input_backprop;
+ OP_REQUIRES_OK(context, context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({tensor_in_shape.dim_size(0),
+ tensor_in_shape.dim_size(3),
+ tensor_in_shape.dim_size(1),
+ tensor_in_shape.dim_size(2)}),
+ &transformed_input_backprop));
+ Tensor transformed_output;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({out_backprop.dim_size(0), out_backprop.dim_size(3),
+ out_backprop.dim_size(1), out_backprop.dim_size(2)}),
+ &transformed_output));
+ Tensor transformed_output_backprop;
+ OP_REQUIRES_OK(
+ context,
+ context->allocate_temp(
+ DataTypeToEnum<T>::value,
+ TensorShape({out_backprop.dim_size(0), out_backprop.dim_size(3),
+ out_backprop.dim_size(1), out_backprop.dim_size(2)}),
+ &transformed_output_backprop));
+
+ auto nhwc_to_nchw = Eigen::DSizes<Eigen::DenseIndex, 4>(0, 3, 1, 2);
+ if (tensor_in) {
+ // For AvgPoolGrad, the original input tensor is not necessary. However,
+ // cudnn still requires it to be passed in, although it does not affect
+ // the results.
+ functor::TransformDepth<GPUDevice, T>()(
+ context->eigen_device<Device>(), tensor_in->tensor<T, 4>(),
+ nhwc_to_nchw, transformed_input.tensor<T, 4>());
+ }
+ if (tensor_out) {
+ // For AvgPoolGrad, the original output tensor is not necessary. However,
+ // cudnn still requires it to be passed in, although it does not affect
+ // the results.
+ functor::TransformDepth<GPUDevice, T>()(
+ context->eigen_device<Device>(), tensor_out->tensor<T, 4>(),
+ nhwc_to_nchw, transformed_output.tensor<T, 4>());
+ }
+ functor::TransformDepth<GPUDevice, T>()(
+ context->eigen_device<Device>(), out_backprop.tensor<T, 4>(),
+ nhwc_to_nchw, transformed_output_backprop.tensor<T, 4>());
+
+ /// Get ready to call cudnn
+ perftools::gputools::dnn::PoolingDescriptor pooling_desc;
+ pooling_desc.set_pooling_mode(pooling_mode)
+ .set_window_height(params.window_rows)
+ .set_window_width(params.window_cols)
+ .set_vertical_stride(params.row_stride)
+ .set_horizontal_stride(params.col_stride)
+ .set_vertical_padding(params.pad_rows)
+ .set_horizontal_padding(params.pad_cols);
+
+ perftools::gputools::dnn::BatchDescriptor orig_output_desc;
+ orig_output_desc.set_count(params.tensor_in_batch)
+ .set_height(params.out_height)
+ .set_width(params.out_width)
+ .set_feature_map_count(params.depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+
+ perftools::gputools::dnn::BatchDescriptor orig_input_desc;
+ orig_input_desc.set_count(params.tensor_in_batch)
+ .set_height(params.tensor_in_rows)
+ .set_width(params.tensor_in_cols)
+ .set_feature_map_count(params.depth)
+ .set_layout(perftools::gputools::dnn::DataLayout::kBatchDepthYX);
+
+ auto orig_output_data =
+ AsDeviceMemory(transformed_output.template flat<T>().data(),
+ transformed_output.template flat<T>().size());
+ auto orig_input_data =
+ AsDeviceMemory(transformed_input.template flat<T>().data(),
+ transformed_input.template flat<T>().size());
+ auto output_backprop =
+ AsDeviceMemory(transformed_output_backprop.template flat<T>().data(),
+ transformed_output_backprop.template flat<T>().size());
+ auto input_backprop =
+ AsDeviceMemory(transformed_input_backprop.template flat<T>().data(),
+ transformed_input_backprop.template flat<T>().size());
+
+ auto* stream = context->op_device_context<GPUDeviceContext>()->stream();
+ OP_REQUIRES(context, stream, errors::Internal("No GPU stream available."));
+
+ bool status =
+ stream->ThenPoolBackward(pooling_desc, orig_input_desc, orig_input_data,
+ orig_output_desc, orig_output_data,
+ output_backprop, &input_backprop)
+ .ok();
+ OP_REQUIRES(context, status,
+ errors::Internal("cudnn PoolBackward launch failed"));
+
+ /// Transform the output data from NCHW back to NHWC
+ auto toConstTensor = [](const Tensor& x) -> const Tensor { return x; };
+ auto nchw_to_nhwc = Eigen::DSizes<Eigen::DenseIndex, 4>(0, 2, 3, 1);
+ functor::TransformDepth<GPUDevice, T>()(
+ context->eigen_device<Device>(),
+ toConstTensor(transformed_input_backprop).template tensor<T, 4>(),
+ nchw_to_nhwc, output->tensor<T, 4>());
+}
+
+template class DnnPoolingGradOp<float>;
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
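The TransformDepth calls above perform the NHWC-to-NCHW conversion (and its inverse) as a pure index permutation. The following is a standalone sketch of what the (0, 3, 1, 2) shuffle does to a flat buffer, using only the standard library; the function name is illustrative and not part of the patch:

#include <cstdio>
#include <vector>

// Copies an NHWC-laid-out buffer into NCHW layout, i.e. the (0, 3, 1, 2)
// dimension shuffle applied before calling cudnn.
void NhwcToNchw(const std::vector<float>& in, int n, int h, int w, int c,
                std::vector<float>* out) {
  out->resize(in.size());
  for (int b = 0; b < n; ++b)
    for (int y = 0; y < h; ++y)
      for (int x = 0; x < w; ++x)
        for (int d = 0; d < c; ++d)
          // NHWC offset: ((b*h + y)*w + x)*c + d
          // NCHW offset: ((b*c + d)*h + y)*w + x
          (*out)[((b * c + d) * h + y) * w + x] =
              in[((b * h + y) * w + x) * c + d];
}

int main() {
  const int n = 1, h = 2, w = 2, c = 3;
  std::vector<float> nhwc(n * h * w * c);
  for (size_t i = 0; i < nhwc.size(); ++i) nhwc[i] = static_cast<float>(i);
  std::vector<float> nchw;
  NhwcToNchw(nhwc, n, h, w, c, &nchw);
  // Channel-0 plane of the NCHW buffer: elements 0, 3, 6, 9 of the NHWC one.
  printf("%g %g %g %g\n", nchw[0], nchw[1], nchw[2], nchw[3]);
}

The inverse (0, 2, 3, 1) shuffle applied to the result at the end of Compute is the same loop with the two offset expressions swapped.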
diff --git a/tensorflow/core/kernels/pooling_ops_common.h b/tensorflow/core/kernels/pooling_ops_common.h
new file mode 100644
index 0000000000..5bf44b6e40
--- /dev/null
+++ b/tensorflow/core/kernels/pooling_ops_common.h
@@ -0,0 +1,264 @@
+#ifndef TENSORFLOW_KERNELS_POOLING_OPS_COMMON_H_
+#define TENSORFLOW_KERNELS_POOLING_OPS_COMMON_H_
+
+#include <vector>
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/avgpooling_op.h"
+#include "tensorflow/core/kernels/maxpooling_op.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// A helper class to manage sizes and shapes for pooling operations.
+struct PoolParameters {
+ // Updates context->status if there is an invalid input.
+ PoolParameters(OpKernelContext* context, const std::vector<int32>& ksize,
+ const std::vector<int32>& stride, Padding padding,
+ const TensorShape& tensor_in_shape);
+
+ // Returns the shape of the output for "forward" pooling operations.
+ TensorShape forward_output_shape();
+
+ int depth;
+
+ int tensor_in_cols;
+ int tensor_in_rows;
+ int tensor_in_batch;
+
+ int window_rows;
+ int window_cols;
+ int depth_window;
+
+ int row_stride;
+ int col_stride;
+ int depth_stride;
+
+ int out_height;
+ int out_width;
+ int out_depth;
+
+ int pad_rows;
+ int pad_cols;
+ int pad_depth;
+};
+
+// An implementation of MaxPooling (forward).
+template <typename Device, typename T>
+class MaxPoolingOp : public UnaryOp<T> {
+ public:
+ explicit MaxPoolingOp(OpKernelConstruction* context) : UnaryOp<T>(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("ksize", &ksize_));
+ OP_REQUIRES(context, ksize_.size() == 4,
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("strides", &stride_));
+ OP_REQUIRES(context, stride_.size() == 4,
+ errors::InvalidArgument("Sliding window stride field must "
+ "specify 4 dimensions"));
+ OP_REQUIRES_OK(context, context->GetAttr("padding", &padding_));
+ OP_REQUIRES(context, ksize_[0] == 1 && stride_[0] == 1,
+ errors::Unimplemented(
+ "Pooling is not yet supported on the batch dimension."));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& tensor_in = context->input(0);
+ PoolParameters params{context, ksize_, stride_, padding_,
+ tensor_in.shape()};
+ if (!context->status().ok()) {
+ return;
+ }
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, params.forward_output_shape(), &output));
+
+ if (params.depth_window > 1) {
+ DepthwiseMaxPool(context, output, tensor_in, params);
+ } else {
+ SpatialMaxPool(context, output, tensor_in, params, padding_);
+ }
+ }
+
+ private:
+ // Single-threaded implementation of DepthwiseMaxPool, which
+ // does not handle all of the same options as SpatialMaxPool
+ // (it assumes no padding and depth_window == depth_stride).
+ //
+ // TODO(vrv): implement a more general depthwise-max pool that works
+ // on GPU as well.
+ void DepthwiseMaxPool(OpKernelContext* context, Tensor* output,
+ const Tensor& tensor_in, const PoolParameters& params) {
+ Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ in_by_pool(tensor_in.flat<T>().data(), params.depth_window,
+ tensor_in.NumElements() / params.depth_window);
+ Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>> out_by_pool(
+ output->flat<T>().data(), 1, output->NumElements());
+ out_by_pool = in_by_pool.colwise().maxCoeff();
+ }
+
+ void SpatialMaxPool(OpKernelContext* context, Tensor* output,
+ const Tensor& tensor_in, const PoolParameters& params,
+ const Padding& padding) {
+ // On GPU, use Eigen's Spatial Max Pooling. On CPU, use an
+ // EigenMatrix version that is currently faster than Eigen's
+ // Spatial MaxPooling implementation.
+ //
+ // TODO(vrv): Remove this once we no longer need it.
+ if (std::is_same<Device, GPUDevice>::value) {
+ Eigen::PaddingType pt = BrainPadding2EigenPadding(padding);
+ functor::SpatialMaxPooling<Device, T>()(
+ context->eigen_device<Device>(), output->tensor<T, 4>(),
+ tensor_in.tensor<T, 4>(), params.window_rows, params.window_cols,
+ params.row_stride, params.col_stride, pt);
+ } else {
+ typedef Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ ConstEigenMatrixMap;
+ typedef Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ EigenMatrixMap;
+
+ ConstEigenMatrixMap in_mat(tensor_in.flat<T>().data(), params.depth,
+ params.tensor_in_cols * params.tensor_in_rows *
+ params.tensor_in_batch);
+ EigenMatrixMap out_mat(
+ output->flat<T>().data(), params.depth,
+ params.out_width * params.out_height * params.tensor_in_batch);
+
+ // Initializes the output tensor with MIN<T>.
+ output->flat<T>().setConstant(Eigen::NumTraits<T>::lowest());
+
+ // The code below does the following:
+ // 1. Flattens the input and output tensors into two dimensional arrays.
+ // tensor_in_as_matrix:
+ // depth by (tensor_in_cols * tensor_in_rows * tensor_in_batch)
+ // output_as_matrix:
+ // depth by (out_width * out_height * tensor_in_batch)
+ //
+ // 2. Walks through the set of columns in the flattened
+ // tensor_in_as_matrix,
+ // and updates the corresponding column(s) in output_as_matrix with the
+ // max value.
+ for (int b = 0; b < params.tensor_in_batch; ++b) {
+ for (int h = 0; h < params.tensor_in_rows; ++h) {
+ for (int w = 0; w < params.tensor_in_cols; ++w) {
+ // (h_start, h_end) * (w_start, w_end) is the range that the input
+ // vector projects to.
+ const int hpad = h + params.pad_rows;
+ const int wpad = w + params.pad_cols;
+ const int h_start =
+ (hpad < params.window_rows)
+ ? 0
+ : (hpad - params.window_rows) / params.row_stride + 1;
+ const int h_end =
+ std::min(hpad / params.row_stride + 1, params.out_height);
+ const int w_start =
+ (wpad < params.window_cols)
+ ? 0
+ : (wpad - params.window_cols) / params.col_stride + 1;
+ const int w_end =
+ std::min(wpad / params.col_stride + 1, params.out_width);
+ // compute elementwise max
+ const int in_offset =
+ (b * params.tensor_in_rows + h) * params.tensor_in_cols + w;
+ for (int ph = h_start; ph < h_end; ++ph) {
+ for (int pw = w_start; pw < w_end; ++pw) {
+ const int out_offset =
+ (b * params.out_height + ph) * params.out_width + pw;
+ out_mat.col(out_offset) =
+ out_mat.col(out_offset).cwiseMax(in_mat.col(in_offset));
+ }
+ }
+ }
+ }
+ }
+ }
+ }
+
+ std::vector<int32> ksize_;
+ std::vector<int32> stride_;
+ Padding padding_;
+};
+
+template <typename Device, typename T>
+void SpatialAvgPool(OpKernelContext* context, Tensor* output,
+ const Tensor& input, const PoolParameters& params,
+ const Padding& padding) {
+ typedef Eigen::Map<const Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ ConstEigenMatrixMap;
+ typedef Eigen::Map<Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic>>
+ EigenMatrixMap;
+
+ auto in_flat = input.flat<T>();
+ auto out_flat = output->flat<T>();
+
+ ConstEigenMatrixMap in_mat(
+ in_flat.data(), params.depth,
+ params.tensor_in_cols * params.tensor_in_rows * params.tensor_in_batch);
+ EigenMatrixMap out_mat(
+ out_flat.data(), params.depth,
+ params.out_width * params.out_height * params.tensor_in_batch);
+ Eigen::Matrix<T, Eigen::Dynamic, 1> out_count(out_mat.cols());
+ out_count.setZero();
+
+ // Initializes output to zero.
+ out_flat.setZero();
+
+ // The code below does the following:
+ // 1. Flattens the input and output tensors into two dimensional arrays.
+ // tensor_in_as_matrix:
+ // depth by (tensor_in_cols * tensor_in_rows * tensor_in_batch)
+ // output_as_matrix:
+ // depth by (out_width * out_height * tensor_in_batch)
+ //
+ // 2. Walks through the set of columns in the flattened
+ // tensor_in_as_matrix,
+ // and updates the corresponding column(s) in output_as_matrix with the
+ // average value.
+ for (int b = 0; b < params.tensor_in_batch; ++b) {
+ for (int h = 0; h < params.tensor_in_rows; ++h) {
+ for (int w = 0; w < params.tensor_in_cols; ++w) {
+ // (h_start, h_end) * (w_start, w_end) is the range that the input
+ // vector projects to.
+ const int hpad = h + params.pad_rows;
+ const int wpad = w + params.pad_cols;
+ const int h_start =
+ (hpad < params.window_rows)
+ ? 0
+ : (hpad - params.window_rows) / params.row_stride + 1;
+ const int h_end =
+ std::min(hpad / params.row_stride + 1, params.out_height);
+ const int w_start =
+ (wpad < params.window_cols)
+ ? 0
+ : (wpad - params.window_cols) / params.col_stride + 1;
+ const int w_end =
+ std::min(wpad / params.col_stride + 1, params.out_width);
+ const int in_offset =
+ (b * params.tensor_in_rows + h) * params.tensor_in_cols + w;
+ Eigen::DSizes<ptrdiff_t, 2> in_indices(0, in_offset);
+ for (int ph = h_start; ph < h_end; ++ph) {
+ for (int pw = w_start; pw < w_end; ++pw) {
+ const int out_offset =
+ (b * params.out_height + ph) * params.out_width + pw;
+ out_mat.col(out_offset) += in_mat.col(in_offset);
+ out_count(out_offset)++;
+ }
+ }
+ }
+ }
+ }
+ DCHECK_GT(out_count.minCoeff(), 0);
+ out_mat.array().rowwise() /= out_count.transpose().array();
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_POOLING_OPS_COMMON_H_
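The inner loops of SpatialMaxPool and SpatialAvgPool invert the usual pooling relationship: for each input position they compute the range of output cells it contributes to. Below is a standalone check of that arithmetic in one dimension (plain C++, no TensorFlow dependencies):

#include <algorithm>
#include <cstdio>

// Given an input position h, return the half-open range of output cells
// [start, end) whose pooling windows contain h, using the same formulas as
// SpatialMaxPool/SpatialAvgPool above.
void OutputRange(int h, int pad, int window, int stride, int out_size,
                 int* start, int* end) {
  const int hpad = h + pad;
  *start = (hpad < window) ? 0 : (hpad - window) / stride + 1;
  *end = std::min(hpad / stride + 1, out_size);
}

int main() {
  // 1-D example: input size 6, window 3, stride 2, no padding -> 2 output
  // cells covering input positions 0..2 and 2..4.
  const int window = 3, stride = 2, pad = 0, out_size = 2;
  for (int h = 0; h < 6; ++h) {
    int start, end;
    OutputRange(h, pad, window, stride, out_size, &start, &end);
    printf("input %d -> output cells [%d, %d)\n", h, start, end);
  }
}

With these values, positions 0 and 1 feed only cell 0, position 2 feeds both cells, positions 3 and 4 feed only cell 1, and position 5 feeds nothing, which matches walking the windows forward.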
diff --git a/tensorflow/core/kernels/pooling_ops_common_gpu.h b/tensorflow/core/kernels/pooling_ops_common_gpu.h
new file mode 100644
index 0000000000..87a3ef5186
--- /dev/null
+++ b/tensorflow/core/kernels/pooling_ops_common_gpu.h
@@ -0,0 +1,39 @@
+#if !GOOGLE_CUDA
+#error This file must only be included when building with Cuda support
+#endif
+
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POOLING_OPS_COMMON_GPU_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POOLING_OPS_COMMON_GPU_H_
+
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/kernels/avgpooling_op.h"
+#include "tensorflow/core/kernels/maxpooling_op.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/util/padding.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+ // A helper class that launches the cudnn pooling backward operations.
+// The original input and output tensors are optional for AvgPoolGrad, but
+// mandatory for MaxPoolGrad.
+template <typename T>
+class DnnPoolingGradOp {
+ public:
+ typedef GPUDevice Device;
+ static void Compute(OpKernelContext* context,
+ perftools::gputools::dnn::PoolingMode pooling_mode,
+ const std::vector<int32>& size,
+ const std::vector<int32>& stride, Padding padding,
+ const Tensor* tensor_in, const Tensor* tensor_out,
+ const Tensor& out_backprop,
+ const TensorShape& tensor_in_shape);
+};
+
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POOLING_OPS_COMMON_GPU_H_
diff --git a/tensorflow/core/kernels/queue_base.cc b/tensorflow/core/kernels/queue_base.cc
new file mode 100644
index 0000000000..1b13f68a3a
--- /dev/null
+++ b/tensorflow/core/kernels/queue_base.cc
@@ -0,0 +1,153 @@
+#include "tensorflow/core/kernels/queue_base.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+namespace {
+
+template <DataType DT>
+void HandleSliceToElement(const Tensor& parent, Tensor* element, int index) {
+ typedef typename EnumToDataType<DT>::Type T;
+ auto parent_as_matrix = parent.flat_outer_dims<T>();
+ element->flat<T>() = parent_as_matrix.chip(index, 0);
+}
+
+template <DataType DT>
+void HandleElementToSlice(const Tensor& element, Tensor* parent, int index) {
+ typedef typename EnumToDataType<DT>::Type T;
+ auto parent_as_matrix = parent->flat_outer_dims<T>();
+ parent_as_matrix.chip(index, 0) = element.flat<T>();
+}
+
+} // namespace
+
+// static
+Status QueueBase::CopySliceToElement(const Tensor& parent, Tensor* element,
+ int index) {
+#define HANDLE_TYPE(DT) \
+ if (parent.dtype() == DT) { \
+ HandleSliceToElement<DT>(parent, element, index); \
+ return Status::OK(); \
+ }
+ HANDLE_TYPE(DT_FLOAT);
+ HANDLE_TYPE(DT_DOUBLE);
+ HANDLE_TYPE(DT_INT32);
+ HANDLE_TYPE(DT_UINT8);
+ HANDLE_TYPE(DT_INT16);
+ HANDLE_TYPE(DT_INT8);
+ HANDLE_TYPE(DT_STRING);
+ HANDLE_TYPE(DT_INT64);
+#undef HANDLE_TYPE
+ return errors::Unimplemented("Unhandled data type: ", parent.dtype());
+}
+
+// static
+Status QueueBase::CopyElementToSlice(const Tensor& element, Tensor* parent,
+ int index) {
+#define HANDLE_TYPE(DT) \
+ if (element.dtype() == DT) { \
+ HandleElementToSlice<DT>(element, parent, index); \
+ return Status::OK(); \
+ }
+ HANDLE_TYPE(DT_FLOAT);
+ HANDLE_TYPE(DT_DOUBLE);
+ HANDLE_TYPE(DT_INT32);
+ HANDLE_TYPE(DT_UINT8);
+ HANDLE_TYPE(DT_INT16);
+ HANDLE_TYPE(DT_INT8);
+ HANDLE_TYPE(DT_STRING);
+ HANDLE_TYPE(DT_INT64);
+#undef HANDLE_TYPE
+ return errors::Unimplemented("Unhandled data type: ", element.dtype());
+}
+
+QueueBase::QueueBase(const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes,
+ const string& name)
+ : component_dtypes_(component_dtypes),
+ component_shapes_(component_shapes),
+ name_(name) {}
+
+Status QueueBase::ValidateTupleCommon(const Tuple& tuple) const {
+ if (tuple.size() != static_cast<size_t>(num_components())) {
+ return errors::InvalidArgument(
+ "Wrong number of components in tuple. Expected ", num_components(),
+ ", got ", tuple.size());
+ }
+ for (size_t i = 0; i < tuple.size(); ++i) {
+ if (tuple[i].dtype() != component_dtypes_[i]) {
+ return errors::InvalidArgument(
+ "Type mismatch in tuple component ", i, ". Expected ",
+ DataTypeString(component_dtypes_[i]), ", got ",
+ DataTypeString(tuple[i].dtype()));
+ }
+ }
+ return Status::OK();
+}
+
+// static
+string QueueBase::ShapeListString(const gtl::ArraySlice<TensorShape>& shapes) {
+ string result = "[";
+ bool first = true;
+ for (const TensorShape& shape : shapes) {
+ strings::StrAppend(&result, (first ? "" : ", "), shape.ShortDebugString());
+ first = false;
+ }
+ strings::StrAppend(&result, "]");
+ return result;
+}
+
+Status QueueBase::MatchesNodeDefOp(const NodeDef& node_def,
+ const string& op) const {
+ if (node_def.op() != op) {
+ return errors::InvalidArgument("Shared queue '", name_, "' has type '", op,
+ "' that does not match type of Node '",
+ node_def.name(), "': ", node_def.op());
+ }
+ return Status::OK();
+}
+
+Status QueueBase::MatchesNodeDefCapacity(const NodeDef& node_def,
+ int32 capacity) const {
+ int32 requested_capacity = -1;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, "capacity", &requested_capacity));
+ if (requested_capacity < 0) requested_capacity = kUnbounded;
+ if (requested_capacity != capacity) {
+ return errors::InvalidArgument("Shared queue '", name_, "' has capacity ",
+ capacity, " but requested capacity was ",
+ requested_capacity);
+ }
+ return Status::OK();
+}
+
+Status QueueBase::MatchesNodeDefTypes(const NodeDef& node_def) const {
+ DataTypeVector requested_dtypes;
+ TF_RETURN_IF_ERROR(
+ GetNodeAttr(node_def, "component_types", &requested_dtypes));
+ if (requested_dtypes != component_dtypes_) {
+ return errors::InvalidArgument("Shared queue '", name_,
+ "' has component types ",
+ DataTypeSliceString(component_dtypes_),
+ " but requested component types were ",
+ DataTypeSliceString(requested_dtypes));
+ }
+ return Status::OK();
+}
+
+Status QueueBase::MatchesNodeDefShapes(const NodeDef& node_def) const {
+ std::vector<TensorShape> requested_shapes;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, "shapes", &requested_shapes));
+ if (requested_shapes != component_shapes_) {
+ return errors::InvalidArgument("Shared queue '", name_,
+ "' has component shapes ",
+ ShapeListString(component_shapes_),
+ " but requested component shapes were ",
+ ShapeListString(requested_shapes));
+ }
+ return Status::OK();
+}
+
+} // namespace tensorflow
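CopySliceToElement and CopyElementToSlice above treat the batched tensor as a [batch, element_size] matrix (via flat_outer_dims) and move one row at a time; the per-dtype HANDLE_TYPE dispatch exists only because the element type is not known statically. The same row-copy idea without Eigen, as a standalone sketch:

#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

// Copy row `index` of a [batch, element_size] buffer into `element`
// (the analogue of CopySliceToElement).
void SliceToElement(const std::vector<float>& parent, int element_size,
                    int index, std::vector<float>* element) {
  element->assign(parent.begin() + index * element_size,
                  parent.begin() + (index + 1) * element_size);
}

// Copy `element` into row `index` of the batched buffer
// (the analogue of CopyElementToSlice).
void ElementToSlice(const std::vector<float>& element, int index,
                    std::vector<float>* parent) {
  std::copy(element.begin(), element.end(),
            parent->begin() + index * element.size());
}

int main() {
  std::vector<float> batch = {1, 2, 3, 4, 5, 6};  // shape [3, 2]
  std::vector<float> row;
  SliceToElement(batch, /*element_size=*/2, /*index=*/1, &row);
  assert(row.size() == 2 && row[0] == 3 && row[1] == 4);
  ElementToSlice({9, 9}, /*index=*/2, &batch);
  printf("last row: %g %g\n", batch[4], batch[5]);  // 9 9
}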
diff --git a/tensorflow/core/kernels/queue_base.h b/tensorflow/core/kernels/queue_base.h
new file mode 100644
index 0000000000..4897102974
--- /dev/null
+++ b/tensorflow/core/kernels/queue_base.h
@@ -0,0 +1,77 @@
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_QUEUE_BASE_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_QUEUE_BASE_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/queue_interface.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+// Functionality common to QueueInterface implementations.
+class QueueBase : public QueueInterface {
+ public:
+ // A special value of 'capacity' meaning the queue is unbounded.
+ static const int32 kUnbounded = INT_MAX;
+
+ // Args:
+ // component_dtypes: The types of each component in a queue-element tuple.
+ // component_shapes: The shapes of each component in a queue-element tuple,
+ // which must either be empty (if the shapes are not specified) or
+ // have the same size as component_dtypes.
+ // name: A name to use for the queue.
+ QueueBase(const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes,
+ const string& name);
+
+ // Implementations of QueueInterface methods --------------------------------
+ const DataTypeVector& component_dtypes() const override {
+ return component_dtypes_;
+ }
+
+ // Other public methods -----------------------------------------------------
+ const std::vector<TensorShape>& component_shapes() const {
+ return component_shapes_;
+ }
+
+ protected:
+ // Returns the number of components in a queue-element tuple.
+ int32 num_components() const { return component_dtypes_.size(); }
+
+ // True if shapes were specified. If so, inputs will be validated
+ // against them, etc.
+ bool specified_shapes() const { return component_shapes_.size() > 0; }
+
+ // Code common to Validate*Tuple().
+ Status ValidateTupleCommon(const Tuple& tuple) const;
+
+ // Copies the index^th slice (in the first dimension) of parent into element.
+ static Status CopySliceToElement(const Tensor& parent, Tensor* element,
+ int index);
+
+ // Copies element into the index^th slice (in the first dimension) of parent.
+ static Status CopyElementToSlice(const Tensor& element, Tensor* parent,
+ int index);
+
+ ~QueueBase() override {}
+
+ // Helpers for implementing MatchesNodeDef().
+ static string ShapeListString(const gtl::ArraySlice<TensorShape>& shapes);
+ Status MatchesNodeDefOp(const NodeDef& node_def, const string& op) const;
+ Status MatchesNodeDefCapacity(const NodeDef& node_def, int32 capacity) const;
+ Status MatchesNodeDefTypes(const NodeDef& node_def) const;
+ Status MatchesNodeDefShapes(const NodeDef& node_def) const;
+
+ const DataTypeVector component_dtypes_;
+ const std::vector<TensorShape> component_shapes_;
+ const string name_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(QueueBase);
+};
+
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_QUEUE_BASE_H_
diff --git a/tensorflow/core/kernels/queue_ops.cc b/tensorflow/core/kernels/queue_ops.cc
new file mode 100644
index 0000000000..c70dc76777
--- /dev/null
+++ b/tensorflow/core/kernels/queue_ops.cc
@@ -0,0 +1,288 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/queue_interface.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class QueueOpKernel : public AsyncOpKernel {
+ public:
+ explicit QueueOpKernel(OpKernelConstruction* context)
+ : AsyncOpKernel(context) {}
+
+ void ComputeAsync(OpKernelContext* ctx, DoneCallback callback) final {
+ QueueInterface* queue;
+ OP_REQUIRES_OK_ASYNC(ctx, GetResourceFromContext(ctx, "handle", &queue),
+ callback);
+ ComputeAsync(ctx, queue, [callback, queue]() {
+ queue->Unref();
+ callback();
+ });
+ }
+
+ protected:
+ virtual void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) = 0;
+};
+
+class QueueAccessOpKernel : public QueueOpKernel {
+ public:
+ explicit QueueAccessOpKernel(OpKernelConstruction* context)
+ : QueueOpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("timeout_ms", &timeout_));
+ // TODO(keveman): Enable timeout.
+ OP_REQUIRES(context, timeout_ == -1,
+ errors::InvalidArgument("Timeout not supported yet."));
+ }
+
+ protected:
+ int64 timeout_;
+};
+
+// Defines an EnqueueOp, the execution of which enqueues a tuple of
+// tensors in the given Queue.
+//
+// The op has 1 + k inputs, where k is the number of components in the
+// tuples stored in the given Queue:
+// - Input 0: queue handle.
+// - Input 1: 0th element of the tuple.
+// - ...
+ // - Input k: (k-1)th element of the tuple.
+class EnqueueOp : public QueueAccessOpKernel {
+ public:
+ explicit EnqueueOp(OpKernelConstruction* context)
+ : QueueAccessOpKernel(context) {}
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ DataTypeVector expected_inputs = {DT_STRING_REF};
+ for (DataType dt : queue->component_dtypes()) {
+ expected_inputs.push_back(dt);
+ }
+ OP_REQUIRES_OK_ASYNC(ctx, ctx->MatchSignature(expected_inputs, {}),
+ callback);
+
+ QueueInterface::Tuple tuple;
+ OpInputList components;
+ OP_REQUIRES_OK_ASYNC(ctx, ctx->input_list("components", &components),
+ callback);
+ for (const Tensor& Tcomponent : components) {
+ tuple.push_back(Tcomponent);
+ }
+
+ OP_REQUIRES_OK_ASYNC(ctx, queue->ValidateTuple(tuple), callback);
+ queue->TryEnqueue(tuple, ctx, callback);
+ }
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(EnqueueOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueEnqueue").Device(DEVICE_CPU), EnqueueOp);
+
+// Defines an EnqueueManyOp, the execution of which slices each
+// component of a tuple of tensors along the 0th dimension, and
+// enqueues tuples of slices in the given Queue.
+//
+// The op has 1 + k inputs, where k is the number of components in the
+// tuples stored in the given Queue:
+// - Input 0: queue handle.
+// - Input 1: 0th element of the tuple.
+// - ...
+ // - Input k: (k-1)th element of the tuple.
+//
+// N.B. All tuple components must have the same size in the 0th
+// dimension.
+class EnqueueManyOp : public QueueAccessOpKernel {
+ public:
+ explicit EnqueueManyOp(OpKernelConstruction* context)
+ : QueueAccessOpKernel(context) {}
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ DataTypeVector expected_inputs = {DT_STRING_REF};
+ for (DataType dt : queue->component_dtypes()) {
+ expected_inputs.push_back(dt);
+ }
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature(expected_inputs, {}));
+
+ QueueInterface::Tuple tuple;
+ OpInputList components;
+ OP_REQUIRES_OK_ASYNC(ctx, ctx->input_list("components", &components),
+ callback);
+ for (const Tensor& Tcomponent : components) {
+ tuple.push_back(Tcomponent);
+ }
+
+ OP_REQUIRES_OK_ASYNC(ctx, queue->ValidateManyTuple(tuple), callback);
+ queue->TryEnqueueMany(tuple, ctx, callback);
+ }
+
+ ~EnqueueManyOp() override {}
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(EnqueueManyOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueEnqueueMany").Device(DEVICE_CPU),
+ EnqueueManyOp);
+
+// Defines a DequeueOp, the execution of which dequeues a tuple of
+// tensors from the given Queue.
+//
+// The op has one input, which is the handle of the appropriate
+// Queue. The op has k outputs, where k is the number of components in
+// the tuples stored in the given Queue, and output i is the ith
+// component of the dequeued tuple.
+class DequeueOp : public QueueAccessOpKernel {
+ public:
+ explicit DequeueOp(OpKernelConstruction* context)
+ : QueueAccessOpKernel(context) {}
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ OP_REQUIRES_OK_ASYNC(
+ ctx, ctx->MatchSignature({DT_STRING_REF}, queue->component_dtypes()),
+ callback);
+
+ queue->TryDequeue(ctx, [ctx, callback](const QueueInterface::Tuple& tuple) {
+ if (!ctx->status().ok()) {
+ callback();
+ return;
+ }
+ OpOutputList output_components;
+ OP_REQUIRES_OK_ASYNC(
+ ctx, ctx->output_list("components", &output_components), callback);
+ for (int i = 0; i < ctx->num_outputs(); ++i) {
+ output_components.set(i, tuple[i]);
+ }
+ callback();
+ });
+ }
+
+ ~DequeueOp() override {}
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(DequeueOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueDequeue").Device(DEVICE_CPU), DequeueOp);
+
+// Defines a DequeueManyOp, the execution of which concatenates the
+// requested number of elements from the given Queue along the 0th
+// dimension, and emits the result as a single tuple of tensors.
+//
+// The op has two inputs:
+// - Input 0: the handle to a queue.
+// - Input 1: the number of elements to dequeue.
+//
+// The op has k outputs, where k is the number of components in the
+// tuples stored in the given Queue, and output i is the ith component
+// of the dequeued tuple.
+class DequeueManyOp : public QueueAccessOpKernel {
+ public:
+ explicit DequeueManyOp(OpKernelConstruction* context)
+ : QueueAccessOpKernel(context) {}
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ const Tensor& Tnum_elements = ctx->input(1);
+ int32 num_elements = Tnum_elements.flat<int32>()(0);
+
+ OP_REQUIRES_ASYNC(
+ ctx, num_elements >= 0,
+ errors::InvalidArgument("DequeueManyOp must request a non-negative "
+ "number of elements"),
+ callback);
+
+ OP_REQUIRES_OK_ASYNC(ctx, ctx->MatchSignature({DT_STRING_REF, DT_INT32},
+ queue->component_dtypes()),
+ callback);
+
+ queue->TryDequeueMany(
+ num_elements, ctx, [ctx, callback](const QueueInterface::Tuple& tuple) {
+ if (!ctx->status().ok()) {
+ callback();
+ return;
+ }
+ OpOutputList output_components;
+ OP_REQUIRES_OK_ASYNC(
+ ctx, ctx->output_list("components", &output_components),
+ callback);
+ for (int i = 0; i < ctx->num_outputs(); ++i) {
+ output_components.set(i, tuple[i]);
+ }
+ callback();
+ });
+ }
+
+ ~DequeueManyOp() override {}
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(DequeueManyOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueDequeueMany").Device(DEVICE_CPU),
+ DequeueManyOp);
+
+// Defines a QueueCloseOp, which closes the given Queue. Closing a
+// Queue signals that no more elements will be enqueued in it.
+//
+// The op has one input, which is the handle of the appropriate Queue.
+class QueueCloseOp : public QueueOpKernel {
+ public:
+ explicit QueueCloseOp(OpKernelConstruction* context)
+ : QueueOpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("cancel_pending_enqueues",
+ &cancel_pending_enqueues_));
+ }
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ queue->Close(ctx, cancel_pending_enqueues_, callback);
+ }
+
+ private:
+ bool cancel_pending_enqueues_;
+ TF_DISALLOW_COPY_AND_ASSIGN(QueueCloseOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueClose").Device(DEVICE_CPU), QueueCloseOp);
+
+// Defines a QueueSizeOp, which computes the number of elements in the
+// given Queue, and emits it as an output tensor.
+//
+// The op has one input, which is the handle of the appropriate Queue;
+// and one output, which is a single-element tensor containing the current
+// size of that Queue.
+class QueueSizeOp : public QueueOpKernel {
+ public:
+ explicit QueueSizeOp(OpKernelConstruction* context)
+ : QueueOpKernel(context) {}
+
+ protected:
+ void ComputeAsync(OpKernelContext* ctx, QueueInterface* queue,
+ DoneCallback callback) override {
+ Tensor* Tqueue_size = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &Tqueue_size));
+ Tqueue_size->flat<int32>().setConstant(queue->size());
+ callback();
+ }
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(QueueSizeOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("QueueSize").Device(DEVICE_CPU), QueueSizeOp);
+
+} // namespace tensorflow
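All of the queue kernels above share one asynchronous pattern: look up the queue resource (which returns an owned reference), wrap the caller's done callback so that Unref runs exactly once, and hand that wrapped callback to the queue. A standalone sketch of that ownership discipline with a toy ref-counted type (names are illustrative only):

#include <cstdio>
#include <functional>

// Toy stand-in for a ref-counted resource such as QueueInterface.
struct ToyQueue {
  int refs = 1;  // reference held by the resource manager
  void Ref() { ++refs; }
  void Unref() {
    if (--refs == 0) printf("queue destroyed\n");
  }
};

// Mirrors QueueOpKernel::ComputeAsync: the wrapped callback releases the
// reference acquired by the lookup before invoking the caller's callback.
void ComputeAsync(ToyQueue* queue, std::function<void()> done) {
  auto wrapped = [queue, done]() {
    queue->Unref();
    done();
  };
  // ... the actual enqueue/dequeue work would run here, possibly later ...
  wrapped();
}

int main() {
  ToyQueue queue;
  queue.Ref();  // the lookup (GetResourceFromContext) takes a reference
  ComputeAsync(&queue, [] { printf("done callback ran\n"); });
  queue.Unref();  // the resource manager drops its reference at shutdown
}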
diff --git a/tensorflow/core/kernels/random_crop_op.cc b/tensorflow/core/kernels/random_crop_op.cc
new file mode 100644
index 0000000000..4fc12e92cb
--- /dev/null
+++ b/tensorflow/core/kernels/random_crop_op.cc
@@ -0,0 +1,103 @@
+// See docs in ../ops/image_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/guarded_philox_random.h"
+
+namespace tensorflow {
+
+template <typename T>
+class RandomCropOp : public OpKernel {
+ public:
+ explicit RandomCropOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, generator_.Init(context));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, input.dims() == 3,
+ errors::InvalidArgument("input must be 3-dimensional",
+ input.shape().ShortDebugString()));
+ const Tensor& shape_t = context->input(1);
+ OP_REQUIRES(context, shape_t.dims() == 1,
+ errors::InvalidArgument("shape_t must be 1-dimensional",
+ shape_t.shape().ShortDebugString()));
+ OP_REQUIRES(context, shape_t.NumElements() == 2,
+ errors::InvalidArgument("shape_t must have two elements",
+ shape_t.shape().ShortDebugString()));
+
+ auto shape_vec = shape_t.vec<int64>();
+ const int32 target_height = shape_vec(0);
+ const int32 target_width = shape_vec(1);
+
+ const int32 height = input.dim_size(0);
+ const int32 width = input.dim_size(1);
+ const int32 channels = input.dim_size(2);
+
+ // Allocate the output tensor with the target crop dimensions.
+ Tensor* output = nullptr;
+ const auto output_shape =
+ TensorShape({target_height, target_width, channels});
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+ // If the target size matches the actual size, forward the input directly.
+ if ((target_height == height) && (target_width == width)) {
+ *output = context->input(0);
+ }
+
+ // TODO(shlens): Implement edge case to guarantee output size dimensions.
+ // Edge case. The target dimensions are larger than the image, so
+ // zero-pad the image. This guarantees that the image will *always*
+ // be [target_height, target_width] in size.
+ OP_REQUIRES(context, width >= target_width, errors::FailedPrecondition(
+ "width must be >= target_width: width = ", width,
+ ", target_width = ", target_width));
+ OP_REQUIRES(context, height >= target_height, errors::FailedPrecondition(
+ "height must be >= target_height: height = ", height,
+ ", target_height = ", target_height));
+
+ int32 offset_height = 0;
+ int32 offset_width = 0;
+
+ auto local_gen = generator_.ReserveSamples32(2);
+ random::SimplePhilox random(&local_gen);
+
+ if (width > target_width) {
+ offset_width = random.Rand32() % (width - target_width + 1);
+ }
+ if (height > target_height) {
+ offset_height = random.Rand32() % (height - target_height + 1);
+ }
+
+ // TODO(shlens): Do this more efficiently with memcpy once padding is
+ // available for smaller images.
+ typename TTypes<T, 3>::ConstTensor input_data = input.tensor<T, 3>();
+ typename TTypes<T, 3>::Tensor output_data = output->tensor<T, 3>();
+
+ for (int y = 0; y < target_height; ++y) {
+ for (int x = 0; x < target_width; ++x) {
+ for (int c = 0; c < channels; ++c) {
+ output_data(y, x, c) =
+ input_data(y + offset_height, x + offset_width, c);
+ }
+ }
+ }
+ }
+
+ private:
+ GuardedPhiloxRandom generator_;
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomCrop").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ RandomCropOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+} // namespace tensorflow
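The offset computation in RandomCropOp draws each start coordinate uniformly from the legal range [0, extent - target]. The same arithmetic with the standard library instead of the guarded Philox generator (a sketch only, not the op's actual RNG):

#include <cstdio>
#include <random>

// Pick a crop start uniformly in [0, extent - target]; 0 when the sizes match.
int RandomOffset(int extent, int target, std::mt19937* rng) {
  if (extent <= target) return 0;
  std::uniform_int_distribution<int> dist(0, extent - target);
  return dist(*rng);
}

int main() {
  std::mt19937 rng(42);
  const int offset_h = RandomOffset(/*extent=*/480, /*target=*/224, &rng);
  const int offset_w = RandomOffset(/*extent=*/640, /*target=*/224, &rng);
  printf("crop starts at row %d, col %d\n", offset_h, offset_w);
}

The kernel itself spans the same range with Rand32() % (extent - target + 1).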
diff --git a/tensorflow/core/kernels/random_crop_op_test.cc b/tensorflow/core/kernels/random_crop_op_test.cc
new file mode 100644
index 0000000000..1f232f4969
--- /dev/null
+++ b/tensorflow/core/kernels/random_crop_op_test.cc
@@ -0,0 +1,60 @@
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+
+class RandomCropOpTest : public OpsTestBase {
+ protected:
+ RandomCropOpTest() {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("random_crop_op", "RandomCrop")
+ .Input(FakeInput(DT_UINT8))
+ .Input(FakeInput(DT_INT64))
+ .Attr("T", DT_UINT8)
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ }
+};
+
+TEST_F(RandomCropOpTest, Basic) {
+ AddInputFromArray<uint8>(TensorShape({1, 2, 1}), {2, 2});
+ AddInputFromArray<int64>(TensorShape({2}), {1, 1});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_UINT8, TensorShape({1, 1, 1}));
+ test::FillValues<uint8>(&expected, {2});
+ test::ExpectTensorEqual<uint8>(expected, *GetOutput(0));
+}
+
+TEST_F(RandomCropOpTest, SameSizeOneChannel) {
+ AddInputFromArray<uint8>(TensorShape({2, 1, 1}), {1, 2});
+ AddInputFromArray<int64>(TensorShape({2}), {2, 1});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_UINT8, TensorShape({2, 1, 1}));
+ test::FillValues<uint8>(&expected, {1, 2});
+ test::ExpectTensorEqual<uint8>(expected, *GetOutput(0));
+}
+
+TEST_F(RandomCropOpTest, SameSizeMultiChannel) {
+ AddInputFromArray<uint8>(TensorShape({2, 1, 3}), {1, 2, 3, 4, 5, 6});
+ AddInputFromArray<int64>(TensorShape({2}), {2, 1});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_UINT8, TensorShape({2, 1, 3}));
+ test::FillValues<uint8>(&expected, {1, 2, 3, 4, 5, 6});
+ test::ExpectTensorEqual<uint8>(expected, *GetOutput(0));
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/random_op.cc b/tensorflow/core/kernels/random_op.cc
new file mode 100644
index 0000000000..09b66d30e6
--- /dev/null
+++ b/tensorflow/core/kernels/random_op.cc
@@ -0,0 +1,276 @@
+// See docs in ../ops/random_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/random_op.h"
+
+#include <algorithm>
+#include <memory>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/guarded_philox_random.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+
+ // The default implementation of the functor, which should never be invoked.
+ // But we still need to provide an implementation for now for the linker to
+ // work, since we do not support all the distributions yet.
+template <typename Device, class Distribution>
+struct FillPhiloxRandom {
+ typedef typename Distribution::ResultElementType T;
+ void operator()(OpKernelContext*, const Device&, random::PhiloxRandom gen,
+ T* data, int64 size) {
+ LOG(FATAL) << "Default FillPhiloxRandom should not be executed.";
+ }
+};
+
+#if GOOGLE_CUDA
+// Declaration for the partial specialization with GPU
+template <class Distribution>
+struct FillPhiloxRandom<GPUDevice, Distribution> {
+ typedef typename Distribution::ResultElementType T;
+ void operator()(OpKernelContext* ctx, const GPUDevice&,
+ random::PhiloxRandom gen, T* data, int64 size);
+};
+
+#endif
+
+ // A helper to fill a specified range of output groups with random values.
+template <class Distribution, bool VariableSamplesPerOutput>
+struct FillPhiloxRandomTask;
+
+ // Specialization for distributions that take a fixed number of samples for
+ // each output.
+template <class Distribution>
+struct FillPhiloxRandomTask<Distribution, false> {
+ typedef typename Distribution::ResultElementType T;
+ static void Run(random::PhiloxRandom gen, T* data, int64 size,
+ int64 start_group, int64 limit_group) {
+ Distribution dist;
+ const int kGroupSize = Distribution::kResultElementCount;
+
+ gen.Skip(start_group);
+ int64 offset = start_group * kGroupSize;
+
+ // First fill all the full-size groups
+ int64 limit_group_full = std::min(limit_group, size / kGroupSize);
+ for (int64 index = start_group; index < limit_group_full; ++index) {
+ auto samples = dist(&gen);
+ std::copy(&samples[0], &samples[0] + kGroupSize, data + offset);
+ offset += kGroupSize;
+ }
+
+ // If there are any remaining elements that need to be filled, process them
+ if (limit_group_full < limit_group) {
+ int remaining_size = size - limit_group_full * kGroupSize;
+ auto samples = dist(&gen);
+ std::copy(&samples[0], &samples[0] + remaining_size, data + offset);
+ }
+ }
+};
+
+ // Specialization for distributions that take a variable number of samples
+ // for each output. This will be slower due to the generality.
+template <class Distribution>
+struct FillPhiloxRandomTask<Distribution, true> {
+ typedef typename Distribution::ResultElementType T;
+ static const int64 kReservedSamplesPerOutput = 256;
+
+ static void Run(random::PhiloxRandom base_gen, T* data, int64 size,
+ int64 start_group, int64 limit_group) {
+ using random::PhiloxRandom;
+ using random::SingleSampleAdapter;
+
+ Distribution dist;
+ const int kGroupSize = Distribution::kResultElementCount;
+
+ static const int kGeneratorSkipPerOutputGroup =
+ kGroupSize * kReservedSamplesPerOutput /
+ PhiloxRandom::kResultElementCount;
+
+ int64 offset = start_group * kGroupSize;
+
+ // First fill all the full-size groups
+ int64 limit_group_full = std::min(limit_group, size / kGroupSize);
+ int64 group_index;
+ for (group_index = start_group; group_index < limit_group_full;
+ ++group_index) {
+ // Reset the generator to the beginning of the output group region.
+ // This is necessary if we want the results to be independent of how
+ // the work is divided among threads.
+ PhiloxRandom gen = base_gen;
+ gen.Skip(group_index * kGeneratorSkipPerOutputGroup);
+ SingleSampleAdapter<PhiloxRandom> single_samples(&gen);
+
+ auto samples = dist(&single_samples);
+ std::copy(&samples[0], &samples[0] + kGroupSize, data + offset);
+ offset += kGroupSize;
+ }
+
+ // If there are any remaining elements that need to be filled, process them
+ if (limit_group_full < limit_group) {
+ PhiloxRandom gen = base_gen;
+ gen.Skip(group_index * kGeneratorSkipPerOutputGroup);
+ SingleSampleAdapter<PhiloxRandom> single_samples(&gen);
+
+ int remaining_size = size - limit_group_full * kGroupSize;
+ auto samples = dist(&single_samples);
+ std::copy(&samples[0], &samples[0] + remaining_size, data + offset);
+ }
+ }
+};
+
+ // Partial specialization for CPU to fill the entire region with random values.
+ // It splits the work into several tasks and runs them in parallel.
+template <class Distribution>
+struct FillPhiloxRandom<CPUDevice, Distribution> {
+ typedef typename Distribution::ResultElementType T;
+ void operator()(OpKernelContext* context, const CPUDevice&,
+ random::PhiloxRandom gen, T* data, int64 size) {
+ const int kGroupSize = Distribution::kResultElementCount;
+
+ auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
+
+ int64 total_group_count = (size + kGroupSize - 1) / kGroupSize;
+
+ // Limit to a maximum of six threads for now. The performance scaling is
+ // very sub-linear, and using too many threads makes overall performance
+ // much worse.
+ int num_workers = 6;
+ Shard(num_workers, worker_threads.workers, total_group_count, kGroupSize,
+ [&gen, data, size](int64 start_group, int64 limit_group) {
+ FillPhiloxRandomTask<
+ Distribution,
+ Distribution::kVariableSamplesPerOutput>::Run(gen, data, size,
+ start_group,
+ limit_group);
+ });
+ }
+};
+} // namespace functor
+
+ // For now, use the same interface as RandomOp, so we can choose either one
+ // at run time.
+template <typename Device, class Distribution>
+class PhiloxRandomOp : public OpKernel {
+ public:
+ typedef typename Distribution::ResultElementType T;
+ explicit PhiloxRandomOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, generator_.Init(ctx));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& input = ctx->input(0);
+ OP_REQUIRES(
+ ctx, TensorShapeUtils::IsLegacyVector(input.shape()),
+ errors::InvalidArgument("shape must be a vector of {int32,int64}."));
+ Tensor* output = nullptr;
+ if (input.dtype() == DataType::DT_INT32) {
+ auto vec = input.flat<int32>();
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShapeUtils::MakeShape(
+ vec.data(), vec.size()),
+ &output));
+ } else if (input.dtype() == DataType::DT_INT64) {
+ auto vec = input.flat<int64>();
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShapeUtils::MakeShape(
+ vec.data(), vec.size()),
+ &output));
+ } else {
+ OP_REQUIRES(ctx, false, errors::InvalidArgument(
+ "shape must be a vector of {int32,int64}."));
+ }
+ functor::FillPhiloxRandom<Device, Distribution>()(
+ ctx, ctx->eigen_device<Device>(),
+ ReserveRandomOutputs(output->flat<T>().size()),
+ output->flat<T>().data(), output->flat<T>().size());
+ }
+
+ private:
+ GuardedPhiloxRandom generator_;
+
+ // Reserve enough random samples in the generator for the given output count.
+ random::PhiloxRandom ReserveRandomOutputs(int64 output_count) {
+ int64 conservative_sample_count = output_count << 8;
+ return generator_.ReserveSamples128(conservative_sample_count);
+ }
+};
+
+#define REGISTER(TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomUniform") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp<CPUDevice, random::UniformDistribution< \
+ random::PhiloxRandom, TYPE> >); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomStandardNormal") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp<CPUDevice, random::NormalDistribution< \
+ random::PhiloxRandom, TYPE> >); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("TruncatedNormal") \
+ .Device(DEVICE_CPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp< \
+ CPUDevice, \
+ random::TruncatedNormalDistribution< \
+ random::SingleSampleAdapter<random::PhiloxRandom>, TYPE> >)
+
+REGISTER(float);
+REGISTER(double);
+
+#undef REGISTER
+
+#if GOOGLE_CUDA
+
+#define REGISTER(TYPE) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomUniform") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<int32>("T") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp<GPUDevice, random::UniformDistribution< \
+ random::PhiloxRandom, TYPE> >); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomStandardNormal") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<int32>("T") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp<GPUDevice, random::NormalDistribution< \
+ random::PhiloxRandom, TYPE> >); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("TruncatedNormal") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<int32>("T") \
+ .TypeConstraint<TYPE>("dtype"), \
+ PhiloxRandomOp< \
+ GPUDevice, \
+ random::TruncatedNormalDistribution< \
+ random::SingleSampleAdapter<random::PhiloxRandom>, TYPE> >)
+
+REGISTER(float);
+REGISTER(double);
+
+#undef REGISTER
+
+#endif // GOOGLE_CUDA
+
+} // end namespace tensorflow
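The CPU sharding in FillPhiloxRandom works only because Philox is a counter-based generator: Skip(n) jumps the stream forward by n groups in constant time, so each worker can position itself at its own group range and the combined output is identical to a single-threaded fill. A standalone sketch of that property with a toy counter-based generator (the hashing constants are arbitrary and the "groups" are single elements for brevity; the real code uses random::PhiloxRandom):

#include <cstdint>
#include <cstdio>
#include <vector>

// Toy counter-based generator: output i depends only on (seed, i), so any
// worker can jump straight to position `start` with Skip().
struct ToyCounterRng {
  uint64_t seed, counter = 0;
  explicit ToyCounterRng(uint64_t s) : seed(s) {}
  void Skip(uint64_t n) { counter += n; }
  uint32_t operator()() {
    uint64_t x = seed ^ (counter++ * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 31; x *= 0xBF58476D1CE4E5B9ULL; x ^= x >> 27;
    return static_cast<uint32_t>(x);
  }
};

// Fill [start, limit) of the output, mirroring FillPhiloxRandomTask::Run.
void FillRange(uint64_t seed, uint32_t* data, int64_t start, int64_t limit) {
  ToyCounterRng gen(seed);
  gen.Skip(start);  // jump to this shard's region
  for (int64_t i = start; i < limit; ++i) data[i] = gen();
}

int main() {
  std::vector<uint32_t> serial(8), sharded(8);
  FillRange(123, serial.data(), 0, 8);   // one "worker"
  FillRange(123, sharded.data(), 0, 4);  // two "workers", disjoint ranges
  FillRange(123, sharded.data(), 4, 8);
  printf("match: %d\n", serial == sharded ? 1 : 0);  // 1: independent of sharding
}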
diff --git a/tensorflow/core/kernels/random_op.h b/tensorflow/core/kernels/random_op.h
new file mode 100644
index 0000000000..7c7eed4227
--- /dev/null
+++ b/tensorflow/core/kernels/random_op.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_KERNELS_RANDOM_OP_H_
+#define TENSORFLOW_KERNELS_RANDOM_OP_H_
+
+namespace tensorflow {
+
+class OpKernelContext;
+
+namespace functor {
+
+template <typename Device, class Distribution>
+struct FillPhiloxRandom;
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_RANDOM_OP_H_
diff --git a/tensorflow/core/kernels/random_op_gpu.cu.cc b/tensorflow/core/kernels/random_op_gpu.cu.cc
new file mode 100644
index 0000000000..15cf85f27e
--- /dev/null
+++ b/tensorflow/core/kernels/random_op_gpu.cu.cc
@@ -0,0 +1,152 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/random_op.h"
+
+#include <stdio.h>
+#include <assert.h>
+
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+class OpKernelContext;
+
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+template <class Distribution, bool VariableSamplesPerOutput>
+struct FillPhiloxRandomKernel;
+
+ // A CUDA kernel to fill the data with random numbers from the specified
+// distribution. Each output takes a fixed number of samples.
+template <class Distribution>
+struct FillPhiloxRandomKernel<Distribution, false> {
+ typedef typename Distribution::ResultElementType T;
+ PHILOX_DEVICE_FUNC void Run(random::PhiloxRandom gen, T* data, int64 size) {
+ Distribution dist;
+ const int kGroupSize = Distribution::kResultElementCount;
+
+ const int32 thread_id = blockIdx.x * blockDim.x + threadIdx.x;
+ const int32 total_thread_count = gridDim.x * blockDim.x;
+ int32 offset = thread_id * kGroupSize;
+ gen.Skip(thread_id);
+
+ while (offset < size) {
+ typename Distribution::ResultType samples = dist(&gen);
+
+ for (int i = 0; i < kGroupSize; ++i) {
+ if (offset >= size) {
+ return;
+ }
+ data[offset] = samples[i];
+ ++offset;
+ }
+
+ offset += (total_thread_count - 1) * kGroupSize;
+ gen.Skip(total_thread_count - 1);
+ }
+ }
+};
+
+ // A CUDA kernel to fill the data with random numbers from the specified
+// distribution. Each output takes a variable number of samples.
+template <class Distribution>
+struct FillPhiloxRandomKernel<Distribution, true> {
+ typedef typename Distribution::ResultElementType T;
+ PHILOX_DEVICE_FUNC void Run(const random::PhiloxRandom& base_gen, T* data,
+ int64 size) {
+ using random::PhiloxRandom;
+ using random::SingleSampleAdapter;
+
+ const int kReservedSamplesPerOutput = 256;
+ const int kGroupSize = Distribution::kResultElementCount;
+ const int kGeneratorSkipPerOutputGroup = kGroupSize *
+ kReservedSamplesPerOutput /
+ PhiloxRandom::kResultElementCount;
+
+ const int32 thread_id = blockIdx.x * blockDim.x + threadIdx.x;
+ const int32 total_thread_count = gridDim.x * blockDim.x;
+ int64 group_index = thread_id;
+ int64 offset = group_index * kGroupSize;
+ Distribution dist;
+
+ while (offset < size) {
+ // Since each output takes a variable number of samples, we need to
+ // realign the generator to the beginning of the current output group.
+ PhiloxRandom gen = base_gen;
+ gen.Skip(group_index * kGeneratorSkipPerOutputGroup);
+ SingleSampleAdapter<PhiloxRandom> single_samples(&gen);
+
+ typename Distribution::ResultType samples = dist(&single_samples);
+
+ for (int i = 0; i < kGroupSize; ++i) {
+ if (offset >= size) {
+ return;
+ }
+ data[offset] = samples[i];
+ ++offset;
+ }
+
+ offset += (total_thread_count - 1) * kGroupSize;
+ group_index += total_thread_count;
+ }
+ }
+};
+
+// A simple launch pad to call the correct function templates to fill the data
+template <class Distribution>
+__global__ void __launch_bounds__(1024)
+ FillPhiloxRandomKernelLaunch(random::PhiloxRandom base_gen,
+ typename Distribution::ResultElementType* data,
+ int64 size) {
+ FillPhiloxRandomKernel<Distribution,
+ Distribution::kVariableSamplesPerOutput>()
+ .Run(base_gen, data, size);
+}
+
+// Partial specialization for GPU
+template <class Distribution>
+struct FillPhiloxRandom<GPUDevice, Distribution> {
+ typedef typename Distribution::ResultElementType T;
+ typedef GPUDevice Device;
+ void operator()(OpKernelContext*, const Device& d, random::PhiloxRandom gen,
+ T* data, int64 size) {
+ const int32 block_size = d.maxCudaThreadsPerBlock();
+ const int32 num_blocks =
+ (d.getNumCudaMultiProcessors() * d.maxCudaThreadsPerMultiProcessor()) /
+ block_size;
+
+ FillPhiloxRandomKernelLaunch<
+ Distribution><<<num_blocks, block_size, 0, d.stream()>>>(gen, data,
+ size);
+ }
+};
+
+ // Explicit instantiations of the GPU distribution functors
+// clang-format off
+// NVCC cannot handle ">>" properly
+template struct FillPhiloxRandom<
+ GPUDevice, random::UniformDistribution<random::PhiloxRandom, float> >;
+template struct FillPhiloxRandom<
+ GPUDevice, random::UniformDistribution<random::PhiloxRandom, double> >;
+template struct FillPhiloxRandom<
+ GPUDevice, random::NormalDistribution<random::PhiloxRandom, float> >;
+template struct FillPhiloxRandom<
+ GPUDevice, random::NormalDistribution<random::PhiloxRandom, double> >;
+template struct FillPhiloxRandom<
+ GPUDevice, random::TruncatedNormalDistribution<
+ random::SingleSampleAdapter<random::PhiloxRandom>, float> >;
+template struct FillPhiloxRandom<
+ GPUDevice, random::TruncatedNormalDistribution<
+ random::SingleSampleAdapter<random::PhiloxRandom>, double> >;
+// clang-format on
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
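In the CUDA kernels above, each thread handles group thread_id, thread_id + total_thread_count, and so on; the somewhat opaque `offset += (total_thread_count - 1) * kGroupSize` step works because the inner loop has already advanced offset by one full group. A CPU-only simulation of that indexing, showing that every output element is covered exactly once (values are illustrative):

#include <cstdio>
#include <vector>

int main() {
  const int kGroupSize = 4, total_threads = 3, size = 26;
  std::vector<int> owner(size, -1);
  for (int thread_id = 0; thread_id < total_threads; ++thread_id) {
    int offset = thread_id * kGroupSize;  // the gen.Skip(thread_id) equivalent
    while (offset < size) {
      for (int i = 0; i < kGroupSize && offset < size; ++i) {
        owner[offset++] = thread_id;      // data[offset] = samples[i]
      }
      offset += (total_threads - 1) * kGroupSize;  // jump over the other threads
    }
  }
  int uncovered = 0;
  for (int i = 0; i < size; ++i) uncovered += (owner[i] < 0);
  printf("uncovered elements: %d\n", uncovered);  // 0: full, non-overlapping cover
}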
diff --git a/tensorflow/core/kernels/random_op_test.cc b/tensorflow/core/kernels/random_op_test.cc
new file mode 100644
index 0000000000..751b61cfba
--- /dev/null
+++ b/tensorflow/core/kernels/random_op_test.cc
@@ -0,0 +1,99 @@
+#include <random>
+
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+Tensor Int32(int32 v) {
+ Tensor t(DT_INT32, TensorShape({}));
+ t.scalar<int32>()() = v;
+ return t;
+}
+
+Graph* RandomUniform(int64 n) {
+ Graph* g = new Graph(OpRegistry::Global());
+ test::graph::RandomUniform(g, test::graph::Constant(g, Int32(n)), DT_FLOAT);
+ return g;
+}
+
+Graph* RandomNormal(int64 n) {
+ Graph* g = new Graph(OpRegistry::Global());
+ test::graph::RandomGaussian(g, test::graph::Constant(g, Int32(n)), DT_FLOAT);
+ return g;
+}
+
+Graph* RandomParameters(int64 n) {
+ Graph* g = new Graph(OpRegistry::Global());
+ test::graph::RandomParameters(g, test::graph::Constant(g, Int32(n)),
+ DT_FLOAT);
+ return g;
+}
+
+#define BM_RNG(DEVICE, RNG) \
+ static void BM_##DEVICE##_##RNG(int iters, int arg) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * arg); \
+ test::Benchmark(#DEVICE, RNG(arg)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_##RNG)->Range(1 << 20, 8 << 20);
+
+BM_RNG(cpu, RandomUniform);
+BM_RNG(cpu, RandomNormal);
+BM_RNG(cpu, RandomParameters);
+
+BM_RNG(gpu, RandomUniform);
+BM_RNG(gpu, RandomNormal);
+BM_RNG(gpu, RandomParameters);
+
+static void BM_PhiloxRandom(int iters) {
+ // Fill 2M random numbers
+ int count = 2 << 20;
+
+ testing::ItemsProcessed(static_cast<int64>(iters) * count);
+
+ random::PhiloxRandom gen(0x12345);
+
+ int val = 1;
+ for (int i = 0; i < iters; ++i) {
+ for (int j = 0; j < count; j += 4) {
+ /// each invocation of gen() returns 128-bit samples
+ auto samples = gen();
+
+ // use the result trivially so the compiler does not optimize it away
+ val ^= samples[0] ^ samples[1] ^ samples[2] ^ samples[3];
+ }
+ }
+
+ // An anchor point to make sure the compiler does not cut corners
+ CHECK(val) << val;
+}
+BENCHMARK(BM_PhiloxRandom);
+
+static void BM_StdMTRandom(int iters) {
+ // Fill 2M random numbers
+ int count = 2 << 20;
+
+ testing::ItemsProcessed(static_cast<int64>(iters) * count);
+
+ std::mt19937 gen(0x12345);
+
+ int val = 1;
+ for (int i = 0; i < iters; ++i) {
+ for (int j = 0; j < count; ++j) {
+      // Each invocation of gen() returns a single 32-bit sample.
+ uint32 sample = gen();
+
+ // use the result trivially so the compiler does not optimize it away
+ val ^= sample;
+ }
+ }
+
+  // An anchor point to make sure the compiler does not cut corners
+ CHECK(val) << val;
+}
+BENCHMARK(BM_StdMTRandom);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/random_shuffle_op.cc b/tensorflow/core/kernels/random_shuffle_op.cc
new file mode 100644
index 0000000000..b87f4e58a0
--- /dev/null
+++ b/tensorflow/core/kernels/random_shuffle_op.cc
@@ -0,0 +1,89 @@
+// See docs in ../ops/random_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_util.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/guarded_philox_random.h"
+
+namespace tensorflow {
+
+// TODO(irving): If performance is critical, generate output directly instead
+// of an in-place shuffle using a pseudorandom permutation like
+//
+// https://github.com/otherlab/geode/blob/master/geode/random/permute.cpp
+//
+// This is probably also the right thing if we want a GPU version of shuffling.
+
+// We use our own version of std::random_shuffle to guarantee that exactly
+// size - 1 samples are used.
+template <class Iter, class Random>
+static inline void RandomShuffle(Iter first, Iter last, Random& uniform) {
+ if (first == last) return;
+ const auto stop = last - 1;
+ for (auto i = first; i != stop; ++i) {
+ using std::iter_swap;
+ iter_swap(i, i + uniform(last - i));
+ }
+}
+
+template <typename T>
+class RandomShuffleOp : public OpKernel {
+ public:
+ explicit RandomShuffleOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, generator_.Init(context));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+
+ if (input.NumElements() <= 1 || input.dim_size(0) <= 1) {
+ // No shuffling is required, so copy input directly to output
+ context->set_output(0, input);
+ } else {
+ // Reserve enough random samples for shuffling
+ const int64 size = input.dim_size(0);
+ const int64 samples = size - 1;
+ auto local_gen = generator_.ReserveSamples32(samples);
+ random::SingleSampleAdapter<random::PhiloxRandom> single(&local_gen);
+ const auto uniform = [&single](uint32 n) { return single() % n; };
+
+ if (input.dims() == 1) {
+ // For 1D data, copy and then shuffle in place
+ context->set_output(0, tensor::DeepCopy(input));
+ auto vec = context->mutable_output(0)->vec<T>();
+ RandomShuffle(vec.data(), vec.data() + size, uniform);
+ } else {
+ // For >= 2D, shuffle indices and then copy across
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+ const auto input_mat = input.flat_outer_dims<T>();
+ auto output_mat = output->flat_outer_dims<T>();
+ std::vector<int> permutation(size);
+ for (int i = 0; i < size; i++) {
+ permutation[i] = i;
+ }
+ RandomShuffle(permutation.begin(), permutation.end(), uniform);
+ for (int i = 0; i < size; i++) {
+ output_mat.template chip<0>(i) =
+ input_mat.template chip<0>(permutation[i]);
+ }
+ }
+ }
+ }
+
+ private:
+ GuardedPhiloxRandom generator_;
+};
+
+#define REGISTER(T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("RandomShuffle").Device(DEVICE_CPU).TypeConstraint<T>("T"), \
+ RandomShuffleOp<T>);
+TF_CALL_ALL_TYPES(REGISTER)
+
+} // namespace tensorflow
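For the >= 2D branch above, the kernel shuffles a vector of row indices and then copies whole rows into the output. A minimal standalone sketch of the same idea, using only the C++ standard library (the matrix representation, function name, and std::mt19937 generator here are illustrative stand-ins, not the kernel's actual types):

#include <cstdint>
#include <random>
#include <vector>

// Shuffle the rows of a row-major matrix by permuting indices first and
// then copying each source row to its destination, mirroring the >= 2D
// branch of RandomShuffleOp. The Fisher-Yates loop uses exactly
// size - 1 random draws, like the kernel's RandomShuffle helper.
std::vector<std::vector<float>> ShuffleRows(
    const std::vector<std::vector<float>>& input, uint64_t seed) {
  const size_t size = input.size();
  std::vector<size_t> permutation(size);
  for (size_t i = 0; i < size; ++i) permutation[i] = i;

  std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
  for (size_t i = 0; i + 1 < size; ++i) {
    std::uniform_int_distribution<size_t> pick(i, size - 1);
    std::swap(permutation[i], permutation[pick(gen)]);
  }

  std::vector<std::vector<float>> output(size);
  for (size_t i = 0; i < size; ++i) output[i] = input[permutation[i]];
  return output;
}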
diff --git a/tensorflow/core/kernels/random_shuffle_queue_op.cc b/tensorflow/core/kernels/random_shuffle_queue_op.cc
new file mode 100644
index 0000000000..561ec76e53
--- /dev/null
+++ b/tensorflow/core/kernels/random_shuffle_queue_op.cc
@@ -0,0 +1,740 @@
+// See docs in ../ops/data_flow_ops.cc.
+
+#include <deque>
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/queue_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class RandomShuffleQueue : public QueueBase {
+ public:
+ RandomShuffleQueue(int32 capacity, int32 min_after_dequeue, int64 seed,
+ int64 seed2, const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes,
+ const string& name);
+ Status Initialize(); // Must be called before any other method.
+
+ // Implementations of QueueInterface methods --------------------------------
+
+ Status ValidateTuple(const Tuple& tuple) override;
+ Status ValidateManyTuple(const Tuple& tuple) override;
+ void TryEnqueue(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) override;
+ void TryEnqueueMany(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) override;
+ void TryDequeue(OpKernelContext* ctx, CallbackWithTuple callback) override;
+ void TryDequeueMany(int num_elements, OpKernelContext* ctx,
+ CallbackWithTuple callback) override;
+ void Close(OpKernelContext* ctx, bool cancel_pending_enqueues,
+ DoneCallback callback) override;
+ Status MatchesNodeDef(const NodeDef& node_def) override;
+
+ int32 size() override {
+ mutex_lock lock(mu_);
+ return queues_[0].size();
+ }
+
+ private:
+ enum Action { kEnqueue, kDequeue };
+
+ ~RandomShuffleQueue() override {}
+
+ TensorShape ManyOutShape(int i, int batch_size) {
+ TensorShape shape({batch_size});
+ shape.AppendShape(component_shapes_[i]);
+ return shape;
+ }
+
+ // Helper for dequeuing a single random element from queues_.
+ void DequeueLocked(OpKernelContext* ctx, Tuple* tuple)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ void Cancel(Action action, CancellationToken token);
+
+ // Helper for cancelling all pending Enqueue(Many) operations when
+ // Close is called with cancel_pending_enqueues.
+ void CloseAndCancel();
+
+ // Tries to enqueue/dequeue (or close) based on whatever is at the
+  // front of enqueue_attempts_/dequeue_attempts_. Appends an entry to
+  // *clean_up for any finished attempt (so its callback may be
+ // called once mu_ is released). Returns true if any progress was
+ // made.
+ struct CleanUp {
+ CleanUp(DoneCallback&& f, CancellationToken ct, CancellationManager* cm)
+ : finished(f), to_deregister(ct), cm(cm) {}
+ DoneCallback finished;
+ CancellationToken to_deregister;
+ CancellationManager* cm;
+ };
+ bool TryAttemptLocked(Action action, std::vector<CleanUp>* clean_up)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Tries to make progress on the enqueues or dequeues at the front
+ // of the *_attempts_ queues.
+ void FlushUnlocked();
+
+ const int32 capacity_;
+ const int32 min_after_dequeue_;
+ const int64 original_seed_;
+ const int64 original_seed2_;
+
+ mutex mu_;
+ typedef std::vector<PersistentTensor> SubQueue;
+ std::vector<SubQueue> queues_ GUARDED_BY(mu_);
+ bool closed_ GUARDED_BY(mu_);
+ random::PhiloxRandom parent_generator_ GUARDED_BY(mu_);
+ random::SingleSampleAdapter<random::PhiloxRandom> generator_ GUARDED_BY(mu_);
+
+ enum RunResult { kNoProgress, kProgress, kComplete };
+ struct Attempt;
+ typedef std::function<RunResult(Attempt*)> RunCallback;
+ struct Attempt {
+ int32 elements_requested;
+ DoneCallback done_callback; // must be run outside mu_
+ OpKernelContext* context;
+ CancellationToken cancellation_token;
+ RunCallback run_callback; // must be run while holding mu_
+ bool is_cancelled;
+ Tuple tuple;
+
+ Attempt(int32 elements_requested, DoneCallback done_callback,
+ OpKernelContext* context, CancellationToken cancellation_token,
+ RunCallback run_callback)
+ : elements_requested(elements_requested),
+ done_callback(done_callback),
+ context(context),
+ cancellation_token(cancellation_token),
+ run_callback(run_callback),
+ is_cancelled(false) {}
+ };
+ std::deque<Attempt> enqueue_attempts_ GUARDED_BY(mu_);
+ std::deque<Attempt> dequeue_attempts_ GUARDED_BY(mu_);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RandomShuffleQueue);
+};
+
+RandomShuffleQueue::RandomShuffleQueue(
+ int capacity, int min_after_dequeue, int64 seed, int64 seed2,
+ const DataTypeVector& component_dtypes,
+ const std::vector<TensorShape>& component_shapes, const string& name)
+ : QueueBase(component_dtypes, component_shapes, name),
+ capacity_(capacity),
+ min_after_dequeue_(min_after_dequeue),
+ original_seed_(seed),
+ original_seed2_(seed2),
+ closed_(false),
+ generator_(&parent_generator_) {
+ if (seed == 0 && seed2 == 0) {
+ // If both seeds are unspecified, use completely random seeds.
+ seed = random::New64();
+ seed2 = random::New64();
+ }
+ parent_generator_ = random::PhiloxRandom(seed, seed2);
+}
+
+Status RandomShuffleQueue::Initialize() {
+ if (component_dtypes_.empty()) {
+ return errors::InvalidArgument("Empty component types for queue ", name_);
+ }
+ if (!component_shapes_.empty() &&
+ component_dtypes_.size() != component_shapes_.size()) {
+ return errors::InvalidArgument("Different number of component types (",
+ component_dtypes_.size(), ") vs. shapes (",
+ component_shapes_.size(), ").");
+ }
+
+ mutex_lock lock(mu_);
+ queues_.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ queues_.push_back(SubQueue());
+ queues_.back().reserve(min_after_dequeue_);
+ }
+ return Status::OK();
+}
+
+// TODO(mrry): If these checks become a bottleneck, find a way to
+// reduce the number of times that they are called.
+Status RandomShuffleQueue::ValidateTuple(const Tuple& tuple) {
+ TF_RETURN_IF_ERROR(ValidateTupleCommon(tuple));
+ if (specified_shapes()) {
+ for (size_t i = 0; i < tuple.size(); ++i) {
+ if (!tuple[i].shape().IsSameSize(component_shapes_[i])) {
+ return errors::InvalidArgument(
+ "Shape mismatch in tuple component ", i, ". Expected ",
+ component_shapes_[i].ShortDebugString(), ", got ",
+ tuple[i].shape().ShortDebugString());
+ }
+ }
+ }
+ return Status::OK();
+}
+
+// TODO(mrry): If these checks become a bottleneck, find a way to
+// reduce the number of times that they are called.
+Status RandomShuffleQueue::ValidateManyTuple(const Tuple& tuple) {
+ TF_RETURN_IF_ERROR(ValidateTupleCommon(tuple));
+ const int64 batch_size = tuple[0].dim_size(0);
+ if (specified_shapes()) {
+ for (size_t i = 0; i < tuple.size(); ++i) {
+ // Expected shape is [batch_size] + component_shapes_[i]
+ const TensorShape expected_shape = ManyOutShape(i, batch_size);
+ if (!tuple[i].shape().IsSameSize(expected_shape)) {
+ return errors::InvalidArgument(
+ "Shape mismatch in tuple component ", i, ". Expected ",
+ expected_shape.ShortDebugString(), ", got ",
+ tuple[i].shape().ShortDebugString());
+ }
+ }
+ } else {
+ for (size_t i = 1; i < tuple.size(); ++i) {
+ if (tuple[i].dim_size(0) != batch_size) {
+ return errors::InvalidArgument(
+ "All input tensors must have the same size in the 0th ",
+ "dimension. Component ", i, " has ", tuple[i].dim_size(0),
+ ", and should have ", batch_size);
+ }
+ }
+ }
+ return Status::OK();
+}
+
+void RandomShuffleQueue::DequeueLocked(OpKernelContext* ctx, Tuple* tuple) {
+ DCHECK_GT(queues_[0].size(), 0);
+ int64 index = generator_() % queues_[0].size();
+ (*tuple).reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ (*tuple).push_back(*queues_[i][index].AccessTensor(ctx));
+ queues_[i][index] = queues_[i].back();
+ queues_[i].pop_back();
+ }
+}
+
+void RandomShuffleQueue::Cancel(Action action, CancellationToken token) {
+ DoneCallback callback = nullptr;
+ {
+ mutex_lock lock(mu_);
+ std::deque<Attempt>* attempts =
+ action == kEnqueue ? &enqueue_attempts_ : &dequeue_attempts_;
+
+ for (Attempt& attempt : *attempts) {
+ if (attempt.cancellation_token == token) {
+ attempt.is_cancelled = true;
+ if (action == kEnqueue) {
+ attempt.context->SetStatus(
+ errors::Cancelled("Enqueue operation was cancelled"));
+ } else {
+ attempt.context->SetStatus(
+ errors::Cancelled("Dequeue operation was cancelled"));
+ }
+ std::swap(callback, attempt.done_callback);
+ break;
+ }
+ }
+ }
+ if (callback) {
+ callback();
+ FlushUnlocked();
+ }
+}
+
+void RandomShuffleQueue::CloseAndCancel() {
+ std::vector<DoneCallback> callbacks;
+ {
+ mutex_lock lock(mu_);
+ closed_ = true;
+ for (Attempt& attempt : enqueue_attempts_) {
+ attempt.is_cancelled = true;
+ attempt.context->SetStatus(
+ errors::Cancelled("Enqueue operation was cancelled"));
+ callbacks.emplace_back(std::move(attempt.done_callback));
+ }
+ }
+ for (const DoneCallback& callback : callbacks) {
+ callback();
+ }
+ FlushUnlocked();
+}
+
+bool RandomShuffleQueue::TryAttemptLocked(
+ Action action, std::vector<CleanUp>* clean_up) {
+ std::deque<Attempt>* attempts =
+ action == kEnqueue ? &enqueue_attempts_ : &dequeue_attempts_;
+
+ bool progress = false;
+ bool done = false;
+ while (!done && !attempts->empty()) {
+ if (attempts->front().is_cancelled) {
+ if (action == kEnqueue) {
+ LOG(INFO) << "Skipping cancelled enqueue attempt";
+ } else {
+ LOG(INFO) << "Skipping cancelled dequeue attempt";
+ }
+ attempts->pop_front();
+ } else {
+ Attempt* cur_attempt = &attempts->front();
+ switch (cur_attempt->run_callback(cur_attempt)) {
+ case kNoProgress:
+ done = true;
+ break;
+ case kProgress:
+ done = true;
+ progress = true;
+ break;
+ case kComplete:
+ progress = true;
+ clean_up->emplace_back(std::move(cur_attempt->done_callback),
+ cur_attempt->cancellation_token,
+ cur_attempt->context->cancellation_manager());
+ attempts->pop_front();
+ break;
+ }
+ }
+ }
+ return progress;
+}
+
+void RandomShuffleQueue::FlushUnlocked() {
+ std::vector<CleanUp> clean_up;
+ Ref();
+ {
+ mutex_lock lock(mu_);
+ bool changed;
+ do {
+ changed = TryAttemptLocked(kEnqueue, &clean_up);
+ changed = TryAttemptLocked(kDequeue, &clean_up) || changed;
+ } while (changed);
+ }
+ Unref();
+ for (const auto& to_clean : clean_up) {
+ if (to_clean.to_deregister != CancellationManager::kInvalidToken) {
+ // NOTE(mrry): We can safely ignore the return value of
+ // DeregisterCallback because the mutex mu_ ensures that the
+ // cleanup action only executes once.
+ to_clean.cm->DeregisterCallback(to_clean.to_deregister);
+ }
+ to_clean.finished();
+ }
+}
+
+void RandomShuffleQueue::TryEnqueue(const Tuple& tuple, OpKernelContext* ctx,
+ DoneCallback callback) {
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kEnqueue, token); });
+ if (!already_cancelled) {
+ enqueue_attempts_.emplace_back(
+ 1, callback, ctx, token,
+ [tuple, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(errors::Aborted(
+ "RandomShuffleQueue '", name_, "' is closed."));
+ return kComplete;
+ }
+ if (queues_[0].size() < static_cast<size_t>(capacity_)) {
+ for (int i = 0; i < num_components(); ++i) {
+ queues_[i].push_back(PersistentTensor(tuple[i]));
+ }
+ return kComplete;
+ } else {
+ return kNoProgress;
+ }
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Enqueue operation was cancelled"));
+ callback();
+ }
+}
+
+void RandomShuffleQueue::TryEnqueueMany(const Tuple& tuple,
+ OpKernelContext* ctx,
+ DoneCallback callback) {
+ const int64 batch_size = tuple[0].dim_size(0);
+ if (batch_size == 0) {
+ callback();
+ return;
+ }
+
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kEnqueue, token); });
+ if (!already_cancelled) {
+ enqueue_attempts_.emplace_back(
+ batch_size, callback, ctx, token,
+ [tuple, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(errors::Aborted(
+ "RandomShuffleQueue '", name_, "' is closed."));
+ return kComplete;
+ }
+ RunResult result = kNoProgress;
+ while (queues_[0].size() < static_cast<size_t>(capacity_)) {
+ result = kProgress;
+ const int index =
+ tuple[0].dim_size(0) - attempt->elements_requested;
+ for (int i = 0; i < num_components(); ++i) {
+ TensorShape element_shape(tuple[i].shape());
+ element_shape.RemoveDim(0);
+ PersistentTensor element;
+ Tensor* element_access = nullptr;
+ attempt->context->allocate_persistent(
+ tuple[i].dtype(), element_shape, &element, &element_access);
+ attempt->context->SetStatus(
+ CopySliceToElement(tuple[i], element_access, index));
+ if (!attempt->context->status().ok()) return kComplete;
+ queues_[i].push_back(element);
+ }
+ --attempt->elements_requested;
+ if (attempt->elements_requested == 0) {
+ return kComplete;
+ }
+ }
+ return result;
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Enqueue operation was cancelled"));
+ callback();
+ }
+}
+
+void RandomShuffleQueue::TryDequeue(OpKernelContext* ctx,
+ CallbackWithTuple callback) {
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kDequeue, token); });
+ if (!already_cancelled) {
+ // TODO(josh11b): This makes two copies of callback, avoid this if possible.
+ dequeue_attempts_.emplace_back(
+ 1, [callback]() { callback(Tuple()); }, ctx, token,
+ [callback, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ int32 s = queues_[0].size();
+ if (closed_ && s == 0) {
+ attempt->context->SetStatus(errors::OutOfRange(
+ "RandomShuffleQueue '", name_, "' is closed and has ",
+ "insufficient elements (requested ", 1, ", current size ", s,
+ ")"));
+ return kComplete;
+ }
+ if (!closed_) s -= min_after_dequeue_;
+ if (s > 0) {
+ Tuple tuple;
+ DequeueLocked(attempt->context, &tuple);
+ attempt->done_callback = [callback, tuple]() { callback(tuple); };
+ return kComplete;
+ } else {
+ return kNoProgress;
+ }
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Dequeue operation was cancelled"));
+ callback(Tuple());
+ }
+}
+
+void RandomShuffleQueue::TryDequeueMany(int num_elements, OpKernelContext* ctx,
+ CallbackWithTuple callback) {
+ if (!specified_shapes()) {
+ ctx->SetStatus(
+ errors::InvalidArgument("RandomShuffleQueue's DequeueMany requires the "
+ "components to have specified shapes."));
+ callback(Tuple());
+ return;
+ }
+ if (num_elements == 0) {
+ Tuple tuple;
+ tuple.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ // TODO(josh11b,misard): Switch to allocate_output(). Problem is
+ // this breaks the abstraction boundary since we don't *really*
+ // know if and how the Tensors in the tuple we pass to callback
+ // correspond to the outputs of *ctx. For example, the
+ // ReaderRead Op uses TryDequeue() to get a filename out of a
+ // queue that is used internally by the reader and is not
+ // associated with any output of the ReaderRead.
+ // mrry@ adds:
+ // Maybe we need to pass a std::function<Tensor*(...)> (or
+ // better signature) that calls the appropriate allocator
+ // function in addition to ctx? (Or support a shim Allocator
+ // that has an internal OpKernelContext*, and dispatches to the
+ // appropriate method?)
+ // misard@ adds:
+ // I don't see that a std::function would help. The problem is
+ // that at this point (allocation time) the system doesn't know
+ // what is going to happen to the element read out of the
+ // queue. As long as we keep the generality that TensorFlow Ops
+ // do their own dynamic allocation in arbitrary C++ code, we
+ // need to preserve robustness to allocating output Tensors with
+ // the 'wrong' attributes, and fixing up with a copy. The only
+ // improvement I can see here in the future would be to support
+ // an optimized case where the queue 'knows' what attributes to
+ // use, and plumbs them through here.
+ Tensor element;
+ ctx->allocate_temp(component_dtypes_[i], ManyOutShape(i, 0), &element);
+ tuple.emplace_back(element);
+ }
+ callback(tuple);
+ return;
+ }
+
+ CancellationManager* cm = ctx->cancellation_manager();
+ CancellationToken token = cm->get_cancellation_token();
+ bool already_cancelled;
+ {
+ mutex_lock l(mu_);
+ already_cancelled = !cm->RegisterCallback(
+ token, [this, token]() { Cancel(kDequeue, token); });
+ if (!already_cancelled) {
+ // TODO(josh11b): This makes two copies of callback, avoid this if possible.
+ dequeue_attempts_.emplace_back(
+ num_elements, [callback]() { callback(Tuple()); }, ctx, token,
+ [callback, this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ int32 s = queues_[0].size();
+ if (closed_ && s < attempt->elements_requested) {
+ attempt->context->SetStatus(errors::OutOfRange(
+                  "RandomShuffleQueue '", name_, "' is closed and has ",
+ "insufficient elements (requested ",
+ attempt->elements_requested, ", current size ", s, ")"));
+ return kComplete;
+ }
+
+ RunResult result = kNoProgress;
+ if (!closed_) s -= min_after_dequeue_;
+ for (; s > 0; --s) {
+ if (attempt->tuple.empty()) {
+ // Only allocate tuple when we have something to dequeue
+                // so we don't use excessive memory when there are many
+ // blocked dequeue attempts waiting.
+ attempt->tuple.reserve(num_components());
+ for (int i = 0; i < num_components(); ++i) {
+ const TensorShape shape =
+ ManyOutShape(i, attempt->elements_requested);
+ Tensor element;
+ attempt->context->allocate_temp(component_dtypes_[i], shape,
+ &element);
+ attempt->tuple.emplace_back(element);
+ }
+ }
+ result = kProgress;
+ Tuple tuple;
+ DequeueLocked(attempt->context, &tuple);
+ const int index =
+ attempt->tuple[0].dim_size(0) - attempt->elements_requested;
+ for (int i = 0; i < num_components(); ++i) {
+ attempt->context->SetStatus(
+ CopyElementToSlice(tuple[i], &attempt->tuple[i], index));
+ if (!attempt->context->status().ok()) return kComplete;
+ }
+ tuple.clear();
+ --attempt->elements_requested;
+ if (attempt->elements_requested == 0) {
+ tuple = attempt->tuple;
+ attempt->done_callback = [callback, tuple]() {
+ callback(tuple);
+ };
+ return kComplete;
+ }
+ }
+ return result;
+ });
+ }
+ }
+ if (!already_cancelled) {
+ FlushUnlocked();
+ } else {
+ ctx->SetStatus(errors::Cancelled("Dequeue operation was cancelled"));
+ callback(Tuple());
+ }
+}
+
+void RandomShuffleQueue::Close(OpKernelContext* ctx,
+ bool cancel_pending_enqueues,
+ DoneCallback callback) {
+ if (cancel_pending_enqueues) {
+ CloseAndCancel();
+ callback();
+ } else {
+ {
+ mutex_lock lock(mu_);
+ enqueue_attempts_.emplace_back(
+ 0, callback, ctx, CancellationManager::kInvalidToken,
+ [this](Attempt* attempt) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (closed_) {
+ attempt->context->SetStatus(errors::Aborted(
+ "RandomShuffleQueue '", name_, "' is already closed."));
+ } else {
+ closed_ = true;
+ }
+ return kComplete;
+ });
+ }
+ FlushUnlocked();
+ }
+}
+
+Status RandomShuffleQueue::MatchesNodeDef(const NodeDef& node_def) {
+ TF_RETURN_IF_ERROR(MatchesNodeDefOp(node_def, "RandomShuffleQueue"));
+ TF_RETURN_IF_ERROR(MatchesNodeDefCapacity(node_def, capacity_));
+
+ int32 min_after_dequeue = -1;
+ TF_RETURN_IF_ERROR(
+ GetNodeAttr(node_def, "min_after_dequeue", &min_after_dequeue));
+ if (min_after_dequeue != min_after_dequeue_) {
+ return errors::InvalidArgument(
+ "Shared queue '", name_, "' has min_after_dequeue ",
+ min_after_dequeue_, " but requested min_after_dequeue was ",
+ min_after_dequeue, ".");
+ }
+
+ int64 seed = -1;
+ int64 seed2 = -1;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, "seed", &seed));
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, "seed2", &seed2));
+ if ((seed != 0 || seed2 != 0) &&
+ (seed != original_seed_ || seed2 != original_seed2_)) {
+ return errors::InvalidArgument(
+ "Shared queue '", name_, "' has random seeds (", original_seed_, ", ",
+ original_seed2_, ") but requested seeds are (", seed, ", ", seed2,
+ ").");
+ }
+
+ TF_RETURN_IF_ERROR(MatchesNodeDefTypes(node_def));
+ TF_RETURN_IF_ERROR(MatchesNodeDefShapes(node_def));
+
+ return Status::OK();
+}
+
+typedef std::shared_ptr<QueueInterface> QueueInterfacePtr;
+
+// Defines a RandomShuffleQueueOp, which produces a Queue (specifically, one
+// backed by RandomShuffleQueue) that persists across different graph
+// executions and sessions. Running this op produces a single-element
+// tensor of handles to Queues in the corresponding device.
+class RandomShuffleQueueOp : public OpKernel {
+ public:
+ explicit RandomShuffleQueueOp(OpKernelConstruction* context)
+ : OpKernel(context), queue_handle_set_(false) {
+ OP_REQUIRES_OK(context, context->GetAttr("capacity", &capacity_));
+ OP_REQUIRES_OK(context,
+ context->allocate_persistent(DT_STRING, TensorShape({2}),
+ &queue_handle_, nullptr));
+ if (capacity_ < 0) {
+ capacity_ = RandomShuffleQueue::kUnbounded;
+ }
+ OP_REQUIRES_OK(context,
+ context->GetAttr("min_after_dequeue", &min_after_dequeue_));
+ OP_REQUIRES(context, min_after_dequeue_ >= 0,
+ errors::InvalidArgument("min_after_dequeue ",
+ min_after_dequeue_, " must be >= 0"));
+ OP_REQUIRES(
+ context, min_after_dequeue_ < capacity_,
+ errors::InvalidArgument("min_after_dequeue ", min_after_dequeue_,
+ " must be < capacity ", capacity_));
+ OP_REQUIRES_OK(context, context->GetAttr("seed", &seed_));
+ OP_REQUIRES_OK(context, context->GetAttr("seed2", &seed2_));
+
+ OP_REQUIRES_OK(context,
+ context->GetAttr("component_types", &component_types_));
+ OP_REQUIRES_OK(context, context->GetAttr("shapes", &component_shapes_));
+ }
+
+ ~RandomShuffleQueueOp() override {
+ // If the queue object was not shared, delete it.
+ if (queue_handle_set_ && cinfo_.resource_is_private_to_kernel()) {
+ TF_CHECK_OK(cinfo_.resource_manager()->Delete<QueueInterface>(
+ cinfo_.container(), cinfo_.name()));
+ }
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ mutex_lock l(mu_);
+ if (!queue_handle_set_) {
+ OP_REQUIRES_OK(ctx, SetQueueHandle(ctx));
+ }
+ ctx->set_output_ref(0, &mu_, queue_handle_.AccessTensor(ctx));
+ }
+
+ private:
+ Status SetQueueHandle(OpKernelContext* ctx) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ TF_RETURN_IF_ERROR(cinfo_.Init(ctx->resource_manager(), def()));
+ QueueInterface* queue;
+ auto creator = [this](QueueInterface** ret) {
+ auto* q = new RandomShuffleQueue(capacity_, min_after_dequeue_, seed_,
+ seed2_, component_types_,
+ component_shapes_, cinfo_.name());
+ Status s = q->Initialize();
+ if (s.ok()) {
+ *ret = q;
+ } else {
+ q->Unref();
+ }
+ return s;
+ };
+ TF_RETURN_IF_ERROR(
+ cinfo_.resource_manager()->LookupOrCreate<QueueInterface>(
+ cinfo_.container(), cinfo_.name(), &queue, creator));
+ core::ScopedUnref unref_me(queue);
+ // Verify that the shared queue is compatible with the requested arguments.
+ TF_RETURN_IF_ERROR(queue->MatchesNodeDef(def()));
+ auto h = queue_handle_.AccessTensor(ctx)->flat<string>();
+ h(0) = cinfo_.container();
+ h(1) = cinfo_.name();
+ queue_handle_set_ = true;
+ return Status::OK();
+ }
+
+ int32 capacity_;
+ int32 min_after_dequeue_;
+ int64 seed_;
+ int64 seed2_;
+ DataTypeVector component_types_;
+ std::vector<TensorShape> component_shapes_;
+ ContainerInfo cinfo_;
+
+ mutex mu_;
+ PersistentTensor queue_handle_ GUARDED_BY(mu_);
+ bool queue_handle_set_ GUARDED_BY(mu_);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RandomShuffleQueueOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("RandomShuffleQueue").Device(DEVICE_CPU),
+ RandomShuffleQueueOp);
+
+} // namespace tensorflow
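DequeueLocked above implements the random dequeue by drawing a uniform index and then filling the hole with the last element, which makes each removal O(1) at the cost of not preserving order. A minimal sketch of that idiom on a plain std::vector (the function name and Uniform callable are illustrative, not part of the kernel):

#include <cstddef>
#include <vector>

// Remove and return a uniformly chosen element in O(1) by overwriting the
// chosen slot with the last element and popping the back, as
// RandomShuffleQueue::DequeueLocked does for each component's SubQueue.
// `uniform` is any callable returning a value in [0, n).
template <typename T, typename Uniform>
T TakeRandom(std::vector<T>* items, Uniform&& uniform) {
  const size_t index = uniform(items->size());
  T taken = (*items)[index];
  (*items)[index] = items->back();
  items->pop_back();
  return taken;
}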
diff --git a/tensorflow/core/kernels/range_sampler.cc b/tensorflow/core/kernels/range_sampler.cc
new file mode 100644
index 0000000000..a3f4e0b0cb
--- /dev/null
+++ b/tensorflow/core/kernels/range_sampler.cc
@@ -0,0 +1,305 @@
+#include "tensorflow/core/kernels/range_sampler.h"
+
+#include <vector>
+#include <unordered_set>
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/io/inputbuffer.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+using gtl::ArraySlice;
+using gtl::MutableArraySlice;
+
+RangeSampler::~RangeSampler() {}
+
+void RangeSampler::SampleBatch(random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch) const {
+ SampleBatchGetExpectedCount(
+ rnd, unique, batch, gtl::MutableArraySlice<float>(),
+ gtl::ArraySlice<int64>(), gtl::MutableArraySlice<float>());
+}
+
+void RangeSampler::SampleBatchGetExpectedCount(
+ random::SimplePhilox* rnd, bool unique, gtl::MutableArraySlice<int64> batch,
+ gtl::MutableArraySlice<float> batch_expected_count,
+ gtl::ArraySlice<int64> extras,
+ gtl::MutableArraySlice<float> extras_expected_count) const {
+ SampleBatchGetExpectedCountAvoid(rnd, unique, batch, batch_expected_count,
+ extras, extras_expected_count,
+ gtl::ArraySlice<int64>());
+}
+
+namespace {
+
+// Approximates the expected count of a value in the output of SampleBatch.
+//
+// If unique=false, then this is (Probability(value) * batch_size)
+//
+// We use batch_size and num_tries, where num_tries is the observed number of
+// tries it took to get batch_size unique values.
+//
+// Assuming (falsely) that the number of tries to get a batch of batch_size
+// distinct values is _always_ num_tries, the probability that the value
+// is in a batch is (1 - (1-p)^num_tries)
+static float ExpectedCountHelper(float p, int batch_size, int num_tries) {
+ if (num_tries == batch_size) {
+ // This shortcut will always be taken if unique=false
+ return p * batch_size;
+ }
+ // numerically stable version of (1 - (1-p)^num_tries)
+ return -expm1(num_tries * log1p(-p));
+}
+
+} // namespace
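The -expm1(num_tries * log1p(-p)) expression above is just 1 - (1 - p)^num_tries rewritten so it stays accurate when p is tiny. A small standalone check of the equivalence in plain C++ (the particular p and num_tries values are chosen only for illustration):

#include <cmath>
#include <cstdio>

int main() {
  const double p = 1e-12;      // tiny per-draw probability
  const int num_tries = 1000;  // observed number of tries
  // Naive form loses significant digits once (1 - p) rounds toward 1.0.
  const double naive = 1.0 - std::pow(1.0 - p, num_tries);
  // Stable form, as used by ExpectedCountHelper.
  const double stable = -std::expm1(num_tries * std::log1p(-p));
  // Both approximate num_tries * p (~1e-9); the expm1/log1p form keeps
  // full precision even for much smaller p.
  std::printf("naive=%.15e stable=%.15e\n", naive, stable);
  return 0;
}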
+
+void RangeSampler::SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique, MutableArraySlice<int64> batch,
+ MutableArraySlice<float> batch_expected_count, ArraySlice<int64> extras,
+ MutableArraySlice<float> extras_expected_count,
+ ArraySlice<int64> avoided_values) const {
+ const int batch_size = batch.size();
+ int num_tries;
+
+ if (unique) {
+ CHECK_LE(batch_size + avoided_values.size(), range_);
+ std::unordered_set<int64> used(batch_size);
+ used.insert(avoided_values.begin(), avoided_values.end());
+ int num_picked = 0;
+ num_tries = 0;
+ while (num_picked < batch_size) {
+ num_tries++;
+ CHECK_LT(num_tries, kint32max);
+ int64 value = Sample(rnd);
+ if (gtl::InsertIfNotPresent(&used, value)) {
+ batch[num_picked++] = value;
+ }
+ }
+ } else {
+ CHECK_EQ(avoided_values.size(), 0)
+ << "avoided_values only supported with unique=true";
+ for (int i = 0; i < batch_size; i++) {
+ batch[i] = Sample(rnd);
+ }
+ num_tries = batch_size;
+ }
+ // Compute the expected counts of the batch and the extra values
+ if (batch_expected_count.size() > 0) {
+ CHECK_EQ(batch_size, batch_expected_count.size());
+ for (int i = 0; i < batch_size; i++) {
+ batch_expected_count[i] =
+ ExpectedCountHelper(Probability(batch[i]), batch_size, num_tries);
+ }
+ }
+ CHECK_EQ(extras.size(), extras_expected_count.size());
+ for (size_t i = 0; i < extras.size(); i++) {
+ extras_expected_count[i] =
+ ExpectedCountHelper(Probability(extras[i]), batch_size, num_tries);
+ }
+}
+
+AllSampler::AllSampler(int64 range)
+ : RangeSampler(range), inv_range_(1.0 / range) {}
+
+void AllSampler::SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique, MutableArraySlice<int64> batch,
+ MutableArraySlice<float> batch_expected_count, ArraySlice<int64> extras,
+ MutableArraySlice<float> extras_expected_count,
+ ArraySlice<int64> avoided_values) const {
+ const int batch_size = batch.size();
+ CHECK_EQ(range_, batch_size);
+ for (int i = 0; i < batch_size; i++) {
+ batch[i] = i;
+ }
+ if (batch_expected_count.size() > 0) {
+ CHECK_EQ(batch_size, batch_expected_count.size());
+ for (int i = 0; i < batch_size; i++) {
+ batch_expected_count[i] = 1;
+ }
+ }
+ CHECK_EQ(0, avoided_values.size());
+ CHECK_EQ(extras.size(), extras_expected_count.size());
+ for (size_t i = 0; i < extras.size(); i++) {
+ extras_expected_count[i] = 1;
+ }
+}
+
+UniformSampler::UniformSampler(int64 range)
+ : RangeSampler(range), inv_range_(1.0 / range) {}
+
+int64 UniformSampler::Sample(random::SimplePhilox* rnd) const {
+ return rnd->Uniform64(range_);
+}
+
+float UniformSampler::Probability(int64 value) const { return inv_range_; }
+
+LogUniformSampler::LogUniformSampler(int64 range)
+ : RangeSampler(range), log_range_(log(range + 1)) {}
+
+int64 LogUniformSampler::Sample(random::SimplePhilox* rnd) const {
+ const int64 value =
+ static_cast<int64>(exp(rnd->RandDouble() * log_range_)) - 1;
+ CHECK_GE(value, 0);
+ // Mathematically, value should be <= range_, but might not be due to some
+ // floating point roundoff, so we mod by range_.
+ return value % range_;
+}
+
+float LogUniformSampler::Probability(int64 value) const {
+  // value is returned iff rnd->RandDouble() * log_range_ inside Sample()
+  // yields a result between log(value + 1) and log(value + 2).
+  // The probability of this is:
+ // (log(value + 2) - log(value + 1)) / log_range
+ // To avoid two calls to log(), we compute this as follows:
+ return (log((value + 2.0) / (value + 1.0))) / log_range_;
+}
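A quick sanity check of the Sample()/Probability() pair above is that the per-value probabilities telescope and sum to one over [0, range). A short standalone sketch (the range value is arbitrary):

#include <cmath>
#include <cstdio>

int main() {
  const long long range = 1000;
  const double log_range = std::log(range + 1.0);
  double sum = 0.0;
  for (long long v = 0; v < range; ++v) {
    // Same expression as LogUniformSampler::Probability.
    sum += std::log((v + 2.0) / (v + 1.0)) / log_range;
  }
  // The log terms telescope to log(range + 1), so the sum is exactly 1
  // up to floating-point rounding.
  std::printf("sum of probabilities = %.12f\n", sum);
  return 0;
}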
+
+ThreadUnsafeUnigramSampler::ThreadUnsafeUnigramSampler(int64 range)
+ : RangeSampler(range), picker_(range) {
+ CHECK_LT(range, kint32max);
+}
+
+int64 ThreadUnsafeUnigramSampler::Sample(random::SimplePhilox* rnd) const {
+ return picker_.Pick(rnd);
+}
+
+float ThreadUnsafeUnigramSampler::Probability(int64 value) const {
+ return static_cast<float>(picker_.get_weight(value)) / picker_.total_weight();
+}
+
+void ThreadUnsafeUnigramSampler::Update(ArraySlice<int64> values) {
+ int num_updates = std::min(static_cast<int>(values.size()),
+ kint32max - picker_.total_weight());
+ for (int i = 0; i < num_updates; i++) {
+ const int64 value = values[i];
+ picker_.set_weight(value, picker_.get_weight(value) + 1);
+ }
+}
+
+// Thread-safe unigram sampler
+UnigramSampler::UnigramSampler(int64 range)
+ : RangeSampler(range), unsafe_sampler_(range) {
+ CHECK_LT(range, kint32max);
+}
+
+int64 UnigramSampler::Sample(random::SimplePhilox* rnd) const {
+ mutex_lock lock(mu_); // could use reader lock
+ return unsafe_sampler_.Sample(rnd);
+}
+
+float UnigramSampler::Probability(int64 value) const {
+ mutex_lock lock(mu_); // could use reader lock
+ return unsafe_sampler_.Probability(value);
+}
+
+// Overriding at a high level results in far fewer lock acquisitions.
+void UnigramSampler::SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique, MutableArraySlice<int64> batch,
+ MutableArraySlice<float> batch_expected_count, ArraySlice<int64> extras,
+ MutableArraySlice<float> extras_expected_count,
+ ArraySlice<int64> avoided_values) const {
+ mutex_lock lock(mu_); // could use reader lock
+ unsafe_sampler_.SampleBatchGetExpectedCountAvoid(
+ rnd, unique, batch, batch_expected_count, extras, extras_expected_count,
+ avoided_values);
+}
+
+void UnigramSampler::Update(ArraySlice<int64> values) {
+ mutex_lock lock(mu_);
+ unsafe_sampler_.Update(values);
+}
+
+FixedUnigramSampler::FixedUnigramSampler(Env* env, int64 range,
+ const string& vocab_file,
+ float distortion,
+ int32 num_reserved_ids,
+ int32 num_shards, int32 shard)
+ : RangeSampler(range),
+ total_weight_(0.0),
+ num_shards_(num_shards),
+ shard_(shard) {
+ FillReservedIds(num_reserved_ids);
+ // TODO(vanhoucke): make this non-crashing.
+ TF_CHECK_OK(LoadFromFile(env, vocab_file, distortion));
+ CHECK_EQ(range, weights_.size());
+ dist_sampler_.reset(new random::DistributionSampler(weights_));
+}
+
+FixedUnigramSampler::FixedUnigramSampler(int64 range,
+ const std::vector<float>& unigrams,
+ float distortion,
+ int32 num_reserved_ids,
+ int32 num_shards, int32 shard)
+ : RangeSampler(range),
+ total_weight_(0.0),
+ num_shards_(num_shards),
+ shard_(shard) {
+ FillReservedIds(num_reserved_ids);
+ LoadFromUnigrams(unigrams, distortion);
+ // TODO(vanhoucke): make this non-crashing.
+ CHECK_EQ(range, weights_.size());
+ dist_sampler_.reset(new random::DistributionSampler(weights_));
+}
+
+float FixedUnigramSampler::Probability(int64 value) const {
+ return weights_.at(value) / total_weight_;
+}
+
+int64 FixedUnigramSampler::Sample(random::SimplePhilox* rnd) const {
+ return dist_sampler_->Sample(rnd);
+}
+
+void FixedUnigramSampler::FillReservedIds(int32 num_reserved_ids) {
+ for (int32 word_id = 0; word_id < num_reserved_ids; ++word_id) {
+ if (word_id % num_shards_ == shard_) weights_.push_back(0.0);
+ }
+}
+
+Status FixedUnigramSampler::LoadFromFile(Env* env, const string& vocab_file,
+ float distortion) {
+ RandomAccessFile* file;
+ TF_RETURN_IF_ERROR(env->NewRandomAccessFile(vocab_file, &file));
+ io::InputBuffer in(file, 262144 /*bytes*/);
+ string line;
+ int32 word_id = weights_.size();
+ while (in.ReadLine(&line).ok()) {
+    // The vocabulary file should be in CSV-like format, with the last
+    // field being the weight associated with the word.
+ std::vector<string> cols = str_util::Split(line, ',');
+ if (cols.size() == 0) continue;
+ // Skip entries that do not belong to this shard.
+ if (word_id % num_shards_ == shard_) {
+ float w = 0.0;
+ if (!strings::safe_strtof(cols.at(cols.size() - 1).c_str(), &w)) {
+ return errors::InvalidArgument("Wrong vocabulary format at line: ",
+ line);
+ }
+ w = pow(w, distortion);
+ total_weight_ += w;
+ weights_.push_back(w);
+ }
+ ++word_id;
+ }
+ return Status::OK();
+}
+
+void FixedUnigramSampler::LoadFromUnigrams(const std::vector<float>& unigrams,
+ float distortion) {
+ int32 word_id = weights_.size();
+ for (float w : unigrams) {
+ // Skip entries that do not belong to this shard.
+ if (word_id % num_shards_ == shard_) {
+ w = pow(w, distortion);
+ total_weight_ += w;
+ weights_.push_back(w);
+ }
+ ++word_id;
+ }
+}
+
+} // namespace tensorflow
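FixedUnigramSampler keeps only the weights whose word_id satisfies word_id % num_shards_ == shard_, so each shard owns a strided slice of the vocabulary. A tiny illustration of which IDs land in which shard (standalone, with illustrative sizes):

#include <cstdio>

int main() {
  const int vocab_size = 10;
  const int num_shards = 3;
  // With num_shards = 3: shard 0 owns {0, 3, 6, 9}, shard 1 owns
  // {1, 4, 7}, shard 2 owns {2, 5, 8} -- the same membership test used by
  // FillReservedIds, LoadFromFile, and LoadFromUnigrams.
  for (int shard = 0; shard < num_shards; ++shard) {
    std::printf("shard %d:", shard);
    for (int word_id = 0; word_id < vocab_size; ++word_id) {
      if (word_id % num_shards == shard) std::printf(" %d", word_id);
    }
    std::printf("\n");
  }
  return 0;
}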
diff --git a/tensorflow/core/kernels/range_sampler.h b/tensorflow/core/kernels/range_sampler.h
new file mode 100644
index 0000000000..18364c2c03
--- /dev/null
+++ b/tensorflow/core/kernels/range_sampler.h
@@ -0,0 +1,237 @@
+#ifndef TENSORFLOW_KERNELS_RANGE_SAMPLER_H_
+#define TENSORFLOW_KERNELS_RANGE_SAMPLER_H_
+
+#include <vector>
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/random/distribution_sampler.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include "tensorflow/core/lib/random/weighted_picker.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class Env;
+
+// Abstract base class for sampling from the set of non-negative integers
+// [0, range)
+class RangeSampler {
+ public:
+ explicit RangeSampler(int range) : range_(range) { CHECK_GT(range_, 0); }
+ virtual ~RangeSampler();
+
+ // Sample a single value
+ virtual int64 Sample(random::SimplePhilox* rnd) const = 0;
+
+ // The probability that a single call to Sample() returns the given value.
+ // Assumes that value is in [0, range). No range checking is done.
+ virtual float Probability(int64 value) const = 0;
+
+ // Fill "batch" with samples from the distribution.
+ // If unique=true, then we re-pick each element until we get a
+ // value distinct from all previously picked values in the batch.
+ void SampleBatch(random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch) const;
+
+ // Fill "batch" with samples from the distribution, and report
+ // "expected counts".
+ //
+ // The "expected count" of a value is an estimate of the expected
+ // number of occurrences of the value in the batch returned by a
+ // call to this function with the given parameters. If unique=true,
+ // the expected count is an inclusion probability. For details on
+ // this estimation, see the comment to "ExpectedCountHelper" in the
+ // .cc file.
+ //
+ // Expected counts for the elements of the returned "batch" are reported
+ // in the aligned array "batch_expected_count".
+ //
+  // The user can optionally provide "extras", containing values in the range.
+ // The expected counts for the extras are reported in the aligned array
+ // "extras_expected_count".
+ //
+ // "batch_expected_count" must have size equal to 0 or to the size of "batch".
+ // "extras" and "extras_expected_count" must have equal size.
+ void SampleBatchGetExpectedCount(
+ random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch,
+ gtl::MutableArraySlice<float> batch_expected_count,
+ gtl::ArraySlice<int64> extras,
+ gtl::MutableArraySlice<float> extras_expected_count) const;
+
+ // Same as SampleBatchGetExpectedCount (see above), but with avoided values.
+ // We repick to avoid all of the values in "avoided_values".
+ // "avoided_values" is only supported with unique=true. If
+ // unique=false, then avoided_values must be empty.
+ virtual void SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch,
+ gtl::MutableArraySlice<float> batch_expected_count,
+ gtl::ArraySlice<int64> extras,
+ gtl::MutableArraySlice<float> extras_expected_count,
+ gtl::ArraySlice<int64> avoided_values) const;
+
+ // Does this sampler need to be updated with values, e.g. UnigramSampler
+ virtual bool NeedsUpdates() const { return false; }
+
+ // Updates the underlying distribution
+ virtual void Update(gtl::ArraySlice<int64> values) {
+ LOG(FATAL) << "Update not supported for this sampler type.";
+ }
+
+ int64 range() { return range_; }
+
+ protected:
+ const int64 range_;
+};
+
+// An AllSampler only samples batches of size equal to range.
+// It returns the entire range.
+// It cannot sample single values.
+class AllSampler : public RangeSampler {
+ public:
+ explicit AllSampler(int64 range);
+
+ ~AllSampler() override {}
+
+  int64 Sample(random::SimplePhilox* rnd) const override {
+    LOG(FATAL) << "Should not be called";
+    return 0;
+  }
+
+  float Probability(int64 value) const override {
+    LOG(FATAL) << "Should not be called";
+    return 0;
+  }
+
+ void SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch,
+ gtl::MutableArraySlice<float> batch_expected_count,
+ gtl::ArraySlice<int64> extras,
+ gtl::MutableArraySlice<float> extras_expected_count,
+ gtl::ArraySlice<int64> avoided_values) const override;
+
+ private:
+ const float inv_range_;
+};
+
+class UniformSampler : public RangeSampler {
+ public:
+ explicit UniformSampler(int64 range);
+
+ ~UniformSampler() override {}
+
+ int64 Sample(random::SimplePhilox* rnd) const override;
+
+ float Probability(int64 value) const override;
+
+ private:
+ const float inv_range_;
+};
+
+class LogUniformSampler : public RangeSampler {
+ public:
+ explicit LogUniformSampler(int64 range);
+
+ ~LogUniformSampler() override {}
+
+ int64 Sample(random::SimplePhilox* rnd) const override;
+
+ float Probability(int64 value) const override;
+
+ private:
+ const double log_range_;
+};
+
+// Thread-unsafe unigram sampler
+class ThreadUnsafeUnigramSampler : public RangeSampler {
+ public:
+ explicit ThreadUnsafeUnigramSampler(int64 range);
+ ~ThreadUnsafeUnigramSampler() override {}
+
+ int64 Sample(random::SimplePhilox* rnd) const override;
+
+ float Probability(int64 value) const override;
+
+ bool NeedsUpdates() const override { return true; }
+ void Update(gtl::ArraySlice<int64> values) override;
+
+ private:
+ random::WeightedPicker picker_;
+};
+
+// Thread-safe unigram sampler
+class UnigramSampler : public RangeSampler {
+ public:
+ explicit UnigramSampler(int64 range);
+ ~UnigramSampler() override {}
+
+ int64 Sample(random::SimplePhilox* rnd) const override;
+
+ float Probability(int64 value) const override;
+
+  // Overriding at a high level results in far fewer lock acquisitions.
+ void SampleBatchGetExpectedCountAvoid(
+ random::SimplePhilox* rnd, bool unique,
+ gtl::MutableArraySlice<int64> batch,
+ gtl::MutableArraySlice<float> batch_expected_count,
+ gtl::ArraySlice<int64> extras,
+ gtl::MutableArraySlice<float> extras_expected_count,
+ gtl::ArraySlice<int64> avoided_values) const override;
+
+ bool NeedsUpdates() const override { return true; }
+ void Update(gtl::ArraySlice<int64> values) override;
+
+ private:
+ ThreadUnsafeUnigramSampler unsafe_sampler_ GUARDED_BY(mu_);
+ mutable mutex mu_;
+};
+
+// A unigram sampler that uses a fixed unigram distribution read from a
+// file or passed in as an in-memory array instead of building up the
+// distribution from data on the fly. There is also an option to skew the
+// distribution by applying a distortion power to the weights.
+class FixedUnigramSampler : public RangeSampler {
+ public:
+ // The vocab_file is assumed to be a CSV, with the last entry of each row a
+ // value representing the counts or probabilities for the corresponding ID.
+ FixedUnigramSampler(Env* env, int64 range, const string& vocab_file,
+ float distortion, int32 num_reserved_ids,
+ int32 num_shards, int32 shard);
+
+ FixedUnigramSampler(int64 range, const std::vector<float>& unigrams,
+ float distortion, int32 num_reserved_ids,
+ int32 num_shards, int32 shard);
+
+ float Probability(int64 value) const override;
+
+ int64 Sample(random::SimplePhilox* rnd) const override;
+
+ private:
+ // Underlying distribution sampler.
+ std::unique_ptr<random::DistributionSampler> dist_sampler_;
+ // Weights for individual samples. The probability of a sample i is defined
+ // as weights_.at(i) / total_weight_.
+ std::vector<float> weights_;
+ // The total weights of all samples.
+ float total_weight_;
+ // Sharding information of the sampler. The whole vocabulary is sharded
+ // into num_shards_ smaller ranges and each sampler is responsible for one
+ // such smaller range, identified by the shard number.
+ int32 num_shards_;
+ int32 shard_;
+
+ // Fill the sampler with the appropriate number of reserved IDs.
+ void FillReservedIds(int32 num_reserved_ids);
+ // Load IDs to sample from a CSV file. It is assumed that the last item of
+ // each row contains a count or probability for the corresponding ID.
+ Status LoadFromFile(Env* env, const string& vocab_file, float distortion);
+ // Load from an in-memory array.
+ void LoadFromUnigrams(const std::vector<float>& unigrams, float distortion);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_RANGE_SAMPLER_H_
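A hedged sketch of how this header's batch API is typically driven, mirroring the calls made in range_sampler_test.cc below (the seed, range, and batch sizes are arbitrary; slices are built from std::vector pointers as the tests do):

// Draws a unique batch from a UniformSampler and reads back the
// "expected count" (inclusion probability) of each sampled value.
// Sketch only; relies on the headers already included by range_sampler.h.
#include <vector>

#include "tensorflow/core/kernels/range_sampler.h"
#include "tensorflow/core/lib/random/simple_philox.h"

namespace tensorflow {

void SampleUniqueBatchExample() {
  random::PhiloxRandom philox(301, 17);  // arbitrary seed
  random::SimplePhilox rnd(&philox);
  UniformSampler sampler(1000);          // samples from [0, 1000)

  std::vector<int64> batch(64);
  std::vector<float> batch_expected(64);
  sampler.SampleBatchGetExpectedCount(
      &rnd, /*unique=*/true, &batch, &batch_expected,
      gtl::ArraySlice<int64>(), gtl::MutableArraySlice<float>());
  // batch now holds 64 distinct ids; batch_expected[i] estimates the
  // probability that batch[i] appears in such a batch.
}

}  // namespace tensorflow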
diff --git a/tensorflow/core/kernels/range_sampler_test.cc b/tensorflow/core/kernels/range_sampler_test.cc
new file mode 100644
index 0000000000..72c39009e4
--- /dev/null
+++ b/tensorflow/core/kernels/range_sampler_test.cc
@@ -0,0 +1,320 @@
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/kernels/range_sampler.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace {
+
+using gtl::ArraySlice;
+using gtl::MutableArraySlice;
+
+class RangeSamplerTest : public ::testing::Test {
+ protected:
+ void CheckProbabilitiesSumToOne() {
+ double sum = 0;
+ for (int i = 0; i < sampler_->range(); i++) {
+ sum += sampler_->Probability(i);
+ }
+ EXPECT_NEAR(sum, 1.0, 1e-4);
+ }
+ void CheckHistogram(int num_samples, float tolerance) {
+ const int range = sampler_->range();
+ std::vector<int> h(range);
+ std::vector<int64> a(num_samples);
+ // Using a fixed random seed to make the test deterministic.
+ random::PhiloxRandom philox(123, 17);
+ random::SimplePhilox rnd(&philox);
+ sampler_->SampleBatch(&rnd, false, &a);
+ for (int i = 0; i < num_samples; i++) {
+ int64 val = a[i];
+ ASSERT_GE(val, 0);
+ ASSERT_LT(val, range);
+ h[val]++;
+ }
+ for (int val = 0; val < range; val++) {
+ EXPECT_NEAR((h[val] + 0.0) / num_samples, sampler_->Probability(val),
+ tolerance);
+ }
+ }
+ void Update1() {
+ // Add the value 3 ten times.
+ std::vector<int64> a(10);
+ for (int i = 0; i < 10; i++) {
+ a[i] = 3;
+ }
+ sampler_->Update(a);
+ }
+ void Update2() {
+ // Add the value n n times.
+ int64 a[10];
+ for (int i = 0; i < 10; i++) {
+ a[i] = i;
+ }
+ for (int64 i = 1; i < 10; i++) {
+ sampler_->Update(ArraySlice<int64>(a + i, 10 - i));
+ }
+ }
+ std::unique_ptr<RangeSampler> sampler_;
+};
+
+TEST_F(RangeSamplerTest, UniformProbabilities) {
+ sampler_.reset(new UniformSampler(10));
+ for (int i = 0; i < 10; i++) {
+ CHECK_EQ(sampler_->Probability(i), sampler_->Probability(0));
+ }
+}
+
+TEST_F(RangeSamplerTest, UniformChecksum) {
+ sampler_.reset(new UniformSampler(10));
+ CheckProbabilitiesSumToOne();
+}
+
+TEST_F(RangeSamplerTest, UniformHistogram) {
+ sampler_.reset(new UniformSampler(10));
+ CheckHistogram(1000, 0.05);
+}
+
+TEST_F(RangeSamplerTest, LogUniformProbabilities) {
+ int range = 1000000;
+ sampler_.reset(new LogUniformSampler(range));
+ for (int i = 100; i < range; i *= 2) {
+ float ratio = sampler_->Probability(i) / sampler_->Probability(i / 2);
+ EXPECT_NEAR(ratio, 0.5, 0.1);
+ }
+}
+
+TEST_F(RangeSamplerTest, LogUniformChecksum) {
+ sampler_.reset(new LogUniformSampler(10));
+ CheckProbabilitiesSumToOne();
+}
+
+TEST_F(RangeSamplerTest, LogUniformHistogram) {
+ sampler_.reset(new LogUniformSampler(10));
+ CheckHistogram(1000, 0.05);
+}
+
+TEST_F(RangeSamplerTest, UnigramProbabilities1) {
+ sampler_.reset(new UnigramSampler(10));
+ Update1();
+ EXPECT_NEAR(sampler_->Probability(3), 0.55, 1e-4);
+ for (int i = 0; i < 10; i++) {
+ if (i != 3) {
+ ASSERT_NEAR(sampler_->Probability(i), 0.05, 1e-4);
+ }
+ }
+}
+TEST_F(RangeSamplerTest, UnigramProbabilities2) {
+ sampler_.reset(new UnigramSampler(10));
+ Update2();
+ for (int i = 0; i < 10; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), (i + 1) / 55.0, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, UnigramChecksum) {
+ sampler_.reset(new UnigramSampler(10));
+ Update1();
+ CheckProbabilitiesSumToOne();
+}
+
+TEST_F(RangeSamplerTest, UnigramHistogram) {
+ sampler_.reset(new UnigramSampler(10));
+ Update1();
+ CheckHistogram(1000, 0.05);
+}
+
+static const char kVocabContent[] =
+ "w1,1\n"
+ "w2,2\n"
+ "w3,4\n"
+ "w4,8\n"
+ "w5,16\n"
+ "w6,32\n"
+ "w7,64\n"
+ "w8,128\n"
+ "w9,256";
+TEST_F(RangeSamplerTest, FixedUnigramProbabilities) {
+ Env* env = Env::Default();
+ string fname = io::JoinPath(testing::TmpDir(), "vocab_file");
+ TF_CHECK_OK(WriteStringToFile(env, fname, kVocabContent));
+ sampler_.reset(new FixedUnigramSampler(env, 9, fname, 0.8, 0, 1, 0));
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 0; i < 9; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, i * 0.8) / 197.05, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, FixedUnigramChecksum) {
+ Env* env = Env::Default();
+ string fname = io::JoinPath(testing::TmpDir(), "vocab_file");
+ TF_CHECK_OK(WriteStringToFile(env, fname, kVocabContent));
+ sampler_.reset(new FixedUnigramSampler(env, 9, fname, 0.8, 0, 1, 0));
+ CheckProbabilitiesSumToOne();
+}
+
+TEST_F(RangeSamplerTest, FixedUnigramHistogram) {
+ Env* env = Env::Default();
+ string fname = io::JoinPath(testing::TmpDir(), "vocab_file");
+ TF_CHECK_OK(WriteStringToFile(env, fname, kVocabContent));
+ sampler_.reset(new FixedUnigramSampler(env, 9, fname, 0.8, 0, 1, 0));
+ CheckHistogram(1000, 0.05);
+}
+TEST_F(RangeSamplerTest, FixedUnigramProbabilitiesReserve1) {
+ Env* env = Env::Default();
+ string fname = io::JoinPath(testing::TmpDir(), "vocab_file");
+ TF_CHECK_OK(WriteStringToFile(env, fname, kVocabContent));
+ sampler_.reset(new FixedUnigramSampler(env, 10, fname, 0.8, 1, 1, 0));
+ ASSERT_NEAR(sampler_->Probability(0), 0, 1e-4);
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 1; i < 10; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, (i - 1) * 0.8) / 197.05, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, FixedUnigramProbabilitiesReserve2) {
+ Env* env = Env::Default();
+ string fname = io::JoinPath(testing::TmpDir(), "vocab_file");
+ TF_CHECK_OK(WriteStringToFile(env, fname, kVocabContent));
+ sampler_.reset(new FixedUnigramSampler(env, 11, fname, 0.8, 2, 1, 0));
+ ASSERT_NEAR(sampler_->Probability(0), 0, 1e-4);
+ ASSERT_NEAR(sampler_->Probability(1), 0, 1e-4);
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 2; i < 11; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, (i - 2) * 0.8) / 197.05, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, FixedUnigramProbabilitiesFromVector) {
+ std::vector<float> weights = {1, 2, 4, 8, 16, 32, 64, 128, 256};
+ sampler_.reset(new FixedUnigramSampler(9, weights, 0.8, 0, 1, 0));
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 0; i < 9; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, i * 0.8) / 197.05, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, FixedUnigramChecksumFromVector) {
+ std::vector<float> weights = {1, 2, 4, 8, 16, 32, 64, 128, 256};
+ sampler_.reset(new FixedUnigramSampler(9, weights, 0.8, 0, 1, 0));
+ CheckProbabilitiesSumToOne();
+}
+TEST_F(RangeSamplerTest, FixedUnigramHistogramFromVector) {
+ std::vector<float> weights = {1, 2, 4, 8, 16, 32, 64, 128, 256};
+ sampler_.reset(new FixedUnigramSampler(9, weights, 0.8, 0, 1, 0));
+ CheckHistogram(1000, 0.05);
+}
+TEST_F(RangeSamplerTest, FixedUnigramProbabilitiesReserve1FromVector) {
+ std::vector<float> weights = {1, 2, 4, 8, 16, 32, 64, 128, 256};
+ sampler_.reset(new FixedUnigramSampler(10, weights, 0.8, 1, 1, 0));
+ ASSERT_NEAR(sampler_->Probability(0), 0, 1e-4);
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 1; i < 10; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, (i - 1) * 0.8) / 197.05, 1e-4);
+ }
+}
+TEST_F(RangeSamplerTest, FixedUnigramProbabilitiesReserve2FromVector) {
+ std::vector<float> weights = {1, 2, 4, 8, 16, 32, 64, 128, 256};
+ sampler_.reset(new FixedUnigramSampler(11, weights, 0.8, 2, 1, 0));
+ ASSERT_NEAR(sampler_->Probability(0), 0, 1e-4);
+ ASSERT_NEAR(sampler_->Probability(1), 0, 1e-4);
+ // 1^0.8+2^0.8+4^0.8+...+256^0.8=197.05
+ for (int i = 2; i < 11; i++) {
+ ASSERT_NEAR(sampler_->Probability(i), pow(2, (i - 2) * 0.8) / 197.05, 1e-4);
+ }
+}
+
+// AllSampler cannot call Sample or Probability directly.
+// We will test SampleBatchGetExpectedCount instead.
+TEST_F(RangeSamplerTest, All) {
+ int batch_size = 10;
+ sampler_.reset(new AllSampler(10));
+ std::vector<int64> batch(batch_size);
+ std::vector<float> batch_expected(batch_size);
+ std::vector<int64> extras(2);
+ std::vector<float> extras_expected(2);
+ extras[0] = 0;
+ extras[1] = batch_size - 1;
+ sampler_->SampleBatchGetExpectedCount(nullptr, // no random numbers needed
+ false, &batch, &batch_expected, extras,
+ &extras_expected);
+ for (int i = 0; i < batch_size; i++) {
+ EXPECT_EQ(i, batch[i]);
+ EXPECT_EQ(1, batch_expected[i]);
+ }
+ EXPECT_EQ(1, extras_expected[0]);
+ EXPECT_EQ(1, extras_expected[1]);
+}
+
+TEST_F(RangeSamplerTest, Unique) {
+ // We sample num_batches batches, each without replacement.
+ //
+ // We check that the returned expected counts roughly agree with each other
+ // and with the average observed frequencies over the set of batches.
+ random::PhiloxRandom philox(123, 17);
+ random::SimplePhilox rnd(&philox);
+ const int range = 100;
+ const int batch_size = 50;
+ const int num_batches = 100;
+ sampler_.reset(new LogUniformSampler(range));
+ std::vector<int> histogram(range);
+ std::vector<int64> batch(batch_size);
+ std::vector<int64> all_values(range);
+ for (int i = 0; i < range; i++) {
+ all_values[i] = i;
+ }
+ std::vector<float> expected(range);
+
+ // Sample one batch and get the expected counts of all values
+ sampler_->SampleBatchGetExpectedCount(
+ &rnd, true, &batch, MutableArraySlice<float>(), all_values, &expected);
+ // Check that all elements are unique
+ std::set<int64> s(batch.begin(), batch.end());
+ CHECK_EQ(batch_size, s.size());
+
+ for (int trial = 0; trial < num_batches; trial++) {
+ std::vector<float> trial_expected(range);
+ sampler_->SampleBatchGetExpectedCount(&rnd, true, &batch,
+ MutableArraySlice<float>(),
+ all_values, &trial_expected);
+ for (int i = 0; i < range; i++) {
+ EXPECT_NEAR(expected[i], trial_expected[i], expected[i] * 0.5);
+ }
+ for (int i = 0; i < batch_size; i++) {
+ histogram[batch[i]]++;
+ }
+ }
+ for (int i = 0; i < range; i++) {
+ // Check that the computed expected count agrees with the average observed
+ // count.
+ const float average_count = static_cast<float>(histogram[i]) / num_batches;
+ EXPECT_NEAR(expected[i], average_count, 0.2);
+ }
+}
+
+TEST_F(RangeSamplerTest, Avoid) {
+ random::PhiloxRandom philox(123, 17);
+ random::SimplePhilox rnd(&philox);
+ sampler_.reset(new LogUniformSampler(100));
+ std::vector<int64> avoided(2);
+ avoided[0] = 17;
+ avoided[1] = 23;
+ std::vector<int64> batch(98);
+
+ // We expect to pick all elements of [0, 100) except the avoided two.
+ sampler_->SampleBatchGetExpectedCountAvoid(
+ &rnd, true, &batch, MutableArraySlice<float>(), ArraySlice<int64>(),
+ MutableArraySlice<float>(), avoided);
+
+ int sum = 0;
+ for (auto val : batch) {
+ sum += val;
+ }
+ const int expected_sum = 100 * 99 / 2 - avoided[0] - avoided[1];
+ EXPECT_EQ(expected_sum, sum);
+}
+
+} // namespace
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reader_base.cc b/tensorflow/core/kernels/reader_base.cc
new file mode 100644
index 0000000000..06211efb38
--- /dev/null
+++ b/tensorflow/core/kernels/reader_base.cc
@@ -0,0 +1,156 @@
+#include "tensorflow/core/kernels/reader_base.h"
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+
+namespace tensorflow {
+
+// ReaderBase ------------------------------------------------------
+
+ReaderBase::ReaderBase(const string& name) : name_(name) {}
+
+int64 ReaderBase::NumRecordsProduced() {
+ mutex_lock lock(mu_);
+ return num_records_produced_;
+}
+
+int64 ReaderBase::NumWorkUnitsCompleted() {
+ mutex_lock lock(mu_);
+ return work_finished_;
+}
+
+Status ReaderBase::Reset() {
+ mutex_lock lock(mu_);
+ return ResetLocked();
+}
+
+Status ReaderBase::ResetLocked() {
+ work_started_ = 0;
+ work_finished_ = 0;
+ num_records_produced_ = 0;
+ work_.clear();
+ return Status::OK();
+}
+
+Status ReaderBase::SerializeState(string* state) {
+ mutex_lock lock(mu_);
+ return SerializeStateLocked(state);
+}
+
+Status ReaderBase::SerializeStateLocked(string* state) {
+ return errors::Unimplemented("Reader SerializeState");
+}
+
+Status ReaderBase::RestoreState(const string& state) {
+ mutex_lock lock(mu_);
+ Status status = RestoreStateLocked(state);
+ if (!status.ok()) {
+ ResetLocked();
+ }
+ return status;
+}
+
+Status ReaderBase::RestoreStateLocked(const string& state) {
+ return errors::Unimplemented("Reader RestoreState");
+}
+
+void ReaderBase::Read(QueueInterface* queue, string* key, string* value,
+ OpKernelContext* context) {
+ mutex_lock lock(mu_);
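+ // Loop until we either produce a record or hit an error; when the current
+ // work item is exhausted, we dequeue the next one and try again.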
+ while (true) {
+ if (!work_in_progress()) {
+ GetNextWorkLocked(queue, context);
+ if (!context->status().ok()) return;
+ }
+
+ bool produced = false;
+ bool at_end = false;
+ Status status = ReadLocked(key, value, &produced, &at_end);
+
+ if (!at_end && status.ok() && !produced) {
+ status = errors::Internal(
+ "ReadLocked() for ", name(),
+ " must set *at_end=true, *produced=true, or return an error.");
+ }
+ if (!status.ok() && produced) {
+ status = errors::Internal("ReadLocked() for ", name(),
+ " set *produced=true *and* returned an error: ",
+ status.ToString());
+ }
+ if (status.ok() && at_end) {
+ status = OnWorkFinishedLocked();
+ work_finished_ = work_started_;
+ }
+ if (!status.ok()) {
+ context->SetStatus(status);
+ return;
+ }
+ if (produced) {
+ ++num_records_produced_;
+ return;
+ }
+ }
+}
+
+void ReaderBase::GetNextWorkLocked(QueueInterface* queue,
+ OpKernelContext* context) {
+ Notification n;
+ queue->TryDequeue(
+ context, [this, context, &n](const QueueInterface::Tuple& tuple) {
+ if (context->status().ok()) {
+ if (tuple.size() != 1) {
+ context->SetStatus(
+ errors::InvalidArgument("Expected single component queue"));
+ } else if (tuple[0].dtype() != DT_STRING) {
+ context->SetStatus(errors::InvalidArgument(
+ "Expected queue with single string component"));
+ } else if (tuple[0].NumElements() != 1) {
+ context->SetStatus(errors::InvalidArgument(
+ "Expected to dequeue a one-element string tensor"));
+ } else {
+ work_ = tuple[0].flat<string>()(0);
+ ++work_started_;
+ Status status = OnWorkStartedLocked();
+ if (!status.ok()) {
+ context->SetStatus(status);
+ --work_started_;
+ }
+ }
+ }
+ n.Notify();
+ });
+ n.WaitForNotification();
+}
+
+void ReaderBase::SaveBaseState(ReaderBaseState* state) const {
+ state->Clear();
+ state->set_work_started(work_started_);
+ state->set_work_finished(work_finished_);
+ state->set_num_records_produced(num_records_produced_);
+ state->set_current_work(work_);
+}
+
+Status ReaderBase::RestoreBaseState(const ReaderBaseState& state) {
+ work_started_ = state.work_started();
+ work_finished_ = state.work_finished();
+ num_records_produced_ = state.num_records_produced();
+ work_ = state.current_work();
+ if (work_started_ < 0 || work_finished_ < 0 || num_records_produced_ < 0) {
+ return errors::InvalidArgument(
+ "Unexpected negative value when restoring in ", name(), ": ",
+ state.ShortDebugString());
+ }
+ if (work_started_ > work_finished_) {
+ return errors::InvalidArgument(
+ "Inconsistent work started vs. finished when restoring in ", name(),
+ ": ", state.ShortDebugString());
+ }
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reader_base.h b/tensorflow/core/kernels/reader_base.h
new file mode 100644
index 0000000000..d344300388
--- /dev/null
+++ b/tensorflow/core/kernels/reader_base.h
@@ -0,0 +1,107 @@
+#ifndef TENSORFLOW_KERNELS_READER_BASE_H_
+#define TENSORFLOW_KERNELS_READER_BASE_H_
+
+#include <memory>
+#include <string>
+#include <vector>
+#include "tensorflow/core/framework/queue_interface.h"
+#include "tensorflow/core/framework/reader_interface.h"
+#include "tensorflow/core/kernels/reader_base.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+// Default implementation of ReaderInterface.
+class ReaderBase : public ReaderInterface {
+ public:
+ // name: For use in error messages, should mention both the name of
+ // the op and the node.
+ explicit ReaderBase(const string& name);
+
+ // Note that methods with names ending in "Locked" are called while
+ // the ReaderBase's mutex is held.
+
+ // Implement this function in descendants -----------------------------------
+
+ // Produce the next key/value pair from the current work item.
+ // This is called "Locked" since it is executed under a mutex
+ // that serializes all Reader calls.
+ // Usage:
+ // a) If a record was successfully produced, set *produced = true,
+ // and fill in *key and *value.
+ // b) If no more records will be produced for this work item, set
+ // *at_end = true.
+ // c) If a record was produced, but no more will be produced, you
+ // may either do both (a) and (b), or do (a) in this call and do (b) in
+ // the next call to ReadLocked().
+ // d) If there was an error producing (e.g. an error reading the file,
+ // data corruption), return a non-OK() status. ReadLocked may be
+ // called again if the user reruns this part of the graph.
+ virtual Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) = 0;
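+ //
+ // As a purely illustrative sketch (not part of this codebase), a
+ // hypothetical reader that emits the work item name as its only record
+ // could implement this as:
+ //
+ //   Status ReadLocked(string* key, string* value, bool* produced,
+ //                     bool* at_end) override {
+ //     *key = current_work();
+ //     *value = current_work();
+ //     *produced = true;
+ //     *at_end = true;  // One record per work item: cases (a) and (b) at once.
+ //     return Status::OK();
+ //   }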
+
+ // Descendants may optionally implement these -------------------------------
+
+ // Called when work starts / finishes.
+ virtual Status OnWorkStartedLocked() { return Status::OK(); }
+ virtual Status OnWorkFinishedLocked() { return Status::OK(); }
+
+ // Called to reset the Reader to a newly constructed state.
+ virtual Status ResetLocked();
+
+ // Default implementation generates an Unimplemented error.
+ // See the protected helper methods below.
+ virtual Status SerializeStateLocked(string* state);
+ virtual Status RestoreStateLocked(const string& state);
+
+ // Accessors ----------------------------------------------------------------
+
+ // Always true during a call to ReadLocked().
+ bool work_in_progress() const { return work_finished_ < work_started_; }
+
+ // Returns the name of the current work item (valid if
+ // work_in_progress() returns true). May change between calls to
+ // ReadLocked().
+ const string& current_work() const { return work_; }
+
+ // What was passed to the constructor.
+ const string& name() const { return name_; }
+
+ protected:
+ // For descendants wishing to implement serialize & restore state.
+
+ // Writes ReaderBase state to *state.
+ void SaveBaseState(ReaderBaseState* state) const;
+
+ // Restores ReaderBase state from state. Assumes state was filled
+ // using SaveBaseState() above.
+ Status RestoreBaseState(const ReaderBaseState& state);
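+
+ // As a rough sketch (not from this codebase), a descendant whose only
+ // state is the base state might implement the serialization hooks as:
+ //
+ //   Status SerializeStateLocked(string* state) override {
+ //     ReaderBaseState base_state;
+ //     SaveBaseState(&base_state);
+ //     *state = base_state.SerializeAsString();
+ //     return Status::OK();
+ //   }
+ //
+ //   Status RestoreStateLocked(const string& state) override {
+ //     ReaderBaseState base_state;
+ //     if (!base_state.ParseFromString(state)) {
+ //       return errors::InvalidArgument("Could not parse state for ", name());
+ //     }
+ //     return RestoreBaseState(base_state);
+ //   }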
+
+ private:
+ // Implementations of ReaderInterface methods. These ensure thread-safety
+ // and call the methods above to do the work.
+ void Read(QueueInterface* queue, string* key, string* value,
+ OpKernelContext* context) override;
+ Status Reset() override;
+ int64 NumRecordsProduced() override;
+ int64 NumWorkUnitsCompleted() override;
+ Status SerializeState(string* state) override;
+ Status RestoreState(const string& state) override;
+
+ // For implementing Read(). Dequeues the next work item from
+ // *queue, and if successful updates work_, work_started_
+ // (establishing work_in_progress() == true) and calls
+ // OnWorkStartedLocked(). May block.
+ void GetNextWorkLocked(QueueInterface* queue, OpKernelContext* context);
+
+ mutable mutex mu_;
+ const string name_;
+ int64 work_started_ = 0;
+ int64 work_finished_ = 0;
+ int64 num_records_produced_ = 0;
+ string work_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_READER_BASE_H_
diff --git a/tensorflow/core/kernels/reader_base.proto b/tensorflow/core/kernels/reader_base.proto
new file mode 100644
index 0000000000..4335cb2152
--- /dev/null
+++ b/tensorflow/core/kernels/reader_base.proto
@@ -0,0 +1,13 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+// For serializing and restoring the state of ReaderBase, see
+// reader_base.h for details.
+message ReaderBaseState {
+ int64 work_started = 1;
+ int64 work_finished = 2;
+ int64 num_records_produced = 3;
+ bytes current_work = 4;
+};
diff --git a/tensorflow/core/kernels/reader_ops.cc b/tensorflow/core/kernels/reader_ops.cc
new file mode 100644
index 0000000000..38c1013604
--- /dev/null
+++ b/tensorflow/core/kernels/reader_ops.cc
@@ -0,0 +1,132 @@
+// See docs in ../ops/io_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/queue_interface.h"
+#include "tensorflow/core/framework/reader_interface.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
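+// Base class for the Reader*Op kernels below: looks up the ReaderInterface
+// resource named by the "reader_handle" input, hands it to
+// ComputeWithReader(), and unrefs the reader when done.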
+class ReaderVerbOpKernel : public OpKernel {
+ public:
+ using OpKernel::OpKernel;
+
+ void Compute(OpKernelContext* context) override {
+ ReaderInterface* reader;
+ OP_REQUIRES_OK(context,
+ GetResourceFromContext(context, "reader_handle", &reader));
+ ComputeWithReader(context, reader);
+ reader->Unref();
+ }
+
+ protected:
+ virtual void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) = 0;
+};
+
+class ReaderReadOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ QueueInterface* queue;
+ OP_REQUIRES_OK(context,
+ GetResourceFromContext(context, "queue_handle", &queue));
+ core::ScopedUnref unref_me(queue);
+ Tensor* key = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output("key", TensorShape({}), &key));
+ Tensor* value = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output("value", TensorShape({}), &value));
+
+ auto key_scalar = key->scalar<string>();
+ auto value_scalar = value->scalar<string>();
+ reader->Read(queue, &key_scalar(), &value_scalar(), context);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderRead").Device(DEVICE_CPU), ReaderReadOp);
+
+class ReaderNumRecordsProducedOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output("records_produced",
+ TensorShape({}), &output));
+ output->scalar<int64>()() = reader->NumRecordsProduced();
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderNumRecordsProduced").Device(DEVICE_CPU),
+ ReaderNumRecordsProducedOp);
+
+class ReaderNumWorkUnitsCompletedOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output("units_completed",
+ TensorShape({}), &output));
+ output->scalar<int64>()() = reader->NumWorkUnitsCompleted();
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderNumWorkUnitsCompleted").Device(DEVICE_CPU),
+ ReaderNumWorkUnitsCompletedOp);
+
+class ReaderSerializeStateOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output("state", TensorShape({}), &output));
+ OP_REQUIRES_OK(context,
+ reader->SerializeState(&output->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderSerializeState").Device(DEVICE_CPU),
+ ReaderSerializeStateOp);
+
+class ReaderRestoreStateOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ const Tensor* tensor;
+ OP_REQUIRES_OK(context, context->input("state", &tensor));
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsScalar(tensor->shape()),
+ errors::InvalidArgument("Reader state must be scalar, but had shape: ",
+ tensor->shape().DebugString()));
+ OP_REQUIRES_OK(context, reader->RestoreState(tensor->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderRestoreState").Device(DEVICE_CPU),
+ ReaderRestoreStateOp);
+
+class ReaderResetOp : public ReaderVerbOpKernel {
+ public:
+ using ReaderVerbOpKernel::ReaderVerbOpKernel;
+
+ void ComputeWithReader(OpKernelContext* context,
+ ReaderInterface* reader) override {
+ OP_REQUIRES_OK(context, reader->Reset());
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReaderReset").Device(DEVICE_CPU), ReaderResetOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops.h b/tensorflow/core/kernels/reduction_ops.h
new file mode 100644
index 0000000000..b412617a65
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops.h
@@ -0,0 +1,66 @@
+#ifndef TENSORFLOW_KERNELS_REDUCTION_OPS_H_
+#define TENSORFLOW_KERNELS_REDUCTION_OPS_H_
+
+// Functor definitions for Reduction ops, must be compilable by nvcc.
+
+#include <iostream>
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// When eigen3 has a better implementation of AllReducer and AnyReducer,
+// replace the reducers here.
+
+// Reduction using logical_and.
+struct AllReducer {
+ // TODO(zhifengc): Implement PacketAccess when performance matters.
+ static const bool PacketAccess = false;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC void reduce(const bool t, bool* accum) const {
+ *accum &= t;
+ }
+
+ EIGEN_DEVICE_FUNC bool initialize() const { return true; }
+
+ EIGEN_DEVICE_FUNC bool finalize(const bool accum) const { return accum; }
+};
+
+// Reduction using logical_or.
+struct AnyReducer {
+ // TODO(zhifengc): Implement PacketAccess when performance matters.
+ static const bool PacketAccess = false;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC void reduce(const bool t, bool* accum) const {
+ *accum |= t;
+ }
+
+ EIGEN_DEVICE_FUNC bool initialize() const { return false; }
+
+ EIGEN_DEVICE_FUNC bool finalize(const bool accum) const { return accum; }
+};
+
+template <typename Device, typename OUT_T, typename IN_T,
+ typename ReductionAxes, typename Reducer>
+void ReduceEigenImpl(const Device& d, OUT_T out, IN_T in,
+ const ReductionAxes& reduction_axes,
+ const Reducer& reducer) {
+ out.device(d) = in.reduce(reduction_axes, reducer);
+}
+
+template <typename Device>
+struct ReduceFunctor {
+ template <typename OUT_T, typename IN_T, typename ReductionAxes,
+ typename Reducer>
+ static void Reduce(const Device& d, OUT_T out, IN_T in,
+ const ReductionAxes& reduction_axes,
+ const Reducer& reducer);
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_REDUCTION_OPS_H_
diff --git a/tensorflow/core/kernels/reduction_ops_all.cc b/tensorflow/core/kernels/reduction_ops_all.cc
new file mode 100644
index 0000000000..11d399e70a
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_all.cc
@@ -0,0 +1,17 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("All")
+ .Device(DEVICE_CPU)
+ .HostMemory("reduction_indices"),
+ ReductionOp<CPUDevice, bool, functor::AllReducer>);
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("All")
+ .Device(DEVICE_GPU)
+ .HostMemory("reduction_indices"),
+ ReductionOp<GPUDevice, bool, functor::AllReducer>);
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_any.cc b/tensorflow/core/kernels/reduction_ops_any.cc
new file mode 100644
index 0000000000..a89ef22b08
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_any.cc
@@ -0,0 +1,17 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("Any")
+ .Device(DEVICE_CPU)
+ .HostMemory("reduction_indices"),
+ ReductionOp<CPUDevice, bool, functor::AnyReducer>);
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Any")
+ .Device(DEVICE_GPU)
+ .HostMemory("reduction_indices"),
+ ReductionOp<GPUDevice, bool, functor::AnyReducer>);
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_common.h b/tensorflow/core/kernels/reduction_ops_common.h
new file mode 100644
index 0000000000..2bde3a1a54
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_common.h
@@ -0,0 +1,302 @@
+// This is an internal header file intended to only be included as the
+// front-matter in the implementation files of various reduction ops. It
+// is a header file because we split the various reduction ops into their
+// own compilation units to get more parallelism in compilation.
+
+#ifndef TENSORFLOW_KERNELS_REDUCTION_OPS_COMMON_H_
+#define TENSORFLOW_KERNELS_REDUCTION_OPS_COMMON_H_
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/reduction_ops.h"
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/Eigen/Core"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device>
+struct Constants {
+ // Derive Index type. int (32-bit) or long (64-bit) depending on the
+ // compile-time configuration. "float" here is not relevant.
+ // TODO(zhifengc): Move the definition to TTypes.
+ typedef TTypes<float>::Tensor::Index Index;
+ Eigen::array<Index, 1> kZero;
+ Eigen::array<Index, 1> kOne;
+ Eigen::array<Index, 2> kZeroTwo;
+
+ Constants() {
+ kZero[0] = 0;
+ kOne[0] = 1;
+ kZeroTwo[0] = 0;
+ kZeroTwo[1] = 2;
+ }
+};
+
+#if defined(EIGEN_HAS_INDEX_LIST)
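+// On CPU, when Eigen's compile-time index lists are available, use them so
+// the reduction axes are known at compile time and Eigen can specialize the
+// reduction accordingly.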
+template <>
+struct Constants<CPUDevice> {
+ const Eigen::IndexList<Eigen::type2index<0>> kZero;
+ const Eigen::IndexList<Eigen::type2index<1>> kOne;
+ const Eigen::IndexList<Eigen::type2index<0>, Eigen::type2index<2>> kZeroTwo;
+};
+#endif
+
+namespace {
+
+class ReductionHelper {
+ public:
+ ReductionHelper() : reduce_first_axis_(false) {}
+
+ Status Simplify(const Tensor& data, const Tensor& axis,
+ const bool keep_dims) {
+ // bitmap[i] indicates whether to reduce data along i-th axis.
+ std::vector<bool> bitmap(data.dims(), false);
+ auto axis_vec = axis.flat<int32>();
+ for (int64 i = 0; i < axis.NumElements(); ++i) {
+ const int32 index = axis_vec(i);
+ if (index < 0 || index >= data.dims()) {
+ return errors::OutOfRange("Invalid reduction dimension (", index,
+ " for input with ", data.dims(),
+ " dimension(s))");
+ }
+ bitmap[index] = true;
+ }
+
+ // Output tensor's dim sizes.
+ out_shape_.clear();
+ for (int i = 0; i < data.dims(); ++i) {
+ if (!bitmap[i]) {
+ // If we are not reducing along dimension i.
+ out_shape_.push_back(data.dim_size(i));
+ } else if (keep_dims) {
+ // We are reducing along dimension i, but we want to keep the
+ // same number of dimensions, so we set the dimension of i to
+ // '1'.
+ out_shape_.push_back(1);
+ }
+ }
+
+ // Depending on bitmap[i] and bitmap[i-1], we can collapse adjacent
+ // axes of the input data before doing the reduction on the resulting
+ // tensor. The reduction result is then reshaped to the final output
+ // shape.
+
+ // We'll skip the leading 1s.
+ int dim_index = 0;
+ for (; dim_index < data.dims(); ++dim_index) {
+ if (data.dim_size(dim_index) != 1) break;
+ }
+ if (dim_index >= data.dims()) {
+ // Special case. The input is essentially a scalar.
+ reduce_first_axis_ = true;
+ } else {
+ // Starting from the (dim_index)-th dimension, the dimensions
+ // alternate between runs that need to be reduced and runs that
+ // don't.
+ //
+ // NOTE: If a dimension has size 1, we group it with the current
+ // run so that we can minimize the number of runs.
+ //
+ // E.g., when we want to reduce a tensor of shape [2, 1, 3, 1,
+ // 5] by axes = [1, 4], we should treat the tensor as a [6, 5]
+ // and reduce by axes = [1] (i.e., the output is shape [6]).
+ reduce_first_axis_ = bitmap[dim_index];
+ data_reshape_.push_back(data.dim_size(dim_index));
+ ++dim_index;
+ for (; dim_index < data.dims(); ++dim_index) {
+ const auto size = data.dim_size(dim_index);
+ if (size == 1) {
+ bitmap[dim_index] = bitmap[dim_index - 1];
+ }
+ if (bitmap[dim_index - 1] != bitmap[dim_index]) {
+ // Starts a new run of reduce or !reduce.
+ data_reshape_.push_back(size);
+ } else {
+ // Continue a run of reduce or !reduce.
+ data_reshape_.back() *= size;
+ }
+ }
+ // If reduce_first_axis_ is true (the input's dimensions 0, 2, 4, etc.
+ // are reduced), then data_reshape_[1, 3, 5, ...] forms out_reshape_;
+ // otherwise, data_reshape_[0, 2, 4, ...] does.
+ for (size_t i = reduce_first_axis_ ? 1 : 0; i < data_reshape_.size();
+ i += 2) {
+ out_reshape_.push_back(data_reshape_[i]);
+ }
+ }
+
+ VLOG(1) << "data reshape: " << str_util::Join(data_reshape_, ",");
+ VLOG(1) << "out reshape: " << str_util::Join(out_reshape_, ",");
+ VLOG(1) << "out shape: " << str_util::Join(out_shape_, ",");
+ return Status::OK();
+ }
+
+ // We need to do roughly:
+ // tmp_out = allocate(out_reshape())
+ // tmp_out.reshape(out_reshape) = data.reshape(data_reshape).reduce(axes)
+ // out = tmp_out.reshape(out_shape)
+
+ // The reduction result must be allocated with this shape.
+ TensorShape out_reshape() const {
+ TensorShape shape;
+ for (auto size : out_reshape_) shape.AddDim(size);
+ return shape;
+ }
+
+ // The final output shape must be allocated with this shape.
+ TensorShape out_shape() const {
+ TensorShape shape;
+ for (auto size : out_shape_) shape.AddDim(size);
+ return shape;
+ }
+
+ // The reduction is on a reshaped tensor of this rank.
+ int ndims() const { return data_reshape_.size(); }
+
+ // True if need to reduce the 0-th dimension.
+ bool reduce_first_axis() const { return reduce_first_axis_; }
+
+ // The output is reshaped.
+ template <typename T, int N>
+ typename TTypes<T, N>::Tensor out(Tensor* out) {
+ return out->shaped<T, N>(out_reshape_);
+ }
+
+ // The input is reshaped.
+ template <typename T, int N>
+ typename TTypes<T, N>::ConstTensor in(const Tensor& data) {
+ return data.shaped<T, N>(data_reshape_);
+ }
+
+ private:
+ bool reduce_first_axis_; // True if need to reduce the 0-th dimension.
+ std::vector<int64> data_reshape_; // Reshape the data before reduction.
+ std::vector<int64> out_shape_; // The final output shape.
+ std::vector<int64> out_reshape_; // Reshape the output for reduction.
+};
+
+} // end namespace
+
+// For operations where the output is a reduction function along some
+// dimensions of the input.
+template <typename Device, class T, typename Reducer>
+class ReductionOp : public OpKernel {
+ public:
+ explicit ReductionOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(ctx, ctx->MatchSignature({dt, DT_INT32}, {dt}));
+
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("keep_dims", &keep_dims_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& data = ctx->input(0);
+ const Tensor& axes = ctx->input(1);
+ VLOG(1) << "data shape: " << data.shape().ShortDebugString();
+ VLOG(1) << "axes : " << axes.SummarizeValue(10);
+
+ ReductionHelper helper;
+ OP_REQUIRES_OK(ctx, helper.Simplify(data, axes, keep_dims_));
+ CHECK_GE(helper.ndims(), 0);
+
+ // The real output shape will be assigned below.
+ TensorShape empty_shape;
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, empty_shape, &out));
+
+ if (helper.ndims() == 0 ||
+ (helper.ndims() == 1 && !helper.reduce_first_axis())) {
+ // Special case. Reduces nothing. It is unclear why this is
+ // necessary, but tests fail without it. Look into why this
+ // case occurs.
+ if (!out->CopyFrom(data, helper.out_shape())) {
+ ctx->SetStatus(errors::Internal("Error during reduction copy."));
+ }
+ return;
+ }
+
+ // A temporary tensor whose size matches the size of the reduced
+ // output.
+ Tensor tmp_out;
+ OP_REQUIRES_OK(
+ ctx, ctx->allocate_temp(out->dtype(), helper.out_reshape(), &tmp_out));
+
+ typedef functor::ReduceFunctor<Device> Functor;
+ Constants<Device> constants;
+ const Device& d = ctx->eigen_device<Device>();
+ Reducer reducer;
+
+ if ((helper.ndims() == 1) && helper.reduce_first_axis()) {
+ // Reduce to a scalar.
+ Functor::Reduce(d, helper.out<T, 0>(&tmp_out), helper.in<T, 1>(data),
+ constants.kZero, reducer);
+ } else if ((helper.ndims() == 2) && helper.reduce_first_axis()) {
+ // Can be viewed as a reduction of a matrix along 1st dimension.
+ Functor::Reduce(d, helper.out<T, 1>(&tmp_out), helper.in<T, 2>(data),
+ constants.kZero, reducer);
+ } else if ((helper.ndims() == 2) && !helper.reduce_first_axis()) {
+ // Can be viewed as a reduction of a matrix along 2nd dimension.
+ Functor::Reduce(d, helper.out<T, 1>(&tmp_out), helper.in<T, 2>(data),
+ constants.kOne, reducer);
+ } else if ((helper.ndims() == 3) && helper.reduce_first_axis()) {
+ // Can be viewed as a reduction of a 3D tensor along 1st and 3rd
+ // dimensions.
+ Functor::Reduce(d, helper.out<T, 1>(&tmp_out), helper.in<T, 3>(data),
+ constants.kZeroTwo, reducer);
+ } else if ((helper.ndims() == 3) && !helper.reduce_first_axis()) {
+ // Can be viewed as a reduction of a 3D tensor along 2nd dimension.
+ Functor::Reduce(d, helper.out<T, 2>(&tmp_out), helper.in<T, 3>(data),
+ constants.kOne, reducer);
+ } else {
+ // TODO(zhifengc): We can implement reduction for arbitrary rank
+ // tensor and arbitrary reduction axes by iterating the reduction
+ // multiple times. This may also be accomplished in the graph
+ // construction.
+ ctx->SetStatus(
+ errors::Unimplemented("Reducing ", data.shape().ShortDebugString(),
+ " axes [", axes.SummarizeValue(10), "] to ",
+ tmp_out.shape().ShortDebugString()));
+ return;
+ }
+
+ // Set the real output using the contents of the reduction but the
+ // real expected output shape. The number of elements should
+ // match between the two shapes.
+ if (!out->CopyFrom(tmp_out, helper.out_shape())) {
+ ctx->SetStatus(errors::Internal("Error during reduction copy."));
+ }
+ }
+
+ private:
+ // True if the number of dimensions should be maintained.
+ bool keep_dims_;
+};
+
+namespace functor {
+
+template <>
+struct ReduceFunctor<CPUDevice> {
+ template <typename OUT_T, typename IN_T, typename ReductionAxes,
+ typename Reducer>
+ static void Reduce(const CPUDevice& d, OUT_T out, IN_T in,
+ const ReductionAxes& reduction_axes,
+ const Reducer& reducer) {
+ ReduceEigenImpl(d, out, in, reduction_axes, reducer);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_REDUCTION_OPS_COMMON_H_
diff --git a/tensorflow/core/kernels/reduction_ops_gpu.cu.cc b/tensorflow/core/kernels/reduction_ops_gpu.cu.cc
new file mode 100644
index 0000000000..8e29d2d06c
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_gpu.cu.cc
@@ -0,0 +1,65 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/numeric_types.h"
+#include "tensorflow/core/kernels/reduction_ops.h"
+
+namespace tensorflow {
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Derive Index type. int (32-bit) or long (64-bit) depending on the
+// compile-time configuration. "float" here is not relevant.
+// TODO(zhifengc): Move the definition to TTypes.
+typedef TTypes<float>::Tensor::Index Index;
+
+template <>
+struct ReduceFunctor<GPUDevice> {
+ template <typename OUT_T, typename IN_T, typename ReductionAxes,
+ typename Reducer>
+ static void Reduce(const GPUDevice& d, OUT_T out, IN_T in,
+ const ReductionAxes& reduction_axes,
+ const Reducer& reducer) {
+ ReduceEigenImpl(d, To32Bit(out), To32Bit(in), reduction_axes, reducer);
+ }
+};
+
+// T: the data type
+// REDUCER: the reducer functor
+// NUM_AXES: the number of axes to reduce
+// IN_DIMS: the number of dimensions of the input tensor
+#define DEFINE(T, REDUCER, IN_DIMS, NUM_AXES) \
+ template void ReduceFunctor<GPUDevice>::Reduce( \
+ const GPUDevice& d, TTypes<T, IN_DIMS - NUM_AXES>::Tensor out, \
+ TTypes<T, IN_DIMS>::ConstTensor in, \
+ const Eigen::array<Index, NUM_AXES>& reduction_axes, \
+ const REDUCER& reducer);
+
+#define DEFINE_FOR_TYPE_AND_R(T, R) \
+ DEFINE(T, R, 1, 1); \
+ DEFINE(T, R, 2, 1); \
+ DEFINE(T, R, 3, 1); \
+ DEFINE(T, R, 3, 2);
+
+#define DEFINE_FOR_ALL_REDUCERS(T) \
+ DEFINE_FOR_TYPE_AND_R(T, Eigen::internal::SumReducer<T>); \
+ DEFINE_FOR_TYPE_AND_R(T, Eigen::internal::MinReducer<T>); \
+ DEFINE_FOR_TYPE_AND_R(T, Eigen::internal::MaxReducer<T>); \
+ DEFINE_FOR_TYPE_AND_R(T, Eigen::internal::ProdReducer<T>)
+
+DEFINE_FOR_ALL_REDUCERS(float);
+#undef DEFINE_FOR_ALL_REDUCERS
+
+DEFINE_FOR_TYPE_AND_R(complex64, Eigen::internal::SumReducer<complex64>);
+DEFINE_FOR_TYPE_AND_R(bool, AllReducer);
+DEFINE_FOR_TYPE_AND_R(bool, AnyReducer);
+#undef DEFINE_FOR_TYPE_AND_R
+
+#undef DEFINE
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/reduction_ops_max.cc b/tensorflow/core/kernels/reduction_ops_max.cc
new file mode 100644
index 0000000000..1749360b6e
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_max.cc
@@ -0,0 +1,26 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+#define REGISTER_CPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Max").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReductionOp<CPUDevice, type, Eigen::internal::MaxReducer<type>>);
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
+#undef REGISTER_CPU_KERNELS
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Max") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("reduction_indices"), \
+ ReductionOp<GPUDevice, type, Eigen::internal::MaxReducer<type>>);
+REGISTER_GPU_KERNELS(float);
+#undef REGISTER_GPU_KERNELS
+
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_mean.cc b/tensorflow/core/kernels/reduction_ops_mean.cc
new file mode 100644
index 0000000000..b00c36fed8
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_mean.cc
@@ -0,0 +1,12 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+#define REGISTER_CPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Mean").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReductionOp<CPUDevice, type, Eigen::internal::MeanReducer<type>>);
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
+#undef REGISTER_CPU_KERNELS
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_min.cc b/tensorflow/core/kernels/reduction_ops_min.cc
new file mode 100644
index 0000000000..de1f4b8520
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_min.cc
@@ -0,0 +1,26 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+#define REGISTER_CPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Min").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReductionOp<CPUDevice, type, Eigen::internal::MinReducer<type>>);
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
+#undef REGISTER_CPU_KERNELS
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Min") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("reduction_indices"), \
+ ReductionOp<GPUDevice, type, Eigen::internal::MinReducer<type>>);
+REGISTER_GPU_KERNELS(float);
+#undef REGISTER_GPU_KERNELS
+
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_prod.cc b/tensorflow/core/kernels/reduction_ops_prod.cc
new file mode 100644
index 0000000000..4068c7feda
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_prod.cc
@@ -0,0 +1,26 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+#define REGISTER_CPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Prod").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReductionOp<CPUDevice, type, Eigen::internal::ProdReducer<type>>);
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
+#undef REGISTER_CPU_KERNELS
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Prod") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("reduction_indices"), \
+ ReductionOp<GPUDevice, type, Eigen::internal::ProdReducer<type>>);
+REGISTER_GPU_KERNELS(float);
+#undef REGISTER_GPU_KERNELS
+
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_sum.cc b/tensorflow/core/kernels/reduction_ops_sum.cc
new file mode 100644
index 0000000000..82d685e225
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_sum.cc
@@ -0,0 +1,37 @@
+#include "tensorflow/core/kernels/reduction_ops_common.h"
+
+namespace tensorflow {
+
+#define REGISTER_CPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Sum").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReductionOp<CPUDevice, type, Eigen::internal::SumReducer<type>>);
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS);
+#undef REGISTER_CPU_KERNELS
+
+// NOTE: We should have mean(complex64,int32), too. But that requires
+// changing Eigen::internal::MeanReducer to cast int to complex<float>.
+// We don't see an immediate need for mean(complex64,int32) anyway.
+REGISTER_KERNEL_BUILDER(
+ Name("Sum").Device(DEVICE_CPU).TypeConstraint<complex64>("T"),
+ ReductionOp<CPUDevice, complex64, Eigen::internal::SumReducer<complex64>>);
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Sum") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("reduction_indices"), \
+ ReductionOp<GPUDevice, type, Eigen::internal::SumReducer<type>>);
+REGISTER_GPU_KERNELS(float);
+#undef REGISTER_GPU_KERNELS
+
+REGISTER_KERNEL_BUILDER(
+ Name("Sum").Device(DEVICE_GPU).TypeConstraint<complex64>("T"),
+ ReductionOp<GPUDevice, complex64, Eigen::internal::SumReducer<complex64>>);
+
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reduction_ops_test.cc b/tensorflow/core/kernels/reduction_ops_test.cc
new file mode 100644
index 0000000000..d96da3c7f1
--- /dev/null
+++ b/tensorflow/core/kernels/reduction_ops_test.cc
@@ -0,0 +1,73 @@
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+// Creates a Graph which "reduce"s a 3D float tensor of "num" elements
+// into a scalar.
+static Graph* ToScalar(const string& reduce, int num) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor data(DT_FLOAT, TensorShape({64, 64, num / (64 * 64)}));
+ data.flat<float>().setRandom();
+ Tensor axes(DT_INT32, TensorShape({3}));
+ axes.flat<int32>()(0) = 0;
+ axes.flat<int32>()(1) = 1;
+ axes.flat<int32>()(2) = 2;
+ test::graph::Reduce(g, reduce, test::graph::Constant(g, data),
+ test::graph::Constant(g, axes));
+ return g;
+}
+
+// Creates a benchmark which reduces a 3D tensor with a total of "num" floats
+// into a scalar on a "device". Runs the benchmark "iters" times.
+static void ReduceToScalar(int iters, const string& device,
+ const string& reduce, int num) {
+ testing::ItemsProcessed(static_cast<int64>(iters) * num);
+ testing::BytesProcessed(static_cast<int64>(iters) * num * sizeof(float));
+ test::Benchmark(device, ToScalar(reduce, num)).Run(iters);
+}
+
+static void BM_Sum3DToScalarCPU(int iters, int num) {
+ ReduceToScalar(iters, "cpu", "Sum", num);
+}
+BENCHMARK(BM_Sum3DToScalarCPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Max3DToScalarCPU(int iters, int num) {
+ ReduceToScalar(iters, "cpu", "Max", num);
+}
+BENCHMARK(BM_Max3DToScalarCPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Prod3DToScalarCPU(int iters, int num) {
+ ReduceToScalar(iters, "cpu", "Prod", num);
+}
+BENCHMARK(BM_Prod3DToScalarCPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Mean3DToScalarCPU(int iters, int num) {
+ ReduceToScalar(iters, "cpu", "Mean", num);
+}
+BENCHMARK(BM_Mean3DToScalarCPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Sum3DToScalarGPU(int iters, int num) {
+ ReduceToScalar(iters, "gpu", "Sum", num);
+}
+BENCHMARK(BM_Sum3DToScalarGPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Max3DToScalarGPU(int iters, int num) {
+ ReduceToScalar(iters, "gpu", "Max", num);
+}
+BENCHMARK(BM_Max3DToScalarGPU)->Range(1 << 13, 1 << 20);
+
+static void BM_Prod3DToScalarGPU(int iters, int num) {
+ ReduceToScalar(iters, "gpu", "Prod", num);
+}
+BENCHMARK(BM_Prod3DToScalarGPU)->Range(1 << 13, 1 << 20);
+
+// Once Mean is available on GPU, enable this.
+// static void BM_Mean3DToScalarGPU(int iters, int num) {
+// ReduceToScalar(iters, "gpu", "Mean", num);
+// }
+// BENCHMARK(BM_Mean3DToScalarGPU)->Range(1 << 13, 1 << 20);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/reference_gemm.h b/tensorflow/core/kernels/reference_gemm.h
new file mode 100644
index 0000000000..77c6ef35e9
--- /dev/null
+++ b/tensorflow/core/kernels/reference_gemm.h
@@ -0,0 +1,75 @@
+#ifndef TENSORFLOW_KERNELS_REFERENCE_GEMM_H_
+#define TENSORFLOW_KERNELS_REFERENCE_GEMM_H_
+
+// This is an unoptimized but debuggable implementation of the GEMM matrix
+// multiply function, used to compare to faster but more opaque versions, or
+// for bit depths or argument combinations that aren't supported by optimized
+// code.
+// It assumes the row-major convention used by TensorFlow, and implements
+// C = A * B, like the standard BLAS GEMM interface. If the transpose flags are
+// true, then the relevant matrix is treated as stored in column-major order.
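+//
+// As a purely illustrative sketch (not part of this header), multiplying a
+// row-major 2x3 uint8 matrix `a` by a row-major 3x2 uint8 matrix `b` into a
+// 2x2 int32 matrix `c`, with no quantization offsets, no output scaling
+// (mult_c = 1) and no shift, might look like:
+//
+//   ReferenceGemm<uint8, uint8, int32>(false, false, false,  // transposes
+//                                      2, 2, 3,              // m, n, k
+//                                      a, 0, 3,              // a, offset_a, lda
+//                                      b, 0, 2,              // b, offset_b, ldb
+//                                      c, 0, 0, 1, 2);       // c, shift_c, offset_c, mult_c, ldc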
+
+namespace tensorflow {
+template <class T1, class T2, class T3>
+void ReferenceGemm(bool transpose_a, bool transpose_b, bool transpose_c,
+ size_t m, size_t n, size_t k, const T1* a, T1 offset_a,
+ size_t lda, const T2* b, T2 offset_b, size_t ldb, T3* c,
+ int32 shift_c, int32 offset_c, int32 mult_c, size_t ldc) {
+ int a_i_stride;
+ int a_l_stride;
+ if (transpose_a) {
+ a_i_stride = 1;
+ a_l_stride = lda;
+ } else {
+ a_i_stride = lda;
+ a_l_stride = 1;
+ }
+ int b_j_stride;
+ int b_l_stride;
+ if (transpose_b) {
+ b_j_stride = ldb;
+ b_l_stride = 1;
+ } else {
+ b_j_stride = 1;
+ b_l_stride = ldb;
+ }
+ int c_i_stride;
+ int c_j_stride;
+ if (transpose_c) {
+ c_i_stride = 1;
+ c_j_stride = ldc;
+ } else {
+ c_i_stride = ldc;
+ c_j_stride = 1;
+ }
+
+ const int32 highest = static_cast<int32>(Eigen::NumTraits<T3>::highest());
+ const int32 lowest = static_cast<int32>(Eigen::NumTraits<T3>::lowest());
+ const int32 rounding = (shift_c < 1) ? 0 : (1 << (shift_c - 1));
+
+ int i, j, l;
+ for (j = 0; j < n; j++) {
+ for (i = 0; i < m; i++) {
+ int32 total = 0;
+ for (l = 0; l < k; l++) {
+ const size_t a_index = ((i * a_i_stride) + (l * a_l_stride));
+ const int32 a_value = a[a_index] - offset_a;
+ const size_t b_index = ((j * b_j_stride) + (l * b_l_stride));
+ const int32 b_value = b[b_index] - offset_b;
+ total += (a_value * b_value);
+ }
+ const size_t c_index = ((i * c_i_stride) + (j * c_j_stride));
+ int32_t output = ((((total + offset_c) * mult_c) + rounding) >> shift_c);
+ if (output > highest) {
+ output = highest;
+ }
+ if (output < lowest) {
+ output = lowest;
+ }
+ c[c_index] = static_cast<T3>(output);
+ }
+ }
+}
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_REFERENCE_GEMM_H_
diff --git a/tensorflow/core/kernels/relu_op.cc b/tensorflow/core/kernels/relu_op.cc
new file mode 100644
index 0000000000..d5dd7a8119
--- /dev/null
+++ b/tensorflow/core/kernels/relu_op.cc
@@ -0,0 +1,154 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/relu_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class ReluOp : public UnaryElementWiseOp<T, ReluOp<Device, T>> {
+ public:
+ using UnaryElementWiseOp<T, ReluOp<Device, T>>::UnaryElementWiseOp;
+
+ void Operate(OpKernelContext* context, const Tensor& input, Tensor* output) {
+ functor::Relu<Device, T> functor;
+ functor(context->eigen_device<Device>(), input.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+template <typename Device, typename T>
+class Relu6Op : public UnaryElementWiseOp<T, Relu6Op<Device, T>> {
+ public:
+ using UnaryElementWiseOp<T, Relu6Op<Device, T>>::UnaryElementWiseOp;
+
+ void Operate(OpKernelContext* context, const Tensor& input, Tensor* output) {
+ functor::Relu6<Device, T> functor;
+ functor(context->eigen_device<Device>(), input.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+template <typename Device, typename T>
+class ReluGradOp : public BinaryElementWiseOp<T, ReluGradOp<Device, T>> {
+ public:
+ using BinaryElementWiseOp<T, ReluGradOp<Device, T>>::BinaryElementWiseOp;
+
+ // INPUTS:
+ // g (gradients): backpropagated gradients
+ // a (inputs): inputs that were passed to ReluOp()
+ // OUTPUT:
+ // gradients to backprop
+ template <int NDIMS>
+ void Operate(OpKernelContext* context, const Tensor& g, const Tensor& a,
+ Tensor* output) {
+ OP_REQUIRES(context, a.IsSameSize(g),
+ errors::InvalidArgument("g and a must be the same size"));
+ functor::ReluGrad<Device, T> functor;
+ functor(context->eigen_device<Device>(), g.flat<T>(), a.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+template <typename Device, typename T>
+class Relu6GradOp : public BinaryElementWiseOp<T, Relu6GradOp<Device, T>> {
+ public:
+ using BinaryElementWiseOp<T, Relu6GradOp<Device, T>>::BinaryElementWiseOp;
+
+ // INPUTS:
+ // g (gradients): backpropagated gradients
+ // a (inputs): inputs that were passed to Relu6Op()
+ // OUTPUT:
+ // gradients to backprop
+ template <int NDIMS>
+ void Operate(OpKernelContext* context, const Tensor& g, const Tensor& a,
+ Tensor* output) {
+ OP_REQUIRES(context, a.IsSameSize(g),
+ errors::InvalidArgument("g and a must be the same size"));
+ functor::Relu6Grad<Device, T> functor;
+ functor(context->eigen_device<Device>(), g.flat<T>(), a.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReluOp<CPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu6").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ Relu6Op<CPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ReluGrad").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReluGradOp<CPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu6Grad").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ Relu6GradOp<CPUDevice, type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void Relu<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor activations); \
+ extern template struct Relu<GPUDevice, T>; \
+ \
+ template <> \
+ void ReluGrad<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor gradients, \
+ typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor backprops); \
+ \
+ extern template struct ReluGrad<GPUDevice, T>; \
+ template <> \
+ void Relu6<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor activations); \
+ extern template struct Relu6<GPUDevice, T>; \
+ \
+ template <> \
+ void Relu6Grad<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor gradients, \
+ typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor backprops); \
+ extern template struct Relu6Grad<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPEC);
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ ReluOp<GPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu6").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ Relu6Op<GPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ReluGrad").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ ReluGradOp<GPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Relu6Grad").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ Relu6GradOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
+#undef REGISTER_GPU_KERNELS
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/relu_op.h b/tensorflow/core/kernels/relu_op.h
new file mode 100644
index 0000000000..8ed071cc4a
--- /dev/null
+++ b/tensorflow/core/kernels/relu_op.h
@@ -0,0 +1,79 @@
+#ifndef TENSORFLOW_KERNELS_RELU_OP_H_
+#define TENSORFLOW_KERNELS_RELU_OP_H_
+// Functor definition for ReluOp and ReluGradOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by ReluOp to do the computations.
+template <typename Device, typename T>
+struct Relu {
+ // Computes Relu activation.
+ //
+ // features: any shape.
+ // activations: same shape as "features".
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor activations) {
+ activations.device(d) = features.cwiseMax(static_cast<T>(0));
+ }
+};
+
+// Functor used by ReluGradOp to do the computations.
+template <typename Device, typename T>
+struct ReluGrad {
+ // Computes ReluGrad backprops.
+ //
+ // gradients: gradients backpropagated to the Relu op.
+ // features: inputs that were passed to the Relu op.
+ // backprops: gradients to backpropagate to the Relu inputs.
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor gradients,
+ typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor backprops) {
+ // NOTE: When the activation is exactly zero, we arbitrarily choose to not
+ // propagate the associated gradient value.
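+ // The comparison below produces a per-element 0/1 mask, so the multiply
+ // zeroes the backpropagated gradient wherever the input was non-positive.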
+ backprops.device(d) =
+ gradients * (features > features.constant(static_cast<T>(0)));
+ }
+};
+
+// Functor used by Relu6Op to do the computations.
+template <typename Device, typename T>
+struct Relu6 {
+ // Computes Relu6 activation.
+ //
+ // features: any shape.
+ // activations: same shape as "features".
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor activations) {
+ activations.device(d) =
+ features.cwiseMax(static_cast<T>(0)).cwiseMin(static_cast<T>(6));
+ }
+};
+
+// Functor used by ReluGradOp to do the computations.
+template <typename Device, typename T>
+struct Relu6Grad {
+ // Computes Relu6Grad backprops.
+ //
+ // gradients: gradients backpropagated to the Relu6 op.
+ // features: inputs that were passed to the Relu6 op.
+ // backprops: gradients to backpropagate to the Relu6 inputs.
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor gradients,
+ typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor backprops) {
+ // NOTE: When the activation is exactly zero or six, we
+ // arbitrarily choose to not propagate the associated gradient
+ // value.
+ backprops.device(d) = gradients *
+ (features > features.constant(static_cast<T>(0))) *
+ (features < features.constant(static_cast<T>(6)));
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_RELU_OP_H_
diff --git a/tensorflow/core/kernels/relu_op_gpu.cu.cc b/tensorflow/core/kernels/relu_op_gpu.cu.cc
new file mode 100644
index 0000000000..6bd87ff8e4
--- /dev/null
+++ b/tensorflow/core/kernels/relu_op_gpu.cu.cc
@@ -0,0 +1,27 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include "tensorflow/core/kernels/relu_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Definition of the GPU implementations declared in relu_op.cc.
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::Relu<GPUDevice, T>; \
+ template struct functor::ReluGrad<GPUDevice, T>; \
+ template struct functor::Relu6<GPUDevice, T>; \
+ template struct functor::Relu6Grad<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/reshape_op.cc b/tensorflow/core/kernels/reshape_op.cc
new file mode 100644
index 0000000000..7e1cf029de
--- /dev/null
+++ b/tensorflow/core/kernels/reshape_op.cc
@@ -0,0 +1,29 @@
+// See docs in ../ops/array_ops.cc.
+#include "tensorflow/core/kernels/reshape_op.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("Reshape").Device(DEVICE_CPU).HostMemory("shape"),
+ ReshapeOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Reshape") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("shape") \
+ .TypeConstraint<type>("T"), \
+ ReshapeOp);
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Reshape")
+ .Device(DEVICE_GPU)
+ .HostMemory("tensor")
+ .HostMemory("shape")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ ReshapeOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reshape_op.h b/tensorflow/core/kernels/reshape_op.h
new file mode 100644
index 0000000000..3fd3f4492e
--- /dev/null
+++ b/tensorflow/core/kernels/reshape_op.h
@@ -0,0 +1,83 @@
+#ifndef TENSORFLOW_KERNELS_RESHAPE_OP_H_
+#define TENSORFLOW_KERNELS_RESHAPE_OP_H_
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class ReshapeOp : public OpKernel {
+ public:
+ explicit ReshapeOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& sizes = context->input(1);
+ // Preliminary validation of sizes.
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyVector(sizes.shape()),
+ errors::InvalidArgument("sizes input must be 1-D, not shape ",
+ sizes.shape().ShortDebugString()));
+ const int64 num_dims = sizes.NumElements();
+ OP_REQUIRES(
+ context, num_dims <= 8,
+ errors::InvalidArgument(num_dims, " > max 8 output dims supported"));
+
+ // Compute the output shape. Determine product of specified
+ // dimensions, and find the index of the unspecified one.
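+ // For example, reshaping a 12-element input with sizes = [-1, 4] infers
+ // the missing dimension as 12 / 4 = 3, giving an output shape of [3, 4].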
+ TensorShape shape;
+ int32 product = 1;
+ int unknown_index = -1;
+ auto Svec = sizes.flat<int32>();
+ for (int d = 0; d < num_dims; ++d) {
+ const int32 size = Svec(d);
+ if (size == -1) {
+ OP_REQUIRES(
+ context, unknown_index == -1,
+ errors::InvalidArgument("only one input size may be -1, not both ",
+ unknown_index, " and ", d));
+ unknown_index = d;
+ shape.AddDim(1);
+ } else {
+ OP_REQUIRES(context, size >= 0,
+ errors::InvalidArgument(
+ "size ", d, " must be non-negative, not ", size));
+ shape.AddDim(size);
+ product *= size;
+ }
+ }
+ if (unknown_index != -1) {
+ OP_REQUIRES(
+ context, product > 0,
+ errors::InvalidArgument("cannot infer the missing input size for "
+ "an empty tensor unless all specified "
+ "input sizes are non-zero"));
+ const int32 missing = input.NumElements() / product;
+ OP_REQUIRES(context, product * missing == input.NumElements(),
+ errors::InvalidArgument("Input has ", input.NumElements(),
+ " values, which isn't divisible by ",
+ product));
+ shape.set_dim(unknown_index, missing);
+ }
+ OP_REQUIRES(context, shape.num_elements() == input.NumElements(),
+ errors::InvalidArgument("Input has ", input.NumElements(),
+ " values, which isn't the same as ",
+ shape.num_elements()));
+
+ // Actually produce the reshaped output.
+ Tensor output(input.dtype());
+ CHECK(output.CopyFrom(input, shape));
+ context->set_output(0, output);
+ }
+
+ bool IsExpensive() override { return false; }
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_RESHAPE_OP_H_
diff --git a/tensorflow/core/kernels/resize_area_op.cc b/tensorflow/core/kernels/resize_area_op.cc
new file mode 100644
index 0000000000..2b22d38ad6
--- /dev/null
+++ b/tensorflow/core/kernels/resize_area_op.cc
@@ -0,0 +1,139 @@
+// See docs in ../ops/image_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <algorithm>
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename Device, typename T>
+class ResizeAreaOp : public OpKernel {
+ public:
+ explicit ResizeAreaOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ const Tensor& shape_t = context->input(1);
+ OP_REQUIRES(context, shape_t.dims() == 1,
+ errors::InvalidArgument("shape_t must be 1-dimensional",
+ shape_t.shape().ShortDebugString()));
+ OP_REQUIRES(context, shape_t.NumElements() == 2,
+ errors::InvalidArgument("shape_t must have two elements",
+ shape_t.shape().ShortDebugString()));
+
+ auto Svec = shape_t.vec<int32>();
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({input.dim_size(0), Svec(0),
+ Svec(1), input.dim_size(3)}),
+ &output));
+ const int64 batch_size = input.dim_size(0);
+ const int64 in_height = input.dim_size(1);
+ const int64 in_width = input.dim_size(2);
+ const int64 channels = input.dim_size(3);
+ const int64 out_height = output->dim_size(1);
+ const int64 out_width = output->dim_size(2);
+
+ typename TTypes<T, 4>::ConstTensor input_data = input.tensor<T, 4>();
+ typename TTypes<float, 4>::Tensor output_data = output->tensor<float, 4>();
+
+ // A temporary tensor for computing the sum.
+ Tensor sum_tensor;
+ OP_REQUIRES_OK(
+ context, context->allocate_temp(DataTypeToEnum<float>::value,
+ TensorShape({channels}), &sum_tensor));
+ typename TTypes<float, 1>::Tensor sum_data = sum_tensor.vec<float>();
+
+ const float height_scale = in_height / static_cast<float>(out_height);
+ const float width_scale = in_width / static_cast<float>(out_width);
+
+ // When using this algorithm for downsizing, the target pixel value is the
+ // weighted average of all the source pixels. The weight is determined by
+ // the contribution percentage of the source pixel.
+ //
+ // Let "scale" be "target_image_size/source_image_size". If 1/n of the
+ // source pixel contributes to the target pixel, then the weight is (1/n *
+ // scale); if the complete source pixel contributes to the target pixel,
+ // then the weight is scale.
+ //
+ // To visualize the implementation, use one dimension as an example:
+ // Resize in[4] to out[3].
+ // scale = 3/4 = 0.75
+ // out[0]: in[0] and 1/3 of in[1]
+ // out[1]: 2/3 of in[1] and 2/3 of in[2]
+ // out[2]: 1/3 of in[2] and in[3]
+ // Hence, the output pixel values are:
+ // out[0] = (in[0] * 1.0 + in[1] * 1/3) * scale
+ // out[1] = (in[1] * 2/3 + in[2] * 2/3) * scale
+ // out[2] = (in[2] * 1/3 + in[3] * 1.0) * scale
+ float scale = 1.0 / (height_scale * width_scale);
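+ // For each output cell, accumulate a weighted sum over every source cell
+ // that overlaps it; scale_y and scale_x hold the fractional overlap of a
+ // source cell, and BOUND clamps indices to the valid input range.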
+ for (int64 b = 0; b < batch_size; ++b) {
+ for (int64 y = 0; y < out_height; ++y) {
+ const float in_y = y * height_scale;
+ const float in_y1 = (y + 1) * height_scale;
+ // The start and end height indices of all the cells that could
+ // contribute to the target cell.
+ int64 y_start = floor(in_y);
+ int64 y_end = ceil(in_y1);
+
+ for (int64 x = 0; x < out_width; ++x) {
+ const float in_x = x * width_scale;
+ const float in_x1 = (x + 1) * width_scale;
+ // The start and end width indices of all the cells that could
+ // contribute to the target cell.
+ int64 x_start = floor(in_x);
+ int64 x_end = ceil(in_x1);
+
+ sum_data.setConstant(0.0);
+ for (int64 i = y_start; i < y_end; ++i) {
+ float scale_y =
+ i < in_y ? i + 1 - in_y : (i + 1 > in_y1 ? in_y1 - i : 1.0);
+ for (int64 j = x_start; j < x_end; ++j) {
+ float scale_x =
+ j < in_x ? j + 1 - in_x : (j + 1 > in_x1 ? in_x1 - j : 1.0);
+ for (int64 c = 0; c < channels; ++c) {
+#define BOUND(val, limit) std::min(((limit)-1ll), (std::max(0ll, (val))))
+ sum_data(c) +=
+ input_data(b, BOUND(i, in_height), BOUND(j, in_width), c) *
+ scale_y * scale_x * scale;
+#undef BOUND
+ }
+ }
+ }
+ for (int64 c = 0; c < channels; ++c) {
+ output_data(b, y, x, c) = sum_data(c);
+ }
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("ResizeArea") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("size"), \
+ ResizeAreaOp<CPUDevice, T>);
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/resize_bicubic_op.cc b/tensorflow/core/kernels/resize_bicubic_op.cc
new file mode 100644
index 0000000000..472fc19b82
--- /dev/null
+++ b/tensorflow/core/kernels/resize_bicubic_op.cc
@@ -0,0 +1,121 @@
+// See docs in ../ops/image_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <algorithm>
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename Device, typename T>
+class ResizeBicubicOp : public OpKernel {
+ public:
+ explicit ResizeBicubicOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ const Tensor& shape_t = context->input(1);
+ OP_REQUIRES(context, shape_t.dims() == 1,
+ errors::InvalidArgument("shape_t must be 1-dimensional",
+ shape_t.shape().ShortDebugString()));
+ OP_REQUIRES(context, shape_t.NumElements() == 2,
+ errors::InvalidArgument("shape_t must have two elements",
+ shape_t.shape().ShortDebugString()));
+
+ auto Svec = shape_t.vec<int32>();
+ // Initialize shape to the batch size of the input, then add
+ // the rest of the dimensions
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({input.dim_size(0), Svec(0),
+ Svec(1), input.dim_size(3)}),
+ &output));
+ const int64 batch_size = input.dim_size(0);
+ const int64 in_height = input.dim_size(1);
+ const int64 in_width = input.dim_size(2);
+ const int64 channels = input.dim_size(3);
+ const int64 out_height = output->dim_size(1);
+ const int64 out_width = output->dim_size(2);
+
+ typename TTypes<T, 4>::ConstTensor input_data = input.tensor<T, 4>();
+ typename TTypes<float, 4>::Tensor output_data = output->tensor<float, 4>();
+
+ const float height_scale = in_height / static_cast<float>(out_height);
+ const float width_scale = in_width / static_cast<float>(out_width);
+
+ // Initialize coefficients table using Bicubic convolution algorithm.
+ // https://en.wikipedia.org/wiki/Bicubic_interpolation
+ static const int64 tab_size = (1 << 10);
+ static float coeffs_tab[(tab_size + 1) * 2];
+ static const double A = -0.75;
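+ // coeffs_tab[i * 2] samples the cubic convolution kernel on [0, 1] and
+ // coeffs_tab[i * 2 + 1] samples it on [1, 2]:
+ // W(x) = (A + 2)|x|^3 - (A + 3)|x|^2 + 1 for |x| <= 1
+ // W(x) = A|x|^3 - 5A|x|^2 + 8A|x| - 4A for 1 < |x| < 2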
+ for (int i = 0; i <= tab_size; ++i) {
+ float x = i * 1.0 / tab_size;
+ coeffs_tab[i * 2] = ((A + 2) * x - (A + 3)) * x * x + 1;
+ x += 1.0;
+ coeffs_tab[i * 2 + 1] = ((A * x - 5 * A) * x + 8 * A) * x - 4 * A;
+ }
+
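+ // Interpolates four consecutive samples v0..v3 at fractional offset dx in
+ // [0, 1] between v1 and v2, using weights looked up in coeffs_tab.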
+ auto cal = [](float v0, float v1, float v2, float v3, float dx) {
+ const int64 offset = round(dx * tab_size);
+ const float a0 = coeffs_tab[offset * 2 + 1];
+ const float a1 = coeffs_tab[offset * 2];
+ const float a2 = coeffs_tab[(tab_size - offset) * 2];
+ const float a3 = coeffs_tab[(tab_size - offset) * 2 + 1];
+ return a0 * v0 + a1 * v1 + a2 * v2 + a3 * v3;
+ };
+
+ float coeff[4] = {0.0};
+ for (int64 b = 0; b < batch_size; ++b) {
+ for (int64 y = 0; y < out_height; ++y) {
+ const int64 in_y = floor(height_scale * y);
+ const float dy = height_scale * y - in_y;
+ for (int64 x = 0; x < out_width; ++x) {
+ const int64 in_x = floor(width_scale * x);
+ const float dx = width_scale * x - in_x;
+ for (int64 c = 0; c < channels; ++c) {
+ for (int64 i = 0; i < 4; ++i) {
+#define BOUND(val, limit) std::min(((limit)-1ll), (std::max(0ll, (val))))
+ int64 bound_y = BOUND(in_y - 1 + i, in_height);
+ coeff[i] =
+ cal(input_data(b, bound_y, BOUND(in_x - 1, in_width), c),
+ input_data(b, bound_y, BOUND(in_x, in_width), c),
+ input_data(b, bound_y, BOUND(in_x + 1, in_width), c),
+ input_data(b, bound_y, BOUND(in_x + 2, in_width), c), dx);
+#undef BOUND
+ }
+ output_data(b, y, x, c) =
+ cal(coeff[0], coeff[1], coeff[2], coeff[3], dy);
+ }
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("ResizeBicubic") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("size"), \
+ ResizeBicubicOp<CPUDevice, T>);
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/resize_bilinear_op.cc b/tensorflow/core/kernels/resize_bilinear_op.cc
new file mode 100644
index 0000000000..5119b93508
--- /dev/null
+++ b/tensorflow/core/kernels/resize_bilinear_op.cc
@@ -0,0 +1,109 @@
+// See docs in ../ops/image_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename Device, typename T>
+class ResizeBilinearOp : public OpKernel {
+ public:
+ explicit ResizeBilinearOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ const Tensor& shape_t = context->input(1);
+ OP_REQUIRES(context, shape_t.dims() == 1,
+ errors::InvalidArgument("shape_t must be 1-dimensional",
+ shape_t.shape().ShortDebugString()));
+ OP_REQUIRES(context, shape_t.NumElements() == 2,
+ errors::InvalidArgument("shape_t must have two elements",
+ shape_t.shape().ShortDebugString()));
+
+ auto Svec = shape_t.vec<int32>();
+ // Initialize shape to the batch size of the input, then add
+ // the rest of the dimensions
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({input.dim_size(0), Svec(0),
+ Svec(1), input.dim_size(3)}),
+ &output));
+
+ const int64 batch_size = input.dim_size(0);
+ const int64 in_height = input.dim_size(1);
+ const int64 in_width = input.dim_size(2);
+ const int64 channels = input.dim_size(3);
+ const int64 out_height = output->dim_size(1);
+ const int64 out_width = output->dim_size(2);
+
+ typename TTypes<T, 4>::ConstTensor input_data = input.tensor<T, 4>();
+ typename TTypes<float, 4>::Tensor output_data = output->tensor<float, 4>();
+
+ const float height_scale = in_height / static_cast<float>(out_height);
+ const float width_scale = in_width / static_cast<float>(out_width);
+
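+ // Each output pixel maps back to a fractional source location; its value
+ // is the bilinear blend of the four nearest source pixels, weighted by the
+ // fractional offsets x_lerp and y_lerp.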
+ for (int b = 0; b < batch_size; ++b) {
+ for (int y = 0; y < out_height; ++y) {
+ const float in_y = y * height_scale;
+ const int top_y_index = static_cast<int>(floorf(in_y));
+ const int bottom_y_index =
+ std::min(static_cast<int64>(ceilf(in_y)), (in_height - 1));
+ const float y_lerp = in_y - top_y_index;
+ const float inverse_y_lerp = (1.0f - y_lerp);
+ for (int x = 0; x < out_width; ++x) {
+ const float in_x = x * width_scale;
+ const int left_x_index = static_cast<int>(floorf(in_x));
+ const int right_x_index =
+ std::min(static_cast<int64>(ceilf(in_x)), (in_width - 1));
+ const float x_lerp = in_x - left_x_index;
+ const float inverse_x_lerp = (1.0f - x_lerp);
+ for (int c = 0; c < channels; ++c) {
+ const float top_left = input_data(b, top_y_index, left_x_index, c);
+ const float top_right =
+ input_data(b, top_y_index, right_x_index, c);
+ const float bottom_left =
+ input_data(b, bottom_y_index, left_x_index, c);
+ const float bottom_right =
+ input_data(b, bottom_y_index, right_x_index, c);
+ const float top =
+ (top_left * inverse_x_lerp) + (top_right * x_lerp);
+ const float bottom =
+ (bottom_left * inverse_x_lerp) + (bottom_right * x_lerp);
+ output_data(b, y, x, c) =
+ (top * inverse_y_lerp) + (bottom * y_lerp);
+ }
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("ResizeBilinear") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("size"), \
+ ResizeBilinearOp<CPUDevice, T>);
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/resize_bilinear_op_test.cc b/tensorflow/core/kernels/resize_bilinear_op_test.cc
new file mode 100644
index 0000000000..0ebe2e5f8c
--- /dev/null
+++ b/tensorflow/core/kernels/resize_bilinear_op_test.cc
@@ -0,0 +1,171 @@
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+
+class ResizeBilinearOpTest : public OpsTestBase {
+ protected:
+ ResizeBilinearOpTest() {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("resize_bilinear_op", "ResizeBilinear")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_INT32))
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ }
+};
+
+TEST_F(ResizeBilinearOpTest, TestBilinear2x2To1x1) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {1, 1});
+ ASSERT_OK(RunOpKernel());
+
+ // When scaling down, we have to arbitrarily pick a pixel from the
+ // original input. In this case, we choose the top-left-most pixel.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 1, 1, 1}));
+ test::FillValues<float>(&expected, {1.0});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestBilinear2x2To3x3) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {3, 3});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 3, 3, 1}));
+
+ // The corners should match the original corners, and we bilinear
+ // interpolate the values in between.
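+ // For example, output (0, 1) maps to in_x = 1 * (2/3), so its value is
+ // 1 * (1/3) + 2 * (2/3) = 5/3.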
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 5.0/3, 2,
+ 7.0/3, 3, 10.0/3,
+ 3, 11.0/3, 4});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestBilinear3x3To4x4) {
+ // Input:
+ // 1, 2, 3,
+ // 4, 5, 6,
+ // 7, 8, 9
+ AddInputFromArray<float>(TensorShape({1, 3, 3, 1}),
+ {1, 2, 3, 4, 5, 6, 7, 8, 9});
+ AddInputFromArray<int32>(TensorShape({2}), {4, 4});
+ ASSERT_OK(RunOpKernel());
+
+ // The corners should match the original corners, and we bilinear
+ // interpolate the values in between.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 4, 4, 1}));
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1.75, 2.5, 3,
+ 3.25, 4, 4.75, 5.25,
+ 5.5, 6.25, 7, 7.5,
+ 7, 7.75, 8.5, 9});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestBilinear2x2To3x3Batch2) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ //
+ // repeated twice
+ AddInputFromArray<float>(TensorShape({2, 2, 2, 1}), {1, 2, 3, 4, 1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {3, 3});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3, 3, 1}));
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 5.0/3, 2, 7.0/3, 3, 10.0/3, 3, 11.0/3, 4,
+ 1, 5.0/3, 2, 7.0/3, 3, 10.0/3, 3, 11.0/3, 4
+ });
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestBilinear2x2x2To3x3x2) {
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 2}),
+ {1, -1, 2, -2, 3, -3, 4, -4});
+ AddInputFromArray<int32>(TensorShape({2}), {3, 3});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 3, 3, 2}));
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {
+ 1, -1,
+ 5.0/3, -5.0/3,
+ 2, -2,
+ 7.0/3, -7.0/3,
+ 3, -3,
+ 10.0/3, -10.0/3,
+ 3, -3,
+ 11.0/3, -11.0/3,
+ 4, -4
+ });
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestBilinear2x2To4x4) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {4, 4});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 4, 4, 1}));
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1.5, 2, 2,
+ 2, 2.5, 3, 3,
+ 3, 3.5, 4, 4,
+ 3, 3.5, 4, 4});
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeBilinearOpTest, TestInvalidInputShape) {
+ AddInputFromArray<float>(TensorShape({2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {4, 4});
+ ASSERT_FALSE(RunOpKernel().ok());
+}
+
+TEST_F(ResizeBilinearOpTest, TestInvalidSizeDim) {
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2, 1}), {4, 4});
+ ASSERT_FALSE(RunOpKernel().ok());
+}
+
+TEST_F(ResizeBilinearOpTest, TestInvalidSizeElements) {
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({3}), {4, 4, 1});
+ ASSERT_FALSE(RunOpKernel().ok());
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/resize_nearest_neighbor_op.cc b/tensorflow/core/kernels/resize_nearest_neighbor_op.cc
new file mode 100644
index 0000000000..13089308ce
--- /dev/null
+++ b/tensorflow/core/kernels/resize_nearest_neighbor_op.cc
@@ -0,0 +1,89 @@
+// See docs in ../ops/image_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename Device, typename T>
+class ResizeNearestNeighborOp : public OpKernel {
+ public:
+ explicit ResizeNearestNeighborOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, input.dims() == 4,
+ errors::InvalidArgument("input must be 4-dimensional",
+ input.shape().ShortDebugString()));
+ const Tensor& shape_t = context->input(1);
+ OP_REQUIRES(context, shape_t.dims() == 1,
+ errors::InvalidArgument("shape_t must be 1-dimensional",
+ shape_t.shape().ShortDebugString()));
+ OP_REQUIRES(context, shape_t.NumElements() == 2,
+ errors::InvalidArgument("shape_t must have two elements",
+ shape_t.shape().ShortDebugString()));
+
+ auto Svec = shape_t.vec<int32>();
+ // Initialize shape to the batch size of the input, then add
+ // the rest of the dimensions
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({input.dim_size(0), Svec(0),
+ Svec(1), input.dim_size(3)}),
+ &output));
+
+ const int64 batch_size = input.dim_size(0);
+ const int64 in_height = input.dim_size(1);
+ const int64 in_width = input.dim_size(2);
+ const int64 channels = input.dim_size(3);
+ const int64 out_height = output->dim_size(1);
+ const int64 out_width = output->dim_size(2);
+
+ typename TTypes<T, 4>::ConstTensor input_data = input.tensor<T, 4>();
+ typename TTypes<T, 4>::Tensor output_data = output->tensor<T, 4>();
+
+ const float height_scale = in_height / static_cast<float>(out_height);
+ const float width_scale = in_width / static_cast<float>(out_width);
+
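+ // Each output pixel copies the input pixel at the floor of its scaled
+ // coordinates, clamped to the valid input range.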
+ for (int b = 0; b < batch_size; ++b) {
+ for (int y = 0; y < out_height; ++y) {
+ const int in_y = std::min(static_cast<int64>(floorf(y * height_scale)),
+ (in_height - 1));
+ for (int x = 0; x < out_width; ++x) {
+ const int in_x = std::min(static_cast<int64>(floorf(x * width_scale)),
+ (in_width - 1));
+ for (int c = 0; c < channels; ++c) {
+ output_data(b, y, x, c) = input_data(b, in_y, in_x, c);
+ }
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("ResizeNearestNeighbor") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("size"), \
+ ResizeNearestNeighborOp<CPUDevice, T>);
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/resize_nearest_neighbor_op_test.cc b/tensorflow/core/kernels/resize_nearest_neighbor_op_test.cc
new file mode 100644
index 0000000000..8fca1f34e3
--- /dev/null
+++ b/tensorflow/core/kernels/resize_nearest_neighbor_op_test.cc
@@ -0,0 +1,163 @@
+// TODO(shlens, sherrym): Consider adding additional tests in image_ops.py in
+// order to compare against the reference implementation for image resizing in
+// the Python Imaging Library (PIL).
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+
+class ResizeNearestNeighborOpTest : public OpsTestBase {
+ protected:
+ ResizeNearestNeighborOpTest() {
+ RequireDefaultOps();
+ EXPECT_OK(NodeDefBuilder("resize_nn", "ResizeNearestNeighbor")
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_INT32))
+ .Finalize(node_def()));
+ EXPECT_OK(InitOp());
+ }
+};
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2To1x1) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {1, 1});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 1, 1, 1}));
+
+ // clang-format off
+ test::FillValues<float>(&expected, {1});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2To3x3) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {3, 3});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 3, 3, 1}));
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1, 2,
+ 1, 1, 2,
+ 3, 3, 4});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2To2x5) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {2, 5});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 2, 5, 1}));
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1, 1, 2, 2,
+ 3, 3, 3, 4, 4});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2To5x2) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {5, 2});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 5, 2, 1}));
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 2,
+ 1, 2,
+ 1, 2,
+ 3, 4,
+ 3, 4});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2To4x4) {
+ // Input:
+ // 1, 2
+ // 3, 4
+ AddInputFromArray<float>(TensorShape({1, 2, 2, 1}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({2}), {4, 4});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 4, 4, 1}));
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1, 2, 2,
+ 1, 1, 2, 2,
+ 3, 3, 4, 4,
+ 3, 3, 4, 4});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(ResizeNearestNeighborOpTest, TestNearest2x2x2x2To2x3x3x2) {
+ // Input:
+ // [ [ 1, 1 ], [ 2, 2],
+ // [ 3, 3 ], [ 4, 4] ],
+ // [ [ 5, 5 ], [ 6, 6],
+ // [ 7, 7 ], [ 8, 8] ]
+ AddInputFromArray<float>(TensorShape({2, 2, 2, 2}),
+ {1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8});
+ AddInputFromArray<int32>(TensorShape({2}), {3, 3});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3, 3, 2}));
+
+ // clang-format off
+ test::FillValues<float>(&expected,
+ {1, 1, 1,
+ 1, 2, 2,
+ 1, 1, 1,
+ 1, 2, 2,
+ 3, 3, 3,
+ 3, 4, 4,
+ 5, 5, 5,
+ 5, 6, 6,
+ 5, 5, 5,
+ 5, 6, 6,
+ 7, 7, 7,
+ 7, 8, 8});
+
+ // clang-format on
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/restore_op.cc b/tensorflow/core/kernels/restore_op.cc
new file mode 100644
index 0000000000..b52c69449c
--- /dev/null
+++ b/tensorflow/core/kernels/restore_op.cc
@@ -0,0 +1,65 @@
+// See docs in ../ops/io_ops.cc.
+#include "tensorflow/core/kernels/io.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/util/tensor_slice_reader.h"
+
+namespace tensorflow {
+
+class RestoreOp : public OpKernel {
+ public:
+ explicit RestoreOp(OpKernelConstruction* context) : OpKernel(context) {
+ int preferred_shard;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("preferred_shard", &preferred_shard));
+ if (preferred_shard == -1) {
+ preferred_shard_ = checkpoint::TensorSliceReader::kLoadAllShards;
+ } else {
+ OP_REQUIRES(context, preferred_shard >= 0,
+ errors::InvalidArgument("Attribute 'preferred_shard' must be "
+ "greater or equal to -1"));
+ preferred_shard_ = preferred_shard;
+ }
+ }
+ void Compute(OpKernelContext* context) override {
+ RestoreTensor(context, &checkpoint::OpenTableTensorSliceReader,
+ preferred_shard_, false);
+ }
+
+ private:
+ int preferred_shard_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("Restore").Device(DEVICE_CPU), RestoreOp);
+
+class RestoreSliceOp : public OpKernel {
+ public:
+ explicit RestoreSliceOp(OpKernelConstruction* context) : OpKernel(context) {
+ int preferred_shard;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("preferred_shard", &preferred_shard));
+ if (preferred_shard == -1) {
+ preferred_shard_ = checkpoint::TensorSliceReader::kLoadAllShards;
+ } else {
+ OP_REQUIRES(context, preferred_shard >= 0,
+ errors::InvalidArgument("Attribute 'preferred_shard' must be "
+ "greater or equal to -1"));
+ preferred_shard_ = preferred_shard;
+ }
+ }
+ void Compute(OpKernelContext* context) override {
+ RestoreTensor(context, &checkpoint::OpenTableTensorSliceReader,
+ preferred_shard_, true);
+ }
+
+ private:
+ int preferred_shard_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("RestoreSlice").Device(DEVICE_CPU),
+ RestoreSliceOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/restore_op_test.cc b/tensorflow/core/kernels/restore_op_test.cc
new file mode 100644
index 0000000000..59343a8037
--- /dev/null
+++ b/tensorflow/core/kernels/restore_op_test.cc
@@ -0,0 +1,305 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class RestoreOpTest : public OpsTestBase {
+ protected:
+ // Makes an operation to restore two tensors
+ void MakeRestoreOp(DataType dt) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "Restore")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Attr("dt", dt)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(RestoreOpTest, RestoreInt) {
+ const string filename = io::JoinPath(testing::TmpDir(), "tensor_int");
+ const string tensor_name = "tensor_int";
+
+ // We first need to write a tensor using the save_op
+ {
+ // Initialize an operation
+ NodeDef save;
+ ASSERT_OK(NodeDefBuilder("save", "Save")
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput({DT_INT32}))
+ .Finalize(&save));
+
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), save, &status));
+ EXPECT_OK(status);
+
+ // Run it
+
+ // Input #0 is the file name
+ Tensor input_0(DT_STRING, TensorShape({}));
+ input_0.scalar<string>()() = filename;
+ inputs.push_back({nullptr, &input_0});
+
+ // Input #1 is the tensor name
+ Tensor input_1(DT_STRING, TensorShape({}));
+ input_1.scalar<string>()() = tensor_name;
+ inputs.push_back({nullptr, &input_1});
+
+ // Input #2 is an integer tensor: it's a 1-d array.
+ Tensor input_2(DT_INT32, TensorShape({10}));
+ for (int i = 0; i < 10; ++i) {
+ input_2.flat<int32>()(i) = i + 1;
+ }
+ inputs.push_back({nullptr, &input_2});
+
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+ checkpoint::TensorSliceReaderCacheWrapper slice_reader_cache_wrapper;
+ params.slice_reader_cache = &slice_reader_cache_wrapper;
+
+ OpKernelContext ctx(params);
+ op->Compute(&ctx);
+ EXPECT_OK(ctx.status());
+ }
+
+ // Now we restore
+ MakeRestoreOp(DT_INT32);
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+ // Add the tensor names
+ AddInput<string>(TensorShape({}),
+ [&tensor_name](int x) -> string { return tensor_name; });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that we have an integer tensor
+ Tensor* output = GetOutput(0);
+ TensorShape expected({10});
+ EXPECT_TRUE(output->shape().IsSameSize(expected));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(i + 1, output->flat<int32>()(i));
+ }
+}
+
+TEST_F(RestoreOpTest, RestoreFloat) {
+ const string filename = io::JoinPath(testing::TmpDir(), "tensor_float");
+ const string tensor_name = "tensor_float";
+
+ // We first need to write a tensor using the save_op
+ {
+ // Initialize an operation
+ NodeDef save;
+ ASSERT_OK(NodeDefBuilder("save", "Save")
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput({DT_FLOAT}))
+ .Finalize(&save));
+
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+ gtl::InlinedVector<TensorValue, 4> inputs;
+
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), save, &status));
+ EXPECT_OK(status);
+
+ // Run it
+
+ // Input #0 is the file name
+ Tensor input_0(DT_STRING, TensorShape({}));
+ input_0.scalar<string>()() = filename;
+ inputs.push_back({nullptr, &input_0});
+
+ // Input #1 is the tensor name
+ Tensor input_1(DT_STRING, TensorShape({}));
+ input_1.scalar<string>()() = tensor_name;
+ inputs.push_back({nullptr, &input_1});
+
+ // Input #2 is a float tensor: it's a 2-d array.
+ Tensor input_2(DT_FLOAT, TensorShape({2, 4}));
+ for (int i = 0; i < 8; ++i) {
+ input_2.flat<float>()(i) = static_cast<float>(i) / 10;
+ }
+ inputs.push_back({nullptr, &input_2});
+
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+ checkpoint::TensorSliceReaderCacheWrapper slice_reader_cache_wrapper;
+ params.slice_reader_cache = &slice_reader_cache_wrapper;
+
+ OpKernelContext ctx(params);
+ op->Compute(&ctx);
+ EXPECT_OK(ctx.status());
+ }
+
+ // Now we restore
+ MakeRestoreOp(DT_FLOAT);
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+ // Add the tensor names
+ AddInput<string>(TensorShape({}),
+ [&tensor_name](int x) -> string { return tensor_name; });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that we have a float tensor.
+ Tensor* output = GetOutput(0);
+ TensorShape expected({2, 4});
+ EXPECT_TRUE(output->shape().IsSameSize(expected));
+ for (int i = 0; i < 8; ++i) {
+ EXPECT_EQ(static_cast<float>(i) / 10, output->flat<float>()(i));
+ }
+}
+
+class RestoreSliceOpTest : public OpsTestBase {
+ protected:
+ void MakeRestoreSliceOp(DataType dt) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "RestoreSlice")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Attr("dt", dt)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(RestoreSliceOpTest, RestoreInt) {
+ const string filename = io::JoinPath(testing::TmpDir(), "tensor_int");
+ const string tensor_name = "tensor_int";
+
+ // We first need to write a tensor using the save_op
+ {
+ // Initialize an operation
+ NodeDef save;
+ ASSERT_OK(NodeDefBuilder("save", "Save")
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput(DT_STRING))
+ .Input(FakeInput({DT_INT32}))
+ .Finalize(&save));
+
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), save, &status));
+ EXPECT_OK(status);
+
+ // Run it
+
+ // Input #0 is the file name
+ Tensor input_0(DT_STRING, TensorShape({}));
+ input_0.scalar<string>()() = filename;
+ inputs.push_back({nullptr, &input_0});
+
+ // Input #1 is the tensor name
+ Tensor input_1(DT_STRING, TensorShape({}));
+ input_1.scalar<string>()() = tensor_name;
+ inputs.push_back({nullptr, &input_1});
+
+ // Input #2 is a 4x16 integer tensor.
+ Tensor input_2(DT_INT32, TensorShape({4, 16}));
+ for (int64 i = 0; i < input_2.NumElements(); ++i) {
+ input_2.flat<int32>()(i) = i + 1;
+ }
+ inputs.push_back({nullptr, &input_2});
+
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+ checkpoint::TensorSliceReaderCacheWrapper slice_reader_cache_wrapper;
+ params.slice_reader_cache = &slice_reader_cache_wrapper;
+
+ OpKernelContext ctx(params);
+ op->Compute(&ctx);
+ EXPECT_OK(ctx.status());
+ }
+
+ // Now we restore
+ MakeRestoreSliceOp(DT_INT32);
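+ // Shape-and-slice spec: the saved tensor's full shape is 4 x 16; restore
+ // rows [0, 2) of dimension 0 and the full extent ("-") of dimension 1.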
+ string shape_and_slice = "4 16 0,2:-";
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+ // Add the tensor names
+ AddInput<string>(TensorShape({}),
+ [&tensor_name](int x) -> string { return tensor_name; });
+ // Add the tensor shape and slice
+ AddInput<string>(TensorShape({}), [&shape_and_slice](int x) -> string {
+ return shape_and_slice;
+ });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that we have an integer tensor
+ Tensor* output = GetOutput(0);
+ TensorShape expected({2, 16});
+ EXPECT_TRUE(output->shape().IsSameSize(expected));
+ for (int64 i = 0; i < expected.num_elements(); ++i) {
+ EXPECT_EQ(i + 1, output->flat<int32>()(i));
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reverse_op.cc b/tensorflow/core/kernels/reverse_op.cc
new file mode 100644
index 0000000000..c63dfc1e70
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_op.cc
@@ -0,0 +1,139 @@
+// See docs in ../ops/array_ops.cc
+#define EIGEN_USE_THREADS
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/reverse_op.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class ReverseOp : public OpKernel {
+ public:
+ explicit ReverseOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& dims = context->input(1);
+
+ if (TensorShapeUtils::IsScalar(input.shape())) {
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+ output->scalar<T>() = input.scalar<T>();
+
+ } else {
+ const int input_dims = input.dims();
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(dims.shape()),
+ errors::InvalidArgument("'dims' must be 1-dimension, not ",
+ dims.dims()));
+
+ OP_REQUIRES(context, input_dims == dims.dim_size(0),
+ errors::InvalidArgument(
+ "'dims' must have the same number of values as 'input' has "
+ "dimensions. 'input' has ", input_dims, "'dims' has ",
+ dims.dim_size(0), " values"));
+ OP_REQUIRES(context, input_dims <= 8, errors::Unimplemented(
+ "reverse is not implemented for tensors of rank > 8."));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+
+#define HANDLE_REVERSE(NDIMS) \
+ case NDIMS: \
+ functor::Reverse<Device, T, NDIMS>()( \
+ context->eigen_device<Device>(), input.tensor<T, NDIMS>(), \
+ dims.vec<bool>(), output->tensor<T, NDIMS>()); \
+ return;
+
+ switch (input_dims) {
+ HANDLE_REVERSE(0);
+ HANDLE_REVERSE(1);
+ HANDLE_REVERSE(2);
+ HANDLE_REVERSE(3);
+ HANDLE_REVERSE(4);
+ HANDLE_REVERSE(5);
+ HANDLE_REVERSE(6);
+ HANDLE_REVERSE(7);
+ HANDLE_REVERSE(8);
+ }
+#undef HANDLE_REVERSE
+ }
+ }
+};
+
+#define REGISTER_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("Reverse") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("dims"), \
+ ReverseOp<CPUDevice, T>)
+
+REGISTER_KERNEL(uint8);
+REGISTER_KERNEL(int8);
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(bool);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+#undef REGISTER_KERNEL
+
+#if GOOGLE_CUDA
+
+// Forward declarations of the function specializations for GPU (to prevent
+// building the GPU versions here, they will be built compiling _gpu.cu.cc).
+namespace functor {
+#define DECLARE_GPU_SPEC_DIM(T, DIM) \
+ template <> \
+ void Reverse<GPUDevice, T, DIM>::operator()( \
+ const GPUDevice& d, typename TTypes<T, DIM>::ConstTensor input, \
+ typename TTypes<bool, 1>::ConstTensor dims, \
+ typename TTypes<T, DIM>::Tensor output); \
+ extern template struct Reverse<GPUDevice, T, DIM>;
+#define DECLARE_GPU_SPEC(T) \
+ DECLARE_GPU_SPEC_DIM(T, 0) \
+ DECLARE_GPU_SPEC_DIM(T, 1) \
+ DECLARE_GPU_SPEC_DIM(T, 2) \
+ DECLARE_GPU_SPEC_DIM(T, 3) \
+ DECLARE_GPU_SPEC_DIM(T, 4) \
+ DECLARE_GPU_SPEC_DIM(T, 5) \
+ DECLARE_GPU_SPEC_DIM(T, 6) \
+ DECLARE_GPU_SPEC_DIM(T, 7) \
+ DECLARE_GPU_SPEC_DIM(T, 8)
+
+DECLARE_GPU_SPEC(uint8);
+DECLARE_GPU_SPEC(int8);
+DECLARE_GPU_SPEC(int32);
+DECLARE_GPU_SPEC(bool);
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+#undef DECLARE_GPU_SPEC_DIM
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("Reverse") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("dims"), \
+ ReverseOp<GPUDevice, T>)
+REGISTER_GPU_KERNEL(uint8);
+REGISTER_GPU_KERNEL(int8);
+REGISTER_GPU_KERNEL(float);
+REGISTER_GPU_KERNEL(double);
+#undef REGISTER_GPU_KERNEL
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reverse_op.h b/tensorflow/core/kernels/reverse_op.h
new file mode 100644
index 0000000000..bba25f70e8
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_op.h
@@ -0,0 +1,28 @@
+#ifndef TENSORFLOW_KERNELS_REVERSE_OP_H_
+#define TENSORFLOW_KERNELS_REVERSE_OP_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by ReverseOp to do the computations.
+template <typename Device, typename T, int Dims>
+struct Reverse {
+ void operator()(const Device& d, typename TTypes<T, Dims>::ConstTensor input,
+ typename TTypes<bool, 1>::ConstTensor dims,
+ typename TTypes<T, Dims>::Tensor output) {
+ // 'dims' (the reverse flags) lives in host memory
+ Eigen::array<bool, Dims> reverse_dims;
+ for (int i = 0; i < Dims; ++i) {
+ reverse_dims[i] = dims(i);
+ }
+ output.device(d) = input.reverse(reverse_dims);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif  // TENSORFLOW_KERNELS_REVERSE_OP_H_
diff --git a/tensorflow/core/kernels/reverse_op_gpu.cu.cc b/tensorflow/core/kernels/reverse_op_gpu.cu.cc
new file mode 100644
index 0000000000..b510add3f3
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_op_gpu.cu.cc
@@ -0,0 +1,33 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/reverse_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_REVERSE(DIM) \
+ template struct functor::Reverse<GPUDevice, uint8, DIM>; \
+ template struct functor::Reverse<GPUDevice, int8, DIM>; \
+ template struct functor::Reverse<GPUDevice, int32, DIM>; \
+ template struct functor::Reverse<GPUDevice, bool, DIM>; \
+ template struct functor::Reverse<GPUDevice, float, DIM>; \
+ template struct functor::Reverse<GPUDevice, double, DIM>;
+DEFINE_REVERSE(0)
+DEFINE_REVERSE(1)
+DEFINE_REVERSE(2)
+DEFINE_REVERSE(3)
+DEFINE_REVERSE(4)
+DEFINE_REVERSE(5)
+DEFINE_REVERSE(6)
+DEFINE_REVERSE(7)
+DEFINE_REVERSE(8)
+#undef DEFINE_REVERSE
+
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/reverse_op_test.cc b/tensorflow/core/kernels/reverse_op_test.cc
new file mode 100644
index 0000000000..d41c36e693
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_op_test.cc
@@ -0,0 +1,101 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class ReverseOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType data_type) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "Reverse")
+ .Input(FakeInput(data_type))
+ .Input(FakeInput())
+ .Attr("T", data_type)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(ReverseOpTest, Reverse_0) {
+ MakeOp(DT_FLOAT);
+ AddInputFromArray<float>(TensorShape({}), {3});
+ AddInputFromArray<bool>(TensorShape({}), {true});
+ ASSERT_OK(RunOpKernel());
+
+ Tensor* output = GetOutput(0);
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({}));
+ expected.scalar<float>() = expected.scalar<float>().constant(3.f);
+ test::ExpectTensorEqual<float>(expected, *output);
+}
+
+TEST_F(ReverseOpTest, Reverse_234) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ // [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+ // [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]
+ AddInputFromArray<float>(TensorShape({2, 3, 4}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23});
+ AddInputFromArray<bool>(TensorShape({3}), {true, false, true});
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor* params_tensor = GetOutput(0);
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 3, 4}));
+ // Should become
+ // [[[15, 14, 13, 12], [19, 18, 17, 16], [23, 22, 21, 20]]
+ // [[3, 2, 1, 0], [7, 6, 5, 4], [11, 10, 9, 8]]]
+ test::FillValues<float>(
+ &expected, {15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 3, 2, 1, 0, 7,
+ 6, 5, 4, 11, 10, 9, 8});
+ test::ExpectTensorEqual<float>(expected, *params_tensor);
+}
+
+TEST_F(ReverseOpTest, Reverse_1234) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ // [[[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
+ // [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]]
+ AddInputFromArray<float>(TensorShape({1, 2, 3, 4}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23});
+ AddInputFromArray<bool>(TensorShape({4}), {true, true, false, true});
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor* params_tensor = GetOutput(0);
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 2, 3, 4}));
+ // Should become
+ // [[[[15, 14, 13, 12], [19, 18, 17, 16], [23, 22, 21, 20]]
+ // [[3, 2, 1, 0], [7, 6, 5, 4], [11, 10, 9, 8]]]]
+ test::FillValues<float>(
+ &expected, {15, 14, 13, 12, 19, 18, 17, 16, 23, 22, 21, 20, 3, 2, 1, 0, 7,
+ 6, 5, 4, 11, 10, 9, 8});
+ test::ExpectTensorEqual<float>(expected, *params_tensor);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reverse_sequence_op.cc b/tensorflow/core/kernels/reverse_sequence_op.cc
new file mode 100644
index 0000000000..6673a700ef
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_sequence_op.cc
@@ -0,0 +1,170 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#if GOOGLE_CUDA
+#define EIGEN_USE_GPU
+#endif // GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/reverse_sequence_op.h"
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device>
+void CheckErrors(OpKernelContext* context, int seq_dim) {
+ const Tensor& input = context->input(0);
+ const Tensor& seq_lens = context->input(1);
+
+ auto seq_lens_t = seq_lens.vec<int64>();
+
+ std::vector<int64> seq_lens_vec(seq_lens_t.size());
+
+ // Copy seq_len info down for validity checks
+ context->eigen_device<Device>().memcpyDeviceToHost(
+ seq_lens_vec.data(), seq_lens_t.data(),
+ sizeof(int64) * seq_lens_t.size());
+
+ OP_REQUIRES(context, 0 != seq_dim, errors::InvalidArgument("0 == seq_dim"));
+ OP_REQUIRES(context, seq_dim < input.dims(),
+ errors::InvalidArgument("seq_dim must be < input.dims()", "( ",
+ seq_dim, " vs. ", input.dims(), ")"));
+
+ OP_REQUIRES(context, seq_lens.NumElements() == input.dim_size(0),
+ errors::InvalidArgument("len(seq_lens) != input.dims(", 0, "), ",
+ "(", seq_lens.NumElements(), " vs. ",
+ input.dim_size(seq_dim)));
+
+ for (int d = 0; d < seq_lens_vec.size(); ++d) {
+ OP_REQUIRES(context, seq_lens_vec[d] >= 0,
+ errors::InvalidArgument("seq_lens(", d, ") < 0"));
+ OP_REQUIRES(context, seq_lens_vec[d] <= input.dim_size(seq_dim),
+ errors::InvalidArgument("seq_lens(", d, ") > input.dims(",
+ seq_dim, ")"));
+ }
+}
+
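+// The GPU specialization only performs the shape checks; the per-element
+// bounds checks on seq_lens from the generic version are skipped because
+// seq_lens lives in device memory.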
+template <>
+void CheckErrors<GPUDevice>(OpKernelContext* context, int seq_dim) {
+ const Tensor& input = context->input(0);
+ const Tensor& seq_lens = context->input(1);
+
+ OP_REQUIRES(context, 0 != seq_dim, errors::InvalidArgument("0 == seq_dim"));
+ OP_REQUIRES(context, seq_dim < input.dims(),
+ errors::InvalidArgument("seq_dim must be < input.dims()", "( ",
+ seq_dim, " vs. ", input.dims(), ")"));
+
+ OP_REQUIRES(context, seq_lens.NumElements() == input.dim_size(0),
+ errors::InvalidArgument("len(seq_lens) != input.dims(", 0, "), ",
+ "(", seq_lens.NumElements(), " vs. ",
+ input.dim_size(seq_dim)));
+}
+
+template <typename Device, typename T>
+class ReverseSequenceOp : public OpKernel {
+ public:
+ explicit ReverseSequenceOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("seq_dim", &seq_dim_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& seq_lens = context->input(1);
+
+ // Preliminary validation of sizes.
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(seq_lens.shape()),
+ errors::InvalidArgument("seq_lens input must be 1-dim, not ",
+ seq_lens.dims()));
+
+ auto seq_lens_t = seq_lens.vec<int64>();
+
+ CheckErrors<Device>(context, seq_dim_);
+
+ const int input_dims = input.dims();
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+
+#define HANDLE_DIM(NDIM) \
+ case NDIM: \
+ functor::ReverseSequence<Device, T, NDIM>::Compute( \
+ context->eigen_device<Device>(), input.tensor<T, NDIM>(), seq_dim_, \
+ seq_lens_t, output->tensor<T, NDIM>()); \
+ break;
+
+ switch (input_dims) {
+ HANDLE_DIM(2);
+ HANDLE_DIM(3);
+ HANDLE_DIM(4);
+ HANDLE_DIM(5);
+
+ default:
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument(
+ "ReverseSequenceOp : Unhandled input dimensions: ",
+ input_dims));
+ }
+ }
+
+ private:
+ int32 seq_dim_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ReverseSequenceOp);
+};
+
+#define REGISTER_REVERSE_SEQUENCE(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ReverseSequence").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ReverseSequenceOp<CPUDevice, type>);
+
+TF_CALL_NUMBER_TYPES(REGISTER_REVERSE_SEQUENCE);
+
+#if GOOGLE_CUDA
+
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T, Dims) \
+ template <> \
+ void ReverseSequence<GPUDevice, T, Dims>::Compute( \
+ const GPUDevice& d, typename TTypes<T, Dims>::ConstTensor input, \
+ int32 seq_dim, TTypes<int64>::ConstVec seq_lens, \
+ typename TTypes<T, Dims>::Tensor output); \
+ extern template struct ReverseSequence<GPUDevice, T, Dims>;
+
+#define DECLARE_GPU_SPECS(T) \
+ DECLARE_GPU_SPEC(T, 2); \
+ DECLARE_GPU_SPEC(T, 3); \
+ DECLARE_GPU_SPEC(T, 4); \
+ DECLARE_GPU_SPEC(T, 5);
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPECS);
+
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_REVERSE_SEQUENCE_GPU(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ReverseSequence").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ ReverseSequenceOp<GPUDevice, type>);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_REVERSE_SEQUENCE_GPU);
+
+#undef REGISTER_REVERSE_SEQUENCE_GPU
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/reverse_sequence_op.h b/tensorflow/core/kernels/reverse_sequence_op.h
new file mode 100644
index 0000000000..d1dd572dcb
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_sequence_op.h
@@ -0,0 +1,56 @@
+#ifndef TENSORFLOW_KERNELS_REVERSE_SEQUENCE_OP_H_
+#define TENSORFLOW_KERNELS_REVERSE_SEQUENCE_OP_H_
+// Generator definition for ReverseSequenceOp, must be compilable by nvcc.
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace generator {
+
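+// For batch entry b = coords[0], emits the first seq_lengths(b) elements
+// along seq_dim in reverse order and passes the remaining elements through
+// unchanged.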
+template <typename T, size_t Dims>
+class ReverseGenerator {
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+ ReverseGenerator(typename TTypes<T, Dims>::ConstTensor input, int32 seq_dim,
+ TTypes<int64>::ConstVec seq_lengths)
+ : input_(input), seq_dim_(seq_dim), seq_lengths_(seq_lengths) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE T
+ operator()(const Eigen::array<Eigen::DenseIndex, Dims>& coords) const {
+ Eigen::array<Eigen::DenseIndex, Dims> new_coords = coords;
+ if (coords[seq_dim_] < seq_lengths_(coords[0])) {
+ new_coords[seq_dim_] = seq_lengths_(coords[0]) - coords[seq_dim_] - 1;
+ }
+
+ return input_(new_coords);
+ }
+
+ private:
+ typename TTypes<T, Dims>::ConstTensor input_;
+ int32 seq_dim_;
+ TTypes<int64>::ConstVec seq_lengths_;
+};
+
+} // namespace generator
+
+namespace functor {
+
+template <typename Device, typename T, size_t Dims>
+struct ReverseSequence {
+ EIGEN_ALWAYS_INLINE static void Compute(
+ const Device& d, typename TTypes<T, Dims>::ConstTensor input,
+ int32 seq_dim, TTypes<int64>::ConstVec seq_lengths,
+ typename TTypes<T, Dims>::Tensor output) {
+ generator::ReverseGenerator<T, Dims> generator(input, seq_dim, seq_lengths);
+ output.device(d) = input.generate(generator);
+ }
+};
+
+} // namespace functor
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_REVERSE_SEQUENCE_OP_H_
diff --git a/tensorflow/core/kernels/reverse_sequence_op_gpu.cu.cc b/tensorflow/core/kernels/reverse_sequence_op_gpu.cu.cc
new file mode 100644
index 0000000000..7b5d533026
--- /dev/null
+++ b/tensorflow/core/kernels/reverse_sequence_op_gpu.cu.cc
@@ -0,0 +1,26 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/reverse_sequence_op.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_SPEC(T, dims) \
+ template class generator::ReverseGenerator<T, dims>; \
+ template struct functor::ReverseSequence<GPUDevice, T, dims>;
+
+#define DEFINE_GPU_SPECS(T) \
+ DEFINE_GPU_SPEC(T, 2); \
+ DEFINE_GPU_SPEC(T, 3); \
+ DEFINE_GPU_SPEC(T, 4); \
+ DEFINE_GPU_SPEC(T, 5);
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/save_op.cc b/tensorflow/core/kernels/save_op.cc
new file mode 100644
index 0000000000..71a15c643e
--- /dev/null
+++ b/tensorflow/core/kernels/save_op.cc
@@ -0,0 +1,81 @@
+// See docs in ../ops/io_ops.cc
+#include "tensorflow/core/kernels/io.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/util/tensor_slice_writer.h"
+
+namespace tensorflow {
+
+class SaveOp : public OpKernel {
+ public:
+ explicit SaveOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ SaveTensors(context, &checkpoint::CreateTableTensorSliceBuilder, false);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("Save").Device(DEVICE_CPU), SaveOp);
+
+class SaveSlicesOp : public OpKernel {
+ public:
+ explicit SaveSlicesOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ SaveTensors(context, &checkpoint::CreateTableTensorSliceBuilder, true);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("SaveSlices").Device(DEVICE_CPU), SaveSlicesOp);
+
+class ShardedFilenameOp : public OpKernel {
+ public:
+ explicit ShardedFilenameOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ static const char* input_names[3] = {"basename", "shard", "num_shards"};
+ for (int i = 0; i < ctx->num_inputs(); ++i) {
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(ctx->input(i).shape()),
+ errors::InvalidArgument(
+ input_names[i], " must be a scalar, got shape ",
+ ctx->input(i).shape().ShortDebugString()));
+ }
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &out));
+ out->scalar<string>()() = strings::Printf(
+ "%s-%05d-of-%05d", ctx->input(0).scalar<string>()().c_str(),
+ ctx->input(1).scalar<int32>()(), ctx->input(2).scalar<int32>()());
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ShardedFilename").Device(DEVICE_CPU),
+ ShardedFilenameOp);
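+// For example (values illustrative): basename "/tmp/ckpt", shard 2 and
+// num_shards 16 produce "/tmp/ckpt-00002-of-00016"; the corresponding
+// ShardedFilespec pattern below would be "/tmp/ckpt-?????-of-00016".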
+
+class ShardedFilespecOp : public OpKernel {
+ public:
+ explicit ShardedFilespecOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ static const char* input_names[2] = {"basename", "num_shards"};
+ for (int i = 0; i < ctx->num_inputs(); ++i) {
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(ctx->input(i).shape()),
+ errors::InvalidArgument(
+ input_names[i], " must be a scalar, got shape ",
+ ctx->input(i).shape().ShortDebugString()));
+ }
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &out));
+ out->scalar<string>()() = strings::Printf(
+ "%s-\?\?\?\?\?-of-%05d", ctx->input(0).scalar<string>()().c_str(),
+ ctx->input(1).scalar<int32>()());
+ }
+};
+REGISTER_KERNEL_BUILDER(Name("ShardedFilespec").Device(DEVICE_CPU),
+ ShardedFilespecOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/save_op_test.cc b/tensorflow/core/kernels/save_op_test.cc
new file mode 100644
index 0000000000..ee1ba492a6
--- /dev/null
+++ b/tensorflow/core/kernels/save_op_test.cc
@@ -0,0 +1,443 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/tensor_slice_reader.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+class SaveOpTest : public OpsTestBase {
+ protected:
+ void MakeOp() {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "Save")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput(
+ {DT_INT32, DT_FLOAT, DT_DOUBLE, DT_QINT8, DT_QINT32}))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SaveOpTest, Simple) {
+ const string filename = io::JoinPath(testing::TmpDir(), "tensor_simple");
+ const string tensornames[] = {"tensor_int", "tensor_float", "tensor_double",
+ "tensor_qint8", "tensor_qint32"};
+
+ MakeOp();
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+
+ // Add the tensor names
+ AddInput<string>(TensorShape({5}),
+ [&tensornames](int x) -> string { return tensornames[x]; });
+
+ // Add a 1-d integer tensor
+ AddInput<int32>(TensorShape({10}), [](int x) -> int32 { return x + 1; });
+
+ // Add a 2-d float tensor
+ AddInput<float>(TensorShape({2, 4}),
+ [](int x) -> float { return static_cast<float>(x) / 10; });
+
+ // Add a 2-d double tensor
+ AddInput<double>(TensorShape({2, 4}),
+ [](int x) -> double { return static_cast<double>(x) / 20; });
+
+ // Add a 2-d qint8 tensor
+ AddInput<qint8>(TensorShape({3, 2}),
+ [](int x) -> qint8 { return *reinterpret_cast<qint8*>(&x); });
+
+ // Add a 2-d qint32 tensor
+ AddInput<qint32>(TensorShape({2, 3}), [](int x) -> qint32 {
+ return *reinterpret_cast<qint32*>(&x) * qint8(2);
+ });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that the checkpoint file is properly written
+ checkpoint::TensorSliceReader reader(filename,
+ checkpoint::OpenTableTensorSliceReader);
+ EXPECT_OK(reader.status());
+
+ // We expect to find all saved tensors
+ {
+ // The 1-d integer tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_int", &shape, &type));
+ TensorShape expected({10});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_INT32, type);
+
+ // We expect the tensor value to be correct.
+ TensorSlice s = TensorSlice::ParseOrDie("-");
+ int data[10];
+ std::fill_n(data, 10, 0);
+ EXPECT_TRUE(reader.CopySliceData("tensor_int", s, data));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(i + 1, data[i]);
+ }
+ }
+
+ {
+ // The 2-d float tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_float", &shape, &type));
+ TensorShape expected({2, 4});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_FLOAT, type);
+
+ // We expect the tensor value to be correct.
+ TensorSlice s = TensorSlice::ParseOrDie("-:-");
+ float data[8];
+ std::fill_n(data, 8, 0);
+ EXPECT_TRUE(reader.CopySliceData("tensor_float", s, data));
+ for (int i = 0; i < 8; ++i) {
+ EXPECT_EQ(static_cast<float>(i) / 10, data[i]);
+ }
+ }
+
+ {
+ // The 2-d double tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_double", &shape, &type));
+ TensorShape expected({2, 4});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_DOUBLE, type);
+
+ // We expect the tensor value to be correct.
+ TensorSlice s = TensorSlice::ParseOrDie("-:-");
+ double data[8];
+ std::fill_n(data, 8, 0);
+ EXPECT_TRUE(reader.CopySliceData("tensor_double", s, data));
+ for (int i = 0; i < 8; ++i) {
+ EXPECT_EQ(static_cast<double>(i) / 20, data[i]);
+ }
+ }
+
+ {
+ // The 2-d qint8 tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_qint8", &shape, &type));
+ TensorShape expected({3, 2});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_QINT8, type);
+
+ // We expect the tensor value to be correct.
+ TensorSlice s = TensorSlice::ParseOrDie("-:-");
+ qint8 data[6];
+ EXPECT_TRUE(reader.CopySliceData("tensor_qint8", s, data));
+ for (int i = 0; i < 6; ++i) {
+ EXPECT_EQ(*reinterpret_cast<qint8*>(&i), data[i]);
+ }
+ }
+
+ {
+ // The 2-d qint32 tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_qint32", &shape, &type));
+ TensorShape expected({2, 3});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_QINT32, type);
+
+ // We expect the tensor value to be correct.
+ TensorSlice s = TensorSlice::ParseOrDie("-:-");
+ qint32 data[6];
+ EXPECT_TRUE(reader.CopySliceData("tensor_qint32", s, data));
+ for (int i = 0; i < 6; ++i) {
+ EXPECT_EQ(*reinterpret_cast<qint32*>(&i) * qint8(2), data[i]);
+ }
+ }
+}
+
+class SaveSlicesOpTest : public OpsTestBase {
+ protected:
+ void MakeOp() {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "SaveSlices")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput(
+ {DT_INT32, DT_FLOAT, DT_DOUBLE, DT_QINT8, DT_QINT32}))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+// Here we save only slices. We restore them in a larger tensor and we check
+// that the right slice is restored. It is quite tricky to check that the
+// right slices are actually restored, so instead we just check that
+// CopySliceData() returns true/false depending on the slice we ask for.
+TEST_F(SaveSlicesOpTest, Slices) {
+ const string filename = io::JoinPath(testing::TmpDir(), "tensor_slices");
+ const string tensornames[] = {"tensor_int", "tensor_float", "tensor_double",
+ "tensor_qint8", "tensor_qint32"};
+ // Specifies that the data we save are slices of larger tensors.
+ // See core/framework/tensor_slice.h for the slice syntax.
+ const string tensorshapes[] = {
+ "10 -", // Full contents of a 10 element vector.
+ "2 4 -:0,2", // A 2x2 slice of a 2x4 tensor.
+ "2 4 0,1:2,2", // A 1x2 slice of a 2x4 tensor.
+ "3 2 -:-", // Full contents of a 3x2 tensor.
+      "2 3 1,1:2,1"   // A 1x1 slice of a 2x3 tensor.
+ };
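+  // Each entry is "<shape dims> <slice>": the slice gives one "start,length"
+  // range (or "-" for the whole dimension) per dimension, separated by ":".
+  // E.g. "2 4 -:0,2" keeps columns 0 and 1 of a 2x4 tensor.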
+
+ MakeOp();
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+
+ // Add the tensor names
+ AddInput<string>(TensorShape({5}),
+ [&tensornames](int x) -> string { return tensornames[x]; });
+
+ // Add the tensor shapes and slices
+ AddInput<string>(TensorShape({5}), [&tensorshapes](int x) -> string {
+ return tensorshapes[x];
+ });
+
+ // Add a 1-d integer tensor
+ AddInput<int32>(TensorShape({10}), [](int x) -> int32 { return x + 1; });
+
+ // Add a 2-d float tensor
+ AddInput<float>(TensorShape({2, 2}),
+ [](int x) -> float { return static_cast<float>(x) / 10; });
+
+ // Add a 2-d double tensor
+ AddInput<double>(TensorShape({1, 2}),
+ [](int x) -> double { return static_cast<double>(x) / 20; });
+
+ // Add a 2-d qint8 tensor
+ AddInput<qint8>(TensorShape({3, 2}),
+ [](int x) -> qint8 { return *reinterpret_cast<qint8*>(&x); });
+
+ // Add a 2-d qint32 tensor
+ AddInput<qint32>(TensorShape({1, 1}), [](int x) -> qint32 {
+ return *reinterpret_cast<qint32*>(&x) * qint8(2);
+ });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that the checkpoint file is properly written
+ checkpoint::TensorSliceReader reader(filename,
+ checkpoint::OpenTableTensorSliceReader);
+ EXPECT_OK(reader.status());
+
+ // We expect to find all saved tensors
+ {
+ // The 1-d integer tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_int", &shape, &type));
+ TensorShape expected({10});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_INT32, type);
+
+ // We saved the full tensor so we should be able to read it all.
+ TensorSlice s = TensorSlice::ParseOrDie("-");
+ int data[10];
+ EXPECT_TRUE(reader.CopySliceData("tensor_int", s, data));
+ }
+
+ {
+ // The 2-d float tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_float", &shape, &type));
+ TensorShape expected({2, 4});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_FLOAT, type);
+
+ // We saved the slice "-:0,2" so we should not be able to read the full
+ // tensor.
+ TensorSlice full_slice = TensorSlice::ParseOrDie("-:-");
+ TensorSlice saved_slice = TensorSlice::ParseOrDie("-:0,2");
+ float data[8];
+ EXPECT_FALSE(reader.CopySliceData("tensor_float", full_slice, data));
+ EXPECT_TRUE(reader.CopySliceData("tensor_float", saved_slice, data));
+ }
+
+ {
+ // The 2-d double tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_double", &shape, &type));
+ TensorShape expected({2, 4});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_DOUBLE, type);
+
+ // We saved the slice "0,1:2,2" so we should not be able to read the full
+ // tensor.
+ TensorSlice full_slice = TensorSlice::ParseOrDie("-:-");
+ TensorSlice saved_slice = TensorSlice::ParseOrDie("0,1:2,2");
+ double data[8];
+ EXPECT_FALSE(reader.CopySliceData("tensor_double", full_slice, data));
+ EXPECT_TRUE(reader.CopySliceData("tensor_double", saved_slice, data));
+ }
+
+ {
+ // The 2-d qint8 tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_qint8", &shape, &type));
+ TensorShape expected({3, 2});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_QINT8, type);
+
+ // We saved the full slice.
+ TensorSlice s = TensorSlice::ParseOrDie("-:-");
+ qint8 data[6];
+ EXPECT_TRUE(reader.CopySliceData("tensor_qint8", s, data));
+ }
+
+ {
+ // The 2-d qint32 tensor
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("tensor_qint32", &shape, &type));
+ TensorShape expected({2, 3});
+ EXPECT_TRUE(shape.IsSameSize(expected));
+ EXPECT_EQ(DT_QINT32, type);
+
+    // We saved the slice "1,1:2,1" so we should not be able to read the full
+    // tensor.
+ TensorSlice full_slice = TensorSlice::ParseOrDie("-:-");
+ TensorSlice saved_slice = TensorSlice::ParseOrDie("1,1:2,1");
+ qint32 data[6];
+ EXPECT_FALSE(reader.CopySliceData("tensor_qint32", full_slice, data));
+ EXPECT_TRUE(reader.CopySliceData("tensor_qint32", saved_slice, data));
+ }
+}
+
+class SaveOpSlices2Test : public OpsTestBase {
+ protected:
+ void MakeOp() {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "SaveSlices")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Input(FakeInput({DT_INT32, DT_INT32, DT_FLOAT}))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SaveOpSlices2Test, TwoSlices) {
+ const string filename = io::JoinPath(testing::TmpDir(), "three_slices");
+ // We will save 2 slices of the tensor named "four_by_sixteen" which is 4x16,
+ // and one slice of the "small" tensor.
+ const string tensornames[] = {"four_by_sixteen", "four_by_sixteen", "small"};
+ const string tensorshapes[] = {
+ // Slice specifications for the 2 slices of "four_by_sixteen"
+ "4 16 0,2:-", // 1st slice covers indices 0 and 1 in the first dim.
+ "4 16 2,2:-", // 2nd slice covers indices 2 and 3 in the first dim.
+      ""              // We save the full "small" tensor.
+ };
+
+ MakeOp();
+ // Add a file name
+ AddInput<string>(TensorShape({}),
+ [&filename](int x) -> string { return filename; });
+
+ // Add the tensor names
+ AddInput<string>(TensorShape({3}),
+ [&tensornames](int x) -> string { return tensornames[x]; });
+
+ // Add the tensor shapes and slices
+ AddInput<string>(TensorShape({3}), [&tensorshapes](int x) -> string {
+ return tensorshapes[x];
+ });
+
+ // Add an integer tensor for slice 0,2:- of a 4x16 tensor: It is 2x16.
+ AddInput<int32>(TensorShape({2, 16}), [](int x) -> int32 { return x + 1; });
+
+ // Add an integer tensor for slice 2,2:- of a 4x16 tensor: It is 2x16.
+ AddInput<int32>(TensorShape({2, 16}),
+ [](int x) -> int32 { return 10 * (x + 1); });
+
+ // Add a float tensor for "small"
+ AddInput<float>(TensorShape({2, 4}),
+ [](int x) -> float { return static_cast<float>(x) / 10; });
+
+ ASSERT_OK(RunOpKernel());
+
+ // Check that the checkpoint file is properly written
+ checkpoint::TensorSliceReader reader(filename,
+ checkpoint::OpenTableTensorSliceReader);
+ EXPECT_OK(reader.status());
+
+ {
+ // Reload the two slices of "four_by_sixteen" into that tensor.
+ Tensor reloaded(DT_INT32, {4, 16});
+
+ // We expect to find all slices
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("four_by_sixteen", &shape, &type));
+ EXPECT_TRUE(shape.IsSameSize(reloaded.shape()));
+ EXPECT_EQ(type, reloaded.dtype());
+
+ // Reload the whole tensor.
+ EXPECT_TRUE(reader.CopySliceData("four_by_sixteen",
+ TensorSlice(reloaded.dims()),
+ reloaded.flat<int>().data()));
+
+ {
+ auto slice = reloaded.Slice(0, 2).flat<int>();
+ for (int i = 0; i < slice.size(); ++i) {
+ EXPECT_EQ(i + 1, slice(i));
+ }
+ }
+ {
+ auto slice = reloaded.Slice(2, 4).flat<int>();
+ for (int i = 0; i < slice.size(); ++i) {
+ EXPECT_EQ(10 * (i + 1), slice(i));
+ }
+ }
+ }
+
+ {
+ // Reload the small float tensor.
+ Tensor reloaded(DT_FLOAT, {2, 4});
+
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("small", &shape, &type));
+ EXPECT_TRUE(shape.IsSameSize(reloaded.shape()));
+ EXPECT_EQ(DT_FLOAT, reloaded.dtype());
+
+ EXPECT_TRUE(reader.CopySliceData("small", TensorSlice(reloaded.dims()),
+ reloaded.flat<float>().data()));
+
+ for (int64 i = 0; i < reloaded.NumElements(); ++i) {
+ EXPECT_EQ(static_cast<float>(i) / 10, reloaded.flat<float>().data()[i]);
+ }
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/scatter_op.cc b/tensorflow/core/kernels/scatter_op.cc
new file mode 100644
index 0000000000..88fcc1bdcc
--- /dev/null
+++ b/tensorflow/core/kernels/scatter_op.cc
@@ -0,0 +1,167 @@
+// See docs in ../ops/state_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+enum class UpdateOp { ASSIGN, ADD, SUB };
+
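+// For each i, applies params[indices[i], ...] op= updates[i, ...]. Updates
+// are applied in order, so with ASSIGN a duplicated index keeps the last
+// update written to it, while with ADD/SUB duplicate contributions
+// accumulate.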
+template <class T, typename Index, UpdateOp op>
+class ScatterUpdateOp : public OpKernel {
+ public:
+ // QUESTION: It'd be nice to support DT_INT16, DT_UINT8,
+ // etc. here. Should we have the framework do some sort of
+ // integer promotion automatically, or should that be something
+ // that users have to do explicitly with a conversion operator
+ // in the graph?
+ explicit ScatterUpdateOp(OpKernelConstruction* c) : OpKernel(c) {
+ OP_REQUIRES_OK(c, c->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* c) override {
+ if (use_exclusive_lock_) {
+ // Hold mutex while we apply updates
+ mutex_lock l(*c->input_ref_mutex(0));
+ DoCompute(c);
+ } else {
+ DoCompute(c);
+ }
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ // Check whether updates.shape = indices.shape + params.shape[1:]
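+  // e.g. params of shape [5, 3] and indices of shape [2] require updates of
+  // shape [2, 3].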
+ static bool ValidShapes(const Tensor& params, const Tensor& updates,
+ const Tensor& indices) {
+ if (updates.dims() != indices.dims() + params.dims() - 1) return false;
+ for (int d = 0; d < indices.dims(); d++) {
+ if (updates.dim_size(d) != indices.dim_size(d)) {
+ return false;
+ }
+ }
+ for (int d = 1; d < params.dims(); d++) {
+ if (params.dim_size(d) != updates.dim_size(d - 1 + indices.dims())) {
+ return false;
+ }
+ }
+ return true;
+ }
+
+ void DoCompute(OpKernelContext* c) {
+ Tensor Tparams = c->mutable_input(0, use_exclusive_lock_);
+ OP_REQUIRES(c, Tparams.IsInitialized(),
+ errors::FailedPrecondition("Null ref for params"));
+ const Tensor& Tindices = c->input(1);
+ const Tensor& Tupdates = c->input(2);
+ OP_REQUIRES(
+ c, TensorShapeUtils::IsVectorOrHigher(Tparams.shape()),
+ errors::InvalidArgument("params must be at least 1-D, got shape ",
+ Tparams.shape().ShortDebugString()));
+ OP_REQUIRES(
+ c, ValidShapes(Tparams, Tupdates, Tindices),
+ errors::InvalidArgument(
+ "Must have updates.shape = indices.shape + params.shape[1:], got ",
+ "updates.shape ", Tupdates.shape().ShortDebugString(),
+ ", indices.shape ", Tindices.shape().ShortDebugString(),
+ ", params.shape ", Tparams.shape().ShortDebugString()));
+ const Index N = Tindices.NumElements();
+
+ // We always return the input ref.
+ c->forward_ref_input_to_ref_output(0, 0);
+
+ if (N > 0) {
+ const Index first_dim_size = Tparams.dim_size(0);
+ // Validate all the indices are in range
+ auto Tindices_vec = Tindices.flat<Index>();
+ for (Index i = 0; i < N; i++) {
+ const Index index = Tindices_vec(i);
+ OP_REQUIRES(c, index >= 0 && index < first_dim_size,
+ errors::InvalidArgument(
+ strings::StrCat("Index ", index, " at offset ", i,
+ " in indices is out of range")));
+ }
+ auto Tparams_flat = Tparams.flat_outer_dims<T>();
+ auto Tupdates_flat =
+ Tupdates.shaped<T, 2>({N, Tupdates.NumElements() / N});
+ for (Index i = 0; i < N; i++) {
+ // Copy last Ndim-1 dimensions of Tupdates[i] to
+ // Tparams[Tindices[i]]
+ switch (op) {
+ case UpdateOp::ASSIGN: {
+ Tparams_flat.template chip<0>(Tindices_vec(i)) =
+ Tupdates_flat.template chip<0>(i);
+ break;
+ }
+ case UpdateOp::ADD: {
+ Tparams_flat.template chip<0>(Tindices_vec(i)) +=
+ Tupdates_flat.template chip<0>(i);
+ break;
+ }
+ case UpdateOp::SUB: {
+ Tparams_flat.template chip<0>(Tindices_vec(i)) -=
+ Tupdates_flat.template chip<0>(i);
+ break;
+ }
+ }
+ }
+ }
+ }
+};
+
+#define REGISTER_SCATTER_UPDATE(type, index_type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ScatterUpdate") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ ScatterUpdateOp<type, index_type, UpdateOp::ASSIGN>);
+
+#define REGISTER_SCATTER_UPDATE_INT32(type) REGISTER_SCATTER_UPDATE(type, int32)
+#define REGISTER_SCATTER_UPDATE_INT64(type) REGISTER_SCATTER_UPDATE(type, int64)
+
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_UPDATE_INT32);
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_UPDATE_INT64);
+
+#undef REGISTER_SCATTER_UPDATE_INT64
+#undef REGISTER_SCATTER_UPDATE_INT32
+#undef REGISTER_SCATTER_UPDATE
+
+#define REGISTER_SCATTER_ADD(type, index_type) \
+ REGISTER_KERNEL_BUILDER(Name("ScatterAdd") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ ScatterUpdateOp<type, index_type, UpdateOp::ADD>);
+
+#define REGISTER_SCATTER_ADD_INT32(type) REGISTER_SCATTER_ADD(type, int32)
+#define REGISTER_SCATTER_ADD_INT64(type) REGISTER_SCATTER_ADD(type, int64)
+
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_ADD_INT32);
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_ADD_INT64);
+
+#undef REGISTER_SCATTER_ADD_INT32
+#undef REGISTER_SCATTER_ADD_INT64
+#undef REGISTER_SCATTER_ADD
+
+#define REGISTER_SCATTER_SUB(type, index_type) \
+ REGISTER_KERNEL_BUILDER(Name("ScatterSub") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ ScatterUpdateOp<type, index_type, UpdateOp::SUB>);
+
+#define REGISTER_SCATTER_SUB_INT32(type) REGISTER_SCATTER_SUB(type, int32)
+#define REGISTER_SCATTER_SUB_INT64(type) REGISTER_SCATTER_SUB(type, int64)
+
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_SUB_INT32);
+TF_CALL_NUMBER_TYPES(REGISTER_SCATTER_SUB_INT64);
+
+#undef REGISTER_SCATTER_SUB_INT64
+#undef REGISTER_SCATTER_SUB_INT32
+#undef REGISTER_SCATTER_SUB
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/scatter_op_test.cc b/tensorflow/core/kernels/scatter_op_test.cc
new file mode 100644
index 0000000000..8885f1edb3
--- /dev/null
+++ b/tensorflow/core/kernels/scatter_op_test.cc
@@ -0,0 +1,255 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace {
+
+class ScatterUpdateOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType index_type) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "ScatterUpdate")
+ .Input(FakeInput(DT_FLOAT_REF))
+ .Input(FakeInput(index_type))
+ .Input(FakeInput(DT_FLOAT))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(ScatterUpdateOpTest, Simple_TwoD32) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 2});
+ AddInputFromArray<float>(TensorShape({3, 3}),
+ {100, 101, 102, 777, 778, 779, 10000, 10001, 10002});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor params_tensor = *mutable_input(0).tensor;
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5, 3}));
+ test::FillValues<float>(&expected, {100, 101, 102, 0, 0, 0, 10000, 10001,
+ 10002, 0, 0, 0, 777, 778, 779});
+ test::ExpectTensorEqual<float>(expected, params_tensor);
+}
+
+TEST_F(ScatterUpdateOpTest, Simple_TwoD64) {
+ MakeOp(DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int64>(TensorShape({3}), {0, 4, 2});
+ AddInputFromArray<float>(TensorShape({3, 3}),
+ {100, 101, 102, 777, 778, 779, 10000, 10001, 10002});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor params_tensor = *mutable_input(0).tensor;
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5, 3}));
+ test::FillValues<float>(&expected, {100, 101, 102, 0, 0, 0, 10000, 10001,
+ 10002, 0, 0, 0, 777, 778, 779});
+ test::ExpectTensorEqual<float>(expected, params_tensor);
+}
+
+TEST_F(ScatterUpdateOpTest, Simple_ZeroD) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5}), {0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({}), {3});
+ AddInputFromArray<float>(TensorShape({}), {101});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor params_tensor = *mutable_input(0).tensor;
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5}));
+ test::FillValues<float>(&expected, {0, 0, 0, 101, 0});
+ test::ExpectTensorEqual<float>(expected, params_tensor);
+}
+
+TEST_F(ScatterUpdateOpTest, Simple_OneD) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5}), {0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 2});
+ AddInputFromArray<float>(TensorShape({3}), {100, 101, 102});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor params_tensor = *mutable_input(0).tensor;
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5}));
+ test::FillValues<float>(&expected, {100, 0, 102, 0, 101});
+ test::ExpectTensorEqual<float>(expected, params_tensor);
+}
+
+TEST_F(ScatterUpdateOpTest, HigherRank) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({8}), {0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({2, 3}), {0, 4, 2, 1, 3, 6});
+ AddInputFromArray<float>(TensorShape({2, 3}), {10, 20, 30, 40, 50, 60});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the new state of the input
+ Tensor params_tensor = *mutable_input(0).tensor;
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({8}));
+ test::FillValues<float>(&expected, {10, 40, 30, 50, 20, 0, 60, 0});
+ test::ExpectTensorEqual<float>(expected, params_tensor);
+}
+
+TEST_F(ScatterUpdateOpTest, Error_IndexOutOfRange) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 99});
+ AddInputFromArray<float>(TensorShape({3, 3}),
+ {100, 101, 102, 777, 778, 779, 10000, 10001, 10002});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Index 99 at offset 2 in indices is out of range"))
+ << s;
+}
+
+TEST_F(ScatterUpdateOpTest, Error_WrongDimsIndices) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 3}), {0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({1, 3}), {0, 4, 99});
+ AddInputFromArray<float>(TensorShape({3, 3}),
+ {100, 101, 102, 777, 778, 779, 10000, 10001, 10002});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Must have updates.shape = indices.shape + "
+ "params.shape[1:], got "))
+ << s;
+}
+
+TEST_F(ScatterUpdateOpTest, Error_MismatchedParamsAndUpdateDimensions) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 2});
+ AddInputFromArray<float>(
+ TensorShape({3, 4}),
+ {100, 101, 102, 103, 777, 778, 779, 780, 10000, 10001, 10002, 10004});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Must have updates.shape = indices.shape + "
+ "params.shape[1:], got "))
+ << s;
+}
+
+TEST_F(ScatterUpdateOpTest, Error_MismatchedIndicesAndUpdateDimensions) {
+ MakeOp(DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 4, 2});
+ AddInputFromArray<float>(TensorShape({2, 3}),
+ {100, 101, 102, 10000, 10001, 10002});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("Must have updates.shape = indices.shape + "
+ "params.shape[1:], got "))
+ << s;
+}
+
+class ScatterUpdateBM : public ScatterUpdateOpTest {
+ public:
+ virtual void TestBody() {}
+ void MakeBenchmarkOp(const char* op, DataType index_type) {
+ ASSERT_OK(NodeDefBuilder("myop", op)
+ .Input(FakeInput(DT_FLOAT_REF))
+ .Input(FakeInput(index_type))
+ .Input(FakeInput(DT_FLOAT))
+ .Finalize(node_def()));
+ TF_CHECK_OK(InitOp());
+ }
+};
+
+template <typename Index>
+static void BM_ScatterHelper(int iters, int embedding_size, const char* op) {
+ testing::StopTiming();
+ const int kRows = 10000000 / embedding_size;
+ std::vector<float> values;
+ for (int i = 0; i < kRows * embedding_size; i++) {
+ values.push_back(i);
+ }
+ const int kNumUpdates = 1000;
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ std::vector<Index> indices;
+ std::vector<float> updates;
+ for (int i = 0; i < kNumUpdates; i++) {
+ indices.push_back(rnd.Uniform(kRows));
+ for (int j = 0; j < embedding_size; j++) {
+ updates.push_back(i * 10 + j);
+ }
+ }
+
+ ScatterUpdateBM bm;
+ bm.MakeBenchmarkOp(op, DataTypeToEnum<Index>::v());
+ bm.AddInputFromArray<float>(TensorShape({kRows, embedding_size}), values);
+ bm.AddInputFromArray<Index>(TensorShape({kNumUpdates}), indices);
+ bm.AddInputFromArray<float>(TensorShape({kNumUpdates, embedding_size}),
+ updates);
+ testing::ItemsProcessed((static_cast<int64>(kNumUpdates) * embedding_size) *
+ iters);
+ testing::StartTiming();
+ while (iters-- > 0) {
+ Status s = bm.RunOpKernel();
+ }
+}
+
+static void BM_ScatterUpdateInt32(int iters, int embedding_size) {
+ BM_ScatterHelper<int32>(iters, embedding_size, "ScatterUpdate");
+}
+static void BM_ScatterUpdateInt64(int iters, int embedding_size) {
+ BM_ScatterHelper<int64>(iters, embedding_size, "ScatterUpdate");
+}
+
+static void BM_ScatterAddInt32(int iters, int embedding_size) {
+ BM_ScatterHelper<int32>(iters, embedding_size, "ScatterAdd");
+}
+static void BM_ScatterAddInt64(int iters, int embedding_size) {
+ BM_ScatterHelper<int64>(iters, embedding_size, "ScatterAdd");
+}
+
+BENCHMARK(BM_ScatterUpdateInt32)->Arg(1)->Arg(10)->Arg(64)->Arg(256)->Arg(1024);
+BENCHMARK(BM_ScatterUpdateInt64)->Arg(1)->Arg(10)->Arg(64)->Arg(256)->Arg(1024);
+
+BENCHMARK(BM_ScatterAddInt32)->Arg(1)->Arg(10)->Arg(64)->Arg(256)->Arg(1024);
+BENCHMARK(BM_ScatterAddInt64)->Arg(1)->Arg(10)->Arg(64)->Arg(256)->Arg(1024);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/segment_reduction_ops.cc b/tensorflow/core/kernels/segment_reduction_ops.cc
new file mode 100644
index 0000000000..2b6a8c5a88
--- /dev/null
+++ b/tensorflow/core/kernels/segment_reduction_ops.cc
@@ -0,0 +1,466 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/Eigen/Core"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+// This operator handles reducing segments along the first dimension.
+// See core/ops/math_ops.cc for more details.
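+// For example, data [[1, 2], [3, 4], [5, 6]] with segment_ids [0, 0, 1]
+// yields SegmentSum output [[4, 6], [5, 6]]; segment ids must be sorted and
+// cover every output row, as checked below.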
+template <typename Device, class T, class Index, typename Reducer>
+class SegmentReductionOp : public OpKernel {
+ public:
+ explicit SegmentReductionOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& segment_ids = context->input(1);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(segment_ids.shape()),
+ errors::InvalidArgument("segment_ids should be a vector."));
+ const int64 num_indices = segment_ids.NumElements();
+ OP_REQUIRES(context, num_indices == input.dim_size(0),
+ errors::InvalidArgument(
+ "segment_ids should be the same size as dimension 0 of"
+ " input."));
+
+ auto input_flat = input.flat_outer_dims<T>();
+ const int64 num_col = input_flat.dimension(1);
+
+ const auto segment_vec = segment_ids.vec<Index>();
+ // Note that the current implementation assumes that segment_vec values are
+ // sorted.
+ const Index output_rows =
+ num_indices > 0 ? segment_vec(num_indices - 1) + 1 : 0;
+
+ TensorShape output_shape = input.shape();
+ output_shape.set_dim(0, output_rows);
+
+ // Note that we do not initialize the output buffer with a default value.
+ // We require that segment ids be sorted and cover all values (otherwise we
+ // return an error).
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ auto output_flat = output->flat_outer_dims<T>();
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::DSizes<Eigen::DenseIndex, 1> dims_to_reduce;
+ dims_to_reduce[0] = 0;
+#else
+ Eigen::IndexList<Eigen::type2index<0>> dims_to_reduce;
+#endif
+ Index start = 0, end = 1;
+ // TODO(agarwal): if this loop becomes a bottleneck, consider sharding it
+ // across threads.
+ Eigen::DSizes<Eigen::DenseIndex, 1> out_slice_shape(num_col);
+ while (end <= num_indices) {
+ if (end < num_indices) {
+ if (segment_vec(start) == segment_vec(end)) {
+ ++end;
+ continue;
+ }
+ // We have a new segment here. Verify that the segment ids grow by one
+ // each time, so that we cover every possible output value.
+ OP_REQUIRES(
+ context, segment_vec(start) + 1 == segment_vec(end),
+ errors::InvalidArgument("segment ids are not increasing by 1"));
+ }
+
+ // Process segment [start, end)
+ const T* in_slice_ptr = &input_flat(start, 0);
+ typedef Eigen::TensorMap<Eigen::Tensor<T, 1, Eigen::RowMajor>,
+ Eigen::Unaligned> OutT;
+ T* out_slice_ptr = &output_flat(segment_vec(start), 0);
+ OutT out_slice(out_slice_ptr, out_slice_shape);
+      // We don't use out_slice.device(context->eigen_device<Device>())
+ // because these pieces of work are likely to be very small and
+ // the context switching overhead dwarfs any benefit we get from
+ // using another thread to do this work.
+ if (start == end - 1) {
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 1, Eigen::RowMajor>,
+ Eigen::Unaligned> InT;
+ InT in_slice(in_slice_ptr, out_slice_shape);
+ out_slice = in_slice;
+ } else {
+ Eigen::DSizes<Eigen::DenseIndex, 2> in_slice_shape(end - start,
+ num_col);
+ typedef Eigen::TensorMap<Eigen::Tensor<const T, 2, Eigen::RowMajor>,
+ Eigen::Unaligned> InT;
+ InT in_slice(in_slice_ptr, in_slice_shape);
+
+ out_slice = in_slice.reduce(dims_to_reduce, Reducer());
+ }
+ start = end;
+ ++end;
+ }
+ }
+};
+
+#define REGISTER_CPU_KERNELS(type, index_type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SegmentSum") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SegmentReductionOp<CPUDevice, type, index_type, \
+ Eigen::internal::SumReducer<type>>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SegmentMean") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SegmentReductionOp<CPUDevice, type, index_type, \
+ Eigen::internal::MeanReducer<type>>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SegmentProd") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SegmentReductionOp<CPUDevice, type, index_type, \
+ Eigen::internal::ProdReducer<type>>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SegmentMin") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SegmentReductionOp<CPUDevice, type, index_type, \
+ Eigen::internal::MinReducer<type>>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SegmentMax") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SegmentReductionOp<CPUDevice, type, index_type, \
+ Eigen::internal::MaxReducer<type>>);
+
+#define REGISTER_CPU_KERNELS_ALL(type) \
+ REGISTER_CPU_KERNELS(type, int32); \
+ REGISTER_CPU_KERNELS(type, int64);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_KERNELS_ALL);
+#undef REGISTER_CPU_KERNELS
+#undef REGISTER_CPU_KERNELS_ALL
+
+// Similar to SegmentReductionOp but can handle unsorted segment definitions and
+// specifying size of output.
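+// For example, data [1, 2, 3] with segment_ids [1, 0, 1] and num_segments 3
+// gives output [2, 4, 0]; output rows with no contributing elements stay
+// zero.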
+template <typename Device, class T, class Index>
+class UnsortedSegmentSumOp : public OpKernel {
+ public:
+ explicit UnsortedSegmentSumOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& data = context->input(0);
+ const Tensor& segment_ids = context->input(1);
+ const Tensor& num_segments = context->input(2);
+
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsLegacyScalar(num_segments.shape()),
+ errors::InvalidArgument("num_segments should be a scalar, not shape ",
+ num_segments.shape().ShortDebugString()));
+
+ OP_REQUIRES(context,
+ TensorShapeUtils::StartsWith(data.shape(), segment_ids.shape()),
+ errors::InvalidArgument(
+ "data.shape = ", data.shape().ShortDebugString(),
+ " does not start with segment_ids.shape = ",
+ segment_ids.shape().ShortDebugString()));
+
+ const auto segment_flat = segment_ids.flat<Index>();
+ const int32 N = segment_flat.dimension(0);
+ const int32 output_rows = num_segments.scalar<int32>()();
+
+ if (N > 0) {
+ Eigen::Tensor<Index, 0, Eigen::RowMajor> m = segment_flat.maximum();
+ OP_REQUIRES(
+ context, m() < output_rows,
+ errors::InvalidArgument("More segments found than output size"));
+ }
+
+ TensorShape output_shape;
+ output_shape.AddDim(output_rows);
+ for (int i = segment_ids.dims(); i < data.dims(); i++) {
+ output_shape.AddDim(data.dim_size(i));
+ }
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ auto output_flat = output->flat_outer_dims<T>();
+ output_flat.setZero();
+
+ if (data.NumElements() > 0) {
+ auto data_flat = data.shaped<T, 2>({N, data.NumElements() / N});
+ for (int i = 0; i < N; ++i) {
+ output_flat.template chip<0>(segment_flat(i)) +=
+ data_flat.template chip<0>(i);
+ }
+ }
+ }
+};
+
+#define REGISTER_CPU_UNSORTED_KERNELS(type, index_type) \
+ REGISTER_KERNEL_BUILDER(Name("UnsortedSegmentSum") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ UnsortedSegmentSumOp<CPUDevice, type, index_type>);
+
+#define REGISTER_CPU_UNSORTED_KERNELS_ALL(type) \
+ REGISTER_CPU_UNSORTED_KERNELS(type, int32); \
+ REGISTER_CPU_UNSORTED_KERNELS(type, int64);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_UNSORTED_KERNELS_ALL);
+#undef REGISTER_CPU_UNSORTED_KERNELS
+#undef REGISTER_CPU_UNSORTED_KERNELS_ALL
+
+// Same as SegmentReductionOp but takes as input a "sparse" tensor, represented
+// by two dense tensors, one containing the data, and the other containing
+// indices into the data.
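+// For example, indices [0, 3, 4] with segment_ids [0, 0, 1] combine rows 0
+// and 3 of the data (sum or mean) into output row 0 and copy row 4 into
+// output row 1.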
+template <typename Device, class T>
+class SparseSegmentReductionOpBase : public OpKernel {
+ public:
+ explicit SparseSegmentReductionOpBase(OpKernelConstruction* context,
+ bool is_mean)
+ : OpKernel(context), is_mean_(is_mean) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& indices = context->input(1);
+ const Tensor& segment_ids = context->input(2);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(indices.shape()),
+ errors::InvalidArgument("indices should be a vector."));
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(segment_ids.shape()),
+ errors::InvalidArgument("segment_ids should be a vector."));
+
+ const int32 num_indices = indices.NumElements();
+ OP_REQUIRES(context, num_indices == segment_ids.NumElements(),
+ errors::InvalidArgument(
+ "segment_ids and indices should have same size."));
+
+ auto input_flat = input.flat_outer_dims<T>();
+
+ const auto indices_vec = indices.vec<int32>();
+ const auto segment_vec = segment_ids.vec<int32>();
+ // Note that the current implementation assumes that segment_vec values are
+ // sorted.
+ const int32 output_rows =
+ num_indices > 0 ? segment_vec(num_indices - 1) + 1 : 0;
+
+ TensorShape output_shape = input.shape();
+ output_shape.set_dim(0, output_rows);
+
+ // Note that we do not initialize the output buffer with a default value.
+ // We require that segment ids be sorted and cover all values (otherwise we
+ // return an error).
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ if (num_indices == 0) return;
+ auto output_flat = output->flat_outer_dims<T>();
+
+ int32 start = 0, end = 1;
+ while (end <= num_indices) {
+ if (end < num_indices) {
+ if (segment_vec(start) == segment_vec(end)) {
+ ++end;
+ continue;
+ }
+ // We have a new segment here. Verify that the segment ids grow by one
+ // each time, so that we cover every possible output value.
+ OP_REQUIRES(
+ context, segment_vec(start) + 1 == segment_vec(end),
+ errors::InvalidArgument("segment ids are not increasing by 1"));
+ }
+
+ auto out = output_flat.template chip<0>(segment_vec(start));
+#define I(i) input_flat.template chip<0>(indices_vec(start + i))
+ int num = end - start;
+ if (num == 1) {
+ out = I(0);
+ } else {
+ int r = num % 8;
+ T m = (is_mean_ && (num < 10)) ? num : 1;
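+        // The switch handles the first num % 8 terms (remainders 0 and 1 are
+        // bumped to 8 and 9 terms so the loop below always adds whole groups
+        // of 8).  For short means (num < 10) the division by num happens here
+        // via m; otherwise it is deferred until after the loop.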
+ switch (r) {
+ case 2:
+ out = (I(0) + I(1)) / m;
+ break;
+ case 3:
+ out = (I(0) + I(1) + I(2)) / m;
+ break;
+ case 4:
+ out = (I(0) + I(1) + I(2) + I(3)) / m;
+ break;
+ case 5:
+ out = (I(0) + I(1) + I(2) + I(3) + I(4)) / m;
+ break;
+ case 6:
+ out = (I(0) + I(1) + I(2) + I(3) + I(4) + I(5)) / m;
+ break;
+ case 7:
+ out = (I(0) + I(1) + I(2) + I(3) + I(4) + I(5) + I(6)) / m;
+ break;
+ case 0:
+ out = (I(0) + I(1) + I(2) + I(3) + I(4) + I(5) + I(6) + I(7)) / m;
+ r = 8;
+ break;
+ case 1:
+ out =
+ (I(0) + I(1) + I(2) + I(3) + I(4) + I(5) + I(6) + I(7) + I(8)) /
+ m;
+ r = 9;
+ break;
+ }
+ for (; r < num; r += 8) {
+ out += I(r) + I(r + 1) + I(r + 2) + I(r + 3) + I(r + 4) + I(r + 5) +
+ I(r + 6) + I(r + 7);
+ }
+#undef I
+ if (is_mean_ && num >= 10) {
+ out = out / static_cast<T>(num);
+ }
+ }
+ start = end;
+ ++end;
+ }
+ }
+
+ private:
+ bool is_mean_;
+};
+
+template <typename Device, class T>
+class SparseSegmentReductionMeanOp
+ : public SparseSegmentReductionOpBase<Device, T> {
+ public:
+ explicit SparseSegmentReductionMeanOp(OpKernelConstruction* context)
+ : SparseSegmentReductionOpBase<Device, T>(context, true /*is_mean*/) {}
+};
+
+template <typename Device, class T>
+class SparseSegmentReductionSumOp
+ : public SparseSegmentReductionOpBase<Device, T> {
+ public:
+ explicit SparseSegmentReductionSumOp(OpKernelConstruction* context)
+ : SparseSegmentReductionOpBase<Device, T>(context, false /*is_mean*/) {}
+};
+
+#define REGISTER_CPU_SPARSE_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SparseSegmentSum").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SparseSegmentReductionSumOp<CPUDevice, type>);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_CPU_SPARSE_KERNELS);
+#undef REGISTER_CPU_SPARSE_KERNELS
+
+#define REGISTER_CPU_SPARSE_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SparseSegmentMean").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SparseSegmentReductionMeanOp<CPUDevice, type>);
+REGISTER_CPU_SPARSE_KERNELS(float);
+REGISTER_CPU_SPARSE_KERNELS(double);
+#undef REGISTER_CPU_SPARSE_KERNELS
+
+template <class T>
+class SparseSegmentMeanGradOp : public OpKernel {
+ public:
+ explicit SparseSegmentMeanGradOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& indices = context->input(1);
+ const Tensor& segment_ids = context->input(2);
+ const Tensor& output_dim0 = context->input(3);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(indices.shape()),
+ errors::InvalidArgument("indices should be a vector."));
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(segment_ids.shape()),
+ errors::InvalidArgument("segment_ids should be a vector."));
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyScalar(output_dim0.shape()),
+ errors::InvalidArgument("output_dim0 should be a scalar."));
+
+ const int64 N = indices.NumElements();
+ OP_REQUIRES(context, N == segment_ids.NumElements(),
+ errors::InvalidArgument(
+ "segment_ids and indices should have same size."));
+ const int32 M = output_dim0.scalar<int32>()();
+
+ auto input_flat = input.flat_outer_dims<T>();
+ const auto indices_vec = indices.vec<int32>();
+ const auto segment_vec = segment_ids.vec<int32>();
+
+ TensorShape output_shape = input.shape();
+ output_shape.set_dim(0, M);
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+ if (M == 0 || N == 0) return;
+
+ // Note that similar to SparseSegmentMean, we assume that segment_vec is
+ // already sorted and has non-negative values.
+ int num_segments = segment_vec(N - 1) + 1;
+ OP_REQUIRES(context, input.dim_size(0) == num_segments,
+ errors::InvalidArgument("Invalid number of segments"));
+
+ // Compute scaling factors for input.
+ std::vector<double> scaling(num_segments, 0.0);
+ for (int64 i = 0; i < N; ++i) {
+ scaling[segment_vec(i)] += 1;
+ }
+ for (int i = 0; i < scaling.size(); ++i) {
+ scaling[i] = 1.0 / std::max(scaling[i], 1.0);
+ }
+
+ auto output_flat = output->flat_outer_dims<T>();
+ output_flat.setZero();
+ std::vector<bool> is_modified(M, false);
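+    // Each output row indices_vec(i) accumulates input row segment_vec(i)
+    // scaled by 1 / (number of indices in that segment), i.e. the gradient
+    // of the segment mean with respect to each element that contributed to
+    // it; rows never written remain zero.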
+
+ for (int64 i = 0; i < N; ++i) {
+ int output_idx = indices_vec(i);
+ int idx = segment_vec(i);
+ T scale = static_cast<T>(scaling[idx]);
+ if (is_modified[output_idx]) {
+ if (scale == 1.0) {
+ output_flat.template chip<0>(output_idx) +=
+ input_flat.template chip<0>(idx);
+ } else {
+ output_flat.template chip<0>(output_idx) +=
+ input_flat.template chip<0>(idx) * scale;
+ }
+ } else {
+ if (scale == 1.0) {
+ output_flat.template chip<0>(output_idx) =
+ input_flat.template chip<0>(idx);
+ } else {
+ output_flat.template chip<0>(output_idx) =
+ input_flat.template chip<0>(idx) * scale;
+ }
+ }
+ is_modified[output_idx] = true;
+ }
+ }
+};
+
+#define REGISTER_CPU_SPARSE_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER(Name("SparseSegmentMeanGrad") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T"), \
+ SparseSegmentMeanGradOp<type>);
+
+REGISTER_CPU_SPARSE_KERNELS(float);
+REGISTER_CPU_SPARSE_KERNELS(double);
+
+#undef REGISTER_CPU_SPARSE_KERNELS
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/segment_reduction_ops_test.cc b/tensorflow/core/kernels/segment_reduction_ops_test.cc
new file mode 100644
index 0000000000..87647a21a8
--- /dev/null
+++ b/tensorflow/core/kernels/segment_reduction_ops_test.cc
@@ -0,0 +1,157 @@
+#include <functional>
+
+#include "tensorflow/core/public/session_options.h"
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+
+namespace tensorflow {
+
+template <typename Index>
+static void BM_SegmentReduction(int iters, string reduction, Index num_rows,
+ Index num_cols, Index segment_size) {
+ testing::StopTiming();
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ // Create inputs
+ gtl::InlinedVector<TensorValue, 4> reduction_inputs;
+ TensorShape shape1({num_rows, num_cols});
+ Tensor input1(DT_FLOAT, shape1);
+ reduction_inputs.push_back({nullptr, &input1});
+
+ TensorShape shape2({num_rows});
+ Tensor input2(DataTypeToEnum<Index>::v(), shape2);
+ test::FillFn<Index>(&input2, [&num_rows, &segment_size](Index i) -> Index {
+ return std::min(i / segment_size, num_rows - 1);
+ });
+ reduction_inputs.push_back({nullptr, &input2});
+
+ NodeDef reduction_node_def;
+ TF_CHECK_OK(NodeDefBuilder(reduction, reduction)
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DataTypeToEnum<Index>::v()))
+ .Finalize(&reduction_node_def));
+ Status status;
+ std::unique_ptr<OpKernel> reduction_op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), reduction_node_def, &status));
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &reduction_inputs;
+ params.op_kernel = reduction_op.get();
+ params.output_alloc_attr = [&device, &reduction_op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host =
+ (reduction_op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> reduction_context(
+ new OpKernelContext(params));
+
+ reduction_op->Compute(reduction_context.get());
+ TF_CHECK_OK(reduction_context->status());
+ testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete reduction_context->release_output(0).tensor;
+ reduction_op->Compute(reduction_context.get());
+ }
+ int64 bytes_per_iter =
+ static_cast<int64>(num_rows * num_cols * sizeof(float));
+ testing::BytesProcessed(bytes_per_iter * iters);
+}
+
+#define BM_Reduce(O, R, C, S) \
+ static void BM_Reduce_##O##_##R##_##C##_##S##_int32(int iters) { \
+ BM_SegmentReduction<int32>(iters, #O, R, C, S); \
+ } \
+ static void BM_Reduce_##O##_##R##_##C##_##S##_int64(int iters) { \
+ BM_SegmentReduction<int64>(iters, #O, R, C, S); \
+ } \
+ BENCHMARK(BM_Reduce_##O##_##R##_##C##_##S##_int32); \
+ BENCHMARK(BM_Reduce_##O##_##R##_##C##_##S##_int64);
+
+#define BM_Reduce_Arg(R, C, S) \
+ BM_Reduce(SegmentSum, R, C, S); \
+ BM_Reduce(SegmentMean, R, C, S);
+
+BM_Reduce_Arg(64, 32, 1);
+BM_Reduce_Arg(4096, 128, 1);
+
+BM_Reduce_Arg(16, 8, 2);
+BM_Reduce_Arg(64, 32, 2);
+BM_Reduce_Arg(4096, 32, 2);
+BM_Reduce_Arg(4096, 128, 2);
+
+static void SparseSegmentMeanGradHelper(int iters, float uniqueness, int size) {
+ testing::StopTiming();
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ CHECK_LE(uniqueness, 1.0);
+ CHECK_GT(uniqueness, 0.0);
+
+ const int kNumIndices = size;
+ Tensor indices(DT_INT32, TensorShape({kNumIndices}));
+ auto indices_flat = indices.flat<int32>();
+ Tensor segments(DT_INT32, TensorShape({kNumIndices}));
+ auto segments_flat = segments.flat<int32>();
+
+ int kUniqueIndices = uniqueness * kNumIndices;
+ Tensor output_dim0(DT_INT32, TensorShape({}));
+ output_dim0.scalar<int32>()() = kUniqueIndices;
+
+ for (int i = 0; i < kNumIndices; ++i) {
+ indices_flat(i) = (i * 31) % kUniqueIndices;
+ segments_flat(i) = i * .8;
+ }
+
+ const int kDim1 = segments_flat(kNumIndices - 1) + 1;
+ const int kDim2 = 128;
+ Tensor input(DT_FLOAT, TensorShape({kDim1, kDim2}));
+ input.flat<float>().setRandom();
+
+ Node* node;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "SparseSegmentMeanGrad")
+ .Input(test::graph::Constant(g, input))
+ .Input(test::graph::Constant(g, indices))
+ .Input(test::graph::Constant(g, segments))
+ .Input(test::graph::Constant(g, output_dim0))
+ .Attr("T", DT_FLOAT)
+ .Finalize(g, &node));
+
+ testing::UseRealTime();
+ testing::BytesProcessed(static_cast<int64>(iters) * (kDim1 * kDim2) *
+ sizeof(float));
+ testing::StartTiming();
+ test::Benchmark("cpu", g).Run(iters);
+}
+
+static void BM_SparseSegmentMeanGrad_Low(int iters, int size) {
+ return SparseSegmentMeanGradHelper(iters, 1.0, size);
+}
+
+static void BM_SparseSegmentMeanGrad_High(int iters, int size) {
+ return SparseSegmentMeanGradHelper(iters, 0.01, size);
+}
+
+BENCHMARK(BM_SparseSegmentMeanGrad_Low)->Arg(1000)->Arg(100000);
+BENCHMARK(BM_SparseSegmentMeanGrad_High)->Arg(1000)->Arg(100000);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/sendrecv_ops.cc b/tensorflow/core/kernels/sendrecv_ops.cc
new file mode 100644
index 0000000000..2abb183d1a
--- /dev/null
+++ b/tensorflow/core/kernels/sendrecv_ops.cc
@@ -0,0 +1,116 @@
+#include "tensorflow/core/kernels/sendrecv_ops.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+static string GetRendezvousKeyPrefix(const string& send_device,
+ const string& recv_device,
+ const uint64 send_device_incarnation,
+ const string& tensor_name) {
+ return strings::StrCat(send_device, ";",
+ strings::FpToString(send_device_incarnation), ";",
+ recv_device, ";", tensor_name);
+}
+
+static string GetRendezvousKey(const string& key_prefix,
+ const FrameAndIter& frame_iter) {
+ return strings::StrCat(key_prefix, ";", frame_iter.frame_id, ":",
+ frame_iter.iter_id);
+}
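+
+// A complete key therefore has the form (device and tensor names below are
+// only illustrative):
+//   /job:w/replica:0/task:0/cpu:0;<incarnation>;/job:w/replica:0/task:0/gpu:0;edge_5_x;0:0
+// where <incarnation> is the sending device's incarnation rendered by
+// strings::FpToString and the trailing "0:0" is frame_id:iter_id.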
+
+SendOp::SendOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ string send_device;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("send_device", &send_device));
+ string recv_device;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("recv_device", &recv_device));
+ uint64 send_device_incarnation;
+ OP_REQUIRES_OK(
+ ctx, ctx->GetAttr("send_device_incarnation",
+ reinterpret_cast<int64*>(&send_device_incarnation)));
+ string tensor_name;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("tensor_name", &tensor_name));
+ key_prefix_ = GetRendezvousKeyPrefix(send_device, recv_device,
+ send_device_incarnation, tensor_name);
+}
+
+void SendOp::Compute(OpKernelContext* ctx) {
+ OP_REQUIRES(
+ ctx, ctx->rendezvous() != nullptr,
+ errors::Internal("Op kernel context needs to provide a rendezvous."));
+ const string key = GetRendezvousKey(key_prefix_, ctx->frame_iter());
+ VLOG(2) << "Send " << key;
+
+ // The device context may be passed between the Send/Recv
+ // boundary, so that the device context used to produce the Tensor
+ // is used when performing the copy on the recv side (which may be
+ // a different device).
+ Rendezvous::Args args;
+ args.device_context = ctx->op_device_context();
+ args.alloc_attrs = ctx->input_alloc_attr(0);
+ Status s =
+ ctx->rendezvous()->Send(key, args, ctx->input(0), ctx->is_input_dead());
+ ctx->SetStatus(s);
+}
+
+REGISTER_KERNEL_BUILDER(Name("_Send").Device(DEVICE_CPU), SendOp);
+REGISTER_KERNEL_BUILDER(Name("_Send").Device(DEVICE_GPU), SendOp);
+
+REGISTER_KERNEL_BUILDER(Name("_HostSend").Device(DEVICE_CPU), SendOp);
+REGISTER_KERNEL_BUILDER(
+ Name("_HostSend").Device(DEVICE_GPU).HostMemory("tensor"), SendOp);
+
+RecvOp::RecvOp(OpKernelConstruction* ctx) : AsyncOpKernel(ctx) {
+ string send_device;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("send_device", &send_device));
+ string recv_device;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("recv_device", &recv_device));
+ uint64 send_device_incarnation;
+ OP_REQUIRES_OK(
+ ctx, ctx->GetAttr("send_device_incarnation",
+ reinterpret_cast<int64*>(&send_device_incarnation)));
+ string tensor_name;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("tensor_name", &tensor_name));
+ key_prefix_ = GetRendezvousKeyPrefix(send_device, recv_device,
+ send_device_incarnation, tensor_name);
+}
+
+void RecvOp::ComputeAsync(OpKernelContext* ctx, DoneCallback done) {
+ OP_REQUIRES(
+ ctx, ctx->rendezvous() != nullptr,
+ errors::Internal("Op kernel context needs to provide a rendezvous."));
+ const string key = GetRendezvousKey(key_prefix_, ctx->frame_iter());
+ VLOG(2) << "Recv " << key;
+
+ Rendezvous::Args args;
+ args.device_context = ctx->op_device_context();
+ args.alloc_attrs = ctx->output_alloc_attr(0);
+ ctx->rendezvous()->RecvAsync(
+ key, args, [ctx, done](const Status& s, const Rendezvous::Args& send_args,
+ const Rendezvous::Args& recv_args,
+ const Tensor& val, bool is_dead) {
+ ctx->SetStatus(s);
+ if (s.ok()) {
+ // 'ctx' allocates the output tensor of the expected type. The
+ // runtime checks whether the tensor received here is the same type.
+ if (!is_dead) {
+ ctx->set_output(0, val);
+ }
+ *ctx->is_output_dead() = is_dead;
+ }
+ done();
+ });
+}
+
+REGISTER_KERNEL_BUILDER(Name("_Recv").Device(DEVICE_CPU), RecvOp);
+REGISTER_KERNEL_BUILDER(Name("_Recv").Device(DEVICE_GPU), RecvOp);
+
+REGISTER_KERNEL_BUILDER(Name("_HostRecv").Device(DEVICE_CPU), RecvOp);
+REGISTER_KERNEL_BUILDER(
+ Name("_HostRecv").Device(DEVICE_GPU).HostMemory("tensor"), RecvOp);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/sendrecv_ops.h b/tensorflow/core/kernels/sendrecv_ops.h
new file mode 100644
index 0000000000..b3f5703ccf
--- /dev/null
+++ b/tensorflow/core/kernels/sendrecv_ops.h
@@ -0,0 +1,32 @@
+#ifndef TENSORFLOW_KERNELS_SENDRECV_OPS_H_
+#define TENSORFLOW_KERNELS_SENDRECV_OPS_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+class SendOp : public OpKernel {
+ public:
+ explicit SendOp(OpKernelConstruction* ctx);
+ void Compute(OpKernelContext* ctx) override;
+
+ private:
+ string key_prefix_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(SendOp);
+};
+
+class RecvOp : public AsyncOpKernel {
+ public:
+ explicit RecvOp(OpKernelConstruction* ctx);
+ void ComputeAsync(OpKernelContext* ctx, DoneCallback done) override;
+
+ private:
+ string key_prefix_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RecvOp);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_SENDRECV_OPS_H_
diff --git a/tensorflow/core/kernels/sequence_ops.cc b/tensorflow/core/kernels/sequence_ops.cc
new file mode 100644
index 0000000000..60ba2e15f9
--- /dev/null
+++ b/tensorflow/core/kernels/sequence_ops.cc
@@ -0,0 +1,123 @@
+// See docs in ../ops/math_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+int32 GetValue(int32 v) { return v; }
+
+template <typename T>
+class RangeOp : public OpKernel {
+ public:
+ explicit RangeOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& start_in = context->input(0);
+ const Tensor& limit_in = context->input(1);
+ const Tensor& delta_in = context->input(2);
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyScalar(start_in.shape()),
+ errors::InvalidArgument("start must be a scalar, not shape ",
+ start_in.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyScalar(limit_in.shape()),
+ errors::InvalidArgument("limit must be a scalar, not shape ",
+ limit_in.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsLegacyScalar(delta_in.shape()),
+ errors::InvalidArgument("delta must be a scalar, not shape ",
+ delta_in.shape().ShortDebugString()));
+ const int32 start = GetValue(start_in.scalar<T>()());
+ const int32 limit = GetValue(limit_in.scalar<T>()());
+ OP_REQUIRES(context, start <= limit,
+ errors::InvalidArgument("Requires start <= limit: ", start, "/",
+ limit));
+ const int32 delta = GetValue(delta_in.scalar<T>()());
+ OP_REQUIRES(context, delta > 0,
+ errors::InvalidArgument("Requires delta > 0: ", delta));
+ int32 size = (limit - start + delta - 1) / delta;
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({size}), &out));
+ auto flat = out->flat<T>();
+ int32 val = start;
+ for (int32 i = 0; i < size; ++i) {
+ flat(i) = T(val);
+ val += delta;
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("Range")
+ .Device(DEVICE_CPU)
+ .HostMemory("start")
+ .HostMemory("limit")
+ .HostMemory("delta")
+ .HostMemory("output"),
+ RangeOp<int32>);
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Range")
+ .Device(DEVICE_GPU)
+ .HostMemory("start")
+ .HostMemory("limit")
+ .HostMemory("delta")
+ .HostMemory("output"),
+ RangeOp<int32>);
+#endif // GOOGLE_CUDA
+
+template <typename T>
+class LinSpaceOp : public OpKernel {
+ public:
+ explicit LinSpaceOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& start_in = context->input(0);
+ const Tensor& stop_in = context->input(1);
+ const Tensor& num_in = context->input(2);
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(start_in.shape()),
+ errors::InvalidArgument("start must be a scalar, not shape ",
+ start_in.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(stop_in.shape()),
+ errors::InvalidArgument("stop must be a scalar, not shape ",
+ stop_in.shape().ShortDebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(num_in.shape()),
+ errors::InvalidArgument("num must be a scalar, not shape ",
+ num_in.shape().ShortDebugString()));
+ const T start = start_in.scalar<T>()();
+ const T stop = stop_in.scalar<T>()();
+ const int32 num = num_in.scalar<int32>()();
+ OP_REQUIRES(context, num > 0,
+ errors::InvalidArgument("Requires num > 0: ", num));
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape({num}), &out));
+ auto flat = out->flat<T>();
+ if (num == 1) {
+ flat(0) = start;
+ } else {
+ const T step = (stop - start) / (num - 1);
+ for (int32 i = 0; i < num; ++i) flat(i) = start + step * i;
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("LinSpace")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("start")
+ .HostMemory("stop")
+ .HostMemory("num")
+ .HostMemory("output"),
+ LinSpaceOp<float>);
+REGISTER_KERNEL_BUILDER(Name("LinSpace")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T")
+ .HostMemory("start")
+ .HostMemory("stop")
+ .HostMemory("num")
+ .HostMemory("output"),
+ LinSpaceOp<double>);
+
+} // namespace tensorflow
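Note: the element-count arithmetic used above (ceiling division for Range, an evenly spaced step for LinSpace) can be checked in isolation. The following standalone C++ sketch mirrors those two formulas; the helper names are hypothetical and the code is illustrative only, not part of the commit.

    #include <cstdio>
    #include <vector>

    // Mirrors RangeOp: size = ceil((limit - start) / delta) for delta > 0.
    std::vector<int> MakeRange(int start, int limit, int delta) {
      int size = (limit - start + delta - 1) / delta;
      std::vector<int> out(size);
      int val = start;
      for (int i = 0; i < size; ++i) { out[i] = val; val += delta; }
      return out;
    }

    // Mirrors LinSpaceOp: num points from start to stop, endpoints included.
    std::vector<double> MakeLinSpace(double start, double stop, int num) {
      std::vector<double> out(num);
      if (num == 1) { out[0] = start; return out; }
      const double step = (stop - start) / (num - 1);
      for (int i = 0; i < num; ++i) out[i] = start + step * i;
      return out;
    }

    int main() {
      auto r = MakeRange(3, 18, 4);         // 3, 7, 11, 15
      auto l = MakeLinSpace(10.0, 12.0, 3); // 10, 11, 12
      std::printf("%zu %zu\n", r.size(), l.size());
    }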
diff --git a/tensorflow/core/kernels/shape_ops.cc b/tensorflow/core/kernels/shape_ops.cc
new file mode 100644
index 0000000000..7cb1da8983
--- /dev/null
+++ b/tensorflow/core/kernels/shape_ops.cc
@@ -0,0 +1,261 @@
+// See docs in ../ops/array_ops.cc.
+
+#include <unordered_set>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+class ShapeOp : public OpKernel {
+ public:
+ explicit ShapeOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& inp = ctx->input(0);
+ const int rank = inp.dims();
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({rank}), &out));
+ auto vec = out->vec<int32>();
+ for (int i = 0; i < rank; ++i) vec(i) = inp.dim_size(i);
+ }
+
+ bool IsExpensive() override { return false; }
+};
+REGISTER_KERNEL_BUILDER(Name("Shape").Device(DEVICE_CPU).HostMemory("output"),
+ ShapeOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Shape") \
+ .Device(DEVICE_GPU) \
+ .HostMemory("output") \
+ .TypeConstraint<type>("T"), \
+ ShapeOp)
+TF_CALL_REAL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Shape")
+ .Device(DEVICE_GPU)
+ .HostMemory("input")
+ .HostMemory("output")
+ .TypeConstraint<int32>("T"),
+ ShapeOp);
+
+class RankOp : public OpKernel {
+ public:
+ explicit RankOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& inp = ctx->input(0);
+ const int rank = inp.dims();
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &out));
+ out->scalar<int32>()() = rank;
+ }
+
+ bool IsExpensive() override { return false; }
+};
+REGISTER_KERNEL_BUILDER(Name("Rank").Device(DEVICE_CPU).HostMemory("output"),
+ RankOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Rank") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("output"), \
+ RankOp);
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Rank")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("T")
+ .HostMemory("input")
+ .HostMemory("output"),
+ RankOp);
+
+class SizeOp : public OpKernel {
+ public:
+ explicit SizeOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& inp = ctx->input(0);
+ const int64 size = inp.NumElements();
+ Tensor* out = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({}), &out));
+ // TODO(josh11b): switch output to int64?
+ out->scalar<int32>()() = size;
+ }
+
+ bool IsExpensive() override { return false; }
+};
+REGISTER_KERNEL_BUILDER(Name("Size").Device(DEVICE_CPU).HostMemory("output"),
+ SizeOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("Size") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("output"), \
+ SizeOp);
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+// A special GPU kernel for int32.
+// TODO(b/25387198): Also enable int32 in device memory. This kernel
+// registration requires all int32 inputs and outputs to be in host memory.
+REGISTER_KERNEL_BUILDER(Name("Size")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("T")
+ .HostMemory("input")
+ .HostMemory("output"),
+ SizeOp);
+
+class ExpandDimsOp : public OpKernel {
+ public:
+ explicit ExpandDimsOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ int dim = ctx->input(1).flat<int>()(0);
+ OP_REQUIRES(
+ ctx, (dim >= -1 - ctx->input(0).dims() && dim <= ctx->input(0).dims()),
+ errors::InvalidArgument("Tried to expand dim index ", dim,
+ " for tensor with ", ctx->input(0).dims(),
+ " dimensions."));
+
+ auto existing_dims = ctx->input(0).shape().dim_sizes();
+ std::vector<int64> new_shape(existing_dims.size());
+ for (size_t i = 0; i < new_shape.size(); ++i) {
+ new_shape[i] = existing_dims[i];
+ }
+
+ // We emulate numpy's interpretation of the dim axis when
+ // -1 - input.dims() <= dim <= input.dims().
+ if (dim < 0) {
+ dim += existing_dims.size() + 1;
+ }
+
+ // Clamp to the end if needed.
+ dim = std::min<int32>(dim, existing_dims.size());
+ new_shape.emplace(new_shape.begin() + dim, 1);
+ const TensorShape output_shape(new_shape);
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, {0}, &output));
+ if (!output->CopyFrom(ctx->input(0), output_shape)) {
+ // This should never happen, since the sizes of the input and output
+ // should always be the same (we only add a dimension of size 1).
+ ctx->SetStatus(
+ errors::Internal("Could not expand dimension with input shape ",
+ ctx->input(0).shape().DebugString(),
+ " and output shape ", output_shape.DebugString()));
+ }
+ }
+};
+REGISTER_KERNEL_BUILDER(Name("ExpandDims").Device(DEVICE_CPU).HostMemory("dim"),
+ ExpandDimsOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER(Name("ExpandDims") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("dim"), \
+ ExpandDimsOp);
+TF_CALL_NUMBER_TYPES_NO_INT32(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+REGISTER_KERNEL_BUILDER(Name("ExpandDims")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int32>("T")
+ .HostMemory("input")
+ .HostMemory("dim")
+ .HostMemory("output"),
+ ExpandDimsOp);
+
+class SqueezeOp : public OpKernel {
+ public:
+ explicit SqueezeOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ std::vector<int32> squeeze_dims;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("squeeze_dims", &squeeze_dims));
+ squeeze_dims_.insert(squeeze_dims.begin(), squeeze_dims.end());
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ auto existing_dims = ctx->input(0).shape().dim_sizes();
+ std::vector<int64> new_shape;
+
+ std::unordered_set<int32> wrapped_squeeze_dims;
+ wrapped_squeeze_dims.reserve(squeeze_dims_.size());
+ // Validate squeeze dims against the input.
+ for (int32 dim : squeeze_dims_) {
+ OP_REQUIRES(
+ ctx, (dim >= -ctx->input(0).dims() && dim < ctx->input(0).dims()),
+ errors::InvalidArgument("Tried to squeeze dim index ", dim,
+ " for tensor with ", ctx->input(0).dims(),
+ " dimensions."));
+ // If dim is < 0, we wrap around (-1 means the last element).
+ if (dim < 0) {
+ dim = existing_dims.size() + dim;
+ }
+
+ wrapped_squeeze_dims.insert(dim);
+ }
+
+ for (size_t i = 0; i < existing_dims.size(); ++i) {
+ auto existing_dim = existing_dims[i];
+
+ // If squeeze_set is non-empty, only squeeze those dimensions.
+ if (!wrapped_squeeze_dims.empty()) {
+ if (wrapped_squeeze_dims.count(i) > 0) {
+ OP_REQUIRES(ctx, existing_dim == 1,
+ errors::InvalidArgument("Tried to explicitly squeeze "
+ "dimension ",
+ i, " but dimension was not 1: ",
+ existing_dim));
+ } else {
+ // This dimension is not being squeezed.
+ new_shape.push_back(existing_dim);
+ }
+ } else {
+ // Copy over all non-1-length dimensions.
+ if (existing_dim != 1) {
+ new_shape.push_back(existing_dim);
+ }
+ }
+ }
+
+ const TensorShape output_shape(new_shape);
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, {0}, &output));
+ if (!output->CopyFrom(ctx->input(0), output_shape)) {
+ // This should never happen, since the sizes of the input and
+ // output should always be the same.
+ ctx->SetStatus(errors::Internal("Could not squeeze input with shape ",
+ ctx->input(0).shape().DebugString(),
+ " and output shape ",
+ output_shape.DebugString()));
+ }
+ }
+
+ private:
+ std::unordered_set<int32> squeeze_dims_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("Squeeze").Device(DEVICE_CPU), SqueezeOp);
+
+#define REGISTER_GPU_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Squeeze").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ SqueezeOp);
+TF_CALL_NUMBER_TYPES(REGISTER_GPU_KERNEL);
+#undef REGISTER_GPU_KERNEL
+
+} // namespace tensorflow
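Note: ExpandDimsOp and SqueezeOp above only manipulate shape metadata; the underlying buffer is reused through CopyFrom. A minimal standalone C++ sketch of the shape arithmetic (negative-dim wrapping for expand, dropping size-1 dims for the default squeeze case); helper names are hypothetical and the sketch is illustrative only.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Insert a 1 at position `dim`; negative dims count from the end, numpy-style.
    std::vector<long long> ExpandShape(std::vector<long long> shape, int dim) {
      if (dim < 0) dim += static_cast<int>(shape.size()) + 1;
      dim = std::min<int>(dim, static_cast<int>(shape.size()));
      shape.insert(shape.begin() + dim, 1);
      return shape;
    }

    // Drop every size-1 dimension (the squeeze_dims == {} case).
    std::vector<long long> SqueezeShape(const std::vector<long long>& shape) {
      std::vector<long long> out;
      for (long long d : shape) if (d != 1) out.push_back(d);
      return out;
    }

    int main() {
      auto e = ExpandShape({2, 3}, -1);    // {2, 3, 1}
      auto s = SqueezeShape({1, 2, 1, 3}); // {2, 3}
      std::printf("%zu %zu\n", e.size(), s.size());
    }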
diff --git a/tensorflow/core/kernels/slice_op.cc b/tensorflow/core/kernels/slice_op.cc
new file mode 100644
index 0000000000..3477266d5d
--- /dev/null
+++ b/tensorflow/core/kernels/slice_op.cc
@@ -0,0 +1,242 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#if GOOGLE_CUDA
+#define EIGEN_USE_GPU
+#endif // GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/slice_op.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace {
+
+gtl::InlinedVector<int64, 4> IntTensorToInt64Vec(const Tensor& tensor) {
+ gtl::InlinedVector<int64, 4> out;
+ if (tensor.dtype() == DT_INT32) {
+ for (int64 i = 0; i < tensor.NumElements(); ++i) {
+ out.push_back(tensor.flat<int32>()(i));
+ }
+ } else if (tensor.dtype() == DT_INT64) {
+ for (int64 i = 0; i < tensor.NumElements(); ++i) {
+ out.push_back(tensor.flat<int64>()(i));
+ }
+ } else {
+ LOG(FATAL) << "begin must be either int32 or int64";
+ }
+ return out;
+}
+
+} // namespace
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// Shared code that is not dependent on the type of T. We do this to reduce
+// code size by not duplicating all this for all T (float, double, int32, etc.)
+static void SharedValidation(OpKernelContext* context,
+ TensorShape* output_shape, bool* is_identity,
+ bool* slice_dim0,
+ gtl::InlinedVector<int64, 4>* begin,
+ gtl::InlinedVector<int64, 4>* size) {
+ const Tensor& input = context->input(0);
+ const Tensor& begin_tensor = context->input(1);
+ const Tensor& size_tensor = context->input(2);
+
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsLegacyVector(begin_tensor.shape()) &&
+ TensorShapeUtils::IsLegacyVector(size_tensor.shape()) &&
+ begin_tensor.NumElements() == input.dims() &&
+ size_tensor.NumElements() == input.dims(),
+ errors::InvalidArgument(
+ "Expected begin and size arguments to be 1-D tensors of size ",
+ input.dims(), ", but got ", begin_tensor.NumElements(), " and ",
+ size_tensor.NumElements(), " instead."));
+
+ const int input_dims = input.dims();
+ *begin = IntTensorToInt64Vec(begin_tensor);
+ *size = IntTensorToInt64Vec(size_tensor);
+ for (int i = 0; i < input_dims; ++i) {
+ if ((*size)[i] == -1) {
+ // A size[i] of -1 means "all elements from begin[i] to dim_size(i)".
+ (*size)[i] = input.dim_size(i) - (*begin)[i];
+ }
+ }
+
+ *is_identity = true;
+ *slice_dim0 = true;
+ for (int i = 0; i < input_dims; ++i) {
+ int64 b = (*begin)[i];
+ int64 s = (*size)[i];
+ if (input.dim_size(i) == 0) {
+ OP_REQUIRES(
+ context, b == 0 && s == 0,
+ errors::InvalidArgument("Expected begin[", i, "] == 0 (got ", b,
+ ") and size[", i, "] == 0 ", "(got ", s,
+ ") when ", "input.dim_size(", i, ") == 0"));
+ } else {
+ OP_REQUIRES(context, 0 <= b && b <= input.dim_size(i),
+ errors::InvalidArgument("Expected begin[", i, "] in [0, ",
+ input.dim_size(i), "], but got ", b));
+ OP_REQUIRES(
+ context, 0 <= s && b + s <= input.dim_size(i),
+ errors::InvalidArgument("Expected size[", i, "] in [0, ",
+ input.dim_size(i) - b, "], but ", "got ", s));
+ }
+ output_shape->AddDim(s);
+ const bool take_all = (b == 0) && (s == input.dim_size(i));
+ (*is_identity) &= take_all;
+ (*slice_dim0) &= (i == 0) || take_all;
+ }
+}
+
+template <typename Device, typename T>
+class SliceOp : public OpKernel {
+ public:
+ explicit SliceOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ TensorShape output_shape;
+ bool is_identity = true;
+ bool slice_dim0 = true;
+ gtl::InlinedVector<int64, 4> begin;
+ gtl::InlinedVector<int64, 4> size;
+ SharedValidation(context, &output_shape, &is_identity, &slice_dim0, &begin,
+ &size);
+ if (!context->status().ok()) return;
+ const Tensor& input = context->input(0);
+ if (is_identity) {
+ VLOG(1) << "Slice identity";
+ context->set_output(0, input);
+ return;
+ }
+
+ if (slice_dim0 && IsInnerDimsSizeAligned<T>(input.shape())) {
+ VLOG(1) << "Slice dim 0: " << input.shape().DebugString();
+ CHECK_GE(input.dims(), 1); // Otherwise, is_identity should be true.
+ context->set_output(0, input.Slice(begin[0], begin[0] + size[0]));
+ return;
+ }
+
+ Tensor* result = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &result));
+ const int input_dims = input.dims();
+
+ if (output_shape.num_elements() > 0) {
+ if (std::is_same<Device, CPUDevice>::value && input_dims == 2 &&
+ DataTypeCanUseMemcpy(DataTypeToEnum<T>::v())) {
+ auto input = context->input(0).tensor<T, 2>();
+ auto output = result->tensor<T, 2>();
+ // TODO(agarwal): Consider multi-threading this loop for cases where
+ // size[0] is very large.
+ for (int i = 0; i < size[0]; ++i) {
+ const int row = begin[0] + i;
+ if (i + 1 < size[0]) {
+ port::prefetch<port::PREFETCH_HINT_T0>(&output(i + 1, 0));
+ port::prefetch<port::PREFETCH_HINT_T0>(&input(row + 1, begin[1]));
+ }
+ memcpy(&output(i, 0), &input(row, begin[1]), size[1] * sizeof(T));
+ }
+ return;
+ }
+#define HANDLE_DIM(NDIM) \
+ if (input_dims == NDIM) { \
+ HandleCase<NDIM>(context, begin, size, result); \
+ return; \
+ }
+
+ HANDLE_DIM(1);
+ HANDLE_DIM(2);
+ HANDLE_DIM(3);
+ HANDLE_DIM(4);
+ HANDLE_DIM(5);
+
+#undef HANDLE_DIM
+
+ OP_REQUIRES(context, false, errors::Unimplemented(
+ "SliceOp : Unhandled input dimensions"));
+ }
+ }
+
+ private:
+ template <int NDIM>
+ void HandleCase(OpKernelContext* context, const gtl::ArraySlice<int64>& begin,
+ const gtl::ArraySlice<int64>& size, Tensor* result) {
+ Eigen::DSizes<ptrdiff_t, NDIM> indices;
+ Eigen::DSizes<ptrdiff_t, NDIM> sizes;
+ for (int i = 0; i < NDIM; ++i) {
+ indices[i] = begin[i];
+ sizes[i] = size[i];
+ }
+
+ functor::Slice<Device, T, NDIM>()(
+ context->eigen_device<Device>(), result->tensor<T, NDIM>(),
+ context->input(0).tensor<T, NDIM>(), indices, sizes);
+ }
+};
+
+#define REGISTER_SLICE(type) \
+ REGISTER_KERNEL_BUILDER(Name("Slice") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("begin") \
+ .HostMemory("size"), \
+ SliceOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_SLICE);
+REGISTER_SLICE(bfloat16);
+
+#undef REGISTER_SLICE
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T, NDIM) \
+ template <> \
+ void Slice<GPUDevice, T, NDIM>::operator()( \
+ const GPUDevice& d, typename TTypes<T, NDIM>::Tensor output, \
+ typename TTypes<T, NDIM>::ConstTensor input, \
+ const Eigen::DSizes<ptrdiff_t, NDIM>& indices, \
+ const Eigen::DSizes<ptrdiff_t, NDIM>& sizes); \
+ extern template struct Slice<GPUDevice, T, NDIM>;
+
+#define DECLARE_FOR_N(T) \
+ DECLARE_GPU_SPEC(T, 1); \
+ DECLARE_GPU_SPEC(T, 2); \
+ DECLARE_GPU_SPEC(T, 3); \
+ DECLARE_GPU_SPEC(T, 4); \
+ DECLARE_GPU_SPEC(T, 5);
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_FOR_N);
+DECLARE_FOR_N(int32);
+
+#undef DECLARE_FOR_N
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+#define REGISTER_GPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("Slice") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("begin") \
+ .HostMemory("size") \
+ .TypeConstraint<int32>("Index"), \
+ SliceOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+REGISTER_GPU(int32);
+
+#undef REGISTER_GPU
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
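Note: SharedValidation above encodes the Slice contract: size[i] == -1 means "everything from begin[i] to the end of that dimension", and (begin, size) must stay inside the input. A small standalone C++ sketch of that bounds logic (not the kernel itself; function name is hypothetical):

    #include <cstdio>
    #include <vector>

    // Returns false if (begin, size) does not describe a valid slice of `dims`.
    // A size of -1 is replaced by "everything from begin to the end of the dim".
    bool ResolveSlice(const std::vector<long long>& dims,
                      const std::vector<long long>& begin,
                      std::vector<long long>* size) {
      if (begin.size() != dims.size() || size->size() != dims.size()) return false;
      for (size_t i = 0; i < dims.size(); ++i) {
        if ((*size)[i] == -1) (*size)[i] = dims[i] - begin[i];
        if (begin[i] < 0 || begin[i] > dims[i]) return false;
        if ((*size)[i] < 0 || begin[i] + (*size)[i] > dims[i]) return false;
      }
      return true;
    }

    int main() {
      std::vector<long long> size = {2, -1};
      bool ok = ResolveSlice({4, 6}, {1, 3}, &size);  // size becomes {2, 3}
      std::printf("%d %lld\n", ok, size[1]);
    }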
diff --git a/tensorflow/core/kernels/slice_op.h b/tensorflow/core/kernels/slice_op.h
new file mode 100644
index 0000000000..1b6bd9c112
--- /dev/null
+++ b/tensorflow/core/kernels/slice_op.h
@@ -0,0 +1,25 @@
+#ifndef TENSORFLOW_KERNELS_SLICE_OP_H_
+#define TENSORFLOW_KERNELS_SLICE_OP_H_
+
+// Functor definition for SliceOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T, int NDIMS>
+struct Slice {
+ void operator()(const Device& d, typename TTypes<T, NDIMS>::Tensor output,
+ typename TTypes<T, NDIMS>::ConstTensor input,
+ const Eigen::DSizes<ptrdiff_t, NDIMS>& slice_indices,
+ const Eigen::DSizes<ptrdiff_t, NDIMS>& slice_sizes) {
+ output.device(d) = input.slice(slice_indices, slice_sizes);
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_SLICE_OP_H_
diff --git a/tensorflow/core/kernels/slice_op_gpu.cu.cc b/tensorflow/core/kernels/slice_op_gpu.cu.cc
new file mode 100644
index 0000000000..6e919b244c
--- /dev/null
+++ b/tensorflow/core/kernels/slice_op_gpu.cu.cc
@@ -0,0 +1,31 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include "tensorflow/core/kernels/slice_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::Slice<GPUDevice, T, 1>; \
+ template struct functor::Slice<GPUDevice, T, 2>; \
+ template struct functor::Slice<GPUDevice, T, 3>; \
+ template struct functor::Slice<GPUDevice, T, 4>; \
+ template struct functor::Slice<GPUDevice, T, 5>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS);
+DEFINE_GPU_KERNELS(int32);
+
+#undef DEFINE_GPU_KERNELS
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/slice_op_test.cc b/tensorflow/core/kernels/slice_op_test.cc
new file mode 100644
index 0000000000..27c78c6dc0
--- /dev/null
+++ b/tensorflow/core/kernels/slice_op_test.cc
@@ -0,0 +1,73 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+// For the benchmark, we set up a 2*kDim x kMaxSize input tensor and slice a
+// kDim x 'size' block out of it, starting at offset (10, 10).
+template <typename T>
+static void SliceHelper(int iters, int size) {
+ testing::StopTiming();
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+ DataType dt = DataTypeToEnum<T>::v();
+ int kDim = 100;
+ int kMaxSize = 15000;
+ CHECK_LT(size, kMaxSize);
+
+ Tensor begin(DT_INT32, TensorShape({2}));
+ begin.flat<int32>()(0) = 10;
+ begin.flat<int32>()(1) = 10;
+
+ Tensor sizes(DT_INT32, TensorShape({2}));
+ sizes.flat<int32>()(0) = kDim;
+ sizes.flat<int32>()(1) = size;
+
+ Tensor input(dt, TensorShape({2 * kDim, kMaxSize}));
+ input.flat<T>().setRandom();
+
+ Node* node;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Slice")
+ .Input(test::graph::Constant(g, input))
+ .Input(test::graph::Constant(g, begin))
+ .Input(test::graph::Constant(g, sizes))
+ .Attr("T", dt)
+ .Finalize(g, &node));
+
+ testing::BytesProcessed(static_cast<int64>(iters) * kDim * size * sizeof(T));
+ testing::StartTiming();
+ test::Benchmark("cpu", g).Run(iters);
+ testing::UseRealTime();
+}
+
+static void BM_SliceFloat(int iters, int dim2) {
+ SliceHelper<float>(iters, dim2);
+}
+
+BENCHMARK(BM_SliceFloat)->Arg(100)->Arg(1000)->Arg(10000);
+
+static void BM_SliceBFloat16(int iters, int dim2) {
+ SliceHelper<bfloat16>(iters, dim2);
+}
+
+BENCHMARK(BM_SliceBFloat16)->Arg(100)->Arg(1000)->Arg(10000);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/softmax_op.cc b/tensorflow/core/kernels/softmax_op.cc
new file mode 100644
index 0000000000..abe6331a4f
--- /dev/null
+++ b/tensorflow/core/kernels/softmax_op.cc
@@ -0,0 +1,62 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/kernels/softmax_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class SoftmaxOp : public OpKernel {
+ public:
+ explicit SoftmaxOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& logits_in = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrix(logits_in.shape()),
+ errors::InvalidArgument("logits must be 2-dimensional"));
+ Tensor* softmax_out = nullptr;
+ OP_REQUIRES_OK(
+ context, context->allocate_output(0, logits_in.shape(), &softmax_out));
+ functor::SoftmaxFunctor<Device, T> functor;
+ functor(context->eigen_device<Device>(), logits_in.matrix<T>(),
+ softmax_out->matrix<T>());
+ }
+};
+
+// Partial specialization for a CPUDevice, that uses the Eigen implementation
+// from SoftmaxEigenImpl.
+namespace functor {
+template <typename T>
+struct SoftmaxFunctor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::Matrix softmax) {
+ SoftmaxEigenImpl<CPUDevice, T>::Compute(d, logits, softmax);
+ }
+};
+} // namespace functor
+
+REGISTER_KERNEL_BUILDER(Name("Softmax")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ SoftmaxOp<CPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("Softmax")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T"),
+ SoftmaxOp<CPUDevice, double>);
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("Softmax")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T"),
+ SoftmaxOp<GPUDevice, float>);
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/softmax_op.h b/tensorflow/core/kernels/softmax_op.h
new file mode 100644
index 0000000000..69bd531b70
--- /dev/null
+++ b/tensorflow/core/kernels/softmax_op.h
@@ -0,0 +1,70 @@
+#ifndef TENSORFLOW_KERNELS_SOFTMAX_OP_H_
+#define TENSORFLOW_KERNELS_SOFTMAX_OP_H_
+// Functor definition for SoftmaxOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by SoftmaxOp to do the computations.
+template <typename Device, typename T>
+struct SoftmaxFunctor {
+ // Computes Softmax activation.
+ //
+ // logits: dim: batch_size, num_classes.
+ // softmax: dims: batch_size, num_classes.
+ void operator()(const Device& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::Matrix softmax);
+};
+
+// Eigen code implementing SoftmaxFunctor::operator().
+// This code works for both CPU and GPU and is used by the functor
+// specializations for both device types.
+template <typename Device, typename T>
+struct SoftmaxEigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::Matrix softmax) {
+ const int kBatchDim = 0;
+ const int kClassDim = 1;
+
+ const int batch_size = logits.dimension(kBatchDim);
+ const int num_classes = logits.dimension(kClassDim);
+
+// These arrays are used to reduce along the class dimension, and broadcast
+// the resulting value to all classes.
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::DSizes<int, 1> along_class(kClassDim);
+ Eigen::DSizes<int, 2> batch_by_one(batch_size, 1);
+ Eigen::DSizes<int, 2> one_by_class(1, num_classes);
+#else
+ Eigen::IndexList<Eigen::type2index<kClassDim> > along_class;
+ Eigen::IndexList<Eigen::type2index<1> > depth_dim;
+ Eigen::IndexList<int, Eigen::type2index<1> > batch_by_one;
+ batch_by_one.set(0, batch_size);
+ Eigen::IndexList<Eigen::type2index<1>, int> one_by_class;
+ one_by_class.set(1, num_classes);
+#endif
+ // NOTE(mdevin): If you modify this implementation please run
+ // the ImageNetSoftmaxFwd benchmark in core_ops_test.cc.
+ //
+ // softmax = exp(logits - max(logits along classes));
+ softmax.device(d) = (logits -
+ logits.maximum(along_class)
+ .eval()
+ .reshape(batch_by_one)
+ .broadcast(one_by_class)).exp();
+ // softmax = softmax / sum(softmax along classes);
+ softmax.device(d) = (softmax /
+ softmax.sum(along_class)
+ .eval()
+ .reshape(batch_by_one)
+ .broadcast(one_by_class));
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_SOFTMAX_OP_H_
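Note: the Eigen expression above is the usual max-shifted softmax: subtract the per-row maximum before exponentiating so the exponentials cannot overflow, then normalize by the row sum. A scalar C++ reference of the same two-pass computation, for illustration only (not the functor itself):

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Row-wise softmax with the max subtracted for numerical stability,
    // matching the reduce-then-broadcast structure of SoftmaxEigenImpl.
    void Softmax(const std::vector<std::vector<double>>& logits,
                 std::vector<std::vector<double>>* out) {
      out->resize(logits.size());
      for (size_t r = 0; r < logits.size(); ++r) {
        const auto& row = logits[r];
        double mx = *std::max_element(row.begin(), row.end());
        double sum = 0.0;
        (*out)[r].resize(row.size());
        for (size_t c = 0; c < row.size(); ++c) {
          (*out)[r][c] = std::exp(row[c] - mx);
          sum += (*out)[r][c];
        }
        for (double& v : (*out)[r]) v /= sum;
      }
    }

    int main() {
      std::vector<std::vector<double>> probs;
      Softmax({{1.0, 2.0, 3.0}}, &probs);
      std::printf("%f\n", probs[0][2]);  // ~0.665
    }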
diff --git a/tensorflow/core/kernels/softmax_op_gpu.cu.cc b/tensorflow/core/kernels/softmax_op_gpu.cu.cc
new file mode 100644
index 0000000000..d5aaf9c364
--- /dev/null
+++ b/tensorflow/core/kernels/softmax_op_gpu.cu.cc
@@ -0,0 +1,31 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/softmax_op.h"
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Partial specialization for a GPUDevice, that uses the Eigen implementation
+// from SoftmaxEigenImpl.
+namespace functor {
+template <typename T>
+struct SoftmaxFunctor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::Matrix softmax) {
+ SoftmaxEigenImpl<GPUDevice, T>::Compute(d, logits, softmax);
+ }
+};
+} // end namespace functor
+
+// Instantiate the GPU implementation for float.
+template struct functor::SoftmaxFunctor<GPUDevice, float>;
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/softplus_op.cc b/tensorflow/core/kernels/softplus_op.cc
new file mode 100644
index 0000000000..b5fb57d3c5
--- /dev/null
+++ b/tensorflow/core/kernels/softplus_op.cc
@@ -0,0 +1,97 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/softplus_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class SoftplusOp : public UnaryElementWiseOp<T, SoftplusOp<Device, T>> {
+ public:
+ using UnaryElementWiseOp<T, SoftplusOp<Device, T>>::UnaryElementWiseOp;
+
+ void Operate(OpKernelContext* context, const Tensor& input, Tensor* output) {
+ functor::Softplus<Device, T> functor;
+ functor(context->eigen_device<Device>(), input.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+template <typename Device, typename T>
+class SoftplusGradOp
+ : public BinaryElementWiseOp<T, SoftplusGradOp<Device, T>> {
+ public:
+ using BinaryElementWiseOp<T, SoftplusGradOp<Device, T>>::BinaryElementWiseOp;
+
+ // INPUTS:
+ // g (gradients): backpropagated gradients
+ // a (inputs): inputs that were passed to SoftplusOp()
+ // OUTPUT:
+ // gradients to backprop
+ template <int NDIMS>
+ void Operate(OpKernelContext* context, const Tensor& g, const Tensor& a,
+ Tensor* output) {
+ OP_REQUIRES(context, a.IsSameSize(g),
+ errors::InvalidArgument("g and a must be the same size"));
+ functor::SoftplusGrad<Device, T> functor;
+ functor(context->eigen_device<Device>(), g.flat<T>(), a.flat<T>(),
+ output->flat<T>());
+ }
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Softplus").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SoftplusOp<CPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SoftplusGrad").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SoftplusGradOp<CPUDevice, type>);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void Softplus<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor activations); \
+ extern template struct Softplus<GPUDevice, T>; \
+ \
+ template <> \
+ void SoftplusGrad<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::ConstTensor gradients, \
+ typename TTypes<T>::ConstTensor features, \
+ typename TTypes<T>::Tensor backprops); \
+ extern template struct SoftplusGrad<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPEC);
+} // namespace functor
+
+// Registration of the GPU implementations.
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Softplus").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ SoftplusOp<GPUDevice, type>); \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SoftplusGrad").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ SoftplusGradOp<GPUDevice, type>);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
+#undef REGISTER_GPU_KERNELS
+
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/softplus_op.h b/tensorflow/core/kernels/softplus_op.h
new file mode 100644
index 0000000000..3545a78246
--- /dev/null
+++ b/tensorflow/core/kernels/softplus_op.h
@@ -0,0 +1,46 @@
+#ifndef TENSORFLOW_KERNELS_SOFTPLUS_OP_H_
+#define TENSORFLOW_KERNELS_SOFTPLUS_OP_H_
+// Functor definition for SoftplusOp and SoftplusGradOp, must be compilable by
+// nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by SoftplusOp to do the computations.
+template <typename Device, typename T>
+struct Softplus {
+ // Computes Softplus activation.
+ //
+ // features: any shape.
+ // activations: same shape as "features".
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor activations) {
+ activations.device(d) =
+ (features > features.constant(30.f))
+ .select(features, (features.exp() + features.constant(1.0f)).log());
+ }
+};
+
+// Functor used by SoftplusGradOp to do the computations.
+template <typename Device, typename T>
+struct SoftplusGrad {
+ // Computes SoftplusGrad backprops.
+ //
+ // gradients: gradients backpropagated to the Softplus op.
+ // features: inputs that were passed to the Softplus op.
+ // backprops: gradients to backpropagate to the Softplus inputs.
+ void operator()(const Device& d, typename TTypes<T>::ConstTensor gradients,
+ typename TTypes<T>::ConstTensor features,
+ typename TTypes<T>::Tensor backprops) {
+ backprops.device(d) =
+ gradients / ((-features).exp() + features.constant(1.0f));
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_SOFTPLUS_OP_H_
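Note: the functors above compute softplus(x) = log(1 + exp(x)), switching to the identity for large x so exp(x) cannot overflow, and the gradient grad * sigmoid(x) = grad / (1 + exp(-x)). A scalar C++ reference, illustrative only:

    #include <cmath>
    #include <cstdio>

    // softplus(x) = log(1 + exp(x)); for large x this is ~x, so pass x through
    // (the float kernel above uses a cutoff of 30).
    double Softplus(double x) {
      return x > 30.0 ? x : std::log(std::exp(x) + 1.0);
    }

    // d/dx softplus(x) = sigmoid(x); backprop scales the incoming gradient.
    double SoftplusGrad(double grad, double x) {
      return grad / (std::exp(-x) + 1.0);
    }

    int main() {
      std::printf("%f %f\n", Softplus(0.0), SoftplusGrad(1.0, 0.0));  // 0.6931 0.5
    }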
diff --git a/tensorflow/core/kernels/softplus_op_gpu.cu.cc b/tensorflow/core/kernels/softplus_op_gpu.cu.cc
new file mode 100644
index 0000000000..7a974321a7
--- /dev/null
+++ b/tensorflow/core/kernels/softplus_op_gpu.cu.cc
@@ -0,0 +1,25 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include "tensorflow/core/kernels/softplus_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Definition of the GPU implementations declared in softplus_op.cc.
+#define DEFINE_GPU_KERNELS(T) \
+ template struct functor::Softplus<GPUDevice, T>; \
+ template struct functor::SoftplusGrad<GPUDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS);
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/sparse_concat_op.cc b/tensorflow/core/kernels/sparse_concat_op.cc
new file mode 100644
index 0000000000..72c267a47d
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_concat_op.cc
@@ -0,0 +1,139 @@
+#define EIGEN_USE_THREADS
+
+#include <algorithm>
+#include <unordered_map>
+#include <utility>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+namespace tensorflow {
+
+template <typename T>
+class SparseConcatOp : public OpKernel {
+ public:
+ explicit SparseConcatOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("concat_dim", &concat_dim_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ OpInputList inds;
+ OP_REQUIRES_OK(context, context->input_list("indices", &inds));
+ const int N = inds.size();
+ for (int i = 0; i < N; i++) {
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrix(inds[i].shape()),
+ errors::InvalidArgument(
+ "Input indices should be a matrix but received shape ",
+ inds[i].shape().DebugString(), " at position ", i));
+ }
+
+ OpInputList vals;
+ OP_REQUIRES_OK(context, context->input_list("values", &vals));
+ OP_REQUIRES(context, vals.size() == N,
+ errors::InvalidArgument("Expected ", N, " input values, got ",
+ vals.size()));
+ for (int i = 0; i < N; i++) {
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(vals[i].shape()),
+ errors::InvalidArgument(
+ "Input values should be a vector but received shape ",
+ vals[i].shape().DebugString(), " at position ", i));
+ }
+
+ OpInputList shapes;
+ OP_REQUIRES_OK(context, context->input_list("shapes", &shapes));
+ OP_REQUIRES(context, shapes.size() == N,
+ errors::InvalidArgument("Expected ", N, " input shapes, got ",
+ shapes.size()));
+ for (int i = 0; i < N; i++) {
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(shapes[i].shape()),
+ errors::InvalidArgument(
+ "Input shapes should be a vector but received shape ",
+ shapes[i].shape().DebugString(), " at position ", i));
+ }
+
+ const TensorShape input_shape(shapes[0].vec<int64>());
+ OP_REQUIRES(
+ context, concat_dim_ >= 0 && concat_dim_ < input_shape.dims(),
+ errors::InvalidArgument("Concat dimension must be between 0 and rank (",
+ input_shape.dims(), "), got ", concat_dim_));
+ for (int i = 1; i < N; ++i) {
+ const TensorShape current_shape(shapes[i].vec<int64>());
+ OP_REQUIRES(context, current_shape.dims() == input_shape.dims(),
+ errors::InvalidArgument(
+ "Ranks of all input tensors must match: expected ",
+ input_shape.dims(), " but got ", current_shape.dims(),
+ " at position ", i));
+ for (int j = 0; j < input_shape.dims(); ++j) {
+ if (j != concat_dim_) {
+ OP_REQUIRES(
+ context, input_shape.dim_size(j) == current_shape.dim_size(j),
+ errors::InvalidArgument(
+ "Input shapes must match: expected ", input_shape.dim_size(j),
+ " for dimension ", j, " but got ", current_shape.dim_size(j),
+ " at position ", i));
+ }
+ }
+ }
+
+ // The input and output sparse tensors are assumed to be ordered along
+ // increasing dimension number. But in order for concat to work properly,
+ // order[0] must be concat_dim. So we will reorder the inputs to the
+ // concat ordering, concatenate, then reorder back to the standard order.
+ // We make a deep copy of the input tensors to ensure that the in-place
+ // reorder doesn't create race conditions for other ops that may be
+ // concurrently reading the indices and values tensors.
+
+ gtl::InlinedVector<int64, 8> std_order(input_shape.dims());
+ std::iota(std_order.begin(), std_order.end(), 0);
+
+ std::vector<int64> concat_order;
+ concat_order.reserve(input_shape.dims());
+ concat_order.push_back(concat_dim_);
+ for (int j = 0; j < input_shape.dims(); ++j) {
+ if (j != concat_dim_) {
+ concat_order.push_back(j);
+ }
+ }
+
+ std::vector<sparse::SparseTensor> sp_inputs;
+ for (int i = 0; i < N; ++i) {
+ const TensorShape current_shape(shapes[i].vec<int64>());
+ sp_inputs.emplace_back(tensor::DeepCopy(inds[i]),
+ tensor::DeepCopy(vals[i]), current_shape,
+ std_order);
+ sp_inputs[i].Reorder<T>(concat_order);
+ }
+
+ sparse::SparseTensor concat = sparse::SparseTensor::Concat<T>(sp_inputs);
+ concat.Reorder<T>(std_order);
+
+ context->set_output(0, concat.indices());
+ context->set_output(1, concat.values());
+
+ Tensor* output_shape_out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 2, TensorShape({concat.shape().dims()}),
+ &output_shape_out));
+ auto output_shape = output_shape_out->vec<int64>();
+ for (int j = 0; j < concat.shape().dims(); ++j) {
+ output_shape(j) = concat.shape().dim_size(j);
+ }
+ }
+
+ private:
+ int concat_dim_;
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SparseConcat").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SparseConcatOp<type>)
+
+TF_CALL_ALL_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+} // namespace tensorflow
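Note: once each input has been reordered so that concat_dim is the primary sort key, concatenating COO tensors amounts to stacking the (indices, values) lists and offsetting the concat_dim coordinate by the cumulative size of the preceding inputs. A standalone C++ sketch of that index bookkeeping (a simplification of sparse::SparseTensor::Concat; type and function names are hypothetical):

    #include <cstdio>
    #include <vector>

    struct Coo {                                    // one sparse tensor in coordinate form
      std::vector<std::vector<long long>> indices;  // [nnz][rank]
      std::vector<float> values;
      std::vector<long long> shape;
    };

    // Concatenate along `dim`, assuming all non-concat dims already match.
    Coo ConcatCoo(const std::vector<Coo>& inputs, int dim) {
      Coo out;
      out.shape = inputs[0].shape;
      out.shape[dim] = 0;
      long long offset = 0;
      for (const Coo& in : inputs) {
        for (size_t i = 0; i < in.values.size(); ++i) {
          auto ix = in.indices[i];
          ix[dim] += offset;               // shift into the output coordinate frame
          out.indices.push_back(ix);
          out.values.push_back(in.values[i]);
        }
        offset += in.shape[dim];
        out.shape[dim] += in.shape[dim];
      }
      return out;
    }

    int main() {
      Coo a{{{0, 1}}, {1.f}, {2, 3}};
      Coo b{{{1, 0}}, {2.f}, {2, 3}};
      Coo c = ConcatCoo({a, b}, 0);        // b's row index 1 becomes 3 (offset by a's dim-0 size)
      std::printf("%lld\n", c.indices[1][0]);
    }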
diff --git a/tensorflow/core/kernels/sparse_matmul_op.cc b/tensorflow/core/kernels/sparse_matmul_op.cc
new file mode 100644
index 0000000000..919e129ff8
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_matmul_op.cc
@@ -0,0 +1,192 @@
+// See docs in ../ops/math_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/port.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename T>
+void PrefetchBlockNTA(const T& tensor, int si, int ei, int sj, int ej) {
+ for (int i = si; i < ei; ++i) {
+ for (int j = sj; j < ej; j = j + 16) {
+ port::prefetch<port::PREFETCH_HINT_NTA>(&tensor(i, j));
+ }
+ }
+}
+
+template <typename T>
+void PrefetchBlockT1(const T& tensor, int si, int ei, int sj, int ej) {
+ for (int i = si; i < ei; ++i) {
+ for (int j = sj; j < ej; j = j + 16) {
+ port::prefetch<port::PREFETCH_HINT_T1>(&tensor(i, j));
+ }
+ }
+}
+
+struct Block {
+ Block(int sm, int em, int sk, int ek, int sn, int en)
+ : startm(sm), endm(em), startk(sk), endk(ek), startn(sn), endn(en) {}
+
+ int startm;
+ int endm;
+ int startk;
+ int endk;
+ int startn;
+ int endn;
+};
+
+bool NextBlock(const int Bm, const int Bk, const int Bn, const int m_start,
+ const int m, const int k, const int n, const Block& b,
+ Block* next) {
+ *next = b;
+ if (b.endk < k) {
+ next->startk = b.endk;
+ next->endk = std::min(b.endk + Bk, k);
+ } else {
+ next->startk = 0;
+ next->endk = std::min(Bk, k);
+ if (b.endm < m) {
+ next->startm = b.endm;
+ next->endm = std::min(b.endm + Bm, m);
+ } else {
+ next->startm = m_start;
+ next->endm = std::min(m_start + Bm, m);
+ next->startn = b.endn;
+ next->endn = std::min(b.endn + Bn, n);
+ }
+ }
+ return next->startn == next->endn;
+}
+
+class SparseMatMulOp : public OpKernel {
+ public:
+ explicit SparseMatMulOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("transpose_a", &transpose_a_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("transpose_b", &transpose_b_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("a_is_sparse", &a_is_sparse_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("b_is_sparse", &b_is_sparse_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& a = ctx->input(0);
+ const Tensor& b = ctx->input(1);
+
+ OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(a.shape()),
+ errors::InvalidArgument("a is not a matrix"));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(b.shape()),
+ errors::InvalidArgument("b is not a matrix"));
+
+ auto left = a.matrix<float>();
+ auto right_mat = b.matrix<float>();
+ const int m = transpose_a_ ? left.dimension(1) : left.dimension(0);
+ const int k = transpose_a_ ? left.dimension(0) : left.dimension(1);
+ const int n =
+ transpose_b_ ? right_mat.dimension(0) : right_mat.dimension(1);
+ const int k2 =
+ transpose_b_ ? right_mat.dimension(1) : right_mat.dimension(0);
+
+ OP_REQUIRES(ctx, k == k2,
+ errors::InvalidArgument("Matrix size incompatible: a: ",
+ a.shape().DebugString(), ", b: ",
+ b.shape().DebugString()));
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(ctx, ctx->allocate_output(0, TensorShape({m, n}), &output));
+ auto out = output->matrix<float>();
+
+ if (!a_is_sparse_) {
+ // Fallback to Eigen contract.
+ // Note that we currently don't optimize the case where only right is
+ // sparse. That can generally be handled by transposing the order of the
+ // matmul.
+ Eigen::array<Eigen::IndexPair<Eigen::DenseIndex>, 1> dim_pair;
+ dim_pair[0].first = transpose_a_ ? 0 : 1;
+ dim_pair[0].second = transpose_b_ ? 1 : 0;
+ out.device(ctx->template eigen_device<CPUDevice>()) =
+ left.contract(right_mat, dim_pair);
+ return;
+ }
+ typedef Eigen::Tensor<float, 2, Eigen::RowMajor> Matrix;
+ std::unique_ptr<Matrix> right_tr_mat;
+ std::unique_ptr<TTypes<float>::ConstMatrix> right_tr_map;
+ if (transpose_b_) {
+ right_tr_mat.reset(new Matrix(k, n));
+ Eigen::array<int, 2> perm({1, 0});
+ right_tr_mat->device(ctx->template eigen_device<CPUDevice>()) =
+ right_mat.shuffle(perm);
+ right_tr_map.reset(new TTypes<float>::ConstMatrix(
+ right_tr_mat->data(), right_tr_mat->dimensions()));
+ }
+ TTypes<float>::ConstMatrix& right =
+ transpose_b_ ? *right_tr_map : right_mat;
+
+ const bool transpose_a = transpose_a_;
+
+ typedef Eigen::TensorMap<Eigen::Tensor<float, 1, Eigen::RowMajor>,
+ Eigen::Unaligned> TensorMap;
+ typedef Eigen::TensorMap<Eigen::Tensor<const float, 1, Eigen::RowMajor>,
+ Eigen::Unaligned> ConstTensorMap;
+ typedef Eigen::DSizes<Eigen::DenseIndex, 1> DSizes;
+ const int Bm = 16;
+ const int Bk = 16;
+ const int Bn = 1024;
+
+ auto work_shard = [m, n, k, transpose_a, Bm, Bk, Bn, &left, &right, &out](
+ int64 start64, int64 end64) {
+ const int start = static_cast<int>(start64);
+ const int end = static_cast<int>(end64);
+ Block curr(start, std::min(start + Bm, end), 0, std::min(Bk, k), 0,
+ std::min(Bn, n));
+ Block next(curr);
+ bool done = false;
+ for (int i = start; i < end; ++i) {
+ out.chip<0>(i).setZero();
+ }
+ while (true) {
+ done = NextBlock(Bm, Bk, Bn, start, end, k, n, curr, &next);
+
+ PrefetchBlockT1(right, curr.startk, curr.endk, curr.startn, curr.endn);
+
+ // Process current block
+ for (int i = curr.startm; i < curr.endm; ++i) {
+ PrefetchBlockNTA(left, i, i + 1, curr.startk, curr.endk);
+ PrefetchBlockNTA(out, i, i + 1, curr.startn, curr.endn);
+ DSizes out_slice_shape(curr.endn - curr.startn);
+ TensorMap out_i(&out(i, curr.startn), out_slice_shape);
+ for (int j = curr.startk; j < curr.endk; ++j) {
+ const float l = transpose_a ? left(j, i) : left(i, j);
+ if (l == 0) continue;
+ ConstTensorMap right_j(&right(j, curr.startn), out_slice_shape);
+ out_i += right_j * l;
+ }
+ }
+ if (done) break;
+ curr = next;
+ }
+ };
+ auto worker_threads = *(ctx->device()->tensorflow_cpu_worker_threads());
+ Shard(worker_threads.num_threads, worker_threads.workers, m, 2 * k * n,
+ work_shard);
+ }
+
+ private:
+ bool transpose_a_;
+ bool transpose_b_;
+ bool a_is_sparse_;
+ bool b_is_sparse_;
+ TF_DISALLOW_COPY_AND_ASSIGN(SparseMatMulOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("SparseMatMul").Device(DEVICE_CPU),
+ SparseMatMulOp);
+
+} // end namespace tensorflow
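Note: stripped of the blocking and prefetching, the inner loop of SparseMatMulOp is a dense-times-dense multiply that simply skips rows of work whenever the left-hand entry is zero. A standalone C++ sketch of that core idea (no blocking, no threading; illustrative only):

    #include <cstdio>
    #include <vector>

    // Row-major dense matrices; multiplies while skipping zero entries of `left`,
    // mirroring the innermost accumulation of SparseMatMulOp.
    std::vector<float> SparseDenseMatMul(const std::vector<float>& left, int m, int k,
                                         const std::vector<float>& right, int n) {
      std::vector<float> out(m * n, 0.f);
      for (int i = 0; i < m; ++i) {
        for (int j = 0; j < k; ++j) {
          const float l = left[i * k + j];
          if (l == 0.f) continue;                   // the "sparse" win
          for (int c = 0; c < n; ++c) out[i * n + c] += l * right[j * n + c];
        }
      }
      return out;
    }

    int main() {
      std::vector<float> a = {1, 0, 0, 2};   // 2x2, half zero
      std::vector<float> b = {1, 2, 3, 4};   // 2x2
      auto c = SparseDenseMatMul(a, 2, 2, b, 2);
      std::printf("%f %f\n", c[0], c[3]);    // 1, 8
    }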
diff --git a/tensorflow/core/kernels/sparse_matmul_op_test.cc b/tensorflow/core/kernels/sparse_matmul_op_test.cc
new file mode 100644
index 0000000000..883d0d1224
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_matmul_op_test.cc
@@ -0,0 +1,139 @@
+#include "tensorflow/core/framework/types.pb.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+random::PhiloxRandom philox(1, 1);
+random::SimplePhilox rnd(&philox);
+
+void Sparsify(Tensor* t, float sparsity) {
+ const int64 N = t->NumElements();
+ CHECK_LE(sparsity, 1);
+ if (sparsity <= 0) return;
+ auto flat = t->flat<float>();
+ static const uint32 K = 10000;
+ for (int64 i = 0; i < N; ++i) {
+ if (rnd.Uniform(K) < sparsity * K) {
+ flat(i) = 0;
+ }
+ }
+}
+
+Node* SparseMatMulNode(Graph* g, Node* in0, Node* in1, bool transpose_a,
+ bool transpose_b, bool a_sparse, bool b_sparse) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "SparseMatMul")
+ .Input(in0)
+ .Input(in1)
+ .Attr("transpose_a", transpose_a)
+ .Attr("transpose_b", transpose_b)
+ .Attr("a_is_sparse", a_sparse)
+ .Attr("b_is_sparse", b_sparse)
+ .Finalize(g, &ret));
+ return ret;
+}
+
+static Graph* SparseMatMulHelper(Graph* g, int m, int n, int d, float sparsity,
+ bool transpose_a, bool transpose_b,
+ bool a_sparse, bool b_sparse) {
+ a_sparse = a_sparse && (sparsity > 0);
+ b_sparse = b_sparse && (sparsity > 0);
+
+ auto left_shape = transpose_a ? TensorShape({d, m}) : TensorShape({m, d});
+ Tensor left(DataTypeToEnum<float>::value, left_shape);
+ left.flat<float>().setRandom();
+ if (a_sparse) {
+ Sparsify(&left, sparsity);
+ }
+
+ auto right_shape = transpose_b ? TensorShape({n, d}) : TensorShape({d, n});
+ Tensor right(DataTypeToEnum<float>::value, right_shape);
+ right.flat<float>().setRandom();
+ if (b_sparse) {
+ Sparsify(&right, sparsity);
+ }
+
+ SparseMatMulNode(g, test::graph::Constant(g, left),
+ test::graph::Constant(g, right), transpose_a, transpose_b,
+ a_sparse, b_sparse);
+ return g;
+}
+
+static Graph* SparseMatMul(int m, int n, int d, float sparsity,
+ bool transpose_a, bool transpose_b) {
+ Graph* g = new Graph(OpRegistry::Global());
+ return SparseMatMulHelper(g, m, n, d, sparsity, transpose_a, transpose_b,
+ true, false);
+}
+
+static Graph* MultiSparseMatMul(int m, int n, int d, float sparsity_a,
+ float sparsity_b) {
+ Graph* g = new Graph(OpRegistry::Global());
+ if (sparsity_a == 0 && sparsity_b > 0) {
+ SparseMatMulHelper(g, m, n, d, sparsity_a, false, false, false, false);
+ SparseMatMulHelper(g, n, d, m, sparsity_b, true, true, true, false);
+ SparseMatMulHelper(g, m, d, n, sparsity_b, false, false, true, false);
+ } else {
+ SparseMatMulHelper(g, m, n, d, sparsity_a, false, true, true, false);
+ SparseMatMulHelper(g, d, n, m, sparsity_a, true, false, true, true);
+ SparseMatMulHelper(g, m, d, n, sparsity_b, false, false, true, false);
+ }
+ return g;
+}
+
+#define BM_SPARSE(M, K, N, S) \
+ static void BM_Sparse##_##M##_##K##_##N##_##S(int iters) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * M * K * N * 2); \
+ std::string label = strings::Printf("%d_%d_%d_%0.2f", M, K, N, S / 100.0); \
+ testing::SetLabel(label); \
+ test::Benchmark("cpu", SparseMatMul(M, N, K, S / 100.0, false, false)) \
+ .Run(iters); \
+ } \
+ BENCHMARK(BM_Sparse##_##M##_##K##_##N##_##S);
+
+BM_SPARSE(2048, 2048, 2048, 0);
+BM_SPARSE(2048, 2048, 2048, 1);
+BM_SPARSE(2048, 2048, 2048, 85);
+
+BM_SPARSE(1024, 1024, 1024, 0);
+BM_SPARSE(1024, 1024, 1024, 1);
+BM_SPARSE(1024, 1024, 1024, 85);
+
+BM_SPARSE(256, 256, 256, 1);
+BM_SPARSE(512, 512, 512, 1);
+
+#define BM_SPARSE_MULTI(M, K, N, S1, S2) \
+ static void BM_Sparse_Multi##_##M##_##K##_##N##_##S1##_##S2(int iters) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * M * K * N * 2 * 3); \
+ std::string label = strings::Printf("%d_%d_%d_%0.2f_%0.2f", M, K, N, \
+ S1 / 100.0, S2 / 100.0); \
+ testing::SetLabel(label); \
+ test::Benchmark("cpu", MultiSparseMatMul(M, N, K, S1 / 100.0, S2 / 100.0)) \
+ .Run(iters); \
+ } \
+ BENCHMARK(BM_Sparse_Multi##_##M##_##K##_##N##_##S1##_##S2);
+
+BM_SPARSE_MULTI(512, 2140, 4096, 0, 82);
+BM_SPARSE_MULTI(512, 4096, 2048, 83, 83);
+
+#define BM_SPARSE_TR(M, K, N, S, TA, TB) \
+ static void BM_Sparse##_##M##_##K##_##N##_##S##_##TA##_##TB(int iters) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * M * K * N * 2); \
+ std::string label = \
+ strings::Printf("%d_%d_%d_%d_%d_%0.2f", M, K, N, TA, TB, S / 100.0); \
+ testing::SetLabel(label); \
+ test::Benchmark("cpu", SparseMatMul(M, N, K, S / 100.0, TA, TB)) \
+ .Run(iters); \
+ } \
+ BENCHMARK(BM_Sparse##_##M##_##K##_##N##_##S##_##TA##_##TB);
+
+BM_SPARSE_TR(2048, 2048, 2048, 1, true, false);
+BM_SPARSE_TR(2048, 2048, 2048, 1, false, true);
+BM_SPARSE_TR(2048, 2048, 2048, 1, true, true);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/sparse_reorder_op.cc b/tensorflow/core/kernels/sparse_reorder_op.cc
new file mode 100644
index 0000000000..fd6824a4e2
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_reorder_op.cc
@@ -0,0 +1,71 @@
+#define EIGEN_USE_THREADS
+
+#include <algorithm>
+#include <unordered_map>
+#include <utility>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_util.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+namespace tensorflow {
+
+template <typename T>
+class SparseReorderOp : public OpKernel {
+ public:
+ explicit SparseReorderOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input_ind = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrix(input_ind.shape()),
+ errors::InvalidArgument(
+ "Input indices should be a matrix but received shape",
+ input_ind.shape().DebugString()));
+
+ const Tensor& input_val = context->input(1);
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(input_val.shape()),
+ errors::InvalidArgument(
+ "Input values should be a vector but received shape",
+ input_val.shape().DebugString()));
+
+ const Tensor& input_shape_in = context->input(2);
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(input_shape_in.shape()),
+ errors::InvalidArgument(
+ "Input shape should be a vector but received shape",
+ input_shape_in.shape().DebugString()));
+
+ const TensorShape input_shape(input_shape_in.vec<int64>());
+
+ gtl::InlinedVector<int64, 8> std_order(input_shape.dims());
+ std::iota(std_order.begin(), std_order.end(), 0);
+
+ // Check if the sparse tensor is already ordered correctly
+ sparse::SparseTensor input_sp(input_ind, input_val, input_shape, std_order);
+
+ if (input_sp.IndicesValid()) {
+ context->set_output(0, input_sp.indices());
+ context->set_output(1, input_sp.values());
+ } else {
+ // Deep-copy the input Tensors, then reorder in-place
+ sparse::SparseTensor reordered_sp(tensor::DeepCopy(input_ind),
+ tensor::DeepCopy(input_val),
+ input_shape);
+ reordered_sp.Reorder<T>(std_order);
+ context->set_output(0, reordered_sp.indices());
+ context->set_output(1, reordered_sp.values());
+ }
+ }
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("SparseReorder").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ SparseReorderOp<type>)
+
+TF_CALL_ALL_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+} // namespace tensorflow
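Note: Reorder<T> above is effectively a lexicographic sort of the coordinate rows, with the values permuted to match. A standalone sketch of that step (function name is hypothetical; illustrative only):

    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Sort COO entries into row-major (lexicographic) index order.
    void ReorderCoo(std::vector<std::vector<long long>>* indices,
                    std::vector<float>* values) {
      std::vector<size_t> perm(values->size());
      std::iota(perm.begin(), perm.end(), 0);
      std::sort(perm.begin(), perm.end(), [&](size_t a, size_t b) {
        return (*indices)[a] < (*indices)[b];  // vector<> compares lexicographically
      });
      std::vector<std::vector<long long>> ix;
      std::vector<float> vals;
      for (size_t p : perm) { ix.push_back((*indices)[p]); vals.push_back((*values)[p]); }
      *indices = std::move(ix);
      *values = std::move(vals);
    }

    int main() {
      std::vector<std::vector<long long>> ix = {{1, 0}, {0, 2}};
      std::vector<float> v = {2.f, 1.f};
      ReorderCoo(&ix, &v);
      std::printf("%lld %f\n", ix[0][0], v[0]);  // 0 1.0
    }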
diff --git a/tensorflow/core/kernels/sparse_to_dense_op.cc b/tensorflow/core/kernels/sparse_to_dense_op.cc
new file mode 100644
index 0000000000..47e91c134d
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_to_dense_op.cc
@@ -0,0 +1,129 @@
+// See core/ops/sparse_ops.cc for documentation.
+//
+// NOTE: the operations in this file are only suitable for execution
+// on CPUs.
+
+#define EIGEN_USE_THREADS
+
+#include <string>
+#include <sstream>
+#include <unordered_map>
+#include <utility>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+namespace tensorflow {
+
+// Operator to convert sparse representations to dense.
+template <typename T, typename Index>
+class SparseToDense : public OpKernel {
+ public:
+ explicit SparseToDense(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* c) override {
+ // sparse_indices
+ const Tensor& indices = c->input(0);
+ OP_REQUIRES(c, indices.dims() <= 2,
+ errors::InvalidArgument(
+ "sparse_indices should be a scalar, vector, or matrix, "
+ "got shape ",
+ indices.shape().ShortDebugString()));
+ const int64 num_elems = indices.dims() > 0 ? indices.dim_size(0) : 1;
+ const int64 num_dims = indices.dims() > 1 ? indices.dim_size(1) : 1;
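+    // For example, an indices input of shape [3, 2] describes num_elems = 3
+    // entries in a num_dims = 2 dimensional output; a vector or scalar indices
+    // input is treated as addressing a 1-D output.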
+
+ // output_shape
+ const Tensor& output_shape = c->input(1);
+ OP_REQUIRES(
+ c, TensorShapeUtils::IsLegacyVector(output_shape.shape()),
+ errors::InvalidArgument("output_shape should be a vector, got shape ",
+ output_shape.shape().ShortDebugString()));
+ OP_REQUIRES(c, output_shape.NumElements() == num_dims,
+ errors::InvalidArgument(
+ "output_shape has incorrect number of elements: ",
+ output_shape.NumElements(), " should be: ", num_dims));
+
+ // sparse_values
+ const Tensor& sparse_values = c->input(2);
+ const int64 num_values = sparse_values.NumElements();
+ OP_REQUIRES(
+ c, sparse_values.dims() == 0 ||
+ (sparse_values.dims() == 1 && num_values == num_elems),
+ errors::InvalidArgument("sparse_values has incorrect shape ",
+ sparse_values.shape().ShortDebugString(),
+ ", should be [] or [", num_elems, "]"));
+
+ // default_value
+ const Tensor& default_value = c->input(3);
+ OP_REQUIRES(c, TensorShapeUtils::IsScalar(default_value.shape()),
+ errors::InvalidArgument("default_value should be a scalar."));
+
+ auto output_shape_vec = output_shape.flat<Index>();
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, TensorShapeUtils::MakeShape(
+ output_shape_vec.data(),
+ output_shape_vec.size()),
+ &output));
+
+ TensorShape ix_shape({num_elems, num_dims});
+ Tensor indices_shaped(DT_INT64, ix_shape);
+ if (indices.dtype() == DT_INT64) {
+ CHECK(indices_shaped.CopyFrom(indices, ix_shape));
+ } else {
+ indices_shaped.matrix<int64>() =
+ indices.shaped<Index, 2>(ix_shape.dim_sizes()).template cast<int64>();
+ }
+
+ // If we received a scalar, we'll need to create a new
+ // tensor with copies of the values as a vec.
+ // TODO(ebrevdo): find a way to avoid this temp allocation.
+ Tensor sparse_values_b;
+
+ if (TensorShapeUtils::IsScalar(sparse_values.shape())) {
+ OP_REQUIRES_OK(
+ c, c->allocate_temp(DataTypeToEnum<T>::value,
+ TensorShape({num_elems}), &sparse_values_b));
+ sparse_values_b.vec<T>().setConstant(sparse_values.scalar<T>()());
+ } else {
+ sparse_values_b = sparse_values;
+ }
+
+ gtl::InlinedVector<int64, 8> order(output->shape().dims());
+ std::iota(order.begin(), order.end(), 0); // Assume order is correct
+ sparse::SparseTensor st(indices_shaped, sparse_values_b, output->shape(),
+ order);
+
+ output->flat<T>().setConstant(default_value.scalar<T>()());
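+    // The output was already filled with default_value above, so ToDense runs
+    // with initialize = false and only writes the sparse entries.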
+ OP_REQUIRES(c, st.template ToDense<T>(output, false /* initialize */),
+ errors::InvalidArgument(
+ "Indices are not valid (out of bounds). Shape: ",
+ output->shape().DebugString()));
+ }
+};
+
+#define REGISTER_KERNELS(type, index_type) \
+ REGISTER_KERNEL_BUILDER(Name("SparseToDense") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<index_type>("Tindices"), \
+ SparseToDense<type, index_type>);
+
+#define REGISTER_KERNELS_ALL(type) \
+ REGISTER_KERNELS(type, int32); \
+ REGISTER_KERNELS(type, int64);
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS_ALL);
+REGISTER_KERNELS_ALL(bool);
+REGISTER_KERNELS_ALL(string);
+
+#undef REGISTER_KERNELS_ALL
+#undef REGISTER_KERNELS
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/sparse_to_dense_op_test.cc b/tensorflow/core/kernels/sparse_to_dense_op_test.cc
new file mode 100644
index 0000000000..e9800ccd68
--- /dev/null
+++ b/tensorflow/core/kernels/sparse_to_dense_op_test.cc
@@ -0,0 +1,283 @@
+#include <functional>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_testutil.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+namespace {
+
+class SparseToDenseTest : public OpsTestBase {
+ protected:
+ void SetUp() override { RequireDefaultOps(); }
+
+ void MakeOp(int dim, DataType index_type, DataType value_type) {
+ ASSERT_OK(NodeDefBuilder("sparsetodense", "SparseToDense")
+ .Input(FakeInput(index_type))
+ .Input(FakeInput(index_type))
+ .Input(FakeInput(value_type))
+ .Input(FakeInput(value_type))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SparseToDenseTest, OneD_OneValue) {
+ MakeOp(1, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>(TensorShape({3}), {1, 3, 4});
+ // output_shape
+ AddInputFromArray<int32>(TensorShape({1}), {5});
+ // sparse_values
+ AddInputFromArray<float>(TensorShape({}), {2});
+ // default_value
+ AddInputFromArray<float>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {5});
+ test::FillValues<float>(&expected, {-2, 2, -2, 2, 2});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, OneD_OneValue_int64_double) {
+ MakeOp(1, DT_INT64, DT_DOUBLE);
+
+ // sparse_indices
+ AddInputFromArray<int64>(TensorShape({3}), {1, 3, 4});
+ // output_shape
+ AddInputFromArray<int64>(TensorShape({1}), {5});
+ // sparse_values
+ AddInputFromArray<double>(TensorShape({}), {2});
+ // default_value
+ AddInputFromArray<double>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_DOUBLE, {5});
+ test::FillValues<double>(&expected, {-2, 2, -2, 2, 2});
+ test::ExpectTensorEqual<double>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, OneD_MultValues) {
+ MakeOp(1, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>({3}, {1, 3, 4});
+ // output_shape
+ AddInputFromArray<int32>({1}, {5});
+ // sparse_values
+ AddInputFromArray<float>({3}, {3, 4, 5});
+ // default_value
+ AddInputFromArray<float>({}, {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {5});
+ test::FillValues<float>(&expected, {-2, 3, -2, 4, 5});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, TwoD_OneValue) {
+ MakeOp(2, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>(TensorShape({3, 2}), {0, 1, 0, 2, 2, 3});
+ // output_shape
+ AddInputFromArray<int32>(TensorShape({2}), {3, 4});
+ // sparse_values
+ AddInputFromArray<float>(TensorShape({}), {2});
+ // default_value
+ AddInputFromArray<float>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {3, 4});
+ expected.flat<float>().setConstant(-2);
+ expected.tensor<float, 2>()(0, 1) = 2;
+ expected.tensor<float, 2>()(0, 2) = 2;
+ expected.tensor<float, 2>()(2, 3) = 2;
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, TwoD_MultValues) {
+ MakeOp(2, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>(TensorShape({3, 2}), {0, 1, 0, 2, 2, 3});
+ // output_shape
+ AddInputFromArray<int32>(TensorShape({2}), {3, 4});
+ // sparse_values
+ AddInputFromArray<float>(TensorShape({3}), {3, 4, 5});
+ // default_value
+ AddInputFromArray<float>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {3, 4});
+ expected.flat<float>().setConstant(-2);
+ expected.tensor<float, 2>()(0, 1) = 3;
+ expected.tensor<float, 2>()(0, 2) = 4;
+ expected.tensor<float, 2>()(2, 3) = 5;
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, ThreeD_OneValue) {
+ MakeOp(3, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>(TensorShape({3, 3}), {0, 1, 1, 0, 2, 0, 2, 3, 1});
+ // output_shape
+ AddInputFromArray<int32>(TensorShape({3}), {3, 4, 2});
+ // sparse_values
+ AddInputFromArray<float>(TensorShape({}), {2});
+ // default_value
+ AddInputFromArray<float>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {3, 4, 2});
+ expected.flat<float>().setConstant(-2);
+ expected.tensor<float, 3>()(0, 1, 1) = 2;
+ expected.tensor<float, 3>()(0, 2, 0) = 2;
+ expected.tensor<float, 3>()(2, 3, 1) = 2;
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(SparseToDenseTest, ThreeD_MultValues) {
+ MakeOp(3, DT_INT32, DT_FLOAT);
+
+ // sparse_indices
+ AddInputFromArray<int32>(TensorShape({3, 3}), {0, 1, 1, 0, 2, 0, 2, 3, 1});
+ // output_shape
+ AddInputFromArray<int32>(TensorShape({3}), {3, 4, 2});
+ // sparse_values
+ AddInputFromArray<float>(TensorShape({3}), {3, 4, 5});
+ // default_value
+ AddInputFromArray<float>(TensorShape({}), {-2});
+
+ ASSERT_OK(RunOpKernel());
+
+ Tensor expected(allocator(), DT_FLOAT, {3, 4, 2});
+ expected.flat<float>().setConstant(-2);
+ expected.tensor<float, 3>()(0, 1, 1) = 3;
+ expected.tensor<float, 3>()(0, 2, 0) = 4;
+ expected.tensor<float, 3>()(2, 3, 1) = 5;
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+} // namespace
+
+static int BM_Arg(int ndim, int n) { return (ndim * 1000000) + n; }
+static int NDIM_from_arg(int bm_arg) { return bm_arg / 1000000; }
+static int N_from_arg(int bm_arg) { return bm_arg % 1000000; }
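+// The benchmark arg packs both parameters into a single int, e.g.
+// BM_Arg(2, 1000) == 2001000, which decodes back to NDIM = 2 and N = 1000.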
+
+static void BM_SparseToDense(int iters, const int bm_arg) {
+ const int NDIM = NDIM_from_arg(bm_arg);
+ const int N = N_from_arg(bm_arg);
+ // TODO(zhifengc): Switch to use kernel_benchmark_testlib.h
+ tensorflow::testing::StopTiming();
+
+ const int IndexDim = (NDIM == 1) ? 0 : 1;
+
+ std::unique_ptr<Device> device(
+ DeviceFactory::NewDevice("CPU", {}, "/job:a/replica:0/task:0"));
+
+ gtl::InlinedVector<TensorValue, 4> inputs;
+
+  // Create inputs for a dense output in which dimension IndexDim has size N
+  // and every other dimension has size 3.
+ Tensor output_shape(DT_INT32, TensorShape({NDIM}));
+ Tensor sparse_indices(DT_INT32, TensorShape({N, NDIM}));
+ Tensor sparse_values(DT_FLOAT, TensorShape({N}));
+ Tensor default_value(DT_FLOAT, TensorShape({}));
+ auto output_shape_t = output_shape.vec<int32>();
+ for (int d = 0; d < NDIM; ++d) {
+ output_shape_t(d) = (d == IndexDim) ? N : 3;
+ }
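+  // E.g. NDIM = 3, N = 100 yields an output shape of {3, 100, 3}, with the N
+  // sparse entries laid out along dimension IndexDim.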
+
+ auto sparse_indices_t = sparse_indices.matrix<int32>();
+ for (int n = 0; n < N; ++n) {
+ for (int d = 0; d < NDIM; ++d)
+ sparse_indices_t(n, d) = (d == IndexDim) ? n : 0;
+ }
+
+ for (auto* ptr :
+ {&sparse_indices, &output_shape, &sparse_values, &default_value}) {
+ inputs.push_back({nullptr, ptr});
+ }
+
+ NodeDef sparse_node_def;
+ TF_CHECK_OK(NodeDefBuilder("sparsetodense", "SparseToDense")
+ .Input(FakeInput(DT_INT32))
+ .Input(FakeInput(DT_INT32))
+ .Input(FakeInput(DT_FLOAT))
+ .Input(FakeInput(DT_FLOAT))
+ .Finalize(&sparse_node_def));
+
+ Status status;
+ std::unique_ptr<OpKernel> op(CreateOpKernel(
+ DEVICE_CPU, device.get(), cpu_allocator(), sparse_node_def, &status));
+
+ OpKernelContext::Params params;
+ params.device = device.get();
+ params.frame_iter = FrameAndIter(0, 0);
+ params.inputs = &inputs;
+ params.op_kernel = op.get();
+ params.output_alloc_attr = [&device, &op, &params](int index) {
+ AllocatorAttributes attr;
+ const bool on_host = (op->output_memory_types()[index] == HOST_MEMORY);
+ attr.set_on_host(on_host);
+ return attr;
+ };
+
+ std::unique_ptr<OpKernelContext> sparse_context(new OpKernelContext(params));
+ op->Compute(sparse_context.get());
+ tensorflow::testing::StartTiming();
+ for (int i = 0; i < iters; ++i) {
+ delete sparse_context->release_output(0).tensor;
+ op->Compute(sparse_context.get());
+ ASSERT_OK(sparse_context->status());
+ }
+ tensorflow::testing::StopTiming();
+
+  // Bytes processed per iteration: mostly reading the N values and the
+  // N * NDIM indices of the sparse input.
+ int64 bytes_per_iter = static_cast<int64>((N + N * NDIM) * sizeof(float));
+
+ tensorflow::testing::BytesProcessed(bytes_per_iter * iters);
+}
+
+BENCHMARK(BM_SparseToDense)
+ ->Arg(BM_Arg(1, 10))
+ ->Arg(BM_Arg(1, 100))
+ ->Arg(BM_Arg(1, 1000))
+ ->Arg(BM_Arg(1, 10000))
+ ->Arg(BM_Arg(2, 10))
+ ->Arg(BM_Arg(2, 100))
+ ->Arg(BM_Arg(2, 1000))
+ ->Arg(BM_Arg(2, 10000))
+ ->Arg(BM_Arg(3, 10))
+ ->Arg(BM_Arg(3, 100))
+ ->Arg(BM_Arg(3, 1000))
+ ->Arg(BM_Arg(3, 10000))
+ ->Arg(BM_Arg(5, 10))
+ ->Arg(BM_Arg(5, 100))
+ ->Arg(BM_Arg(5, 1000))
+ ->Arg(BM_Arg(5, 10000));
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/split_op.cc b/tensorflow/core/kernels/split_op.cc
new file mode 100644
index 0000000000..f4f9ada000
--- /dev/null
+++ b/tensorflow/core/kernels/split_op.cc
@@ -0,0 +1,146 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/split_op.h"
+
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class SplitOp : public OpKernel {
+ public:
+ explicit SplitOp(OpKernelConstruction* c) : OpKernel(c) {}
+
+ void Compute(OpKernelContext* context) override {
+ const int32 split_dim = context->input(0).flat<int32>()(0);
+ const int32 num_split = num_outputs();
+ const Tensor& input = context->input(1);
+ const TensorShape& input_shape = input.shape();
+
+ OP_REQUIRES(
+ context, 0 <= split_dim && split_dim < input_shape.dims(),
+ errors::InvalidArgument("0 <= split_dim < number of input dimensions (",
+ input_shape.dims(), "), but got ", split_dim));
+
+ OP_REQUIRES(
+ context, num_split > 0,
+ errors::InvalidArgument(
+ "Number of ways to split should be > 0, but got ", num_split));
+
+ OP_REQUIRES(context, input_shape.dim_size(split_dim) % num_split == 0,
+ errors::InvalidArgument(
+ "Number of ways to split should evenly divide the split "
+ "dimension, but got split_dim ",
+ split_dim, " (size = ", input_shape.dim_size(split_dim),
+ ") ", "and num_split ", num_split));
+
+ // Special case 1: num_split == 1. Nothing to do.
+ if (num_split == 1) {
+ VLOG(1) << "Split identity";
+ context->set_output(0, context->input(1));
+ return;
+ }
+
+ // Special case 2: split along the 1st dimension. We can share the
+ // underlying buffer.
+ //
+ // Apply this optimization conservatively: if input is aligned,
+ // the resulting tensors must be aligned. It's conservative
+    // because if the immediate consumers of the resulting tensors are
+    // not using Eigen for computation, it's perfectly fine to avoid
+    // the copy.
+ if ((split_dim == 0) && IsInnerDimsSizeAligned<T>(input_shape)) {
+ VLOG(1) << "Slice dim 0: " << input_shape.DebugString();
+ const int64 delta = input_shape.dim_size(0) / num_split;
+ for (int i = 0; i < num_split; ++i) {
+ context->set_output(i, input.Slice(i * delta, (i + 1) * delta));
+ }
+ return;
+ }
+
+ int32 prefix_dim_size = 1;
+ for (int i = 0; i < split_dim; ++i) {
+ prefix_dim_size *= input_shape.dim_size(i);
+ }
+
+ int32 split_dim_size = input_shape.dim_size(split_dim);
+
+ int32 suffix_dim_size = 1;
+ for (int i = split_dim + 1; i < input_shape.dims(); ++i) {
+ suffix_dim_size *= input_shape.dim_size(i);
+ }
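+    // Worked example: an input of shape [2, 6, 4, 5] with split_dim = 1 and
+    // num_split = 3 gives prefix_dim_size = 2, split_dim_size = 6 and
+    // suffix_dim_size = 20; the input is viewed as a [2, 6, 20] tensor and
+    // each of the three outputs is a [2, 2, 20] slice along the middle
+    // dimension (i.e. shape [2, 2, 4, 5] in the original layout).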
+
+ auto input_reshaped =
+ input.shaped<T, 3>({prefix_dim_size, split_dim_size, suffix_dim_size});
+
+ const int32 split_dim_output_size = split_dim_size / num_split;
+ TensorShape output_shape(input_shape);
+ output_shape.set_dim(split_dim, split_dim_output_size);
+
+ Eigen::DSizes<ptrdiff_t, 3> indices{0, 0, 0};
+ Eigen::DSizes<ptrdiff_t, 3> sizes{prefix_dim_size, split_dim_output_size,
+ suffix_dim_size};
+
+ for (int i = 0; i < num_split; ++i) {
+ Tensor* result = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(i, output_shape, &result));
+ if (prefix_dim_size * split_dim_output_size * suffix_dim_size > 0) {
+ Eigen::DSizes<ptrdiff_t, 3> slice_indices;
+ Eigen::DSizes<ptrdiff_t, 3> slice_sizes;
+ for (int j = 0; j < 3; ++j) {
+ slice_indices[j] = indices[j];
+ slice_sizes[j] = sizes[j];
+ }
+
+ auto result_shaped = result->shaped<T, 3>(
+ {prefix_dim_size, split_dim_output_size, suffix_dim_size});
+
+ functor::Split<Device, T>()(context->eigen_device<Device>(),
+ result_shaped, input_reshaped,
+ slice_indices, slice_sizes);
+ }
+ indices[1] += split_dim_output_size;
+ }
+ }
+};
+
+#define REGISTER_SPLIT(type) \
+ REGISTER_KERNEL_BUILDER(Name("Split") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("split_dim"), \
+ SplitOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_SPLIT);
+
+#undef REGISTER_SPLIT
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("Split") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T") \
+ .HostMemory("split_dim"), \
+ SplitOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+#undef REGISTER_GPU
+
+#endif // GOOGLE_CUDA
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/split_op.h b/tensorflow/core/kernels/split_op.h
new file mode 100644
index 0000000000..2572c77285
--- /dev/null
+++ b/tensorflow/core/kernels/split_op.h
@@ -0,0 +1,31 @@
+#ifndef TENSORFLOW_KERNELS_SPLIT_OP_H_
+#define TENSORFLOW_KERNELS_SPLIT_OP_H_
+// Functor definition for SplitOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+struct Split {
+ void operator()(const Device& d, typename TTypes<T, 3>::Tensor output,
+ typename TTypes<T, 3>::ConstTensor input,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_indices,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_sizes);
+};
+
+template <typename T>
+struct Split<Eigen::ThreadPoolDevice, T> {
+ void operator()(const Eigen::ThreadPoolDevice& d,
+ typename TTypes<T, 3>::Tensor output,
+ typename TTypes<T, 3>::ConstTensor input,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_indices,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_sizes);
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_SPLIT_OP_H_
diff --git a/tensorflow/core/kernels/split_op_cpu.cc b/tensorflow/core/kernels/split_op_cpu.cc
new file mode 100644
index 0000000000..b86deeb8fb
--- /dev/null
+++ b/tensorflow/core/kernels/split_op_cpu.cc
@@ -0,0 +1,30 @@
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/split_op.h"
+
+#include "tensorflow/core/framework/numeric_types.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename T>
+void Split<Eigen::ThreadPoolDevice, T>::operator()(
+ const Eigen::ThreadPoolDevice& d, typename TTypes<T, 3>::Tensor output,
+ typename TTypes<T, 3>::ConstTensor input,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_indices,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_sizes) {
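+  // Small outputs (< 128K elements) are copied inline on the calling thread;
+  // larger slices go through the Eigen thread-pool device, presumably because
+  // parallelizing the copy only pays off above that size.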
+ if (output.size() < 131072) {
+ output = input.slice(slice_indices, slice_sizes);
+ } else {
+ output.device(d) = input.slice(slice_indices, slice_sizes);
+ }
+}
+
+#define DEFINE_CPU_KERNELS(T) template struct Split<Eigen::ThreadPoolDevice, T>;
+
+TF_CALL_ALL_TYPES(DEFINE_CPU_KERNELS)
+
+} // namespace functor
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/split_op_gpu.cu.cc b/tensorflow/core/kernels/split_op_gpu.cu.cc
new file mode 100644
index 0000000000..f8931d6a89
--- /dev/null
+++ b/tensorflow/core/kernels/split_op_gpu.cu.cc
@@ -0,0 +1,31 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include <stdio.h>
+
+#include "tensorflow/core/kernels/split_op.h"
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T>
+void Split<Device, T>::operator()(
+ const Device& d, typename TTypes<T, 3>::Tensor output,
+ typename TTypes<T, 3>::ConstTensor input,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_indices,
+ const Eigen::DSizes<ptrdiff_t, 3>& slice_sizes) {
+ output.device(d) = input.slice(slice_indices, slice_sizes);
+}
+
+#define DEFINE_GPU_KERNELS(T) template struct Split<Eigen::GpuDevice, T>;
+
+TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_KERNELS);
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/string_to_hash_bucket_op.cc b/tensorflow/core/kernels/string_to_hash_bucket_op.cc
new file mode 100644
index 0000000000..bd6fa47268
--- /dev/null
+++ b/tensorflow/core/kernels/string_to_hash_bucket_op.cc
@@ -0,0 +1,47 @@
+#include <string>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+class StringToHashBucketOp : public OpKernel {
+ public:
+ explicit StringToHashBucketOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("num_buckets", &num_buckets_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor* input_tensor;
+ OP_REQUIRES_OK(context, context->input("string_tensor", &input_tensor));
+ const auto& input_flat = input_tensor->flat<string>();
+
+ Tensor* output_tensor = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output("output", input_tensor->shape(),
+ &output_tensor));
+ auto output_flat = output_tensor->flat<int64>();
+
+ for (int i = 0; i < input_flat.size(); ++i) {
+ const uint64 input_hash = Hash64(input_flat(i));
+ const uint64 bucket_id = input_hash % num_buckets_;
+      // The number of buckets is always in the positive range of int64, so the
+      // resulting bucket_id is as well. Casting bucket_id from uint64 to int64
+      // is therefore safe.
+ output_flat(i) = static_cast<int64>(bucket_id);
+ }
+ }
+
+ private:
+ int64 num_buckets_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(StringToHashBucketOp);
+};
+
+REGISTER_KERNEL_BUILDER(Name("StringToHashBucket").Device(DEVICE_CPU),
+ StringToHashBucketOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/string_to_number_op.cc b/tensorflow/core/kernels/string_to_number_op.cc
new file mode 100644
index 0000000000..8d23a4fdf8
--- /dev/null
+++ b/tensorflow/core/kernels/string_to_number_op.cc
@@ -0,0 +1,71 @@
+// See docs in ../ops/parse_ops.cc.
+
+#include <errno.h>
+#include <string>
+
+#include "tensorflow/core/framework/kernel_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+static constexpr char kErrorMessage[] =
+ "StringToNumberOp could not correctly convert string: ";
+
+template <typename OutputType>
+class StringToNumberOp : public OpKernel {
+ public:
+ using OpKernel::OpKernel;
+
+ void Compute(OpKernelContext* context) override {
+    // This is not a deep copy of the input tensor; input_tensor and the op's
+    // input share the same underlying storage.
+ const Tensor* input_tensor;
+ OP_REQUIRES_OK(context, context->input("string_tensor", &input_tensor));
+ const auto& input_flat = input_tensor->flat<string>();
+
+ Tensor* output_tensor = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output("output", input_tensor->shape(),
+ &output_tensor));
+ auto output_flat = output_tensor->flat<OutputType>();
+
+ for (int i = 0; i < input_flat.size(); ++i) {
+ const char* s = input_flat(i).data();
+ Convert(s, &output_flat(i), context);
+ }
+ }
+
+ private:
+ void Convert(const char* s, OutputType* output_data,
+ OpKernelContext* context);
+};
+
+template <>
+void StringToNumberOp<float>::Convert(const char* s, float* output_data,
+ OpKernelContext* context) {
+ OP_REQUIRES(context, strings::safe_strtof(s, output_data),
+ errors::InvalidArgument(kErrorMessage, s));
+}
+
+template <>
+void StringToNumberOp<int32>::Convert(const char* s, int32* output_data,
+ OpKernelContext* context) {
+ OP_REQUIRES(context, strings::safe_strto32(s, output_data),
+ errors::InvalidArgument(kErrorMessage, s));
+}
+
+// Registers the currently supported output types.
+#define REGISTER(type) \
+ REGISTER_KERNEL_BUILDER(Name("StringToNumber") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("out_type"), \
+ StringToNumberOp<type>)
+REGISTER(float);
+REGISTER(int32);
+#undef REGISTER
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/summary_image_op.cc b/tensorflow/core/kernels/summary_image_op.cc
new file mode 100644
index 0000000000..ba765f2e84
--- /dev/null
+++ b/tensorflow/core/kernels/summary_image_op.cc
@@ -0,0 +1,169 @@
+// Operators that deal with SummaryProtos (encoded as DT_STRING tensors) as
+// inputs or outputs in various ways.
+
+// See docs in ../ops/summary_ops.cc.
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/summary.pb.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/png/png_io.h"
+
+namespace tensorflow {
+
+class SummaryImageOp : public OpKernel {
+ public:
+ explicit SummaryImageOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("max_images", &max_images_));
+ const TensorProto* proto;
+ OP_REQUIRES_OK(context, context->GetAttr("bad_color", &proto));
+ OP_REQUIRES_OK(context, context->device()->MakeTensorFromProto(
+ *proto, AllocatorAttributes(), &bad_color_));
+ OP_REQUIRES(context, bad_color_.dtype() == DT_UINT8,
+ errors::InvalidArgument("bad_color must be uint8, got ",
+ DataTypeString(bad_color_.dtype())));
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(bad_color_.shape()),
+ errors::InvalidArgument("bad_color must be a vector, got shape ",
+ bad_color_.shape().ShortDebugString()));
+ }
+
+ void Compute(OpKernelContext* c) override {
+ const Tensor& tags = c->input(0);
+ const Tensor& tensor = c->input(1);
+ OP_REQUIRES(c, TensorShapeUtils::IsLegacyScalar(tags.shape()),
+                errors::InvalidArgument("Tags must be a scalar"));
+ OP_REQUIRES(c, tensor.dims() == 4 &&
+ (tensor.dim_size(3) == 1 || tensor.dim_size(3) == 3 ||
+ tensor.dim_size(3) == 4),
+ errors::InvalidArgument(
+ "Tensor must be 4-D with last dim 1, 3, or 4, not ",
+ tensor.shape().DebugString()));
+ const string& base_tag = tags.scalar<string>()();
+
+ const int batch_size = tensor.dim_size(0);
+ const int h = tensor.dim_size(1);
+ const int w = tensor.dim_size(2);
+ const int hw = h * w; // Compact these two dims for simplicity
+ const int depth = tensor.dim_size(3);
+ auto tensor_eigen = tensor.shaped<float, 3>({batch_size, hw, depth});
+
+ OP_REQUIRES(c, bad_color_.dim_size(0) >= depth,
+ errors::InvalidArgument(
+ "expected depth <= bad_color.size, got depth = ", depth,
+ ", bad_color.size = ", bad_color_.dim_size(0)));
+ auto bad_color_full = bad_color_.vec<uint8>();
+ typename TTypes<uint8>::Vec bad_color(bad_color_full.data(), depth);
+
+ // RGB (or gray or RGBA) is last dimension
+ Eigen::Tensor<uint8, 2, Eigen::RowMajor> image(hw, depth);
+
+ Summary s;
+ const int N = std::min<int>(max_images_, batch_size);
+ for (int i = 0; i < N; ++i) {
+ Summary::Value* v = s.add_value();
+      // The tag depends on the number of requested images (not the number
+      // produced).
+ //
+ // Note that later on avisu uses "/" to figure out a consistent naming
+ // convention for display, so we append "/image" to guarantee that the
+ // image(s) won't be displayed in the global scope with no name.
+ if (max_images_ > 1) {
+ v->set_tag(strings::StrCat(base_tag, "/image/", i));
+ } else {
+ v->set_tag(strings::StrCat(base_tag, "/image"));
+ }
+
+ if (image.size()) {
+ typename TTypes<float>::ConstMatrix values(
+ &tensor_eigen(i, 0, 0),
+ Eigen::DSizes<Eigen::DenseIndex, 2>(hw, depth));
+
+ // Rescale the image to uint8 range.
+ //
+        // We are trying to generate an RGB image from a float tensor. We do
+ // not have any info about the expected range of values in the tensor
+ // but the generated image needs to have all RGB values within [0, 255].
+ //
+ // We use two different algorithms to generate these values. If the
+ // tensor has only positive values we scale them all by 255/max(values).
+ // If the tensor has both negative and positive values we scale them by
+ // the max of their absolute values and center them around 127.
+ //
+        // This works for most cases, but has the inconvenience of not respecting
+ // the relative dynamic range across different instances of the tensor.
+
+ // Compute min and max ignoring nonfinite pixels
+ float image_min = std::numeric_limits<float>::infinity();
+ float image_max = -image_min;
+ for (int i = 0; i < hw; i++) {
+ bool finite = true;
+ for (int j = 0; j < depth; j++) {
+ if (!std::isfinite(values(i, j))) {
+ finite = false;
+ break;
+ }
+ }
+ if (finite) {
+ for (int j = 0; j < depth; j++) {
+ float value = values(i, j);
+ image_min = std::min(image_min, value);
+ image_max = std::max(image_max, value);
+ }
+ }
+ }
+
+ // Pick an affine transform into uint8
+ const float kZeroThreshold = 1e-6;
+ float scale, offset;
+ if (image_min < 0) {
+ float max_val = std::max(std::abs(image_min), std::abs(image_max));
+ scale = max_val < kZeroThreshold ? 0.0f : 127.0f / max_val;
+ offset = 128.0f;
+ } else {
+ scale = image_max < kZeroThreshold ? 0.0f : 255.0f / image_max;
+ offset = 0.0f;
+ }
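+        // Example: image_min = -1, image_max = 4 picks scale = 127 / 4 and
+        // offset = 128, mapping 0 -> 128 and 4 -> 255; an all-positive image
+        // with image_max = 2 picks scale = 255 / 2 and offset = 0, mapping
+        // 2 -> 255.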
+
+ // Transform image, turning nonfinite values to bad_color
+ for (int i = 0; i < hw; i++) {
+ bool finite = true;
+ for (int j = 0; j < depth; j++) {
+ if (!std::isfinite(values(i, j))) {
+ finite = false;
+ break;
+ }
+ }
+ if (finite) {
+ image.chip<0>(i) =
+ (values.chip<0>(i) * scale + offset).cast<uint8>();
+ } else {
+ image.chip<0>(i) = bad_color;
+ }
+ }
+ }
+
+ Summary::Image* si = v->mutable_image();
+ si->set_height(h);
+ si->set_width(w);
+ si->set_colorspace(depth);
+ OP_REQUIRES(c, png::WriteImageToBuffer(
+ image.data(), w, h, w * depth, depth, 8, -1,
+ si->mutable_encoded_image_string(), nullptr),
+ errors::Internal("PNG encoding failed"));
+ }
+
+ Tensor* summary_tensor = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, TensorShape({}), &summary_tensor));
+ CHECK(s.SerializeToString(&summary_tensor->scalar<string>()()));
+ }
+
+ private:
+ int64 max_images_;
+ Tensor bad_color_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("ImageSummary").Device(DEVICE_CPU),
+ SummaryImageOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/summary_image_op_test.cc b/tensorflow/core/kernels/summary_image_op_test.cc
new file mode 100644
index 0000000000..ddfeeffc0b
--- /dev/null
+++ b/tensorflow/core/kernels/summary_image_op_test.cc
@@ -0,0 +1,141 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/summary.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/histogram/histogram.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+namespace {
+
+static void EXPECT_SummaryMatches(const Summary& actual,
+ const string& expected_str) {
+ Summary expected;
+ CHECK(protobuf::TextFormat::ParseFromString(expected_str, &expected));
+ EXPECT_EQ(expected.DebugString(), actual.DebugString());
+}
+
+// --------------------------------------------------------------------------
+// SummaryImageOp
+// --------------------------------------------------------------------------
+class SummaryImageOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(int max_images) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "ImageSummary")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Attr("max_images", max_images)
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+
+ void CheckAndRemoveEncodedImages(Summary* summary) {
+ for (int i = 0; i < summary->value_size(); ++i) {
+ Summary::Value* value = summary->mutable_value(i);
+ ASSERT_TRUE(value->has_image()) << "No image for value: " << value->tag();
+ ASSERT_FALSE(value->image().encoded_image_string().empty())
+ << "No encoded_image_string for value: " << value->tag();
+ if (VLOG_IS_ON(2)) {
+ // When LOGGING, output the images to disk for manual inspection.
+ TF_CHECK_OK(WriteStringToFile(
+ Env::Default(), strings::StrCat("/tmp/", value->tag(), ".png"),
+ value->image().encoded_image_string()));
+ }
+ value->mutable_image()->clear_encoded_image_string();
+ }
+ }
+};
+
+TEST_F(SummaryImageOpTest, ThreeGrayImagesOutOfFive4dInput) {
+ MakeOp(3 /* max images */);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({}), {"tag"});
+ AddInputFromArray<float>(TensorShape({5, 2, 1, 1}),
+ {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+
+ CheckAndRemoveEncodedImages(&summary);
+ EXPECT_SummaryMatches(summary, R"(
+ value { tag: 'tag/image/0' image { width: 1 height: 2 colorspace: 1} }
+ value { tag: 'tag/image/1' image { width: 1 height: 2 colorspace: 1} }
+ value { tag: 'tag/image/2' image { width: 1 height: 2 colorspace: 1} }
+ )");
+}
+
+TEST_F(SummaryImageOpTest, OneGrayImage4dInput) {
+ MakeOp(1 /* max images */);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({}), {"tag"});
+ AddInputFromArray<float>(TensorShape({5 /*batch*/, 2, 1, 1 /*depth*/}),
+ {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+
+ CheckAndRemoveEncodedImages(&summary);
+ EXPECT_SummaryMatches(summary, R"(
+ value { tag: 'tag/image' image { width: 1 height: 2 colorspace: 1} })");
+}
+
+TEST_F(SummaryImageOpTest, OneColorImage4dInput) {
+ MakeOp(1 /* max images */);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({}), {"tag"});
+ AddInputFromArray<float>(
+ TensorShape({1 /*batch*/, 5 /*rows*/, 2 /*columns*/, 3 /*depth*/}),
+ {
+ /* r0, c0, RGB */ 1.0, 0.1, 0.2,
+ /* r0, c1, RGB */ 1.0, 0.3, 0.4,
+ /* r1, c0, RGB */ 0.0, 1.0, 0.0,
+ /* r1, c1, RGB */ 0.0, 1.0, 0.0,
+ /* r2, c0, RGB */ 0.0, 0.0, 1.0,
+ /* r2, c1, RGB */ 0.0, 0.0, 1.0,
+ /* r3, c0, RGB */ 1.0, 1.0, 0.0,
+ /* r3, c1, RGB */ 1.0, 0.0, 1.0,
+ /* r4, c0, RGB */ 1.0, 1.0, 0.0,
+ /* r4, c1, RGB */ 1.0, 0.0, 1.0,
+ });
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+
+ CheckAndRemoveEncodedImages(&summary);
+ EXPECT_SummaryMatches(summary, R"(
+ value { tag: 'tag/image' image { width: 2 height: 5 colorspace: 3} })");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/summary_op.cc b/tensorflow/core/kernels/summary_op.cc
new file mode 100644
index 0000000000..1c4be64b8b
--- /dev/null
+++ b/tensorflow/core/kernels/summary_op.cc
@@ -0,0 +1,141 @@
+// Operators that deal with SummaryProtos (encoded as DT_STRING tensors) as
+// inputs or outputs in various ways.
+
+// See docs in ../ops/summary_ops.cc.
+
+#include <unordered_set>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/framework/summary.pb.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/histogram/histogram.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+template <typename T>
+class SummaryScalarOp : public OpKernel {
+ public:
+ explicit SummaryScalarOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* c) override {
+ const Tensor& tags = c->input(0);
+ const Tensor& values = c->input(1);
+
+ OP_REQUIRES(c, tags.IsSameSize(values) ||
+ (TensorShapeUtils::IsLegacyScalar(tags.shape()) &&
+ TensorShapeUtils::IsLegacyScalar(values.shape())),
+ errors::InvalidArgument("tags and values not the same shape: ",
+ tags.shape().ShortDebugString(), " != ",
+ values.shape().ShortDebugString()));
+ auto Ttags = tags.flat<string>();
+ auto Tvalues = values.flat<T>();
+ Summary s;
+ for (int i = 0; i < Ttags.size(); i++) {
+ Summary::Value* v = s.add_value();
+ v->set_tag(Ttags(i));
+ v->set_simple_value(Tvalues(i));
+ }
+
+ Tensor* summary_tensor = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, TensorShape({}), &summary_tensor));
+ CHECK(s.SerializeToString(&summary_tensor->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ScalarSummary")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ SummaryScalarOp<float>);
+REGISTER_KERNEL_BUILDER(Name("ScalarSummary")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T"),
+ SummaryScalarOp<double>);
+
+class SummaryHistoOp : public OpKernel {
+ public:
+ // SummaryHistoOp could be extended to take a list of custom bucket
+ // boundaries as an option.
+ explicit SummaryHistoOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* c) override {
+ const Tensor& tags = c->input(0);
+ const Tensor& values = c->input(1);
+ const auto flat = values.flat<float>();
+ OP_REQUIRES(c, TensorShapeUtils::IsLegacyScalar(tags.shape()),
+ errors::InvalidArgument("tags must be scalar"));
+ // Build histogram of values in "values" tensor
+ histogram::Histogram histo;
+ for (int64 i = 0; i < flat.size(); i++) {
+ float v = flat(i);
+ if (!std::isfinite(v)) {
+ c->SetStatus(
+ errors::OutOfRange("Nan in summary histogram for: ", name()));
+ break;
+ }
+ histo.Add(v);
+ }
+
+ Summary s;
+ Summary::Value* v = s.add_value();
+ v->set_tag(tags.scalar<string>()());
+ histo.EncodeToProto(v->mutable_histo(), false /* Drop zero buckets */);
+
+ Tensor* summary_tensor = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, TensorShape({}), &summary_tensor));
+ CHECK(s.SerializeToString(&summary_tensor->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("HistogramSummary").Device(DEVICE_CPU),
+ SummaryHistoOp);
+
+struct HistogramResource : public ResourceBase {
+ histogram::ThreadSafeHistogram histogram;
+
+  string DebugString() override { return "A histogram summary. Stats ..."; }
+};
+
+class SummaryMergeOp : public OpKernel {
+ public:
+ explicit SummaryMergeOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* c) override {
+ Summary s;
+ std::unordered_set<string> tags;
+ for (int input_num = 0; input_num < c->num_inputs(); input_num++) {
+ const Tensor& in = c->input(input_num);
+ auto in_vec = in.flat<string>();
+ for (int i = 0; i < in_vec.dimension(0); i++) {
+ const string& s_in = in_vec(i);
+ Summary summary_in;
+ if (!ParseProtoUnlimited(&summary_in, s_in)) {
+ c->SetStatus(errors::InvalidArgument(
+ "Could not parse one of the summary inputs"));
+ return;
+ }
+
+ for (int v = 0; v < summary_in.value_size(); v++) {
+ if (!tags.insert(summary_in.value(v).tag()).second) {
+ c->SetStatus(errors::InvalidArgument(
+ strings::StrCat("Duplicate tag ", summary_in.value(v).tag(),
+ " found in summary inputs")));
+ return;
+ }
+ *s.add_value() = summary_in.value(v);
+ }
+ }
+ }
+
+ Tensor* summary_tensor = nullptr;
+ OP_REQUIRES_OK(c, c->allocate_output(0, TensorShape({}), &summary_tensor));
+ CHECK(s.SerializeToString(&summary_tensor->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("MergeSummary").Device(DEVICE_CPU),
+ SummaryMergeOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/summary_op_test.cc b/tensorflow/core/kernels/summary_op_test.cc
new file mode 100644
index 0000000000..fd271a6862
--- /dev/null
+++ b/tensorflow/core/kernels/summary_op_test.cc
@@ -0,0 +1,282 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/summary.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/histogram/histogram.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+namespace {
+
+static void EXPECT_SummaryMatches(const Summary& actual,
+ const string& expected_str) {
+ Summary expected;
+ CHECK(protobuf::TextFormat::ParseFromString(expected_str, &expected));
+ EXPECT_EQ(expected.DebugString(), actual.DebugString());
+}
+
+class SummaryScalarOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType dt) {
+ RequireDefaultOps();
+ ASSERT_OK(NodeDefBuilder("myop", "ScalarSummary")
+ .Input(FakeInput())
+ .Input(FakeInput(dt))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SummaryScalarOpTest, SimpleFloat) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({3}), {"tag1", "tag2", "tag3"});
+ AddInputFromArray<float>(TensorShape({3}), {1.0, -0.73, 10000.0});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+ EXPECT_SummaryMatches(summary, R"(
+ value { tag: 'tag1' simple_value: 1.0 }
+ value { tag: 'tag2' simple_value: -0.73 }
+ value { tag: 'tag3' simple_value: 10000.0 }
+ )");
+}
+
+TEST_F(SummaryScalarOpTest, SimpleDouble) {
+ MakeOp(DT_DOUBLE);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({3}), {"tag1", "tag2", "tag3"});
+ AddInputFromArray<double>(TensorShape({3}), {1.0, -0.73, 10000.0});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+ EXPECT_SummaryMatches(summary, R"(
+ value { tag: 'tag1' simple_value: 1.0 }
+ value { tag: 'tag2' simple_value: -0.73 }
+ value { tag: 'tag3' simple_value: 10000.0 }
+ )");
+}
+
+TEST_F(SummaryScalarOpTest, Error_MismatchedSize) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({2}), {"tag1", "tag2"});
+ AddInputFromArray<float>(TensorShape({3}), {1.0, -0.73, 10000.0});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("not the same shape")) << s;
+}
+
+TEST_F(SummaryScalarOpTest, Error_WrongDimsTags) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({2, 1}), {"tag1", "tag2"});
+ AddInputFromArray<float>(TensorShape({2}), {1.0, -0.73});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(
+ StringPiece(s.ToString()).contains("tags and values not the same shape"))
+ << s;
+}
+
+TEST_F(SummaryScalarOpTest, Error_WrongDimsValues) {
+ MakeOp(DT_FLOAT);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({2}), {"tag1", "tag2"});
+ AddInputFromArray<float>(TensorShape({2, 1}), {1.0, -0.73});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(
+ StringPiece(s.ToString()).contains("tags and values not the same shape"))
+ << s;
+}
+
+// --------------------------------------------------------------------------
+// SummaryHistoOp
+// --------------------------------------------------------------------------
+class SummaryHistoOpTest : public OpsTestBase {
+ protected:
+ void MakeOp() {
+ ASSERT_OK(NodeDefBuilder("myop", "HistogramSummary")
+ .Input(FakeInput())
+ .Input(FakeInput())
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SummaryHistoOpTest, Simple) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({}), {"taghisto"});
+ AddInputFromArray<float>(TensorShape({3, 2}), {0.1, -0.7, 4.1, 4., 5., 4.});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+ ASSERT_EQ(summary.value_size(), 1);
+ EXPECT_EQ(summary.value(0).tag(), "taghisto");
+ histogram::Histogram histo;
+ EXPECT_TRUE(histo.DecodeFromProto(summary.value(0).histo()));
+ EXPECT_EQ(
+ "Count: 6 Average: 2.7500 StdDev: 2.20\n"
+ "Min: -0.7000 Median: 3.9593 Max: 5.0000\n"
+ "------------------------------------------------------\n"
+ "[ -0.76, -0.69 ) 1 16.667% 16.667% ###\n"
+ "[ 0.093, 0.1 ) 1 16.667% 33.333% ###\n"
+ "[ 3.8, 4.2 ) 3 50.000% 83.333% ##########\n"
+ "[ 4.6, 5.1 ) 1 16.667% 100.000% ###\n",
+ histo.ToString());
+}
+
+TEST_F(SummaryHistoOpTest, Error_WrongDimsTags) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({2, 1}), {"tag1", "tag2"});
+ AddInputFromArray<float>(TensorShape({2}), {1.0, -0.73});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("tags must be scalar")) << s;
+}
+
+TEST_F(SummaryHistoOpTest, Error_TooManyTagValues) {
+ MakeOp();
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({2}), {"tag1", "tag2"});
+ AddInputFromArray<float>(TensorShape({2, 1}), {1.0, -0.73});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("tags must be scalar")) << s;
+}
+
+// --------------------------------------------------------------------------
+// SummaryMergeOp
+// --------------------------------------------------------------------------
+class SummaryMergeOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(int num_inputs) {
+ ASSERT_OK(NodeDefBuilder("myop", "MergeSummary")
+ .Input(FakeInput(num_inputs))
+ .Finalize(node_def()));
+ ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(SummaryMergeOpTest, Simple) {
+ MakeOp(1);
+
+ // Feed and run
+ Summary s1;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag1\" simple_value: 1.0 } "
+ "value { tag: \"tag2\" simple_value: -0.73 } ",
+ &s1));
+ Summary s2;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag3\" simple_value: 10000.0 }", &s2));
+ Summary s3;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag4\" simple_value: 11.0 }", &s3));
+
+ AddInputFromArray<string>(
+ TensorShape({3}),
+ {s1.SerializeAsString(), s2.SerializeAsString(), s3.SerializeAsString()});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+
+ EXPECT_SummaryMatches(summary,
+ "value { tag: \"tag1\" simple_value: 1.0 } "
+ "value { tag: \"tag2\" simple_value: -0.73 } "
+ "value { tag: \"tag3\" simple_value: 10000.0 }"
+ "value { tag: \"tag4\" simple_value: 11.0 }");
+}
+
+TEST_F(SummaryMergeOpTest, Simple_MultipleInputs) {
+ MakeOp(3);
+
+ // Feed and run
+ Summary s1;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag1\" simple_value: 1.0 } "
+ "value { tag: \"tag2\" simple_value: -0.73 } ",
+ &s1));
+ Summary s2;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag3\" simple_value: 10000.0 }", &s2));
+ Summary s3;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag4\" simple_value: 11.0 }", &s3));
+
+ AddInputFromArray<string>(TensorShape({}), {s1.SerializeAsString()});
+ AddInputFromArray<string>(TensorShape({}), {s2.SerializeAsString()});
+ AddInputFromArray<string>(TensorShape({}), {s3.SerializeAsString()});
+ ASSERT_OK(RunOpKernel());
+
+ // Check the output size.
+ Tensor* out_tensor = GetOutput(0);
+ ASSERT_EQ(0, out_tensor->dims());
+ Summary summary;
+ ParseProtoUnlimited(&summary, out_tensor->scalar<string>()());
+
+ EXPECT_SummaryMatches(summary,
+ "value { tag: \"tag1\" simple_value: 1.0 } "
+ "value { tag: \"tag2\" simple_value: -0.73 } "
+ "value { tag: \"tag3\" simple_value: 10000.0 }"
+ "value { tag: \"tag4\" simple_value: 11.0 }");
+}
+
+TEST_F(SummaryMergeOpTest, Error_MismatchedSize) {
+ MakeOp(1);
+
+ // Feed and run
+ Summary s1;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tag1\" simple_value: 1.0 } "
+ "value { tag: \"tagduplicate\" simple_value: -0.73 } ",
+ &s1));
+ Summary s2;
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "value { tag: \"tagduplicate\" simple_value: 1.0 } ", &s2));
+ AddInputFromArray<string>(TensorShape({2}),
+ {s1.SerializeAsString(), s2.SerializeAsString()});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("Duplicate tag")) << s;
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/text_line_reader_op.cc b/tensorflow/core/kernels/text_line_reader_op.cc
new file mode 100644
index 0000000000..51e4d6a2b8
--- /dev/null
+++ b/tensorflow/core/kernels/text_line_reader_op.cc
@@ -0,0 +1,99 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <memory>
+#include "tensorflow/core/framework/reader_op_kernel.h"
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/inputbuffer.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+class TextLineReader : public ReaderBase {
+ public:
+ TextLineReader(const string& node_name, int skip_header_lines, Env* env)
+ : ReaderBase(strings::StrCat("TextLineReader '", node_name, "'")),
+ skip_header_lines_(skip_header_lines),
+ env_(env),
+ line_number_(0) {}
+
+ Status OnWorkStartedLocked() override {
+ line_number_ = 0;
+ RandomAccessFile* file = nullptr;
+ TF_RETURN_IF_ERROR(env_->NewRandomAccessFile(current_work(), &file));
+ input_buffer_.reset(new io::InputBuffer(file, kBufferSize));
+ for (; line_number_ < skip_header_lines_; ++line_number_) {
+ string line_contents;
+ Status status = input_buffer_->ReadLine(&line_contents);
+ if (errors::IsOutOfRange(status)) {
+ // We ignore an end of file error when skipping header lines.
+ // We will end up skipping this file.
+ return Status::OK();
+ }
+ TF_RETURN_IF_ERROR(status);
+ }
+ return Status::OK();
+ }
+
+ Status OnWorkFinishedLocked() override {
+ input_buffer_.reset(nullptr);
+ return Status::OK();
+ }
+
+ Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) override {
+ Status status = input_buffer_->ReadLine(value);
+ ++line_number_;
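+    // Keys are "<filename>:<line number>", with the line number 1-based and
+    // counting any skipped header lines.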
+ if (status.ok()) {
+ *key = strings::StrCat(current_work(), ":", line_number_);
+ *produced = true;
+ return status;
+ }
+ if (errors::IsOutOfRange(status)) { // End of file, advance to the next.
+ *at_end = true;
+ return Status::OK();
+ } else { // Some other reading error
+ return status;
+ }
+ }
+
+ Status ResetLocked() override {
+ line_number_ = 0;
+ input_buffer_.reset(nullptr);
+ return ReaderBase::ResetLocked();
+ }
+
+ // TODO(josh11b): Implement serializing and restoring the state. Need
+ // to create TextLineReaderState proto to store ReaderBaseState,
+ // line_number_, and input_buffer_->Tell().
+
+ private:
+ enum { kBufferSize = 256 << 10 /* 256 kB */ };
+ const int skip_header_lines_;
+ Env* const env_;
+ int64 line_number_;
+ std::unique_ptr<io::InputBuffer> input_buffer_;
+};
+
+class TextLineReaderOp : public ReaderOpKernel {
+ public:
+ explicit TextLineReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ int skip_header_lines = -1;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("skip_header_lines", &skip_header_lines));
+ OP_REQUIRES(context, skip_header_lines >= 0,
+                errors::InvalidArgument("skip_header_lines must be >= 0, not ",
+ skip_header_lines));
+ Env* env = context->env();
+ SetReaderFactory([this, skip_header_lines, env]() {
+ return new TextLineReader(name(), skip_header_lines, env);
+ });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU),
+ TextLineReaderOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/tf_record_reader_op.cc b/tensorflow/core/kernels/tf_record_reader_op.cc
new file mode 100644
index 0000000000..551be18d5f
--- /dev/null
+++ b/tensorflow/core/kernels/tf_record_reader_op.cc
@@ -0,0 +1,76 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <memory>
+#include "tensorflow/core/framework/reader_op_kernel.h"
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/record_reader.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+class TFRecordReader : public ReaderBase {
+ public:
+ TFRecordReader(const string& node_name, Env* env)
+ : ReaderBase(strings::StrCat("TFRecordReader '", node_name, "'")),
+ env_(env),
+ offset_(0) {}
+
+ Status OnWorkStartedLocked() override {
+ offset_ = 0;
+ RandomAccessFile* file = nullptr;
+ TF_RETURN_IF_ERROR(env_->NewRandomAccessFile(current_work(), &file));
+ file_.reset(file);
+ reader_.reset(new io::RecordReader(file));
+ return Status::OK();
+ }
+
+ Status OnWorkFinishedLocked() override {
+ reader_.reset(nullptr);
+ file_.reset(nullptr);
+ return Status::OK();
+ }
+
+ Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) override {
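+    // The key records the byte offset at which this record starts; ReadRecord
+    // below advances offset_ past the record for the next call.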
+ *key = strings::StrCat(current_work(), ":", offset_);
+ Status status = reader_->ReadRecord(&offset_, value);
+ if (errors::IsOutOfRange(status)) {
+ *at_end = true;
+ return Status::OK();
+ }
+ if (!status.ok()) return status;
+ *produced = true;
+ return Status::OK();
+ }
+
+ Status ResetLocked() override {
+ offset_ = 0;
+ reader_.reset(nullptr);
+ file_.reset(nullptr);
+ return ReaderBase::ResetLocked();
+ }
+
+ // TODO(josh11b): Implement serializing and restoring the state.
+
+ private:
+ Env* const env_;
+ uint64 offset_;
+ std::unique_ptr<RandomAccessFile> file_;
+ std::unique_ptr<io::RecordReader> reader_;
+};
+
+class TFRecordReaderOp : public ReaderOpKernel {
+ public:
+ explicit TFRecordReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ Env* env = context->env();
+ SetReaderFactory([this, env]() { return new TFRecordReader(name(), env); });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TFRecordReader").Device(DEVICE_CPU),
+ TFRecordReaderOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/tile_ops.cc b/tensorflow/core/kernels/tile_ops.cc
new file mode 100644
index 0000000000..d5e0e89d60
--- /dev/null
+++ b/tensorflow/core/kernels/tile_ops.cc
@@ -0,0 +1,460 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#ifdef GOOGLE_CUDA
+#define EIGEN_USE_GPU
+#endif // GOOGLE_CUDA
+
+#include "tensorflow/core/kernels/tile_ops.h"
+
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// --------------------------------------------------------------------------
+template <typename Device>
+class TileOp : public OpKernel {
+ public:
+ explicit TileOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& multiples = context->input(1);
+
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsLegacyVector(multiples.shape()),
+ errors::InvalidArgument("Expected multiples to be 1-D, but got shape ",
+ multiples.shape().ShortDebugString()));
+ OP_REQUIRES(context, input.dims() == multiples.NumElements(),
+ errors::InvalidArgument(
+ "Expected multiples argument to be a vector of length ",
+ input.dims(), " but got length ", multiples.dim_size(0)));
+
+ const int input_dims = input.dims();
+ const gtl::ArraySlice<int32> multiples_array(multiples.flat<int32>().data(),
+ input_dims);
+
+ TensorShape output_shape;
+ for (int i = 0; i < input_dims; ++i) {
+ OP_REQUIRES(
+ context, multiples_array[i] > 0,
+ errors::InvalidArgument("Expected multiples[", i, "] > 0, but got ",
+ multiples_array[i]));
+ output_shape.AddDim(input.dim_size(i) * multiples_array[i]);
+ }
+ Tensor* result = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &result));
+
+#define HANDLE_DIM(DT, NDIM) \
+ if (context->input(0).dtype() == DT && input_dims == NDIM) { \
+ HandleCase<DT, NDIM>(context, multiples_array, result); \
+ return; \
+ }
+
+#define HANDLE_TYPE(T) \
+ HANDLE_DIM(T, 0) \
+ HANDLE_DIM(T, 1) \
+ HANDLE_DIM(T, 2) \
+ HANDLE_DIM(T, 3) \
+ HANDLE_DIM(T, 4) \
+ HANDLE_DIM(T, 5)
+
+ HANDLE_TYPE(DT_BOOL);
+ HANDLE_TYPE(DT_FLOAT);
+ HANDLE_TYPE(DT_DOUBLE);
+ HANDLE_TYPE(DT_UINT8);
+ HANDLE_TYPE(DT_INT32);
+ HANDLE_TYPE(DT_INT16);
+ HANDLE_TYPE(DT_INT64);
+ HANDLE_TYPE(DT_STRING); // when DEVICE=CPUDevice.
+
+#undef HANDLE_TYPE
+#undef HANDLE_DIM
+
+ OP_REQUIRES(context, false,
+ errors::Unimplemented(
+ "TileOp : Unhandled input dimensions, DT : ",
+ context->input(0).dtype(), ", dims : ", input_dims));
+ }
+
+ private:
+ template <DataType DT, int NDIM>
+ void HandleCaseImpl(OpKernelContext* context,
+ const gtl::ArraySlice<int32>& multiples_array,
+ Tensor* result) {
+ typedef typename EnumToDataType<DT>::Type T;
+ Eigen::array<int32, NDIM> broadcast_array;
+ for (int i = 0; i < NDIM; ++i) {
+ broadcast_array[i] = multiples_array[i];
+ }
+ functor::Tile<Device, T, NDIM>()(
+ context->eigen_device<Device>(), result->tensor<T, NDIM>(),
+ context->input(0).tensor<T, NDIM>(), broadcast_array);
+ }
+
+ template <DataType DT, int NDIM>
+ void HandleCase(OpKernelContext* context,
+ const gtl::ArraySlice<int32>& multiples_array,
+ Tensor* result);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(TileOp);
+};
+
+template <typename Device>
+template <DataType DT, int NDIM>
+inline void TileOp<Device>::HandleCase(
+ OpKernelContext* context, const gtl::ArraySlice<int32>& multiples_array,
+ Tensor* result) {
+ LOG(FATAL) << "TileOp: Invalid combination of Device, DT and NDIM: "
+ << typeid(Device).name() << ", " << DataTypeString(DT) << ", "
+ << NDIM;
+}
+
+#define HANDLE_CASE(device, dtype, ndim) \
+ template <> \
+ template <> \
+ void TileOp<device>::HandleCase<dtype, ndim>( \
+ OpKernelContext * context, \
+ const gtl::ArraySlice<int32>& multiples_array, Tensor* result) { \
+ HandleCaseImpl<dtype, ndim>(context, multiples_array, result); \
+ }
+
+#define HANDLE_CASE_DIM_POSITIVE(device, dtype) \
+ HANDLE_CASE(device, dtype, 1); \
+ HANDLE_CASE(device, dtype, 2); \
+ HANDLE_CASE(device, dtype, 3); \
+ HANDLE_CASE(device, dtype, 4); \
+ HANDLE_CASE(device, dtype, 5);
+
+#define HANDLE_CASE_DIM(device, dtype) \
+ HANDLE_CASE(device, dtype, 0); \
+ HANDLE_CASE_DIM_POSITIVE(device, dtype);
+
+HANDLE_CASE_DIM(CPUDevice, DT_BOOL);
+HANDLE_CASE_DIM(CPUDevice, DT_FLOAT);
+HANDLE_CASE_DIM(CPUDevice, DT_DOUBLE);
+HANDLE_CASE_DIM(CPUDevice, DT_UINT8);
+HANDLE_CASE_DIM(CPUDevice, DT_INT32);
+HANDLE_CASE_DIM(CPUDevice, DT_INT16);
+HANDLE_CASE_DIM(CPUDevice, DT_INT64);
+HANDLE_CASE_DIM(CPUDevice, DT_STRING);
+
+#if GOOGLE_CUDA
+// Eigen on GPU does not handle 0-dimension data types yet.
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_FLOAT);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_DOUBLE);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT16);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT32);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT64);
+#endif // GOOGLE_CUDA
+
+#undef HANDLE_CASE_DIM_POSITIVE
+#undef HANDLE_CASE_DIM
+#undef HANDLE_CASE
+
+// --------------------------------------------------------------------------
+template <typename Device>
+class TileGradientOp : public OpKernel {
+ public:
+ explicit TileGradientOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ const Tensor& multiples = context->input(1);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsLegacyVector(multiples.shape()),
+ errors::InvalidArgument("Expected multiples to be 1-D, but got shape ",
+ multiples.shape().ShortDebugString()));
+ OP_REQUIRES(context, input.dims() == multiples.NumElements(),
+ errors::InvalidArgument(
+ "Expected multiples argument to be a vector of length ",
+ input.dims(), " but got length ", multiples.dim_size(0)));
+
+ const int input_dims = input.dims();
+ const gtl::ArraySlice<int32> multiples_array(multiples.flat<int32>().data(),
+ input_dims);
+
+ TensorShape output_shape;
+ std::vector<int32> input_dim_size_vec;
+ for (int i = 0; i < input_dims; ++i) {
+ OP_REQUIRES(
+ context, multiples_array[i] > 0,
+ errors::InvalidArgument("Expected multiples[", i, "] > 0, but got ",
+ multiples_array[i]));
+ OP_REQUIRES(context, input.dim_size(i) % multiples_array[i] == 0,
+ errors::InvalidArgument("Expected input_dim[", i,
+ "] to be divisible by multiples[", i,
+ "], but ", input.dim_size(i), " % ",
+ multiples_array[i], " != 0"));
+ output_shape.AddDim(input.dim_size(i) / multiples_array[i]);
+ input_dim_size_vec.push_back(input.dim_size(i));
+ }
+ Tensor* result = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &result));
+
+#define HANDLE_DIM(DT, NDIM) \
+ if (context->input(0).dtype() == DT && input_dims == NDIM) { \
+ HandleCase<DT, NDIM>(context, input_dim_size_vec, multiples_array, \
+ result); \
+ return; \
+ }
+
+#define HANDLE_TYPE(T) \
+ HANDLE_DIM(T, 0) \
+ HANDLE_DIM(T, 1) \
+ HANDLE_DIM(T, 2) \
+ HANDLE_DIM(T, 3) \
+ HANDLE_DIM(T, 4) \
+ HANDLE_DIM(T, 5)
+
+ HANDLE_TYPE(DT_FLOAT);
+ HANDLE_TYPE(DT_DOUBLE);
+ HANDLE_TYPE(DT_INT32);
+ HANDLE_TYPE(DT_INT16);
+ HANDLE_TYPE(DT_INT64);
+
+#undef HANDLE_TYPE
+#undef HANDLE_DIM
+
+ OP_REQUIRES(context, false,
+ errors::Unimplemented(
+ "TileGradientOp : Unhandled input dimensions, DT : ",
+ context->input(0).dtype(), ", dims : ", input_dims));
+ }
+
+ private:
+ template <DataType DT, int NDIM>
+ void HandleCase(OpKernelContext* context,
+ const std::vector<int32>& input_dims,
+ const gtl::ArraySlice<int32>& multiples_array,
+ Tensor* result);
+
+ template <DataType DT, int NDIM>
+ void HandleCaseImpl(OpKernelContext* context,
+ const std::vector<int32>& input_dims,
+ const gtl::ArraySlice<int32>& multiples_array,
+ Tensor* result) {
+ typedef typename EnumToDataType<DT>::Type T;
+
+ bool reduction_only = true;
+ std::vector<int> reduction_dims;
+
+ for (int i = 0; i < NDIM; ++i) {
+ if (input_dims[i] > multiples_array[i] && multiples_array[i] > 1) {
+ reduction_only = false;
+ break;
+ } else {
+ if (multiples_array[i] == input_dims[i]) {
+ reduction_dims.push_back(i);
+ }
+ }
+ }
+
+ if (reduction_only) {
+#define HANDLE_DIM(D) \
+ if (reduction_dims.size() == (D)) { \
+ HandleReduce<T, NDIM, (D)>(context, reduction_dims, result); \
+ return; \
+ }
+ // NOTE(keveman): Handling the most common case here.
+ // Adding more cases here would require more templating and code
+ // explosion. For instance, HANDLE_DIM(2) wouldn't make sense for NDIM=1.
+ HANDLE_DIM(NDIM > 0 ? 1 : 0);
+
+// Fall through to the unoptimized version.
+#undef HANDLE_DIM
+ }
+
+ Eigen::DSizes<ptrdiff_t, NDIM> indices;
+ Eigen::DSizes<ptrdiff_t, NDIM> sizes;
+
+ // Accumulate slices along the dimensions into the output. The number of
+ // slices along dimension 'i' is simply the multiple along dimension 'i'
+ // passed to the original Tile op.
+ for (int i = 0; i < NDIM; ++i) {
+ sizes[i] = input_dims[i] / multiples_array[i];
+ indices[i] = 0;
+ }
+
+ bool first = true;
+ while (true) {
+ functor::TileGrad<Device, T, NDIM>()(
+ context->eigen_device<Device>(), result->tensor<T, NDIM>(),
+ context->input(0).tensor<T, NDIM>(), indices, sizes, first);
+ first = false;
+ // Increment the begin indices.
+ int i = 0;
+ while (i < NDIM && indices[i] / sizes[i] == multiples_array[i] - 1) {
+ indices[i] = 0;
+ ++i;
+ }
+ // We are finished if we have iterated to the maximum along all
+ // dimensions.
+ if (i == NDIM) {
+ break;
+ }
+ indices[i] += sizes[i];
+ }
+ }
+
+ template <typename T, int NDIM, int REDUCENDIM>
+ void HandleReduce(OpKernelContext* context,
+ const std::vector<int32>& reduce_dim_in, Tensor* result) {
+ static_assert(NDIM >= REDUCENDIM, "Too many reduced dimensions");
+ Eigen::DSizes<ptrdiff_t, REDUCENDIM> reduce_dim;
+ Eigen::DSizes<ptrdiff_t, NDIM> reshape_dim;
+
+ for (int i = 0; i < REDUCENDIM; ++i) {
+ reduce_dim[i] = reduce_dim_in[i];
+ }
+
+ for (int i = 0; i < NDIM; ++i) {
+ reshape_dim[i] = result->dim_size(i);
+ }
+
+ functor::ReduceAndReshape<Device, T, NDIM, REDUCENDIM>()(
+ context->eigen_device<Device>(), result->tensor<T, NDIM>(),
+ context->input(0).tensor<T, NDIM>(), reduce_dim, reshape_dim);
+ }
+
+ TF_DISALLOW_COPY_AND_ASSIGN(TileGradientOp);
+};
+
+template <typename Device>
+template <DataType DT, int NDIM>
+inline void TileGradientOp<Device>::HandleCase(
+ OpKernelContext* context, const std::vector<int32>& input_dims,
+ const gtl::ArraySlice<int32>& multiples_array, Tensor* result) {
+ LOG(FATAL) << "TileGradientOp: Invalid combination of Device, DT and NDIM: "
+ << typeid(Device).name() << ", " << DataTypeString(DT) << ", "
+ << NDIM;
+}
+
+#define HANDLE_CASE(device, dtype, ndim) \
+ template <> \
+ template <> \
+ void TileGradientOp<device>::HandleCase<dtype, ndim>( \
+ OpKernelContext * context, const std::vector<int32>& input_dims, \
+ const gtl::ArraySlice<int32>& multiples_array, Tensor* result) { \
+ HandleCaseImpl<dtype, ndim>(context, input_dims, multiples_array, result); \
+ }
+
+#define HANDLE_CASE_DIM_POSITIVE(device, dtype) \
+ HANDLE_CASE(device, dtype, 1); \
+ HANDLE_CASE(device, dtype, 2); \
+ HANDLE_CASE(device, dtype, 3); \
+ HANDLE_CASE(device, dtype, 4); \
+ HANDLE_CASE(device, dtype, 5);
+
+#define HANDLE_CASE_DIM(device, dtype) \
+ HANDLE_CASE(device, dtype, 0); \
+ HANDLE_CASE_DIM_POSITIVE(device, dtype);
+
+HANDLE_CASE_DIM(CPUDevice, DT_FLOAT);
+HANDLE_CASE_DIM(CPUDevice, DT_DOUBLE);
+HANDLE_CASE_DIM(CPUDevice, DT_INT16);
+HANDLE_CASE_DIM(CPUDevice, DT_INT32);
+HANDLE_CASE_DIM(CPUDevice, DT_INT64);
+
+#if GOOGLE_CUDA
+// Eigen on GPU does not handle 0-dimension data types yet.
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_FLOAT);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_DOUBLE);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT16);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT32);
+HANDLE_CASE_DIM_POSITIVE(GPUDevice, DT_INT64);
+#endif // GOOGLE_CUDA
+
+#undef HANDLE_CASE_DIM_POSITIVE
+#undef HANDLE_CASE_DIM
+#undef HANDLE_CASE
+
+REGISTER_KERNEL_BUILDER(Name("Tile").Device(DEVICE_CPU).HostMemory("multiples"),
+ TileOp<CPUDevice>);
+REGISTER_KERNEL_BUILDER(Name("TileGrad")
+ .Device(DEVICE_CPU)
+ .HostMemory("multiples"),
+ TileGradientOp<CPUDevice>);
+
+#if GOOGLE_CUDA
+#define DEFINE_GPU_TYPE(T) \
+ DEFINE_GPU_DIM(T, 1) \
+ DEFINE_GPU_DIM(T, 2) \
+ DEFINE_GPU_DIM(T, 3) \
+ DEFINE_GPU_DIM(T, 4) \
+ DEFINE_GPU_DIM(T, 5)
+
+#define DEFINE_GPU_DIM(T, NDIM) \
+ template <> \
+ void Tile<GPUDevice, T, NDIM>::operator()( \
+ const GPUDevice& d, typename TTypes<T, NDIM>::Tensor out, \
+ typename TTypes<T, NDIM>::ConstTensor in, \
+ const Eigen::array<int32, NDIM>& broadcast_array) const; \
+ extern template struct Tile<GPUDevice, T, NDIM>; \
+ template <> \
+ void TileGrad<GPUDevice, T, NDIM>::operator()( \
+ const GPUDevice& d, typename TTypes<T, NDIM>::Tensor out, \
+ typename TTypes<T, NDIM>::ConstTensor in, \
+ const Eigen::DSizes<ptrdiff_t, NDIM>& indices, \
+ const Eigen::DSizes<ptrdiff_t, NDIM>& sizes, bool first) const; \
+ extern template struct TileGrad<GPUDevice, T, NDIM>; \
+ template <> \
+ void ReduceAndReshape<GPUDevice, T, NDIM, 1>::operator()( \
+ const GPUDevice& d, typename TTypes<T, NDIM>::Tensor out, \
+ typename TTypes<T, NDIM>::ConstTensor in, \
+ const Eigen::DSizes<ptrdiff_t, 1>& reduce_dim, \
+ const Eigen::DSizes<ptrdiff_t, NDIM>& reshape_dim) const; \
+ extern template struct ReduceAndReshape<GPUDevice, T, NDIM, 1>;
+
+namespace functor {
+DEFINE_GPU_TYPE(float);
+DEFINE_GPU_TYPE(double);
+DEFINE_GPU_TYPE(int64);
+DEFINE_GPU_TYPE(int32);
+DEFINE_GPU_TYPE(int16);
+} // end namespace functor
+
+#undef DEFINE_GPU_DIM
+#undef DEFINE_GPU_TYPE
+
+REGISTER_KERNEL_BUILDER(Name("Tile")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("multiples"),
+ TileOp<GPUDevice>);
+REGISTER_KERNEL_BUILDER(Name("Tile")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<double>("T")
+ .HostMemory("multiples"),
+ TileOp<GPUDevice>);
+REGISTER_KERNEL_BUILDER(Name("Tile")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int16>("T")
+ .HostMemory("multiples"),
+ TileOp<GPUDevice>);
+
+REGISTER_KERNEL_BUILDER(Name("TileGrad")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T")
+ .HostMemory("multiples"),
+ TileGradientOp<GPUDevice>);
+REGISTER_KERNEL_BUILDER(Name("TileGrad")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<double>("T")
+ .HostMemory("multiples"),
+ TileGradientOp<GPUDevice>);
+REGISTER_KERNEL_BUILDER(Name("TileGrad")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<int16>("T")
+ .HostMemory("multiples"),
+ TileGradientOp<GPUDevice>);
+#endif // GOOGLE_CUDA
+} // namespace tensorflow
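
A note on the gradient TileGradientOp computes, read off the code above: for y = Tile(x, m), where x has size s_k along dimension k and m_k is the corresponding multiple, the gradient with respect to x sums the incoming gradient over all m_1 * ... * m_d shifted slices:

\frac{\partial L}{\partial x_{i_1,\dots,i_d}} \;=\; \sum_{r_1=0}^{m_1-1} \cdots \sum_{r_d=0}^{m_d-1} \frac{\partial L}{\partial y_{\,i_1 + r_1 s_1,\ \dots,\ i_d + r_d s_d}}

This is exactly the slice-and-accumulate loop in HandleCaseImpl. When every dimension has either m_k = 1 or s_k = 1 (the multiple equals the incoming gradient's dimension), the sum collapses to an axis reduction followed by a reshape, which is the HandleReduce fast path referenced in the NOTE above.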
diff --git a/tensorflow/core/kernels/tile_ops.h b/tensorflow/core/kernels/tile_ops.h
new file mode 100644
index 0000000000..b3cc6165e0
--- /dev/null
+++ b/tensorflow/core/kernels/tile_ops.h
@@ -0,0 +1,48 @@
+#ifndef TENSORFLOW_KERNELS_TILE_OPS_H_
+#define TENSORFLOW_KERNELS_TILE_OPS_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T, int NDIM>
+struct Tile {
+ void operator()(const Device& d, typename TTypes<T, NDIM>::Tensor out,
+ typename TTypes<T, NDIM>::ConstTensor in,
+ const Eigen::array<int32, NDIM>& broadcast_array) const {
+ out.device(d) = in.broadcast(broadcast_array);
+ }
+};
+
+template <typename Device, typename T, int NDIM>
+struct TileGrad {
+ void operator()(const Device& d, typename TTypes<T, NDIM>::Tensor out,
+ typename TTypes<T, NDIM>::ConstTensor in,
+ const Eigen::DSizes<ptrdiff_t, NDIM>& indices,
+ const Eigen::DSizes<ptrdiff_t, NDIM>& sizes,
+ bool first) const {
+ if (first) {
+ out.device(d) = in.slice(indices, sizes);
+ } else {
+ out.device(d) += in.slice(indices, sizes);
+ }
+ }
+};
+
+template <typename Device, typename T, int NDIM, int REDUCEDNDIM>
+struct ReduceAndReshape {
+ void operator()(const Device& d, typename TTypes<T, NDIM>::Tensor out,
+ typename TTypes<T, NDIM>::ConstTensor in,
+ const Eigen::DSizes<ptrdiff_t, REDUCEDNDIM>& reduce_dim,
+ const Eigen::DSizes<ptrdiff_t, NDIM>& reshape_dim) const {
+ out.device(d) = in.sum(reduce_dim).reshape(reshape_dim);
+ }
+};
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_TILE_OPS_H_
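
The Tile functor above is a thin wrapper over Eigen's broadcast(). A minimal standalone sketch of the same expression with plain Eigen tensors (the include path and main() are illustrative, not TensorFlow code):

// Standalone sketch of Eigen's broadcast(), which functor::Tile delegates to.
// Assumes a plain Eigen 3 checkout; the include path is not the TF one.
#include <iostream>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> in(2, 3);
  in.setValues({{1.f, 2.f, 3.f}, {4.f, 5.f, 6.f}});

  // Repeat the tensor twice along each dimension, like Tile with
  // multiples = {2, 2}; the result has shape 4 x 6.
  Eigen::array<Eigen::Index, 2> multiples{{2, 2}};
  Eigen::Tensor<float, 2> out = in.broadcast(multiples);

  std::cout << "out dims: " << out.dimension(0) << " x " << out.dimension(1)
            << "\n";  // 4 x 6
  std::cout << out << "\n";
  return 0;
}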
diff --git a/tensorflow/core/kernels/tile_ops_gpu.cu.cc b/tensorflow/core/kernels/tile_ops_gpu.cu.cc
new file mode 100644
index 0000000000..29481e1a54
--- /dev/null
+++ b/tensorflow/core/kernels/tile_ops_gpu.cu.cc
@@ -0,0 +1,38 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/tile_ops.h"
+#include <stdio.h>
+
+namespace tensorflow {
+namespace functor {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+#define DEFINE_TYPE(T) \
+ DEFINE_DIM(T, 1) \
+ DEFINE_DIM(T, 2) \
+ DEFINE_DIM(T, 3) \
+ DEFINE_DIM(T, 4) \
+ DEFINE_DIM(T, 5)
+
+#define DEFINE_DIM(T, NDIM) \
+ template struct Tile<GPUDevice, T, NDIM>; \
+ template struct TileGrad<GPUDevice, T, NDIM>; \
+ template struct ReduceAndReshape<GPUDevice, T, NDIM, 1>;
+
+DEFINE_TYPE(float)
+DEFINE_TYPE(double)
+DEFINE_TYPE(int64)
+DEFINE_TYPE(int32)
+DEFINE_TYPE(int16)
+// NOTE(keveman): Eigen's int8 and string versions don't compile yet with nvcc.
+
+#undef DEFINE_DIM
+#undef DEFINE_TYPE
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/topk_op.cc b/tensorflow/core/kernels/topk_op.cc
new file mode 100644
index 0000000000..79b5d4d07e
--- /dev/null
+++ b/tensorflow/core/kernels/topk_op.cc
@@ -0,0 +1,71 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/lib/gtl/top_n.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+template <typename T>
+class TopK : public OpKernel {
+ public:
+ explicit TopK(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("k", &k_));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const auto& input_in = context->input(0);
+ OP_REQUIRES(context, input_in.dims() == 2,
+ errors::InvalidArgument("input must be 2-dimensional"));
+ OP_REQUIRES(context, input_in.dim_size(1) >= k_,
+ errors::InvalidArgument("input must have at least k columns"));
+
+ const auto& input = input_in.matrix<T>();
+
+ const auto num_rows = input_in.dim_size(0); // generally batch_size
+ const auto num_cols = input_in.dim_size(1);
+
+ Tensor* values_out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({num_rows, k_}), &values_out));
+ Tensor* indices_out = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 1, TensorShape({num_rows, k_}), &indices_out));
+ auto values = values_out->matrix<T>();
+ auto indices = indices_out->matrix<int32>();
+
+ gtl::TopN<std::pair<T, int32>> filter(k_);
+
+ for (int r = 0; r < num_rows; r++) {
+ for (int32 c = 0; c < num_cols; ++c) {
+ // The second element is the negated index, so that lower-index elements
+ // are considered larger than higher-index elements in case of ties.
+ filter.push(std::make_pair(input(r, c), -c));
+ }
+
+ std::unique_ptr<std::vector<std::pair<T, int32>>> top_k(filter.Extract());
+ for (int32 i = 0; i < k_; ++i) {
+ values(r, i) = (*top_k)[i].first;
+ indices(r, i) = -(*top_k)[i].second;
+ }
+ filter.Reset();
+ }
+ }
+
+ private:
+ int k_;
+};
+
+#define REGISTER_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("TopK").Device(DEVICE_CPU).TypeConstraint<type>("T"), TopK<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNELS);
+#undef REGISTER_KERNELS
+
+} // namespace tensorflow
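
The (value, negated index) pairs in the loop above exploit std::pair's lexicographic ordering so that, among equal values, the lower column index ranks higher. A small standalone illustration of just that ordering, using std::sort instead of the TF-internal gtl::TopN:

// Standalone sketch of the tie-breaking trick used by TopK above.
// std::sort stands in for gtl::TopN; only the ordering is illustrated.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

int main() {
  const std::vector<float> row = {0.5f, 0.9f, 0.9f, 0.1f};
  std::vector<std::pair<float, int>> scored;
  for (int c = 0; c < static_cast<int>(row.size()); ++c) {
    scored.emplace_back(row[c], -c);  // store the negated column index
  }
  // Descending sort by (value, -index): for equal values the larger second
  // element wins, i.e. the smaller original column index comes first.
  std::sort(scored.begin(), scored.end(),
            std::greater<std::pair<float, int>>());
  const int k = 2;
  for (int i = 0; i < k; ++i) {
    std::printf("value=%g index=%d\n", scored[i].first, -scored[i].second);
  }
  // Prints value=0.9 index=1, then value=0.9 index=2.
  return 0;
}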
diff --git a/tensorflow/core/kernels/training_ops.cc b/tensorflow/core/kernels/training_ops.cc
new file mode 100644
index 0000000000..611fa4ac41
--- /dev/null
+++ b/tensorflow/core/kernels/training_ops.cc
@@ -0,0 +1,884 @@
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/training_ops.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+
+static inline bool DoInline(int64 size) { return size <= (256ll << 10); }
+
+template <typename T>
+struct ApplyGradientDescent<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad) {
+ if (DoInline(var.size())) {
+ var -= grad * lr();
+ } else {
+ var.device(d) -= grad * lr();
+ }
+ }
+};
+
+template <typename T>
+struct ApplyAdagrad<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad) {
+ if (DoInline(var.size())) {
+ accum += grad.square();
+ var -= grad * lr() * accum.rsqrt();
+ } else {
+ accum.device(d) += grad.square();
+ var.device(d) -= grad * lr() * accum.rsqrt();
+ }
+ }
+};
+
+template <typename T>
+struct ApplyMomentum<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad,
+ typename TTypes<T>::ConstScalar momentum) {
+ if (DoInline(var.size())) {
+ accum = accum * momentum() + grad;
+ var -= accum * lr();
+ } else {
+ accum.device(d) = accum * momentum() + grad;
+ var.device(d) -= accum * lr();
+ }
+ }
+};
+
+template <typename T>
+struct ApplyAdam<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat m, typename TTypes<T>::Flat v,
+ typename TTypes<T>::ConstScalar beta1_power,
+ typename TTypes<T>::ConstScalar beta2_power,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar beta1,
+ typename TTypes<T>::ConstScalar beta2,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad) {
+ const T alpha = lr() * std::sqrt(1 - beta2_power()) / (1 - beta1_power());
+ if (DoInline(var.size())) {
+ m += (grad - m) * (1 - beta1());
+ v += (grad.square() - v) * (1 - beta2());
+ var -= (m * alpha) / (v.sqrt() + epsilon());
+ } else {
+ m.device(d) += (grad - m) * (1 - beta1());
+ v.device(d) += (grad.square() - v) * (1 - beta2());
+ var.device(d) -= (m * alpha) / (v.sqrt() + epsilon());
+ }
+ }
+};
+
+template <typename T>
+struct ApplyRMSProp<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat ms, typename TTypes<T>::Flat mom,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar rho,
+ typename TTypes<T>::ConstScalar momentum,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad) {
+ if (DoInline(var.size())) {
+ ms += (grad.square() - ms) * (1 - rho());
+ mom = mom * momentum() + (grad * lr()) / ((ms + epsilon()).sqrt());
+ var -= mom;
+ } else {
+ ms.device(d) += (grad.square() - ms) * (1 - rho());
+ mom.device(d) =
+ mom * momentum() + (grad * lr()) / ((ms + epsilon()).sqrt());
+ var.device(d) -= mom;
+ }
+ }
+};
+
+} // namespace functor
+
+template <typename Device, typename T>
+class ApplyGradientDescentOp : public OpKernel {
+ public:
+ explicit ApplyGradientDescentOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (use_exclusive_lock_) {
+ mutex_lock l(*ctx->input_ref_mutex(0));
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ } else {
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ }
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ void DoValidate(OpKernelContext* ctx) {
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ const Tensor& alpha = ctx->input(1);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(alpha.shape()),
+ errors::InvalidArgument("alpha is not a scalar: ",
+ alpha.shape().DebugString()));
+ const Tensor& delta = ctx->input(2);
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(delta.shape()),
+ errors::InvalidArgument("var and delta do not have the same shape",
+ var.shape().DebugString(), " ",
+ delta.shape().DebugString()));
+ }
+
+ void DoCompute(OpKernelContext* ctx) {
+ const Device& device = ctx->template eigen_device<Device>();
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ const Tensor& alpha = ctx->input(1);
+ const Tensor& delta = ctx->input(2);
+ functor::ApplyGradientDescent<Device, T>()(
+ device, var.flat<T>(), alpha.scalar<T>(), delta.flat<T>());
+ }
+};
+
+#define REGISTER_KERNELS(D, T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ApplyGradientDescent").Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ ApplyGradientDescentOp<D##Device, T>);
+
+REGISTER_KERNELS(CPU, float);
+REGISTER_KERNELS(CPU, double);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ApplyGradientDescent<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat var, \
+ typename TTypes<T>::ConstScalar alpha, \
+ typename TTypes<T>::ConstFlat delta); \
+ extern template struct ApplyGradientDescent<GPUDevice, T>;
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNELS(GPU, float);
+REGISTER_KERNELS(GPU, double);
+#endif
+#undef REGISTER_KERNELS
+
+template <typename Device, typename T>
+class ApplyAdagradOp : public OpKernel {
+ public:
+ explicit ApplyAdagradOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (use_exclusive_lock_) {
+ mutex_lock l1(*ctx->input_ref_mutex(0));
+ // Don't try to acquire a lock on the second ref as they share the same
+ // mutex.
+ //
+ // mutex_lock l2(*ctx->input_ref_mutex(1));
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ } else {
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ }
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ void DoValidate(OpKernelContext* ctx) {
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, accum.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ const Tensor& lr = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ const Tensor& grad = ctx->input(3);
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(accum.shape()),
+ errors::InvalidArgument("var and accum do not have the same shape",
+ var.shape().DebugString(), " ",
+ accum.shape().DebugString()));
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(grad.shape()),
+ errors::InvalidArgument("var and delta do not have the same shape",
+ var.shape().DebugString(), " ",
+ grad.shape().DebugString()));
+ }
+
+ void DoCompute(OpKernelContext* ctx) {
+ const Device& device = ctx->template eigen_device<Device>();
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ const Tensor& lr = ctx->input(2);
+ const Tensor& grad = ctx->input(3);
+ functor::ApplyAdagrad<Device, T>()(device, var.flat<T>(), accum.flat<T>(),
+ lr.scalar<T>(), grad.flat<T>());
+ }
+};
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#define REGISTER_KERNELS(D, T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ApplyAdagrad").Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ ApplyAdagradOp<D##Device, T>);
+
+REGISTER_KERNELS(CPU, float);
+REGISTER_KERNELS(CPU, double);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ApplyAdagrad<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat var, \
+ typename TTypes<T>::Flat accum, typename TTypes<T>::ConstScalar lr, \
+ typename TTypes<T>::ConstFlat grad); \
+ extern template struct ApplyAdagrad<GPUDevice, T>;
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNELS(GPU, float);
+REGISTER_KERNELS(GPU, double);
+#endif
+#undef REGISTER_KERNELS
+
+// Note: this op works on the CPU only.
+template <typename T, typename Tindex>
+class SparseApplyAdagradOp : public OpKernel {
+ public:
+ explicit SparseApplyAdagradOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override NO_THREAD_SAFETY_ANALYSIS {
+ mutex* mu_var = ctx->input_ref_mutex(0);
+ // mu_accum is actually the same mutex as mu_var since currently we use a
+ // global mutex.
+ //
+ // mutex* mu_accum = ctx->input_ref_mutex(1);
+ if (use_exclusive_lock_) {
+ mu_var->lock();
+ }
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, accum.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(accum.shape()),
+ errors::InvalidArgument("var and accum do not have the same shape",
+ var.shape().DebugString(), " ",
+ accum.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVectorOrHigher(var.shape()),
+ errors::InvalidArgument("var must be at least 1 dimensional"));
+
+ const Tensor& lr = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsLegacyScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ const Tensor& grad = ctx->input(3);
+ const Tensor& indices = ctx->input(4);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(indices.shape()),
+ errors::InvalidArgument("indices must be one-dimensional"));
+
+ for (int d = 1; d < var.dims(); d++) {
+ OP_REQUIRES(ctx, var.dim_size(d) == grad.dim_size(d),
+ errors::InvalidArgument(strings::StrCat(
+ "var and grad must match in dimension ", d)));
+ }
+ const Tindex N = indices.dim_size(0);
+ OP_REQUIRES(
+ ctx, grad.dim_size(0) == N,
+ errors::InvalidArgument(
+ "grad must be the same size as indices in the first dimension."));
+
+ if (N > 0) {
+ const Tindex first_dim_size = var.dim_size(0);
+ // Validate all the indices are in range
+ auto indices_vec = indices.vec<Tindex>();
+ for (Tindex i = 0; i < N; i++) {
+ const Tindex index = indices_vec(i);
+ OP_REQUIRES(ctx, index >= 0 && index < first_dim_size,
+ errors::InvalidArgument(
+ strings::StrCat("Index ", index, " at offset ", i,
+ " in indices is out of range")));
+ }
+
+ auto var_flat = var.flat_outer_dims<T>();
+ auto accum_flat = accum.flat_outer_dims<T>();
+ auto grad_flat = grad.flat_outer_dims<T>();
+ T lr_scalar = lr.scalar<T>()();
+
+ // Note(yonghui): It might be worth multi-threading square() and rsqrt().
+ for (Tindex i = 0; i < N; i++) {
+ const Tindex index = indices_vec(i);
+ auto a = accum_flat.template chip<0>(index);
+ auto g = grad_flat.template chip<0>(i);
+ auto v = var_flat.template chip<0>(index);
+ a += g.square();
+ v -= g.constant(lr_scalar) * g * a.rsqrt();
+ }
+ }
+ if (use_exclusive_lock_) {
+ mu_var->unlock();
+ }
+
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+};
+
+#define REGISTER_KERNELS(T, Tindices) \
+ REGISTER_KERNEL_BUILDER(Name("SparseApplyAdagrad") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .TypeConstraint<Tindices>("Tindices"), \
+ SparseApplyAdagradOp<T, Tindices>);
+
+REGISTER_KERNELS(float, int32);
+REGISTER_KERNELS(float, int64);
+REGISTER_KERNELS(double, int32);
+REGISTER_KERNELS(double, int64);
+#undef REGISTER_KERNELS
+
+template <typename Device, typename T>
+class ApplyMomentumOp : public OpKernel {
+ public:
+ explicit ApplyMomentumOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (use_exclusive_lock_) {
+ mutex_lock l1(*ctx->input_ref_mutex(0));
+ // Don't try to acquire a lock on the second ref as they share the same
+ // mutex.
+ //
+ // mutex_lock l2(*ctx->input_ref_mutex(1));
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ } else {
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ }
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ void DoValidate(OpKernelContext* ctx) {
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, accum.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ const Tensor& lr = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ const Tensor& grad = ctx->input(3);
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(accum.shape()),
+ errors::InvalidArgument("var and accum do not have the same shape",
+ var.shape().DebugString(), " ",
+ accum.shape().DebugString()));
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(grad.shape()),
+ errors::InvalidArgument("var and delta do not have the same shape",
+ var.shape().DebugString(), " ",
+ grad.shape().DebugString()));
+
+ const Tensor& momentum = ctx->input(4);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(momentum.shape()),
+ errors::InvalidArgument("momentum is not a scalar: ",
+ momentum.shape().DebugString()));
+ }
+
+ void DoCompute(OpKernelContext* ctx) {
+ const Device& device = ctx->template eigen_device<Device>();
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ const Tensor& lr = ctx->input(2);
+ const Tensor& grad = ctx->input(3);
+ const Tensor& momentum = ctx->input(4);
+ functor::ApplyMomentum<Device, T>()(device, var.flat<T>(), accum.flat<T>(),
+ lr.scalar<T>(), grad.flat<T>(),
+ momentum.scalar<T>());
+ }
+};
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#define REGISTER_KERNELS(D, T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ApplyMomentum").Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ ApplyMomentumOp<D##Device, T>);
+
+REGISTER_KERNELS(CPU, float);
+REGISTER_KERNELS(CPU, double);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ApplyMomentum<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat var, \
+ typename TTypes<T>::Flat accum, typename TTypes<T>::ConstScalar lr, \
+ typename TTypes<T>::ConstFlat grad, \
+ typename TTypes<T>::ConstScalar momentum); \
+ extern template struct ApplyMomentum<GPUDevice, T>;
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNELS(GPU, float);
+REGISTER_KERNELS(GPU, double);
+#endif
+#undef REGISTER_KERNELS
+
+// Note: this op works on the CPU only.
+template <typename T, typename Tindex>
+class SparseApplyMomentumOp : public OpKernel {
+ public:
+ explicit SparseApplyMomentumOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override NO_THREAD_SAFETY_ANALYSIS {
+ mutex* mu_var = ctx->input_ref_mutex(0);
+ // mu_accum is actually the same mutex as mu_var since currently we use a
+ // global mutex.
+ //
+ // mutex* mu_accum = ctx->input_ref_mutex(1);
+ if (use_exclusive_lock_) {
+ mu_var->lock();
+ }
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor accum = ctx->mutable_input(1, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, accum.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(accum.shape()),
+ errors::InvalidArgument("var and accum do not have the same shape",
+ var.shape().DebugString(), " ",
+ accum.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVectorOrHigher(var.shape()),
+ errors::InvalidArgument("var must be at least 1 dimensional"));
+
+ const Tensor& lr = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ const Tensor& grad = ctx->input(3);
+ const Tensor& indices = ctx->input(4);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(indices.shape()),
+ errors::InvalidArgument("indices must be one-dimensional"));
+
+ for (int d = 1; d < var.dims(); d++) {
+ OP_REQUIRES(ctx, var.dim_size(d) == grad.dim_size(d),
+ errors::InvalidArgument(strings::StrCat(
+ "var and grad must match in dimension ", d)));
+ }
+ const Tindex N = indices.dim_size(0);
+ OP_REQUIRES(
+ ctx, grad.dim_size(0) == N,
+ errors::InvalidArgument(
+ "grad must be the same size as indices in the first dimension."));
+
+ const Tensor& momentum = ctx->input(5);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(momentum.shape()),
+ errors::InvalidArgument("momentum is not a scalar: ",
+ momentum.shape().DebugString()));
+
+ if (N > 0) {
+ const Tindex first_dim_size = var.dim_size(0);
+ // Validate all the indices are in range
+ auto indices_vec = indices.vec<Tindex>();
+ for (Tindex i = 0; i < N; i++) {
+ const Tindex index = indices_vec(i);
+ OP_REQUIRES(ctx, index >= 0 && index < first_dim_size,
+ errors::InvalidArgument(
+ strings::StrCat("Index ", index, " at offset ", i,
+ " in indices is out of range")));
+ }
+
+ auto var_flat = var.flat_outer_dims<T>();
+ auto accum_flat = accum.flat_outer_dims<T>();
+ auto grad_flat = grad.flat_outer_dims<T>();
+ T lr_scalar = lr.scalar<T>()();
+ T momentum_scalar = momentum.scalar<T>()();
+
+ for (Tindex i = 0; i < N; i++) {
+ const Tindex index = indices_vec(i);
+ auto a = accum_flat.template chip<0>(index);
+ auto g = grad_flat.template chip<0>(i);
+ auto v = var_flat.template chip<0>(index);
+ a = a * a.constant(momentum_scalar) + g;
+ v -= a.constant(lr_scalar) * a;
+ }
+ }
+ if (use_exclusive_lock_) {
+ mu_var->unlock();
+ }
+
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+};
+
+#define REGISTER_KERNELS(T, Tindices) \
+ REGISTER_KERNEL_BUILDER(Name("SparseApplyMomentum") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<T>("T") \
+ .TypeConstraint<Tindices>("Tindices"), \
+ SparseApplyMomentumOp<T, Tindices>);
+
+REGISTER_KERNELS(float, int32);
+REGISTER_KERNELS(float, int64);
+REGISTER_KERNELS(double, int32);
+REGISTER_KERNELS(double, int64);
+#undef REGISTER_KERNELS
+
+template <typename Device, typename T>
+class ApplyAdamOp : public OpKernel {
+ public:
+ explicit ApplyAdamOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (use_exclusive_lock_) {
+ // all input refs share the same mutex
+ mutex_lock l1(*ctx->input_ref_mutex(0));
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ } else {
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ }
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ void DoValidate(OpKernelContext* ctx) {
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor m = ctx->mutable_input(1, use_exclusive_lock_);
+ Tensor v = ctx->mutable_input(2, use_exclusive_lock_);
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, m.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ OP_REQUIRES(
+ ctx, v.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(2)));
+
+ const Tensor& beta1_power = ctx->input(3);
+ const Tensor& beta2_power = ctx->input(4);
+ const Tensor& lr = ctx->input(5);
+ const Tensor& beta1 = ctx->input(6);
+ const Tensor& beta2 = ctx->input(7);
+ const Tensor& epsilon = ctx->input(8);
+
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(beta1_power.shape()),
+ errors::InvalidArgument("beta1_power is not a scalar: ",
+ beta1_power.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(beta2_power.shape()),
+ errors::InvalidArgument("beta2_power is not a scalar: ",
+ beta2_power.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(beta1.shape()),
+ errors::InvalidArgument("beta1 is not a scalar: ",
+ beta1.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(beta2.shape()),
+ errors::InvalidArgument("beta2 is not a scalar: ",
+ beta2.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(epsilon.shape()),
+ errors::InvalidArgument("epsilon is not a scalar: ",
+ epsilon.shape().DebugString()));
+
+ const Tensor& grad = ctx->input(9);
+ OP_REQUIRES(ctx, var.shape().IsSameSize(m.shape()),
+ errors::InvalidArgument("var and m do not have the same shape",
+ var.shape().DebugString(), " ",
+ m.shape().DebugString()));
+ OP_REQUIRES(ctx, var.shape().IsSameSize(v.shape()),
+ errors::InvalidArgument("var and v do not have the same shape",
+ var.shape().DebugString(), " ",
+ v.shape().DebugString()));
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(grad.shape()),
+ errors::InvalidArgument("var and grad do not have the same shape",
+ var.shape().DebugString(), " ",
+ grad.shape().DebugString()));
+ }
+
+ void DoCompute(OpKernelContext* ctx) {
+ const Device& device = ctx->template eigen_device<Device>();
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor m = ctx->mutable_input(1, use_exclusive_lock_);
+ Tensor v = ctx->mutable_input(2, use_exclusive_lock_);
+ const Tensor& beta1_power = ctx->input(3);
+ const Tensor& beta2_power = ctx->input(4);
+ const Tensor& lr = ctx->input(5);
+ const Tensor& beta1 = ctx->input(6);
+ const Tensor& beta2 = ctx->input(7);
+ const Tensor& epsilon = ctx->input(8);
+ const Tensor& grad = ctx->input(9);
+
+ functor::ApplyAdam<Device, T>()(device, var.flat<T>(), m.flat<T>(),
+ v.flat<T>(), beta1_power.scalar<T>(),
+ beta2_power.scalar<T>(), lr.scalar<T>(),
+ beta1.scalar<T>(), beta2.scalar<T>(),
+ epsilon.scalar<T>(), grad.flat<T>());
+ }
+};
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#define REGISTER_KERNELS(D, T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ApplyAdam").Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ ApplyAdamOp<D##Device, T>);
+
+REGISTER_KERNELS(CPU, float);
+REGISTER_KERNELS(CPU, double);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ApplyAdam<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat var, \
+ typename TTypes<T>::Flat m, typename TTypes<T>::Flat v, \
+ typename TTypes<T>::ConstScalar beta1_power, \
+ typename TTypes<T>::ConstScalar beta2_power, \
+ typename TTypes<T>::ConstScalar lr, \
+ typename TTypes<T>::ConstScalar beta1, \
+ typename TTypes<T>::ConstScalar beta2, \
+ typename TTypes<T>::ConstScalar epsilon, \
+ typename TTypes<T>::ConstFlat grad); \
+ extern template struct ApplyAdam<GPUDevice, T>;
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNELS(GPU, float);
+REGISTER_KERNELS(GPU, double);
+#endif
+#undef REGISTER_KERNELS
+
+template <typename Device, typename T>
+class ApplyRMSPropOp : public OpKernel {
+ public:
+ explicit ApplyRMSPropOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("use_locking", &use_exclusive_lock_));
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ if (use_exclusive_lock_) {
+ // all input refs share the same mutex
+ mutex_lock l1(*ctx->input_ref_mutex(0));
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ } else {
+ DoValidate(ctx);
+ if (!ctx->status().ok()) return;
+ DoCompute(ctx);
+ }
+ ctx->forward_ref_input_to_ref_output(0, 0);
+ }
+
+ private:
+ bool use_exclusive_lock_;
+
+ void DoValidate(OpKernelContext* ctx) {
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor ms = ctx->mutable_input(1, use_exclusive_lock_);
+ Tensor mom = ctx->mutable_input(2, use_exclusive_lock_);
+
+ OP_REQUIRES(
+ ctx, var.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(0)));
+ OP_REQUIRES(
+ ctx, ms.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(1)));
+ OP_REQUIRES(
+ ctx, mom.IsInitialized(),
+ errors::FailedPrecondition(
+ "Attempting to use uninitialized variables: ", def().input(2)));
+
+ const Tensor& lr = ctx->input(3);
+ const Tensor& rho = ctx->input(4);
+ const Tensor& momentum = ctx->input(5);
+ const Tensor& epsilon = ctx->input(6);
+ const Tensor& grad = ctx->input(7);
+
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(lr.shape()),
+ errors::InvalidArgument("lr is not a scalar: ",
+ lr.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(rho.shape()),
+ errors::InvalidArgument("rho is not a scalar: ",
+ rho.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(momentum.shape()),
+ errors::InvalidArgument("momentum is not a scalar: ",
+ momentum.shape().DebugString()));
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(epsilon.shape()),
+ errors::InvalidArgument("epsilon is not a scalar: ",
+ epsilon.shape().DebugString()));
+
+ OP_REQUIRES(ctx, var.shape().IsSameSize(ms.shape()),
+ errors::InvalidArgument("var and ms do not have the same shape",
+ var.shape().DebugString(), " ",
+ ms.shape().DebugString()));
+
+ OP_REQUIRES(ctx, var.shape().IsSameSize(mom.shape()),
+ errors::InvalidArgument(
+ "var and mom do not have the same shape",
+ var.shape().DebugString(), " ", mom.shape().DebugString()));
+
+ OP_REQUIRES(
+ ctx, var.shape().IsSameSize(grad.shape()),
+ errors::InvalidArgument("var and grad do not have the same shape",
+ var.shape().DebugString(), " ",
+ grad.shape().DebugString()));
+ }
+
+ void DoCompute(OpKernelContext* ctx) {
+ const Device& device = ctx->template eigen_device<Device>();
+ Tensor var = ctx->mutable_input(0, use_exclusive_lock_);
+ Tensor ms = ctx->mutable_input(1, use_exclusive_lock_);
+ Tensor mom = ctx->mutable_input(2, use_exclusive_lock_);
+ const Tensor& lr = ctx->input(3);
+ const Tensor& rho = ctx->input(4);
+ const Tensor& momentum = ctx->input(5);
+ const Tensor& epsilon = ctx->input(6);
+ const Tensor& grad = ctx->input(7);
+
+ functor::ApplyRMSProp<Device, T>()(device, var.flat<T>(), ms.flat<T>(),
+ mom.flat<T>(), lr.scalar<T>(),
+ rho.scalar<T>(), momentum.scalar<T>(),
+ epsilon.scalar<T>(), grad.flat<T>());
+ }
+};
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+#define REGISTER_KERNELS(D, T) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ApplyRMSProp").Device(DEVICE_##D).TypeConstraint<T>("T"), \
+ ApplyRMSPropOp<D##Device, T>);
+
+REGISTER_KERNELS(CPU, float);
+REGISTER_KERNELS(CPU, double);
+
+#if GOOGLE_CUDA
+// Forward declarations of the functor specializations for GPU.
+namespace functor {
+#define DECLARE_GPU_SPEC(T) \
+ template <> \
+ void ApplyRMSProp<GPUDevice, T>::operator()( \
+ const GPUDevice& d, typename TTypes<T>::Flat var, \
+ typename TTypes<T>::Flat ms, typename TTypes<T>::Flat mom, \
+ typename TTypes<T>::ConstScalar lr, typename TTypes<T>::ConstScalar rho, \
+ typename TTypes<T>::ConstScalar momentum, \
+ typename TTypes<T>::ConstScalar epsilon, \
+ typename TTypes<T>::ConstFlat grad); \
+ extern template struct ApplyRMSProp<GPUDevice, T>;
+DECLARE_GPU_SPEC(float);
+DECLARE_GPU_SPEC(double);
+#undef DECLARE_GPU_SPEC
+} // namespace functor
+
+REGISTER_KERNELS(GPU, float);
+REGISTER_KERNELS(GPU, double);
+#endif
+#undef REGISTER_KERNELS
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/training_ops.h b/tensorflow/core/kernels/training_ops.h
new file mode 100644
index 0000000000..71f6d0253d
--- /dev/null
+++ b/tensorflow/core/kernels/training_ops.h
@@ -0,0 +1,65 @@
+#ifndef TENSORFLOW_KERNELS_TRAINING_OPS_H_
+#define TENSORFLOW_KERNELS_TRAINING_OPS_H_
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Each training algorithm has an ApplyXYZ functor struct declared in
+// this header file. The functors are specialized for different devices
+// (CPUDevice in training_ops.cc, GPUDevice in training_ops_gpu.cu.cc).
+
+template <typename Device, typename T>
+struct ApplyGradientDescent {
+ void operator()(const Device& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::ConstScalar alpha,
+ typename TTypes<T>::ConstFlat delta);
+};
+
+template <typename Device, typename T>
+struct ApplyAdagrad {
+ void operator()(const Device& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad);
+};
+
+template <typename Device, typename T>
+struct ApplyMomentum {
+ void operator()(const Device& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad,
+ typename TTypes<T>::ConstScalar momentum);
+};
+
+template <typename Device, typename T>
+struct ApplyAdam {
+ void operator()(const Device& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat m, typename TTypes<T>::Flat v,
+ typename TTypes<T>::ConstScalar beta1_power,
+ typename TTypes<T>::ConstScalar beta2_power,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar beta1,
+ typename TTypes<T>::ConstScalar beta2,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad);
+};
+
+template <typename Device, typename T>
+struct ApplyRMSProp {
+ void operator()(const Device& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat ms, typename TTypes<T>::Flat mom,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar rho,
+ typename TTypes<T>::ConstScalar momentum,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad);
+};
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_TRAINING_OPS_H_
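
For reference, the updates that these functors implement, written out as they appear in training_ops.cc (w = var, g = grad, alpha = lr, a = accum, mu = momentum; beta1_power and beta2_power are \beta_1^t and \beta_2^t for step t; the epsilon placement follows the code and may differ slightly from the published papers):

GradientDescent:  w \leftarrow w - \alpha\, g
Adagrad:          a \leftarrow a + g^2, \qquad w \leftarrow w - \alpha\, g / \sqrt{a}
Momentum:         a \leftarrow \mu\, a + g, \qquad w \leftarrow w - \alpha\, a
Adam:             m \leftarrow m + (1-\beta_1)(g - m), \quad v \leftarrow v + (1-\beta_2)(g^2 - v),
                  w \leftarrow w - \alpha\,\frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t}\cdot\frac{m}{\sqrt{v}+\epsilon}
RMSProp:          s \leftarrow s + (1-\rho)(g^2 - s), \quad p \leftarrow \mu\, p + \frac{\alpha\, g}{\sqrt{s+\epsilon}}, \quad w \leftarrow w - p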
diff --git a/tensorflow/core/kernels/training_ops_gpu.cu.cc b/tensorflow/core/kernels/training_ops_gpu.cu.cc
new file mode 100644
index 0000000000..3106f29648
--- /dev/null
+++ b/tensorflow/core/kernels/training_ops_gpu.cu.cc
@@ -0,0 +1,127 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/training_ops.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+namespace functor {
+template <typename T>
+struct ApplyGradientDescent<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::ConstScalar alpha,
+ typename TTypes<T>::ConstFlat delta) {
+ Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
+ bcast[0] = delta.dimension(0);
+ Eigen::Sizes<1> single;
+ var.device(d) -= alpha.reshape(single).broadcast(bcast) * delta;
+ }
+};
+
+template <typename T>
+struct ApplyAdagrad<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad) {
+ accum.device(d) += grad.square();
+ Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
+ bcast[0] = grad.dimension(0);
+ Eigen::Sizes<1> single;
+ var.device(d) -= lr.reshape(single).broadcast(bcast) * grad * accum.rsqrt();
+ }
+};
+
+template <typename T>
+struct ApplyMomentum<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat accum,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstFlat grad,
+ typename TTypes<T>::ConstScalar momentum) {
+ Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
+ bcast[0] = grad.dimension(0);
+ Eigen::Sizes<1> single;
+ accum.device(d) = accum * momentum.reshape(single).broadcast(bcast) + grad;
+ var.device(d) -= lr.reshape(single).broadcast(bcast) * accum;
+ }
+};
+
+template <typename T>
+struct ApplyAdam<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat m, typename TTypes<T>::Flat v,
+ typename TTypes<T>::ConstScalar beta1_power,
+ typename TTypes<T>::ConstScalar beta2_power,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar beta1,
+ typename TTypes<T>::ConstScalar beta2,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad) {
+ Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
+ bcast[0] = grad.dimension(0);
+ Eigen::Sizes<1> single;
+ const auto one = static_cast<T>(1.0);
+ m.device(d) =
+ m +
+ (beta1.constant(one) - beta1).reshape(single).broadcast(bcast) *
+ (grad - m);
+ v.device(d) =
+ v +
+ (beta2.constant(one) - beta2).reshape(single).broadcast(bcast) *
+ (grad.square() - v);
+ var.device(d) -= (lr * (beta2_power.constant(one) - beta2_power).sqrt() /
+ (beta1_power.constant(one) - beta1_power))
+ .reshape(single)
+ .broadcast(bcast) *
+ m / (epsilon.reshape(single).broadcast(bcast) + v.sqrt());
+ }
+};
+
+template <typename T>
+struct ApplyRMSProp<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::Flat var,
+ typename TTypes<T>::Flat ms, typename TTypes<T>::Flat mom,
+ typename TTypes<T>::ConstScalar lr,
+ typename TTypes<T>::ConstScalar rho,
+ typename TTypes<T>::ConstScalar momentum,
+ typename TTypes<T>::ConstScalar epsilon,
+ typename TTypes<T>::ConstFlat grad) {
+ Eigen::array<typename TTypes<T>::Tensor::Index, 1> bcast;
+ bcast[0] = grad.dimension(0);
+ Eigen::Sizes<1> single;
+ const auto one = static_cast<T>(1.0);
+ ms.device(d) = ms +
+ (rho.constant(one) - rho).reshape(single).broadcast(bcast) *
+ (grad.square() - ms);
+ mom.device(d) =
+ mom * momentum.reshape(single).broadcast(bcast) +
+ lr.reshape(single).broadcast(bcast) * grad /
+ ((epsilon.reshape(single).broadcast(bcast) + ms).sqrt());
+ var.device(d) -= mom;
+ }
+};
+
+} // namespace functor
+
+template struct functor::ApplyGradientDescent<GPUDevice, float>;
+template struct functor::ApplyGradientDescent<GPUDevice, double>;
+
+template struct functor::ApplyAdagrad<GPUDevice, float>;
+template struct functor::ApplyAdagrad<GPUDevice, double>;
+
+template struct functor::ApplyMomentum<GPUDevice, float>;
+template struct functor::ApplyMomentum<GPUDevice, double>;
+
+template struct functor::ApplyAdam<GPUDevice, float>;
+template struct functor::ApplyAdam<GPUDevice, double>;
+
+template struct functor::ApplyRMSProp<GPUDevice, float>;
+template struct functor::ApplyRMSProp<GPUDevice, double>;
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
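
The reshape(single).broadcast(bcast) pattern in the GPU functors above turns a 0-d scalar tensor into a flat vector so it can take part in elementwise expressions without reading the value back to the host. A minimal CPU-side sketch of the same Eigen expression (standalone Eigen, illustrative include path, not TensorFlow code):

// Standalone sketch of the scalar reshape + broadcast idiom used by the
// GPU functors above, evaluated here on the CPU with plain Eigen tensors.
#include <iostream>
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> grad(4);
  grad.setValues({1.f, 2.f, 3.f, 4.f});

  Eigen::Tensor<float, 0> lr;  // 0-d tensor, like TTypes<T>::ConstScalar
  lr() = 0.1f;

  // Reshape the scalar to a length-1 vector, then broadcast it to the
  // length of grad, as ApplyGradientDescent does above.
  Eigen::array<Eigen::Index, 1> single{{1}};
  Eigen::array<Eigen::Index, 1> bcast{{grad.dimension(0)}};
  Eigen::Tensor<float, 1> step = lr.reshape(single).broadcast(bcast) * grad;

  std::cout << step << "\n";  // 0.1 0.2 0.3 0.4
  return 0;
}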
diff --git a/tensorflow/core/kernels/training_ops_test.cc b/tensorflow/core/kernels/training_ops_test.cc
new file mode 100644
index 0000000000..3c629badb6
--- /dev/null
+++ b/tensorflow/core/kernels/training_ops_test.cc
@@ -0,0 +1,226 @@
+#include <gtest/gtest.h>
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+// We focus on the single-threaded performance of the training ops.
+static SessionOptions InitSingleThreadedOptions() {
+ SessionOptions opts;
+ opts.config.set_intra_op_parallelism_threads(1);
+ opts.config.set_inter_op_parallelism_threads(1);
+ return opts;
+}
+
+static SessionOptions* GetOptions() {
+ static SessionOptions opts = InitSingleThreadedOptions();
+ return &opts;
+}
+
+static Node* Var(Graph* g, int n) {
+ return test::graph::Var(g, DT_FLOAT, TensorShape({n}));
+}
+
+static Node* Zeros(Graph* g, int n) {
+ Tensor data(DT_FLOAT, TensorShape({n}));
+ data.flat<float>().setZero();
+ return test::graph::Constant(g, data);
+}
+
+static Node* Random(Graph* g, int n) {
+ Tensor data(DT_FLOAT, TensorShape({n}));
+ data.flat<float>().setRandom();
+ return test::graph::Constant(g, data);
+}
+
+static Node* Scalar(Graph* g, float val) {
+ Tensor data(DT_FLOAT, TensorShape({}));
+ data.flat<float>()(0) = val;
+ return test::graph::Constant(g, data);
+}
+
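+// Each benchmark below builds two graphs: 'init_g' assigns initial values to
+// the variables, and 'train_g' applies one update step with random gradients.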
+static void SGD(int32 n, Graph** init_g, Graph** train_g) {
+ RequireDefaultOps();
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ test::graph::Assign(g, var, Zeros(g, n));
+ *init_g = g;
+ }
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto lr = Scalar(g, 0.01);
+ auto grad = Random(g, n);
+ test::graph::Multi(g, "ApplyGradientDescent", {var, lr, grad});
+ *train_g = g;
+ }
+}
+
+static void BM_SGD(int iters, int params) {
+ const int64 tot = static_cast<int64>(iters) * params;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * sizeof(float));
+ Graph* init;
+ Graph* train;
+ SGD(params, &init, &train);
+ test::Benchmark("cpu", train, GetOptions(), init).Run(iters);
+}
+BENCHMARK(BM_SGD)->Arg(128 << 10)->Arg(256 << 10);
+
+static void Adagrad(int32 n, Graph** init_g, Graph** train_g) {
+ RequireDefaultOps();
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto accum = Var(g, n);
+ auto zero = Zeros(g, n);
+ test::graph::Assign(g, var, zero);
+ test::graph::Assign(g, accum, zero);
+ *init_g = g;
+ }
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto accum = Var(g, n);
+ auto lr = Scalar(g, 0.01);
+ auto grad = Random(g, n);
+ test::graph::Multi(g, "ApplyAdagrad", {var, accum, lr, grad});
+ *train_g = g;
+ }
+}
+
+static void BM_Adagrad(int iters, int params) {
+ const int64 tot = static_cast<int64>(iters) * params;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * sizeof(float));
+ Graph* init;
+ Graph* train;
+ Adagrad(params, &init, &train);
+ test::Benchmark("cpu", train, GetOptions(), init).Run(iters);
+}
+BENCHMARK(BM_Adagrad)->Arg(128 << 10)->Arg(256 << 10);
+
+static void Momentum(int32 n, Graph** init_g, Graph** train_g) {
+ RequireDefaultOps();
+ TensorShape shape({n});
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto accum = Var(g, n);
+ auto zero = Zeros(g, n);
+ test::graph::Assign(g, var, zero);
+ test::graph::Assign(g, accum, zero);
+ *init_g = g;
+ }
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto accum = Var(g, n);
+ auto lr = Scalar(g, 0.01);
+ auto grad = Random(g, n);
+ auto mom = Scalar(g, 0.01);
+ test::graph::Multi(g, "ApplyMomentum", {var, accum, lr, grad, mom});
+ *train_g = g;
+ }
+}
+
+static void BM_Momentum(int iters, int params) {
+ const int64 tot = static_cast<int64>(iters) * params;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * sizeof(float));
+ Graph* init;
+ Graph* train;
+ Momentum(params, &init, &train);
+ test::Benchmark("cpu", train, GetOptions(), init).Run(iters);
+}
+BENCHMARK(BM_Momentum)->Arg(128 << 10)->Arg(256 << 10);
+
+static void Adam(int32 n, Graph** init_g, Graph** train_g) {
+ RequireDefaultOps();
+ TensorShape shape({n});
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto m = Var(g, n);
+ auto v = Var(g, n);
+ auto zero = Zeros(g, n);
+ test::graph::Assign(g, var, zero);
+ test::graph::Assign(g, m, zero);
+ test::graph::Assign(g, v, zero);
+ *init_g = g;
+ }
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto m = Var(g, n);
+ auto v = Var(g, n);
+ auto beta1_power = Scalar(g, 0.9);
+ auto beta2_power = Scalar(g, 0.99);
+ auto lr = Scalar(g, 0.01);
+ auto beta1 = Scalar(g, 0.9);
+ auto beta2 = Scalar(g, 0.99);
+ auto epsilon = Scalar(g, 1e-8);
+ auto grad = Random(g, n);
+ test::graph::Multi(g, "ApplyAdam", {var, m, v, beta1_power, beta2_power, lr,
+ beta1, beta2, epsilon, grad});
+ *train_g = g;
+ }
+}
+
+static void BM_Adam(int iters, int params) {
+ const int64 tot = static_cast<int64>(iters) * params;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * sizeof(float));
+ Graph* init;
+ Graph* train;
+ Adam(params, &init, &train);
+ test::Benchmark("cpu", train, GetOptions(), init).Run(iters);
+}
+BENCHMARK(BM_Adam)->Arg(128 << 10)->Arg(256 << 10);
+
+static void RMSProp(int32 n, Graph** init_g, Graph** train_g) {
+ RequireDefaultOps();
+ TensorShape shape({n});
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto ms = Var(g, n);
+ auto mom = Var(g, n);
+ auto zero = Zeros(g, n);
+ test::graph::Assign(g, var, zero);
+ test::graph::Assign(g, ms, zero);
+ test::graph::Assign(g, mom, zero);
+ *init_g = g;
+ }
+ {
+ Graph* g = new Graph(OpRegistry::Global());
+ auto var = Var(g, n);
+ auto ms = Var(g, n);
+ auto mom = Var(g, n);
+ auto lr = Scalar(g, 0.01);
+ auto rho = Scalar(g, 0.9);
+ auto momentum = Scalar(g, 0.9);
+ auto epsilon = Scalar(g, 1e-8);
+ auto grad = Random(g, n);
+ test::graph::Multi(g, "ApplyRMSProp",
+ {var, ms, mom, lr, rho, momentum, epsilon, grad});
+ *train_g = g;
+ }
+}
+
+static void BM_RMSProp(int iters, int params) {
+ const int64 tot = static_cast<int64>(iters) * params;
+ testing::ItemsProcessed(tot);
+ testing::BytesProcessed(tot * sizeof(float));
+ Graph* init;
+ Graph* train;
+ RMSProp(params, &init, &train);
+ test::Benchmark("cpu", train, GetOptions(), init).Run(iters);
+}
+BENCHMARK(BM_RMSProp)->Arg(128 << 10)->Arg(256 << 10);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/transpose_op.cc b/tensorflow/core/kernels/transpose_op.cc
new file mode 100644
index 0000000000..4f11a881f8
--- /dev/null
+++ b/tensorflow/core/kernels/transpose_op.cc
@@ -0,0 +1,190 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/transpose_op.h"
+#include "tensorflow/core/kernels/transpose_op_functor.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+// inv = InvertPermutationOp(T<int32> p) takes a permutation of
+// integers 0, 1, ..., n - 1 and returns the inverted
+// permutation of p. I.e., inv[p[i]] == i, for i in [0 .. n).
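+// For example, p = [2, 0, 1] yields inv = [1, 2, 0].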
+//
+// REQUIRES: input is a vector of int32.
+// REQUIRES: input is a permutation of 0, 1, ..., n-1.
+
+class InvertPermutationOp : public OpKernel {
+ public:
+ explicit InvertPermutationOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(
+ context, TensorShapeUtils::IsVector(input.shape()),
+ errors::InvalidArgument("invert_permutation expects a 1D vector."));
+ auto Tin = input.vec<int32>();
+ const int N = Tin.size();
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+ auto Tout = output->vec<int32>();
+ std::fill_n(Tout.data(), N, -1);
+ for (int i = 0; i < N; ++i) {
+ const int32 d = Tin(i);
+ OP_REQUIRES(context, 0 <= d && d < N,
+ errors::InvalidArgument(d, " is not between 0 and ", N));
+ OP_REQUIRES(context, Tout(d) == -1,
+ errors::InvalidArgument(d, " is duplicated in the input."));
+ Tout(d) = i;
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("InvertPermutation").Device(DEVICE_CPU),
+ InvertPermutationOp);
+
+// output = TransposeOp(T<any> input, T<int32> perm) takes a tensor
+// of type T and rank N, and a permutation of 0, 1, ..., N-1. It
+// shuffles the dimensions of the input tensor according to permutation.
+//
+// Specifically, the returned tensor output meets the following condition:
+// 1) output.dims() == input.dims();
+// 2) output.dim_size(i) == input.dim_size(perm[i]);
+// 3) output.tensor<T, N>(i_0, i_1, ..., i_N-1) ==
+// input.tensor<T, N>(j_0, j_1, ..., j_N-1),
+// where i_s == j_{perm[s]}
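+//
+// For example, transposing a tensor of shape [2, 3, 4] with perm = [2, 0, 1]
+// produces a tensor of shape [4, 2, 3].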
+//
+// REQUIRES: perm is a vector of int32.
+// REQUIRES: input.dims() == perm.size().
+// REQUIRES: perm is a permutation.
+
+template <typename Device, typename T>
+TransposeOp<Device, T>::TransposeOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+template <typename Device, typename T>
+void TransposeOp<Device, T>::Compute(OpKernelContext* context) {
+ const Tensor& input = context->input(0);
+ const Tensor& perm = context->input(1);
+ // Preliminary validation of sizes.
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(perm.shape()),
+ errors::InvalidArgument("perm must be a vector, not ",
+ perm.shape().DebugString()));
+ auto Vperm = perm.vec<int32>();
+ const int dims = input.dims();
+ static const int kMinDims = 1;
+ static const int kMaxDims = 8;
+ OP_REQUIRES(context, kMinDims <= dims && dims <= kMaxDims,
+ errors::Unimplemented("Transposing a tensor of rank ", dims,
+ " is not implemented."));
+ OP_REQUIRES(context, dims == Vperm.size(),
+ errors::InvalidArgument(
+ "transpose expects a vector of size ", input.dims(),
+ ". But input(1) is a vector of size ", Vperm.size()));
+ gtl::ArraySlice<int32> permutation(
+ reinterpret_cast<const int32*>(Vperm.data()), dims);
+ TensorShape shape;
+
+ // Check whether permutation is a permutation of integers of [0 .. dims).
+ gtl::InlinedVector<bool, 8> bits(dims);
+ for (const int32 d : permutation) {
+ OP_REQUIRES(
+ context, 0 <= d && d < dims,
+ errors::InvalidArgument(d, " is out of range [0 .. ", dims, ")"));
+ bits[d] = true;
+ shape.AddDim(input.dim_size(d));
+ }
+ for (int i = 0; i < dims; ++i) {
+ OP_REQUIRES(context, bits[i], errors::InvalidArgument(
+ i, " is missing from {",
+ str_util::Join(permutation, ","), "}."));
+ }
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, shape, &output));
+ switch (dims) {
+#define EXPAND_DIM(N) \
+ case N: { \
+ functor::TransposeFunctor<Device, T, N> func; \
+ func(context->eigen_device<Device>(), output->tensor<T, N>(), \
+ input.tensor<T, N>(), permutation.data()); \
+ break; \
+ }
+ EXPAND_DIM(1);
+ EXPAND_DIM(2);
+ EXPAND_DIM(3);
+ EXPAND_DIM(4);
+ EXPAND_DIM(5);
+ EXPAND_DIM(6);
+ EXPAND_DIM(7);
+ EXPAND_DIM(8);
+ default:
+ LOG(FATAL) << "Unexpected dims: " << dims;
+ }
+#undef EXPAND_DIM
+}
+
+namespace functor {
+
+template <typename Device, typename T, int NDIMS>
+void TransposeMaybeInline(const Device& d,
+ typename TTypes<T, NDIMS>::Tensor out,
+ typename TTypes<T, NDIMS>::ConstTensor in,
+ const int* perm) {
+ // perm[] is a permutation of 0, 1, ..., NDIMS-1. perm[] is on CPU.
+ Eigen::array<int, NDIMS> p;
+ for (int i = 0; i < NDIMS; ++i) p[i] = perm[i];
+ // Small (< 128KiB) transpose on a CPU: do the shuffle inline on the
+ // calling thread instead of dispatching it to the device.
+ if (out.size() * sizeof(T) < 131072) {
+ out = in.shuffle(p);
+ } else {
+ out.device(d) = in.shuffle(p);
+ }
+}
+
+template <typename T, int NDIMS>
+struct TransposeFunctor<CPUDevice, T, NDIMS> {
+ void operator()(const CPUDevice& d, typename TTypes<T, NDIMS>::Tensor out,
+ typename TTypes<T, NDIMS>::ConstTensor in, const int* perm) {
+ TransposeMaybeInline<CPUDevice, T, NDIMS>(d, out, in, perm);
+ }
+};
+
+} // namespace functor
+
+#define REGISTER(D, T) \
+ template class TransposeOp<D##Device, T>; \
+ REGISTER_KERNEL_BUILDER(Name("Transpose") \
+ .Device(DEVICE_##D) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("perm"), \
+ TransposeOp<D##Device, T>)
+REGISTER(CPU, float);
+REGISTER(CPU, double);
+REGISTER(CPU, complex64);
+REGISTER(CPU, uint8);
+REGISTER(CPU, int8);
+REGISTER(CPU, int16);
+REGISTER(CPU, int32);
+REGISTER(CPU, int64);
+REGISTER(CPU, string);
+#if GOOGLE_CUDA
+REGISTER(GPU, uint8);
+REGISTER(GPU, int8);
+REGISTER(GPU, int16);
+REGISTER(GPU, int32);
+REGISTER(GPU, int64);
+REGISTER(GPU, float);
+REGISTER(GPU, double);
+#endif
+#undef REGISTER
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/transpose_op.h b/tensorflow/core/kernels/transpose_op.h
new file mode 100644
index 0000000000..f7a5be5c2b
--- /dev/null
+++ b/tensorflow/core/kernels/transpose_op.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_KERNELS_TRANSPOSE_OP_H_
+#define TENSORFLOW_KERNELS_TRANSPOSE_OP_H_
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+template <typename Device, typename T>
+class TransposeOp : public OpKernel {
+ public:
+ explicit TransposeOp(OpKernelConstruction* context);
+ void Compute(OpKernelContext* context) override;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_TRANSPOSE_OP_H_
diff --git a/tensorflow/core/kernels/transpose_op_functor.h b/tensorflow/core/kernels/transpose_op_functor.h
new file mode 100644
index 0000000000..8cbd1cbb29
--- /dev/null
+++ b/tensorflow/core/kernels/transpose_op_functor.h
@@ -0,0 +1,28 @@
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_TRANSPOSE_OP_FUNCTOR_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_TRANSPOSE_OP_FUNCTOR_H_
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename Device, typename T, int NDIMS>
+void Transpose(const Device& d, typename TTypes<T, NDIMS>::Tensor out,
+ typename TTypes<T, NDIMS>::ConstTensor in, const int* perm) {
+ // perm[] is a permutation of 0, 1, ..., NDIMS-1. perm[] is on CPU.
+ Eigen::array<int, NDIMS> p;
+ for (int i = 0; i < NDIMS; ++i) p[i] = perm[i];
+ out.device(d) = in.shuffle(p);
+}
+
+template <typename Device, typename T, int NDIMS>
+struct TransposeFunctor {
+ void operator()(const Device& d, typename TTypes<T, NDIMS>::Tensor out,
+ typename TTypes<T, NDIMS>::ConstTensor in, const int* perm);
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_TRANSPOSE_OP_FUNCTOR_H_
diff --git a/tensorflow/core/kernels/transpose_op_gpu.cu.cc b/tensorflow/core/kernels/transpose_op_gpu.cu.cc
new file mode 100644
index 0000000000..8c04a6544e
--- /dev/null
+++ b/tensorflow/core/kernels/transpose_op_gpu.cu.cc
@@ -0,0 +1,43 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/kernels/transpose_op_functor.h"
+
+namespace tensorflow {
+namespace functor {
+
+template <typename T, int NDIMS>
+struct TransposeFunctor<Eigen::GpuDevice, T, NDIMS> {
+ void operator()(const Eigen::GpuDevice& d,
+ typename TTypes<T, NDIMS>::Tensor out,
+ typename TTypes<T, NDIMS>::ConstTensor in, const int* perm) {
+ Transpose<Eigen::GpuDevice, T, NDIMS>(d, out, in, perm);
+ }
+};
+
+#define DEFINE(T, N) template struct TransposeFunctor<Eigen::GpuDevice, T, N>;
+#define DEFINE_DIM(T) \
+ DEFINE(T, 1); \
+ DEFINE(T, 2); \
+ DEFINE(T, 3); \
+ DEFINE(T, 4); \
+ DEFINE(T, 5); \
+ DEFINE(T, 6); \
+ DEFINE(T, 7); \
+ DEFINE(T, 8);
+DEFINE_DIM(uint8);
+DEFINE_DIM(int8);
+DEFINE_DIM(int16);
+DEFINE_DIM(int32);
+DEFINE_DIM(int64);
+DEFINE_DIM(float);
+DEFINE_DIM(double);
+#undef DEFINE_DIM
+#undef DEFINE
+
+} // end namespace functor
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/unique_op.cc b/tensorflow/core/kernels/unique_op.cc
new file mode 100644
index 0000000000..61f4a54583
--- /dev/null
+++ b/tensorflow/core/kernels/unique_op.cc
@@ -0,0 +1,61 @@
+#include <unordered_map>
+#include <utility>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename T>
+class UniqueOp : public OpKernel {
+ public:
+ explicit UniqueOp(OpKernelConstruction* context) : OpKernel(context) {
+ const DataType dt = DataTypeToEnum<T>::v();
+ OP_REQUIRES_OK(context, context->MatchSignature({dt}, {dt, DT_INT32}));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(input.shape()),
+ errors::InvalidArgument("unique expects a 1D vector."));
+ auto Tin = input.vec<T>();
+ const int N = Tin.size();
+
+ Tensor* idx = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(1, input.shape(), &idx));
+ auto idx_vec = idx->template vec<int32>();
+
+ std::unordered_map<T, int32> uniq;
+ uniq.reserve(2 * N);
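+ // Single pass over the input: idx_vec(i) records the position of Tin(i)
+ // among the unique values, which are assigned ids in order of first
+ // appearance.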
+ for (int i = 0, j = 0; i < N; ++i) {
+ auto it = uniq.insert(std::make_pair(Tin(i), j));
+ idx_vec(i) = it.first->second;
+ if (it.second) {
+ ++j;
+ }
+ }
+ int32 uniq_size = uniq.size();
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(
+ 0, TensorShape({uniq_size}), &output));
+ auto output_vec = output->template vec<T>();
+
+ for (auto it : uniq) {
+ output_vec(it.second) = it.first;
+ }
+ }
+};
+
+#define REGISTER_UNIQUE(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Unique").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ UniqueOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_UNIQUE);
+#undef REGISTER_UNIQUE
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/unique_op_test.cc b/tensorflow/core/kernels/unique_op_test.cc
new file mode 100644
index 0000000000..658f2282cf
--- /dev/null
+++ b/tensorflow/core/kernels/unique_op_test.cc
@@ -0,0 +1,51 @@
+#include <functional>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/graph/testlib.h"
+#include "tensorflow/core/graph/node_builder.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+
+namespace tensorflow {
+
+namespace {
+
+static void BM_Unique(int iters, int dim) {
+ testing::StopTiming();
+ RequireDefaultOps();
+ Graph* g = new Graph(OpRegistry::Global());
+
+ Tensor input(DT_INT32, TensorShape({dim}));
+ input.flat<int32>().setRandom();
+
+ Node* node;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Unique")
+ .Input(test::graph::Constant(g, input))
+ .Attr("T", DT_INT32)
+ .Finalize(g, &node));
+
+ testing::BytesProcessed(static_cast<int64>(iters) * dim * sizeof(int32));
+ testing::UseRealTime();
+ testing::StartTiming();
+ test::Benchmark("cpu", g).Run(iters);
+}
+
+BENCHMARK(BM_Unique)
+ ->Arg(32)
+ ->Arg(256)
+ ->Arg(1024)
+ ->Arg(4 * 1024)
+ ->Arg(16 * 1024)
+ ->Arg(64 * 1024)
+ ->Arg(256 * 1024);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/unpack_op.cc b/tensorflow/core/kernels/unpack_op.cc
new file mode 100644
index 0000000000..36cfb2c8e5
--- /dev/null
+++ b/tensorflow/core/kernels/unpack_op.cc
@@ -0,0 +1,96 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include <vector>
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/kernels/split_op.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class UnpackOp : public OpKernel {
+ public:
+ explicit UnpackOp(OpKernelConstruction* c) : OpKernel(c) {}
+
+ void Compute(OpKernelContext* context) override {
+ const int32 num = num_outputs();
+ const Tensor& input = context->input(0);
+ const TensorShape& input_shape = input.shape();
+
+ OP_REQUIRES(
+ context, input_shape.dims() > 0 && input_shape.dim_size(0) == num,
+ errors::InvalidArgument("Input shape must start with ", num, ", got ",
+ input_shape.ShortDebugString()));
+
+ auto output_shape = input_shape;
+ output_shape.RemoveDim(0);
+ const int32 output_size = output_shape.num_elements();
+
+ // Special case: Aligned, so we can share the underlying buffer.
+ //
+ // Apply this optimization conservatively: if input is aligned,
+ // the resulting tensors must be aligned. It's conservative
+ // because if the immediate consumers of the resulting tensors are
+ // not using Eigen for computation, it's perfectly fine to avoid
+ // the copying.
+ if (output_size == 0 || IsInnerDimsSizeAligned<T>(input_shape)) {
+ for (int i = 0; i < num; ++i) {
+ Tensor output;
+ CHECK(output.CopyFrom(input.Slice(i, i + 1), output_shape));
+ context->set_output(i, output);
+ }
+ return;
+ }
+
+ // Except for shape, unpack is a special case of split, so we reuse the
+ // same computational kernels.
+ auto input_reshaped = input.shaped<T, 3>({1, num, output_size});
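+ // E.g. an input of shape [num, d1, ..., dk] is viewed as
+ // [1, num, d1 * ... * dk], and output i is its [1, 1, d1 * ... * dk] slice
+ // at index i along the middle dimension.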
+
+ for (int i = 0; i < num; ++i) {
+ Tensor* output;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(i, output_shape, &output));
+ auto output_shaped = output->shaped<T, 3>({1, 1, output_size});
+
+ Eigen::DSizes<ptrdiff_t, 3> indices{0, i, 0};
+ Eigen::DSizes<ptrdiff_t, 3> sizes{1, 1, output_size};
+ functor::Split<Device, T>()(context->eigen_device<Device>(),
+ output_shaped, input_reshaped, indices,
+ sizes);
+ }
+ }
+};
+
+#define REGISTER_UNPACK(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Unpack").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ UnpackOp<CPUDevice, type>)
+
+TF_CALL_ALL_TYPES(REGISTER_UNPACK);
+
+#undef REGISTER_UNPACK
+
+#if GOOGLE_CUDA
+
+#define REGISTER_GPU(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Unpack").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
+ UnpackOp<GPUDevice, type>)
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU);
+#undef REGISTER_GPU
+
+#endif // GOOGLE_CUDA
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/kernels/variable_ops.cc b/tensorflow/core/kernels/variable_ops.cc
new file mode 100644
index 0000000000..2f1dbc68c0
--- /dev/null
+++ b/tensorflow/core/kernels/variable_ops.cc
@@ -0,0 +1,37 @@
+#define EIGEN_USE_THREADS
+#include "tensorflow/core/kernels/variable_ops.h"
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+REGISTER_KERNEL_BUILDER(Name("Variable").Device(DEVICE_CPU), VariableOp);
+REGISTER_KERNEL_BUILDER(Name("TemporaryVariable").Device(DEVICE_CPU),
+ TemporaryVariableOp);
+REGISTER_KERNEL_BUILDER(Name("DestroyTemporaryVariable").Device(DEVICE_CPU),
+ DestroyTemporaryVariableOp);
+
+#if GOOGLE_CUDA
+// Only register 'Variable' on GPU for the subset of types also supported by
+// 'Assign' (see dense_update_ops.cc).
+#define REGISTER_GPU_KERNELS(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("Variable").Device(DEVICE_GPU).TypeConstraint<type>("dtype"), \
+ VariableOp); \
+ REGISTER_KERNEL_BUILDER(Name("TemporaryVariable") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("dtype"), \
+ TemporaryVariableOp); \
+ REGISTER_KERNEL_BUILDER(Name("DestroyTemporaryVariable") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<type>("T"), \
+ DestroyTemporaryVariableOp);
+
+TF_CALL_GPU_NUMBER_TYPES(REGISTER_GPU_KERNELS);
+#undef REGISTER_GPU_KERNELS
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/variable_ops.h b/tensorflow/core/kernels/variable_ops.h
new file mode 100644
index 0000000000..77d2da0ad4
--- /dev/null
+++ b/tensorflow/core/kernels/variable_ops.h
@@ -0,0 +1,146 @@
+#ifndef TENSORFLOW_KERNELS_VARIABLE_OPS_H_
+#define TENSORFLOW_KERNELS_VARIABLE_OPS_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/resource_mgr.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class VariableOp : public OpKernel {
+ public:
+ explicit VariableOp(OpKernelConstruction* context) : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("shape", &shape_));
+ dtype_ = RemoveRefType(context->output_type(0));
+ }
+
+ ~VariableOp() override {
+ if (var_) var_->Unref();
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ mutex_lock l(init_mu_);
+ if (var_ == nullptr) {
+ OP_REQUIRES_OK(ctx, cinfo_.Init(ctx->resource_manager(), def(),
+ true /* use name() */));
+ auto creator = [this](Var** var) {
+ *var = new Var(dtype_);
+ (*var)->tensor()->set_shape(shape_);
+ return Status::OK();
+ };
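+ // LookupOrCreate invokes 'creator' only if no variable with this
+ // (container, name) pair exists yet; otherwise it returns the existing
+ // resource with an extra reference for this op.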
+ OP_REQUIRES_OK(ctx,
+ cinfo_.resource_manager()->LookupOrCreate<Var>(
+ cinfo_.container(), cinfo_.name(), &var_, creator));
+ }
+ // Output a reference to our tensor, so it may be updated.
+ //
+ // As long as *this is alive, the ref we return here is valid
+ // because *this owns a ref on var_.
+ ctx->set_output_ref(0, var_->mu(), var_->tensor());
+ }
+
+ private:
+ class Var : public ResourceBase {
+ public:
+ explicit Var(DataType dtype) : tensor_(dtype) {}
+ mutex* mu() { return &mu_; }
+ Tensor* tensor() { return &tensor_; }
+
+ string DebugString() override {
+ return strings::StrCat(DataTypeString(tensor_.dtype()), "/",
+ tensor_.shape().ShortDebugString());
+ }
+
+ private:
+ mutex mu_;
+ Tensor tensor_;
+
+ ~Var() override {}
+ TF_DISALLOW_COPY_AND_ASSIGN(Var);
+ };
+
+ DataType dtype_;
+ TensorShape shape_;
+
+ mutex init_mu_;
+ ContainerInfo cinfo_ GUARDED_BY(init_mu_);
+ Var* var_ GUARDED_BY(init_mu_) = nullptr;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(VariableOp);
+};
+
+class TemporaryVariableOp : public OpKernel {
+ public:
+ explicit TemporaryVariableOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES_OK(context, context->GetAttr("shape", &shape_));
+ OP_REQUIRES_OK(context, context->GetAttr("dtype", &dtype_));
+ OP_REQUIRES_OK(context, context->GetAttr("var_name", &var_name_));
+ // Variable name defaults to op name if not specified explicitly.
+ if (var_name_ == "") var_name_ = name();
+ }
+
+ void Compute(OpKernelContext* context) override {
+ Status s;
+ ResourceMgr* rm = context->step_resource_manager();
+ OP_REQUIRES(context, rm, errors::Internal("No per-step resource manager."));
+ auto* tmp_var = new TmpVar;
+ OP_REQUIRES(context, tmp_var,
+ errors::ResourceExhausted("Could not allocate TmpVar."));
+ tmp_var->name = var_name_;
+ s = context->allocate_temp(dtype_, shape_, &tmp_var->val);
+ if (!s.ok()) tmp_var->Unref();
+ OP_REQUIRES_OK(context, s);
+ OP_REQUIRES_OK(context, rm->Create("tmp_var", var_name_, tmp_var));
+ context->set_output_ref(0, &tmp_var->mu, &tmp_var->val);
+ }
+
+ private:
+ // Refcounted temporary variable resource.
+ friend class DestroyTemporaryVariableOp;
+ struct TmpVar : public ResourceBase {
+ mutex mu;
+ Tensor val;
+ string name;
+ string DebugString() override { return name; }
+ ~TmpVar() override { VLOG(3) << "TmpVar " << name << " deleted"; }
+ };
+
+ TensorShape shape_;
+ DataType dtype_;
+ string var_name_;
+};
+
+class DestroyTemporaryVariableOp : public OpKernel {
+ public:
+ explicit DestroyTemporaryVariableOp(OpKernelConstruction* context)
+ : OpKernel(context) {
+ OP_REQUIRES(context, IsRefType(context->input_type(0)),
+ errors::InvalidArgument("lhs input needs to be a ref type"))
+ OP_REQUIRES_OK(context, context->GetAttr("var_name", &var_name_));
+ OP_REQUIRES(context, var_name_ != "",
+ errors::InvalidArgument("Missing var_name attribute"));
+ }
+
+ void Compute(OpKernelContext* context) override {
+ // NOTE(pbar): All other mutators of the Tensor Ref *must* have completed
+ // their execution before this DestroyTemporaryVariable op executes.
+ // This is typically achieved using control dependencies.
+ CHECK(IsRefType(context->input_dtype(0)));
+ Tensor tmpvar = context->mutable_input(0, false);
+ context->set_output(0, tmpvar);
+ ResourceMgr* rm = context->step_resource_manager();
+ OP_REQUIRES(context, rm, errors::Internal("No per-step resource manager."));
+ OP_REQUIRES_OK(
+ context, rm->Delete<TemporaryVariableOp::TmpVar>("tmp_var", var_name_));
+ }
+
+ private:
+ string var_name_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_VARIABLE_OPS_H_
diff --git a/tensorflow/core/kernels/where_op.cc b/tensorflow/core/kernels/where_op.cc
new file mode 100644
index 0000000000..9db0943ea7
--- /dev/null
+++ b/tensorflow/core/kernels/where_op.cc
@@ -0,0 +1,74 @@
+// See docs in ../ops/array_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/kernels/where_op.h"
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device>
+class WhereOp : public OpKernel {
+ public:
+ explicit WhereOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& input = context->input(0);
+
+ const int input_dims = input.dims();
+ Tensor num_true;
+ OP_REQUIRES_OK(
+ context, context->allocate_temp(DT_INT64, TensorShape({}), &num_true));
+ auto num_true_t = num_true.scalar<int64>();
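+ // First pass: count the true elements so the output can be sized to
+ // [num_true, input_dims]; the Where functor below then fills in one row
+ // of coordinates per true element.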
+
+ functor::NumTrue<Device>::Compute(context->eigen_device<Device>(),
+ input.flat<bool>(), num_true_t);
+ TensorShape output_shape({num_true_t(), input_dims});
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output(0, output_shape, &output));
+
+#define HANDLE_DIM(NDIM) \
+ case NDIM: \
+ functor::Where<Device, NDIM>::Compute(context->eigen_device<Device>(), \
+ input.tensor<bool, NDIM>(), \
+ output->matrix<int64>()); \
+ break;
+
+ switch (input_dims) {
+ HANDLE_DIM(1);
+ HANDLE_DIM(2);
+ HANDLE_DIM(3);
+ HANDLE_DIM(4);
+ HANDLE_DIM(5);
+
+ default:
+ OP_REQUIRES(context, false,
+ errors::InvalidArgument(
+ "WhereOp : Unhandled input dimensions: ", input_dims));
+ }
+#undef HANDLE_DIM
+ }
+
+ private:
+ TF_DISALLOW_COPY_AND_ASSIGN(WhereOp);
+};
+
+#define REGISTER_WHERE() \
+ REGISTER_KERNEL_BUILDER(Name("Where").Device(DEVICE_CPU), WhereOp<CPUDevice>);
+
+REGISTER_WHERE();
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/where_op.h b/tensorflow/core/kernels/where_op.h
new file mode 100644
index 0000000000..c7b835d02f
--- /dev/null
+++ b/tensorflow/core/kernels/where_op.h
@@ -0,0 +1,65 @@
+#ifndef TENSORFLOW_KERNELS_WHERE_OP_H_
+#define TENSORFLOW_KERNELS_WHERE_OP_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+namespace functor {
+
+template <typename Device>
+struct NumTrue {
+ EIGEN_ALWAYS_INLINE static void Compute(
+ const Device& d, typename TTypes<bool>::ConstFlat input,
+ TTypes<int64>::Scalar num_true) {
+ num_true.device(d) = input.template cast<int64>().sum();
+ }
+};
+
+template <typename Device, int NDIM>
+struct Where {
+ EIGEN_ALWAYS_INLINE static void Compute(
+ const Device& d, typename TTypes<bool, NDIM>::ConstTensor input,
+ typename TTypes<int64>::Matrix output) {
+ Eigen::DenseIndex true_n = 0;
+ Eigen::DSizes<Eigen::DenseIndex, NDIM> dims = input.dimensions();
+ Eigen::DSizes<Eigen::DenseIndex, NDIM> strides;
+
+ // Calculate strides for RowMajor order.
+ EIGEN_STATIC_ASSERT((static_cast<int>(decltype(input)::Layout) ==
+ static_cast<int>(Eigen::RowMajor)),
+ INTERNAL_ERROR_INPUT_SHOULD_BE_ROWMAJOR);
+
+ strides[NDIM - 1] = 1;
+ for (int i = NDIM - 2; i >= 0; --i) {
+ strides[i] = strides[i + 1] * dims[i + 1];
+ }
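+ // E.g. for dims = [2, 3, 4] the strides are [12, 4, 1], so the flat
+ // index 17 maps to the coordinates (1, 1, 1).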
+
+ // Note, no bounds checking is done on true_n. It is assumed that
+ // the output was correctly sized via output of NumTrue::Compute.
+ for (Eigen::DenseIndex n = 0; n < input.size(); ++n) {
+ if (input.data()[n]) {
+ WriteIndexRowMajor(output, strides, true_n, n);
+ ++true_n;
+ }
+ }
+ }
+
+ EIGEN_ALWAYS_INLINE static void WriteIndexRowMajor(
+ typename TTypes<int64>::Matrix output,
+ const Eigen::DSizes<Eigen::DenseIndex, NDIM>& strides,
+ Eigen::DenseIndex true_n, Eigen::DenseIndex index) {
+ for (int i = 0; i < NDIM; ++i) {
+ output(true_n, i) = index / strides[i];
+ index %= strides[i];
+ }
+ }
+};
+
+} // namespace functor
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_WHERE_OP_H_
diff --git a/tensorflow/core/kernels/whole_file_read_ops.cc b/tensorflow/core/kernels/whole_file_read_ops.cc
new file mode 100644
index 0000000000..b940163ec9
--- /dev/null
+++ b/tensorflow/core/kernels/whole_file_read_ops.cc
@@ -0,0 +1,108 @@
+// See docs in ../ops/io_ops.cc.
+
+#include <memory>
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/reader_op_kernel.h"
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+static Status ReadEntireFile(Env* env, const string& filename,
+ string* contents) {
+ uint64 file_size = 0;
+ TF_RETURN_IF_ERROR(env->GetFileSize(filename, &file_size));
+ contents->resize(file_size);
+ RandomAccessFile* file;
+ TF_RETURN_IF_ERROR(env->NewRandomAccessFile(filename, &file));
+ std::unique_ptr<RandomAccessFile> make_sure_file_gets_deleted(file);
+ StringPiece data;
+ TF_RETURN_IF_ERROR(file->Read(0, file_size, &data, &(*contents)[0]));
+ if (data.size() != file_size) {
+ return errors::DataLoss("Truncated read of '", filename, "' expected ",
+ file_size, " got ", data.size());
+ }
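+ // Read() may return a StringPiece that points at the file's own buffer
+ // instead of the scratch space we supplied, so copy it into 'contents'
+ // if needed.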
+ if (data.data() != &(*contents)[0]) {
+ memmove(&(*contents)[0], data.data(), data.size());
+ }
+ return Status::OK();
+}
+
+class WholeFileReader : public ReaderBase {
+ public:
+ WholeFileReader(Env* env, const string& node_name)
+ : ReaderBase(strings::StrCat("WholeFileReader '", node_name, "'")),
+ env_(env) {}
+
+ Status ReadLocked(string* key, string* value, bool* produced,
+ bool* at_end) override {
+ *key = current_work();
+ TF_RETURN_IF_ERROR(ReadEntireFile(env_, *key, value));
+ *produced = true;
+ *at_end = true;
+ return Status::OK();
+ }
+
+ // Stores state in a ReaderBaseState proto, since WholeFileReader has
+ // no additional state beyond ReaderBase.
+ Status SerializeStateLocked(string* state) override {
+ ReaderBaseState base_state;
+ SaveBaseState(&base_state);
+ base_state.SerializeToString(state);
+ return Status::OK();
+ }
+
+ Status RestoreStateLocked(const string& state) override {
+ ReaderBaseState base_state;
+ if (!ParseProtoUnlimited(&base_state, state)) {
+ return errors::InvalidArgument("Could not parse state for ", name(), ": ",
+ str_util::CEscape(state));
+ }
+ TF_RETURN_IF_ERROR(RestoreBaseState(base_state));
+ return Status::OK();
+ }
+
+ private:
+ Env* env_;
+};
+
+class WholeFileReaderOp : public ReaderOpKernel {
+ public:
+ explicit WholeFileReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ Env* env = context->env();
+ SetReaderFactory(
+ [this, env]() { return new WholeFileReader(env, name()); });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("WholeFileReader").Device(DEVICE_CPU),
+ WholeFileReaderOp);
+
+class ReadFileOp : public OpKernel {
+ public:
+ using OpKernel::OpKernel;
+ void Compute(OpKernelContext* context) override {
+ const Tensor* input;
+ OP_REQUIRES_OK(context, context->input("filename", &input));
+ OP_REQUIRES(context, TensorShapeUtils::IsScalar(input->shape()),
+ errors::InvalidArgument(
+ "Input filename tensor must be scalar, but had shape: ",
+ input->shape().DebugString()));
+
+ Tensor* output = nullptr;
+ OP_REQUIRES_OK(context, context->allocate_output("contents",
+ TensorShape({}), &output));
+ OP_REQUIRES_OK(context,
+ ReadEntireFile(context->env(), input->scalar<string>()(),
+ &output->scalar<string>()()));
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ReadFile").Device(DEVICE_CPU), ReadFileOp);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/xent_op.cc b/tensorflow/core/kernels/xent_op.cc
new file mode 100644
index 0000000000..ff54d157af
--- /dev/null
+++ b/tensorflow/core/kernels/xent_op.cc
@@ -0,0 +1,90 @@
+// See docs in ../ops/nn_ops.cc.
+
+#define EIGEN_USE_THREADS
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/kernels/xent_op.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+typedef Eigen::GpuDevice GPUDevice;
+
+template <typename Device, typename T>
+class SoftmaxXentWithLogitsOp : public OpKernel {
+ public:
+ explicit SoftmaxXentWithLogitsOp(OpKernelConstruction* context)
+ : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ const Tensor& logits_in = context->input(0);
+ const Tensor& labels_in = context->input(1);
+ OP_REQUIRES(context, logits_in.IsSameSize(labels_in),
+ errors::InvalidArgument(
+ "logits and labels must be same size: logits_size=",
+ logits_in.shape().DebugString(), " labels_size=",
+ labels_in.shape().DebugString()));
+ OP_REQUIRES(context, TensorShapeUtils::IsMatrix(logits_in.shape()),
+ errors::InvalidArgument("logits must be 2-dimensional"));
+ // As we already tested that both inputs have the same shape, there is no
+ // need to check that "labels" is a matrix too.
+
+ // loss is 1-D (one per example), and size is batch_size.
+
+ Tensor scratch;
+ OP_REQUIRES_OK(
+ context, context->allocate_temp(DataTypeToEnum<T>::value,
+ TensorShape({logits_in.dim_size(0), 1}),
+ &scratch));
+
+ Tensor* loss_out = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(
+ 0, TensorShape({logits_in.dim_size(0)}), &loss_out));
+ Tensor* back_out = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(1, logits_in.shape(), &back_out));
+
+ functor::XentFunctor<Device, T> functor;
+ functor(context->eigen_device<Device>(), logits_in.matrix<T>(),
+ labels_in.matrix<T>(), scratch.matrix<T>(), loss_out->vec<T>(),
+ back_out->matrix<T>());
+ }
+};
+
+// Partial specialization for a CPUDevice, that uses the Eigen implementation
+// from XentEigenImpl.
+namespace functor {
+template <typename T>
+struct XentFunctor<CPUDevice, T> {
+ void operator()(const CPUDevice& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::ConstMatrix labels,
+ typename TTypes<T>::Matrix scratch,
+ typename TTypes<T>::Vec loss,
+ typename TTypes<T>::Matrix backprop) {
+ XentEigenImpl<CPUDevice, T>::Compute(d, logits, labels, scratch, loss,
+ backprop);
+ }
+};
+} // namespace functor
+
+REGISTER_KERNEL_BUILDER(Name("SoftmaxCrossEntropyWithLogits")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ SoftmaxXentWithLogitsOp<CPUDevice, float>);
+REGISTER_KERNEL_BUILDER(Name("SoftmaxCrossEntropyWithLogits")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T"),
+ SoftmaxXentWithLogitsOp<CPUDevice, double>);
+
+#if GOOGLE_CUDA
+REGISTER_KERNEL_BUILDER(Name("SoftmaxCrossEntropyWithLogits")
+ .Device(DEVICE_GPU)
+ .TypeConstraint<float>("T"),
+ SoftmaxXentWithLogitsOp<GPUDevice, float>);
+#endif // GOOGLE_CUDA
+
+} // namespace tensorflow
diff --git a/tensorflow/core/kernels/xent_op.h b/tensorflow/core/kernels/xent_op.h
new file mode 100644
index 0000000000..edb7d817c8
--- /dev/null
+++ b/tensorflow/core/kernels/xent_op.h
@@ -0,0 +1,102 @@
+#ifndef TENSORFLOW_KERNELS_XENT_OP_H_
+#define TENSORFLOW_KERNELS_XENT_OP_H_
+// Functor definition for XentOp, must be compilable by nvcc.
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace functor {
+
+// Functor used by XentOp to do the computations.
+template <typename Device, typename T>
+struct XentFunctor {
+ // Computes Cross Entropy loss and backprop.
+ //
+ // logits: batch_size, num_classes.
+ // labels: batch_size, num_classes.
+ // scratch: temporary tensor, dims: batch_size, 1
+ // loss: output tensor for the loss, dims: batch_size.
+ // backprop: output tensor for the backprop, dims: batch_size, num_classes.
+ void operator()(const Device& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::ConstMatrix labels,
+ typename TTypes<T>::Matrix scratch,
+ typename TTypes<T>::Vec loss,
+ typename TTypes<T>::Matrix backprop);
+};
+
+// Eigen code implementing XentFunctor::operator().
+// This code works for both CPU and GPU and is used by the functor
+// specializations for both device types.
+template <typename Device, typename T>
+struct XentEigenImpl {
+ static void Compute(const Device& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::ConstMatrix labels,
+ typename TTypes<T>::Matrix scratch,
+ typename TTypes<T>::Vec loss,
+ typename TTypes<T>::Matrix backprop) {
+ // NOTE(mdevin): This duplicates some of the computations in softmax_op
+ // because we need the intermediate (logits - max(logits)) values to
+ // avoid a log(exp()) in the computation of the loss.
+
+ const int kBatchDim = 0;
+ const int kClassDim = 1;
+
+ const int batch_size = logits.dimension(kBatchDim);
+ const int num_classes = logits.dimension(kClassDim);
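+ // For example, with logits and labels of shape [32, 10], scratch is
+ // [32, 1], loss is [32] and backprop is [32, 10].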
+
+// These arrays are used to reduce along the class dimension, and broadcast
+// the resulting value to all classes.
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ Eigen::array<int, 1> along_class;
+ along_class[0] = kClassDim;
+ Eigen::array<int, 1> batch_only;
+ batch_only[0] = batch_size;
+ Eigen::array<int, 2> batch_by_one;
+ batch_by_one[0] = batch_size;
+ batch_by_one[1] = 1;
+ Eigen::array<int, 2> one_by_class;
+ one_by_class[0] = 1;
+ one_by_class[1] = num_classes;
+#else
+ Eigen::IndexList<Eigen::type2index<kClassDim> > along_class;
+ Eigen::IndexList<int, Eigen::type2index<1> > batch_by_one;
+ batch_by_one.set(0, batch_size);
+ Eigen::IndexList<int> batch_only;
+ batch_only.set(0, batch_size);
+ Eigen::IndexList<Eigen::type2index<1>, int> one_by_class;
+ one_by_class.set(1, num_classes);
+#endif
+
+ // max_logits along classes.
+ scratch.reshape(batch_only).device(d) = logits.maximum(along_class);
+
+ // logits - max_logits.
+ backprop.device(d) = logits - scratch.broadcast(one_by_class);
+
+ // sum(exp(logits - max_logits)) along classes.
+ scratch.reshape(batch_only).device(d) = backprop.exp().sum(along_class);
+
+ // NOTE(keveman): Eigen on GPU dispatches to an optimized implementation
+ // for an expression of the form lhs = rhs.sum().
+ // lhs = -rhs.sum() doesn't match the above pattern, so folding in the
+ // negation before calling sum().
+ // sum(-labels *
+ // ((logits - max_logits) - log(sum(exp(logits - max_logits)))))
+ // along classes
+ loss.device(d) =
+ (labels * (scratch.log().eval().broadcast(one_by_class) - backprop))
+ .eval()
+ .sum(along_class);
+
+ // backprop: prob - labels, where
+ // prob = exp(logits - max_logits) / sum(exp(logits - max_logits))
+ backprop.device(d) =
+ (backprop.exp() / scratch.broadcast(one_by_class)) - labels;
+ }
+};
+
+} // namespace functor
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_XENT_OP_H_
diff --git a/tensorflow/core/kernels/xent_op_gpu.cu.cc b/tensorflow/core/kernels/xent_op_gpu.cu.cc
new file mode 100644
index 0000000000..eec6a84281
--- /dev/null
+++ b/tensorflow/core/kernels/xent_op_gpu.cu.cc
@@ -0,0 +1,35 @@
+#if GOOGLE_CUDA
+
+#define EIGEN_USE_GPU
+
+#include "tensorflow/core/kernels/xent_op.h"
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/framework/tensor_types.h"
+
+namespace tensorflow {
+
+typedef Eigen::GpuDevice GPUDevice;
+
+// Partial specialization for a GPUDevice, that uses the Eigen implementation
+// from XentEigenImpl.
+namespace functor {
+template <typename T>
+struct XentFunctor<GPUDevice, T> {
+ void operator()(const GPUDevice& d, typename TTypes<T>::ConstMatrix logits,
+ typename TTypes<T>::ConstMatrix labels,
+ typename TTypes<T>::Matrix scratch,
+ typename TTypes<T>::Vec loss,
+ typename TTypes<T>::Matrix backprop) {
+ XentEigenImpl<GPUDevice, T>::Compute(d, logits, labels, scratch, loss,
+ backprop);
+ }
+};
+} // end namespace functor
+
+// Instantiate the GPU implementation for float.
+template struct functor::XentFunctor<GPUDevice, float>;
+
+} // end namespace tensorflow
+
+#endif // GOOGLE_CUDA
diff --git a/tensorflow/core/kernels/xent_op_test.cc b/tensorflow/core/kernels/xent_op_test.cc
new file mode 100644
index 0000000000..9aab1b09bf
--- /dev/null
+++ b/tensorflow/core/kernels/xent_op_test.cc
@@ -0,0 +1,46 @@
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/kernels/xent_op.h"
+
+namespace tensorflow {
+
+static Graph* Xent(int batch_size, int num_classes) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor logits(DT_FLOAT, TensorShape({batch_size, num_classes}));
+ logits.flat<float>().setRandom();
+ Tensor labels(DT_FLOAT, TensorShape({batch_size, num_classes}));
+ labels.flat<float>().setRandom();
+ test::graph::Binary(g, "SoftmaxCrossEntropyWithLogits",
+ test::graph::Constant(g, logits),
+ test::graph::Constant(g, labels));
+ return g;
+}
+
+#define BM_XentDev(BATCH, CLASS, DEVICE) \
+ static void BM_Xent##_##BATCH##_##CLASS##_##DEVICE(int iters) { \
+ testing::ItemsProcessed(static_cast<int64>(iters) * BATCH * CLASS); \
+ test::Benchmark(#DEVICE, Xent(BATCH, CLASS)).Run(iters); \
+ } \
+ BENCHMARK(BM_Xent##_##BATCH##_##CLASS##_##DEVICE);
+
+/// The representative tests for ptb_word on GPU
+BM_XentDev(16, 10000, gpu);
+BM_XentDev(16, 30000, gpu);
+BM_XentDev(16, 100000, gpu);
+
+BM_XentDev(32, 10000, gpu);
+BM_XentDev(32, 30000, gpu);
+BM_XentDev(32, 100000, gpu);
+
+BM_XentDev(64, 10000, gpu);
+BM_XentDev(64, 30000, gpu);
+BM_XentDev(64, 100000, gpu);
+
+/// Only the smaller tests for CPU. Otherwise, it's too slow.
+BM_XentDev(16, 10000, cpu);
+BM_XentDev(32, 10000, cpu);
+BM_XentDev(64, 10000, cpu);
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/lib/core/arena.cc b/tensorflow/core/lib/core/arena.cc
new file mode 100644
index 0000000000..ceb1001af0
--- /dev/null
+++ b/tensorflow/core/lib/core/arena.cc
@@ -0,0 +1,246 @@
+// This approach to arenas overcomes many of the limitations described
+// in the "Specialized allocators" section of
+// http://www.pdos.lcs.mit.edu/~dm/c++-new.html
+//
+// A somewhat similar approach to Gladiator, but for heap-detection, was
+// suggested by Ron van der Wal and Scott Meyers at
+// http://www.aristeia.com/BookErrata/M27Comments_frames.html
+
+#include "tensorflow/core/lib/core/arena.h"
+
+#include <assert.h>
+#include <unistd.h>
+
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+namespace core {
+
+static const int kPageSize = getpagesize();
+
+// ----------------------------------------------------------------------
+// Arena::Arena()
+// Arena::~Arena()
+// Destroying the arena automatically calls Reset()
+// ----------------------------------------------------------------------
+
+Arena::Arena(const size_t block_size)
+ : remaining_(0),
+ block_size_(block_size),
+ freestart_(NULL), // set for real in Reset()
+ blocks_alloced_(1),
+ overflow_blocks_(NULL) {
+ assert(block_size > kDefaultAlignment);
+
+ first_blocks_[0].mem = reinterpret_cast<char*>(malloc(block_size_));
+
+ first_blocks_[0].size = block_size_;
+
+ Reset();
+}
+
+Arena::~Arena() {
+ FreeBlocks();
+ assert(overflow_blocks_ == NULL); // FreeBlocks() should do that
+ // The first blocks always stay allocated by default. Delete them now.
+ for (size_t i = 0; i < blocks_alloced_; ++i) free(first_blocks_[i].mem);
+}
+
+// Returns true iff it advances freestart_ to the first position
+// satisfying alignment without exhausting the current block.
+bool Arena::SatisfyAlignment(size_t alignment) {
+ const size_t overage = reinterpret_cast<size_t>(freestart_) & (alignment - 1);
+ if (overage > 0) {
+ const size_t waste = alignment - overage;
+ if (waste >= remaining_) {
+ return false;
+ }
+ freestart_ += waste;
+ remaining_ -= waste;
+ }
+ DCHECK_EQ(0, reinterpret_cast<size_t>(freestart_) & (alignment - 1));
+ return true;
+}
+
+// ----------------------------------------------------------------------
+// Arena::Reset()
+// Clears all the memory an arena is using.
+// ----------------------------------------------------------------------
+
+void Arena::Reset() {
+ FreeBlocks();
+ freestart_ = first_blocks_[0].mem;
+ remaining_ = first_blocks_[0].size;
+
+ // There is no guarantee the first block is properly aligned, so
+ // enforce that now.
+ CHECK(SatisfyAlignment(kDefaultAlignment));
+
+ freestart_when_empty_ = freestart_;
+}
+
+// ----------------------------------------------------------------------
+// Arena::MakeNewBlock()
+// Our sbrk() equivalent. We always make blocks of the same size
+// (though GetMemory() can also make a new block for really big
+// data).
+// ----------------------------------------------------------------------
+
+void Arena::MakeNewBlock(const uint32 alignment) {
+ AllocatedBlock* block = AllocNewBlock(block_size_, alignment);
+ freestart_ = block->mem;
+ remaining_ = block->size;
+ CHECK(SatisfyAlignment(alignment));
+}
+
+// The following simple numeric routines also exist in util/math/mathutil.h
+// but we don't want to depend on that library.
+
+// Euclid's algorithm for the Greatest Common Divisor.
+static uint32 GCD(uint32 x, uint32 y) {
+ while (y != 0) {
+ uint32 r = x % y;
+ x = y;
+ y = r;
+ }
+ return x;
+}
+
+static uint32 LeastCommonMultiple(uint32 a, uint32 b) {
+ if (a > b) {
+ return (a / GCD(a, b)) * b;
+ } else if (a < b) {
+ return (b / GCD(b, a)) * a;
+ } else {
+ return a;
+ }
+}
+
+// -------------------------------------------------------------
+// Arena::AllocNewBlock()
+// Adds and returns an AllocatedBlock.
+// The returned AllocatedBlock* is valid until the next call
+// to AllocNewBlock or Reset. (i.e. anything that might
+// affect overflow_blocks_).
+// -------------------------------------------------------------
+
+Arena::AllocatedBlock* Arena::AllocNewBlock(const size_t block_size,
+ const uint32 alignment) {
+ AllocatedBlock* block;
+ // Find the next block.
+ if (blocks_alloced_ < TF_ARRAYSIZE(first_blocks_)) {
+ // Use one of the pre-allocated blocks
+ block = &first_blocks_[blocks_alloced_++];
+ } else { // oops, out of space, move to the vector
+ if (overflow_blocks_ == NULL)
+ overflow_blocks_ = new std::vector<AllocatedBlock>;
+ // Adds another block to the vector.
+ overflow_blocks_->resize(overflow_blocks_->size() + 1);
+ // block points to the last block of the vector.
+ block = &overflow_blocks_->back();
+ }
+
+ // NOTE(tucker): this utility is made slightly more complex by
+ // not disallowing the case where alignment > block_size.
+ // Can we disallow that case without breaking existing code?
+
+ // Must be a multiple of kDefaultAlignment, unless requested
+ // alignment is 1, in which case we don't care at all.
+ const uint32 adjusted_alignment =
+ (alignment > 1 ? LeastCommonMultiple(alignment, kDefaultAlignment) : 1);
+
+ CHECK_LE(adjusted_alignment, 1 << 20)
+ << "Alignment on boundaries greater than 1MB not supported.";
+
+ // If block_size > alignment we force block_size to be a multiple
+ // of alignment; if block_size < alignment we make no adjustment.
+ size_t adjusted_block_size = block_size;
+ if (adjusted_alignment > 1) {
+ if (adjusted_block_size > adjusted_alignment) {
+ const uint32 excess = adjusted_block_size % adjusted_alignment;
+ adjusted_block_size += (excess > 0 ? adjusted_alignment - excess : 0);
+ }
+ block->mem = reinterpret_cast<char*>(
+ port::aligned_malloc(adjusted_block_size, adjusted_alignment));
+ } else {
+ block->mem = reinterpret_cast<char*>(malloc(adjusted_block_size));
+ }
+ block->size = adjusted_block_size;
+ CHECK(NULL != block->mem) << "block_size=" << block_size
+ << " adjusted_block_size=" << adjusted_block_size
+ << " alignment=" << alignment
+ << " adjusted_alignment=" << adjusted_alignment;
+
+ return block;
+}
+
+// ----------------------------------------------------------------------
+// Arena::GetMemoryFallback()
+// We take memory out of our pool, aligned on the byte boundary
+// requested. If we don't have space in our current pool, we
+// allocate a new block (wasting the remaining space in the
+// current block) and give you that. If your memory needs are
+// too big for a single block, we make a special your-memory-only
+// allocation -- this is equivalent to not using the arena at all.
+// ----------------------------------------------------------------------
+
+void* Arena::GetMemoryFallback(const size_t size, const int alignment) {
+ if (0 == size) {
+ return NULL; // stl/stl_alloc.h says this is okay
+ }
+
+ // alignment must be a positive power of 2.
+ CHECK(alignment > 0 && 0 == (alignment & (alignment - 1)));
+
+ // If the object is more than a quarter of the block size, allocate
+ // it separately to avoid wasting too much space in leftover bytes.
+ if (block_size_ == 0 || size > block_size_ / 4) {
+ return AllocNewBlock(size, alignment)->mem;
+ }
+
+ // Enforce alignment on freestart_ then check for adequate space,
+ // which may require starting a new block.
+ if (!SatisfyAlignment(alignment) || size > remaining_) {
+ MakeNewBlock(alignment);
+ }
+ CHECK_LE(size, remaining_);
+
+ remaining_ -= size;
+ void* result = freestart_;
+ freestart_ += size;
+
+ return result;
+}
+
+// ----------------------------------------------------------------------
+// Arena::ReturnMemoryFallback()
+// Arena::FreeBlocks()
+// Unlike GetMemory(), which does actual work, ReturnMemory() is a
+// no-op: we don't "free" memory until Reset() is called. We do
+// update some stats, though. Note we do no checking that the
+// pointer you pass in was actually allocated by us, or that it
+// was allocated for the size you say, so be careful here!
+// FreeBlocks() does the work for Reset(), actually freeing all
+// memory allocated in one fell swoop.
+// ----------------------------------------------------------------------
+
+void Arena::FreeBlocks() {
+ for (size_t i = 1; i < blocks_alloced_; ++i) { // keep first block alloced
+ free(first_blocks_[i].mem);
+ first_blocks_[i].mem = NULL;
+ first_blocks_[i].size = 0;
+ }
+ blocks_alloced_ = 1;
+ if (overflow_blocks_ != NULL) {
+ std::vector<AllocatedBlock>::iterator it;
+ for (it = overflow_blocks_->begin(); it != overflow_blocks_->end(); ++it) {
+ free(it->mem);
+ }
+ delete overflow_blocks_; // These should be used very rarely
+ overflow_blocks_ = NULL;
+ }
+}
+
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/arena.h b/tensorflow/core/lib/core/arena.h
new file mode 100644
index 0000000000..59896803bb
--- /dev/null
+++ b/tensorflow/core/lib/core/arena.h
@@ -0,0 +1,90 @@
+// TODO(vrv): Switch this to an open-sourced version of Arena.
+
+#ifndef TENSORFLOW_LIB_CORE_ARENA_H_
+#define TENSORFLOW_LIB_CORE_ARENA_H_
+
+#include <assert.h>
+
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace core {
+
+// This class is "thread-compatible": different threads can access the
+// arena at the same time without locking, as long as they use only
+// const methods.
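+//
+// Illustrative usage (not part of the interface):
+//   Arena arena(16 << 10);         // 16KB blocks
+//   char* buf = arena.Alloc(100);  // pointer-bump allocation
+//   ...
+//   arena.Reset();                 // releases everything at once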
+class Arena {
+ public:
+ // Allocates a thread-compatible arena with the specified block size.
+ explicit Arena(const size_t block_size);
+ ~Arena();
+
+ char* Alloc(const size_t size) {
+ return reinterpret_cast<char*>(GetMemory(size, 1));
+ }
+
+ void Reset();
+
+// This should be the worst-case alignment for any type. This is
+// good for IA-32, SPARC version 7 (the last one I know), and
+// supposedly Alpha. i386 would be more time-efficient with a
+// default alignment of 8, but ::operator new() uses alignment of 4,
+// and an assertion will fail below after the call to MakeNewBlock()
+// if you try to use a larger alignment.
+#ifdef __i386__
+ static const int kDefaultAlignment = 4;
+#else
+ static const int kDefaultAlignment = 8;
+#endif
+
+ protected:
+ bool SatisfyAlignment(const size_t alignment);
+ void MakeNewBlock(const uint32 alignment);
+ void* GetMemoryFallback(const size_t size, const int align);
+ void* GetMemory(const size_t size, const int align) {
+ assert(remaining_ <= block_size_); // an invariant
+ if (size > 0 && size < remaining_ && align == 1) { // common case
+ void* result = freestart_;
+ freestart_ += size;
+ remaining_ -= size;
+ return result;
+ }
+ return GetMemoryFallback(size, align);
+ }
+
+ size_t remaining_;
+
+ private:
+ struct AllocatedBlock {
+ char* mem;
+ size_t size;
+ };
+
+ // Allocates a new block of at least block_size bytes, with the specified
+ // alignment.
+ // The returned AllocatedBlock* is valid until the next call to AllocNewBlock
+ // or Reset (i.e. anything that might affect overflow_blocks_).
+ AllocatedBlock* AllocNewBlock(const size_t block_size,
+ const uint32 alignment);
+
+ const size_t block_size_;
+ char* freestart_; // beginning of the free space in most recent block
+ char* freestart_when_empty_; // beginning of the free space when we're empty
+ // STL vector isn't as efficient as it could be, so we use an array at first
+ size_t blocks_alloced_; // how many of the first_blocks_ have been alloced
+ AllocatedBlock first_blocks_[16]; // the length of this array is arbitrary
+ // if the first_blocks_ aren't enough, expand into overflow_blocks_.
+ std::vector<AllocatedBlock>* overflow_blocks_;
+
+ void FreeBlocks(); // Frees all except first block
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Arena);
+};
+
+} // namespace core
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_ARENA_H_
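(Usage sketch, not part of this change: a minimal example of the Arena API declared above. The 4096-byte block size and the allocation sizes are arbitrary values chosen for illustration.)

    #include <string.h>
    #include "tensorflow/core/lib/core/arena.h"

    void ArenaExample() {
      tensorflow::core::Arena arena(4096);  // allocate in 4KB blocks
      char* buf = arena.Alloc(128);         // fast path: bump-pointer allocation
      memset(buf, 0, 128);
      char* big = arena.Alloc(8192);        // larger than a block: handed its own block
      memset(big, 0, 8192);
      arena.Reset();                        // releases all blocks except the first
    }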
diff --git a/tensorflow/core/lib/core/arena_test.cc b/tensorflow/core/lib/core/arena_test.cc
new file mode 100644
index 0000000000..fa147c3014
--- /dev/null
+++ b/tensorflow/core/lib/core/arena_test.cc
@@ -0,0 +1,92 @@
+#include "tensorflow/core/lib/core/arena.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace core {
+namespace {
+
+// Write test patterns to allocated memory
+static void TestMemory(void* mem, int size) {
+ // Check that we can memset the entire memory
+ memset(mem, 0xaa, size);
+
+ // Do some memory allocation to check that the arena doesn't mess up
+ // the internal memory allocator
+ char* tmp[100];
+ for (size_t i = 0; i < TF_ARRAYSIZE(tmp); i++) {
+ tmp[i] = new char[i * i + 1];
+ }
+
+ memset(mem, 0xcc, size);
+
+ // Free up the allocated memory.
+ for (size_t i = 0; i < TF_ARRAYSIZE(tmp); i++) {
+ delete[] tmp[i];
+ }
+
+ // Check that we can memset the entire memory
+ memset(mem, 0xee, size);
+}
+
+TEST(ArenaTest, TestBasicArena) {
+ Arena a(1024);
+ char* memory = a.Alloc(100);
+ ASSERT_NE(memory, nullptr);
+ TestMemory(memory, 100);
+
+ // Allocate again
+ memory = a.Alloc(100);
+ ASSERT_NE(memory, nullptr);
+ TestMemory(memory, 100);
+}
+
+TEST(ArenaTest, TestVariousArenaSizes) {
+ {
+ Arena a(1024);
+
+ // Allocate blocksize
+ char* memory = a.Alloc(1024);
+ ASSERT_NE(memory, nullptr);
+ TestMemory(memory, 1024);
+
+ // Allocate another blocksize
+ char* memory2 = a.Alloc(1024);
+ ASSERT_NE(memory2, nullptr);
+ TestMemory(memory2, 1024);
+ }
+
+ // Allocate an arena and allocate two blocks
+ // that together exceed a block size
+ {
+ Arena a(1024);
+
+ // Allocate a 768-byte chunk
+ char* memory = a.Alloc(768);
+ ASSERT_NE(memory, nullptr);
+ TestMemory(memory, 768);
+
+ // Allocate a second 768-byte chunk; together they exceed the block size
+ char* memory2 = a.Alloc(768);
+ ASSERT_NE(memory2, nullptr);
+ TestMemory(memory2, 768);
+ }
+
+ // Allocate larger than a blocksize
+ {
+ Arena a(1024);
+
+ char* memory = a.Alloc(10240);
+ ASSERT_NE(memory, nullptr);
+ TestMemory(memory, 10240);
+
+ // Allocate another chunk, slightly larger than a block
+ char* memory2 = a.Alloc(1234);
+ ASSERT_NE(memory2, nullptr);
+ TestMemory(memory2, 1234);
+ }
+}
+
+} // namespace
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/bit_cast_test.cc b/tensorflow/core/lib/core/bit_cast_test.cc
new file mode 100644
index 0000000000..0ea583e96f
--- /dev/null
+++ b/tensorflow/core/lib/core/bit_cast_test.cc
@@ -0,0 +1,95 @@
+// Unit test for bit_cast template.
+
+#include "tensorflow/core/lib/core/casts.h"
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+// Marshall and unmarshall.
+// ISO spec C++ section 3.9 promises this will work.
+
+template <int N>
+struct marshall {
+ char buf[N];
+};
+
+template <class T>
+void TestMarshall(const T values[], int num_values) {
+ for (int i = 0; i < num_values; ++i) {
+ T t0 = values[i];
+ marshall<sizeof(T)> m0 = bit_cast<marshall<sizeof(T)> >(t0);
+ T t1 = bit_cast<T>(m0);
+ marshall<sizeof(T)> m1 = bit_cast<marshall<sizeof(T)> >(t1);
+ ASSERT_EQ(0, memcmp(&t0, &t1, sizeof(T)));
+ ASSERT_EQ(0, memcmp(&m0, &m1, sizeof(T)));
+ }
+}
+
+// Convert back and forth to an integral type. The C++ standard does
+// not guarantee this will work.
+//
+// There are implicit assumptions about sizeof(float) and
+// sizeof(double); these assumptions are pervasive in practice.
+
+template <class T, class I>
+void TestIntegral(const T values[], int num_values) {
+ for (int i = 0; i < num_values; ++i) {
+ T t0 = values[i];
+ I i0 = bit_cast<I>(t0);
+ T t1 = bit_cast<T>(i0);
+ I i1 = bit_cast<I>(t1);
+ ASSERT_EQ(0, memcmp(&t0, &t1, sizeof(T)));
+ ASSERT_EQ(i0, i1);
+ }
+}
+
+TEST(BitCast, Bool) {
+ LOG(INFO) << "Test bool";
+ static const bool bool_list[] = {false, true};
+ TestMarshall<bool>(bool_list, TF_ARRAYSIZE(bool_list));
+}
+
+TEST(BitCast, Int32) {
+ static const int32 int_list[] = {0, 1, 100, 2147483647,
+ -1, -100, -2147483647, -2147483647 - 1};
+ TestMarshall<int32>(int_list, TF_ARRAYSIZE(int_list));
+}
+
+TEST(BitCast, Int64) {
+ static const int64 int64_list[] = {0, 1, 1LL << 40, -1, -(1LL << 40)};
+ TestMarshall<int64>(int64_list, TF_ARRAYSIZE(int64_list));
+}
+
+TEST(BitCast, Uint64) {
+ static const uint64 uint64_list[] = {0, 1, 1LLU << 40, 1LLU << 63};
+ TestMarshall<uint64>(uint64_list, TF_ARRAYSIZE(uint64_list));
+}
+
+TEST(BitCast, Float) {
+ static const float float_list[] = {0.0, 1.0, -1.0, 10.0, -10.0, 1e10,
+ 1e20, 1e-10, 1e-20, 2.71828, 3.14159};
+ TestMarshall<float>(float_list, TF_ARRAYSIZE(float_list));
+ TestIntegral<float, int32>(float_list, TF_ARRAYSIZE(float_list));
+ TestIntegral<float, uint32>(float_list, TF_ARRAYSIZE(float_list));
+}
+
+TEST(BitCast, Double) {
+ static const double double_list[] = {
+ 0.0,
+ 1.0,
+ -1.0,
+ 10.0,
+ -10.0,
+ 1e10,
+ 1e100,
+ 1e-10,
+ 1e-100,
+ 2.718281828459045,
+ 3.141592653589793238462643383279502884197169399375105820974944};
+ TestMarshall<double>(double_list, TF_ARRAYSIZE(double_list));
+ TestIntegral<double, int64>(double_list, TF_ARRAYSIZE(double_list));
+ TestIntegral<double, uint64>(double_list, TF_ARRAYSIZE(double_list));
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/bits.h b/tensorflow/core/lib/core/bits.h
new file mode 100644
index 0000000000..5456a63168
--- /dev/null
+++ b/tensorflow/core/lib/core/bits.h
@@ -0,0 +1,84 @@
+#ifndef TENSORFLOW_LIB_CORE_BITS_H_
+#define TENSORFLOW_LIB_CORE_BITS_H_
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Return floor(log2(n)) for positive integer n. Returns -1 iff n == 0.
+int Log2Floor(uint32 n);
+int Log2Floor64(uint64 n);
+
+// Return ceiling(log2(n)) for positive integer n. Returns -1 iff n == 0.
+int Log2Ceiling(uint32 n);
+int Log2Ceiling64(uint64 n);
+
+// ------------------------------------------------------------------------
+// Implementation details follow
+// ------------------------------------------------------------------------
+
+#if defined(__GNUC__)
+
+// Return floor(log2(n)) for positive integer n. Returns -1 iff n == 0.
+inline int Log2Floor(uint32 n) {
+ return n == 0 ? -1 : 31 ^ __builtin_clz(n);
+}
+
+// Return floor(log2(n)) for positive integer n. Returns -1 iff n == 0.
+inline int Log2Floor64(uint64 n) {
+ return n == 0 ? -1 : 63 ^ __builtin_clzll(n);
+}
+
+#else
+
+// Return floor(log2(n)) for positive integer n. Returns -1 iff n == 0.
+inline int Log2Floor(uint32 n) {
+ if (n == 0)
+ return -1;
+ int log = 0;
+ uint32 value = n;
+ for (int i = 4; i >= 0; --i) {
+ int shift = (1 << i);
+ uint32 x = value >> shift;
+ if (x != 0) {
+ value = x;
+ log += shift;
+ }
+ }
+ assert(value == 1);
+ return log;
+}
+
+// Return floor(log2(n)) for positive integer n. Returns -1 iff n == 0.
+// Log2Floor64() is defined in terms of Log2Floor()
+inline int Log2Floor64(uint64 n) {
+ const uint32 topbits = static_cast<uint32>(n >> 32);
+ if (topbits == 0) {
+ // Top bits are zero, so scan in bottom bits
+ return Log2Floor(static_cast<uint32>(n));
+ } else {
+ return 32 + Log2Floor(topbits);
+ }
+}
+
+#endif
+
+inline int Log2Ceiling(uint32 n) {
+ int floor = Log2Floor(n);
+ if (n == (n & ~(n - 1))) // zero or a power of two
+ return floor;
+ else
+ return floor + 1;
+}
+
+inline int Log2Ceiling64(uint64 n) {
+ int floor = Log2Floor64(n);
+ if (n == (n & ~(n - 1))) // zero or a power of two
+ return floor;
+ else
+ return floor + 1;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_BITS_H_
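(Usage sketch, not part of this change: expected results of the log2 helpers above, shown as assertions.)

    #include <assert.h>
    #include "tensorflow/core/lib/core/bits.h"

    void BitsExample() {
      assert(tensorflow::Log2Floor(1) == 0);
      assert(tensorflow::Log2Floor(9) == 3);    // floor(log2(9)) == 3
      assert(tensorflow::Log2Ceiling(9) == 4);  // ceiling(log2(9)) == 4
      assert(tensorflow::Log2Ceiling(8) == 3);  // for powers of two, floor == ceiling
      assert(tensorflow::Log2Floor(0) == -1);   // 0 maps to -1 by convention
    }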
diff --git a/tensorflow/core/lib/core/blocking_counter.h b/tensorflow/core/lib/core/blocking_counter.h
new file mode 100644
index 0000000000..f141be2c76
--- /dev/null
+++ b/tensorflow/core/lib/core/blocking_counter.h
@@ -0,0 +1,41 @@
+#ifndef TENSORFLOW_LIB_CORE_BLOCKING_COUNTER_H_
+#define TENSORFLOW_LIB_CORE_BLOCKING_COUNTER_H_
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class BlockingCounter {
+ public:
+ BlockingCounter(int initial_count) : count_(initial_count) {
+ CHECK_GE(count_, 0);
+ }
+
+ ~BlockingCounter() {}
+
+ inline void DecrementCount() {
+ mutex_lock l(mu_);
+ --count_;
+ CHECK(count_ >= 0);
+ if (count_ == 0) {
+ cond_var_.notify_all();
+ }
+ }
+
+ inline void Wait() {
+ mutex_lock l(mu_);
+ while (count_ > 0) {
+ cond_var_.wait(l);
+ }
+ }
+
+ private:
+ int count_;
+ mutex mu_;
+ condition_variable cond_var_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_BLOCKING_COUNTER_H_
diff --git a/tensorflow/core/lib/core/blocking_counter_test.cc b/tensorflow/core/lib/core/blocking_counter_test.cc
new file mode 100644
index 0000000000..feb0342086
--- /dev/null
+++ b/tensorflow/core/lib/core/blocking_counter_test.cc
@@ -0,0 +1,36 @@
+#include <gtest/gtest.h>
+
+#include "tensorflow/core/lib/core/blocking_counter.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+
+namespace tensorflow {
+namespace {
+
+TEST(BlockingCounterTest, TestZero) {
+ BlockingCounter bc(0);
+ bc.Wait();
+}
+
+TEST(BlockingCounterTest, TestSingleThread) {
+ BlockingCounter bc(2);
+ bc.DecrementCount();
+ bc.DecrementCount();
+ bc.Wait();
+}
+
+TEST(BlockingCounterTest, TestMultipleThread) {
+ int N = 3;
+ thread::ThreadPool* thread_pool =
+ new thread::ThreadPool(Env::Default(), "test", N);
+
+ BlockingCounter bc(N);
+ for (int i = 0; i < N; ++i) {
+ thread_pool->Schedule([&bc] { bc.DecrementCount(); });
+ }
+
+ bc.Wait();
+ delete thread_pool;
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/casts.h b/tensorflow/core/lib/core/casts.h
new file mode 100644
index 0000000000..5b72048ac5
--- /dev/null
+++ b/tensorflow/core/lib/core/casts.h
@@ -0,0 +1,85 @@
+// Various Google-specific casting templates.
+//
+// This code is compiled directly on many platforms, including client
+// platforms like Windows, Mac, and embedded systems. Before making
+// any changes here, make sure that you're not breaking any platforms.
+//
+
+#ifndef TENSORFLOW_LIB_CORE_CASTS_H_
+#define TENSORFLOW_LIB_CORE_CASTS_H_
+
+#include <string.h> // for memcpy
+
+namespace tensorflow {
+
+// bit_cast<Dest,Source> is a template function that implements the
+// equivalent of "*reinterpret_cast<Dest*>(&source)". We need this in
+// very low-level functions like the protobuf library and fast math
+// support.
+//
+// float f = 3.14159265358979;
+// int i = bit_cast<int32>(f);
+// // i = 0x40490fdb
+//
+// The classical address-casting method is:
+//
+// // WRONG
+// float f = 3.14159265358979; // WRONG
+// int i = * reinterpret_cast<int*>(&f); // WRONG
+//
+// The address-casting method actually produces undefined behavior
+// according to ISO C++ specification section 3.10 -15-. Roughly, this
+// section says: if an object in memory has one type, and a program
+// accesses it with a different type, then the result is undefined
+// behavior for most values of "different type".
+//
+// This is true for any cast syntax, either *(int*)&f or
+// *reinterpret_cast<int*>(&f). And it is particularly true for
+// conversions between integral lvalues and floating-point lvalues.
+//
+// The purpose of 3.10 -15- is to allow optimizing compilers to assume
+// that expressions with different types refer to different memory. gcc
+// 4.0.1 has an optimizer that takes advantage of this. So a
+// non-conforming program quietly produces wildly incorrect output.
+//
+// The problem is not the use of reinterpret_cast. The problem is type
+// punning: holding an object in memory of one type and reading its bits
+// back using a different type.
+//
+// The C++ standard is more subtle and complex than this, but that
+// is the basic idea.
+//
+// Anyways ...
+//
+// bit_cast<> calls memcpy() which is blessed by the standard,
+// especially by the example in section 3.9. Also, of course,
+// bit_cast<> wraps up the nasty logic in one place.
+//
+// Fortunately memcpy() is very fast. In optimized mode, with a
+// constant size, gcc 2.95.3, gcc 4.0.1, and msvc 7.1 produce inline
+// code with the minimal amount of data movement. On a 32-bit system,
+// memcpy(d,s,4) compiles to one load and one store, and memcpy(d,s,8)
+// compiles to two loads and two stores.
+//
+// I tested this code with gcc 2.95.3, gcc 4.0.1, icc 8.1, and msvc 7.1.
+//
+// WARNING: if Dest or Source is a non-POD type, the result of the memcpy
+// is likely to surprise you.
+//
+// Props to Bill Gibbons for the compile time assertion technique and
+// Art Komninos and Igor Tandetnik for the msvc experiments.
+//
+// -- mec 2005-10-17
+
+template <class Dest, class Source>
+inline Dest bit_cast(const Source& source) {
+ static_assert(sizeof(Dest) == sizeof(Source), "Sizes do not match");
+
+ Dest dest;
+ memcpy(&dest, &source, sizeof(dest));
+ return dest;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_CASTS_H_
diff --git a/tensorflow/core/lib/core/coding.cc b/tensorflow/core/lib/core/coding.cc
new file mode 100644
index 0000000000..efff554742
--- /dev/null
+++ b/tensorflow/core/lib/core/coding.cc
@@ -0,0 +1,164 @@
+#include "tensorflow/core/lib/core/coding.h"
+
+namespace tensorflow {
+namespace core {
+
+void EncodeFixed32(char* buf, uint32 value) {
+ if (port::kLittleEndian) {
+ memcpy(buf, &value, sizeof(value));
+ } else {
+ buf[0] = value & 0xff;
+ buf[1] = (value >> 8) & 0xff;
+ buf[2] = (value >> 16) & 0xff;
+ buf[3] = (value >> 24) & 0xff;
+ }
+}
+
+void EncodeFixed64(char* buf, uint64 value) {
+ if (port::kLittleEndian) {
+ memcpy(buf, &value, sizeof(value));
+ } else {
+ buf[0] = value & 0xff;
+ buf[1] = (value >> 8) & 0xff;
+ buf[2] = (value >> 16) & 0xff;
+ buf[3] = (value >> 24) & 0xff;
+ buf[4] = (value >> 32) & 0xff;
+ buf[5] = (value >> 40) & 0xff;
+ buf[6] = (value >> 48) & 0xff;
+ buf[7] = (value >> 56) & 0xff;
+ }
+}
+
+void PutFixed32(string* dst, uint32 value) {
+ char buf[sizeof(value)];
+ EncodeFixed32(buf, value);
+ dst->append(buf, sizeof(buf));
+}
+
+void PutFixed64(string* dst, uint64 value) {
+ char buf[sizeof(value)];
+ EncodeFixed64(buf, value);
+ dst->append(buf, sizeof(buf));
+}
+
+char* EncodeVarint32(char* dst, uint32 v) {
+ // Operate on characters as unsigneds
+ unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
+ static const int B = 128;
+ if (v < (1 << 7)) {
+ *(ptr++) = v;
+ } else if (v < (1 << 14)) {
+ *(ptr++) = v | B;
+ *(ptr++) = v >> 7;
+ } else if (v < (1 << 21)) {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = v >> 14;
+ } else if (v < (1 << 28)) {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = (v >> 14) | B;
+ *(ptr++) = v >> 21;
+ } else {
+ *(ptr++) = v | B;
+ *(ptr++) = (v >> 7) | B;
+ *(ptr++) = (v >> 14) | B;
+ *(ptr++) = (v >> 21) | B;
+ *(ptr++) = v >> 28;
+ }
+ return reinterpret_cast<char*>(ptr);
+}
+
+void PutVarint32(string* dst, uint32 v) {
+ char buf[5];
+ char* ptr = EncodeVarint32(buf, v);
+ dst->append(buf, ptr - buf);
+}
+
+char* EncodeVarint64(char* dst, uint64 v) {
+ static const int B = 128;
+ unsigned char* ptr = reinterpret_cast<unsigned char*>(dst);
+ while (v >= B) {
+ *(ptr++) = (v & (B - 1)) | B;
+ v >>= 7;
+ }
+ *(ptr++) = static_cast<unsigned char>(v);
+ return reinterpret_cast<char*>(ptr);
+}
+
+void PutVarint64(string* dst, uint64 v) {
+ char buf[10];
+ char* ptr = EncodeVarint64(buf, v);
+ dst->append(buf, ptr - buf);
+}
+
+int VarintLength(uint64_t v) {
+ int len = 1;
+ while (v >= 128) {
+ v >>= 7;
+ len++;
+ }
+ return len;
+}
+
+const char* GetVarint32PtrFallback(const char* p, const char* limit,
+ uint32* value) {
+ uint32 result = 0;
+ for (uint32 shift = 0; shift <= 28 && p < limit; shift += 7) {
+ uint32 byte = *(reinterpret_cast<const unsigned char*>(p));
+ p++;
+ if (byte & 128) {
+ // More bytes are present
+ result |= ((byte & 127) << shift);
+ } else {
+ result |= (byte << shift);
+ *value = result;
+ return reinterpret_cast<const char*>(p);
+ }
+ }
+ return NULL;
+}
+
+bool GetVarint32(StringPiece* input, uint32* value) {
+ const char* p = input->data();
+ const char* limit = p + input->size();
+ const char* q = GetVarint32Ptr(p, limit, value);
+ if (q == NULL) {
+ return false;
+ } else {
+ *input = StringPiece(q, limit - q);
+ return true;
+ }
+}
+
+const char* GetVarint64Ptr(const char* p, const char* limit, uint64* value) {
+ uint64 result = 0;
+ for (uint32 shift = 0; shift <= 63 && p < limit; shift += 7) {
+ uint64 byte = *(reinterpret_cast<const unsigned char*>(p));
+ p++;
+ if (byte & 128) {
+ // More bytes are present
+ result |= ((byte & 127) << shift);
+ } else {
+ result |= (byte << shift);
+ *value = result;
+ return reinterpret_cast<const char*>(p);
+ }
+ }
+ return NULL;
+}
+
+bool GetVarint64(StringPiece* input, uint64* value) {
+ const char* p = input->data();
+ const char* limit = p + input->size();
+ const char* q = GetVarint64Ptr(p, limit, value);
+ if (q == NULL) {
+ return false;
+ } else {
+ *input = StringPiece(q, limit - q);
+ return true;
+ }
+}
+
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/coding.h b/tensorflow/core/lib/core/coding.h
new file mode 100644
index 0000000000..0c14bf1bbf
--- /dev/null
+++ b/tensorflow/core/lib/core/coding.h
@@ -0,0 +1,55 @@
+// Endian-neutral encoding:
+// * Fixed-length numbers are encoded with least-significant byte first
+// * In addition we support variable length "varint" encoding
+// * Strings are encoded prefixed by their length in varint format
+
+#ifndef TENSORFLOW_LIB_CORE_CODING_H_
+#define TENSORFLOW_LIB_CORE_CODING_H_
+
+#include "tensorflow/core/lib/core/raw_coding.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace core {
+
+// Lower-level versions of Put... that write directly into a character buffer
+// REQUIRES: dst has enough space for the value being written
+extern void EncodeFixed32(char* dst, uint32 value);
+extern void EncodeFixed64(char* dst, uint64 value);
+extern void PutFixed32(string* dst, uint32 value);
+extern void PutFixed64(string* dst, uint64 value);
+
+extern void PutVarint32(string* dst, uint32 value);
+extern void PutVarint64(string* dst, uint64 value);
+
+extern bool GetVarint32(StringPiece* input, uint32* value);
+extern bool GetVarint64(StringPiece* input, uint64* value);
+
+extern const char* GetVarint32Ptr(const char* p, const char* limit, uint32* v);
+extern const char* GetVarint64Ptr(const char* p, const char* limit, uint64* v);
+
+// Internal routine for use by fallback path of GetVarint32Ptr
+extern const char* GetVarint32PtrFallback(const char* p, const char* limit,
+ uint32* value);
+inline const char* GetVarint32Ptr(const char* p, const char* limit,
+ uint32* value) {
+ if (p < limit) {
+ uint32 result = *(reinterpret_cast<const unsigned char*>(p));
+ if ((result & 128) == 0) {
+ *value = result;
+ return p + 1;
+ }
+ }
+ return GetVarint32PtrFallback(p, limit, value);
+}
+
+extern char* EncodeVarint64(char* dst, uint64 v);
+
+// Returns the length of the varint32 or varint64 encoding of "v"
+extern int VarintLength(uint64_t v);
+
+} // namespace core
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_CODING_H_
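(Usage sketch, not part of this change: a varint32 round trip through the Put/Get helpers declared above; the value 300 encodes to two bytes.)

    #include <assert.h>
    #include "tensorflow/core/lib/core/coding.h"

    namespace tensorflow {
    namespace core {

    void VarintExample() {
      string s;
      PutVarint32(&s, 300);          // 300 -> 0xAC 0x02 (two bytes)
      assert(s.size() == 2);

      StringPiece input(s);
      uint32 value = 0;
      assert(GetVarint32(&input, &value));
      assert(value == 300);
      assert(input.empty());         // consumed bytes are removed from the slice
    }

    }  // namespace core
    }  // namespace tensorflow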
diff --git a/tensorflow/core/lib/core/coding_test.cc b/tensorflow/core/lib/core/coding_test.cc
new file mode 100644
index 0000000000..5e9e2c5e96
--- /dev/null
+++ b/tensorflow/core/lib/core/coding_test.cc
@@ -0,0 +1,168 @@
+#include "tensorflow/core/lib/core/coding.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace core {
+
+TEST(Coding, Fixed32) {
+ static const int N = 100000;
+
+ string s;
+ for (uint32 v = 0; v < N; v++) {
+ char buf[sizeof(uint32)];
+ EncodeFixed32(buf, v);
+ s.append(buf, sizeof(buf));
+ }
+
+ const char* p = s.data();
+ for (uint32 v = 0; v < N; v++) {
+ uint32 actual = DecodeFixed32(p);
+ ASSERT_EQ(v, actual);
+ p += sizeof(uint32);
+ }
+}
+
+TEST(Coding, Fixed64) {
+ string s;
+ for (int power = 0; power <= 63; power++) {
+ uint64 v = static_cast<uint64>(1) << power;
+ char buf[sizeof(uint64)];
+ EncodeFixed64(buf, v - 1);
+ s.append(buf, sizeof(buf));
+ EncodeFixed64(buf, v + 0);
+ s.append(buf, sizeof(buf));
+ EncodeFixed64(buf, v + 1);
+ s.append(buf, sizeof(buf));
+ }
+
+ const char* p = s.data();
+ for (int power = 0; power <= 63; power++) {
+ uint64 v = static_cast<uint64>(1) << power;
+ uint64 actual;
+ actual = DecodeFixed64(p);
+ ASSERT_EQ(v - 1, actual);
+ p += sizeof(uint64);
+
+ actual = DecodeFixed64(p);
+ ASSERT_EQ(v + 0, actual);
+ p += sizeof(uint64);
+
+ actual = DecodeFixed64(p);
+ ASSERT_EQ(v + 1, actual);
+ p += sizeof(uint64);
+ }
+}
+
+// Test that encoding routines generate little-endian encodings
+TEST(Coding, EncodingOutput) {
+ char dst[8];
+ EncodeFixed32(dst, 0x04030201);
+ ASSERT_EQ(0x01, static_cast<int>(dst[0]));
+ ASSERT_EQ(0x02, static_cast<int>(dst[1]));
+ ASSERT_EQ(0x03, static_cast<int>(dst[2]));
+ ASSERT_EQ(0x04, static_cast<int>(dst[3]));
+
+ EncodeFixed64(dst, 0x0807060504030201ull);
+ ASSERT_EQ(0x01, static_cast<int>(dst[0]));
+ ASSERT_EQ(0x02, static_cast<int>(dst[1]));
+ ASSERT_EQ(0x03, static_cast<int>(dst[2]));
+ ASSERT_EQ(0x04, static_cast<int>(dst[3]));
+ ASSERT_EQ(0x05, static_cast<int>(dst[4]));
+ ASSERT_EQ(0x06, static_cast<int>(dst[5]));
+ ASSERT_EQ(0x07, static_cast<int>(dst[6]));
+ ASSERT_EQ(0x08, static_cast<int>(dst[7]));
+}
+
+TEST(Coding, Varint32) {
+ string s;
+ for (uint32 i = 0; i < (32 * 32); i++) {
+ uint32 v = (i / 32) << (i % 32);
+ PutVarint32(&s, v);
+ }
+
+ const char* p = s.data();
+ const char* limit = p + s.size();
+ for (uint32 i = 0; i < (32 * 32); i++) {
+ uint32 expected = (i / 32) << (i % 32);
+ uint32 actual;
+ p = GetVarint32Ptr(p, limit, &actual);
+ ASSERT_TRUE(p != NULL);
+ ASSERT_EQ(expected, actual);
+ }
+ ASSERT_EQ(p, s.data() + s.size());
+}
+
+TEST(Coding, Varint64) {
+ // Construct the list of values to check
+ std::vector<uint64> values;
+ // Some special values
+ values.push_back(0);
+ values.push_back(100);
+ values.push_back(~static_cast<uint64>(0));
+ values.push_back(~static_cast<uint64>(0) - 1);
+ for (uint32 k = 0; k < 64; k++) {
+ // Test values near powers of two
+ const uint64 power = 1ull << k;
+ values.push_back(power);
+ values.push_back(power - 1);
+ values.push_back(power + 1);
+ }
+
+ string s;
+ for (size_t i = 0; i < values.size(); i++) {
+ PutVarint64(&s, values[i]);
+ }
+
+ const char* p = s.data();
+ const char* limit = p + s.size();
+ for (size_t i = 0; i < values.size(); i++) {
+ ASSERT_TRUE(p < limit);
+ uint64 actual;
+ p = GetVarint64Ptr(p, limit, &actual);
+ ASSERT_TRUE(p != NULL);
+ ASSERT_EQ(values[i], actual);
+ }
+ ASSERT_EQ(p, limit);
+}
+
+TEST(Coding, Varint32Overflow) {
+ uint32 result;
+ string input("\x81\x82\x83\x84\x85\x11");
+ ASSERT_TRUE(GetVarint32Ptr(input.data(), input.data() + input.size(),
+ &result) == NULL);
+}
+
+TEST(Coding, Varint32Truncation) {
+ uint32 large_value = (1u << 31) + 100;
+ string s;
+ PutVarint32(&s, large_value);
+ uint32 result;
+ for (size_t len = 0; len < s.size() - 1; len++) {
+ ASSERT_TRUE(GetVarint32Ptr(s.data(), s.data() + len, &result) == NULL);
+ }
+ ASSERT_TRUE(GetVarint32Ptr(s.data(), s.data() + s.size(), &result) != NULL);
+ ASSERT_EQ(large_value, result);
+}
+
+TEST(Coding, Varint64Overflow) {
+ uint64 result;
+ string input("\x81\x82\x83\x84\x85\x81\x82\x83\x84\x85\x11");
+ ASSERT_TRUE(GetVarint64Ptr(input.data(), input.data() + input.size(),
+ &result) == NULL);
+}
+
+TEST(Coding, Varint64Truncation) {
+ uint64 large_value = (1ull << 63) + 100ull;
+ string s;
+ PutVarint64(&s, large_value);
+ uint64 result;
+ for (size_t len = 0; len < s.size() - 1; len++) {
+ ASSERT_TRUE(GetVarint64Ptr(s.data(), s.data() + len, &result) == NULL);
+ }
+ ASSERT_TRUE(GetVarint64Ptr(s.data(), s.data() + s.size(), &result) != NULL);
+ ASSERT_EQ(large_value, result);
+}
+
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/command_line_flags.cc b/tensorflow/core/lib/core/command_line_flags.cc
new file mode 100644
index 0000000000..0f1072ffaa
--- /dev/null
+++ b/tensorflow/core/lib/core/command_line_flags.cc
@@ -0,0 +1,94 @@
+#include "tensorflow/core/lib/core/command_line_flags.h"
+
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+
+namespace tensorflow {
+namespace {
+
+// Templated function to convert a string to target values.
+// Return true if the conversion is successful. Otherwise, return false.
+template <typename T>
+bool StringToValue(const string& content, T* value);
+
+template <>
+bool StringToValue<int32>(const string& content, int* value) {
+ return str_util::NumericParse32(content, value);
+}
+
+// Parse a single argument by linearly searching through the command table.
+// The input format is: --argument=value.
+// Return OK if the argument is used. It stores the extracted value into the
+// matching flag.
+// Return NOT_FOUND if the argument is not recognized.
+// Return INVALID_ARGUMENT if the command is recognized, but fails to extract
+// its value.
+template <typename T>
+Status ParseArgument(const string& argument) {
+ for (auto& command :
+ internal::CommandLineFlagRegistry<int>::Instance()->commands) {
+ string prefix = strings::StrCat("--", command.name, "=");
+ if (tensorflow::StringPiece(argument).starts_with(prefix)) {
+ string content = argument.substr(prefix.length());
+ if (StringToValue<T>(content, command.value)) {
+ return Status::OK();
+ }
+ return Status(error::INVALID_ARGUMENT,
+ strings::StrCat("Cannot parse integer in: ", argument));
+ }
+ }
+ return Status(error::NOT_FOUND,
+ strings::StrCat("Unknown command: ", argument));
+}
+
+// A specialization for booleans. The input format is:
+// "--argument" or "--noargument".
+// Parse a single argument by linearly searching through the command table.
+// Return OK if the argument is used. The value is stored in the matching flag.
+// Return NOT_FOUND if the argument is not recognized.
+template <>
+Status ParseArgument<bool>(const string& argument) {
+ for (auto& command :
+ internal::CommandLineFlagRegistry<bool>::Instance()->commands) {
+ if (argument == strings::StrCat("--", command.name)) {
+ *command.value = true;
+ return Status::OK();
+ } else if (argument == strings::StrCat("--no", command.name)) {
+ *command.value = false;
+ return Status::OK();
+ }
+ }
+ return Status(error::NOT_FOUND,
+ strings::StrCat("Unknown command: ", argument));
+}
+} // namespace
+
+Status ParseCommandLineFlags(int* argc, char* argv[]) {
+ int unused_argc = 1;
+ for (int index = 1; index < *argc; ++index) {
+ Status s;
+ // Search bool commands.
+ s = ParseArgument<bool>(argv[index]);
+ if (s.ok()) {
+ continue;
+ }
+ if (s.code() != error::NOT_FOUND) {
+ return s;
+ }
+ // Search int32 commands.
+ s = ParseArgument<int32>(argv[index]);
+ if (s.ok()) {
+ continue;
+ }
+ if (s.code() != error::NOT_FOUND) {
+ return s;
+ }
+ // Pointer swap the unused argument to the front.
+ std::swap(argv[unused_argc++], argv[index]);
+ }
+ *argc = unused_argc;
+ return Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/command_line_flags.h b/tensorflow/core/lib/core/command_line_flags.h
new file mode 100644
index 0000000000..f1a94c11f9
--- /dev/null
+++ b/tensorflow/core/lib/core/command_line_flags.h
@@ -0,0 +1,60 @@
+#ifndef TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_
+#define TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace internal {
+
+template <typename T>
+struct CommandLineFlagRegistry {
+ static CommandLineFlagRegistry* Instance() {
+ static CommandLineFlagRegistry instance_;
+ return &instance_;
+ }
+ struct Command {
+ string name;
+ T* value;
+ string text;
+ };
+ std::vector<Command> commands;
+
+ private:
+ CommandLineFlagRegistry() {}
+ TF_DISALLOW_COPY_AND_ASSIGN(CommandLineFlagRegistry);
+};
+
+template <typename T>
+struct CommandLineFlagRegister {
+ CommandLineFlagRegister(const string& name, T* val, const string& text) {
+ CommandLineFlagRegistry<T>::Instance()->commands.push_back(
+ {name, val, text});
+ }
+};
+
+#define TF_DEFINE_variable(type, name, default_value, text) \
+ type FLAGS_##name = default_value; \
+ namespace TF_flags_internal { \
+ tensorflow::internal::CommandLineFlagRegister<type> \
+ TF_flags_internal_var_##name(#name, &FLAGS_##name, text); \
+ } // namespace TF_flags_internal
+
+} // namespace internal
+
+#define TF_DEFINE_int32(name, default_value, text) \
+ TF_DEFINE_variable(int32, name, default_value, text);
+
+#define TF_DEFINE_bool(name, default_value, text) \
+ TF_DEFINE_variable(bool, name, default_value, text);
+
+// Parse argv[1]..argv[*argc-1] to options. Remove used arguments from argv.
+// Return the number of unused arguments in *argc.
+// Return an error Status if the parsing encounters errors.
+// TODO(opensource): switch to a command line argument parser that can be
+// shared with other tests.
+Status ParseCommandLineFlags(int* argc, char* argv[]);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_COMMAND_LINE_FLAGS_H_
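(Usage sketch, not part of this change: defining and parsing flags with the macros above. The flag names and the main() are made up for illustration; int32 is assumed to be the alias from platform/port.h.)

    #include "tensorflow/core/lib/core/command_line_flags.h"
    #include "tensorflow/core/platform/logging.h"

    using tensorflow::int32;  // alias from platform/port.h

    TF_DEFINE_int32(batch_size, 32, "Number of examples per step");
    TF_DEFINE_bool(use_gpu, false, "Whether to place ops on the GPU");

    int main(int argc, char* argv[]) {
      tensorflow::Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
      if (!s.ok()) {
        LOG(FATAL) << s.ToString();  // e.g. an unparseable --batch_size value
      }
      // FLAGS_batch_size / FLAGS_use_gpu now hold the parsed values; argv keeps
      // only the unused arguments and argc has been updated to count them.
      LOG(INFO) << "batch_size=" << FLAGS_batch_size << " use_gpu=" << FLAGS_use_gpu;
      return 0;
    }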
diff --git a/tensorflow/core/lib/core/error_codes.proto b/tensorflow/core/lib/core/error_codes.proto
new file mode 100644
index 0000000000..6735fd8f88
--- /dev/null
+++ b/tensorflow/core/lib/core/error_codes.proto
@@ -0,0 +1,145 @@
+syntax = "proto3";
+
+package tensorflow.error;
+// option cc_enable_arenas = true;
+
+// The canonical error codes for TensorFlow APIs.
+//
+// Warnings:
+//
+// - Do not change any numeric assignments.
+// - Changes to this list should only be made if there is a compelling
+// need that can't be satisfied in another way. Such changes
+// must be approved by at least two OWNERS.
+//
+// Sometimes multiple error codes may apply. Services should return
+// the most specific error code that applies. For example, prefer
+// OUT_OF_RANGE over FAILED_PRECONDITION if both codes apply.
+// Similarly prefer NOT_FOUND or ALREADY_EXISTS over FAILED_PRECONDITION.
+enum Code {
+ // Not an error; returned on success
+ OK = 0;
+
+ // The operation was cancelled (typically by the caller).
+ CANCELLED = 1;
+
+ // Unknown error. An example of where this error may be returned is
+ // if a Status value received from another address space belongs to
+ // an error-space that is not known in this address space. Also
+ // errors raised by APIs that do not return enough error information
+ // may be converted to this error.
+ UNKNOWN = 2;
+
+ // Client specified an invalid argument. Note that this differs
+ // from FAILED_PRECONDITION. INVALID_ARGUMENT indicates arguments
+ // that are problematic regardless of the state of the system
+ // (e.g., a malformed file name).
+ INVALID_ARGUMENT = 3;
+
+ // Deadline expired before operation could complete. For operations
+ // that change the state of the system, this error may be returned
+ // even if the operation has completed successfully. For example, a
+ // successful response from a server could have been delayed long
+ // enough for the deadline to expire.
+ DEADLINE_EXCEEDED = 4;
+
+ // Some requested entity (e.g., file or directory) was not found.
+ // For privacy reasons, this code *may* be returned when the client
+ // does not have the access right to the entity.
+ NOT_FOUND = 5;
+
+ // Some entity that we attempted to create (e.g., file or directory)
+ // already exists.
+ ALREADY_EXISTS = 6;
+
+ // The caller does not have permission to execute the specified
+ // operation. PERMISSION_DENIED must not be used for rejections
+ // caused by exhausting some resource (use RESOURCE_EXHAUSTED
+ // instead for those errors). PERMISSION_DENIED must not be
+ // used if the caller can not be identified (use UNAUTHENTICATED
+ // instead for those errors).
+ PERMISSION_DENIED = 7;
+
+ // The request does not have valid authentication credentials for the
+ // operation.
+ UNAUTHENTICATED = 16;
+
+ // Some resource has been exhausted, perhaps a per-user quota, or
+ // perhaps the entire file system is out of space.
+ RESOURCE_EXHAUSTED = 8;
+
+ // Operation was rejected because the system is not in a state
+ // required for the operation's execution. For example, the directory
+ // to be deleted may be non-empty, an rmdir operation may be applied to
+ // a non-directory, etc.
+ //
+ // A litmus test that may help a service implementor in deciding
+ // between FAILED_PRECONDITION, ABORTED, and UNAVAILABLE:
+ // (a) Use UNAVAILABLE if the client can retry just the failing call.
+ // (b) Use ABORTED if the client should retry at a higher-level
+ // (e.g., restarting a read-modify-write sequence).
+ // (c) Use FAILED_PRECONDITION if the client should not retry until
+ // the system state has been explicitly fixed. E.g., if an "rmdir"
+ // fails because the directory is non-empty, FAILED_PRECONDITION
+ // should be returned since the client should not retry unless
+ // they have first fixed up the directory by deleting files from it.
+ // (d) Use FAILED_PRECONDITION if the client performs conditional
+ // REST Get/Update/Delete on a resource and the resource on the
+ // server does not match the condition. E.g., conflicting
+ // read-modify-write on the same resource.
+ FAILED_PRECONDITION = 9;
+
+ // The operation was aborted, typically due to a concurrency issue
+ // like sequencer check failures, transaction aborts, etc.
+ //
+ // See litmus test above for deciding between FAILED_PRECONDITION,
+ // ABORTED, and UNAVAILABLE.
+ ABORTED = 10;
+
+ // Operation was attempted past the valid range. E.g., seeking or
+ // reading past end of file.
+ //
+ // Unlike INVALID_ARGUMENT, this error indicates a problem that may
+ // be fixed if the system state changes. For example, a 32-bit file
+ // system will generate INVALID_ARGUMENT if asked to read at an
+ // offset that is not in the range [0,2^32-1], but it will generate
+ // OUT_OF_RANGE if asked to read from an offset past the current
+ // file size.
+ //
+ // There is a fair bit of overlap between FAILED_PRECONDITION and
+ // OUT_OF_RANGE. We recommend using OUT_OF_RANGE (the more specific
+ // error) when it applies so that callers who are iterating through
+ // a space can easily look for an OUT_OF_RANGE error to detect when
+ // they are done.
+ OUT_OF_RANGE = 11;
+
+ // Operation is not implemented or not supported/enabled in this service.
+ UNIMPLEMENTED = 12;
+
+ // Internal errors. Means some invariant expected by the underlying
+ // system has been broken. If you see one of these errors,
+ // something is very broken.
+ INTERNAL = 13;
+
+ // The service is currently unavailable. This is most likely a
+ // transient condition and may be corrected by retrying with
+ // a backoff.
+ //
+ // See litmus test above for deciding between FAILED_PRECONDITION,
+ // ABORTED, and UNAVAILABLE.
+ UNAVAILABLE = 14;
+
+ // Unrecoverable data loss or corruption.
+ DATA_LOSS = 15;
+
+ // An extra enum entry to prevent people from writing code that
+ // fails to compile when a new code is added.
+ //
+ // Nobody should ever reference this enumeration entry. In particular,
+ // if you write C++ code that switches on this enumeration, add a default:
+ // case instead of a case that mentions this enumeration entry.
+ //
+ // Nobody should rely on the value (currently 20) listed here. It
+ // may change in the future.
+ DO_NOT_USE_RESERVED_FOR_FUTURE_EXPANSION_USE_DEFAULT_IN_SWITCH_INSTEAD_ = 20;
+}
diff --git a/tensorflow/core/lib/core/errors.h b/tensorflow/core/lib/core/errors.h
new file mode 100644
index 0000000000..b0badd8c4d
--- /dev/null
+++ b/tensorflow/core/lib/core/errors.h
@@ -0,0 +1,131 @@
+#ifndef TENSORFLOW_LIB_CORE_ERRORS_H_
+#define TENSORFLOW_LIB_CORE_ERRORS_H_
+
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace errors {
+
+typedef ::tensorflow::error::Code Code;
+
+// Append some context to an error message. Each time we append
+// context, put it on a new line, since it is possible for there
+// to be several layers of additional context.
+template <typename... Args>
+void AppendToMessage(::tensorflow::Status* status, Args... args) {
+ *status = ::tensorflow::Status(
+ status->code(),
+ strings::StrCat(status->error_message(), "\n\t", args...));
+}
+
+// For propagating errors when calling a function.
+#define TF_RETURN_IF_ERROR(expr) \
+ do { \
+ const ::tensorflow::Status _status = (expr); \
+ if (TF_PREDICT_FALSE(!_status.ok())) return _status; \
+ } while (0)
+
+#define TF_RETURN_WITH_CONTEXT_IF_ERROR(expr, ...) \
+ do { \
+ ::tensorflow::Status _status = (expr); \
+ if (TF_PREDICT_FALSE(!_status.ok())) { \
+ ::tensorflow::errors::AppendToMessage(&_status, __VA_ARGS__); \
+ return _status; \
+ } \
+ } while (0)
+
+// Convenience functions for generating and using error status.
+// Example usage:
+// status.Update(errors::InvalidArgument("The ", foo, " isn't right."));
+// if (errors::IsInvalidArgument(status)) { ... }
+// switch (status.code()) { case error::INVALID_ARGUMENT: ... }
+
+#define DECLARE_ERROR(FUNC, CONST) \
+ template <typename... Args> \
+ inline ::tensorflow::Status FUNC(Args... args) { \
+ return ::tensorflow::Status(::tensorflow::error::CONST, \
+ strings::StrCat(args...)); \
+ } \
+ inline bool Is##FUNC(const ::tensorflow::Status& status) { \
+ return status.code() == ::tensorflow::error::CONST; \
+ }
+
+DECLARE_ERROR(Cancelled, CANCELLED)
+DECLARE_ERROR(InvalidArgument, INVALID_ARGUMENT)
+DECLARE_ERROR(NotFound, NOT_FOUND)
+DECLARE_ERROR(AlreadyExists, ALREADY_EXISTS)
+DECLARE_ERROR(ResourceExhausted, RESOURCE_EXHAUSTED)
+DECLARE_ERROR(Unavailable, UNAVAILABLE)
+DECLARE_ERROR(FailedPrecondition, FAILED_PRECONDITION)
+DECLARE_ERROR(OutOfRange, OUT_OF_RANGE)
+DECLARE_ERROR(Unimplemented, UNIMPLEMENTED)
+DECLARE_ERROR(Internal, INTERNAL)
+DECLARE_ERROR(Aborted, ABORTED)
+DECLARE_ERROR(DeadlineExceeded, DEADLINE_EXCEEDED)
+DECLARE_ERROR(DataLoss, DATA_LOSS)
+DECLARE_ERROR(Unknown, UNKNOWN)
+DECLARE_ERROR(PermissionDenied, PERMISSION_DENIED)
+DECLARE_ERROR(Unauthenticated, UNAUTHENTICATED)
+
+#undef DECLARE_ERROR
+
+// The CanonicalCode() for non-errors.
+using ::tensorflow::error::OK;
+
+// Convenience macros for asserting and handling exceptional conditions.
+// Analogous to the CHECK* macros provided by logging.h.
+//
+// Example use:
+// void Compute(OperationContext* context) {
+// OP_REQUIRES(context, context->num_inputs() == 2,
+// errors::InvalidArgument("FooOp requires 2 arguments"));
+// ...
+// Status status = SomeUncertainMethod();
+// OP_REQUIRES_OK(context, status);
+// ...
+// }
+
+#define OP_REQUIRES(CTX, EXP, STATUS) \
+ if (!(EXP)) { \
+ ::tensorflow::Status _s(STATUS); \
+ VLOG(1) << _s; \
+ (CTX)->SetStatus(_s); \
+ return; \
+ }
+
+#define OP_REQUIRES_OK(CTX, STATUS) \
+ do { \
+ ::tensorflow::Status _s(STATUS); \
+ if (!_s.ok()) { \
+ LOG(WARNING) << _s; \
+ (CTX)->SetStatus(_s); \
+ return; \
+ } \
+ } while (0)
+
+#define OP_REQUIRES_ASYNC(CTX, EXP, STATUS, CALLBACK) \
+ if (!(EXP)) { \
+ ::tensorflow::Status _s(STATUS); \
+ VLOG(1) << _s; \
+ (CTX)->SetStatus(_s); \
+ (CALLBACK)(); \
+ return; \
+ }
+
+#define OP_REQUIRES_OK_ASYNC(CTX, STATUS, CALLBACK) \
+ do { \
+ ::tensorflow::Status _s(STATUS); \
+ if (!_s.ok()) { \
+ LOG(WARNING) << _s; \
+ (CTX)->SetStatus(_s); \
+ (CALLBACK)(); \
+ return; \
+ } \
+ } while (0)
+
+} // namespace errors
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_ERRORS_H_
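(Usage sketch, not part of this change: returning and propagating Status with the helpers above. ReadConfig is a hypothetical helper used only for illustration.)

    #include "tensorflow/core/lib/core/errors.h"

    namespace tensorflow {

    Status ReadConfig(const string& path) {
      // Hypothetical helper: pretend the file is missing.
      return errors::NotFound("no config at ", path);
    }

    Status Init(const string& path) {
      if (path.empty()) {
        return errors::InvalidArgument("path must be non-empty");
      }
      TF_RETURN_IF_ERROR(ReadConfig(path));  // returns early on any non-OK status
      return Status::OK();
    }

    }  // namespace tensorflow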
diff --git a/tensorflow/core/lib/core/notification.h b/tensorflow/core/lib/core/notification.h
new file mode 100644
index 0000000000..071e24285a
--- /dev/null
+++ b/tensorflow/core/lib/core/notification.h
@@ -0,0 +1,42 @@
+#ifndef TENSORFLOW_UTIL_NOTIFICATION_H_
+#define TENSORFLOW_UTIL_NOTIFICATION_H_
+
+#include <assert.h>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class Notification {
+ public:
+ Notification() : notified_(false) {}
+ ~Notification() {}
+
+ void Notify() {
+ mutex_lock l(mu_);
+ assert(!notified_);
+ notified_ = true;
+ cv_.notify_all();
+ }
+
+ bool HasBeenNotified() {
+ mutex_lock l(mu_);
+ return notified_;
+ }
+
+ void WaitForNotification() {
+ mutex_lock l(mu_);
+ while (!notified_) {
+ cv_.wait(l);
+ }
+ }
+
+ private:
+ mutex mu_;
+ condition_variable cv_;
+ bool notified_;
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_NOTIFICATION_H_
diff --git a/tensorflow/core/lib/core/notification_test.cc b/tensorflow/core/lib/core/notification_test.cc
new file mode 100644
index 0000000000..a9e8942f05
--- /dev/null
+++ b/tensorflow/core/lib/core/notification_test.cc
@@ -0,0 +1,64 @@
+#include <gtest/gtest.h>
+
+#include "tensorflow/core/lib/core/notification.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace {
+
+TEST(NotificationTest, TestSingleNotification) {
+ thread::ThreadPool* thread_pool =
+ new thread::ThreadPool(Env::Default(), "test", 1);
+
+ int counter = 0;
+ Notification start;
+ Notification proceed;
+ thread_pool->Schedule([&start, &proceed, &counter] {
+ start.Notify();
+ proceed.WaitForNotification();
+ ++counter;
+ });
+
+ // Wait for the thread to start
+ start.WaitForNotification();
+
+ // The thread should be waiting for the 'proceed' notification.
+ EXPECT_EQ(0, counter);
+
+ // Unblock the thread
+ proceed.Notify();
+
+ delete thread_pool; // Wait for closure to finish.
+
+ // Verify the counter has been incremented
+ EXPECT_EQ(1, counter);
+}
+
+TEST(NotificationTest, TestMultipleThreadsWaitingOnNotification) {
+ const int num_closures = 4;
+ thread::ThreadPool* thread_pool =
+ new thread::ThreadPool(Env::Default(), "test", num_closures);
+
+ mutex lock;
+ int counter = 0;
+ Notification n;
+
+ for (int i = 0; i < num_closures; ++i) {
+ thread_pool->Schedule([&n, &lock, &counter] {
+ n.WaitForNotification();
+ mutex_lock l(lock);
+ ++counter;
+ });
+ }
+ sleep(1);
+
+ EXPECT_EQ(0, counter);
+
+ n.Notify();
+ delete thread_pool; // Wait for all closures to finish.
+ EXPECT_EQ(4, counter);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/raw_coding.h b/tensorflow/core/lib/core/raw_coding.h
new file mode 100644
index 0000000000..1fe49b75bb
--- /dev/null
+++ b/tensorflow/core/lib/core/raw_coding.h
@@ -0,0 +1,43 @@
+#ifndef TENSORFLOW_LIB_CORE_RAW_CODING_H_
+#define TENSORFLOW_LIB_CORE_RAW_CODING_H_
+
+#include <string.h>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace core {
+
+// Lower-level versions of Get... that read directly from a character buffer
+// without any bounds checking.
+
+inline uint32 DecodeFixed32(const char* ptr) {
+ if (port::kLittleEndian) {
+ // Load the raw bytes
+ uint32 result;
+ memcpy(&result, ptr, sizeof(result)); // gcc optimizes this to a plain load
+ return result;
+ } else {
+ return ((static_cast<uint32>(static_cast<unsigned char>(ptr[0]))) |
+ (static_cast<uint32>(static_cast<unsigned char>(ptr[1])) << 8) |
+ (static_cast<uint32>(static_cast<unsigned char>(ptr[2])) << 16) |
+ (static_cast<uint32>(static_cast<unsigned char>(ptr[3])) << 24));
+ }
+}
+
+inline uint64 DecodeFixed64(const char* ptr) {
+ if (port::kLittleEndian) {
+ // Load the raw bytes
+ uint64 result;
+ memcpy(&result, ptr, sizeof(result)); // gcc optimizes this to a plain load
+ return result;
+ } else {
+ uint64 lo = DecodeFixed32(ptr);
+ uint64 hi = DecodeFixed32(ptr + 4);
+ return (hi << 32) | lo;
+ }
+}
+
+} // namespace core
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_RAW_CODING_H_
diff --git a/tensorflow/core/lib/core/refcount.cc b/tensorflow/core/lib/core/refcount.cc
new file mode 100644
index 0000000000..3ed8c58eb8
--- /dev/null
+++ b/tensorflow/core/lib/core/refcount.cc
@@ -0,0 +1,35 @@
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace core {
+
+RefCounted::RefCounted() : ref_(1) {}
+
+RefCounted::~RefCounted() { DCHECK_EQ(ref_.load(), 0); }
+
+void RefCounted::Ref() const {
+ DCHECK_GE(ref_.load(), 1);
+ ref_.fetch_add(1, std::memory_order_relaxed);
+}
+
+bool RefCounted::Unref() const {
+ DCHECK_GT(ref_.load(), 0);
+ // If ref_==1, this object is owned only by the caller. Bypass a locked op
+ // in that case.
+ if (ref_.load(std::memory_order_acquire) == 1 || ref_.fetch_sub(1) == 1) {
+ // Make DCHECK in ~RefCounted happy
+ DCHECK((ref_.store(0), true));
+ delete this;
+ return true;
+ } else {
+ return false;
+ }
+}
+
+bool RefCounted::RefCountIsOne() const {
+ return (ref_.load(std::memory_order_acquire) == 1);
+}
+
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/refcount.h b/tensorflow/core/lib/core/refcount.h
new file mode 100644
index 0000000000..f727750f9e
--- /dev/null
+++ b/tensorflow/core/lib/core/refcount.h
@@ -0,0 +1,63 @@
+#ifndef TENSORFLOW_LIB_CORE_REFCOUNT_H_
+#define TENSORFLOW_LIB_CORE_REFCOUNT_H_
+
+#include <atomic>
+
+namespace tensorflow {
+namespace core {
+
+class RefCounted {
+ public:
+ // Initial reference count is one.
+ RefCounted();
+
+ // Increments reference count by one.
+ void Ref() const;
+
+ // Decrements reference count by one. If the count remains
+ // positive, returns false. When the count reaches zero, returns
+ // true and deletes this, in which case the caller must not access
+ // the object afterward.
+ bool Unref() const;
+
+ // Return whether the reference count is one.
+ // If the reference count is used in the conventional way, a
+ // reference count of 1 implies that the current thread owns the
+ // reference and no other thread shares it.
+ // This call performs the test for a reference count of one, and
+ // performs the memory barrier needed for the owning thread
+ // to act on the object, knowing that it has exclusive access to the
+ // object.
+ bool RefCountIsOne() const;
+
+ protected:
+ // Make destructor protected so that RefCounted objects cannot
+ // be instantiated directly. Only subclasses can be instantiated.
+ virtual ~RefCounted();
+
+ private:
+ mutable std::atomic_int_fast32_t ref_;
+
+ RefCounted(const RefCounted&) = delete;
+ void operator=(const RefCounted&) = delete;
+};
+
+// Helper class to unref an object when out-of-scope.
+class ScopedUnref {
+ public:
+ explicit ScopedUnref(RefCounted* o) : obj_(o) {}
+ ~ScopedUnref() {
+ if (obj_) obj_->Unref();
+ }
+
+ private:
+ RefCounted* obj_;
+
+ ScopedUnref(const ScopedUnref&) = delete;
+ void operator=(const ScopedUnref&) = delete;
+};
+
+} // namespace core
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_REFCOUNT_H_
diff --git a/tensorflow/core/lib/core/refcount_test.cc b/tensorflow/core/lib/core/refcount_test.cc
new file mode 100644
index 0000000000..c042be2d61
--- /dev/null
+++ b/tensorflow/core/lib/core/refcount_test.cc
@@ -0,0 +1,92 @@
+#include "tensorflow/core/lib/core/refcount.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace core {
+namespace {
+
+static int constructed = 0;
+static int destroyed = 0;
+
+class MyRef : public RefCounted {
+ public:
+ MyRef() { constructed++; }
+ ~MyRef() override { destroyed++; }
+};
+
+class RefTest : public testing::Test {
+ public:
+ RefTest() {
+ constructed = 0;
+ destroyed = 0;
+ }
+};
+
+TEST_F(RefTest, New) {
+ MyRef* ref = new MyRef;
+ ASSERT_EQ(1, constructed);
+ ASSERT_EQ(0, destroyed);
+ ref->Unref();
+ ASSERT_EQ(1, constructed);
+ ASSERT_EQ(1, destroyed);
+}
+
+TEST_F(RefTest, RefUnref) {
+ MyRef* ref = new MyRef;
+ ASSERT_EQ(1, constructed);
+ ASSERT_EQ(0, destroyed);
+ ref->Ref();
+ ASSERT_EQ(0, destroyed);
+ ref->Unref();
+ ASSERT_EQ(0, destroyed);
+ ref->Unref();
+ ASSERT_EQ(1, destroyed);
+}
+
+TEST_F(RefTest, RefCountOne) {
+ MyRef* ref = new MyRef;
+ ASSERT_TRUE(ref->RefCountIsOne());
+ ref->Unref();
+}
+
+TEST_F(RefTest, RefCountNotOne) {
+ MyRef* ref = new MyRef;
+ ref->Ref();
+ ASSERT_FALSE(ref->RefCountIsOne());
+ ref->Unref();
+ ref->Unref();
+}
+
+TEST_F(RefTest, ConstRefUnref) {
+ const MyRef* cref = new MyRef;
+ ASSERT_EQ(1, constructed);
+ ASSERT_EQ(0, destroyed);
+ cref->Ref();
+ ASSERT_EQ(0, destroyed);
+ cref->Unref();
+ ASSERT_EQ(0, destroyed);
+ cref->Unref();
+ ASSERT_EQ(1, destroyed);
+}
+
+TEST_F(RefTest, ReturnOfUnref) {
+ MyRef* ref = new MyRef;
+ ref->Ref();
+ EXPECT_FALSE(ref->Unref());
+ EXPECT_TRUE(ref->Unref());
+}
+
+TEST_F(RefTest, ScopedUnref) {
+ { ScopedUnref unref(new MyRef); }
+ EXPECT_EQ(destroyed, 1);
+}
+
+TEST_F(RefTest, ScopedUnref_Nullptr) {
+ { ScopedUnref unref(nullptr); }
+ EXPECT_EQ(destroyed, 0);
+}
+
+} // namespace
+} // namespace core
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/status.cc b/tensorflow/core/lib/core/status.cc
new file mode 100644
index 0000000000..24ce842560
--- /dev/null
+++ b/tensorflow/core/lib/core/status.cc
@@ -0,0 +1,107 @@
+#include "tensorflow/core/public/status.h"
+#include <stdio.h>
+
+namespace tensorflow {
+
+Status::Status(tensorflow::error::Code code, StringPiece msg) {
+ assert(code != tensorflow::error::OK);
+ state_ = new State;
+ state_->code = code;
+ state_->msg = msg.ToString();
+}
+Status::~Status() { delete state_; }
+
+void Status::Update(const Status& new_status) {
+ if (ok()) {
+ *this = new_status;
+ }
+}
+
+void Status::SlowCopyFrom(const State* src) {
+ delete state_;
+ if (src == nullptr) {
+ state_ = nullptr;
+ } else {
+ state_ = new State(*src);
+ }
+}
+
+const string& Status::empty_string() {
+ static string* empty = new string;
+ return *empty;
+}
+
+string Status::ToString() const {
+ if (state_ == NULL) {
+ return "OK";
+ } else {
+ char tmp[30];
+ const char* type;
+ switch (code()) {
+ case tensorflow::error::CANCELLED:
+ type = "Cancelled";
+ break;
+ case tensorflow::error::UNKNOWN:
+ type = "Unknown";
+ break;
+ case tensorflow::error::INVALID_ARGUMENT:
+ type = "Invalid argument";
+ break;
+ case tensorflow::error::DEADLINE_EXCEEDED:
+ type = "Deadline exceeded";
+ break;
+ case tensorflow::error::NOT_FOUND:
+ type = "Not found";
+ break;
+ case tensorflow::error::ALREADY_EXISTS:
+ type = "Already exists";
+ break;
+ case tensorflow::error::PERMISSION_DENIED:
+ type = "Permission denied";
+ break;
+ case tensorflow::error::UNAUTHENTICATED:
+ type = "Unauthenticated";
+ break;
+ case tensorflow::error::RESOURCE_EXHAUSTED:
+ type = "Resource exhausted";
+ break;
+ case tensorflow::error::FAILED_PRECONDITION:
+ type = "Failed precondition";
+ break;
+ case tensorflow::error::ABORTED:
+ type = "Aborted";
+ break;
+ case tensorflow::error::OUT_OF_RANGE:
+ type = "Out of range";
+ break;
+ case tensorflow::error::UNIMPLEMENTED:
+ type = "Unimplemented";
+ break;
+ case tensorflow::error::INTERNAL:
+ type = "Internal";
+ break;
+ case tensorflow::error::UNAVAILABLE:
+ type = "Unavailable";
+ break;
+ case tensorflow::error::DATA_LOSS:
+ type = "Data loss";
+ break;
+ default:
+ snprintf(tmp, sizeof(tmp), "Unknown code(%d)",
+ static_cast<int>(code()));
+ type = tmp;
+ break;
+ }
+ string result(type);
+ result += ": ";
+ result += state_->msg;
+ return result;
+ }
+}
+
+std::ostream& operator<<(std::ostream& os, const Status& x) {
+ os << x.ToString();
+ return os;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/status_test.cc b/tensorflow/core/lib/core/status_test.cc
new file mode 100644
index 0000000000..3ef6b3302a
--- /dev/null
+++ b/tensorflow/core/lib/core/status_test.cc
@@ -0,0 +1,84 @@
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(Status, OK) {
+ EXPECT_EQ(Status::OK().code(), error::OK);
+ EXPECT_EQ(Status::OK().error_message(), "");
+ EXPECT_OK(Status::OK());
+ ASSERT_OK(Status::OK());
+ EXPECT_EQ(Status::OK(), Status());
+ Status s;
+ EXPECT_TRUE(s.ok());
+}
+
+TEST(DeathStatus, CheckOK) {
+ Status status(errors::InvalidArgument("Invalid"));
+ ASSERT_DEATH(TF_CHECK_OK(status), "Invalid");
+}
+
+TEST(Status, Set) {
+ Status status;
+ status = Status(error::CANCELLED, "Error message");
+ EXPECT_EQ(status.code(), error::CANCELLED);
+ EXPECT_EQ(status.error_message(), "Error message");
+}
+
+TEST(Status, Copy) {
+ Status a(errors::InvalidArgument("Invalid"));
+ Status b(a);
+ ASSERT_EQ(a.ToString(), b.ToString());
+}
+
+TEST(Status, Assign) {
+ Status a(errors::InvalidArgument("Invalid"));
+ Status b;
+ b = a;
+ ASSERT_EQ(a.ToString(), b.ToString());
+}
+
+TEST(Status, Update) {
+ Status s;
+ s.Update(Status::OK());
+ ASSERT_TRUE(s.ok());
+ Status a(errors::InvalidArgument("Invalid"));
+ s.Update(a);
+ ASSERT_EQ(s.ToString(), a.ToString());
+ Status b(errors::Internal("Internal"));
+ s.Update(b);
+ ASSERT_EQ(s.ToString(), a.ToString());
+ s.Update(Status::OK());
+ ASSERT_EQ(s.ToString(), a.ToString());
+ ASSERT_FALSE(s.ok());
+}
+
+TEST(Status, EqualsOK) { ASSERT_EQ(Status::OK(), Status()); }
+
+TEST(Status, EqualsSame) {
+ Status a(errors::InvalidArgument("Invalid"));
+ Status b(errors::InvalidArgument("Invalid"));
+ ASSERT_EQ(a, b);
+}
+
+TEST(Status, EqualsCopy) {
+ const Status a(errors::InvalidArgument("Invalid"));
+ const Status b = a;
+ ASSERT_EQ(a, b);
+}
+
+TEST(Status, EqualsDifferentCode) {
+ const Status a(errors::InvalidArgument("message"));
+ const Status b(errors::Internal("message"));
+ ASSERT_NE(a, b);
+}
+
+TEST(Status, EqualsDifferentMessage) {
+ const Status a(errors::InvalidArgument("message"));
+ const Status b(errors::InvalidArgument("another"));
+ ASSERT_NE(a, b);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/status_test_util.h b/tensorflow/core/lib/core/status_test_util.h
new file mode 100644
index 0000000000..b3b4db429f
--- /dev/null
+++ b/tensorflow/core/lib/core/status_test_util.h
@@ -0,0 +1,20 @@
+#ifndef TENSORFLOW_LIB_CORE_STATUS_TEST_UTIL_H_
+#define TENSORFLOW_LIB_CORE_STATUS_TEST_UTIL_H_
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/public/status.h"
+
+// Macros for testing the results of functions that return util::Status.
+
+#define EXPECT_OK(statement) EXPECT_EQ(::tensorflow::Status::OK(), (statement))
+#define ASSERT_OK(statement) ASSERT_EQ(::tensorflow::Status::OK(), (statement))
+
+// There are no EXPECT_NOT_OK/ASSERT_NOT_OK macros since they would not
+// provide much value (when they fail, they would just print the OK status,
+// which conveys no more information than EXPECT_FALSE(status.ok())).
+// If you want to check for particular errors, better alternatives are:
+// EXPECT_EQ(::util::Status(...expected error...), status.StripMessage());
+// EXPECT_THAT(status.ToString(), HasSubstr("expected error"));
+// Also, see testing/lib/util/status_util.h.
+
+#endif // TENSORFLOW_LIB_CORE_STATUS_TEST_UTIL_H_
diff --git a/tensorflow/core/lib/core/stringpiece.cc b/tensorflow/core/lib/core/stringpiece.cc
new file mode 100644
index 0000000000..57c5139f47
--- /dev/null
+++ b/tensorflow/core/lib/core/stringpiece.cc
@@ -0,0 +1,57 @@
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+#include <iostream>
+#include "tensorflow/core/lib/hash/hash.h"
+
+namespace tensorflow {
+
+size_t StringPiece::Hasher::operator()(StringPiece s) const {
+ return Hash64(s.data(), s.size());
+}
+
+std::ostream& operator<<(std::ostream& o, StringPiece piece) {
+ o.write(piece.data(), piece.size());
+ return o;
+}
+
+bool StringPiece::contains(StringPiece s) const {
+ return memmem(data_, size_, s.data_, s.size_) != nullptr;
+}
+
+size_t StringPiece::find(char c, size_t pos) const {
+ if (pos >= size_) {
+ return npos;
+ }
+ const char* result =
+ reinterpret_cast<const char*>(memchr(data_ + pos, c, size_ - pos));
+ return result != NULL ? result - data_ : npos;
+}
+
+// Search range is [0..pos] inclusive. If pos == npos, search everything.
+size_t StringPiece::rfind(char c, size_t pos) const {
+ if (size_ == 0) return npos;
+ for (const char* p = data_ + std::min(pos, size_ - 1); p >= data_; p--) {
+ if (*p == c) {
+ return p - data_;
+ }
+ }
+ return npos;
+}
+
+bool StringPiece::Consume(StringPiece x) {
+ if (starts_with(x)) {
+ remove_prefix(x.size_);
+ return true;
+ }
+ return false;
+}
+
+StringPiece StringPiece::substr(size_t pos, size_t n) const {
+ if (pos > size_) pos = size_;
+ if (n > size_ - pos) n = size_ - pos;
+ return StringPiece(data_ + pos, n);
+}
+
+const StringPiece::size_type StringPiece::npos = size_type(-1);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/stringpiece.h b/tensorflow/core/lib/core/stringpiece.h
new file mode 100644
index 0000000000..17d4b294e9
--- /dev/null
+++ b/tensorflow/core/lib/core/stringpiece.h
@@ -0,0 +1,159 @@
+// StringPiece is a simple structure containing a pointer into some external
+// storage and a size. The user of a StringPiece must ensure that the slice
+// is not used after the corresponding external storage has been
+// deallocated.
+//
+// Multiple threads can invoke const methods on a StringPiece without
+// external synchronization, but if any of the threads may call a
+// non-const method, all threads accessing the same StringPiece must use
+// external synchronization.
+
+#ifndef TENSORFLOW_LIB_CORE_STRINGPIECE_H_
+#define TENSORFLOW_LIB_CORE_STRINGPIECE_H_
+
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+#include <iosfwd>
+#include <string>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class StringPiece {
+ public:
+ typedef size_t size_type;
+
+ // Create an empty slice.
+ StringPiece() : data_(""), size_(0) {}
+
+ // Create a slice that refers to d[0,n-1].
+ StringPiece(const char* d, size_t n) : data_(d), size_(n) {}
+
+ // Create a slice that refers to the contents of "s"
+ StringPiece(const string& s) : data_(s.data()), size_(s.size()) {}
+
+ // Create a slice that refers to s[0,strlen(s)-1]
+ StringPiece(const char* s) : data_(s), size_(strlen(s)) {}
+
+ void set(const void* data, size_t len) {
+ data_ = reinterpret_cast<const char*>(data);
+ size_ = len;
+ }
+
+ // Return a pointer to the beginning of the referenced data
+ const char* data() const { return data_; }
+
+ // Return the length (in bytes) of the referenced data
+ size_t size() const { return size_; }
+
+ // Return true iff the length of the referenced data is zero
+ bool empty() const { return size_ == 0; }
+
+ typedef const char* const_iterator;
+ typedef const char* iterator;
+ iterator begin() const { return data_; }
+ iterator end() const { return data_ + size_; }
+
+ static const size_t npos;
+
+ // Return the nth byte in the referenced data.
+ // REQUIRES: n < size()
+ char operator[](size_t n) const {
+ assert(n < size());
+ return data_[n];
+ }
+
+ // Change this slice to refer to an empty array
+ void clear() {
+ data_ = "";
+ size_ = 0;
+ }
+
+ // Drop the first "n" bytes from this slice.
+ void remove_prefix(size_t n) {
+ assert(n <= size());
+ data_ += n;
+ size_ -= n;
+ }
+
+ void remove_suffix(size_t n) {
+ assert(size_ >= n);
+ size_ -= n;
+ }
+
+ size_t find(char c, size_t pos = 0) const;
+ size_t rfind(char c, size_t pos = npos) const;
+ bool contains(StringPiece s) const;
+
+ // Checks whether StringPiece starts with x and if so advances the beginning
+ // of it to past the match. It's basically a shortcut for starts_with
+ // followed by remove_prefix.
+ bool Consume(StringPiece x);
+
+ StringPiece substr(size_t pos, size_t n = npos) const;
+
+ struct Hasher {
+ size_t operator()(StringPiece arg) const;
+ };
+
+ // Return a string that contains the copy of the referenced data.
+ std::string ToString() const { return std::string(data_, size_); }
+
+ // Three-way comparison. Returns value:
+ // < 0 iff "*this" < "b",
+ // == 0 iff "*this" == "b",
+ // > 0 iff "*this" > "b"
+ int compare(StringPiece b) const;
+
+ // Return true iff "x" is a prefix of "*this"
+ bool starts_with(StringPiece x) const {
+ return ((size_ >= x.size_) && (memcmp(data_, x.data_, x.size_) == 0));
+ }
+ // Return true iff "x" is a suffix of "*this"
+ bool ends_with(StringPiece x) const {
+ return ((size_ >= x.size_) &&
+ (memcmp(data_ + (size_ - x.size_), x.data_, x.size_) == 0));
+ }
+
+ private:
+ const char* data_;
+ size_t size_;
+
+ // Intentionally copyable
+};
+
+inline bool operator==(StringPiece x, StringPiece y) {
+ return ((x.size() == y.size()) &&
+ (memcmp(x.data(), y.data(), x.size()) == 0));
+}
+
+inline bool operator!=(StringPiece x, StringPiece y) { return !(x == y); }
+
+inline bool operator<(StringPiece x, StringPiece y) { return x.compare(y) < 0; }
+inline bool operator>(StringPiece x, StringPiece y) { return x.compare(y) > 0; }
+inline bool operator<=(StringPiece x, StringPiece y) {
+ return x.compare(y) <= 0;
+}
+inline bool operator>=(StringPiece x, StringPiece y) {
+ return x.compare(y) >= 0;
+}
+
+inline int StringPiece::compare(StringPiece b) const {
+ const size_t min_len = (size_ < b.size_) ? size_ : b.size_;
+ int r = memcmp(data_, b.data_, min_len);
+ if (r == 0) {
+ if (size_ < b.size_)
+ r = -1;
+ else if (size_ > b.size_)
+ r = +1;
+ }
+ return r;
+}
+
+// allow StringPiece to be logged
+extern std::ostream& operator<<(std::ostream& o, tensorflow::StringPiece piece);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_STRINGPIECE_H_
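
A minimal usage sketch (hypothetical caller code) showing the zero-copy nature of StringPiece:

    #include "tensorflow/core/lib/core/stringpiece.h"

    void ParsePath() {
      tensorflow::StringPiece s("gs://bucket/object");
      if (s.Consume("gs://")) {                             // s now views "bucket/object"
        size_t slash = s.find('/');                         // 6
        tensorflow::StringPiece bucket = s.substr(0, slash);    // "bucket"
        tensorflow::StringPiece object = s.substr(slash + 1);   // "object"
        // Both pieces alias the original string literal; nothing is copied.
        (void)bucket;
        (void)object;
      }
    }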
diff --git a/tensorflow/core/lib/core/threadpool.cc b/tensorflow/core/lib/core/threadpool.cc
new file mode 100644
index 0000000000..e9b84d3102
--- /dev/null
+++ b/tensorflow/core/lib/core/threadpool.cc
@@ -0,0 +1,108 @@
+#include "tensorflow/core/lib/core/threadpool.h"
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/tracing.h"
+
+namespace tensorflow {
+namespace thread {
+
+struct ThreadPool::Waiter {
+ condition_variable cv;
+ bool ready;
+};
+
+ThreadPool::ThreadPool(Env* env, const string& name, int num_threads)
+ : ThreadPool(env, ThreadOptions(), name, num_threads) {}
+
+ThreadPool::ThreadPool(Env* env, const ThreadOptions& thread_options,
+ const string& name, int num_threads)
+ : name_(name) {
+ CHECK_GE(num_threads, 1);
+ string name_prefix = "tf_" + name_;
+ for (int i = 0; i < num_threads; i++) {
+ threads_.push_back(env->StartThread(thread_options, name_prefix,
+ [this]() { WorkerLoop(); }));
+ }
+}
+
+ThreadPool::~ThreadPool() {
+ {
+ // Under the lock, queue shutdown sentinels and wake any idle workers.
+ mutex_lock l(mu_);
+
+ // Inform every thread to exit.
+ for (size_t i = 0; i < threads_.size(); ++i) {
+ pending_.push_back({nullptr, 0});
+ }
+
+ // Wakeup all waiters.
+ for (auto w : waiters_) {
+ w->ready = true;
+ w->cv.notify_one();
+ }
+ }
+
+ // Wait for threads to finish.
+ for (auto t : threads_) {
+ delete t;
+ }
+}
+
+bool ThreadPool::HasPendingClosures() const {
+ mutex_lock l(mu_);
+ return pending_.size() != 0;
+}
+
+void ThreadPool::Schedule(std::function<void()> fn) {
+ CHECK(fn != nullptr);
+ uint64 id = 0;
+ if (port::Tracing::IsActive()) {
+ id = port::Tracing::UniqueId();
+ port::Tracing::RecordEvent(port::Tracing::EventCategory::kScheduleClosure,
+ id);
+ }
+
+ mutex_lock l(mu_);
+ pending_.push_back({fn, id});
+ if (!waiters_.empty()) {
+ Waiter* w = waiters_.back();
+ waiters_.pop_back();
+ w->ready = true;
+ w->cv.notify_one();
+ }
+}
+
+void ThreadPool::WorkerLoop() {
+ port::Tracing::RegisterCurrentThread(name_.c_str());
+ mutex_lock l(mu_);
+ Waiter w;
+ while (true) {
+ while (pending_.empty()) {
+ // Wait for work to be assigned to me
+ w.ready = false;
+ waiters_.push_back(&w);
+ while (!w.ready) {
+ w.cv.wait(l);
+ }
+ }
+ // Pick up pending work
+ Item item = pending_.front();
+ pending_.pop_front();
+ if (item.fn == nullptr) {
+ break;
+ }
+ mu_.unlock();
+ if (item.id != 0) {
+ port::Tracing::ScopedActivity region(
+ port::Tracing::EventCategory::kRunClosure, item.id);
+ item.fn();
+ } else {
+ item.fn();
+ }
+ mu_.lock();
+ }
+}
+
+} // namespace thread
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/core/threadpool.h b/tensorflow/core/lib/core/threadpool.h
new file mode 100644
index 0000000000..5cf780fa86
--- /dev/null
+++ b/tensorflow/core/lib/core/threadpool.h
@@ -0,0 +1,59 @@
+#ifndef TENSORFLOW_LIB_CORE_THREADPOOL_H_
+#define TENSORFLOW_LIB_CORE_THREADPOOL_H_
+
+#include <deque>
+#include <functional>
+#include <thread>
+#include <vector>
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace thread {
+
+class ThreadPool {
+ public:
+ // Construct a pool that contains "num_threads" threads with specified "name".
+ // env->StartThread() is used to create individual threads.
+ //
+ // REQUIRES: num_threads > 0
+ ThreadPool(Env* env, const string& name, int num_threads);
+
+ // Like the constructor above, but each thread is created via
+ // env->StartThread() with the given "thread_options".
+ //
+ // REQUIRES: num_threads > 0
+ ThreadPool(Env* env, const ThreadOptions& thread_options, const string& name,
+ int num_threads);
+
+ // Wait until all scheduled work has finished and then destroy the
+ // set of threads.
+ virtual ~ThreadPool();
+
+ // Schedule fn() for execution in the pool of threads.
+ virtual void Schedule(std::function<void()> fn);
+
+ virtual bool HasPendingClosures() const;
+
+ private:
+ struct Waiter;
+ struct Item {
+ std::function<void()> fn;
+ uint64 id;
+ };
+
+ void WorkerLoop();
+
+ const string name_;
+ mutable mutex mu_;
+ std::vector<Thread*> threads_; // All threads
+ std::vector<Waiter*> waiters_; // Stack of waiting threads.
+ std::deque<Item> pending_; // Queue of pending work
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ThreadPool);
+};
+
+} // namespace thread
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_CORE_THREADPOOL_H_
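
A minimal usage sketch (hypothetical caller code); the destructor provides the join, per the comment on ~ThreadPool above:

    #include <atomic>

    #include "tensorflow/core/lib/core/threadpool.h"
    #include "tensorflow/core/public/env.h"

    int SumInParallel() {
      std::atomic<int> sum(0);
      {
        tensorflow::thread::ThreadPool pool(tensorflow::Env::Default(), "sum", 4);
        for (int i = 0; i < 100; ++i) {
          pool.Schedule([&sum, i]() { sum += i; });
        }
      }  // ~ThreadPool waits for all scheduled closures to finish.
      return sum.load();  // 4950
    }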
diff --git a/tensorflow/core/lib/core/threadpool_test.cc b/tensorflow/core/lib/core/threadpool_test.cc
new file mode 100644
index 0000000000..f4909c445c
--- /dev/null
+++ b/tensorflow/core/lib/core/threadpool_test.cc
@@ -0,0 +1,93 @@
+#include "tensorflow/core/lib/core/threadpool.h"
+
+#include <atomic>
+
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/public/env.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace thread {
+
+static const int kNumThreads = 30;
+
+TEST(ThreadPool, Empty) {
+ for (int num_threads = 1; num_threads < kNumThreads; num_threads++) {
+ fprintf(stderr, "Testing with %d threads\n", num_threads);
+ ThreadPool pool(Env::Default(), "test", num_threads);
+ }
+}
+
+TEST(ThreadPool, DoWork) {
+ for (int num_threads = 1; num_threads < kNumThreads; num_threads++) {
+ fprintf(stderr, "Testing with %d threads\n", num_threads);
+ const int kWorkItems = 15;
+ bool work[kWorkItems];
+ for (int i = 0; i < kWorkItems; i++) {
+ work[i] = false;
+ }
+ {
+ ThreadPool pool(Env::Default(), "test", num_threads);
+ for (int i = 0; i < kWorkItems; i++) {
+ pool.Schedule([&work, i]() {
+ ASSERT_FALSE(work[i]);
+ work[i] = true;
+ });
+ }
+ }
+ for (int i = 0; i < kWorkItems; i++) {
+ ASSERT_TRUE(work[i]);
+ }
+ }
+}
+
+static void BM_Sequential(int iters) {
+ ThreadPool pool(Env::Default(), "test", kNumThreads);
+ // Decrement count sequentially until 0.
+ int count = iters;
+ mutex done_lock;
+ condition_variable done;
+ bool done_flag = false;
+ std::function<void()> work = [&pool, &count, &done_lock, &done, &done_flag,
+ &work]() {
+ if (count--) {
+ pool.Schedule(work);
+ } else {
+ mutex_lock l(done_lock);
+ done_flag = true;
+ done.notify_all();
+ }
+ };
+ work();
+ mutex_lock l(done_lock);
+ if (!done_flag) {
+ done.wait(l);
+ }
+}
+BENCHMARK(BM_Sequential);
+
+static void BM_Parallel(int iters) {
+ ThreadPool pool(Env::Default(), "test", kNumThreads);
+ // Decrement count concurrently until 0.
+ std::atomic_int_fast32_t count(iters);
+ mutex done_lock;
+ condition_variable done;
+ bool done_flag = false;
+ for (int i = 0; i < iters; ++i) {
+ pool.Schedule([&count, &done_lock, &done, &done_flag]() {
+ if (count.fetch_sub(1) == 1) {
+ mutex_lock l(done_lock);
+ done_flag = true;
+ done.notify_all();
+ }
+ });
+ }
+ mutex_lock l(done_lock);
+ if (!done_flag) {
+ done.wait(l);
+ }
+}
+BENCHMARK(BM_Parallel);
+
+} // namespace thread
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/array_slice.h b/tensorflow/core/lib/gtl/array_slice.h
new file mode 100644
index 0000000000..813fb126e3
--- /dev/null
+++ b/tensorflow/core/lib/gtl/array_slice.h
@@ -0,0 +1,299 @@
+// An ArraySlice<T> represents an immutable array of elements of type
+// T. It has a base pointer "ptr" and a length "length", and the array
+// it represents contains the elements "ptr[0] .. ptr[length-1]".
+// The backing store for the array is *not* owned by the ArraySlice
+// object, and clients must arrange for the backing store to remain
+// live while the ArraySlice object is in use.
+//
+// An ArraySlice<T> is somewhat analogous to a StringPiece, but for
+// array elements of type T.
+//
+// Implicit conversion operations are provided from types such as
+// std::vector<T> and util::gtl::InlinedVector<T, N>. Note that ArraySlice
+// objects constructed from types in this way may be invalidated by
+// any operations that mutate the underlying vector.
+//
+// One common use for ArraySlice is when passing arguments to a
+// routine where you want to be able to accept a variety of array
+// types (e.g. a vector, a util::gtl::InlinedVector, a C-style array,
+// etc.). The usual approach here is to have the client explicitly
+// pass in a pointer and a length, as in:
+//
+// void MyRoutine(const int* elems, int N) {
+// for (int i = 0; i < N; i++) { .. do something with elems[i] .. }
+// }
+//
+// Unfortunately, this leads to ugly and error-prone code at the call site:
+//
+// std::vector<int> my_vector;
+// MyRoutine(vector_as_array(&my_vector), my_vector.size());
+//
+// util::gtl::InlinedVector<int, 4> my_inline_vector;
+// MyRoutine(my_inline_vector.array(), my_inline_vector.size());
+//
+// int my_array[10];
+// MyRoutine(my_array, 10);
+//
+// Instead, you can use an ArraySlice as the argument to the routine:
+//
+// void MyRoutine(ArraySlice<int> a) {
+// for (int i = 0; i < a.size(); i++) { .. do something with a[i] .. }
+// }
+//
+// This makes the call sites cleaner, for the most part:
+//
+// std::vector<int> my_vector;
+// MyRoutine(my_vector);
+//
+// util::gtl::InlinedVector<int, 4> my_inline_vector;
+// MyRoutine(my_inline_vector);
+//
+// int my_array[10];
+// MyRoutine(my_array);
+//
+// int* my_array = new int[10];
+// MyRoutine(gtl::ArraySlice<int>(my_array, 10));
+//
+// MutableArraySlice<T> represents a mutable array of elements, and, like
+// ArraySlice, does not own the backing store. The implicit constructors it
+// provides allow functions not to worry about whether their mutable arguments
+// refer to vectors, arrays, proto2::RepeatedFields, etc.:
+//
+// void MyMutatingRoutine(MutableArraySlice<int> a) {
+// for (int i = 0; i < a.size(); i++) { .. mutate a[i] .. }
+// }
+//
+// std::vector<int> my_vector;
+// MyMutatingRoutine(&my_vector);
+//
+// int my_array[10];
+// MyMutatingRoutine(my_array);
+//
+// int* my_array = new int[10];
+// MyMutatingRoutine(gtl::MutableArraySlice<int>(my_array, 10));
+//
+// MyProto my_proto;
+// for (int i = 0; i < 10; ++i) { my_proto.add_value(i); }
+// MyMutatingRoutine(my_proto.mutable_value());
+
+#ifndef TENSORFLOW_LIB_GTL_ARRAY_SLICE_H_
+#define TENSORFLOW_LIB_GTL_ARRAY_SLICE_H_
+
+#include <initializer_list>
+#include <type_traits>
+#include <vector>
+
+#include "tensorflow/core/lib/gtl/array_slice_internal.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+
+namespace tensorflow {
+namespace gtl {
+
+template <typename T>
+class ArraySlice {
+ private:
+ typedef array_slice_internal::ArraySliceImpl<T> Impl;
+
+ public:
+ typedef T value_type;
+ typedef typename Impl::pointer pointer;
+ typedef typename Impl::const_pointer const_pointer;
+ typedef typename Impl::reference reference;
+ typedef typename Impl::const_reference const_reference;
+ typedef typename Impl::iterator iterator;
+ typedef typename Impl::const_iterator const_iterator;
+ typedef typename Impl::reverse_iterator reverse_iterator;
+ typedef typename Impl::const_reverse_iterator const_reverse_iterator;
+ typedef typename Impl::size_type size_type;
+ typedef typename Impl::difference_type difference_type;
+
+ static const size_type npos = Impl::npos;
+
+ ArraySlice() : impl_(nullptr, 0) {}
+ ArraySlice(const_pointer array, size_type length) : impl_(array, length) {}
+
+ // Implicit conversion constructors
+ ArraySlice(const std::vector<value_type>& v) // NOLINT(runtime/explicit)
+ : impl_(v.data(), v.size()) {}
+
+ template <size_t N>
+ ArraySlice(const value_type (&a)[N]) // NOLINT(runtime/explicit)
+ : impl_(a, N) {}
+
+ template <int N>
+ ArraySlice(const InlinedVector<value_type, N>& v) // NOLINT(runtime/explicit)
+ : impl_(v.array(), v.size()) {}
+
+ // The constructor for any class supplying 'data() const' that returns either
+ // const T* or a less const-qualified version of it, and 'some_integral_type
+ // size() const'. proto2::RepeatedField<T>, string and (since C++11)
+ // std::vector<T,A> and std::array<T, N> are examples of this. See
+ // array_slice_internal.h for details.
+ template <typename V,
+ typename = typename Impl::template EnableIfConvertibleFrom<V>>
+ ArraySlice(const V& v) // NOLINT(runtime/explicit)
+ : impl_(v) {}
+
+ // Implicitly constructs an ArraySlice from an initializer list. This makes it
+ // possible to pass a brace-enclosed initializer list to a function expecting
+ // an ArraySlice:
+ // void Process(ArraySlice<int> x);
+ // Process({1, 2, 3});
+ // The data referenced by the initializer_list must outlive this
+ // ArraySlice. For example, "ArraySlice<int> s={1,2};" and "return
+ // ArraySlice<int>({3,4});" are errors, as the resulting ArraySlice may
+ // reference data that is no longer valid.
+ ArraySlice(std::initializer_list<value_type> v) // NOLINT(runtime/explicit)
+ : impl_(v.begin(), v.size()) {}
+
+ // Substring of another ArraySlice.
+ // pos must be non-negative and <= x.length().
+ // len must be non-negative and will be pinned to at most x.length() - pos.
+ // If len==npos, the substring continues till the end of x.
+ ArraySlice(const ArraySlice& x, size_type pos, size_type len)
+ : impl_(x.impl_, pos, len) {}
+
+ const_pointer data() const { return impl_.data(); }
+ size_type size() const { return impl_.size(); }
+ size_type length() const { return size(); }
+ bool empty() const { return size() == 0; }
+
+ void clear() { impl_.clear(); }
+
+ const_reference operator[](size_type i) const { return impl_[i]; }
+ const_reference at(size_type i) const { return impl_.at(i); }
+ const_reference front() const { return impl_.front(); }
+ const_reference back() const { return impl_.back(); }
+
+ const_iterator begin() const { return impl_.begin(); }
+ const_iterator end() const { return impl_.end(); }
+ const_reverse_iterator rbegin() const { return impl_.rbegin(); }
+ const_reverse_iterator rend() const { return impl_.rend(); }
+
+ void remove_prefix(size_type n) { impl_.remove_prefix(n); }
+ void remove_suffix(size_type n) { impl_.remove_suffix(n); }
+ void pop_back() { remove_suffix(1); }
+ void pop_front() { remove_prefix(1); }
+
+ // These relational operators have the same semantics as the
+ // std::vector<T> relational operators: they do deep (elementwise)
+ // comparisons. Array slices are equal iff their size is the same
+ // and all their elements are equal.
+ bool operator==(ArraySlice<T> other) const { return impl_ == other.impl_; }
+ bool operator!=(ArraySlice<T> other) const { return impl_ != other.impl_; }
+
+ private:
+ Impl impl_;
+};
+
+// Mutable version of ArraySlice, which allows the clients to mutate the
+// underlying data. It is implicitly convertible to ArraySlice since it provides
+// the data() and size() methods with correct signatures. When a
+// MutableArraySlice is created from a pointer to a container (as opposed to raw
+// memory pointer), the pointer must not be null.
+//
+// A note on const-ness: "mutable" here refers to the mutability of the
+// underlying data, not of the slice itself. It is perfectly reasonable to have
+// a variable of type "const MutableArraySlice<T>"; this means that the bounds
+// of the view on the array cannot be changed, but the underlying data in the
+// array still may be modified. This is akin to a "T* const" pointer, as opposed
+// to a "const T*" pointer (corresponding to a non-const ArraySlice<T>).
+template <typename T>
+class MutableArraySlice {
+ private:
+ typedef array_slice_internal::MutableArraySliceImpl<T> Impl;
+
+ public:
+ typedef T value_type;
+ typedef typename Impl::pointer pointer;
+ typedef typename Impl::const_pointer const_pointer;
+ typedef typename Impl::reference reference;
+ typedef typename Impl::const_reference const_reference;
+ typedef typename Impl::iterator iterator;
+ typedef typename Impl::const_iterator const_iterator;
+ typedef typename Impl::reverse_iterator reverse_iterator;
+ typedef typename Impl::const_reverse_iterator const_reverse_iterator;
+ typedef typename Impl::size_type size_type;
+ typedef typename Impl::difference_type difference_type;
+
+ static const size_type npos = Impl::npos;
+
+ MutableArraySlice() : impl_(nullptr, 0) {}
+ MutableArraySlice(pointer array, size_type length) : impl_(array, length) {}
+
+ // Implicit conversion constructors
+ MutableArraySlice(std::vector<value_type>* v) // NOLINT(runtime/explicit)
+ : impl_(v->data(), v->size()) {}
+
+ template <size_t N>
+ MutableArraySlice(value_type (&a)[N]) // NOLINT(runtime/explicit)
+ : impl_(a, N) {}
+
+ template <int N>
+ MutableArraySlice(
+ InlinedVector<value_type, N>* v) // NOLINT(runtime/explicit)
+ : impl_(v->mutable_array(), v->size()) {}
+
+ // The constructor for any class supplying 'T* data()' or 'T* mutable_data()'
+ // (the former is called if both exist), and 'some_integral_type size()
+ // const'. proto2::RepeatedField is an example of this. Also supports string
+ // arguments, when T==char. The appropriate ctor is selected using SFINAE. See
+ // array_slice_internal.h for details.
+ template <typename V,
+ typename = typename Impl::template EnableIfConvertibleFrom<V>>
+ MutableArraySlice(V* v) // NOLINT(runtime/explicit)
+ : impl_(v) {}
+
+ // Substring of another MutableArraySlice.
+ // pos must be non-negative and <= x.length().
+ // len must be non-negative and will be pinned to at most x.length() - pos.
+ // If len==npos, the substring continues till the end of x.
+ MutableArraySlice(const MutableArraySlice& x, size_type pos, size_type len)
+ : impl_(x.impl_, pos, len) {}
+
+ // Accessors.
+ pointer data() const { return impl_.data(); }
+ size_type size() const { return impl_.size(); }
+ size_type length() const { return size(); }
+ bool empty() const { return size() == 0; }
+
+ void clear() { impl_.clear(); }
+
+ reference operator[](size_type i) const { return impl_[i]; }
+ reference at(size_type i) const { return impl_.at(i); }
+ reference front() const { return impl_.front(); }
+ reference back() const { return impl_.back(); }
+
+ iterator begin() const { return impl_.begin(); }
+ iterator end() const { return impl_.end(); }
+ reverse_iterator rbegin() const { return impl_.rbegin(); }
+ reverse_iterator rend() const { return impl_.rend(); }
+
+ void remove_prefix(size_type n) { impl_.remove_prefix(n); }
+ void remove_suffix(size_type n) { impl_.remove_suffix(n); }
+ void pop_back() { remove_suffix(1); }
+ void pop_front() { remove_prefix(1); }
+
+ bool operator==(ArraySlice<T> other) const {
+ return ArraySlice<T>(*this) == other;
+ }
+ bool operator!=(ArraySlice<T> other) const {
+ return ArraySlice<T>(*this) != other;
+ }
+
+ // DEPRECATED(jacobsa): Please use data() instead.
+ pointer mutable_data() const { return impl_.data(); }
+
+ private:
+ Impl impl_;
+};
+
+template <typename T>
+const typename ArraySlice<T>::size_type ArraySlice<T>::npos;
+template <typename T>
+const typename MutableArraySlice<T>::size_type MutableArraySlice<T>::npos;
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_ARRAY_SLICE_H_
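
A short sketch of the calling convention described above, in this library's namespace (the caller code is hypothetical):

    #include <vector>

    #include "tensorflow/core/lib/gtl/array_slice.h"

    int Sum(tensorflow::gtl::ArraySlice<int> a) {
      int total = 0;
      for (int v : a) total += v;  // a views the caller's storage; no copy
      return total;
    }

    void Callers() {
      std::vector<int> v = {1, 2, 3};
      int arr[] = {4, 5, 6};
      Sum(v);          // from std::vector
      Sum(arr);        // from a C array
      Sum({7, 8, 9});  // from a braced initializer list
    }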
diff --git a/tensorflow/core/lib/gtl/array_slice_internal.h b/tensorflow/core/lib/gtl/array_slice_internal.h
new file mode 100644
index 0000000000..080f0a38d8
--- /dev/null
+++ b/tensorflow/core/lib/gtl/array_slice_internal.h
@@ -0,0 +1,253 @@
+// NOT FOR INCLUSION BY CLIENT CODE. This file is only to be included by
+// array_slice.h.
+
+// Helper functions and templates for ArraySlice.
+
+#ifndef TENSORFLOW_LIB_GTL_ARRAY_SLICE_INTERNAL_H_
+#define TENSORFLOW_LIB_GTL_ARRAY_SLICE_INTERNAL_H_
+
+#include <stddef.h>
+#include <algorithm>
+#include <iterator>
+#include <memory>
+#include <string>
+#include <type_traits>
+#include <utility>
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace gtl {
+namespace array_slice_internal {
+
+// Template logic for generic constructors.
+
+// Wrappers whose Get() delegates to the appropriate method of a container, and
+// is defined when this method exists. Delegates to the const method if C is a
+// const type.
+struct Data {
+ template <typename C>
+ static decltype(std::declval<C>().data()) Get(C* v) {
+ return v->data();
+ }
+};
+
+struct MutableData {
+ template <typename C>
+ static decltype(std::declval<C>().mutable_data()) Get(C* v) {
+ return v->mutable_data();
+ }
+};
+
+struct Size {
+ template <typename C>
+ static decltype(std::declval<C>().size()) Get(C* v) {
+ return v->size();
+ }
+};
+
+struct MutableStringData {
+ // Defined only for string.
+ static char* Get(string* v) { return v->empty() ? nullptr : &*v->begin(); }
+};
+
+// Checks whether M::Get(C*) is defined and has a return type R such that
+// Checker::valid<R>()==true.
+template <typename M, typename Checker, typename C>
+struct HasGetHelper : public M {
+ private:
+ struct None {};
+ // M::Get is selected when it is viable. Get(...) is selected otherwise.
+ using M::Get;
+ static None Get(...);
+
+ public:
+ static constexpr bool HasGet() {
+ using Result = decltype(Get(std::declval<C*>()));
+ return !std::is_same<Result, None>() && Checker::template valid<Result>();
+ }
+};
+
+// Defines HasGet() for a particular method, container, and checker. If
+// HasGet()==true, provides Get() that delegates to the method.
+template <typename M, typename Checker, typename C,
+ bool /*has_get*/ = HasGetHelper<M, Checker, C>::HasGet()>
+struct Wrapper {
+ static constexpr bool HasGet() { return false; }
+};
+
+template <typename M, typename Checker, typename C>
+struct Wrapper<M, Checker, C, true> {
+ static constexpr bool HasGet() { return true; }
+ static decltype(M::Get(std::declval<C*>())) Get(C* v) { return M::Get(v); }
+};
+
+// Type checker for a method returning an integral value.
+struct SizeChecker {
+ template <typename R>
+ static constexpr bool valid() {
+ return std::is_integral<R>::value;
+ }
+};
+
+// Type checker for a method returning either a pointer to T or a less const
+// version of that.
+template <typename T>
+struct DataChecker {
+ // We want to enable conversion from std::vector<T*> to ArraySlice<const T*>
+ // but disable conversion from std::vector<Derived> to ArraySlice<Base>. Here
+ // we use the fact that U** is convertible to Q* const* if and only if Q is
+ // the same type as U or a more cv-qualified version of it.
+ template <typename R>
+ static constexpr bool valid() {
+ return std::is_convertible<R*, T* const*>::value;
+ }
+};
+
+// Aliases to A if A::HasGet()==true, or to B otherwise.
+template <typename A, typename B>
+using FirstWithGet = typename std::conditional<A::HasGet(), A, B>::type;
+
+// Wraps C::data() const, returning a pointer to const data.
+template <typename T, typename C>
+using ContainerData = Wrapper<Data, DataChecker<const T>, const C>;
+
+// Wraps a method returning a pointer to mutable data. Prefers data() over
+// mutable_data(), and handles strings when T==char. If data() returns a pointer
+// to mutable data, it is most likely overloaded, but may also be a single
+// method 'T* C::data() const' in a non-STL-compliant container.
+template <typename T, typename C>
+using ContainerMutableData =
+ FirstWithGet<Wrapper<Data, DataChecker<T>, C>,
+ FirstWithGet<Wrapper<MutableData, DataChecker<T>, C>,
+ Wrapper<MutableStringData, DataChecker<T>, C>>>;
+
+// Wraps C::size() const.
+template <typename C>
+using ContainerSize = Wrapper<Size, SizeChecker, const C>;
+
+// Implementation class for ArraySlice and MutableArraySlice. In the case of
+// ArraySlice, T will be a const type; for MutableArraySlice, T will be a
+// mutable type.
+template <typename T>
+class ArraySliceImplBase {
+ public:
+ typedef T* pointer;
+ typedef const T* const_pointer;
+ typedef T& reference;
+ typedef const T& const_reference;
+ typedef pointer iterator;
+ typedef const_pointer const_iterator;
+ typedef std::reverse_iterator<iterator> reverse_iterator;
+ typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
+ typedef size_t size_type;
+ typedef ptrdiff_t difference_type;
+
+ static const size_type npos = -1;
+
+ ArraySliceImplBase(pointer array, size_type length)
+ : ptr_(array), length_(length) {}
+
+ // Substring of another ArraySlice.
+ // pos must be non-negative and <= x.length().
+ // len must be non-negative and will be pinned to at most x.length() - pos.
+ ArraySliceImplBase(const ArraySliceImplBase& x, size_type pos, size_type len)
+ : ptr_(x.ptr_ + pos), length_(std::min(x.length_ - pos, len)) {}
+
+ // Some of the const methods below return pointers and references to mutable
+ // data. This is only the case in this internal class; ArraySlice and
+ // MutableArraySlice provide deep-constness.
+
+ pointer data() const { return ptr_; }
+ size_type size() const { return length_; }
+
+ void clear() {
+ ptr_ = nullptr;
+ length_ = 0;
+ }
+
+ reference operator[](size_type i) const { return ptr_[i]; }
+ reference at(size_type i) const {
+ DCHECK_LT(i, length_);
+ return ptr_[i];
+ }
+ reference front() const {
+ DCHECK_GT(length_, 0);
+ return ptr_[0];
+ }
+ reference back() const {
+ DCHECK_GT(length_, 0);
+ return ptr_[length_ - 1];
+ }
+
+ void remove_prefix(size_type n) {
+ DCHECK_GE(length_, n);
+ ptr_ += n;
+ length_ -= n;
+ }
+ void remove_suffix(size_type n) {
+ DCHECK_GE(length_, n);
+ length_ -= n;
+ }
+
+ iterator begin() const { return ptr_; }
+ iterator end() const { return ptr_ + length_; }
+ reverse_iterator rbegin() const { return reverse_iterator(end()); }
+ reverse_iterator rend() const { return reverse_iterator(begin()); }
+
+ bool operator==(const ArraySliceImplBase& other) const {
+ if (size() != other.size()) return false;
+ if (data() == other.data()) return true;
+ return std::equal(data(), data() + size(), other.data());
+ }
+ bool operator!=(const ArraySliceImplBase& other) const {
+ return !(*this == other);
+ }
+
+ private:
+ pointer ptr_;
+ size_type length_;
+};
+
+template <typename T>
+class ArraySliceImpl : public ArraySliceImplBase<const T> {
+ public:
+ using ArraySliceImplBase<const T>::ArraySliceImplBase;
+
+ // Defined iff the data and size accessors for the container C have been
+ // defined.
+ template <typename C>
+ using EnableIfConvertibleFrom =
+ typename std::enable_if<ContainerData<T, C>::HasGet() &&
+ ContainerSize<C>::HasGet()>::type;
+
+ // Constructs from a container when EnableIfConvertibleFrom is
+ // defined. std::addressof handles types with overloaded operator&.
+ template <typename C>
+ explicit ArraySliceImpl(const C& v)
+ : ArraySliceImplBase<const T>(ContainerData<T, C>::Get(std::addressof(v)),
+ ContainerSize<C>::Get(std::addressof(v))) {}
+};
+
+template <typename T>
+class MutableArraySliceImpl : public ArraySliceImplBase<T> {
+ public:
+ using ArraySliceImplBase<T>::ArraySliceImplBase;
+
+ template <typename C>
+ using EnableIfConvertibleFrom =
+ typename std::enable_if<ContainerMutableData<T, C>::HasGet() &&
+ ContainerSize<C>::HasGet()>::type;
+
+ template <typename C>
+ explicit MutableArraySliceImpl(C* v)
+ : ArraySliceImplBase<T>(ContainerMutableData<T, C>::Get(v),
+ ContainerSize<C>::Get(v)) {}
+};
+
+} // namespace array_slice_internal
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_ARRAY_SLICE_INTERNAL_H_
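
As a sketch of what the wrappers above enable: any container exposing a suitable 'data() const' and 'size() const' converts implicitly (the FloatBuffer type here is hypothetical):

    #include "tensorflow/core/lib/gtl/array_slice.h"

    // Only data() and size() are needed; ContainerData/ContainerSize detect
    // them via SFINAE and enable ArraySlice<float>'s templated constructor.
    struct FloatBuffer {
      const float* values;
      size_t count;
      const float* data() const { return values; }
      size_t size() const { return count; }
    };

    float FirstOrZero(tensorflow::gtl::ArraySlice<float> s) {
      return s.empty() ? 0.0f : s.front();
    }

    // FloatBuffer buf{ptr, n};
    // FirstOrZero(buf);  // implicit conversion, no copy of the elements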
diff --git a/tensorflow/core/lib/gtl/array_slice_test.cc b/tensorflow/core/lib/gtl/array_slice_test.cc
new file mode 100644
index 0000000000..33ee8fc8dd
--- /dev/null
+++ b/tensorflow/core/lib/gtl/array_slice_test.cc
@@ -0,0 +1,646 @@
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+#include <algorithm>
+#include <array>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace gtl {
+namespace {
+
+typedef ArraySlice<int> IntSlice;
+typedef ArraySlice<char> CharSlice;
+typedef MutableArraySlice<int> MutableIntSlice;
+typedef MutableArraySlice<char> MutableCharSlice;
+typedef std::vector<int> IntVec;
+
+// Append 0..len-1 to *v
+template <typename Vector>
+static void Fill(Vector* v, int len, int offset = 0) {
+ for (int i = 0; i < len; i++) {
+ v->push_back(i + offset);
+ }
+}
+
+static void TestHelper(const IntSlice& vorig, const IntVec& vec) {
+ IntSlice other; // To test the assignment return value.
+ IntSlice v = other = vorig;
+ const int len = vec.size();
+ EXPECT_EQ(v.size(), vec.size());
+
+ for (int i = 0; i < len; i++) {
+ EXPECT_EQ(v[i], vec[i]);
+ EXPECT_EQ(v.at(i), vec[i]);
+ }
+ EXPECT_EQ(v.begin(), gtl::vector_as_array(&vec));
+
+ int counter = 0;
+ for (IntSlice::iterator it = v.begin(); it != v.end(); ++it) {
+ EXPECT_EQ(counter, *it);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ counter = 0;
+ for (IntSlice::const_iterator it = v.begin(); it != v.end(); ++it) {
+ EXPECT_EQ(counter, *it);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ if (len > 0) {
+ EXPECT_EQ(0, v.front());
+ EXPECT_EQ(len - 1, v.back());
+ v.pop_back();
+ EXPECT_EQ(len - 1, v.size());
+ for (size_t i = 0; i < v.size(); ++i) {
+ EXPECT_EQ(i, v[i]);
+ }
+ if (len > 1) {
+ v.pop_front();
+ EXPECT_EQ(len - 2, v.size());
+ for (size_t i = 0; i < v.size(); ++i) {
+ EXPECT_EQ(i + 1, v[i]);
+ }
+ }
+ }
+}
+
+// The element access test that is applicable both when MutableArraySlice is
+// const and when it's not.
+template <class V>
+void MutableTestHelperTemplated(V v, int* ptr, const int len) {
+ CHECK_EQ(v.size(), len);
+
+ for (int i = 0; i < len; i++) {
+ EXPECT_EQ(ptr + i, &v[i]);
+ EXPECT_EQ(ptr + i, &v.at(i));
+ }
+ EXPECT_EQ(ptr, v.begin());
+ EXPECT_EQ(ptr + len, v.end());
+ EXPECT_EQ(ptr, v.data());
+
+ int counter = 0;
+ for (MutableIntSlice::const_iterator it = v.begin(); it != v.end(); ++it) {
+ EXPECT_EQ(ptr + counter, &*it);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ EXPECT_EQ(len, std::distance(v.rbegin(), v.rend()));
+
+ if (len > 0) {
+ EXPECT_EQ(ptr, &v.front());
+ EXPECT_EQ(ptr + len - 1, &v.back());
+ EXPECT_EQ(ptr + len - 1, &*v.rbegin());
+ EXPECT_EQ(ptr, &*(v.rend() - 1));
+ }
+}
+
+static void MutableTestHelper(const MutableIntSlice& vorig, int* ptr,
+ const int len) {
+ // Test the data accessors both when the MutableArraySlice is declared const,
+ // and when it is not.
+ MutableTestHelperTemplated<const MutableIntSlice&>(vorig, ptr, len);
+ MutableTestHelperTemplated<MutableIntSlice>(vorig, ptr, len);
+
+ MutableIntSlice other; // To test the assignment return value.
+ MutableIntSlice v = other = vorig;
+ EXPECT_EQ(ptr, v.mutable_data());
+
+ int counter = 0;
+ for (MutableIntSlice::iterator it = v.begin(); it != v.end(); ++it) {
+ EXPECT_EQ(ptr + counter, &*it);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ if (len > 0) {
+ // Test that elements are assignable.
+ v[0] = 1;
+ v.front() = 2;
+ v.back() = 5;
+ *v.mutable_data() = 4;
+ std::fill(v.begin(), v.end(), 5);
+ std::fill(v.rbegin(), v.rend(), 6);
+ // Test size-changing methods.
+ v.pop_back();
+ EXPECT_EQ(len - 1, v.size());
+ for (size_t i = 0; i < v.size(); ++i) {
+ EXPECT_EQ(ptr + i, &v[i]);
+ }
+ if (len > 1) {
+ v.pop_front();
+ EXPECT_EQ(len - 2, v.size());
+ for (size_t i = 0; i < v.size(); ++i) {
+ EXPECT_EQ(ptr + i + 1, &v[i]);
+ }
+ }
+ }
+}
+
+template <typename Vector>
+static void TestImplicitConversion(const IntSlice& v, const Vector& vec) {
+ EXPECT_EQ(v.size(), vec.size());
+ for (size_t i = 0; i < v.size(); i++) {
+ EXPECT_EQ(v[i], vec[i]);
+ }
+}
+
+template <typename Vector>
+static void TestImplicitConversion(const CharSlice& v, const Vector& vec) {
+ TestImplicitConversion(IntVec(v.begin(), v.end()), vec);
+}
+
+static void TestImplicitConversion(const MutableIntSlice& v, const int* data,
+ int size) {
+ EXPECT_EQ(size, v.size());
+ for (size_t i = 0; i < v.size(); i++) {
+ EXPECT_EQ(data + i, &v[i]);
+ }
+}
+
+static void TestImplicitConversion(const MutableCharSlice& v, const char* data,
+ int size) {
+ EXPECT_EQ(size, v.size());
+ for (size_t i = 0; i < v.size(); i++) {
+ EXPECT_EQ(data + i, &v[i]);
+ }
+}
+// A struct supplying the data(), mutable_data() and size() methods, just like
+// e.g. proto2::RepeatedField.
+struct RepeatedField {
+ std::vector<int> storage;
+ const int* data() const { return storage.data(); }
+ int* mutable_data() { return storage.data(); }
+ int size() const { return storage.size(); }
+};
+
+// A struct supplying the data() (both mutable and const versions) and
+// size(). It also supplies mutable_data() but we test that data() is selected
+// instead.
+struct ContainerWithOverloads {
+ std::vector<int> storage;
+ std::vector<int> wrong_storage;
+ const int* data() const { return storage.data(); }
+ int* data() { return storage.data(); }
+ // MutableArraySlice should not call mutable_data(), preferring data()
+ // instead.
+ int* mutable_data() { return wrong_storage.data(); }
+ int size() const { return storage.size(); }
+};
+
+// A struct supplying data() and size() methods.
+struct ContainerWithShallowConstData {
+ std::vector<int> storage;
+ int* data() const { return const_cast<int*>(storage.data()); }
+ int size() const { return storage.size(); }
+};
+
+TEST(IntSlice, Simple) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec;
+ Fill(&vec, len);
+ TestHelper(IntSlice(vec), vec);
+ TestHelper(IntSlice(vec.data(), vec.size()), vec);
+ }
+}
+
+TEST(IntSlice, WithPosAndLen) {
+ IntVec vec;
+ Fill(&vec, 20);
+ for (size_t len = 0; len < vec.size(); len++) {
+ IntVec subvec(vec.begin(), vec.begin() + len);
+ TestImplicitConversion(IntSlice(vec, 0, len), subvec);
+ TestImplicitConversion(IntSlice(IntSlice(vec), 0, len), subvec);
+ }
+ EXPECT_EQ(0, IntSlice(vec, 0, 0).size());
+ EXPECT_EQ(0, IntSlice(IntSlice(vec), 0, 0).size());
+ TestImplicitConversion(IntSlice(vec, 0, IntSlice::npos), vec);
+}
+
+TEST(IntSlice, Clear) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec;
+ Fill(&vec, len);
+ IntSlice v(vec);
+ v.clear();
+ EXPECT_EQ(0, v.size());
+ EXPECT_EQ(v.begin(), v.end());
+ }
+}
+
+TEST(IntSlice, Swap) {
+ for (int l1 = 0; l1 < 20; l1++) {
+ for (int l2 = 0; l2 < 20; l2++) {
+ IntVec avec, bvec;
+ Fill(&avec, l1);
+ Fill(&bvec, l2, 100);
+ IntSlice a(avec), b(bvec);
+ using std::swap;
+ swap(a, b);
+ EXPECT_EQ(l1, b.size());
+ EXPECT_EQ(l2, a.size());
+ for (int i = 0; i < l1; i++) {
+ EXPECT_EQ(i, b[i]);
+ }
+ for (int i = 0; i < l2; i++) {
+ EXPECT_EQ(100 + i, a[i]);
+ }
+ }
+ }
+}
+
+TEST(IntSlice, ImplicitConversion) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec;
+ Fill(&vec, len);
+ IntSlice slice;
+ slice = vec;
+ TestImplicitConversion(vec, vec);
+ TestImplicitConversion(slice, vec);
+ TestImplicitConversion(IntSlice(vec.data(), vec.size()), vec);
+ }
+}
+
+TEST(IntSlice, InlinedVectorConversion) {
+ for (int len = 0; len < 20; len++) {
+ InlinedVector<int, 4> inline_vec;
+ for (int i = 0; i < len; i++) {
+ inline_vec.push_back(i);
+ }
+ IntVec vec;
+ Fill(&vec, len);
+ IntSlice v = inline_vec; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(inline_vec, vec);
+ }
+}
+
+TEST(IntSlice, StaticArrayConversion) {
+ int array[20];
+ IntVec vec;
+ Fill(&vec, TF_ARRAYSIZE(array));
+ std::copy(vec.begin(), vec.end(), array);
+ IntSlice v = array; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(array, vec);
+}
+
+TEST(IntSlice, StdArrayConversion) {
+ std::array<int, 20> array;
+ IntVec vec;
+ Fill(&vec, array.size());
+ std::copy(vec.begin(), vec.end(), array.begin());
+
+ // Check assignment.
+ {
+ IntSlice v = array;
+ static_cast<void>(v);
+ }
+
+ // Check sub-slice initialization.
+ {
+ IntSlice v = {array, 10, 15};
+ static_cast<void>(v);
+ }
+
+ TestImplicitConversion(array, vec);
+}
+
+// Values according to the Fill function.
+static const int test_const_array[] = {0, 1, 2};
+
+TEST(IntSlice, ConstStaticArrayConversion) {
+ IntVec vec;
+ Fill(&vec, TF_ARRAYSIZE(test_const_array));
+ IntSlice v = test_const_array; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(test_const_array, vec);
+}
+
+TEST(IntSlice, RepeatedFieldConversion) {
+ RepeatedField repeated_field;
+ IntVec vec;
+ Fill(&vec, 20);
+ repeated_field.storage = vec;
+ IntSlice v = repeated_field; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(repeated_field, vec);
+}
+
+TEST(IntSlice, ContainerWithOverloadsConversion) {
+ ContainerWithOverloads container;
+ Fill(&container.storage, 20);
+ container.wrong_storage.resize(container.size());
+ IntSlice v = container; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(container, container.storage);
+}
+
+TEST(IntSlice, ContainerWithShallowConstDataConversion) {
+ ContainerWithShallowConstData container;
+ Fill(&container.storage, 20);
+ IntSlice v = container; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(container, container.storage);
+}
+
+TEST(IntSlice, MutableIntSliceConversion) {
+ IntVec vec(20);
+ IntSlice slice = MutableIntSlice(&vec);
+ EXPECT_EQ(vec.size(), slice.size());
+ EXPECT_EQ(vec.data(), slice.data());
+}
+
+TEST(IntSlice, Equality) {
+ IntVec vec1(20);
+ IntVec vec2(20);
+ // These two slices are from different vectors, but have the same
+ // size and have the same elements (right now). They should
+ // compare equal.
+ const IntSlice from1(vec1);
+ const IntSlice from2(vec2);
+ EXPECT_EQ(from1, from1);
+ EXPECT_EQ(from1, from2);
+
+ // This verifies that MutableArraySlices can be compared freely with
+ // ArraySlices.
+ const MutableIntSlice mutable_from1(&vec1);
+ const MutableIntSlice mutable_from2(&vec2);
+ EXPECT_EQ(from1, mutable_from1);
+ EXPECT_EQ(mutable_from1, from1);
+ EXPECT_EQ(mutable_from1, mutable_from2);
+ EXPECT_EQ(mutable_from2, mutable_from1);
+
+ // With a different size, the array slices should not be equal.
+ EXPECT_NE(from1, IntSlice(from1, 0, from1.size() - 1));
+
+ // With different contents, the array slices should not be equal.
+ ++vec2.back();
+ EXPECT_NE(from1, from2);
+}
+
+// Compile-asserts that the argument has the expected type.
+template <typename Expected, typename T>
+void CheckType(const T& value) {
+ testing::StaticAssertTypeEq<Expected, T>();
+}
+
+TEST(IntSlice, ExposesContainerTypesAndConsts) {
+ IntSlice slice;
+ const IntSlice const_slice;
+ CheckType<IntSlice::iterator>(slice.begin());
+ CheckType<IntSlice::const_iterator>(const_slice.end());
+ CheckType<IntSlice::const_reverse_iterator>(const_slice.rbegin());
+ CheckType<IntSlice::reverse_iterator>(slice.rend());
+ testing::StaticAssertTypeEq<int, IntSlice::value_type>();
+ testing::StaticAssertTypeEq<const int*, IntSlice::pointer>();
+ testing::StaticAssertTypeEq<const int&, IntSlice::const_reference>();
+ EXPECT_EQ(static_cast<IntSlice::size_type>(-1), IntSlice::npos);
+}
+
+void TestEmpty(IntSlice slice) { ASSERT_TRUE(slice.empty()); }
+
+void TestRange(IntSlice slice, int from, int to) {
+ ASSERT_EQ(to - from + 1, slice.size());
+ for (size_t i = 0; i < slice.size(); ++i) {
+ EXPECT_EQ(from + i, slice[i]);
+ }
+}
+
+TEST(IntSlice, InitializerListConversion) {
+ TestEmpty({});
+ TestRange({1}, 1, 1);
+ TestRange({10, 11, 12, 13}, 10, 13);
+}
+
+TEST(CharSlice, StringConversion) {
+ IntVec vec;
+ Fill(&vec, 20);
+ string str(vec.begin(), vec.end());
+ CharSlice v = str; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(str, vec);
+}
+
+TEST(IntPtrSlice, ConstConversion) {
+ int one = 1;
+ int two = 2;
+ std::vector<int*> vec;
+ vec.push_back(&one);
+ vec.push_back(&two);
+ ArraySlice<const int*> v = vec;
+ ASSERT_EQ(2, v.size());
+ EXPECT_EQ(&one, v[0]);
+ EXPECT_EQ(&two, v[1]);
+}
+
+TEST(MutableIntSlice, Simple) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec(len);
+ MutableTestHelper(MutableIntSlice(&vec), vec.data(), len);
+ MutableTestHelper(MutableIntSlice(vec.data(), vec.size()), vec.data(), len);
+ }
+}
+
+TEST(MutableIntSlice, WithPosAndLen) {
+ IntVec vec(20);
+ for (size_t len = 0; len < vec.size(); len++) {
+ TestImplicitConversion(MutableIntSlice(&vec, 0, len), vec.data(), len);
+ TestImplicitConversion(MutableIntSlice(MutableIntSlice(&vec), 0, len),
+ vec.data(), len);
+ }
+ EXPECT_EQ(0, MutableIntSlice(&vec, 0, 0).size());
+ EXPECT_EQ(0, MutableIntSlice(MutableIntSlice(&vec), 0, 0).size());
+ TestImplicitConversion(MutableIntSlice(&vec, 0, MutableIntSlice::npos),
+ vec.data(), vec.size());
+}
+
+TEST(MutableIntSlice, Clear) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec(len);
+ MutableIntSlice v(&vec);
+ v.clear();
+ EXPECT_EQ(0, v.size());
+ EXPECT_EQ(v.begin(), v.end());
+ }
+}
+
+TEST(MutableIntSlice, Swap) {
+ for (int l1 = 0; l1 < 20; l1++) {
+ for (int l2 = 0; l2 < 20; l2++) {
+ IntVec avec(l1), bvec(l2);
+ MutableIntSlice a(&avec), b(&bvec);
+ using std::swap;
+ swap(a, b);
+ EXPECT_EQ(l1, b.size());
+ EXPECT_EQ(l2, a.size());
+ for (int i = 0; i < l1; i++) {
+ EXPECT_EQ(&avec[i], &b[i]);
+ }
+ for (int i = 0; i < l2; i++) {
+ EXPECT_EQ(&bvec[i], &a[i]);
+ }
+ }
+ }
+}
+
+TEST(MutableIntSlice, ImplicitConversion) {
+ for (int len = 0; len < 20; len++) {
+ IntVec vec(len);
+ MutableIntSlice slice;
+ slice = &vec;
+ TestImplicitConversion(&vec, vec.data(), len);
+ TestImplicitConversion(slice, vec.data(), len);
+ TestImplicitConversion(MutableIntSlice(vec.data(), vec.size()), vec.data(),
+ len);
+ }
+}
+
+TEST(MutableIntSlice, InlinedVectorConversion) {
+ for (int len = 0; len < 20; len++) {
+ InlinedVector<int, 4> inline_vec;
+ for (int i = 0; i < len; i++) {
+ inline_vec.push_back(i);
+ }
+ MutableIntSlice v = &inline_vec; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(&inline_vec, inline_vec.array(), inline_vec.size());
+ }
+}
+
+TEST(MutableIntSlice, StaticArrayConversion) {
+ int array[20];
+ MutableIntSlice v = array; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(array, array, TF_ARRAYSIZE(array));
+}
+
+TEST(MutableIntSlice, StdArrayConversion) {
+ std::array<int, 20> array;
+
+ // Check assignment.
+ {
+ MutableIntSlice v = &array;
+ static_cast<void>(v);
+ }
+
+ // Check sub-slice initialization.
+ {
+ MutableIntSlice v = {&array, 10, 15};
+ static_cast<void>(v);
+ }
+
+ TestImplicitConversion(&array, &array[0], array.size());
+}
+
+TEST(MutableIntSlice, RepeatedFieldConversion) {
+ RepeatedField repeated_field;
+ Fill(&repeated_field.storage, 20);
+ MutableIntSlice v = &repeated_field; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(&repeated_field, repeated_field.storage.data(),
+ repeated_field.storage.size());
+}
+
+TEST(MutableIntSlice, ContainerWithOverloadsConversion) {
+ ContainerWithOverloads container;
+ Fill(&container.storage, 20);
+ container.wrong_storage.resize(container.size());
+ MutableIntSlice v = &container; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(&container, container.storage.data(),
+ container.storage.size());
+}
+
+TEST(MutableIntSlice, ContainerWithShallowConstDataConversion) {
+ ContainerWithShallowConstData container;
+ Fill(&container.storage, 20);
+ MutableIntSlice v = &container; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(&container, container.storage.data(),
+ container.storage.size());
+}
+
+TEST(MutableIntSlice, TypedefsAndConstants) {
+ testing::StaticAssertTypeEq<int, MutableIntSlice::value_type>();
+ testing::StaticAssertTypeEq<int*, MutableIntSlice::pointer>();
+ testing::StaticAssertTypeEq<const int*, MutableIntSlice::const_pointer>();
+ testing::StaticAssertTypeEq<int&, MutableIntSlice::reference>();
+ testing::StaticAssertTypeEq<const int&, MutableIntSlice::const_reference>();
+
+ EXPECT_EQ(static_cast<MutableIntSlice::size_type>(-1), MutableIntSlice::npos);
+}
+
+TEST(MutableIntSlice, IteratorsAndReferences) {
+ auto accept_pointer = [](int* x) {};
+ auto accept_reference = [](int& x) {};
+ auto accept_iterator = [](MutableIntSlice::iterator x) {};
+ auto accept_reverse_iterator = [](MutableIntSlice::reverse_iterator x) {};
+
+ int a[1];
+ MutableIntSlice s = a;
+
+ accept_pointer(s.data());
+ accept_pointer(s.mutable_data());
+ accept_iterator(s.begin());
+ accept_iterator(s.end());
+ accept_reverse_iterator(s.rbegin());
+ accept_reverse_iterator(s.rend());
+
+ accept_reference(s[0]);
+ accept_reference(s.at(0));
+ accept_reference(s.front());
+ accept_reference(s.back());
+}
+
+TEST(MutableIntSlice, IteratorsAndReferences_Const) {
+ auto accept_pointer = [](int* x) {};
+ auto accept_reference = [](int& x) {};
+ auto accept_iterator = [](MutableIntSlice::iterator x) {};
+ auto accept_reverse_iterator = [](MutableIntSlice::reverse_iterator x) {};
+
+ int a[1];
+ const MutableIntSlice s = a;
+
+ accept_pointer(s.data());
+ accept_pointer(s.mutable_data());
+ accept_iterator(s.begin());
+ accept_iterator(s.end());
+ accept_reverse_iterator(s.rbegin());
+ accept_reverse_iterator(s.rend());
+
+ accept_reference(s[0]);
+ accept_reference(s.at(0));
+ accept_reference(s.front());
+ accept_reference(s.back());
+}
+
+bool TestMutableOverload(MutableIntSlice slice) { return false; }
+
+bool TestMutableOverload(MutableCharSlice slice) { return true; }
+
+TEST(MutableCharSlice, StringConversion) {
+ for (int len = 0; len < 20; len++) {
+ string str(len, '\0');
+ MutableCharSlice v = &str; // Test assignment
+ static_cast<void>(v);
+ TestImplicitConversion(v, str.data(), str.size());
+ }
+ // Verify that only the correct overload is feasible. Note that this would
+ // fail if the string ctor was declared simply as MutableArraySlice(string*),
+ // since in that case both overloads would be feasible.
+ string str;
+ EXPECT_TRUE(TestMutableOverload(&str));
+}
+
+} // namespace
+} // namespace gtl
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/edit_distance.h b/tensorflow/core/lib/gtl/edit_distance.h
new file mode 100644
index 0000000000..82b6c2299f
--- /dev/null
+++ b/tensorflow/core/lib/gtl/edit_distance.h
@@ -0,0 +1,82 @@
+#ifndef TENSORFLOW_LIB_GTL_EDIT_DISTANCE_H_
+#define TENSORFLOW_LIB_GTL_EDIT_DISTANCE_H_
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+
+namespace tensorflow {
+namespace gtl {
+
+// Calculate the Levenshtein Edit Distance between two contiguous
+// sequences, s and t, of type T.
+//
+// The Levenshtein distance is a symmetric distance defined as the
+// smallest number of insertions, deletions, and substitutions
+// required to convert sequence s to t (and vice versa).
+// Note that this distance does not consider transpositions.
+//
+// For more details and a reference implementation, see:
+// https://en.wikipedia.org/wiki/Levenshtein_distance
+//
+// This implementation has time complexity O(|s|*|t|)
+// and space complexity O(min(|s|, |t|)), where
+// |x| := x.size()
+//
+// A simple call to LevenshteinDistance looks like:
+//
+// int64 dist = LevenshteinDistance("hi", "bye", std::equal_to<char>());
+//
+template <typename T, typename Cmp>
+inline int64 LevenshteinDistance(const gtl::ArraySlice<T>& s,
+ const gtl::ArraySlice<T>& t, const Cmp& cmp) {
+ const int64 s_size = s.size();
+ const int64 t_size = t.size();
+
+ if (s_size == 0) return t_size;
+ if (t_size == 0) return s_size;
+ if (s == t) return 0;
+ if (t_size > s_size) return LevenshteinDistance(t, s, cmp);
+
+ // Create work vectors
+ gtl::InlinedVector<int64, 32> scratch0(t_size + 1);
+ gtl::InlinedVector<int64, 32> scratch1(t_size + 1);
+
+ int64* previous = scratch0.data();
+ int64* current = scratch1.data();
+
+ // Initialize previous row of distances
+ std::iota(scratch0.begin(), scratch0.end(), 0);
+
+ for (int64 i = 0; i < s_size; ++i) {
+ // Swap current and previous rows for next iteration
+ std::swap(previous, current);
+
+ // Calculate current row distances from previous row
+ current[0] = i + 1;
+
+ // Fill in the rest of the row
+ for (int64 j = 0; j < t_size; ++j) {
+ const int64 cost = cmp(s[i], t[j]) ? 0 : 1;
+ current[j + 1] =
+ std::min(current[j] + 1, // deletion cost
+ std::min(previous[j + 1] + 1, // insertion cost
+ previous[j] + cost)); // substitution cost
+ }
+ }
+
+ return current[t_size];
+}
+
+template <typename Container1, typename Container2, typename Cmp>
+inline int64 LevenshteinDistance(const Container1& s, const Container2& t,
+ const Cmp& cmp) {
+ return LevenshteinDistance(
+ gtl::ArraySlice<typename Container1::value_type>(s.data(), s.size()),
+ gtl::ArraySlice<typename Container1::value_type>(t.data(), t.size()),
+ cmp);
+}
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_EDIT_DISTANCE_H_
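
A small worked example of the container overload (the caller code is hypothetical):

    #include <functional>
    #include <string>

    #include "tensorflow/core/lib/gtl/edit_distance.h"

    void Example() {
      std::string a = "kitten";
      std::string b = "sitting";
      // Three edits: substitute 'k'->'s', substitute 'e'->'i', insert 'g'.
      tensorflow::int64 d =
          tensorflow::gtl::LevenshteinDistance(a, b, std::equal_to<char>());
      // d == 3
      (void)d;
    }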
diff --git a/tensorflow/core/lib/gtl/edit_distance_test.cc b/tensorflow/core/lib/gtl/edit_distance_test.cc
new file mode 100644
index 0000000000..0526ee0a05
--- /dev/null
+++ b/tensorflow/core/lib/gtl/edit_distance_test.cc
@@ -0,0 +1,125 @@
+#include "tensorflow/core/lib/gtl/edit_distance.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace gtl {
+namespace {
+
+class LevenshteinDistanceTest : public ::testing::Test {
+ protected:
+ std::vector<char> empty_;
+ std::string s1_;
+ std::string s1234_;
+ std::string s567_;
+ std::string kilo_;
+ std::string kilogram_;
+ std::string mother_;
+ std::string grandmother_;
+ std::string lower_;
+ std::string upper_;
+
+ void SetUp() override {
+ s1_ = "1";
+ s1234_ = "1234";
+ s567_ = "567";
+ kilo_ = "kilo";
+ kilogram_ = "kilogram";
+ mother_ = "mother";
+ grandmother_ = "grandmother";
+ lower_ = "lower case";
+ upper_ = "UPPER case";
+ }
+};
+
+TEST_F(LevenshteinDistanceTest, BothEmpty) {
+ ASSERT_EQ(LevenshteinDistance(empty_, empty_, std::equal_to<char>()), 0);
+}
+
+TEST_F(LevenshteinDistanceTest, OneEmpty) {
+ ASSERT_EQ(LevenshteinDistance(s1234_, empty_, std::equal_to<char>()), 4);
+ ASSERT_EQ(LevenshteinDistance(empty_, s567_, std::equal_to<char>()), 3);
+}
+
+TEST_F(LevenshteinDistanceTest, SingleElement) {
+ ASSERT_EQ(LevenshteinDistance(s1234_, s1_, std::equal_to<char>()), 3);
+ ASSERT_EQ(LevenshteinDistance(s1_, s1234_, std::equal_to<char>()), 3);
+}
+
+TEST_F(LevenshteinDistanceTest, Prefix) {
+ ASSERT_EQ(LevenshteinDistance(kilo_, kilogram_, std::equal_to<char>()), 4);
+ ASSERT_EQ(LevenshteinDistance(kilogram_, kilo_, std::equal_to<char>()), 4);
+}
+
+TEST_F(LevenshteinDistanceTest, Suffix) {
+ ASSERT_EQ(LevenshteinDistance(mother_, grandmother_, std::equal_to<char>()),
+ 5);
+ ASSERT_EQ(LevenshteinDistance(grandmother_, mother_, std::equal_to<char>()),
+ 5);
+}
+
+TEST_F(LevenshteinDistanceTest, DifferentComparisons) {
+ ASSERT_EQ(LevenshteinDistance(lower_, upper_, std::equal_to<char>()), 5);
+ ASSERT_EQ(LevenshteinDistance(upper_, lower_, std::equal_to<char>()), 5);
+ ASSERT_EQ(
+ LevenshteinDistance(gtl::ArraySlice<char>(lower_.data(), lower_.size()),
+ gtl::ArraySlice<char>(upper_.data(), upper_.size()),
+ std::equal_to<char>()),
+ 5);
+ auto no_case_cmp = [](char c1, char c2) {
+ return std::tolower(c1) == std::tolower(c2);
+ };
+ ASSERT_EQ(LevenshteinDistance(lower_, upper_, no_case_cmp), 3);
+ ASSERT_EQ(LevenshteinDistance(upper_, lower_, no_case_cmp), 3);
+}
+
+TEST_F(LevenshteinDistanceTest, Vectors) {
+ ASSERT_EQ(
+ LevenshteinDistance(std::string("algorithm"), std::string("altruistic"),
+ std::equal_to<char>()),
+ 6);
+}
+
+static void BM_EditDistanceHelper(int n, int len, bool completely_different) {
+ string a =
+ "The quick brown fox jumped over the lazy dog and on and on and on"
+ " Every good boy deserves fudge. In fact, this is a very long sentence "
+ " w/many bytes..";
+ while (a.size() < static_cast<size_t>(len)) {
+ a = a + a;
+ }
+ string b = a;
+ if (completely_different) {
+ for (size_t i = 0; i < b.size(); i++) {
+ b[i]++;
+ }
+ }
+ while (n-- > 0) {
+ LevenshteinDistance(gtl::ArraySlice<char>(a.data(), len),
+ gtl::ArraySlice<char>(b.data(), len),
+ std::equal_to<char>());
+ }
+}
+
+static void BM_EditDistanceSame(int n, int len) {
+ BM_EditDistanceHelper(n, len, false);
+}
+static void BM_EditDistanceDiff(int n, int len) {
+ BM_EditDistanceHelper(n, len, true);
+}
+
+BENCHMARK(BM_EditDistanceSame)->Arg(5);
+BENCHMARK(BM_EditDistanceSame)->Arg(50);
+BENCHMARK(BM_EditDistanceSame)->Arg(200);
+BENCHMARK(BM_EditDistanceSame)->Arg(1000);
+BENCHMARK(BM_EditDistanceDiff)->Arg(5);
+BENCHMARK(BM_EditDistanceDiff)->Arg(50);
+BENCHMARK(BM_EditDistanceDiff)->Arg(200);
+BENCHMARK(BM_EditDistanceDiff)->Arg(1000);
+
+} // namespace
+} // namespace gtl
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/inlined_vector.h b/tensorflow/core/lib/gtl/inlined_vector.h
new file mode 100644
index 0000000000..c23075129c
--- /dev/null
+++ b/tensorflow/core/lib/gtl/inlined_vector.h
@@ -0,0 +1,839 @@
+// An InlinedVector<T,N,A> is like a std::vector<T,A>, except that storage
+// for sequences of length <= N is provided inline without requiring
+// any heap allocation. Typically N is very small (e.g., 4) so that
+// sequences that are expected to be short do not require allocations.
+//
+// Only some of the std::vector<> operations are currently implemented.
+// Other operations may be added as needed to facilitate migrating
+// code that uses std::vector<> to InlinedVector<>.
+//
+// NOTE: If you want an inlined version to replace use of a
+// std::vector<bool>, consider using util::bitmap::InlinedBitVector<NBITS>
+// in util/bitmap/inlined_bitvector.h
+//
+// TODO(billydonahue): change size_t to size_type where appropriate.
+
+#ifndef TENSORFLOW_LIB_GTL_INLINED_VECTOR_H_
+#define TENSORFLOW_LIB_GTL_INLINED_VECTOR_H_
+
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/types.h>
+#include <algorithm>
+#include <iterator>
+#include <memory>
+#include <type_traits>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/gtl/manual_constructor.h"
+
+#include <initializer_list> // NOLINT(build/include_order)
+
+namespace tensorflow {
+namespace gtl {
+
+template <typename T, int N, typename A = std::allocator<T> >
+class InlinedVector {
+ public:
+ typedef A allocator_type;
+ typedef typename allocator_type::value_type value_type;
+ typedef typename allocator_type::pointer pointer;
+ typedef typename allocator_type::const_pointer const_pointer;
+ typedef typename allocator_type::reference reference;
+ typedef typename allocator_type::const_reference const_reference;
+ typedef typename allocator_type::size_type size_type;
+ typedef typename allocator_type::difference_type difference_type;
+ typedef pointer iterator;
+ typedef const_pointer const_iterator;
+
+ // Create an empty vector
+ InlinedVector();
+ explicit InlinedVector(const allocator_type& alloc);
+
+ // Create a vector with n copies of value_type().
+ explicit InlinedVector(size_t n);
+
+ // Create a vector with n copies of elem
+ InlinedVector(size_t n, const value_type& elem,
+ const allocator_type& alloc = allocator_type());
+
+ // Create and initialize with the elements [range_start .. range_end).
+ // The unused enable_if argument restricts this constructor so that it is
+ // elided when value_type is an integral type. This prevents ambiguous
+ // interpretation between a call to this constructor with two integral
+ // arguments and a call to the preceding (n, elem) constructor.
+ template <typename InputIterator>
+ InlinedVector(
+ InputIterator range_start, InputIterator range_end,
+ const allocator_type& alloc = allocator_type(),
+ typename std::enable_if<!std::is_integral<InputIterator>::value>::type* =
+ NULL)
+ : allocator_and_tag_(alloc) {
+ AppendRange(range_start, range_end);
+ }
+
+ InlinedVector(std::initializer_list<value_type> init,
+ const allocator_type& alloc = allocator_type())
+ : allocator_and_tag_(alloc) {
+ AppendRange(init.begin(), init.end());
+ }
+
+ InlinedVector(const InlinedVector& v);
+
+ ~InlinedVector() { clear(); }
+
+ InlinedVector& operator=(const InlinedVector& v) {
+ // Optimized to avoid reallocation.
+ // Prefer reassignment to copy construction for elements.
+ if (size() < v.size()) { // grow
+ reserve(v.size());
+ std::copy(v.begin(), v.begin() + size(), begin());
+ std::copy(v.begin() + size(), v.end(), std::back_inserter(*this));
+ } else { // maybe shrink
+ erase(begin() + v.size(), end());
+ std::copy(v.begin(), v.end(), begin());
+ }
+ return *this;
+ }
+
+ size_t size() const {
+ return allocated() ? allocation().size() : tag().size();
+ }
+
+ bool empty() const { return (size() == 0); }
+
+ // Return number of elements that can be stored in vector
+ // without requiring a reallocation of underlying memory
+ size_t capacity() const { return allocated() ? allocation().capacity() : N; }
+
+ // Return a pointer to the underlying array.
+ // Only result[0,size()-1] are defined.
+ const_pointer data() const {
+ return allocated() ? allocated_space() : inlined_space();
+ }
+ pointer data() { return allocated() ? allocated_space() : inlined_space(); }
+
+ // An older name for the more standard-friendly .data().
+ const_pointer array() const { return data(); }
+ pointer mutable_array() { return data(); }
+
+ // Remove all elements
+ void clear() {
+ size_t s = size();
+ if (allocated()) {
+ DestroyAllocated(allocated_space(), allocated_space() + s);
+ allocation().Dealloc(allocator());
+ } else {
+ DestroyInlined(inlined_space(), inlined_space() + s);
+ }
+ tag() = Tag();
+ }
+
+ // Return the ith element
+ // REQUIRES: 0 <= i < size()
+ const value_type& at(size_t i) const {
+ DCHECK_LT(i, size());
+ return array()[i];
+ }
+ const value_type& operator[](size_t i) const {
+ DCHECK_LT(i, size());
+ return array()[i];
+ }
+
+ // Return a non-const reference to the ith element
+ // REQUIRES: 0 <= i < size()
+ value_type& at(size_t i) {
+ DCHECK_LT(i, size());
+ return mutable_array()[i];
+ }
+ value_type& operator[](size_t i) {
+ DCHECK_LT(i, size());
+ return mutable_array()[i];
+ }
+
+ value_type& back() {
+ DCHECK(!empty());
+ return at(size() - 1);
+ }
+
+ const value_type& back() const {
+ DCHECK(!empty());
+ return at(size() - 1);
+ }
+
+ value_type& front() {
+ DCHECK(!empty());
+ return at(0);
+ }
+
+ const value_type& front() const {
+ DCHECK(!empty());
+ return at(0);
+ }
+
+ // Append t to the vector.
+ // Increases size() by one.
+ // Amortized complexity: O(1)
+ // Worst-case complexity: O(size())
+ void push_back(const value_type& t) {
+ size_t s = size();
+ DCHECK_LE(s, capacity());
+ if (s == capacity()) {
+ return GrowAndPushBack(t);
+ }
+ DCHECK_LT(s, capacity());
+
+ if (allocated()) {
+ ConstructAllocated(allocated_space() + s, t);
+ } else {
+ ConstructInlined(inlined_space() + s, t);
+ }
+
+ set_size_internal(s + 1);
+ }
+
+ void pop_back() {
+ DCHECK(!empty());
+ size_t s = size();
+ if (allocated()) {
+ DestroyAllocated(allocated_space() + s - 1, allocated_space() + s);
+ } else {
+ DestroyInlined(inlined_space() + s - 1, inlined_space() + s);
+ }
+ set_size_internal(s - 1);
+ }
+
+ // Resizes the vector to contain "n" elements.
+ // If "n" is smaller than the initial size, extra elements are destroyed.
+ // If "n" is larger than the initial size, enough copies of "elem"
+ // are appended to increase the size to "n". If "elem" is omitted,
+ // new elements are value-initialized.
+ void resize(size_t n);
+ void resize(size_t n, const value_type& elem);
+
+ iterator begin() { return mutable_array(); }
+ const_iterator begin() const { return array(); }
+
+ iterator end() { return mutable_array() + size(); }
+ const_iterator end() const { return array() + size(); }
+
+ iterator insert(iterator pos, const value_type& v);
+
+ iterator erase(iterator pos) {
+ DCHECK_LT(pos, end());
+ DCHECK_GE(pos, begin());
+ std::copy(pos + 1, end(), pos);
+ pop_back();
+ return pos;
+ }
+
+ iterator erase(iterator first, iterator last);
+
+ // Enlarges the underlying representation so it can hold at least
+ // "n" elements without reallocation.
+ // Does not change size() or the actual contents of the vector.
+ void reserve(size_t n) {
+ if (n > capacity()) {
+ // Make room for new elements
+ EnlargeBy(n - size());
+ }
+ }
+
+ // Swap the contents of *this with other.
+ // REQUIRES: value_type is swappable and copyable.
+ void swap(InlinedVector& other);
+
+ allocator_type get_allocator() const { return allocator(); }
+
+ private:
+ struct AllocatorTraits {
+ typedef typename allocator_type::value_type value_type;
+ typedef typename allocator_type::pointer pointer;
+ typedef typename allocator_type::size_type size_type;
+
+ static void construct(allocator_type& a, // NOLINT(runtime/references)
+ pointer p) {
+ // Tricky: do we support non-copyable types, or support allocators
+ // that do special things with construct()? Non-copyable types are
+ // needed today, so they are more important. When we sort out the
+ // Android NDK C++11 problem, we will be able to use the proper
+ // std::allocator_traits<A>::construct(p, ...).
+ //
+ // a.construct(p, value_type());
+ new (p) value_type();
+ }
+ static void construct(allocator_type& a, // NOLINT(runtime/references)
+ pointer p, const value_type& t) {
+ a.construct(p, t);
+ }
+ static void destroy(allocator_type& a, // NOLINT(runtime/references)
+ pointer p) {
+ a.destroy(p);
+ }
+ static pointer allocate(allocator_type& a, // NOLINT(runtime/references)
+ size_type n) {
+ return a.allocate(n);
+ }
+ static void deallocate(allocator_type& a, // NOLINT(runtime/references)
+ pointer p, size_type n) {
+ a.deallocate(p, n);
+ }
+ };
+
+ // If the vector is inlined, holds the size of the vector.
+ // If the vector is allocated, holds the special value kAllocated,
+ // and the size is stored in the vector's Allocation.
+ class Tag {
+ public:
+ Tag() : size_(0) {}
+ size_t size() const { return size_; }
+ void set_size(size_t n) { size_ = n; }
+ bool allocated() const { return size_ == kAllocated; }
+ void set_allocated() { size_ = kAllocated; }
+
+ private:
+ static const size_t kAllocated = -1;
+ size_t size_;
+ };
+
+ // Derives from allocator_type to use the empty base class optimization.
+ // If the allocator_type is stateless, we can 'store'
+ // our instance of it for free.
+ class AllocatorAndTag : private allocator_type {
+ public:
+ explicit AllocatorAndTag(const allocator_type& a, Tag t = Tag())
+ : allocator_type(a), tag_(t) {}
+ Tag& tag() { return tag_; }
+ const Tag& tag() const { return tag_; }
+ allocator_type& allocator() { return *this; }
+ const allocator_type& allocator() const { return *this; }
+
+ private:
+ Tag tag_;
+ };
+
+ class Allocation {
+ public:
+ Allocation(allocator_type& a, // NOLINT(runtime/references)
+ size_t capacity)
+ : size_(0),
+ capacity_(capacity),
+ buffer_(AllocatorTraits::allocate(a, capacity_)) {}
+
+ void Dealloc(allocator_type& a) { // NOLINT(runtime/references)
+ AllocatorTraits::deallocate(a, buffer(), capacity());
+ }
+
+ size_t size() const { return size_; }
+ void set_size(size_t s) { size_ = s; }
+ size_t capacity() const { return capacity_; }
+ const value_type* buffer() const { return buffer_; }
+ value_type* buffer() { return buffer_; }
+
+ private:
+ size_t size_;
+ size_t capacity_;
+ value_type* buffer_;
+ };
+
+ const Tag& tag() const { return allocator_and_tag_.tag(); }
+ Tag& tag() { return allocator_and_tag_.tag(); }
+
+ Allocation& allocation() { return *rep_.allocation_storage.allocation.get(); }
+ const Allocation& allocation() const {
+ return *rep_.allocation_storage.allocation.get();
+ }
+ void init_allocation(const Allocation& allocation) {
+ rep_.allocation_storage.allocation.Init(allocation);
+ }
+
+ value_type* inlined_space() { return rep_.inlined_storage.inlined[0].get(); }
+ const value_type* inlined_space() const {
+ return rep_.inlined_storage.inlined[0].get();
+ }
+
+ value_type* allocated_space() { return allocation().buffer(); }
+ const value_type* allocated_space() const { return allocation().buffer(); }
+
+ const allocator_type& allocator() const {
+ return allocator_and_tag_.allocator();
+ }
+ allocator_type& allocator() { return allocator_and_tag_.allocator(); }
+
+ bool allocated() const { return tag().allocated(); }
+ void set_allocated() { return tag().set_allocated(); }
+
+ void set_size_internal(size_t n) {
+ if (allocated()) {
+ allocation().set_size(n);
+ } else {
+ tag().set_size(n);
+ }
+ }
+
+ // Enlarge the underlying representation so we can store size() + delta elems.
+ // The size is not changed, and any newly added memory is not initialized.
+ void EnlargeBy(size_t delta);
+
+ void ResetAllocation(Allocation new_allocation) {
+ if (allocated()) {
+ DestroyAllocated(allocated_space(), allocated_space() + size());
+ DCHECK_EQ(begin(), allocated_space());
+ allocation().Dealloc(allocator());
+ allocation() = new_allocation;
+ } else {
+ DestroyInlined(inlined_space(), inlined_space() + size());
+ init_allocation(new_allocation); // the union member may only be Init'd once
+ set_allocated();
+ }
+ }
+
+ void GrowAndPushBack(const value_type& t) {
+ DCHECK_EQ(size(), capacity());
+ const size_t s = size();
+
+ Allocation new_allocation(allocator(), 2 * capacity());
+ new_allocation.set_size(s + 1);
+
+ UninitializedCopyAllocated(array(), array() + s, new_allocation.buffer());
+ ConstructAllocated(new_allocation.buffer() + s, t);
+
+ ResetAllocation(new_allocation);
+ }
+
+ void InitAssign(size_t n);
+ void InitAssign(size_t n, const value_type& t);
+
+ void ConstructInlined(pointer p) { new (p) value_type(); }
+
+ void ConstructInlined(pointer p, const value_type& t) {
+ new (p) value_type(t);
+ }
+
+ void ConstructAllocated(pointer p) {
+ AllocatorTraits::construct(allocator(), p);
+ }
+ void ConstructAllocated(pointer p, const value_type& t) {
+ AllocatorTraits::construct(allocator(), p, t);
+ }
+
+ template <typename Iter>
+ void UninitializedCopyInlined(Iter src, Iter src_last, value_type* dst) {
+ std::uninitialized_copy(src, src_last, dst);
+ }
+
+ template <typename Iter>
+ void UninitializedCopyAllocated(Iter src, Iter src_last, value_type* dst) {
+ for (; src != src_last; ++dst, ++src) ConstructAllocated(dst, *src);
+ }
+
+ void UninitializedFillInlined(value_type* dst, value_type* dst_last) {
+ for (; dst != dst_last; ++dst) ConstructInlined(dst);
+ }
+ void UninitializedFillInlined(value_type* dst, value_type* dst_last,
+ const value_type& t) {
+ std::uninitialized_fill(dst, dst_last, t);
+ }
+
+ void UninitializedFillAllocated(value_type* dst, value_type* dst_last) {
+ for (; dst != dst_last; ++dst) ConstructAllocated(dst);
+ }
+ void UninitializedFillAllocated(value_type* dst, value_type* dst_last,
+ const value_type& t) {
+ for (; dst != dst_last; ++dst) ConstructAllocated(dst, t);
+ }
+
+ // Destroy [ptr, ptr_last) in place.
+ void DestroyInlined(value_type* ptr, value_type* ptr_last);
+ void DestroyAllocated(value_type* ptr, value_type* ptr_last);
+
+ template <typename Iter>
+ void AppendRange(Iter first, Iter last, std::input_iterator_tag);
+
+ // Faster path for forward iterators.
+ template <typename Iter>
+ void AppendRange(Iter first, Iter last, std::forward_iterator_tag);
+
+ template <typename Iter>
+ void AppendRange(Iter first, Iter last);
+
+ AllocatorAndTag allocator_and_tag_;
+
+ // Either the inlined or allocated representation
+ union Rep {
+ // Use struct to perform indirection that solves a bizarre compilation
+ // error on Visual Studio (all known versions).
+ struct {
+ tensorflow::ManualConstructor<value_type> inlined[N];
+ } inlined_storage;
+ struct {
+ tensorflow::ManualConstructor<Allocation> allocation;
+ } allocation_storage;
+ } rep_;
+};
+
+template <typename T, int N, typename A>
+const size_t InlinedVector<T, N, A>::Tag::kAllocated;
+
+template <typename T, int N, typename A>
+inline void swap(InlinedVector<T, N, A>& a, InlinedVector<T, N, A>& b) {
+ a.swap(b);
+}
+
+template <typename T, int N, typename A>
+inline bool operator==(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
+}
+
+template <typename T, int N, typename A>
+inline bool operator!=(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return !(a == b);
+}
+
+template <typename T, int N, typename A>
+inline bool operator<(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
+}
+
+template <typename T, int N, typename A>
+inline bool operator>(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return b < a;
+}
+
+template <typename T, int N, typename A>
+inline bool operator<=(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return !(b < a);
+}
+
+template <typename T, int N, typename A>
+inline bool operator>=(const InlinedVector<T, N, A>& a,
+ const InlinedVector<T, N, A>& b) {
+ return !(a < b);
+}
+
+// ========================================
+// Implementation
+
+template <typename T, int N, typename A>
+inline InlinedVector<T, N, A>::InlinedVector()
+ : allocator_and_tag_(allocator_type()) {}
+
+template <typename T, int N, typename A>
+inline InlinedVector<T, N, A>::InlinedVector(const allocator_type& alloc)
+ : allocator_and_tag_(alloc) {}
+
+template <typename T, int N, typename A>
+inline InlinedVector<T, N, A>::InlinedVector(size_t n)
+ : allocator_and_tag_(allocator_type()) {
+ InitAssign(n);
+}
+
+template <typename T, int N, typename A>
+inline InlinedVector<T, N, A>::InlinedVector(size_t n, const value_type& elem,
+ const allocator_type& alloc)
+ : allocator_and_tag_(alloc) {
+ InitAssign(n, elem);
+}
+
+template <typename T, int N, typename A>
+inline InlinedVector<T, N, A>::InlinedVector(const InlinedVector& v)
+ : allocator_and_tag_(v.allocator()) {
+ reserve(v.size());
+ if (allocated()) {
+ UninitializedCopyAllocated(v.begin(), v.end(), allocated_space());
+ } else {
+ UninitializedCopyInlined(v.begin(), v.end(), inlined_space());
+ }
+ set_size_internal(v.size());
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::InitAssign(size_t n, const value_type& t) {
+ if (n > static_cast<size_t>(N)) {
+ Allocation new_allocation(allocator(), n);
+ init_allocation(new_allocation);
+ set_allocated();
+ UninitializedFillAllocated(allocated_space(), allocated_space() + n, t);
+ } else {
+ UninitializedFillInlined(inlined_space(), inlined_space() + n, t);
+ }
+ set_size_internal(n);
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::InitAssign(size_t n) {
+ if (n > static_cast<size_t>(N)) {
+ Allocation new_allocation(allocator(), n);
+ init_allocation(new_allocation);
+ set_allocated();
+ UninitializedFillAllocated(allocated_space(), allocated_space() + n);
+ } else {
+ UninitializedFillInlined(inlined_space(), inlined_space() + n);
+ }
+ set_size_internal(n);
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::resize(size_t n) {
+ size_t s = size();
+ if (n < s) {
+ erase(begin() + n, end());
+ return;
+ }
+ reserve(n);
+ DCHECK_GE(capacity(), n);
+
+ // Fill new space with elements constructed in-place.
+ if (allocated()) {
+ UninitializedFillAllocated(allocated_space() + s, allocated_space() + n);
+ } else {
+ UninitializedFillInlined(inlined_space() + s, inlined_space() + n);
+ }
+ set_size_internal(n);
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::resize(size_t n, const value_type& elem) {
+ size_t s = size();
+ if (n < s) {
+ erase(begin() + n, end());
+ return;
+ }
+ reserve(n);
+ DCHECK_GE(capacity(), n);
+
+ // Fill new space with copies of 'elem'.
+ if (allocated()) {
+ UninitializedFillAllocated(allocated_space() + s, allocated_space() + n,
+ elem);
+ } else {
+ UninitializedFillInlined(inlined_space() + s, inlined_space() + n, elem);
+ }
+ set_size_internal(n);
+}
+
+template <typename T, int N, typename A>
+typename InlinedVector<T, N, A>::iterator InlinedVector<T, N, A>::insert(
+ iterator pos, const value_type& v) {
+ DCHECK_GE(pos, begin());
+ DCHECK_LE(pos, end());
+ if (pos == end()) {
+ push_back(v);
+ return end() - 1;
+ }
+ size_t s = size();
+ size_t idx = std::distance(begin(), pos);
+ if (s == capacity()) {
+ EnlargeBy(1);
+ }
+ CHECK_LT(s, capacity());
+ pos = begin() + idx; // Reset 'pos' into a post-enlarge iterator.
+
+ if (allocated()) {
+ ConstructAllocated(allocated_space() + s, *(allocated_space() + s - 1));
+ std::copy_backward(pos, allocated_space() + s - 1, allocated_space() + s);
+ } else {
+ ConstructInlined(inlined_space() + s, *(inlined_space() + s - 1));
+ std::copy_backward(pos, inlined_space() + s - 1, inlined_space() + s);
+ }
+
+ *pos = v;
+
+ set_size_internal(s + 1);
+ return pos;
+}
+
+template <typename T, int N, typename A>
+typename InlinedVector<T, N, A>::iterator InlinedVector<T, N, A>::erase(
+ iterator first, iterator last) {
+ DCHECK_LE(begin(), first);
+ DCHECK_LE(first, last);
+ DCHECK_LE(last, end());
+
+ size_t s = size();
+ ptrdiff_t erase_gap = std::distance(first, last);
+
+ if (allocated()) {
+ std::copy(last, allocated_space() + s, first);
+ DestroyAllocated(allocated_space() + s - erase_gap, allocated_space() + s);
+ } else {
+ std::copy(last, inlined_space() + s, first);
+ DestroyInlined(inlined_space() + s - erase_gap, inlined_space() + s);
+ }
+
+ set_size_internal(size() - erase_gap);
+
+ return first;
+}
+
+template <typename T, int N, typename A>
+void InlinedVector<T, N, A>::swap(InlinedVector& other) {
+ using std::swap; // Augment ADL with std::swap.
+ if (&other == this) {
+ return;
+ }
+ if (allocated() && other.allocated()) {
+ // Both out of line, so just swap the tag, allocation, and allocator.
+ swap(tag(), other.tag());
+ swap(allocation(), other.allocation());
+ swap(allocator(), other.allocator());
+ return;
+ }
+ if (!allocated() && !other.allocated()) {
+ // Both inlined: swap up to smaller size, then move remaining elements.
+ InlinedVector* a = this;
+ InlinedVector* b = &other;
+ if (size() < other.size()) {
+ swap(a, b);
+ }
+
+ const size_t a_size = a->size();
+ const size_t b_size = b->size();
+ DCHECK_GE(a_size, b_size);
+ // 'a' is larger. Swap the elements up to the smaller array size.
+ std::swap_ranges(a->inlined_space(), a->inlined_space() + b_size,
+ b->inlined_space());
+
+ // Move the remaining elements: A[b_size,a_size) -> B[b_size,a_size)
+ b->UninitializedCopyInlined(a->inlined_space() + b_size,
+ a->inlined_space() + a_size,
+ b->inlined_space() + b_size);
+ a->DestroyInlined(a->inlined_space() + b_size, a->inlined_space() + a_size);
+
+ swap(a->tag(), b->tag());
+ swap(a->allocator(), b->allocator());
+ DCHECK_EQ(b->size(), a_size);
+ DCHECK_EQ(a->size(), b_size);
+ return;
+ }
+ // One is out of line, one is inline.
+ // We first move the elements from the inlined vector into the
+ // inlined space in the other vector. We then put the other vector's
+ // pointer/capacity into the originally inlined vector and swap
+ // the tags.
+ InlinedVector* a = this;
+ InlinedVector* b = &other;
+ if (a->allocated()) {
+ swap(a, b);
+ }
+ DCHECK(!a->allocated());
+ DCHECK(b->allocated());
+ const size_t a_size = a->size();
+ const size_t b_size = b->size();
+
+ // We made local copies of size(); tag() no longer needs to be accurate.
+ swap(a->tag(), b->tag());
+
+ // Copy b's allocation out before b's union gets clobbered by inlined_space.
+ Allocation b_allocation = b->allocation();
+
+ b->UninitializedCopyInlined(a->inlined_space(), a->inlined_space() + a_size,
+ b->inlined_space());
+ a->DestroyInlined(a->inlined_space(), a->inlined_space() + a_size);
+
+ a->allocation() = b_allocation;
+
+ if (a->allocator() != b->allocator()) {
+ swap(a->allocator(), b->allocator());
+ }
+
+ DCHECK_EQ(b->size(), a_size);
+ DCHECK_EQ(a->size(), b_size);
+}
+
+template <typename T, int N, typename A>
+void InlinedVector<T, N, A>::EnlargeBy(size_t delta) {
+ const size_t s = size();
+ DCHECK_LE(s, capacity());
+
+ size_t target = std::max(static_cast<size_t>(N), s + delta);
+
+ // Compute new capacity by repeatedly doubling current capacity
+ // TODO(psrc): Check and avoid overflow?
+ size_t new_capacity = capacity();
+ while (new_capacity < target) {
+ new_capacity <<= 1;
+ }
+
+ Allocation new_allocation(allocator(), new_capacity);
+ new_allocation.set_size(s);
+
+ UninitializedCopyAllocated(array(), array() + s, new_allocation.buffer());
+
+ ResetAllocation(new_allocation);
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::DestroyInlined(value_type* ptr,
+ value_type* ptr_last) {
+ for (value_type* p = ptr; p != ptr_last; ++p) {
+ p->~value_type();
+ }
+
+// Overwrite unused memory with 0xab so we can catch uninitialized usage.
+// Cast to void* to tell the compiler that we don't care that we might be
+// scribbling on a vtable pointer.
+#ifndef NDEBUG
+ if (ptr != ptr_last) {
+ memset(reinterpret_cast<void*>(ptr), 0xab, sizeof(*ptr) * (ptr_last - ptr));
+ }
+#endif
+}
+
+template <typename T, int N, typename A>
+inline void InlinedVector<T, N, A>::DestroyAllocated(value_type* ptr,
+ value_type* ptr_last) {
+ for (value_type* p = ptr; p != ptr_last; ++p) {
+ AllocatorTraits::destroy(allocator(), p);
+ }
+
+// Overwrite unused memory with 0xab so we can catch uninitialized usage.
+// Cast to void* to tell the compiler that we don't care that we might be
+// scribbling on a vtable pointer.
+#ifndef NDEBUG
+ if (ptr != ptr_last) {
+ memset(reinterpret_cast<void*>(ptr), 0xab, sizeof(*ptr) * (ptr_last - ptr));
+ }
+#endif
+}
+
+template <typename T, int N, typename A>
+template <typename Iter>
+inline void InlinedVector<T, N, A>::AppendRange(Iter first, Iter last,
+ std::input_iterator_tag) {
+ std::copy(first, last, std::back_inserter(*this));
+}
+
+template <typename T, int N, typename A>
+template <typename Iter>
+inline void InlinedVector<T, N, A>::AppendRange(Iter first, Iter last,
+ std::forward_iterator_tag) {
+ typedef typename std::iterator_traits<Iter>::difference_type Length;
+ Length length = std::distance(first, last);
+ reserve(size() + length);
+ if (allocated()) {
+ UninitializedCopyAllocated(first, last, allocated_space() + size());
+ } else {
+ UninitializedCopyInlined(first, last, inlined_space() + size());
+ }
+ set_size_internal(size() + length);
+}
+
+template <typename T, int N, typename A>
+template <typename Iter>
+inline void InlinedVector<T, N, A>::AppendRange(Iter first, Iter last) {
+ typedef typename std::iterator_traits<Iter>::iterator_category IterTag;
+ AppendRange(first, last, IterTag());
+}
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_INLINED_VECTOR_H_
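For context, a minimal usage sketch of the container defined above (names are illustrative). With N = 4 the first four elements live in the inline buffer; the fifth push_back doubles the capacity and moves the contents to heap storage:

#include "tensorflow/core/lib/gtl/inlined_vector.h"

void ExampleInlinedVector() {
  tensorflow::gtl::InlinedVector<int, 4> v;    // inline capacity of 4
  for (int i = 0; i < 4; ++i) v.push_back(i);  // no heap allocation yet
  // v.capacity() == 4 while the elements still fit inline.
  v.push_back(4);      // grows: capacity doubles to 8, elements move to heap
  v.erase(v.begin());  // v is now {1, 2, 3, 4}
}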
diff --git a/tensorflow/core/lib/gtl/inlined_vector_test.cc b/tensorflow/core/lib/gtl/inlined_vector_test.cc
new file mode 100644
index 0000000000..ec5fe1eaa8
--- /dev/null
+++ b/tensorflow/core/lib/gtl/inlined_vector_test.cc
@@ -0,0 +1,905 @@
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+
+#include <list>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+typedef tensorflow::gtl::InlinedVector<int, 8> IntVec;
+
+// A type that counts number of live occurrences of the type
+static int64 instances = 0;
+class Instance {
+ public:
+ int value_;
+ explicit Instance(int x) : value_(x) { instances++; }
+ Instance(const Instance& x) : value_(x.value_) { instances++; }
+ ~Instance() { instances--; }
+
+ friend inline void swap(Instance& a, Instance& b) {
+ using std::swap;
+ swap(a.value_, b.value_);
+ }
+
+ friend std::ostream& operator<<(std::ostream& o, const Instance& v) {
+ return o << "[value:" << v.value_ << "]";
+ }
+};
+
+typedef tensorflow::gtl::InlinedVector<Instance, 8> InstanceVec;
+
+// A simple reference counted class to make sure that the proper elements are
+// destroyed in the erase(begin, end) test.
+class RefCounted {
+ public:
+ RefCounted(int value, int* count) : value_(value), count_(count) { Ref(); }
+
+ RefCounted(const RefCounted& v) : value_(v.value_), count_(v.count_) {
+ VLOG(5) << "[RefCounted: copy"
+ << " from count @" << v.count_ << "]";
+ Ref();
+ }
+
+ ~RefCounted() {
+ Unref();
+ count_ = NULL;
+ }
+
+ friend void swap(RefCounted& a, RefCounted& b) {
+ using std::swap;
+ swap(a.value_, b.value_);
+ swap(a.count_, b.count_);
+ }
+
+ RefCounted& operator=(RefCounted v) {
+ using std::swap;
+ swap(*this, v);
+ return *this;
+ }
+
+ void Ref() const {
+ CHECK(count_ != NULL);
+ ++(*count_);
+ VLOG(5) << "[Ref: refcount " << *count_ << " on count @" << count_ << "]";
+ }
+
+ void Unref() const {
+ --(*count_);
+ CHECK_GE(*count_, 0);
+ VLOG(5) << "[Unref: refcount " << *count_ << " on count @" << count_ << "]";
+ }
+
+ int count() const { return *count_; }
+
+ friend std::ostream& operator<<(std::ostream& o, const RefCounted& v) {
+ return o << "[value:" << v.value_ << ", count:" << *v.count_ << "]";
+ }
+
+ int value_;
+ int* count_;
+};
+
+typedef tensorflow::gtl::InlinedVector<RefCounted, 8> RefCountedVec;
+
+// A class with a vtable pointer
+class Dynamic {
+ public:
+ virtual ~Dynamic() {}
+
+ friend std::ostream& operator<<(std::ostream& o, const Dynamic& v) {
+ return o << "[Dynamic]";
+ }
+};
+
+typedef tensorflow::gtl::InlinedVector<Dynamic, 8> DynamicVec;
+
+// Append 0..len-1 to *v
+static void Fill(IntVec* v, int len, int offset = 0) {
+ for (int i = 0; i < len; i++) {
+ v->push_back(i + offset);
+ }
+}
+
+static IntVec Fill(int len, int offset = 0) {
+ IntVec v;
+ Fill(&v, len, offset);
+ return v;
+}
+
+TEST(IntVec, SimpleOps) {
+ for (int len = 0; len < 20; len++) {
+ IntVec v;
+ const IntVec& cv = v; // const alias
+
+ Fill(&v, len);
+ EXPECT_EQ(len, v.size());
+ EXPECT_LE(len, v.capacity());
+
+ for (int i = 0; i < len; i++) {
+ EXPECT_EQ(i, v[i]);
+ }
+ EXPECT_EQ(v.begin(), v.array());
+ EXPECT_EQ(v.begin(), v.mutable_array());
+
+ EXPECT_EQ(v.begin(), v.data());
+ EXPECT_EQ(cv.begin(), cv.data());
+
+ int counter = 0;
+ for (IntVec::iterator iter = v.begin(); iter != v.end(); ++iter) {
+ EXPECT_EQ(counter, *iter);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ counter = 0;
+ for (IntVec::const_iterator iter = v.begin(); iter != v.end(); ++iter) {
+ EXPECT_EQ(counter, *iter);
+ counter++;
+ }
+ EXPECT_EQ(counter, len);
+
+ if (len > 0) {
+ EXPECT_EQ(0, v.front());
+ EXPECT_EQ(len - 1, v.back());
+ v.pop_back();
+ EXPECT_EQ(len - 1, v.size());
+ for (size_t i = 0; i < v.size(); ++i) {
+ EXPECT_EQ(i, v[i]);
+ }
+ }
+ }
+}
+
+TEST(IntVec, Erase) {
+ for (int len = 1; len < 20; len++) {
+ for (int i = 0; i < len; ++i) {
+ IntVec v;
+ Fill(&v, len);
+ v.erase(v.begin() + i);
+ EXPECT_EQ(len - 1, v.size());
+ for (int j = 0; j < i; ++j) {
+ EXPECT_EQ(j, v[j]);
+ }
+ for (int j = i; j < len - 1; ++j) {
+ EXPECT_EQ(j + 1, v[j]);
+ }
+ }
+ }
+}
+
+// At the end of this test loop, the elements between [erase_begin, erase_end)
+// should have reference counts == 0, and all other elements should have
+// reference counts == 1.
+TEST(RefCountedVec, EraseBeginEnd) {
+ for (int len = 1; len < 20; ++len) {
+ for (int erase_begin = 0; erase_begin < len; ++erase_begin) {
+ for (int erase_end = erase_begin; erase_end <= len; ++erase_end) {
+ std::vector<int> counts(len, 0);
+ RefCountedVec v;
+ for (int i = 0; i < len; ++i) {
+ v.push_back(RefCounted(i, &counts[i]));
+ }
+
+ int erase_len = erase_end - erase_begin;
+
+ v.erase(v.begin() + erase_begin, v.begin() + erase_end);
+
+ EXPECT_EQ(len - erase_len, v.size());
+
+ // Check the elements before the first element erased.
+ for (int i = 0; i < erase_begin; ++i) {
+ EXPECT_EQ(i, v[i].value_);
+ }
+
+ // Check the elements that were after the erased range (now shifted down).
+ for (size_t i = erase_begin; i < v.size(); ++i) {
+ EXPECT_EQ(i + erase_len, v[i].value_);
+ }
+
+ // Check that the elements at the beginning are preserved.
+ for (int i = 0; i < erase_begin; ++i) {
+ EXPECT_EQ(1, counts[i]);
+ }
+
+ // Check that the erased elements are destroyed
+ for (int i = erase_begin; i < erase_end; ++i) {
+ EXPECT_EQ(0, counts[i]);
+ }
+
+ // Check that the elements at the end are preserved.
+ for (int i = erase_end; i < len; ++i) {
+ EXPECT_EQ(1, counts[i]);
+ }
+ }
+ }
+ }
+}
+
+struct NoDefaultCtor {
+ explicit NoDefaultCtor(int /* x */) {}
+};
+struct NoCopy {
+ NoCopy() {}
+ NoCopy(const NoCopy& /* x */) = delete;
+};
+struct NoAssign {
+ NoAssign() {}
+ NoAssign& operator=(const NoAssign& /* x */) = delete;
+};
+TEST(InlinedVectorTest, NoDefaultCtor) {
+ tensorflow::gtl::InlinedVector<NoDefaultCtor, 1> v(10, NoDefaultCtor(2));
+ (void)v;
+}
+TEST(InlinedVectorTest, NoCopy) {
+ tensorflow::gtl::InlinedVector<NoCopy, 1> v(10);
+ (void)v;
+}
+TEST(InlinedVectorTest, NoAssign) {
+ tensorflow::gtl::InlinedVector<NoAssign, 1> v(10);
+ (void)v;
+}
+
+TEST(IntVec, Insert) {
+ for (int len = 0; len < 20; len++) {
+ for (int pos = 0; pos <= len; pos++) {
+ IntVec v;
+ Fill(&v, len);
+ v.insert(v.begin() + pos, 9999);
+ EXPECT_EQ(v.size(), len + 1);
+ for (int i = 0; i < pos; i++) {
+ EXPECT_EQ(v[i], i);
+ }
+ EXPECT_EQ(v[pos], 9999);
+ for (size_t i = pos + 1; i < v.size(); i++) {
+ EXPECT_EQ(v[i], i - 1);
+ }
+ }
+ }
+}
+
+TEST(RefCountedVec, InsertConstructorDestructor) {
+ // Make sure the proper construction/destruction happen during insert
+ // operations.
+ for (int len = 0; len < 20; len++) {
+ SCOPED_TRACE(len);
+ for (int pos = 0; pos <= len; pos++) {
+ SCOPED_TRACE(pos);
+ std::vector<int> counts(len, 0);
+ RefCountedVec v;
+ for (int i = 0; i < len; ++i) {
+ SCOPED_TRACE(i);
+ v.push_back(RefCounted(i, &counts[i]));
+ }
+
+ for (auto elem : counts) {
+ EXPECT_EQ(1, elem);
+ }
+
+ int inserted_count = 0;
+ RefCounted insert_element(9999, &inserted_count);
+ EXPECT_EQ(1, inserted_count);
+ v.insert(v.begin() + pos, insert_element);
+ EXPECT_EQ(2, inserted_count);
+ // Check that the elements at the end are preserved.
+ for (auto elem : counts) {
+ EXPECT_EQ(1, elem);
+ }
+ EXPECT_EQ(2, inserted_count);
+ }
+ }
+}
+
+TEST(IntVec, Resize) {
+ for (int len = 0; len < 20; len++) {
+ IntVec v;
+ Fill(&v, len);
+
+ // Try resizing up and down by k elements
+ static const int kResizeElem = 1000000;
+ for (int k = 0; k < 10; k++) {
+ // Enlarging resize
+ v.resize(len + k, kResizeElem);
+ EXPECT_EQ(len + k, v.size());
+ EXPECT_LE(len + k, v.capacity());
+ for (int i = 0; i < len + k; i++) {
+ if (i < len) {
+ EXPECT_EQ(i, v[i]);
+ } else {
+ EXPECT_EQ(kResizeElem, v[i]);
+ }
+ }
+
+ // Shrinking resize
+ v.resize(len, kResizeElem);
+ EXPECT_EQ(len, v.size());
+ EXPECT_LE(len, v.capacity());
+ for (int i = 0; i < len; i++) {
+ EXPECT_EQ(i, v[i]);
+ }
+ }
+ }
+}
+
+TEST(IntVec, InitWithLength) {
+ for (int len = 0; len < 20; len++) {
+ IntVec v(len, 7);
+ EXPECT_EQ(len, v.size());
+ EXPECT_LE(len, v.capacity());
+ for (int i = 0; i < len; i++) {
+ EXPECT_EQ(7, v[i]);
+ }
+ }
+}
+
+TEST(IntVec, CopyConstructorAndAssignment) {
+ for (int len = 0; len < 20; len++) {
+ IntVec v;
+ Fill(&v, len);
+ EXPECT_EQ(len, v.size());
+ EXPECT_LE(len, v.capacity());
+
+ IntVec v2(v);
+ EXPECT_EQ(v, v2);
+
+ for (int start_len = 0; start_len < 20; start_len++) {
+ IntVec v3;
+ Fill(&v3, start_len, 99); // Add dummy elements that should go away
+ v3 = v;
+ EXPECT_EQ(v, v3);
+ }
+ }
+}
+
+TEST(OverheadTest, Storage) {
+ // Check for size overhead.
+ // In particular, ensure that std::allocator doesn't cost anything to store.
+ // The union should be absorbing some of the allocation bookkeeping overhead
+ // in the larger vectors, leaving only the size_ field as overhead.
+ using tensorflow::gtl::InlinedVector;
+ EXPECT_EQ(3 * sizeof(int*),
+ sizeof(InlinedVector<int*, 1>) - 1 * sizeof(int*));
+ EXPECT_EQ(2 * sizeof(int*),
+ sizeof(InlinedVector<int*, 2>) - 2 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 3>) - 3 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 4>) - 4 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 5>) - 5 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 6>) - 6 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 7>) - 7 * sizeof(int*));
+ EXPECT_EQ(1 * sizeof(int*),
+ sizeof(InlinedVector<int*, 8>) - 8 * sizeof(int*));
+}
+
+TEST(IntVec, Clear) {
+ for (int len = 0; len < 20; len++) {
+ SCOPED_TRACE(len);
+ IntVec v;
+ Fill(&v, len);
+ v.clear();
+ EXPECT_EQ(0, v.size());
+ EXPECT_EQ(v.begin(), v.end());
+ }
+}
+
+TEST(IntVec, Reserve) {
+ for (size_t len = 0; len < 20; len++) {
+ IntVec v;
+ Fill(&v, len);
+
+ for (size_t newlen = 0; newlen < 100; newlen++) {
+ const int* start_rep = v.array();
+ v.reserve(newlen);
+ const int* final_rep = v.array();
+ if (newlen <= len) {
+ EXPECT_EQ(start_rep, final_rep);
+ }
+ EXPECT_LE(newlen, v.capacity());
+
+ // Filling up to newlen should not change rep
+ while (v.size() < newlen) {
+ v.push_back(0);
+ }
+ EXPECT_EQ(final_rep, v.array());
+ }
+ }
+}
+
+template <typename T>
+static std::vector<typename T::value_type> Vec(const T& src) {
+ std::vector<typename T::value_type> result;
+ for (const auto& elem : src) {
+ result.push_back(elem);
+ }
+ return result;
+}
+
+TEST(IntVec, SelfRefPushBack) {
+ std::vector<string> std_v;
+ tensorflow::gtl::InlinedVector<string, 4> v;
+ const string s = "A very long string to ensure heap.";
+ std_v.push_back(s);
+ v.push_back(s);
+ for (int i = 0; i < 20; ++i) {
+ EXPECT_EQ(std_v, Vec(v));
+
+ v.push_back(v.back());
+ std_v.push_back(std_v.back());
+ }
+ EXPECT_EQ(std_v, Vec(v));
+}
+
+TEST(IntVec, Swap) {
+ for (int l1 = 0; l1 < 20; l1++) {
+ SCOPED_TRACE(l1);
+ for (int l2 = 0; l2 < 20; l2++) {
+ SCOPED_TRACE(l2);
+ IntVec a = Fill(l1, 0);
+ IntVec b = Fill(l2, 100);
+ {
+ using std::swap;
+ swap(a, b);
+ }
+ EXPECT_EQ(l1, b.size());
+ EXPECT_EQ(l2, a.size());
+ for (int i = 0; i < l1; i++) {
+ SCOPED_TRACE(i);
+ EXPECT_EQ(i, b[i]);
+ }
+ for (int i = 0; i < l2; i++) {
+ SCOPED_TRACE(i);
+ EXPECT_EQ(100 + i, a[i]);
+ }
+ }
+ }
+}
+
+TEST(InstanceVec, Swap) {
+ for (int l1 = 0; l1 < 20; l1++) {
+ for (int l2 = 0; l2 < 20; l2++) {
+ InstanceVec a, b;
+ for (int i = 0; i < l1; i++) a.push_back(Instance(i));
+ for (int i = 0; i < l2; i++) b.push_back(Instance(100 + i));
+ EXPECT_EQ(l1 + l2, instances);
+ {
+ using std::swap;
+ swap(a, b);
+ }
+ EXPECT_EQ(l1 + l2, instances);
+ EXPECT_EQ(l1, b.size());
+ EXPECT_EQ(l2, a.size());
+ for (int i = 0; i < l1; i++) {
+ EXPECT_EQ(i, b[i].value_);
+ }
+ for (int i = 0; i < l2; i++) {
+ EXPECT_EQ(100 + i, a[i].value_);
+ }
+ }
+ }
+}
+
+TEST(IntVec, EqualAndNotEqual) {
+ IntVec a, b;
+ EXPECT_TRUE(a == b);
+ EXPECT_FALSE(a != b);
+
+ a.push_back(3);
+ EXPECT_FALSE(a == b);
+ EXPECT_TRUE(a != b);
+
+ b.push_back(3);
+ EXPECT_TRUE(a == b);
+ EXPECT_FALSE(a != b);
+
+ b.push_back(7);
+ EXPECT_FALSE(a == b);
+ EXPECT_TRUE(a != b);
+
+ a.push_back(6);
+ EXPECT_FALSE(a == b);
+ EXPECT_TRUE(a != b);
+
+ a.clear();
+ b.clear();
+ for (int i = 0; i < 100; i++) {
+ a.push_back(i);
+ b.push_back(i);
+ EXPECT_TRUE(a == b);
+ EXPECT_FALSE(a != b);
+
+ b[i] = b[i] + 1;
+ EXPECT_FALSE(a == b);
+ EXPECT_TRUE(a != b);
+
+ b[i] = b[i] - 1; // Back to before
+ EXPECT_TRUE(a == b);
+ EXPECT_FALSE(a != b);
+ }
+}
+
+TEST(IntVec, RelationalOps) {
+ IntVec a, b;
+ EXPECT_FALSE(a < b);
+ EXPECT_FALSE(b < a);
+ EXPECT_FALSE(a > b);
+ EXPECT_FALSE(b > a);
+ EXPECT_TRUE(a <= b);
+ EXPECT_TRUE(b <= a);
+ EXPECT_TRUE(a >= b);
+ EXPECT_TRUE(b >= a);
+ b.push_back(3);
+ EXPECT_TRUE(a < b);
+ EXPECT_FALSE(b < a);
+ EXPECT_FALSE(a > b);
+ EXPECT_TRUE(b > a);
+ EXPECT_TRUE(a <= b);
+ EXPECT_FALSE(b <= a);
+ EXPECT_FALSE(a >= b);
+ EXPECT_TRUE(b >= a);
+}
+
+TEST(InstanceVec, CountConstructorsDestructors) {
+ const int start = instances;
+ for (int len = 0; len < 20; len++) {
+ InstanceVec v;
+ for (int i = 0; i < len; i++) {
+ v.push_back(Instance(i));
+ }
+ EXPECT_EQ(start + len, instances);
+
+ { // Copy constructor should create 'len' more instances.
+ InstanceVec v_copy(v);
+ EXPECT_EQ(start + len + len, instances);
+ }
+ EXPECT_EQ(start + len, instances);
+
+ // Enlarging resize() must construct some objects
+ v.resize(len + 10, Instance(100));
+ EXPECT_EQ(start + len + 10, instances);
+
+ // Shrinking resize() must destroy some objects
+ v.resize(len, Instance(100));
+ EXPECT_EQ(start + len, instances);
+
+ // reserve() must not increase the number of initialized objects
+ v.reserve(len + 1000);
+ EXPECT_EQ(start + len, instances);
+
+ // pop_back() and erase() must destroy one object
+ if (len > 0) {
+ v.pop_back();
+ EXPECT_EQ(start + len - 1, instances);
+ if (!v.empty()) {
+ v.erase(v.begin());
+ EXPECT_EQ(start + len - 2, instances);
+ }
+ }
+ }
+ EXPECT_EQ(start, instances);
+}
+
+TEST(InstanceVec, CountConstructorsDestructorsOnAssignment) {
+ const int start = instances;
+ for (int len = 0; len < 20; len++) {
+ for (int longorshort = 0; longorshort <= 1; ++longorshort) {
+ InstanceVec longer, shorter;
+ for (int i = 0; i < len; i++) {
+ longer.push_back(Instance(i));
+ shorter.push_back(Instance(i));
+ }
+ longer.push_back(Instance(len));
+ EXPECT_EQ(start + len + len + 1, instances);
+
+ if (longorshort) {
+ shorter = longer;
+ EXPECT_EQ(start + (len + 1) + (len + 1), instances);
+ } else {
+ longer = shorter;
+ EXPECT_EQ(start + len + len, instances);
+ }
+ }
+ }
+ EXPECT_EQ(start, instances);
+}
+
+TEST(RangedConstructor, SimpleType) {
+ std::vector<int> source_v = {4, 5, 6};
+ // First try to fit in inline backing
+ tensorflow::gtl::InlinedVector<int, 4> v(source_v.begin(), source_v.end());
+ EXPECT_EQ(3, v.size());
+ EXPECT_EQ(4, v.capacity()); // Indication that we're still on inlined storage
+ EXPECT_EQ(4, v[0]);
+ EXPECT_EQ(5, v[1]);
+ EXPECT_EQ(6, v[2]);
+
+ // Now, force a re-allocate
+ tensorflow::gtl::InlinedVector<int, 2> realloc_v(source_v.begin(),
+ source_v.end());
+ EXPECT_EQ(3, realloc_v.size());
+ EXPECT_LT(2, realloc_v.capacity());
+ EXPECT_EQ(4, realloc_v[0]);
+ EXPECT_EQ(5, realloc_v[1]);
+ EXPECT_EQ(6, realloc_v[2]);
+}
+
+TEST(RangedConstructor, ComplexType) {
+ // We also use a list here to pass a different flavor of iterator (e.g. not
+ // random-access).
+ std::list<Instance> source_v = {Instance(0)};
+
+ // First try to fit in inline backing
+ tensorflow::gtl::InlinedVector<Instance, 1> v(source_v.begin(),
+ source_v.end());
+ EXPECT_EQ(1, v.size());
+ EXPECT_EQ(1, v.capacity()); // Indication that we're still on inlined storage
+ EXPECT_EQ(0, v[0].value_);
+
+ std::list<Instance> source_v2 = {Instance(0), Instance(1)};
+ // Now, force a re-allocate
+ tensorflow::gtl::InlinedVector<Instance, 1> realloc_v(source_v2.begin(),
+ source_v2.end());
+ EXPECT_EQ(2, realloc_v.size());
+ EXPECT_LT(1, realloc_v.capacity());
+ EXPECT_EQ(0, realloc_v[0].value_);
+ EXPECT_EQ(1, realloc_v[1].value_);
+}
+
+TEST(RangedConstructor, ElementsAreConstructed) {
+ std::vector<string> source_v = {"cat", "dog"};
+
+ // Force expansion and re-allocation of v. Ensures that new elements are
+ // constructed when the vector is expanded.
+ tensorflow::gtl::InlinedVector<string, 1> v(source_v.begin(), source_v.end());
+ EXPECT_EQ("cat", v[0]);
+ EXPECT_EQ("dog", v[1]);
+}
+
+TEST(InitializerListConstructor, SimpleTypeWithInlineBacking) {
+ auto vec = tensorflow::gtl::InlinedVector<int, 4>{4, 5, 6};
+ EXPECT_EQ(3, vec.size());
+ EXPECT_EQ(4, vec.capacity());
+ EXPECT_EQ(4, vec[0]);
+ EXPECT_EQ(5, vec[1]);
+ EXPECT_EQ(6, vec[2]);
+}
+
+TEST(InitializerListConstructor, SimpleTypeWithReallocationRequired) {
+ auto vec = tensorflow::gtl::InlinedVector<int, 2>{4, 5, 6};
+ EXPECT_EQ(3, vec.size());
+ EXPECT_LE(3, vec.capacity());
+ EXPECT_EQ(4, vec[0]);
+ EXPECT_EQ(5, vec[1]);
+ EXPECT_EQ(6, vec[2]);
+}
+
+TEST(InitializerListConstructor, DisparateTypesInList) {
+ EXPECT_EQ((std::vector<int>{-7, 8}),
+ Vec(tensorflow::gtl::InlinedVector<int, 2>{-7, 8ULL}));
+
+ EXPECT_EQ(
+ (std::vector<string>{"foo", "bar"}),
+ Vec(tensorflow::gtl::InlinedVector<string, 2>{"foo", string("bar")}));
+}
+
+TEST(InitializerListConstructor, ComplexTypeWithInlineBacking) {
+ auto vec = tensorflow::gtl::InlinedVector<Instance, 1>{Instance(0)};
+ EXPECT_EQ(1, vec.size());
+ EXPECT_EQ(1, vec.capacity());
+ EXPECT_EQ(0, vec[0].value_);
+}
+
+TEST(InitializerListConstructor, ComplexTypeWithReallocationRequired) {
+ auto vec =
+ tensorflow::gtl::InlinedVector<Instance, 1>{Instance(0), Instance(1)};
+ EXPECT_EQ(2, vec.size());
+ EXPECT_LE(2, vec.capacity());
+ EXPECT_EQ(0, vec[0].value_);
+ EXPECT_EQ(1, vec[1].value_);
+}
+
+TEST(DynamicVec, DynamicVecCompiles) {
+ DynamicVec v;
+ (void)v;
+}
+
+#ifdef INLINED_VECTOR_HAS_ALLOC
+TEST(AllocatorSupportTest, Constructors) {
+ typedef STLCountingAllocator<int> MyAlloc;
+ typedef tensorflow::gtl::InlinedVector<int, 4, MyAlloc> AllocVec;
+ const int ia[] = {0, 1, 2, 3, 4, 5, 6, 7};
+ int64 allocated = 0;
+ MyAlloc alloc(&allocated);
+ { AllocVec TF_ATTRIBUTE_UNUSED v; }
+ { AllocVec TF_ATTRIBUTE_UNUSED v(alloc); }
+ { AllocVec TF_ATTRIBUTE_UNUSED v(ia, ia + arraysize(ia), alloc); }
+#ifdef LANG_CXX11
+ { AllocVec TF_ATTRIBUTE_UNUSED v({1, 2, 3}, alloc); }
+#endif // LANG_CXX11
+}
+
+TEST(AllocatorSupportTest, CountAllocations) {
+ typedef STLCountingAllocator<int> MyAlloc;
+ typedef tensorflow::gtl::InlinedVector<int, 4, MyAlloc> AllocVec;
+ const int ia[] = {0, 1, 2, 3, 4, 5, 6, 7};
+ int64 allocated = 0;
+ MyAlloc alloc(&allocated);
+ {
+ AllocVec TF_ATTRIBUTE_UNUSED v(ia, ia + 4, alloc);
+ EXPECT_THAT(allocated, 0);
+ }
+ EXPECT_THAT(allocated, 0);
+ {
+ AllocVec TF_ATTRIBUTE_UNUSED v(ia, ia + arraysize(ia), alloc);
+ EXPECT_THAT(allocated, v.size() * sizeof(int));
+ }
+ EXPECT_THAT(allocated, 0);
+}
+
+TEST(AllocatorSupportTest, SwapBothAllocated) {
+ typedef STLCountingAllocator<int> MyAlloc;
+ typedef tensorflow::gtl::InlinedVector<int, 4, MyAlloc> AllocVec;
+ int64 allocated1 = 0;
+ int64 allocated2 = 0;
+ {
+ const std::vector<int> ia1 = {0, 1, 2, 3, 4, 5, 6, 7};
+ const std::vector<int> ia2 = {0, 1, 2, 3, 4, 5, 6, 7, 8};
+ MyAlloc a1(&allocated1);
+ MyAlloc a2(&allocated2);
+ AllocVec v1(ia1.data(), ia1.data() + ia1.size(), a1);
+ AllocVec v2(ia2.data(), ia2.data() + ia2.size(), a2);
+ EXPECT_LT(v1.capacity(), v2.capacity());
+ EXPECT_THAT(allocated1, v1.capacity() * sizeof(int));
+ EXPECT_THAT(allocated2, v2.capacity() * sizeof(int));
+ v1.swap(v2);
+ EXPECT_EQ(ia2, Vec(v1));
+ EXPECT_EQ(ia1, Vec(v2));
+ EXPECT_THAT(allocated1, v2.capacity() * sizeof(int));
+ EXPECT_THAT(allocated2, v1.capacity() * sizeof(int));
+ }
+ EXPECT_THAT(allocated1, 0);
+ EXPECT_THAT(allocated2, 0);
+}
+
+TEST(AllocatorSupportTest, SwapOneAllocated) {
+ typedef STLCountingAllocator<int> MyAlloc;
+ typedef tensorflow::gtl::InlinedVector<int, 4, MyAlloc> AllocVec;
+ int64 allocated1 = 0;
+ int64 allocated2 = 0;
+ {
+ const std::vector<int> ia1 = {0, 1, 2, 3, 4, 5, 6, 7};
+ const std::vector<int> ia2 = {0, 1, 2, 3};
+ MyAlloc a1(&allocated1);
+ MyAlloc a2(&allocated2);
+ AllocVec v1(ia1.data(), ia1.data() + ia1.size(), a1);
+ AllocVec v2(ia2.data(), ia2.data() + ia2.size(), a2);
+ EXPECT_THAT(allocated1, v1.capacity() * sizeof(int));
+ EXPECT_THAT(allocated2, 0);
+ v1.swap(v2);
+ EXPECT_EQ(ia2, Vec(v1));
+ EXPECT_EQ(ia1, Vec(v2));
+ EXPECT_THAT(allocated1, v2.capacity() * sizeof(int));
+ EXPECT_THAT(allocated2, 0);
+ EXPECT_TRUE(v2.get_allocator() == a1);
+ EXPECT_TRUE(v1.get_allocator() == a2);
+ }
+ EXPECT_THAT(allocated1, 0);
+ EXPECT_THAT(allocated2, 0);
+}
+#endif // INLINED_VECTOR_HAS_ALLOC
+
+static void BM_InlinedVectorFill(int iters, int len) {
+ for (int i = 0; i < iters; i++) {
+ IntVec v;
+ for (int j = 0; j < len; j++) {
+ v.push_back(j);
+ }
+ }
+ testing::BytesProcessed((static_cast<int64>(iters) * len) * sizeof(int));
+}
+BENCHMARK(BM_InlinedVectorFill)->Range(0, 1024);
+
+static void BM_InlinedVectorFillRange(int iters, int len) {
+ std::unique_ptr<int[]> ia(new int[len]);
+ for (int j = 0; j < len; j++) {
+ ia[j] = j;
+ }
+ for (int i = 0; i < iters; i++) {
+ IntVec TF_ATTRIBUTE_UNUSED v(ia.get(), ia.get() + len);
+ }
+ testing::BytesProcessed((static_cast<int64>(iters) * len) * sizeof(int));
+}
+BENCHMARK(BM_InlinedVectorFillRange)->Range(0, 1024);
+
+static void BM_StdVectorFill(int iters, int len) {
+ for (int i = 0; i < iters; i++) {
+ std::vector<int> v;
+ for (int j = 0; j < len; j++) {
+ v.push_back(j);
+ }
+ }
+ testing::BytesProcessed((static_cast<int64>(iters) * len) * sizeof(int));
+}
+BENCHMARK(BM_StdVectorFill)->Range(0, 1024);
+
+namespace {
+struct Buffer { // some arbitrary structure for benchmarking.
+ char* base;
+ int length;
+ int capacity;
+ void* user_data;
+};
+} // anonymous namespace
+
+static void BM_InlinedVectorTenAssignments(int iters, int len) {
+ typedef tensorflow::gtl::InlinedVector<Buffer, 2> BufferVec;
+
+ BufferVec src;
+ src.resize(len);
+
+ iters *= 10;
+ BufferVec dst;
+ for (int i = 0; i < iters; i++) {
+ dst = src;
+ }
+}
+BENCHMARK(BM_InlinedVectorTenAssignments)
+ ->Arg(0)
+ ->Arg(1)
+ ->Arg(2)
+ ->Arg(3)
+ ->Arg(4)
+ ->Arg(20);
+
+static void BM_CreateFromInitializerList(int iters) {
+ for (; iters > 0; iters--) {
+ tensorflow::gtl::InlinedVector<int, 4> x{1, 2, 3};
+ (void)x[0];
+ }
+}
+BENCHMARK(BM_CreateFromInitializerList);
+
+namespace {
+
+struct LargeSwappable {
+ LargeSwappable() : d_(1024, 17) {}
+ ~LargeSwappable() {}
+ LargeSwappable(const LargeSwappable& o) : d_(o.d_) {}
+
+ friend void swap(LargeSwappable& a, LargeSwappable& b) {
+ using std::swap;
+ swap(a.d_, b.d_);
+ }
+
+ LargeSwappable& operator=(LargeSwappable o) {
+ using std::swap;
+ swap(*this, o);
+ return *this;
+ }
+
+ std::vector<int> d_;
+};
+
+} // namespace
+
+static void BM_LargeSwappableElements(int iters, int len) {
+ typedef tensorflow::gtl::InlinedVector<LargeSwappable, 32> Vec;
+ Vec a(len);
+ Vec b;
+ while (--iters >= 0) {
+ using std::swap;
+ swap(a, b);
+ }
+}
+BENCHMARK(BM_LargeSwappableElements)->Range(0, 1024);
+
+} // namespace tensorflow
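The OverheadTest above depends on two details of the header: the allocator is folded into AllocatorAndTag via the empty-base-class trick, and the inline buffer shares a union with the heap Allocation. A hedged sketch of the same size check, assuming the default stateless std::allocator and a target where size_t and pointers have equal size:

#include "tensorflow/core/lib/gtl/inlined_vector.h"

// Beyond the 8 inline pointers, the only per-object overhead should be the
// size/tag word, which pads out to one pointer on such targets.
static_assert(sizeof(tensorflow::gtl::InlinedVector<int*, 8>) ==
                  9 * sizeof(int*),
              "unexpected InlinedVector overhead");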
diff --git a/tensorflow/core/lib/gtl/int_type.h b/tensorflow/core/lib/gtl/int_type.h
new file mode 100644
index 0000000000..d3fcb08d38
--- /dev/null
+++ b/tensorflow/core/lib/gtl/int_type.h
@@ -0,0 +1,343 @@
+// #status: LEGACY
+// #category: Miscellaneous
+// #summary: Integral types; prefer util/intops/strong_int.h
+// #bugs: Infrastructure > C++ Library Team > util
+//
+// IntType is a simple template class mechanism for defining "logical"
+// integer-like class types that support many of the same functionalities
+// as native integer types, but which prevent assignment, construction, and
+// other operations from other similar integer-like types. Essentially, the
+// template class IntType<IntTypeName, ValueType> (where ValueType assumes
+// valid scalar types such as int, uint, int32, etc) has the additional
+// property that it cannot be assigned to or constructed from other IntTypes
+// or native integer types of equal or implicitly convertible type.
+//
+// The class is useful for preventing mingling of integer variables with
+// different logical roles or units. Unfortunately, C++ provides relatively
+// good type-safety for user-defined classes but not for integer types. It is
+// essentially up to the user to use nice variable names and comments to prevent
+// accidental mismatches, such as confusing a user-index with a group-index or a
+// time-in-milliseconds with a time-in-seconds. The use of typedefs is limited
+// in that regard, as they do not enforce type-safety.
+//
+// USAGE -----------------------------------------------------------------------
+//
+// TF_LIB_GTL_DEFINE_INT_TYPE(IntTypeName, ValueType);
+//
+// where:
+// IntTypeName: is the desired (unique) name for the "logical" integer type.
+// ValueType: is one of the integral types as defined by std::is_integral
+// (see <type_traits>).
+//
+// DISALLOWED OPERATIONS / TYPE-SAFETY ENFORCEMENT -----------------------------
+//
+// Consider these definitions and variable declarations:
+// TF_LIB_GTL_DEFINE_INT_TYPE(GlobalDocID, int64);
+// TF_LIB_GTL_DEFINE_INT_TYPE(LocalDocID, int64);
+// GlobalDocID global;
+// LocalDocID local;
+//
+// The class IntType prevents:
+//
+// 1) Assignments of other IntTypes with different IntTypeNames.
+//
+// global = local; <-- Fails to compile!
+// local = global; <-- Fails to compile!
+//
+// 2) Explicit/implicit conversion from an IntType to another IntType.
+//
+// LocalDocID l(global); <-- Fails to compile!
+// LocalDocID l = global; <-- Fails to compile!
+//
+// void GetGlobalDoc(GlobalDocID global) { }
+// GetGlobalDoc(global); <-- Compiles fine, types match!
+// GetGlobalDoc(local); <-- Fails to compile!
+//
+// 3) Implicit conversion from an IntType to a native integer type.
+//
+// void GetGlobalDoc(int64 global) { ...
+// GetGlobalDoc(global); <-- Fails to compile!
+// GetGlobalDoc(local); <-- Fails to compile!
+//
+// void GetLocalDoc(int32 local) { ...
+// GetLocalDoc(global); <-- Fails to compile!
+// GetLocalDoc(local); <-- Fails to compile!
+//
+//
+// SUPPORTED OPERATIONS --------------------------------------------------------
+//
+// The following operators are supported: unary: ++ (both prefix and postfix),
+// +, -, ! (logical not), ~ (one's complement); comparison: ==, !=, <, <=, >,
+// >=; numerical: +, -, *, /; assignment: =, +=, -=, *=, /=, <<=, >>=, %=;
+// stream: <<. Each operator allows the same IntTypeName and the ValueType to
+// be used on both left- and right-hand sides.
+//
+// It also supports an accessor value() returning the stored value as ValueType,
+// and a templatized accessor value<T>() method that serves as syntactic sugar
+// for static_cast<T>(var.value()). These accessors are useful when assigning
+// the stored value into protocol buffer fields and using it as printf args.
+//
+// The class also defines a hash functor that allows the IntType to be used
+// as a key in hashable containers such as std::unordered_map and
+// std::unordered_set.
+//
+// We suggest using the IntTypeIndexedContainer wrapper around FixedArray and
+// STL vector (see int-type-indexed-container.h) if an IntType is intended to
+// be used as an index into these containers. These wrappers are indexed in a
+// type-safe manner using IntTypes to ensure type-safety.
+//
+// NB: this implementation does not attempt to abide by or enforce dimensional
+// analysis on these scalar types.
+//
+// EXAMPLES --------------------------------------------------------------------
+//
+// TF_LIB_GTL_DEFINE_INT_TYPE(GlobalDocID, int64);
+// GlobalDocID global = 3;
+// cout << global; <-- Prints 3 to stdout.
+//
+// for (GlobalDocID i(0); i < global; ++i) {
+// cout << i;
+// } <-- Print(ln)s 0 1 2 to stdout
+//
+// TF_LIB_GTL_DEFINE_INT_TYPE(LocalDocID, int64);
+// LocalDocID local;
+// cout << local; <-- Prints 0 to stdout; the value
+// default-initializes to 0.
+//
+// local = 5;
+// local *= 2;
+// LocalDocID l(local);
+// cout << l + local; <-- Prints 20 to stdout.
+//
+// GenericSearchRequest request;
+// request.set_doc_id(global.value()); <-- Uses value() to extract the value
+// from the IntType class.
+//
+// REMARKS ---------------------------------------------------------------------
+//
+// The following bad usage is permissible although discouraged. Essentially, it
+// involves using the value*() accessors to extract the native integer type out
+// of the IntType class. Keep in mind that the primary reason for the IntType
+// class is to prevent *accidental* mingling of similar logical integer types --
+// and not type casting from one type to another.
+//
+// TF_LIB_GTL_DEFINE_INT_TYPE(GlobalDocID, int64);
+// TF_LIB_GTL_DEFINE_INT_TYPE(LocalDocID, int64);
+// GlobalDocID global;
+// LocalDocID local;
+//
+// global = local.value(); <-- Compiles fine.
+//
+// void GetGlobalDoc(GlobalDocID global) { ...
+// GetGlobalDoc(local.value()); <-- Compiles fine.
+//
+// void GetGlobalDoc(int64 global) { ...
+// GetGlobalDoc(local.value()); <-- Compiles fine.
+
+#ifndef TENSORFLOW_LIB_GTL_INT_TYPE_H_
+#define TENSORFLOW_LIB_GTL_INT_TYPE_H_
+
+#include <stddef.h>
+#include <functional>
+#include <iosfwd>
+#include <ostream> // NOLINT
+#include <unordered_map>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace gtl {
+
+template <typename IntTypeName, typename _ValueType>
+class IntType;
+
+// Defines the IntType using value_type and typedefs it to int_type_name.
+// The struct int_type_name ## _tag_ trickery is needed to ensure that a new
+// type is created per int_type_name.
+#define TF_LIB_GTL_DEFINE_INT_TYPE(int_type_name, value_type) \
+ struct int_type_name##_tag_ {}; \
+ typedef ::tensorflow::gtl::IntType<int_type_name##_tag_, value_type> \
+ int_type_name;
+
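+// For example (a minimal sketch; the type name is illustrative):
+//   TF_LIB_GTL_DEFINE_INT_TYPE(GpuStreamId, int32);
+//   GpuStreamId id(3);    // OK
+//   GpuStreamId id2 = 3;  // fails to compile: the ValueType c'tor is explicit
+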
+// Holds an integer value (of type ValueType) and behaves as a ValueType by
+// exposing assignment, unary, comparison, and arithmetic operators.
+//
+// The template parameter IntTypeName defines the name for the int type and must
+// be unique within a binary (the TF_LIB_GTL_DEFINE_INT_TYPE macro defined above
+// generates a unique IntTypeName). The parameter ValueType defines the
+// underlying integer type and must be an integral type.
+//
+// This class is NOT thread-safe.
+template <typename IntTypeName, typename _ValueType>
+class IntType {
+ public:
+ typedef _ValueType ValueType; // for non-member operators
+ typedef IntType<IntTypeName, ValueType> ThisType; // Syntactic sugar.
+
+  // Note that the hash value may change from time to time without notice.
+ struct Hasher {
+ size_t operator()(const IntType& arg) const {
+ return static_cast<size_t>(arg.value());
+ }
+ };
+
+ public:
+ // Default c'tor initializing value_ to 0.
+ constexpr IntType() : value_(0) {}
+ // C'tor explicitly initializing from a ValueType.
+ constexpr explicit IntType(ValueType value) : value_(value) {}
+
+ // IntType uses the default copy constructor, destructor and assign operator.
+ // The defaults are sufficient and omitting them allows the compiler to add
+ // the move constructor/assignment.
+
+ // -- ACCESSORS --------------------------------------------------------------
+ // The class provides a value() accessor returning the stored ValueType value_
+  // as well as a templatized accessor that is just syntactic sugar for
+  // static_cast<T>(var.value()).
+ constexpr ValueType value() const { return value_; }
+
+ template <typename ValType>
+ constexpr ValType value() const {
+ return static_cast<ValType>(value_);
+ }
+
+ // -- UNARY OPERATORS --------------------------------------------------------
+ ThisType& operator++() { // prefix ++
+ ++value_;
+ return *this;
+ }
+ const ThisType operator++(int v) { // postfix ++
+ ThisType temp(*this);
+ ++value_;
+ return temp;
+ }
+ ThisType& operator--() { // prefix --
+ --value_;
+ return *this;
+ }
+ const ThisType operator--(int v) { // postfix --
+ ThisType temp(*this);
+ --value_;
+ return temp;
+ }
+
+ constexpr bool operator!() const { return value_ == 0; }
+ constexpr const ThisType operator+() const { return ThisType(value_); }
+ constexpr const ThisType operator-() const { return ThisType(-value_); }
+ constexpr const ThisType operator~() const { return ThisType(~value_); }
+
+// -- ASSIGNMENT OPERATORS ---------------------------------------------------
+// We support the following assignment operators: =, +=, -=, *=, /=, <<=, >>=
+// and %= for both ThisType and ValueType.
+#define INT_TYPE_ASSIGNMENT_OP(op) \
+ ThisType& operator op(const ThisType& arg_value) { \
+ value_ op arg_value.value(); \
+ return *this; \
+ } \
+ ThisType& operator op(ValueType arg_value) { \
+ value_ op arg_value; \
+ return *this; \
+ }
+ INT_TYPE_ASSIGNMENT_OP(+= );
+ INT_TYPE_ASSIGNMENT_OP(-= );
+ INT_TYPE_ASSIGNMENT_OP(*= );
+ INT_TYPE_ASSIGNMENT_OP(/= );
+ INT_TYPE_ASSIGNMENT_OP(<<= ); // NOLINT
+ INT_TYPE_ASSIGNMENT_OP(>>= ); // NOLINT
+ INT_TYPE_ASSIGNMENT_OP(%= );
+#undef INT_TYPE_ASSIGNMENT_OP
+
+ ThisType& operator=(ValueType arg_value) {
+ value_ = arg_value;
+ return *this;
+ }
+
+ private:
+ // The integer value of type ValueType.
+ ValueType value_;
+
+ static_assert(std::is_integral<ValueType>::value, "invalid integer type");
+} TF_PACKED;
+
+// -- NON-MEMBER STREAM OPERATORS ----------------------------------------------
+// We provide the << operator, primarily for logging purposes. Currently, there
+// seems to be no need for an >> operator.
+template <typename IntTypeName, typename ValueType>
+std::ostream& operator<<(std::ostream& os, // NOLINT
+ IntType<IntTypeName, ValueType> arg) {
+ return os << arg.value();
+}
+
+// -- NON-MEMBER ARITHMETIC OPERATORS ------------------------------------------
+// We support only the +, -, *, and / operators with the same IntType and
+// ValueType types. The reason is to allow simple manipulation on these IDs
+// when used as indices in vectors and arrays.
+//
+// NB: Although it is possible to do IntType * IntType and IntType / IntType,
+// it is probably non-sensical from a dimensionality analysis perspective.
+#define INT_TYPE_ARITHMETIC_OP(op) \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr IntType<IntTypeName, ValueType> operator op( \
+ IntType<IntTypeName, ValueType> id_1, \
+ IntType<IntTypeName, ValueType> id_2) { \
+ return IntType<IntTypeName, ValueType>(id_1.value() op id_2.value()); \
+ } \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr IntType<IntTypeName, ValueType> operator op( \
+ IntType<IntTypeName, ValueType> id, \
+ typename IntType<IntTypeName, ValueType>::ValueType arg_val) { \
+ return IntType<IntTypeName, ValueType>(id.value() op arg_val); \
+ } \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr IntType<IntTypeName, ValueType> operator op( \
+ typename IntType<IntTypeName, ValueType>::ValueType arg_val, \
+ IntType<IntTypeName, ValueType> id) { \
+ return IntType<IntTypeName, ValueType>(arg_val op id.value()); \
+ }
+INT_TYPE_ARITHMETIC_OP(+);
+INT_TYPE_ARITHMETIC_OP(-);
+INT_TYPE_ARITHMETIC_OP(*);
+INT_TYPE_ARITHMETIC_OP(/ );
+INT_TYPE_ARITHMETIC_OP(<< ); // NOLINT
+INT_TYPE_ARITHMETIC_OP(>> ); // NOLINT
+INT_TYPE_ARITHMETIC_OP(% );
+#undef INT_TYPE_ARITHMETIC_OP
+
+// -- NON-MEMBER COMPARISON OPERATORS ------------------------------------------
+// Static inline comparison operators. We allow all comparison operators among
+// the following types (OP \in [==, !=, <, <=, >, >=]):
+// IntType<IntTypeName, ValueType> OP IntType<IntTypeName, ValueType>
+// IntType<IntTypeName, ValueType> OP ValueType
+// ValueType OP IntType<IntTypeName, ValueType>
+#define INT_TYPE_COMPARISON_OP(op) \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr bool operator op( \
+ IntType<IntTypeName, ValueType> id_1, \
+ IntType<IntTypeName, ValueType> id_2) { \
+ return id_1.value() op id_2.value(); \
+ } \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr bool operator op( \
+ IntType<IntTypeName, ValueType> id, \
+ typename IntType<IntTypeName, ValueType>::ValueType val) { \
+ return id.value() op val; \
+ } \
+ template <typename IntTypeName, typename ValueType> \
+ static inline constexpr bool operator op( \
+ typename IntType<IntTypeName, ValueType>::ValueType val, \
+ IntType<IntTypeName, ValueType> id) { \
+ return val op id.value(); \
+ }
+INT_TYPE_COMPARISON_OP(== ); // NOLINT
+INT_TYPE_COMPARISON_OP(!= ); // NOLINT
+INT_TYPE_COMPARISON_OP(< ); // NOLINT
+INT_TYPE_COMPARISON_OP(<= ); // NOLINT
+INT_TYPE_COMPARISON_OP(> ); // NOLINT
+INT_TYPE_COMPARISON_OP(>= ); // NOLINT
+#undef INT_TYPE_COMPARISON_OP
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_INT_TYPE_H_
diff --git a/tensorflow/core/lib/gtl/int_type_test.cc b/tensorflow/core/lib/gtl/int_type_test.cc
new file mode 100644
index 0000000000..694886d345
--- /dev/null
+++ b/tensorflow/core/lib/gtl/int_type_test.cc
@@ -0,0 +1,282 @@
+// Unit test cases for IntType.
+
+#include <memory>
+#include <unordered_map>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/gtl/int_type.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TF_LIB_GTL_DEFINE_INT_TYPE(Int8_IT, int8);
+TF_LIB_GTL_DEFINE_INT_TYPE(UInt8_IT, uint8);
+TF_LIB_GTL_DEFINE_INT_TYPE(Int16_IT, int16);
+TF_LIB_GTL_DEFINE_INT_TYPE(UInt16_IT, uint16);
+TF_LIB_GTL_DEFINE_INT_TYPE(Int32_IT, int32);
+TF_LIB_GTL_DEFINE_INT_TYPE(Int64_IT, int64);
+TF_LIB_GTL_DEFINE_INT_TYPE(UInt32_IT, uint32);
+TF_LIB_GTL_DEFINE_INT_TYPE(UInt64_IT, uint64);
+TF_LIB_GTL_DEFINE_INT_TYPE(Long_IT, long); // NOLINT
+
+template <typename IntType_Type>
+class IntTypeTest : public ::testing::Test {
+ public:
+ typedef IntType_Type T;
+};
+
+// All tests below will be executed on all supported IntTypes.
+typedef ::testing::Types<Int8_IT, UInt8_IT, Int16_IT, UInt16_IT, Int32_IT,
+ Int64_IT, UInt64_IT, Long_IT> SupportedIntTypes;
+
+TYPED_TEST_CASE(IntTypeTest, SupportedIntTypes);
+
+TYPED_TEST(IntTypeTest, TestInitialization) {
+ constexpr typename TestFixture::T a;
+ constexpr typename TestFixture::T b(1);
+ constexpr typename TestFixture::T c(b);
+ EXPECT_EQ(0, a); // default initialization to 0
+ EXPECT_EQ(1, b);
+ EXPECT_EQ(1, c);
+}
+
+TYPED_TEST(IntTypeTest, TestOperators) {
+ typename TestFixture::T a(0);
+ typename TestFixture::T b(1);
+ typename TestFixture::T c(2);
+ constexpr typename TestFixture::T d(3);
+ constexpr typename TestFixture::T e(4);
+
+  // On all EXPECT_EQ below, we use the accessor value() so as not to invoke the
+  // comparison operators, which must themselves be tested.
+
+ // -- UNARY OPERATORS --------------------------------------------------------
+ EXPECT_EQ(0, (a++).value());
+ EXPECT_EQ(2, (++a).value());
+ EXPECT_EQ(2, (a--).value());
+ EXPECT_EQ(0, (--a).value());
+
+ EXPECT_EQ(true, !a);
+ EXPECT_EQ(false, !b);
+ static_assert(!d == false, "Unary operator! failed");
+
+ EXPECT_EQ(a.value(), +a);
+ static_assert(+d == d.value(), "Unary operator+ failed");
+ EXPECT_EQ(-a.value(), -a);
+ static_assert(-d == -d.value(), "Unary operator- failed");
+ EXPECT_EQ(~a.value(), ~a); // ~zero
+ EXPECT_EQ(~b.value(), ~b); // ~non-zero
+ static_assert(~d == ~d.value(), "Unary operator~ failed");
+
+ // -- ASSIGNMENT OPERATORS ---------------------------------------------------
+ // We test all assignment operators using IntType and constant as arguments.
+ // We also test the return from the operators.
+ // From same IntType
+ c = a = b;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ // From constant
+ c = b = 2;
+ EXPECT_EQ(2, b.value());
+ EXPECT_EQ(2, c.value());
+ // From same IntType
+ c = a += b;
+ EXPECT_EQ(3, a.value());
+ EXPECT_EQ(3, c.value());
+ c = a -= b;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a *= b;
+ EXPECT_EQ(2, a.value());
+ EXPECT_EQ(2, c.value());
+ c = a /= b;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a <<= b;
+ EXPECT_EQ(4, a.value());
+ EXPECT_EQ(4, c.value());
+ c = a >>= b;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a %= b;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ // From constant
+ c = a += 2;
+ EXPECT_EQ(3, a.value());
+ EXPECT_EQ(3, c.value());
+ c = a -= 2;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a *= 2;
+ EXPECT_EQ(2, a.value());
+ EXPECT_EQ(2, c.value());
+ c = a /= 2;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a <<= 2;
+ EXPECT_EQ(4, a.value());
+ EXPECT_EQ(4, c.value());
+ c = a >>= 2;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+ c = a %= 2;
+ EXPECT_EQ(1, a.value());
+ EXPECT_EQ(1, c.value());
+
+ // -- COMPARISON OPERATORS ---------------------------------------------------
+ a = 0;
+ b = 1;
+
+ EXPECT_FALSE(a == b);
+ EXPECT_TRUE(a == 0); // NOLINT
+ EXPECT_FALSE(1 == a); // NOLINT
+ static_assert(d == d, "operator== failed");
+ static_assert(d == 3, "operator== failed");
+ static_assert(3 == d, "operator== failed");
+ EXPECT_TRUE(a != b);
+ EXPECT_TRUE(a != 1); // NOLINT
+ EXPECT_FALSE(0 != a); // NOLINT
+ static_assert(d != e, "operator!= failed");
+ static_assert(d != 4, "operator!= failed");
+ static_assert(4 != d, "operator!= failed");
+ EXPECT_TRUE(a < b);
+ EXPECT_TRUE(a < 1); // NOLINT
+ EXPECT_FALSE(0 < a); // NOLINT
+ static_assert(d < e, "operator< failed");
+ static_assert(d < 4, "operator< failed");
+ static_assert(3 < e, "operator< failed");
+ EXPECT_TRUE(a <= b);
+ EXPECT_TRUE(a <= 1); // NOLINT
+ EXPECT_TRUE(0 <= a); // NOLINT
+ static_assert(d <= e, "operator<= failed");
+ static_assert(d <= 4, "operator<= failed");
+ static_assert(3 <= e, "operator<= failed");
+ EXPECT_FALSE(a > b);
+ EXPECT_FALSE(a > 1); // NOLINT
+ EXPECT_FALSE(0 > a); // NOLINT
+ static_assert(e > d, "operator> failed");
+ static_assert(e > 3, "operator> failed");
+ static_assert(4 > d, "operator> failed");
+ EXPECT_FALSE(a >= b);
+ EXPECT_FALSE(a >= 1); // NOLINT
+ EXPECT_TRUE(0 >= a); // NOLINT
+ static_assert(e >= d, "operator>= failed");
+ static_assert(e >= 3, "operator>= failed");
+ static_assert(4 >= d, "operator>= failed");
+
+ // -- BINARY OPERATORS -------------------------------------------------------
+ a = 1;
+ b = 3;
+ EXPECT_EQ(4, (a + b).value());
+ EXPECT_EQ(4, (a + 3).value());
+ EXPECT_EQ(4, (1 + b).value());
+ static_assert((d + e).value() == 7, "Binary operator+ failed");
+ static_assert((d + 4).value() == 7, "Binary operator+ failed");
+ static_assert((3 + e).value() == 7, "Binary operator+ failed");
+ EXPECT_EQ(2, (b - a).value());
+ EXPECT_EQ(2, (b - 1).value());
+ EXPECT_EQ(2, (3 - a).value());
+ static_assert((e - d).value() == 1, "Binary operator- failed");
+ static_assert((e - 3).value() == 1, "Binary operator- failed");
+ static_assert((4 - d).value() == 1, "Binary operator- failed");
+ EXPECT_EQ(3, (a * b).value());
+ EXPECT_EQ(3, (a * 3).value());
+ EXPECT_EQ(3, (1 * b).value());
+ static_assert((d * e).value() == 12, "Binary operator* failed");
+ static_assert((d * 4).value() == 12, "Binary operator* failed");
+ static_assert((3 * e).value() == 12, "Binary operator* failed");
+ EXPECT_EQ(0, (a / b).value());
+ EXPECT_EQ(0, (a / 3).value());
+ EXPECT_EQ(0, (1 / b).value());
+ static_assert((d / e).value() == 0, "Binary operator/ failed");
+ static_assert((d / 4).value() == 0, "Binary operator/ failed");
+ static_assert((3 / e).value() == 0, "Binary operator/ failed");
+ EXPECT_EQ(8, (a << b).value());
+ EXPECT_EQ(8, (a << 3).value());
+ EXPECT_EQ(8, (1 << b).value());
+ static_assert((d << e).value() == 48, "Binary operator<< failed");
+ static_assert((d << 4).value() == 48, "Binary operator<< failed");
+ static_assert((3 << e).value() == 48, "Binary operator<< failed");
+ b = 8;
+ EXPECT_EQ(4, (b >> a).value());
+ EXPECT_EQ(4, (b >> 1).value());
+ EXPECT_EQ(4, (8 >> a).value());
+ static_assert((d >> e).value() == 0, "Binary operator>> failed");
+ static_assert((d >> 4).value() == 0, "Binary operator>> failed");
+ static_assert((3 >> e).value() == 0, "Binary operator>> failed");
+ b = 3;
+ a = 2;
+ EXPECT_EQ(1, (b % a).value());
+ EXPECT_EQ(1, (b % 2).value());
+ EXPECT_EQ(1, (3 % a).value());
+ static_assert((e % d).value() == 1, "Binary operator% failed");
+ static_assert((e % 3).value() == 1, "Binary operator% failed");
+ static_assert((4 % d).value() == 1, "Binary operator% failed");
+}
+
+TYPED_TEST(IntTypeTest, TestHashFunctor) {
+ std::unordered_map<typename TestFixture::T, char,
+ typename TestFixture::T::Hasher> map;
+ typename TestFixture::T a(0);
+ map[a] = 'c';
+ EXPECT_EQ('c', map[a]);
+ map[++a] = 'o';
+ EXPECT_EQ('o', map[a]);
+
+ typename TestFixture::T b(a);
+ EXPECT_EQ(typename TestFixture::T::Hasher()(a),
+ typename TestFixture::T::Hasher()(b));
+}
+
+// Tests the use of the templatized value accessor that performs static_casts.
+// We use -1 to force casting in unsigned integers.
+TYPED_TEST(IntTypeTest, TestValueAccessor) {
+ constexpr typename TestFixture::T::ValueType i = -1;
+ constexpr typename TestFixture::T int_type(i);
+ EXPECT_EQ(i, int_type.value());
+ static_assert(int_type.value() == i, "value() failed");
+ // The use of the keyword 'template' (suggested by Clang) is only necessary
+ // as this code is part of a template class. Weird syntax though. Good news
+ // is that only int_type.value<int>() is needed in most code.
+ EXPECT_EQ(static_cast<int>(i), int_type.template value<int>());
+ EXPECT_EQ(static_cast<int8>(i), int_type.template value<int8>());
+ EXPECT_EQ(static_cast<int16>(i), int_type.template value<int16>());
+ EXPECT_EQ(static_cast<int32>(i), int_type.template value<int32>());
+ EXPECT_EQ(static_cast<uint32>(i), int_type.template value<uint32>());
+ EXPECT_EQ(static_cast<int64>(i), int_type.template value<int64>());
+ EXPECT_EQ(static_cast<uint64>(i), int_type.template value<uint64>());
+ EXPECT_EQ(static_cast<long>(i), int_type.template value<long>()); // NOLINT
+ static_assert(int_type.template value<int>() == static_cast<int>(i),
+ "value<Value>() failed");
+}
+
+TYPED_TEST(IntTypeTest, TestMove) {
+ // Check that the int types have move constructor/assignment.
+ // We do this by composing a struct with an int type and a unique_ptr. This
+ // struct can't be copied due to the unique_ptr, so it must be moved.
+ // If this compiles, it means that the int types have move operators.
+ struct NotCopyable {
+ typename TestFixture::T inttype;
+ std::unique_ptr<int> ptr;
+
+ static NotCopyable Make(int i) {
+ NotCopyable f;
+ f.inttype = typename TestFixture::T(i);
+ f.ptr.reset(new int(i));
+ return f;
+ }
+ };
+
+ // Test move constructor.
+ NotCopyable foo = NotCopyable::Make(123);
+ EXPECT_EQ(123, foo.inttype);
+ EXPECT_EQ(123, *foo.ptr);
+
+ // Test move assignment.
+ foo = NotCopyable::Make(321);
+ EXPECT_EQ(321, foo.inttype);
+ EXPECT_EQ(321, *foo.ptr);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/iterator_range.h b/tensorflow/core/lib/gtl/iterator_range.h
new file mode 100644
index 0000000000..baec85c40a
--- /dev/null
+++ b/tensorflow/core/lib/gtl/iterator_range.h
@@ -0,0 +1,49 @@
+// This provides a very simple, boring adaptor for a begin and end iterator
+// into a range type. This should be used to build range views that work well
+// with range based for loops and range based constructors.
+//
+// Note that code here follows more standards-based coding conventions as it
+// is mirroring proposed interfaces for standardization.
+//
+// Converted from chandlerc@'s code to Google style by joshl@.
+
+#ifndef TENSORFLOW_LIB_GTL_ITERATOR_RANGE_H_
+#define TENSORFLOW_LIB_GTL_ITERATOR_RANGE_H_
+
+#include <utility>
+
+namespace tensorflow {
+namespace gtl {
+
+// A range adaptor for a pair of iterators.
+//
+// This just wraps two iterators into a range-compatible interface. Nothing
+// fancy at all.
+template <typename IteratorT>
+class iterator_range {
+ public:
+ iterator_range() : begin_iterator_(), end_iterator_() {}
+ iterator_range(IteratorT begin_iterator, IteratorT end_iterator)
+ : begin_iterator_(std::move(begin_iterator)),
+ end_iterator_(std::move(end_iterator)) {}
+
+ IteratorT begin() const { return begin_iterator_; }
+ IteratorT end() const { return end_iterator_; }
+
+ private:
+ IteratorT begin_iterator_, end_iterator_;
+};
+
+// Convenience function for iterating over sub-ranges.
+//
+// This provides a bit of syntactic sugar to make using sub-ranges
+// in for loops a bit easier. Analogous to std::make_pair().
+template <class T>
+iterator_range<T> make_range(T x, T y) {
+ return iterator_range<T>(std::move(x), std::move(y));
+}
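+
+// A minimal usage sketch (the vector contents are illustrative):
+//   std::vector<int> v = {1, 2, 3, 4};
+//   for (int x : gtl::make_range(v.begin() + 1, v.end())) {
+//     // visits 2, 3, 4
+//   }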
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_ITERATOR_RANGE_H_
diff --git a/tensorflow/core/lib/gtl/iterator_range_test.cc b/tensorflow/core/lib/gtl/iterator_range_test.cc
new file mode 100644
index 0000000000..328be4ecbc
--- /dev/null
+++ b/tensorflow/core/lib/gtl/iterator_range_test.cc
@@ -0,0 +1,60 @@
+#include "tensorflow/core/lib/gtl/iterator_range.h"
+
+#include <vector>
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace gtl {
+namespace {
+
+TEST(IteratorRange, WholeVector) {
+ std::vector<int> v = {2, 3, 5, 7, 11, 13};
+ iterator_range<std::vector<int>::iterator> range(v.begin(), v.end());
+ int index = 0;
+ for (int prime : range) {
+ ASSERT_LT(index, v.size());
+ EXPECT_EQ(v[index], prime);
+ ++index;
+ }
+ EXPECT_EQ(v.size(), index);
+}
+
+TEST(IteratorRange, VectorMakeRange) {
+ std::vector<int> v = {2, 3, 5, 7, 11, 13};
+ auto range = make_range(v.begin(), v.end());
+ int index = 0;
+ for (int prime : range) {
+ ASSERT_LT(index, v.size());
+ EXPECT_EQ(v[index], prime);
+ ++index;
+ }
+ EXPECT_EQ(v.size(), index);
+}
+
+TEST(IteratorRange, PartArray) {
+ int v[] = {2, 3, 5, 7, 11, 13};
+ iterator_range<int*> range(&v[1], &v[4]); // 3, 5, 7
+ int index = 1;
+ for (int prime : range) {
+ ASSERT_LT(index, TF_ARRAYSIZE(v));
+ EXPECT_EQ(v[index], prime);
+ ++index;
+ }
+ EXPECT_EQ(4, index);
+}
+
+TEST(IteratorRange, ArrayMakeRange) {
+ int v[] = {2, 3, 5, 7, 11, 13};
+ auto range = make_range(&v[1], &v[4]); // 3, 5, 7
+ int index = 1;
+ for (int prime : range) {
+ ASSERT_LT(index, TF_ARRAYSIZE(v));
+ EXPECT_EQ(v[index], prime);
+ ++index;
+ }
+ EXPECT_EQ(4, index);
+}
+} // namespace
+} // namespace gtl
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/manual_constructor.h b/tensorflow/core/lib/gtl/manual_constructor.h
new file mode 100644
index 0000000000..39f029ed4a
--- /dev/null
+++ b/tensorflow/core/lib/gtl/manual_constructor.h
@@ -0,0 +1,230 @@
+// ManualConstructor statically-allocates space in which to store some
+// object, but does not initialize it. You can then call the constructor
+// and destructor for the object yourself as you see fit. This is useful
+// for memory management optimizations, where you want to initialize and
+// destroy an object multiple times but only allocate it once.
+//
+// (When I say ManualConstructor statically allocates space, I mean that
+// the ManualConstructor object itself is forced to be the right size.)
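+//
+// A minimal usage sketch (MyType and its arguments are illustrative):
+//   static ManualConstructor<MyType> storage;  // no constructor runs here
+//   storage.Init(arg1, arg2);   // construct in place when first needed
+//   storage->DoWork();          // use like a pointer
+//   storage.Destroy();          // run the destructor explicitly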
+
+#ifndef TENSORFLOW_LIB_GTL_MANUAL_CONSTRUCTOR_H_
+#define TENSORFLOW_LIB_GTL_MANUAL_CONSTRUCTOR_H_
+
+#include <stddef.h>
+#include <new>
+#include <utility>
+
+#include "tensorflow/core/platform/port.h" // For aligned_malloc/aligned_free
+
+namespace tensorflow {
+namespace gtl {
+namespace internal {
+
+//
+// Provides a char array with the exact same alignment as another type. The
+// first parameter must be a complete type, the second parameter is how many
+// of that type to provide space for.
+//
+// TF_LIB_GTL_ALIGNED_CHAR_ARRAY(struct stat, 16) storage_;
+//
+// Because MSVC and older GCCs require that the argument to their alignment
+// construct be a literal constant integer, we use a template instantiated
+// at all the possible powers of two.
+#ifndef SWIG
+template <int alignment, int size>
+struct AlignType {};
+template <int size>
+struct AlignType<0, size> {
+ typedef char result[size];
+};
+#if defined(COMPILER_MSVC)
+#define TF_LIB_GTL_ALIGN_ATTRIBUTE(X) __declspec(align(X))
+#define TF_LIB_GTL_ALIGN_OF(T) __alignof(T)
+#elif defined(COMPILER_GCC3) || __GNUC__ >= 3 || defined(__APPLE__) || \
+ defined(COMPILER_ICC) || defined(OS_NACL) || defined(__clang__)
+#define TF_LIB_GTL_ALIGN_ATTRIBUTE(X) __attribute__((aligned(X)))
+#define TF_LIB_GTL_ALIGN_OF(T) __alignof__(T)
+#endif
+
+#if defined(TF_LIB_GTL_ALIGN_ATTRIBUTE)
+
+#define TF_LIB_GTL_ALIGNTYPE_TEMPLATE(X) \
+ template <int size> \
+ struct AlignType<X, size> { \
+ typedef TF_LIB_GTL_ALIGN_ATTRIBUTE(X) char result[size]; \
+ }
+
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(1);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(2);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(4);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(8);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(16);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(32);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(64);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(128);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(256);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(512);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(1024);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(2048);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(4096);
+TF_LIB_GTL_ALIGNTYPE_TEMPLATE(8192);
+// Any larger and MSVC++ will complain.
+
+#define TF_LIB_GTL_ALIGNED_CHAR_ARRAY(T, Size) \
+ typename tensorflow::gtl::internal::AlignType<TF_LIB_GTL_ALIGN_OF(T), \
+ sizeof(T) * Size>::result
+
+#undef TF_LIB_GTL_ALIGNTYPE_TEMPLATE
+#undef TF_LIB_GTL_ALIGN_ATTRIBUTE
+
+#else // defined(TF_LIB_GTL_ALIGN_ATTRIBUTE)
+#error "You must define TF_LIB_GTL_ALIGNED_CHAR_ARRAY for your compiler."
+#endif // defined(TF_LIB_GTL_ALIGN_ATTRIBUTE)
+
+#else // !SWIG
+
+// SWIG can't represent alignment and doesn't care about alignment on data
+// members (it works fine without it).
+template <typename Size>
+struct AlignType {
+ typedef char result[Size];
+};
+#define TF_LIB_GTL_ALIGNED_CHAR_ARRAY(T, Size) \
+ tensorflow::gtl::internal::AlignType<Size * sizeof(T)>::result
+
+// Enough to parse with SWIG, will never be used by running code.
+#define TF_LIB_GTL_ALIGN_OF(Type) 16
+
+#endif // !SWIG
+
+} // namespace internal
+} // namespace gtl
+
+template <typename Type>
+class ManualConstructor {
+ public:
+ // No constructor or destructor because one of the most useful uses of
+ // this class is as part of a union, and members of a union cannot have
+ // constructors or destructors. And, anyway, the whole point of this
+ // class is to bypass these.
+
+ // Support users creating arrays of ManualConstructor<>s. This ensures that
+ // the array itself has the correct alignment.
+ static void* operator new[](size_t size) {
+ return port::aligned_malloc(size, TF_LIB_GTL_ALIGN_OF(Type));
+ }
+ static void operator delete[](void* mem) { port::aligned_free(mem); }
+
+ inline Type* get() { return reinterpret_cast<Type*>(space_); }
+ inline const Type* get() const {
+ return reinterpret_cast<const Type*>(space_);
+ }
+
+ inline Type* operator->() { return get(); }
+ inline const Type* operator->() const { return get(); }
+
+ inline Type& operator*() { return *get(); }
+ inline const Type& operator*() const { return *get(); }
+
+ inline void Init() { new (space_) Type; }
+
+// Init() constructs the Type instance using the given arguments
+// (which are forwarded to Type's constructor). In C++11, Init() can
+// take any number of arguments of any type, and forwards them perfectly.
+// On pre-C++11 platforms, it can take up to 11 arguments, and may not be
+// able to forward certain kinds of arguments.
+//
+// Note that Init() with no arguments performs default-initialization,
+// not zero-initialization (i.e., it behaves the same as "new Type;", not
+// "new Type();"), so it will leave non-class types uninitialized.
+#ifdef LANG_CXX11
+ template <typename... Ts>
+ inline void Init(Ts&&... args) { // NOLINT
+ new (space_) Type(std::forward<Ts>(args)...); // NOLINT
+ }
+#else // !defined(LANG_CXX11)
+ template <typename T1>
+ inline void Init(const T1& p1) {
+ new (space_) Type(p1);
+ }
+
+ template <typename T1, typename T2>
+ inline void Init(const T1& p1, const T2& p2) {
+ new (space_) Type(p1, p2);
+ }
+
+ template <typename T1, typename T2, typename T3>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3) {
+ new (space_) Type(p1, p2, p3);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4) {
+ new (space_) Type(p1, p2, p3, p4);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5) {
+ new (space_) Type(p1, p2, p3, p4, p5);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6, typename T7>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6, const T7& p7) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6, p7);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6, typename T7, typename T8>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6, const T7& p7, const T8& p8) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6, p7, p8);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6, typename T7, typename T8, typename T9>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6, const T7& p7, const T8& p8,
+ const T9& p9) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6, p7, p8, p9);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6, typename T7, typename T8, typename T9, typename T10>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6, const T7& p7, const T8& p8,
+ const T9& p9, const T10& p10) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10);
+ }
+
+ template <typename T1, typename T2, typename T3, typename T4, typename T5,
+ typename T6, typename T7, typename T8, typename T9, typename T10,
+ typename T11>
+ inline void Init(const T1& p1, const T2& p2, const T3& p3, const T4& p4,
+ const T5& p5, const T6& p6, const T7& p7, const T8& p8,
+ const T9& p9, const T10& p10, const T11& p11) {
+ new (space_) Type(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11);
+ }
+#endif // LANG_CXX11
+
+ inline void Destroy() { get()->~Type(); }
+
+ private:
+ TF_LIB_GTL_ALIGNED_CHAR_ARRAY(Type, 1) space_;
+};
+
+#undef TF_LIB_GTL_ALIGNED_CHAR_ARRAY
+#undef TF_LIB_GTL_ALIGN_OF
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_MANUAL_CONSTRUCTOR_H_
diff --git a/tensorflow/core/lib/gtl/manual_constructor_test.cc b/tensorflow/core/lib/gtl/manual_constructor_test.cc
new file mode 100644
index 0000000000..a929591be2
--- /dev/null
+++ b/tensorflow/core/lib/gtl/manual_constructor_test.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/core/lib/gtl/manual_constructor.h"
+
+#include <stdint.h>
+
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+static int constructor_count_ = 0;
+
+template <int kSize>
+struct TestN {
+ TestN() { ++constructor_count_; }
+ ~TestN() { --constructor_count_; }
+ char a[kSize];
+};
+
+typedef TestN<1> Test1;
+typedef TestN<2> Test2;
+typedef TestN<3> Test3;
+typedef TestN<4> Test4;
+typedef TestN<5> Test5;
+typedef TestN<9> Test9;
+typedef TestN<15> Test15;
+
+} // namespace
+
+namespace {
+
+TEST(ManualConstructorTest, Sizeof) {
+ CHECK_EQ(sizeof(ManualConstructor<Test1>), sizeof(Test1));
+ CHECK_EQ(sizeof(ManualConstructor<Test2>), sizeof(Test2));
+ CHECK_EQ(sizeof(ManualConstructor<Test3>), sizeof(Test3));
+ CHECK_EQ(sizeof(ManualConstructor<Test4>), sizeof(Test4));
+ CHECK_EQ(sizeof(ManualConstructor<Test5>), sizeof(Test5));
+ CHECK_EQ(sizeof(ManualConstructor<Test9>), sizeof(Test9));
+ CHECK_EQ(sizeof(ManualConstructor<Test15>), sizeof(Test15));
+
+ CHECK_EQ(constructor_count_, 0);
+ ManualConstructor<Test1> mt[4];
+ CHECK_EQ(sizeof(mt), 4);
+ CHECK_EQ(constructor_count_, 0);
+ mt[0].Init();
+ CHECK_EQ(constructor_count_, 1);
+ mt[0].Destroy();
+}
+
+TEST(ManualConstructorTest, Alignment) {
+ // We want to make sure that ManualConstructor aligns its memory properly
+ // on a word barrier. Otherwise, it might be unexpectedly slow, since
+ // memory access will be unaligned.
+
+ struct {
+ char a;
+ ManualConstructor<void*> b;
+ } test1;
+ struct {
+ char a;
+ void* b;
+ } control1;
+
+ // TODO(bww): Make these tests more direct with C++11 alignment_of<T>::value.
+ EXPECT_EQ(reinterpret_cast<char*>(test1.b.get()) - &test1.a,
+ reinterpret_cast<char*>(&control1.b) - &control1.a);
+ EXPECT_EQ(reinterpret_cast<intptr_t>(test1.b.get()) % sizeof(control1.b), 0);
+
+ struct {
+ char a;
+ ManualConstructor<long double> b;
+ } test2;
+ struct {
+ char a;
+ long double b;
+ } control2;
+
+ EXPECT_EQ(reinterpret_cast<char*>(test2.b.get()) - &test2.a,
+ reinterpret_cast<char*>(&control2.b) - &control2.a);
+#ifdef ARCH_K8
+ EXPECT_EQ(reinterpret_cast<intptr_t>(test2.b.get()) % 16, 0);
+#endif
+#ifdef ARCH_PIII
+ EXPECT_EQ(reinterpret_cast<intptr_t>(test2.b.get()) % 4, 0);
+#endif
+}
+
+TEST(ManualConstructorTest, DefaultInitialize) {
+ struct X {
+ X() : x(123) {}
+ int x;
+ };
+ union {
+ ManualConstructor<X> x;
+ ManualConstructor<int> y;
+ } u;
+ *u.y = -1;
+ u.x.Init(); // should default-initialize u.x
+ EXPECT_EQ(123, u.x->x);
+}
+
+TEST(ManualConstructorTest, ZeroInitializePOD) {
+ union {
+ ManualConstructor<int> x;
+ ManualConstructor<int> y;
+ } u;
+ *u.y = -1;
+ u.x.Init(); // should not zero-initialize u.x
+ EXPECT_EQ(-1, *u.y);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/map_util.h b/tensorflow/core/lib/gtl/map_util.h
new file mode 100644
index 0000000000..c953de57c7
--- /dev/null
+++ b/tensorflow/core/lib/gtl/map_util.h
@@ -0,0 +1,123 @@
+// This file provides utility functions for use with STL map-like data
+// structures, such as std::map and hash_map. Some functions, such as
+// InsertIfNotPresent(), also work with sets.
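+//
+// A minimal usage sketch (the map contents are illustrative):
+//   std::map<string, int> counts;
+//   gtl::InsertIfNotPresent(&counts, "foo", 1);      // true: newly inserted
+//   const int* v = gtl::FindOrNull(counts, "bar");   // null: no such key
+//   int n = gtl::FindWithDefault(counts, "bar", 0);  // 0: the default value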
+
+#ifndef TENSORFLOW_LIB_GTL_MAP_UTIL_H_
+#define TENSORFLOW_LIB_GTL_MAP_UTIL_H_
+
+#include <stddef.h>
+#include <iterator>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace tensorflow {
+namespace gtl {
+
+// Returns a pointer to the const value associated with the given key if it
+// exists, or NULL otherwise.
+template <class Collection>
+const typename Collection::value_type::second_type* FindOrNull(
+ const Collection& collection,
+ const typename Collection::value_type::first_type& key) {
+ typename Collection::const_iterator it = collection.find(key);
+ if (it == collection.end()) {
+ return 0;
+ }
+ return &it->second;
+}
+
+// Same as above but returns a pointer to the non-const value.
+template <class Collection>
+typename Collection::value_type::second_type* FindOrNull(
+ Collection& collection, // NOLINT
+ const typename Collection::value_type::first_type& key) {
+ typename Collection::iterator it = collection.find(key);
+ if (it == collection.end()) {
+ return 0;
+ }
+ return &it->second;
+}
+
+// Returns the pointer value associated with the given key. If none is found,
+// NULL is returned. The function is designed to be used with a map of keys to
+// pointers.
+//
+// This function does not distinguish between a missing key and a key mapped
+// to a NULL value.
+template <class Collection>
+typename Collection::value_type::second_type FindPtrOrNull(
+ const Collection& collection,
+ const typename Collection::value_type::first_type& key) {
+ typename Collection::const_iterator it = collection.find(key);
+ if (it == collection.end()) {
+ return typename Collection::value_type::second_type();
+ }
+ return it->second;
+}
+
+// Returns a const reference to the value associated with the given key if it
+// exists, otherwise returns a const reference to the provided default value.
+//
+// WARNING: If a temporary object is passed as the default "value,"
+// this function will return a reference to that temporary object,
+// which will be destroyed at the end of the statement. A common
+// example: if you have a map with string values, and you pass a char*
+// as the default "value," either use the returned value immediately
+// or store it in a string (not string&).
+template <class Collection>
+const typename Collection::value_type::second_type& FindWithDefault(
+ const Collection& collection,
+ const typename Collection::value_type::first_type& key,
+ const typename Collection::value_type::second_type& value) {
+ typename Collection::const_iterator it = collection.find(key);
+ if (it == collection.end()) {
+ return value;
+ }
+ return it->second;
+}
+
+// Inserts the given key and value into the given collection if and only if the
+// given key did NOT already exist in the collection. If the key previously
+// existed in the collection, the value is not changed. Returns true if the
+// key-value pair was inserted; returns false if the key was already present.
+template <class Collection>
+bool InsertIfNotPresent(Collection* const collection,
+ const typename Collection::value_type& vt) {
+ return collection->insert(vt).second;
+}
+
+// Same as above except the key and value are passed separately.
+template <class Collection>
+bool InsertIfNotPresent(
+ Collection* const collection,
+ const typename Collection::value_type::first_type& key,
+ const typename Collection::value_type::second_type& value) {
+ return InsertIfNotPresent(collection,
+ typename Collection::value_type(key, value));
+}
+
+// Looks up a given key and value pair in a collection and inserts the key-value
+// pair if it's not already present. Returns a reference to the value associated
+// with the key.
+template <class Collection>
+typename Collection::value_type::second_type& LookupOrInsert(
+ Collection* const collection, const typename Collection::value_type& vt) {
+ return collection->insert(vt).first->second;
+}
+
+// Same as above except the key-value are passed separately.
+template <class Collection>
+typename Collection::value_type::second_type& LookupOrInsert(
+ Collection* const collection,
+ const typename Collection::value_type::first_type& key,
+ const typename Collection::value_type::second_type& value) {
+ return LookupOrInsert(collection,
+ typename Collection::value_type(key, value));
+}
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_MAP_UTIL_H_
diff --git a/tensorflow/core/lib/gtl/map_util_test.cc b/tensorflow/core/lib/gtl/map_util_test.cc
new file mode 100644
index 0000000000..356f987337
--- /dev/null
+++ b/tensorflow/core/lib/gtl/map_util_test.cc
@@ -0,0 +1,47 @@
+#include "tensorflow/core/lib/gtl/map_util.h"
+
+#include <map>
+#include <set>
+#include <string>
+#include "tensorflow/core/platform/port.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(MapUtil, Find) {
+ typedef std::map<string, string> Map;
+ Map m;
+
+ // Check that I can use a type that's implicitly convertible to the
+ // key or value type, such as const char* -> string.
+ EXPECT_EQ("", gtl::FindWithDefault(m, "foo", ""));
+ m["foo"] = "bar";
+ EXPECT_EQ("bar", gtl::FindWithDefault(m, "foo", ""));
+ EXPECT_EQ("bar", *gtl::FindOrNull(m, "foo"));
+ string str;
+ EXPECT_TRUE(m.count("foo") > 0);
+ EXPECT_EQ(m["foo"], "bar");
+}
+
+TEST(MapUtil, LookupOrInsert) {
+ typedef std::map<string, string> Map;
+ Map m;
+
+ // Check that I can use a type that's implicitly convertible to the
+ // key or value type, such as const char* -> string.
+ EXPECT_EQ("xyz", gtl::LookupOrInsert(&m, "foo", "xyz"));
+ EXPECT_EQ("xyz", gtl::LookupOrInsert(&m, "foo", "abc"));
+}
+
+TEST(MapUtil, InsertIfNotPresent) {
+ // Set operations
+ typedef std::set<int> Set;
+ Set s;
+ EXPECT_TRUE(gtl::InsertIfNotPresent(&s, 0));
+ EXPECT_EQ(s.count(0), 1);
+ EXPECT_FALSE(gtl::InsertIfNotPresent(&s, 0));
+ EXPECT_EQ(s.count(0), 1);
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/gtl/stl_util.h b/tensorflow/core/lib/gtl/stl_util.h
new file mode 100644
index 0000000000..83abcd6b55
--- /dev/null
+++ b/tensorflow/core/lib/gtl/stl_util.h
@@ -0,0 +1,130 @@
+// This file provides utility functions for use with STL containers.
+
+#ifndef TENSORFLOW_LIB_GTL_STL_UTIL_H_
+#define TENSORFLOW_LIB_GTL_STL_UTIL_H_
+
+#include <stddef.h>
+#include <algorithm>
+#include <iterator>
+#include <memory>
+#include <string>
+#include <utility>
+#include <vector>
+
+namespace tensorflow {
+namespace gtl {
+
+// Returns a mutable char* pointing to a string's internal buffer, which may not
+// be null-terminated. Returns NULL for an empty string. If non-null,
+// writing through this pointer will modify the string.
+//
+// string_as_array(&str)[i] is valid for 0 <= i < str.size() until the
+// next call to a string method that invalidates iterators.
+//
+// In C++11 you may simply use &str[0] to get a mutable char*.
+//
+// Prior to C++11, there was no standard-blessed way of getting a mutable
+// reference to a string's internal buffer. The requirement that string be
+// contiguous is officially part of the C++11 standard [string.require]/5.
+// According to Matt Austern, this should already work on all current C++98
+// implementations.
+inline char* string_as_array(string* str) {
+ return str->empty() ? NULL : &*str->begin();
+}
+
+// Returns the T* array for the given vector, or NULL if the vector was empty.
+//
+// Note: If you know the array will never be empty, you can use &*v.begin()
+// directly, but that may dump core if v is empty. This function is the most
+// efficient code that will work, taking into account how our STL is actually
+// implemented. THIS IS NON-PORTABLE CODE, so use this function instead of
+// repeating the nonportable code everywhere. If our STL implementation changes,
+// we will need to change this as well.
+template <typename T, typename Allocator>
+inline T* vector_as_array(std::vector<T, Allocator>* v) {
+#if defined NDEBUG && !defined _GLIBCXX_DEBUG
+ return &*v->begin();
+#else
+ return v->empty() ? NULL : &*v->begin();
+#endif
+}
+// vector_as_array overload for const std::vector<>.
+template <typename T, typename Allocator>
+inline const T* vector_as_array(const std::vector<T, Allocator>* v) {
+#if defined NDEBUG && !defined _GLIBCXX_DEBUG
+ return &*v->begin();
+#else
+ return v->empty() ? NULL : &*v->begin();
+#endif
+}
+
+// Like str->resize(new_size), except any new characters added to "*str" as a
+// result of resizing may be left uninitialized, rather than being filled with
+// '0' bytes. Typically used when code is then going to overwrite the backing
+// store of the string with known data. Uses a Google extension to ::string.
+inline void STLStringResizeUninitialized(string* s, size_t new_size) {
+#if __google_stl_resize_uninitialized_string
+ s->resize_uninitialized(new_size);
+#else
+ s->resize(new_size);
+#endif
+}
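+
+// For example, a typical fill-after-resize pattern might look like this
+// (a sketch; "src" and "n" are illustrative):
+//   string buf;
+//   STLStringResizeUninitialized(&buf, n);
+//   memcpy(string_as_array(&buf), src, n);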
+
+// Calls delete (non-array version) on the SECOND item (pointer) in each pair in
+// the range [begin, end).
+//
+// Note: If you're calling this on an entire container, you probably want to
+// call STLDeleteValues(&container) instead, or use ValueDeleter.
+template <typename ForwardIterator>
+void STLDeleteContainerPairSecondPointers(ForwardIterator begin,
+ ForwardIterator end) {
+ while (begin != end) {
+ ForwardIterator temp = begin;
+ ++begin;
+ delete temp->second;
+ }
+}
+
+// Deletes all the elements in an STL container and clears the container. This
+// function is suitable for use with a vector, set, hash_set, or any other STL
+// container which defines sensible begin(), end(), and clear() methods.
+//
+// If container is NULL, this function is a no-op.
+template <typename T>
+void STLDeleteElements(T* container) {
+ if (!container) return;
+ auto it = container->begin();
+ while (it != container->end()) {
+ auto temp = it;
+ ++it;
+ delete *temp;
+ }
+ container->clear();
+}
+
+// Given an STL container consisting of (key, value) pairs, STLDeleteValues
+// deletes all the "value" components and clears the container. Does nothing in
+// the case it's given a NULL pointer.
+template <typename T>
+void STLDeleteValues(T* container) {
+ if (!container) return;
+ auto it = container->begin();
+ while (it != container->end()) {
+ auto temp = it;
+ ++it;
+ delete temp->second;
+ }
+ container->clear();
+}
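+
+// For example (a sketch; Foo is an illustrative type):
+//   std::map<int, Foo*> m;
+//   m[1] = new Foo;
+//   STLDeleteValues(&m);  // deletes the Foo and leaves m empty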
+
+// Sorts and removes duplicates from a sequence container.
+template <typename T>
+inline void STLSortAndRemoveDuplicates(T* v) {
+ std::sort(v->begin(), v->end());
+ v->erase(std::unique(v->begin(), v->end()), v->end());
+}
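+
+// For example (a sketch):
+//   std::vector<int> v = {3, 1, 3, 2, 1};
+//   STLSortAndRemoveDuplicates(&v);  // v is now {1, 2, 3}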
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_STL_UTIL_H_
diff --git a/tensorflow/core/lib/gtl/top_n.h b/tensorflow/core/lib/gtl/top_n.h
new file mode 100644
index 0000000000..b95b998c21
--- /dev/null
+++ b/tensorflow/core/lib/gtl/top_n.h
@@ -0,0 +1,324 @@
+// This simple class finds the top n elements of an incrementally provided set
+// of elements which you push one at a time. If the number of elements exceeds
+// n, the lowest elements are incrementally dropped. At the end you get
+// a vector of the top elements sorted in descending order (through Extract() or
+// ExtractNondestructive()), or a vector of the top elements but not sorted
+// (through ExtractUnsorted() or ExtractUnsortedNondestructive()).
+//
+// The value n is specified in the constructor. If there are p elements pushed
+// altogether:
+// The total storage requirements are O(min(n, p)) elements
+// The running time is O(p * log(min(n, p))) comparisons
+// If n is a constant, the total storage required is a constant and the running
+// time is linear in p.
+//
+// NOTE(zhifengc): There is a way to do this in O(min(n, p)) storage and O(p)
+// runtime. The basic idea is to repeatedly fill up a buffer of 2 * n elements,
+// discarding the lowest n elements whenever the buffer is full using a linear-
+// time median algorithm. This may have better performance when the input
+// sequence is partially sorted.
+//
+// NOTE(zhifengc): This class should be redesigned to avoid reallocating a
+// vector for each Extract.
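+//
+// A minimal usage sketch (the pushed values are illustrative):
+//   TopN<int> top3(3);                  // keeps the 3 largest values
+//   for (int x : {5, 1, 9, 7, 3}) top3.push(x);
+//   std::unique_ptr<std::vector<int>> best(top3.Extract());  // {9, 7, 5}
+//   top3.Reset();                       // required before pushing again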
+
+#ifndef TENSORFLOW_LIB_GTL_TOP_N_H_
+#define TENSORFLOW_LIB_GTL_TOP_N_H_
+
+#include <stddef.h>
+#include <algorithm>
+#include <functional>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace gtl {
+
+// Cmp is an STL binary predicate. Note that Cmp is the "greater" predicate,
+// not the more commonly used "less" predicate.
+//
+// If you use a "less" predicate here, the TopN will pick out the bottom N
+// elements out of the ones passed to it, and it will return them sorted in
+// ascending order.
+//
+// TopN is rule-of-zero copyable and movable if its members are.
+template <class T, class Cmp = std::greater<T> >
+class TopN {
+ public:
+ // The TopN is in one of the three states:
+ //
+ // o UNORDERED: this is the state an instance is originally in,
+ // where the elements are completely orderless.
+ //
+ // o BOTTOM_KNOWN: in this state, we keep the invariant that there
+ // is at least one element in it, and the lowest element is at
+ // position 0. The elements in other positions remain
+ // unsorted. This state is reached if the state was originally
+ // UNORDERED and a peek_bottom() function call is invoked.
+ //
+ // o HEAP_SORTED: in this state, the array is kept as a heap and
+ // there are exactly (limit_+1) elements in the array. This
+ // state is reached when at least (limit_+1) elements are
+ // pushed in.
+ //
+  // The state transition graph is as follows:
+ //
+ // peek_bottom() (limit_+1) elements
+ // UNORDERED --------------> BOTTOM_KNOWN --------------------> HEAP_SORTED
+ // | ^
+ // | (limit_+1) elements |
+ // +-----------------------------------------------------------+
+
+ enum State { UNORDERED, BOTTOM_KNOWN, HEAP_SORTED };
+ using UnsortedIterator = typename std::vector<T>::const_iterator;
+
+ // 'limit' is the maximum number of top results to return.
+ explicit TopN(size_t limit) : TopN(limit, Cmp()) {}
+ TopN(size_t limit, const Cmp &cmp) : limit_(limit), cmp_(cmp) {}
+
+ size_t limit() const { return limit_; }
+
+ // Number of elements currently held by this TopN object. This
+ // will be no greater than 'limit' passed to the constructor.
+ size_t size() const { return std::min(elements_.size(), limit_); }
+
+ bool empty() const { return size() == 0; }
+
+ // If you know how many elements you will push at the time you create the
+ // TopN object, you can call reserve to preallocate the memory that TopN
+ // will need to process all 'n' pushes. Calling this method is optional.
+ void reserve(size_t n) { elements_.reserve(std::min(n, limit_ + 1)); }
+
+ // Push 'v'. If the maximum number of elements was exceeded, drop the
+ // lowest element and return it in 'dropped' (if given). If the maximum is not
+ // exceeded, 'dropped' will remain unchanged. 'dropped' may be omitted or
+ // nullptr, in which case it is not filled in.
+ // Requires: T is CopyAssignable, Swappable
+ void push(const T &v) { push(v, nullptr); }
+ void push(const T &v, T *dropped) { PushInternal(v, dropped); }
+
+ // Move overloads of push.
+ // Requires: T is MoveAssignable, Swappable
+ void push(T &&v) { // NOLINT(build/c++11)
+ push(std::move(v), nullptr);
+ }
+ void push(T &&v, T *dropped) { // NOLINT(build/c++11)
+ PushInternal(std::move(v), dropped);
+ }
+
+ // Peeks the bottom result without calling Extract()
+ const T &peek_bottom();
+
+ // Extract the elements as a vector sorted in descending order. The caller
+ // assumes ownership of the vector and must delete it when done. This is a
+ // destructive operation. The only method that can be called immediately
+ // after Extract() is Reset().
+ std::vector<T> *Extract();
+
+ // Similar to Extract(), but makes no guarantees the elements are in sorted
+ // order. As with Extract(), the caller assumes ownership of the vector and
+ // must delete it when done. This is a destructive operation. The only
+ // method that can be called immediately after ExtractUnsorted() is Reset().
+ std::vector<T> *ExtractUnsorted();
+
+ // A non-destructive version of Extract(). Copy the elements in a new vector
+ // sorted in descending order and return it. The caller assumes ownership of
+ // the new vector and must delete it when done. After calling
+ // ExtractNondestructive(), the caller can continue to push() new elements.
+ std::vector<T> *ExtractNondestructive() const;
+
+ // A non-destructive version of Extract(). Copy the elements to a given
+ // vector sorted in descending order. After calling
+ // ExtractNondestructive(), the caller can continue to push() new elements.
+ // Note:
+  //  1. The given argument must not be null.
+ // 2. Any data contained in the vector prior to the call will be deleted
+ // from it. After the call the vector will contain only the elements
+ // from the data structure.
+ void ExtractNondestructive(std::vector<T> *output) const;
+
+ // A non-destructive version of ExtractUnsorted(). Copy the elements in a new
+ // vector and return it, with no guarantees the elements are in sorted order.
+ // The caller assumes ownership of the new vector and must delete it when
+ // done. After calling ExtractUnsortedNondestructive(), the caller can
+ // continue to push() new elements.
+ std::vector<T> *ExtractUnsortedNondestructive() const;
+
+ // A non-destructive version of ExtractUnsorted(). Copy the elements into
+ // a given vector, with no guarantees the elements are in sorted order.
+ // After calling ExtractUnsortedNondestructive(), the caller can continue
+ // to push() new elements.
+ // Note:
+  //  1. The given argument must not be null.
+ // 2. Any data contained in the vector prior to the call will be deleted
+ // from it. After the call the vector will contain only the elements
+ // from the data structure.
+ void ExtractUnsortedNondestructive(std::vector<T> *output) const;
+
+ // Return an iterator to the beginning (end) of the container,
+ // with no guarantees about the order of iteration. These iterators are
+ // invalidated by mutation of the data structure.
+ UnsortedIterator unsorted_begin() const { return elements_.begin(); }
+ UnsortedIterator unsorted_end() const { return elements_.begin() + size(); }
+
+ // Accessor for comparator template argument.
+ Cmp *comparator() { return &cmp_; }
+
+ // This removes all elements. If Extract() or ExtractUnsorted() have been
+  // called, this will put it back in an empty but usable state.
+ void Reset();
+
+ private:
+ template <typename U>
+ void PushInternal(U &&v, T *dropped); // NOLINT(build/c++11)
+
+ // elements_ can be in one of two states:
+ // elements_.size() <= limit_: elements_ is an unsorted vector of elements
+ // pushed so far.
+ // elements_.size() > limit_: The last element of elements_ is unused;
+  //     the other elements of elements_ are an STL heap whose size is exactly
+ // limit_. In this case elements_.size() is exactly one greater than
+ // limit_, but don't use "elements_.size() == limit_ + 1" to check for
+ // that because you'll get a false positive if limit_ == size_t(-1).
+ std::vector<T> elements_;
+ size_t limit_; // Maximum number of elements to find
+ Cmp cmp_; // Greater-than comparison function
+ State state_ = UNORDERED;
+};
+
+// ----------------------------------------------------------------------
+// Implementations of non-inline functions
+
+template <class T, class Cmp>
+template <typename U>
+void TopN<T, Cmp>::PushInternal(U &&v, T *dropped) { // NOLINT(build/c++11)
+ if (limit_ == 0) {
+ if (dropped) *dropped = std::forward<U>(v); // NOLINT(build/c++11)
+ return;
+ }
+ if (state_ != HEAP_SORTED) {
+ elements_.push_back(std::forward<U>(v)); // NOLINT(build/c++11)
+ if (state_ == UNORDERED || cmp_(elements_.back(), elements_.front())) {
+ // Easy case: we just pushed the new element back
+ } else {
+ // To maintain the BOTTOM_KNOWN state, we need to make sure that
+ // the element at position 0 is always the smallest. So we put
+ // the new element at position 0 and push the original bottom
+ // element in the back.
+ // Warning: this code is subtle.
+ using std::swap;
+ swap(elements_.front(), elements_.back());
+ }
+ if (elements_.size() == limit_ + 1) {
+ // Transition from unsorted vector to a heap.
+ std::make_heap(elements_.begin(), elements_.end(), cmp_);
+ if (dropped) *dropped = std::move(elements_.front());
+ std::pop_heap(elements_.begin(), elements_.end(), cmp_);
+ state_ = HEAP_SORTED;
+ }
+ } else {
+ // Only insert the new element if it is greater than the least element.
+ if (cmp_(v, elements_.front())) {
+ elements_.back() = std::forward<U>(v); // NOLINT(build/c++11)
+ std::push_heap(elements_.begin(), elements_.end(), cmp_);
+ if (dropped) *dropped = std::move(elements_.front());
+ std::pop_heap(elements_.begin(), elements_.end(), cmp_);
+ } else {
+ if (dropped) *dropped = std::forward<U>(v); // NOLINT(build/c++11)
+ }
+ }
+}
+
+template <class T, class Cmp>
+const T &TopN<T, Cmp>::peek_bottom() {
+ CHECK(!empty());
+ if (state_ == UNORDERED) {
+ // We need to do a linear scan to find out the bottom element
+ int min_candidate = 0;
+ for (size_t i = 1; i < elements_.size(); ++i) {
+ if (cmp_(elements_[min_candidate], elements_[i])) {
+ min_candidate = i;
+ }
+ }
+ // By swapping the element at position 0 and the minimal
+ // element, we transition to the BOTTOM_KNOWN state
+ if (min_candidate != 0) {
+ using std::swap;
+ swap(elements_[0], elements_[min_candidate]);
+ }
+ state_ = BOTTOM_KNOWN;
+ }
+ return elements_.front();
+}
+
+template <class T, class Cmp>
+std::vector<T> *TopN<T, Cmp>::Extract() {
+ auto out = new std::vector<T>;
+ out->swap(elements_);
+ if (state_ != HEAP_SORTED) {
+ std::sort(out->begin(), out->end(), cmp_);
+ } else {
+ out->pop_back();
+ std::sort_heap(out->begin(), out->end(), cmp_);
+ }
+ return out;
+}
+
+template <class T, class Cmp>
+std::vector<T> *TopN<T, Cmp>::ExtractUnsorted() {
+ auto out = new std::vector<T>;
+ out->swap(elements_);
+ if (state_ == HEAP_SORTED) {
+ // Remove the limit_+1'th element.
+ out->pop_back();
+ }
+ return out;
+}
+
+template <class T, class Cmp>
+std::vector<T> *TopN<T, Cmp>::ExtractNondestructive() const {
+ auto out = new std::vector<T>;
+ ExtractNondestructive(out);
+ return out;
+}
+
+template <class T, class Cmp>
+void TopN<T, Cmp>::ExtractNondestructive(std::vector<T> *output) const {
+ CHECK(output);
+ *output = elements_;
+ if (state_ != HEAP_SORTED) {
+ std::sort(output->begin(), output->end(), cmp_);
+ } else {
+ output->pop_back();
+ std::sort_heap(output->begin(), output->end(), cmp_);
+ }
+}
+
+template <class T, class Cmp>
+std::vector<T> *TopN<T, Cmp>::ExtractUnsortedNondestructive() const {
+ auto elements = new std::vector<T>;
+ ExtractUnsortedNondestructive(elements);
+ return elements;
+}
+
+template <class T, class Cmp>
+void TopN<T, Cmp>::ExtractUnsortedNondestructive(std::vector<T> *output) const {
+ CHECK(output);
+ *output = elements_;
+ if (state_ == HEAP_SORTED) {
+ // Remove the limit_+1'th element.
+ output->pop_back();
+ }
+}
+
+template <class T, class Cmp>
+void TopN<T, Cmp>::Reset() {
+ elements_.clear();
+ state_ = UNORDERED;
+}
+
+} // namespace gtl
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_GTL_TOP_N_H_
diff --git a/tensorflow/core/lib/gtl/top_n_test.cc b/tensorflow/core/lib/gtl/top_n_test.cc
new file mode 100644
index 0000000000..1812a1bd3f
--- /dev/null
+++ b/tensorflow/core/lib/gtl/top_n_test.cc
@@ -0,0 +1,249 @@
+// Unit test for TopN.
+
+#include "tensorflow/core/lib/gtl/top_n.h"
+
+#include <string>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace {
+
+using tensorflow::gtl::TopN;
+using tensorflow::random::PhiloxRandom;
+using tensorflow::random::SimplePhilox;
+using tensorflow::string;
+
+// Move the contents from an owned raw pointer, returning by value.
+// Objects are easier to manage by value.
+template <class T>
+T ConsumeRawPtr(T *p) {
+ T tmp = std::move(*p);
+ delete p;
+ return tmp;
+}
+
+template <class Cmp>
+void TestIntTopNHelper(size_t limit, size_t n_elements, const Cmp &cmp,
+ SimplePhilox *random, bool test_peek,
+ bool test_extract_unsorted) {
+ LOG(INFO) << "Testing limit=" << limit << ", n_elements=" << n_elements
+ << ", test_peek=" << test_peek
+ << ", test_extract_unsorted=" << test_extract_unsorted;
+ TopN<int, Cmp> top(limit, cmp);
+ std::vector<int> shadow(n_elements);
+ for (int i = 0; i != n_elements; ++i) shadow[i] = random->Uniform(limit);
+ for (int e : shadow) top.push(e);
+ std::sort(shadow.begin(), shadow.end(), cmp);
+ size_t top_size = std::min(limit, n_elements);
+ EXPECT_EQ(top_size, top.size());
+ if (test_peek && top_size != 0) {
+ EXPECT_EQ(shadow[top_size - 1], top.peek_bottom());
+ }
+ std::vector<int> v;
+ if (test_extract_unsorted) {
+ v = ConsumeRawPtr(top.ExtractUnsorted());
+ std::sort(v.begin(), v.end(), cmp);
+ } else {
+ v = ConsumeRawPtr(top.Extract());
+ }
+ EXPECT_EQ(top_size, v.size());
+ for (int i = 0; i != top_size; ++i) {
+ VLOG(1) << "Top element " << v[i];
+ EXPECT_EQ(shadow[i], v[i]);
+ }
+}
+
+template <class Cmp>
+void TestIntTopN(size_t limit, size_t n_elements, const Cmp &cmp,
+ SimplePhilox *random) {
+ // Test peek_bottom() and Extract()
+ TestIntTopNHelper(limit, n_elements, cmp, random, true, false);
+ // Test Extract()
+ TestIntTopNHelper(limit, n_elements, cmp, random, false, false);
+ // Test peek_bottom() and ExtractUnsorted()
+ TestIntTopNHelper(limit, n_elements, cmp, random, true, true);
+ // Test ExtractUnsorted()
+ TestIntTopNHelper(limit, n_elements, cmp, random, false, true);
+}
+
+TEST(TopNTest, Misc) {
+ PhiloxRandom philox(1, 1);
+ SimplePhilox random(&philox);
+
+ TestIntTopN(0, 5, std::greater<int>(), &random);
+ TestIntTopN(32, 0, std::greater<int>(), &random);
+ TestIntTopN(6, 6, std::greater<int>(), &random);
+ TestIntTopN(6, 6, std::less<int>(), &random);
+ TestIntTopN(1000, 999, std::greater<int>(), &random);
+ TestIntTopN(1000, 1000, std::greater<int>(), &random);
+ TestIntTopN(1000, 1001, std::greater<int>(), &random);
+ TestIntTopN(2300, 28393, std::less<int>(), &random);
+ TestIntTopN(30, 100, std::greater<int>(), &random);
+ TestIntTopN(100, 30, std::less<int>(), &random);
+ TestIntTopN(size_t(-1), 3, std::greater<int>(), &random);
+ TestIntTopN(size_t(-1), 0, std::greater<int>(), &random);
+ TestIntTopN(0, 5, std::greater<int>(), &random);
+}
+
+TEST(TopNTest, String) {
+ LOG(INFO) << "Testing strings";
+
+ TopN<string> top(3);
+ EXPECT_TRUE(top.empty());
+ top.push("abracadabra");
+ top.push("waldemar");
+ EXPECT_EQ(2, top.size());
+ EXPECT_EQ("abracadabra", top.peek_bottom());
+ top.push("");
+ EXPECT_EQ(3, top.size());
+ EXPECT_EQ("", top.peek_bottom());
+ top.push("top");
+ EXPECT_EQ(3, top.size());
+ EXPECT_EQ("abracadabra", top.peek_bottom());
+ top.push("Google");
+ top.push("test");
+ EXPECT_EQ(3, top.size());
+ EXPECT_EQ("test", top.peek_bottom());
+ TopN<string> top2(top);
+ TopN<string> top3(5);
+ top3 = top;
+ EXPECT_EQ("test", top3.peek_bottom());
+ {
+ std::vector<string> s = ConsumeRawPtr(top.Extract());
+ EXPECT_EQ(s[0], "waldemar");
+ EXPECT_EQ(s[1], "top");
+ EXPECT_EQ(s[2], "test");
+ }
+
+ top2.push("zero");
+ EXPECT_EQ(top2.peek_bottom(), "top");
+
+ {
+ std::vector<string> s = ConsumeRawPtr(top2.Extract());
+ EXPECT_EQ(s[0], "zero");
+ EXPECT_EQ(s[1], "waldemar");
+ EXPECT_EQ(s[2], "top");
+ }
+ {
+ std::vector<string> s = ConsumeRawPtr(top3.Extract());
+ EXPECT_EQ(s[0], "waldemar");
+ EXPECT_EQ(s[1], "top");
+ EXPECT_EQ(s[2], "test");
+ }
+
+ TopN<string> top4(3);
+ // Run this test twice to check Reset():
+ for (int i = 0; i < 2; ++i) {
+ top4.push("abcd");
+ top4.push("ijkl");
+ top4.push("efgh");
+ top4.push("mnop");
+ std::vector<string> s = ConsumeRawPtr(top4.Extract());
+ EXPECT_EQ(s[0], "mnop");
+ EXPECT_EQ(s[1], "ijkl");
+ EXPECT_EQ(s[2], "efgh");
+ top4.Reset();
+ }
+}
+
+// Test that pointers aren't leaked from a TopN if we use the 2-argument version
+// of push().
+TEST(TopNTest, Ptr) {
+ LOG(INFO) << "Testing 2-argument push()";
+ TopN<string *> topn(3);
+ for (int i = 0; i < 8; ++i) {
+ string *dropped = NULL;
+ topn.push(new string(std::to_string(i)), &dropped);
+ delete dropped;
+ }
+
+ for (int i = 8; i > 0; --i) {
+ string *dropped = NULL;
+ topn.push(new string(std::to_string(i)), &dropped);
+ delete dropped;
+ }
+
+ std::vector<string *> extract = ConsumeRawPtr(topn.Extract());
+ tensorflow::gtl::STLDeleteElements(&extract);
+}
+
+struct PointeeGreater {
+ template <typename T>
+ bool operator()(const T &a, const T &b) const {
+ return *a > *b;
+ }
+};
+
+TEST(TopNTest, MoveOnly) {
+ using StrPtr = std::unique_ptr<string>;
+ TopN<StrPtr, PointeeGreater> topn(3);
+ for (int i = 0; i < 8; ++i) topn.push(StrPtr(new string(std::to_string(i))));
+ for (int i = 8; i > 0; --i) topn.push(StrPtr(new string(std::to_string(i))));
+
+ std::vector<StrPtr> extract = ConsumeRawPtr(topn.Extract());
+ EXPECT_EQ(extract.size(), 3);
+ EXPECT_EQ(*(extract[0]), "8");
+ EXPECT_EQ(*(extract[1]), "7");
+ EXPECT_EQ(*(extract[2]), "7");
+}
+
+// Test that Nondestructive extracts do not need a Reset() afterwards,
+// and that pointers aren't leaked from a TopN after calling them.
+TEST(TopNTest, Nondestructive) {
+ LOG(INFO) << "Testing Nondestructive extracts";
+ TopN<int> top4(4);
+ for (int i = 0; i < 8; ++i) {
+ top4.push(i);
+ std::vector<int> v = ConsumeRawPtr(top4.ExtractNondestructive());
+ EXPECT_EQ(std::min(i + 1, 4), v.size());
+ for (size_t j = 0; j < v.size(); ++j) EXPECT_EQ(i - j, v[j]);
+ }
+
+ TopN<int> top3(3);
+ for (int i = 0; i < 8; ++i) {
+ top3.push(i);
+ std::vector<int> v = ConsumeRawPtr(top3.ExtractUnsortedNondestructive());
+ std::sort(v.begin(), v.end(), std::greater<int>());
+ EXPECT_EQ(std::min(i + 1, 3), v.size());
+ for (size_t j = 0; j < v.size(); ++j) EXPECT_EQ(i - j, v[j]);
+ }
+}
+
+struct ForbiddenCmp {
+ bool operator()(int lhs, int rhs) const {
+ LOG(FATAL) << "ForbiddenCmp called " << lhs << " " << rhs;
+ }
+};
+
+TEST(TopNTest, ZeroLimit) {
+ TopN<int, ForbiddenCmp> top(0);
+ top.push(1);
+ top.push(2);
+
+ int dropped = -1;
+ top.push(1, &dropped);
+ top.push(2, &dropped);
+
+ std::vector<int> v;
+ top.ExtractNondestructive(&v);
+ EXPECT_EQ(0, v.size());
+}
+
+TEST(TopNTest, Iteration) {
+ TopN<int> top(4);
+ for (int i = 0; i < 8; ++i) top.push(i);
+ std::vector<int> actual(top.unsorted_begin(), top.unsorted_end());
+ // Check that we have 4,5,6,7 as the top 4 (in some order, so we sort)
+ sort(actual.begin(), actual.end());
+ EXPECT_EQ(actual.size(), 4);
+ EXPECT_EQ(actual[0], 4);
+ EXPECT_EQ(actual[1], 5);
+ EXPECT_EQ(actual[2], 6);
+ EXPECT_EQ(actual[3], 7);
+}
+} // namespace
diff --git a/tensorflow/core/lib/hash/crc32c.cc b/tensorflow/core/lib/hash/crc32c.cc
new file mode 100644
index 0000000000..3bef1cf78d
--- /dev/null
+++ b/tensorflow/core/lib/hash/crc32c.cc
@@ -0,0 +1,244 @@
+// A portable implementation of crc32c, optimized to handle
+// four bytes at a time.
+
+#include "tensorflow/core/lib/hash/crc32c.h"
+
+#include <stdint.h>
+#include "tensorflow/core/lib/core/coding.h"
+
+namespace tensorflow {
+namespace crc32c {
+
+static const uint32 table0_[256] = {
+ 0x00000000, 0xf26b8303, 0xe13b70f7, 0x1350f3f4, 0xc79a971f, 0x35f1141c,
+ 0x26a1e7e8, 0xd4ca64eb, 0x8ad958cf, 0x78b2dbcc, 0x6be22838, 0x9989ab3b,
+ 0x4d43cfd0, 0xbf284cd3, 0xac78bf27, 0x5e133c24, 0x105ec76f, 0xe235446c,
+ 0xf165b798, 0x030e349b, 0xd7c45070, 0x25afd373, 0x36ff2087, 0xc494a384,
+ 0x9a879fa0, 0x68ec1ca3, 0x7bbcef57, 0x89d76c54, 0x5d1d08bf, 0xaf768bbc,
+ 0xbc267848, 0x4e4dfb4b, 0x20bd8ede, 0xd2d60ddd, 0xc186fe29, 0x33ed7d2a,
+ 0xe72719c1, 0x154c9ac2, 0x061c6936, 0xf477ea35, 0xaa64d611, 0x580f5512,
+ 0x4b5fa6e6, 0xb93425e5, 0x6dfe410e, 0x9f95c20d, 0x8cc531f9, 0x7eaeb2fa,
+ 0x30e349b1, 0xc288cab2, 0xd1d83946, 0x23b3ba45, 0xf779deae, 0x05125dad,
+ 0x1642ae59, 0xe4292d5a, 0xba3a117e, 0x4851927d, 0x5b016189, 0xa96ae28a,
+ 0x7da08661, 0x8fcb0562, 0x9c9bf696, 0x6ef07595, 0x417b1dbc, 0xb3109ebf,
+ 0xa0406d4b, 0x522bee48, 0x86e18aa3, 0x748a09a0, 0x67dafa54, 0x95b17957,
+ 0xcba24573, 0x39c9c670, 0x2a993584, 0xd8f2b687, 0x0c38d26c, 0xfe53516f,
+ 0xed03a29b, 0x1f682198, 0x5125dad3, 0xa34e59d0, 0xb01eaa24, 0x42752927,
+ 0x96bf4dcc, 0x64d4cecf, 0x77843d3b, 0x85efbe38, 0xdbfc821c, 0x2997011f,
+ 0x3ac7f2eb, 0xc8ac71e8, 0x1c661503, 0xee0d9600, 0xfd5d65f4, 0x0f36e6f7,
+ 0x61c69362, 0x93ad1061, 0x80fde395, 0x72966096, 0xa65c047d, 0x5437877e,
+ 0x4767748a, 0xb50cf789, 0xeb1fcbad, 0x197448ae, 0x0a24bb5a, 0xf84f3859,
+ 0x2c855cb2, 0xdeeedfb1, 0xcdbe2c45, 0x3fd5af46, 0x7198540d, 0x83f3d70e,
+ 0x90a324fa, 0x62c8a7f9, 0xb602c312, 0x44694011, 0x5739b3e5, 0xa55230e6,
+ 0xfb410cc2, 0x092a8fc1, 0x1a7a7c35, 0xe811ff36, 0x3cdb9bdd, 0xceb018de,
+ 0xdde0eb2a, 0x2f8b6829, 0x82f63b78, 0x709db87b, 0x63cd4b8f, 0x91a6c88c,
+ 0x456cac67, 0xb7072f64, 0xa457dc90, 0x563c5f93, 0x082f63b7, 0xfa44e0b4,
+ 0xe9141340, 0x1b7f9043, 0xcfb5f4a8, 0x3dde77ab, 0x2e8e845f, 0xdce5075c,
+ 0x92a8fc17, 0x60c37f14, 0x73938ce0, 0x81f80fe3, 0x55326b08, 0xa759e80b,
+ 0xb4091bff, 0x466298fc, 0x1871a4d8, 0xea1a27db, 0xf94ad42f, 0x0b21572c,
+ 0xdfeb33c7, 0x2d80b0c4, 0x3ed04330, 0xccbbc033, 0xa24bb5a6, 0x502036a5,
+ 0x4370c551, 0xb11b4652, 0x65d122b9, 0x97baa1ba, 0x84ea524e, 0x7681d14d,
+ 0x2892ed69, 0xdaf96e6a, 0xc9a99d9e, 0x3bc21e9d, 0xef087a76, 0x1d63f975,
+ 0x0e330a81, 0xfc588982, 0xb21572c9, 0x407ef1ca, 0x532e023e, 0xa145813d,
+ 0x758fe5d6, 0x87e466d5, 0x94b49521, 0x66df1622, 0x38cc2a06, 0xcaa7a905,
+ 0xd9f75af1, 0x2b9cd9f2, 0xff56bd19, 0x0d3d3e1a, 0x1e6dcdee, 0xec064eed,
+ 0xc38d26c4, 0x31e6a5c7, 0x22b65633, 0xd0ddd530, 0x0417b1db, 0xf67c32d8,
+ 0xe52cc12c, 0x1747422f, 0x49547e0b, 0xbb3ffd08, 0xa86f0efc, 0x5a048dff,
+ 0x8ecee914, 0x7ca56a17, 0x6ff599e3, 0x9d9e1ae0, 0xd3d3e1ab, 0x21b862a8,
+ 0x32e8915c, 0xc083125f, 0x144976b4, 0xe622f5b7, 0xf5720643, 0x07198540,
+ 0x590ab964, 0xab613a67, 0xb831c993, 0x4a5a4a90, 0x9e902e7b, 0x6cfbad78,
+ 0x7fab5e8c, 0x8dc0dd8f, 0xe330a81a, 0x115b2b19, 0x020bd8ed, 0xf0605bee,
+ 0x24aa3f05, 0xd6c1bc06, 0xc5914ff2, 0x37faccf1, 0x69e9f0d5, 0x9b8273d6,
+ 0x88d28022, 0x7ab90321, 0xae7367ca, 0x5c18e4c9, 0x4f48173d, 0xbd23943e,
+ 0xf36e6f75, 0x0105ec76, 0x12551f82, 0xe03e9c81, 0x34f4f86a, 0xc69f7b69,
+ 0xd5cf889d, 0x27a40b9e, 0x79b737ba, 0x8bdcb4b9, 0x988c474d, 0x6ae7c44e,
+ 0xbe2da0a5, 0x4c4623a6, 0x5f16d052, 0xad7d5351};
+static const uint32 table1_[256] = {
+ 0x00000000, 0x13a29877, 0x274530ee, 0x34e7a899, 0x4e8a61dc, 0x5d28f9ab,
+ 0x69cf5132, 0x7a6dc945, 0x9d14c3b8, 0x8eb65bcf, 0xba51f356, 0xa9f36b21,
+ 0xd39ea264, 0xc03c3a13, 0xf4db928a, 0xe7790afd, 0x3fc5f181, 0x2c6769f6,
+ 0x1880c16f, 0x0b225918, 0x714f905d, 0x62ed082a, 0x560aa0b3, 0x45a838c4,
+ 0xa2d13239, 0xb173aa4e, 0x859402d7, 0x96369aa0, 0xec5b53e5, 0xfff9cb92,
+ 0xcb1e630b, 0xd8bcfb7c, 0x7f8be302, 0x6c297b75, 0x58ced3ec, 0x4b6c4b9b,
+ 0x310182de, 0x22a31aa9, 0x1644b230, 0x05e62a47, 0xe29f20ba, 0xf13db8cd,
+ 0xc5da1054, 0xd6788823, 0xac154166, 0xbfb7d911, 0x8b507188, 0x98f2e9ff,
+ 0x404e1283, 0x53ec8af4, 0x670b226d, 0x74a9ba1a, 0x0ec4735f, 0x1d66eb28,
+ 0x298143b1, 0x3a23dbc6, 0xdd5ad13b, 0xcef8494c, 0xfa1fe1d5, 0xe9bd79a2,
+ 0x93d0b0e7, 0x80722890, 0xb4958009, 0xa737187e, 0xff17c604, 0xecb55e73,
+ 0xd852f6ea, 0xcbf06e9d, 0xb19da7d8, 0xa23f3faf, 0x96d89736, 0x857a0f41,
+ 0x620305bc, 0x71a19dcb, 0x45463552, 0x56e4ad25, 0x2c896460, 0x3f2bfc17,
+ 0x0bcc548e, 0x186eccf9, 0xc0d23785, 0xd370aff2, 0xe797076b, 0xf4359f1c,
+ 0x8e585659, 0x9dface2e, 0xa91d66b7, 0xbabffec0, 0x5dc6f43d, 0x4e646c4a,
+ 0x7a83c4d3, 0x69215ca4, 0x134c95e1, 0x00ee0d96, 0x3409a50f, 0x27ab3d78,
+ 0x809c2506, 0x933ebd71, 0xa7d915e8, 0xb47b8d9f, 0xce1644da, 0xddb4dcad,
+ 0xe9537434, 0xfaf1ec43, 0x1d88e6be, 0x0e2a7ec9, 0x3acdd650, 0x296f4e27,
+ 0x53028762, 0x40a01f15, 0x7447b78c, 0x67e52ffb, 0xbf59d487, 0xacfb4cf0,
+ 0x981ce469, 0x8bbe7c1e, 0xf1d3b55b, 0xe2712d2c, 0xd69685b5, 0xc5341dc2,
+ 0x224d173f, 0x31ef8f48, 0x050827d1, 0x16aabfa6, 0x6cc776e3, 0x7f65ee94,
+ 0x4b82460d, 0x5820de7a, 0xfbc3faf9, 0xe861628e, 0xdc86ca17, 0xcf245260,
+ 0xb5499b25, 0xa6eb0352, 0x920cabcb, 0x81ae33bc, 0x66d73941, 0x7575a136,
+ 0x419209af, 0x523091d8, 0x285d589d, 0x3bffc0ea, 0x0f186873, 0x1cbaf004,
+ 0xc4060b78, 0xd7a4930f, 0xe3433b96, 0xf0e1a3e1, 0x8a8c6aa4, 0x992ef2d3,
+ 0xadc95a4a, 0xbe6bc23d, 0x5912c8c0, 0x4ab050b7, 0x7e57f82e, 0x6df56059,
+ 0x1798a91c, 0x043a316b, 0x30dd99f2, 0x237f0185, 0x844819fb, 0x97ea818c,
+ 0xa30d2915, 0xb0afb162, 0xcac27827, 0xd960e050, 0xed8748c9, 0xfe25d0be,
+ 0x195cda43, 0x0afe4234, 0x3e19eaad, 0x2dbb72da, 0x57d6bb9f, 0x447423e8,
+ 0x70938b71, 0x63311306, 0xbb8de87a, 0xa82f700d, 0x9cc8d894, 0x8f6a40e3,
+ 0xf50789a6, 0xe6a511d1, 0xd242b948, 0xc1e0213f, 0x26992bc2, 0x353bb3b5,
+ 0x01dc1b2c, 0x127e835b, 0x68134a1e, 0x7bb1d269, 0x4f567af0, 0x5cf4e287,
+ 0x04d43cfd, 0x1776a48a, 0x23910c13, 0x30339464, 0x4a5e5d21, 0x59fcc556,
+ 0x6d1b6dcf, 0x7eb9f5b8, 0x99c0ff45, 0x8a626732, 0xbe85cfab, 0xad2757dc,
+ 0xd74a9e99, 0xc4e806ee, 0xf00fae77, 0xe3ad3600, 0x3b11cd7c, 0x28b3550b,
+ 0x1c54fd92, 0x0ff665e5, 0x759baca0, 0x663934d7, 0x52de9c4e, 0x417c0439,
+ 0xa6050ec4, 0xb5a796b3, 0x81403e2a, 0x92e2a65d, 0xe88f6f18, 0xfb2df76f,
+ 0xcfca5ff6, 0xdc68c781, 0x7b5fdfff, 0x68fd4788, 0x5c1aef11, 0x4fb87766,
+ 0x35d5be23, 0x26772654, 0x12908ecd, 0x013216ba, 0xe64b1c47, 0xf5e98430,
+ 0xc10e2ca9, 0xd2acb4de, 0xa8c17d9b, 0xbb63e5ec, 0x8f844d75, 0x9c26d502,
+ 0x449a2e7e, 0x5738b609, 0x63df1e90, 0x707d86e7, 0x0a104fa2, 0x19b2d7d5,
+ 0x2d557f4c, 0x3ef7e73b, 0xd98eedc6, 0xca2c75b1, 0xfecbdd28, 0xed69455f,
+ 0x97048c1a, 0x84a6146d, 0xb041bcf4, 0xa3e32483};
+static const uint32 table2_[256] = {
+ 0x00000000, 0xa541927e, 0x4f6f520d, 0xea2ec073, 0x9edea41a, 0x3b9f3664,
+ 0xd1b1f617, 0x74f06469, 0x38513ec5, 0x9d10acbb, 0x773e6cc8, 0xd27ffeb6,
+ 0xa68f9adf, 0x03ce08a1, 0xe9e0c8d2, 0x4ca15aac, 0x70a27d8a, 0xd5e3eff4,
+ 0x3fcd2f87, 0x9a8cbdf9, 0xee7cd990, 0x4b3d4bee, 0xa1138b9d, 0x045219e3,
+ 0x48f3434f, 0xedb2d131, 0x079c1142, 0xa2dd833c, 0xd62de755, 0x736c752b,
+ 0x9942b558, 0x3c032726, 0xe144fb14, 0x4405696a, 0xae2ba919, 0x0b6a3b67,
+ 0x7f9a5f0e, 0xdadbcd70, 0x30f50d03, 0x95b49f7d, 0xd915c5d1, 0x7c5457af,
+ 0x967a97dc, 0x333b05a2, 0x47cb61cb, 0xe28af3b5, 0x08a433c6, 0xade5a1b8,
+ 0x91e6869e, 0x34a714e0, 0xde89d493, 0x7bc846ed, 0x0f382284, 0xaa79b0fa,
+ 0x40577089, 0xe516e2f7, 0xa9b7b85b, 0x0cf62a25, 0xe6d8ea56, 0x43997828,
+ 0x37691c41, 0x92288e3f, 0x78064e4c, 0xdd47dc32, 0xc76580d9, 0x622412a7,
+ 0x880ad2d4, 0x2d4b40aa, 0x59bb24c3, 0xfcfab6bd, 0x16d476ce, 0xb395e4b0,
+ 0xff34be1c, 0x5a752c62, 0xb05bec11, 0x151a7e6f, 0x61ea1a06, 0xc4ab8878,
+ 0x2e85480b, 0x8bc4da75, 0xb7c7fd53, 0x12866f2d, 0xf8a8af5e, 0x5de93d20,
+ 0x29195949, 0x8c58cb37, 0x66760b44, 0xc337993a, 0x8f96c396, 0x2ad751e8,
+ 0xc0f9919b, 0x65b803e5, 0x1148678c, 0xb409f5f2, 0x5e273581, 0xfb66a7ff,
+ 0x26217bcd, 0x8360e9b3, 0x694e29c0, 0xcc0fbbbe, 0xb8ffdfd7, 0x1dbe4da9,
+ 0xf7908dda, 0x52d11fa4, 0x1e704508, 0xbb31d776, 0x511f1705, 0xf45e857b,
+ 0x80aee112, 0x25ef736c, 0xcfc1b31f, 0x6a802161, 0x56830647, 0xf3c29439,
+ 0x19ec544a, 0xbcadc634, 0xc85da25d, 0x6d1c3023, 0x8732f050, 0x2273622e,
+ 0x6ed23882, 0xcb93aafc, 0x21bd6a8f, 0x84fcf8f1, 0xf00c9c98, 0x554d0ee6,
+ 0xbf63ce95, 0x1a225ceb, 0x8b277743, 0x2e66e53d, 0xc448254e, 0x6109b730,
+ 0x15f9d359, 0xb0b84127, 0x5a968154, 0xffd7132a, 0xb3764986, 0x1637dbf8,
+ 0xfc191b8b, 0x595889f5, 0x2da8ed9c, 0x88e97fe2, 0x62c7bf91, 0xc7862def,
+ 0xfb850ac9, 0x5ec498b7, 0xb4ea58c4, 0x11abcaba, 0x655baed3, 0xc01a3cad,
+ 0x2a34fcde, 0x8f756ea0, 0xc3d4340c, 0x6695a672, 0x8cbb6601, 0x29faf47f,
+ 0x5d0a9016, 0xf84b0268, 0x1265c21b, 0xb7245065, 0x6a638c57, 0xcf221e29,
+ 0x250cde5a, 0x804d4c24, 0xf4bd284d, 0x51fcba33, 0xbbd27a40, 0x1e93e83e,
+ 0x5232b292, 0xf77320ec, 0x1d5de09f, 0xb81c72e1, 0xccec1688, 0x69ad84f6,
+ 0x83834485, 0x26c2d6fb, 0x1ac1f1dd, 0xbf8063a3, 0x55aea3d0, 0xf0ef31ae,
+ 0x841f55c7, 0x215ec7b9, 0xcb7007ca, 0x6e3195b4, 0x2290cf18, 0x87d15d66,
+ 0x6dff9d15, 0xc8be0f6b, 0xbc4e6b02, 0x190ff97c, 0xf321390f, 0x5660ab71,
+ 0x4c42f79a, 0xe90365e4, 0x032da597, 0xa66c37e9, 0xd29c5380, 0x77ddc1fe,
+ 0x9df3018d, 0x38b293f3, 0x7413c95f, 0xd1525b21, 0x3b7c9b52, 0x9e3d092c,
+ 0xeacd6d45, 0x4f8cff3b, 0xa5a23f48, 0x00e3ad36, 0x3ce08a10, 0x99a1186e,
+ 0x738fd81d, 0xd6ce4a63, 0xa23e2e0a, 0x077fbc74, 0xed517c07, 0x4810ee79,
+ 0x04b1b4d5, 0xa1f026ab, 0x4bdee6d8, 0xee9f74a6, 0x9a6f10cf, 0x3f2e82b1,
+ 0xd50042c2, 0x7041d0bc, 0xad060c8e, 0x08479ef0, 0xe2695e83, 0x4728ccfd,
+ 0x33d8a894, 0x96993aea, 0x7cb7fa99, 0xd9f668e7, 0x9557324b, 0x3016a035,
+ 0xda386046, 0x7f79f238, 0x0b899651, 0xaec8042f, 0x44e6c45c, 0xe1a75622,
+ 0xdda47104, 0x78e5e37a, 0x92cb2309, 0x378ab177, 0x437ad51e, 0xe63b4760,
+ 0x0c158713, 0xa954156d, 0xe5f54fc1, 0x40b4ddbf, 0xaa9a1dcc, 0x0fdb8fb2,
+ 0x7b2bebdb, 0xde6a79a5, 0x3444b9d6, 0x91052ba8};
+static const uint32 table3_[256] = {
+ 0x00000000, 0xdd45aab8, 0xbf672381, 0x62228939, 0x7b2231f3, 0xa6679b4b,
+ 0xc4451272, 0x1900b8ca, 0xf64463e6, 0x2b01c95e, 0x49234067, 0x9466eadf,
+ 0x8d665215, 0x5023f8ad, 0x32017194, 0xef44db2c, 0xe964b13d, 0x34211b85,
+ 0x560392bc, 0x8b463804, 0x924680ce, 0x4f032a76, 0x2d21a34f, 0xf06409f7,
+ 0x1f20d2db, 0xc2657863, 0xa047f15a, 0x7d025be2, 0x6402e328, 0xb9474990,
+ 0xdb65c0a9, 0x06206a11, 0xd725148b, 0x0a60be33, 0x6842370a, 0xb5079db2,
+ 0xac072578, 0x71428fc0, 0x136006f9, 0xce25ac41, 0x2161776d, 0xfc24ddd5,
+ 0x9e0654ec, 0x4343fe54, 0x5a43469e, 0x8706ec26, 0xe524651f, 0x3861cfa7,
+ 0x3e41a5b6, 0xe3040f0e, 0x81268637, 0x5c632c8f, 0x45639445, 0x98263efd,
+ 0xfa04b7c4, 0x27411d7c, 0xc805c650, 0x15406ce8, 0x7762e5d1, 0xaa274f69,
+ 0xb327f7a3, 0x6e625d1b, 0x0c40d422, 0xd1057e9a, 0xaba65fe7, 0x76e3f55f,
+ 0x14c17c66, 0xc984d6de, 0xd0846e14, 0x0dc1c4ac, 0x6fe34d95, 0xb2a6e72d,
+ 0x5de23c01, 0x80a796b9, 0xe2851f80, 0x3fc0b538, 0x26c00df2, 0xfb85a74a,
+ 0x99a72e73, 0x44e284cb, 0x42c2eeda, 0x9f874462, 0xfda5cd5b, 0x20e067e3,
+ 0x39e0df29, 0xe4a57591, 0x8687fca8, 0x5bc25610, 0xb4868d3c, 0x69c32784,
+ 0x0be1aebd, 0xd6a40405, 0xcfa4bccf, 0x12e11677, 0x70c39f4e, 0xad8635f6,
+ 0x7c834b6c, 0xa1c6e1d4, 0xc3e468ed, 0x1ea1c255, 0x07a17a9f, 0xdae4d027,
+ 0xb8c6591e, 0x6583f3a6, 0x8ac7288a, 0x57828232, 0x35a00b0b, 0xe8e5a1b3,
+ 0xf1e51979, 0x2ca0b3c1, 0x4e823af8, 0x93c79040, 0x95e7fa51, 0x48a250e9,
+ 0x2a80d9d0, 0xf7c57368, 0xeec5cba2, 0x3380611a, 0x51a2e823, 0x8ce7429b,
+ 0x63a399b7, 0xbee6330f, 0xdcc4ba36, 0x0181108e, 0x1881a844, 0xc5c402fc,
+ 0xa7e68bc5, 0x7aa3217d, 0x52a0c93f, 0x8fe56387, 0xedc7eabe, 0x30824006,
+ 0x2982f8cc, 0xf4c75274, 0x96e5db4d, 0x4ba071f5, 0xa4e4aad9, 0x79a10061,
+ 0x1b838958, 0xc6c623e0, 0xdfc69b2a, 0x02833192, 0x60a1b8ab, 0xbde41213,
+ 0xbbc47802, 0x6681d2ba, 0x04a35b83, 0xd9e6f13b, 0xc0e649f1, 0x1da3e349,
+ 0x7f816a70, 0xa2c4c0c8, 0x4d801be4, 0x90c5b15c, 0xf2e73865, 0x2fa292dd,
+ 0x36a22a17, 0xebe780af, 0x89c50996, 0x5480a32e, 0x8585ddb4, 0x58c0770c,
+ 0x3ae2fe35, 0xe7a7548d, 0xfea7ec47, 0x23e246ff, 0x41c0cfc6, 0x9c85657e,
+ 0x73c1be52, 0xae8414ea, 0xcca69dd3, 0x11e3376b, 0x08e38fa1, 0xd5a62519,
+ 0xb784ac20, 0x6ac10698, 0x6ce16c89, 0xb1a4c631, 0xd3864f08, 0x0ec3e5b0,
+ 0x17c35d7a, 0xca86f7c2, 0xa8a47efb, 0x75e1d443, 0x9aa50f6f, 0x47e0a5d7,
+ 0x25c22cee, 0xf8878656, 0xe1873e9c, 0x3cc29424, 0x5ee01d1d, 0x83a5b7a5,
+ 0xf90696d8, 0x24433c60, 0x4661b559, 0x9b241fe1, 0x8224a72b, 0x5f610d93,
+ 0x3d4384aa, 0xe0062e12, 0x0f42f53e, 0xd2075f86, 0xb025d6bf, 0x6d607c07,
+ 0x7460c4cd, 0xa9256e75, 0xcb07e74c, 0x16424df4, 0x106227e5, 0xcd278d5d,
+ 0xaf050464, 0x7240aedc, 0x6b401616, 0xb605bcae, 0xd4273597, 0x09629f2f,
+ 0xe6264403, 0x3b63eebb, 0x59416782, 0x8404cd3a, 0x9d0475f0, 0x4041df48,
+ 0x22635671, 0xff26fcc9, 0x2e238253, 0xf36628eb, 0x9144a1d2, 0x4c010b6a,
+ 0x5501b3a0, 0x88441918, 0xea669021, 0x37233a99, 0xd867e1b5, 0x05224b0d,
+ 0x6700c234, 0xba45688c, 0xa345d046, 0x7e007afe, 0x1c22f3c7, 0xc167597f,
+ 0xc747336e, 0x1a0299d6, 0x782010ef, 0xa565ba57, 0xbc65029d, 0x6120a825,
+ 0x0302211c, 0xde478ba4, 0x31035088, 0xec46fa30, 0x8e647309, 0x5321d9b1,
+ 0x4a21617b, 0x9764cbc3, 0xf54642fa, 0x2803e842};
+
+// Used to fetch a naturally-aligned 32-bit word in little endian byte-order
+static inline uint32_t LE_LOAD32(const uint8_t *p) {
+ return core::DecodeFixed32(reinterpret_cast<const char *>(p));
+}
+
+uint32 Extend(uint32 crc, const char *buf, size_t size) {
+ const uint8 *p = reinterpret_cast<const uint8 *>(buf);
+ const uint8 *e = p + size;
+ uint32 l = crc ^ 0xffffffffu;
+
+#define STEP1 \
+ do { \
+ int c = (l & 0xff) ^ *p++; \
+ l = table0_[c] ^ (l >> 8); \
+ } while (0)
+
+#define STEP4 \
+ do { \
+ uint32 c = l ^ LE_LOAD32(p); \
+ p += 4; \
+ l = table3_[c & 0xff] ^ table2_[(c >> 8) & 0xff] ^ \
+ table1_[(c >> 16) & 0xff] ^ table0_[c >> 24]; \
+ } while (0)
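+
+ // Note (added for illustration): STEP4 consumes four input bytes at once.
+ // Each of the four table lookups advances the CRC for one byte position,
+ // and XORing them together gives the same result as applying STEP1 four
+ // times (the usual "slicing-by-4" formulation).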
+
+ // Point x at first 4-byte aligned byte in string. This might be
+ // just past the end of the string.
+ const uintptr_t pval = reinterpret_cast<uintptr_t>(p);
+ const uint8 *x = reinterpret_cast<const uint8 *>(((pval + 3) >> 2) << 2);
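+ // ((pval + 3) >> 2) << 2 rounds the address up to the next multiple of 4,
+ // e.g. 0x1001..0x1003 all become 0x1004 while 0x1004 is left unchanged.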
+ if (x <= e) {
+ // Process bytes until finished or p is 4-byte aligned
+ while (p != x) {
+ STEP1;
+ }
+ }
+ // Process bytes 16 at a time
+ while ((e - p) >= 16) {
+ STEP4;
+ STEP4;
+ STEP4;
+ STEP4;
+ }
+ // Process bytes 4 at a time
+ while ((e - p) >= 4) {
+ STEP4;
+ }
+ // Process the last few bytes
+ while (p != e) {
+ STEP1;
+ }
+#undef STEP4
+#undef STEP1
+ return l ^ 0xffffffffu;
+}
+
+} // namespace crc32c
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/hash/crc32c.h b/tensorflow/core/lib/hash/crc32c.h
new file mode 100644
index 0000000000..f728b6f5e7
--- /dev/null
+++ b/tensorflow/core/lib/hash/crc32c.h
@@ -0,0 +1,39 @@
+#ifndef TENSORFLOW_LIB_HASH_CRC32C_H_
+#define TENSORFLOW_LIB_HASH_CRC32C_H_
+
+#include <stddef.h>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace crc32c {
+
+// Return the crc32c of concat(A, data[0,n-1]) where init_crc is the
+// crc32c of some string A. Extend() is often used to maintain the
+// crc32c of a stream of data.
+extern uint32 Extend(uint32 init_crc, const char* data, size_t n);
+
+// Return the crc32c of data[0,n-1]
+inline uint32 Value(const char* data, size_t n) { return Extend(0, data, n); }
+
+static const uint32 kMaskDelta = 0xa282ead8ul;
+
+// Return a masked representation of crc.
+//
+// Motivation: it is problematic to compute the CRC of a string that
+// contains embedded CRCs. Therefore we recommend that CRCs stored
+// somewhere (e.g., in files) should be masked before being stored.
+inline uint32 Mask(uint32 crc) {
+ // Rotate right by 15 bits and add a constant.
+ return ((crc >> 15) | (crc << 17)) + kMaskDelta;
+}
+
+// Return the crc whose masked representation is masked_crc.
+inline uint32 Unmask(uint32 masked_crc) {
+ uint32 rot = masked_crc - kMaskDelta;
+ return ((rot >> 17) | (rot << 15));
+}
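+
+// Example usage (illustrative sketch, not part of the original header;
+// `part1` and `part2` stand for arbitrary string buffers): the CRC of a
+// stream can be built up incrementally with Extend(), and Mask()/Unmask()
+// round-trip a value before and after storing it alongside the data.
+//
+//   uint32 crc = crc32c::Value(part1.data(), part1.size());
+//   crc = crc32c::Extend(crc, part2.data(), part2.size());
+//   uint32 stored = crc32c::Mask(crc);      // safe to embed next to the data
+//   uint32 check = crc32c::Unmask(stored);  // check == crc again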
+
+} // namespace crc32c
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_HASH_CRC32C_H_
diff --git a/tensorflow/core/lib/hash/crc32c_test.cc b/tensorflow/core/lib/hash/crc32c_test.cc
new file mode 100644
index 0000000000..54aced3186
--- /dev/null
+++ b/tensorflow/core/lib/hash/crc32c_test.cc
@@ -0,0 +1,51 @@
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace crc32c {
+
+TEST(CRC, StandardResults) {
+ // From rfc3720 section B.4.
+ char buf[32];
+
+ memset(buf, 0, sizeof(buf));
+ ASSERT_EQ(0x8a9136aa, Value(buf, sizeof(buf)));
+
+ memset(buf, 0xff, sizeof(buf));
+ ASSERT_EQ(0x62a8ab43, Value(buf, sizeof(buf)));
+
+ for (int i = 0; i < 32; i++) {
+ buf[i] = i;
+ }
+ ASSERT_EQ(0x46dd794e, Value(buf, sizeof(buf)));
+
+ for (int i = 0; i < 32; i++) {
+ buf[i] = 31 - i;
+ }
+ ASSERT_EQ(0x113fdb5c, Value(buf, sizeof(buf)));
+
+ unsigned char data[48] = {
+ 0x01, 0xc0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00,
+ 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x18, 0x28, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ };
+ ASSERT_EQ(0xd9963a56, Value(reinterpret_cast<char*>(data), sizeof(data)));
+}
+
+TEST(CRC, Values) { ASSERT_NE(Value("a", 1), Value("foo", 3)); }
+
+TEST(CRC, Extend) {
+ ASSERT_EQ(Value("hello world", 11), Extend(Value("hello ", 6), "world", 5));
+}
+
+TEST(CRC, Mask) {
+ uint32 crc = Value("foo", 3);
+ ASSERT_NE(crc, Mask(crc));
+ ASSERT_NE(crc, Mask(Mask(crc)));
+ ASSERT_EQ(crc, Unmask(Mask(crc)));
+ ASSERT_EQ(crc, Unmask(Unmask(Mask(Mask(crc)))));
+}
+
+} // namespace crc32c
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/hash/hash.cc b/tensorflow/core/lib/hash/hash.cc
new file mode 100644
index 0000000000..075d252412
--- /dev/null
+++ b/tensorflow/core/lib/hash/hash.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/core/lib/hash/hash.h"
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/raw_coding.h"
+
+#include <string.h>
+
+namespace tensorflow {
+
+// 0xff is in case char is signed.
+static inline uint32 ByteAs32(char c) { return static_cast<uint32>(c) & 0xff; }
+static inline uint64 ByteAs64(char c) { return static_cast<uint64>(c) & 0xff; }
+
+uint32 Hash32(const char* data, size_t n, uint32 seed) {
+ // 'm' and 'r' are mixing constants generated offline.
+ // They're not really 'magic', they just happen to work well.
+
+ const uint32 m = 0x5bd1e995;
+ const int r = 24;
+
+ // Initialize the hash to a 'random' value
+ uint32 h = seed ^ n;
+
+ // Mix 4 bytes at a time into the hash
+ while (n >= 4) {
+ uint32 k = core::DecodeFixed32(data);
+
+ k *= m;
+ k ^= k >> r;
+ k *= m;
+
+ h *= m;
+ h ^= k;
+
+ data += 4;
+ n -= 4;
+ }
+
+ // Handle the last few bytes of the input array
+
+ switch (n) {
+ case 3:
+ h ^= ByteAs32(data[2]) << 16;
+ TF_FALLTHROUGH_INTENDED;
+ case 2:
+ h ^= ByteAs32(data[1]) << 8;
+ TF_FALLTHROUGH_INTENDED;
+ case 1:
+ h ^= ByteAs32(data[0]);
+ h *= m;
+ }
+
+ // Do a few final mixes of the hash to ensure the last few
+ // bytes are well-incorporated.
+
+ h ^= h >> 13;
+ h *= m;
+ h ^= h >> 15;
+
+ return h;
+}
+
+uint64 Hash64(const char* data, size_t n, uint64 seed) {
+ const uint64 m = 0xc6a4a7935bd1e995;
+ const int r = 47;
+
+ uint64 h = seed ^ (n * m);
+
+ while (n >= 8) {
+ uint64 k = core::DecodeFixed64(data);
+ data += 8;
+ n -= 8;
+
+ k *= m;
+ k ^= k >> r;
+ k *= m;
+
+ h ^= k;
+ h *= m;
+ }
+
+ switch (n) {
+ case 7:
+ h ^= ByteAs64(data[6]) << 48;
+ TF_FALLTHROUGH_INTENDED;
+ case 6:
+ h ^= ByteAs64(data[5]) << 40;
+ TF_FALLTHROUGH_INTENDED;
+ case 5:
+ h ^= ByteAs64(data[4]) << 32;
+ TF_FALLTHROUGH_INTENDED;
+ case 4:
+ h ^= ByteAs64(data[3]) << 24;
+ TF_FALLTHROUGH_INTENDED;
+ case 3:
+ h ^= ByteAs64(data[2]) << 16;
+ TF_FALLTHROUGH_INTENDED;
+ case 2:
+ h ^= ByteAs64(data[1]) << 8;
+ TF_FALLTHROUGH_INTENDED;
+ case 1:
+ h ^= ByteAs64(data[0]);
+ h *= m;
+ }
+
+ h ^= h >> r;
+ h *= m;
+ h ^= h >> r;
+
+ return h;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/hash/hash.h b/tensorflow/core/lib/hash/hash.h
new file mode 100644
index 0000000000..af56218fed
--- /dev/null
+++ b/tensorflow/core/lib/hash/hash.h
@@ -0,0 +1,28 @@
+// Simple hash functions used for internal data structures
+
+#ifndef TENSORFLOW_LIB_HASH_HASH_H_
+#define TENSORFLOW_LIB_HASH_HASH_H_
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <string>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+extern uint32 Hash32(const char* data, size_t n, uint32 seed);
+extern uint64 Hash64(const char* data, size_t n, uint64 seed);
+
+inline uint64 Hash64(const char* data, size_t n) {
+ return Hash64(data, n, 0xDECAFCAFFE);
+}
+
+inline uint64 Hash64(const string& str) {
+ return Hash64(str.data(), str.size());
+}
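+
+// Example usage (illustrative sketch, not part of the original header):
+//
+//   string key = "some key";
+//   uint64 h64 = Hash64(key);                         // default seed
+//   uint64 h64s = Hash64(key.data(), key.size(), 1);  // explicit seed
+//   uint32 h32 = Hash32(key.data(), key.size(), 1);   // 32-bit variant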
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_HASH_HASH_H_
diff --git a/tensorflow/core/lib/hash/hash_test.cc b/tensorflow/core/lib/hash/hash_test.cc
new file mode 100644
index 0000000000..9d3b970f3b
--- /dev/null
+++ b/tensorflow/core/lib/hash/hash_test.cc
@@ -0,0 +1,64 @@
+#include <vector>
+
+#include "tensorflow/core/lib/hash/hash.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(Hash, SignedUnsignedIssue) {
+ const unsigned char d1[1] = {0x62};
+ const unsigned char d2[2] = {0xc3, 0x97};
+ const unsigned char d3[3] = {0xe2, 0x99, 0xa5};
+ const unsigned char d4[4] = {0xe1, 0x80, 0xb9, 0x32};
+ const unsigned char d5[48] = {
+ 0x01, 0xc0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00,
+ 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x18, 0x28, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+ };
+
+ struct Case {
+ uint32 hash32;
+ uint64 hash64;
+ const unsigned char* data;
+ size_t size;
+ uint32 seed;
+ };
+
+ for (Case c : std::vector<Case>{
+ {0x471a8188u, 0x4c61ea3eeda4cb87ull, nullptr, 0, 0xbc9f1d34},
+ {0xd615eba5u, 0x091309f7ef916c8aull, d1, sizeof(d1), 0xbc9f1d34},
+ {0x0c3cccdau, 0xa815bcdf1d1af01cull, d2, sizeof(d2), 0xbc9f1d34},
+ {0x3ba37e0eu, 0x02167564e4d06430ull, d3, sizeof(d3), 0xbc9f1d34},
+ {0x16174eb3u, 0x8f7ed82ffc21071full, d4, sizeof(d4), 0xbc9f1d34},
+ {0x98b1926cu, 0xce196580c97aff1eull, d5, sizeof(d5), 0x12345678},
+ }) {
+ EXPECT_EQ(c.hash32,
+ Hash32(reinterpret_cast<const char*>(c.data), c.size, c.seed));
+ EXPECT_EQ(c.hash64,
+ Hash64(reinterpret_cast<const char*>(c.data), c.size, c.seed));
+
+ // Check hashes with inputs aligned differently.
+ for (int align = 1; align <= 7; align++) {
+ std::string input(align, 'x');
+ input.append(reinterpret_cast<const char*>(c.data), c.size);
+ EXPECT_EQ(c.hash32, Hash32(&input[align], c.size, c.seed));
+ EXPECT_EQ(c.hash64, Hash64(&input[align], c.size, c.seed));
+ }
+ }
+}
+
+static void BM_Hash32(int iters, int len) {
+ std::string input(len, 'x');
+ uint32 h = 0;
+ for (int i = 0; i < iters; i++) {
+ h = Hash32(input.data(), len, 1);
+ }
+ testing::BytesProcessed(static_cast<int64>(iters) * len);
+ VLOG(1) << h;
+}
+BENCHMARK(BM_Hash32)->Range(1, 1024);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/histogram/histogram.cc b/tensorflow/core/lib/histogram/histogram.cc
new file mode 100644
index 0000000000..4c29d687b7
--- /dev/null
+++ b/tensorflow/core/lib/histogram/histogram.cc
@@ -0,0 +1,247 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/histogram/histogram.h"
+#include <float.h>
+#include <math.h>
+#include "tensorflow/core/framework/summary.pb.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+namespace tensorflow {
+namespace histogram {
+
+static std::vector<double>* InitDefaultBucketsInner() {
+ std::vector<double> buckets;
+ std::vector<double> neg_buckets;
+ // Make buckets whose range grows by 10% starting at 1.0e-12 up to 1.0e20
+ double v = 1.0e-12;
+ while (v < 1.0e20) {
+ buckets.push_back(v);
+ neg_buckets.push_back(-v);
+ v *= 1.1;
+ }
+ buckets.push_back(DBL_MAX);
+ neg_buckets.push_back(-DBL_MAX);
+ std::reverse(neg_buckets.begin(), neg_buckets.end());
+ std::vector<double>* result = new std::vector<double>;
+ result->insert(result->end(), neg_buckets.begin(), neg_buckets.end());
+ result->push_back(0.0);
+ result->insert(result->end(), buckets.begin(), buckets.end());
+ return result;
+}
+
+static gtl::ArraySlice<double> InitDefaultBuckets() {
+ static std::vector<double>* default_bucket_limits = InitDefaultBucketsInner();
+ return *default_bucket_limits;
+}
+
+Histogram::Histogram() : bucket_limits_(InitDefaultBuckets()) { Clear(); }
+
+// Create a histogram with a custom set of bucket limits,
+// specified in "custom_bucket_limits[0..custom_bucket_limits.size()-1]"
+Histogram::Histogram(gtl::ArraySlice<double> custom_bucket_limits)
+ : custom_bucket_limits_(custom_bucket_limits.begin(),
+ custom_bucket_limits.end()),
+ bucket_limits_(custom_bucket_limits_) {
+#ifndef NDEBUG
+ DCHECK_GT(bucket_limits_.size(), 0);
+ // Verify that the bucket boundaries are strictly increasing
+ for (size_t i = 1; i < bucket_limits_.size(); i++) {
+ DCHECK_GT(bucket_limits_[i], bucket_limits_[i - 1]);
+ }
+#endif
+ Clear();
+}
+
+bool Histogram::DecodeFromProto(const HistogramProto& proto) {
+ if ((proto.bucket_size() != proto.bucket_limit_size()) ||
+ (proto.bucket_size() == 0)) {
+ return false;
+ }
+ min_ = proto.min();
+ max_ = proto.max();
+ num_ = proto.num();
+ sum_ = proto.sum();
+ sum_squares_ = proto.sum_squares();
+ custom_bucket_limits_.clear();
+ custom_bucket_limits_.insert(custom_bucket_limits_.end(),
+ proto.bucket_limit().begin(),
+ proto.bucket_limit().end());
+ bucket_limits_ = custom_bucket_limits_;
+ buckets_.clear();
+ buckets_.insert(buckets_.end(), proto.bucket().begin(), proto.bucket().end());
+ return true;
+}
+
+void Histogram::Clear() {
+ min_ = bucket_limits_[bucket_limits_.size() - 1];
+ max_ = -DBL_MAX;
+ num_ = 0;
+ sum_ = 0;
+ sum_squares_ = 0;
+ buckets_.resize(bucket_limits_.size());
+ for (size_t i = 0; i < bucket_limits_.size(); i++) {
+ buckets_[i] = 0;
+ }
+}
+
+void Histogram::Add(double value) {
+ int b =
+ std::upper_bound(bucket_limits_.begin(), bucket_limits_.end(), value) -
+ bucket_limits_.begin();
+
+ buckets_[b] += 1.0;
+ if (min_ > value) min_ = value;
+ if (max_ < value) max_ = value;
+ num_++;
+ sum_ += value;
+ sum_squares_ += (value * value);
+}
+
+double Histogram::Median() const { return Percentile(50.0); }
+
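+// Worked example (added for illustration): with bucket limits {0, 10, 100,
+// DBL_MAX} and samples {-2, -2, 0}, the first bucket [min_, 0) holds 2 counts
+// and the second [0, 10) holds 1. Percentile(50) has threshold 3 * 0.5 = 1.5,
+// which falls in the first bucket, so we interpolate linearly between
+// min_ = -2 and 0: pos = 1.5 / 2 = 0.75, giving -2 + 2 * 0.75 = -0.5.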
+double Histogram::Percentile(double p) const {
+ if (num_ == 0.0) return 0.0;
+ double threshold = num_ * (p / 100.0);
+ double sum = 0;
+ for (size_t b = 0; b < buckets_.size(); b++) {
+ sum += buckets_[b];
+ if (sum >= threshold) {
+ // Scale linearly within this bucket
+ double left_point = (b == 0) ? min_ : bucket_limits_[b - 1];
+ double right_point = bucket_limits_[b];
+ double left_sum = sum - buckets_[b];
+ double right_sum = sum;
+ double pos = (threshold - left_sum) / (right_sum - left_sum);
+ double r = left_point + (right_point - left_point) * pos;
+ if (r < min_) r = min_;
+ if (r > max_) r = max_;
+ return r;
+ }
+ }
+ return max_;
+}
+
+double Histogram::Average() const {
+ if (num_ == 0.0) return 0;
+ return sum_ / num_;
+}
+
+double Histogram::StandardDeviation() const {
+ if (num_ == 0.0) return 0;
+ double variance = (sum_squares_ * num_ - sum_ * sum_) / (num_ * num_);
+ return sqrt(variance);
+}
+
+std::string Histogram::ToString() const {
+ std::string r;
+ char buf[200];
+ snprintf(buf, sizeof(buf), "Count: %.0f Average: %.4f StdDev: %.2f\n", num_,
+ Average(), StandardDeviation());
+ r.append(buf);
+ snprintf(buf, sizeof(buf), "Min: %.4f Median: %.4f Max: %.4f\n",
+ (num_ == 0.0 ? 0.0 : min_), Median(), max_);
+ r.append(buf);
+ r.append("------------------------------------------------------\n");
+ const double mult = num_ > 0 ? 100.0 / num_ : 0.0;
+ double sum = 0;
+ for (size_t b = 0; b < buckets_.size(); b++) {
+ if (buckets_[b] <= 0.0) continue;
+ sum += buckets_[b];
+ snprintf(buf, sizeof(buf), "[ %10.2g, %10.2g ) %7.0f %7.3f%% %7.3f%% ",
+ ((b == 0) ? -DBL_MAX : bucket_limits_[b - 1]), // left
+ bucket_limits_[b], // right
+ buckets_[b], // count
+ mult * buckets_[b], // percentage
+ mult * sum); // cum percentage
+ r.append(buf);
+
+ // Add hash marks based on percentage; 20 marks for 100%.
+ int marks = static_cast<int>(20 * (buckets_[b] / num_) + 0.5);
+ r.append(marks, '#');
+ r.push_back('\n');
+ }
+ return r;
+}
+
+void Histogram::EncodeToProto(HistogramProto* proto,
+ bool preserve_zero_buckets) const {
+ proto->Clear();
+ proto->set_min(min_);
+ proto->set_max(max_);
+ proto->set_num(num_);
+ proto->set_sum(sum_);
+ proto->set_sum_squares(sum_squares_);
+ for (size_t i = 0; i < buckets_.size();) {
+ double end = bucket_limits_[i];
+ double count = buckets_[i];
+ i++;
+ if (!preserve_zero_buckets && count <= 0.0) {
+ // Find run of empty buckets and collapse them into one
+ while (i < buckets_.size() && buckets_[i] <= 0.0) {
+ end = bucket_limits_[i];
+ count = buckets_[i];
+ i++;
+ }
+ }
+ proto->add_bucket_limit(end);
+ proto->add_bucket(count);
+ }
+ if (proto->bucket_size() == 0.0) {
+ // It's easier to restore if we always have at least one bucket entry
+ proto->add_bucket_limit(DBL_MAX);
+ proto->add_bucket(0.0);
+ }
+}
+
+// ThreadSafeHistogram implementation.
+bool ThreadSafeHistogram::DecodeFromProto(const HistogramProto& proto) {
+ mutex_lock l(mu_);
+ return histogram_.DecodeFromProto(proto);
+}
+
+void ThreadSafeHistogram::Clear() {
+ mutex_lock l(mu_);
+ histogram_.Clear();
+}
+
+void ThreadSafeHistogram::Add(double value) {
+ mutex_lock l(mu_);
+ histogram_.Add(value);
+}
+
+void ThreadSafeHistogram::EncodeToProto(HistogramProto* proto,
+ bool preserve_zero_buckets) const {
+ mutex_lock l(mu_);
+ histogram_.EncodeToProto(proto, preserve_zero_buckets);
+}
+
+double ThreadSafeHistogram::Median() const {
+ mutex_lock l(mu_);
+ return histogram_.Median();
+}
+
+double ThreadSafeHistogram::Percentile(double p) const {
+ mutex_lock l(mu_);
+ return histogram_.Percentile(p);
+}
+
+double ThreadSafeHistogram::Average() const {
+ mutex_lock l(mu_);
+ return histogram_.Average();
+}
+
+double ThreadSafeHistogram::StandardDeviation() const {
+ mutex_lock l(mu_);
+ return histogram_.StandardDeviation();
+}
+
+std::string ThreadSafeHistogram::ToString() const {
+ mutex_lock l(mu_);
+ return histogram_.ToString();
+}
+
+} // namespace histogram
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/histogram/histogram.h b/tensorflow/core/lib/histogram/histogram.h
new file mode 100644
index 0000000000..9b655f3acb
--- /dev/null
+++ b/tensorflow/core/lib/histogram/histogram.h
@@ -0,0 +1,119 @@
+#ifndef TENSORFLOW_LIB_HISTOGRAM_HISTOGRAM_H_
+#define TENSORFLOW_LIB_HISTOGRAM_HISTOGRAM_H_
+
+#include <string>
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+class HistogramProto;
+
+namespace histogram {
+
+class Histogram {
+ public:
+ // Create a histogram with a default set of bucket boundaries.
+ // Buckets near zero cover very small ranges (e.g. 10^-12), and each
+ // bucket range grows by ~10% as we head away from zero. The
+ // buckets cover the range from -DBL_MAX to DBL_MAX.
+ Histogram();
+
+ // Create a histogram with a custom set of bucket boundaries,
+ // specified in "custom_bucket_limits[0..custom_bucket_limits.size()-1]"
+ // REQUIRES: custom_bucket_limits[i] values are monotonically increasing.
+ // REQUIRES: custom_bucket_limits is not empty()
+ explicit Histogram(gtl::ArraySlice<double> custom_bucket_limits);
+
+ // Restore the state of a histogram that was previously encoded
+ // via Histogram::EncodeToProto. Note that only the bucket boundaries
+ // generated by EncodeToProto will be restored.
+ bool DecodeFromProto(const HistogramProto& proto);
+
+ ~Histogram() {}
+
+ void Clear();
+ void Add(double value);
+
+ // Save the current state of the histogram to "*proto". If
+ // "preserve_zero_buckets" is false, only non-zero bucket values and
+ // ranges are saved, and the bucket boundaries of zero-valued buckets
+ // are lost.
+ void EncodeToProto(HistogramProto* proto, bool preserve_zero_buckets) const;
+
+ // Return the median of the values in the histogram
+ double Median() const;
+
+ // Return the "p"th percentile [0.0..100.0] of the values in the
+ // distribution
+ double Percentile(double p) const;
+
+ // Return the average value of the distribution
+ double Average() const;
+
+ // Return the standard deviation of values in the distribution
+ double StandardDeviation() const;
+
+ // Returns a multi-line human-readable string representing the histogram
+ // contents. Example output:
+ // Count: 4 Average: 251.7475 StdDev: 432.02
+ // Min: -3.0000 Median: 5.0000 Max: 1000.0000
+ // ------------------------------------------------------
+ // [ -5, 0 ) 1 25.000% 25.000% #####
+ // [ 0, 5 ) 1 25.000% 50.000% #####
+ // [ 5, 10 ) 1 25.000% 75.000% #####
+ // [ 1000, 10000 ) 1 25.000% 100.000% #####
+ std::string ToString() const;
+
+ private:
+ double min_;
+ double max_;
+ double num_;
+ double sum_;
+ double sum_squares_;
+
+ std::vector<double> custom_bucket_limits_;
+ gtl::ArraySlice<double> bucket_limits_;
+ std::vector<double> buckets_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(Histogram);
+};
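+
+// Example usage (illustrative sketch, not part of the original header;
+// `samples` stands for any container of doubles):
+//
+//   histogram::Histogram h;
+//   for (double v : samples) h.Add(v);
+//   LOG(INFO) << h.ToString();
+//   double p95 = h.Percentile(95.0);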
+
+// Wrapper around a Histogram object that is thread safe.
+//
+// All methods hold a lock while delegating to a Histogram object owned by the
+// ThreadSafeHistogram instance.
+//
+// See Histogram for documentation of the methods.
+class ThreadSafeHistogram {
+ public:
+ ThreadSafeHistogram() {}
+ explicit ThreadSafeHistogram(gtl::ArraySlice<double> custom_bucket_limits)
+ : histogram_(custom_bucket_limits) {}
+ bool DecodeFromProto(const HistogramProto& proto);
+
+ ~ThreadSafeHistogram() {}
+
+ void Clear();
+
+ // TODO(mdevin): It might be a good idea to provide an AddN(<many values>)
+ // method to avoid grabbing/releasing the lock when adding many values.
+ void Add(double value);
+
+ void EncodeToProto(HistogramProto* proto, bool preserve_zero_buckets) const;
+ double Median() const;
+ double Percentile(double p) const;
+ double Average() const;
+ double StandardDeviation() const;
+ std::string ToString() const;
+
+ private:
+ mutable mutex mu_;
+ Histogram histogram_ GUARDED_BY(mu_);
+};
+
+} // namespace histogram
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_HISTOGRAM_HISTOGRAM_H_
diff --git a/tensorflow/core/lib/histogram/histogram_test.cc b/tensorflow/core/lib/histogram/histogram_test.cc
new file mode 100644
index 0000000000..ede44fe85b
--- /dev/null
+++ b/tensorflow/core/lib/histogram/histogram_test.cc
@@ -0,0 +1,112 @@
+#include "tensorflow/core/lib/histogram/histogram.h"
+#include <float.h>
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/framework/summary.pb.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace histogram {
+
+static void Validate(const Histogram& h) {
+ string s1 = h.ToString();
+ LOG(ERROR) << s1;
+
+ HistogramProto proto_with_zeroes;
+ h.EncodeToProto(&proto_with_zeroes, true);
+ Histogram h2;
+ EXPECT_TRUE(h2.DecodeFromProto(proto_with_zeroes));
+ string s2 = h2.ToString();
+ LOG(ERROR) << s2;
+
+ EXPECT_EQ(s1, s2);
+
+ HistogramProto proto_no_zeroes;
+ h.EncodeToProto(&proto_no_zeroes, false);
+ LOG(ERROR) << proto_no_zeroes.DebugString();
+ Histogram h3;
+ EXPECT_TRUE(h3.DecodeFromProto(proto_no_zeroes));
+ string s3 = h3.ToString();
+ LOG(ERROR) << s3;
+
+ EXPECT_EQ(s1, s3);
+}
+
+TEST(Histogram, Empty) {
+ Histogram h;
+ Validate(h);
+}
+
+TEST(Histogram, SingleValue) {
+ Histogram h;
+ h.Add(-3.0);
+ Validate(h);
+}
+
+TEST(Histogram, CustomBuckets) {
+ Histogram h({-10, -5, 0, 5, 10, 100, 1000, 10000, DBL_MAX});
+ h.Add(-3.0);
+ h.Add(4.99);
+ h.Add(5.0);
+ h.Add(1000.0);
+ Validate(h);
+}
+
+TEST(Histogram, Percentile) {
+ Histogram h({0, 10, 100, DBL_MAX});
+ h.Add(-2);
+ h.Add(-2);
+ h.Add(0);
+ double median = h.Percentile(50.0);
+ EXPECT_EQ(median, -0.5);
+}
+
+TEST(Histogram, Basic) {
+ Histogram h;
+ for (int i = 0; i < 100; i++) {
+ h.Add(i);
+ }
+ for (int i = 1000; i < 100000; i += 1000) {
+ h.Add(i);
+ }
+ Validate(h);
+}
+
+TEST(ThreadSafeHistogram, Basic) {
+ // Fill a normal histogram.
+ Histogram h;
+ for (int i = 0; i < 100; i++) {
+ h.Add(i);
+ }
+
+ // Fill a thread-safe histogram with the same values.
+ ThreadSafeHistogram tsh;
+ for (int i = 0; i < 100; i++) {
+ tsh.Add(i);
+ }
+
+ for (int i = 0; i < 2; ++i) {
+ bool preserve_zero_buckets = (i == 0);
+ HistogramProto h_proto;
+ h.EncodeToProto(&h_proto, preserve_zero_buckets);
+ HistogramProto tsh_proto;
+ tsh.EncodeToProto(&tsh_proto, preserve_zero_buckets);
+
+ // Let's decode from the proto of the other histogram type.
+ Histogram h2;
+ EXPECT_TRUE(h2.DecodeFromProto(tsh_proto));
+ ThreadSafeHistogram tsh2;
+ EXPECT_TRUE(tsh2.DecodeFromProto(h_proto));
+
+ // Now let's reencode and check they match.
+ EXPECT_EQ(h2.ToString(), tsh2.ToString());
+ }
+
+ EXPECT_EQ(h.Median(), tsh.Median());
+ EXPECT_EQ(h.Percentile(40.0), tsh.Percentile(40.0));
+ EXPECT_EQ(h.Average(), tsh.Average());
+ EXPECT_EQ(h.StandardDeviation(), tsh.StandardDeviation());
+ EXPECT_EQ(h.ToString(), tsh.ToString());
+}
+
+} // namespace histogram
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/block.cc b/tensorflow/core/lib/io/block.cc
new file mode 100644
index 0000000000..1ddaa2eb78
--- /dev/null
+++ b/tensorflow/core/lib/io/block.cc
@@ -0,0 +1,236 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+//
+// Decodes the blocks generated by block_builder.cc.
+
+#include "tensorflow/core/lib/io/block.h"
+
+#include <vector>
+#include <algorithm>
+#include "tensorflow/core/lib/io/format.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace table {
+
+inline uint32 Block::NumRestarts() const {
+ assert(size_ >= sizeof(uint32));
+ return core::DecodeFixed32(data_ + size_ - sizeof(uint32));
+}
+
+Block::Block(const BlockContents& contents)
+ : data_(contents.data.data()),
+ size_(contents.data.size()),
+ owned_(contents.heap_allocated) {
+ if (size_ < sizeof(uint32)) {
+ size_ = 0; // Error marker
+ } else {
+ size_t max_restarts_allowed = (size_ - sizeof(uint32)) / sizeof(uint32);
+ if (NumRestarts() > max_restarts_allowed) {
+ // The size is too small for NumRestarts()
+ size_ = 0;
+ } else {
+ restart_offset_ = size_ - (1 + NumRestarts()) * sizeof(uint32);
+ }
+ }
+}
+
+Block::~Block() {
+ if (owned_) {
+ delete[] data_;
+ }
+}
+
+// Helper routine: decode the next block entry starting at "p",
+// storing the number of shared key bytes, non_shared key bytes,
+// and the length of the value in "*shared", "*non_shared", and
+// "*value_length", respectively. Will not dereference past "limit".
+//
+// If any errors are detected, returns NULL. Otherwise, returns a
+// pointer to the key delta (just past the three decoded values).
+static inline const char* DecodeEntry(const char* p, const char* limit,
+ uint32* shared, uint32* non_shared,
+ uint32* value_length) {
+ if (limit - p < 3) return NULL;
+ *shared = reinterpret_cast<const unsigned char*>(p)[0];
+ *non_shared = reinterpret_cast<const unsigned char*>(p)[1];
+ *value_length = reinterpret_cast<const unsigned char*>(p)[2];
+ if ((*shared | *non_shared | *value_length) < 128) {
+ // Fast path: all three values are encoded in one byte each
+ p += 3;
+ } else {
+ if ((p = core::GetVarint32Ptr(p, limit, shared)) == NULL) return NULL;
+ if ((p = core::GetVarint32Ptr(p, limit, non_shared)) == NULL) return NULL;
+ if ((p = core::GetVarint32Ptr(p, limit, value_length)) == NULL) return NULL;
+ }
+
+ if (static_cast<uint32>(limit - p) < (*non_shared + *value_length)) {
+ return NULL;
+ }
+ return p;
+}
+
+class Block::Iter : public Iterator {
+ private:
+ const char* const data_; // underlying block contents
+ uint32 const restarts_; // Offset of restart array (list of fixed32)
+ uint32 const num_restarts_; // Number of uint32 entries in restart array
+
+ // current_ is offset in data_ of current entry. >= restarts_ if !Valid
+ uint32 current_;
+ uint32 restart_index_; // Index of restart block in which current_ falls
+ string key_;
+ StringPiece value_;
+ Status status_;
+
+ inline int Compare(const StringPiece& a, const StringPiece& b) const {
+ return a.compare(b);
+ }
+
+ // Return the offset in data_ just past the end of the current entry.
+ inline uint32 NextEntryOffset() const {
+ return (value_.data() + value_.size()) - data_;
+ }
+
+ uint32 GetRestartPoint(uint32 index) {
+ assert(index < num_restarts_);
+ return core::DecodeFixed32(data_ + restarts_ + index * sizeof(uint32));
+ }
+
+ void SeekToRestartPoint(uint32 index) {
+ key_.clear();
+ restart_index_ = index;
+ // current_ will be fixed by ParseNextKey();
+
+ // ParseNextKey() starts at the end of value_, so set value_ accordingly
+ uint32 offset = GetRestartPoint(index);
+ value_ = StringPiece(data_ + offset, 0);
+ }
+
+ public:
+ Iter(const char* data, uint32 restarts, uint32 num_restarts)
+ : data_(data),
+ restarts_(restarts),
+ num_restarts_(num_restarts),
+ current_(restarts_),
+ restart_index_(num_restarts_) {
+ assert(num_restarts_ > 0);
+ }
+
+ virtual bool Valid() const { return current_ < restarts_; }
+ virtual Status status() const { return status_; }
+ virtual StringPiece key() const {
+ assert(Valid());
+ return key_;
+ }
+ virtual StringPiece value() const {
+ assert(Valid());
+ return value_;
+ }
+
+ virtual void Next() {
+ assert(Valid());
+ ParseNextKey();
+ }
+
+ virtual void Seek(const StringPiece& target) {
+ // Binary search in restart array to find the last restart point
+ // with a key < target
+ uint32 left = 0;
+ uint32 right = num_restarts_ - 1;
+ while (left < right) {
+ uint32 mid = (left + right + 1) / 2;
+ uint32 region_offset = GetRestartPoint(mid);
+ uint32 shared, non_shared, value_length;
+ const char* key_ptr =
+ DecodeEntry(data_ + region_offset, data_ + restarts_, &shared,
+ &non_shared, &value_length);
+ if (key_ptr == NULL || (shared != 0)) {
+ CorruptionError();
+ return;
+ }
+ StringPiece mid_key(key_ptr, non_shared);
+ if (Compare(mid_key, target) < 0) {
+ // Key at "mid" is smaller than "target". Therefore all
+ // blocks before "mid" are uninteresting.
+ left = mid;
+ } else {
+ // Key at "mid" is >= "target". Therefore all blocks at or
+ // after "mid" are uninteresting.
+ right = mid - 1;
+ }
+ }
+
+ // Linear search (within restart block) for first key >= target
+ SeekToRestartPoint(left);
+ while (true) {
+ if (!ParseNextKey()) {
+ return;
+ }
+ if (Compare(key_, target) >= 0) {
+ return;
+ }
+ }
+ }
+
+ virtual void SeekToFirst() {
+ SeekToRestartPoint(0);
+ ParseNextKey();
+ }
+
+ private:
+ void CorruptionError() {
+ current_ = restarts_;
+ restart_index_ = num_restarts_;
+ status_ = errors::DataLoss("bad entry in block");
+ key_.clear();
+ value_.clear();
+ }
+
+ bool ParseNextKey() {
+ current_ = NextEntryOffset();
+ const char* p = data_ + current_;
+ const char* limit = data_ + restarts_; // Restarts come right after data
+ if (p >= limit) {
+ // No more entries to return. Mark as invalid.
+ current_ = restarts_;
+ restart_index_ = num_restarts_;
+ return false;
+ }
+
+ // Decode next entry
+ uint32 shared, non_shared, value_length;
+ p = DecodeEntry(p, limit, &shared, &non_shared, &value_length);
+ if (p == NULL || key_.size() < shared) {
+ CorruptionError();
+ return false;
+ } else {
+ key_.resize(shared);
+ key_.append(p, non_shared);
+ value_ = StringPiece(p + non_shared, value_length);
+ while (restart_index_ + 1 < num_restarts_ &&
+ GetRestartPoint(restart_index_ + 1) < current_) {
+ ++restart_index_;
+ }
+ return true;
+ }
+ }
+};
+
+Iterator* Block::NewIterator() {
+ if (size_ < sizeof(uint32)) {
+ return NewErrorIterator(errors::DataLoss("bad block contents"));
+ }
+ const uint32 num_restarts = NumRestarts();
+ if (num_restarts == 0) {
+ return NewEmptyIterator();
+ } else {
+ return new Iter(data_, restart_offset_, num_restarts);
+ }
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/block.h b/tensorflow/core/lib/io/block.h
new file mode 100644
index 0000000000..bf53245b8d
--- /dev/null
+++ b/tensorflow/core/lib/io/block.h
@@ -0,0 +1,45 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#ifndef TENSORFLOW_LIB_IO_BLOCK_H_
+#define TENSORFLOW_LIB_IO_BLOCK_H_
+
+#include <stddef.h>
+#include <stdint.h>
+#include "tensorflow/core/lib/io/iterator.h"
+
+namespace tensorflow {
+namespace table {
+
+struct BlockContents;
+
+class Block {
+ public:
+ // Initialize the block with the specified contents.
+ explicit Block(const BlockContents& contents);
+
+ ~Block();
+
+ size_t size() const { return size_; }
+ Iterator* NewIterator();
+
+ private:
+ uint32 NumRestarts() const;
+
+ const char* data_;
+ size_t size_;
+ uint32 restart_offset_; // Offset in data_ of restart array
+ bool owned_; // Block owns data_[]
+
+ // No copying allowed
+ Block(const Block&);
+ void operator=(const Block&);
+
+ class Iter;
+};
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_BLOCK_H_
diff --git a/tensorflow/core/lib/io/block_builder.cc b/tensorflow/core/lib/io/block_builder.cc
new file mode 100644
index 0000000000..d94048d744
--- /dev/null
+++ b/tensorflow/core/lib/io/block_builder.cc
@@ -0,0 +1,107 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+//
+// BlockBuilder generates blocks where keys are prefix-compressed:
+//
+// When we store a key, we drop the prefix shared with the previous
+// string. This helps reduce the space requirement significantly.
+// Furthermore, once every K keys, we do not apply the prefix
+// compression and store the entire key. We call this a "restart
+// point". The tail end of the block stores the offsets of all of the
+// restart points, and can be used to do a binary search when looking
+// for a particular key. Values are stored as-is (without compression)
+// immediately following the corresponding key.
+//
+// An entry for a particular key-value pair has the form:
+// shared_bytes: varint32
+// unshared_bytes: varint32
+// value_length: varint32
+// key_delta: char[unshared_bytes]
+// value: char[value_length]
+// shared_bytes == 0 for restart points.
+//
+// The trailer of the block has the form:
+// restarts: uint32[num_restarts]
+// num_restarts: uint32
+// restarts[i] contains the offset within the block of the ith restart point.
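+//
+// Worked example (added for illustration): if the previous key is "apple"
+// and the next key/value pair is ("applesauce", "v"), then shared_bytes = 5,
+// unshared_bytes = 5, value_length = 1, key_delta = "sauce", value = "v".
+// At a restart point the same pair would instead be stored with
+// shared_bytes = 0 and key_delta = "applesauce".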
+
+#include "tensorflow/core/lib/io/block_builder.h"
+
+#include <algorithm>
+#include <assert.h>
+#include "tensorflow/core/lib/io/table_builder.h"
+#include "tensorflow/core/lib/core/coding.h"
+
+namespace tensorflow {
+namespace table {
+
+BlockBuilder::BlockBuilder(const Options* options)
+ : options_(options), restarts_(), counter_(0), finished_(false) {
+ assert(options->block_restart_interval >= 1);
+ restarts_.push_back(0); // First restart point is at offset 0
+}
+
+void BlockBuilder::Reset() {
+ buffer_.clear();
+ restarts_.clear();
+ restarts_.push_back(0); // First restart point is at offset 0
+ counter_ = 0;
+ finished_ = false;
+ last_key_.clear();
+}
+
+size_t BlockBuilder::CurrentSizeEstimate() const {
+ return (buffer_.size() + // Raw data buffer
+ restarts_.size() * sizeof(uint32) + // Restart array
+ sizeof(uint32)); // Restart array length
+}
+
+StringPiece BlockBuilder::Finish() {
+ // Append restart array
+ for (size_t i = 0; i < restarts_.size(); i++) {
+ core::PutFixed32(&buffer_, restarts_[i]);
+ }
+ core::PutFixed32(&buffer_, restarts_.size());
+ finished_ = true;
+ return StringPiece(buffer_);
+}
+
+void BlockBuilder::Add(const StringPiece& key, const StringPiece& value) {
+ StringPiece last_key_piece(last_key_);
+ assert(!finished_);
+ assert(counter_ <= options_->block_restart_interval);
+ assert(buffer_.empty() // No values yet?
+ || key.compare(last_key_piece) > 0);
+ size_t shared = 0;
+ if (counter_ < options_->block_restart_interval) {
+ // See how much sharing to do with previous string
+ const size_t min_length = std::min(last_key_piece.size(), key.size());
+ while ((shared < min_length) && (last_key_piece[shared] == key[shared])) {
+ shared++;
+ }
+ } else {
+ // Restart compression
+ restarts_.push_back(buffer_.size());
+ counter_ = 0;
+ }
+ const size_t non_shared = key.size() - shared;
+
+ // Add "<shared><non_shared><value_size>" to buffer_
+ core::PutVarint32(&buffer_, shared);
+ core::PutVarint32(&buffer_, non_shared);
+ core::PutVarint32(&buffer_, value.size());
+
+ // Add string delta to buffer_ followed by value
+ buffer_.append(key.data() + shared, non_shared);
+ buffer_.append(value.data(), value.size());
+
+ // Update state
+ last_key_.resize(shared);
+ last_key_.append(key.data() + shared, non_shared);
+ assert(StringPiece(last_key_) == key);
+ counter_++;
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/block_builder.h b/tensorflow/core/lib/io/block_builder.h
new file mode 100644
index 0000000000..e07a647805
--- /dev/null
+++ b/tensorflow/core/lib/io/block_builder.h
@@ -0,0 +1,57 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#ifndef TENSORFLOW_LIB_IO_BLOCK_BUILDER_H_
+#define TENSORFLOW_LIB_IO_BLOCK_BUILDER_H_
+
+#include <vector>
+
+#include <stdint.h>
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+namespace table {
+
+struct Options;
+
+class BlockBuilder {
+ public:
+ explicit BlockBuilder(const Options* options);
+
+ // Reset the contents as if the BlockBuilder was just constructed.
+ void Reset();
+
+ // REQUIRES: Finish() has not been called since the last call to Reset().
+ // REQUIRES: key is larger than any previously added key
+ void Add(const StringPiece& key, const StringPiece& value);
+
+ // Finish building the block and return a slice that refers to the
+ // block contents. The returned slice will remain valid for the
+ // lifetime of this builder or until Reset() is called.
+ StringPiece Finish();
+
+ // Returns an estimate of the current (uncompressed) size of the block
+ // we are building.
+ size_t CurrentSizeEstimate() const;
+
+ // Return true iff no entries have been added since the last Reset()
+ bool empty() const { return buffer_.empty(); }
+
+ private:
+ const Options* options_;
+ string buffer_; // Destination buffer
+ std::vector<uint32> restarts_; // Restart points
+ int counter_; // Number of entries emitted since restart
+ bool finished_; // Has Finish() been called?
+ string last_key_;
+
+ // No copying allowed
+ BlockBuilder(const BlockBuilder&);
+ void operator=(const BlockBuilder&);
+};
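+
+// A minimal usage sketch (illustrative only; "opts" is a hypothetical
+// Options instance and error handling is omitted). Keys must be added in
+// increasing order, and the slice returned by Finish() is the block that
+// a table builder would write out:
+//
+//   Options opts;
+//   BlockBuilder builder(&opts);
+//   builder.Add("apple", "1");
+//   builder.Add("apricot", "2");
+//   StringPiece block = builder.Finish();
+//   // ... write "block" somewhere, then builder.Reset() for the next block.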
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_BLOCK_BUILDER_H_
diff --git a/tensorflow/core/lib/io/format.cc b/tensorflow/core/lib/io/format.cc
new file mode 100644
index 0000000000..259cfc13dc
--- /dev/null
+++ b/tensorflow/core/lib/io/format.cc
@@ -0,0 +1,148 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/format.h"
+
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/io/block.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace table {
+
+void BlockHandle::EncodeTo(string* dst) const {
+ // Sanity check that all fields have been set
+ assert(offset_ != ~static_cast<uint64>(0));
+ assert(size_ != ~static_cast<uint64>(0));
+ core::PutVarint64(dst, offset_);
+ core::PutVarint64(dst, size_);
+}
+
+Status BlockHandle::DecodeFrom(StringPiece* input) {
+ if (core::GetVarint64(input, &offset_) && core::GetVarint64(input, &size_)) {
+ return Status::OK();
+ } else {
+ return errors::DataLoss("bad block handle");
+ }
+}
+
+void Footer::EncodeTo(string* dst) const {
+#ifndef NDEBUG
+ const size_t original_size = dst->size();
+#endif
+ metaindex_handle_.EncodeTo(dst);
+ index_handle_.EncodeTo(dst);
+ dst->resize(2 * BlockHandle::kMaxEncodedLength); // Padding
+ core::PutFixed32(dst, static_cast<uint32>(kTableMagicNumber & 0xffffffffu));
+ core::PutFixed32(dst, static_cast<uint32>(kTableMagicNumber >> 32));
+ assert(dst->size() == original_size + kEncodedLength);
+}
+
+Status Footer::DecodeFrom(StringPiece* input) {
+ const char* magic_ptr = input->data() + kEncodedLength - 8;
+ const uint32 magic_lo = core::DecodeFixed32(magic_ptr);
+ const uint32 magic_hi = core::DecodeFixed32(magic_ptr + 4);
+ const uint64 magic =
+ ((static_cast<uint64>(magic_hi) << 32) | (static_cast<uint64>(magic_lo)));
+ if (magic != kTableMagicNumber) {
+ return errors::DataLoss("not an sstable (bad magic number)");
+ }
+
+ Status result = metaindex_handle_.DecodeFrom(input);
+ if (result.ok()) {
+ result = index_handle_.DecodeFrom(input);
+ }
+ if (result.ok()) {
+ // We skip over any leftover data (just padding for now) in "input"
+ const char* end = magic_ptr + 8;
+ *input = StringPiece(end, input->data() + input->size() - end);
+ }
+ return result;
+}
+
+Status ReadBlock(RandomAccessFile* file, const BlockHandle& handle,
+ BlockContents* result) {
+ result->data = StringPiece();
+ result->cachable = false;
+ result->heap_allocated = false;
+
+ // Read the block contents as well as the type/crc footer.
+ // See table_builder.cc for the code that built this structure.
+ size_t n = static_cast<size_t>(handle.size());
+ char* buf = new char[n + kBlockTrailerSize];
+ StringPiece contents;
+ Status s =
+ file->Read(handle.offset(), n + kBlockTrailerSize, &contents, buf);
+ if (!s.ok()) {
+ delete[] buf;
+ return s;
+ }
+ if (contents.size() != n + kBlockTrailerSize) {
+ delete[] buf;
+ return errors::DataLoss("truncated block read");
+ }
+
+ // Check the crc of the type and the block contents
+ const char* data = contents.data(); // Pointer to where Read put the data
+ // This checksum verification is optional. We leave it on for now
+ const bool verify_checksum = true;
+ if (verify_checksum) {
+ const uint32 crc = crc32c::Unmask(core::DecodeFixed32(data + n + 1));
+ const uint32 actual = crc32c::Value(data, n + 1);
+ if (actual != crc) {
+ delete[] buf;
+ s = errors::DataLoss("block checksum mismatch");
+ return s;
+ }
+ }
+
+ switch (data[n]) {
+ case kNoCompression:
+ if (data != buf) {
+ // File implementation gave us pointer to some other data.
+ // Use it directly under the assumption that it will be live
+ // while the file is open.
+ delete[] buf;
+ result->data = StringPiece(data, n);
+ result->heap_allocated = false;
+ result->cachable = false; // Do not double-cache
+ } else {
+ result->data = StringPiece(buf, n);
+ result->heap_allocated = true;
+ result->cachable = true;
+ }
+
+ // Ok
+ break;
+ case kSnappyCompression: {
+ size_t ulength = 0;
+ if (!port::Snappy_GetUncompressedLength(data, n, &ulength)) {
+ delete[] buf;
+ return errors::DataLoss("corrupted compressed block contents");
+ }
+ char* ubuf = new char[ulength];
+ if (!port::Snappy_Uncompress(data, n, ubuf)) {
+ delete[] buf;
+ delete[] ubuf;
+ return errors::DataLoss("corrupted compressed block contents");
+ }
+ delete[] buf;
+ result->data = StringPiece(ubuf, ulength);
+ result->heap_allocated = true;
+ result->cachable = true;
+ break;
+ }
+ default:
+ delete[] buf;
+ return errors::DataLoss("bad block type");
+ }
+
+ return Status::OK();
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/format.h b/tensorflow/core/lib/io/format.h
new file mode 100644
index 0000000000..3121c41bb8
--- /dev/null
+++ b/tensorflow/core/lib/io/format.h
@@ -0,0 +1,99 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#ifndef TENSORFLOW_LIB_IO_FORMAT_H_
+#define TENSORFLOW_LIB_IO_FORMAT_H_
+
+#include <string>
+#include <stdint.h>
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/io/table_builder.h"
+
+namespace tensorflow {
+class RandomAccessFile;
+namespace table {
+
+class Block;
+
+// BlockHandle is a pointer to the extent of a file that stores a data
+// block or a meta block.
+class BlockHandle {
+ public:
+ BlockHandle();
+
+ // The offset of the block in the file.
+ uint64 offset() const { return offset_; }
+ void set_offset(uint64 offset) { offset_ = offset; }
+
+ // The size of the stored block.
+ uint64 size() const { return size_; }
+ void set_size(uint64 size) { size_ = size; }
+
+ void EncodeTo(string* dst) const;
+ Status DecodeFrom(StringPiece* input);
+
+ // Maximum encoding length of a BlockHandle: two varint64s of up to
+ // 10 bytes each.
+ enum { kMaxEncodedLength = 10 + 10 };
+
+ private:
+ uint64 offset_;
+ uint64 size_;
+};
+
+// Footer encapsulates the fixed information stored at the tail
+// end of every table file.
+class Footer {
+ public:
+ Footer() {}
+
+ // The block handle for the metaindex block of the table
+ const BlockHandle& metaindex_handle() const { return metaindex_handle_; }
+ void set_metaindex_handle(const BlockHandle& h) { metaindex_handle_ = h; }
+
+ // The block handle for the index block of the table
+ const BlockHandle& index_handle() const { return index_handle_; }
+ void set_index_handle(const BlockHandle& h) { index_handle_ = h; }
+
+ void EncodeTo(string* dst) const;
+ Status DecodeFrom(StringPiece* input);
+
+ // Encoded length of a Footer. Note that the serialization of a
+ // Footer will always occupy exactly this many bytes. It consists
+ // of two block handles and a magic number.
+ enum { kEncodedLength = 2 * BlockHandle::kMaxEncodedLength + 8 };
+
+ private:
+ BlockHandle metaindex_handle_;
+ BlockHandle index_handle_;
+};
+
+// kTableMagicNumber was picked by running
+// echo http://code.google.com/p/leveldb/ | sha1sum
+// and taking the leading 64 bits.
+static const uint64 kTableMagicNumber = 0xdb4775248b80fb57ull;
+
+// 1-byte type + 32-bit crc
+static const size_t kBlockTrailerSize = 5;
+
+struct BlockContents {
+ StringPiece data; // Actual contents of data
+ bool cachable; // True iff data can be cached
+ bool heap_allocated; // True iff caller should delete[] data.data()
+};
+
+// Read the block identified by "handle" from "file". On failure
+// return non-OK. On success fill *result and return OK.
+extern Status ReadBlock(RandomAccessFile* file, const BlockHandle& handle,
+ BlockContents* result);
+
+// Implementation details follow. Clients should ignore.
+
+inline BlockHandle::BlockHandle()
+ : offset_(~static_cast<uint64>(0)), size_(~static_cast<uint64>(0)) {}
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_FORMAT_H_
diff --git a/tensorflow/core/lib/io/inputbuffer.cc b/tensorflow/core/lib/io/inputbuffer.cc
new file mode 100644
index 0000000000..8fa245a546
--- /dev/null
+++ b/tensorflow/core/lib/io/inputbuffer.cc
@@ -0,0 +1,112 @@
+#include "tensorflow/core/lib/io/inputbuffer.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace io {
+
+InputBuffer::InputBuffer(RandomAccessFile* file, size_t buffer_bytes)
+ : file_(file),
+ file_pos_(0),
+ size_(buffer_bytes),
+ buf_(new char[size_]),
+ pos_(buf_),
+ limit_(buf_) {}
+
+InputBuffer::~InputBuffer() {
+ delete file_;
+ delete[] buf_;
+}
+
+Status InputBuffer::FillBuffer() {
+ StringPiece data;
+ Status s = file_->Read(file_pos_, size_, &data, buf_);
+ if (data.data() != buf_) {
+ memmove(buf_, data.data(), data.size());
+ }
+ pos_ = buf_;
+ limit_ = pos_ + data.size();
+ file_pos_ += data.size();
+ return s;
+}
+
+Status InputBuffer::ReadLine(string* result) {
+ result->clear();
+ int i;
+ Status s;
+ for (i = 0;; i++) {
+ if (pos_ == limit_) {
+ // Get more data into buffer
+ s = FillBuffer();
+ if (limit_ == buf_) {
+ break;
+ }
+ }
+ char c = *pos_++;
+ if (c == '\n') {
+ // We don't append the '\n' to *result
+ return Status::OK();
+ }
+ *result += c;
+ }
+ if (errors::IsOutOfRange(s) && !result->empty()) {
+ return Status::OK();
+ }
+ return s;
+}
+
+Status InputBuffer::ReadNBytes(int64 bytes_to_read, string* result) {
+ result->clear();
+ if (bytes_to_read < 0) {
+ return errors::InvalidArgument("Can't read a negative number of bytes: ",
+ bytes_to_read);
+ }
+ result->reserve(bytes_to_read);
+ Status s;
+ while (result->size() < static_cast<size_t>(bytes_to_read)) {
+ if (pos_ == limit_) {
+ // Get more data into buffer
+ s = FillBuffer();
+ if (limit_ == buf_) {
+ break;
+ }
+ }
+ const int64 bytes_to_copy =
+ std::min<int64>(limit_ - pos_, bytes_to_read - result->size());
+ result->insert(result->size(), pos_, bytes_to_copy);
+ pos_ += bytes_to_copy;
+ }
+ if (errors::IsOutOfRange(s) &&
+ (result->size() == static_cast<size_t>(bytes_to_read))) {
+ return Status::OK();
+ }
+ return s;
+}
+
+Status InputBuffer::SkipNBytes(int64 bytes_to_skip) {
+ if (bytes_to_skip < 0) {
+ return errors::InvalidArgument("Can only skip forward, not ",
+ bytes_to_skip);
+ }
+ int64 bytes_skipped = 0;
+ Status s;
+ while (bytes_skipped < bytes_to_skip) {
+ if (pos_ == limit_) {
+ // Get more data into buffer
+ s = FillBuffer();
+ if (limit_ == buf_) {
+ break;
+ }
+ }
+ const int64 bytes_to_advance =
+ std::min<int64>(limit_ - pos_, bytes_to_skip - bytes_skipped);
+ bytes_skipped += bytes_to_advance;
+ pos_ += bytes_to_advance;
+ }
+ if (errors::IsOutOfRange(s) && bytes_skipped == bytes_to_skip) {
+ return Status::OK();
+ }
+ return s;
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/inputbuffer.h b/tensorflow/core/lib/io/inputbuffer.h
new file mode 100644
index 0000000000..6879f30567
--- /dev/null
+++ b/tensorflow/core/lib/io/inputbuffer.h
@@ -0,0 +1,62 @@
+#ifndef TENSORFLOW_LIB_IO_INPUTBUFFER_H_
+#define TENSORFLOW_LIB_IO_INPUTBUFFER_H_
+
+#include <string>
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace io {
+
+// An InputBuffer provides a buffer on top of a RandomAccessFile.
+// A given instance of an InputBuffer is NOT safe for concurrent use
+// by multiple threads.
+class InputBuffer {
+ public:
+ // Create an InputBuffer for "file" with a buffer size of
+ // "buffer_bytes" bytes. Takes ownership of "file" and will
+ // delete it when the InputBuffer is destroyed.
+ InputBuffer(RandomAccessFile* file, size_t buffer_bytes);
+ ~InputBuffer();
+
+ // Read one text line of data into "*result" until end-of-file or a
+ // \n is read. (The \n is not included in the result.) Overwrites
+ // any existing data in *result.
+ //
+ // If successful, returns OK. If we are already at the end of the
+ // file, we return an OUT_OF_RANGE error. Otherwise, we return
+ // some other non-OK status.
+ Status ReadLine(string* result);
+
+ // Reads bytes_to_read bytes into *result, overwriting *result.
+ //
+ // If successful, returns OK. If there are not enough bytes to
+ // read before the end of the file, we return an OUT_OF_RANGE error.
+ // Otherwise, we return some other non-OK status.
+ Status ReadNBytes(int64 bytes_to_read, string* result);
+
+ // Like ReadNBytes() without returning the bytes read.
+ Status SkipNBytes(int64 bytes_to_skip);
+
+ // Returns the position in the file.
+ int64 Tell() const { return file_pos_ - (limit_ - pos_); }
+
+ private:
+ Status FillBuffer();
+
+ RandomAccessFile* file_; // Owned
+ int64 file_pos_; // Next position to read from in "file_"
+ size_t size_; // Size of "buf_"
+ char* buf_; // The buffer itself
+ // [pos_,limit_) hold the "limit_ - pos_" bytes just before "file_pos_"
+ char* pos_; // Current position in "buf"
+ char* limit_; // Just past end of valid data in "buf"
+
+ TF_DISALLOW_COPY_AND_ASSIGN(InputBuffer);
+};
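+
+// A minimal usage sketch (illustrative; "/tmp/data.txt" is a hypothetical
+// path and error handling is abbreviated). Note that the InputBuffer takes
+// ownership of the RandomAccessFile and deletes it on destruction:
+//
+//   RandomAccessFile* file;
+//   TF_CHECK_OK(Env::Default()->NewRandomAccessFile("/tmp/data.txt", &file));
+//   io::InputBuffer in(file, 64 << 10);  // 64KB buffer
+//   string line;
+//   while (in.ReadLine(&line).ok()) {
+//     // ... process "line" (the trailing '\n' is not included) ...
+//   }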
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_INPUTBUFFER_H_
diff --git a/tensorflow/core/lib/io/inputbuffer_test.cc b/tensorflow/core/lib/io/inputbuffer_test.cc
new file mode 100644
index 0000000000..34094f018c
--- /dev/null
+++ b/tensorflow/core/lib/io/inputbuffer_test.cc
@@ -0,0 +1,174 @@
+#include "tensorflow/core/lib/io/inputbuffer.h"
+
+#include "tensorflow/core/public/env.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/platform/test.h"
+
+namespace tensorflow {
+
+static std::vector<int> BufferSizes() {
+ return {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
+ 12, 13, 14, 15, 16, 17, 18, 19, 20, 65536};
+}
+
+TEST(InputBuffer, ReadLine_Empty) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string line;
+ io::InputBuffer in(file, buf_size);
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ }
+}
+
+TEST(InputBuffer, ReadLine1) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "line one\nline two\nline three\n");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string line;
+ io::InputBuffer in(file, buf_size);
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line one");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line two");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line three");
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ // A second call should also return end of file
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ }
+}
+
+TEST(InputBuffer, ReadLine_NoTrailingNewLine) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "line one\nline two\nline three");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string line;
+ io::InputBuffer in(file, buf_size);
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line one");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line two");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line three");
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ // A second call should also return end of file
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ }
+}
+
+TEST(InputBuffer, ReadLine_EmptyLines) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "line one\n\n\nline two\nline three");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string line;
+ io::InputBuffer in(file, buf_size);
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line one");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line two");
+ TF_CHECK_OK(in.ReadLine(&line));
+ EXPECT_EQ(line, "line three");
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ // A second call should also return end of file
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadLine(&line)));
+ }
+}
+
+TEST(InputBuffer, ReadNBytes) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "0123456789");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string read;
+ io::InputBuffer in(file, buf_size);
+ EXPECT_EQ(0, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(3, &read));
+ EXPECT_EQ(read, "012");
+ EXPECT_EQ(3, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(0, &read));
+ EXPECT_EQ(read, "");
+ EXPECT_EQ(3, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(4, &read));
+ EXPECT_EQ(read, "3456");
+ EXPECT_EQ(7, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(0, &read));
+ EXPECT_EQ(read, "");
+ EXPECT_EQ(7, in.Tell());
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadNBytes(5, &read)));
+ EXPECT_EQ(read, "789");
+ EXPECT_EQ(10, in.Tell());
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadNBytes(5, &read)));
+ EXPECT_EQ(read, "");
+ EXPECT_EQ(10, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(0, &read));
+ EXPECT_EQ(read, "");
+ EXPECT_EQ(10, in.Tell());
+ }
+}
+
+TEST(InputBuffer, SkipNBytes) {
+ Env* env = Env::Default();
+ string fname = testing::TmpDir() + "/inputbuffer_test";
+ WriteStringToFile(env, fname, "0123456789");
+
+ for (auto buf_size : BufferSizes()) {
+ RandomAccessFile* file;
+ TF_CHECK_OK(env->NewRandomAccessFile(fname, &file));
+ string read;
+ io::InputBuffer in(file, buf_size);
+ EXPECT_EQ(0, in.Tell());
+ TF_CHECK_OK(in.SkipNBytes(3));
+ EXPECT_EQ(3, in.Tell());
+ TF_CHECK_OK(in.SkipNBytes(0));
+ EXPECT_EQ(3, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(2, &read));
+ EXPECT_EQ(read, "34");
+ EXPECT_EQ(5, in.Tell());
+ TF_CHECK_OK(in.SkipNBytes(0));
+ EXPECT_EQ(5, in.Tell());
+ TF_CHECK_OK(in.SkipNBytes(2));
+ EXPECT_EQ(7, in.Tell());
+ TF_CHECK_OK(in.ReadNBytes(1, &read));
+ EXPECT_EQ(read, "7");
+ EXPECT_EQ(8, in.Tell());
+ EXPECT_TRUE(errors::IsOutOfRange(in.SkipNBytes(5)));
+ EXPECT_EQ(10, in.Tell());
+ EXPECT_TRUE(errors::IsOutOfRange(in.SkipNBytes(5)));
+ EXPECT_EQ(10, in.Tell());
+ EXPECT_TRUE(errors::IsOutOfRange(in.ReadNBytes(5, &read)));
+ EXPECT_EQ(read, "");
+ EXPECT_EQ(10, in.Tell());
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/iterator.cc b/tensorflow/core/lib/io/iterator.cc
new file mode 100644
index 0000000000..878e93a911
--- /dev/null
+++ b/tensorflow/core/lib/io/iterator.cc
@@ -0,0 +1,72 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/iterator.h"
+
+namespace tensorflow {
+namespace table {
+
+Iterator::Iterator() {
+ cleanup_.function = NULL;
+ cleanup_.next = NULL;
+}
+
+Iterator::~Iterator() {
+ if (cleanup_.function != NULL) {
+ (*cleanup_.function)(cleanup_.arg1, cleanup_.arg2);
+ for (Cleanup* c = cleanup_.next; c != NULL;) {
+ (*c->function)(c->arg1, c->arg2);
+ Cleanup* next = c->next;
+ delete c;
+ c = next;
+ }
+ }
+}
+
+void Iterator::RegisterCleanup(CleanupFunction func, void* arg1, void* arg2) {
+ assert(func != NULL);
+ Cleanup* c;
+ if (cleanup_.function == NULL) {
+ c = &cleanup_;
+ } else {
+ c = new Cleanup;
+ c->next = cleanup_.next;
+ cleanup_.next = c;
+ }
+ c->function = func;
+ c->arg1 = arg1;
+ c->arg2 = arg2;
+}
+
+namespace {
+class EmptyIterator : public Iterator {
+ public:
+ EmptyIterator(const Status& s) : status_(s) {}
+ virtual bool Valid() const { return false; }
+ virtual void Seek(const StringPiece& target) {}
+ virtual void SeekToFirst() {}
+ virtual void Next() { assert(false); }
+ StringPiece key() const {
+ assert(false);
+ return StringPiece();
+ }
+ StringPiece value() const {
+ assert(false);
+ return StringPiece();
+ }
+ virtual Status status() const { return status_; }
+
+ private:
+ Status status_;
+};
+} // namespace
+
+Iterator* NewEmptyIterator() { return new EmptyIterator(Status::OK()); }
+
+Iterator* NewErrorIterator(const Status& status) {
+ return new EmptyIterator(status);
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/iterator.h b/tensorflow/core/lib/io/iterator.h
new file mode 100644
index 0000000000..603a2f95fe
--- /dev/null
+++ b/tensorflow/core/lib/io/iterator.h
@@ -0,0 +1,93 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+//
+// An iterator yields a sequence of key/value pairs from a source.
+// The following class defines the interface. Multiple implementations
+// are provided by this library. In particular, iterators are provided
+// to access the contents of a Table or a DB.
+//
+// Multiple threads can invoke const methods on an Iterator without
+// external synchronization, but if any of the threads may call a
+// non-const method, all threads accessing the same Iterator must use
+// external synchronization.
+
+#ifndef TENSORFLOW_LIB_IO_ITERATOR_H_
+#define TENSORFLOW_LIB_IO_ITERATOR_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+namespace table {
+
+class Iterator {
+ public:
+ Iterator();
+ virtual ~Iterator();
+
+ // An iterator is either positioned at a key/value pair, or
+ // not valid. This method returns true iff the iterator is valid.
+ virtual bool Valid() const = 0;
+
+ // Position at the first key in the source. The iterator is Valid()
+ // after this call iff the source is not empty.
+ virtual void SeekToFirst() = 0;
+
+ // Position at the first key in the source that is at or past target.
+ // The iterator is Valid() after this call iff the source contains
+ // an entry that comes at or past target.
+ virtual void Seek(const StringPiece& target) = 0;
+
+ // Moves to the next entry in the source. After this call, Valid() is
+ // true iff the iterator was not positioned at the last entry in the source.
+ // REQUIRES: Valid()
+ virtual void Next() = 0;
+
+ // Return the key for the current entry. The underlying storage for
+ // the returned slice is valid only until the next modification of
+ // the iterator.
+ // REQUIRES: Valid()
+ virtual StringPiece key() const = 0;
+
+ // Return the value for the current entry. The underlying storage for
+ // the returned slice is valid only until the next modification of
+ // the iterator.
+ // REQUIRES: Valid()
+ virtual StringPiece value() const = 0;
+
+ // If an error has occurred, return it. Else return an ok status.
+ virtual Status status() const = 0;
+
+ // Clients are allowed to register function/arg1/arg2 triples that
+ // will be invoked when this iterator is destroyed.
+ //
+ // Note that unlike all of the preceding methods, this method is
+ // not abstract and therefore clients should not override it.
+ typedef void (*CleanupFunction)(void* arg1, void* arg2);
+ void RegisterCleanup(CleanupFunction function, void* arg1, void* arg2);
+
+ private:
+ struct Cleanup {
+ CleanupFunction function;
+ void* arg1;
+ void* arg2;
+ Cleanup* next;
+ };
+ Cleanup cleanup_;
+
+ // No copying allowed
+ Iterator(const Iterator&);
+ void operator=(const Iterator&);
+};
+
+// Return an empty iterator (yields nothing).
+extern Iterator* NewEmptyIterator();
+
+// Return an empty iterator with the specified status.
+extern Iterator* NewErrorIterator(const Status& status);
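+
+// A typical read loop over an Iterator obtained from, e.g.,
+// Table::NewIterator() (illustrative; error handling is abbreviated):
+//
+//   iter->SeekToFirst();
+//   while (iter->Valid()) {
+//     // ... use iter->key() and iter->value() ...
+//     iter->Next();
+//   }
+//   Status s = iter->status();  // check for errors after the loop
+//   delete iter;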
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_ITERATOR_H_
diff --git a/tensorflow/core/lib/io/match.cc b/tensorflow/core/lib/io/match.cc
new file mode 100644
index 0000000000..1563642d0b
--- /dev/null
+++ b/tensorflow/core/lib/io/match.cc
@@ -0,0 +1,31 @@
+#include "tensorflow/core/lib/io/match.h"
+#include <fnmatch.h>
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace io {
+
+Status GetMatchingFiles(Env* env, const string& pattern,
+ std::vector<string>* results) {
+ results->clear();
+ std::vector<string> all_files;
+ string dir = Dirname(pattern).ToString();
+ if (dir.empty()) dir = ".";
+ string basename_pattern = Basename(pattern).ToString();
+ Status s = env->GetChildren(dir, &all_files);
+ if (!s.ok()) {
+ return s;
+ }
+ for (const auto& f : all_files) {
+ int flags = 0;
+ if (fnmatch(basename_pattern.c_str(), Basename(f).ToString().c_str(),
+ flags) == 0) {
+ results->push_back(JoinPath(dir, f));
+ }
+ }
+ return Status::OK();
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/match.h b/tensorflow/core/lib/io/match.h
new file mode 100644
index 0000000000..fd194178e7
--- /dev/null
+++ b/tensorflow/core/lib/io/match.h
@@ -0,0 +1,24 @@
+#ifndef TENSORFLOW_LIB_IO_MATCH_H_
+#define TENSORFLOW_LIB_IO_MATCH_H_
+
+#include <vector>
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+class Env;
+namespace io {
+
+// Given a pattern, return the set of files that match the pattern.
+// Note that this routine only supports wildcard characters in the
+// basename portion of the pattern, not in the directory portion. If
+// successful, return Status::OK and store the matching files in
+// "*results". Otherwise, return a non-OK status.
+Status GetMatchingFiles(Env* env, const string& pattern,
+ std::vector<string>* results);
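+
+// Example (illustrative; the directory and pattern below are hypothetical):
+//
+//   std::vector<string> files;
+//   Status s = io::GetMatchingFiles(Env::Default(), "/tmp/logs/worker-*", &files);
+//   // On success, "files" holds full paths such as "/tmp/logs/worker-00".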
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_MATCH_H_
diff --git a/tensorflow/core/lib/io/match_test.cc b/tensorflow/core/lib/io/match_test.cc
new file mode 100644
index 0000000000..aaa56e4e7e
--- /dev/null
+++ b/tensorflow/core/lib/io/match_test.cc
@@ -0,0 +1,51 @@
+#include <algorithm>
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/io/match.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/env.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace io {
+
+static string Match(Env* env, const string& suffix_pattern) {
+ std::vector<string> results;
+ Status s = GetMatchingFiles(env, JoinPath(testing::TmpDir(), suffix_pattern),
+ &results);
+ if (!s.ok()) {
+ return s.ToString();
+ } else {
+ string r;
+ std::sort(results.begin(), results.end());
+ for (size_t i = 0; i < results.size(); i++) {
+ strings::StrAppend(&r, (i > 0) ? "," : "", Basename(results[i]));
+ }
+ return r;
+ }
+}
+
+TEST(GetMatchingFiles, Simple) {
+ Env* env = Env::Default();
+ EXPECT_EQ(Match(env, "thereisnosuchfile"), "");
+ EXPECT_EQ(Match(env, "thereisnosuchfile*"), "");
+
+ // Populate a few files
+ EXPECT_OK(WriteStringToFile(Env::Default(),
+ JoinPath(testing::TmpDir(), "match-00"), ""));
+ EXPECT_OK(WriteStringToFile(Env::Default(),
+ JoinPath(testing::TmpDir(), "match-0a"), ""));
+ EXPECT_OK(WriteStringToFile(Env::Default(),
+ JoinPath(testing::TmpDir(), "match-01"), ""));
+ EXPECT_OK(WriteStringToFile(Env::Default(),
+ JoinPath(testing::TmpDir(), "match-aaa"), ""));
+
+ EXPECT_EQ(Match(env, "match-*"), "match-00,match-01,match-0a,match-aaa");
+ EXPECT_EQ(Match(env, "match-0[0-9]"), "match-00,match-01");
+ EXPECT_EQ(Match(env, "match-?[0-9]"), "match-00,match-01");
+ EXPECT_EQ(Match(env, "match-?a*"), "match-0a,match-aaa");
+ EXPECT_EQ(Match(env, "match-??"), "match-00,match-01,match-0a");
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/path.cc b/tensorflow/core/lib/io/path.cc
new file mode 100644
index 0000000000..1359ded0f0
--- /dev/null
+++ b/tensorflow/core/lib/io/path.cc
@@ -0,0 +1,92 @@
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace tensorflow {
+namespace io {
+
+string JoinPath(StringPiece part1, StringPiece part2) {
+ string result;
+
+ StringPiece paths[2] = {part1, part2};
+ for (StringPiece path : paths) {
+ if (path.empty()) continue;
+
+ if (result.empty()) {
+ result = path.ToString();
+ continue;
+ }
+
+ if (result[result.size() - 1] == '/') {
+ if (IsAbsolutePath(path)) {
+ strings::StrAppend(&result, path.substr(1));
+ } else {
+ strings::StrAppend(&result, path);
+ }
+ } else {
+ if (IsAbsolutePath(path)) {
+ strings::StrAppend(&result, path);
+ } else {
+ strings::StrAppend(&result, "/", path);
+ }
+ }
+ }
+
+ return result;
+}
+
+namespace internal {
+
+// Return the parts of the path, split on the final "/". If there is no
+// "/" in the path, the first part of the output is empty and the second
+// is the input. If the only "/" in the path is the first character, it is
+// the first part of the output.
+std::pair<StringPiece, StringPiece> SplitPath(StringPiece path) {
+ auto pos = path.rfind('/');
+
+ // Handle the case with no '/' in 'path'.
+ if (pos == StringPiece::npos)
+ return std::make_pair(StringPiece(path.data(), 0), path);
+
+ // Handle the case with a single leading '/' in 'path'.
+ if (pos == 0)
+ return std::make_pair(StringPiece(path.data(), 1),
+ StringPiece(path.data() + 1, path.size() - 1));
+
+ return std::make_pair(
+ StringPiece(path.data(), pos),
+ StringPiece(path.data() + pos + 1, path.size() - (pos + 1)));
+}
+
+// Return the parts of the basename of path, split on the final ".".
+// If there is no "." in the basename or "." is the final character in the
+// basename, the second value will be empty.
+std::pair<StringPiece, StringPiece> SplitBasename(StringPiece path) {
+ path = Basename(path);
+
+ auto pos = path.rfind('.');
+ if (pos == StringPiece::npos)
+ return std::make_pair(path, StringPiece(path.data() + path.size(), 0));
+ return std::make_pair(
+ StringPiece(path.data(), pos),
+ StringPiece(path.data() + pos + 1, path.size() - (pos + 1)));
+}
+} // namespace internal
+
+bool IsAbsolutePath(StringPiece path) {
+ return !path.empty() && path[0] == '/';
+}
+
+StringPiece Dirname(StringPiece path) {
+ return internal::SplitPath(path).first;
+}
+
+StringPiece Basename(StringPiece path) {
+ return internal::SplitPath(path).second;
+}
+
+StringPiece Extension(StringPiece path) {
+ return internal::SplitBasename(path).second;
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/path.h b/tensorflow/core/lib/io/path.h
new file mode 100644
index 0000000000..01483f1702
--- /dev/null
+++ b/tensorflow/core/lib/io/path.h
@@ -0,0 +1,47 @@
+#ifndef TENSORFLOW_LIB_IO_PATH_H_
+#define TENSORFLOW_LIB_IO_PATH_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+class StringPiece;
+namespace io {
+
+// Utility routines for processing filenames
+
+// Join multiple paths together, without introducing unnecessary path
+// separators.
+// For example:
+//
+// Arguments | JoinPath
+// ---------------------------+----------
+// '/foo', 'bar' | /foo/bar
+// '/foo/', 'bar' | /foo/bar
+// '/foo', '/bar' | /foo/bar
+//
+// Usage:
+// string path = io::JoinPath("/mydir", filename);
+// string path = io::JoinPath(FLAGS_test_srcdir, filename);
+string JoinPath(StringPiece part1, StringPiece part2);
+
+// Return true if path is absolute.
+bool IsAbsolutePath(StringPiece path);
+
+// Returns the part of the path before the final "/". If there is a single
+// leading "/" in the path, the result will be the leading "/". If there is
+// no "/" in the path, the result is the empty prefix of the input.
+StringPiece Dirname(StringPiece path);
+
+// Returns the part of the path after the final "/". If there is no
+// "/" in the path, the result is the same as the input.
+StringPiece Basename(StringPiece path);
+
+// Returns the part of the basename of path after the final ".". If
+// there is no "." in the basename, the result is empty.
+StringPiece Extension(StringPiece path);
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_PATH_H_
diff --git a/tensorflow/core/lib/io/path_test.cc b/tensorflow/core/lib/io/path_test.cc
new file mode 100644
index 0000000000..b670e44f1f
--- /dev/null
+++ b/tensorflow/core/lib/io/path_test.cc
@@ -0,0 +1,65 @@
+#include "tensorflow/core/lib/io/path.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace io {
+
+TEST(PathTest, JoinPath) {
+ EXPECT_EQ("/foo/bar", JoinPath("/foo", "bar"));
+ EXPECT_EQ("foo/bar", JoinPath("foo", "bar"));
+ EXPECT_EQ("foo/bar", JoinPath("foo", "/bar"));
+ EXPECT_EQ("/foo/bar", JoinPath("/foo", "/bar"));
+
+ EXPECT_EQ("/bar", JoinPath("", "/bar"));
+ EXPECT_EQ("bar", JoinPath("", "bar"));
+ EXPECT_EQ("/foo", JoinPath("/foo", ""));
+
+ EXPECT_EQ("/foo/bar/baz/blah/blink/biz",
+ JoinPath("/foo/bar/baz/", "/blah/blink/biz"));
+}
+
+TEST(PathTest, IsAbsolutePath) {
+ EXPECT_FALSE(IsAbsolutePath(""));
+ EXPECT_FALSE(IsAbsolutePath("../foo"));
+ EXPECT_FALSE(IsAbsolutePath("foo"));
+ EXPECT_FALSE(IsAbsolutePath("./foo"));
+ EXPECT_FALSE(IsAbsolutePath("foo/bar/baz/"));
+ EXPECT_TRUE(IsAbsolutePath("/foo"));
+ EXPECT_TRUE(IsAbsolutePath("/foo/bar/../baz"));
+}
+
+TEST(PathTest, Dirname) {
+ EXPECT_EQ("/hello", Dirname("/hello/"));
+ EXPECT_EQ("/", Dirname("/hello"));
+ EXPECT_EQ("hello", Dirname("hello/world"));
+ EXPECT_EQ("hello", Dirname("hello/"));
+ EXPECT_EQ("", Dirname("world"));
+ EXPECT_EQ("/", Dirname("/"));
+ EXPECT_EQ("", Dirname(""));
+}
+
+TEST(PathTest, Basename) {
+ EXPECT_EQ("", Basename("/hello/"));
+ EXPECT_EQ("hello", Basename("/hello"));
+ EXPECT_EQ("world", Basename("hello/world"));
+ EXPECT_EQ("", Basename("hello/"));
+ EXPECT_EQ("world", Basename("world"));
+ EXPECT_EQ("", Basename("/"));
+ EXPECT_EQ("", Basename(""));
+}
+
+TEST(PathTest, Extension) {
+ EXPECT_EQ("gif", Extension("foo.gif"));
+ EXPECT_EQ("", Extension("foo."));
+ EXPECT_EQ("", Extension(""));
+ EXPECT_EQ("", Extension("/"));
+ EXPECT_EQ("", Extension("foo"));
+ EXPECT_EQ("", Extension("foo/"));
+ EXPECT_EQ("gif", Extension("/a/path/to/foo.gif"));
+ EXPECT_EQ("html", Extension("/a/path.bar/to/foo.html"));
+ EXPECT_EQ("", Extension("/a/path.bar/to/foo"));
+ EXPECT_EQ("baz", Extension("/a/path.bar/to/foo.bar.baz"));
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/record_reader.cc b/tensorflow/core/lib/io/record_reader.cc
new file mode 100644
index 0000000000..2f0fabff63
--- /dev/null
+++ b/tensorflow/core/lib/io/record_reader.cc
@@ -0,0 +1,80 @@
+#include "tensorflow/core/lib/io/record_reader.h"
+
+#include <limits.h>
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace io {
+
+RecordReader::RecordReader(RandomAccessFile* file) : src_(file) {}
+
+RecordReader::~RecordReader() {}
+
+// Read n+4 bytes from file, verify that checksum of first n bytes is
+// stored in the last 4 bytes and store the first n bytes in *result.
+// May use *storage as backing store.
+static Status ReadChecksummed(RandomAccessFile* file, uint64 offset,
+ size_t n, StringPiece* result,
+ string* storage) {
+ if (n >= SIZE_MAX - sizeof(uint32)) {
+ return errors::DataLoss("record size too large");
+ }
+
+ const size_t expected = n + sizeof(uint32);
+ storage->resize(expected);
+ StringPiece data;
+ Status s = file->Read(offset, expected, &data, &(*storage)[0]);
+ if (!s.ok()) {
+ return s;
+ }
+ if (data.size() != expected) {
+ if (data.size() == 0) {
+ return errors::OutOfRange("eof");
+ } else {
+ return errors::DataLoss("truncated record at ", offset);
+ }
+ }
+ uint32 masked_crc = core::DecodeFixed32(data.data() + n);
+ if (crc32c::Unmask(masked_crc) != crc32c::Value(data.data(), n)) {
+ return errors::DataLoss("corrupted record at ", offset);
+ }
+ *result = StringPiece(data.data(), n);
+ return Status::OK();
+}
+
+Status RecordReader::ReadRecord(uint64* offset, string* record) {
+ static const size_t kHeaderSize = sizeof(uint64) + sizeof(uint32);
+ static const size_t kFooterSize = sizeof(uint32);
+
+ // Read length
+ StringPiece lbuf;
+ Status s = ReadChecksummed(src_, *offset, sizeof(uint64), &lbuf, record);
+ if (!s.ok()) {
+ return s;
+ }
+ const uint64 length = core::DecodeFixed64(lbuf.data());
+
+ // Read data
+ StringPiece data;
+ s = ReadChecksummed(src_, *offset + kHeaderSize, length, &data, record);
+ if (!s.ok()) {
+ if (errors::IsOutOfRange(s)) {
+ s = errors::DataLoss("truncated record at ", *offset);
+ }
+ return s;
+ }
+ if (record->data() != data.data()) {
+ // RandomAccessFile placed the data in some other location.
+ memmove(&(*record)[0], data.data(), data.size());
+ }
+
+ record->resize(data.size());
+ *offset += kHeaderSize + length + kFooterSize;
+ return Status::OK();
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/record_reader.h b/tensorflow/core/lib/io/record_reader.h
new file mode 100644
index 0000000000..a8c1b0dd5d
--- /dev/null
+++ b/tensorflow/core/lib/io/record_reader.h
@@ -0,0 +1,36 @@
+#ifndef TENSORFLOW_LIB_IO_RECORD_READER_H_
+#define TENSORFLOW_LIB_IO_RECORD_READER_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class RandomAccessFile;
+
+namespace io {
+
+class RecordReader {
+ public:
+ // Create a reader that will return log records from "*file".
+ // "*file" must remain live while this Reader is in use.
+ explicit RecordReader(RandomAccessFile* file);
+
+ ~RecordReader();
+
+ // Read the record at "*offset" into *record and update *offset to
+ // point to the offset of the next record. Returns OK on success,
+ // OUT_OF_RANGE for end of file, or something else for an error.
+ Status ReadRecord(uint64* offset, string* record);
+
+ private:
+ RandomAccessFile* src_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RecordReader);
+};
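+
+// A sketch of reading every record in a file (illustrative; error and
+// ownership handling are abbreviated, and "file" must outlive the reader):
+//
+//   io::RecordReader reader(file);
+//   uint64 offset = 0;
+//   string record;
+//   while (reader.ReadRecord(&offset, &record).ok()) {
+//     // ... process "record" ...
+//   }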
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_RECORD_READER_H_
diff --git a/tensorflow/core/lib/io/record_writer.cc b/tensorflow/core/lib/io/record_writer.cc
new file mode 100644
index 0000000000..3d7f1509ab
--- /dev/null
+++ b/tensorflow/core/lib/io/record_writer.cc
@@ -0,0 +1,42 @@
+#include "tensorflow/core/lib/io/record_writer.h"
+
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+
+namespace tensorflow {
+namespace io {
+
+RecordWriter::RecordWriter(WritableFile* dest) : dest_(dest) {}
+
+RecordWriter::~RecordWriter() {}
+
+static uint32 MaskedCrc(const char* data, size_t n) {
+ return crc32c::Mask(crc32c::Value(data, n));
+}
+
+Status RecordWriter::WriteRecord(StringPiece data) {
+ // Format of a single record:
+ // uint64 length
+ // uint32 masked crc of length
+ // byte data[length]
+ // uint32 masked crc of data
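+ // For example (illustrative), a 3-byte payload is written as 19 bytes:
+ // 8 (length) + 4 (masked crc of length) + 3 (data) + 4 (masked crc of
+ // data), i.e. each record carries a fixed 16 bytes of framing overhead.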
+ char header[sizeof(uint64) + sizeof(uint32)];
+ core::EncodeFixed64(header + 0, data.size());
+ core::EncodeFixed32(header + sizeof(uint64),
+ MaskedCrc(header, sizeof(uint64)));
+ Status s = dest_->Append(StringPiece(header, sizeof(header)));
+ if (!s.ok()) {
+ return s;
+ }
+ s = dest_->Append(data);
+ if (!s.ok()) {
+ return s;
+ }
+ char footer[sizeof(uint32)];
+ core::EncodeFixed32(footer, MaskedCrc(data.data(), data.size()));
+ return dest_->Append(StringPiece(footer, sizeof(footer)));
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/record_writer.h b/tensorflow/core/lib/io/record_writer.h
new file mode 100644
index 0000000000..c7af00e5ae
--- /dev/null
+++ b/tensorflow/core/lib/io/record_writer.h
@@ -0,0 +1,34 @@
+#ifndef TENSORFLOW_LIB_IO_RECORD_WRITER_H_
+#define TENSORFLOW_LIB_IO_RECORD_WRITER_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class WritableFile;
+
+namespace io {
+
+class RecordWriter {
+ public:
+ // Create a writer that will append data to "*dest".
+ // "*dest" must be initially empty.
+ // "*dest" must remain live while this Writer is in use.
+ explicit RecordWriter(WritableFile* dest);
+
+ ~RecordWriter();
+
+ Status WriteRecord(StringPiece slice);
+
+ private:
+ WritableFile* const dest_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(RecordWriter);
+};
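+
+// A sketch of appending records to a new file (illustrative; "fname" is a
+// hypothetical path and error handling is abbreviated):
+//
+//   WritableFile* file;
+//   TF_CHECK_OK(Env::Default()->NewWritableFile(fname, &file));
+//   io::RecordWriter writer(file);
+//   TF_CHECK_OK(writer.WriteRecord("payload"));
+//   TF_CHECK_OK(file->Close());
+//   delete file;  // the writer does not take ownership of "file"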
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_RECORD_WRITER_H_
diff --git a/tensorflow/core/lib/io/recordio_test.cc b/tensorflow/core/lib/io/recordio_test.cc
new file mode 100644
index 0000000000..3e9c816443
--- /dev/null
+++ b/tensorflow/core/lib/io/recordio_test.cc
@@ -0,0 +1,245 @@
+#include "tensorflow/core/lib/io/record_reader.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include "tensorflow/core/lib/io/record_writer.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace io {
+
+// Construct a string of the specified length made out of the supplied
+// partial string.
+static string BigString(const string& partial_string, size_t n) {
+ string result;
+ while (result.size() < n) {
+ result.append(partial_string);
+ }
+ result.resize(n);
+ return result;
+}
+
+// Construct a string from a number
+static string NumberString(int n) {
+ char buf[50];
+ snprintf(buf, sizeof(buf), "%d.", n);
+ return string(buf);
+}
+
+// Return a skewed potentially long string
+static string RandomSkewedString(int i, random::SimplePhilox* rnd) {
+ return BigString(NumberString(i), rnd->Skewed(17));
+}
+
+class RecordioTest : public testing::Test {
+ private:
+ class StringDest : public WritableFile {
+ public:
+ string contents_;
+
+ Status Close() override { return Status::OK(); }
+ Status Flush() override { return Status::OK(); }
+ Status Sync() override { return Status::OK(); }
+ Status Append(const StringPiece& slice) override {
+ contents_.append(slice.data(), slice.size());
+ return Status::OK();
+ }
+ };
+
+ class StringSource : public RandomAccessFile {
+ public:
+ StringPiece contents_;
+ mutable bool force_error_;
+ mutable bool returned_partial_;
+ StringSource() : force_error_(false), returned_partial_(false) {}
+
+ Status Read(uint64 offset, size_t n, StringPiece* result,
+ char* scratch) const override {
+ EXPECT_FALSE(returned_partial_) << "must not Read() after eof/error";
+
+ if (force_error_) {
+ force_error_ = false;
+ returned_partial_ = true;
+ return errors::DataLoss("read error");
+ }
+
+ if (offset >= contents_.size()) {
+ return errors::OutOfRange("end of file");
+ }
+
+ if (contents_.size() < offset + n) {
+ n = contents_.size() - offset;
+ returned_partial_ = true;
+ }
+ *result = StringPiece(contents_.data() + offset, n);
+ return Status::OK();
+ }
+ };
+
+ StringDest dest_;
+ StringSource source_;
+ bool reading_;
+ uint64 readpos_;
+ RecordWriter* writer_;
+ RecordReader* reader_;
+
+ public:
+ RecordioTest()
+ : reading_(false),
+ readpos_(0),
+ writer_(new RecordWriter(&dest_)),
+ reader_(new RecordReader(&source_)) {}
+
+ ~RecordioTest() override {
+ delete writer_;
+ delete reader_;
+ }
+
+ void Write(const string& msg) {
+ ASSERT_TRUE(!reading_) << "Write() after starting to read";
+ ASSERT_OK(writer_->WriteRecord(StringPiece(msg)));
+ }
+
+ size_t WrittenBytes() const { return dest_.contents_.size(); }
+
+ string Read() {
+ if (!reading_) {
+ reading_ = true;
+ source_.contents_ = StringPiece(dest_.contents_);
+ }
+ string record;
+ Status s = reader_->ReadRecord(&readpos_, &record);
+ if (s.ok()) {
+ return record;
+ } else if (errors::IsOutOfRange(s)) {
+ return "EOF";
+ } else {
+ return s.ToString();
+ }
+ }
+
+ void IncrementByte(int offset, int delta) {
+ dest_.contents_[offset] += delta;
+ }
+
+ void SetByte(int offset, char new_byte) {
+ dest_.contents_[offset] = new_byte;
+ }
+
+ void ShrinkSize(int bytes) {
+ dest_.contents_.resize(dest_.contents_.size() - bytes);
+ }
+
+ void FixChecksum(int header_offset, int len) {
+ // Compute crc of type/len/data
+ uint32_t crc = crc32c::Value(&dest_.contents_[header_offset + 6], 1 + len);
+ crc = crc32c::Mask(crc);
+ core::EncodeFixed32(&dest_.contents_[header_offset], crc);
+ }
+
+ void ForceError() { source_.force_error_ = true; }
+
+ void StartReadingAt(uint64_t initial_offset) { readpos_ = initial_offset; }
+
+ void CheckOffsetPastEndReturnsNoRecords(uint64_t offset_past_end) {
+ Write("foo");
+ Write("bar");
+ Write(BigString("x", 10000));
+ reading_ = true;
+ source_.contents_ = StringPiece(dest_.contents_);
+ uint64 offset = WrittenBytes() + offset_past_end;
+ string record;
+ Status s = reader_->ReadRecord(&offset, &record);
+ ASSERT_TRUE(errors::IsOutOfRange(s)) << s;
+ }
+};
+
+TEST_F(RecordioTest, Empty) { ASSERT_EQ("EOF", Read()); }
+
+TEST_F(RecordioTest, ReadWrite) {
+ Write("foo");
+ Write("bar");
+ Write("");
+ Write("xxxx");
+ ASSERT_EQ("foo", Read());
+ ASSERT_EQ("bar", Read());
+ ASSERT_EQ("", Read());
+ ASSERT_EQ("xxxx", Read());
+ ASSERT_EQ("EOF", Read());
+ ASSERT_EQ("EOF", Read()); // Make sure reads at eof work
+}
+
+TEST_F(RecordioTest, ManyRecords) {
+ for (int i = 0; i < 100000; i++) {
+ Write(NumberString(i));
+ }
+ for (int i = 0; i < 100000; i++) {
+ ASSERT_EQ(NumberString(i), Read());
+ }
+ ASSERT_EQ("EOF", Read());
+}
+
+TEST_F(RecordioTest, RandomRead) {
+ const int N = 500;
+ {
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int i = 0; i < N; i++) {
+ Write(RandomSkewedString(i, &rnd));
+ }
+ }
+ {
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int i = 0; i < N; i++) {
+ ASSERT_EQ(RandomSkewedString(i, &rnd), Read());
+ }
+ }
+ ASSERT_EQ("EOF", Read());
+}
+
+// Tests of the error paths in record_reader.cc follow:
+static void AssertHasSubstr(StringPiece s, StringPiece expected) {
+ EXPECT_TRUE(StringPiece(s).contains(expected)) << s << " does not contain "
+ << expected;
+}
+
+TEST_F(RecordioTest, ReadError) {
+ Write("foo");
+ ForceError();
+ AssertHasSubstr(Read(), "Data loss");
+}
+
+TEST_F(RecordioTest, CorruptLength) {
+ Write("foo");
+ IncrementByte(6, 100);
+ AssertHasSubstr(Read(), "Data loss");
+}
+
+TEST_F(RecordioTest, CorruptLengthCrc) {
+ Write("foo");
+ IncrementByte(10, 100);
+ AssertHasSubstr(Read(), "Data loss");
+}
+
+TEST_F(RecordioTest, CorruptData) {
+ Write("foo");
+ IncrementByte(14, 10);
+ AssertHasSubstr(Read(), "Data loss");
+}
+
+TEST_F(RecordioTest, CorruptDataCrc) {
+ Write("foo");
+ IncrementByte(WrittenBytes() - 1, 10);
+ AssertHasSubstr(Read(), "Data loss");
+}
+
+TEST_F(RecordioTest, ReadEnd) { CheckOffsetPastEndReturnsNoRecords(0); }
+
+TEST_F(RecordioTest, ReadPastEnd) { CheckOffsetPastEndReturnsNoRecords(5); }
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/table.cc b/tensorflow/core/lib/io/table.cc
new file mode 100644
index 0000000000..769d7e72a5
--- /dev/null
+++ b/tensorflow/core/lib/io/table.cc
@@ -0,0 +1,169 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/table.h"
+
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/block.h"
+#include "tensorflow/core/lib/io/format.h"
+#include "tensorflow/core/lib/io/table_options.h"
+#include "tensorflow/core/lib/io/two_level_iterator.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace table {
+
+struct Table::Rep {
+ ~Rep() { delete index_block; }
+
+ Options options;
+ Status status;
+ RandomAccessFile* file;
+ // XXX uint64 cache_id;
+
+ BlockHandle metaindex_handle; // Handle to metaindex_block: saved from footer
+ Block* index_block;
+};
+
+Status Table::Open(const Options& options, RandomAccessFile* file,
+ uint64 size, Table** table) {
+ *table = NULL;
+ if (size < Footer::kEncodedLength) {
+ return errors::DataLoss("file is too short to be an sstable");
+ }
+
+ char footer_space[Footer::kEncodedLength];
+ StringPiece footer_input;
+ Status s =
+ file->Read(size - Footer::kEncodedLength, Footer::kEncodedLength,
+ &footer_input, footer_space);
+ if (!s.ok()) return s;
+
+ Footer footer;
+ s = footer.DecodeFrom(&footer_input);
+ if (!s.ok()) return s;
+
+ // Read the index block
+ BlockContents contents;
+ Block* index_block = NULL;
+ if (s.ok()) {
+ s = ReadBlock(file, footer.index_handle(), &contents);
+ if (s.ok()) {
+ index_block = new Block(contents);
+ }
+ }
+
+ if (s.ok()) {
+ // We've successfully read the footer and the index block: we're
+ // ready to serve requests.
+ Rep* rep = new Table::Rep;
+ rep->options = options;
+ rep->file = file;
+ rep->metaindex_handle = footer.metaindex_handle();
+ rep->index_block = index_block;
+ // XXX rep->cache_id = (options.block_cache ?
+ // options.block_cache->NewId() : 0);
+ *table = new Table(rep);
+ } else {
+ if (index_block) delete index_block;
+ }
+
+ return s;
+}
+
+Table::~Table() { delete rep_; }
+
+static void DeleteBlock(void* arg, void* ignored) {
+ delete reinterpret_cast<Block*>(arg);
+}
+
+// Convert an index iterator value (i.e., an encoded BlockHandle)
+// into an iterator over the contents of the corresponding block.
+Iterator* Table::BlockReader(void* arg, const StringPiece& index_value) {
+ Table* table = reinterpret_cast<Table*>(arg);
+ // Cache* block_cache = table->rep_->options.block_cache;
+ Block* block = NULL;
+ // Cache::Handle* cache_handle = NULL;
+
+ BlockHandle handle;
+ StringPiece input = index_value;
+ Status s = handle.DecodeFrom(&input);
+ // We intentionally allow extra stuff in index_value so that we
+ // can add more features in the future.
+
+ if (s.ok()) {
+ BlockContents contents;
+ s = ReadBlock(table->rep_->file, handle, &contents);
+ if (s.ok()) {
+ block = new Block(contents);
+ }
+ }
+
+ Iterator* iter;
+ if (block != NULL) {
+ iter = block->NewIterator();
+ iter->RegisterCleanup(&DeleteBlock, block, NULL);
+ } else {
+ iter = NewErrorIterator(s);
+ }
+ return iter;
+}
+
+Iterator* Table::NewIterator() const {
+ return NewTwoLevelIterator(rep_->index_block->NewIterator(),
+ &Table::BlockReader, const_cast<Table*>(this));
+}
+
+Status Table::InternalGet(const StringPiece& k, void* arg,
+ void (*saver)(void*, const StringPiece&,
+ const StringPiece&)) {
+ Status s;
+ Iterator* iiter = rep_->index_block->NewIterator();
+ iiter->Seek(k);
+ if (iiter->Valid()) {
+ BlockHandle handle;
+ Iterator* block_iter = BlockReader(this, iiter->value());
+ block_iter->Seek(k);
+ if (block_iter->Valid()) {
+ (*saver)(arg, block_iter->key(), block_iter->value());
+ }
+ s = block_iter->status();
+ delete block_iter;
+ }
+ if (s.ok()) {
+ s = iiter->status();
+ }
+ delete iiter;
+ return s;
+}
+
+uint64 Table::ApproximateOffsetOf(const StringPiece& key) const {
+ Iterator* index_iter = rep_->index_block->NewIterator();
+ index_iter->Seek(key);
+ uint64 result;
+ if (index_iter->Valid()) {
+ BlockHandle handle;
+ StringPiece input = index_iter->value();
+ Status s = handle.DecodeFrom(&input);
+ if (s.ok()) {
+ result = handle.offset();
+ } else {
+ // Strange: we can't decode the block handle in the index block.
+ // We'll just return the offset of the metaindex block, which is
+ // close to the whole file size for this case.
+ result = rep_->metaindex_handle.offset();
+ }
+ } else {
+ // key is past the last key in the file. Approximate the offset
+ // by returning the offset of the metaindex block (which is
+ // right near the end of the file).
+ result = rep_->metaindex_handle.offset();
+ }
+ delete index_iter;
+ return result;
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/table.h b/tensorflow/core/lib/io/table.h
new file mode 100644
index 0000000000..230dded2d4
--- /dev/null
+++ b/tensorflow/core/lib/io/table.h
@@ -0,0 +1,76 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#ifndef TENSORFLOW_LIB_IO_TABLE_H_
+#define TENSORFLOW_LIB_IO_TABLE_H_
+
+#include <stdint.h>
+#include "tensorflow/core/lib/io/iterator.h"
+
+namespace tensorflow {
+class RandomAccessFile;
+
+namespace table {
+
+class Block;
+class BlockHandle;
+class Footer;
+struct Options;
+
+// A Table is a sorted map from strings to strings. Tables are
+// immutable and persistent. A Table may be safely accessed from
+// multiple threads without external synchronization.
+class Table {
+ public:
+ // Attempt to open the table that is stored in bytes [0..file_size)
+ // of "file", and read the metadata entries necessary to allow
+ // retrieving data from the table.
+ //
+ // If successful, returns ok and sets "*table" to the newly opened
+ // table. The client should delete "*table" when no longer needed.
+ // If there was an error while initializing the table, sets "*table"
+ // to NULL and returns a non-ok status. Does not take ownership of
+ // "*file", but the client must ensure that "file" remains live
+ // for the duration of the returned table's lifetime.
+ static Status Open(const Options& options, RandomAccessFile* file,
+ uint64 file_size, Table** table);
+
+ ~Table();
+
+ // Returns a new iterator over the table contents.
+ // The result of NewIterator() is initially invalid (caller must
+ // call one of the Seek methods on the iterator before using it).
+ Iterator* NewIterator() const;
+
+ // Given a key, return an approximate byte offset in the file where
+ // the data for that key begins (or would begin if the key were
+ // present in the file). The returned value is in terms of file
+ // bytes, and so includes effects like compression of the underlying data.
+ // E.g., the approximate offset of the last key in the table will
+ // be close to the file length.
+ uint64 ApproximateOffsetOf(const StringPiece& key) const;
+
+ private:
+ struct Rep;
+ Rep* rep_;
+
+ explicit Table(Rep* rep) { rep_ = rep; }
+ static Iterator* BlockReader(void*, const StringPiece&);
+
+ // Calls (*handle_result)(arg, ...) with the entry found after a call
+ // to Seek(key). May not make such a call if there is no entry at
+ // or past "key" in the table.
+ Status InternalGet(const StringPiece& key, void* arg,
+ void (*handle_result)(void* arg, const StringPiece& k,
+ const StringPiece& v));
+
+ // No copying allowed
+ Table(const Table&);
+ void operator=(const Table&);
+};
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_TABLE_H_
diff --git a/tensorflow/core/lib/io/table_builder.cc b/tensorflow/core/lib/io/table_builder.cc
new file mode 100644
index 0000000000..b786888b30
--- /dev/null
+++ b/tensorflow/core/lib/io/table_builder.cc
@@ -0,0 +1,263 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/table_builder.h"
+
+#include <assert.h>
+#include "tensorflow/core/lib/io/block_builder.h"
+#include "tensorflow/core/lib/io/format.h"
+#include "tensorflow/core/lib/io/table_options.h"
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/hash/crc32c.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+namespace table {
+
+namespace {
+
+void FindShortestSeparator(string* start, const StringPiece& limit) {
+ // Find length of common prefix
+ size_t min_length = std::min(start->size(), limit.size());
+ size_t diff_index = 0;
+ while ((diff_index < min_length) &&
+ ((*start)[diff_index] == limit[diff_index])) {
+ diff_index++;
+ }
+
+ if (diff_index >= min_length) {
+ // Do not shorten if one string is a prefix of the other
+ } else {
+ uint8 diff_byte = static_cast<uint8>((*start)[diff_index]);
+ if (diff_byte < static_cast<uint8>(0xff) &&
+ diff_byte + 1 < static_cast<uint8>(limit[diff_index])) {
+ (*start)[diff_index]++;
+ start->resize(diff_index + 1);
+ assert(StringPiece(*start).compare(limit) < 0);
+ }
+ }
+}
+
+void FindShortSuccessor(string* key) {
+ // Find first character that can be incremented
+ size_t n = key->size();
+ for (size_t i = 0; i < n; i++) {
+ const uint8 byte = (*key)[i];
+ if (byte != static_cast<uint8>(0xff)) {
+ (*key)[i] = byte + 1;
+ key->resize(i + 1);
+ return;
+ }
+ }
+ // *key is a run of 0xffs. Leave it alone.
+}
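+
+// A minimal sketch of what the two helpers above do (values chosen for
+// illustration, not taken from a test): FindShortestSeparator shortens a key
+// so that it still sorts >= every key in the finished block and < every key
+// in the next one, and FindShortSuccessor bumps a key to a short string that
+// sorts strictly after it.
+inline void KeyShorteningExample() {
+  string start = "the quick brown fox";
+  FindShortestSeparator(&start, "the who");
+  assert(start == "the r");  // shares the prefix, sorts between the two keys
+
+  string key = "abc";
+  FindShortSuccessor(&key);
+  assert(key == "b");  // first byte incremented, remainder dropped
+}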
+} // namespace
+
+struct TableBuilder::Rep {
+ Options options;
+ Options index_block_options;
+ WritableFile* file;
+ uint64 offset;
+ Status status;
+ BlockBuilder data_block;
+ BlockBuilder index_block;
+ string last_key;
+ int64 num_entries;
+ bool closed; // Either Finish() or Abandon() has been called.
+
+ // We do not emit the index entry for a block until we have seen the
+ // first key for the next data block. This allows us to use shorter
+ // keys in the index block. For example, consider a block boundary
+ // between the keys "the quick brown fox" and "the who". We can use
+ // "the r" as the key for the index block entry since it is >= all
+ // entries in the first block and < all entries in subsequent
+ // blocks.
+ //
+ // Invariant: r->pending_index_entry is true only if data_block is empty.
+ bool pending_index_entry;
+ BlockHandle pending_handle; // Handle to add to index block
+
+ string compressed_output;
+
+ Rep(const Options& opt, WritableFile* f)
+ : options(opt),
+ index_block_options(opt),
+ file(f),
+ offset(0),
+ data_block(&options),
+ index_block(&index_block_options),
+ num_entries(0),
+ closed(false),
+ pending_index_entry(false) {
+ index_block_options.block_restart_interval = 1;
+ }
+};
+
+TableBuilder::TableBuilder(const Options& options, WritableFile* file)
+ : rep_(new Rep(options, file)) {}
+
+TableBuilder::~TableBuilder() {
+ assert(rep_->closed); // Catch errors where caller forgot to call Finish()
+ delete rep_;
+}
+
+void TableBuilder::Add(const StringPiece& key, const StringPiece& value) {
+ Rep* r = rep_;
+ assert(!r->closed);
+ if (!ok()) return;
+ if (r->num_entries > 0) {
+ assert(key.compare(StringPiece(r->last_key)) > 0);
+ // See if this key+value would make our current block overly large. If
+ // so, emit the current block before adding this key/value
+ const int kOverlyLargeBlockRatio = 2;
+ const size_t this_entry_bytes = key.size() + value.size();
+ if (this_entry_bytes >= kOverlyLargeBlockRatio * r->options.block_size) {
+ Flush();
+ }
+ }
+
+ if (r->pending_index_entry) {
+ assert(r->data_block.empty());
+ FindShortestSeparator(&r->last_key, key);
+ string handle_encoding;
+ r->pending_handle.EncodeTo(&handle_encoding);
+ r->index_block.Add(r->last_key, StringPiece(handle_encoding));
+ r->pending_index_entry = false;
+ }
+
+ r->last_key.assign(key.data(), key.size());
+ r->num_entries++;
+ r->data_block.Add(key, value);
+
+ const size_t estimated_block_size = r->data_block.CurrentSizeEstimate();
+ if (estimated_block_size >= r->options.block_size) {
+ Flush();
+ }
+}
+
+void TableBuilder::Flush() {
+ Rep* r = rep_;
+ assert(!r->closed);
+ if (!ok()) return;
+ if (r->data_block.empty()) return;
+ assert(!r->pending_index_entry);
+ WriteBlock(&r->data_block, &r->pending_handle);
+ if (ok()) {
+ r->pending_index_entry = true;
+ r->status = r->file->Flush();
+ }
+}
+
+void TableBuilder::WriteBlock(BlockBuilder* block, BlockHandle* handle) {
+ // File format contains a sequence of blocks where each block has:
+ // block_data: uint8[n]
+ // type: uint8
+ // crc: uint32
+ assert(ok());
+ Rep* r = rep_;
+ StringPiece raw = block->Finish();
+
+ StringPiece block_contents;
+ CompressionType type = r->options.compression;
+ // TODO(postrelease): Support more compression options: zlib?
+ switch (type) {
+ case kNoCompression:
+ block_contents = raw;
+ break;
+
+ case kSnappyCompression: {
+ string* compressed = &r->compressed_output;
+ if (port::Snappy_Compress(raw.data(), raw.size(), compressed) &&
+ compressed->size() < raw.size() - (raw.size() / 8u)) {
+ block_contents = *compressed;
+ } else {
+ // Snappy not supported, or compressed less than 12.5%, so just
+ // store uncompressed form
+ block_contents = raw;
+ type = kNoCompression;
+ }
+ break;
+ }
+ }
+ WriteRawBlock(block_contents, type, handle);
+ r->compressed_output.clear();
+ block->Reset();
+}
+
+void TableBuilder::WriteRawBlock(const StringPiece& block_contents,
+ CompressionType type, BlockHandle* handle) {
+ Rep* r = rep_;
+ handle->set_offset(r->offset);
+ handle->set_size(block_contents.size());
+ r->status = r->file->Append(block_contents);
+ if (r->status.ok()) {
+ char trailer[kBlockTrailerSize];
+ trailer[0] = type;
+ uint32 crc = crc32c::Value(block_contents.data(), block_contents.size());
+ crc = crc32c::Extend(crc, trailer, 1); // Extend crc to cover block type
+ core::EncodeFixed32(trailer + 1, crc32c::Mask(crc));
+ r->status = r->file->Append(StringPiece(trailer, kBlockTrailerSize));
+ if (r->status.ok()) {
+ r->offset += block_contents.size() + kBlockTrailerSize;
+ }
+ }
+}
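+
+// A minimal sketch of the matching read-side check for the trailer written
+// above (assumes crc32c::Unmask and core::DecodeFixed32 are available,
+// mirroring the Mask/EncodeFixed32 calls in WriteRawBlock): the stored crc
+// covers the block contents plus the one-byte type, and is masked on disk.
+inline bool TrailerCrcMatchesExample(const StringPiece& block_contents,
+                                     const char* trailer) {
+  const uint32 expected = crc32c::Unmask(core::DecodeFixed32(trailer + 1));
+  uint32 actual = crc32c::Value(block_contents.data(), block_contents.size());
+  actual = crc32c::Extend(actual, trailer, 1);  // extend over the type byte
+  return actual == expected;
+}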
+
+Status TableBuilder::status() const { return rep_->status; }
+
+Status TableBuilder::Finish() {
+ Rep* r = rep_;
+ Flush();
+ assert(!r->closed);
+ r->closed = true;
+
+ BlockHandle metaindex_block_handle, index_block_handle;
+
+ // Write metaindex block
+ if (ok()) {
+ BlockBuilder meta_index_block(&r->options);
+ // TODO(postrelease): Add stats and other meta blocks
+ WriteBlock(&meta_index_block, &metaindex_block_handle);
+ }
+
+ // Write index block
+ if (ok()) {
+ if (r->pending_index_entry) {
+ FindShortSuccessor(&r->last_key);
+ string handle_encoding;
+ r->pending_handle.EncodeTo(&handle_encoding);
+ r->index_block.Add(r->last_key, StringPiece(handle_encoding));
+ r->pending_index_entry = false;
+ }
+ WriteBlock(&r->index_block, &index_block_handle);
+ }
+
+ // Write footer
+ if (ok()) {
+ Footer footer;
+ footer.set_metaindex_handle(metaindex_block_handle);
+ footer.set_index_handle(index_block_handle);
+ string footer_encoding;
+ footer.EncodeTo(&footer_encoding);
+ r->status = r->file->Append(footer_encoding);
+ if (r->status.ok()) {
+ r->offset += footer_encoding.size();
+ }
+ }
+ return r->status;
+}
+
+void TableBuilder::Abandon() {
+ Rep* r = rep_;
+ assert(!r->closed);
+ r->closed = true;
+}
+
+uint64 TableBuilder::NumEntries() const { return rep_->num_entries; }
+
+uint64 TableBuilder::FileSize() const { return rep_->offset; }
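+
+// A minimal usage sketch (illustrative keys and a caller-supplied file are
+// assumed): keys must be added in strictly increasing order, and either
+// Finish() or Abandon() must be called before the builder is destroyed.
+Status BuildTwoEntryTableExample(const Options& options, WritableFile* file) {
+  TableBuilder builder(options, file);
+  builder.Add("apple", "red");
+  builder.Add("banana", "yellow");  // strictly greater than "apple"
+  // Finish() flushes the last data block and writes the metaindex, index
+  // and footer. The builder does not close "file"; that is the caller's job.
+  return builder.Finish();
+}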
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/table_builder.h b/tensorflow/core/lib/io/table_builder.h
new file mode 100644
index 0000000000..cebf4d8e0c
--- /dev/null
+++ b/tensorflow/core/lib/io/table_builder.h
@@ -0,0 +1,87 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+//
+// TableBuilder provides the interface used to build a Table
+// (an immutable and sorted map from keys to values).
+//
+// Multiple threads can invoke const methods on a TableBuilder without
+// external synchronization, but if any of the threads may call a
+// non-const method, all threads accessing the same TableBuilder must use
+// external synchronization.
+
+#ifndef TENSORFLOW_LIB_IO_TABLE_BUILDER_H_
+#define TENSORFLOW_LIB_IO_TABLE_BUILDER_H_
+
+#include <stdint.h>
+#include "tensorflow/core/lib/io/table_options.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+class WritableFile;
+namespace table {
+
+class BlockBuilder;
+class BlockHandle;
+
+class TableBuilder {
+ public:
+ // Create a builder that will store the contents of the table it is
+ // building in *file. Does not close the file. It is up to the
+ // caller to close the file after calling Finish().
+ TableBuilder(const Options& options, WritableFile* file);
+
+ // REQUIRES: Either Finish() or Abandon() has been called.
+ ~TableBuilder();
+
+ // Add key,value to the table being constructed.
+ // REQUIRES: key is after any previously added key in lexicographic order.
+ // REQUIRES: Finish(), Abandon() have not been called
+ void Add(const StringPiece& key, const StringPiece& value);
+
+ // Advanced operation: flush any buffered key/value pairs to file.
+ // Can be used to ensure that two adjacent entries never live in
+ // the same data block. Most clients should not need to use this method.
+ // REQUIRES: Finish(), Abandon() have not been called
+ void Flush();
+
+ // Return non-ok iff some error has been detected.
+ Status status() const;
+
+ // Finish building the table. Stops using the file passed to the
+ // constructor after this function returns.
+ // REQUIRES: Finish(), Abandon() have not been called
+ Status Finish();
+
+ // Indicate that the contents of this builder should be abandoned. Stops
+ // using the file passed to the constructor after this function returns.
+ // If the caller is not going to call Finish(), it must call Abandon()
+ // before destroying this builder.
+ // REQUIRES: Finish(), Abandon() have not been called
+ void Abandon();
+
+ // Number of calls to Add() so far.
+ uint64 NumEntries() const;
+
+ // Size of the file generated so far. If invoked after a successful
+ // Finish() call, returns the size of the final generated file.
+ uint64 FileSize() const;
+
+ private:
+ bool ok() const { return status().ok(); }
+ void WriteBlock(BlockBuilder* block, BlockHandle* handle);
+ void WriteRawBlock(const StringPiece& data, CompressionType,
+ BlockHandle* handle);
+
+ struct Rep;
+ Rep* rep_;
+
+ // No copying allowed
+ TableBuilder(const TableBuilder&);
+ void operator=(const TableBuilder&);
+};
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_TABLE_BUILDER_H_
diff --git a/tensorflow/core/lib/io/table_format.txt b/tensorflow/core/lib/io/table_format.txt
new file mode 100644
index 0000000000..7edb9fb121
--- /dev/null
+++ b/tensorflow/core/lib/io/table_format.txt
@@ -0,0 +1,8 @@
+File format
+===========
+
+The table format is heavily based on the table format for the LevelDB
+open source key/value store, with the exception that our tables
+do not support "filter" meta blocks (Bloom Filters). See:
+
+https://code.google.com/p/leveldb/source/browse/doc/table_format.txt
diff --git a/tensorflow/core/lib/io/table_options.h b/tensorflow/core/lib/io/table_options.h
new file mode 100644
index 0000000000..45b061b03b
--- /dev/null
+++ b/tensorflow/core/lib/io/table_options.h
@@ -0,0 +1,53 @@
+#ifndef TENSORFLOW_LIB_IO_TABLE_OPTIONS_H_
+#define TENSORFLOW_LIB_IO_TABLE_OPTIONS_H_
+
+#include <stddef.h>
+
+namespace tensorflow {
+namespace table {
+
+// DB contents are stored in a set of blocks, each of which holds a
+// sequence of key,value pairs. Each block may be compressed before
+// being stored in a file. The following enum describes which
+// compression method (if any) is used to compress a block.
+enum CompressionType {
+ // NOTE: do not change the values of existing entries, as these are
+ // part of the persistent format on disk.
+ kNoCompression = 0x0,
+ kSnappyCompression = 0x1
+};
+
+// Options to control the behavior of a table (passed to Table::Open)
+struct Options {
+ // Approximate size of user data packed per block. Note that the
+ // block size specified here corresponds to uncompressed data. The
+ // actual size of the unit read from disk may be smaller if
+ // compression is enabled. This parameter can be changed dynamically.
+ size_t block_size = 262144;
+
+ // Number of keys between restart points for delta encoding of keys.
+ // This parameter can be changed dynamically. Most clients should
+ // leave this parameter alone.
+ int block_restart_interval = 16;
+
+ // Compress blocks using the specified compression algorithm. This
+ // parameter can be changed dynamically.
+ //
+ // Default: kSnappyCompression, which gives lightweight but fast
+ // compression.
+ //
+ // Typical speeds of kSnappyCompression on an Intel(R) Core(TM)2 2.4GHz:
+ // ~200-500MB/s compression
+ // ~400-800MB/s decompression
+ // Note that these speeds are significantly faster than most
+ // persistent storage speeds, and therefore it is typically never
+ // worth switching to kNoCompression. Even if the input data is
+ // incompressible, the kSnappyCompression implementation will
+ // efficiently detect that and will switch to uncompressed mode.
+ CompressionType compression = kSnappyCompression;
+};
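+
+// Example (illustrative): callers that want smaller blocks and no
+// compression override the defaults before handing the options to
+// TableBuilder or Table::Open, e.g.
+//   Options options;
+//   options.block_size = 1024;
+//   options.compression = kNoCompression;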
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_TABLE_OPTIONS_H_
diff --git a/tensorflow/core/lib/io/table_test.cc b/tensorflow/core/lib/io/table_test.cc
new file mode 100644
index 0000000000..66e90ac64e
--- /dev/null
+++ b/tensorflow/core/lib/io/table_test.cc
@@ -0,0 +1,601 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/table.h"
+
+#include <map>
+#include <string>
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/io/block.h"
+#include "tensorflow/core/lib/io/block_builder.h"
+#include "tensorflow/core/lib/io/format.h"
+#include "tensorflow/core/lib/io/iterator.h"
+#include "tensorflow/core/lib/io/table_builder.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace table {
+
+namespace test {
+static StringPiece RandomString(random::SimplePhilox* rnd, int len,
+ string* dst) {
+ dst->resize(len);
+ for (int i = 0; i < len; i++) {
+ (*dst)[i] = static_cast<char>(' ' + rnd->Uniform(95)); // ' ' .. '~'
+ }
+ return StringPiece(*dst);
+}
+static string RandomKey(random::SimplePhilox* rnd, int len) {
+ // Make sure to generate a wide variety of characters so we
+ // test the boundary conditions for short-key optimizations.
+ static const char kTestChars[] = {'\0', '\1', 'a', 'b', 'c',
+ 'd', 'e', '\xfd', '\xfe', '\xff'};
+ string result;
+ for (int i = 0; i < len; i++) {
+ result += kTestChars[rnd->Uniform(sizeof(kTestChars))];
+ }
+ return result;
+}
+static StringPiece CompressibleString(random::SimplePhilox* rnd,
+ double compressed_fraction, size_t len,
+ string* dst) {
+ int raw = static_cast<int>(len * compressed_fraction);
+ if (raw < 1) raw = 1;
+ string raw_data;
+ RandomString(rnd, raw, &raw_data);
+
+ // Duplicate the random data until we have filled "len" bytes
+ dst->clear();
+ while (dst->size() < len) {
+ dst->append(raw_data);
+ }
+ dst->resize(len);
+ return StringPiece(*dst);
+}
+} // namespace test
+
+static void Increment(string* key) { key->push_back('\0'); }
+
+// An STL comparator that compares two StringPieces
+namespace {
+struct STLLessThan {
+ STLLessThan() {}
+ bool operator()(const string& a, const string& b) const {
+ return StringPiece(a).compare(StringPiece(b)) < 0;
+ }
+};
+} // namespace
+
+class StringSink : public WritableFile {
+ public:
+ ~StringSink() {}
+
+ const string& contents() const { return contents_; }
+
+ virtual Status Close() { return Status::OK(); }
+ virtual Status Flush() { return Status::OK(); }
+ virtual Status Sync() { return Status::OK(); }
+
+ virtual Status Append(const StringPiece& data) {
+ contents_.append(data.data(), data.size());
+ return Status::OK();
+ }
+
+ private:
+ string contents_;
+};
+
+class StringSource : public RandomAccessFile {
+ public:
+ StringSource(const StringPiece& contents)
+ : contents_(contents.data(), contents.size()), bytes_read_(0) {}
+
+ virtual ~StringSource() {}
+
+ uint64 Size() const { return contents_.size(); }
+
+ virtual Status Read(uint64 offset, size_t n, StringPiece* result,
+ char* scratch) const {
+ if (offset > contents_.size()) {
+ return errors::InvalidArgument("invalid Read offset");
+ }
+ if (offset + n > contents_.size()) {
+ n = contents_.size() - offset;
+ }
+ memcpy(scratch, &contents_[offset], n);
+ *result = StringPiece(scratch, n);
+ bytes_read_ += n;
+ return Status::OK();
+ }
+
+ uint64 BytesRead() const { return bytes_read_; }
+
+ private:
+ string contents_;
+ mutable uint64 bytes_read_;
+};
+
+typedef std::map<string, string, STLLessThan> KVMap;
+
+// Helper class for tests to unify the interface between
+// BlockBuilder/TableBuilder and Block/Table.
+class Constructor {
+ public:
+ explicit Constructor() : data_(STLLessThan()) {}
+ virtual ~Constructor() {}
+
+ void Add(const string& key, const StringPiece& value) {
+ data_[key] = value.ToString();
+ }
+
+ // Finish constructing the data structure with all the keys that have
+ // been added so far. Returns the keys in sorted order in "*keys"
+ // and stores the key/value pairs in "*kvmap"
+ void Finish(const Options& options, std::vector<string>* keys, KVMap* kvmap) {
+ *kvmap = data_;
+ keys->clear();
+ for (KVMap::const_iterator it = data_.begin(); it != data_.end(); ++it) {
+ keys->push_back(it->first);
+ }
+ data_.clear();
+ Status s = FinishImpl(options, *kvmap);
+ ASSERT_TRUE(s.ok()) << s.ToString();
+ }
+
+ // Construct the data structure from the data in "data"
+ virtual Status FinishImpl(const Options& options, const KVMap& data) = 0;
+
+ virtual Iterator* NewIterator() const = 0;
+
+ virtual const KVMap& data() { return data_; }
+
+ private:
+ KVMap data_;
+};
+
+class BlockConstructor : public Constructor {
+ public:
+ BlockConstructor() : block_(NULL) {}
+ ~BlockConstructor() { delete block_; }
+ virtual Status FinishImpl(const Options& options, const KVMap& data) {
+ delete block_;
+ block_ = NULL;
+ BlockBuilder builder(&options);
+
+ for (KVMap::const_iterator it = data.begin(); it != data.end(); ++it) {
+ builder.Add(it->first, it->second);
+ }
+ // Open the block
+ data_ = builder.Finish().ToString();
+ BlockContents contents;
+ contents.data = data_;
+ contents.cachable = false;
+ contents.heap_allocated = false;
+ block_ = new Block(contents);
+ return Status::OK();
+ }
+ virtual Iterator* NewIterator() const { return block_->NewIterator(); }
+
+ private:
+ string data_;
+ Block* block_;
+};
+
+class TableConstructor : public Constructor {
+ public:
+ TableConstructor() : source_(NULL), table_(NULL) {}
+ ~TableConstructor() { Reset(); }
+ virtual Status FinishImpl(const Options& options, const KVMap& data) {
+ Reset();
+ StringSink sink;
+ TableBuilder builder(options, &sink);
+
+ for (KVMap::const_iterator it = data.begin(); it != data.end(); ++it) {
+ builder.Add(it->first, it->second);
+ TF_CHECK_OK(builder.status());
+ }
+ Status s = builder.Finish();
+ TF_CHECK_OK(s) << s.ToString();
+
+ CHECK_EQ(sink.contents().size(), builder.FileSize());
+
+ // Open the table
+ source_ = new StringSource(sink.contents());
+ Options table_options;
+ return Table::Open(table_options, source_, sink.contents().size(), &table_);
+ }
+
+ virtual Iterator* NewIterator() const { return table_->NewIterator(); }
+
+ uint64 ApproximateOffsetOf(const StringPiece& key) const {
+ return table_->ApproximateOffsetOf(key);
+ }
+
+ uint64 BytesRead() const { return source_->BytesRead(); }
+
+ private:
+ void Reset() {
+ delete table_;
+ delete source_;
+ table_ = NULL;
+ source_ = NULL;
+ }
+
+ StringSource* source_;
+ Table* table_;
+};
+
+enum TestType { TABLE_TEST, BLOCK_TEST };
+
+struct TestArgs {
+ TestType type;
+ int restart_interval;
+};
+
+static const TestArgs kTestArgList[] = {
+ {TABLE_TEST, 16}, {TABLE_TEST, 1}, {TABLE_TEST, 1024},
+ {BLOCK_TEST, 16}, {BLOCK_TEST, 1}, {BLOCK_TEST, 1024},
+};
+static const int kNumTestArgs = sizeof(kTestArgList) / sizeof(kTestArgList[0]);
+
+class Harness : public ::testing::Test {
+ public:
+ Harness() : constructor_(NULL) {}
+
+ void Init(const TestArgs& args) {
+ delete constructor_;
+ constructor_ = NULL;
+ options_ = Options();
+
+ options_.block_restart_interval = args.restart_interval;
+ // Use shorter block size for tests to exercise block boundary
+ // conditions more.
+ options_.block_size = 256;
+ switch (args.type) {
+ case TABLE_TEST:
+ constructor_ = new TableConstructor();
+ break;
+ case BLOCK_TEST:
+ constructor_ = new BlockConstructor();
+ break;
+ }
+ }
+
+ ~Harness() { delete constructor_; }
+
+ void Add(const string& key, const string& value) {
+ constructor_->Add(key, value);
+ }
+
+ void Test(random::SimplePhilox* rnd) {
+ std::vector<string> keys;
+ KVMap data;
+ constructor_->Finish(options_, &keys, &data);
+
+ TestForwardScan(keys, data);
+ TestRandomAccess(rnd, keys, data);
+ }
+
+ void TestForwardScan(const std::vector<string>& keys, const KVMap& data) {
+ Iterator* iter = constructor_->NewIterator();
+ ASSERT_TRUE(!iter->Valid());
+ iter->SeekToFirst();
+ for (KVMap::const_iterator model_iter = data.begin();
+ model_iter != data.end(); ++model_iter) {
+ ASSERT_EQ(ToString(data, model_iter), ToString(iter));
+ iter->Next();
+ }
+ ASSERT_TRUE(!iter->Valid());
+ delete iter;
+ }
+
+ void TestRandomAccess(random::SimplePhilox* rnd,
+ const std::vector<string>& keys, const KVMap& data) {
+ static const bool kVerbose = false;
+ Iterator* iter = constructor_->NewIterator();
+ ASSERT_TRUE(!iter->Valid());
+ KVMap::const_iterator model_iter = data.begin();
+ if (kVerbose) fprintf(stderr, "---\n");
+ for (int i = 0; i < 200; i++) {
+ const int toss = rnd->Uniform(3);
+ switch (toss) {
+ case 0: {
+ if (iter->Valid()) {
+ if (kVerbose) fprintf(stderr, "Next\n");
+ iter->Next();
+ ++model_iter;
+ ASSERT_EQ(ToString(data, model_iter), ToString(iter));
+ }
+ break;
+ }
+
+ case 1: {
+ if (kVerbose) fprintf(stderr, "SeekToFirst\n");
+ iter->SeekToFirst();
+ model_iter = data.begin();
+ ASSERT_EQ(ToString(data, model_iter), ToString(iter));
+ break;
+ }
+
+ case 2: {
+ string key = PickRandomKey(rnd, keys);
+ model_iter = data.lower_bound(key);
+ if (kVerbose)
+ fprintf(stderr, "Seek '%s'\n", str_util::CEscape(key).c_str());
+ iter->Seek(StringPiece(key));
+ ASSERT_EQ(ToString(data, model_iter), ToString(iter));
+ break;
+ }
+ }
+ }
+ delete iter;
+ }
+
+ string ToString(const KVMap& data, const KVMap::const_iterator& it) {
+ if (it == data.end()) {
+ return "END";
+ } else {
+ return "'" + it->first + "->" + it->second + "'";
+ }
+ }
+
+ string ToString(const KVMap& data, const KVMap::const_reverse_iterator& it) {
+ if (it == data.rend()) {
+ return "END";
+ } else {
+ return "'" + it->first + "->" + it->second + "'";
+ }
+ }
+
+ string ToString(const Iterator* it) {
+ if (!it->Valid()) {
+ return "END";
+ } else {
+ return "'" + it->key().ToString() + "->" + it->value().ToString() + "'";
+ }
+ }
+
+ string PickRandomKey(random::SimplePhilox* rnd,
+ const std::vector<string>& keys) {
+ if (keys.empty()) {
+ return "foo";
+ } else {
+ const int index = rnd->Uniform(keys.size());
+ string result = keys[index];
+ switch (rnd->Uniform(3)) {
+ case 0:
+ // Return an existing key
+ break;
+ case 1: {
+ // Attempt to return something smaller than an existing key
+ if (result.size() > 0 && result[result.size() - 1] > '\0') {
+ result[result.size() - 1]--;
+ }
+ break;
+ }
+ case 2: {
+ // Return something larger than an existing key
+ Increment(&result);
+ break;
+ }
+ }
+ return result;
+ }
+ }
+
+ private:
+ Options options_;
+ Constructor* constructor_;
+};
+
+// Test empty table/block.
+TEST_F(Harness, Empty) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 1, 17);
+ random::SimplePhilox rnd(&philox);
+ Test(&rnd);
+ }
+}
+
+// Special test for a block with no restart entries. The C++ leveldb
+// code never generates such blocks, but the Java version of leveldb
+// seems to.
+TEST_F(Harness, ZeroRestartPointsInBlock) {
+ char data[sizeof(uint32)];
+ memset(data, 0, sizeof(data));
+ BlockContents contents;
+ contents.data = StringPiece(data, sizeof(data));
+ contents.cachable = false;
+ contents.heap_allocated = false;
+ Block block(contents);
+ Iterator* iter = block.NewIterator();
+ iter->SeekToFirst();
+ ASSERT_TRUE(!iter->Valid());
+ iter->Seek("foo");
+ ASSERT_TRUE(!iter->Valid());
+ delete iter;
+}
+
+// Test the empty key
+TEST_F(Harness, SimpleEmptyKey) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 1, 17);
+ random::SimplePhilox rnd(&philox);
+ Add("", "v");
+ Test(&rnd);
+ }
+}
+
+TEST_F(Harness, SimpleSingle) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 2, 17);
+ random::SimplePhilox rnd(&philox);
+ Add("abc", "v");
+ Test(&rnd);
+ }
+}
+
+TEST_F(Harness, SimpleMulti) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 3, 17);
+ random::SimplePhilox rnd(&philox);
+ Add("abc", "v");
+ Add("abcd", "v");
+ Add("ac", "v2");
+ Test(&rnd);
+ }
+}
+
+TEST_F(Harness, SimpleMultiBigValues) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 3, 17);
+ random::SimplePhilox rnd(&philox);
+ Add("ainitial", "tiny");
+ Add("anext", string(10000000, 'a'));
+ Add("anext2", string(10000000, 'b'));
+ Add("azz", "tiny");
+ Test(&rnd);
+ }
+}
+
+TEST_F(Harness, SimpleSpecialKey) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 4, 17);
+ random::SimplePhilox rnd(&philox);
+ Add("\xff\xff", "v3");
+ Test(&rnd);
+ }
+}
+
+TEST_F(Harness, Randomized) {
+ for (int i = 0; i < kNumTestArgs; i++) {
+ Init(kTestArgList[i]);
+ random::PhiloxRandom philox(testing::RandomSeed() + 5, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int num_entries = 0; num_entries < 2000;
+ num_entries += (num_entries < 50 ? 1 : 200)) {
+ if ((num_entries % 10) == 0) {
+ fprintf(stderr, "case %d of %d: num_entries = %d\n", (i + 1),
+ int(kNumTestArgs), num_entries);
+ }
+ for (int e = 0; e < num_entries; e++) {
+ string v;
+ Add(test::RandomKey(&rnd, rnd.Skewed(4)),
+ test::RandomString(&rnd, rnd.Skewed(5), &v).ToString());
+ }
+ Test(&rnd);
+ }
+ }
+}
+
+static bool Between(uint64 val, uint64 low, uint64 high) {
+ bool result = (val >= low) && (val <= high);
+ if (!result) {
+ fprintf(stderr, "Value %llu is not in range [%llu, %llu]\n",
+ (unsigned long long)(val), (unsigned long long)(low),
+ (unsigned long long)(high));
+ }
+ return result;
+}
+
+class TableTest {};
+
+TEST(TableTest, ApproximateOffsetOfPlain) {
+ TableConstructor c;
+ c.Add("k01", "hello");
+ c.Add("k02", "hello2");
+ c.Add("k03", string(10000, 'x'));
+ c.Add("k04", string(200000, 'x'));
+ c.Add("k05", string(300000, 'x'));
+ c.Add("k06", "hello3");
+ c.Add("k07", string(100000, 'x'));
+ std::vector<string> keys;
+ KVMap kvmap;
+ Options options;
+ options.block_size = 1024;
+ options.compression = kNoCompression;
+ c.Finish(options, &keys, &kvmap);
+
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("abc"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k01"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k01a"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k02"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k03"), 10, 500));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k04"), 10000, 11000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k04a"), 210000, 211000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k05"), 210000, 211000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k06"), 510000, 511000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k07"), 510000, 511000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("xyz"), 610000, 612000));
+}
+
+static bool SnappyCompressionSupported() {
+ string out;
+ StringPiece in = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
+ return port::Snappy_Compress(in.data(), in.size(), &out);
+}
+
+TEST(TableTest, ApproximateOffsetOfCompressed) {
+ if (!SnappyCompressionSupported()) {
+ fprintf(stderr, "skipping compression tests\n");
+ return;
+ }
+
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ TableConstructor c;
+ string tmp;
+ c.Add("k01", "hello");
+ c.Add("k02", test::CompressibleString(&rnd, 0.25, 10000, &tmp));
+ c.Add("k03", "hello3");
+ c.Add("k04", test::CompressibleString(&rnd, 0.25, 10000, &tmp));
+ std::vector<string> keys;
+ KVMap kvmap;
+ Options options;
+ options.block_size = 1024;
+ options.compression = kSnappyCompression;
+ c.Finish(options, &keys, &kvmap);
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("abc"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k01"), 0, 0));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k02"), 10, 100));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k03"), 2000, 3000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("k04"), 2000, 3000));
+ ASSERT_TRUE(Between(c.ApproximateOffsetOf("xyz"), 4000, 6000));
+}
+
+TEST(TableTest, SeekToFirstKeyDoesNotReadTooMuch) {
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ string tmp;
+ TableConstructor c;
+ c.Add("k01", "firstvalue");
+ c.Add("k03", test::CompressibleString(&rnd, 0.25, 1000000, &tmp));
+ c.Add("k04", "abc");
+ std::vector<string> keys;
+ KVMap kvmap;
+ Options options;
+ options.block_size = 1024;
+ options.compression = kNoCompression;
+ c.Finish(options, &keys, &kvmap);
+
+ Iterator* iter = c.NewIterator();
+ iter->Seek("k01");
+ delete iter;
+ // Make sure we don't read the big second block when just trying to
+ // retrieve the data in the first key
+ EXPECT_LT(c.BytesRead(), 200);
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/two_level_iterator.cc b/tensorflow/core/lib/io/two_level_iterator.cc
new file mode 100644
index 0000000000..409baade6d
--- /dev/null
+++ b/tensorflow/core/lib/io/two_level_iterator.cc
@@ -0,0 +1,148 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#include "tensorflow/core/lib/io/two_level_iterator.h"
+
+#include "tensorflow/core/lib/io/table.h"
+#include "tensorflow/core/lib/io/block.h"
+#include "tensorflow/core/lib/io/format.h"
+#include "tensorflow/core/lib/io/iterator.h"
+
+namespace tensorflow {
+namespace table {
+
+namespace {
+
+typedef Iterator* (*BlockFunction)(void*, const StringPiece&);
+
+class TwoLevelIterator : public Iterator {
+ public:
+ TwoLevelIterator(Iterator* index_iter, BlockFunction block_function,
+ void* arg);
+
+ virtual ~TwoLevelIterator();
+
+ virtual void Seek(const StringPiece& target);
+ virtual void SeekToFirst();
+ virtual void Next();
+
+ virtual bool Valid() const {
+ return (data_iter_ == nullptr) ? false : data_iter_->Valid();
+ }
+ virtual StringPiece key() const {
+ assert(Valid());
+ return data_iter_->key();
+ }
+ virtual StringPiece value() const {
+ assert(Valid());
+ return data_iter_->value();
+ }
+ virtual Status status() const {
+ // It'd be nice if status() returned a const Status& instead of a
+ // Status
+ if (!index_iter_->status().ok()) {
+ return index_iter_->status();
+ } else if (data_iter_ != NULL && !data_iter_->status().ok()) {
+ return data_iter_->status();
+ } else {
+ return status_;
+ }
+ }
+
+ private:
+ void SaveError(const Status& s) {
+ if (status_.ok() && !s.ok()) status_ = s;
+ }
+ void SkipEmptyDataBlocksForward();
+ void SetDataIterator(Iterator* data_iter);
+ void InitDataBlock();
+
+ BlockFunction block_function_;
+ void* arg_;
+ Status status_;
+ Iterator* index_iter_;
+ Iterator* data_iter_; // May be NULL
+ // If data_iter_ is non-NULL, then "data_block_handle_" holds the
+ // "index_value" passed to block_function_ to create the data_iter_.
+ string data_block_handle_;
+};
+
+TwoLevelIterator::TwoLevelIterator(Iterator* index_iter,
+ BlockFunction block_function, void* arg)
+ : block_function_(block_function),
+ arg_(arg),
+ index_iter_(index_iter),
+ data_iter_(NULL) {}
+
+TwoLevelIterator::~TwoLevelIterator() {
+ delete index_iter_;
+ delete data_iter_;
+}
+
+void TwoLevelIterator::Seek(const StringPiece& target) {
+ index_iter_->Seek(target);
+ InitDataBlock();
+ if (data_iter_ != NULL) data_iter_->Seek(target);
+ SkipEmptyDataBlocksForward();
+}
+
+void TwoLevelIterator::SeekToFirst() {
+ index_iter_->SeekToFirst();
+ InitDataBlock();
+ if (data_iter_ != NULL) data_iter_->SeekToFirst();
+ SkipEmptyDataBlocksForward();
+}
+
+void TwoLevelIterator::Next() {
+ assert(Valid());
+ data_iter_->Next();
+ SkipEmptyDataBlocksForward();
+}
+
+void TwoLevelIterator::SkipEmptyDataBlocksForward() {
+ while (data_iter_ == NULL || !data_iter_->Valid()) {
+ // Move to next block
+ if (!index_iter_->Valid()) {
+ SetDataIterator(NULL);
+ return;
+ }
+ index_iter_->Next();
+ InitDataBlock();
+ if (data_iter_ != NULL) data_iter_->SeekToFirst();
+ }
+}
+
+void TwoLevelIterator::SetDataIterator(Iterator* data_iter) {
+ if (data_iter_ != NULL) {
+ SaveError(data_iter_->status());
+ delete data_iter_;
+ }
+ data_iter_ = data_iter;
+}
+
+void TwoLevelIterator::InitDataBlock() {
+ if (!index_iter_->Valid()) {
+ SetDataIterator(NULL);
+ } else {
+ StringPiece handle = index_iter_->value();
+ if (data_iter_ != NULL && handle.compare(data_block_handle_) == 0) {
+ // data_iter_ is already constructed with this iterator, so
+ // no need to change anything
+ } else {
+ Iterator* iter = (*block_function_)(arg_, handle);
+ data_block_handle_.assign(handle.data(), handle.size());
+ SetDataIterator(iter);
+ }
+ }
+}
+
+} // namespace
+
+Iterator* NewTwoLevelIterator(Iterator* index_iter,
+ BlockFunction block_function, void* arg) {
+ return new TwoLevelIterator(index_iter, block_function, arg);
+}
+
+} // namespace table
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/io/two_level_iterator.h b/tensorflow/core/lib/io/two_level_iterator.h
new file mode 100644
index 0000000000..1cc5d2f921
--- /dev/null
+++ b/tensorflow/core/lib/io/two_level_iterator.h
@@ -0,0 +1,30 @@
+// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file. See the AUTHORS file for names of contributors.
+
+#ifndef TENSORFLOW_LIB_IO_TWO_LEVEL_ITERATOR_H_
+#define TENSORFLOW_LIB_IO_TWO_LEVEL_ITERATOR_H_
+
+#include "tensorflow/core/lib/io/iterator.h"
+
+namespace tensorflow {
+namespace table {
+
+// Return a new two level iterator. A two-level iterator contains an
+// index iterator whose values point to a sequence of blocks where
+// each block is itself a sequence of key,value pairs. The returned
+// two-level iterator yields the concatenation of all key/value pairs
+// in the sequence of blocks. Takes ownership of "index_iter" and
+// will delete it when no longer needed.
+//
+// Uses a supplied function to convert an index_iter value into
+// an iterator over the contents of the corresponding block.
+extern Iterator* NewTwoLevelIterator(
+ Iterator* index_iter,
+ Iterator* (*block_function)(void* arg, const StringPiece& index_value),
+ void* arg);
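+//
+// Example (illustrative): Table::NewIterator passes the index block's
+// iterator as "index_iter" and a block_function that decodes each index
+// value into a BlockHandle and returns an iterator over that data block.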
+
+} // namespace table
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_IO_TWO_LEVEL_ITERATOR_H_
diff --git a/tensorflow/core/lib/jpeg/jpeg_handle.cc b/tensorflow/core/lib/jpeg/jpeg_handle.cc
new file mode 100644
index 0000000000..4521be0afb
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/jpeg_handle.cc
@@ -0,0 +1,162 @@
+// This file implements a memory destination for libjpeg.
+// The design is very similar to jdatadst.c in libjpeg.
+// These functions are not meant to be used directly; see jpeg_mem.h instead.
+// We are filling out stubs required by jpeglib; those stubs are private to
+// the implementation, and we only make JPGMemSrc and JPGMemDest available.
+
+#include "tensorflow/core/lib/jpeg/jpeg_handle.h"
+
+#include <setjmp.h>
+#include <stddef.h>
+
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace jpeg {
+
+void CatchError(j_common_ptr cinfo) {
+ (*cinfo->err->output_message)(cinfo);
+ jmp_buf *jpeg_jmpbuf = reinterpret_cast<jmp_buf *>(cinfo->client_data);
+ jpeg_destroy(cinfo);
+ longjmp(*jpeg_jmpbuf, 1);
+}
+
+// *****************************************************************************
+// *****************************************************************************
+// *****************************************************************************
+// Destination functions
+
+// -----------------------------------------------------------------------------
+void MemInitDestination(j_compress_ptr cinfo) {
+ MemDestMgr *dest = reinterpret_cast<MemDestMgr *>(cinfo->dest);
+ VLOG(1) << "Initializing buffer=" << dest->bufsize << " bytes";
+ dest->pub.next_output_byte = dest->buffer;
+ dest->pub.free_in_buffer = dest->bufsize;
+ dest->datacount = 0;
+ if (dest->dest) {
+ dest->dest->clear();
+ }
+}
+
+// -----------------------------------------------------------------------------
+boolean MemEmptyOutputBuffer(j_compress_ptr cinfo) {
+ MemDestMgr *dest = reinterpret_cast<MemDestMgr *>(cinfo->dest);
+ VLOG(1) << "Writing " << dest->bufsize << " bytes";
+ if (dest->dest) {
+ dest->dest->append(reinterpret_cast<char *>(dest->buffer), dest->bufsize);
+ }
+ dest->pub.next_output_byte = dest->buffer;
+ dest->pub.free_in_buffer = dest->bufsize;
+ return TRUE;
+}
+
+// -----------------------------------------------------------------------------
+void MemTermDestination(j_compress_ptr cinfo) {
+ MemDestMgr *dest = reinterpret_cast<MemDestMgr *>(cinfo->dest);
+ VLOG(1) << "Writing " << dest->bufsize - dest->pub.free_in_buffer << " bytes";
+ if (dest->dest) {
+ dest->dest->append(reinterpret_cast<char *>(dest->buffer),
+ dest->bufsize - dest->pub.free_in_buffer);
+ VLOG(1) << "Total size= " << dest->dest->size();
+ }
+ dest->datacount = dest->bufsize - dest->pub.free_in_buffer;
+}
+
+// -----------------------------------------------------------------------------
+void SetDest(j_compress_ptr cinfo, void *buffer, int bufsize) {
+ SetDest(cinfo, buffer, bufsize, NULL);
+}
+
+// -----------------------------------------------------------------------------
+void SetDest(j_compress_ptr cinfo, void *buffer, int bufsize,
+ string *destination) {
+ MemDestMgr *dest;
+ if (cinfo->dest == NULL) {
+ cinfo->dest = reinterpret_cast<struct jpeg_destination_mgr *>(
+ (*cinfo->mem->alloc_small)(reinterpret_cast<j_common_ptr>(cinfo),
+ JPOOL_PERMANENT, sizeof(MemDestMgr)));
+ }
+
+ dest = reinterpret_cast<MemDestMgr *>(cinfo->dest);
+ dest->bufsize = bufsize;
+ dest->buffer = static_cast<JOCTET *>(buffer);
+ dest->dest = destination;
+ dest->pub.init_destination = MemInitDestination;
+ dest->pub.empty_output_buffer = MemEmptyOutputBuffer;
+ dest->pub.term_destination = MemTermDestination;
+}
+
+// *****************************************************************************
+// *****************************************************************************
+// *****************************************************************************
+// Source functions
+
+// -----------------------------------------------------------------------------
+void MemInitSource(j_decompress_ptr cinfo) {
+ MemSourceMgr *src = reinterpret_cast<MemSourceMgr *>(cinfo->src);
+ src->pub.next_input_byte = src->data;
+ src->pub.bytes_in_buffer = src->datasize;
+}
+
+// -----------------------------------------------------------------------------
+// We emulate the same error-handling as fill_input_buffer() from jdatasrc.c,
+// for coherency's sake.
+boolean MemFillInputBuffer(j_decompress_ptr cinfo) {
+ static const JOCTET kEOIBuffer[2] = {0xff, JPEG_EOI};
+ MemSourceMgr *src = reinterpret_cast<MemSourceMgr *>(cinfo->src);
+ if (src->pub.bytes_in_buffer == 0 && src->pub.next_input_byte == src->data) {
+ // empty file -> treated as an error.
+ ERREXIT(cinfo, JERR_INPUT_EMPTY);
+ return FALSE;
+ } else if (src->pub.bytes_in_buffer) {
+ // if there's still some data left, it's probably corrupted
+ return src->try_recover_truncated_jpeg ? TRUE : FALSE;
+ } else if (src->pub.next_input_byte != kEOIBuffer &&
+ src->try_recover_truncated_jpeg) {
+ // In an attempt to recover truncated files, we insert a fake EOI
+ WARNMS(cinfo, JWRN_JPEG_EOF);
+ src->pub.next_input_byte = kEOIBuffer;
+ src->pub.bytes_in_buffer = 2;
+ return TRUE;
+ } else {
+ // We already inserted a fake EOI and it wasn't enough, so this time
+ // it's really an error.
+ ERREXIT(cinfo, JERR_FILE_READ);
+ return FALSE;
+ }
+}
+
+// -----------------------------------------------------------------------------
+void MemTermSource(j_decompress_ptr cinfo) {}
+
+// -----------------------------------------------------------------------------
+void MemSkipInputData(j_decompress_ptr cinfo, long jump) {
+ MemSourceMgr *src = reinterpret_cast<MemSourceMgr *>(cinfo->src);
+ src->pub.bytes_in_buffer -= jump;
+ src->pub.next_input_byte += jump;
+}
+
+// -----------------------------------------------------------------------------
+void SetSrc(j_decompress_ptr cinfo, const void *data,
+ unsigned long int datasize, bool try_recover_truncated_jpeg) {
+ MemSourceMgr *src;
+
+ cinfo->src = reinterpret_cast<struct jpeg_source_mgr *>(
+ (*cinfo->mem->alloc_small)(reinterpret_cast<j_common_ptr>(cinfo),
+ JPOOL_PERMANENT, sizeof(MemSourceMgr)));
+
+ src = reinterpret_cast<MemSourceMgr *>(cinfo->src);
+ src->pub.init_source = MemInitSource;
+ src->pub.fill_input_buffer = MemFillInputBuffer;
+ src->pub.skip_input_data = MemSkipInputData;
+ src->pub.resync_to_restart = jpeg_resync_to_restart;
+ src->pub.term_source = MemTermSource;
+ src->data = reinterpret_cast<const unsigned char *>(data);
+ src->datasize = datasize;
+ src->pub.bytes_in_buffer = 0;
+ src->pub.next_input_byte = NULL;
+ src->try_recover_truncated_jpeg = try_recover_truncated_jpeg;
+}
+
+} // namespace jpeg
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/jpeg/jpeg_handle.h b/tensorflow/core/lib/jpeg/jpeg_handle.h
new file mode 100644
index 0000000000..58f7f6f666
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/jpeg_handle.h
@@ -0,0 +1,51 @@
+// This file declares the functions and structures for memory I/O with libjpeg.
+// These functions are not meant to be used directly; see jpeg_mem.h instead.
+
+#ifndef TENSORFLOW_LIB_JPEG_JPEG_HANDLE_H_
+#define TENSORFLOW_LIB_JPEG_JPEG_HANDLE_H_
+
+extern "C" {
+#include "external/jpeg_archive/jpeg-9a/jinclude.h"
+#include "external/jpeg_archive/jpeg-9a/jpeglib.h"
+#include "external/jpeg_archive/jpeg-9a/jerror.h"
+#include "external/jpeg_archive/jpeg-9a/transupp.h" // for rotations
+}
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace jpeg {
+
+// Handler for fatal JPEG library errors: clean up & return
+void CatchError(j_common_ptr cinfo);
+
+typedef struct {
+ struct jpeg_destination_mgr pub;
+ JOCTET *buffer;
+ int bufsize;
+ int datacount;
+ string *dest;
+} MemDestMgr;
+
+typedef struct {
+ struct jpeg_source_mgr pub;
+ const unsigned char *data;
+ unsigned long int datasize;
+ bool try_recover_truncated_jpeg;
+} MemSourceMgr;
+
+void SetSrc(j_decompress_ptr cinfo, const void *data,
+ unsigned long int datasize, bool try_recover_truncated_jpeg);
+
+// JPEG destination: we will store all the data in a buffer "buffer" of total
+// size "bufsize"; if the buffer overflows, we will be in trouble.
+void SetDest(j_compress_ptr cinfo, void *buffer, int bufsize);
+// Same as above, except that buffer is only used as a temporary structure and
+// is emptied into "destination" as soon as it fills up.
+void SetDest(j_compress_ptr cinfo, void *buffer, int bufsize,
+ string *destination);
+
+} // namespace jpeg
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_JPEG_JPEG_HANDLE_H_
diff --git a/tensorflow/core/lib/jpeg/jpeg_mem.cc b/tensorflow/core/lib/jpeg/jpeg_mem.cc
new file mode 100644
index 0000000000..556f13e388
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/jpeg_mem.cc
@@ -0,0 +1,557 @@
+// This file defines functions to compress and uncompress JPEG data
+// to and from memory, as well as some direct manipulations of JPEG strings.
+
+#include "tensorflow/core/lib/jpeg/jpeg_mem.h"
+
+#include <setjmp.h>
+#include <string.h>
+#include <algorithm>
+#include <memory>
+#include <string>
+
+#include "tensorflow/core/lib/jpeg/jpeg_handle.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace jpeg {
+
+// -----------------------------------------------------------------------------
+// Decompression
+
+namespace {
+
+enum JPEGErrors {
+ JPEGERRORS_OK,
+ JPEGERRORS_UNEXPECTED_END_OF_DATA,
+ JPEGERRORS_BAD_PARAM
+};
+
+// Prevent bad compiler behaviour in ASAN mode by wrapping most of the
+// arguments in a struct.
+class FewerArgsForCompiler {
+ public:
+ FewerArgsForCompiler(int datasize, const UncompressFlags& flags, int* nwarn,
+ std::function<uint8*(int, int, int)> allocate_output)
+ : datasize_(datasize),
+ flags_(flags),
+ pnwarn_(nwarn),
+ allocate_output_(allocate_output),
+ fraction_read_(0.),
+ height_(0),
+ stride_(0) {
+ if (pnwarn_ != nullptr) *pnwarn_ = 0;
+ }
+
+ const int datasize_;
+ const UncompressFlags flags_;
+ int* const pnwarn_;
+ std::function<uint8*(int, int, int)> allocate_output_;
+  float fraction_read_; // fraction of scanlines successfully read
+ int height_;
+ int stride_;
+};
+
+uint8* UncompressLow(const void* srcdata, FewerArgsForCompiler* argball) {
+ // unpack the argball
+ const int datasize = argball->datasize_;
+ const auto& flags = argball->flags_;
+ const int ratio = flags.ratio;
+ int components = flags.components;
+ int stride = flags.stride; // may be 0
+ int* const nwarn = argball->pnwarn_; // may be NULL
+
+ // can't decode if the ratio is not recognized by libjpeg
+ if ((ratio != 1) && (ratio != 2) && (ratio != 4) && (ratio != 8)) {
+ return nullptr;
+ }
+
+ // if empty image, return
+ if (datasize == 0 || srcdata == NULL) return nullptr;
+
+ // Declare temporary buffer pointer here so that we can free on error paths
+ JSAMPLE* tempdata = nullptr;
+
+ // Initialize libjpeg structures to have a memory source
+ // Modify the usual jpeg error manager to catch fatal errors.
+ JPEGErrors error = JPEGERRORS_OK;
+ struct jpeg_decompress_struct cinfo;
+ struct jpeg_error_mgr jerr;
+ cinfo.err = jpeg_std_error(&jerr);
+ jmp_buf jpeg_jmpbuf;
+ cinfo.client_data = &jpeg_jmpbuf;
+ jerr.error_exit = CatchError;
+ if (setjmp(jpeg_jmpbuf)) {
+ return nullptr;
+ }
+
+ jpeg_create_decompress(&cinfo);
+ SetSrc(&cinfo, srcdata, datasize, flags.try_recover_truncated_jpeg);
+ jpeg_read_header(&cinfo, TRUE);
+
+ // Set components automatically if desired
+ if (components == 0) components = cinfo.num_components;
+
+ // set grayscale and ratio parameters
+ switch (components) {
+ case 1:
+ cinfo.out_color_space = JCS_GRAYSCALE;
+ break;
+ case 3:
+ case 4:
+ if (cinfo.jpeg_color_space == JCS_CMYK ||
+ cinfo.jpeg_color_space == JCS_YCCK) {
+        // Always use CMYK for output in a 4-channel JPEG; libjpeg has a
+        // built-in decoder for it.
+ cinfo.out_color_space = JCS_CMYK;
+ } else {
+ cinfo.out_color_space = JCS_RGB;
+ }
+ break;
+ default:
+ LOG(ERROR) << " Invalid components value " << components << std::endl;
+ jpeg_destroy_decompress(&cinfo);
+ return nullptr;
+ }
+ cinfo.do_fancy_upsampling = boolean(flags.fancy_upscaling);
+ cinfo.scale_num = 1;
+ cinfo.scale_denom = ratio;
+ // Activating this has a quality/speed trade-off implication:
+ // cinfo.dct_method = JDCT_IFAST;
+
+ jpeg_start_decompress(&cinfo);
+
+ // check for compatible stride
+ const int min_stride = cinfo.output_width * components * sizeof(JSAMPLE);
+ if (stride == 0) {
+ stride = min_stride;
+ } else if (stride < min_stride) {
+ LOG(ERROR) << "Incompatible stride: " << stride << " < " << min_stride;
+ jpeg_destroy_decompress(&cinfo);
+ return nullptr;
+ }
+
+ // Remember stride and height for use in Uncompress
+ argball->height_ = cinfo.output_height;
+ argball->stride_ = stride;
+
+ uint8* const dstdata = argball->allocate_output_(
+ cinfo.output_width, cinfo.output_height, components);
+ if (dstdata == nullptr) {
+ jpeg_destroy_decompress(&cinfo);
+ return nullptr;
+ }
+ JSAMPLE* output_line = static_cast<JSAMPLE*>(dstdata);
+
+ // Temporary buffer used for CMYK -> RGB conversion.
+ const bool use_cmyk = (cinfo.out_color_space == JCS_CMYK);
+ tempdata = use_cmyk ? new JSAMPLE[cinfo.output_width * 4] : NULL;
+
+ // If there is an error reading a line, this aborts the reading.
+ // Save the fraction of the image that has been read.
+ argball->fraction_read_ = 1.0;
+ while (cinfo.output_scanline < cinfo.output_height) {
+ int num_lines_read = 0;
+ if (cinfo.out_color_space == JCS_CMYK) {
+ num_lines_read = jpeg_read_scanlines(&cinfo, &tempdata, 1);
+ // Convert CMYK to RGB
+ for (size_t i = 0; i < cinfo.output_width; ++i) {
+ int c = tempdata[4 * i + 0];
+ int m = tempdata[4 * i + 1];
+ int y = tempdata[4 * i + 2];
+ int k = tempdata[4 * i + 3];
+ int r, g, b;
+ if (cinfo.saw_Adobe_marker) {
+ r = (k * c) / 255;
+ g = (k * m) / 255;
+ b = (k * y) / 255;
+ } else {
+ r = (255 - k) * (255 - c) / 255;
+ g = (255 - k) * (255 - m) / 255;
+ b = (255 - k) * (255 - y) / 255;
+ }
+ output_line[3 * i + 0] = r;
+ output_line[3 * i + 1] = g;
+ output_line[3 * i + 2] = b;
+ }
+ } else {
+ num_lines_read = jpeg_read_scanlines(&cinfo, &output_line, 1);
+ }
+ // Handle error cases
+ if (num_lines_read == 0) {
+ LOG(ERROR) << "Premature end of JPEG data. Stopped at line "
+ << cinfo.output_scanline << "/" << cinfo.output_height;
+ if (!flags.try_recover_truncated_jpeg) {
+ argball->fraction_read_ =
+ static_cast<float>(cinfo.output_scanline) / cinfo.output_height;
+ error = JPEGERRORS_UNEXPECTED_END_OF_DATA;
+ } else {
+ for (size_t line = cinfo.output_scanline; line < cinfo.output_height;
+ ++line) {
+ if (line == 0) {
+ // If even the first line is missing, fill with black color
+ memset(output_line, 0, min_stride);
+ } else {
+ // else, just replicate the line above.
+ memcpy(output_line, output_line - stride, min_stride);
+ }
+ output_line += stride;
+ }
+ argball->fraction_read_ = 1.0; // consider all lines as read
+ // prevent error-on-exit in libjpeg:
+ cinfo.output_scanline = cinfo.output_height;
+ }
+ break;
+ }
+ DCHECK_EQ(num_lines_read, 1);
+ TF_ANNOTATE_MEMORY_IS_INITIALIZED(output_line, min_stride);
+ output_line += stride;
+ }
+ delete[] tempdata;
+
+ // Convert the RGB data to RGBA, with alpha set to 0xFF to indicate
+ // opacity.
+ // RGBRGBRGB... --> RGBARGBARGBA...
+ if (components == 4) {
+ // Start on the last line.
+ JSAMPLE* scanlineptr =
+ static_cast<JSAMPLE*>(dstdata + (cinfo.output_height - 1) * stride);
+ const JSAMPLE kOpaque = -1; // All ones appropriate for JSAMPLE.
+ const int right_rgb = (cinfo.output_width - 1) * 3;
+ const int right_rgba = (cinfo.output_width - 1) * 4;
+
+ for (int y = cinfo.output_height; y-- > 0;) {
+ // We do all the transformations in place, going backwards for each row.
+ const JSAMPLE* rgb_pixel = scanlineptr + right_rgb;
+ JSAMPLE* rgba_pixel = scanlineptr + right_rgba;
+ scanlineptr -= stride;
+ for (int x = cinfo.output_width; x-- > 0;
+ rgba_pixel -= 4, rgb_pixel -= 3) {
+ // We copy the 3 bytes at rgb_pixel into the 4 bytes at rgba_pixel
+ // The "a" channel is set to be opaque.
+ rgba_pixel[3] = kOpaque;
+ rgba_pixel[2] = rgb_pixel[2];
+ rgba_pixel[1] = rgb_pixel[1];
+ rgba_pixel[0] = rgb_pixel[0];
+ }
+ }
+ }
+
+ switch (components) {
+ case 1:
+ if (cinfo.output_components != 1) {
+ error = JPEGERRORS_BAD_PARAM;
+ }
+ break;
+ case 3:
+ case 4:
+ if (cinfo.out_color_space == JCS_CMYK) {
+ if (cinfo.output_components != 4) {
+ error = JPEGERRORS_BAD_PARAM;
+ }
+ } else {
+ if (cinfo.output_components != 3) {
+ error = JPEGERRORS_BAD_PARAM;
+ }
+ }
+ break;
+ default:
+      // will never happen; should be caught by the previous switch
+ LOG(ERROR) << "Invalid components value " << components << std::endl;
+ jpeg_destroy_decompress(&cinfo);
+ return nullptr;
+ }
+
+ // save number of warnings if requested
+ if (nwarn != nullptr) {
+ *nwarn = cinfo.err->num_warnings;
+ }
+
+ // Handle errors in JPEG
+ switch (error) {
+ case JPEGERRORS_OK:
+ jpeg_finish_decompress(&cinfo);
+ break;
+ case JPEGERRORS_UNEXPECTED_END_OF_DATA:
+ case JPEGERRORS_BAD_PARAM:
+ jpeg_abort(reinterpret_cast<j_common_ptr>(&cinfo));
+ break;
+ default:
+ LOG(ERROR) << "Unhandled case " << error;
+ break;
+ }
+ jpeg_destroy_decompress(&cinfo);
+
+ return dstdata;
+}
+
+} // anonymous namespace
+
+// -----------------------------------------------------------------------------
+// We do the apparently silly thing of packing 5 of the arguments
+// into a structure that is then passed to another routine
+// that does all the work. The reason is that we want to catch
+// fatal JPEG library errors with setjmp/longjmp, and g++ and
+// associated libraries aren't good enough to guarantee that 7
+// parameters won't get clobbered by the longjmp. So we help
+// it out a little.
+uint8* Uncompress(const void* srcdata, int datasize,
+ const UncompressFlags& flags, int* nwarn,
+ std::function<uint8*(int, int, int)> allocate_output) {
+ FewerArgsForCompiler argball(datasize, flags, nwarn, allocate_output);
+ uint8* const dstdata = UncompressLow(srcdata, &argball);
+ const float fraction_read = argball.fraction_read_;
+ if (dstdata == NULL ||
+ fraction_read < std::min(1.0f, flags.min_acceptable_fraction)) {
+ // Major failure, none or too-partial read returned; get out
+ return NULL;
+ }
+
+ // If there was an error in reading the jpeg data,
+ // set the unread pixels to black
+ if (fraction_read < 1.0) {
+ const int first_bad_line =
+ static_cast<int>(fraction_read * argball.height_);
+ uint8* start = dstdata + first_bad_line * argball.stride_;
+ const int nbytes = (argball.height_ - first_bad_line) * argball.stride_;
+ memset(static_cast<void*>(start), 0, nbytes);
+ }
+
+ return dstdata;
+}
+
+uint8* Uncompress(const void* srcdata, int datasize,
+ const UncompressFlags& flags, int* pwidth, int* pheight,
+ int* pcomponents, int* nwarn) {
+ uint8* buffer = NULL;
+ uint8* result =
+ Uncompress(srcdata, datasize, flags, nwarn,
+ [=, &buffer](int width, int height, int components) {
+ if (pwidth != nullptr) *pwidth = width;
+ if (pheight != nullptr) *pheight = height;
+ if (pcomponents != nullptr) *pcomponents = components;
+ buffer = new uint8[height * width * components];
+ return buffer;
+ });
+ if (!result) delete[] buffer;
+ return result;
+}
+
+// ----------------------------------------------------------------------------
+// Computes image information from jpeg header.
+// Returns true on success; false on failure.
+bool GetImageInfo(const void* srcdata, int datasize, int* width, int* height,
+ int* components) {
+ // Init in case of failure
+ if (width) *width = 0;
+ if (height) *height = 0;
+ if (components) *components = 0;
+
+ // If empty image, return
+ if (datasize == 0 || srcdata == NULL) return false;
+
+ // Initialize libjpeg structures to have a memory source
+ // Modify the usual jpeg error manager to catch fatal errors.
+ struct jpeg_decompress_struct cinfo;
+ struct jpeg_error_mgr jerr;
+ jmp_buf jpeg_jmpbuf;
+ cinfo.err = jpeg_std_error(&jerr);
+ cinfo.client_data = &jpeg_jmpbuf;
+ jerr.error_exit = CatchError;
+ if (setjmp(jpeg_jmpbuf)) {
+ return false;
+ }
+
+ // set up, read header, set image parameters, save size
+ jpeg_create_decompress(&cinfo);
+ SetSrc(&cinfo, srcdata, datasize, false);
+
+ jpeg_read_header(&cinfo, TRUE);
+ jpeg_start_decompress(&cinfo); // required to transfer image size to cinfo
+ if (width) *width = cinfo.output_width;
+ if (height) *height = cinfo.output_height;
+ if (components) *components = cinfo.output_components;
+
+ jpeg_destroy_decompress(&cinfo);
+
+ return true;
+}
+
+// -----------------------------------------------------------------------------
+// Compression
+
+namespace {
+bool CompressInternal(const uint8* srcdata, int width, int height,
+ const CompressFlags& flags, string* output) {
+ output->clear();
+ const int components = (static_cast<int>(flags.format) & 0xff);
+ int in_stride = flags.stride;
+ if (in_stride == 0) {
+ in_stride = width * (static_cast<int>(flags.format) & 0xff);
+ } else if (in_stride < width * components) {
+ LOG(ERROR) << "Incompatible input stride";
+ return false;
+ }
+
+ JOCTET* buffer = 0;
+
+ // NOTE: for broader use xmp_metadata should be made a unicode string
+ CHECK(srcdata != nullptr);
+ CHECK(output != nullptr);
+ // This struct contains the JPEG compression parameters and pointers to
+ // working space
+ struct jpeg_compress_struct cinfo;
+ // This struct represents a JPEG error handler.
+ struct jpeg_error_mgr jerr;
+ jmp_buf jpeg_jmpbuf; // recovery point in case of error
+
+ // Step 1: allocate and initialize JPEG compression object
+ // Use the usual jpeg error manager.
+ cinfo.err = jpeg_std_error(&jerr);
+ cinfo.client_data = &jpeg_jmpbuf;
+ jerr.error_exit = CatchError;
+ if (setjmp(jpeg_jmpbuf)) {
+ output->clear();
+ delete[] buffer;
+ return false;
+ }
+
+ jpeg_create_compress(&cinfo);
+
+ // Step 2: specify data destination
+ // We allocate a buffer of reasonable size. If we have a small image, just
+ // estimate the size of the output using the number of bytes of the input.
+ // If this is getting too big, we will append to the string by chunks of 1MB.
+ // This seems like a reasonable compromise between performance and memory.
+ int bufsize = std::min(width * height * components, 1 << 20);
+ buffer = new JOCTET[bufsize];
+ SetDest(&cinfo, buffer, bufsize, output);
+
+ // Step 3: set parameters for compression
+ cinfo.image_width = width;
+ cinfo.image_height = height;
+ switch (components) {
+ case 1:
+ cinfo.input_components = 1;
+ cinfo.in_color_space = JCS_GRAYSCALE;
+ break;
+ case 3:
+ case 4:
+ cinfo.input_components = 3;
+ cinfo.in_color_space = JCS_RGB;
+ break;
+ default:
+ LOG(ERROR) << " Invalid components value " << components << std::endl;
+ output->clear();
+ delete[] buffer;
+ return false;
+ }
+ jpeg_set_defaults(&cinfo);
+ if (flags.optimize_jpeg_size) cinfo.optimize_coding = TRUE;
+
+ cinfo.density_unit = flags.density_unit; // JFIF code for pixel size units:
+ // 1 = in, 2 = cm
+ cinfo.X_density = flags.x_density; // Horizontal pixel density
+ cinfo.Y_density = flags.y_density; // Vertical pixel density
+ jpeg_set_quality(&cinfo, flags.quality, TRUE);
+
+ if (flags.progressive) {
+ jpeg_simple_progression(&cinfo);
+ }
+
+ if (!flags.chroma_downsampling) {
+ // Turn off chroma subsampling (it is on by default). For more details on
+ // chroma subsampling, see http://en.wikipedia.org/wiki/Chroma_subsampling.
+ for (int i = 0; i < cinfo.num_components; ++i) {
+ cinfo.comp_info[i].h_samp_factor = 1;
+ cinfo.comp_info[i].v_samp_factor = 1;
+ }
+ }
+
+ jpeg_start_compress(&cinfo, TRUE);
+
+ // Embed XMP metadata if any
+ if (!flags.xmp_metadata.empty()) {
+ // XMP metadata is embedded in the APP1 tag of JPEG and requires this
+ // namespace header string (null-terminated)
+ const string name_space = "http://ns.adobe.com/xap/1.0/";
+ const int name_space_length = name_space.size();
+ const int metadata_length = flags.xmp_metadata.size();
+ const int packet_length = metadata_length + name_space_length + 1;
+ std::unique_ptr<JOCTET[]> joctet_packet(new JOCTET[packet_length]);
+
+ for (int i = 0; i < name_space_length; i++) {
+ // Conversion char --> JOCTET
+ joctet_packet[i] = name_space[i];
+ }
+ joctet_packet[name_space_length] = 0; // null-terminate namespace string
+
+ for (int i = 0; i < metadata_length; i++) {
+ // Conversion char --> JOCTET
+ joctet_packet[i + name_space_length + 1] = flags.xmp_metadata[i];
+ }
+ jpeg_write_marker(&cinfo, JPEG_APP0 + 1, joctet_packet.get(),
+ packet_length);
+ }
+
+ // JSAMPLEs per row in image_buffer
+ std::unique_ptr<JSAMPLE[]> row_temp(
+ new JSAMPLE[width * cinfo.input_components]);
+ while (cinfo.next_scanline < cinfo.image_height) {
+ JSAMPROW row_pointer[1]; // pointer to JSAMPLE row[s]
+ const uint8* r = &srcdata[cinfo.next_scanline * in_stride];
+ uint8* p = static_cast<uint8*>(row_temp.get());
+ switch (flags.format) {
+ case FORMAT_RGBA: {
+ for (int i = 0; i < width; ++i, p += 3, r += 4) {
+ p[0] = r[0];
+ p[1] = r[1];
+ p[2] = r[2];
+ }
+ row_pointer[0] = row_temp.get();
+ break;
+ }
+ case FORMAT_ABGR: {
+ for (int i = 0; i < width; ++i, p += 3, r += 4) {
+ p[0] = r[3];
+ p[1] = r[2];
+ p[2] = r[1];
+ }
+ row_pointer[0] = row_temp.get();
+ break;
+ }
+ default: {
+ row_pointer[0] = reinterpret_cast<JSAMPLE*>(const_cast<JSAMPLE*>(r));
+ }
+ }
+ CHECK_EQ(jpeg_write_scanlines(&cinfo, row_pointer, 1), 1);
+ }
+ jpeg_finish_compress(&cinfo);
+
+ // release JPEG compression object
+ jpeg_destroy_compress(&cinfo);
+ delete[] buffer;
+ return true;
+}
+
+} // anonymous namespace
+
+// -----------------------------------------------------------------------------
+
+bool Compress(const void* srcdata, int width, int height,
+ const CompressFlags& flags, string* output) {
+ return CompressInternal(static_cast<const uint8*>(srcdata), width, height,
+ flags, output);
+}
+
+string Compress(const void* srcdata, int width, int height,
+ const CompressFlags& flags) {
+ string temp;
+ CompressInternal(static_cast<const uint8*>(srcdata), width, height, flags,
+ &temp);
+ // If CompressInternal fails, temp will be empty.
+ return temp;
+}
+
+} // namespace jpeg
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/jpeg/jpeg_mem.h b/tensorflow/core/lib/jpeg/jpeg_mem.h
new file mode 100644
index 0000000000..19ba7d4acf
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/jpeg_mem.h
@@ -0,0 +1,130 @@
+// This file defines functions to compress and uncompress JPEG files
+// to and from memory. It provides interfaces for raw images
+// (data array and size fields).
+// Direct manipulation of JPEG strings is also supplied: Flip, Rotate, Crop, etc.
+
+#ifndef TENSORFLOW_LIB_JPEG_JPEG_MEM_H_
+#define TENSORFLOW_LIB_JPEG_JPEG_MEM_H_
+
+#include <functional>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+namespace jpeg {
+
+// Flags for Uncompress
+struct UncompressFlags {
+ // ratio can be 1, 2, 4, or 8 and represents the denominator for the scaling
+ // factor (e.g. ratio = 4 means that the resulting image will be at 1/4 original
+ // size in both directions).
+ int ratio = 1;
+
+ // The number of bytes per pixel (1, 3 or 4), or 0 for autodetect.
+ int components = 0;
+
+ // If true, decoder will use a slower but nicer upscaling of the chroma
+ // planes (yuv420/422 only).
+ bool fancy_upscaling = true;
+
+ // If true, will attempt to fill in missing lines of truncated files
+ bool try_recover_truncated_jpeg = false;
+
+ // The minimum required fraction of lines read before the image is accepted.
+ float min_acceptable_fraction = 1.0;
+
+ // The distance in bytes from one scanline to the next. Should be at least
+ // equal to width*components*sizeof(JSAMPLE). If 0 is passed, the stride
+ // used will be this minimal value.
+ int stride = 0;
+};
+
+// Uncompress some raw JPEG data given by the pointer srcdata and the length
+// datasize.
+// - width and height are the addresses where the size of the
+// uncompressed image in pixels is stored. May be nullptr.
+// - components is the address where the number of components read is
+// stored. This is *output only*: to request a specific number of
+// components, use flags.components. May be nullptr.
+// - nwarn is the address in which to store the number of warnings.
+// May be nullptr.
+// The function returns a pointer to the raw uncompressed data or NULL if
+// there was an error. The caller of the function is responsible for
+// freeing the memory (using delete []).
+uint8* Uncompress(const void* srcdata, int datasize,
+ const UncompressFlags& flags, int* width, int* height,
+ int* components, // Output only: useful with autodetect
+ int* nwarn);
+
+// Version of Uncompress that allocates memory via a callback. The callback
+// arguments are (width, height, components). If the size is known ahead of
+// time, the callback can return an existing buffer; passing a callback allows
+// the buffer to be shaped based on the JPEG header. The caller is responsible
+// for freeing the memory *even along error paths*.
+uint8* Uncompress(const void* srcdata, int datasize,
+ const UncompressFlags& flags, int* nwarn,
+ std::function<uint8*(int, int, int)> allocate_output);
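+//
+// A minimal usage sketch for the callback form (hypothetical caller code,
+// error handling omitted; assumes `jpeg` is a string holding the encoded
+// image):
+//
+//   std::unique_ptr<uint8[]> buffer;
+//   UncompressFlags flags;
+//   flags.components = 3;
+//   uint8* data = Uncompress(jpeg.data(), jpeg.size(), flags, nullptr,
+//                            [&buffer](int w, int h, int c) {
+//                              buffer.reset(new uint8[w * h * c]);
+//                              return buffer.get();
+//                            });
+//   if (data == nullptr) { /* decoding failed */ }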
+
+// Read jpeg header and get image information. Returns true on success.
+// The width, height, and components pointers may be null.
+bool GetImageInfo(const void* srcdata, int datasize, int* width, int* height,
+ int* components);
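+//
+// A minimal sketch (hypothetical caller code; assumes `jpeg` is a string
+// holding the encoded image):
+//
+//   int w, h, c;
+//   if (GetImageInfo(jpeg.data(), jpeg.size(), &w, &h, &c)) {
+//     // w x h image with c components.
+//   }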
+
+// Note: (format & 0xff) = number of components (<=> bytes per pixel),
+// e.g. FORMAT_ABGR & 0xff == 4.
+enum Format {
+ FORMAT_GRAYSCALE = 0x001, // 1 byte/pixel
+ FORMAT_RGB = 0x003, // 3 bytes/pixel RGBRGBRGBRGB...
+ FORMAT_RGBA = 0x004, // 4 bytes/pixel RGBARGBARGBARGBA...
+ FORMAT_ABGR = 0x104 // 4 bytes/pixel ABGRABGRABGR...
+};
+
+// Flags for compression
+struct CompressFlags {
+ // Encoding of the input data for compression
+ Format format;
+
+ // Quality of the compression from 0-100
+ int quality = 95;
+
+ // If true, create a jpeg image that loads progressively
+ bool progressive = false;
+
+ // If true, reduce jpeg size without changing quality (at the cost of CPU/RAM)
+ bool optimize_jpeg_size = false;
+
+ // See http://en.wikipedia.org/wiki/Chroma_subsampling
+ bool chroma_downsampling = true;
+
+ // Resolution
+ int density_unit = 1; // 1 = in, 2 = cm
+ int x_density = 300;
+ int y_density = 300;
+
+ // If not empty, embed this XMP metadata in the image header
+ StringPiece xmp_metadata;
+
+ // The distance in bytes from one scanline to the next. Should be at least
+ // equal to width*components*sizeof(JSAMPLE). If 0 is passed, the stride
+ // used will be this minimal value.
+ int stride = 0;
+};
+
+// Compress some raw image given in srcdata; the data is a 2D array of size
+// stride*height with one of the formats enumerated above.
+// The encoded data is returned as a string.
+// If flags.xmp_metadata is not empty, it is embedded in the image header.
+// On error, returns the empty string (which is never a valid jpeg).
+string Compress(const void* srcdata, int width, int height,
+ const CompressFlags& flags);
+
+// On error, returns false and sets output to empty.
+bool Compress(const void* srcdata, int width, int height,
+ const CompressFlags& flags, string* output);
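+//
+// A minimal usage sketch (hypothetical caller code; assumes `rgb` points to
+// width * height * 3 bytes of interleaved RGB data):
+//
+//   CompressFlags flags;
+//   flags.format = FORMAT_RGB;
+//   flags.quality = 90;
+//   const string encoded = Compress(rgb, width, height, flags);
+//   if (encoded.empty()) { /* compression failed */ }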
+
+} // namespace jpeg
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_JPEG_JPEG_MEM_H_
diff --git a/tensorflow/core/lib/jpeg/jpeg_mem_unittest.cc b/tensorflow/core/lib/jpeg/jpeg_mem_unittest.cc
new file mode 100644
index 0000000000..23e72f9d57
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/jpeg_mem_unittest.cc
@@ -0,0 +1,304 @@
+#include "tensorflow/core/lib/jpeg/jpeg_mem.h"
+
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <memory>
+
+#include "tensorflow/core/lib/jpeg/jpeg_handle.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+#include <gtest/gtest.h>
+
+#include "tensorflow/core/lib/core/casts.h"
+
+namespace tensorflow {
+namespace jpeg {
+namespace {
+
+const char kTestData[] = "tensorflow/core/lib/jpeg/testdata/";
+
+int ComputeSumAbsoluteDifference(const uint8* a, const uint8* b, int width,
+ int height, int a_stride, int b_stride) {
+ int totalerr = 0;
+ for (int i = 0; i < height; i++) {
+ const uint8* const pa = a + i * a_stride;
+ const uint8* const pb = b + i * b_stride;
+ for (int j = 0; j < 3 * width; j++) {
+ totalerr += abs(static_cast<int>(pa[j]) - static_cast<int>(pb[j]));
+ }
+ }
+ return totalerr;
+}
+
+// Reads the contents of the file into output
+void ReadFileToStringOrDie(Env* env, const string& filename, string* output) {
+ TF_CHECK_OK(ReadFileToString(env, filename, output));
+}
+
+void TestJPEG(Env* env, const string& jpegfile) {
+ // Read the data from the jpeg file into memory
+ string jpeg;
+ ReadFileToStringOrDie(env, jpegfile, &jpeg);
+ const int fsize = jpeg.size();
+ const uint8* const temp = bit_cast<const uint8*>(jpeg.data());
+
+ // try partial decoding (half of the data)
+ int w, h, c;
+ std::unique_ptr<uint8[]> imgdata;
+
+ UncompressFlags flags;
+ flags.components = 3;
+
+ // set min_acceptable_fraction to something insufficient
+ flags.min_acceptable_fraction = 0.8;
+ imgdata.reset(Uncompress(temp, fsize / 2, flags, &w, &h, &c, NULL));
+ CHECK(imgdata.get() == NULL);
+
+ // Now use a value for which fsize/2 is enough; missing lines get black-filled.
+ flags.min_acceptable_fraction = 0.01;
+ imgdata.reset(Uncompress(temp, fsize / 2, flags, &w, &h, &c, NULL));
+ CHECK(imgdata.get() != NULL);
+
+ // finally, uncompress the whole data
+ flags.min_acceptable_fraction = 1.0;
+ imgdata.reset(Uncompress(temp, fsize, flags, &w, &h, &c, NULL));
+ CHECK(imgdata.get() != NULL);
+
+ // Uncompress the data to RGBA, too
+ flags.min_acceptable_fraction = 1.0;
+ flags.components = 4;
+ imgdata.reset(Uncompress(temp, fsize, flags, &w, &h, &c, NULL));
+ CHECK(imgdata.get() != NULL);
+}
+
+TEST(JpegMemTest, Jpeg) {
+ Env* env = Env::Default();
+ const string data_path = kTestData;
+
+ // Name of a valid jpeg file on the disk
+ TestJPEG(env, data_path + "jpeg_merge_test1.jpg");
+
+ // Exercise CMYK machinery as well
+ TestJPEG(env, data_path + "jpeg_merge_test1_cmyk.jpg");
+}
+
+TEST(JpegMemTest, Jpeg2) {
+ // create known data, for size in_w x in_h
+ const int in_w = 256;
+ const int in_h = 256;
+ const int stride1 = 3 * in_w;
+ const std::unique_ptr<uint8[]> refdata1(new uint8[stride1 * in_h]);
+ for (int i = 0; i < in_h; i++) {
+ for (int j = 0; j < in_w; j++) {
+ const int offset = i * stride1 + 3 * j;
+ refdata1[offset + 0] = i;
+ refdata1[offset + 1] = j;
+ refdata1[offset + 2] = static_cast<uint8>((i + j) >> 1);
+ }
+ }
+
+ // duplicate with weird input stride
+ const int stride2 = 3 * 357;
+ const std::unique_ptr<uint8[]> refdata2(new uint8[stride2 * in_h]);
+ for (int i = 0; i < in_h; i++) {
+ memcpy(&refdata2[i * stride2], &refdata1[i * stride1], 3 * in_w);
+ }
+
+ // Test compression
+ string cpdata1, cpdata2;
+ {
+ const string kXMP = "XMP_TEST_123";
+
+ // Compress it to JPEG
+ CompressFlags flags;
+ flags.format = FORMAT_RGB;
+ flags.quality = 97;
+ flags.xmp_metadata = kXMP;
+ cpdata1 = Compress(refdata1.get(), in_w, in_h, flags);
+ flags.stride = stride2;
+ cpdata2 = Compress(refdata2.get(), in_w, in_h, flags);
+ // Different input stride shouldn't change the output
+ CHECK_EQ(cpdata1, cpdata2);
+
+ // Verify valid XMP.
+ CHECK_NE(string::npos, cpdata1.find(kXMP));
+
+ // Test the other API, where a storage string is supplied
+ string cptest;
+ flags.stride = 0;
+ Compress(refdata1.get(), in_w, in_h, flags, &cptest);
+ CHECK_EQ(cptest, cpdata1);
+ flags.stride = stride2;
+ Compress(refdata2.get(), in_w, in_h, flags, &cptest);
+ CHECK_EQ(cptest, cpdata2);
+ }
+
+ // Uncompress twice: once with 3 components and once with autodetect
+ std::unique_ptr<uint8[]> imgdata1;
+ for (const int components : {0, 3}) {
+ // Uncompress it
+ UncompressFlags flags;
+ flags.components = components;
+ int w, h, c;
+ imgdata1.reset(
+ Uncompress(cpdata1.c_str(), cpdata1.length(), flags, &w, &h, &c, NULL));
+
+ // Check obvious formatting stuff
+ CHECK_EQ(w, in_w);
+ CHECK_EQ(h, in_h);
+ CHECK_EQ(c, 3);
+ CHECK(imgdata1.get());
+
+ // Compare the two images
+ const int totalerr = ComputeSumAbsoluteDifference(
+ imgdata1.get(), refdata1.get(), in_w, in_h, stride1, stride1);
+ CHECK_LE(totalerr, 85000);
+ }
+
+ // Check the second image too; it should be bitwise identical to the first.
+ // Uncompress using a weird stride.
+ {
+ UncompressFlags flags;
+ flags.stride = 3 * 411;
+ const std::unique_ptr<uint8[]> imgdata2(new uint8[flags.stride * in_h]);
+ CHECK(imgdata2.get() == Uncompress(cpdata2.c_str(), cpdata2.length(), flags,
+ NULL, [&imgdata2](int w, int h, int c) {
+ CHECK_EQ(w, in_w);
+ CHECK_EQ(h, in_h);
+ CHECK_EQ(c, 3);
+ return imgdata2.get();
+ }));
+ const int totalerr = ComputeSumAbsoluteDifference(
+ imgdata1.get(), imgdata2.get(), in_w, in_h, stride1, flags.stride);
+ CHECK_EQ(totalerr, 0);
+ }
+}
+
+// Takes JPEG data and reads its headers to determine whether or not the JPEG
+// was chroma downsampled.
+bool IsChromaDownsampled(const string& jpegdata) {
+ // Initialize libjpeg structures to have a memory source
+ // Modify the usual jpeg error manager to catch fatal errors.
+ struct jpeg_decompress_struct cinfo;
+ struct jpeg_error_mgr jerr;
+ jmp_buf jpeg_jmpbuf;
+ cinfo.err = jpeg_std_error(&jerr);
+ cinfo.client_data = &jpeg_jmpbuf;
+ jerr.error_exit = CatchError;
+ if (setjmp(jpeg_jmpbuf)) return false;
+
+ // set up, read header, set image parameters, save size
+ jpeg_create_decompress(&cinfo);
+ SetSrc(&cinfo, jpegdata.c_str(), jpegdata.size(), false);
+
+ jpeg_read_header(&cinfo, TRUE);
+ jpeg_start_decompress(&cinfo); // required to transfer image size to cinfo
+ const int components = cinfo.output_components;
+ if (components == 1) return false;
+
+ // Check validity
+ CHECK_EQ(3, components);
+ CHECK_EQ(cinfo.comp_info[1].h_samp_factor, cinfo.comp_info[2].h_samp_factor)
+ << "The h sampling factors should be the same.";
+ CHECK_EQ(cinfo.comp_info[1].v_samp_factor, cinfo.comp_info[2].v_samp_factor)
+ << "The v sampling factors should be the same.";
+ for (int i = 0; i < components; ++i) {
+ CHECK_GT(cinfo.comp_info[i].h_samp_factor, 0) << "Invalid sampling factor.";
+ CHECK_EQ(cinfo.comp_info[i].h_samp_factor, cinfo.comp_info[i].v_samp_factor)
+ << "The sampling factor should be the same in both directions.";
+ }
+
+ // We're downsampled if we use fewer samples for color than for brightness.
+ // Do this before deallocating cinfo.
+ const bool downsampled =
+ cinfo.comp_info[1].h_samp_factor < cinfo.comp_info[0].h_samp_factor;
+
+ jpeg_destroy_decompress(&cinfo);
+ return downsampled;
+}
+
+TEST(JpegMemTest, ChromaDownsampling) {
+ // Read the data from a test jpeg file into memory
+ const string jpegfile = string(kTestData) + "jpeg_merge_test1.jpg";
+ string jpeg;
+ ReadFileToStringOrDie(Env::Default(), jpegfile, &jpeg);
+
+ // Verify that compressing the JPEG with chroma downsampling works.
+ //
+ // First, uncompress the JPEG.
+ UncompressFlags unflags;
+ unflags.components = 3;
+ int w, h, c, num_warnings;
+ std::unique_ptr<uint8[]> uncompressed(Uncompress(
+ jpeg.c_str(), jpeg.size(), unflags, &w, &h, &c, &num_warnings));
+ CHECK(uncompressed.get() != NULL);
+ CHECK_EQ(num_warnings, 0);
+
+ // Recompress the JPEG with and without chroma downsampling
+ for (const bool downsample : {false, true}) {
+ CompressFlags flags;
+ flags.format = FORMAT_RGB;
+ flags.quality = 85;
+ flags.chroma_downsampling = downsample;
+ string recompressed;
+ Compress(uncompressed.get(), w, h, flags, &recompressed);
+ CHECK(!recompressed.empty());
+ CHECK_EQ(IsChromaDownsampled(recompressed), downsample);
+ }
+}
+
+void TestBadJPEG(Env* env, const string& bad_jpeg_file, int expected_width,
+ int expected_height, const string& reference_RGB_file,
+ const bool try_recover_truncated_jpeg) {
+ string jpeg;
+ ReadFileToStringOrDie(env, bad_jpeg_file, &jpeg);
+
+ UncompressFlags flags;
+ flags.components = 3;
+ flags.try_recover_truncated_jpeg = try_recover_truncated_jpeg;
+
+ int width, height, components;
+ std::unique_ptr<uint8[]> imgdata;
+ imgdata.reset(Uncompress(jpeg.c_str(), jpeg.size(), flags, &width, &height,
+ &components, NULL));
+ if (expected_width > 0) { // we expect the file to decode into 'something'
+ CHECK_EQ(width, expected_width);
+ CHECK_EQ(height, expected_height);
+ CHECK_EQ(components, 3);
+ CHECK(imgdata.get());
+ if (!reference_RGB_file.empty()) {
+ string ref;
+ ReadFileToStringOrDie(env, reference_RGB_file, &ref);
+ CHECK(!memcmp(ref.data(), imgdata.get(), ref.size()));
+ }
+ } else { // not decodable
+ CHECK(!imgdata.get()) << "file:" << bad_jpeg_file;
+ }
+}
+
+TEST(JpegMemTest, BadJpeg) {
+ Env* env = Env::Default();
+ const string data_path = kTestData;
+
+ // Test corrupt file
+ TestBadJPEG(env, data_path + "bad_huffman.jpg", 1024, 768, "", false);
+ TestBadJPEG(env, data_path + "corrupt.jpg", 0 /*120*/, 90, "", false);
+
+ // Truncated files, undecodable because of missing lines:
+ TestBadJPEG(env, data_path + "corrupt34_2.jpg", 0, 3300, "", false);
+ TestBadJPEG(env, data_path + "corrupt34_3.jpg", 0, 3300, "", false);
+ TestBadJPEG(env, data_path + "corrupt34_4.jpg", 0, 3300, "", false);
+
+ // Try in 'recover' mode now:
+ TestBadJPEG(env, data_path + "corrupt34_2.jpg", 2544, 3300, "", true);
+ TestBadJPEG(env, data_path + "corrupt34_3.jpg", 2544, 3300, "", true);
+ TestBadJPEG(env, data_path + "corrupt34_4.jpg", 2544, 3300, "", true);
+}
+
+} // namespace
+} // namespace jpeg
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/jpeg/testdata/bad_huffman.jpg b/tensorflow/core/lib/jpeg/testdata/bad_huffman.jpg
new file mode 100644
index 0000000000..ef5b6f12c5
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/bad_huffman.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/corrupt.jpg b/tensorflow/core/lib/jpeg/testdata/corrupt.jpg
new file mode 100644
index 0000000000..5e2fe6c56f
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/corrupt.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/corrupt34_2.jpg b/tensorflow/core/lib/jpeg/testdata/corrupt34_2.jpg
new file mode 100644
index 0000000000..4211155c45
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/corrupt34_2.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/corrupt34_3.jpg b/tensorflow/core/lib/jpeg/testdata/corrupt34_3.jpg
new file mode 100644
index 0000000000..c1c2a9d1e1
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/corrupt34_3.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/corrupt34_4.jpg b/tensorflow/core/lib/jpeg/testdata/corrupt34_4.jpg
new file mode 100644
index 0000000000..b8e7308ba0
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/corrupt34_4.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1.jpg b/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1.jpg
new file mode 100644
index 0000000000..5e348a12fd
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1.jpg
Binary files differ
diff --git a/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1_cmyk.jpg b/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1_cmyk.jpg
new file mode 100644
index 0000000000..15f895960d
--- /dev/null
+++ b/tensorflow/core/lib/jpeg/testdata/jpeg_merge_test1_cmyk.jpg
Binary files differ
diff --git a/tensorflow/core/lib/png/png_io.cc b/tensorflow/core/lib/png/png_io.cc
new file mode 100644
index 0000000000..43b84e41e0
--- /dev/null
+++ b/tensorflow/core/lib/png/png_io.cc
@@ -0,0 +1,385 @@
+// Functions to read and write images in PNG format.
+
+#include <string.h>
+#include <sys/types.h>
+#include <string>
+#include <utility>
+#include <vector>
+// NOTE(skal): we don't '#include <setjmp.h>' before png/png.h as it otherwise
+// provokes a compile error. We instead let png.h include what is needed.
+
+#include "tensorflow/core/lib/core/casts.h"
+#include "tensorflow/core/lib/png/png_io.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h" // endian
+#include "external/png_archive/libpng-1.2.53/png.h"
+
+namespace tensorflow {
+namespace png {
+
+////////////////////////////////////////////////////////////////////////////////
+// Encode an 8- or 16-bit rgb/grayscale image to PNG string
+////////////////////////////////////////////////////////////////////////////////
+
+namespace {
+
+#define PTR_INC(type, ptr, del) (ptr = \
+ reinterpret_cast<type*>(reinterpret_cast<char*>(ptr) + (del)))
+#define CPTR_INC(type, ptr, del) (ptr = \
+ reinterpret_cast<const type*>(reinterpret_cast<const char*>(ptr) + (del)))
+
+// Convert from 8 bit components to 16. This works in-place.
+static void Convert8to16(const uint8* p8, int num_comps, int p8_row_bytes,
+ int width, int height, uint16* p16,
+ int p16_row_bytes) {
+ // Adjust pointers to copy backwards
+ width *= num_comps;
+ CPTR_INC(uint8, p8, (height - 1) * p8_row_bytes +
+ (width - 1) * sizeof(*p8));
+ PTR_INC(uint16, p16, (height - 1) * p16_row_bytes +
+ (width - 1) * sizeof(*p16));
+ int bump8 = width * sizeof(*p8) - p8_row_bytes;
+ int bump16 = width * sizeof(*p16) - p16_row_bytes;
+ for (; height-- != 0;
+ CPTR_INC(uint8, p8, bump8), PTR_INC(uint16, p16, bump16)) {
+ for (int w = width; w-- != 0; --p8, --p16) {
+ uint pix = *p8;
+ pix |= pix << 8;
+ *p16 = static_cast<uint16>(pix);
+ }
+ }
+}
+
+#undef PTR_INC
+#undef CPTR_INC
+
+void ErrorHandler(png_structp png_ptr, png_const_charp msg) {
+ DecodeContext* const ctx = bit_cast<DecodeContext*>(png_get_io_ptr(png_ptr));
+ ctx->error_condition = true;
+ // To prevent log spam, errors are logged as VLOG(1) instead of ERROR.
+ VLOG(1) << "PNG error: " << msg;
+ longjmp(png_jmpbuf(png_ptr), 1);
+}
+
+void WarningHandler(png_structp png_ptr, png_const_charp msg) {
+ LOG(WARNING) << "PNG warning: " << msg;
+}
+
+void StringReader(png_structp png_ptr,
+ png_bytep data, png_size_t length) {
+ DecodeContext* const ctx = bit_cast<DecodeContext*>(png_get_io_ptr(png_ptr));
+ if (static_cast<png_size_t>(ctx->data_left) < length) {
+ if (!ctx->error_condition) {
+ VLOG(1) << "PNG read decoding error";
+ ctx->error_condition = true;
+ }
+ memset(data, 0, length);
+ } else {
+ memcpy(data, ctx->data, length);
+ ctx->data += length;
+ ctx->data_left -= length;
+ }
+}
+
+void StringWriter(png_structp png_ptr, png_bytep data, png_size_t length) {
+ string* const s = bit_cast<string*>(png_get_io_ptr(png_ptr));
+ s->append(bit_cast<const char*>(data), length);
+}
+
+void StringWriterFlush(png_structp png_ptr) {
+}
+
+char* check_metadata_string(const string& s) {
+ const char* const c_string = s.c_str();
+ const size_t length = s.size();
+ if (strlen(c_string) != length) {
+ LOG(WARNING) << "Warning! Metadata contains \\0 character(s).";
+ }
+ return const_cast<char*>(c_string);
+}
+
+} // namespace
+
+// We move CommonInitDecode() and CommonFinishDecode()
+// out of the CommonDecode() template to save code space.
+void CommonFreeDecode(DecodeContext* context) {
+ if (context->png_ptr) {
+ png_destroy_read_struct(&context->png_ptr,
+ context->info_ptr ? &context->info_ptr : NULL, 0);
+ context->png_ptr = nullptr;
+ context->info_ptr = nullptr;
+ }
+}
+
+bool DecodeHeader(StringPiece png_string, int* width, int* height,
+ int* components, int* channel_bit_depth,
+ std::vector<std::pair<string, string> >* metadata) {
+ DecodeContext context;
+ // Ask for 16 bits even if there may be fewer. This ensures that sniffing
+ // the metadata will succeed in all cases.
+ //
+ // TODO(skal): CommonInitDecode() mixes the operation of sniffing the
+ // metadata with setting up the data conversions. These should be separated.
+ constexpr int kDesiredNumChannels = 1;
+ constexpr int kDesiredChannelBits = 16;
+ if (!CommonInitDecode(png_string, kDesiredNumChannels, kDesiredChannelBits,
+ &context)) {
+ return false;
+ }
+ CHECK_NOTNULL(width);
+ *width = static_cast<int>(context.width);
+ CHECK_NOTNULL(height);
+ *height = static_cast<int>(context.height);
+ if (components != NULL) {
+ switch (context.color_type) {
+ case PNG_COLOR_TYPE_PALETTE:
+ *components = (context.info_ptr->valid & PNG_INFO_tRNS) ? 4 : 3;
+ break;
+ case PNG_COLOR_TYPE_GRAY:
+ *components = 1;
+ break;
+ case PNG_COLOR_TYPE_GRAY_ALPHA:
+ *components = 2;
+ break;
+ case PNG_COLOR_TYPE_RGB:
+ *components = 3;
+ break;
+ case PNG_COLOR_TYPE_RGB_ALPHA:
+ *components = 4;
+ break;
+ default:
+ *components = 0;
+ break;
+ }
+ }
+ if (channel_bit_depth != NULL) {
+ *channel_bit_depth = context.bit_depth;
+ }
+ if (metadata != NULL) {
+ metadata->clear();
+ for (int i = 0; i < context.info_ptr->num_text; i++) {
+ const png_text& text = context.info_ptr->text[i];
+ metadata->push_back(std::make_pair(text.key, text.text));
+ }
+ }
+ CommonFreeDecode(&context);
+ return true;
+}
+
+bool CommonInitDecode(StringPiece png_string, int desired_channels,
+ int desired_channel_bits, DecodeContext* context) {
+ CHECK(desired_channel_bits == 8 || desired_channel_bits == 16)
+ << "desired_channel_bits = " << desired_channel_bits;
+ CHECK(0 <= desired_channels && desired_channels <= 4) << "desired_channels = "
+ << desired_channels;
+ context->error_condition = false;
+ context->channels = desired_channels;
+ context->png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, context,
+ ErrorHandler, WarningHandler);
+ if (!context->png_ptr) {
+ VLOG(1) << ": DecodePNG <- png_create_read_struct failed";
+ return false;
+ }
+ if (setjmp(png_jmpbuf(context->png_ptr))) {
+ VLOG(1) << ": DecodePNG error trapped.";
+ CommonFreeDecode(context);
+ return false;
+ }
+ context->info_ptr = png_create_info_struct(context->png_ptr);
+ if (!context->info_ptr || context->error_condition) {
+ VLOG(1) << ": DecodePNG <- png_create_info_struct failed";
+ CommonFreeDecode(context);
+ return false;
+ }
+ context->data = bit_cast<const uint8*>(png_string.data());
+ context->data_left = png_string.size();
+ png_set_read_fn(context->png_ptr, context, StringReader);
+ png_read_info(context->png_ptr, context->info_ptr);
+ png_get_IHDR(context->png_ptr, context->info_ptr,
+ &context->width, &context->height,
+ &context->bit_depth, &context->color_type,
+ 0, 0, 0);
+ if (context->error_condition) {
+ VLOG(1) << ": DecodePNG <- error during header parsing.";
+ CommonFreeDecode(context);
+ return false;
+ }
+ if (context->width <= 0 || context->height <= 0) {
+ VLOG(1) << ": DecodePNG <- invalid dimensions";
+ CommonFreeDecode(context);
+ return false;
+ }
+ if (context->channels == 0) { // Autodetect number of channels
+ context->channels = context->info_ptr->channels;
+ }
+ const bool has_tRNS = (context->info_ptr->valid & PNG_INFO_tRNS) != 0;
+ const bool has_alpha = (context->color_type & PNG_COLOR_MASK_ALPHA) != 0;
+ if ((context->channels & 1) == 0) { // We desire alpha
+ if (has_alpha) { // There is alpha
+ } else if (has_tRNS) {
+ png_set_tRNS_to_alpha(context->png_ptr); // Convert transparency to alpha
+ } else {
+ png_set_add_alpha(
+ context->png_ptr, (1 << context->bit_depth) - 1, PNG_FILLER_AFTER);
+ }
+ } else { // We don't want alpha
+ if (has_alpha || has_tRNS) { // There is alpha
+ png_set_strip_alpha(context->png_ptr); // Strip alpha
+ }
+ }
+
+ // If we only want 8 bits, but are given 16, strip off the LS 8 bits
+ if (context->bit_depth > 8 && desired_channel_bits <= 8)
+ png_set_strip_16(context->png_ptr);
+
+ context->need_to_synthesize_16 =
+ (context->bit_depth <= 8 && desired_channel_bits == 16);
+
+ png_set_packing(context->png_ptr);
+ context->num_passes = png_set_interlace_handling(context->png_ptr);
+ png_read_update_info(context->png_ptr, context->info_ptr);
+
+#ifdef IS_LITTLE_ENDIAN
+ if (desired_channel_bits > 8)
+ png_set_swap(context->png_ptr);
+#endif // IS_LITTLE_ENDIAN
+
+ // Convert palette to rgb(a) if need be.
+ if (context->color_type == PNG_COLOR_TYPE_PALETTE)
+ png_set_palette_to_rgb(context->png_ptr);
+
+ // handle grayscale case for source or destination
+ const bool want_gray = (context->channels < 3);
+ const bool is_gray = !(context->color_type & PNG_COLOR_MASK_COLOR);
+ if (is_gray) { // upconvert gray to 8-bit if needed.
+ if (context->bit_depth < 8)
+ png_set_gray_1_2_4_to_8(context->png_ptr);
+ }
+ if (want_gray) { // output is grayscale
+ if (!is_gray)
+ png_set_rgb_to_gray(context->png_ptr, 1, 0.299, 0.587); // 601, JPG
+ } else { // output is rgb(a)
+ if (is_gray)
+ png_set_gray_to_rgb(context->png_ptr); // Enable gray -> RGB conversion
+ }
+ return true;
+}
+
+bool CommonFinishDecode(png_bytep data, int row_bytes, DecodeContext* context) {
+ CHECK_NOTNULL(data);
+
+ // we need to re-set the jump point so that we trap the errors
+ // within *this* function (and not CommonInitDecode())
+ if (setjmp(png_jmpbuf(context->png_ptr))) {
+ VLOG(1) << ": DecodePNG error trapped.";
+ CommonFreeDecode(context);
+ return false;
+ }
+ // png_read_row() takes care of offsetting the pointer based on interlacing
+ for (int p = 0; p < context->num_passes; ++p) {
+ png_bytep row = data;
+ for (int h = context->height; h-- != 0; row += row_bytes) {
+ png_read_row(context->png_ptr, row, NULL);
+ }
+ }
+
+ context->info_ptr->valid |= PNG_INFO_IDAT;
+ png_read_end(context->png_ptr, context->info_ptr);
+
+ // Clean up.
+ const bool ok = !context->error_condition;
+ CommonFreeDecode(context);
+
+ // Synthesize 16 bits from 8 if requested.
+ if (context->need_to_synthesize_16)
+ Convert8to16(bit_cast<uint8*>(data), context->channels, row_bytes,
+ context->width, context->height, bit_cast<uint16*>(data),
+ row_bytes);
+ return ok;
+}
+
+bool WriteImageToBuffer(
+ const void* image, int width, int height, int row_bytes, int num_channels,
+ int channel_bits, int compression, string* png_string,
+ const std::vector<std::pair<string, string> >* metadata) {
+ CHECK_NOTNULL(image);
+ CHECK_NOTNULL(png_string);
+ // Although png.cc checks for this case and issues an error message, its
+ // error path causes memory corruption, so reject the case here.
+ if (width == 0 || height == 0)
+ return false;
+
+ png_string->resize(0);
+ png_infop info_ptr = NULL;
+ png_structp png_ptr =
+ png_create_write_struct(PNG_LIBPNG_VER_STRING,
+ NULL, ErrorHandler, WarningHandler);
+ if (png_ptr == NULL) return false;
+ if (setjmp(png_jmpbuf(png_ptr))) {
+ png_destroy_write_struct(&png_ptr, info_ptr ? &info_ptr : NULL);
+ return false;
+ }
+ info_ptr = png_create_info_struct(png_ptr);
+ if (info_ptr == NULL) {
+ png_destroy_write_struct(&png_ptr, NULL);
+ return false;
+ }
+
+ int color_type = -1;
+ switch (num_channels) {
+ case 1:
+ color_type = PNG_COLOR_TYPE_GRAY;
+ break;
+ case 2:
+ color_type = PNG_COLOR_TYPE_GRAY_ALPHA;
+ break;
+ case 3:
+ color_type = PNG_COLOR_TYPE_RGB;
+ break;
+ case 4:
+ color_type = PNG_COLOR_TYPE_RGB_ALPHA;
+ break;
+ default:
+ png_destroy_write_struct(&png_ptr, &info_ptr);
+ return false;
+ }
+
+ png_set_write_fn(png_ptr, png_string, StringWriter, StringWriterFlush);
+ if (compression < 0) compression = Z_DEFAULT_COMPRESSION;
+ png_set_compression_level(png_ptr, compression);
+ png_set_compression_mem_level(png_ptr, MAX_MEM_LEVEL);
+ // There used to be a call to png_set_filter here turning off filtering
+ // entirely, but it produced pessimal compression ratios. I'm not sure
+ // why it was there.
+ png_set_IHDR(png_ptr, info_ptr, width, height, channel_bits, color_type,
+ PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT,
+ PNG_FILTER_TYPE_DEFAULT);
+ // If we have metadata, write it out.
+ if (metadata && !metadata->empty()) {
+ std::vector<png_text> text;
+ for (const auto& pair : *metadata) {
+ png_text txt;
+ txt.compression = PNG_TEXT_COMPRESSION_NONE;
+ txt.key = check_metadata_string(pair.first);
+ txt.text = check_metadata_string(pair.second);
+ text.push_back(txt);
+ }
+ png_set_text(png_ptr, info_ptr, &text[0], text.size());
+ }
+
+ png_write_info(png_ptr, info_ptr);
+#ifdef IS_LITTLE_ENDIAN
+ if (channel_bits > 8)
+ png_set_swap(png_ptr);
+#endif // IS_LITTLE_ENDIAN
+
+ png_byte* row = reinterpret_cast<png_byte*>(const_cast<void*>(image));
+ for (; height--; row += row_bytes) png_write_row(png_ptr, row);
+ png_write_end(png_ptr, NULL);
+
+ png_destroy_write_struct(&png_ptr, &info_ptr);
+ return true;
+}
+
+} // namespace png
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/png/png_io.h b/tensorflow/core/lib/png/png_io.h
new file mode 100644
index 0000000000..df9bff7be8
--- /dev/null
+++ b/tensorflow/core/lib/png/png_io.h
@@ -0,0 +1,88 @@
+// Functions to read and write images in PNG format.
+//
+// The advantage over image/codec/png{enc,dec}oder.h is that this library
+// supports both 8 and 16 bit images.
+//
+// The decoding routine accepts binary image data as a StringPiece. These are
+// implicitly constructed from strings or char* so they're completely
+// transparent to the caller. They're also very cheap to construct so this
+// doesn't introduce any additional overhead.
+//
+// The primary benefit of StringPieces is that, in this case, APIs already
+// returning StringPieces (e.g., Bigtable Scanner) or Cords (e.g., IOBuffer;
+// only when they're flat, though) or protocol buffer fields typed to either of
+// these can be decoded without copying the data into a C++ string.
+
+#ifndef TENSORFLOW_LIB_PNG_PNG_IO_H_
+#define TENSORFLOW_LIB_PNG_PNG_IO_H_
+
+#include <string>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "external/png_archive/libpng-1.2.53/png.h"
+
+namespace tensorflow {
+namespace png {
+
+// Handy container for decoding information and struct pointers
+struct DecodeContext {
+ const uint8* data;
+ int data_left;
+ png_structp png_ptr;
+ png_infop info_ptr;
+ png_uint_32 width, height;
+ int num_passes;
+ int color_type;
+ int bit_depth;
+ int channels;
+ bool need_to_synthesize_16;
+ bool error_condition;
+ DecodeContext() : png_ptr(NULL), info_ptr(NULL) {}
+};
+
+bool DecodeHeader(StringPiece png_string, int* width, int* height,
+ int* components, int* channel_bit_depth,
+ std::vector<std::pair<string, string> >* metadata);
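+
+// A minimal sketch of header sniffing (hypothetical caller code; assumes `png`
+// is a string holding the encoded image; metadata may be NULL if not needed):
+//
+//   int width, height, components, bit_depth;
+//   if (DecodeHeader(png, &width, &height, &components, &bit_depth, NULL)) {
+//     // width x height image, `components` channels of `bit_depth` bits each.
+//   }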
+
+// Sample usage for reading PNG:
+//
+// string png_string; /* fill with input PNG format data */
+// DecodeContext context;
+// CHECK(CommonInitDecode(png_string, 3 /*RGB*/, 8 /*uint8*/, &context));
+// char* image_buffer = new char[3*context.width*context.height];
+// CHECK(CommonFinishDecode(bit_cast<png_byte*>(image_buffer),
+// 3*context.width /*stride*/, &context));
+//
+// desired_channels may be 0, in which case the number of channels is detected
+// from the input.
+
+bool CommonInitDecode(StringPiece png_string, int desired_channels,
+ int desired_channel_bits, DecodeContext* context);
+
+bool CommonFinishDecode(png_bytep data, int row_bytes, DecodeContext* context);
+
+// Normally called automatically from CommonFinishDecode. If CommonInitDecode
+// is called but not CommonFinishDecode, call this to clean up. Safe to call
+// extra times.
+void CommonFreeDecode(DecodeContext* context);
+
+// Sample usage for writing PNG:
+//
+// uint16* image_buffer = new uint16[width*height]; /* fill with pixels */
+// string png_string;
+// CHECK(WriteImageToBuffer(image_buffer, width, height, 2*width /*stride*/,
+// 1 /*gray*/, 16 /*uint16*/, -1 /*compression*/,
+// &png_string, NULL));
+//
+// compression is in [-1,9], where 0 is fast and weak compression, 9 is slow
+// and strong, and -1 is the zlib default.
+
+bool WriteImageToBuffer(
+ const void* image, int width, int height, int row_bytes, int num_channels,
+ int channel_bits, int compression, string* png_string,
+ const std::vector<std::pair<string, string> >* metadata);
+
+} // namespace png
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_PNG_PNG_IO_H_
diff --git a/tensorflow/core/lib/png/testdata/lena_gray.png b/tensorflow/core/lib/png/testdata/lena_gray.png
new file mode 100644
index 0000000000..8bc73159b0
--- /dev/null
+++ b/tensorflow/core/lib/png/testdata/lena_gray.png
Binary files differ
diff --git a/tensorflow/core/lib/png/testdata/lena_rgba.png b/tensorflow/core/lib/png/testdata/lena_rgba.png
new file mode 100644
index 0000000000..79f1f84a62
--- /dev/null
+++ b/tensorflow/core/lib/png/testdata/lena_rgba.png
Binary files differ
diff --git a/tensorflow/core/lib/random/distribution_sampler.cc b/tensorflow/core/lib/random/distribution_sampler.cc
new file mode 100644
index 0000000000..341f1bd595
--- /dev/null
+++ b/tensorflow/core/lib/random/distribution_sampler.cc
@@ -0,0 +1,80 @@
+#include "tensorflow/core/lib/random/distribution_sampler.h"
+
+#include <memory>
+#include <vector>
+
+namespace tensorflow {
+namespace random {
+
+DistributionSampler::DistributionSampler(
+ const gtl::ArraySlice<float>& weights) {
+ DCHECK(!weights.empty());
+ int n = weights.size();
+ num_ = n;
+ data_.reset(new std::pair<float, int>[n]);
+
+ std::unique_ptr<double[]> pr(new double[n]);
+
+ double sum = 0.0;
+ for (int i = 0; i < n; i++) {
+ sum += weights[i];
+ set_alt(i, -1);
+ }
+
+ // These are long/short items - called high/low because of reserved keywords.
+ std::vector<int> high;
+ high.reserve(n);
+ std::vector<int> low;
+ low.reserve(n);
+
+ // Compute proportional weights.
+ for (int i = 0; i < n; i++) {
+ double p = (weights[i] * n) / sum;
+ pr[i] = p;
+ if (p < 1.0) {
+ low.push_back(i);
+ } else {
+ high.push_back(i);
+ }
+ }
+
+ // Now pair high with low.
+ while (!high.empty() && !low.empty()) {
+ int l = low.back();
+ low.pop_back();
+ int h = high.back();
+ high.pop_back();
+
+ set_alt(l, h);
+ DCHECK_GE(pr[h], 1.0);
+ double remaining = pr[h] - (1.0 - pr[l]);
+ pr[h] = remaining;
+
+ if (remaining < 1.0) {
+ low.push_back(h);
+ } else {
+ high.push_back(h);
+ }
+ }
+ // Transfer pr to prob, accepting rounding errors.
+ for (int i = 0; i < n; i++) {
+ set_prob(i, pr[i]);
+ }
+ // Because of rounding errors, both high and low may still contain elements
+ // whose probability is close to 1.0.
+ for (size_t i = 0; i < high.size(); i++) {
+ int idx = high[i];
+ set_prob(idx, 1.0);
+ // set alt to self to prevent rounding errors returning 0
+ set_alt(idx, idx);
+ }
+ for (size_t i = 0; i < low.size(); i++) {
+ int idx = low[i];
+ set_prob(idx, 1.0);
+ // set alt to self to prevent rounding errors returning 0
+ set_alt(idx, idx);
+ }
+}
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/distribution_sampler.h b/tensorflow/core/lib/random/distribution_sampler.h
new file mode 100644
index 0000000000..ab9598a205
--- /dev/null
+++ b/tensorflow/core/lib/random/distribution_sampler.h
@@ -0,0 +1,79 @@
+// DistributionSampler allows generating a discrete random variable with a given
+// distribution.
+// The values taken by the variable are [0, N) and relative weights for each
+// value are specified using a vector of size N.
+//
+// The algorithm takes O(N) time to precompute data at construction time and
+// takes O(1) time (2 random number generations, 2 lookups) for each sample.
+// The data structure takes O(N) memory.
+//
+// In contrast, util/random/weighted-picker.h provides O(lg N) sampling.
+// The advantage of that implementation is that weights can be adjusted
+// dynamically, while DistributionSampler doesn't allow weight adjustment.
+//
+// The algorithm used is Walker's Aliasing algorithm, described in Knuth, Vol 2.
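+//
+// A minimal usage sketch (hypothetical caller code; the weights are relative
+// and need not sum to 1):
+//
+//   std::vector<float> weights = {0.1, 0.3, 0.6};
+//   DistributionSampler sampler(weights);
+//   PhiloxRandom philox(301);
+//   SimplePhilox rand(&philox);
+//   int value = sampler.Sample(&rand);  // value is in [0, weights.size())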
+
+#ifndef TENSORFLOW_LIB_RANDOM_DISTRIBUTION_SAMPLER_H_
+#define TENSORFLOW_LIB_RANDOM_DISTRIBUTION_SAMPLER_H_
+
+#include <memory>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace random {
+
+class DistributionSampler {
+ public:
+ explicit DistributionSampler(const gtl::ArraySlice<float>& weights);
+
+ ~DistributionSampler() {}
+
+ int Sample(SimplePhilox* rand) const {
+ float r = rand->RandFloat();
+ // Since n is typically low, we don't bother with UnbiasedUniform.
+ int idx = rand->Uniform(num_);
+ if (r < prob(idx)) return idx;
+ // else pick alt from that bucket.
+ DCHECK_NE(-1, alt(idx));
+ return alt(idx);
+ }
+
+ int num() const { return num_; }
+
+ private:
+ float prob(int idx) const {
+ DCHECK_LT(idx, num_);
+ return data_[idx].first;
+ }
+
+ int alt(int idx) const {
+ DCHECK_LT(idx, num_);
+ return data_[idx].second;
+ }
+
+ void set_prob(int idx, float f) {
+ DCHECK_LT(idx, num_);
+ data_[idx].first = f;
+ }
+
+ void set_alt(int idx, int val) {
+ DCHECK_LT(idx, num_);
+ data_[idx].second = val;
+ }
+
+ int num_;
+ std::unique_ptr<std::pair<float, int>[]> data_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(DistributionSampler);
+};
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_DISTRIBUTION_SAMPLER_H_
diff --git a/tensorflow/core/lib/random/distribution_sampler_test.cc b/tensorflow/core/lib/random/distribution_sampler_test.cc
new file mode 100644
index 0000000000..d61a8daa0f
--- /dev/null
+++ b/tensorflow/core/lib/random/distribution_sampler_test.cc
@@ -0,0 +1,90 @@
+#include "tensorflow/core/lib/random/distribution_sampler.h"
+
+#include <string.h>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+
+class DistributionSamplerTest : public ::testing::Test {
+ protected:
+ // Returns the Chi-Squared statistic for the two distributions.
+ float TestWeights(const std::vector<float>& weights, int trials_per_bin) {
+ int iters = weights.size() * trials_per_bin;
+ std::unique_ptr<float[]> counts(new float[weights.size()]);
+ memset(counts.get(), 0, sizeof(float) * weights.size());
+ DistributionSampler sampler(weights);
+ PhiloxRandom philox(testing::RandomSeed(), 17);
+ SimplePhilox random(&philox);
+ for (int i = 0; i < iters; i++) {
+ int r = sampler.Sample(&random);
+ EXPECT_LT(r, weights.size());
+ EXPECT_GE(r, 0);
+ counts[r] += 1.0;
+ }
+ float chi2 = 0.0;
+ for (size_t i = 0; i < weights.size(); i++) {
+ counts[i] /= iters;
+ float err = (counts[i] - weights[i]);
+ chi2 += (err * err) / weights[i];
+ }
+ return chi2;
+ }
+
+ void TestDistribution(float* arr, int n) {
+ std::vector<float> w;
+ w.reserve(n);
+ for (int i = 0; i < n; i++) {
+ w.push_back(arr[i]);
+ }
+ float var = TestWeights(w, 1000);
+ if (var < 0.001) return;
+ // Maybe a statistical skew. Let's try more iterations.
+ var = TestWeights(w, 100000);
+ if (var < 0.001) return;
+ EXPECT_TRUE(false) << "Chi2 is " << var << " in " << n * 100000
+ << " iterations";
+ }
+};
+
+TEST_F(DistributionSamplerTest, KnownDistribution) {
+ float kEven2[] = {0.5, 0.5};
+ float kEven3[] = {0.33333333, 0.33333333, 0.33333333};
+ float kEven4[] = {0.25, 0.25, 0.25, 0.25};
+
+ float kDist1[] = {0.8, 0.15, 0.05};
+
+ TestDistribution(kEven2, TF_ARRAYSIZE(kEven2));
+ TestDistribution(kEven3, TF_ARRAYSIZE(kEven3));
+ TestDistribution(kEven4, TF_ARRAYSIZE(kEven4));
+ TestDistribution(kDist1, TF_ARRAYSIZE(kDist1));
+}
+
+static void BM_DistributionSampler(int iters, int n) {
+ testing::StopTiming();
+ PhiloxRandom philox(173, 371);
+ SimplePhilox rand(&philox);
+ std::vector<float> weights(n, 0);
+ for (int i = 0; i < n; i++) {
+ weights[i] = rand.Uniform(100);
+ }
+ DistributionSampler picker(weights);
+ testing::StartTiming();
+ int r = 0;
+ for (int i = 0; i < iters; i++) {
+ r |= picker.Sample(&rand);
+ }
+ CHECK_NE(r, kint32max);
+}
+
+BENCHMARK(BM_DistributionSampler)->Arg(10)->Arg(100)->Arg(1000);
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/exact_uniform_int.h b/tensorflow/core/lib/random/exact_uniform_int.h
new file mode 100644
index 0000000000..616354cc5c
--- /dev/null
+++ b/tensorflow/core/lib/random/exact_uniform_int.h
@@ -0,0 +1,68 @@
+// Exact uniform integers using rejection sampling
+
+#ifndef TENSORFLOW_LIB_RANDOM_EXACT_UNIFORM_H_
+#define TENSORFLOW_LIB_RANDOM_EXACT_UNIFORM_H_
+
+#include <type_traits>
+
+namespace tensorflow {
+namespace random {
+
+template <typename UintType, typename RandomBits>
+UintType ExactUniformInt(const UintType n, const RandomBits& random) {
+ static_assert(std::is_unsigned<UintType>::value,
+ "UintType must be an unsigned int");
+ static_assert(std::is_same<UintType, decltype(random())>::value,
+ "random() should return UintType");
+ if (n == 0) {
+ // Consume a value anyway
+ // TODO(irving): Assert n != 0, since this case makes no sense.
+ return random() * n;
+ } else if (0 == (n & (n - 1))) {
+ // N is a power of two, so just mask off the lower bits.
+ return random() & (n - 1);
+ } else {
+ // Reject all numbers that skew the distribution towards 0.
+
+ // random's output is uniform in the half-open interval [0, 2^{bits}).
+ // For any interval [m,n), the number of elements in it is n-m.
+
+ const UintType range = ~static_cast<UintType>(0);
+ const UintType rem = (range % n) + 1;
+ UintType rnd;
+
+ // rem = ((2^bits-1) \bmod n) + 1
+ // 1 <= rem <= n
+
+ // NB: rem == n is impossible, since n is not a power of 2 (from
+ // earlier check).
+
+ do {
+ rnd = random(); // rnd uniform over [0, 2^{bits})
+ } while (rnd < rem); // reject [0, rem)
+ // rnd is uniform over [rem, 2^{bits})
+ //
+ // The number of elements in the half-open interval is
+ //
+ // 2^{bits} - rem = 2^{bits} - ((2^{bits}-1) \bmod n) - 1
+ // = 2^{bits}-1 - ((2^{bits}-1) \bmod n)
+ // = n \cdot \lfloor (2^{bits}-1)/n \rfloor
+ //
+ // therefore n evenly divides the number of integers in the
+ // interval.
+ //
+ // The function v \rightarrow v % n takes values from [rem,
+ // 2^{bits}) to [0, n). Each integer in the range [0, n)
+ // will have exactly \lfloor (2^{bits}-1)/n \rfloor preimages from
+ // the domain interval.
+ //
+ // Therefore, v % n is uniform over [0, n). QED.
+
+ return rnd % n;
+ }
+}
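+
+// A minimal usage sketch (hypothetical caller code; `gen` must be a callable
+// whose return type is exactly the requested unsigned type):
+//
+//   uint32 die = ExactUniformInt<uint32>(6u, gen);  // uniform over [0, 6)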
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_EXACT_UNIFORM_H_
diff --git a/tensorflow/core/lib/random/philox_random.h b/tensorflow/core/lib/random/philox_random.h
new file mode 100644
index 0000000000..2c3cd0c4b9
--- /dev/null
+++ b/tensorflow/core/lib/random/philox_random.h
@@ -0,0 +1,232 @@
+// Implement the Philox algorithm to generate random numbers in parallel.
+// Salmon et al. SC 2011. Parallel random numbers: as easy as 1, 2, 3.
+// http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
+
+#ifndef TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_H_
+#define TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_H_
+
+#include <stdlib.h>
+
+#include "tensorflow/core/platform/port.h"
+
+// Function qualifiers that need to work on both CPU and GPU.
+#ifdef __CUDA_ARCH__
+// For nvcc.
+#define PHILOX_DEVICE_FUNC __host__ __device__
+#define PHILOX_INLINE __inline__
+#else
+// For non-nvcc.
+#define PHILOX_DEVICE_FUNC
+#define PHILOX_INLINE inline
+#endif
+#define PHILOX_DEVICE_INLINE PHILOX_DEVICE_FUNC PHILOX_INLINE
+
+#include <math.h>
+
+namespace tensorflow {
+namespace random {
+
+// A class that represents an inline array. It can be used on both CPU and GPU,
+// and is also trivially copyable between CPU and GPU.
+// Arguments:
+// T: the array element type;
+// ElementCount: the fixed size of the array;
+template <typename T, int ElementCount>
+class Array {
+ public:
+ PHILOX_DEVICE_INLINE Array() {
+ for (int i = 0; i < ElementCount; ++i) {
+ data_[i] = T();
+ }
+ }
+
+ PHILOX_DEVICE_INLINE const T& operator[](int index) const {
+ return data_[index];
+ }
+
+ PHILOX_DEVICE_INLINE T& operator[](int index) { return data_[index]; }
+
+ size_t size() const { return ElementCount; }
+
+ private:
+ T data_[ElementCount];
+};
+
+// A class that encapsulates all the state for a random number generator using
+// the philox_4x32_10 algorithm. Each invocation returns 128 random bits in the
+// form of four uint32 values.
+// There are multiple variants of this algorithm; we picked the 4x32_10 version
+// that is best suited for our applications.
+// Since this class is meant to be copied between CPU and GPU, it maintains
+// value semantics.
+//
+// For example: To use this class and populate an array of 1024 randoms on CPU
+// with two threads,
+//
+// void Fill(PhiloxRandom rnd, uint32* output, int start, int limit) {
+// assert(start % 4 == 0);
+// assert(limit % 4 == 0);
+// rnd.Skip(start / 4);
+// for (int i = start; i < limit; i += 4) {
+// auto sample = rnd();
+// ... copy sample[0..3] to output[i..i+3]
+// }
+// }
+//
+// PhiloxRandom rng(seed);
+// PhiloxRandom rng_copy = rng;
+// rng.Skip(1000/4);
+//
+// ... schedule Fill(rng_copy, output, 0, 512) in thread 1;
+// ... schedule Fill(rng_copy, output, 512, 1024) in thread 2;
+// ... wait for thread 1 & 2 to finish executing Fill().
+//
+// NOTE:
+// 1. PhiloxRandom is trivially copyable.
+// 2. PhiloxRandom is compilable by gcc and nvcc.
+class PhiloxRandom {
+ public:
+ typedef Array<uint32, 4> ResultType;
+ typedef uint32 ResultElementType;
+ // The number of elements that will be returned.
+ static const int kResultElementCount = 4;
+
+ PHILOX_DEVICE_INLINE
+ PhiloxRandom() {}
+
+ PHILOX_DEVICE_INLINE
+ explicit PhiloxRandom(uint64 seed) {
+ key_[0] = static_cast<uint32>(seed);
+ key_[1] = static_cast<uint32>(seed >> 32);
+ }
+
+ PHILOX_DEVICE_INLINE
+ explicit PhiloxRandom(uint64 seed_lo, uint64 seed_hi) {
+ key_[0] = static_cast<uint32>(seed_lo);
+ key_[1] = static_cast<uint32>(seed_lo >> 32);
+ counter_[2] = static_cast<uint32>(seed_hi);
+ counter_[3] = static_cast<uint32>(seed_hi >> 32);
+ }
+
+ // Skip the specified number of 128-bit samples in the current stream.
+ PHILOX_DEVICE_INLINE
+ void Skip(uint64 count) {
+ const uint32 count_lo = static_cast<uint32>(count);
+ uint32 count_hi = static_cast<uint32>(count >> 32);
+
+ counter_[0] += count_lo;
+ if (counter_[0] < count_lo) {
+ ++count_hi;
+ }
+
+ counter_[1] += count_hi;
+ if (counter_[1] < count_hi) {
+ if (++counter_[2] == 0) {
+ ++counter_[3];
+ }
+ }
+ }
+
+ // Returns a group of four random numbers using the underlying Philox
+ // algorithm.
+ PHILOX_DEVICE_INLINE ResultType operator()() {
+ ResultType counter = counter_;
+ Key key = key_;
+
+ // Run the single round ten times. The loop is manually unrolled
+ // for better performance.
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+ RaiseKey(&key);
+ counter = ComputeSingleRound(counter, key);
+
+ SkipOne();
+
+ return counter;
+ }
+
+ private:
+ // The type for the 64-bit key, stored as two 32-bit uints, that is used in
+ // the diffusion process.
+ typedef Array<uint32, 2> Key;
+
+ // We use the same constants as recommended by the original paper.
+ static const uint32 kPhiloxW32A = 0x9E3779B9;
+ static const uint32 kPhiloxW32B = 0xBB67AE85;
+ static const uint32 kPhiloxM4x32A = 0xD2511F53;
+ static const uint32 kPhiloxM4x32B = 0xCD9E8D57;
+
+ // Helper function to skip the next sample of 128-bits in the current stream.
+ PHILOX_DEVICE_INLINE void SkipOne() {
+ if (++counter_[0] == 0) {
+ if (++counter_[1] == 0) {
+ if (++counter_[2] == 0) {
+ ++counter_[3];
+ }
+ }
+ }
+ }
+
+ // Helper function to return the low and high 32 bits of the product of two
+ // 32-bit integers.
+ PHILOX_DEVICE_INLINE
+ static void MultiplyHighLow(uint32 a, uint32 b, uint32* result_low,
+ uint32* result_high) {
+#ifndef __GCUDACC__
+ const uint64 product = static_cast<uint64>(a) * b;
+ *result_low = static_cast<uint32>(product);
+ *result_high = static_cast<uint32>(product >> 32);
+#else
+ *result_low = a * b;
+ *result_high = __umulhi(a, b);
+#endif
+ }
+
+ // Helper function for a single round of the underlying Philox algorithm.
+ PHILOX_DEVICE_INLINE static ResultType ComputeSingleRound(
+ const ResultType& counter, const Key& key) {
+ uint32 lo0;
+ uint32 hi0;
+ MultiplyHighLow(kPhiloxM4x32A, counter[0], &lo0, &hi0);
+
+ uint32 lo1;
+ uint32 hi1;
+ MultiplyHighLow(kPhiloxM4x32B, counter[2], &lo1, &hi1);
+
+ ResultType result;
+ result[0] = hi1 ^ counter[1] ^ key[0];
+ result[1] = lo1;
+ result[2] = hi0 ^ counter[3] ^ key[1];
+ result[3] = lo0;
+ return result;
+ }
+
+ PHILOX_DEVICE_INLINE void RaiseKey(Key* key) {
+ (*key)[0] += kPhiloxW32A;
+ (*key)[1] += kPhiloxW32B;
+ }
+
+ private:
+ ResultType counter_;
+ Key key_;
+};
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_H_
diff --git a/tensorflow/core/lib/random/philox_random_test.cc b/tensorflow/core/lib/random/philox_random_test.cc
new file mode 100644
index 0000000000..997c0263b7
--- /dev/null
+++ b/tensorflow/core/lib/random/philox_random_test.cc
@@ -0,0 +1,58 @@
+#include "tensorflow/core/lib/random/philox_random.h"
+
+#include <math.h>
+#include <algorithm>
+#include <functional>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/random/philox_random_test_utils.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+namespace {
+
+// A trivial distribution that just returns the PhiloxRandom as a distribution
+class TrivialPhiloxDistribution {
+ public:
+ // The number of elements that will be returned.
+ static constexpr int kResultElementCount = PhiloxRandom::kResultElementCount;
+ typedef PhiloxRandom::ResultType ResultType;
+ typedef PhiloxRandom::ResultElementType ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(PhiloxRandom* gen) { return (*gen)(); }
+};
+
+// This test checks that skipping a certain number of samples produces the same
+// stream as generating and discarding that many samples.
+TEST(PhiloxRandomTest, SkipMatchTest) {
+ constexpr int count = 1024;
+ constexpr int skip_count = 2048;
+
+ uint64 test_seed = GetTestSeed();
+ std::vector<uint32> v1(count);
+ {
+ PhiloxRandom gen(test_seed);
+ gen.Skip(skip_count / 4);
+ FillRandoms<TrivialPhiloxDistribution>(gen, &v1[0], v1.size());
+ }
+
+ std::vector<uint32> v2(count + skip_count);
+ {
+ PhiloxRandom gen(test_seed);
+ FillRandoms<TrivialPhiloxDistribution>(gen, &v2[0], v2.size());
+ }
+
+ for (int i = 0; i < count; ++i) {
+ ASSERT_EQ(v1[i], v2[i + skip_count]);
+ }
+}
+
+} // namespace
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/philox_random_test_utils.h b/tensorflow/core/lib/random/philox_random_test_utils.h
new file mode 100644
index 0000000000..d22f6b36e4
--- /dev/null
+++ b/tensorflow/core/lib/random/philox_random_test_utils.h
@@ -0,0 +1,36 @@
+#ifndef TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_TEST_UTILS_H_
+#define TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_TEST_UTILS_H_
+
+#include <algorithm>
+
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace random {
+
+// Return a random seed.
+inline uint64 GetTestSeed() { return New64(); }
+
+// A utility function to fill the given array with samples from the given
+// distribution.
+template <class Distribution>
+void FillRandoms(PhiloxRandom gen, typename Distribution::ResultElementType* p,
+ int64 size) {
+ const int granularity = Distribution::kResultElementCount;
+
+ CHECK(size % granularity == 0) << " size: " << size
+ << " granularity: " << granularity;
+
+ Distribution dist;
+ for (int i = 0; i < size; i += granularity) {
+ const auto sample = dist(&gen);
+ std::copy(&sample[0], &sample[0] + granularity, &p[i]);
+ }
+}
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_PHILOX_RANDOM_TEST_UTILS_H_
diff --git a/tensorflow/core/lib/random/random.cc b/tensorflow/core/lib/random/random.cc
new file mode 100644
index 0000000000..2959b05382
--- /dev/null
+++ b/tensorflow/core/lib/random/random.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/lib/random/random.h"
+
+#include <random>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace random {
+
+std::mt19937_64* InitRng() {
+ std::random_device device("/dev/random");
+ return new std::mt19937_64(device());
+}
+
+uint64 New64() {
+ static std::mt19937_64* rng = InitRng();
+ static mutex mu;
+ mutex_lock l(mu);
+ return (*rng)();
+}
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/random.h b/tensorflow/core/lib/random/random.h
new file mode 100644
index 0000000000..1a20436c4e
--- /dev/null
+++ b/tensorflow/core/lib/random/random.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_LIB_RANDOM_RANDOM_H_
+#define TENSORFLOW_LIB_RANDOM_RANDOM_H_
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace random {
+
+// Return a 64-bit random value. Different sequences are generated
+// in different processes.
+uint64 New64();
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_RANDOM_H_
diff --git a/tensorflow/core/lib/random/random_distributions.h b/tensorflow/core/lib/random/random_distributions.h
new file mode 100644
index 0000000000..caafcde513
--- /dev/null
+++ b/tensorflow/core/lib/random/random_distributions.h
@@ -0,0 +1,361 @@
+#ifndef TENSORFLOW_LIB_RANDOM_RANDOM_DISTRIBUTIONS_H_
+#define TENSORFLOW_LIB_RANDOM_RANDOM_DISTRIBUTIONS_H_
+
+#include <math.h>
+#include <string.h>
+#include <algorithm>
+
+#include "tensorflow/core/lib/random/philox_random.h"
+
+namespace tensorflow {
+namespace random {
+
+// Helper function to convert a 32-bit integer to a float in [0, 1).
+PHILOX_DEVICE_INLINE float Uint32ToFloat(uint32 x);
+// Helper function to convert two 32-bit integers to a double in [0, 1).
+PHILOX_DEVICE_INLINE double Uint64ToDouble(uint32 x0, uint32 x1);
+
+// A class that generates uniformly distributed random numbers from the
+// underlying random integer generator.
+// Arguments:
+//   Generator: a generator type that returns a number of uint32s upon each
+//     invocation. It needs to define kResultElementCount for the sample
+//     count of each invocation, and ResultType for the actual returned
+//     sample type.
+//   RealType: the data type of the real numbers that will be returned by the
+//     distribution. This could be either float or double for now.
+// This class is meant to be implemented through specialization. The default
+// is not defined by design.
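+//
+// Example usage (an illustrative sketch; the seed values are arbitrary and
+// PhiloxRandom natively yields four uint32s per invocation):
+//   PhiloxRandom philox(301, 17);
+//   UniformDistribution<PhiloxRandom, float> dist;
+//   auto samples = dist(&philox);  // Array<float, 4>, each value in [0, 1)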
+template <class Generator, typename RealType>
+class UniformDistribution;
+
+template <class Generator>
+class UniformDistribution<Generator, float> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount = Generator::kResultElementCount;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = false;
+ typedef Array<float, kResultElementCount> ResultType;
+ typedef float ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(Generator* gen) {
+ typename Generator::ResultType sample = (*gen)();
+ ResultType result;
+ for (int i = 0; i < kResultElementCount; ++i) {
+ result[i] = Uint32ToFloat(sample[i]);
+ }
+ return result;
+ }
+};
+
+template <class Generator>
+class UniformDistribution<Generator, double> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount = Generator::kResultElementCount / 2;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = false;
+ typedef Array<double, kResultElementCount> ResultType;
+ typedef double ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(Generator* gen) {
+ typename Generator::ResultType sample = (*gen)();
+ ResultType result;
+ for (int i = 0; i < kResultElementCount; ++i) {
+ result[i] = Uint64ToDouble(sample[2 * i], sample[2 * i + 1]);
+ }
+ return result;
+ }
+};
+
+// A class that adapts a generator that natively produces multiple samples per
+// invocation into one that returns a single sample at a time.
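+//
+// Example usage (a minimal sketch; the seed value is arbitrary):
+//   PhiloxRandom philox(42);
+//   SingleSampleAdapter<PhiloxRandom> single(&philox);
+//   uint32 a = single();  // first of the four buffered native samples
+//   uint32 b = single();  // second; the buffer is refilled every four calls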
+template <class Generator>
+class SingleSampleAdapter {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount = 1;
+ // The number of elements that will be returned by the underlying generator.
+ static const int kNativeElementCount = Generator::kResultElementCount;
+ typedef typename Generator::ResultElementType ResultType;
+ typedef typename Generator::ResultElementType ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ explicit SingleSampleAdapter(Generator* gen)
+ : generator_(gen), used_result_index_(Generator::kResultElementCount) {}
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()() {
+ if (used_result_index_ == Generator::kResultElementCount) {
+ unused_results_ = (*generator_)();
+ used_result_index_ = 0;
+ }
+
+ return unused_results_[used_result_index_++];
+ }
+
+ private:
+ Generator* generator_;
+ typename Generator::ResultType unused_results_;
+ int used_result_index_;
+};
+
+// A class that generates random numbers from the unit normal distribution,
+// using the underlying random integer generator.
+// Arguments:
+//   Generator: a generator type that returns a number of uint32s upon each
+//     invocation. It needs to define kResultElementCount for the sample
+//     count of each invocation, and ResultType for the actual returned
+//     sample type.
+//   RealType: the data type of the real numbers that will be returned by the
+//     distribution. This could be either float or double for now.
+// This class is meant to be implemented through specialization. The default
+// is not defined by design.
+template <class Generator, typename RealType>
+class NormalDistribution;
+
+PHILOX_DEVICE_INLINE
+void BoxMullerFloat(uint32 x0, uint32 x1, float* f0, float* f1);
+
+PHILOX_DEVICE_INLINE
+void BoxMullerDouble(uint32 x0, uint32 x1, uint32 x2, uint32 x3, double* d0,
+ double* d1);
+
+template <class Generator>
+class NormalDistribution<Generator, float> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount = Generator::kResultElementCount;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = false;
+ typedef Array<float, kResultElementCount> ResultType;
+ typedef float ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(Generator* gen) {
+ typename Generator::ResultType sample = (*gen)();
+ ResultType result;
+ for (int i = 0; i < kResultElementCount; i += 2) {
+ BoxMullerFloat(sample[i], sample[i + 1], &result[i], &result[i + 1]);
+ }
+ return result;
+ }
+};
+
+template <class Generator>
+class NormalDistribution<Generator, double> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount = Generator::kResultElementCount / 2;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = false;
+ typedef Array<double, kResultElementCount> ResultType;
+ typedef double ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(Generator* gen) {
+ typename Generator::ResultType sample = (*gen)();
+ ResultType result;
+ for (int i = 0; i < kResultElementCount; i += 2) {
+ const int i2 = 2 * i;
+ BoxMullerDouble(sample[i2], sample[i2 + 1], sample[i2 + 2],
+ sample[i2 + 3], &result[i], &result[i + 1]);
+ }
+ return result;
+ }
+};
+
+// A class that returns samples from a standard normal distribution truncated
+// to [-kTruncateValue, kTruncateValue].
+// Arguments:
+//   SingleSampleGenerator: a single-sample generator type that returns one
+//     uint32 per invocation (e.g. SingleSampleAdapter). It needs to define
+//     kNativeElementCount for the sample count of the underlying generator.
+//   RealType: the data type of the real numbers that will be returned by the
+//     distribution. This could be either float or double for now.
+// This class is meant to be implemented through specialization. The default
+// is not defined by design.
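+//
+// Example usage (an illustrative sketch; the seed value is arbitrary):
+//   PhiloxRandom philox(7);
+//   SingleSampleAdapter<PhiloxRandom> single(&philox);
+//   TruncatedNormalDistribution<SingleSampleAdapter<PhiloxRandom>, float> dist;
+//   auto samples = dist(&single);  // each sample lies in (-2, 2)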
+template <class SingleSampleGenerator, typename RealType>
+class TruncatedNormalDistribution;
+
+// Partial specialization for float.
+template <class SingleSampleGenerator>
+class TruncatedNormalDistribution<SingleSampleGenerator, float> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount =
+ SingleSampleGenerator::kNativeElementCount;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = true;
+ // The threshold where the normal distribution is truncated.
+ const float kTruncateValue = 2.0f;
+
+ typedef Array<float, kResultElementCount> ResultType;
+ typedef float ResultElementType;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(SingleSampleGenerator* gen) {
+ ResultType results;
+ int index = 0;
+ while (true) {
+ // Repeatedly take samples from the normal distribution, until we have
+ // the desired number of elements that fall within the pre-defined cutoff
+ // threshold.
+ const uint32 x0 = (*gen)();
+ const uint32 x1 = (*gen)();
+ float f[2];
+ BoxMullerFloat(x0, x1, &f[0], &f[1]);
+
+ for (int i = 0; i < 2; ++i) {
+ if (fabs(f[i]) < kTruncateValue) {
+ results[index++] = f[i];
+ if (index >= kResultElementCount) {
+ return results;
+ }
+ }
+ }
+ }
+ }
+};
+
+// Partial specialization for double.
+template <class SingleSampleGenerator>
+class TruncatedNormalDistribution<SingleSampleGenerator, double> {
+ public:
+ // The number of elements that will be returned.
+ static const int kResultElementCount =
+ (SingleSampleGenerator::kNativeElementCount > 1)
+ ? SingleSampleGenerator::kNativeElementCount / 2
+ : 1;
+ // Indicates whether this distribution may take a variable number of
+ // samples at runtime.
+ static const bool kVariableSamplesPerOutput = true;
+ typedef Array<double, kResultElementCount> ResultType;
+ typedef double ResultElementType;
+ const double kTruncateValue = 2.0;
+
+ PHILOX_DEVICE_INLINE
+ ResultType operator()(SingleSampleGenerator* gen) {
+ ResultType results;
+ int index = 0;
+ while (1) {
+ const uint32 x0 = (*gen)();
+ const uint32 x1 = (*gen)();
+ const uint32 x2 = (*gen)();
+ const uint32 x3 = (*gen)();
+ double d[2];
+ BoxMullerDouble(x0, x1, x2, x3, &d[0], &d[1]);
+
+ for (int i = 0; i < 2; ++i) {
+ if (fabs(d[i]) < kTruncateValue) {
+ results[index++] = d[i];
+ if (index >= kResultElementCount) {
+ return results;
+ }
+ }
+ }
+ }
+ }
+};
+
+// Helper function to convert two 32-bit uniform random integers to two floats
+// drawn from the unit normal distribution.
+PHILOX_DEVICE_INLINE
+void BoxMullerFloat(uint32 x0, uint32 x1, float* f0, float* f1) {
+ // This function implements the Box-Muller transform:
+ // http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform#Basic_form
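+ // In symbols (a sketch of the basic form), with u1 and u2 uniform in (0, 1):
+ //   f0 = sqrt(-2 ln(u1)) * sin(2 * pi * u2)
+ //   f1 = sqrt(-2 ln(u1)) * cos(2 * pi * u2)
+ // are two independent draws from the unit normal distribution.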
+ // Do not send a really small number to log().
+ // We cannot mark "epsilon" as "static const" because NVCC would complain
+ const float epsilon = 1.0e-7f;
+ float u1 = Uint32ToFloat(x0);
+ if (u1 < epsilon) {
+ u1 = epsilon;
+ }
+ const float v1 = 2.0f * M_PI * Uint32ToFloat(x1);
+ const float u2 = sqrt(-2.0f * log(u1));
+#if defined(__linux)
+ sincosf(v1, f0, f1);
+#else
+ *f0 = sinf(v1);
+ *f1 = cosf(v1);
+#endif
+ *f0 *= u2;
+ *f1 *= u2;
+}
+
+// Helper function to convert four 32-bit uniform random integers to two doubles
+// drawn from the unit normal distribution.
+PHILOX_DEVICE_INLINE
+void BoxMullerDouble(uint32 x0, uint32 x1, uint32 x2, uint32 x3, double* d0,
+ double* d1) {
+ // This function implements the Box-Muller transform:
+ // http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform#Basic_form
+ // Do not send a really small number to log().
+ // We cannot mark "epsilon" as "static const" because NVCC would complain
+ const double epsilon = 1.0e-7;
+ double u1 = Uint64ToDouble(x0, x1);
+ if (u1 < epsilon) {
+ u1 = epsilon;
+ }
+ const double v1 = 2 * M_PI * Uint64ToDouble(x2, x3);
+ const double u2 = sqrt(-2.0 * log(u1));
+#if defined(__linux)
+ sincos(v1, d0, d1);
+#else
+ *d0 = sin(v1);
+ *d1 = cos(v1);
+#endif
+ *d0 *= u2;
+ *d1 *= u2;
+}
+
+// Helper function to convert a 32-bit integer to a float in [0, 1).
+PHILOX_DEVICE_INLINE float Uint32ToFloat(uint32 x) {
+ // IEEE754 floats are formatted as follows (MSB first):
+ // sign(1) exponent(8) mantissa(23)
+ // Conceptually construct the following:
+ // sign == 0
+ // exponent == 127 -- an excess 127 representation of a zero exponent
+ // mantissa == 23 random bits
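+ // For example (illustrative): x == 0 gives val == 0x3f800000, i.e. 1.0f,
+ // so the result below is 0.0f; all 23 mantissa bits set gives a result
+ // just below 1.0f.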
+ const uint32 man = x & 0x7fffffu; // 23 bit mantissa
+ const uint32 exp = static_cast<uint32>(127);
+ const uint32 val = (exp << 23) | man;
+
+ // Assumes that endian-ness is same for float and uint32.
+ float result;
+ memcpy(&result, &val, sizeof(val));
+ return result - 1.0f;
+}
+
+// Helper function to convert two 32-bit integers to a double in [0, 1).
+PHILOX_DEVICE_INLINE double Uint64ToDouble(uint32 x0, uint32 x1) {
+ // IEEE754 doubles are formatted as follows (MSB first):
+ // sign(1) exponent(11) mantissa(52)
+ // Conceptually construct the following:
+ // sign == 0
+ // exponent == 1023 -- an excess 1023 representation of a zero exponent
+ // mantissa == 52 random bits
+ const uint32 mhi = x0 & 0xfffffu; // upper 20 bits of mantissa
+ const uint32 mlo = x1; // lower 32 bits of mantissa
+ const uint64 man = (static_cast<uint64>(mhi) << 32) | mlo; // mantissa
+ const uint64 exp = static_cast<uint64>(1023);
+ const uint64 val = (exp << 52) | man;
+ // Assumes that endian-ness is same for double and uint64.
+ double result;
+ memcpy(&result, &val, sizeof(val));
+ return result - 1.0;
+}
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_RANDOM_DISTRIBUTIONS_H_
diff --git a/tensorflow/core/lib/random/random_distributions_test.cc b/tensorflow/core/lib/random/random_distributions_test.cc
new file mode 100644
index 0000000000..3ce86a907a
--- /dev/null
+++ b/tensorflow/core/lib/random/random_distributions_test.cc
@@ -0,0 +1,270 @@
+#include "tensorflow/core/lib/random/random_distributions.h"
+
+#include <math.h>
+#include <algorithm>
+#include <functional>
+#include <unordered_map>
+#include <vector>
+
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/philox_random_test_utils.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+namespace {
+
+// The largest z-value we want to tolerate. Since the z-test statistic is
+// approximately a unit normal, it should almost never exceed 6.
+static constexpr float kZLimit = 6.0;
+
+// A utility function to fill the given array with samples from the given
+// distribution, using the single-sample adapter of the underlying generator.
+template <class Distribution>
+void FillRandomsWithSingles(PhiloxRandom gen,
+ typename Distribution::ResultElementType* p,
+ int64 size) {
+ int granularity = Distribution::kResultElementCount;
+
+ CHECK(size % granularity == 0) << " size: " << size
+ << " granularity: " << granularity;
+
+ SingleSampleAdapter<PhiloxRandom> single_samples(&gen);
+
+ Distribution dist;
+ for (int i = 0; i < size; i += granularity) {
+ auto sample = dist(&single_samples);
+ std::copy(&sample[0], &sample[0] + granularity, &p[i]);
+ }
+}
+
+// Check that the given array of samples matches the given theoretical moment
+// function at different orders. The test is considered passing if the z-tests
+// of all statistical moments are all below z_limit.
+// typename T in the template argument could be either float or double.
+// Arguments:
+//   samples: an array of samples to be tested for their statistical properties;
+//   theoretical_moments: a functor that can calculate the theoretical moment of
+//     an arbitrary order for the given distribution;
+//   max_moments: the largest moment order of the distribution to be tested;
+//   stride: the distance between samples to check for statistical properties;
+//     0 tests the n-th moment of each sample, while
+//     any other stride tests for spatial correlation between samples;
+//   z_limit: the maximum z-test value for which we consider the test to pass.
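+// In symbols (a sketch for the stride == 0 case): for order i with N_i
+// contributing samples,
+//   z_i = |measured_moment_i - E[X^i]| / sqrt(Var(X^i) / N_i + numerical_error)
+// where Var(X^i) = E[X^(2i)] - (E[X^i])^2, and the check fails if z_i > z_limit.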
+template <typename T>
+bool CheckSamplesMoments(const std::vector<T>& samples,
+ std::function<double(int)> theoretical_moments,
+ int max_moments, int stride, T z_limit) {
+ const T* const samples_data = &samples[0];
+ const int samples_size = samples.size();
+ std::vector<double> moments(max_moments + 1);
+ double* const moments_data = &moments[0];
+ std::vector<int> moments_sample_count(max_moments + 1);
+ int* const moments_sample_count_data = &moments_sample_count[0];
+
+ for (int k = 0; k < samples_size; ++k) {
+ double moment = 1.;
+ for (int i = 0; i <= max_moments; ++i) {
+ int index = k + i * stride;
+ if (index >= samples_size) {
+ break;
+ }
+ // moments[i] stores the measured i-th order moment.
+ // Bypass std::vector::operator[] because it is too slow in debug
+ // mode, given the large number of samples.
+ moments_data[i] += moment;
+ ++moments_sample_count_data[i];
+ moment *= samples_data[index];
+ }
+ }
+
+ // normalize the moments
+ for (int i = 0; i <= max_moments; ++i) {
+ moments[i] /= moments_sample_count[i];
+ }
+
+ bool status = true;
+
+ for (int i = 1; i <= max_moments; ++i) {
+ // Calculate the theoretical mean and variance
+ const double moments_i_mean = (stride == 0)
+ ? theoretical_moments(i)
+ : std::pow(theoretical_moments(1), i);
+ const double moments_i_squared = (stride == 0)
+ ? theoretical_moments(2 * i)
+ : std::pow(theoretical_moments(2), i);
+ const double moments_i_var =
+ moments_i_squared - moments_i_mean * moments_i_mean;
+
+ // assume every operation has a small numerical error.
+ static const double kNumericalError = 1e-6;
+ // it takes i multiplications to calculate one i-th moment.
+ const double error_per_moment = i * kNumericalError;
+ const double total_variance =
+ moments_i_var / moments_sample_count[i] + error_per_moment;
+ // z_test is approximately a unit normal distribution.
+ const double z_test =
+ fabs((moments[i] - moments_i_mean) / sqrt(total_variance));
+
+ if (z_test > z_limit) {
+ LOG(ERROR) << "failing z_test:"
+ << " moment: " << i << " stride: " << stride
+ << " z_test: " << z_test << " z_limit: " << z_limit
+ << " measured moments: " << moments[i]
+ << " theoretical mean of the moments: " << moments_i_mean
+ << " theoretical var of the moments: " << moments_i_var
+ << " sample count: " << moments_sample_count[i];
+ status = false;
+ }
+ }
+
+ return status;
+}
+
+// This test checks that the generated samples match the theoretical moments
+// of the uniform distribution.
+template <typename T>
+void UniformMomentsTest(int count, int max_moments,
+ const std::vector<int>& strides, T z_limit) {
+ auto uniform_moments = [](int n) -> double { return 1. / (n + 1); };
+
+ std::vector<T> v1(count);
+ uint64 seed = GetTestSeed();
+ PhiloxRandom gen(seed);
+ FillRandoms<UniformDistribution<PhiloxRandom, T> >(gen, &v1[0], v1.size());
+ for (int stride : strides) {
+ bool status = CheckSamplesMoments<T>(v1, uniform_moments, max_moments,
+ stride, z_limit);
+ ASSERT_TRUE(status) << " UniformMomentsTest failing. seed: " << seed;
+ }
+}
+
+// This test checks that the generated samples match the theoretical moments
+// of the unit normal distribution.
+template <typename T>
+void NormalMomentsTest(int count, int max_moments,
+ const std::vector<int>& strides, T z_limit) {
+ auto normal_moments = [](int n) -> double {
+ if (n % 2 == 1) {
+ // For an odd order, the moment of a unit normal distribution is zero.
+ return 0.;
+ } else {
+ // For an even order n, the moment of a unit normal distribution is
+ // (n-1)!!.
+ double v = 1.;
+ for (int i = n - 1; i >= 1; i -= 2) {
+ v *= i;
+ }
+ return v;
+ }
+ };
+
+ std::vector<T> v1(count);
+ uint64 seed = GetTestSeed();
+ PhiloxRandom gen(seed);
+ FillRandoms<NormalDistribution<PhiloxRandom, T> >(gen, &v1[0], v1.size());
+
+ for (int stride : strides) {
+ bool status = CheckSamplesMoments<T>(v1, normal_moments, max_moments,
+ stride, z_limit);
+ ASSERT_TRUE(status) << " NormalMomentsTest failing. seed: " << seed;
+ }
+}
+
+// A functor to calculate the moments of the truncated normal distribution.
+// For any odd order, the moment is zero. For any even order n, it can be shown
+// that the moments of the truncated standard normal satisfy the recurrence
+//   m(n) = (n - 1) * m(n - 2) - 2 * v ^ (n - 1) * f(v) / (2 * Phi(v) - 1)
+// where v is the cut-off value, f(v) is the p.d.f. of the standard
+// normal, and Phi(v) is the c.d.f. of the standard normal.
+class TruncatedNormalMoments {
+ public:
+ double operator()(int n) {
+ if (n == 0) {
+ return 1;
+ }
+ if (n % 2 == 1) {
+ // For an odd order, the moment is always zero
+ return 0.;
+ }
+
+ // Memoization: check the cached results first.
+ auto iter = cached_results_.find(n);
+ if (iter != cached_results_.end()) {
+ return iter->second;
+ }
+
+ // The real computation of the moment.
+ double bias = 2.0 * std::pow(kV, n - 1) * kFV / (2.0 * kPhiV - 1.0);
+ double moment_n_minus_2 = (*this)(n - 2);
+ double moment_n = (n - 1) * moment_n_minus_2 - bias;
+
+ cached_results_[n] = moment_n;
+ return moment_n;
+ }
+
+ private:
+ const double kV = 2.0;
+ // f(v), where f is the p.d.f of the normal distribution and v=2.
+ const double kFV = 1.0 / sqrt(2.0 * M_PI) * exp(-kV * kV / 2.0);
+ // The numerical evaluation of Phi(v), where v is the truncate value.
+ // v = 2 in the current implementation.
+ const double kPhiV = 0.977249868051821;
+ std::unordered_map<int, double> cached_results_;
+};
+
+// This test checks that the generated samples match the theoretical moments
+// of the truncated normal distribution.
+template <typename T>
+void RandomParametersMomentsTest(int count, int max_moments,
+ const std::vector<int>& strides, T z_limit) {
+ std::vector<T> v1(count);
+ uint64 seed = GetTestSeed();
+ PhiloxRandom gen(seed);
+ FillRandomsWithSingles<
+ TruncatedNormalDistribution<SingleSampleAdapter<PhiloxRandom>, T> >(
+ gen, &v1[0], v1.size());
+
+ for (int stride : strides) {
+ bool status = CheckSamplesMoments<T>(v1, TruncatedNormalMoments(),
+ max_moments, stride, z_limit);
+ ASSERT_TRUE(status) << " RandomParametersMomentsTest failing. seed: " << seed;
+ }
+}
+
+TEST(PhiloxRandomTest, UniformFloatMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ UniformMomentsTest<float>(1 << 20, 40, strides, kZLimit);
+}
+
+TEST(PhiloxRandomTest, NormalFloatMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ NormalMomentsTest<float>(8 << 20, 25, strides, kZLimit);
+}
+
+TEST(PhiloxRandomTest, RandomParametersFloatMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ RandomParametersMomentsTest<float>(1 << 20, 40, strides, kZLimit);
+}
+
+TEST(PhiloxRandomTest, UniformDoubleMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ UniformMomentsTest<double>(1 << 20, 40, strides, kZLimit);
+}
+
+TEST(PhiloxRandomTest, NormalDoubleMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ NormalMomentsTest<double>(8 << 20, 25, strides, kZLimit);
+}
+
+TEST(PhiloxRandomTest, RandomParametersDoubleMomentsTest) {
+ const std::vector<int> strides = {0, 1, 4, 17};
+ RandomParametersMomentsTest<double>(1 << 20, 40, strides, kZLimit);
+}
+
+} // namespace
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/random_test.cc b/tensorflow/core/lib/random/random_test.cc
new file mode 100644
index 0000000000..7ed37c8b5e
--- /dev/null
+++ b/tensorflow/core/lib/random/random_test.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/lib/random/random.h"
+
+#include <set>
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+namespace {
+
+TEST(New64Test, SanityCheck) {
+ std::set<uint64> values;
+ for (int i = 0; i < 1000000; i++) {
+ uint64 x = New64();
+ EXPECT_TRUE(values.insert(x).second) << "duplicate " << x;
+ }
+}
+
+} // namespace
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/simple_philox.cc b/tensorflow/core/lib/random/simple_philox.cc
new file mode 100644
index 0000000000..1035e1f017
--- /dev/null
+++ b/tensorflow/core/lib/random/simple_philox.cc
@@ -0,0 +1,24 @@
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/lib/random/exact_uniform_int.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace random {
+
+uint32 SimplePhilox::Uniform(uint32 n) {
+ return ExactUniformInt<uint32>(n, [this]() { return Rand32(); });
+}
+
+uint64 SimplePhilox::Uniform64(uint64 n) {
+ return ExactUniformInt<uint64>(n, [this]() { return Rand64(); });
+}
+
+uint32 SimplePhilox::Skewed(int max_log) {
+ CHECK(0 <= max_log && max_log <= 32);
+ const int shift = Rand32() % (max_log + 1);
+ const uint32 mask = shift == 32 ? ~static_cast<uint32>(0) : (1 << shift) - 1;
+ return Rand32() & mask;
+}
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/simple_philox.h b/tensorflow/core/lib/random/simple_philox.h
new file mode 100644
index 0000000000..12b15d7616
--- /dev/null
+++ b/tensorflow/core/lib/random/simple_philox.h
@@ -0,0 +1,61 @@
+#ifndef TENSORFLOW_LIB_RANDOM_SIMPLE_PHILOX_H_
+#define TENSORFLOW_LIB_RANDOM_SIMPLE_PHILOX_H_
+
+#include <math.h>
+#include <string.h>
+#include <algorithm>
+
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+
+namespace tensorflow {
+namespace random {
+
+// A simple imperative interface to Philox
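+//
+// Example usage (a minimal sketch; the seed values are arbitrary):
+//   PhiloxRandom philox(301, 17);
+//   SimplePhilox rnd(&philox);
+//   float f = rnd.RandFloat();     // uniform in [0, 1)
+//   uint32 roll = rnd.Uniform(6);  // uniform integer in [0, 6)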
+class SimplePhilox {
+ public:
+ PHILOX_DEVICE_INLINE
+ explicit SimplePhilox(PhiloxRandom* gen) : single_(gen) {}
+
+ // 32 random bits
+ PHILOX_DEVICE_INLINE uint32 Rand32() { return single_(); }
+
+ // 64 random bits
+ PHILOX_DEVICE_INLINE uint64 Rand64() {
+ const uint32 lo = single_(), hi = single_();
+ return lo | static_cast<uint64>(hi) << 32;
+ }
+
+ // Uniform float in [0, 1)
+ PHILOX_DEVICE_INLINE float RandFloat() { return Uint32ToFloat(single_()); }
+
+ // Uniform double in [0, 1)
+ PHILOX_DEVICE_INLINE double RandDouble() {
+ const uint32 x0 = single_(), x1 = single_();
+ return Uint64ToDouble(x0, x1);
+ }
+
+ // Uniform integer in [0, n).
+ // Uses rejection sampling, so may need more than one 32-bit sample.
+ uint32 Uniform(uint32 n);
+
+ // Uniform integer in [0, n).
+ // Uses rejection sampling, so may need more than one 64-bit sample.
+ uint64 Uniform64(uint64 n);
+
+ // True with probability 1/n.
+ bool OneIn(uint32 n) { return Uniform(n) == 0; }
+
+ // Skewed: pick "base" uniformly from range [0,max_log] and then
+ // return "base" random bits. The effect is to pick a number in the
+ // range [0,2^max_log-1] with bias towards smaller numbers.
+ uint32 Skewed(int max_log);
+
+ private:
+ SingleSampleAdapter<PhiloxRandom> single_;
+};
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_SIMPLE_PHILOX_H_
diff --git a/tensorflow/core/lib/random/simple_philox_test.cc b/tensorflow/core/lib/random/simple_philox_test.cc
new file mode 100644
index 0000000000..4246b8b4dd
--- /dev/null
+++ b/tensorflow/core/lib/random/simple_philox_test.cc
@@ -0,0 +1,120 @@
+#include "tensorflow/core/lib/random/simple_philox.h"
+
+#include <set>
+#include <string>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+namespace {
+
+TEST(SimplePhiloxTest, FloatTest) {
+ PhiloxRandom philox(7, 7);
+ SimplePhilox gen(&philox);
+ static const int kIters = 1000000;
+ for (int i = 0; i < kIters; ++i) {
+ float f = gen.RandFloat();
+ EXPECT_LE(0.0f, f);
+ EXPECT_GT(1.0f, f);
+ }
+ for (int i = 0; i < kIters; ++i) {
+ double d = gen.RandDouble();
+ EXPECT_LE(0.0, d);
+ EXPECT_GT(1.0, d);
+ }
+}
+
+static void DifferenceTest(const char *names, SimplePhilox *gen1,
+ SimplePhilox *gen2) {
+ static const int kIters = 100;
+ bool different = false;
+ for (int i = 0; i < kIters; ++i) {
+ if (gen1->Rand32() != gen2->Rand32()) {
+ different = true;
+ break;
+ }
+ }
+ CHECK(different) << "different seeds but same output!";
+}
+
+TEST(SimplePhiloxTest, DifferenceTest) {
+ PhiloxRandom philox1(1, 1), philox2(17, 17);
+ SimplePhilox gen1(&philox1), gen2(&philox2);
+
+ DifferenceTest("SimplePhilox: different seeds", &gen1, &gen2);
+}
+
+TEST(SimplePhiloxTest, DifferenceTestCloseSeeds) {
+ PhiloxRandom philox1(1, 1), philox2(2, 1);
+ SimplePhilox gen1(&philox1), gen2(&philox2);
+
+ DifferenceTest("SimplePhilox: close seeds", &gen1, &gen2);
+}
+
+TEST(SimplePhiloxTest, Regression_CloseSeedsAreDifferent) {
+ const int kCount = 1000;
+
+ // Two seeds differ only by the last bit.
+ PhiloxRandom philox1(0, 1), philox2(1, 1);
+ SimplePhilox gen1(&philox1), gen2(&philox2);
+
+ std::set<uint32> first;
+ std::set<uint32> all;
+ for (int i = 0; i < kCount; ++i) {
+ uint32 v = gen1.Rand32();
+ first.insert(v);
+ all.insert(v);
+ all.insert(gen2.Rand32());
+ }
+
+ // A broken array-initialization implementation (before 2009-08-18) returned
+ // <1000, 1007> for the above seeds, generating output that is >99% similar.
+ // The fixed version returns <1000, 2000>, i.e. completely disjoint sets.
+ EXPECT_EQ(kCount, first.size());
+ EXPECT_EQ(2 * kCount, all.size());
+}
+
+TEST(SimplePhiloxTest, TestUniform) {
+ PhiloxRandom philox(17, 17);
+ SimplePhilox gen(&philox);
+
+ uint32 range = 3 * (1L << 29);
+ uint32 threshold = 1L << 30;
+
+ size_t count = 0;
+ static const int kTrials = 100000;
+ for (int i = 0; i < kTrials; ++i) {
+ uint32 rnd = gen.Uniform(range);
+ if (rnd < threshold) {
+ ++count;
+ }
+ }
+
+ EXPECT_LT(fabs((threshold + 0.0) / range - (count + 0.0) / kTrials), 0.005);
+}
+
+TEST(SimplePhiloxTest, TestUniform64) {
+ PhiloxRandom philox(17, 17);
+ SimplePhilox gen(&philox);
+
+ uint64 range = 3 * (1LL << 59);
+ uint64 threshold = 1LL << 60;
+
+ size_t count = 0;
+ static const int kTrials = 100000;
+ for (int i = 0; i < kTrials; ++i) {
+ uint64 rnd = gen.Uniform64(range);
+ if (rnd < threshold) {
+ ++count;
+ }
+ }
+
+ EXPECT_LT(fabs((threshold + 0.0) / range - (count + 0.0) / kTrials), 0.005);
+}
+
+} // namespace
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/weighted_picker.cc b/tensorflow/core/lib/random/weighted_picker.cc
new file mode 100644
index 0000000000..f96da578ec
--- /dev/null
+++ b/tensorflow/core/lib/random/weighted_picker.cc
@@ -0,0 +1,203 @@
+#include "tensorflow/core/lib/random/weighted_picker.h"
+
+#include <string.h>
+#include <algorithm>
+
+#include "tensorflow/core/lib/random/simple_philox.h"
+
+namespace tensorflow {
+namespace random {
+
+WeightedPicker::WeightedPicker(int N) {
+ CHECK_GE(N, 0);
+ N_ = N;
+
+ // Find the number of levels
+ num_levels_ = 1;
+ while (LevelSize(num_levels_ - 1) < N) {
+ num_levels_++;
+ }
+
+ // Initialize the levels
+ level_ = new int32*[num_levels_];
+ for (int l = 0; l < num_levels_; l++) {
+ level_[l] = new int32[LevelSize(l)];
+ }
+
+ SetAllWeights(1);
+}
+
+WeightedPicker::~WeightedPicker() {
+ for (int l = 0; l < num_levels_; l++) {
+ delete[] level_[l];
+ }
+ delete[] level_;
+}
+
+static int32 UnbiasedUniform(SimplePhilox* r, int32 n) {
+ CHECK_LE(0, n);
+ const uint32 range = ~static_cast<uint32>(0);
+ if (n == 0) {
+ return r->Rand32() * n;
+ } else if (0 == (n & (n - 1))) {
+ // N is a power of two, so just mask off the lower bits.
+ return r->Rand32() & (n - 1);
+ } else {
+ // Reject all numbers that skew the distribution towards 0.
+
+ // Rand32's output is uniform in the half-open interval [0, 2^{32}).
+ // For any interval [m,n), the number of elements in it is n-m.
+
+ uint32 rem = (range % n) + 1;
+ uint32 rnd;
+
+ // rem = ((2^{32}-1) \bmod n) + 1
+ // 1 <= rem <= n
+
+ // NB: rem == n is impossible, since n is not a power of 2 (from
+ // earlier check).
+
+ do {
+ rnd = r->Rand32(); // rnd uniform over [0, 2^{32})
+ } while (rnd < rem); // reject [0, rem)
+ // rnd is uniform over [rem, 2^{32})
+ //
+ // The number of elements in the half-open interval is
+ //
+ // 2^{32} - rem = 2^{32} - ((2^{32}-1) \bmod n) - 1
+ // = 2^{32}-1 - ((2^{32}-1) \bmod n)
+ // = n \cdot \lfloor (2^{32}-1)/n \rfloor
+ //
+ // therefore n evenly divides the number of integers in the
+ // interval.
+ //
+ // The function v \rightarrow v % n takes values from [rem,
+ // 2^{32}) to [0, n). Each integer in the range interval [0, n)
+ // will have exactly \lfloor (2^{32}-1)/n \rfloor preimages from
+ // the domain interval.
+ //
+ // Therefore, v % n is uniform over [0, n). QED.
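+ // Worked example (illustrative): for n = 3, rem = ((2^{32}-1) mod 3) + 1 = 1,
+ // so only rnd == 0 is rejected and the remaining 2^{32}-1 accepted values
+ // split evenly into the three residue classes mod 3.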
+
+ return rnd % n;
+ }
+}
+
+int WeightedPicker::Pick(SimplePhilox* rnd) const {
+ if (total_weight() == 0) return -1;
+
+ // Use an unbiased uniform distribution to avoid the bias
+ // toward low-index elements that could result from the use
+ // of big weights.
+ return PickAt(UnbiasedUniform(rnd, total_weight()));
+}
+
+int WeightedPicker::PickAt(int32 weight_index) const {
+ if (weight_index < 0 || weight_index >= total_weight()) return -1;
+
+ int32 position = weight_index;
+ int index = 0;
+
+ for (int l = 1; l < num_levels_; l++) {
+ // Pick left or right child of "level_[l-1][index]"
+ const int32 left_weight = level_[l][2 * index];
+ if (position < left_weight) {
+ // Descend to left child
+ index = 2 * index;
+ } else {
+ // Descend to right child
+ index = 2 * index + 1;
+ position -= left_weight;
+ }
+ }
+ CHECK_GE(index, 0);
+ CHECK_LT(index, N_);
+ CHECK_LE(position, level_[num_levels_ - 1][index]);
+ return index;
+}
+
+void WeightedPicker::set_weight(int index, int32 weight) {
+ assert(index >= 0);
+ assert(index < N_);
+
+ // Adjust the sums all the way up to the root
+ const int32 delta = weight - get_weight(index);
+ for (int l = num_levels_ - 1; l >= 0; l--) {
+ level_[l][index] += delta;
+ index >>= 1;
+ }
+}
+
+void WeightedPicker::SetAllWeights(int32 weight) {
+ // Initialize leaves
+ int32* leaves = level_[num_levels_ - 1];
+ for (int i = 0; i < N_; i++) leaves[i] = weight;
+ for (int i = N_; i < LevelSize(num_levels_ - 1); i++) leaves[i] = 0;
+
+ // Now sum up towards the root
+ RebuildTreeWeights();
+}
+
+void WeightedPicker::SetWeightsFromArray(int N, const int32* weights) {
+ Resize(N);
+
+ // Initialize leaves
+ int32* leaves = level_[num_levels_ - 1];
+ for (int i = 0; i < N_; i++) leaves[i] = weights[i];
+ for (int i = N_; i < LevelSize(num_levels_ - 1); i++) leaves[i] = 0;
+
+ // Now sum up towards the root
+ RebuildTreeWeights();
+}
+
+void WeightedPicker::RebuildTreeWeights() {
+ for (int l = num_levels_ - 2; l >= 0; l--) {
+ int32* level = level_[l];
+ int32* children = level_[l + 1];
+ for (int i = 0; i < LevelSize(l); i++) {
+ level[i] = children[2 * i] + children[2 * i + 1];
+ }
+ }
+}
+
+void WeightedPicker::Append(int32 weight) {
+ Resize(num_elements() + 1);
+ set_weight(num_elements() - 1, weight);
+}
+
+void WeightedPicker::Resize(int new_size) {
+ CHECK_GE(new_size, 0);
+ if (new_size <= LevelSize(num_levels_ - 1)) {
+ // The new picker fits in the existing levels.
+
+ // First zero out any of the weights that are being dropped so
+ // that the levels are correct (only needed when shrinking)
+ for (int i = new_size; i < N_; i++) {
+ set_weight(i, 0);
+ }
+
+ // We do not need to set any new weights when enlarging because
+ // the unneeded entries always have weight zero.
+ N_ = new_size;
+ return;
+ }
+
+ // We follow the simple strategy of just copying the old
+ // WeightedPicker into a new WeightedPicker. The cost is
+ // O(N) regardless.
+ assert(new_size > N_);
+ WeightedPicker new_picker(new_size);
+ int32* dst = new_picker.level_[new_picker.num_levels_ - 1];
+ int32* src = this->level_[this->num_levels_ - 1];
+ memcpy(dst, src, sizeof(dst[0]) * N_);
+ memset(dst + N_, 0, sizeof(dst[0]) * (new_size - N_));
+ new_picker.RebuildTreeWeights();
+
+ // Now swap the two pickers
+ std::swap(new_picker.N_, this->N_);
+ std::swap(new_picker.num_levels_, this->num_levels_);
+ std::swap(new_picker.level_, this->level_);
+ assert(this->N_ == new_size);
+}
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/random/weighted_picker.h b/tensorflow/core/lib/random/weighted_picker.h
new file mode 100644
index 0000000000..3d2c2dbb39
--- /dev/null
+++ b/tensorflow/core/lib/random/weighted_picker.h
@@ -0,0 +1,118 @@
+
+// An abstraction to pick from one of N elements with a specified
+// weight per element.
+//
+// The weight for a given element can be changed in O(lg N) time.
+// An element can be picked in O(lg N) time.
+//
+// Uses O(N) bytes of memory.
+//
+// Alternative: distribution-sampler.h allows O(1) time picking, but no weight
+// adjustment after construction.
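+//
+// Example usage (an illustrative sketch; picking requires a SimplePhilox):
+//   WeightedPicker picker(3);      // three elements, each with weight 1
+//   picker.set_weight(2, 8);       // element 2 now carries 8 of the 10 total
+//   int elem = picker.Pick(&rnd);  // returns 2 about 80% of the time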
+
+#ifndef TENSORFLOW_LIB_RANDOM_WEIGHTED_PICKER_H_
+#define TENSORFLOW_LIB_RANDOM_WEIGHTED_PICKER_H_
+
+#include <assert.h>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace random {
+
+class SimplePhilox;
+
+class WeightedPicker {
+ public:
+ // REQUIRES N >= 0
+ // Initializes the elements with a weight of one per element
+ explicit WeightedPicker(int N);
+
+ // Releases all resources
+ ~WeightedPicker();
+
+ // Pick a random element with probability proportional to its weight.
+ // If total weight is zero, returns -1.
+ int Pick(SimplePhilox* rnd) const;
+
+ // Deterministically pick element x whose weight covers the
+ // specified weight_index.
+ // Returns -1 if weight_index is not in the range [ 0 .. total_weight()-1 ]
+ int PickAt(int32 weight_index) const;
+
+ // Get the weight associated with an element
+ // REQUIRES 0 <= index < N
+ int32 get_weight(int index) const;
+
+ // Set the weight associated with an element
+ // REQUIRES weight >= 0
+ // REQUIRES 0 <= index < N
+ void set_weight(int index, int32 weight);
+
+ // Get the total combined weight of all elements
+ int32 total_weight() const;
+
+ // Get the number of elements in the picker
+ int num_elements() const;
+
+ // Set weight of each element to "weight"
+ void SetAllWeights(int32 weight);
+
+ // Resizes the picker to N and
+ // sets the weight of each element i to weight[i].
+ // The sum of the weights should not exceed 2^31 - 2.
+ // Complexity O(N).
+ void SetWeightsFromArray(int N, const int32* weights);
+
+ // REQUIRES N >= 0
+ //
+ // Resize the weighted picker so that it has "N" elements.
+ // Any newly added entries have zero weight.
+ //
+ // Note: Resizing to a smaller size than num_elements() will
+ // not reclaim any memory. If you wish to reduce memory usage,
+ // allocate a new WeightedPicker of the appropriate size.
+ //
+ // It is efficient to use repeated calls to Resize(num_elements() + 1)
+ // to grow the picker to size X (takes total time O(X)).
+ void Resize(int N);
+
+ // Grow the picker by one and set the weight of the new entry to "weight".
+ //
+ // Repeated calls to Append() in order to grow the
+ // picker to size X takes a total time of O(X lg(X)).
+ // Consider using SetWeightsFromArray instead.
+ void Append(int32 weight);
+
+ private:
+ // We keep a binary tree with N leaves. The "i"th leaf contains
+ // the weight of the "i"th element. An internal node contains
+ // the sum of the weights of its children.
+ int N_; // Number of elements
+ int num_levels_; // Number of levels in tree (level-0 is root)
+ int32** level_; // Array that holds nodes per level
+
+ // Size of each level
+ static int LevelSize(int level) { return 1 << level; }
+
+ // Rebuild the tree weights using the leaf weights
+ void RebuildTreeWeights();
+
+ TF_DISALLOW_COPY_AND_ASSIGN(WeightedPicker);
+};
+
+inline int32 WeightedPicker::get_weight(int index) const {
+ DCHECK_GE(index, 0);
+ DCHECK_LT(index, N_);
+ return level_[num_levels_ - 1][index];
+}
+
+inline int32 WeightedPicker::total_weight() const { return level_[0][0]; }
+
+inline int WeightedPicker::num_elements() const { return N_; }
+
+} // namespace random
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_RANDOM_WEIGHTED_PICKER_H_
diff --git a/tensorflow/core/lib/random/weighted_picker_test.cc b/tensorflow/core/lib/random/weighted_picker_test.cc
new file mode 100644
index 0000000000..0b27d437d5
--- /dev/null
+++ b/tensorflow/core/lib/random/weighted_picker_test.cc
@@ -0,0 +1,254 @@
+#include "tensorflow/core/lib/random/weighted_picker.h"
+
+#include <string.h>
+#include <vector>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace random {
+
+static void TestPicker(SimplePhilox* rnd, int size);
+static void CheckUniform(SimplePhilox* rnd, WeightedPicker* picker, int trials);
+static void CheckSkewed(SimplePhilox* rnd, WeightedPicker* picker, int trials);
+static void TestPickAt(int items, const int32* weights);
+
+TEST(WeightedPicker, Simple) {
+ PhiloxRandom philox(testing::RandomSeed(), 17);
+ SimplePhilox rnd(&philox);
+
+ {
+ VLOG(0) << "======= Zero-length picker";
+ WeightedPicker picker(0);
+ EXPECT_EQ(picker.Pick(&rnd), -1);
+ }
+
+ {
+ VLOG(0) << "======= Singleton picker";
+ WeightedPicker picker(1);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ }
+
+ {
+ VLOG(0) << "======= Grown picker";
+ WeightedPicker picker(0);
+ for (int i = 0; i < 10; i++) {
+ picker.Append(1);
+ }
+ CheckUniform(&rnd, &picker, 100000);
+ }
+
+ {
+ VLOG(0) << "======= Grown picker with zero weights";
+ WeightedPicker picker(1);
+ picker.Resize(10);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ }
+
+ {
+ VLOG(0) << "======= Shrink picker and check weights";
+ WeightedPicker picker(1);
+ picker.Resize(10);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ EXPECT_EQ(picker.Pick(&rnd), 0);
+ for (int i = 0; i < 10; i++) {
+ picker.set_weight(i, i);
+ }
+ EXPECT_EQ(picker.total_weight(), 45);
+ picker.Resize(5);
+ EXPECT_EQ(picker.total_weight(), 10);
+ picker.Resize(2);
+ EXPECT_EQ(picker.total_weight(), 1);
+ picker.Resize(1);
+ EXPECT_EQ(picker.total_weight(), 0);
+ }
+}
+
+TEST(WeightedPicker, BigWeights) {
+ PhiloxRandom philox(testing::RandomSeed() + 1, 17);
+ SimplePhilox rnd(&philox);
+ VLOG(0) << "======= Check uniform with big weights";
+ WeightedPicker picker(2);
+ picker.SetAllWeights(2147483646L / 3); // (2^31 - 2) / 3
+ CheckUniform(&rnd, &picker, 100000);
+}
+
+TEST(WeightedPicker, Deterministic) {
+ VLOG(0) << "======= Testing deterministic pick";
+ static const int32 weights[] = {1, 0, 200, 5, 42};
+ TestPickAt(TF_ARRAYSIZE(weights), weights);
+}
+
+TEST(WeightedPicker, Randomized) {
+ PhiloxRandom philox(testing::RandomSeed() + 10, 17);
+ SimplePhilox rnd(&philox);
+ TestPicker(&rnd, 1);
+ TestPicker(&rnd, 2);
+ TestPicker(&rnd, 3);
+ TestPicker(&rnd, 4);
+ TestPicker(&rnd, 7);
+ TestPicker(&rnd, 8);
+ TestPicker(&rnd, 9);
+ TestPicker(&rnd, 10);
+ TestPicker(&rnd, 100);
+}
+
+static void TestPicker(SimplePhilox* rnd, int size) {
+ VLOG(0) << "======= Testing size " << size;
+
+ // Check that empty picker returns -1
+ {
+ WeightedPicker picker(size);
+ picker.SetAllWeights(0);
+ for (int i = 0; i < 100; i++) EXPECT_EQ(picker.Pick(rnd), -1);
+ }
+
+ // Create zero weights array
+ std::vector<int32> weights(size);
+ for (int elem = 0; elem < size; elem++) {
+ weights[elem] = 0;
+ }
+
+ // Check that singleton picker always returns the same element
+ for (int elem = 0; elem < size; elem++) {
+ WeightedPicker picker(size);
+ picker.SetAllWeights(0);
+ picker.set_weight(elem, elem + 1);
+ for (int i = 0; i < 100; i++) EXPECT_EQ(picker.Pick(rnd), elem);
+ weights[elem] = 10;
+ picker.SetWeightsFromArray(size, &weights[0]);
+ for (int i = 0; i < 100; i++) EXPECT_EQ(picker.Pick(rnd), elem);
+ weights[elem] = 0;
+ }
+
+ // Check that uniform picker generates elements roughly uniformly
+ {
+ WeightedPicker picker(size);
+ CheckUniform(rnd, &picker, 100000);
+ }
+
+ // Check uniform picker that was grown piecemeal
+ if (size / 3 > 0) {
+ WeightedPicker picker(size / 3);
+ while (picker.num_elements() != size) {
+ picker.Append(1);
+ }
+ CheckUniform(rnd, &picker, 100000);
+ }
+
+ // Check that skewed distribution works
+ if (size <= 10) {
+ // When picker grows one element at a time
+ WeightedPicker picker(size);
+ int32 weight = 1;
+ for (int elem = 0; elem < size; elem++) {
+ picker.set_weight(elem, weight);
+ weights[elem] = weight;
+ weight *= 2;
+ }
+ CheckSkewed(rnd, &picker, 1000000);
+
+ // When picker is created from an array
+ WeightedPicker array_picker(0);
+ array_picker.SetWeightsFromArray(size, &weights[0]);
+ CheckSkewed(rnd, &array_picker, 1000000);
+ }
+}
+
+static void CheckUniform(SimplePhilox* rnd, WeightedPicker* picker,
+ int trials) {
+ const int size = picker->num_elements();
+ int* count = new int[size];
+ memset(count, 0, sizeof(count[0]) * size);
+ for (int i = 0; i < size * trials; i++) {
+ const int elem = picker->Pick(rnd);
+ EXPECT_GE(elem, 0);
+ EXPECT_LT(elem, size);
+ count[elem]++;
+ }
+ const int expected_min = int(0.9 * trials);
+ const int expected_max = int(1.1 * trials);
+ for (int i = 0; i < size; i++) {
+ EXPECT_GE(count[i], expected_min);
+ EXPECT_LE(count[i], expected_max);
+ }
+ delete[] count;
+}
+
+static void CheckSkewed(SimplePhilox* rnd, WeightedPicker* picker, int trials) {
+ const int size = picker->num_elements();
+ int* count = new int[size];
+ memset(count, 0, sizeof(count[0]) * size);
+ for (int i = 0; i < size * trials; i++) {
+ const int elem = picker->Pick(rnd);
+ EXPECT_GE(elem, 0);
+ EXPECT_LT(elem, size);
+ count[elem]++;
+ }
+
+ for (int i = 0; i < size - 1; i++) {
+ LOG(INFO) << i << ": " << count[i];
+ const float ratio = float(count[i + 1]) / float(count[i]);
+ EXPECT_GE(ratio, 1.6f);
+ EXPECT_LE(ratio, 2.4f);
+ }
+ delete[] count;
+}
+
+static void TestPickAt(int items, const int32* weights) {
+ WeightedPicker picker(items);
+ picker.SetWeightsFromArray(items, weights);
+ int weight_index = 0;
+ for (int i = 0; i < items; ++i) {
+ for (int j = 0; j < weights[i]; ++j) {
+ int pick = picker.PickAt(weight_index);
+ EXPECT_EQ(pick, i);
+ ++weight_index;
+ }
+ }
+ EXPECT_EQ(weight_index, picker.total_weight());
+}
+
+static void BM_Create(int iters, int arg) {
+ while (--iters > 0) {
+ WeightedPicker p(arg);
+ }
+}
+BENCHMARK(BM_Create)->Range(1, 1024);
+
+static void BM_CreateAndSetWeights(int iters, int arg) {
+ std::vector<int32> weights(arg);
+ for (int i = 0; i < arg; i++) {
+ weights[i] = i * 10;
+ }
+ while (--iters > 0) {
+ WeightedPicker p(arg);
+ p.SetWeightsFromArray(arg, &weights[0]);
+ }
+}
+BENCHMARK(BM_CreateAndSetWeights)->Range(1, 1024);
+
+static void BM_Pick(int iters, int arg) {
+ PhiloxRandom philox(301, 17);
+ SimplePhilox rnd(&philox);
+ WeightedPicker p(arg);
+ int result = 0;
+ while (--iters > 0) {
+ result += p.Pick(&rnd);
+ }
+ VLOG(4) << result; // Dummy use
+}
+BENCHMARK(BM_Pick)->Range(1, 1024);
+
+} // namespace random
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/numbers.cc b/tensorflow/core/lib/strings/numbers.cc
new file mode 100644
index 0000000000..d61129fb3f
--- /dev/null
+++ b/tensorflow/core/lib/strings/numbers.cc
@@ -0,0 +1,260 @@
+#include "tensorflow/core/lib/strings/numbers.h"
+
+#include <float.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <algorithm>
+#include <cmath>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace strings {
+
+char* FastInt32ToBufferLeft(int32 i, char* buffer) {
+ uint32 u = i;
+ if (i < 0) {
+ *buffer++ = '-';
+ // We need to do the negation in modular (i.e., "unsigned")
+ // arithmetic; MSVC++ apparently warns for plain "-u", so
+ // we write the equivalent expression "0 - u" instead.
+ u = 0 - u;
+ }
+ return FastUInt32ToBufferLeft(u, buffer);
+}
+
+char* FastUInt32ToBufferLeft(uint32 i, char* buffer) {
+ char* start = buffer;
+ do {
+ *buffer++ = ((i % 10) + '0');
+ i /= 10;
+ } while (i > 0);
+ *buffer = 0;
+ std::reverse(start, buffer);
+ return buffer;
+}
+
+char* FastInt64ToBufferLeft(int64 i, char* buffer) {
+ uint64 u = i;
+ if (i < 0) {
+ *buffer++ = '-';
+ u = 0 - u;
+ }
+ return FastUInt64ToBufferLeft(u, buffer);
+}
+
+char* FastUInt64ToBufferLeft(uint64 i, char* buffer) {
+ char* start = buffer;
+ do {
+ *buffer++ = ((i % 10) + '0');
+ i /= 10;
+ } while (i > 0);
+ *buffer = 0;
+ std::reverse(start, buffer);
+ return buffer;
+}
+
+static const double kDoublePrecisionCheckMax = DBL_MAX / 1.000000000000001;
+
+char* DoubleToBuffer(double value, char* buffer) {
+ // DBL_DIG is 15 for IEEE-754 doubles, which are used on almost all
+ // platforms these days. Just in case some system exists where DBL_DIG
+ // is significantly larger -- and risks overflowing our buffer -- we have
+ // this assert.
+ static_assert(DBL_DIG < 20, "DBL_DIG is too big");
+
+ bool full_precision_needed = true;
+ if (std::abs(value) <= kDoublePrecisionCheckMax) {
+ int snprintf_result =
+ snprintf(buffer, kFastToBufferSize, "%.*g", DBL_DIG, value);
+
+ // The snprintf should never overflow because the buffer is significantly
+ // larger than the precision we asked for.
+ DCHECK(snprintf_result > 0 && snprintf_result < kFastToBufferSize);
+
+ full_precision_needed = strtod(buffer, NULL) != value;
+ }
+
+ if (full_precision_needed) {
+ int snprintf_result =
+ snprintf(buffer, kFastToBufferSize, "%.*g", DBL_DIG + 2, value);
+
+ // Should never overflow; see above.
+ DCHECK(snprintf_result > 0 && snprintf_result < kFastToBufferSize);
+ }
+ return buffer;
+}
+
+bool safe_strto64(const char* str, int64* value) {
+ if (!str) return false;
+
+ // Skip leading space.
+ while (isspace(*str)) ++str;
+
+ int64 vlimit = kint64max;
+ int sign = 1;
+ if (*str == '-') {
+ sign = -1;
+ ++str;
+ // Different limit for positive and negative integers.
+ vlimit = kint64min;
+ }
+
+ if (!isdigit(*str)) return false;
+
+ int64 result = 0;
+ if (sign == 1) {
+ do {
+ int digit = *str - '0';
+ if ((vlimit - digit) / 10 < result) {
+ return false;
+ }
+ result = result * 10 + digit;
+ ++str;
+ } while (isdigit(*str));
+ } else {
+ do {
+ int digit = *str - '0';
+ if ((vlimit + digit) / 10 > result) {
+ return false;
+ }
+ result = result * 10 - digit;
+ ++str;
+ } while (isdigit(*str));
+ }
+
+ // Skip trailing space.
+ while (isspace(*str)) ++str;
+
+ if (*str) return false;
+
+ *value = result;
+ return true;
+}
+
+bool safe_strto32(const char* str, int32* value) {
+ if (!str) return false;
+
+ // Skip leading space.
+ while (isspace(*str)) ++str;
+
+ int64 vmax = kint32max;
+ int sign = 1;
+ if (*str == '-') {
+ sign = -1;
+ ++str;
+ // Different max for positive and negative integers.
+ ++vmax;
+ }
+
+ if (!isdigit(*str)) return false;
+
+ int64 result = 0;
+ do {
+ result = result * 10 + *str - '0';
+ if (result > vmax) {
+ return false;
+ }
+ ++str;
+ } while (isdigit(*str));
+
+ // Skip trailing space.
+ while (isspace(*str)) ++str;
+
+ if (*str) return false;
+
+ *value = result * sign;
+ return true;
+}
+
+bool safe_strtof(const char* str, float* value) {
+ char* endptr;
+ *value = strtof(str, &endptr);
+ while (isspace(*endptr)) ++endptr;
+ // Ignore range errors from strtod/strtof.
+ // The values it returns on underflow and
+ // overflow are the right fallback in a
+ // robust setting.
+ return *str != '\0' && *endptr == '\0';
+}
+
+char* FloatToBuffer(float value, char* buffer) {
+ // FLT_DIG is 6 for IEEE-754 floats, which are used on almost all
+ // platforms these days. Just in case some system exists where FLT_DIG
+ // is significantly larger -- and risks overflowing our buffer -- we have
+ // this assert.
+ static_assert(FLT_DIG < 10, "FLT_DIG is too big");
+
+ int snprintf_result =
+ snprintf(buffer, kFastToBufferSize, "%.*g", FLT_DIG, value);
+
+ // The snprintf should never overflow because the buffer is significantly
+ // larger than the precision we asked for.
+ DCHECK(snprintf_result > 0 && snprintf_result < kFastToBufferSize);
+
+ float parsed_value;
+ if (!safe_strtof(buffer, &parsed_value) || parsed_value != value) {
+ snprintf_result =
+ snprintf(buffer, kFastToBufferSize, "%.*g", FLT_DIG + 2, value);
+
+ // Should never overflow; see above.
+ DCHECK(snprintf_result > 0 && snprintf_result < kFastToBufferSize);
+ }
+ return buffer;
+}
+
+string FpToString(Fprint fp) {
+ char buf[17];
+ snprintf(buf, sizeof(buf), "%016llx", static_cast<uint64>(fp));
+ return string(buf);
+}
+
+bool StringToFp(const string& s, Fprint* fp) {
+ char junk;
+ uint64 result;
+ if (sscanf(s.c_str(), "%llx%c", &result, &junk) == 1) {
+ *fp = result;
+ return true;
+ } else {
+ return false;
+ }
+}
+
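+// Examples (illustrative, matching the formatting rules below):
+//   HumanReadableNumBytes(1023)  returns "1023B"
+//   HumanReadableNumBytes(1536)  returns "1.5KiB"
+//   HumanReadableNumBytes(-2048) returns "-2.0KiB"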
+string HumanReadableNumBytes(int64 num_bytes) {
+ if (num_bytes == kint64min) {
+ // Special case for the number whose negation is not representable.
+ return "-8E";
+ }
+
+ const char* neg_str = (num_bytes < 0) ? "-" : "";
+ if (num_bytes < 0) {
+ num_bytes = -num_bytes;
+ }
+
+ // Special case for bytes.
+ if (num_bytes < 1024) {
+ // No fractions for bytes.
+ char buf[8]; // Longest possible string is '-XXXXB'
+ snprintf(buf, sizeof(buf), "%s%lldB", neg_str,
+ static_cast<int64>(num_bytes));
+ return string(buf);
+ }
+
+ static const char units[] = "KMGTPE"; // int64 only goes up to E.
+ const char* unit = units;
+ while (num_bytes >= static_cast<int64>(1024) * 1024) {
+ num_bytes /= 1024;
+ ++unit;
+ CHECK(unit < units + TF_ARRAYSIZE(units));
+ }
+
+ // We use binary (IEC) prefixes: KiB, MiB, GiB, ...
+ char buf[16];
+ snprintf(buf, sizeof(buf), ((*unit == 'K') ? "%s%.1f%ciB" : "%s%.2f%ciB"),
+ neg_str, num_bytes / 1024.0, *unit);
+ return string(buf);
+}
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/numbers.h b/tensorflow/core/lib/strings/numbers.h
new file mode 100644
index 0000000000..a30a862279
--- /dev/null
+++ b/tensorflow/core/lib/strings/numbers.h
@@ -0,0 +1,92 @@
+#ifndef TENSORFLOW_LIB_STRINGS_NUMBERS_H_
+#define TENSORFLOW_LIB_STRINGS_NUMBERS_H_
+
+#include <string>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace strings {
+
+// ----------------------------------------------------------------------
+// FastIntToBufferLeft()
+// These are intended for speed.
+//
+// All functions take the output buffer as an arg; the buffer should be at
+// least kFastToBufferSize bytes. They return a pointer into that buffer
+// (see the per-function comments below for the exact return value).
+// ----------------------------------------------------------------------
+
+// Previously documented minimums -- the buffers provided must be at least this
+// long, though these numbers are subject to change:
+// Int32, UInt32: 12 bytes
+// Int64, UInt64, Int, Uint: 22 bytes
+// Time: 30 bytes
+// Use kFastToBufferSize rather than hardcoding constants.
+static const int kFastToBufferSize = 32;
+
+// ----------------------------------------------------------------------
+// FastInt32ToBufferLeft()
+// FastUInt32ToBufferLeft()
+// FastInt64ToBufferLeft()
+// FastUInt64ToBufferLeft()
+//
+// These functions convert their numeric argument to an ASCII
+// representation of the numeric value in base 10, with the
+// representation being left-aligned in the buffer. The caller is
+// responsible for ensuring that the buffer has enough space to hold
+// the output. The buffer should typically be at least kFastToBufferSize
+// bytes.
+//
+// Returns a pointer to the end of the string (i.e. the null character
+// terminating the string).
+// ----------------------------------------------------------------------
+
+char* FastInt32ToBufferLeft(int32 i, char* buffer); // at least 12 bytes
+char* FastUInt32ToBufferLeft(uint32 i, char* buffer); // at least 12 bytes
+char* FastInt64ToBufferLeft(int64 i, char* buffer); // at least 22 bytes
+char* FastUInt64ToBufferLeft(uint64 i, char* buffer); // at least 22 bytes
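+//
+// Illustrative usage:
+//   char buf[kFastToBufferSize];
+//   char* end = FastInt32ToBufferLeft(-42, buf);  // buf now holds "-42"
+//   size_t length = end - buf;                    // length == 3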
+
+// Required buffer size for DoubleToBuffer is kFastToBufferSize.
+// Required buffer size for FloatToBuffer is kFastToBufferSize.
+char* DoubleToBuffer(double i, char* buffer);
+char* FloatToBuffer(float i, char* buffer);
+
+// Convert a 64-bit fingerprint value to an ASCII representation.
+string FpToString(Fprint fp);
+
+// Attempt to parse a fingerprint in the form encoded by FpToString. If
+// successful, stores the fingerprint in *fp and returns true. Otherwise,
+// returns false.
+bool StringToFp(const string& s, Fprint* fp);
+
+// Convert a string to a 32-bit integer value.
+// Leading and trailing spaces are allowed.
+// Returns false on overflow or invalid input.
+bool safe_strto32(const char* str, int32* value);
+
+// Convert a string to a 64-bit integer value.
+// Leading and trailing spaces are allowed.
+// Returns false on overflow or invalid input.
+bool safe_strto64(const char* str, int64* value);
+
+// Convert a string to a floating point value.
+// Leading and trailing spaces are allowed.
+// Values may be rounded on over- and underflow.
+bool safe_strtof(const char* str, float* value);
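+//
+// For example:
+//   int32 v;
+//   safe_strto32(" -123 ", &v);   // returns true, v == -123
+//   safe_strto32("123abc", &v);   // returns false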
+
+// Converts from an int64 representing a number of bytes to a
+// human readable string representing the same number.
+// e.g. 12345678 -> "11.77MiB".
+string HumanReadableNumBytes(int64 num_bytes);
+
+} // namespace strings
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_STRINGS_NUMBERS_H_
diff --git a/tensorflow/core/lib/strings/numbers_test.cc b/tensorflow/core/lib/strings/numbers_test.cc
new file mode 100644
index 0000000000..b178e6af53
--- /dev/null
+++ b/tensorflow/core/lib/strings/numbers_test.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/core/lib/strings/numbers.h"
+
+#include <string>
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace strings {
+
+// NOTE: most of the routines in numbers.h are tested indirectly through
+// strcat_test.cc in this directory.
+
+// Test that FpToString and StringToFp round-trip fingerprint values.
+TEST(FpToString, Ints) {
+ for (int s = 0; s < 64; s++) {
+ for (int delta = -1; delta <= 1; delta++) {
+ uint64 fp = (1ull << s) + delta;
+ string str = FpToString(fp);
+ uint64 fp2;
+ EXPECT_TRUE(StringToFp(str, &fp2));
+ EXPECT_EQ(fp, fp2);
+ }
+ }
+ Fprint dummy;
+ EXPECT_FALSE(StringToFp("", &dummy));
+ EXPECT_FALSE(StringToFp("xyz", &dummy));
+ EXPECT_FALSE(StringToFp("0000000000000000xyz", &dummy));
+}
+
+TEST(HumanReadableNumBytes, Bytes) {
+ EXPECT_EQ("0B", HumanReadableNumBytes(0));
+ EXPECT_EQ("4B", HumanReadableNumBytes(4));
+ EXPECT_EQ("1023B", HumanReadableNumBytes(1023));
+
+ EXPECT_EQ("1.0KiB", HumanReadableNumBytes(1024));
+ EXPECT_EQ("1.0KiB", HumanReadableNumBytes(1025));
+ EXPECT_EQ("1.5KiB", HumanReadableNumBytes(1500));
+ EXPECT_EQ("1.9KiB", HumanReadableNumBytes(1927));
+
+ EXPECT_EQ("2.0KiB", HumanReadableNumBytes(2048));
+ EXPECT_EQ("1.00MiB", HumanReadableNumBytes(1 << 20));
+ EXPECT_EQ("11.77MiB", HumanReadableNumBytes(12345678));
+ EXPECT_EQ("1.00GiB", HumanReadableNumBytes(1 << 30));
+
+ EXPECT_EQ("1.00TiB", HumanReadableNumBytes(1LL << 40));
+ EXPECT_EQ("1.00PiB", HumanReadableNumBytes(1LL << 50));
+ EXPECT_EQ("1.00EiB", HumanReadableNumBytes(1LL << 60));
+
+ // Try a few negative numbers
+ EXPECT_EQ("-1B", HumanReadableNumBytes(-1));
+ EXPECT_EQ("-4B", HumanReadableNumBytes(-4));
+ EXPECT_EQ("-1000B", HumanReadableNumBytes(-1000));
+ EXPECT_EQ("-11.77MiB", HumanReadableNumBytes(-12345678));
+ EXPECT_EQ("-8E", HumanReadableNumBytes(kint64min));
+}
+
+TEST(safe_strto32, Int32s) {
+ int32 result;
+
+ EXPECT_EQ(true, safe_strto32("1", &result));
+ EXPECT_EQ(1, result);
+ EXPECT_EQ(true, safe_strto32("123", &result));
+ EXPECT_EQ(123, result);
+ EXPECT_EQ(true, safe_strto32(" -123 ", &result));
+ EXPECT_EQ(-123, result);
+ EXPECT_EQ(true, safe_strto32("2147483647", &result));
+ EXPECT_EQ(2147483647, result);
+ EXPECT_EQ(true, safe_strto32("-2147483648", &result));
+ EXPECT_EQ(-2147483648, result);
+
+ // Invalid argument
+ EXPECT_EQ(false, safe_strto32(" 132as ", &result));
+ EXPECT_EQ(false, safe_strto32(" 132.2 ", &result));
+ EXPECT_EQ(false, safe_strto32(" -", &result));
+ EXPECT_EQ(false, safe_strto32("", &result));
+ EXPECT_EQ(false, safe_strto32(" ", &result));
+ EXPECT_EQ(false, safe_strto32("123 a", &result));
+
+ // Overflow
+ EXPECT_EQ(false, safe_strto32("2147483648", &result));
+ EXPECT_EQ(false, safe_strto32("-2147483649", &result));
+}
+
+TEST(safe_strto64, Int64s) {
+ int64 result;
+
+ EXPECT_EQ(true, safe_strto64("1", &result));
+ EXPECT_EQ(1, result);
+ EXPECT_EQ(true, safe_strto64("123", &result));
+ EXPECT_EQ(123, result);
+ EXPECT_EQ(true, safe_strto64(" -123 ", &result));
+ EXPECT_EQ(-123, result);
+ EXPECT_EQ(true, safe_strto64("9223372036854775807", &result));
+ EXPECT_EQ(9223372036854775807, result);
+ EXPECT_EQ(true, safe_strto64("-9223372036854775808", &result));
+ // kint64min == -9223372036854775808
+ // Using the literal -9223372036854775808 directly results in an
+ // out-of-range error.
+ EXPECT_EQ(kint64min, result);
+
+ // Invalid argument
+ EXPECT_EQ(false, safe_strto64(" 132as ", &result));
+ EXPECT_EQ(false, safe_strto64(" 132.2 ", &result));
+ EXPECT_EQ(false, safe_strto64(" -", &result));
+ EXPECT_EQ(false, safe_strto64("", &result));
+ EXPECT_EQ(false, safe_strto64(" ", &result));
+ EXPECT_EQ(false, safe_strto64("123 a", &result));
+
+ // Overflow
+ EXPECT_EQ(false, safe_strto64("9223372036854775808", &result));
+ EXPECT_EQ(false, safe_strto64("-9223372036854775809", &result));
+}
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/ordered_code.cc b/tensorflow/core/lib/strings/ordered_code.cc
new file mode 100644
index 0000000000..ec67595ebb
--- /dev/null
+++ b/tensorflow/core/lib/strings/ordered_code.cc
@@ -0,0 +1,515 @@
+#include "tensorflow/core/lib/strings/ordered_code.h"
+
+#include <assert.h>
+#include <stddef.h>
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace strings {
+
+// We encode a string in different ways depending on whether the item
+// should be in lexicographically increasing or decreasing order.
+//
+//
+// Lexicographically increasing order
+//
+// We want a string-to-string mapping F(x) such that for any two strings
+//
+// x < y => F(x) < F(y)
+//
+// In addition to the normal characters '\x00' through '\xff', we want to
+// encode a few extra symbols in strings:
+//
+// <sep> Separator between items
+// <infinity> Infinite string
+//
+// Therefore we need an alphabet with at least 258 symbols. Each
+// character '\1' through '\xfe' is mapped to itself. The other four are
+// encoded into two-letter sequences starting with '\0' and '\xff':
+//
+// <sep> encoded as => \0\1
+// \0 encoded as => \0\xff
+// \xff encoded as => \xff\x00
+// <infinity> encoded as => \xff\xff
+//
+// The remaining two-letter sequences starting with '\0' and '\xff' are
+// currently unused.
+//
+// F(<infinity>) is defined above. For any finite string x, F(x) is
+// the concatenation of the encodings of x's characters followed by the
+// encoding for <sep>. The
+// ordering of two finite strings is the same as the ordering of the
+// respective characters at the first position where they differ, which in
+// turn is the same as the ordering of the encodings of those two
+// characters. Moreover, for every finite string x, F(x) < F(<infinity>).
+//
+//
+// Lexicographically decreasing order
+//
+// We want a string-to-string mapping G(x) such that for any two strings,
+// whether finite or not,
+//
+// x < y => G(x) > G(y)
+//
+// To achieve this, define G(x) to be the inversion of F(x): I(F(x)). In
+// other words, invert every bit in F(x) to get G(x). For example,
+//
+// x = \x00\x13\xff
+// F(x) = \x00\xff\x13\xff\x00\x00\x01 escape \0, \xff, append F(<sep>)
+// G(x) = \xff\x00\xec\x00\xff\xff\xfe invert every bit in F(x)
+//
+// x = <infinity>
+// F(x) = \xff\xff
+// G(x) = \x00\x00
+//
+// Another example is
+//
+// x F(x) G(x) = I(F(x))
+// - ---- --------------
+// <infinity> \xff\xff \x00\x00
+// "foo" foo\0\1 \x99\x90\x90\xff\xfe
+// "aaa" aaa\0\1 \x9e\x9e\x9e\xff\xfe
+// "aa" aa\0\1 \x9e\x9e\xff\xfe
+// "" \0\1 \xff\xfe
+//
+// More generally and rigorously, if for any two strings x and y
+//
+// F(x) < F(y) => I(F(x)) > I(F(y)) (1)
+//
+// it would follow that x < y => G(x) > G(y) because
+//
+// x < y => F(x) < F(y) => G(x) = I(F(x)) > I(F(y)) = G(y)
+//
+// We now show why (1) is true, in two parts. Notice that for any two
+// strings x < y, F(x) is *not* a proper prefix of F(y). Suppose x is a
+// proper prefix of y (say, x="abc" < y="abcd"). F(x) and F(y) diverge at
+// the F(<sep>) in F(x) (v. F('d') in the example). Suppose x is not a
+// proper prefix of y (say, x="abce" < y="abd"), F(x) and F(y) diverge at
+// their respective encodings of the characters where x and y diverge
+// (F('c') v. F('d')). Finally, if y=<infinity>, we can see that
+// F(y)=\xff\xff is not the prefix of F(x) for any finite string x, simply
+// by considering all the possible first characters of F(x).
+//
+// Given that F(x) is not a proper prefix of F(y), the order of F(x) and F(y)
+// is determined by the byte where F(x) and F(y) diverge. For example, the
+// order of F(x)="eefh" and F(y)="eeg" is determined by their third
+// characters. I(p) inverts each byte in p, which effectively subtracts
+// each byte from 0xff. So, in this example, I('f') > I('g'), and thus
+// I(F(x)) > I(F(y)).
+//
+//
+// Implementation
+//
+// To implement G(x) efficiently, we use C++ templates to instantiate two
+// versions of the code to produce F(x), one for normal encoding (giving us
+// F(x)) and one for inverted encoding (giving us G(x) = I(F(x))).
+
+static const char kEscape1 = '\000';
+static const char kNullCharacter = '\xff'; // Combined with kEscape1
+static const char kSeparator = '\001'; // Combined with kEscape1
+
+static const char kEscape2 = '\xff';
+static const char kInfinity = '\xff'; // Combined with kEscape2
+static const char kFFCharacter = '\000'; // Combined with kEscape2
+
+static const char kEscape1_Separator[2] = {kEscape1, kSeparator};
+
+// Append to "*dest" the "len" bytes starting from "*src".
+inline static void AppendBytes(string* dest, const char* src, int len) {
+ dest->append(src, len);
+}
+
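+// Returns true iff "c" is one of the two escape bytes, 0 (kEscape1) or
+// '\xff' (kEscape2): only for those two values does (c + 1), viewed as an
+// unsigned char, fall below 2.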
+inline bool IsSpecialByte(char c) { return ((unsigned char)(c + 1)) < 2; }
+
+// Return a pointer to the first byte in the range "[start..limit)"
+// whose value is 0 or 255 (kEscape1 or kEscape2). If no such byte
+// exists in the range, returns "limit".
+inline const char* SkipToNextSpecialByte(const char* start, const char* limit) {
+ // If these constants were ever changed, this routine needs to change
+ DCHECK_EQ(kEscape1, 0);
+ DCHECK_EQ(kEscape2 & 0xffu, 255u);
+ const char* p = start;
+ while (p < limit && !IsSpecialByte(*p)) {
+ p++;
+ }
+ return p;
+}
+
+// Expose SkipToNextSpecialByte for testing purposes
+const char* OrderedCode::TEST_SkipToNextSpecialByte(const char* start,
+ const char* limit) {
+ return SkipToNextSpecialByte(start, limit);
+}
+
+// Helper routine to encode "s" and append to "*dest", escaping special
+// characters.
+inline static void EncodeStringFragment(string* dest, StringPiece s) {
+ const char* p = s.data();
+ const char* limit = p + s.size();
+ const char* copy_start = p;
+ while (true) {
+ p = SkipToNextSpecialByte(p, limit);
+ if (p >= limit) break; // No more special characters that need escaping
+ char c = *(p++);
+ DCHECK(IsSpecialByte(c));
+ if (c == kEscape1) {
+ AppendBytes(dest, copy_start, p - copy_start - 1);
+ dest->push_back(kEscape1);
+ dest->push_back(kNullCharacter);
+ copy_start = p;
+ } else {
+ assert(c == kEscape2);
+ AppendBytes(dest, copy_start, p - copy_start - 1);
+ dest->push_back(kEscape2);
+ dest->push_back(kFFCharacter);
+ copy_start = p;
+ }
+ }
+ if (p > copy_start) {
+ AppendBytes(dest, copy_start, p - copy_start);
+ }
+}
+
+void OrderedCode::WriteString(string* dest, StringPiece s) {
+ EncodeStringFragment(dest, s);
+ AppendBytes(dest, kEscape1_Separator, 2);
+}
+
+void OrderedCode::WriteNumIncreasing(string* dest, uint64 val) {
+ // Values are encoded with a single byte length prefix, followed
+ // by the actual value in big-endian format with leading 0 bytes
+ // dropped.
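+ //
+ // For example, val == 0x0424 is encoded as "\x02\x04\x24": a length
+ // byte of 2 followed by the two payload bytes.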
+ unsigned char buf[9]; // 8 bytes for value plus one byte for length
+ int len = 0;
+ while (val > 0) {
+ len++;
+ buf[9 - len] = (val & 0xff);
+ val >>= 8;
+ }
+ buf[9 - len - 1] = (unsigned char)len;
+ len++;
+ AppendBytes(dest, reinterpret_cast<const char*>(buf + 9 - len), len);
+}
+
+// Parse the encoding of a previously encoded string.
+// If parse succeeds, return true, consume encoding from
+// "*src", and if result != NULL append the decoded string to "*result".
+// Otherwise, return false and leave both undefined.
+inline static bool ReadStringInternal(StringPiece* src, string* result) {
+ const char* start = src->data();
+ const char* string_limit = src->data() + src->size();
+
+ // We only scan up to "limit-2" since a valid string must end with
+ // a two character terminator: 'kEscape1 kSeparator'
+ const char* limit = string_limit - 1;
+ const char* copy_start = start;
+ while (true) {
+ start = SkipToNextSpecialByte(start, limit);
+ if (start >= limit) break; // No terminator sequence found
+ const char c = *(start++);
+ // If inversion is required, instead of inverting 'c', we invert the
+ // character constants to which 'c' is compared. We get the same
+ // behavior but save the runtime cost of inverting 'c'.
+ DCHECK(IsSpecialByte(c));
+ if (c == kEscape1) {
+ if (result) {
+ AppendBytes(result, copy_start, start - copy_start - 1);
+ }
+ // kEscape1 kSeparator ends component
+ // kEscape1 kNullCharacter represents '\0'
+ const char next = *(start++);
+ if (next == kSeparator) {
+ src->remove_prefix(start - src->data());
+ return true;
+ } else if (next == kNullCharacter) {
+ if (result) {
+ *result += '\0';
+ }
+ } else {
+ return false;
+ }
+ copy_start = start;
+ } else {
+ assert(c == kEscape2);
+ if (result) {
+ AppendBytes(result, copy_start, start - copy_start - 1);
+ }
+ // kEscape2 kFFCharacter represents '\xff'
+ // kEscape2 kInfinity is an error
+ const char next = *(start++);
+ if (next == kFFCharacter) {
+ if (result) {
+ *result += '\xff';
+ }
+ } else {
+ return false;
+ }
+ copy_start = start;
+ }
+ }
+ return false;
+}
+
+bool OrderedCode::ReadString(StringPiece* src, string* result) {
+ return ReadStringInternal(src, result);
+}
+
+bool OrderedCode::ReadNumIncreasing(StringPiece* src, uint64* result) {
+ if (src->empty()) {
+ return false; // Not enough bytes
+ }
+
+ // Decode length byte
+ const size_t len = static_cast<unsigned char>((*src)[0]);
+
+ // If len > 0 and src is longer than 1, the first byte of "payload"
+ // must be non-zero (otherwise the encoding is not minimal).
+ // In opt mode, we don't enforce that encodings must be minimal.
+ DCHECK(0 == len || src->size() == 1 || (*src)[1] != '\0')
+ << "invalid encoding";
+
+ if (len + 1 > src->size() || len > 8) {
+ return false; // Not enough bytes or too many bytes
+ }
+
+ if (result) {
+ uint64 tmp = 0;
+ for (size_t i = 0; i < len; i++) {
+ tmp <<= 8;
+ tmp |= static_cast<unsigned char>((*src)[1 + i]);
+ }
+ *result = tmp;
+ }
+ src->remove_prefix(len + 1);
+ return true;
+}
+
+void OrderedCode::TEST_Corrupt(string* str, int k) {
+ int seen_seps = 0;
+ for (size_t i = 0; i + 1 < str->size(); i++) {
+ if ((*str)[i] == kEscape1 && (*str)[i + 1] == kSeparator) {
+ seen_seps++;
+ if (seen_seps == k) {
+ (*str)[i + 1] = kSeparator + 1;
+ return;
+ }
+ }
+ }
+}
+
+// Signed number encoding/decoding /////////////////////////////////////
+//
+// The format is as follows:
+//
+// The first bit (the most significant bit of the first byte)
+// represents the sign, 0 if the number is negative and
+// 1 if the number is >= 0.
+//
+// Any unbroken sequence of successive bits with the same value as the sign
+// bit, up to 9 (the 8th and 9th are the most significant bits of the next
+// byte), are size bits that count the number of bytes after the first byte.
+// That is, the total length is between 1 and 10 bytes.
+//
+// The value occupies the bits after the sign bit and the "size bits"
+// till the end of the string, in network byte order. If the number
+// is negative, the bits are in two's complement.
+//
+//
+// Example 1: number 0x424242 -> 4 byte big-endian hex string 0xf0424242:
+//
+// +---------------+---------------+---------------+---------------+
+// 1 1 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0
+// +---------------+---------------+---------------+---------------+
+// ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
+// | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+// | | | | payload: the remaining bits after the sign and size bits
+// | | | | and the delimiter bit, the value is 0x424242
+// | | | |
+// | size bits: 3 successive bits with the same value as the sign bit
+// | (followed by a delimiter bit with the opposite value)
+// | mean that there are 3 bytes after the first byte, 4 total
+// |
+// sign bit: 1 means that the number is non-negative
+//
+// Example 2: negative number -0x800 -> 2 byte big-endian hex string 0x3800:
+//
+// +---------------+---------------+
+// 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
+// +---------------+---------------+
+// ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
+// | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+// | | payload: the remaining bits after the sign and size bits and the
+// | | delimiter bit, in two's complement because of the negative sign,
+// | | value is ~0x7ff, represents the value -0x800
+// | |
+// | size bits: 1 bit with the same value as the sign bit
+// | (followed by a delimiter bit with the opposite value)
+// | means that there is 1 byte after the first byte, 2 total
+// |
+// sign bit: 0 means that the number is negative
+//
+//
+// Compared with the simpler unsigned format used for uint64 numbers,
+// this format is more compact for small numbers, namely one byte encodes
+// numbers in the range [-64,64), two bytes cover the range [-2^13,2^13), etc.
+// In general, n bytes encode numbers in the range [-2^(n*7-1),2^(n*7-1)).
+// (The cross-over point for compactness of representation is 8 bytes,
+// where this format only covers the range [-2^55,2^55),
+// whereas an encoding with sign bit and length in the first byte and
+// payload in all following bytes would cover [-2^56,2^56).)
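+//
+// For example, 63 is encoded in a single byte ("\xbf"), while 64 needs
+// two bytes ("\xc0\x40").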
+
+static const int kMaxSigned64Length = 10;
+
+// This array maps encoding length to header bits in the first two bytes.
+static const char kLengthToHeaderBits[1 + kMaxSigned64Length][2] = {
+ {0, 0}, {'\x80', 0}, {'\xc0', 0}, {'\xe0', 0},
+ {'\xf0', 0}, {'\xf8', 0}, {'\xfc', 0}, {'\xfe', 0},
+ {'\xff', 0}, {'\xff', '\x80'}, {'\xff', '\xc0'}};
+
+// This array maps encoding lengths to the header bits that overlap with
+// the payload and need fixing when reading.
+static const uint64 kLengthToMask[1 + kMaxSigned64Length] = {
+ 0ULL,
+ 0x80ULL,
+ 0xc000ULL,
+ 0xe00000ULL,
+ 0xf0000000ULL,
+ 0xf800000000ULL,
+ 0xfc0000000000ULL,
+ 0xfe000000000000ULL,
+ 0xff00000000000000ULL,
+ 0x8000000000000000ULL,
+ 0ULL};
+
+// This array maps the number of bits in a number to the encoding
+// length produced by WriteSignedNumIncreasing.
+// For positive numbers, the number of bits is 1 plus the most significant
+// bit position (the highest bit position in a positive int64 is 63).
+// For a negative number n, we count the bits in ~n.
+// That is, length = kBitsToLength[Bits::Log2Floor64(n < 0 ? ~n : n) + 1].
+static const int8 kBitsToLength[1 + 63] = {
+ 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4,
+ 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7,
+ 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 10};
+
+#if defined(__GNUC__)
+// Returns floor(lg(n)). Returns -1 if n == 0.
+static int Log2Floor64(uint64 n) {
+ return n == 0 ? -1 : 63 ^ __builtin_clzll(n);
+}
+#else
+// Portable slow version
+static int Log2Floor32_Portable(uint32 n) {
+ if (n == 0) return -1;
+ int log = 0;
+ uint32 value = n;
+ for (int i = 4; i >= 0; --i) {
+ int shift = (1 << i);
+ uint32 x = value >> shift;
+ if (x != 0) {
+ value = x;
+ log += shift;
+ }
+ }
+ assert(value == 1);
+ return log;
+}
+// Returns floor(lg(n)). Returns -1 if n == 0.
+static int Log2Floor64(uint64 n) {
+ const uint32 topbits = static_cast<uint32>(n >> 32);
+ if (topbits == 0) {
+ // Top bits are zero, so scan in bottom bits
+ return Log2Floor32_Portable(static_cast<uint32>(n));
+ } else {
+ return 32 + Log2Floor32_Portable(topbits);
+ }
+}
+#endif
+
+// Calculates the encoding length in bytes of the signed number n.
+static inline int SignedEncodingLength(int64 n) {
+ return kBitsToLength[Log2Floor64(n < 0 ? ~n : n) + 1];
+}
+
+static void StoreBigEndian64(char* dst, uint64 v) {
+ for (int i = 0; i < 8; i++) {
+ dst[i] = (v >> (56 - 8 * i)) & 0xff;
+ }
+}
+
+static uint64 LoadBigEndian64(const char* src) {
+ uint64 result = 0;
+ for (int i = 0; i < 8; i++) {
+ unsigned char c = static_cast<unsigned char>(src[i]);
+ result |= static_cast<uint64>(c) << (56 - 8 * i);
+ }
+ return result;
+}
+
+void OrderedCode::WriteSignedNumIncreasing(string* dest, int64 val) {
+ const uint64 x = val < 0 ? ~val : val;
+ if (x < 64) { // fast path for encoding length == 1
+ *dest += kLengthToHeaderBits[1][0] ^ val;
+ return;
+ }
+ // buf = val in network byte order, sign extended to 10 bytes
+ const char sign_byte = val < 0 ? '\xff' : '\0';
+ char buf[10] = {
+ sign_byte, sign_byte,
+ };
+ StoreBigEndian64(buf + 2, val);
+ static_assert(sizeof(buf) == kMaxSigned64Length, "max length size mismatch");
+ const int len = SignedEncodingLength(x);
+ DCHECK_GE(len, 2);
+ char* const begin = buf + sizeof(buf) - len;
+ begin[0] ^= kLengthToHeaderBits[len][0];
+ begin[1] ^= kLengthToHeaderBits[len][1]; // ok because len >= 2
+ dest->append(begin, len);
+}
+
+bool OrderedCode::ReadSignedNumIncreasing(StringPiece* src, int64* result) {
+ if (src->empty()) return false;
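+ // The most-significant bit of the first byte is the sign bit (0 for
+ // negative numbers). For negatives, xor_mask flips every byte so the
+ // length can be decoded the same way as for non-negative values.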
+ const uint64 xor_mask = (!((*src)[0] & 0x80)) ? ~0ULL : 0ULL;
+ const unsigned char first_byte = (*src)[0] ^ (xor_mask & 0xff);
+
+ // now calculate and test length, and set x to raw (unmasked) result
+ int len;
+ uint64 x;
+ if (first_byte != 0xff) {
+ len = 7 - Log2Floor64(first_byte ^ 0xff);
+ if (src->size() < static_cast<size_t>(len)) return false;
+ x = xor_mask; // sign extend using xor_mask
+ for (int i = 0; i < len; ++i)
+ x = (x << 8) | static_cast<unsigned char>((*src)[i]);
+ } else {
+ len = 8;
+ if (src->size() < static_cast<size_t>(len)) return false;
+ const unsigned char second_byte = (*src)[1] ^ (xor_mask & 0xff);
+ if (second_byte >= 0x80) {
+ if (second_byte < 0xc0) {
+ len = 9;
+ } else {
+ const unsigned char third_byte = (*src)[2] ^ (xor_mask & 0xff);
+ if (second_byte == 0xc0 && third_byte < 0x80) {
+ len = 10;
+ } else {
+ return false; // either len > 10 or len == 10 and #bits > 63
+ }
+ }
+ if (src->size() < static_cast<size_t>(len)) return false;
+ }
+ x = LoadBigEndian64(src->data() + len - 8);
+ }
+
+ x ^= kLengthToMask[len]; // remove spurious header bits
+
+ DCHECK_EQ(len, SignedEncodingLength(x)) << "invalid encoding";
+
+ if (result) *result = x;
+ src->remove_prefix(len);
+ return true;
+}
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/ordered_code.h b/tensorflow/core/lib/strings/ordered_code.h
new file mode 100644
index 0000000000..39f1df9a94
--- /dev/null
+++ b/tensorflow/core/lib/strings/ordered_code.h
@@ -0,0 +1,77 @@
+// This module provides routines for encoding a sequence of typed
+// entities into a string. The resulting strings can be
+// lexicographically compared to yield the same comparison value that
+// would have been generated if the encoded items had been compared
+// one by one according to their type.
+//
+// More precisely, suppose:
+// 1. string A is generated by encoding the sequence of items [A_1..A_n]
+// 2. string B is generated by encoding the sequence of items [B_1..B_n]
+// 3. The types match; i.e., for all i: A_i was encoded using
+// the same routine as B_i
+// Then:
+// Comparing A vs. B lexicographically is the same as comparing
+// the vectors [A_1..A_n] and [B_1..B_n] lexicographically.
+//
+// Furthermore, if n < m, the encoding of [A_1..A_n] is a strict prefix of
+// [A_1..A_m] (unless m = n+1 and A_m is the empty string encoded with
+// WriteTrailingString, in which case the encodings are equal).
+//
+// This module is often useful when generating multi-part sstable
+// keys that have to be ordered in a particular fashion.
+
+#ifndef TENSORFLOW_LIB_STRINGS_ORDERED_CODE_H__
+#define TENSORFLOW_LIB_STRINGS_ORDERED_CODE_H__
+
+#include <string>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+class StringPiece;
+
+namespace strings {
+
+class OrderedCode {
+ public:
+ // -------------------------------------------------------------------
+ // Encoding routines: each of the following routines appends
+ // one item to "*dest" in an encoding where larger values are
+ // ordered lexicographically after smaller values.
+ static void WriteString(string* dest, StringPiece str);
+ static void WriteNumIncreasing(string* dest, uint64 num);
+ static void WriteSignedNumIncreasing(string* dest, int64 num);
+
+ // -------------------------------------------------------------------
+ // Decoding routines: these extract an item earlier encoded using
+ // the corresponding WriteXXX() routines above. The item is read
+ // from "*src"; "*src" is modified to point past the decoded item;
+ // and if "result" is non-NULL, "*result" is modified to contain the
+ // result. For a string result, the decoded string is appended to
+ // "*result". Returns true if the next item was read successfully, false
+ // otherwise.
+ static bool ReadString(StringPiece* src, string* result);
+ static bool ReadNumIncreasing(StringPiece* src, uint64* result);
+ static bool ReadSignedNumIncreasing(StringPiece* src, int64* result);
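+ //
+ // Sketch of typical usage, building a composite key and reading it back:
+ //   string key;
+ //   OrderedCode::WriteString(&key, "user");
+ //   OrderedCode::WriteNumIncreasing(&key, 42);
+ //   StringPiece in(key);
+ //   string name;
+ //   uint64 id;
+ //   OrderedCode::ReadString(&in, &name);       // name == "user"
+ //   OrderedCode::ReadNumIncreasing(&in, &id);  // id == 42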
+
+ // Helper for testing: corrupt "*str" by changing the kth item separator
+ // in the string.
+ static void TEST_Corrupt(string* str, int k);
+
+ // Helper for testing.
+ // SkipToNextSpecialByte is an internal routine defined in the .cc file
+ // with the following semantics. Return a pointer to the first byte
+ // in the range "[start..limit)" whose value is 0 or 255. If no such
+ // byte exists in the range, returns "limit".
+ static const char* TEST_SkipToNextSpecialByte(const char* start,
+ const char* limit);
+
+ private:
+ // This has only static methods, so disallow construction entirely
+ OrderedCode();
+ TF_DISALLOW_COPY_AND_ASSIGN(OrderedCode);
+};
+
+} // namespace strings
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_STRINGS_ORDERED_CODE_H__
diff --git a/tensorflow/core/lib/strings/ordered_code_test.cc b/tensorflow/core/lib/strings/ordered_code_test.cc
new file mode 100644
index 0000000000..d517d14f4a
--- /dev/null
+++ b/tensorflow/core/lib/strings/ordered_code_test.cc
@@ -0,0 +1,1183 @@
+#include "tensorflow/core/lib/strings/ordered_code.h"
+
+#include <float.h>
+#include <stddef.h>
+#include <limits>
+#include <vector>
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+namespace tensorflow {
+namespace strings {
+
+static string RandomString(random::SimplePhilox* rnd, int len) {
+ string x;
+ for (int i = 0; i < len; i++) {
+ x += rnd->Uniform(256);
+ }
+ return x;
+}
+
+// ---------------------------------------------------------------------
+// Utility template functions (they help templatize the tests below)
+
+// Read/WriteIncreasing are defined for string, uint64, int64 below.
+template <typename T>
+static void OCWriteIncreasing(string* dest, const T& val);
+template <typename T>
+static bool OCReadIncreasing(StringPiece* src, T* result);
+
+// Read/WriteIncreasing<string>
+template <>
+void OCWriteIncreasing<string>(string* dest, const string& val) {
+ OrderedCode::WriteString(dest, val);
+}
+template <>
+bool OCReadIncreasing<string>(StringPiece* src, string* result) {
+ return OrderedCode::ReadString(src, result);
+}
+
+// Read/WriteIncreasing<uint64>
+template <>
+void OCWriteIncreasing<uint64>(string* dest, const uint64& val) {
+ OrderedCode::WriteNumIncreasing(dest, val);
+}
+template <>
+bool OCReadIncreasing<uint64>(StringPiece* src, uint64* result) {
+ return OrderedCode::ReadNumIncreasing(src, result);
+}
+
+// Read/WriteIncreasing<int64>
+template <>
+void OCWriteIncreasing<int64>(string* dest, const int64& val) {
+ OrderedCode::WriteSignedNumIncreasing(dest, val);
+}
+template <>
+bool OCReadIncreasing<int64>(StringPiece* src, int64* result) {
+ return OrderedCode::ReadSignedNumIncreasing(src, result);
+}
+
+template <typename T>
+string OCWrite(T val) {
+ string result;
+ OCWriteIncreasing<T>(&result, val);
+ return result;
+}
+
+template <typename T>
+void OCWriteToString(string* result, T val) {
+ OCWriteIncreasing<T>(result, val);
+}
+
+template <typename T>
+bool OCRead(StringPiece* s, T* val) {
+ return OCReadIncreasing<T>(s, val);
+}
+
+// ---------------------------------------------------------------------
+// Numbers
+
+template <typename T>
+static T TestRead(const string& a) {
+ // gracefully reject any proper prefix of an encoding
+ for (int i = 0; i < a.size() - 1; ++i) {
+ StringPiece s(a.data(), i);
+ CHECK(!OCRead<T>(&s, NULL));
+ CHECK_EQ(s, a.substr(0, i));
+ }
+
+ StringPiece s(a);
+ T v;
+ CHECK(OCRead<T>(&s, &v));
+ CHECK(s.empty());
+ return v;
+}
+
+template <typename T>
+static void TestWriteRead(T expected) {
+ EXPECT_EQ(expected, TestRead<T>(OCWrite<T>(expected)));
+}
+
+// Verifies that the second Write* call appends a non-empty string to its
+// output.
+template <typename T, typename U>
+static void TestWriteAppends(T first, U second) {
+ string encoded;
+ OCWriteToString<T>(&encoded, first);
+ string encoded_first_only = encoded;
+ OCWriteToString<U>(&encoded, second);
+ EXPECT_NE(encoded, encoded_first_only);
+ EXPECT_TRUE(StringPiece(encoded).starts_with(encoded_first_only));
+}
+
+template <typename T>
+static void TestNumbers(T multiplier) {
+ // first test powers of 2 (and nearby numbers)
+ for (T x = std::numeric_limits<T>().max(); x != 0; x /= 2) {
+ TestWriteRead(multiplier * (x - 1));
+ TestWriteRead(multiplier * x);
+ if (x != std::numeric_limits<T>::max()) {
+ TestWriteRead(multiplier * (x + 1));
+ } else if (multiplier < 0 && multiplier == -1) {
+ TestWriteRead(-x - 1);
+ }
+ }
+
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ for (int bits = 1; bits <= std::numeric_limits<T>().digits; ++bits) {
+ // test random non-negative numbers with given number of significant bits
+ const uint64 mask = (~0ULL) >> (64 - bits);
+ for (int i = 0; i < 1000; i++) {
+ T x = rnd.Rand64() & mask;
+ TestWriteRead(multiplier * x);
+ T y = rnd.Rand64() & mask;
+ TestWriteAppends(multiplier * x, multiplier * y);
+ }
+ }
+}
+
+// Return true iff 'a' is "before" 'b'
+static bool CompareStrings(const string& a, const string& b) { return (a < b); }
+
+template <typename T>
+static void TestNumberOrdering() {
+ // first the negative numbers (if T is signed, otherwise no-op)
+ string laststr = OCWrite<T>(std::numeric_limits<T>().min());
+ for (T num = std::numeric_limits<T>().min() / 2; num != 0; num /= 2) {
+ string strminus1 = OCWrite<T>(num - 1);
+ string str = OCWrite<T>(num);
+ string strplus1 = OCWrite<T>(num + 1);
+
+ CHECK(CompareStrings(strminus1, str));
+ CHECK(CompareStrings(str, strplus1));
+
+ // Compare 'str' with 'laststr'. When we approach 0, 'laststr' is
+ // not necessarily before 'strminus1'.
+ CHECK(CompareStrings(laststr, str));
+ laststr = str;
+ }
+
+ // then the positive numbers
+ laststr = OCWrite<T>(0);
+ T num = 1;
+ while (num < std::numeric_limits<T>().max() / 2) {
+ num *= 2;
+ string strminus1 = OCWrite<T>(num - 1);
+ string str = OCWrite<T>(num);
+ string strplus1 = OCWrite<T>(num + 1);
+
+ CHECK(CompareStrings(strminus1, str));
+ CHECK(CompareStrings(str, strplus1));
+
+ // Compare 'str' with 'laststr'.
+ CHECK(CompareStrings(laststr, str));
+ laststr = str;
+ }
+}
+
+// Helper routine for testing TEST_SkipToNextSpecialByte
+static int FindSpecial(const string& x) {
+ const char* p = x.data();
+ const char* limit = p + x.size();
+ const char* result = OrderedCode::TEST_SkipToNextSpecialByte(p, limit);
+ return result - p;
+}
+
+TEST(OrderedCode, SkipToNextSpecialByte) {
+ for (size_t len = 0; len < 256; len++) {
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ string x;
+ while (x.size() < len) {
+ char c = 1 + rnd.Uniform(254);
+ ASSERT_NE(c, 0);
+ ASSERT_NE(c, 255);
+ x += c; // No 0 bytes, no 255 bytes
+ }
+ EXPECT_EQ(FindSpecial(x), x.size());
+ for (size_t special_pos = 0; special_pos < len; special_pos++) {
+ for (size_t special_test = 0; special_test < 2; special_test++) {
+ const char special_byte = (special_test == 0) ? 0 : 255;
+ string y = x;
+ y[special_pos] = special_byte;
+ EXPECT_EQ(FindSpecial(y), special_pos);
+ if (special_pos < 16) {
+ // Add some special bytes after the one at special_pos to make sure
+ // we still return the earliest special byte in the string
+ for (size_t rest = special_pos + 1; rest < len; rest++) {
+ if (rnd.OneIn(3)) {
+ y[rest] = rnd.OneIn(2) ? 0 : 255;
+ EXPECT_EQ(FindSpecial(y), special_pos);
+ }
+ }
+ }
+ }
+ }
+ }
+}
+
+TEST(OrderedCode, ExhaustiveFindSpecial) {
+ char buf[16];
+ char* limit = buf + sizeof(buf);
+ int count = 0;
+ for (int start_offset = 0; start_offset <= 5; start_offset += 5) {
+ // We test exhaustively with all combinations of 3 bytes starting
+ // at offset 0 and offset 5 (so as to test with the bytes at both
+ // ends of a 64-bit word).
+ for (size_t i = 0; i < sizeof(buf); i++) {
+ buf[i] = 'a'; // Not a special byte
+ }
+ for (int b0 = 0; b0 < 256; b0++) {
+ for (int b1 = 0; b1 < 256; b1++) {
+ for (int b2 = 0; b2 < 256; b2++) {
+ buf[start_offset + 0] = b0;
+ buf[start_offset + 1] = b1;
+ buf[start_offset + 2] = b2;
+ char* expected;
+ if (b0 == 0 || b0 == 255) {
+ expected = &buf[start_offset];
+ } else if (b1 == 0 || b1 == 255) {
+ expected = &buf[start_offset + 1];
+ } else if (b2 == 0 || b2 == 255) {
+ expected = &buf[start_offset + 2];
+ } else {
+ expected = limit;
+ }
+ count++;
+ EXPECT_EQ(expected,
+ OrderedCode::TEST_SkipToNextSpecialByte(buf, limit));
+ }
+ }
+ }
+ }
+ EXPECT_EQ(count, 256 * 256 * 256 * 2);
+}
+
+TEST(Uint64, EncodeDecode) { TestNumbers<uint64>(1); }
+
+TEST(Uint64, Ordering) { TestNumberOrdering<uint64>(); }
+
+TEST(Int64, EncodeDecode) {
+ TestNumbers<int64>(1);
+ TestNumbers<int64>(-1);
+}
+
+TEST(Int64, Ordering) { TestNumberOrdering<int64>(); }
+
+// Returns the bitwise complement of s.
+static inline string StrNot(const string& s) {
+ string result;
+ for (string::const_iterator it = s.begin(); it != s.end(); ++it)
+ result.push_back(~*it);
+ return result;
+}
+
+template <typename T>
+static void TestInvalidEncoding(const string& s) {
+ StringPiece p(s);
+ EXPECT_FALSE(OCRead<T>(&p, static_cast<T*>(NULL)));
+ EXPECT_EQ(s, p);
+}
+
+TEST(OrderedCodeInvalidEncodingsTest, Overflow) {
+ // 1U << 64, increasing and decreasing
+ const string k2xx64U = "\x09\x01" + string(8, 0);
+ TestInvalidEncoding<uint64>(k2xx64U);
+
+ // 1 << 63 and ~(1 << 63), increasing and decreasing
+ const string k2xx63 = "\xff\xc0\x80" + string(7, 0);
+ TestInvalidEncoding<int64>(k2xx63);
+ TestInvalidEncoding<int64>(StrNot(k2xx63));
+}
+
+TEST(OrderedCodeInvalidEncodingsDeathTest, NonCanonical) {
+ // Test "ambiguous"/"non-canonical" encodings.
+ // These are non-minimal (but otherwise "valid") encodings that
+ // differ from the minimal encoding chosen by OrderedCode::WriteXXX
+ // and thus should be avoided to not mess up the string ordering of
+ // encodings.
+
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+
+ for (int n = 2; n <= 9; ++n) {
+ // The zero in non_minimal[1] is "redundant".
+ string non_minimal =
+ string(1, n - 1) + string(1, 0) + RandomString(&rnd, n - 2);
+ EXPECT_EQ(n, non_minimal.length());
+
+ EXPECT_NE(OCWrite<uint64>(0), non_minimal);
+#ifndef NDEBUG
+ StringPiece s(non_minimal);
+ EXPECT_DEATH(OrderedCode::ReadNumIncreasing(&s, NULL), "invalid encoding");
+#else
+ TestRead<uint64>(non_minimal);
+#endif
+ }
+
+ for (int n = 2; n <= 10; ++n) {
+ // Header with 1 sign bit and n-1 size bits.
+ string header = string(n / 8, 0xff) + string(1, 0xff << (8 - (n % 8)));
+ // There are more than 7 zero bits between header bits and "payload".
+ string non_minimal = header +
+ string(1, rnd.Uniform(256) & ~*header.rbegin()) +
+ RandomString(&rnd, n - header.length() - 1);
+ EXPECT_EQ(n, non_minimal.length());
+
+ EXPECT_NE(OCWrite<int64>(0), non_minimal);
+#ifndef NDEBUG
+ StringPiece s(non_minimal);
+ EXPECT_DEATH(OrderedCode::ReadSignedNumIncreasing(&s, NULL),
+ "invalid encoding")
+ << n;
+#else
+ TestRead<int64>(non_minimal);
+#endif
+ }
+}
+
+// Returns random number with specified number of bits,
+// i.e., in the range [2^(bits-1),2^bits).
+static uint64 NextBits(random::SimplePhilox* rnd, int bits) {
+ return (bits != 0)
+ ? (rnd->Rand64() % (1LL << (bits - 1))) + (1LL << (bits - 1))
+ : 0;
+}
+
+template <typename T>
+static void BM_WriteNum(int n, T multiplier) {
+ static const int kValues = 64;
+ T values[kValues];
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ // Use enough distinct values to confuse the branch predictor
+ for (int i = 0; i < kValues; i++) {
+ values[i] = NextBits(&rnd, n % 64) * multiplier;
+ }
+ string result;
+ int index = 0;
+ while (n-- > 0) {
+ result.clear();
+ OCWriteToString<T>(&result, values[index % kValues]);
+ index++;
+ }
+}
+
+template <typename T>
+static void BM_ReadNum(int n, T multiplier) {
+ string x;
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ // Use enough distinct values to confuse the branch predictor
+ static const int kValues = 64;
+ string values[kValues];
+ for (int i = 0; i < kValues; i++) {
+ T val = NextBits(&rnd, i % 64) * multiplier;
+ values[i] = OCWrite<T>(val);
+ }
+ uint32 index = 0;
+ while (n-- > 0) {
+ T val;
+ StringPiece s = values[index++ % kValues];
+ OCRead<T>(&s, &val);
+ }
+}
+
+#define BENCHMARK_NUM(name, T, multiplier) \
+ static void BM_Write##name(int n) { BM_WriteNum<T>(n, multiplier); } \
+ BENCHMARK(BM_Write##name); \
+ static void BM_Read##name(int n) { BM_ReadNum<T>(n, multiplier); } \
+ BENCHMARK(BM_Read##name)
+
+BENCHMARK_NUM(NumIncreasing, uint64, 1);
+BENCHMARK_NUM(SignedNum, int64, 1);
+BENCHMARK_NUM(SignedNumNegative, int64, -1);
+
+#undef BENCHMARK_NUM
+
+// ---------------------------------------------------------------------
+// Strings
+
+TEST(String, EncodeDecode) {
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+
+ for (int len = 0; len < 256; len++) {
+ const string a = RandomString(&rnd, len);
+ TestWriteRead(a);
+ for (int len2 = 0; len2 < 64; len2++) {
+ const string b = RandomString(&rnd, len2);
+
+ TestWriteAppends(a, b);
+
+ string out;
+ OCWriteToString<string>(&out, a);
+ OCWriteToString<string>(&out, b);
+
+ string a2, b2, dummy;
+ StringPiece s = out;
+ StringPiece s2 = out;
+ CHECK(OCRead<string>(&s, &a2));
+ CHECK(OCRead<string>(&s2, NULL));
+ CHECK_EQ(s, s2);
+
+ CHECK(OCRead<string>(&s, &b2));
+ CHECK(OCRead<string>(&s2, NULL));
+ CHECK_EQ(s, s2);
+
+ CHECK(!OCRead<string>(&s, &dummy));
+ CHECK(!OCRead<string>(&s2, NULL));
+ CHECK_EQ(a, a2);
+ CHECK_EQ(b, b2);
+ CHECK(s.empty());
+ CHECK(s2.empty());
+ }
+ }
+}
+
+// 'str' is a static C-style string that may contain '\0'
+#define STATIC_STR(str) StringPiece((str), sizeof(str) - 1)
+
+static string EncodeStringIncreasing(StringPiece value) {
+ string encoded;
+ OrderedCode::WriteString(&encoded, value);
+ return encoded;
+}
+
+TEST(String, Increasing) {
+ // Here are a series of strings in non-decreasing order, including
+ // consecutive strings such that the second one is equal to, a proper
+ // prefix of, or has the same length as the first one. Most also contain
+ // the special escaping characters '\x00' and '\xff'.
+ ASSERT_EQ(EncodeStringIncreasing(STATIC_STR("")),
+ EncodeStringIncreasing(STATIC_STR("")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("")),
+ EncodeStringIncreasing(STATIC_STR("\x00")));
+
+ ASSERT_EQ(EncodeStringIncreasing(STATIC_STR("\x00")),
+ EncodeStringIncreasing(STATIC_STR("\x00")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("\x00")),
+ EncodeStringIncreasing(STATIC_STR("\x01")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("\x01")),
+ EncodeStringIncreasing(STATIC_STR("a")));
+
+ ASSERT_EQ(EncodeStringIncreasing(STATIC_STR("a")),
+ EncodeStringIncreasing(STATIC_STR("a")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("a")),
+ EncodeStringIncreasing(STATIC_STR("aa")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("aa")),
+ EncodeStringIncreasing(STATIC_STR("\xff")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("\xff")),
+ EncodeStringIncreasing(STATIC_STR("\xff\x00")));
+
+ ASSERT_LT(EncodeStringIncreasing(STATIC_STR("\xff\x00")),
+ EncodeStringIncreasing(STATIC_STR("\xff\x01")));
+}
+
+TEST(EncodingIsExpected, String) {
+ std::vector<std::pair<string, string>> data = {
+ {"", string("\x00\x01", 2)},
+ {"foo", string("foo\x00\x01", 5)},
+ {"hello", string("hello\x00\x01", 7)},
+ {string("\x00\x01\xff", 3), string("\x00\xff\x01\xff\x00\x00\x01", 7)},
+ };
+ for (const auto& t : data) {
+ string result;
+ OrderedCode::WriteString(&result, t.first);
+ EXPECT_EQ(t.second, result);
+
+ StringPiece in = result;
+ string decoded;
+ EXPECT_TRUE(OrderedCode::ReadString(&in, &decoded));
+ EXPECT_EQ(t.first, decoded);
+ EXPECT_EQ("", in);
+ }
+}
+
+TEST(EncodingIsExpected, Unsigned) {
+ std::vector<std::pair<uint64, string>> data = {
+ {0x0ull, string("\000", 1)},
+ {0x1ull, string("\001\001", 2)},
+ {0x2ull, string("\001\002", 2)},
+ {0x1ull, string("\001\001", 2)},
+ {0x2ull, string("\001\002", 2)},
+ {0x3ull, string("\001\003", 2)},
+ {0x3ull, string("\001\003", 2)},
+ {0x4ull, string("\001\004", 2)},
+ {0x5ull, string("\001\005", 2)},
+ {0x7ull, string("\001\007", 2)},
+ {0x8ull, string("\001\010", 2)},
+ {0x9ull, string("\001\t", 2)},
+ {0xfull, string("\001\017", 2)},
+ {0x10ull, string("\001\020", 2)},
+ {0x11ull, string("\001\021", 2)},
+ {0x1full, string("\001\037", 2)},
+ {0x20ull, string("\001 ", 2)},
+ {0x21ull, string("\001!", 2)},
+ {0x3full, string("\001?", 2)},
+ {0x40ull, string("\001@", 2)},
+ {0x41ull, string("\001A", 2)},
+ {0x7full, string("\001\177", 2)},
+ {0x80ull, string("\001\200", 2)},
+ {0x81ull, string("\001\201", 2)},
+ {0xffull, string("\001\377", 2)},
+ {0x100ull, string("\002\001\000", 3)},
+ {0x101ull, string("\002\001\001", 3)},
+ {0x1ffull, string("\002\001\377", 3)},
+ {0x200ull, string("\002\002\000", 3)},
+ {0x201ull, string("\002\002\001", 3)},
+ {0x3ffull, string("\002\003\377", 3)},
+ {0x400ull, string("\002\004\000", 3)},
+ {0x401ull, string("\002\004\001", 3)},
+ {0x7ffull, string("\002\007\377", 3)},
+ {0x800ull, string("\002\010\000", 3)},
+ {0x801ull, string("\002\010\001", 3)},
+ {0xfffull, string("\002\017\377", 3)},
+ {0x1000ull, string("\002\020\000", 3)},
+ {0x1001ull, string("\002\020\001", 3)},
+ {0x1fffull, string("\002\037\377", 3)},
+ {0x2000ull, string("\002 \000", 3)},
+ {0x2001ull, string("\002 \001", 3)},
+ {0x3fffull, string("\002?\377", 3)},
+ {0x4000ull, string("\002@\000", 3)},
+ {0x4001ull, string("\002@\001", 3)},
+ {0x7fffull, string("\002\177\377", 3)},
+ {0x8000ull, string("\002\200\000", 3)},
+ {0x8001ull, string("\002\200\001", 3)},
+ {0xffffull, string("\002\377\377", 3)},
+ {0x10000ull, string("\003\001\000\000", 4)},
+ {0x10001ull, string("\003\001\000\001", 4)},
+ {0x1ffffull, string("\003\001\377\377", 4)},
+ {0x20000ull, string("\003\002\000\000", 4)},
+ {0x20001ull, string("\003\002\000\001", 4)},
+ {0x3ffffull, string("\003\003\377\377", 4)},
+ {0x40000ull, string("\003\004\000\000", 4)},
+ {0x40001ull, string("\003\004\000\001", 4)},
+ {0x7ffffull, string("\003\007\377\377", 4)},
+ {0x80000ull, string("\003\010\000\000", 4)},
+ {0x80001ull, string("\003\010\000\001", 4)},
+ {0xfffffull, string("\003\017\377\377", 4)},
+ {0x100000ull, string("\003\020\000\000", 4)},
+ {0x100001ull, string("\003\020\000\001", 4)},
+ {0x1fffffull, string("\003\037\377\377", 4)},
+ {0x200000ull, string("\003 \000\000", 4)},
+ {0x200001ull, string("\003 \000\001", 4)},
+ {0x3fffffull, string("\003?\377\377", 4)},
+ {0x400000ull, string("\003@\000\000", 4)},
+ {0x400001ull, string("\003@\000\001", 4)},
+ {0x7fffffull, string("\003\177\377\377", 4)},
+ {0x800000ull, string("\003\200\000\000", 4)},
+ {0x800001ull, string("\003\200\000\001", 4)},
+ {0xffffffull, string("\003\377\377\377", 4)},
+ {0x1000000ull, string("\004\001\000\000\000", 5)},
+ {0x1000001ull, string("\004\001\000\000\001", 5)},
+ {0x1ffffffull, string("\004\001\377\377\377", 5)},
+ {0x2000000ull, string("\004\002\000\000\000", 5)},
+ {0x2000001ull, string("\004\002\000\000\001", 5)},
+ {0x3ffffffull, string("\004\003\377\377\377", 5)},
+ {0x4000000ull, string("\004\004\000\000\000", 5)},
+ {0x4000001ull, string("\004\004\000\000\001", 5)},
+ {0x7ffffffull, string("\004\007\377\377\377", 5)},
+ {0x8000000ull, string("\004\010\000\000\000", 5)},
+ {0x8000001ull, string("\004\010\000\000\001", 5)},
+ {0xfffffffull, string("\004\017\377\377\377", 5)},
+ {0x10000000ull, string("\004\020\000\000\000", 5)},
+ {0x10000001ull, string("\004\020\000\000\001", 5)},
+ {0x1fffffffull, string("\004\037\377\377\377", 5)},
+ {0x20000000ull, string("\004 \000\000\000", 5)},
+ {0x20000001ull, string("\004 \000\000\001", 5)},
+ {0x3fffffffull, string("\004?\377\377\377", 5)},
+ {0x40000000ull, string("\004@\000\000\000", 5)},
+ {0x40000001ull, string("\004@\000\000\001", 5)},
+ {0x7fffffffull, string("\004\177\377\377\377", 5)},
+ {0x80000000ull, string("\004\200\000\000\000", 5)},
+ {0x80000001ull, string("\004\200\000\000\001", 5)},
+ {0xffffffffull, string("\004\377\377\377\377", 5)},
+ {0x100000000ull, string("\005\001\000\000\000\000", 6)},
+ {0x100000001ull, string("\005\001\000\000\000\001", 6)},
+ {0x1ffffffffull, string("\005\001\377\377\377\377", 6)},
+ {0x200000000ull, string("\005\002\000\000\000\000", 6)},
+ {0x200000001ull, string("\005\002\000\000\000\001", 6)},
+ {0x3ffffffffull, string("\005\003\377\377\377\377", 6)},
+ {0x400000000ull, string("\005\004\000\000\000\000", 6)},
+ {0x400000001ull, string("\005\004\000\000\000\001", 6)},
+ {0x7ffffffffull, string("\005\007\377\377\377\377", 6)},
+ {0x800000000ull, string("\005\010\000\000\000\000", 6)},
+ {0x800000001ull, string("\005\010\000\000\000\001", 6)},
+ {0xfffffffffull, string("\005\017\377\377\377\377", 6)},
+ {0x1000000000ull, string("\005\020\000\000\000\000", 6)},
+ {0x1000000001ull, string("\005\020\000\000\000\001", 6)},
+ {0x1fffffffffull, string("\005\037\377\377\377\377", 6)},
+ {0x2000000000ull, string("\005 \000\000\000\000", 6)},
+ {0x2000000001ull, string("\005 \000\000\000\001", 6)},
+ {0x3fffffffffull, string("\005?\377\377\377\377", 6)},
+ {0x4000000000ull, string("\005@\000\000\000\000", 6)},
+ {0x4000000001ull, string("\005@\000\000\000\001", 6)},
+ {0x7fffffffffull, string("\005\177\377\377\377\377", 6)},
+ {0x8000000000ull, string("\005\200\000\000\000\000", 6)},
+ {0x8000000001ull, string("\005\200\000\000\000\001", 6)},
+ {0xffffffffffull, string("\005\377\377\377\377\377", 6)},
+ {0x10000000000ull, string("\006\001\000\000\000\000\000", 7)},
+ {0x10000000001ull, string("\006\001\000\000\000\000\001", 7)},
+ {0x1ffffffffffull, string("\006\001\377\377\377\377\377", 7)},
+ {0x20000000000ull, string("\006\002\000\000\000\000\000", 7)},
+ {0x20000000001ull, string("\006\002\000\000\000\000\001", 7)},
+ {0x3ffffffffffull, string("\006\003\377\377\377\377\377", 7)},
+ {0x40000000000ull, string("\006\004\000\000\000\000\000", 7)},
+ {0x40000000001ull, string("\006\004\000\000\000\000\001", 7)},
+ {0x7ffffffffffull, string("\006\007\377\377\377\377\377", 7)},
+ {0x80000000000ull, string("\006\010\000\000\000\000\000", 7)},
+ {0x80000000001ull, string("\006\010\000\000\000\000\001", 7)},
+ {0xfffffffffffull, string("\006\017\377\377\377\377\377", 7)},
+ {0x100000000000ull, string("\006\020\000\000\000\000\000", 7)},
+ {0x100000000001ull, string("\006\020\000\000\000\000\001", 7)},
+ {0x1fffffffffffull, string("\006\037\377\377\377\377\377", 7)},
+ {0x200000000000ull, string("\006 \000\000\000\000\000", 7)},
+ {0x200000000001ull, string("\006 \000\000\000\000\001", 7)},
+ {0x3fffffffffffull, string("\006?\377\377\377\377\377", 7)},
+ {0x400000000000ull, string("\006@\000\000\000\000\000", 7)},
+ {0x400000000001ull, string("\006@\000\000\000\000\001", 7)},
+ {0x7fffffffffffull, string("\006\177\377\377\377\377\377", 7)},
+ {0x800000000000ull, string("\006\200\000\000\000\000\000", 7)},
+ {0x800000000001ull, string("\006\200\000\000\000\000\001", 7)},
+ {0xffffffffffffull, string("\006\377\377\377\377\377\377", 7)},
+ {0x1000000000000ull, string("\007\001\000\000\000\000\000\000", 8)},
+ {0x1000000000001ull, string("\007\001\000\000\000\000\000\001", 8)},
+ {0x1ffffffffffffull, string("\007\001\377\377\377\377\377\377", 8)},
+ {0x2000000000000ull, string("\007\002\000\000\000\000\000\000", 8)},
+ {0x2000000000001ull, string("\007\002\000\000\000\000\000\001", 8)},
+ {0x3ffffffffffffull, string("\007\003\377\377\377\377\377\377", 8)},
+ {0x4000000000000ull, string("\007\004\000\000\000\000\000\000", 8)},
+ {0x4000000000001ull, string("\007\004\000\000\000\000\000\001", 8)},
+ {0x7ffffffffffffull, string("\007\007\377\377\377\377\377\377", 8)},
+ {0x8000000000000ull, string("\007\010\000\000\000\000\000\000", 8)},
+ {0x8000000000001ull, string("\007\010\000\000\000\000\000\001", 8)},
+ {0xfffffffffffffull, string("\007\017\377\377\377\377\377\377", 8)},
+ {0x10000000000000ull, string("\007\020\000\000\000\000\000\000", 8)},
+ {0x10000000000001ull, string("\007\020\000\000\000\000\000\001", 8)},
+ {0x1fffffffffffffull, string("\007\037\377\377\377\377\377\377", 8)},
+ {0x20000000000000ull, string("\007 \000\000\000\000\000\000", 8)},
+ {0x20000000000001ull, string("\007 \000\000\000\000\000\001", 8)},
+ {0x3fffffffffffffull, string("\007?\377\377\377\377\377\377", 8)},
+ {0x40000000000000ull, string("\007@\000\000\000\000\000\000", 8)},
+ {0x40000000000001ull, string("\007@\000\000\000\000\000\001", 8)},
+ {0x7fffffffffffffull, string("\007\177\377\377\377\377\377\377", 8)},
+ {0x80000000000000ull, string("\007\200\000\000\000\000\000\000", 8)},
+ {0x80000000000001ull, string("\007\200\000\000\000\000\000\001", 8)},
+ {0xffffffffffffffull, string("\007\377\377\377\377\377\377\377", 8)},
+ {0x100000000000000ull, string("\010\001\000\000\000\000\000\000\000", 9)},
+ {0x100000000000001ull, string("\010\001\000\000\000\000\000\000\001", 9)},
+ {0x1ffffffffffffffull, string("\010\001\377\377\377\377\377\377\377", 9)},
+ {0x200000000000000ull, string("\010\002\000\000\000\000\000\000\000", 9)},
+ {0x200000000000001ull, string("\010\002\000\000\000\000\000\000\001", 9)},
+ {0x3ffffffffffffffull, string("\010\003\377\377\377\377\377\377\377", 9)},
+ {0x400000000000000ull, string("\010\004\000\000\000\000\000\000\000", 9)},
+ {0x400000000000001ull, string("\010\004\000\000\000\000\000\000\001", 9)},
+ {0x7ffffffffffffffull, string("\010\007\377\377\377\377\377\377\377", 9)},
+ {0x800000000000000ull, string("\010\010\000\000\000\000\000\000\000", 9)},
+ {0x800000000000001ull, string("\010\010\000\000\000\000\000\000\001", 9)},
+ {0xfffffffffffffffull, string("\010\017\377\377\377\377\377\377\377", 9)},
+ {0x1000000000000000ull,
+ string("\010\020\000\000\000\000\000\000\000", 9)},
+ {0x1000000000000001ull,
+ string("\010\020\000\000\000\000\000\000\001", 9)},
+ {0x1fffffffffffffffull,
+ string("\010\037\377\377\377\377\377\377\377", 9)},
+ {0x2000000000000000ull, string("\010 \000\000\000\000\000\000\000", 9)},
+ {0x2000000000000001ull, string("\010 \000\000\000\000\000\000\001", 9)},
+ {0x3fffffffffffffffull, string("\010?\377\377\377\377\377\377\377", 9)},
+ {0x4000000000000000ull, string("\010@\000\000\000\000\000\000\000", 9)},
+ {0x4000000000000001ull, string("\010@\000\000\000\000\000\000\001", 9)},
+ {0x7fffffffffffffffull,
+ string("\010\177\377\377\377\377\377\377\377", 9)},
+ {0x8000000000000000ull,
+ string("\010\200\000\000\000\000\000\000\000", 9)},
+ {0x8000000000000001ull,
+ string("\010\200\000\000\000\000\000\000\001", 9)},
+ };
+ for (const auto& t : data) {
+ uint64 num = t.first;
+ string result;
+ OrderedCode::WriteNumIncreasing(&result, num);
+ EXPECT_EQ(t.second, result) << std::hex << num;
+
+ StringPiece in = result;
+ uint64 decoded;
+ EXPECT_TRUE(OrderedCode::ReadNumIncreasing(&in, &decoded));
+ EXPECT_EQ(num, decoded);
+ EXPECT_EQ("", in);
+ }
+}
+
+TEST(EncodingIsExpected, Signed) {
+ std::vector<std::pair<int64, string>> data = {
+ {0ll, string("\200", 1)},
+ {1ll, string("\201", 1)},
+ {2ll, string("\202", 1)},
+ {1ll, string("\201", 1)},
+ {2ll, string("\202", 1)},
+ {3ll, string("\203", 1)},
+ {3ll, string("\203", 1)},
+ {4ll, string("\204", 1)},
+ {5ll, string("\205", 1)},
+ {7ll, string("\207", 1)},
+ {8ll, string("\210", 1)},
+ {9ll, string("\211", 1)},
+ {15ll, string("\217", 1)},
+ {16ll, string("\220", 1)},
+ {17ll, string("\221", 1)},
+ {31ll, string("\237", 1)},
+ {32ll, string("\240", 1)},
+ {33ll, string("\241", 1)},
+ {63ll, string("\277", 1)},
+ {64ll, string("\300@", 2)},
+ {65ll, string("\300A", 2)},
+ {127ll, string("\300\177", 2)},
+ {128ll, string("\300\200", 2)},
+ {129ll, string("\300\201", 2)},
+ {255ll, string("\300\377", 2)},
+ {256ll, string("\301\000", 2)},
+ {257ll, string("\301\001", 2)},
+ {511ll, string("\301\377", 2)},
+ {512ll, string("\302\000", 2)},
+ {513ll, string("\302\001", 2)},
+ {1023ll, string("\303\377", 2)},
+ {1024ll, string("\304\000", 2)},
+ {1025ll, string("\304\001", 2)},
+ {2047ll, string("\307\377", 2)},
+ {2048ll, string("\310\000", 2)},
+ {2049ll, string("\310\001", 2)},
+ {4095ll, string("\317\377", 2)},
+ {4096ll, string("\320\000", 2)},
+ {4097ll, string("\320\001", 2)},
+ {8191ll, string("\337\377", 2)},
+ {8192ll, string("\340 \000", 3)},
+ {8193ll, string("\340 \001", 3)},
+ {16383ll, string("\340?\377", 3)},
+ {16384ll, string("\340@\000", 3)},
+ {16385ll, string("\340@\001", 3)},
+ {32767ll, string("\340\177\377", 3)},
+ {32768ll, string("\340\200\000", 3)},
+ {32769ll, string("\340\200\001", 3)},
+ {65535ll, string("\340\377\377", 3)},
+ {65536ll, string("\341\000\000", 3)},
+ {65537ll, string("\341\000\001", 3)},
+ {131071ll, string("\341\377\377", 3)},
+ {131072ll, string("\342\000\000", 3)},
+ {131073ll, string("\342\000\001", 3)},
+ {262143ll, string("\343\377\377", 3)},
+ {262144ll, string("\344\000\000", 3)},
+ {262145ll, string("\344\000\001", 3)},
+ {524287ll, string("\347\377\377", 3)},
+ {524288ll, string("\350\000\000", 3)},
+ {524289ll, string("\350\000\001", 3)},
+ {1048575ll, string("\357\377\377", 3)},
+ {1048576ll, string("\360\020\000\000", 4)},
+ {1048577ll, string("\360\020\000\001", 4)},
+ {2097151ll, string("\360\037\377\377", 4)},
+ {2097152ll, string("\360 \000\000", 4)},
+ {2097153ll, string("\360 \000\001", 4)},
+ {4194303ll, string("\360?\377\377", 4)},
+ {4194304ll, string("\360@\000\000", 4)},
+ {4194305ll, string("\360@\000\001", 4)},
+ {8388607ll, string("\360\177\377\377", 4)},
+ {8388608ll, string("\360\200\000\000", 4)},
+ {8388609ll, string("\360\200\000\001", 4)},
+ {16777215ll, string("\360\377\377\377", 4)},
+ {16777216ll, string("\361\000\000\000", 4)},
+ {16777217ll, string("\361\000\000\001", 4)},
+ {33554431ll, string("\361\377\377\377", 4)},
+ {33554432ll, string("\362\000\000\000", 4)},
+ {33554433ll, string("\362\000\000\001", 4)},
+ {67108863ll, string("\363\377\377\377", 4)},
+ {67108864ll, string("\364\000\000\000", 4)},
+ {67108865ll, string("\364\000\000\001", 4)},
+ {134217727ll, string("\367\377\377\377", 4)},
+ {134217728ll, string("\370\010\000\000\000", 5)},
+ {134217729ll, string("\370\010\000\000\001", 5)},
+ {268435455ll, string("\370\017\377\377\377", 5)},
+ {268435456ll, string("\370\020\000\000\000", 5)},
+ {268435457ll, string("\370\020\000\000\001", 5)},
+ {536870911ll, string("\370\037\377\377\377", 5)},
+ {536870912ll, string("\370 \000\000\000", 5)},
+ {536870913ll, string("\370 \000\000\001", 5)},
+ {1073741823ll, string("\370?\377\377\377", 5)},
+ {1073741824ll, string("\370@\000\000\000", 5)},
+ {1073741825ll, string("\370@\000\000\001", 5)},
+ {2147483647ll, string("\370\177\377\377\377", 5)},
+ {2147483648ll, string("\370\200\000\000\000", 5)},
+ {2147483649ll, string("\370\200\000\000\001", 5)},
+ {4294967295ll, string("\370\377\377\377\377", 5)},
+ {4294967296ll, string("\371\000\000\000\000", 5)},
+ {4294967297ll, string("\371\000\000\000\001", 5)},
+ {8589934591ll, string("\371\377\377\377\377", 5)},
+ {8589934592ll, string("\372\000\000\000\000", 5)},
+ {8589934593ll, string("\372\000\000\000\001", 5)},
+ {17179869183ll, string("\373\377\377\377\377", 5)},
+ {17179869184ll, string("\374\004\000\000\000\000", 6)},
+ {17179869185ll, string("\374\004\000\000\000\001", 6)},
+ {34359738367ll, string("\374\007\377\377\377\377", 6)},
+ {34359738368ll, string("\374\010\000\000\000\000", 6)},
+ {34359738369ll, string("\374\010\000\000\000\001", 6)},
+ {68719476735ll, string("\374\017\377\377\377\377", 6)},
+ {68719476736ll, string("\374\020\000\000\000\000", 6)},
+ {68719476737ll, string("\374\020\000\000\000\001", 6)},
+ {137438953471ll, string("\374\037\377\377\377\377", 6)},
+ {137438953472ll, string("\374 \000\000\000\000", 6)},
+ {137438953473ll, string("\374 \000\000\000\001", 6)},
+ {274877906943ll, string("\374?\377\377\377\377", 6)},
+ {274877906944ll, string("\374@\000\000\000\000", 6)},
+ {274877906945ll, string("\374@\000\000\000\001", 6)},
+ {549755813887ll, string("\374\177\377\377\377\377", 6)},
+ {549755813888ll, string("\374\200\000\000\000\000", 6)},
+ {549755813889ll, string("\374\200\000\000\000\001", 6)},
+ {1099511627775ll, string("\374\377\377\377\377\377", 6)},
+ {1099511627776ll, string("\375\000\000\000\000\000", 6)},
+ {1099511627777ll, string("\375\000\000\000\000\001", 6)},
+ {2199023255551ll, string("\375\377\377\377\377\377", 6)},
+ {2199023255552ll, string("\376\002\000\000\000\000\000", 7)},
+ {2199023255553ll, string("\376\002\000\000\000\000\001", 7)},
+ {4398046511103ll, string("\376\003\377\377\377\377\377", 7)},
+ {4398046511104ll, string("\376\004\000\000\000\000\000", 7)},
+ {4398046511105ll, string("\376\004\000\000\000\000\001", 7)},
+ {8796093022207ll, string("\376\007\377\377\377\377\377", 7)},
+ {8796093022208ll, string("\376\010\000\000\000\000\000", 7)},
+ {8796093022209ll, string("\376\010\000\000\000\000\001", 7)},
+ {17592186044415ll, string("\376\017\377\377\377\377\377", 7)},
+ {17592186044416ll, string("\376\020\000\000\000\000\000", 7)},
+ {17592186044417ll, string("\376\020\000\000\000\000\001", 7)},
+ {35184372088831ll, string("\376\037\377\377\377\377\377", 7)},
+ {35184372088832ll, string("\376 \000\000\000\000\000", 7)},
+ {35184372088833ll, string("\376 \000\000\000\000\001", 7)},
+ {70368744177663ll, string("\376?\377\377\377\377\377", 7)},
+ {70368744177664ll, string("\376@\000\000\000\000\000", 7)},
+ {70368744177665ll, string("\376@\000\000\000\000\001", 7)},
+ {140737488355327ll, string("\376\177\377\377\377\377\377", 7)},
+ {140737488355328ll, string("\376\200\000\000\000\000\000", 7)},
+ {140737488355329ll, string("\376\200\000\000\000\000\001", 7)},
+ {281474976710655ll, string("\376\377\377\377\377\377\377", 7)},
+ {281474976710656ll, string("\377\001\000\000\000\000\000\000", 8)},
+ {281474976710657ll, string("\377\001\000\000\000\000\000\001", 8)},
+ {562949953421311ll, string("\377\001\377\377\377\377\377\377", 8)},
+ {562949953421312ll, string("\377\002\000\000\000\000\000\000", 8)},
+ {562949953421313ll, string("\377\002\000\000\000\000\000\001", 8)},
+ {1125899906842623ll, string("\377\003\377\377\377\377\377\377", 8)},
+ {1125899906842624ll, string("\377\004\000\000\000\000\000\000", 8)},
+ {1125899906842625ll, string("\377\004\000\000\000\000\000\001", 8)},
+ {2251799813685247ll, string("\377\007\377\377\377\377\377\377", 8)},
+ {2251799813685248ll, string("\377\010\000\000\000\000\000\000", 8)},
+ {2251799813685249ll, string("\377\010\000\000\000\000\000\001", 8)},
+ {4503599627370495ll, string("\377\017\377\377\377\377\377\377", 8)},
+ {4503599627370496ll, string("\377\020\000\000\000\000\000\000", 8)},
+ {4503599627370497ll, string("\377\020\000\000\000\000\000\001", 8)},
+ {9007199254740991ll, string("\377\037\377\377\377\377\377\377", 8)},
+ {9007199254740992ll, string("\377 \000\000\000\000\000\000", 8)},
+ {9007199254740993ll, string("\377 \000\000\000\000\000\001", 8)},
+ {18014398509481983ll, string("\377?\377\377\377\377\377\377", 8)},
+ {18014398509481984ll, string("\377@\000\000\000\000\000\000", 8)},
+ {18014398509481985ll, string("\377@\000\000\000\000\000\001", 8)},
+ {36028797018963967ll, string("\377\177\377\377\377\377\377\377", 8)},
+ {36028797018963968ll, string("\377\200\200\000\000\000\000\000\000", 9)},
+ {36028797018963969ll, string("\377\200\200\000\000\000\000\000\001", 9)},
+ {72057594037927935ll, string("\377\200\377\377\377\377\377\377\377", 9)},
+ {72057594037927936ll, string("\377\201\000\000\000\000\000\000\000", 9)},
+ {72057594037927937ll, string("\377\201\000\000\000\000\000\000\001", 9)},
+ {144115188075855871ll, string("\377\201\377\377\377\377\377\377\377", 9)},
+ {144115188075855872ll, string("\377\202\000\000\000\000\000\000\000", 9)},
+ {144115188075855873ll, string("\377\202\000\000\000\000\000\000\001", 9)},
+ {288230376151711743ll, string("\377\203\377\377\377\377\377\377\377", 9)},
+ {288230376151711744ll, string("\377\204\000\000\000\000\000\000\000", 9)},
+ {288230376151711745ll, string("\377\204\000\000\000\000\000\000\001", 9)},
+ {576460752303423487ll, string("\377\207\377\377\377\377\377\377\377", 9)},
+ {576460752303423488ll, string("\377\210\000\000\000\000\000\000\000", 9)},
+ {576460752303423489ll, string("\377\210\000\000\000\000\000\000\001", 9)},
+ {1152921504606846975ll,
+ string("\377\217\377\377\377\377\377\377\377", 9)},
+ {1152921504606846976ll,
+ string("\377\220\000\000\000\000\000\000\000", 9)},
+ {1152921504606846977ll,
+ string("\377\220\000\000\000\000\000\000\001", 9)},
+ {2305843009213693951ll,
+ string("\377\237\377\377\377\377\377\377\377", 9)},
+ {2305843009213693952ll,
+ string("\377\240\000\000\000\000\000\000\000", 9)},
+ {2305843009213693953ll,
+ string("\377\240\000\000\000\000\000\000\001", 9)},
+ {4611686018427387903ll,
+ string("\377\277\377\377\377\377\377\377\377", 9)},
+ {4611686018427387904ll,
+ string("\377\300@\000\000\000\000\000\000\000", 10)},
+ {4611686018427387905ll,
+ string("\377\300@\000\000\000\000\000\000\001", 10)},
+ {9223372036854775807ll,
+ string("\377\300\177\377\377\377\377\377\377\377", 10)},
+ {-9223372036854775807ll,
+ string("\000?\200\000\000\000\000\000\000\001", 10)},
+ {0ll, string("\200", 1)},
+ {-1ll, string("\177", 1)},
+ {-2ll, string("~", 1)},
+ {-1ll, string("\177", 1)},
+ {-2ll, string("~", 1)},
+ {-3ll, string("}", 1)},
+ {-3ll, string("}", 1)},
+ {-4ll, string("|", 1)},
+ {-5ll, string("{", 1)},
+ {-7ll, string("y", 1)},
+ {-8ll, string("x", 1)},
+ {-9ll, string("w", 1)},
+ {-15ll, string("q", 1)},
+ {-16ll, string("p", 1)},
+ {-17ll, string("o", 1)},
+ {-31ll, string("a", 1)},
+ {-32ll, string("`", 1)},
+ {-33ll, string("_", 1)},
+ {-63ll, string("A", 1)},
+ {-64ll, string("@", 1)},
+ {-65ll, string("?\277", 2)},
+ {-127ll, string("?\201", 2)},
+ {-128ll, string("?\200", 2)},
+ {-129ll, string("?\177", 2)},
+ {-255ll, string("?\001", 2)},
+ {-256ll, string("?\000", 2)},
+ {-257ll, string(">\377", 2)},
+ {-511ll, string(">\001", 2)},
+ {-512ll, string(">\000", 2)},
+ {-513ll, string("=\377", 2)},
+ {-1023ll, string("<\001", 2)},
+ {-1024ll, string("<\000", 2)},
+ {-1025ll, string(";\377", 2)},
+ {-2047ll, string("8\001", 2)},
+ {-2048ll, string("8\000", 2)},
+ {-2049ll, string("7\377", 2)},
+ {-4095ll, string("0\001", 2)},
+ {-4096ll, string("0\000", 2)},
+ {-4097ll, string("/\377", 2)},
+ {-8191ll, string(" \001", 2)},
+ {-8192ll, string(" \000", 2)},
+ {-8193ll, string("\037\337\377", 3)},
+ {-16383ll, string("\037\300\001", 3)},
+ {-16384ll, string("\037\300\000", 3)},
+ {-16385ll, string("\037\277\377", 3)},
+ {-32767ll, string("\037\200\001", 3)},
+ {-32768ll, string("\037\200\000", 3)},
+ {-32769ll, string("\037\177\377", 3)},
+ {-65535ll, string("\037\000\001", 3)},
+ {-65536ll, string("\037\000\000", 3)},
+ {-65537ll, string("\036\377\377", 3)},
+ {-131071ll, string("\036\000\001", 3)},
+ {-131072ll, string("\036\000\000", 3)},
+ {-131073ll, string("\035\377\377", 3)},
+ {-262143ll, string("\034\000\001", 3)},
+ {-262144ll, string("\034\000\000", 3)},
+ {-262145ll, string("\033\377\377", 3)},
+ {-524287ll, string("\030\000\001", 3)},
+ {-524288ll, string("\030\000\000", 3)},
+ {-524289ll, string("\027\377\377", 3)},
+ {-1048575ll, string("\020\000\001", 3)},
+ {-1048576ll, string("\020\000\000", 3)},
+ {-1048577ll, string("\017\357\377\377", 4)},
+ {-2097151ll, string("\017\340\000\001", 4)},
+ {-2097152ll, string("\017\340\000\000", 4)},
+ {-2097153ll, string("\017\337\377\377", 4)},
+ {-4194303ll, string("\017\300\000\001", 4)},
+ {-4194304ll, string("\017\300\000\000", 4)},
+ {-4194305ll, string("\017\277\377\377", 4)},
+ {-8388607ll, string("\017\200\000\001", 4)},
+ {-8388608ll, string("\017\200\000\000", 4)},
+ {-8388609ll, string("\017\177\377\377", 4)},
+ {-16777215ll, string("\017\000\000\001", 4)},
+ {-16777216ll, string("\017\000\000\000", 4)},
+ {-16777217ll, string("\016\377\377\377", 4)},
+ {-33554431ll, string("\016\000\000\001", 4)},
+ {-33554432ll, string("\016\000\000\000", 4)},
+ {-33554433ll, string("\r\377\377\377", 4)},
+ {-67108863ll, string("\014\000\000\001", 4)},
+ {-67108864ll, string("\014\000\000\000", 4)},
+ {-67108865ll, string("\013\377\377\377", 4)},
+ {-134217727ll, string("\010\000\000\001", 4)},
+ {-134217728ll, string("\010\000\000\000", 4)},
+ {-134217729ll, string("\007\367\377\377\377", 5)},
+ {-268435455ll, string("\007\360\000\000\001", 5)},
+ {-268435456ll, string("\007\360\000\000\000", 5)},
+ {-268435457ll, string("\007\357\377\377\377", 5)},
+ {-536870911ll, string("\007\340\000\000\001", 5)},
+ {-536870912ll, string("\007\340\000\000\000", 5)},
+ {-536870913ll, string("\007\337\377\377\377", 5)},
+ {-1073741823ll, string("\007\300\000\000\001", 5)},
+ {-1073741824ll, string("\007\300\000\000\000", 5)},
+ {-1073741825ll, string("\007\277\377\377\377", 5)},
+ {-2147483647ll, string("\007\200\000\000\001", 5)},
+ {-2147483648ll, string("\007\200\000\000\000", 5)},
+ {-2147483649ll, string("\007\177\377\377\377", 5)},
+ {-4294967295ll, string("\007\000\000\000\001", 5)},
+ {-4294967296ll, string("\007\000\000\000\000", 5)},
+ {-4294967297ll, string("\006\377\377\377\377", 5)},
+ {-8589934591ll, string("\006\000\000\000\001", 5)},
+ {-8589934592ll, string("\006\000\000\000\000", 5)},
+ {-8589934593ll, string("\005\377\377\377\377", 5)},
+ {-17179869183ll, string("\004\000\000\000\001", 5)},
+ {-17179869184ll, string("\004\000\000\000\000", 5)},
+ {-17179869185ll, string("\003\373\377\377\377\377", 6)},
+ {-34359738367ll, string("\003\370\000\000\000\001", 6)},
+ {-34359738368ll, string("\003\370\000\000\000\000", 6)},
+ {-34359738369ll, string("\003\367\377\377\377\377", 6)},
+ {-68719476735ll, string("\003\360\000\000\000\001", 6)},
+ {-68719476736ll, string("\003\360\000\000\000\000", 6)},
+ {-68719476737ll, string("\003\357\377\377\377\377", 6)},
+ {-137438953471ll, string("\003\340\000\000\000\001", 6)},
+ {-137438953472ll, string("\003\340\000\000\000\000", 6)},
+ {-137438953473ll, string("\003\337\377\377\377\377", 6)},
+ {-274877906943ll, string("\003\300\000\000\000\001", 6)},
+ {-274877906944ll, string("\003\300\000\000\000\000", 6)},
+ {-274877906945ll, string("\003\277\377\377\377\377", 6)},
+ {-549755813887ll, string("\003\200\000\000\000\001", 6)},
+ {-549755813888ll, string("\003\200\000\000\000\000", 6)},
+ {-549755813889ll, string("\003\177\377\377\377\377", 6)},
+ {-1099511627775ll, string("\003\000\000\000\000\001", 6)},
+ {-1099511627776ll, string("\003\000\000\000\000\000", 6)},
+ {-1099511627777ll, string("\002\377\377\377\377\377", 6)},
+ {-2199023255551ll, string("\002\000\000\000\000\001", 6)},
+ {-2199023255552ll, string("\002\000\000\000\000\000", 6)},
+ {-2199023255553ll, string("\001\375\377\377\377\377\377", 7)},
+ {-4398046511103ll, string("\001\374\000\000\000\000\001", 7)},
+ {-4398046511104ll, string("\001\374\000\000\000\000\000", 7)},
+ {-4398046511105ll, string("\001\373\377\377\377\377\377", 7)},
+ {-8796093022207ll, string("\001\370\000\000\000\000\001", 7)},
+ {-8796093022208ll, string("\001\370\000\000\000\000\000", 7)},
+ {-8796093022209ll, string("\001\367\377\377\377\377\377", 7)},
+ {-17592186044415ll, string("\001\360\000\000\000\000\001", 7)},
+ {-17592186044416ll, string("\001\360\000\000\000\000\000", 7)},
+ {-17592186044417ll, string("\001\357\377\377\377\377\377", 7)},
+ {-35184372088831ll, string("\001\340\000\000\000\000\001", 7)},
+ {-35184372088832ll, string("\001\340\000\000\000\000\000", 7)},
+ {-35184372088833ll, string("\001\337\377\377\377\377\377", 7)},
+ {-70368744177663ll, string("\001\300\000\000\000\000\001", 7)},
+ {-70368744177664ll, string("\001\300\000\000\000\000\000", 7)},
+ {-70368744177665ll, string("\001\277\377\377\377\377\377", 7)},
+ {-140737488355327ll, string("\001\200\000\000\000\000\001", 7)},
+ {-140737488355328ll, string("\001\200\000\000\000\000\000", 7)},
+ {-140737488355329ll, string("\001\177\377\377\377\377\377", 7)},
+ {-281474976710655ll, string("\001\000\000\000\000\000\001", 7)},
+ {-281474976710656ll, string("\001\000\000\000\000\000\000", 7)},
+ {-281474976710657ll, string("\000\376\377\377\377\377\377\377", 8)},
+ {-562949953421311ll, string("\000\376\000\000\000\000\000\001", 8)},
+ {-562949953421312ll, string("\000\376\000\000\000\000\000\000", 8)},
+ {-562949953421313ll, string("\000\375\377\377\377\377\377\377", 8)},
+ {-1125899906842623ll, string("\000\374\000\000\000\000\000\001", 8)},
+ {-1125899906842624ll, string("\000\374\000\000\000\000\000\000", 8)},
+ {-1125899906842625ll, string("\000\373\377\377\377\377\377\377", 8)},
+ {-2251799813685247ll, string("\000\370\000\000\000\000\000\001", 8)},
+ {-2251799813685248ll, string("\000\370\000\000\000\000\000\000", 8)},
+ {-2251799813685249ll, string("\000\367\377\377\377\377\377\377", 8)},
+ {-4503599627370495ll, string("\000\360\000\000\000\000\000\001", 8)},
+ {-4503599627370496ll, string("\000\360\000\000\000\000\000\000", 8)},
+ {-4503599627370497ll, string("\000\357\377\377\377\377\377\377", 8)},
+ {-9007199254740991ll, string("\000\340\000\000\000\000\000\001", 8)},
+ {-9007199254740992ll, string("\000\340\000\000\000\000\000\000", 8)},
+ {-9007199254740993ll, string("\000\337\377\377\377\377\377\377", 8)},
+ {-18014398509481983ll, string("\000\300\000\000\000\000\000\001", 8)},
+ {-18014398509481984ll, string("\000\300\000\000\000\000\000\000", 8)},
+ {-18014398509481985ll, string("\000\277\377\377\377\377\377\377", 8)},
+ {-36028797018963967ll, string("\000\200\000\000\000\000\000\001", 8)},
+ {-36028797018963968ll, string("\000\200\000\000\000\000\000\000", 8)},
+ {-36028797018963969ll, string("\000\177\177\377\377\377\377\377\377", 9)},
+ {-72057594037927935ll, string("\000\177\000\000\000\000\000\000\001", 9)},
+ {-72057594037927936ll, string("\000\177\000\000\000\000\000\000\000", 9)},
+ {-72057594037927937ll, string("\000~\377\377\377\377\377\377\377", 9)},
+ {-144115188075855871ll, string("\000~\000\000\000\000\000\000\001", 9)},
+ {-144115188075855872ll, string("\000~\000\000\000\000\000\000\000", 9)},
+ {-144115188075855873ll, string("\000}\377\377\377\377\377\377\377", 9)},
+ {-288230376151711743ll, string("\000|\000\000\000\000\000\000\001", 9)},
+ {-288230376151711744ll, string("\000|\000\000\000\000\000\000\000", 9)},
+ {-288230376151711745ll, string("\000{\377\377\377\377\377\377\377", 9)},
+ {-576460752303423487ll, string("\000x\000\000\000\000\000\000\001", 9)},
+ {-576460752303423488ll, string("\000x\000\000\000\000\000\000\000", 9)},
+ {-576460752303423489ll, string("\000w\377\377\377\377\377\377\377", 9)},
+ {-1152921504606846975ll, string("\000p\000\000\000\000\000\000\001", 9)},
+ {-1152921504606846976ll, string("\000p\000\000\000\000\000\000\000", 9)},
+ {-1152921504606846977ll, string("\000o\377\377\377\377\377\377\377", 9)},
+ {-2305843009213693951ll, string("\000`\000\000\000\000\000\000\001", 9)},
+ {-2305843009213693952ll, string("\000`\000\000\000\000\000\000\000", 9)},
+ {-2305843009213693953ll, string("\000_\377\377\377\377\377\377\377", 9)},
+ {-4611686018427387903ll, string("\000@\000\000\000\000\000\000\001", 9)},
+ {-4611686018427387904ll, string("\000@\000\000\000\000\000\000\000", 9)},
+ {-4611686018427387905ll,
+ string("\000?\277\377\377\377\377\377\377\377", 10)},
+ {-9223372036854775807ll,
+ string("\000?\200\000\000\000\000\000\000\001", 10)},
+ {9223372036854775807ll,
+ string("\377\300\177\377\377\377\377\377\377\377", 10)},
+ };
+ for (const auto& t : data) {
+ int64 num = t.first;
+ string result;
+ OrderedCode::WriteSignedNumIncreasing(&result, num);
+ EXPECT_EQ(t.second, result) << std::hex << num;
+
+ StringPiece in = result;
+ int64 decoded;
+ EXPECT_TRUE(OrderedCode::ReadSignedNumIncreasing(&in, &decoded));
+ EXPECT_EQ(num, decoded);
+ EXPECT_EQ("", in);
+ }
+}
+
+static void BM_WriteString(int n, int len) {
+ testing::StopTiming();
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ string x;
+ for (int i = 0; i < len; i++) {
+ x += rnd.Uniform(256);
+ }
+ string y;
+
+ testing::BytesProcessed(n * len);
+ testing::StartTiming();
+ while (n-- > 0) {
+ y.clear();
+ OCWriteToString<string>(&y, x);
+ }
+}
+
+static void BM_ReadString(int n, int len) {
+ testing::StopTiming();
+ random::PhiloxRandom philox(301, 17);
+ random::SimplePhilox rnd(&philox);
+ string x;
+ for (int i = 0; i < len; i++) {
+ x += rnd.Uniform(256);
+ }
+ string data;
+ OCWriteToString<string>(&data, x);
+ string result;
+
+ testing::BytesProcessed(n * len);
+ testing::StartTiming();
+ while (n-- > 0) {
+ result.clear();
+ StringPiece s = data;
+ OCRead<string>(&s, &result);
+ }
+}
+
+static void BM_WriteStringIncreasing(int n, int len) { BM_WriteString(n, len); }
+static void BM_ReadStringIncreasing(int n, int len) { BM_ReadString(n, len); }
+
+BENCHMARK(BM_WriteStringIncreasing)->Range(0, 1024);
+BENCHMARK(BM_ReadStringIncreasing)->Range(0, 1024);
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/str_util.cc b/tensorflow/core/lib/strings/str_util.cc
new file mode 100644
index 0000000000..cccd50c7ff
--- /dev/null
+++ b/tensorflow/core/lib/strings/str_util.cc
@@ -0,0 +1,312 @@
+#include "tensorflow/core/lib/strings/str_util.h"
+#include <ctype.h>
+
+namespace tensorflow {
+namespace str_util {
+
+static char hex_char[] = "0123456789abcdef";
+
+string CEscape(const string& src) {
+ string dest;
+
+ for (unsigned char c : src) {
+ switch (c) {
+ case '\n':
+ dest.append("\\n");
+ break;
+ case '\r':
+ dest.append("\\r");
+ break;
+ case '\t':
+ dest.append("\\t");
+ break;
+ case '\"':
+ dest.append("\\\"");
+ break;
+ case '\'':
+ dest.append("\\'");
+ break;
+ case '\\':
+ dest.append("\\\\");
+ break;
+ default:
+        // We emit a fixed-width, three-digit octal escape rather than \xNN:
+        // if we emitted \xNN and the next source character were a hex digit,
+        // C would interpret that digit as part of the escape sequence.
+ if ((c >= 0x80) || !isprint(c)) {
+ dest.append("\\");
+ dest.push_back(hex_char[c / 64]);
+ dest.push_back(hex_char[(c % 64) / 8]);
+ dest.push_back(hex_char[c % 8]);
+ } else {
+ dest.push_back(c);
+ break;
+ }
+ }
+ }
+
+ return dest;
+}
+
+namespace { // Private helpers for CUnescape().
+
+inline bool is_octal_digit(unsigned char c) { return c >= '0' && c <= '7'; }
+
+inline bool ascii_isxdigit(unsigned char c) {
+ return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
+ (c >= 'A' && c <= 'F');
+}
+
+inline int hex_digit_to_int(char c) {
+ int x = static_cast<unsigned char>(c);
+ if (x > '9') {
+ x += 9;
+ }
+ return x & 0xf;
+}
+
+bool CUnescapeInternal(StringPiece source, char* dest, int* dest_len,
+ string* error) {
+ char* d = dest;
+ const char* p = source.data();
+ const char* end = source.end();
+ const char* last_byte = end - 1;
+
+ // Small optimization for case where source = dest and there's no escaping
+ while (p == d && p < end && *p != '\\') p++, d++;
+
+ while (p < end) {
+ if (*p != '\\') {
+ *d++ = *p++;
+ } else {
+ if (++p > last_byte) { // skip past the '\\'
+ if (error) *error = "String cannot end with \\";
+ return false;
+ }
+ switch (*p) {
+ case 'a':
+ *d++ = '\a';
+ break;
+ case 'b':
+ *d++ = '\b';
+ break;
+ case 'f':
+ *d++ = '\f';
+ break;
+ case 'n':
+ *d++ = '\n';
+ break;
+ case 'r':
+ *d++ = '\r';
+ break;
+ case 't':
+ *d++ = '\t';
+ break;
+ case 'v':
+ *d++ = '\v';
+ break;
+ case '\\':
+ *d++ = '\\';
+ break;
+ case '?':
+ *d++ = '\?';
+ break; // \? Who knew?
+ case '\'':
+ *d++ = '\'';
+ break;
+ case '"':
+ *d++ = '\"';
+ break;
+ case '0':
+ case '1':
+ case '2':
+ case '3': // octal digit: 1 to 3 digits
+ case '4':
+ case '5':
+ case '6':
+ case '7': {
+ const char* octal_start = p;
+ unsigned int ch = *p - '0';
+ if (p < last_byte && is_octal_digit(p[1])) ch = ch * 8 + *++p - '0';
+ if (p < last_byte && is_octal_digit(p[1]))
+ ch = ch * 8 + *++p - '0'; // now points at last digit
+ if (ch > 0xff) {
+ if (error) {
+ *error = "Value of \\" +
+ string(octal_start, p + 1 - octal_start) +
+ " exceeds 0xff";
+ }
+ return false;
+ }
+ *d++ = ch;
+ break;
+ }
+ case 'x':
+ case 'X': {
+ if (p >= last_byte) {
+ if (error) *error = "String cannot end with \\x";
+ return false;
+ } else if (!ascii_isxdigit(p[1])) {
+ if (error) *error = "\\x cannot be followed by a non-hex digit";
+ return false;
+ }
+ unsigned int ch = 0;
+ const char* hex_start = p;
+ while (p < last_byte && ascii_isxdigit(p[1]))
+ // Arbitrarily many hex digits
+ ch = (ch << 4) + hex_digit_to_int(*++p);
+ if (ch > 0xFF) {
+ if (error) {
+ *error = "Value of \\" + string(hex_start, p + 1 - hex_start) +
+ " exceeds 0xff";
+ }
+ return false;
+ }
+ *d++ = ch;
+ break;
+ }
+ default: {
+ if (error) *error = string("Unknown escape sequence: \\") + *p;
+ return false;
+ }
+ }
+ p++; // read past letter we escaped
+ }
+ }
+ *dest_len = d - dest;
+ return true;
+}
+
+} // namespace
+
+bool CUnescape(StringPiece source, string* dest, string* error) {
+ dest->resize(source.size());
+ int dest_size;
+ if (!CUnescapeInternal(source, const_cast<char*>(dest->data()), &dest_size,
+ error)) {
+ return false;
+ }
+ dest->erase(dest_size);
+ return true;
+}
+
+bool NumericParse32(const string& text, int32* val) {
+ // Slow, but this code is not performance critical, and this
+ // doesn't bring in any new dependencies
+ char junk;
+ if (sscanf(text.c_str(), "%d%c", val, &junk) == 1) {
+ return true;
+ } else {
+ return false;
+ }
+}
+
+void StripTrailingWhitespace(string* s) {
+ string::size_type i;
+ for (i = s->size(); i > 0 && isspace((*s)[i - 1]); --i) {
+ }
+ s->resize(i);
+}
+
+// Return lower-cased version of s.
+string Lowercase(StringPiece s) {
+ string result(s.data(), s.size());
+ for (char& c : result) {
+ c = tolower(c);
+ }
+ return result;
+}
+
+// Return upper-cased version of s.
+string Uppercase(StringPiece s) {
+ string result(s.data(), s.size());
+ for (char& c : result) {
+ c = toupper(c);
+ }
+ return result;
+}
+
+void TitlecaseString(string* s, StringPiece delimiters) {
+ bool upper = true;
+ for (string::iterator ss = s->begin(); ss != s->end(); ++ss) {
+ if (upper) {
+ *ss = toupper(*ss);
+ }
+ upper = (delimiters.find(*ss) != StringPiece::npos);
+ }
+}
+
+size_t RemoveLeadingWhitespace(StringPiece* text) {
+ size_t count = 0;
+ const char* ptr = text->data();
+ while (count < text->size() && isspace(*ptr)) {
+ count++;
+ ptr++;
+ }
+ text->remove_prefix(count);
+ return count;
+}
+
+size_t RemoveTrailingWhitespace(StringPiece* text) {
+ size_t count = 0;
+ const char* ptr = text->data() + text->size() - 1;
+ while (count < text->size() && isspace(*ptr)) {
+ ++count;
+ --ptr;
+ }
+ text->remove_suffix(count);
+ return count;
+}
+
+size_t RemoveWhitespaceContext(StringPiece* text) {
+ // use RemoveLeadingWhitespace() and RemoveTrailingWhitespace() to do the job
+ return (RemoveLeadingWhitespace(text) + RemoveTrailingWhitespace(text));
+}
+
+bool ConsumePrefix(StringPiece* s, StringPiece expected) {
+ if (s->starts_with(expected)) {
+ s->remove_prefix(expected.size());
+ return true;
+ }
+ return false;
+}
+
+bool ConsumeLeadingDigits(StringPiece* s, uint64* val) {
+ const char* p = s->data();
+ const char* limit = p + s->size();
+ uint64 v = 0;
+ while (p < limit) {
+ const char c = *p;
+ if (c < '0' || c > '9') break;
+ uint64 new_v = (v * 10) + (c - '0');
+ if (new_v < v) {
+ // Overflow occurred
+ return false;
+ }
+ v = new_v;
+ p++;
+ }
+ if (p > s->data()) {
+ // Consume some digits
+ s->remove_prefix(p - s->data());
+ *val = v;
+ return true;
+ } else {
+ return false;
+ }
+}
+
+bool SplitAndParseAsInts(StringPiece text, char delim,
+ std::vector<int32>* result) {
+ result->clear();
+ std::vector<string> num_strings = Split(text, delim);
+ for (const auto& s : num_strings) {
+ int32 num;
+ if (!NumericParse32(s, &num)) return false;
+ result->push_back(num);
+ }
+ return true;
+}
+
+} // namespace str_util
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/str_util.h b/tensorflow/core/lib/strings/str_util.h
new file mode 100644
index 0000000000..34ea462b2d
--- /dev/null
+++ b/tensorflow/core/lib/strings/str_util.h
@@ -0,0 +1,149 @@
+#ifndef TENSORFLOW_LIB_STRINGS_STR_UTIL_H_
+#define TENSORFLOW_LIB_STRINGS_STR_UTIL_H_
+
+#include <string>
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+// Basic string utility routines
+namespace tensorflow {
+namespace str_util {
+
+// Returns a version of 'src' where unprintable characters have been
+// escaped using C-style escape sequences.
+string CEscape(const string& src);
+
+// Copies "source" to "dest", rewriting C-style escape sequences --
+// '\n', '\r', '\\', '\ooo', etc -- to their ASCII equivalents.
+//
+// Errors: Sets the description of the first encountered error in
+// 'error'. To disable error reporting, set 'error' to NULL.
+//
+// NOTE: Does not support \u or \U!
+bool CUnescape(StringPiece source, string* dest, string* error);
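+//
+// Illustrative sketch (not part of the API contract): a CEscape/CUnescape
+// round trip using the declarations above.
+//   string escaped = str_util::CEscape("tab\tend\n");   // escaped == "tab\\tend\\n"
+//   string raw, err;
+//   if (str_util::CUnescape(escaped, &raw, &err)) {
+//     // raw == "tab\tend\n"
+//   }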
+
+// If "text" can be successfully parsed as the ASCII representation of
+// an integer, sets "*val" to the value and returns true. Otherwise,
+// returns false.
+bool NumericParse32(const string& text, int32* val);
+
+// Removes any trailing whitespace from "*s".
+void StripTrailingWhitespace(string* s);
+
+// Removes leading ascii_isspace() characters.
+// Returns number of characters removed.
+size_t RemoveLeadingWhitespace(StringPiece* text);
+
+// Removes trailing ascii_isspace() characters.
+// Returns number of characters removed.
+size_t RemoveTrailingWhitespace(StringPiece* text);
+
+// Removes leading and trailing ascii_isspace() chars.
+// Returns number of chars removed.
+size_t RemoveWhitespaceContext(StringPiece* text);
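+//
+// Illustrative sketch: for StringPiece p("  hi  "),
+// RemoveWhitespaceContext(&p) returns 4 and leaves p == "hi".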
+
+// Consume a leading positive integer value. If any digits were
+// found, store the value of the leading unsigned number in "*val",
+// advance "*s" past the consumed number, and return true. Returns
+// false if no digits were found or if the value would overflow uint64.
+bool ConsumeLeadingDigits(StringPiece* s, uint64* val);
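+//
+// Illustrative sketch: for StringPiece p("123abc") and uint64 v,
+// ConsumeLeadingDigits(&p, &v) returns true with v == 123 and p == "abc";
+// for p("abc") it returns false and leaves p unchanged.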
+
+// If "*s" starts with "expected", consume it and return true.
+// Otherwise, return false.
+bool ConsumePrefix(StringPiece* s, StringPiece expected);
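+//
+// Illustrative sketch: for StringPiece p("abcdef"),
+// ConsumePrefix(&p, "abc") returns true and leaves p == "def";
+// ConsumePrefix(&p, "xyz") then returns false and leaves p unchanged.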
+
+// Return lower-cased version of s.
+string Lowercase(StringPiece s);
+
+// Return upper-cased version of s.
+string Uppercase(StringPiece s);
+
+// Capitalize first character of each word in "*s". "delimiters" is a
+// set of characters that can be used as word boundaries.
+void TitlecaseString(string* s, StringPiece delimiters);
+
+// Join functionality
+template <typename T>
+string Join(const std::vector<T>& s, const char* sep);
+template <typename T>
+string Join(const gtl::ArraySlice<T>& s, const char* sep);
+
+struct AllowEmpty {
+ bool operator()(StringPiece sp) const { return true; }
+};
+struct SkipEmpty {
+ bool operator()(StringPiece sp) const { return !sp.empty(); }
+};
+struct SkipWhitespace {
+ bool operator()(StringPiece sp) const {
+ RemoveTrailingWhitespace(&sp);
+ return !sp.empty();
+ }
+};
+
+std::vector<string> Split(StringPiece text, char delim);
+template <typename Predicate>
+std::vector<string> Split(StringPiece text, char delim, Predicate p);
+
+// Split "text" at "delim" characters, and parse each component as
+// an integer. If successful, adds the individual numbers in order
+// to "*result" and returns true. Otherwise returns false.
+bool SplitAndParseAsInts(StringPiece text, char delim,
+ std::vector<int32>* result);
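+
+// Illustrative sketch (mirrors the unit tests; the values are examples only):
+//   std::vector<string> parts = Split("a,,b", ',');               // {"a", "", "b"}
+//   std::vector<string> kept = Split("a,,b", ',', SkipEmpty());   // {"a", "b"}
+//   string joined = Join(parts, "|");                             // "a||b"
+//   std::vector<int32> nums;
+//   bool ok = SplitAndParseAsInts("1,-2,3", ',', &nums);          // {1, -2, 3}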
+
+// ------------------------------------------------------------------
+// Implementation details below
+namespace internal {
+template <typename T>
+string JoinHelper(typename gtl::ArraySlice<T>::const_iterator begin,
+ typename gtl::ArraySlice<T>::const_iterator end,
+ const char* sep) {
+ string result;
+ bool first = true;
+ for (typename gtl::ArraySlice<T>::const_iterator it = begin; it != end;
+ ++it) {
+ tensorflow::strings::StrAppend(&result, (first ? "" : sep), *it);
+ first = false;
+ }
+ return result;
+}
+} // namespace internal
+
+template <typename T>
+string Join(const std::vector<T>& s, const char* sep) {
+ return Join<T>(gtl::ArraySlice<T>(s), sep);
+}
+
+template <typename T>
+string Join(const gtl::ArraySlice<T>& s, const char* sep) {
+ return internal::JoinHelper<T>(s.begin(), s.end(), sep);
+}
+
+inline std::vector<string> Split(StringPiece text, char delim) {
+ return Split(text, delim, AllowEmpty());
+}
+
+template <typename Predicate>
+std::vector<string> Split(StringPiece text, char delim, Predicate p) {
+ std::vector<string> result;
+ int token_start = 0;
+ if (!text.empty()) {
+ for (int i = 0; i < text.size() + 1; i++) {
+ if ((i == text.size()) || (text[i] == delim)) {
+ StringPiece token(text.data() + token_start, i - token_start);
+ if (p(token)) {
+ result.push_back(token.ToString());
+ }
+ token_start = i + 1;
+ }
+ }
+ }
+ return result;
+}
+
+} // namespace str_util
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_STRINGS_STR_UTIL_H_
diff --git a/tensorflow/core/lib/strings/str_util_test.cc b/tensorflow/core/lib/strings/str_util_test.cc
new file mode 100644
index 0000000000..f71cc6c609
--- /dev/null
+++ b/tensorflow/core/lib/strings/str_util_test.cc
@@ -0,0 +1,258 @@
+#include "tensorflow/core/lib/strings/str_util.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(CEscape, Basic) {
+ EXPECT_EQ(str_util::CEscape("hello"), "hello");
+ EXPECT_EQ(str_util::CEscape("hello\n"), "hello\\n");
+ EXPECT_EQ(str_util::CEscape("hello\r"), "hello\\r");
+ EXPECT_EQ(str_util::CEscape("\t\r\"'"), "\\t\\r\\\"\\'");
+ EXPECT_EQ(str_util::CEscape("\320hi\200"), "\\320hi\\200");
+}
+
+string ExpectCUnescapeSuccess(StringPiece source) {
+ string dest;
+ string error;
+ EXPECT_TRUE(str_util::CUnescape(source, &dest, &error)) << error;
+ return dest;
+}
+
+TEST(CUnescape, Basic) {
+ EXPECT_EQ("hello", ExpectCUnescapeSuccess("hello"));
+ EXPECT_EQ("hello\n", ExpectCUnescapeSuccess("hello\\n"));
+ EXPECT_EQ("hello\r", ExpectCUnescapeSuccess("hello\\r"));
+ EXPECT_EQ("\t\r\"'", ExpectCUnescapeSuccess("\\t\\r\\\"\\'"));
+ EXPECT_EQ("\320hi\200", ExpectCUnescapeSuccess("\\320hi\\200"));
+}
+
+TEST(NumericParse32, Basic) {
+ int32 val = -1234;
+ EXPECT_TRUE(str_util::NumericParse32("0", &val) && val == 0);
+ EXPECT_TRUE(str_util::NumericParse32("123", &val) && val == 123);
+ EXPECT_TRUE(str_util::NumericParse32("-375", &val) && val == -375);
+ EXPECT_FALSE(str_util::NumericParse32("123hello", &val));
+ EXPECT_FALSE(str_util::NumericParse32("hello123", &val));
+}
+
+TEST(StripTrailingWhitespace, Basic) {
+ string test;
+ test = "hello";
+ str_util::StripTrailingWhitespace(&test);
+ EXPECT_EQ(test, "hello");
+
+ test = "foo ";
+ str_util::StripTrailingWhitespace(&test);
+ EXPECT_EQ(test, "foo");
+
+ test = " ";
+ str_util::StripTrailingWhitespace(&test);
+ EXPECT_EQ(test, "");
+
+ test = "";
+ str_util::StripTrailingWhitespace(&test);
+ EXPECT_EQ(test, "");
+
+ test = " abc\t";
+ str_util::StripTrailingWhitespace(&test);
+ EXPECT_EQ(test, " abc");
+}
+
+TEST(RemoveLeadingWhitespace, Basic) {
+ string text = " \t \n \r Quick\t";
+ StringPiece data(text);
+ // check that all whitespace is removed
+ EXPECT_EQ(str_util::RemoveLeadingWhitespace(&data), 11);
+ EXPECT_EQ(data, StringPiece("Quick\t"));
+ // check that non-whitespace is not removed
+ EXPECT_EQ(str_util::RemoveLeadingWhitespace(&data), 0);
+ EXPECT_EQ(data, StringPiece("Quick\t"));
+}
+
+TEST(RemoveLeadingWhitespace, TerminationHandling) {
+ // check termination handling
+ string text = "\t";
+ StringPiece data(text);
+ EXPECT_EQ(str_util::RemoveLeadingWhitespace(&data), 1);
+ EXPECT_EQ(data, StringPiece(""));
+
+ // check termination handling again
+ EXPECT_EQ(str_util::RemoveLeadingWhitespace(&data), 0);
+ EXPECT_EQ(data, StringPiece(""));
+}
+
+TEST(RemoveTrailingWhitespace, Basic) {
+ string text = " \t \n \r Quick \t";
+ StringPiece data(text);
+ // check that all whitespace is removed
+ EXPECT_EQ(str_util::RemoveTrailingWhitespace(&data), 2);
+ EXPECT_EQ(data, StringPiece(" \t \n \r Quick"));
+ // check that non-whitespace is not removed
+ EXPECT_EQ(str_util::RemoveTrailingWhitespace(&data), 0);
+ EXPECT_EQ(data, StringPiece(" \t \n \r Quick"));
+}
+
+TEST(RemoveTrailingWhitespace, TerminationHandling) {
+ // check termination handling
+ string text = "\t";
+ StringPiece data(text);
+ EXPECT_EQ(str_util::RemoveTrailingWhitespace(&data), 1);
+ EXPECT_EQ(data, StringPiece(""));
+
+ // check termination handling again
+ EXPECT_EQ(str_util::RemoveTrailingWhitespace(&data), 0);
+ EXPECT_EQ(data, StringPiece(""));
+}
+
+TEST(RemoveWhitespaceContext, Basic) {
+ string text = " \t \n \r Quick \t";
+ StringPiece data(text);
+ // check that all whitespace is removed
+ EXPECT_EQ(str_util::RemoveWhitespaceContext(&data), 13);
+ EXPECT_EQ(data, StringPiece("Quick"));
+ // check that non-whitespace is not removed
+ EXPECT_EQ(str_util::RemoveWhitespaceContext(&data), 0);
+ EXPECT_EQ(data, StringPiece("Quick"));
+
+ // Test empty string
+ text = "";
+ data = text;
+ EXPECT_EQ(str_util::RemoveWhitespaceContext(&data), 0);
+ EXPECT_EQ(data, StringPiece(""));
+}
+
+void TestConsumeLeadingDigits(StringPiece s, int64 expected,
+ StringPiece remaining) {
+ uint64 v;
+ StringPiece input(s);
+ if (str_util::ConsumeLeadingDigits(&input, &v)) {
+ EXPECT_EQ(v, static_cast<uint64>(expected));
+ EXPECT_EQ(input, remaining);
+ } else {
+ EXPECT_LT(expected, 0);
+ EXPECT_EQ(input, remaining);
+ }
+}
+
+TEST(ConsumeLeadingDigits, Basic) {
+ TestConsumeLeadingDigits("123", 123, "");
+ TestConsumeLeadingDigits("a123", -1, "a123");
+ TestConsumeLeadingDigits("9_", 9, "_");
+ TestConsumeLeadingDigits("11111111111xyz", 11111111111ll, "xyz");
+
+ // Overflow case
+ TestConsumeLeadingDigits("1111111111111111111111111111111xyz", -1,
+ "1111111111111111111111111111111xyz");
+
+ // 2^64
+ TestConsumeLeadingDigits("18446744073709551616xyz", -1,
+ "18446744073709551616xyz");
+ // 2^64-1
+ TestConsumeLeadingDigits("18446744073709551615xyz", 18446744073709551615ull,
+ "xyz");
+}
+
+TEST(ConsumePrefix, Basic) {
+ string s("abcdef");
+ StringPiece input(s);
+ EXPECT_FALSE(str_util::ConsumePrefix(&input, "abcdefg"));
+ EXPECT_EQ(input, "abcdef");
+
+ EXPECT_FALSE(str_util::ConsumePrefix(&input, "abce"));
+ EXPECT_EQ(input, "abcdef");
+
+ EXPECT_TRUE(str_util::ConsumePrefix(&input, ""));
+ EXPECT_EQ(input, "abcdef");
+
+ EXPECT_FALSE(str_util::ConsumePrefix(&input, "abcdeg"));
+ EXPECT_EQ(input, "abcdef");
+
+ EXPECT_TRUE(str_util::ConsumePrefix(&input, "abcdef"));
+ EXPECT_EQ(input, "");
+
+ input = s;
+ EXPECT_TRUE(str_util::ConsumePrefix(&input, "abcde"));
+ EXPECT_EQ(input, "f");
+}
+
+TEST(JoinStrings, Basic) {
+ std::vector<string> s;
+ s = {"hi"};
+ EXPECT_EQ(str_util::Join(s, " "), "hi");
+ s = {"hi", "there", "strings"};
+ EXPECT_EQ(str_util::Join(s, " "), "hi there strings");
+
+ std::vector<StringPiece> sp;
+ sp = {"hi"};
+ EXPECT_EQ(str_util::Join(sp, ",,"), "hi");
+ sp = {"hi", "there", "strings"};
+ EXPECT_EQ(str_util::Join(sp, "--"), "hi--there--strings");
+}
+
+TEST(Split, Basic) {
+ EXPECT_TRUE(str_util::Split("", ',').empty());
+ EXPECT_EQ(str_util::Join(str_util::Split("a", ','), "|"), "a");
+ EXPECT_EQ(str_util::Join(str_util::Split(",", ','), "|"), "|");
+ EXPECT_EQ(str_util::Join(str_util::Split("a,b,c", ','), "|"), "a|b|c");
+ EXPECT_EQ(str_util::Join(str_util::Split("a,,,b,,c,", ','), "|"),
+ "a|||b||c|");
+ EXPECT_EQ(str_util::Join(
+ str_util::Split("a,,,b,,c,", ',', str_util::SkipEmpty()), "|"),
+ "a|b|c");
+ EXPECT_EQ(
+ str_util::Join(
+ str_util::Split("a, ,b,,c,", ',', str_util::SkipWhitespace()), "|"),
+ "a|b|c");
+}
+
+TEST(SplitAndParseAsInts, Basic) {
+ std::vector<int32> nums;
+ EXPECT_TRUE(str_util::SplitAndParseAsInts("", ',', &nums));
+ EXPECT_EQ(nums.size(), 0);
+
+ EXPECT_TRUE(str_util::SplitAndParseAsInts("134", ',', &nums));
+ EXPECT_EQ(nums.size(), 1);
+ EXPECT_EQ(nums[0], 134);
+
+ EXPECT_TRUE(str_util::SplitAndParseAsInts("134,2,13,-5", ',', &nums));
+ EXPECT_EQ(nums.size(), 4);
+ EXPECT_EQ(nums[0], 134);
+ EXPECT_EQ(nums[1], 2);
+ EXPECT_EQ(nums[2], 13);
+ EXPECT_EQ(nums[3], -5);
+
+ EXPECT_FALSE(str_util::SplitAndParseAsInts("abc", ',', &nums));
+
+ EXPECT_FALSE(str_util::SplitAndParseAsInts("-13,abc", ',', &nums));
+
+ EXPECT_FALSE(str_util::SplitAndParseAsInts("13,abc,5", ',', &nums));
+}
+
+TEST(Lowercase, Basic) {
+ EXPECT_EQ("", str_util::Lowercase(""));
+ EXPECT_EQ("hello", str_util::Lowercase("hello"));
+ EXPECT_EQ("hello world", str_util::Lowercase("Hello World"));
+}
+
+TEST(Uppercase, Basic) {
+ EXPECT_EQ("", str_util::Uppercase(""));
+ EXPECT_EQ("HELLO", str_util::Uppercase("hello"));
+ EXPECT_EQ("HELLO WORLD", str_util::Uppercase("Hello World"));
+}
+
+TEST(TitlecaseString, Basic) {
+ string s = "sparse_lookup";
+ str_util::TitlecaseString(&s, "_");
+ ASSERT_EQ(s, "Sparse_Lookup");
+
+ s = "sparse_lookup";
+ str_util::TitlecaseString(&s, " ");
+ ASSERT_EQ(s, "Sparse_lookup");
+
+ s = "dense";
+ str_util::TitlecaseString(&s, " ");
+ ASSERT_EQ(s, "Dense");
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/strcat.cc b/tensorflow/core/lib/strings/strcat.cc
new file mode 100644
index 0000000000..e564b9eb73
--- /dev/null
+++ b/tensorflow/core/lib/strings/strcat.cc
@@ -0,0 +1,194 @@
+#include "tensorflow/core/lib/strings/strcat.h"
+
+#include <stdarg.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+
+namespace tensorflow {
+namespace strings {
+
+AlphaNum gEmptyAlphaNum("");
+
+AlphaNum::AlphaNum(Hex hex) {
+ char *const end = &digits_[kFastToBufferSize];
+ char *writer = end;
+ uint64 value = hex.value;
+ uint64 width = hex.spec;
+ // We accomplish minimum width by OR'ing in 0x10000 to the user's value,
+ // where 0x10000 is the smallest hex number that is as wide as the user
+ // asked for.
+ uint64 mask = ((static_cast<uint64>(1) << (width - 1) * 4)) | value;
+ static const char hexdigits[] = "0123456789abcdef";
+ do {
+ *--writer = hexdigits[value & 0xF];
+ value >>= 4;
+ mask >>= 4;
+ } while (mask != 0);
+ piece_.set(writer, end - writer);
+}
+
+// ----------------------------------------------------------------------
+// StrCat()
+// This merges the given strings or integers, with no delimiter. This
+// is designed to be the fastest possible way to construct a string out
+// of a mix of raw C strings, StringPieces, strings, and integer values.
+// ----------------------------------------------------------------------
+
+// Append is merely a version of memcpy that returns the address of the byte
+// after the area just overwritten. It comes in multiple flavors to minimize
+// call overhead.
+static char *Append1(char *out, const AlphaNum &x) {
+ memcpy(out, x.data(), x.size());
+ return out + x.size();
+}
+
+static char *Append2(char *out, const AlphaNum &x1, const AlphaNum &x2) {
+ memcpy(out, x1.data(), x1.size());
+ out += x1.size();
+
+ memcpy(out, x2.data(), x2.size());
+ return out + x2.size();
+}
+
+static char *Append4(char *out, const AlphaNum &x1, const AlphaNum &x2,
+ const AlphaNum &x3, const AlphaNum &x4) {
+ memcpy(out, x1.data(), x1.size());
+ out += x1.size();
+
+ memcpy(out, x2.data(), x2.size());
+ out += x2.size();
+
+ memcpy(out, x3.data(), x3.size());
+ out += x3.size();
+
+ memcpy(out, x4.data(), x4.size());
+ return out + x4.size();
+}
+
+string StrCat(const AlphaNum &a, const AlphaNum &b) {
+ string result;
+ gtl::STLStringResizeUninitialized(&result, a.size() + b.size());
+ char *const begin = &*result.begin();
+ char *out = Append2(begin, a, b);
+ DCHECK_EQ(out, begin + result.size());
+ return result;
+}
+
+string StrCat(const AlphaNum &a, const AlphaNum &b, const AlphaNum &c) {
+ string result;
+ gtl::STLStringResizeUninitialized(&result, a.size() + b.size() + c.size());
+ char *const begin = &*result.begin();
+ char *out = Append2(begin, a, b);
+ out = Append1(out, c);
+ DCHECK_EQ(out, begin + result.size());
+ return result;
+}
+
+string StrCat(const AlphaNum &a, const AlphaNum &b, const AlphaNum &c,
+ const AlphaNum &d) {
+ string result;
+ gtl::STLStringResizeUninitialized(&result,
+ a.size() + b.size() + c.size() + d.size());
+ char *const begin = &*result.begin();
+ char *out = Append4(begin, a, b, c, d);
+ DCHECK_EQ(out, begin + result.size());
+ return result;
+}
+
+namespace internal {
+
+// Do not call directly - these are not part of the public API.
+string CatPieces(std::initializer_list<StringPiece> pieces) {
+ string result;
+ size_t total_size = 0;
+ for (const StringPiece piece : pieces) total_size += piece.size();
+ gtl::STLStringResizeUninitialized(&result, total_size);
+
+ char *const begin = &*result.begin();
+ char *out = begin;
+ for (const StringPiece piece : pieces) {
+ const size_t this_size = piece.size();
+ memcpy(out, piece.data(), this_size);
+ out += this_size;
+ }
+ DCHECK_EQ(out, begin + result.size());
+ return result;
+}
+
+// It's possible to call StrAppend with a StringPiece that is itself a fragment
+// of the string we're appending to; however, the result of doing so is
+// undefined. Therefore, check for this in debug mode. Use unsigned math so we
+// only have to do one comparison.
+#define DCHECK_NO_OVERLAP(dest, src) \
+ DCHECK_GE(uintptr_t((src).data() - (dest).data()), uintptr_t((dest).size()))
+
+void AppendPieces(string *result, std::initializer_list<StringPiece> pieces) {
+ size_t old_size = result->size();
+ size_t total_size = old_size;
+ for (const StringPiece piece : pieces) {
+ DCHECK_NO_OVERLAP(*result, piece);
+ total_size += piece.size();
+ }
+ gtl::STLStringResizeUninitialized(result, total_size);
+
+ char *const begin = &*result->begin();
+ char *out = begin + old_size;
+ for (const StringPiece piece : pieces) {
+ const size_t this_size = piece.size();
+ memcpy(out, piece.data(), this_size);
+ out += this_size;
+ }
+ DCHECK_EQ(out, begin + result->size());
+}
+
+} // namespace internal
+
+void StrAppend(string *result, const AlphaNum &a) {
+ DCHECK_NO_OVERLAP(*result, a);
+ result->append(a.data(), a.size());
+}
+
+void StrAppend(string *result, const AlphaNum &a, const AlphaNum &b) {
+ DCHECK_NO_OVERLAP(*result, a);
+ DCHECK_NO_OVERLAP(*result, b);
+ string::size_type old_size = result->size();
+ gtl::STLStringResizeUninitialized(result, old_size + a.size() + b.size());
+ char *const begin = &*result->begin();
+ char *out = Append2(begin + old_size, a, b);
+ DCHECK_EQ(out, begin + result->size());
+}
+
+void StrAppend(string *result, const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c) {
+ DCHECK_NO_OVERLAP(*result, a);
+ DCHECK_NO_OVERLAP(*result, b);
+ DCHECK_NO_OVERLAP(*result, c);
+ string::size_type old_size = result->size();
+ gtl::STLStringResizeUninitialized(result,
+ old_size + a.size() + b.size() + c.size());
+ char *const begin = &*result->begin();
+ char *out = Append2(begin + old_size, a, b);
+ out = Append1(out, c);
+ DCHECK_EQ(out, begin + result->size());
+}
+
+void StrAppend(string *result, const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c, const AlphaNum &d) {
+ DCHECK_NO_OVERLAP(*result, a);
+ DCHECK_NO_OVERLAP(*result, b);
+ DCHECK_NO_OVERLAP(*result, c);
+ DCHECK_NO_OVERLAP(*result, d);
+ string::size_type old_size = result->size();
+ gtl::STLStringResizeUninitialized(
+ result, old_size + a.size() + b.size() + c.size() + d.size());
+ char *const begin = &*result->begin();
+ char *out = Append4(begin + old_size, a, b, c, d);
+ DCHECK_EQ(out, begin + result->size());
+}
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/strcat.h b/tensorflow/core/lib/strings/strcat.h
new file mode 100644
index 0000000000..763ad8368a
--- /dev/null
+++ b/tensorflow/core/lib/strings/strcat.h
@@ -0,0 +1,229 @@
+// #status: RECOMMENDED
+// #category: operations on strings
+// #summary: Merges strings or numbers with no delimiter.
+//
+#ifndef TENSORFLOW_LIB_STRINGS_STRCAT_H_
+#define TENSORFLOW_LIB_STRINGS_STRCAT_H_
+
+#include <string>
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/strings/numbers.h"
+#include "tensorflow/core/platform/port.h"
+
+// The AlphaNum type was designed to be used as the parameter type for StrCat().
+// Any routine accepting either a string or a number may accept it.
+// The basic idea is that by accepting a "const AlphaNum &" as an argument
+// to your function, your callers will automagically convert bools, integers,
+// and floating point values to strings for you.
+//
+// NOTE: Use of AlphaNum outside of the //strings package is unsupported except
+// for the specific case of function parameters of type "AlphaNum" or "const
+// AlphaNum &". In particular, instantiating AlphaNum directly as a stack
+// variable is not supported.
+//
+// Conversion from 8-bit values is not accepted because if it were, then an
+// attempt to pass ':' instead of ":" might result in a 58 ending up in your
+// result.
+//
+// Bools convert to "0" or "1".
+//
+// Floating point values are converted to a string which, if passed to strtod(),
+// would produce the exact same original double (except in case of NaN; all NaNs
+// are considered the same value). We try to keep the string short but it's not
+// guaranteed to be as short as possible.
+//
+// You can convert to Hexadecimal output rather than Decimal output using Hex.
+// To do this, pass strings::Hex(my_int) as a parameter to StrCat. You may
+// specify a minimum field width using a separate parameter, so the equivalent
+// of Printf("%04x", my_int) is StrCat(Hex(my_int, strings::ZERO_PAD_4))
+//
+// This class has implicit constructors.
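+//
+// Illustrative sketch of the Hex form described above (values are examples):
+//   string s = strings::StrCat("0x", strings::Hex(0x1f, strings::ZERO_PAD_4));
+//   // s == "0x001f"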
+namespace tensorflow {
+namespace strings {
+
+enum PadSpec {
+ NO_PAD = 1,
+ ZERO_PAD_2,
+ ZERO_PAD_3,
+ ZERO_PAD_4,
+ ZERO_PAD_5,
+ ZERO_PAD_6,
+ ZERO_PAD_7,
+ ZERO_PAD_8,
+ ZERO_PAD_9,
+ ZERO_PAD_10,
+ ZERO_PAD_11,
+ ZERO_PAD_12,
+ ZERO_PAD_13,
+ ZERO_PAD_14,
+ ZERO_PAD_15,
+ ZERO_PAD_16,
+};
+
+struct Hex {
+ uint64 value;
+ enum PadSpec spec;
+ template <class Int>
+ explicit Hex(Int v, PadSpec s = NO_PAD)
+ : spec(s) {
+ // Prevent sign-extension by casting integers to
+ // their unsigned counterparts.
+ static_assert(
+ sizeof(v) == 1 || sizeof(v) == 2 || sizeof(v) == 4 || sizeof(v) == 8,
+ "Unknown integer type");
+ value = sizeof(v) == 1
+ ? static_cast<uint8>(v)
+ : sizeof(v) == 2 ? static_cast<uint16>(v)
+ : sizeof(v) == 4 ? static_cast<uint32>(v)
+ : static_cast<uint64>(v);
+ }
+};
+
+class AlphaNum {
+ public:
+ // No bool ctor -- bools convert to an integral type.
+ // A bool ctor would also convert incoming pointers (bletch).
+
+ AlphaNum(int i32) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastInt32ToBufferLeft(i32, digits_) - &digits_[0]) {}
+ AlphaNum(unsigned int u32) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastUInt32ToBufferLeft(u32, digits_) - &digits_[0]) {}
+ AlphaNum(long x) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastInt64ToBufferLeft(x, digits_) - &digits_[0]) {}
+ AlphaNum(unsigned long x) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastUInt64ToBufferLeft(x, digits_) - &digits_[0]) {}
+ AlphaNum(long long int i64) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastInt64ToBufferLeft(i64, digits_) - &digits_[0]) {}
+ AlphaNum(unsigned long long int u64) // NOLINT(runtime/explicit)
+ : piece_(digits_, FastUInt64ToBufferLeft(u64, digits_) - &digits_[0]) {}
+
+ AlphaNum(float f) // NOLINT(runtime/explicit)
+ : piece_(digits_, strlen(FloatToBuffer(f, digits_))) {}
+ AlphaNum(double f) // NOLINT(runtime/explicit)
+ : piece_(digits_, strlen(DoubleToBuffer(f, digits_))) {}
+
+ AlphaNum(Hex hex); // NOLINT(runtime/explicit)
+
+ AlphaNum(const char *c_str) : piece_(c_str) {} // NOLINT(runtime/explicit)
+ AlphaNum(const StringPiece &pc) : piece_(pc) {} // NOLINT(runtime/explicit)
+ AlphaNum(const tensorflow::string &str) // NOLINT(runtime/explicit)
+ : piece_(str) {}
+
+ StringPiece::size_type size() const { return piece_.size(); }
+ const char *data() const { return piece_.data(); }
+ StringPiece Piece() const { return piece_; }
+
+ private:
+ StringPiece piece_;
+ char digits_[kFastToBufferSize];
+
+ // Use ":" not ':'
+ AlphaNum(char c); // NOLINT(runtime/explicit)
+
+ TF_DISALLOW_COPY_AND_ASSIGN(AlphaNum);
+};
+
+extern AlphaNum gEmptyAlphaNum;
+
+using strings::AlphaNum;
+using strings::gEmptyAlphaNum;
+
+// ----------------------------------------------------------------------
+// StrCat()
+// This merges the given strings or numbers, with no delimiter. This
+// is designed to be the fastest possible way to construct a string out
+// of a mix of raw C strings, StringPieces, strings, bool values,
+// and numeric values.
+//
+// Don't use this for user-visible strings. The localization process
+// works poorly on strings built up out of fragments.
+//
+// For clarity and performance, don't use StrCat when appending to a
+// string. In particular, avoid using any of these (anti-)patterns:
+// str.append(StrCat(...))
+// str += StrCat(...)
+// str = StrCat(str, ...)
+// where the last is the worst, with the potential to change a loop
+// from a linear time operation with O(1) dynamic allocations into a
+// quadratic time operation with O(n) dynamic allocations. StrAppend
+// is a better choice than any of the above, subject to the restriction
+// of StrAppend(&str, a, b, c, ...) that none of the a, b, c, ... may
+// be a reference into str.
+// ----------------------------------------------------------------------
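+//
+// Illustrative sketch of the guidance above ('pieces' is a placeholder):
+// prefer StrAppend when building a string incrementally.
+//   string out;
+//   for (const string& piece : pieces) {
+//     StrAppend(&out, piece, ",");   // amortized linear growth
+//   }
+//   // rather than: out = StrCat(out, piece, ",");  // quadratic in the worst case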
+
+// For performance reasons, we have specializations for <= 4 args.
+string StrCat(const AlphaNum &a) TF_MUST_USE_RESULT;
+string StrCat(const AlphaNum &a, const AlphaNum &b) TF_MUST_USE_RESULT;
+string StrCat(const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c) TF_MUST_USE_RESULT;
+string StrCat(const AlphaNum &a, const AlphaNum &b, const AlphaNum &c,
+ const AlphaNum &d) TF_MUST_USE_RESULT;
+
+// inline definitions must be duplicated due to TF_MUST_USE_RESULT
+inline string StrCat(const AlphaNum &a) { return string(a.data(), a.size()); }
+
+namespace internal {
+
+// Do not call directly - this is not part of the public API.
+string CatPieces(std::initializer_list<StringPiece> pieces);
+void AppendPieces(string *dest, std::initializer_list<StringPiece> pieces);
+
+} // namespace internal
+
+// Support 5 or more arguments
+template <typename... AV>
+string StrCat(const AlphaNum &a, const AlphaNum &b, const AlphaNum &c,
+ const AlphaNum &d, const AlphaNum &e,
+ const AV &... args) TF_MUST_USE_RESULT;
+
+template <typename... AV>
+inline string StrCat(const AlphaNum &a, const AlphaNum &b, const AlphaNum &c,
+ const AlphaNum &d, const AlphaNum &e, const AV &... args) {
+ return internal::CatPieces({a.Piece(), b.Piece(), c.Piece(), d.Piece(),
+ e.Piece(),
+ static_cast<const AlphaNum &>(args).Piece()...});
+}
+
+// ----------------------------------------------------------------------
+// StrAppend()
+// Same as above, but adds the output to the given string.
+// WARNING: For speed, StrAppend does not try to check each of its input
+// arguments to be sure that they are not a subset of the string being
+// appended to. That is, while this will work:
+//
+// string s = "foo";
+// s += s;
+//
+// This will not (necessarily) work:
+//
+// string s = "foo";
+// StrAppend(&s, s);
+//
+// Note: the variadic overloads below let both StrCat and StrAppend accept
+// an arbitrary number of arguments; if a single call becomes unwieldy,
+// consecutive calls to StrAppend are also quite efficient.
+// ----------------------------------------------------------------------
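+//
+// Illustrative sketch of a safe alternative when the text to append aliases
+// the destination: copy it first (the names here are placeholders).
+//   string s = "foo";
+//   string tail = s;
+//   StrAppend(&s, tail);   // fine; StrAppend(&s, s) would trip the overlap DCHECK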
+
+void StrAppend(string *dest, const AlphaNum &a);
+void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b);
+void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c);
+void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c, const AlphaNum &d);
+
+// Support 5 or more arguments
+template <typename... AV>
+inline void StrAppend(string *dest, const AlphaNum &a, const AlphaNum &b,
+ const AlphaNum &c, const AlphaNum &d, const AlphaNum &e,
+ const AV &... args) {
+ internal::AppendPieces(dest,
+ {a.Piece(), b.Piece(), c.Piece(), d.Piece(), e.Piece(),
+ static_cast<const AlphaNum &>(args).Piece()...});
+}
+
+} // namespace strings
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_STRINGS_STRCAT_H_
diff --git a/tensorflow/core/lib/strings/strcat_test.cc b/tensorflow/core/lib/strings/strcat_test.cc
new file mode 100644
index 0000000000..9ff7d81af9
--- /dev/null
+++ b/tensorflow/core/lib/strings/strcat_test.cc
@@ -0,0 +1,324 @@
+#include "tensorflow/core/lib/strings/strcat.h"
+
+#include <string>
+
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/port.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace strings {
+
+// Test StrCat of ints and longs of various sizes and signedness.
+TEST(StrCat, Ints) {
+ const int16 s = -1;
+ const uint16 us = 2;
+ const int i = -3;
+ const unsigned int ui = 4;
+ const int32 l = -5;
+ const uint32 ul = 6;
+ const int64 ll = -7;
+ const uint64 ull = 8;
+ const ptrdiff_t ptrdiff = -9;
+ const size_t size = 10;
+ const ssize_t ssize = -11;
+ const intptr_t intptr = -12;
+ const uintptr_t uintptr = 13;
+ string answer;
+ answer = StrCat(s, us);
+ EXPECT_EQ(answer, "-12");
+ answer = StrCat(i, ui);
+ EXPECT_EQ(answer, "-34");
+ answer = StrCat(l, ul);
+ EXPECT_EQ(answer, "-56");
+ answer = StrCat(ll, ull);
+ EXPECT_EQ(answer, "-78");
+ answer = StrCat(ptrdiff, size);
+ EXPECT_EQ(answer, "-910");
+ answer = StrCat(ssize, intptr);
+ EXPECT_EQ(answer, "-11-12");
+ answer = StrCat(uintptr, 0);
+ EXPECT_EQ(answer, "130");
+}
+
+TEST(StrCat, Basics) {
+ string result;
+
+ string strs[] = {"Hello", "Cruel", "World"};
+
+ StringPiece pieces[] = {"Hello", "Cruel", "World"};
+
+ const char *c_strs[] = {"Hello", "Cruel", "World"};
+
+ int32 i32s[] = {'H', 'C', 'W'};
+ uint64 ui64s[] = {12345678910LL, 10987654321LL};
+
+ result = StrCat(false, true, 2, 3);
+ EXPECT_EQ(result, "0123");
+
+ result = StrCat(-1);
+ EXPECT_EQ(result, "-1");
+
+ result = StrCat(0.5);
+ EXPECT_EQ(result, "0.5");
+
+ result = StrCat(strs[1], pieces[2]);
+ EXPECT_EQ(result, "CruelWorld");
+
+ result = StrCat(strs[0], ", ", pieces[2]);
+ EXPECT_EQ(result, "Hello, World");
+
+ result = StrCat(strs[0], ", ", strs[1], " ", strs[2], "!");
+ EXPECT_EQ(result, "Hello, Cruel World!");
+
+ result = StrCat(pieces[0], ", ", pieces[1], " ", pieces[2]);
+ EXPECT_EQ(result, "Hello, Cruel World");
+
+ result = StrCat(c_strs[0], ", ", c_strs[1], " ", c_strs[2]);
+ EXPECT_EQ(result, "Hello, Cruel World");
+
+ result = StrCat("ASCII ", i32s[0], ", ", i32s[1], " ", i32s[2], "!");
+ EXPECT_EQ(result, "ASCII 72, 67 87!");
+
+ result = StrCat(ui64s[0], ", ", ui64s[1], "!");
+ EXPECT_EQ(result, "12345678910, 10987654321!");
+
+ string one = "1"; // Actually, it's the size of this string that we want; a
+ // 64-bit build distinguishes between size_t and uint64,
+ // even though they're both unsigned 64-bit values.
+ result = StrCat("And a ", one.size(), " and a ", &result[2] - &result[0],
+ " and a ", one, " 2 3 4", "!");
+ EXPECT_EQ(result, "And a 1 and a 2 and a 1 2 3 4!");
+
+ // result = StrCat("Single chars won't compile", '!');
+ // result = StrCat("Neither will NULLs", NULL);
+ result = StrCat("To output a char by ASCII/numeric value, use +: ", '!' + 0);
+ EXPECT_EQ(result, "To output a char by ASCII/numeric value, use +: 33");
+
+ float f = 100000.5;
+ result = StrCat("A hundred K and a half is ", f);
+ EXPECT_EQ(result, "A hundred K and a half is 100000.5");
+
+ double d = f;
+ d *= d;
+ result = StrCat("A hundred K and a half squared is ", d);
+ EXPECT_EQ(result, "A hundred K and a half squared is 10000100000.25");
+
+ result = StrCat(1, 2, 333, 4444, 55555, 666666, 7777777, 88888888, 999999999);
+ EXPECT_EQ(result, "12333444455555666666777777788888888999999999");
+}
+
+TEST(StrCat, MaxArgs) {
+ string result;
+  // Test 10 up to 26 arguments
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a");
+ EXPECT_EQ(result, "123456789a");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b");
+ EXPECT_EQ(result, "123456789ab");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c");
+ EXPECT_EQ(result, "123456789abc");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d");
+ EXPECT_EQ(result, "123456789abcd");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e");
+ EXPECT_EQ(result, "123456789abcde");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f");
+ EXPECT_EQ(result, "123456789abcdef");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g");
+ EXPECT_EQ(result, "123456789abcdefg");
+ result =
+ StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g", "h");
+ EXPECT_EQ(result, "123456789abcdefgh");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i");
+ EXPECT_EQ(result, "123456789abcdefghi");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j");
+ EXPECT_EQ(result, "123456789abcdefghij");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k");
+ EXPECT_EQ(result, "123456789abcdefghijk");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l");
+ EXPECT_EQ(result, "123456789abcdefghijkl");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l", "m");
+ EXPECT_EQ(result, "123456789abcdefghijklm");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l", "m", "n");
+ EXPECT_EQ(result, "123456789abcdefghijklmn");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l", "m", "n", "o");
+ EXPECT_EQ(result, "123456789abcdefghijklmno");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l", "m", "n", "o", "p");
+ EXPECT_EQ(result, "123456789abcdefghijklmnop");
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g",
+ "h", "i", "j", "k", "l", "m", "n", "o", "p", "q");
+ EXPECT_EQ(result, "123456789abcdefghijklmnopq");
+ // No limit thanks to C++11's variadic templates
+ result = StrCat(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, "a", "b", "c", "d", "e", "f",
+ "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
+ "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D",
+ "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P",
+ "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z");
+ EXPECT_EQ(result,
+ "12345678910abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
+}
+
+TEST(StrAppend, Basics) {
+ string result = "existing text";
+
+ string strs[] = {"Hello", "Cruel", "World"};
+
+ StringPiece pieces[] = {"Hello", "Cruel", "World"};
+
+ const char *c_strs[] = {"Hello", "Cruel", "World"};
+
+ int32 i32s[] = {'H', 'C', 'W'};
+ uint64 ui64s[] = {12345678910LL, 10987654321LL};
+
+ string::size_type old_size = result.size();
+ StrAppend(&result, strs[0]);
+ EXPECT_EQ(result.substr(old_size), "Hello");
+
+ old_size = result.size();
+ StrAppend(&result, strs[1], pieces[2]);
+ EXPECT_EQ(result.substr(old_size), "CruelWorld");
+
+ old_size = result.size();
+ StrAppend(&result, strs[0], ", ", pieces[2]);
+ EXPECT_EQ(result.substr(old_size), "Hello, World");
+
+ old_size = result.size();
+ StrAppend(&result, strs[0], ", ", strs[1], " ", strs[2], "!");
+ EXPECT_EQ(result.substr(old_size), "Hello, Cruel World!");
+
+ old_size = result.size();
+ StrAppend(&result, pieces[0], ", ", pieces[1], " ", pieces[2]);
+ EXPECT_EQ(result.substr(old_size), "Hello, Cruel World");
+
+ old_size = result.size();
+ StrAppend(&result, c_strs[0], ", ", c_strs[1], " ", c_strs[2]);
+ EXPECT_EQ(result.substr(old_size), "Hello, Cruel World");
+
+ old_size = result.size();
+ StrAppend(&result, "ASCII ", i32s[0], ", ", i32s[1], " ", i32s[2], "!");
+ EXPECT_EQ(result.substr(old_size), "ASCII 72, 67 87!");
+
+ old_size = result.size();
+ StrAppend(&result, ui64s[0], ", ", ui64s[1], "!");
+ EXPECT_EQ(result.substr(old_size), "12345678910, 10987654321!");
+
+ string one = "1"; // Actually, it's the size of this string that we want; a
+ // 64-bit build distinguishes between size_t and uint64,
+ // even though they're both unsigned 64-bit values.
+ old_size = result.size();
+ StrAppend(&result, "And a ", one.size(), " and a ", &result[2] - &result[0],
+ " and a ", one, " 2 3 4", "!");
+ EXPECT_EQ(result.substr(old_size), "And a 1 and a 2 and a 1 2 3 4!");
+
+ // result = StrCat("Single chars won't compile", '!');
+ // result = StrCat("Neither will NULLs", NULL);
+ old_size = result.size();
+ StrAppend(&result, "To output a char by ASCII/numeric value, use +: ",
+ '!' + 0);
+ EXPECT_EQ(result.substr(old_size),
+ "To output a char by ASCII/numeric value, use +: 33");
+
+ float f = 100000.5;
+ old_size = result.size();
+ StrAppend(&result, "A hundred K and a half is ", f);
+ EXPECT_EQ(result.substr(old_size), "A hundred K and a half is 100000.5");
+
+ double d = f;
+ d *= d;
+ old_size = result.size();
+ StrAppend(&result, "A hundred K and a half squared is ", d);
+ EXPECT_EQ(result.substr(old_size),
+ "A hundred K and a half squared is 10000100000.25");
+
+ // Test 9 arguments, the old maximum
+ old_size = result.size();
+ StrAppend(&result, 1, 22, 333, 4444, 55555, 666666, 7777777, 88888888, 9);
+ EXPECT_EQ(result.substr(old_size), "1223334444555556666667777777888888889");
+
+ // No limit thanks to C++11's variadic templates
+ old_size = result.size();
+ StrAppend(&result, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, "a", "b", "c", "d", "e",
+ "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r",
+ "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E",
+ "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R",
+ "S", "T", "U", "V", "W", "X", "Y", "Z",
+ "No limit thanks to C++11's variadic templates");
+ EXPECT_EQ(result.substr(old_size),
+ "12345678910abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
+ "No limit thanks to C++11's variadic templates");
+}
+
+TEST(StrAppend, Death) {
+ string s = "self";
+ EXPECT_DEBUG_DEATH(StrAppend(&s, s.c_str() + 1), "Check failed:");
+ EXPECT_DEBUG_DEATH(StrAppend(&s, s), "Check failed:");
+}
+
+static void CheckHex64(uint64 v) {
+ using tensorflow::strings::Hex;
+ string actual = StrCat(Hex(v, tensorflow::strings::ZERO_PAD_16));
+ string expected = Printf("%016llx", static_cast<unsigned long long>(v));
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+
+ actual = StrCat(Hex(v, tensorflow::strings::ZERO_PAD_8));
+ expected = Printf("%08llx", static_cast<unsigned long long>(v));
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+
+ actual = StrCat(Hex(v));
+ expected = Printf("%llx", static_cast<unsigned long long>(v));
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+}
+
+static void CheckHex32(uint32 v) {
+ using tensorflow::strings::Hex;
+ string actual = StrCat(Hex(v, tensorflow::strings::ZERO_PAD_8));
+ string expected = Printf("%08x", v);
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+
+ actual = StrCat(Hex(v));
+ expected = Printf("%x", v);
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+}
+
+static void CheckHexSigned32(int32 v) {
+ using tensorflow::strings::Hex;
+ string actual = StrCat(Hex(v, tensorflow::strings::ZERO_PAD_8));
+ string expected = Printf("%08x", v);
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+
+ actual = StrCat(Hex(v));
+ expected = Printf("%x", v);
+ EXPECT_EQ(expected, actual) << " decimal value " << v;
+}
+
+static void TestFastPrints() {
+ using tensorflow::strings::Hex;
+
+ // Test min int to make sure that works
+ for (int i = 0; i < 10000; i++) {
+ CheckHex64(i);
+ CheckHex32(i);
+ CheckHexSigned32(i);
+ CheckHexSigned32(-i);
+ }
+ CheckHex64(0x123456789abcdef0ull);
+ CheckHex32(0x12345678);
+
+ int8 minus_one_8bit = -1;
+ EXPECT_EQ("ff", StrCat(Hex(minus_one_8bit)));
+
+ int16 minus_one_16bit = -1;
+ EXPECT_EQ("ffff", StrCat(Hex(minus_one_16bit)));
+}
+
+TEST(Numbers, TestFunctionsMovedOverFromNumbersMain) { TestFastPrints(); }
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/stringprintf.cc b/tensorflow/core/lib/strings/stringprintf.cc
new file mode 100644
index 0000000000..b354706cbd
--- /dev/null
+++ b/tensorflow/core/lib/strings/stringprintf.cc
@@ -0,0 +1,85 @@
+#include "tensorflow/core/lib/strings/stringprintf.h"
+
+#include <errno.h>
+#include <stdarg.h> // For va_list and related operations
+#include <stdio.h> // MSVC requires this for _vsnprintf
+#include <vector>
+
+namespace tensorflow {
+namespace strings {
+
+#ifdef COMPILER_MSVC
+enum { IS_COMPILER_MSVC = 1 };
+#else
+enum { IS_COMPILER_MSVC = 0 };
+#endif
+
+void Appendv(string* dst, const char* format, va_list ap) {
+ // First try with a small fixed size buffer
+ static const int kSpaceLength = 1024;
+ char space[kSpaceLength];
+
+ // It's possible for methods that use a va_list to invalidate
+ // the data in it upon use. The fix is to make a copy
+ // of the structure before using it and use that copy instead.
+ va_list backup_ap;
+ va_copy(backup_ap, ap);
+ int result = vsnprintf(space, kSpaceLength, format, backup_ap);
+ va_end(backup_ap);
+
+ if (result < kSpaceLength) {
+ if (result >= 0) {
+ // Normal case -- everything fit.
+ dst->append(space, result);
+ return;
+ }
+
+ if (IS_COMPILER_MSVC) {
+ // Error or MSVC running out of space. MSVC 8.0 and higher
+ // can be asked about space needed with the special idiom below:
+ va_copy(backup_ap, ap);
+ result = vsnprintf(NULL, 0, format, backup_ap);
+ va_end(backup_ap);
+ }
+
+ if (result < 0) {
+ // Just an error.
+ return;
+ }
+ }
+
+ // Increase the buffer size to the size requested by vsnprintf,
+ // plus one for the closing \0.
+ int length = result + 1;
+ char* buf = new char[length];
+
+ // Restore the va_list before we use it again
+ va_copy(backup_ap, ap);
+ result = vsnprintf(buf, length, format, backup_ap);
+ va_end(backup_ap);
+
+ if (result >= 0 && result < length) {
+ // It fit
+ dst->append(buf, result);
+ }
+ delete[] buf;
+}
+
+string Printf(const char* format, ...) {
+ va_list ap;
+ va_start(ap, format);
+ string result;
+ Appendv(&result, format, ap);
+ va_end(ap);
+ return result;
+}
+
+void Appendf(string* dst, const char* format, ...) {
+ va_list ap;
+ va_start(ap, format);
+ Appendv(dst, format, ap);
+ va_end(ap);
+}
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/lib/strings/stringprintf.h b/tensorflow/core/lib/strings/stringprintf.h
new file mode 100644
index 0000000000..23ca2583ca
--- /dev/null
+++ b/tensorflow/core/lib/strings/stringprintf.h
@@ -0,0 +1,37 @@
+// Printf variants that place their output in a C++ string.
+//
+// Usage:
+// string result = strings::Printf("%d %s\n", 10, "hello");
+// strings::Appendf(&result, "%d %s\n", 20, "there");
+
+#ifndef TENSORFLOW_LIB_STRINGS_STRINGPRINTF_H_
+#define TENSORFLOW_LIB_STRINGS_STRINGPRINTF_H_
+
+#include <stdarg.h>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace strings {
+
+// Return a C++ string
+extern string Printf(const char* format, ...)
+ // Tell the compiler to do printf format string checking.
+ TF_PRINTF_ATTRIBUTE(1, 2);
+
+// Append result to a supplied string
+extern void Appendf(string* dst, const char* format, ...)
+ // Tell the compiler to do printf format string checking.
+ TF_PRINTF_ATTRIBUTE(2, 3);
+
+// Lower-level routine that takes a va_list and appends to a specified
+// string. All other routines are just convenience wrappers around it.
+extern void Appendv(string* dst, const char* format, va_list ap);
+
+} // namespace strings
+} // namespace tensorflow
+
+#endif // TENSORFLOW_LIB_STRINGS_STRINGPRINTF_H_
diff --git a/tensorflow/core/lib/strings/stringprintf_test.cc b/tensorflow/core/lib/strings/stringprintf_test.cc
new file mode 100644
index 0000000000..737ed5c0e0
--- /dev/null
+++ b/tensorflow/core/lib/strings/stringprintf_test.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/core/lib/strings/stringprintf.h"
+
+#include <string>
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace strings {
+namespace {
+
+TEST(PrintfTest, Empty) {
+ EXPECT_EQ("", Printf("%s", string().c_str()));
+ EXPECT_EQ("", Printf("%s", ""));
+}
+
+TEST(PrintfTest, Misc) {
+// MSVC does not support $ format specifier.
+#if !defined(COMPILER_MSVC)
+ EXPECT_EQ("123hello w", Printf("%3$d%2$s %1$c", 'w', "hello", 123));
+#endif // !COMPILER_MSVC
+}
+
+TEST(AppendfTest, Empty) {
+ string value("Hello");
+ const char* empty = "";
+ Appendf(&value, "%s", empty);
+ EXPECT_EQ("Hello", value);
+}
+
+TEST(AppendfTest, EmptyString) {
+ string value("Hello");
+ Appendf(&value, "%s", "");
+ EXPECT_EQ("Hello", value);
+}
+
+TEST(AppendfTest, String) {
+ string value("Hello");
+ Appendf(&value, " %s", "World");
+ EXPECT_EQ("Hello World", value);
+}
+
+TEST(AppendfTest, Int) {
+ string value("Hello");
+ Appendf(&value, " %d", 123);
+ EXPECT_EQ("Hello 123", value);
+}
+
+TEST(PrintfTest, Multibyte) {
+  // If we are in multibyte mode and feed an invalid multibyte sequence,
+ // Printf should return an empty string instead of running
+ // out of memory while trying to determine destination buffer size.
+ // see b/4194543.
+
+ char* old_locale = setlocale(LC_CTYPE, NULL);
+ // Push locale with multibyte mode
+ setlocale(LC_CTYPE, "en_US.utf8");
+
+ const char kInvalidCodePoint[] = "\375\067s";
+ string value = Printf("%.*s", 3, kInvalidCodePoint);
+
+ // In some versions of glibc (e.g. eglibc-2.11.1, aka GRTEv2), snprintf
+ // returns error given an invalid codepoint. Other versions
+ // (e.g. eglibc-2.15, aka pre-GRTEv3) emit the codepoint verbatim.
+ // We test that the output is one of the above.
+ EXPECT_TRUE(value.empty() || value == kInvalidCodePoint);
+
+  // Repeat with a longer string, to make sure that the dynamically
+  // allocated path in Appendv is handled correctly.
+ int n = 2048;
+ char* buf = new char[n + 1];
+ memset(buf, ' ', n - 3);
+ memcpy(buf + n - 3, kInvalidCodePoint, 4);
+ value = Printf("%.*s", n, buf);
+ // See GRTEv2 vs. GRTEv3 comment above.
+ EXPECT_TRUE(value.empty() || value == buf);
+ delete[] buf;
+
+ setlocale(LC_CTYPE, old_locale);
+}
+
+TEST(PrintfTest, NoMultibyte) {
+ // No multibyte handling, but the string contains funny chars.
+ char* old_locale = setlocale(LC_CTYPE, NULL);
+ setlocale(LC_CTYPE, "POSIX");
+ string value = Printf("%.*s", 3, "\375\067s");
+ setlocale(LC_CTYPE, old_locale);
+ EXPECT_EQ("\375\067s", value);
+}
+
+TEST(PrintfTest, DontOverwriteErrno) {
+ // Check that errno isn't overwritten unless we're printing
+ // something significantly larger than what people are normally
+ // printing in their badly written PLOG() statements.
+ errno = ECHILD;
+ string value = Printf("Hello, %s!", "World");
+ EXPECT_EQ(ECHILD, errno);
+}
+
+TEST(PrintfTest, LargeBuf) {
+ // Check that the large buffer is handled correctly.
+ int n = 2048;
+ char* buf = new char[n + 1];
+ memset(buf, ' ', n);
+ buf[n] = 0;
+ string value = Printf("%s", buf);
+ EXPECT_EQ(buf, value);
+ delete[] buf;
+}
+
+} // namespace
+
+} // namespace strings
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/array_ops.cc b/tensorflow/core/ops/array_ops.cc
new file mode 100644
index 0000000000..8c0571b50e
--- /dev/null
+++ b/tensorflow/core/ops/array_ops.cc
@@ -0,0 +1,892 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("Pack")
+ .Input("values: N * T")
+ .Output("output: T")
+ .Attr("N: int >= 1")
+ .Attr("T: type")
+ .Doc(R"doc(
+Packs a list of `N` rank-`R` tensors into one rank-`(R+1)` tensor.
+
+Packs the `N` tensors in `values` into a tensor with rank one higher than each
+tensor in `values` and shape `[N] + values[0].shape`. The output satisfies
+`output[i, ...] = values[i][...]`.
+
+This is the opposite of `unpack`.
+
+values: Must be of same shape and type.
+output: The packed tensor.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Unpack")
+ .Input("value: T")
+ .Output("output: num * T")
+ .Attr("num: int >= 0")
+ .Attr("T: type")
+ .Doc(R"doc(
+Unpacks the outer dimension of a rank-`R` tensor into `num` rank-`(R-1)` tensors.
+
+Unpacks `num` tensors from `value` by chipping it along the first dimension.
+The i'th tensor in `output` is the slice `value[i, ...]`. Each tensor in
+`output` has shape `value.shape[1:]`.
+
+This is the opposite of `pack`.
+
+value: 1-D or higher, with first dimension `num`.
+output: The list of tensors unpacked from `value`.
+)doc");
+
+// --------------------------------------------------------------------------
+// TODO(josh11b): Remove the >= 2 constraint, once we can rewrite the graph
+// in the N == 1 case to remove the node.
+REGISTER_OP("Concat")
+ .Input("concat_dim: int32")
+ .Input("values: N * T")
+ .Output("output: T")
+ .Attr("N: int >= 2")
+ .Attr("T: type")
+ .Doc(R"doc(
+Concatenates tensors along one dimension.
+
+concat_dim: 0-D. The dimension along which to concatenate. Must be in the
+ range [0, rank(values)).
+values: The `N` Tensors to concatenate. Their ranks and types must match,
+ and their sizes must match in all dimensions except `concat_dim`.
+output: A `Tensor` with the concatenation of values stacked along the
+ `concat_dim` dimension. This tensor's shape matches that of `values` except
+ in `concat_dim` where it has the sum of the sizes.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Split")
+ .Input("split_dim: int32")
+ .Input("value: T")
+ .Output("output: num_split * T")
+ .Attr("num_split: int >= 1")
+ .Attr("T: type")
+ .Doc(R"doc(
+Splits a tensor into `num_split` tensors along one dimension.
+
+split_dim: 0-D. The dimension along which to split. Must be in the range
+ `[0, rank(value))`.
+num_split: The number of ways to split. Must evenly divide
+ `value.shape[split_dim]`.
+value: The tensor to split.
+output: Identically shaped tensors, whose shape matches that of `value`
+  except along `split_dim`, where their sizes are
+  `value.shape[split_dim] / num_split`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Const")
+ .Output("output: dtype")
+ .Attr("value: tensor")
+ .Attr("dtype: type")
+ .Doc(R"doc(
+Returns a constant tensor.
+
+value: Attr `value` is the tensor to return.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ZerosLike")
+ .Input("x: T")
+ .Output("y: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Returns a tensor of zeros with the same shape and type as x.
+
+x: a tensor of type T.
+y: a tensor of the same shape and type as x but filled with zeros.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Diag")
+ .Input("diagonal: T")
+ .Output("output: T")
+ .Attr("T: {float, double, int32, int64}")
+ .Doc(R"doc(
+Returns a diagonal tensor with the given diagonal values.
+
+Given a `diagonal`, this operation returns a tensor with the `diagonal` and
+everything else padded with zeros. The diagonal is computed as follows:
+
+Assume `diagonal` has dimensions [D1,..., Dk], then the output is a tensor of
+rank 2k with dimensions [D1,..., Dk, D1,..., Dk] where:
+
+`output[i1,..., ik, i1,..., ik] = diagonal[i1, ..., ik]` and 0 everywhere else.
+
+For example:
+
+```prettyprint
+# 'diagonal' is [1, 2, 3, 4]
+tf.diag(diagonal) ==> [[1, 0, 0, 0]
+ [0, 2, 0, 0]
+ [0, 0, 3, 0]
+ [0, 0, 0, 4]]
+```
+
+diagonal: Rank k tensor where k is at most 3.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Reverse")
+ .Input("tensor: T")
+ .Input("dims: bool")
+ .Output("output: T")
+ .Attr("T: {uint8, int8, int32, bool, float, double}")
+ .Doc(R"Doc(
+Reverses specific dimensions of a tensor.
+
+Given a `tensor`, and a `bool` tensor `dims` representing the dimensions
+of `tensor`, this operation reverses each dimension i of `tensor` where
+`dims[i]` is `True`.
+
+`tensor` can have up to 8 dimensions. The number of dimensions
+of `tensor` must equal the number of elements in `dims`. In other words:
+
+`rank(tensor) = size(dims)`
+
+For example:
+
+```prettyprint
+# tensor 't' is [[[[ 0, 1, 2, 3],
+# [ 4, 5, 6, 7],
+# [ 8, 9, 10, 11]],
+# [[12, 13, 14, 15],
+# [16, 17, 18, 19],
+# [20, 21, 22, 23]]]]
+# tensor 't' shape is [1, 2, 3, 4]
+
+# 'dims' is [False, False, False, True]
+reverse(t, dims) ==> [[[[ 3, 2, 1, 0],
+ [ 7, 6, 5, 4],
+ [ 11, 10, 9, 8]],
+ [[15, 14, 13, 12],
+ [19, 18, 17, 16],
+ [23, 22, 21, 20]]]]
+
+# 'dims' is [False, True, False, False]
+reverse(t, dims) ==> [[[[12, 13, 14, 15],
+                        [16, 17, 18, 19],
+                        [20, 21, 22, 23]],
+                       [[ 0,  1,  2,  3],
+                        [ 4,  5,  6,  7],
+                        [ 8,  9, 10, 11]]]]
+
+# 'dims' is [False, False, True, False]
+reverse(t, dims) ==> [[[[8, 9, 10, 11],
+                        [4, 5, 6, 7],
+                        [0, 1, 2, 3]],
+                       [[20, 21, 22, 23],
+                        [16, 17, 18, 19],
+                        [12, 13, 14, 15]]]]
+```
+
+tensor: Up to 8-D.
+dims: 1-D. The dimensions to reverse.
+output: The same shape as `tensor`.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("EditDistance")
+ .Input("hypothesis_indices: int64")
+ .Input("hypothesis_values: T")
+ .Input("hypothesis_shape: int64")
+ .Input("truth_indices: int64")
+ .Input("truth_values: T")
+ .Input("truth_shape: int64")
+ .Attr("normalize: bool = True")
+ .Attr("T: type")
+ .Output("output: float")
+ .Doc(R"doc(
+Computes the (possibly normalized) Levenshtein Edit Distance.
+
+The inputs are variable-length sequences provided by SparseTensors
+ (hypothesis_indices, hypothesis_values, hypothesis_shape)
+and
+ (truth_indices, truth_values, truth_shape).
+
+The inputs are:
+
+hypothesis_indices: The indices of the hypothesis list SparseTensor.
+ This is an N x R int64 matrix.
+hypothesis_values: The values of the hypothesis list SparseTensor.
+ This is an N-length vector.
+hypothesis_shape: The shape of the hypothesis list SparseTensor.
+ This is an R-length vector.
+truth_indices: The indices of the truth list SparseTensor.
+ This is an M x R int64 matrix.
+truth_values: The values of the truth list SparseTensor.
+ This is an M-length vector.
+truth_shape: The shape of the truth list SparseTensor.
+  This is an R-length vector.
+normalize: boolean (if true, edit distances are normalized by length of truth).
+
+The output is:
+
+output: A dense float tensor with rank R - 1.
+
+For the example input:
+
+ // hypothesis represents a 2x1 matrix with variable-length values:
+ // (0,0) = ["a"]
+ // (1,0) = ["b"]
+ hypothesis_indices = [[0, 0, 0],
+ [1, 0, 0]]
+ hypothesis_values = ["a", "b"]
+ hypothesis_shape = [2, 1, 1]
+
+ // truth represents a 2x2 matrix with variable-length values:
+ // (0,0) = []
+ // (0,1) = ["a"]
+ // (1,0) = ["b", "c"]
+ // (1,1) = ["a"]
+ truth_indices = [[0, 1, 0],
+ [1, 0, 0],
+ [1, 0, 1],
+ [1, 1, 0]]
+ truth_values = ["a", "b", "c", "a"]
+ truth_shape = [2, 2, 2]
+ normalize = true
+
+The output will be:
+
+ // output is a 2x2 matrix with edit distances normalized by truth lengths.
+ output = [[inf, 1.0], // (0,0): no truth, (0,1): no hypothesis
+ [0.5, 1.0]] // (1,0): addition, (1,1): no hypothesis
+)doc");
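+
+// Editorial sketch (not part of the original sources): a reference Levenshtein
+// distance between two token sequences, the quantity the op above computes per
+// (hypothesis, truth) pair before optional normalization by the truth length.
+// Assumes <vector>, <algorithm> and the usual string/int64 typedefs.
+inline int64 IllustrativeLevenshtein(const std::vector<string>& a,
+                                     const std::vector<string>& b) {
+  std::vector<int64> prev(b.size() + 1), cur(b.size() + 1);
+  for (size_t j = 0; j <= b.size(); ++j) prev[j] = j;  // deletions only
+  for (size_t i = 1; i <= a.size(); ++i) {
+    cur[0] = i;  // insertions only
+    for (size_t j = 1; j <= b.size(); ++j) {
+      const int64 sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
+      cur[j] = std::min(sub, std::min(prev[j] + 1, cur[j - 1] + 1));
+    }
+    prev.swap(cur);
+  }
+  return prev[b.size()];
+}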
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Fill")
+ .Input("dims: int32")
+ .Input("value: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Creates a tensor filled with a scalar value.
+
+This operation creates a tensor of shape `dims` and fills it with `value`.
+
+For example:
+
+```prettyprint
+# output tensor shape needs to be [2, 3]
+# so 'dims' is [2, 3]
+fill(dims, 9) ==> [[9, 9, 9]
+ [9, 9, 9]]
+```
+
+dims: 1-D. Represents the shape of the output tensor.
+value: 0-D (scalar). Value to fill the returned tensor.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Gather")
+ .Input("params: Tparams")
+ .Input("indices: Tindices")
+ .Output("output: Tparams")
+ .Attr("Tparams: type")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Gather slices from `params` according to `indices`.
+
+`indices` must be an integer tensor of any dimension (usually 0-D or 1-D).
+Produces an output tensor with shape `indices.shape + params.shape[1:]` where:
+
+ # Scalar indices
+ output[:, ..., :] = params[indices, :, ... :]
+
+ # Vector indices
+ output[i, :, ..., :] = params[indices[i], :, ... :]
+
+ # Higher rank indices
+ output[i, ..., j, :, ... :] = params[indices[i, ..., j], :, ..., :]
+
+If `indices` is a permutation and `len(indices) == params.shape[0]` then
+this operation will permute `params` accordingly.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/Gather.png" alt>
+</div>
+)doc");
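+
+// Editorial sketch (not part of the original sources): vector-indexed gather
+// on a 2-D array, matching `output[i, :] = params[indices[i], :]` above.
+// Assumes <vector> is available; the real kernel lives elsewhere.
+inline std::vector<std::vector<float>> IllustrativeGather(
+    const std::vector<std::vector<float>>& params,
+    const std::vector<int>& indices) {
+  std::vector<std::vector<float>> out;
+  out.reserve(indices.size());
+  for (int ix : indices) out.push_back(params[ix]);  // copy the selected rows
+  return out;
+}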
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Identity")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"Doc(
+Return a tensor with the same shape and contents as the input tensor or value.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("RefIdentity")
+ .Input("input: Ref(T)")
+ .Output("output: Ref(T)")
+ .Attr("T: type")
+ .Doc(R"Doc(
+Return the same ref tensor as the input ref tensor.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("StopGradient")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"Doc(
+Stops gradient computation.
+
+When executed in a graph, this op outputs its input tensor as-is.
+
+When building ops to compute gradients, this op prevents the contribution of
+its inputs to be taken into account. Normally, the gradient generator adds ops
+to a graph to compute the derivatives of a specified 'loss' by recursively
+finding out inputs that contributed to its computation. If you insert this op
+in the graph, its inputs are masked from the gradient generator. They are not
+taken into account for computing gradients.
+
+This is useful any time you want to compute a value with TensorFlow but need
+to pretend that the value was a constant. Some examples include:
+
+* The *EM* algorithm where the *M-step* should not involve backpropagation
+ through the output of the *E-step*.
+* Contrastive divergence training of Boltzmann machines where, when
+ differentiating the energy function, the training must not backpropagate
+ through the graph that generated the samples from the model.
+* Adversarial training, where no backprop should happen through the adversarial
+ example generation process.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("CheckNumerics")
+ .Input("tensor: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Attr("message: string")
+ .Doc(R"doc(
+Checks a tensor for NaN and Inf values.
+
+When run, reports an `InvalidArgument` error if `tensor` has any values
+that are not a number (NaN) or infinity (Inf). Otherwise, passes `tensor` as-is.
+
+message: Prefix of the error message.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Reshape")
+ .Input("tensor: T")
+ .Input("shape: int32")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"Doc(
+Reshapes a tensor.
+
+Given `tensor`, this operation returns a tensor that has the same values
+as `tensor` with shape `shape`.
+
+If `shape` is the special value `[-1]`, then `tensor` is flattened and the
+operation outputs a 1-D tensor with all elements of `tensor`.
+
+If `shape` is 1-D or higher, then the operation returns a tensor with shape
+`shape` filled with the values of `tensor`. In this case, the number of elements
+implied by `shape` must be the same as the number of elements in `tensor`.
+
+For example:
+
+```prettyprint
+# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
+# tensor 't' has shape [9]
+reshape(t, [3, 3]) ==> [[1, 2, 3]
+ [4, 5, 6]
+ [7, 8, 9]]
+
+# tensor 't' is [[[1, 1], [2, 2]]
+# [[3, 3], [4, 4]]]
+# tensor 't' has shape [2, 2, 2]
+reshape(t, [2, 4]) ==> [[1, 1, 2, 2]
+ [3, 3, 4, 4]]
+
+# tensor 't' is [[[1, 1, 1],
+# [2, 2, 2]],
+# [[3, 3, 3],
+# [4, 4, 4]],
+# [[5, 5, 5],
+# [6, 6, 6]]]
+# tensor 't' has shape [3, 2, 3]
+# pass '[-1]' to flatten 't'
+reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]
+```
+
+shape: Defines the shape of the output tensor.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("InvertPermutation")
+ .Input("x: int32")
+ .Output("y: int32")
+ .Doc(R"doc(
+Computes the inverse permutation of a tensor.
+
+This operation computes the inverse of an index permutation. It takes a 1-D
+integer tensor `x`, which represents the indices of a zero-based array, and
+swaps each value with its index position. In other words, for an output tensor
+`y` and an input tensor `x`, this operation computes the following:
+
+`y[x[i]] = i for i in [0, 1, ..., len(x) - 1]`
+
+The values must include 0. There can be no duplicate values or negative values.
+
+For example:
+
+```prettyprint
+# tensor `x` is [3, 4, 0, 2, 1]
+invert_permutation(x) ==> [2, 4, 3, 0, 1]
+```
+
+x: 1-D.
+y: 1-D.
+)doc");
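+
+// Editorial sketch (not part of the original sources): the rule
+// `y[x[i]] = i` applied to a valid permutation vector. Assumes <vector>.
+inline std::vector<int32> IllustrativeInvertPermutation(
+    const std::vector<int32>& x) {
+  std::vector<int32> y(x.size());
+  for (int32 i = 0; i < static_cast<int32>(x.size()); ++i) {
+    y[x[i]] = i;  // value x[i] came from position i
+  }
+  return y;  // e.g. [3, 4, 0, 2, 1] -> [2, 4, 3, 0, 1]
+}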
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Transpose")
+ .Input("x: T")
+ .Input("perm: int32")
+ .Output("y: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Shuffle dimensions of x according to a permutation.
+
+The output `y` has the same rank as `x`. The shapes of `x` and `y` satisfy:
+ `y.shape[i] == x.shape[perm[i]] for i in [0, 1, ..., rank(x) - 1]`
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Unique")
+ .Input("x: T")
+ .Output("y: T")
+ .Output("idx: int32")
+ .Attr("T: type")
+ .Doc(R"doc(
+Finds unique elements in a 1-D tensor.
+
+This operation returns a tensor `y` containing all of the unique elements of `x`
+sorted in the same order that they occur in `x`. This operation also returns a
+tensor `idx` the same size as `x` that contains the index of each value of `x`
+in the unique output `y`. In other words:
+
+`y[idx[i]] = x[i] for i in [0, 1,...,rank(x) - 1]`
+
+For example:
+
+```prettyprint
+# tensor 'x' is [1, 1, 2, 4, 4, 4, 7, 8, 8]
+y, idx = unique(x)
+y ==> [1, 2, 4, 7, 8]
+idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]
+```
+
+x: 1-D.
+y: 1-D.
+idx: 1-D.
+)doc");
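+
+// Editorial sketch (not part of the original sources): unique values in order
+// of first appearance plus the index mapping, so that y[idx[i]] == x[i].
+// Assumes <vector> and <unordered_map> are available.
+inline void IllustrativeUnique(const std::vector<int32>& x,
+                               std::vector<int32>* y, std::vector<int32>* idx) {
+  std::unordered_map<int32, int32> first_pos;  // value -> position in y
+  for (int32 v : x) {
+    auto it = first_pos.find(v);
+    if (it == first_pos.end()) {
+      it = first_pos.emplace(v, static_cast<int32>(y->size())).first;
+      y->push_back(v);
+    }
+    idx->push_back(it->second);
+  }
+}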
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Shape")
+ .Input("input: T")
+ .Output("output: int32")
+ .Attr("T: type")
+ .Doc(R"doc(
+Returns the shape of a tensor.
+
+This operation returns a 1-D integer tensor representing the shape of `input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+shape(t) ==> [2, 2, 3]
+```
+
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ReverseSequence")
+ .Input("input: T")
+ .Input("seq_lengths: int64")
+ .Output("output: T")
+ .Attr("seq_dim: int")
+ .Attr("T: type")
+ .Doc(R"doc(
+Reverses variable length slices in dimension `seq_dim`.
+
+This op first slices `input` along the first dimension, and for each slice `i`,
+reverses the first `seq_lengths[i]` elements along the dimension `seq_dim`.
+
+The elements of `seq_lengths` must obey `seq_lengths[i] < input.dims[seq_dim]`,
+and `seq_lengths` must be a vector of length `input.dims(0)`.
+
+The output slice `i` along dimension 0 is then given by input slice `i`, with
+the first `seq_lengths[i]` slices along dimension `seq_dim` reversed.
+
+For example:
+
+```prettyprint
+# Given this:
+seq_dim = 1
+input.dims = (4, ...)
+seq_lengths = [7, 2, 3, 5]
+
+# then slices of input are reversed on seq_dim, but only up to seq_lengths:
+output[0, 0:7, :, ...] = input[0, 7:0:-1, :, ...]
+output[1, 0:2, :, ...] = input[1, 2:0:-1, :, ...]
+output[2, 0:3, :, ...] = input[2, 3:0:-1, :, ...]
+output[3, 0:5, :, ...] = input[3, 5:0:-1, :, ...]
+
+# while entries past seq_lens are copied through:
+output[0, 7:, :, ...] = input[0, 7:, :, ...]
+output[1, 2:, :, ...] = input[1, 2:, :, ...]
+output[2, 3:, :, ...] = input[2, 3:, :, ...]
+output[3, 5:, :, ...] = input[3, 5:, :, ...]
+```
+
+input: The input to reverse.
+seq_lengths: 1-D with length `input.dims(0)` and
+ `max(seq_lengths) < input.dims(seq_dim)`
+seq_dim: The dimension which is partially reversed.
+output: The partially reversed input. It has the same shape as `input`.
+)doc");
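+
+// Editorial sketch (not part of the original sources): the 2-D case with
+// seq_dim == 1, reversing only the first seq_lengths[i] entries of row i and
+// copying the rest through. Assumes <vector> and <algorithm>; requires
+// seq_lengths[i] <= rows[i].size().
+inline void IllustrativeReverseSequence(std::vector<std::vector<float>>* rows,
+                                        const std::vector<int64>& seq_lengths) {
+  for (size_t i = 0; i < rows->size(); ++i) {
+    std::reverse((*rows)[i].begin(), (*rows)[i].begin() + seq_lengths[i]);
+  }
+}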
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Rank")
+ .Input("input: T")
+ .Output("output: int32")
+ .Attr("T: type")
+ .Doc(R"doc(
+Returns the rank of a tensor.
+
+This operation returns an integer representing the rank of `input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+# shape of tensor 't' is [2, 2, 3]
+rank(t) ==> 3
+```
+
+**Note**: The rank of a tensor is not the same as the rank of a matrix. The rank
+of a tensor is the number of indices required to uniquely select each element
+of the tensor. Rank is also known as "order", "degree", or "ndims."
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Size")
+ .Input("input: T")
+ .Output("output: int32")
+ .Attr("T: type")
+ .Doc(R"doc(
+Returns the size of a tensor.
+
+This operation returns an integer representing the number of elements in
+`input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+size(t) ==> 12
+```
+
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Slice")
+ .Input("input: T")
+ .Input("begin: Index")
+ .Input("size: Index")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("Index: {int32,int64}")
+ .Doc(R"doc(
+Return a slice from 'input'.
+
+The output tensor is a tensor with dimensions described by 'size'
+whose values are extracted from 'input' starting at the offsets in
+'begin'.
+
+*Requirements*:
+ 0 <= begin[i] <= begin[i] + size[i] <= Di for i in [0, n)
+
+begin: begin[i] specifies the offset into the 'i'th dimension of
+ 'input' to slice from.
+size: size[i] specifies the number of elements of the 'i'th dimension
+ of 'input' to slice. If size[i] is -1, all remaining elements in dimension
+ i are included in the slice (i.e. this is equivalent to setting
+ size[i] = input.dim_size(i) - begin[i]).
+)doc");
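+
+// Editorial sketch (not part of the original sources): 1-D slice semantics,
+// including size == -1 meaning "all remaining elements". Assumes <vector>.
+inline std::vector<float> IllustrativeSlice1D(const std::vector<float>& input,
+                                              int begin, int size) {
+  if (size == -1) size = static_cast<int>(input.size()) - begin;
+  return std::vector<float>(input.begin() + begin,
+                            input.begin() + begin + size);
+}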
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Tile")
+ .Input("input: T")
+ .Input("multiples: int32")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Constructs a tensor by tiling a given tensor.
+
+This operation creates a new tensor by replicating `input` `multiples` times.
+The output tensor's i'th dimension has `input.dims(i) * multiples[i]` elements,
+and the values of `input` are replicated `multiples[i]` times along the 'i'th
+dimension. For example, tiling `[a b c d]` by `[2]` produces
+`[a b c d a b c d]`.
+
+input: 1-D or higher.
+multiples: 1-D. Length must be the same as the number of dimensions in `input`
+)doc");
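+
+// Editorial sketch (not part of the original sources): 1-D tiling, repeating
+// the input `multiple` times so the output length is input.size() * multiple.
+// Assumes <vector> is available.
+inline std::vector<float> IllustrativeTile1D(const std::vector<float>& input,
+                                             int multiple) {
+  std::vector<float> out;
+  out.reserve(input.size() * multiple);
+  for (int m = 0; m < multiple; ++m) {
+    out.insert(out.end(), input.begin(), input.end());
+  }
+  return out;  // e.g. [a b c d] tiled by 2 -> [a b c d a b c d]
+}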
+
+// --------------------------------------------------------------------------
+REGISTER_OP("TileGrad")
+ .Input("input: T")
+ .Input("multiples: int32")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Returns the gradient of `Tile`.
+
+Since `Tile` takes an input and repeats the input `multiples` times
+along each dimension, `TileGrad` takes in `multiples` and aggregates
+each repeated tile of `input` into `output`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Where")
+ .Input("input: bool")
+ .Output("index: int64")
+ .Doc(R"doc(
+Returns locations of true values in a boolean tensor.
+
+This operation returns the coordinates of true elements in `input`. The
+coordinates are returned in a 2-D tensor where the first dimension (rows)
+represents the number of true elements, and the second dimension (columns)
+represents the coordinates of the true elements. Keep in mind, the shape of
+the output tensor can vary depending on how many true values there are in
+`input`. Indices are output in row-major order.
+
+For example:
+
+```prettyprint
+# 'input' tensor is [[True, False]
+# [True, False]]
+# 'input' has two true values, so output has two coordinates.
+# 'input' has rank of 2, so coordinates have two indices.
+where(input) ==> [[0, 0],
+ [1, 0]]
+
+# `input` tensor is [[[True, False]
+# [True, False]]
+# [[False, True]
+# [False, True]]
+# [[False, False]
+# [False, True]]]
+# 'input' has 5 true values, so output has 5 coordinates.
+# 'input' has rank of 3, so coordinates have three indices.
+where(input) ==> [[0, 0, 0],
+ [0, 1, 0],
+ [1, 0, 1],
+ [1, 1, 1],
+ [2, 1, 1]]
+```
+
+)doc");
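+
+// Editorial sketch (not part of the original sources): row-major coordinates
+// of the true values in a 2-D boolean array, one [row, col] pair per true
+// element. Assumes <vector> and <utility> are available.
+inline std::vector<std::pair<int64, int64>> IllustrativeWhere2D(
+    const std::vector<std::vector<bool>>& input) {
+  std::vector<std::pair<int64, int64>> coords;
+  for (size_t i = 0; i < input.size(); ++i) {
+    for (size_t j = 0; j < input[i].size(); ++j) {
+      if (input[i][j]) coords.emplace_back(i, j);  // row-major order
+    }
+  }
+  return coords;
+}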
+
+// --------------------------------------------------------------------------
+REGISTER_OP("BroadcastGradientArgs")
+ .Input("s0: int32")
+ .Input("s1: int32")
+ .Output("r0: int32")
+ .Output("r1: int32")
+ .Doc(R"doc(
+Return the reduction indices for computing gradients of s0 op s1 with broadcast.
+
+This is typically used by gradient computations for a broadcasting operation.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Pad")
+ .Input("input: T")
+ .Input("paddings: int32")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Pads a tensor with zeros.
+
+This operation pads `input` with zeros according to the `paddings` you
+specify. `paddings` is an integer tensor with shape `[n, 2]`, where n is the
+rank of `input`. For each dimension D of `input`, `paddings[D, 0]` indicates
+how many zeros to add before the contents of `input` in that dimension, and
+`paddings[D, 1]` indicates how many zeros to add after the contents of `input`
+in that dimension.
+
+The padded size of each dimension D of the output is:
+
+`paddings(D, 0) + input.dim_size(D) + paddings(D, 1)`
+
+For example:
+
+```prettyprint
+# 't' is [[1, 1], [2, 2]]
+# 'paddings' is [[1, 1], [2, 2]]
+# rank of 't' is 2
+pad(t, paddings) ==> [[0, 0, 0, 0, 0, 0]
+                      [0, 0, 1, 1, 0, 0]
+                      [0, 0, 2, 2, 0, 0]
+                      [0, 0, 0, 0, 0, 0]]
+```
+
+)doc");
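+
+// Editorial sketch (not part of the original sources): zero-padding of a 2-D
+// array, where each output dimension is before + size + after as described
+// above. Assumes <vector> is available.
+inline std::vector<std::vector<float>> IllustrativePad2D(
+    const std::vector<std::vector<float>>& t, int pad_top, int pad_bottom,
+    int pad_left, int pad_right) {
+  const int in_rows = static_cast<int>(t.size());
+  const int in_cols = t.empty() ? 0 : static_cast<int>(t[0].size());
+  std::vector<std::vector<float>> out(
+      pad_top + in_rows + pad_bottom,
+      std::vector<float>(pad_left + in_cols + pad_right, 0.0f));
+  for (int i = 0; i < in_rows; ++i) {
+    for (int j = 0; j < in_cols; ++j) {
+      out[pad_top + i][pad_left + j] = t[i][j];  // copy the original block
+    }
+  }
+  return out;
+}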
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Placeholder")
+ .Output("output: dtype")
+ .Attr("dtype: type")
+ .Attr("shape: shape")
+ .Doc(R"doc(
+A placeholder op for a value that will be fed into the computation.
+
+N.B. This operation will fail with an error if it is executed. It is
+intended as a way to represent a value that will always be fed, and to
+provide attrs that enable the fed value to be checked at runtime.
+
+output: A placeholder tensor that must be replaced using the feed mechanism.
+dtype: The type of elements in the tensor.
+shape: (Optional) The shape of the tensor. If the shape has 0 dimensions, the
+ shape is unconstrained.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ExpandDims")
+ .Input("input: T")
+ .Input("dim: int32")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Inserts a dimension of 1 into a tensor's shape.
+
+Given a tensor `input`, this operation inserts a dimension of 1 at the
+dimension index `dim` of `input`'s shape. The dimension index `dim` starts at
+zero; if you specify a negative number for `dim` it is counted backward from
+the end.
+
+This operation is useful if you want to add a batch dimension to a single
+element. For example, if you have a single image of shape `[height, width,
+channels]`, you can make it a batch of 1 image with `expand_dims(image, 0)`,
+which will make the shape `[1, height, width, channels]`.
+
+Other examples:
+
+```prettyprint
+# 't' is a tensor of shape [2]
+shape(expand_dims(t, 0)) ==> [1, 2]
+shape(expand_dims(t, 1)) ==> [2, 1]
+shape(expand_dims(t, -1)) ==> [2, 1]
+
+# 't2' is a tensor of shape [2, 3, 5]
+shape(expand_dims(t2, 0)) ==> [1, 2, 3, 5]
+shape(expand_dims(t2, 2)) ==> [2, 3, 1, 5]
+shape(expand_dims(t2, 3)) ==> [2, 3, 5, 1]
+```
+
+This operation requires that:
+
+`-1-input.dims() <= dim <= input.dims()`
+
+This operation is related to `squeeze()`, which removes dimensions of
+size 1.
+
+dim: 0-D (scalar). Specifies the dimension index at which to
+ expand the shape of `input`.
+output: Contains the same data as `input`, but its shape has an additional
+ dimension of size 1 added.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Squeeze")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("squeeze_dims: list(int) >= 0 = []")
+ .Doc(R"doc(
+Removes dimensions of size 1 from the shape of a tensor.
+
+Given a tensor `input`, this operation returns a tensor of the same type with
+all dimensions of size 1 removed. If you don't want to remove all size 1
+dimensions, you can remove specific size 1 dimensions by specifying
+`squeeze_dims`.
+
+For example:
+
+```prettyprint
+# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
+shape(squeeze(t)) ==> [2, 3]
+```
+
+Or, to remove specific size 1 dimensions:
+
+```prettyprint
+# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
+shape(squeeze(t, [2, 4])) ==> [1, 2, 3, 1]
+```
+
+input: The `input` to squeeze.
+squeeze_dims: If specified, only squeezes the dimensions listed. The dimension
+ index starts at 0. It is an error to squeeze a dimension that is not 1.
+output: Contains the same data as `input`, but has one or more dimensions of
+ size 1 removed.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ListDiff")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("out: T")
+ .Output("idx: int32")
+ .Attr("T: type")
+ .Doc(R"doc(
+Computes the difference between two lists of numbers.
+
+Given a list `x` and a list `y`, this operation returns a list `out` that
+represents all numbers that are in `x` but not in `y`. The returned list `out`
+is sorted in the same order that the numbers appear in `x` (duplicates are
+preserved). This operation also returns a list `idx` that represents the
+position of each `out` element in `x`. In other words:
+
+`out[i] = x[idx[i]] for i in [0, 1, ..., len(out) - 1]`
+
+For example, given this input:
+
+```prettyprint
+x = [1, 2, 3, 4, 5, 6]
+y = [1, 3, 5]
+```
+
+This operation would return:
+
+```prettyprint
+out ==> [2, 4, 6]
+idx ==> [1, 3, 5]
+```
+
+x: 1-D. Values to keep.
+y: 1-D. Values to remove.
+out: 1-D. Values present in `x` but not in `y`.
+idx: 1-D. Positions of `x` values preserved in `out`.
+)doc");
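+
+// Editorial sketch (not part of the original sources): values of `x` not
+// present in `y`, in `x` order, together with their positions in `x`.
+// Assumes <vector> and <unordered_set> are available.
+inline void IllustrativeListDiff(const std::vector<int32>& x,
+                                 const std::vector<int32>& y,
+                                 std::vector<int32>* out,
+                                 std::vector<int32>* idx) {
+  const std::unordered_set<int32> to_remove(y.begin(), y.end());
+  for (int32 i = 0; i < static_cast<int32>(x.size()); ++i) {
+    if (to_remove.count(x[i]) == 0) {
+      out->push_back(x[i]);  // keep values of x that are absent from y
+      idx->push_back(i);
+    }
+  }
+}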
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/attention_ops.cc b/tensorflow/core/ops/attention_ops.cc
new file mode 100644
index 0000000000..6fa9a6e821
--- /dev/null
+++ b/tensorflow/core/ops/attention_ops.cc
@@ -0,0 +1,54 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+// Tout = extract_glimpse(Tin, size, offsets) extract the glimpse of size size
+// centered at location offsets from the input tensor Tin
+//
+// REQUIRES: Tin.dims() == 4
+//
+REGISTER_OP("ExtractGlimpse")
+ .Input("input: float")
+ .Input("size: int32")
+ .Input("offsets: float")
+ .Output("glimpse: float")
+ .Attr("centered: bool = true")
+ .Attr("normalized: bool = true")
+ .Attr("uniform_noise: bool = true")
+ .Doc(R"doc(
+Extracts a glimpse from the input tensor.
+
+Returns a set of windows called glimpses extracted at location `offsets`
+from the input tensor. If the windows only partially overlap the inputs, the
+non-overlapping areas will be filled with random noise.
+
+The result is a 4-D tensor of shape `[batch_size, glimpse_height,
+glimpse_width, channels]`. The channels and batch dimensions are the same as those
+of the input tensor. The height and width of the output windows are
+specified in the `size` parameter.
+
+The arguments `normalized` and `centered` control how the windows are built:
+* If the coordinates are normalized but not centered, 0.0 and 1.0
+ correspond to the minimum and maximum of each height and width dimension.
+* If the coordinates are both normalized and centered, they range from -1.0 to
+ 1.0. The coordinates (-1.0, -1.0) correspond to the upper left corner, the
+ lower right corner is located at (1.0, 1.0) and the center is at (0, 0).
+* If the coordinates are not normalized they are interpreted as numbers of pixels.
+
+input: A 4-D float tensor of shape `[batch_size, height, width, channels]`.
+size: A 1-D tensor of 2 elements containing the size of the glimpses to extract.
+  The glimpse height must be specified first, followed by the glimpse width.
+offsets: A 2-D integer tensor of shape `[batch_size, 2]` containing the x, y
+ locations of the center of each window.
+glimpse: A tensor representing the glimpses `[batch_size, glimpse_height,
+ glimpse_width, channels]`.
+centered: indicates if the offset coordinates are centered relative to
+ the image, in which case the (0, 0) offset is relative to the center of the
+ input images. If false, the (0,0) offset corresponds to the upper left corner
+ of the input images.
+normalized: indicates if the offset coordinates are normalized.
+uniform_noise: indicates if the noise should be generated using a
+ uniform distribution or a gaussian distribution.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/candidate_sampling_ops.cc b/tensorflow/core/ops/candidate_sampling_ops.cc
new file mode 100644
index 0000000000..a98b0295ee
--- /dev/null
+++ b/tensorflow/core/ops/candidate_sampling_ops.cc
@@ -0,0 +1,351 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("UniformCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("range_max: int >= 1")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a uniform distribution.
+
+See explanations of candidate sampling and the data formats at
+go/candidate-sampling.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+  candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to randomly sample per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+range_max: The sampler will sample integers from the interval [0, range_max).
+seed: If either seed or seed2 are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
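+
+// Editorial sketch (not part of the original sources): uniform sampling of
+// num_sampled IDs from [0, range_max), with rejection of duplicates when
+// unique is true (which requires num_sampled <= range_max). The real kernels
+// use the library's PRNG and also produce the expected-count outputs.
+// Assumes <vector>, <unordered_set> and <random> are available.
+inline std::vector<int64> IllustrativeUniformSample(int64 range_max,
+                                                    int num_sampled,
+                                                    bool unique,
+                                                    std::mt19937* rng) {
+  std::uniform_int_distribution<int64> dist(0, range_max - 1);
+  std::vector<int64> sampled;
+  std::unordered_set<int64> seen;
+  while (static_cast<int>(sampled.size()) < num_sampled) {
+    const int64 id = dist(*rng);
+    if (unique && !seen.insert(id).second) continue;  // reject duplicates
+    sampled.push_back(id);
+  }
+  return sampled;
+}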
+
+REGISTER_OP("LogUniformCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("range_max: int >= 1")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a log-uniform distribution.
+
+See explanations of candidate sampling and the data formats at
+go/candidate-sampling.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+  candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to randomly sample per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+range_max: The sampler will sample integers from the interval [0, range_max).
+seed: If either seed or seed2 are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+REGISTER_OP("LearnedUnigramCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("range_max: int >= 1")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a learned unigram distribution.
+
+See explanations of candidate sampling and the data formats at
+go/candidate-sampling.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+  candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to randomly sample per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+range_max: The sampler will sample integers from the interval [0, range_max).
+seed: If either seed or seed2 are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+REGISTER_OP("ThreadUnsafeUnigramCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("range_max: int >= 1")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a learned unigram distribution.
+
+See explanations of candidate sampling and the data formats at
+go/candidate-sampling.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+ candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to randomly sample per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+range_max: The sampler will sample integers from the interval [0, range_max).
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+REGISTER_OP("FixedUnigramCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("range_max: int >= 1")
+ .Attr("vocab_file: string = ''")
+ .Attr("distortion: float = 1.0")
+ .Attr("num_reserved_ids: int = 0")
+ .Attr("num_shards: int >= 1 = 1")
+ .Attr("shard: int >= 0 = 0")
+ .Attr("unigrams: list(float) = []")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a fixed unigram distribution.
+
+A unigram sampler could use a fixed unigram distribution read from a
+file or passed in as an in-memory array instead of building up the distribution
+from data on the fly. There is also an option to skew the distribution by
+applying a distortion power to the weights.
+
+The vocabulary file should be in CSV-like format, with the last field
+being the weight associated with the word.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+ candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to randomly sample per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+range_max: The sampler will sample integers from the interval [0, range_max).
+vocab_file: Each valid line in this file (which should have a CSV-like format)
+ corresponds to a valid word ID. IDs are in sequential order, starting from
+ num_reserved_ids. The last entry in each line is expected to be a value
+ corresponding to the count or relative probability. Exactly one of vocab_file
+ and unigrams needs to be passed to this op.
+distortion: The distortion is used to skew the unigram probability distribution.
+ Each weight is first raised to the distortion's power before adding to the
+ internal unigram distribution. As a result, distortion = 1.0 gives regular
+ unigram sampling (as defined by the vocab file), and distortion = 0.0 gives
+ a uniform distribution.
+num_reserved_ids: Optionally some reserved IDs can be added in the range [0,
+ ..., num_reserved_ids) by the users. One use case is that a special unknown
+ word token is used as ID 0. These IDs will have a sampling probability of 0.
+num_shards: A sampler can be used to sample from a subset of the original range
+ in order to speed up the whole computation through parallelism. This parameter
+ (together with 'shard') indicates the number of partitions that are being
+ used in the overall computation.
+shard: A sampler can be used to sample from a subset of the original range
+ in order to speed up the whole computation through parallelism. This parameter
+ (together with 'num_shards') indicates the particular partition number of a
+ sampler op, when partitioning is being used.
+unigrams: A list of unigram counts or probabilities, one per ID in sequential
+ order. Exactly one of vocab_file and unigrams should be passed to this op.
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+REGISTER_OP("AllCandidateSampler")
+ .Input("true_classes: int64")
+ .Output("sampled_candidates: int64")
+ .Output("true_expected_count: float")
+ .Output("sampled_expected_count: float")
+ .Attr("num_true: int >= 1")
+ .Attr("num_sampled: int >= 1")
+ .Attr("unique: bool")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Generates labels for candidate sampling with a learned unigram distribution.
+
+See explanations of candidate sampling and the data formats at
+go/candidate-sampling.
+
+For each batch, this op picks a single set of sampled candidate labels.
+
+The advantages of sampling candidates per-batch are simplicity and the
+possibility of efficient dense matrix multiplication. The disadvantage is that
+the sampled candidates must be chosen independently of the context and of the
+true labels.
+
+true_classes: A batch_size * num_true matrix, in which each row contains the
+ IDs of the num_true target_classes in the corresponding original label.
+sampled_candidates: A vector of length num_sampled, in which each element is
+ the ID of a sampled candidate.
+true_expected_count: A batch_size * num_true matrix, representing
+ the number of times each candidate is expected to occur in a batch
+ of sampled candidates. If unique=true, then this is a probability.
+sampled_expected_count: A vector of length num_sampled, for each sampled
+ candidate representing the number of times the candidate is expected
+ to occur in a batch of sampled candidates. If unique=true, then this is a
+ probability.
+num_true: Number of true labels per context.
+num_sampled: Number of candidates to produce per batch.
+unique: If unique is true, we sample with rejection, so that all sampled
+ candidates in a batch are unique. This requires some approximation to
+ estimate the post-rejection sampling probabilities.
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+REGISTER_OP("ComputeAccidentalHits")
+ .Input("true_classes: int64")
+ .Input("sampled_candidates: int64")
+ .Output("indices: int32")
+ .Output("ids: int64")
+ .Output("weights: float")
+ .Attr("num_true: int")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Doc(R"doc(
+Computes the ids of the positions in sampled_candidates that match true_labels.
+
+When doing log-odds NCE, the result of this op should be passed through a
+SparseToDense op, then added to the logits of the sampled candidates. This has
+the effect of 'removing' the sampled labels that match the true labels by
+making the classifier sure that they are sampled labels.
+
+true_classes: The true_classes output of UnpackSparseLabels.
+sampled_candidates: The sampled_candidates output of CandidateSampler.
+indices: A vector of indices corresponding to rows of true_classes.
+ids: A vector of IDs of positions in sampled_candidates that match a true_label
+ for the row with the corresponding index in indices.
+weights: A vector of the same length as indices and ids, in which each element
+ is -FLOAT_MAX.
+num_true: Number of true labels per context.
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/control_flow_ops.cc b/tensorflow/core/ops/control_flow_ops.cc
new file mode 100644
index 0000000000..517b2d2742
--- /dev/null
+++ b/tensorflow/core/ops/control_flow_ops.cc
@@ -0,0 +1,179 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Switch")
+ .Input("data: T")
+ .Input("pred: bool")
+ .Output("output_false: T")
+ .Output("output_true: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Forwards `data` to the output port determined by `pred`.
+
+If `pred` is true, the `data` input is forwarded to `output_true`. Otherwise,
+the data goes to `output_false`.
+
+See also `RefSwitch` and `Merge`.
+
+data: The tensor to be forwarded to the appropriate output.
+pred: A scalar that specifies which output port will receive data.
+output_false: If `pred` is false, data will be forwarded to this output.
+output_true: If `pred` is true, data will be forwarded to this output.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("RefSwitch")
+ .Input("data: Ref(T)")
+ .Input("pred: bool")
+ .Output("output_false: Ref(T)")
+ .Output("output_true: Ref(T)")
+ .Attr("T: type")
+ .Doc(R"doc(
+Forwards the ref tensor `data` to the output port determined by `pred`.
+
+If `pred` is true, the `data` input is forwarded to `output_true`. Otherwise,
+the data goes to `output_false`.
+
+See also `Switch` and `Merge`.
+
+data: The ref tensor to be forwarded to the appropriate output.
+pred: A scalar that specifies which output port will receive data.
+output_false: If `pred` is false, data will be forwarded to this output.
+output_true: If `pred` is true, data will be forwarded to this output.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("RefSelect")
+ .Input("index: int32")
+ .Input("inputs: Ref(N * T)")
+ .Output("output: Ref(T)")
+ .Attr("T: type")
+ .Attr("N: int >= 1")
+ .Doc(R"doc(
+Forwards the `index`th element of `inputs` to `output`.
+
+index: A scalar that determines the input that gets selected.
+inputs: A list of ref tensors, one of which will be forwarded to `output`.
+output: The forwarded tensor.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Merge")
+ .Input("inputs: N * T")
+ .Output("output: T")
+ .Output("value_index: int32")
+ .Attr("T: type")
+ .Attr("N: int >= 1")
+ .Doc(R"doc(
+Forwards the value of an available tensor from `inputs` to `output`.
+
+`Merge` waits for at least one of the tensors in `inputs` to become available.
+It is usually combined with `Switch` to implement branching.
+
+`Merge` forwards the first tensor to become available to `output`, and sets
+`value_index` to its index in `inputs`.
+
+It is an error if more than one tensor in `inputs` is available.
+
+inputs: The input tensors, exactly one of which will become available.
+output: Will be set to the available input tensor.
+value_index: The index of the chosen input tensor in `inputs`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Enter")
+ .Input("data: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("frame_name: string")
+ .Attr("is_constant: bool = false")
+ .Attr("parallel_iterations: int = 10")
+ .Doc(R"doc(
+Creates or finds a child frame, and makes `data` available to the child frame.
+
+This op is used together with `Exit` to create loops in the graph.
+The unique `frame_name` is used by the `Executor` to identify frames. If
+`is_constant` is true, `output` is a constant in the child frame; otherwise
+it may be changed in the child frame. At most `parallel_iterations` iterations
+are run in parallel in the child frame.
+
+data: The tensor to be made available to the child frame.
+frame_name: The name of the child frame.
+is_constant: If true, the output is constant within the child frame.
+parallel_iterations: The number of iterations allowed to run in parallel.
+output: The same tensor as `data`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("RefEnter")
+ .Input("data: Ref(T)")
+ .Output("output: Ref(T)")
+ .Attr("T: type")
+ .Attr("frame_name: string")
+ .Attr("is_constant: bool = false")
+ .Attr("parallel_iterations: int = 10")
+ .Doc(R"doc(
+Creates or finds a child frame, and makes `data` available to the child frame.
+
+The unique `frame_name` is used by the `Executor` to identify frames. If
+`is_constant` is true, `output` is a constant in the child frame; otherwise
+it may be changed in the child frame. At most `parallel_iterations` iterations
+are run in parallel in the child frame.
+
+data: The tensor to be made available to the child frame.
+frame_name: The name of the child frame.
+is_constant: If true, the output is constant within the child frame.
+parallel_iterations: The number of iterations allowed to run in parallel.
+output: The same tensor as `data`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Exit")
+ .Input("data: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Exits the current frame to its parent frame.
+
+Exit makes its input `data` available to the parent frame.
+
+data: The tensor to be made available to the parent frame.
+output: The same tensor as `data`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("NextIteration")
+ .Input("data: T")
+ .Output("output: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Makes its input available to the next iteration.
+
+data: The tensor to be made available to the next iteration.
+output: The same tensor as `data`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("LoopCond")
+ .Input("input: bool")
+ .Output("output: bool")
+ .Doc(R"doc(
+Forwards the input to the output.
+
+This operator represents the loop termination condition used by the
+"pivot" switches of a loop.
+
+input: A boolean scalar, representing the branch predicate of the Switch op.
+output: The same tensor as `input`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ControlTrigger")
+ .Doc(R"doc(
+Does nothing. Serves as a control trigger for scheduling. Only useful as a
+placeholder for control edges.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/data_flow_ops.cc b/tensorflow/core/ops/data_flow_ops.cc
new file mode 100644
index 0000000000..49eba33188
--- /dev/null
+++ b/tensorflow/core/ops/data_flow_ops.cc
@@ -0,0 +1,357 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("DynamicPartition")
+ .Input("data: T")
+ .Input("partitions: int32")
+ .Output("outputs: num_partitions * T")
+ .Attr("num_partitions: int")
+ .Attr("T: type")
+ .Doc(R"doc(
+Partitions `data` into `num_partitions` tensors using indices from `partitions`.
+
+For each index tuple `js` of size `partitions.ndim`, the slice `data[js, ...]`
+becomes part of `outputs[partitions[js]]`. The slices with `partitions[js] = i`
+are placed in `outputs[i]` in lexicographic order of `js`, and the first
+dimension of `outputs[i]` is the number of entries in `partitions` equal to `i`.
+In detail,
+
+ outputs[i].shape = [sum(partitions == i)] + data.shape[partitions.ndim:]
+
+ outputs[i] = pack([data[js, ...] for js if partitions[js] == i])
+
+`data.shape` must start with `partitions.shape`.
+
+For example:
+
+ # Scalar partitions
+ partitions = 1
+ num_partitions = 2
+ data = [10, 20]
+ outputs[0] = [] # Empty with shape [0, 2]
+ outputs[1] = [[10, 20]]
+
+ # Vector partitions
+ partitions = [0, 0, 1, 1, 0]
+ num_partitions = 2
+ data = [10, 20, 30, 40, 50]
+ outputs[0] = [10, 20, 50]
+ outputs[1] = [30, 40]
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/DynamicPartition.png" alt>
+</div>
+
+partitions: Any shape. Indices in the range `[0, num_partitions)`.
+num_partitions: The number of partitions to output.
+)doc");
+
+REGISTER_OP("DynamicStitch")
+ .Input("indices: N * int32")
+ .Input("data: N * T")
+ .Output("merged: T")
+ .Attr("N : int >= 2")
+ .Attr("T : type")
+ .Doc(R"doc(
+Interleave the values from the `data` tensors into a single tensor.
+
+Builds a merged tensor such that
+
+ merged[indices[m][i, ..., j], ...] = data[m][i, ..., j, ...]
+
+For example, if each `indices[m]` is scalar or vector, we have
+
+ # Scalar indices
+ merged[indices[m], ...] = data[m][...]
+
+ # Vector indices
+ merged[indices[m][i], ...] = data[m][i, ...]
+
+Each `data[i].shape` must start with the corresponding `indices[i].shape`,
+and the rest of `data[i].shape` must be constant w.r.t. `i`. That is, we
+must have `data[i].shape = indices[i].shape + constant`. In terms of this
+`constant`, the output shape is
+
+ merged.shape = [max(indices)] + constant
+
+Values are merged in order, so if an index appears in both `indices[m][i]` and
+`indices[n][j]` for `(m,i) < (n,j)` the slice `data[n][j]` will appear in the
+merged result.
+
+For example:
+
+ indices[0] = 6
+ indices[1] = [4, 1]
+ indices[2] = [[5, 2], [0, 3]]
+ data[0] = [61, 62]
+ data[1] = [[41, 42], [11, 12]]
+ data[2] = [[[51, 52], [21, 22]], [[1, 2], [31, 32]]]
+ merged = [[1, 2], [11, 12], [21, 22], [31, 32], [41, 42],
+ [51, 52], [61, 62]]
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/DynamicStitch.png" alt>
+</div>
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("RandomShuffleQueue")
+ .Output("handle: Ref(string)")
+ .Attr("component_types: list(type) >= 1")
+ .Attr("shapes: list(shape) >= 0 = []")
+ .Attr("capacity: int = -1")
+ .Attr("min_after_dequeue: int = 0")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A queue that randomizes the order of elements.
+
+handle: The handle to the queue.
+component_types: The type of each component in a value.
+shapes: The shape of each component in a value. The length of this attr must
+ be either 0 or the same as the length of component_types. If the length of
+ this attr is 0, the shapes of queue elements are not constrained, and
+ only one element may be dequeued at a time.
+capacity: The upper bound on the number of elements in this queue.
+ Negative numbers mean no limit.
+min_after_dequeue: Dequeue will block unless there would be this
+ many elements after the dequeue or the queue is closed. This
+ ensures a minimum level of mixing of elements.
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, a random seed is used.
+seed2: A second seed to avoid seed collision.
+container: If non-empty, this queue is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this queue will be shared under the given name
+ across multiple sessions.
+)doc");
+
+REGISTER_OP("FIFOQueue")
+ .Output("handle: Ref(string)")
+ .Attr("component_types: list(type) >= 1")
+ .Attr("shapes: list(shape) >= 0 = []")
+ .Attr("capacity: int = -1")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A queue that produces elements in first-in first-out order.
+
+handle: The handle to the queue.
+component_types: The type of each component in a value.
+shapes: The shape of each component in a value. The length of this attr must
+ be either 0 or the same as the length of component_types. If the length of
+ this attr is 0, the shapes of queue elements are not constrained, and
+ only one element may be dequeued at a time.
+capacity: The upper bound on the number of elements in this queue.
+ Negative numbers mean no limit.
+container: If non-empty, this queue is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this queue will be shared under the given name
+ across multiple sessions.
+)doc");
+
+REGISTER_OP("QueueEnqueue")
+ .Input("handle: Ref(string)")
+ .Input("components: Tcomponents")
+ .Attr("Tcomponents: list(type) >= 1")
+ .Attr("timeout_ms: int = -1")
+ .Doc(R"doc(
+Enqueues a tuple of one or more tensors in the given queue.
+
+The components input has k elements, which correspond to the components of
+tuples stored in the given queue.
+
+N.B. If the queue is full, this operation will block until the given
+element has been enqueued (or 'timeout_ms' elapses, if specified).
+
+handle: The handle to a queue.
+components: One or more tensors from which the enqueued tensors should be taken.
+timeout_ms: If the queue is full, this operation will block for up to
+ timeout_ms milliseconds.
+ Note: This option is not supported yet.
+)doc");
+
+REGISTER_OP("QueueEnqueueMany")
+ .Input("handle: Ref(string)")
+ .Input("components: Tcomponents")
+ .Attr("Tcomponents: list(type) >= 1")
+ .Attr("timeout_ms: int = -1")
+ .Doc(R"doc(
+Enqueues zero or more tuples of one or more tensors in the given queue.
+
+This operation slices each component tensor along the 0th dimension to
+make multiple queue elements. All of the tuple components must have the
+same size in the 0th dimension.
+
+The components input has k elements, which correspond to the components of
+tuples stored in the given queue.
+
+N.B. If the queue is full, this operation will block until the given
+elements have been enqueued (or 'timeout_ms' elapses, if specified).
+
+handle: The handle to a queue.
+components: One or more tensors from which the enqueued tensors should
+ be taken.
+timeout_ms: If the queue is too full, this operation will block for up
+ to timeout_ms milliseconds.
+ Note: This option is not supported yet.
+)doc");
+
+REGISTER_OP("QueueDequeue")
+ .Input("handle: Ref(string)")
+ .Output("components: component_types")
+ .Attr("component_types: list(type) >= 1")
+ .Attr("timeout_ms: int = -1")
+ .Doc(R"doc(
+Dequeues a tuple of one or more tensors from the given queue.
+
+This operation has k outputs, where k is the number of components
+in the tuples stored in the given queue, and output i is the ith
+component of the dequeued tuple.
+
+N.B. If the queue is empty, this operation will block until an element
+has been dequeued (or 'timeout_ms' elapses, if specified).
+
+handle: The handle to a queue.
+components: One or more tensors that were dequeued as a tuple.
+component_types: The type of each component in a tuple.
+timeout_ms: If the queue is empty, this operation will block for up to
+ timeout_ms milliseconds.
+ Note: This option is not supported yet.
+)doc");
+
+REGISTER_OP("QueueDequeueMany")
+ .Input("handle: Ref(string)")
+ .Input("n: int32")
+ .Output("components: component_types")
+ .Attr("component_types: list(type) >= 1")
+ .Attr("timeout_ms: int = -1")
+ .Doc(R"doc(
+Dequeues n tuples of one or more tensors from the given queue.
+
+This operation concatenates queue-element component tensors along the
+0th dimension to make a single component tensor. All of the components
+in the dequeued tuple will have size n in the 0th dimension.
+
+This operation has k outputs, where k is the number of components in
+the tuples stored in the given queue, and output i is the ith
+component of the dequeued tuple.
+
+N.B. If the queue is empty, this operation will block until n elements
+have been dequeued (or 'timeout_ms' elapses, if specified).
+
+handle: The handle to a queue.
+n: The number of tuples to dequeue.
+components: One or more tensors that were dequeued as a tuple.
+component_types: The type of each component in a tuple.
+timeout_ms: If the queue has fewer than n elements, this operation
+ will block for up to timeout_ms milliseconds.
+ Note: This option is not supported yet.
+)doc");
+
+REGISTER_OP("QueueClose")
+ .Input("handle: Ref(string)")
+ .Attr("cancel_pending_enqueues: bool = false")
+ .Doc(R"doc(
+Closes the given queue.
+
+This operation signals that no more elements will be enqueued in the
+given queue. Subsequent Enqueue(Many) operations will fail.
+Subsequent Dequeue(Many) operations will continue to succeed if
+sufficient elements remain in the queue. Subsequent Dequeue(Many)
+operations that would block will fail immediately.
+
+handle: The handle to a queue.
+cancel_pending_enqueues: If true, all pending enqueue requests that are
+ blocked on the given queue will be cancelled.
+)doc");
+
+REGISTER_OP("QueueSize")
+ .Input("handle: Ref(string)")
+ .Output("size: int32")
+ .Doc(R"doc(
+Computes the number of elements in the given queue.
+
+handle: The handle to a queue.
+size: The number of elements in the given queue.
+)doc");
+
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("LookupTableFind")
+ .Input("table_handle: Ref(string)")
+ .Input("input_values: Tin")
+ .Input("default_value: Tout")
+ .Output("output_values: Tout")
+ .Attr("Tin: type")
+ .Attr("Tout: type")
+ .Doc(R"doc(
+Maps elements of a tensor into associated values given a lookup table.
+
+If an element of the input_values is not present in the table, the
+specified default_value is used.
+
+The table must be initialized, and the input and output types must correspond
+to the table's key and value types.
+
+table_handle: A handle for a lookup table.
+input_values: A vector of key values.
+default_value: A scalar to return if the input is not found in the table.
+output_values: A vector of values associated to the inputs.
+)doc");
+
+REGISTER_OP("LookupTableSize")
+ .Input("table_handle: Ref(string)")
+ .Output("size: int64")
+ .Doc(R"doc(
+Computes the number of elements in the given table.
+
+table_handle: The handle to a lookup table.
+size: The number of elements in the given table.
+)doc");
+
+REGISTER_OP("HashTable")
+ .Output("table_handle: Ref(string)")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .Attr("key_dtype: type")
+ .Attr("value_dtype: type")
+ .Doc(R"doc(
+Creates and holds an immutable hash table.
+
+The key and value types can be specified. After initialization, the table
+becomes immutable.
+
+table_handle: a handle to the lookup table.
+container: If non-empty, this hash table is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this hash table is shared under the given name across
+ multiple sessions.
+key_dtype: the type of the table key.
+value_dtype: the type of the table value.
+)doc");
+
+REGISTER_OP("InitializeTable")
+ .Input("table_handle: Ref(string)")
+ .Input("keys: Tkey")
+ .Input("values: Tval")
+ .Attr("Tkey: type")
+ .Attr("Tval: type")
+ .Doc(R"doc(
+Table initializer that takes two tensors for keys and values respectively.
+
+table_handle: a handle of the lookup table to be initialized.
+keys: a vector of keys of type Tkey.
+values: a vector of values of type Tval.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/image_ops.cc b/tensorflow/core/ops/image_ops.cc
new file mode 100644
index 0000000000..88af081893
--- /dev/null
+++ b/tensorflow/core/ops/image_ops.cc
@@ -0,0 +1,273 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ResizeArea")
+ .Input("images: T")
+ .Input("size: int32")
+ .Output("resized_images: float")
+ .Attr("T: {uint8, int8, int32, float, double}")
+ .Doc(R"doc(
+Resize `images` to `size` using area interpolation.
+
+Input images can be of different types but output images are always float.
+
+images: 4-D with shape `[batch, height, width, channels]`.
+size: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+resized_images: 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ResizeBicubic")
+ .Input("images: T")
+ .Input("size: int32")
+ .Output("resized_images: float")
+ .Attr("T: {uint8, int8, int32, float, double}")
+ .Doc(R"doc(
+Resize `images` to `size` using bicubic interpolation.
+
+Input images can be of different types but output images are always float.
+
+images: 4-D with shape `[batch, height, width, channels]`.
+size: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+resized_images: 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ResizeBilinear")
+ .Input("images: T")
+ .Input("size: int32")
+ .Output("resized_images: float")
+ .Attr("T: {uint8, int8, int32, float, double}")
+ .Doc(R"doc(
+Resize `images` to `size` using bilinear interpolation.
+
+Input images can be of different types but output images are always float.
+
+images: 4-D with shape `[batch, height, width, channels]`.
+size: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+resized_images: 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("ResizeNearestNeighbor")
+ .Input("images: T")
+ .Input("size: int32")
+ .Output("resized_images: T")
+ .Attr("T: {uint8, int8, int32, float, double}")
+ .Doc(R"doc(
+Resize `images` to `size` using nearest neighbor interpolation.
+
+Input and output images have the same type.
+
+images: 4-D with shape `[batch, height, width, channels]`.
+size: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+resized_images: 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("RandomCrop")
+ .Input("image: T")
+ .Input("size: int64")
+ .Output("output: T")
+ .Attr("T: {uint8, int8, int16, int32, int64, float, double}")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .SetIsStateful()
+ .Doc(R"doc(
+Randomly crop `image`.
+
+`size` is a 1-D int64 tensor with 2 elements representing the crop height and
+width. The values must be non negative.
+
+This Op picks a random location in `image` and crops a `height` by `width`
+rectangle from that location. The random location is picked so the cropped
+area will fit inside the original image.
+
+image: 3-D of shape `[height, width, channels]`.
+size: 1-D of length 2 containing: `crop_height`, `crop_width`.
+seed: If either seed or seed2 is set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+output: 3-D of shape `[crop_height, crop_width, channels]`.
+)doc");
+// TODO(shlens): Support variable rank in RandomCrop.
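+
+// Illustrative sketch (hypothetical shapes): for an image of shape [8, 10, 3] and
+// size = [4, 5], the op picks a random offset such that a [4, 5] window fits
+// entirely inside the image, and the output has shape [4, 5, 3].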
+
+// --------------------------------------------------------------------------
+REGISTER_OP("DecodeJpeg")
+ .Input("contents: string")
+ .Attr("channels: int = 0")
+ .Attr("ratio: int = 1")
+ .Attr("fancy_upscaling: bool = true")
+ .Attr("try_recover_truncated: bool = false")
+ .Attr("acceptable_fraction: float = 1.0")
+ .Output("image: uint8")
+ .Doc(R"doc(
+Decode a JPEG-encoded image to a uint8 tensor.
+
+The attr `channels` indicates the desired number of color channels for the
+decoded image.
+
+Accepted values are:
+
+* 0: Use the number of channels in the JPEG-encoded image.
+* 1: output a grayscale image.
+* 3: output an RGB image.
+
+If needed, the JPEG-encoded image is transformed to match the requested number
+of color channels.
+
+The attr `ratio` allows downscaling the image by an integer factor during
+decoding. Allowed values are: 1, 2, 4, and 8. This is much faster than
+downscaling the image later.
+
+contents: 0-D. The JPEG-encoded image.
+channels: Number of color channels for the decoded image.
+ratio: Downscaling ratio.
+fancy_upscaling: If true use a slower but nicer upscaling of the
+ chroma planes (yuv420/422 only).
+try_recover_truncated: If true try to recover an image from truncated input.
+acceptable_fraction: The minimum required fraction of lines before a truncated
+ input is accepted.
+image: 3-D with shape `[height, width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("EncodeJpeg")
+ .Input("image: uint8")
+ .Attr("format: {'', 'grayscale', 'rgb'} = ''")
+ .Attr("quality: int = 95")
+ .Attr("progressive: bool = false")
+ .Attr("optimize_size: bool = false")
+ .Attr("chroma_downsampling: bool = true")
+ .Attr("density_unit: {'in', 'cm'} = 'in'")
+ .Attr("x_density: int = 300")
+ .Attr("y_density: int = 300")
+ .Attr("xmp_metadata: string = ''")
+ .Output("contents: string")
+ .Doc(R"doc(
+JPEG-encode an image.
+
+`image` is a 3-D uint8 Tensor of shape `[height, width, channels]`.
+
+The attr `format` can be used to override the color format of the encoded
+output. Values can be:
+
+* `''`: Use a default format based on the number of channels in the image.
+* `grayscale`: Output a grayscale JPEG image. The `channels` dimension
+ of `image` must be 1.
+* `rgb`: Output an RGB JPEG image. The `channels` dimension
+ of `image` must be 3.
+
+If `format` is not specified or is the empty string, a default format is picked
+in function of the number of channels in `image`:
+
+* 1: Output a grayscale image.
+* 3: Output an RGB image.
+
+image: 3-D with shape `[height, width, channels]`.
+format: Per pixel image format.
+quality: Quality of the compression from 0 to 100 (higher is better and slower).
+progressive: If True, create a JPEG that loads progressively (coarse to fine).
+optimize_size: If True, spend CPU/RAM to reduce size with no quality change.
+chroma_downsampling: See http://en.wikipedia.org/wiki/Chroma_subsampling.
+density_unit: Unit used to specify `x_density` and `y_density`:
+ pixels per inch (`'in'`) or centimeter (`'cm'`).
+x_density: Horizontal pixels per density unit.
+y_density: Vertical pixels per density unit.
+xmp_metadata: If not empty, embed this XMP metadata in the image header.
+contents: 0-D. JPEG-encoded image.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("AdjustContrast")
+ .Input("images: T")
+ .Input("contrast_factor: float")
+ .Input("min_value: float")
+ .Input("max_value: float")
+ .Output("output: float")
+ .Attr("T: {uint8, int8, int16, int32, int64, float, double}")
+ .Doc(R"Doc(
+Adjust the contrast of one or more images.
+
+`images` is a tensor of at least 3 dimensions. The last 3 dimensions are
+interpreted as `[height, width, channels]`. The other dimensions only
+represent a collection of images, such as `[batch, height, width, channels]`.
+
+Contrast is adjusted independently for each channel of each image.
+
+For each channel, the Op first computes the mean of the image pixels in the
+channel and then adjusts each component of each pixel to
+`(x - mean) * contrast_factor + mean`.
+
+These adjusted values are then clipped to fit in the `[min_value, max_value]`
+interval.
+
+images: Images to adjust. At least 3-D.
+contrast_factor: A float multiplier for adjusting contrast.
+min_value: Minimum value for clipping the adjusted pixels.
+max_value: Maximum value for clipping the adjusted pixels.
+output: The contrast-adjusted image or images.
+)Doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("DecodePng")
+ .Input("contents: string")
+ .Attr("channels: int = 0")
+ .Output("image: uint8")
+ .Doc(R"doc(
+Decode a PNG-encoded image to a uint8 tensor.
+
+The attr `channels` indicates the desired number of color channels for the
+decoded image.
+
+Accepted values are:
+
+* 0: Use the number of channels in the PNG-encoded image.
+* 1: output a grayscale image.
+* 3: output an RGB image.
+* 4: output an RGBA image.
+
+If needed, the PNG-encoded image is transformed to match the requested number
+of color channels.
+
+contents: 0-D. The PNG-encoded image.
+channels: Number of color channels for the decoded image.
+image: 3-D with shape `[height, width, channels]`.
+)doc");
+
+// --------------------------------------------------------------------------
+REGISTER_OP("EncodePng")
+ .Input("image: uint8")
+ .Attr("compression: int = -1")
+ .Output("contents: string")
+ .Doc(R"doc(
+PNG-encode an image.
+
+`image` is a 3-D uint8 Tensor of shape `[height, width, channels]` where
+`channels` is:
+
+* 1: for grayscale.
+* 3: for RGB.
+* 4: for RGBA.
+
+The ZLIB compression level, `compression`, can be -1 for the PNG-encoder
+default or a value from 0 to 9. 9 is the highest compression level, generating
+the smallest output, but is slower.
+
+image: 3-D with shape `[height, width, channels]`.
+compression: Compression level.
+contents: 0-D. PNG-encoded image.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/io_ops.cc b/tensorflow/core/ops/io_ops.cc
new file mode 100644
index 0000000000..937fedd45d
--- /dev/null
+++ b/tensorflow/core/ops/io_ops.cc
@@ -0,0 +1,332 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+REGISTER_OP("Save")
+ .Input("filename: string")
+ .Input("tensor_names: string")
+ .Input("data: T")
+ .Attr("T: list({float, double, int32, int64, quint8, qint8, qint32})")
+ .Doc(R"doc(
+Saves the input tensors to disk.
+
+The size of `tensor_names` must match the number of tensors in `data`. `data[i]`
+is written to `filename` with name `tensor_names[i]`.
+
+See also `SaveSlices`.
+
+filename: Must have a single element. The name of the file to which we write
+the tensors.
+tensor_names: Shape `[N]`. The names of the tensors to be saved.
+data: `N` tensors to save.
+)doc");
+
+REGISTER_OP("SaveSlices")
+ .Input("filename: string")
+ .Input("tensor_names: string")
+ .Input("shapes_and_slices: string")
+ .Input("data: T")
+ .Attr("T: list({float, double, int32, int64, quint8, qint8, qint32})")
+ .Doc(R"doc(
+Saves input tensors slices to disk.
+
+This is like `Save` except that tensors can be listed in the saved file as being
+a slice of a larger tensor. `shapes_and_slices` specifies the shape of the
+larger tensor and the slice that this tensor covers. `shapes_and_slices` must
+have as many elements as `tensor_names`.
+
+Elements of the `shapes_and_slices` input must either be:
+
+* The empty string, in which case the corresponding tensor is
+ saved normally.
+* A string of the form `dim0 dim1 ... dimN-1 slice-spec` where the
+ `dimI` are the dimensions of the larger tensor and `slice-spec`
+ specifies what part is covered by the tensor to save.
+
+`slice-spec` itself is a `:`-separated list: `slice0:slice1:...:sliceN-1`
+where each `sliceI` is either:
+
+* The string `-` meaning that the slice covers all indices of this dimension
+* `start,length` where `start` and `length` are integers. In that
+ case the slice covers `length` indices starting at `start`.
+
+See also `Save`.
+
+filename: Must have a single element. The name of the file to which we write the
+tensors.
+tensor_names: Shape `[N]`. The names of the tensors to be saved.
+shapes_and_slices: Shape `[N]`. The shapes and slice specifications to use when
+saving the tensors.
+data: `N` tensors to save.
+)doc");
+
+REGISTER_OP("Restore")
+ .Input("file_pattern: string")
+ .Input("tensor_name: string")
+ .Output("tensor: dt")
+ .Attr("dt: type")
+ .Attr("preferred_shard: int = -1")
+ .Doc(R"doc(
+Restores a tensor from checkpoint files.
+
+Reads a tensor stored in one or several files. If there are several files (for
+instance because a tensor was saved as slices), `file_pattern` may contain
+wildcard symbols (`*` and `?`) in the filename portion only, not in the
+directory portion.
+
+If a `file_pattern` matches several files, `preferred_shard` can be used to hint
+in which file the requested tensor is likely to be found. This op will first
+open the file at index `preferred_shard` in the list of matching files and try
+to restore tensors from that file. Only if some tensors or tensor slices are
+not found in that first file, then the Op opens all the files. Setting
+`preferred_shard` to match the value passed as the `shard` input
+of a matching `Save` Op may speed up Restore. This attribute only affects
+performance, not correctness. The default value -1 means files are processed in
+order.
+
+See also `RestoreSlice`.
+
+file_pattern: Must have a single element. The pattern of the files from
+ which we read the tensor.
+tensor_name: Must have a single element. The name of the tensor to be
+ restored.
+tensor: The restored tensor.
+dt: The type of the tensor to be restored.
+preferred_shard: Index of file to open first if multiple files match
+ `file_pattern`.
+)doc");
+
+REGISTER_OP("RestoreSlice")
+ .Input("file_pattern: string")
+ .Input("tensor_name: string")
+ .Input("shape_and_slice: string")
+ .Output("tensor: dt")
+ .Attr("dt: type")
+ .Attr("preferred_shard: int = -1")
+ .Doc(R"doc(
+Restores a tensor from checkpoint files.
+
+This is like `Restore` except that restored tensor can be listed as filling
+only a slice of a larger tensor. `shape_and_slice` specifies the shape of the
+larger tensor and the slice that the restored tensor covers.
+
+The `shape_and_slice` input has the same format as the
+elements of the `shapes_and_slices` input of the `SaveSlices` op.
+
+file_pattern: Must have a single element. The pattern of the files from
+ which we read the tensor.
+tensor_name: Must have a single element. The name of the tensor to be
+ restored.
+shape_and_slice: Scalar. The shape and slice specification to use when
+ restoring a tensor.
+tensor: The restored tensor.
+dt: The type of the tensor to be restored.
+preferred_shard: Index of file to open first if multiple files match
+ `file_pattern`. See the documentation for `Restore`.
+)doc");
+
+REGISTER_OP("ShardedFilename")
+ .Input("basename: string")
+ .Input("shard: int32")
+ .Input("num_shards: int32")
+ .Output("filename: string")
+ .Doc(R"doc(
+Generate a sharded filename. The filename is printf formatted as
+ %s-%05d-of-%05d, basename, shard, num_shards.
+)doc");
+
+REGISTER_OP("ShardedFilespec")
+ .Input("basename: string")
+ .Input("num_shards: int32")
+ .Output("filename: string")
+ .Doc(R"doc(
+Generate a glob pattern matching all sharded file names.
+)doc");
+
+// Reader source ops ----------------------------------------------------------
+
+REGISTER_OP("WholeFileReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the entire contents of a file as a value.
+
+To use, enqueue filenames in a Queue. The output of ReaderRead will
+be a filename (key) and the contents of that file (value).
+
+reader_handle: The handle to reference the Reader.
+container: If non-empty, this reader is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this reader is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+REGISTER_OP("TextLineReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("skip_header_lines: int = 0")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the lines of a file delimited by '\n'.
+
+reader_handle: The handle to reference the Reader.
+skip_header_lines: Number of lines to skip from the beginning of every file.
+container: If non-empty, this reader is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this reader is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+REGISTER_OP("FixedLengthRecordReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("header_bytes: int = 0")
+ .Attr("record_bytes: int")
+ .Attr("footer_bytes: int = 0")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs fixed-length records from a file.
+
+reader_handle: The handle to reference the Reader.
+container: If non-empty, this reader is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this reader is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+REGISTER_OP("TFRecordReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the records from a TensorFlow Records file.
+
+reader_handle: The handle to reference the Reader.
+container: If non-empty, this reader is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this reader is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+REGISTER_OP("IdentityReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the queued work as both the key and value.
+
+To use, enqueue strings in a Queue. ReaderRead will take the front
+work string and output (work, work).
+
+reader_handle: The handle to reference the Reader.
+container: If non-empty, this reader is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this reader is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+// Ops that operate on Readers ------------------------------------------------
+
+REGISTER_OP("ReaderRead")
+ .Input("reader_handle: Ref(string)")
+ .Input("queue_handle: Ref(string)")
+ .Output("key: string")
+ .Output("value: string")
+ .Doc(R"doc(
+Returns the next record (key, value pair) produced by a Reader.
+
+Will dequeue from the input queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has finished
+with the previous file).
+
+reader_handle: Handle to a Reader.
+queue_handle: Handle to a Queue, with string work items.
+key: A scalar.
+value: A scalar.
+)doc");
+
+REGISTER_OP("ReaderNumRecordsProduced")
+ .Input("reader_handle: Ref(string)")
+ .Output("records_produced: int64")
+ .Doc(R"doc(
+Returns the number of records this Reader has produced.
+
+This is the same as the number of ReaderRead executions that have
+succeeded.
+
+reader_handle: Handle to a Reader.
+)doc");
+
+REGISTER_OP("ReaderNumWorkUnitsCompleted")
+ .Input("reader_handle: Ref(string)")
+ .Output("units_completed: int64")
+ .Doc(R"doc(
+Returns the number of work units this Reader has finished processing.
+
+reader_handle: Handle to a Reader.
+)doc");
+
+REGISTER_OP("ReaderSerializeState")
+ .Input("reader_handle: Ref(string)")
+ .Output("state: string")
+ .Doc(R"doc(
+Produce a string tensor that encodes the state of a Reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+reader_handle: Handle to a Reader.
+)doc");
+
+REGISTER_OP("ReaderRestoreState")
+ .Input("reader_handle: Ref(string)")
+ .Input("state: string")
+ .Doc(R"doc(
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+reader_handle: Handle to a Reader.
+state: Result of a ReaderSerializeState of a Reader with type
+ matching reader_handle.
+)doc");
+
+REGISTER_OP("ReaderReset")
+ .Input("reader_handle: Ref(string)")
+ .Doc(R"doc(
+Restore a Reader to its initial clean state.
+
+reader_handle: Handle to a Reader.
+)doc");
+
+// Other input Ops ----------------------------------------------------------
+
+REGISTER_OP("ReadFile")
+ .Input("filename: string")
+ .Output("contents: string")
+ .Doc(R"doc(
+Reads and outputs the entire contents of the input filename.
+)doc");
+
+REGISTER_OP("MatchingFiles")
+ .Input("pattern: string")
+ .Output("filenames: string")
+ .Doc(R"doc(
+Returns the set of files matching a pattern.
+
+Note that this routine only supports wildcard characters in the
+basename portion of the pattern, not in the directory portion.
+
+pattern: A (scalar) shell wildcard pattern.
+filenames: A vector of matching filenames.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/linalg_ops.cc b/tensorflow/core/ops/linalg_ops.cc
new file mode 100644
index 0000000000..a9b940295e
--- /dev/null
+++ b/tensorflow/core/ops/linalg_ops.cc
@@ -0,0 +1,97 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("MatrixDeterminant")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Calculates the determinant of a square matrix.
+
+input: A tensor of shape `[M, M]`.
+output: A scalar, equal to the determinant of the input.
+T: The type of values in the input and output.
+)doc");
+
+REGISTER_OP("BatchMatrixDeterminant")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Calculates the determinants for a batch of square matrices.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices. The output is a tensor containing the determinants
+for all input submatrices `[..., :, :]`.
+
+input: Shape is `[..., M, M]`.
+output: Shape is `[...]`.
+T: The type of values in the input and output.
+)doc");
+
+REGISTER_OP("MatrixInverse")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Calculates the inverse of a square invertible matrix. Checks for invertibility.
+
+input: Shape is `[M, M]`.
+output: Shape is `[M, M]` containing the matrix inverse of the input.
+T: The type of values in the input and output.
+)doc");
+
+REGISTER_OP("BatchMatrixInverse")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Calculates the inverse of square invertible matrices. Checks for invertibility.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices. The output is a tensor of the same shape as the input
+containing the inverse for all input submatrices `[..., :, :]`.
+
+input: Shape is `[..., M, M]`.
+output: Shape is `[..., M, M]`.
+T: The type of values in the input and output.
+)doc");
+
+REGISTER_OP("Cholesky")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {double, float}")
+ .Doc(R"doc(
+Calculates the Cholesky decomposition of a square matrix.
+
+The input has to be symmetric and positive definite. Only the lower-triangular
+part of the input will be used for this operation. The upper-triangular part
+will not be read.
+
+The result is the lower-triangular matrix of the Cholesky decomposition of the
+input.
+
+input: Shape is `[M, M]`.
+output: Shape is `[M, M]`.
+T: The type of values in the input and output.
+)doc");
+
+REGISTER_OP("BatchCholesky")
+ .Input("input: T")
+ .Output("output: T")
+ .Attr("T: {double, float}")
+ .Doc(R"doc(
+Calculates the Cholesky decomposition of a batch of square matrices.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices, with the same constraints as the single matrix Cholesky
+decomposition above. The output is a tensor of the same shape as the input
+containing the Cholesky decompositions for all input submatrices `[..., :, :]`.
+
+input: Shape is `[..., M, M]`.
+output: Shape is `[..., M, M]`.
+T: The type of values in the input and output.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/logging_ops.cc b/tensorflow/core/ops/logging_ops.cc
new file mode 100644
index 0000000000..28546fe645
--- /dev/null
+++ b/tensorflow/core/ops/logging_ops.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("Assert")
+ .Input("condition: bool")
+ .Input("data: T")
+ .Attr("T: list(type)")
+ .Attr("summarize: int = 3")
+ .Doc(R"doc(
+Asserts that the given condition is true.
+
+If `condition` evaluates to false, print the list of tensors in `data`.
+`summarize` determines how many entries of the tensors to print.
+
+condition: The condition to evaluate.
+data: The tensors to print out when condition is false.
+summarize: Print this many entries of each tensor.
+)doc");
+
+REGISTER_OP("Print")
+ .Input("input: T")
+ .Input("data: U")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("U: list(type)")
+ .Attr("message: string = ''")
+ .Attr("first_n: int = -1")
+ .Attr("summarize: int = 3")
+ .Doc(R"doc(
+Prints a list of tensors.
+
+Passes `input` through to `output` and prints `data` when evaluating.
+
+input: The tensor passed to `output`.
+data: A list of tensors to print out when op is evaluated.
+output: The unmodified `input` tensor.
+message: A string, prefix of the error message.
+first_n: Only log `first_n` number of times. -1 disables logging.
+summarize: Only print this many entries of each tensor.
+)doc");
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/ops/math_ops.cc b/tensorflow/core/ops/math_ops.cc
new file mode 100644
index 0000000000..20e56316ea
--- /dev/null
+++ b/tensorflow/core/ops/math_ops.cc
@@ -0,0 +1,1053 @@
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("AddN")
+ .Input("inputs: N * T")
+ .Output("sum: T")
+ .Attr("N: int >= 1")
+ .Attr("T: numbertype")
+ .SetIsCommutative()
+ .SetIsAggregate()
+ .Doc(R"doc(
+Add all input tensors element-wise.
+
+inputs: Must all be the same size and shape.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("BatchMatMul")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("out: T")
+ .Attr("T: {float, double, int32, complex64}")
+ .Attr("adj_x: bool = false")
+ .Attr("adj_y: bool = false")
+ .Doc(R"doc(
+Multiplies slices of two tensors in batches.
+
+Multiplies all slices of `Tensor` `x` and `y` (each slice can be
+viewed as an element of a batch), and arranges the individual results
+in a single output tensor of the same batch size. Each of the
+individual slices can optionally be adjointed (to adjoint a matrix
+means to transpose and conjugate it) before multiplication by setting
+the `adj_x` or `adj_y` flag to `True`, both of which default to `False`.
+
+The input tensors `x` and `y` are 3-D or higher with shape `[..., r_x, c_x]`
+and `[..., r_y, c_y]`.
+
+The output tensor is 3-D or higher with shape `[..., r_o, c_o]`, where:
+
+ r_o = c_x if adj_x else r_x
+ c_o = r_y if adj_y else c_y
+
+It is computed as:
+
+ out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
+
+x: 3-D or higher with shape `[..., r_x, c_x]`.
+y: 3-D or higher with shape `[..., r_y, c_y]`.
+out: 3-D or higher with shape `[..., r_o, c_o]`
+adj_x: If `True`, adjoint the slices of `x`. Defaults to `False`.
+adj_y: If `True`, adjoint the slices of `y`. Defaults to `False`.
+)doc");
+
+// --------------------------------------------------------------------------
+// Casting Ops
+//
+// NOTE: Only a smaller number of types are supported by
+// Cast. The exact casting rule is TBD. The current
+// implementation uses C++ static cast rules for numeric
+// types, which may be changed in the future.
+REGISTER_OP("Cast")
+ .Input("x: SrcT")
+ .Output("y: DstT")
+ .Attr("SrcT: type")
+ .Attr("DstT: type")
+ .Doc(R"doc(
+Cast x of type SrcT to y of DstT.
+)doc");
+
+REGISTER_OP("_HostCast")
+ .Input("x: SrcT")
+ .Output("y: DstT")
+ .Attr("SrcT: type")
+ .Attr("DstT: type")
+ .Doc(R"doc(
+Cast x of type SrcT to y of DstT.
+
+_HostCast requires its input and produces its output in host memory.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Abs")
+ .Input("x: T")
+ .Output("y: T")
+ .Attr("T: {float, double, int32, int64}")
+ .Doc(R"doc(
+Computes the absolute value of a tensor.
+
+Given a tensor `x`, this operation returns a tensor containing the absolute
+value of each element in `x`. For example, if x is an input element and y is
+an output element, this operation computes \\(y = |x|\\).
+)doc");
+
+REGISTER_OP("ComplexAbs")
+ .Input("x: complex64")
+ .Output("y: float")
+ .Doc(R"doc(
+Computes the complex absolute value of a tensor.
+
+Given a tensor `x` of complex numbers, this operation returns a tensor of type
+`float` that is the absolute value of each element in `x`. All elements in `x`
+must be complex numbers of the form \\(a + bj\\). The absolute value is
+computed as \\( \sqrt{a^2 + b^2}\\).
+
+For example:
+
+```
+# tensor 'x' is [[-2.25 + 4.75j], [-3.25 + 5.75j]]
+tf.complex_abs(x) ==> [5.25594902, 6.60492229]
+```
+)doc");
+
+// Declares cwise unary operations signature: 't -> 't
+#define UNARY() \
+ Input("x: T").Output("y: T").Attr( \
+ "T: {float, double, int32, complex64, int64}")
+
+REGISTER_OP("Neg")
+ .UNARY()
+ .Doc(R"doc(
+Computes numerical negative value element-wise.
+I.e., \\(y = -x\\).
+)doc");
+
+REGISTER_OP("Inv")
+ .UNARY()
+ .Doc(R"doc(
+Computes the reciprocal of x element-wise.
+I.e., \\(y = 1 / x\\).
+)doc");
+
+REGISTER_OP("Square")
+ .UNARY()
+ .Doc(R"doc(
+Computes square of x element-wise.
+I.e., \\(y = x * x = x^2\\).
+)doc");
+
+REGISTER_OP("Sqrt")
+ .UNARY()
+ .Doc(R"doc(
+Computes square root of x element-wise.
+I.e., \\(y = \sqrt{x} = x^{1/2}\\).
+)doc");
+
+REGISTER_OP("Rsqrt")
+ .UNARY()
+ .Doc(R"doc(
+Computes reciprocal of square root of x element-wise.
+I.e., \\(y = 1 / \sqrt{x}\\).
+)doc");
+
+REGISTER_OP("Exp")
+ .UNARY()
+ .Doc(R"doc(
+Computes exponential of x element-wise. \\(y = e^x\\).
+)doc");
+
+REGISTER_OP("Log")
+ .UNARY()
+ .Doc(R"doc(
+Computes natural logarithm of x element-wise.
+I.e., \\(y = \log_e x\\).
+)doc");
+
+REGISTER_OP("Tanh")
+ .UNARY()
+ .Doc(R"doc(
+Computes hyperbolic tangent of `x` element-wise.
+)doc");
+
+REGISTER_OP("Sigmoid")
+ .UNARY()
+ .Doc(R"doc(
+Computes sigmoid of `x` element-wise.
+
+Specifically, `y = 1 / (1 + exp(-x))`.
+)doc");
+
+REGISTER_OP("Sin")
+ .UNARY()
+ .Doc(R"doc(
+Computes sin of x element-wise.
+)doc");
+
+REGISTER_OP("Cos")
+ .UNARY()
+ .Doc(R"doc(
+Computes cos of x element-wise.
+)doc");
+
+#undef UNARY
+
+REGISTER_OP("IsNan")
+ .Input("x: T")
+ .Output("y: bool")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Returns which elements of x are NaN.
+)doc");
+
+REGISTER_OP("IsInf")
+ .Input("x: T")
+ .Output("y: bool")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Returns which elements of x are Inf.
+)doc");
+
+REGISTER_OP("IsFinite")
+ .Input("x: T")
+ .Output("y: bool")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Returns which elements of x are finite.
+)doc");
+
+REGISTER_OP("Sign")
+ .Input("x: T")
+ .Output("y: T")
+ .Attr("T: {float, double, int32, int64}")
+ .Doc(R"doc(
+Returns an element-wise indication of the sign of a number.
+
+y = sign(x) = -1 if x < 0; 0 if x == 0; 1 if x > 0.
+)doc");
+
+REGISTER_OP("Floor")
+ .Input("x: T")
+ .Output("y: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Returns element-wise largest integer not greater than x.
+)doc");
+
+REGISTER_OP("Ceil")
+ .Input("x: T")
+ .Output("y: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Returns element-wise smallest integer not less than x.
+)doc");
+
+// Declares cwise binary operations signature: 't, 't -> 't.
+
+#define BINARY_MORE() \
+ Input("x: T").Input("y: T").Output("z: T").Attr( \
+ "T: {float, double, int8, int16, int32, complex64, int64}")
+
+#define BINARY_FEWER() \
+ Input("x: T").Input("y: T").Output("z: T").Attr( \
+ "T: {float, double, int32, complex64, int64}")
+
+REGISTER_OP("Add")
+ .BINARY_MORE()
+ .SetIsCommutative()
+ .Doc(R"doc(
+Returns x + y element-wise.
+
+*NOTE*: Add supports broadcasting. AddN does not.
+)doc");
+
+REGISTER_OP("Sub")
+ .BINARY_FEWER()
+ .Doc(R"doc(
+Returns x - y element-wise.
+)doc");
+
+REGISTER_OP("Mul")
+ .BINARY_MORE()
+ .SetIsCommutative()
+ .Doc(R"doc(
+Returns x * y element-wise.
+)doc");
+
+REGISTER_OP("Div")
+ .BINARY_FEWER()
+ .Doc(R"doc(
+Returns x / y element-wise.
+)doc");
+
+#undef BINARY_FEWER
+#undef BINARY_MORE
+
+REGISTER_OP("Maximum")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("z: T")
+ .Attr("T: {float, double, int32, int64}")
+ .SetIsCommutative()
+ .Doc(R"doc(
+Returns the max of x and y (i.e. x > y ? x : y) element-wise; supports broadcasting.
+)doc");
+
+REGISTER_OP("Minimum")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("z: T")
+ .Attr("T: {float, double, int32, int64}")
+ .SetIsCommutative()
+ .Doc(R"doc(
+Returns the min of x and y (i.e. x < y ? x : y) element-wise; supports broadcasting.
+)doc");
+
+REGISTER_OP("Mod")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("z: T")
+ .Attr("T: {int32, int64, float, double}")
+ .Doc(R"doc(
+Returns element-wise remainder of division.
+)doc");
+
+REGISTER_OP("Pow")
+ .Input("x: T")
+ .Input("y: T")
+ .Output("z: T")
+ .Attr("T: {float, double, int32, complex64, int64}")
+ .Doc(R"doc(
+Computes the power of one value to another.
+
+Given a tensor `x` and a tensor `y`, this operation computes \\(x^y\\) for
+corresponding elements in `x` and `y`. For example:
+
+```
+# tensor 'x' is [[2, 2], [3, 3]]
+# tensor 'y' is [[8, 16], [2, 3]]
+tf.pow(x, y) ==> [[256, 65536], [9, 27]]
+```
+)doc");
+
+// --------------------------------------------------------------------------
+
+// Declares cwise binary comparison operations signature: 't, 't -> bool,
+// where 't has a natural total order.
+#define COMPARISON() \
+ Input("x: T").Input("y: T").Output("z: bool").Attr( \
+ "T: {float, double, int32, int64}")
+
+REGISTER_OP("Less")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x < y) element-wise.
+)doc");
+
+REGISTER_OP("LessEqual")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x <= y) element-wise.
+)doc");
+
+REGISTER_OP("Greater")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x > y) element-wise.
+)doc");
+
+REGISTER_OP("GreaterEqual")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x >= y) element-wise.
+)doc");
+
+#undef COMPARISON
+
+// --------------------------------------------------------------------------
+
+#define COMPARISON() \
+ Input("x: T").Input("y: T").Output("z: bool").SetIsCommutative().Attr( \
+ "T: {float, double, int32, int64, complex64, quint8, qint8, qint32}")
+
+REGISTER_OP("Equal")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x == y) element-wise.
+)doc");
+
+REGISTER_OP("NotEqual")
+ .COMPARISON()
+ .Doc(R"doc(
+Returns the truth value of (x != y) element-wise.
+)doc");
+
+#undef COMPARISON
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("LogicalNot")
+ .Input("x: bool")
+ .Output("y: bool")
+ .Doc(R"doc(
+Returns the truth value of NOT x element-wise.
+)doc");
+
+#define BINARY_LOGICAL() \
+ Input("x: bool").Input("y: bool").Output("z: bool").SetIsCommutative()
+
+REGISTER_OP("LogicalAnd")
+ .BINARY_LOGICAL()
+ .Doc(R"doc(
+Returns the truth value of x AND y element-wise.
+)doc");
+
+REGISTER_OP("LogicalOr")
+ .BINARY_LOGICAL()
+ .Doc(R"doc(
+Returns the truth value of x OR y element-wise.
+)doc");
+
+#undef BINARY_LOGICAL
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Select")
+ .Input("condition: bool")
+ .Input("t: T")
+ .Input("e: T")
+ .Output("out: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Selects elements from `t` or `e`, depending on `condition`.
+
+The `condition`, `t`, and `e` tensors must all have the same shape,
+and the output will also have that shape. The `condition` tensor acts
+as an element-wise mask that chooses, based on the value at each
+element, whether the corresponding element in the output should be
+taken from `t` (if true) or `e` (if false).
+
+For example:
+
+```prettyprint
+# 'condition' tensor is [[True, False]
+# [True, False]]
+# 't' is [[1, 1],
+# [1, 1]]
+# 'e' is [[2, 2],
+# [2, 2]]
+select(condition, t, e) ==> [[1, 2],
+ [1, 2]]
+```
+
+t: A `Tensor` with the same shape as `condition`.
+e: A `Tensor` with the same type and shape as `t`.
+out: A `Tensor` with the same type and shape as `t` and `e`.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("MatMul")
+ .Input("a: T")
+ .Input("b: T")
+ .Output("product: T")
+ .Attr("transpose_a: bool = false")
+ .Attr("transpose_b: bool = false")
+ .Attr("T: {float, double, int32, complex64}")
+ .Doc(R"doc(
+Multiply the matrix "a" by the matrix "b".
+
+The inputs must be two-dimensional matrices and the inner dimension of
+"a" (after being transposed if transpose_a is true) must match the
+outer dimension of "b" (after being transposed if transpose_b is
+true).
+
+*Note*: The default kernel implementation for MatMul on GPUs uses
+cublas.
+
+transpose_a: If true, "a" is transposed before multiplication.
+transpose_b: If true, "b" is transposed before multiplication.
+)doc");
+
+REGISTER_OP("SparseMatMul")
+ .Input("a: float")
+ .Input("b: float")
+ .Output("product: float")
+ .Attr("transpose_a: bool = false")
+ .Attr("transpose_b: bool = false")
+ .Attr("a_is_sparse: bool = false")
+ .Attr("b_is_sparse: bool = false")
+ .Doc(R"doc(
+Multiply matrix "a" by matrix "b".
+
+The inputs must be two-dimensional matrices and the inner dimension of "a" must
+match the outer dimension of "b". This op is optimized for the case where at
+least one of "a" or "b" is sparse. The breakeven for using this versus a dense
+matrix multiply on one platform was 30% zero values in the sparse matrix.
+)doc");
+
+// --------------------------------------------------------------------------
+
+// For operations where the output is a reduction function along some
+// dimensions of the input.
+REGISTER_OP("Sum")
+ .Input("input: T")
+ .Input("reduction_indices: int32")
+ .Output("output: T")
+ .Attr("keep_dims: bool = false")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Computes the sum of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("Mean")
+ .Input("input: T")
+ .Input("reduction_indices: int32")
+ .Output("output: T")
+ .Attr("keep_dims: bool = false")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Computes the mean of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("Prod")
+ .Input("input: T")
+ .Input("reduction_indices: int32")
+ .Output("output: T")
+ .Attr("keep_dims: bool = false")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Computes the product of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("Min")
+ .Input("input: T")
+ .Input("reduction_indices: int32")
+ .Output("output: T")
+ .Attr("keep_dims: bool = false")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Computes the minimum of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("Max")
+ .Input("input: T")
+ .Input("reduction_indices: int32")
+ .Output("output: T")
+ .Attr("keep_dims: bool = false")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Computes the maximum of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("ArgMax")
+ .Input("input: T")
+ .Input("dimension: int32")
+ .Output("output: int64")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Returns the index with the largest value across dimensions of a tensor.
+
+dimension: int32, 0 <= dimension < rank(input). Describes which dimension
+ of the input Tensor to reduce across. For vectors, use dimension = 0.
+)doc");
+
+REGISTER_OP("ArgMin")
+ .Input("input: T")
+ .Input("dimension: int32")
+ .Output("output: int64")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+Returns the index with the smallest value across dimensions of a tensor.
+
+dimension: int32, 0 <= dimension < rank(input). Describes which dimension
+ of the input Tensor to reduce across. For vectors, use dimension = 0.
+)doc");
+
+REGISTER_OP("SegmentSum")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the sum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \sum_j data_j\\) where sum is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentSum.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("SegmentMean")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the mean along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \frac{\sum_j data_j}{N}\\) where `mean` is
+over `j` such that `segment_ids[j] == i` and `N` is the total number of
+values summed.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMean.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("SegmentProd")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the product along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \prod_j data_j\\) where the product is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentProd.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("SegmentMin")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the minimum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \min_j(data_j)\\) where `min` is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMin.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("SegmentMax")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the maximum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \max_j(data_j)\\) where `max` is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMax.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("UnsortedSegmentSum")
+ .Input("data: T")
+ .Input("segment_ids: Tindices")
+ .Input("num_segments: int32")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Attr("Tindices: {int32,int64}")
+ .Doc(R"doc(
+Computes the sum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \sum_j data_j\\) where sum is over `j` such
+that `segment_ids[j] == i`. Unlike `SegmentSum`, `segment_ids`
+need not be sorted and need not cover all values in the full
+range of valid values.
+
+If the sum is empty for a given segment ID `i`, `output[i] = 0`.
+
+`num_segments` should equal the number of distinct segment IDs.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/UnsortedSegmentSum.png" alt>
+</div>
+
+segment_ids: A 1-D tensor whose rank is equal to the rank of `data`'s
+first dimension.
+
+output: Has same shape as data, except for dimension_0 which
+has size `num_segments`.
+
+)doc");
+
+REGISTER_OP("SparseSegmentSum")
+ .Input("data: T")
+ .Input("indices: int32")
+ .Input("segment_ids: int32")
+ .Output("output: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes the sum along sparse segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Like `SegmentSum`, but `segment_ids` can have rank less than `data`'s first
+dimension, selecting a subset of dimension_0, specified by `indices`.
+
+For example:
+
+```prettyprint
+c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
+
+# Select two rows, one segment.
+tf.sparse_segment_sum(c, tf.constant([0, 1]), tf.constant([0, 0]))
+ ==> [[0 0 0 0]]
+
+# Select two rows, two segments.
+tf.sparse_segment_sum(c, tf.constant([0, 1]), tf.constant([0, 1]))
+ ==> [[ 1 2 3 4]
+ [-1 -2 -3 -4]]
+
+# Select all rows, two segments.
+tf.sparse_segment_sum(c, tf.constant([0, 1, 2]), tf.constant([0, 0, 1]))
+ ==> [[0 0 0 0]
+ [5 6 7 8]]
+
+# Which is equivalent to:
+tf.segment_sum(c, tf.constant([0, 0, 1]))
+```
+
+indices: A 1-D tensor. Has same rank as `segment_ids`.
+
+segment_ids: A 1-D tensor. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+)doc");
+
+REGISTER_OP("SparseSegmentMean")
+ .Input("data: T")
+ .Input("indices: int32")
+ .Input("segment_ids: int32")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Computes the mean along sparse segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Like `SegmentMean`, but `segment_ids` can have rank less than `data`'s first
+dimension, selecting a subset of dimension_0, specified by `indices`.
+
+indices: A 1-D tensor. Has same rank as `segment_ids`.
+
+segment_ids: A 1-D tensor. Values should be sorted and can be repeated.
+
+output: Has same shape as data, except for dimension_0 which
+has size `k`, the number of segments.
+
+)doc");
+
+REGISTER_OP("SparseSegmentMeanGrad")
+ .Input("grad: T")
+ .Input("indices: int32")
+ .Input("segment_ids: int32")
+ .Input("output_dim0: int32")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Computes gradients for SparseSegmentMean.
+
+Returns tensor "output" with same shape as grad, except for dimension_0 whose
+value is output_dim0.
+
+grad: gradient propagated to the SparseSegmentMean op.
+indices: indices passed to the corresponding SparseSegmentMean op.
+segment_ids: segment_ids passed to the corresponding SparseSegmentMean op.
+output_dim0: dimension_0 of "data" passed to SparseSegmentMean op.
+)doc");
+
+REGISTER_OP("All")
+ .Input("input: bool")
+ .Input("reduction_indices: int32")
+ .Output("output: bool")
+ .Attr("keep_dims: bool = false")
+ .Doc(R"doc(
+Computes the "logical and" of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+REGISTER_OP("Any")
+ .Input("input: bool")
+ .Input("reduction_indices: int32")
+ .Attr("keep_dims: bool = false")
+ .Output("output: bool")
+ .Doc(R"doc(
+Computes the "logical or" of elements across dimensions of a tensor.
+
+Reduces `input` along the dimensions given in `reduction_indices`. Unless
+`keep_dims` is true, the rank of the tensor is reduced by 1 for each entry in
+`reduction_indices`. If `keep_dims` is true, the reduced dimensions are
+retained with length 1.
+
+input: The tensor to reduce.
+reduction_indices: The dimensions to reduce.
+keep_dims: If true, retain reduced dimensions with length 1.
+output: The reduced tensor.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Range")
+ .Input("start: int32")
+ .Input("limit: int32")
+ .Input("delta: int32")
+ .Output("output: int32")
+ .Doc(R"doc(
+Creates a sequence of integers.
+
+This operation creates a sequence of integers that begins at `start` and
+extends by increments of `delta` up to but not including `limit`.
+
+For example:
+
+```
+# 'start' is 3
+# 'limit' is 18
+# 'delta' is 3
+tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
+```
+
+start: 0-D (scalar). First entry in the sequence.
+limit: 0-D (scalar). Upper limit of sequence, exclusive.
+delta: 0-D (scalar). Optional. Default is 1. Number that increments `start`.
+output: 1-D.
+)doc");
+
+REGISTER_OP("LinSpace")
+ .Input("start: T")
+ .Input("stop: T")
+ .Input("num: int32")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Generates values in an interval.
+
+A sequence of `num` evenly-spaced values is generated beginning at `start`.
+If `num > 1`, the values in the sequence increase by `(stop - start) / (num - 1)`,
+so that the last one is exactly `stop`.
+
+For example:
+
+```
+tf.linspace(10.0, 12.0, 3, name="linspace") => [ 10.0 11.0 12.0]
+```
+
+start: First entry in the range.
+stop: Last entry in the range.
+num: Number of values to generate.
+output: 1-D. The generated values.
+)doc");
+
+REGISTER_OP("Complex")
+ .Input("real: float")
+ .Input("imag: float")
+ .Output("out: complex64")
+ .Doc(R"doc(
+Converts two real numbers to a complex number.
+
+Given a tensor `real` representing the real part of a complex number, and a
+tensor `imag` representing the imaginary part of a complex number, this
+operation returns complex numbers elementwise of the form \\(a + bj\\), where
+*a* represents the `real` part and *b* represents the `imag` part.
+
+The input tensors `real` and `imag` must have the same shape.
+
+For example:
+
+```
+# tensor 'real' is [2.25, 3.25]
+# tensor `imag` is [4.75, 5.75]
+tf.complex(real, imag) ==> [[2.25 + 4.75j], [3.25 + 5.75j]]
+```
+)doc");
+
+REGISTER_OP("Real")
+ .Input("in: complex64")
+ .Output("out: float")
+ .Doc(R"doc(
+Returns the real part of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of type
+`float` that is the real part of each element in `in`. All elements in `in`
+must be complex numbers of the form \\(a + bj\\), where *a* is the real part
+returned by this operation and *b* is the imaginary part.
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.real(in) ==> [-2.25, 3.25]
+```
+)doc");
+
+REGISTER_OP("Imag")
+ .Input("in: complex64")
+ .Output("out: float")
+ .Doc(R"doc(
+Returns the imaginary part of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of type
+`float` that is the imaginary part of each element in `in`. All elements in `in`
+must be complex numbers of the form \\(a + bj\\), where *a* is the real part
+and *b* is the imaginary part returned by this operation.
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.imag(in) ==> [4.75, 5.75]
+```
+)doc");
+
+REGISTER_OP("Conj")
+ .Input("in: complex64")
+ .Output("out: complex64")
+ .Doc(R"doc(
+Returns the complex conjugate of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of
+complex numbers that are the complex conjugate of each element in `in`. The
+complex numbers in `in` must be of the form \\(a + bj\\), where *a* is the real
+part and *b* is the imaginary part.
+
+The complex conjugate returned by this operation is of the form \\(a - bj\\).
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.conj(in) ==> [-2.25 - 4.75j, 3.25 - 5.75j]
+```
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/nn_ops.cc b/tensorflow/core/ops/nn_ops.cc
new file mode 100644
index 0000000000..03ba49d5cd
--- /dev/null
+++ b/tensorflow/core/ops/nn_ops.cc
@@ -0,0 +1,543 @@
+#include "tensorflow/core/framework/numeric_op.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/util/padding.h"
+namespace tensorflow {
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("AvgPool")
+ .Input("value: T")
+ .Output("output: T")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr(GetPaddingAttrString())
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Performs average pooling on the input.
+
+Each entry in `output` is the mean of the corresponding size `ksize`
+window in `value`.
+
+value: 4-D with shape `[batch, height, width, channels]`.
+ksize: The size of the sliding window for each dimension of `value`.
+strides: The stride of the sliding window for each dimension of `value`.
+padding: The type of padding algorithm to use.
+output: The average pooled output tensor.
+)doc");
+
+REGISTER_OP("AvgPoolGrad")
+ .Input("orig_input_shape: int32")
+ .Input("grad: T")
+ .Output("output: T")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr(GetPaddingAttrString())
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Computes gradients of the average pooling function.
+
+orig_input_shape: 1-D. Shape of the original input to `avg_pool`.
+grad: 4-D with shape `[batch, height, width, channels]`. Gradients w.r.t.
+ the output of `avg_pool`.
+ksize: The size of the sliding window for each dimension of the input.
+strides: The stride of the sliding window for each dimension of the input.
+padding: The type of padding algorithm to use.
+output: 4-D. Gradients w.r.t. the input of `avg_pool`.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("BatchNormWithGlobalNormalization")
+ .Input("t: T")
+ .Input("m: T")
+ .Input("v: T")
+ .Input("beta: T")
+ .Input("gamma: T")
+ .Output("result: T")
+ .Attr("T: numbertype")
+ .Attr("variance_epsilon: float")
+ .Attr("scale_after_normalization: bool")
+ .Doc(R"doc(
+Batch normalization.
+
+t: A 4D input Tensor.
+m: A 1D mean Tensor with size matching the last dimension of t.
+ This is the first output from MovingMoments.
+v: A 1D variance Tensor with size matching the last dimension of t.
+ This is the second output from MovingMoments.
+beta: A 1D beta Tensor with size matching the last dimension of t.
+ An offset to be added to the normalized tensor.
+gamma: A 1D gamma Tensor with size matching the last dimension of t.
+ If "scale_after_normalization" is true, this tensor will be multiplied
+ with the normalized tensor.
+variance_epsilon: A small float number to avoid dividing by 0.
+scale_after_normalization: A bool indicating whether the resulting tensor
+ needs to be multiplied with gamma.
+)doc");
+
+REGISTER_OP("BatchNormWithGlobalNormalizationGrad")
+ .Input("t: T")
+ .Input("m: T")
+ .Input("v: T")
+ .Input("gamma: T")
+ .Input("backprop: T")
+ .Output("dx: T")
+ .Output("dm: T")
+ .Output("dv: T")
+ .Output("db: T")
+ .Output("dg: T")
+ .Attr("T: numbertype")
+ .Attr("variance_epsilon: float")
+ .Attr("scale_after_normalization: bool")
+ .Doc(R"doc(
+Gradients for batch normalization.
+
+t: A 4D input Tensor.
+m: A 1D mean Tensor with size matching the last dimension of t.
+ This is the first output from MovingMoments.
+v: A 1D variance Tensor with size matching the last dimension of t.
+ This is the second output from MovingMoments.
+gamma: A 1D gamma Tensor with size matching the last dimension of t.
+ If "scale_after_normalization" is true, this Tensor will be multiplied
+ with the normalized Tensor.
+backprop: 4D backprop Tensor.
+variance_epsilon: A small float number to avoid dividing by 0.
+scale_after_normalization: A bool indicating whether the resulting tensor
+ needs to be multiplied with gamma.
+
+dx: 4D backprop tensor for input.
+dm: 1D backprop tensor for mean.
+dv: 1D backprop tensor for variance.
+db: 1D backprop tensor for beta.
+dg: 1D backprop tensor for gamma.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("BiasAdd")
+ .Attr("T: numbertype")
+ .Input("value: T")
+ .Input("bias: T")
+ .Output("output: T")
+ .Doc(R"doc(
+Adds `bias` to `value`.
+
+This is a special case of `tf.add` where `bias` is restricted to be 1-D.
+Broadcasting is supported, so `value` may have any number of dimensions.
+
+value: Any number of dimensions.
+bias: 1-D with size the last dimension of `value`.
+output: Broadcasted sum of `value` and `bias`.
+)doc");
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Conv2D")
+ .Input("input: T")
+ .Input("filter: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Attr("strides: list(int)")
+ .Attr("use_cudnn_on_gpu: bool = true")
+ .Attr(GetPaddingAttrString())
+ .Doc(R"doc(
+Computes a 2-D convolution given 4-D `input` and `filter` tensors.
+
+Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
+and a filter / kernel tensor of shape
+`[filter_height, filter_width, in_channels, out_channels]`, this op
+performs the following:
+
+1. Flattens the filter to a 2-D matrix with shape
+ `[filter_height * filter_width * in_channels, output_channels]`.
+2. Extracts image patches from the input tensor to form a *virtual*
+ tensor of shape `[batch, out_height, out_width,
+ filter_height * filter_width * in_channels]`.
+3. For each patch, right-multiplies the filter matrix and the image patch
+ vector.
+
+In detail,
+
+ output[b, i, j, k] =
+ sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
+ filter[di, dj, q, k]
+
+Must have `strides[0] = strides[3] = 1`. For the most common case of the same
+horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+strides: 1-D of length 4. The stride of the sliding window for each dimension
+ of `input`.
+padding: The type of padding algorithm to use.
+)doc");
+
+REGISTER_OP("Conv2DBackpropInput")
+ .Input("input_sizes: int32")
+ .Input("filter: T")
+ .Input("out_backprop: T")
+ .Output("output: T")
+ .Attr("T: {float, double}")
+ .Attr("strides: list(int)")
+ .Attr("use_cudnn_on_gpu: bool = true")
+ .Attr(GetPaddingAttrString())
+ .Doc(R"doc(
+Computes the gradients of convolution with respect to the input.
+
+input_sizes: An integer vector representing the shape of `input`,
+ where `input` is a 4-D `[batch, height, width, channels]` tensor.
+filter: 4-D with shape
+ `[filter_height, filter_width, in_channels, out_channels]`.
+out_backprop: 4-D with shape `[batch, out_height, out_width, out_channels]`.
+ Gradients w.r.t. the output of the convolution.
+strides: The stride of the sliding window for each dimension of the input
+ of the convolution.
+padding: The type of padding algorithm to use.
+output: 4-D with shape `[batch, in_height, in_width, in_channels]`. Gradient
+ w.r.t. the input of the convolution.
+)doc");
+
+// TODO(jeff): Instead of 'use_cudnn_for_gpu', maybe we should have a
+// more general string attribute ('kernel_impl'?) that can be used to
+// select among several possible implementations.
+REGISTER_OP("Conv2DBackpropFilter")
+ .Input("input: T")
+ .Input("filter_sizes: int32")
+ .Output("output: T")
+ .Input("out_backprop: T")
+ .Attr("T: {float, double}")
+ .Attr("strides: list(int)")
+ .Attr("use_cudnn_on_gpu: bool = true")
+ .Attr(GetPaddingAttrString())
+ .Doc(R"doc(
+Computes the gradients of convolution with respect to the filter.
+
+input: 4-D with shape `[batch, in_height, in_width, in_channels]`.
+filter_sizes: An integer vector representing the tensor shape of `filter`,
+ where `filter` is a 4-D
+ `[filter_height, filter_width, in_channels, out_channels]` tensor.
+out_backprop: 4-D with shape `[batch, out_height, out_width, out_channels]`.
+ Gradients w.r.t. the output of the convolution.
+strides: The stride of the sliding window for each dimension of the input
+ of the convolution.
+padding: The type of padding algorithm to use.
+output: 4-D with shape
+ `[filter_height, filter_width, in_channels, out_channels]`. Gradient w.r.t.
+ the `filter` input of the convolution.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("L2Loss")
+ .Input("t: T")
+ .Output("output: T")
+ .Attr("T: numbertype")
+ .Doc(R"doc(
+L2 Loss.
+
+Computes half the L2 norm of a tensor without the `sqrt`:
+
+ output = sum(t ** 2) / 2
+
+t: Typically 2-D, but may have any dimensions.
+output: 0-D.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("LRN")
+ .Input("input: float")
+ .Output("output: float")
+ .Attr("depth_radius: int = 5")
+ .Attr("bias: float = 1.0")
+ .Attr("alpha: float = 1.0")
+ .Attr("beta: float = 0.5")
+ .Doc(R"doc(
+Local Response Normalization.
+
+The 4-D `input` tensor is treated as a 3-D array of 1-D vectors (along the last
+dimension), and each vector is normalized independently. Within a given vector,
+each component is divided by the weighted, squared sum of inputs within
+`depth_radius`. In detail,
+
+ sqr_sum[a, b, c, d] =
+ sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
+ output = input / (bias + alpha * sqr_sum) ** beta
+
+For details, see [Krizhevsky et al., ImageNet classification with deep
+convolutional neural networks (NIPS 2012)]
+(http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks).
+
+input: 4-D.
+depth_radius: 0-D. Half-width of the 1-D normalization window.
+bias: An offset (usually positive to avoid dividing by 0).
+alpha: A scale factor, usually positive.
+beta: An exponent.
+)doc");
+
+REGISTER_OP("LRNGrad")
+ .Input("input_grads: float")
+ .Input("input_image: float")
+ .Input("output_image: float")
+ .Output("output: float")
+ .Attr("depth_radius: int = 5")
+ .Attr("bias: float = 1.0")
+ .Attr("alpha: float = 1.0")
+ .Attr("beta: float = 0.5")
+ .Doc(R"doc(
+Gradients for Local Response Normalization.
+
+input_grads: 4-D with shape `[batch, height, width, channels]`.
+input_image: 4-D with shape `[batch, height, width, channels]`.
+output_image: 4-D with shape `[batch, height, width, channels]`.
+depth_radius: A depth radius.
+bias: An offset (usually > 0 to avoid dividing by 0).
+alpha: A scale factor, usually positive.
+beta: An exponent.
+output: The gradients for LRN.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("MaxPool")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr(GetPaddingAttrString())
+ .Input("input: float")
+ .Output("output: float")
+ .Doc(R"doc(
+Performs max pooling on the input.
+
+ksize: The size of the window for each dimension of the input tensor.
+strides: The stride of the sliding window for each dimension of the
+ input tensor.
+padding: The type of padding algorithm to use.
+input: 4-D input to pool over.
+output: The max pooled output tensor.
+)doc");
+
+REGISTER_OP("MaxPoolGrad")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr(GetPaddingAttrString())
+ .Input("orig_input: float")
+ .Input("orig_output: float")
+ .Input("grad: float")
+ .Output("output: float")
+ .Doc(R"doc(
+Computes gradients of the maxpooling function.
+
+ksize: The size of the window for each dimension of the input tensor.
+strides: The stride of the sliding window for each dimension of the
+ input tensor.
+padding: The type of padding algorithm to use.
+orig_input: The original input tensor.
+orig_output: The original output tensor.
+grad: 4-D. Gradients w.r.t. the output of `max_pool`.
+output: Gradients w.r.t. the input to `max_pool`.
+)doc");
+
+REGISTER_OP("MaxPoolWithArgmax")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr("Targmax: {int32, int64} = DT_INT64")
+ .Attr(GetPaddingAttrString())
+ .Input("input: float")
+ .Output("output: float")
+ .Output("argmax: Targmax")
+ .Doc(R"doc(
+Performs max pooling on the input and outputs both max values and indices.
+
+The indices in `argmax` are flattened, so that a maximum value at position
+`[b, y, x, c]` becomes flattened index
+`((b * height + y) * width + x) * channels + c`.
+
+ksize: The size of the window for each dimension of the input tensor.
+strides: The stride of the sliding window for each dimension of the
+ input tensor.
+padding: The type of padding algorithm to use.
+input: 4-D with shape `[batch, height, width, channels]`. Input to pool over.
+output: The max pooled output tensor.
+argmax: 4-D. The flattened indices of the max values chosen for each output.
+)doc");
+
+REGISTER_OP("MaxPoolGradWithArgmax")
+ .Attr("ksize: list(int) >= 4")
+ .Attr("strides: list(int) >= 4")
+ .Attr(GetPaddingAttrString())
+ .Attr("Targmax: {int32, int64}")
+ .Input("input: float")
+ .Input("grad: float")
+ .Input("argmax: Targmax")
+ .Output("output: float")
+ .Doc(R"doc(
+Computes gradients of the maxpooling function.
+
+ksize: The size of the window for each dimension of the input tensor.
+strides: The stride of the sliding window for each dimension of the
+ input tensor.
+padding: The type of padding algorithm to use.
+input: The original input.
+grad: 4-D with shape `[batch, height, width, channels]`. Gradients w.r.t. the
+ output of `max_pool`.
+argmax: The indices of the maximum values chosen for each output of `max_pool`.
+output: Gradients w.r.t. the input of `max_pool`.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Relu")
+ .Input("features: T")
+ .Output("activations: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes rectified linear: `max(features, 0)`.
+)doc");
+
+REGISTER_OP("ReluGrad")
+ .Input("gradients: T")
+ .Input("features: T")
+ .Output("backprops: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes rectified linear gradients for a Relu operation.
+
+gradients: The backpropagated gradients to the corresponding Relu operation.
+features: The features passed as input to the corresponding Relu operation.
+backprops: The gradients: `gradients * (features > 0)`.
+)doc");
+
+REGISTER_OP("Relu6")
+ .Input("features: T")
+ .Output("activations: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes rectified linear 6: `min(max(features, 0), 6)`.
+)doc");
+
+REGISTER_OP("Relu6Grad")
+ .Input("gradients: T")
+ .Input("features: T")
+ .Output("backprops: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes rectified linear 6 gradients for a Relu6 operation.
+
+gradients: The backpropagated gradients to the corresponding Relu6 operation.
+features: The features passed as input to the corresponding Relu6 operation.
+backprops: The gradients:
+  `gradients * (features > 0) * (features < 6)`.
+)doc");
+
+REGISTER_OP("Softplus")
+ .Input("features: T")
+ .Output("activations: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes softplus: `log(exp(features) + 1)`.
+)doc");
+
+REGISTER_OP("SoftplusGrad")
+ .Input("gradients: T")
+ .Input("features: T")
+ .Output("backprops: T")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Computes softplus gradients for a softplus operation.
+
+gradients: The backpropagated gradients to the corresponding softplus operation.
+features: The features passed as input to the corresponding softplus operation.
+backprops: The gradients: `gradients / (1 + exp(-features))`.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("Softmax")
+ .Input("logits: T")
+ .Output("softmax: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Computes softmax activations.
+
+For each batch `i` and class `j` we have
+
+ softmax[i, j] = exp(logits[i, j]) / sum(exp(logits[i]))
+
+logits: 2-D with shape `[batch_size, num_classes]`.
+softmax: Same shape as `logits`.
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("SoftmaxCrossEntropyWithLogits")
+ .Input("features: T")
+ .Input("labels: T")
+ .Output("loss: T")
+ .Output("backprop: T")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Computes softmax cross entropy cost and gradients to backpropagate.
+
+Inputs are the logits, not probabilities.
+
+features: batch_size x num_classes matrix
+labels: batch_size x num_classes matrix
+ The caller must ensure that each batch of labels represents a valid
+ probability distribution.
+loss: Per example loss (batch_size vector).
+backprop: backpropagated gradients (batch_size x num_classes matrix).
+)doc");
+
+// --------------------------------------------------------------------------
+
+REGISTER_OP("InTopK")
+ .Attr("k: int")
+ .Input("predictions: float")
+ .Input("targets: int32")
+ .Output("precision: bool")
+ .Doc(R"doc(
+Says whether the targets are in the top K predictions.
+
+This outputs a batch_size bool array; an entry out[i] is true if the
+prediction for the target class is among the top k predictions among
+all predictions for example i. Note that the behavior of InTopK differs
+from the TopK op in its handling of ties; if multiple classes have the
+same prediction value and straddle the top-k boundary, all of those
+classes are considered to be in the top k.
+
+More formally, let
+
+ \\(predictions_i\\) be the predictions for all classes for example i,
+ \\(targets_i\\) be the target class for example i,
+ \\(out_i\\) be the output for example i,
+
+$$out_i = predictions_{i, targets_i} \in TopKIncludingTies(predictions_i)$$
+
+predictions: A batch_size x classes tensor
+targets: A batch_size vector of class ids
+k: Number of top elements to look at for computing precision
+precision: Computed Precision at k as a bool Tensor
+
+)doc");
+
+REGISTER_OP("TopK")
+ .Attr("k: int >= 1")
+ .Input("input: T")
+ .Output("values: T")
+ .Output("indices: int32")
+ .Attr("T: realnumbertype")
+ .Doc(R"doc(
+Returns the values and indices of the k largest elements for each row.
+
+\\(values_{i, j}\\) represents the j-th largest element in \\(input_i\\).
+
+\\(indices_{i, j}\\) gives the column index of the corresponding element,
+such that \\(input_{i, indices_{i, j}} = values_{i, j}\\). If two
+elements are equal, the lower-index element appears first.
+
+k: Number of top elements to look for within each row
+input: A batch_size x classes tensor
+values: A batch_size x k tensor with the k largest elements for each row,
+ sorted in descending order
+indices: A batch_size x k tensor with the index of each value within each row
+
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/no_op.cc b/tensorflow/core/ops/no_op.cc
new file mode 100644
index 0000000000..52778917cb
--- /dev/null
+++ b/tensorflow/core/ops/no_op.cc
@@ -0,0 +1,10 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("NoOp")
+ .Doc(R"doc(
+Does nothing. Only useful as a placeholder for control edges.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/parsing_ops.cc b/tensorflow/core/ops/parsing_ops.cc
new file mode 100644
index 0000000000..7fcaa3abf1
--- /dev/null
+++ b/tensorflow/core/ops/parsing_ops.cc
@@ -0,0 +1,104 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("DecodeRaw")
+ .Input("bytes: string")
+ .Output("output: out_type")
+ .Attr("out_type: {float,double,int32,uint8,int16,int8,int64}")
+ .Attr("little_endian: bool = true")
+ .Doc(R"doc(
+Reinterpret the bytes of a string as a vector of numbers.
+
+bytes: All the elements must have the same length.
+little_endian: Whether the input bytes are in little-endian order.
+ Ignored for out_types that are stored in a single byte like uint8.
+output: A Tensor with one more dimension than the input bytes. The
+ added dimension will have size equal to the length of the elements
+ of bytes divided by the number of bytes to represent out_type.
+)doc");
+
+REGISTER_OP("ParseExample")
+ .Input("serialized: string")
+ .Input("names: string")
+ .Input("sparse_keys: Nsparse * string")
+ .Input("dense_keys: Ndense * string")
+ .Input("dense_defaults: Tdense")
+ .Output("sparse_indices: Nsparse * int64")
+ .Output("sparse_values: sparse_types")
+ .Output("sparse_shapes: Nsparse * int64")
+ .Output("dense_values: Tdense")
+ .Attr("Nsparse: int >= 0") // Inferred from sparse_keys
+ .Attr("Ndense: int >= 0") // Inferred from dense_keys
+ .Attr("sparse_types: list({float,int64,string}) >= 0")
+ .Attr("Tdense: list({float,int64,string}) >= 0")
+ .Attr("dense_shapes: list(shape) >= 0")
+ .Doc(R"doc(
+Transforms a vector of brain.Example protos (as strings) into typed tensors.
+
+serialized: A vector containing a batch of binary serialized Example protos.
+names: A vector containing the names of the serialized protos.
+ May contain, for example, table key (descriptive) names for the
+ corresponding serialized protos. These are purely useful for debugging
+ purposes, and the presence of values here has no effect on the output.
+ May also be an empty vector if no names are available.
+ If non-empty, this vector must be the same length as "serialized".
+dense_keys: A list of Ndense string Tensors (scalars).
+ The keys expected in the Examples' features associated with dense values.
+dense_defaults: A list of Ndense Tensors (some may be empty).
+ dense_defaults[j] provides default values
+ when the example's feature_map lacks dense_key[j]. If an empty Tensor is
+ provided for dense_defaults[j], then the Feature dense_keys[j] is required.
+ The input type is inferred from dense_defaults[j], even when it's empty.
+ If dense_defaults[j] is not empty, its shape must match dense_shapes[j].
+dense_shapes: A list of Ndense shapes; the shapes of data in each Feature
+ given in dense_keys.
+ The number of elements in the Feature corresponding to dense_key[j]
+ must always equal dense_shapes[j].NumEntries().
+ If dense_shapes[j] == (D0, D1, ..., DN) then the shape of the output
+ Tensor dense_values[j] will be (|serialized|, D0, D1, ..., DN):
+ The dense outputs are just the inputs row-stacked by batch.
+sparse_keys: A list of Nsparse string Tensors (scalars).
+ The keys expected in the Examples' features associated with sparse values.
+sparse_types: A list of Nsparse types; the data types of data in each Feature
+ given in sparse_keys.
+ Currently the ParseExample supports DT_FLOAT (FloatList),
+ DT_INT64 (Int64List), and DT_STRING (BytesList).
+)doc");
+
+REGISTER_OP("DecodeCSV")
+ .Input("records: string")
+ .Input("record_defaults: OUT_TYPE")
+ .Output("output: OUT_TYPE")
+ .Attr("OUT_TYPE: list({float,int32,int64,string})")
+ .Attr("field_delim: string = ','")
+ .Doc(R"doc(
+Convert CSV records to tensors. Each column maps to one tensor.
+
+RFC 4180 format is expected for the CSV records.
+(https://tools.ietf.org/html/rfc4180)
+Note that we allow leading and trailing spaces in int and float fields.
+
+records: Each string is a record/row in the csv and all records should have
+ the same format.
+record_defaults: One tensor per column of the input record, with either a
+ scalar default value for that column or empty if the column is required.
+field_delim: delimiter to separate fields in a record.
+output: Each tensor will have the same shape as records.
+)doc");
+
+REGISTER_OP("StringToNumber")
+ .Input("string_tensor: string")
+ .Output("output: out_type")
+ .Attr("out_type: {float, int32} = DT_FLOAT")
+ .Doc(R"doc(
+Converts each string in the input Tensor to the specified numeric type.
+
+(Note that int32 overflow results in an error while float overflow
+results in a rounded value.)
+
+out_type: The numeric type to interpret each string in string_tensor as.
+output: A Tensor of the same shape as the input string_tensor.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/random_ops.cc b/tensorflow/core/ops/random_ops.cc
new file mode 100644
index 0000000000..4be4354b85
--- /dev/null
+++ b/tensorflow/core/ops/random_ops.cc
@@ -0,0 +1,108 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("RandomUniform")
+ .Input("shape: T")
+ .SetIsStateful()
+ .Output("output: dtype")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Attr("dtype: {float,double}")
+ .Attr("T: {int32, int64}")
+ .Doc(R"doc(
+Outputs random values from a uniform distribution.
+
+The generated values follow a uniform distribution in the range `[0, 1)`. The
+lower bound 0 is included in the range, while the upper bound 1 is excluded.
+
+shape: The shape of the output tensor.
+dtype: The type of the output.
+seed: If either `seed` or `seed2` are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+
+output: A tensor of the specified shape filled with uniform random values.
+)doc");
+
+REGISTER_OP("RandomStandardNormal")
+ .Input("shape: T")
+ .SetIsStateful()
+ .Output("output: dtype")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Attr("dtype: {float,double}")
+ .Attr("T: {int32, int64}")
+ .Doc(R"doc(
+Outputs random values from a normal distribution.
+
+The generated values will have mean 0 and standard deviation 1.
+
+shape: The shape of the output tensor.
+dtype: The type of the output.
+seed: If either `seed` or `seed2` are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+
+output: A tensor of the specified shape filled with random normal values.
+)doc");
+
+REGISTER_OP("TruncatedNormal")
+ .Input("shape: T")
+ .SetIsStateful()
+ .Output("output: dtype")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Attr("dtype: {float,double}")
+ .Attr("T: {int32, int64}")
+ .Doc(R"doc(
+Outputs random values from a truncated normal distribution.
+
+The generated values follow a normal distribution with mean 0 and standard
+deviation 1, except that values whose magnitude is more than 2 standard
+deviations from the mean are dropped and re-picked.
+
+shape: The shape of the output tensor.
+dtype: The type of the output.
+seed: If either `seed` or `seed2` are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+
+output: A tensor of the specified shape filled with random truncated normal
+ values.
+)doc");
+
+REGISTER_OP("RandomShuffle")
+ .Input("value: T")
+ .SetIsStateful()
+ .Output("output: T")
+ .Attr("seed: int = 0")
+ .Attr("seed2: int = 0")
+ .Attr("T: type")
+ .Doc(R"doc(
+Randomly shuffles a tensor along its first dimension.
+
+The tensor is shuffled along dimension 0, such that each `value[j]` is mapped
+to one and only one `output[i]`. For example, a mapping that might occur for a
+3x2 tensor is:
+
+```prettyprint
+[[1, 2], [[5, 6],
+ [3, 4], ==> [1, 2],
+ [5, 6]] [3, 4]]
+```
+
+value: The tensor to be shuffled.
+seed: If either `seed` or `seed2` are set to be non-zero, the random number
+ generator is seeded by the given seed. Otherwise, it is seeded by a
+ random seed.
+seed2: A second seed to avoid seed collision.
+
+output: A tensor of same shape and type as `value`, shuffled along its first
+ dimension.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/sendrecv_ops.cc b/tensorflow/core/ops/sendrecv_ops.cc
new file mode 100644
index 0000000000..51158263c1
--- /dev/null
+++ b/tensorflow/core/ops/sendrecv_ops.cc
@@ -0,0 +1,99 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("_Send")
+ .Input("tensor: T")
+ .Attr("T: type")
+ .Attr("tensor_name: string")
+ .Attr("send_device: string")
+ .Attr("send_device_incarnation: int")
+ .Attr("recv_device: string")
+ .Attr("client_terminated: bool = false")
+ .Doc(R"doc(
+Sends the named tensor from send_device to recv_device.
+
+tensor: The tensor to send.
+tensor_name: The name of the tensor to send.
+send_device: The name of the device sending the tensor.
+send_device_incarnation: The current incarnation of send_device.
+recv_device: The name of the device receiving the tensor.
+client_terminated: If set to true, this indicates that the node was added
+ to the graph as a result of a client-side feed or fetch of Tensor data,
+ in which case the corresponding send or recv is expected to be managed
+ locally by the caller.
+)doc");
+
+REGISTER_OP("_Recv")
+ .Output("tensor: tensor_type")
+ .Attr("tensor_type: type")
+ .Attr("tensor_name: string")
+ .Attr("send_device: string")
+ .Attr("send_device_incarnation: int")
+ .Attr("recv_device: string")
+ .Attr("client_terminated: bool = false")
+ .Doc(R"doc(
+Receives the named tensor from send_device on recv_device.
+
+tensor: The tensor to receive.
+tensor_name: The name of the tensor to receive.
+send_device: The name of the device sending the tensor.
+send_device_incarnation: The current incarnation of send_device.
+recv_device: The name of the device receiving the tensor.
+client_terminated: If set to true, this indicates that the node was added
+ to the graph as a result of a client-side feed or fetch of Tensor data,
+ in which case the corresponding send or recv is expected to be managed
+ locally by the caller.
+)doc");
+
+REGISTER_OP("_HostSend")
+ .Input("tensor: T")
+ .Attr("T: type")
+ .Attr("tensor_name: string")
+ .Attr("send_device: string")
+ .Attr("send_device_incarnation: int")
+ .Attr("recv_device: string")
+ .Attr("client_terminated: bool = false")
+ .Doc(R"doc(
+Sends the named tensor from send_device to recv_device.
+
+_HostSend requires its input on host memory whereas _Send requires its
+input on device memory.
+
+tensor: The tensor to send.
+tensor_name: The name of the tensor to send.
+send_device: The name of the device sending the tensor.
+send_device_incarnation: The current incarnation of send_device.
+recv_device: The name of the device receiving the tensor.
+client_terminated: If set to true, this indicates that the node was added
+ to the graph as a result of a client-side feed or fetch of Tensor data,
+ in which case the corresponding send or recv is expected to be managed
+ locally by the caller.
+)doc");
+
+REGISTER_OP("_HostRecv")
+ .Output("tensor: tensor_type")
+ .Attr("tensor_type: type")
+ .Attr("tensor_name: string")
+ .Attr("send_device: string")
+ .Attr("send_device_incarnation: int")
+ .Attr("recv_device: string")
+ .Attr("client_terminated: bool = false")
+ .Doc(R"doc(
+Receives the named tensor from send_device on recv_device.
+
+_HostRecv produces its output on host memory whereas _Recv produces its
+output on device memory.
+
+tensor: The tensor to receive.
+tensor_name: The name of the tensor to receive.
+send_device: The name of the device sending the tensor.
+send_device_incarnation: The current incarnation of send_device.
+recv_device: The name of the device receiving the tensor.
+client_terminated: If set to true, this indicates that the node was added
+ to the graph as a result of a client-side feed or fetch of Tensor data,
+ in which case the corresponding send or recv is expected to be managed
+ locally by the caller.
+)doc");
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/ops/sparse_ops.cc b/tensorflow/core/ops/sparse_ops.cc
new file mode 100644
index 0000000000..51262373d5
--- /dev/null
+++ b/tensorflow/core/ops/sparse_ops.cc
@@ -0,0 +1,134 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("SparseToDense")
+ .Input("sparse_indices: Tindices")
+ .Input("output_shape: Tindices")
+ .Input("sparse_values: T")
+ .Input("default_value: T")
+ .Output("dense: T")
+ .Attr("T: type")
+ .Attr("Tindices: {int32, int64}")
+ .Doc(R"doc(
+Converts a sparse representation into a dense tensor.
+
+Builds an array `dense` with shape `output_shape` such that
+
+```prettyprint
+# If sparse_indices is scalar
+dense[i] = (i == sparse_indices ? sparse_values : default_value)
+
+# If sparse_indices is a vector, then for each i
+dense[sparse_indices[i]] = sparse_values[i]
+
+# If sparse_indices is an n by d matrix, then for each i in [0, n)
+dense[sparse_indices[i][0], ..., sparse_indices[i][d-1]] = sparse_values[i]
+```
+
+All other values in `dense` are set to `default_value`. If `sparse_values` is a
+scalar, all sparse indices are set to this single value.
+
+sparse_indices: 0-D, 1-D, or 2-D. `sparse_indices[i]` contains the complete
+ index where `sparse_values[i]` will be placed.
+output_shape: 1-D. Shape of the dense output tensor.
+sparse_values: 1-D. Values corresponding to each row of `sparse_indices`,
+ or a scalar value to be used for all sparse indices.
+default_value: Scalar value to set for indices not specified in
+ `sparse_indices`.
+dense: Dense output tensor of shape `output_shape`.
+)doc");
+
+REGISTER_OP("SparseConcat")
+ .Input("indices: N * int64")
+ .Input("values: N * T")
+ .Input("shapes: N * int64")
+ .Output("output_indices: int64")
+ .Output("output_values: T")
+ .Output("output_shape: int64")
+ .Attr("concat_dim: int >= 0")
+ .Attr("N: int >= 2")
+ .Attr("T: type")
+ .Doc(R"doc(
+Concatenates a list of `SparseTensor` along the specified dimension.
+
+Concatenation is with respect to the dense versions of these sparse tensors.
+It is assumed that each input is a `SparseTensor` whose elements are ordered
+along increasing dimension number.
+
+All inputs' shapes must match, except for the concat dimension. The
+`indices`, `values`, and `shapes` lists must have the same length.
+
+The output shape is identical to the inputs', except along the concat
+dimension, where it is the sum of the inputs' sizes along that dimension.
+
+The output elements will be resorted to preserve the sort order along
+increasing dimension number.
+
+This op runs in `O(M log M)` time, where `M` is the total number of non-empty
+values across all inputs. This is due to the need for an internal sort in
+order to concatenate efficiently across an arbitrary dimension.
+
+For example, if `concat_dim = 1` and the inputs are
+
+ sp_inputs[0]: shape = [2, 3]
+ [0, 2]: "a"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+ sp_inputs[1]: shape = [2, 4]
+ [0, 1]: "d"
+ [0, 2]: "e"
+
+then the output will be
+
+ shape = [2, 7]
+ [0, 2]: "a"
+ [0, 4]: "d"
+ [0, 5]: "e"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+Graphically this is equivalent to doing
+
+ [ a] concat [ d e ] = [ a d e ]
+ [b c ] [ ] [b c ]
+
+indices: 2-D. Indices of each input `SparseTensor`.
+values: 1-D. Non-empty values of each `SparseTensor`.
+shapes: 1-D. Shapes of each `SparseTensor`.
+output_indices: 2-D. Indices of the concatenated `SparseTensor`.
+output_values: 1-D. Non-empty values of the concatenated `SparseTensor`.
+output_shape: 1-D. Shape of the concatenated `SparseTensor`.
+concat_dim: Dimension to concatenate along.
+)doc");
+
+REGISTER_OP("SparseReorder")
+ .Input("input_indices: int64")
+ .Input("input_values: T")
+ .Input("input_shape: int64")
+ .Output("output_indices: int64")
+ .Output("output_values: T")
+ .Attr("T: type")
+ .Doc(R"doc(
+Reorders a SparseTensor into the canonical, row-major ordering.
+
+Note that by convention, all sparse ops preserve the canonical ordering along
+increasing dimension number. The only time ordering can be violated is during
+manual manipulation of the indices and values vectors to add entries.
+
+Reordering does not affect the shape of the SparseTensor.
+
+If the tensor has rank `R` and `N` non-empty values, `input_indices` has
+shape `[N, R]`, `input_values` has length `N`, and `input_shape` has length `R`.
+
+input_indices: 2-D. `N x R` matrix with the indices of non-empty values in a
+ SparseTensor, possibly not in canonical ordering.
+input_values: 1-D. `N` non-empty values corresponding to `input_indices`.
+input_shape: 1-D. Shape of the input SparseTensor.
+output_indices: 2-D. `N x R` matrix with the same indices as input_indices, but
+ in canonical row-major ordering.
+output_values: 1-D. `N` non-empty values corresponding to `output_indices`.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/state_ops.cc b/tensorflow/core/ops/state_ops.cc
new file mode 100644
index 0000000000..da9fd4ad08
--- /dev/null
+++ b/tensorflow/core/ops/state_ops.cc
@@ -0,0 +1,290 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("Variable")
+ .Output("ref: Ref(dtype)")
+ .Attr("shape: shape")
+ .Attr("dtype: type")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+Holds state in the form of a tensor that persists across steps.
+
+Outputs a ref to the tensor state so it may be read or modified.
+TODO(zhifengc/mrry): Add a pointer to a more detailed document
+about sharing states in tensorflow.
+
+ref: A reference to the variable tensor.
+shape: The shape of the variable tensor.
+dtype: The type of elements in the variable tensor.
+container: If non-empty, this variable is placed in the given container.
+ Otherwise, a default container is used.
+shared_name: If non-empty, this variable is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+)doc");
+
+REGISTER_OP("TemporaryVariable")
+ .Output("ref: Ref(dtype)")
+ .Attr("shape: shape")
+ .Attr("dtype: type")
+ .Attr("var_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+Returns a tensor that may be mutated, but only persists within a single step.
+
+This is an experimental op for internal use only and it is possible to use this
+op in unsafe ways. DO NOT USE unless you fully understand the risks.
+
+It is the caller's responsibility to ensure that 'ref' is eventually passed to a
+matching 'DestroyTemporaryVariable' op after all other uses have completed.
+
+Outputs a ref to the tensor state so it may be read or modified.
+
+ E.g.
+ var = state_ops._temporary_variable([1, 2], types.float_)
+ var_name = var.op.name
+ var = state_ops.assign(var, [[4.0, 5.0]])
+ var = state_ops.assign_add(var, [[6.0, 7.0]])
+ final = state_ops._destroy_temporary_variable(var, var_name=var_name)
+
+ref: A reference to the variable tensor.
+shape: The shape of the variable tensor.
+dtype: The type of elements in the variable tensor.
+var_name: Overrides the name used for the temporary variable resource. Default
+value is the name of the 'TemporaryVariable' op (which is guaranteed unique).
+)doc");
+
+REGISTER_OP("DestroyTemporaryVariable")
+ .Input("ref: Ref(T)")
+ .Output("value: T")
+ .Attr("T: type")
+ .Attr("var_name: string")
+ .Doc(R"doc(
+Destroys the temporary variable and returns its final value.
+
+Sets output to the value of the Tensor pointed to by 'ref', then destroys
+the temporary variable called 'var_name'.
+All other uses of 'ref' *must* have executed before this op.
+This is typically achieved by chaining the ref through each assign op, or by
+using control dependencies.
+
+Outputs the final value of the tensor pointed to by 'ref'.
+
+ref: A reference to the temporary variable tensor.
+var_name: Name of the temporary variable, usually the name of the matching
+'TemporaryVariable' op.
+)doc");
+
+REGISTER_OP("Assign")
+ .Input("ref: Ref(T)")
+ .Input("value: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: type")
+ .Attr("validate_shape: bool = true")
+ .Attr("use_locking: bool = true")
+ .SetAllowsUninitializedInput()
+ .Doc(R"doc(
+Update 'ref' by assigning 'value' to it.
+
+This operation outputs "ref" after the assignment is done.
+This makes it easier to chain operations that need to use the reset value.
+
+ref: Should be from a `Variable` node. May be uninitialized.
+value: The value to be assigned to the variable.
+validate_shape: If true, the operation will validate that the shape
+ of 'value' matches the shape of the Tensor being assigned to. If false,
+ 'ref' will take on the shape of 'value'.
+use_locking: If True, the assignment will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+output_ref:= Same as "ref". Returned as a convenience for operations that want
+ to use the new value after the variable has been reset.
+)doc");
+
+REGISTER_OP("AssignAdd")
+ .Input("ref: Ref(T)")
+ .Input("value: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update 'ref' by adding 'value' to it.
+
+This operation outputs "ref" after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+ref: Should be from a `Variable` node.
+value: The value to be added to the variable.
+use_locking: If True, the addition will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+output_ref:= Same as "ref". Returned as a convenience for operations that want
+ to use the new value after the variable has been updated.
+)doc");
+
+REGISTER_OP("AssignSub")
+ .Input("ref: Ref(T)")
+ .Input("value: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update 'ref' by subtracting 'value' from it.
+
+This operation outputs "ref" after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+ref: Should be from a `Variable` node.
+value: The value to be subtracted from the variable.
+use_locking: If True, the subtraction will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+output_ref:= Same as "ref". Returned as a convenience for operations that want
+ to use the new value after the variable has been updated.
+)doc");
+
+REGISTER_OP("ScatterUpdate")
+ .Input("ref: Ref(T)")
+ .Input("indices: Tindices")
+ .Input("updates: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: type")
+ .Attr("Tindices: {int32, int64}")
+ .Attr("use_locking: bool = true")
+ .Doc(R"doc(
+Applies sparse updates to a variable reference.
+
+This operation computes
+
+ # Scalar indices
+ ref[indices, ...] = updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] = updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] = updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+If `indices` contains duplicate entries, lexicographically later entries
+override earlier entries.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterUpdate.png" alt>
+</div>
+
+ref: Should be from a `Variable` node.
+indices: A tensor of indices into the first dimension of `ref`.
+updates: A tensor of updated values to store in `ref`.
+output_ref:= Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+use_locking: If True, the assignment will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ScatterAdd")
+ .Input("ref: Ref(T)")
+ .Input("indices: Tindices")
+ .Input("updates: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("Tindices: {int32, int64}")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Adds sparse updates to a variable reference.
+
+This operation computes
+
+ # Scalar indices
+ ref[indices, ...] += updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] += updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] += updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+Duplicate entries are handled correctly: if multiple `indices` reference
+the same location, their contributions add.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterAdd.png" alt>
+</div>
+
+ref: Should be from a `Variable` node.
+indices: A tensor of indices into the first dimension of `ref`.
+updates: A tensor of updated values to add to `ref`.
+output_ref:= Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+use_locking: If True, the addition will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ScatterSub")
+ .Input("ref: Ref(T)")
+ .Input("indices: Tindices")
+ .Input("updates: T")
+ .Output("output_ref: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("Tindices: {int32, int64}")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Subtracts sparse updates from a variable reference.
+
+This operation computes
+
+ # Scalar indices
+ ref[indices, ...] -= updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] -= updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] -= updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+Duplicate entries are handled correctly: if multiple `indices` reference
+the same location, their (negated) contributions add.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterSub.png" alt>
+</div>
+
+ref: Should be from a `Variable` node.
+indices: A tensor of indices into the first dimension of `ref`.
+updates: A tensor of updated values to subtract from `ref`.
+output_ref:= Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+use_locking: If True, the subtraction will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("CountUpTo")
+ .Input("ref: Ref(T)")
+ .Output("output: T")
+ .Attr("limit: int")
+ .Attr("T: {int32, int64}")
+ .Doc(R"doc(
+Increments 'ref' until it reaches 'limit'.
+
+This operation outputs "ref" after the update is done. This makes it
+easier to chain operations that need to use the updated value.
+
+ref: Should be from a scalar `Variable` node.
+limit: If incrementing ref would bring it above limit, instead generates an
+ 'OutOfRange' error.
+output: A copy of the input before increment. If nothing else modifies the
+ input, the values produced will all be distinct.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/string_ops.cc b/tensorflow/core/ops/string_ops.cc
new file mode 100644
index 0000000000..57b471074c
--- /dev/null
+++ b/tensorflow/core/ops/string_ops.cc
@@ -0,0 +1,21 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("StringToHashBucket")
+ .Input("string_tensor: string")
+ .Output("output: int64")
+ .Attr("num_buckets: int >= 1")
+ .Doc(R"doc(
+Converts each string in the input Tensor to its hash modulo the number of buckets.
+
+The hash function is deterministic on the content of the string within the
+process.
+
+Note that the hash function may change from time to time.
+
+num_buckets: The number of buckets.
+output: A Tensor of the same shape as the input string_tensor.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/summary_ops.cc b/tensorflow/core/ops/summary_ops.cc
new file mode 100644
index 0000000000..5f46c871b6
--- /dev/null
+++ b/tensorflow/core/ops/summary_ops.cc
@@ -0,0 +1,115 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+// Operators that deal with SummaryProtos (encoded as DT_STRING tensors) as
+// inputs or outputs in various ways.
+
+REGISTER_OP("ScalarSummary")
+ .Input("tags: string")
+ .Input("values: T")
+ .Output("summary: string")
+ .Attr("T: {float, double}")
+ .Doc(R"doc(
+Outputs a `Summary` protocol buffer with scalar values.
+
+The input `tags` and `values` must have the same shape. The generated summary
+has a summary value for each tag-value pair in `tags` and `values`.
+
+tags: 1-D. Tags for the summary.
+values: 1-D, same size as `tags`. Values for the summary.
+summary: Scalar. Serialized `Summary` protocol buffer.
+)doc");
+
+REGISTER_OP("HistogramSummary")
+ .Input("tag: string")
+ .Input("values: float")
+ .Output("summary: string")
+ .Doc(R"doc(
+Outputs a `Summary` protocol buffer with a histogram.
+
+The generated
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+has one summary value containing a histogram for `values`.
+
+This op reports an `OutOfRange` error if any value is not finite.
+
+tag: Scalar. Tag to use for the `Summary.Value`.
+values: Any shape. Values to use to build the histogram.
+summary: Scalar. Serialized `Summary` protocol buffer.
+)doc");
+
+REGISTER_OP("ImageSummary")
+ .Input("tag: string")
+ .Input("tensor: float")
+ .Output("summary: string")
+ .Attr("max_images: int >= 1 = 3")
+ .Attr(
+ "bad_color: tensor = { dtype: DT_UINT8 "
+ "tensor_shape: { dim { size: 4 } } "
+ "int_val: 255 int_val: 0 int_val: 0 int_val: 255 }")
+ .Doc(R"doc(
+Outputs a `Summary` protocol buffer with images.
+
+The summary has up to `max_images` summary values containing images. The
+images are built from `tensor` which must be 4-D with shape `[batch_size,
+height, width, channels]` and where `channels` can be:
+
+* 1: `tensor` is interpreted as Grayscale.
+* 3: `tensor` is interpreted as RGB.
+* 4: `tensor` is interpreted as RGBA.
+
+The images have the same number of channels as the input tensor. Their values
+are normalized, one image at a time, to fit in the range `[0, 255]`. The
+op uses two different normalization algorithms:
+
+* If the input values are all positive, they are rescaled so the largest one
+ is 255.
+
+* If any input value is negative, the values are shifted so input value 0.0
+ is at 127. They are then rescaled so that either the smallest value is 0,
+ or the largest one is 255.
+
+The `tag` argument is a scalar `Tensor` of type `string`. It is used to
+build the `tag` of the summary values:
+
+* If `max_images` is 1, the summary value tag is '*tag*/image'.
+* If `max_images` is greater than 1, the summary value tags are
+ generated sequentially as '*tag*/image/0', '*tag*/image/1', etc.
+
+The `bad_color` argument is the color to use in the generated images for
+non-finite input values. It is a `uint8` 1-D tensor of length `channels`.
+Each element must be in the range `[0, 255]` (it represents the value of a
+pixel in the output image). Non-finite values in the input tensor are
+replaced by this tensor in the output image. The default value is the color
+replaced by this tensor in the output image. The default value is the color
+red.
+
+tag: Scalar. Used to build the `tag` attribute of the summary values.
+tensor: 4-D of shape `[batch_size, height, width, channels]` where
+ `channels` is 1, 3, or 4.
+max_images: Max number of batch elements to generate images for.
+bad_color: Color to use for pixels with non-finite values.
+summary: Scalar. Serialized `Summary` protocol buffer.
+)doc");
+
+REGISTER_OP("MergeSummary")
+ .Input("inputs: N * string")
+ .Output("summary: string")
+ .Attr("N : int >= 1")
+ .Doc(R"doc(
+Merges summaries.
+
+This op creates a
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+protocol buffer that contains the union of all the values in the input
+summaries.
+
+When the Op is run, it reports an `InvalidArgument` error if multiple values
+in the summaries to merge use the same tag.
+
+inputs: Can be of any shape. Each must contain serialized `Summary` protocol
+ buffers.
+summary: Scalar. Serialized `Summary` protocol buffer.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/ops/training_ops.cc b/tensorflow/core/ops/training_ops.cc
new file mode 100644
index 0000000000..e7b4e92fd5
--- /dev/null
+++ b/tensorflow/core/ops/training_ops.cc
@@ -0,0 +1,199 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("ApplyGradientDescent")
+ .Input("var: Ref(T)")
+ .Input("alpha: T")
+ .Input("delta: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update '*var' by subtracting 'alpha' * 'delta' from it.
+
+var: Should be from a Variable().
+alpha: Scaling factor. Must be a scalar.
+delta: The change.
+out: Same as "var".
+use_locking: If True, the subtraction will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ApplyAdagrad")
+ .Input("var: Ref(T)")
+ .Input("accum: Ref(T)")
+ .Input("lr: T")
+ .Input("grad: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update '*var' according to the adagrad scheme.
+
+accum += grad * grad
+var -= lr * grad * (1 / sqrt(accum))
+
+var: Should be from a Variable().
+accum: Should be from a Variable().
+lr: Scaling factor. Must be a scalar.
+grad: The gradient.
+out: Same as "var".
+use_locking: If True, updating of the var and accum tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("SparseApplyAdagrad")
+ .Input("var: Ref(T)")
+ .Input("accum: Ref(T)")
+ .Input("lr: T")
+ .Input("grad: T")
+ .Input("indices: Tindices")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("Tindices: {int32, int64}")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update relevant entries in '*var' and '*accum' according to the adagrad scheme.
+
+That is, for rows for which we have grad, we update var and accum as follows:
+accum += grad * grad
+var -= lr * grad * (1 / sqrt(accum))
+
+var: Should be from a Variable().
+accum: Should be from a Variable().
+lr: Learning rate. Must be a scalar.
+grad: The gradient.
+indices: A vector of indices into the first dimension of var and accum.
+out: Same as "var".
+use_locking: If True, updating of the var and accum tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ApplyMomentum")
+ .Input("var: Ref(T)")
+ .Input("accum: Ref(T)")
+ .Input("lr: T")
+ .Input("grad: T")
+ .Input("momentum: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update '*var' according to the momentum scheme.
+
+accum = accum * momentum + grad
+var -= lr * accum
+
+var: Should be from a Variable().
+accum: Should be from a Variable().
+lr: Scaling factor. Must be a scalar.
+grad: The gradient.
+momentum: Momentum. Must be a scalar.
+out: Same as "var".
+use_locking: If True, updating of the var and accum tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("SparseApplyMomentum")
+ .Input("var: Ref(T)")
+ .Input("accum: Ref(T)")
+ .Input("lr: T")
+ .Input("grad: T")
+ .Input("indices: Tindices")
+ .Input("momentum: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("Tindices: {int32, int64}")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update relevant entries in '*var' and '*accum' according to the momentum scheme.
+
+That is, for rows for which we have grad, we update var and accum as follows:
+
+accum = accum * momentum + grad
+var -= lr * accum
+
+var: Should be from a Variable().
+accum: Should be from a Variable().
+lr: Learning rate. Must be a scalar.
+grad: The gradient.
+indices: A vector of indices into the first dimension of var and accum.
+momentum: Momentum. Must be a scalar.
+out: Same as "var".
+use_locking: If True, updating of the var and accum tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ApplyAdam")
+ .Input("var: Ref(T)")
+ .Input("m: Ref(T)")
+ .Input("v: Ref(T)")
+ .Input("beta1_power: T")
+ .Input("beta2_power: T")
+ .Input("lr: T")
+ .Input("beta1: T")
+ .Input("beta2: T")
+ .Input("epsilon: T")
+ .Input("grad: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update '*var' according to the Adam algorithm.
+
+lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
+m_t <- beta1 * m_{t-1} + (1 - beta1) * g_t
+v_t <- beta2 * v_{t-1} + (1 - beta2) * g_t * g_t
+variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
+
+var: Should be from a Variable().
+m: Should be from a Variable().
+v: Should be from a Variable().
+beta1_power: Must be a scalar.
+beta2_power: Must be a scalar.
+lr: Scaling factor. Must be a scalar.
+beta1: Momentum factor. Must be a scalar.
+beta2: Momentum factor. Must be a scalar.
+epsilon: Ridge term. Must be a scalar.
+grad: The gradient.
+out: Same as "var".
+use_locking: If True, updating of the var, m, and v tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+REGISTER_OP("ApplyRMSProp")
+ .Input("var: Ref(T)")
+ .Input("ms: Ref(T)")
+ .Input("mom: Ref(T)")
+ .Input("lr: T")
+ .Input("rho: T")
+ .Input("momentum: T")
+ .Input("epsilon: T")
+ .Input("grad: T")
+ .Output("out: Ref(T)")
+ .Attr("T: numbertype")
+ .Attr("use_locking: bool = false")
+ .Doc(R"doc(
+Update '*var' according to the RMSProp algorithm.
+
+The generic RMSProp rule is
+
+mean_square = decay * mean_square + (1-decay) * gradient ** 2
+Delta = learning_rate * gradient / sqrt(mean_square + epsilon)
+
+In terms of this op's inputs, with `rho` as the decay rate, the update is
+
+ms <- rho * ms_{t-1} + (1-rho) * grad * grad
+mom <- momentum * mom_{t-1} + lr * grad / sqrt(ms + epsilon)
+var <- var - mom
+
+var: Should be from a Variable().
+ms: Should be from a Variable().
+mom: Should be from a Variable().
+lr: Scaling factor. Must be a scalar.
+epsilon: Ridge term. Must be a scalar.
+rho: Decay rate. Must be a scalar.
+grad: The gradient.
+out: Same as "var".
+use_locking: If True, updating of the var, ms, and mom tensors will be protected by
+a lock; otherwise the behavior is undefined, but may exhibit less contention.
+)doc");
+
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/default/build_config.bzl b/tensorflow/core/platform/default/build_config.bzl
new file mode 100644
index 0000000000..7cf6c274be
--- /dev/null
+++ b/tensorflow/core/platform/default/build_config.bzl
@@ -0,0 +1,65 @@
+# Platform-specific build configurations.
+
+load("/google/protobuf/protobuf", "cc_proto_library")
+load("/google/protobuf/protobuf", "py_proto_library")
+
+# Appends a suffix to a list of deps.
+def tf_deps(deps, suffix):
+ tf_deps = []
+
+  # If the package name is in shorthand form (i.e. does not contain a ':'),
+ # expand it to the full name.
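+  # For example, a dep of "tensorflow/core/framework" expands to
+  # "tensorflow/core/framework:framework" before the suffix is appended.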
+ for dep in deps:
+ tf_dep = dep
+
+ if not ":" in dep:
+ dep_pieces = dep.split("/")
+ tf_dep += ":" + dep_pieces[len(dep_pieces) - 1]
+
+ tf_deps += [tf_dep + suffix]
+
+ return tf_deps
+
+def tf_proto_library(name, srcs = [], has_services = False,
+ deps = [], visibility = [], testonly = 0,
+ cc_api_version = 2, go_api_version = 2,
+ java_api_version = 2,
+ py_api_version = 2):
+ native.filegroup(name=name + "_proto_srcs",
+ srcs=srcs + tf_deps(deps, "_proto_srcs"),
+ testonly=testonly,)
+
+ cc_proto_library(name=name + "_cc",
+ srcs=srcs + tf_deps(deps, "_proto_srcs"),
+ deps=deps,
+ cc_libs = ["//google/protobuf:protobuf"],
+ testonly=testonly,
+ visibility=visibility,)
+
+ py_proto_library(name=name + "_py",
+ srcs=srcs + tf_deps(deps, "_proto_srcs"),
+ deps=deps,
+ py_libs = ["//google/protobuf:protobuf_python"],
+ testonly=testonly,
+ visibility=visibility,)
+
+def tf_proto_library_py(name, srcs=[], deps=[], visibility=[], testonly=0):
+ py_proto_library(name = name + "_py",
+ srcs = srcs,
+ deps = deps,
+ visibility = visibility,
+ testonly = testonly)
+
+def tf_additional_lib_srcs():
+ return [
+ "platform/default/*.h",
+ "platform/default/*.cc",
+ "platform/posix/*.h",
+ "platform/posix/*.cc",
+ ]
+
+def tf_additional_test_srcs():
+ return ["platform/default/test_benchmark.cc"]
+
+def tf_kernel_tests_linkstatic():
+ return 0
diff --git a/tensorflow/core/platform/default/build_config/BUILD b/tensorflow/core/platform/default/build_config/BUILD
new file mode 100644
index 0000000000..44dbc47ad1
--- /dev/null
+++ b/tensorflow/core/platform/default/build_config/BUILD
@@ -0,0 +1,85 @@
+# Description:
+# Platform-specific build configurations.
+
+package(default_visibility = ["//tensorflow:internal"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("/tensorflow/tensorflow", "tf_copts")
+load("/tensorflow/tensorflow", "tf_cuda_library")
+
+cc_library(
+ name = "gtest",
+ testonly = 1,
+ copts = tf_copts(),
+ deps = [
+ "//external:gtest",
+ ],
+)
+
+cc_library(
+ name = "tensorflow_platform_specific",
+ copts = tf_copts(),
+ linkstatic = 1,
+ deps = [],
+)
+
+tf_cuda_library(
+ name = "stream_executor",
+ deps = [
+ "//tensorflow/stream_executor",
+ ],
+)
+
+cc_library(
+ name = "platformlib",
+ copts = tf_copts(),
+ deps = [
+ "@jpeg_archive//:jpeg",
+ "@png_archive//:png",
+ "@re2//:re2",
+ "//tensorflow/core:protos_cc",
+ ],
+)
+
+cc_library(
+ name = "protos_cc",
+ copts = tf_copts(),
+ deps = [
+ "//tensorflow/core:protos_all_cc",
+ ],
+)
+
+cc_library(
+ name = "test_main",
+ testonly = 1,
+ linkstatic = 1,
+ deps = [],
+)
+
+cc_library(
+ name = "cuda_runtime_extra",
+ linkstatic = 1,
+ deps = [],
+)
+
+filegroup(
+ name = "android_proto_lib_portable_proto",
+ srcs = [],
+ visibility = ["//visibility:public"],
+)
+
+cc_library(
+ name = "cuda",
+ data = [
+ "//third_party/gpus/cuda:lib64/libcudart.so.7.0",
+ ],
+ linkopts = [
+ "-Wl,-rpath,third_party/gpus/cuda/lib64",
+ ],
+ deps = [
+ "//third_party/gpus/cuda:cudart",
+ ],
+)
diff --git a/tensorflow/core/platform/default/build_config_root.bzl b/tensorflow/core/platform/default/build_config_root.bzl
new file mode 100644
index 0000000000..439bf97a2c
--- /dev/null
+++ b/tensorflow/core/platform/default/build_config_root.bzl
@@ -0,0 +1,6 @@
+# Lower-level functionality for build config.
+# The functions in this file might be referred by tensorflow.bzl. They have to
+# be separate to avoid cyclic references.
+
+def tf_cuda_tests_tags():
+ return ["local"]
diff --git a/tensorflow/core/platform/default/dynamic_annotations.h b/tensorflow/core/platform/default/dynamic_annotations.h
new file mode 100644
index 0000000000..1705fb9955
--- /dev/null
+++ b/tensorflow/core/platform/default/dynamic_annotations.h
@@ -0,0 +1,9 @@
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_DYNAMIC_ANNOTATIONS_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_DYNAMIC_ANNOTATIONS_H_
+
+// Do nothing for this platform
+#define TF_ANNOTATE_MEMORY_IS_INITIALIZED(ptr, bytes) \
+ do { \
+ } while (0)
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_DYNAMIC_ANNOTATIONS_H_
diff --git a/tensorflow/core/platform/default/integral_types.h b/tensorflow/core/platform/default/integral_types.h
new file mode 100644
index 0000000000..04aae172da
--- /dev/null
+++ b/tensorflow/core/platform/default/integral_types.h
@@ -0,0 +1,18 @@
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_INTEGRAL_TYPES_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_INTEGRAL_TYPES_H_
+
+namespace tensorflow {
+
+typedef signed char int8;
+typedef short int16;
+typedef int int32;
+typedef long long int64;
+
+typedef unsigned char uint8;
+typedef unsigned short uint16;
+typedef unsigned int uint32;
+typedef unsigned long long uint64;
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_INTEGRAL_TYPES_H_
diff --git a/tensorflow/core/platform/default/logging.cc b/tensorflow/core/platform/default/logging.cc
new file mode 100644
index 0000000000..8a16a537b0
--- /dev/null
+++ b/tensorflow/core/platform/default/logging.cc
@@ -0,0 +1,125 @@
+#include "tensorflow/core/platform/default/logging.h"
+
+#if defined(PLATFORM_POSIX_ANDROID)
+#include <android/log.h>
+#include <sstream>
+#endif
+
+#include <stdlib.h>
+
+namespace tensorflow {
+namespace internal {
+
+LogMessage::LogMessage(const char* fname, int line, int severity)
+ : fname_(fname), line_(line), severity_(severity) {}
+
+#if defined(PLATFORM_POSIX_ANDROID)
+void LogMessage::GenerateLogMessage() {
+ int android_log_level;
+ switch (severity_) {
+ case INFO:
+ android_log_level = ANDROID_LOG_INFO;
+ break;
+ case WARNING:
+ android_log_level = ANDROID_LOG_WARN;
+ break;
+ case ERROR:
+ android_log_level = ANDROID_LOG_ERROR;
+ break;
+ case FATAL:
+ android_log_level = ANDROID_LOG_FATAL;
+ break;
+ default:
+ if (severity_ < INFO) {
+ android_log_level = ANDROID_LOG_VERBOSE;
+ } else {
+ android_log_level = ANDROID_LOG_ERROR;
+ }
+ break;
+ }
+
+ std::stringstream ss;
+ ss << fname_ << ":" << line_ << " " << str();
+ __android_log_write(android_log_level, "native", ss.str().c_str());
+
+ // Android logging at level FATAL does not terminate execution, so abort()
+ // is still required to stop the program.
+ if (severity_ == FATAL) {
+ abort();
+ }
+}
+
+#else
+
+void LogMessage::GenerateLogMessage() {
+ // TODO(jeff,sanjay): For open source version, replace this with something
+ // that logs through the env or something and fill in appropriate time info.
+ fprintf(stderr, "%c %s:%d] %s\n", "IWEF"[severity_], fname_, line_,
+ str().c_str());
+}
+#endif
+
+LogMessage::~LogMessage() { GenerateLogMessage(); }
+
+LogMessageFatal::LogMessageFatal(const char* file, int line)
+ : LogMessage(file, line, FATAL) {}
+LogMessageFatal::~LogMessageFatal() {
+ // abort() ensures we don't return (we promised we would not via
+ // ATTRIBUTE_NORETURN).
+ GenerateLogMessage();
+ abort();
+}
+
+template <>
+void MakeCheckOpValueString(std::ostream* os, const char& v) {
+ if (v >= 32 && v <= 126) {
+ (*os) << "'" << v << "'";
+ } else {
+ (*os) << "char value " << (short)v;
+ }
+}
+
+template <>
+void MakeCheckOpValueString(std::ostream* os, const signed char& v) {
+ if (v >= 32 && v <= 126) {
+ (*os) << "'" << v << "'";
+ } else {
+ (*os) << "signed char value " << (short)v;
+ }
+}
+
+template <>
+void MakeCheckOpValueString(std::ostream* os, const unsigned char& v) {
+ if (v >= 32 && v <= 126) {
+ (*os) << "'" << v << "'";
+ } else {
+ (*os) << "unsigned char value " << (unsigned short)v;
+ }
+}
+
+#if LANG_CXX11
+template <>
+void MakeCheckOpValueString(std::ostream* os, const std::nullptr_t& p) {
+ (*os) << "nullptr";
+}
+#endif
+
+CheckOpMessageBuilder::CheckOpMessageBuilder(const char* exprtext)
+ : stream_(new std::ostringstream) {
+ *stream_ << "Check failed: " << exprtext << " (";
+}
+
+CheckOpMessageBuilder::~CheckOpMessageBuilder() { delete stream_; }
+
+std::ostream* CheckOpMessageBuilder::ForVar2() {
+ *stream_ << " vs. ";
+ return stream_;
+}
+
+string* CheckOpMessageBuilder::NewString() {
+ *stream_ << ")";
+ return new string(stream_->str());
+}
+
+} // namespace internal
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/default/logging.h b/tensorflow/core/platform/default/logging.h
new file mode 100644
index 0000000000..034178751e
--- /dev/null
+++ b/tensorflow/core/platform/default/logging.h
@@ -0,0 +1,258 @@
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_LOGGING_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_LOGGING_H_
+
+#include <sstream>
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+const int INFO = 0; // base_logging::INFO;
+const int WARNING = 1; // base_logging::WARNING;
+const int ERROR = 2; // base_logging::ERROR;
+const int FATAL = 3; // base_logging::FATAL;
+const int NUM_SEVERITIES = 4; // base_logging::NUM_SEVERITIES;
+
+namespace internal {
+
+class LogMessage : public std::basic_ostringstream<char> {
+ public:
+ LogMessage(const char* fname, int line, int severity);
+ ~LogMessage();
+
+ protected:
+ void GenerateLogMessage();
+
+ private:
+ const char* fname_;
+ int line_;
+ int severity_;
+};
+
+// LogMessageFatal ensures the process will exit in failure after
+// logging this message.
+class LogMessageFatal : public LogMessage {
+ public:
+ LogMessageFatal(const char* file, int line) TF_ATTRIBUTE_COLD;
+ ~LogMessageFatal() TF_ATTRIBUTE_NORETURN;
+};
+
+#define _TF_LOG_INFO \
+ ::tensorflow::internal::LogMessage(__FILE__, __LINE__, tensorflow::INFO)
+#define _TF_LOG_WARNING \
+ ::tensorflow::internal::LogMessage(__FILE__, __LINE__, tensorflow::WARNING)
+#define _TF_LOG_ERROR \
+ ::tensorflow::internal::LogMessage(__FILE__, __LINE__, tensorflow::ERROR)
+#define _TF_LOG_FATAL \
+ ::tensorflow::internal::LogMessageFatal(__FILE__, __LINE__)
+
+#define LOG(severity) _TF_LOG_##severity
+
+// TODO(jeff): Define a proper implementation of VLOG_IS_ON
+#define VLOG_IS_ON(lvl) ((lvl) <= 0)
+
+#define VLOG(lvl) \
+ if (VLOG_IS_ON(lvl)) \
+ ::tensorflow::internal::LogMessage(__FILE__, __LINE__, tensorflow::INFO)
+
+// CHECK dies with a fatal error if condition is not true. It is *not*
+// controlled by NDEBUG, so the check will be executed regardless of
+// compilation mode. Therefore, it is safe to do things like:
+// CHECK(fp->Write(x) == 4)
+#define CHECK(condition) \
+ if (TF_PREDICT_FALSE(!(condition))) \
+ LOG(FATAL) << "Check failed: " #condition " "
+
+// Function is overloaded for integral types to allow static const
+// integrals declared in classes and not defined to be used as arguments to
+// CHECK* macros. It's not encouraged though.
+template <typename T>
+inline const T& GetReferenceableValue(const T& t) {
+ return t;
+}
+inline char GetReferenceableValue(char t) { return t; }
+inline unsigned char GetReferenceableValue(unsigned char t) { return t; }
+inline signed char GetReferenceableValue(signed char t) { return t; }
+inline short GetReferenceableValue(short t) { return t; }
+inline unsigned short GetReferenceableValue(unsigned short t) { return t; }
+inline int GetReferenceableValue(int t) { return t; }
+inline unsigned int GetReferenceableValue(unsigned int t) { return t; }
+inline long GetReferenceableValue(long t) { return t; }
+inline unsigned long GetReferenceableValue(unsigned long t) { return t; }
+inline long long GetReferenceableValue(long long t) { return t; }
+inline unsigned long long GetReferenceableValue(unsigned long long t) {
+ return t;
+}
+
+// This formats a value for a failing CHECK_XX statement. Ordinarily,
+// it uses the definition for operator<<, with a few special cases below.
+template <typename T>
+inline void MakeCheckOpValueString(std::ostream* os, const T& v) {
+ (*os) << v;
+}
+
+// Overrides for char types provide readable values for unprintable
+// characters.
+template <>
+void MakeCheckOpValueString(std::ostream* os, const char& v);
+template <>
+void MakeCheckOpValueString(std::ostream* os, const signed char& v);
+template <>
+void MakeCheckOpValueString(std::ostream* os, const unsigned char& v);
+
+#if LANG_CXX11
+// We need an explicit specialization for std::nullptr_t.
+template <>
+void MakeCheckOpValueString(std::ostream* os, const std::nullptr_t& p);
+#endif
+
+// A container for a string pointer which can be evaluated to a bool -
+// true iff the pointer is non-NULL.
+struct CheckOpString {
+ CheckOpString(string* str) : str_(str) {}
+ // No destructor: if str_ is non-NULL, we're about to LOG(FATAL),
+ // so there's no point in cleaning up str_.
+ operator bool() const { return TF_PREDICT_FALSE(str_ != NULL); }
+ string* str_;
+};
+
+// Build the error message string. Specify no inlining for code size.
+template <typename T1, typename T2>
+string* MakeCheckOpString(const T1& v1, const T2& v2,
+ const char* exprtext) TF_ATTRIBUTE_NOINLINE;
+
+// A helper class for formatting "expr (V1 vs. V2)" in a CHECK_XX
+// statement. See MakeCheckOpString for sample usage. Other
+// approaches were considered: use of a template method (e.g.,
+// base::BuildCheckOpString(exprtext, base::Print<T1>, &v1,
+// base::Print<T2>, &v2), however this approach has complications
+// related to volatile arguments and function-pointer arguments).
+class CheckOpMessageBuilder {
+ public:
+ // Inserts "exprtext" and " (" to the stream.
+ explicit CheckOpMessageBuilder(const char* exprtext);
+ // Deletes "stream_".
+ ~CheckOpMessageBuilder();
+ // For inserting the first variable.
+ std::ostream* ForVar1() { return stream_; }
+ // For inserting the second variable (adds an intermediate " vs. ").
+ std::ostream* ForVar2();
+ // Get the result (inserts the closing ")").
+ string* NewString();
+
+ private:
+ std::ostringstream* stream_;
+};
+
+template <typename T1, typename T2>
+string* MakeCheckOpString(const T1& v1, const T2& v2, const char* exprtext) {
+ CheckOpMessageBuilder comb(exprtext);
+ MakeCheckOpValueString(comb.ForVar1(), v1);
+ MakeCheckOpValueString(comb.ForVar2(), v2);
+ return comb.NewString();
+}
+
+// Helper functions for CHECK_OP macro.
+// The (int, int) specialization works around the issue that the compiler
+// will not instantiate the template version of the function on values of
+// unnamed enum type - see comment below.
+#define TF_DEFINE_CHECK_OP_IMPL(name, op) \
+ template <typename T1, typename T2> \
+ inline string* name##Impl(const T1& v1, const T2& v2, \
+ const char* exprtext) { \
+ if (TF_PREDICT_TRUE(v1 op v2)) \
+ return NULL; \
+ else \
+ return ::tensorflow::internal::MakeCheckOpString(v1, v2, exprtext); \
+ } \
+ inline string* name##Impl(int v1, int v2, const char* exprtext) { \
+ return name##Impl<int, int>(v1, v2, exprtext); \
+ }
+
+// We use the full name Check_EQ, Check_NE, etc. in case the file including
+// base/logging.h provides its own #defines for the simpler names EQ, NE, etc.
+// This happens if, for example, those are used as token names in a
+// yacc grammar.
+TF_DEFINE_CHECK_OP_IMPL(Check_EQ,
+ == ) // Compilation error with CHECK_EQ(NULL, x)?
+TF_DEFINE_CHECK_OP_IMPL(Check_NE, != ) // Use CHECK(x == NULL) instead.
+TF_DEFINE_CHECK_OP_IMPL(Check_LE, <= )
+TF_DEFINE_CHECK_OP_IMPL(Check_LT, < )
+TF_DEFINE_CHECK_OP_IMPL(Check_GE, >= )
+TF_DEFINE_CHECK_OP_IMPL(Check_GT, > )
+#undef TF_DEFINE_CHECK_OP_IMPL
+
+// In optimized mode, use CheckOpString to hint to compiler that
+// the while condition is unlikely.
+#define CHECK_OP_LOG(name, op, val1, val2) \
+ while (::tensorflow::internal::CheckOpString _result = \
+ ::tensorflow::internal::name##Impl( \
+ ::tensorflow::internal::GetReferenceableValue(val1), \
+ ::tensorflow::internal::GetReferenceableValue(val2), \
+ #val1 " " #op " " #val2)) \
+ ::tensorflow::internal::LogMessageFatal(__FILE__, __LINE__) << *(_result.str_)
+
+#define CHECK_OP(name, op, val1, val2) CHECK_OP_LOG(name, op, val1, val2)
+
+// CHECK_EQ/NE/...
+#define CHECK_EQ(val1, val2) CHECK_OP(Check_EQ, ==, val1, val2)
+#define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2)
+#define CHECK_LE(val1, val2) CHECK_OP(Check_LE, <=, val1, val2)
+#define CHECK_LT(val1, val2) CHECK_OP(Check_LT, <, val1, val2)
+#define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)
+#define CHECK_GT(val1, val2) CHECK_OP(Check_GT, >, val1, val2)
+#define CHECK_NOTNULL(val) \
+ ::tensorflow::internal::CheckNotNull(__FILE__, __LINE__, \
+ "'" #val "' Must be non NULL", (val))
+
+#ifndef NDEBUG
+// DCHECK_EQ/NE/...
+#define DCHECK(condition) CHECK(condition)
+#define DCHECK_EQ(val1, val2) CHECK_EQ(val1, val2)
+#define DCHECK_NE(val1, val2) CHECK_NE(val1, val2)
+#define DCHECK_LE(val1, val2) CHECK_LE(val1, val2)
+#define DCHECK_LT(val1, val2) CHECK_LT(val1, val2)
+#define DCHECK_GE(val1, val2) CHECK_GE(val1, val2)
+#define DCHECK_GT(val1, val2) CHECK_GT(val1, val2)
+
+#else
+
+#define DCHECK(condition) \
+ while (false && (condition)) LOG(FATAL)
+
+// NDEBUG is defined, so DCHECK_EQ(x, y) and so on do nothing.
+// However, we still want the compiler to parse x and y, because
+// we don't want to lose potentially useful errors and warnings.
+// _DCHECK_NOP is a helper, and should not be used outside of this file.
+#define _TF_DCHECK_NOP(x, y) \
+ while (false && ((void)(x), (void)(y), 0)) LOG(FATAL)
+
+#define DCHECK_EQ(x, y) _TF_DCHECK_NOP(x, y)
+#define DCHECK_NE(x, y) _TF_DCHECK_NOP(x, y)
+#define DCHECK_LE(x, y) _TF_DCHECK_NOP(x, y)
+#define DCHECK_LT(x, y) _TF_DCHECK_NOP(x, y)
+#define DCHECK_GE(x, y) _TF_DCHECK_NOP(x, y)
+#define DCHECK_GT(x, y) _TF_DCHECK_NOP(x, y)
+
+#endif
+
+// These are for when you don't want a CHECK failure to print a verbose
+// stack trace. The implementation of CHECK* in this file already doesn't.
+#define QCHECK(condition) CHECK(condition)
+#define QCHECK_EQ(x, y) CHECK_EQ(x, y)
+#define QCHECK_NE(x, y) CHECK_NE(x, y)
+#define QCHECK_LE(x, y) CHECK_LE(x, y)
+#define QCHECK_LT(x, y) CHECK_LT(x, y)
+#define QCHECK_GE(x, y) CHECK_GE(x, y)
+#define QCHECK_GT(x, y) CHECK_GT(x, y)
+
+template <typename T>
+T&& CheckNotNull(const char* file, int line, const char* exprtext, T&& t) {
+ if (t == nullptr) {
+ LogMessageFatal(file, line) << string(exprtext);
+ }
+ return std::forward<T>(t);
+}
+
+} // namespace internal
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_LOGGING_H_
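
Editor's note: a hedged usage sketch of the macros defined in this header (illustrative function and messages only). CHECK_* always runs, DCHECK_* compiles away under NDEBUG, CHECK_NOTNULL returns its argument, and all of them accept streamed messages.

```c++
#include "tensorflow/core/platform/logging.h"

void CopyRegion(const char* src, char* dst, int n) {
  CHECK_NOTNULL(src);
  CHECK_NOTNULL(dst);
  CHECK_GE(n, 0) << "negative length " << n;  // checked in all builds
  DCHECK_LT(n, 1 << 20);                      // debug builds only
  for (int i = 0; i < n; ++i) dst[i] = src[i];
  VLOG(1) << "copied " << n << " bytes";
}
```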
diff --git a/tensorflow/core/platform/default/mutex.h b/tensorflow/core/platform/default/mutex.h
new file mode 100644
index 0000000000..b26b418e1b
--- /dev/null
+++ b/tensorflow/core/platform/default/mutex.h
@@ -0,0 +1,33 @@
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_MUTEX_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_MUTEX_H_
+
+#include <chrono>
+#include <condition_variable>
+#include <mutex>
+
+namespace tensorflow {
+
+enum LinkerInitialized { LINKER_INITIALIZED };
+
+// A class that wraps around the std::mutex implementation, only adding an
+// additional LinkerInitialized constructor interface.
+class mutex : public std::mutex {
+ public:
+ mutex() {}
+  // The default implementation of std::mutex is safe to use after linker
+  // initialization, so nothing extra is needed here.
+ explicit mutex(LinkerInitialized x) {}
+};
+
+using std::condition_variable;
+typedef std::unique_lock<std::mutex> mutex_lock;
+
+inline ConditionResult WaitForMilliseconds(mutex_lock* mu,
+ condition_variable* cv, int64 ms) {
+ std::cv_status s = cv->wait_for(*mu, std::chrono::milliseconds(ms));
+ return (s == std::cv_status::timeout) ? kCond_Timeout : kCond_MaybeNotified;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_MUTEX_H_
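
Editor's note: a hedged usage sketch for WaitForMilliseconds above (not part of the header); it assumes the surrounding types (mutex, mutex_lock, condition_variable) are pulled in via the platform headers, e.g. tensorflow/core/platform/port.h.

```c++
#include "tensorflow/core/platform/port.h"  // assumed to provide the types used below

// Wait up to ~1s for a flag, re-checking after each timed wait.
bool WaitUntilReady(tensorflow::mutex* mu, tensorflow::condition_variable* cv,
                    const bool* ready) {
  tensorflow::mutex_lock l(*mu);
  for (int attempts = 0; attempts < 10 && !*ready; ++attempts) {
    tensorflow::WaitForMilliseconds(&l, cv, 100 /* ms */);
  }
  return *ready;
}
```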
diff --git a/tensorflow/core/platform/default/protobuf.h b/tensorflow/core/platform/default/protobuf.h
new file mode 100644
index 0000000000..f6083c318d
--- /dev/null
+++ b/tensorflow/core/platform/default/protobuf.h
@@ -0,0 +1,13 @@
+#ifndef THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_PROTOBUF_H_
+#define THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_PROTOBUF_H_
+
+#include "google/protobuf/descriptor.h"
+#include "google/protobuf/io/coded_stream.h"
+#include "google/protobuf/io/zero_copy_stream.h"
+#include "google/protobuf/text_format.h"
+
+namespace tensorflow {
+namespace protobuf = ::google::protobuf;
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_CORE_PLATFORM_DEFAULT_PROTOBUF_H_
diff --git a/tensorflow/core/platform/default/stream_executor_util.h b/tensorflow/core/platform/default/stream_executor_util.h
new file mode 100644
index 0000000000..d7fad4e233
--- /dev/null
+++ b/tensorflow/core/platform/default/stream_executor_util.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_STREAM_EXECUTOR_UTIL_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_STREAM_EXECUTOR_UTIL_H_
+
+#include "tensorflow/stream_executor/lib/status.h"
+
+namespace tensorflow {
+
+namespace gpu = ::perftools::gputools;
+
+// On the open-source platform, stream_executor currently uses
+// tensorflow::Status
+inline Status FromStreamExecutorStatus(
+ const perftools::gputools::port::Status& s) {
+ return s;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_STREAM_EXECUTOR_UTIL_H_
diff --git a/tensorflow/core/platform/default/test_benchmark.cc b/tensorflow/core/platform/default/test_benchmark.cc
new file mode 100644
index 0000000000..4004bf026b
--- /dev/null
+++ b/tensorflow/core/platform/default/test_benchmark.cc
@@ -0,0 +1,162 @@
+#include "tensorflow/core/platform/test_benchmark.h"
+
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/regexp.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace testing {
+
+static std::vector<Benchmark*>* all_benchmarks = nullptr;
+static std::string label;
+static int64 bytes_processed;
+static int64 items_processed;
+static int64 accum_time = 0;
+static int64 start_time = 0;
+static Env* env;
+
+Benchmark::Benchmark(const char* name, void (*fn)(int))
+ : name_(name), num_args_(0), fn0_(fn) {
+ args_.push_back(-1);
+ Register();
+}
+
+Benchmark::Benchmark(const char* name, void (*fn)(int, int))
+ : name_(name), num_args_(1), fn1_(fn) {
+ Register();
+}
+
+Benchmark* Benchmark::Arg(int x) {
+ CHECK_EQ(num_args_, 1);
+ args_.push_back(x);
+ return this;
+}
+
+Benchmark* Benchmark::Range(int lo, int hi) {
+ Arg(lo);
+ for (int32 i = 1; i < kint32max / 8 && i < hi; i *= 8) {
+ Arg(i);
+ }
+ if (lo != hi) Arg(hi);
+ return this;
+}
+
+void Benchmark::Run(const char* pattern) {
+ if (!all_benchmarks) return;
+
+ if (StringPiece(pattern) == "all") {
+ pattern = ".*";
+ }
+
+ // Compute name width.
+ int width = 10;
+ string name;
+ for (auto b : *all_benchmarks) {
+ name = b->name_;
+ for (auto arg : b->args_) {
+ name.resize(b->name_.size());
+ if (arg >= 0) {
+ strings::StrAppend(&name, "/", arg);
+ }
+ if (RE2::PartialMatch(name, pattern)) {
+ width = std::max<int>(width, name.size());
+ }
+ }
+ }
+
+ printf("%-*s %10s %10s\n", width, "Benchmark", "Time(ns)", "Iterations");
+ printf("%s\n", string(width + 22, '-').c_str());
+ for (auto b : *all_benchmarks) {
+ name = b->name_;
+ for (auto arg : b->args_) {
+ name.resize(b->name_.size());
+ if (arg >= 0) {
+ strings::StrAppend(&name, "/", arg);
+ }
+ if (!RE2::PartialMatch(name, pattern)) {
+ continue;
+ }
+
+ int iters;
+ double seconds;
+ b->Run(arg, &iters, &seconds);
+
+ char buf[100];
+ std::string full_label = label;
+ if (bytes_processed > 0) {
+ snprintf(buf, sizeof(buf), " %.1fMB/s",
+ (bytes_processed * 1e-6) / seconds);
+ full_label += buf;
+ }
+ if (items_processed > 0) {
+ snprintf(buf, sizeof(buf), " %.1fM items/s",
+ (items_processed * 1e-6) / seconds);
+ full_label += buf;
+ }
+ printf("%-*s %10.0f %10d\t%s\n", width, name.c_str(),
+ seconds * 1e9 / iters, iters, full_label.c_str());
+ }
+ }
+}
+
+void Benchmark::Register() {
+ if (!all_benchmarks) all_benchmarks = new std::vector<Benchmark*>;
+ all_benchmarks->push_back(this);
+}
+
+void Benchmark::Run(int arg, int* run_count, double* run_seconds) {
+ env = Env::Default();
+ static const int64 kMinIters = 100;
+ static const int64 kMaxIters = 1000000000;
+ static const double kMinTime = 0.5;
+ int64 iters = kMinIters;
+ while (true) {
+ accum_time = 0;
+ start_time = env->NowMicros();
+ bytes_processed = -1;
+ items_processed = -1;
+ label.clear();
+ if (fn0_) {
+ (*fn0_)(iters);
+ } else {
+ (*fn1_)(iters, arg);
+ }
+ StopTiming();
+ const double seconds = accum_time * 1e-6;
+ if (seconds >= kMinTime || iters >= kMaxIters) {
+ *run_count = iters;
+ *run_seconds = seconds;
+ return;
+ }
+
+ // Update number of iterations. Overshoot by 40% in an attempt
+ // to succeed the next time.
+ double multiplier = 1.4 * kMinTime / std::max(seconds, 1e-9);
+ multiplier = std::min(10.0, multiplier);
+ if (multiplier <= 1.0) multiplier *= 2.0;
+ iters = std::max<int64>(multiplier * iters, iters + 1);
+ iters = std::min(iters, kMaxIters);
+ }
+}
+
+// TODO(vrv): Add support for running a subset of benchmarks by having
+// RunBenchmarks take in a spec (and maybe other options such as
+// benchmark_min_time, etc).
+void RunBenchmarks() { Benchmark::Run("all"); }
+void SetLabel(const std::string& l) { label = l; }
+void BytesProcessed(int64 n) { bytes_processed = n; }
+void ItemsProcessed(int64 n) { items_processed = n; }
+void StartTiming() {
+ if (start_time == 0) start_time = env->NowMicros();
+}
+void StopTiming() {
+ if (start_time != 0) {
+ accum_time += (env->NowMicros() - start_time);
+ start_time = 0;
+ }
+}
+void UseRealTime() {}
+
+} // namespace testing
+} // namespace tensorflow
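
Editor's note: a hedged usage sketch for the benchmark registry above (illustrative name and body, assuming the declarations in test_benchmark.h). A Benchmark object registers itself at static-initialization time, and Range(lo, hi) runs the body for lo, powers of 8 below hi, and hi.

```c++
#include <vector>
#include "tensorflow/core/platform/test_benchmark.h"

static void BM_VectorFill(int iters, int size) {
  float sink = 0.0f;
  for (int i = 0; i < iters; ++i) {
    std::vector<float> v(size, 1.0f);
    sink += v[0];  // keep the work from being optimized away entirely
  }
  tensorflow::testing::ItemsProcessed(static_cast<long long>(iters) * size);
  (void)sink;
}

static tensorflow::testing::Benchmark* bm_vector_fill =
    (new tensorflow::testing::Benchmark("BM_VectorFill", &BM_VectorFill))
        ->Range(8, 1 << 20);
```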
diff --git a/tensorflow/core/platform/default/thread_annotations.h b/tensorflow/core/platform/default/thread_annotations.h
new file mode 100644
index 0000000000..fed39bf810
--- /dev/null
+++ b/tensorflow/core/platform/default/thread_annotations.h
@@ -0,0 +1,185 @@
+// Copyright (c) 2008, Google Inc.
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+// * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+// * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+// * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+// ---
+//
+// This header file contains the macro definitions for thread safety
+// annotations that allow the developers to document the locking policies
+// of their multi-threaded code. The annotations can also help program
+// analysis tools to identify potential thread safety issues.
+//
+// The primary documentation on these annotations is external:
+// http://clang.llvm.org/docs/ThreadSafetyAnalysis.html
+//
+// The annotations are implemented using compiler attributes.
+// Using the macros defined here instead of the raw attributes allows
+// for portability and future compatibility.
+//
+// When referring to mutexes in the arguments of the attributes, you should
+// use variable names or more complex expressions (e.g. my_object->mutex_)
+// that evaluate to a concrete mutex object whenever possible. If the mutex
+// you want to refer to is not in scope, you may use a member pointer
+// (e.g. &MyClass::mutex_) to refer to a mutex in some (unknown) object.
+//
+
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_THREAD_ANNOTATIONS_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_THREAD_ANNOTATIONS_H_
+
+#if defined(__clang__) && (!defined(SWIG))
+#define THREAD_ANNOTATION_ATTRIBUTE__(x) __attribute__((x))
+#else
+#define THREAD_ANNOTATION_ATTRIBUTE__(x) // no-op
+#endif
+
+// Document if a shared variable/field needs to be protected by a mutex.
+// GUARDED_BY allows the user to specify a particular mutex that should be
+// held when accessing the annotated variable. GUARDED_VAR indicates that
+// a shared variable is guarded by some unspecified mutex, for use in rare
+// cases where a valid mutex expression cannot be specified.
+#define GUARDED_BY(x) THREAD_ANNOTATION_ATTRIBUTE__(guarded_by(x))
+#define GUARDED_VAR THREAD_ANNOTATION_ATTRIBUTE__(guarded)
+
+// Document if the memory location pointed to by a pointer should be guarded
+// by a mutex when dereferencing the pointer. PT_GUARDED_VAR is analogous to
+// GUARDED_VAR. Note that a pointer variable to a shared memory location
+// could itself be a shared variable. For example, if a shared global pointer
+// q, which is guarded by mu1, points to a shared memory location that is
+// guarded by mu2, q should be annotated as follows:
+// int *q GUARDED_BY(mu1) PT_GUARDED_BY(mu2);
+#define PT_GUARDED_BY(x) THREAD_ANNOTATION_ATTRIBUTE__(pt_guarded_by(x))
+#define PT_GUARDED_VAR THREAD_ANNOTATION_ATTRIBUTE__(pt_guarded)
+
+// Document the acquisition order between locks that can be held
+// simultaneously by a thread. For any two locks that need to be annotated
+// to establish an acquisition order, only one of them needs the annotation.
+// (i.e. You don't have to annotate both locks with both ACQUIRED_AFTER
+// and ACQUIRED_BEFORE.)
+#define ACQUIRED_AFTER(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(acquired_after(__VA_ARGS__))
+
+#define ACQUIRED_BEFORE(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(acquired_before(__VA_ARGS__))
+
+// Document a function that expects a mutex to be held prior to entry.
+// The mutex is expected to be held both on entry to and exit from the
+// function.
+#define EXCLUSIVE_LOCKS_REQUIRED(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(exclusive_locks_required(__VA_ARGS__))
+
+#define SHARED_LOCKS_REQUIRED(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(shared_locks_required(__VA_ARGS__))
+
+// Document the locks acquired in the body of the function. These locks
+// cannot be held when calling this function (for instance, when the
+// mutex implementation is non-reentrant).
+#define LOCKS_EXCLUDED(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(locks_excluded(__VA_ARGS__))
+
+// Document a function that returns a mutex without acquiring it. For example,
+// a public getter method that returns a pointer to a private mutex should
+// be annotated with LOCK_RETURNED.
+#define LOCK_RETURNED(x) THREAD_ANNOTATION_ATTRIBUTE__(lock_returned(x))
+
+// Document if a class/type is a lockable type (such as the Mutex class).
+#define LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(lockable)
+
+// Document if a class does RAII locking (such as the MutexLock class).
+// The constructor should use LOCK_FUNCTION to specify the mutex that is
+// acquired, and the destructor should use UNLOCK_FUNCTION with no arguments;
+// the analysis will assume that the destructor unlocks whatever the
+// constructor locked.
+#define SCOPED_LOCKABLE THREAD_ANNOTATION_ATTRIBUTE__(scoped_lockable)
+
+// Document functions that acquire a lock in the body of a function, and do
+// not release it.
+#define EXCLUSIVE_LOCK_FUNCTION(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(exclusive_lock_function(__VA_ARGS__))
+
+#define SHARED_LOCK_FUNCTION(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(shared_lock_function(__VA_ARGS__))
+
+// Document functions that expect a lock to be held on entry to the function,
+// and release it in the body of the function.
+#define UNLOCK_FUNCTION(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(unlock_function(__VA_ARGS__))
+
+// Document functions that try to acquire a lock, and return success or failure
+// (or a non-boolean value that can be interpreted as a boolean).
+// The first argument should be true for functions that return true on success,
+// or false for functions that return false on success. The second argument
+// specifies the mutex that is locked on success. If unspecified, it is assumed
+// to be 'this'.
+#define EXCLUSIVE_TRYLOCK_FUNCTION(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(exclusive_trylock_function(__VA_ARGS__))
+
+#define SHARED_TRYLOCK_FUNCTION(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(shared_trylock_function(__VA_ARGS__))
+
+// Document functions that dynamically check to see if a lock is held, and fail
+// if it is not held.
+#define ASSERT_EXCLUSIVE_LOCK(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(assert_exclusive_lock(__VA_ARGS__))
+
+#define ASSERT_SHARED_LOCK(...) \
+ THREAD_ANNOTATION_ATTRIBUTE__(assert_shared_lock(__VA_ARGS__))
+
+// Turns off thread safety checking within the body of a particular function.
+// This is used as an escape hatch for cases where either (a) the function
+// is correct, but the locking is more complicated than the analyzer can handle,
+// or (b) the function contains race conditions that are known to be benign.
+#define NO_THREAD_SAFETY_ANALYSIS \
+ THREAD_ANNOTATION_ATTRIBUTE__(no_thread_safety_analysis)
+
+// TS_UNCHECKED should be placed around lock expressions that are not valid
+// C++ syntax, but which are present for documentation purposes. These
+// annotations will be ignored by the analysis.
+#define TS_UNCHECKED(x) ""
+
+// Disables warnings for a single read operation. This can be used to do racy
+// reads of guarded data members, in cases where the race is benign.
+#define TS_UNCHECKED_READ(x) \
+ ::tensorflow::thread_safety_analysis::ts_unchecked_read(x)
+
+namespace tensorflow {
+namespace thread_safety_analysis {
+
+// Takes a reference to a guarded data member, and returns an unguarded
+// reference.
+template <class T>
+inline const T& ts_unchecked_read(const T& v) NO_THREAD_SAFETY_ANALYSIS {
+ return v;
+}
+
+template <class T>
+inline T& ts_unchecked_read(T& v) NO_THREAD_SAFETY_ANALYSIS {
+ return v;
+}
+} // namespace thread_safety_analysis
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_THREAD_ANNOTATIONS_H_
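To make the intended use concrete, here is a small sketch (hypothetical code, not part of this change; the mutex and mutex_lock types come from tensorflow/core/platform/port.h elsewhere in the commit):

    #include "tensorflow/core/platform/port.h"               // mutex, mutex_lock
    #include "tensorflow/core/platform/thread_annotations.h"

    namespace example {

    class Counter {
     public:
      // The analysis verifies that callers hold mu_ for the whole call.
      void IncrementLocked() EXCLUSIVE_LOCKS_REQUIRED(mu_) { value_++; }

      // Callers must not already hold mu_; the method acquires it itself.
      int Get() LOCKS_EXCLUDED(mu_) {
        tensorflow::mutex_lock l(mu_);
        return value_;
      }

      // Getter that hands out the mutex without acquiring it.
      tensorflow::mutex* mu() LOCK_RETURNED(mu_) { return &mu_; }

     private:
      tensorflow::mutex mu_;
      int value_ GUARDED_BY(mu_) = 0;  // only read or written under mu_
    };

    }  // namespace example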
diff --git a/tensorflow/core/platform/default/tracing.cc b/tensorflow/core/platform/default/tracing.cc
new file mode 100644
index 0000000000..a4ddfad928
--- /dev/null
+++ b/tensorflow/core/platform/default/tracing.cc
@@ -0,0 +1,37 @@
+#include "tensorflow/core/platform/tracing.h"
+
+#include <unistd.h>
+
+namespace tensorflow {
+namespace port {
+
+void Tracing::RegisterEvent(EventCategory id, const char* name) {
+ // TODO(opensource): implement
+}
+
+void Tracing::Initialize() {}
+
+static bool TryGetEnv(const char* name, const char** value) {
+ *value = getenv(name);
+ return *value != nullptr && (*value)[0] != '\0';
+}
+
+const char* Tracing::LogDir() {
+ const char* dir;
+ if (TryGetEnv("TEST_TMPDIR", &dir)) return dir;
+ if (TryGetEnv("TMP", &dir)) return dir;
+ if (TryGetEnv("TMPDIR", &dir)) return dir;
+ dir = "/tmp";
+ if (access(dir, R_OK | W_OK | X_OK) == 0) return dir;
+ return "."; // Default to current directory.
+}
+
+static bool DoInit() {
+ Tracing::Initialize();
+ return true;
+}
+
+static const bool dummy = DoInit();
+
+} // namespace port
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/default/tracing_impl.h b/tensorflow/core/platform/default/tracing_impl.h
new file mode 100644
index 0000000000..e2f5d3cb3f
--- /dev/null
+++ b/tensorflow/core/platform/default/tracing_impl.h
@@ -0,0 +1,44 @@
+#ifndef TENSORFLOW_PLATFORM_DEFAULT_TRACING_IMPL_H_
+#define TENSORFLOW_PLATFORM_DEFAULT_TRACING_IMPL_H_
+
+// Stub implementations of tracing functionality.
+
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/platform/tracing.h"
+
+namespace tensorflow {
+namespace port {
+
+// Definitions that do nothing for platforms that don't have underlying thread
+// tracing support.
+#define TRACELITERAL(a) \
+ do { \
+ } while (0)
+#define TRACESTRING(s) \
+ do { \
+ } while (0)
+#define TRACEPRINTF(format, ...) \
+ do { \
+ } while (0)
+
+inline uint64 Tracing::UniqueId() { return random::New64(); }
+inline bool Tracing::IsActive() { return false; }
+inline void Tracing::RegisterCurrentThread(const char* name) {}
+
+// Posts an atomic threadscape event with the supplied category and arg.
+inline void Tracing::RecordEvent(EventCategory category, uint64 arg) {
+ // TODO(opensource): Implement
+}
+
+inline Tracing::ScopedActivity::ScopedActivity(EventCategory category,
+ uint64 arg)
+ : enabled_(false), region_id_(category_id_[category]) {}
+
+inline Tracing::ScopedActivity::~ScopedActivity() {}
+
+} // namespace port
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_DEFAULT_TRACING_IMPL_H_
diff --git a/tensorflow/core/platform/env.cc b/tensorflow/core/platform/env.cc
new file mode 100644
index 0000000000..3e3c0ad74e
--- /dev/null
+++ b/tensorflow/core/platform/env.cc
@@ -0,0 +1,129 @@
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+Env::~Env() {}
+
+RandomAccessFile::~RandomAccessFile() {}
+
+WritableFile::~WritableFile() {}
+
+Thread::~Thread() {}
+
+EnvWrapper::~EnvWrapper() {}
+
+Status ReadFileToString(Env* env, const string& fname, string* data) {
+ data->clear();
+ RandomAccessFile* file;
+ Status s = env->NewRandomAccessFile(fname, &file);
+ if (!s.ok()) {
+ return s;
+ }
+ int64 offset = 0;
+ static const int kBufferSize = 8192;
+ char* space = new char[kBufferSize];
+ while (true) {
+ StringPiece fragment;
+ s = file->Read(offset, kBufferSize, &fragment, space);
+ if (!s.ok()) {
+ if (errors::IsOutOfRange(s)) { // No more bytes, but not an error
+ s = Status::OK();
+ data->append(fragment.data(), fragment.size());
+ }
+ break;
+ }
+ offset += fragment.size();
+ data->append(fragment.data(), fragment.size());
+ if (fragment.empty()) {
+ break;
+ }
+ }
+ delete[] space;
+ delete file;
+ return s;
+}
+
+Status WriteStringToFile(Env* env, const string& fname,
+ const StringPiece& data) {
+ WritableFile* file;
+ Status s = env->NewWritableFile(fname, &file);
+ if (!s.ok()) {
+ return s;
+ }
+ s = file->Append(data);
+ if (s.ok()) {
+ s = file->Close();
+ }
+ delete file;
+ return s;
+}
+
+// A ZeroCopyInputStream on a RandomAccessFile.
+namespace {
+class FileStream : public ::tensorflow::protobuf::io::ZeroCopyInputStream {
+ public:
+ explicit FileStream(RandomAccessFile* file) : file_(file), pos_(0) {}
+
+ void BackUp(int count) override { pos_ -= count; }
+ bool Skip(int count) override {
+ pos_ += count;
+ return true;
+ }
+ int64 ByteCount() const override { return pos_; }
+ Status status() const { return status_; }
+
+ bool Next(const void** data, int* size) override {
+ StringPiece result;
+ Status s = file_->Read(pos_, kBufSize, &result, scratch_);
+ if (result.empty()) {
+ status_ = s;
+ return false;
+ }
+ pos_ += result.size();
+ *data = result.data();
+ *size = result.size();
+ return true;
+ }
+
+ private:
+ static const int kBufSize = 512 << 10;
+
+ RandomAccessFile* file_;
+ int64 pos_;
+ Status status_;
+ char scratch_[kBufSize];
+};
+
+} // namespace
+
+Status ReadBinaryProto(Env* env, const string& fname,
+ ::tensorflow::protobuf::MessageLite* proto) {
+ RandomAccessFile* file;
+ auto s = env->NewRandomAccessFile(fname, &file);
+ if (!s.ok()) {
+ return s;
+ }
+ std::unique_ptr<RandomAccessFile> file_holder(file);
+ std::unique_ptr<FileStream> stream(new FileStream(file));
+
+ // TODO(jiayq): the following coded stream is for debugging purposes to allow
+ // one to parse arbitrarily large messages for MessageLite. One most likely
+ // doesn't want to put protobufs larger than 64MB on Android, so we should
+ // eventually remove this and quit loudly when a large protobuf is passed in.
+ ::tensorflow::protobuf::io::CodedInputStream coded_stream(stream.get());
+ // Total bytes hard limit / warning limit are set to 1GB and 512MB
+ // respectively.
+ coded_stream.SetTotalBytesLimit(1024LL << 20, 512LL << 20);
+
+ if (!proto->ParseFromCodedStream(&coded_stream)) {
+ s = stream->status();
+ if (s.ok()) {
+ s = Status(error::DATA_LOSS, "Parse error");
+ }
+ }
+ return s;
+}
+
+} // namespace tensorflow
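As a usage sketch for ReadBinaryProto (illustrative only; MyProto stands in for any generated MessageLite type, and the function name is hypothetical):

    #include "tensorflow/core/platform/logging.h"
    #include "tensorflow/core/public/env.h"

    // Reads a binary-encoded proto from "path" and aborts on failure.
    void LoadProtoOrDie(const tensorflow::string& path, MyProto* proto) {
      tensorflow::Status s =
          tensorflow::ReadBinaryProto(tensorflow::Env::Default(), path, proto);
      if (!s.ok()) {
        LOG(FATAL) << "Failed to read " << path << ": " << s.ToString();
      }
    }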
diff --git a/tensorflow/core/platform/env_test.cc b/tensorflow/core/platform/env_test.cc
new file mode 100644
index 0000000000..be15c4a5cb
--- /dev/null
+++ b/tensorflow/core/platform/env_test.cc
@@ -0,0 +1,31 @@
+#include "tensorflow/core/public/env.h"
+
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+struct EnvTest {};
+
+TEST(EnvTest, ReadFileToString) {
+ Env* env = Env::Default();
+ const string dir = testing::TmpDir();
+ for (const int length : {0, 1, 1212, 2553, 4928, 8196, 9000}) {
+ const string filename = io::JoinPath(dir, strings::StrCat("file", length));
+
+ // Write a file with the given length
+ string input(length, 0);
+ for (int i = 0; i < length; i++) input[i] = i;
+ WriteStringToFile(env, filename, input);
+
+ // Read the file back and check equality
+ string output;
+ TF_CHECK_OK(ReadFileToString(env, filename, &output));
+ CHECK_EQ(length, output.size());
+ CHECK_EQ(input, output);
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/init_main.h b/tensorflow/core/platform/init_main.h
new file mode 100644
index 0000000000..ce3d1fbc2f
--- /dev/null
+++ b/tensorflow/core/platform/init_main.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_PLATFORM_INIT_MAIN_H_
+#define TENSORFLOW_PLATFORM_INIT_MAIN_H_
+
+namespace tensorflow {
+namespace port {
+
+// Platform-specific initialization routine that may be invoked by a
+// main() program that uses TensorFlow.
+//
+// Default implementation does nothing.
+void InitMain(const char* usage, int* argc, char*** argv);
+
+} // namespace port
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_INIT_MAIN_H_
diff --git a/tensorflow/core/platform/integral_types_test.cc b/tensorflow/core/platform/integral_types_test.cc
new file mode 100644
index 0000000000..067787a9f4
--- /dev/null
+++ b/tensorflow/core/platform/integral_types_test.cc
@@ -0,0 +1,33 @@
+#include "tensorflow/core/platform/port.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+TEST(IntegralTypes, Basic) {
+ EXPECT_EQ(1, sizeof(int8));
+ EXPECT_EQ(2, sizeof(int16));
+ EXPECT_EQ(4, sizeof(int32));
+ EXPECT_EQ(8, sizeof(int64));
+
+ EXPECT_EQ(1, sizeof(uint8));
+ EXPECT_EQ(2, sizeof(uint16));
+ EXPECT_EQ(4, sizeof(uint32));
+ EXPECT_EQ(8, sizeof(uint64));
+}
+
+TEST(IntegralTypes, MinAndMaxConstants) {
+ EXPECT_EQ(static_cast<uint8>(kint8min), static_cast<uint8>(kint8max) + 1);
+ EXPECT_EQ(static_cast<uint16>(kint16min), static_cast<uint16>(kint16max) + 1);
+ EXPECT_EQ(static_cast<uint32>(kint32min), static_cast<uint32>(kint32max) + 1);
+ EXPECT_EQ(static_cast<uint64>(kint64min), static_cast<uint64>(kint64max) + 1);
+
+ EXPECT_EQ(0, static_cast<uint8>(kuint8max + 1));
+ EXPECT_EQ(0, static_cast<uint16>(kuint16max + 1));
+ EXPECT_EQ(0, static_cast<uint32>(kuint32max + 1));
+ EXPECT_EQ(0, static_cast<uint64>(kuint64max + 1));
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/logging.h b/tensorflow/core/platform/logging.h
new file mode 100644
index 0000000000..66caf22ede
--- /dev/null
+++ b/tensorflow/core/platform/logging.h
@@ -0,0 +1,12 @@
+#ifndef TENSORFLOW_PLATFORM_LOGGING_H_
+#define TENSORFLOW_PLATFORM_LOGGING_H_
+
+#include "tensorflow/core/platform/port.h" // To pick up PLATFORM_define
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_GOOGLE_ANDROID)
+#include "base/logging.h"
+#else
+#include "tensorflow/core/platform/default/logging.h"
+#endif
+
+#endif // TENSORFLOW_PLATFORM_LOGGING_H_
diff --git a/tensorflow/core/platform/logging_test.cc b/tensorflow/core/platform/logging_test.cc
new file mode 100644
index 0000000000..03d734ae95
--- /dev/null
+++ b/tensorflow/core/platform/logging_test.cc
@@ -0,0 +1,76 @@
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(Logging, Log) {
+ LOG(INFO) << "Hello";
+ LOG(INFO) << "Another log message";
+ LOG(ERROR) << "Error message";
+ VLOG(1) << "A VLOG message";
+ VLOG(2) << "A higher VLOG message";
+}
+
+TEST(Logging, CheckChecks) {
+ CHECK(true);
+ CHECK(7 > 5);
+ string a("abc");
+ string b("xyz");
+ CHECK_EQ(a, a);
+ CHECK_NE(a, b);
+ CHECK_EQ(3, 3);
+ CHECK_NE(4, 3);
+ CHECK_GT(4, 3);
+ CHECK_GE(3, 3);
+ CHECK_LT(2, 3);
+ CHECK_LE(2, 3);
+
+ DCHECK(true);
+ DCHECK(7 > 5);
+ DCHECK_EQ(a, a);
+ DCHECK_NE(a, b);
+ DCHECK_EQ(3, 3);
+ DCHECK_NE(4, 3);
+ DCHECK_GT(4, 3);
+ DCHECK_GE(3, 3);
+ DCHECK_LT(2, 3);
+ DCHECK_LE(2, 3);
+}
+
+TEST(LoggingDeathTest, FailedChecks) {
+ string a("abc");
+ string b("xyz");
+ const char* p_const = "hello there";
+ const char* p_null_const = nullptr;
+ char mybuf[10];
+ char* p_non_const = mybuf;
+ char* p_null = nullptr;
+ CHECK_NOTNULL(p_const);
+ CHECK_NOTNULL(p_non_const);
+
+ ASSERT_DEATH(CHECK(false), "false");
+ ASSERT_DEATH(CHECK(9 < 7), "9 < 7");
+ ASSERT_DEATH(CHECK_EQ(a, b), "a == b");
+ ASSERT_DEATH(CHECK_EQ(3, 4), "3 == 4");
+ ASSERT_DEATH(CHECK_NE(3, 3), "3 != 3");
+ ASSERT_DEATH(CHECK_GT(2, 3), "2 > 3");
+ ASSERT_DEATH(CHECK_GE(2, 3), "2 >= 3");
+ ASSERT_DEATH(CHECK_LT(3, 2), "3 < 2");
+ ASSERT_DEATH(CHECK_LE(3, 2), "3 <= 2");
+ ASSERT_DEATH(CHECK(false), "false");
+ ASSERT_DEATH(printf("%s", CHECK_NOTNULL(p_null)), "Must be non NULL");
+ ASSERT_DEATH(printf("%s", CHECK_NOTNULL(p_null_const)), "Must be non NULL");
+#ifndef NDEBUG
+ ASSERT_DEATH(DCHECK(9 < 7), "9 < 7");
+ ASSERT_DEATH(DCHECK(9 < 7), "9 < 7");
+ ASSERT_DEATH(DCHECK_EQ(a, b), "a == b");
+ ASSERT_DEATH(DCHECK_EQ(3, 4), "3 == 4");
+ ASSERT_DEATH(DCHECK_NE(3, 3), "3 != 3");
+ ASSERT_DEATH(DCHECK_GT(2, 3), "2 > 3");
+ ASSERT_DEATH(DCHECK_GE(2, 3), "2 >= 3");
+ ASSERT_DEATH(DCHECK_LT(3, 2), "3 < 2");
+ ASSERT_DEATH(DCHECK_LE(3, 2), "3 <= 2");
+#endif
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/port.h b/tensorflow/core/platform/port.h
new file mode 100644
index 0000000000..fef20f7753
--- /dev/null
+++ b/tensorflow/core/platform/port.h
@@ -0,0 +1,228 @@
+#ifndef TENSORFLOW_PLATFORM_PORT_H_
+#define TENSORFLOW_PLATFORM_PORT_H_
+
+#include <string>
+#include <vector>
+
+#if !defined(PLATFORM_POSIX) && !defined(PLATFORM_GOOGLE) && \
+ !defined(PLATFORM_POSIX_ANDROID) && !defined(PLATFORM_GOOGLE_ANDROID)
+
+// Choose which platform we are on.
+#if defined(ANDROID) || defined(__ANDROID__)
+#define PLATFORM_POSIX_ANDROID
+#elif defined(__APPLE__)
+#define PLATFORM_POSIX
+#else
+// If no platform specified, use:
+#define PLATFORM_POSIX
+#endif
+
+#endif
+
+// Define tensorflow::string to refer to appropriate platform specific type.
+namespace tensorflow {
+#if defined(PLATFORM_GOOGLE)
+using ::string;
+#else
+using std::string;
+#endif
+} // namespace tensorflow
+
+namespace tensorflow {
+enum ConditionResult { kCond_Timeout, kCond_MaybeNotified };
+} // namespace tensorflow
+
+// Include appropriate platform-dependent implementations of mutex etc.
+#if defined(PLATFORM_GOOGLE)
+#include "tensorflow/core/platform/google/integral_types.h"
+#include "tensorflow/core/platform/google/mutex.h"
+#include "tensorflow/core/platform/google/dynamic_annotations.h"
+#elif defined(PLATFORM_POSIX) || defined(PLATFORM_POSIX_ANDROID) || \
+ defined(PLATFORM_GOOGLE_ANDROID)
+#include "tensorflow/core/platform/default/integral_types.h"
+#include "tensorflow/core/platform/default/mutex.h"
+#include "tensorflow/core/platform/default/dynamic_annotations.h"
+#else
+#error Define the appropriate PLATFORM_<foo> macro for this platform
+#endif
+
+namespace tensorflow {
+
+static const uint8 kuint8max = ((uint8)0xFF);
+static const uint16 kuint16max = ((uint16)0xFFFF);
+static const uint32 kuint32max = ((uint32)0xFFFFFFFF);
+static const uint64 kuint64max = ((uint64)0xFFFFFFFFFFFFFFFFull);
+static const int8 kint8min = ((int8)~0x7F);
+static const int8 kint8max = ((int8)0x7F);
+static const int16 kint16min = ((int16)~0x7FFF);
+static const int16 kint16max = ((int16)0x7FFF);
+static const int32 kint32min = ((int32)~0x7FFFFFFF);
+static const int32 kint32max = ((int32)0x7FFFFFFF);
+static const int64 kint64min = ((int64)~0x7FFFFFFFFFFFFFFFll);
+static const int64 kint64max = ((int64)0x7FFFFFFFFFFFFFFFll);
+
+// A typedef for a uint64 used as a short fingerprint.
+typedef uint64 Fprint;
+
+// The mutex library included above defines:
+// class mutex;
+// class mutex_lock;
+// class condition_variable;
+// It also defines the following:
+
+// Like "cv->wait(*mu)", except that it only waits for up to "ms" milliseconds.
+//
+// Returns kCond_Timeout if the timeout expired without this
+// thread noticing a signal on the condition variable. Otherwise may
+// return either kCond_Timeout or kCond_MaybeNotified
+ConditionResult WaitForMilliseconds(mutex_lock* mu, condition_variable* cv,
+ int64 ms);
+} // namespace tensorflow
+
+namespace tensorflow {
+namespace port {
+
+// TODO(jeff,sanjay): Make portable
+static const bool kLittleEndian = true;
+
+// TODO(jeff,sanjay): Find appropriate places for all the code below.
+// Possible places for any particular item below:
+// (a) Here, so it gets reimplemented on every platform
+// (b) Env
+// (c) config.h (auto-generated by autotools?)
+// (d) macros.h
+// ...
+
+// Return the hostname of the machine on which this process is running
+string Hostname();
+
+// Returns an estimate of the number of schedulable CPUs for this
+// process. Usually, it's constant throughout the lifetime of a
+// process, but it might change if the underlying cluster management
+// software can change it dynamically.
+int NumSchedulableCPUs();
+
+// Some platforms require that filenames be of a certain form when
+// used for logging. This function is invoked to allow platforms to
+// adjust the filename used for logging appropriately, if necessary
+// (most ports can just do nothing). If any changes are necessary, the
+// implementation should mutate "*filename" appropriately.
+void AdjustFilenameForLogging(string* filename);
+
+// Aligned allocation/deallocation
+void* aligned_malloc(size_t size, int minimum_alignment);
+void aligned_free(void* aligned_memory);
+
+// Prefetching support
+//
+// Defined behavior on some of the uarchs:
+// PREFETCH_HINT_T0:
+// prefetch to all levels of the hierarchy (except on p4: prefetch to L2)
+// PREFETCH_HINT_NTA:
+// p4: fetch to L2, but limit to 1 way (out of the 8 ways)
+// core: skip L2, go directly to L1
+// k8 rev E and later: skip L2, can go to either of the 2-ways in L1
+enum PrefetchHint {
+ PREFETCH_HINT_T0 = 3, // More temporal locality
+ PREFETCH_HINT_T1 = 2,
+ PREFETCH_HINT_T2 = 1, // Less temporal locality
+ PREFETCH_HINT_NTA = 0 // No temporal locality
+};
+template <PrefetchHint hint>
+void prefetch(const void* x);
+
+// Snappy compression/decompression support
+bool Snappy_Compress(const char* input, size_t length, string* output);
+
+bool Snappy_GetUncompressedLength(const char* input, size_t length,
+ size_t* result);
+bool Snappy_Uncompress(const char* input, size_t length, char* output);
+
+#if defined(__GXX_EXPERIMENTAL_CXX0X__) || __cplusplus >= 201103L
+// Define this to 1 if the code is compiled in C++11 mode; leave it
+// undefined otherwise. Do NOT define it to 0 -- that causes
+// '#ifdef LANG_CXX11' to behave differently from '#if LANG_CXX11'.
+#define LANG_CXX11 1
+#endif
+
+// Compiler attributes
+#if (defined(__GNUC__) || defined(__APPLE__)) && !defined(SWIG)
+// Compiler supports GCC-style attributes
+#define TF_ATTRIBUTE_NORETURN __attribute__((noreturn))
+#define TF_ATTRIBUTE_NOINLINE __attribute__((noinline))
+#define TF_ATTRIBUTE_UNUSED __attribute__((unused))
+#define TF_ATTRIBUTE_COLD __attribute__((cold))
+#define TF_PACKED __attribute__((packed))
+#define TF_MUST_USE_RESULT __attribute__((warn_unused_result))
+#define TF_PRINTF_ATTRIBUTE(string_index, first_to_check) \
+ __attribute__((__format__(__printf__, string_index, first_to_check)))
+#define TF_SCANF_ATTRIBUTE(string_index, first_to_check) \
+ __attribute__((__format__(__scanf__, string_index, first_to_check)))
+
+#else
+// Non-GCC equivalents
+#define TF_ATTRIBUTE_NORETURN
+#define TF_ATTRIBUTE_NOINLINE
+#define TF_ATTRIBUTE_UNUSED
+#define TF_ATTRIBUTE_COLD
+#define TF_MUST_USE_RESULT
+#define TF_PACKED
+#define TF_PRINTF_ATTRIBUTE(string_index, first_to_check)
+#define TF_SCANF_ATTRIBUTE(string_index, first_to_check)
+#endif
+
+// GCC can be told that a certain branch is not likely to be taken (for
+// instance, a CHECK failure), and use that information in static analysis.
+// Giving it this information can help it optimize for the common case in
+// the absence of better information (ie. -fprofile-arcs).
+//
+#if defined(COMPILER_GCC3)
+#define TF_PREDICT_FALSE(x) (__builtin_expect(x, 0))
+#define TF_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
+#else
+#define TF_PREDICT_FALSE(x) x
+#define TF_PREDICT_TRUE(x) x
+#endif
+
+// ---------------------------------------------------------------------------
+// Inline implementations of some performance-critical methods
+// ---------------------------------------------------------------------------
+template <PrefetchHint hint>
+inline void prefetch(const void* x) {
+#if defined(__llvm__) || defined(COMPILER_GCC)
+ __builtin_prefetch(x, 0, hint);
+#else
+// You get no effect. Feel free to add more sections above.
+#endif
+}
+
+// A macro to disallow the copy constructor and operator= functions
+// This is usually placed in the private: declarations for a class.
+#define TF_DISALLOW_COPY_AND_ASSIGN(TypeName) \
+ TypeName(const TypeName&) = delete; \
+ void operator=(const TypeName&) = delete
+
+// The TF_ARRAYSIZE(arr) macro returns the # of elements in an array arr.
+//
+// The expression TF_ARRAYSIZE(a) is a compile-time constant of type
+// size_t.
+#define TF_ARRAYSIZE(a) \
+ ((sizeof(a) / sizeof(*(a))) / \
+ static_cast<size_t>(!(sizeof(a) % sizeof(*(a)))))
+
+#if defined(__clang__) && defined(LANG_CXX11) && defined(__has_warning)
+#if __has_feature(cxx_attributes) && __has_warning("-Wimplicit-fallthrough")
+#define TF_FALLTHROUGH_INTENDED [[clang::fallthrough]] // NOLINT
+#endif
+#endif
+
+#ifndef TF_FALLTHROUGH_INTENDED
+#define TF_FALLTHROUGH_INTENDED \
+ do { \
+ } while (0)
+#endif
+
+} // namespace port
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_PORT_H_
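A quick sketch of the utility macros near the end of this header (hypothetical example code, not part of the diff):

    #include <cstddef>
    #include "tensorflow/core/platform/port.h"

    namespace example {

    class Buffer {
     public:
      Buffer() {}

     private:
      char data_[64];
      TF_DISALLOW_COPY_AND_ASSIGN(Buffer);  // copying becomes a compile error
    };

    inline size_t NumPrimes() {
      static const int kPrimes[] = {2, 3, 5, 7, 11};
      // Compile-time element count of a true array.
      return TF_ARRAYSIZE(kPrimes);
    }

    }  // namespace example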
diff --git a/tensorflow/core/platform/port_test.cc b/tensorflow/core/platform/port_test.cc
new file mode 100644
index 0000000000..8cf1c30aa3
--- /dev/null
+++ b/tensorflow/core/platform/port_test.cc
@@ -0,0 +1,48 @@
+#include "tensorflow/core/platform/port.h"
+#include <condition_variable>
+#include "tensorflow/core/lib/core/threadpool.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace port {
+
+TEST(Port, AlignedMalloc) {
+ for (size_t alignment = 1; alignment <= 1 << 20; alignment <<= 1) {
+ void* p = aligned_malloc(1, alignment);
+ ASSERT_TRUE(p != NULL) << "aligned_malloc(1, " << alignment << ")";
+ uintptr_t pval = reinterpret_cast<uintptr_t>(p);
+ EXPECT_EQ(pval % alignment, 0);
+ aligned_free(p);
+ }
+}
+
+TEST(ConditionVariable, WaitForMilliseconds_Timeout) {
+ mutex m;
+ mutex_lock l(m);
+ condition_variable cv;
+ time_t start = time(NULL);
+ EXPECT_EQ(WaitForMilliseconds(&l, &cv, 3000), kCond_Timeout);
+ time_t finish = time(NULL);
+ EXPECT_GE(finish - start, 3);
+}
+
+TEST(ConditionVariable, WaitForMilliseconds_Signalled) {
+ thread::ThreadPool pool(Env::Default(), "test", 1);
+ mutex m;
+ mutex_lock l(m);
+ condition_variable cv;
+ time_t start = time(NULL);
+ // Sleep for just 1 second then notify. We have a timeout of 3 secs,
+ // so the condition variable will notice the cv signal before the timeout.
+ pool.Schedule([&m, &cv]() {
+ sleep(1);
+ mutex_lock l(m);
+ cv.notify_all();
+ });
+ EXPECT_EQ(WaitForMilliseconds(&l, &cv, 3000), kCond_MaybeNotified);
+ time_t finish = time(NULL);
+ EXPECT_LT(finish - start, 3);
+}
+
+} // namespace port
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/posix/env.cc b/tensorflow/core/platform/posix/env.cc
new file mode 100644
index 0000000000..6ba2010005
--- /dev/null
+++ b/tensorflow/core/platform/posix/env.cc
@@ -0,0 +1,385 @@
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <thread>
+
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/lib/core/error_codes.pb.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace {
+
+error::Code ErrnoToCode(int err_number) {
+ error::Code code;
+ switch (err_number) {
+ case 0:
+ code = error::OK;
+ break;
+ case EINVAL: // Invalid argument
+ case ENAMETOOLONG: // Filename too long
+ case E2BIG: // Argument list too long
+ case EDESTADDRREQ: // Destination address required
+ case EDOM: // Mathematics argument out of domain of function
+ case EFAULT: // Bad address
+ case EILSEQ: // Illegal byte sequence
+ case ENOPROTOOPT: // Protocol not available
+ case ENOSTR: // Not a STREAM
+ case ENOTSOCK: // Not a socket
+ case ENOTTY: // Inappropriate I/O control operation
+ case EPROTOTYPE: // Protocol wrong type for socket
+ case ESPIPE: // Invalid seek
+ code = error::INVALID_ARGUMENT;
+ break;
+ case ETIMEDOUT: // Connection timed out
+ case ETIME: // Timer expired
+ code = error::DEADLINE_EXCEEDED;
+ break;
+ case ENODEV: // No such device
+ case ENOENT: // No such file or directory
+ case ENXIO: // No such device or address
+ case ESRCH: // No such process
+ code = error::NOT_FOUND;
+ break;
+ case EEXIST: // File exists
+ case EADDRNOTAVAIL: // Address not available
+ case EALREADY: // Connection already in progress
+ code = error::ALREADY_EXISTS;
+ break;
+ case EPERM: // Operation not permitted
+ case EACCES: // Permission denied
+ case EROFS: // Read only file system
+ code = error::PERMISSION_DENIED;
+ break;
+ case ENOTEMPTY: // Directory not empty
+ case EISDIR: // Is a directory
+ case ENOTDIR: // Not a directory
+ case EADDRINUSE: // Address already in use
+ case EBADF: // Invalid file descriptor
+ case EBUSY: // Device or resource busy
+ case ECHILD: // No child processes
+ case EISCONN: // Socket is connected
+ case ENOTBLK: // Block device required
+ case ENOTCONN: // The socket is not connected
+ case EPIPE: // Broken pipe
+ case ESHUTDOWN: // Cannot send after transport endpoint shutdown
+ case ETXTBSY: // Text file busy
+ code = error::FAILED_PRECONDITION;
+ break;
+ case ENOSPC: // No space left on device
+ case EDQUOT: // Disk quota exceeded
+ case EMFILE: // Too many open files
+ case EMLINK: // Too many links
+ case ENFILE: // Too many open files in system
+ case ENOBUFS: // No buffer space available
+ case ENODATA: // No message is available on the STREAM read queue
+ case ENOMEM: // Not enough space
+ case ENOSR: // No STREAM resources
+ case EUSERS: // Too many users
+ code = error::RESOURCE_EXHAUSTED;
+ break;
+ case EFBIG: // File too large
+ case EOVERFLOW: // Value too large to be stored in data type
+ case ERANGE: // Result too large
+ code = error::OUT_OF_RANGE;
+ break;
+ case ENOSYS: // Function not implemented
+ case ENOTSUP: // Operation not supported
+ case EAFNOSUPPORT: // Address family not supported
+ case EPFNOSUPPORT: // Protocol family not supported
+ case EPROTONOSUPPORT: // Protocol not supported
+ case ESOCKTNOSUPPORT: // Socket type not supported
+ case EXDEV: // Improper link
+ code = error::UNIMPLEMENTED;
+ break;
+ case EAGAIN: // Resource temporarily unavailable
+ case ECONNREFUSED: // Connection refused
+ case ECONNABORTED: // Connection aborted
+ case ECONNRESET: // Connection reset
+ case EINTR: // Interrupted function call
+ case EHOSTDOWN: // Host is down
+ case EHOSTUNREACH: // Host is unreachable
+ case ENETDOWN: // Network is down
+ case ENETRESET: // Connection aborted by network
+ case ENETUNREACH: // Network unreachable
+ case ENOLCK: // No locks available
+ case ENOLINK: // Link has been severed
+#if !defined(__APPLE__)
+ case ENONET: // Machine is not on the network
+#endif
+ code = error::UNAVAILABLE;
+ break;
+ case EDEADLK: // Resource deadlock avoided
+ case ESTALE: // Stale file handle
+ code = error::ABORTED;
+ break;
+ case ECANCELED: // Operation cancelled
+ code = error::CANCELLED;
+ break;
+ // NOTE: If you get any of the following (especially in a
+ // reproducible way) and can propose a better mapping,
+ // please email the owners about updating this mapping.
+ case EBADMSG: // Bad message
+ case EIDRM: // Identifier removed
+ case EINPROGRESS: // Operation in progress
+ case EIO: // I/O error
+ case ELOOP: // Too many levels of symbolic links
+ case ENOEXEC: // Exec format error
+ case ENOMSG: // No message of the desired type
+ case EPROTO: // Protocol error
+ case EREMOTE: // Object is remote
+ code = error::UNKNOWN;
+ break;
+ default: {
+ code = error::UNKNOWN;
+ break;
+ }
+ }
+ return code;
+}
+
+static Status IOError(const string& context, int err_number) {
+ auto code = ErrnoToCode(err_number);
+ if (code == error::UNKNOWN) {
+ return Status(ErrnoToCode(err_number),
+ context + "; " + strerror(err_number));
+ } else {
+ return Status(ErrnoToCode(err_number), context);
+ }
+}
+
+// pread() based random-access
+class PosixRandomAccessFile : public RandomAccessFile {
+ private:
+ string filename_;
+ int fd_;
+
+ public:
+ PosixRandomAccessFile(const string& fname, int fd)
+ : filename_(fname), fd_(fd) {}
+ ~PosixRandomAccessFile() override { close(fd_); }
+
+ Status Read(uint64 offset, size_t n, StringPiece* result,
+ char* scratch) const override {
+ Status s;
+ char* dst = scratch;
+ while (n > 0 && s.ok()) {
+ ssize_t r = pread(fd_, dst, n, static_cast<off_t>(offset));
+ if (r > 0) {
+ dst += r;
+ n -= r;
+ offset += r;
+ } else if (r == 0) {
+ s = Status(error::OUT_OF_RANGE, "Read less bytes than requested");
+ } else if (errno == EINTR || errno == EAGAIN) {
+ // Retry
+ } else {
+ s = IOError(filename_, errno);
+ }
+ }
+ *result = StringPiece(scratch, dst - scratch);
+ return s;
+ }
+};
+
+class PosixWritableFile : public WritableFile {
+ private:
+ string filename_;
+ FILE* file_;
+
+ public:
+ PosixWritableFile(const string& fname, FILE* f)
+ : filename_(fname), file_(f) {}
+
+ ~PosixWritableFile() override {
+ if (file_ != NULL) {
+ // Ignoring any potential errors
+ fclose(file_);
+ }
+ }
+
+ Status Append(const StringPiece& data) override {
+ size_t r = fwrite(data.data(), 1, data.size(), file_);
+ if (r != data.size()) {
+ return IOError(filename_, errno);
+ }
+ return Status::OK();
+ }
+
+ Status Close() override {
+ Status result;
+ if (fclose(file_) != 0) {
+ result = IOError(filename_, errno);
+ }
+ file_ = NULL;
+ return result;
+ }
+
+ Status Flush() override {
+ if (fflush(file_) != 0) {
+ return IOError(filename_, errno);
+ }
+ return Status::OK();
+ }
+
+ Status Sync() override {
+ Status s;
+ if (fflush(file_) != 0) {
+ s = IOError(filename_, errno);
+ }
+ return s;
+ }
+};
+
+class StdThread : public Thread {
+ public:
+ // name and thread_options are both ignored.
+ StdThread(const ThreadOptions& thread_options, const string& name,
+ std::function<void()> fn)
+ : thread_(fn) {}
+ ~StdThread() { thread_.join(); }
+
+ private:
+ std::thread thread_;
+};
+
+class PosixEnv : public Env {
+ public:
+ PosixEnv() {}
+
+ ~PosixEnv() override { LOG(FATAL) << "Env::Default() must not be destroyed"; }
+
+ Status NewRandomAccessFile(const string& fname,
+ RandomAccessFile** result) override {
+ *result = NULL;
+ Status s;
+ int fd = open(fname.c_str(), O_RDONLY);
+ if (fd < 0) {
+ s = IOError(fname, errno);
+ } else {
+ *result = new PosixRandomAccessFile(fname, fd);
+ }
+ return s;
+ }
+
+ Status NewWritableFile(const string& fname, WritableFile** result) override {
+ Status s;
+ FILE* f = fopen(fname.c_str(), "w");
+ if (f == NULL) {
+ *result = NULL;
+ s = IOError(fname, errno);
+ } else {
+ *result = new PosixWritableFile(fname, f);
+ }
+ return s;
+ }
+
+ Status NewAppendableFile(const string& fname,
+ WritableFile** result) override {
+ Status s;
+ FILE* f = fopen(fname.c_str(), "a");
+ if (f == NULL) {
+ *result = NULL;
+ s = IOError(fname, errno);
+ } else {
+ *result = new PosixWritableFile(fname, f);
+ }
+ return s;
+ }
+
+ bool FileExists(const string& fname) override {
+ return access(fname.c_str(), F_OK) == 0;
+ }
+
+ Status GetChildren(const string& dir, std::vector<string>* result) override {
+ result->clear();
+ DIR* d = opendir(dir.c_str());
+ if (d == NULL) {
+ return IOError(dir, errno);
+ }
+ struct dirent* entry;
+ while ((entry = readdir(d)) != NULL) {
+ StringPiece basename = entry->d_name;
+ if ((basename != ".") && (basename != "..")) {
+ result->push_back(entry->d_name);
+ }
+ }
+ closedir(d);
+ return Status::OK();
+ }
+
+ Status DeleteFile(const string& fname) override {
+ Status result;
+ if (unlink(fname.c_str()) != 0) {
+ result = IOError(fname, errno);
+ }
+ return result;
+ }
+
+ Status CreateDir(const string& name) override {
+ Status result;
+ if (mkdir(name.c_str(), 0755) != 0) {
+ result = IOError(name, errno);
+ }
+ return result;
+ }
+
+ Status DeleteDir(const string& name) override {
+ Status result;
+ if (rmdir(name.c_str()) != 0) {
+ result = IOError(name, errno);
+ }
+ return result;
+ }
+
+ Status GetFileSize(const string& fname, uint64* size) override {
+ Status s;
+ struct stat sbuf;
+ if (stat(fname.c_str(), &sbuf) != 0) {
+ *size = 0;
+ s = IOError(fname, errno);
+ } else {
+ *size = sbuf.st_size;
+ }
+ return s;
+ }
+
+ Status RenameFile(const string& src, const string& target) override {
+ Status result;
+ if (rename(src.c_str(), target.c_str()) != 0) {
+ result = IOError(src, errno);
+ }
+ return result;
+ }
+
+ uint64 NowMicros() override {
+ struct timeval tv;
+ gettimeofday(&tv, NULL);
+ return static_cast<uint64>(tv.tv_sec) * 1000000 + tv.tv_usec;
+ }
+
+ void SleepForMicroseconds(int micros) override { usleep(micros); }
+
+ Thread* StartThread(const ThreadOptions& thread_options, const string& name,
+ std::function<void()> fn) override {
+ return new StdThread(thread_options, name, fn);
+ }
+};
+
+} // namespace
+#if defined(PLATFORM_POSIX) || defined(__ANDROID__)
+Env* Env::Default() {
+ static Env* default_env = new PosixEnv;
+ return default_env;
+}
+#endif
+
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/posix/port.cc b/tensorflow/core/platform/posix/port.cc
new file mode 100644
index 0000000000..b4a1570ef9
--- /dev/null
+++ b/tensorflow/core/platform/posix/port.cc
@@ -0,0 +1,92 @@
+#include "tensorflow/core/platform/port.h"
+#if defined(__linux) && !defined(__ANDROID__)
+#include <sched.h>
+#endif
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#ifdef SNAPPY
+#include <snappy.h>
+#endif
+
+namespace tensorflow {
+namespace port {
+
+void InitMain(const char* usage, int* argc, char*** argv) {}
+
+string Hostname() {
+ char hostname[1024];
+ gethostname(hostname, sizeof hostname);
+ hostname[sizeof hostname - 1] = 0;
+ return string(hostname);
+}
+
+int NumSchedulableCPUs() {
+#if defined(__linux) && !defined(__ANDROID__)
+ cpu_set_t cpuset;
+ if (sched_getaffinity(0, sizeof(cpu_set_t), &cpuset) == 0) {
+ return CPU_COUNT(&cpuset);
+ }
+ perror("sched_getaffinity");
+#endif
+ const int kDefaultCores = 4; // Semi-conservative guess
+ fprintf(stderr, "can't determine number of CPU cores: assuming %d\n",
+ kDefaultCores);
+ return kDefaultCores;
+}
+
+void* aligned_malloc(size_t size, int minimum_alignment) {
+#if defined(__ANDROID__)
+ return memalign(minimum_alignment, size);
+#else // !__ANDROID__
+ void* ptr = NULL;
+ // posix_memalign requires that the requested alignment be at least
+ // sizeof(void*). In this case, fall back on malloc which should return
+ // memory aligned to at least the size of a pointer.
+ const int required_alignment = sizeof(void*);
+ if (minimum_alignment < required_alignment) return malloc(size);
+ if (posix_memalign(&ptr, minimum_alignment, size) != 0)
+ return NULL;
+ else
+ return ptr;
+#endif
+}
+
+void aligned_free(void* aligned_memory) { free(aligned_memory); }
+
+void AdjustFilenameForLogging(string* filename) {
+ // Nothing to do
+}
+
+bool Snappy_Compress(const char* input, size_t length, string* output) {
+#ifdef SNAPPY
+ output->resize(snappy::MaxCompressedLength(length));
+ size_t outlen;
+ snappy::RawCompress(input, length, &(*output)[0], &outlen);
+ output->resize(outlen);
+ return true;
+#else
+ return false;
+#endif
+}
+
+bool Snappy_GetUncompressedLength(const char* input, size_t length,
+ size_t* result) {
+#ifdef SNAPPY
+ return snappy::GetUncompressedLength(input, length, result);
+#else
+ return false;
+#endif
+}
+
+bool Snappy_Uncompress(const char* input, size_t length, char* output) {
+#ifdef SNAPPY
+ return snappy::RawUncompress(input, length, output);
+#else
+ return false;
+#endif
+}
+
+} // namespace port
+} // namespace tensorflow
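A round-trip sketch through the Snappy wrappers above (illustrative only; all three calls simply return false when the build does not define SNAPPY):

    #include "tensorflow/core/platform/logging.h"
    #include "tensorflow/core/platform/port.h"

    void SnappyRoundTrip(const tensorflow::string& input) {
      tensorflow::string compressed;
      if (!tensorflow::port::Snappy_Compress(input.data(), input.size(),
                                             &compressed)) {
        LOG(WARNING) << "Snappy support not compiled in";
        return;
      }
      size_t uncompressed_len = 0;
      CHECK(tensorflow::port::Snappy_GetUncompressedLength(
          compressed.data(), compressed.size(), &uncompressed_len));
      tensorflow::string output(uncompressed_len, '\0');
      CHECK(tensorflow::port::Snappy_Uncompress(compressed.data(),
                                                compressed.size(), &output[0]));
      CHECK_EQ(input, output);
    }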
diff --git a/tensorflow/core/platform/protobuf.h b/tensorflow/core/platform/protobuf.h
new file mode 100644
index 0000000000..3a166b3973
--- /dev/null
+++ b/tensorflow/core/platform/protobuf.h
@@ -0,0 +1,29 @@
+#ifndef TENSORFLOW_PLATFORM_PROTOBUF_H_
+#define TENSORFLOW_PLATFORM_PROTOBUF_H_
+
+// Import whatever namespace protobuf comes from into the
+// ::tensorflow::protobuf namespace.
+//
+// TensorFlow code should use the ::tensorflow::protobuf namespace to refer
+// to all protobuf APIs.
+
+#include "tensorflow/core/platform/port.h"
+#if defined(PLATFORM_GOOGLE)
+#include "tensorflow/core/platform/google/protobuf.h"
+#elif defined(PLATFORM_GOOGLE_ANDROID)
+#include "tensorflow/core/platform/google/protobuf_android.h"
+#else
+#include "tensorflow/core/platform/default/protobuf.h"
+#endif
+
+namespace tensorflow {
+// Parses a protocol buffer contained in a string in the binary wire format.
+// Returns true on success. Note: Unlike protobuf's builtin ParseFromString,
+// this function has no size restrictions on the total size of the encoded
+// protocol buffer.
+bool ParseProtoUnlimited(protobuf::Message* proto, const string& serialized);
+bool ParseProtoUnlimited(protobuf::Message* proto, const void* serialized,
+ size_t size);
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_PROTOBUF_H_
diff --git a/tensorflow/core/platform/protobuf_util.cc b/tensorflow/core/platform/protobuf_util.cc
new file mode 100644
index 0000000000..b698d3f0c2
--- /dev/null
+++ b/tensorflow/core/platform/protobuf_util.cc
@@ -0,0 +1,17 @@
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+bool ParseProtoUnlimited(protobuf::Message* proto, const string& serialized) {
+ return ParseProtoUnlimited(proto, serialized.data(), serialized.size());
+}
+
+bool ParseProtoUnlimited(protobuf::Message* proto, const void* serialized,
+ size_t size) {
+ protobuf::io::CodedInputStream coded_stream(
+ reinterpret_cast<const uint8*>(serialized), size);
+ coded_stream.SetTotalBytesLimit(INT_MAX, INT_MAX);
+ return proto->ParseFromCodedStream(&coded_stream);
+}
+
+} // namespace tensorflow
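Usage is a drop-in replacement for ParseFromString when the message may exceed protobuf's default 64MB limit. A sketch (MyProto is a stand-in for any generated Message type):

    #include "tensorflow/core/platform/protobuf.h"

    bool ParseLarge(const tensorflow::string& serialized, MyProto* proto) {
      // Like proto->ParseFromString(serialized), but with the coded stream's
      // total-bytes limit raised to INT_MAX.
      return tensorflow::ParseProtoUnlimited(proto, serialized);
    }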
diff --git a/tensorflow/core/platform/regexp.h b/tensorflow/core/platform/regexp.h
new file mode 100644
index 0000000000..ef46a7aca5
--- /dev/null
+++ b/tensorflow/core/platform/regexp.h
@@ -0,0 +1,33 @@
+#ifndef TENSORFLOW_PLATFORM_REGEXP_H_
+#define TENSORFLOW_PLATFORM_REGEXP_H_
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_GOOGLE_ANDROID)
+#include "third_party/re2/re2.h"
+namespace tensorflow {
+typedef ::StringPiece RegexpStringPiece;
+} // namespace tensorflow
+
+#else
+
+#include "external/re2/re2/re2.h"
+namespace tensorflow {
+typedef re2::StringPiece RegexpStringPiece;
+} // namespace tensorflow
+
+#endif
+
+namespace tensorflow {
+
+// Conversion to/from the appropriate StringPiece type for using in RE2
+inline RegexpStringPiece ToRegexpStringPiece(tensorflow::StringPiece sp) {
+ return RegexpStringPiece(sp.data(), sp.size());
+}
+inline tensorflow::StringPiece FromRegexpStringPiece(RegexpStringPiece sp) {
+ return tensorflow::StringPiece(sp.data(), sp.size());
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_REGEXP_H_
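For example, matching a tensorflow::StringPiece against an RE2 pattern goes through the conversion helper (sketch; the function and the pattern shown are purely illustrative):

    #include "tensorflow/core/lib/core/stringpiece.h"
    #include "tensorflow/core/platform/regexp.h"

    bool LooksLikeDeviceName(tensorflow::StringPiece s) {
      // RE2 expects RegexpStringPiece, so convert before matching.
      return RE2::FullMatch(tensorflow::ToRegexpStringPiece(s),
                            "/job:[^/]+/replica:\\d+/task:\\d+/(cpu|gpu):\\d+");
    }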
diff --git a/tensorflow/core/platform/stream_executor_util.h b/tensorflow/core/platform/stream_executor_util.h
new file mode 100644
index 0000000000..a6640fb26d
--- /dev/null
+++ b/tensorflow/core/platform/stream_executor_util.h
@@ -0,0 +1,12 @@
+#ifndef TENSORFLOW_PLATFORM_STREAM_EXECUTOR_UTIL_H_
+#define TENSORFLOW_PLATFORM_STREAM_EXECUTOR_UTIL_H_
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE)
+#include "tensorflow/core/platform/google/stream_executor_util.h"
+#else
+#include "tensorflow/core/platform/default/stream_executor_util.h"
+#endif
+
+#endif // TENSORFLOW_PLATFORM_STREAM_EXECUTOR_UTIL_H_
diff --git a/tensorflow/core/platform/tensor_coding.cc b/tensorflow/core/platform/tensor_coding.cc
new file mode 100644
index 0000000000..a5cbd0ab44
--- /dev/null
+++ b/tensorflow/core/platform/tensor_coding.cc
@@ -0,0 +1,53 @@
+#include "tensorflow/core/platform/tensor_coding.h"
+
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+namespace port {
+
+void AssignRefCounted(StringPiece src, core::RefCounted* obj, string* out) {
+ out->assign(src.data(), src.size());
+}
+
+void EncodeStringList(const string* strings, int64 n, string* out) {
+ out->clear();
+ for (int i = 0; i < n; ++i) {
+ core::PutVarint32(out, strings[i].size());
+ }
+ for (int i = 0; i < n; ++i) {
+ out->append(strings[i]);
+ }
+}
+
+bool DecodeStringList(const string& src, string* strings, int64 n) {
+ std::vector<uint32> sizes(n);
+ StringPiece reader(src);
+ int64 tot = 0;
+ for (auto& v : sizes) {
+ if (!core::GetVarint32(&reader, &v)) return false;
+ tot += v;
+ }
+ if (tot != static_cast<int64>(reader.size())) {
+ return false;
+ }
+
+ string* data = strings;
+ for (int64 i = 0; i < n; ++i, ++data) {
+ auto size = sizes[i];
+ if (size > reader.size()) {
+ return false;
+ }
+ data->assign(reader.data(), size);
+ reader.remove_prefix(size);
+ }
+
+ return true;
+}
+
+void CopyFromArray(string* s, const char* base, size_t bytes) {
+ s->assign(base, bytes);
+}
+
+} // namespace port
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/tensor_coding.h b/tensorflow/core/platform/tensor_coding.h
new file mode 100644
index 0000000000..6bb9991895
--- /dev/null
+++ b/tensorflow/core/platform/tensor_coding.h
@@ -0,0 +1,40 @@
+// Helper routines for encoding/decoding tensor contents.
+#ifndef TENSORFLOW_PLATFORM_TENSOR_CODING_H_
+#define TENSORFLOW_PLATFORM_TENSOR_CODING_H_
+
+#include <string>
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+
+#ifdef PLATFORM_GOOGLE
+#include "tensorflow/core/platform/google/cord_coding.h"
+#endif
+
+namespace tensorflow {
+namespace port {
+
+// Store src contents in *out. If backing memory for src is shared with *out,
+// will ref obj during the call and will arrange to unref obj when no
+// longer needed.
+void AssignRefCounted(StringPiece src, core::RefCounted* obj, string* out);
+
+// Copy contents of src to dst[0,src.size()-1].
+inline void CopyToArray(const string& src, char* dst) {
+ memcpy(dst, src.data(), src.size());
+}
+
+// Store encoding of strings[0..n-1] in *out.
+void EncodeStringList(const string* strings, int64 n, string* out);
+
+// Decode n strings from src and store in strings[0..n-1].
+// Returns true if successful, false on parse error.
+bool DecodeStringList(const string& src, string* strings, int64 n);
+
+// Assigns base[0..bytes-1] to *s
+void CopyFromArray(string* s, const char* base, size_t bytes);
+
+} // namespace port
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_TENSOR_CODING_H_
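A round trip through the string-list helpers (illustrative sketch, not part of the diff):

    #include "tensorflow/core/platform/logging.h"
    #include "tensorflow/core/platform/tensor_coding.h"

    void StringListRoundTrip() {
      const tensorflow::string in[3] = {"a", "", "hello"};
      tensorflow::string encoded;
      tensorflow::port::EncodeStringList(in, 3, &encoded);

      tensorflow::string out[3];
      CHECK(tensorflow::port::DecodeStringList(encoded, out, 3));
      for (int i = 0; i < 3; ++i) CHECK_EQ(in[i], out[i]);
    }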
diff --git a/tensorflow/core/platform/test.cc b/tensorflow/core/platform/test.cc
new file mode 100644
index 0000000000..21c6905683
--- /dev/null
+++ b/tensorflow/core/platform/test.cc
@@ -0,0 +1,39 @@
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_POSIX_ANDROID) || \
+ defined(PLATFORM_GOOGLE_ANDROID)
+#include "testing/base/public/googletest.h"
+#endif
+
+namespace tensorflow {
+namespace testing {
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_POSIX_ANDROID) || \
+ defined(PLATFORM_GOOGLE_ANDROID)
+string TmpDir() { return FLAGS_test_tmpdir; }
+int RandomSeed() { return FLAGS_test_random_seed; }
+#else
+string TmpDir() {
+ // 'bazel test' sets TEST_TMPDIR
+ const char* env = getenv("TEST_TMPDIR");
+ if (env && env[0] != '\0') {
+ return env;
+ }
+ env = getenv("TMPDIR");
+ if (env && env[0] != '\0') {
+ return env;
+ }
+ return "/tmp";
+}
+int RandomSeed() {
+ const char* env = getenv("TEST_RANDOM_SEED");
+ int result;
+ if (env && sscanf(env, "%d", &result) == 1) {
+ return result;
+ }
+ return 301;
+}
+#endif
+
+} // namespace testing
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/test.h b/tensorflow/core/platform/test.h
new file mode 100644
index 0000000000..ea16fe1442
--- /dev/null
+++ b/tensorflow/core/platform/test.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_PLATFORM_TEST_H_
+#define TENSORFLOW_PLATFORM_TEST_H_
+
+namespace tensorflow {
+namespace testing {
+
+// Return a temporary directory suitable for temporary testing files.
+string TmpDir();
+
+// Return a random number generator seed to use in randomized tests.
+// Returns the same value for the lifetime of the process.
+int RandomSeed();
+
+} // namespace testing
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_TEST_H_
diff --git a/tensorflow/core/platform/test_benchmark.h b/tensorflow/core/platform/test_benchmark.h
new file mode 100644
index 0000000000..8c8a92a519
--- /dev/null
+++ b/tensorflow/core/platform/test_benchmark.h
@@ -0,0 +1,58 @@
+// Simple benchmarking facility.
+#ifndef TENSORFLOW_PLATFORM_TEST_BENCHMARK_H_
+#define TENSORFLOW_PLATFORM_TEST_BENCHMARK_H_
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE)
+#include "testing/base/public/benchmark.h"
+
+#else
+#define BENCHMARK(n) \
+ static ::tensorflow::testing::Benchmark* TF_BENCHMARK_CONCAT( \
+ __benchmark_, n, __LINE__) TF_ATTRIBUTE_UNUSED = \
+ (new ::tensorflow::testing::Benchmark(#n, (n)))
+#define TF_BENCHMARK_CONCAT(a, b, c) TF_BENCHMARK_CONCAT2(a, b, c)
+#define TF_BENCHMARK_CONCAT2(a, b, c) a##b##c
+
+#endif // PLATFORM_GOOGLE
+
+namespace tensorflow {
+namespace testing {
+
+#if defined(PLATFORM_GOOGLE)
+using ::testing::Benchmark;
+#else
+class Benchmark {
+ public:
+ Benchmark(const char* name, void (*fn)(int));
+ Benchmark(const char* name, void (*fn)(int, int));
+
+ Benchmark* Arg(int x);
+ Benchmark* Range(int lo, int hi);
+ static void Run(const char* pattern);
+
+ private:
+ string name_;
+ int num_args_;
+ std::vector<int> args_;
+ void (*fn0_)(int) = nullptr;
+ void (*fn1_)(int, int) = nullptr;
+
+ void Register();
+ void Run(int arg, int* run_count, double* run_seconds);
+};
+#endif
+
+void RunBenchmarks();
+void SetLabel(const std::string& label);
+void BytesProcessed(int64);
+void ItemsProcessed(int64);
+void StartTiming();
+void StopTiming();
+void UseRealTime();
+
+} // namespace testing
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PLATFORM_TEST_BENCHMARK_H_
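Registration happens through the BENCHMARK macro above; the two-argument form plus Arg()/Range() sweeps over a parameter such as a buffer size. A sketch with a hypothetical BM_Fill:

    #include <algorithm>
    #include <vector>
    #include "tensorflow/core/platform/test_benchmark.h"

    namespace tensorflow {

    // Hypothetical benchmark: fill a buffer of "size" bytes, "iters" times.
    static void BM_Fill(int iters, int size) {
      std::vector<char> buf(size);
      for (int i = 0; i < iters; ++i) {
        std::fill(buf.begin(), buf.end(), 'x');
      }
      testing::ItemsProcessed(static_cast<int64>(iters) * size);
    }
    BENCHMARK(BM_Fill)->Arg(64)->Range(1 << 10, 1 << 20);

    }  // namespace tensorflow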
diff --git a/tensorflow/core/platform/test_main.cc b/tensorflow/core/platform/test_main.cc
new file mode 100644
index 0000000000..11230c3f7b
--- /dev/null
+++ b/tensorflow/core/platform/test_main.cc
@@ -0,0 +1,31 @@
+// A program with a main that is suitable for unittests, including those
+// that also define microbenchmarks. Based on whether the user specified
+// the --benchmarks flag, which selects the benchmarks to run,
+// we will either run benchmarks or run the gtest tests in the program.
+
+#include <iostream>
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_POSIX_ANDROID) || \
+ defined(PLATFORM_GOOGLE_ANDROID)
+// main() is supplied by gunit_main
+#else
+#include "gtest/gtest.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+GTEST_API_ int main(int argc, char** argv) {
+ std::cout << "Running main() from test_main.cc\n";
+
+ testing::InitGoogleTest(&argc, argv);
+ for (int i = 1; i < argc; i++) {
+ if (tensorflow::StringPiece(argv[i]).starts_with("--benchmarks=")) {
+ const char* pattern = argv[i] + strlen("--benchmarks=");
+ tensorflow::testing::Benchmark::Run(pattern);
+ return 0;
+ }
+ }
+ return RUN_ALL_TESTS();
+}
+#endif
diff --git a/tensorflow/core/platform/thread_annotations.h b/tensorflow/core/platform/thread_annotations.h
new file mode 100644
index 0000000000..cb8040eed6
--- /dev/null
+++ b/tensorflow/core/platform/thread_annotations.h
@@ -0,0 +1,14 @@
+#ifndef TENSORFLOW_PLATFORM_THREAD_ANNOTATIONS_H_
+#define TENSORFLOW_PLATFORM_THREAD_ANNOTATIONS_H_
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE) || defined(PLATFORM_GOOGLE_ANDROID)
+#include "base/thread_annotations.h"
+#elif defined(PLATFORM_POSIX) || defined(PLATFORM_POSIX_ANDROID)
+#include "tensorflow/core/platform/default/thread_annotations.h"
+#else
+#error Define the appropriate PLATFORM_<foo> macro for this platform
+#endif
+
+#endif // TENSORFLOW_PLATFORM_THREAD_ANNOTATIONS_H_
diff --git a/tensorflow/core/platform/tracing.cc b/tensorflow/core/platform/tracing.cc
new file mode 100644
index 0000000000..a4cb92dee4
--- /dev/null
+++ b/tensorflow/core/platform/tracing.cc
@@ -0,0 +1,135 @@
+#include "tensorflow/core/platform/tracing.h"
+
+#include <atomic>
+#include <map>
+#include <string>
+#include "tensorflow/core/framework/step_stats.pb.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+StepStatsCollector::StepStatsCollector(StepStats* ss) : step_stats_(ss) {}
+
+void StepStatsCollector::Save(const string& device, NodeExecStats* nt) {
+ VLOG(1) << "Save dev " << device << " nt " << nt;
+ {
+ mutex_lock l(mu_);
+ DeviceStepStats* dss = nullptr;
+ // Slow linear scan, but it should only be called
+ // by a Worker in a context with < ~10 devices.
+ // TODO(tucker): consider adding a std::unordered_map.
+ for (auto& ds : *step_stats_->mutable_dev_stats()) {
+ if (ds.device() == device) {
+ dss = &ds;
+ break;
+ }
+ }
+ if (dss == nullptr) {
+ dss = step_stats_->add_dev_stats();
+ dss->set_device(device);
+ }
+ nt->Swap(dss->add_node_stats());
+ }
+ delete nt;
+}
+
+void StepStatsCollector::Swap(StepStats* ss) {
+ mutex_lock l(mu_);
+ CHECK(step_stats_);
+ ss->Swap(step_stats_);
+}
+
+namespace port {
+
+int32 Tracing::category_id_[kEventCategoryMax];
+uint64 Tracing::event_mask_ = 0;
+std::map<string, int32>* Tracing::name_map_ = new std::map<string, int32>;
+
+// This needs to be kept in sync with the EventCategory enumeration.
+const char* Tracing::EventCategoryString(EventCategory category) {
+ switch (category) {
+ case EventCategory::kScheduleClosure:
+ return "ScheduleClosure";
+ case EventCategory::kRunClosure:
+ return "RunClosure";
+ case EventCategory::kCompute:
+ return "Compute";
+ case EventCategory::kEventCategoryMax:
+ return "EventCategoryMax";
+ }
+ return "Unknown";
+}
+
+// This function allows the user to specify arbitrary subsets of the
+// supported Threadscape events and activities.
+bool Tracing::ParseEventMask(const char* flagname, const string& value) {
+ VLOG(1) << flagname << " set to " << value;
+ int64 new_mask = 0;
+ std::vector<string> events =
+ str_util::Split(value, ',', str_util::SkipEmpty());
+ for (string name : events) {
+ bool clear = false;
+ int64 mask = 0;
+ if (name[0] == '!') {
+ // invert the sense of the flag
+ clear = true;
+ name = name.substr(1);
+ }
+ if (name == "ALL") {
+ mask = ~0;
+ } else {
+ auto it = name_map_->find(name);
+ int32 id;
+ if (it == name_map_->end()) {
+ id = -1;
+ } else {
+ id = it->second;
+ }
+ if (id < 0) {
+ LOG(ERROR) << "Can't parse event mask name " << name;
+ return false;
+ }
+ mask = 1 << id;
+ }
+ if (clear) {
+ new_mask &= ~mask;
+ } else {
+ new_mask |= mask;
+ }
+ }
+ // parsing was successful; set the permanent event mask
+ event_mask_ = new_mask;
+ return true;
+}
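
To make the accepted syntax concrete, an editorial walkthrough of what a few flag values would produce, assuming the categories are registered under their EventCategoryString names (so "Compute" maps to id 2):

```c++
// value = "Compute"       -> event_mask_ == (1 << 2)
// value = "ALL"           -> event_mask_ == ~0          (all categories)
// value = "ALL,!Compute"  -> event_mask_ == ~0 & ~(1 << 2)
// value = "NoSuchEvent"   -> returns false; event_mask_ is left unchanged
```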
+
+static std::atomic<Tracing::Engine*> tracing_engine;
+
+void Tracing::RegisterEngine(Engine* e) {
+ tracing_engine.store(e, std::memory_order_release);
+}
+
+static Tracing::Engine* engine() {
+ return tracing_engine.load(std::memory_order_acquire);
+}
+
+Tracing::Engine::~Engine() {}
+Tracing::Engine::Annotation::~Annotation() {}
+Tracing::Engine::Tracer::~Tracer() {}
+
+Tracing::ScopedAnnotation::ScopedAnnotation(StringPiece name) {
+ auto e = engine();
+ if (e) {
+ annotation_.reset(e->PushAnnotation(name));
+ }
+}
+
+Tracing::TraceMe::TraceMe(StringPiece name) {
+ auto e = engine();
+ if (e) {
+ tracer_.reset(e->StartTracing(name));
+ }
+}
+
+} // namespace port
+} // namespace tensorflow
diff --git a/tensorflow/core/platform/tracing.h b/tensorflow/core/platform/tracing.h
new file mode 100644
index 0000000000..2b53a64cf1
--- /dev/null
+++ b/tensorflow/core/platform/tracing.h
@@ -0,0 +1,205 @@
+#ifndef TENSORFLOW_PLATFORM_TRACING_H_
+#define TENSORFLOW_PLATFORM_TRACING_H_
+
+// Tracing interface
+
+#include <map>
+#include <memory>
+
+#include "tensorflow/core/platform/port.h" // Must be first
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+
+namespace tensorflow {
+
+class NodeExecStats;
+class StepStats;
+
+class StepStatsCollector {
+ public:
+ explicit StepStatsCollector(StepStats* ss);
+
+ void Save(const string& device, NodeExecStats* nt);
+
+ void Swap(StepStats* ss);
+
+ private:
+ friend class StepStatsMgr;
+ mutex mu_;
+ StepStats* step_stats_ GUARDED_BY(mu_);
+};
+
+namespace port {
+
+class Tracing {
+ public:
+ // This enumeration contains the identifiers of all TensorFlow
+ // threadscape events and code regions. Threadscape assigns its
+ // own identifiers at runtime when we register our events and we
+ // cannot know in advance what IDs it will choose. The "RecordEvent"
+ // method and "ScopedActivity" use these event IDs for consistency
+ // and remap them to threadscape IDs at runtime. This enum is limited
+ // to 64 values since we use a bitmask to configure which events are
+ // enabled. It must also be kept in step with the code in
+ // "Tracing::EventCategoryString".
+ enum EventCategory {
+ kScheduleClosure = 0,
+ kRunClosure = 1,
+ kCompute = 2,
+ kEventCategoryMax = 3 // sentinel - keep last
+ };
+ // Note: We currently only support up to 64 categories.
+ static_assert(kEventCategoryMax <= 64, "only support up to 64 events");
+
+ // Called by main programs to initialize tracing facilities
+ static void Initialize();
+
+ // Return the pathname of the directory where we are writing log files.
+ static const char* LogDir();
+
+ // Returns a non-zero identifier which can be used to correlate
+ // related events.
+ static inline uint64 UniqueId();
+
+ // Returns true if a trace is in progress. Can be used to reduce tracing
+ // overheads in fast-path code.
+ static inline bool IsActive();
+
+ // Associate name with the current thread.
+ static void RegisterCurrentThread(const char* name);
+
+ // Posts an event with the supplied category and arg.
+ static void RecordEvent(EventCategory category, uint64 arg);
+
+ // Traces a region of code. Posts a tracing "EnterCodeRegion" event
+ // when created and an "ExitCodeRegion" event when destroyed.
+ class ScopedActivity {
+ public:
+ explicit ScopedActivity(EventCategory category, uint64 arg);
+ ~ScopedActivity();
+
+ private:
+ const bool enabled_;
+ const int32 region_id_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(ScopedActivity);
+ };
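
A short editorial sketch of how these two entry points might be used; the step id argument is purely illustrative:

```c++
#include "tensorflow/core/platform/tracing.h"

void TracedStep(tensorflow::uint64 step_id) {
  using tensorflow::port::Tracing;

  // One-off event with an arbitrary correlation value.
  Tracing::RecordEvent(Tracing::kScheduleClosure, step_id);

  // Posts "EnterCodeRegion" now and "ExitCodeRegion" when the scope ends.
  Tracing::ScopedActivity activity(Tracing::kCompute, step_id);
  // ... the work being traced ...
}
```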
+
+ // Trace collection engine can be registered with this module.
+ // If no engine is registered, ScopedAnnotation and TraceMe are no-ops.
+ class Engine;
+ static void RegisterEngine(Engine*);
+
+ // Forward declaration of the GPU utility classes.
+ class ScopedAnnotation;
+ class TraceMe;
+
+ private:
+ friend class TracingTest;
+
+ static void RegisterEvent(EventCategory id, const char* name);
+ static const char* EventCategoryString(EventCategory category);
+
+ //
+ // Parses event mask expressions in 'value' of the form:
+ // expr ::= <term> (,<term>)*
+ // term ::= <event> | "!" <event>
+ // event ::= "ALL" | <wait_event> | <other_event>
+ // wait_event ::= "ENewSession" | "ECloseSession" | ...
+ // other_event ::= "Send" | "Wait" | ...
+ // ALL denotes all events, <event> turns on tracing for this event, and
+ // !<event> turns off tracing for this event.
+ // If the expression can be parsed correctly it returns true and sets
+ // the event_mask_. Otherwise it returns false and the event_mask_ is left
+ // unchanged.
+ static bool ParseEventMask(const char* flagname, const string& value);
+
+ // Bit mask of enabled trace categories.
+ static uint64 event_mask_;
+
+ // Records the mappings between Threadscape IDs and the "EventCategory" enum.
+ static int32 category_id_[kEventCategoryMax];
+ static std::map<string, int32>* name_map_;
+};
+
+// Trace collection engine that actually implements collection.
+class Tracing::Engine {
+ public:
+ Engine() {}
+ virtual ~Engine();
+
+ // Represents an active annotation.
+ class Annotation {
+ public:
+ Annotation() {}
+ virtual ~Annotation();
+ };
+
+ // Represents an active trace.
+ class Tracer {
+ public:
+ Tracer() {}
+ virtual ~Tracer();
+ };
+
+ private:
+ friend class ScopedAnnotation;
+ friend class TraceMe;
+
+ // Register the specified name as an annotation on the current thread.
+ // Caller should delete the result to remove the annotation.
+ // Annotations from the same thread are destroyed in a LIFO manner.
+ // May return nullptr if annotations are not supported.
+ virtual Annotation* PushAnnotation(StringPiece name) = 0;
+
+ // Start tracing under the specified label. Caller should delete the
+ // result to stop tracing.
+ // May return nullptr if tracing is not supported.
+ virtual Tracer* StartTracing(StringPiece label) = 0;
+};
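
A minimal editorial sketch of a collection engine, assuming a logging-only backend; the LoggingEngine name and its behavior are illustrative, not part of this commit. Once an engine is registered, ScopedAnnotation and TraceMe stop being no-ops:

```c++
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/tracing.h"

class LoggingEngine : public tensorflow::port::Tracing::Engine {
 private:
  // Overriding is permitted even though the base virtuals are private.
  Annotation* PushAnnotation(tensorflow::StringPiece name) override {
    LOG(INFO) << "annotation begin: " << name;
    return new Annotation;  // deleted when the ScopedAnnotation dies
  }
  Tracer* StartTracing(tensorflow::StringPiece label) override {
    LOG(INFO) << "trace begin: " << label;
    return new Tracer;  // deleted when the TraceMe dies
  }
};

void InstallTracing() {
  // Typically done once at startup.
  tensorflow::port::Tracing::RegisterEngine(new LoggingEngine);
}
```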
+
+// This class permits a user to apply annotation on kernels and memcpys
+// when launching them. While an annotation is in scope, all activities
+// within that scope get their names replaced by the annotation. The kernel
+// name replacement is done when constructing the protobuf for sending out to
+// a client (e.g., the stubby requestor) for both API and Activity records.
+//
+// Ownership: The creator of ScopedAnnotation assumes ownership of the object.
+//
+// Usage: {
+// ScopedAnnotation annotation("first set of kernels");
+// Kernel1<<<x,y>>>;
+// LaunchKernel2(); // Which eventually launches a cuda kernel.
+// }
+// In the above scenario, the GPUProf UI would show 2 kernels with the name
+// "first set of kernels" executing -- they will appear as the same kernel.
+class Tracing::ScopedAnnotation {
+ public:
+ explicit ScopedAnnotation(StringPiece name);
+
+ private:
+ std::unique_ptr<Engine::Annotation> annotation_;
+};
+
+// TODO(opensource): clean up the scoped classes for GPU tracing.
+// This class permits user-specified (CPU) tracing activities. A trace
+// activity is started when an object of this class is created and stopped
+// when the object is destroyed.
+class Tracing::TraceMe {
+ public:
+ explicit TraceMe(StringPiece name);
+
+ private:
+ std::unique_ptr<Engine::Tracer> tracer_;
+};
+
+} // namespace port
+} // namespace tensorflow
+
+#if defined(PLATFORM_GOOGLE) && !defined(ANDROID) && !defined(__ANDROID__)
+#include "tensorflow/core/platform/google/tracing_impl.h"
+#else
+#include "tensorflow/core/platform/default/tracing_impl.h"
+#endif
+
+#endif // TENSORFLOW_PLATFORM_TRACING_H_
diff --git a/tensorflow/core/public/README.md b/tensorflow/core/public/README.md
new file mode 100644
index 0000000000..b1afff87de
--- /dev/null
+++ b/tensorflow/core/public/README.md
@@ -0,0 +1,90 @@
+# TensorFlow
+
+TensorFlow is a computational dataflow graph library.
+
+## Getting started
+
+
+### Python API example
+The following is example Python code that performs a simple matrix multiply
+of two constants and fetches the result from a locally-running TensorFlow
+process.
+
+First, bring in the following dependency:
+
+//third_party/tensorflow/core/public:tensorflow_py
+
+to get the Python TensorFlow API. If you intend to run TensorFlow within
+the same process, link in the following to the same binary:
+
+//third_party/tensorflow/core/public:tensorflow_std_ops
+
+to get the standard set of op implementations. Then:
+
+```python
+import tensorflow as tf
+
+with tf.Session("local"):
+ input1 = tf.Constant(1.0, shape=[1, 1], name="input1")
+ input2 = tf.Constant(2.0, shape=[1, 1], name="input2")
+ output = tf.MatMul(input1, input2)
+
+ # Run graph and fetch the output
+ result = output.eval()
+ print(result)
+```
+
+### C++ API Example
+
+If you are running TensorFlow locally, link your binary with
+
+//third_party/tensorflow/core/public:tensorflow_local
+
+and link in the operation implementations you want supported, e.g.,
+
+//third_party/tensorflow/core/public:tensorflow_std_ops
+
+An example program that takes a GraphDef and runs it using the
+C++ Session API:
+
+```c++
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+
+int main(int argc, char** argv) {
+ // Construct your graph.
+ tensorflow::GraphDef graph = ...;
+
+ // Create a Session running TensorFlow locally in process.
+ std::unique_ptr<tensorflow::Session> session(tensorflow::NewSession({}));
+
+ // Initialize the session with the graph.
+ tensorflow::Status s = session->Create(graph);
+ if (!s.ok()) { ... }
+
+ // Specify the 'feeds' of your network if needed.
+ std::vector<std::pair<std::string, tensorflow::Tensor>> inputs;
+
+ // Run the session, asking for the first output of "my_output".
+ std::vector<tensorflow::Tensor> outputs;
+ s = session->Run(inputs, {"my_output:0"}, {}, &outputs);
+ if (!s.ok()) { ... }
+
+ // Do something with your outputs
+ auto output_vector = outputs[0].vec<float>();
+ if (output_vector(0) > 0.5) { ... }
+
+ // Close the session.
+ session->Close();
+
+ return 0;
+}
+```
+
+For a more fully-featured C++ example, see
+`tensorflow/cc/tutorials/example_trainer.cc`
diff --git a/tensorflow/core/public/env.h b/tensorflow/core/public/env.h
new file mode 100644
index 0000000000..4024525859
--- /dev/null
+++ b/tensorflow/core/public/env.h
@@ -0,0 +1,273 @@
+#ifndef TENSORFLOW_PUBLIC_ENV_H_
+#define TENSORFLOW_PUBLIC_ENV_H_
+
+#include <string>
+#include <vector>
+#include <stdint.h>
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+class RandomAccessFile;
+class Thread;
+class ThreadOptions;
+class WritableFile;
+
+/// \brief An interface used by the tensorflow implementation to
+/// access operating system functionality like the filesystem etc.
+///
+/// Callers may wish to provide a custom Env object to get fine grain
+/// control.
+///
+/// All Env implementations are safe for concurrent access from
+/// multiple threads without any external synchronization.
+class Env {
+ public:
+ Env() {}
+ virtual ~Env();
+
+ /// \brief Returns a default environment suitable for the current operating
+ /// system.
+ ///
+ /// Sophisticated users may wish to provide their own Env
+ /// implementation instead of relying on this default environment.
+ ///
+ /// The result of Default() belongs to this library and must never be deleted.
+ static Env* Default();
+
+ /// \brief Creates a brand new random access read-only file with the
+ /// specified name.
+ ///
+ /// On success, stores a pointer to the new file in
+ /// *result and returns OK. On failure stores NULL in *result and
+ /// returns non-OK. If the file does not exist, returns a non-OK
+ /// status.
+ ///
+ /// The returned file may be concurrently accessed by multiple threads.
+ virtual Status NewRandomAccessFile(const string& fname,
+ RandomAccessFile** result) = 0;
+
+ /// \brief Creates an object that writes to a new file with the specified
+ /// name.
+ ///
+ /// Deletes any existing file with the same name and creates a
+ /// new file. On success, stores a pointer to the new file in
+ /// *result and returns OK. On failure stores NULL in *result and
+ /// returns non-OK.
+ ///
+ /// The returned file will only be accessed by one thread at a time.
+ virtual Status NewWritableFile(const string& fname,
+ WritableFile** result) = 0;
+
+ /// \brief Creates an object that either appends to an existing file, or
+ /// writes to a new file (if the file does not exist to begin with).
+ ///
+ /// On success, stores a pointer to the new file in *result and
+ /// returns OK. On failure stores NULL in *result and returns
+ /// non-OK.
+ ///
+ /// The returned file will only be accessed by one thread at a time.
+ virtual Status NewAppendableFile(const string& fname,
+ WritableFile** result) = 0;
+
+ /// Returns true iff the named file exists.
+ virtual bool FileExists(const string& fname) = 0;
+
+ /// \brief Stores in *result the names of the children of the specified
+ /// directory. The names are relative to "dir".
+ ///
+ /// Original contents of *result are dropped.
+ virtual Status GetChildren(const string& dir,
+ std::vector<string>* result) = 0;
+
+ /// Deletes the named file.
+ virtual Status DeleteFile(const string& fname) = 0;
+
+ /// Creates the specified directory.
+ virtual Status CreateDir(const string& dirname) = 0;
+
+ /// Deletes the specified directory.
+ virtual Status DeleteDir(const string& dirname) = 0;
+
+ /// Stores the size of fname in *file_size.
+ virtual Status GetFileSize(const string& fname, uint64* file_size) = 0;
+
+ /// \brief Renames file src to target. If target already exists, it will be
+ /// replaced.
+ virtual Status RenameFile(const string& src, const string& target) = 0;
+
+ // TODO(jeff,sanjay): Add back thread/thread-pool support if needed.
+ // TODO(jeff,sanjay): if needed, tighten spec so relative to epoch, or
+ // provide a routine to get the absolute time.
+
+ /// \brief Returns the number of micro-seconds since some fixed point in
+ /// time. Only useful for computing deltas of time.
+ virtual uint64 NowMicros() = 0;
+
+ /// Sleeps/delays the thread for the prescribed number of micro-seconds.
+ virtual void SleepForMicroseconds(int micros) = 0;
+
+ /// \brief Returns a new thread that is running fn() and is identified
+ /// (for debugging/performance-analysis) by "name".
+ ///
+ /// Caller takes ownership of the result and must delete it eventually
+ /// (the deletion will block until fn() stops running).
+ virtual Thread* StartThread(const ThreadOptions& thread_options,
+ const string& name,
+ std::function<void()> fn) TF_MUST_USE_RESULT = 0;
+
+ private:
+ /// No copying allowed
+ Env(const Env&);
+ void operator=(const Env&);
+};
+
+/// A file abstraction for randomly reading the contents of a file.
+class RandomAccessFile {
+ public:
+ RandomAccessFile() {}
+ virtual ~RandomAccessFile();
+
+ /// \brief Reads up to "n" bytes from the file starting at "offset".
+ ///
+ /// "scratch[0..n-1]" may be written by this routine. Sets "*result"
+ /// to the data that was read (including if fewer than "n" bytes were
+ /// successfully read). May set "*result" to point at data in
+ /// "scratch[0..n-1]", so "scratch[0..n-1]" must be live when
+ /// "*result" is used.
+ ///
+ /// On OK returned status: "n" bytes have been stored in "*result".
+ /// On non-OK returned status: [0..n] bytes have been stored in "*result".
+ ///
+ /// Returns OUT_OF_RANGE if fewer than n bytes were stored in "*result"
+ /// because of EOF.
+ ///
+ /// Safe for concurrent use by multiple threads.
+ virtual Status Read(uint64 offset, size_t n, StringPiece* result,
+ char* scratch) const = 0;
+
+ private:
+ /// No copying allowed
+ RandomAccessFile(const RandomAccessFile&);
+ void operator=(const RandomAccessFile&);
+};
+
+/// \brief A file abstraction for sequential writing.
+///
+/// The implementation must provide buffering since callers may append
+/// small fragments at a time to the file.
+class WritableFile {
+ public:
+ WritableFile() {}
+ virtual ~WritableFile();
+
+ virtual Status Append(const StringPiece& data) = 0;
+ virtual Status Close() = 0;
+ virtual Status Flush() = 0;
+ virtual Status Sync() = 0;
+
+ private:
+ /// No copying allowed
+ WritableFile(const WritableFile&);
+ void operator=(const WritableFile&);
+};
+
+/// \brief An implementation of Env that forwards all calls to another Env.
+///
+/// May be useful to clients who wish to override just part of the
+/// functionality of another Env.
+class EnvWrapper : public Env {
+ public:
+ /// Initializes an EnvWrapper that delegates all calls to *t
+ explicit EnvWrapper(Env* t) : target_(t) {}
+ virtual ~EnvWrapper();
+
+ /// Returns the target to which this Env forwards all calls
+ Env* target() const { return target_; }
+
+ // The following text is boilerplate that forwards all methods to target()
+ Status NewRandomAccessFile(const string& f,
+ RandomAccessFile** r) override {
+ return target_->NewRandomAccessFile(f, r);
+ }
+ Status NewWritableFile(const string& f, WritableFile** r) override {
+ return target_->NewWritableFile(f, r);
+ }
+ Status NewAppendableFile(const string& f, WritableFile** r) override {
+ return target_->NewAppendableFile(f, r);
+ }
+ bool FileExists(const string& f) override { return target_->FileExists(f); }
+ Status GetChildren(const string& dir, std::vector<string>* r) override {
+ return target_->GetChildren(dir, r);
+ }
+ Status DeleteFile(const string& f) override {
+ return target_->DeleteFile(f);
+ }
+ Status CreateDir(const string& d) override {
+ return target_->CreateDir(d);
+ }
+ Status DeleteDir(const string& d) override {
+ return target_->DeleteDir(d);
+ }
+ Status GetFileSize(const string& f, uint64* s) override {
+ return target_->GetFileSize(f, s);
+ }
+ Status RenameFile(const string& s, const string& t) override {
+ return target_->RenameFile(s, t);
+ }
+ uint64 NowMicros() override { return target_->NowMicros(); }
+ void SleepForMicroseconds(int micros) override {
+ target_->SleepForMicroseconds(micros);
+ }
+ Thread* StartThread(const ThreadOptions& thread_options, const string& name,
+ std::function<void()> fn) override {
+ return target_->StartThread(thread_options, name, fn);
+ }
+
+ private:
+ Env* target_;
+};
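
A brief editorial sketch of the wrapper pattern described above: override only what you need and let everything else forward. The class name and the refuse-deletes behavior are illustrative assumptions:

```c++
#include <string>

#include "tensorflow/core/public/env.h"

class ReadOnlyEnv : public tensorflow::EnvWrapper {
 public:
  explicit ReadOnlyEnv(tensorflow::Env* base) : tensorflow::EnvWrapper(base) {}

  // Intercept deletes; every other call falls through to the wrapped Env.
  tensorflow::Status DeleteFile(const std::string& fname) override {
    return tensorflow::Status(tensorflow::error::PERMISSION_DENIED,
                              "deletes are disabled: " + fname);
  }
};

// Usage: ReadOnlyEnv env(tensorflow::Env::Default());
```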
+
+class Thread {
+ public:
+ Thread() {}
+
+ /// Blocks until the thread of control stops running.
+ virtual ~Thread();
+
+ private:
+ /// No copying allowed
+ Thread(const Thread&);
+ void operator=(const Thread&);
+};
+
+/// \brief Options to configure a Thread.
+///
+/// Note that the options are all hints, and the
+ /// underlying implementation may choose to ignore them.
+struct ThreadOptions {
+ /// Thread stack size to use (in bytes).
+ size_t stack_size = 0; // 0: use system default value
+ /// Guard area size to use near thread stacks (in bytes).
+ size_t guard_size = 0; // 0: use system default value
+};
+
+/// A utility routine: reads contents of named file into *data
+Status ReadFileToString(Env* env, const string& fname, string* data);
+
+/// A utility routine: write contents of "data" to file named "fname"
+/// (overwriting existing contents, if any).
+Status WriteStringToFile(Env* env, const string& fname,
+ const StringPiece& data);
+
+ /// Reads contents of the named file, parses it as binary-encoded proto data,
+ /// and stores it into *proto.
+Status ReadBinaryProto(Env* env, const string& fname,
+ ::tensorflow::protobuf::MessageLite* proto);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_ENV_H_
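
As a small editorial sketch of the utility routines above, this is roughly how a serialized GraphDef might be loaded through the default environment (the file path is whatever the caller supplies):

```c++
#include <string>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/public/env.h"

tensorflow::Status LoadGraphDef(const std::string& path,
                                tensorflow::GraphDef* graph_def) {
  // GraphDef derives from protobuf MessageLite, so it can be read directly.
  return tensorflow::ReadBinaryProto(tensorflow::Env::Default(), path,
                                     graph_def);
}
```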
diff --git a/tensorflow/core/public/session.h b/tensorflow/core/public/session.h
new file mode 100644
index 0000000000..a33d5ee6ae
--- /dev/null
+++ b/tensorflow/core/public/session.h
@@ -0,0 +1,125 @@
+#ifndef TENSORFLOW_PUBLIC_SESSION_H_
+#define TENSORFLOW_PUBLIC_SESSION_H_
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/device_attributes.pb.h"
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/session_options.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+
+namespace tensorflow {
+
+/// \brief A Session instance lets a caller drive a TensorFlow graph
+/// computation.
+///
+/// When a Session is created with a given target, a new Session object
+/// is bound to the universe of resources specified by that target.
+/// Those resources are available to this session to perform
+/// computation described in the GraphDef. After extending the session
+/// with a graph, the caller uses the Run() API to perform the
+/// computation and potentially fetch outputs as Tensors.
+///
+/// Example:
+///
+/// tensorflow::GraphDef graph;
+/// // ... Create or load graph into 'graph'.
+///
+/// // This example uses the default options which connects
+/// // to a local runtime.
+/// tensorflow::SessionOptions options;
+/// std::unique_ptr<tensorflow::Session>
+/// session(tensorflow::NewSession(options));
+///
+/// // Create the session with this graph.
+/// tensorflow::Status s = session->Create(graph);
+/// if (!s.ok()) { ... }
+///
+/// // Run the graph and fetch the first output of the "output"
+/// // operation, and also run to but do not return anything
+/// // for the "update_state" operation.
+/// std::vector<tensorflow::Tensor> outputs;
+/// s = session->Run({}, {"output:0"}, {"update_state"}, &outputs);
+/// if (!s.ok()) { ... }
+///
+/// // Map the output as a flattened float tensor, and do something
+/// // with it.
+/// auto output_tensor = outputs[0].flat<float>();
+/// if (output_tensor(0) > 0.5) { ... }
+///
+/// // Close the session to release the resources associated with
+/// // this session.
+/// session->Close();
+///
+/// A Session allows concurrent calls to Run(), though a Session must
+/// be created / extended by a single thread.
+///
+/// Only one thread must call Close(), and Close() must only be called
+/// after all other calls to Run() have returned.
+class Session {
+ public:
+ /// \brief Create the graph to be used for the session.
+ ///
+ /// Returns an error if this session has already been created with a
+ /// graph. To re-use the session with a different graph, the caller
+ /// must Close() the session first.
+ virtual Status Create(const GraphDef& graph) = 0;
+
+ /// \brief Adds operations to the graph that is already registered with the
+ /// Session.
+ ///
+ /// The names of new operations in "graph" must not exist in the
+ /// graph that is already registered.
+ virtual Status Extend(const GraphDef& graph) = 0;
+
+ /// \brief Runs the graph with the provided input tensors and fills
+ /// 'outputs' for the endpoints specified in 'output_tensor_names'.
+ /// Runs to but does not return Tensors for the nodes in
+ /// 'target_node_names'.
+ ///
+ /// The order of tensors in 'outputs' will match the order provided
+ /// by 'output_tensor_names'.
+ ///
+ /// If Run returns OK(), then outputs->size() will be equal to
+ /// output_tensor_names.size(). If Run does not return OK(), the
+ /// state of outputs is undefined.
+ ///
+ /// REQUIRES: The name of each Tensor of the input or output must
+ /// match a "Tensor endpoint" in the GraphDef passed to Create().
+ ///
+ /// REQUIRES: outputs is not nullptr if output_tensor_names is non-empty.
+ virtual Status Run(const std::vector<std::pair<string, Tensor> >& inputs,
+ const std::vector<string>& output_tensor_names,
+ const std::vector<string>& target_node_names,
+ std::vector<Tensor>* outputs) = 0;
+
+ /// \brief Closes this session.
+ ///
+ /// Closing a session releases the resources used by this session
+ /// on the TensorFlow runtime (specified during session creation by
+ /// the 'SessionOptions::target' field).
+ virtual Status Close() = 0;
+
+ virtual ~Session() {}
+};
+
+/// \brief Create a new session with the given options.
+///
+/// If a new session object could not be created, this function will
+/// return nullptr.
+Session* NewSession(const SessionOptions& options);
+
+/// \brief Create a new session with the given options.
+///
+/// If session creation succeeds, the new Session will be stored in
+/// *out_session, the caller will take ownership of the returned
+/// *out_session, and this function will return OK(). Otherwise, this
+/// function will return an error status.
+Status NewSession(const SessionOptions& options, Session** out_session);
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_SESSION_H_
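
A short editorial sketch of the error-returning overload, which is usually preferable to the pointer-returning one because the failure reason is preserved:

```c++
#include <memory>

#include "tensorflow/core/public/session.h"

tensorflow::Status MakeLocalSession(
    std::unique_ptr<tensorflow::Session>* out_session) {
  tensorflow::SessionOptions options;  // empty target: local runtime
  tensorflow::Session* session = nullptr;
  tensorflow::Status s = tensorflow::NewSession(options, &session);
  if (s.ok()) {
    out_session->reset(session);
  }
  return s;
}
```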
diff --git a/tensorflow/core/public/session_options.h b/tensorflow/core/public/session_options.h
new file mode 100644
index 0000000000..11d52426ac
--- /dev/null
+++ b/tensorflow/core/public/session_options.h
@@ -0,0 +1,50 @@
+#ifndef TENSORFLOW_PUBLIC_SESSION_OPTIONS_H_
+#define TENSORFLOW_PUBLIC_SESSION_OPTIONS_H_
+
+#include <string>
+#include "tensorflow/core/framework/config.pb.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+class Env;
+
+/// Configuration information for a Session.
+struct SessionOptions {
+ /// The environment to use.
+ Env* env;
+
+ /// \brief The TensorFlow runtime to connect to.
+ ///
+ /// If 'target' is empty or unspecified, the local TensorFlow runtime
+ /// implementation will be used. Otherwise, the TensorFlow engine
+ /// defined by 'target' will be used to perform all computations.
+ ///
+ /// "target" can be either a single entry or a comma separated list
+ /// of entries. Each entry is a resolvable address of the
+ /// following format:
+ /// local
+ /// ip:port
+ /// host:port
+ /// ... other system-specific formats to identify tasks and jobs ...
+ ///
+ /// NOTE: at the moment 'local' maps to an in-process service-based
+ /// runtime.
+ ///
+ /// Upon creation, a single session affines itself to one of the
+ /// remote processes, with possible load balancing choices when the
+ /// "target" resolves to a list of possible processes.
+ ///
+ /// If the session disconnects from the remote process during its
+ /// lifetime, session calls may fail immediately.
+ string target;
+
+ /// Configuration options.
+ ConfigProto config;
+
+ SessionOptions();
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_SESSION_OPTIONS_H_
diff --git a/tensorflow/core/public/status.h b/tensorflow/core/public/status.h
new file mode 100644
index 0000000000..d0405b8876
--- /dev/null
+++ b/tensorflow/core/public/status.h
@@ -0,0 +1,96 @@
+#ifndef TENSORFLOW_PUBLIC_STATUS_H_
+#define TENSORFLOW_PUBLIC_STATUS_H_
+
+#include <iosfwd>
+#include <string>
+#include "tensorflow/core/lib/core/error_codes.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+class Status {
+ public:
+ /// Create a success status.
+ Status() : state_(NULL) {}
+ ~Status();
+
+ /// \brief Create a status with the specified error code and msg as a
+ /// human-readable string containing more detailed information.
+ Status(tensorflow::error::Code code, tensorflow::StringPiece msg);
+
+ /// Copy the specified status.
+ Status(const Status& s);
+ void operator=(const Status& s);
+
+ static Status OK() { return Status(); }
+
+ /// Returns true iff the status indicates success.
+ bool ok() const { return (state_ == NULL); }
+
+ tensorflow::error::Code code() const {
+ return ok() ? tensorflow::error::OK : state_->code;
+ }
+
+ const string& error_message() const {
+ return ok() ? empty_string() : state_->msg;
+ }
+
+ bool operator==(const Status& x) const;
+ bool operator!=(const Status& x) const;
+
+ /// \brief If "ok()", stores "new_status" into *this. If "!ok()", preserves
+ /// the current status, but may augment with additional information
+ /// about "new_status".
+ ///
+ /// Convenient way of keeping track of the first error encountered.
+ /// Instead of:
+ /// if (overall_status.ok()) overall_status = new_status
+ /// Use:
+ /// overall_status.Update(new_status);
+ void Update(const Status& new_status);
+
+ /// \brief Return a string representation of this status suitable for
+ /// printing. Returns the string "OK" for success.
+ string ToString() const;
+
+ private:
+ static const string& empty_string();
+ struct State {
+ tensorflow::error::Code code;
+ string msg;
+ };
+ /// OK status has a NULL state_. Otherwise, state_ points to
+ /// a State structure containing the error code and message(s)
+ State* state_;
+
+ void SlowCopyFrom(const State* src);
+};
+
+inline Status::Status(const Status& s)
+ : state_((s.state_ == NULL) ? NULL : new State(*s.state_)) {}
+
+inline void Status::operator=(const Status& s) {
+ /// The following condition catches both aliasing (when this == &s),
+ /// and the common case where both s and *this are ok.
+ if (state_ != s.state_) {
+ SlowCopyFrom(s.state_);
+ }
+}
+
+inline bool Status::operator==(const Status& x) const {
+ return (this->state_ == x.state_) || (ToString() == x.ToString());
+}
+
+inline bool Status::operator!=(const Status& x) const { return !(*this == x); }
+
+std::ostream& operator<<(std::ostream& os, const Status& x);
+
+typedef std::function<void(const Status&)> StatusCallback;
+
+#define TF_CHECK_OK(val) CHECK_EQ(::tensorflow::Status::OK(), (val))
+#define TF_QCHECK_OK(val) QCHECK_EQ(::tensorflow::Status::OK(), (val))
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_STATUS_H_
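
An editorial sketch of the conventions above: Update() keeps the first error across several steps, and TF_CHECK_OK turns an unexpected failure into a fatal check. DoStepOne and DoStepTwo are hypothetical helpers:

```c++
#include "tensorflow/core/public/status.h"

tensorflow::Status DoStepOne();  // hypothetical
tensorflow::Status DoStepTwo();  // hypothetical

tensorflow::Status RunAllSteps() {
  tensorflow::Status overall;   // starts out OK
  overall.Update(DoStepOne());
  overall.Update(DoStepTwo());  // the first error, if any, is kept
  return overall;
}

// At a call site where failure indicates a programming error:
//   TF_CHECK_OK(RunAllSteps());
```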
diff --git a/tensorflow/core/public/tensor.h b/tensorflow/core/public/tensor.h
new file mode 100644
index 0000000000..6c6ff0f58a
--- /dev/null
+++ b/tensorflow/core/public/tensor.h
@@ -0,0 +1,472 @@
+#ifndef TENSORFLOW_PUBLIC_TENSOR_H_
+#define TENSORFLOW_PUBLIC_TENSOR_H_
+
+#include "tensorflow/core/framework/allocation_description.pb.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/framework/tensor_description.pb.h"
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/core/refcount.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+class TensorBuffer; // Forward declaration.
+class TensorCApi;
+
+/// Represents an n-dimensional array of values.
+class Tensor {
+ public:
+ /// Default Tensor constructor. Creates a 1-dimensional, 0-element float tensor.
+ Tensor();
+
+ /// \brief Creates a Tensor of the given datatype and shape.
+ ///
+ /// The underlying buffer is allocated using a CPUAllocator.
+ Tensor(DataType type, const TensorShape& shape);
+
+ /// \brief Creates a tensor with the input datatype and shape, using the
+ /// allocator 'a' to allocate the underlying buffer.
+ ///
+ /// 'a' must outlive the lifetime of this Tensor.
+ Tensor(Allocator* a, DataType type, const TensorShape& shape);
+
+ /// Creates an uninitialized Tensor of the given data type.
+ explicit Tensor(DataType type);
+
+ Tensor(const Tensor& other); /// Copy constructor.
+
+ ~Tensor();
+
+ /// Returns the data type.
+ DataType dtype() const { return type_; }
+
+ /// Returns the shape of the tensor.
+ const TensorShape& shape() const { return shape_; }
+
+ /// \brief Convenience accessor for the tensor shape.
+ ///
+ /// For all shape accessors, see comments for relevant methods of
+ /// TensorShape in tensor_shape.h.
+ int dims() const { return shape().dims(); }
+
+ /// Convenience accessor for the tensor shape.
+ int64 dim_size(int d) const { return shape().dim_size(d); }
+
+ /// Convenience accessor for the tensor shape.
+ int64 NumElements() const { return shape().num_elements(); }
+
+ bool IsSameSize(const Tensor& b) const {
+ return shape().IsSameSize(b.shape());
+ }
+
+ /// Has this Tensor been initialized?
+ bool IsInitialized() const;
+
+ /// Returns the estimated memory usage of this tensor.
+ size_t TotalBytes() const;
+
+ /// Assign operator. This tensor shares other's underlying storage.
+ Tensor& operator=(const Tensor& other) {
+ CopyFromInternal(other, other.shape());
+ return *this;
+ }
+
+ /// \brief Copy the other tensor into this tensor and reshape it.
+ ///
+ /// This tensor shares other's underlying storage. Returns
+ /// true iff other.shape() has the same number of elements as the
+ /// given "shape".
+ bool CopyFrom(const Tensor& other,
+ const TensorShape& shape) TF_MUST_USE_RESULT {
+ if (other.NumElements() != shape.num_elements()) return false;
+ CopyFromInternal(other, shape);
+ return true;
+ }
+
+ /// \brief Slice this tensor along the 1st dimension.
+ ///
+ /// I.e., the returned
+ /// tensor satisfies returned[i, ...] == this[dim0_start + i, ...].
+ /// The returned tensor shares the underlying tensor buffer with this
+ /// tensor.
+ ///
+ /// NOTE: The returned tensor may not satisfy the same alignment
+ /// requirement as this tensor depending on the shape. The caller
+ /// must check the returned tensor's alignment before calling certain
+ /// methods that have alignment requirement (e.g., flat(), tensor()).
+ ///
+ /// REQUIRES: dims() >= 1
+ /// REQUIRES: 0 <= dim0_start <= dim0_limit <= dim_size(0)
+ Tensor Slice(int64 dim0_start, int64 dim0_limit) const;
+
+ /// \brief Parse "other' and construct the tensor.
+
+ /// Returns true iff the
+ /// parsing succeeds. If the parsing fails, the state of "*this" is
+ /// unchanged.
+ bool FromProto(const TensorProto& other) TF_MUST_USE_RESULT;
+ bool FromProto(Allocator* a, const TensorProto& other) TF_MUST_USE_RESULT;
+
+ /// \brief Fills in "proto" with "*this" tensor's content.
+ ///
+ /// AsProtoField() fills in the repeated field for proto.dtype(), while
+ /// AsProtoTensorContent() encodes the content in proto.tensor_content() in a
+ /// compact form.
+ void AsProtoField(TensorProto* proto) const;
+ void AsProtoTensorContent(TensorProto* proto) const;
+
+ /// \brief Return the Tensor data as an Eigen::Tensor with the type and
+ /// sizes of this Tensor.
+ ///
+ /// Use these methods when you know the data type and the number of
+ /// dimensions of the Tensor and you want an Eigen::Tensor
+ /// automatically sized to the Tensor sizes. The implementation CHECK-fails
+ /// if either the type or the sizes mismatch.
+ ///
+ /// Example:
+ /// typedef float T;
+ /// Tensor my_mat(...built with Shape{rows: 3, cols: 5}...);
+ /// auto mat = my_mat.matrix<T>(); // 2D Eigen::Tensor, 3 x 5.
+ /// auto mat = my_mat.tensor<T, 2>(); // 2D Eigen::Tensor, 3 x 5.
+ /// auto vec = my_mat.vec<T>(); // CHECK fails as my_mat is 2D.
+ /// auto vec = my_mat.tensor<T, 3>(); // CHECK fails as my_mat is 2D.
+ /// auto mat = my_mat.matrix<int32>();// CHECK fails as type mismatch.
+ template <typename T>
+ typename TTypes<T>::Vec vec() {
+ return tensor<T, 1>();
+ }
+
+ template <typename T>
+ typename TTypes<T>::Matrix matrix() {
+ return tensor<T, 2>();
+ }
+
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::Tensor tensor();
+
+ /// \brief Return the Tensor data as an Eigen::Tensor of the data type and a
+ /// specified shape.
+ ///
+ /// These methods allow you to access the data with the dimensions
+ /// and sizes of your choice. You do not need to know the number of
+ /// dimensions of the Tensor to call them. However, they CHECK that
+ /// the type matches and that the requested dimensions create an
+ /// Eigen::Tensor with the same number of elements as the Tensor.
+ ///
+ /// Example:
+ /// typedef float T;
+ /// Tensor my_ten(...built with Shape{planes: 4, rows: 3, cols: 5}...);
+ /// // 1D Eigen::Tensor, size 60:
+ /// auto flat = my_ten.flat<T>();
+ /// // 2D Eigen::Tensor 12 x 5:
+ /// auto inner = my_ten.flat_inner_dims<T>();
+ /// // 2D Eigen::Tensor 4 x 15:
+ /// auto outer = my_ten.shaped<T, 2>({4, 15});
+ /// // CHECK fails, bad num elements:
+ /// auto outer = my_ten.shaped<T, 2>({4, 8});
+ /// // 3D Eigen::Tensor 6 x 5 x 2:
+ /// auto weird = my_ten.shaped<T, 3>({6, 5, 2});
+ /// // CHECK fails, type mismatch:
+ /// auto bad = my_ten.flat<int32>();
+ template <typename T>
+ typename TTypes<T>::Flat flat() {
+ return shaped<T, 1>({NumElements()});
+ }
+
+ template <typename T>
+ typename TTypes<T>::UnalignedFlat unaligned_flat() {
+ return unaligned_shaped<T, 1>({NumElements()});
+ }
+
+ /// Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all
+ /// Tensor dimensions but the last one into the first dimension of the result.
+ template <typename T>
+ typename TTypes<T>::Matrix flat_inner_dims() {
+ int64 last_size = dims() > 0 ? dim_size(dims() - 1) : 1;
+ if (last_size == 0) {
+ DCHECK_EQ(NumElements(), 0);
+ // Return something empty, avoiding divide by 0
+ return shaped<T, 2>({0, 0});
+ } else {
+ return shaped<T, 2>({NumElements() / last_size, last_size});
+ }
+ }
+
+ /// Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all
+ /// Tensor dimensions but the first one into the last dimension of the result.
+ template <typename T>
+ typename TTypes<T>::Matrix flat_outer_dims() {
+ int64 first_size = dims() > 0 ? dim_size(0) : 1;
+ if (first_size == 0) {
+ DCHECK_EQ(NumElements(), 0);
+ // Return something empty, avoiding divide by 0
+ return shaped<T, 2>({0, 0});
+ } else {
+ return shaped<T, 2>({first_size, NumElements() / first_size});
+ }
+ }
+
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::Tensor shaped(gtl::ArraySlice<int64> new_sizes);
+
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::UnalignedTensor unaligned_shaped(
+ gtl::ArraySlice<int64> new_sizes);
+
+ /// \brief Return the Tensor data as a Tensor Map of fixed size 1:
+ /// TensorMap<TensorFixedSize<T, 1>>.
+ ///
+ /// Using scalar() allows the compiler to
+ /// perform optimizations as the size of the tensor is known at compile time.
+ template <typename T>
+ typename TTypes<T>::Scalar scalar();
+
+ /// Const versions of all the methods above.
+ template <typename T>
+ typename TTypes<T>::ConstVec vec() const {
+ return tensor<T, 1>();
+ }
+
+ template <typename T>
+ typename TTypes<T>::ConstMatrix matrix() const {
+ return tensor<T, 2>();
+ }
+
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::ConstTensor tensor() const;
+
+ template <typename T>
+ typename TTypes<T>::ConstFlat flat() const {
+ return shaped<T, 1>({NumElements()});
+ }
+
+ template <typename T>
+ typename TTypes<T>::UnalignedConstFlat unaligned_flat() const {
+ return unaligned_shaped<T, 1>({NumElements()});
+ }
+
+ template <typename T>
+ typename TTypes<T>::ConstMatrix flat_inner_dims() const {
+ int64 last_size = dims() > 0 ? dim_size(dims() - 1) : 1;
+ if (last_size == 0) {
+ DCHECK_EQ(NumElements(), 0);
+ // Return something empty, avoiding divide by 0
+ return shaped<T, 2>({0, 0});
+ } else {
+ return shaped<T, 2>({NumElements() / last_size, last_size});
+ }
+ }
+
+ template <typename T>
+ typename TTypes<T>::ConstMatrix flat_outer_dims() const {
+ int64 first_size = dims() > 0 ? dim_size(0) : 1;
+ if (first_size == 0) {
+ DCHECK_EQ(NumElements(), 0);
+ // Return something empty, avoiding divide by 0
+ return shaped<T, 2>({0, 0});
+ } else {
+ return shaped<T, 2>({first_size, NumElements() / first_size});
+ }
+ }
+
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::ConstTensor shaped(
+ gtl::ArraySlice<int64> new_sizes) const;
+ template <typename T, size_t NDIMS>
+ typename TTypes<T, NDIMS>::UnalignedConstTensor unaligned_shaped(
+ gtl::ArraySlice<int64> new_sizes) const;
+
+ template <typename T>
+ typename TTypes<T>::ConstScalar scalar() const;
+
+ /// Render the first max_entries values in *this into a string.
+ string SummarizeValue(int64 max_entries) const;
+
+ /// A human-readable summary of the Tensor suitable for debugging.
+ string DebugString() const;
+
+ /// Fill in the TensorDescription proto with metadata about the
+ /// Tensor that is useful for monitoring and debugging.
+ void FillDescription(TensorDescription* description) const;
+
+ /// \brief Returns a StringPiece mapping the current tensor's buffer.
+ ///
+ /// The returned StringPiece may point to a memory location on devices
+ /// that the CPU cannot address directly.
+ ///
+ /// NOTE: The underlying Tensor buffer is refcounted, so the lifetime
+ /// of the contents mapped by the StringPiece matches the lifetime of
+ /// the buffer; callers should arrange to make sure the buffer does
+ /// not get destroyed while the StringPiece is still used.
+ ///
+ /// REQUIRES: DataTypeCanUseMemcpy(dtype()).
+ StringPiece tensor_data() const;
+
+ private:
+ DataType type_;
+ TensorShape shape_;
+ TensorBuffer* buf_;
+
+ friend class DMAHelper;
+ friend class TensorCApi;
+ friend class VariableOp; // For access to set_shape
+ friend class AutoReloadVariableOp; // For access to set_shape
+
+ // Creates a tensor with the input datatype, shape and buf.
+ //
+ // Acquires a ref on buf that belongs to this Tensor.
+ Tensor(DataType type, const TensorShape& shape, TensorBuffer* buf);
+
+ bool CanUseDMA() const;
+
+ // Only needed by variable op to set the shape of an uninitialized
+ // Tensor.
+ // TODO: Remove this when we have a better story for detecting
+ // uninitialized tensors.
+ void set_shape(const TensorShape& shape) { shape_ = shape; }
+
+ void CopyFromInternal(const Tensor& other, const TensorShape& shape);
+
+ template <typename T>
+ T* base() const;
+};
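
A brief editorial sketch tying the accessors above together: allocate a tensor, fill it through flat(), and read it back through matrix():

```c++
#include "tensorflow/core/public/tensor.h"

void FillAndRead() {
  tensorflow::Tensor t(tensorflow::DT_FLOAT, tensorflow::TensorShape({2, 3}));

  auto flat = t.flat<float>();  // 1-D view over all 6 elements
  for (int i = 0; i < t.NumElements(); ++i) {
    flat(i) = 0.5f * i;
  }

  auto mat = t.matrix<float>();  // 2-D view, 2 x 3
  CHECK_EQ(mat(1, 2), 2.5f);     // row-major: flat index 1 * 3 + 2 == 5
}
```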
+
+// Implementation details
+
+// Interface to access the raw ref-counted data buffer.
+class TensorBuffer : public core::RefCounted {
+ public:
+ ~TensorBuffer() override {}
+
+ // data() points to a memory region of size() bytes.
+ virtual void* data() const = 0;
+ virtual size_t size() const = 0;
+
+ // If this TensorBuffer is a sub-buffer of another TensorBuffer,
+ // returns that TensorBuffer. Otherwise, returns this.
+ virtual TensorBuffer* root_buffer() = 0;
+
+ // Fill metadata about the allocation into the proto.
+ virtual void FillAllocationDescription(
+ AllocationDescription* proto) const = 0;
+
+ template <typename T>
+ T* base() const {
+ return reinterpret_cast<T*>(data());
+ }
+};
+
+inline void CheckEigenAlignment(const void* ptr) {
+#if EIGEN_ALIGN == 1
+ CHECK_EQ(reinterpret_cast<intptr_t>(ptr) % EIGEN_ALIGN_BYTES, 0);
+#endif
+}
+
+template <typename T>
+T* Tensor::base() const {
+ return buf_ == nullptr ? nullptr : buf_->base<T>();
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::Tensor Tensor::tensor() {
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ CheckEigenAlignment(base<T>());
+ return typename TTypes<T, NDIMS>::Tensor(base<T>(),
+ shape().AsEigenDSizes<NDIMS>());
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::ConstTensor Tensor::tensor() const {
+ CheckEigenAlignment(base<T>());
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ return typename TTypes<T, NDIMS>::ConstTensor(base<const T>(),
+ shape().AsEigenDSizes<NDIMS>());
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::Tensor Tensor::shaped(
+ gtl::ArraySlice<int64> new_sizes) {
+ CheckEigenAlignment(base<T>());
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ CHECK_EQ(NDIMS, new_sizes.size());
+ int64 new_num_elements = 1;
+ Eigen::array<Eigen::DenseIndex, NDIMS> dims;
+ for (int d = 0; d < NDIMS; d++) {
+ new_num_elements *= new_sizes[d];
+ dims[d] = new_sizes[d];
+ }
+ CHECK_EQ(new_num_elements, NumElements());
+ return typename TTypes<T, NDIMS>::Tensor(base<T>(), dims);
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::UnalignedTensor Tensor::unaligned_shaped(
+ gtl::ArraySlice<int64> new_sizes) {
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ CHECK_EQ(NDIMS, new_sizes.size());
+ int64 new_num_elements = 1;
+ Eigen::array<Eigen::DenseIndex, NDIMS> dims;
+ for (int d = 0; d < NDIMS; d++) {
+ new_num_elements *= new_sizes[d];
+ dims[d] = new_sizes[d];
+ }
+ CHECK_EQ(new_num_elements, NumElements());
+ return typename TTypes<T, NDIMS>::UnalignedTensor(base<T>(), dims);
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::ConstTensor Tensor::shaped(
+ gtl::ArraySlice<int64> new_sizes) const {
+ CheckEigenAlignment(base<T>());
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ CHECK_EQ(NDIMS, new_sizes.size());
+ int64 new_num_elements = 1;
+ Eigen::array<Eigen::DenseIndex, NDIMS> dims;
+ for (int d = 0; d < NDIMS; d++) {
+ new_num_elements *= new_sizes[d];
+ dims[d] = new_sizes[d];
+ }
+ CHECK_EQ(new_num_elements, NumElements());
+ return typename TTypes<T, NDIMS>::ConstTensor(base<T>(), dims);
+}
+
+template <typename T, size_t NDIMS>
+typename TTypes<T, NDIMS>::UnalignedConstTensor Tensor::unaligned_shaped(
+ gtl::ArraySlice<int64> new_sizes) const {
+ CHECK_EQ(dtype(), DataTypeToEnum<T>::v());
+ CHECK_EQ(NDIMS, new_sizes.size());
+ int64 new_num_elements = 1;
+ Eigen::array<Eigen::DenseIndex, NDIMS> dims;
+ for (int d = 0; d < NDIMS; d++) {
+ new_num_elements *= new_sizes[d];
+ dims[d] = new_sizes[d];
+ }
+ CHECK_EQ(new_num_elements, NumElements());
+ return typename TTypes<T, NDIMS>::UnalignedConstTensor(base<T>(), dims);
+}
+
+template <typename T>
+typename TTypes<T>::Scalar Tensor::scalar() {
+ CheckEigenAlignment(base<T>());
+ CHECK_EQ(1, NumElements()) << "Must have a one element tensor";
+ return typename TTypes<T>::Scalar(base<T>());
+}
+
+template <typename T>
+typename TTypes<T>::ConstScalar Tensor::scalar() const {
+ CheckEigenAlignment(base<T>());
+ CHECK_EQ(1, NumElements()) << "Must have a one element tensor";
+ return typename TTypes<T>::ConstScalar(base<T>());
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_TENSOR_H_
diff --git a/tensorflow/core/public/tensor_c_api.h b/tensorflow/core/public/tensor_c_api.h
new file mode 100644
index 0000000000..fe1846319e
--- /dev/null
+++ b/tensorflow/core/public/tensor_c_api.h
@@ -0,0 +1,243 @@
+// TODO(jeff,sanjay): Rename to tensorflow/public/c_api.h
+#ifndef TENSORFLOW_PUBLIC_TENSOR_C_API_H_
+#define TENSORFLOW_PUBLIC_TENSOR_C_API_H_
+
+#include <stddef.h>
+
+// --------------------------------------------------------------------------
+// C API for TensorFlow.
+//
+// The API leans towards simplicity and uniformity instead of convenience
+// since most usage will be by language specific wrappers.
+//
+// Conventions:
+// * We use the prefix TF_ for everything in the API.
+// * Objects are always passed around as pointers to opaque structs
+// and these structs are allocated/deallocated via the API.
+// * TF_Status holds error information. It is an object type
+// and therefore is passed around as a pointer to an opaque
+// struct as mentioned above.
+// * Every call that has a TF_Status* argument clears it on success
+// and fills it with error info on failure.
+//
+// Questions left to address:
+// * Might need to add stride info to TF_Tensor?
+// * Might at some point need a way for callers to provide their own Env.
+// * Should we remove the TF_Status arg from TF_AddProto calls and only
+// report errors later (e.g., on Run call)?
+// * Should dimensions be unsigned instead of signed?
+// * Maybe add TF_TensorShape that encapsulates dimension info.
+//
+// Design decisions made:
+// * Backing store for tensor memory has an associated deallocation
+// function. This deallocation function will point to client code
+// for tensors populated by the client. So the client can do things
+// like shadowing a numpy array.
+// * We do not provide TF_OK since it is not strictly necessary and we
+// are not optimizing for convenience.
+// * We make the assumption that one session has one graph. This should be
+// fine since we have the ability to run sub-graphs.
+// * We are not providing TF_AddNode/TF_AddNodes to better support
+// languages/platforms where proto is not available. This is because
+// we can just point authors of bindings at the .proto file and the
+// proto serialization spec and they can do the right thing for
+// their language.
+// * We could allow NULL for some arguments (e.g., NULL options arg).
+// However since convenience is not a primary goal, we don't do this.
+// * Devices are not in this API. Instead, they are created/used internally
+// and the API just provides high level controls over the number of
+// devices of each type.
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// --------------------------------------------------------------------------
+// TF_DataType holds the type for a scalar value. E.g., one slot in a tensor.
+// The enum values here are identical to corresponding values in types.proto.
+typedef enum {
+ TF_FLOAT = 1,
+ TF_DOUBLE = 2,
+ TF_INT32 = 3, // Int32 tensors are always in 'host' memory.
+ TF_UINT8 = 4,
+ TF_INT16 = 5,
+ TF_INT8 = 6,
+ TF_STRING = 7,
+ TF_COMPLEX = 8, // Single-precision complex
+ TF_INT64 = 9,
+ TF_BOOL = 10,
+ TF_QINT8 = 11, // Quantized int8
+ TF_QUINT8 = 12, // Quantized uint8
+ TF_QINT32 = 13, // Quantized int32
+ TF_BFLOAT16 = 14, // Float32 truncated to 16 bits. Only for cast ops.
+} TF_DataType;
+
+// --------------------------------------------------------------------------
+// TF_Code holds an error code. The enum values here are identical to
+// corresponding values in error_codes.proto.
+typedef enum {
+ TF_OK = 0,
+ TF_CANCELLED = 1,
+ TF_UNKNOWN = 2,
+ TF_INVALID_ARGUMENT = 3,
+ TF_DEADLINE_EXCEEDED = 4,
+ TF_NOT_FOUND = 5,
+ TF_ALREADY_EXISTS = 6,
+ TF_PERMISSION_DENIED = 7,
+ TF_UNAUTHENTICATED = 16,
+ TF_RESOURCE_EXHAUSTED = 8,
+ TF_FAILED_PRECONDITION = 9,
+ TF_ABORTED = 10,
+ TF_OUT_OF_RANGE = 11,
+ TF_UNIMPLEMENTED = 12,
+ TF_INTERNAL = 13,
+ TF_UNAVAILABLE = 14,
+ TF_DATA_LOSS = 15,
+} TF_Code;
+
+// --------------------------------------------------------------------------
+// TF_Status holds error information. It either has an OK code, or
+// else an error code with an associated error message.
+typedef struct TF_Status TF_Status;
+
+// Return a new status object.
+extern TF_Status* TF_NewStatus();
+
+// Delete a previously created status object.
+extern void TF_DeleteStatus(TF_Status*);
+
+// Record <code, msg> in *s. Any previous information is lost.
+// A common use is to clear a status: TF_SetStatus(s, TF_OK, "");
+extern void TF_SetStatus(TF_Status* s, TF_Code code, const char* msg);
+
+// Return the code recorded in *s.
+extern TF_Code TF_GetCode(const TF_Status* s);
+
+// Return a pointer to the error message in *s. The return value
+// points to memory that is only usable until the next mutation to *s.
+// Always returns an empty string if TF_GetCode(s) is TF_OK.
+extern const char* TF_Message(const TF_Status* s);
+
+// --------------------------------------------------------------------------
+// TF_Tensor holds a multi-dimensional array of elements of a single data type.
+// For all types other than TF_STRING, the data buffer stores elements
+// in row major order. E.g. if data is treated as a vector of TF_DataType:
+//
+// element 0: index (0, ..., 0)
+// element 1: index (0, ..., 1)
+// ...
+//
+// TODO(jeff,sanjay): Define format for TF_STRING tensors. Perhaps:
+// start_offset: array[uint64]
+// data: byte[...]
+//
+// String length is encoded (varint?) starting at data[start_offset[i]]
+// String contents follow immediately after string length.
+
+typedef struct TF_Tensor TF_Tensor;
+
+// Return a new tensor that holds the bytes data[0,len-1].
+//
+// The data will be deallocated by a subsequent call to TF_DeleteTensor via:
+// (*deallocator_fn)(data, len, deallocator_arg)
+// Clients can provide a custom deallocator function so they can pass in
+// memory managed by something like numpy.
+extern TF_Tensor* TF_NewTensor(TF_DataType, long long* dims, int num_dims,
+ void* data, size_t len,
+ void (*deallocator)(void* data, size_t len,
+ void* arg),
+ void* deallocator_arg);
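
An editorial sketch of the ownership contract: the buffer is caller-allocated, and the deallocator is how it eventually gets released when TF_DeleteTensor runs. The helper name is illustrative:

```c++
#include <cstdlib>

#include "tensorflow/core/public/tensor_c_api.h"

// Wraps a single float in a one-element TF_Tensor backed by malloc'd memory.
TF_Tensor* MakeOneElementFloatTensor(float value) {
  float* data = static_cast<float*>(malloc(sizeof(float)));
  *data = value;
  long long dims[1] = {1};
  return TF_NewTensor(TF_FLOAT, dims, /*num_dims=*/1, data, sizeof(float),
                      // Called by TF_DeleteTensor to release the buffer.
                      [](void* buf, size_t, void*) { free(buf); },
                      /*deallocator_arg=*/nullptr);
}
```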
+
+// Destroy a tensor.
+extern void TF_DeleteTensor(TF_Tensor*);
+
+// Return the type of a tensor element.
+extern TF_DataType TF_TensorType(const TF_Tensor*);
+
+// Return the number of dimensions that the tensor has.
+extern int TF_NumDims(const TF_Tensor*);
+
+// Return the length of the tensor in the "dim_index" dimension.
+// REQUIRES: 0 <= dim_index < TF_NumDims(tensor)
+extern long long TF_Dim(const TF_Tensor* tensor, int dim_index);
+
+// Return the size of the underlying data in bytes.
+extern size_t TF_TensorByteSize(const TF_Tensor*);
+
+// Return a pointer to the underlying data buffer.
+extern void* TF_TensorData(const TF_Tensor*);
+
+// --------------------------------------------------------------------------
+// TF_SessionOptions holds options that can be passed during session creation.
+typedef struct TF_SessionOptions TF_SessionOptions;
+
+// Return a new options object.
+extern TF_SessionOptions* TF_NewSessionOptions();
+
+// Set the target in TF_SessionOptions.options.
+// target can be empty, a single entry, or a comma separated list of entries.
+// Each entry is in one of the following formats :
+// "local"
+// ip:port
+// host:port
+extern void TF_SetTarget(TF_SessionOptions* options, const char* target);
+
+// Set the config in TF_SessionOptions.options.
+// config should be a serialized brain.ConfigProto proto.
+// If config was not parsed successfully as a ConfigProto, record the
+// error information in *status.
+extern void TF_SetConfig(TF_SessionOptions* options, const char* config,
+ size_t config_len, TF_Status* status);
+
+// Destroy an options object.
+extern void TF_DeleteSessionOptions(TF_SessionOptions*);
+
+// TODO(jeff,sanjay):
+// - export functions to set Config fields
+
+// --------------------------------------------------------------------------
+// TF_Session manages a single graph and execution.
+typedef struct TF_Session TF_Session;
+
+// Return a new execution session, or NULL on error.
+extern TF_Session* TF_NewSession(const TF_SessionOptions*, TF_Status* status);
+
+// Close a session.
+extern void TF_CloseSession(TF_Session*, TF_Status* status);
+
+// Destroy a session. Even if error information is recorded in *status,
+// this call discards all resources associated with the session.
+extern void TF_DeleteSession(TF_Session*, TF_Status* status);
+
+// Treat the bytes proto[0,proto_len-1] as a serialized GraphDef and
+// add the nodes in that GraphDef to the graph for the session.
+extern void TF_ExtendGraph(TF_Session*, const void* proto, size_t proto_len,
+ TF_Status*);
+
+// Run the graph associated with the session starting with the
+// supplied inputs (inputs[0,ninputs-1]). Regardless of success or
+// failure, inputs[] become the property of the implementation (the
+// implementation will eventually call TF_DeleteTensor on each input).
+//
+// On success, the tensors corresponding to output_names[0,noutputs-1]
+// are placed in outputs[], and these outputs[] become the property
+// of the caller (the caller must eventually call TF_DeleteTensor on
+// them).
+//
+// On failure, outputs[] contains nulls.
+extern void TF_Run(TF_Session*,
+ // Input tensors
+ const char** input_names, TF_Tensor** inputs, int ninputs,
+ // Output tensors
+ const char** output_tensor_names, TF_Tensor** outputs,
+ int noutputs,
+ // Target nodes
+ const char** target_node_names, int ntargets,
+ // Output status
+ TF_Status*);
+
+#ifdef __cplusplus
+} /* end extern "C" */
+#endif
+
+#endif // TENSORFLOW_PUBLIC_TENSOR_C_API_H_
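
Finally, an editorial sketch of the overall C API call sequence for a graph with no feeds; graph_bytes/graph_len and the fetched output name are placeholders supplied by the caller:

```c++
#include "tensorflow/core/public/tensor_c_api.h"

void RunSerializedGraphOnce(const void* graph_bytes, size_t graph_len) {
  TF_Status* status = TF_NewStatus();
  TF_SessionOptions* opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(opts, status);

  if (session != nullptr && TF_GetCode(status) == TF_OK) {
    TF_ExtendGraph(session, graph_bytes, graph_len, status);

    TF_Tensor* output = nullptr;
    const char* output_names[1] = {"my_output:0"};  // placeholder name
    if (TF_GetCode(status) == TF_OK) {
      TF_Run(session,
             /*input_names=*/nullptr, /*inputs=*/nullptr, /*ninputs=*/0,
             output_names, &output, /*noutputs=*/1,
             /*target_node_names=*/nullptr, /*ntargets=*/0, status);
    }
    if (output != nullptr) TF_DeleteTensor(output);

    TF_CloseSession(session, status);
    TF_DeleteSession(session, status);
  }

  TF_DeleteSessionOptions(opts);
  TF_DeleteStatus(status);
}
```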
diff --git a/tensorflow/core/public/tensor_shape.h b/tensorflow/core/public/tensor_shape.h
new file mode 100644
index 0000000000..a889b8b17d
--- /dev/null
+++ b/tensorflow/core/public/tensor_shape.h
@@ -0,0 +1,239 @@
+#ifndef TENSORFLOW_PUBLIC_TENSOR_SHAPE_H_
+#define TENSORFLOW_PUBLIC_TENSOR_SHAPE_H_
+
+#include <string>
+
+#include "tensorflow/core/framework/tensor_shape.pb.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+
+class TensorShapeIter; // Declared below
+
+/// Manages the dimensions of a Tensor and their sizes.
+class TensorShape {
+ public:
+ /// \brief Construct a TensorShape from the provided sizes.
+ /// REQUIRES: dim_sizes[i] >= 0
+ explicit TensorShape(gtl::ArraySlice<int64> dim_sizes);
+ TensorShape(std::initializer_list<int64> dim_sizes)
+ : TensorShape(gtl::ArraySlice<int64>(dim_sizes)) {}
+
+ /// REQUIRES: IsValid(proto)
+ explicit TensorShape(const TensorShapeProto& proto);
+
+ /// Create a tensor shape with no dimensions and one element, which you can
+ /// then call AddDim() on.
+ TensorShape();
+
+ /// Returns true iff "proto" is a valid tensor shape.
+ static bool IsValid(const TensorShapeProto& proto);
+
+ /// Clear a tensor shape
+ void Clear();
+
+ /// \brief Add a dimension to the end ("inner-most").
+ /// REQUIRES: size >= 0
+ void AddDim(int64 size);
+
+ /// Appends all the dimensions from shape.
+ void AppendShape(const TensorShape& shape);
+
+ /// \brief Insert a dimension somewhere in the TensorShape.
+ /// REQUIRES: "0 <= d <= dims()"
+ /// REQUIRES: size >= 0
+ void InsertDim(int d, int64 size);
+
+ /// \brief Modifies the size of the dimension 'd' to be 'size'
+ /// REQUIRES: "0 <= d < dims()"
+ /// REQUIRES: size >= 0
+ void set_dim(int d, int64 size);
+
+ /// \brief Removes dimension 'd' from the TensorShape.
+ /// REQUIRES: "0 <= d < dims()"
+ void RemoveDim(int d);
+
+ /// Return the number of dimensions in the tensor.
+ int dims() const { return dim_sizes_.size(); }
+
+ /// \brief Returns the number of elements in dimension "d".
+ /// REQUIRES: "0 <= d < dims()"
+ // TODO(mdevin): Rename to dimension() to match Eigen::Tensor::dimension()?
+ int64 dim_size(int d) const {
+ DCHECK_GE(d, 0);
+ DCHECK_LT(d, dims());
+ return dim_sizes_[d];
+ }
+
+ /// Returns sizes of all dimensions.
+ gtl::ArraySlice<int64> dim_sizes() const { return dim_sizes_; }
+
+ /// \brief Returns the number of elements in the tensor.
+ ///
+  /// We use int64 and not size_t to be compatible with Eigen::Tensor,
+  /// which uses ptrdiff_t.
+ int64 num_elements() const { return num_elements_; }
+
+ /// Returns true if *this and b have the same sizes. Ignores dimension names.
+ bool IsSameSize(const TensorShape& b) const;
+ bool operator==(const TensorShape& b) const { return IsSameSize(b); }
+
+ /// Fill *proto from *this.
+ void AsProto(TensorShapeProto* proto) const;
+
+ /// Fill *dsizes from *this.
+ template <int NDIMS>
+ Eigen::DSizes<Eigen::DenseIndex, NDIMS> AsEigenDSizes() const;
+
+ /// Same as AsEigenDSizes() but allows for NDIMS > dims() -- in which case we
+ /// pad the rest of the sizes with 1.
+ template <int NDIMS>
+ Eigen::DSizes<Eigen::DenseIndex, NDIMS> AsEigenDSizesWithPadding() const;
+
+ /// For iterating through the dimensions.
+ TensorShapeIter begin() const;
+ TensorShapeIter end() const;
+
+ /// For error messages.
+ string DebugString() const;
+ // TODO(vrv): Remove this, this is the same as DebugString().
+ string ShortDebugString() const;
+
+ private:
+ /// Recalculates the dimensions of this tensor after they are modified.
+ void recompute_dims();
+
+ // TODO(josh11b): Maybe use something from the Eigen Tensor library
+  // for the sizes.
+ gtl::InlinedVector<int64, 4> dim_sizes_;
+
+ /// total number of elements (avoids recomputing it each time).
+ int64 num_elements_;
+};
+
+struct TensorShapeDim {
+ explicit TensorShapeDim(int64 s) : size(s) {}
+ int size;
+};
+
+class TensorShapeIter {
+ public:
+ TensorShapeIter(const TensorShape* shape, int d) : shape_(shape), d_(d) {}
+ bool operator==(const TensorShapeIter& rhs) {
+ DCHECK(shape_ == rhs.shape_);
+ return d_ == rhs.d_;
+ }
+ bool operator!=(const TensorShapeIter& rhs) {
+ DCHECK(shape_ == rhs.shape_);
+ return d_ != rhs.d_;
+ }
+ void operator++() { ++d_; }
+ TensorShapeDim operator*() { return TensorShapeDim(shape_->dim_size(d_)); }
+
+ private:
+ const TensorShape* shape_;
+ int d_;
+};
+
+// In some places, allow shape (1,) to be treated as a scalar and shape () to be
+// treated as a vector. This flag is for temporary backwards compatibility
+// only, and will be changed to strict within Google around November 15, 2015.
+#if defined(PLATFORM_GOOGLE)
+// TODO(irving): Become strict on November 15, 2015.
+static const bool kAllowLegacyScalars = true;
+#else
+// For open source (outside Google), we are strict.
+static const bool kAllowLegacyScalars = false;
+#endif
+
+/// \brief Static helper routines for TensorShape. Includes a few common
+/// predicates on a tensor shape.
+class TensorShapeUtils {
+ public:
+ static bool IsScalar(const TensorShape& shape) { return shape.dims() == 0; }
+
+ static bool IsVector(const TensorShape& shape) { return shape.dims() == 1; }
+
+ // Allow either scalars or (if allowing legacy scalars) shape (1,).
+ static bool IsLegacyScalar(const TensorShape& shape) {
+ return shape.dims() == 0 ||
+ (kAllowLegacyScalars && shape.dims() == 1 && shape.dim_size(0) == 1);
+ }
+
+ // Allow rank 1 or (if allowing legacy scalars) rank 0.
+ static bool IsLegacyVector(const TensorShape& shape) {
+ return shape.dims() == 1 || (kAllowLegacyScalars && shape.dims() == 0);
+ }
+
+ static bool IsVectorOrHigher(const TensorShape& shape) {
+ return shape.dims() >= 1;
+ }
+
+ static bool IsMatrix(const TensorShape& shape) { return shape.dims() == 2; }
+
+ static bool IsMatrixOrHigher(const TensorShape& shape) {
+ return shape.dims() >= 2;
+ }
+
+ /// \brief Returns a TensorShape whose dimensions are dims[0], dims[1], ...,
+ /// dims[n-1].
+ template <typename T>
+ static TensorShape MakeShape(const T* dims, int n) {
+ TensorShape shape;
+ for (int i = 0; i < n; ++i) shape.AddDim(dims[i]);
+ return shape;
+ }
+
+ static string ShapeListString(const gtl::ArraySlice<TensorShape>& shapes) {
+ string result = "[";
+ bool first = true;
+ for (const TensorShape& shape : shapes) {
+ strings::StrAppend(&result, (first ? "" : ", "), shape.DebugString());
+ first = false;
+ }
+ strings::StrAppend(&result, "]");
+ return result;
+ }
+
+ static bool StartsWith(const TensorShape& shape0, const TensorShape& shape1);
+};
+
+// TODO(josh11b): Add TensorStrides once we support strides
+// struct TensorStrides {
+// gtl::InlinedVector<int, 4> strides_;
+// };
+
+// ----------------------------------------------------------------------------
+// Template method implementation details below
+// ----------------------------------------------------------------------------
+
+template <int NDIMS>
+Eigen::DSizes<Eigen::DenseIndex, NDIMS> TensorShape::AsEigenDSizes() const {
+  CHECK_EQ(NDIMS, dims()) << "Asking for tensor of " << NDIMS
+                          << " dimensions from a tensor of " << dims()
+                          << " dimensions";
+ return AsEigenDSizesWithPadding<NDIMS>();
+}
+
+template <int NDIMS>
+Eigen::DSizes<Eigen::DenseIndex, NDIMS> TensorShape::AsEigenDSizesWithPadding()
+ const {
+  CHECK_GE(NDIMS, dims()) << "Asking for tensor of " << NDIMS
+                          << " dimensions from a tensor of " << dims()
+                          << " dimensions";
+ Eigen::DSizes<Eigen::DenseIndex, NDIMS> dsizes;
+ for (int d = 0; d < dims(); d++) {
+ dsizes[d] = dim_size(d);
+ }
+ for (int d = dims(); d < NDIMS; d++) {
+ dsizes[d] = 1;
+ }
+ return dsizes;
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_TENSOR_SHAPE_H_
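A minimal sketch of how the class above is used; CHECK/CHECK_EQ come from the
logging header this file already includes, and the values follow directly from
the accessors declared above:

  #include "tensorflow/core/public/tensor_shape.h"

  void TensorShapeExample() {
    tensorflow::TensorShape shape({2, 3});  // rank-2 shape: 2 x 3
    shape.AddDim(4);                        // now 2 x 3 x 4
    CHECK_EQ(shape.dims(), 3);
    CHECK_EQ(shape.num_elements(), 24);
    CHECK_EQ(shape.dim_size(1), 3);

    // AsEigenDSizes() requires NDIMS to match dims() exactly.
    Eigen::DSizes<Eigen::DenseIndex, 3> dsizes = shape.AsEigenDSizes<3>();
    CHECK_EQ(dsizes[2], 4);

    // The padding variant allows NDIMS > dims(); the extra entries are 1.
    Eigen::DSizes<Eigen::DenseIndex, 5> padded =
        shape.AsEigenDSizesWithPadding<5>();
    CHECK_EQ(padded[4], 1);
  }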
diff --git a/tensorflow/core/public/tensorflow_server.h b/tensorflow/core/public/tensorflow_server.h
new file mode 100644
index 0000000000..0dac414555
--- /dev/null
+++ b/tensorflow/core/public/tensorflow_server.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_PUBLIC_TENSORFLOW_SERVER_H_
+#define TENSORFLOW_PUBLIC_TENSORFLOW_SERVER_H_
+
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Initialize the TensorFlow service for this address space.
+// This is a blocking call that never returns.
+// See BUILD file for details on linkage guidelines.
+::tensorflow::Status InitTensorFlow();
+
+// Like InitTensorFlow() but returns after the TensorFlow
+// services have been launched.
+::tensorflow::Status LaunchTensorFlow();
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PUBLIC_TENSORFLOW_SERVER_H_
diff --git a/tensorflow/core/user_ops/fact.cc b/tensorflow/core/user_ops/fact.cc
new file mode 100644
index 0000000000..7b6932244d
--- /dev/null
+++ b/tensorflow/core/user_ops/fact.cc
@@ -0,0 +1,29 @@
+// An example Op.
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+REGISTER_OP("Fact")
+ .Output("fact: string")
+ .Doc(R"doc(
+Output a fact about factorials.
+)doc");
+
+class FactOp : public OpKernel {
+ public:
+ explicit FactOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Output a scalar string.
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, TensorShape(), &output_tensor));
+ auto output = output_tensor->template scalar<string>();
+
+ output() = "0! == 1";
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("Fact").Device(DEVICE_CPU), FactOp);
diff --git a/tensorflow/core/util/bcast.cc b/tensorflow/core/util/bcast.cc
new file mode 100644
index 0000000000..4e70b78751
--- /dev/null
+++ b/tensorflow/core/util/bcast.cc
@@ -0,0 +1,120 @@
+#include "tensorflow/core/util/bcast.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+
+/* static */
+void BCast::Reverse(Vec* shape) { std::reverse(shape->begin(), shape->end()); }
+
+BCast::BCast(const Vec& sx, const Vec& sy) {
+ // Reverse the shape of x and y for convenience.
+ // After the reverse, 0-th is the inner-most dimension.
+ Vec x = sx;
+ Reverse(&x);
+ Vec y = sy;
+ Reverse(&y);
+
+ // 1-extend and align x and y so that they are the same size.
+ if (x.size() > y.size()) {
+ y.resize(x.size(), 1);
+ } else {
+ x.resize(y.size(), 1);
+ }
+
+  // Going through each dimension starting from the inner-most
+  // dimension, compare the dimensions of x and y. They are compatible if
+  // they are equal or either is 1.
+ enum State {
+ UNKNOWN,
+ SAME,
+ X_ONE,
+ Y_ONE,
+ };
+ State prev = UNKNOWN;
+ const int64 n = x.size();
+ for (int i = 0; i < n; ++i) {
+ // Output shape.
+ State curr = UNKNOWN;
+ const int64 x_i = x[i]; // i-th dimension of x.
+ CHECK_GE(x_i, 0);
+ const int64 y_i = y[i]; // i-th dimension of y.
+ CHECK_GE(y_i, 0);
+ int64 o_i; // i-th dimension of the output.
+ int64 bx_i; // i-th broadcast for x.
+ int64 by_i; // i-th broadcast for y.
+ // Invariant:
+ // o_i = x_i * bx_i = y_i * by_i
+ if (x_i == y_i) {
+ // No broadcast.
+ o_i = x_i;
+ bx_i = 1;
+ by_i = 1;
+ curr = SAME;
+ } else if (x_i == 1) {
+ // x broadcast to y on this dimension.
+ o_i = y_i;
+ bx_i = y_i;
+ by_i = 1;
+ grad_x_reduce_idx_.push_back(n - 1 - i);
+ curr = X_ONE;
+ } else if (y_i == 1) {
+ // y broadcast to x on this dimension.
+ o_i = x_i;
+ bx_i = 1;
+ by_i = x_i;
+ grad_y_reduce_idx_.push_back(n - 1 - i);
+ curr = Y_ONE;
+ } else {
+ valid_ = false;
+ return;
+ }
+ output_.push_back(o_i);
+ // Reshape/broadcast.
+ // Invariant:
+    //   result_[i] == x_reshape_[i] * x_bcast_[i] == y_reshape_[i] * y_bcast_[i]
+ if (curr == SAME && x_i == 1) {
+      // Both sides are 1s.
+ grad_x_reduce_idx_.push_back(n - 1 - i);
+ grad_y_reduce_idx_.push_back(n - 1 - i);
+ continue;
+ } else if (prev == curr) {
+ // It is a run of the same cases (no broadcast, x broadcast to
+ // y, y broadcast to x). We can reshape the input so that fewer
+ // dimensions are involved in the intermediate computation.
+ result_.back() *= o_i;
+ x_reshape_.back() *= x_i;
+ x_bcast_.back() *= bx_i;
+ y_reshape_.back() *= y_i;
+ y_bcast_.back() *= by_i;
+ } else {
+ result_.push_back(o_i);
+ x_reshape_.push_back(x_i);
+ x_bcast_.push_back(bx_i);
+ y_reshape_.push_back(y_i);
+ y_bcast_.push_back(by_i);
+ }
+ prev = curr;
+ }
+
+ if (result_.empty()) {
+ // Can happen when both x and y are effectively scalar.
+ result_.push_back(1);
+ x_reshape_.push_back(1);
+ x_bcast_.push_back(1);
+ y_reshape_.push_back(1);
+ y_bcast_.push_back(1);
+ }
+
+  // Reverse all vectors since x and y were reversed at the very
+  // beginning.
+ Reverse(&x_reshape_);
+ Reverse(&x_bcast_);
+ Reverse(&y_reshape_);
+ Reverse(&y_bcast_);
+ Reverse(&result_);
+ Reverse(&output_);
+ Reverse(&grad_x_reduce_idx_);
+ Reverse(&grad_y_reduce_idx_);
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/util/bcast.h b/tensorflow/core/util/bcast.h
new file mode 100644
index 0000000000..9f0233e415
--- /dev/null
+++ b/tensorflow/core/util/bcast.h
@@ -0,0 +1,99 @@
+#ifndef TENSORFLOW_UTIL_BCAST_H_
+#define TENSORFLOW_UTIL_BCAST_H_
+
+#include <algorithm>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+
+// BCast is a helper for broadcasting binary tensor operations.
+// TensorFlow's broadcasting rule follows that of numpy (See
+// http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
+//
+// The rule has the following properties:
+//
+// 1. suffix matching: the rule starts with the right-most
+// dimension, and works towards the left-most dimension. Since
+// TensorFlow is row-major, the right-most dimension (the last
+// element in the shape of a tensor) is the inner-most, a.k.a.
+// the fastest changing, dimension.
+//
+// 2. Two dimensions are compatible for broadcasting if both are the
+// same or either is 1.
+//
+// BCast takes the shape of two tensors and computes a few vectors of
+// int64 that are useful for the caller to reshape the tensors, apply
+// the right broadcasts to them, compute the broadcasted operation,
+// and possibly the gradients. In a nutshell, the caller is expected
+// to compute the broadcasted operation as follows:
+//
+// BCast b(x.shape(), y.shape());
+// output = x.reshape(b.x_reshape()).broadcast(b.x_bcast())
+// _op_
+// y.reshape(b.y_reshape()).broadcast(b.y_bcast())
+//
+// For the gradient computation,
+// grad_x = sum(grad * backprop_x(x, y), grad_x_reduce_idx)
+// .reshape(x.shape())
+// grad_y = sum(grad * backprop_y(x, y), grad_y_reduce_idx)
+// .reshape(y.shape())
+// backprop_x and backprop_y are functionals of the binary function "op",
+// e.g.,
+// for +, backprop_x(x, y) = backprop_y(x, y) = 1;
+// for *, backprop_x(x, y) = y, backprop_y(x, y) = x;
+// for /, backprop_x(x, y) = 1/y, backprop_y(x, y) = -x/y^2;
+//
+// The multiplication in the grad * backprop_x itself is also
+// broadcasting following the same rule.
+//
+// TODO(zhifengc): Adds support for n-ary (n >= 2).
+class BCast {
+ public:
+  // A vector of int64 representing the shape of a tensor. The 0-th
+ // element is the outer-most dimension and the last element is the
+ // inner-most dimension. Note that we do not use TensorShape since
+ // it's more convenient to manipulate Vec directly for this module.
+ typedef std::vector<int64> Vec;
+
+ BCast(const Vec& x, const Vec& y);
+ ~BCast() {}
+
+ // Returns true iff two operands are compatible according to the
+ // broadcasting rule.
+ bool IsValid() const { return valid_; }
+
+ // If and only if IsValid(), the following fields can be used in
+ // implementing a broadcasted binary tensor operation according to
+ // the broadcasting rule.
+ const Vec& x_reshape() const { return x_reshape_; }
+ const Vec& x_bcast() const { return x_bcast_; }
+ const Vec& y_reshape() const { return y_reshape_; }
+ const Vec& y_bcast() const { return y_bcast_; }
+ const Vec& result_shape() const { return result_; }
+ const Vec& output_shape() const { return output_; }
+ const Vec& grad_x_reduce_idx() const { return grad_x_reduce_idx_; }
+ const Vec& grad_y_reduce_idx() const { return grad_y_reduce_idx_; }
+
+ private:
+ bool valid_ = true;
+ Vec x_reshape_;
+ Vec x_bcast_;
+ Vec y_reshape_;
+ Vec y_bcast_;
+ Vec result_;
+ Vec output_;
+ Vec grad_x_reduce_idx_;
+ Vec grad_y_reduce_idx_;
+
+ static void Reverse(Vec* shape);
+ static bool HasZero(const Vec& shape);
+
+ TF_DISALLOW_COPY_AND_ASSIGN(BCast);
+};
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_BCAST_H_
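A concrete sketch of the contract described above, using one of the shape
pairs exercised in bcast_test.cc below; the commented values are what the
constructor computes for that pair:

  #include "tensorflow/core/util/bcast.h"

  void BCastExample() {
    // Broadcast a [11,7,5,3,2] tensor against a [3,2] tensor.
    tensorflow::BCast b({11, 7, 5, 3, 2}, {3, 2});
    CHECK(b.IsValid());
    // Runs of compatible dimensions are collapsed, so the caller can work
    // with rank-2 intermediates instead of rank-5 ones:
    //   b.x_reshape()    == {385, 6}   b.x_bcast() == {1, 1}
    //   b.y_reshape()    == {1, 6}     b.y_bcast() == {385, 1}
    //   b.result_shape() == {385, 6}
    //   b.output_shape() == {11, 7, 5, 3, 2}
    // For the gradient w.r.t. y, reduce the incoming gradient over
    //   b.grad_y_reduce_idx() == {0, 1, 2}
    // and reshape the result back to y's shape.
  }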
diff --git a/tensorflow/core/util/bcast_test.cc b/tensorflow/core/util/bcast_test.cc
new file mode 100644
index 0000000000..02d18586d6
--- /dev/null
+++ b/tensorflow/core/util/bcast_test.cc
@@ -0,0 +1,226 @@
+#include "tensorflow/core/util/bcast.h"
+
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+string BCast(const tensorflow::BCast::Vec& x, const tensorflow::BCast::Vec& y) {
+ tensorflow::BCast b(x, y);
+ if (!b.IsValid()) {
+ return "invalid";
+ }
+ string ret;
+ strings::StrAppend(&ret, "[", str_util::Join(b.x_reshape(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.x_bcast(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.y_reshape(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.y_bcast(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.result_shape(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.output_shape(), ","), "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.grad_x_reduce_idx(), ","),
+ "]");
+ strings::StrAppend(&ret, "[", str_util::Join(b.grad_y_reduce_idx(), ","),
+ "]");
+ return ret;
+}
+
+TEST(BCastTest, Invalid) {
+ EXPECT_EQ("invalid", BCast({5, 3, 2}, {3}));
+ EXPECT_EQ("invalid", BCast({5, 3, 2}, {2, 2}));
+ EXPECT_EQ("invalid", BCast({5, 3, 2}, {10, 1, 1}));
+ EXPECT_EQ("invalid", BCast({1, 2, 1, 2, 1, 2}, {2, 4, 2, 1, 2, 1}));
+}
+
+TEST(BCastTest, Basic_SameShape) {
+ // Effectively no broadcast needed.
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {11, 7, 5, 3, 2}),
+ "[2310][1][2310][1]"
+ "[2310]"
+ "[11,7,5,3,2]"
+ "[][]");
+}
+
+TEST(BCastTest, Basic_Scalar_Scalar) {
+ // Effectively it's a scalar and a scalar.
+ // [1, 1] [1]
+ EXPECT_EQ(BCast({1, 1}, {1}),
+ "[1][1][1][1]"
+ "[1]"
+ "[1,1]"
+ "[0,1][0,1]");
+
+ // [1] [1, 1]
+ EXPECT_EQ(BCast({1}, {1, 1}),
+ "[1][1][1][1]"
+ "[1]"
+ "[1,1]"
+ "[0,1][0,1]");
+}
+
+TEST(BCastTest, Basic_Tensor_Scalar) {
+ // Effectively it's a tensor and a scalar.
+ // [11, 7, 5, 3, 2] [1]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {1}),
+ "[2310][1][1][2310]"
+ "[2310]"
+ "[11,7,5,3,2]"
+ "[][0,1,2,3,4]");
+
+ // [1] [11, 7, 5, 3, 2]
+ EXPECT_EQ(BCast({1}, {11, 7, 5, 3, 2}),
+ "[1][2310][2310][1]"
+ "[2310]"
+ "[11,7,5,3,2]"
+ "[0,1,2,3,4][]");
+}
+
+TEST(BCastTest, Basic_Tensor_With_DimSize_1_Scalar) {
+ // Effectively it's a tensor and a scalar.
+ // [11, 7, 5, 3, 2, 1] [1]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2, 1}, {1}),
+ "[2310][1][1][2310]"
+ "[2310]"
+ "[11,7,5,3,2,1]"
+ "[5][0,1,2,3,4,5]");
+
+ // [1] [11, 7, 5, 3, 2, 1]
+ EXPECT_EQ(BCast({1}, {11, 7, 5, 3, 2, 1}),
+ "[1][2310][2310][1]"
+ "[2310]"
+ "[11,7,5,3,2,1]"
+ "[0,1,2,3,4,5][5]");
+
+ // Effectively it's a tensor and a scalar.
+  // [11, 7, 5, 1, 1, 3, 2, 1, 1] [1]
+ EXPECT_EQ(BCast({11, 7, 5, 1, 1, 3, 2, 1, 1}, {1}),
+ "[2310][1][1][2310]"
+ "[2310]"
+ "[11,7,5,1,1,3,2,1,1]"
+ "[3,4,7,8][0,1,2,3,4,5,6,7,8]");
+
+  // [1] [11, 7, 5, 1, 1, 3, 2, 1, 1]
+ EXPECT_EQ(BCast({1}, {11, 7, 5, 1, 1, 3, 2, 1, 1}),
+ "[1][2310][2310][1]"
+ "[2310]"
+ "[11,7,5,1,1,3,2,1,1]"
+ "[0,1,2,3,4,5,6,7,8][3,4,7,8]");
+}
+
+TEST(BCastTest, Basic_Tensor_Vector) {
+ // [11, 7, 5, 3, 2] [2]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {2}),
+ "[1155,2][1,1][1,2][1155,1]"
+ "[1155,2]"
+ "[11,7,5,3,2]"
+ "[][0,1,2,3]");
+
+ // [2] [11, 7, 5, 3, 2]
+ EXPECT_EQ(BCast({2}, {11, 7, 5, 3, 2}),
+ "[1,2][1155,1][1155,2][1,1]"
+ "[1155,2]"
+ "[11,7,5,3,2]"
+ "[0,1,2,3][]");
+}
+
+TEST(BCastTest, Basic_Tensor_Matrix) {
+ // [11, 7, 5, 3, 2] [3, 2]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {3, 2}),
+ "[385,6][1,1][1,6][385,1]"
+ "[385,6]"
+ "[11,7,5,3,2]"
+ "[][0,1,2]");
+ // [3, 2] [11, 7, 5, 3, 2]
+ EXPECT_EQ(BCast({3, 2}, {11, 7, 5, 3, 2}),
+ "[1,6][385,1][385,6][1,1]"
+ "[385,6]"
+ "[11,7,5,3,2]"
+ "[0,1,2][]");
+}
+
+TEST(BCastTest, Basic_Tensor_Matrix_Column) {
+ // [11, 7, 5, 3, 2] [3, 1]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {3, 1}),
+ "[385,3,2][1,1,1][1,3,1][385,1,2]"
+ "[385,3,2]"
+ "[11,7,5,3,2]"
+ "[][0,1,2,4]");
+
+ // [3, 1] [11, 7, 5, 3, 2]
+ EXPECT_EQ(BCast({3, 1}, {11, 7, 5, 3, 2}),
+ "[1,3,1][385,1,2][385,3,2][1,1,1]"
+ "[385,3,2]"
+ "[11,7,5,3,2]"
+ "[0,1,2,4][]");
+}
+
+TEST(BCastTest, Basic_Tensor_Matrix_As_Tensor) {
+ // [11, 7, 5, 3, 2] [7, 5, 1, 1]
+ EXPECT_EQ(BCast({11, 7, 5, 3, 2}, {7, 5, 1, 1}),
+ "[11,35,6][1,1,1][1,35,1][11,1,6]"
+ "[11,35,6]"
+ "[11,7,5,3,2]"
+ "[][0,3,4]");
+
+ // [7, 5, 1, 1] [11, 7, 5, 3, 2]
+ EXPECT_EQ(BCast({7, 5, 1, 1}, {11, 7, 5, 3, 2}),
+ "[1,35,1][11,1,6][11,35,6][1,1,1]"
+ "[11,35,6]"
+ "[11,7,5,3,2]"
+ "[0,3,4][]");
+}
+
+TEST(BCastTest, Complex_BCast_To_Each_Other) {
+ // Rare cases. x and y broadcast to each other. x and y are of
+ // different ranks.
+ // Can be verified in numpy as:
+ // import numpy as np
+ // x = np.arange(0,110).reshape([11,1,5,1,2])
+ // y = np.arange(0,21).reshape([7,1,3,1])
+ // np.shape(x + y)
+ // Out[.]: (11, 7, 5, 3, 2)
+ EXPECT_EQ(BCast({11, 1, 5, 1, 2}, {7, 1, 3, 1}),
+ "[11,1,5,1,2][1,7,1,3,1][1,7,1,3,1][11,1,5,1,2]"
+ "[11,7,5,3,2]"
+ "[11,7,5,3,2]"
+ "[1,3][0,2,4]");
+}
+
+TEST(BCastTest, TestZeroDimensionShape) {
+ EXPECT_EQ(BCast({2, 0, 5}, {5}),
+ "[0,5][1,1][1,5][0,1]"
+ "[0,5]"
+ "[2,0,5]"
+ "[][0,1]");
+ EXPECT_EQ(BCast({5}, {2, 0, 5}),
+ "[1,5][0,1][0,5][1,1]"
+ "[0,5]"
+ "[2,0,5]"
+ "[0,1][]");
+
+ EXPECT_EQ(BCast({2, 0, 3, 0, 5}, {5}),
+ "[0,5][1,1][1,5][0,1]"
+ "[0,5]"
+ "[2,0,3,0,5]"
+ "[][0,1,2,3]");
+ EXPECT_EQ(BCast({5}, {2, 0, 3, 0, 5}),
+ "[1,5][0,1][0,5][1,1]"
+ "[0,5]"
+ "[2,0,3,0,5]"
+ "[0,1,2,3][]");
+
+ EXPECT_EQ(BCast({2, 0, 3, 0, 5}, {3, 1, 5}),
+ "[0,3,0,5][1,1,1,1][1,3,1,5][0,1,0,1]"
+ "[0,3,0,5]"
+ "[2,0,3,0,5]"
+ "[][0,1,3]");
+ EXPECT_EQ(BCast({3, 1, 5}, {2, 0, 3, 0, 5}),
+ "[1,3,1,5][0,1,0,1][0,3,0,5][1,1,1,1]"
+ "[0,3,0,5]"
+ "[2,0,3,0,5]"
+ "[0,1,3][]");
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/util/device_name_utils.cc b/tensorflow/core/util/device_name_utils.cc
new file mode 100644
index 0000000000..b8c6a77dd0
--- /dev/null
+++ b/tensorflow/core/util/device_name_utils.cc
@@ -0,0 +1,338 @@
+#include "tensorflow/core/util/device_name_utils.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+static bool IsAlpha(char c) {
+ return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
+}
+
+static bool IsAlphaNum(char c) { return IsAlpha(c) || (c >= '0' && c <= '9'); }
+
+// Returns true iff "in" is a valid job name.
+static bool IsJobName(StringPiece in) {
+ if (in.empty()) return false;
+ if (!IsAlpha(in[0])) return false;
+ for (size_t i = 1; i < in.size(); ++i) {
+ if (!(IsAlphaNum(in[i]) || in[i] == '_')) return false;
+ }
+ return true;
+}
+
+// Returns true and fills in "*job" iff "*in" starts with a job name.
+static bool ConsumeJobName(StringPiece* in, string* job) {
+ if (in->empty()) return false;
+ if (!IsAlpha((*in)[0])) return false;
+ size_t i = 1;
+ for (; i < in->size(); ++i) {
+ const char c = (*in)[i];
+ if (c == '/') break;
+ if (!(IsAlphaNum(c) || c == '_')) {
+ return false;
+ }
+ }
+ job->assign(in->data(), i);
+ in->remove_prefix(i);
+ return true;
+}
+
+// Returns true and fills in "*device_type" iff "*in" starts with a device type
+// name.
+static bool ConsumeDeviceType(StringPiece* in, string* device_type) {
+ if (in->empty()) return false;
+ if (!IsAlpha((*in)[0])) return false;
+ size_t i = 1;
+ for (; i < in->size(); ++i) {
+ const char c = (*in)[i];
+ if (c == '/' || c == ':') break;
+ if (!(IsAlphaNum(c) || c == '_')) {
+ return false;
+ }
+ }
+ device_type->assign(in->data(), i);
+ in->remove_prefix(i);
+ return true;
+}
+
+// Returns true and fills in "*val" iff "*in" starts with a decimal
+// number.
+static bool ConsumeNumber(StringPiece* in, int* val) {
+ uint64 tmp;
+ if (str_util::ConsumeLeadingDigits(in, &tmp)) {
+ *val = tmp;
+ return true;
+ } else {
+ return false;
+ }
+}
+
+/* static */
+string DeviceNameUtils::FullName(const string& job, int replica, int task,
+ const string& type, int id) {
+ CHECK(IsJobName(job)) << job;
+ CHECK_LE(0, replica);
+ CHECK_LE(0, task);
+ CHECK(!type.empty());
+ CHECK_LE(0, id);
+ return strings::StrCat("/job:", job, "/replica:", replica, "/task:", task,
+ "/device:", type, ":", id);
+}
+
+bool DeviceNameUtils::ParseFullName(StringPiece fullname, ParsedName* p) {
+ p->Clear();
+ if (fullname == "/") {
+ return true;
+ }
+ StringPiece tmp;
+ while (!fullname.empty()) {
+ if (str_util::ConsumePrefix(&fullname, "/job:")) {
+ p->has_job = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_job && !ConsumeJobName(&fullname, &p->job)) {
+ return false;
+ }
+ } else if (str_util::ConsumePrefix(&fullname, "/replica:")) {
+ p->has_replica = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_replica && !ConsumeNumber(&fullname, &p->replica)) {
+ return false;
+ }
+ } else if (str_util::ConsumePrefix(&fullname, "/task:")) {
+ p->has_task = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_task && !ConsumeNumber(&fullname, &p->task)) {
+ return false;
+ }
+ } else if (str_util::ConsumePrefix(&fullname, "/device:")) {
+ p->has_type = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_type && !ConsumeDeviceType(&fullname, &p->type)) {
+ return false;
+ }
+ if (!str_util::ConsumePrefix(&fullname, ":")) {
+ p->has_id = false;
+ } else {
+ p->has_id = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_id && !ConsumeNumber(&fullname, &p->id)) {
+ return false;
+ }
+ }
+
+ } else if (str_util::ConsumePrefix(&fullname, "/cpu:") ||
+ str_util::ConsumePrefix(&fullname, "/CPU:")) {
+ p->has_type = true;
+ p->type = "CPU"; // Treat '/cpu:..' as uppercase '/device:CPU:...'
+ p->has_id = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_id && !ConsumeNumber(&fullname, &p->id)) {
+ return false;
+ }
+ } else if (str_util::ConsumePrefix(&fullname, "/gpu:") ||
+ str_util::ConsumePrefix(&fullname, "/GPU:")) {
+ p->has_type = true;
+ p->type = "GPU"; // Treat '/gpu:..' as uppercase '/device:GPU:...'
+ p->has_id = !str_util::ConsumePrefix(&fullname, "*");
+ if (p->has_id && !ConsumeNumber(&fullname, &p->id)) {
+ return false;
+ }
+ } else {
+ return false;
+ }
+ }
+ return true;
+}
+
+/* static */
+string DeviceNameUtils::ParsedNameToString(const ParsedName& pn) {
+ string buf;
+ if (pn.has_job) strings::StrAppend(&buf, "/job:", pn.job);
+ if (pn.has_replica) strings::StrAppend(&buf, "/replica:", pn.replica);
+ if (pn.has_task) strings::StrAppend(&buf, "/task:", pn.task);
+ if (pn.has_type) {
+ strings::StrAppend(&buf, "/", pn.type, ":");
+ if (pn.has_id) {
+ strings::StrAppend(&buf, pn.id);
+ } else {
+ strings::StrAppend(&buf, "*");
+ }
+ }
+ return buf;
+}
+
+/* static */
+bool DeviceNameUtils::IsSpecification(const ParsedName& less_specific,
+ const ParsedName& more_specific) {
+ if (less_specific.has_job &&
+ (!more_specific.has_job || (less_specific.job != more_specific.job))) {
+ return false;
+ }
+ if (less_specific.has_replica &&
+ (!more_specific.has_replica ||
+ (less_specific.replica != more_specific.replica))) {
+ return false;
+ }
+ if (less_specific.has_task &&
+ (!more_specific.has_task || (less_specific.task != more_specific.task))) {
+ return false;
+ }
+ if (less_specific.has_type &&
+ (!more_specific.has_type || (less_specific.type != more_specific.type))) {
+ return false;
+ }
+ if (less_specific.has_id &&
+ (!more_specific.has_id || (less_specific.id != more_specific.id))) {
+ return false;
+ }
+ return true;
+}
+
+/* static */
+bool DeviceNameUtils::IsCompleteSpecification(const ParsedName& pattern,
+ const ParsedName& name) {
+ CHECK(name.has_job && name.has_replica && name.has_task && name.has_type &&
+ name.has_id);
+
+ if (pattern.has_job && (pattern.job != name.job)) return false;
+ if (pattern.has_replica && (pattern.replica != name.replica)) return false;
+ if (pattern.has_task && (pattern.task != name.task)) return false;
+ if (pattern.has_type && (pattern.type != name.type)) return false;
+ if (pattern.has_id && (pattern.id != name.id)) return false;
+ return true;
+}
+
+/* static */
+Status DeviceNameUtils::MergeDevNames(ParsedName* target,
+ const ParsedName& other,
+ bool allow_soft_placement) {
+ if (other.has_job) {
+ if (target->has_job && target->job != other.job) {
+ return errors::InvalidArgument(
+ "Cannot merge devices with incompatible jobs: '",
+ ParsedNameToString(*target), "' and '", ParsedNameToString(other),
+ "'");
+ } else {
+ target->has_job = other.has_job;
+ target->job = other.job;
+ }
+ }
+
+ if (other.has_replica) {
+ if (target->has_replica && target->replica != other.replica) {
+ return errors::InvalidArgument(
+ "Cannot merge devices with incompatible replicas: '",
+ ParsedNameToString(*target), "' and '", ParsedNameToString(other),
+ "'");
+ } else {
+ target->has_replica = other.has_replica;
+ target->replica = other.replica;
+ }
+ }
+
+ if (other.has_task) {
+ if (target->has_task && target->task != other.task) {
+ return errors::InvalidArgument(
+ "Cannot merge devices with incompatible tasks: '",
+ ParsedNameToString(*target), "' and '", ParsedNameToString(other),
+ "'");
+ } else {
+ target->has_task = other.has_task;
+ target->task = other.task;
+ }
+ }
+
+ if (other.has_type) {
+ if (target->has_type && target->type != other.type) {
+ if (!allow_soft_placement) {
+ return errors::InvalidArgument(
+ "Cannot merge devices with incompatible types: '",
+ ParsedNameToString(*target), "' and '", ParsedNameToString(other),
+ "'");
+ } else {
+ target->has_id = false;
+ target->has_type = false;
+ return Status::OK();
+ }
+ } else {
+ target->has_type = other.has_type;
+ target->type = other.type;
+ }
+ }
+
+ if (other.has_id) {
+ if (target->has_id && target->id != other.id) {
+ if (!allow_soft_placement) {
+ return errors::InvalidArgument(
+ "Cannot merge devices with incompatible ids: '",
+ ParsedNameToString(*target), "' and '", ParsedNameToString(other),
+ "'");
+ } else {
+ target->has_id = false;
+ return Status::OK();
+ }
+ } else {
+ target->has_id = other.has_id;
+ target->id = other.id;
+ }
+ }
+
+ return Status::OK();
+}
+
+/* static */
+bool DeviceNameUtils::IsSameAddressSpace(const ParsedName& a,
+ const ParsedName& b) {
+ return (a.has_job && b.has_job && (a.job == b.job)) &&
+ (a.has_replica && b.has_replica && (a.replica == b.replica)) &&
+ (a.has_task && b.has_task && (a.task == b.task));
+}
+
+/* static */
+bool DeviceNameUtils::IsSameAddressSpace(StringPiece src, StringPiece dst) {
+ ParsedName x;
+ ParsedName y;
+ return ParseFullName(src, &x) && ParseFullName(dst, &y) &&
+ IsSameAddressSpace(x, y);
+}
+
+/* static */
+string DeviceNameUtils::LocalName(StringPiece type, int id) {
+ return strings::StrCat(type, ":", id);
+}
+
+/* static */
+string DeviceNameUtils::LocalName(StringPiece fullname) {
+ ParsedName x;
+ CHECK(ParseFullName(fullname, &x)) << fullname;
+ return LocalName(x.type, x.id);
+}
+
+/* static */
+bool DeviceNameUtils::ParseLocalName(StringPiece name, ParsedName* p) {
+ if (!ConsumeDeviceType(&name, &p->type)) {
+ return false;
+ }
+ if (!str_util::ConsumePrefix(&name, ":")) {
+ return false;
+ }
+ if (!ConsumeNumber(&name, &p->id)) {
+ return false;
+ }
+ return name.empty();
+}
+
+/* static */
+bool DeviceNameUtils::SplitDeviceName(StringPiece name, string* task,
+ string* device) {
+ ParsedName pn;
+ if (ParseFullName(name, &pn) && pn.has_type && pn.has_id) {
+ *task = strings::StrCat(
+ (pn.has_job ? strings::StrCat("/job:", pn.job) : ""),
+ (pn.has_replica ? strings::StrCat("/replica:", pn.replica) : ""),
+ (pn.has_task ? strings::StrCat("/task:", pn.task) : ""));
+ *device = strings::StrCat(pn.type, ":", pn.id);
+ return true;
+ }
+ return false;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/device_name_utils.h b/tensorflow/core/util/device_name_utils.h
new file mode 100644
index 0000000000..8b0a24ed0d
--- /dev/null
+++ b/tensorflow/core/util/device_name_utils.h
@@ -0,0 +1,141 @@
+#ifndef TENSORFLOW_UTIL_DEVICE_NAME_UTILS_H_
+#define TENSORFLOW_UTIL_DEVICE_NAME_UTILS_H_
+
+#include <string>
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// In TensorFlow a device name is a string of the following form:
+// /job:<name>/replica:<replica>/task:<task>/device:<type>:<device_num>
+//
+// <name> is a short identifier conforming to the regexp
+// [a-zA-Z][_a-zA-Z0-9]*
+// <type> is a supported device type (e.g. 'cpu' or 'gpu')
+// <replica>, <task>, <device_num> are small non-negative integers and are
+// densely allocated (except in tests).
+//
+// For some purposes, we also allow device patterns, which can specify
+// some or none of the specific fields above, with a missing component or
+// "<component>:*" indicating "any value allowed for that component".
+//
+// For example:
+// "/job:param_server" - Consider any devices in the "param_server" job
+// "/device:cpu:*" - Consider any cpu devices in any job/task/replica
+// "/job:*/replica:*/task:*/device:cpu:*" - Consider any cpu devices in any
+// job/task/replica
+// "/job:w/replica:0/task:0/device:gpu:*" - Consider any gpu devices in
+// replica 0, task 0, of job "w"
+class DeviceNameUtils {
+ public:
+ // Returns a fully qualified device name given the parameters.
+ static string FullName(const string& job, int replica, int task,
+ const string& type, int id);
+
+ struct ParsedName {
+ void Clear() {
+ has_job = false;
+ has_replica = false;
+ has_task = false;
+ has_type = false;
+ has_id = false;
+ job.clear();
+ replica = 0;
+ task = 0;
+ type.clear();
+ id = 0;
+ }
+
+ bool operator==(const ParsedName& other) const {
+ return (has_job ? (other.has_job && job == other.job) : !other.has_job) &&
+ (has_replica ? (other.has_replica && replica == other.replica)
+ : !other.has_replica) &&
+ (has_task ? (other.has_task && task == other.task)
+ : !other.has_task) &&
+ (has_type ? (other.has_type && type == other.type)
+ : !other.has_type) &&
+ (has_id ? (other.has_id && id == other.id) : !other.has_id);
+ }
+
+ bool has_job = false;
+ string job;
+ bool has_replica = false;
+ int replica = 0;
+ bool has_task = false;
+ int task = 0;
+ bool has_type = false;
+ string type;
+ bool has_id = false;
+ int id = 0;
+ };
+ // Parses "fullname" into "*parsed". Returns true iff succeeds.
+ static bool ParseFullName(StringPiece fullname, ParsedName* parsed);
+
+ // Returns true if "name" specifies any non-trivial constraint on the device.
+ static bool HasSomeDetails(const ParsedName& name) {
+ return name.has_job || name.has_replica || name.has_task || name.has_type ||
+ name.has_id;
+ }
+
+ // Returns true if more_specific is a specification of
+ // less_specific, i.e. everywhere that less-specific has a
+ // non-wildcard component value, more_specific has the same value
+ // for that component.
+ static bool IsSpecification(const ParsedName& less_specific,
+ const ParsedName& more_specific);
+
+ // Like IsSpecification, but the second argument "name" must have a
+ // non-wildcard value for all of its components.
+ static bool IsCompleteSpecification(const ParsedName& pattern,
+ const ParsedName& name);
+
+ // True iff there exists any possible complete device name that is
+ // a specification of both "a" and "b".
+ static inline bool AreCompatibleDevNames(const ParsedName& a,
+ const ParsedName& b) {
+ return IsSpecification(a, b) || IsSpecification(b, a);
+ }
+
+ // Merges the device specifications in "*target" and "other", and
+ // stores the result in "*target". Returns OK if "*target" and
+ // "other" are compatible, otherwise returns an error.
+ static Status MergeDevNames(ParsedName* target, const ParsedName& other) {
+ return MergeDevNames(target, other, false);
+ }
+ static Status MergeDevNames(ParsedName* target, const ParsedName& other,
+ bool allow_soft_placement);
+
+ // Returns true iff devices identified by 'src' and 'dst' are in the
+ // same address space.
+ static bool IsSameAddressSpace(StringPiece src, StringPiece dst);
+ static bool IsSameAddressSpace(const ParsedName& src, const ParsedName& dst);
+
+ // Returns the local device given its "type" and "id".
+ static string LocalName(StringPiece type, int id);
+
+ // Returns a short local device name (cpu:0, gpu:1, etc) based on
+ // the given fullname.
+ static string LocalName(StringPiece fullname);
+
+ // If "name" is a valid local device name (cpu:0, gpu:1, etc.),
+ // fills in parsed.type and parsed.id accordingly. Returns true iff
+ // succeeds.
+ static bool ParseLocalName(StringPiece name, ParsedName* parsed);
+
+ // Splits a fully-qualified device name into a task identifier and a
+ // relative device identifier. It first parses "name" using
+ // ParseFullName(), then assigns *task with everything except for
+ // the local device component, and assigns the relative device
+ // component into *device. This function will still return true if
+ // the task component is empty, but it requires the relative device
+ // component to be fully specified.
+ static bool SplitDeviceName(StringPiece name, string* task, string* device);
+
+ static string ParsedNameToString(const ParsedName& pn);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_DEVICE_NAME_UTILS_H_
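A short sketch of the naming scheme described above, mirroring the unit test
below; the logging header is pulled in only for CHECK/CHECK_EQ:

  #include "tensorflow/core/platform/logging.h"
  #include "tensorflow/core/util/device_name_utils.h"

  void DeviceNameExample() {
    using tensorflow::DeviceNameUtils;
    const std::string full = DeviceNameUtils::FullName(
        "worker", /*replica=*/0, /*task=*/1, "CPU", /*id=*/2);
    // full == "/job:worker/replica:0/task:1/device:CPU:2"

    // Parse it back; the short '/cpu:0' and '/gpu:1' aliases are also accepted.
    DeviceNameUtils::ParsedName parsed;
    CHECK(DeviceNameUtils::ParseFullName(full, &parsed));
    CHECK(parsed.has_type);
    CHECK_EQ(parsed.type, "CPU");
    CHECK_EQ(parsed.id, 2);

    // Reduce to the task-local form ("CPU:2") used for per-task device lookup.
    CHECK_EQ(DeviceNameUtils::LocalName(full), "CPU:2");
  }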
diff --git a/tensorflow/core/util/device_name_utils_test.cc b/tensorflow/core/util/device_name_utils_test.cc
new file mode 100644
index 0000000000..14f30d6de5
--- /dev/null
+++ b/tensorflow/core/util/device_name_utils_test.cc
@@ -0,0 +1,369 @@
+#include "tensorflow/core/util/device_name_utils.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+TEST(DeviceNameUtilsTest, Basic) {
+ EXPECT_EQ(DeviceNameUtils::FullName("hello", 1, 2, "CPU", 3),
+ "/job:hello/replica:1/task:2/device:CPU:3");
+
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_FALSE(DeviceNameUtils::ParseFullName("foobar", &p));
+ EXPECT_FALSE(
+ DeviceNameUtils::ParseFullName("/job:123/replica:1/task:2/gpu:3", &p));
+ EXPECT_FALSE(
+ DeviceNameUtils::ParseFullName("/job:123/replica:1/task:2/gpu:", &p));
+ EXPECT_FALSE(DeviceNameUtils::ParseFullName(
+ "/job:123/replica:1/task:2/device:gpu:", &p));
+ EXPECT_FALSE(
+ DeviceNameUtils::ParseFullName("/job:foo/replica:-1/task:2/gpu:3", &p));
+ EXPECT_FALSE(
+ DeviceNameUtils::ParseFullName("/job:foo/replica:1/task:-2/gpu:3", &p));
+ EXPECT_FALSE(
+ DeviceNameUtils::ParseFullName("/job:foo/replica:1/task:2/bar:3", &p));
+ EXPECT_FALSE(DeviceNameUtils::ParseFullName(
+ "/job:foo/replica:1/task:2/gpu:3/extra", &p));
+ EXPECT_TRUE(
+ DeviceNameUtils::ParseFullName("/job:foo/replica:1/task:2/gpu:3", &p));
+ EXPECT_TRUE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_TRUE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_TRUE(p.has_id);
+ EXPECT_EQ(p.job, "foo");
+ EXPECT_EQ(p.replica, 1);
+ EXPECT_EQ(p.task, 2);
+ EXPECT_EQ(p.type, "GPU");
+ EXPECT_EQ(p.id, 3);
+ }
+ {
+ // Allow _ in job names.
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(
+ "/job:foo_bar/replica:1/task:2/gpu:3", &p));
+ EXPECT_TRUE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_TRUE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_TRUE(p.has_id);
+ EXPECT_EQ(p.job, "foo_bar");
+ EXPECT_EQ(p.replica, 1);
+ EXPECT_EQ(p.task, 2);
+ EXPECT_EQ(p.type, "GPU");
+ EXPECT_EQ(p.id, 3);
+ }
+ {
+ // Allow _ in job names.
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(
+ "/job:foo_bar/replica:1/task:2/device:GPU:3", &p));
+ EXPECT_TRUE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_TRUE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_TRUE(p.has_id);
+ EXPECT_EQ(p.job, "foo_bar");
+ EXPECT_EQ(p.replica, 1);
+ EXPECT_EQ(p.task, 2);
+ EXPECT_EQ(p.type, "GPU");
+ EXPECT_EQ(p.id, 3);
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName("/job:*/replica:4/gpu:*", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_FALSE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "GPU");
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(
+ DeviceNameUtils::ParseFullName("/job:*/replica:4/device:GPU:*", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_FALSE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "GPU");
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(
+ DeviceNameUtils::ParseFullName("/job:*/device:GPU/replica:4", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_FALSE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "GPU");
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(
+ "/job:*/replica:4/device:myspecialdevice:13", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_TRUE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "myspecialdevice");
+ EXPECT_EQ(p.id, 13);
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName("/", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_FALSE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_FALSE(p.has_type);
+ EXPECT_FALSE(p.has_id);
+ }
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName("/job:*/replica:4/gpu:5", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_TRUE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "GPU");
+ EXPECT_EQ(p.id, 5);
+ }
+ { // Same result if we reorder the components
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName("/gpu:*/job:*/replica:4", &p));
+ EXPECT_FALSE(p.has_job);
+ EXPECT_TRUE(p.has_replica);
+ EXPECT_FALSE(p.has_task);
+ EXPECT_TRUE(p.has_type);
+ EXPECT_FALSE(p.has_id);
+ EXPECT_EQ(p.replica, 4);
+ EXPECT_EQ(p.type, "GPU");
+ }
+
+ EXPECT_TRUE(DeviceNameUtils::IsSameAddressSpace(
+ "/job:foo/replica:1/task:2/cpu:3", "/job:foo/replica:1/task:2/gpu:4"));
+ EXPECT_FALSE(DeviceNameUtils::IsSameAddressSpace(
+ "/job:foo/replica:1/task:2/cpu:3", "/job:foo/replica:1/task:3/gpu:4"));
+ EXPECT_FALSE(DeviceNameUtils::IsSameAddressSpace(
+ "/job:foo/replica:1/task:2/cpu:3", "/job:foo/replica:10/task:2/gpu:4"));
+ EXPECT_FALSE(DeviceNameUtils::IsSameAddressSpace(
+ "/job:foo/replica:1/task:2/cpu:3", "/job:bar/replica:1/task:2/gpu:4"));
+
+ EXPECT_EQ(DeviceNameUtils::LocalName("CPU", 1), "CPU:1");
+ EXPECT_EQ(DeviceNameUtils::LocalName("GPU", 2), "GPU:2");
+ EXPECT_EQ(DeviceNameUtils::LocalName("MySpecialDevice", 13),
+ "MySpecialDevice:13");
+
+ EXPECT_EQ(
+ DeviceNameUtils::LocalName("/job:foo/replica:1/task:2/device:CPU:3"),
+ "CPU:3");
+
+ EXPECT_EQ(DeviceNameUtils::LocalName("/job:foo/replica:1/task:2/cpu:3"),
+ "CPU:3");
+
+ EXPECT_EQ(
+ DeviceNameUtils::LocalName("/job:foo/replica:1/task:2/device:abc:73"),
+ "abc:73");
+
+ {
+ DeviceNameUtils::ParsedName p;
+ EXPECT_TRUE(DeviceNameUtils::ParseLocalName("CPU:10", &p));
+ EXPECT_EQ(p.type, "CPU");
+ EXPECT_EQ(p.id, 10);
+ EXPECT_FALSE(DeviceNameUtils::ParseLocalName("cpu:abc", &p));
+ EXPECT_FALSE(DeviceNameUtils::ParseLocalName("abc:", &p));
+ EXPECT_FALSE(DeviceNameUtils::ParseLocalName("abc", &p));
+ EXPECT_FALSE(DeviceNameUtils::ParseLocalName("myspecialdevice", &p));
+ }
+}
+
+static bool IsCSHelper(StringPiece pattern, StringPiece actual) {
+ DeviceNameUtils::ParsedName p, a;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(pattern, &p));
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(actual, &a));
+ return DeviceNameUtils::IsCompleteSpecification(p, a);
+}
+
+TEST(DeviceNameUtilsTest, IsCompleteSpecification) {
+ EXPECT_TRUE(IsCSHelper("/job:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(
+ IsCSHelper("/job:*/replica:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsCSHelper("/job:*/task:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsCSHelper("/job:*/replica:*/task:*",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(
+ IsCSHelper("/job:*/replica:*/gpu:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_FALSE(IsCSHelper("/cpu:*", "/job:worker/replica:1/task:2/gpu:3"));
+ EXPECT_FALSE(IsCSHelper("/gpu:2", "/job:worker/replica:1/task:2/gpu:1"));
+ EXPECT_TRUE(IsCSHelper("/gpu:*", "/job:worker/replica:1/task:2/gpu:3"));
+}
+
+static bool IsSpecHelper(StringPiece pattern, StringPiece actual) {
+ DeviceNameUtils::ParsedName p, a;
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(pattern, &p));
+ EXPECT_TRUE(DeviceNameUtils::ParseFullName(actual, &a));
+ return DeviceNameUtils::IsSpecification(p, a);
+}
+
+TEST(DeviceNameUtilsTest, IsSpecification) {
+ EXPECT_TRUE(IsSpecHelper("/job:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:*", "/job:work/replica:1/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:*", "/job:work/replica:1"));
+ EXPECT_TRUE(IsSpecHelper("/job:*", "/replica:1"));
+ EXPECT_TRUE(IsSpecHelper("/job:*", "/job:work"));
+ EXPECT_TRUE(
+ IsSpecHelper("/job:*/replica:*", "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:work/replica:1/gpu:*",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:work/replica:1/gpu:3",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:work/replica:1/task:2",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/job:work/replica:*/task:2",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/task:*", "/job:*/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/task:2", "/job:*/replica:1/task:2/gpu:3"));
+ EXPECT_TRUE(IsSpecHelper("/cpu:*", "/job:*/replica:1/task:2/cpu:1"));
+ EXPECT_TRUE(IsSpecHelper("/cpu:0", "/cpu:0"));
+ EXPECT_TRUE(IsSpecHelper("/gpu:*", "/job:worker/replica:1/task:2/gpu:3"));
+
+ EXPECT_FALSE(IsSpecHelper("/job:worker/replica:1/task:2/gpu:3", "/gpu:*"));
+ EXPECT_FALSE(IsSpecHelper("/cpu:*", "/job:*/replica:1/task:2"));
+ EXPECT_FALSE(IsSpecHelper("/cpu:*", "/job:*/replica:1/task:2/gpu:1"));
+ EXPECT_FALSE(IsSpecHelper("/cpu:*", "/job:worker/replica:1/task:2/gpu:3"));
+ EXPECT_FALSE(IsSpecHelper("/gpu:2", "/job:worker/replica:1/task:2/gpu:1"));
+ EXPECT_FALSE(IsSpecHelper("/job:work/replica:*/task:0",
+ "/job:work/replica:1/task:2/gpu:3"));
+ EXPECT_FALSE(IsSpecHelper("/job:work/replica:0/task:2",
+ "/job:work/replica:*/task:2/gpu:3"));
+}
+
+TEST(DeviceNameUtilsTest, SplitDeviceName) {
+ string task;
+ string device;
+ EXPECT_TRUE(DeviceNameUtils::SplitDeviceName(
+ "/job:foo/replica:1/task:2/cpu:1", &task, &device));
+ EXPECT_EQ("/job:foo/replica:1/task:2", task);
+ EXPECT_EQ("CPU:1", device);
+ EXPECT_TRUE(DeviceNameUtils::SplitDeviceName(
+ "/job:foo/cpu:1/task:2/replica:1", &task, &device));
+ EXPECT_EQ("/job:foo/replica:1/task:2", task);
+ EXPECT_EQ("CPU:1", device);
+ EXPECT_TRUE(DeviceNameUtils::SplitDeviceName("/gpu:3", &task, &device));
+ EXPECT_EQ("", task);
+ EXPECT_EQ("GPU:3", device);
+ EXPECT_FALSE(DeviceNameUtils::SplitDeviceName("gpu:3", &task, &device));
+ EXPECT_FALSE(DeviceNameUtils::SplitDeviceName("/job:foo/task:2/replica:1",
+ &task, &device));
+ EXPECT_TRUE(DeviceNameUtils::SplitDeviceName("/device:myspecialdevice:3",
+ &task, &device));
+ EXPECT_EQ("", task);
+ EXPECT_EQ("myspecialdevice:3", device);
+}
+
+static DeviceNameUtils::ParsedName Name(const string& str) {
+ DeviceNameUtils::ParsedName ret;
+ CHECK(DeviceNameUtils::ParseFullName(str, &ret)) << "Invalid name: " << str;
+ return ret;
+}
+
+static void MergeDevNamesHelperImpl(const string& name_a, const string& name_b,
+ const string& expected_merge_name,
+ bool allow_soft_placement) {
+ DeviceNameUtils::ParsedName target_a = Name(name_a);
+ EXPECT_OK(DeviceNameUtils::MergeDevNames(&target_a, Name(name_b),
+ allow_soft_placement));
+ DeviceNameUtils::ParsedName target_b = Name(name_b);
+ EXPECT_OK(DeviceNameUtils::MergeDevNames(&target_b, Name(name_a),
+ allow_soft_placement));
+ EXPECT_EQ(target_a, target_b);
+ EXPECT_EQ(target_a, Name(expected_merge_name));
+ EXPECT_EQ(target_b, Name(expected_merge_name));
+}
+
+static void MergeDevNamesHelper(const string& name_a, const string& name_b,
+ const string& expected_merge_name) {
+ MergeDevNamesHelperImpl(name_a, name_b, expected_merge_name, false);
+}
+
+static void MergeDevNamesHelperAllowSoftPlacement(
+ const string& name_a, const string& name_b,
+ const string& expected_merge_name) {
+ MergeDevNamesHelperImpl(name_a, name_b, expected_merge_name, true);
+}
+
+static void MergeDevNamesError(const string& name_a, const string& name_b,
+ const string& expected_error_substr) {
+ DeviceNameUtils::ParsedName target_a = Name(name_a);
+ Status s = DeviceNameUtils::MergeDevNames(&target_a, Name(name_b));
+ EXPECT_EQ(s.code(), error::INVALID_ARGUMENT);
+ EXPECT_TRUE(StringPiece(s.error_message()).contains(expected_error_substr))
+ << s;
+}
+
+TEST(DeviceNameUtilsTest, MergeDevNames) {
+ DeviceNameUtils::ParsedName target;
+
+ // Idempotence tests.
+ MergeDevNamesHelper("", "", "");
+ MergeDevNamesHelper("/job:foo/replica:1/task:2/cpu:1",
+ "/job:foo/replica:1/task:2/cpu:1",
+ "/job:foo/replica:1/task:2/cpu:1");
+
+ // Merging with empty device has no effect.
+ MergeDevNamesHelper("", "/job:foo", "/job:foo");
+ MergeDevNamesHelper("", "/replica:2", "/replica:2");
+ MergeDevNamesHelper("", "/task:7", "/task:7");
+ // MergeDevNamesHelper("", "/gpu:1", "/gpu:1");
+
+ // Combining disjoint names.
+ MergeDevNamesHelper("/job:foo", "/task:7", "/job:foo/task:7");
+ MergeDevNamesHelper("/job:foo", "/gpu:1", "/job:foo/gpu:1");
+
+ // Combining overlapping names.
+ MergeDevNamesHelper("/job:foo/replica:0", "/replica:0/task:1",
+ "/job:foo/replica:0/task:1");
+
+ // Wildcard tests.
+ MergeDevNamesHelper("", "/gpu:*", "/gpu:*");
+ MergeDevNamesHelper("/gpu:*", "/gpu:*", "/gpu:*");
+ MergeDevNamesHelper("/gpu:1", "/gpu:*", "/gpu:1");
+
+ // Incompatible components.
+ MergeDevNamesError("/job:foo", "/job:bar", "incompatible jobs");
+ MergeDevNamesError("/replica:0", "/replica:1", "incompatible replicas");
+ MergeDevNamesError("/task:0", "/task:1", "incompatible tasks");
+ MergeDevNamesError("/gpu:*", "/cpu:*", "incompatible types");
+ MergeDevNamesError("/gpu:0", "/gpu:1", "incompatible ids");
+}
+
+TEST(DeviceNameUtilsTest, MergeDevNamesAllowSoftPlacement) {
+ // Incompatible components with allow_soft_placement.
+ MergeDevNamesHelperAllowSoftPlacement("/gpu:*", "/cpu:1", "");
+ MergeDevNamesHelperAllowSoftPlacement("/cpu:*", "/gpu:1", "");
+ MergeDevNamesHelperAllowSoftPlacement("/gpu:1", "/gpu:2", "/gpu:*");
+}
+
+static void BM_ParseFullName(int iters) {
+ DeviceNameUtils::ParsedName p;
+ while (iters--) {
+ DeviceNameUtils::ParseFullName("/job:worker/replica:3/task:0/cpu:0", &p);
+ }
+}
+BENCHMARK(BM_ParseFullName);
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/event.proto b/tensorflow/core/util/event.proto
new file mode 100644
index 0000000000..5d67823ce7
--- /dev/null
+++ b/tensorflow/core/util/event.proto
@@ -0,0 +1,29 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/graph.proto";
+import "tensorflow/core/framework/summary.proto";
+
+// Protocol buffer representing an event that happened during
+// the execution of a Brain model.
+message Event {
+ // Timestamp of the event.
+ double wall_time = 1;
+
+  // Global step of the event.
+ int64 step = 2;
+
+ oneof what {
+ // An event file was started, with the specified version.
+    // This is used to identify the contents of the record IO files
+    // easily. The current version written by EventsWriter is
+    // "brain.Event:1". All versions start with "brain.Event:".
+ string file_version = 3;
+ // A model was constructed.
+ GraphDef graph_def = 4;
+ // A summary was generated.
+ Summary summary = 5;
+ }
+}
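A small sketch of how the message above is used from C++, assuming the
standard proto3 generated accessors for this file (set_wall_time, set_step,
set_file_version, what_case, mutable_summary):

  #include "tensorflow/core/platform/logging.h"
  #include "tensorflow/core/util/event.pb.h"

  void BuildEvent() {
    tensorflow::Event event;
    event.set_wall_time(42.0);
    event.set_step(7);
    // The "what" oneof carries exactly one payload at a time; setting a new
    // member clears the previous one.
    event.set_file_version("brain.Event:1");
    CHECK_EQ(event.what_case(), tensorflow::Event::kFileVersion);
    event.mutable_summary();  // switches the payload from file_version to summary
    CHECK(event.file_version().empty());
  }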
diff --git a/tensorflow/core/util/events_writer.cc b/tensorflow/core/util/events_writer.cc
new file mode 100644
index 0000000000..1b34a36577
--- /dev/null
+++ b/tensorflow/core/util/events_writer.cc
@@ -0,0 +1,144 @@
+#include "tensorflow/core/util/events_writer.h"
+
+#include <stddef.h> // for NULL
+
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/lib/strings/stringprintf.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/event.pb.h"
+
+namespace tensorflow {
+
+EventsWriter::EventsWriter(const string& file_prefix)
+ // TODO(jeff,sanjay): Pass in env and use that here instead of Env::Default
+ : env_(Env::Default()),
+ file_prefix_(file_prefix),
+ num_outstanding_events_(0) {}
+
+bool EventsWriter::Init() {
+ if (recordio_writer_.get() != nullptr) {
+ CHECK(!filename_.empty());
+ if (FileHasDisappeared()) {
+ // Warn user of data loss and let .reset() below do basic cleanup.
+ if (num_outstanding_events_ > 0) {
+        LOG(WARNING) << "Re-initialization, attempting to open a new file, "
+ << num_outstanding_events_ << " events will be lost.";
+ }
+ } else {
+ // No-op: File is present and writer is initialized.
+ return true;
+ }
+ }
+
+ int64 time_in_seconds = env_->NowMicros() / 1000000;
+
+ filename_ = strings::Printf(
+ "%s.out.tfevents.%010lld.%s", file_prefix_.c_str(),
+ static_cast<long long>(time_in_seconds), port::Hostname().c_str());
+ port::AdjustFilenameForLogging(&filename_);
+
+ WritableFile* file;
+ Status s = env_->NewWritableFile(filename_, &file);
+ if (!s.ok()) {
+ LOG(ERROR) << "Could not open events file: " << filename_ << ": " << s;
+ return false;
+ }
+ recordio_file_.reset(file);
+ recordio_writer_.reset(new io::RecordWriter(recordio_file_.get()));
+ if (recordio_writer_.get() == NULL) {
+ LOG(ERROR) << "Could not create record writer";
+ return false;
+ }
+ num_outstanding_events_ = 0;
+ VLOG(1) << "Successfully opened events file: " << filename_;
+ {
+ // Write the first event with the current version, and flush
+ // right away so the file contents will be easily determined.
+
+ Event event;
+ event.set_wall_time(time_in_seconds);
+ event.set_file_version(strings::StrCat(kVersionPrefix, kCurrentVersion));
+ WriteEvent(event);
+ Flush();
+ }
+ return true;
+}
+
+string EventsWriter::FileName() {
+ if (filename_.empty()) {
+ Init();
+ }
+ return filename_;
+}
+
+void EventsWriter::WriteSerializedEvent(const string& event_str) {
+ if (recordio_writer_.get() == NULL) {
+ if (!Init()) {
+ LOG(ERROR) << "Write failed because file could not be opened.";
+ return;
+ }
+ }
+ num_outstanding_events_++;
+ recordio_writer_->WriteRecord(event_str);
+}
+
+void EventsWriter::WriteEvent(const Event& event) {
+ string record;
+ event.AppendToString(&record);
+ WriteSerializedEvent(record);
+}
+
+bool EventsWriter::Flush() {
+ if (num_outstanding_events_ == 0) return true;
+ CHECK(recordio_file_.get() != NULL) << "Unexpected NULL file";
+ // The FileHasDisappeared() condition is necessary because
+ // recordio_writer_->Sync() can return true even if the underlying
+ // file has been deleted. EventWriter.FileDeletionBeforeWriting
+ // demonstrates this and will fail if the FileHasDisappeared()
+  // condition is removed.
+ // Also, we deliberately attempt to Sync() before checking for a
+ // disappearing file, in case for some file system File::Exists() is
+ // false after File::Open() but before File::Sync().
+ if (!recordio_file_->Flush().ok() || !recordio_file_->Sync().ok() ||
+ FileHasDisappeared()) {
+ LOG(ERROR) << "Failed to flush " << num_outstanding_events_ << " events to "
+ << filename_;
+ return false;
+ }
+ VLOG(1) << "Wrote " << num_outstanding_events_ << " events to disk.";
+ num_outstanding_events_ = 0;
+ return true;
+}
+
+bool EventsWriter::Close() {
+ bool return_value = Flush();
+ if (recordio_file_.get() != NULL) {
+ Status s = recordio_file_->Close();
+ if (!s.ok()) {
+ LOG(ERROR) << "Error when closing previous event file: " << filename_
+ << ": " << s;
+ return_value = false;
+ }
+ recordio_writer_.reset(NULL);
+ recordio_file_.reset(NULL);
+ }
+ num_outstanding_events_ = 0;
+ return return_value;
+}
+
+bool EventsWriter::FileHasDisappeared() {
+ if (env_->FileExists(filename_)) {
+ return false;
+ } else {
+ // This can happen even with non-null recordio_writer_ if some other
+ // process has removed the file.
+ LOG(ERROR) << "The events file " << filename_ << " has disappeared.";
+ return true;
+ }
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/events_writer.h b/tensorflow/core/util/events_writer.h
new file mode 100644
index 0000000000..e6b94ad265
--- /dev/null
+++ b/tensorflow/core/util/events_writer.h
@@ -0,0 +1,77 @@
+#ifndef TENSORFLOW_UTIL_EVENTS_WRITER_H_
+#define TENSORFLOW_UTIL_EVENTS_WRITER_H_
+
+#include <memory>
+#include <string>
+#include "tensorflow/core/lib/io/record_writer.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/util/event.pb.h"
+
+namespace tensorflow {
+
+class EventsWriter {
+ public:
+#ifndef SWIG
+ // Prefix of version string present in the first entry of every event file.
+ static constexpr const char* kVersionPrefix = "brain.Event:";
+ static constexpr const int kCurrentVersion = 1;
+#endif
+
+ // Events files typically have a name of the form
+ // '/some/file/path/my.file.out.events.[timestamp].[hostname]'
+// To create an EventsWriter, the user should provide file_prefix =
+ // '/some/file/path/my.file'
+ // The EventsWriter will append '.out.events.[timestamp].[hostname]'
+ // to the ultimate filename once Init() is called.
+// Note that it is not recommended to have two EventsWriters
+// simultaneously writing to the same file_prefix.
+ explicit EventsWriter(const string& file_prefix);
+ ~EventsWriter() { Close(); } // Autoclose in destructor.
+
+  // Sets the event file filename and opens the file for writing. If not
+  // called by the user, it will be invoked automatically by a call to
+  // FileName() or Write*(). Returns false if the file could not be opened.
+  // Idempotent: if the file exists and is open, this is a no-op. If, on the
+  // other hand, the file was opened but has since disappeared (e.g. deleted
+  // by another process), this will open a new file with a new timestamp in
+  // its filename.
+ bool Init();
+
+ // Returns the filename for the current events file:
+ // filename_ = [file_prefix_].out.events.[timestamp].[hostname]
+ string FileName();
+
+ // Append "event" to the file. The "tensorflow::" part is for swig happiness.
+ void WriteEvent(const tensorflow::Event& event);
+
+ // Append "event_str", a serialized Event, to the file.
+ // Note that this function does NOT check that de-serializing event_str
+ // results in a valid Event proto.
+ void WriteSerializedEvent(const string& event_str);
+
+  // The EventsWriter automatically flushes and closes on destruction, but
+  // these two methods are provided for users who want to write to disk sooner
+  // and/or check for success.
+  // Flush() pushes outstanding events to disk. Returns false if the
+  // events file could not be created, or if the file exists but could not
+  // be written to.
+  // Close() calls Flush() and then closes the current events file.
+  // Returns true only if both the flush and the close were successful.
+ bool Flush();
+ bool Close();
+
+ private:
+  bool FileHasDisappeared();  // True if filename_ does not exist.
+
+ Env* env_;
+ const string file_prefix_;
+ string filename_;
+ std::unique_ptr<WritableFile> recordio_file_;
+ std::unique_ptr<io::RecordWriter> recordio_writer_;
+ int num_outstanding_events_;
+ TF_DISALLOW_COPY_AND_ASSIGN(EventsWriter);
+};
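+
+// A minimal usage sketch (illustrative only; the prefix "/tmp/run/my.file"
+// and the event "ev" are assumed to be supplied by the caller):
+//
+//   EventsWriter writer("/tmp/run/my.file");
+//   writer.WriteEvent(ev);   // Init() is invoked implicitly on first write.
+//   CHECK(writer.Flush());   // Push outstanding events to disk now.
+//   CHECK(writer.Close());   // Or simply rely on the destructor.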
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_EVENTS_WRITER_H_
diff --git a/tensorflow/core/util/events_writer_test.cc b/tensorflow/core/util/events_writer_test.cc
new file mode 100644
index 0000000000..f6523ead92
--- /dev/null
+++ b/tensorflow/core/util/events_writer_test.cc
@@ -0,0 +1,198 @@
+#include "tensorflow/core/util/events_writer.h"
+
+#include <math.h>
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/io/record_reader.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/event.pb.h"
+
+namespace tensorflow {
+namespace {
+
+// shorthand
+Env* env() { return Env::Default(); }
+
+void WriteSimpleValue(EventsWriter* writer, double wall_time, int64 step,
+ const string& tag, float simple_value) {
+ Event event;
+ event.set_wall_time(wall_time);
+ event.set_step(step);
+ Summary::Value* summ_val = event.mutable_summary()->add_value();
+ summ_val->set_tag(tag);
+ summ_val->set_simple_value(simple_value);
+ writer->WriteEvent(event);
+}
+
+void WriteFile(EventsWriter* writer) {
+ WriteSimpleValue(writer, 1234, 34, "foo", 3.14159);
+ WriteSimpleValue(writer, 2345, 35, "bar", -42);
+}
+
+static bool ReadEventProto(io::RecordReader* reader, uint64* offset,
+ Event* proto) {
+ string record;
+ Status s = reader->ReadRecord(offset, &record);
+ if (!s.ok()) {
+ return false;
+ }
+ return ParseProtoUnlimited(proto, record);
+}
+
+void VerifyFile(const string& filename) {
+ CHECK(env()->FileExists(filename));
+ RandomAccessFile* event_file;
+ TF_CHECK_OK(env()->NewRandomAccessFile(filename, &event_file));
+ io::RecordReader* reader = new io::RecordReader(event_file);
+
+ uint64 offset = 0;
+
+ Event actual;
+ CHECK(ReadEventProto(reader, &offset, &actual));
+ VLOG(1) << actual.ShortDebugString();
+ // Wall time should be within 5s of now.
+
+ double current_time = env()->NowMicros() / 1000000.0;
+ EXPECT_LT(fabs(actual.wall_time() - current_time), 5);
+ // Should have the current version number.
+ EXPECT_EQ(actual.file_version(),
+ strings::StrCat(EventsWriter::kVersionPrefix,
+ EventsWriter::kCurrentVersion));
+
+ Event expected;
+ CHECK(ReadEventProto(reader, &offset, &actual));
+ VLOG(1) << actual.ShortDebugString();
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "wall_time: 1234 step: 34 "
+ "summary { value { tag: 'foo' simple_value: 3.14159 } }",
+ &expected));
+ // TODO(keveman): Enable this check
+ // EXPECT_THAT(expected, EqualsProto(actual));
+
+ CHECK(ReadEventProto(reader, &offset, &actual));
+ VLOG(1) << actual.ShortDebugString();
+ ASSERT_TRUE(protobuf::TextFormat::ParseFromString(
+ "wall_time: 2345 step: 35 "
+ "summary { value { tag: 'bar' simple_value: -42 } }",
+ &expected));
+ // TODO(keveman): Enable this check
+ // EXPECT_THAT(expected, EqualsProto(actual));
+
+ TF_CHECK_OK(env()->DeleteFile(filename));
+
+ delete reader;
+ delete event_file;
+}
+
+string GetDirName(const string& suffix) {
+ return io::JoinPath(testing::TmpDir(), suffix);
+}
+
+TEST(EventWriter, WriteFlush) {
+ string file_prefix = GetDirName("/writeflush_test");
+ EventsWriter writer(file_prefix);
+ WriteFile(&writer);
+ EXPECT_TRUE(writer.Flush());
+ string filename = writer.FileName();
+ VerifyFile(filename);
+}
+
+TEST(EventWriter, WriteClose) {
+ string file_prefix = GetDirName("/writeclose_test");
+ EventsWriter writer(file_prefix);
+ WriteFile(&writer);
+ EXPECT_TRUE(writer.Close());
+ string filename = writer.FileName();
+ VerifyFile(filename);
+}
+
+TEST(EventWriter, WriteDelete) {
+ string file_prefix = GetDirName("/writedelete_test");
+ EventsWriter* writer = new EventsWriter(file_prefix);
+ WriteFile(writer);
+ string filename = writer->FileName();
+ delete writer;
+ VerifyFile(filename);
+}
+
+TEST(EventWriter, FailFlush) {
+ string file_prefix = GetDirName("/failflush_test");
+ EventsWriter writer(file_prefix);
+ string filename = writer.FileName();
+ WriteFile(&writer);
+ EXPECT_TRUE(env()->FileExists(filename));
+ env()->DeleteFile(filename);
+ EXPECT_FALSE(env()->FileExists(filename));
+ EXPECT_FALSE(writer.Flush());
+ EXPECT_FALSE(env()->FileExists(filename));
+}
+
+TEST(EventWriter, FailClose) {
+ string file_prefix = GetDirName("/failclose_test");
+ EventsWriter writer(file_prefix);
+ string filename = writer.FileName();
+ WriteFile(&writer);
+ EXPECT_TRUE(env()->FileExists(filename));
+ env()->DeleteFile(filename);
+ EXPECT_FALSE(env()->FileExists(filename));
+ EXPECT_FALSE(writer.Close());
+ EXPECT_FALSE(env()->FileExists(filename));
+}
+
+TEST(EventWriter, InitWriteClose) {
+ string file_prefix = GetDirName("/initwriteclose_test");
+ EventsWriter writer(file_prefix);
+ EXPECT_TRUE(writer.Init());
+ string filename0 = writer.FileName();
+ EXPECT_TRUE(env()->FileExists(filename0));
+ WriteFile(&writer);
+ EXPECT_TRUE(writer.Close());
+ string filename1 = writer.FileName();
+ EXPECT_EQ(filename0, filename1);
+ VerifyFile(filename1);
+}
+
+TEST(EventWriter, NameWriteClose) {
+ string file_prefix = GetDirName("/namewriteclose_test");
+ EventsWriter writer(file_prefix);
+ string filename = writer.FileName();
+ EXPECT_TRUE(env()->FileExists(filename));
+ WriteFile(&writer);
+ EXPECT_TRUE(writer.Close());
+ VerifyFile(filename);
+}
+
+TEST(EventWriter, NameClose) {
+ string file_prefix = GetDirName("/nameclose_test");
+ EventsWriter writer(file_prefix);
+ string filename = writer.FileName();
+ EXPECT_TRUE(writer.Close());
+ EXPECT_TRUE(env()->FileExists(filename));
+ env()->DeleteFile(filename);
+}
+
+TEST(EventWriter, FileDeletionBeforeWriting) {
+ string file_prefix = GetDirName("/fdbw_test");
+ EventsWriter writer(file_prefix);
+ string filename0 = writer.FileName();
+ EXPECT_TRUE(env()->FileExists(filename0));
+ env()->SleepForMicroseconds(
+ 2000000); // To make sure timestamp part of filename will differ.
+ env()->DeleteFile(filename0);
+ EXPECT_TRUE(writer.Init()); // Init should reopen file.
+ WriteFile(&writer);
+ EXPECT_TRUE(writer.Flush());
+ string filename1 = writer.FileName();
+ EXPECT_NE(filename0, filename1);
+ VerifyFile(filename1);
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/util/guarded_philox_random.cc b/tensorflow/core/util/guarded_philox_random.cc
new file mode 100644
index 0000000000..4cf58b8979
--- /dev/null
+++ b/tensorflow/core/util/guarded_philox_random.cc
@@ -0,0 +1,39 @@
+#include "tensorflow/core/util/guarded_philox_random.h"
+#include "tensorflow/core/lib/random/random.h"
+
+namespace tensorflow {
+
+Status GuardedPhiloxRandom::Init(OpKernelConstruction* context) {
+ // Grab seed Attrs.
+ int64 seed, seed2;
+ auto status = context->GetAttr("seed", &seed);
+ if (!status.ok()) return status;
+ status = context->GetAttr("seed2", &seed2);
+ if (!status.ok()) return status;
+
+ // Initialize with the given seeds
+ Init(seed, seed2);
+ return Status::OK();
+}
+
+void GuardedPhiloxRandom::Init(int64 seed, int64 seed2) {
+ CHECK(!initialized_);
+ if (seed == 0 && seed2 == 0) {
+ // If both seeds are unspecified, use completely random seeds.
+ seed = random::New64();
+ seed2 = random::New64();
+ }
+ mutex_lock lock(mu_);
+ generator_ = random::PhiloxRandom(seed, seed2);
+ initialized_ = true;
+}
+
+random::PhiloxRandom GuardedPhiloxRandom::ReserveSamples128(int64 samples) {
+ CHECK(initialized_);
+ mutex_lock lock(mu_);
+ auto local = generator_;
+ generator_.Skip(samples);
+ return local;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/guarded_philox_random.h b/tensorflow/core/util/guarded_philox_random.h
new file mode 100644
index 0000000000..6e9cb9f99c
--- /dev/null
+++ b/tensorflow/core/util/guarded_philox_random.h
@@ -0,0 +1,56 @@
+#ifndef TENSORFLOW_KERNELS_GUARDED_PHILOX_RANDOM_H_
+#define TENSORFLOW_KERNELS_GUARDED_PHILOX_RANDOM_H_
+
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// A thread-safe wrapper around a Philox generator. Example usage:
+//
+// GuardedPhiloxRandom generator;
+// generator.Init(context);
+//
+// // In thread-safe code
+// const int samples = ...;
+// auto local_generator = generator.ReserveSamples128(samples);
+// for (int i = 0; i < samples; i++) {
+//   Array<uint32, 4> sample = local_generator();
+//   // Use sample
+// }
+//
+class GuardedPhiloxRandom {
+ public:
+ // Must call Init to finish initialization
+ GuardedPhiloxRandom() : initialized_(false) {}
+
+ // Initialize the generator from attributes "seed" and "seed2".
+ // If both seeds are unspecified, use random seeds.
+ // Must be called exactly once.
+ Status Init(OpKernelConstruction* context);
+
+ // Initialize with given seeds.
+ void Init(int64 seed, int64 seed2);
+
+ // Reserve a certain number of 128-bit samples.
+ // This function is thread safe. The returned generator is valid for the
+ // given number of samples, and can be used without a lock.
+ random::PhiloxRandom ReserveSamples128(int64 samples);
+
+ // Reserve a certain number of 32-bit samples
+ random::PhiloxRandom ReserveSamples32(int64 samples) {
+ return ReserveSamples128((samples + 3) / 4);
+ }
+
+ private:
+ mutex mu_;
+ random::PhiloxRandom generator_ GUARDED_BY(mu_);
+ bool initialized_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(GuardedPhiloxRandom);
+};
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_KERNELS_GUARDED_PHILOX_RANDOM_H_
diff --git a/tensorflow/core/util/padding.cc b/tensorflow/core/util/padding.cc
new file mode 100644
index 0000000000..24273e5ca4
--- /dev/null
+++ b/tensorflow/core/util/padding.cc
@@ -0,0 +1,24 @@
+#include "tensorflow/core/util/padding.h"
+
+#include "tensorflow/core/framework/node_def_util.h"
+#include "tensorflow/core/lib/core/errors.h"
+
+namespace tensorflow {
+
+Status GetNodeAttr(const NodeDef& node_def, const string& attr_name,
+ Padding* value) {
+ string str_value;
+ TF_RETURN_IF_ERROR(GetNodeAttr(node_def, attr_name, &str_value));
+ if (str_value == "SAME") {
+ *value = SAME;
+ } else if (str_value == "VALID") {
+ *value = VALID;
+ } else {
+ return errors::NotFound(str_value, " is not an allowed padding type");
+ }
+ return Status::OK();
+}
+
+string GetPaddingAttrString() { return "padding: {'SAME', 'VALID'}"; }
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/util/padding.h b/tensorflow/core/util/padding.h
new file mode 100644
index 0000000000..66cd96abdb
--- /dev/null
+++ b/tensorflow/core/util/padding.h
@@ -0,0 +1,37 @@
+#ifndef TENSORFLOW_UTIL_PADDING_H_
+#define TENSORFLOW_UTIL_PADDING_H_
+
+// This file contains helper routines to deal with padding in various ops and
+// kernels.
+
+#include <string>
+
+#include "tensorflow/core/framework/graph.pb.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+// Padding: the padding we apply to the input tensor along the rows and columns
+// dimensions. This is usually used to make sure that the spatial dimensions do
+// not shrink when we progress with convolutions. Two types of padding are
+// supported:
+// VALID: No padding is carried out.
+//   SAME: The pad value is computed so that the output will have the same
+//         dimensions as the input when the stride is 1 (in general, the
+//         output size along a dimension is ceil(input_size / stride)).
+// The padded area is zero-filled.
+enum Padding {
+ VALID = 1, // No padding.
+ SAME = 2, // Input and output layers have the same size.
+};
+
+// Returns the string containing the list of valid padding types that can be
+// used as an Attr() in REGISTER_OP.
+string GetPaddingAttrString();
+
+// Specialization to parse an attribute directly into a Padding enum.
+Status GetNodeAttr(const NodeDef& node_def, const string& attr_name,
+ Padding* value);
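+
+// Illustrative sketch (an assumed usage pattern, not part of this header):
+// an op whose padding attr was registered via GetPaddingAttrString() can have
+// the attr parsed from its NodeDef roughly like so:
+//
+//   Padding padding;
+//   TF_RETURN_IF_ERROR(GetNodeAttr(node_def, "padding", &padding));
+//   // padding now holds SAME or VALID.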
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_PADDING_H_
diff --git a/tensorflow/core/util/port.cc b/tensorflow/core/util/port.cc
new file mode 100644
index 0000000000..12eb076a4d
--- /dev/null
+++ b/tensorflow/core/util/port.cc
@@ -0,0 +1,13 @@
+#include "tensorflow/core/util/port.h"
+
+namespace tensorflow {
+
+bool IsGoogleCudaEnabled() {
+#if GOOGLE_CUDA
+ return true;
+#else
+ return false;
+#endif
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/util/port.h b/tensorflow/core/util/port.h
new file mode 100644
index 0000000000..8b9d033d63
--- /dev/null
+++ b/tensorflow/core/util/port.h
@@ -0,0 +1,11 @@
+#ifndef TENSORFLOW_UTIL_PORT_H_
+#define TENSORFLOW_UTIL_PORT_H_
+
+namespace tensorflow {
+
+// Returns true if GOOGLE_CUDA is defined.
+bool IsGoogleCudaEnabled();
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_PORT_H_
diff --git a/tensorflow/core/util/saved_tensor_slice.proto b/tensorflow/core/util/saved_tensor_slice.proto
new file mode 100644
index 0000000000..f6599d9669
--- /dev/null
+++ b/tensorflow/core/util/saved_tensor_slice.proto
@@ -0,0 +1,76 @@
+// Protocol buffers for saved tensor slices. It's used for the brain tensor
+// ops checkpoints and the V3 checkpoints in dist_belief.
+
+// A checkpoint file is an sstable. The value for each record is a serialized
+// SavedTensorSlices message (defined below).
+//
+// Each checkpoint file has a record with the empty key (""), which corresponds
+// to a SavedTensorSlices message whose "meta" field serves as a table of
+// contents for all the tensor slices saved in this file. Since the key is "",
+// it is always the first record in each file.
+//
+// Each of the rest of the records in a checkpoint stores the raw data of a
+// particular tensor slice, in SavedSlice format. The corresponding key is an
+// ordered code that encodes the name of the tensor and the slice
+// information. The name is also stored in the SavedSlice message for ease of
+// debugging and manual examination.
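+//
+// Illustrative layout of the records in one checkpoint file under the scheme
+// described above (keys shown schematically, not as exact bytes):
+//
+//   key ""                              -> SavedTensorSlices { meta { ... } }
+//   key OrderedCode(tensor_0, slice_0)  -> SavedTensorSlices { data { ... } }
+//   key OrderedCode(tensor_1, slice_1)  -> SavedTensorSlices { data { ... } }
+//   ...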
+
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+import "tensorflow/core/framework/tensor_shape.proto";
+import "tensorflow/core/framework/tensor_slice.proto";
+import "tensorflow/core/framework/tensor.proto";
+import "tensorflow/core/framework/types.proto";
+
+// Metadata describing the set of slices of the same tensor saved in a
+// checkpoint file.
+message SavedSliceMeta {
+ // Name of the tensor.
+ string name = 1;
+
+ // Shape of the tensor
+ TensorShapeProto shape = 2;
+
+ // Type of the tensor
+ DataType type = 3;
+
+ // Explicit list of slices saved in the checkpoint file.
+ repeated TensorSliceProto slice = 4;
+};
+
+// Metadata describing the set of tensor slices saved in a checkpoint file.
+// It is always stored at the beginning of each checkpoint file.
+message SavedTensorSliceMeta {
+ // Each SavedSliceMeta describes the slices for one tensor.
+ repeated SavedSliceMeta tensor = 1;
+};
+
+// Saved tensor slice: it stores the name of the tensors, the slice, and the
+// raw data.
+message SavedSlice {
+ // Name of the tensor that this slice belongs to. This must be identical to
+ // the name used to encode the key for this record.
+ string name = 1;
+
+  // Extent of the slice. Must have one entry for each dimension of the
+  // tensor that this slice belongs to.
+ TensorSliceProto slice = 2;
+
+ // The raw data of the slice is stored as a TensorProto. Only raw data are
+ // stored (we don't fill in fields such as dtype or tensor_shape).
+ TensorProto data = 3;
+};
+
+// Each record in a v3 checkpoint file is a serialized SavedTensorSlices
+// message.
+message SavedTensorSlices {
+ // This is only present at the first item of each checkpoint file and serves
+ // as a table of contents, listing all the tensor slices saved in this file.
+ SavedTensorSliceMeta meta = 1;
+
+ // This exists in all but the first item of each checkpoint file.
+ SavedSlice data = 2;
+};
diff --git a/tensorflow/core/util/saved_tensor_slice_util.cc b/tensorflow/core/util/saved_tensor_slice_util.cc
new file mode 100644
index 0000000000..7a5903f07f
--- /dev/null
+++ b/tensorflow/core/util/saved_tensor_slice_util.cc
@@ -0,0 +1,76 @@
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/strings/ordered_code.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+const char kSavedTensorSlicesKey[] = "";
+
+string EncodeTensorNameSlice(const string& name, const TensorSlice& slice) {
+ string buffer;
+ // All the tensor slice keys will start with a 0
+ tensorflow::strings::OrderedCode::WriteNumIncreasing(&buffer, 0);
+ tensorflow::strings::OrderedCode::WriteString(&buffer, name);
+ tensorflow::strings::OrderedCode::WriteNumIncreasing(&buffer, slice.dims());
+ for (int d = 0; d < slice.dims(); ++d) {
+ // A trivial extent (meaning we take EVERYTHING) will default to -1 for both
+ // start and end. These will be properly parsed.
+ tensorflow::strings::OrderedCode::WriteSignedNumIncreasing(&buffer,
+ slice.start(d));
+ tensorflow::strings::OrderedCode::WriteSignedNumIncreasing(&buffer,
+ slice.length(d));
+ }
+ return buffer;
+}
+
+Status DecodeTensorNameSlice(const string& code, string* name,
+ tensorflow::TensorSlice* slice) {
+ StringPiece src(code);
+ uint64 x;
+ if (!tensorflow::strings::OrderedCode::ReadNumIncreasing(&src, &x)) {
+ return errors::Internal("Failed to parse the leading number: src = ", src);
+ }
+ if (x != 0) {
+ return errors::Internal(
+ "The leading number should always be 0 for any valid key: src = ", src);
+ }
+ if (!tensorflow::strings::OrderedCode::ReadString(&src, name)) {
+ return errors::Internal("Failed to parse the tensor name: src = ", src);
+ }
+ if (!tensorflow::strings::OrderedCode::ReadNumIncreasing(&src, &x)) {
+ return errors::Internal("Failed to parse the tensor rank: src = ", src);
+ }
+ if (x == 0) {
+ return errors::Internal("Expecting positive rank of the tensor, got ", x,
+ ", src = ", src);
+ }
+ if (x >= kint32max) {
+ return errors::Internal("Too many elements ", x);
+ }
+ slice->SetFullSlice(x);
+ for (int d = 0; d < static_cast<int32>(x); ++d) {
+    // We expect two integers per dimension: the start and the length.
+ int64 start, length;
+ if (!tensorflow::strings::OrderedCode::ReadSignedNumIncreasing(&src,
+ &start)) {
+ return errors::Internal("Failed to parse start: src = ", src);
+ }
+ if (!tensorflow::strings::OrderedCode::ReadSignedNumIncreasing(&src,
+ &length)) {
+ return errors::Internal("Failed to parse length: src = ", src);
+ }
+ if (length >= 0) {
+ // a non-trivial extent
+ slice->set_start(d, start);
+ slice->set_length(d, length);
+ }
+ }
+ return Status::OK();
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/saved_tensor_slice_util.h b/tensorflow/core/util/saved_tensor_slice_util.h
new file mode 100644
index 0000000000..6206cd8538
--- /dev/null
+++ b/tensorflow/core/util/saved_tensor_slice_util.h
@@ -0,0 +1,110 @@
+// Utilities for saving/restoring tensor slice checkpoints.
+
+#ifndef TENSORFLOW_UTIL_SAVED_TENSOR_SLICE_UTIL_H_
+#define TENSORFLOW_UTIL_SAVED_TENSOR_SLICE_UTIL_H_
+
+#include <string> // for string
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/framework/tensor.pb.h"
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/status.h" // for Status
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+// The key for the metadata in the tensor slice checkpoint files. It is "" so
+// that the metadata is always at the beginning of a checkpoint file.
+extern const char kSavedTensorSlicesKey[];
+
+// Encodes a tensor name and a tensor slice into an ordered code and outputs it
+// as a string.
+// The format is
+// <0>
+// <tensor_name>
+// <rank>
+// <dim-0-start><dim-0-length>
+// <dim-1-start><dim-1-length>
+// ...
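+//
+// For example (an illustrative sketch of the scheme above, not an exact byte
+// dump): encoding tensor name "w" with a rank-1 slice of start 0 and length
+// 10 writes, in order, 0, "w", 1, 0, 10 using the OrderedCode routines.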
+
+string EncodeTensorNameSlice(const string& name,
+ const tensorflow::TensorSlice& slice);
+
+// Parses out the name and the slice from a string encoded as an ordered code.
+Status DecodeTensorNameSlice(const string& code, string* name,
+ tensorflow::TensorSlice* slice);
+
+template <typename T>
+struct SaveTypeTraits;
+
+template <typename T>
+const typename SaveTypeTraits<T>::SavedType* TensorProtoData(
+ const TensorProto& t);
+
+template <typename T>
+protobuf::RepeatedField<typename SaveTypeTraits<T>::SavedType>*
+MutableTensorProtoData(TensorProto* t);
+
+template <typename T>
+void Fill(T* data, size_t n, TensorProto* t);
+
+#define TENSOR_PROTO_EXTRACT_TYPE(TYPE, FIELD, FTYPE) \
+ template <> \
+ struct SaveTypeTraits<TYPE> { \
+ static constexpr bool supported = true; \
+ typedef FTYPE SavedType; \
+ }; \
+ template <> \
+ inline const FTYPE* TensorProtoData<TYPE>(const TensorProto& t) { \
+ static_assert(SaveTypeTraits<TYPE>::supported, \
+ "Specified type " #TYPE " not supported for Restore"); \
+ return reinterpret_cast<const FTYPE*>(t.FIELD##_val().data()); \
+ } \
+ template <> \
+ inline protobuf::RepeatedField<FTYPE>* MutableTensorProtoData<TYPE>( \
+ TensorProto * t) { \
+ static_assert(SaveTypeTraits<TYPE>::supported, \
+ "Specified type " #TYPE " not supported for Save"); \
+ return reinterpret_cast<protobuf::RepeatedField<FTYPE>*>( \
+ t->mutable_##FIELD##_val()); \
+ } \
+ template <> \
+ inline void Fill(const TYPE* data, size_t n, TensorProto* t) { \
+ typename protobuf::RepeatedField<FTYPE> copy(data, data + n); \
+ t->mutable_##FIELD##_val()->Swap(&copy); \
+ }
+
+TENSOR_PROTO_EXTRACT_TYPE(float, float, float);
+TENSOR_PROTO_EXTRACT_TYPE(double, double, double);
+TENSOR_PROTO_EXTRACT_TYPE(int32, int, int32);
+TENSOR_PROTO_EXTRACT_TYPE(int64, int64, int64);
+TENSOR_PROTO_EXTRACT_TYPE(uint8, int, int32);
+TENSOR_PROTO_EXTRACT_TYPE(int8, int, int32);
+TENSOR_PROTO_EXTRACT_TYPE(int16, int, int32);
+TENSOR_PROTO_EXTRACT_TYPE(qint8, int, int32);
+TENSOR_PROTO_EXTRACT_TYPE(quint8, int, int32);
+
+#undef TENSOR_PROTO_EXTRACT_TYPE
+
+template <>
+struct SaveTypeTraits<qint32> : SaveTypeTraits<int32> {};
+
+template <>
+inline const int32* TensorProtoData<qint32>(const TensorProto& t) {
+ static_assert(SaveTypeTraits<qint32>::supported,
+ "Specified type qint32 not supported for Restore");
+ return reinterpret_cast<const int32*>(t.int_val().data());
+}
+
+inline void Fill(const qint32* data, size_t n, TensorProto* t) {
+ const int32* p = reinterpret_cast<const int32*>(data);
+ typename protobuf::RepeatedField<int32> copy(p, p + n);
+ t->mutable_int_val()->Swap(&copy);
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_SAVED_TENSOR_SLICE_UTIL_H_
diff --git a/tensorflow/core/util/saved_tensor_slice_util_test.cc b/tensorflow/core/util/saved_tensor_slice_util_test.cc
new file mode 100644
index 0000000000..2c34c903db
--- /dev/null
+++ b/tensorflow/core/util/saved_tensor_slice_util_test.cc
@@ -0,0 +1,32 @@
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+
+#include <gtest/gtest.h>
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+namespace {
+
+// Testing serialization of tensor name and tensor slice in the ordered code
+// format.
+TEST(TensorShapeUtilTest, TensorNameSliceToOrderedCode) {
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("-:-:1,3:4,5");
+ string buffer = EncodeTensorNameSlice("foo", s);
+ string name;
+ s.Clear();
+ TF_CHECK_OK(DecodeTensorNameSlice(buffer, &name, &s));
+ EXPECT_EQ("foo", name);
+ EXPECT_EQ("-:-:1,3:4,5", s.DebugString());
+ }
+}
+
+} // namespace
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/sparse/README.md b/tensorflow/core/util/sparse/README.md
new file mode 100644
index 0000000000..7b0799eb0e
--- /dev/null
+++ b/tensorflow/core/util/sparse/README.md
@@ -0,0 +1,222 @@
+SparseTensor
+============
+
+Sparse Tensors are stored as two dense tensors, a shape, and an optional order:
+
+* `indices`: a `brain::Tensor` storing a matrix of `int64` indices.
+* `values`: a `brain::Tensor` storing a vector with values of type T.
+* `shape`: a `TensorShape` storing the bounds of the underlying tensor
+* `order`: (optional) a `gtl::InlinedVector<int64,8>` with the dimensions
+ along which the indices are ordered.
+
+Let
+
+ ix = indices.matrix<int64>()
+ vals = values.vec<T>()
+
+The shape of `ix` is `N x NDIMS`, and each row corresponds to the
+index of a single element of the sparse tensor.
+
+The length of `vals` must be `N`, and `vals(i)` corresponds to the
+value with index `ix(i,:)`.
+
+Shape must be a `TensorShape` with `dims() == NDIMS`.
+The shape is the full shape of the dense tensor these indices
+represent.
+
+To be specific, the representation (pseudocode) is:
+
+ tensor[ix[i,:]] == vals[i] for i = 0, ..., N-1
+
+Ordering
+--------
+
+Indices need not be provided in order. For example, the following
+index matrix is ordered according to dimension order `{0, 1, 2}`.
+
+ [0 0 1]
+ [0 1 1]
+ [2 0 2]
+
+However, you can provide an unordered version:
+
+ [2 0 2]
+ [0 0 1]
+ [0 1 1]
+
+If the SparseTensor is constructed without a provided order, then the
+default order is `{-1, ..., -1}`. Certain operations will fail or crash
+when the order is not provided.
+
+Resorting the SparseTensor in-place (which resorts the underlying index and
+values tensors in-place) will update the order. The cost of reordering the
+matrix is `O(N*log(N))`, and it requires `O(N)` additional temporary space to
+store a reordering index. If an order is not provided at construction time and
+reordering is not performed, the following will happen:
+
+* `group()` will **raise an assertion failure**
+* `IndicesValid()` will **raise an assertion failure**
+
+To update the internal index ordering after construction, call
+`Reorder<T>()` with the desired dimension order, e.g. `Reorder<T>({0,1,2})`.
+After this step, all of the above methods should work correctly; a minimal
+sketch follows the checklist below.
+
+The method `IndicesValid()` checks to make sure:
+
+* `0 <= ix(i, d) < shape.dim_size(d)`
+* indices do not repeat
+* indices are in order
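+
+A minimal sketch of this flow (the tensor names and dims here are illustrative
+placeholders):
+
+    Tensor ix(DT_INT64, TensorShape({N, NDIMS}));
+    Tensor vals(DT_FLOAT, TensorShape({N}));
+    TensorShape shape({dim0, dim1, dim2});
+
+    SparseTensor sp(ix, vals, shape);  // order defaults to {-1, ..., -1}
+    sp.Reorder<float>({0, 1, 2});      // sort by dim 0, then 1, then 2
+    CHECK(sp.IndicesValid());          // bounds, duplicates, and ordering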
+
+Iterating
+---------
+
+### group({grouping dims})
+
+* provides an iterator that groups entries according to
+ dimensions you care about
+* may require a sort if your data isn't presorted in a way that's
+ compatible with grouping_dims
+* for each group, returns the group index (values of the group
+  dims for this iteration), the subset of indices in this group,
+  and the subset of values in this group. These are lazy outputs,
+  so to read them individually, copy them as in the example
+  below.
+
+#### **NOTE**
+`group({dim0, ..., dimk})` will **raise an assertion failure** if the
+order of the SparseTensor does not match the dimensions you wish to group by.
+You must either have your indices in the correct order and construct the
+SparseTensor with
+
+ order = {dim0, ..., dimk, ...}
+
+or call
+
+ Reorder<T>({dim0, .., dimk, ...})
+
+to sort the SparseTensor before grouping.
+
+Example of grouping:
+
+    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
+    Tensor values(DT_STRING, TensorShape({N}));
+    TensorShape shape({dim0,...});
+    SparseTensor sp(indices, values, shape);
+ sp.Reorder<string>({1, 2, 0, 3, ...}); // Must provide NDIMS dims.
+ // group according to dims 1 and 2
+ for (const auto& g : sp.group({1, 2})) {
+ cout << "vals of ix[:, 1,2] for this group: "
+ << g.group()[0] << ", " << g.group()[1];
+ cout << "full indices of group:\n" << g.indices();
+ cout << "values of group:\n" << g.values();
+
+ TTypes<int64>::UnalignedMatrix g_ix = g.indices();
+ TTypes<string>::UnalignedVec g_v = g.values();
+ ASSERT(g_ix.dimension(0) == g_v.size()); // number of elements match.
+ }
+
+
+ToDense
+--------
+
+Converts sparse tensor to dense. You must provide a pointer to the
+dense tensor (preallocated). `ToDense()` will optionally
+preinitialize the tensor with zeros.
+
+Shape checking is performed, as is boundary checking.
+
+    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
+    Tensor values(DT_STRING, TensorShape({N}));
+    TensorShape shape({dim0,...});
+    SparseTensor sp(indices, values, shape);
+ ASSERT(sp.IndicesValid()); // checks ordering & index bounds.
+
+ Tensor dense(DT_STRING, shape);
+ // initialize other indices to zero. copy.
+ ASSERT(sp.ToDense<string>(&dense, true));
+
+
+Concat
+--------
+
+Concatenates multiple SparseTensors and returns a new SparseTensor.
+This concatenation is with respect to the "dense" versions of these
+SparseTensors. Concatenation is performed along dimension order[0]
+of all tensors. As a result, shape[order[0]] may differ across
+the inputs, but shape[d] for d != order[0] must match across all inputs.
+
+We call order[0] the **primary dimension**.
+
+**Prerequisites**
+
+* The inputs' ranks must all match.
+* The inputs' order[0] must all match.
+* The inputs' shapes must all match except for dimension order[0].
+* The inputs' values must all be of the same type.
+
+If any of these are false, concat will die with an assertion failure.
+
+Example:
+Concatenate two sparse matrices along columns.
+
+Matrix 1:
+
+ [0 0 1]
+ [2 0 0]
+ [3 0 4]
+
+Matrix 2:
+
+ [0 0 0 0 0]
+ [0 1 0 0 0]
+ [2 0 0 1 0]
+
+Concatenated Matrix:
+
+ [0 0 1 0 0 0 0 0]
+ [2 0 0 0 1 0 0 0]
+ [3 0 4 2 0 0 1 0]
+
+Expected input shapes, orders, and `nnz()`:
+
+ shape_1 = TensorShape({3, 3})
+    shape_2 = TensorShape({3, 5})
+ order_1 = {1, 0} // primary order is 1, columns
+ order_2 = {1, 0} // primary order is 1, must match
+ nnz_1 = 4
+ nnz_2 = 3
+
+Output shapes and orders:
+
+    conc_shape = TensorShape({3, 8})  // primary dim increased, others same
+ conc_order = {1, 0} // Orders match along all inputs
+ conc_nnz = 7 // Sum of nonzeros of inputs
+
+Coding Example:
+
+    Tensor ix1(DT_INT64, TensorShape({N1, 3}));
+    Tensor vals1(DT_STRING, TensorShape({N1}));
+    Tensor ix2(DT_INT64, TensorShape({N2, 3}));
+    Tensor vals2(DT_STRING, TensorShape({N2}));
+    Tensor ix3(DT_INT64, TensorShape({N3, 3}));
+    Tensor vals3(DT_STRING, TensorShape({N3}));
+
+ SparseTensor st1(ix1, vals1, TensorShape({10, 20, 5}), {1, 0, 2});
+ SparseTensor st2(ix2, vals2, TensorShape({10, 10, 5}), {1, 0, 2});
+ // For kicks, st3 indices are out of order, but order[0] matches so we
+ // can still concatenate along this dimension.
+ SparseTensor st3(ix3, vals3, TensorShape({10, 30, 5}), {1, 2, 0});
+
+ SparseTensor conc = SparseTensor::Concat<string>({st1, st2, st3});
+ Tensor ix_conc = conc.indices();
+ Tensor vals_conc = conc.values();
+ EXPECT_EQ(conc.nnz(), st1.nnz() + st2.nnz() + st3.nnz());
+    EXPECT_EQ(conc.shape(), TensorShape({10, 60, 5}));
+    EXPECT_EQ(conc.order(), {-1, -1, -1});
+
+ // Reorder st3 so all input tensors have the exact same orders.
+ st3.Reorder<string>({1, 0, 2});
+ SparseTensor conc2 = SparseTensor::Concat<string>({st1, st2, st3});
+    EXPECT_EQ(conc2.order(), {1, 0, 2});
+ // All indices' orders matched, so output is in order.
+ EXPECT_TRUE(conc2.IndicesValid());
diff --git a/tensorflow/core/util/sparse/dim_comparator.h b/tensorflow/core/util/sparse/dim_comparator.h
new file mode 100644
index 0000000000..57473867cf
--- /dev/null
+++ b/tensorflow/core/util/sparse/dim_comparator.h
@@ -0,0 +1,60 @@
+#ifndef TENSORFLOW_UTIL_SPARSE_DIM_COMPARATOR_H_
+#define TENSORFLOW_UTIL_SPARSE_DIM_COMPARATOR_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace sparse {
+
+/////////////////
+// DimComparator
+/////////////////
+//
+// Helper class, mainly used by the IndexSortOrder. This comparator
+// can be passed to e.g. std::sort, or any other sorter, to sort two
+// rows of an index matrix according to the dimension(s) of interest.
+// The dimensions to sort by are passed to the constructor as "order".
+//
+// Example: given an index matrix IX, two rows ai and bi, and order = {2,1},
+// operator() compares
+// IX(ai,2) < IX(bi,2).
+// If IX(ai,2) == IX(bi,2), it compares
+// IX(ai,1) < IX(bi,1).
+//
+// This can be used to sort a vector of row indices into IX according to
+// the values in IX in particular columns (dimensions) of interest.
+class DimComparator {
+ public:
+ typedef typename gtl::ArraySlice<int64> VarDimArray;
+
+ inline DimComparator(const TTypes<int64>::Matrix& ix,
+ const VarDimArray& order, int dims)
+ : ix_(ix), order_(order), dims_(dims) {
+ CHECK_GT(order.size(), 0) << "Must order using at least one index";
+ CHECK_LE(order.size(), dims_) << "Can only sort up to dims";
+ for (size_t d = 0; d < order.size(); ++d) {
+ CHECK_GE(order[d], 0);
+ CHECK_LT(order[d], dims);
+ }
+ }
+
+ inline bool operator()(const int64 i, const int64 j) const {
+ for (int di = 0; di < dims_; ++di) {
+ const int64 d = order_[di];
+ if (ix_(i, d) < ix_(j, d)) return true;
+ if (ix_(i, d) > ix_(j, d)) return false;
+ }
+ return false;
+ }
+
+ const TTypes<int64>::Matrix ix_;
+ const VarDimArray order_;
+ const int dims_;
+};
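+
+// Illustrative sketch (assumed usage, mirroring what SparseTensor::Reorder
+// does internally): sort a vector of row numbers into ix_t by dims {0, 1}.
+//
+//   std::vector<int64> rows(ix_t.dimension(0));
+//   std::iota(rows.begin(), rows.end(), 0);
+//   std::sort(rows.begin(), rows.end(), DimComparator(ix_t, {0, 1}, dims));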
+
+} // namespace sparse
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_SPARSE_DIM_COMPARATOR_H_
diff --git a/tensorflow/core/util/sparse/group_iterator.cc b/tensorflow/core/util/sparse/group_iterator.cc
new file mode 100644
index 0000000000..e153bcdbb4
--- /dev/null
+++ b/tensorflow/core/util/sparse/group_iterator.cc
@@ -0,0 +1,49 @@
+#include "tensorflow/core/util/sparse/group_iterator.h"
+
+namespace tensorflow {
+namespace sparse {
+
+void GroupIterable::IteratorStep::UpdateEndOfGroup() {
+ ++next_loc_;
+ int64 N = iter_->ix_.dim_size(0);
+ auto ix_t = iter_->ix_.template matrix<int64>();
+ while (next_loc_ < N && iter_->GroupMatches(ix_t, loc_, next_loc_)) {
+ ++next_loc_;
+ }
+}
+
+bool GroupIterable::IteratorStep::operator!=(const IteratorStep& rhs) const {
+ CHECK_EQ(rhs.iter_, iter_) << "Can't compare steps from different iterators";
+ return (rhs.loc_ != loc_);
+}
+
+GroupIterable::IteratorStep& GroupIterable::IteratorStep::
+operator++() { // prefix ++
+ loc_ = next_loc_;
+ UpdateEndOfGroup();
+ return *this;
+}
+
+GroupIterable::IteratorStep GroupIterable::IteratorStep::operator++(
+ int) { // postfix ++
+ IteratorStep lhs(*this);
+ ++(*this);
+ return lhs;
+}
+
+std::vector<int64> Group::group() const {
+ std::vector<int64> g;
+ auto ix_t = iter_->ix_.template matrix<int64>();
+ for (const int d : iter_->group_dims_) {
+ g.push_back(ix_t(loc_, d));
+ }
+ return g;
+}
+
+TTypes<int64>::UnalignedConstMatrix Group::indices() const {
+ return TTypes<int64>::UnalignedConstMatrix(
+ &(iter_->ix_.matrix<int64>()(loc_, 0)), next_loc_ - loc_, iter_->dims_);
+}
+
+} // namespace sparse
+} // namespace tensorflow
diff --git a/tensorflow/core/util/sparse/group_iterator.h b/tensorflow/core/util/sparse/group_iterator.h
new file mode 100644
index 0000000000..8423d54f27
--- /dev/null
+++ b/tensorflow/core/util/sparse/group_iterator.h
@@ -0,0 +1,120 @@
+#ifndef TENSORFLOW_UTIL_SPARSE_GROUP_ITERATOR_H_
+#define TENSORFLOW_UTIL_SPARSE_GROUP_ITERATOR_H_
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace sparse {
+
+class GroupIterable; // Predeclare GroupIterable for Group.
+
+// This class is returned when dereferencing a GroupIterable iterator.
+// It provides the methods group(), indices(), and values(), which
+// provide access into the underlying SparseTensor.
+class Group {
+ public:
+ Group(GroupIterable* iter, int64 loc, int64 next_loc)
+ : iter_(iter), loc_(loc), next_loc_(next_loc) {}
+
+ std::vector<int64> group() const;
+ TTypes<int64>::UnalignedConstMatrix indices() const;
+ template <typename T>
+ typename TTypes<T>::UnalignedVec values() const;
+
+ private:
+ GroupIterable* iter_;
+ int64 loc_;
+ int64 next_loc_;
+};
+
+/////////////////
+// GroupIterable
+/////////////////
+//
+// Returned when calling sparse_tensor.group({dim0, dim1, ...}).
+//
+// Please note: the sparse_tensor should already be ordered according
+// to {dim0, dim1, ...}. Otherwise this iteration will return invalid groups.
+//
+// Allows grouping and iteration of the SparseTensor according to the
+// subset of dimensions provided to the group call.
+//
+// The actual grouping dimensions are stored in the
+// internal vector group_dims_. Iterators inside the iterable provide
+// the three methods:
+//
+// * group(): returns a vector with the current group dimension values.
+// * indices(): an unaligned matrix map providing the indices in
+//   this group.
+// * values(): an unaligned vector map providing the values in
+//   this group.
+//
+// To iterate across GroupIterable, see examples in README.md.
+//
+
+class GroupIterable {
+ public:
+ typedef gtl::ArraySlice<int64> VarDimArray;
+
+ GroupIterable(Tensor ix, Tensor vals, int dims, const VarDimArray& group_dims)
+ : ix_(ix), vals_(vals), dims_(dims), group_dims_(group_dims) {}
+
+ class IteratorStep;
+
+ IteratorStep begin() { return IteratorStep(this, 0); }
+ IteratorStep end() { return IteratorStep(this, ix_.dim_size(0)); }
+
+ template <typename TIX>
+ inline bool GroupMatches(const TIX& ix, int64 loc_a, int64 loc_b) const {
+ bool matches = true;
+ for (int d : group_dims_) {
+ if (ix(loc_a, d) != ix(loc_b, d)) {
+ matches = false;
+ }
+ }
+ return matches;
+ }
+
+ class IteratorStep {
+ public:
+ IteratorStep(GroupIterable* iter, int64 loc)
+ : iter_(iter), loc_(loc), next_loc_(loc_) {
+ UpdateEndOfGroup();
+ }
+
+ void UpdateEndOfGroup();
+ bool operator!=(const IteratorStep& rhs) const;
+ IteratorStep& operator++(); // prefix ++
+ IteratorStep operator++(int); // postfix ++
+ Group operator*() const { return Group(iter_, loc_, next_loc_); }
+
+ private:
+ GroupIterable* iter_;
+ int64 loc_;
+ int64 next_loc_;
+ };
+
+ private:
+ friend class Group;
+ Tensor ix_;
+ Tensor vals_;
+ const int dims_;
+ const VarDimArray group_dims_;
+};
+
+// Implementation of Group::values<T>()
+template <typename T>
+typename TTypes<T>::UnalignedVec Group::values() const {
+ return typename TTypes<T>::UnalignedVec(&(iter_->vals_.vec<T>()(loc_)),
+ next_loc_ - loc_);
+}
+
+} // namespace sparse
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_SPARSE_GROUP_ITERATOR_H_
diff --git a/tensorflow/core/util/sparse/sparse_tensor.h b/tensorflow/core/util/sparse/sparse_tensor.h
new file mode 100644
index 0000000000..dcb75e7f54
--- /dev/null
+++ b/tensorflow/core/util/sparse/sparse_tensor.h
@@ -0,0 +1,353 @@
+#ifndef TENSORFLOW_UTIL_SPARSE_SPARSE_TENSOR_H_
+#define TENSORFLOW_UTIL_SPARSE_SPARSE_TENSOR_H_
+
+#include <limits>
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/core/util/sparse/dim_comparator.h"
+#include "tensorflow/core/util/sparse/group_iterator.h"
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace sparse {
+
+class SparseTensor {
+ public:
+ typedef typename gtl::ArraySlice<int64> VarDimArray;
+
+ SparseTensor(Tensor ix, Tensor vals, const TensorShape& shape)
+ : SparseTensor(ix, vals, shape, UndefinedOrder(shape)) {}
+
+ SparseTensor(Tensor ix, Tensor vals, const TensorShape& shape,
+ const VarDimArray& order)
+ : ix_(ix),
+ vals_(vals),
+ shape_(shape),
+ order_(order.begin(), order.end()),
+ dims_(GetDimsFromIx(ix)) {
+ CHECK_EQ(ix.dtype(), DT_INT64) << "indices must be type int64 but got: "
+ << ix.dtype();
+ CHECK(TensorShapeUtils::IsMatrix(ix.shape()))
+ << "indices must be a matrix, but got: " << ix.shape().DebugString();
+ CHECK(TensorShapeUtils::IsVector(vals.shape()))
+ << "vals must be a vec, but got: " << vals.shape().DebugString();
+ CHECK_EQ(ix.shape().dim_size(0), vals.shape().dim_size(0))
+ << "indices and values rows (indexing dimension) must match.";
+ }
+
+ std::size_t num_entries() const { return ix_.dim_size(0); }
+
+ const Tensor& indices() const { return ix_; }
+
+ const Tensor& values() const { return vals_; }
+
+ DataType dtype() const { return vals_.dtype(); }
+
+ bool IndicesValid() const {
+ const auto ix_t = ix_.matrix<int64>();
+ for (int64 ord : order_) {
+      CHECK_GE(ord, 0) << "Order was not provided. Provide an order at "
+                          "construction time or call Reorder().";
+ }
+
+ for (std::size_t n = 0; n < num_entries(); ++n) {
+ if (!IndexValid(ix_t, n)) return false;
+ }
+
+ return true;
+ }
+
+ // Returns the tensor shape (the dimensions of the "densified"
+ // tensor this tensor represents).
+ const TensorShape shape() const { return shape_; }
+
+ const VarDimArray order() const { return order_; }
+
+ // Resorts the indices and values according to the dimensions in order.
+ template <typename T>
+ void Reorder(const VarDimArray& order);
+
+ // Returns a group iterable that can be used for clumping indices
+ // and values according to the group indices of interest.
+ //
+ // Precondition: order()[0..group_ix.size()] == group_ix.
+ //
+ // See the README.md in this directory for more usage information.
+ GroupIterable group(const VarDimArray& group_ix) {
+ CHECK_LE(group_ix.size(), dims_);
+ for (std::size_t di = 0; di < group_ix.size(); ++di) {
+ CHECK_GE(group_ix[di], 0) << "Group dimension out of range";
+ CHECK_LT(group_ix[di], dims_) << "Group dimension out of range";
+ CHECK_EQ(group_ix[di], order_[di])
+ << "Group dimension does not match sorted order";
+ }
+ return GroupIterable(ix_, vals_, dims_, group_ix);
+ }
+
+ // Stores the sparse indices into the dense tensor out.
+ // Preconditions:
+ // out->shape().dims() == shape().dims()
+ // out->shape().dim_size(d) >= shape(d) for all d
+ //
+ // Returns true on success. False on failure (mismatched dimensions
+ // or out-of-bounds indices).
+ //
+  // If initialize == true, ToDense first overwrites all coefficients in out with 0.
+ //
+ template <typename T>
+ bool ToDense(Tensor* out, bool initialize = true);
+
+  // Concat() concatenates all the tensors along their primary dimension
+  // (order[0]). All tensors must have identical shapes except for the
+  // primary dimension, and all tensors' order[0] must match.
+ //
+ // If all of the tensors have identical ordering, then the output
+ // will have this ordering. Otherwise the output is set as not
+ // having any order and a Reorder<T>() should be called on it before
+ // performing any subsequent operations.
+ template <typename T>
+ static SparseTensor Concat(const gtl::ArraySlice<SparseTensor>& tensors);
+
+ private:
+ static int GetDimsFromIx(const Tensor& ix) {
+ CHECK(TensorShapeUtils::IsMatrix(ix.shape()));
+ return ix.dim_size(1);
+ }
+
+ static gtl::InlinedVector<int64, 8> UndefinedOrder(const TensorShape& shape) {
+ return gtl::InlinedVector<int64, 8>(shape.dims(), -1);
+ }
+
+ // Helper for IndicesValid()
+ inline bool IndexValid(const TTypes<int64>::ConstMatrix& ix_t,
+ int64 n) const {
+ bool different = false;
+ bool bad_order = false;
+ bool valid = true;
+ if (n == 0) {
+ for (int di = 0; di < dims_; ++di) {
+ if (ix_t(n, di) < 0 || ix_t(n, di) >= shape_.dim_size(di))
+ valid = false;
+ }
+ different = true;
+ } else {
+ for (int di = 0; di < dims_; ++di) {
+ if (ix_t(n, di) < 0 || ix_t(n, di) >= shape_.dim_size(di))
+ valid = false;
+ int64 diff = ix_t(n, order_[di]) - ix_t(n - 1, order_[di]);
+ if (diff > 0) different = true;
+ if (!different && diff < 0) bad_order = true;
+ }
+ }
+ if (!valid) return false; // Out of bounds
+    if (!different) return false;  // The last two indices are identical...
+ if (bad_order) return false; // Decreasing in order.
+ return true;
+ }
+
+ // Helper for ToDense<T>()
+ template <typename T>
+ bool ValidateAndInitializeToDense(Tensor* out, bool initialize);
+
+ Tensor ix_;
+ Tensor vals_;
+ TensorShape shape_;
+ gtl::InlinedVector<int64, 8> order_;
+ const int dims_;
+};
+
+// This operation updates the indices and values Tensor rows, so it is
+// an in-place algorithm. It requires O(N log N) time and O(N)
+// temporary space.
+template <typename T>
+void SparseTensor::Reorder(const VarDimArray& order) {
+ CHECK_EQ(DataTypeToEnum<T>::v(), dtype())
+ << "Reorder requested with the wrong datatype";
+ CHECK_EQ(order.size(), dims_) << "Order length must be SparseTensor rank";
+ auto ix_t = ix_.matrix<int64>();
+ auto vals_t = vals_.vec<T>();
+
+ DimComparator sorter(ix_t, order, dims_);
+
+ std::vector<int64> reorder(num_entries());
+ std::iota(reorder.begin(), reorder.end(), 0);
+
+ // Sort to get order of indices
+ std::sort(reorder.begin(), reorder.end(), sorter);
+
+  // We have a forward reordering, but what we'll need is a
+  // permutation (the inverse). This can be calculated with O(1) additional
+  // space and O(N) time (INVPERM), but we just do the simple thing here.
+ std::vector<int64> permutation(reorder.size());
+ for (std::size_t n = 0; n < reorder.size(); ++n) {
+ permutation[reorder[n]] = n;
+ }
+
+ // Update indices & values by converting the permutations to
+ // a product of transpositions. Iterate over the cycles in the
+ // permutation, and convert each of those into a product of
+ // transpositions (swaps):
+ // https://en.wikipedia.org/wiki/Cyclic_permutation
+ // This is N swaps, 2*N comparisons.
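+  // For example (illustrative): permutation = {2, 0, 1} is the single cycle
+  // 0 -> 2 -> 1 -> 0; the loop below swaps rows 0 and 2, then rows 0 and 1,
+  // after which permutation[n] == n for every n.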
+ for (std::size_t n = 0; n + 1 < permutation.size(); ++n) {
+ while (n != permutation[n]) {
+ std::size_t r = permutation[n];
+ std::swap_ranges(&(ix_t(n, 0)), &(ix_t(n + 1, 0)), &(ix_t(r, 0)));
+ std::swap(vals_t(n), vals_t(r));
+ std::swap(permutation[n], permutation[r]);
+ }
+ }
+
+ order_ = gtl::InlinedVector<int64, 8>(order.begin(), order.end());
+}
+
+template <typename T>
+bool SparseTensor::ValidateAndInitializeToDense(Tensor* out, bool initialize) {
+ CHECK_EQ(DataTypeToEnum<T>::v(), dtype())
+ << "ToDense requested with the wrong datatype";
+
+ CHECK_EQ(out->shape().dims(), dims_)
+ << "Incompatible dimensions between SparseTensor and output";
+
+ CHECK_EQ(out->dtype(), DataTypeToEnum<T>::v())
+ << "Output must be type: " << DataTypeToEnum<T>::v()
+ << " but got: " << out->dtype();
+
+ // Make sure the dense output is the same rank and has room
+ // to hold the SparseTensor.
+ const auto& out_shape = out->shape();
+ if (shape_.dims() != out_shape.dims()) return false;
+ for (int d = 0; d < shape_.dims(); ++d) {
+ if (shape_.dim_size(d) > out_shape.dim_size(d)) return false;
+ }
+
+ if (initialize) {
+ auto out_t = out->flat<T>();
+ out_t.setConstant(T());
+ }
+
+ return true;
+}
+
+template <typename T>
+bool SparseTensor::ToDense(Tensor* out, bool initialize) {
+ if (!ValidateAndInitializeToDense<T>(out, initialize)) return false;
+
+ auto out_t = out->flat<T>();
+ auto ix_t = ix_.matrix<int64>();
+ auto vals_t = vals_.vec<T>();
+
+ std::vector<int64> strides(dims_);
+ const auto& out_shape = out->shape();
+ strides[dims_ - 1] = 1;
+ for (int d = dims_ - 2; d >= 0; --d) {
+ strides[d] = strides[d + 1] * out_shape.dim_size(d + 1);
+ }
+
+ for (std::size_t n = 0; n < vals_t.dimension(0); ++n) {
+ bool invalid_dims = false;
+ int64 ix = 0;
+ for (int d = 0; d < dims_; ++d) {
+ const int64 ix_n_d = ix_t(n, d);
+ if (ix_n_d < 0 || ix_n_d >= out_shape.dim_size(d)) {
+ invalid_dims = true;
+ }
+ ix += strides[d] * ix_n_d;
+ }
+ if (invalid_dims) return false;
+ out_t(ix) = vals_t(n);
+ }
+ return true;
+}
+
+template <typename T>
+SparseTensor SparseTensor::Concat(
+ const gtl::ArraySlice<SparseTensor>& tensors) {
+ CHECK_GE(tensors.size(), 1) << "Cannot concat 0 SparseTensors";
+ const int dims = tensors[0].dims_;
+ CHECK_GE(dims, 1) << "Cannot concat 0-dimensional SparseTensors";
+ auto order_0 = tensors[0].order();
+ const int primary_dim = order_0[0];
+ gtl::InlinedVector<int64, 8> final_order(order_0.begin(), order_0.end());
+ TensorShape final_shape(tensors[0].shape());
+ final_shape.set_dim(primary_dim, 0); // We'll build this up as we go along.
+ int num_entries = 0;
+
+ bool fully_ordered = true;
+ for (const SparseTensor& st : tensors) {
+ CHECK_EQ(st.dims_, dims) << "All SparseTensors must have the same rank.";
+ CHECK_EQ(DataTypeToEnum<T>::v(), st.dtype())
+ << "Concat requested with the wrong data type";
+ CHECK_GE(st.order()[0], 0) << "SparseTensor must be ordered";
+ CHECK_EQ(st.order()[0], primary_dim)
+ << "All SparseTensors' order[0] must match. This is the concat dim.";
+ if (st.order() != final_order) fully_ordered = false;
+ const TensorShape st_shape = st.shape();
+ for (int d = 0; d < dims - 1; ++d) {
+ const int cdim = (d < primary_dim) ? d : d + 1;
+ CHECK_EQ(final_shape.dim_size(cdim), st_shape.dim_size(cdim))
+ << "All SparseTensors' shapes must match except on the concat dim. "
+ << "Concat dim: " << primary_dim
+ << ", mismatched shape at dim: " << cdim
+ << ". Expecting shape like: " << final_shape.DebugString()
+ << " but saw shape: " << st_shape.DebugString();
+ }
+
+ // Update dimension of final shape
+ final_shape.set_dim(primary_dim, final_shape.dim_size(primary_dim) +
+ st_shape.dim_size(primary_dim));
+
+ num_entries += st.num_entries(); // Update number of entries
+ }
+
+  // If the orderings are inconsistent across inputs, set the final order to -1s.
+ if (!fully_ordered) {
+ final_order = UndefinedOrder(final_shape);
+ }
+
+ Tensor output_ix(DT_INT64, TensorShape({num_entries, dims}));
+ Tensor output_vals(DataTypeToEnum<T>::v(), TensorShape({num_entries}));
+
+ auto ix_t = output_ix.matrix<int64>();
+ auto vals_t = output_vals.vec<T>();
+
+ Eigen::DenseIndex offset = 0;
+ int64 shape_offset = 0;
+ for (const SparseTensor& st : tensors) {
+ int st_num_entries = st.num_entries();
+ Eigen::DSizes<Eigen::DenseIndex, 2> ix_start(offset, 0);
+ Eigen::DSizes<Eigen::DenseIndex, 2> ix_size(st_num_entries, dims);
+ Eigen::DSizes<Eigen::DenseIndex, 1> vals_start(offset);
+ Eigen::DSizes<Eigen::DenseIndex, 1> vals_size(st_num_entries);
+
+ // Fill in indices & values.
+ ix_t.slice(ix_start, ix_size) = st.ix_.matrix<int64>();
+ vals_t.slice(vals_start, vals_size) = st.vals_.vec<T>();
+
+ Eigen::DSizes<Eigen::DenseIndex, 2> ix_update_start(offset, primary_dim);
+ Eigen::DSizes<Eigen::DenseIndex, 2> ix_update_size(st_num_entries, 1);
+ // The index associated with the primary dimension gets increased
+ // by the shapes of the previous concatted Tensors.
+ auto update_slice = ix_t.slice(ix_update_start, ix_update_size);
+ update_slice += update_slice.constant(shape_offset);
+
+ offset += st_num_entries;
+ shape_offset += st.shape().dim_size(primary_dim);
+ }
+
+ return SparseTensor(output_ix, output_vals, final_shape, final_order);
+}
+
+} // namespace sparse
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_SPARSE_SPARSE_TENSOR_H_
diff --git a/tensorflow/core/util/sparse/sparse_tensor_test.cc b/tensorflow/core/util/sparse/sparse_tensor_test.cc
new file mode 100644
index 0000000000..47126b7187
--- /dev/null
+++ b/tensorflow/core/util/sparse/sparse_tensor_test.cc
@@ -0,0 +1,467 @@
+#include "tensorflow/core/util/sparse/sparse_tensor.h"
+
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/framework/tensor_types.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/public/tensor.h"
+#include <gtest/gtest.h>
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+
+namespace tensorflow {
+namespace sparse {
+namespace {
+
+Eigen::Tensor<int64, 2, Eigen::RowMajor, Eigen::DenseIndex>
+GetSimpleIndexTensor(int N, const int NDIM) {
+ Eigen::Tensor<int64, 2, Eigen::RowMajor, Eigen::DenseIndex> ix(N, NDIM);
+ ix(0, 0) = 0;
+ ix(0, 1) = 0;
+ ix(0, 2) = 0;
+
+ ix(1, 0) = 3;
+ ix(1, 1) = 0;
+ ix(1, 2) = 0;
+
+ ix(2, 0) = 2;
+ ix(2, 1) = 0;
+ ix(2, 2) = 0;
+
+ ix(3, 0) = 0;
+ ix(3, 1) = 1;
+ ix(3, 2) = 0;
+
+ ix(4, 0) = 0;
+ ix(4, 1) = 0;
+ ix(4, 2) = 2;
+ return ix;
+}
+
+TEST(SparseTensorTest, DimComparatorSorts) {
+ std::size_t N = 5;
+ const int NDIM = 3;
+ auto ix = GetSimpleIndexTensor(N, NDIM);
+ TTypes<int64>::Matrix map(ix.data(), N, NDIM);
+
+ std::vector<int64> sorting(N);
+ for (std::size_t n = 0; n < N; ++n) sorting[n] = n;
+
+ // new order should be: {0, 4, 3, 2, 1}
+ std::vector<int64> order{0, 1, 2};
+ DimComparator sorter(map, order, NDIM);
+ std::sort(sorting.begin(), sorting.end(), sorter);
+
+ EXPECT_EQ(sorting, std::vector<int64>({0, 4, 3, 2, 1}));
+
+ // new order should be: {0, 3, 2, 1, 4}
+ std::vector<int64> order1{2, 0, 1};
+ DimComparator sorter1(map, order1, NDIM);
+ for (std::size_t n = 0; n < N; ++n) sorting[n] = n;
+ std::sort(sorting.begin(), sorting.end(), sorter1);
+
+ EXPECT_EQ(sorting, std::vector<int64>({0, 3, 2, 1, 4}));
+}
+
+TEST(SparseTensorTest, SparseTensorConstruction) {
+ int N = 5;
+ const int NDIM = 3;
+ auto ix_c = GetSimpleIndexTensor(N, NDIM);
+ Eigen::Tensor<string, 1, Eigen::RowMajor> vals_c(N);
+ vals_c(0) = "hi0";
+ vals_c(1) = "hi1";
+ vals_c(2) = "hi2";
+ vals_c(3) = "hi3";
+ vals_c(4) = "hi4";
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ auto ix_t = ix.matrix<int64>();
+ auto vals_t = vals.vec<string>();
+ vals_t = vals_c;
+ ix_t = ix_c;
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+ SparseTensor st(ix, vals, shape, order);
+ EXPECT_FALSE(st.IndicesValid()); // Out of order
+
+  // Regardless of how the order is updated, so long as there are no
+  // duplicates, the resulting indices are valid.
+ st.Reorder<string>({2, 0, 1});
+ EXPECT_TRUE(st.IndicesValid());
+ EXPECT_EQ(vals_t(0), "hi0");
+ EXPECT_EQ(vals_t(1), "hi3");
+ EXPECT_EQ(vals_t(2), "hi2");
+ EXPECT_EQ(vals_t(3), "hi1");
+ EXPECT_EQ(vals_t(4), "hi4");
+
+ ix_t = ix_c;
+ vals_t = vals_c;
+ st.Reorder<string>({0, 1, 2});
+ EXPECT_TRUE(st.IndicesValid());
+ EXPECT_EQ(vals_t(0), "hi0");
+ EXPECT_EQ(vals_t(1), "hi4");
+ EXPECT_EQ(vals_t(2), "hi3");
+ EXPECT_EQ(vals_t(3), "hi2");
+ EXPECT_EQ(vals_t(4), "hi1");
+
+ ix_t = ix_c;
+ vals_t = vals_c;
+ st.Reorder<string>({2, 1, 0});
+ EXPECT_TRUE(st.IndicesValid());
+}
+
+TEST(SparseTensorTest, EmptySparseTensorAllowed) {
+ int N = 0;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+ SparseTensor st(ix, vals, shape, order);
+ EXPECT_TRUE(st.IndicesValid());
+ EXPECT_EQ(st.order(), order);
+
+ std::vector<int64> new_order{1, 0, 2};
+ st.Reorder<string>(new_order);
+ EXPECT_TRUE(st.IndicesValid());
+ EXPECT_EQ(st.order(), new_order);
+}
+
+TEST(SparseTensorTest, SortingWorksCorrectly) {
+ int N = 30;
+ const int NDIM = 4;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+ TensorShape shape({1000, 1000, 1000, 1000});
+ SparseTensor st(ix, vals, shape);
+
+ auto ix_t = ix.matrix<int64>();
+
+ for (int n = 0; n < 100; ++n) {
+ ix_t = ix_t.random(Eigen::internal::UniformRandomGenerator<int64>(n + 1));
+ ix_t = ix_t.abs() % 1000;
+ st.Reorder<string>({0, 1, 2, 3});
+ EXPECT_TRUE(st.IndicesValid());
+ st.Reorder<string>({3, 2, 1, 0});
+ EXPECT_TRUE(st.IndicesValid());
+ st.Reorder<string>({1, 0, 2, 3});
+ EXPECT_TRUE(st.IndicesValid());
+ st.Reorder<string>({3, 0, 2, 1});
+ EXPECT_TRUE(st.IndicesValid());
+ }
+}
+
+TEST(SparseTensorTest, ValidateIndicesFindsInvalid) {
+ int N = 2;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ Eigen::Tensor<int64, 2, Eigen::RowMajor> ix_orig(N, NDIM);
+ ix_orig(0, 0) = 0;
+ ix_orig(0, 1) = 0;
+ ix_orig(0, 2) = 0;
+
+ ix_orig(1, 0) = 0;
+ ix_orig(1, 1) = 0;
+ ix_orig(1, 2) = 0;
+
+ auto ix_t = ix.matrix<int64>();
+ ix_t = ix_orig;
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+ SparseTensor st(ix, vals, shape, order);
+
+ st.Reorder<string>(order);
+ EXPECT_FALSE(st.IndicesValid()); // two indices are identical
+
+ ix_orig(1, 2) = 1;
+ ix_t = ix_orig;
+ st.Reorder<string>(order);
+ EXPECT_TRUE(st.IndicesValid()); // second index now (0, 0, 1)
+
+ ix_orig(0, 2) = 1;
+ ix_t = ix_orig;
+ st.Reorder<string>(order);
+ EXPECT_FALSE(st.IndicesValid()); // first index now (0, 0, 1)
+}
+
+TEST(SparseTensorTest, SparseTensorCheckBoundaries) {
+ int N = 5;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ auto ix_t = GetSimpleIndexTensor(N, NDIM);
+
+ ix.matrix<int64>() = ix_t;
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+
+ SparseTensor st(ix, vals, shape, order);
+ EXPECT_FALSE(st.IndicesValid());
+
+ st.Reorder<string>(order);
+ EXPECT_TRUE(st.IndicesValid());
+
+ ix_t(0, 0) = 11;
+ ix.matrix<int64>() = ix_t;
+ st.Reorder<string>(order);
+ EXPECT_FALSE(st.IndicesValid());
+
+ ix_t(0, 0) = -1;
+ ix.matrix<int64>() = ix_t;
+ st.Reorder<string>(order);
+ EXPECT_FALSE(st.IndicesValid());
+
+ ix_t(0, 0) = 0;
+ ix.matrix<int64>() = ix_t;
+ st.Reorder<string>(order);
+ EXPECT_TRUE(st.IndicesValid());
+}
+
+TEST(SparseTensorTest, SparseTensorToDenseTensor) {
+ int N = 5;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ auto ix_t = GetSimpleIndexTensor(N, NDIM);
+ auto vals_t = vals.vec<string>();
+
+ ix.matrix<int64>() = ix_t;
+
+ vals_t(0) = "hi0";
+ vals_t(1) = "hi1";
+ vals_t(2) = "hi2";
+ vals_t(3) = "hi3";
+ vals_t(4) = "hi4";
+
+ TensorShape shape({4, 4, 5});
+ std::vector<int64> order{0, 1, 2};
+ SparseTensor st(ix, vals, shape, order);
+
+ Tensor dense(DT_STRING, TensorShape({4, 4, 5}));
+ st.ToDense<string>(&dense);
+
+ auto dense_t = dense.tensor<string, 3>();
+ Eigen::array<Eigen::DenseIndex, NDIM> ix_n;
+ for (int n = 0; n < N; ++n) {
+ for (int d = 0; d < NDIM; ++d) ix_n[d] = ix_t(n, d);
+ EXPECT_EQ(dense_t(ix_n), vals_t(n));
+ }
+
+ // Spot checks on the others
+ EXPECT_EQ(dense_t(0, 0, 1), "");
+ EXPECT_EQ(dense_t(0, 0, 3), "");
+ EXPECT_EQ(dense_t(3, 3, 3), "");
+ EXPECT_EQ(dense_t(3, 3, 4), "");
+}
+
+TEST(SparseTensorTest, SparseTensorToLargerDenseTensor) {
+ int N = 5;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ auto ix_t = GetSimpleIndexTensor(N, NDIM);
+ auto vals_t = vals.vec<string>();
+
+ ix.matrix<int64>() = ix_t;
+
+ vals_t(0) = "hi0";
+ vals_t(1) = "hi1";
+ vals_t(2) = "hi2";
+ vals_t(3) = "hi3";
+ vals_t(4) = "hi4";
+
+ TensorShape shape({4, 4, 5});
+ std::vector<int64> order{0, 1, 2};
+ SparseTensor st(ix, vals, shape, order);
+
+ Tensor dense(DT_STRING, TensorShape({10, 10, 10}));
+ st.ToDense<string>(&dense);
+
+ auto dense_t = dense.tensor<string, 3>();
+ Eigen::array<Eigen::DenseIndex, NDIM> ix_n;
+ for (int n = 0; n < N; ++n) {
+ for (int d = 0; d < NDIM; ++d) ix_n[d] = ix_t(n, d);
+ EXPECT_EQ(dense_t(ix_n), vals_t(n));
+ }
+
+ // Spot checks on the others
+ EXPECT_EQ(dense_t(0, 0, 1), "");
+ EXPECT_EQ(dense_t(0, 0, 3), "");
+ EXPECT_EQ(dense_t(3, 3, 3), "");
+ EXPECT_EQ(dense_t(3, 3, 4), "");
+ EXPECT_EQ(dense_t(9, 0, 0), "");
+ EXPECT_EQ(dense_t(9, 0, 9), "");
+ EXPECT_EQ(dense_t(9, 9, 9), "");
+}
+
+TEST(SparseTensorTest, SparseTensorGroup) {
+ int N = 5;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_INT32, TensorShape({N}));
+
+ auto ix_t = ix.matrix<int64>();
+ auto vals_t = vals.vec<int32>();
+
+ ix_t = GetSimpleIndexTensor(N, NDIM);
+
+ vals_t(0) = 1; // associated with ix (000)
+ vals_t(1) = 2; // associated with ix (300)
+ vals_t(2) = 3; // associated with ix (200)
+ vals_t(3) = 4; // associated with ix (010)
+ vals_t(4) = 5; // associated with ix (002)
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+
+ SparseTensor st(ix, vals, shape, order);
+ st.Reorder<int32>(order);
+
+ std::vector<std::vector<int64> > groups;
+ std::vector<TTypes<int64>::UnalignedConstMatrix> grouped_indices;
+ std::vector<TTypes<int32>::UnalignedVec> grouped_values;
+
+ // Group by index 0
+ auto gi = st.group({0});
+
+ // All the hard work is right here!
+ for (const auto& g : gi) {
+ groups.push_back(g.group());
+ VLOG(1) << "Group: " << str_util::Join(g.group(), ",");
+ VLOG(1) << "Indices: " << g.indices();
+ VLOG(1) << "Values: " << g.values<int32>();
+
+ grouped_indices.push_back(g.indices());
+ grouped_values.push_back(g.values<int32>());
+ }
+
+ // Group by dimension 0, we have groups: 0--, 2--, 3--
+ EXPECT_EQ(groups.size(), 3);
+ EXPECT_EQ(groups[0], std::vector<int64>({0}));
+ EXPECT_EQ(groups[1], std::vector<int64>({2}));
+ EXPECT_EQ(groups[2], std::vector<int64>({3}));
+
+ std::vector<Eigen::Tensor<int64, 2, Eigen::RowMajor> > expected_indices;
+ std::vector<Eigen::Tensor<int32, 1, Eigen::RowMajor> > expected_vals;
+
+ // First group: 000, 002, 010
+ expected_indices.emplace_back(3, NDIM); // 3 x 3 tensor
+  expected_vals.emplace_back(3);           // values vector of length 3
+ expected_indices[0].setZero();
+ expected_indices[0](1, 2) = 2; // 002
+ expected_indices[0](2, 1) = 1; // 010
+ expected_vals[0].setConstant(-1);
+ expected_vals[0](0) = 1; // val associated with ix 000
+ expected_vals[0](1) = 5; // val associated with ix 002
+ expected_vals[0](2) = 4; // val associated with ix 010
+
+ // Second group: 200
+ expected_indices.emplace_back(1, NDIM);
+ expected_vals.emplace_back(1);
+ expected_indices[1].setZero();
+ expected_indices[1](0, 0) = 2; // 200
+ expected_vals[1](0) = 3; // val associated with ix 200
+
+ // Third group: 300
+ expected_indices.emplace_back(1, NDIM);
+ expected_vals.emplace_back(1);
+ expected_indices[2].setZero();
+ expected_indices[2](0, 0) = 3; // 300
+ expected_vals[2](0) = 2; // val associated with ix 300
+
+ for (std::size_t gix = 0; gix < groups.size(); ++gix) {
+ // Compare indices
+ auto gi_t = grouped_indices[gix];
+ Eigen::Tensor<bool, 0, Eigen::RowMajor> eval =
+ (gi_t == expected_indices[gix]).all();
+ EXPECT_TRUE(eval()) << gix << " indices: " << gi_t << " vs. "
+ << expected_indices[gix];
+
+ // Compare values
+ auto gv_t = grouped_values[gix];
+ eval = (gv_t == expected_vals[gix]).all();
+ EXPECT_TRUE(eval()) << gix << " values: " << gv_t << " vs. "
+ << expected_vals[gix];
+ }
+}
+
+TEST(SparseTensorTest, Concat) {
+ int N = 5;
+ const int NDIM = 3;
+
+ Tensor ix(DT_INT64, TensorShape({N, NDIM}));
+ Tensor vals(DT_STRING, TensorShape({N}));
+
+ auto ix_c = GetSimpleIndexTensor(N, NDIM);
+
+ auto ix_t = ix.matrix<int64>();
+ auto vals_t = vals.vec<string>();
+
+ ix_t = ix_c;
+
+ TensorShape shape({10, 10, 10});
+ std::vector<int64> order{0, 1, 2};
+
+ SparseTensor st(ix, vals, shape, order);
+ EXPECT_FALSE(st.IndicesValid());
+ st.Reorder<string>(order);
+ EXPECT_TRUE(st.IndicesValid());
+
+ SparseTensor concatted = SparseTensor::Concat<string>({st, st, st, st});
+ EXPECT_EQ(concatted.order(), st.order());
+ TensorShape expected_shape({40, 10, 10});
+ EXPECT_EQ(concatted.shape(), expected_shape);
+ EXPECT_EQ(concatted.num_entries(), 4 * N);
+ EXPECT_TRUE(concatted.IndicesValid());
+
+ auto conc_ix_t = concatted.indices().matrix<int64>();
+ auto conc_vals_t = concatted.values().vec<string>();
+
+ for (int n = 0; n < 4; ++n) {
+ for (int i = 0; i < N; ++i) {
+ // Dimensions match except the primary dim, which is offset by
+ // shape[order[0]]
+ EXPECT_EQ(conc_ix_t(n * N + i, 0), 10 * n + ix_t(i, 0));
+      EXPECT_EQ(conc_ix_t(n * N + i, 1), ix_t(i, 1));
+      EXPECT_EQ(conc_ix_t(n * N + i, 2), ix_t(i, 2));
+
+ // Values match
+ EXPECT_EQ(conc_vals_t(n * N + i), vals_t(i));
+ }
+ }
+
+ // Concat works if non-primary ix is out of order, but output order
+ // is not defined
+ SparseTensor st_ooo(ix, vals, shape, {0, 2, 1}); // non-primary ix OOO
+ SparseTensor conc_ooo = SparseTensor::Concat<string>({st, st, st, st_ooo});
+ std::vector<int64> expected_ooo{-1, -1, -1};
+ EXPECT_EQ(conc_ooo.order(), expected_ooo);
+ EXPECT_EQ(conc_ooo.shape(), expected_shape);
+ EXPECT_EQ(conc_ooo.num_entries(), 4 * N);
+}
+
+// TODO(ebrevdo): ReduceToDense(R={dim1,dim2,...}, reduce_fn, &output)
+// reduce_fn sees slices of resorted values based on generator (dim: DDIMS), and
+// slices of resorted indices on generator.
+
+} // namespace
+} // namespace sparse
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_reader.cc b/tensorflow/core/util/tensor_slice_reader.cc
new file mode 100644
index 0000000000..00bc16f105
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_reader.cc
@@ -0,0 +1,230 @@
+#include "tensorflow/core/util/tensor_slice_reader.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/io/iterator.h"
+#include "tensorflow/core/lib/io/match.h"
+#include "tensorflow/core/lib/io/table.h"
+#include "tensorflow/core/lib/io/table_options.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+#include "tensorflow/core/util/tensor_slice_util.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+TensorSliceReader::Table::~Table() {}
+
+namespace {
+class TensorSliceReaderTable : public TensorSliceReader::Table {
+ public:
+ explicit TensorSliceReaderTable(RandomAccessFile* f, table::Table* t)
+ : file_(f), table_(t) {}
+
+ ~TensorSliceReaderTable() override {
+ delete table_;
+ delete file_;
+ }
+
+ bool Get(const string& key, string* value) override {
+ std::unique_ptr<table::Iterator> iter(table_->NewIterator());
+ iter->Seek(key);
+ if (iter->Valid() && iter->key() == key) {
+ StringPiece v = iter->value();
+ value->assign(v.data(), v.size());
+ return true;
+ } else {
+ return false;
+ }
+ }
+
+ private:
+ RandomAccessFile* file_;
+ table::Table* table_;
+};
+} // namespace
+
+Status OpenTableTensorSliceReader(const string& fname,
+ TensorSliceReader::Table** result) {
+ *result = nullptr;
+ Env* env = Env::Default();
+ RandomAccessFile* f = nullptr;
+ Status s = env->NewRandomAccessFile(fname, &f);
+ if (s.ok()) {
+ uint64 file_size;
+ s = env->GetFileSize(fname, &file_size);
+ if (s.ok()) {
+ table::Options options;
+ table::Table* table;
+ s = table::Table::Open(options, f, file_size, &table);
+ if (s.ok()) {
+ *result = new TensorSliceReaderTable(f, table);
+ return Status::OK();
+ } else {
+ s = Status(s.code(),
+ strings::StrCat(s.error_message(),
+ ": perhaps your file is in a different "
+ "file format and you need to use a "
+ "different restore operator?"));
+ }
+ }
+ }
+ LOG(WARNING) << "Could not open " << fname << ": " << s;
+ delete f;
+ return s;
+}
+
+TensorSliceReader::TensorSliceReader(const string& filepattern,
+ OpenTableFunction open_function)
+ : TensorSliceReader(filepattern, open_function, kLoadAllShards) {}
+
+TensorSliceReader::TensorSliceReader(const string& filepattern,
+ OpenTableFunction open_function,
+ int preferred_shard)
+ : filepattern_(filepattern), open_function_(open_function) {
+ VLOG(1) << "TensorSliceReader for " << filepattern;
+ Status s = io::GetMatchingFiles(Env::Default(), filepattern, &fnames_);
+ if (!s.ok()) {
+ status_ = errors::InvalidArgument(
+ "Unsuccessful TensorSliceReader constructor: "
+ "Failed to get matching files on ",
+ filepattern, ": ", s.ToString());
+ return;
+ }
+ if (fnames_.empty()) {
+ status_ = errors::NotFound(
+ "Unsuccessful TensorSliceReader constructor: "
+ "Failed to find any matching files for ",
+ filepattern);
+ return;
+ }
+ sss_.resize(fnames_.size());
+ for (size_t shard = 0; shard < fnames_.size(); ++shard) {
+ fname_to_index_.insert(std::make_pair(fnames_[shard], shard));
+ }
+ if (preferred_shard == kLoadAllShards || fnames_.size() == 1 ||
+ static_cast<size_t>(preferred_shard) >= fnames_.size()) {
+ LoadAllShards();
+ } else {
+ VLOG(1) << "Loading shard " << preferred_shard << " for " << filepattern_;
+ LoadShard(preferred_shard);
+ }
+}
+
+void TensorSliceReader::LoadShard(int shard) const {
+ CHECK_LT(shard, sss_.size());
+ if (sss_[shard] || !status_.ok()) {
+ return; // Already loaded, or invalid.
+ }
+ string value;
+ SavedTensorSlices sts;
+ const string fname = fnames_[shard];
+ VLOG(1) << "Reading meta data from file " << fname << "...";
+ Table* table;
+ Status s = open_function_(fname, &table);
+ if (!s.ok()) {
+ status_ = errors::DataLoss("Unable to open table file ", fname, ": ",
+ s.ToString());
+ return;
+ }
+ sss_[shard].reset(table);
+ if (!(table->Get(kSavedTensorSlicesKey, &value) &&
+ ParseProtoUnlimited(&sts, value))) {
+ status_ = errors::Internal(
+ "Failed to find the saved tensor slices at the beginning of the "
+ "checkpoint file: ",
+ fname);
+ return;
+ }
+ for (const SavedSliceMeta& ssm : sts.meta().tensor()) {
+ TensorShape ssm_shape(ssm.shape());
+ for (const TensorSliceProto& tsp : ssm.slice()) {
+ TensorSlice ss_slice(tsp);
+ RegisterTensorSlice(ssm.name(), ssm_shape, ssm.type(), fname, ss_slice);
+ }
+ }
+}
+
+void TensorSliceReader::LoadAllShards() const {
+ VLOG(1) << "Loading all shards for " << filepattern_;
+ for (size_t i = 0; i < fnames_.size() && status_.ok(); ++i) {
+ LoadShard(i);
+ }
+ all_shards_loaded_ = true;
+}
+
+const TensorSliceSet* TensorSliceReader::FindTensorSlice(
+ const string& name, const TensorSlice& slice,
+ std::vector<std::pair<TensorSlice, string>>* details) const {
+ const TensorSliceSet* tss = gtl::FindPtrOrNull(tensors_, name);
+ if (tss && !tss->QueryMeta(slice, details)) {
+ return nullptr;
+ }
+ return tss;
+}
+
+TensorSliceReader::~TensorSliceReader() { gtl::STLDeleteValues(&tensors_); }
+
+void TensorSliceReader::RegisterTensorSlice(const string& name,
+ const TensorShape& shape,
+ DataType type, const string& tag,
+ const TensorSlice& slice) const {
+ TensorSliceSet* tss = gtl::FindPtrOrNull(tensors_, name);
+ // Create a tensor slice set if needed
+ if (!tss) {
+ tss = new TensorSliceSet(shape, type);
+ tensors_.insert(std::make_pair(name, tss));
+ } else {
+ // Check if the shapes match
+ TensorShape tss_shape(tss->shape());
+ if (!shape.IsSameSize(tss_shape)) {
+ status_ =
+ errors::Internal("Incompatible tensor shapes detected for tensor ",
+ name, ": existing = ", tss_shape.DebugString(),
+ ", new = ", shape.DebugString());
+ return;
+ }
+ if (type != tss->type()) {
+ status_ =
+ errors::Internal("Incompatible tensor types detected for tensor ",
+ name, ": existing = ", DataTypeString(tss->type()),
+ ", new = ", DataTypeString(type));
+ return;
+ }
+ }
+ // Register the tensor slices without the actual data.
+ Status s = tss->Register(slice, tag, nullptr);
+ if (!s.ok()) {
+ status_ = s;
+ }
+}
+
+bool TensorSliceReader::HasTensor(const string& name, TensorShape* shape,
+ DataType* type) const {
+ mutex_lock l(mu_);
+ const TensorSliceSet* tss = gtl::FindPtrOrNull(tensors_, name);
+ if (!tss && !all_shards_loaded_) {
+ VLOG(1) << "Did not find tensor in preferred shard, loading all shards: "
+ << name;
+ LoadAllShards();
+ tss = gtl::FindPtrOrNull(tensors_, name);
+ }
+ if (tss) {
+ if (shape) {
+ *shape = tss->shape();
+ }
+ if (type) {
+ *type = tss->type();
+ }
+ return true;
+ } else {
+ return false;
+ }
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_reader.h b/tensorflow/core/util/tensor_slice_reader.h
new file mode 100644
index 0000000000..b5f26a689b
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_reader.h
@@ -0,0 +1,157 @@
+// The utility to read checkpoints for google brain tensor ops and v3
+// checkpoints for dist_belief.
+//
+
+#ifndef TENSORFLOW_UTIL_TENSOR_SLICE_READER_H_
+#define TENSORFLOW_UTIL_TENSOR_SLICE_READER_H_
+
+#include <unordered_map>
+
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/util/saved_tensor_slice.pb.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+#include "tensorflow/core/util/tensor_slice_set.h"
+#include "tensorflow/core/util/tensor_slice_util.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+// The reader reads in all the meta data about all the tensor slices. Then it
+// will try to read the relevant data on-demand to produce the data for the
+// slices needed.
+// NOTE(yangke): another way to do this is to first load a list of the tensor
+// slices needed and then just selectively read some of the meta data. That
+// might optimize the loading but makes the logic a bit more complicated. We
+// might want to revisit that.
+// TODO(yangke): consider moving to TensorProto.
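+//
+// A minimal usage sketch (the checkpoint pattern and the tensor name
+// "weights" are hypothetical, and "weights" is assumed to be a float tensor
+// written by TensorSliceWriter):
+//
+//   TensorSliceReader reader("/tmp/ckpt_*", OpenTableTensorSliceReader);
+//   TF_CHECK_OK(reader.status());
+//   TensorShape shape;
+//   DataType type;
+//   if (reader.HasTensor("weights", &shape, &type)) {
+//     // Read the whole tensor back as one full slice.
+//     std::vector<float> data(shape.num_elements());
+//     TensorSlice full(shape.dims());
+//     CHECK(reader.CopySliceData("weights", full, data.data()));
+//   }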
+class TensorSliceReader {
+ public:
+ // Abstract interface for reading data out of a tensor slice checkpoint file
+ class Table {
+ public:
+ virtual ~Table();
+ virtual bool Get(const string& key, string* value) = 0;
+ };
+ typedef std::function<Status(const string&, Table**)> OpenTableFunction;
+
+ static const int kLoadAllShards = -1;
+ TensorSliceReader(const string& filepattern, OpenTableFunction open_function);
+ TensorSliceReader(const string& filepattern, OpenTableFunction open_function,
+ int preferred_shard);
+ virtual ~TensorSliceReader();
+
+  // Get the file pattern this reader is attached to.
+ const string& filepattern() const { return filepattern_; }
+
+ // Get the number of files matched.
+ int num_files() const { return sss_.size(); }
+
+ // Get the status of the reader.
+ const Status status() const { return status_; }
+
+ // Checks if the reader contains any slice of a tensor. In case the reader
+ // does contain the tensor, if "shape" is not nullptr, fill "shape" with the
+ // shape of the tensor; if "type" is not nullptr, fill "type" with the type
+ // of the tensor.
+ bool HasTensor(const string& name, TensorShape* shape, DataType* type) const;
+
+ // Checks if the reader contains all the data about a tensor slice, and if
+ // yes, copies the data of the slice to "data". The caller needs to make sure
+ // that "data" points to a buffer that holds enough data.
+ // This is a slow function since it needs to read sstables.
+ template <typename T>
+ bool CopySliceData(const string& name, const TensorSlice& slice,
+ T* data) const;
+
+ // Get the tensors.
+ const std::unordered_map<string, TensorSliceSet*>& Tensors() const {
+ return tensors_;
+ }
+
+ private:
+ friend class TensorSliceWriteTestHelper;
+
+ void LoadShard(int shard) const;
+ void LoadAllShards() const;
+ void RegisterTensorSlice(const string& name, const TensorShape& shape,
+ DataType type, const string& tag,
+ const TensorSlice& slice) const;
+
+ const TensorSliceSet* FindTensorSlice(
+ const string& name, const TensorSlice& slice,
+ std::vector<std::pair<TensorSlice, string>>* details) const;
+
+ const string filepattern_;
+ const OpenTableFunction open_function_;
+ std::vector<string> fnames_;
+ std::unordered_map<string, int> fname_to_index_;
+
+ // Guards the attributes below.
+ mutable mutex mu_;
+ mutable bool all_shards_loaded_ = false;
+ mutable std::vector<std::unique_ptr<Table>> sss_;
+ mutable std::unordered_map<string, TensorSliceSet*> tensors_;
+ mutable Status status_;
+
+ TF_DISALLOW_COPY_AND_ASSIGN(TensorSliceReader);
+};
+
+Status OpenTableTensorSliceReader(const string& fname,
+ TensorSliceReader::Table** table);
+
+template <typename T>
+bool TensorSliceReader::CopySliceData(const string& name,
+ const TensorSlice& slice, T* data) const {
+ std::vector<std::pair<TensorSlice, string>> details;
+ const TensorSliceSet* tss;
+ {
+ mutex_lock l(mu_);
+ tss = FindTensorSlice(name, slice, &details);
+ if (!tss && !all_shards_loaded_) {
+      VLOG(1) << "Did not find slice in preferred shard, loading all shards: "
+              << name << ": " << slice.DebugString();
+ LoadAllShards();
+ tss = FindTensorSlice(name, slice, &details);
+ }
+ if (!tss) {
+ // No such tensor
+ return false;
+ }
+ }
+ // We have the data -- copy it over.
+ string value;
+ for (const auto& x : details) {
+ const TensorSlice& slice_s = x.first;
+ const string& fname = x.second;
+ int idx = gtl::FindWithDefault(fname_to_index_, fname, -1);
+ CHECK_GE(idx, 0) << "Failed to find the index for filename " << fname;
+ // We read a record in the corresponding sstable
+ const string key = EncodeTensorNameSlice(name, slice_s);
+ CHECK(sss_[idx]->Get(key, &value))
+ << "Failed to seek to the record for tensor " << name << ", slice "
+ << slice_s.DebugString() << ": computed key = " << key;
+ SavedTensorSlices sts;
+ CHECK(ParseProtoUnlimited(&sts, value))
+ << "Failed to parse the record for tensor " << name << ", slice "
+ << slice_s.DebugString() << ": computed key = " << key;
+ CopyDataFromTensorSliceToTensorSlice(
+ tss->shape(), slice_s, slice,
+ checkpoint::TensorProtoData<T>(sts.data().data()), data);
+ }
+ return true;
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_TENSOR_SLICE_READER_H_
diff --git a/tensorflow/core/util/tensor_slice_reader_cache.cc b/tensorflow/core/util/tensor_slice_reader_cache.cc
new file mode 100644
index 0000000000..af81d0115e
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_reader_cache.cc
@@ -0,0 +1,94 @@
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+TensorSliceReaderCacheWrapper::TensorSliceReaderCacheWrapper() {}
+TensorSliceReaderCacheWrapper::~TensorSliceReaderCacheWrapper() {
+ if (cache_) {
+ delete cache_;
+ }
+ cache_ = nullptr;
+}
+
+const TensorSliceReader* TensorSliceReaderCacheWrapper::GetReader(
+ const string& filepattern,
+ TensorSliceReader::OpenTableFunction open_function,
+ int preferred_shard) const {
+ mutex_lock l(mu_);
+ if (!cache_) {
+ cache_ = new TensorSliceReaderCache;
+ }
+ return cache_->GetReader(filepattern, open_function, preferred_shard);
+}
+
+TensorSliceReaderCache::TensorSliceReaderCache() {}
+
+TensorSliceReaderCache::~TensorSliceReaderCache() {
+ for (auto pair : readers_) {
+ delete pair.second.second;
+ }
+}
+
+const TensorSliceReader* TensorSliceReaderCache::GetReader(
+ const string& filepattern,
+ TensorSliceReader::OpenTableFunction open_function, int preferred_shard) {
+ mutex_lock l(mu_);
+
+ // Get the function pointer from the open_function value.
+ TensorSliceReaderCache::OpenFuncType* func_ptr =
+ open_function.target<TensorSliceReaderCache::OpenFuncType>();
+ if (!func_ptr) {
+ // We could not get the pointer, no caching is possible.
+    LOG(WARNING) << "Caching disabled because the open function is not a "
+                    "plain function pointer.";
+ return nullptr;
+ }
+
+ // Wait if another thread is already trying to open the same files.
+ while (still_opening_.find(filepattern) != still_opening_.end()) {
+ cv_.wait(l);
+ }
+
+ TensorSliceReader* reader = nullptr;
+ if (readers_.find(filepattern) == readers_.end()) {
+ VLOG(1) << "Creating new TensorSliceReader for " << filepattern;
+ still_opening_.insert(filepattern);
+    // Release the lock temporarily as constructing TensorSliceReader is
+    // expensive.
+ mu_.unlock();
+ TensorSliceReader* tmp_reader(
+ new TensorSliceReader(filepattern, open_function, preferred_shard));
+ // Acquire the lock again.
+ mu_.lock();
+ if (tmp_reader->status().ok()) {
+ reader = tmp_reader;
+      readers_[filepattern] = std::make_pair(*func_ptr, reader);
+ } else {
+ delete tmp_reader;
+ }
+ CHECK_EQ(1, still_opening_.erase(filepattern));
+ VLOG(1) << "Cached TensorSliceReader for " << filepattern << ": " << reader;
+ } else {
+ auto cached_val = readers_[filepattern];
+ if (cached_val.first == *func_ptr) {
+ reader = cached_val.second;
+ VLOG(1) << "Using cached TensorSliceReader for " << filepattern << ": "
+ << reader;
+ } else {
+ LOG(WARNING) << "Caching disabled because the checkpoint file "
+ << "is being opened with two different open functions: "
+ << filepattern;
+ }
+ }
+
+ cv_.notify_all();
+ return reader;
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_reader_cache.h b/tensorflow/core/util/tensor_slice_reader_cache.h
new file mode 100644
index 0000000000..eaeeeec83f
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_reader_cache.h
@@ -0,0 +1,73 @@
+// The utility to read checkpoints for google brain tensor ops and v3
+// checkpoints for dist_belief.
+//
+
+#ifndef TENSORFLOW_UTIL_TENSOR_SLICE_READER_CACHE_H_
+#define TENSORFLOW_UTIL_TENSOR_SLICE_READER_CACHE_H_
+
+#include <unordered_map>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/util/tensor_slice_reader.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+class TensorSliceReaderCache;
+
+// Wrapper to a lazily allocated TensorSliceReaderCache.
+class TensorSliceReaderCacheWrapper {
+ public:
+ TensorSliceReaderCacheWrapper();
+ ~TensorSliceReaderCacheWrapper();
+
+ // Same as TensorSliceReaderCache::GetReader().
+ const TensorSliceReader* GetReader(
+ const string& filepattern,
+ TensorSliceReader::OpenTableFunction open_function,
+ int preferred_shard) const;
+
+ private:
+ mutable mutex mu_;
+ mutable TensorSliceReaderCache* cache_ = nullptr;
+};
+
+// A cache of TensorSliceReaders.
+class TensorSliceReaderCache {
+ public:
+ TensorSliceReaderCache();
+ ~TensorSliceReaderCache();
+
+ // Returns the TensorSliceReader corresponding to 'filepattern' and the
+  // open_function. May return nullptr if we cannot create a new
+ // TensorSliceReader for the filepattern/open_function combination.
+ const TensorSliceReader* GetReader(
+ const string& filepattern,
+ TensorSliceReader::OpenTableFunction open_function, int preferred_shard);
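+
+  // A minimal usage sketch (hypothetical file pattern and tensor name; the
+  // returned reader is owned by the cache and must not be deleted by the
+  // caller):
+  //
+  //   TensorSliceReaderCache cache;
+  //   const TensorSliceReader* reader = cache.GetReader(
+  //       "/tmp/ckpt_*", OpenTableTensorSliceReader,
+  //       TensorSliceReader::kLoadAllShards);
+  //   if (reader != nullptr) {
+  //     CHECK(reader->HasTensor("weights", nullptr, nullptr));
+  //   }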
+
+ private:
+ // Need to use a regular function type in the key map as std::function does
+ // not support ==.
+ typedef Status (*OpenFuncType)(const string&, TensorSliceReader::Table**);
+
+ // Protects attributes below.
+ mutex mu_;
+
+ // Maps of opened readers.
+ std::unordered_map<string, std::pair<OpenFuncType, TensorSliceReader*>>
+ readers_;
+
+ // Set of keys that a previous GetReader() call is still trying to populate.
+ std::set<string> still_opening_;
+
+ // Condition variable to notify when a reader has been created.
+ condition_variable cv_;
+};
+
+} // namespace checkpoint
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_TENSOR_SLICE_READER_CACHE_H_
diff --git a/tensorflow/core/util/tensor_slice_reader_test.cc b/tensorflow/core/util/tensor_slice_reader_test.cc
new file mode 100644
index 0000000000..e14b920003
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_reader_test.cc
@@ -0,0 +1,395 @@
+#include "tensorflow/core/util/tensor_slice_reader.h"
+
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/status_test_util.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+#include "tensorflow/core/util/tensor_slice_writer.h"
+#include "tensorflow/core/util/tensor_slice_reader_cache.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+namespace {
+
+// A simple test where we write a few tensor slices with a number of tensor
+// slice writers and then read them back from a tensor slice reader.
+//
+// We have a 2-d tensor of shape 4 X 5 that looks like this:
+//
+// 0 1 2 3 4
+// 5 6 7 8 9
+// 10 11 12 13 14
+// 15 16 17 18 19
+//
+// We assume this is a row-major matrix.
+
+void SimpleFloatHelper(TensorSliceWriter::CreateBuilderFunction create_function,
+ TensorSliceReader::OpenTableFunction open_function) {
+ const string fname_base = io::JoinPath(testing::TmpDir(), "float_checkpoint");
+
+ TensorShape shape({4, 5});
+
+ // File #0 contains a slice that is the top two rows:
+ //
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ const string fname = strings::StrCat(fname_base, "_0");
+ TensorSliceWriter writer(fname, create_function);
+ const float data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ TensorSlice slice = TensorSlice::ParseOrDie("0,2:-");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // File #1 contains two slices:
+ //
+ // slice #0 is the bottom left corner
+ // . . . . .
+ // . . . . .
+ // 10 11 12 . .
+ // 15 16 17 . .
+ //
+ // slice #1 is the bottom right corner
+ // . . . . .
+ // . . . . .
+ // . . . . .
+ // . . . 18 19
+ {
+ const string fname = strings::StrCat(fname_base, "_1");
+ TensorSliceWriter writer(fname, create_function);
+ // slice #0
+ {
+ const float data[] = {10, 11, 12, 15, 16, 17};
+ TensorSlice slice = TensorSlice::ParseOrDie("2,2:0,3");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ // slice #1
+ {
+ const float data[] = {18, 19};
+ TensorSlice slice = TensorSlice::ParseOrDie("3,1:3,2");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // Notice that we leave a hole in the tensor
+ // . . . . .
+ // . . . . .
+ // . . . (13) (14)
+ // . . . . .
+
+ // Now we need to read the tensor slices
+ const string filepattern = strings::StrCat(fname_base, "_*");
+ TensorSliceReader reader(filepattern, open_function);
+ EXPECT_OK(reader.status());
+ EXPECT_EQ(2, reader.num_files());
+
+ // We query some of the tensors
+ {
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("test", &shape, &type));
+ EXPECT_EQ(
+ "dim { size: 4 } "
+ "dim { size: 5 }",
+ shape.DebugString());
+ EXPECT_EQ(DT_FLOAT, type);
+ EXPECT_FALSE(reader.HasTensor("don't exist", nullptr, nullptr));
+ }
+
+ // Now we query some slices
+ //
+ // Slice #1 is an exact match
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("0,2:-");
+ float expected[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ float results[10];
+ EXPECT_TRUE(reader.CopySliceData("test", s, results));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #2 is a subset match
+ // . . . . .
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,1:-");
+ float expected[] = {5, 6, 7, 8, 9};
+ float results[5];
+ EXPECT_TRUE(reader.CopySliceData("test", s, results));
+ for (int i = 0; i < 5; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #4 includes the hole and so there is no match
+ // . . . . .
+ // . . 7 8 9
+ // . . 12 13 14
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:2,3");
+ float results[6];
+ EXPECT_FALSE(reader.CopySliceData("test", s, results));
+ }
+}
+
+TEST(TensorSliceReaderTest, SimpleFloat) {
+ SimpleFloatHelper(CreateTableTensorSliceBuilder, OpenTableTensorSliceReader);
+}
+
+template <typename T, typename U>
+void SimpleIntXHelper(TensorSliceWriter::CreateBuilderFunction create_function,
+ TensorSliceReader::OpenTableFunction open_function,
+ const string& checkpoint_file) {
+ const string fname_base = io::JoinPath(testing::TmpDir(), checkpoint_file);
+
+ TensorShape shape({4, 5});
+
+ // File #0 contains a slice that is the top two rows:
+ //
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ const string fname = strings::StrCat(fname_base, "_0");
+ TensorSliceWriter writer(fname, create_function);
+ const T data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ TensorSlice slice = TensorSlice::ParseOrDie("0,2:-");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // File #1 contains two slices:
+ //
+ // slice #0 is the bottom left corner
+ // . . . . .
+ // . . . . .
+ // 10 11 12 . .
+ // 15 16 17 . .
+ //
+ // slice #1 is the bottom right corner
+ // . . . . .
+ // . . . . .
+ // . . . . .
+ // . . . 18 19
+ {
+ const string fname = strings::StrCat(fname_base, "_1");
+ TensorSliceWriter writer(fname, create_function);
+ // slice #0
+ {
+ const T data[] = {10, 11, 12, 15, 16, 17};
+ TensorSlice slice = TensorSlice::ParseOrDie("2,2:0,3");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ // slice #1
+ {
+ const T data[] = {18, 19};
+ TensorSlice slice = TensorSlice::ParseOrDie("3,1:3,2");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // Notice that we leave a hole in the tensor
+ // . . . . .
+ // . . . . .
+ // . . . (13) (14)
+ // . . . . .
+
+ // Now we need to read the tensor slices
+ const string filepattern = strings::StrCat(fname_base, "_*");
+ TensorSliceReader reader(filepattern, open_function);
+ EXPECT_OK(reader.status());
+ EXPECT_EQ(2, reader.num_files());
+
+ // We query some of the tensors
+ {
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader.HasTensor("test", &shape, &type));
+ EXPECT_EQ(
+ "dim { size: 4 } "
+ "dim { size: 5 }",
+ shape.DebugString());
+ EXPECT_EQ(DataTypeToEnum<T>::v(), type);
+ EXPECT_FALSE(reader.HasTensor("don't exist", nullptr, nullptr));
+ }
+
+ // Now we query some slices
+ //
+ // Slice #1 is an exact match
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("0,2:-");
+ T expected[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ U results[10];
+ EXPECT_TRUE(reader.CopySliceData("test", s, results));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #2 is a subset match
+ // . . . . .
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,1:-");
+ T expected[] = {5, 6, 7, 8, 9};
+ U results[5];
+ EXPECT_TRUE(reader.CopySliceData("test", s, results));
+ for (int i = 0; i < 5; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #4 includes the hole and so there is no match
+ // . . . . .
+ // . . 7 8 9
+ // . . 12 13 14
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:2,3");
+ U results[6];
+ EXPECT_FALSE(reader.CopySliceData("test", s, results));
+ }
+}
+
+#define TEST_SIMPLE_INT(TYPE, SAVED_TYPE) \
+ TEST(TensorSliceReaderTest, Simple##TYPE) { \
+ SimpleIntXHelper<TYPE, SAVED_TYPE>(CreateTableTensorSliceBuilder, \
+ OpenTableTensorSliceReader, \
+ #TYPE "_checkpoint"); \
+ }
+
+TEST_SIMPLE_INT(int32, int32)
+TEST_SIMPLE_INT(int64, int64)
+TEST_SIMPLE_INT(int16, int32)
+TEST_SIMPLE_INT(int8, int32)
+TEST_SIMPLE_INT(uint8, int32)
+
+void CachedTensorSliceReaderTesterHelper(
+ TensorSliceWriter::CreateBuilderFunction create_function,
+ TensorSliceReader::OpenTableFunction open_function) {
+ const string fname_base = io::JoinPath(testing::TmpDir(), "float_checkpoint");
+
+ TensorShape shape({4, 5});
+
+ // File #0 contains a slice that is the top two rows:
+ //
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ const string fname = strings::StrCat(fname_base, "_0");
+ TensorSliceWriter writer(fname, create_function);
+ const float data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ TensorSlice slice = TensorSlice::ParseOrDie("0,2:-");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // File #1 contains two slices:
+ //
+ // slice #0 is the bottom left corner
+ // . . . . .
+ // . . . . .
+ // 10 11 12 . .
+ // 15 16 17 . .
+ //
+ // slice #1 is the bottom right corner
+ // . . . . .
+ // . . . . .
+ // . . . . .
+ // . . . 18 19
+ {
+ const string fname = strings::StrCat(fname_base, "_1");
+ TensorSliceWriter writer(fname, create_function);
+ // slice #0
+ {
+ const float data[] = {10, 11, 12, 15, 16, 17};
+ TensorSlice slice = TensorSlice::ParseOrDie("2,2:0,3");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ // slice #1
+ {
+ const float data[] = {18, 19};
+ TensorSlice slice = TensorSlice::ParseOrDie("3,1:3,2");
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+ TF_CHECK_OK(writer.Finish());
+ }
+
+ // Notice that we leave a hole in the tensor
+ // . . . . .
+ // . . . . .
+ // . . . (13) (14)
+ // . . . . .
+
+ // Now we need to read the tensor slices
+ TensorSliceReaderCache cache;
+ const string filepattern = strings::StrCat(fname_base, "_*");
+ const TensorSliceReader* reader = cache.GetReader(
+ filepattern, open_function, TensorSliceReader::kLoadAllShards);
+ EXPECT_TRUE(reader != nullptr);
+ EXPECT_EQ(2, reader->num_files());
+
+ // We query some of the tensors
+ {
+ TensorShape shape;
+ DataType type;
+ EXPECT_TRUE(reader->HasTensor("test", &shape, &type));
+ EXPECT_EQ(
+ "dim { size: 4 } "
+ "dim { size: 5 }",
+ shape.DebugString());
+ EXPECT_EQ(DT_FLOAT, type);
+ EXPECT_FALSE(reader->HasTensor("don't exist", nullptr, nullptr));
+ }
+
+ // Make sure the reader is cached.
+ const TensorSliceReader* reader2 = cache.GetReader(
+ filepattern, open_function, TensorSliceReader::kLoadAllShards);
+ EXPECT_EQ(reader, reader2);
+
+ reader = cache.GetReader("file_does_not_exist", open_function,
+ TensorSliceReader::kLoadAllShards);
+ EXPECT_TRUE(reader == nullptr);
+}
+
+TEST(CachedTensorSliceReaderTest, SimpleFloat) {
+ CachedTensorSliceReaderTesterHelper(CreateTableTensorSliceBuilder,
+ OpenTableTensorSliceReader);
+}
+
+} // namespace
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_set.cc b/tensorflow/core/util/tensor_slice_set.cc
new file mode 100644
index 0000000000..765686f189
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_set.cc
@@ -0,0 +1,148 @@
+#include "tensorflow/core/util/tensor_slice_set.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/util/tensor_slice_util.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+TensorSliceSet::TensorSliceSet(const TensorShape& shape, DataType type)
+ : shape_(shape), type_(type) {}
+
+TensorSliceSet::~TensorSliceSet() {}
+
+Status TensorSliceSet::Register(const TensorSlice& slice,
+ const string& tag, const float* data) {
+ TensorShape result_shape;
+ TF_RETURN_IF_ERROR(slice.SliceTensorShape(shape_, &result_shape));
+ string str = slice.DebugString();
+ // We check if there is any intersection between this slice and any of the
+ // registered slices.
+ for (const auto x : slices_) {
+ if (slice.Overlaps(x.second.slice)) {
+ return errors::Internal("Overlapping slices: existing slice = ", x.first,
+ ", new slice = ", str);
+ }
+ }
+ // No overlap: we can now insert the slice
+ TensorSliceSet::SliceInfo info = {slice, tag, data,
+ result_shape.num_elements()};
+ slices_.insert(std::make_pair(str, info));
+ return Status::OK();
+}
+
+// TODO(yangke): merge Query() with QueryMeta()
+bool TensorSliceSet::Query(const TensorSlice& slice, float* data) const {
+ Status s;
+ string str = slice.DebugString();
+  // First we check if there is an exact match (this is the dominant case).
+ const TensorSliceSet::SliceInfo* info = gtl::FindOrNull(slices_, str);
+ if (info) {
+ if (data) {
+ std::copy_n(info->data, info->num_floats, data);
+ }
+ return true;
+ } else {
+    // We didn't find any exact match but there is still a possibility that
+    // multiple existing slices can be patched together to output the slice.
+ // We figure this out by computing the intersection of each of the existing
+ // slices with the query slice, and check if the union of all these
+ // intersections cover the entire slice. We rely on the fact that the
+ // existing slices don't have any intersection among themselves.
+ TensorShape target_shape;
+ Status s;
+ s = slice.SliceTensorShape(shape_, &target_shape);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+ int64 total_size = target_shape.num_elements();
+
+ int64 overlap_size = 0;
+ TensorSlice intersection;
+ TensorShape inter_shape;
+ for (const auto x : slices_) {
+ if (slice.Intersect(x.second.slice, &intersection)) {
+ s = intersection.SliceTensorShape(shape_, &inter_shape);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+ overlap_size += inter_shape.num_elements();
+ }
+ }
+ if (total_size == overlap_size) {
+ // We have it!
+ // Now we need to copy the data to "data"
+ if (data) {
+ for (const auto x : slices_) {
+ CopyDataFromTensorSliceToTensorSlice(shape_, x.second.slice, slice,
+ x.second.data, data);
+ }
+ }
+ return true;
+ } else {
+ // We don't have all the data for the asked tensor slice
+ return false;
+ }
+ }
+}
+
+bool TensorSliceSet::QueryMeta(
+ const TensorSlice& slice,
+ std::vector<std::pair<TensorSlice, string>>* results) const {
+ results->clear();
+ Status s;
+ string str = slice.DebugString();
+  // First we check if there is an exact match (this is the dominant case).
+ const TensorSliceSet::SliceInfo* info = gtl::FindOrNull(slices_, str);
+ if (info) {
+ results->emplace_back(std::make_pair(info->slice, info->tag));
+ return true;
+ } else {
+    // We didn't find any exact match but there is still a possibility that
+ // multiple existing slices can be patched together to output the slice.
+ // We figure this out by computing the intersection of each of the existing
+ // slices with the query slice, and check if the union of all these
+ // intersections cover the entire slice. We rely on the fact that the
+ // existing slices don't have any intersection among themselves.
+ TensorShape target_shape;
+ Status s;
+ s = slice.SliceTensorShape(shape_, &target_shape);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+ int64 total_size = target_shape.num_elements();
+
+ int64 overlap_size = 0;
+ TensorSlice intersection;
+ TensorShape inter_shape;
+ for (const auto x : slices_) {
+ if (slice.Intersect(x.second.slice, &intersection)) {
+ s = intersection.SliceTensorShape(shape_, &inter_shape);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+ overlap_size += inter_shape.num_elements();
+ results->emplace_back(std::make_pair(x.second.slice, x.second.tag));
+ }
+ }
+ if (total_size == overlap_size) {
+ // We have it!
+ return true;
+ } else {
+ // We don't have all the data for the asked tensor slice
+ results->clear();
+ return false;
+ }
+ }
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_set.h b/tensorflow/core/util/tensor_slice_set.h
new file mode 100644
index 0000000000..f3f7ac0e76
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_set.h
@@ -0,0 +1,73 @@
+// A class to manage slices of a tensor. You can "register" a set of slices
+// for a tensor and then "query" whether we have data for a given slice.
+
+// TODO(yangke): consider moving it to a more private place so that we don't
+// need to expose the API.
+
+#ifndef TENSORFLOW_UTIL_TENSOR_SLICE_SET_H_
+#define TENSORFLOW_UTIL_TENSOR_SLICE_SET_H_
+
+#include <string> // for string
+#include <unordered_map>
+
+#include "tensorflow/core/platform/port.h" // for int64
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/lib/core/stringpiece.h" // for StringPiece
+#include "tensorflow/core/public/status.h" // for Status
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+class TensorSliceSet {
+ public:
+ TensorSliceSet(const TensorShape& shape, DataType type);
+ virtual ~TensorSliceSet();
+
+ const TensorShape& shape() const { return shape_; }
+ const DataType type() const { return type_; }
+
+ // Register a new slice for the tensor. The "tag" is an arbitrary string
+ // associated with the slice (in one application it denotes the name of the
+ // file that contains the slice); the "data" points to the data of the tensor
+ // slice (it can be a nullptr).
+ // We don't take the ownership of "data" and the caller needs to make sure
+ // the data is always available during the life time of the tensor slice set
+ // if it is not nullptr.
+ Status Register(const TensorSlice& slice, const string& tag,
+ const float* data);
+
+ // Query about a new slice: checks if we have data for "slice" and if we have
+ // the data and "data" is not nullptr, fill "data" with the slice data. The
+  // caller needs to make sure "data" points to a large enough buffer.
+ // TODO(yangke): avoid unnecessary copying by using a core::RefCounted
+ // pointer.
+ bool Query(const TensorSlice& slice, float* data) const;
+
+ // Alternative way of querying about a new slice: instead of copying the
+ // data, it returns a list of meta data about the stored slices that will
+ // supply data for the slice.
+ bool QueryMeta(
+ const TensorSlice& slice,
+ std::vector<std::pair<tensorflow::TensorSlice, string>>* results) const;
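+
+  // A minimal usage sketch (the data pointers "top" and "bottom" are
+  // hypothetical; the set does not take ownership of the registered data):
+  //
+  //   TensorSliceSet tss(TensorShape({4, 5}), DT_FLOAT);
+  //   TF_CHECK_OK(tss.Register(TensorSlice::ParseOrDie("0,2:-"), "f0", top));
+  //   TF_CHECK_OK(tss.Register(TensorSlice::ParseOrDie("2,2:-"), "f1", bottom));
+  //   float row1[5];
+  //   CHECK(tss.Query(TensorSlice::ParseOrDie("1,1:-"), row1));  // row 1 only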
+
+ private:
+ const TensorShape shape_;
+ const DataType type_;
+ struct SliceInfo {
+ TensorSlice slice;
+ const string tag;
+ const float* data;
+ int64 num_floats;
+ };
+ // We maintain a mapping from the slice string to the slice information.
+ std::unordered_map<string, SliceInfo> slices_;
+};
+
+} // namespace checkpoint
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_TENSOR_SLICE_SET_H_
diff --git a/tensorflow/core/util/tensor_slice_set_test.cc b/tensorflow/core/util/tensor_slice_set_test.cc
new file mode 100644
index 0000000000..fb2f46f34c
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_set_test.cc
@@ -0,0 +1,227 @@
+#include "tensorflow/core/util/tensor_slice_set.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include <gtest/gtest.h>
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+namespace {
+
+// A simple test: we have a 2-d tensor of shape 4 X 5 that looks like this:
+//
+// 0 1 2 3 4
+// 5 6 7 8 9
+// 10 11 12 13 14
+// 15 16 17 18 19
+//
+// We assume this is a row-major matrix.
+//
+// We store the tensor in a couple of slices and verify that we can recover all
+// of them.
+TEST(TensorSliceSetTest, QueryTwoD) {
+ TensorShape shape({4, 5});
+
+ TensorSliceSet tss(shape, DT_FLOAT);
+ // We store a few slices.
+
+ // Slice #1 is the top two rows:
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ const float src_1[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ TensorSlice slice_1 = TensorSlice::ParseOrDie("0,2:-");
+ TF_CHECK_OK(tss.Register(slice_1, "", src_1));
+
+ // Slice #2 is the bottom left corner
+ // . . . . .
+ // . . . . .
+ // 10 11 12 . .
+ // 15 16 17 . .
+ const float src_2[] = {10, 11, 12, 15, 16, 17};
+ TensorSlice slice_2 = TensorSlice::ParseOrDie("2,2:0,3");
+ TF_CHECK_OK(tss.Register(slice_2, "", src_2));
+
+ // Slice #3 is the bottom right corner
+ // . . . . .
+ // . . . . .
+ // . . . . .
+ // . . . 18 19
+ const float src_3[] = {18, 19};
+ TensorSlice slice_3 = TensorSlice::ParseOrDie("3,1:3,2");
+ TF_CHECK_OK(tss.Register(slice_3, "", src_3));
+
+ // Notice that we leave a hole in the tensor
+ // . . . . .
+ // . . . . .
+ // . . . (13) (14)
+ // . . . . .
+
+ // Now we query some of the slices
+
+ // Slice #1 is an exact match
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("0,2:-");
+ float expected[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
+ float results[10];
+ EXPECT_TRUE(tss.Query(s, results));
+ for (int i = 0; i < 10; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #2 is a subset match
+ // . . . . .
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,1:-");
+ float expected[] = {5, 6, 7, 8, 9};
+ float results[5];
+ EXPECT_TRUE(tss.Query(s, results));
+ for (int i = 0; i < 5; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #3 is a more complicated match: it needs the combination of a couple
+ // of slices
+ // . . . . .
+ // 5 6 7 . .
+ // 10 11 12 . .
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:0,3");
+ float expected[] = {5, 6, 7, 10, 11, 12};
+ float results[6];
+ EXPECT_TRUE(tss.Query(s, results));
+ for (int i = 0; i < 6; ++i) {
+ EXPECT_EQ(expected[i], results[i]);
+ }
+ }
+
+ // Slice #4 includes the hole and so there is no match
+ // . . . . .
+ // . . 7 8 9
+ // . . 12 13 14
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:2,3");
+ float results[6];
+ EXPECT_FALSE(tss.Query(s, results));
+ }
+}
+
+// Testing the meta version of the tensor slice set.
+TEST(TensorSliceSetTest, QueryMetaTwoD) {
+ TensorShape shape({4, 5});
+
+ TensorSliceSet tss(shape, DT_INT32);
+ // We store a few slices.
+
+ // Slice #1 is the top two rows:
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ TensorSlice slice_1 = TensorSlice::ParseOrDie("0,2:-");
+ TF_CHECK_OK(tss.Register(slice_1, "slice_1", nullptr));
+
+ // Slice #2 is the bottom left corner
+ // . . . . .
+ // . . . . .
+ // 10 11 12 . .
+ // 15 16 17 . .
+ TensorSlice slice_2 = TensorSlice::ParseOrDie("2,2:0,3");
+ TF_CHECK_OK(tss.Register(slice_2, "slice_2", nullptr));
+
+ // Slice #3 is the bottom right corner
+ // . . . . .
+ // . . . . .
+ // . . . . .
+ // . . . 18 19
+ TensorSlice slice_3 = TensorSlice::ParseOrDie("3,1:3,2");
+ TF_CHECK_OK(tss.Register(slice_3, "slice_3", nullptr));
+
+ // Notice that we leave a hole in the tensor
+ // . . . . .
+ // . . . . .
+ // . . . (13) (14)
+ // . . . . .
+
+ // Now we query some of the slices
+
+ // Slice #1 is an exact match
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ // We just need slice_1 for this
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("0,2:-");
+ std::vector<std::pair<TensorSlice, string>> results;
+ EXPECT_TRUE(tss.QueryMeta(s, &results));
+ EXPECT_EQ(1, results.size());
+ EXPECT_EQ("0,2:-", results[0].first.DebugString());
+ EXPECT_EQ("slice_1", results[0].second);
+ }
+
+ // Slice #2 is a subset match
+ // . . . . .
+ // 5 6 7 8 9
+ // . . . . .
+ // . . . . .
+ // We just need slice_1 for this
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,1:-");
+ std::vector<std::pair<TensorSlice, string>> results;
+ EXPECT_TRUE(tss.QueryMeta(s, &results));
+ EXPECT_EQ(1, results.size());
+ EXPECT_EQ("0,2:-", results[0].first.DebugString());
+ EXPECT_EQ("slice_1", results[0].second);
+ }
+
+ // Slice #3 is a more complicated match: it needs the combination of a couple
+ // of slices
+ // . . . . .
+ // 5 6 7 . .
+ // 10 11 12 . .
+ // . . . . .
+ // We need both slice_1 and slice_2 for this.
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:0,3");
+ std::vector<std::pair<TensorSlice, string>> results;
+ EXPECT_TRUE(tss.QueryMeta(s, &results));
+ EXPECT_EQ(2, results.size());
+ EXPECT_EQ("2,2:0,3", results[0].first.DebugString());
+ EXPECT_EQ("slice_2", results[0].second);
+ EXPECT_EQ("0,2:-", results[1].first.DebugString());
+ EXPECT_EQ("slice_1", results[1].second);
+ }
+
+ // Slice #4 includes the hole and so there is no match
+ // . . . . .
+ // . . 7 8 9
+ // . . 12 13 14
+ // . . . . .
+ {
+ TensorSlice s = TensorSlice::ParseOrDie("1,2:2,3");
+ std::vector<std::pair<TensorSlice, string>> results;
+ EXPECT_FALSE(tss.QueryMeta(s, &results));
+ EXPECT_EQ(0, results.size());
+ }
+}
+
+} // namespace
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_util.h b/tensorflow/core/util/tensor_slice_util.h
new file mode 100644
index 0000000000..5422c3bef3
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_util.h
@@ -0,0 +1,88 @@
+#ifndef TENSORFLOW_UTIL_TENSOR_SLICE_UTIL_H_
+#define TENSORFLOW_UTIL_TENSOR_SLICE_UTIL_H_
+
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+
+namespace tensorflow {
+
+// Some hackery that uses Eigen tensors to copy data between tensor slices of
+// tensors with a variable number of dimensions.
+// TODO(yangke): get rid of this once variable dimension tensor support is in.
+static const int kTensorSliceMaxRank = 8;
+
+// Create a tensor map with the given shape: we support up to 8 dimensions. If
+// the shape has fewer than 8 dimensions, we pad the remaining dimensions
+// with 1.
+template <typename T>
+Eigen::TensorMap<Eigen::Tensor<T, kTensorSliceMaxRank, Eigen::RowMajor>>
+GetEigenTensorMapFromTensorShape(const TensorShape& shape, T* data) {
+ Eigen::DSizes<Eigen::DenseIndex, kTensorSliceMaxRank> dsizes =
+ shape.AsEigenDSizesWithPadding<kTensorSliceMaxRank>();
+ Eigen::TensorMap<Eigen::Tensor<T, kTensorSliceMaxRank, Eigen::RowMajor>> eig(
+ data, dsizes);
+ return eig;
+}
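+
+// For example (a sketch of the padding behavior described above): a rank-2
+// shape of {4, 5} is mapped to the 8-d sizes {4, 5, 1, 1, 1, 1, 1, 1}, so
+// tensors of different ranks can all be addressed through maps of the same
+// static rank:
+//
+//   float buf[20];
+//   auto t = GetEigenTensorMapFromTensorShape(TensorShape({4, 5}), buf);
+//   // t.dimension(0) == 4, t.dimension(1) == 5, t.dimension(2) ... == 1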
+
+// Given a tensor described by "shape", two slices "slice_s" and "slice_d",
+// and two pointers "ptr_s" and "ptr_d" to the chunks of memory holding the
+// data for "slice_s" and "slice_d" respectively, this function copies the data
+// that belongs to the intersection of the two slices from slice_s to
+// slice_d. Uses Tensor cast<DstT>() to convert from SrcT to DstT. Returns true
+// iff the two slices share any intersection (and thus some data is copied).
+// TODO(yangke): figure out if we can make it private.
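+//
+// For example (mirroring "Case 3" in tensor_slice_util_test.cc): with shape
+// {4, 5}, slice_s = "0,3:0,3" (the top-left 3 x 3 block) and slice_d =
+// "1,2:1,4" (rows 1-2, columns 1-4), the intersection is rows 1-2, columns
+// 1-2, so exactly those four values are copied from ptr_s into the matching
+// offsets of ptr_d; the rest of ptr_d is left untouched.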
+template <typename SrcT, typename DstT>
+static bool CopyDataFromTensorSliceToTensorSlice(const TensorShape& shape,
+ const TensorSlice& slice_s,
+ const TensorSlice& slice_d,
+ const SrcT* ptr_s,
+ DstT* ptr_d) {
+ CHECK_LE(shape.dims(), kTensorSliceMaxRank) << "Only tensors of size up to "
+ << kTensorSliceMaxRank
+ << " are supported";
+ // We need to compute the intersection of the two slices.
+ TensorSlice inter;
+ if (!slice_s.Intersect(slice_d, &inter)) {
+ // There is no intersection: returns false.
+ return false;
+ } else {
+ // We need to compute the applied shapes after applying slice_s and
+ // slice_d.
+ TensorShape shp_s, shp_d;
+ Status s;
+ s = slice_s.SliceTensorShape(shape, &shp_s);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+ s = slice_d.SliceTensorShape(shape, &shp_d);
+ if (!s.ok()) {
+ LOG(WARNING) << s;
+ return false;
+ }
+
+ // We need to compute the relative slice of "inter" w.r.t. both slice_s and
+ // slice_d.
+ TensorSlice rel_s, rel_d;
+ slice_s.ComputeRelative(inter, &rel_s);
+ slice_d.ComputeRelative(inter, &rel_d);
+
+ // Get the eigen tensor maps to the data.
+ auto t_s = GetEigenTensorMapFromTensorShape(shp_s, ptr_s);
+ auto t_d = GetEigenTensorMapFromTensorShape(shp_d, ptr_d);
+
+ Eigen::DSizes<Eigen::DenseIndex, kTensorSliceMaxRank> s_start, s_len,
+ d_start, d_len;
+
+ rel_s.FillIndicesAndSizes<kTensorSliceMaxRank>(shp_s, &s_start, &s_len);
+ rel_d.FillIndicesAndSizes<kTensorSliceMaxRank>(shp_d, &d_start, &d_len);
+ t_d.slice(d_start, d_len) = t_s.slice(s_start, s_len).template cast<DstT>();
+ return true;
+ }
+}
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_TENSOR_SLICE_UTIL_H_
diff --git a/tensorflow/core/util/tensor_slice_util_test.cc b/tensorflow/core/util/tensor_slice_util_test.cc
new file mode 100644
index 0000000000..348b0c884e
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_util_test.cc
@@ -0,0 +1,91 @@
+#include "tensorflow/core/util/tensor_slice_util.h"
+
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+// Testing copying data from one tensor slice to another tensor slice
+TEST(TensorSliceUtilTest, CopyTensorSliceToTensorSlice) {
+ // We map out a 2-d tensor of size 4 X 5 and we want the final results look
+ // like this:
+ //
+ // 0 1 2 3 4
+ // 5 6 7 8 9
+ // 10 11 12 13 14
+ // 15 16 17 18 19
+ //
+ // We assume this is a row-major matrix
+ //
+ TensorShape shape({4, 5});
+
+ // We will try to do a couple of slice to slice copies.
+
+ // Case 1: simple identity copy
+ // The slice is the "interior" of the matrix
+ // . . . . .
+ // . 6 7 8 .
+ // . 11 12 13 .
+ // . . . . .
+ {
+ TensorSlice slice_s = TensorSlice::ParseOrDie("1,2:1,3");
+ TensorSlice slice_d = TensorSlice::ParseOrDie("1,2:1,3");
+ const float ptr_s[] = {6, 7, 8, 11, 12, 13};
+ float ptr_d[6];
+ for (int i = 0; i < 6; ++i) {
+ ptr_d[i] = 0;
+ }
+ EXPECT_TRUE(CopyDataFromTensorSliceToTensorSlice(shape, slice_s, slice_d,
+ ptr_s, ptr_d));
+ for (int i = 0; i < 6; ++i) {
+ EXPECT_EQ(ptr_s[i], ptr_d[i]);
+ }
+ }
+
+ // Case 2: no intersection
+ {
+ TensorSlice slice_s = TensorSlice::ParseOrDie("1,2:1,3");
+ TensorSlice slice_d = TensorSlice::ParseOrDie("3,1:2,3");
+ const float ptr_s[] = {6, 7, 8, 11, 12, 13};
+ float ptr_d[6];
+ EXPECT_FALSE(CopyDataFromTensorSliceToTensorSlice(shape, slice_s, slice_d,
+ ptr_s, ptr_d));
+ }
+
+ // Case 3: a trickier case
+ // The source slice is on the upper left corner:
+ // 0 1 2 . .
+ // 5 6 7 . .
+ // 10 11 12 . .
+ // . . . . .
+ //
+ // The destination slice is the right part of the middle stripe:
+ // . . . . .
+ // . X X X X
+ // . X X X X
+ // . . . . .
+ //
+ // So we expect to copy over the 2X2 block:
+ // . . . . .
+ // . 6 7 . .
+ // . 11 12 . .
+ // . . . . .
+ {
+ TensorSlice slice_s = TensorSlice::ParseOrDie("0,3:0,3");
+ TensorSlice slice_d = TensorSlice::ParseOrDie("1,2:1,4");
+ const float ptr_s[] = {0, 1, 2, 5, 6, 7, 10, 11, 12};
+ float ptr_d[8];
+ for (int i = 0; i < 8; ++i) {
+ ptr_d[i] = 0;
+ }
+ EXPECT_TRUE(CopyDataFromTensorSliceToTensorSlice(shape, slice_s, slice_d,
+ ptr_s, ptr_d));
+ const float expected[] = {6, 7, 0, 0, 11, 12, 0, 0};
+ for (int i = 0; i < 8; ++i) {
+ EXPECT_EQ(expected[i], ptr_d[i]);
+ }
+ }
+}
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_writer.cc b/tensorflow/core/util/tensor_slice_writer.cc
new file mode 100644
index 0000000000..bb2fd96c05
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_writer.cc
@@ -0,0 +1,110 @@
+#include "tensorflow/core/util/tensor_slice_writer.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/io/table_builder.h"
+#include "tensorflow/core/lib/random/random.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+namespace {
+
+class TableBuilder : public TensorSliceWriter::Builder {
+ public:
+ TableBuilder(const string& name, WritableFile* f)
+ : name_(name),
+ file_(f),
+ builder_(new table::TableBuilder(table::Options(), f)) {}
+ void Add(StringPiece key, StringPiece val) override {
+ builder_->Add(key, val);
+ }
+ Status Finish(int64* file_size) override {
+ *file_size = -1;
+ Status s = builder_->Finish();
+ if (s.ok()) {
+ s = file_->Close();
+ if (s.ok()) {
+ *file_size = builder_->FileSize();
+ }
+ }
+ if (!s.ok()) {
+ s = errors::Internal("Error writing (tmp) checkpoint file: ", name_, ": ",
+ s.ToString());
+ }
+ builder_.reset();
+ file_.reset();
+ return s;
+ }
+
+ private:
+ string name_;
+ std::unique_ptr<WritableFile> file_;
+ std::unique_ptr<table::TableBuilder> builder_;
+};
+} // anonymous namespace
+
+Status CreateTableTensorSliceBuilder(
+ const string& name, TensorSliceWriter::Builder** builder) {
+ *builder = nullptr;
+ WritableFile* f;
+ Status s = Env::Default()->NewWritableFile(name, &f);
+ if (s.ok()) {
+ *builder = new TableBuilder(name, f);
+ return Status::OK();
+ } else {
+ return s;
+ }
+}
+
+TensorSliceWriter::TensorSliceWriter(const string& filename,
+ CreateBuilderFunction create_builder)
+ : filename_(filename),
+ create_builder_(create_builder),
+ tmpname_(strings::StrCat(filename, ".tempstate", random::New64())),
+ slices_(0) {}
+
+Status TensorSliceWriter::Finish() {
+ Builder* b;
+ Status s = create_builder_(tmpname_, &b);
+ if (!s.ok()) {
+ delete b;
+ return s;
+ }
+ std::unique_ptr<Builder> builder(b);
+
+ // We save the saved tensor slice metadata as the first element.
+ string meta;
+ sts_.AppendToString(&meta);
+ builder->Add(kSavedTensorSlicesKey, meta);
+
+ // Go through all the data and add them
+ for (const auto& x : data_) {
+ builder->Add(x.first, x.second);
+ }
+
+ int64 file_size;
+ s = builder->Finish(&file_size);
+ // We need to rename the file to the proper name
+ if (s.ok()) {
+ s = Env::Default()->RenameFile(tmpname_, filename_);
+ if (s.ok()) {
+ VLOG(1) << "Written " << slices_ << " slices for "
+ << sts_.meta().tensor_size() << " tensors (" << file_size
+ << " bytes) to " << filename_;
+ } else {
+ LOG(ERROR) << "Failed to rename file " << tmpname_ << " to " << filename_;
+ }
+ } else {
+ Env::Default()->DeleteFile(tmpname_);
+ }
+ return s;
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/tensor_slice_writer.h b/tensorflow/core/util/tensor_slice_writer.h
new file mode 100644
index 0000000000..cce3880cb3
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_writer.h
@@ -0,0 +1,149 @@
+// The utility to write checkpoints for google brain tensor ops and v3
+// checkpoints for dist_belief.
+//
+
+#ifndef TENSORFLOW_UTIL_TENSOR_SLICE_WRITER_H_
+#define TENSORFLOW_UTIL_TENSOR_SLICE_WRITER_H_
+
+#include <unordered_map>
+
+#include "tensorflow/core/framework/tensor_slice.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/public/tensor_shape.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/util/saved_tensor_slice.pb.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+class TensorSliceWriter {
+ public:
+ // Abstract interface that TensorSliceWriter uses for building
+ class Builder {
+ public:
+ virtual ~Builder() {}
+ virtual void Add(StringPiece key, StringPiece value) = 0;
+ virtual Status Finish(int64* file_size) = 0;
+ };
+ typedef std::function<Status(const string&, Builder**)>
+ CreateBuilderFunction;
+
+ TensorSliceWriter(const string& filename,
+ CreateBuilderFunction create_builder);
+ virtual ~TensorSliceWriter() {}
+ // Adds a slice. We support float and int32 for now.
+ // TODO(yangke): add support for more types.
+ template <typename T>
+ Status Add(const string& name, const TensorShape& shape,
+ const TensorSlice& slice, const T* data);
+ Status Finish();
+
+ private:
+ // Allocate "num_elements" elements in "ss" and save the data in "data"
+ // there.
+ template <typename T>
+ static void SaveData(const T* data, int num_elements, SavedSlice* ss);
+
+ const string filename_;
+ const CreateBuilderFunction create_builder_;
+ const string tmpname_;
+
+ // A mapping from the tensor names to their index in meta_.saved_slice_meta()
+ std::unordered_map<string, int> name_to_index_;
+ // The metadata that holds all the saved tensor slices.
+ SavedTensorSlices sts_;
+ // The data to be written to the builder
+ std::map<string, string> data_;
+ // Total number of slices written
+ int slices_;
+ TF_DISALLOW_COPY_AND_ASSIGN(TensorSliceWriter);
+};
+
+template <typename T>
+Status TensorSliceWriter::Add(const string& name, const TensorShape& shape,
+ const TensorSlice& slice, const T* data) {
+ // The tensor and the slice have to be compatible
+ if (shape.dims() != slice.dims()) {
+ return errors::Internal("Incompatible tensor shape and slice: ", "shape = ",
+ shape.DebugString(), ", slice = ",
+ slice.DebugString());
+ }
+ DataType dt = DataTypeToEnum<T>::value;
+ // We need to add an entry for "name" if there isn't an entry already.
+ int index = gtl::FindWithDefault(name_to_index_, name, -1);
+ if (index >= 0) {
+ // The same tensor has been registered -- we verify that the shapes and the
+ // type agree.
+ const SavedSliceMeta& ssm = sts_.meta().tensor(index);
+ CHECK_EQ(name, ssm.name()) << ssm.ShortDebugString();
+ TensorShape ssm_shape(ssm.shape());
+ if (!shape.IsSameSize(ssm_shape)) {
+ return errors::Internal("Mismatching shapes: existing tensor = ",
+ ssm_shape.DebugString(), ", trying to add name ",
+ name, ", shape = ", shape.DebugString());
+ }
+ if (dt != ssm.type()) {
+ return errors::Internal(
+ "Mismatching types: existing type = ", DataTypeString(ssm.type()),
+ ", trying to add name ", name, ", type = ", DataTypeString(dt));
+ }
+ } else {
+ // Insert the new tensor name with the shape information
+ index = sts_.meta().tensor_size();
+ name_to_index_.insert(std::make_pair(name, index));
+ SavedSliceMeta* ssm = sts_.mutable_meta()->add_tensor();
+ ssm->set_name(name);
+ shape.AsProto(ssm->mutable_shape());
+ ssm->set_type(dt);
+ }
+ // Now we need to add the slice info to the list of slices.
+ SavedSliceMeta* ssm = sts_.mutable_meta()->mutable_tensor(index);
+ slice.AsProto(ssm->add_slice());
+
+ // Now we need to add the real data.
+ {
+ SavedTensorSlices sts;
+ SavedSlice* ss = sts.mutable_data();
+ ss->set_name(name);
+ slice.AsProto(ss->mutable_slice());
+ TensorShape saved_shape(ssm->shape());
+ TensorShape sliced_shape;
+ TF_RETURN_IF_ERROR(slice.SliceTensorShape(saved_shape, &sliced_shape));
+ SaveData(data, sliced_shape.num_elements(), ss);
+ string key = EncodeTensorNameSlice(name, slice);
+ // TODO(yangke): consider doing a two-pass thing where the first pass just
+ // list the tensor slices we want to save and then another pass to actually
+ // set the data. Need to figure out if the interface works well.
+ std::pair<string, string> key_value(key, "");
+ sts.AppendToString(&key_value.second);
+ data_.insert(key_value);
+ }
+ ++slices_;
+ return Status::OK();
+}
+
+template <typename T>
+void TensorSliceWriter::SaveData(const T* data, int num_elements,
+ SavedSlice* ss) {
+ Fill(data, num_elements, ss->mutable_data());
+}
+
+// Create a table builder that will write to "filename" in
+// tensorflow::io::Table format. If successful, return OK
+// and set "*builder" to the allocated builder. Otherwise, return a
+// non-OK status.
+Status CreateTableTensorSliceBuilder(const string& filename,
+ TensorSliceWriter::Builder** builder);
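+
+// A minimal usage sketch (illustrative file name and values only; see
+// tensor_slice_writer_test.cc for a complete example):
+//
+//   TensorSliceWriter writer("/tmp/checkpoint", CreateTableTensorSliceBuilder);
+//   TensorShape shape({5, 10});
+//   TensorSlice slice = TensorSlice::ParseOrDie("-:0,1");
+//   const int32 data[] = {0, 1, 2, 3, 4};
+//   TF_CHECK_OK(writer.Add("test", shape, slice, data));
+//   TF_CHECK_OK(writer.Finish());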
+
+} // namespace checkpoint
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_TENSOR_SLICE_WRITER_H_
diff --git a/tensorflow/core/util/tensor_slice_writer_test.cc b/tensorflow/core/util/tensor_slice_writer_test.cc
new file mode 100644
index 0000000000..ca3dffe422
--- /dev/null
+++ b/tensorflow/core/util/tensor_slice_writer_test.cc
@@ -0,0 +1,248 @@
+#include "tensorflow/core/util/tensor_slice_writer.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/platform/protobuf.h"
+#include "tensorflow/core/util/saved_tensor_slice_util.h"
+#include "tensorflow/core/util/tensor_slice_reader.h"
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/test.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+
+namespace checkpoint {
+
+class TensorSliceWriteTestHelper {
+ public:
+ static void CheckEntries(const string& fname);
+ static void GetData(TensorSliceReader::Table* table, const string& name,
+ const TensorSlice& slice, SavedSlice* ss);
+};
+
+namespace {
+
+// Testing that an array is what is expected
+void ExpectIdenticalFloatArrays(const float* expected, int size,
+ const float* actual) {
+ // TODO(yangke): copy some of the Dump* functions over
+ // LOG(INFO) << "Expected = " << DumpFloatArray(expected, size);
+ // LOG(INFO) << "Actual = " << DumpFloatArray(actual, size);
+ for (int i = 0; i < size; ++i) {
+ EXPECT_NEAR(expected[i], actual[i], 1e-6);
+ }
+}
+
+template <typename T, typename U>
+void ExpectIdenticalIntArrays(const T* expected, int size, const U* actual) {
+ for (int i = 0; i < size; ++i) {
+ EXPECT_EQ(expected[i], static_cast<T>(actual[i]));
+ }
+}
+
+// Nifty routine to get the size of an array
+template <typename T, unsigned SIZE>
+inline size_t ArraySize(const T(&v)[SIZE]) {
+ return SIZE;
+}
+
+// A simple test on writing a few tensor slices
+// TODO(yangke): refactor into smaller tests: will do as we add more stuff to
+// the writer.
+TEST(TensorSliceWriteTest, SimpleWrite) {
+ const string filename = io::JoinPath(testing::TmpDir(), "checkpoint");
+
+ TensorSliceWriter writer(filename, CreateTableTensorSliceBuilder);
+
+ // Add some int32 tensor slices
+ {
+ TensorShape shape({5, 10});
+ TensorSlice slice = TensorSlice::ParseOrDie("-:0,1");
+ const int32 data[] = {0, 1, 2, 3, 4};
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+
+ // Two slices share the same tensor name
+ {
+ TensorShape shape({5, 10});
+ TensorSlice slice = TensorSlice::ParseOrDie("-:3,1");
+ const int32 data[] = {10, 11, 12, 13, 14};
+ TF_CHECK_OK(writer.Add("test", shape, slice, data));
+ }
+
+ // Another slice from a different float tensor -- it has a different name and
+ // should be inserted in front of the previous tensor
+ {
+ TensorShape shape({3, 2});
+ TensorSlice slice = TensorSlice::ParseOrDie("-:-");
+ const float data[] = {1.2, 1.3, 1.4, 2.1, 2.2, 2.3};
+ TF_CHECK_OK(writer.Add("AA", shape, slice, data));
+ }
+
+ // A slice with int64 data
+ {
+ TensorShape shape({5, 10});
+ TensorSlice slice = TensorSlice::ParseOrDie("-:3,1");
+ const int64 data[] = {10, 11, 12, 13, 14};
+ TF_CHECK_OK(writer.Add("int64", shape, slice, data));
+ }
+
+ // A slice with int16 data
+ {
+ TensorShape shape({5, 10});
+ TensorSlice slice = TensorSlice::ParseOrDie("-:3,1");
+ const int16 data[] = {10, 11, 12, 13, 14};
+ TF_CHECK_OK(writer.Add("int16", shape, slice, data));
+ }
+
+ TF_CHECK_OK(writer.Finish());
+
+ // Now we examine the checkpoint file manually.
+ TensorSliceWriteTestHelper::CheckEntries(filename);
+}
+
+} // namespace
+
+void TensorSliceWriteTestHelper::GetData(TensorSliceReader::Table* table,
+ const string& name,
+ const TensorSlice& slice,
+ SavedSlice* ss) {
+ string key = EncodeTensorNameSlice(name, slice);
+ string value;
+ EXPECT_TRUE(table->Get(key, &value));
+ SavedTensorSlices sts;
+ EXPECT_TRUE(ParseProtoUnlimited(&sts, value));
+ EXPECT_FALSE(sts.has_meta());
+ *ss = sts.data();
+ EXPECT_EQ(name, ss->name());
+ TensorSlice slice2(ss->slice());
+ EXPECT_EQ(slice.DebugString(), slice2.DebugString());
+}
+
+void TensorSliceWriteTestHelper::CheckEntries(const string& fname) {
+ TensorSliceReader::Table* tptr;
+ TF_CHECK_OK(OpenTableTensorSliceReader(fname, &tptr));
+ std::unique_ptr<TensorSliceReader::Table> table(tptr);
+ CHECK_NOTNULL(table.get());
+
+ // We expect a block of SavedTensorSlices
+ string value;
+ ASSERT_TRUE(table->Get(kSavedTensorSlicesKey, &value));
+ {
+ SavedTensorSlices sts;
+ EXPECT_TRUE(ParseProtoUnlimited(&sts, value));
+ // We also expect entries for the four tensors
+ EXPECT_TRUE(sts.has_meta());
+ EXPECT_EQ(4, sts.meta().tensor_size());
+ // We don't expect any data in the first block.
+ EXPECT_FALSE(sts.has_data());
+ // The tensors should be stored in the same order as they were first
+ // created.
+ {
+ // The two slices of the "test" tensor
+ const SavedSliceMeta& ssm = sts.meta().tensor(0);
+ EXPECT_EQ("test", ssm.name());
+ EXPECT_EQ(
+ "dim { size: 5 } "
+ "dim { size: 10 }",
+ ssm.shape().ShortDebugString());
+ EXPECT_EQ(DT_INT32, ssm.type());
+ EXPECT_EQ(2, ssm.slice_size());
+ TensorSlice s0(ssm.slice(0));
+ TensorSlice s1(ssm.slice(1));
+ EXPECT_EQ("-:0,1", s0.DebugString());
+ EXPECT_EQ("-:3,1", s1.DebugString());
+ }
+ {
+ // The "AA" tensor
+ const SavedSliceMeta& ssm = sts.meta().tensor(1);
+ EXPECT_EQ("AA", ssm.name());
+ EXPECT_EQ(
+ "dim { size: 3 } "
+ "dim { size: 2 }",
+ ssm.shape().ShortDebugString());
+ EXPECT_EQ(DT_FLOAT, ssm.type());
+ EXPECT_EQ(1, ssm.slice_size());
+ TensorSlice s0(ssm.slice(0));
+ EXPECT_EQ("-:-", s0.DebugString());
+ }
+ {
+ // The "int64" tensor
+ const SavedSliceMeta& ssm = sts.meta().tensor(2);
+ EXPECT_EQ("int64", ssm.name());
+ EXPECT_EQ(
+ "dim { size: 5 } "
+ "dim { size: 10 }",
+ ssm.shape().ShortDebugString());
+ EXPECT_EQ(DT_INT64, ssm.type());
+ EXPECT_EQ(1, ssm.slice_size());
+ TensorSlice s0(ssm.slice(0));
+ EXPECT_EQ("-:3,1", s0.DebugString());
+ }
+ {
+ // The "int16" tensor
+ const SavedSliceMeta& ssm = sts.meta().tensor(3);
+ EXPECT_EQ("int16", ssm.name());
+ EXPECT_EQ(
+ "dim { size: 5 } "
+ "dim { size: 10 }",
+ ssm.shape().ShortDebugString());
+ EXPECT_EQ(DT_INT16, ssm.type());
+ EXPECT_EQ(1, ssm.slice_size());
+ TensorSlice s0(ssm.slice(0));
+ EXPECT_EQ("-:3,1", s0.DebugString());
+ }
+ }
+
+ // We expect 5 blocks of tensor data
+ {
+ // Block 1: we expect it to be the full slice of the "AA" tensor
+ SavedSlice ss;
+ GetData(table.get(), "AA", TensorSlice(2), &ss);
+ const float data[] = {1.2, 1.3, 1.4, 2.1, 2.2, 2.3};
+ EXPECT_EQ(ArraySize(data), ss.data().float_val_size());
+ ExpectIdenticalFloatArrays(data, ArraySize(data),
+ ss.data().float_val().data());
+ }
+
+ {
+ // Block 2: we expect it to be the first slice of the "test" tensor
+ SavedSlice ss;
+ GetData(table.get(), "test", TensorSlice({{0, -1}, {0, 1}}), &ss);
+ const int32 data[] = {0, 1, 2, 3, 4};
+ EXPECT_EQ(ArraySize(data), ss.data().int_val_size());
+ ExpectIdenticalIntArrays(data, ArraySize(data), ss.data().int_val().data());
+ }
+
+ {
+ // Block 3: we expect it to be the second slice of the "test" tensor
+ SavedSlice ss;
+ GetData(table.get(), "test", TensorSlice({{0, -1}, {3, 1}}), &ss);
+ const int32 data[] = {10, 11, 12, 13, 14};
+ EXPECT_EQ(ArraySize(data), ss.data().int_val_size());
+ ExpectIdenticalIntArrays(data, ArraySize(data), ss.data().int_val().data());
+ }
+
+ {
+ // Block 4: we expect it to be the slice of the "int64" tensor
+ SavedSlice ss;
+ GetData(table.get(), "int64", TensorSlice({{0, -1}, {3, 1}}), &ss);
+ const int64 data[] = {10, 11, 12, 13, 14};
+ EXPECT_EQ(ArraySize(data), ss.data().int64_val_size());
+ ExpectIdenticalIntArrays(data, ArraySize(data),
+ ss.data().int64_val().data());
+ }
+
+ {
+ // Block 5: we expect it to be the slice of the "int16" tensor
+ SavedSlice ss;
+ GetData(table.get(), "int16", TensorSlice({{0, -1}, {3, 1}}), &ss);
+ const int16 data[] = {10, 11, 12, 13, 14};
+ EXPECT_EQ(ArraySize(data), ss.data().int_val_size());
+ ExpectIdenticalIntArrays(data, ArraySize(data), ss.data().int_val().data());
+ }
+}
+
+} // namespace checkpoint
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/use_cudnn.cc b/tensorflow/core/util/use_cudnn.cc
new file mode 100644
index 0000000000..544b48a679
--- /dev/null
+++ b/tensorflow/core/util/use_cudnn.cc
@@ -0,0 +1,20 @@
+#include "tensorflow/core/util/use_cudnn.h"
+
+#include <stdlib.h>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+bool CanUseCudnn() {
+ const char* tf_use_cudnn = getenv("TF_USE_CUDNN");
+ if (tf_use_cudnn != nullptr) {
+ string tf_use_cudnn_str = tf_use_cudnn;
+ if (tf_use_cudnn_str == "0") {
+ return false;
+ }
+ }
+ return true;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/use_cudnn.h b/tensorflow/core/util/use_cudnn.h
new file mode 100644
index 0000000000..20ce24c513
--- /dev/null
+++ b/tensorflow/core/util/use_cudnn.h
@@ -0,0 +1,12 @@
+// The utility to check whether we have a cuDNN dependency.
+
+#ifndef TENSORFLOW_UTIL_USE_CUDNN_H_
+#define TENSORFLOW_UTIL_USE_CUDNN_H_
+
+namespace tensorflow {
+
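+// Returns true unless the environment variable TF_USE_CUDNN is set to "0"
+// (see use_cudnn.cc).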
+bool CanUseCudnn();
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_USE_CUDNN_H_
diff --git a/tensorflow/core/util/util.cc b/tensorflow/core/util/util.cc
new file mode 100644
index 0000000000..14ac513074
--- /dev/null
+++ b/tensorflow/core/util/util.cc
@@ -0,0 +1,81 @@
+#include "tensorflow/core/util/util.h"
+
+#include "tensorflow/core/platform/logging.h"
+namespace tensorflow {
+
+StringPiece NodeNamePrefix(const StringPiece& op_name) {
+ StringPiece sp(op_name);
+ auto p = sp.find('/');
+ if (p == StringPiece::npos || p == 0) {
+ return "";
+ } else {
+ return StringPiece(sp.data(), p);
+ }
+}
+
+StringPiece NodeNameFullPrefix(const StringPiece& op_name) {
+ StringPiece sp(op_name);
+ auto p = sp.rfind('/');
+ if (p == StringPiece::npos || p == 0) {
+ return "";
+ } else {
+ return StringPiece(sp.data(), p);
+ }
+}
+
+MovingAverage::MovingAverage(int window)
+ : window_(window),
+ sum_(0.0),
+ data_(new double[window_]),
+ head_(0),
+ count_(0) {
+ CHECK_GE(window, 1);
+}
+
+MovingAverage::~MovingAverage() { delete[] data_; }
+
+void MovingAverage::Clear() {
+ count_ = 0;
+ head_ = 0;
+ sum_ = 0;
+}
+
+double MovingAverage::GetAverage() const {
+ if (count_ == 0) {
+ return 0;
+ } else {
+ return static_cast<double>(sum_) / count_;
+ }
+}
+
+void MovingAverage::AddValue(double v) {
+ if (count_ < window_) {
+ // This is the warmup phase. We don't have a full window's worth of data.
+ head_ = count_;
+ data_[count_++] = v;
+ } else {
+ if (window_ == ++head_) {
+ head_ = 0;
+ }
+ // Toss the oldest element
+ sum_ -= data_[head_];
+ // Add the newest element
+ data_[head_] = v;
+ }
+ sum_ += v;
+}
+
+static char hex_char[] = "0123456789abcdef";
+
+string PrintMemory(const char* ptr, int n) {
+ string ret;
+ ret.resize(n * 3);
+ for (int i = 0; i < n; ++i) {
+ ret[i * 3] = ' ';
+ // Cast to unsigned char so bytes >= 0x80 don't shift or index with a
+ // negative value.
+ ret[i * 3 + 1] = hex_char[static_cast<unsigned char>(ptr[i]) >> 4];
+ ret[i * 3 + 2] = hex_char[static_cast<unsigned char>(ptr[i]) & 0xf];
+ }
+ return ret;
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/core/util/util.h b/tensorflow/core/util/util.h
new file mode 100644
index 0000000000..52650bd8ea
--- /dev/null
+++ b/tensorflow/core/util/util.h
@@ -0,0 +1,40 @@
+#ifndef TENSORFLOW_UTIL_UTIL_H_
+#define TENSORFLOW_UTIL_UTIL_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+
+namespace tensorflow {
+
+// If op_name has '/' in it, then return everything before the first '/'.
+// Otherwise return empty string.
+StringPiece NodeNamePrefix(const StringPiece& op_name);
+
+// If op_name has '/' in it, then return everything before the last '/'.
+// Otherwise return empty string.
+StringPiece NodeNameFullPrefix(const StringPiece& op_name);
+
+class MovingAverage {
+ public:
+ explicit MovingAverage(int window);
+ ~MovingAverage();
+
+ void Clear();
+
+ double GetAverage() const;
+ void AddValue(double v);
+
+ private:
+ const int window_; // Max size of interval
+ double sum_; // Sum over interval
+ double* data_; // Actual data values
+ int head_; // Offset of the newest statistic in data_
+ int count_; // # of valid data elements in window
+};
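+
+// A minimal usage sketch (illustrative values only):
+//
+//   MovingAverage avg(3);         // average over a window of 3 values
+//   avg.AddValue(1.0);
+//   avg.AddValue(2.0);
+//   double a = avg.GetAverage();  // 1.5: only two values seen so far
+//   avg.AddValue(3.0);
+//   avg.AddValue(4.0);            // 1.0 falls out of the window
+//   a = avg.GetAverage();         // (2.0 + 3.0 + 4.0) / 3 = 3.0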
+
+// Returns a string printing bytes in ptr[0..n). The output looks
+// like "00 01 ef cd cd ef".
+string PrintMemory(const char* ptr, int n);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_UTIL_H_
diff --git a/tensorflow/core/util/work_sharder.cc b/tensorflow/core/util/work_sharder.cc
new file mode 100644
index 0000000000..d9ab0805c5
--- /dev/null
+++ b/tensorflow/core/util/work_sharder.cc
@@ -0,0 +1,57 @@
+#include "tensorflow/core/util/work_sharder.h"
+
+#include <vector>
+#include "tensorflow/core/lib/core/blocking_counter.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+void Shard(int num_workers, thread::ThreadPool* workers, int64 total,
+ int64 cost_per_unit, std::function<void(int64, int64)> work) {
+ CHECK_GE(total, 0);
+ if (total == 0) {
+ return;
+ }
+ if (num_workers <= 1) {
+ // Just inline the whole work since we only have 1 thread (core).
+ work(0, total);
+ return;
+ }
+ cost_per_unit = std::max(1LL, cost_per_unit);
+ // We shard [0, total) into "num_shards" shards.
+ // 1 <= num_shards <= num worker threads
+ //
+ // If total * cost_per_unit is small, it is not worth sharding too
+ // much. Let us assume each cost unit is 1ns, kMinCostPerShard=10000
+ // is 10us.
+ static const int64 kMinCostPerShard = 10000;
+ const int num_shards = std::max(
+ 1, std::min<int>(num_workers, total * cost_per_unit / kMinCostPerShard));
+ // Each shard contains up to "block_size" units. [0, total) is sharded
+ // into:
+ // [0, block_size), [block_size, 2*block_size), ...
+ // The 1st shard is done by the caller thread and the other shards
+ // are dispatched to the worker threads. The last shard may be smaller than
+ // block_size.
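+ // For example (illustrative numbers only): with total = 1000, cost_per_unit
+ // = 100 and num_workers = 8, total * cost_per_unit / kMinCostPerShard = 10,
+ // so num_shards = min(8, 10) = 8 and block_size = (1000 + 8 - 1) / 8 = 125,
+ // i.e. 8 shards of 125 units each.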
+ const int64 block_size = (total + num_shards - 1) / num_shards;
+ CHECK_GT(block_size, 0); // total > 0 guarantees this.
+ if (block_size >= total) {
+ work(0, total);
+ return;
+ }
+ const int num_shards_used = (total + block_size - 1) / block_size;
+ BlockingCounter counter(num_shards_used - 1);
+ for (int64 start = block_size; start < total; start += block_size) {
+ auto limit = std::min(start + block_size, total);
+ workers->Schedule([&work, &counter, start, limit]() {
+ work(start, limit); // Compute the shard.
+ counter.DecrementCount(); // The shard is done.
+ });
+ }
+
+ // Inline execute the 1st shard.
+ work(0, std::min(block_size, total));
+ counter.Wait();
+}
+
+} // end namespace tensorflow
diff --git a/tensorflow/core/util/work_sharder.h b/tensorflow/core/util/work_sharder.h
new file mode 100644
index 0000000000..1ea2cf4397
--- /dev/null
+++ b/tensorflow/core/util/work_sharder.h
@@ -0,0 +1,33 @@
+#ifndef TENSORFLOW_UTIL_WORK_SHARDER_H_
+#define TENSORFLOW_UTIL_WORK_SHARDER_H_
+
+#include <functional>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+
+namespace tensorflow {
+
+// Shards the "total" unit of work assuming each unit of work having
+// roughly "cost_per_unit". Each unit of work is indexed 0, 1, ...,
+// total - 1. Each shard contains 1 or more units of work and the
+// total cost of each shard is roughly the same. The total number of
+// shards is no more than num_workers. The calling thread and the
+// "workers" are used to compute each shard (calling work(start,
+// limit). A common configuration is that "workers" is a thread pool
+// with "num_workers" threads.
+//
+// "work" should be a callable taking (int64, int64) arguments.
+// work(start, limit) computes the work units from [start,
+// limit), i.e., [start, limit) is a shard.
+//
+// REQUIRES: num_workers >= 0
+// REQUIRES: workers != nullptr
+// REQUIRES: total >= 0
+// REQUIRES: cost_per_unit >= 0
+void Shard(int num_workers, thread::ThreadPool* workers, int64 total,
+ int64 cost_per_unit, std::function<void(int64, int64)> work);
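+
+// A minimal usage sketch (illustrative values only): square 1000 values on a
+// thread pool, treating each unit of work as costing roughly 100ns:
+//
+//   thread::ThreadPool pool(Env::Default(), "sharder", 4);
+//   std::vector<double> out(1000);
+//   Shard(4, &pool, out.size(), 100 /* cost_per_unit */,
+//         [&out](int64 start, int64 limit) {
+//           for (int64 i = start; i < limit; ++i) {
+//             out[i] = static_cast<double>(i) * i;
+//           }
+//         });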
+
+} // end namespace tensorflow
+
+#endif // TENSORFLOW_UTIL_WORK_SHARDER_H_
diff --git a/tensorflow/core/util/work_sharder_test.cc b/tensorflow/core/util/work_sharder_test.cc
new file mode 100644
index 0000000000..d9792c0e8d
--- /dev/null
+++ b/tensorflow/core/util/work_sharder_test.cc
@@ -0,0 +1,57 @@
+#include "tensorflow/core/util/work_sharder.h"
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+#include <gtest/gtest.h>
+
+namespace tensorflow {
+namespace {
+
+void RunSharding(int64 num_workers, int64 total, int64 cost_per_unit) {
+ thread::ThreadPool threads(Env::Default(), "test", 16);
+ mutex mu;
+ int64 num_shards = 0;
+ int64 num_done_work = 0;
+ std::vector<bool> work(total, false);
+ Shard(num_workers, &threads, total, cost_per_unit,
+ [&mu, &num_shards, &num_done_work, &work](int start, int limit) {
+ VLOG(1) << "Shard [" << start << "," << limit << ")";
+ mutex_lock l(mu);
+ ++num_shards;
+ for (; start < limit; ++start) {
+ EXPECT_FALSE(work[start]); // No duplicate
+ ++num_done_work;
+ work[start] = true;
+ }
+ });
+ EXPECT_LE(num_shards, num_workers + 1);
+ EXPECT_EQ(num_done_work, total);
+ LOG(INFO) << num_workers << " " << total << " " << cost_per_unit << " "
+ << num_shards;
+}
+
+TEST(Shard, Basic) {
+ for (auto workers : {0, 1, 2, 3, 5, 7, 10, 11, 15, 100, 1000}) {
+ for (auto total : {0, 1, 7, 10, 64, 100, 256, 1000, 9999}) {
+ for (auto cost_per_unit : {0, 1, 11, 102, 1003, 10005, 1000007}) {
+ RunSharding(workers, total, cost_per_unit);
+ }
+ }
+ }
+}
+
+void BM_Sharding(int iters, int arg) {
+ thread::ThreadPool threads(Env::Default(), "test", 16);
+ const int64 total = 1LL << 30;
+ auto lambda = [](int64 start, int64 limit) {};
+ auto work = std::cref(lambda);
+ for (; iters > 0; iters -= arg) {
+ Shard(arg - 1, &threads, total, 1, work);
+ }
+}
+BENCHMARK(BM_Sharding)->Range(1, 128);
+
+} // namespace
+} // namespace tensorflow
diff --git a/tensorflow/examples/android/AndroidManifest.xml b/tensorflow/examples/android/AndroidManifest.xml
new file mode 100644
index 0000000000..fbbc74a678
--- /dev/null
+++ b/tensorflow/examples/android/AndroidManifest.xml
@@ -0,0 +1,46 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<manifest xmlns:android="http://schemas.android.com/apk/res/android"
+ package="org.tensorflow.demo">
+
+ <uses-permission android:name="android.permission.CAMERA" />
+ <uses-feature android:name="android.hardware.camera" />
+ <uses-feature android:name="android.hardware.camera.autofocus" />
+ <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
+
+ <uses-sdk
+ android:minSdkVersion="21"
+ android:targetSdkVersion="23" />
+
+ <application android:allowBackup="true"
+ android:debuggable="true"
+ android:label="@string/app_name"
+ android:icon="@drawable/ic_launcher"
+ android:theme="@style/MaterialTheme">
+
+ <activity android:name="org.tensorflow.demo.CameraActivity"
+ android:screenOrientation="portrait"
+ android:label="@string/app_name">
+ <intent-filter>
+ <action android:name="android.intent.action.MAIN" />
+ <category android:name="android.intent.category.LAUNCHER" />
+ </intent-filter>
+ </activity>
+ </application>
+
+</manifest>
diff --git a/tensorflow/examples/android/BUILD b/tensorflow/examples/android/BUILD
new file mode 100644
index 0000000000..fb5bc8da71
--- /dev/null
+++ b/tensorflow/examples/android/BUILD
@@ -0,0 +1,70 @@
+# Description:
+# Tensorflow camera demo app for Android.
+
+package(default_visibility = ["//visibility:public"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+cc_library(
+ name = "tensorflow_native_libs",
+ srcs = glob(["jni/**/*.cc"]),
+ hdrs = glob(["jni/**/*.h"]),
+ copts = [
+ "-std=c++11",
+ "-mfpu=neon",
+ ],
+ linkopts = ["-llog -landroid -lm -ljnigraphics"],
+ tags = [
+ "manual",
+ "notap",
+ ],
+ deps = [
+ ":dummy_pthread",
+ "//tensorflow/core:android_tensorflow_lib",
+ ],
+)
+
+# This library only exists as a workaround to satisfy dependencies
+# that declare -lpthread in their linkopts. Although Android supports
+# pthreads, it does not provide it as a separate library.
+cc_library(
+ name = "dummy_pthread",
+ srcs = ["jni/libpthread.so"],
+)
+
+android_binary(
+ name = "tensorflow_demo",
+ srcs = glob([
+ "src/**/*.java",
+ ]),
+ assets = glob(["assets/**"]),
+ assets_dir = "assets",
+ custom_package = "org.tensorflow.demo",
+ inline_constants = 1,
+ legacy_native_support = 0,
+ manifest = "AndroidManifest.xml",
+ resource_files = glob(["res/**"]),
+ tags = [
+ "manual",
+ "notap",
+ ],
+ deps = [
+ ":tensorflow_native_libs",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ "bin/**",
+ "gen/**",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/examples/android/README.md b/tensorflow/examples/android/README.md
new file mode 100644
index 0000000000..8a6441581a
--- /dev/null
+++ b/tensorflow/examples/android/README.md
@@ -0,0 +1,39 @@
+# Tensorflow Android Camera Demo
+
+This folder contains a simple camera-based demo application utilizing Tensorflow.
+
+## Description
+
+This demo uses a Google Inception model to classify camera frames in real-time,
+displaying the top results in an overlay on the camera image. See
+assets/imagenet_comp_graph_label_strings.txt for the possible classifications.
+
+## To build/install/run
+
+As a prerequisite, Bazel, the Android NDK, and the Android SDK must all be
+installed on your system. The Android build tools may be obtained from:
+https://developer.android.com/tools/revisions/build-tools.html
+
+The Android entries in [<workspace_root>/WORKSPACE](../../WORKSPACE) must be
+uncommented with the paths filled in appropriately depending on where you
+installed the NDK and SDK. Otherwise an error such as:
+"The external label '//external:android/sdk' is not bound to anything" will
+be reported.
+
+
+To build the APK, run this from your workspace root:
+```
+bazel build //tensorflow/examples/android:tensorflow_demo -c opt --copt=-mfpu=neon
+```
+Note that "-c opt" is currently required; if not set, an assert (for an
+otherwise non-problematic issue) in Eigen will halt the application during
+execution. This issue will be corrected in an upcoming release.
+
+If adb debugging is enabled on your device, you may instead use the following
+command from your workspace root to automatically build and install:
+```
+bazel mobile-install //tensorflow/examples/android:tensorflow_demo -c opt --copt=-mfpu=neon
+```
+
+Add the "--start_app" flag if you wish to automatically start the app after
+installing. Otherwise, find the application icon labeled "Tensorflow Demo".
diff --git a/tensorflow/examples/android/__init__.py b/tensorflow/examples/android/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/examples/android/__init__.py
diff --git a/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt b/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
new file mode 100644
index 0000000000..0ac5a169d9
--- /dev/null
+++ b/tensorflow/examples/android/assets/imagenet_comp_graph_label_strings.txt
@@ -0,0 +1,1001 @@
+dummy
+kit fox
+English setter
+Siberian husky
+Australian terrier
+English springer
+grey whale
+lesser panda
+Egyptian cat
+ibex
+Persian cat
+cougar
+gazelle
+porcupine
+sea lion
+malamute
+badger
+Great Dane
+Walker hound
+Welsh springer spaniel
+whippet
+Scottish deerhound
+killer whale
+mink
+African elephant
+Weimaraner
+soft-coated wheaten terrier
+Dandie Dinmont
+red wolf
+Old English sheepdog
+jaguar
+otterhound
+bloodhound
+Airedale
+hyena
+meerkat
+giant schnauzer
+titi
+three-toed sloth
+sorrel
+black-footed ferret
+dalmatian
+black-and-tan coonhound
+papillon
+skunk
+Staffordshire bullterrier
+Mexican hairless
+Bouvier des Flandres
+weasel
+miniature poodle
+Cardigan
+malinois
+bighorn
+fox squirrel
+colobus
+tiger cat
+Lhasa
+impala
+coyote
+Yorkshire terrier
+Newfoundland
+brown bear
+red fox
+Norwegian elkhound
+Rottweiler
+hartebeest
+Saluki
+grey fox
+schipperke
+Pekinese
+Brabancon griffon
+West Highland white terrier
+Sealyham terrier
+guenon
+mongoose
+indri
+tiger
+Irish wolfhound
+wild boar
+EntleBucher
+zebra
+ram
+French bulldog
+orangutan
+basenji
+leopard
+Bernese mountain dog
+Maltese dog
+Norfolk terrier
+toy terrier
+vizsla
+cairn
+squirrel monkey
+groenendael
+clumber
+Siamese cat
+chimpanzee
+komondor
+Afghan hound
+Japanese spaniel
+proboscis monkey
+guinea pig
+white wolf
+ice bear
+gorilla
+borzoi
+toy poodle
+Kerry blue terrier
+ox
+Scotch terrier
+Tibetan mastiff
+spider monkey
+Doberman
+Boston bull
+Greater Swiss Mountain dog
+Appenzeller
+Shih-Tzu
+Irish water spaniel
+Pomeranian
+Bedlington terrier
+warthog
+Arabian camel
+siamang
+miniature schnauzer
+collie
+golden retriever
+Irish terrier
+affenpinscher
+Border collie
+hare
+boxer
+silky terrier
+beagle
+Leonberg
+German short-haired pointer
+patas
+dhole
+baboon
+macaque
+Chesapeake Bay retriever
+bull mastiff
+kuvasz
+capuchin
+pug
+curly-coated retriever
+Norwich terrier
+flat-coated retriever
+hog
+keeshond
+Eskimo dog
+Brittany spaniel
+standard poodle
+Lakeland terrier
+snow leopard
+Gordon setter
+dingo
+standard schnauzer
+hamster
+Tibetan terrier
+Arctic fox
+wire-haired fox terrier
+basset
+water buffalo
+American black bear
+Angora
+bison
+howler monkey
+hippopotamus
+chow
+giant panda
+American Staffordshire terrier
+Shetland sheepdog
+Great Pyrenees
+Chihuahua
+tabby
+marmoset
+Labrador retriever
+Saint Bernard
+armadillo
+Samoyed
+bluetick
+redbone
+polecat
+marmot
+kelpie
+gibbon
+llama
+miniature pinscher
+wood rabbit
+Italian greyhound
+lion
+cocker spaniel
+Irish setter
+dugong
+Indian elephant
+beaver
+Sussex spaniel
+Pembroke
+Blenheim spaniel
+Madagascar cat
+Rhodesian ridgeback
+lynx
+African hunting dog
+langur
+Ibizan hound
+timber wolf
+cheetah
+English foxhound
+briard
+sloth bear
+Border terrier
+German shepherd
+otter
+koala
+tusker
+echidna
+wallaby
+platypus
+wombat
+revolver
+umbrella
+schooner
+soccer ball
+accordion
+ant
+starfish
+chambered nautilus
+grand piano
+laptop
+strawberry
+airliner
+warplane
+airship
+balloon
+space shuttle
+fireboat
+gondola
+speedboat
+lifeboat
+canoe
+yawl
+catamaran
+trimaran
+container ship
+liner
+pirate
+aircraft carrier
+submarine
+wreck
+half track
+tank
+missile
+bobsled
+dogsled
+bicycle-built-for-two
+mountain bike
+freight car
+passenger car
+barrow
+shopping cart
+motor scooter
+forklift
+electric locomotive
+steam locomotive
+amphibian
+ambulance
+beach wagon
+cab
+convertible
+jeep
+limousine
+minivan
+Model T
+racer
+sports car
+go-kart
+golfcart
+moped
+snowplow
+fire engine
+garbage truck
+pickup
+tow truck
+trailer truck
+moving van
+police van
+recreational vehicle
+streetcar
+snowmobile
+tractor
+mobile home
+tricycle
+unicycle
+horse cart
+jinrikisha
+oxcart
+bassinet
+cradle
+crib
+four-poster
+bookcase
+china cabinet
+medicine chest
+chiffonier
+table lamp
+file
+park bench
+barber chair
+throne
+folding chair
+rocking chair
+studio couch
+toilet seat
+desk
+pool table
+dining table
+entertainment center
+wardrobe
+Granny Smith
+orange
+lemon
+fig
+pineapple
+banana
+jackfruit
+custard apple
+pomegranate
+acorn
+hip
+ear
+rapeseed
+corn
+buckeye
+organ
+upright
+chime
+drum
+gong
+maraca
+marimba
+steel drum
+banjo
+cello
+violin
+harp
+acoustic guitar
+electric guitar
+cornet
+French horn
+trombone
+harmonica
+ocarina
+panpipe
+bassoon
+oboe
+sax
+flute
+daisy
+yellow lady's slipper
+cliff
+valley
+alp
+volcano
+promontory
+sandbar
+coral reef
+lakeside
+seashore
+geyser
+hatchet
+cleaver
+letter opener
+plane
+power drill
+lawn mower
+hammer
+corkscrew
+can opener
+plunger
+screwdriver
+shovel
+plow
+chain saw
+cock
+hen
+ostrich
+brambling
+goldfinch
+house finch
+junco
+indigo bunting
+robin
+bulbul
+jay
+magpie
+chickadee
+water ouzel
+kite
+bald eagle
+vulture
+great grey owl
+black grouse
+ptarmigan
+ruffed grouse
+prairie chicken
+peacock
+quail
+partridge
+African grey
+macaw
+sulphur-crested cockatoo
+lorikeet
+coucal
+bee eater
+hornbill
+hummingbird
+jacamar
+toucan
+drake
+red-breasted merganser
+goose
+black swan
+white stork
+black stork
+spoonbill
+flamingo
+American egret
+little blue heron
+bittern
+crane
+limpkin
+American coot
+bustard
+ruddy turnstone
+red-backed sandpiper
+redshank
+dowitcher
+oystercatcher
+European gallinule
+pelican
+king penguin
+albatross
+great white shark
+tiger shark
+hammerhead
+electric ray
+stingray
+barracouta
+coho
+tench
+goldfish
+eel
+rock beauty
+anemone fish
+lionfish
+puffer
+sturgeon
+gar
+loggerhead
+leatherback turtle
+mud turtle
+terrapin
+box turtle
+banded gecko
+common iguana
+American chameleon
+whiptail
+agama
+frilled lizard
+alligator lizard
+Gila monster
+green lizard
+African chameleon
+Komodo dragon
+triceratops
+African crocodile
+American alligator
+thunder snake
+ringneck snake
+hognose snake
+green snake
+king snake
+garter snake
+water snake
+vine snake
+night snake
+boa constrictor
+rock python
+Indian cobra
+green mamba
+sea snake
+horned viper
+diamondback
+sidewinder
+European fire salamander
+common newt
+eft
+spotted salamander
+axolotl
+bullfrog
+tree frog
+tailed frog
+whistle
+wing
+paintbrush
+hand blower
+oxygen mask
+snorkel
+loudspeaker
+microphone
+screen
+mouse
+electric fan
+oil filter
+strainer
+space heater
+stove
+guillotine
+barometer
+rule
+odometer
+scale
+analog clock
+digital clock
+wall clock
+hourglass
+sundial
+parking meter
+stopwatch
+digital watch
+stethoscope
+syringe
+magnetic compass
+binoculars
+projector
+sunglasses
+loupe
+radio telescope
+bow
+cannon [ground]
+assault rifle
+rifle
+projectile
+computer keyboard
+typewriter keyboard
+crane
+lighter
+abacus
+cash machine
+slide rule
+desktop computer
+hand-held computer
+notebook
+web site
+harvester
+thresher
+printer
+slot
+vending machine
+sewing machine
+joystick
+switch
+hook
+car wheel
+paddlewheel
+pinwheel
+potter's wheel
+gas pump
+carousel
+swing
+reel
+radiator
+puck
+hard disc
+sunglass
+pick
+car mirror
+solar dish
+remote control
+disk brake
+buckle
+hair slide
+knot
+combination lock
+padlock
+nail
+safety pin
+screw
+muzzle
+seat belt
+ski
+candle
+jack-o'-lantern
+spotlight
+torch
+neck brace
+pier
+tripod
+maypole
+mousetrap
+spider web
+trilobite
+harvestman
+scorpion
+black and gold garden spider
+barn spider
+garden spider
+black widow
+tarantula
+wolf spider
+tick
+centipede
+isopod
+Dungeness crab
+rock crab
+fiddler crab
+king crab
+American lobster
+spiny lobster
+crayfish
+hermit crab
+tiger beetle
+ladybug
+ground beetle
+long-horned beetle
+leaf beetle
+dung beetle
+rhinoceros beetle
+weevil
+fly
+bee
+grasshopper
+cricket
+walking stick
+cockroach
+mantis
+cicada
+leafhopper
+lacewing
+dragonfly
+damselfly
+admiral
+ringlet
+monarch
+cabbage butterfly
+sulphur butterfly
+lycaenid
+jellyfish
+sea anemone
+brain coral
+flatworm
+nematode
+conch
+snail
+slug
+sea slug
+chiton
+sea urchin
+sea cucumber
+iron
+espresso maker
+microwave
+Dutch oven
+rotisserie
+toaster
+waffle iron
+vacuum
+dishwasher
+refrigerator
+washer
+Crock Pot
+frying pan
+wok
+caldron
+coffeepot
+teapot
+spatula
+altar
+triumphal arch
+patio
+steel arch bridge
+suspension bridge
+viaduct
+barn
+greenhouse
+palace
+monastery
+library
+apiary
+boathouse
+church
+mosque
+stupa
+planetarium
+restaurant
+cinema
+home theater
+lumbermill
+coil
+obelisk
+totem pole
+castle
+prison
+grocery store
+bakery
+barbershop
+bookshop
+butcher shop
+confectionery
+shoe shop
+tobacco shop
+toyshop
+fountain
+cliff dwelling
+yurt
+dock
+brass
+megalith
+bannister
+breakwater
+dam
+chainlink fence
+picket fence
+worm fence
+stone wall
+grille
+sliding door
+turnstile
+mountain tent
+scoreboard
+honeycomb
+plate rack
+pedestal
+beacon
+mashed potato
+bell pepper
+head cabbage
+broccoli
+cauliflower
+zucchini
+spaghetti squash
+acorn squash
+butternut squash
+cucumber
+artichoke
+cardoon
+mushroom
+shower curtain
+jean
+carton
+handkerchief
+sandal
+ashcan
+safe
+plate
+necklace
+croquet ball
+fur coat
+thimble
+pajama
+running shoe
+cocktail shaker
+chest
+manhole cover
+modem
+tub
+tray
+balance beam
+bagel
+prayer rug
+kimono
+hot pot
+whiskey jug
+knee pad
+book jacket
+spindle
+ski mask
+beer bottle
+crash helmet
+bottlecap
+tile roof
+mask
+maillot
+Petri dish
+football helmet
+bathing cap
+teddy bear
+holster
+pop bottle
+photocopier
+vestment
+crossword puzzle
+golf ball
+trifle
+suit
+water tower
+feather boa
+cloak
+red wine
+drumstick
+shield
+Christmas stocking
+hoopskirt
+menu
+stage
+bonnet
+meat loaf
+baseball
+face powder
+scabbard
+sunscreen
+beer glass
+hen-of-the-woods
+guacamole
+lampshade
+wool
+hay
+bow tie
+mailbag
+water jug
+bucket
+dishrag
+soup bowl
+eggnog
+mortar
+trench coat
+paddle
+chain
+swab
+mixing bowl
+potpie
+wine bottle
+shoji
+bulletproof vest
+drilling platform
+binder
+cardigan
+sweatshirt
+pot
+birdhouse
+hamper
+ping-pong ball
+pencil box
+pay-phone
+consomme
+apron
+punching bag
+backpack
+groom
+bearskin
+pencil sharpener
+broom
+mosquito net
+abaya
+mortarboard
+poncho
+crutch
+Polaroid camera
+space bar
+cup
+racket
+traffic light
+quill
+radio
+dough
+cuirass
+military uniform
+lipstick
+shower cap
+monitor
+oscilloscope
+mitten
+brassiere
+French loaf
+vase
+milk can
+rugby ball
+paper towel
+earthstar
+envelope
+miniskirt
+cowboy hat
+trolleybus
+perfume
+bathtub
+hotdog
+coral fungus
+bullet train
+pillow
+toilet tissue
+cassette
+carpenter's kit
+ladle
+stinkhorn
+lotion
+hair spray
+academic gown
+dome
+crate
+wig
+burrito
+pill bottle
+chain mail
+theater curtain
+window shade
+barrel
+washbasin
+ballpoint
+basketball
+bath towel
+cowboy boot
+gown
+window screen
+agaric
+cellular telephone
+nipple
+barbell
+mailbox
+lab coat
+fire screen
+minibus
+packet
+maze
+pole
+horizontal bar
+sombrero
+pickelhaube
+rain barrel
+wallet
+cassette player
+comic book
+piggy bank
+street sign
+bell cote
+fountain pen
+Windsor tie
+volleyball
+overskirt
+sarong
+purse
+bolo tie
+bib
+parachute
+sleeping bag
+television
+swimming trunks
+measuring cup
+espresso
+pizza
+breastplate
+shopping basket
+wooden spoon
+saltshaker
+chocolate sauce
+ballplayer
+goblet
+gyromitra
+stretcher
+water bottle
+dial telephone
+soap dispenser
+jersey
+school bus
+jigsaw puzzle
+plastic bag
+reflex camera
+diaper
+Band Aid
+ice lolly
+velvet
+tennis ball
+gasmask
+doormat
+Loafer
+ice cream
+pretzel
+quilt
+maillot
+tape player
+clog
+iPod
+bolete
+scuba diver
+pitcher
+matchstick
+bikini
+sock
+CD player
+lens cap
+thatch
+vault
+beaker
+bubble
+cheeseburger
+parallel bars
+flagpole
+coffee mug
+rubber eraser
+stole
+carbonara
+dumbbell
\ No newline at end of file
diff --git a/tensorflow/examples/android/jni/__init__.py b/tensorflow/examples/android/jni/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/examples/android/jni/__init__.py
diff --git a/tensorflow/examples/android/jni/imageutils_jni.cc b/tensorflow/examples/android/jni/imageutils_jni.cc
new file mode 100644
index 0000000000..a1f88fb867
--- /dev/null
+++ b/tensorflow/examples/android/jni/imageutils_jni.cc
@@ -0,0 +1,122 @@
+// This file binds the native image utility code to the Java class
+// which exposes them.
+
+#include <jni.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/examples/android/jni/rgb2yuv.h"
+#include "tensorflow/examples/android/jni/yuv2rgb.h"
+
+#define IMAGEUTILS_METHOD(METHOD_NAME) \
+ Java_org_tensorflow_demo_env_ImageUtils_##METHOD_NAME // NOLINT
+
+using namespace tensorflow;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertYUV420SPToARGB8888)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jintArray output,
+ jint width, jint height, jboolean halfSize);
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertYUV420SPToRGB565)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jbyteArray output,
+ jint width, jint height);
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertARGB8888ToYUV420SP)(
+ JNIEnv* env, jclass clazz, jintArray input, jbyteArray output,
+ jint width, jint height);
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertRGB565ToYUV420SP)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jbyteArray output,
+ jint width, jint height);
+
+#ifdef __cplusplus
+}
+#endif
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertYUV420SPToARGB8888)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jintArray output,
+ jint width, jint height, jboolean halfSize) {
+ jboolean inputCopy = JNI_FALSE;
+ jbyte* const i = env->GetByteArrayElements(input, &inputCopy);
+
+ jboolean outputCopy = JNI_FALSE;
+ jint* const o = env->GetIntArrayElements(output, &outputCopy);
+
+ if (halfSize) {
+ ConvertYUV420SPToARGB8888HalfSize(reinterpret_cast<uint8*>(i),
+ reinterpret_cast<uint32*>(o),
+ width, height);
+ } else {
+ ConvertYUV420SPToARGB8888(reinterpret_cast<uint8*>(i),
+ reinterpret_cast<uint8*>(i) + width * height,
+ reinterpret_cast<uint32*>(o),
+ width, height);
+ }
+
+ env->ReleaseByteArrayElements(input, i, JNI_ABORT);
+ env->ReleaseIntArrayElements(output, o, 0);
+}
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertYUV420SPToRGB565)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jbyteArray output,
+ jint width, jint height) {
+ jboolean inputCopy = JNI_FALSE;
+ jbyte* const i = env->GetByteArrayElements(input, &inputCopy);
+
+ jboolean outputCopy = JNI_FALSE;
+ jbyte* const o = env->GetByteArrayElements(output, &outputCopy);
+
+ ConvertYUV420SPToRGB565(reinterpret_cast<uint8*>(i),
+ reinterpret_cast<uint16*>(o),
+ width, height);
+
+ env->ReleaseByteArrayElements(input, i, JNI_ABORT);
+ env->ReleaseByteArrayElements(output, o, 0);
+}
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertARGB8888ToYUV420SP)(
+ JNIEnv* env, jclass clazz, jintArray input, jbyteArray output,
+ jint width, jint height) {
+ jboolean inputCopy = JNI_FALSE;
+ jint* const i = env->GetIntArrayElements(input, &inputCopy);
+
+ jboolean outputCopy = JNI_FALSE;
+ jbyte* const o = env->GetByteArrayElements(output, &outputCopy);
+
+ ConvertARGB8888ToYUV420SP(reinterpret_cast<uint32*>(i),
+ reinterpret_cast<uint8*>(o),
+ width, height);
+
+ env->ReleaseIntArrayElements(input, i, JNI_ABORT);
+ env->ReleaseByteArrayElements(output, o, 0);
+}
+
+JNIEXPORT void JNICALL
+IMAGEUTILS_METHOD(convertRGB565ToYUV420SP)(
+ JNIEnv* env, jclass clazz, jbyteArray input, jbyteArray output,
+ jint width, jint height) {
+ jboolean inputCopy = JNI_FALSE;
+ jbyte* const i = env->GetByteArrayElements(input, &inputCopy);
+
+ jboolean outputCopy = JNI_FALSE;
+ jbyte* const o = env->GetByteArrayElements(output, &outputCopy);
+
+ ConvertRGB565ToYUV420SP(reinterpret_cast<uint16*>(i),
+ reinterpret_cast<uint8*>(o),
+ width, height);
+
+ env->ReleaseByteArrayElements(input, i, JNI_ABORT);
+ env->ReleaseByteArrayElements(output, o, 0);
+}
diff --git a/tensorflow/examples/android/jni/jni_utils.cc b/tensorflow/examples/android/jni/jni_utils.cc
new file mode 100644
index 0000000000..3fffc19cb6
--- /dev/null
+++ b/tensorflow/examples/android/jni/jni_utils.cc
@@ -0,0 +1,144 @@
+#include "tensorflow/examples/android/jni/jni_utils.h"
+
+#include <android/asset_manager.h>
+#include <android/asset_manager_jni.h>
+#include <jni.h>
+#include <stdlib.h>
+
+#include <string>
+#include <vector>
+#include <fstream>
+#include <sstream>
+
+#include "tensorflow/core/platform/logging.h"
+#include "google/protobuf/src/google/protobuf/io/zero_copy_stream_impl.h"
+#include "google/protobuf/src/google/protobuf/io/zero_copy_stream_impl_lite.h"
+#include "google/protobuf/src/google/protobuf/io/coded_stream.h"
+#include "google/protobuf/src/google/protobuf/message_lite.h"
+
+static const char* const ASSET_PREFIX = "file:///android_asset/";
+
+namespace {
+class IfstreamInputStream : public ::google::protobuf::io::CopyingInputStream {
+ public:
+ explicit IfstreamInputStream(const std::string& file_name)
+ : ifs_(file_name.c_str(), std::ios::in | std::ios::binary) {}
+ ~IfstreamInputStream() { ifs_.close(); }
+
+ int Read(void* buffer, int size) {
+ if (!ifs_) {
+ return -1;
+ }
+ ifs_.read(static_cast<char*>(buffer), size);
+ return ifs_.gcount();
+ }
+
+ private:
+ std::ifstream ifs_;
+};
+} // namespace
+
+bool PortableReadFileToProto(const std::string& file_name,
+ ::google::protobuf::MessageLite* proto) {
+ ::google::protobuf::io::CopyingInputStreamAdaptor stream(
+ new IfstreamInputStream(file_name));
+ stream.SetOwnsCopyingStream(true);
+ // TODO(jiayq): the following coded stream is for debugging purposes to allow
+ // one to parse arbitrarily large messages for MessageLite. One most likely
+ // doesn't want to put protobufs larger than 64MB on Android, so we should
+// eventually remove this and quit loudly when a large protobuf is passed in.
+ ::google::protobuf::io::CodedInputStream coded_stream(&stream);
+ // Total bytes hard limit / warning limit are set to 1GB and 512MB
+ // respectively.
+ coded_stream.SetTotalBytesLimit(1024LL << 20, 512LL << 20);
+ return proto->ParseFromCodedStream(&coded_stream);
+}
+
+bool IsAsset(const char* const filename) {
+ return strstr(filename, ASSET_PREFIX) == filename;
+}
+
+void ReadFileToProto(AAssetManager* const asset_manager,
+ const char* const filename,
+ google::protobuf::MessageLite* message) {
+ if (!IsAsset(filename)) {
+ VLOG(0) << "Opening file: " << filename;
+ CHECK(PortableReadFileToProto(filename, message));
+ return;
+ }
+
+ CHECK_NOTNULL(asset_manager);
+
+ const char* const asset_filename = filename + strlen(ASSET_PREFIX);
+ AAsset* asset = AAssetManager_open(asset_manager,
+ asset_filename,
+ AASSET_MODE_STREAMING);
+ CHECK_NOTNULL(asset);
+
+ off_t start;
+ off_t length;
+ const int fd = AAsset_openFileDescriptor(asset, &start, &length);
+
+ if (fd >= 0) {
+ // If it has a file descriptor that means it can be memmapped directly
+ // from the APK.
+ VLOG(0) << "Opening asset " << asset_filename
+ << " from disk with zero-copy.";
+ google::protobuf::io::FileInputStream is(fd);
+ google::protobuf::io::LimitingInputStream lis(&is, start + length);
+ lis.Skip(start);
+ CHECK(message->ParseFromZeroCopyStream(&lis));
+ is.Close();
+ } else {
+ // It may be compressed, in which case we have to uncompress
+ // it to memory first.
+ VLOG(0) << "Opening asset " << asset_filename
+ << " from disk with copy.";
+ const off_t data_size = AAsset_getLength(asset);
+ const void* const memory = AAsset_getBuffer(asset);
+ CHECK(message->ParseFromArray(memory, data_size));
+ }
+ AAsset_close(asset);
+}
+
+void ReadFileToString(AAssetManager* const asset_manager,
+ const char* const filename, std::string* str) {
+ if (!IsAsset(filename)) {
+ VLOG(0) << "Opening file: " << filename;
+ std::ifstream t(filename);
+ std::string tmp((std::istreambuf_iterator<char>(t)),
+ std::istreambuf_iterator<char>());
+ tmp.swap(*str);
+ t.close();
+ return;
+ }
+
+ CHECK_NOTNULL(asset_manager);
+ const char* const asset_filename = filename + strlen(ASSET_PREFIX);
+ AAsset* asset = AAssetManager_open(asset_manager,
+ asset_filename,
+ AASSET_MODE_STREAMING);
+ CHECK_NOTNULL(asset);
+ VLOG(0) << "Opening asset " << asset_filename << " from disk with copy.";
+ const off_t data_size = AAsset_getLength(asset);
+ const char* memory = reinterpret_cast<const char*>(AAsset_getBuffer(asset));
+
+ std::string tmp(memory, memory + data_size);
+ tmp.swap(*str);
+ AAsset_close(asset);
+}
+
+void ReadFileToVector(AAssetManager* const asset_manager,
+ const char* const filename,
+ std::vector<std::string>* str_vector) {
+ std::string labels_string;
+ ReadFileToString(asset_manager, filename, &labels_string);
+ std::istringstream ifs(labels_string);
+ str_vector->clear();
+ std::string label;
+ while (std::getline(ifs, label)) {
+ str_vector->push_back(label);
+ }
+ VLOG(0) << "Read " << str_vector->size() << " values from " << filename;
+}
+
diff --git a/tensorflow/examples/android/jni/jni_utils.h b/tensorflow/examples/android/jni/jni_utils.h
new file mode 100644
index 0000000000..9bd8d2c21f
--- /dev/null
+++ b/tensorflow/examples/android/jni/jni_utils.h
@@ -0,0 +1,30 @@
+#ifndef ORG_TENSORFLOW_JNI_JNI_UTILS_H_ // NOLINT
+#define ORG_TENSORFLOW_JNI_JNI_UTILS_H_ // NOLINT
+
+#include <jni.h>
+#include <string>
+#include <vector>
+
+#include "tensorflow/core/platform/port.h"
+
+namespace google {
+namespace protobuf {
+class MessageLite;
+} // namespace protobuf
+} // namespace google
+
+class AAssetManager;
+
+bool PortableReadFileToProto(const std::string& file_name,
+ ::google::protobuf::MessageLite* proto);
+
+void ReadFileToProto(AAssetManager* const asset_manager,
+ const char* const filename, google::protobuf::MessageLite* message);
+
+void ReadFileToString(AAssetManager* const asset_manager,
+ const char* const filename, std::string* str);
+
+void ReadFileToVector(AAssetManager* const asset_manager,
+ const char* const filename, std::vector<std::string>* str_vector);
+
+#endif // ORG_TENSORFLOW_JNI_JNI_UTILS_H_
diff --git a/tensorflow/examples/android/jni/libpthread.so b/tensorflow/examples/android/jni/libpthread.so
new file mode 100755
index 0000000000..7992d0de4c
--- /dev/null
+++ b/tensorflow/examples/android/jni/libpthread.so
Binary files differ
diff --git a/tensorflow/examples/android/jni/rgb2yuv.cc b/tensorflow/examples/android/jni/rgb2yuv.cc
new file mode 100755
index 0000000000..428f311eb8
--- /dev/null
+++ b/tensorflow/examples/android/jni/rgb2yuv.cc
@@ -0,0 +1,89 @@
+// These utility functions allow for the conversion of RGB data to YUV data.
+
+#include "tensorflow/examples/android/jni/rgb2yuv.h"
+
+#include "tensorflow/core/platform/port.h"
+
+using namespace tensorflow;
+
+static inline void WriteYUV(const int x, const int y, const int width,
+ const int r8, const int g8, const int b8,
+ uint8* const pY,
+ uint8* const pUV) {
+ // Using formulas from http://msdn.microsoft.com/en-us/library/ms893078
+ *pY = ((66 * r8 + 129 * g8 + 25 * b8 + 128) >> 8) + 16;
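+ // Illustrative check of the formula above: pure white (r8 = g8 = b8 = 255)
+ // gives (((66 + 129 + 25) * 255 + 128) >> 8) + 16 = 219 + 16 = 235, the top
+ // of the video-range luma scale, while pure black maps to 16.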
+
+ // Odd widths get rounded up so that UV blocks on the side don't get cut off.
+ const int blocks_per_row = (width + 1) / 2;
+
+ // 2 bytes per UV block
+ const int offset = 2 * (((y / 2) * blocks_per_row + (x / 2)));
+
+ // U and V are the average values of all 4 pixels in the block.
+ if (!(x & 1) && !(y & 1)) {
+ // Explicitly clear the block if this is the first pixel in it.
+ pUV[offset] = 0;
+ pUV[offset + 1] = 0;
+ }
+
+ // V (with divide by 4 factored in)
+#ifdef __APPLE__
+ const int u_offset = 0;
+ const int v_offset = 1;
+#else
+ const int u_offset = 1;
+ const int v_offset = 0;
+#endif
+ pUV[offset + v_offset] += ((112 * r8 - 94 * g8 - 18 * b8 + 128) >> 10) + 32;
+
+ // U (with divide by 4 factored in)
+ pUV[offset + u_offset] += ((-38 * r8 - 74 * g8 + 112 * b8 + 128) >> 10) + 32;
+}
+
+void ConvertARGB8888ToYUV420SP(const uint32* const input, uint8* const output,
+ int width, int height) {
+ uint8* pY = output;
+ uint8* pUV = output + (width * height);
+ const uint32* in = input;
+
+ for (int y = 0; y < height; y++) {
+ for (int x = 0; x < width; x++) {
+ const uint32 rgb = *in++;
+#ifdef __APPLE__
+ const int nB = (rgb >> 8) & 0xFF;
+ const int nG = (rgb >> 16) & 0xFF;
+ const int nR = (rgb >> 24) & 0xFF;
+#else
+ const int nR = (rgb >> 16) & 0xFF;
+ const int nG = (rgb >> 8) & 0xFF;
+ const int nB = rgb & 0xFF;
+#endif
+ WriteYUV(x, y, width, nR, nG, nB, pY++, pUV);
+ }
+ }
+}
+
+void ConvertRGB565ToYUV420SP(const uint16* const input, uint8* const output,
+ const int width, const int height) {
+ uint8* pY = output;
+ uint8* pUV = output + (width * height);
+ const uint16* in = input;
+
+ for (int y = 0; y < height; y++) {
+ for (int x = 0; x < width; x++) {
+ const uint32 rgb = *in++;
+
+ const int r5 = ((rgb >> 11) & 0x1F);
+ const int g6 = ((rgb >> 5) & 0x3F);
+ const int b5 = (rgb & 0x1F);
+
+ // Shift left, then fill in the empty low bits with a copy of the high
+ // bits so we can stretch across the entire 0 - 255 range.
+ const int r8 = r5 << 3 | r5 >> 2;
+ const int g8 = g6 << 2 | g6 >> 4;
+ const int b8 = b5 << 3 | b5 >> 2;
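+ // For example (illustrative): r5 = 31 (0b11111) expands to
+ // (31 << 3) | (31 >> 2) = 248 | 7 = 255, so a full-scale 5-bit channel
+ // maps to a full-scale 8-bit channel.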
+
+ WriteYUV(x, y, width, r8, g8, b8, pY++, pUV);
+ }
+ }
+}
diff --git a/tensorflow/examples/android/jni/rgb2yuv.h b/tensorflow/examples/android/jni/rgb2yuv.h
new file mode 100755
index 0000000000..e5eb5aa419
--- /dev/null
+++ b/tensorflow/examples/android/jni/rgb2yuv.h
@@ -0,0 +1,23 @@
+#ifndef ORG_TENSORFLOW_JNI_IMAGEUTILS_RGB2YUV_H_
+#define ORG_TENSORFLOW_JNI_IMAGEUTILS_RGB2YUV_H_
+
+#include "tensorflow/core/platform/port.h"
+
+using namespace tensorflow;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
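+// Converts ARGB 8888 input of the given dimensions to YUV 4:2:0 semi-planar
+// output: a full-resolution Y plane followed by an interleaved, 2x2
+// subsampled chroma plane. The output buffer must already be allocated.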
+void ConvertARGB8888ToYUV420SP(const uint32* const input, uint8* const output,
+ int width, int height);
+
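+// The same conversion, but from RGB 565 input.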
+void ConvertRGB565ToYUV420SP(const uint16* const input,
+ uint8* const output,
+ const int width, const int height);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // ORG_TENSORFLOW_JNI_IMAGEUTILS_RGB2YUV_H_
diff --git a/tensorflow/examples/android/jni/tensorflow_jni.cc b/tensorflow/examples/android/jni/tensorflow_jni.cc
new file mode 100644
index 0000000000..39d0bb1249
--- /dev/null
+++ b/tensorflow/examples/android/jni/tensorflow_jni.cc
@@ -0,0 +1,253 @@
+#include "tensorflow/examples/android/jni/tensorflow_jni.h"
+
+#include <android/asset_manager.h>
+#include <android/asset_manager_jni.h>
+#include <android/bitmap.h>
+
+#include <jni.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <queue>
+#include <sstream>
+#include <string>
+
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/core/public/session.h"
+#include "tensorflow/core/public/tensor.h"
+#include "tensorflow/examples/android/jni/jni_utils.h"
+
+// Global variables that hold the Tensorflow classifier.
+static std::unique_ptr<tensorflow::Session> session;
+
+static std::vector<std::string> g_label_strings;
+static bool g_compute_graph_initialized = false;
+//static mutex g_compute_graph_mutex(base::LINKER_INITIALIZED);
+
+static int g_tensorflow_input_size; // The image size for the mognet input.
+static int g_image_mean; // The image mean.
+
+using namespace tensorflow;
+
+JNIEXPORT jint JNICALL
+TENSORFLOW_METHOD(initializeTensorflow)(
+ JNIEnv* env, jobject thiz, jobject java_asset_manager,
+ jstring model, jstring labels,
+ jint num_classes, jint mognet_input_size, jint image_mean) {
+ //MutexLock input_lock(&g_compute_graph_mutex);
+ if (g_compute_graph_initialized) {
+ LOG(INFO) << "Compute graph already loaded. skipping.";
+ return 0;
+ }
+
+ const char* const model_cstr = env->GetStringUTFChars(model, NULL);
+ const char* const labels_cstr = env->GetStringUTFChars(labels, NULL);
+
+ g_tensorflow_input_size = mognet_input_size;
+ g_image_mean = image_mean;
+
+ LOG(INFO) << "Loading Tensorflow.";
+
+ LOG(INFO) << "Making new SessionOptions.";
+ tensorflow::SessionOptions options;
+ tensorflow::ConfigProto& config = options.config;
+ LOG(INFO) << "Got config, " << config.device_count_size() << " devices";
+
+ session.reset(tensorflow::NewSession(options));
+ LOG(INFO) << "Session created.";
+
+ tensorflow::GraphDef tensorflow_graph;
+ LOG(INFO) << "Graph created.";
+
+ AAssetManager* const asset_manager =
+ AAssetManager_fromJava(env, java_asset_manager);
+ LOG(INFO) << "Acquired AssetManager.";
+
+ LOG(INFO) << "Reading file to proto: " << model_cstr;
+ ReadFileToProto(asset_manager, model_cstr, &tensorflow_graph);
+
+ LOG(INFO) << "Creating session.";
+ tensorflow::Status s = session->Create(tensorflow_graph);
+ if (!s.ok()) {
+ LOG(ERROR) << "Could not create Tensorflow Graph: " << s;
+ return -1;
+ }
+
+ // Clear the proto to save memory space.
+ tensorflow_graph.Clear();
+ LOG(INFO) << "Tensorflow graph loaded from: " << model_cstr;
+
+ // Read the label list
+ ReadFileToVector(asset_manager, labels_cstr, &g_label_strings);
+ LOG(INFO) << g_label_strings.size() << " label strings loaded from: "
+ << labels_cstr;
+ g_compute_graph_initialized = true;
+
+ return 0;
+}
+
+namespace {
+typedef struct {
+ uint8 red;
+ uint8 green;
+ uint8 blue;
+ uint8 alpha;
+} RGBA;
+} // namespace
+
+// Returns the top N confidence values over threshold in the provided vector,
+// sorted by confidence in descending order.
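+// A bounded min-heap (std::priority_queue with std::greater) keeps memory at
+// O(num_results) and the scan at O(count * log(num_results)).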
+static void GetTopN(
+ const Eigen::TensorMap<Eigen::Tensor<float, 1, Eigen::RowMajor>,
+ Eigen::Aligned>& prediction,
+ const int num_results, const float threshold,
+ std::vector<std::pair<float, int> >* top_results) {
+ // Will contain top N results in ascending order.
+ std::priority_queue<std::pair<float, int>,
+ std::vector<std::pair<float, int> >,
+ std::greater<std::pair<float, int> > > top_result_pq;
+
+ const int count = prediction.size();
+ for (int i = 0; i < count; ++i) {
+ const float value = prediction(i);
+
+ // Only add it if it beats the threshold and has a chance at being in
+ // the top N.
+ if (value < threshold) {
+ continue;
+ }
+
+ top_result_pq.push(std::pair<float, int>(value, i));
+
+ // If at capacity, kick the smallest value out.
+ if (top_result_pq.size() > num_results) {
+ top_result_pq.pop();
+ }
+ }
+
+ // Copy to output vector and reverse into descending order.
+ while (!top_result_pq.empty()) {
+ top_results->push_back(top_result_pq.top());
+ top_result_pq.pop();
+ }
+ std::reverse(top_results->begin(), top_results->end());
+}
+
+static std::string ClassifyImage(const RGBA* const bitmap_src,
+ const int in_stride,
+ const int width, const int height) {
+ // Create input tensor
+ tensorflow::Tensor input_tensor(
+ tensorflow::DT_FLOAT,
+ tensorflow::TensorShape({
+ 1, g_tensorflow_input_size, g_tensorflow_input_size, 3}));
+
+ auto input_tensor_mapped = input_tensor.tensor<float, 4>();
+
+ LOG(INFO) << "Tensorflow: Copying Data.";
+ for (int i = 0; i < g_tensorflow_input_size; ++i) {
+ const RGBA* src = bitmap_src + i * g_tensorflow_input_size;
+ for (int j = 0; j < g_tensorflow_input_size; ++j) {
+ // Copy 3 values
+ input_tensor_mapped(0, i, j, 0) =
+ static_cast<float>(src->red) - g_image_mean;
+ input_tensor_mapped(0, i, j, 1) =
+ static_cast<float>(src->green) - g_image_mean;
+ input_tensor_mapped(0, i, j, 2) =
+ static_cast<float>(src->blue) - g_image_mean;
+ ++src;
+ }
+ }
+
+ std::vector<std::pair<std::string, tensorflow::Tensor> > input_tensors(
+ {{"input:0", input_tensor}});
+
+ VLOG(0) << "Start computing.";
+ std::vector<tensorflow::Tensor> output_tensors;
+ std::vector<std::string> output_names({"output:0"});
+
+ tensorflow::Status s =
+ session->Run(input_tensors, output_names, {}, &output_tensors);
+ VLOG(0) << "End computing.";
+
+ if (!s.ok()) {
+ LOG(ERROR) << "Error during inference: " << s;
+ return "";
+ }
+
+ VLOG(0) << "Reading from layer " << output_names[0];
+ tensorflow::Tensor* output = &output_tensors[0];
+ const int kNumResults = 5;
+ const float kThreshold = 0.1f;
+ std::vector<std::pair<float, int> > top_results;
+ GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results);
+
+ std::stringstream ss;
+ ss.precision(3);
+ for (const auto& result : top_results) {
+ const float confidence = result.first;
+ const int index = result.second;
+
+ ss << index << " " << confidence << " ";
+
+ // Write out the result as a string
+ if (index < g_label_strings.size()) {
+ // Just for safety: theoretically, the output is under 1000 unless there
+ // are numerical issues leading to a wrong prediction.
+ ss << g_label_strings[index];
+ } else {
+ ss << "Prediction: " << index;
+ }
+
+ ss << "\n";
+ }
+
+ LOG(INFO) << "Predictions: " << ss.str();
+ return ss.str();
+}
+
+JNIEXPORT jstring JNICALL
+TENSORFLOW_METHOD(classifyImageRgb)(
+ JNIEnv* env, jobject thiz, jintArray image, jint width, jint height) {
+ // Obtain access to the Java image pixel data (the VM may copy it).
+ jboolean iCopied = JNI_FALSE;
+ jint* pixels = env->GetIntArrayElements(image, &iCopied);
+
+ std::string result = ClassifyImage(
+ reinterpret_cast<const RGBA*>(pixels), width * 4, width, height);
+
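+ // JNI_ABORT releases the (possibly copied) array without writing changes
+ // back to the Java side, which is fine since the pixels are only read.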
+ env->ReleaseIntArrayElements(image, pixels, JNI_ABORT);
+
+ return env->NewStringUTF(result.c_str());
+}
+
+JNIEXPORT jstring JNICALL
+TENSORFLOW_METHOD(classifyImageBmp)(
+ JNIEnv* env, jobject thiz, jobject bitmap) {
+ // Obtains the bitmap information.
+ AndroidBitmapInfo info;
+ CHECK_EQ(AndroidBitmap_getInfo(env, bitmap, &info),
+ ANDROID_BITMAP_RESULT_SUCCESS);
+ void* pixels;
+ CHECK_EQ(AndroidBitmap_lockPixels(env, bitmap, &pixels),
+ ANDROID_BITMAP_RESULT_SUCCESS);
+ LOG(INFO) << "Height: " << info.height;
+ LOG(INFO) << "Width: " << info.width;
+ LOG(INFO) << "Stride: " << info.stride;
+ // TODO(jiayq): deal with other formats if necessary.
+ if (info.format != ANDROID_BITMAP_FORMAT_RGBA_8888) {
+ return env->NewStringUTF(
+ "Error: Android system is not using RGBA_8888 in default.");
+ }
+
+ std::string result = ClassifyImage(
+ static_cast<const RGBA*>(pixels), info.stride, info.width, info.height);
+
+ // Finally, unlock the pixels
+ CHECK_EQ(AndroidBitmap_unlockPixels(env, bitmap),
+ ANDROID_BITMAP_RESULT_SUCCESS);
+
+ return env->NewStringUTF(result.c_str());
+}
diff --git a/tensorflow/examples/android/jni/tensorflow_jni.h b/tensorflow/examples/android/jni/tensorflow_jni.h
new file mode 100644
index 0000000000..2de353bac8
--- /dev/null
+++ b/tensorflow/examples/android/jni/tensorflow_jni.h
@@ -0,0 +1,36 @@
+// The methods are exposed to Java to allow for interaction with the native
+// Tensorflow code. See
+// tensorflow/examples/android/src/org/tensorflow/demo/TensorflowClassifier.java
+// for the Java counterparts.
+
+#ifndef ORG_TENSORFLOW_JNI_TENSORFLOW_JNI_H_ // NOLINT
+#define ORG_TENSORFLOW_JNI_TENSORFLOW_JNI_H_ // NOLINT
+
+#include <jni.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif // __cplusplus
+
+#define TENSORFLOW_METHOD(METHOD_NAME) \
+ Java_org_tensorflow_demo_TensorflowClassifier_##METHOD_NAME // NOLINT
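+// For example, TENSORFLOW_METHOD(initializeTensorflow) expands to
+// Java_org_tensorflow_demo_TensorflowClassifier_initializeTensorflow, the
+// symbol the JVM looks up for that native method.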
+
+JNIEXPORT jint JNICALL
+TENSORFLOW_METHOD(initializeTensorflow)(
+ JNIEnv* env, jobject thiz, jobject java_asset_manager,
+ jstring model, jstring labels,
+ jint num_classes, jint mognet_input_size, jint image_mean);
+
+JNIEXPORT jstring JNICALL
+TENSORFLOW_METHOD(classifyImageBmp)(
+ JNIEnv* env, jobject thiz, jobject bitmap);
+
+JNIEXPORT jstring JNICALL
+TENSORFLOW_METHOD(classifyImageRgb)(
+ JNIEnv* env, jobject thiz, jintArray image, jint width, jint height);
+
+#ifdef __cplusplus
+} // extern "C"
+#endif // __cplusplus
+
+#endif // ORG_TENSORFLOW_JNI_TENSORFLOW_JNI_H_ // NOLINT
diff --git a/tensorflow/examples/android/jni/yuv2rgb.cc b/tensorflow/examples/android/jni/yuv2rgb.cc
new file mode 100644
index 0000000000..93694e492d
--- /dev/null
+++ b/tensorflow/examples/android/jni/yuv2rgb.cc
@@ -0,0 +1,161 @@
+// This is a collection of routines which convert various YUV image formats
+// to ARGB.
+
+#include "tensorflow/examples/android/jni/yuv2rgb.h"
+
+#ifndef MAX
+#define MAX(a, b) ({__typeof__(a) _a = (a); __typeof__(b) _b = (b); _a > _b ? _a : _b; })
+#define MIN(a, b) ({__typeof__(a) _a = (a); __typeof__(b) _b = (b); _a < _b ? _a : _b; })
+#endif
+
+// This value is 2 ^ 18 - 1, and is used to clamp the RGB values before their ranges
+// are normalized to eight bits.
+static const int kMaxChannelValue = 262143;
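+// (262143 == (256 << 10) - 1, so after the >> 10 shifts below a channel can
+// never exceed 255.)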
+
+// Accepts a YUV 4:2:0 image with a plane of 8 bit Y samples followed by an
+// interleaved U/V plane containing 8 bit 2x2 subsampled chroma samples,
+// except the interleave order of U and V is reversed. Converts to a packed
+// ARGB 32 bit output of the same pixel dimensions.
+void ConvertYUV420SPToARGB8888(const uint8* const yData,
+ const uint8* const uvData,
+ uint32* const output, const int width,
+ const int height) {
+ const uint8* pY = yData;
+ const uint8* pUV = uvData;
+ uint32* out = output;
+
+ for (int y = 0; y < height; y++) {
+ for (int x = 0; x < width; x++) {
+ int nY = *pY++;
+ int offset = (y >> 1) * width + 2 * (x >> 1);
+#ifdef __APPLE__
+ int nU = pUV[offset];
+ int nV = pUV[offset + 1];
+#else
+ int nV = pUV[offset];
+ int nU = pUV[offset + 1];
+#endif
+
+ nY -= 16;
+ nU -= 128;
+ nV -= 128;
+ if (nY < 0) nY = 0;
+
+ // This is the floating point equivalent. We do the conversion in integer
+ // because some Android devices do not have floating point in hardware.
+ // nR = (int)(1.164 * nY + 1.596 * nV);
+ // nG = (int)(1.164 * nY - 0.813 * nV - 0.391 * nU);
+ // nB = (int)(1.164 * nY + 2.018 * nU);
+
+ int nR = (int)(1192 * nY + 1634 * nV);
+ int nG = (int)(1192 * nY - 833 * nV - 400 * nU);
+ int nB = (int)(1192 * nY + 2066 * nU);
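+ // The integer coefficients are the floating point ones above scaled by
+ // 1024 (e.g. 1.164 * 1024 ~= 1192, 1.596 * 1024 ~= 1634), which the
+ // >> 10 below undoes.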
+
+ nR = MIN(kMaxChannelValue, MAX(0, nR));
+ nG = MIN(kMaxChannelValue, MAX(0, nG));
+ nB = MIN(kMaxChannelValue, MAX(0, nB));
+
+ nR = (nR >> 10) & 0xff;
+ nG = (nG >> 10) & 0xff;
+ nB = (nB >> 10) & 0xff;
+ *out++ = 0xff000000 | (nR << 16) | (nG << 8) | nB;
+ }
+ }
+}
+
+// The same as above, but downsamples each dimension to half size.
+void ConvertYUV420SPToARGB8888HalfSize(const uint8* const input,
+ uint32* const output,
+ int width, int height) {
+ const uint8* pY = input;
+ const uint8* pUV = input + (width * height);
+ uint32* out = output;
+ int stride = width;
+ width >>= 1;
+ height >>= 1;
+
+ for (int y = 0; y < height; y++) {
+ for (int x = 0; x < width; x++) {
+ int nY = (pY[0] + pY[1] + pY[stride] + pY[stride + 1]) >> 2;
+ pY += 2;
+#ifdef __APPLE__
+ int nU = *pUV++;
+ int nV = *pUV++;
+#else
+ int nV = *pUV++;
+ int nU = *pUV++;
+#endif
+
+ nY -= 16;
+ nU -= 128;
+ nV -= 128;
+ if (nY < 0) nY = 0;
+
+ int nR = (int)(1192 * nY + 1634 * nV);
+ int nG = (int)(1192 * nY - 833 * nV - 400 * nU);
+ int nB = (int)(1192 * nY + 2066 * nU);
+
+ nR = MIN(kMaxChannelValue, MAX(0, nR));
+ nG = MIN(kMaxChannelValue, MAX(0, nG));
+ nB = MIN(kMaxChannelValue, MAX(0, nB));
+
+ nR = (nR >> 10) & 0xff;
+ nG = (nG >> 10) & 0xff;
+ nB = (nB >> 10) & 0xff;
+ *out++ = 0xff000000 | (nR << 16) | (nG << 8) | nB;
+ }
+ pY += stride;
+ }
+}
+
+// Accepts a YUV 4:2:0 image with a plane of 8 bit Y samples followed by an
+// interleaved U/V plane containing 8 bit 2x2 subsampled chroma samples,
+// except the interleave order of U and V is reversed. Converts to a packed
+// 16 bit RGB 565 output of the same pixel dimensions.
+void ConvertYUV420SPToRGB565(const uint8* const input, uint16* const output,
+ const int width, const int height) {
+ const uint8* pY = input;
+ const uint8* pUV = input + (width * height);
+ uint16 *out = output;
+
+ for (int y = 0; y < height; y++) {
+ for (int x = 0; x < width; x++) {
+ int nY = *pY++;
+ int offset = (y >> 1) * width + 2 * (x >> 1);
+#ifdef __APPLE__
+ int nU = pUV[offset];
+ int nV = pUV[offset + 1];
+#else
+ int nV = pUV[offset];
+ int nU = pUV[offset + 1];
+#endif
+
+ nY -= 16;
+ nU -= 128;
+ nV -= 128;
+ if (nY < 0) nY = 0;
+
+ // This is the floating point equivalent. We do the conversion in integer
+ // because some Android devices do not have floating point in hardware.
+ // nR = (int)(1.164 * nY + 1.596 * nV);
+ // nG = (int)(1.164 * nY - 0.813 * nV - 0.391 * nU);
+ // nB = (int)(1.164 * nY + 2.018 * nU);
+
+ int nR = (int)(1192 * nY + 1634 * nV);
+ int nG = (int)(1192 * nY - 833 * nV - 400 * nU);
+ int nB = (int)(1192 * nY + 2066 * nU);
+
+ nR = MIN(kMaxChannelValue, MAX(0, nR));
+ nG = MIN(kMaxChannelValue, MAX(0, nG));
+ nB = MIN(kMaxChannelValue, MAX(0, nB));
+
+ // Shift more than for ARGB8888 and apply appropriate bitmask.
+ nR = (nR >> 13) & 0x1f;
+ nG = (nG >> 12) & 0x3f;
+ nB = (nB >> 13) & 0x1f;
+
+ // R is high 5 bits, G is middle 6 bits, and B is low 5 bits.
+ *out++ = (nR << 11) | (nG << 5) | nB;
+ }
+ }
+}
diff --git a/tensorflow/examples/android/jni/yuv2rgb.h b/tensorflow/examples/android/jni/yuv2rgb.h
new file mode 100644
index 0000000000..698da415f5
--- /dev/null
+++ b/tensorflow/examples/android/jni/yuv2rgb.h
@@ -0,0 +1,37 @@
+// This is a collection of routines which convert various YUV image formats
+// to (A)RGB.
+
+#ifndef ORG_TENSORFLOW_JNI_IMAGEUTILS_YUV2RGB_H_
+#define ORG_TENSORFLOW_JNI_IMAGEUTILS_YUV2RGB_H_
+
+#include "tensorflow/core/platform/port.h"
+
+using namespace tensorflow;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Converts YUV420 semi-planar data to ARGB 8888 data using the supplied width
+// and height. The input and output must already be allocated and non-null.
+// For efficiency, no error checking is performed.
+void ConvertYUV420SPToARGB8888(const uint8* const pY, const uint8* const pUV,
+ uint32* const output, const int width,
+ const int height);
+
+// The same as above, but downsamples each dimension to half size.
+void ConvertYUV420SPToARGB8888HalfSize(const uint8* const input,
+ uint32* const output,
+ int width, int height);
+
+// Converts YUV420 semi-planar data to RGB 565 data using the supplied width
+// and height. The input and output must already be allocated and non-null.
+// For efficiency, no error checking is performed.
+void ConvertYUV420SPToRGB565(const uint8* const input, uint16* const output,
+ const int width, const int height);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // ORG_TENSORFLOW_JNI_IMAGEUTILS_YUV2RGB_H_
diff --git a/tensorflow/examples/android/res/drawable-hdpi/ic_action_info.png b/tensorflow/examples/android/res/drawable-hdpi/ic_action_info.png
new file mode 100644
index 0000000000..32bd1aabca
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-hdpi/ic_action_info.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-hdpi/ic_launcher.png b/tensorflow/examples/android/res/drawable-hdpi/ic_launcher.png
new file mode 100644
index 0000000000..b3113cd15c
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-hdpi/ic_launcher.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-hdpi/tile.9.png b/tensorflow/examples/android/res/drawable-hdpi/tile.9.png
new file mode 100644
index 0000000000..135862883e
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-hdpi/tile.9.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-mdpi/ic_action_info.png b/tensorflow/examples/android/res/drawable-mdpi/ic_action_info.png
new file mode 100644
index 0000000000..8efbbf8b3c
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-mdpi/ic_action_info.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-mdpi/ic_launcher.png b/tensorflow/examples/android/res/drawable-mdpi/ic_launcher.png
new file mode 100644
index 0000000000..51f87ee650
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-mdpi/ic_launcher.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-xhdpi/ic_action_info.png b/tensorflow/examples/android/res/drawable-xhdpi/ic_action_info.png
new file mode 100644
index 0000000000..ba143ea7a8
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-xhdpi/ic_action_info.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-xhdpi/ic_launcher.png b/tensorflow/examples/android/res/drawable-xhdpi/ic_launcher.png
new file mode 100644
index 0000000000..6361d792da
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-xhdpi/ic_launcher.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-xxhdpi/ic_action_info.png b/tensorflow/examples/android/res/drawable-xxhdpi/ic_action_info.png
new file mode 100644
index 0000000000..394eb7e534
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-xxhdpi/ic_action_info.png
Binary files differ
diff --git a/tensorflow/examples/android/res/drawable-xxhdpi/ic_launcher.png b/tensorflow/examples/android/res/drawable-xxhdpi/ic_launcher.png
new file mode 100644
index 0000000000..2e27bec978
--- /dev/null
+++ b/tensorflow/examples/android/res/drawable-xxhdpi/ic_launcher.png
Binary files differ
diff --git a/tensorflow/examples/android/res/layout-land/camera_connection_fragment.xml b/tensorflow/examples/android/res/layout-land/camera_connection_fragment.xml
new file mode 100644
index 0000000000..56b526c84b
--- /dev/null
+++ b/tensorflow/examples/android/res/layout-land/camera_connection_fragment.xml
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="utf-8"?><!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
+ android:layout_width="match_parent"
+ android:layout_height="match_parent">
+
+ <org.tensorflow.demo.AutoFitTextureView
+ android:id="@+id/texture"
+ android:layout_width="wrap_content"
+ android:layout_height="wrap_content"
+ android:layout_alignParentBottom="true"
+ android:layout_alignParentStart="true"
+ android:layout_alignParentTop="true" />
+
+ <org.tensorflow.demo.RecognitionScoreView
+ android:id="@+id/results"
+ android:layout_width="match_parent"
+ android:layout_height="112dp"
+ android:layout_alignParentTop="true" />
+
+</RelativeLayout>
diff --git a/tensorflow/examples/android/res/layout/activity_camera.xml b/tensorflow/examples/android/res/layout/activity_camera.xml
new file mode 100644
index 0000000000..d21be9fc37
--- /dev/null
+++ b/tensorflow/examples/android/res/layout/activity_camera.xml
@@ -0,0 +1,22 @@
+<?xml version="1.0" encoding="utf-8"?><!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
+ xmlns:tools="http://schemas.android.com/tools"
+ android:id="@+id/container"
+ android:layout_width="match_parent"
+ android:layout_height="match_parent"
+ android:background="#000"
+ tools:context="org.tensorflow.demo.CameraActivity" />
diff --git a/tensorflow/examples/android/res/layout/camera_connection_fragment.xml b/tensorflow/examples/android/res/layout/camera_connection_fragment.xml
new file mode 100644
index 0000000000..0e8e52a138
--- /dev/null
+++ b/tensorflow/examples/android/res/layout/camera_connection_fragment.xml
@@ -0,0 +1,32 @@
+<?xml version="1.0" encoding="utf-8"?><!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
+ android:layout_width="match_parent"
+ android:layout_height="match_parent">
+
+ <org.tensorflow.demo.AutoFitTextureView
+ android:id="@+id/texture"
+ android:layout_width="wrap_content"
+ android:layout_height="wrap_content"
+ android:layout_alignParentBottom="true" />
+
+ <org.tensorflow.demo.RecognitionScoreView
+ android:id="@+id/results"
+ android:layout_width="match_parent"
+ android:layout_height="112dp"
+ android:layout_alignParentTop="true" />
+
+</RelativeLayout>
diff --git a/tensorflow/examples/android/res/values-sw600dp/template-dimens.xml b/tensorflow/examples/android/res/values-sw600dp/template-dimens.xml
new file mode 100644
index 0000000000..22074a2bdb
--- /dev/null
+++ b/tensorflow/examples/android/res/values-sw600dp/template-dimens.xml
@@ -0,0 +1,24 @@
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<resources>
+
+ <!-- Semantic definitions -->
+
+ <dimen name="horizontal_page_margin">@dimen/margin_huge</dimen>
+ <dimen name="vertical_page_margin">@dimen/margin_medium</dimen>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-sw600dp/template-styles.xml b/tensorflow/examples/android/res/values-sw600dp/template-styles.xml
new file mode 100644
index 0000000000..03d1974183
--- /dev/null
+++ b/tensorflow/examples/android/res/values-sw600dp/template-styles.xml
@@ -0,0 +1,25 @@
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<resources>
+
+ <style name="Widget.SampleMessage">
+ <item name="android:textAppearance">?android:textAppearanceLarge</item>
+ <item name="android:lineSpacingMultiplier">1.2</item>
+ <item name="android:shadowDy">-6.5</item>
+ </style>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-v11/styles.xml b/tensorflow/examples/android/res/values-v11/styles.xml
new file mode 100644
index 0000000000..c2d1babc12
--- /dev/null
+++ b/tensorflow/examples/android/res/values-v11/styles.xml
@@ -0,0 +1,24 @@
+<?xml version="1.0" encoding="utf-8"?>
+<resources>
+
+ <!--
+ Base application theme for API 11+. This theme completely replaces
+ AppBaseTheme from res/values/styles.xml on API 11+ devices.
+ -->
+ <style name="AppBaseTheme" parent="android:Theme.Holo.Light">
+ <!-- API 11 theme customizations can go here. -->
+ </style>
+
+ <style name="FullscreenTheme" parent="android:Theme.Holo">
+ <item name="android:actionBarStyle">@style/FullscreenActionBarStyle</item>
+ <item name="android:windowActionBarOverlay">true</item>
+ <item name="android:windowBackground">@null</item>
+ <item name="metaButtonBarStyle">?android:attr/buttonBarStyle</item>
+ <item name="metaButtonBarButtonStyle">?android:attr/buttonBarButtonStyle</item>
+ </style>
+
+ <style name="FullscreenActionBarStyle" parent="android:Widget.Holo.ActionBar">
+ <!-- <item name="android:background">@color/black_overlay</item> -->
+ </style>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-v11/template-styles.xml b/tensorflow/examples/android/res/values-v11/template-styles.xml
new file mode 100644
index 0000000000..8c1ea66f28
--- /dev/null
+++ b/tensorflow/examples/android/res/values-v11/template-styles.xml
@@ -0,0 +1,22 @@
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<resources>
+
+ <!-- Activity themes -->
+ <style name="Theme.Base" parent="android:Theme.Holo.Light" />
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-v14/styles.xml b/tensorflow/examples/android/res/values-v14/styles.xml
new file mode 100644
index 0000000000..cc370849c0
--- /dev/null
+++ b/tensorflow/examples/android/res/values-v14/styles.xml
@@ -0,0 +1,12 @@
+<resources>
+
+ <!--
+ Base application theme for API 14+. This theme completely replaces
+ AppBaseTheme from BOTH res/values/styles.xml and
+ res/values-v11/styles.xml on API 14+ devices.
+ -->
+ <style name="AppBaseTheme" parent="android:Theme.Holo.Light.DarkActionBar">
+ <!-- API 14 theme customizations can go here. -->
+ </style>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-v21/base-colors.xml b/tensorflow/examples/android/res/values-v21/base-colors.xml
new file mode 100644
index 0000000000..8b6ec3f85d
--- /dev/null
+++ b/tensorflow/examples/android/res/values-v21/base-colors.xml
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<resources>
+
+
+</resources>
diff --git a/tensorflow/examples/android/res/values-v21/base-template-styles.xml b/tensorflow/examples/android/res/values-v21/base-template-styles.xml
new file mode 100644
index 0000000000..c778e4f98a
--- /dev/null
+++ b/tensorflow/examples/android/res/values-v21/base-template-styles.xml
@@ -0,0 +1,24 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<resources>
+
+ <!-- Activity themes -->
+ <style name="Theme.Base" parent="android:Theme.Material.Light">
+ </style>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values/attrs.xml b/tensorflow/examples/android/res/values/attrs.xml
new file mode 100644
index 0000000000..56e5beae76
--- /dev/null
+++ b/tensorflow/examples/android/res/values/attrs.xml
@@ -0,0 +1,14 @@
+<resources>
+
+ <!--
+ Declare custom theme attributes that allow changing which styles are
+ used for button bars depending on the API level.
+ ?android:attr/buttonBarStyle is new as of API 11 so this is
+ necessary to support previous API levels.
+ -->
+ <declare-styleable name="ButtonBarContainerTheme">
+ <attr name="metaButtonBarStyle" format="reference" />
+ <attr name="metaButtonBarButtonStyle" format="reference" />
+ </declare-styleable>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values/base-strings.xml b/tensorflow/examples/android/res/values/base-strings.xml
new file mode 100644
index 0000000000..e6c3bc7fa0
--- /dev/null
+++ b/tensorflow/examples/android/res/values/base-strings.xml
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<resources>
+ <string name="app_name">Tensorflow Demo</string>
+</resources>
diff --git a/tensorflow/examples/android/res/values/colors.xml b/tensorflow/examples/android/res/values/colors.xml
new file mode 100644
index 0000000000..4b75d2b2bd
--- /dev/null
+++ b/tensorflow/examples/android/res/values/colors.xml
@@ -0,0 +1,19 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!--
+ Copyright 2015 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<resources>
+ <color name="control_background">#cc4285f4</color>
+</resources>
diff --git a/tensorflow/examples/android/res/values/strings.xml b/tensorflow/examples/android/res/values/strings.xml
new file mode 100644
index 0000000000..038c73b3d9
--- /dev/null
+++ b/tensorflow/examples/android/res/values/strings.xml
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="utf-8"?><!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<resources>
+ <string name="description_info">Info</string>
+ <string name="request_permission">This sample needs camera permission.</string>
+ <string name="camera_error">This device doesn\'t support Camera2 API.</string>
+</resources>
diff --git a/tensorflow/examples/android/res/values/styles.xml b/tensorflow/examples/android/res/values/styles.xml
new file mode 100644
index 0000000000..3f3bdfb494
--- /dev/null
+++ b/tensorflow/examples/android/res/values/styles.xml
@@ -0,0 +1,18 @@
+<?xml version="1.0" encoding="utf-8"?><!--
+ Copyright 2014 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<resources>
+ <style name="MaterialTheme" parent="android:Theme.Material.Light.NoActionBar.Fullscreen" />
+</resources>
diff --git a/tensorflow/examples/android/res/values/template-dimens.xml b/tensorflow/examples/android/res/values/template-dimens.xml
new file mode 100644
index 0000000000..39e710b5ca
--- /dev/null
+++ b/tensorflow/examples/android/res/values/template-dimens.xml
@@ -0,0 +1,32 @@
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<resources>
+
+ <!-- Define standard dimensions to comply with Holo-style grids and rhythm. -->
+
+ <dimen name="margin_tiny">4dp</dimen>
+ <dimen name="margin_small">8dp</dimen>
+ <dimen name="margin_medium">16dp</dimen>
+ <dimen name="margin_large">32dp</dimen>
+ <dimen name="margin_huge">64dp</dimen>
+
+ <!-- Semantic definitions -->
+
+ <dimen name="horizontal_page_margin">@dimen/margin_medium</dimen>
+ <dimen name="vertical_page_margin">@dimen/margin_medium</dimen>
+
+</resources>
diff --git a/tensorflow/examples/android/res/values/template-styles.xml b/tensorflow/examples/android/res/values/template-styles.xml
new file mode 100644
index 0000000000..6e7d593dd8
--- /dev/null
+++ b/tensorflow/examples/android/res/values/template-styles.xml
@@ -0,0 +1,42 @@
+<!--
+ Copyright 2013 The Android Open Source Project
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<resources>
+
+ <!-- Activity themes -->
+
+ <style name="Theme.Base" parent="android:Theme.Light" />
+
+ <style name="Theme.Sample" parent="Theme.Base" />
+
+ <style name="AppTheme" parent="Theme.Sample" />
+ <!-- Widget styling -->
+
+ <style name="Widget" />
+
+ <style name="Widget.SampleMessage">
+ <item name="android:textAppearance">?android:textAppearanceMedium</item>
+ <item name="android:lineSpacingMultiplier">1.1</item>
+ </style>
+
+ <style name="Widget.SampleMessageTile">
+ <item name="android:background">@drawable/tile</item>
+ <item name="android:shadowColor">#7F000000</item>
+ <item name="android:shadowDy">-3.5</item>
+ <item name="android:shadowRadius">2</item>
+ </style>
+
+</resources>
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/AutoFitTextureView.java b/tensorflow/examples/android/src/org/tensorflow/demo/AutoFitTextureView.java
new file mode 100644
index 0000000000..011dc64d16
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/AutoFitTextureView.java
@@ -0,0 +1,74 @@
+/*
+ * Copyright 2014 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.tensorflow.demo;
+
+import android.content.Context;
+import android.util.AttributeSet;
+import android.view.TextureView;
+
+/**
+ * A {@link TextureView} that can be adjusted to a specified aspect ratio.
+ */
+public class AutoFitTextureView extends TextureView {
+ private int ratioWidth = 0;
+ private int ratioHeight = 0;
+
+ public AutoFitTextureView(final Context context) {
+ this(context, null);
+ }
+
+ public AutoFitTextureView(final Context context, final AttributeSet attrs) {
+ this(context, attrs, 0);
+ }
+
+ public AutoFitTextureView(final Context context, final AttributeSet attrs, final int defStyle) {
+ super(context, attrs, defStyle);
+ }
+
+ /**
+ * Sets the aspect ratio for this view. The size of the view will be measured based on the ratio
+ * calculated from the parameters. Note that the actual sizes of the parameters don't matter;
+ * calling setAspectRatio(2, 3) and setAspectRatio(4, 6) produce the same result.
+ *
+ * @param width Relative horizontal size
+ * @param height Relative vertical size
+ */
+ public void setAspectRatio(final int width, final int height) {
+ if (width < 0 || height < 0) {
+ throw new IllegalArgumentException("Size cannot be negative.");
+ }
+ ratioWidth = width;
+ ratioHeight = height;
+ requestLayout();
+ }
+
+ @Override
+ protected void onMeasure(final int widthMeasureSpec, final int heightMeasureSpec) {
+ super.onMeasure(widthMeasureSpec, heightMeasureSpec);
+ final int width = MeasureSpec.getSize(widthMeasureSpec);
+ final int height = MeasureSpec.getSize(heightMeasureSpec);
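+ // Fit the requested aspect ratio inside the measured bounds. For example
+ // (illustrative): with setAspectRatio(4, 3) and a 1080x1920 measure spec,
+ // width is the limiting dimension, so the view is measured as 1080x810.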
+ if (0 == ratioWidth || 0 == ratioHeight) {
+ setMeasuredDimension(width, height);
+ } else {
+ if (width < height * ratioWidth / ratioHeight) {
+ setMeasuredDimension(width, width * ratioHeight / ratioWidth);
+ } else {
+ setMeasuredDimension(height * ratioWidth / ratioHeight, height);
+ }
+ }
+ }
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java b/tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java
new file mode 100644
index 0000000000..943dddd254
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2014 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.tensorflow.demo;
+
+import android.app.Activity;
+import android.os.Bundle;
+
+public class CameraActivity extends Activity {
+ @Override
+ protected void onCreate(final Bundle savedInstanceState) {
+ super.onCreate(savedInstanceState);
+ setContentView(R.layout.activity_camera);
+ if (null == savedInstanceState) {
+ getFragmentManager()
+ .beginTransaction()
+ .replace(R.id.container, CameraConnectionFragment.newInstance())
+ .commit();
+ }
+ }
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/CameraConnectionFragment.java b/tensorflow/examples/android/src/org/tensorflow/demo/CameraConnectionFragment.java
new file mode 100644
index 0000000000..d9a696d9bb
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/CameraConnectionFragment.java
@@ -0,0 +1,593 @@
+/*
+ * Copyright 2014 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.tensorflow.demo;
+
+import android.app.Activity;
+import android.app.AlertDialog;
+import android.app.Dialog;
+import android.app.DialogFragment;
+import android.app.Fragment;
+import android.content.Context;
+import android.content.DialogInterface;
+import android.content.res.Configuration;
+import android.graphics.ImageFormat;
+import android.graphics.Matrix;
+import android.graphics.RectF;
+import android.graphics.SurfaceTexture;
+import android.hardware.camera2.CameraAccessException;
+import android.hardware.camera2.CameraCaptureSession;
+import android.hardware.camera2.CameraCharacteristics;
+import android.hardware.camera2.CameraDevice;
+import android.hardware.camera2.CameraManager;
+import android.hardware.camera2.CaptureRequest;
+import android.hardware.camera2.CaptureResult;
+import android.hardware.camera2.TotalCaptureResult;
+import android.hardware.camera2.params.StreamConfigurationMap;
+import android.media.ImageReader;
+import android.os.Bundle;
+import android.os.Handler;
+import android.os.HandlerThread;
+import android.util.Size;
+import android.util.SparseIntArray;
+import android.view.LayoutInflater;
+import android.view.Surface;
+import android.view.TextureView;
+import android.view.View;
+import android.view.ViewGroup;
+import android.widget.Toast;
+
+import org.tensorflow.demo.env.Logger;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.concurrent.Semaphore;
+import java.util.concurrent.TimeUnit;
+
+public class CameraConnectionFragment extends Fragment {
+ private static final Logger LOGGER = new Logger();
+
+ private RecognitionScoreView scoreView;
+
+ /**
+ * Conversion from screen rotation to JPEG orientation.
+ */
+ private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
+ private static final String FRAGMENT_DIALOG = "dialog";
+
+ static {
+ ORIENTATIONS.append(Surface.ROTATION_0, 90);
+ ORIENTATIONS.append(Surface.ROTATION_90, 0);
+ ORIENTATIONS.append(Surface.ROTATION_180, 270);
+ ORIENTATIONS.append(Surface.ROTATION_270, 180);
+ }
+
+ /**
+ * {@link android.view.TextureView.SurfaceTextureListener} handles several lifecycle events on a
+ * {@link TextureView}.
+ */
+ private final TextureView.SurfaceTextureListener surfaceTextureListener =
+ new TextureView.SurfaceTextureListener() {
+ @Override
+ public void onSurfaceTextureAvailable(
+ final SurfaceTexture texture, final int width, final int height) {
+ openCamera(width, height);
+ }
+
+ @Override
+ public void onSurfaceTextureSizeChanged(
+ final SurfaceTexture texture, final int width, final int height) {
+ configureTransform(width, height);
+ }
+
+ @Override
+ public boolean onSurfaceTextureDestroyed(final SurfaceTexture texture) {
+ return true;
+ }
+
+ @Override
+ public void onSurfaceTextureUpdated(final SurfaceTexture texture) {}
+ };
+
+ /**
+ * ID of the current {@link CameraDevice}.
+ */
+ private String cameraId;
+
+ /**
+ * An {@link AutoFitTextureView} for camera preview.
+ */
+ private AutoFitTextureView textureView;
+
+ /**
+ * A {@link CameraCaptureSession } for camera preview.
+ */
+ private CameraCaptureSession captureSession;
+
+ /**
+ * A reference to the opened {@link CameraDevice}.
+ */
+ private CameraDevice cameraDevice;
+
+ /**
+ * The {@link android.util.Size} of camera preview.
+ */
+ private Size previewSize;
+
+ /**
+ * {@link android.hardware.camera2.CameraDevice.StateCallback}
+ * is called when {@link CameraDevice} changes its state.
+ */
+ private final CameraDevice.StateCallback stateCallback =
+ new CameraDevice.StateCallback() {
+ @Override
+ public void onOpened(final CameraDevice cd) {
+ // This method is called when the camera is opened. We start camera preview here.
+ cameraOpenCloseLock.release();
+ cameraDevice = cd;
+ createCameraPreviewSession();
+ }
+
+ @Override
+ public void onDisconnected(final CameraDevice cd) {
+ cameraOpenCloseLock.release();
+ cd.close();
+ cameraDevice = null;
+ }
+
+ @Override
+ public void onError(final CameraDevice cd, final int error) {
+ cameraOpenCloseLock.release();
+ cd.close();
+ cameraDevice = null;
+ final Activity activity = getActivity();
+ if (null != activity) {
+ activity.finish();
+ }
+ }
+ };
+
+ /**
+ * An additional thread for running tasks that shouldn't block the UI.
+ */
+ private HandlerThread backgroundThread;
+
+ /**
+ * A {@link Handler} for running tasks in the background.
+ */
+ private Handler backgroundHandler;
+
+ /**
+ * An {@link ImageReader} that handles still image capture.
+ */
+ private ImageReader imageReader;
+
+ /**
+ * {@link android.hardware.camera2.CaptureRequest.Builder} for the camera preview
+ */
+ private CaptureRequest.Builder previewRequestBuilder;
+
+ /**
+ * {@link CaptureRequest} generated by {@link #previewRequestBuilder}
+ */
+ private CaptureRequest previewRequest;
+
+ /**
+ * A {@link Semaphore} to prevent the app from exiting before closing the camera.
+ */
+ private final Semaphore cameraOpenCloseLock = new Semaphore(1);
+
+ /**
+ * Shows a {@link Toast} on the UI thread.
+ *
+ * @param text The message to show
+ */
+ private void showToast(final String text) {
+ final Activity activity = getActivity();
+ if (activity != null) {
+ activity.runOnUiThread(
+ new Runnable() {
+ @Override
+ public void run() {
+ Toast.makeText(activity, text, Toast.LENGTH_SHORT).show();
+ }
+ });
+ }
+ }
+
+ /**
+ * Given {@code choices} of {@code Size}s supported by a camera, chooses the smallest one whose
+ * width and height are at least as large as the respective requested values, and whose aspect
+ * ratio matches the specified value.
+ *
+ * @param choices The list of sizes that the camera supports for the intended output class
+ * @param width The minimum desired width
+ * @param height The minimum desired height
+ * @param aspectRatio The aspect ratio
+ * @return The optimal {@code Size}, or an arbitrary one if none were big enough
+ */
+ private static Size chooseOptimalSize(
+ final Size[] choices, final int width, final int height, final Size aspectRatio) {
+ // Collect the supported resolutions that are at least as big as the preview Surface
+ final List<Size> bigEnough = new ArrayList<>();
+ for (final Size option : choices) {
+ // TODO(andrewharp): Choose size intelligently.
+ if (option.getHeight() == 320 && option.getWidth() == 480) {
+ LOGGER.i("Adding size: " + option.getWidth() + "x" + option.getHeight());
+ bigEnough.add(option);
+ } else {
+ LOGGER.i("Not adding size: " + option.getWidth() + "x" + option.getHeight());
+ }
+ }
+
+ // Pick the smallest of those, assuming we found any
+ if (bigEnough.size() > 0) {
+ final Size chosenSize = Collections.min(bigEnough, new CompareSizesByArea());
+ LOGGER.i("Chosen size: " + chosenSize.getWidth() + "x" + chosenSize.getHeight());
+ return chosenSize;
+ } else {
+ LOGGER.e("Couldn't find any suitable preview size");
+ return choices[0];
+ }
+ }
+
+ public static CameraConnectionFragment newInstance() {
+ return new CameraConnectionFragment();
+ }
+
+ @Override
+ public View onCreateView(
+ final LayoutInflater inflater, final ViewGroup container, final Bundle savedInstanceState) {
+ return inflater.inflate(R.layout.camera_connection_fragment, container, false);
+ }
+
+ @Override
+ public void onViewCreated(final View view, final Bundle savedInstanceState) {
+ textureView = (AutoFitTextureView) view.findViewById(R.id.texture);
+ scoreView = (RecognitionScoreView) view.findViewById(R.id.results);
+ }
+
+ @Override
+ public void onActivityCreated(final Bundle savedInstanceState) {
+ super.onActivityCreated(savedInstanceState);
+ }
+
+ @Override
+ public void onResume() {
+ super.onResume();
+ startBackgroundThread();
+
+ // When the screen is turned off and turned back on, the SurfaceTexture is already
+ // available, and "onSurfaceTextureAvailable" will not be called. In that case, we can open
+ // a camera and start preview from here (otherwise, we wait until the surface is ready in
+ // the SurfaceTextureListener).
+ if (textureView.isAvailable()) {
+ openCamera(textureView.getWidth(), textureView.getHeight());
+ } else {
+ textureView.setSurfaceTextureListener(surfaceTextureListener);
+ }
+ }
+
+ @Override
+ public void onPause() {
+ closeCamera();
+ stopBackgroundThread();
+ super.onPause();
+ }
+
+ /**
+ * Sets up member variables related to camera.
+ *
+ * @param width The width of available size for camera preview
+ * @param height The height of available size for camera preview
+ */
+ private void setUpCameraOutputs(final int width, final int height) {
+ final Activity activity = getActivity();
+ final CameraManager manager = (CameraManager) activity.getSystemService(Context.CAMERA_SERVICE);
+ try {
+ for (final String cameraId : manager.getCameraIdList()) {
+ final CameraCharacteristics characteristics = manager.getCameraCharacteristics(cameraId);
+
+ // We don't use a front facing camera in this sample.
+ final Integer facing = characteristics.get(CameraCharacteristics.LENS_FACING);
+ if (facing != null && facing == CameraCharacteristics.LENS_FACING_FRONT) {
+ continue;
+ }
+
+ final StreamConfigurationMap map =
+ characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP);
+
+ if (map == null) {
+ continue;
+ }
+
+ // For still image captures, we use the largest available size.
+ final Size largest =
+ Collections.max(
+ Arrays.asList(map.getOutputSizes(ImageFormat.YUV_420_888)),
+ new CompareSizesByArea());
+
+ imageReader =
+ ImageReader.newInstance(
+ largest.getWidth(), largest.getHeight(), ImageFormat.YUV_420_888, /*maxImages*/ 2);
+
+ // Danger, W.R.! Attempting to use too large a preview size could exceed the camera
+ // bus' bandwidth limitation, resulting in gorgeous previews but the storage of
+ // garbage capture data.
+ previewSize =
+ chooseOptimalSize(map.getOutputSizes(SurfaceTexture.class), width, height, largest);
+
+ // We fit the aspect ratio of TextureView to the size of preview we picked.
+ final int orientation = getResources().getConfiguration().orientation;
+ if (orientation == Configuration.ORIENTATION_LANDSCAPE) {
+ textureView.setAspectRatio(previewSize.getWidth(), previewSize.getHeight());
+ } else {
+ textureView.setAspectRatio(previewSize.getHeight(), previewSize.getWidth());
+ }
+
+ CameraConnectionFragment.this.cameraId = cameraId;
+ return;
+ }
+ } catch (final CameraAccessException e) {
+ LOGGER.e(e, "Exception!");
+ } catch (final NullPointerException e) {
+ // Currently an NPE is thrown when the Camera2 API is used but not supported on the
+ // device this code runs on.
+ ErrorDialog.newInstance(getString(R.string.camera_error))
+ .show(getChildFragmentManager(), FRAGMENT_DIALOG);
+ }
+ }
+
+ /**
+ * Opens the camera specified by {@link CameraConnectionFragment#cameraId}.
+ */
+ private void openCamera(final int width, final int height) {
+ setUpCameraOutputs(width, height);
+ configureTransform(width, height);
+ final Activity activity = getActivity();
+ final CameraManager manager = (CameraManager) activity.getSystemService(Context.CAMERA_SERVICE);
+ try {
+ if (!cameraOpenCloseLock.tryAcquire(2500, TimeUnit.MILLISECONDS)) {
+ throw new RuntimeException("Time out waiting to lock camera opening.");
+ }
+ manager.openCamera(cameraId, stateCallback, backgroundHandler);
+ } catch (final CameraAccessException e) {
+ LOGGER.e(e, "Exception!");
+ } catch (final InterruptedException e) {
+ throw new RuntimeException("Interrupted while trying to lock camera opening.", e);
+ }
+ }
+
+ /**
+ * Closes the current {@link CameraDevice}.
+ */
+ private void closeCamera() {
+ try {
+ cameraOpenCloseLock.acquire();
+ if (null != captureSession) {
+ captureSession.close();
+ captureSession = null;
+ }
+ if (null != cameraDevice) {
+ cameraDevice.close();
+ cameraDevice = null;
+ }
+ if (null != imageReader) {
+ imageReader.close();
+ imageReader = null;
+ }
+ } catch (final InterruptedException e) {
+ throw new RuntimeException("Interrupted while trying to lock camera closing.", e);
+ } finally {
+ cameraOpenCloseLock.release();
+ }
+ }
+
+ /**
+ * Starts a background thread and its {@link Handler}.
+ */
+ private void startBackgroundThread() {
+ backgroundThread = new HandlerThread("CameraBackground");
+ backgroundThread.start();
+ backgroundHandler = new Handler(backgroundThread.getLooper());
+ }
+
+ /**
+ * Stops the background thread and its {@link Handler}.
+ */
+ private void stopBackgroundThread() {
+ backgroundThread.quitSafely();
+ try {
+ backgroundThread.join();
+ backgroundThread = null;
+ backgroundHandler = null;
+ } catch (final InterruptedException e) {
+ LOGGER.e(e, "Exception!");
+ }
+ }
+
+ private final TensorflowImageListener tfPreviewListener = new TensorflowImageListener();
+
+ private final CameraCaptureSession.CaptureCallback captureCallback =
+ new CameraCaptureSession.CaptureCallback() {
+ @Override
+ public void onCaptureProgressed(
+ final CameraCaptureSession session,
+ final CaptureRequest request,
+ final CaptureResult partialResult) {}
+
+ @Override
+ public void onCaptureCompleted(
+ final CameraCaptureSession session,
+ final CaptureRequest request,
+ final TotalCaptureResult result) {}
+ };
+
+ /**
+ * Creates a new {@link CameraCaptureSession} for camera preview.
+ */
+ private void createCameraPreviewSession() {
+ try {
+ final SurfaceTexture texture = textureView.getSurfaceTexture();
+ assert texture != null;
+
+ // We configure the size of default buffer to be the size of camera preview we want.
+ texture.setDefaultBufferSize(previewSize.getWidth(), previewSize.getHeight());
+
+ // This is the output Surface we need to start preview.
+ final Surface surface = new Surface(texture);
+
+ // We set up a CaptureRequest.Builder with the output Surface.
+ previewRequestBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
+ previewRequestBuilder.addTarget(surface);
+
+ LOGGER.i("Opening camera preview: " + previewSize.getWidth() + "x" + previewSize.getHeight());
+
+ // Create the reader for the preview frames.
+ final ImageReader previewReader =
+ ImageReader.newInstance(
+ previewSize.getWidth(), previewSize.getHeight(), ImageFormat.YUV_420_888, 2);
+
+ previewReader.setOnImageAvailableListener(tfPreviewListener, backgroundHandler);
+ previewRequestBuilder.addTarget(previewReader.getSurface());
+
+ // Here, we create a CameraCaptureSession for camera preview.
+ cameraDevice.createCaptureSession(
+ Arrays.asList(surface, imageReader.getSurface(), previewReader.getSurface()),
+ new CameraCaptureSession.StateCallback() {
+
+ @Override
+ public void onConfigured(final CameraCaptureSession cameraCaptureSession) {
+ // The camera is already closed
+ if (null == cameraDevice) {
+ return;
+ }
+
+ // When the session is ready, we start displaying the preview.
+ captureSession = cameraCaptureSession;
+ try {
+ // Auto focus should be continuous for camera preview.
+ previewRequestBuilder.set(
+ CaptureRequest.CONTROL_AF_MODE,
+ CaptureRequest.CONTROL_AF_MODE_CONTINUOUS_PICTURE);
+ // Flash is automatically enabled when necessary.
+ previewRequestBuilder.set(
+ CaptureRequest.CONTROL_AE_MODE, CaptureRequest.CONTROL_AE_MODE_ON_AUTO_FLASH);
+
+ // Finally, we start displaying the camera preview.
+ previewRequest = previewRequestBuilder.build();
+ captureSession.setRepeatingRequest(
+ previewRequest, captureCallback, backgroundHandler);
+ } catch (final CameraAccessException e) {
+ LOGGER.e(e, "Exception!");
+ }
+ }
+
+ @Override
+ public void onConfigureFailed(final CameraCaptureSession cameraCaptureSession) {
+ showToast("Failed");
+ }
+ },
+ null);
+ } catch (final CameraAccessException e) {
+ LOGGER.e(e, "Exception!");
+ }
+
+ LOGGER.i("Getting assets.");
+ tfPreviewListener.initialize(getActivity().getAssets(), scoreView);
+ LOGGER.i("Tensorflow initialized.");
+ }
+
+ /**
+ * Configures the necessary {@link android.graphics.Matrix} transformation to `textureView`.
+ * This method should be called after the camera preview size is determined in
+ * setUpCameraOutputs and also after the size of `textureView` is fixed.
+ *
+ * @param viewWidth The width of `textureView`
+ * @param viewHeight The height of `textureView`
+ */
+ private void configureTransform(final int viewWidth, final int viewHeight) {
+ final Activity activity = getActivity();
+ if (null == textureView || null == previewSize || null == activity) {
+ return;
+ }
+ final int rotation = activity.getWindowManager().getDefaultDisplay().getRotation();
+ final Matrix matrix = new Matrix();
+ final RectF viewRect = new RectF(0, 0, viewWidth, viewHeight);
+ final RectF bufferRect = new RectF(0, 0, previewSize.getHeight(), previewSize.getWidth());
+ final float centerX = viewRect.centerX();
+ final float centerY = viewRect.centerY();
+ if (Surface.ROTATION_90 == rotation || Surface.ROTATION_270 == rotation) {
+ bufferRect.offset(centerX - bufferRect.centerX(), centerY - bufferRect.centerY());
+ matrix.setRectToRect(viewRect, bufferRect, Matrix.ScaleToFit.FILL);
+ final float scale =
+ Math.max(
+ (float) viewHeight / previewSize.getHeight(),
+ (float) viewWidth / previewSize.getWidth());
+ matrix.postScale(scale, scale, centerX, centerY);
+ matrix.postRotate(90 * (rotation - 2), centerX, centerY);
+ } else if (Surface.ROTATION_180 == rotation) {
+ matrix.postRotate(180, centerX, centerY);
+ }
+ textureView.setTransform(matrix);
+ }
+
+ /**
+ * Compares two {@code Size}s based on their areas.
+ */
+ static class CompareSizesByArea implements Comparator<Size> {
+ @Override
+ public int compare(final Size lhs, final Size rhs) {
+ // We cast here to ensure the multiplications won't overflow
+ return Long.signum(
+ (long) lhs.getWidth() * lhs.getHeight() - (long) rhs.getWidth() * rhs.getHeight());
+ }
+ }
+
+ /**
+ * Shows an error message dialog.
+ */
+ public static class ErrorDialog extends DialogFragment {
+ private static final String ARG_MESSAGE = "message";
+
+ public static ErrorDialog newInstance(final String message) {
+ final ErrorDialog dialog = new ErrorDialog();
+ final Bundle args = new Bundle();
+ args.putString(ARG_MESSAGE, message);
+ dialog.setArguments(args);
+ return dialog;
+ }
+
+ @Override
+ public Dialog onCreateDialog(final Bundle savedInstanceState) {
+ final Activity activity = getActivity();
+ return new AlertDialog.Builder(activity)
+ .setMessage(getArguments().getString(ARG_MESSAGE))
+ .setPositiveButton(
+ android.R.string.ok,
+ new DialogInterface.OnClickListener() {
+ @Override
+ public void onClick(final DialogInterface dialogInterface, final int i) {
+ activity.finish();
+ }
+ })
+ .create();
+ }
+ }
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/Classifier.java b/tensorflow/examples/android/src/org/tensorflow/demo/Classifier.java
new file mode 100644
index 0000000000..60b3037c7d
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/Classifier.java
@@ -0,0 +1,87 @@
+package org.tensorflow.demo;
+
+import android.graphics.Bitmap;
+import android.graphics.RectF;
+
+import java.util.List;
+
+/**
+ * Generic interface for interacting with different recognition engines.
+ */
+public interface Classifier {
+ /**
+ * An immutable result returned by a Classifier describing what was recognized.
+ */
+ public class Recognition {
+ /**
+ * A unique identifier for what has been recognized. Specific to the class, not the instance of
+ * the object.
+ */
+ private final String id;
+
+ /**
+ * Display name for the recognition.
+ */
+ private final String title;
+
+ /**
+ * A sortable score for how good the recognition is relative to others. Higher should be better.
+ */
+ private final Float confidence;
+
+ /**
+ * Optional location within the source image for the location of the recognized object.
+ */
+ private final RectF location;
+
+ public Recognition(
+ final String id, final String title, final Float confidence, final RectF location) {
+ this.id = id;
+ this.title = title;
+ this.confidence = confidence;
+ this.location = location;
+ }
+
+ public String getId() {
+ return id;
+ }
+
+ public String getTitle() {
+ return title;
+ }
+
+ public Float getConfidence() {
+ return confidence;
+ }
+
+ public RectF getLocation() {
+ return new RectF(location);
+ }
+
+ @Override
+ public String toString() {
+ String resultString = "";
+ if (id != null) {
+ resultString += "[" + id + "] ";
+ }
+
+ if (title != null) {
+ resultString += title + " ";
+ }
+
+ if (confidence != null) {
+ resultString += String.format("(%.1f%%) ", confidence * 100.0f);
+ }
+
+ if (location != null) {
+ resultString += location + " ";
+ }
+
+ return resultString.trim();
+ }
+ }
+
+ List<Recognition> recognizeImage(Bitmap bitmap);
+
+ void close();
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/RecognitionScoreView.java b/tensorflow/examples/android/src/org/tensorflow/demo/RecognitionScoreView.java
new file mode 100644
index 0000000000..961b492a8d
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/RecognitionScoreView.java
@@ -0,0 +1,53 @@
+package org.tensorflow.demo;
+
+import android.content.Context;
+import android.graphics.Canvas;
+import android.graphics.Paint;
+import android.util.AttributeSet;
+import android.util.TypedValue;
+import android.view.View;
+
+import org.tensorflow.demo.Classifier.Recognition;
+
+import java.util.List;
+
+public class RecognitionScoreView extends View {
+ private static final float TEXT_SIZE_DIP = 24;
+ private List<Recognition> results;
+ private final float textSizePx;
+ private final Paint fgPaint;
+ private final Paint bgPaint;
+
+ public RecognitionScoreView(final Context context, final AttributeSet set) {
+ super(context, set);
+
+ textSizePx =
+ TypedValue.applyDimension(
+ TypedValue.COMPLEX_UNIT_DIP, TEXT_SIZE_DIP, getResources().getDisplayMetrics());
+ fgPaint = new Paint();
+ fgPaint.setTextSize(textSizePx);
+
+ bgPaint = new Paint();
+ bgPaint.setColor(0xcc4285f4);
+ }
+
+ public void setResults(final List<Recognition> results) {
+ this.results = results;
+ postInvalidate();
+ }
+
+ @Override
+ public void onDraw(final Canvas canvas) {
+ final int x = 10;
+ int y = (int) (fgPaint.getTextSize() * 1.5f);
+
+ canvas.drawPaint(bgPaint);
+
+ if (results != null) {
+ for (final Recognition recog : results) {
+ canvas.drawText(recog.getTitle() + ": " + recog.getConfidence(), x, y, fgPaint);
+ y += fgPaint.getTextSize() * 1.5f;
+ }
+ }
+ }
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowClassifier.java b/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowClassifier.java
new file mode 100644
index 0000000000..84a7596ecb
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowClassifier.java
@@ -0,0 +1,62 @@
+package org.tensorflow.demo;
+
+import android.content.res.AssetManager;
+import android.graphics.Bitmap;
+import android.util.Log;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.StringTokenizer;
+
+/**
+ * JNI wrapper class for the Tensorflow native code.
+ */
+public class TensorflowClassifier implements Classifier {
+ private static final String TAG = "TensorflowClassifier";
+
+ // jni native methods.
+ public native int initializeTensorflow(
+ AssetManager assetManager,
+ String model,
+ String labels,
+ int numClasses,
+ int inputSize,
+ int imageMean);
+
+ private native String classifyImageBmp(Bitmap bitmap);
+
+ private native String classifyImageRgb(int[] output, int width, int height);
+
+ static {
+ System.loadLibrary("tensorflow_demo");
+ }
+
+ @Override
+ public List<Recognition> recognizeImage(final Bitmap bitmap) {
+ final ArrayList<Recognition> recognitions = new ArrayList<Recognition>();
+ for (final String result : classifyImageBmp(bitmap).split("\n")) {
+ Log.i(TAG, "Parsing [" + result + "]");
+
+ // Clean up the string as needed
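+ // Each line produced by the native code is expected to have the form "<id> <confidence> <title>".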
+ final StringTokenizer st = new StringTokenizer(result);
+ if (!st.hasMoreTokens()) {
+ continue;
+ }
+
+ final String id = st.nextToken();
+ final String confidenceString = st.nextToken();
+ final float confidence = Float.parseFloat(confidenceString);
+
+ final String title =
+ result.substring(id.length() + confidenceString.length() + 2, result.length());
+
+ if (!title.isEmpty()) {
+ recognitions.add(new Recognition(id, title, confidence, null));
+ }
+ }
+ return recognitions;
+ }
+
+ @Override
+ public void close() {}
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowImageListener.java b/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowImageListener.java
new file mode 100644
index 0000000000..940fbc6771
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/TensorflowImageListener.java
@@ -0,0 +1,147 @@
+package org.tensorflow.demo;
+
+import android.content.res.AssetManager;
+import android.graphics.Bitmap;
+import android.graphics.Bitmap.Config;
+import android.graphics.Canvas;
+import android.graphics.Matrix;
+import android.media.Image;
+import android.media.Image.Plane;
+import android.media.ImageReader;
+import android.media.ImageReader.OnImageAvailableListener;
+
+import junit.framework.Assert;
+
+import org.tensorflow.demo.env.ImageUtils;
+import org.tensorflow.demo.env.Logger;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+/**
+ * Class that takes in preview frames and converts the image to Bitmaps to process with Tensorflow.
+ */
+public class TensorflowImageListener implements OnImageAvailableListener {
+ private static final Logger LOGGER = new Logger();
+
+ private static final boolean SAVE_PREVIEW_BITMAP = false;
+
+ private static final String MODEL_FILE = "file:///android_asset/tensorflow_inception_graph.pb";
+ private static final String LABEL_FILE =
+ "file:///android_asset/imagenet_comp_graph_label_strings.txt";
+
+ private static final int NUM_CLASSES = 1001;
+ private static final int INPUT_SIZE = 224;
+ private static final int IMAGE_MEAN = 117;
+
+ // TODO(andrewharp): Get orientation programmatically.
+ private final int screenRotation = 90;
+
+ private final TensorflowClassifier tensorflow = new TensorflowClassifier();
+
+ private int previewWidth = 0;
+ private int previewHeight = 0;
+ private byte[] yuvBytes = null;
+ private int[] rgbBytes = null;
+ private Bitmap rgbFrameBitmap = null;
+ private Bitmap croppedBitmap = null;
+
+ private RecognitionScoreView scoreView;
+
+ public void initialize(final AssetManager assetManager, final RecognitionScoreView scoreView) {
+ tensorflow.initializeTensorflow(
+ assetManager, MODEL_FILE, LABEL_FILE, NUM_CLASSES, INPUT_SIZE, IMAGE_MEAN);
+ this.scoreView = scoreView;
+ }
+
+ private void drawResizedBitmap(final Bitmap src, final Bitmap dst) {
+ Assert.assertEquals(dst.getWidth(), dst.getHeight());
+ final float minDim = Math.min(src.getWidth(), src.getHeight());
+
+ final Matrix matrix = new Matrix();
+
+ // We only want the center square out of the original rectangle.
+ final float translateX = -Math.max(0, (src.getWidth() - minDim) / 2);
+ final float translateY = -Math.max(0, (src.getHeight() - minDim) / 2);
+ matrix.preTranslate(translateX, translateY);
+
+ final float scaleFactor = dst.getHeight() / minDim;
+ matrix.postScale(scaleFactor, scaleFactor);
+
+ // Rotate around the center if necessary.
+ if (screenRotation != 0) {
+ matrix.postTranslate(-dst.getWidth() / 2.0f, -dst.getHeight() / 2.0f);
+ matrix.postRotate(screenRotation);
+ matrix.postTranslate(dst.getWidth() / 2.0f, dst.getHeight() / 2.0f);
+ }
+
+ final Canvas canvas = new Canvas(dst);
+ canvas.drawBitmap(src, matrix, null);
+ }
+
+ @Override
+ public void onImageAvailable(final ImageReader reader) {
+ Image image = null;
+ try {
+ image = reader.acquireLatestImage();
+
+ if (image == null) {
+ return;
+ }
+
+ // Initialize the storage bitmaps once when the resolution is known.
+ if (previewWidth != image.getWidth() || previewHeight != image.getHeight()) {
+ LOGGER.i("Initializing at size %dx%d", previewWidth, previewHeight);
+ previewWidth = image.getWidth();
+ previewHeight = image.getHeight();
+ rgbBytes = new int[previewWidth * previewHeight];
+ yuvBytes = new byte[ImageUtils.getYUVByteSize(previewWidth, previewHeight)];
+ rgbFrameBitmap = Bitmap.createBitmap(previewWidth, previewHeight, Config.ARGB_8888);
+ croppedBitmap = Bitmap.createBitmap(INPUT_SIZE, INPUT_SIZE, Config.ARGB_8888);
+ }
+
+ final Plane[] planes = image.getPlanes();
+ int position = 0;
+
+ // Copy the bytes from the Image into a buffer for easier conversion to RGB.
+ // TODO(andrewharp): It may not be correct to do it this way.
+ final int[] planeOrder = {0, 2};
+ for (int i = 0; i < planeOrder.length; ++i) {
+ final Plane plane = planes[planeOrder[i]];
+ final ByteBuffer buffer = plane.getBuffer();
+
+ buffer.rewind();
+ final int readAmount = buffer.remaining();
+
+ buffer.get(yuvBytes, position, readAmount);
+ position += readAmount;
+ }
+
+ image.close();
+
+ ImageUtils.convertYUV420SPToARGB8888(yuvBytes, rgbBytes, previewWidth, previewHeight, false);
+ } catch (final Exception e) {
+ if (image != null) {
+ image.close();
+ }
+ LOGGER.e(e, "Exception!");
+ return;
+ }
+
+ rgbFrameBitmap.setPixels(rgbBytes, 0, previewWidth, 0, 0, previewWidth, previewHeight);
+ drawResizedBitmap(rgbFrameBitmap, croppedBitmap);
+
+ // For examining the actual TF input.
+ if (SAVE_PREVIEW_BITMAP) {
+ ImageUtils.saveBitmap(croppedBitmap);
+ }
+
+ final List<Classifier.Recognition> results = tensorflow.recognizeImage(croppedBitmap);
+
+ LOGGER.v("%d results", results.size());
+ for (final Classifier.Recognition result : results) {
+ LOGGER.v("Result: " + result.getTitle());
+ }
+ scoreView.setResults(results);
+ }
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/env/ImageUtils.java b/tensorflow/examples/android/src/org/tensorflow/demo/env/ImageUtils.java
new file mode 100644
index 0000000000..78f818f734
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/env/ImageUtils.java
@@ -0,0 +1,113 @@
+package org.tensorflow.demo.env;
+
+import android.graphics.Bitmap;
+import android.os.Environment;
+
+import java.io.File;
+import java.io.FileOutputStream;
+
+/**
+ * Utility class for manipulating images.
+ **/
+public class ImageUtils {
+ @SuppressWarnings("unused")
+ private static final Logger LOGGER = new Logger();
+
+ /**
+ * Utility method to compute the allocated size in bytes of a YUV420SP image
+ * of the given dimensions.
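+ * For example, a 640x480 frame needs 640*480 + 320*240*2 = 460,800 bytes (1.5 bytes per pixel).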
+ */
+ public static int getYUVByteSize(final int width, final int height) {
+ // The luminance plane requires 1 byte per pixel.
+ final int ySize = width * height;
+
+ // The UV plane works on 2x2 blocks, so dimensions with odd size must be rounded up.
+ // Each 2x2 block takes 2 bytes to encode, one each for U and V.
+ final int uvSize = ((width + 1) / 2) * ((height + 1) / 2) * 2;
+
+ return ySize + uvSize;
+ }
+
+ /**
+ * Saves a Bitmap object to disk for analysis.
+ *
+ * @param bitmap The bitmap to save.
+ */
+ public static void saveBitmap(final Bitmap bitmap) {
+ final String root =
+ Environment.getExternalStorageDirectory().getAbsolutePath() + File.separator + "tensorflow";
+ LOGGER.i("Saving %dx%d bitmap to %s.", bitmap.getWidth(), bitmap.getHeight(), root);
+ final File myDir = new File(root);
+
+ if (!myDir.mkdirs()) {
+ LOGGER.i("Make dir failed");
+ }
+
+ final String fname = "preview.png";
+ final File file = new File(myDir, fname);
+ if (file.exists()) {
+ file.delete();
+ }
+ try {
+ final FileOutputStream out = new FileOutputStream(file);
+ bitmap.compress(Bitmap.CompressFormat.PNG, 99, out);
+ out.flush();
+ out.close();
+ } catch (final Exception e) {
+ LOGGER.e(e, "Exception!");
+ }
+ }
+
+ /**
+ * Converts YUV420 semi-planar data to ARGB 8888 data using the supplied width
+ * and height. The input and output must already be allocated and non-null.
+ * For efficiency, no error checking is performed.
+ *
+ * @param input The array of YUV 4:2:0 input data.
+ * @param output A pre-allocated array for the ARGB 8:8:8:8 output data.
+ * @param width The width of the input image.
+ * @param height The height of the input image.
+ * @param halfSize If true, downsample to 50% in each dimension, otherwise not.
+ */
+ public static native void convertYUV420SPToARGB8888(
+ byte[] input, int[] output, int width, int height, boolean halfSize);
+
+ /**
+ * Converts YUV420 semi-planar data to RGB 565 data using the supplied width
+ * and height. The input and output must already be allocated and non-null.
+ * For efficiency, no error checking is performed.
+ *
+ * @param input The array of YUV 4:2:0 input data.
+ * @param output A pre-allocated array for the RGB 5:6:5 output data.
+ * @param width The width of the input image.
+ * @param height The height of the input image.
+ */
+ public static native void convertYUV420SPToRGB565(
+ byte[] input, byte[] output, int width, int height);
+
+ /**
+ * Converts 32-bit ARGB8888 image data to YUV420SP data. This is useful, for
+ * instance, in creating data to feed the classes that rely on raw camera
+ * preview frames.
+ *
+ * @param input An array of input pixels in ARGB8888 format.
+ * @param output A pre-allocated array for the YUV420SP output data.
+ * @param width The width of the input image.
+ * @param height The height of the input image.
+ */
+ public static native void convertARGB8888ToYUV420SP(
+ int[] input, byte[] output, int width, int height);
+
+ /**
+ * Converts 16-bit RGB565 image data to YUV420SP data. This is useful, for
+ * instance, in creating data to feed the classes that rely on raw camera
+ * preview frames.
+ *
+ * @param input An array of input pixels in RGB565 format.
+ * @param output A pre-allocated array for the YUV420SP output data.
+ * @param width The width of the input image.
+ * @param height The height of the input image.
+ */
+ public static native void convertRGB565ToYUV420SP(
+ byte[] input, byte[] output, int width, int height);
+}
diff --git a/tensorflow/examples/android/src/org/tensorflow/demo/env/Logger.java b/tensorflow/examples/android/src/org/tensorflow/demo/env/Logger.java
new file mode 100644
index 0000000000..697c231176
--- /dev/null
+++ b/tensorflow/examples/android/src/org/tensorflow/demo/env/Logger.java
@@ -0,0 +1,176 @@
+package org.tensorflow.demo.env;
+
+import android.util.Log;
+
+import java.util.HashSet;
+import java.util.Set;
+
+/**
+ * Wrapper for the platform log function, allows convenient message prefixing and log disabling.
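+ * When varargs are supplied, the message is run through String.format; for example, logger.i("Took %d ms", elapsed).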
+ */
+public final class Logger {
+ private static final String DEFAULT_TAG = "tensorflow";
+ private static final int DEFAULT_MIN_LOG_LEVEL = Log.DEBUG;
+
+ // Classes to be ignored when examining the stack trace
+ private static final Set<String> IGNORED_CLASS_NAMES;
+
+ static {
+ IGNORED_CLASS_NAMES = new HashSet<String>(3);
+ IGNORED_CLASS_NAMES.add("dalvik.system.VMStack");
+ IGNORED_CLASS_NAMES.add("java.lang.Thread");
+ IGNORED_CLASS_NAMES.add(Logger.class.getCanonicalName());
+ }
+
+ private final String tag;
+ private final String messagePrefix;
+ private int minLogLevel = DEFAULT_MIN_LOG_LEVEL;
+
+ /**
+ * Creates a Logger using the class name as the message prefix.
+ *
+ * @param clazz the simple name of this class is used as the message prefix.
+ */
+ public Logger(final Class<?> clazz) {
+ this(clazz.getSimpleName());
+ }
+
+ /**
+ * Creates a Logger using the specified message prefix.
+ *
+ * @param messagePrefix is prepended to the text of every message.
+ */
+ public Logger(final String messagePrefix) {
+ this(DEFAULT_TAG, messagePrefix);
+ }
+
+ /**
+ * Creates a Logger with a custom tag and a custom message prefix. If the message prefix
+ * is set to <code>null</code>, the caller's class name is used as the prefix.
+ *
+ * @param tag identifies the source of a log message.
+ * @param messagePrefix prepended to every message if non-null; if null, the caller's class name
+ * is used instead.
+ */
+ public Logger(final String tag, final String messagePrefix) {
+ this.tag = tag;
+ final String prefix = messagePrefix == null ? getCallerSimpleName() : messagePrefix;
+ this.messagePrefix = (prefix.length() > 0) ? prefix + ": " : prefix;
+ }
+
+ /**
+ * Creates a Logger using the caller's class name as the message prefix.
+ */
+ public Logger() {
+ this(DEFAULT_TAG, null);
+ }
+
+ /**
+ * Creates a Logger using the caller's class name as the message prefix and the given minimum log level.
+ */
+ public Logger(final int minLogLevel) {
+ this(DEFAULT_TAG, null);
+ this.minLogLevel = minLogLevel;
+ }
+
+ public void setMinLogLevel(final int minLogLevel) {
+ this.minLogLevel = minLogLevel;
+ }
+
+ public boolean isLoggable(final int logLevel) {
+ return logLevel >= minLogLevel || Log.isLoggable(tag, logLevel);
+ }
+
+ /**
+ * Return caller's simple name.
+ *
+ * Android getStackTrace() returns an array that looks like this:
+ * stackTrace[0]: dalvik.system.VMStack
+ * stackTrace[1]: java.lang.Thread
+ * stackTrace[2]: com.google.android.apps.unveil.env.UnveilLogger
+ * stackTrace[3]: com.google.android.apps.unveil.BaseApplication
+ *
+ * This function returns the simple version of the first non-filtered name.
+ *
+ * @return caller's simple name
+ */
+ private static String getCallerSimpleName() {
+ // Get the current callstack so we can pull the class of the caller off of it.
+ final StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
+
+ for (final StackTraceElement elem : stackTrace) {
+ final String className = elem.getClassName();
+ if (!IGNORED_CLASS_NAMES.contains(className)) {
+ // We're only interested in the simple name of the class, not the complete package.
+ final String[] classParts = className.split("\\.");
+ return classParts[classParts.length - 1];
+ }
+ }
+
+ return Logger.class.getSimpleName();
+ }
+
+ private String toMessage(final String format, final Object... args) {
+ return messagePrefix + (args.length > 0 ? String.format(format, args) : format);
+ }
+
+ public void v(final String format, final Object... args) {
+ if (isLoggable(Log.VERBOSE)) {
+ Log.v(tag, toMessage(format, args));
+ }
+ }
+
+ public void v(final Throwable t, final String format, final Object... args) {
+ if (isLoggable(Log.VERBOSE)) {
+ Log.v(tag, toMessage(format, args), t);
+ }
+ }
+
+ public void d(final String format, final Object... args) {
+ if (isLoggable(Log.DEBUG)) {
+ Log.d(tag, toMessage(format, args));
+ }
+ }
+
+ public void d(final Throwable t, final String format, final Object... args) {
+ if (isLoggable(Log.DEBUG)) {
+ Log.d(tag, toMessage(format, args), t);
+ }
+ }
+
+ public void i(final String format, final Object... args) {
+ if (isLoggable(Log.INFO)) {
+ Log.i(tag, toMessage(format, args));
+ }
+ }
+
+ public void i(final Throwable t, final String format, final Object... args) {
+ if (isLoggable(Log.INFO)) {
+ Log.i(tag, toMessage(format, args), t);
+ }
+ }
+
+ public void w(final String format, final Object... args) {
+ if (isLoggable(Log.WARN)) {
+ Log.w(tag, toMessage(format, args));
+ }
+ }
+
+ public void w(final Throwable t, final String format, final Object... args) {
+ if (isLoggable(Log.WARN)) {
+ Log.w(tag, toMessage(format, args), t);
+ }
+ }
+
+ public void e(final String format, final Object... args) {
+ if (isLoggable(Log.ERROR)) {
+ Log.e(tag, toMessage(format, args));
+ }
+ }
+
+ public void e(final Throwable t, final String format, final Object... args) {
+ if (isLoggable(Log.ERROR)) {
+ Log.e(tag, toMessage(format, args), t);
+ }
+ }
+}
diff --git a/tensorflow/g3doc/__init__.py b/tensorflow/g3doc/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/__init__.py
diff --git a/tensorflow/g3doc/api_docs/cc/ClassEnv.md b/tensorflow/g3doc/api_docs/cc/ClassEnv.md
new file mode 100644
index 0000000000..0fdb3d32c7
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassEnv.md
@@ -0,0 +1,146 @@
+#Class tensorflow::Env
+
+An interface used by the tensorflow implementation to access operating system functionality such as the filesystem.
+
+Callers may wish to provide a custom Env object to get fine grain control.
+
+All Env implementations are safe for concurrent access from multiple threads without any external synchronization.
+
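+For illustration, a minimal usage sketch (assuming the public env.h header under tensorflow/core/public/; the file path is hypothetical):
+
+    #include "tensorflow/core/public/env.h"
+
+    tensorflow::Env* env = tensorflow::Env::Default();
+    if (env->FileExists("/tmp/model.ckpt")) {
+      tensorflow::uint64 size = 0;
+      tensorflow::Status s = env->GetFileSize("/tmp/model.ckpt", &size);
+      if (s.ok()) {
+        // ... use size ...
+      }
+    }
+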
+##Member Summary
+
+* [tensorflow::Env::Env](#tensorflow_Env_Env)
+* [virtual tensorflow::Env::~Env](#virtual_tensorflow_Env_Env)
+* [virtual Status tensorflow::Env::NewRandomAccessFile](#virtual_Status_tensorflow_Env_NewRandomAccessFile)
+ * Creates a brand new random access read-only file with the specified name.
+* [virtual Status tensorflow::Env::NewWritableFile](#virtual_Status_tensorflow_Env_NewWritableFile)
+ * Creates an object that writes to a new file with the specified name.
+* [virtual Status tensorflow::Env::NewAppendableFile](#virtual_Status_tensorflow_Env_NewAppendableFile)
+ * Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
+* [virtual bool tensorflow::Env::FileExists](#virtual_bool_tensorflow_Env_FileExists)
+ * Returns true iff the named file exists.
+* [virtual Status tensorflow::Env::GetChildren](#virtual_Status_tensorflow_Env_GetChildren)
+ * Stores in *result the names of the children of the specified directory. The names are relative to &quot;dir&quot;.
+* [virtual Status tensorflow::Env::DeleteFile](#virtual_Status_tensorflow_Env_DeleteFile)
+ * Deletes the named file.
+* [virtual Status tensorflow::Env::CreateDir](#virtual_Status_tensorflow_Env_CreateDir)
+ * Creates the specified directory.
+* [virtual Status tensorflow::Env::DeleteDir](#virtual_Status_tensorflow_Env_DeleteDir)
+ * Deletes the specified directory.
+* [virtual Status tensorflow::Env::GetFileSize](#virtual_Status_tensorflow_Env_GetFileSize)
+ * Stores the size of fname in *file_size.
+* [virtual Status tensorflow::Env::RenameFile](#virtual_Status_tensorflow_Env_RenameFile)
+ * Renames file src to target. If target already exists, it will be replaced.
+* [virtual uint64 tensorflow::Env::NowMicros](#virtual_uint64_tensorflow_Env_NowMicros)
+ * Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
+* [virtual void tensorflow::Env::SleepForMicroseconds](#virtual_void_tensorflow_Env_SleepForMicroseconds)
+ * Sleeps/delays the thread for the prescribed number of micro-seconds.
+* [virtual Thread* tensorflow::Env::StartThread](#virtual_Thread_tensorflow_Env_StartThread)
+ * Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by &quot;name&quot;.
+* [static Env* tensorflow::Env::Default](#static_Env_tensorflow_Env_Default)
+ * Returns a default environment suitable for the current operating system.
+
+##Member Details
+
+#### tensorflow::Env::Env() {#tensorflow_Env_Env}
+
+
+
+
+
+#### virtual tensorflow::Env::~Env() {#virtual_tensorflow_Env_Env}
+
+
+
+
+
+#### virtual Status tensorflow::Env::NewRandomAccessFile(const string &amp;fname, RandomAccessFile **result)=0 {#virtual_Status_tensorflow_Env_NewRandomAccessFile}
+
+Creates a brand new random access read-only file with the specified name.
+
+On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK. If the file does not exist, returns a non-OK status.
+
+The returned file may be concurrently accessed by multiple threads.
+
+#### virtual Status tensorflow::Env::NewWritableFile(const string &amp;fname, WritableFile **result)=0 {#virtual_Status_tensorflow_Env_NewWritableFile}
+
+Creates an object that writes to a new file with the specified name.
+
+Deletes any existing file with the same name and creates a new file. On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK.
+
+The returned file will only be accessed by one thread at a time.
+
+#### virtual Status tensorflow::Env::NewAppendableFile(const string &amp;fname, WritableFile **result)=0 {#virtual_Status_tensorflow_Env_NewAppendableFile}
+
+Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
+
+On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK.
+
+The returned file will only be accessed by one thread at a time.
+
+#### virtual bool tensorflow::Env::FileExists(const string &amp;fname)=0 {#virtual_bool_tensorflow_Env_FileExists}
+
+Returns true iff the named file exists.
+
+
+
+#### virtual Status tensorflow::Env::GetChildren(const string &amp;dir, std::vector&lt; string &gt; *result)=0 {#virtual_Status_tensorflow_Env_GetChildren}
+
+Stores in *result the names of the children of the specified directory. The names are relative to &quot;dir&quot;.
+
+Original contents of *result are dropped.
+
+#### virtual Status tensorflow::Env::DeleteFile(const string &amp;fname)=0 {#virtual_Status_tensorflow_Env_DeleteFile}
+
+Deletes the named file.
+
+
+
+#### virtual Status tensorflow::Env::CreateDir(const string &amp;dirname)=0 {#virtual_Status_tensorflow_Env_CreateDir}
+
+Creates the specified directory.
+
+
+
+#### virtual Status tensorflow::Env::DeleteDir(const string &amp;dirname)=0 {#virtual_Status_tensorflow_Env_DeleteDir}
+
+Deletes the specified directory.
+
+
+
+#### virtual Status tensorflow::Env::GetFileSize(const string &amp;fname, uint64 *file_size)=0 {#virtual_Status_tensorflow_Env_GetFileSize}
+
+Stores the size of fname in *file_size.
+
+
+
+#### virtual Status tensorflow::Env::RenameFile(const string &amp;src, const string &amp;target)=0 {#virtual_Status_tensorflow_Env_RenameFile}
+
+Renames file src to target. If target already exists, it will be replaced.
+
+
+
+#### virtual uint64 tensorflow::Env::NowMicros()=0 {#virtual_uint64_tensorflow_Env_NowMicros}
+
+Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
+
+
+
+#### virtual void tensorflow::Env::SleepForMicroseconds(int micros)=0 {#virtual_void_tensorflow_Env_SleepForMicroseconds}
+
+Sleeps/delays the thread for the prescribed number of micro-seconds.
+
+
+
+#### virtual Thread* tensorflow::Env::StartThread(const ThreadOptions &amp;thread_options, const string &amp;name, std::function&lt; void()&gt; fn) TF_MUST_USE_RESULT=0 {#virtual_Thread_tensorflow_Env_StartThread}
+
+Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by &quot;name&quot;.
+
+Caller takes ownership of the result and must delete it eventually (the deletion will block until fn() stops running).
+
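+A sketch of creating and cleaning up a thread (the thread name and body are illustrative; assumes a default-constructed ThreadOptions):
+
+    tensorflow::Thread* worker = tensorflow::Env::Default()->StartThread(
+        tensorflow::ThreadOptions(), "my_worker", [] {
+          // ... do work ...
+        });
+    // Deleting the Thread blocks until the function has returned.
+    delete worker;
+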
+#### static Env* tensorflow::Env::Default() {#static_Env_tensorflow_Env_Default}
+
+Returns a default environment suitable for the current operating system.
+
+Sophisticated users may wish to provide their own Env implementation instead of relying on this default environment.
+
+The result of Default() belongs to this library and must never be deleted.
diff --git a/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md b/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md
new file mode 100644
index 0000000000..2c6af82113
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassEnvWrapper.md
@@ -0,0 +1,143 @@
+#Class tensorflow::EnvWrapper
+
+An implementation of Env that forwards all calls to another Env .
+
+May be useful to clients who wish to override just part of the functionality of another Env .
+
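+For example, a minimal sketch that overrides only NowMicros (the fake-clock class is illustrative, not part of the library):
+
+    class FakeClockEnv : public tensorflow::EnvWrapper {
+     public:
+      explicit FakeClockEnv(tensorflow::Env* base) : EnvWrapper(base) {}
+      // Report a controllable time; every other call forwards to *base.
+      tensorflow::uint64 NowMicros() override { return now_micros_; }
+      void AdvanceMicros(tensorflow::uint64 micros) { now_micros_ += micros; }
+
+     private:
+      tensorflow::uint64 now_micros_ = 0;
+    };
+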
+##Member Summary
+
+* [tensorflow::EnvWrapper::EnvWrapper](#tensorflow_EnvWrapper_EnvWrapper)
+ * Initializes an EnvWrapper that delegates all calls to *t.
+* [virtual tensorflow::EnvWrapper::~EnvWrapper](#virtual_tensorflow_EnvWrapper_EnvWrapper)
+* [Env* tensorflow::EnvWrapper::target](#Env_tensorflow_EnvWrapper_target)
+ * Returns the target to which this Env forwards all calls.
+* [Status tensorflow::EnvWrapper::NewRandomAccessFile](#Status_tensorflow_EnvWrapper_NewRandomAccessFile)
+ * Creates a brand new random access read-only file with the specified name.
+* [Status tensorflow::EnvWrapper::NewWritableFile](#Status_tensorflow_EnvWrapper_NewWritableFile)
+ * Creates an object that writes to a new file with the specified name.
+* [Status tensorflow::EnvWrapper::NewAppendableFile](#Status_tensorflow_EnvWrapper_NewAppendableFile)
+ * Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
+* [bool tensorflow::EnvWrapper::FileExists](#bool_tensorflow_EnvWrapper_FileExists)
+ * Returns true iff the named file exists.
+* [Status tensorflow::EnvWrapper::GetChildren](#Status_tensorflow_EnvWrapper_GetChildren)
+ * Stores in *result the names of the children of the specified directory. The names are relative to &quot;dir&quot;.
+* [Status tensorflow::EnvWrapper::DeleteFile](#Status_tensorflow_EnvWrapper_DeleteFile)
+ * Deletes the named file.
+* [Status tensorflow::EnvWrapper::CreateDir](#Status_tensorflow_EnvWrapper_CreateDir)
+ * Creates the specified directory.
+* [Status tensorflow::EnvWrapper::DeleteDir](#Status_tensorflow_EnvWrapper_DeleteDir)
+ * Deletes the specified directory.
+* [Status tensorflow::EnvWrapper::GetFileSize](#Status_tensorflow_EnvWrapper_GetFileSize)
+ * Stores the size of fname in *file_size.
+* [Status tensorflow::EnvWrapper::RenameFile](#Status_tensorflow_EnvWrapper_RenameFile)
+ * Renames file src to target. If target already exists, it will be replaced.
+* [uint64 tensorflow::EnvWrapper::NowMicros](#uint64_tensorflow_EnvWrapper_NowMicros)
+ * Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
+* [void tensorflow::EnvWrapper::SleepForMicroseconds](#void_tensorflow_EnvWrapper_SleepForMicroseconds)
+ * Sleeps/delays the thread for the prescribed number of micro-seconds.
+* [Thread* tensorflow::EnvWrapper::StartThread](#Thread_tensorflow_EnvWrapper_StartThread)
+ * Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by &quot;name&quot;.
+
+##Member Details
+
+#### tensorflow::EnvWrapper::EnvWrapper(Env *t) {#tensorflow_EnvWrapper_EnvWrapper}
+
+Initializes an EnvWrapper that delegates all calls to *t.
+
+
+
+#### virtual tensorflow::EnvWrapper::~EnvWrapper() {#virtual_tensorflow_EnvWrapper_EnvWrapper}
+
+
+
+
+
+#### Env* tensorflow::EnvWrapper::target() const {#Env_tensorflow_EnvWrapper_target}
+
+Returns the target to which this Env forwards all calls.
+
+
+
+#### Status tensorflow::EnvWrapper::NewRandomAccessFile(const string &amp;f, RandomAccessFile **r) override {#Status_tensorflow_EnvWrapper_NewRandomAccessFile}
+
+Creates a brand new random access read-only file with the specified name.
+
+On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK. If the file does not exist, returns a non-OK status.
+
+The returned file may be concurrently accessed by multiple threads.
+
+#### Status tensorflow::EnvWrapper::NewWritableFile(const string &amp;f, WritableFile **r) override {#Status_tensorflow_EnvWrapper_NewWritableFile}
+
+Creates an object that writes to a new file with the specified name.
+
+Deletes any existing file with the same name and creates a new file. On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK.
+
+The returned file will only be accessed by one thread at a time.
+
+#### Status tensorflow::EnvWrapper::NewAppendableFile(const string &amp;f, WritableFile **r) override {#Status_tensorflow_EnvWrapper_NewAppendableFile}
+
+Creates an object that either appends to an existing file, or writes to a new file (if the file does not exist to begin with).
+
+On success, stores a pointer to the new file in *result and returns OK. On failure stores NULL in *result and returns non-OK.
+
+The returned file will only be accessed by one thread at a time.
+
+#### bool tensorflow::EnvWrapper::FileExists(const string &amp;f) override {#bool_tensorflow_EnvWrapper_FileExists}
+
+Returns true iff the named file exists.
+
+
+
+#### Status tensorflow::EnvWrapper::GetChildren(const string &amp;dir, std::vector&lt; string &gt; *r) override {#Status_tensorflow_EnvWrapper_GetChildren}
+
+Stores in *result the names of the children of the specified directory. The names are relative to &quot;dir&quot;.
+
+Original contents of *result are dropped.
+
+#### Status tensorflow::EnvWrapper::DeleteFile(const string &amp;f) override {#Status_tensorflow_EnvWrapper_DeleteFile}
+
+Deletes the named file.
+
+
+
+#### Status tensorflow::EnvWrapper::CreateDir(const string &amp;d) override {#Status_tensorflow_EnvWrapper_CreateDir}
+
+Creates the specified directory.
+
+
+
+#### Status tensorflow::EnvWrapper::DeleteDir(const string &amp;d) override {#Status_tensorflow_EnvWrapper_DeleteDir}
+
+Deletes the specified directory.
+
+
+
+#### Status tensorflow::EnvWrapper::GetFileSize(const string &amp;f, uint64 *s) override {#Status_tensorflow_EnvWrapper_GetFileSize}
+
+Stores the size of fname in *file_size.
+
+
+
+#### Status tensorflow::EnvWrapper::RenameFile(const string &amp;s, const string &amp;t) override {#Status_tensorflow_EnvWrapper_RenameFile}
+
+Renames file src to target. If target already exists, it will be replaced.
+
+
+
+#### uint64 tensorflow::EnvWrapper::NowMicros() override {#uint64_tensorflow_EnvWrapper_NowMicros}
+
+Returns the number of micro-seconds since some fixed point in time. Only useful for computing deltas of time.
+
+
+
+#### void tensorflow::EnvWrapper::SleepForMicroseconds(int micros) override {#void_tensorflow_EnvWrapper_SleepForMicroseconds}
+
+Sleeps/delays the thread for the prescribed number of micro-seconds.
+
+
+
+#### Thread* tensorflow::EnvWrapper::StartThread(const ThreadOptions &amp;thread_options, const string &amp;name, std::function&lt; void()&gt; fn) override {#Thread_tensorflow_EnvWrapper_StartThread}
+
+Returns a new thread that is running fn() and is identified (for debugging/performance-analysis) by &quot;name&quot;.
+
+Caller takes ownership of the result and must delete it eventually (the deletion will block until fn() stops running).
diff --git a/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md b/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md
new file mode 100644
index 0000000000..3538c2ca11
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassRandomAccessFile.md
@@ -0,0 +1,38 @@
+#Class tensorflow::RandomAccessFile
+
+A file abstraction for randomly reading the contents of a file.
+
+
+
+##Member Summary
+
+* [tensorflow::RandomAccessFile::RandomAccessFile](#tensorflow_RandomAccessFile_RandomAccessFile)
+* [virtual tensorflow::RandomAccessFile::~RandomAccessFile](#virtual_tensorflow_RandomAccessFile_RandomAccessFile)
+* [virtual Status tensorflow::RandomAccessFile::Read](#virtual_Status_tensorflow_RandomAccessFile_Read)
+ * Reads up to &quot;n&quot; bytes from the file starting at &quot;offset&quot;.
+
+##Member Details
+
+#### tensorflow::RandomAccessFile::RandomAccessFile() {#tensorflow_RandomAccessFile_RandomAccessFile}
+
+
+
+
+
+#### virtual tensorflow::RandomAccessFile::~RandomAccessFile() {#virtual_tensorflow_RandomAccessFile_RandomAccessFile}
+
+
+
+
+
+#### virtual Status tensorflow::RandomAccessFile::Read(uint64 offset, size_t n, StringPiece *result, char *scratch) const =0 {#virtual_Status_tensorflow_RandomAccessFile_Read}
+
+Reads up to &quot;n&quot; bytes from the file starting at &quot;offset&quot;.
+
+&quot;scratch[0..n-1]&quot; may be written by this routine. Sets &quot;*result&quot; to the data that was read (including if fewer than &quot;n&quot; bytes were successfully read). May set &quot;*result&quot; to point at data in &quot;scratch[0..n-1]&quot;, so &quot;scratch[0..n-1]&quot; must be live when &quot;*result&quot; is used.
+
+On OK returned status: &quot;n&quot; bytes have been stored in &quot;*result&quot;. On non-OK returned status: [0..n] bytes have been stored in &quot;*result&quot;.
+
+Returns OUT_OF_RANGE if fewer than n bytes were stored in &quot;*result&quot; because of EOF.
+
+Safe for concurrent use by multiple threads.
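+
+For illustration, a sketch of reading the first bytes of a file obtained from Env::NewRandomAccessFile (assumes the caller owns and deletes the file; the file name is hypothetical):
+
+    tensorflow::RandomAccessFile* file = nullptr;
+    tensorflow::Status s = tensorflow::Env::Default()->NewRandomAccessFile("/tmp/data.bin", &file);
+    if (s.ok()) {
+      char scratch[64];
+      tensorflow::StringPiece data;
+      s = file->Read(0 /* offset */, sizeof(scratch), &data, scratch);
+      // "data" may point into "scratch", so use it while "scratch" is still live.
+      delete file;
+    }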
diff --git a/tensorflow/g3doc/api_docs/cc/ClassSession.md b/tensorflow/g3doc/api_docs/cc/ClassSession.md
new file mode 100644
index 0000000000..f2f9d8f762
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassSession.md
@@ -0,0 +1,88 @@
+#Class tensorflow::Session
+
+A Session instance lets a caller drive a TensorFlow graph computation.
+
+When a Session is created with a given target, a new Session object is bound to the universe of resources specified by that target. Those resources are available to this session to perform computation described in the GraphDef. After extending the session with a graph, the caller uses the Run() API to perform the computation and potentially fetch outputs as Tensors.
+
+Example:
+
+    tensorflow::GraphDef graph;
+    // ... Create or load graph into 'graph'.
+
+    // This example uses the default options which connects
+    // to a local runtime.
+    tensorflow::SessionOptions options;
+    std::unique_ptr<tensorflow::Session> session(tensorflow::NewSession(options));
+
+    // Create the session with this graph.
+    tensorflow::Status s = session->Create(graph);
+    if (!s.ok()) { ... }
+
+    // Run the graph and fetch the first output of the "output"
+    // operation, and also run to but do not return anything
+    // for the "update_state" operation.
+    std::vector<tensorflow::Tensor> outputs;
+    s = session->Run({}, {"output:0"}, {"update_state"}, &outputs);
+    if (!s.ok()) { ... }
+
+    // Map the output as a flattened float tensor, and do something
+    // with it.
+    auto output_tensor = outputs[0].flat<float>();
+    if (output_tensor(0) > 0.5) { ... }
+
+    // Close the session to release the resources associated with
+    // this session.
+    session->Close();
+
+A Session allows concurrent calls to Run() , though a Session must be created / extended by a single thread.
+
+Only one thread must call Close() , and Close() must only be called after all other calls to Run() have returned.
+
+##Member Summary
+
+* [virtual Status tensorflow::Session::Create](#virtual_Status_tensorflow_Session_Create)
+ * Create the graph to be used for the session.
+* [virtual Status tensorflow::Session::Extend](#virtual_Status_tensorflow_Session_Extend)
+ * Adds operations to the graph that is already registered with the Session .
+* [virtual Status tensorflow::Session::Run](#virtual_Status_tensorflow_Session_Run)
+ * Runs the graph with the provided input tensors and fills &apos;outputs&apos; for the endpoints specified in &apos;output_tensor_names&apos;. Runs to but does not return Tensors for the nodes in &apos;target_node_names&apos;.
+* [virtual Status tensorflow::Session::Close](#virtual_Status_tensorflow_Session_Close)
+ * Closes this session.
+* [virtual tensorflow::Session::~Session](#virtual_tensorflow_Session_Session)
+
+##Member Details
+
+#### virtual Status tensorflow::Session::Create(const GraphDef &amp;graph)=0 {#virtual_Status_tensorflow_Session_Create}
+
+Create the graph to be used for the session.
+
+Returns an error if this session has already been created with a graph. To re-use the session with a different graph, the caller must Close() the session first.
+
+#### virtual Status tensorflow::Session::Extend(const GraphDef &amp;graph)=0 {#virtual_Status_tensorflow_Session_Extend}
+
+Adds operations to the graph that is already registered with the Session .
+
+The names of new operations in &quot;graph&quot; must not exist in the graph that is already registered.
+
+#### virtual Status tensorflow::Session::Run(const std::vector&lt; std::pair&lt; string, Tensor &gt; &gt; &amp;inputs, const std::vector&lt; string &gt; &amp;output_tensor_names, const std::vector&lt; string &gt; &amp;target_node_names, std::vector&lt; Tensor &gt; *outputs)=0 {#virtual_Status_tensorflow_Session_Run}
+
+Runs the graph with the provided input tensors and fills &apos;outputs&apos; for the endpoints specified in &apos;output_tensor_names&apos;. Runs to but does not return Tensors for the nodes in &apos;target_node_names&apos;.
+
+The order of tensors in &apos;outputs&apos; will match the order provided by &apos;output_tensor_names&apos;.
+
+If Run returns OK(), then outputs-&gt;size() will be equal to output_tensor_names.size(). If Run does not return OK(), the state of outputs is undefined.
+
+REQUIRES: The name of each Tensor of the input or output must match a &quot;Tensor endpoint&quot; in the GraphDef passed to Create() .
+
+REQUIRES: outputs is not nullptr if output_tensor_names is non-empty.
+
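+Continuing the example above, a sketch of feeding one input and fetching one output (the endpoint names "x:0" and "y:0" are illustrative):
+
+    tensorflow::Tensor x(tensorflow::DT_FLOAT, tensorflow::TensorShape({2}));
+    x.vec<float>()(0) = 1.0f;
+    x.vec<float>()(1) = 2.0f;
+
+    std::vector<std::pair<std::string, tensorflow::Tensor>> inputs = {{"x:0", x}};
+    std::vector<tensorflow::Tensor> outputs;
+    tensorflow::Status s = session->Run(inputs, {"y:0"}, {}, &outputs);
+    if (s.ok()) {
+      auto y = outputs[0].vec<float>();  // y(0) and y(1) hold the fetched values.
+    }
+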
+#### virtual Status tensorflow::Session::Close()=0 {#virtual_Status_tensorflow_Session_Close}
+
+Closes this session.
+
+Closing a session releases the resources used by this session on the TensorFlow runtime (specified during session creation by the &apos; SessionOptions::target &apos; field).
+
+#### virtual tensorflow::Session::~Session() {#virtual_tensorflow_Session_Session}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassStatus.md b/tensorflow/g3doc/api_docs/cc/ClassStatus.md
new file mode 100644
index 0000000000..d5ef48b14d
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassStatus.md
@@ -0,0 +1,107 @@
+#Class tensorflow::Status
+
+
+
+
+
+##Member Summary
+
+* [tensorflow::Status::Status](#tensorflow_Status_Status)
+ * Create a success status.
+* [tensorflow::Status::~Status](#tensorflow_Status_Status)
+* [tensorflow::Status::Status](#tensorflow_Status_Status)
+ * Create a status with the specified error code and msg as a human-readable string containing more detailed information.
+* [tensorflow::Status::Status](#tensorflow_Status_Status)
+ * Copy the specified status.
+* [void tensorflow::Status::operator=](#void_tensorflow_Status_operator_)
+* [bool tensorflow::Status::ok](#bool_tensorflow_Status_ok)
+ * Returns true iff the status indicates success.
+* [tensorflow::error::Code tensorflow::Status::code](#tensorflow_error_Code_tensorflow_Status_code)
+* [const string&amp; tensorflow::Status::error_message](#const_string_amp_tensorflow_Status_error_message)
+* [bool tensorflow::Status::operator==](#bool_tensorflow_Status_operator_)
+* [bool tensorflow::Status::operator!=](#bool_tensorflow_Status_operator_)
+* [void tensorflow::Status::Update](#void_tensorflow_Status_Update)
+ * If &quot;ok()&quot;, stores &quot;new_status&quot; into *this. If &quot;!ok()&quot;, preserves the current status, but may augment with additional information about &quot;new_status&quot;.
+* [string tensorflow::Status::ToString](#string_tensorflow_Status_ToString)
+ * Return a string representation of this status suitable for printing. Returns the string &quot;OK&quot; for success.
+* [static Status tensorflow::Status::OK](#static_Status_tensorflow_Status_OK)
+
+##Member Details
+
+#### tensorflow::Status::Status() {#tensorflow_Status_Status}
+
+Create a success status.
+
+
+
+#### tensorflow::Status::~Status() {#tensorflow_Status_Status}
+
+
+
+
+
+#### tensorflow::Status::Status(tensorflow::error::Code code, tensorflow::StringPiece msg) {#tensorflow_Status_Status}
+
+Create a status with the specified error code and msg as a human-readable string containing more detailed information.
+
+
+
+#### tensorflow::Status::Status(const Status &amp;s) {#tensorflow_Status_Status}
+
+Copy the specified status.
+
+
+
+#### void tensorflow::Status::operator=(const Status &amp;s) {#void_tensorflow_Status_operator_}
+
+
+
+
+
+#### bool tensorflow::Status::ok() const {#bool_tensorflow_Status_ok}
+
+Returns true iff the status indicates success.
+
+
+
+#### tensorflow::error::Code tensorflow::Status::code() const {#tensorflow_error_Code_tensorflow_Status_code}
+
+
+
+
+
+#### const string&amp; tensorflow::Status::error_message() const {#const_string_amp_tensorflow_Status_error_message}
+
+
+
+
+
+#### bool tensorflow::Status::operator==(const Status &amp;x) const {#bool_tensorflow_Status_operator_}
+
+
+
+
+
+#### bool tensorflow::Status::operator!=(const Status &amp;x) const {#bool_tensorflow_Status_operator_}
+
+
+
+
+
+#### void tensorflow::Status::Update(const Status &amp;new_status) {#void_tensorflow_Status_Update}
+
+If &quot;ok()&quot;, stores &quot;new_status&quot; into *this. If &quot;!ok()&quot;, preserves the current status, but may augment with additional information about &quot;new_status&quot;.
+
+Convenient way of keeping track of the first error encountered. Instead of `if (overall_status.ok()) overall_status = new_status;`, use `overall_status.Update(new_status);`.
+
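+For example, a sketch of accumulating the first failure across several operations (the `env` pointer and `files_to_delete` list are assumed to exist):
+
+    tensorflow::Status overall_status = tensorflow::Status::OK();
+    for (const std::string& fname : files_to_delete) {
+      // A later OK never clears an earlier error.
+      overall_status.Update(env->DeleteFile(fname));
+    }
+    if (!overall_status.ok()) {
+      // Report overall_status.ToString(), which describes the first error.
+    }
+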
+#### string tensorflow::Status::ToString() const {#string_tensorflow_Status_ToString}
+
+Return a string representation of this status suitable for printing. Returns the string &quot;OK&quot; for success.
+
+
+
+#### static Status tensorflow::Status::OK() {#static_Status_tensorflow_Status_OK}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensor.md b/tensorflow/g3doc/api_docs/cc/ClassTensor.md
new file mode 100644
index 0000000000..7ecc7688f3
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensor.md
@@ -0,0 +1,361 @@
+#Class tensorflow::Tensor
+
+Represents an n-dimensional array of values.
+
+
+
+##Member Summary
+
+* [tensorflow::Tensor::Tensor](#tensorflow_Tensor_Tensor)
+ * Default Tensor constructor. Creates a 1-dimension, 0-element float tensor.
+* [tensorflow::Tensor::Tensor](#tensorflow_Tensor_Tensor)
+ * Creates a Tensor of the given datatype and shape.
+* [tensorflow::Tensor::Tensor](#tensorflow_Tensor_Tensor)
+ * Creates a tensor with the input datatype and shape, using the allocator &apos;a&apos; to allocate the underlying buffer.
+* [tensorflow::Tensor::Tensor](#tensorflow_Tensor_Tensor)
+ * Creates an uninitialized Tensor of the given data type.
+* [tensorflow::Tensor::Tensor](#tensorflow_Tensor_Tensor)
+* [tensorflow::Tensor::~Tensor](#tensorflow_Tensor_Tensor)
+ * Copy constructor.
+* [DataType tensorflow::Tensor::dtype](#DataType_tensorflow_Tensor_dtype)
+ * Returns the data type.
+* [const TensorShape&amp; tensorflow::Tensor::shape](#const_TensorShape_amp_tensorflow_Tensor_shape)
+ * Returns the shape of the tensor.
+* [int tensorflow::Tensor::dims](#int_tensorflow_Tensor_dims)
+ * Convenience accessor for the tensor shape.
+* [int64 tensorflow::Tensor::dim_size](#int64_tensorflow_Tensor_dim_size)
+ * Convenience accessor for the tensor shape.
+* [int64 tensorflow::Tensor::NumElements](#int64_tensorflow_Tensor_NumElements)
+ * Convenience accessor for the tensor shape.
+* [bool tensorflow::Tensor::IsSameSize](#bool_tensorflow_Tensor_IsSameSize)
+* [bool tensorflow::Tensor::IsInitialized](#bool_tensorflow_Tensor_IsInitialized)
+ * Has this Tensor been initialized?
+* [size_t tensorflow::Tensor::TotalBytes](#size_t_tensorflow_Tensor_TotalBytes)
+ * Returns the estimated memory usage of this tensor.
+* [Tensor&amp; tensorflow::Tensor::operator=](#Tensor_amp_tensorflow_Tensor_operator_)
+ * Assign operator. This tensor shares other&apos;s underlying storage.
+* [bool tensorflow::Tensor::CopyFrom](#bool_tensorflow_Tensor_CopyFrom)
+ * Copy the other tensor into this tensor and reshape it.
+* [Tensor tensorflow::Tensor::Slice](#Tensor_tensorflow_Tensor_Slice)
+ * Slice this tensor along the 1st dimension.
+* [bool tensorflow::Tensor::FromProto](#bool_tensorflow_Tensor_FromProto)
+ * Parse "other&apos; and construct the tensor.
+* [bool tensorflow::Tensor::FromProto](#bool_tensorflow_Tensor_FromProto)
+* [void tensorflow::Tensor::AsProtoField](#void_tensorflow_Tensor_AsProtoField)
+ * Fills in &quot;proto&quot; with &quot;*this&quot; tensor&apos;s content.
+* [void tensorflow::Tensor::AsProtoTensorContent](#void_tensorflow_Tensor_AsProtoTensorContent)
+* [TTypes&lt;T&gt;::Vec tensorflow::Tensor::vec](#TTypes_lt_T_gt_Vec_tensorflow_Tensor_vec)
+ * Return the Tensor data as an Eigen::Tensor with the type and sizes of this Tensor .
+* [TTypes&lt;T&gt;::Matrix tensorflow::Tensor::matrix](#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_matrix)
+* [TTypes&lt; T, NDIMS &gt;::Tensor tensorflow::Tensor::tensor](#TTypes_lt_T_NDIMS_gt_Tensor_tensorflow_Tensor_tensor)
+* [TTypes&lt;T&gt;::Flat tensorflow::Tensor::flat](#TTypes_lt_T_gt_Flat_tensorflow_Tensor_flat)
+ * Return the Tensor data as an Eigen::Tensor of the data type and a specified shape.
+* [TTypes&lt;T&gt;::UnalignedFlat tensorflow::Tensor::unaligned_flat](#TTypes_lt_T_gt_UnalignedFlat_tensorflow_Tensor_unaligned_flat)
+* [TTypes&lt;T&gt;::Matrix tensorflow::Tensor::flat_inner_dims](#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_flat_inner_dims)
+* [TTypes&lt;T&gt;::Matrix tensorflow::Tensor::flat_outer_dims](#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_flat_outer_dims)
+* [TTypes&lt; T, NDIMS &gt;::Tensor tensorflow::Tensor::shaped](#TTypes_lt_T_NDIMS_gt_Tensor_tensorflow_Tensor_shaped)
+* [TTypes&lt; T, NDIMS &gt;::UnalignedTensor tensorflow::Tensor::unaligned_shaped](#TTypes_lt_T_NDIMS_gt_UnalignedTensor_tensorflow_Tensor_unaligned_shaped)
+* [TTypes&lt; T &gt;::Scalar tensorflow::Tensor::scalar](#TTypes_lt_T_gt_Scalar_tensorflow_Tensor_scalar)
+ * Return the Tensor data as a Tensor Map of fixed size 1: TensorMap&lt;TensorFixedSize&lt;T, 1&gt;&gt;.
+* [TTypes&lt;T&gt;::ConstVec tensorflow::Tensor::vec](#TTypes_lt_T_gt_ConstVec_tensorflow_Tensor_vec)
+ * Const versions of all the methods above.
+* [TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::matrix](#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_matrix)
+* [TTypes&lt; T, NDIMS &gt;::ConstTensor tensorflow::Tensor::tensor](#TTypes_lt_T_NDIMS_gt_ConstTensor_tensorflow_Tensor_tensor)
+* [TTypes&lt;T&gt;::ConstFlat tensorflow::Tensor::flat](#TTypes_lt_T_gt_ConstFlat_tensorflow_Tensor_flat)
+* [TTypes&lt;T&gt;::ConstUnalignedFlat tensorflow::Tensor::unaligned_flat](#TTypes_lt_T_gt_ConstUnalignedFlat_tensorflow_Tensor_unaligned_flat)
+* [TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::flat_inner_dims](#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_flat_inner_dims)
+* [TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::flat_outer_dims](#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_flat_outer_dims)
+* [TTypes&lt; T, NDIMS &gt;::ConstTensor tensorflow::Tensor::shaped](#TTypes_lt_T_NDIMS_gt_ConstTensor_tensorflow_Tensor_shaped)
+* [TTypes&lt; T, NDIMS &gt;::ConstUnalignedTensor tensorflow::Tensor::unaligned_shaped](#TTypes_lt_T_NDIMS_gt_ConstUnalignedTensor_tensorflow_Tensor_unaligned_shaped)
+* [TTypes&lt; T &gt;::ConstScalar tensorflow::Tensor::scalar](#TTypes_lt_T_gt_ConstScalar_tensorflow_Tensor_scalar)
+* [string tensorflow::Tensor::SummarizeValue](#string_tensorflow_Tensor_SummarizeValue)
+ * Render the first max_entries values in *this into a string.
+* [string tensorflow::Tensor::DebugString](#string_tensorflow_Tensor_DebugString)
+ * A human-readable summary of the Tensor suitable for debugging.
+* [void tensorflow::Tensor::FillDescription](#void_tensorflow_Tensor_FillDescription)
+* [StringPiece tensorflow::Tensor::tensor_data](#StringPiece_tensorflow_Tensor_tensor_data)
+ * Returns a StringPiece mapping the current tensor&apos;s buffer.
+
+##Member Details
+
+#### tensorflow::Tensor::Tensor() {#tensorflow_Tensor_Tensor}
+
+Default Tensor constructor. Creates a 1-dimension, 0-element float tensor.
+
+
+
+#### tensorflow::Tensor::Tensor(DataType type, const TensorShape &amp;shape) {#tensorflow_Tensor_Tensor}
+
+Creates a Tensor of the given datatype and shape.
+
+The underlying buffer is allocated using a CPUAllocator.
+
+#### tensorflow::Tensor::Tensor(Allocator *a, DataType type, const TensorShape &amp;shape) {#tensorflow_Tensor_Tensor}
+
+Creates a tensor with the input datatype and shape, using the allocator &apos;a&apos; to allocate the underlying buffer.
+
+&apos;a&apos; must outlive the lifetime of this Tensor .
+
+#### tensorflow::Tensor::Tensor(DataType type) {#tensorflow_Tensor_Tensor}
+
+Creates an uninitialized Tensor of the given data type.
+
+
+
+#### tensorflow::Tensor::Tensor(const Tensor &amp;other) {#tensorflow_Tensor_Tensor}
+
+
+
+
+
+#### tensorflow::Tensor::~Tensor() {#tensorflow_Tensor_Tensor}
+
+Copy constructor.
+
+
+
+#### DataType tensorflow::Tensor::dtype() const {#DataType_tensorflow_Tensor_dtype}
+
+Returns the data type.
+
+
+
+#### const TensorShape&amp; tensorflow::Tensor::shape() const {#const_TensorShape_amp_tensorflow_Tensor_shape}
+
+Returns the shape of the tensor.
+
+
+
+#### int tensorflow::Tensor::dims() const {#int_tensorflow_Tensor_dims}
+
+Convenience accessor for the tensor shape.
+
+For all shape accessors, see comments for relevant methods of TensorShape in tensor_shape.h .
+
+#### int64 tensorflow::Tensor::dim_size(int d) const {#int64_tensorflow_Tensor_dim_size}
+
+Convenience accessor for the tensor shape.
+
+
+
+#### int64 tensorflow::Tensor::NumElements() const {#int64_tensorflow_Tensor_NumElements}
+
+Convenience accessor for the tensor shape.
+
+
+
+#### bool tensorflow::Tensor::IsSameSize(const Tensor &amp;b) const {#bool_tensorflow_Tensor_IsSameSize}
+
+
+
+
+
+#### bool tensorflow::Tensor::IsInitialized() const {#bool_tensorflow_Tensor_IsInitialized}
+
+Has this Tensor been initialized?
+
+
+
+#### size_t tensorflow::Tensor::TotalBytes() const {#size_t_tensorflow_Tensor_TotalBytes}
+
+Returns the estimated memory usage of this tensor.
+
+
+
+#### Tensor&amp; tensorflow::Tensor::operator=(const Tensor &amp;other) {#Tensor_amp_tensorflow_Tensor_operator_}
+
+Assign operator. This tensor shares other&apos;s underlying storage.
+
+
+
+#### bool tensorflow::Tensor::CopyFrom(const Tensor &amp;other, const TensorShape &amp;shape) TF_MUST_USE_RESULT {#bool_tensorflow_Tensor_CopyFrom}
+
+Copy the other tensor into this tensor and reshape it.
+
+This tensor shares other&apos;s underlying storage. Returns true iff other.shape() has the same number of elements as the given &quot;shape&quot;.
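+
+A minimal sketch, assuming a 4 x 6 source tensor reinterpreted as 2 x 12 (same element count, no data copy):
+
+```c++
+Tensor src(DT_FLOAT, TensorShape({4, 6}));
+Tensor dst;
+if (!dst.CopyFrom(src, TensorShape({2, 12}))) {  // true iff 4 * 6 == 2 * 12
+  // element counts differ; handle the mismatch
+}
+```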
+
+#### Tensor tensorflow::Tensor::Slice(int64 dim0_start, int64 dim0_limit) const {#Tensor_tensorflow_Tensor_Slice}
+
+Slice this tensor along the 1st dimension.
+
+I.e., the returned tensor satisfies returned[i, ...] == this[dim0_start + i, ...]. The returned tensor shares the underlying tensor buffer with this tensor.
+
+NOTE: The returned tensor may not satisfy the same alignment requirement as this tensor, depending on the shape. The caller must check the returned tensor&apos;s alignment before calling certain methods that have an alignment requirement (e.g., flat() , tensor()).
+
+REQUIRES: dims() &gt;= 1 REQUIRES: 0 &lt;= dim0_start &lt;= dim0_limit &lt;= dim_size(0)
+
+#### bool tensorflow::Tensor::FromProto(const TensorProto &amp;other) TF_MUST_USE_RESULT {#bool_tensorflow_Tensor_FromProto}
+
+Parse "other&apos; and construct the tensor.
+
+Returns true iff the parsing succeeds. If the parsing fails, the state of &quot;*this&quot; is unchanged.
+
+#### bool tensorflow::Tensor::FromProto(Allocator *a, const TensorProto &amp;other) TF_MUST_USE_RESULT {#bool_tensorflow_Tensor_FromProto}
+
+
+
+
+
+#### void tensorflow::Tensor::AsProtoField(TensorProto *proto) const {#void_tensorflow_Tensor_AsProtoField}
+
+Fills in &quot;proto&quot; with &quot;*this&quot; tensor&apos;s content.
+
+AsProtoField() fills in the repeated field for proto.dtype(), while AsProtoTensorContent() encodes the content in proto.tensor_content() in a compact form.
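+
+A minimal sketch of an AsProtoField() / FromProto() round trip:
+
+```c++
+Tensor t(DT_INT32, TensorShape({2, 2}));
+TensorProto proto;
+t.AsProtoField(&proto);        // writes dtype, shape and the repeated value field
+Tensor parsed;
+if (!parsed.FromProto(proto)) {
+  // parsing failed; "parsed" is unchanged
+}
+```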
+
+#### void tensorflow::Tensor::AsProtoTensorContent(TensorProto *proto) const {#void_tensorflow_Tensor_AsProtoTensorContent}
+
+
+
+
+
+#### TTypes&lt;T&gt;::Vec tensorflow::Tensor::vec() {#TTypes_lt_T_gt_Vec_tensorflow_Tensor_vec}
+
+Return the Tensor data as an Eigen::Tensor with the type and sizes of this Tensor .
+
+Use these methods when you know the data type and the number of dimensions of the Tensor and you want an Eigen::Tensor automatically sized to the Tensor sizes. The implementation CHECK-fails if either the type or the sizes mismatch.
+
+Example:
+
+    typedef float T;
+    Tensor my_mat(...built with Shape{rows: 3, cols: 5}...);
+    auto mat = my_mat.matrix<T>();     // 2D Eigen::Tensor, 3 x 5.
+    auto mat = my_mat.tensor<T, 2>();  // 2D Eigen::Tensor, 3 x 5.
+    auto vec = my_mat.vec<T>();        // CHECK fails as my_mat is 2D.
+    auto vec = my_mat.tensor<T, 3>();  // CHECK fails as my_mat is 2D.
+    auto mat = my_mat.matrix<int32>(); // CHECK fails as type mismatch.
+
+#### TTypes&lt;T&gt;::Matrix tensorflow::Tensor::matrix() {#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_matrix}
+
+
+
+
+
+#### TTypes&lt; T, NDIMS &gt;::Tensor tensorflow::Tensor::tensor() {#TTypes_lt_T_NDIMS_gt_Tensor_tensorflow_Tensor_tensor}
+
+
+
+
+
+#### TTypes&lt;T&gt;::Flat tensorflow::Tensor::flat() {#TTypes_lt_T_gt_Flat_tensorflow_Tensor_flat}
+
+Return the Tensor data as an Eigen::Tensor of the data type and a specified shape.
+
+These methods allow you to access the data with the dimensions and sizes of your choice. You do not need to know the number of dimensions of the Tensor to call them. However, they CHECK that the type matches and that the requested dimensions create an Eigen::Tensor with the same number of elements as the Tensor .
+
+Example:
+
+    typedef float T;
+    Tensor my_ten(...built with Shape{planes: 4, rows: 3, cols: 5}...);
+    // 1D Eigen::Tensor, size 60:
+    auto flat = my_ten.flat<T>();
+    // 2D Eigen::Tensor 12 x 5:
+    auto inner = my_ten.flat_inner_dims<T>();
+    // 2D Eigen::Tensor 4 x 15:
+    auto outer = my_ten.shaped<T, 2>({4, 15});
+    // CHECK fails, bad num elements:
+    auto outer = my_ten.shaped<T, 2>({4, 8});
+    // 3D Eigen::Tensor 6 x 5 x 2:
+    auto weird = my_ten.shaped<T, 3>({6, 5, 2});
+    // CHECK fails, type mismatch:
+    auto bad = my_ten.flat<int32>();
+
+#### TTypes&lt;T&gt;::UnalignedFlat tensorflow::Tensor::unaligned_flat() {#TTypes_lt_T_gt_UnalignedFlat_tensorflow_Tensor_unaligned_flat}
+
+
+
+
+
+#### TTypes&lt;T&gt;::Matrix tensorflow::Tensor::flat_inner_dims() {#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_flat_inner_dims}
+
+
+
+Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all Tensor dimensions but the last one into the first dimension of the result.
+
+#### TTypes&lt;T&gt;::Matrix tensorflow::Tensor::flat_outer_dims() {#TTypes_lt_T_gt_Matrix_tensorflow_Tensor_flat_outer_dims}
+
+
+
+Returns the data as an Eigen::Tensor with 2 dimensions, collapsing all Tensor dimensions but the first one into the last dimension of the result.
+
+#### TTypes&lt; T, NDIMS &gt;::Tensor tensorflow::Tensor::shaped(gtl::ArraySlice&lt; int64 &gt; new_sizes) {#TTypes_lt_T_NDIMS_gt_Tensor_tensorflow_Tensor_shaped}
+
+
+
+
+
+#### TTypes&lt; T, NDIMS &gt;::UnalignedTensor tensorflow::Tensor::unaligned_shaped(gtl::ArraySlice&lt; int64 &gt; new_sizes) {#TTypes_lt_T_NDIMS_gt_UnalignedTensor_tensorflow_Tensor_unaligned_shaped}
+
+
+
+
+
+#### TTypes&lt; T &gt;::Scalar tensorflow::Tensor::scalar() {#TTypes_lt_T_gt_Scalar_tensorflow_Tensor_scalar}
+
+Return the Tensor data as a Tensor Map of fixed size 1: TensorMap&lt;TensorFixedSize&lt;T, 1&gt;&gt;.
+
+Using scalar() allows the compiler to perform optimizations as the size of the tensor is known at compile time.
+
+#### TTypes&lt;T&gt;::ConstVec tensorflow::Tensor::vec() const {#TTypes_lt_T_gt_ConstVec_tensorflow_Tensor_vec}
+
+Const versions of all the methods above.
+
+
+
+#### TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::matrix() const {#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_matrix}
+
+
+
+
+
+#### TTypes&lt; T, NDIMS &gt;::ConstTensor tensorflow::Tensor::tensor() const {#TTypes_lt_T_NDIMS_gt_ConstTensor_tensorflow_Tensor_tensor}
+
+
+
+
+
+#### TTypes&lt;T&gt;::ConstFlat tensorflow::Tensor::flat() const {#TTypes_lt_T_gt_ConstFlat_tensorflow_Tensor_flat}
+
+
+
+
+
+#### TTypes&lt;T&gt;::ConstUnalignedFlat tensorflow::Tensor::unaligned_flat() const {#TTypes_lt_T_gt_ConstUnalignedFlat_tensorflow_Tensor_unaligned_flat}
+
+
+
+
+
+#### TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::flat_inner_dims() const {#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_flat_inner_dims}
+
+
+
+
+
+#### TTypes&lt;T&gt;::ConstMatrix tensorflow::Tensor::flat_outer_dims() const {#TTypes_lt_T_gt_ConstMatrix_tensorflow_Tensor_flat_outer_dims}
+
+
+
+
+
+#### TTypes&lt; T, NDIMS &gt;::ConstTensor tensorflow::Tensor::shaped(gtl::ArraySlice&lt; int64 &gt; new_sizes) const {#TTypes_lt_T_NDIMS_gt_ConstTensor_tensorflow_Tensor_shaped}
+
+
+
+
+
+#### TTypes&lt; T, NDIMS &gt;::ConstUnalignedTensor tensorflow::Tensor::unaligned_shaped(gtl::ArraySlice&lt; int64 &gt; new_sizes) const {#TTypes_lt_T_NDIMS_gt_ConstUnalignedTensor_tensorflow_Tensor_unaligned_shaped}
+
+
+
+
+
+#### TTypes&lt; T &gt;::ConstScalar tensorflow::Tensor::scalar() const {#TTypes_lt_T_gt_ConstScalar_tensorflow_Tensor_scalar}
+
+
+
+
+
+#### string tensorflow::Tensor::SummarizeValue(int64 max_entries) const {#string_tensorflow_Tensor_SummarizeValue}
+
+Render the first max_entries values in *this into a string.
+
+
+
+#### string tensorflow::Tensor::DebugString() const {#string_tensorflow_Tensor_DebugString}
+
+A human-readable summary of the Tensor suitable for debugging.
+
+
+
+#### void tensorflow::Tensor::FillDescription(TensorDescription *description) const {#void_tensorflow_Tensor_FillDescription}
+
+
+
+Fill in the TensorDescription proto with metadata about the Tensor that is useful for monitoring and debugging.
+
+#### StringPiece tensorflow::Tensor::tensor_data() const {#StringPiece_tensorflow_Tensor_tensor_data}
+
+Returns a StringPiece mapping the current tensor&apos;s buffer.
+
+The returned StringPiece may point to memory location on devices that the CPU cannot address directly.
+
+NOTE: The underlying Tensor buffer is refcounted, so the lifetime of the contents mapped by the StringPiece matches the lifetime of the buffer; callers should arrange to make sure the buffer does not get destroyed while the StringPiece is still used.
+
+REQUIRES: DataTypeCanUseMemcpy( dtype() ).
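+
+A minimal sketch (DT_FLOAT satisfies the DataTypeCanUseMemcpy requirement):
+
+```c++
+Tensor t(DT_FLOAT, TensorShape({3}));
+StringPiece bytes = t.tensor_data();   // raw view of the underlying buffer
+// "bytes" is only valid while t's buffer stays alive; keep a reference to t
+// (or another tensor sharing the buffer) for as long as the view is used.
+```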
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorBuffer.md b/tensorflow/g3doc/api_docs/cc/ClassTensorBuffer.md
new file mode 100644
index 0000000000..9f2c6a23be
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorBuffer.md
@@ -0,0 +1,52 @@
+#Class tensorflow::TensorBuffer
+
+
+
+
+
+##Member Summary
+
+* [tensorflow::TensorBuffer::~TensorBuffer](#tensorflow_TensorBuffer_TensorBuffer)
+* [virtual void* tensorflow::TensorBuffer::data](#virtual_void_tensorflow_TensorBuffer_data)
+* [virtual size_t tensorflow::TensorBuffer::size](#virtual_size_t_tensorflow_TensorBuffer_size)
+* [virtual TensorBuffer* tensorflow::TensorBuffer::root_buffer](#virtual_TensorBuffer_tensorflow_TensorBuffer_root_buffer)
+* [virtual void tensorflow::TensorBuffer::FillAllocationDescription](#virtual_void_tensorflow_TensorBuffer_FillAllocationDescription)
+* [T* tensorflow::TensorBuffer::base](#T_tensorflow_TensorBuffer_base)
+
+##Member Details
+
+#### tensorflow::TensorBuffer::~TensorBuffer() override {#tensorflow_TensorBuffer_TensorBuffer}
+
+
+
+
+
+#### virtual void* tensorflow::TensorBuffer::data() const =0 {#virtual_void_tensorflow_TensorBuffer_data}
+
+
+
+
+
+#### virtual size_t tensorflow::TensorBuffer::size() const =0 {#virtual_size_t_tensorflow_TensorBuffer_size}
+
+
+
+
+
+#### virtual TensorBuffer* tensorflow::TensorBuffer::root_buffer()=0 {#virtual_TensorBuffer_tensorflow_TensorBuffer_root_buffer}
+
+
+
+
+
+#### virtual void tensorflow::TensorBuffer::FillAllocationDescription(AllocationDescription *proto) const =0 {#virtual_void_tensorflow_TensorBuffer_FillAllocationDescription}
+
+
+
+
+
+#### T* tensorflow::TensorBuffer::base() const {#T_tensorflow_TensorBuffer_base}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md b/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md
new file mode 100644
index 0000000000..47a105a76e
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorShape.md
@@ -0,0 +1,196 @@
+#Class tensorflow::TensorShape
+
+Manages the dimensions of a Tensor and their sizes.
+
+
+
+##Member Summary
+
+* [tensorflow::TensorShape::TensorShape](#tensorflow_TensorShape_TensorShape)
+ * Construct a TensorShape from the provided sizes. REQUIRES: dim_sizes[i] &gt;= 0.
+* [tensorflow::TensorShape::TensorShape](#tensorflow_TensorShape_TensorShape)
+* [tensorflow::TensorShape::TensorShape](#tensorflow_TensorShape_TensorShape)
+ * REQUIRES: IsValid(proto)
+* [tensorflow::TensorShape::TensorShape](#tensorflow_TensorShape_TensorShape)
+* [void tensorflow::TensorShape::Clear](#void_tensorflow_TensorShape_Clear)
+ * Clear a tensor shape.
+* [void tensorflow::TensorShape::AddDim](#void_tensorflow_TensorShape_AddDim)
+ * Add a dimension to the end (&quot;inner-most&quot;). REQUIRES: size &gt;= 0.
+* [void tensorflow::TensorShape::AppendShape](#void_tensorflow_TensorShape_AppendShape)
+ * Appends all the dimensions from shape.
+* [void tensorflow::TensorShape::InsertDim](#void_tensorflow_TensorShape_InsertDim)
+ * Insert a dimension somewhere in the TensorShape . REQUIRES: &quot;0 &lt;= d &lt;= dims()&quot; REQUIRES: size &gt;= 0.
+* [void tensorflow::TensorShape::set_dim](#void_tensorflow_TensorShape_set_dim)
+ * Modifies the size of the dimension &apos;d&apos; to be &apos;size&apos; REQUIRES: &quot;0 &lt;= d &lt; dims()&quot; REQUIRES: size &gt;= 0.
+* [void tensorflow::TensorShape::RemoveDim](#void_tensorflow_TensorShape_RemoveDim)
+ * Removes dimension &apos;d&apos; from the TensorShape . REQUIRES: &quot;0 &lt;= d &lt; dims()&quot;.
+* [int tensorflow::TensorShape::dims](#int_tensorflow_TensorShape_dims)
+ * Return the number of dimensions in the tensor.
+* [int64 tensorflow::TensorShape::dim_size](#int64_tensorflow_TensorShape_dim_size)
+ * Returns the number of elements in dimension &quot;d&quot;. REQUIRES: &quot;0 &lt;= d &lt; dims()&quot;.
+* [gtl::ArraySlice&lt;int64&gt; tensorflow::TensorShape::dim_sizes](#gtl_ArraySlice_lt_int64_gt_tensorflow_TensorShape_dim_sizes)
+ * Returns sizes of all dimensions.
+* [int64 tensorflow::TensorShape::num_elements](#int64_tensorflow_TensorShape_num_elements)
+ * Returns the number of elements in the tensor.
+* [bool tensorflow::TensorShape::IsSameSize](#bool_tensorflow_TensorShape_IsSameSize)
+ * Returns true if *this and b have the same sizes. Ignores dimension names.
+* [bool tensorflow::TensorShape::operator==](#bool_tensorflow_TensorShape_operator_)
+* [void tensorflow::TensorShape::AsProto](#void_tensorflow_TensorShape_AsProto)
+ * Fill *proto from *this.
+* [Eigen::DSizes&lt; Eigen::DenseIndex, NDIMS &gt; tensorflow::TensorShape::AsEigenDSizes](#Eigen_DSizes_lt_Eigen_DenseIndex_NDIMS_gt_tensorflow_TensorShape_AsEigenDSizes)
+ * Fill *dsizes from *this.
+* [Eigen::DSizes&lt; Eigen::DenseIndex, NDIMS &gt; tensorflow::TensorShape::AsEigenDSizesWithPadding](#Eigen_DSizes_lt_Eigen_DenseIndex_NDIMS_gt_tensorflow_TensorShape_AsEigenDSizesWithPadding)
+* [TensorShapeIter tensorflow::TensorShape::begin](#TensorShapeIter_tensorflow_TensorShape_begin)
+ * For iterating through the dimensions.
+* [TensorShapeIter tensorflow::TensorShape::end](#TensorShapeIter_tensorflow_TensorShape_end)
+* [string tensorflow::TensorShape::DebugString](#string_tensorflow_TensorShape_DebugString)
+ * For error messages.
+* [string tensorflow::TensorShape::ShortDebugString](#string_tensorflow_TensorShape_ShortDebugString)
+* [static bool tensorflow::TensorShape::IsValid](#static_bool_tensorflow_TensorShape_IsValid)
+ * Returns true iff &quot;proto&quot; is a valid tensor shape.
+
+##Member Details
+
+#### tensorflow::TensorShape::TensorShape(gtl::ArraySlice&lt; int64 &gt; dim_sizes) {#tensorflow_TensorShape_TensorShape}
+
+Construct a TensorShape from the provided sizes. REQUIRES: dim_sizes[i] &gt;= 0.
+
+
+
+#### tensorflow::TensorShape::TensorShape(std::initializer_list&lt; int64 &gt; dim_sizes) {#tensorflow_TensorShape_TensorShape}
+
+
+
+
+
+#### tensorflow::TensorShape::TensorShape(const TensorShapeProto &amp;proto) {#tensorflow_TensorShape_TensorShape}
+
+REQUIRES: IsValid(proto)
+
+
+
+#### tensorflow::TensorShape::TensorShape() {#tensorflow_TensorShape_TensorShape}
+
+
+
+Create a tensor shape with no dimensions and one element, which you can then call AddDim() on.
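+
+A minimal sketch of building a shape incrementally:
+
+```c++
+TensorShape shape;        // no dimensions, one element
+shape.AddDim(32);         // [32]
+shape.AddDim(28);         // [32, 28]
+shape.AddDim(28);         // [32, 28, 28]
+// shape.dims() == 3, shape.num_elements() == 32 * 28 * 28
+```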
+
+#### void tensorflow::TensorShape::Clear() {#void_tensorflow_TensorShape_Clear}
+
+Clear a tensor shape.
+
+
+
+#### void tensorflow::TensorShape::AddDim(int64 size) {#void_tensorflow_TensorShape_AddDim}
+
+Add a dimension to the end (&quot;inner-most&quot;). REQUIRES: size &gt;= 0.
+
+
+
+#### void tensorflow::TensorShape::AppendShape(const TensorShape &amp;shape) {#void_tensorflow_TensorShape_AppendShape}
+
+Appends all the dimensions from shape.
+
+
+
+#### void tensorflow::TensorShape::InsertDim(int d, int64 size) {#void_tensorflow_TensorShape_InsertDim}
+
+Insert a dimension somewhere in the TensorShape . REQUIRES: &quot;0 &lt;= d &lt;= dims()&quot; REQUIRES: size &gt;= 0.
+
+
+
+#### void tensorflow::TensorShape::set_dim(int d, int64 size) {#void_tensorflow_TensorShape_set_dim}
+
+Modifies the size of the dimension &apos;d&apos; to be &apos;size&apos; REQUIRES: &quot;0 &lt;= d &lt; dims()&quot; REQUIRES: size &gt;= 0.
+
+
+
+#### void tensorflow::TensorShape::RemoveDim(int d) {#void_tensorflow_TensorShape_RemoveDim}
+
+Removes dimension &apos;d&apos; from the TensorShape . REQUIRES: &quot;0 &lt;= d &lt; dims()&quot;.
+
+
+
+#### int tensorflow::TensorShape::dims() const {#int_tensorflow_TensorShape_dims}
+
+Return the number of dimensions in the tensor.
+
+
+
+#### int64 tensorflow::TensorShape::dim_size(int d) const {#int64_tensorflow_TensorShape_dim_size}
+
+Returns the number of elements in dimension &quot;d&quot;. REQUIRES: &quot;0 &lt;= d &lt; dims()&quot;.
+
+
+
+#### gtl::ArraySlice&lt;int64&gt; tensorflow::TensorShape::dim_sizes() const {#gtl_ArraySlice_lt_int64_gt_tensorflow_TensorShape_dim_sizes}
+
+Returns sizes of all dimensions.
+
+
+
+#### int64 tensorflow::TensorShape::num_elements() const {#int64_tensorflow_TensorShape_num_elements}
+
+Returns the number of elements in the tensor.
+
+We use int64 and not size_t to be compatible with Eigen::Tensor, which uses ptrdiff_t.
+
+#### bool tensorflow::TensorShape::IsSameSize(const TensorShape &amp;b) const {#bool_tensorflow_TensorShape_IsSameSize}
+
+Returns true if *this and b have the same sizes. Ignores dimension names.
+
+
+
+#### bool tensorflow::TensorShape::operator==(const TensorShape &amp;b) const {#bool_tensorflow_TensorShape_operator_}
+
+
+
+
+
+#### void tensorflow::TensorShape::AsProto(TensorShapeProto *proto) const {#void_tensorflow_TensorShape_AsProto}
+
+Fill *proto from *this.
+
+
+
+#### Eigen::DSizes&lt; Eigen::DenseIndex, NDIMS &gt; tensorflow::TensorShape::AsEigenDSizes() const {#Eigen_DSizes_lt_Eigen_DenseIndex_NDIMS_gt_tensorflow_TensorShape_AsEigenDSizes}
+
+Fill *dsizes from *this.
+
+
+
+#### Eigen::DSizes&lt; Eigen::DenseIndex, NDIMS &gt; tensorflow::TensorShape::AsEigenDSizesWithPadding() const {#Eigen_DSizes_lt_Eigen_DenseIndex_NDIMS_gt_tensorflow_TensorShape_AsEigenDSizesWithPadding}
+
+
+
+Same as AsEigenDSizes() but allows for NDIMS &gt; dims() in which case we pad the rest of the sizes with 1.
+
+#### TensorShapeIter tensorflow::TensorShape::begin() const {#TensorShapeIter_tensorflow_TensorShape_begin}
+
+For iterating through the dimensions.
+
+
+
+#### TensorShapeIter tensorflow::TensorShape::end() const {#TensorShapeIter_tensorflow_TensorShape_end}
+
+
+
+
+
+#### string tensorflow::TensorShape::DebugString() const {#string_tensorflow_TensorShape_DebugString}
+
+For error messages.
+
+
+
+#### string tensorflow::TensorShape::ShortDebugString() const {#string_tensorflow_TensorShape_ShortDebugString}
+
+
+
+
+
+#### static bool tensorflow::TensorShape::IsValid(const TensorShapeProto &amp;proto) {#static_bool_tensorflow_TensorShape_IsValid}
+
+Returns true iff &quot;proto&quot; is a valid tensor shape.
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorShapeIter.md b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeIter.md
new file mode 100644
index 0000000000..2f198168a2
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeIter.md
@@ -0,0 +1,45 @@
+#Class tensorflow::TensorShapeIter
+
+
+
+
+
+##Member Summary
+
+* [tensorflow::TensorShapeIter::TensorShapeIter](#tensorflow_TensorShapeIter_TensorShapeIter)
+* [bool tensorflow::TensorShapeIter::operator==](#bool_tensorflow_TensorShapeIter_operator_)
+* [bool tensorflow::TensorShapeIter::operator!=](#bool_tensorflow_TensorShapeIter_operator_)
+* [void tensorflow::TensorShapeIter::operator++](#void_tensorflow_TensorShapeIter_operator_)
+* [TensorShapeDim tensorflow::TensorShapeIter::operator*](#TensorShapeDim_tensorflow_TensorShapeIter_operator_)
+
+##Member Details
+
+#### tensorflow::TensorShapeIter::TensorShapeIter(const TensorShape *shape, int d) {#tensorflow_TensorShapeIter_TensorShapeIter}
+
+
+
+
+
+#### bool tensorflow::TensorShapeIter::operator==(const TensorShapeIter &amp;rhs) {#bool_tensorflow_TensorShapeIter_operator_}
+
+
+
+
+
+#### bool tensorflow::TensorShapeIter::operator!=(const TensorShapeIter &amp;rhs) {#bool_tensorflow_TensorShapeIter_operator_}
+
+
+
+
+
+#### void tensorflow::TensorShapeIter::operator++() {#void_tensorflow_TensorShapeIter_operator_}
+
+
+
+
+
+#### TensorShapeDim tensorflow::TensorShapeIter::operator*() {#TensorShapeDim_tensorflow_TensorShapeIter_operator_}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md
new file mode 100644
index 0000000000..7b81eb62a8
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassTensorShapeUtils.md
@@ -0,0 +1,81 @@
+#Class tensorflow::TensorShapeUtils
+
+Static helper routines for TensorShape . Includes a few common predicates on a tensor shape.
+
+
+
+##Member Summary
+
+* [static bool tensorflow::TensorShapeUtils::IsScalar](#static_bool_tensorflow_TensorShapeUtils_IsScalar)
+* [static bool tensorflow::TensorShapeUtils::IsVector](#static_bool_tensorflow_TensorShapeUtils_IsVector)
+* [static bool tensorflow::TensorShapeUtils::IsLegacyScalar](#static_bool_tensorflow_TensorShapeUtils_IsLegacyScalar)
+* [static bool tensorflow::TensorShapeUtils::IsLegacyVector](#static_bool_tensorflow_TensorShapeUtils_IsLegacyVector)
+* [static bool tensorflow::TensorShapeUtils::IsVectorOrHigher](#static_bool_tensorflow_TensorShapeUtils_IsVectorOrHigher)
+* [static bool tensorflow::TensorShapeUtils::IsMatrix](#static_bool_tensorflow_TensorShapeUtils_IsMatrix)
+* [static bool tensorflow::TensorShapeUtils::IsMatrixOrHigher](#static_bool_tensorflow_TensorShapeUtils_IsMatrixOrHigher)
+* [static TensorShape tensorflow::TensorShapeUtils::MakeShape](#static_TensorShape_tensorflow_TensorShapeUtils_MakeShape)
+ * Returns a TensorShape whose dimensions are dims[0], dims[1], ..., dims[n-1].
+* [static string tensorflow::TensorShapeUtils::ShapeListString](#static_string_tensorflow_TensorShapeUtils_ShapeListString)
+* [static bool tensorflow::TensorShapeUtils::StartsWith](#static_bool_tensorflow_TensorShapeUtils_StartsWith)
+
+##Member Details
+
+#### static bool tensorflow::TensorShapeUtils::IsScalar(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsScalar}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsVector(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsVector}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsLegacyScalar(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsLegacyScalar}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsLegacyVector(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsLegacyVector}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsVectorOrHigher(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsVectorOrHigher}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsMatrix(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsMatrix}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::IsMatrixOrHigher(const TensorShape &amp;shape) {#static_bool_tensorflow_TensorShapeUtils_IsMatrixOrHigher}
+
+
+
+
+
+#### static TensorShape tensorflow::TensorShapeUtils::MakeShape(const T *dims, int n) {#static_TensorShape_tensorflow_TensorShapeUtils_MakeShape}
+
+Returns a TensorShape whose dimensions are dims[0], dims[1], ..., dims[n-1].
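+
+A minimal sketch, assuming a plain array of dimension sizes:
+
+```c++
+int64 dims[] = {2, 3, 5};
+TensorShape shape = TensorShapeUtils::MakeShape(dims, 3);  // shape is [2, 3, 5]
+// TensorShapeUtils::IsMatrixOrHigher(shape) is true for this shape.
+```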
+
+
+
+#### static string tensorflow::TensorShapeUtils::ShapeListString(const gtl::ArraySlice&lt; TensorShape &gt; &amp;shapes) {#static_string_tensorflow_TensorShapeUtils_ShapeListString}
+
+
+
+
+
+#### static bool tensorflow::TensorShapeUtils::StartsWith(const TensorShape &amp;shape0, const TensorShape &amp;shape1) {#static_bool_tensorflow_TensorShapeUtils_StartsWith}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassThread.md b/tensorflow/g3doc/api_docs/cc/ClassThread.md
new file mode 100644
index 0000000000..32bb286206
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassThread.md
@@ -0,0 +1,25 @@
+#Class tensorflow::Thread
+
+
+
+
+
+##Member Summary
+
+* [tensorflow::Thread::Thread](#tensorflow_Thread_Thread)
+* [virtual tensorflow::Thread::~Thread](#virtual_tensorflow_Thread_Thread)
+ * Blocks until the thread of control stops running.
+
+##Member Details
+
+#### tensorflow::Thread::Thread() {#tensorflow_Thread_Thread}
+
+
+
+
+
+#### virtual tensorflow::Thread::~Thread() {#virtual_tensorflow_Thread_Thread}
+
+Blocks until the thread of control stops running.
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md b/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md
new file mode 100644
index 0000000000..e1b2132b4f
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/ClassWritableFile.md
@@ -0,0 +1,52 @@
+#Class tensorflow::WritableFile
+
+A file abstraction for sequential writing.
+
+The implementation must provide buffering since callers may append small fragments at a time to the file.
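+
+A minimal sketch of appending small fragments (obtaining &quot;file&quot; from Env::Default()-&gt;NewWritableFile() is an assumption about the surrounding code, and error handling is reduced to a single chained check):
+
+```c++
+Status s = file->Append("hello ");
+if (s.ok()) s = file->Append("world\n");
+if (s.ok()) s = file->Flush();   // push buffered fragments to the file
+if (s.ok()) s = file->Close();
+```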
+
+##Member Summary
+
+* [tensorflow::WritableFile::WritableFile](#tensorflow_WritableFile_WritableFile)
+* [virtual tensorflow::WritableFile::~WritableFile](#virtual_tensorflow_WritableFile_WritableFile)
+* [virtual Status tensorflow::WritableFile::Append](#virtual_Status_tensorflow_WritableFile_Append)
+* [virtual Status tensorflow::WritableFile::Close](#virtual_Status_tensorflow_WritableFile_Close)
+* [virtual Status tensorflow::WritableFile::Flush](#virtual_Status_tensorflow_WritableFile_Flush)
+* [virtual Status tensorflow::WritableFile::Sync](#virtual_Status_tensorflow_WritableFile_Sync)
+
+##Member Details
+
+#### tensorflow::WritableFile::WritableFile() {#tensorflow_WritableFile_WritableFile}
+
+
+
+
+
+#### virtual tensorflow::WritableFile::~WritableFile() {#virtual_tensorflow_WritableFile_WritableFile}
+
+
+
+
+
+#### virtual Status tensorflow::WritableFile::Append(const StringPiece &amp;data)=0 {#virtual_Status_tensorflow_WritableFile_Append}
+
+
+
+
+
+#### virtual Status tensorflow::WritableFile::Close()=0 {#virtual_Status_tensorflow_WritableFile_Close}
+
+
+
+
+
+#### virtual Status tensorflow::WritableFile::Flush()=0 {#virtual_Status_tensorflow_WritableFile_Flush}
+
+
+
+
+
+#### virtual Status tensorflow::WritableFile::Sync()=0 {#virtual_Status_tensorflow_WritableFile_Sync}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md b/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md
new file mode 100644
index 0000000000..99044997c9
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/StructSessionOptions.md
@@ -0,0 +1,49 @@
+#Struct tensorflow::SessionOptions
+
+Configuration information for a Session .
+
+
+
+##Member Summary
+
+* [Env* tensorflow::SessionOptions::env](#Env_tensorflow_SessionOptions_env)
+ * The environment to use.
+* [string tensorflow::SessionOptions::target](#string_tensorflow_SessionOptions_target)
+ * The TensorFlow runtime to connect to.
+* [ConfigProto tensorflow::SessionOptions::config](#ConfigProto_tensorflow_SessionOptions_config)
+ * Configuration options.
+* [tensorflow::SessionOptions::SessionOptions](#tensorflow_SessionOptions_SessionOptions)
+
+##Member Details
+
+#### Env* tensorflow::SessionOptions::env {#Env_tensorflow_SessionOptions_env}
+
+The environment to use.
+
+
+
+#### string tensorflow::SessionOptions::target {#string_tensorflow_SessionOptions_target}
+
+The TensorFlow runtime to connect to.
+
+If &apos;target&apos; is empty or unspecified, the local TensorFlow runtime implementation will be used. Otherwise, the TensorFlow engine defined by &apos;target&apos; will be used to perform all computations.
+
+&quot;target&quot; can be either a single entry or a comma separated list of entries. Each entry is a resolvable address of the following format:
+
+* local
+* ip:port
+* host:port
+* ... other system-specific formats to identify tasks and jobs ...
+
+NOTE: at the moment &apos;local&apos; maps to an in-process service-based runtime.
+
+Upon creation, a single session affines itself to one of the remote processes, with possible load balancing choices when the &quot;target&quot; resolves to a list of possible processes.
+
+If the session disconnects from the remote process during its lifetime, session calls may fail immediately.
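+
+A minimal sketch of configuring and creating a session (the ConfigProto field shown is an assumption and is illustrative only, not a list of supported options):
+
+```c++
+SessionOptions options;
+options.target = "local";                             // in-process runtime
+options.config.set_intra_op_parallelism_threads(4);   // hint for the runtime
+Session* session = nullptr;
+Status s = NewSession(options, &session);
+```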
+
+#### ConfigProto tensorflow::SessionOptions::config {#ConfigProto_tensorflow_SessionOptions_config}
+
+Configuration options.
+
+
+
+#### tensorflow::SessionOptions::SessionOptions() {#tensorflow_SessionOptions_SessionOptions}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/StructState.md b/tensorflow/g3doc/api_docs/cc/StructState.md
new file mode 100644
index 0000000000..d031b50370
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/StructState.md
@@ -0,0 +1,24 @@
+#Struct tensorflow::Status::State
+
+
+
+
+
+##Member Summary
+
+* [tensorflow::error::Code tensorflow::Status::State::code](#tensorflow_error_Code_tensorflow_Status_State_code)
+* [string tensorflow::Status::State::msg](#string_tensorflow_Status_State_msg)
+
+##Member Details
+
+#### tensorflow::error::Code tensorflow::Status::State::code {#tensorflow_error_Code_tensorflow_Status_State_code}
+
+
+
+
+
+#### string tensorflow::Status::State::msg {#string_tensorflow_Status_State_msg}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md b/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md
new file mode 100644
index 0000000000..711743ac85
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/StructTensorShapeDim.md
@@ -0,0 +1,24 @@
+#Struct tensorflow::TensorShapeDim
+
+
+
+
+
+##Member Summary
+
+* [int tensorflow::TensorShapeDim::size](#int_tensorflow_TensorShapeDim_size)
+* [tensorflow::TensorShapeDim::TensorShapeDim](#tensorflow_TensorShapeDim_TensorShapeDim)
+
+##Member Details
+
+#### int tensorflow::TensorShapeDim::size {#int_tensorflow_TensorShapeDim_size}
+
+
+
+
+
+#### tensorflow::TensorShapeDim::TensorShapeDim(int64 s) {#tensorflow_TensorShapeDim_TensorShapeDim}
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md b/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md
new file mode 100644
index 0000000000..b568855d6e
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/StructThreadOptions.md
@@ -0,0 +1,26 @@
+#Struct tensorflow::ThreadOptions
+
+Options to configure a Thread .
+
+Note that the options are all hints, and the underlying implementation may choose to ignore them.
+
+##Member Summary
+
+* [size_t tensorflow::ThreadOptions::stack_size](#size_t_tensorflow_ThreadOptions_stack_size)
+ * Thread stack size to use (in bytes).
+* [size_t tensorflow::ThreadOptions::guard_size](#size_t_tensorflow_ThreadOptions_guard_size)
+ * Guard area size to use near thread stacks (in bytes).
+
+##Member Details
+
+#### size_t tensorflow::ThreadOptions::stack_size {#size_t_tensorflow_ThreadOptions_stack_size}
+
+Thread stack size to use (in bytes).
+
+
+
+#### size_t tensorflow::ThreadOptions::guard_size {#size_t_tensorflow_ThreadOptions_guard_size}
+
+Guard area size to use near thread stacks (in bytes).
+
+
diff --git a/tensorflow/g3doc/api_docs/cc/index.md b/tensorflow/g3doc/api_docs/cc/index.md
new file mode 100644
index 0000000000..82aafc7486
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/cc/index.md
@@ -0,0 +1,75 @@
+# TensorFlow C++ Session API reference documentation
+
+TensorFlow's public C++ API includes only the API for executing graphs, as of
+version 0.5. To control the execution of a graph from C++:
+
+1. Build the computation graph using the [Python API](../python/).
+1. Use [tf.train.write_graph()](../python/train.md?cl=head#write_graph) to
+write the graph to a file.
+1. Load the graph using the C++ Session API. For example:
+
+ ```c++
+ // Reads a model graph definition from disk, and creates a session object you
+ // can use to run it.
+ Status LoadGraph(string graph_file_name, Session** session) {
+ GraphDef graph_def;
+ TF_RETURN_IF_ERROR(
+ ReadBinaryProto(Env::Default(), graph_file_name, &graph_def));
+ TF_RETURN_IF_ERROR(NewSession(SessionOptions(), session));
+ TF_RETURN_IF_ERROR((*session)->Create(graph_def));
+ return Status::OK();
+ }
+```
+
+1. Run the graph with a call to `session->Run()`. For example:
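+
+    A hedged sketch of a single call; the tensor names "input" and "output"
+    are placeholders for whatever names your graph actually defines:
+
+    ```c++
+    Tensor input(DT_FLOAT, TensorShape({1, 784}));
+    std::vector<Tensor> outputs;
+    Status s = session->Run({{"input", input}}, {"output"}, {}, &outputs);
+    if (!s.ok()) { /* handle the error */ }
+    ```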
+
+
+##Classes
+
+* [tensorflow::Env](ClassEnv.md)
+* [tensorflow::EnvWrapper](ClassEnvWrapper.md)
+* [tensorflow::RandomAccessFile](ClassRandomAccessFile.md)
+* [tensorflow::Session](ClassSession.md)
+* [tensorflow::Status](ClassStatus.md)
+* [tensorflow::Tensor](ClassTensor.md)
+* [tensorflow::TensorBuffer](ClassTensorBuffer.md)
+* [tensorflow::TensorShape](ClassTensorShape.md)
+* [tensorflow::TensorShapeIter](ClassTensorShapeIter.md)
+* [tensorflow::TensorShapeUtils](ClassTensorShapeUtils.md)
+* [tensorflow::Thread](ClassThread.md)
+* [tensorflow::WritableFile](ClassWritableFile.md)
+
+##Structs
+
+* [tensorflow::SessionOptions](StructSessionOptions.md)
+* [tensorflow::Status::State](StructState.md)
+* [tensorflow::TensorShapeDim](StructTensorShapeDim.md)
+* [tensorflow::ThreadOptions](StructThreadOptions.md)
+
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- ClassEnv.md -->
+<!-- ClassEnvWrapper.md -->
+<!-- ClassRandomAccessFile.md -->
+<!-- ClassSession.md -->
+<!-- ClassStatus.md -->
+<!-- ClassTensor.md -->
+<!-- ClassTensorBuffer.md -->
+<!-- ClassTensorShape.md -->
+<!-- ClassTensorShapeIter.md -->
+<!-- ClassTensorShapeUtils.md -->
+<!-- ClassThread.md -->
+<!-- ClassWritableFile.md -->
+<!-- StructSessionOptions.md -->
+<!-- StructState.md -->
+<!-- StructTensorShapeDim.md -->
+<!-- StructThreadOptions.md -->
+-->
+</div>
+
+
+
+
+
+
diff --git a/tensorflow/g3doc/api_docs/index.md b/tensorflow/g3doc/api_docs/index.md
new file mode 100644
index 0000000000..7234bf45a8
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/index.md
@@ -0,0 +1,15 @@
+# Overview
+
+TensorFlow has APIs available in several languages both for constructing and
+executing a TensorFlow graph. The Python API is at present the most complete
+and the easiest to use, but the C++ API may offer some performance advantages
+in graph execution, and supports deployment to small devices such as Android.
+
+Over time, we hope that the TensorFlow community will develop front ends for
+languages like Go, Java, JavaScript, Lua, R, and perhaps others. With SWIG, it's
+relatively easy to contribute a TensorFlow interface to your favorite language.
+
+Note: Many practical aspects of usage are covered in the Mechanics tab, and
+some additional documentation not specific to any particular language API is
+available in the Resources tab.
+
diff --git a/tensorflow/g3doc/api_docs/python/array_ops.md b/tensorflow/g3doc/api_docs/python/array_ops.md
new file mode 100644
index 0000000000..eecb442f1c
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/array_ops.md
@@ -0,0 +1,1025 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Tensor Transformations
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Casting](#AUTOGENERATED-casting)
+ * [tf.string_to_number(string_tensor, out_type=None, name=None)](#string_to_number)
+ * [tf.to_double(x, name='ToDouble')](#to_double)
+ * [tf.to_float(x, name='ToFloat')](#to_float)
+ * [tf.to_bfloat16(x, name='ToBFloat16')](#to_bfloat16)
+ * [tf.to_int32(x, name='ToInt32')](#to_int32)
+ * [tf.to_int64(x, name='ToInt64')](#to_int64)
+ * [tf.cast(x, dtype, name=None)](#cast)
+* [Shapes and Shaping](#AUTOGENERATED-shapes-and-shaping)
+ * [tf.shape(input, name=None)](#shape)
+ * [tf.size(input, name=None)](#size)
+ * [tf.rank(input, name=None)](#rank)
+ * [tf.reshape(tensor, shape, name=None)](#reshape)
+ * [tf.squeeze(input, squeeze_dims=None, name=None)](#squeeze)
+ * [tf.expand_dims(input, dim, name=None)](#expand_dims)
+* [Slicing and Joining](#AUTOGENERATED-slicing-and-joining)
+ * [tf.slice(input_, begin, size, name=None)](#slice)
+ * [tf.split(split_dim, num_split, value, name='split')](#split)
+ * [tf.tile(input, multiples, name=None)](#tile)
+ * [tf.pad(input, paddings, name=None)](#pad)
+ * [tf.concat(concat_dim, values, name='concat')](#concat)
+ * [tf.pack(values, name='pack')](#pack)
+ * [tf.unpack(value, num=None, name='unpack')](#unpack)
+ * [tf.reverse_sequence(input, seq_lengths, seq_dim, name=None)](#reverse_sequence)
+ * [tf.reverse(tensor, dims, name=None)](#reverse)
+ * [tf.transpose(a, perm=None, name='transpose')](#transpose)
+ * [tf.gather(params, indices, name=None)](#gather)
+ * [tf.dynamic_partition(data, partitions, num_partitions, name=None)](#dynamic_partition)
+ * [tf.dynamic_stitch(indices, data, name=None)](#dynamic_stitch)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Casting <div class="md-anchor" id="AUTOGENERATED-casting">{#AUTOGENERATED-casting}</div>
+
+TensorFlow provides several operations that you can use to cast tensor data
+types in your graph.
+
+- - -
+
+### tf.string_to_number(string_tensor, out_type=None, name=None) <div class="md-anchor" id="string_to_number">{#string_to_number}</div>
+
+Converts each string in the input Tensor to the specified numeric type.
+
+(Note that int32 overflow results in an error while float overflow
+results in a rounded value.)
+
+##### Args:
+
+
+* <b>string_tensor</b>: A `Tensor` of type `string`.
+* <b>out_type</b>: An optional `tf.DType` from: `tf.float32, tf.int32`. Defaults to `tf.float32`.
+ The numeric type to interpret each string in string_tensor as.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `out_type`.
+ A Tensor of the same shape as the input string_tensor.
+
+
+- - -
+
+### tf.to_double(x, name='ToDouble') <div class="md-anchor" id="to_double">{#to_double}</div>
+
+Casts a tensor to type `float64`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `float64`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `float64`.
+
+
+- - -
+
+### tf.to_float(x, name='ToFloat') <div class="md-anchor" id="to_float">{#to_float}</div>
+
+Casts a tensor to type `float32`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `float32`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `float32`.
+
+
+- - -
+
+### tf.to_bfloat16(x, name='ToBFloat16') <div class="md-anchor" id="to_bfloat16">{#to_bfloat16}</div>
+
+Casts a tensor to type `bfloat16`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `bfloat16`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `bfloat16`.
+
+
+- - -
+
+### tf.to_int32(x, name='ToInt32') <div class="md-anchor" id="to_int32">{#to_int32}</div>
+
+Casts a tensor to type `int32`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `int32`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `int32`.
+
+
+- - -
+
+### tf.to_int64(x, name='ToInt64') <div class="md-anchor" id="to_int64">{#to_int64}</div>
+
+Casts a tensor to type `int64`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `int64`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `int64`.
+
+
+- - -
+
+### tf.cast(x, dtype, name=None) <div class="md-anchor" id="cast">{#cast}</div>
+
+Casts a tensor to a new type.
+
+The operation casts `x` (in case of `Tensor`) or `x.values`
+(in case of `SparseTensor`) to `dtype`.
+
+For example:
+
+```python
+# tensor `a` is [1.8, 2.2], dtype=tf.float
+tf.cast(a, tf.int32) ==> [1, 2] # dtype=tf.int32
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` or `SparseTensor`.
+* <b>dtype</b>: The destination type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` or `SparseTensor` with same shape as `x`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `x` cannot be cast to the `dtype`.
+
+
+
+## Shapes and Shaping <div class="md-anchor" id="AUTOGENERATED-shapes-and-shaping">{#AUTOGENERATED-shapes-and-shaping}</div>
+
+TensorFlow provides several operations that you can use to determine the shape
+of a tensor and change the shape of a tensor.
+
+- - -
+
+### tf.shape(input, name=None) <div class="md-anchor" id="shape">{#shape}</div>
+
+Returns the shape of a tensor.
+
+This operation returns a 1-D integer tensor representing the shape of `input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+shape(t) ==> [2, 2, 3]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int32`.
+
+
+- - -
+
+### tf.size(input, name=None) <div class="md-anchor" id="size">{#size}</div>
+
+Returns the size of a tensor.
+
+This operation returns an integer representing the number of elements in
+`input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+size(t) ==> 12
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int32`.
+
+
+- - -
+
+### tf.rank(input, name=None) <div class="md-anchor" id="rank">{#rank}</div>
+
+Returns the rank of a tensor.
+
+This operation returns an integer representing the rank of `input`.
+
+For example:
+
+```prettyprint
+# 't' is [[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]]
+# shape of tensor 't' is [2, 2, 3]
+rank(t) ==> 3
+```
+
+**Note**: The rank of a tensor is not the same as the rank of a matrix. The rank
+of a tensor is the number of indices required to uniquely select each element
+of the tensor. Rank is also known as "order", "degree", or "ndims."
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int32`.
+
+
+- - -
+
+### tf.reshape(tensor, shape, name=None) <div class="md-anchor" id="reshape">{#reshape}</div>
+
+Reshapes a tensor.
+
+Given `tensor`, this operation returns a tensor that has the same values
+as `tensor` with shape `shape`.
+
+If `shape` is the special value `[-1]`, then `tensor` is flattened and the
+operation outputs a 1-D tensor with all elements of `tensor`.
+
+If `shape` is 1-D or higher, then the operation returns a tensor with shape
+`shape` filled with the values of `tensor`. In this case, the number of elements
+implied by `shape` must be the same as the number of elements in `tensor`.
+
+For example:
+
+```prettyprint
+# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
+# tensor 't' has shape [9]
+reshape(t, [3, 3]) ==> [[1, 2, 3]
+ [4, 5, 6]
+ [7, 8, 9]]
+
+# tensor 't' is [[[1, 1], [2, 2]]
+# [[3, 3], [4, 4]]]
+# tensor 't' has shape [2, 2, 2]
+reshape(t, [2, 4]) ==> [[1, 1, 2, 2]
+ [3, 3, 4, 4]]
+
+# tensor 't' is [[[1, 1, 1],
+# [2, 2, 2]],
+# [[3, 3, 3],
+# [4, 4, 4]],
+# [[5, 5, 5],
+# [6, 6, 6]]]
+# tensor 't' has shape [3, 2, 3]
+# pass '[-1]' to flatten 't'
+reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]
+```
+
+##### Args:
+
+
+* <b>tensor</b>: A `Tensor`.
+* <b>shape</b>: A `Tensor` of type `int32`. Defines the shape of the output tensor.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `tensor`.
+
+
+- - -
+
+### tf.squeeze(input, squeeze_dims=None, name=None) <div class="md-anchor" id="squeeze">{#squeeze}</div>
+
+Removes dimensions of size 1 from the shape of a tensor.
+
+Given a tensor `input`, this operation returns a tensor of the same type with
+all dimensions of size 1 removed. If you don't want to remove all size 1
+dimensions, you can remove specific size 1 dimensions by specifying
+`squeeze_dims`.
+
+For example:
+
+```prettyprint
+# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
+shape(squeeze(t)) ==> [2, 3]
+```
+
+Or, to remove specific size 1 dimensions:
+
+```prettyprint
+# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
+shape(squeeze(t, [2, 4])) ==> [1, 2, 3, 1]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. The `input` to squeeze.
+* <b>squeeze_dims</b>: An optional list of `ints`. Defaults to `[]`.
+ If specified, only squeezes the dimensions listed. The dimension
+ index starts at 0. It is an error to squeeze a dimension that is not 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+ Contains the same data as `input`, but has one or more dimensions of
+ size 1 removed.
+
+
+- - -
+
+### tf.expand_dims(input, dim, name=None) <div class="md-anchor" id="expand_dims">{#expand_dims}</div>
+
+Inserts a dimension of 1 into a tensor's shape.
+
+Given a tensor `input`, this operation inserts a dimension of 1 at the
+dimension index `dim` of `input`'s shape. The dimension index `dim` starts at
+zero; if you specify a negative number for `dim` it is counted backward from
+the end.
+
+This operation is useful if you want to add a batch dimension to a single
+element. For example, if you have a single image of shape `[height, width,
+channels]`, you can make it a batch of 1 image with `expand_dims(image, 0)`,
+which will make the shape `[1, height, width, channels]`.
+
+Other examples:
+
+```prettyprint
+# 't' is a tensor of shape [2]
+shape(expand_dims(t, 0)) ==> [1, 2]
+shape(expand_dims(t, 1)) ==> [2, 1]
+shape(expand_dims(t, -1)) ==> [2, 1]
+
+# 't2' is a tensor of shape [2, 3, 5]
+shape(expand_dims(t2, 0)) ==> [1, 2, 3, 5]
+shape(expand_dims(t2, 2)) ==> [2, 3, 1, 5]
+shape(expand_dims(t2, 3)) ==> [2, 3, 5, 1]
+```
+
+This operation requires that:
+
+`-1-input.dims() <= dim <= input.dims()`
+
+This operation is related to `squeeze()`, which removes dimensions of
+size 1.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>dim</b>: A `Tensor` of type `int32`.
+ 0-D (scalar). Specifies the dimension index at which to
+ expand the shape of `input`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+ Contains the same data as `input`, but its shape has an additional
+ dimension of size 1 added.
+
+
+
+## Slicing and Joining <div class="md-anchor" id="AUTOGENERATED-slicing-and-joining">{#AUTOGENERATED-slicing-and-joining}</div>
+
+TensorFlow provides several operations to slice or extract parts of a tensor,
+or join multiple tensors together.
+
+- - -
+
+### tf.slice(input_, begin, size, name=None) <div class="md-anchor" id="slice">{#slice}</div>
+
+Extracts a slice from a tensor.
+
+This operation extracts a slice of size `size` from a tensor `input` starting
+at the location specified by `begin`. The slice `size` is represented as a
+tensor shape, where `size[i]` is the number of elements of the 'i'th dimension
+of `input` that you want to slice. The starting location (`begin`) for the
+slice is represented as an offset in each dimension of `input`. In other
+words, `begin[i]` is the offset into the 'i'th dimension of `input` that you
+want to slice from.
+
+`begin` is zero-based; `size` is one-based. If `size[i]` is -1,
+all remaining elements in dimension i are included in the
+slice. In other words, this is equivalent to setting:
+
+`size[i] = input.dim_size(i) - begin[i]`
+
+This operation requires that:
+
+`0 <= begin[i] <= begin[i] + size[i] <= Di for i in [0, n]`
+
+For example:
+
+```
+# 'input' is [[[1, 1, 1], [2, 2, 2]],
+# [[3, 3, 3], [4, 4, 4]],
+# [[5, 5, 5], [6, 6, 6]]]
+tf.slice(input, [1, 0, 0], [1, 1, 3]) ==> [[[3, 3, 3]]]
+tf.slice(input, [1, 0, 0], [1, 2, 3]) ==> [[[3, 3, 3],
+ [4, 4, 4]]]
+tf.slice(input, [1, 0, 0], [2, 1, 3]) ==> [[[3, 3, 3]],
+ [[5, 5, 5]]]
+```
+
+##### Args:
+
+
+* <b>input_</b>: A `Tensor`.
+* <b>begin</b>: An `int32` or `int64` `Tensor`.
+* <b>size</b>: An `int32` or `int64` `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` the same type as `input`.
+
+
+- - -
+
+### tf.split(split_dim, num_split, value, name='split') <div class="md-anchor" id="split">{#split}</div>
+
+Splits a tensor into `num_split` tensors along one dimension.
+
+Splits `value` along dimension `split_dim` into `num_split` smaller tensors.
+Requires that `num_split` evenly divide `value.shape[split_dim]`.
+
+For example:
+
+```python
+# 'value' is a tensor with shape [5, 30]
+# Split 'value' into 3 tensors along dimension 1
+split0, split1, split2 = tf.split(1, 3, value)
+tf.shape(split0) ==> [5, 10]
+```
+
+##### Args:
+
+
+* <b>split_dim</b>: A 0-D `int32` `Tensor`. The dimension along which to split.
+ Must be in the range `[0, rank(value))`.
+* <b>num_split</b>: A 0-D `int32` `Tensor`. The number of ways to split.
+* <b>value</b>: The `Tensor` to split.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ `num_split` `Tensor` objects resulting from splitting `value`.
+
+
+- - -
+
+### tf.tile(input, multiples, name=None) <div class="md-anchor" id="tile">{#tile}</div>
+
+Constructs a tensor by tiling a given tensor.
+
+This operation creates a new tensor by replicating `input` `multiples` times.
+The output tensor's i'th dimension has `input.dims(i) * multiples[i]` elements,
+and the values of `input` are replicated `multiples[i]` times along the 'i'th
+dimension. For example, tiling `[a b c d]` by `[2]` produces
+`[a b c d a b c d]`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. 1-D or higher.
+* <b>multiples</b>: A `Tensor` of type `int32`.
+ 1-D. Length must be the same as the number of dimensions in `input`
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+
+
+- - -
+
+### tf.pad(input, paddings, name=None) <div class="md-anchor" id="pad">{#pad}</div>
+
+Pads a tensor with zeros.
+
+This operation pads `input` with zeros according to the `paddings` you
+specify. `paddings` is an integer tensor with shape `[Dn, 2]`, where n is the
+rank of `input`. For each dimension D of `input`, `paddings[D, 0]` indicates
+how many zeros to add before the contents of `input` in that dimension, and
+`paddings[D, 1]` indicates how many zeros to add after the contents of `input`
+in that dimension.
+
+The padded size of each dimension D of the output is:
+
+`paddings(D, 0) + input.dim_size(D) + paddings(D, 1)`
+
+For example:
+
+```prettyprint
+# 't' is [[1, 1], [2, 2]]
+# 'paddings' is [[1, 1], [2, 2]]
+# rank of 't' is 2
+pad(t, paddings) ==> [[0, 0, 0, 0, 0, 0]
+                      [0, 0, 1, 1, 0, 0]
+                      [0, 0, 2, 2, 0, 0]
+                      [0, 0, 0, 0, 0, 0]]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>paddings</b>: A `Tensor` of type `int32`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+
+
+- - -
+
+### tf.concat(concat_dim, values, name='concat') <div class="md-anchor" id="concat">{#concat}</div>
+
+Concatenates tensors along one dimension.
+
+Concatenates the list of tensors `values` along dimension `concat_dim`. If
+`values[i].shape = [D0, D1, ... Dconcat_dim(i), ...Dn]`, the concatenated
+result has shape
+
+ [D0, D1, ... Rconcat_dim, ...Dn]
+
+where
+
+ Rconcat_dim = sum(Dconcat_dim(i))
+
+That is, the data from the input tensors is joined along the `concat_dim`
+dimension.
+
+The number of dimensions of the input tensors must match, and all dimensions
+except `concat_dim` must be equal.
+
+For example:
+
+```python
+t1 = [[1, 2, 3], [4, 5, 6]]
+t2 = [[7, 8, 9], [10, 11, 12]]
+tf.concat(0, [t1, t2]) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
+tf.concat(1, [t1, t2]) ==> [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]
+
+# tensor t3 with shape [2, 3]
+# tensor t4 with shape [2, 3]
+tf.shape(tf.concat(0, [t3, t4])) ==> [4, 3]
+tf.shape(tf.concat(1, [t3, t4])) ==> [2, 6]
+```
+
+##### Args:
+
+
+* <b>concat_dim</b>: 0-D `int32` `Tensor`. Dimension along which to concatenate.
+* <b>values</b>: A list of `Tensor` objects or a single `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` resulting from concatenation of the input tensors.
+
+
+- - -
+
+### tf.pack(values, name='pack') <div class="md-anchor" id="pack">{#pack}</div>
+
+Packs a list of rank-`R` tensors into one rank-`(R+1)` tensor.
+
+Packs tensors in `values` into a tensor with rank one higher than each tensor
+in `values` and shape `[len(values)] + values[0].shape`. The output satisfies
+`output[i, ...] = values[i][...]`.
+
+This is the opposite of unpack. The numpy equivalent is
+
+ tf.pack([x, y, z]) = np.asarray([x, y, z])
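+
+A short illustrative sketch (the tensors `x`, `y`, and `z` are assumed to be
+1-D tensors of length 2):
+
+```python
+# x = [1, 4], y = [2, 5], z = [3, 6]
+tf.pack([x, y, z]) ==> [[1, 4], [2, 5], [3, 6]]  # shape [3, 2]
+```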
+
+##### Args:
+
+
+* <b>values</b>: A list of `Tensor` objects with the same shape and type.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+
+* <b>output</b>: A packed `Tensor` with the same type as `values`.
+
+
+- - -
+
+### tf.unpack(value, num=None, name='unpack') <div class="md-anchor" id="unpack">{#unpack}</div>
+
+Unpacks the outer dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
+
+Unpacks `num` tensors from `value` along the first dimension.
+If `num` is not specified (the default), it is inferred from `value`'s shape.
+If `value.shape[0]` is not known, `ValueError` is raised.
+
+The ith tensor in `output` is the slice `value[i, ...]`. Each tensor in
+`output` has shape `value.shape[1:]`.
+
+This is the opposite of pack. The numpy equivalent is
+
+ tf.unpack(x, n) = list(x)
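+
+A short illustrative sketch (the tensor `x` below is an assumption for this
+example):
+
+```python
+# 'x' is [[1, 2, 3], [4, 5, 6]]
+a, b = tf.unpack(x)  # a ==> [1, 2, 3], b ==> [4, 5, 6]
+```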
+
+##### Args:
+
+
+* <b>value</b>: A rank `R > 0` `Tensor` to be unpacked.
+* <b>num</b>: An `int`. The first dimension of value. Automatically inferred if
+ `None` (the default).
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The list of `Tensor` objects unpacked from `value`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `num` is unspecified and cannot be inferred.
+
+
+- - -
+
+### tf.reverse_sequence(input, seq_lengths, seq_dim, name=None) <div class="md-anchor" id="reverse_sequence">{#reverse_sequence}</div>
+
+Reverses variable length slices in dimension `seq_dim`.
+
+This op first slices `input` along the first dimension, and for each slice `i`,
+reverses the first `seq_lengths[i]` elements along the dimension `seq_dim`.
+
+The elements of `seq_lengths` must obey `seq_lengths[i] < input.dims[seq_dim]`,
+and `seq_lengths` must be a vector of length `input.dims(0)`.
+
+The output slice `i` along dimension 0 is then given by input slice `i`, with
+the first `seq_lengths[i]` slices along dimension `seq_dim` reversed.
+
+For example:
+
+```prettyprint
+# Given this:
+seq_dim = 1
+input.dims = (4, ...)
+seq_lengths = [7, 2, 3, 5]
+
+# then slices of input are reversed on seq_dim, but only up to seq_lengths:
+output[0, 0:7, :, ...] = input[0, 7:0:-1, :, ...]
+output[1, 0:2, :, ...] = input[1, 2:0:-1, :, ...]
+output[2, 0:3, :, ...] = input[2, 3:0:-1, :, ...]
+output[3, 0:5, :, ...] = input[3, 5:0:-1, :, ...]
+
+# while entries past seq_lens are copied through:
+output[0, 7:, :, ...] = input[0, 7:, :, ...]
+output[1, 2:, :, ...] = input[1, 2:, :, ...]
+output[2, 3:, :, ...] = input[2, 3:, :, ...]
+output[3, 5:, :, ...] = input[3, 5:, :, ...]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. The input to reverse.
+* <b>seq_lengths</b>: A `Tensor` of type `int64`.
+ 1-D with length `input.dims(0)` and
+ `max(seq_lengths) < input.dims(seq_dim)`
+* <b>seq_dim</b>: An `int`. The dimension which is partially reversed.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+ The partially reversed input. It has the same shape as `input`.
+
+
+- - -
+
+### tf.reverse(tensor, dims, name=None) <div class="md-anchor" id="reverse">{#reverse}</div>
+
+Reverses specific dimensions of a tensor.
+
+Given a `tensor`, and a `bool` tensor `dims` representing the dimensions
+of `tensor`, this operation reverses each dimension i of `tensor` where
+`dims[i]` is `True`.
+
+`tensor` can have up to 8 dimensions. The number of dimensions
+of `tensor` must equal the number of elements in `dims`. In other words:
+
+`rank(tensor) = size(dims)`
+
+For example:
+
+```prettyprint
+# tensor 't' is [[[[ 0, 1, 2, 3],
+# [ 4, 5, 6, 7],
+# [ 8, 9, 10, 11]],
+# [[12, 13, 14, 15],
+# [16, 17, 18, 19],
+# [20, 21, 22, 23]]]]
+# tensor 't' shape is [1, 2, 3, 4]
+
+# 'dims' is [False, False, False, True]
+reverse(t, dims) ==> [[[[ 3,  2,  1,  0],
+                        [ 7,  6,  5,  4],
+                        [11, 10,  9,  8]],
+                       [[15, 14, 13, 12],
+                        [19, 18, 17, 16],
+                        [23, 22, 21, 20]]]]
+
+# 'dims' is [False, True, False, False]
+reverse(t, dims) ==> [[[[12, 13, 14, 15],
+                        [16, 17, 18, 19],
+                        [20, 21, 22, 23]],
+                       [[ 0,  1,  2,  3],
+                        [ 4,  5,  6,  7],
+                        [ 8,  9, 10, 11]]]]
+
+# 'dims' is [False, False, True, False]
+reverse(t, dims) ==> [[[[ 8,  9, 10, 11],
+                        [ 4,  5,  6,  7],
+                        [ 0,  1,  2,  3]],
+                       [[20, 21, 22, 23],
+                        [16, 17, 18, 19],
+                        [12, 13, 14, 15]]]]
+```
+
+##### Args:
+
+
+* <b>tensor</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int32`, `bool`, `float32`, `float64`.
+ Up to 8-D.
+* <b>dims</b>: A `Tensor` of type `bool`. 1-D. The dimensions to reverse.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `tensor`. The same shape as `tensor`.
+
+
+- - -
+
+### tf.transpose(a, perm=None, name='transpose') <div class="md-anchor" id="transpose">{#transpose}</div>
+
+Transposes `a`. Permutes the dimensions according to `perm`.
+
+The returned tensor's dimension i will correspond to the input dimension
+`perm[i]`. If `perm` is not given, it is set to (n-1...0), where n is
+the rank of the input tensor. Hence by default, this operation performs a
+regular matrix transpose on 2-D input Tensors.
+
+For example:
+
+```python
+# 'x' is [[1 2 3]
+# [4 5 6]]
+tf.transpose(x) ==> [[1 4]
+                     [2 5]
+                     [3 6]]
+
+# Equivalently
+tf.transpose(x, perm=[1, 0]) ==> [[1 4]
+                                  [2 5]
+                                  [3 6]]
+
+# 'perm' is more useful for n-dimensional tensors, for n > 2
+# 'x' is [[[1 2 3]
+#          [4 5 6]]
+#         [[7 8 9]
+#          [10 11 12]]]
+# Take the transpose of the matrices in dimension-0
+tf.transpose(x, perm=[0, 2, 1]) ==> [[[1 4]
+                                      [2 5]
+                                      [3 6]]
+
+                                     [[7 10]
+                                      [8 11]
+                                      [9 12]]]
+```
+
+##### Args:
+
+
+* <b>a</b>: A `Tensor`.
+* <b>perm</b>: A permutation of the dimensions of `a`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A transposed `Tensor`.
+
+
+- - -
+
+### tf.gather(params, indices, name=None) <div class="md-anchor" id="gather">{#gather}</div>
+
+Gather slices from `params` according to `indices`.
+
+`indices` must be an integer tensor of any dimension (usually 0-D or 1-D).
+Produces an output tensor with shape `indices.shape + params.shape[1:]` where:
+
+ # Scalar indices
+ output[:, ..., :] = params[indices, :, ... :]
+
+ # Vector indices
+ output[i, :, ..., :] = params[indices[i], :, ... :]
+
+ # Higher rank indices
+ output[i, ..., j, :, ... :] = params[indices[i, ..., j], :, ..., :]
+
+If `indices` is a permutation and `len(indices) == params.shape[0]` then
+this operation will permute `params` accordingly.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/Gather.png" alt>
+</div>
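+
+A concrete sketch (the values below are chosen only for illustration):
+
+```python
+# 'params' is [10, 20, 30, 40]
+# 'indices' is [2, 0, 2, 1]
+tf.gather(params, indices) ==> [30, 10, 30, 20]
+```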
+
+##### Args:
+
+
+* <b>params</b>: A `Tensor`.
+* <b>indices</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `params`.
+
+
+- - -
+
+### tf.dynamic_partition(data, partitions, num_partitions, name=None) <div class="md-anchor" id="dynamic_partition">{#dynamic_partition}</div>
+
+Partitions `data` into `num_partitions` tensors using indices from `partitions`.
+
+For each index tuple `js` of size `partitions.ndim`, the slice `data[js, ...]`
+becomes part of `outputs[partitions[js]]`. The slices with `partitions[js] = i`
+are placed in `outputs[i]` in lexicographic order of `js`, and the first
+dimension of `outputs[i]` is the number of entries in `partitions` equal to `i`.
+In detail,
+
+ outputs[i].shape = [sum(partitions == i)] + data.shape[partitions.ndim:]
+
+ outputs[i] = pack([data[js, ...] for js if partitions[js] == i])
+
+`data.shape` must start with `partitions.shape`.
+
+For example:
+
+ # Scalar partitions
+ partitions = 1
+ num_partitions = 2
+ data = [10, 20]
+ outputs[0] = [] # Empty with shape [0, 2]
+ outputs[1] = [[10, 20]]
+
+ # Vector partitions
+ partitions = [0, 0, 1, 1, 0]
+ num_partitions = 2
+ data = [10, 20, 30, 40, 50]
+ outputs[0] = [10, 20, 50]
+ outputs[1] = [30, 40]
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/DynamicPartition.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`.
+* <b>partitions</b>: A `Tensor` of type `int32`.
+ Any shape. Indices in the range `[0, num_partitions)`.
+* <b>num_partitions</b>: An `int` that is `>= 1`.
+ The number of partitions to output.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A list of `num_partitions` `Tensor` objects of the same type as data.
+
+
+- - -
+
+### tf.dynamic_stitch(indices, data, name=None) <div class="md-anchor" id="dynamic_stitch">{#dynamic_stitch}</div>
+
+Interleave the values from the `data` tensors into a single tensor.
+
+Builds a merged tensor such that
+
+ merged[indices[m][i, ..., j], ...] = data[m][i, ..., j, ...]
+
+For example, if each `indices[m]` is scalar or vector, we have
+
+ # Scalar indices
+ merged[indices[m], ...] = data[m][...]
+
+ # Vector indices
+ merged[indices[m][i], ...] = data[m][i, ...]
+
+Each `data[i].shape` must start with the corresponding `indices[i].shape`,
+and the rest of `data[i].shape` must be constant w.r.t. `i`. That is, we
+must have `data[i].shape = indices[i].shape + constant`. In terms of this
+`constant`, the output shape is
+
+ merged.shape = [max(indices) + 1] + constant
+
+Values are merged in order, so if an index appears in both `indices[m][i]` and
+`indices[n][j]` for `(m,i) < (n,j)` the slice `data[n][j]` will appear in the
+merged result.
+
+For example:
+
+ indices[0] = 6
+ indices[1] = [4, 1]
+ indices[2] = [[5, 2], [0, 3]]
+ data[0] = [61, 62]
+ data[1] = [[41, 42], [11, 12]]
+ data[2] = [[[51, 52], [21, 22]], [[1, 2], [31, 32]]]
+ merged = [[1, 2], [11, 12], [21, 22], [31, 32], [41, 42],
+ [51, 52], [61, 62]]
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/DynamicStitch.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>indices</b>: A list of at least 2 `Tensor` objects of type `int32`.
+* <b>data</b>: A list with the same number of `Tensor` objects as `indices` of `Tensor` objects of the same type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/client.md b/tensorflow/g3doc/api_docs/python/client.md
new file mode 100644
index 0000000000..b37057e4b5
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/client.md
@@ -0,0 +1,638 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Running Graphs
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Session management](#AUTOGENERATED-session-management)
+ * [class tf.Session](#Session)
+ * [tf.get_default_session()](#get_default_session)
+* [Error classes](#AUTOGENERATED-error-classes)
+ * [class tf.OpError](#OpError)
+ * [class tf.errors.CancelledError](#CancelledError)
+ * [class tf.errors.UnknownError](#UnknownError)
+ * [class tf.errors.InvalidArgumentError](#InvalidArgumentError)
+ * [class tf.errors.DeadlineExceededError](#DeadlineExceededError)
+ * [class tf.errors.NotFoundError](#NotFoundError)
+ * [class tf.errors.AlreadyExistsError](#AlreadyExistsError)
+ * [class tf.errors.PermissionDeniedError](#PermissionDeniedError)
+ * [class tf.errors.UnauthenticatedError](#UnauthenticatedError)
+ * [class tf.errors.ResourceExhaustedError](#ResourceExhaustedError)
+ * [class tf.errors.FailedPreconditionError](#FailedPreconditionError)
+ * [class tf.errors.AbortedError](#AbortedError)
+ * [class tf.errors.OutOfRangeError](#OutOfRangeError)
+ * [class tf.errors.UnimplementedError](#UnimplementedError)
+ * [class tf.errors.InternalError](#InternalError)
+ * [class tf.errors.UnavailableError](#UnavailableError)
+ * [class tf.errors.DataLossError](#DataLossError)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+This library contains classes for launching graphs and executing operations.
+
+The [basic usage](../../get_started/index.md#basic-usage) guide has
+examples of how a graph is launched in a [`tf.Session`](#Session).
+
+## Session management <div class="md-anchor" id="AUTOGENERATED-session-management">{#AUTOGENERATED-session-management}</div>
+
+- - -
+
+### class tf.Session <div class="md-anchor" id="Session">{#Session}</div>
+
+A class for running TensorFlow operations.
+
+A `Session` object encapsulates the environment in which `Operation`
+objects are executed, and `Tensor` objects are evaluated. For
+example:
+
+```python
+# Build a graph.
+a = tf.constant(5.0)
+b = tf.constant(6.0)
+c = a * b
+
+# Launch the graph in a session.
+sess = tf.Session()
+
+# Evaluate the tensor `c`.
+print sess.run(c)
+```
+
+A session may own resources, such as
+[variables](state_ops.md#Variable), [queues](io_ops.md#QueueBase),
+and [readers](io_ops.md#ReaderBase). It is important to release
+these resources when they are no longer required. To do this, either
+invoke the [`close()`](#Session.close) method on the session, or use
+the session as a context manager. The following two examples are
+equivalent:
+
+```python
+# Using the `close()` method.
+sess = tf.Session()
+sess.run(...)
+sess.close()
+
+# Using the context manager.
+with tf.Session() as sess:
+ sess.run(...)
+```
+
+The [`ConfigProto`]
+(https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/config.proto)
+protocol buffer exposes various configuration options for a
+session. For example, to create a session that uses soft constraints
+for device placement, and log the resulting placement decisions,
+create a session as follows:
+
+```python
+# Launch the graph in a session that allows soft device placement and
+# logs the placement decisions.
+sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
+ log_device_placement=True))
+```
+
+- - -
+
+#### tf.Session.__init__(target='', graph=None, config=None) {#Session.__init__}
+
+Creates a new TensorFlow session.
+
+If no `graph` argument is specified when constructing the session,
+the default graph will be launched in the session. If you are
+using more than one graph (created with `tf.Graph()`) in the same
+process, you will have to use different sessions for each graph,
+but each graph can be used in multiple sessions. In this case, it
+is often clearer to pass the graph to be launched explicitly to
+the session constructor.
+
+##### Args:
+
+
+* <b>target</b>: (Optional.) The execution engine to connect to.
+ Defaults to using an in-process engine. At present, no value
+ other than the empty string is supported.
+* <b>graph</b>: (Optional.) The `Graph` to be launched (described above).
+* <b>config</b>: (Optional.) A [`ConfigProto`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/config.proto)
+ protocol buffer with configuration options for the session.
+
+
+- - -
+
+#### tf.Session.run(fetches, feed_dict=None) {#Session.run}
+
+Runs the operations and evaluates the tensors in `fetches`.
+
+This method runs one "step" of TensorFlow computation, by
+running the necessary graph fragment to execute every `Operation`
+and evaluate every `Tensor` in `fetches`, substituting the values in
+`feed_dict` for the corresponding input values.
+
+The `fetches` argument may be a list of graph elements or a single
+graph element, and these determine the return value of this
+method. A graph element can be one of the following types:
+
+* If the *i*th element of `fetches` is an
+ [`Operation`](framework.md#Operation), the *i*th return value
+ will be `None`.
+* If the *i*th element of `fetches` is a
+ [`Tensor`](framework.md#Tensor), the *i*th return value will
+ be a numpy ndarray containing the value of that tensor.
+* If the *i*th element of `fetches` is a
+ [`SparseTensor`](sparse_ops.md#SparseTensor), the *i*th
+ return value will be a
+ [`SparseTensorValue`](sparse_ops.md#SparseTensorValue)
+ containing the value of that sparse tensor.
+
+The optional `feed_dict` argument allows the caller to override
+the value of tensors in the graph. Each key in `feed_dict` can be
+one of the following types:
+
+* If the key is a [`Tensor`](framework.md#Tensor), the
+ value may be a Python scalar, string, list, or numpy ndarray
+ that can be converted to the same `dtype` as that
+ tensor. Additionally, if the key is a
+ [placeholder](io_ops.md#placeholder), the shape of the value
+ will be checked for compatibility with the placeholder.
+* If the key is a [`SparseTensor`](sparse_ops.md#SparseTensor),
+ the value should be a
+ [`SparseTensorValue`](sparse_ops.md#SparseTensorValue).
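+
+A minimal sketch of a step with a feed (the placeholder and values here are
+illustrative, not taken from this documentation):
+
+```python
+x = tf.placeholder(tf.float32, shape=[2])
+y = x * 2.0
+
+with tf.Session() as sess:
+  # Feed a value for the placeholder 'x' and fetch the value of 'y'.
+  print sess.run(y, feed_dict={x: [1.0, 2.0]})  # ==> [ 2.  4.]
+```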
+
+##### Args:
+
+
+* <b>fetches</b>: A single graph element, or a list of graph elements
+ (described above).
+* <b>feed_dict</b>: A dictionary that maps graph elements to values
+ (described above).
+
+##### Returns:
+
+ Either a single value if `fetches` is a single graph element, or
+ a list of values if `fetches` is a list (described above).
+
+##### Raises:
+
+
+* <b>RuntimeError</b>: If this `Session` is in an invalid state (e.g. has been
+ closed).
+* <b>TypeError</b>: If `fetches` or `feed_dict` keys are of an inappropriate type.
+* <b>ValueError</b>: If `fetches` or `feed_dict` keys are invalid or refer to a
+ `Tensor` that doesn't exist.
+
+
+- - -
+
+#### tf.Session.close() {#Session.close}
+
+Closes this session.
+
+Calling this method frees all resources associated with the session.
+
+##### Raises:
+
+
+* <b>RuntimeError</b>: If an error occurs while closing the session.
+
+
+
+- - -
+
+#### tf.Session.graph {#Session.graph}
+
+The graph that was launched in this session.
+
+
+- - -
+
+#### tf.Session.as_default() {#Session.as_default}
+
+Returns a context manager that makes this object the default session.
+
+Use with the `with` keyword to specify that calls to
+[`Operation.run()`](framework.md#Operation.run) or
+[`Tensor.run()`](framework.md#Tensor.run) should be executed in
+this session.
+
+```python
+c = tf.constant(...)
+sess = tf.Session()
+
+with sess.as_default():
+ assert tf.get_default_session() is sess
+ print c.eval()
+```
+
+To get the current default session, use
+[`tf.get_default_session()`](#get_default_session).
+
+
+*N.B.* The `as_default` context manager *does not* close the
+session when you exit the context, and you must close the session
+explicitly.
+
+```python
+c = tf.constant(...)
+sess = tf.Session()
+with sess.as_default():
+ print c.eval()
+# ...
+with sess.as_default():
+ print c.eval()
+
+sess.close()
+```
+
+Alternatively, you can use `with tf.Session():` to create a
+session that is automatically closed on exiting the context,
+including when an uncaught exception is raised.
+
+*N.B.* The default session is a property of the current thread. If you
+create a new thread, and wish to use the default session in that
+thread, you must explicitly add a `with sess.as_default():` in that
+thread's function.
+
+##### Returns:
+
+ A context manager using this session as the default session.
+
+
+
+
+- - -
+
+### tf.get_default_session() <div class="md-anchor" id="get_default_session">{#get_default_session}</div>
+
+Returns the default session for the current thread.
+
+The returned `Session` will be the innermost session on which a
+`Session` or `Session.as_default()` context has been entered.
+
+*N.B.* The default session is a property of the current thread. If you
+create a new thread, and wish to use the default session in that
+thread, you must explicitly add a `with sess.as_default():` in that
+thread's function.
+
+##### Returns:
+
+ The default `Session` being used in the current thread.
+
+
+
+## Error classes <div class="md-anchor" id="AUTOGENERATED-error-classes">{#AUTOGENERATED-error-classes}</div>
+
+- - -
+
+### class tf.OpError <div class="md-anchor" id="OpError">{#OpError}</div>
+
+A generic error that is raised when TensorFlow execution fails.
+
+Whenever possible, the session will raise a more specific subclass
+of `OpError` from the `tf.errors` module.
+
+- - -
+
+#### tf.OpError.op {#OpError.op}
+
+The operation that failed, if known.
+
+*N.B.* If the failed op was synthesized at runtime, e.g. a `Send`
+or `Recv` op, there will be no corresponding
+[`Operation`](framework.md#Operation) object. In that case, this
+will return `None`, and you should instead use the
+[`node_def`](#OpError.node_def) to discover information about the op.
+
+##### Returns:
+
+ The `Operation` that failed, or None.
+
+- - -
+
+#### tf.OpError.node_def {#OpError.node_def}
+
+The `NodeDef` proto representing the op that failed.
+
+
+#### Other Methods
+- - -
+
+#### tf.OpError.__init__(node_def, op, message, error_code) {#OpError.__init__}
+
+Creates a new OpError indicating that a particular op failed.
+
+##### Args:
+
+
+* <b>node_def</b>: The graph_pb2.NodeDef proto representing the op that failed.
+* <b>op</b>: The ops.Operation that failed, if known; otherwise None.
+* <b>message</b>: The message string describing the failure.
+* <b>error_code</b>: The error_codes_pb2.Code describing the error.
+
+
+- - -
+
+#### tf.OpError.error_code {#OpError.error_code}
+
+The integer error code that describes the error.
+
+- - -
+
+#### tf.OpError.message {#OpError.message}
+
+The error message that describes the error.
+
+
+- - -
+
+### class tf.errors.CancelledError <div class="md-anchor" id="CancelledError">{#CancelledError}</div>
+
+Raised when an operation or step is cancelled.
+
+For example, a long-running operation (e.g.
+[`queue.enqueue()`](io_ops.md#QueueBase.enqueue)) may be cancelled by
+running another operation (e.g.
+[`queue.close(cancel_pending_enqueues=True)`](io_ops.md#QueueBase.close)),
+or by [closing the session](client.md#Session.close). A step that is
+running such a long-running operation will fail by raising `CancelledError`.
+
+- - -
+
+#### tf.errors.CancelledError.__init__(node_def, op, message) {#CancelledError.__init__}
+
+Creates a `CancelledError`.
+
+
+
+- - -
+
+### class tf.errors.UnknownError <div class="md-anchor" id="UnknownError">{#UnknownError}</div>
+
+Unknown error.
+
+An example of where this error may be returned is if a Status value
+received from another address space belongs to an error-space that
+is not known to this address space. Also errors raised by APIs that
+do not return enough error information may be converted to this
+error.
+
+- - -
+
+#### tf.errors.UnknownError.__init__(node_def, op, message, error_code=2) {#UnknownError.__init__}
+
+Creates an `UnknownError`.
+
+
+
+- - -
+
+### class tf.errors.InvalidArgumentError <div class="md-anchor" id="InvalidArgumentError">{#InvalidArgumentError}</div>
+
+Raised when an operation receives an invalid argument.
+
+This may occur, for example, if an operation receives an input
+tensor that has an invalid value or shape. For example, the
+[`tf.matmul()`](math_ops.md#matmul) op will raise this error if it
+receives an input that is not a matrix, and the
+[`tf.reshape()`](array_ops.md#reshape) op will raise this error if
+the new shape does not match the number of elements in the input
+tensor.
+
+- - -
+
+#### tf.errors.InvalidArgumentError.__init__(node_def, op, message) {#InvalidArgumentError.__init__}
+
+Creates an `InvalidArgumentError`.
+
+
+
+- - -
+
+### class tf.errors.DeadlineExceededError <div class="md-anchor" id="DeadlineExceededError">{#DeadlineExceededError}</div>
+
+Raised when a deadline expires before an operation could complete.
+
+This exception is not currently used.
+
+- - -
+
+#### tf.errors.DeadlineExceededError.__init__(node_def, op, message) {#DeadlineExceededError.__init__}
+
+Creates a `DeadlineExceededError`.
+
+
+
+- - -
+
+### class tf.errors.NotFoundError <div class="md-anchor" id="NotFoundError">{#NotFoundError}</div>
+
+Raised when a requested entity (e.g., a file or directory) was not found.
+
+For example, running the
+[`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation
+could raise `NotFoundError` if it receives the name of a file that
+does not exist.
+
+- - -
+
+#### tf.errors.NotFoundError.__init__(node_def, op, message) {#NotFoundError.__init__}
+
+Creates a `NotFoundError`.
+
+
+
+- - -
+
+### class tf.errors.AlreadyExistsError <div class="md-anchor" id="AlreadyExistsError">{#AlreadyExistsError}</div>
+
+Raised when an entity that we attempted to create already exists.
+
+For example, running an operation that saves a file
+(e.g. [`tf.train.Saver.save()`](train.md#Saver.save)) could
+potentially raise this exception if an explicit filename for an
+existing file was passed.
+
+- - -
+
+#### tf.errors.AlreadyExistsError.__init__(node_def, op, message) {#AlreadyExistsError.__init__}
+
+Creates an `AlreadyExistsError`.
+
+
+
+- - -
+
+### class tf.errors.PermissionDeniedError <div class="md-anchor" id="PermissionDeniedError">{#PermissionDeniedError}</div>
+
+Raised when the caller does not have permission to run an operation.
+
+For example, running the
+[`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation
+could raise `PermissionDeniedError` if it receives the name of a
+file for which the user does not have the read file permission.
+
+- - -
+
+#### tf.errors.PermissionDeniedError.__init__(node_def, op, message) {#PermissionDeniedError.__init__}
+
+Creates a `PermissionDeniedError`.
+
+
+
+- - -
+
+### class tf.errors.UnauthenticatedError <div class="md-anchor" id="UnauthenticatedError">{#UnauthenticatedError}</div>
+
+The request does not have valid authentication credentials.
+
+This exception is not currently used.
+
+- - -
+
+#### tf.errors.UnauthenticatedError.__init__(node_def, op, message) {#UnauthenticatedError.__init__}
+
+Creates an `UnauthenticatedError`.
+
+
+
+- - -
+
+### class tf.errors.ResourceExhaustedError <div class="md-anchor" id="ResourceExhaustedError">{#ResourceExhaustedError}</div>
+
+Some resource has been exhausted.
+
+For example, this error might be raised if a per-user quota is
+exhausted, or perhaps the entire file system is out of space.
+
+- - -
+
+#### tf.errors.ResourceExhaustedError.__init__(node_def, op, message) {#ResourceExhaustedError.__init__}
+
+Creates a `ResourceExhaustedError`.
+
+
+
+- - -
+
+### class tf.errors.FailedPreconditionError <div class="md-anchor" id="FailedPreconditionError">{#FailedPreconditionError}</div>
+
+Operation was rejected because the system is not in a state to execute it.
+
+This exception is most commonly raised when running an operation
+that reads a [`tf.Variable`](state_ops.md#Variable) before it has
+been initialized.
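+
+A minimal sketch of how this error might be caught (the variable here is
+illustrative):
+
+```python
+v = tf.Variable(42.0)
+with tf.Session() as sess:
+  try:
+    sess.run(v)  # 'v' has not been initialized yet.
+  except tf.errors.FailedPreconditionError as e:
+    print "caught:", e.message
+```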
+
+- - -
+
+#### tf.errors.FailedPreconditionError.__init__(node_def, op, message) {#FailedPreconditionError.__init__}
+
+Creates a `FailedPreconditionError`.
+
+
+
+- - -
+
+### class tf.errors.AbortedError <div class="md-anchor" id="AbortedError">{#AbortedError}</div>
+
+The operation was aborted, typically due to a concurrent action.
+
+For example, running a [`queue.enqueue()`](io_ops.md#QueueBase.enqueue)
+operation may raise `AbortedError` if a
+[`queue.close()`](io_ops.md#QueueBase.close) operation previously ran.
+
+- - -
+
+#### tf.errors.AbortedError.__init__(node_def, op, message) {#AbortedError.__init__}
+
+Creates an `AbortedError`.
+
+
+
+- - -
+
+### class tf.errors.OutOfRangeError <div class="md-anchor" id="OutOfRangeError">{#OutOfRangeError}</div>
+
+Raised when an operation executed past the valid range.
+
+This exception is raised in "end-of-file" conditions, such as when a
+[`queue.dequeue()`](io_ops.md#QueueBase.dequeue) operation is
+blocked on an empty queue, and a
+[`queue.close()`](io_ops.md#QueueBase.close) operation executes.
+
+- - -
+
+#### tf.errors.OutOfRangeError.__init__(node_def, op, message) {#OutOfRangeError.__init__}
+
+Creates an `OutOfRangeError`.
+
+
+
+- - -
+
+### class tf.errors.UnimplementedError <div class="md-anchor" id="UnimplementedError">{#UnimplementedError}</div>
+
+Raised when an operation has not been implemented.
+
+Some operations may raise this error when passed otherwise-valid
+arguments that they do not currently support. For example, running
+the [`tf.nn.max_pool()`](nn.md#max_pool) operation would raise this
+error if pooling was requested on the batch dimension, because this
+is not yet supported.
+
+- - -
+
+#### tf.errors.UnimplementedError.__init__(node_def, op, message) {#UnimplementedError.__init__}
+
+Creates an `UnimplementedError`.
+
+
+
+- - -
+
+### class tf.errors.InternalError <div class="md-anchor" id="InternalError">{#InternalError}</div>
+
+Raised when the system experiences an internal error.
+
+This exception is raised when some invariant expected by the runtime
+has been broken. Catching this exception is not recommended.
+
+- - -
+
+#### tf.errors.InternalError.__init__(node_def, op, message) {#InternalError.__init__}
+
+Creates an `InternalError`.
+
+
+
+- - -
+
+### class tf.errors.UnavailableError <div class="md-anchor" id="UnavailableError">{#UnavailableError}</div>
+
+Raised when the runtime is currently unavailable.
+
+This exception is not currently used.
+
+- - -
+
+#### tf.errors.UnavailableError.__init__(node_def, op, message) {#UnavailableError.__init__}
+
+Creates an `UnavailableError`.
+
+
+
+- - -
+
+### class tf.errors.DataLossError <div class="md-anchor" id="DataLossError">{#DataLossError}</div>
+
+Raised when unrecoverable data loss or corruption is encountered.
+
+For example, this may be raised by running a
+[`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation,
+if the file is truncated while it is being read.
+
+- - -
+
+#### tf.errors.DataLossError.__init__(node_def, op, message) {#DataLossError.__init__}
+
+Creates a `DataLossError`.
+
+
+
diff --git a/tensorflow/g3doc/api_docs/python/constant_op.md b/tensorflow/g3doc/api_docs/python/constant_op.md
new file mode 100644
index 0000000000..34d2b511ab
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/constant_op.md
@@ -0,0 +1,565 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Constants, Sequences, and Random Values
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Constant Value Tensors](#AUTOGENERATED-constant-value-tensors)
+ * [tf.zeros(shape, dtype=tf.float32, name=None)](#zeros)
+ * [tf.zeros_like(tensor, dtype=None, name=None)](#zeros_like)
+ * [tf.ones(shape, dtype=tf.float32, name=None)](#ones)
+ * [tf.ones_like(tensor, dtype=None, name=None)](#ones_like)
+ * [tf.fill(dims, value, name=None)](#fill)
+ * [tf.constant(value, dtype=None, shape=None, name='Const')](#constant)
+* [Sequences](#AUTOGENERATED-sequences)
+ * [tf.linspace(start, stop, num, name=None)](#linspace)
+ * [tf.range(start, limit, delta=1, name='range')](#range)
+* [Random Tensors](#AUTOGENERATED-random-tensors)
+ * [Examples:](#AUTOGENERATED-examples-)
+ * [tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)](#random_normal)
+ * [tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)](#truncated_normal)
+ * [tf.random_uniform(shape, minval=0.0, maxval=1.0, dtype=tf.float32, seed=None, name=None)](#random_uniform)
+ * [tf.random_shuffle(value, seed=None, name=None)](#random_shuffle)
+ * [tf.set_random_seed(seed)](#set_random_seed)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Constant Value Tensors <div class="md-anchor" id="AUTOGENERATED-constant-value-tensors">{#AUTOGENERATED-constant-value-tensors}</div>
+
+TensorFlow provides several operations that you can use to generate constants.
+
+- - -
+
+### tf.zeros(shape, dtype=tf.float32, name=None) <div class="md-anchor" id="zeros">{#zeros}</div>
+
+Creates a tensor with all elements set to zero.
+
+This operation returns a tensor of type `dtype` with shape `shape` and
+all elements set to zero.
+
+For example:
+
+```python
+tf.zeros([3, 4], int32) ==> [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
+```
+
+##### Args:
+
+
+* <b>shape</b>: Either a list of integers, or a 1-D `Tensor` of type `int32`.
+* <b>dtype</b>: The type of an element in the resulting `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with all elements set to zero.
+
+
+- - -
+
+### tf.zeros_like(tensor, dtype=None, name=None) <div class="md-anchor" id="zeros_like">{#zeros_like}</div>
+
+Creates a tensor with all elements set to zero.
+
+Given a single tensor (`tensor`), this operation returns a tensor of the
+same type and shape as `tensor` with all elements set to zero. Optionally,
+you can use `dtype` to specify a new type for the returned tensor.
+
+For example:
+
+```python
+# 'tensor' is [[1, 2, 3], [4, 5, 6]]
+tf.zeros_like(tensor) ==> [[0, 0, 0], [0, 0, 0]]
+```
+
+##### Args:
+
+
+* <b>tensor</b>: A `Tensor`.
+* <b>dtype</b>: A type for the returned `Tensor`. Must be `float32`, `float64`,
+ `int8`, `int16`, `int32`, `int64`, `uint8`, or `complex64`.
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with all elements set to zero.
+
+
+
+- - -
+
+### tf.ones(shape, dtype=tf.float32, name=None) <div class="md-anchor" id="ones">{#ones}</div>
+
+Creates a tensor with all elements set to 1.
+
+This operation returns a tensor of type `dtype` with shape `shape` and all
+elements set to 1.
+
+For example:
+
+```python
+tf.ones([2, 3], int32) ==> [[1, 1, 1], [1, 1, 1]]
+```
+
+##### Args:
+
+
+* <b>shape</b>: Either a list of integers, or a 1-D `Tensor` of type `int32`.
+* <b>dtype</b>: The type of an element in the resulting `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with all elements set to 1.
+
+
+- - -
+
+### tf.ones_like(tensor, dtype=None, name=None) <div class="md-anchor" id="ones_like">{#ones_like}</div>
+
+Creates a tensor with all elements set to 1.
+
+Given a single tensor (`tensor`), this operation returns a tensor of the same
+type and shape as `tensor` with all elements set to 1. Optionally, you can
+specify a new type (`dtype`) for the returned tensor.
+
+For example:
+
+```python
+# 'tensor' is [[1, 2, 3], [4, 5, 6]]
+tf.ones_like(tensor) ==> [[1, 1, 1], [1, 1, 1]]
+```
+
+##### Args:
+
+
+* <b>tensor</b>: A `Tensor`.
+* <b>dtype</b>: A type for the returned `Tensor`. Must be `float32`, `float64`,
+ `int8`, `int16`, `int32`, `int64`, `uint8`, or `complex64`.
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with all elements set to 1.
+
+
+
+- - -
+
+### tf.fill(dims, value, name=None) <div class="md-anchor" id="fill">{#fill}</div>
+
+Creates a tensor filled with a scalar value.
+
+This operation creates a tensor of shape `dims` and fills it with `value`.
+
+For example:
+
+```prettyprint
+# output tensor shape needs to be [2, 3]
+# so 'dims' is [2, 3]
+fill(dims, 9) ==> [[9, 9, 9]
+                   [9, 9, 9]]
+```
+
+##### Args:
+
+
+* <b>dims</b>: A `Tensor` of type `int32`.
+ 1-D. Represents the shape of the output tensor.
+* <b>value</b>: A `Tensor`. 0-D (scalar). Value to fill the returned tensor.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `value`.
+
+
+
+- - -
+
+### tf.constant(value, dtype=None, shape=None, name='Const') <div class="md-anchor" id="constant">{#constant}</div>
+
+Creates a constant tensor.
+
+ The resulting tensor is populated with values of type `dtype`, as
+ specified by arguments `value` and (optionally) `shape` (see examples
+ below).
+
+ The argument `value` can be a constant value, or a list of values of type
+ `dtype`. If `value` is a list, then the length of the list must be less
+ than or equal to the number of elements implied by the `shape` argument (if
+ specified). In the case where the list length is less than the number of
+ elements specified by `shape`, the last element in the list will be used
+ to fill the remaining entries.
+
+ The argument `shape` is optional. If present, it specifies the dimensions
+ of the resulting tensor. If not present, then the tensor is a scalar (0-D)
+ if `value` is a scalar, or 1-D otherwise.
+
+ If the argument `dtype` is not specified, then the type is inferred from
+ the type of `value`.
+
+ For example:
+
+ ```python
+ # Constant 1-D Tensor populated with value list.
+ tensor = tf.constant([1, 2, 3, 4, 5, 6, 7]) => [1 2 3 4 5 6 7]
+
+ # Constant 2-D tensor populated with scalar value -1.
+ tensor = tf.constant(-1.0, shape=[2, 3]) => [[-1. -1. -1.]
+                                               [-1. -1. -1.]]
+ ```
+
+##### Args:
+
+
+* <b>value</b>: A constant value (or list) of output type `dtype`.
+
+
+* <b>dtype</b>: The type of the elements of the resulting tensor.
+
+
+* <b>shape</b>: Optional dimensions of resulting tensor.
+
+
+* <b>name</b>: Optional name for the tensor.
+
+##### Returns:
+
+ A Constant Tensor.
+
+
+
+## Sequences <div class="md-anchor" id="AUTOGENERATED-sequences">{#AUTOGENERATED-sequences}</div>
+
+- - -
+
+### tf.linspace(start, stop, num, name=None) <div class="md-anchor" id="linspace">{#linspace}</div>
+
+Generates values in an interval.
+
+A sequence of `num` evenly-spaced values is generated beginning at `start`.
+If `num > 1`, the values in the sequence increase by `(stop - start) / (num - 1)`,
+so that the last one is exactly `stop`.
+
+For example:
+
+```
+tf.linspace(10.0, 12.0, 3, name="linspace") => [ 10.0 11.0 12.0]
+```
+
+##### Args:
+
+
+* <b>start</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ First entry in the range.
+* <b>stop</b>: A `Tensor`. Must have the same type as `start`.
+ Last entry in the range.
+* <b>num</b>: A `Tensor` of type `int32`. Number of values to generate.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `start`. 1-D. The generated values.
+
+
+
+- - -
+
+### tf.range(start, limit, delta=1, name='range') <div class="md-anchor" id="range">{#range}</div>
+
+Creates a sequence of integers.
+
+This operation creates a sequence of integers that begins at `start` and
+extends by increments of `delta` up to but not including `limit`.
+
+For example:
+
+```
+# 'start' is 3
+# 'limit' is 18
+# 'delta' is 3
+tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
+```
+
+##### Args:
+
+
+* <b>start</b>: A 0-D (scalar) of type `int32`. First entry in sequence.
+* <b>limit</b>: A 0-D (scalar) of type `int32`. Upper limit of sequence,
+ exclusive.
+* <b>delta</b>: A 0-D `Tensor` (scalar) of type `int32`. Optional. Default is 1.
+ Number that increments `start`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A 1-D `int32` `Tensor`.
+
+
+
+## Random Tensors <div class="md-anchor" id="AUTOGENERATED-random-tensors">{#AUTOGENERATED-random-tensors}</div>
+
+TensorFlow has several ops that create random tensors with different
+distributions. The random ops are stateful, and create new random values each
+time they are evaluated.
+
+The `seed` keyword argument in these functions acts in conjunction with
+the graph-level random seed. Changing either the graph-level seed using
+[`set_random_seed`](constant_op.md#set_random_seed) or the op-level seed
+will change the underlying seed of these operations. Setting neither the
+graph-level nor the op-level seed results in a random seed being used for all
+operations.
+See [`set_random_seed`](constant_op.md#set_random_seed) for details on the
+interaction between operation-level and graph-level random seeds.
+
+### Examples: <div class="md-anchor" id="AUTOGENERATED-examples-">{#AUTOGENERATED-examples-}</div>
+
+```python
+# Create a tensor of shape [2, 3] consisting of random normal values, with mean
+# -1 and standard deviation 4.
+norm = tf.random_normal([2, 3], mean=-1, stddev=4)
+
+# Shuffle the first dimension of a tensor
+c = tf.constant([[1, 2], [3, 4], [5, 6]])
+shuff = tf.random_shuffle(c)
+
+# Each time we run these ops, different results are generated
+sess = tf.Session()
+print sess.run(norm)
+print sess.run(norm)
+
+# Set an op-level seed to generate repeatable sequences across sessions.
+sess = tf.Session()
+norm = tf.random_normal([2, 3], seed=1234)
+print sess.run(norm)
+print sess.run(norm)
+```
+
+Another common use of random values is the initialization of variables. Also see
+the [Variables How To](../../how_tos/variables/index.md).
+
+```python
+# Use random uniform values in [0, 1) as the initializer for a variable of shape
+# [2, 3]. The default type is float32.
+var = tf.Variable(tf.random_uniform([2, 3]), name="var")
+init = tf.initialize_all_variables()
+
+sess = tf.Session()
+sess.run(init)
+print sess.run(var)
+```
+
+- - -
+
+### tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None) <div class="md-anchor" id="random_normal">{#random_normal}</div>
+
+Outputs random values from a normal distribution.
+
+##### Args:
+
+
+* <b>shape</b>: A 1-D integer Tensor or Python array. The shape of the output tensor.
+* <b>mean</b>: A 0-D Tensor or Python value of type `dtype`. The mean of the normal
+ distribution.
+* <b>stddev</b>: A 0-D Tensor or Python value of type `dtype`. The standard deviation
+ of the normal distribution.
+* <b>dtype</b>: The type of the output.
+* <b>seed</b>: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tensor of the specified shape filled with random normal values.
+
+
+- - -
+
+### tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None) <div class="md-anchor" id="truncated_normal">{#truncated_normal}</div>
+
+Outputs random values from a truncated normal distribution.
+
+The generated values follow a normal distribution with specified mean and
+standard deviation, except that values whose magnitude is more than 2 standard
+deviations from the mean are dropped and re-picked.
+
+##### Args:
+
+
+* <b>shape</b>: A 1-D integer Tensor or Python array. The shape of the output tensor.
+* <b>mean</b>: A 0-D Tensor or Python value of type `dtype`. The mean of the
+ truncated normal distribution.
+* <b>stddev</b>: A 0-D Tensor or Python value of type `dtype`. The standard deviation
+ of the truncated normal distribution.
+* <b>dtype</b>: The type of the output.
+* <b>seed</b>: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tensor of the specified shape filled with random truncated normal values.
+
+
+- - -
+
+### tf.random_uniform(shape, minval=0.0, maxval=1.0, dtype=tf.float32, seed=None, name=None) <div class="md-anchor" id="random_uniform">{#random_uniform}</div>
+
+Outputs random values from a uniform distribution.
+
+The generated values follow a uniform distribution in the range
+`[minval, maxval)`. The lower bound `minval` is included in the range, while
+the upper bound `maxval` is excluded.
+
+##### Args:
+
+
+* <b>shape</b>: A 1-D integer Tensor or Python array. The shape of the output tensor.
+* <b>minval</b>: A 0-D Tensor or Python value of type `dtype`. The lower bound on the
+ range of random values to generate.
+* <b>maxval</b>: A 0-D Tensor or Python value of type `dtype`. The upper bound on
+ the range of random values to generate.
+* <b>dtype</b>: The type of the output.
+* <b>seed</b>: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tensor of the specified shape filled with random uniform values.
+
+
+- - -
+
+### tf.random_shuffle(value, seed=None, name=None) <div class="md-anchor" id="random_shuffle">{#random_shuffle}</div>
+
+Randomly shuffles a tensor along its first dimension.
+
+The tensor is shuffled along dimension 0, such that each `value[j]` is mapped
+to one and only one `output[i]`. For example, a mapping that might occur for a
+3x2 tensor is:
+
+```python
+[[1, 2],       [[5, 6],
+ [3, 4],  ==>   [1, 2],
+ [5, 6]]        [3, 4]]
+```
+
+##### Args:
+
+
+* <b>value</b>: A Tensor to be shuffled.
+* <b>seed</b>: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tensor of same shape and type as `value`, shuffled along its first
+ dimension.
+
+
+- - -
+
+### tf.set_random_seed(seed) <div class="md-anchor" id="set_random_seed">{#set_random_seed}</div>
+
+Sets the graph-level random seed.
+
+Operations that rely on a random seed actually derive it from two seeds:
+the graph-level and operation-level seeds. This sets the graph-level seed.
+
+Its interaction with operation-level seeds is as follows:
+
+ 1. If neither the graph-level nor the operation seed is set:
+ A random seed is used for this op.
+ 2. If the graph-level seed is set, but the operation seed is not:
+ The system deterministically picks an operation seed in conjunction
+ with the graph-level seed so that it gets a unique random sequence.
+ 3. If the graph-level seed is not set, but the operation seed is set:
+ A default graph-level seed and the specified operation seed are used to
+ determine the random sequence.
+ 4. If both the graph-level and the operation seed are set:
+ Both seeds are used in conjunction to determine the random sequence.
+
+To illustrate the user-visible effects, consider these examples:
+
+To generate different sequences across sessions, set neither
+graph-level nor op-level seeds:
+
+```python
+a = tf.random_uniform([1])
+b = tf.random_normal([1])
+
+print "Session 1"
+with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+print "Session 2"
+with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A3'
+ print sess2.run(a) # generates 'A4'
+ print sess2.run(b) # generates 'B3'
+ print sess2.run(b) # generates 'B4'
+```
+
+To generate the same repeatable sequence for an op across sessions, set the
+seed for the op:
+
+```python
+a = tf.random_uniform([1], seed=1)
+b = tf.random_normal([1])
+
+# Repeatedly running this block with the same graph will generate the same
+# sequence of values for 'a', but different sequences of values for 'b'.
+print "Session 1"
+with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+print "Session 2"
+with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A1'
+ print sess2.run(a) # generates 'A2'
+ print sess2.run(b) # generates 'B3'
+ print sess2.run(b) # generates 'B4'
+```
+
+To make the random sequences generated by all ops be repeatable across
+sessions, set a graph-level seed:
+
+```python
+tf.set_random_seed(1234)
+a = tf.random_uniform([1])
+b = tf.random_normal([1])
+
+# Repeatedly running this block with the same graph will generate the same
+# sequences of 'a' and 'b'.
+print "Session 1"
+with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+print "Session 2"
+with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A1'
+ print sess2.run(a) # generates 'A2'
+ print sess2.run(b) # generates 'B1'
+ print sess2.run(b) # generates 'B2'
+```
+
+##### Args:
+
+
+* <b>seed</b>: integer.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/control_flow_ops.md b/tensorflow/g3doc/api_docs/python/control_flow_ops.md
new file mode 100644
index 0000000000..ad4321f01b
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/control_flow_ops.md
@@ -0,0 +1,590 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Control Flow
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Control Flow Operations](#AUTOGENERATED-control-flow-operations)
+ * [tf.identity(input, name=None)](#identity)
+ * [tf.tuple(tensors, name=None, control_inputs=None)](#tuple)
+ * [tf.group(*inputs, **kwargs)](#group)
+ * [tf.no_op(name=None)](#no_op)
+ * [tf.count_up_to(ref, limit, name=None)](#count_up_to)
+* [Logical Operators](#AUTOGENERATED-logical-operators)
+ * [tf.logical_and(x, y, name=None)](#logical_and)
+ * [tf.logical_not(x, name=None)](#logical_not)
+ * [tf.logical_or(x, y, name=None)](#logical_or)
+ * [tf.logical_xor(x, y, name='LogicalXor')](#logical_xor)
+* [Comparison Operators](#AUTOGENERATED-comparison-operators)
+ * [tf.equal(x, y, name=None)](#equal)
+ * [tf.not_equal(x, y, name=None)](#not_equal)
+ * [tf.less(x, y, name=None)](#less)
+ * [tf.less_equal(x, y, name=None)](#less_equal)
+ * [tf.greater(x, y, name=None)](#greater)
+ * [tf.greater_equal(x, y, name=None)](#greater_equal)
+ * [tf.select(condition, t, e, name=None)](#select)
+ * [tf.where(input, name=None)](#where)
+* [Debugging Operations](#AUTOGENERATED-debugging-operations)
+ * [tf.is_finite(x, name=None)](#is_finite)
+ * [tf.is_inf(x, name=None)](#is_inf)
+ * [tf.is_nan(x, name=None)](#is_nan)
+ * [tf.verify_tensor_all_finite(t, msg, name=None)](#verify_tensor_all_finite)
+ * [tf.check_numerics(tensor, message, name=None)](#check_numerics)
+ * [tf.add_check_numerics_ops()](#add_check_numerics_ops)
+ * [tf.Assert(condition, data, summarize=None, name=None)](#Assert)
+ * [tf.Print(input_, data, message=None, first_n=None, summarize=None, name=None)](#Print)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Control Flow Operations <div class="md-anchor" id="AUTOGENERATED-control-flow-operations">{#AUTOGENERATED-control-flow-operations}</div>
+
+TensorFlow provides several operations and classes that you can use to control
+the execution of operations and add conditional dependencies to your graph.
+
+- - -
+
+### tf.identity(input, name=None) <div class="md-anchor" id="identity">{#identity}</div>
+
+Return a tensor with the same shape and contents as the input tensor or value.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+
+
+- - -
+
+### tf.tuple(tensors, name=None, control_inputs=None) <div class="md-anchor" id="tuple">{#tuple}</div>
+
+Group tensors together.
+
+This creates a tuple of tensors with the same values as the `tensors`
+argument, except that the value of each tensor is only returned after the
+values of all tensors have been computed.
+
+`control_inputs` contains additional ops that have to finish before this op
+finishes, but whose outputs are not returned.
+
+This can be used as a "join" mechanism for parallel computations: all the
+argument tensors can be computed in parallel, but the values of any tensor
+returned by `tuple` are only available after all the parallel computations
+are done.
+
+See also `group` and `with_dependencies`.
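+
+A minimal sketch (the constant tensors are illustrative):
+
+```python
+a = tf.constant(1.0)
+b = tf.constant(2.0)
+# Both tensors are computed before either returned value becomes available.
+a_out, b_out = tf.tuple([a, b])
+```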
+
+##### Args:
+
+
+* <b>tensors</b>: A list of `Tensor`s or `IndexedSlices`, some entries can be `None`.
+* <b>name</b>: (optional) A name to use as a `name_scope` for the operation.
+* <b>control_inputs</b>: List of additional ops to finish before returning.
+
+##### Returns:
+
+ Same as `tensors`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `tensors` does not contain any `Tensor` or `IndexedSlices`.
+
+
+- - -
+
+### tf.group(*inputs, **kwargs) <div class="md-anchor" id="group">{#group}</div>
+
+Create an op that groups multiple operations.
+
+When this op finishes, all ops in `input` have finished. This op has no
+output.
+
+See also `tuple` and `with_dependencies`.
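+
+A minimal sketch (the variables and assignments are illustrative):
+
+```python
+a = tf.Variable(1.0)
+b = tf.Variable(2.0)
+# 'update' has no output value; running it runs both assignments.
+update = tf.group(a.assign(3.0), b.assign(4.0))
+
+with tf.Session() as sess:
+  sess.run(tf.initialize_all_variables())
+  sess.run(update)  # returns None
+```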
+
+##### Args:
+
+
+* <b>*inputs</b>: One or more tensors to group.
+* <b>**kwargs</b>: Optional parameters to pass when constructing the NodeDef.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ An Operation that executes all its inputs.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If an unknown keyword argument is provided, or if there are
+ no inputs.
+
+
+- - -
+
+### tf.no_op(name=None) <div class="md-anchor" id="no_op">{#no_op}</div>
+
+Does nothing. Only useful as a placeholder for control edges.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+### tf.count_up_to(ref, limit, name=None) <div class="md-anchor" id="count_up_to">{#count_up_to}</div>
+
+Increments 'ref' until it reaches 'limit'.
+
+This operation outputs "ref" after the update is done. This makes it
+easier to chain operations that need to use the updated value.
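+
+A minimal sketch (the counter variable is illustrative):
+
+```python
+counter = tf.Variable(0)  # a scalar int32 variable
+next_count = tf.count_up_to(counter, 3)
+
+with tf.Session() as sess:
+  sess.run(tf.initialize_all_variables())
+  print sess.run(next_count)  # ==> 0 (the value before the increment)
+  print sess.run(next_count)  # ==> 1
+```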
+
+##### Args:
+
+
+* <b>ref</b>: A mutable `Tensor`. Must be one of the following types: `int32`, `int64`.
+ Should be from a scalar `Variable` node.
+* <b>limit</b>: An `int`.
+ If incrementing ref would bring it above limit, instead generates an
+ 'OutOfRange' error.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `ref`.
+ A copy of the input before increment. If nothing else modifies the
+ input, the values produced will all be distinct.
+
+
+
+## Logical Operators <div class="md-anchor" id="AUTOGENERATED-logical-operators">{#AUTOGENERATED-logical-operators}</div>
+
+TensorFlow provides several operations that you can use to add logical operators
+to your graph.
+
+- - -
+
+### tf.logical_and(x, y, name=None) <div class="md-anchor" id="logical_and">{#logical_and}</div>
+
+Returns the truth value of x AND y element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `bool`.
+* <b>y</b>: A `Tensor` of type `bool`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.logical_not(x, name=None) <div class="md-anchor" id="logical_not">{#logical_not}</div>
+
+Returns the truth value of NOT x element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `bool`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.logical_or(x, y, name=None) <div class="md-anchor" id="logical_or">{#logical_or}</div>
+
+Returns the truth value of x OR y element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `bool`.
+* <b>y</b>: A `Tensor` of type `bool`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.logical_xor(x, y, name='LogicalXor') <div class="md-anchor" id="logical_xor">{#logical_xor}</div>
+
+x ^ y = (x | y) & ~(x & y).
+
+
+
+## Comparison Operators <div class="md-anchor" id="AUTOGENERATED-comparison-operators">{#AUTOGENERATED-comparison-operators}</div>
+
+TensorFlow provides several operations that you can use to add comparison
+operators to your graph.
+
+- - -
+
+### tf.equal(x, y, name=None) <div class="md-anchor" id="equal">{#equal}</div>
+
+Returns the truth value of (x == y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `complex64`, `quint8`, `qint8`, `qint32`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.not_equal(x, y, name=None) <div class="md-anchor" id="not_equal">{#not_equal}</div>
+
+Returns the truth value of (x != y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `complex64`, `quint8`, `qint8`, `qint32`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.less(x, y, name=None) <div class="md-anchor" id="less">{#less}</div>
+
+Returns the truth value of (x < y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.less_equal(x, y, name=None) <div class="md-anchor" id="less_equal">{#less_equal}</div>
+
+Returns the truth value of (x <= y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.greater(x, y, name=None) <div class="md-anchor" id="greater">{#greater}</div>
+
+Returns the truth value of (x > y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.greater_equal(x, y, name=None) <div class="md-anchor" id="greater_equal">{#greater_equal}</div>
+
+Returns the truth value of (x >= y) element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
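+
+For example, a minimal sketch covering several of the comparison operators
+above (assuming a `tf.Session`; the values are illustrative):
+
+```python
+a = tf.constant([1, 2, 3])
+b = tf.constant([3, 2, 1])
+
+with tf.Session() as sess:
+  print(sess.run(tf.equal(a, b)))          # ==> [False  True False]
+  print(sess.run(tf.not_equal(a, b)))      # ==> [ True False  True]
+  print(sess.run(tf.less(a, b)))           # ==> [ True False False]
+  print(sess.run(tf.greater_equal(a, b)))  # ==> [False  True  True]
+```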
+
+
+- - -
+
+### tf.select(condition, t, e, name=None) <div class="md-anchor" id="select">{#select}</div>
+
+Selects elements from `t` or `e`, depending on `condition`.
+
+The `condition`, `t`, and `e` tensors must all have the same shape,
+and the output will also have that shape. The `condition` tensor acts
+as an element-wise mask that chooses, based on the value at each
+element, whether the corresponding element in the output should be
+taken from `t` (if true) or `e` (if false). For example:
+
+```prettyprint
+# 'condition' tensor is [[True, False]
+# [True, False]]
+# 't' is [[1, 1],
+# [1, 1]]
+# 'e' is [[2, 2],
+# [2, 2]]
+select(condition, t, e) ==> [[1, 2],
+ [1, 2]]
+```
+
+##### Args:
+
+
+* <b>condition</b>: A `Tensor` of type `bool`.
+* <b>t</b>: A `Tensor` with the same shape as `condition`.
+* <b>e</b>: A `Tensor` with the same type and shape as `t`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with the same type and shape as `t` and `e`.
+
+
+- - -
+
+### tf.where(input, name=None) <div class="md-anchor" id="where">{#where}</div>
+
+Returns locations of true values in a boolean tensor.
+
+This operation returns the coordinates of true elements in `input`. The
+coordinates are returned in a 2-D tensor where the first dimension (rows)
+represents the number of true elements, and the second dimension (columns)
+represents the coordinates of the true elements. Keep in mind that the shape of
+the output tensor can vary depending on how many true values there are in
+`input`. Indices are output in row-major order.
+
+For example:
+
+```prettyprint
+# 'input' tensor is [[True, False]
+# [True, False]]
+# 'input' has two true values, so output has two coordinates.
+# 'input' has rank of 2, so coordinates have two indices.
+where(input) ==> [[0, 0],
+ [1, 0]]
+
+# `input` tensor is [[[True, False]
+# [True, False]]
+# [[False, True]
+# [False, True]]
+# [[False, False]
+# [False, True]]]
+# 'input' has 5 true values, so output has 5 coordinates.
+# 'input' has rank of 3, so coordinates have three indices.
+where(input) ==> [[0, 0, 0],
+ [0, 1, 0],
+ [1, 0, 1],
+ [1, 1, 1],
+ [2, 1, 1]]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor` of type `bool`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int64`.
+
+
+
+## Debugging Operations <div class="md-anchor" id="AUTOGENERATED-debugging-operations">{#AUTOGENERATED-debugging-operations}</div>
+
+TensorFlow provides several operations that you can use to validate values and
+debug your graph.
+
+- - -
+
+### tf.is_finite(x, name=None) <div class="md-anchor" id="is_finite">{#is_finite}</div>
+
+Returns which elements of x are finite.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.is_inf(x, name=None) <div class="md-anchor" id="is_inf">{#is_inf}</div>
+
+Returns which elements of x are Inf.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
+
+
+- - -
+
+### tf.is_nan(x, name=None) <div class="md-anchor" id="is_nan">{#is_nan}</div>
+
+Returns which elements of x are NaN.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`.
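+
+For example, a minimal sketch covering the three element-wise checks above
+(assuming a `tf.Session`; the values are illustrative):
+
+```python
+x = tf.constant([1.0, float('inf'), float('nan'), -2.0])
+
+with tf.Session() as sess:
+  print(sess.run(tf.is_finite(x)))  # ==> [ True False False  True]
+  print(sess.run(tf.is_inf(x)))     # ==> [False  True False False]
+  print(sess.run(tf.is_nan(x)))     # ==> [False False  True False]
+```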
+
+
+- - -
+
+### tf.verify_tensor_all_finite(t, msg, name=None) <div class="md-anchor" id="verify_tensor_all_finite">{#verify_tensor_all_finite}</div>
+
+Assert that the tensor does not contain any NaN's or Inf's.
+
+##### Args:
+
+
+* <b>t</b>: Tensor to check.
+* <b>msg</b>: Message to log on failure.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ Same tensor as `t`.
+
+
+- - -
+
+### tf.check_numerics(tensor, message, name=None) <div class="md-anchor" id="check_numerics">{#check_numerics}</div>
+
+Checks a tensor for NaN and Inf values.
+
+When run, reports an `InvalidArgument` error if `tensor` has any values
+that are not a number (NaN) or infinity (Inf). Otherwise, passes `tensor` as-is.
+
+##### Args:
+
+
+* <b>tensor</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>message</b>: A `string`. Prefix of the error message.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `tensor`.
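+
+For example, a minimal sketch (the tensor and message are illustrative):
+
+```python
+x = tf.constant([1.0, 2.0, 3.0])
+checked = tf.check_numerics(x, message="x contains a NaN or Inf: ")
+
+# `checked` carries the same values as `x`; if `x` ever contained a NaN or
+# Inf, evaluating `checked` would raise an `InvalidArgument` error instead.
+with tf.Session() as sess:
+  print(sess.run(checked))  # ==> [ 1.  2.  3.]
+```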
+
+
+- - -
+
+### tf.add_check_numerics_ops() <div class="md-anchor" id="add_check_numerics_ops">{#add_check_numerics_ops}</div>
+
+Connect a check_numerics to every floating point tensor.
+
+`check_numerics` operations themselves are added for each `float` or `double`
+tensor in the graph. For all ops in the graph, the `check_numerics` op for
+all of its (`float` or `double`) inputs is guaranteed to run before the
+`check_numerics` op on any of its outputs.
+
+##### Returns:
+
+ A `group` op depending on all `check_numerics` ops added.
+
+
+- - -
+
+### tf.Assert(condition, data, summarize=None, name=None) <div class="md-anchor" id="Assert">{#Assert}</div>
+
+Asserts that the given condition is true.
+
+If `condition` evaluates to false, print the list of tensors in `data`.
+`summarize` determines how many entries of the tensors to print.
+
+##### Args:
+
+
+* <b>condition</b>: The condition to evaluate.
+* <b>data</b>: The tensors to print out when condition is false.
+* <b>summarize</b>: Print this many entries of each tensor.
+* <b>name</b>: A name for this operation (optional).
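+
+For example, a minimal sketch of one common pattern, in which the assertion is
+attached as a control dependency so that it runs before the value it guards
+(assuming `tf.reduce_all`, `tf.identity`, and `tf.control_dependencies` from
+the same library):
+
+```python
+x = tf.constant([0.5, 1.5])
+assert_positive = tf.Assert(tf.reduce_all(tf.greater(x, 0.0)), [x], summarize=2)
+
+with tf.control_dependencies([assert_positive]):
+  y = tf.identity(x)  # evaluating `y` first checks the assertion
+
+with tf.Session() as sess:
+  print(sess.run(y))  # ==> [ 0.5  1.5]
+```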
+
+
+- - -
+
+### tf.Print(input_, data, message=None, first_n=None, summarize=None, name=None) <div class="md-anchor" id="Print">{#Print}</div>
+
+Prints a list of tensors.
+
+This is an identity op with the side effect of printing `data` when
+evaluating.
+
+##### Args:
+
+
+* <b>input_</b>: A tensor passed through this op.
+* <b>data</b>: A list of tensors to print out when op is evaluated.
+* <b>message</b>: A string, prefix of the printed message.
+* <b>first_n</b>: Only log `first_n` number of times. Negative numbers always
+  log; this is the default.
+* <b>summarize</b>: Only print this many entries of each tensor.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ Same tensor as `input_`.
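+
+For example, a minimal sketch (assuming `tf.reduce_sum` from the same library;
+the message is illustrative):
+
+```python
+x = tf.constant([1.0, 2.0, 3.0])
+x = tf.Print(x, [x, tf.reduce_sum(x)], message="x and sum(x): ")
+y = x * 2.0
+
+with tf.Session() as sess:
+  # Evaluating `y` also evaluates `x`, so the values of `x` and its sum are
+  # logged as a side effect.
+  print(sess.run(y))  # ==> [ 2.  4.  6.]
+```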
+
+
diff --git a/tensorflow/g3doc/api_docs/python/framework.md b/tensorflow/g3doc/api_docs/python/framework.md
new file mode 100644
index 0000000000..e28daaa77a
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/framework.md
@@ -0,0 +1,2079 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Building Graphs
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Core graph data structures](#AUTOGENERATED-core-graph-data-structures)
+ * [class tf.Graph](#Graph)
+ * [class tf.Operation](#Operation)
+ * [class tf.Tensor](#Tensor)
+* [Tensor types](#AUTOGENERATED-tensor-types)
+ * [class tf.DType](#DType)
+ * [tf.as_dtype(type_value)](#as_dtype)
+* [Utility functions](#AUTOGENERATED-utility-functions)
+ * [tf.device(dev)](#device)
+ * [tf.name_scope(name)](#name_scope)
+ * [tf.control_dependencies(control_inputs)](#control_dependencies)
+ * [tf.convert_to_tensor(value, dtype=None, name=None)](#convert_to_tensor)
+ * [tf.get_default_graph()](#get_default_graph)
+ * [tf.import_graph_def(graph_def, input_map=None, return_elements=None, name=None, op_dict=None)](#import_graph_def)
+* [Graph collections](#AUTOGENERATED-graph-collections)
+ * [tf.add_to_collection(name, value)](#add_to_collection)
+ * [tf.get_collection(key, scope=None)](#get_collection)
+ * [class tf.GraphKeys](#GraphKeys)
+* [Defining new operations](#AUTOGENERATED-defining-new-operations)
+ * [class tf.RegisterGradient](#RegisterGradient)
+ * [tf.NoGradient(op_type)](#NoGradient)
+ * [class tf.RegisterShape](#RegisterShape)
+ * [class tf.TensorShape](#TensorShape)
+ * [class tf.Dimension](#Dimension)
+ * [tf.op_scope(*args, **kwds)](#op_scope)
+ * [tf.get_seed(op_seed)](#get_seed)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+Import names from the framework library.
+
+## Core graph data structures <div class="md-anchor" id="AUTOGENERATED-core-graph-data-structures">{#AUTOGENERATED-core-graph-data-structures}</div>
+
+- - -
+
+### class tf.Graph <div class="md-anchor" id="Graph">{#Graph}</div>
+
+A TensorFlow computation, represented as a dataflow graph.
+
+A `Graph` contains a set of [`Operation`](framework.md#Operation) objects,
+which represent units of computation; and [`Tensor`](framework.md#Tensor)
+objects, which represent the units of data that flow between operations.
+
+A default `Graph` is always registered, and accessible by calling
+[`tf.get_default_graph()`](framework.md#get_default_graph). To add an
+operation to the default graph, simply call one of the functions that defines
+a new `Operation`:
+
+```python
+c = tf.constant(4.0)
+assert c.graph is tf.get_default_graph()
+```
+
+Another typical usage involves the
+[`Graph.as_default()`](framework.md#Graph.as_default)
+context manager, which overrides the current default graph for the
+lifetime of the context:
+
+```python
+g = tf.Graph()
+with g.as_default():
+ # Define operations and tensors in `g`.
+ c = tf.constant(30.0)
+ assert c.graph is g
+```
+
+Important note: This class *is not* thread-safe for graph construction. All
+operations should be created from a single thread, or external
+synchronization must be provided. Unless otherwise specified, all methods
+are not thread-safe.
+
+- - -
+
+#### tf.Graph.__init__() {#Graph.__init__}
+
+Creates a new, empty Graph.
+
+
+- - -
+
+#### tf.Graph.as_default() {#Graph.as_default}
+
+Returns a context manager that makes this `Graph` the default graph.
+
+This method should be used if you want to create multiple graphs
+in the same process. For convenience, a global default graph is
+provided, and all ops will be added to this graph if you do not
+create a new graph explicitly. Use this method with the `with` keyword
+to specify that ops created within the scope of a block should be
+added to this graph.
+
+The default graph is a property of the current thread. If you
+create a new thread, and wish to use the default graph in that
+thread, you must explicitly add a `with g.as_default():` in that
+thread's function.
+
+The following code examples are equivalent:
+
+```python
+# 1. Using Graph.as_default():
+g = tf.Graph()
+with g.as_default():
+ c = tf.constant(5.0)
+ assert c.graph is g
+
+# 2. Constructing and making default:
+with tf.Graph().as_default() as g:
+ c = tf.constant(5.0)
+ assert c.graph is g
+```
+
+##### Returns:
+
+ A context manager for using this graph as the default graph.
+
+
+- - -
+
+#### tf.Graph.as_graph_def(from_version=None) {#Graph.as_graph_def}
+
+Returns a serialized `GraphDef` representation of this graph.
+
+This method is thread-safe.
+
+##### Args:
+
+
+* <b>from_version</b>: Optional. If this is set, returns a `GraphDef`
+ containing only the nodes that were added to this graph since
+ its `version` property had the given value.
+
+##### Returns:
+
+ A
+ [`GraphDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+ protocol buffer.
+
+
+- - -
+
+#### tf.Graph.finalize() {#Graph.finalize}
+
+Finalizes this graph, making it read-only.
+
+After calling `g.finalize()`, no new operations can be added to
+`g`. This method is used to ensure that no operations are added
+to a graph when it is shared between multiple threads, for example
+when using a [`QueueRunner`](train.md#QueueRunner).
+
+
+- - -
+
+#### tf.Graph.finalized {#Graph.finalized}
+
+True if this graph has been finalized.
+
+
+- - -
+
+#### tf.Graph.control_dependencies(control_inputs) {#Graph.control_dependencies}
+
+Returns a context manager that specifies control dependencies.
+
+Use with the `with` keyword to specify that all operations constructed
+within the context should have control dependencies on
+`control_inputs`. For example:
+
+```python
+with g.control_dependencies([a, b, c]):
+ # `d` and `e` will only run after `a`, `b`, and `c` have executed.
+ d = ...
+ e = ...
+```
+
+Multiple calls to `control_dependencies()` can be nested, and in
+that case a new `Operation` will have control dependencies on the union
+of `control_inputs` from all active contexts.
+
+```python
+with g.control_dependencies([a, b]):
+ # Ops declared here run after `a` and `b`.
+ with g.control_dependencies([c, d]):
+ # Ops declared here run after `a`, `b`, `c`, and `d`.
+```
+
+*N.B.* The control dependencies context applies *only* to ops that
+are constructed within the context. Merely using an op or tensor
+in the context does not add a control dependency. The following
+example illustrates this point:
+
+```python
+# WRONG
+def my_func(pred, tensor):
+ t = tf.matmul(tensor, tensor)
+ with tf.control_dependencies([pred]):
+ # The matmul op is created outside the context, so no control
+ # dependency will be added.
+ return t
+
+# RIGHT
+def my_func(pred, tensor):
+ with tf.control_dependencies([pred]):
+ # The matmul op is created in the context, so a control dependency
+ # will be added.
+ return tf.matmul(tensor, tensor)
+```
+
+##### Args:
+
+
+* <b>control_inputs</b>: A list of `Operation` or `Tensor` objects, which
+ must be executed or computed before running the operations
+ defined in the context.
+
+##### Returns:
+
+ A context manager that specifies control dependencies for all
+ operations constructed within the context.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `control_inputs` is not a list of `Operation` or
+ `Tensor` objects.
+
+
+- - -
+
+#### tf.Graph.device(*args, **kwds) {#Graph.device}
+
+Returns a context manager that specifies the default device to use.
+
+The `device_name_or_function` argument may either be a device name
+string, a device function, or None:
+
+* If it is a device name string, all operations constructed in
+ this context will be assigned to the device with that name.
+* If it is a function, it will be treated as a function from
+ Operation objects to device name strings, and invoked each time
+ a new Operation is created. The Operation will be assigned to
+ the device with the returned name.
+* If it is None, the default device will be cleared.
+
+For example:
+
+```python
+with g.device('/gpu:0'):
+ # All operations constructed in this context will be placed
+ # on GPU 0.
+ with g.device(None):
+ # All operations constructed in this context will have no
+ # assigned device.
+
+# Defines a function from `Operation` to device string.
+def matmul_on_gpu(n):
+ if n.type == "MatMul":
+ return "/gpu:0"
+ else:
+ return "/cpu:0"
+
+with g.device(matmul_on_gpu):
+ # All operations of type "MatMul" constructed in this context
+ # will be placed on GPU 0; all other operations will be placed
+ # on CPU 0.
+```
+
+##### Args:
+
+
+* <b>device_name_or_function</b>: The device name or function to use in
+ the context.
+
+##### Returns:
+
+ A context manager that specifies the default device to use for newly
+ created ops.
+
+
+- - -
+
+#### tf.Graph.name_scope(*args, **kwds) {#Graph.name_scope}
+
+Returns a context manager that creates hierarchical names for operations.
+
+A graph maintains a stack of name scopes. A `with name_scope(...):`
+statement pushes a new name onto the stack for the lifetime of the context.
+
+The `name` argument will be interpreted as follows:
+
+* A string (not ending with '/') will create a new name scope, in which
+ `name` is appended to the prefix of all operations created in the
+ context. If `name` has been used before, it will be made unique by
+ calling `self.unique_name(name)`.
+* A scope previously captured from a `with g.name_scope(...) as
+ scope:` statement will be treated as an "absolute" name scope, which
+ makes it possible to re-enter existing scopes.
+* A value of `None` or the empty string will reset the current name scope
+ to the top-level (empty) name scope.
+
+For example:
+
+```python
+with tf.Graph().as_default() as g:
+ c = tf.constant(5.0, name="c")
+  assert c.name == "c"
+ c_1 = tf.constant(6.0, name="c")
+ assert c_1.name == "c_1"
+
+ # Creates a scope called "nested"
+ with g.name_scope("nested") as scope:
+ nested_c = tf.constant(10.0, name="c")
+ assert nested_c.name == "nested/c"
+
+ # Creates a nested scope called "inner".
+ with g.name_scope("inner"):
+ nested_inner_c = tf.constant(20.0, name="c")
+ assert nested_inner_c.name == "nested/inner/c"
+
+ # Create a nested scope called "inner_1".
+ with g.name_scope("inner"):
+ nested_inner_1_c = tf.constant(30.0, name="c")
+ assert nested_inner_1_c.name == "nested/inner_1/c"
+
+ # Treats `scope` as an absolute name scope, and
+ # switches to the "nested/" scope.
+ with g.name_scope(scope):
+ nested_d = tf.constant(40.0, name="d")
+ assert nested_d.name == "nested/d"
+
+ with g.name_scope(""):
+ e = tf.constant(50.0, name="e")
+ assert e.name == "e"
+```
+
+The name of the scope itself can be captured by `with
+g.name_scope(...) as scope:`, which stores the name of the scope
+in the variable `scope`. This value can be used to name an
+operation that represents the overall result of executing the ops
+in a scope. For example:
+
+```python
+inputs = tf.constant(...)
+with g.name_scope('my_layer') as scope:
+ weights = tf.Variable(..., name="weights")
+ biases = tf.Variable(..., name="biases")
+ affine = tf.matmul(inputs, weights) + biases
+ output = tf.nn.relu(affine, name=scope)
+```
+
+
+##### Args:
+
+
+* <b>name</b>: A name for the scope.
+
+##### Returns:
+
+ A context manager that installs `name` as a new name scope.
+
+
+
+A `Graph` instance supports an arbitrary number of "collections"
+that are identified by name. For convenience when building a large
+graph, collections can store groups of related objects: for
+example, the `tf.Variable` class uses a collection (named
+[`tf.GraphKeys.VARIABLES`](framework.md#GraphKeys)) for all variables that are
+created during the construction of a graph. The caller may define
+additional collections by specifying a new name.
+
+- - -
+
+#### tf.Graph.add_to_collection(name, value) {#Graph.add_to_collection}
+
+Stores `value` in the collection with the given `name`.
+
+##### Args:
+
+
+* <b>name</b>: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+* <b>value</b>: The value to add to the collection.
+
+
+- - -
+
+#### tf.Graph.get_collection(name, scope=None) {#Graph.get_collection}
+
+Returns a list of values in the collection with the given `name`.
+
+##### Args:
+
+
+* <b>key</b>: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+* <b>scope</b>: (Optional.) If supplied, the resulting list is filtered to include
+ only items whose name begins with this string.
+
+##### Returns:
+
+ The list of values in the collection with the given `name`, or
+ an empty list if no value has been added to that collection. The
+ list contains the values in the order under which they were
+ collected.
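+
+For example, a minimal sketch (the collection name and tensors are
+illustrative):
+
+```python
+g = tf.Graph()
+with g.as_default():
+  total_loss = tf.constant(0.5, name="total_loss")
+  g.add_to_collection("losses", total_loss)
+  g.add_to_collection("losses", tf.constant(0.1, name="weight_decay"))
+
+losses = g.get_collection("losses")
+print(len(losses))                                # ==> 2
+print(g.get_collection("losses", scope="total"))  # only names starting "total"
+```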
+
+
+
+- - -
+
+#### tf.Graph.as_graph_element(obj, allow_tensor=True, allow_operation=True) {#Graph.as_graph_element}
+
+Returns the object referred to by `obj`, as an `Operation` or `Tensor`.
+
+This function validates that `obj` represents an element of this
+graph, and gives an informative error message if it is not.
+
+This function is the canonical way to get/validate an object of
+one of the allowed types from an external argument reference in the
+Session API.
+
+This method may be called concurrently from multiple threads.
+
+##### Args:
+
+
+* <b>obj</b>: A `Tensor`, an `Operation`, or the name of a tensor or operation.
+ Can also be any object with an `_as_graph_element()` method that returns
+ a value of one of these types.
+* <b>allow_tensor</b>: If true, `obj` may refer to a `Tensor`.
+* <b>allow_operation</b>: If true, `obj` may refer to an `Operation`.
+
+##### Returns:
+
+ The `Tensor` or `Operation` in the Graph corresponding to `obj`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `obj` is not a type that can be converted to a
+  `Tensor` or an `Operation`.
+* <b>ValueError</b>: If `obj` is of an appropriate type but invalid. For
+ example, an invalid string.
+* <b>KeyError</b>: If `obj` is not an object in the graph.
+
+
+- - -
+
+#### tf.Graph.get_operation_by_name(name) {#Graph.get_operation_by_name}
+
+Returns the `Operation` with the given `name`.
+
+This method may be called concurrently from multiple threads.
+
+##### Args:
+
+
+* <b>name</b>: The name of the `Operation` to return.
+
+##### Returns:
+
+ The `Operation` with the given `name`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `name` is not a string.
+* <b>KeyError</b>: If `name` does not correspond to an operation in this graph.
+
+
+- - -
+
+#### tf.Graph.get_tensor_by_name(name) {#Graph.get_tensor_by_name}
+
+Returns the `Tensor` with the given `name`.
+
+This method may be called concurrently from multiple threads.
+
+##### Args:
+
+
+* <b>name</b>: The name of the `Tensor` to return.
+
+##### Returns:
+
+ The `Tensor` with the given `name`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `name` is not a string.
+* <b>KeyError</b>: If `name` does not correspond to a tensor in this graph.
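+
+For example, a minimal sketch covering both lookup methods:
+
+```python
+g = tf.Graph()
+with g.as_default():
+  c = tf.constant(1.0, name="c")
+
+assert g.get_operation_by_name("c") is c.op
+assert g.get_tensor_by_name("c:0") is c   # "<op_name>:<output_index>"
+```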
+
+
+- - -
+
+#### tf.Graph.get_operations() {#Graph.get_operations}
+
+Return the list of operations in the graph.
+
+You can modify the operations in place, but modifications to the list,
+such as inserts or deletes, have no effect on the list of operations
+known to the graph.
+
+This method may be called concurrently from multiple threads.
+
+##### Returns:
+
+ A list of Operations.
+
+
+
+- - -
+
+#### tf.Graph.get_default_device() {#Graph.get_default_device}
+
+Returns the default device.
+
+##### Returns:
+
+ A string.
+
+
+- - -
+
+#### tf.Graph.seed {#Graph.seed}
+
+
+
+- - -
+
+#### tf.Graph.unique_name(name) {#Graph.unique_name}
+
+Return a unique Operation name for "name".
+
+Note: You rarely need to call unique_name() directly. Most of the time you
+just need to create "with g.name_scope()" blocks to generate structured
+names.
+
+`unique_name` is used to generate structured names, separated by "/",
+to help identify Operations when debugging a Graph. Operation names
+are displayed in error messages reported by the TensorFlow runtime,
+and in various visualization tools such as TensorBoard.
+
+##### Args:
+
+
+* <b>name</b>: The name for an `Operation`.
+
+##### Returns:
+
+ A string to be passed to `create_op()` that will be used
+ to name the operation being created.
+
+
+- - -
+
+#### tf.Graph.version {#Graph.version}
+
+Returns a version number that increases as ops are added to the graph.
+
+
+- - -
+
+#### tf.Graph.create_op(op_type, inputs, dtypes, input_types=None, name=None, attrs=None, op_def=None, compute_shapes=True) {#Graph.create_op}
+
+Creates an `Operation` in this graph.
+
+This is a low-level interface for creating an `Operation`. Most
+programs will not call this method directly, and instead use the
+Python op constructors, such as `tf.constant()`, which add ops to
+the default graph.
+
+##### Args:
+
+
+* <b>op_type</b>: The `Operation` type to create. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+* <b>inputs</b>: A list of `Tensor` objects that will be inputs to the `Operation`.
+* <b>dtypes</b>: A list of `DType` objects that will be the types of the tensors
+ that the operation produces.
+* <b>input_types</b>: (Optional.) A list of `DType`s that will be the types of
+ the tensors that the operation consumes. By default, uses the base
+ `DType` of each input in `inputs`. Operations that expect
+ reference-typed inputs must specify `input_types` explicitly.
+* <b>name</b>: (Optional.) A string name for the operation. If not specified, a
+ name is generated based on `op_type`.
+* <b>attrs</b>: (Optional.) A list of `AttrValue` protos for the `attr` field of
+ the `NodeDef` proto that will represent the operation.
+* <b>op_def</b>: (Optional.) The `OpDef` proto that describes the `op_type` that
+ the operation will have.
+* <b>compute_shapes</b>: (Optional.) If True, shape inference will be performed
+ to compute the shapes of the outputs.
+
+##### Raises:
+
+
+* <b>TypeError</b>: if any of the inputs is not a `Tensor`.
+
+##### Returns:
+
+ An `Operation` object.
+
+
+- - -
+
+#### tf.Graph.gradient_override_map(*args, **kwds) {#Graph.gradient_override_map}
+
+EXPERIMENTAL: A context manager for overriding gradient functions.
+
+This context manager can be used to override the gradient function
+that will be used for ops within the scope of the context.
+
+For example:
+
+```python
+@tf.RegisterGradient("CustomSquare")
+def _custom_square_grad(op, grad):
+ # ...
+
+with tf.Graph().as_default() as g:
+ c = tf.constant(5.0)
+ s_1 = tf.square(c) # Uses the default gradient for tf.square.
+ with g.gradient_override_map({"Square": "CustomSquare"}):
+    s_2 = tf.square(c)  # Uses _custom_square_grad to compute the
+ # gradient of s_2.
+```
+
+##### Args:
+
+
+* <b>op_type_map</b>: A dictionary mapping op type strings to alternative op
+ type strings.
+
+##### Returns:
+
+ A context manager that sets the alternative op type to be used for one
+ or more ops created in that context.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `op_type_map` is not a dictionary mapping strings to
+ strings.
+
+
+
+- - -
+
+### class tf.Operation <div class="md-anchor" id="Operation">{#Operation}</div>
+
+Represents a graph node that performs computation on tensors.
+
+An `Operation` is a node in a TensorFlow `Graph` that takes zero or
+more `Tensor` objects as input, and produces zero or more `Tensor`
+objects as output. Objects of type `Operation` are created by
+calling a Python op constructor (such as [`tf.matmul()`](math_ops.md#matmul))
+or [`Graph.create_op()`](framework.md#Graph.create_op).
+
+For example `c = tf.matmul(a, b)` creates an `Operation` of type
+"MatMul" that takes tensors `a` and `b` as input, and produces `c`
+as output.
+
+After the graph has been launched in a session, an `Operation` can
+be executed by passing it to [`Session.run()`](client.md#Session.run).
+`op.run()` is a shortcut for calling `tf.get_default_session().run(op)`.
+
+- - -
+
+#### tf.Operation.name {#Operation.name}
+
+The full name of this operation.
+
+- - -
+
+#### tf.Operation.type {#Operation.type}
+
+The type of the op (e.g. `"MatMul"`).
+
+- - -
+
+#### tf.Operation.inputs {#Operation.inputs}
+
+The list of `Tensor` objects representing the data inputs of this op.
+
+- - -
+
+#### tf.Operation.control_inputs {#Operation.control_inputs}
+
+The `Operation` objects on which this op has a control dependency.
+
+Before this op is executed, TensorFlow will ensure that the
+operations in `self.control_inputs` have finished executing. This
+mechanism can be used to run ops sequentially for performance
+reasons, or to ensure that the side effects of an op are observed
+in the correct order.
+
+##### Returns:
+
+ A list of `Operation` objects.
+
+- - -
+
+#### tf.Operation.outputs {#Operation.outputs}
+
+The list of `Tensor` objects representing the outputs of this op.
+
+- - -
+
+#### tf.Operation.device {#Operation.device}
+
+The name of the device to which this op has been assigned, if any.
+
+##### Returns:
+
+ The string name of the device to which this op has been
+ assigned, or None if it has not been assigned to a device.
+
+- - -
+
+#### tf.Operation.graph {#Operation.graph}
+
+The `Graph` that contains this operation.
+
+
+- - -
+
+#### tf.Operation.run(feed_dict=None, session=None) {#Operation.run}
+
+Runs this operation in a `Session`.
+
+Calling this method will execute all preceding operations that
+produce the inputs needed for this operation.
+
+*N.B.* Before invoking `Operation.run()`, its graph must have been
+launched in a session, and either a default session must be
+available, or `session` must be specified explicitly.
+
+##### Args:
+
+
+* <b>feed_dict</b>: A dictionary that maps `Tensor` objects to feed values.
+ See [`Session.run()`](client.md#Session.run) for a description of the
+ valid feed values.
+* <b>session</b>: (Optional.) The `Session` to be used to run this operation. If
+ none, the default session will be used.
+
+
+
+- - -
+
+#### tf.Operation.get_attr(name) {#Operation.get_attr}
+
+Returns the value of the attr of this op with the given `name`.
+
+##### Args:
+
+
+* <b>name</b>: The name of the attr to fetch.
+
+##### Returns:
+
+ The value of the attr, as a Python object.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If this op does not have an attr with the given `name`.
+
+
+- - -
+
+#### tf.Operation.traceback {#Operation.traceback}
+
+Returns the call stack from when this operation was constructed.
+
+
+#### Other Methods
+- - -
+
+#### tf.Operation.__init__(node_def, g, inputs=None, output_types=None, control_inputs=None, input_types=None, original_op=None, op_def=None) {#Operation.__init__}
+
+Creates an `Operation`.
+
+NOTE: This constructor validates the name of the Operation (passed
+as "node_def.name"). Valid Operation names match the following
+regular expression:
+
+ [A-Za-z0-9.][A-Za-z0-9_.\-/]*
+
+##### Args:
+
+
+* <b>node_def</b>: graph_pb2.NodeDef. NodeDef for the Operation.
+ Used for attributes of graph_pb2.NodeDef, typically "name",
+ "op", and "device". The "input" attribute is irrelevant here
+ as it will be computed when generating the model.
+* <b>g</b>: Graph. The parent graph.
+* <b>inputs</b>: list of Tensor objects. The inputs to this Operation.
+* <b>output_types</b>: list of types_pb2.DataType. List of the types of the
+ Tensors computed by this operation. The length of this list indicates
+ the number of output endpoints of the Operation.
+* <b>control_inputs</b>: list of operations or tensors from which to have a
+ control dependency.
+* <b>input_types</b>: List of types_pb2.DataType representing the
+ types of the Tensors accepted by the Operation. By default
+ uses [x.dtype.base_dtype for x in inputs]. Operations that expect
+ reference-typed inputs must specify these explicitly.
+* <b>original_op</b>: Optional. Used to associate the new Operation with an
+ existing Operation (for example, a replica with the op that was
+ replicated).
+* <b>op_def</b>: Optional. The op_def_pb2.OpDef proto that describes the
+ op type that this Operation represents.
+
+##### Raises:
+
+
+* <b>TypeError</b>: if control inputs are not Operations or Tensors,
+ or if node_def is not a NodeDef,
+ or if g is not a Graph,
+ or if inputs are not Tensors,
+ or if inputs and input_types are incompatible.
+* <b>ValueError</b>: if the node_def name is not valid.
+
+
+- - -
+
+#### tf.Operation.node_def {#Operation.node_def}
+
+Returns a serialized `NodeDef` representation of this operation.
+
+##### Returns:
+
+ A
+ [`NodeDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+ protocol buffer.
+
+- - -
+
+#### tf.Operation.op_def {#Operation.op_def}
+
+Returns the `OpDef` proto that represents the type of this op.
+
+##### Returns:
+
+ An
+ [`OpDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def.proto)
+ protocol buffer.
+
+- - -
+
+#### tf.Operation.values() {#Operation.values}
+
+DEPRECATED: Use outputs.
+
+
+
+- - -
+
+### class tf.Tensor <div class="md-anchor" id="Tensor">{#Tensor}</div>
+
+Represents a value produced by an `Operation`.
+
+A `Tensor` is a symbolic handle to one of the outputs of an
+`Operation`. It does not hold the values of that operation's output,
+but instead provides a means of computing those values in a
+TensorFlow [`Session`](client.md#Session).
+
+This class has two primary purposes:
+
+1. A `Tensor` can be passed as an input to another `Operation`.
+ This builds a dataflow connection between operations, which
+ enables TensorFlow to execute an entire `Graph` that represents a
+ large, multi-step computation.
+
+2. After the graph has been launched in a session, the value of the
+ `Tensor` can be computed by passing it to
+ [`Session.run()`](client.md#Session.run).
+ `t.eval()` is a shortcut for calling
+ `tf.get_default_session().run(t)`.
+
+In the following example, `c`, `d`, and `e` are symbolic `Tensor`
+objects, whereas `result` is a numpy array that stores a concrete
+value:
+
+```python
+# Build a dataflow graph.
+c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
+d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
+e = tf.matmul(c, d)
+
+# Construct a `Session` to execute the graph.
+sess = tf.Session()
+
+# Execute the graph and store the value that `e` represents in `result`.
+result = sess.run(e)
+```
+
+- - -
+
+#### tf.Tensor.dtype {#Tensor.dtype}
+
+The `DType` of elements in this tensor.
+
+- - -
+
+#### tf.Tensor.name {#Tensor.name}
+
+The string name of this tensor.
+
+- - -
+
+#### tf.Tensor.value_index {#Tensor.value_index}
+
+The index of this tensor in the outputs of its `Operation`.
+
+- - -
+
+#### tf.Tensor.graph {#Tensor.graph}
+
+The `Graph` that contains this tensor.
+
+- - -
+
+#### tf.Tensor.op {#Tensor.op}
+
+The `Operation` that produces this tensor as an output.
+
+- - -
+
+#### tf.Tensor.consumers() {#Tensor.consumers}
+
+Returns a list of `Operation`s that consume this tensor.
+
+##### Returns:
+
+ A list of `Operation`s.
+
+
+
+- - -
+
+#### tf.Tensor.eval(feed_dict=None, session=None) {#Tensor.eval}
+
+Evaluates this tensor in a `Session`.
+
+Calling this method will execute all preceding operations that
+produce the inputs needed for the operation that produces this
+tensor.
+
+*N.B.* Before invoking `Tensor.eval()`, its graph must have been
+launched in a session, and either a default session must be
+available, or `session` must be specified explicitly.
+
+##### Args:
+
+
+* <b>feed_dict</b>: A dictionary that maps `Tensor` objects to feed values.
+ See [`Session.run()`](client.md#Session.run) for a description of
+ the valid feed values.
+* <b>session</b>: (Optional.) The `Session` to be used to evaluate this tensor. If
+ none, the default session will be used.
+
+##### Returns:
+
+ A numpy array corresponding to the value of this tensor.
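+
+For example, a minimal sketch (assuming that entering a `tf.Session` context
+installs it as the default session):
+
+```python
+c = tf.constant([1.0, 2.0])
+d = c * 2.0
+
+with tf.Session():
+  print(d.eval())  # ==> [ 2.  4.]
+```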
+
+
+
+- - -
+
+#### tf.Tensor.get_shape() {#Tensor.get_shape}
+
+Returns the `TensorShape` that represents the shape of this tensor.
+
+The shape is computed using shape inference functions that are
+registered for each `Operation` type using `tf.RegisterShape`.
+See [`TensorShape`](framework.md#TensorShape) for more details of what a shape
+represents.
+
+The inferred shape of a tensor is used to provide shape
+information without having to launch the graph in a session. This
+can be used for debugging, and providing early error messages. For
+example:
+
+```python
+c = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+
+print c.get_shape()
+==> TensorShape([Dimension(2), Dimension(3)])
+
+d = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
+
+print d.get_shape()
+==> TensorShape([Dimension(4), Dimension(2)])
+
+# Raises a ValueError, because `c` and `d` do not have compatible
+# inner dimensions.
+e = tf.matmul(c, d)
+
+f = tf.matmul(c, d, transpose_a=True, transpose_b=True)
+
+print f.get_shape()
+==> TensorShape([Dimension(3), Dimension(4)])
+```
+
+In some cases, the inferred shape may have unknown dimensions. If
+the caller has additional information about the values of these
+dimensions, `Tensor.set_shape()` can be used to augment the
+inferred shape.
+
+##### Returns:
+
+ A `TensorShape` representing the shape of this tensor.
+
+
+- - -
+
+#### tf.Tensor.set_shape(shape) {#Tensor.set_shape}
+
+Updates the shape of this tensor.
+
+This method can be called multiple times, and will merge the given
+`shape` with the current shape of this tensor. It can be used to
+provide additional information about the shape of this tensor that
+cannot be inferred from the graph alone. For example, this can be used
+to provide additional information about the shapes of images:
+
+```python
+_, image_data = tf.TFRecordReader(...).read(...)
+image = tf.image.decode_png(image_data, channels=3)
+
+# The height and width dimensions of `image` are data dependent, and
+# cannot be computed without executing the op.
+print image.get_shape()
+==> TensorShape([Dimension(None), Dimension(None), Dimension(3)])
+
+# We know that each image in this dataset is 28 x 28 pixels.
+image.set_shape([28, 28, 3])
+print image.get_shape()
+==> TensorShape([Dimension(28), Dimension(28), Dimension(3)])
+```
+
+##### Args:
+
+
+* <b>shape</b>: A `TensorShape` representing the shape of this tensor.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `shape` is not compatible with the current shape of
+ this tensor.
+
+
+
+#### Other Methods
+- - -
+
+#### tf.Tensor.__init__(op, value_index, dtype) {#Tensor.__init__}
+
+Creates a new `Tensor`.
+
+##### Args:
+
+
+* <b>op</b>: An `Operation`. `Operation` that computes this tensor.
+* <b>value_index</b>: An `int`. Index of the operation's endpoint that produces
+ this tensor.
+* <b>dtype</b>: A `types.DType`. Type of data stored in this tensor.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If the op is not an `Operation`.
+
+
+- - -
+
+#### tf.Tensor.device {#Tensor.device}
+
+The name of the device on which this tensor will be produced, or None.
+
+
+
+## Tensor types <div class="md-anchor" id="AUTOGENERATED-tensor-types">{#AUTOGENERATED-tensor-types}</div>
+
+- - -
+
+### class tf.DType <div class="md-anchor" id="DType">{#DType}</div>
+
+Represents the type of the elements in a `Tensor`.
+
+The following `DType` objects are defined:
+
+* `tf.float32`: 32-bit single-precision floating-point.
+* `tf.float64`: 64-bit double-precision floating-point.
+* `tf.bfloat16`: 16-bit truncated floating-point.
+* `tf.complex64`: 64-bit single-precision complex.
+
+* `tf.int8`: 8-bit signed integer.
+* `tf.uint8`: 8-bit unsigned integer.
+* `tf.int32`: 32-bit signed integer.
+* `tf.int64`: 64-bit signed integer.
+
+* `tf.bool`: Boolean.
+
+* `tf.string`: String.
+
+* `tf.qint8`: Quantized 8-bit signed integer.
+* `tf.quint8`: Quantized 8-bit unsigned integer.
+* `tf.qint32`: Quantized 32-bit signed integer.
+
+In addition, variants of these types with the `_ref` suffix are
+defined for reference-typed tensors.
+
+The `tf.as_dtype()` function converts numpy types and string type
+names to a `DType` object.
+
+- - -
+
+#### tf.DType.is_compatible_with(other) {#DType.is_compatible_with}
+
+Returns True if the `other` DType will be converted to this DType.
+
+The conversion rules are as follows:
+
+```
+DType(T) .is_compatible_with(DType(T)) == True
+DType(T) .is_compatible_with(DType(T).as_ref) == True
+DType(T).as_ref.is_compatible_with(DType(T)) == False
+DType(T).as_ref.is_compatible_with(DType(T).as_ref) == True
+```
+
+##### Args:
+
+
+* <b>other</b>: A `DType` (or object that may be converted to a `DType`).
+
+##### Returns:
+
+ True if a Tensor of the `other` `DType` will be implicitly converted to
+ this `DType`.
+
+
+- - -
+
+#### tf.DType.name {#DType.name}
+
+Returns the string name for this `DType`.
+
+- - -
+
+#### tf.DType.base_dtype {#DType.base_dtype}
+
+Returns a non-reference `DType` based on this `DType`.
+
+- - -
+
+#### tf.DType.is_ref_dtype {#DType.is_ref_dtype}
+
+Returns `True` if this `DType` represents a reference type.
+
+- - -
+
+#### tf.DType.as_ref {#DType.as_ref}
+
+Returns a reference `DType` based on this `DType`.
+
+- - -
+
+#### tf.DType.is_integer {#DType.is_integer}
+
+Returns whether this is a (non-quantized) integer type.
+
+- - -
+
+#### tf.DType.is_quantized {#DType.is_quantized}
+
+Returns whether this is a quantized data type.
+
+
+- - -
+
+#### tf.DType.as_numpy_dtype {#DType.as_numpy_dtype}
+
+Returns a `numpy.dtype` based on this `DType`.
+
+- - -
+
+#### tf.DType.as_datatype_enum {#DType.as_datatype_enum}
+
+Returns a `types_pb2.DataType` enum value based on this `DType`.
+
+
+#### Other Methods
+- - -
+
+#### tf.DType.__init__(type_enum) {#DType.__init__}
+
+Creates a new `DataType`.
+
+NOTE(mrry): In normal circumstances, you should not need to
+construct a `DType` object directly. Instead, use the
+`tf.as_dtype()` function.
+
+##### Args:
+
+
+* <b>type_enum</b>: A `types_pb2.DataType` enum value.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `type_enum` is not a value of `types_pb2.DataType`.
+
+
+- - -
+
+#### tf.DType.max {#DType.max}
+
+Returns the maximum representable value in this data type.
+
+##### Raises:
+
+
+* <b>TypeError</b>: if this is a non-numeric, unordered, or quantized type.
+
+- - -
+
+#### tf.DType.min {#DType.min}
+
+Returns the minimum representable value in this data type.
+
+##### Raises:
+
+
+* <b>TypeError</b>: if this is a non-numeric, unordered, or quantized type.
+
+
+- - -
+
+### tf.as_dtype(type_value) <div class="md-anchor" id="as_dtype">{#as_dtype}</div>
+
+Converts the given `type_value` to a `DType`.
+
+##### Args:
+
+
+* <b>type_value</b>: A value that can be converted to a `tf.DType`
+ object. This may currently be a `tf.DType` object, a
+ [`DataType` enum](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.proto),
+ a string type name, or a `numpy.dtype`.
+
+##### Returns:
+
+ A `DType` corresponding to `type_value`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `type_value` cannot be converted to a `DType`.
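+
+For example, a minimal sketch of the conversions described above (assuming
+`numpy` is available):
+
+```python
+import numpy as np
+
+assert tf.as_dtype("float32") == tf.float32
+assert tf.as_dtype(np.int64) == tf.int64
+
+# Reference variants follow the compatibility rules listed under
+# `tf.DType.is_compatible_with()`.
+assert tf.float32.is_compatible_with(tf.float32.as_ref)
+assert not tf.float32.as_ref.is_compatible_with(tf.float32)
+```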
+
+
+
+## Utility functions <div class="md-anchor" id="AUTOGENERATED-utility-functions">{#AUTOGENERATED-utility-functions}</div>
+
+- - -
+
+### tf.device(dev) <div class="md-anchor" id="device">{#device}</div>
+
+Wrapper for `Graph.device()` using the default graph.
+
+See [`Graph.device()`](framework.md#Graph.device) for more details.
+
+##### Args:
+
+
+* <b>device_name_or_function</b>: The device name or function to use in
+ the context.
+
+##### Returns:
+
+ A context manager that specifies the default device to use for newly
+ created ops.
+
+
+- - -
+
+### tf.name_scope(name) <div class="md-anchor" id="name_scope">{#name_scope}</div>
+
+Wrapper for `Graph.name_scope()` using the default graph.
+
+See [`Graph.name_scope()`](framework.md#Graph.name_scope) for more details.
+
+##### Args:
+
+
+* <b>name</b>: A name for the scope.
+
+##### Returns:
+
+ A context manager that installs `name` as a new name scope in the
+ default graph.
+
+
+- - -
+
+### tf.control_dependencies(control_inputs) <div class="md-anchor" id="control_dependencies">{#control_dependencies}</div>
+
+Wrapper for `Graph.control_dependencies()` using the default graph.
+
+See [`Graph.control_dependencies()`](framework.md#Graph.control_dependencies)
+for more details.
+
+##### Args:
+
+
+* <b>control_inputs</b>: A list of `Operation` or `Tensor` objects, which
+ must be executed or computed before running the operations
+ defined in the context.
+
+##### Returns:
+
+ A context manager that specifies control dependencies for all
+ operations constructed within the context.
+
+
+- - -
+
+### tf.convert_to_tensor(value, dtype=None, name=None) <div class="md-anchor" id="convert_to_tensor">{#convert_to_tensor}</div>
+
+Converts the given `value` to a `Tensor`.
+
+This function converts Python objects of various types to `Tensor`
+objects. It accepts `Tensor` objects, numpy arrays, Python lists,
+and Python scalars. For example:
+
+```python
+import numpy as np
+array = np.random.rand(32, 100, 100)
+
+def my_func(arg):
+ arg = tf.convert_to_tensor(arg, dtype=tf.float32)
+ return tf.matmul(arg, arg) + arg
+
+# The following calls are equivalent.
+value_1 = my_func(tf.constant([[1.0, 2.0], [3.0, 4.0]]))
+value_2 = my_func([[1.0, 2.0], [3.0, 4.0]])
+value_3 = my_func(np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32))
+```
+
+This function can be useful when composing a new operation in Python
+(such as `my_func` in the example above). All standard Python op
+constructors apply this function to each of their Tensor-valued
+inputs, which allows those ops to accept numpy arrays, Python lists,
+and scalars in addition to `Tensor` objects.
+
+##### Args:
+
+
+* <b>value</b>: An object whose type has a registered `Tensor` conversion function.
+* <b>dtype</b>: Optional element type for the returned tensor. If missing, the
+ type is inferred from the type of `value`.
+* <b>name</b>: Optional name to use if a new `Tensor` is created.
+
+##### Returns:
+
+ A `Tensor` based on `value`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If no conversion function is registered for `value`.
+* <b>RuntimeError</b>: If a registered conversion function returns an invalid value.
+
+
+- - -
+
+### tf.get_default_graph() <div class="md-anchor" id="get_default_graph">{#get_default_graph}</div>
+
+Returns the default graph for the current thread.
+
+The returned graph will be the innermost graph on which a
+`Graph.as_default()` context has been entered, or a global default
+graph if none has been explicitly created.
+
+*N.B.* The default graph is a property of the current thread. If you
+create a new thread, and wish to use the default graph in that
+thread, you must explicitly add a `with g.as_default():` in that
+thread's function.
+
+##### Returns:
+
+ The default `Graph` being used in the current thread.
+
+
+- - -
+
+### tf.import_graph_def(graph_def, input_map=None, return_elements=None, name=None, op_dict=None) <div class="md-anchor" id="import_graph_def">{#import_graph_def}</div>
+
+Imports the TensorFlow graph in `graph_def` into the Python `Graph`.
+
+This function provides a way to import a serialized TensorFlow
+[`GraphDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+protocol buffer, and extract individual objects in the `GraphDef` as
+[`Tensor`](#Tensor) and [`Operation`](#Operation) objects. See
+[`Graph.as_graph_def()`](#Graph.as_graph_def) for a way to create a
+`GraphDef` proto.
+
+##### Args:
+
+
+* <b>graph_def</b>: A `GraphDef` proto containing operations to be imported into
+ the default graph.
+* <b>input_map</b>: A dictionary mapping input names (as strings) in `graph_def`
+ to `Tensor` objects. The values of the named input tensors in the
+ imported graph will be re-mapped to the respective `Tensor` values.
+* <b>return_elements</b>: A list of strings containing operation names in
+ `graph_def` that will be returned as `Operation` objects; and/or
+ tensor names in `graph_def` that will be returned as `Tensor` objects.
+* <b>name</b>: (Optional.) A prefix that will be prepended to the names in
+ `graph_def`. Defaults to `"import"`.
+* <b>op_dict</b>: (Optional.) A dictionary mapping op type names to `OpDef` protos.
+ Must contain an `OpDef` proto for each op type named in `graph_def`.
+ If omitted, uses the `OpDef` protos registered in the global registry.
+
+##### Returns:
+
+ A list of `Operation` and/or `Tensor` objects from the imported graph,
+  corresponding to the names in `return_elements`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `graph_def` is not a `GraphDef` proto,
+  `input_map` is not a dictionary mapping strings to `Tensor` objects,
+ or `return_elements` is not a list of strings.
+* <b>ValueError</b>: If `input_map`, or `return_elements` contains names that
+ do not appear in `graph_def`, or `graph_def` is not well-formed (e.g.
+ it refers to an unknown tensor).
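+
+For example, a minimal sketch that serializes a graph and re-imports one of
+its tensors (the tensor name is illustrative; imported names receive the
+default `"import"` prefix):
+
+```python
+with tf.Graph().as_default():
+  c = tf.constant(5.0, name="c")
+  graph_def = tf.get_default_graph().as_graph_def()
+
+with tf.Graph().as_default():
+  imported_c, = tf.import_graph_def(graph_def, return_elements=["c:0"])
+  assert imported_c.name == "import/c:0"
+```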
+
+
+
+## Graph collections <div class="md-anchor" id="AUTOGENERATED-graph-collections">{#AUTOGENERATED-graph-collections}</div>
+
+- - -
+
+### tf.add_to_collection(name, value) <div class="md-anchor" id="add_to_collection">{#add_to_collection}</div>
+
+Wrapper for `Graph.add_to_collection()` using the default graph.
+
+See [`Graph.add_to_collection()`](framework.md#Graph.add_to_collection)
+for more details.
+
+##### Args:
+
+
+* <b>name</b>: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+* <b>value</b>: The value to add to the collection.
+
+
+- - -
+
+### tf.get_collection(key, scope=None) <div class="md-anchor" id="get_collection">{#get_collection}</div>
+
+Wrapper for `Graph.get_collection()` using the default graph.
+
+See [`Graph.get_collection()`](framework.md#Graph.get_collection)
+for more details.
+
+##### Args:
+
+
+* <b>key</b>: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+* <b>scope</b>: (Optional.) If supplied, the resulting list is filtered to include
+ only items whose name begins with this string.
+
+##### Returns:
+
+ The list of values in the collection with the given `name`, or
+ an empty list if no value has been added to that collection. The
+ list contains the values in the order under which they were
+ collected.
+
+
+- - -
+
+### class tf.GraphKeys <div class="md-anchor" id="GraphKeys">{#GraphKeys}</div>
+
+Standard names to use for graph collections.
+
+The standard library uses various well-known names to collect and
+retrieve values associated with a graph. For example, the
+`tf.Optimizer` subclasses default to optimizing the variables
+collected under `tf.GraphKeys.TRAINABLE_VARIABLES` if none is
+specified, but it is also possible to pass an explicit list of
+variables.
+
+The following standard keys are defined:
+
+* `VARIABLES`: the `Variable` objects that comprise a model, and
+ must be saved and restored together. See
+ [`tf.all_variables()`](state_ops.md#all_variables) for more details.
+* `TRAINABLE_VARIABLES`: the subset of `Variable` objects that will
+ be trained by an optimizer. See
+ [`tf.trainable_variables()`](state_ops.md#trainable_variables)
+ for more details.
+* `SUMMARIES`: the summary `Tensor` objects that have been created
+ in the graph. See [`tf.merge_all_summaries()`](train.md#merge_all_summaries)
+ for more details.
+* `QUEUE_RUNNERS`: the `QueueRunner` objects that are used to
+ produce input for a computation. See
+ [`tf.start_queue_runners()`](train.md#start_queue_runners) for more details.
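+
+For example, a minimal sketch (assuming `tf.Variable` and `tf.zeros`;
+variables are added to the `VARIABLES` and, by default, the
+`TRAINABLE_VARIABLES` collections when they are created):
+
+```python
+v = tf.Variable(tf.zeros([10]), name="v")
+
+print(tf.get_collection(tf.GraphKeys.VARIABLES))            # includes `v`
+print(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))  # includes `v`
+```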
+
+
+## Defining new operations <div class="md-anchor" id="AUTOGENERATED-defining-new-operations">{#AUTOGENERATED-defining-new-operations}</div>
+
+- - -
+
+### class tf.RegisterGradient <div class="md-anchor" id="RegisterGradient">{#RegisterGradient}</div>
+
+A decorator for registering the gradient function for an op type.
+
+This decorator is only used when defining a new op type. For an op
+with `m` inputs and `n` outputs, the gradient function is a function
+that takes the original `Operation` and `n` `Tensor` objects
+(representing the gradients with respect to each output of the op),
+and returns `m` `Tensor` objects (representing the partial gradients
+with respect to each input of the op).
+
+For example, assuming that operations of type `"Sub"` take two
+inputs `x` and `y`, and return a single output `x - y`, the
+following gradient function would be registered:
+
+```python
+@tf.RegisterGradient("Sub")
+def _sub_grad(unused_op, grad):
+  return grad, tf.neg(grad)
+```
+
+The decorator argument `op_type` is the string type of an
+operation. This corresponds to the `OpDef.name` field for the proto
+that defines the operation.
+
+- - -
+
+#### tf.RegisterGradient.__init__(op_type) {#RegisterGradient.__init__}
+
+Creates a new decorator with `op_type` as the Operation type.
+
+##### Args:
+
+
+* <b>op_type</b>: The string type of an operation. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+
+
+
+- - -
+
+### tf.NoGradient(op_type) <div class="md-anchor" id="NoGradient">{#NoGradient}</div>
+
+Specifies that ops of type `op_type` do not have a defined gradient.
+
+This function is only used when defining a new op type. It may be
+used for ops such as `tf.size()` that are not differentiable. For
+example:
+
+```python
+tf.NoGradient("Size")
+```
+
+##### Args:
+
+
+* <b>op_type</b>: The string type of an operation. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `op_type` is not a string.
+
+
+- - -
+
+### class tf.RegisterShape <div class="md-anchor" id="RegisterShape">{#RegisterShape}</div>
+
+A decorator for registering the shape function for an op type.
+
+This decorator is only used when defining a new op type. A shape
+function is a function from an `Operation` object to a list of
+`TensorShape` objects, with one `TensorShape` for each output of the
+operation.
+
+For example, assuming that operations of type `"Sub"` take two
+inputs `x` and `y`, and return a single output `x - y`, all with the
+same shape, the following shape function would be registered:
+
+```python
+@tf.RegisterShape("Sub")
+def _sub_shape(op):
+ return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
+```
+
+The decorator argument `op_type` is the string type of an
+operation. This corresponds to the `OpDef.name` field for the proto
+that defines the operation.
+
+- - -
+
+#### tf.RegisterShape.__init__(op_type) {#RegisterShape.__init__}
+
+Saves the "op_type" as the Operation type.
+
+
+
+- - -
+
+### class tf.TensorShape <div class="md-anchor" id="TensorShape">{#TensorShape}</div>
+
+Represents the shape of a `Tensor`.
+
+A `TensorShape` represents a possibly-partial shape specification for a
+`Tensor`. It may be one of the following:
+
+* *Fully-known shape:* has a known number of dimensions and a known size
+ for each dimension.
+* *Partially-known shape:* has a known number of dimensions, and an unknown
+ size for one or more dimension.
+* *Unknown shape:* has an unknown number of dimensions, and an unknown
+ size in all dimensions.
+
+If a tensor is produced by an operation of type `"Foo"`, its shape
+may be inferred if there is a registered shape function for
+`"Foo"`. See [`tf.RegisterShape()`](framework.md#RegisterShape)
+for details of shape
+functions and how to register them. Alternatively, the shape may be set
+explicitly using [`Tensor.set_shape()`](framework.md#Tensor.set_shape).
+
+- - -
+
+#### tf.TensorShape.merge_with(other) {#TensorShape.merge_with}
+
+Returns a `TensorShape` combining the information in `self` and `other`.
+
+The dimensions in `self` and `other` are merged elementwise,
+according to the rules defined for `Dimension.merge_with()`.
+
+##### Args:
+
+
+* <b>other</b>: Another `TensorShape`.
+
+##### Returns:
+
+ A `TensorShape` containing the combined information of `self` and
+ `other`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` and `other` are not compatible.
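+
+For example, a minimal sketch (the shapes are illustrative):
+
+```python
+s1 = tf.TensorShape([None, 784])
+s2 = tf.TensorShape([32, None])
+
+print(s1.merge_with(s2).as_list())  # ==> [32, 784]
+```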
+
+
+- - -
+
+#### tf.TensorShape.concatenate(other) {#TensorShape.concatenate}
+
+Returns the concatenation of the dimension in `self` and `other`.
+
+*N.B.* If either `self` or `other` is completely unknown,
+concatenation will discard information about the other shape. In
+future, we might support concatenation that preserves this
+information for use with slicing.
+
+##### Args:
+
+
+* <b>other</b>: Another `TensorShape`.
+
+##### Returns:
+
+ A `TensorShape` whose dimensions are the concatenation of the
+ dimensions in `self` and `other`.
+
+
+
+- - -
+
+#### tf.TensorShape.ndims {#TensorShape.ndims}
+
+Returns the rank of this shape, or None if it is unspecified.
+
+- - -
+
+#### tf.TensorShape.dims {#TensorShape.dims}
+
+Returns a list of Dimensions, or None if the shape is unspecified.
+
+- - -
+
+#### tf.TensorShape.as_list() {#TensorShape.as_list}
+
+Returns a list of integers or None for each dimension.
+
+
+- - -
+
+#### tf.TensorShape.is_compatible_with(other) {#TensorShape.is_compatible_with}
+
+Returns True iff `self` is compatible with `other`.
+
+Two possibly-partially-defined shapes are compatible if there
+exists a fully-defined shape that both shapes can represent. Thus,
+compatibility allows the shape inference code to reason about
+partially-defined shapes. For example:
+
+* TensorShape(None) is compatible with all shapes.
+
+* TensorShape([None, None]) is compatible with all two-dimensional
+ shapes, such as TensorShape([32, 784]), and also TensorShape(None). It is
+ not compatible with, for example, TensorShape([None]) or
+ TensorShape([None, None, None]).
+
+* TensorShape([32, None]) is compatible with all two-dimensional shapes
+ with size 32 in the 0th dimension, and also TensorShape([None, None])
+ and TensorShape(None). It is not compatible with, for example,
+ TensorShape([32]), TensorShape([32, None, 1]) or TensorShape([64, None]).
+
+* TensorShape([32, 784]) is compatible with itself, and also
+ TensorShape([32, None]), TensorShape([None, 784]), TensorShape([None,
+ None]) and TensorShape(None). It is not compatible with, for example,
+ TensorShape([32, 1, 784]) or TensorShape([None]).
+
+The compatibility relation is reflexive and symmetric, but not
+transitive. For example, TensorShape([32, 784]) is compatible with
+TensorShape(None), and TensorShape(None) is compatible with
+TensorShape([4, 4]), but TensorShape([32, 784]) is not compatible with
+TensorShape([4, 4]).
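+
+For example, a rough sketch of the relation in code:
+
+```python
+import tensorflow as tf
+
+a = tf.TensorShape([32, None])
+assert a.is_compatible_with(tf.TensorShape([32, 784]))
+assert a.is_compatible_with(tf.TensorShape(None))
+assert not a.is_compatible_with(tf.TensorShape([64, None]))
+```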
+
+##### Args:
+
+
+* <b>other</b>: Another TensorShape.
+
+##### Returns:
+
+ True iff `self` is compatible with `other`.
+
+
+- - -
+
+#### tf.TensorShape.is_fully_defined() {#TensorShape.is_fully_defined}
+
+Returns True iff `self` is fully defined in every dimension.
+
+
+
+- - -
+
+#### tf.TensorShape.with_rank(rank) {#TensorShape.with_rank}
+
+Returns a shape based on `self` with the given rank.
+
+This method promotes a completely unknown shape to one with a
+known rank.
+
+##### Args:
+
+
+* <b>rank</b>: An integer.
+
+##### Returns:
+
+ A shape that is at least as specific as `self` with the given rank.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` does not represent a shape with the given `rank`.
+
+
+- - -
+
+#### tf.TensorShape.with_rank_at_least(rank) {#TensorShape.with_rank_at_least}
+
+Returns a shape based on `self` with at least the given rank.
+
+##### Args:
+
+
+* <b>rank</b>: An integer.
+
+##### Returns:
+
+ A shape that is at least as specific as `self` with at least the given
+ rank.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` does not represent a shape with at least the given
+ `rank`.
+
+
+- - -
+
+#### tf.TensorShape.with_rank_at_most(rank) {#TensorShape.with_rank_at_most}
+
+Returns a shape based on `self` with at most the given rank.
+
+##### Args:
+
+
+* <b>rank</b>: An integer.
+
+##### Returns:
+
+ A shape that is at least as specific as `self` with at most the given
+ rank.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` does not represent a shape with at most the given
+ `rank`.
+
+
+
+- - -
+
+#### tf.TensorShape.assert_has_rank(rank) {#TensorShape.assert_has_rank}
+
+Raises an exception if `self` is not compatible with the given `rank`.
+
+##### Args:
+
+
+* <b>rank</b>: An integer.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` does not represent a shape with the given `rank`.
+
+
+- - -
+
+#### tf.TensorShape.assert_same_rank(other) {#TensorShape.assert_same_rank}
+
+Raises an exception if `self` and `other` do not have compatible ranks.
+
+##### Args:
+
+
+* <b>other</b>: Another `TensorShape`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` and `other` do not represent shapes with the
+ same rank.
+
+
+- - -
+
+#### tf.TensorShape.assert_is_compatible_with(other) {#TensorShape.assert_is_compatible_with}
+
+Raises exception if `self` and `other` do not represent the same shape.
+
+This method can be used to assert that there exists a shape that both
+`self` and `other` represent.
+
+##### Args:
+
+
+* <b>other</b>: Another TensorShape.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` and `other` do not represent the same shape.
+
+
+- - -
+
+#### tf.TensorShape.assert_is_fully_defined() {#TensorShape.assert_is_fully_defined}
+
+Raises an exception if `self` is not fully defined in every dimension.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` does not have a known value for every dimension.
+
+
+
+#### Other Methods
+- - -
+
+#### tf.TensorShape.__init__(dims) {#TensorShape.__init__}
+
+Creates a new TensorShape with the given dimensions.
+
+##### Args:
+
+
+* <b>dims</b>: A list of Dimensions, or None if the shape is unspecified.
+* <b>DEPRECATED</b>: A single integer is treated as a singleton list.
+
+
+- - -
+
+#### tf.TensorShape.as_dimension_list() {#TensorShape.as_dimension_list}
+
+DEPRECATED: use as_list().
+
+
+- - -
+
+#### tf.TensorShape.num_elements() {#TensorShape.num_elements}
+
+Returns the total number of elements, or `None` for incomplete shapes.
+
+
+
+- - -
+
+### class tf.Dimension <div class="md-anchor" id="Dimension">{#Dimension}</div>
+
+Represents the value of one dimension in a TensorShape.
+- - -
+
+#### tf.Dimension.__init__(value) {#Dimension.__init__}
+
+Creates a new Dimension with the given value.
+
+
+- - -
+
+#### tf.Dimension.assert_is_compatible_with(other) {#Dimension.assert_is_compatible_with}
+
+Raises an exception if `other` is not compatible with this Dimension.
+
+##### Args:
+
+
+* <b>other</b>: Another Dimension.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` and `other` are not compatible (see
+ is_compatible_with).
+
+
+- - -
+
+#### tf.Dimension.is_compatible_with(other) {#Dimension.is_compatible_with}
+
+Returns true if `other` is compatible with this Dimension.
+
+Two known Dimensions are compatible if they have the same value.
+An unknown Dimension is compatible with all other Dimensions.
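+
+For example, a rough sketch:
+
+```python
+import tensorflow as tf
+
+assert tf.Dimension(32).is_compatible_with(tf.Dimension(None))
+assert not tf.Dimension(32).is_compatible_with(tf.Dimension(64))
+```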
+
+##### Args:
+
+
+* <b>other</b>: Another Dimension.
+
+##### Returns:
+
+ True if this Dimension and `other` are compatible.
+
+
+- - -
+
+#### tf.Dimension.merge_with(other) {#Dimension.merge_with}
+
+Returns a Dimension that combines the information in `self` and `other`.
+
+Dimensions are combined as follows:
+
+ Dimension(n) .merge_with(Dimension(n)) == Dimension(n)
+ Dimension(n) .merge_with(Dimension(None)) == Dimension(n)
+ Dimension(None).merge_with(Dimension(n)) == Dimension(n)
+ Dimension(None).merge_with(Dimension(None)) == Dimension(None)
+ Dimension(n) .merge_with(Dimension(m)) raises ValueError for n != m
+
+##### Args:
+
+
+* <b>other</b>: Another Dimension.
+
+##### Returns:
+
+ A Dimension containing the combined information of `self` and
+ `other`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `self` and `other` are not compatible (see
+ is_compatible_with).
+
+
+- - -
+
+#### tf.Dimension.value {#Dimension.value}
+
+The value of this dimension, or None if it is unknown.
+
+
+- - -
+
+### tf.op_scope(*args, **kwds) <div class="md-anchor" id="op_scope">{#op_scope}</div>
+
+Returns a context manager for use when defining a Python op.
+
+This context manager validates that the given `values` are from the
+same graph, ensures that that graph is the default graph, and pushes a
+name scope.
+
+For example, to define a new Python op called `my_op`:
+
+```python
+def my_op(a, b, c, name=None):
+ with tf.op_scope([a, b, c], name, "MyOp") as scope:
+ a = tf.convert_to_tensor(a, name="a")
+ b = tf.convert_to_tensor(b, name="b")
+ c = tf.convert_to_tensor(c, name="c")
+ # Define some computation that uses `a`, `b`, and `c`.
+ return foo_op(..., name=scope)
+```
+
+##### Args:
+
+
+* <b>values</b>: The list of `Tensor` arguments that are passed to the op function.
+* <b>name</b>: The name argument that is passed to the op function.
+* <b>default_name</b>: The default name to use if the `name` argument is `None`.
+
+##### Returns:
+
+ A context manager for use in defining a Python op.
+
+
+- - -
+
+### tf.get_seed(op_seed) <div class="md-anchor" id="get_seed">{#get_seed}</div>
+
+Returns the local seeds an operation should use given an op-specific seed.
+
+Given an operation-specific seed, `op_seed`, this helper function returns two
+seeds derived from the graph-level and op-level seeds. Many random operations
+internally use the two seeds to allow the user to change the seed globally for
+a graph, or only for specific operations.
+
+For details on how the graph-level seed interacts with op seeds, see
+[`set_random_seed`](constant_op.md#set_random_seed).
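+
+For example, a rough sketch of how a Python op implementation might use this
+helper (the variable names are illustrative only):
+
+```python
+import tensorflow as tf
+
+graph_seed, local_seed = tf.get_seed(1234)
+# Both seeds would then be passed to the underlying random kernel so that
+# results are reproducible once a graph-level seed has been set with
+# tf.set_random_seed().
+```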
+
+##### Args:
+
+
+* <b>op_seed</b>: integer.
+
+##### Returns:
+
+ A tuple of two integers that should be used for the local seed of this
+ operation.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/image.md b/tensorflow/g3doc/api_docs/python/image.md
new file mode 100644
index 0000000000..6b3d6c3ca7
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/image.md
@@ -0,0 +1,857 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Images
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Encoding and Decoding.](#AUTOGENERATED-encoding-and-decoding.)
+ * [tf.image.decode_jpeg(contents, channels=None, ratio=None, fancy_upscaling=None, try_recover_truncated=None, acceptable_fraction=None, name=None)](#decode_jpeg)
+ * [tf.image.encode_jpeg(image, format=None, quality=None, progressive=None, optimize_size=None, chroma_downsampling=None, density_unit=None, x_density=None, y_density=None, xmp_metadata=None, name=None)](#encode_jpeg)
+ * [tf.image.decode_png(contents, channels=None, name=None)](#decode_png)
+ * [tf.image.encode_png(image, compression=None, name=None)](#encode_png)
+* [Resizing.](#AUTOGENERATED-resizing.)
+ * [tf.image.resize_images(images, new_height, new_width, method=0)](#resize_images)
+ * [tf.image.resize_area(images, size, name=None)](#resize_area)
+ * [tf.image.resize_bicubic(images, size, name=None)](#resize_bicubic)
+ * [tf.image.resize_bilinear(images, size, name=None)](#resize_bilinear)
+ * [tf.image.resize_nearest_neighbor(images, size, name=None)](#resize_nearest_neighbor)
+* [Cropping.](#AUTOGENERATED-cropping.)
+ * [tf.image.resize_image_with_crop_or_pad(image, target_height, target_width)](#resize_image_with_crop_or_pad)
+ * [tf.image.pad_to_bounding_box(image, offset_height, offset_width, target_height, target_width)](#pad_to_bounding_box)
+ * [tf.image.crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width)](#crop_to_bounding_box)
+ * [tf.image.random_crop(image, size, seed=None, name=None)](#random_crop)
+ * [tf.image.extract_glimpse(input, size, offsets, centered=None, normalized=None, uniform_noise=None, name=None)](#extract_glimpse)
+* [Flipping and Transposing.](#AUTOGENERATED-flipping-and-transposing.)
+ * [tf.image.flip_up_down(image)](#flip_up_down)
+ * [tf.image.random_flip_up_down(image, seed=None)](#random_flip_up_down)
+ * [tf.image.flip_left_right(image)](#flip_left_right)
+ * [tf.image.random_flip_left_right(image, seed=None)](#random_flip_left_right)
+ * [tf.image.transpose_image(image)](#transpose_image)
+* [Image Adjustments.](#AUTOGENERATED-image-adjustments.)
+ * [tf.image.adjust_brightness(image, delta, min_value=None, max_value=None)](#adjust_brightness)
+ * [tf.image.random_brightness(image, max_delta, seed=None)](#random_brightness)
+ * [tf.image.adjust_contrast(images, contrast_factor, min_value=None, max_value=None)](#adjust_contrast)
+ * [tf.image.random_contrast(image, lower, upper, seed=None)](#random_contrast)
+ * [tf.image.per_image_whitening(image)](#per_image_whitening)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Encoding and Decoding. <div class="md-anchor" id="AUTOGENERATED-encoding-and-decoding.">{#AUTOGENERATED-encoding-and-decoding.}</div>
+
+TensorFlow provides Ops to decode and encode JPEG and PNG formats. Encoded
+images are represented by scalar string Tensors, decoded images by 3-D uint8
+tensors of shape `[height, width, channels]`.
+
+The encode and decode Ops apply to one image at a time. Their input and output
+are all of variable size. If you need fixed size images, pass the output of
+the decode Ops to one of the cropping and resizing Ops.
+
+Note: The PNG encode and decode Ops support RGBA, but the conversion Ops
+presently only support RGB, HSV, and grayscale.
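+
+For example, a minimal decode/re-encode round trip might look like the
+following sketch (the file name is a placeholder; `tf.read_file` is documented
+in [Inputs and Readers](io_ops.md)):
+
+```python
+import tensorflow as tf
+
+contents = tf.read_file("input.jpg")                 # placeholder file name
+image = tf.image.decode_jpeg(contents, channels=3)   # 3-D uint8 tensor
+png = tf.image.encode_png(image)                     # scalar string tensor
+```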
+
+- - -
+
+### tf.image.decode_jpeg(contents, channels=None, ratio=None, fancy_upscaling=None, try_recover_truncated=None, acceptable_fraction=None, name=None) <div class="md-anchor" id="decode_jpeg">{#decode_jpeg}</div>
+
+Decode a JPEG-encoded image to a uint8 tensor.
+
+The attr `channels` indicates the desired number of color channels for the
+decoded image.
+
+Accepted values are:
+
+* 0: Use the number of channels in the JPEG-encoded image.
+* 1: output a grayscale image.
+* 3: output an RGB image.
+
+If needed, the JPEG-encoded image is transformed to match the requested number
+of color channels.
+
+The attr `ratio` allows downscaling the image by an integer factor during
+decoding. Allowed values are: 1, 2, 4, and 8. This is much faster than
+downscaling the image later.
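+
+For example, a rough sketch (assuming `contents` holds a JPEG-encoded string
+tensor):
+
+```python
+# Decode to grayscale at 1/4 of the original resolution.
+small_gray = tf.image.decode_jpeg(contents, channels=1, ratio=4)
+```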
+
+##### Args:
+
+
+* <b>contents</b>: A `Tensor` of type `string`. 0-D. The JPEG-encoded image.
+* <b>channels</b>: An optional `int`. Defaults to `0`.
+ Number of color channels for the decoded image.
+* <b>ratio</b>: An optional `int`. Defaults to `1`. Downscaling ratio.
+* <b>fancy_upscaling</b>: An optional `bool`. Defaults to `True`.
+ If true use a slower but nicer upscaling of the
+ chroma planes (yuv420/422 only).
+* <b>try_recover_truncated</b>: An optional `bool`. Defaults to `False`.
+ If true try to recover an image from truncated input.
+* <b>acceptable_fraction</b>: An optional `float`. Defaults to `1`.
+ The minimum required fraction of lines before a truncated
+ input is accepted.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+  A `Tensor` of type `uint8`. 3-D with shape `[height, width, channels]`.
+
+
+- - -
+
+### tf.image.encode_jpeg(image, format=None, quality=None, progressive=None, optimize_size=None, chroma_downsampling=None, density_unit=None, x_density=None, y_density=None, xmp_metadata=None, name=None) <div class="md-anchor" id="encode_jpeg">{#encode_jpeg}</div>
+
+JPEG-encode an image.
+
+`image` is a 3-D uint8 Tensor of shape `[height, width, channels]`.
+
+The attr `format` can be used to override the color format of the encoded
+output. Values can be:
+
+* `''`: Use a default format based on the number of channels in the image.
+* `grayscale`: Output a grayscale JPEG image. The `channels` dimension
+ of `image` must be 1.
+* `rgb`: Output an RGB JPEG image. The `channels` dimension
+ of `image` must be 3.
+
+If `format` is not specified or is the empty string, a default format is picked
+based on the number of channels in `image`:
+
+* 1: Output a grayscale image.
+* 3: Output an RGB image.
+
+##### Args:
+
+
+* <b>image</b>: A `Tensor` of type `uint8`.
+ 3-D with shape `[height, width, channels]`.
+* <b>format</b>: An optional `string` from: `"", "grayscale", "rgb"`. Defaults to `""`.
+ Per pixel image format.
+* <b>quality</b>: An optional `int`. Defaults to `95`.
+ Quality of the compression from 0 to 100 (higher is better and slower).
+* <b>progressive</b>: An optional `bool`. Defaults to `False`.
+ If True, create a JPEG that loads progressively (coarse to fine).
+* <b>optimize_size</b>: An optional `bool`. Defaults to `False`.
+ If True, spend CPU/RAM to reduce size with no quality change.
+* <b>chroma_downsampling</b>: An optional `bool`. Defaults to `True`.
+ See http://en.wikipedia.org/wiki/Chroma_subsampling.
+* <b>density_unit</b>: An optional `string` from: `"in", "cm"`. Defaults to `"in"`.
+ Unit used to specify `x_density` and `y_density`:
+ pixels per inch (`'in'`) or centimeter (`'cm'`).
+* <b>x_density</b>: An optional `int`. Defaults to `300`.
+ Horizontal pixels per density unit.
+* <b>y_density</b>: An optional `int`. Defaults to `300`.
+ Vertical pixels per density unit.
+* <b>xmp_metadata</b>: An optional `string`. Defaults to `""`.
+ If not empty, embed this XMP metadata in the image header.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `string`. 0-D. JPEG-encoded image.
+
+
+
+- - -
+
+### tf.image.decode_png(contents, channels=None, name=None) <div class="md-anchor" id="decode_png">{#decode_png}</div>
+
+Decode a PNG-encoded image to a uint8 tensor.
+
+The attr `channels` indicates the desired number of color channels for the
+decoded image.
+
+Accepted values are:
+
+* 0: Use the number of channels in the PNG-encoded image.
+* 1: output a grayscale image.
+* 3: output an RGB image.
+* 4: output an RGBA image.
+
+If needed, the PNG-encoded image is transformed to match the requested number
+of color channels.
+
+##### Args:
+
+
+* <b>contents</b>: A `Tensor` of type `string`. 0-D. The PNG-encoded image.
+* <b>channels</b>: An optional `int`. Defaults to `0`.
+ Number of color channels for the decoded image.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `uint8`. 3-D with shape `[height, width, channels]`.
+
+
+- - -
+
+### tf.image.encode_png(image, compression=None, name=None) <div class="md-anchor" id="encode_png">{#encode_png}</div>
+
+PNG-encode an image.
+
+`image` is a 3-D uint8 Tensor of shape `[height, width, channels]` where
+`channels` is:
+
+* 1: for grayscale.
+* 3: for RGB.
+* 4: for RGBA.
+
+The ZLIB compression level, `compression`, can be -1 for the PNG-encoder
+default or a value from 0 to 9. 9 is the highest compression level, generating
+the smallest output, but is slower.
+
+##### Args:
+
+
+* <b>image</b>: A `Tensor` of type `uint8`.
+ 3-D with shape `[height, width, channels]`.
+* <b>compression</b>: An optional `int`. Defaults to `-1`. Compression level.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `string`. 0-D. PNG-encoded image.
+
+
+
+## Resizing. <div class="md-anchor" id="AUTOGENERATED-resizing.">{#AUTOGENERATED-resizing.}</div>
+
+The resizing Ops accept input images as tensors of several types. They always
+output resized images as float32 tensors.
+
+The convenience function [resize_images()](#resize_images) supports both 4-D
+and 3-D tensors as input and output. 4-D tensors are for batches of images,
+3-D tensors for individual images.
+
+Other resizing Ops only support 3-D individual images as input:
+[resize_area](#resize_area), [resize_bicubic](#resize_bicubic),
+[resize_bilinear](#resize_bilinear),
+[resize_nearest_neighbor](#resize_nearest_neighbor).
+
+Example:
+
+```python
+# Decode a JPG image and resize it to 299 by 299.
+image = tf.image.decode_jpeg(...)
+resized_image = tf.image.resize_bilinear(image, [299, 299])
+```
+
+<i>Maybe refer to the Queue examples that show how to add images to a Queue
+after resizing them to a fixed size, and how to dequeue batches of resized
+images from the Queue.</i>
+
+- - -
+
+### tf.image.resize_images(images, new_height, new_width, method=0) <div class="md-anchor" id="resize_images">{#resize_images}</div>
+
+Resize `images` to `new_width`, `new_height` using the specified `method`.
+
+Resized images will be distorted if their original aspect ratio is not
+the same as `new_width`, `new_height`. To avoid distortions see
+[resize_image_with_crop_or_pad](#resize_image_with_crop_or_pad).
+
+`method` can be one of:
+
+* <b>ResizeMethod.BILINEAR</b>: [Bilinear interpolation.](https://en.wikipedia.org/wiki/Bilinear_interpolation)
+* <b>ResizeMethod.NEAREST_NEIGHBOR</b>: [Nearest neighbor interpolation.](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
+* <b>ResizeMethod.BICUBIC</b>: [Bicubic interpolation.](https://en.wikipedia.org/wiki/Bicubic_interpolation)
+* <b>ResizeMethod.AREA</b>: Area interpolation.
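+
+For example, a rough sketch (assuming `images` is a 4-D batch of images and
+that `ResizeMethod` is exposed as `tf.image.ResizeMethod`):
+
+```python
+resized = tf.image.resize_images(images, 299, 299,
+                                 method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
+```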
+
+##### Args:
+
+
+* <b>images</b>: 4-D Tensor of shape `[batch, height, width, channels]` or
+ 3-D Tensor of shape `[height, width, channels]`.
+* <b>new_height</b>: integer.
+* <b>new_width</b>: integer.
+* <b>method</b>: ResizeMethod. Defaults to `ResizeMethod.BILINEAR`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `images` is incompatible with the
+ shape arguments to this function
+* <b>ValueError</b>: if an unsupported resize method is specified.
+
+##### Returns:
+
+ If `images` was 4-D, a 4-D float Tensor of shape
+ `[batch, new_height, new_width, channels]`.
+ If `images` was 3-D, a 3-D float Tensor of shape
+ `[new_height, new_width, channels]`.
+
+
+
+- - -
+
+### tf.image.resize_area(images, size, name=None) <div class="md-anchor" id="resize_area">{#resize_area}</div>
+
+Resize `images` to `size` using area interpolation.
+
+Input images can be of different types but output images are always float.
+
+##### Args:
+
+
+* <b>images</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int32`, `float32`, `float64`.
+ 4-D with shape `[batch, height, width, channels]`.
+* <b>size</b>: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`. 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+
+
+- - -
+
+### tf.image.resize_bicubic(images, size, name=None) <div class="md-anchor" id="resize_bicubic">{#resize_bicubic}</div>
+
+Resize `images` to `size` using bicubic interpolation.
+
+Input images can be of different types but output images are always float.
+
+##### Args:
+
+
+* <b>images</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int32`, `float32`, `float64`.
+ 4-D with shape `[batch, height, width, channels]`.
+* <b>size</b>: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`. 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+
+
+- - -
+
+### tf.image.resize_bilinear(images, size, name=None) <div class="md-anchor" id="resize_bilinear">{#resize_bilinear}</div>
+
+Resize `images` to `size` using bilinear interpolation.
+
+Input images can be of different types but output images are always float.
+
+##### Args:
+
+
+* <b>images</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int32`, `float32`, `float64`.
+ 4-D with shape `[batch, height, width, channels]`.
+* <b>size</b>: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`. 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+
+
+- - -
+
+### tf.image.resize_nearest_neighbor(images, size, name=None) <div class="md-anchor" id="resize_nearest_neighbor">{#resize_nearest_neighbor}</div>
+
+Resize `images` to `size` using nearest neighbor interpolation.
+
+Input images can be of different types but output images are always float.
+
+##### Args:
+
+
+* <b>images</b>: A `Tensor`. Must be one of the following types: `uint8`, `int8`, `int32`, `float32`, `float64`.
+ 4-D with shape `[batch, height, width, channels]`.
+* <b>size</b>: A 1-D int32 Tensor of 2 elements: `new_height, new_width`. The
+ new size for the images.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `images`. 4-D with shape
+ `[batch, new_height, new_width, channels]`.
+
+
+
+
+## Cropping. <div class="md-anchor" id="AUTOGENERATED-cropping.">{#AUTOGENERATED-cropping.}</div>
+
+- - -
+
+### tf.image.resize_image_with_crop_or_pad(image, target_height, target_width) <div class="md-anchor" id="resize_image_with_crop_or_pad">{#resize_image_with_crop_or_pad}</div>
+
+Crops and/or pads an image to a target width and height.
+
+Resizes an image to a target width and height by either centrally
+cropping the image or padding it evenly with zeros.
+
+If `width` or `height` is greater than the specified `target_width` or
+`target_height` respectively, this op centrally crops along that dimension.
+If `width` or `height` is smaller than the specified `target_width` or
+`target_height` respectively, this op centrally pads with 0 along that
+dimension.
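+
+For example, a rough sketch:
+
+```python
+# Center-crop or zero-pad `image` so that the result is exactly 227 x 227.
+square = tf.image.resize_image_with_crop_or_pad(image, 227, 227)
+```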
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape [height, width, channels]
+* <b>target_height</b>: Target height.
+* <b>target_width</b>: Target width.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if `target_height` or `target_width` are zero or negative.
+
+##### Returns:
+
+ Cropped and/or padded image of shape
+ `[target_height, target_width, channels]`
+
+
+
+- - -
+
+### tf.image.pad_to_bounding_box(image, offset_height, offset_width, target_height, target_width) <div class="md-anchor" id="pad_to_bounding_box">{#pad_to_bounding_box}</div>
+
+Pad `image` with zeros to the specified `height` and `width`.
+
+Adds `offset_height` rows of zeros on top, `offset_width` columns of
+zeros on the left, and then pads the image on the bottom and right
+with zeros until it has dimensions `target_height`, `target_width`.
+
+This op does nothing if `offset_*` is zero and the image already has size
+`target_height` by `target_width`.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor with shape `[height, width, channels]`
+* <b>offset_height</b>: Number of rows of zeros to add on top.
+* <b>offset_width</b>: Number of columns of zeros to add on the left.
+* <b>target_height</b>: Height of output image.
+* <b>target_width</b>: Width of output image.
+
+##### Returns:
+
+ 3-D tensor of shape `[target_height, target_width, channels]`
+
+##### Raises:
+
+
+* <b>ValueError</b>: If the shape of `image` is incompatible with the `offset_*` or
+ `target_*` arguments
+
+
+- - -
+
+### tf.image.crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width) <div class="md-anchor" id="crop_to_bounding_box">{#crop_to_bounding_box}</div>
+
+Crops an image to a specified bounding box.
+
+This op cuts a rectangular part out of `image`. The top-left corner of the
+returned image is at `offset_height, offset_width` in `image`, and its
+lower-right corner is at
+`offset_height + target_height, offset_width + target_width`.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor with shape `[height, width, channels]`
+* <b>offset_height</b>: Vertical coordinate of the top-left corner of the result in
+ the input.
+* <b>offset_width</b>: Horizontal coordinate of the top-left corner of the result in
+ the input.
+* <b>target_height</b>: Height of the result.
+* <b>target_width</b>: Width of the result.
+
+##### Returns:
+
+ 3-D tensor of image with shape `[target_height, target_width, channels]`
+
+##### Raises:
+
+
+* <b>ValueError</b>: If the shape of `image` is incompatible with the `offset_*` or
+ `target_*` arguments
+
+
+- - -
+
+### tf.image.random_crop(image, size, seed=None, name=None) <div class="md-anchor" id="random_crop">{#random_crop}</div>
+
+Randomly crops `image` to size `[target_height, target_width]`.
+
+The offset of the output within `image` is uniformly random. `image` always
+fully contains the result.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape `[height, width, channels]`
+* <b>size</b>: 1-D tensor with two elements, specifying target `[height, width]`
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ A cropped 3-D tensor of shape `[target_height, target_width, channels]`.
+
+
+- - -
+
+### tf.image.extract_glimpse(input, size, offsets, centered=None, normalized=None, uniform_noise=None, name=None) <div class="md-anchor" id="extract_glimpse">{#extract_glimpse}</div>
+
+Extracts a glimpse from the input tensor.
+
+Returns a set of windows called glimpses extracted at locations `offsets`
+from the input tensor. If a window only partially overlaps the input, the
+non-overlapping areas will be filled with random noise.
+
+The result is a 4-D tensor of shape `[batch_size, glimpse_height,
+glimpse_width, channels]`. The channels and batch dimensions are the same as
+those of the input tensor. The height and width of the output windows are
+specified in the `size` parameter.
+
+The arguments `normalized` and `centered` control how the windows are built:
+* If the coordinates are normalized but not centered, 0.0 and 1.0
+ correspond to the minimum and maximum of each height and width dimension.
+* If the coordinates are both normalized and centered, they range from -1.0 to
+ 1.0. The coordinates (-1.0, -1.0) correspond to the upper left corner, the
+ lower right corner is located at (1.0, 1.0) and the center is at (0, 0).
+* If the coordinates are not normalized they are interpreted as numbers of pixels.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor` of type `float32`.
+ A 4-D float tensor of shape `[batch_size, height, width, channels]`.
+* <b>size</b>: A `Tensor` of type `int32`.
+  A 1-D tensor of 2 elements containing the size of the glimpses to extract.
+  The glimpse height must be specified first, followed by the glimpse width.
+* <b>offsets</b>: A `Tensor` of type `float32`.
+ A 2-D integer tensor of shape `[batch_size, 2]` containing the x, y
+ locations of the center of each window.
+* <b>centered</b>: An optional `bool`. Defaults to `True`.
+ indicates if the offset coordinates are centered relative to
+ the image, in which case the (0, 0) offset is relative to the center of the
+ input images. If false, the (0,0) offset corresponds to the upper left corner
+ of the input images.
+* <b>normalized</b>: An optional `bool`. Defaults to `True`.
+ indicates if the offset coordinates are normalized.
+* <b>uniform_noise</b>: An optional `bool`. Defaults to `True`.
+  indicates if the noise should be generated using a
+  uniform distribution or a Gaussian distribution.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`.
+ A tensor representing the glimpses `[batch_size, glimpse_height,
+ glimpse_width, channels]`.
+
+
+
+## Flipping and Transposing. <div class="md-anchor" id="AUTOGENERATED-flipping-and-transposing.">{#AUTOGENERATED-flipping-and-transposing.}</div>
+
+- - -
+
+### tf.image.flip_up_down(image) <div class="md-anchor" id="flip_up_down">{#flip_up_down}</div>
+
+Flip an image vertically (upside down).
+
+Outputs the contents of `image` flipped along the first dimension, which is
+`height`.
+
+See also `reverse()`.
+
+##### Args:
+
+
+* <b>image</b>: A 3-D tensor of shape `[height, width, channels].`
+
+##### Returns:
+
+ A 3-D tensor of the same type and shape as `image`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is not supported.
+
+
+- - -
+
+### tf.image.random_flip_up_down(image, seed=None) <div class="md-anchor" id="random_flip_up_down">{#random_flip_up_down}</div>
+
+Randomly flips an image vertically (upside down).
+
+With a 1 in 2 chance, outputs the contents of `image` flipped along the first
+dimension, which is `height`. Otherwise, outputs the image as-is.
+
+##### Args:
+
+
+* <b>image</b>: A 3-D tensor of shape `[height, width, channels].`
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ A 3-D tensor of the same type and shape as `image`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is not supported.
+
+
+
+- - -
+
+### tf.image.flip_left_right(image) <div class="md-anchor" id="flip_left_right">{#flip_left_right}</div>
+
+Flip an image horizontally (left to right).
+
+Outputs the contents of `image` flipped along the second dimension, which is
+`width`.
+
+See also `reverse()`.
+
+##### Args:
+
+
+* <b>image</b>: A 3-D tensor of shape `[height, width, channels].`
+
+##### Returns:
+
+ A 3-D tensor of the same type and shape as `image`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is not supported.
+
+
+- - -
+
+### tf.image.random_flip_left_right(image, seed=None) <div class="md-anchor" id="random_flip_left_right">{#random_flip_left_right}</div>
+
+Randomly flip an image horizontally (left to right).
+
+With a 1 in 2 chance, outputs the contents of `image` flipped along the
+second dimension, which is `width`. Otherwise, outputs the image as-is.
+
+##### Args:
+
+
+* <b>image</b>: A 3-D tensor of shape `[height, width, channels].`
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ A 3-D tensor of the same type and shape as `image`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is not supported.
+
+
+
+- - -
+
+### tf.image.transpose_image(image) <div class="md-anchor" id="transpose_image">{#transpose_image}</div>
+
+Transpose an image by swapping the first and second dimension.
+
+See also `transpose()`.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape `[height, width, channels]`
+
+##### Returns:
+
+ A 3-D tensor of shape `[width, height, channels]`
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is not supported.
+
+
+
+## Image Adjustments. <div class="md-anchor" id="AUTOGENERATED-image-adjustments.">{#AUTOGENERATED-image-adjustments.}</div>
+
+TensorFlow provides functions to adjust images in various ways: brightness,
+contrast, hue, and saturation. Each adjustment can be done with predefined
+parameters or with random parameters picked from predefined intervals. Random
+adjustments are often useful to expand a training set and reduce overfitting.
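+
+For example, a simple augmentation pipeline might chain several of the random
+adjustments below (a sketch; the parameter values are arbitrary):
+
+```python
+image = tf.image.random_flip_left_right(image)
+image = tf.image.random_brightness(image, max_delta=32)
+image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
+image = tf.image.per_image_whitening(image)
+```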
+
+- - -
+
+### tf.image.adjust_brightness(image, delta, min_value=None, max_value=None) <div class="md-anchor" id="adjust_brightness">{#adjust_brightness}</div>
+
+Adjust the brightness of RGB or Grayscale images.
+
+The value `delta` is added to all components of the tensor `image`. `image`
+and `delta` are cast to `float` before adding, and the resulting values are
+clamped to `[min_value, max_value]`. Finally, the result is cast back to
+`image.dtype`.
+
+If `min_value` or `max_value` are not given, they are set to the minimum and
+maximum allowed values for `image.dtype` respectively.
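+
+For example, a rough sketch:
+
+```python
+# Brighten a uint8 image by 10; with the defaults, values are clamped to [0, 255].
+brighter = tf.image.adjust_brightness(image, delta=10)
+```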
+
+##### Args:
+
+
+* <b>image</b>: A tensor.
+* <b>delta</b>: A scalar. Amount to add to the pixel values.
+* <b>min_value</b>: Minimum value for output.
+* <b>max_value</b>: Maximum value for output.
+
+##### Returns:
+
+ A tensor of the same shape and type as `image`.
+
+
+- - -
+
+### tf.image.random_brightness(image, max_delta, seed=None) <div class="md-anchor" id="random_brightness">{#random_brightness}</div>
+
+Adjust the brightness of images by a random factor.
+
+Equivalent to `adjust_brightness()` using a `delta` randomly picked in the
+interval `[-max_delta, max_delta)`.
+
+Note that `delta` is picked as a float. Because the brightness-adjusted result
+for integer-type images is rounded before the cast back, integer images may
+be modified by values in the closed range `[-max_delta, max_delta]`.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape `[height, width, channels]`.
+* <b>max_delta</b>: float, must be non-negative.
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ 3-D tensor of images of shape `[height, width, channels]`
+
+##### Raises:
+
+
+* <b>ValueError</b>: if max_delta is negative.
+
+
+
+- - -
+
+### tf.image.adjust_contrast(images, contrast_factor, min_value=None, max_value=None) <div class="md-anchor" id="adjust_contrast">{#adjust_contrast}</div>
+
+Adjust contrast of RGB or grayscale images.
+
+`images` is a tensor of at least 3 dimensions. The last 3 dimensions are
+interpreted as `[height, width, channels]`. The other dimensions only
+represent a collection of images, such as `[batch, height, width, channels]`.
+
+Contrast is adjusted independently for each channel of each image.
+
+For each channel, this Op first computes the mean of the image pixels in the
+channel and then adjusts each component `x` of each pixel to
+`(x - mean) * contrast_factor + mean`.
+
+The adjusted values are then clipped to fit in the `[min_value, max_value]`
+interval. If `min_value` or `max_value` is not given, it is replaced with the
+minimum and maximum values for the data type of `images` respectively.
+
+The contrast-adjusted image is always computed as `float`, and it is
+cast back to its original type after clipping.
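+
+For example, a rough sketch:
+
+```python
+# Double the distance of each pixel value from its channel mean.
+adjusted = tf.image.adjust_contrast(images, contrast_factor=2.0)
+```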
+
+##### Args:
+
+
+* <b>images</b>: Images to adjust. At least 3-D.
+* <b>contrast_factor</b>: A float multiplier for adjusting contrast.
+* <b>min_value</b>: Minimum value for clipping the adjusted pixels.
+* <b>max_value</b>: Maximum value for clipping the adjusted pixels.
+
+##### Returns:
+
+  The contrast-adjusted image or images.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the arguments are invalid.
+
+
+- - -
+
+### tf.image.random_contrast(image, lower, upper, seed=None) <div class="md-anchor" id="random_contrast">{#random_contrast}</div>
+
+Adjust the contrast of an image by a random factor.
+
+Equivalent to `adjust_contrast()` but uses a `contrast_factor` randomly
+picked in the interval `[lower, upper]`.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape `[height, width, channels]`.
+* <b>lower</b>: float. Lower bound for the random contrast factor.
+* <b>upper</b>: float. Upper bound for the random contrast factor.
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ 3-D tensor of shape `[height, width, channels]`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if `upper <= lower` or if `lower < 0`.
+
+
+
+- - -
+
+### tf.image.per_image_whitening(image) <div class="md-anchor" id="per_image_whitening">{#per_image_whitening}</div>
+
+Linearly scales `image` to have zero mean and unit norm.
+
+This op computes `(x - mean) / adjusted_stddev`, where `mean` is the average
+of all values in `image`, and
+`adjusted_stddev = max(stddev, 1.0/sqrt(image.NumElements()))`.
+
+`stddev` is the standard deviation of all values in `image`. It is capped
+away from zero to protect against division by 0 when handling uniform images.
+
+Note that this implementation is limited:
+* It only whitens based on the statistics of an individual image.
+* It does not take into account the covariance structure.
+
+##### Args:
+
+
+* <b>image</b>: 3-D tensor of shape `[height, width, channels]`.
+
+##### Returns:
+
+ The whitened image with same shape as `image`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if the shape of `image` is incompatible with this function.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/index.md b/tensorflow/g3doc/api_docs/python/index.md
new file mode 100644
index 0000000000..72c0a401ef
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/index.md
@@ -0,0 +1,352 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# TensorFlow Python reference documentation
+
+* <b>[Building Graphs](framework.md)</b>: [class DType](framework.md#DType),
+ [class Dimension](framework.md#Dimension),
+ [class Graph](framework.md#Graph),
+ [class GraphKeys](framework.md#GraphKeys),
+ [NoGradient](framework.md#NoGradient),
+ [class Operation](framework.md#Operation),
+ [class RegisterGradient](framework.md#RegisterGradient),
+ [class RegisterShape](framework.md#RegisterShape),
+ [class Tensor](framework.md#Tensor),
+ [class TensorShape](framework.md#TensorShape),
+ [add_to_collection](framework.md#add_to_collection),
+ [as_dtype](framework.md#as_dtype),
+ [control_dependencies](framework.md#control_dependencies),
+ [convert_to_tensor](framework.md#convert_to_tensor),
+ [device](framework.md#device),
+ [get_collection](framework.md#get_collection),
+ [get_default_graph](framework.md#get_default_graph),
+ [get_seed](framework.md#get_seed),
+ [import_graph_def](framework.md#import_graph_def),
+ [name_scope](framework.md#name_scope),
+ [op_scope](framework.md#op_scope)
+
+* <b>[Constants, Sequences, and Random Values](constant_op.md)</b>: [constant](constant_op.md#constant),
+ [fill](constant_op.md#fill),
+ [linspace](constant_op.md#linspace),
+ [ones](constant_op.md#ones),
+ [ones_like](constant_op.md#ones_like),
+ [random_normal](constant_op.md#random_normal),
+ [random_shuffle](constant_op.md#random_shuffle),
+ [random_uniform](constant_op.md#random_uniform),
+ [range](constant_op.md#range),
+ [set_random_seed](constant_op.md#set_random_seed),
+ [truncated_normal](constant_op.md#truncated_normal),
+ [zeros](constant_op.md#zeros),
+ [zeros_like](constant_op.md#zeros_like)
+
+* <b>[Variables](state_ops.md)</b>: [class IndexedSlices](state_ops.md#IndexedSlices),
+ [class Saver](state_ops.md#Saver),
+ [class Variable](state_ops.md#Variable),
+ [all_variables](state_ops.md#all_variables),
+ [assert_variables_initialized](state_ops.md#assert_variables_initialized),
+ [assign](state_ops.md#assign),
+ [assign_add](state_ops.md#assign_add),
+ [assign_sub](state_ops.md#assign_sub),
+ [constant_initializer](state_ops.md#constant_initializer),
+ [count_up_to](state_ops.md#count_up_to),
+ [device](state_ops.md#device),
+ [get_checkpoint_state](state_ops.md#get_checkpoint_state),
+ [get_variable](state_ops.md#get_variable),
+ [get_variable_scope](state_ops.md#get_variable_scope),
+ [initialize_all_variables](state_ops.md#initialize_all_variables),
+ [initialize_variables](state_ops.md#initialize_variables),
+ [latest_checkpoint](state_ops.md#latest_checkpoint),
+ [random_normal_initializer](state_ops.md#random_normal_initializer),
+ [random_uniform_initializer](state_ops.md#random_uniform_initializer),
+ [scatter_add](state_ops.md#scatter_add),
+ [scatter_sub](state_ops.md#scatter_sub),
+ [scatter_update](state_ops.md#scatter_update),
+ [sparse_mask](state_ops.md#sparse_mask),
+ [trainable_variables](state_ops.md#trainable_variables),
+ [truncated_normal_initializer](state_ops.md#truncated_normal_initializer),
+ [uniform_unit_scaling_initializer](state_ops.md#uniform_unit_scaling_initializer),
+ [update_checkpoint_state](state_ops.md#update_checkpoint_state),
+ [variable_scope](state_ops.md#variable_scope),
+ [zeros_initializer](state_ops.md#zeros_initializer)
+
+* <b>[Tensor Transformations](array_ops.md)</b>: [cast](array_ops.md#cast),
+ [concat](array_ops.md#concat),
+ [dynamic_partition](array_ops.md#dynamic_partition),
+ [dynamic_stitch](array_ops.md#dynamic_stitch),
+ [expand_dims](array_ops.md#expand_dims),
+ [gather](array_ops.md#gather),
+ [pack](array_ops.md#pack),
+ [pad](array_ops.md#pad),
+ [rank](array_ops.md#rank),
+ [reshape](array_ops.md#reshape),
+ [reverse](array_ops.md#reverse),
+ [reverse_sequence](array_ops.md#reverse_sequence),
+ [shape](array_ops.md#shape),
+ [size](array_ops.md#size),
+ [slice](array_ops.md#slice),
+ [split](array_ops.md#split),
+ [squeeze](array_ops.md#squeeze),
+ [string_to_number](array_ops.md#string_to_number),
+ [tile](array_ops.md#tile),
+ [to_bfloat16](array_ops.md#to_bfloat16),
+ [to_double](array_ops.md#to_double),
+ [to_float](array_ops.md#to_float),
+ [to_int32](array_ops.md#to_int32),
+ [to_int64](array_ops.md#to_int64),
+ [transpose](array_ops.md#transpose),
+ [unpack](array_ops.md#unpack)
+
+* <b>[Math](math_ops.md)</b>: [abs](math_ops.md#abs),
+ [accumulate_n](math_ops.md#accumulate_n),
+ [add](math_ops.md#add),
+ [add_n](math_ops.md#add_n),
+ [argmax](math_ops.md#argmax),
+ [argmin](math_ops.md#argmin),
+ [batch_cholesky](math_ops.md#batch_cholesky),
+ [batch_matmul](math_ops.md#batch_matmul),
+ [batch_matrix_determinant](math_ops.md#batch_matrix_determinant),
+ [batch_matrix_inverse](math_ops.md#batch_matrix_inverse),
+ [ceil](math_ops.md#ceil),
+ [cholesky](math_ops.md#cholesky),
+ [complex](math_ops.md#complex),
+ [complex_abs](math_ops.md#complex_abs),
+ [conj](math_ops.md#conj),
+ [cos](math_ops.md#cos),
+ [diag](math_ops.md#diag),
+ [div](math_ops.md#div),
+ [edit_distance](math_ops.md#edit_distance),
+ [exp](math_ops.md#exp),
+ [floor](math_ops.md#floor),
+ [imag](math_ops.md#imag),
+ [inv](math_ops.md#inv),
+ [invert_permutation](math_ops.md#invert_permutation),
+ [listdiff](math_ops.md#listdiff),
+ [log](math_ops.md#log),
+ [matmul](math_ops.md#matmul),
+ [matrix_determinant](math_ops.md#matrix_determinant),
+ [matrix_inverse](math_ops.md#matrix_inverse),
+ [maximum](math_ops.md#maximum),
+ [minimum](math_ops.md#minimum),
+ [mod](math_ops.md#mod),
+ [mul](math_ops.md#mul),
+ [neg](math_ops.md#neg),
+ [pow](math_ops.md#pow),
+ [real](math_ops.md#real),
+ [reduce_all](math_ops.md#reduce_all),
+ [reduce_any](math_ops.md#reduce_any),
+ [reduce_max](math_ops.md#reduce_max),
+ [reduce_mean](math_ops.md#reduce_mean),
+ [reduce_min](math_ops.md#reduce_min),
+ [reduce_prod](math_ops.md#reduce_prod),
+ [reduce_sum](math_ops.md#reduce_sum),
+ [round](math_ops.md#round),
+ [rsqrt](math_ops.md#rsqrt),
+ [segment_max](math_ops.md#segment_max),
+ [segment_mean](math_ops.md#segment_mean),
+ [segment_min](math_ops.md#segment_min),
+ [segment_prod](math_ops.md#segment_prod),
+ [segment_sum](math_ops.md#segment_sum),
+ [sign](math_ops.md#sign),
+ [sin](math_ops.md#sin),
+ [sparse_segment_mean](math_ops.md#sparse_segment_mean),
+ [sparse_segment_sum](math_ops.md#sparse_segment_sum),
+ [sqrt](math_ops.md#sqrt),
+ [square](math_ops.md#square),
+ [sub](math_ops.md#sub),
+ [transpose](math_ops.md#transpose),
+ [unique](math_ops.md#unique),
+ [unsorted_segment_sum](math_ops.md#unsorted_segment_sum),
+ [where](math_ops.md#where)
+
+* <b>[Control Flow](control_flow_ops.md)</b>: [Assert](control_flow_ops.md#Assert),
+ [Print](control_flow_ops.md#Print),
+ [add_check_numerics_ops](control_flow_ops.md#add_check_numerics_ops),
+ [check_numerics](control_flow_ops.md#check_numerics),
+ [count_up_to](control_flow_ops.md#count_up_to),
+ [equal](control_flow_ops.md#equal),
+ [greater](control_flow_ops.md#greater),
+ [greater_equal](control_flow_ops.md#greater_equal),
+ [group](control_flow_ops.md#group),
+ [identity](control_flow_ops.md#identity),
+ [is_finite](control_flow_ops.md#is_finite),
+ [is_inf](control_flow_ops.md#is_inf),
+ [is_nan](control_flow_ops.md#is_nan),
+ [less](control_flow_ops.md#less),
+ [less_equal](control_flow_ops.md#less_equal),
+ [logical_and](control_flow_ops.md#logical_and),
+ [logical_not](control_flow_ops.md#logical_not),
+ [logical_or](control_flow_ops.md#logical_or),
+ [logical_xor](control_flow_ops.md#logical_xor),
+ [no_op](control_flow_ops.md#no_op),
+ [not_equal](control_flow_ops.md#not_equal),
+ [select](control_flow_ops.md#select),
+ [tuple](control_flow_ops.md#tuple),
+ [verify_tensor_all_finite](control_flow_ops.md#verify_tensor_all_finite),
+ [where](control_flow_ops.md#where)
+
+* <b>[Images](image.md)</b>: [adjust_brightness](image.md#adjust_brightness),
+ [adjust_contrast](image.md#adjust_contrast),
+ [crop_to_bounding_box](image.md#crop_to_bounding_box),
+ [decode_jpeg](image.md#decode_jpeg),
+ [decode_png](image.md#decode_png),
+ [encode_jpeg](image.md#encode_jpeg),
+ [encode_png](image.md#encode_png),
+ [extract_glimpse](image.md#extract_glimpse),
+ [flip_left_right](image.md#flip_left_right),
+ [flip_up_down](image.md#flip_up_down),
+ [pad_to_bounding_box](image.md#pad_to_bounding_box),
+ [per_image_whitening](image.md#per_image_whitening),
+ [random_brightness](image.md#random_brightness),
+ [random_contrast](image.md#random_contrast),
+ [random_crop](image.md#random_crop),
+ [random_flip_left_right](image.md#random_flip_left_right),
+ [random_flip_up_down](image.md#random_flip_up_down),
+ [resize_area](image.md#resize_area),
+ [resize_bicubic](image.md#resize_bicubic),
+ [resize_bilinear](image.md#resize_bilinear),
+ [resize_image_with_crop_or_pad](image.md#resize_image_with_crop_or_pad),
+ [resize_images](image.md#resize_images),
+ [resize_nearest_neighbor](image.md#resize_nearest_neighbor),
+ [transpose_image](image.md#transpose_image)
+
+* <b>[Sparse Tensors](sparse_ops.md)</b>: [class SparseTensor](sparse_ops.md#SparseTensor),
+ [class SparseTensorValue](sparse_ops.md#SparseTensorValue),
+ [shape](sparse_ops.md#shape),
+ [sparse_concat](sparse_ops.md#sparse_concat),
+ [sparse_fill_empty_rows](sparse_ops.md#sparse_fill_empty_rows),
+ [sparse_reorder](sparse_ops.md#sparse_reorder),
+ [sparse_retain](sparse_ops.md#sparse_retain),
+ [sparse_tensor_to_dense](sparse_ops.md#sparse_tensor_to_dense),
+ [sparse_to_dense](sparse_ops.md#sparse_to_dense),
+ [sparse_to_indicator](sparse_ops.md#sparse_to_indicator)
+
+* <b>[Inputs and Readers](io_ops.md)</b>: [class FIFOQueue](io_ops.md#FIFOQueue),
+ [class FixedLengthRecordReader](io_ops.md#FixedLengthRecordReader),
+ [class IdentityReader](io_ops.md#IdentityReader),
+ [class QueueBase](io_ops.md#QueueBase),
+ [class RandomShuffleQueue](io_ops.md#RandomShuffleQueue),
+ [class ReaderBase](io_ops.md#ReaderBase),
+ [class TFRecordReader](io_ops.md#TFRecordReader),
+ [class TextLineReader](io_ops.md#TextLineReader),
+ [class WholeFileReader](io_ops.md#WholeFileReader),
+ [batch](io_ops.md#batch),
+ [batch_join](io_ops.md#batch_join),
+ [decode_csv](io_ops.md#decode_csv),
+ [decode_raw](io_ops.md#decode_raw),
+ [limit_epochs](io_ops.md#limit_epochs),
+ [match_filenames_once](io_ops.md#match_filenames_once),
+ [matching_files](io_ops.md#matching_files),
+ [parse_example](io_ops.md#parse_example),
+ [parse_single_example](io_ops.md#parse_single_example),
+ [placeholder](io_ops.md#placeholder),
+ [range_input_producer](io_ops.md#range_input_producer),
+ [read_file](io_ops.md#read_file),
+ [shuffle_batch](io_ops.md#shuffle_batch),
+ [shuffle_batch_join](io_ops.md#shuffle_batch_join),
+ [size](io_ops.md#size),
+ [slice_input_producer](io_ops.md#slice_input_producer),
+ [string_input_producer](io_ops.md#string_input_producer)
+
+* <b>[Data IO (Python functions)](python_io.md)</b>: [class TFRecordWriter](python_io.md#TFRecordWriter),
+ [tf_record_iterator](python_io.md#tf_record_iterator)
+
+* <b>[Neural Network](nn.md)</b>: [avg_pool](nn.md#avg_pool),
+ [bias_add](nn.md#bias_add),
+ [compute_accidental_hits](nn.md#compute_accidental_hits),
+ [conv2d](nn.md#conv2d),
+ [depthwise_conv2d](nn.md#depthwise_conv2d),
+ [dropout](nn.md#dropout),
+ [embedding_lookup](nn.md#embedding_lookup),
+ [embedding_lookup_sparse](nn.md#embedding_lookup_sparse),
+ [fixed_unigram_candidate_sampler](nn.md#fixed_unigram_candidate_sampler),
+ [in_top_k](nn.md#in_top_k),
+ [l2_loss](nn.md#l2_loss),
+ [l2_normalize](nn.md#l2_normalize),
+ [learned_unigram_candidate_sampler](nn.md#learned_unigram_candidate_sampler),
+ [local_response_normalization](nn.md#local_response_normalization),
+ [log_uniform_candidate_sampler](nn.md#log_uniform_candidate_sampler),
+ [max_pool](nn.md#max_pool),
+ [max_pool_with_argmax](nn.md#max_pool_with_argmax),
+ [moments](nn.md#moments),
+ [nce_loss](nn.md#nce_loss),
+ [relu](nn.md#relu),
+ [relu6](nn.md#relu6),
+ [sampled_softmax_loss](nn.md#sampled_softmax_loss),
+ [separable_conv2d](nn.md#separable_conv2d),
+ [sigmoid](nn.md#sigmoid),
+ [sigmoid_cross_entropy_with_logits](nn.md#sigmoid_cross_entropy_with_logits),
+ [softmax](nn.md#softmax),
+ [softmax_cross_entropy_with_logits](nn.md#softmax_cross_entropy_with_logits),
+ [softplus](nn.md#softplus),
+ [tanh](nn.md#tanh),
+ [top_k](nn.md#top_k),
+ [uniform_candidate_sampler](nn.md#uniform_candidate_sampler)
+
+* <b>[Running Graphs](client.md)</b>: [class AbortedError](client.md#AbortedError),
+ [class AlreadyExistsError](client.md#AlreadyExistsError),
+ [class CancelledError](client.md#CancelledError),
+ [class DataLossError](client.md#DataLossError),
+ [class DeadlineExceededError](client.md#DeadlineExceededError),
+ [class FailedPreconditionError](client.md#FailedPreconditionError),
+ [class InternalError](client.md#InternalError),
+ [class InvalidArgumentError](client.md#InvalidArgumentError),
+ [class NotFoundError](client.md#NotFoundError),
+ [class OpError](client.md#OpError),
+ [class OutOfRangeError](client.md#OutOfRangeError),
+ [class PermissionDeniedError](client.md#PermissionDeniedError),
+ [class ResourceExhaustedError](client.md#ResourceExhaustedError),
+ [class Session](client.md#Session),
+ [class UnauthenticatedError](client.md#UnauthenticatedError),
+ [class UnavailableError](client.md#UnavailableError),
+ [class UnimplementedError](client.md#UnimplementedError),
+ [class UnknownError](client.md#UnknownError),
+ [get_default_session](client.md#get_default_session)
+
+* <b>[Training](train.md)</b>: [class AdagradOptimizer](train.md#AdagradOptimizer),
+ [class AdamOptimizer](train.md#AdamOptimizer),
+ [class AggregationMethod](train.md#AggregationMethod),
+ [class Coordinator](train.md#Coordinator),
+ [class ExponentialMovingAverage](train.md#ExponentialMovingAverage),
+ [class FtrlOptimizer](train.md#FtrlOptimizer),
+ [class GradientDescentOptimizer](train.md#GradientDescentOptimizer),
+ [class MomentumOptimizer](train.md#MomentumOptimizer),
+ [class Optimizer](train.md#Optimizer),
+ [class QueueRunner](train.md#QueueRunner),
+ [class RMSPropOptimizer](train.md#RMSPropOptimizer),
+ [class SummaryWriter](train.md#SummaryWriter),
+ [add_queue_runner](train.md#add_queue_runner),
+ [clip_by_average_norm](train.md#clip_by_average_norm),
+ [clip_by_global_norm](train.md#clip_by_global_norm),
+ [clip_by_norm](train.md#clip_by_norm),
+ [clip_by_value](train.md#clip_by_value),
+ [exponential_decay](train.md#exponential_decay),
+ [global_norm](train.md#global_norm),
+ [global_step](train.md#global_step),
+ [gradients](train.md#gradients),
+ [histogram_summary](train.md#histogram_summary),
+ [image_summary](train.md#image_summary),
+ [merge_all_summaries](train.md#merge_all_summaries),
+ [merge_summary](train.md#merge_summary),
+ [scalar_summary](train.md#scalar_summary),
+ [start_queue_runners](train.md#start_queue_runners),
+ [stop_gradient](train.md#stop_gradient),
+ [summary_iterator](train.md#summary_iterator),
+ [write_graph](train.md#write_graph),
+ [zero_fraction](train.md#zero_fraction)
+
+<div class="sections-order" style="display: none;">
+<!--
+<!-- framework.md -->
+<!-- constant_op.md -->
+<!-- state_ops.md -->
+<!-- array_ops.md -->
+<!-- math_ops.md -->
+<!-- control_flow_ops.md -->
+<!-- image.md -->
+<!-- sparse_ops.md -->
+<!-- io_ops.md -->
+<!-- python_io.md -->
+<!-- nn.md -->
+<!-- client.md -->
+<!-- train.md -->
+-->
+</div>
diff --git a/tensorflow/g3doc/api_docs/python/io_ops.md b/tensorflow/g3doc/api_docs/python/io_ops.md
new file mode 100644
index 0000000000..ab8c4aa146
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/io_ops.md
@@ -0,0 +1,1956 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Inputs and Readers
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Placeholders](#AUTOGENERATED-placeholders)
+ * [tf.placeholder(dtype, shape=None, name=None)](#placeholder)
+* [Readers](#AUTOGENERATED-readers)
+ * [class tf.ReaderBase](#ReaderBase)
+ * [class tf.TextLineReader](#TextLineReader)
+ * [class tf.WholeFileReader](#WholeFileReader)
+ * [class tf.IdentityReader](#IdentityReader)
+ * [class tf.TFRecordReader](#TFRecordReader)
+ * [class tf.FixedLengthRecordReader](#FixedLengthRecordReader)
+* [Converting](#AUTOGENERATED-converting)
+ * [tf.decode_csv(records, record_defaults, field_delim=None, name=None)](#decode_csv)
+ * [tf.decode_raw(bytes, out_type, little_endian=None, name=None)](#decode_raw)
+ * [tf.parse_example(serialized, names=None, sparse_keys=None, sparse_types=None, dense_keys=None, dense_types=None, dense_defaults=None, dense_shapes=None, name='ParseExample')](#parse_example)
+ * [tf.parse_single_example(serialized, names=None, sparse_keys=None, sparse_types=None, dense_keys=None, dense_types=None, dense_defaults=None, dense_shapes=None, name='ParseSingleExample')](#parse_single_example)
+* [Queues](#AUTOGENERATED-queues)
+ * [class tf.QueueBase](#QueueBase)
+ * [class tf.FIFOQueue](#FIFOQueue)
+ * [class tf.RandomShuffleQueue](#RandomShuffleQueue)
+* [Dealing with the filesystem](#AUTOGENERATED-dealing-with-the-filesystem)
+ * [tf.matching_files(pattern, name=None)](#matching_files)
+ * [tf.read_file(filename, name=None)](#read_file)
+* [Input pipeline](#AUTOGENERATED-input-pipeline)
+ * [Beginning of an input pipeline](#AUTOGENERATED-beginning-of-an-input-pipeline)
+ * [tf.train.match_filenames_once(pattern, name=None)](#match_filenames_once)
+ * [tf.train.limit_epochs(tensor, num_epochs=None, name=None)](#limit_epochs)
+ * [tf.train.range_input_producer(limit, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None)](#range_input_producer)
+ * [tf.train.slice_input_producer(tensor_list, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None)](#slice_input_producer)
+ * [tf.train.string_input_producer(string_tensor, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None)](#string_input_producer)
+ * [Batching at the end of an input pipeline](#AUTOGENERATED-batching-at-the-end-of-an-input-pipeline)
+ * [tf.train.batch(tensor_list, batch_size, num_threads=1, capacity=32, enqueue_many=False, shapes=None, name=None)](#batch)
+ * [tf.train.batch_join(tensor_list_list, batch_size, capacity=32, enqueue_many=False, shapes=None, name=None)](#batch_join)
+ * [tf.train.shuffle_batch(tensor_list, batch_size, capacity, min_after_dequeue, num_threads=1, seed=None, enqueue_many=False, shapes=None, name=None)](#shuffle_batch)
+ * [tf.train.shuffle_batch_join(tensor_list_list, batch_size, capacity, min_after_dequeue, seed=None, enqueue_many=False, shapes=None, name=None)](#shuffle_batch_join)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Placeholders <div class="md-anchor" id="AUTOGENERATED-placeholders">{#AUTOGENERATED-placeholders}</div>
+
+TensorFlow provides a placeholder operation that must be fed with data
+on execution. For more info, see the section on [Feeding
+data](../../how_tos/reading_data/index.md#feeding).
+
+- - -
+
+### tf.placeholder(dtype, shape=None, name=None) <div class="md-anchor" id="placeholder">{#placeholder}</div>
+
+Inserts a placeholder for a tensor that will always be fed.
+
+**Important**: This tensor will produce an error if evaluated. Its value must
+be fed using the `feed_dict` optional argument to `Session.run()`,
+`Tensor.eval()`, or `Operation.run()`.
+
+For example:
+
+```python
+import numpy as np
+import tensorflow as tf
+
+x = tf.placeholder(tf.float32, shape=(1024, 1024))
+y = tf.matmul(x, x)
+
+with tf.Session() as sess:
+  print sess.run(y)  # ERROR: will fail because x was not fed.
+
+  rand_array = np.random.rand(1024, 1024)
+  print sess.run(y, feed_dict={x: rand_array})  # Will succeed.
+```
+
+##### Args:
+
+
+* <b>dtype</b>: The type of elements in the tensor to be fed.
+* <b>shape</b>: The shape of the tensor to be fed (optional). If the shape is not
+ specified, you can feed a tensor of any shape.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` that may be used as a handle for feeding a value, but not
+ evaluated directly.
+
+
+
+## Readers <div class="md-anchor" id="AUTOGENERATED-readers">{#AUTOGENERATED-readers}</div>
+
+TensorFlow provides a set of Reader classes for reading data formats.
+For more information on inputs and readers, see [Reading
+data](../../how_tos/reading_data/index.md).
+
+- - -
+
+### class tf.ReaderBase <div class="md-anchor" id="ReaderBase">{#ReaderBase}</div>
+
+Base class for different Reader types that produce a record every step.
+
+Conceptually, Readers convert string 'work units' into records (key,
+value pairs). Typically the 'work units' are filenames and the
+records are extracted from the contents of those files. We want a
+single record produced per step, but a work unit can correspond to
+many records.
+
+Therefore we introduce some decoupling using a queue. The queue
+contains the work units, and the Reader dequeues from the queue when
+it is asked to produce a record (via Read()) but has finished the
+last work unit.
+- - -
+
+#### tf.ReaderBase.__init__(reader_ref, supports_serialize=False) {#ReaderBase.__init__}
+
+Creates a new ReaderBase.
+
+##### Args:
+
+
+* <b>reader_ref</b>: The operation that implements the reader.
+* <b>supports_serialize</b>: True if the reader implementation can
+ serialize its state.
+
+
+- - -
+
+#### tf.ReaderBase.num_records_produced(name=None) {#ReaderBase.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.ReaderBase.num_work_units_completed(name=None) {#ReaderBase.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.ReaderBase.read(queue, name=None) {#ReaderBase.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.ReaderBase.reader_ref {#ReaderBase.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.ReaderBase.reset(name=None) {#ReaderBase.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.ReaderBase.restore_state(state, name=None) {#ReaderBase.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.ReaderBase.serialize_state(name=None) {#ReaderBase.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.ReaderBase.supports_serialize {#ReaderBase.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+- - -
+
+### class tf.TextLineReader <div class="md-anchor" id="TextLineReader">{#TextLineReader}</div>
+
+A Reader that outputs the lines of a file delimited by newlines.
+
+Newlines are stripped from the output.
+See ReaderBase for supported methods.
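+
+As a rough usage sketch (not part of the generated docstring; the CSV
+filenames and column defaults below are hypothetical):
+
+```python
+import tensorflow as tf
+
+# Queue of input filenames; string_input_producer also registers a QueueRunner.
+filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
+
+reader = tf.TextLineReader(skip_header_lines=1)
+key, value = reader.read(filename_queue)  # one line of text per Read
+
+# Decode each line into typed columns; the defaults also fix the column types.
+col1, col2, col3 = tf.decode_csv(value, record_defaults=[[0], [0], [0.0]])
+```
+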
+- - -
+
+#### tf.TextLineReader.__init__(skip_header_lines=None, name=None) {#TextLineReader.__init__}
+
+Create a TextLineReader.
+
+##### Args:
+
+
+* <b>skip_header_lines</b>: An optional int. Defaults to 0. Number of lines
+ to skip from the beginning of every file.
+* <b>name</b>: A name for the operation (optional).
+
+
+- - -
+
+#### tf.TextLineReader.num_records_produced(name=None) {#TextLineReader.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.TextLineReader.num_work_units_completed(name=None) {#TextLineReader.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.TextLineReader.read(queue, name=None) {#TextLineReader.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.TextLineReader.reader_ref {#TextLineReader.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.TextLineReader.reset(name=None) {#TextLineReader.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.TextLineReader.restore_state(state, name=None) {#TextLineReader.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.TextLineReader.serialize_state(name=None) {#TextLineReader.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.TextLineReader.supports_serialize {#TextLineReader.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+- - -
+
+### class tf.WholeFileReader <div class="md-anchor" id="WholeFileReader">{#WholeFileReader}</div>
+
+A Reader that outputs the entire contents of a file as a value.
+
+To use, enqueue filenames in a Queue. The output of Read will
+be a filename (key) and the contents of that file (value).
+
+See ReaderBase for supported methods.
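+
+A minimal usage sketch (the image filenames are placeholders, and decoding the
+contents with `tf.image.decode_jpeg` is just one possible way to consume the
+value):
+
+```python
+import tensorflow as tf
+
+filename_queue = tf.train.string_input_producer(["img0.jpg", "img1.jpg"])
+reader = tf.WholeFileReader()
+filename, contents = reader.read(filename_queue)  # (key, value) = (name, bytes)
+image = tf.image.decode_jpeg(contents)
+```
+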
+- - -
+
+#### tf.WholeFileReader.__init__(name=None) {#WholeFileReader.__init__}
+
+Create a WholeFileReader.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+
+- - -
+
+#### tf.WholeFileReader.num_records_produced(name=None) {#WholeFileReader.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.WholeFileReader.num_work_units_completed(name=None) {#WholeFileReader.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.WholeFileReader.read(queue, name=None) {#WholeFileReader.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.WholeFileReader.reader_ref {#WholeFileReader.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.WholeFileReader.reset(name=None) {#WholeFileReader.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.WholeFileReader.restore_state(state, name=None) {#WholeFileReader.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.WholeFileReader.serialize_state(name=None) {#WholeFileReader.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.WholeFileReader.supports_serialize {#WholeFileReader.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+- - -
+
+### class tf.IdentityReader <div class="md-anchor" id="IdentityReader">{#IdentityReader}</div>
+
+A Reader that outputs the queued work as both the key and value.
+
+To use, enqueue strings in a Queue. Read will take the front
+work string and output (work, work).
+
+See ReaderBase for supported methods.
+- - -
+
+#### tf.IdentityReader.__init__(name=None) {#IdentityReader.__init__}
+
+Create an IdentityReader.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+
+- - -
+
+#### tf.IdentityReader.num_records_produced(name=None) {#IdentityReader.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.IdentityReader.num_work_units_completed(name=None) {#IdentityReader.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.IdentityReader.read(queue, name=None) {#IdentityReader.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.IdentityReader.reader_ref {#IdentityReader.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.IdentityReader.reset(name=None) {#IdentityReader.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.IdentityReader.restore_state(state, name=None) {#IdentityReader.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.IdentityReader.serialize_state(name=None) {#IdentityReader.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.IdentityReader.supports_serialize {#IdentityReader.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+- - -
+
+### class tf.TFRecordReader <div class="md-anchor" id="TFRecordReader">{#TFRecordReader}</div>
+
+A Reader that outputs the records from a TFRecords file.
+
+See ReaderBase for supported methods.
+- - -
+
+#### tf.TFRecordReader.__init__(name=None) {#TFRecordReader.__init__}
+
+Create a TFRecordReader.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+
+- - -
+
+#### tf.TFRecordReader.num_records_produced(name=None) {#TFRecordReader.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.TFRecordReader.num_work_units_completed(name=None) {#TFRecordReader.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.TFRecordReader.read(queue, name=None) {#TFRecordReader.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.TFRecordReader.reader_ref {#TFRecordReader.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.TFRecordReader.reset(name=None) {#TFRecordReader.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.TFRecordReader.restore_state(state, name=None) {#TFRecordReader.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.TFRecordReader.serialize_state(name=None) {#TFRecordReader.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.TFRecordReader.supports_serialize {#TFRecordReader.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+- - -
+
+### class tf.FixedLengthRecordReader <div class="md-anchor" id="FixedLengthRecordReader">{#FixedLengthRecordReader}</div>
+
+A Reader that outputs fixed-length records from a file.
+
+See ReaderBase for supported methods.
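+
+A rough sketch of typical usage, assuming records laid out like the CIFAR-10
+binary format (one label byte followed by a 32x32x3 image); the filename and
+sizes are illustrative only:
+
+```python
+import tensorflow as tf
+
+label_bytes = 1
+image_bytes = 32 * 32 * 3
+record_bytes = label_bytes + image_bytes
+
+filename_queue = tf.train.string_input_producer(["data_batch_1.bin"])
+reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
+key, value = reader.read(filename_queue)
+
+# Reinterpret the raw string as a vector of uint8, then split label and image.
+record = tf.decode_raw(value, tf.uint8)
+label = tf.slice(record, [0], [label_bytes])
+image = tf.reshape(tf.slice(record, [label_bytes], [image_bytes]), [3, 32, 32])
+```
+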
+- - -
+
+#### tf.FixedLengthRecordReader.__init__(record_bytes, header_bytes=None, footer_bytes=None, name=None) {#FixedLengthRecordReader.__init__}
+
+Create a FixedLengthRecordReader.
+
+##### Args:
+
+
+* <b>record_bytes</b>: An int.
+* <b>header_bytes</b>: An optional int. Defaults to 0.
+* <b>footer_bytes</b>: An optional int. Defaults to 0.
+* <b>name</b>: A name for the operation (optional).
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.num_records_produced(name=None) {#FixedLengthRecordReader.num_records_produced}
+
+Returns the number of records this reader has produced.
+
+This is the same as the number of Read executions that have
+succeeded.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.num_work_units_completed(name=None) {#FixedLengthRecordReader.num_work_units_completed}
+
+Returns the number of work units this reader has finished processing.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ An int64 Tensor.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.read(queue, name=None) {#FixedLengthRecordReader.read}
+
+Returns the next record (key, value pair) produced by a reader.
+
+Will dequeue a work unit from queue if necessary (e.g. when the
+Reader needs to start reading from a new file since it has
+finished with the previous file).
+
+##### Args:
+
+
+* <b>queue</b>: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of Tensors (key, value).
+
+* <b>key</b>: A string scalar Tensor.
+* <b>value</b>: A string scalar Tensor.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.reader_ref {#FixedLengthRecordReader.reader_ref}
+
+Op that implements the reader.
+
+- - -
+
+#### tf.FixedLengthRecordReader.reset(name=None) {#FixedLengthRecordReader.reset}
+
+Restore a reader to its initial clean state.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.restore_state(state, name=None) {#FixedLengthRecordReader.restore_state}
+
+Restore a reader to a previously saved state.
+
+Not all Readers support being restored, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>state</b>: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The created Operation.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.serialize_state(name=None) {#FixedLengthRecordReader.serialize_state}
+
+Produce a string tensor that encodes the state of a reader.
+
+Not all Readers support being serialized, so this can produce an
+Unimplemented error.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A string Tensor.
+
+
+- - -
+
+#### tf.FixedLengthRecordReader.supports_serialize {#FixedLengthRecordReader.supports_serialize}
+
+Whether the Reader implementation can serialize its state.
+
+
+
+## Converting <div class="md-anchor" id="AUTOGENERATED-converting">{#AUTOGENERATED-converting}</div>
+
+TensorFlow provides several operations that you can use to convert various data
+formats into tensors.
+
+- - -
+
+### tf.decode_csv(records, record_defaults, field_delim=None, name=None) <div class="md-anchor" id="decode_csv">{#decode_csv}</div>
+
+Convert CSV records to tensors. Each column maps to one tensor.
+
+RFC 4180 format is expected for the CSV records.
+(https://tools.ietf.org/html/rfc4180)
+Note that we allow leading and trailing spaces with int or float fields.
+
+##### Args:
+
+
+* <b>records</b>: A `Tensor` of type `string`.
+ Each string is a record/row in the csv and all records should have
+ the same format.
+* <b>record_defaults</b>: A list of `Tensor` objects with types from: `float32`, `int32`, `int64`, `string`.
+ One tensor per column of the input record, with either a
+ scalar default value for that column or empty if the column is required.
+* <b>field_delim</b>: An optional `string`. Defaults to `","`.
+ delimiter to separate fields in a record.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A list of `Tensor` objects. Has the same type as `record_defaults`.
+ Each tensor will have the same shape as records.
+
+
+- - -
+
+### tf.decode_raw(bytes, out_type, little_endian=None, name=None) <div class="md-anchor" id="decode_raw">{#decode_raw}</div>
+
+Reinterpret the bytes of a string as a vector of numbers.
+
+##### Args:
+
+
+* <b>bytes</b>: A `Tensor` of type `string`.
+ All the elements must have the same length.
+* <b>out_type</b>: A `tf.DType` from: `tf.float32, tf.float64, tf.int32, tf.uint8, tf.int16, tf.int8, tf.int64`.
+* <b>little_endian</b>: An optional `bool`. Defaults to `True`.
+ Whether the input bytes are in little-endian order.
+ Ignored for out_types that are stored in a single byte like uint8.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `out_type`.
+ A Tensor with one more dimension than the input bytes. The
+ added dimension will have size equal to the length of the elements
+ of bytes divided by the number of bytes to represent out_type.
+
+
+- - -
+
+### tf.parse_example(serialized, names=None, sparse_keys=None, sparse_types=None, dense_keys=None, dense_types=None, dense_defaults=None, dense_shapes=None, name='ParseExample') <div class="md-anchor" id="parse_example">{#parse_example}</div>
+
+Parse Example protos.
+
+##### Args:
+
+
+* <b>serialized</b>: string vector, a batch of binary serialized Example protos.
+* <b>names</b>: A string vector, the names of the serialized protos.
+ "names" may contain, e.g., table key (descriptive) names for the
+ corresponding serialized protos. These are purely useful for debugging
+ purposes, and the presence of values here has no effect on the output.
+ "names" may be an empty vector, if no names are available.
+ If non-empty, this vector must be the same length as "serialized".
+* <b>sparse_keys</b>: A string list of keys in the Examples' features.
+ These keys are associated with sparse values.
+* <b>sparse_types</b>: A list of DTypes.
+ This list's length must match that of sparse_keys. Currently
+ parse_example supports tf.float32 (FloatList), tf.int64 (Int64List),
+ and tf.string (BytesList).
+* <b>dense_keys</b>: A string list of keys in the Examples' features.
+ These keys are associated with dense values.
+* <b>dense_types</b>: A list of DTypes.
+ This list's length must match that of dense_keys. Currently
+ parse_example supports tf.float32 (FloatList), tf.int64 (Int64List),
+ and tf.string (BytesList).
+* <b>dense_defaults</b>: A dict of {key:Tensor} (some may be missing).
+ The keys of the dict must match the dense_keys of the feature.
+ If a key is not present in this dictionary, the corresponding dense
+ Feature is required in all elements of serialized.
+* <b>dense_shapes</b>: A list of tuples.
+ Entries provide the shape of data in each dense Feature in features.
+ The length of dense_shapes must be the same as the length of dense_keys.
+ The number of elements in the Feature corresponding to dense_key[j]
+ must always have np.prod(dense_shapes[j]) entries.
+ If dense_shapes[j] == (D0, D1, ..., DN) then the shape of the output
+ Tensor dense_values[j] will be (|serialized|, D0, D1, ..., DN):
+ The dense outputs are just the inputs row-stacked by batch.
+* <b>name</b>: (Optional) Name of Op in the graph.
+
+##### Returns:
+
+ A dictionary mapping keys to Tensors and SparseTensors.
+
+ The key dense_keys[j] is mapped to a tensor of type dense_types[j] and
+ of shape (serialized.size(),) + dense_shapes[j] (i.e., the dense outputs are
+ inputs, reshaped in row-major format and then row-stacked by batch).
+
+ The key sparse_keys[j] is mapped to a SparseTensor of type sparse_types[j].
+ The SparseTensor represents a ragged matrix. Its indices are [batch, index]
+ where "batch" is the batch entry the value is from, and "index" is the
+ value's index in the list of values associated with that feature
+ and example. For example, if one expects a tf.float32 sparse feature "ft"
+ and three serialized examples are provided:
+
+      serialized = [
+        features:
+          { feature: [ key: { "ft" value: float_list: { value: [1.0, 2.0] } } ] },
+        features:
+          { feature: [] },
+        features:
+          { feature: [ key: { "ft" value: float_list: { value: [3.0] } } ] }
+      ]
+
+ then the output will look like:
+
+      {"ft": SparseTensor(indices=[[0, 0], [0, 1], [2, 0]],
+                          values=[1.0, 2.0, 3.0],
+                          shape=(3, 2)) }
+
+##### Raises:
+
+
+* <b>ValueError</b>: If sparse and dense keys intersect, or input lengths do not
+ match up for sparse_* (similarly for dense_*).
+* <b>TypeError</b>: If an input is malformed.
+
+##### Example input, format, and output: Just Sparse Inputs
+
+Given two brain.Example input protos:
+
+    serialized:  // serialized versions of the protos below
+      [features: {
+         feature: { key: "kw" value: { bytes_list: { value: [ "knit", "big" ] } } }
+         feature: { key: "gps" value: { float_list: { value: [] } } }
+       },
+       features: {
+         feature: { key: "kw" value: { bytes_list: { value: [ "emmy" ] } } }
+         feature: { key: "dank" value: { int64_list: { value: [ 42 ] } } }
+         feature: { key: "gps" value: { } }
+       }]
+    names: ["input0", "input1"]
+    sparse_keys: ["kw", "dank", "gps"]
+    sparse_types: [DT_STRING, DT_INT64, DT_FLOAT]
+
+Then the expected output is a dictionary:
+
+    {
+      "kw": SparseTensor(
+          indices=[[0, 0], [0, 1], [1, 0]],
+          values=["knit", "big", "emmy"],
+          shape=[2, 2]),
+      "dank": SparseTensor(
+          indices=[[1, 0]],
+          values=[42],
+          shape=[2, 1]),
+      "gps": SparseTensor(
+          indices=[],
+          values=[],
+          shape=[2, 0]),
+    }
+
+
+##### Example input, format, and output: Dense Inputs (without defaults)
+
+Given two brain.Example input protos:
+
+    serialized:  // serialized versions of the protos below
+      [features: {
+         feature: { key: "age" value: { int64_list: { value: [ 0 ] } } }
+         feature: { key: "gender" value: { bytes_list: { value: [ "f" ] } } }
+       },
+       features: {
+         feature: { key: "age" value: { int64_list: { value: [] } } }
+         feature: { key: "gender" value: { bytes_list: { value: [ "f" ] } } }
+       }]
+    names: ["input0", "input1"]
+    dense_keys: np.array(["age", "gender"])
+    dense_types: [tf.int64, tf.string]
+    dense_defaults: {
+      "age": -1  # defaults to -1 if missing
+      # "gender" has no specified default so it's required
+    }
+    dense_shapes: [(1,), (1,)]  # age, gender
+
+Then the expected output is a dictionary:
+
+    {
+      "age": [[0], [-1]],
+      "gender": [["f"], ["f"]],
+    }
+
+
+##### Example input, format, and output: Dense Inputs (with defaults)
+
+Given two brain.Example input protos:
+
+    serialized:  // serialized versions of the protos below
+      [features: {
+         feature: { key: "weight" value: { float_list: { value: [ 1.0 ] } } }
+       },
+       features: {
+         feature: { key: "label" value: { float_list: { value: [ -1.0, 0.0 ] } } }
+       }]
+    names: ["input0", "input1"]
+    dense_keys: np.array(["label", "weight"])
+    dense_defaults: {
+      "label": [1.0, 2.0],  # float (default: vector)
+      "weight": 5.0         # float (default: scalar, 5.0)
+    }
+    dense_shapes: [(2,), (1,)]  # label, weight
+
+Then the expected output is a dictionary:
+
+    {
+      "label": [[1.0, 2.0], [-1.0, 0.0]],
+      "weight": [[1.0], [5.0]],
+    }
+
+
+- - -
+
+### tf.parse_single_example(serialized, names=None, sparse_keys=None, sparse_types=None, dense_keys=None, dense_types=None, dense_defaults=None, dense_shapes=None, name='ParseSingleExample') <div class="md-anchor" id="parse_single_example">{#parse_single_example}</div>
+
+Identical to parse_example but for scalar serialized and names.
+
+##### Args:
+
+
+* <b>serialized</b>: A scalar string, a single serialized Example.
+ See parse_example documentation for more details.
+* <b>names</b>: (Optional) A scalar string, the associated name.
+ See parse_example documentation for more details.
+* <b>sparse_keys</b>: See parse_example documentation for more details.
+* <b>sparse_types</b>: See parse_example documentation for more details.
+* <b>dense_keys</b>: See parse_example documentation for more details.
+* <b>dense_types</b>: See parse_example documentation for more details.
+* <b>dense_defaults</b>: See parse_example documentation for more details.
+* <b>dense_shapes</b>: See parse_example documentation for more details.
+* <b>name</b>: Optional op name.
+
+##### Returns:
+
+ A dictionary mapping keys to Tensors and SparseTensors.
+
+ For dense tensors, the Tensor is identical to the output of parse_example,
+ except it is one less dimension (the first, batch, dimension is removed).
+
+ For SparseTensors:
+ The first (batch) column of the indices matrix is removed
+ (it is now a column vector).
+ The values vector is unchanged.
+ The first (batch_size) entry of the shape vector is removed
+ (it is now a single element vector).
+
+##### Raises:
+
+
+* <b>ValueError</b>: if "serialized" or "names" have known shapes and are not scalars.
+
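+A hedged sketch of wiring a reader to `parse_single_example` (the TFRecords
+filename and the feature keys/types below are assumptions for illustration):
+
+```python
+import tensorflow as tf
+
+filename_queue = tf.train.string_input_producer(["examples.tfrecords"])
+reader = tf.TFRecordReader()
+_, serialized = reader.read(filename_queue)
+
+# Parse one serialized Example proto into a dict of dense scalar tensors.
+features = tf.parse_single_example(
+    serialized,
+    dense_keys=["age", "gender"],
+    dense_types=[tf.int64, tf.string])
+age = features["age"]
+gender = features["gender"]
+```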
+
+
+## Queues <div class="md-anchor" id="AUTOGENERATED-queues">{#AUTOGENERATED-queues}</div>
+
+TensorFlow provides several implementations of 'Queues', which are
+structures within the TensorFlow computation graph to stage pipelines
+of tensors together. The following describes the basic Queue interface
+and some implementations. To see an example use, see [Threading and
+Queues](../../how_tos/threading_and_queues/index.md).
+
+- - -
+
+### class tf.QueueBase <div class="md-anchor" id="QueueBase">{#QueueBase}</div>
+
+Base class for queue implementations.
+
+A queue is a TensorFlow data structure that stores tensors across
+multiple steps, and exposes operations that enqueue and dequeue
+tensors.
+
+Each queue element is a tuple of one or more tensors, where each
+tuple component has a static dtype, and may have a static shape. The
+queue implementations support versions of enqueue and dequeue that
+handle single elements, and versions that enqueue and dequeue a batch
+of elements at once.
+
+See [`tf.FIFOQueue`](#FIFOQueue) and
+[`tf.RandomShuffleQueue`](#RandomShuffleQueue) for concrete
+implementations of this class, and instructions on how to create
+them.
+
+- - -
+
+#### tf.QueueBase.enqueue(vals, name=None) {#QueueBase.enqueue}
+
+Enqueues one element to this queue.
+
+If the queue is full when this operation executes, it will block
+until the element has been enqueued.
+
+##### Args:
+
+
+* <b>vals</b>: The tuple of `Tensor` objects to be enqueued.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The operation that enqueues a new tuple of tensors to the queue.
+
+
+- - -
+
+#### tf.QueueBase.enqueue_many(vals, name=None) {#QueueBase.enqueue_many}
+
+Enqueues zero or more elements to this queue.
+
+This operation slices each component tensor along the 0th dimension to
+make multiple queue elements. All of the tensors in `vals` must have the
+same size in the 0th dimension.
+
+If the queue is full when this operation executes, it will block
+until all of the elements have been enqueued.
+
+##### Args:
+
+
+* <b>vals</b>: The tensor or tuple of tensors from which the queue elements
+ are taken.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The operation that enqueues a batch of tuples of tensors to the queue.
+
+
+
+- - -
+
+#### tf.QueueBase.dequeue(name=None) {#QueueBase.dequeue}
+
+Dequeues one element from this queue.
+
+If the queue is empty when this operation executes, it will block
+until there is an element to dequeue.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The tuple of tensors that was dequeued.
+
+
+- - -
+
+#### tf.QueueBase.dequeue_many(n, name=None) {#QueueBase.dequeue_many}
+
+Dequeues and concatenates `n` elements from this queue.
+
+This operation concatenates queue-element component tensors along
+the 0th dimension to make a single component tensor. All of the
+components in the dequeued tuple will have size `n` in the 0th dimension.
+
+If the queue contains fewer than `n` elements when this operation
+executes, it will block until `n` elements have been dequeued.
+
+##### Args:
+
+
+* <b>n</b>: A scalar `Tensor` containing the number of elements to dequeue.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The tuple of concatenated tensors that was dequeued.
+
+
+
+- - -
+
+#### tf.QueueBase.size(name=None) {#QueueBase.size}
+
+Compute the number of elements in this queue.
+
+##### Args:
+
+
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A scalar tensor containing the number of elements in this queue.
+
+
+
+- - -
+
+#### tf.QueueBase.close(cancel_pending_enqueues=False, name=None) {#QueueBase.close}
+
+Closes this queue.
+
+This operation signals that no more elements will be enqueued in
+the given queue. Subsequent `enqueue` and `enqueue_many`
+operations will fail. Subsequent `dequeue` and `dequeue_many`
+operations will continue to succeed if sufficient elements remain
+in the queue. Subsequent `dequeue` and `dequeue_many` operations
+that would block will fail immediately.
+
+If `cancel_pending_enqueues` is `True`, all pending requests will also
+be cancelled.
+
+##### Args:
+
+
+* <b>cancel_pending_enqueues</b>: (Optional.) A boolean, defaulting to
+ `False` (described above).
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The operation that closes the queue.
+
+
+
+#### Other Methods
+- - -
+
+#### tf.QueueBase.__init__(dtypes, shapes, queue_ref) {#QueueBase.__init__}
+
+Constructs a queue object from a queue reference.
+
+##### Args:
+
+
+* <b>dtypes</b>: A list of types. The length of dtypes must equal the number
+ of tensors in each element.
+* <b>shapes</b>: Constraints on the shapes of tensors in an element:
+ A list of shape tuples or None. This list is the same length
+ as dtypes. If the shape of any tensors in the element are constrained,
+ all must be; shapes can be None if the shapes should not be constrained.
+* <b>queue_ref</b>: The queue reference, i.e. the output of the queue op.
+
+
+- - -
+
+#### tf.QueueBase.dtypes {#QueueBase.dtypes}
+
+The list of dtypes for each component of a queue element.
+
+- - -
+
+#### tf.QueueBase.name {#QueueBase.name}
+
+The name of the underlying queue.
+
+- - -
+
+#### tf.QueueBase.queue_ref {#QueueBase.queue_ref}
+
+The underlying queue reference.
+
+
+- - -
+
+### class tf.FIFOQueue <div class="md-anchor" id="FIFOQueue">{#FIFOQueue}</div>
+
+A queue implementation that dequeues elements in first-in first-out order.
+
+See [`tf.QueueBase`](#QueueBase) for a description of the methods on
+this class.
+
+- - -
+
+#### tf.FIFOQueue.__init__(capacity, dtypes, shapes=None, shared_name=None, name='fifo_queue') {#FIFOQueue.__init__}
+
+Creates a queue that dequeues elements in a first-in first-out order.
+
+A `FIFOQueue` has bounded capacity; supports multiple concurrent
+producers and consumers; and provides exactly-once delivery.
+
+A `FIFOQueue` holds a list of up to `capacity` elements. Each
+element is a fixed-length tuple of tensors whose dtypes are
+described by `dtypes`, and whose shapes are optionally described
+by the `shapes` argument.
+
+If the `shapes` argument is specified, each component of a queue
+element must have the respective fixed shape. If it is
+unspecified, different queue elements may have different shapes,
+but the use of `dequeue_many` is disallowed.
+
+##### Args:
+
+
+* <b>capacity</b>: An integer. The upper bound on the number of elements
+ that may be stored in this queue.
+* <b>dtypes</b>: A list of `DType` objects. The length of `dtypes` must equal
+ the number of tensors in each queue element.
+* <b>shapes</b>: (Optional.) A list of fully-defined `TensorShape` objects,
+ with the same length as `dtypes` or `None`.
+* <b>shared_name</b>: (Optional.) If non-empty, this queue will be shared under
+ the given name across multiple sessions.
+* <b>name</b>: Optional name for the queue operation.
+
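+For example, a small sketch of the queue mechanics (the numbers are arbitrary):
+
+```python
+import tensorflow as tf
+
+q = tf.FIFOQueue(3, dtypes=[tf.float32])
+init = q.enqueue_many(([1.0, 2.0, 3.0],))
+
+x = q.dequeue()
+y = x + 1.0
+enqueue_back = q.enqueue([y])
+
+with tf.Session() as sess:
+  sess.run(init)
+  for _ in range(3):
+    sess.run(enqueue_back)      # pop the head, add 1.0, push it to the back
+  print sess.run(q.dequeue())   # prints 2.0
+```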
+
+
+- - -
+
+### class tf.RandomShuffleQueue <div class="md-anchor" id="RandomShuffleQueue">{#RandomShuffleQueue}</div>
+
+A queue implementation that dequeues elements in a random order.
+
+See [`tf.QueueBase`](#QueueBase) for a description of the methods on
+this class.
+
+- - -
+
+#### tf.RandomShuffleQueue.__init__(capacity, min_after_dequeue, dtypes, shapes=None, seed=None, shared_name=None, name='random_shuffle_queue') {#RandomShuffleQueue.__init__}
+
+Create a queue that dequeues elements in a random order.
+
+A `RandomShuffleQueue` has bounded capacity; supports multiple
+concurrent producers and consumers; and provides exactly-once
+delivery.
+
+A `RandomShuffleQueue` holds a list of up to `capacity`
+elements. Each element is a fixed-length tuple of tensors whose
+dtypes are described by `dtypes`, and whose shapes are optionally
+described by the `shapes` argument.
+
+If the `shapes` argument is specified, each component of a queue
+element must have the respective fixed shape. If it is
+unspecified, different queue elements may have different shapes,
+but the use of `dequeue_many` is disallowed.
+
+The `min_after_dequeue` argument allows the caller to specify a
+minimum number of elements that will remain in the queue after a
+`dequeue` or `dequeue_many` operation completes, to ensure a
+minimum level of mixing of elements. This invariant is maintained
+by blocking those operations until sufficient elements have been
+enqueued. The `min_after_dequeue` argument is ignored after the
+queue has been closed.
+
+##### Args:
+
+
+* <b>capacity</b>: An integer. The upper bound on the number of elements
+ that may be stored in this queue.
+* <b>min_after_dequeue</b>: An integer (described above).
+* <b>dtypes</b>: A list of `DType` objects. The length of `dtypes` must equal
+ the number of tensors in each queue element.
+* <b>shapes</b>: (Optional.) A list of fully-defined `TensorShape` objects,
+ with the same length as `dtypes` or `None`.
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>shared_name</b>: (Optional.) If non-empty, this queue will be shared under
+ the given name across multiple sessions.
+* <b>name</b>: Optional name for the queue operation.
+
+
+
+
+## Dealing with the filesystem <div class="md-anchor" id="AUTOGENERATED-dealing-with-the-filesystem">{#AUTOGENERATED-dealing-with-the-filesystem}</div>
+
+- - -
+
+### tf.matching_files(pattern, name=None) <div class="md-anchor" id="matching_files">{#matching_files}</div>
+
+Returns the set of files matching a pattern.
+
+Note that this routine only supports wildcard characters in the
+basename portion of the pattern, not in the directory portion.
+
+##### Args:
+
+
+* <b>pattern</b>: A `Tensor` of type `string`. A (scalar) shell wildcard pattern.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `string`. A vector of matching filenames.
+
+
+- - -
+
+### tf.read_file(filename, name=None) <div class="md-anchor" id="read_file">{#read_file}</div>
+
+Reads and outputs the entire contents of the input filename.
+
+##### Args:
+
+
+* <b>filename</b>: A `Tensor` of type `string`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `string`.
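+
+A small sketch using the two ops above (the glob pattern and filename are
+placeholders):
+
+```python
+import tensorflow as tf
+
+filenames = tf.matching_files("data/*.txt")   # vector of matching filenames
+contents = tf.read_file("data/example.txt")   # entire file as a single string
+
+with tf.Session() as sess:
+  print sess.run(contents)
+```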
+
+
+
+## Input pipeline <div class="md-anchor" id="AUTOGENERATED-input-pipeline">{#AUTOGENERATED-input-pipeline}</div>
+
+TensorFlow functions for setting up an input-prefetching pipeline.
+Please see the [reading data how-to](../../how_tos/reading_data/index.md)
+for context.
+
+### Beginning of an input pipeline <div class="md-anchor" id="AUTOGENERATED-beginning-of-an-input-pipeline">{#AUTOGENERATED-beginning-of-an-input-pipeline}</div>
+
+The "producer" functions add a queue to the graph and a corresponding
+`QueueRunner` for running the subgraph that fills that queue.
+
+- - -
+
+### tf.train.match_filenames_once(pattern, name=None) <div class="md-anchor" id="match_filenames_once">{#match_filenames_once}</div>
+
+Save the list of files matching pattern, so it is only computed once.
+
+##### Args:
+
+
+* <b>pattern</b>: A file pattern (glob).
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A variable that is initialized to the list of files matching pattern.
+
+
+- - -
+
+### tf.train.limit_epochs(tensor, num_epochs=None, name=None) <div class="md-anchor" id="limit_epochs">{#limit_epochs}</div>
+
+Returns tensor num_epochs times and then raises an OutOfRange error.
+
+##### Args:
+
+
+* <b>tensor</b>: Any Tensor.
+* <b>num_epochs</b>: An integer (optional). If specified, limits the number
+ of steps the output tensor may be evaluated.
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ tensor or OutOfRange.
+
+
+- - -
+
+### tf.train.range_input_producer(limit, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None) <div class="md-anchor" id="range_input_producer">{#range_input_producer}</div>
+
+Produces the integers from 0 to limit-1 in a queue.
+
+##### Args:
+
+
+* <b>limit</b>: An int32 scalar tensor.
+* <b>num_epochs</b>: An integer (optional). If specified, `range_input_producer`
+ produces each integer `num_epochs` times before generating an
+ OutOfRange error. If not specified, `range_input_producer` can cycle
+ through the integers an unlimited number of times.
+* <b>shuffle</b>: Boolean. If true, the integers are randomly shuffled within each
+ epoch.
+* <b>seed</b>: An integer (optional). Seed used if shuffle == True.
+* <b>capacity</b>: An integer. Sets the queue capacity.
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A Queue with the output integers. A QueueRunner for the Queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+
+
+- - -
+
+### tf.train.slice_input_producer(tensor_list, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None) <div class="md-anchor" id="slice_input_producer">{#slice_input_producer}</div>
+
+Produces a slice of each Tensor in tensor_list.
+
+Implemented using a Queue -- a QueueRunner for the Queue
+is added to the current Graph's QUEUE_RUNNER collection.
+
+##### Args:
+
+
+* <b>tensor_list</b>: A list of Tensors. Every Tensor in tensor_list must
+ have the same size in the first dimension.
+* <b>num_epochs</b>: An integer (optional). If specified, `slice_input_producer`
+ produces each slice `num_epochs` times before generating
+ an OutOfRange error. If not specified, `slice_input_producer` can cycle
+ through the slices an unlimited number of times.
+* <b>shuffle</b>: Boolean. If true, the slices are randomly shuffled within each
+ epoch.
+* <b>seed</b>: An integer (optional). Seed used if shuffle == True.
+* <b>capacity</b>: An integer. Sets the queue capacity.
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A list of tensors, one for each element of tensor_list. If the tensor
+ in tensor_list has shape [N, a, b, ..., z], then the corresponding output
+ tensor will have shape [a, b, ..., z].
+
+
+- - -
+
+### tf.train.string_input_producer(string_tensor, num_epochs=None, shuffle=True, seed=None, capacity=32, name=None) <div class="md-anchor" id="string_input_producer">{#string_input_producer}</div>
+
+Output strings (e.g. filenames) to a queue for an input pipeline.
+
+##### Args:
+
+
+* <b>string_tensor</b>: A 1-D string tensor with the strings to produce.
+* <b>num_epochs</b>: An integer (optional). If specified, `string_input_producer`
+ produces each string from `string_tensor` `num_epochs` times before
+ generating an OutOfRange error. If not specified, `string_input_producer`
+ can cycle through the strings in `string_tensor` an unlimited number of
+ times.
+* <b>shuffle</b>: Boolean. If true, the strings are randomly shuffled within each
+ epoch.
+* <b>seed</b>: An integer (optional). Seed used if shuffle == True.
+* <b>capacity</b>: An integer. Sets the queue capacity.
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A queue with the output strings. A QueueRunner for the Queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+
+
+
+### Batching at the end of an input pipeline <div class="md-anchor" id="AUTOGENERATED-batching-at-the-end-of-an-input-pipeline">{#AUTOGENERATED-batching-at-the-end-of-an-input-pipeline}</div>
+
+These functions add a queue to the graph to assemble a batch of examples, with
+possible shuffling. They also add a `QueueRunner` for running the subgraph
+that fills that queue.
+
+Use [batch](#batch) or [batch_join](#batch_join) for batching examples that have
+already been well shuffled. Use [shuffle_batch](#shuffle_batch) or
+[shuffle_batch_join](#shuffle_batch_join) for examples that
+would benefit from additional shuffling.
+
+Use [batch](#batch) or [shuffle_batch](#shuffle_batch) if you want a
+single thread producing examples to batch, or if you have a
+single subgraph producing examples but you want to run it in N threads
+(where you increase N until it can keep the queue full). Use
+[batch_join](#batch_join) or [shuffle_batch_join](#shuffle_batch_join)
+if you have N different subgraphs producing examples to batch and you
+want them run by N threads.
+
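+A hedged end-to-end sketch combining a producer, a reader, and `shuffle_batch`
+(the filenames, CSV layout, and sizes are illustrative assumptions, not a
+prescribed recipe):
+
+```python
+import tensorflow as tf
+
+filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
+reader = tf.TextLineReader()
+_, line = reader.read(filename_queue)
+
+# Hypothetical two-column CSV: a float feature and an integer label.
+feature, label = tf.decode_csv(line, record_defaults=[[0.0], [0]])
+
+feature_batch, label_batch = tf.train.shuffle_batch(
+    [feature, label], batch_size=128, num_threads=4,
+    capacity=1000, min_after_dequeue=500)
+
+with tf.Session() as sess:
+  # Start the QueueRunners registered in the QUEUE_RUNNER collection.
+  threads = tf.train.start_queue_runners(sess=sess)
+  f, l = sess.run([feature_batch, label_batch])
+```
+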
+- - -
+
+### tf.train.batch(tensor_list, batch_size, num_threads=1, capacity=32, enqueue_many=False, shapes=None, name=None) <div class="md-anchor" id="batch">{#batch}</div>
+
+Run tensor_list to fill a queue to create batches.
+
+Implemented using a queue -- a QueueRunner for the queue
+is added to the current Graph's QUEUE_RUNNER collection.
+
+##### Args:
+
+
+* <b>tensor_list</b>: The list of tensors to enqueue.
+* <b>batch_size</b>: The new batch size pulled from the queue.
+* <b>num_threads</b>: The number of threads enqueuing tensor_list.
+* <b>capacity</b>: Maximum number of elements in the queue; controls how
+ far ahead the prefetching is allowed to get, and memory usage.
+* <b>enqueue_many</b>: If False, tensor_list is assumed to represent a
+ single example. If True, tensor_list is assumed to represent
+ a batch of examples, where the first dimension is indexed by
+ example, and all members of tensor_list should have the same
+ size in the first dimension.
+* <b>shapes</b>: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list (leaving off the first dimension
+ if enqueue_many is True).
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A list of tensors with the same number and types as tensor_list.
+ If enqueue_many is false, then an input tensor with shape
+ `[x, y, z]` will be output as a tensor with shape
+ `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+
+
+- - -
+
+### tf.train.batch_join(tensor_list_list, batch_size, capacity=32, enqueue_many=False, shapes=None, name=None) <div class="md-anchor" id="batch_join">{#batch_join}</div>
+
+Run a list of tensors to fill a queue to create batches of examples.
+
+This version enqueues a different list of tensors in different threads.
+Implemented using a queue -- a QueueRunner for the queue
+is added to the current Graph's QUEUE_RUNNER collection.
+
+##### Args:
+
+
+* <b>tensor_list_list</b>: A list of tuples of tensors to enqueue.
+ len(tensor_list_list) threads will be started, with the i-th
+ thread enqueuing the tensors from tensor_list_list[i].
+ tensor_list_list[i1][j] must match tensor_list_list[i2][j] in type and
+ shape (except in the first dimension if enqueue_many is true).
+* <b>batch_size</b>: The new batch size pulled from the queue.
+* <b>capacity</b>: Maximum number of elements in the queue; controls how
+ far ahead the prefetching is allowed to get, and memory usage.
+* <b>enqueue_many</b>: If False, each tensor_list_list[i] is assumed to
+ represent a single example. If True, tensor_list_list[i] is
+ assumed to represent a batch of examples, where the first
+ dimension is indexed by example, and all members of
+ tensor_list_list[i] should have the same size in the first
+ dimension.
+* <b>shapes</b>: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list_list[i] (which must match, after
+ leaving off the first dimension if enqueue_many is True).
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A list of tensors with the same number and types as
+ tensor_list_list[i]. If enqueue_many is false, then an input
+ tensor with shape `[x, y, z]` will be output as a tensor with
+ shape `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+
+
+- - -
+
+### tf.train.shuffle_batch(tensor_list, batch_size, capacity, min_after_dequeue, num_threads=1, seed=None, enqueue_many=False, shapes=None, name=None) <div class="md-anchor" id="shuffle_batch">{#shuffle_batch}</div>
+
+Create batches by randomly shuffling tensors.
+
+This adds:
+
+* a shuffling queue into which tensors from tensor_list are enqueued.
+* a dequeue many operation to create batches from the queue,
+* and a QueueRunner is added to the current Graph's QUEUE_RUNNER collection,
+ to enqueue the tensors from tensor_list.
+
+##### Args:
+
+
+* <b>tensor_list</b>: The list of tensors to enqueue.
+* <b>batch_size</b>: The new batch size pulled from the queue.
+* <b>capacity</b>: Maximum number of elements in the queue; controls how
+ far ahead the prefetching is allowed to get, and memory usage.
+* <b>min_after_dequeue</b>: Minimum number of elements in the queue after a
+ dequeue, used to ensure a level of mixing of elements.
+* <b>num_threads</b>: The number of threads enqueuing tensor_list.
+* <b>seed</b>: Seed for the random shuffling within the queue.
+* <b>enqueue_many</b>: If False, tensor_list is assumed to represent a
+ single example. If True, tensor_list is assumed to represent
+ a batch of examples, where the first dimension is indexed by
+ example, and all members of tensor_list should have the same
+ size in the first dimension.
+* <b>shapes</b>: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list (leaving off the first dimension
+ if enqueue_many is True).
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A list of tensors with the same number and types as tensor_list.
+ If enqueue_many is false, then an input tensor with shape
+ `[x, y, z]` will be output as a tensor with shape
+ `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+
+
+- - -
+
+### tf.train.shuffle_batch_join(tensor_list_list, batch_size, capacity, min_after_dequeue, seed=None, enqueue_many=False, shapes=None, name=None) <div class="md-anchor" id="shuffle_batch_join">{#shuffle_batch_join}</div>
+
+Create batches by randomly shuffling tensors.
+
+This version enqueues a different list of tensors in different threads.
+It adds:
+
+* a shuffling queue into which tensors from tensor_list_list are enqueued.
+* a dequeue many operation to create batches from the queue,
+* and a QueueRunner is added to the current Graph's QUEUE_RUNNER collection,
+ to enqueue the tensors from tensor_list_list.
+
+##### Args:
+
+
+* <b>tensor_list_list</b>: A list of tuples of tensors to enqueue.
+ len(tensor_list_list) threads will be started, with the i-th
+ thread enqueuing the tensors from tensor_list_list[i].
+ tensor_list_list[i1][j] must match tensor_list_list[i2][j] in type and
+ shape (except in the first dimension if enqueue_many is true).
+* <b>batch_size</b>: The new batch size pulled from the queue.
+* <b>capacity</b>: Maximum number of elements in the queue; controls how
+ far ahead the prefetching is allowed to get, and memory usage.
+* <b>min_after_dequeue</b>: Minimum number of elements in the queue after a
+ dequeue, used to ensure a level of mixing of elements.
+* <b>seed</b>: Seed for the random shuffling within the queue.
+* <b>enqueue_many</b>: If False, each tensor_list_list[i] is assumed to
+ represent a single example. If True, tensor_list_list[i] is
+ assumed to represent a batch of examples, where the first
+ dimension is indexed by example, and all members of
+ tensor_list_list[i] should have the same size in the first
+ dimension.
+* <b>shapes</b>: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list_list[i] (which must match, after
+ leaving off the first dimension if enqueue_many is True).
+* <b>name</b>: A name for the operations (optional).
+
+##### Returns:
+
+ A list of tensors with the same number and types as
+ tensor_list_list[i]. If enqueue_many is false, then an input
+ tensor with shape `[x, y, z]` will be output as a tensor with
+ shape `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/math_ops.md b/tensorflow/g3doc/api_docs/python/math_ops.md
new file mode 100644
index 0000000000..fb93c38311
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/math_ops.md
@@ -0,0 +1,1883 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Math
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Arithmetic Operators](#AUTOGENERATED-arithmetic-operators)
+ * [tf.add(x, y, name=None)](#add)
+ * [tf.sub(x, y, name=None)](#sub)
+ * [tf.mul(x, y, name=None)](#mul)
+ * [tf.div(x, y, name=None)](#div)
+ * [tf.mod(x, y, name=None)](#mod)
+* [Basic Math Functions](#AUTOGENERATED-basic-math-functions)
+ * [tf.add_n(inputs, name=None)](#add_n)
+ * [tf.abs(x, name=None)](#abs)
+ * [tf.neg(x, name=None)](#neg)
+ * [tf.sign(x, name=None)](#sign)
+ * [tf.inv(x, name=None)](#inv)
+ * [tf.square(x, name=None)](#square)
+ * [tf.round(x, name=None)](#round)
+ * [tf.sqrt(x, name=None)](#sqrt)
+ * [tf.rsqrt(x, name=None)](#rsqrt)
+ * [tf.pow(x, y, name=None)](#pow)
+ * [tf.exp(x, name=None)](#exp)
+ * [tf.log(x, name=None)](#log)
+ * [tf.ceil(x, name=None)](#ceil)
+ * [tf.floor(x, name=None)](#floor)
+ * [tf.maximum(x, y, name=None)](#maximum)
+ * [tf.minimum(x, y, name=None)](#minimum)
+ * [tf.cos(x, name=None)](#cos)
+ * [tf.sin(x, name=None)](#sin)
+* [Matrix Math Functions](#AUTOGENERATED-matrix-math-functions)
+ * [tf.diag(diagonal, name=None)](#diag)
+ * [tf.transpose(a, perm=None, name='transpose')](#transpose)
+ * [tf.matmul(a, b, transpose_a=False, transpose_b=False, a_is_sparse=False, b_is_sparse=False, name=None)](#matmul)
+ * [tf.batch_matmul(x, y, adj_x=None, adj_y=None, name=None)](#batch_matmul)
+ * [tf.matrix_determinant(input, name=None)](#matrix_determinant)
+ * [tf.batch_matrix_determinant(input, name=None)](#batch_matrix_determinant)
+ * [tf.matrix_inverse(input, name=None)](#matrix_inverse)
+ * [tf.batch_matrix_inverse(input, name=None)](#batch_matrix_inverse)
+ * [tf.cholesky(input, name=None)](#cholesky)
+ * [tf.batch_cholesky(input, name=None)](#batch_cholesky)
+* [Complex Number Functions](#AUTOGENERATED-complex-number-functions)
+ * [tf.complex(real, imag, name=None)](#complex)
+ * [tf.complex_abs(x, name=None)](#complex_abs)
+ * [tf.conj(in_, name=None)](#conj)
+ * [tf.imag(in_, name=None)](#imag)
+ * [tf.real(in_, name=None)](#real)
+* [Reduction](#AUTOGENERATED-reduction)
+ * [tf.reduce_sum(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_sum)
+ * [tf.reduce_prod(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_prod)
+ * [tf.reduce_min(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_min)
+ * [tf.reduce_max(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_max)
+ * [tf.reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_mean)
+ * [tf.reduce_all(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_all)
+ * [tf.reduce_any(input_tensor, reduction_indices=None, keep_dims=False, name=None)](#reduce_any)
+ * [tf.accumulate_n(inputs, shape=None, tensor_dtype=None, name=None)](#accumulate_n)
+* [Segmentation](#AUTOGENERATED-segmentation)
+ * [tf.segment_sum(data, segment_ids, name=None)](#segment_sum)
+ * [tf.segment_prod(data, segment_ids, name=None)](#segment_prod)
+ * [tf.segment_min(data, segment_ids, name=None)](#segment_min)
+ * [tf.segment_max(data, segment_ids, name=None)](#segment_max)
+ * [tf.segment_mean(data, segment_ids, name=None)](#segment_mean)
+ * [tf.unsorted_segment_sum(data, segment_ids, num_segments, name=None)](#unsorted_segment_sum)
+ * [tf.sparse_segment_sum(data, indices, segment_ids, name=None)](#sparse_segment_sum)
+ * [tf.sparse_segment_mean(data, indices, segment_ids, name=None)](#sparse_segment_mean)
+* [Sequence Comparison and Indexing](#AUTOGENERATED-sequence-comparison-and-indexing)
+ * [tf.argmin(input, dimension, name=None)](#argmin)
+ * [tf.argmax(input, dimension, name=None)](#argmax)
+ * [tf.listdiff(x, y, name=None)](#listdiff)
+ * [tf.where(input, name=None)](#where)
+ * [tf.unique(x, name=None)](#unique)
+ * [tf.edit_distance(hypothesis, truth, normalize=True, name='edit_distance')](#edit_distance)
+ * [tf.invert_permutation(x, name=None)](#invert_permutation)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Arithmetic Operators <div class="md-anchor" id="AUTOGENERATED-arithmetic-operators">{#AUTOGENERATED-arithmetic-operators}</div>
+
+TensorFlow provides several operations that you can use to add basic arithmetic
+operators to your graph.
+
+- - -
+
+### tf.add(x, y, name=None) <div class="md-anchor" id="add">{#add}</div>
+
+Returns x + y element-wise.
+
+*NOTE*: Add supports broadcasting. AddN does not.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int8`, `int16`, `int32`, `complex64`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
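+For example, this short sketch (illustrative only; values chosen arbitrarily)
+shows the broadcasting behavior noted above:
+
+```python
+import tensorflow as tf
+
+x = tf.constant([[1, 2], [3, 4]])
+y = tf.constant([10, 20])          # broadcast against each row of `x`
+with tf.Session() as sess:
+    print(sess.run(tf.add(x, y)))  # ==> [[11 22] [13 24]]
+```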
+
+- - -
+
+### tf.sub(x, y, name=None) <div class="md-anchor" id="sub">{#sub}</div>
+
+Returns x - y element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.mul(x, y, name=None) <div class="md-anchor" id="mul">{#mul}</div>
+
+Returns x * y element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int8`, `int16`, `int32`, `complex64`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.div(x, y, name=None) <div class="md-anchor" id="div">{#div}</div>
+
+Returns x / y element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.mod(x, y, name=None) <div class="md-anchor" id="mod">{#mod}</div>
+
+Returns element-wise remainder of division.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`, `float32`, `float64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
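+A small illustrative sketch (not part of the generated reference):
+
+```python
+import tensorflow as tf
+
+with tf.Session() as sess:
+    print(sess.run(tf.mod(tf.constant([7, 9]), tf.constant([3, 4]))))
+    # ==> [1 1]
+```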
+
+
+## Basic Math Functions <div class="md-anchor" id="AUTOGENERATED-basic-math-functions">{#AUTOGENERATED-basic-math-functions}</div>
+
+TensorFlow provides several operations that you can use to add basic
+mathematical functions to your graph.
+
+- - -
+
+### tf.add_n(inputs, name=None) <div class="md-anchor" id="add_n">{#add_n}</div>
+
+Adds all input tensors element-wise.
+
+##### Args:
+
+
+* <b>inputs</b>: A list of at least 1 `Tensor` objects of the same type in: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+ Must all be the same size and shape.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `inputs`.
+
+
+- - -
+
+### tf.abs(x, name=None) <div class="md-anchor" id="abs">{#abs}</div>
+
+Computes the absolute value of a tensor.
+
+Given a tensor of real numbers `x`, this operation returns a tensor
+containing the absolute value of each element in `x`. For example, if x is
+an input element and y is an output element, this operation computes
+\\(y = |x|\\).
+
+See [`tf.complex_abs()`](#tf_complex_abs) to compute the absolute value of a complex
+number.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `float`, `double`, `int32`, or `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` the same size and type as `x` with absolute values.
+
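+A small illustrative sketch (not part of the generated reference):
+
+```python
+import tensorflow as tf
+
+with tf.Session() as sess:
+    print(sess.run(tf.abs(tf.constant([-1.5, 0.0, 2.0]))))  # ==> [1.5 0. 2.]
+```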
+
+- - -
+
+### tf.neg(x, name=None) <div class="md-anchor" id="neg">{#neg}</div>
+
+Computes numerical negative value element-wise.
+
+I.e., \\(y = -x\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.sign(x, name=None) <div class="md-anchor" id="sign">{#sign}</div>
+
+Returns an element-wise indication of the sign of a number.
+
+y = sign(x) = -1 if x < 0; 0 if x == 0; 1 if x > 0.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.inv(x, name=None) <div class="md-anchor" id="inv">{#inv}</div>
+
+Computes the reciprocal of x element-wise.
+
+I.e., \\(y = 1 / x\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.square(x, name=None) <div class="md-anchor" id="square">{#square}</div>
+
+Computes square of x element-wise.
+
+I.e., \\(y = x * x = x^2\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.round(x, name=None) <div class="md-anchor" id="round">{#round}</div>
+
+Rounds the values of a tensor to the nearest integer, element-wise.
+
+For example:
+
+```python
+# 'a' is [0.9, 2.5, 2.3, -4.4]
+tf.round(a) ==> [ 1.0, 3.0, 2.0, -4.0 ]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `float` or `double`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of same shape and type as `x`.
+
+
+- - -
+
+### tf.sqrt(x, name=None) <div class="md-anchor" id="sqrt">{#sqrt}</div>
+
+Computes square root of x element-wise.
+
+I.e., \\(y = \sqrt{x} = x^{1/2}\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.rsqrt(x, name=None) <div class="md-anchor" id="rsqrt">{#rsqrt}</div>
+
+Computes reciprocal of square root of x element-wise.
+
+I.e., \\(y = 1 / \sqrt{x}\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.pow(x, y, name=None) <div class="md-anchor" id="pow">{#pow}</div>
+
+Computes the power of one value to another.
+
+Given a tensor `x` and a tensor `y`, this operation computes \\(x^y\\) for
+corresponding elements in `x` and `y`. For example:
+
+```
+# tensor 'x' is [[2, 2], [3, 3]]
+# tensor 'y' is [[8, 16], [2, 3]]
+tf.pow(x, y) ==> [[256, 65536], [9, 27]]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `float`, `double`, `int32`, `complex64`, or `int64`.
+* <b>y</b>: A `Tensor` of type `float`, `double`, `int32`, `complex64`, or `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`.
+
+
+- - -
+
+### tf.exp(x, name=None) <div class="md-anchor" id="exp">{#exp}</div>
+
+Computes exponential of x element-wise. \\(y = e^x\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.log(x, name=None) <div class="md-anchor" id="log">{#log}</div>
+
+Computes the natural logarithm of x element-wise.
+
+I.e., \\(y = \log_e x\\).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.ceil(x, name=None) <div class="md-anchor" id="ceil">{#ceil}</div>
+
+Returns element-wise smallest integer not less than x.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.floor(x, name=None) <div class="md-anchor" id="floor">{#floor}</div>
+
+Returns element-wise largest integer not greater than x.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.maximum(x, y, name=None) <div class="md-anchor" id="maximum">{#maximum}</div>
+
+Returns the max of x and y (i.e. x > y ? x : y) element-wise; supports broadcasting.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.minimum(x, y, name=None) <div class="md-anchor" id="minimum">{#minimum}</div>
+
+Returns the min of x and y (i.e. x < y ? x : y) element-wise; supports broadcasting.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
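+An illustrative sketch of the element-wise comparison with a broadcast scalar
+(values chosen arbitrarily; covers both maximum and minimum):
+
+```python
+import tensorflow as tf
+
+x = tf.constant([1.0, 5.0, 3.0])
+with tf.Session() as sess:
+    print(sess.run(tf.maximum(x, 2.0)))  # ==> [2. 5. 3.]
+    print(sess.run(tf.minimum(x, 2.0)))  # ==> [1. 2. 2.]
+```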
+
+- - -
+
+### tf.cos(x, name=None) <div class="md-anchor" id="cos">{#cos}</div>
+
+Computes cos of x element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+- - -
+
+### tf.sin(x, name=None) <div class="md-anchor" id="sin">{#sin}</div>
+
+Computes sin of x element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`, `int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+
+
+
+## Matrix Math Functions <div class="md-anchor" id="AUTOGENERATED-matrix-math-functions">{#AUTOGENERATED-matrix-math-functions}</div>
+
+TensorFlow provides several operations that you can use to add basic
+mathematical functions for matrices to your graph.
+
+- - -
+
+### tf.diag(diagonal, name=None) <div class="md-anchor" id="diag">{#diag}</div>
+
+Returns a diagonal tensor with the given diagonal values.
+
+Given a `diagonal`, this operation returns a tensor with the `diagonal` and
+everything else padded with zeros. The diagonal is computed as follows:
+
+Assume `diagonal` has dimensions [D1,..., Dk], then the output is a tensor of
+rank 2k with dimensions [D1,..., Dk, D1,..., Dk] where:
+
+`output[i1,..., ik, i1,..., ik] = diagonal[i1, ..., ik]` and 0 everywhere else.
+
+For example:
+
+```prettyprint
+# 'diagonal' is [1, 2, 3, 4]
+tf.diag(diagonal) ==> [[1, 0, 0, 0]
+ [0, 2, 0, 0]
+ [0, 0, 3, 0]
+ [0, 0, 0, 4]]
+```
+
+##### Args:
+
+
+* <b>diagonal</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`.
+ Rank k tensor where k is at most 3.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `diagonal`.
+
+
+- - -
+
+### tf.transpose(a, perm=None, name='transpose') <div class="md-anchor" id="transpose">{#transpose}</div>
+
+Transposes `a`. Permutes the dimensions according to `perm`.
+
+The returned tensor's dimension i will correspond to the input dimension
+`perm[i]`. If `perm` is not given, it is set to (n-1...0), where n is
+the rank of the input tensor. Hence by default, this operation performs a
+regular matrix transpose on 2-D input Tensors.
+
+For example:
+
+```python
+# 'x' is [[1 2 3]
+# [4 5 6]]
+tf.transpose(x) ==> [[1 4]
+ [2 5]
+ [3 6]]
+
+# Equivalently
+tf.transpose(x, perm=[1, 0]) ==> [[1 4]
+                                  [2 5]
+                                  [3 6]]
+
+# 'perm' is more useful for n-dimensional tensors, for n > 2
+# 'x' is [[[1 2 3]
+# [4 5 6]]
+# [[7 8 9]
+# [10 11 12]]]
+# Take the transpose of the matrices in dimension-0
+tf.transpose(b, perm=[0, 2, 1]) ==> [[[1 4]
+ [2 5]
+ [3 6]]
+
+ [[7 10]
+ [8 11]
+ [9 12]]]
+```
+
+##### Args:
+
+
+* <b>a</b>: A `Tensor`.
+* <b>perm</b>: A permutation of the dimensions of `a`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A transposed `Tensor`.
+
+
+
+- - -
+
+### tf.matmul(a, b, transpose_a=False, transpose_b=False, a_is_sparse=False, b_is_sparse=False, name=None) <div class="md-anchor" id="matmul">{#matmul}</div>
+
+Multiplies matrix `a` by matrix `b`, producing `a` * `b`.
+
+The inputs must be two-dimensional matrices, with matching inner dimensions,
+possibly after transposition.
+
+Both matrices must be of the same type. The supported types are:
+`float`, `double`, `int32`, `complex64`.
+
+Either matrix can be transposed on the fly by setting the corresponding flag
+to `True`. This is `False` by default.
+
+If one or both of the matrices contain a lot of zeros, a more efficient
+multiplication algorithm can be used by setting the corresponding
+`a_is_sparse` or `b_is_sparse` flag to `True`. These are `False` by default.
+
+For example:
+
+```python
+# 2-D tensor `a`
+a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3]) => [[1 2 3]
+                                                      [4 5 6]]
+# 2-D tensor `b`
+b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2]) => [[7 8]
+                                                         [9 10]
+                                                         [11 12]]
+c = tf.matmul(a, b) => [[58 64]
+                        [139 154]]
+```
+
+##### Args:
+
+
+* <b>a</b>: `Tensor` of type `float`, `double`, `int32` or `complex64`.
+* <b>b</b>: `Tensor` with same type as `a`.
+* <b>transpose_a</b>: If `True`, `a` is transposed before multiplication.
+* <b>transpose_b</b>: If `True`, `b` is transposed before multiplication.
+* <b>a_is_sparse</b>: If `True`, `a` is treated as a sparse matrix.
+* <b>b_is_sparse</b>: If `True`, `b` is treated as a sparse matrix.
+* <b>name</b>: Name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of the same type as `a`.
+
+
+- - -
+
+### tf.batch_matmul(x, y, adj_x=None, adj_y=None, name=None) <div class="md-anchor" id="batch_matmul">{#batch_matmul}</div>
+
+Multiplies slices of two tensors in batches.
+
+Multiplies all slices of `Tensor` `x` and `y` (each slice can be
+viewed as an element of a batch), and arranges the individual results
+in a single output tensor of the same batch size. Each of the
+individual slices can optionally be adjointed (to adjoint a matrix
+means to transpose and conjugate it) before multiplication by setting
+the `adj_x` or `adj_y` flag to `True`, which are by default `False`.
+
+The input tensors `x` and `y` are 3-D or higher with shape `[..., r_x, c_x]`
+and `[..., r_y, c_y]`.
+
+The output tensor is 3-D or higher with shape `[..., r_o, c_o]`, where:
+
+ r_o = c_x if adj_x else r_x
+ c_o = r_y if adj_y else c_y
+
+It is computed as:
+
+ out[..., :, :] = matrix(x[..., :, :]) * matrix(y[..., :, :])
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `complex64`.
+ 3-D or higher with shape `[..., r_x, c_x]`.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`.
+ 3-D or higher with shape `[..., r_y, c_y]`.
+* <b>adj_x</b>: An optional `bool`. Defaults to `False`.
+ If `True`, adjoint the slices of `x`. Defaults to `False`.
+* <b>adj_y</b>: An optional `bool`. Defaults to `False`.
+ If `True`, adjoint the slices of `y`. Defaults to `False`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `x`.
+ 3-D or higher with shape `[..., r_o, c_o]`
+
+
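+A shape-focused sketch (values chosen arbitrarily) batching two 2x2
+multiplications:
+
+```python
+import tensorflow as tf
+
+x = tf.constant([1., 2., 3., 4., 5., 6., 7., 8.], shape=[2, 2, 2])
+y = tf.constant([1., 0., 0., 1., 2., 0., 0., 2.], shape=[2, 2, 2])
+with tf.Session() as sess:
+    out = sess.run(tf.batch_matmul(x, y))
+    print(out.shape)  # ==> (2, 2, 2): one 2x2 product per batch slice
+```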
+
+- - -
+
+### tf.matrix_determinant(input, name=None) <div class="md-anchor" id="matrix_determinant">{#matrix_determinant}</div>
+
+Calculates the determinant of a square matrix.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ A tensor of shape `[M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+ A scalar, equal to the determinant of the input.
+
+
+- - -
+
+### tf.batch_matrix_determinant(input, name=None) <div class="md-anchor" id="batch_matrix_determinant">{#batch_matrix_determinant}</div>
+
+Calculates the determinants for a batch of square matrices.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices. The output is a 1-D tensor containing the determinants
+for all input submatrices `[..., :, :]`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ Shape is `[..., M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`. Shape is `[...]`.
+
+
+
+- - -
+
+### tf.matrix_inverse(input, name=None) <div class="md-anchor" id="matrix_inverse">{#matrix_inverse}</div>
+
+Calculates the inverse of a square invertible matrix. Checks for invertibility.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ Shape is `[M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+ Shape is `[M, M]` containing the matrix inverse of the input.
+
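+A minimal 2x2 sketch (assumes the input is invertible):
+
+```python
+import tensorflow as tf
+
+m = tf.constant([[2.0, 0.0], [0.0, 4.0]])
+with tf.Session() as sess:
+    print(sess.run(tf.matrix_inverse(m)))  # ==> [[0.5 0.] [0. 0.25]]
+```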
+
+- - -
+
+### tf.batch_matrix_inverse(input, name=None) <div class="md-anchor" id="batch_matrix_inverse">{#batch_matrix_inverse}</div>
+
+Calculates the inverse of square invertible matrices. Checks for invertibility.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices. The output is a tensor of the same shape as the input
+containing the inverse for all input submatrices `[..., :, :]`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ Shape is `[..., M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`. Shape is `[..., M, M]`.
+
+
+
+- - -
+
+### tf.cholesky(input, name=None) <div class="md-anchor" id="cholesky">{#cholesky}</div>
+
+Calculates the Cholesky decomposition of a square matrix.
+
+The input has to be symmetric and positive definite. Only the lower-triangular
+part of the input will be used for this operation. The upper-triangular part
+will not be read.
+
+The result is the lower-triangular matrix of the Cholesky decomposition of the
+input.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float64`, `float32`.
+ Shape is `[M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`. Shape is `[M, M]`.
+
+
+- - -
+
+### tf.batch_cholesky(input, name=None) <div class="md-anchor" id="batch_cholesky">{#batch_cholesky}</div>
+
+Calculates the Cholesky decomposition of a batch of square matrices.
+
+The input is a tensor of shape `[..., M, M]` whose inner-most 2 dimensions
+form square matrices, with the same constraints as the single matrix Cholesky
+decomposition above. The output is a tensor of the same shape as the input
+containing the Cholesky decompositions for all input submatrices `[..., :, :]`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float64`, `float32`.
+ Shape is `[..., M, M]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`. Shape is `[..., M, M]`.
+
+
+
+## Complex Number Functions <div class="md-anchor" id="AUTOGENERATED-complex-number-functions">{#AUTOGENERATED-complex-number-functions}</div>
+
+TensorFlow provides several operations that you can use to add complex number
+functions to your graph.
+
+- - -
+
+### tf.complex(real, imag, name=None) <div class="md-anchor" id="complex">{#complex}</div>
+
+Converts two real numbers to a complex number.
+
+Given a tensor `real` representing the real part of a complex number, and a
+tensor `imag` representing the imaginary part of a complex number, this
+operation computes complex numbers elementwise of the form \\(a + bj\\),
+where *a* represents the `real` part and *b* represents the `imag` part.
+
+The input tensors `real` and `imag` must be the same shape.
+
+For example:
+
+```
+# tensor 'real' is [2.25, 3.25]
+# tensor `imag` is [4.75, 5.75]
+tf.complex(real, imag) ==> [[2.25 + 4.75j], [3.25 + 5.75j]]
+```
+
+##### Args:
+
+
+* <b>real</b>: A `Tensor` of type `float`.
+* <b>imag</b>: A `Tensor` of type `float`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `complex64`.
+
+
+- - -
+
+### tf.complex_abs(x, name=None) <div class="md-anchor" id="complex_abs">{#complex_abs}</div>
+
+Computes the complex absolute value of a tensor.
+
+Given a tensor `x` of complex numbers, this operation returns a tensor of type
+`float` that is the absolute value of each element in `x`. All elements in `x`
+must be complex numbers of the form \\(a + bj\\). The absolute value is
+computed as \\( \sqrt{a^2 + b^2}\\).
+
+For example:
+
+```
+# tensor 'x' is [[-2.25 + 4.75j], [-3.25 + 5.75j]]
+tf.complex_abs(x) ==> [5.25594902, 6.60492229]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `complex64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`.
+
+
+- - -
+
+### tf.conj(in_, name=None) <div class="md-anchor" id="conj">{#conj}</div>
+
+Returns the complex conjugate of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of
+complex numbers that are the complex conjugate of each element in `in`. The
+complex numbers in `in` must be of the form \\(a + bj\\), where *a* is the real
+part and *b* is the imaginary part.
+
+The complex conjugate returned by this operation is of the form \\(a - bj\\).
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.conj(in) ==> [-2.25 - 4.75j, 3.25 - 5.75j]
+```
+
+##### Args:
+
+
+* <b>in_</b>: A `Tensor` of type `complex64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `complex64`.
+
+
+- - -
+
+### tf.imag(in_, name=None) <div class="md-anchor" id="imag">{#imag}</div>
+
+Returns the imaginary part of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of type
+`float` that is the imaginary part of each element in `in`. All elements in `in`
+must be complex numbers of the form \\(a + bj\\), where *a* is the real part
+and *b* is the imaginary part returned by this operation.
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.imag(in) ==> [4.75, 5.75]
+```
+
+##### Args:
+
+
+* <b>in_</b>: A `Tensor` of type `complex64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`.
+
+
+- - -
+
+### tf.real(in_, name=None) <div class="md-anchor" id="real">{#real}</div>
+
+Returns the real part of a complex number.
+
+Given a tensor `in` of complex numbers, this operation returns a tensor of type
+`float` that is the real part of each element in `in`. All elements in `in`
+must be complex numbers of the form \\(a + bj\\), where *a* is the real part
+returned by this operation and *b* is the imaginary part.
+
+For example:
+
+```
+# tensor 'in' is [-2.25 + 4.75j, 3.25 + 5.75j]
+tf.real(in) ==> [-2.25, 3.25]
+```
+
+##### Args:
+
+
+* <b>in_</b>: A `Tensor` of type `complex64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`.
+
+
+
+## Reduction <div class="md-anchor" id="AUTOGENERATED-reduction">{#AUTOGENERATED-reduction}</div>
+
+TensorFlow provides several operations that you can use to perform
+common math computations that reduce various dimensions of a tensor.
+
+- - -
+
+### tf.reduce_sum(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_sum">{#reduce_sum}</div>
+
+Computes the sum of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+For example:
+
+```python
+# 'x' is [[1, 1, 1]
+#         [1, 1, 1]]
+tf.reduce_sum(x) ==> 6
+tf.reduce_sum(x, 0) ==> [2, 2, 2]
+tf.reduce_sum(x, 1) ==> [3, 3]
+tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
+tf.reduce_sum(x, [0, 1]) ==> 6
+```
+
+##### Args:
+
+
+* <b>input_tensor</b>: The tensor to reduce. Should have numeric type.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+- - -
+
+### tf.reduce_prod(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_prod">{#reduce_prod}</div>
+
+Computes the product of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+##### Args:
+
+
+* <b>input_tensor</b>: The tensor to reduce. Should have numeric type.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+- - -
+
+### tf.reduce_min(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_min">{#reduce_min}</div>
+
+Computes the minimum of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+##### Args:
+
+
+* <b>input_tensor</b>: The tensor to reduce. Should have numeric type.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+- - -
+
+### tf.reduce_max(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_max">{#reduce_max}</div>
+
+Computes the maximum of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+##### Args:
+
+
+* <b>input_tensor</b>: The tensor to reduce. Should have numeric type.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
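+For example (an illustrative sketch; values chosen arbitrarily):
+
+```python
+import tensorflow as tf
+
+x = tf.constant([[1, 3], [2, 4]])
+with tf.Session() as sess:
+    print(sess.run(tf.reduce_max(x)))     # ==> 4
+    print(sess.run(tf.reduce_max(x, 0)))  # ==> [2 4]
+    print(sess.run(tf.reduce_max(x, 1)))  # ==> [3 4]
+```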
+
+- - -
+
+### tf.reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_mean">{#reduce_mean}</div>
+
+Computes the mean of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+For example:
+
+```python
+# 'x' is [[1., 1.]
+#         [2., 2.]]
+tf.reduce_mean(x) ==> 1.5
+tf.reduce_mean(x, 0) ==> [1.5, 1.5]
+tf.reduce_mean(x, 1) ==> [1., 2.]
+```
+
+##### Args:
+
+
+* <b>input_tensor</b>: The tensor to reduce. Should have numeric type.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+- - -
+
+### tf.reduce_all(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_all">{#reduce_all}</div>
+
+Computes the "logical and" of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+For example:
+
+```python
+# 'x' is [[True, True]
+#         [False, False]]
+tf.reduce_all(x) ==> False
+tf.reduce_all(x, 0) ==> [False, False]
+tf.reduce_all(x, 1) ==> [True, False]
+```
+
+##### Args:
+
+
+* <b>input_tensor</b>: The boolean tensor to reduce.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+- - -
+
+### tf.reduce_any(input_tensor, reduction_indices=None, keep_dims=False, name=None) <div class="md-anchor" id="reduce_any">{#reduce_any}</div>
+
+Computes the "logical or" of elements across dimensions of a tensor.
+
+Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+are retained with length 1.
+
+If `reduction_indices` has no entries, all dimensions are reduced, and a
+tensor with a single element is returned.
+
+For example:
+
+```python
+# 'x' is [[True, True]
+#         [False, False]]
+tf.reduce_any(x) ==> True
+tf.reduce_any(x, 0) ==> [True, True]
+tf.reduce_any(x, 1) ==> [True, False]
+```
+
+##### Args:
+
+
+* <b>input_tensor</b>: The boolean tensor to reduce.
+* <b>reduction_indices</b>: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+* <b>keep_dims</b>: If true, retains reduced dimensions with length 1.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The reduced tensor.
+
+
+
+- - -
+
+### tf.accumulate_n(inputs, shape=None, tensor_dtype=None, name=None) <div class="md-anchor" id="accumulate_n">{#accumulate_n}</div>
+
+Returns the element-wise sum of a list of tensors.
+
+Optionally, pass `shape` and `tensor_dtype` for shape and type checking,
+otherwise, these are inferred.
+
+For example:
+
+```python
+# tensor 'a' is [[1, 2], [3, 4]]
+# tensor `b` is [[5, 0], [0, 6]]
+tf.accumulate_n([a, b, a]) ==> [[7, 4], [6, 14]]
+
+# Explicitly pass shape and type
+tf.accumulate_n([a, b, a], shape=[2, 2], tensor_dtype=tf.int32)
+ ==> [[7, 4], [6, 14]]
+```
+
+##### Args:
+
+
+* <b>inputs</b>: A list of `Tensor` objects, each with same shape and type.
+* <b>shape</b>: Shape of elements of `inputs`.
+* <b>tensor_dtype</b>: The type of `inputs`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of same shape and type as the elements of `inputs`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `inputs` don't all have same shape and dtype or the shape
+ cannot be inferred.
+
+
+
+## Segmentation <div class="md-anchor" id="AUTOGENERATED-segmentation">{#AUTOGENERATED-segmentation}</div>
+
+TensorFlow provides several operations that you can use to perform common
+math computations on tensor segments.
+Here a segmentation is a partitioning of a tensor along
+the first dimension, i.e. it defines a mapping from the first dimension onto
+`segment_ids`. The `segment_ids` tensor should be the size of
+the first dimension, `d0`, with consecutive IDs in the range `0` to `k`,
+where `k<d0`.
+In particular, a segmentation of a matrix tensor is a mapping of rows to
+segments.
+
+For example:
+
+```python
+c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
+tf.segment_sum(c, tf.constant([0, 0, 1]))
+ ==> [[0 0 0 0]
+ [5 6 7 8]]
+```
+
+- - -
+
+### tf.segment_sum(data, segment_ids, name=None) <div class="md-anchor" id="segment_sum">{#segment_sum}</div>
+
+Computes the sum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \sum_j data_j\\) where sum is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentSum.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+- - -
+
+### tf.segment_prod(data, segment_ids, name=None) <div class="md-anchor" id="segment_prod">{#segment_prod}</div>
+
+Computes the product along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \prod_j data_j\\) where the product is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentProd.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+- - -
+
+### tf.segment_min(data, segment_ids, name=None) <div class="md-anchor" id="segment_min">{#segment_min}</div>
+
+Computes the minimum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \min_j(data_j)\\) where `min` is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMin.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+- - -
+
+### tf.segment_max(data, segment_ids, name=None) <div class="md-anchor" id="segment_max">{#segment_max}</div>
+
+Computes the maximum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \max_j(data_j)\\) where `max` is over `j` such
+that `segment_ids[j] == i`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMax.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+- - -
+
+### tf.segment_mean(data, segment_ids, name=None) <div class="md-anchor" id="segment_mean">{#segment_mean}</div>
+
+Computes the mean along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \frac{\sum_j data_j}{N}\\) where `mean` is
+over `j` such that `segment_ids[j] == i` and `N` is the total number of
+values summed.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/SegmentMean.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+
+- - -
+
+### tf.unsorted_segment_sum(data, segment_ids, num_segments, name=None) <div class="md-anchor" id="unsorted_segment_sum">{#unsorted_segment_sum}</div>
+
+Computes the sum along segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Computes a tensor such that
+\\(output_i = \sum_j data_j\\) where sum is over `j` such
+that `segment_ids[j] == i`. Unlike `SegmentSum`, `segment_ids`
+need not be sorted and need not cover all values in the full
+ range of valid values.
+
+If the sum is empty for a given segment ID `i`, `output[i] = 0`.
+
+`num_segments` should equal the number of distinct segment IDs.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/UnsortedSegmentSum.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>segment_ids</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A 1-D tensor whose rank is equal to the rank of `data`'s
+ first dimension.
+* <b>num_segments</b>: A `Tensor` of type `int32`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `num_segments`.
+
+
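+An illustrative sketch with unsorted segment ids (values chosen arbitrarily):
+
+```python
+import tensorflow as tf
+
+data = tf.constant([1, 2, 3, 4])
+ids = tf.constant([1, 0, 1, 0])
+with tf.Session() as sess:
+    print(sess.run(tf.unsorted_segment_sum(data, ids, 2)))
+    # ==> [6 4]   (segment 0: 2 + 4, segment 1: 1 + 3)
+```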
+
+- - -
+
+### tf.sparse_segment_sum(data, indices, segment_ids, name=None) <div class="md-anchor" id="sparse_segment_sum">{#sparse_segment_sum}</div>
+
+Computes the sum along sparse segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Like `SegmentSum`, but `segment_ids` can have rank less than `data`'s first
+dimension, selecting a subset of dimension_0, specified by `indices`.
+
+For example:
+
+```prettyprint
+c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
+
+# Select two rows, one segment.
+tf.sparse_segment_sum(c, tf.constant([0, 1]), tf.constant([0, 0]))
+ ==> [[0 0 0 0]]
+
+# Select two rows, two segments.
+tf.sparse_segment_sum(c, tf.constant([0, 1]), tf.constant([0, 1]))
+ ==> [[ 1 2 3 4]
+ [-1 -2 -3 -4]]
+
+# Select all rows, two segments.
+tf.sparse_segment_sum(c, tf.constant([0, 1, 2]), tf.constant([0, 0, 1]))
+ ==> [[0 0 0 0]
+ [5 6 7 8]]
+
+# Which is equivalent to:
+tf.segment_sum(c, tf.constant([0, 0, 1]))
+```
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>indices</b>: A `Tensor` of type `int32`.
+ A 1-D tensor. Has same rank as `segment_ids`.
+* <b>segment_ids</b>: A `Tensor` of type `int32`.
+ A 1-D tensor. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+- - -
+
+### tf.sparse_segment_mean(data, indices, segment_ids, name=None) <div class="md-anchor" id="sparse_segment_mean">{#sparse_segment_mean}</div>
+
+Computes the mean along sparse segments of a tensor.
+
+Read [the section on Segmentation](../python/math_ops.md#segmentation)
+for an explanation of segments.
+
+Like `SegmentMean`, but `segment_ids` can have rank less than `data`'s first
+dimension, selecting a subset of dimension_0, specified by `indices`.
+
+##### Args:
+
+
+* <b>data</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>indices</b>: A `Tensor` of type `int32`.
+ A 1-D tensor. Has same rank as `segment_ids`.
+* <b>segment_ids</b>: A `Tensor` of type `int32`.
+ A 1-D tensor. Values should be sorted and can be repeated.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `data`.
+ Has same shape as data, except for dimension_0 which
+ has size `k`, the number of segments.
+
+
+
+
+## Sequence Comparison and Indexing <div class="md-anchor" id="AUTOGENERATED-sequence-comparison-and-indexing">{#AUTOGENERATED-sequence-comparison-and-indexing}</div>
+
+TensorFlow provides several operations that you can use to add sequence
+comparison and index extraction to your graph. You can use these operations to
+determine sequence differences and determine the indexes of specific values in
+a tensor.
+
+- - -
+
+### tf.argmin(input, dimension, name=None) <div class="md-anchor" id="argmin">{#argmin}</div>
+
+Returns the index with the smallest value across dimensions of a tensor.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+* <b>dimension</b>: A `Tensor` of type `int32`.
+ int32, 0 <= dimension < rank(input). Describes which dimension
+ of the input Tensor to reduce across. For vectors, use dimension = 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int64`.
+
+
+- - -
+
+### tf.argmax(input, dimension, name=None) <div class="md-anchor" id="argmax">{#argmax}</div>
+
+Returns the index with the largest value across dimensions of a tensor.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+* <b>dimension</b>: A `Tensor` of type `int32`.
+ int32, 0 <= dimension < rank(input). Describes which dimension
+ of the input Tensor to reduce across. For vectors, use dimension = 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int64`.
+
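+For example (an illustrative sketch):
+
+```python
+import tensorflow as tf
+
+x = tf.constant([[1, 9, 3], [4, 5, 6]])
+with tf.Session() as sess:
+    print(sess.run(tf.argmax(x, 0)))  # ==> [1 0 1]
+    print(sess.run(tf.argmax(x, 1)))  # ==> [1 2]
+```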
+
+
+- - -
+
+### tf.listdiff(x, y, name=None) <div class="md-anchor" id="listdiff">{#listdiff}</div>
+
+Computes the difference between two lists of numbers.
+
+Given a list `x` and a list `y`, this operation returns a list `out` that
+represents all numbers that are in `x` but not in `y`. The returned list `out`
+is sorted in the same order that the numbers appear in `x` (duplicates are
+preserved). This operation also returns a list `idx` that represents the
+position of each `out` element in `x`. In other words:
+
+`out[i] = x[idx[i]] for i in [0, 1, ..., len(out) - 1]`
+
+For example, given this input:
+
+```prettyprint
+x = [1, 2, 3, 4, 5, 6]
+y = [1, 3, 5]
+```
+
+This operation would return:
+
+```prettyprint
+out ==> [2, 4, 6]
+idx ==> [1, 3, 5]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. 1-D. Values to keep.
+* <b>y</b>: A `Tensor`. Must have the same type as `x`. 1-D. Values to remove.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of `Tensor` objects (out, idx).
+
+* <b>out</b>: A `Tensor`. Has the same type as `x`. 1-D. Values present in `x` but not in `y`.
+* <b>idx</b>: A `Tensor` of type `int32`. 1-D. Positions of `x` values preserved in `out`.
+
+
+- - -
+
+### tf.where(input, name=None) <div class="md-anchor" id="where">{#where}</div>
+
+Returns locations of true values in a boolean tensor.
+
+This operation returns the coordinates of true elements in `input`. The
+coordinates are returned in a 2-D tensor where the first dimension (rows)
+represents the number of true elements, and the second dimension (columns)
+represents the coordinates of the true elements. Keep in mind, the shape of
+the output tensor can vary depending on how many true values there are in
+`input`. Indices are output in row-major order.
+
+For example:
+
+```prettyprint
+# 'input' tensor is [[True, False]
+# [True, False]]
+# 'input' has two true values, so output has two coordinates.
+# 'input' has rank of 2, so coordinates have two indices.
+where(input) ==> [[0, 0],
+ [1, 0]]
+
+# `input` tensor is [[[True, False]
+# [True, False]]
+# [[False, True]
+# [False, True]]
+# [[False, False]
+# [False, True]]]
+# 'input' has 5 true values, so output has 5 coordinates.
+# 'input' has rank of 3, so coordinates have three indices.
+where(input) ==> [[0, 0, 0],
+ [0, 1, 0],
+ [1, 0, 1],
+ [1, 1, 1],
+ [2, 1, 1]]
+```
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor` of type `bool`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int64`.
+
+
+- - -
+
+### tf.unique(x, name=None) <div class="md-anchor" id="unique">{#unique}</div>
+
+Finds unique elements in a 1-D tensor.
+
+This operation returns a tensor `y` containing all of the unique elements of `x`
+sorted in the same order that they occur in `x`. This operation also returns a
+tensor `idx` the same size as `x` that contains the index of each value of `x`
+in the unique output `y`. In other words:
+
+`y[idx[i]] = x[i] for i in [0, 1,...,len(x) - 1]`
+
+For example:
+
+```prettyprint
+# tensor 'x' is [1, 1, 2, 4, 4, 4, 7, 8, 8]
+y, idx = unique(x)
+y ==> [1, 2, 4, 7, 8]
+idx ==> [0, 0, 1, 2, 2, 2, 3, 4, 4]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`. 1-D.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of `Tensor` objects (y, idx).
+
+* <b>y</b>: A `Tensor`. Has the same type as `x`. 1-D.
+* <b>idx</b>: A `Tensor` of type `int32`. 1-D.
+
+
+
+- - -
+
+### tf.edit_distance(hypothesis, truth, normalize=True, name='edit_distance') <div class="md-anchor" id="edit_distance">{#edit_distance}</div>
+
+Computes the Levenshtein distance between sequences.
+
+This operation takes variable-length sequences (`hypothesis` and `truth`),
+each provided as a `SparseTensor`, and computes the Levenshtein distance.
+You can normalize the edit distance by length of `truth` by setting
+`normalize` to true.
+
+For example, given the following input:
+
+```python
+# 'hypothesis' is a tensor of shape `[2, 1]` with variable-length values:
+# (0,0) = ["a"]
+# (1,0) = ["b"]
+hypothesis = tf.SparseTensor(
+    [[0, 0, 0],
+     [1, 0, 0]],
+    ["a", "b"],
+    (2, 1, 1))
+
+# 'truth' is a tensor of shape `[2, 2]` with variable-length values:
+# (0,0) = []
+# (0,1) = ["a"]
+# (1,0) = ["b", "c"]
+# (1,1) = ["a"]
+truth = tf.SparseTensor(
+    [[0, 1, 0],
+     [1, 0, 0],
+     [1, 0, 1],
+     [1, 1, 0]],
+    ["a", "b", "c", "a"],
+    (2, 2, 2))
+
+normalize = True
+```
+
+This operation would return the following:
+
+```python
+# 'output' is a tensor of shape `[2, 2]` with edit distances normalized
+# by 'truth' lengths.
+output ==> [[inf, 1.0], # (0,0): no truth, (0,1): no hypothesis
+ [0.5, 1.0]] # (1,0): addition, (1,1): no hypothesis
+```
+
+##### Args:
+
+
+* <b>hypothesis</b>: A `SparseTensor` containing hypothesis sequences.
+* <b>truth</b>: A `SparseTensor` containing truth sequences.
+* <b>normalize</b>: A `bool`. If `True`, normalizes the Levenshtein distance by
+ length of `truth.`
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A dense `Tensor` with rank `R - 1`, where R is the rank of the
+ `SparseTensor` inputs `hypothesis` and `truth`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If either `hypothesis` or `truth` are not a `SparseTensor`.
+
+
+
+- - -
+
+### tf.invert_permutation(x, name=None) <div class="md-anchor" id="invert_permutation">{#invert_permutation}</div>
+
+Computes the inverse permutation of a tensor.
+
+This operation computes the inverse of an index permutation. It takes a 1-D
+integer tensor `x`, which represents the indices of a zero-based array, and
+swaps each value with its index position. In other words, for an output tensor
+`y` and an input tensor `x`, this operation computes the following:
+
+`y[x[i]] = i for i in [0, 1, ..., len(x) - 1]`
+
+The values must include 0. There can be no duplicate values or negative values.
+
+For example:
+
+```prettyprint
+# tensor `x` is [3, 4, 0, 2, 1]
+invert_permutation(x) ==> [2, 4, 3, 0, 1]
+```
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor` of type `int32`. 1-D.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `int32`. 1-D.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/nn.md b/tensorflow/g3doc/api_docs/python/nn.md
new file mode 100644
index 0000000000..91fab34255
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/nn.md
@@ -0,0 +1,1306 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Neural Network
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Activation Functions](#AUTOGENERATED-activation-functions)
+ * [tf.nn.relu(features, name=None)](#relu)
+ * [tf.nn.relu6(features, name=None)](#relu6)
+ * [tf.nn.softplus(features, name=None)](#softplus)
+ * [tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)](#dropout)
+ * [tf.nn.bias_add(value, bias, name=None)](#bias_add)
+ * [tf.sigmoid(x, name=None)](#sigmoid)
+ * [tf.tanh(x, name=None)](#tanh)
+* [Convolution](#AUTOGENERATED-convolution)
+ * [tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)](#conv2d)
+ * [tf.nn.depthwise_conv2d(input, filter, strides, padding, name=None)](#depthwise_conv2d)
+ * [tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding, name=None)](#separable_conv2d)
+* [Pooling](#AUTOGENERATED-pooling)
+ * [tf.nn.avg_pool(value, ksize, strides, padding, name=None)](#avg_pool)
+ * [tf.nn.max_pool(value, ksize, strides, padding, name=None)](#max_pool)
+ * [tf.nn.max_pool_with_argmax(input, ksize, strides, padding, Targmax=None, name=None)](#max_pool_with_argmax)
+* [Normalization](#AUTOGENERATED-normalization)
+ * [tf.nn.l2_normalize(x, dim, epsilon=1e-12, name=None)](#l2_normalize)
+ * [tf.nn.local_response_normalization(input, depth_radius=None, bias=None, alpha=None, beta=None, name=None)](#local_response_normalization)
+ * [tf.nn.moments(x, axes, name=None)](#moments)
+* [Losses](#AUTOGENERATED-losses)
+ * [tf.nn.l2_loss(t, name=None)](#l2_loss)
+* [Classification](#AUTOGENERATED-classification)
+ * [tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)](#sigmoid_cross_entropy_with_logits)
+ * [tf.nn.softmax(logits, name=None)](#softmax)
+ * [tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)](#softmax_cross_entropy_with_logits)
+* [Embeddings](#AUTOGENERATED-embeddings)
+ * [tf.nn.embedding_lookup(params, ids, name=None)](#embedding_lookup)
+ * [tf.nn.embedding_lookup_sparse(params, sp_ids, sp_weights, name=None, combiner='mean')](#embedding_lookup_sparse)
+* [Evaluation](#AUTOGENERATED-evaluation)
+ * [tf.nn.top_k(input, k, name=None)](#top_k)
+ * [tf.nn.in_top_k(predictions, targets, k, name=None)](#in_top_k)
+* [Candidate Sampling](#AUTOGENERATED-candidate-sampling)
+ * [Sampled Loss Functions](#AUTOGENERATED-sampled-loss-functions)
+ * [tf.nn.nce_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, name='nce_loss')](#nce_loss)
+ * [tf.nn.sampled_softmax_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=True, name='sampled_softmax_loss')](#sampled_softmax_loss)
+ * [Candidate Samplers](#AUTOGENERATED-candidate-samplers)
+ * [tf.nn.uniform_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None)](#uniform_candidate_sampler)
+ * [tf.nn.log_uniform_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None)](#log_uniform_candidate_sampler)
+ * [tf.nn.learned_unigram_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None)](#learned_unigram_candidate_sampler)
+ * [tf.nn.fixed_unigram_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, vocab_file='', distortion=0.0, num_reserved_ids=0, num_shards=1, shard=0, unigrams=[], seed=None, name=None)](#fixed_unigram_candidate_sampler)
+ * [Miscellaneous candidate sampling utilities](#AUTOGENERATED-miscellaneous-candidate-sampling-utilities)
+ * [tf.nn.compute_accidental_hits(true_classes, sampled_candidates, num_true, seed=None, name=None)](#compute_accidental_hits)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Activation Functions <div class="md-anchor" id="AUTOGENERATED-activation-functions">{#AUTOGENERATED-activation-functions}</div>
+
+The activation ops provide different types of nonlinearities for use in
+neural networks. These include smooth nonlinearities (`sigmoid`,
+`tanh`, and `softplus`), continuous but not everywhere differentiable
+functions (`relu`, `relu6`, and `relu_x`), and random regularization
+(`dropout`).
+
+All activation ops apply componentwise, and produce a tensor of the same
+shape as the input tensor.
+
+- - -
+
+### tf.nn.relu(features, name=None) <div class="md-anchor" id="relu">{#relu}</div>
+
+Computes rectified linear: `max(features, 0)`.
+
+##### Args:
+
+
+* <b>features</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `features`.
+
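+For illustration only (not part of the generated reference), a minimal
+graph-mode sketch; the input values are arbitrary:
+
+```python
+import tensorflow as tf
+
+features = tf.constant([-2.0, -0.5, 0.0, 3.0])
+with tf.Session() as sess:
+    print(sess.run(tf.nn.relu(features)))  # ==> [0. 0. 0. 3.]
+```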
+
+- - -
+
+### tf.nn.relu6(features, name=None) <div class="md-anchor" id="relu6">{#relu6}</div>
+
+Computes Rectified Linear 6: `min(max(features, 0), 6)`.
+
+##### Args:
+
+
+* <b>features</b>: A `Tensor` with type `float`, `double`, `int32`, `int64`, `uint8`,
+ `int16`, or `int8`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with the same type as `features`.
+
+
+- - -
+
+### tf.nn.softplus(features, name=None) <div class="md-anchor" id="softplus">{#softplus}</div>
+
+Computes softplus: `log(exp(features) + 1)`.
+
+##### Args:
+
+
+* <b>features</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `features`.
+
+
+- - -
+
+### tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None) <div class="md-anchor" id="dropout">{#dropout}</div>
+
+Computes dropout.
+
+With probability `keep_prob`, outputs the input element scaled up by
+`1 / keep_prob`, otherwise outputs `0`. The scaling is so that the expected
+sum is unchanged.
+
+By default, each element is kept or dropped independently. If `noise_shape`
+is specified, it must be
+[broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
+to the shape of `x`, and only dimensions with `noise_shape[i] == x.shape[i]`
+will make independent decisions. For example, if `x.shape = [b, x, y, c]` and
+`noise_shape = [b, 1, 1, c]`, each batch and channel component will be
+kept independently and each row and column will be kept or not kept together.
+
+##### Args:
+
+
+* <b>x</b>: A tensor.
+* <b>keep_prob</b>: Float probability that each element is kept.
+* <b>noise_shape</b>: Shape for randomly generated keep/drop flags.
+* <b>seed</b>: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ A Tensor of the same shape of `x`.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If `keep_prob` is not in `(0, 1]`.
+
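+A hedged sketch of the `noise_shape` behavior described above; the shapes and
+`keep_prob` value are illustrative:
+
+```python
+import tensorflow as tf
+
+x = tf.ones([4, 4])
+
+# Each element is kept with probability 0.5 and scaled by 1 / 0.5 = 2.0,
+# so the surviving entries are 2.0 and the rest are 0.0.
+dropped = tf.nn.dropout(x, keep_prob=0.5)
+
+# With noise_shape=[4, 1], one keep/drop decision is made per row, so whole
+# rows survive or vanish together.
+row_dropped = tf.nn.dropout(x, keep_prob=0.5, noise_shape=[4, 1])
+
+with tf.Session() as sess:
+    print(sess.run(dropped))
+    print(sess.run(row_dropped))
+```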
+
+- - -
+
+### tf.nn.bias_add(value, bias, name=None) <div class="md-anchor" id="bias_add">{#bias_add}</div>
+
+Adds `bias` to `value`.
+
+This is (mostly) a special case of `tf.add` where `bias` is restricted to 1-D.
+Broadcasting is supported, so `value` may have any number of dimensions.
+Unlike `tf.add`, the type of `bias` is allowed to differ from `value` in the
+case where both types are quantized.
+
+##### Args:
+
+
+* <b>value</b>: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
+ `int16`, `int8`, or `complex64`.
+* <b>bias</b>: A 1-D `Tensor` with size matching the last dimension of `value`.
+ Must be the same type as `value` unless `value` is a quantized type,
+ in which case a different quantized type may be used.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` with the same type as `value`.
+
+
+- - -
+
+### tf.sigmoid(x, name=None) <div class="md-anchor" id="sigmoid">{#sigmoid}</div>
+
+Computes sigmoid of `x` element-wise.
+
+Specifically, `y = 1 / (1 + exp(-x))`.
+
+##### Args:
+
+
+* <b>x</b>: A Tensor with type `float`, `double`, `int32`, `complex64`, `int64`,
+ or `qint32`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A Tensor with the same type as `x` if `x.dtype != qint32`
+ otherwise the return type is `quint8`.
+
+
+- - -
+
+### tf.tanh(x, name=None) <div class="md-anchor" id="tanh">{#tanh}</div>
+
+Computes hyperbolic tangent of `x` element-wise.
+
+##### Args:
+
+
+* <b>x</b>: A Tensor with type `float`, `double`, `int32`, `complex64`, `int64`,
+ or `qint32`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A Tensor with the same type as `x` if `x.dtype != qint32` otherwise
+ the return type is `quint8`.
+
+
+
+## Convolution <div class="md-anchor" id="AUTOGENERATED-convolution">{#AUTOGENERATED-convolution}</div>
+
+The convolution ops sweep a 2-D filter over a batch of images, applying the
+filter to each window of each image of the appropriate size. The different
+ops trade off between generic vs. specific filters:
+
+* `conv2d`: Arbitrary filters that can mix channels together.
+* `depthwise_conv2d`: Filters that operate on each channel independently.
+* `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter.
+
+Note that although these ops are called "convolution", they are strictly
+speaking "cross-correlation" since the filter is combined with an input window
+without reversing the filter. For details, see [the properties of
+cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation#Properties).
+
+The filter is applied to image patches of the same size as the filter and
+strided according to the `strides` argument. `strides = [1, 1, 1, 1]` applies
+the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the
+filter to every other image patch in each dimension, etc.
+
+Ignoring channels for the moment, the spatial semantics of the convolution ops
+are as follows. If the 4-D `input` has shape
+`[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
+`[filter_height, filter_width, ...]`, then
+
+    output.shape = [batch,
+                    (in_height - filter_height + 1) / strides[1],
+                    (in_width - filter_width + 1) / strides[2],
+                    ...]
+
+    output[b, i, j, :] =
+        sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, ...] *
+                     filter[di, dj, ...]
+
+Since `input` is 4-D, each `input[b, i, j, :]` is a vector. For `conv2d`, these
+vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new
+vectors. For `depthwise_conv2d`, each scalar component `input[b, i, j, k]`
+is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
+concatenated.
+
+In the formula for `output.shape`, the handling of window edges depends on
+padding:
+
+* `padding = 'VALID'`: Only windows that fit entirely within the input are
+  considered, so `out_size = ceil((in_size - filter_size + 1) / stride)`.
+* `padding = 'SAME'`: The input is zero-padded so that partial windows at the
+  edges are included, giving `out_size = ceil(in_size / stride)` (a quick
+  shape check follows below).
+
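+As a quick check of the shape rules above, a small pure-Python sketch; the
+concrete sizes are made up for illustration:
+
+```python
+import math
+
+in_size, filter_size, stride = 28, 5, 2
+
+valid_out = int(math.ceil((in_size - filter_size + 1) / float(stride)))  # 12
+same_out = int(math.ceil(in_size / float(stride)))                       # 14
+```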
+- - -
+
+### tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None) <div class="md-anchor" id="conv2d">{#conv2d}</div>
+
+Computes a 2-D convolution given 4-D `input` and `filter` tensors.
+
+Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
+and a filter / kernel tensor of shape
+`[filter_height, filter_width, in_channels, out_channels]`, this op
+performs the following:
+
+1. Flattens the filter to a 2-D matrix with shape
+ `[filter_height * filter_width * in_channels, output_channels]`.
+2. Extracts image patches from the input tensor to form a *virtual*
+ tensor of shape `[batch, out_height, out_width,
+ filter_height * filter_width * in_channels]`.
+3. For each patch, right-multiplies the filter matrix and the image patch
+ vector.
+
+In detail,
+
+    output[b, i, j, k] =
+        sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
+                        filter[di, dj, q, k]
+
+Must have `strides[0] = strides[3] = 1`. For the most common case of the same
+horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+* <b>filter</b>: A `Tensor`. Must have the same type as `input`.
+* <b>strides</b>: A list of `ints`.
+ 1-D of length 4. The stride of the sliding window for each dimension
+ of `input`.
+* <b>padding</b>: A `string` from: `"SAME", "VALID"`.
+ The type of padding algorithm to use.
+* <b>use_cudnn_on_gpu</b>: An optional `bool`. Defaults to `True`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+
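+For example, a hedged usage sketch; the image and filter sizes are invented
+for illustration:
+
+```python
+import tensorflow as tf
+
+# A batch of 8 RGB images (28x28) and 16 filters of size 5x5.
+images = tf.placeholder(tf.float32, [8, 28, 28, 3])
+filters = tf.Variable(tf.truncated_normal([5, 5, 3, 16], stddev=0.1))
+
+# 'SAME' padding with stride 1 keeps the spatial size:
+# the output has shape [8, 28, 28, 16].
+conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')
+```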
+
+- - -
+
+### tf.nn.depthwise_conv2d(input, filter, strides, padding, name=None) <div class="md-anchor" id="depthwise_conv2d">{#depthwise_conv2d}</div>
+
+Depthwise 2-D convolution.
+
+Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
+and a filter tensor of shape
+`[filter_height, filter_width, in_channels, channel_multiplier]`
+containing `in_channels` convolutional filters of depth 1, `depthwise_conv2d`
+applies a different filter to each input channel (expanding from 1 channel
+to `channel_multiplier` channels for each), then concatenates the results
+together. The output has `in_channels * channel_multiplier` channels.
+
+In detail,
+
+    output[b, i, j, k * channel_multiplier + q] =
+        sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, k] *
+                     filter[di, dj, k, q]
+
+Must have `strides[0] = strides[3] = 1`. For the most common case of the
+same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+##### Args:
+
+
+* <b>input</b>: 4-D with shape `[batch, in_height, in_width, in_channels]`.
+* <b>filter</b>: 4-D with shape
+ `[filter_height, filter_width, in_channels, channel_multiplier]`.
+* <b>strides</b>: 1-D of size 4. The stride of the sliding window for each
+ dimension of `input`.
+* <b>padding</b>: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ A 4-D `Tensor` of shape
+ `[batch, out_height, out_width, in_channels * channel_multiplier].`
+
+
+- - -
+
+### tf.nn.separable_conv2d(input, depthwise_filter, pointwise_filter, strides, padding, name=None) <div class="md-anchor" id="separable_conv2d">{#separable_conv2d}</div>
+
+2-D convolution with separable filters.
+
+Performs a depthwise convolution that acts separately on channels followed by
+a pointwise convolution that mixes channels. Note that this is separability
+between dimensions `[1, 2]` and `3`, not spatial separability between
+dimensions `1` and `2`.
+
+In detail,
+
+    output[b, i, j, k] = sum_{di, dj, q, r}
+        input[b, strides[1] * i + di, strides[2] * j + dj, q] *
+        depthwise_filter[di, dj, q, r] *
+        pointwise_filter[0, 0, q * channel_multiplier + r, k]
+
+`strides` controls the strides for the depthwise convolution only, since
+the pointwise convolution has implicit strides of `[1, 1, 1, 1]`. Must have
+`strides[0] = strides[3] = 1`. For the most common case of the same
+horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+##### Args:
+
+
+* <b>input</b>: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`.
+* <b>depthwise_filter</b>: 4-D `Tensor` with shape
+ `[filter_height, filter_width, in_channels, channel_multiplier]`.
+ Contains `in_channels` convolutional filters of depth 1.
+* <b>pointwise_filter</b>: 4-D `Tensor` with shape
+ `[1, 1, channel_multiplier * in_channels, out_channels]`. Pointwise
+ filter to mix channels after `depthwise_filter` has convolved spatially.
+* <b>strides</b>: 1-D of size 4. The strides for the depthwise convolution for
+ each dimension of `input`.
+* <b>padding</b>: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`.
+
+
+
+## Pooling <div class="md-anchor" id="AUTOGENERATED-pooling">{#AUTOGENERATED-pooling}</div>
+
+The pooling ops sweep a rectangular window over the input tensor, computing a
+reduction operation for each window (average, max, or max with argmax). Each
+pooling op uses rectangular windows of size `ksize` separated by offset
+`strides`. For example, if `strides` is all ones every window is used, if
+`strides` is all twos every other window is used in each dimension, etc.
+
+In detail, the output is
+
+    output[i] = reduce(value[strides * i:strides * i + ksize])
+
+for each tuple of indices `i`. The output shape is
+
+    output.shape = (value.shape - ksize + 1) / strides
+
+where the handling of window edges depends on padding:
+
+* `padding = 'VALID'`: Only windows that fit entirely within the input are
+  considered, so `out_size = ceil((in_size - ksize + 1) / stride)`.
+* `padding = 'SAME'`: The input is zero-padded so that partial windows at the
+  edges are included, giving `out_size = ceil(in_size / stride)` (see the
+  sketch below).
+
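+For example, a minimal sketch with invented shapes:
+
+```python
+import tensorflow as tf
+
+value = tf.placeholder(tf.float32, [8, 28, 28, 16])
+
+# 2x2 windows with stride 2 and 'VALID' padding halve the spatial
+# dimensions, giving an output of shape [8, 14, 14, 16].
+pooled = tf.nn.max_pool(value, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+                        padding='VALID')
+```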
+- - -
+
+### tf.nn.avg_pool(value, ksize, strides, padding, name=None) <div class="md-anchor" id="avg_pool">{#avg_pool}</div>
+
+Performs the average pooling on the input.
+
+Each entry in `output` is the mean of the corresponding size `ksize`
+window in `value`.
+
+##### Args:
+
+
+* <b>value</b>: A 4-D `Tensor` of shape `[batch, height, width, channels]` and type
+ `float32`, `float64`, `qint8`, `quint8`, or `qint32`.
+* <b>ksize</b>: A list of ints that has length >= 4.
+ The size of the window for each dimension of the input tensor.
+* <b>strides</b>: A list of ints that has length >= 4.
+ The stride of the sliding window for each dimension of the
+ input tensor.
+* <b>padding</b>: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+* <b>name</b>: Optional name for the operation.
+
+##### Returns:
+
+ A `Tensor` with the same type as `value`. The average pooled output tensor.
+
+
+- - -
+
+### tf.nn.max_pool(value, ksize, strides, padding, name=None) <div class="md-anchor" id="max_pool">{#max_pool}</div>
+
+Performs the max pooling on the input.
+
+##### Args:
+
+
+* <b>value</b>: A 4-D `Tensor` with shape `[batch, height, width, channels]` and
+ type `float32`, `float64`, `qint8`, `quint8`, `qint32`.
+* <b>ksize</b>: A list of ints that has length >= 4. The size of the window for
+ each dimension of the input tensor.
+* <b>strides</b>: A list of ints that has length >= 4. The stride of the sliding
+ window for each dimension of the input tensor.
+* <b>padding</b>: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+* <b>name</b>: Optional name for the operation.
+
+##### Returns:
+
+ A `Tensor` with the same type as `value`. The max pooled output tensor.
+
+
+- - -
+
+### tf.nn.max_pool_with_argmax(input, ksize, strides, padding, Targmax=None, name=None) <div class="md-anchor" id="max_pool_with_argmax">{#max_pool_with_argmax}</div>
+
+Performs max pooling on the input and outputs both max values and indices.
+
+The indices in `argmax` are flattened, so that a maximum value at position
+`[b, y, x, c]` becomes flattened index
+`((b * height + y) * width + x) * channels + c`.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor` of type `float32`.
+ 4-D with shape `[batch, height, width, channels]`. Input to pool over.
+* <b>ksize</b>: A list of `ints` that has length `>= 4`.
+ The size of the window for each dimension of the input tensor.
+* <b>strides</b>: A list of `ints` that has length `>= 4`.
+ The stride of the sliding window for each dimension of the
+ input tensor.
+* <b>padding</b>: A `string` from: `"SAME", "VALID"`.
+ The type of padding algorithm to use.
+* <b>Targmax</b>: An optional `tf.DType` from: `tf.int32, tf.int64`. Defaults to `tf.int64`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of `Tensor` objects (output, argmax).
+
+* <b>output</b>: A `Tensor` of type `float32`. The max pooled output tensor.
+* <b>argmax</b>: A `Tensor` of type `Targmax`. 4-D. The flattened indices of the max values chosen for each output.
+
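+As a worked example of the flattening formula above, a small pure-Python
+sketch; the tensor sizes and the position are invented:
+
+```python
+# Recover (batch, y, x, channel) from a flattened argmax index.
+height, width, channels = 28, 28, 3
+flat = ((1 * height + 4) * width + 7) * channels + 2  # b=1, y=4, x=7, c=2
+
+c = flat % channels
+x = (flat // channels) % width
+y = (flat // (channels * width)) % height
+b = flat // (channels * width * height)
+assert (b, y, x, c) == (1, 4, 7, 2)
+```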
+
+
+## Normalization <div class="md-anchor" id="AUTOGENERATED-normalization">{#AUTOGENERATED-normalization}</div>
+
+Normalization is useful to prevent neurons from saturating when inputs may
+have varying scale, and to aid generalization.
+
+- - -
+
+### tf.nn.l2_normalize(x, dim, epsilon=1e-12, name=None) <div class="md-anchor" id="l2_normalize">{#l2_normalize}</div>
+
+Normalizes along dimension `dim` using an L2 norm.
+
+For a 1-D tensor with `dim = 0`, computes
+
+    output = x / sqrt(max(sum(x**2), epsilon))
+
+For `x` with more dimensions, independently normalizes each 1-D slice along
+dimension `dim`.
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`.
+* <b>dim</b>: Dimension along which to normalize.
+* <b>epsilon</b>: A lower bound value for the norm. Will use `sqrt(epsilon)` as the
+ divisor if `norm < sqrt(epsilon)`.
+* <b>name</b>: A name for this operation (optional).
+
+##### Returns:
+
+ A `Tensor` with the same shape as `x`.
+
+
+- - -
+
+### tf.nn.local_response_normalization(input, depth_radius=None, bias=None, alpha=None, beta=None, name=None) <div class="md-anchor" id="local_response_normalization">{#local_response_normalization}</div>
+
+Local Response Normalization.
+
+The 4-D `input` tensor is treated as a 3-D array of 1-D vectors (along the last
+dimension), and each vector is normalized independently. Within a given vector,
+each component is divided by the weighted, squared sum of inputs within
+`depth_radius`. In detail,
+
+    sqr_sum[a, b, c, d] =
+        sum(input[a, b, c, d - depth_radius : d + depth_radius + 1] ** 2)
+    output = input / (bias + alpha * sqr_sum ** beta)
+
+For details, see [Krizhevsky et al., ImageNet classification with deep
+convolutional neural networks
+(NIPS 2012)](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks).
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor` of type `float32`. 4-D.
+* <b>depth_radius</b>: An optional `int`. Defaults to `5`.
+ 0-D. Half-width of the 1-D normalization window.
+* <b>bias</b>: An optional `float`. Defaults to `1`.
+ An offset (usually positive to avoid dividing by 0).
+* <b>alpha</b>: An optional `float`. Defaults to `1`.
+ A scale factor, usually positive.
+* <b>beta</b>: An optional `float`. Defaults to `0.5`. An exponent.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `float32`.
+
+
+- - -
+
+### tf.nn.moments(x, axes, name=None) <div class="md-anchor" id="moments">{#moments}</div>
+
+Calculate the mean and variance of `x`.
+
+The mean and variance are calculated by aggregating the contents of `x`
+across `axes`. If `x` is 1-D and `axes = [0]` this is just the mean
+and variance of a vector.
+
+For so-called "global normalization" needed for convolutional filters pass
+`axes=[0, 1, 2]` (batch, height, width). For batch normalization pass
+`axes=[0]` (batch).
+
+##### Args:
+
+
+* <b>x</b>: A `Tensor`.
+* <b>axes</b>: array of ints. Axes along which to compute mean and
+ variance.
+* <b>name</b>: Name used to scope the operations that compute the moments.
+
+##### Returns:
+
+ Two `Tensors`: `mean` and `variance`.
+
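+A minimal sketch of the "global normalization" case; the activation shape is
+illustrative:
+
+```python
+import tensorflow as tf
+
+x = tf.placeholder(tf.float32, [8, 28, 28, 16])
+
+# Per-channel moments over batch, height, and width: both results have
+# shape [16].
+mean, variance = tf.nn.moments(x, axes=[0, 1, 2])
+```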
+
+
+## Losses <div class="md-anchor" id="AUTOGENERATED-losses">{#AUTOGENERATED-losses}</div>
+
+The loss ops measure error between two tensors, or between a tensor and zero.
+These can be used for measuring accuracy of a network in a regression task
+or for regularization purposes (weight decay).
+
+- - -
+
+### tf.nn.l2_loss(t, name=None) <div class="md-anchor" id="l2_loss">{#l2_loss}</div>
+
+L2 Loss.
+
+Computes half the L2 norm of a tensor without the `sqrt`:
+
+    output = sum(t ** 2) / 2
+
+##### Args:
+
+
+* <b>t</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+ Typically 2-D, but may have any dimensions.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `t`. 0-D.
+
+
+
+## Classification <div class="md-anchor" id="AUTOGENERATED-classification">{#AUTOGENERATED-classification}</div>
+
+TensorFlow provides several operations that help you perform classification.
+
+- - -
+
+### tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None) <div class="md-anchor" id="sigmoid_cross_entropy_with_logits">{#sigmoid_cross_entropy_with_logits}</div>
+
+Computes sigmoid cross entropy given `logits`.
+
+Measures the probability error in discrete classification tasks in which each
+class is independent and not mutually exclusive. For instance, one could
+perform multilabel classification where a picture can contain both an elephant
+and a dog at the same time.
+
+For brevity, let `x = logits`, `z = targets`. The logistic loss is
+
+    x - x * z + log(1 + exp(-x))
+
+To ensure stability and avoid overflow, the implementation uses
+
+    max(x, 0) - x * z + log(1 + exp(-abs(x)))
+
+`logits` and `targets` must have the same type and shape.
+
+##### Args:
+
+
+* <b>logits</b>: A `Tensor` of type `float32` or `float64`.
+* <b>targets</b>: A `Tensor` of the same type and shape as `logits`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of the same shape as `logits` with the componentwise
+ logistic losses.
+
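+A small pure-Python check that the numerically stable form above agrees with
+the naive logistic loss; the sample values are arbitrary:
+
+```python
+import math
+
+def naive(x, z):
+    return x - x * z + math.log(1 + math.exp(-x))
+
+def stable(x, z):
+    return max(x, 0) - x * z + math.log(1 + math.exp(-abs(x)))
+
+for x, z in [(3.0, 1.0), (-2.0, 0.0), (0.5, 1.0)]:
+    assert abs(naive(x, z) - stable(x, z)) < 1e-12
+```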
+
+- - -
+
+### tf.nn.softmax(logits, name=None) <div class="md-anchor" id="softmax">{#softmax}</div>
+
+Computes softmax activations.
+
+For each batch `i` and class `j` we have
+
+    softmax[i, j] = exp(logits[i, j]) / sum(exp(logits[i]))
+
+##### Args:
+
+
+* <b>logits</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`.
+ 2-D with shape `[batch_size, num_classes]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `logits`. Same shape as `logits`.
+
+
+- - -
+
+### tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None) <div class="md-anchor" id="softmax_cross_entropy_with_logits">{#softmax_cross_entropy_with_logits}</div>
+
+Computes softmax cross entropy between `logits` and `labels`.
+
+Measures the probability error in discrete classification tasks in which the
+classes are mutually exclusive (each entry is in exactly one class). For
+example, each CIFAR-10 image is labeled with one and only one label: an image
+can be a dog or a truck, but not both.
+
+**WARNING:** This op expects unscaled logits, since it performs a `softmax`
+on `logits` internally for efficiency. Do not call this op with the
+output of `softmax`, as it will produce incorrect results.
+
+`logits` and `labels` must have the same shape `[batch_size, num_classes]`
+and the same dtype (either `float32` or `float64`).
+
+##### Args:
+
+
+* <b>logits</b>: Unscaled log probabilities.
+* <b>labels</b>: Each row `labels[i]` must be a valid probability distribution.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A 1-D `Tensor` of length `batch_size` of the same type as `logits` with the
+ softmax cross entropy loss.
+
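+For illustration, a minimal sketch with one-hot labels; the logits are made
+up:
+
+```python
+import tensorflow as tf
+
+logits = tf.constant([[2.0, 0.5, -1.0],
+                      [0.0, 0.0, 3.0]])
+# Each row of `labels` is a valid probability distribution (here one-hot).
+labels = tf.constant([[1.0, 0.0, 0.0],
+                      [0.0, 0.0, 1.0]])
+
+loss = tf.nn.softmax_cross_entropy_with_logits(logits, labels)  # shape [2]
+```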
+
+
+## Embeddings <div class="md-anchor" id="AUTOGENERATED-embeddings">{#AUTOGENERATED-embeddings}</div>
+
+TensorFlow provides several operations that help you compute embeddings.
+
+- - -
+
+### tf.nn.embedding_lookup(params, ids, name=None) <div class="md-anchor" id="embedding_lookup">{#embedding_lookup}</div>
+
+Return a tensor of embedding values by looking up "ids" in "params".
+
+##### Args:
+
+
+* <b>params</b>: List of tensors of the same shape. A single tensor is
+ treated as a singleton list.
+* <b>ids</b>: Tensor of integers containing the ids to be looked up in
+ 'params'. Let P be len(params). If P > 1, then the ids are
+ partitioned by id % P, and we do separate lookups in params[p]
+ for 0 <= p < P, and then stitch the results back together into
+ a single result tensor.
+* <b>name</b>: Optional name for the op.
+
+##### Returns:
+
+ A tensor of shape ids.shape + params[0].shape[1:] containing the
+ values params[i % P][i // P] for each i in ids.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if some parameters are invalid.
+
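+In the single-tensor case the lookup is a row gather; a minimal sketch with
+an invented 4 x 2 table:
+
+```python
+import tensorflow as tf
+
+params = tf.constant([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
+ids = tf.constant([3, 1, 1])
+
+# Selects rows 3, 1, 1 of `params`; the result has shape [3, 2].
+emb = tf.nn.embedding_lookup(params, ids)
+```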
+
+- - -
+
+### tf.nn.embedding_lookup_sparse(params, sp_ids, sp_weights, name=None, combiner='mean') <div class="md-anchor" id="embedding_lookup_sparse">{#embedding_lookup_sparse}</div>
+
+Computes embeddings for the given ids and weights.
+
+This op assumes that there is at least one id for each row in the dense tensor
+represented by sp_ids (i.e. there are no rows with empty features), and that
+all the indices of sp_ids are in canonical row-major order.
+
+It also assumes that all id values lie in the range [0, p0), where p0
+is the sum of the size of params along dimension 0.
+
+##### Args:
+
+
+* <b>params</b>: A single tensor representing the complete embedding tensor,
+ or a list of P tensors all of same shape except for the first dimension,
+ representing sharded embedding tensors. In the latter case, the ids are
+ partitioned by id % P, and we do separate lookups in params[p] for
+ 0 <= p < P, and then stitch the results back together into a single
+ result tensor. The first dimension is allowed to vary as the vocab
+ size is not necessarily a multiple of P.
+* <b>sp_ids</b>: N x M SparseTensor of int64 ids (typically from FeatureValueToId),
+ where N is typically batch size and M is arbitrary.
+* <b>sp_weights</b>: either a SparseTensor of float / double weights, or None to
+ indicate all weights should be taken to be 1. If specified, sp_weights
+ must have exactly the same shape and indices as sp_ids.
+* <b>name</b>: Optional name for the op.
+* <b>combiner</b>: A string specifying the reduction op. Currently "mean" and "sum"
+ are supported.
+ "sum" computes the weighted sum of the embedding results for each row.
+ "mean" is the weighted sum divided by the total weight.
+
+##### Returns:
+
+ A dense tensor representing the combined embeddings for the
+ sparse ids. For each row in the dense tensor represented by sp_ids, the op
+ looks up the embeddings for all ids in that row, multiplies them by the
+ corresponding weight, and combines these embeddings as specified.
+
+  In other words, if
+    shape(combined params) = [p0, p1, ..., pm]
+  and
+    shape(sp_ids) = shape(sp_weights) = [d0, d1, ..., dn]
+  then
+    shape(output) = [d0, d1, ..., dn-1, p1, ..., pm].
+
+  For instance, if params is a 10x20 matrix, and sp_ids / sp_weights are
+
+    [0, 0]: id 1, weight 2.0
+    [0, 1]: id 3, weight 0.5
+    [1, 0]: id 0, weight 1.0
+    [2, 3]: id 1, weight 3.0
+
+  with combiner="mean", then the output will be a 3x20 matrix where
+    output[0, :] = (params[1, :] * 2.0 + params[3, :] * 0.5) / (2.0 + 0.5)
+    output[1, :] = params[0, :] * 1.0
+    output[2, :] = params[1, :] * 3.0
+
+##### Raises:
+
+
+* <b>TypeError</b>: If sp_ids is not a SparseTensor, or if sp_weights is neither
+ None nor SparseTensor.
+* <b>ValueError</b>: If combiner is not one of {"mean", "sum"}.
+
+
+
+## Evaluation <div class="md-anchor" id="AUTOGENERATED-evaluation">{#AUTOGENERATED-evaluation}</div>
+
+The evaluation ops are useful for measuring the performance of a network.
+Since they are nondifferentiable, they are typically used at evaluation time.
+
+- - -
+
+### tf.nn.top_k(input, k, name=None) <div class="md-anchor" id="top_k">{#top_k}</div>
+
+Returns the values and indices of the k largest elements for each row.
+
+\\(values_{i, j}\\) represents the j-th largest element in \\(input_i\\).
+
+\\(indices_{i, j}\\) gives the column index of the corresponding element,
+such that \\(input_{i, indices_{i, j}} = values_{i, j}\\). If two
+elements are equal, the lower-index element appears first.
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`. Must be one of the following types: `float32`, `float64`, `int32`, `int64`, `uint8`, `int16`, `int8`.
+ A batch_size x classes tensor
+* <b>k</b>: An `int` that is `>= 1`.
+ Number of top elements to look for within each row
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A tuple of `Tensor` objects (values, indices).
+
+* <b>values</b>: A `Tensor`. Has the same type as `input`. A batch_size x k tensor with the k largest elements for each row,
+ sorted in descending order
+* <b>indices</b>: A `Tensor` of type `int32`. A batch_size x k tensor with the index of each value within each row
+
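+A minimal sketch with invented scores, showing the tie-breaking toward lower
+indices:
+
+```python
+import tensorflow as tf
+
+scores = tf.constant([[0.1, 0.8, 0.1],
+                      [0.6, 0.3, 0.1]])
+values, indices = tf.nn.top_k(scores, 2)
+# values  ==> [[0.8, 0.1], [0.6, 0.3]]
+# indices ==> [[1, 0], [0, 2]]   (row 0: the tied 0.1s resolve to index 0)
+```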
+
+- - -
+
+### tf.nn.in_top_k(predictions, targets, k, name=None) <div class="md-anchor" id="in_top_k">{#in_top_k}</div>
+
+Says whether the targets are in the top K predictions.
+
+This outputs a batch_size bool array; an entry out[i] is true if the
+prediction for the target class is among the top k predictions among
+all predictions for example i. Note that the behavior of InTopK differs
+from the TopK op in its handling of ties; if multiple classes have the
+same prediction value and straddle the top-k boundary, all of those
+classes are considered to be in the top k.
+
+More formally, let
+
+ \\(predictions_i\\) be the predictions for all classes for example i,
+ \\(targets_i\\) be the target class for example i,
+ \\(out_i\\) be the output for example i,
+
+$$out_i = predictions_{i, targets_i} \in TopKIncludingTies(predictions_i)$$
+
+##### Args:
+
+
+* <b>predictions</b>: A `Tensor` of type `float32`. A batch_size x classes tensor
+* <b>targets</b>: A `Tensor` of type `int32`. A batch_size vector of class ids
+* <b>k</b>: An `int`. Number of top elements to look at for computing precision
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor` of type `bool`. Computed Precision at k as a bool Tensor
+
+
+
+## Candidate Sampling <div class="md-anchor" id="AUTOGENERATED-candidate-sampling">{#AUTOGENERATED-candidate-sampling}</div>
+
+Do you want to train a multiclass or multilabel model with thousands
+or millions of output classes (for example, a language model with a
+large vocabulary)? Training with a full Softmax is slow in this case,
+since all of the classes are evaluated for every training example.
+Candidate Sampling training algorithms can speed up your step times by
+only considering a small randomly-chosen subset of contrastive classes
+(called candidates) for each batch of training examples.
+
+See our [Candidate Sampling Algorithms
+Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+
+### Sampled Loss Functions <div class="md-anchor" id="AUTOGENERATED-sampled-loss-functions">{#AUTOGENERATED-sampled-loss-functions}</div>
+
+TensorFlow provides the following sampled loss functions for faster training.
+
+- - -
+
+### tf.nn.nce_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, name='nce_loss') <div class="md-anchor" id="nce_loss">{#nce_loss}</div>
+
+Computes and returns the noise-contrastive estimation training loss.
+
+See [Noise-contrastive estimation: A new estimation principle for
+unnormalized statistical
+models](http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf).
+Also see our [Candidate Sampling Algorithms
+Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+
+Note: In the case where num_true > 1, we assign to each target class
+the target probability 1 / num_true so that the target probabilities
+sum to 1 per-example.
+
+Note: It would be useful to allow a variable number of target classes per
+example. We hope to provide this functionality in a future release.
+For now, if you have a variable number of target classes, you can pad them
+out to a constant number by either repeating them or by padding
+with an otherwise unused class.
+
+##### Args:
+
+
+* <b>weights</b>: A `Tensor` of shape [num_classes, dim]. The class embeddings.
+* <b>biases</b>: A `Tensor` of shape [num_classes]. The class biases.
+* <b>inputs</b>: A `Tensor` of shape [batch_size, dim]. The forward
+ activations of the input network.
+* <b>labels</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>num_classes</b>: An `int`. The number of possible classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>sampled_values</b>: a tuple of `(sampled_candidates, true_expected_count,
+ sampled_expected_count)` returned by a *_candidate_sampler function.
+ (if None, we default to LogUniformCandidateSampler)
+* <b>remove_accidental_hits</b>: A `bool`. Whether to remove "accidental hits"
+ where a sampled class equals one of the target classes. If set to
+ `True`, this is a "Sampled Logistic" loss instead of NCE, and we are
+ learning to generate log-odds instead of log probabilities. See
+ our [Candidate Sampling Algorithms
+ Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ Default is False.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A batch_size 1-D tensor of per-example NCE losses.
+
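+A hedged setup sketch; all sizes are invented and the variables stand in for
+a real model's parameters:
+
+```python
+import tensorflow as tf
+
+num_classes, dim, batch_size = 10000, 128, 32
+
+weights = tf.Variable(tf.truncated_normal([num_classes, dim], stddev=0.05))
+biases = tf.Variable(tf.zeros([num_classes]))
+inputs = tf.placeholder(tf.float32, [batch_size, dim])
+labels = tf.placeholder(tf.int64, [batch_size, 1])
+
+# Per-example NCE losses, averaged into a scalar training loss.
+loss = tf.reduce_mean(
+    tf.nn.nce_loss(weights, biases, inputs, labels,
+                   num_sampled=64, num_classes=num_classes))
+```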
+
+- - -
+
+### tf.nn.sampled_softmax_loss(weights, biases, inputs, labels, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=True, name='sampled_softmax_loss') <div class="md-anchor" id="sampled_softmax_loss">{#sampled_softmax_loss}</div>
+
+Computes and returns the sampled softmax training loss.
+
+This is a faster way to train a softmax classifier over a huge number of
+classes.
+
+This operation is for training only. It is generally an underestimate of
+the full softmax loss.
+
+At inference time, you can compute full softmax probabilities with the
+expression `tf.nn.softmax(tf.matmul(inputs, weights, transpose_b=True) + biases)`.
+
+See our [Candidate Sampling Algorithms
+Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+
+Also see Section 3 of http://arxiv.org/abs/1412.2007 for the math.
+
+##### Args:
+
+
+* <b>weights</b>: A `Tensor` of shape [num_classes, dim]. The class embeddings.
+* <b>biases</b>: A `Tensor` of shape [num_classes]. The class biases.
+* <b>inputs</b>: A `Tensor` of shape [batch_size, dim]. The forward
+ activations of the input network.
+* <b>labels</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes. Note that this format differs from
+ the `labels` argument of `nn.softmax_cross_entropy_with_logits`.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>num_classes</b>: An `int`. The number of possible classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>sampled_values</b>: a tuple of `(sampled_candidates, true_expected_count,
+ sampled_expected_count)` returned by a *_candidate_sampler function.
+ (if None, we default to LogUniformCandidateSampler)
+* <b>remove_accidental_hits</b>: A `bool`. whether to remove "accidental hits"
+ where a sampled class equals one of the target classes. Default is
+ True.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A batch_size 1-D tensor of per-example sampled softmax losses.
+
+
+
+### Candidate Samplers <div class="md-anchor" id="AUTOGENERATED-candidate-samplers">{#AUTOGENERATED-candidate-samplers}</div>
+
+TensorFlow provides the following samplers for randomly sampling candidate
+classes when using one of the sampled loss functions above.
+
+- - -
+
+### tf.nn.uniform_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None) <div class="md-anchor" id="uniform_candidate_sampler">{#uniform_candidate_sampler}</div>
+
+Samples a set of classes using a uniform base distribution.
+
+This operation randomly samples a tensor of sampled classes
+(`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+The elements of `sampled_candidates` are drawn without replacement
+(if `unique=True`) or with replacement (if `unique=False`) from
+the base distribution.
+
+The base distribution for this operation is the uniform distribution
+over the range of integers `[0, range_max]`.
+
+In addition, this operation returns tensors `true_expected_count`
+and `sampled_expected_count` representing the number of times each
+of the target classes (`true_classes`) and the sampled
+classes (`sampled_candidates`) is expected to occur in an average
+tensor of sampled classes. These values correspond to `Q(y|x)`
+defined in [this
+document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+If `unique=True`, then these are post-rejection probabilities and we
+compute them approximately.
+
+##### Args:
+
+
+* <b>true_classes</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>unique</b>: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+* <b>range_max</b>: An `int`. The number of possible classes.
+* <b>seed</b>: An `int`. An operation-specific seed. Default is 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>sampled_candidates</b>: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+* <b>true_expected_count</b>: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+* <b>sampled_expected_count</b>: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+
+- - -
+
+### tf.nn.log_uniform_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None) <div class="md-anchor" id="log_uniform_candidate_sampler">{#log_uniform_candidate_sampler}</div>
+
+Samples a set of classes using a log-uniform (Zipfian) base distribution.
+
+This operation randomly samples a tensor of sampled classes
+(`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+The elements of `sampled_candidates` are drawn without replacement
+(if `unique=True`) or with replacement (if `unique=False`) from
+the base distribution.
+
+The base distribution for this operation is an approximately log-uniform
+or Zipfian distribution:
+
+`P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1)`
+
+This sampler is useful when the target classes approximately follow such
+a distribution - for example, if the classes represent words in a lexicon
+sorted in decreasing order of frequency. If your classes are not ordered by
+decreasing frequency, do not use this op.
+
+In addition, this operation returns tensors `true_expected_count`
+and `sampled_expected_count` representing the number of times each
+of the target classes (`true_classes`) and the sampled
+classes (`sampled_candidates`) is expected to occur in an average
+tensor of sampled classes. These values correspond to `Q(y|x)`
+defined in [this
+document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+If `unique=True`, then these are post-rejection probabilities and we
+compute them approximately.
+
+##### Args:
+
+
+* <b>true_classes</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>unique</b>: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+* <b>range_max</b>: An `int`. The number of possible classes.
+* <b>seed</b>: An `int`. An operation-specific seed. Default is 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>sampled_candidates</b>: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+* <b>true_expected_count</b>: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+* <b>sampled_expected_count</b>: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+
+- - -
+
+### tf.nn.learned_unigram_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, seed=None, name=None) <div class="md-anchor" id="learned_unigram_candidate_sampler">{#learned_unigram_candidate_sampler}</div>
+
+Samples a set of classes from a distribution learned during training.
+
+This operation randomly samples a tensor of sampled classes
+(`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+The elements of `sampled_candidates` are drawn without replacement
+(if `unique=True`) or with replacement (if `unique=False`) from
+the base distribution.
+
+The base distribution for this operation is constructed on the fly
+during training. It is a unigram distribution over the target
+classes seen so far during training. Every integer in `[0, range_max]`
+begins with a weight of 1, and is incremented by 1 each time it is
+seen as a target class. The base distribution is not saved to checkpoints,
+so it is reset when the model is reloaded.
+
+In addition, this operation returns tensors `true_expected_count`
+and `sampled_expected_count` representing the number of times each
+of the target classes (`true_classes`) and the sampled
+classes (`sampled_candidates`) is expected to occur in an average
+tensor of sampled classes. These values correspond to `Q(y|x)`
+defined in [this
+document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+If `unique=True`, then these are post-rejection probabilities and we
+compute them approximately.
+
+##### Args:
+
+
+* <b>true_classes</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>unique</b>: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+* <b>range_max</b>: An `int`. The number of possible classes.
+* <b>seed</b>: An `int`. An operation-specific seed. Default is 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>sampled_candidates</b>: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+* <b>true_expected_count</b>: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+* <b>sampled_expected_count</b>: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+
+- - -
+
+### tf.nn.fixed_unigram_candidate_sampler(true_classes, num_true, num_sampled, unique, range_max, vocab_file='', distortion=0.0, num_reserved_ids=0, num_shards=1, shard=0, unigrams=[], seed=None, name=None) <div class="md-anchor" id="fixed_unigram_candidate_sampler">{#fixed_unigram_candidate_sampler}</div>
+
+Samples a set of classes using the provided (fixed) base distribution.
+
+This operation randomly samples a tensor of sampled classes
+(`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+The elements of `sampled_candidates` are drawn without replacement
+(if `unique=True`) or with replacement (if `unique=False`) from
+the base distribution.
+
+The base distribution is read from a file or passed in as an
+in-memory array. There is also an option to skew the distribution by
+applying a distortion power to the weights.
+
+In addition, this operation returns tensors `true_expected_count`
+and `sampled_expected_count` representing the number of times each
+of the target classes (`true_classes`) and the sampled
+classes (`sampled_candidates`) is expected to occur in an average
+tensor of sampled classes. These values correspond to `Q(y|x)`
+defined in [this
+document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+If `unique=True`, then these are post-rejection probabilities and we
+compute them approximately.
+
+##### Args:
+
+
+* <b>true_classes</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>num_sampled</b>: An `int`. The number of classes to randomly sample per batch.
+* <b>unique</b>: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+* <b>range_max</b>: An `int`. The number of possible classes.
+* <b>vocab_file</b>: Each valid line in this file (which should have a CSV-like
+ format) corresponds to a valid word ID. IDs are in sequential order,
+ starting from num_reserved_ids. The last entry in each line is expected
+ to be a value corresponding to the count or relative probability. Exactly
+ one of `vocab_file` and `unigrams` needs to be passed to this operation.
+* <b>distortion</b>: The distortion is used to skew the unigram probability
+ distribution. Each weight is first raised to the distortion's power
+ before adding to the internal unigram distribution. As a result,
+ `distortion = 1.0` gives regular unigram sampling (as defined by the vocab
+ file), and `distortion = 0.0` gives a uniform distribution.
+* <b>num_reserved_ids</b>: Optionally some reserved IDs can be added in the range
+ `[0, num_reserved_ids]` by the users. One use case is that a special
+ unknown word token is used as ID 0. These IDs will have a sampling
+ probability of 0.
+* <b>num_shards</b>: A sampler can be used to sample from a subset of the original
+ range in order to speed up the whole computation through parallelism. This
+ parameter (together with `shard`) indicates the number of partitions that
+ are being used in the overall computation.
+* <b>shard</b>: A sampler can be used to sample from a subset of the original range
+ in order to speed up the whole computation through parallelism. This
+ parameter (together with `num_shards`) indicates the particular partition
+ number of the operation, when partitioning is being used.
+* <b>unigrams</b>: A list of unigram counts or probabilities, one per ID in
+ sequential order. Exactly one of `vocab_file` and `unigrams` should be
+ passed to this operation.
+* <b>seed</b>: An `int`. An operation-specific seed. Default is 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>sampled_candidates</b>: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+* <b>true_expected_count</b>: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+* <b>sampled_expected_count</b>: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+
+
+### Miscellaneous candidate sampling utilities <div class="md-anchor" id="AUTOGENERATED-miscellaneous-candidate-sampling-utilities">{#AUTOGENERATED-miscellaneous-candidate-sampling-utilities}</div>
+
+- - -
+
+### tf.nn.compute_accidental_hits(true_classes, sampled_candidates, num_true, seed=None, name=None) <div class="md-anchor" id="compute_accidental_hits">{#compute_accidental_hits}</div>
+
+Compute the ids of positions in sampled_candidates matching true_classes.
+
+In Candidate Sampling, this operation facilitates virtually removing
+sampled classes which happen to match target classes. This is done
+in Sampled Softmax and Sampled Logistic.
+
+See our [Candidate Sampling Algorithms
+Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+
+We presuppose that the `sampled_candidates` are unique.
+
+We call it an 'accidental hit' when one of the target classes
+matches one of the sampled classes. This operation reports
+accidental hits as triples `(index, id, weight)`, where `index`
+represents the row number in `true_classes`, `id` represents the
+position in `sampled_candidates`, and weight is `-FLOAT_MAX`.
+
+The result of this op should be passed through a `sparse_to_dense`
+operation, then added to the logits of the sampled classes. This
+removes the contradictory effect of accidentally sampling the true
+target classes as noise classes for the same example.
+
+##### Args:
+
+
+* <b>true_classes</b>: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+* <b>sampled_candidates</b>: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled_candidates output of CandidateSampler.
+* <b>num_true</b>: An `int`. The number of target classes per training example.
+* <b>seed</b>: An `int`. An operation-specific seed. Default is 0.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>indices</b>: A `Tensor` of type `int32` and shape `[num_accidental_hits]`.
+ Values indicate rows in `true_classes`.
+* <b>ids</b>: A `Tensor` of type `int64` and shape `[num_accidental_hits]`.
+ Values indicate positions in `sampled_candidates`.
+* <b>weights</b>: A `Tensor` of type `float` and shape `[num_accidental_hits]`.
+ Each value is `-FLOAT_MAX`.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/ops.md b/tensorflow/g3doc/api_docs/python/ops.md
new file mode 100644
index 0000000000..bb7d6e70e2
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/ops.md
@@ -0,0 +1,10 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Leftovers, should be empty and removed
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+
diff --git a/tensorflow/g3doc/api_docs/python/python_io.md b/tensorflow/g3doc/api_docs/python/python_io.md
new file mode 100644
index 0000000000..7ad4b65bd0
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/python_io.md
@@ -0,0 +1,104 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Data IO (Python functions)
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Data IO (Python Functions)](#AUTOGENERATED-data-io--python-functions-)
+ * [class tf.python_io.TFRecordWriter](#TFRecordWriter)
+ * [tf.python_io.tf_record_iterator(path)](#tf_record_iterator)
+ * [TFRecords Format Details](#AUTOGENERATED-tfrecords-format-details)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Data IO (Python Functions) <div class="md-anchor" id="AUTOGENERATED-data-io--python-functions-">{#AUTOGENERATED-data-io--python-functions-}</div>
+
+A TFRecords file represents a sequence of (binary) strings. The format is not
+random access, so it is suitable for streaming large amounts of data but not
+suitable if fast sharding or other non-sequential access is desired.
+
+- - -
+
+### class tf.python_io.TFRecordWriter <div class="md-anchor" id="TFRecordWriter">{#TFRecordWriter}</div>
+
+A class to write records to a TFRecords file.
+
+This class implements `__enter__` and `__exit__`, and can be used
+in `with` blocks like a normal file.
+
+- - -
+
+#### tf.python_io.TFRecordWriter.__init__(path) {#TFRecordWriter.__init__}
+
+Opens file `path` and creates a `TFRecordWriter` writing to it.
+
+##### Args:
+
+
+* <b>path</b>: The path to the TFRecords file.
+
+##### Raises:
+
+
+* <b>IOError</b>: If `path` cannot be opened for writing.
+
+
+- - -
+
+#### tf.python_io.TFRecordWriter.write(record) {#TFRecordWriter.write}
+
+Write a string record to the file.
+
+##### Args:
+
+
+* <b>record</b>: str
+
+
+- - -
+
+#### tf.python_io.TFRecordWriter.close() {#TFRecordWriter.close}
+
+Close the file.
+
+
+
+- - -
+
+### tf.python_io.tf_record_iterator(path) <div class="md-anchor" id="tf_record_iterator">{#tf_record_iterator}</div>
+
+An iterator that reads the records from a TFRecords file.
+
+##### Args:
+
+
+* <b>path</b>: The path to the TFRecords file.
+
+##### Yields:
+
+ Strings.
+
+##### Raises:
+
+
+* <b>IOError</b>: If `path` cannot be opened for reading.
+
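+A minimal round-trip sketch; the file name is illustrative:
+
+```python
+import tensorflow as tf
+
+# The writer can be used as a context manager, like a normal file.
+with tf.python_io.TFRecordWriter("example.tfrecords") as writer:
+    writer.write(b"first record")
+    writer.write(b"second record")
+
+for record in tf.python_io.tf_record_iterator("example.tfrecords"):
+    print(record)
+```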
+
+
+- - -
+
+### TFRecords Format Details <div class="md-anchor" id="AUTOGENERATED-tfrecords-format-details">{#AUTOGENERATED-tfrecords-format-details}</div>
+
+A TFRecords file contains a sequence of strings with CRC hashes. Each record
+has the format
+
+    uint64 length
+    uint32 masked_crc32_of_length
+    byte   data[length]
+    uint32 masked_crc32_of_data
+
+and the records are concatenated together to produce the file. The CRC32s
+are [described here](https://en.wikipedia.org/wiki/Cyclic_redundancy_check),
+and the mask of a CRC is
+
+    masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
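+
+A pure-Python sketch of the masking step only; computing the underlying
+checksum requires a CRC32C (Castagnoli) implementation, which is assumed and
+not shown here:
+
+```python
+def mask_crc32(crc):
+    # Rotate the 32-bit CRC right by 15 bits, then add the masking constant,
+    # keeping everything modulo 2**32.
+    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
+    return (rotated + 0xA282EAD8) & 0xFFFFFFFF
+```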
diff --git a/tensorflow/g3doc/api_docs/python/sparse_ops.md b/tensorflow/g3doc/api_docs/python/sparse_ops.md
new file mode 100644
index 0000000000..7e9ab0775f
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/sparse_ops.md
@@ -0,0 +1,502 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Sparse Tensors
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Sparse Tensor Representation.](#AUTOGENERATED-sparse-tensor-representation.)
+ * [class tf.SparseTensor](#SparseTensor)
+ * [class tf.SparseTensorValue](#SparseTensorValue)
+* [Sparse to Dense Conversion.](#AUTOGENERATED-sparse-to-dense-conversion.)
+ * [tf.sparse_to_dense(sparse_indices, output_shape, sparse_values, default_value, name=None)](#sparse_to_dense)
+ * [tf.sparse_tensor_to_dense(sp_input, default_value, name=None)](#sparse_tensor_to_dense)
+ * [tf.sparse_to_indicator(sp_input, vocab_size, name=None)](#sparse_to_indicator)
+* [Manipulation.](#AUTOGENERATED-manipulation.)
+ * [tf.sparse_concat(concat_dim, sp_inputs, name=None)](#sparse_concat)
+ * [tf.sparse_reorder(sp_input, name=None)](#sparse_reorder)
+ * [tf.sparse_retain(sp_input, to_retain)](#sparse_retain)
+ * [tf.sparse_fill_empty_rows(sp_input, default_value, name=None)](#sparse_fill_empty_rows)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Sparse Tensor Representation. <div class="md-anchor" id="AUTOGENERATED-sparse-tensor-representation.">{#AUTOGENERATED-sparse-tensor-representation.}</div>
+
+TensorFlow supports a `SparseTensor` representation for data that is sparse
+in multiple dimensions. Contrast this representation with `IndexedSlices`,
+which is efficient for representing tensors that are sparse in their first
+dimension, and dense along all other dimensions.
+
+- - -
+
+### class tf.SparseTensor <div class="md-anchor" id="SparseTensor">{#SparseTensor}</div>
+
+Represents a sparse tensor.
+
+TensorFlow represents a sparse tensor as three separate dense tensors:
+`indices`, `values`, and `dense_shape`. In Python, the three tensors are
+collected into a `SparseTensor` class for ease of use. If you have separate
+`indices`, `values`, and `dense_shape` tensors, wrap them in a `SparseTensor`
+object before passing to the Ops below.
+
+Concretely, the sparse tensor `SparseTensor(indices, values, dense_shape)` is
+
+* `indices`: A 2-D int64 tensor of shape `[N, ndims]`.
+* `values`: A 1-D tensor of any type and shape `[N]`.
+* `dense_shape`: A 1-D int64 tensor of shape `[ndims]`.
+
+where `N` and `ndims` are the number of values, and number of dimensions in
+the `SparseTensor` respectively.
+
+The corresponding dense tensor satisfies
+
+```python
+dense.shape = dense_shape
+dense[tuple(indices[i])] = values[i]
+```
+
+By convention, `indices` should be sorted in row-major order (or equivalently
+lexicographic order on the tuples `indices[i]`). This is not enforced when
+`SparseTensor` objects are constructed, but most Ops assume correct ordering.
+If the ordering is wrong, it can be fixed by calling `sparse_reorder` on the
+misordered `SparseTensor`.
+
+Example: The sparse tensor
+
+```python
+ SparseTensor(values=[1, 2], indices=[[0, 0], [1, 2]], shape=[3, 4])
+```
+
+represents the dense tensor
+
+```python
+ [[1, 0, 0, 0]
+ [0, 0, 2, 0]
+ [0, 0, 0, 0]]
+```
+
+- - -
+
+#### tf.SparseTensor.__init__(indices, values, shape) {#SparseTensor.__init__}
+
+Creates a `SparseTensor`.
+
+##### Args:
+
+
+* <b>indices</b>: A 2-D int64 tensor of shape `[N, ndims]`.
+* <b>values</b>: A 1-D tensor of any type and shape `[N]`.
+* <b>shape</b>: A 1-D int64 tensor of shape `[ndims]`.
+
+##### Returns:
+
+ A `SparseTensor`
+
+
+- - -
+
+#### tf.SparseTensor.indices {#SparseTensor.indices}
+
+The indices of non-zero values in the represented dense tensor.
+
+##### Returns:
+
+ A 2-D Tensor of int64 with shape `[N, ndims]`, where `N` is the
+ number of non-zero values in the tensor, and `ndims` is the rank.
+
+- - -
+
+#### tf.SparseTensor.values {#SparseTensor.values}
+
+The non-zero values in the represented dense tensor.
+
+##### Returns:
+
+ A 1-D Tensor of any data type.
+
+- - -
+
+#### tf.SparseTensor.dtype {#SparseTensor.dtype}
+
+The `DType` of elements in this tensor.
+
+- - -
+
+#### tf.SparseTensor.shape {#SparseTensor.shape}
+
+A 1-D Tensor of int64 representing the shape of the dense tensor.
+
+- - -
+
+#### tf.SparseTensor.graph {#SparseTensor.graph}
+
+The `Graph` that contains the index, value, and shape tensors.
+
+
+- - -
+
+### class tf.SparseTensorValue <div class="md-anchor" id="SparseTensorValue">{#SparseTensorValue}</div>
+
+SparseTensorValue(indices, values, shape)
+- - -
+
+#### tf.SparseTensorValue.indices {#SparseTensorValue.indices}
+
+Alias for field number 0
+
+- - -
+
+#### tf.SparseTensorValue.shape {#SparseTensorValue.shape}
+
+Alias for field number 2
+
+- - -
+
+#### tf.SparseTensorValue.values {#SparseTensorValue.values}
+
+Alias for field number 1
+
+
+
+## Sparse to Dense Conversion. <div class="md-anchor" id="AUTOGENERATED-sparse-to-dense-conversion.">{#AUTOGENERATED-sparse-to-dense-conversion.}</div>
+
+- - -
+
+### tf.sparse_to_dense(sparse_indices, output_shape, sparse_values, default_value, name=None) <div class="md-anchor" id="sparse_to_dense">{#sparse_to_dense}</div>
+
+Converts a sparse representation into a dense tensor.
+
+Builds an array `dense` with shape `output_shape` such that
+
+```prettyprint
+# If sparse_indices is scalar
+dense[i] = (i == sparse_indices ? sparse_values : default_value)
+
+# If sparse_indices is a vector, then for each i
+dense[sparse_indices[i]] = sparse_values[i]
+
+# If sparse_indices is an n by d matrix, then for each i in [0, n)
+dense[sparse_indices[i][0], ..., sparse_indices[i][d-1]] = sparse_values[i]
+```
+
+All other values in `dense` are set to `default_value`. If `sparse_values` is a
+scalar, all sparse indices are set to this single value.
+
+##### Args:
+
+
+* <b>sparse_indices</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ 0-D, 1-D, or 2-D. `sparse_indices[i]` contains the complete
+ index where `sparse_values[i]` will be placed.
+* <b>output_shape</b>: A `Tensor`. Must have the same type as `sparse_indices`.
+ 1-D. Shape of the dense output tensor.
+* <b>sparse_values</b>: A `Tensor`.
+ 1-D. Values corresponding to each row of `sparse_indices`,
+ or a scalar value to be used for all sparse indices.
+* <b>default_value</b>: A `Tensor`. Must have the same type as `sparse_values`.
+ Scalar value to set for indices not specified in
+ `sparse_indices`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `sparse_values`.
+ Dense output tensor of shape `output_shape`.
+
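+A minimal sketch of the matrix-indices case above:
+
+```python
+import tensorflow as tf
+
+dense = tf.sparse_to_dense(sparse_indices=[[0, 0], [1, 2]],
+                           output_shape=[3, 4],
+                           sparse_values=[1, 2],
+                           default_value=0)
+
+with tf.Session() as sess:
+  result = sess.run(dense)  # [[1 0 0 0], [0 0 2 0], [0 0 0 0]]
+```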
+
+- - -
+
+### tf.sparse_tensor_to_dense(sp_input, default_value, name=None) <div class="md-anchor" id="sparse_tensor_to_dense">{#sparse_tensor_to_dense}</div>
+
+Converts a `SparseTensor` into a dense tensor.
+
+This op is a convenience wrapper around `sparse_to_dense` for `SparseTensor`s.
+
+For example, if `sp_input` has shape `[3, 5]` and non-empty string values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+
+and `default_value` is `x`, then the output will be a dense `[3, 5]`
+string tensor with values:
+
+ [[x a x b x]
+ [x x x x x]
+ [c x x x x]]
+
+##### Args:
+
+
+* <b>sp_input</b>: The input `SparseTensor`.
+* <b>default_value</b>: Scalar value to set for indices not specified in
+ `sp_input`.
+* <b>name</b>: A name prefix for the returned tensors (optional).
+
+##### Returns:
+
+ A dense tensor with shape `sp_input.shape` and values specified by
+ the non-empty values in `sp_input`. Indices not in `sp_input` are assigned
+ `default_value`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_input` is not a `SparseTensor`.
+
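+A sketch of the example above (indices, values, and shapes are illustrative):
+
+```python
+import tensorflow as tf
+
+sp_input = tf.SparseTensor(indices=[[0, 1], [0, 3], [2, 0]],
+                           values=["a", "b", "c"],
+                           shape=[3, 5])
+dense = tf.sparse_tensor_to_dense(sp_input, default_value="x")
+```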
+
+- - -
+
+### tf.sparse_to_indicator(sp_input, vocab_size, name=None) <div class="md-anchor" id="sparse_to_indicator">{#sparse_to_indicator}</div>
+
+Converts a `SparseTensor` of ids into a dense bool indicator tensor.
+
+The last dimension of `sp_input` is discarded and replaced with the values of
+`sp_input`. If `sp_input.shape = [D0, D1, ..., Dn, K]`, then
+`output.shape = [D0, D1, ..., Dn, vocab_size]`, where
+
+ output[d_0, d_1, ..., d_n, sp_input[d_0, d_1, ..., d_n, k]] = True
+
+and False elsewhere in `output`.
+
+For example, if `sp_input.shape = [2, 3, 4]` with non-empty values:
+
+ [0, 0, 0]: 0
+ [0, 1, 0]: 10
+ [1, 0, 3]: 103
+ [1, 1, 2]: 112
+ [1, 1, 3]: 113
+ [1, 2, 1]: 121
+
+and `vocab_size = 200`, then the output will be a `[2, 3, 200]` dense bool
+tensor with False everywhere except at positions
+
+ (0, 0, 0), (0, 1, 10), (1, 0, 103), (1, 1, 112), (1, 1, 113), (1, 2, 121).
+
+This op is useful for converting `SparseTensor`s into dense formats for
+compatibility with ops that expect dense tensors.
+
+The input `SparseTensor` must be in row-major order.
+
+##### Args:
+
+
+* <b>sp_input</b>: A `SparseTensor` of type `int32` or `int64`.
+* <b>vocab_size</b>: The new size of the last dimension, with
+ `all(0 <= sp_input.values < vocab_size)`.
+* <b>name</b>: A name prefix for the returned tensors (optional)
+
+##### Returns:
+
+ A dense bool indicator tensor representing the indices with specified value.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_input` is not a `SparseTensor`.
+
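+A minimal sketch of the shape transformation (ids and vocabulary size are
+illustrative):
+
+```python
+import tensorflow as tf
+
+sp_ids = tf.SparseTensor(indices=[[0, 0], [1, 1]], values=[3, 7], shape=[2, 2])
+indicator = tf.sparse_to_indicator(sp_ids, vocab_size=10)
+# `indicator` has shape [2, 10] and is True only at [0, 3] and [1, 7].
+```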
+
+
+## Manipulation. <div class="md-anchor" id="AUTOGENERATED-manipulation.">{#AUTOGENERATED-manipulation.}</div>
+
+- - -
+
+### tf.sparse_concat(concat_dim, sp_inputs, name=None) <div class="md-anchor" id="sparse_concat">{#sparse_concat}</div>
+
+Concatenates a list of `SparseTensor` along the specified dimension.
+
+Concatenation is with respect to the dense versions of each sparse input.
+It is assumed that each input is a `SparseTensor` whose elements are ordered
+along increasing dimension number.
+
+All inputs' shapes must match, except for the concat dimension. The
+`indices`, `values`, and `shapes` lists must have the same length.
+
+The output shape is identical to the inputs', except along the concat
+dimension, where it is the sum of the inputs' sizes along that dimension.
+
+The output elements will be resorted to preserve the sort order along
+increasing dimension number.
+
+This op runs in `O(M log M)` time, where `M` is the total number of non-empty
+values across all inputs. This is due to the need for an internal sort in
+order to concatenate efficiently across an arbitrary dimension.
+
+For example, if `concat_dim = 1` and the inputs are
+
+ sp_inputs[0]: shape = [2, 3]
+ [0, 2]: "a"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+ sp_inputs[1]: shape = [2, 4]
+ [0, 1]: "d"
+ [0, 2]: "e"
+
+then the output will be
+
+ shape = [2, 7]
+ [0, 2]: "a"
+ [0, 4]: "d"
+ [0, 5]: "e"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+Graphically this is equivalent to doing
+
+ [ a] concat [ d e ] = [ a d e ]
+ [b c ] [ ] [b c ]
+
+##### Args:
+
+
+* <b>concat_dim</b>: Dimension to concatenate along.
+* <b>sp_inputs</b>: List of `SparseTensor` to concatenate.
+* <b>name</b>: A name prefix for the returned tensors (optional).
+
+##### Returns:
+
+ A `SparseTensor` with the concatenated output.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_inputs` is not a list of `SparseTensor`.
+
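+A sketch of the example above:
+
+```python
+import tensorflow as tf
+
+a = tf.SparseTensor(indices=[[0, 2], [1, 0], [1, 1]],
+                    values=["a", "b", "c"], shape=[2, 3])
+b = tf.SparseTensor(indices=[[0, 1], [0, 2]],
+                    values=["d", "e"], shape=[2, 4])
+c = tf.sparse_concat(1, [a, b])  # A `SparseTensor` with shape [2, 7].
+```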
+
+- - -
+
+### tf.sparse_reorder(sp_input, name=None) <div class="md-anchor" id="sparse_reorder">{#sparse_reorder}</div>
+
+Reorders a `SparseTensor` into the canonical, row-major ordering.
+
+Note that by convention, all sparse ops preserve the canonical ordering
+along increasing dimension number. The only time ordering can be violated
+is during manual manipulation of the indices and values to add entries.
+
+Reordering does not affect the shape of the `SparseTensor`.
+
+For example, if sp_input has shape `[4, 5]` and `indices` / `values`:
+
+ [0, 3]: b
+ [0, 1]: a
+ [3, 1]: d
+ [2, 0]: c
+
+then the output will be a `SparseTensor` of shape `[4, 5]` and
+`indices` / `values`:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+##### Args:
+
+
+* <b>sp_input</b>: The input `SparseTensor`.
+* <b>name</b>: A name prefix for the returned tensors (optional)
+
+##### Returns:
+
+ A `SparseTensor` with the same shape and non-empty values, but in
+ canonical ordering.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_input` is not a `SparseTensor`.
+
+
+- - -
+
+### tf.sparse_retain(sp_input, to_retain) <div class="md-anchor" id="sparse_retain">{#sparse_retain}</div>
+
+Retains specified non-empty values within a `SparseTensor`.
+
+For example, if `sp_input` has shape `[4, 5]` and 4 non-empty string values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+and `to_retain = [True, False, False, True]`, then the output will
+be a `SparseTensor` of shape `[4, 5]` with 2 non-empty values:
+
+ [0, 1]: a
+ [3, 1]: d
+
+##### Args:
+
+
+* <b>sp_input</b>: The input `SparseTensor` with `N` non-empty elements.
+* <b>to_retain</b>: A bool vector of length `N` with `M` true values.
+
+##### Returns:
+
+ A `SparseTensor` with the same shape as the input and `M` non-empty
+ elements corresponding to the true positions in `to_retain`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_input` is not a `SparseTensor`.
+
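+A sketch of the example above:
+
+```python
+import tensorflow as tf
+
+sp_input = tf.SparseTensor(indices=[[0, 1], [0, 3], [2, 0], [3, 1]],
+                           values=["a", "b", "c", "d"], shape=[4, 5])
+retained = tf.sparse_retain(sp_input, [True, False, False, True])
+# `retained` keeps only the values "a" and "d".
+```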
+
+- - -
+
+### tf.sparse_fill_empty_rows(sp_input, default_value, name=None) <div class="md-anchor" id="sparse_fill_empty_rows">{#sparse_fill_empty_rows}</div>
+
+Fills empty rows in the input 2-D `SparseTensor` with a default value.
+
+This op adds entries with the specified `default_value` at index
+`[row, 0]` for any row in the input that does not already have a value.
+
+For example, suppose `sp_input` has shape `[5, 6]` and non-empty values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+Rows 1 and 4 are empty, so the output will be of shape `[5, 6]` with values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [1, 0]: default_value
+ [2, 0]: c
+ [3, 1]: d
+ [4, 0]: default_value
+
+Note that the input may have empty columns at the end, with no effect on
+this op.
+
+The output `SparseTensor` will be in row-major order and will have the
+same shape as the input.
+
+This op also returns an indicator vector such that
+
+ empty_row_indicator[i] = True iff row i was an empty row.
+
+##### Args:
+
+
+* <b>sp_input</b>: A `SparseTensor` with shape `[N, M]`.
+* <b>default_value</b>: The value to fill for empty rows, with the same type as
+ `sp_input.`
+* <b>name</b>: A name prefix for the returned tensors (optional)
+
+##### Returns:
+
+
+* <b>sp_ordered_output</b>: A `SparseTensor` with shape `[N, M]`, and with all empty
+ rows filled in with `default_value`.
+* <b>empty_row_indicator</b>: A bool vector of length `N` indicating whether each
+ input row was empty.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sp_input` is not a `SparseTensor`.
+
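+A sketch of the example above:
+
+```python
+import tensorflow as tf
+
+sp_input = tf.SparseTensor(indices=[[0, 1], [0, 3], [2, 0], [3, 1]],
+                           values=["a", "b", "c", "d"], shape=[5, 6])
+sp_filled, empty_rows = tf.sparse_fill_empty_rows(sp_input, default_value="x")
+# `empty_rows` evaluates to [False, True, False, False, True].
+```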
+
diff --git a/tensorflow/g3doc/api_docs/python/state_ops.md b/tensorflow/g3doc/api_docs/python/state_ops.md
new file mode 100644
index 0000000000..70d912178b
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/state_ops.md
@@ -0,0 +1,1383 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Variables
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Variables](#AUTOGENERATED-variables)
+ * [class tf.Variable](#Variable)
+* [Variable helper functions](#AUTOGENERATED-variable-helper-functions)
+ * [tf.all_variables()](#all_variables)
+ * [tf.trainable_variables()](#trainable_variables)
+ * [tf.initialize_all_variables()](#initialize_all_variables)
+ * [tf.initialize_variables(var_list, name='init')](#initialize_variables)
+ * [tf.assert_variables_initialized(var_list=None)](#assert_variables_initialized)
+* [Saving and Restoring Variables.](#AUTOGENERATED-saving-and-restoring-variables.)
+ * [class tf.train.Saver](#Saver)
+ * [tf.train.latest_checkpoint(checkpoint_dir, latest_filename=None)](#latest_checkpoint)
+ * [tf.train.get_checkpoint_state(checkpoint_dir, latest_filename=None)](#get_checkpoint_state)
+ * [tf.train.update_checkpoint_state(save_dir, model_checkpoint_path, all_model_checkpoint_paths=None, latest_filename=None)](#update_checkpoint_state)
+* [Sharing Variables](#AUTOGENERATED-sharing-variables)
+ * [tf.get_variable(name, shape=None, dtype=tf.float32, initializer=None, trainable=True, collections=None)](#get_variable)
+ * [tf.get_variable_scope()](#get_variable_scope)
+ * [tf.variable_scope(*args, **kwds)](#variable_scope)
+ * [tf.constant_initializer(value=0.0)](#constant_initializer)
+ * [tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None)](#random_normal_initializer)
+ * [tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None)](#truncated_normal_initializer)
+ * [tf.random_uniform_initializer(minval=0.0, maxval=1.0, seed=None)](#random_uniform_initializer)
+ * [tf.uniform_unit_scaling_initializer(factor=1.0, seed=None)](#uniform_unit_scaling_initializer)
+ * [tf.zeros_initializer(shape, dtype=tf.float32)](#zeros_initializer)
+* [Sparse Variable Updates](#AUTOGENERATED-sparse-variable-updates)
+ * [tf.scatter_update(ref, indices, updates, use_locking=None, name=None)](#scatter_update)
+ * [tf.scatter_add(ref, indices, updates, use_locking=None, name=None)](#scatter_add)
+ * [tf.scatter_sub(ref, indices, updates, use_locking=None, name=None)](#scatter_sub)
+ * [tf.sparse_mask(a, mask_indices, name=None)](#sparse_mask)
+ * [class tf.IndexedSlices](#IndexedSlices)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Variables <div class="md-anchor" id="AUTOGENERATED-variables">{#AUTOGENERATED-variables}</div>
+
+- - -
+
+### class tf.Variable <div class="md-anchor" id="Variable">{#Variable}</div>
+
+See the [Variables How To](../../how_tos/variables/index.md) for a high
+level overview.
+
+A variable maintains state in the graph across calls to `run()`. You add a
+variable to the graph by constructing an instance of the class `Variable`.
+
+The `Variable()` constructor requires an initial value for the variable,
+which can be a `Tensor` of any type and shape. The initial value defines the
+type and shape of the variable. After construction, the type and shape of
+the variable are fixed. The value can be changed using one of the assign
+methods.
+
+If you want to change the shape of a variable later you have to use an
+`assign` Op with `validate_shape=False`.
+
+Just like any `Tensor`, variables created with `Variable()` can be used as
+inputs for other Ops in the graph. Additionally, all the operators
+overloaded for the `Tensor` class are carried over to variables, so you can
+also add nodes to the graph by just doing arithmetic on variables.
+
+```python
+import tensorflow as tf
+
+# Create a variable.
+w = tf.Variable(<initial-value>, name=<optional-name>)
+
+# Use the variable in the graph like any Tensor.
+y = tf.matmul(w, ...another variable or tensor...)
+
+# The overloaded operators are available too.
+z = tf.sigmoid(w + b)
+
+# Assign a new value to the variable with `assign()` or a related method.
+w.assign(w + 1.0)
+w.assign_add(1.0)
+```
+
+When you launch the graph, variables have to be explicitly initialized before
+you can run Ops that use their value. You can initialize a variable by
+running its *initializer op*, restoring the variable from a save file, or
+simply running an `assign` Op that assigns a value to the variable. In fact,
+the variable *initializer op* is just an `assign` Op that assigns the
+variable's initial value to the variable itself.
+
+```python
+# Launch the graph in a session.
+with tf.Session() as sess:
+ # Run the variable initializer.
+ sess.run(w.initializer)
+ # ...you now can run ops that use the value of 'w'...
+```
+
+The most common initialization pattern is to use the convenience function
+`initialize_all_variables()` to add an Op to the graph that initializes
+all the variables. You then run that Op after launching the graph.
+
+```python
+# Add an Op to initialize all variables.
+init_op = tf.initialize_all_variables()
+
+# Launch the graph in a session.
+with tf.Session() as sess:
+ # Run the Op that initializes all variables.
+ sess.run(init_op)
+ # ...you can now run any Op that uses variable values...
+```
+
+If you need to create a variable with an initial value dependent on another
+variable, use the other variable's `initialized_value()`. This ensures that
+variables are initialized in the right order.
+
+All variables are automatically collected in the graph where they are
+created. By default, the constructor adds the new variable to the graph
+collection `GraphKeys.VARIABLES`. The convenience function
+`all_variables()` returns the contents of that collection.
+
+When building a machine learning model it is often convenient to distinguish
+between variables holding the trainable model parameters and other variables
+such as a `global step` variable used to count training steps. To make this
+easier, the variable constructor supports a `trainable=<bool>` parameter. If
+`True`, the new variable is also added to the graph collection
+`GraphKeys.TRAINABLE_VARIABLES`. The convenience function
+`trainable_variables()` returns the contents of this collection. The
+various `Optimizer` classes use this collection as the default list of
+variables to optimize.
+
+
+Creating a variable.
+
+- - -
+
+#### tf.Variable.__init__(initial_value, trainable=True, collections=None, validate_shape=True, name=None) {#Variable.__init__}
+
+Creates a new variable with value `initial_value`.
+
+The new variable is added to the graph collections listed in `collections`,
+which defaults to `[GraphKeys.VARIABLES]`.
+
+If `trainable` is `True` the variable is also added to the graph collection
+`GraphKeys.TRAINABLE_VARIABLES`.
+
+This constructor creates both a `variable` Op and an `assign` Op to set the
+variable to its initial value.
+
+##### Args:
+
+
+* <b>initial_value</b>: A `Tensor`, or Python object convertible to a `Tensor`.
+ The initial value for the Variable. Must have a shape specified unless
+ `validate_shape` is set to False.
+* <b>trainable</b>: If `True`, the default, also adds the variable to the graph
+ collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
+ the default list of variables to use by the `Optimizer` classes.
+* <b>collections</b>: List of graph collections keys. The new variable is added to
+ these collections. Defaults to `[GraphKeys.VARIABLES]`.
+* <b>validate_shape</b>: If `False`, allows the variable to be initialized with a
+ value of unknown shape. If `True`, the default, the shape of
+ `initial_value` must be known.
+* <b>name</b>: Optional name for the variable. Defaults to `'Variable'` and gets
+ uniquified automatically.
+
+##### Returns:
+
+ A Variable.
+
+##### Raises:
+
+
+* <b>ValueError</b>: If the initial value does not have a shape and
+ `validate_shape` is `True`.
+
+
+- - -
+
+#### tf.Variable.initialized_value() {#Variable.initialized_value}
+
+Returns the value of the initialized variable.
+
+You should use this instead of the variable itself to initialize another
+variable with a value that depends on the value of this variable.
+
+```python
+# Initialize 'v' with a random tensor.
+v = tf.Variable(tf.truncated_normal([10, 40]))
+# Use `initialized_value` to guarantee that `v` has been
+# initialized before its value is used to initialize `w`.
+# The random values are picked only once.
+w = tf.Variable(v.initialized_value() * 2.0)
+```
+
+##### Returns:
+
+ A `Tensor` holding the value of this variable after its initializer
+ has run.
+
+
+
+Changing a variable value.
+
+- - -
+
+#### tf.Variable.assign(value, use_locking=False) {#Variable.assign}
+
+Assigns a new value to the variable.
+
+This is essentially a shortcut for `assign(self, value)`.
+
+##### Args:
+
+
+* <b>value</b>: A `Tensor`. The new value for this variable.
+* <b>use_locking</b>: If `True`, use locking during the assignment.
+
+##### Returns:
+
+ A `Tensor` that will hold the new value of this variable after
+ the assignment has completed.
+
+
+- - -
+
+#### tf.Variable.assign_add(delta, use_locking=False) {#Variable.assign_add}
+
+Adds a value to this variable.
+
+This is essentially a shortcut for `assign_add(self, delta)`.
+
+##### Args:
+
+
+* <b>delta</b>: A `Tensor`. The value to add to this variable.
+* <b>use_locking</b>: If `True`, use locking during the operation.
+
+##### Returns:
+
+ A `Tensor` that will hold the new value of this variable after
+ the addition has completed.
+
+
+- - -
+
+#### tf.Variable.assign_sub(delta, use_locking=False) {#Variable.assign_sub}
+
+Subtracts a value from this variable.
+
+This is essentially a shortcut for `assign_sub(self, delta)`.
+
+##### Args:
+
+
+* <b>delta</b>: A `Tensor`. The value to subtract from this variable.
+* <b>use_locking</b>: If `True`, use locking during the operation.
+
+##### Returns:
+
+ A `Tensor` that will hold the new value of this variable after
+ the subtraction has completed.
+
+
+- - -
+
+#### tf.Variable.scatter_sub(sparse_delta, use_locking=False) {#Variable.scatter_sub}
+
+Subtracts `IndexedSlices` from this variable.
+
+This is essentially a shortcut for `scatter_sub(self, sparse_delta.indices,
+sparse_delta.values)`.
+
+##### Args:
+
+
+* <b>sparse_delta</b>: `IndexedSlices` to be subtracted from this variable.
+* <b>use_locking</b>: If `True`, use locking during the operation.
+
+##### Returns:
+
+ A `Tensor` that will hold the new value of this variable after
+ the scattered subtraction has completed.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if `sparse_delta` is not an `IndexedSlices`.
+
+
+- - -
+
+#### tf.Variable.count_up_to(limit) {#Variable.count_up_to}
+
+Increments this variable until it reaches `limit`.
+
+When that Op is run it tries to increment the variable by `1`. If
+incrementing the variable would bring it above `limit` then the Op raises
+the exception `OutOfRangeError`.
+
+If no error is raised, the Op outputs the value of the variable before
+the increment.
+
+This is essentially a shortcut for `count_up_to(self, limit)`.
+
+##### Args:
+
+
+* <b>limit</b>: value at which incrementing the variable raises an error.
+
+##### Returns:
+
+ A `Tensor` that will hold the variable value before the increment. If no
+ other Op modifies this variable, the values produced will all be
+ distinct.
+
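+A minimal sketch (the limit is illustrative):
+
+```python
+import tensorflow as tf
+
+counter = tf.Variable(0, name="counter")
+count_op = counter.count_up_to(10)
+
+with tf.Session() as sess:
+  sess.run(tf.initialize_all_variables())
+  value = sess.run(count_op)  # 0 on the first run, then 1, 2, ... up to 9.
+```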
+
+
+- - -
+
+#### tf.Variable.eval(session=None) {#Variable.eval}
+
+In a session, computes and returns the value of this variable.
+
+This is not a graph construction method, it does not add ops to the graph.
+
+This convenience method requires a session where the graph containing this
+variable has been launched. If no session is passed, the default session is
+used. See the [Session class](../client.md#Session) for more information on
+launching a graph and on sessions.
+
+```python
+v = tf.Variable([1, 2])
+init = tf.initialize_all_variables()
+
+with tf.Session() as sess:
+ sess.run(init)
+ # Usage passing the session explicitly.
+ print v.eval(sess)
+ # Usage with the default session. The 'with' block
+ # above makes 'sess' the default session.
+ print v.eval()
+```
+
+##### Args:
+
+
+* <b>session</b>: The session to use to evaluate this variable. If
+ none, the default session is used.
+
+##### Returns:
+
+ A numpy `ndarray` with a copy of the value of this variable.
+
+
+
+Properties.
+
+- - -
+
+#### tf.Variable.name {#Variable.name}
+
+The name of this variable.
+
+- - -
+
+#### tf.Variable.dtype {#Variable.dtype}
+
+The `DType` of this variable.
+
+- - -
+
+#### tf.Variable.get_shape() {#Variable.get_shape}
+
+The `TensorShape` of this variable.
+
+##### Returns:
+
+ A `TensorShape`.
+
+
+- - -
+
+#### tf.Variable.device {#Variable.device}
+
+The device of this variable.
+
+- - -
+
+#### tf.Variable.initializer {#Variable.initializer}
+
+The initializer operation for this variable.
+
+- - -
+
+#### tf.Variable.graph {#Variable.graph}
+
+The `Graph` of this variable.
+
+- - -
+
+#### tf.Variable.op {#Variable.op}
+
+The `Operation` of this variable.
+
+
+
+## Variable helper functions <div class="md-anchor" id="AUTOGENERATED-variable-helper-functions">{#AUTOGENERATED-variable-helper-functions}</div>
+
+TensorFlow provides a set of functions to help manage the set of variables
+collected in the graph.
+
+- - -
+
+### tf.all_variables() <div class="md-anchor" id="all_variables">{#all_variables}</div>
+
+Returns all variables collected in the graph.
+
+The `Variable()` constructor automatically adds new variables to the graph
+collection `GraphKeys.VARIABLES`. This convenience function returns the
+contents of that collection.
+
+##### Returns:
+
+ A list of `Variable` objects.
+
+
+- - -
+
+### tf.trainable_variables() <div class="md-anchor" id="trainable_variables">{#trainable_variables}</div>
+
+Returns all variables created with `trainable=True`.
+
+When passed `trainable=True`, the `Variable()` constructor automatically
+adds new variables to the graph collection
+`GraphKeys.TRAINABLE_VARIABLES`. This convenience function returns the
+contents of that collection.
+
+##### Returns:
+
+ A list of Variable objects.
+
+
+
+- - -
+
+### tf.initialize_all_variables() <div class="md-anchor" id="initialize_all_variables">{#initialize_all_variables}</div>
+
+Returns an Op that initializes all variables.
+
+This is just a shortcut for `initialize_variables(all_variables())`.
+
+##### Returns:
+
+ An Op that initializes all variables in the graph.
+
+
+- - -
+
+### tf.initialize_variables(var_list, name='init') <div class="md-anchor" id="initialize_variables">{#initialize_variables}</div>
+
+Returns an Op that initializes a list of variables.
+
+After you launch the graph in a session, you can run the returned Op to
+initialize all the variables in `var_list`. This Op runs all the
+initializers of the variables in `var_list` in parallel.
+
+Calling `initialize_variables()` is equivalent to passing the list of
+initializers to `Group()`.
+
+If `var_list` is empty, however, the function still returns an Op that can
+be run. That Op just has no effect.
+
+##### Args:
+
+
+* <b>var_list</b>: List of `Variable` objects to initialize.
+* <b>name</b>: Optional name for the returned operation.
+
+##### Returns:
+
+  An Op that runs the initializers of all the specified variables.
+
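+A minimal sketch of initializing only a subset of the graph's variables:
+
+```python
+import tensorflow as tf
+
+v1 = tf.Variable(tf.zeros([10]), name="v1")
+v2 = tf.Variable(tf.ones([5]), name="v2")
+init_subset = tf.initialize_variables([v1, v2], name="init_subset")
+
+with tf.Session() as sess:
+  sess.run(init_subset)  # Only v1 and v2 are initialized.
+```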
+
+- - -
+
+### tf.assert_variables_initialized(var_list=None) <div class="md-anchor" id="assert_variables_initialized">{#assert_variables_initialized}</div>
+
+Returns an Op to check if variables are initialized.
+
+When run, the returned Op will raise the exception `FailedPreconditionError`
+if any of the variables has not yet been initialized.
+
+Note: This function is implemented by trying to fetch the values of the
+variables. If one of the variables is not initialized a message may be
+logged by the C++ runtime. This is expected.
+
+##### Args:
+
+
+* <b>var_list</b>: List of `Variable` objects to check. Defaults to the
+ value of `all_variables().`
+
+##### Returns:
+
+ An Op, or None if there are no variables.
+
+
+
+## Saving and Restoring Variables. <div class="md-anchor" id="AUTOGENERATED-saving-and-restoring-variables.">{#AUTOGENERATED-saving-and-restoring-variables.}</div>
+
+- - -
+
+### class tf.train.Saver <div class="md-anchor" id="Saver">{#Saver}</div>
+
+Saves and restores variables.
+
+See [Variables](../../how_tos/variables/index.md)
+for an overview of variables, saving and restoring.
+
+The `Saver` class adds ops to save and restore variables to and from
+*checkpoints*. It also provides convenience methods to run these ops.
+
+Checkpoints are binary files in a proprietary format which map variable names
+to tensor values. The best way to examine the contents of a checkpoint is to
+load it using a `Saver`.
+
+Savers can automatically number checkpoint filenames with a provided counter.
+This lets you keep multiple checkpoints at different steps while training a
+model. For example you can number the checkpoint filenames with the training
+step number. To avoid filling up disks, savers manage checkpoint files
+automatically. For example, they can keep only the N most recent files, or
+one checkpoint for every N hours of training.
+
+You number checkpoint filenames by passing a value to the optional
+`global_step` argument to `save()`:
+
+```python
+saver.save(sess, 'my-model', global_step=0) ==> filename: 'my-model-0'
+...
+saver.save(sess, 'my-model', global_step=1000) ==> filename: 'my-model-1000'
+```
+
+Additionally, optional arguments to the `Saver()` constructor let you control
+the proliferation of checkpoint files on disk:
+
+* `max_to_keep` indicates the maximum number of recent checkpoint files to
+ keep. As new files are created, older files are deleted. If None or 0,
+ all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent
+ checkpoint files are kept.)
+
+* `keep_checkpoint_every_n_hours`: In addition to keeping the most recent
+ `max_to_keep` checkpoint files, you might want to keep one checkpoint file
+ for every N hours of training. This can be useful if you want to later
+ analyze how a model progressed during a long training session. For
+ example, passing `keep_checkpoint_every_n_hours=2` ensures that you keep
+ one checkpoint file for every 2 hours of training. The default value of
+ 10,000 hours effectively disables the feature.
+
+Note that you still have to call the `save()` method to save the model.
+Passing these arguments to the constructor will not save variables
+automatically for you.
+
+A training program that saves regularly looks like:
+
+```python
+...
+# Create a saver.
+saver = tf.train.Saver(...variables...)
+# Launch the graph and train, saving the model every 1,000 steps.
+sess = tf.Session()
+for step in xrange(1000000):
+ sess.run(..training_op..)
+ if step % 1000 == 0:
+ # Append the step number to the checkpoint name:
+ saver.save(sess, 'my-model', global_step=step)
+```
+
+In addition to checkpoint files, savers keep a protocol buffer on disk with
+the list of recent checkpoints. This is used to manage numbered checkpoint
+files and by `latest_checkpoint()`, which makes it easy to discover the path
+to the most recent checkpoint. That protocol buffer is stored in a file named
+'checkpoint' next to the checkpoint files.
+
+If you create several savers, you can specify a different filename for the
+protocol buffer file in the call to `save()`.
+
+- - -
+
+#### tf.train.Saver.__init__(var_list=None, reshape=False, sharded=False, max_to_keep=5, keep_checkpoint_every_n_hours=10000.0, name=None, restore_sequentially=False, saver_def=None, builder=None) {#Saver.__init__}
+
+Creates a `Saver`.
+
+The constructor adds ops to save and restore variables.
+
+`var_list` specifies the variables that will be saved and restored. It can
+be passed as a `dict` or a list:
+
+* A `dict` of names to variables: The keys are the names that will be
+ used to save or restore the variables in the checkpoint files.
+* A list of variables: The variables will be keyed with their op name in
+ the checkpoint files.
+
+For example:
+
+```python
+v1 = tf.Variable(..., name='v1')
+v2 = tf.Variable(..., name='v2')
+
+# Pass the variables as a dict:
+saver = tf.train.Saver({'v1': v1, 'v2': v2})
+
+# Or pass them as a list.
+saver = tf.train.Saver([v1, v2])
+# Passing a list is equivalent to passing a dict with the variable op names
+# as keys:
+saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
+```
+
+The optional `reshape` argument, if True, allows restoring a variable from
+a save file where the variable had a different shape, but the same number
+of elements and type. This is useful if you have reshaped a variable and
+want to reload it from an older checkpoint.
+
+The optional `sharded` argument, if True, instructs the saver to shard
+checkpoints per device.
+
+##### Args:
+
+
+* <b>var_list</b>: A list of Variables or a dictionary mapping names to
+ Variables. If None, defaults to the list of all variables.
+* <b>reshape</b>: If True, allows restoring parameters from a checkpoint
+ where the variables have a different shape.
+* <b>sharded</b>: If True, shard the checkpoints, one per device.
+* <b>max_to_keep</b>: Maximum number of recent checkpoints to keep.
+    Defaults to 5.
+* <b>keep_checkpoint_every_n_hours</b>: How often to keep checkpoints.
+ Defaults to 10,000 hours.
+* <b>name</b>: string. Optional name to use as a prefix when adding operations.
+* <b>restore_sequentially</b>: A Bool, which if true, causes restore of different
+ variables to happen sequentially within each device. This can lower
+ memory usage when restoring very large models.
+* <b>saver_def</b>: Optional SaverDef proto to use instead of running the builder.
+ This is only useful for specialty code that wants to recreate a Saver
+ object for a previously built Graph that had a Saver. The saver_def
+ proto should be the one returned by the as_saver_def() call of the
+ Saver that was created for that Graph.
+* <b>builder</b>: Optional SaverBuilder to use if a saver_def was not provided.
+ Defaults to BaseSaverBuilder().
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `var_list` is invalid.
+* <b>ValueError</b>: If any of the keys or values in `var_list` is not unique.
+
+
+- - -
+
+#### tf.train.Saver.save(sess, save_path, global_step=None, latest_filename=None) {#Saver.save}
+
+Saves variables.
+
+This method runs the ops added by the constructor for saving variables.
+It requires a session in which the graph was launched. The variables to
+save must also have been initialized.
+
+The method returns the path of the newly created checkpoint file. This
+path can be passed directly to a call to `restore()`.
+
+##### Args:
+
+
+* <b>sess</b>: A Session to use to save the variables.
+* <b>save_path</b>: string. Path to the checkpoint filename. If the saver is
+ `sharded`, this is the prefix of the sharded checkpoint filename.
+* <b>global_step</b>: If provided the global step number is appended to
+ `save_path` to create the checkpoint filename. The optional argument
+ can be a Tensor, a Tensor name or an integer.
+* <b>latest_filename</b>: Optional name for the protocol buffer file that will
+    contain the list of most recent checkpoint filenames. That file,
+ kept in the same directory as the checkpoint files, is automatically
+ managed by the saver to keep track of recent checkpoints. Defaults to
+ 'checkpoint'.
+
+##### Returns:
+
+ A string: path at which the variables were saved. If the saver is
+ sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn'
+ is the number of shards created.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `sess` is not a Session.
+
+
+- - -
+
+#### tf.train.Saver.restore(sess, save_path) {#Saver.restore}
+
+Restores previously saved variables.
+
+This method runs the ops added by the constructor for restoring variables.
+It requires a session in which the graph was launched. The variables to
+restore do not have to have been initialized, as restoring is itself a way
+to initialize variables.
+
+The `save_path` argument is typically a value previously returned from a
+`save()` call, or a call to `latest_checkpoint()`.
+
+##### Args:
+
+
+* <b>sess</b>: A Session to use to restore the parameters.
+* <b>save_path</b>: Path where parameters were previously saved.
+
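+A minimal sketch (the checkpoint path is illustrative):
+
+```python
+import tensorflow as tf
+
+v = tf.Variable(tf.zeros([10]), name="v")
+saver = tf.train.Saver()
+
+with tf.Session() as sess:
+  saver.restore(sess, "/tmp/my-model-1000")
+  # Variables now hold the values stored in the checkpoint.
+```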
+
+
+Other utility methods.
+
+- - -
+
+#### tf.train.Saver.last_checkpoints {#Saver.last_checkpoints}
+
+List of not-yet-deleted checkpoint filenames.
+
+You can pass any of the returned values to `restore()`.
+
+##### Returns:
+
+ A list of checkpoint filenames, sorted from oldest to newest.
+
+- - -
+
+#### tf.train.Saver.set_last_checkpoints(last_checkpoints) {#Saver.set_last_checkpoints}
+
+Sets the list of not-yet-deleted checkpoint filenames.
+
+##### Args:
+
+
+* <b>last_checkpoints</b>: a list of checkpoint filenames.
+
+##### Raises:
+
+
+* <b>AssertionError</b>: if the list of checkpoint filenames has already been set.
+
+
+- - -
+
+#### tf.train.Saver.as_saver_def() {#Saver.as_saver_def}
+
+Generates a `SaverDef` representation of this saver.
+
+##### Returns:
+
+ A `SaverDef` proto.
+
+
+
+
+- - -
+
+### tf.train.latest_checkpoint(checkpoint_dir, latest_filename=None) <div class="md-anchor" id="latest_checkpoint">{#latest_checkpoint}</div>
+
+Finds the filename of latest saved checkpoint file.
+
+##### Args:
+
+
+* <b>checkpoint_dir</b>: Directory where the variables were saved.
+* <b>latest_filename</b>: Optional name for the protocol buffer file that
+ contains the list of most recent checkpoint filenames.
+ See the corresponding argument to `Saver.save()`.
+
+##### Returns:
+
+ The full path to the latest checkpoint or None if no checkpoint was found.
+
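+A common pattern, sketched here with an illustrative directory, is to resume
+from the most recent checkpoint if one exists and to initialize otherwise:
+
+```python
+import tensorflow as tf
+
+v = tf.Variable(tf.zeros([10]), name="v")
+saver = tf.train.Saver()
+
+with tf.Session() as sess:
+  ckpt = tf.train.latest_checkpoint("/tmp/train_dir")
+  if ckpt:
+    saver.restore(sess, ckpt)
+  else:
+    sess.run(tf.initialize_all_variables())
+```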
+
+
+- - -
+
+### tf.train.get_checkpoint_state(checkpoint_dir, latest_filename=None) <div class="md-anchor" id="get_checkpoint_state">{#get_checkpoint_state}</div>
+
+Returns CheckpointState proto from the "checkpoint" file.
+
+If the "checkpoint" file contains a valid CheckpointState
+proto, returns it.
+
+##### Args:
+
+
+* <b>checkpoint_dir</b>: The directory of checkpoints.
+* <b>latest_filename</b>: Optional name of the checkpoint file. Defaults to
+ 'checkpoint'.
+
+##### Returns:
+
+ A CheckpointState if the state was available, None
+ otherwise.
+
+
+- - -
+
+### tf.train.update_checkpoint_state(save_dir, model_checkpoint_path, all_model_checkpoint_paths=None, latest_filename=None) <div class="md-anchor" id="update_checkpoint_state">{#update_checkpoint_state}</div>
+
+Updates the content of the 'checkpoint' file.
+
+This updates the checkpoint file containing a CheckpointState
+proto.
+
+##### Args:
+
+
+* <b>save_dir</b>: Directory where the model was saved.
+* <b>model_checkpoint_path</b>: The checkpoint file.
+* <b>all_model_checkpoint_paths</b>: list of strings. Paths to all not-yet-deleted
+ checkpoints, sorted from oldest to newest. If this is a non-empty list,
+ the last element must be equal to model_checkpoint_path. These paths
+ are also saved in the CheckpointState proto.
+* <b>latest_filename</b>: Optional name of the checkpoint file. Defaults to
+ 'checkpoint'.
+
+##### Raises:
+
+
+* <b>RuntimeError</b>: If the save paths conflict.
+
+
+
+## Sharing Variables <div class="md-anchor" id="AUTOGENERATED-sharing-variables">{#AUTOGENERATED-sharing-variables}</div>
+
+TensorFlow provides several classes and operations that you can use to
+create variables contingent on certain conditions.
+
+- - -
+
+### tf.get_variable(name, shape=None, dtype=tf.float32, initializer=None, trainable=True, collections=None) <div class="md-anchor" id="get_variable">{#get_variable}</div>
+
+Gets an existing variable with these parameters or creates a new one.
+
+This function prefixes the name with the current variable scope
+and performs reuse checks. See the
+[Variable Scope How To](../../how_tos/variable_scope/index.md)
+for an extensive description of how reusing works. Here is a basic example:
+
+```python
+with tf.variable_scope("foo"):
+  v = tf.get_variable("v", [1])  # v.name == "foo/v:0"
+  w = tf.get_variable("w", [1])  # w.name == "foo/w:0"
+with tf.variable_scope("foo", reuse=True):
+  v1 = tf.get_variable("v")  # The same as v above.
+```
+
+If initializer is `None` (the default), the default initializer passed in
+the constructor is used. If that one is `None` too, a
+`UniformUnitScalingInitializer` will be used.
+
+##### Args:
+
+
+* <b>name</b>: the name of the new or existing variable.
+* <b>shape</b>: shape of the new or existing variable.
+* <b>dtype</b>: type of the new or existing variable (defaults to `DT_FLOAT`).
+* <b>initializer</b>: initializer for the variable if one is created.
+* <b>trainable</b>: If `True` also add the variable to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES` (see variables.Variable).
+* <b>collections</b>: List of graph collections keys to add the Variable to.
+ Defaults to `[GraphKeys.VARIABLES]` (see variables.Variable).
+
+##### Returns:
+
+ The created or existing variable.
+
+##### Raises:
+
+
+* <b>ValueError</b>: when creating a new variable and shape is not declared,
+ or when violating reuse during variable creation. Reuse is set inside
+ `variable_scope`.
+
+
+- - -
+
+### tf.get_variable_scope() <div class="md-anchor" id="get_variable_scope">{#get_variable_scope}</div>
+
+Returns the current variable scope.
+
+
+- - -
+
+### tf.variable_scope(*args, **kwds) <div class="md-anchor" id="variable_scope">{#variable_scope}</div>
+
+Returns a context for variable scope.
+
+Variable scope allows you to create new variables and to share already created
+ones, while providing checks against accidental creation or sharing. For
+details, see the [Variable Scope How To](../../how_tos/variable_scope/index.md);
+here we present only a few basic examples.
+
+Simple example of how to create a new variable:
+
+```python
+with tf.variable_scope("foo"):
+ with tf.variable_scope("bar"):
+ v = tf.get_variable("v", [1])
+ assert v.name == "foo/bar/v:0"
+```
+
+Basic example of sharing a variable:
+
+```python
+with tf.variable_scope("foo"):
+  v = tf.get_variable("v", [1])
+with tf.variable_scope("foo", reuse=True):
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+Sharing a variable by capturing a scope and setting reuse:
+
+```python
+with tf.variable_scope("foo") as scope:
+  v = tf.get_variable("v", [1])
+ scope.reuse_variables()
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+To prevent accidental sharing of variables, we raise an exception when
+getting an existing variable in a non-reusing scope.
+
+```python
+with tf.variable_scope("foo") as scope:
+  v = tf.get_variable("v", [1])
+ v1 = tf.get_variable("v", [1])
+ # Raises ValueError("... v already exists ...").
+```
+
+Similarly, we raise an exception when trying to get a variable that
+does not exist in reuse mode.
+
+```python
+with tf.variable_scope("foo", reuse=True):
+  v = tf.get_variable("v", [1])
+ # Raises ValueError("... v does not exists ...").
+```
+
+Note that the `reuse` flag is inherited: if we open a reusing scope,
+then all its sub-scopes become reusing as well.
+
+##### Args:
+
+
+* <b>name_or_scope</b>: `string` or `VariableScope`: the scope to open.
+* <b>reuse</b>: `True` or `None`; if `True`, we go into reuse mode for this scope as
+ well as all sub-scopes; if `None`, we just inherit the parent scope reuse.
+* <b>initializer</b>: default initializer for variables within this scope.
+
+##### Yields:
+
+  A scope that can be captured and reused.
+
+##### Raises:
+
+
+* <b>ValueError</b>: when trying to reuse within a create scope, or create within
+ a reuse scope, or if reuse is not `None` or `True`.
+* <b>TypeError</b>: when the types of some arguments are not appropriate.
+
+
+
+- - -
+
+### tf.constant_initializer(value=0.0) <div class="md-anchor" id="constant_initializer">{#constant_initializer}</div>
+
+Returns an initializer that generates Tensors with a single value.
+
+##### Args:
+
+
+* <b>value</b>: A Python scalar. All elements of the initialized variable
+ will be set to this value.
+
+##### Returns:
+
+ An initializer that generates Tensors with a single value.
+
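+Initializers are typically passed to `tf.get_variable()`; a minimal sketch:
+
+```python
+import tensorflow as tf
+
+bias = tf.get_variable("bias", [10],
+                       initializer=tf.constant_initializer(0.1))
+# Every element of `bias` is initialized to 0.1.
+```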
+
+- - -
+
+### tf.random_normal_initializer(mean=0.0, stddev=1.0, seed=None) <div class="md-anchor" id="random_normal_initializer">{#random_normal_initializer}</div>
+
+Returns an initializer that generates Tensors with a normal distribution.
+
+##### Args:
+
+
+* <b>mean</b>: a python scalar or a scalar tensor. Mean of the random values
+ to generate.
+* <b>stddev</b>: a python scalar or a scalar tensor. Standard deviation of the
+ random values to generate.
+* <b>seed</b>: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ An initializer that generates Tensors with a normal distribution.
+
+
+- - -
+
+### tf.truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None) <div class="md-anchor" id="truncated_normal_initializer">{#truncated_normal_initializer}</div>
+
+Returns an initializer that generates a truncated normal distribution.
+
+These values are similar to values from a random_normal_initializer
+except that values more than two standard deviations from the mean
+are discarded and re-drawn. This is the recommended initializer for
+neural network weights and filters.
+
+##### Args:
+
+
+* <b>mean</b>: a python scalar or a scalar tensor. Mean of the random values
+ to generate.
+* <b>stddev</b>: a python scalar or a scalar tensor. Standard deviation of the
+ random values to generate.
+* <b>seed</b>: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ An initializer that generates Tensors with a truncated normal
+ distribution.
+
+
+- - -
+
+### tf.random_uniform_initializer(minval=0.0, maxval=1.0, seed=None) <div class="md-anchor" id="random_uniform_initializer">{#random_uniform_initializer}</div>
+
+Returns an initializer that generates Tensors with a uniform distribution.
+
+##### Args:
+
+
+* <b>minval</b>: a python scalar or a scalar tensor. lower bound of the range
+ of random values to generate.
+* <b>maxval</b>: a python scalar or a scalar tensor. upper bound of the range
+ of random values to generate.
+* <b>seed</b>: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ An initializer that generates Tensors with a uniform distribution.
+
+
+- - -
+
+### tf.uniform_unit_scaling_initializer(factor=1.0, seed=None) <div class="md-anchor" id="uniform_unit_scaling_initializer">{#uniform_unit_scaling_initializer}</div>
+
+Returns an initializer that generates tensors without scaling variance.
+
+When initializing a deep network, it is in principle advantageous to keep
+the scale of the input variance constant, so it does not explode or diminish
+by reaching the final layer. If the input is `x` and the operation `x * W`,
+and we want to initialize `W` uniformly at random, we need to pick `W` from
+
+ [-sqrt(3) / sqrt(dim), sqrt(3) / sqrt(dim)]
+
+to keep the scale intact, where `dim = W.shape[0]` (the size of the input).
+A similar calculation for convolutional networks gives an analogous result
+with `dim` equal to the product of the first 3 dimensions. When
+nonlinearities are present, we need to multiply this by a constant `factor`.
+See <https://arxiv.org/pdf/1412.6558v3.pdf> for deeper motivation, experiments
+and the calculation of constants. In section 2.3 there, the constants were
+numerically computed: for a linear layer it's 1.0, relu: ~1.43, tanh: ~1.15.
+
+##### Args:
+
+
+* <b>factor</b>: Float. A multiplicative factor by which the values will be scaled.
+* <b>seed</b>: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+##### Returns:
+
+ An initializer that generates tensors with unit variance.
+
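+A sketch of typical usage (the shape is illustrative; ~1.43 is the relu
+constant mentioned above):
+
+```python
+import tensorflow as tf
+
+w = tf.get_variable("w", [784, 256],
+                    initializer=tf.uniform_unit_scaling_initializer(factor=1.43))
+```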
+
+- - -
+
+### tf.zeros_initializer(shape, dtype=tf.float32) <div class="md-anchor" id="zeros_initializer">{#zeros_initializer}</div>
+
+An adaptor for zeros() to match the Initializer spec.
+
+
+
+## Sparse Variable Updates <div class="md-anchor" id="AUTOGENERATED-sparse-variable-updates">{#AUTOGENERATED-sparse-variable-updates}</div>
+
+The sparse update ops modify a subset of the entries in a dense `Variable`,
+either overwriting the entries or adding / subtracting a delta. These are
+useful for training embedding models and similar lookup-based networks, since
+only a small subset of embedding vectors change in any given step.
+
+Since a sparse update of a large tensor may be generated automatically during
+gradient computation (as in the gradient of [`tf.gather`](array_ops.md#gather)),
+an [`IndexedSlices`](#IndexedSlices) class is provided that encapsulates a set
+of sparse indices and values. `IndexedSlices` objects are detected and handled
+automatically by the optimizers in most cases.
+
+- - -
+
+### tf.scatter_update(ref, indices, updates, use_locking=None, name=None) <div class="md-anchor" id="scatter_update">{#scatter_update}</div>
+
+Applies sparse updates to a variable reference.
+
+This operation computes
+
+ # Scalar indices
+ ref[indices, ...] = updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] = updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] = updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+If `indices` contains duplicate entries, lexicographically later entries
+override earlier entries.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterUpdate.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>ref</b>: A mutable `Tensor`. Should be from a `Variable` node.
+* <b>indices</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A tensor of indices into the first dimension of `ref`.
+* <b>updates</b>: A `Tensor`. Must have the same type as `ref`.
+ A tensor of updated values to store in `ref`.
+* <b>use_locking</b>: An optional `bool`. Defaults to `True`.
+ If True, the assignment will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+
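+A minimal sketch of the vector-indices case:
+
+```python
+import tensorflow as tf
+
+ref = tf.Variable([1, 2, 3, 4, 5])
+update = tf.scatter_update(ref, [0, 2], [10, 30])
+
+with tf.Session() as sess:
+  sess.run(tf.initialize_all_variables())
+  sess.run(update)  # `ref` now holds [10, 2, 30, 4, 5].
+```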
+
+- - -
+
+### tf.scatter_add(ref, indices, updates, use_locking=None, name=None) <div class="md-anchor" id="scatter_add">{#scatter_add}</div>
+
+Adds sparse updates to a variable reference.
+
+This operation computes
+
+ # Scalar indices
+ ref[indices, ...] += updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] += updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] += updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+Duplicate entries are handled correctly: if multiple `indices` reference
+the same location, their contributions add.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterAdd.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>ref</b>: A mutable `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+ Should be from a `Variable` node.
+* <b>indices</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A tensor of indices into the first dimension of `ref`.
+* <b>updates</b>: A `Tensor`. Must have the same type as `ref`.
+ A tensor of updated values to add to `ref`.
+* <b>use_locking</b>: An optional `bool`. Defaults to `False`.
+ If True, the addition will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+
+
+- - -
+
+### tf.scatter_sub(ref, indices, updates, use_locking=None, name=None) <div class="md-anchor" id="scatter_sub">{#scatter_sub}</div>
+
+Subtracts sparse updates to a variable reference.
+
+ # Scalar indices
+ ref[indices, ...] -= updates[...]
+
+ # Vector indices (for each i)
+ ref[indices[i], ...] -= updates[i, ...]
+
+ # High rank indices (for each i, ..., j)
+ ref[indices[i, ..., j], ...] -= updates[i, ..., j, ...]
+
+This operation outputs `ref` after the update is done.
+This makes it easier to chain operations that need to use the reset value.
+
+Duplicate entries are handled correctly: if multiple `indices` reference
+the same location, their (negated) contributions add.
+
+Requires `updates.shape = indices.shape + ref.shape[1:]`.
+
+<div style="width:70%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/ScatterSub.png" alt>
+</div>
+
+##### Args:
+
+
+* <b>ref</b>: A mutable `Tensor`. Must be one of the following types: `float32`, `float64`, `int64`, `int32`, `uint8`, `int16`, `int8`, `complex64`, `qint8`, `quint8`, `qint32`.
+ Should be from a `Variable` node.
+* <b>indices</b>: A `Tensor`. Must be one of the following types: `int32`, `int64`.
+ A tensor of indices into the first dimension of `ref`.
+* <b>updates</b>: A `Tensor`. Must have the same type as `ref`.
+ A tensor of updated values to subtract from `ref`.
+* <b>use_locking</b>: An optional `bool`. Defaults to `False`.
+ If True, the subtraction will be protected by a lock;
+ otherwise the behavior is undefined, but may exhibit less contention.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ Same as `ref`. Returned as a convenience for operations that want
+ to use the updated values after the update is done.
+
+
+- - -
+
+### tf.sparse_mask(a, mask_indices, name=None) <div class="md-anchor" id="sparse_mask">{#sparse_mask}</div>
+
+Masks elements of `IndexedSlices`.
+
+Given an `IndexedSlices` instance `a`, returns another `IndexedSlices` that
+contains a subset of the slices of `a`. The slices at the indices listed in
+`mask_indices` are masked out (dropped); all other slices are returned.
+
+This is useful when you need to extract a subset of slices in an
+`IndexedSlices` object.
+
+For example:
+
+```python
+# `a` contains slices at indices [12, 26, 37, 45] from a large tensor
+# with shape [1000, 10]
+a.indices => [12, 26, 37, 45]
+tf.shape(a.values) => [4, 10]
+
+# `b` will be the subset of `a` slices at its second and third indices, so
+# we want to mask out its first and last indices (which are at absolute
+# indices 12, 45)
+b = tf.sparse_mask(a, [12, 45])
+
+b.indices => [26, 37]
+tf.shape(b.values) => [2, 10]
+
+```
+
+##### Args:
+
+* <b>a</b>: An `IndexedSlices` instance.
+* <b>mask_indices</b>: Indices of elements to mask.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The masked `IndexedSlices` instance.
+
+
+- - -
+
+### class tf.IndexedSlices <div class="md-anchor" id="IndexedSlices">{#IndexedSlices}</div>
+
+A sparse representation of a set of tensor slices at given indices.
+
+This class is a simple wrapper for a pair of `Tensor` objects:
+
+* `values`: A `Tensor` of any dtype with shape `[D0, D1, ..., Dn]`.
+* `indices`: A 1-D integer `Tensor` with shape `[D0]`.
+
+An `IndexedSlices` is typically used to represent a subset of a larger
+tensor `dense` of shape `[LARGE0, D1, .. , DN]` where `LARGE0 >> D0`.
+The values in `indices` are the indices in the first dimension of
+the slices that have been extracted from the larger tensor.
+
+The dense tensor `dense` represented by an `IndexedSlices` `slices` has
+
+```python
+dense[slices.indices[i], :, :, :, ...] = slices.values[i, :, :, :, ...]
+```
+
+The `IndexedSlices` class is used principally in the definition of
+gradients for operations that have sparse gradients
+(e.g. [`tf.gather`](array_ops.md#gather)).
+
+Contrast this representation with
+[`SparseTensor`](sparse_ops.md#SparseTensor),
+which uses multi-dimensional indices and scalar values.
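+
+As a sketch (shapes and values below are illustrative), the gradient of
+`tf.gather` with respect to its parameters is delivered as an `IndexedSlices`
+rather than as a dense tensor:
+
+```python
+params = tf.Variable(tf.ones([1000, 10]))
+rows = tf.gather(params, [12, 26, 37])
+loss = tf.reduce_sum(rows)
+grad = tf.gradients(loss, [params])[0]  # an IndexedSlices, not a dense Tensor
+# grad.indices refers to rows [12, 26, 37]; grad.values has shape [3, 10].
+```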
+
+- - -
+
+#### tf.IndexedSlices.__init__(values, indices, dense_shape=None) {#IndexedSlices.__init__}
+
+Creates an `IndexedSlices`.
+
+
+
+- - -
+
+#### tf.IndexedSlices.values {#IndexedSlices.values}
+
+A `Tensor` containing the values of the slices.
+
+- - -
+
+#### tf.IndexedSlices.indices {#IndexedSlices.indices}
+
+A 1-D `Tensor` containing the indices of the slices.
+
+- - -
+
+#### tf.IndexedSlices.dense_shape {#IndexedSlices.dense_shape}
+
+A 1-D `Tensor` containing the shape of the corresponding dense tensor.
+
+
+- - -
+
+#### tf.IndexedSlices.name {#IndexedSlices.name}
+
+The name of this `IndexedSlices`.
+
+- - -
+
+#### tf.IndexedSlices.dtype {#IndexedSlices.dtype}
+
+The `DType` of elements in this tensor.
+
+- - -
+
+#### tf.IndexedSlices.device {#IndexedSlices.device}
+
+The name of the device on which `values` will be produced, or `None`.
+
+- - -
+
+#### tf.IndexedSlices.op {#IndexedSlices.op}
+
+The `Operation` that produces `values` as an output.
+
+
diff --git a/tensorflow/g3doc/api_docs/python/train.md b/tensorflow/g3doc/api_docs/python/train.md
new file mode 100644
index 0000000000..0c88968c5d
--- /dev/null
+++ b/tensorflow/g3doc/api_docs/python/train.md
@@ -0,0 +1,1825 @@
+<!-- This file is machine generated: DO NOT EDIT! -->
+
+# Training
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Optimizers.](#AUTOGENERATED-optimizers.)
+ * [class tf.train.Optimizer](#Optimizer)
+ * [Usage](#AUTOGENERATED-usage)
+ * [Processing gradients before applying them.](#AUTOGENERATED-processing-gradients-before-applying-them.)
+ * [Gating Gradients](#AUTOGENERATED-gating-gradients)
+ * [Slots](#AUTOGENERATED-slots)
+ * [class tf.train.GradientDescentOptimizer](#GradientDescentOptimizer)
+ * [class tf.train.AdagradOptimizer](#AdagradOptimizer)
+ * [class tf.train.MomentumOptimizer](#MomentumOptimizer)
+ * [class tf.train.AdamOptimizer](#AdamOptimizer)
+ * [class tf.train.FtrlOptimizer](#FtrlOptimizer)
+ * [class tf.train.RMSPropOptimizer](#RMSPropOptimizer)
+* [Gradient Computation.](#AUTOGENERATED-gradient-computation.)
+ * [tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)](#gradients)
+ * [class tf.AggregationMethod](#AggregationMethod)
+ * [tf.stop_gradient(input, name=None)](#stop_gradient)
+* [Gradient Clipping](#AUTOGENERATED-gradient-clipping)
+ * [tf.clip_by_value(t, clip_value_min, clip_value_max, name=None)](#clip_by_value)
+ * [tf.clip_by_norm(t, clip_norm, name=None)](#clip_by_norm)
+ * [tf.clip_by_average_norm(t, clip_norm, name=None)](#clip_by_average_norm)
+ * [tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)](#clip_by_global_norm)
+ * [tf.global_norm(t_list, name=None)](#global_norm)
+* [Decaying the learning rate.](#AUTOGENERATED-decaying-the-learning-rate.)
+ * [tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)](#exponential_decay)
+* [Moving Averages.](#AUTOGENERATED-moving-averages.)
+ * [class tf.train.ExponentialMovingAverage](#ExponentialMovingAverage)
+* [Coordinator and QueueRunner.](#AUTOGENERATED-coordinator-and-queuerunner.)
+ * [class tf.train.Coordinator](#Coordinator)
+ * [class tf.train.QueueRunner](#QueueRunner)
+ * [tf.train.add_queue_runner(qr, collection='queue_runners')](#add_queue_runner)
+ * [tf.train.start_queue_runners(sess=None, coord=None, daemon=True, start=True, collection='queue_runners')](#start_queue_runners)
+* [Summary Operations.](#AUTOGENERATED-summary-operations.)
+ * [tf.scalar_summary(tags, values, collections=None, name=None)](#scalar_summary)
+ * [tf.image_summary(tag, tensor, max_images=None, collections=None, name=None)](#image_summary)
+ * [tf.histogram_summary(tag, values, collections=None, name=None)](#histogram_summary)
+ * [tf.nn.zero_fraction(value, name=None)](#zero_fraction)
+ * [tf.merge_summary(inputs, collections=None, name=None)](#merge_summary)
+ * [tf.merge_all_summaries(key='summaries')](#merge_all_summaries)
+* [Adding Summaries to Event Files.](#AUTOGENERATED-adding-summaries-to-event-files.)
+ * [class tf.train.SummaryWriter](#SummaryWriter)
+ * [tf.train.summary_iterator(path)](#summary_iterator)
+* [Training utilities.](#AUTOGENERATED-training-utilities.)
+ * [tf.train.global_step(sess, global_step_tensor)](#global_step)
+ * [tf.train.write_graph(graph_def, logdir, name, as_text=True)](#write_graph)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+This library provides a set of classes and functions that helps train models.
+
+## Optimizers. <div class="md-anchor" id="AUTOGENERATED-optimizers.">{#AUTOGENERATED-optimizers.}</div>
+
+The Optimizer base class provides methods to compute gradients for a loss and
+apply gradients to variables. A collection of subclasses implement classic
+optimization algorithms such as GradientDescent and Adagrad.
+
+You never instantiate the Optimizer class itself, but instead instantiate one
+of the subclasses.
+
+- - -
+
+### class tf.train.Optimizer <div class="md-anchor" id="Optimizer">{#Optimizer}</div>
+
+Base class for optimizers.
+
+This class defines the API to add Ops to train a model. You never use this
+class directly, but instead instantiate one of its subclasses such as
+`GradientDescentOptimizer`, `AdagradOptimizer`, or `MomentumOptimizer`.
+
+### Usage <div class="md-anchor" id="AUTOGENERATED-usage">{#AUTOGENERATED-usage}</div>
+
+```
+# Create an optimizer with the desired parameters.
+opt = GradientDescentOptimizer(learning_rate=0.1)
+# Add Ops to the graph to minimize a cost by updating a list of variables.
+# "cost" is a Tensor, and the list of variables contains variables.Variable
+# objects.
+opt_op = opt.minimize(cost, <list of variables>)
+```
+
+In the training program you will just have to run the returned Op.
+
+```
+# Execute opt_op to do one step of training:
+opt_op.run()
+```
+
+### Processing gradients before applying them. <div class="md-anchor" id="AUTOGENERATED-processing-gradients-before-applying-them.">{#AUTOGENERATED-processing-gradients-before-applying-them.}</div>
+
+Calling `minimize()` takes care of both computing the gradients and
+applying them to the variables. If you want to process the gradients
+before applying them you can instead use the optimizer in three steps:
+
+1. Compute the gradients with `compute_gradients()`.
+2. Process the gradients as you wish.
+3. Apply the processed gradients with `apply_gradients()`.
+
+Example:
+
+```
+# Create an optimizer.
+opt = GradientDescentOptimizer(learning_rate=0.1)
+
+# Compute the gradients for a list of variables.
+grads_and_vars = opt.compute_gradients(loss, <list of variables>)
+
+# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
+# need to the 'gradient' part, for example cap them, etc.
+capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
+
+# Ask the optimizer to apply the capped gradients.
+opt.apply_gradients(capped_grads_and_vars)
+```
+
+- - -
+
+#### tf.train.Optimizer.__init__(use_locking, name) {#Optimizer.__init__}
+
+Create a new Optimizer.
+
+This must be called by the constructors of subclasses.
+
+##### Args:
+
+
+* <b>use_locking</b>: Bool. If True, use locks to prevent concurrent updates
+ to variables.
+* <b>name</b>: A non-empty string. The name to use for accumulators created
+ for the optimizer.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if name is malformed.
+
+
+
+- - -
+
+#### tf.train.Optimizer.minimize(loss, global_step=None, var_list=None, gate_gradients=1, name=None) {#Optimizer.minimize}
+
+Add operations to minimize 'loss' by updating 'var_list'.
+
+This method simply combines calls to compute_gradients() and
+apply_gradients(). If you want to process the gradients before applying them,
+call compute_gradients() and apply_gradients() explicitly instead of using
+this function.
+
+##### Args:
+
+
+* <b>loss</b>: A Tensor containing the value to minimize.
+* <b>global_step</b>: Optional Variable to increment by one after the
+ variables have been updated.
+* <b>var_list</b>: Optional list of variables.Variable to update to minimize
+ 'loss'. Defaults to the list of variables collected in the graph
+ under the key GraphKeys.TRAINABLE_VARIABLES.
+* <b>gate_gradients</b>: How to gate the computation of gradients. Can be
+ GATE_NONE, GATE_OP, or GATE_GRAPH.
+* <b>name</b>: Optional name for the returned operation.
+
+##### Returns:
+
+ An Operation that updates the variables in 'var_list'. If 'global_step'
+ was not None, that operation also increments global_step.
+
+##### Raises:
+
+
+* <b>ValueError</b>: if some of the variables are not variables.Variable objects.
+
+
+- - -
+
+#### tf.train.Optimizer.compute_gradients(loss, var_list=None, gate_gradients=1) {#Optimizer.compute_gradients}
+
+Compute gradients of "loss" for the variables in "var_list".
+
+This is the first part of minimize(). It returns a list
+of (gradient, variable) pairs where "gradient" is the gradient
+for "variable". Note that "gradient" can be a Tensor, a
+IndexedSlices, or None if there is no gradient for the
+given variable.
+
+##### Args:
+
+
+* <b>loss</b>: A Tensor containing the value to minimize.
+* <b>var_list</b>: Optional list of variables.Variable to update to minimize
+ "loss". Defaults to the list of variables collected in the graph
+ under the key GraphKeys.TRAINABLE_VARIABLES.
+* <b>gate_gradients</b>: How to gate the computation of gradients. Can be
+ GATE_NONE, GATE_OP, or GATE_GRAPH.
+
+##### Returns:
+
+ A list of (gradient, variable) pairs.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If var_list contains anything else than variables.Variable.
+* <b>ValueError</b>: If some arguments are invalid.
+
+
+- - -
+
+#### tf.train.Optimizer.apply_gradients(grads_and_vars, global_step=None, name=None) {#Optimizer.apply_gradients}
+
+Apply gradients to variables.
+
+This is the second part of minimize(). It returns an Operation that
+applies gradients.
+
+##### Args:
+
+
+* <b>grads_and_vars</b>: List of (gradient, variable) pairs as returned by
+ compute_gradients().
+* <b>global_step</b>: Optional Variable to increment by one after the
+ variables have been updated.
+* <b>name</b>: Optional name for the returned operation. Default to the
+ name passed to the Optimizer constructor.
+
+##### Returns:
+
+ An Operation that applies the specified gradients. If 'global_step'
+ was not None, that operation also increments global_step.
+
+##### Raises:
+
+
+* <b>TypeError</b>: if grads_and_vars is malformed.
+
+
+
+### Gating Gradients <div class="md-anchor" id="AUTOGENERATED-gating-gradients">{#AUTOGENERATED-gating-gradients}</div>
+
+Both `minimize()` and `compute_gradients()` accept a `gate_gradients` argument
+that controls the degree of parallelism during the application of the
+gradients.
+
+The possible values are: `GATE_NONE`, `GATE_OP`, and `GATE_GRAPH`.
+
+<b>GATE_NONE</b>: Compute and apply gradients in parallel. This provides the
+maximum parallelism in execution, at the cost of some non-reproducibility in
+the results. For example, the two gradients of MatMul depend on the input
+values: with `GATE_NONE` one of the gradients could be applied to one of the
+inputs _before_ the other gradient is computed, resulting in non-reproducible
+results.
+
+<b>GATE_OP</b>: For each Op, make sure all gradients are computed before they
+are used. This prevents race conditions for Ops that generate gradients for
+multiple inputs where the gradients depend on the inputs.
+
+<b>GATE_GRAPH</b>: Make sure all gradients for all variables are computed
+before any one of them is used. This provides the least parallelism but can
+be useful if you want to process all gradients before applying any of them.
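+
+For example, a minimal sketch (assuming a `loss` tensor and an optimizer `opt`
+already exist) that gates all gradients before any of them is applied:
+
+```python
+# Compute every gradient before any variable is updated.
+grads_and_vars = opt.compute_gradients(loss, gate_gradients=opt.GATE_GRAPH)
+train_op = opt.apply_gradients(grads_and_vars)
+```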
+
+### Slots <div class="md-anchor" id="AUTOGENERATED-slots">{#AUTOGENERATED-slots}</div>
+
+Some optimizer subclasses, such as `MomentumOptimizer` and `AdagradOptimizer`,
+allocate and manage additional variables associated with the variables to
+train. These are called <i>Slots</i>. Slots have names and you can ask the
+optimizer for the names of the slots that it uses. Once you have a slot name
+you can ask the optimizer for the variable it created to hold the slot value.
+
+This can be useful if you want to debug a training algorithm, report stats
+about the slots, etc.
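+
+For example, a sketch (assuming a `loss` tensor and a trainable variable `var`
+already exist) of inspecting the accumulator slot of a `MomentumOptimizer`:
+
+```python
+opt = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.9)
+train_op = opt.minimize(loss, var_list=[var])
+print(opt.get_slot_names())                    # e.g. ['momentum']
+momentum_slot = opt.get_slot(var, 'momentum')  # the accumulator Variable
+```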
+
+- - -
+
+#### tf.train.Optimizer.get_slot_names() {#Optimizer.get_slot_names}
+
+Return a list of the names of slots created by the Optimizer.
+
+See get_slot().
+
+##### Returns:
+
+ A list of strings.
+
+
+- - -
+
+#### tf.train.Optimizer.get_slot(var, name) {#Optimizer.get_slot}
+
+Return a slot named "name" created for "var" by the Optimizer.
+
+Some Optimizer subclasses use additional variables. For example
+Momentum and Adagrad use variables to accumulate updates. This method
+gives access to these Variables if for some reason you need them.
+
+Use get_slot_names() to get the list of slot names created by the Optimizer.
+
+##### Args:
+
+
+* <b>var</b>: A variable passed to minimize() or apply_gradients().
+* <b>name</b>: A string.
+
+##### Returns:
+
+ The Variable for the slot if it was created, None otherwise.
+
+
+
+
+- - -
+
+### class tf.train.GradientDescentOptimizer <div class="md-anchor" id="GradientDescentOptimizer">{#GradientDescentOptimizer}</div>
+
+Optimizer that implements the gradient descent algorithm.
+
+- - -
+
+#### tf.train.GradientDescentOptimizer.__init__(learning_rate, use_locking=False, name='GradientDescent') {#GradientDescentOptimizer.__init__}
+
+Construct a new gradient descent optimizer.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A Tensor or a floating point value. The learning
+ rate to use.
+* <b>use_locking</b>: If True, use locks for update operations.
+* <b>name</b>: Optional name prefix for the operations created when applying
+ gradients. Defaults to "GradientDescent".
+
+
+
+- - -
+
+### class tf.train.AdagradOptimizer <div class="md-anchor" id="AdagradOptimizer">{#AdagradOptimizer}</div>
+
+Optimizer that implements the Adagrad algorithm.
+
+- - -
+
+#### tf.train.AdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, use_locking=False, name='Adagrad') {#AdagradOptimizer.__init__}
+
+Construct a new Adagrad optimizer.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A `Tensor` or a floating point value. The learning rate.
+* <b>initial_accumulator_value</b>: A floating point value.
+ Starting value for the accumulators, must be positive.
+* <b>use_locking</b>: If `True` use locks for update operations.
+* <b>name</b>: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Adagrad".
+
+##### Raises:
+
+
+* <b>ValueError</b>: If the initial_accumulator_value is invalid.
+
+
+
+- - -
+
+### class tf.train.MomentumOptimizer <div class="md-anchor" id="MomentumOptimizer">{#MomentumOptimizer}</div>
+
+Optimizer that implements the Momentum algorithm.
+
+- - -
+
+#### tf.train.MomentumOptimizer.__init__(learning_rate, momentum, use_locking=False, name='Momentum') {#MomentumOptimizer.__init__}
+
+Construct a new Momentum optimizer.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A `Tensor` or a floating point value. The learning rate.
+* <b>momentum</b>: A `Tensor` or a floating point value. The momentum.
+* <b>use_locking</b>: If `True` use locks for update operations.
+* <b>name</b>: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Momentum".
+
+
+
+- - -
+
+### class tf.train.AdamOptimizer <div class="md-anchor" id="AdamOptimizer">{#AdamOptimizer}</div>
+
+Optimizer that implements the Adam algorithm.
+
+- - -
+
+#### tf.train.AdamOptimizer.__init__(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name='Adam') {#AdamOptimizer.__init__}
+
+Construct a new Adam optimizer.
+
+Implementation is based on: http://arxiv.org/pdf/1412.6980v7.pdf
+
+Initialization:
+
+```
+m_0 <- 0 (Initialize initial 1st moment vector)
+v_0 <- 0 (Initialize initial 2nd moment vector)
+t <- 0 (Initialize timestep)
+```
+
+The update rule for `variable` with gradient `g` uses an optimization
+described at the end of section 2 of the paper:
+
+```
+t <- t + 1
+lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
+
+m_t <- beta1 * m_{t-1} + (1 - beta1) * g
+v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
+variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
+```
+
+The default value of 1e-8 for epsilon might not be a good default in
+general. For example, when training an Inception network on ImageNet a
+current good choice is 1.0 or 0.1.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A Tensor or a floating point value. The learning rate.
+* <b>beta1</b>: A float value or a constant float tensor.
+ The exponential decay rate for the 1st moment estimates.
+* <b>beta2</b>: A float value or a constant float tensor.
+ The exponential decay rate for the 2nd moment estimates.
+* <b>epsilon</b>: A small constant for numerical stability.
+* <b>use_locking</b>: If True, use locks for update operations.
+* <b>name</b>: Optional name for the operations created when applying gradients.
+ Defaults to "Adam".
+
+
+
+- - -
+
+### class tf.train.FtrlOptimizer <div class="md-anchor" id="FtrlOptimizer">{#FtrlOptimizer}</div>
+
+Optimizer that implements the FTRL algorithm.
+
+- - -
+
+#### tf.train.FtrlOptimizer.__init__(learning_rate, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, use_locking=False, name='Ftrl') {#FtrlOptimizer.__init__}
+
+Construct a new FTRL optimizer.
+
+The FTRL-proximal algorithm (short for Follow-the-Regularized-Leader)
+is described in the paper [Ad Click Prediction: a View from the Trenches](
+https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf).
+
+It can give a good performance vs. sparsity tradeoff.
+
+Ftrl-proximal uses its own global base learning rate and can behave like
+Adagrad with `learning_rate_power=-0.5`, or like gradient descent with
+`learning_rate_power=0.0`.
+
+The effective learning rate is adjusted per parameter, relative to this
+base learning rate as:
+
+```
+effective_learning_rate_i = (learning_rate /
+ pow(k + summed_squared_gradients_for_i, learning_rate_power));
+```
+
+where k is the small constant `initial_accumulator_value`.
+
+Note that the actual regularization coefficient of `|w|^2` in the objective
+function is `1 / lambda_2` if you specify `l2 = lambda_2` as an argument when
+using this function.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A float value or a constant float `Tensor`.
+* <b>learning_rate_power</b>: A float value, must be less than or equal to zero.
+* <b>initial_accumulator_value</b>: The starting value for accumulators.
+ Only positive values are allowed.
+* <b>l1_regularization_strength</b>: A float value, must be greater than or
+ equal to zero.
+* <b>l2_regularization_strength</b>: A float value, must be greater than or
+ equal to zero.
+* <b>use_locking</b>: If `True` use locks for update operations.
+* <b>name</b>: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Ftrl".
+
+##### Raises:
+
+
+* <b>ValueError</b>: if one of the arguments is invalid.
+
+
+
+- - -
+
+### class tf.train.RMSPropOptimizer <div class="md-anchor" id="RMSPropOptimizer">{#RMSPropOptimizer}</div>
+
+Optimizer that implements the RMSProp algorithm.
+
+- - -
+
+#### tf.train.RMSPropOptimizer.__init__(learning_rate, decay, momentum=0.0, epsilon=1e-10, use_locking=False, name='RMSProp') {#RMSPropOptimizer.__init__}
+
+Construct a new RMSProp optimizer.
+
+##### Args:
+
+
+* <b>learning_rate</b>: A Tensor or a floating point value. The learning rate.
+* <b>decay</b>: Discounting factor for the history/coming gradient.
+* <b>momentum</b>: A scalar tensor.
+* <b>epsilon</b>: Small value to avoid a zero denominator.
+* <b>use_locking</b>: If True, use locks for update operations.
+* <b>name</b>: Optional name prefix for the operations created when applying
+ gradients. Defaults to "RMSProp".
+
+
+
+
+## Gradient Computation. <div class="md-anchor" id="AUTOGENERATED-gradient-computation.">{#AUTOGENERATED-gradient-computation.}</div>
+
+TensorFlow provides functions to compute the derivatives for a given
+TensorFlow computation graph, adding operations to the graph. The
+optimizer classes automatically compute derivatives on your graph, but
+creators of new Optimizers or expert users can call the lower-level
+functions below.
+
+- - -
+
+### tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None) <div class="md-anchor" id="gradients">{#gradients}</div>
+
+Constructs symbolic partial derivatives of `ys` w.r.t. x in `xs`.
+
+`ys` and `xs` are each a `Tensor` or a list of tensors. `grad_ys`
+is a list of `Tensor`, holding the gradients received by the
+`ys`. The list must be the same length as `ys`.
+
+`gradients()` adds ops to the graph to output the partial
+derivatives of `ys` with respect to `xs`. It returns a list of
+`Tensor` of length `len(xs)` where each tensor is the `sum(dy/dx)`
+for y in `ys`.
+
+`grad_ys` is a list of tensors of the same length as `ys` that holds
+the initial gradients for each y in `ys`. When `grad_ys` is None,
+we fill in a tensor of '1's of the shape of y for each y in `ys`. A
+user can provide their own initial `grad_ys` to compute the
+derivatives using a different initial gradient for each y (e.g., if
+one wanted to weight the gradient differently for each value in
+each y).
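+
+A minimal sketch of the basic call (assuming tensors `y`, `x1`, and `x2`
+already exist in the graph):
+
+```python
+# One gradient tensor per x, each holding sum(dy/dx) over the ys.
+dy_dx1, dy_dx2 = tf.gradients(y, [x1, x2])
+```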
+
+##### Args:
+
+
+* <b>ys</b>: A `Tensor` or list of tensors to be differentiated.
+* <b>xs</b>: A `Tensor` or list of tensors to be used for differentiation.
+* <b>grad_ys</b>: Optional. A `Tensor` or list of tensors the same size as
+ `ys` and holding the gradients computed for each y in `ys`.
+* <b>name</b>: Optional name to use for grouping all the gradient ops together.
+ Defaults to 'gradients'.
+* <b>colocate_gradients_with_ops</b>: If True, try colocating gradients with
+ the corresponding op.
+* <b>gate_gradients</b>: If True, add a tuple around the gradients returned
+ for an operation. This avoids some race conditions.
+* <b>aggregation_method</b>: Specifies the method used to combine gradient terms.
+ Accepted values are constants defined in the class `AggregationMethod`.
+
+##### Returns:
+
+ A list of `sum(dy/dx)` for each x in `xs`.
+
+##### Raises:
+
+
+* <b>LookupError</b>: if one of the operations between `x` and `y` does not
+ have a registered gradient function.
+* <b>ValueError</b>: if the arguments are invalid.
+
+
+- - -
+
+### class tf.AggregationMethod <div class="md-anchor" id="AggregationMethod">{#AggregationMethod}</div>
+
+A class listing aggregation methods used to combine gradients.
+
+Computing partial derivatives can require aggregating gradient
+contributions. This class lists the various methods that can
+be used to combine gradients in the graph:
+
+* `ADD_N`: All of the gradient terms are summed as part of one
+ operation using the "AddN" op. It has the property that all
+ gradients must be ready before any aggregation is performed.
+* `DEFAULT`: The system-chosen default aggregation method.
+
+
+- - -
+
+### tf.stop_gradient(input, name=None) <div class="md-anchor" id="stop_gradient">{#stop_gradient}</div>
+
+Stops gradient computation.
+
+When executed in a graph, this op outputs its input tensor as-is.
+
+When building ops to compute gradients, this op prevents the contribution of
+its inputs from being taken into account. Normally, the gradient generator
+adds ops to a graph to compute the derivatives of a specified 'loss' by
+recursively finding out inputs that contributed to its computation. If you
+insert this op in the graph, its inputs are masked from the gradient
+generator. They are not taken into account for computing gradients.
+
+This is useful any time you want to compute a value with TensorFlow but need
+to pretend that the value was a constant. Some examples include:
+
+* The *EM* algorithm where the *M-step* should not involve backpropagation
+ through the output of the *E-step*.
+* Contrastive divergence training of Boltzmann machines where, when
+ differentiating the energy function, the training must not backpropagate
+ through the graph that generated the samples from the model.
+* Adversarial training, where no backprop should happen through the adversarial
+ example generation process.
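+
+A minimal sketch of the effect on gradient computation:
+
+```python
+x = tf.Variable(2.0)
+y = tf.stop_gradient(x) * x     # treat the first factor as a constant
+grad = tf.gradients(y, [x])[0]  # evaluates to 2.0 (the constant), not 4.0
+```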
+
+##### Args:
+
+
+* <b>input</b>: A `Tensor`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A `Tensor`. Has the same type as `input`.
+
+
+
+
+## Gradient Clipping <div class="md-anchor" id="AUTOGENERATED-gradient-clipping">{#AUTOGENERATED-gradient-clipping}</div>
+
+TensorFlow provides several operations that you can use to add clipping
+functions to your graph. You can use these functions to perform general data
+clipping, but they're particularly useful for handling exploding or vanishing
+gradients.
+
+- - -
+
+### tf.clip_by_value(t, clip_value_min, clip_value_max, name=None) <div class="md-anchor" id="clip_by_value">{#clip_by_value}</div>
+
+Clips tensor values to a specified min and max.
+
+Given a tensor `t`, this operation returns a tensor of the same type and
+shape as `t` with its values clipped to `clip_value_min` and `clip_value_max`.
+Any values less than `clip_value_min` are set to `clip_value_min`. Any values
+greater than `clip_value_max` are set to `clip_value_max`.
+
+##### Args:
+
+
+* <b>t</b>: A `Tensor`.
+* <b>clip_value_min</b>: A 0-D (scalar) `Tensor`. The minimum value to clip by.
+* <b>clip_value_max</b>: A 0-D (scalar) `Tensor`. The maximum value to clip by.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A clipped `Tensor`.
+
+
+- - -
+
+### tf.clip_by_norm(t, clip_norm, name=None) <div class="md-anchor" id="clip_by_norm">{#clip_by_norm}</div>
+
+Clips tensor values to a maximum L2-norm.
+
+Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
+normalizes `t` so that its L2-norm is less than or equal to `clip_norm`.
+Specifically, if the L2-norm is already less than or equal to `clip_norm`,
+then `t` is not modified. If the L2-norm is greater than `clip_norm`, then
+this operation returns a tensor of the same type and shape as `t` with its
+values set to:
+
+`t * clip_norm / l2norm(t)`
+
+In this case, the L2-norm of the output tensor is `clip_norm`.
+
+This operation is typically used to clip gradients before applying them with
+an optimizer.
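+
+For example, a sketch (assuming an optimizer `opt` and a `loss` tensor already
+exist, and that every variable receives a gradient) that caps each gradient's
+L2-norm at 1.0:
+
+```python
+grads_and_vars = opt.compute_gradients(loss)
+capped = [(tf.clip_by_norm(g, 1.0), v) for g, v in grads_and_vars]
+train_op = opt.apply_gradients(capped)
+```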
+
+##### Args:
+
+
+* <b>t</b>: A `Tensor`.
+* <b>clip_norm</b>: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A clipped `Tensor`.
+
+
+- - -
+
+### tf.clip_by_average_norm(t, clip_norm, name=None) <div class="md-anchor" id="clip_by_average_norm">{#clip_by_average_norm}</div>
+
+Clips tensor values to a maximum average L2-norm.
+
+Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
+normalizes `t` so that its average L2-norm is less than or equal to
+`clip_norm`. Specifically, if the average L2-norm is already less than or
+equal to `clip_norm`, then `t` is not modified. If the average L2-norm is
+greater than `clip_norm`, then this operation returns a tensor of the same
+type and shape as `t` with its values set to:
+
+`t * clip_norm / l2norm_avg(t)`
+
+In this case, the average L2-norm of the output tensor is `clip_norm`.
+
+This operation is typically used to clip gradients before applying them with
+an optimizer.
+
+##### Args:
+
+
+* <b>t</b>: A `Tensor`.
+* <b>clip_norm</b>: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A clipped `Tensor`.
+
+
+- - -
+
+### tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None) <div class="md-anchor" id="clip_by_global_norm">{#clip_by_global_norm}</div>
+
+Clips values of multiple tensors by the ratio of the sum of their norms.
+
+Given a tuple or list of tensors `t_list`, and a clipping ratio `clip_norm`,
+this operation returns a list of clipped tensors `list_clipped`
+and the global norm (`global_norm`) of all tensors in `t_list`. Optionally,
+if you've already computed the global norm for `t_list`, you can specify
+the global norm with `use_norm`.
+
+To perform the clipping, the values `t_list[i]` are set to:
+
+`t_list[i] * clip_norm / max(global_norm, clip_norm)`
+
+where:
+
+`global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))`
+
+If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
+otherwise they're all shrunk by the global ratio.
+
+Any of the entries of `t_list` that are of type None are ignored.
+
+This is the correct way to perform gradient clipping (for example, see
+R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training
+Recurrent Neural Networks". http://arxiv.org/abs/1211.5063)
+
+However, it is slower than `clip_by_norm()` because all the parameters must be
+ready before the clipping operation can be performed.
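+
+For example, a sketch (assuming an optimizer `opt` and a `loss` tensor already
+exist) of the usual clip-then-apply pattern:
+
+```python
+grads, variables = zip(*opt.compute_gradients(loss))
+clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
+train_op = opt.apply_gradients(list(zip(clipped, variables)))
+```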
+
+##### Args:
+
+
+* <b>t_list</b>: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
+* <b>clip_norm</b>: A 0-D (scalar) `Tensor` > 0. The clipping ratio.
+* <b>use_norm</b>: A 0-D (scalar) `Tensor` of type `float` (optional). The global
+ norm to use. If not provided, `global_norm()` is used to compute the norm.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+
+* <b>list_clipped</b>: A list of `Tensors` of the same type as `t_list`.
+* <b>global_norm</b>: A 0-D (scalar) `Tensor` representing the global norm.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `t_list` is not a sequence.
+
+
+- - -
+
+### tf.global_norm(t_list, name=None) <div class="md-anchor" id="global_norm">{#global_norm}</div>
+
+Computes the global norm of multiple tensors.
+
+Given a tuple or list of tensors `t_list`, this operation returns the
+global norm of the elements in all tensors in `t_list`. The global norm is
+computed as:
+
+`global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))`
+
+Any entries in `t_list` that are of type None are ignored.
+
+##### Args:
+
+
+* <b>t_list</b>: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A 0-D (scalar) `Tensor` of type `float`.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If `t_list` is not a sequence.
+
+
+
+## Decaying the learning rate. <div class="md-anchor" id="AUTOGENERATED-decaying-the-learning-rate.">{#AUTOGENERATED-decaying-the-learning-rate.}</div>
+- - -
+
+### tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None) <div class="md-anchor" id="exponential_decay">{#exponential_decay}</div>
+
+Applies exponential decay to the learning rate.
+
+When training a model, it is often recommended to lower the learning rate as
+the training progresses. This function applies an exponential decay function
+to a provided initial learning rate. It requires a `global_step` value to
+compute the decayed learning rate. You can just pass a TensorFlow variable
+that you increment at each training step.
+
+The function returns the decayed learning rate. It is computed as:
+
+```python
+decayed_learning_rate = learning_rate *
+ decay_rate ^ (global_step / decay_steps)
+```
+
+If the argument `staircase` is `True`, then `global_step / decay_steps` is an
+integer division and the decayed learning rate follows a staircase function.
+
+Example: decay every 100000 steps with a base of 0.96:
+
+```python
+...
+global_step = tf.Variable(0, trainable=False)
+starter_learning_rate = 0.1
+learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
+                                           100000, 0.96, staircase=True)
+optimizer = tf.train.GradientDescentOptimizer(learning_rate)
+# Passing global_step to minimize() will increment it at each step.
+optimizer.minimize(...my loss..., global_step=global_step)
+```
+
+##### Args:
+
+
+* <b>learning_rate</b>: A scalar `float32` or `float64` `Tensor` or a
+ Python number. The initial learning rate.
+* <b>global_step</b>: A scalar `int32` or `int64` `Tensor` or a Python number.
+ Global step to use for the decay computation. Must not be negative.
+* <b>decay_steps</b>: A scalar `int32` or `int64` `Tensor` or a Python number.
+ Must be positive. See the decay computation above.
+* <b>decay_rate</b>: A scalar `float32` or `float64` `Tensor` or a
+ Python number. The decay rate.
+* <b>staircase</b>: Boolean. If `True`, decay the learning rate at discrete intervals.
+* <b>name</b>: String. Optional name of the operation. Defaults to 'ExponentialDecay'.
+
+##### Returns:
+
+ A scalar `Tensor` of the same type as `learning_rate`. The decayed
+ learning rate.
+
+
+
+## Moving Averages. <div class="md-anchor" id="AUTOGENERATED-moving-averages.">{#AUTOGENERATED-moving-averages.}</div>
+
+Some training algorithms, such as GradientDescent and Momentum, often benefit
+from maintaining a moving average of variables during optimization. Using the
+moving averages for evaluations often improves results significantly.
+
+- - -
+
+### class tf.train.ExponentialMovingAverage <div class="md-anchor" id="ExponentialMovingAverage">{#ExponentialMovingAverage}</div>
+
+Maintains moving averages of variables by employing an exponential decay.
+
+When training a model, it is often beneficial to maintain moving averages of
+the trained parameters. Evaluations that use averaged parameters sometimes
+produce significantly better results than the final trained values.
+
+The `apply()` method adds shadow copies of trained variables and adds ops that
+maintain a moving average of the trained variables in their shadow copies.
+It is used when building the training model. The ops that maintain moving
+averages are typically run after each training step.
+The `average()` and `average_name()` methods give access to the shadow
+variables and their names. They are useful when building an evaluation
+model, or when restoring a model from a checkpoint file. They help use the
+moving averages in place of the last trained values for evaluations.
+
+The moving averages are computed using exponential decay. You specify the
+decay value when creating the `ExponentialMovingAverage` object. The shadow
+variables are initialized with the same initial values as the trained
+variables. When you run the ops to maintain the moving averages, each
+shadow variable is updated with the formula:
+
+ `shadow_variable -= (1 - decay) * (shadow_variable - variable)`
+
+This is mathematically equivalent to the classic formula below, but the use
+of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless
+updates to the variables:
+
+ `shadow_variable = decay * shadow_variable + (1 - decay) * variable`
+
+Reasonable values for `decay` are close to 1.0, typically in the
+multiple-nines range: 0.999, 0.9999, etc.
+
+Example usage when creating a training model:
+
+```python
+# Create variables.
+var0 = tf.Variable(...)
+var1 = tf.Variable(...)
+# ... use the variables to build a training model...
+...
+# Create an op that applies the optimizer. This is what we usually
+# would use as a training op.
+opt_op = opt.minimize(my_loss, [var0, var1])
+
+# Create an ExponentialMovingAverage object
+ema = tf.train.ExponentialMovingAverage(decay=0.9999)
+
+# Create the shadow variables, and add ops to maintain moving averages
+# of var0 and var1.
+maintain_averages_op = ema.apply([var0, var1])
+
+# Create an op that will update the moving averages after each training
+# step. This is what we will use in place of the usual training op.
+with tf.control_dependencies([opt_op]):
+ training_op = tf.group(maintain_averages_op)
+
+...train the model by running training_op...
+```
+
+There are two ways to use the moving averages for evaluations:
+
+* Build a model that uses the shadow variables instead of the variables.
+ For this, use the `average()` method which returns the shadow variable
+ for a given variable.
+* Build a model normally but load the checkpoint files to evaluate by using
+ the shadow variable names. For this use the `average_name()` method. See
+ the [Saver class](train.md#Saver) for more information on restoring saved
+ variables.
+
+Example of restoring the shadow variable values:
+
+```python
+# Create a Saver that loads variables from their saved shadow values.
+shadow_var0_name = ema.average_name(var0)
+shadow_var1_name = ema.average_name(var1)
+saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
+saver.restore(...checkpoint filename...)
+# var0 and var1 now hold the moving average values
+```
+
+- - -
+
+#### tf.train.ExponentialMovingAverage.__init__(decay, num_updates=None, name='ExponentialMovingAverage') {#ExponentialMovingAverage.__init__}
+
+Creates a new ExponentialMovingAverage object.
+
+The `Apply()` method has to be called to create shadow variables and add
+ops to maintain moving averages.
+
+The optional `num_updates` parameter allows one to tweak the decay rate
+dynamically. It is typical to pass the count of training steps, usually
+kept in a variable that is incremented at each step, in which case the
+decay rate is lower at the start of training. This makes moving averages
+move faster. If passed, the actual decay rate used is:
+
+ `min(decay, (1 + num_updates) / (10 + num_updates))`
+
+##### Args:
+
+
+* <b>decay</b>: Float. The decay to use.
+* <b>num_updates</b>: Optional count of number of updates applied to variables.
+* <b>name</b>: String. Optional prefix name to use for the name of ops added in
+ `Apply()`.
+
+
+- - -
+
+#### tf.train.ExponentialMovingAverage.apply(var_list=None) {#ExponentialMovingAverage.apply}
+
+Maintains moving averages of variables.
+
+`var_list` must be a list of `Variable` or `Tensor` objects. This method
+creates shadow variables for all elements of `var_list`. Shadow variables
+for `Variable` objects are initialized to the variable's initial value.
+For `Tensor` objects, the shadow variables are initialized to 0.
+
+Shadow variables are created with `trainable=False` and added to the
+`GraphKeys.ALL_VARIABLES` collection. They will be returned by calls to
+`tf.all_variables()`.
+
+Returns an op that updates all shadow variables as described above.
+
+Note that `apply()` can be called multiple times with different lists of
+variables.
+
+##### Args:
+
+
+* <b>var_list</b>: A list of Variable or Tensor objects. The variables
+ and Tensors must be of types float32 or float64.
+
+##### Returns:
+
+ An Operation that updates the moving averages.
+
+##### Raises:
+
+
+* <b>TypeError</b>: If the arguments are not all float32 or float64.
+* <b>ValueError</b>: If the moving average of one of the variables is already
+ being computed.
+
+
+- - -
+
+#### tf.train.ExponentialMovingAverage.average_name(var) {#ExponentialMovingAverage.average_name}
+
+Returns the name of the `Variable` holding the average for `var`.
+
+The typical scenario for `ExponentialMovingAverage` is to compute moving
+averages of variables during training, and restore the variables from the
+computed moving averages during evaluations.
+
+To restore variables, you have to know the name of the shadow variables.
+That name and the original variable can then be passed to a `Saver()` object
+to restore the variable from the moving average value with:
+ `saver = tf.train.Saver({ema.average_name(var): var})`
+
+`average_name()` can be called whether or not `apply()` has been called.
+
+##### Args:
+
+
+* <b>var</b>: A `Variable` object.
+
+##### Returns:
+
+ A string: the name of the variable that will be used or was used
+  by the `ExponentialMovingAverage` class to hold the moving average of
+ `var`.
+
+
+- - -
+
+#### tf.train.ExponentialMovingAverage.average(var) {#ExponentialMovingAverage.average}
+
+Returns the `Variable` holding the average of `var`.
+
+##### Args:
+
+
+* <b>var</b>: A `Variable` object.
+
+##### Returns:
+
+ A `Variable` object or `None` if the moving average of `var`
+  is not maintained.
+
+
+
+
+## Coordinator and QueueRunner. <div class="md-anchor" id="AUTOGENERATED-coordinator-and-queuerunner.">{#AUTOGENERATED-coordinator-and-queuerunner.}</div>
+
+See [Threading and Queues](../../how_tos/threading_and_queues/index.md)
+for how to use threads and queues. For documentation on the Queue API,
+see [Queues](../../api_docs/python/io_ops.md#queues).
+
+- - -
+
+### class tf.train.Coordinator <div class="md-anchor" id="Coordinator">{#Coordinator}</div>
+
+A coordinator for threads.
+
+This class implements a simple mechanism to coordinate the termination of a
+set of threads.
+
+#### Usage:
+
+```python
+# Create a coordinator.
+coord = Coordinator()
+# Start a number of threads, passing the coordinator to each of them.
+...start thread 1...(coord, ...)
+...start thread N...(coord, ...)
+# Wait for all the threads to terminate.
+coord.join(threads)
+```
+
+Any of the threads can call `coord.request_stop()` to ask for all the threads
+to stop. To cooperate with the requests, each thread must check for
+`coord.should_stop()` on a regular basis. `coord.should_stop()` returns
+`True` as soon as `coord.request_stop()` has been called.
+
+A typical thread running with a Coordinator will do something like:
+
+```python
+while not coord.should_stop():
+ ...do some work...
+```
+
+#### Exception handling:
+
+A thread can report an exception to the Coordinator as part of the
+`should_stop()` call. The exception will be re-raised from the
+`coord.join()` call.
+
+Thread code:
+
+```python
+try:
+ while not coord.should_stop():
+ ...do some work...
+except Exception as e:
+ coord.request_stop(e)
+```
+
+Main code:
+
+```python
+try:
+ ...
+ coord = Coordinator()
+ # Start a number of threads, passing the coordinator to each of them.
+ ...start thread 1...(coord, ...)
+ ...start thread N...(coord, ...)
+ # Wait for all the threads to terminate.
+ coord.join(threads)
+except Exception as e:
+ ...exception that was passed to coord.request_stop()
+```
+
+#### Grace period for stopping:
+
+After a thread has called `coord.request_stop()` the other threads have a
+fixed time to stop, this is called the 'stop grace period' and defaults to 2
+minutes. If any of the threads is still alive after the grace period expires,
+`coord.join()` raises a RuntimeError reporting the laggards.
+
+```
+try:
+ ...
+ coord = Coordinator()
+ # Start a number of threads, passing the coordinator to each of them.
+ ...start thread 1...(coord, ...)
+ ...start thread N...(coord, ...)
+ # Wait for all the threads to terminate, give them 10s grace period
+ coord.join(threads, stop_grace_period_secs=10)
+except RuntimeError:
+ ...one of the threads took more than 10s to stop after request_stop()
+ ...was called.
+except Exception:
+ ...exception that was passed to coord.request_stop()
+```
+- - -
+
+#### tf.train.Coordinator.__init__() {#Coordinator.__init__}
+
+Create a new Coordinator.
+
+
+- - -
+
+#### tf.train.Coordinator.join(threads, stop_grace_period_secs=120) {#Coordinator.join}
+
+Wait for threads to terminate.
+
+Blocks until all 'threads' have terminated or request_stop() is called.
+
+After the threads stop, if an 'exc_info' was passed to request_stop, that
+exception is re-raised.
+
+Grace period handling: When request_stop() is called, threads are given
+'stop_grace_period_secs' seconds to terminate. If any of them is still
+alive after that period expires, a RuntimeError is raised. Note that if
+an 'exc_info' was passed to request_stop() then it is raised instead of
+that RuntimeError.
+
+##### Args:
+
+
+* <b>threads</b>: List of `threading.Thread` objects. The started threads to join.
+* <b>stop_grace_period_secs</b>: Number of seconds given to threads to stop after
+ request_stop() has been called.
+
+##### Raises:
+
+
+* <b>RuntimeError</b>: If any thread is still alive after request_stop()
+ is called and the grace period expires.
+
+
+- - -
+
+#### tf.train.Coordinator.request_stop(ex=None) {#Coordinator.request_stop}
+
+Request that the threads stop.
+
+After this is called, calls to should_stop() will return True.
+
+##### Args:
+
+
+* <b>ex</b>: Optional Exception, or Python 'exc_info' tuple as returned by
+ sys.exc_info(). If this is the first call to request_stop() the
+ corresponding exception is recorded and re-raised from join().
+
+
+- - -
+
+#### tf.train.Coordinator.should_stop() {#Coordinator.should_stop}
+
+Check if stop was requested.
+
+##### Returns:
+
+ True if a stop was requested.
+
+
+- - -
+
+#### tf.train.Coordinator.wait_for_stop(timeout=None) {#Coordinator.wait_for_stop}
+
+Wait till the Coordinator is told to stop.
+
+##### Args:
+
+
+* <b>timeout</b>: float. Sleep for up to that many seconds waiting for
+ should_stop() to become True.
+
+##### Returns:
+
+  True if the Coordinator is told to stop, False if the timeout expired.
+
+
+
+- - -
+
+### class tf.train.QueueRunner <div class="md-anchor" id="QueueRunner">{#QueueRunner}</div>
+
+Holds a list of enqueue operations for a queue, each to be run in a thread.
+
+Queues are a convenient TensorFlow mechanism to compute tensors
+asynchronously using multiple threads. For example, in the canonical 'Input
+Reader' setup one set of threads generates filenames in a queue; a second set
+of threads reads records from the files, processes them, and enqueues tensors
+on a second queue; a third set of threads dequeues these input records to
+construct batches and runs them through training operations.
+
+There are several delicate issues when running multiple threads that way:
+closing the queues in sequence as the input is exhausted, correctly catching
+and reporting exceptions, etc.
+
+The `QueueRunner`, combined with the `Coordinator`, helps handle these issues.
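+
+For example, a sketch of wiring the two together (assuming a `queue` and an
+`enqueue_op` were built elsewhere):
+
+```python
+qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)
+tf.train.add_queue_runner(qr)
+
+sess = tf.Session()
+coord = tf.train.Coordinator()
+threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+# ...run training ops that dequeue from `queue`...
+coord.request_stop()
+coord.join(threads)
+```
+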
+- - -
+
+#### tf.train.QueueRunner.__init__(queue, enqueue_ops) {#QueueRunner.__init__}
+
+Create a QueueRunner.
+
+On construction the `QueueRunner` adds an op to close the queue. That op
+will be run if the enqueue ops raise exceptions.
+
+When you later call the `create_threads()` method, the `QueueRunner` will
+create one thread for each op in `enqueue_ops`. Each thread will run its
+enqueue op in parallel with the other threads. The enqueue ops do not have
+to all be the same op, but it is expected that they all enqueue tensors in
+`queue`.
+
+##### Args:
+
+
+* <b>queue</b>: A `Queue`.
+* <b>enqueue_ops</b>: List of enqueue ops to run in threads later.
+
+
+- - -
+
+#### tf.train.QueueRunner.create_threads(sess, coord=None, daemon=False, start=False) {#QueueRunner.create_threads}
+
+Create threads to run the enqueue ops.
+
+This method requires a session in which the graph was launched. It creates
+a list of threads, optionally starting them. There is one thread for each
+op passed in `enqueue_ops`.
+
+The `coord` argument is an optional coordinator, that the threads will use
+to terminate together and report exceptions. If a coordinator is given,
+this method starts an additional thread to close the queue when the
+coordinator requests a stop.
+
+This method may be called again as long as all threads from a previous call
+have stopped.
+
+##### Args:
+
+
+* <b>sess</b>: A `Session`.
+* <b>coord</b>: Optional `Coordinator` object for reporting errors and checking
+ stop conditions.
+* <b>daemon</b>: Boolean. If `True` make the threads daemon threads.
+* <b>start</b>: Boolean. If `True` starts the threads. If `False` the
+ caller must call the `start()` method of the returned threads.
+
+##### Returns:
+
+ A list of threads.
+
+##### Raises:
+
+
+* <b>RuntimeError</b>: If threads from a previous call to `create_threads()` are
+ still running.
+
+
+- - -
+
+#### tf.train.QueueRunner.exceptions_raised {#QueueRunner.exceptions_raised}
+
+Exceptions raised but not handled by the `QueueRunner` threads.
+
+Exceptions raised in queue runner threads are handled in one of two ways
+depending on whether or not a `Coordinator` was passed to
+`create_threads()`:
+
+* With a `Coordinator`, exceptions are reported to the coordinator and
+ forgotten by the `QueueRunner`.
+* Without a `Coordinator`, exceptions are captured by the `QueueRunner` and
+ made available in this `exceptions_raised` property.
+
+##### Returns:
+
+ A list of Python `Exception` objects. The list is empty if no exception
+ was captured. (No exceptions are captured when using a Coordinator.)
+
+
+- - -
+
+### tf.train.add_queue_runner(qr, collection='queue_runners') <div class="md-anchor" id="add_queue_runner">{#add_queue_runner}</div>
+
+Adds a `QueueRunner` to a collection in the graph.
+
+When building a complex model that uses many queues it is often difficult to
+gather all the queue runners that need to be run. This convenience function
+allows you to add a queue runner to a well known collection in the graph.
+
+The companion method `start_queue_runners()` can be used to start threads for
+all the collected queue runners.
+
+##### Args:
+
+
+* <b>qr</b>: A `QueueRunner`.
+* <b>collection</b>: A `GraphKey` specifying the graph collection to add
+ the queue runner to. Defaults to `GraphKeys.QUEUE_RUNNERS`.
+
+
+- - -
+
+### tf.train.start_queue_runners(sess=None, coord=None, daemon=True, start=True, collection='queue_runners') <div class="md-anchor" id="start_queue_runners">{#start_queue_runners}</div>
+
+Starts all queue runners collected in the graph.
+
+This is a companion method to `add_queue_runner()`. It just starts
+threads for all queue runners collected in the graph. It returns
+the list of all threads.
+
+##### Args:
+
+
+* <b>sess</b>: `Session` used to run the queue ops. Defaults to the
+ default session.
+* <b>coord</b>: Optional `Coordinator` for coordinating the started threads.
+* <b>daemon</b>: Whether the threads should be marked as `daemons`, meaning
+ they don't block program exit.
+* <b>start</b>: Set to `False` to only create the threads, not start them.
+* <b>collection</b>: A `GraphKey` specifying the graph collection to
+ get the queue runners from. Defaults to `GraphKeys.QUEUE_RUNNERS`.
+
+##### Returns:
+
+ A list of threads.
+
+
+
+## Summary Operations. <div class="md-anchor" id="AUTOGENERATED-summary-operations.">{#AUTOGENERATED-summary-operations.}</div>
+
+The following ops output
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+protocol buffers as serialized string tensors.
+
+You can fetch the output of a summary op in a session, and pass it to a
+[SummaryWriter](train.md#SummaryWriter) to append it to an event file. You can
+then use TensorBoard to visualize the contents of the event files. See
+[TensorBoard and Summaries](../../how_tos/summaries_and_tensorboard/index.md)
+for more details.
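+
+A minimal sketch of that flow (the log directory is illustrative, and `sess`
+and `step` are assumed to exist):
+
+```python
+merged = tf.merge_all_summaries()
+writer = tf.train.SummaryWriter('/tmp/train_logs', sess.graph_def)
+summary_str = sess.run(merged)
+writer.add_summary(summary_str, global_step=step)
+```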
+
+- - -
+
+### tf.scalar_summary(tags, values, collections=None, name=None) <div class="md-anchor" id="scalar_summary">{#scalar_summary}</div>
+
+Outputs a `Summary` protocol buffer with scalar values.
+
+The input `tags` and `values` must have the same shape. The generated
+summary has a summary value for each tag-value pair in `tags` and `values`.
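+
+A minimal sketch (assuming a scalar `loss` tensor exists in the graph):
+
+```python
+loss_summary = tf.scalar_summary('loss', loss)
+```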
+
+##### Args:
+
+
+* <b>tags</b>: A 1-D `string` `Tensor`. Tags for the summaries.
+* <b>values</b>: A 1-D `float32` or `float64` Tensor. Values for the summaries.
+* <b>collections</b>: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+
+
+- - -
+
+### tf.image_summary(tag, tensor, max_images=None, collections=None, name=None) <div class="md-anchor" id="image_summary">{#image_summary}</div>
+
+Outputs a `Summary` protocol buffer with images.
+
+The summary has up to `max_images` summary values containing images. The
+images are built from `tensor` which must be 4-D with shape `[batch_size,
+height, width, channels]` and where `channels` can be:
+
+* 1: `tensor` is interpreted as Grayscale.
+* 3: `tensor` is interpreted as RGB.
+* 4: `tensor` is interpreted as RGBA.
+
+The images have the same number of channels as the input tensor. Their values
+are normalized, one image at a time, to fit in the range `[0, 255]`. The
+op uses two different normalization algorithms:
+
+* If the input values are all positive, they are rescaled so the largest one
+ is 255.
+
+* If any input value is negative, the values are shifted so input value 0.0
+ is at 127. They are then rescaled so that either the smallest value is 0,
+ or the largest one is 255.
+
+The `tag` argument is a scalar `Tensor` of type `string`. It is used to
+build the `tag` of the summary values:
+
+* If `max_images` is 1, the summary value tag is '*tag*/image'.
+* If `max_images` is greater than 1, the summary value tags are
+ generated sequentially as '*tag*/image/0', '*tag*/image/1', etc.
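+
+A minimal sketch (assuming `images` is a 4-D `float32` batch of RGB images):
+
+```python
+image_summ = tf.image_summary('training_images', images, max_images=3)
+```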
+
+##### Args:
+
+
+* <b>tag</b>: A scalar `Tensor` of type `string`. Used to build the `tag`
+ of the summary values.
+* <b>tensor</b>: A 4-D `float32` `Tensor` of shape `[batch_size, height, width,
+ channels]` where `channels` is 1, 3, or 4.
+* <b>max_images</b>: Max number of batch elements to generate images for.
+* <b>collections</b>: Optional list of ops.GraphKeys. The collections to add the
+ summary to. Defaults to [ops.GraphKeys.SUMMARIES]
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+
+
+- - -
+
+### tf.histogram_summary(tag, values, collections=None, name=None) <div class="md-anchor" id="histogram_summary">{#histogram_summary}</div>
+
+Outputs a `Summary` protocol buffer with a histogram.
+
+The generated
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+has one summary value containing a histogram for `values`.
+
+This op reports an `OutOfRange` error if any value is not finite.
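+
+A minimal sketch (assuming `weights` is a `float32` tensor, for example a
+layer's weight matrix):
+
+```python
+weights_summ = tf.histogram_summary('weights', weights)
+```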
+
+##### Args:
+
+
+* <b>tag</b>: A `string` `Tensor`. 0-D. Tag to use for the summary value.
+* <b>values</b>: A `float32` `Tensor`. Any shape. Values to use to build the
+ histogram.
+* <b>collections</b>: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+
+
+- - -
+
+### tf.nn.zero_fraction(value, name=None) <div class="md-anchor" id="zero_fraction">{#zero_fraction}</div>
+
+Returns the fraction of zeros in `value`.
+
+If `value` is empty, the result is `nan`.
+
+This is useful in summaries to measure and report sparsity. For example,
+
+    z = tf.nn.relu(...)
+    summ = tf.scalar_summary('sparsity', tf.nn.zero_fraction(z))
+
+##### Args:
+
+
+* <b>value</b>: A tensor of numeric type.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ The fraction of zeros in `value`, with type `float32`.
+
+
+
+- - -
+
+### tf.merge_summary(inputs, collections=None, name=None) <div class="md-anchor" id="merge_summary">{#merge_summary}</div>
+
+Merges summaries.
+
+This op creates a
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+protocol buffer that contains the union of all the values in the input
+summaries.
+
+When the Op is run, it reports an `InvalidArgument` error if multiple values
+in the summaries to merge use the same tag.
+
+##### Args:
+
+
+* <b>inputs</b>: A list of `string` `Tensor` objects containing serialized `Summary`
+ protocol buffers.
+* <b>collections</b>: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+* <b>name</b>: A name for the operation (optional).
+
+##### Returns:
+
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer resulting from the merging.
+
+
+- - -
+
+### tf.merge_all_summaries(key='summaries') <div class="md-anchor" id="merge_all_summaries">{#merge_all_summaries}</div>
+
+Merges all summaries collected in the default graph.
+
+##### Args:
+
+
+* <b>key</b>: `GraphKey` used to collect the summaries. Defaults to
+ `GraphKeys.SUMMARIES`.
+
+##### Returns:
+
+ If no summaries were collected, returns None. Otherwise returns a scalar
+  `Tensor` of type `string` containing the serialized `Summary` protocol
+ buffer resulting from the merging.
+
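+For example, a common pattern (sketched here; `sess`, `feed_dict`, and `step`
+are assumed to exist, the log directory is arbitrary, and the `SummaryWriter`
+is described in the next section) is to merge everything once and evaluate the
+single merged op each training step:
+
+```python
+merged = tf.merge_all_summaries()
+writer = tf.train.SummaryWriter('/tmp/logs')
+
+# Inside the training loop: evaluate the merged summary and write it out.
+summary_str = sess.run(merged, feed_dict=feed_dict)
+writer.add_summary(summary_str, step)
+```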
+
+
+## Adding Summaries to Event Files. <div class="md-anchor" id="AUTOGENERATED-adding-summaries-to-event-files.">{#AUTOGENERATED-adding-summaries-to-event-files.}</div>
+
+See [Summaries and
+TensorBoard](../../how_tos/summaries_and_tensorboard/index.md) for an
+overview of summaries, event files, and visualization in TensorBoard.
+
+- - -
+
+### class tf.train.SummaryWriter <div class="md-anchor" id="SummaryWriter">{#SummaryWriter}</div>
+
+Writes `Summary` protocol buffers to event files.
+
+The `SummaryWriter` class provides a mechanism to create an event file in a
+given directory and add summaries and events to it. The class updates the
+file contents asynchronously. This allows a training program to call methods
+to add data to the file directly from the training loop, without slowing down
+training.
+
+- - -
+
+#### tf.train.SummaryWriter.__init__(logdir, graph_def=None, max_queue=10, flush_secs=120) {#SummaryWriter.__init__}
+
+Creates a `SummaryWriter` and an event file.
+
+On construction the summary writer creates a new event file in `logdir`.
+This event file will contain `Event` protocol buffers constructed when you
+call one of the following functions: `add_summary()`, `add_event()`, or
+`add_graph()`.
+
+If you pass a `graph_def` protocol buffer to the constructor it is added to
+the event file. (This is equivalent to calling `add_graph()` later).
+
+TensorBoard will pick up the graph from the file and display it graphically so
+you can interactively explore the graph you built. You will usually pass
+the graph from the session in which you launched it:
+
+```python
+...create a graph...
+# Launch the graph in a session.
+sess = tf.Session()
+# Create a summary writer, add the 'graph_def' to the event file.
+writer = tf.train.SummaryWriter(<some-directory>, sess.graph_def)
+```
+
+The other arguments to the constructor control the asynchronous writes to
+the event file:
+
+* `flush_secs`: How often, in seconds, to flush the added summaries
+ and events to disk.
+* `max_queue`: Maximum number of summaries or events pending to be
+  written to disk before one of the 'add' calls blocks.
+
+##### Args:
+
+
+* <b>logdir</b>: A string. Directory where event file will be written.
+* <b>graph_def</b>: A `GraphDef` protocol buffer.
+* <b>max_queue</b>: Integer. Size of the queue for pending events and summaries.
+* <b>flush_secs</b>: Number. How often, in seconds, to flush the
+ pending events and summaries to disk.
+
+
+
+- - -
+
+#### tf.train.SummaryWriter.add_summary(summary, global_step=None) {#SummaryWriter.add_summary}
+
+Adds a `Summary` protocol buffer to the event file.
+
+This method wraps the provided summary in an `Event` protocol buffer
+and adds it to the event file.
+
+You can pass the output of any summary op, as-is, to this function. You
+can also pass a `Summary` protocol buffer that you manufacture with your
+own data. This is commonly done to report evaluation results in event
+files.
+
+##### Args:
+
+
+* <b>summary</b>: A `Summary` protocol buffer, optionally serialized as a string.
+* <b>global_step</b>: Number. Optional global step value to record with the
+ summary.
+
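+For example, a minimal sketch of the second case (the tag, value, and step are
+illustrative, `writer` is assumed to be an existing `SummaryWriter`, and
+`summary_pb2` is the module generated from `summary.proto`):
+
+```python
+from tensorflow.core.framework import summary_pb2
+
+# Build a Summary proto by hand to report an evaluation metric.
+summary = summary_pb2.Summary(
+    value=[summary_pb2.Summary.Value(tag='eval/accuracy', simple_value=0.94)])
+writer.add_summary(summary, global_step=1000)
+```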
+
+- - -
+
+#### tf.train.SummaryWriter.add_event(event) {#SummaryWriter.add_event}
+
+Adds an event to the event file.
+
+##### Args:
+
+
+* <b>event</b>: An `Event` protocol buffer.
+
+
+- - -
+
+#### tf.train.SummaryWriter.add_graph(graph_def, global_step=None) {#SummaryWriter.add_graph}
+
+Adds a `GraphDef` protocol buffer to the event file.
+
+The graph described by the protocol buffer will be displayed by
+TensorBoard. Most users pass a graph in the constructor instead.
+
+##### Args:
+
+
+* <b>graph_def</b>: A `GraphDef` protocol buffer.
+* <b>global_step</b>: Number. Optional global step counter to record with the
+ graph.
+
+
+
+- - -
+
+#### tf.train.SummaryWriter.flush() {#SummaryWriter.flush}
+
+Flushes the event file to disk.
+
+Call this method to make sure that all pending events have been written to
+disk.
+
+
+- - -
+
+#### tf.train.SummaryWriter.close() {#SummaryWriter.close}
+
+Flushes the event file to disk and closes the file.
+
+Call this method when you do not need the summary writer anymore.
+
+
+
+- - -
+
+### tf.train.summary_iterator(path) <div class="md-anchor" id="summary_iterator">{#summary_iterator}</div>
+
+An iterator for reading `Event` protocol buffers from an event file.
+
+You can use this function to read events written to an event file. It returns
+a Python iterator that yields `Event` protocol buffers.
+
+Example: Print the contents of an events file.
+
+```python
+for e in tf.train.summary_iterator(path_to_events_file):
+ print e
+```
+
+Example: Print selected summary values.
+
+```python
+# This example supposes that the events file contains summaries with a
+# summary value tag 'loss'. These could have been added by calling
+# `add_summary()`, passing the output of a scalar summary op created
+# with: `tf.scalar_summary(['loss'], loss_tensor)`.
+for e in tf.train.summary_iterator(path_to_events_file):
+ for v in e.summary.value:
+ if v.tag == 'loss':
+ print v.simple_value
+```
+
+See the protocol buffer definitions of
+[Event](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/util/event.proto)
+and
+[Summary](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+for more information about their attributes.
+
+##### Args:
+
+
+* <b>path</b>: The path to an event file created by a `SummaryWriter`.
+
+##### Yields:
+
+ `Event` protocol buffers.
+
+
+
+## Training utilities. <div class="md-anchor" id="AUTOGENERATED-training-utilities.">{#AUTOGENERATED-training-utilities.}</div>
+
+- - -
+
+### tf.train.global_step(sess, global_step_tensor) <div class="md-anchor" id="global_step">{#global_step}</div>
+
+Small helper to get the global step.
+
+```python
+# Creates a variable to hold the global_step.
+global_step_tensor = tf.Variable(10, trainable=False, name='global_step')
+# Creates a session.
+sess = tf.Session()
+# Initializes the variable.
+sess.run(global_step_tensor.initializer)
+print 'global_step:', tf.train.global_step(sess, global_step_tensor)
+
+# global_step: 10
+```
+
+##### Args:
+
+
+* <b>sess</b>: A TensorFlow `Session` object.
+* <b>global_step_tensor</b>: `Tensor` or the `name` of the operation that contains
+ the global step.
+
+##### Returns:
+
+ The global step value.
+
+
+- - -
+
+### tf.train.write_graph(graph_def, logdir, name, as_text=True) <div class="md-anchor" id="write_graph">{#write_graph}</div>
+
+Writes a graph proto on disk.
+
+The graph is written as an ASCII proto if `as_text` is `True`; otherwise it is
+written as a binary proto.
+
+```python
+v = tf.Variable(0, name='my_variable')
+sess = tf.Session()
+tf.train.write_graph(sess.graph_def, '/tmp/my-model', 'train.pbtxt')
+```
+
+##### Args:
+
+
+* <b>graph_def</b>: A `GraphDef` protocol buffer.
+* <b>logdir</b>: Directory where to write the graph.
+* <b>name</b>: Filename for the graph.
+* <b>as_text</b>: If `True`, writes the graph as an ASCII proto.
+
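+To write the same graph as a binary proto instead, set `as_text` to `False`
+(a sketch; the file name is arbitrary):
+
+```python
+tf.train.write_graph(sess.graph_def, '/tmp/my-model', 'train.pb', as_text=False)
+```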
+
diff --git a/tensorflow/g3doc/extras/README.txt b/tensorflow/g3doc/extras/README.txt
new file mode 100644
index 0000000000..2c9682d2fb
--- /dev/null
+++ b/tensorflow/g3doc/extras/README.txt
@@ -0,0 +1,2 @@
+This directory holds extra files we'd like to be able
+to link to and serve from within tensorflow.org
diff --git a/tensorflow/g3doc/extras/tensorflow-whitepaper2015.bib b/tensorflow/g3doc/extras/tensorflow-whitepaper2015.bib
new file mode 100644
index 0000000000..d04168ab39
--- /dev/null
+++ b/tensorflow/g3doc/extras/tensorflow-whitepaper2015.bib
@@ -0,0 +1,45 @@
+@misc{tensorflow2015-whitepaper,
+title={{TensorFlow}: Large-Scale Machine Learning on Heterogeneous Systems},
+url={http://www.tensorflow.org/extras/tensorflow-whitepaper2015.pdf},
+note={Software available from www.tensorflow.org},
+author={
+ Martin~Abadi and
+ Ashish~Agarwal and
+ Paul~Barham and
+ Eugene~Brevdo and
+ Zhifeng~Chen and
+ Craig~Citro and
+ Greg~Corrado and
+ Andy~Davis and
+ Jeffrey~Dean and
+ Matthieu~Devin and
+ Sanjay~Ghemawat and
+ Ian~Goodfellow and
+ Andrew~Harp and
+ Geoffrey~Irving and
+ Michael~Isard and
+ Yangqing Jia and
+ Lukasz~Kaiser and
+ Manjunath~Kudlur and
+ Josh~Levenberg and
+ Dan~Man\'{e} and
+ Rajat~Monga and
+ Sherry~Moore and
+ Derek~Murray and
+ Chris~Olah and
+ Jonathon~Shlens and
+ Benoit~Steiner and
+ Ilya~Sutskever and
+ Kunal~Talwar and
+ Paul~Tucker and
+ Vincent~Vanhoucke and
+ Vijay~Vasudevan and
+  Fernanda~Vi\'{e}gas and
+ Oriol~Vinyals and
+ Pete~Warden and
+  Martin~Wattenberg and
+ Martin~Wicke and
+ Yuan~Yu and
+ Xiaoqiang~Zheng},
+ year={2015},
+}
diff --git a/tensorflow/g3doc/get_started/basic_usage.md b/tensorflow/g3doc/get_started/basic_usage.md
new file mode 100644
index 0000000000..04c3334028
--- /dev/null
+++ b/tensorflow/g3doc/get_started/basic_usage.md
@@ -0,0 +1,273 @@
+# Basic Usage
+
+To use TensorFlow you need to understand how TensorFlow:
+
+* Represents computations as graphs.
+* Executes graphs in the context of `Sessions`.
+* Represents data as tensors.
+* Maintains state with `Variables`.
+* Uses feeds and fetches to get data into and out of arbitrary operations.
+
+## Overview
+
+TensorFlow is a programming system in which you represent computations as
+graphs. Nodes in the graph are called *ops* (short for operations). An op
+takes zero or more `Tensors`, performs some computation, and produces zero or
+more `Tensors`. A `Tensor` is a typed multi-dimensional array. For example,
+you can represent a mini-batch of images as a 4-D array of floating point
+numbers with dimensions `[batch, height, width, channels]`.
+
+A TensorFlow graph is a *description* of computations. To compute anything,
+a graph must be launched in a `Session`. A `Session` places the graph ops onto
+`Devices`, such as CPUs or GPUs, and provides methods to execute them. These
+methods return tensors produced by ops as [numpy](http://www.numpy.org)
+`ndarray` objects in Python, and as `tensorflow::Tensor` instances in C and
+C++.
+
+## The computation graph
+
+TensorFlow programs are usually structured into a construction phase that
+assembles a graph, and an execution phase that uses a session to execute ops in
+the graph.
+
+For example, it is common to create a graph to represent and train a neural
+network in the construction phase, and then repeatedly execute a set of
+training ops in the graph in the execution phase.
+
+TensorFlow can be used from C, C++, and Python programs. It is presently much
+easier to use the Python library to assemble graphs, as it provides a large set
+of helper functions not available in the C and C++ libraries.
+
+The session libraries have equivalent functionalities for the three languages.
+
+### Building the graph
+
+To build a graph, start with ops that do not need any input (source ops), such as
+`Constant`, and pass their output to other ops that do computation.
+
+The ops constructors in the Python library return objects that stand for the
+output of the constructed ops. You can pass these to other ops constructors to
+use as inputs.
+
+The TensorFlow Python library has a *default graph* to which ops constructors
+add nodes. The default graph is sufficient for many applications. See the
+[Graph class](../api_docs/python/framework.md#Graph) documentation for how
+to explicitly manage multiple graphs.
+
+```python
+import tensorflow as tf
+
+# Create a Constant op that produces a 1x2 matrix. The op is
+# added as a node to the default graph.
+#
+# The value returned by the constructor represents the output
+# of the Constant op.
+matrix1 = tf.constant([[3., 3.]])
+
+# Create another Constant that produces a 2x1 matrix.
+matrix2 = tf.constant([[2.],[2.]])
+
+# Create a Matmul op that takes 'matrix1' and 'matrix2' as inputs.
+# The returned value, 'product', represents the result of the matrix
+# multiplication.
+product = tf.matmul(matrix1, matrix2)
+```
+
+The default graph now has three nodes: two `constant()` ops and one `matmul()`
+op. To actually multiply the matrices and get the result of the multiplication,
+you must launch the graph in a session.
+
+## Launching the graph in a Session
+
+Launching follows construction. To launch a graph, create a `Session` object.
+Without arguments the session constructor launches the default graph.
+
+See the [Session class](../api_docs/python/client.md#session-management) for
+the complete session API.
+
+```python
+# Launch the default graph.
+sess = tf.Session()
+
+# To run the matmul op we call the session 'run()' method, passing 'product'
+# which represents the output of the matmul op. This indicates to the call
+# that we want to get the output of the matmul op back.
+#
+# All inputs needed by the op are run automatically by the session. They
+# typically are run in parallel.
+#
+# The call 'run(product)' thus causes the execution of three ops in the
+# graph: the two constants and matmul.
+#
+# The output of the op is returned in 'result' as a numpy `ndarray` object.
+result = sess.run(product)
+print result
+
+# Close the Session when we're done.
+sess.close()
+
+
+# Stdout output ==> [[ 12.]]
+```
+
+Sessions should be closed to release resources. You can also enter a `Session`
+in a `with` block; the `Session` closes automatically at the end of the
+`with` block.
+
+
+
+```python
+with tf.Session() as sess:
+ result = sess.run([product])
+ print result
+```
+
+The TensorFlow implementation translates the graph definition into executable
+operations distributed across available compute resources, such as the CPU or
+one of your computer's GPU cards. In general you do not have to specify CPUs
+or GPUs explicitly. TensorFlow uses your first GPU, if you have one, for as
+many operations as possible.
+
+If you have more than one GPU available on your machine, to use a GPU beyond
+the first you must assign ops to it explicitly. Use `with...Device` statements
+to specify which CPU or GPU to use for operations:
+
+```python
+with tf.Session() as sess:
+ with tf.device("/gpu:1"):
+ matrix1 = tf.constant([[3., 3.]])
+ matrix2 = tf.constant([[2.],[2.]])
+ product = tf.matmul(matrix1, matrix2)
+ ...
+```
+
+Devices are specified with strings. The currently supported devices are:
+
+* `"/cpu:0"`: The CPU of your machine.
+* `"/gpu:0"`: The GPU of your machine, if you have one.
+* `"/gpu:1"`: The second GPU of your machine, etc.
+
+See [Using GPUs](../how_tos/using_gpu/index.md) for more information about GPUs
+and TensorFlow.
+
+## Tensors
+
+TensorFlow programs use a tensor data structure to represent all data -- only
+tensors are passed between operations in the computation graph. You can think
+of a TensorFlow tensor as an n-dimensional array or list. A tensor has a
+static type, a rank, and a shape. To learn more about how TensorFlow handles
+these concepts, see the [Rank, Shape, and Type](../resources/dims_types.md)
+reference.
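+
+As a small illustration (just a sketch; the values are arbitrary), you can
+inspect the static type and shape of the tensor returned by an op constructor:
+
+```python
+import tensorflow as tf
+
+# A rank-2 tensor with static shape [1, 2].
+c = tf.constant([[3.0, 3.0]])
+print c.dtype        # the element type, float32
+print c.get_shape()  # the static shape, (1, 2)
+```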
+
+## Variables
+
+Variables maintain state across executions of the graph. The following example
+shows a variable serving as a simple counter. See
+[Variables](../how_tos/variables/index.md) for more details.
+
+```python
+# Create a Variable that will be initialized to the scalar value 0.
+var = tf.Variable(0, name="counter")
+
+# Create an Op to add one to `var`.
+one = tf.constant(1)
+new_value = tf.add(var, one)
+update = tf.assign(var, new_value)
+
+# Variables must be initialized by running an `init` Op after having
+# launched the graph. We first have to add the `init` Op to the graph.
+init_op = tf.initialize_all_variables()
+
+# Launch the graph and run the ops.
+with tf.Session() as sess:
+  # Run the 'init' op.
+  sess.run(init_op)
+  # Print the initial value of 'var'.
+  print sess.run(var)
+  # Run the op that updates 'var' and print 'var'.
+  for _ in range(3):
+    sess.run(update)
+    print sess.run(var)
+
+# output:
+# 0
+# 1
+# 2
+# 3
+```
+
+The `assign()` operation in this code is a part of the expression graph just
+like the `add()` operation, so it does not actually perform the assignment
+until `run()` executes the expression.
+
+You typically represent the parameters of a statistical model as a set of
+Variables. For example, you would store the weights for a neural network as a
+tensor in a Variable. During training you update this tensor by running a
+training graph repeatedly.
+
+## Fetches
+
+To fetch the outputs of operations, execute the graph with a `run()` call on
+the `Session` object and pass in the tensors to retrieve. In the previous
+example we fetched the single node `var`, but you can also fetch multiple
+tensors:
+
+```python
+input1 = tf.constant(3.0)
+input2 = tf.constant(2.0)
+input3 = tf.constant(5.0)
+intermed = tf.add(input2, input3)
+mul = tf.mul(input1, intermed)
+
+with tf.Session() as sess:
+ result = sess.run([mul, intermed])
+ print result
+
+# output:
+# [array([ 21.], dtype=float32), array([ 7.], dtype=float32)]
+```
+
+All the ops needed to produce the values of the requested tensors are run once
+(not once per requested tensor).
+
+## Feeds
+
+The examples above introduce tensors into the computation graph by storing them
+in `Constants` and `Variables`. TensorFlow also provides a feed mechanism for
+patching a tensor directly into any operation in the graph.
+
+A feed temporarily replaces the output of an operation with a tensor value.
+You supply feed data as an argument to a `run()` call. The feed is only used for
+the run call to which it is passed. The most common use case involves
+designating specific operations to be "feed" operations by using
+`tf.placeholder()` to create them:
+
+```python
+input1 = tf.placeholder(tf.types.float32)
+input2 = tf.placeholder(tf.types.float32)
+output = tf.mul(input1, input2)
+
+with tf.Session() as sess:
+ print sess.run([output], feed_dict={input1:[7.], input2:[2.]})
+
+# output:
+# [array([ 14.], dtype=float32)]
+```
+
+A `placeholder()` operation generates an error if you do not supply a feed for
+it. See the [MNIST fully-connected feed
+tutorial](../tutorials/mnist/fully_connected_feed.py) for a larger-scale
+example of feeds.
+
diff --git a/tensorflow/g3doc/get_started/blue_pill.jpg b/tensorflow/g3doc/get_started/blue_pill.jpg
new file mode 100644
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/get_started/blue_pill.jpg
diff --git a/tensorflow/g3doc/get_started/index.md b/tensorflow/g3doc/get_started/index.md
new file mode 100644
index 0000000000..5b92e6e53f
--- /dev/null
+++ b/tensorflow/g3doc/get_started/index.md
@@ -0,0 +1,84 @@
+# Introduction
+
+Let's get you up and running with TensorFlow!
+
+But before we even get started, let's give you a sneak peek at what TensorFlow
+code looks like in the Python API, just so you have a sense of where we're
+headed.
+
+Here's a little Python program that makes up some data in three dimensions, and
+then fits a plane to it.
+
+```python
+import tensorflow as tf
+import numpy as np
+
+# Make 100 phony data points in NumPy.
+x_data = np.float32(np.random.rand(2, 100)) # Random input
+y_data = np.dot([0.100, 0.200], x_data) + 0.300
+
+# Construct a linear model.
+b = tf.Variable(tf.zeros([1]))
+W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0))
+y = tf.matmul(W, x_data) + b
+
+# Minimize the squared errors.
+loss = tf.reduce_mean(tf.square(y - y_data))
+optimizer = tf.train.GradientDescentOptimizer(0.5)
+train = optimizer.minimize(loss)
+
+# For initializing the variables.
+init = tf.initialize_all_variables()
+
+# Launch the graph
+sess = tf.Session()
+sess.run(init)
+
+# Fit the plane.
+for step in xrange(0, 201):
+ sess.run(train)
+ if step % 20 == 0:
+ print step, sess.run(W), sess.run(b)
+
+# Learns best fit is W: [[0.100 0.200]], b: [0.300]
+```
+
+To whet your appetite further, we suggest you check out what a classical
+machine learning problem looks like in TensorFlow. In the land of neural
+networks the most "classic" classical problem is the MNIST handwritten digit
+classification. We offer two introductions here, one for machine learning
+newbies, and one for pros. If you've already trained dozens of MNIST models in
+other software packages, please take the red pill. If you've never even heard
+of MNIST, definitely take the blue pill. If you're somewhere in between, we
+suggest skimming blue, then red.
+
+TODO(danmane): Add in creative commons attribution for these images.
+Also, make sure the sizes are precisely the same.
+
+<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px; display: flex; flex-direction: row">
+ <a href="../tutorials/mnist/beginners/index.md">
+ <img style="flex-grow:1; flex-shrink:1;border: 1px solid black;" src="./blue_pill.jpg">
+ </a>
+ <a href="../tutorials/mnist/pros/index.md">
+ <img style="flex-grow:1; flex-shrink:1; border: 1px solid black;" src="./red_pill.jpg">
+ </a>
+</div>
+
+If you're already sure you want to learn and install TensorFlow you can skip
+these and charge ahead. Don't worry, you'll still get to see MNIST -- we'll
+also use MNIST as an example in our technical tutorial where we elaborate on
+TensorFlow features.
+
+## Recommended Next Steps:
+* [Download and Setup](os_setup.md)
+* [Basic Usage](basic_usage.md)
+* [TensorFlow Mechanics 101](../tutorials/mnist/tf/index.md)
+
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- os_setup.md -->
+<!-- basic_usage.md -->
+-->
+</div>
+
diff --git a/tensorflow/g3doc/get_started/os_setup.md b/tensorflow/g3doc/get_started/os_setup.md
new file mode 100644
index 0000000000..01e6fde788
--- /dev/null
+++ b/tensorflow/g3doc/get_started/os_setup.md
@@ -0,0 +1,261 @@
+# Download and Setup
+
+## Binary Installation
+
+### Ubuntu/Linux
+
+Make sure you have `pip` and `numpy` installed:
+
+```sh
+$ sudo apt-get install python-pip python-numpy
+```
+
+Install TensorFlow:
+
+```sh
+# For CPU-only version
+$ sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
+
+# For GPU-enabled version
+$ sudo pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
+```
+
+### Mac OS X
+
+Make sure you have `pip` installed:
+
+If using `easy_install`:
+
+```sh
+$ sudo easy_install pip
+```
+
+Install TensorFlow (only CPU binary version is currently available).
+
+```sh
+$ sudo pip install https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl
+```
+
+### Try your first TensorFlow program
+
+```sh
+$ python
+
+>>> import tensorflow as tf
+>>> hello = tf.constant('Hello, TensorFlow!')
+>>> sess = tf.Session()
+>>> print sess.run(hello)
+Hello, TensorFlow!
+>>> a = tf.constant(10)
+>>> b = tf.constant(32)
+>>> print sess.run(a+b)
+42
+>>>
+
+```
+
+If you are running the GPU version and you see
+```sh
+ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory
+```
+
+you most likely need to set your `LD_LIBRARY_PATH` to point to the location of
+your CUDA libraries (for example, `/usr/local/cuda/lib64`).
+
+### Train the MNIST neural net model
+
+```sh
+$ python tensorflow/models/image/mnist/convolutional.py
+Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
+Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
+Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
+Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
+Extracting data/train-images-idx3-ubyte.gz
+Extracting data/train-labels-idx1-ubyte.gz
+Extracting data/t10k-images-idx3-ubyte.gz
+Extracting data/t10k-labels-idx1-ubyte.gz
+can't determine number of CPU cores: assuming 4
+I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op
+parallelism threads: 3
+can't determine number of CPU cores: assuming 4
+I tensorflow/core/common_runtime/local_session.cc:45] Local session inter op
+parallelism threads: 4
+Initialized!
+Epoch 0.00
+Minibatch loss: 12.054, learning rate: 0.010000
+Minibatch error: 90.6%
+Validation error: 84.6%
+...
+...
+
+```
+
+## Source Installation {#source}
+
+### Clone the TensorFlow repository
+
+TODO(keveman): Supply clone command for external users.
+
+```sh
+$ git clone --recurse-submodules https://YOUR_WHITELISTED_EMAIL_WITH_AT_REPLACED_BY_DOT@tensorflow.googlesource.com/tf3
+```
+
+
+### Installation for Linux
+
+#### Install Bazel
+
+
+Follow instructions [here](http://bazel.io/docs/install.html) to install the
+dependencies for Bazel. Then download and build the Bazel source with the
+following commands:
+
+```sh
+$ git clone https://github.com/bazelbuild/bazel.git
+$ cd bazel
+$ git checkout tags/0.1.0
+$ ./compile.sh
+```
+
+These commands use the commit tag `0.1.0`, which is known to work with
+TensorFlow. `HEAD` may be unstable.
+
+Add the executable `output/bazel` to your `$PATH` environment variable.
+
+#### Install other dependencies
+
+```sh
+$ sudo apt-get install python-numpy swig python-dev
+```
+
+#### Optional: Install CUDA (GPUs on Linux)
+
+In order to build TensorFlow with GPU support, both Cuda Toolkit 7.0 and CUDNN
+6.5 V2 from NVIDIA need to be installed.
+
+##### Download and install Cuda Toolkit 7.0
+
+https://developer.nvidia.com/cuda-toolkit-70
+
+Install the toolkit into e.g. `/usr/local/cuda`
+
+##### Download and install CUDNN Toolkit 6.5
+
+https://developer.nvidia.com/rdp/cudnn-archive
+
+Uncompress and copy the cudnn files into the toolkit directory. Assuming the
+toolkit is installed in `/usr/local/cuda`:
+
+``` bash
+tar xvzf cudnn-6.5-linux-x64-v2.tgz
+sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda/include
+sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda/lib64
+```
+
+##### Configure TensorFlow's canonical view of Cuda libraries
+From the root of your source tree, run:
+
+``` bash
+$ ./configure
+Do you wish to bulid TensorFlow with GPU support? [y/n] y
+GPU support will be enabled for TensorFlow
+
+Please specify the location where CUDA 7.0 toolkit is installed. Refer to
+README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda
+CUDA 7.0 toolkit found
+
+Please specify the location where CUDNN 6.5 V2 library is installed. Refer to
+README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda
+CUDNN 6.5 V2 library found
+
+Setting up Cuda include
+Setting up Cuda lib64
+Setting up Cuda bin
+Setting up Cuda nvvm
+Configuration finished
+```
+
+This creates a canonical set of symbolic links to the Cuda libraries on your system.
+Every time you change the Cuda library paths you need to run this step again before
+you invoke the bazel build command.
+
+##### Build your target with GPU support.
+From the root of your source tree, run:
+
+```sh
+$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
+
+$ bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
+# Lots of output. This tutorial iteratively calculates the major eigenvalue of
+# a 2x2 matrix, on GPU. The last few lines look like this.
+000009/000005 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+000006/000001 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+000009/000009 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+000006/000008 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+000009/000003 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+000006/000006 lambda = 2.000000 x = [0.894427 -0.447214] y = [1.788854 -0.894427]
+```
+
+Note that "--config=cuda" is needed to enable the GPU support.
+
+##### Known issues
+
+* Although it is possible to build both Cuda and non-Cuda configs under the same
+source tree, we recommend running "bazel clean" when switching between these two
+configs in the same source tree.
+
+* You have to run configure before running bazel build. Otherwise, the build
+will fail with a clear error message. In the future, we might consider making
+this more convenient by including the configure step in our build process,
+given the necessary new Bazel feature support.
+
+### Installation for Mac OS X
+
+Mac OS X needs the same set of dependencies as Linux; however, the process for
+installing them is different. Here is a set of useful links to help with
+installing the dependencies on Mac OS X:
+
+#### Bazel
+
+Look for installation instructions for Mac OS X on
+[this](http://bazel.io/docs/install.html) page.
+
+#### SWIG
+
+[Mac OS X installation](http://www.swig.org/Doc3.0/Preface.html#Preface_osx_installation).
+
+Note: you need to install
+[PCRE](ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/) and *not* PCRE2.
+
+#### Numpy
+
+Follow installation instructions [here](http://docs.scipy.org/doc/numpy/user/install.html).
+
+### Build and train your first TensorFlow neural net model
+
+```sh
+$ cd tf3
+
+$ bazel build tensorflow/models/image/mnist:convolutional
+
+$ bazel-bin/tensorflow/models/image/mnist/convolutional
+Succesfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
+Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
+Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
+Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
+Extracting data/train-images-idx3-ubyte.gz
+Extracting data/train-labels-idx1-ubyte.gz
+Extracting data/t10k-images-idx3-ubyte.gz
+Extracting data/t10k-labels-idx1-ubyte.gz
+Initialized!
+Epoch 0.00
+Minibatch loss: 12.054, learning rate: 0.010000
+Minibatch error: 90.6%
+Validation error: 84.6%
+Epoch 0.12
+Minibatch loss: 3.285, learning rate: 0.010000
+Minibatch error: 6.2%
+Validation error: 7.0%
+...
+...
+```
diff --git a/tensorflow/g3doc/get_started/red_pill.jpg b/tensorflow/g3doc/get_started/red_pill.jpg
new file mode 100644
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/get_started/red_pill.jpg
diff --git a/tensorflow/g3doc/how_tos/__init__.py b/tensorflow/g3doc/how_tos/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/__init__.py
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/__init__.py b/tensorflow/g3doc/how_tos/adding_an_op/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/__init__.py
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc b/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc
new file mode 100644
index 0000000000..84e54c7219
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/attr_examples.cc
@@ -0,0 +1,31 @@
+#include <stdio.h>
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("RestrictedTypeExample").Attr("t: {int32, float, bool}");
+
+REGISTER_OP("NumberType").Attr("t: numbertype");
+
+REGISTER_OP("EnumExample").Attr("e: {'apple', 'orange'}");
+
+REGISTER_OP("MinIntExample").Attr("a: int >= 2");
+
+REGISTER_OP("TypeListExample").Attr("a: list({int32, float}) >= 3");
+
+REGISTER_OP("AttrDefaultExample").Attr("i: int = 0");
+
+REGISTER_OP("AttrDefaultExampleForAllTypes")
+ .Attr("s: string = 'foo'")
+ .Attr("i: int = 0")
+ .Attr("f: float = 1.0")
+ .Attr("b: bool = true")
+ .Attr("ty: type = DT_INT32")
+ .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+ .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+ .Attr("l_empty: list(int) = []")
+ .Attr("l_int: list(int) = [2, 3, 5, 7]");
+
+int main(int argc, char* argv[]) {
+ printf("All registered ops:\n%s\n",
+ tensorflow::OpRegistry::Global()->DebugString(false).c_str());
+ return 0;
+}
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py b/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py
new file mode 100644
index 0000000000..17a7028d98
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/fact_test.py
@@ -0,0 +1,16 @@
+"""Test that user ops can be used as expected."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class FactTest(tf.test.TestCase):
+
+ def test(self):
+ with self.test_session():
+ print tf.user_ops.my_fact().eval()
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/index.md b/tensorflow/g3doc/how_tos/adding_an_op/index.md
new file mode 100644
index 0000000000..5c6243cd9c
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/index.md
@@ -0,0 +1,1015 @@
+# Adding a New Op to TensorFlow
+
+PREREQUISITES:
+
+* Some familiarity with C++.
+* Must have [downloaded TensorFlow source](../../get_started/index.md#source),
+ and be able to build it.
+
+If you'd like to incorporate an operation that isn't covered by the existing
+library, you can create a custom Op. To incorporate your custom Op, you'll need
+to:
+
+* Register the new Op in a C++ file. The Op registration is independent of the
+ implementation, and describes the semantics of how the Op is invoked. For
+ example, it defines the Op name, and specifies its inputs and outputs.
+* Implement the Op in C++. This implementation is called a "kernel", and there
+ can be multiple kernels for different architectures (e.g. CPUs, GPUs) or
+ input / output types.
+* Create a Python wrapper. This wrapper is the public API to create the Op. A
+ default wrapper is generated from the Op registration, which can be used
+ directly or added to.
+* Optionally, write a function to compute gradients for the Op.
+* Optionally, write a function that describes the input and output shapes
+ for the Op. This allows shape inference to work with your Op.
+* Test the Op, typically in Python. If you define gradients, verify them with
+ the Python [`GradientChecker`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/kernel_tests/gradient_checker.py).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Define the Op's interface](#define_interface)
+* [Implement the kernel for the Op](#AUTOGENERATED-implement-the-kernel-for-the-op)
+* [Generate the client wrapper](#AUTOGENERATED-generate-the-client-wrapper)
+ * [The Python Op wrapper](#AUTOGENERATED-the-python-op-wrapper)
+ * [The C++ Op wrapper](#AUTOGENERATED-the-c---op-wrapper)
+* [Verify it works](#AUTOGENERATED-verify-it-works)
+* [Validation](#validation)
+* [Op registration](#AUTOGENERATED-op-registration)
+ * [Attrs](#AUTOGENERATED-attrs)
+ * [Attr types](#AUTOGENERATED-attr-types)
+ * [Polymorphism](#polymorphism)
+ * [Inputs and Outputs](#AUTOGENERATED-inputs-and-outputs)
+ * [Backwards compatibility](#AUTOGENERATED-backwards-compatibility)
+* [GPU Support](#mult-archs)
+* [Implement the gradient in Python](#AUTOGENERATED-implement-the-gradient-in-python)
+* [Implement a shape function in Python](#AUTOGENERATED-implement-a-shape-function-in-python)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Define the Op's interface <div class="md-anchor" id="define_interface">{#define_interface}</div>
+
+You define the interface of an Op by registering it with the TensorFlow system.
+In the registration, you specify the name of your Op, its inputs (types and
+names) and outputs (types and names), as well as [docstrings](#docstrings) and
+any [attrs](#attrs) the Op might require.
+
+To see how this works, suppose you'd like to create an Op that takes a tensor of
+`int32`s and outputs a copy of the tensor, with all but the first element set to
+zero. Create file [`tensorflow/core/user_ops`][user_ops]`/zero_out.cc` and
+add a call to the `REGISTER_OP` macro that defines the interface for such an Op:
+
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("ZeroOut")
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+```
+
+This `ZeroOut` Op takes one tensor `to_zero` of 32-bit integers as input, and
+outputs a tensor `zeroed` of 32-bit integers.
+
+> A note on naming: The name of the Op should be unique and CamelCase. Names
+> starting with an underscore (`_`) are reserved for internal use.
+
+## Implement the kernel for the Op <div class="md-anchor" id="AUTOGENERATED-implement-the-kernel-for-the-op">{#AUTOGENERATED-implement-the-kernel-for-the-op}</div>
+
+After you define the interface, provide one or more implementations of the Op.
+To create one of these kernels, create a class that extends `OpKernel` and
+overrides the `Compute` method. The `Compute` method provides one `context`
+argument of type `OpKernelContext*`, from which you can access useful things
+like the input and output tensors.
+
+Add your kernel to the file you created above. The kernel might look something
+like this:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+ auto input = input_tensor.flat<int32>();
+
+ // Create an output tensor
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+ &output_tensor));
+ auto output = output_tensor->template flat<int32>();
+
+ // Set all but the first element of the output tensor to 0.
+ const int N = input.size();
+ for (int i = 1; i < N; i++) {
+ output(i) = 0;
+ }
+
+ // Preserve the first input value if possible.
+ if (N > 0) output(0) = input(0);
+ }
+};
+```
+
+After implementing your kernel, you register it with the TensorFlow system. In
+the registration, you specify different constraints under which this kernel
+will run. For example, you might have one kernel made for CPUs, and a separate
+one for GPUs.
+
+To do this for the `ZeroOut` op, add the following to `zero_out.cc`:
+
+```c++
+REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
+```
+
+TODO: instructions or pointer to building TF
+
+At this point, the TensorFlow system can reference and use the Op when
+requested.
+
+## Generate the client wrapper <div class="md-anchor" id="AUTOGENERATED-generate-the-client-wrapper">{#AUTOGENERATED-generate-the-client-wrapper}</div>
+### The Python Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-python-op-wrapper">{#AUTOGENERATED-the-python-op-wrapper}</div>
+
+Python op wrappers are created automatically in
+`bazel-genfiles/tensorflow/python/ops/gen_user_ops.py` for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow.
+Those ops are imported into
+[`tensorflow/python/user_ops/user_ops.py`][python-user_ops] with the statement:
+
+```python
+from tensorflow.python.ops.gen_user_ops import *
+```
+
+You may optionally use your own function instead. To do this, you first hide
+the generated code for that op by adding its name to the `hidden` list in the
+`"user_ops"` rule in
+[`tensorflow/python/BUILD`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD):
+
+```python
+tf_gen_op_wrapper_py(
+ name = "user_ops",
+ hidden = [
+ "Fact",
+ ],
+ require_shape_functions = False,
+)
+```
+
+List your op next to `"Fact"`. Next you add your replacement function to
+[`tensorflow/python/user_ops/user_ops.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py).
+Typically your function will call the generated function to actually add the op
+to the graph. The hidden version of the generated function will be in the
+`gen_user_ops` package and start with an underscore ("`_`"). For example:
+
+```python
+def my_fact():
+ """Example of overriding the generated code for an Op."""
+ return gen_user_ops._fact()
+```
+
+### The C++ Op wrapper <div class="md-anchor" id="AUTOGENERATED-the-c---op-wrapper">{#AUTOGENERATED-the-c---op-wrapper}</div>
+
+C++ op wrappers are created automatically for all ops placed in the
+[`tensorflow/core/user_ops`][user_ops] directory when you build TensorFlow. For
+example, ops in `tensorflow/core/user_ops/zero_out.cc` will generate wrappers in
+`bazel-genfiles/tensorflow/cc/ops/user_ops.{h,cc}`.
+
+All generated wrappers for user ops are automatically
+imported into [`tensorflow/cc/ops/standard_ops.h`][standard_ops-cc] with the
+statement
+
+```c++
+#include "tensorflow/cc/ops/user_ops.h"
+```
+
+## Verify it works <div class="md-anchor" id="AUTOGENERATED-verify-it-works">{#AUTOGENERATED-verify-it-works}</div>
+
+A good way to verify that you've successfully implemented your Op is to write a
+test for it. Create the file
+`tensorflow/python/kernel_tests/zero_out_op_test.py` with the contents:
+[TODO]:# (put tests somewhere else and make sure it works)
+
+```python
+import tensorflow as tf
+
+
+class ZeroOutTest(tf.test.TestCase):
+ def testZeroOut(self):
+ with self.test_session():
+ result = tf.user_ops.zero_out([5, 4, 3, 2, 1])
+ self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+```
+
+Then run your test:
+
+```sh
+$ bazel test tensorflow/python:zero_out_op_test
+```
+
+## Validation <div class="md-anchor" id="validation">{#validation}</div>
+
+The example above assumed that the Op applied to a tensor of any shape. What
+if it only applied to vectors? That means adding a check to the above OpKernel
+implementation.
+
+```c++
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVector(input_tensor.shape()),
+ errors::InvalidArgument("ZeroOut expects a 1-D vector."));
+ // ...
+ }
+```
+
+This asserts that the input is a vector, and returns having set the
+`InvalidArgument` status if it isn't. The
+[OP_REQUIRES macro][validation-macros] takes three arguments:
+
+* The `context`, which can either be an `OpKernelContext` or
+ `OpKernelConstruction` pointer (see
+ [`tensorflow/core/framework/op_kernel.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_kernel.h)),
+ for its `SetStatus()` method.
+* The condition. For example, there are functions for validating the shape
+ of a tensor in [`tensorflow/core/public/tensor_shape.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/tensor_shape.h)
+* The error itself, which is represented by a `Status` object, see
+ [`tensorflow/core/public/status.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/status.h). A
+ `Status` has both a type (frequently `InvalidArgument`, but see the list of
+ types) and a message. Functions for constructing an error may be found in
+ [`tensorflow/core/lib/core/errors.h`][validation-macros].
+
+Alternatively, if you want to test whether a `Status` object returned from some
+function is an error, and if so return it, use
+[`OP_REQUIRES_OK`][validation-macros]. Both of these macros return from the
+function on error.
+
+## Op registration <div class="md-anchor" id="AUTOGENERATED-op-registration">{#AUTOGENERATED-op-registration}</div>
+
+### Attrs <div class="md-anchor" id="AUTOGENERATED-attrs">{#AUTOGENERATED-attrs}</div>
+
+Ops can have attrs, whose values are set when the Op is added to a graph. These
+are used to configure the Op, and their values can be accessed both within the
+kernel implementation and in the types of inputs and outputs in the Op
+registration. Prefer using an input instead of an attr when possible, since
+inputs are more flexible. They can change every step, be set using a feed, etc.
+Attrs are used for things that can't be done with inputs: any configuration
+that affects the signature (number or type of inputs or outputs) or that
+can't change from step to step.
+
+You define an attr when you register the Op, by specifying its name and type
+using the `Attr` method, which expects a spec of the form:
+
+```
+<name>: <attr-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores, and `<attr-type-expr>` is a type expression of the
+form [described below](#attr-types).
+
+For example, if you'd like the `ZeroOut` Op to preserve a user-specified index,
+instead of only the 0th element, you can register the Op like so:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("preserve_index: int")</b>
+ .Input("to_zero: int32")
+ .Output("zeroed: int32");
+</pre></code>
+
+Your kernel can then access this attr in its constructor via the `context`
+parameter:
+
+<code class="lang-c++"><pre>
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {<b>
+ // Get the index of the value to preserve
+    OP_REQUIRES_OK(context, context->GetAttr("preserve\_index", &preserve\_index\_));
+ </b>}
+ void Compute(OpKernelContext\* context) override {
+ // ...
+ }
+ <b>private:
+ int preserve\_index\_;</b>
+};
+</pre></code>
+
+which can then be used in the `Compute` method:
+
+<code class="lang-c++"><pre>
+ void Compute(OpKernelContext\* context) override {
+ // ...
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i=0; i < N; i++) {
+ output\_flat(i) = 0;
+ }<br>
+ <b>// Preserve the requested input value
+ output\_flat(preserve\_index\_) = input(preserve\_index\_);</b>
+ }
+</pre></code>
+
+[TODO]:# (check the code in this section in and test it)
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("preserve\_index: int = 0")</b>
+> .Input("to_zero: int32")
+> .Output("zeroed: int32");
+> </pre></code>
+
+### Attr types <div class="md-anchor" id="AUTOGENERATED-attr-types">{#AUTOGENERATED-attr-types}</div>
+
+The following types are supported in an attr:
+
+* `string`: Any sequence of bytes (not required to be UTF8).
+* `int`: A signed integer.
+* `float`: A floating point number.
+* `bool`: True or false.
+* `type`: One of the (non-ref) values of [`DataType`][DataTypeString].
+* `shape`: A [`TensorShapeProto`][TensorShapeProto].
+* `tensor`: A [`TensorProto`][TensorProto].
+* `list(<type>)`: A list of `<type>`, where `<type>` is one of the above types.
+ Note that `list(list(<type>))` is invalid.
+
+See also: [op_def_builder.cc:FinalizeAttr][FinalizeAttr] for a definitive list.
+
+#### Default values & constraints
+
+Attrs may have default values, and some types of attrs can have constraints. To
+define an attr with constraints, you can use the following `<attr-type-expr>`s:
+
+* `{'<string1>', '<string2>'}`: The value must be a string that has either the
+ value `<string1>` or `<string2>`. The name of the type, `string`, is implied
+ when you use this syntax. This emulates an enum:
+
+ ```c++
+ REGISTER_OP("EnumExample")
+ .Attr("e: {'apple', 'orange'}");
+ ```
+
+* `{<type1>, <type2>}`: The value is of type `type`, and must be one of
+ `<type1>` or `<type2>`, where `<type1>` and `<type2>` are supported
+ [tensor types](../../resources/dims_types.md#data-types). You don't specify
+ that the type of the attr is `type`. This is implied when you have a list of
+ types in `{...}`. For example, in this case the attr `t` is a type that must
+ be an `int32`, a `float`, or a `bool`:
+
+ ```c++
+ REGISTER_OP("RestrictedTypeExample")
+ .Attr("t: {int32, float, bool}");
+ ```
+
+* There are shortcuts for common type constraints:
+ * `numbertype`: Type `type` restricted to the numeric (non-string and
+ non-bool) types.
+ * `realnumbertype`: Like `numbertype` without complex types.
+ * `quantizedtype`: Like `numbertype` but just the quantized number types.
+
+ The specific lists of types allowed by these are defined by the functions
+ (like `NumberTypes()`) in
+ [`tensorflow/core/framework/types.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.h).
+ In this example the attr `t` must be one of the numeric types:
+
+ ```c++
+ REGISTER_OP("NumberType")
+ .Attr("t: numbertype");
+ ```
+
+ For this op:
+
+ ```python
+ tf.number_type(t=tf.int32) # Valid
+ tf.number_type(t=tf.bool) # Invalid
+ ```
+
+* `int >= <n>`: The value must be an int whose value is greater than or equal to
+ `<n>`, where `<n>` is a natural number.
+
+ For example, the following Op registration specifies that the attr `a` must
+ have a value that is at least `2`:
+
+ ```c++
+ REGISTER_OP("MinIntExample")
+ .Attr("a: int >= 2");
+ ```
+
+* `list(<type>) >= <n>`: A list of type `<type>` whose length is greater than
+ or equal to `<n>`.
+
+ For example, the following Op registration specifies that the attr `a` is a
+ list of types (either `int32` or `float`), and that there must be at least 3
+ of them:
+
+ ```c++
+ REGISTER_OP("TypeListExample")
+ .Attr("a: list({int32, float}) >= 3");
+ ```
+
+To set a default value for an attr (making it optional in the generated code),
+add `= <default>` to the end, as in:
+
+```c++
+REGISTER_OP("AttrDefaultExample")
+ .Attr("i: int = 0");
+```
+
+The supported syntax of the default value is what would be used in the proto
+representation of the resulting GraphDef definition.
+
+Here are examples for how to specify a default for all types:
+
+```c++
+REGISTER_OP("AttrDefaultExampleForAllTypes")
+ .Attr("s: string = 'foo'")
+ .Attr("i: int = 0")
+ .Attr("f: float = 1.0")
+ .Attr("b: bool = true")
+ .Attr("ty: type = DT_INT32")
+ .Attr("sh: shape = { dim { size: 1 } dim { size: 2 } }")
+ .Attr("te: tensor = { dtype: DT_INT32 int_val: 5 }")
+ .Attr("l_empty: list(int) = []")
+ .Attr("l_int: list(int) = [2, 3, 5, 7]");
+```
+
+Note in particular that the values of type `type` use [the `DT_*` names
+for the types](../../resources/dims_types.md#data-types).
+
+### Polymorphism <div class="md-anchor" id="polymorphism">{#polymorphism}</div>
+#### Type Polymorphism {#type-polymorphism}
+
+For ops that can take different types as input or produce different output
+types, you can specify [an attr](#attrs) in
+[an input or output type](#inputs-outputs) in the Op registration. Typically
+you would then register an `OpKernel` for each supported type.
+
+For instance, if you'd like the `ZeroOut` Op to work on `float`s
+in addition to `int32`s, your Op registration might look like:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+ <b>.Attr("T: {float, int32}")</b>
+ .Input("to_zero: <b>T</b>")
+ .Output("zeroed: <b>T</b>");
+</pre></code>
+
+Your Op registration now specifies that the input's type must be `float`, or
+`int32`, and that its output will be the same type, since both have type `T`.
+
+> A note on naming:{#naming} Inputs, outputs, and attrs generally should be
+> given snake_case names. The one exception is attrs that are used as the type
+> of an input or in the type of an output. Those attrs can be inferred when the
+> op is added to the graph and so don't appear in the op's function. For
+> example, this last definition of ZeroOut will generate a Python function that
+> looks like:
+>
+> ```python
+> def zero_out(to_zero, name=None):
+> """...
+> Args:
+> to_zero: A `Tensor`. Must be one of the following types:
+> `float32`, `int32`.
+> name: A name for the operation (optional).
+>
+> Returns:
+> A `Tensor`. Has the same type as `x`.
+> """
+> ```
+>
+> If `to_zero` is passed an `int32` tensor, then `T` is automatically set to
+> `int32` (well, actually `DT_INT32`). Those inferred attrs are given
+> Capitalized or CamelCase names.
+>
+> Compare this with an op that has a type attr that determines the output
+> type:
+>
+> ```c++
+> REGISTER_OP("StringToNumber")
+> .Input("string_tensor: string")
+> .Output("output: out_type")
+>     .Attr("out_type: {float, int32}")
+> .Doc(R"doc(
+> Converts each string in the input Tensor to the specified numeric type.
+> )doc");
+> ```
+>
+> In this case, the user has to specify the output type, as in the generated
+> Python:
+>
+> ```python
+> def string_to_number(string_tensor, out_type=None, name=None):
+> """Converts each string in the input Tensor to the specified numeric type.
+>
+> Args:
+> string_tensor: A `Tensor` of type `string`.
+> out_type: An optional `tf.DType` from: `tf.float32, tf.int32`.
+> Defaults to `tf.float32`.
+> name: A name for the operation (optional).
+>
+> Returns:
+> A `Tensor` of type `out_type`.
+> """
+> ```
+
+<code class="lang-c++"><pre>
+\#include "tensorflow/core/framework/op_kernel.h"<br/>
+class ZeroOut<b>Int32</b>Op : public OpKernel {
+ // as before
+};<br/>
+class ZeroOut<b>Float</b>Op : public OpKernel {
+ public:
+ explicit ZeroOut<b>Float</b>Op(OpKernelConstruction\* context)
+ : OpKernel(context) {}<br/>
+ void Compute(OpKernelContext\* context) override {
+ // Grab the input tensor
+ const Tensor& input\_tensor = context-&gt;input(0);
+ auto input = input\_tensor.flat&lt;<b>float</b>&gt;();<br/>
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP\_REQUIRES\_OK(context,
+ context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+ auto output\_flat = output-&gt;template flat&lt;<b>float</b>&gt;();<br/>
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i &lt; N; i++) {
+ output\_flat(i) = 0;
+ }<br/>
+ // Preserve the first input value
+ if (N &gt; 0) output\_flat(0) = input(0);
+ }
+};<br/><b>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.</b>
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ <b>.TypeConstraint&lt;int32&gt;("T"),</b>
+    ZeroOut<b>Int32</b>Op);
+<b>REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;float&gt;("T"),
+ ZeroOutFloatOp);
+</b></pre></code>
+
+> To preserve [backwards compatibility](#backwards-compatibility), you should
+> specify a [default value](#default-values-constraints) when adding an attr to
+> an existing op:
+>
+> <code class="lang-c++"><pre>
+> REGISTER\_OP("ZeroOut")
+> <b>.Attr("T: {float, int32} = DT_INT32")</b>
+> .Input("to_zero: T")
+> .Output("zeroed: T")
+> </pre></code>
+
+Let's say you wanted to add more types, say `double`:
+
+<code class="lang-c++"><pre>
+REGISTER\_OP("ZeroOut")
+  .Attr("T: {float, <b>double,</b> int32}")
+ .Input("to_zero: <b>T</b>")
+ .Output("zeroed: <b>T</b>");
+</pre></code>
+
+Instead of writing another `OpKernel` with redundant code as above, you can
+often use a C++ template. You will still have one kernel registration
+(`REGISTER_KERNEL_BUILDER` call) per overload.
+
+<code class="lang-c++"><pre>
+<b>template &lt;typename T&gt;</b>
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction\* context) : OpKernel(context) {}<br/>
+ void Compute(OpKernelContext\* context) override {
+ // Grab the input tensor
+ const Tensor& input\_tensor = context-&gt;input(0);
+ auto input = input\_tensor.flat<b>&lt;T&gt;</b>();<br/>
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP\_REQUIRES\_OK(context,
+ context-&gt;allocate\_output(0, input_tensor.shape(), &output));
+ auto output\_flat = output-&gt;template flat<b>&lt;T&gt;</b>();<br/>
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i &lt; N; i++) {
+ output\_flat(i) = 0;
+ }<br/>
+ // Preserve the first input value
+ if (N &gt; 0) output\_flat(0) = input(0);
+ }
+};<br/>
+// Note that TypeConstraint&lt;int32&gt;("T") means that attr "T" (defined
+// in the Op registration above) must be "int32" to use this template
+// instantiation.
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;int32&gt;("T"),
+ <b>ZeroOutOp&lt;int32&gt;</b>);
+REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;float&gt;("T"),
+ <b>ZeroOutOp&lt;float&gt;</b>);
+<b>REGISTER\_KERNEL\_BUILDER(
+ Name("ZeroOut")
+ .Device(DEVICE\_CPU)
+ .TypeConstraint&lt;double&gt;("T"),
+ ZeroOutOp&lt;double&gt;);
+</b></pre></code>
+
+If you have more than a couple overloads, you can put the registration in a
+macro.
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+REGISTER_KERNEL(int32);
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+
+#undef REGISTER_KERNEL
+```
+
+Depending on the list of types you are registering the kernel for, you may be
+able to use a macro provided by
+[`tensorflow/core/framework/register_types.h`][register_types]:
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+REGISTER_OP("ZeroOut")
+    .Attr("T: realnumbertype")
+ .Input("to_zero: T")
+ .Output("zeroed: T");
+
+template <typename T>
+class ZeroOutOp : public OpKernel { ... };
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+
+#undef REGISTER_KERNEL
+```
+
+#### List Inputs and Outputs {#list-input-output}
+
+In addition to being able to accept or produce different types, ops can consume
+or produce a variable number of tensors.
+
+In the next example, the attr `T` holds a *list* of types, and is used as the
+type of both the input `in` and the output `out`. The input and output are
+lists of tensors of that type (and the number and types of tensors in the output
+are the same as in the input, since both have type `T`).
+
+```c++
+REGISTER_OP("PolymorphicListExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+You can also place restrictions on what types can be specified in the list. In
+this next case, the input is a list of `float` and `double` tensors. The Op
+accepts, for example, input types `(float, double, float)` and in that case the
+output type would also be `(float, double, float)`.
+
+```c++
+REGISTER_OP("ListTypeRestrictionExample")
+ .Attr("T: list({float, double})")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+If you want all the tensors in a list to be of the same type, you might do
+something like:
+
+```c++
+REGISTER_OP("IntListInputExample")
+ .Attr("N: int")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+This accepts a list of `int32` tensors, and uses an `int` attr `N` to
+specify the length of the list.
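+
+As a rough illustration of how a kernel can consume such a list input, here is
+a sketch of a kernel for `IntListInputExample`. The kernel class and its
+summing behavior are invented for this illustration; only the op registration
+above comes from the text.
+
+```c++
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+class IntListInputExampleOp : public OpKernel {
+ public:
+  explicit IntListInputExampleOp(OpKernelConstruction* context)
+      : OpKernel(context) {
+    // The inferred attr "N" can be read like any other attr.
+    OP_REQUIRES_OK(context, context->GetAttr("N", &N_));
+  }
+
+  void Compute(OpKernelContext* context) override {
+    // Sum the first element of each of the N int32 inputs (illustrative only).
+    int32 sum = 0;
+    for (int i = 0; i < N_; i++) {
+      const Tensor& t = context->input(i);
+      if (t.NumElements() > 0) sum += t.flat<int32>()(0);
+    }
+    // Produce the scalar int32 output declared as "out: int32" above.
+    Tensor* output = NULL;
+    OP_REQUIRES_OK(context,
+                   context->allocate_output(0, TensorShape(), &output));
+    output->scalar<int32>()() = sum;
+  }
+
+ private:
+  int32 N_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("IntListInputExample").Device(DEVICE_CPU),
+                        IntListInputExampleOp);
+```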
+
+The list input can be made [type polymorphic](#type-polymorphism) as well. In the next
+example, the input is a list of tensors (with length `"N"`) of the same (but
+unspecified) type (`"T"`), and the output is a single tensor of matching type:
+
+```c++
+REGISTER_OP("SameListInputExample")
+ .Attr("N: int")
+ .Attr("T: type")
+ .Input("in: N * T")
+ .Output("out: T");
+```
+
+By default, tensor lists have a minimum length of 1. You can change that default
+using
+[a `">="` constraint on the corresponding attr](#default-values-constraints).
+In this next example, the input is a list of at least 2 `int32` tensors:
+
+```c++
+REGISTER_OP("MinLengthIntListExample")
+ .Attr("N: int >= 2")
+ .Input("in: N * int32")
+ .Output("out: int32");
+```
+
+The same syntax works with `"list(type)"` attrs:
+
+```c++
+REGISTER_OP("MinimumLengthPolymorphicListExample")
+ .Attr("T: list(type) >= 3")
+ .Input("in: T")
+ .Output("out: T");
+```
+
+### Inputs and Outputs <div class="md-anchor" id="AUTOGENERATED-inputs-and-outputs">{#AUTOGENERATED-inputs-and-outputs}</div>
+
+To summarize the above, an Op registration can have multiple inputs and outputs:
+
+```c++
+REGISTER_OP("MultipleInsAndOuts")
+ .Input("y: int32")
+ .Input("z: float")
+ .Output("a: string")
+ .Output("b: int32");
+```
+
+Each input or output spec is of the form:
+
+```
+<name>: <io-type-expr>
+```
+
+where `<name>` begins with a letter and can be composed of alphanumeric
+characters and underscores. `<io-type-expr>` is one of the following type
+expressions:
+
+* `<type>`, where `<type>` is a supported input type (e.g. `float`, `int32`,
+ `string`). This specifies a single tensor of the given type.
+
+ See
+ [the list of supported Tensor types](../../resources/dims_types.md#data-types).
+
+ ```c++
+ REGISTER_OP("BuiltInTypesExample")
+ .Input("integers: int32")
+ .Input("complex_numbers: scomplex64");
+ ```
+
+* `<attr-type>`, where `<attr-type>` is the name of an [Attr](#attrs) with type
+ `type` or `list(type)` (with a possible type restriction). This syntax allows
+ for [polymorphic ops](#polymorphism).
+
+ ```c++
+ REGISTER_OP("PolymorphicSingleInput")
+ .Attr("T: type")
+    .Input("in: T");
+
+ REGISTER_OP("RestrictedPolymorphicSingleInput")
+ .Attr("T: {int32, int64}")
+    .Input("in: T");
+ ```
+
+ Referencing an attr of type `list(type)` allows you to accept a sequence of
+ tensors.
+
+ ```c++
+ REGISTER_OP("ArbitraryTensorSequenceExample")
+ .Attr("T: list(type)")
+ .Input("in: T")
+ .Output("out: T");
+
+ REGISTER_OP("RestrictedTensorSequenceExample")
+ .Attr("T: list({int32, int64})")
+ .Input("in: T")
+ .Output("out: T");
+ ```
+
+  Note that the number and types of tensors in the output `out` are the same as
+ in the input `in`, since both are of type `T`.
+
+* For a sequence of tensors with the same type: `<number> * <type>`, where
+ `<number>` is the name of an [Attr](#attrs) with type `int`. The `<type>` can
+ either be
+ [a specific type like `int32` or `float`](../../resources/dims_types.md#data-types),
+ or the name of an attr with type `type`. As an example of the first, this
+ Op accepts a list of `int32` tensors:
+
+ ```c++
+ REGISTER_OP("Int32SequenceExample")
+ .Attr("NumTensors: int")
+ .Input("in: NumTensors * int32")
+ ```
+
+ Whereas this Op accepts a list of tensors of any type, as long as they are all
+ the same:
+
+ ```c++
+ REGISTER_OP("SameTypeSequenceExample")
+ .Attr("NumTensors: int")
+ .Attr("T: type")
+ .Input("in: NumTensors * T")
+ ```
+
+* For a reference to a tensor: `Ref(<type>)`, where `<type>` is one of the
+ previous types.
+
+> A note on naming: Any attr used in the type of an input will be inferred. By
+> convention those inferred attrs use capital names (like `T` or `N`).
+> Otherwise inputs, outputs, and attrs have names like function parameters
+> (e.g. `num_outputs`). For more details, see the
+> [earlier note on naming](#naming).
+
+For more details, see
+[`tensorflow/core/framework/op_def_builder.h`][op_def_builder].
+
+### Backwards compatibility <div class="md-anchor" id="AUTOGENERATED-backwards-compatibility">{#AUTOGENERATED-backwards-compatibility}</div>
+
+In general, changes to specifications must be backwards-compatible: changing the
+specification of an Op must not break prior serialized GraphDefs constructed
+from older specifications.
+
+There are several ways to preserve backwards-compatibility.
+
+1. Any new attrs added to an operation must have default values defined, and
+ with that default value the Op must have the original behavior. To change an
+ operation from not polymorphic to polymorphic, you *must* give a default
+ value to the new type attr to preserve the original signature by default. For
+ example, if your operation was:
+
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: float")
+ .Output("out: float");
+ ```
+
+ you can make it polymorphic in a backwards-compatible way using:
+
+ ```c++
+ REGISTER_OP("MyGeneralUnaryOp")
+ .Input("in: T")
+ .Output("out: T")
+    .Attr("T: numbertype = DT_FLOAT");
+ ```
+
+1. You can safely make a constraint on an attr less restrictive. For example,
+ you can change from `{int32, int64}` to `{int32, int64, float}` or from
+ `{"apple", "orange"}` to `{"apple", "banana", "orange"}`.
+
+1. Namespace any new Ops you create by prefixing their names with something
+   unique to your project. This prevents your Op from colliding with any Ops
+   that might be included in future versions of TensorFlow.
+
+1. Plan ahead! Try to anticipate future uses for the Op. Some signature changes
+ can't be done in a compatible way (for example, adding an input, or making a
+ single input into a list).
+
+If you cannot make your change to an operation backwards compatible, then
+create a new operation, with a new name, that has the new semantics.
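+
+For instance, if an op needed a second required input (something that cannot be
+added compatibly), a hypothetical project-prefixed replacement could be
+registered alongside the original:
+
+```c++
+// Hypothetical example only: the original op keeps its registration unchanged,
+// and the incompatible change goes into a separately named op.
+REGISTER_OP("MyProjectZeroOutV2")
+    .Input("to_zero: T")
+    .Input("lengths: int32")
+    .Output("zeroed: T")
+    .Attr("T: {float, int32}");
+```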
+
+## GPU Support <div class="md-anchor" id="mult-archs">{#mult-archs}</div>
+
+You can implement different OpKernels and register one for CPU and another for
+GPU, just like you can [register kernels for different types](#polymorphism).
+There are several examples of kernels with GPU support in
+[tensorflow/core/kernels/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/).
+Notice that some kernels have a CPU version in a `.cc` file, a GPU version in a
+file ending in `_gpu.cu.cc`, and some shared code in a `.h` file.
+
+For example, the [`pad` op](../../api_docs/python/array_ops.md#pad) has
+everything but the GPU kernel in [`tensorflow/core/kernels/pad_op.cc`][pad_op].
+The GPU kernel is in
+[`tensorflow/core/kernels/pad_op_gpu.cu.cc`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op_gpu.cu.cc),
+and the shared code is a templated class defined in
+[`tensorflow/core/kernels/pad_op.h`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.h).
+One thing to note: even when the GPU kernel version of `pad` is used, it still
+needs its `"paddings"` input in CPU memory. To mark that inputs or outputs are
+kept on the CPU, add a `HostMemory()` call to the kernel registration, e.g.:
+
+```c++
+#define REGISTER_GPU_KERNEL(T) \
+ REGISTER_KERNEL_BUILDER(Name("Pad") \
+ .Device(DEVICE_GPU) \
+ .TypeConstraint<T>("T") \
+ .HostMemory("paddings"), \
+ PadOp<GPUDevice, T>)
+```
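+
+You would then invoke the macro once per type you want to support on the GPU;
+the particular types below are only illustrative:
+
+```c++
+REGISTER_GPU_KERNEL(float);
+REGISTER_GPU_KERNEL(double);
+
+#undef REGISTER_GPU_KERNEL
+```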
+
+## Implement the gradient in Python <div class="md-anchor" id="AUTOGENERATED-implement-the-gradient-in-python">{#AUTOGENERATED-implement-the-gradient-in-python}</div>
+
+[TODO]:# (Write this!)
+
+## Implement a shape function in Python <div class="md-anchor" id="AUTOGENERATED-implement-a-shape-function-in-python">{#AUTOGENERATED-implement-a-shape-function-in-python}</div>
+
+The TensorFlow Python API has a feature called "shape inference" that provides
+information about the shapes of tensors without having to execute the
+graph. Shape inference is supported by "shape functions" that are registered for
+each op type, and perform two roles: asserting that the shapes of the inputs are
+compatible, and specifying the shapes for the outputs. A shape function is a
+Python function that takes an
+[`Operation`](../../api_docs/python/framework.md#Operation) as input, and
+returns a list of
+[`TensorShape`](../../api_docs/python/framework.md#TensorShape) objects (one per
+output of the op). To register a shape function, apply the
+[`tf.RegisterShape` decorator](../../api_docs/python/framework.md#RegisterShape)
+to a shape function. For example, the
+[ZeroOut op defined above](#define_interface) would have a shape function like
+the following:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the unconstrained version of ZeroOut, which produces an output
+ with the same shape as its input.
+ """
+ return [op.inputs[0].get_shape()]
+```
+
+A shape function can also constrain the shape of an input. For the version of
+[ZeroOut with a vector shape constraint](#validation), the shape function
+would be as follows:
+
+```python
+@tf.RegisterShape("ZeroOut"):
+def _zero_out_shape(op):
+ """Shape function for the ZeroOut op.
+
+ This is the constrained version of ZeroOut, which requires the input to
+ have rank 1 (a vector).
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(1)
+ return [input_shape]
+```
+
+If your op is [polymorphic with multiple inputs](#polymorphism), use the
+properties of the operation to determine the number of shapes to check:
+
+```python
+@tf.RegisterShape("IntListInputExample")
+def _int_list_input_example_shape(op):
+ """Shape function for the "IntListInputExample" op.
+
+ All inputs and the output are matrices of the same size.
+ """
+ output_shape = tf.TensorShape(None)
+ for input in op.inputs:
+ output_shape = output_shape.merge_with(input.get_shape().with_rank(2))
+ return [output_shape]
+```
+
+Since shape inference is an optional feature, and the shapes of tensors may vary
+dynamically, shape functions must be robust to incomplete shape information for
+any of the inputs. The [`merge_with()`](../../api_docs/python/framework.md)
+method allows the caller to assert that two shapes are the same, even if either
+or both of them do not have complete information. Shape functions are defined
+for all of the
+[standard Python ops](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/),
+and provide many different usage examples.
+
+[core-array_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/ops/array_ops.cc
+[python-user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py
+[tf-kernels]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/
+[user_ops]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/user_ops/
+[pad_op]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/pad_op.cc
+[standard_ops-py]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/standard_ops.py
+[standard_ops-cc]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/cc/ops/standard_ops.h
+[python-BUILD]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/BUILD
+[validation-macros]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/lib/core/errors.h
+[op_def_builder]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.h
+[register_types]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/register_types.h
+[FinalizeAttr]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def_builder.cc#FinalizeAttr
+[DataTypeString]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.cc#DataTypeString
+[types-proto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.proto
+[TensorShapeProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor_shape.proto
+[TensorProto]:https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/tensor.proto
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc b/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc
new file mode 100644
index 0000000000..3d2f50d16e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/register_kernels.cc
@@ -0,0 +1,64 @@
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+
+using namespace tensorflow;
+
+template <typename T>
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+ auto input = input_tensor.flat<T>();
+
+ // Create an output tensor
+ Tensor* output = NULL;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input_tensor.shape(), &output));
+ auto output_flat = output->template flat<T>();
+
+ // Set all the elements of the output tensor to 0
+ const int N = input.size();
+ for (int i = 0; i < N; i++) {
+ output_flat(i) = 0;
+ }
+
+ // Preserve the first input value
+ if (N > 0) output_flat(0) = input(0);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<float>("T"),
+ ZeroOutOp<float>);
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<double>("T"),
+ ZeroOutOp<double>);
+REGISTER_KERNEL_BUILDER(Name("ZeroOut")
+ .Device(DEVICE_CPU)
+ .TypeConstraint<int>("T"),
+ ZeroOutOp<int>);
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+REGISTER_KERNEL(float);
+REGISTER_KERNEL(double);
+REGISTER_KERNEL(int32);
+
+#undef REGISTER_KERNEL
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("ZeroOut").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
+ ZeroOutOp<type>)
+
+TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);
+
+#undef REGISTER_KERNEL
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py
new file mode 100644
index 0000000000..321f603adf
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_1_test.py
@@ -0,0 +1,18 @@
+"""Test for version 1 of the zero_out op."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+from tensorflow.g3doc.how_tos.adding_an_op import gen_zero_out_op_1
+
+
+class ZeroOut1Test(tf.test.TestCase):
+
+ def test(self):
+ with self.test_session():
+ result = gen_zero_out_op_1.zero_out([5, 4, 3, 2, 1])
+ self.assertAllEqual(result.eval(), [5, 0, 0, 0, 0])
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc
new file mode 100644
index 0000000000..e960adc047
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/adding_an_op/zero_out_op_kernel_1.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+
+using namespace tensorflow;
+
+REGISTER_OP("ZeroOut")
+ .Input("to_zero: int32")
+ .Output("zeroed: int32")
+ .Doc(R"doc(
+Zeros out all but the first value of a Tensor.
+
+zeroed: A Tensor whose first value is identical to `to_zero`, and 0
+ otherwise.
+
+)doc");
+
+class ZeroOutOp : public OpKernel {
+ public:
+ explicit ZeroOutOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input_tensor = context->input(0);
+ auto input = input_tensor.flat<int32>();
+
+ // Create an output tensor
+ Tensor* output_tensor = NULL;
+ OP_REQUIRES_OK(context, context->allocate_output(0, input_tensor.shape(),
+ &output_tensor));
+ auto output = output_tensor->template flat<int32>();
+
+ // Set all but the first element of the output tensor to 0.
+ const int N = input.size();
+ for (int i = 1; i < N; i++) {
+ output(i) = 0;
+ }
+
+ // Preserve the first input value.
+ if (N > 0) output(0) = input(0);
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("ZeroOut").Device(DEVICE_CPU), ZeroOutOp);
diff --git a/tensorflow/g3doc/how_tos/graph_viz/index.md b/tensorflow/g3doc/how_tos/graph_viz/index.md
new file mode 100644
index 0000000000..f0a1fc2fe7
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/graph_viz/index.md
@@ -0,0 +1,205 @@
+# TensorBoard: Visualizing Your Graph
+
+TensorFlow computation graphs are powerful but complicated. The graph visualization can help you understand and debug them. Here's an example of the visualization at work.
+
+![Visualization of a TensorFlow graph](./graph_vis_animation.gif "Visualization of a TensorFlow graph")
+*Visualization of a TensorFlow graph.*
+
+To see your own graph, run TensorBoard pointing it to the log directory of the job, click on the graph tab in the top pane, and select the appropriate run using the menu in the upper left corner. For in-depth information on how to run TensorBoard and make sure you are logging all the necessary information, see [Summaries and TensorBoard](../summaries_and_tensorboard/index.md).
+
+## Name scoping and nodes
+
+Typical TensorFlow graphs can have many thousands of nodes--far too many to see easily all at once, or even to lay out using standard graph tools. To simplify, variable names can be scoped, and the visualization uses this information to define a hierarchy on the nodes in the graph; by default, only the top of this hierarchy is shown. Here is an example that defines three operations under the `hidden` name scope using [`tf.name_scope()`](https://tensorflow.org/api_docs/python/framework.html?cl=head#name_scope):
+
+```python
+import tensorflow as tf
+
+with tf.name_scope('hidden') as scope:
+ a = tf.constant(5, name='alpha')
+ W = tf.Variable(tf.random_uniform([1, 2], -1.0, 1.0), name='weights')
+ b = tf.Variable(tf.zeros([1]), name='biases')
+```
+
+This results in the following three op names:
+
+* *hidden*/alpha
+* *hidden*/weights
+* *hidden*/biases
+
+The visualization will, by default, collapse all three into a node labeled `hidden`.
+The extra detail isn't lost. You can double-click, or click
+on the orange `+` sign in the top right to expand the node, and then you'll see
+three subnodes, for `alpha`, `weights` and `biases`.
+
+Here's a real-life example of a more complicated node in its initial and
+expanded states.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./pool1_collapsed.png" alt="Unexpanded name scope" title="Unexpanded name scope" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./pool1_expanded.png" alt="Expanded name scope" title="Expanded name scope" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Initial view of top-level name scope <code>pool_1</code>. Clicking on the orange <code>+</code> button on the top right or double-clicking on the node itself will expand it.
+ </td>
+ <td style="width: 50%;">
+ Expanded view of <code>pool_1</code> name scope. Clicking on the orange <code>-</code> button on the top right or double-clicking on the node itself will collapse the name scope.
+ </td>
+ </tr>
+</table>
+
+Grouping nodes by name scopes is critical to making a legible graph. If you're
+building a model, name scopes give you control over the resulting visualization.
+**The better your name scopes, the better your visualization.**
+
+The figure above illustrates a second aspect of the visualization. TensorFlow
+graphs have two kinds of connections: data dependencies and control
+dependencies. Data dependencies show the flow of tensors between two ops and
+are shown as solid arrows, while control dependencies use dotted lines. In the
+expanded view (right side of the figure above) all the connections are data
+dependencies with the exception of the dotted line connecting `CheckNumerics`
+and `control_dependency`.
+
+There's a second trick to simplifying the layout. Most TensorFlow graphs have a
+few nodes with many connections to other nodes. For example, many nodes might
+have a control dependency on an initialization step. Drawing all edges
+between the `init` node and its dependencies would create a very cluttered
+view.
+
+To reduce clutter, the visualization separates out all high-degree nodes to an
+"auxiliary" area on the right and doesn't draw lines to represent their edges.
+Instead of lines, we draw small "node icons" to indicate the connections.
+Separating out the auxiliary nodes typically doesn't remove critical
+information since these nodes are usually related to bookkeeping functions.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./conv_1.png" alt="conv_1 is part of the main graph" title="conv_1 is part of the main graph" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./save.png" alt="save is extracted as auxiliary node" title="save is extracted as auxiliary node" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Node <code>conv_1</code> is connected to <code>save</code>. Note the little <code>save</code> node icon on its right.
+ </td>
+ <td style="width: 50%;">
+ <code>save</code> has a high degree, and will appear as an auxiliary node. The connection with <code>conv_1</code> is shown as a node icon on its left. To further reduce clutter, since <code>save</code> has a lot of connections, we show the first 5 and abbreviate the others as <code>... 12 more</code>.
+ </td>
+ </tr>
+</table>
+
+One last structural simplification is "series collapsing". Sequential
+motifs--that is, nodes whose names differ by a number at the end and have
+isomorphic structures--are collapsed into a single "stack" of nodes, as shown
+below. For networks with long sequences, this greatly simplifies the view. As
+with hierarchical nodes, double-clicking expands the series.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./series.png" alt="Sequence of nodes" title="Sequence of nodes" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./series_expanded.png" alt="Expanded sequence of nodes" title="Expanded sequence of nodes" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ A collapsed view of a node sequence.
+ </td>
+ <td style="width: 50%;">
+ A small piece of the expanded view, after double-click.
+ </td>
+ </tr>
+</table>
+
+Finally, as one last aid to legibility, the visualization uses special icons
+for constants and summary nodes. To summarize, here's a table of node symbols:
+
+Symbol | Meaning
+--- | ---
+![Name scope](./namespace_node.png "Name scope") | "High-level" node representing a name scope. Double-click to expand a high-level node.
+![Sequence of unconnected nodes](./horizontal_stack.png "Sequence of unconnected nodes") | Sequence of numbered nodes that are not connected to each other.
+![Sequence of connected nodes](./vertical_stack.png "Sequence of connected nodes") | Sequence of numbered nodes that are connected to each other.
+![Operation node](./op_node.png "Operation node") | An individual operation node.
+![Constant node](./constant.png "Constant node") | A constant.
+![Summary node](./summary.png "Summary node") | A summary node.
+![Data flow edge](./dataflow_edge.png "Data flow edge") | Edge showing the data flow between operations.
+![Control dependency edge](./control_edge.png "Control dependency edge") | Edge showing the control dependency between operations.
+![Reference edge](./reference_edge.png "Reference edge") | A reference edge showing that the outgoing operation node can mutate the incoming tensor.
+
+## Interaction
+
+Navigate the graph by panning and zooming. Click and drag to pan, and use a
+scroll gesture to zoom. Double-click on a node, or click on its `+` button, to
+expand a name scope that represents a group of operations. To easily keep
+track of the current viewpoint when zooming and panning, there is a minimap in
+the bottom right corner.
+
+To close an open node, double-click it again or click its `-` button. You can
+also click once to select a node. It will turn a darker color, and details
+about it and the nodes it connects to will appear in the info card at upper
+right corner of the visualization.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./infocard.png" alt="Info card of a name scope" title="Info card of a name scope" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./infocard_op.png" alt="Info card of operation node" title="Info card of operation node" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Info card showing detailed information for the <code>conv2</code> name scope. The inputs and outputs are combined from the inputs and outputs of the operation nodes inside the name scope. For name scopes no attributes are shown.
+ </td>
+ <td style="width: 50%;">
+ Info card showing detailed information for the <code>DecodeRaw</code> operation node. In addition to inputs and outputs, the card shows the device and the attributes associated with the current operation.
+ </td>
+ </tr>
+</table>
+
+Selection can also be helpful in understanding high-degree nodes. Select any
+high-degree node, and the corresponding "node icons" for its other connections
+will be selected as well. This makes it easy, for example, to see which nodes
+are being saved--and which aren't.
+
+Clicking on a node name in the info card will select it. If necessary, the
+viewpoint will automatically pan so that the node is visible.
+
+Finally, you can choose two color schemes for your graph, using the color menu
+above the legend. The default "Structure View" shows structure: when two
+high-level nodes have the same structure, they appear in the same color of the
+rainbow. Uniquely structured nodes are gray. There's a second view, which shows
+what device the different operations run on. Name scopes are colored
+proportionally to the fraction of devices for the operations inside them.
+
+The images below give an illustration for a piece of a real-life graph.
+
+<table width="100%;">
+ <tr>
+ <td style="width: 50%;">
+ <img src="./colorby_structure.png" alt="Color by structure" title="Color by structure" />
+ </td>
+ <td style="width: 50%;">
+ <img src="./colorby_device.png" alt="Color by device" title="Color by device" />
+ </td>
+ </tr>
+ <tr>
+ <td style="width: 50%;">
+ Structure view: The gray nodes have unique structure. The orange <code>conv1</code> and <code>conv2</code> nodes have the same structure, and analogously for nodes with other colors.
+ </td>
+ <td style="width: 50%;">
+ Device view: Name scopes are colored proportionally to the fraction of devices of the operation nodes inside them. Here, purple means GPU and the green is CPU.
+ </td>
+ </tr>
+</table>
diff --git a/tensorflow/g3doc/how_tos/index.md b/tensorflow/g3doc/how_tos/index.md
new file mode 100644
index 0000000000..f5c74715e8
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/index.md
@@ -0,0 +1,102 @@
+# Overview
+
+
+## Variables: Creation, Initializing, Saving, and Restoring
+
+TensorFlow Variables are in-memory buffers containing tensors. Learn how to
+use them to hold and update model parameters during training.
+
+[View Tutorial](variables/index.md)
+
+
+## TensorFlow Mechanics 101
+
+A step-by-step walkthrough of the details of using TensorFlow infrastructure
+to train models at scale, using MNIST handwritten digit recognition as a toy
+example.
+
+[View Tutorial](../tutorials/mnist/tf/index.md)
+
+
+## TensorBoard: Visualizing Your Training
+
+TensorBoard is a useful tool for visualizing the training and evaluation of
+your model(s). This tutorial describes how to build and run TensorBoard as well
+as how to add Summary ops to automatically output data to the Events files that
+TensorBoard uses for display.
+
+[View Tutorial](summaries_and_tensorboard/index.md)
+
+
+## TensorBoard: Visualizing Your Graph
+
+This tutorial describes how to use the graph visualizer in TensorBoard to help
+you understand the dataflow graph and debug it.
+
+[View Tutorial](graph_viz/index.md)
+
+
+## Reading Data
+
+This tutorial describes the three main methods of getting data into your
+TensorFlow program: Feeding, Reading and Preloading.
+
+[View Tutorial](reading_data/index.md)
+
+
+## Threading and Queues
+
+This tutorial describes the various constructs implemented by TensorFlow
+to facilitate asynchronous and concurrent training.
+
+[View Tutorial](threading_and_queues/index.md)
+
+
+## Adding a New Op
+
+TensorFlow already has a large suite of node operations that you can
+compose in your graph, but here are the details of how to add your own custom Op.
+
+[View Tutorial](adding_an_op/index.md)
+
+
+## New Data Formats
+
+If you have a sizable custom data set, you may want to consider extending
+TensorFlow to read your data directly in its native format. Here's how.
+
+[View Tutorial](new_data_formats/index.md)
+
+
+## Using One or More GPUs
+
+This tutorial describes how to construct and execute models on GPU(s).
+
+[View Tutorial](using_gpu/index.md)
+
+
+## Sharing Variables
+
+When deploying large models on multiple GPUs, or when unrolling complex LSTMs
+or RNNs, it is often necessary to access the same Variable objects from
+different locations in the model construction code.
+
+The "Variable Scope" mechanism is designed to facilitate that.
+
+[View Tutorial](variable_scope/index.md)
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- variables/index.md -->
+<!-- ../tutorials/mnist/tf/index.md -->
+<!-- summaries_and_tensorboard/index.md -->
+<!-- graph_viz/index.md -->
+<!-- reading_data/index.md -->
+<!-- threading_and_queues/index.md -->
+<!-- adding_an_op/index.md -->
+<!-- new_data_formats/index.md -->
+<!-- using_gpu/index.md -->
+<!-- variable_scope/index.md -->
+-->
+</div>
+
diff --git a/tensorflow/g3doc/how_tos/new_data_formats/index.md b/tensorflow/g3doc/how_tos/new_data_formats/index.md
new file mode 100644
index 0000000000..b1b09fe1ff
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/new_data_formats/index.md
@@ -0,0 +1,225 @@
+# Extending TF: Supporting new data formats
+
+PREREQUISITES:
+
+* Some familiarity with C++.
+* Must have
+ [downloaded TensorFlow source](../../get_started/os_setup.md#source), and be
+ able to build it.
+
+We divide the task of supporting a file format into two pieces:
+
+* File formats: We use a *Reader* Op to read a *record* (which can be any
+ string) from a file.
+* Record formats: We use decoder or parsing Ops to turn a string record
+ into tensors usable by TensorFlow.
+
+For example, to read a
+[CSV file](https://en.wikipedia.org/wiki/Comma-separated_values), we use
+[a Reader for text files](../../api_docs/python/io_ops.md#TextLineReader)
+followed by
+[an Op that parses CSV data from a line of text](../../api_docs/python/io_ops.md#decode_csv).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Writing a Reader for a file format](#AUTOGENERATED-writing-a-reader-for-a-file-format)
+* [Writing an Op for a record format](#AUTOGENERATED-writing-an-op-for-a-record-format)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Writing a Reader for a file format <div class="md-anchor" id="AUTOGENERATED-writing-a-reader-for-a-file-format">{#AUTOGENERATED-writing-a-reader-for-a-file-format}</div>
+
+A `Reader` is something that reads records from a file. There are some examples
+of Reader Ops already built into TensorFlow:
+
+* [`tf.TFRecordReader`](../../api_docs/python/io_ops.md#TFRecordReader)
+ ([source in kernels/tf_record_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/tf_record_reader_op.cc))
+* [`tf.FixedLengthRecordReader`](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
+ ([source in kernels/fixed_length_record_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/fixed_length_record_reader_op.cc))
+* [`tf.TextLineReader`](../../api_docs/python/io_ops.md#TextLineReader)
+ ([source in kernels/text_line_reader_op.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/text_line_reader_op.cc))
+
+You can see that these all expose the same interface; the only differences
+are in their constructors. The most important method is `read()`.
+It takes a queue argument, which is where it gets filenames to
+read from whenever it needs one (e.g. when the `read` op first runs, or
+the previous `read` reads the last record from a file). It produces
+two scalar tensors: a string key and a string value.
+
+To create a new reader called `SomeReader`, you will need to:
+
+1. In C++, define a subclass of
+ [`tensorflow::ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/reader_base.h)
+ called `SomeReader`.
+2. In C++, register a new reader op and kernel with the name `"SomeReader"`.
+3. In Python, define a subclass of [`tf.ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py) called `SomeReader`.
+
+You can put all the C++ code in a file in
+`tensorflow/core/user_ops/some_reader_op.cc`. The code to read a file will live
+in a descendant of the C++ `ReaderBase` class, which is defined in
+[tensorflow/core/kernels/reader_base.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/kernels/reader_base.h).
+You will need to implement the following methods:
+
+* `OnWorkStartedLocked`: open the next file
+* `ReadLocked`: read a record or report EOF/error
+* `OnWorkFinishedLocked`: close the current file, and
+* `ResetLocked`: get a clean slate after, e.g., an error
+
+These methods have names ending in "Locked" since `ReaderBase` makes sure
+to acquire a mutex before calling any of these methods, so you generally don't
+have to worry about thread safety (though that only protects the members of the
+class, not global state).
+
+For `OnWorkStartedLocked`, the name of the file to open is the value returned by
+the `current_work()` method. `ReadLocked()` has this signature:
+
+```c++
+Status ReadLocked(string* key, string* value, bool* produced, bool* at_end)
+```
+
+If `ReadLocked()` successfully reads a record from the file, it should fill in:
+
+* `*key`: with an identifier for the record that a human could use to find
+  this record again. For example, you can take the filename from
+  `current_work()` and append a record number.
+* `*value`: with the contents of the record.
+* `*produced`: set to `true`.
+
+If you hit the end of a file (EOF), set `*at_end` to `true`. In either case,
+return `Status::OK()`. If there is an error, simply return it using one of the
+helper functions from
+[tensorflow/core/lib/core/errors.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/lib/core/errors.h)
+without modifying any arguments.
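+
+To make this contract concrete, here is a minimal sketch of these methods for
+the hypothetical `SomeReader`, treating each file as a single record. The
+members `env_`, `contents_`, and `already_produced_`, and the use of
+`ReadFileToString()` (a whole-file helper declared in TensorFlow's `Env`
+header), are assumptions of this sketch, not requirements of `ReaderBase`:
+
+```c++
+#include "tensorflow/core/kernels/reader_base.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+
+using namespace tensorflow;
+
+class SomeReader : public ReaderBase {
+ public:
+  SomeReader(const string& node_name, Env* env)
+      : ReaderBase(strings::StrCat("SomeReader '", node_name, "'")),
+        env_(env) {}
+
+  Status OnWorkStartedLocked() override {
+    already_produced_ = false;
+    // current_work() is the filename most recently taken from the queue.
+    return ReadFileToString(env_, current_work(), &contents_);
+  }
+
+  Status ReadLocked(string* key, string* value, bool* produced,
+                    bool* at_end) override {
+    if (already_produced_) {
+      *at_end = true;  // Only one record per file in this sketch.
+      return Status::OK();
+    }
+    *key = strings::StrCat(current_work(), ":0");  // filename plus record index
+    *value = contents_;
+    *produced = true;
+    already_produced_ = true;
+    return Status::OK();
+  }
+
+  Status OnWorkFinishedLocked() override {
+    contents_.clear();
+    return Status::OK();
+  }
+
+  Status ResetLocked() override {
+    already_produced_ = false;
+    contents_.clear();
+    return ReaderBase::ResetLocked();
+  }
+
+ private:
+  Env* env_;
+  string contents_;
+  bool already_produced_ = false;
+};
+```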
+
+Next you will create the actual Reader op. It will help if you are familiar
+with [the adding an op how-to](../adding_an_op/index.md). The main steps
+are:
+
+* Register the op.
+* Define and register an `OpKernel`.
+
+To register the op, you will use a `REGISTER_OP()` call defined in
+[tensorflow/core/framework/op.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op.h).
+Reader ops never take any input and always have a single output with type
+`Ref(string)`. They should always call `SetIsStateful()` and have string
+`container` and `shared_name` attrs. You may optionally define additional attrs
+for configuration or include documentation in a `Doc()`. For examples, see
+[tensorflow/core/ops/io_ops.cc](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/ops/io_ops.cc),
+e.g.:
+
+```c++
+#include "tensorflow/core/framework/op.h"
+
+REGISTER_OP("TextLineReader")
+ .Output("reader_handle: Ref(string)")
+ .Attr("skip_header_lines: int = 0")
+ .Attr("container: string = ''")
+ .Attr("shared_name: string = ''")
+ .SetIsStateful()
+ .Doc(R"doc(
+A Reader that outputs the lines of a file delimited by '\n'.
+)doc");
+```
+
+To define an `OpKernel`, Readers can use the shortcut of descending from
+`ReaderOpKernel`, defined in
+[tensorflow/core/framework/reader_op_kernel.h](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/reader_op_kernel.h),
+and implement a constructor that calls `SetReaderFactory()`. After defining
+your class, you will need to register it using `REGISTER_KERNEL_BUILDER(...)`.
+An example with no attrs:
+
+```c++
+#include "tensorflow/core/framework/reader_op_kernel.h"
+
+class TFRecordReaderOp : public ReaderOpKernel {
+ public:
+ explicit TFRecordReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ Env* env = context->env();
+ SetReaderFactory([this, env]() { return new TFRecordReader(name(), env); });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TFRecordReader").Device(DEVICE_CPU),
+ TFRecordReaderOp);
+```
+
+An example with attrs:
+
+```c++
+#include "tensorflow/core/framework/reader_op_kernel.h"
+
+class TextLineReaderOp : public ReaderOpKernel {
+ public:
+ explicit TextLineReaderOp(OpKernelConstruction* context)
+ : ReaderOpKernel(context) {
+ int skip_header_lines = -1;
+ OP_REQUIRES_OK(context,
+ context->GetAttr("skip_header_lines", &skip_header_lines));
+ OP_REQUIRES(context, skip_header_lines >= 0,
+ errors::InvalidArgument("skip_header_lines must be >= 0 not ",
+ skip_header_lines));
+ Env* env = context->env();
+ SetReaderFactory([this, skip_header_lines, env]() {
+ return new TextLineReader(name(), skip_header_lines, env);
+ });
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU),
+ TextLineReaderOp);
+```
+
+The last step is to add the Python wrapper. You will import
+`tensorflow.python.ops.io_ops` in
+[tensorflow/python/user_ops/user_ops.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/user_ops/user_ops.py)
+and add a descendant of [`io_ops.ReaderBase`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py).
+
+```python
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import io_ops
+# Assumes the generated Python wrapper module for your reader op; the exact
+# module name depends on how your op library is built.
+from tensorflow.python.ops import gen_user_ops
+
+class SomeReader(io_ops.ReaderBase):
+
+ def __init__(self, name=None):
+ rr = gen_user_ops.some_reader(name=name)
+ super(SomeReader, self).__init__(rr)
+
+
+ops.NoGradient("SomeReader")
+ops.RegisterShape("SomeReader")(common_shapes.scalar_shape)
+```
+
+You can see some examples in
+[`tensorflow/python/ops/io_ops.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/python/ops/io_ops.py).
+
+## Writing an Op for a record format <div class="md-anchor" id="AUTOGENERATED-writing-an-op-for-a-record-format">{#AUTOGENERATED-writing-an-op-for-a-record-format}</div>
+
+Generally this is an ordinary op that takes a scalar string record as input,
+so follow [the instructions to add an Op](../adding_an_op/index.md). You may
+optionally take a scalar string key as input, and include that in error messages
+reporting improperly formatted data. That way users can more easily track down
+where the bad data came from.
+
+Examples of Ops useful for decoding records:
+
+* [`tf.parse_single_example`](../../api_docs/python/io_ops.md#parse_single_example)
+ (and
+ [`tf.parse_example`](../../api_docs/python/io_ops.md#parse_example))
+* [`tf.decode_csv`](../../api_docs/python/io_ops.md#decode_csv)
+* [`tf.decode_raw`](../../api_docs/python/io_ops.md#decode_raw)
+
+Note that it can be useful to use multiple Ops to decode a particular record
+format. For example, you may have an image saved as a string in
+[a tf.train.Example protocol buffer](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/example.proto).
+Depending on the format of that image, you might take the corresponding output
+from a
+[`tf.parse_single_example`](../../api_docs/python/io_ops.md#parse_single_example)
+op and call [`tf.decode_jpeg`](../../api_docs/python/image.md#decode_jpeg),
+[`tf.decode_png`](../../api_docs/python/image.md#decode_png), or
+[`tf.decode_raw`](../../api_docs/python/io_ops.md#decode_raw). It is common to
+take the output of `tf.decode_raw` and use
+[`tf.slice`](../../api_docs/python/array_ops.md#slice) and
+[`tf.reshape`](../../api_docs/python/array_ops.md#reshape) to extract pieces.
diff --git a/tensorflow/g3doc/how_tos/reading_data/__init__.py b/tensorflow/g3doc/how_tos/reading_data/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/__init__.py
diff --git a/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py b/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py
new file mode 100644
index 0000000000..1d510cdfa9
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py
@@ -0,0 +1,87 @@
+"""Converts MNIST data to TFRecords file format with Example protos."""
+
+import os
+import tensorflow.python.platform
+
+import numpy
+import tensorflow as tf
+from tensorflow.g3doc.tutorials.mnist import input_data
+
+
+TRAIN_IMAGES = 'train-images-idx3-ubyte.gz' # MNIST filenames
+TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
+TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
+TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
+
+
+tf.app.flags.DEFINE_string('directory', 'data',
+ 'Directory to download data files and write the '
+ 'converted result')
+tf.app.flags.DEFINE_integer('validation_size', 5000,
+ 'Number of examples to separate from the training '
+ 'data for the validation set.')
+FLAGS = tf.app.flags.FLAGS
+
+
+def _int64_feature(value):
+ return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
+
+
+def _bytes_feature(value):
+ return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
+
+
+def convert_to(images, labels, name):
+ num_examples = labels.shape[0]
+ if images.shape[0] != num_examples:
+    raise ValueError("Images size %d does not match label size %d." %
+                     (images.shape[0], num_examples))
+ rows = images.shape[1]
+ cols = images.shape[2]
+ depth = images.shape[3]
+
+ filename = os.path.join(FLAGS.directory, name + '.tfrecords')
+ print 'Writing', filename
+ writer = tf.python_io.TFRecordWriter(filename)
+ for index in range(num_examples):
+ image_raw = images[index].tostring()
+ example = tf.train.Example(features=tf.train.Features(feature={
+ 'height':_int64_feature(rows),
+ 'width':_int64_feature(cols),
+ 'depth':_int64_feature(depth),
+ 'label':_int64_feature(int(labels[index])),
+ 'image_raw':_bytes_feature(image_raw)}))
+    writer.write(example.SerializeToString())
+  writer.close()
+
+
+def main(argv):
+ # Get the data.
+ train_images_filename = input_data.maybe_download(
+ TRAIN_IMAGES, FLAGS.directory)
+ train_labels_filename = input_data.maybe_download(
+ TRAIN_LABELS, FLAGS.directory)
+ test_images_filename = input_data.maybe_download(
+ TEST_IMAGES, FLAGS.directory)
+ test_labels_filename = input_data.maybe_download(
+ TEST_LABELS, FLAGS.directory)
+
+ # Extract it into numpy arrays.
+ train_images = input_data.extract_images(train_images_filename)
+ train_labels = input_data.extract_labels(train_labels_filename)
+ test_images = input_data.extract_images(test_images_filename)
+ test_labels = input_data.extract_labels(test_labels_filename)
+
+ # Generate a validation set.
+ validation_images = train_images[:FLAGS.validation_size, :, :, :]
+ validation_labels = train_labels[:FLAGS.validation_size]
+ train_images = train_images[FLAGS.validation_size:, :, :, :]
+ train_labels = train_labels[FLAGS.validation_size:]
+
+ # Convert to Examples and write the result to TFRecords.
+ convert_to(train_images, train_labels, 'train')
+ convert_to(validation_images, validation_labels, 'validation')
+ convert_to(test_images, test_labels, 'test')
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py
new file mode 100644
index 0000000000..b2436cd2ab
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py
@@ -0,0 +1,134 @@
+"""Trains the MNIST network using preloaded data in a constant.
+
+Command to run this py_binary target:
+
+bazel run -c opt \
+ <...>/tensorflow/g3doc/how_tos/reading_data:fully_connected_preloaded
+"""
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import input_data
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size. '
+ 'Must divide evenly into the dataset sizes.')
+flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
+flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
+ 'for unit testing.')
+
+
+def run_training():
+ """Train MNIST for a number of epochs."""
+ # Get the sets of images and labels for training, validation, and
+ # test on MNIST.
+ data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ with tf.name_scope('input'):
+ # Input data
+ input_images = tf.constant(data_sets.train.images)
+ input_labels = tf.constant(data_sets.train.labels)
+
+ image, label = tf.train.slice_input_producer(
+ [input_images, input_labels], num_epochs=FLAGS.num_epochs)
+ label = tf.cast(label, tf.int32)
+ images, labels = tf.train.batch(
+ [image, label], batch_size=FLAGS.batch_size)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
+
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph the Ops that calculate and apply gradients.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # Add the Op to compare the logits to the labels during evaluation.
+ eval_correct = mnist.evaluation(logits, labels)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Create a saver for writing training checkpoints.
+ saver = tf.train.Saver()
+
+ # Create the op for initializing variables.
+ init_op = tf.initialize_all_variables()
+
+ # Create a session for running Ops on the Graph.
+ sess = tf.Session()
+
+ # Run the Op to initialize the variables.
+ sess.run(init_op)
+
+ # Instantiate a SummaryWriter to output summaries and the Graph.
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ # And then after everything is built, start the training loop.
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Write the summaries and print an overview fairly often.
+ if step % 100 == 0:
+ # Print status to stdout.
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ # Update the events file.
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+ step += 1
+
+ # Save a checkpoint periodically.
+ if (step + 1) % 1000 == 0:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py
new file mode 100644
index 0000000000..89abd60d0e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py
@@ -0,0 +1,146 @@
+"""Trains the MNIST network using preloaded data stored in a variable.
+
+Command to run this py_binary target:
+
+bazel run -c opt \
+ <...>/tensorflow/g3doc/how_tos/reading_data:fully_connected_preloaded_var
+"""
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import input_data
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size. '
+ 'Must divide evenly into the dataset sizes.')
+flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
+flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
+ 'for unit testing.')
+
+
+def run_training():
+ """Train MNIST for a number of epochs."""
+ # Get the sets of images and labels for training, validation, and
+ # test on MNIST.
+ data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ with tf.name_scope('input'):
+ # Input data
+ images_initializer = tf.placeholder(
+ dtype=data_sets.train.images.dtype,
+ shape=data_sets.train.images.shape)
+ labels_initializer = tf.placeholder(
+ dtype=data_sets.train.labels.dtype,
+ shape=data_sets.train.labels.shape)
+ input_images = tf.Variable(
+ images_initializer, trainable=False, collections=[])
+ input_labels = tf.Variable(
+ labels_initializer, trainable=False, collections=[])
+
+ image, label = tf.train.slice_input_producer(
+ [input_images, input_labels], num_epochs=FLAGS.num_epochs)
+ label = tf.cast(label, tf.int32)
+ images, labels = tf.train.batch(
+ [image, label], batch_size=FLAGS.batch_size)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images, FLAGS.hidden1, FLAGS.hidden2)
+
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph the Ops that calculate and apply gradients.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # Add the Op to compare the logits to the labels during evaluation.
+ eval_correct = mnist.evaluation(logits, labels)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Create a saver for writing training checkpoints.
+ saver = tf.train.Saver()
+
+ # Create the op for initializing variables.
+ init_op = tf.initialize_all_variables()
+
+ # Create a session for running Ops on the Graph.
+ sess = tf.Session()
+
+ # Run the Op to initialize the variables.
+ sess.run(init_op)
+ sess.run(input_images.initializer,
+ feed_dict={images_initializer: data_sets.train.images})
+ sess.run(input_labels.initializer,
+ feed_dict={labels_initializer: data_sets.train.labels})
+
+ # Instantiate a SummaryWriter to output summaries and the Graph.
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ # And then after everything is built, start the training loop.
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Write the summaries and print an overview fairly often.
+ if step % 100 == 0:
+ # Print status to stdout.
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ # Update the events file.
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+ step += 1
+
+ # Save a checkpoint periodically.
+ if (step + 1) % 1000 == 0:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Saving'
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py b/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py
new file mode 100644
index 0000000000..f1e10ca34e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py
@@ -0,0 +1,180 @@
+"""Train and Eval the MNIST network.
+
+This version is like fully_connected_feed.py but uses data converted
+to a TFRecords file containing tf.train.Example protocol buffers.
+See tensorflow/g3doc/how_tos/reading_data.md#reading-from-files
+for context.
+
+YOU MUST run convert_to_records before running this (but you only need to
+run it once).
+"""
+
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('num_epochs', 2, 'Number of epochs to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size.')
+flags.DEFINE_string('train_dir', 'data', 'Directory with the training data.')
+
+# Constants used for dealing with the files, matches convert_to_records.
+TRAIN_FILE = 'train.tfrecords'
+VALIDATION_FILE = 'validation.tfrecords'
+
+
+def read_and_decode(filename_queue):
+ reader = tf.TFRecordReader()
+ _, serialized_example = reader.read(filename_queue)
+ features = tf.parse_single_example(
+ serialized_example,
+ dense_keys=['image_raw', 'label'],
+ # Defaults are not specified since both keys are required.
+ dense_types=[tf.string, tf.int64])
+
+ # Convert from a scalar string tensor (whose single string has
+ # length mnist.IMAGE_PIXELS) to a uint8 tensor with shape
+ # [mnist.IMAGE_PIXELS].
+ image = tf.decode_raw(features['image_raw'], tf.uint8)
+ image.set_shape([mnist.IMAGE_PIXELS])
+
+ # OPTIONAL: Could reshape into a 28x28 image and apply distortions
+ # here. Since we are not applying any distortions in this
+ # example, and the next step expects the image to be flattened
+ # into a vector, we don't bother.
+
+ # Convert from [0, 255] -> [-0.5, 0.5] floats.
+ image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
+
+ # Convert label from a scalar uint8 tensor to an int32 scalar.
+ label = tf.cast(features['label'], tf.int32)
+
+ return image, label
+
+
+def inputs(train, batch_size, num_epochs):
+ """Reads input data num_epochs times.
+
+ Args:
+ train: Selects between the training (True) and validation (False) data.
+ batch_size: Number of examples per returned batch.
+ num_epochs: Number of times to read the input data, or 0/None to
+ train forever.
+
+ Returns:
+ A tuple (images, labels), where:
+ * images is a float tensor with shape [batch_size, mnist.IMAGE_PIXELS]
+ in the range [-0.5, 0.5].
+ * labels is an int32 tensor with shape [batch_size] with the true label,
+ a number in the range [0, mnist.NUM_CLASSES).
+ Note that a tf.train.QueueRunner is added to the graph, which
+ must be run using e.g. tf.train.start_queue_runners().
+ """
+ if not num_epochs: num_epochs = None
+ filename = os.path.join(FLAGS.train_dir,
+ TRAIN_FILE if train else VALIDATION_FILE)
+
+ with tf.name_scope('input'):
+ filename_queue = tf.train.string_input_producer(
+ [filename], num_epochs=num_epochs)
+
+ # Even when reading in multiple threads, share the filename
+ # queue.
+ image, label = read_and_decode(filename_queue)
+
+ # Shuffle the examples and collect them into batch_size batches.
+ # (Internally uses a RandomShuffleQueue.)
+ # We run this in two threads to avoid being a bottleneck.
+ images, sparse_labels = tf.train.shuffle_batch(
+ [image, label], batch_size=batch_size, num_threads=2,
+ capacity=1000 + 3 * batch_size,
+ # Ensures a minimum amount of shuffling of examples.
+ min_after_dequeue=1000)
+
+ return images, sparse_labels
+
+
+def run_training():
+ """Train MNIST for a number of steps."""
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ # Input images and labels.
+ images, labels = inputs(train=True, batch_size=FLAGS.batch_size,
+ num_epochs=FLAGS.num_epochs)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images,
+ FLAGS.hidden1,
+ FLAGS.hidden2)
+
+ # Add to the Graph the loss calculation.
+ loss = mnist.loss(logits, labels)
+
+ # Add to the Graph operations that train the model.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # The op for initializing the variables.
+ init_op = tf.initialize_all_variables()
+
+ # Create a session for running operations in the Graph.
+ sess = tf.Session()
+
+ # Initialize the variables (the trained variables and the
+ # epoch counter).
+ sess.run(init_op)
+
+ # Start input enqueue threads.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+ try:
+ step = 0
+ while not coord.should_stop():
+ start_time = time.time()
+
+ # Run one step of the model. The return values are
+ # the activations from the `train_op` (which is
+ # discarded) and the `loss` op. To inspect the values
+ # of your ops or variables, you may include them in
+ # the list passed to sess.run() and the value tensors
+ # will be returned in the tuple from the call.
+ _, loss_value = sess.run([train_op, loss])
+
+ duration = time.time() - start_time
+
+ # Print an overview fairly often.
+ if step % 100 == 0:
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ step += 1
+ except tf.errors.OutOfRangeError:
+ print 'Done training for %d epochs, %d steps.' % (
+ FLAGS.num_epochs, step)
+ finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+ # Wait for threads to finish.
+ coord.join(threads)
+ sess.close()
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/how_tos/reading_data/index.md b/tensorflow/g3doc/how_tos/reading_data/index.md
new file mode 100644
index 0000000000..2b305f9333
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/reading_data/index.md
@@ -0,0 +1,495 @@
+# Reading data
+
+There are three main methods of getting data into a TensorFlow program:
+
+* Feeding: Python code provides the data when running each step.
+* Reading from files: an input pipeline reads the data from files
+ at the beginning of a TensorFlow graph.
+* Preloaded data: a constant or variable in the TensorFlow graph holds
+ all the data (for small data sets).
+
+<!-- TOC-BEGIN This section is generated by neural network: DO NOT EDIT! -->
+## Contents
+* [Feeding](#Feeding)
+* [Reading from files](#AUTOGENERATED-reading-from-files)
+ * [Filenames, shuffling, and epoch limits](#AUTOGENERATED-filenames--shuffling--and-epoch-limits)
+ * [File formats](#AUTOGENERATED-file-formats)
+ * [Preprocessing](#AUTOGENERATED-preprocessing)
+ * [Batching](#AUTOGENERATED-batching)
+ * [Creating threads to prefetch using `QueueRunner` objects](#QueueRunner)
+ * [Filtering records or producing multiple examples per record](#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record)
+ * [Sparse input data](#AUTOGENERATED-sparse-input-data)
+* [Preloaded data](#AUTOGENERATED-preloaded-data)
+* [Multiple input pipelines](#AUTOGENERATED-multiple-input-pipelines)
+
+
+<!-- TOC-END This section was generated by neural network, THANKS FOR READING! -->
+
+## Feeding <div class="md-anchor" id="Feeding">{#Feeding}</div>
+
+TensorFlow's feed mechanism lets you inject data into any Tensor in a
+computation graph. A python computation can thus feed data directly into the
+graph.
+
+Supply feed data through the `feed_dict` argument to a run() or eval() call
+that initiates computation.
+
+```python
+with tf.Session():
+ input = tf.placeholder(tf.float32)
+ classifier = ...
+ print classifier.eval(feed_dict={input: my_python_preprocessing_fn()})
+```
+
+While you can replace any Tensor with feed data, including variables and
+constants, the best practice is to use a
+[`placeholder` op](../../api_docs/python/io_ops.md#placeholder) node. A
+`placeholder` exists solely to serve as the target of feeds. It is not
+initialized and contains no data. A placeholder generates an error if
+it is executed without a feed, so you won't forget to feed it.
+
+An example using `placeholder` and feeding to train on MNIST data can be found
+in
+[tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py),
+and is described in the [MNIST tutorial](../../tutorials/mnist/tf/index.md).
+
+## Reading from files <div class="md-anchor" id="AUTOGENERATED-reading-from-files">{#AUTOGENERATED-reading-from-files}</div>
+
+A typical pipeline for reading records from files has the following stages:
+
+1. The list of filenames
+2. *Optional* filename shuffling
+3. *Optional* epoch limit
+4. Filename queue
+5. A Reader for the file format
+6. A decoder for a record read by the reader
+7. *Optional* preprocessing
+8. Example queue
+
+### Filenames, shuffling, and epoch limits <div class="md-anchor" id="AUTOGENERATED-filenames--shuffling--and-epoch-limits">{#AUTOGENERATED-filenames--shuffling--and-epoch-limits}</div>
+
+For the list of filenames, use either a constant string Tensor (like
+`["file0", "file1"]` or `[("file%d" % i) for i in range(2)]`) or the
+[tf.train.match_filenames_once
+function](../../api_docs/python/io_ops.md#match_filenames_once).
+
+Pass the list of filenames to the [tf.train.string_input_producer
+function](../../api_docs/python/io_ops.md#string_input_producer).
+`string_input_producer` creates a FIFO queue for holding the filenames until
+the reader needs them.
+
+`string_input_producer` has options for shuffling and setting a maximum number
+of epochs. A queue runner adds the whole list of filenames to the queue once
+for each epoch, shuffling the filenames within an epoch if `shuffle=True`.
+This procedure provides a uniform sampling of files, so that examples are not
+under- or over-sampled relative to each other.
+
+The queue runner works in a thread separate from the reader that pulls
+filenames from the queue, so the shuffling and enqueuing process does not
+block the reader.
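+
+For example (the file pattern and epoch count here are only illustrative), the
+filename stage might look like this:
+
+```python
+# Match the input files and queue them up for 5 epochs, shuffling the
+# filenames within each epoch.
+filenames = tf.train.match_filenames_once("/tmp/data/*.csv")
+filename_queue = tf.train.string_input_producer(
+    filenames, num_epochs=5, shuffle=True)
+```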
+
+### File formats <div class="md-anchor" id="AUTOGENERATED-file-formats">{#AUTOGENERATED-file-formats}</div>
+
+Select the reader that matches your input file format and pass the filename
+queue to the reader's read method. The read method outputs a key identifying
+the file and record (useful for debugging if you have some weird records), and
+a scalar string value. Use one (or more) of the decoder and conversion ops to
+decode this string into the tensors that make up an example.
+
+#### CSV files
+
+To read text files in [comma-separated value (CSV)
+format](https://tools.ietf.org/html/rfc4180), use a
+[TextLineReader](../../api_docs/python/io_ops.md#TextLineReader) with the
+[decode_csv](../../api_docs/python/io_ops.md#decode_csv) operation. For example:
+
+```python
+filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
+
+reader = tf.TextLineReader()
+key, value = reader.read(filename_queue)
+
+# Default values, in case of empty columns. Also specifies the type of the
+# decoded result.
+record_defaults = [[1], [1], [1], [1], [1]]
+col1, col2, col3, col4, col5 = tf.decode_csv(
+ value, record_defaults=record_defaults)
+features = tf.concat(0, [col1, col2, col3, col4])
+
+with tf.Session() as sess:
+ # Start populating the filename queue.
+ coord = tf.train.Coordinator()
+ threads = tf.train.start_queue_runners(coord=coord)
+
+ for i in range(1200):
+ # Retrieve a single instance:
+ example, label = sess.run([features, col5])
+
+ coord.request_stop()
+ coord.join(threads)
+```
+
+Each execution of `read()` reads a single line from the file. The
+`decode_csv()` op then parses the result into a list of tensors. The
+`record_defaults` argument determines the type of the resulting tensors and
+sets the default value to use if a value is missing in the input string.
+
+You must call `tf.train.start_queue_runners()` to populate the queue before
+you call `run()` or `eval()` to execute the `read()`. Otherwise `read()` will
+block while it waits for filenames from the queue.
+
+#### Fixed length records
+
+To read binary files in which each record is a fixed number of bytes, use
+[tf.FixedLengthRecordReader](../../api_docs/python/io_ops.md#FixedLengthRecordReader)
+with the [tf.decode_raw](../../api_docs/python/io_ops.md#decode_raw) operation.
+The `decode_raw` op converts from a string to a uint8 tensor.
+
+For example, [the CIFAR-10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html)
+uses a file format where each record is represented using a fixed number of
+bytes: 1 byte for the label followed by 3072 bytes of image data. Once you have
+a uint8 tensor, standard operations can slice out each piece and reformat as
+needed. For CIFAR-10, you can see how to do the reading and decoding in
+[tensorflow/models/image/cifar10/cifar10_input.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_input.py)
+as described in
+[this tutorial](../../tutorials/deep_cnn/index.md#prepare-the-data).
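+
+As a rough sketch (the filename is made up, and the record layout is the
+1-byte label plus 3072 image bytes described above):
+
+```python
+label_bytes = 1
+image_bytes = 32 * 32 * 3
+
+filename_queue = tf.train.string_input_producer(
+    ["/tmp/cifar10/data_batch_1.bin"])
+reader = tf.FixedLengthRecordReader(record_bytes=label_bytes + image_bytes)
+key, value = reader.read(filename_queue)
+
+# decode_raw turns the fixed-length string record into a vector of bytes.
+record = tf.decode_raw(value, tf.uint8)
+
+# Slice out the label and the image data, then convert to the types we want.
+label = tf.cast(tf.slice(record, [0], [label_bytes]), tf.int32)
+image = tf.reshape(tf.slice(record, [label_bytes], [image_bytes]),
+                   [3, 32, 32])
+```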
+
+#### Standard TensorFlow format
+
+Another approach is to convert whatever data you have into a supported format.
+This approach makes it easier to mix and match data sets and network
+architectures. The recommended format for TensorFlow is a TFRecords file
+containing
+[tf.train.Example protocol buffers](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/example.proto)
+(which contain
+[`Features`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/example/feature.proto)
+as a field). You write a little program that gets your data, stuffs it in an
+`Example` protocol buffer, serializes the protocol buffer to a string, and then
+writes the string to a TFRecords file using the
+[tf.python_io.TFRecordWriter class](../../api_docs/python/python_io.md#TFRecordWriter).
+For example,
+[tensorflow/g3doc/how_tos/reading_data/convert_to_records.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/convert_to_records.py)
+converts MNIST data to this format.
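+
+A minimal sketch of the writing side, assuming you already have matching
+`images` and `labels` arrays in memory (the output path is made up; the
+feature names mirror the reader example below):
+
+```python
+writer = tf.python_io.TFRecordWriter("/tmp/data/train.tfrecords")
+for image, label in zip(images, labels):
+  # Stuff the raw image bytes and the integer label into an Example.
+  example = tf.train.Example(features=tf.train.Features(feature={
+      'image_raw': tf.train.Feature(
+          bytes_list=tf.train.BytesList(value=[image.tostring()])),
+      'label': tf.train.Feature(
+          int64_list=tf.train.Int64List(value=[int(label)])),
+  }))
+  writer.write(example.SerializeToString())
+writer.close()
+```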
+
+To read a file of TFRecords, use
+[tf.TFRecordReader](../../api_docs/python/io_ops.md#TFRecordReader) with
+the [tf.parse_single_example](../../api_docs/python/io_ops.md#parse_single_example)
+decoder. The `parse_single_example` op decodes the example protocol buffers into
+tensors. An MNIST example using the data produced by `convert_to_records` can be
+found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_reader.py),
+which you can compare with the `fully_connected_feed` version.
+
+### Preprocessing <div class="md-anchor" id="AUTOGENERATED-preprocessing">{#AUTOGENERATED-preprocessing}</div>
+
+You can then do any preprocessing of these examples you want. This would be any
+processing that doesn't depend on trainable parameters. Examples include
+normalization of your data, picking a random slice, adding noise or distortions,
+etc. See
+[tensorflow/models/image/cifar10/cifar10.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py)
+for an example.
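+
+For instance, a small sketch of preprocessing, assuming `image` is the flat
+uint8 vector produced by `tf.decode_raw` in the reader above (flipping MNIST
+digits is not actually useful; it is only an illustration):
+
+```python
+# Reshape the flat vector into a 28x28x1 image so image ops can be applied.
+image = tf.reshape(image, [28, 28, 1])
+# Randomly flip the image left to right as a simple distortion.
+image = tf.image.random_flip_left_right(image)
+# Scale from [0, 255] bytes to [-0.5, 0.5] floats, as in read_and_decode().
+image = tf.cast(image, tf.float32) * (1. / 255) - 0.5
+```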
+
+### Batching <div class="md-anchor" id="AUTOGENERATED-batching">{#AUTOGENERATED-batching}</div>
+
+At the end of the pipeline we use another queue to batch together examples for
+training, evaluation, or inference. For this we use a queue that randomizes the
+order of examples, using the
+[tf.train.shuffle_batch function](../../api_docs/python/io_ops.md#shuffle_batch).
+
+Example:
+
+```python
+def read_my_file_format(filename_queue):
+ reader = tf.SomeReader()
+ key, record_string = reader.read(filename_queue)
+ example, label = tf.some_decoder(record_string)
+ processed_example = some_processing(example)
+ return processed_example, label
+
+def input_pipeline(filenames, batch_size, num_epochs=None):
+ filename_queue = tf.train.string_input_producer(
+ filenames, num_epochs=num_epochs, shuffle=True)
+ example, label = read_my_file_format(filename_queue)
+ # min_after_dequeue defines how big a buffer we will randomly sample
+ # from -- bigger means better shuffling but slower start up and more
+ # memory used.
+ # capacity must be larger than min_after_dequeue and the amount larger
+ # determines the maximum we will prefetch. Recommendation:
+ # min_after_dequeue + (num_threads + a small safety margin) * batch_size
+ min_after_dequeue = 10000
+ capacity = min_after_dequeue + 3 * batch_size
+ example_batch, label_batch = tf.train.shuffle_batch(
+ [example, label], batch_size=batch_size, capacity=capacity,
+ min_after_dequeue=min_after_dequeue)
+ return example_batch, label_batch
+```
+
+If you need more parallelism or shuffling of examples between files, use
+multiple reader instances using the
+[tf.train.shuffle_batch_join function](../../api_docs/python/io_ops.md#shuffle_batch_join).
+For example:
+
+```python
+def read_my_file_format(filename_queue):
+ # Same as above
+
+def input_pipeline(filenames, batch_size, read_threads, num_epochs=None):
+ filename_queue = tf.train.string_input_producer(
+ filenames, num_epochs=num_epochs, shuffle=True)
+ example_list = [read_my_file_format(filename_queue)
+ for _ in range(read_threads)]
+ min_after_dequeue = 10000
+ capacity = min_after_dequeue + 3 * batch_size
+ example_batch, label_batch = tf.train.shuffle_batch_join(
+ example_list, batch_size=batch_size, capacity=capacity,
+ min_after_dequeue=min_after_dequeue)
+ return example_batch, label_batch
+```
+
+You still only use a single filename queue that is shared by all the readers.
+That way we ensure that the different readers use different files from the same
+epoch until all the files from the epoch have been started. (It is also usually
+sufficient to have a single thread filling the filename queue.)
+
+An alternative is to use a single reader via the
+[tf.train.shuffle_batch function](../../api_docs/python/io_ops.md#shuffle_batch)
+with `num_threads` bigger than 1. This will make the threads all read from a
+single file at the same time (but faster than with 1 thread), instead of
+reading from N files at once.
+This can be important:
+
+* If you have more reading threads than input files, to avoid the risk that
+ you will have two threads reading the same example from the same file near
+ each other.
+* Or if reading N files in parallel causes too many disk seeks.
+
+How many threads do you need? The `tf.train.shuffle_batch*` functions add a
+summary to the graph that indicates how full the example queue is. If you have
+enough reading threads, that summary will stay above zero. You can
+[view your summaries as training progresses using TensorBoard](../summaries_and_tensorboard/index.md).
+
+### Creating threads to prefetch using `QueueRunner` objects <div class="md-anchor" id="QueueRunner">{#QueueRunner}</div>
+
+The short version: many of the `tf.train` functions listed above add
+[`QueueRunner`](../../api_docs/python/train.md#QueueRunner) objects to your
+graph. These require that you call
+[tf.train.start_queue_runners](../../api_docs/python/train.md#start_queue_runners)
+before running any training or inference steps, or it will hang forever. This
+will start threads that run the input pipeline, filling the example queue so
+that the dequeue to get the examples will succeed. This is best combined with a
+[tf.train.Coordinator](../../api_docs/python/train.md#Coordinator) to cleanly
+shut down these threads when there are errors. If you set a limit on the number
+of epochs, that will use an epoch counter that will need to be initialized. The
+recommended code pattern combining these is:
+
+```python
+# Create the graph, etc.
+init_op = tf.initialize_all_variables()
+
+# Create a session for running operations in the Graph.
+sess = tf.Session()
+
+# Initialize the variables (like the epoch counter).
+sess.run(init_op)
+
+# Start input enqueue threads.
+coord = tf.train.Coordinator()
+threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+
+try:
+ while not coord.should_stop():
+ # Run training steps or whatever
+ sess.run(train_op)
+
+except tf.errors.OutOfRangeError:
+ print 'Done training -- epoch limit reached'
+finally:
+ # When done, ask the threads to stop.
+ coord.request_stop()
+
+# Wait for threads to finish.
+coord.join(threads)
+sess.close()
+```
+
+#### Aside: What is happening here?
+
+First we create the graph. It will have a few pipeline stages that are
+connected by queues. The first stage will generate filenames to read and enqueue
+them in the filename queue. The second stage consumes filenames (using a
+`Reader`), produces examples, and enqueues them in an example queue. Depending
+on how you have set things up, you may actually have a few independent copies of
+the second stage, so that you can read from multiple files in parallel. At the
+end of these stages is an enqueue operation, which enqueues into a queue that
+the next stage dequeues from. We want to start threads running these enqueuing
+operations, so that our training loop can dequeue examples from the example
+queue.
+
+<div style="width:70%; margin-left:12%; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="AnimatedFileQueues.gif">
+</div>
+
+The helpers in `tf.train` that create these queues and enqueuing operations add
+a [tf.train.QueueRunner](../../api_docs/python/train.md#QueueRunner) to the
+graph using the
+[tf.train.add_queue_runner](../../api_docs/python/train.md#add_queue_runner)
+function. Each `QueueRunner` is responsible for one stage, and holds the list of
+enqueue operations that need to be run in threads. Once the graph is
+constructed, the
+[tf.train.start_queue_runners](../../api_docs/python/train.md#start_queue_runners)
+function asks each QueueRunner in the graph to start its threads running the
+enqueuing operations.
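+
+If you build an input stage by hand, you can follow the same pattern. A
+minimal sketch, where `queue` and `enqueue_op` stand in for whatever queue and
+enqueue operation you have constructed:
+
+```python
+# Create a QueueRunner that will run 4 enqueue threads for this stage and
+# register it in the graph's collection of queue runners.
+qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)
+tf.train.add_queue_runner(qr)
+
+# Later, start_queue_runners() will find this QueueRunner (along with any
+# added by the tf.train helpers) and start its threads.
+threads = tf.train.start_queue_runners(sess=sess, coord=coord)
+```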
+
+If all goes well, you can now run your training steps and the queues will be
+filled by the background threads. If you have set an epoch limit, at some point
+an attempt to dequeue examples will get an
+[`tf.errors.OutOfRangeError`](../../api_docs/python/client.md#OutOfRangeError). This
+is the TensorFlow equivalent of "end of file" (EOF) -- this means the epoch
+limit has been reached and no more examples are available.
+
+The last ingredient is the
+[Coordinator](../../api_docs/python/train.md#Coordinator). This is responsible
+for letting all the threads know if anything has signalled a shut down. Most
+commonly this would be because an exception was raised, for example one of the
+threads got an error when running some operation (or an ordinary Python
+exception).
+
+For more about threading, queues, QueueRunners, and Coordinators
+[see here](../threading_and_queues/index.md).
+
+#### Aside: How clean shut-down when limiting epochs works
+
+Imagine you have a model that has set a limit on the number of epochs to train
+on. That means that the thread generating filenames will only run that many
+times before generating an `OutOfRange` error. The QueueRunner will catch that
+error, close the filename queue, and exit the thread. Closing the queue does two
+things:
+
+* Any future attempt to enqueue in the filename queue will generate an error.
+ At this point there shouldn't be any threads trying to do that, but this
+ is helpful when queues are closed due to other errors.
+* Any current or future dequeue will either succeed (if there are enough
+ elements left) or fail (with an `OutOfRange` error) immediately. They won't
+ block waiting for more elements to be enqueued, since by the previous point
+ that can't happen.
+
+The point is that when the filename queue is closed, there will likely still be
+many filenames in that queue, so the next stage of the pipeline (with the reader
+and other preprocessing) may continue running for some time. Once the filename
+queue is exhausted, though, the next attempt to dequeue a filename (e.g. from a
+reader that has finished with the file it was working on) will trigger an
+`OutOfRange` error. In this case, though, you might have multiple threads
+associated with a single QueueRunner. If this isn't the last thread in the
+QueueRunner, the `OutOfRange` error just causes the one thread to exit. This
+allows the other threads, which are still finishing up their last file, to
+proceed until they finish as well. (Assuming you are using a
+[tf.train.Coordinator](../../api_docs/python/train.md#Coordinator),
+other types of errors will cause all the threads to stop.) Once all the reader
+threads hit the `OutOfRange` error, only then does the next queue, the example
+queue, get closed.
+
+Again, the example queue will have some elements queued, so training will
+continue until those are exhausted. If the example queue is a
+[RandomShuffleQueue](../../api_docs/python/io_ops.md#RandomShuffleQueue), say
+because you are using `shuffle_batch` or `shuffle_batch_join`, it normally
+avoids ever having fewer than its `min_after_dequeue` attr elements
+buffered. However, once the queue is closed that restriction will be lifted and
+the queue will eventually empty. At that point the actual training threads,
+when they try to dequeue from the example queue, will start getting `OutOfRange`
+errors and exiting. Once all the training threads are done,
+[tf.train.Coordinator.join()](../../api_docs/python/train.md#Coordinator.join)
+will return and you can exit cleanly.
+
+### Filtering records or producing multiple examples per record <div class="md-anchor" id="AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record">{#AUTOGENERATED-filtering-records-or-producing-multiple-examples-per-record}</div>
+
+Instead of having your decoder emit a single example with shape `[x, y, z]`
+per record, have it emit a batch of examples with shape `[batch, x, y, z]`.
+The batch size can be 0 if you want to filter this record out (maybe it is in
+a hold-out set?), or bigger than 1 if you are producing multiple examples per
+record. Then simply set `enqueue_many=True`
+when calling one of the batching functions (such as `shuffle_batch` or
+`shuffle_batch_join`).
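+
+A sketch of that pattern, where `decode_many_examples` is a hypothetical
+decoder that returns zero or more examples per record:
+
+```python
+# `examples` has shape [batch, ...] and `labels` has shape [batch], where
+# batch may be 0 (record filtered out) or greater than 1.
+examples, labels = decode_many_examples(record_string)
+example_batch, label_batch = tf.train.shuffle_batch(
+    [examples, labels], batch_size=32, num_threads=2,
+    capacity=2000, min_after_dequeue=1000,
+    enqueue_many=True)
+```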
+
+### Sparse input data <div class="md-anchor" id="AUTOGENERATED-sparse-input-data">{#AUTOGENERATED-sparse-input-data}</div>
+
+SparseTensors don't play well with queues. If you use SparseTensors you have
+to decode the string records using
+[tf.parse_example](../../api_docs/python/io_ops.md#parse_example) **after**
+batching (instead of using `tf.parse_single_example` before batching).
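+
+A rough sketch, assuming `serialized_batch` is a 1-D string tensor of
+serialized `Example` protos produced by one of the batching functions above,
+and that each `Example` has a variable-length feature named `tags` (the
+feature name, and the argument style mirroring `parse_single_example`, are
+only illustrative):
+
+```python
+parsed = tf.parse_example(
+    serialized_batch, sparse_keys=['tags'], sparse_types=[tf.string])
+tags = parsed['tags']  # A SparseTensor covering the whole batch.
+```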
+
+## Preloaded data <div class="md-anchor" id="AUTOGENERATED-preloaded-data">{#AUTOGENERATED-preloaded-data}</div>
+
+This is only used for small data sets that can be loaded entirely in memory.
+There are two approaches:
+
+* Store the data in a constant.
+* Store the data in a variable, that you initialize and then never change.
+
+Using a constant is a bit simpler, but uses more memory (since the constant is
+stored inline in the graph data structure, which may be duplicated a few times).
+
+```python
+training_data = ...
+training_labels = ...
+with tf.Session():
+ input_data = tf.constant(training_data)
+ input_labels = tf.constant(training_labels)
+ ...
+```
+
+To instead use a variable, you need to also initialize it after the graph has been built.
+
+```python
+training_data = ...
+training_labels = ...
+with tf.Session() as sess:
+ data_initializer = tf.placeholder(dtype=training_data.dtype,
+ shape=training_data.shape)
+ label_initializer = tf.placeholder(dtype=training_labels.dtype,
+ shape=training_labels.shape)
+ input_data = tf.Variable(data_initializer, trainable=False, collections=[])
+ input_labels = tf.Variable(label_initializer, trainable=False, collections=[])
+ ...
+ sess.run(input_data.initializer,
+ feed_dict={data_initializer: training_data})
+ sess.run(input_labels.initializer,
+ feed_dict={label_initializer: training_labels})
+```
+
+Setting `trainable=False` keeps the variable out of the
+`GraphKeys.TRAINABLE_VARIABLES` collection in the graph, so we won't try and
+update it when training. Setting `collections=[]` keeps the variable out of the
+`GraphKeys.VARIABLES` collection used for saving and restoring checkpoints.
+
+Either way,
+[tf.train.slice_input_producer function](../../api_docs/python/io_ops.md#slice_input_producer)
+can be used to produce a slice at a time. This shuffles the examples across an
+entire epoch, so further shuffling when batching is undesirable. Instead of
+using the `shuffle_batch` functions, we therefore use the plain
+[tf.train.batch function](../../api_docs/python/io_ops.md#batch). To use
+multiple preprocessing threads, set the `num_threads` parameter to a number
+bigger than 1.
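+
+A minimal sketch of that pattern (batch size, epoch count, and thread count
+are only illustrative), using the `input_data` and `input_labels` created
+above:
+
+```python
+image, label = tf.train.slice_input_producer(
+    [input_data, input_labels], num_epochs=2)
+label = tf.cast(label, tf.int32)
+images, labels = tf.train.batch(
+    [image, label], batch_size=100, num_threads=4)
+```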
+
+An MNIST example that preloads the data using constants can be found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded.py), and one that preloads the data using variables can be found in
+[tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/how_tos/reading_data/fully_connected_preloaded_var.py).
+You can compare these with the `fully_connected_feed` and
+`fully_connected_reader` versions above.
+
+## Multiple input pipelines <div class="md-anchor" id="AUTOGENERATED-multiple-input-pipelines">{#AUTOGENERATED-multiple-input-pipelines}</div>
+
+Commonly you will want to train on one dataset and evaluate (or "eval") on
+another. One way to do this is to actually have two separate processes:
+
+* The training process reads training input data and periodically writes
+ checkpoint files with all the trained variables.
+* The evaluation process restores the checkpoint files into an inference
+ model that reads validation input data.
+
+This is what is done in
+[the example CIFAR-10 model](../../tutorials/deep_cnn/index.md#save-and-restore-checkpoints). This has a couple of benefits:
+
+* The eval is performed on a single snapshot of the trained variables.
+* You can perform the eval even after training has completed and exited.
+
+You can have the train and eval in the same graph in the same process, and share
+their trained variables. See
+[the shared variables tutorial](../variable_scope/index.md).
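+
+A rough sketch of the two-process pattern (paths are made up; the eval loop
+and model construction are elided):
+
+```python
+# In the training process:
+saver = tf.train.Saver()
+...
+saver.save(sess, '/tmp/mnist_train/model.ckpt', global_step=step)
+
+# In the eval process:
+saver = tf.train.Saver()
+ckpt = tf.train.get_checkpoint_state('/tmp/mnist_train')
+if ckpt and ckpt.model_checkpoint_path:
+  saver.restore(sess, ckpt.model_checkpoint_path)
+  ...run the eval against the restored variables...
+```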
diff --git a/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md b/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md
new file mode 100644
index 0000000000..18f4b4260e
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/summaries_and_tensorboard/index.md
@@ -0,0 +1,102 @@
+# TensorBoard: Visualizing Your Training
+
+The computations you'll use TensorBoard for - like training a massive
+deep neural network - can be complex and confusing. To make it easier to
+understand, debug, and optimize TensorFlow programs, we've included a suite of
+visualization tools called TensorBoard. You can use TensorBoard to visualize
+your TensorFlow graph, quantitative metrics about the execution of your graph,
+and even additional data like images that pass through it. When TensorBoard is
+fully configured, it looks like this:
+
+TODO(danmane): Enable a live TensorBoard
+![MNIST TensorBoard](./mnist_tensorboard.png "MNIST TensorBoard")
+
+## Serializing the data
+
+TensorBoard operates by reading TensorFlow events files, which contain summary
+data that you can generate when running TensorFlow. Here's the general
+lifecycle for summary data within TensorBoard.
+
+First, create the TensorFlow graph that you'd like to collect summary
+data from, and decide which nodes you would like to annotate with
+[summary operations](../../api_docs/python/train.md#summary-operations).
+
+For example, suppose that you are creating a convolutional neural network for
+training MNIST digits recognition. You'd like to record how the learning rate
+varies over time, and how the objective function is changing. Collect these by
+attaching [`scalar_summary`](../../api_docs/python/train.md#scalar_summary) ops
+to the nodes that output the learning rate and loss respectively. Then, give
+each `scalar_summary` a meaningful `tag`, like `'learning rate'` and `'loss
+function'`.
+
+Perhaps you'd also like to visualize the distributions of activations coming
+off a particular layer, or the distribution of gradients or weights. Collect
+this data by attaching
+[`histogram_summary`](../../api_docs/python/train.md#histogram_summary) ops to
+the gradient outputs and to the variable that holds your weights, respectively.
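+
+A small sketch of attaching these summaries (the `learning_rate`, `loss`, and
+`weights` tensors are assumed to already exist in your graph, and the tags are
+just examples):
+
+```python
+tf.scalar_summary('learning rate', learning_rate)
+tf.scalar_summary('loss function', loss)
+tf.histogram_summary('layer1 weights', weights)
+```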
+
+For details on all of the summary operations available, check out the docs on
+[summary operations](../../api_docs/python/train.md#summary-operations).
+
+Operations in TensorFlow don't do anything until you run them, or an op that
+depends on their output. And the summary nodes that we've just created are
+peripheral to your graph: none of the ops you are currently running depend on
+them. So, to generate summaries, we need to run all of these summary nodes.
+Managing them by hand would be tedious, so use
+[`tf.merge_all_summaries`](../../api_docs/python/train.md#merge_all_summaries)
+to combine them into a single op that generates all the summary data.
+
+Then, you can just run the merged summary op, which will generate a serialized
+`Summary` protobuf object with all of your summary data at a given step.
+Finally, to write this summary data to disk, pass the summary protobuf to a
+[`tf.train.SummaryWriter`](../../api_docs/python/train.md#SummaryWriter).
+
+The `SummaryWriter` takes a logdir in its constructor. This logdir is quite
+important: it is the directory where all of the events will be written out.
+Also, the `SummaryWriter` can optionally take a `GraphDef` in its constructor.
+If it receives one, then TensorBoard will visualize your graph as well.
+
+Now that you've modified your graph and have a `SummaryWriter`, you're ready to
+start running your network! If you want, you could run the merged summary op
+every single step, and record a ton of training data. That's likely to be more
+data than you need, though. Instead, consider running the merged summary op
+every hundred steps or so, as in the following code example.
+
+```python
+merged_summary_op = tf.merge_all_summaries()
+summary_writer = tf.train.SummaryWriter('/tmp/mnist_logs', sess.graph_def)
+total_step = 0
+while training:
+ total_step += 1
+ sess.run(training_op)
+ if total_step % 100 == 0:
+ summary_str = sess.run(merged_summary_op)
+ summary_writer.add_summary(summary_str, total_step)
+```
+
+You're now all set to visualize this data using TensorBoard.
+
+
+## Launching TensorBoard
+
+To run TensorBoard, use the command
+`python tensorflow/tensorboard/tensorboard.py --logdir=path/to/logs`, where
+`logdir` points to the directory where the `SummaryWriter` serialized its data.
+If this `logdir` directory contains sub-directories which contain serialized
+data from separate runs, then TensorBoard will visualize the data from all of
+those runs. Once TensorBoard is running, navigate your web browser to
+localhost:6006 to view the TensorBoard.
+
+If you have installed TensorBoard via pip, you can simply run the command
+`tensorboard --logdir=/path/to/logs` to launch it.
+
+When looking at TensorBoard, you will see the navigation tabs in the top right
+corner. Each tab represents a set of serialized data that can be visualized.
+If the logs being read by TensorBoard do not contain any data relevant to the
+tab you are viewing, a message will be displayed indicating how to serialize
+data that is applicable to that tab.
+
+For in-depth information on how to use the "graph" tab to visualize your graph,
+see [TensorBoard: Visualizing your graph](../graph_viz/index.md).
diff --git a/tensorflow/g3doc/how_tos/threading_and_queues/index.md b/tensorflow/g3doc/how_tos/threading_and_queues/index.md
new file mode 100644
index 0000000000..c472de18c5
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/threading_and_queues/index.md
@@ -0,0 +1,146 @@
+# Threading and Queues
+
+Queues, such as `FIFOQueue` and `RandomShuffleQueue`, are important TensorFlow
+objects for computing tensors asynchronously in a graph.
+
+For example, a typical input architecture is to use a `RandomShuffleQueue` to
+prepare inputs for training a model:
+
+* Multiple threads prepare training examples and push them in the queue.
+* A training thread executes a training op that dequeues mini-batches from the
+ queue.
+
+This architecture has many benefits, as highlighted in the
+[Reading data how-to](../reading_data), which also gives an overview of
+functions that simplify the construction of input pipelines.
+
+The TensorFlow `Session` object is multithreaded, so multiple threads can
+easily use the same session and run ops in parallel. However, it is not always
+easy to implement a Python program that drives threads as described above. All
+threads must be able to stop together, exceptions must be caught and
+reported, and queues must be properly closed when stopping.
+
+TensorFlow provides two classes to help:
+[tf.Coordinator](../../api_docs/python/train.md#Coordinator) and
+[tf.QueueRunner](../../api_docs/python/train.md#QueueRunner). These two classes
+are designed to be used together. The `Coordinator` class helps multiple threads
+stop together and report exceptions to a program that waits for them to stop.
+The `QueueRunner` class is used to create a number of threads cooperating to
+enqueue tensors in the same queue.
+
+## Coordinator
+
+The Coordinator class helps multiple threads stop together.
+
+Its key methods are:
+
+* `should_stop()`: returns True if the threads should stop.
+* `request_stop(<exception>)`: requests that threads should stop.
+* `join(<list of threads>)`: waits until the specified threads have stopped.
+
+You first create a `Coordinator` object, and then create a number of threads
+that use the coordinator. The threads typically run loops that stop when
+`should_stop()` returns `True`.
+
+Any thread can decide that the computation should stop. It only has to call
+`request_stop()` and the other threads will stop as `should_stop()` will then
+return `True`.
+
+```python
+# Thread body: loop until the coordinator indicates a stop was requested.
+# If some condition becomes true, ask the coordinator to stop.
+def MyLoop(coord):
+ while not coord.should_stop():
+ ...do something...
+ if ...some condition...:
+ coord.request_stop()
+
+# Main code: create a coordinator.
+coord = tf.train.Coordinator()
+
+# Create 10 threads that run 'MyLoop()'
+threads = [threading.Thread(target=MyLoop, args=(coord,)) for i in xrange(10)]
+
+# Start the threads and wait for all of them to stop.
+for t in threads: t.start()
+coord.join(threads)
+```
+
+Obviously, the coordinator can manage threads doing very different things.
+They do not all have to be the same, as they are in the example above. The coordinator
+also has support to capture and report exceptions. See the [Coordinator class](../../api_docs/python/train.md#Coordinator) documentation for more details.
+
+## QueueRunner
+
+The `QueueRunner` class creates a number of threads that repeatedly run an
+enqueue op. These threads can use a coordinator to stop together. In
+addition, a queue runner runs a *closer thread* that automatically closes the
+queue if an exception is reported to the coordinator.
+
+You can use a queue runner to implement the architecture described above.
+
+First build a graph that uses a `Queue` for input examples. Add ops that
+process examples and enqueue them in the queue. Add training ops that start by
+dequeueing from the queue.
+
+```python
+example = ...ops to create one example...
+# Create a queue, and an op that enqueues examples one at a time in the queue.
+queue = tf.RandomShuffleQueue(...)
+enqueue_op = queue.enqueue(example)
+# Create a training graph that starts by dequeuing a batch of examples.
+inputs = queue.dequeue_many(batch_size)
+train_op = ...use 'inputs' to build the training part of the graph...
+```
+
+In the Python training program, create a `QueueRunner` that will run a few
+threads to process and enqueue examples. Create a `Coordinator` and ask the
+queue runner to start its threads with the coordinator. Write a training loop
+that also uses the coordinator.
+
+```python
+# Create a queue runner that will run 4 threads in parallel to enqueue
+# examples.
+qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)
+
+# Launch the graph.
+sess = tf.Session()
+# Create a coordinator, launch the queue runner threads.
+coord = tf.train.Coordinator()
+enqueue_threads = qr.create_threads(sess, coord=coord, start=True)
+# Run the training loop, controlling termination with the coordinator.
+for step in xrange(1000000):
+ if coord.should_stop():
+ break
+ sess.run(train_op)
+# When done, ask the threads to stop.
+coord.request_stop()
+# And wait for them to actually do it.
+coord.join(enqueue_threads)
+```
+
+## Handling Exceptions
+
+Threads started by queue runners do more than just run the enqueue ops. They
+also catch and handle exceptions generated by queues, including
+`OutOfRangeError` which is used to report that a queue was closed.
+
+A training program that uses a coordinator must similarly catch and report
+exceptions in its main loop.
+
+Here is an improved version of the training loop above.
+
+```python
+try:
+ for step in xrange(1000000):
+ if coord.should_stop():
+ break
+ sess.run(train_op)
+except Exception as e:
+ # Report exceptions to the coordinator.
+ coord.request_stop(e)
+
+# Terminate as usual. It is innocuous to request stop twice.
+coord.request_stop()
+coord.join(enqueue_threads)
+```
diff --git a/tensorflow/g3doc/how_tos/using_gpu/index.md b/tensorflow/g3doc/how_tos/using_gpu/index.md
new file mode 100644
index 0000000000..c0bdc5a7cb
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/using_gpu/index.md
@@ -0,0 +1,174 @@
+# Using GPUs
+
+## Supported devices
+
+On a typical system, there are multiple computing devices. In TensorFlow, the
+supported device types are `CPU` and `GPU`. They are represented as
+`strings`. For example:
+
+* `"/cpu:0"`: The CPU of your machine.
+* `"/gpu:0"`: The GPU of your machine, if you have one.
+* `"/gpu:1"`: The second GPU of your machine, etc.
+
+If a TensorFlow operation has both CPU and GPU implementations, the
+GPU devices will be given priority when the operation is assigned to
+a device. For example, `matmul` has both CPU and GPU kernels. On a
+system with devices `cpu:0` and `gpu:0`, `gpu:0` will be selected to run
+`matmul`.
+
+## Logging device placement
+
+To find out which devices your operations and tensors are assigned to, create
+the session with the `log_device_placement` configuration option set to `True`.
+
+```python
+# Creates a graph.
+a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+You should see the following output:
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
+id: 0000:05:00.0
+b: /job:localhost/replica:0/task:0/gpu:0
+a: /job:localhost/replica:0/task:0/gpu:0
+MatMul: /job:localhost/replica:0/task:0/gpu:0
+[[ 22. 28.]
+ [ 49. 64.]]
+
+```
+
+## Manual device placement
+
+If you would like a particular operation to run on a device of your
+choice instead of what's automatically selected for you, you can use
+`with tf.device` to create a device context such that all the operations
+within that context will have the same device assignment.
+
+```python
+# Creates a graph.
+with tf.device('/cpu:0'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+You will see that now `a` and `b` are assigned to `cpu:0`.
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
+id: 0000:05:00.0
+b: /job:localhost/replica:0/task:0/cpu:0
+a: /job:localhost/replica:0/task:0/cpu:0
+MatMul: /job:localhost/replica:0/task:0/gpu:0
+[[ 22. 28.]
+ [ 49. 64.]]
+```
+
+## Using a single GPU on a multi-GPU system
+
+If you have more than one GPU in your system, the GPU with the lowest ID will be
+selected by default. If you would like to run on a different GPU, you will need
+to specify the preference explicitly:
+
+```python
+# Creates a graph.
+with tf.device('/gpu:2'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+ c = tf.matmul(a, b)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+If the device you have specified does not exist, you will get
+`InvalidArgumentError`:
+
+```
+InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
+Could not satisfy explicit device specification '/gpu:2'
+ [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
+ values: 1 2 3...>, _device="/gpu:2"]()]]
+```
+
+If you would like TensorFlow to automatically choose an existing and
+supported device to run the operations in case the specified one doesn't
+exist, you can set `allow_soft_placement` to `True` in the configuration
+option when creating the session.
+
+```python
+# Creates a graph.
+with tf.device('/gpu:2'):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
+ c = tf.matmul(a, b)
+# Creates a session with allow_soft_placement and log_device_placement set
+# to True.
+sess = tf.Session(config=tf.ConfigProto(
+ allow_soft_placement=True, log_device_placement=True))
+# Runs the op.
+print sess.run(c)
+```
+
+## Using multiple GPUs
+
+If you would like to run TensorFlow on multiple GPUs, you can construct your
+model in a multi-tower fashion where each tower is assigned to a different GPU.
+For example:
+
+```python
+# Creates a graph.
+c = []
+for d in ['/gpu:2', '/gpu:3']:
+ with tf.device(d):
+ a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
+ b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
+ c.append(tf.matmul(a, b))
+with tf.device('/cpu:0'):
+ total = tf.add_n(c)
+# Creates a session with log_device_placement set to True.
+sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
+# Runs the op.
+print sess.run(total)
+```
+
+You will see the following output.
+
+```
+Device mapping:
+/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
+id: 0000:02:00.0
+/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
+id: 0000:03:00.0
+/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
+id: 0000:83:00.0
+/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
+id: 0000:84:00.0
+Const_3: /job:localhost/replica:0/task:0/gpu:3
+Const_2: /job:localhost/replica:0/task:0/gpu:3
+MatMul_1: /job:localhost/replica:0/task:0/gpu:3
+Const_1: /job:localhost/replica:0/task:0/gpu:2
+Const: /job:localhost/replica:0/task:0/gpu:2
+MatMul: /job:localhost/replica:0/task:0/gpu:2
+AddN: /job:localhost/replica:0/task:0/cpu:0
+[[ 44. 56.]
+ [ 98. 128.]]
+```
+
+The [cifar10 tutorial](../../tutorials/deep_cnn/index.md) is a good example
+demonstrating how to do training with multiple GPUs.
diff --git a/tensorflow/g3doc/how_tos/variable_scope/index.md b/tensorflow/g3doc/how_tos/variable_scope/index.md
new file mode 100644
index 0000000000..f9221b207b
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/variable_scope/index.md
@@ -0,0 +1,372 @@
+# Sharing Variables
+
+You can create, initialize, save and load single variables
+in the way described in the [Variables HowTo](../variables/index.md).
+But when building complex models you often need to share large sets of
+variables and you might want to initialize all of them in one place.
+This tutorial shows how this can be done using `tf.variable_scope()` and
+`tf.get_variable()`.
+
+## The Problem
+
+Imagine you create a simple model for image filters, similar to our
+[Convolutional Neural Networks Tutorial](../../tutorials/deep_cnn/index.md)
+model but with only 2 convolutions (for simplicity of this example). If you use
+just `tf.Variable`, as explained in [Variables HowTo](../variables/index.md),
+your model might look like this.
+
+```python
+def my_image_filter(input_images):
+ conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+ name="conv1_weights")
+ conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
+ conv1 = tf.nn.conv2d(input_images, conv1_weights,
+ strides=[1, 1, 1, 1], padding='SAME')
+ relu1 = tf.nn.relu(conv1 + conv1_biases)
+
+ conv2_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
+ name="conv2_weights")
+ conv2_biases = tf.Variable(tf.zeros([32]), name="conv2_biases")
+ conv2 = tf.nn.conv2d(relu1, conv2_weights,
+ strides=[1, 1, 1, 1], padding='SAME')
+ return tf.nn.relu(conv2 + conv2_biases)
+```
+
+As you can easily imagine, models quickly get much more complicated than
+this one, and even here we already have 4 different variables: `conv1_weights`,
+`conv1_biases`, `conv2_weights`, and `conv2_biases`.
+
+The problem arises when you want to reuse this model. Assume you want to
+apply your image filter to 2 different images, `image1` and `image2`.
+You want both images to be processed by the same filter with the same parameters.
+You can call `my_image_filter()` twice, but this will create two sets
+of variables:
+
+```python
+# First call creates one set of variables.
+result1 = my_image_filter(image1)
+# Another set is created in the second call.
+result2 = my_image_filter(image2)
+```
+
+A common way to share variables is to create them in a separate piece of code
+and pass them to functions that use them. For example, by using a dictionary:
+
+```python
+variables_dict = {
+ "conv1_weights": tf.Variable(tf.random_normal([5, 5, 32, 32]),
+ name="conv1_weights")
+ "conv1_biases": tf.Variable(tf.zeros([32]), name="conv1_biases")
+ ... etc. ...
+}
+
+def my_image_filter(input_images, variables_dict):
+ conv1 = tf.nn.conv2d(input_images, variables_dict["conv1_weights"],
+ strides=[1, 1, 1, 1], padding='SAME')
+ relu1 = tf.nn.relu(conv1 + variables_dict["conv1_biases"])
+
+ conv2 = tf.nn.conv2d(relu1, variables_dict["conv2_weights"],
+ strides=[1, 1, 1, 1], padding='SAME')
+ return tf.nn.relu(conv2 + variables_dict["conv2_biases"])
+
+# The 2 calls to my_image_filter() now use the same variables
+result1 = my_image_filter(image1, variables_dict)
+result2 = my_image_filter(image2, variables_dict)
+```
+
+While convenient, creating variables like this, outside of the code that uses
+them, breaks encapsulation:
+
+* The code that builds the graph must document the names, types,
+ and shapes of variables to create.
+* When the code changes, the callers may have to create more, or less,
+ or different variables.
+
+One way to address the problem is to use classes to create a model,
+where the classes take care of managing the variables they need.
+For a lighter solution, not involving classes, TensorFlow provides
+a *Variable Scope* mechanism that makes it easy to share named variables
+while constructing a graph.
+
+## Variable Scope Example
+
+The Variable Scope mechanism in TensorFlow consists of two main functions:
+
+* `tf.get_variable(<name>, <shape>, <initializer>)`:
+ Creates or returns a variable with a given name.
+* `tf.variable_scope(<scope_name>)`:
+ Manages namespaces for names passed to `tf.get_variable()`.
+
+The function `tf.get_variable()` is used to get or create a variable instead
+of a direct call to `tf.Variable`. It uses an *initializer* instead of passing
+the value directly, as in `tf.Variable`. An initializer is a function that
+takes the shape and provides a tensor with that shape. Here are some
+initializers available in TensorFlow:
+
+* `tf.constant_initializer(value)` initializes everything to the provided value,
+* `tf.random_uniform_initializer(a, b)` initializes uniformly from [a, b],
+* `tf.random_normal_initializer(mean, stddev)` initializes from the normal
+ distribution with the given mean and standard deviation.
+
+To see how `tf.get_variable()` solves the problem discussed
+before, let's refactor the code that created one convolution into
+a separate function, named `conv_relu`:
+
+```python
+def conv_relu(input, kernel_shape, bias_shape):
+ # Create variable named "weights".
+ weights = tf.get_variable("weights", kernel_shape,
+ initializer=tf.random_normal_initializer())
+ # Create variable named "biases".
+ biases = tf.get_variable("biases", bias_shape,
+ initializer=tf.constant_initializer(0.0))
+ conv = tf.nn.conv2d(input, weights,
+ strides=[1, 1, 1, 1], padding='SAME')
+ return tf.nn.relu(conv + biases)
+```
+
+This function uses short names `"weights"` and `"biases"`.
+We'd like to use it for both `conv1` and `conv2`, but
+the variables need to have different names.
+This is where `tf.variable_scope()` comes into play:
+it pushes a namespace for variables.
+
+```python
+def my_image_filter(input_images):
+ with tf.variable_scope("conv1"):
+ # Variables created here will be named "conv1/weights", "conv1/biases".
+ relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
+ with tf.variable_scope("conv2"):
+ # Variables created here will be named "conv2/weights", "conv2/biases".
+ return conv_relu(relu1, [5, 5, 32, 32], [32])
+```
+
+Now, let's see what happens when we call `my_image_filter()` twice.
+
+```python
+result1 = my_image_filter(image1)
+result2 = my_image_filter(image2)
+# Raises ValueError(... conv1/weights already exists ...)
+```
+
+As you can see, `tf.get_variable()` checks that already existing variables
+are not shared by accident. If you want to share them, you need to specify
+it by setting `reuse_variables()` as follows.
+
+```python
+with tf.variable_scope("image_filters") as scope:
+ result1 = my_image_filter(image1)
+ scope.reuse_variables()
+ result2 = my_image_filter(image2)
+```
+
+This is a good way to share variables, lightweight and safe.
+
+## How Does Variable Scope Work?
+
+### Understanding `tf.get_variable()`
+
+To understand variable scope it is necessary to first
+fully understand how `tf.get_variable()` works.
+Here is how `tf.get_variable` is usually called.
+
+```python
+v = tf.get_variable(name, shape, dtype, initializer)
+```
+
+This call does one of two things depending on the scope it is called in.
+Here are the two options.
+
+* Case 1: the scope is set for creating new variables, as evidenced by
+`tf.get_variable_scope().reuse == False`.
+
+In this case, `v` will be a newly created `tf.Variable` with the provided
+shape and data type. The full name of the created variable will be set to
+the current variable scope name + the provided `name` and a check will be
+performed to ensure that no variable with this full name exists yet.
+If a variable with this full name already exists, the function will
+raise a `ValueError`. If a new variable is created, it will be
+initialized to the value `initializer(shape)`. For example:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+assert v.name == "foo/v:0"
+```
+
+* Case 2: the scope is set for reusing variables, as evidenced by
+`tf.get_variable_scope().reuse == True`.
+
+In this case, the call will search for an already existing variable with
+name equal to the current variable scope name + the provided `name`.
+If no such variable exists, a `ValueError` will be raised. If the variable
+is found, it will be returned. For example:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+with tf.variable_scope("foo", reuse=True):
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+### Basics of `tf.variable_scope()`
+
+Knowing how `tf.get_variable()` works makes it easy to understand variable
+scope. The primary function of variable scope is to carry a name that will
+be used as prefix for variable names and a reuse-flag to distinguish the two
+cases described above. Nesting variable scopes appends their names in a way
+analogous to how directories work:
+
+```python
+with tf.variable_scope("foo"):
+ with tf.variable_scope("bar"):
+ v = tf.get_variable("v", [1])
+assert v.name == "foo/bar/v:0"
+```
+
+The current variable scope can be retrieved using `tf.get_variable_scope()`
+and the `reuse` flag of the current variable scope can be set to `True` by
+calling `tf.get_variable_scope().reuse_variables()`:
+
+```python
+with tf.variable_scope("foo"):
+ v = tf.get_variable("v", [1])
+ tf.get_variable_scope().reuse_variables()
+ v1 = tf.get_variable("v", [1])
+assert v1 == v
+```
+
+Note that you *cannot* set the `reuse` flag to `False`. The reason is to make
+it possible to compose functions that create models. Imagine you write
+a function `my_image_filter(inputs)` as before. Someone calling the function
+in a variable scope with `reuse=True` would expect all inner variables to be
+reused as well. Allowing `reuse=False` to be forced inside the function would
+break this contract and make it hard to share parameters in this way.
+
+Even though you cannot set `reuse` to `False` explicitly, you can enter
+a reusing variable scope and then exit it, going back to a non-reusing one.
+This can be done using a `reuse=True` parameter when opening a variable scope.
+Note also that, for the same reason as above, the `reuse` parameter is
+inherited. So when you open a reusing variable scope, all sub-scopes will
+be reusing too.
+
+```python
+with tf.variable_scope("root"):
+ # At start, the scope is not reusing.
+ assert tf.get_variable_scope().reuse == False
+ with tf.variable_scope("foo"):
+ # Opened a sub-scope, still not reusing.
+ assert tf.get_variable_scope().reuse == False
+ with tf.variable_scope("foo", reuse=True):
+ # Explicitly opened a reusing scope.
+ assert tf.get_variable_scope().reuse == True
+ with tf.variable_scope("bar"):
+ # Now sub-scope inherits the reuse flag.
+ assert tf.get_variable_scope().reuse == True
+ # Exited the reusing scope, back to a non-reusing one.
+ assert tf.get_variable_scope().reuse == False
+```
+
+### Capturing variable scope
+
+In all examples presented above, we shared parameters only because their
+names agreed, that is, because we opened a reusing variable scope with
+exactly the same string. In more complex cases, it might be useful to pass
+a VariableScope object rather than rely on getting the names right.
+To this end, variable scopes can be captured and used instead of names
+when opening a new variable scope.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+ v = tf.get_variable("v", [1])
+with tf.variable_scope(foo_scope)
+ w = tf.get_variable("w", [1])
+with tf.variable_scope(foo_scope, reuse=True)
+ v1 = tf.get_variable("v", [1])
+ w1 = tf.get_variable("w", [1])
+assert v1 == v
+assert w1 == w
+```
+
+When opening a variable scope using a previously existing scope,
+we jump out of the current variable scope prefix to an entirely
+different one, regardless of where we do it.
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+ assert foo_scope.name == "foo"
+with tf.variable_scope("bar")
+ with tf.variable_scope("baz") as other_scope:
+ assert other_scope.name == "bar/baz"
+ with tf.variable_scope(foo_scope) as foo_scope2:
+ assert foo_scope2.name == "foo" # Not changed.
+```
+
+### Initializers in variable scope
+
+Using `tf.get_variable()` makes it possible to write functions that create or reuse
+variables and can be transparently called from outside. But what if we wanted
+to change the initializer of the created variables? Do we need to pass an extra
+argument to every function that creates variables? What about the most common
+case, when we want to set the default initializer for all variables in one
+place, on top of all functions? To help with these cases, variable scope
+can carry a default initializer. It is inherited by sub-scopes and passed
+to each `tf.get_variable()` call. But it will be overridden if another
+initializer is specified explicitly.
+
+```python
+with tf.variable_scope("foo", initializer=tf.constant_initializer(0.4)):
+ v = tf.get_variable("v", [1])
+ assert v.eval() == 0.4 # Default initializer as set above.
+ w = tf.get_variable("w", [1], initializer=tf.constant_initializer(0.3)):
+ assert w.eval() == 0.3 # Specific initializer overrides the default.
+ with tf.variable_scope("bar"):
+ v = get_variable("v", [1])
+ assert v.eval() == 0.4 # Inherited default initializer.
+ with tf.variable_scope("baz", initializer=tf.constant_initializer(0.2)):
+ v = get_variable("v", [1])
+ assert v.eval() == 0.2 # Changed default initializer.
+```
+
+### Names of ops in `tf.variable_scope()`
+
+We discussed how `tf.variable_scope` governs the names of variables.
+But how does it influence the names of other ops in the scope?
+It is natural that ops created inside a variable scope should also
+share that name. For this reason, when we do `with tf.variable_scope("name")`,
+this implicitly opens a `tf.name_scope("name")`. For example:
+
+```python
+with tf.variable_scope("foo"):
+  x = 1.0 + tf.get_variable("v", [1])
+assert x.op.name == "foo/add"
+```
+
+Name scopes can be opened in addition to a variable scope, and then
+they will only affect the names of the ops, but not of variables.
+
+```python
+with tf.variable_scope("foo"):
+ with tf.name_scope("bar"):
+ v = tf.get_variable("v", [1])
+ x = 1.0 + v
+assert v.name == "foo/v:0"
+assert x.op.name == "foo/bar/add"
+```
+
+When opening a variable scope using a captured object instead of a string,
+we do not alter the current name scope for ops.
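+
+For instance, here is a minimal sketch of the behavior described above (the
+exact op names depend on what else is in the surrounding graph):
+
+```python
+with tf.variable_scope("foo") as foo_scope:
+  pass
+with tf.name_scope("outer"):
+  with tf.variable_scope(foo_scope):
+    v = tf.get_variable("v", [1])
+    x = 1.0 + v
+assert v.name == "foo/v:0"       # The variable name follows the captured scope.
+assert x.op.name == "outer/add"  # The op keeps the surrounding name scope.
+```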
+
+
+## Examples of Use
+
+Here are pointers to a few files that make use of variable scope.
+In particular, it is heavily used for recurrent neural networks
+and sequence-to-sequence models.
+
+File | What's in it?
+--- | ---
+`models/image/cifar10.py` | Model for detecting objects in images.
+`models/rnn/rnn_cell.py` | Cell functions for recurrent neural networks.
+`models/rnn/seq2seq.py` | Functions for building sequence-to-sequence models.
diff --git a/tensorflow/g3doc/how_tos/variables/index.md b/tensorflow/g3doc/how_tos/variables/index.md
new file mode 100644
index 0000000000..4ad8f8a266
--- /dev/null
+++ b/tensorflow/g3doc/how_tos/variables/index.md
@@ -0,0 +1,215 @@
+# Variables: Creation, Initialization, Saving, and Loading
+
+When you train a model, you use [Variables](../../api_docs/python/state_ops.md)
+to hold and update parameters. Variables are in-memory buffers containing
+tensors. They need to be explicitly initialized and can be saved to disk during
+and after training. You can later restore saved values to exercise or analyse
+the model.
+
+This document references the following TensorFlow classes. Follow the links to
+their reference manual for a complete description of their API:
+
+* The `Variable` class [tf.Variable](../../api_docs/python/state_ops.md#Variable).
+* The `Saver` class [tf.train.Saver](../../api_docs/python/state_ops.md#Saver).
+
+
+## Creation
+
+When you create a [Variable](../../api_docs/python/state_ops.md) you pass a
+`Tensor` as its initial value to the `Variable()` constructor. TensorFlow
+provides a collection of Ops that produce tensors often used for initialization
+from [constants or random values](../../api_docs/python/constant_op.md).
+
+Note that all these Ops require you to specify the shape of the tensors. That
+shape automatically becomes the shape of the variable. Variables generally
+have a fixed shape, but TensorFlow provides advanced mechanisms to reshape
+variables.
+
+```python
+# Create two variables.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+biases = tf.Variable(tf.zeros([200]), name="biases")
+```
+
+Calling `tf.Variable()` adds a few Ops to the graph:
+
+* A `variable` Op that holds the variable value.
+* An initializer Op that sets the variable to its initial value. This is
+ actually a `tf.assign` Op.
+* The Ops for the initial value, such as the `zeros` Op for the `biases`
+ variable in the example, are also added to the graph.
+
+The value returned by `tf.Variable()` is an instance of the Python class
+`tf.Variable`.
+
+## Initialization
+
+Variable initializers must be run explicitly before other Ops in your model can
+be run. The easiest way to do that is to add an Op that runs all the variable
+initializers, and run that Op before using the model.
+
+You can alternatively restore variable values from a checkpoint file, see
+below.
+
+Use `tf.initialize_all_variables()` to add an Op to run variable initializers.
+Only run that Op after you have fully constructed your model and launched it in
+a session.
+
+```python
+# Create two variables.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+biases = tf.Variable(tf.zeros([200]), name="biases")
+...
+# Add an Op to initialize the variables.
+init_op = tf.initialize_all_variables()
+
+# Later, when launching the model
+with tf.Session() as sess:
+ # Run the init operation.
+ sess.run(init_op)
+ ...
+ # Use the model
+ ...
+```
+
+### Initialization from another Variable
+
+You sometimes need to initialize a variable from the initial value of another
+variable. As the Op added by `tf.initialize_all_variables()` initializes all
+variables in parallel, you have to be careful when this is needed.
+
+To initialize a new variable from the value of another variable use the other
+variable's `initialized_value()` property. You can use the initialized value
+directly as the initial value for the new variable, or you can use it as any
+other tensor to compute a value for the new variable.
+
+
+```python
+# Create a variable with a random value.
+weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35),
+ name="weights")
+# Create another variable with the same value as 'weights'.
+w2 = tf.Variable(weights.initialized_value(), name="w2")
+# Create another variable with twice the value of 'weights'.
+w_twice = tf.Variable(weights.initialized_value() * 2.0, name="w_twice")
+```
+
+### Custom Initialization
+
+The convenience function `tf.initialize_all_variables()` adds an Op to
+initialize *all variables* in the model. You can also pass it an explicit list
+of variables to initialize. See the
+[Variables Documentation](../../api_docs/python/state_ops.md) for more options,
+including checking if variables are initialized.
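+
+For example, a minimal sketch (the variable names are illustrative) that
+initializes only some of the variables:
+
+```python
+v1 = tf.Variable(tf.zeros([10]), name="v1")
+v2 = tf.Variable(tf.ones([10]), name="v2")
+# Add an Op that initializes only 'v1'; 'v2' stays uninitialized.
+init_v1_op = tf.initialize_variables([v1])
+with tf.Session() as sess:
+  sess.run(init_v1_op)
+```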
+
+## Saving and Restoring
+
+The easiest way to save and restore a model is to use a `tf.train.Saver`
+object. The constructor adds `save` and `restore` Ops to the graph for all, or
+a specified list, of variables. The saver object provides methods to run these
+Ops, specifying paths for the checkpoint files to write to or read from.
+
+### Checkpoint Files
+
+Variables are saved in binary files that, roughly, contain a map from variable
+names to tensors.
+
+When you create a `Saver` object, you can optionally choose names for the
+variables in the checkpoint files. By default, it uses the names passed to the
+`tf.Variable()` call.
+
+### Saving Variables
+
+Create a `Saver` with `tf.train.Saver()` to manage all variables in
+the model.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add an Op to initialize the variables.
+init_op = tf.initialize_all_variables()
+
+# Add Ops to save and restore all the variables.
+saver = tf.train.Saver()
+
+# Later, launch the model, initialize the variables, do some work, save the
+# variables to disk.
+with tf.Session() as sess:
+ sess.run(init_op)
+ # Do some work with the model.
+ ...
+ # Save the variables to disk.
+ save_path = saver.save(sess, "/tmp/model.ckpt")
+ print "Model saved in file: ", save_path
+```
+
+### Restoring Variables
+
+The same `Saver` object is used to restore variables. Note that when you
+restore variables from a file you do not have to initialize them beforehand.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add Ops to save and restore all the variables.
+saver = tf.train.Saver()
+
+# Later, launch the model, use the saver to restore variables from disk, and
+# do some work with the model.
+with tf.Session() as sess:
+ # Restore variables from disk.
+ saver.restore(sess, "/tmp/model.ckpt")
+ print "Model restored."
+ # Do some work with the model
+ ...
+```
+
+### Choosing which Variables to Save and Restore
+
+If you do not pass any argument to `tf.train.Saver()` the saver
+handles all variables. Each one of them is saved under the name that was
+passed when the variable was created.
+
+It is sometimes useful to explicitly specify names for variables in the
+checkpoint files. For example, you may have trained a model with a variable
+named `"weights"` whose value you want to restore in a new variable named
+`"params"`.
+
+It is also sometimes useful to only save or restore a subset of the variables
+used by a model. For example, you may have trained a neural net with 5 layers,
+and you now want to train a new model with 6 layers, restoring the parameters
+from the 5 layers of the previously trained model into the first 5 layers of
+the new model.
+
+You can easily specify the names and variables to save by passing to the
+`tf.train.Saver()` constructor a Python dictionary: keys are the
+names to use, values are the variables to manage.
+
+Notes:
+
+* You can create as many saver objects as you want if you need to save and
+ restore different subsets of the model variables. The same variable can be
+ listed in multiple saver objects; its value is only changed when the saver's
+ `restore()` method is run.
+
+* If you only restore a subset of the model variables at the start
+ of a session, you have to run an initialize Op for the other variables. See
+ [`tf.initialize_variables()`](../../api_docs/python/state_ops.md#initialize_variables)
+ for more information.
+
+```python
+# Create some variables.
+v1 = tf.Variable(..., name="v1")
+v2 = tf.Variable(..., name="v2")
+...
+# Add Ops to save and restore only 'v2' using the name "my_v2"
+saver = tf.train.Saver({"my_v2": v2})
+# Use the saver object normally after that.
+...
+```
diff --git a/tensorflow/g3doc/images/getting_started.dot b/tensorflow/g3doc/images/getting_started.dot
new file mode 100644
index 0000000000..a9cae6c4b1
--- /dev/null
+++ b/tensorflow/g3doc/images/getting_started.dot
@@ -0,0 +1,14 @@
+digraph Dependencies {
+ node [shape = oval];
+ "predictions: MatMul()" -> "data: Concat()"
+ "data: Concat()" -> data_left
+ "data: Concat()" -> data_right
+ "predictions: MatMul()" -> "weight_matrix: Reshape()"
+ "weight_matrix: Reshape()" -> "new_weights: Add()"
+ "new_weights: Add()" -> weights
+ "new_weights: Add()" -> deltas
+ "update: Assign()" -> weights
+ "update: Assign()" -> "new_weights: Add()"
+ "InitializeAllVariables()" -> weights
+ "InitializeAllVariables()" -> init_value
+} \ No newline at end of file
diff --git a/tensorflow/g3doc/index.md b/tensorflow/g3doc/index.md
new file mode 100644
index 0000000000..dabc083ca8
--- /dev/null
+++ b/tensorflow/g3doc/index.md
@@ -0,0 +1,21 @@
+# TensorFlow
+
+<!-- Note: This file is ignored in building the external site tensorflow.org -->
+
+## Introduction
+
+TensorFlow&#8482; is an open source software library for numerical computation
+using data flow graphs. Nodes in the graph represent mathematical operations,
+while the graph edges represent the multidimensional data arrays (tensors) that
+flow between them. This flexible architecture allows you to deploy computation
+to one or more CPUs or GPUs in a desktop, server, or mobile device without
+rewriting code. TensorFlow was originally developed by researchers and
+engineers working on the Google Brain team within Google's Machine Intelligence
+research organization for the purposes of conducting machine learning and deep
+neural networks research. The system is general enough to be applicable in a
+wide variety of other domains as well. The following documents show you how
+to set up and use the TensorFlow system.
+
+## Table of Contents
+<!--#include virtual="sitemap.md" -->
+
diff --git a/tensorflow/g3doc/resources/dims_types.md b/tensorflow/g3doc/resources/dims_types.md
new file mode 100644
index 0000000000..eebd80efaa
--- /dev/null
+++ b/tensorflow/g3doc/resources/dims_types.md
@@ -0,0 +1,68 @@
+# Tensor Ranks, Shapes, and Types
+
+TensorFlow programs use a tensor data structure to represent all data. You can
+think of a TensorFlow tensor as an n-dimensional array or list.
+A tensor has a static type and dynamic dimensions. Only tensors may be passed
+between nodes in the computation graph.
+
+## Rank
+
+In the TensorFlow system, tensors are described by a unit of dimensionality
+known as *rank*. Tensor rank is not the same as matrix rank. Tensor rank
+(sometimes referred to as *order* or *degree* or *n-dimension*) is the number
+of dimensions of the tensor. For example, the following tensor (defined as a
+Python list) has a rank of 2:
+
+ t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
+
+A rank two tensor is what we typically think of as a matrix; a rank one tensor
+is a vector. For a rank two tensor you can access any element with the syntax
+`t[i, j]`. For a rank three tensor you would need to address an element with
+`t[i, j, k]`.
+
+Rank | Math entity | Python example
+--- | --- | ---
+0 | Scalar (magnitude only) | `s = 483`
+1 | Vector (magnitude and direction) | `v = [1.1, 2.2, 3.3]`
+2 | Matrix (table of numbers) | `m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`
+3 | 3-Tensor (cube of numbers) | `t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]`
+n | n-Tensor (you get the idea) | `....`
+
+## Shape
+
+The TensorFlow documentation uses three notational conventions to describe
+tensor dimensionality: rank, shape, and dimension number. The following table
+shows how these relate to one another:
+
+Rank | Shape | Dimension number | Example
+--- | --- | --- | ---
+0 | [] | 0-D | A 0-D tensor. A scalar.
+1 | [D0] | 1-D | A 1-D tensor with shape [5].
+2 | [D0, D1] | 2-D | A 2-D tensor with shape [3, 4].
+3 | [D0, D1, D2] | 3-D | A 3-D tensor with shape [1, 4, 3].
+n | [D0, D1, ... Dn] | n-D | A tensor with shape [D0, D1, ... Dn].
+
+Shapes can be represented via Python lists / tuples of ints, or with the
+[`TensorShape` class](../api_docs/python/framework.md#TensorShape).
+
+## Data types
+
+In addition to dimensionality, Tensors have a data type. You can assign any one
+of the following data types to a tensor:
+
+Data type | Python type | Description
+--- | --- | ---
+`DT_FLOAT` | `tf.float32` | 32 bits floating point.
+`DT_DOUBLE` | `tf.float64` | 64 bits floating point.
+`DT_INT64` | `tf.int64` | 64 bits signed integer.
+`DT_INT32` | `tf.int32` | 32 bits signed integer.
+`DT_INT16` | `tf.int16` | 16 bits signed integer.
+`DT_INT8` | `tf.int8` | 8 bits signed integer.
+`DT_UINT8` | `tf.uint8` | 8 bits unsigned integer.
+`DT_STRING` | `tf.string` | Variable length byte arrays. Each element of a Tensor is a byte array.
+`DT_BOOL` | `tf.bool` | Boolean.
+`DT_COMPLEX64` | `tf.complex64` | Complex number made of two 32 bits floating points: real and imaginary parts.
+`DT_QINT32` | `tf.qint32` | 32 bits signed integer used in quantized Ops.
+`DT_QINT8` | `tf.qint8` | 8 bits signed integer used in quantized Ops.
+`DT_QUINT8` | `tf.quint8` | 8 bits unsigned integer used in quantized Ops.
+
diff --git a/tensorflow/g3doc/resources/faq.md b/tensorflow/g3doc/resources/faq.md
new file mode 100644
index 0000000000..fcdc8d1e33
--- /dev/null
+++ b/tensorflow/g3doc/resources/faq.md
@@ -0,0 +1,309 @@
+# Frequently Asked Questions
+
+This document provides answers to some of the frequently asked questions about
+TensorFlow. If you have a question that is not covered here, please
+[get in touch](index.md).
+
+<!-- TOC-BEGIN This section is generated by a script: DO NOT EDIT! -->
+## Contents
+
+
+<!-- TOC-END This section was generated by a script. -->
+
+#### Building a TensorFlow graph
+
+See also the
+[API documentation on building graphs](../api_docs/python/framework.md).
+
+##### Why does `c = tf.matmul(a, b)` not execute the matrix multiplication immediately?
+
+In the TensorFlow Python API, `a`, `b`, and `c` are
+[`Tensor`](../api_docs/python/framework.md#Tensor) objects. A `Tensor` object is
+a symbolic handle to the result of an operation, but does not actually hold the
+values of the operation's output. Instead, TensorFlow encourages users to build
+up complicated expressions (such as entire neural networks and their gradients) as
+a dataflow graph. You then offload the computation of the entire dataflow graph
+(or a subgraph of it) to a TensorFlow
+[`Session`](../api_docs/python/client.md#Session), which is able to execute the
+whole computation much more efficiently than executing the operations
+one-by-one.
+
+##### How are devices named?
+
+The supported device names are `"/device:CPU:0"` (or `"/cpu:0"`) for the CPU
+device, and `"/device:GPU:i"` (or `"/gpu:i"`) for the *i*th GPU device.
+
+##### How do I place operations on a particular device?
+
+To place a group of operations on a device, create them within a
+[`with tf.device(name):`](../api_docs/python/framework.md#device) context. See
+the how-to documentation on
+[using GPUs with TensorFlow](../how_tos/using_gpu/index.md) for details of how
+TensorFlow assigns operations to devices, and the
+[CIFAR-10 tutorial](../tutorials/deep_cnn/index.md) for an example model that
+uses multiple GPUs.
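+
+For example, a minimal sketch (whether the GPU is actually used depends on your
+hardware and installation):
+
+```python
+a = tf.constant([[1.0, 2.0]])
+b = tf.constant([[3.0], [4.0]])
+with tf.device("/gpu:0"):
+  c = tf.matmul(a, b)  # Requests placement of this op on the first GPU device.
+```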
+
+##### What are the different types of tensors that are available?
+
+TensorFlow supports a variety of different data types and tensor shapes. See the
+[ranks, shapes, and types reference](dims_types.md) for more details.
+
+#### Running a TensorFlow computation
+
+See also the
+[API documentation on running graphs](../api_docs/python/client.md).
+
+##### What's the deal with feeding and placeholders?
+
+Feeding is a mechanism in the TensorFlow Session API that allows you to
+substitute different values for one or more tensors at run time. The `feed_dict`
+argument to [`Session.run()`](../api_docs/python/client.md#Session.run) is a
+dictionary that maps [`Tensor`](../api_docs/python/framework.md) objects to
+numpy arrays (and some other types), which will be used as the values of those
+tensors in the execution of a step.
+
+Often, you have certain tensors, such as inputs, that will always be fed. The
+[`tf.placeholder()`](../api_docs/python/io_ops.md#placeholder) op allows you to define tensors that *must* be fed, and
+optionally allows you to constrain their shape as well. See the
+[beginners' MNIST tutorial](../tutorials/mnist/beginners/index.md) for an
+example of how placeholders and feeding can be used to provide the training data
+for a neural network.
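+
+As a minimal sketch (the fed array contents are illustrative):
+
+```python
+import numpy as np
+
+x = tf.placeholder(tf.float32, shape=[None, 784])
+y = tf.reduce_sum(x)
+with tf.Session() as sess:
+  # 'x' must be fed; here we feed a batch of 32 zero-filled rows.
+  print sess.run(y, feed_dict={x: np.zeros([32, 784], dtype=np.float32)})
+```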
+
+##### What is the difference between `Session.run()` and `Tensor.eval()`?
+
+If `t` is a [`Tensor`](../api_docs/python/framework.md#Tensor) object,
+[`t.eval()`](../api_docs/python/framework.md#Tensor.eval) is shorthand for
+[`sess.run(t)`](../api_docs/python/client.md#Session.run) (where `sess` is the
+current [default session](../api_docs/python/client.md#get_default_session)). The
+two following snippets of code are equivalent:
+
+```python
+# Using `Session.run()`.
+sess = tf.Session()
+c = tf.constant(5.0)
+print sess.run(c)
+
+# Using `Tensor.eval()`.
+c = tf.constant(5.0)
+with tf.Session():
+ print c.eval()
+```
+
+In the second example, the session acts as a
+[context manager](https://docs.python.org/2.7/reference/compound_stmts.html#with),
+which has the effect of installing it as the default session for the lifetime of
+the `with` block. The context manager approach can lead to more concise code for
+simple use cases (like unit tests); if your code deals with multiple graphs and
+sessions, it may be more straightforward to make explicit calls to `Session.run()`.
+
+##### Do Sessions have a lifetime? What about intermediate tensors?
+
+Sessions can own resources, such as
+[variables](../api_docs/python/state_ops.md#Variable),
+[queues](../api_docs/python/io_ops.md#QueueBase), and
+[readers](../api_docs/python/io_ops.md#ReaderBase); and these resources can use
+a significant amount of memory. These resources (and the associated memory) are
+released when the session is closed, by calling
+[`Session.close()`](../api_docs/python/client.md#Session.close).
+
+The intermediate tensors that are created as part of a call to
+[`Session.run()`](../api_docs/python/client.md) will be freed at or before the
+end of the call.
+
+##### Can I run distributed training on multiple computers?
+
+The initial open-source release of TensorFlow supports multiple devices (CPUs
+and GPUs) in a single computer. We are working on a distributed version as well:
+if you are interested, please let us know so we can prioritize accordingly.
+
+##### Does the runtime parallelize parts of graph execution?
+
+The TensorFlow runtime parallelizes graph execution across many different
+dimensions:
+
+* The individual ops have parallel implementations, using multiple cores in a
+ CPU, or multiple threads in a GPU.
+* Independent nodes in a TensorFlow graph can run in parallel on multiple
+ devices, which makes it possible to speed up
+ [CIFAR-10 training using multiple GPUs](../tutorials/deep_cnn/index.md).
+* The Session API allows multiple concurrent steps (i.e. calls to
+ [Session.run()](../api_docs/python/client.md#Session.run) in parallel). This
+ enables the runtime to get higher throughput, if a single step does not use
+ all of the resources in your computer.
+
+##### Which client languages are supported in TensorFlow?
+
+TensorFlow is designed to support multiple client languages. Currently, the
+best-supported client language is [Python](../api_docs/python/index.md). The
+[C++ client API](../api_docs/cc/index.md) provides an interface for launching
+graphs and running steps; we also have an experimental API for
+[building graphs in C++](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/cc/tutorials/example_trainer.cc).
+
+We would like to support more client languages, as determined by community
+interest. TensorFlow has a
+[C-based client API](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/public/tensor_c_api.h)
+that makes it easy to build a client in many different languages. We invite
+contributions of new language bindings.
+
+##### Does TensorFlow make use of all the devices (GPUs and CPUs) available on my machine?
+
+TensorFlow supports multiple GPUs and CPUs. See the how-to documentation on
+[using GPUs with TensorFlow](../how_tos/using_gpu/index.md) for details of how
+TensorFlow assigns operations to devices, and the
+[CIFAR-10 tutorial](../tutorials/deep_cnn/index.md) for an example model that
+uses multiple GPUs.
+
+Note that TensorFlow only uses GPU devices with a compute capability greater
+than 3.5.
+
+##### Why does `Session.run()` hang when using a reader or a queue?
+
+The [reader](../api_docs/io_ops.md#ReaderBase) and
+[queue](../api_docs/io_ops.md#QueueBase) classes provide special operations that
+can *block* until input (or free space in a bounded queue) becomes
+available. These operations allow you to build sophisticated
+[input pipelines](../how_tos/reading_data/index.md), at the cost of making the
+TensorFlow computation somewhat more complicated. See the how-to documentation
+for
+[using `QueueRunner` objects to drive queues and readers](../how_tos/reading_data/index.md#QueueRunners)
+for more information on how to use them.
+
+#### Variables
+
+See also the how-to documentation on [variables](../how_tos/variables/index.md)
+and [variable scopes](../how_tos/variable_scope/index.md), and
+[the API documentation for variables](../api_docs/python/state_ops.md).
+
+##### What is the lifetime of a variable?
+
+A variable is created when you first run the
+[`tf.Variable.initializer`](../api_docs/python/state_ops.md#Variable.initializer)
+operation for that variable in a session. It is destroyed when that
+[`session is closed`](../api_docs/python/client.md#Session.close).
+
+##### How do variables behave when they are concurrently accessed?
+
+Variables allow concurrent read and write operations. The value read from a
+variable may change if it is concurrently updated. By default, concurrent
+assignment operations to a variable are allowed to run with no mutual exclusion.
+To acquire
+a lock when assigning to a variable, pass `use_locking=True` to
+[`Variable.assign()`](../api_docs/python/state_ops.md#Variable.assign).
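+
+For example, a minimal sketch:
+
+```python
+v = tf.Variable(0.0, name="counter")
+# Serialize concurrent assignments to 'v' by taking a lock during the update.
+update_op = v.assign(v + 1.0, use_locking=True)
+```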
+
+#### Tensor shapes
+
+See also the
+[`TensorShape` API documentation](../api_docs/python/framework.md#TensorShape).
+
+##### How can I determine the shape of a tensor in Python?
+
+In TensorFlow, a tensor has both a static (inferred) shape and a dynamic (true)
+shape. The static shape can be read using the
+[`tf.Tensor.get_shape()`](../api_docs/python/framework.md#Tensor.get_shape)
+method: this shape is inferred from the operations that were used to create the
+tensor, and may be
+[partially complete](../api_docs/python/framework.md#TensorShape). If the static
+shape is not fully defined, the dynamic shape of a `Tensor` `t` can be
+determined by evaluating [`tf.shape(t)`](../api_docs/python/array_ops.md#shape).
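+
+For example, a minimal sketch:
+
+```python
+x = tf.placeholder(tf.float32, shape=[None, 10])
+print x.get_shape()          # Static shape: (?, 10); the batch dimension is unknown.
+dynamic_shape = tf.shape(x)  # A Tensor that yields the true shape at run time.
+```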
+
+##### What is the difference between `x.set_shape()` and `x = tf.reshape(x)`?
+
+The [`tf.Tensor.set_shape()`](../api_docs/python/framework.md) method updates
+the static shape of a `Tensor` object, and it is typically used to provide
+additional shape information when this cannot be inferred directly. It does not
+change the dynamic shape of the tensor.
+
+The [`tf.reshape()`](../api_docs/python/array_ops.md#reshape) operation creates
+a new tensor with a different dynamic shape.
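+
+For example, a minimal sketch:
+
+```python
+x = tf.placeholder(tf.float32)  # Unknown static shape.
+x.set_shape([28, 28])           # Adds static shape information; no new tensor is created.
+y = tf.reshape(x, [784])        # A new tensor with a different dynamic shape.
+```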
+
+##### How do I build a graph that works with variable batch sizes?
+
+It is often useful to build a graph that works with variable batch sizes, for
+example so that the same code can be used for (mini-)batch training, and
+single-instance inference. The resulting graph can be
+[saved as a protocol buffer](../api_docs/python/framework.md#Graph.as_graph_def)
+and
+[imported into another program](../api_docs/python/framework.md#import_graph_def).
+
+When building a variable-size graph, the most important thing to remember is not
+to encode the batch size as a Python constant, but instead to use a symbolic
+`Tensor` to represent it. The following tips may be useful:
+
+* Use [`batch_size = tf.shape(input)[0]`](../api_docs/python/array_ops.md#shape)
+ to extract the batch dimension from a `Tensor` called `input`, and store it in
+ a `Tensor` called `batch_size`.
+
+* Use [`tf.reduce_mean()`](../api_docs/python/math_ops.md#reduce_mean) instead
+ of `tf.reduce_sum(...) / batch_size`.
+
+* If you use
+ [placeholders for feeding input](../how_tos/reading_data/index.md#Feeding),
+ you can specify a variable batch dimension by creating the placeholder with
+ [`tf.placeholder(..., shape=[None, ...])`](../api_docs/python/io_ops.md#placeholder). The
+ `None` element of the shape corresponds to a variable-sized dimension.
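+
+A minimal sketch combining these tips (the shapes and names are illustrative):
+
+```python
+input = tf.placeholder(tf.float32, shape=[None, 784])
+batch_size = tf.shape(input)[0]          # Symbolic batch size, not a Python constant.
+mean_activation = tf.reduce_mean(input)  # Instead of tf.reduce_sum(...) / batch_size.
+```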
+
+#### TensorBoard
+
+See also the
+[how-to documentation on TensorBoard](../how_tos/graph_viz/index.md).
+
+##### What is the simplest way to send data to tensorboard? # TODO(danmane)
+
+Add summary_ops to your TensorFlow graph, and use a SummaryWriter to write all
+of these summaries to a log directory. Then, start up TensorBoard using
+<SOME_COMMAND> and pass the --logdir flag so that it points to your
+log directory. For more details, see <YET_UNWRITTEN_TENSORBOARD_TUTORIAL>.
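+
+A minimal sketch of the summary-writing side (the log directory and summary
+names are illustrative):
+
+```python
+loss = tf.constant(1.0)  # Stand-in for a real loss tensor.
+loss_summary = tf.scalar_summary("loss", loss)
+merged = tf.merge_all_summaries()
+with tf.Session() as sess:
+  writer = tf.train.SummaryWriter("/tmp/logs", sess.graph_def)
+  writer.add_summary(sess.run(merged), 0)  # 0 is the global step.
+```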
+
+#### Extending TensorFlow
+
+See also the how-to documentation for
+[adding a new operation to TensorFlow](../how_tos/adding_an_op/index.md).
+
+##### My data is in a custom format. How do I read it using TensorFlow?
+
+There are two main options for dealing with data in a custom format.
+
+The easier option is to write parsing code in Python that transforms the data
+into a numpy array, then feed that array to a
+[tf.placeholder()](../api_docs/python/io_ops.md#placeholder) op. See the
+documentation on
+[using placeholders for input](../how_tos/reading_data/index.md#Feeding) for
+more details. This approach is easy to get up and running, but the parsing can
+be a performance bottleneck.
+
+The more efficient option is to
+[add a new op written in C++](../how_tos/adding_an_op/index.md) that parses your
+data format. The
+[guide to handling new data formats](../how_tos/new_data_formats/index.md) has
+more information about the steps for doing this.
+
+##### How do I define an operation that takes a variable number of inputs?
+
+The TensorFlow op registration mechanism allows you to define inputs that are a
+single tensor, a list of tensors with the same type (for example when adding
+together a variable-length list of tensors), or a list of tensors with different
+types (for example when enqueuing a tuple of tensors to a queue). See the
+how-to documentation for
+[adding an op with a list of inputs or outputs](../how_tos/adding_an_op/index.md#list-input-output)
+for more details of how to define these different input types.
+
+#### Miscellaneous
+
+##### Does TensorFlow work with Python 3?
+
+We have only tested TensorFlow using Python 2.7. We are aware of some changes
+that will be required for Python 3 compatibility, and welcome contributions
+towards this effort.
+
+##### What is TensorFlow's coding style convention?
+
+The TensorFlow Python API adheres to the
+[PEP8](https://www.python.org/dev/peps/pep-0008/) conventions.<sup>*</sup> In
+particular, we use `CamelCase` names for classes, and `snake_case` names for
+functions, methods, and properties. We also adhere to the
+[Google Python style guide](https://google.github.io/styleguide/pyguide.html).
+
+The TensorFlow C++ code base adheres to the
+[Google C++ style guide](http://google.github.io/styleguide/cppguide.html).
+
+(<sup>*</sup> With one exception: we use 2-space indentation instead of 4-space
+indentation.)
diff --git a/tensorflow/g3doc/resources/glossary.md b/tensorflow/g3doc/resources/glossary.md
new file mode 100644
index 0000000000..ab1fc4eb27
--- /dev/null
+++ b/tensorflow/g3doc/resources/glossary.md
@@ -0,0 +1,149 @@
+# Glossary
+
+**Broadcasting operation**
+
+An operation that uses [numpy-style broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
+to make the shapes of its tensor arguments compatible.
+
+**Device**
+
+A piece of hardware that can run computation and has its own address space,
+like a GPU or CPU.
+
+**eval**
+
+A method of `Tensor` that returns the value of the `Tensor`, triggering any
+graph computation required to determine the value. You may only call `eval()`
+on a `Tensor` in a graph that has been launched in a session.
+
+**Feed**
+
+TensorFlow's mechanism for patching a tensor directly into any node in a graph
+launched in a session. You apply feeds when you trigger the execution of a
+graph, not when you build the graph. A feed temporarily replaces a node with a
+tensor value. You supply feed data as an argument to a run() or eval() call
+that initiates computation. After the run the feed disappears and the original
+node definition remains. You usually designate specific nodes to be "feed"
+nodes by using tf.placeholder() to create them. See
+[Basic Usage](../get_started/basic_usage.md) for more information.
+
+**Fetch**
+
+TensorFlow's mechanism for retrieving tensors from a graph launched in a
+session. You retrieve fetches when you trigger the execution of a graph, not
+when you build the graph. To fetch the tensor value of a node or nodes,
+execute the graph with a run() call on the Session object and pass a list of
+names of nodes to retrieve. See [Basic Usage](../get_started/basic_usage.md)
+for more information.
+
+**Graph**
+
+Describes a computation as a directed acyclic
+graph. Nodes in the graph represent operations that must be
+performed. Edges in the graph represent either data or control
+dependencies. GraphDef is the proto used to describe a graph to the
+system (it is the API), and consists of a collection of NodeDefs (see
+below). A GraphDef may be converted to a (C++) Graph object which is
+easier to operate on.
+
+**IndexedSlices**
+
+In the Python API, TensorFlow's representation of a tensor that is sparse
+along only its first dimension. If the tensor is k-dimensional, an
+IndexedSlices instance logically represents a collection of (k-1)-dimensional
+slices along the tensor's first dimension. The indices of the slices are
+stored concatenated into a single 1-dimensional vector, and the corresponding
+slices are concatenated to form a single k-dimensional tensor. Use
+SparseTensor if the sparsity is not restricted to the first dimension.
+
+**Node**
+
+An element of a graph.
+
+Describes how to invoke a specific Op as one node in a specific computation
+Graph, including the values for any attrs needed to configure the Op. For Ops
+that are polymorphic, the attrs include sufficient information to completely
+determine the signature of the Node. See graph.proto for details.
+
+**Op (operation)**
+
+In the TensorFlow runtime: A type of computation such as 'add' or 'matmul' or
+'concat'. You can add new ops to the runtime as described in [how to add an
+op](../how_tos/adding_an_op/index.md).
+
+In the Python API: A node in the graph. Ops are represented by instances of
+the class [tf.Operation](../api_docs/python/framework.md#Operation). The
+`type` property of an `Operation` indicates the run operation for the node,
+such as 'add' or 'matmul'.
+
+**Quantization**
+
+A reduction of numerical precision. Quantization maps floating-point values
+onto a smaller set of values, and is particularly useful for improving the
+efficiency of neural networks. See TensorFlow's [neural network
+operations](../api_docs/python/nn.md?cl=head#quantized_avg_pool) for more
+information about TensorFlow's quantization support.
+
+**Run**
+
+The action of executing ops in a launched graph. Requires that the graph be launched
+in a Session.
+
+In the Python API: A method of the Session class:
+[tf.Session.run](../api_docs/python/client.md#Session). You can pass tensors
+to feed and fetch to the `run()` call.
+
+In the C++ API: A method of the [tensorflow::Session](../api_docs/cc/ClassSession.md).
+
+**Session**
+
+A runtime object representing a launched graph. Provides methods to execute
+ops in the graph.
+
+In the Python API: [tf.Session](../api_docs/python/client.md#Session)
+
+In the C++ API: class used to launch a graph and run operations
+[tensorflow::Session](../api_docs/cc/ClassSession.md).
+
+**Shape**
+
+The number of dimensions of a tensor and their sizes.
+
+In a launched graph: Property of the tensors that flow between nodes. Some ops
+have strong requirements on the shape of their inputs and report errors at
+runtime if these are not met.
+
+In the Python API: Attribute of a Python Tensor in the graph construction
+API. During construction the shape of tensors can be only partially known, or
+even unknown. See
+[tf.TensorShape](../api_docs/python/framework.md#TensorShape)
+
+In the C++ API: class used to represent the shape of tensors
+[tensorflow::TensorShape](../api_docs/cc/ClassTensorShape.md).
+
+**SparseTensor**
+
+In the Python API, TensorFlow's representation of a tensor that is sparse in
+arbitrary positions. A SparseTensor stores only the non-empty values along
+with their indices, using a dictionary-of-keys format. In other words, if
+there are m non-empty values, it maintains a length-m vector of values and
+a matrix with m rows of indices. For efficiency, SparseTensor requires the
+indices to be sorted along increasing dimension number, i.e. in row-major
+order. Use IndexedSlices if the sparsity is only along the first dimension.
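+
+For example, the dense matrix `[[1, 0, 0], [0, 0, 2]]` could be represented as
+(a minimal sketch):
+
+```python
+st = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], shape=[2, 3])
+```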
+
+**Tensor**
+
+A `Tensor` is a typed multi-dimensional array. For example, a 4-D
+array of floating point numbers representing a mini-batch of images with
+dimensions `[batch, height, width, channel]`.
+
+In a launched graph: Type of the data that flow between nodes.
+
+In the Python API: class used to represent the output and inputs of Ops added
+to the graph [tf.Tensor](../api_docs/python/framework.md#Tensor). Instances of
+this class do not hold data.
+
+In the C++ API: class used to represent tensors returned from a
+[Session::Run()](../api_docs/cc/ClassSession.md) call
+[tensorflow::Tensor](../api_docs/cc/ClassTensor.md).
+Instances of this class hold data.
diff --git a/tensorflow/g3doc/resources/index.md b/tensorflow/g3doc/resources/index.md
new file mode 100644
index 0000000000..c9b88f4512
--- /dev/null
+++ b/tensorflow/g3doc/resources/index.md
@@ -0,0 +1,41 @@
+# Additional Resources
+
+
+## TensorFlow WhitePaper
+
+Additional details about the TensorFlow programming model and the underlying
+implementation can be found in our white paper:
+
+* [TensorFlow: Large-scale machine learning on heterogeneous systems](../extras/tensorflow-whitepaper2015.pdf)
+
+### Citation
+
+If you use TensorFlow in your research and would like to cite the TensorFlow
+system, we suggest you cite the paper above. You can use this [BibTeX
+entry](../extras/tensorflow-whitepaper2015.bib). As the project progresses, we
+may update the suggested citation with new papers.
+
+
+## Community
+
+TODO(rajatmonga): Write this!
+
+* NO - google group
+* YES, ASAP - internal support mailing list
+* YES, ASAP - stack overflow presence
+* SOON - slack
+
+
+
+
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- uses.md -->
+<!-- faq.md -->
+<!-- glossary.md -->
+<!-- dims_types.md -->
+-->
+</div>
+
+
diff --git a/tensorflow/g3doc/resources/uses.md b/tensorflow/g3doc/resources/uses.md
new file mode 100644
index 0000000000..ac42b1f8e7
--- /dev/null
+++ b/tensorflow/g3doc/resources/uses.md
@@ -0,0 +1,42 @@
+# Example Uses
+
+This page describes some of the current uses of the TensorFlow system.
+
+> If you are using TensorFlow for research, for education, or for production
+> usage in some product, we would love to add something about your usage here.
+> Please feel free to email us a brief description of how you're using
+> TensorFlow, or send us a pull request to add an entry to this file.
+
+Listed below are some of the many uses of TensorFlow.
+
+* **RankBrain**
+ * **Organization**: Google
+ * **Domain**: Information Retrieval
+ * **Description**: A large-scale deployment of deep neural nets for search ranking on www.google.com.
+ * **More info**: ["Google Turning Over Its Lucrative Search to AI Machines"](http://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines)
+
+* **Inception Image Classification Model**
+ * **Organization**: Google
+ * **Description**: Baseline model and follow-on research into highly accurate computer vision models, starting with the model that won the 2014 ImageNet image classification challenge
+ * **More Info**: Baseline model described in [Arxiv paper](http://arxiv.org/abs/1409.4842)
+
+* **SmartReply**
+ * **Organization**: Google
+ * **Description**: Deep LSTM model to automatically generate email responses
+ * **More Info**: [Google research blog post](http://googleresearch.blogspot.com/2015/11/computer-respond-to-this-email.html)
+
+* **Massively Multitask Networks for Drug Discovery**
+ * **Organization**: Google and Stanford University
+ * **Domain**: Drug discovery
+ * **Description**: A deep neural network model for identifying promising drug candidates.
+ * **More info**: [Arxiv paper](http://arxiv.org/abs/1502.02072)
+
+* **On-Device Computer Vision for OCR**
+ * **Organization**: Google
+ * **Description**: On-device computer vision model to do optical character recognition to enable real-time translation.
+ * **More info**: [Google Research blog post](http://googleresearch.blogspot.com/2015/07/how-google-translate-squeezes-deep.html)
+
+* TODO(opensource): Add several other research projects
+* TODO(opensource): Pointer Sets?
+* TODO(opensource): Others
diff --git a/tensorflow/g3doc/tutorials/BUILD b/tensorflow/g3doc/tutorials/BUILD
new file mode 100644
index 0000000000..5642ade160
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/BUILD
@@ -0,0 +1,19 @@
+# Description:
+# Top-level tutorials files
+
+package(default_visibility = ["//tensorflow:internal"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+)
diff --git a/tensorflow/g3doc/tutorials/__init__.py b/tensorflow/g3doc/tutorials/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/__init__.py
diff --git a/tensorflow/g3doc/tutorials/deep_cnn/cifar_tensorboard.html b/tensorflow/g3doc/tutorials/deep_cnn/cifar_tensorboard.html
new file mode 100644
index 0000000000..266faf042e
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/deep_cnn/cifar_tensorboard.html
@@ -0,0 +1,21 @@
+<html>
+
+<head>
+ <title>TensorBoard Demo</title>
+ <script src="/tensorboard/webcomponents-lite.min.js"></script>
+ <link rel="import" href="/tensorboard/tf-tensorboard-demo.html">
+ <style>
+
+ html,body {
+ margin: 0;
+ padding: 0;
+ height: 100%;
+ font-family: "RobotoDraft","Roboto",sans-serif;
+ }
+
+</style>
+</head>
+<body>
+ <tf-tensorboard-demo data-dir="/tensorboard/cifar"></tf-tensorboard-demo>
+</body>
+</html>
diff --git a/tensorflow/g3doc/tutorials/deep_cnn/index.md b/tensorflow/g3doc/tutorials/deep_cnn/index.md
new file mode 100644
index 0000000000..f40a94ba7a
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/deep_cnn/index.md
@@ -0,0 +1,462 @@
+# Convolutional Neural Networks for Object Recognition
+
+**NOTE:** This tutorial is intended for *advanced* users of TensorFlow
+and assumes expertise and experience in machine learning.
+
+## Overview
+
+CIFAR-10 classification is a common benchmark problem in machine learning. The
+problem is to classify RGB 32x32 pixel images across 10 categories:
+```airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.```
+
+![CIFAR-10 Samples](./cifar_samples.png "CIFAR-10 Samples, from http://www.cs.toronto.edu/~kriz/cifar.html")
+
+For more details refer to the [CIFAR-10 page](http://www.cs.toronto.edu/~kriz/cifar.html)
+and a [Tech Report](http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf)
+by Alex Krizhevsky.
+
+### Goals
+
+The goal of this tutorial is to build a relatively small convolutional neural
+network (CNN) for recognizing images. In the process this tutorial:
+
+1. Highlights a canonical organization for network architecture,
+training and evaluation.
+2. Provides a template for constructing larger and more sophisticated models.
+
+CIFAR-10 was selected because it contains enough complexity to
+exercise much of TensorFlow's ability to scale to large models. At the same
+time, the model is small enough to train fast in order to test new ideas and
+experiments.
+
+### Highlights of the Tutorial
+The CIFAR-10 tutorial demonstrates several important constructs for
+designing larger and more sophisticated models in TensorFlow:
+
+* Core mathematical components including
+[convolution](../../api_docs/python/nn.md#conv2d),
+[rectified linear activations](../../api_docs/python/nn.md#relu),
+[max pooling](../../api_docs/python/nn.md#max_pool) and
+[local response normalization](../../api_docs/python/nn.md#local_response_normalization).
+* [Visualization](../../how_tos/summaries_and_tensorboard/index.md)
+of network activity during training including input images,
+losses and distributions of activations and gradients.
+* Routines for calculating the
+[moving average](../../api_docs/python/train.md#ExponentialMovingAverage)
+of learned parameters and using these averages
+during evaluation to boost predictive performance.
+* Implementation of a
+[learning rate schedule](../../api_docs/python/train.md#exponential_decay)
+that systematically decrements over time.
+* Prefetching [queues](../../api_docs/python/io_ops.md#shuffle_batch)
+for input
+data to isolate the model from disk latency and expensive image pre-processing.
+
+We also provide a multi-GPU version of the model which demonstrates:
+
+* Configuring a model to train across multiple GPU cards in parallel.
+* Sharing and updating variables between multiple GPUs.
+
+We hope that this tutorial provides a launch point for building larger CNNs for
+vision tasks on TensorFlow.
+
+### Model Architecture
+
+The model in this CIFAR-10 tutorial is a multi-layer architecture consisting of
+alternating convolutions and nonlinearities. These layers are followed by fully
+connected layers leading into a softmax classifier. The model follows the
+architecture described by
+[Alex Krizhevsky](https://code.google.com/p/cuda-convnet/), with a few
+differences in the top few layers.
+
+This model achieves a peak performance of about 86% accuracy within a few hours
+of training time on a GPU. Please see [below](#evaluating-a-model) and the code
+for details. It consists of 1,068,298 learnable parameters and requires about
+19.5M multiply-add operations to compute inference on a single image.
+
+## Code Organization
+
+The code for this tutorial resides in
+[`tensorflow/models/image/cifar10/`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/).
+
+File | Purpose
+--- | ---
+[`cifar10_input.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_input.py) | Read the native CIFAR-10 binary file format.
+[`cifar10.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py) | Build the CIFAR-10 model.
+[`cifar10_train.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_train.py) | Train a CIFAR-10 model on a single machine.
+[`cifar10_multi_gpu_train.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py) | Train a CIFAR-10 model on multiple GPUs.
+[`cifar10_eval.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10_eval.py) | Evaluates the predictive performance of a CIFAR-10 model.
+
+
+## CIFAR-10 Model
+
+The CIFAR-10 network is largely contained in
+[`cifar10.py`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/image/cifar10/cifar10.py).
+The complete training
+graph contains roughly 765 operations. We find that we can make the code most
+reusable by constructing the graph with the following modules:
+
+1. [**Model inputs:**](#model-inputs) `inputs()` and `distorted_inputs()` add
+operations that read and preprocess CIFAR images for evaluation and training,
+respectively.
+1. [**Model prediction:**](#model-prediction) `inference()`
+adds operations that perform inference, i.e. classification, on supplied images.
+1. [**Model training:**](#model-training) `loss()` and `train()`
+add operations that compute the loss,
+gradients, variable updates and visualization summaries.
+
+### Model Inputs
+
+The input part of the model is built by the functions `inputs()` and
+`distorted_inputs()` which read images from the CIFAR-10 binary data files.
+These files contain fixed byte length records, so we use
+[`tf.FixedLengthRecordReader`](../../api_docs/python/io_ops.md#FixedLengthRecordReader).
+See [`Reading Data`](../../how_tos/reading_data/index.md#reading-from-files) to
+learn more about how the `Reader` class works.
+
+The images are processed as follows:
+
+* They are cropped to 24 x 24 pixels, centrally for evaluation or
+ [randomly](../../api_docs/python/image.md#random_crop) for training.
+* They are [approximately whitened](../../api_docs/python/image.md#per_image_whitening)
+ to make the model insensitive to dynamic range.
+
+For training, we additionally apply a series of random distortions to
+artificially increase the data set size:
+
+* [Randomly flip](../../api_docs/python/image.md#random_flip_left_right) the image from left to right.
+* Randomly distort the [image brightness](../../api_docs/python/image.md#random_brightness).
+* Randomly distort the [image contrast](../../api_docs/python/image.md#tf_image_random_contrast).
+
+Please see the [`Images`](../../api_docs/python/image.md) page for the list of
+available distortions. We also attach an
+[`image_summary`](../../api_docs/python/train.md?#image_summary) to the images
+so that we may visualize them in TensorBoard. This is a good practice to verify
+that inputs are built correctly.
+
+<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
+ <img style="width:70%" src="./cifar_image_summary.png">
+</div>
+
+Reading images from disk and distorting them can use a non-trivial amount of
+processing time. To prevent these operations from slowing down training, we run
+them inside 16 separate threads which continuously fill a TensorFlow
+[queue](../../api_docs/python/io_ops.md#shuffle_batch).
+
+### Model Prediction
+
+The prediction part of the model is constructed by the `inference()` function
+which adds operations to compute the *logits* of the predictions. That part of
+the model is organized as follows:
+
+Layer Name | Description
+--- | ---
+`conv1` | [convolution](../../api_docs/python/nn.md#conv2d) and [rectified linear](../../api_docs/python/nn.md#relu) activation.
+`pool1` | [max pooling](../../api_docs/python/nn.md#max_pool).
+`norm1` | [local response normalization](../../api_docs/python/nn.md#local_response_normalization).
+`conv2` | [convolution](../../api_docs/python/nn.md#conv2d) and [rectified linear](../../api_docs/python/nn.md#relu) activation.
+`norm2` | [local response normalization](../../api_docs/python/nn.md#local_response_normalization).
+`pool2` | [max pooling](../../api_docs/python/nn.md#max_pool).
+`local3` | [fully connected layer with rectified linear activation](../../api_docs/python/nn.md).
+`local4` | [fully connected layer with rectified linear activation](../../api_docs/python/nn.md).
+`softmax_linear` | linear transformation to produce logits.
+
+Here is a graph generated from TensorBoard describing the inference operation:
+
+<div style="width:15%; margin:auto; margin-bottom:10px; margin-top:20px;">
+ <img style="width:100%" src="./cifar_graph.png">
+</div>
+
+> **EXERCISE**: The outputs of `inference` are un-normalized logits. Try editing
+the network architecture to return normalized predictions using
+[`tf.nn.softmax()`](../../api_docs/python/nn.md?cl=head#softmax).
+
+The `inputs()` and `inference()` functions provide all of the components
+necessary to perform evaluation on a model. We now shift our focus towards
+building operations for training a model.
+
+> **EXERCISE:** The model architecture in `inference()` differs slightly from
+the CIFAR-10 model specified in
+[cuda-convnet](https://code.google.com/p/cuda-convnet/). In particular, the top
+layers are locally connected and not fully connected. Try editing the
+architecture to exactly replicate that fully connected model.
+
+### Model Training
+
+The usual method for training a network to perform N-way classification is
+[multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression),
+aka. *softmax regression*. Softmax regression applies a
+[softmax](../../api_docs/python/nn.md#softmax) nonlinearity to the
+output of the network and calculates the
+[cross-entropy](../../api_docs/python/nn.md#softmax_cross_entropy_with_logits)
+between the normalized predictions and a
+[1-hot encoding](../../api_docs/python/sparse_ops.md#sparse_to_dense) of the label.
+For regularization, we also apply the usual
+[weight decay](../../api_docs/python/nn.md#l2_loss) losses to all learned
+variables. The objective function for the model is the sum of the cross entropy
+loss and all these weight decay terms, as returned by the `loss()` function.
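+
+Schematically, the objective looks like the following sketch (this is not the
+exact code in `cifar10.py`; the weight decay factor and argument names are
+illustrative):
+
+```python
+def loss_sketch(logits, dense_labels, weight_variables):
+  # Cross-entropy between the normalized predictions and the 1-hot labels.
+  cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, dense_labels)
+  cross_entropy_mean = tf.reduce_mean(cross_entropy)
+  # L2 weight decay applied to the learned weight variables.
+  weight_decay = tf.add_n([tf.nn.l2_loss(w) * 0.004 for w in weight_variables])
+  return cross_entropy_mean + weight_decay
+```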
+
+We visualize it in TensorBoard with a [scalar_summary](../../api_docs/python/train.md?#scalar_summary):
+
+[![CIFAR-10 Loss](./cifar_loss.png "CIFAR-10 Total Loss")](#TODO(migmigmig)#TODO(danmane))
+
+We train the model using standard
+[gradient descent](https://en.wikipedia.org/wiki/Gradient_descent)
+algorithm (see [Training](../../api_docs/python/train.md) for other methods)
+with a learning rate that
+[exponentially decays](../../api_docs/python/train.md#exponential_decay)
+over time.
+
+[![CIFAR-10 Learning Rate Decay](./cifar_lr_decay.png "CIFAR-10 Learning Rate Decay")](#TODO(migmigmig)#TODO(danmane))
+
+The `train()` function adds the operations needed to minimize the objective by
+calculating the gradient and updating the learned variables (see
+[`GradientDescentOptimizer`](../../api_docs/python/train.md#GradientDescentOptimizer)
+for details). It returns an operation that executes all of the calculations
+needed to train and update the model for one batch of images.
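+
+A hedged sketch of the kind of training op described above (the hyperparameter
+values are illustrative, not the ones used in `cifar10.py`):
+
+```python
+def train_sketch(total_loss):
+  global_step = tf.Variable(0, trainable=False)
+  # Decay the learning rate exponentially based on the number of steps.
+  lr = tf.train.exponential_decay(0.1, global_step, 100000, 0.1, staircase=True)
+  opt = tf.train.GradientDescentOptimizer(lr)
+  return opt.minimize(total_loss, global_step=global_step)
+```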
+
+## Launching and Training the Model
+
+We have built the model. Let's now launch it and run the training operation
+with the script `cifar10_train.py`.
+
+```shell
+python cifar10_train.py
+```
+
+**NOTE:** The first time you run any target in the CIFAR-10 tutorial,
+the CIFAR-10 dataset is automatically downloaded. The data set is ~160MB
+so you may want to grab a quick cup of coffee for your first run.
+
+You should see the output:
+
+```shell
+Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
+2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)
+2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)
+2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)
+2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)
+2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)
+2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)
+...
+```
+
+The script reports the total loss every 10 steps as well as the speed at which
+the last batch of data was processed. A few comments:
+
+* The first batch of data can be inordinately slow (e.g. several minutes) as the
+preprocessing threads fill up the shuffling queue with 20,000 processed CIFAR
+images.
+
+* The reported loss is the average loss of the most recent batch. Remember that
+this loss is the sum of the cross entropy and all weight decay terms.
+
+* Keep an eye on the processing speed of a batch. The numbers shown above were
+run on a Tesla K40c. If you are running on a CPU, expect slower performance.
+
+
+> **EXERCISE:** When experimenting, it is sometimes annoying that the first
+training step can take so long. Try decreasing the number of images that
+initially fill up the queue. Search for `NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN`
+in `cifar10.py`.
+
+`cifar10_train.py` periodically [saves](../../api_docs/python/state_ops.md#Saver)
+all model parameters in
+[checkpoint files](../../how_tos/variables/index.md#saving-and-restoring)
+but it does *not* evaluate the model. The checkpoint file
+will be used by `cifar10_eval.py` to measure the predictive
+performance (see [Evaluating a Model](#evaluating-a-model) below).
+
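+In case you want to see roughly what the checkpointing looks like, here is a
+minimal sketch, assuming an existing session `sess`, a step counter `step`, and
+an illustrative checkpoint path:
+
+```python
+# Minimal checkpointing sketch; cifar10_train.py does this periodically.
+saver = tf.train.Saver()
+saver.save(sess, '/tmp/cifar10_train/model.ckpt', global_step=step)
+```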
+
+If you followed the previous steps, then you have now started training
+a CIFAR-10 model. [Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0)
+
+The terminal text returned from `cifar10_train.py` provides minimal insight into
+how the model is training. We want more insight into the model during training:
+
+* Is the loss *really* decreasing or is that just noise?
+* Is the model being provided appropriate images?
+* Are the gradients, activations and weights reasonable?
+* What is the learning rate currently at?
+
+[TensorBoard](../../how_tos/summaries_and_tensorboard/index.md) provides this
+functionality, displaying data exported periodically from `cifar10_train.py` via
+a
+[`SummaryWriter`](../../api_docs/python/train.md#SummaryWriter).
+
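+A minimal sketch of how those summaries get written, assuming an existing
+session `sess`, a graph containing summary ops, and an illustrative log
+directory and step counter:
+
+```python
+# Merge all summaries in the graph and write them out for TensorBoard.
+summary_op = tf.merge_all_summaries()
+summary_writer = tf.train.SummaryWriter('/tmp/cifar10_train',
+                                        graph_def=sess.graph_def)
+summary_str = sess.run(summary_op)
+summary_writer.add_summary(summary_str, step)
+```
+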
+For instance, we can watch how the distribution of activations and degree of
+sparsity in `local3` features evolve during training:
+
+<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px; display: flex; flex-direction: row">
+ <img style="flex-grow:1; flex-shrink:1;" src="./cifar_sparsity.png">
+ <img style="flex-grow:1; flex-shrink:1;" src="./cifar_activations.png">
+</div>
+
+Individual loss functions, as well as the total loss, are particularly
+interesting to track over time. However, the loss exhibits a considerable amount
+of noise due to the small batch size employed by training. In practice we find
+it extremely useful to visualize their moving averages in addition to their raw
+values. See how the scripts use
+[ExponentialMovingAverage](../../api_docs/python/train.md#ExponentialMovingAverage)
+for this purpose.
+
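+As a rough sketch of that smoothing, assuming a list `losses` of raw loss
+tensors (the decay value is illustrative):
+
+```python
+# Track a moving average of each raw loss and export both to TensorBoard.
+loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
+loss_averages_op = loss_averages.apply(losses)
+for l in losses:
+  tf.scalar_summary(l.op.name + ' (raw)', l)
+  tf.scalar_summary(l.op.name + ' (avg)', loss_averages.average(l))
+```
+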
+## Evaluating a Model
+
+Let us now evaluate how well the trained model performs on a hold-out data set.
+The model is evaluated by the script `cifar10_eval.py`. It constructs the model
+with the `inference()` function and uses all 10,000 images in the evaluation set
+of CIFAR-10. It calculates the *precision at 1:* how often the top prediction
+matches the true label of the image.
+
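+A rough sketch of that metric, assuming `logits` and integer `labels` tensors
+for a batch (the real script aggregates counts over the whole evaluation set):
+
+```python
+# Fraction of examples whose top-scoring class matches the true label.
+top_1_correct = tf.nn.in_top_k(logits, labels, 1)
+precision_at_1 = tf.reduce_mean(tf.cast(top_1_correct, "float"))
+```
+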
+To monitor how the model improves during training, the evaluation script runs
+periodically on the latest checkpoint files created by `cifar10_train.py`.
+
+```shell
+python cifar10_eval.py
+```
+
+> Be careful not to run the evaluation and training binary on the same GPU or
+else you might run out of memory. Consider running the evaluation on
+a separate GPU if available or suspending the training binary while running
+the evaluation on the same GPU.
+
+You should see the output:
+
+```shell
+2015-11-06 08:30:44.391206: precision @ 1 = 0.860
+...
+```
+
+The script merely reports the precision @ 1 periodically -- in this case
+86%. `cifar10_eval.py` also
+exports summaries that may be visualized in TensorBoard. These summaries
+provide additional insight into the model during evaluation.
+
+The training script calculates the
+[moving average](../../api_docs/python/train.md#ExponentialMovingAverage)
+version of all learned variables. The evaluation script substitutes
+all learned model parameters with the moving average version. This
+substitution boosts model performance at evaluation time.
+
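+Roughly, the substitution can be done by restoring each trainable variable from
+its shadow (averaged) value. This sketch assumes `variable_averages` is the same
+`ExponentialMovingAverage` object used during training and that
+`checkpoint_path` points at a training checkpoint:
+
+```python
+# Build a Saver that maps averaged-variable names back onto the model
+# variables, then restore them from the training checkpoint.
+restore_map = {variable_averages.average_name(v): v
+               for v in tf.trainable_variables()}
+saver = tf.train.Saver(restore_map)
+saver.restore(sess, checkpoint_path)
+```
+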
+> **EXERCISE:** Employing averaged parameters may boost predictive performance
+by about 3% as measured by precision@1. Edit `cifar10_eval.py` to not employ the
+averaged parameters for the model and verify that the predictive performance
+drops.
+
+
+## Training a Model Using Multiple GPU Cards
+
+Modern workstations may contain multiple GPUs for scientific computation.
+TensorFlow can leverage this environment to run the training operation
+concurrently across multiple cards.
+
+Training a model in a parallel, distributed fashion requires
+coordinating training processes. In what follows, we use the term *model replica*
+to mean one copy of the model training on a subset of the data.
+
+Naively employing asynchronous updates of model parameters
+leads to sub-optimal training performance
+because an individual model replica might be trained on a stale
+copy of the model parameters. Conversely, employing fully synchronous
+updates will be as slow as the slowest model replica.
+
+In a workstation with multiple GPU cards, each GPU will have similar speed
+and contain enough memory to run an entire CIFAR-10 model. Thus, we opt to
+design our training system in the following manner:
+
+* Place an individual model replica on each GPU.
+* Update model parameters synchronously by waiting for all GPUs to finish
+processing a batch of data.
+
+Here is a diagram of this model:
+
+<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
+ <img style="width:100%" src="./Parallelism.png">
+</div>
+
+Note that each GPU computes inference as well as the gradients for a unique
+batch of data. This setup effectively permits dividing up a larger batch
+of data across the GPUs.
+
+This setup requires that all GPUs share the model parameters. A well-known
+fact is that transferring data to and from GPUs is quite slow. For this
+reason, we decide to store and update all model parameters on the CPU (see
+green box). A fresh set of model parameters is transferred to the GPU
+when a new batch of data is processed by all GPUs.
+
+The GPUs are synchronized in operation. All gradients are accumulated from
+the GPUs and averaged (see green box). The model parameters are updated with
+the gradients averaged across all model replicas.
+
+### Placing Variables and Operations on Devices
+
+Placing operations and variables on devices requires some special
+abstractions.
+
+The first abstraction we require is a function for computing inference and
+gradients for a single model replica. In the code we term this abstraction
+a *tower*. We must set two attributes for each tower:
+
+* A unique name for all operations within a tower.
+[`tf.name_scope()`](../../api_docs/python/framework.md#name_scope) provides
+this unique name by prepending a scope. For instance, all operations in
+the first tower are prepended with `tower_0`, e.g. `tower_0/conv1/Conv2D`.
+
+* A preferred hardware device to run the operation within a tower.
+[`tf.device()`](../../api_docs/python/framework.md#device) specifies this. For
+instance, all operations in the first tower reside within `device('/gpu:0')`
+scope indicating that they should be run on the first GPU.
+
+All variables are pinned to the CPU and accessed via
+[`tf.get_variable()`](../../api_docs/python/state_ops.md#get_variable)
+in order to share them in a multi-GPU version.
+See how-to on [Sharing Variables](../../how_tos/variable_scope/index.md).
+
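+Putting the two attributes together, the tower construction looks roughly like
+this (`num_gpus` and `tower_loss()` are hypothetical stand-ins for the code in
+`cifar10_multi_gpu_train.py`):
+
+```python
+# One model replica ("tower") per GPU; variables live on the CPU and are
+# shared across towers via tf.get_variable().
+tower_losses = []
+for i in range(num_gpus):
+  with tf.device('/gpu:%d' % i):
+    with tf.name_scope('tower_%d' % i) as scope:
+      # Ops created here are named e.g. tower_0/... and run on GPU i.
+      tower_losses.append(tower_loss(scope))
+```
+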
+### Launching and Training the Model on Multiple GPU cards
+
+If you have several GPU cards installed on your machine you can use them to
+train the model faster with the `cifar10_multi_gpu_train.py` script. It is a
+variation of the training script that parallelizes the model across multiple GPU
+cards.
+
+```shell
+python cifar10_multi_gpu_train.py --num_gpus=2
+```
+
+The training script should output:
+
+```shell
+Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
+2015-11-04 11:45:45.927302: step 0, loss = 4.68 (2.0 examples/sec; 64.221 sec/batch)
+2015-11-04 11:45:49.133065: step 10, loss = 4.66 (533.8 examples/sec; 0.240 sec/batch)
+2015-11-04 11:45:51.397710: step 20, loss = 4.64 (597.4 examples/sec; 0.214 sec/batch)
+2015-11-04 11:45:54.446850: step 30, loss = 4.62 (391.0 examples/sec; 0.327 sec/batch)
+2015-11-04 11:45:57.152676: step 40, loss = 4.61 (430.2 examples/sec; 0.298 sec/batch)
+2015-11-04 11:46:00.437717: step 50, loss = 4.59 (406.4 examples/sec; 0.315 sec/batch)
+...
+```
+
+Note that the number of GPU cards used defaults to 1. Additionally, if only 1
+GPU is available on your machine, all computations will be placed on it, even if
+you ask for more.
+
+> **EXERCISE:** The default setting for `cifar10_train.py` is to
+run with a batch size of 128. Try running `cifar10_multi_gpu_train.py` on 2 GPUs
+with a batch size of 64 and compare the training speed.
+
+## Next Steps
+
+[Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You have
+completed the CIFAR-10 tutorial.
+
+If you are now interested in developing and training your own image
+classification system, we recommend forking this tutorial and replacing
+components to address your image classification problem.
+
+> **EXERCISE:** Download the
+[Street View House Numbers (SVHN)](http://ufldl.stanford.edu/housenumbers/) data set.
+Fork the CIFAR-10 tutorial and swap in SVHN as the input data. Try adapting
+the network architecture to improve predictive performance.
+
+
+
diff --git a/tensorflow/g3doc/tutorials/index.md b/tensorflow/g3doc/tutorials/index.md
new file mode 100644
index 0000000000..726a5c6687
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/index.md
@@ -0,0 +1,142 @@
+# Overview
+
+
+## ML for Beginners
+
+If you're new to machine learning, we recommend starting here. You'll learn
+about a classic problem, handwritten digit classification (MNIST), and get a
+gentle introduction to multiclass classification.
+
+[View Tutorial](mnist/beginners/index.md)
+
+
+## MNIST for Pros
+
+If you're already familiar with other deep learning software packages, and are
+already familiar with MNIST, this tutorial will give you a very brief primer on
+TensorFlow.
+
+[View Tutorial](mnist/pros/index.md)
+
+
+## TensorFlow Mechanics 101
+
+This is a technical tutorial, where we walk you through the details of using
+TensorFlow infrastructure to train models at scale. We again use MNIST as the
+example.
+
+[View Tutorial](mnist/tf/index.md)
+
+
+## Convolutional Neural Networks
+
+An introduction to convolutional neural networks using the CIFAR-10 data set.
+Convolutional neural nets are particularly tailored to images, since they
+exploit translation invariance to yield more compact and effective
+representations of visual content.
+
+[View Tutorial](deep_cnn/index.md)
+
+
+## Vector Representations of Words
+
+This tutorial motivates why it is useful to learn to represent words as vectors
+(called *word embeddings*). It introduces the word2vec model as an efficient
+method for learning embeddings. It also covers the high-level details behind
+noise-contrastive training methods (the biggest recent advance in training
+embeddings).
+
+[View Tutorial](word2vec/index.md)
+
+
+## Recurrent Neural Networks
+
+An introduction to RNNs, wherein we train an LSTM network to predict the next
+word in an English sentence. (A task sometimes called language modeling.)
+
+[View Tutorial](recurrent/index.md)
+
+
+## Sequence-to-Sequence Models
+
+A follow-on to the RNN tutorial, where we assemble a sequence-to-sequence model
+for machine translation. You will learn to build your own English-to-French
+translator, entirely machine learned, end-to-end.
+
+[View Tutorial](seq2seq/index.md)
+
+
+## Mandelbrot Set
+
+TensorFlow can be used for computation that has nothing to do with machine
+learning. Here's a naive implementation of Mandelbrot set visualization.
+
+[View Tutorial](mandelbrot/index.md)
+
+
+## Partial Differential Equations
+
+As another example of non-machine learning computation, we offer an example of
+a naive PDE simulation of raindrops landing on a pond.
+
+[View Tutorial](pdes/index.md)
+
+
+## MNIST Data Download
+
+Details about downloading the MNIST handwritten digits data set. Exciting
+stuff.
+
+[View Tutorial](mnist/download/index.md)
+
+
+## Sparse Linear Regression
+
+In many practical machine learning settings we have a large number of input
+features, only very few of which are active for any given example. TensorFlow
+has great tools for learning predictive models in these settings.
+
+COMING SOON
+
+
+## Visual Object Recognition
+
+We will be releasing our state-of-the-art Inception object recognition model,
+complete and already trained.
+
+COMING SOON
+
+
+## Deep Dream Visual Hallucinations
+
+Building on the Inception recognition model, we will release a TensorFlow
+version of the [Deep Dream](https://github.com/google/deepdream) neural network
+visual hallucination software.
+
+COMING SOON
+
+
+## Automated Image Captioning
+
+TODO(vinyals): Write me, three lines max.
+
+COMING SOON
+
+
+
+<div class='sections-order' style="display: none;">
+<!--
+<!-- mnist/beginners/index.md -->
+<!-- mnist/pros/index.md -->
+<!-- mnist/tf/index.md -->
+<!-- deep_cnn/index.md -->
+<!-- word2vec/index.md -->
+<!-- recurrent/index.md -->
+<!-- seq2seq/index.md -->
+<!-- mandelbrot/index.md -->
+<!-- pdes/index.md -->
+<!-- mnist/download/index.md -->
+-->
+</div>
+
+
diff --git a/tensorflow/g3doc/tutorials/mandelbrot/index.md b/tensorflow/g3doc/tutorials/mandelbrot/index.md
new file mode 100755
index 0000000000..7c6adcb4e8
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mandelbrot/index.md
@@ -0,0 +1,97 @@
+
+
+```
+#Import libraries for simulation
+import tensorflow as tf
+import numpy as np
+
+#Imports for visualization
+import PIL.Image
+from cStringIO import StringIO
+from IPython.display import clear_output, Image, display
+import scipy.ndimage as nd
+```
+
+
+```
+def DisplayFractal(a, fmt='jpeg'):
+ """Display an array of iteration counts as a
+ colorful picture of a fractal."""
+ a_cyclic = (6.28*a/20.0).reshape(list(a.shape)+[1])
+ img = np.concatenate([10+20*np.cos(a_cyclic),
+ 30+50*np.sin(a_cyclic),
+ 155-80*np.cos(a_cyclic)], 2)
+ img[a==a.max()] = 0
+ a = img
+ a = np.uint8(np.clip(a, 0, 255))
+ f = StringIO()
+ PIL.Image.fromarray(a).save(f, fmt)
+ display(Image(data=f.getvalue()))
+```
+
+
+```
+sess = tf.InteractiveSession()
+```
+
+
+```
+# Use NumPy to create a 2D array of complex numbers covering [-2,1] x [-1.3,1.3]
+
+Y, X = np.mgrid[-1.3:1.3:0.005, -2:1:0.005]
+Z = X+1j*Y
+```
+
+
+```
+xs = tf.constant(Z.astype("complex64"))
+zs = tf.Variable(xs)
+ns = tf.Variable(tf.zeros_like(xs, "float32"))
+```
+
+
+```
+tf.initialize_all_variables().run()
+```
+
+
+```
+# Compute the new values of z: z^2 + x
+zs_ = zs*zs + xs
+
+# Have we diverged with this new value?
+not_diverged = tf.complex_abs(zs_) < 4
+
+# Operation to update the zs and the iteration count.
+#
+# Note: We keep computing zs after they diverge! This
+# is very wasteful! There are better, if a little
+# less simple, ways to do this.
+#
+step = tf.group(
+ zs.assign(zs_),
+ ns.assign_add(tf.cast(not_diverged, "float32"))
+ )
+```
+
+
+```
+for i in range(200): step.run()
+```
+
+
+```
+DisplayFractal(ns.eval())
+```
+
+
+![jpeg](output_8_0.jpe)
+
+
+
+```
+
+```
diff --git a/tensorflow/g3doc/tutorials/mandelbrot/output_8_0.jpe b/tensorflow/g3doc/tutorials/mandelbrot/output_8_0.jpe
new file mode 100755
index 0000000000..8e261d44a8
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mandelbrot/output_8_0.jpe
Binary files differ
diff --git a/tensorflow/g3doc/tutorials/mnist/__init__.py b/tensorflow/g3doc/tutorials/mnist/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/__init__.py
diff --git a/tensorflow/g3doc/tutorials/mnist/beginners/index.md b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
new file mode 100644
index 0000000000..8ccb69d977
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/beginners/index.md
@@ -0,0 +1,420 @@
+# MNIST Softmax Regression (For Beginners)
+
+*This tutorial is intended for readers who are new to both machine learning and
+TensorFlow. If you already
+know what MNIST is, and what softmax (multinomial logistic) regression is,
+you might prefer this [faster paced tutorial](../pros/index.md).*
+
+When one learns how to program, there's a tradition that the first thing you do
+is print "Hello World." Just like programming has Hello World, machine learning
+has MNIST.
+
+MNIST is a simple computer vision dataset. It consists of images of handwritten
+digits like these:
+
+<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/MNIST.png">
+</div>
+
+It also includes labels for each image, telling us which digit it is. For
+example, the labels for the above images are 5, 0, 4, and 1.
+
+In this tutorial, we're going to train a model to look at images and predict
+what digits they are. Our goal isn't to train a really elaborate model that
+achieves state-of-the-art performance -- although we'll give you code to do that
+later! -- but rather to dip a toe into using TensorFlow. As such, we're going
+to start with a very simple model, called a Softmax Regression.
+
+The actual code for this tutorial is very short, and all the interesting
+stuff happens in just three lines. However, it is very
+important to understand the ideas behind it: both how TensorFlow works and the
+core machine learning concepts. Because of this, we are going to very carefully
+work through the code.
+
+## The MNIST Data
+
+The MNIST data is hosted on
+[Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).
+For your convenience, we've included some python code to download and install
+the data automatically. You can either download [the code](../input_data.py) and
+import it as below, or simply copy and paste it in.
+
+```python
+import input_data
+mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
+```
+
+The downloaded data is split into two parts, 60,000 data points of training
+data (`mnist.train`) and 10,000 points of test data (`mnist.test`). This
+split is very important: it's essential in machine learning that we
+have separate data which we don't learn from so that we can make sure
+that what we've learned actually generalizes!
+
+As mentioned earlier, every MNIST data point has two parts: an image of a
+handwritten digit and a corresponding label. We will call the images "xs" and
+the labels "ys". Both the training set and test set contain xs and ys, for
+example the training images are `mnist.train.images` and the train labels are
+`mnist.train.labels`.
+
+Each image is 28 pixels by 28 pixels. We can interpret this as a big array of
+numbers:
+
+<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/MNIST-Matrix.png">
+</div>
+
+We can flatten this array into a vector of 28x28 = 784 numbers. It doesn't
+matter how we flatten the array, as long as we're consistent between images.
+From this perspective, the MNIST images are just a bunch of points in a
+784-dimensional vector space, with a
+[very rich structure](http://colah.github.io/posts/2014-10-Visualizing-MNIST/)
+(warning: computationally intensive visualizations).
+
+Flattening the data throws away information about the 2D structure of the image.
+Isn't that bad? Well, the best computer vision methods do exploit this
+structure, and we will in later tutorials. But the simple method we will be
+using here, a softmax regression, won't.
+
+The result is that `mnist.train.images` is a tensor (an n-dimensional array) with a
+shape of `[60000, 784]`. The first dimension indexes the images and the second
+dimension indexes the pixels in each image. Each entry in the tensor is the
+pixel intensity between 0 and 1, for a particular pixel in a particular image.
+
+<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/mnist-train-xs.png">
+</div>
+
+The corresponding labels in MNIST are numbers between 0 and 9, describing
+which digit a given image is of.
+For the purposes of this tutorial, we're going to want our labels as
+"one-hot vectors". A one-hot vector is a vector which is 0 in most
+dimensions, and 1 in a single dimension. In this case, the $$n$$th digit will be
+represented as a vector which is 1 in the $$n$$th dimension. For example, 0
+would be $$[1,0,0,0,0,0,0,0,0,0]$$.
+Consequently, `mnist.train.labels` is a
+`[60000, 10]` array of floats.
+
+<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/mnist-train-ys.png">
+</div>
+
+We're now ready to actually make our model!
+
+## Softmax Regressions
+
+We know that every image in MNIST is a digit, whether it's a zero or a nine. We
+want to be able to look at an image and give probabilities for it being each
+digit. For example, our model might look at a picture of a nine and be 80% sure
+it's a nine, but give a 5% chance to it being an eight (because of the top loop)
+and a bit of probability to all the others because it isn't sure.
+
+This is a classic case where a softmax regression is a natural, simple model.
+If you want to assign probabilities to an object being one of several different
+things, softmax is the thing to do. Even later on, when we train more
+sophisticated models, the final step will be a layer of softmax.
+
+A softmax regression has two steps: first we add up the evidence of our input
+being in certain classes, and then we convert that evidence into probabilities.
+
+To tally up the evidence that a given image is in a particular class, we do a
+weighted sum of the pixel intensities. The weight is negative if that pixel
+having a high intensity is evidence against the image being in that class,
+and positive if it is evidence in favor.
+
+The following diagram shows the weights one model learned for each of these
+classes. Red represents negative weights, while blue represents positive
+weights.
+
+<div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/softmax-weights.png">
+</div>
+
+We also add some extra evidence called a bias. Basically, we want to be able
+to say that some things are more likely independent of the input. The result is
+that the evidence for a class $$i$$ given an input $$x$$ is:
+
+$$\text{evidence}_i = \sum_j W_{i,~ j} x_j + b_i$$
+
+where $$W_i$$ is the weights and $$b_i$$ is the bias for class $$i$$, and $$j$$
+is an index for summing over the pixels in our input image $$x$$. We then
+convert the evidence tallies into our predicted probabilities
+$$y$$ using the "softmax" function:
+
+$$y = \text{softmax}(\text{evidence})$$
+
+Here softmax is serving as an "activation" or "link" function, shaping
+the output of our linear function into the form we want -- in this case, a
+probability distribution over 10 cases.
+You can think of it as converting tallies
+of evidence into probabilities of our input being in each class.
+It's defined as:
+
+$$\text{softmax}(x) = \text{normalize}(\exp(x))$$
+
+If you expand that equation out, you get:
+
+$$\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$
+
+But it's often more helpful to think of softmax the first way:
+exponentiating its inputs and then normalizing them. The exponentiation
+means that one unit more evidence increases the weight given to any hypothesis
+multiplicatively. And conversely, having one less unit of evidence means that a
+hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero
+or negative weight. Softmax then normalizes these weights, so that they add up
+to one, forming a valid probability distribution. (To get more intuition about
+the softmax function, check out the
+[section](http://neuralnetworksanddeeplearning.com/chap3.html#softmax)
+on it in Michael Nielsen's book, complete with an interactive visualization.)
+
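+If you'd like to see this numerically, here is a tiny illustration in plain
+NumPy of "exponentiate, then normalize"; it is not part of the tutorial code:
+
+```python
+import numpy as np
+
+def softmax(x):
+    e = np.exp(x)          # exponentiate each entry
+    return e / e.sum()     # normalize so the entries sum to one
+
+softmax(np.array([2.0, 1.0, 0.1]))  # -> approximately [0.659, 0.242, 0.099]
+```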
+
+You can picture our softmax regression as looking something like the following,
+although with a lot more $$x$$s. For each output, we compute a weighted sum of
+the $$x$$s, add a bias, and then apply softmax.
+
+<div style="width:55%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/softmax-regression-scalargraph.png">
+</div>
+
+If we write that out as equations, we get:
+
+<div style="width:52%; margin-left:25%; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/softmax-regression-scalarequation.png">
+</div>
+
+We can "vectorize" this procedure, turning it into a matrix multiplication
+and vector addition. This is helpful for computational efficiency. (It's also
+a useful way to think.)
+
+<div style="width:50%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/softmax-regression-vectorequation.png">
+</div>
+
+More compactly, we can just write:
+
+$$y = \text{softmax}(Wx + b)$$
+
+
+## Implementing the Regression
+
+
+To do efficient numerical computing in Python, we typically use libraries like
+NumPy that do expensive operations such as matrix multiplication outside Python,
+using highly efficient code implemented in another language.
+Unfortunately, there can still be a lot of overhead from switching back to
+Python every operation. This overhead is especially bad if you want to run
+computations on GPUs or in a distributed manner, where there can be a high cost
+to transferring data.
+
+TensorFlow also does its heavy lifting outside Python,
+but it takes things a step further to avoid this overhead.
+Instead of running a single expensive operation independently
+from Python, TensorFlow lets us describe a graph of interacting operations that
+run entirely outside Python. (Approaches like this can be seen in a few
+machine learning libraries.)
+
+To run computations, TensorFlow needs to connect to its backend. This connection
+is called a `Session`. To use TensorFlow, we need to import it and create a
+session.
+
+```python
+import tensorflow as tf
+sess = tf.InteractiveSession()
+```
+
+(Using an `InteractiveSession` makes TensorFlow a bit more flexible about how
+you structure your code. In particular, it's helpful for work in interactive
+contexts like iPython.)
+
+We describe these interacting operations by manipulating symbolic variables.
+Let's create one:
+
+```python
+x = tf.placeholder("float", [None, 784])
+```
+
+`x` isn't a specific value. It's a `placeholder`, a value that we'll input when
+we ask TensorFlow to run a computation. We want to be able to input any number
+of MNIST images, each flattened into a 784-dimensional vector. We represent
+this as a 2d tensor of floating point numbers, with a shape `[None, 784]`.
+(Here `None` means that a dimension can be of any length.)
+
+We also need the weights and biases for our model. We could imagine treating
+these like additional inputs, but TensorFlow has an even better way to handle
+it: `Variable`.
+A `Variable` is a modifiable tensor that lives in TensorFlow's graph of
+interacting
+operations. It can be used and even modified by the computation. For machine
+learning applications, one generally has the model parameters be `Variable`s.
+
+```python
+W = tf.Variable(tf.zeros([784,10]))
+b = tf.Variable(tf.zeros([10]))
+```
+
+We create these `Variable`s by giving `tf.Variable` the initial value of the
+`Variable`: in this case, we initialize both `W` and `b` as tensors full of
+zeros. Since we are going to learn `W` and `b`, it doesn't matter very much
+what they initially are.
+
+Notice that `W` has a shape of [784, 10] because we want to multiply the
+784-dimensional image vectors by it to produce 10-dimensional vectors of
+evidence for the different classes. `b` has a shape of [10] so we can add it
+to the output.
+
+We can now implement our model. It only takes one line!
+
+```python
+y = tf.nn.softmax(tf.matmul(x,W) + b)
+```
+
+First, we multiply `x` by `W` with the expression `tf.matmul(x,W)`. This is
+flipped from when we multiplied them in our equation, where we had $$Wx$$, as a
+small trick
+to deal with `x` being a 2D tensor with multiple inputs. We then add `b`, and
+finally apply `tf.nn.softmax`.
+
+That's it. It only took us one line to define our model, after a couple short
+lines of setup. That isn't because TensorFlow is designed to make a softmax
+regression particularly easy: it's just a very flexible way to describe many
+kinds of numerical computations, from machine learning models to physics
+simulations. And once defined, our model can be run on different devices:
+your computer's CPU, GPUs, and even phones!
+
+
+## Training
+
+In order to train our model, we need to define what it means for the model to
+be good. Well, actually, in machine learning we typically define what it means
+for a model to be bad, called the cost or loss, and then try to minimize how bad
+it is. But the two are equivalent.
+
+One very common, very nice cost function is "cross-entropy." Surprisingly,
+cross-entropy arises from thinking about information compressing codes in
+information theory but it winds up being an important idea in lots of areas,
+from gambling to machine learning. It's defined:
+
+$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$
+
+Where $$y$$ is our predicted probability distribution, and $$y'$$ is the true
+distribution (the one-hot vector we'll input). In some rough sense, the
+cross-entropy is measuring how inefficient our predictions are for describing
+the truth. Going into more detail about cross-entropy is beyond the scope of
+this tutorial, but it's well worth
+[understanding](http://colah.github.io/posts/2015-09-Visual-Information/).
+
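+As a tiny numeric illustration in plain NumPy (not part of the tutorial code):
+with a one-hot truth, the cross-entropy is just minus the log of the probability
+the model assigned to the correct class.
+
+```python
+import numpy as np
+
+y_true = np.array([0., 0., 1.])      # one-hot truth: the correct class is 2
+y_pred = np.array([0.1, 0.2, 0.7])   # predicted probabilities
+-np.sum(y_true * np.log(y_pred))     # == -log(0.7), about 0.357
+```
+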
+To implement cross-entropy we need to first add a new placeholder to input
+the correct answers:
+
+```python
+y_ = tf.placeholder("float", [None,10])
+```
+
+Then we can implement the cross-entropy, $$-\sum y'\log(y)$$:
+
+```python
+cross_entropy = -tf.reduce_sum(y_*tf.log(y))
+```
+
+First, `tf.log` computes the logarithm of each element of `y`. Next, we multiply
+each element of `y_` with the corresponding element of `tf.log(y)`. Finally,
+`tf.reduce_sum` adds all the elements of the tensor. (Note that this isn't
+just the cross-entropy of the truth with a single prediction, but the sum of the
+cross-entropies for all 100 images we looked at. How well we are doing on 100
+data points is a much better description of how good our model is than a single
+data point.)
+
+Now that we know what we want our model to do, it's very easy to have TensorFlow
+train it to do so.
+Because TensorFlow knows the entire graph of your computations, it
+can automatically use the [backpropagation
+algorithm](http://colah.github.io/posts/2015-08-Backprop/)
+to efficiently determine how your variables affect the cost you ask it to minimize.
+Then it can apply your choice of optimization algorithm to modify the variables
+and reduce the cost.
+
+```python
+train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
+```
+
+In this case, we ask TensorFlow to minimize `cross_entropy` using the gradient
+descent algorithm with a learning rate of 0.01. Gradient descent is a simple
+procedure, where TensorFlow simply shifts each variable a little bit in the
+direction that reduces the cost. But TensorFlow also provides
+[many other optimization algorithms](../../../api_docs/python/train.md?#optimizers):
+using one is as simple as
+tweaking one line.
+
+What TensorFlow actually does here, behind the scenes, is it adds new operations
+to your graph which
+implement backpropagation and gradient descent. Then it gives you back a
+single operation which, when run, will do a step of gradient descent training,
+slightly tweaking your variables to reduce the cost.
+
+Now we have our model set up to train. But before we start, we need to
+initialize the variables we created:
+
+```python
+tf.initialize_all_variables().run()
+```
+
+Let's train -- we'll run the training step 1000 times!
+
+```python
+for i in range(1000):
+ batch_xs, batch_ys = mnist.train.next_batch(100)
+ train_step.run({x: batch_xs, y_: batch_ys})
+```
+
+Each step of the loop, we get a "batch" of one hundred random data points from
+our training set. We run `train_step`, feeding in the batch data to replace
+the `placeholder`s.
+
+Using small batches of random data is called stochastic training -- in
+this case, stochastic gradient descent. Ideally, we'd like to use all our data
+for every step of training because that would give us a better sense of what
+we should be doing, but that's expensive. So, instead, we use a different subset
+every time. Doing this is cheap and has much of the same benefit.
+
+
+
+## Evaluating Our Model
+
+How well does our model do?
+
+Well, first let's figure out where we predicted the correct label. `tf.argmax`
+is an extremely useful function which gives you the index of the highest entry
+in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our
+model thinks is most likely for each input, while `tf.argmax(y_,1)` is the
+correct label. We can use `tf.equal` to check if our prediction matches the
+truth.
+
+```python
+correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
+```
+
+That gives us a list of booleans. To determine what fraction are correct, we
+cast to floating point numbers and then take the mean. For example,
+`[True, False, True, True]` would become `[1,0,1,1]` which would become `0.75`.
+
+```python
+accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
+```
+
+Finally, we ask for our accuracy on our test data.
+
+```python
+print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels})
+```
+
+This should be about 91%.
+
+Is that good? Well, not really. In fact, it's pretty bad. This is because we're
+using a very simple model. With some small changes, we can get to
+97%. The best models can get to over 99.7% accuracy! (For more information, have
+a look at this
+[list of results](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html).)
+
+What matters is that we learned from this model. Still, if you're feeling a bit
+down about these results, check out [the next tutorial](../../index.md) where we
+do a lot better, and learn how to build more sophisticated models using
+TensorFlow!
diff --git a/tensorflow/g3doc/tutorials/mnist/download/index.md b/tensorflow/g3doc/tutorials/mnist/download/index.md
new file mode 100644
index 0000000000..dc11e727d8
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/download/index.md
@@ -0,0 +1,85 @@
+# Downloading MNIST
+
+Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
+
+The goal of this tutorial is to show how to download the dataset files required
+for handwritten digit classification using the (classic) MNIST data set.
+
+## Tutorial Files
+
+This tutorial references the following files:
+
+File | Purpose
+--- | ---
+[`input_data.py`](../input_data.py) | The code to download the MNIST dataset for training and evaluation.
+
+## Prepare the Data
+
+MNIST is a classic problem in machine learning. The problem is to look at
+greyscale 28x28 pixel images of handwritten digits and determine which digit
+the image represents, for all the digits from zero to nine.
+
+![MNIST Digits](../tf/mnist_digits.png "MNIST Digits")
+
+For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
+or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
+
+### Download
+
+[Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
+also hosts the training and test data for download.
+
+File | Purpose
+--- | ---
+[`train-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz) | training set images - 55000 training images, 5000 validation images
+[`train-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz) | training set labels matching the images
+[`t10k-images-idx3-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz) | test set images - 10000 images
+[`t10k-labels-idx1-ubyte.gz`](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz) | test set labels matching the images
+
+In the `input_data.py` file, the `maybe_download()` function will ensure these
+files are downloaded into a local data folder for training.
+
+The folder name is specified in a flag variable at the top of the
+`fully_connected_feed.py` file and may be changed to fit your needs.
+
+### Unpack and Reshape
+
+The files themselves are not in any standard image format and are manually
+unpacked (following the instructions available at the website) by the
+`extract_images()` and `extract_labels()` functions in `input_data.py`.
+
+The image data is extracted into a 2d tensor of: `[image index, pixel index]`
+where each entry is the intensity value of a specific pixel in a specific
+image, rescaled from `[0, 255]` to `[0.0, 1.0]`. The "image index" corresponds
+to an image in the dataset, counting up from zero to the size of the dataset.
+And the "pixel index" corresponds to a specific pixel in that image, ranging
+from zero to the number of pixels in the image.
+
+The 60000 examples in the `train-*` files are then split into 55000 examples
+for training and 5000 examples for validation. For all of the 28x28
+pixel greyscale images in the datasets the image size is 784 and so the output
+tensor for the training set images is of shape `[55000, 784]`.
+
+The label data is extracted into a 1d tensor of: `[image index]`
+with the class identifier for each example as the value. For the training set
+labels, this would then be of shape `[55000]`.
+
+### DataSet Object
+
+The underlying code will download, unpack, and reshape images and labels for
+the following datasets:
+
+Dataset | Purpose
+--- | ---
+`data_sets.train` | 55000 images and labels, for primary training.
+`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
+`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.
+
+The `read_data_sets()` function will return a dictionary with a `DataSet`
+instance for each of these three sets of data. The `DataSet.next_batch()`
+method can be used to fetch a tuple consisting of `batch_size` lists of images
+and labels to be fed into the running TensorFlow session.
+
+```python
+images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
+```
diff --git a/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
new file mode 100644
index 0000000000..618c8f47cb
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/fully_connected_feed.py
@@ -0,0 +1,219 @@
+"""Trains and Evaluates the MNIST network using a feed dictionary.
+
+TensorFlow install instructions:
+https://tensorflow.org/get_started/os_setup.html
+
+MNIST tutorial:
+https://tensorflow.org/tutorials/mnist/tf/index.html
+
+"""
+# pylint: disable=missing-docstring
+import os.path
+import time
+
+import tensorflow.python.platform
+import numpy
+import tensorflow as tf
+
+from tensorflow.g3doc.tutorials.mnist import input_data
+from tensorflow.g3doc.tutorials.mnist import mnist
+
+
+# Basic model parameters as external flags.
+flags = tf.app.flags
+FLAGS = flags.FLAGS
+flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
+flags.DEFINE_integer('max_steps', 2000, 'Number of steps to run trainer.')
+flags.DEFINE_integer('hidden1', 128, 'Number of units in hidden layer 1.')
+flags.DEFINE_integer('hidden2', 32, 'Number of units in hidden layer 2.')
+flags.DEFINE_integer('batch_size', 100, 'Batch size. '
+ 'Must divide evenly into the dataset sizes.')
+flags.DEFINE_string('train_dir', 'data', 'Directory to put the training data.')
+flags.DEFINE_boolean('fake_data', False, 'If true, uses fake data '
+ 'for unit testing.')
+
+
+def placeholder_inputs(batch_size):
+ """Generate placeholder variables to represent the the input tensors.
+
+ These placeholders are used as inputs by the rest of the model building
+ code and will be fed from the downloaded data in the .run() loop, below.
+
+ Args:
+ batch_size: The batch size will be baked into both placeholders.
+
+ Returns:
+ images_placeholder: Images placeholder.
+ labels_placeholder: Labels placeholder.
+ """
+ # Note that the shapes of the placeholders match the shapes of the full
+ # image and label tensors, except the first dimension is now batch_size
+ # rather than the full size of the train or test data sets.
+ images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
+ mnist.IMAGE_PIXELS))
+ labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
+ return images_placeholder, labels_placeholder
+
+
+def fill_feed_dict(data_set, images_pl, labels_pl):
+ """Fills the feed_dict for training the given step.
+
+ A feed_dict takes the form of:
+ feed_dict = {
+ <placeholder>: <tensor of values to be passed for placeholder>,
+ ....
+ }
+
+ Args:
+ data_set: The set of images and labels, from input_data.read_data_sets()
+ images_pl: The images placeholder, from placeholder_inputs().
+ labels_pl: The labels placeholder, from placeholder_inputs().
+
+ Returns:
+ feed_dict: The feed dictionary mapping from placeholders to values.
+ """
+ # Create the feed_dict for the placeholders filled with the next
+  # `batch_size` examples.
+ images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
+ FLAGS.fake_data)
+ feed_dict = {
+ images_pl: images_feed,
+ labels_pl: labels_feed,
+ }
+ return feed_dict
+
+
+def do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_set):
+ """Runs one evaluation against the full epoch of data.
+
+ Args:
+ sess: The session in which the model has been trained.
+ eval_correct: The Tensor that returns the number of correct predictions.
+ images_placeholder: The images placeholder.
+ labels_placeholder: The labels placeholder.
+ data_set: The set of images and labels to evaluate, from
+ input_data.read_data_sets().
+ """
+ # And run one epoch of eval.
+ true_count = 0 # Counts the number of correct predictions.
+ steps_per_epoch = int(data_set.num_examples / FLAGS.batch_size)
+ num_examples = steps_per_epoch * FLAGS.batch_size
+ for step in xrange(steps_per_epoch):
+ feed_dict = fill_feed_dict(data_set,
+ images_placeholder,
+ labels_placeholder)
+ true_count += sess.run(eval_correct, feed_dict=feed_dict)
+ precision = float(true_count) / float(num_examples)
+ print ' Num examples: %d Num correct: %d Precision @ 1: %0.04f' % (
+ num_examples, true_count, precision)
+
+
+def run_training():
+ """Train MNIST for a number of steps."""
+ # Get the sets of images and labels for training, validation, and
+ # test on MNIST.
+ data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+
+ # Tell TensorFlow that the model will be built into the default Graph.
+ with tf.Graph().as_default():
+ # Generate placeholders for the images and labels.
+ images_placeholder, labels_placeholder = placeholder_inputs(
+ FLAGS.batch_size)
+
+ # Build a Graph that computes predictions from the inference model.
+ logits = mnist.inference(images_placeholder,
+ FLAGS.hidden1,
+ FLAGS.hidden2)
+
+ # Add to the Graph the Ops for loss calculation.
+ loss = mnist.loss(logits, labels_placeholder)
+
+ # Add to the Graph the Ops that calculate and apply gradients.
+ train_op = mnist.training(loss, FLAGS.learning_rate)
+
+ # Add the Op to compare the logits to the labels during evaluation.
+ eval_correct = mnist.evaluation(logits, labels_placeholder)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Create a saver for writing training checkpoints.
+ saver = tf.train.Saver()
+
+ # Create a session for running Ops on the Graph.
+ sess = tf.Session()
+
+ # Run the Op to initialize the variables.
+ init = tf.initialize_all_variables()
+ sess.run(init)
+
+ # Instantiate a SummaryWriter to output summaries and the Graph.
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ # And then after everything is built, start the training loop.
+ for step in xrange(FLAGS.max_steps):
+ start_time = time.time()
+
+ # Fill a feed dictionary with the actual set of images and labels
+ # for this particular training step.
+ feed_dict = fill_feed_dict(data_sets.train,
+ images_placeholder,
+ labels_placeholder)
+
+ # Run one step of the model. The return values are the activations
+ # from the `train_op` (which is discarded) and the `loss` Op. To
+ # inspect the values of your Ops or variables, you may include them
+ # in the list passed to sess.run() and the value tensors will be
+ # returned in the tuple from the call.
+ _, loss_value = sess.run([train_op, loss],
+ feed_dict=feed_dict)
+
+ duration = time.time() - start_time
+
+ # Write the summaries and print an overview fairly often.
+ if step % 100 == 0:
+ # Print status to stdout.
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step,
+ loss_value,
+ duration)
+ # Update the events file.
+ summary_str = sess.run(summary_op, feed_dict=feed_dict)
+ summary_writer.add_summary(summary_str, step)
+
+ # Save a checkpoint and evaluate the model periodically.
+ if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
+ saver.save(sess, FLAGS.train_dir, global_step=step)
+ # Evaluate against the training set.
+ print 'Training Data Eval:'
+ do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.train)
+ # Evaluate against the validation set.
+ print 'Validation Data Eval:'
+ do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.validation)
+ # Evaluate against the test set.
+ print 'Test Data Eval:'
+ do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.test)
+
+
+def main(_):
+ run_training()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/g3doc/tutorials/mnist/input_data.py b/tensorflow/g3doc/tutorials/mnist/input_data.py
new file mode 100644
index 0000000000..88892027ff
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/input_data.py
@@ -0,0 +1,175 @@
+"""Functions for downloading and reading MNIST data."""
+import gzip
+import os
+import urllib
+
+import numpy
+
+SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
+
+
+def maybe_download(filename, work_directory):
+ """Download the data from Yann's website, unless it's already here."""
+ if not os.path.exists(work_directory):
+ os.mkdir(work_directory)
+ filepath = os.path.join(work_directory, filename)
+ if not os.path.exists(filepath):
+ filepath, _ = urllib.urlretrieve(SOURCE_URL + filename, filepath)
+ statinfo = os.stat(filepath)
+    print 'Successfully downloaded', filename, statinfo.st_size, 'bytes.'
+ return filepath
+
+
+def _read32(bytestream):
+ dt = numpy.dtype(numpy.uint32).newbyteorder('>')
+ return numpy.frombuffer(bytestream.read(4), dtype=dt)
+
+
+def extract_images(filename):
+ """Extract the images into a 4D uint8 numpy array [index, y, x, depth]."""
+ print 'Extracting', filename
+ with gzip.open(filename) as bytestream:
+ magic = _read32(bytestream)
+ if magic != 2051:
+ raise ValueError(
+ 'Invalid magic number %d in MNIST image file: %s' %
+ (magic, filename))
+ num_images = _read32(bytestream)
+ rows = _read32(bytestream)
+ cols = _read32(bytestream)
+ buf = bytestream.read(rows * cols * num_images)
+ data = numpy.frombuffer(buf, dtype=numpy.uint8)
+ data = data.reshape(num_images, rows, cols, 1)
+ return data
+
+
+def dense_to_one_hot(labels_dense, num_classes=10):
+ """Convert class labels from scalars to one-hot vectors."""
+ num_labels = labels_dense.shape[0]
+ index_offset = numpy.arange(num_labels) * num_classes
+ labels_one_hot = numpy.zeros((num_labels, num_classes))
+ labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
+ return labels_one_hot
+
+
+def extract_labels(filename, one_hot=False):
+ """Extract the labels into a 1D uint8 numpy array [index]."""
+ print 'Extracting', filename
+ with gzip.open(filename) as bytestream:
+ magic = _read32(bytestream)
+ if magic != 2049:
+ raise ValueError(
+ 'Invalid magic number %d in MNIST label file: %s' %
+ (magic, filename))
+ num_items = _read32(bytestream)
+ buf = bytestream.read(num_items)
+ labels = numpy.frombuffer(buf, dtype=numpy.uint8)
+ if one_hot:
+ return dense_to_one_hot(labels)
+ return labels
+
+
+class DataSet(object):
+
+ def __init__(self, images, labels, fake_data=False):
+ if fake_data:
+ self._num_examples = 10000
+ else:
+ assert images.shape[0] == labels.shape[0], (
+ "images.shape: %s labels.shape: %s" % (images.shape,
+ labels.shape))
+ self._num_examples = images.shape[0]
+
+ # Convert shape from [num examples, rows, columns, depth]
+ # to [num examples, rows*columns] (assuming depth == 1)
+ assert images.shape[3] == 1
+ images = images.reshape(images.shape[0],
+ images.shape[1] * images.shape[2])
+ # Convert from [0, 255] -> [0.0, 1.0].
+ images = images.astype(numpy.float32)
+ images = numpy.multiply(images, 1.0 / 255.0)
+ self._images = images
+ self._labels = labels
+ self._epochs_completed = 0
+ self._index_in_epoch = 0
+
+ @property
+ def images(self):
+ return self._images
+
+ @property
+ def labels(self):
+ return self._labels
+
+ @property
+ def num_examples(self):
+ return self._num_examples
+
+ @property
+ def epochs_completed(self):
+ return self._epochs_completed
+
+ def next_batch(self, batch_size, fake_data=False):
+ """Return the next `batch_size` examples from this data set."""
+ if fake_data:
+ fake_image = [1.0 for _ in xrange(784)]
+ fake_label = 0
+ return [fake_image for _ in xrange(batch_size)], [
+ fake_label for _ in xrange(batch_size)]
+ start = self._index_in_epoch
+ self._index_in_epoch += batch_size
+ if self._index_in_epoch > self._num_examples:
+ # Finished epoch
+ self._epochs_completed += 1
+ # Shuffle the data
+ perm = numpy.arange(self._num_examples)
+ numpy.random.shuffle(perm)
+ self._images = self._images[perm]
+ self._labels = self._labels[perm]
+ # Start next epoch
+ start = 0
+ self._index_in_epoch = batch_size
+ assert batch_size <= self._num_examples
+ end = self._index_in_epoch
+ return self._images[start:end], self._labels[start:end]
+
+
+def read_data_sets(train_dir, fake_data=False, one_hot=False):
+ class DataSets(object):
+ pass
+ data_sets = DataSets()
+
+ if fake_data:
+ data_sets.train = DataSet([], [], fake_data=True)
+ data_sets.validation = DataSet([], [], fake_data=True)
+ data_sets.test = DataSet([], [], fake_data=True)
+ return data_sets
+
+ TRAIN_IMAGES = 'train-images-idx3-ubyte.gz'
+ TRAIN_LABELS = 'train-labels-idx1-ubyte.gz'
+ TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
+ TEST_LABELS = 't10k-labels-idx1-ubyte.gz'
+ VALIDATION_SIZE = 5000
+
+ local_file = maybe_download(TRAIN_IMAGES, train_dir)
+ train_images = extract_images(local_file)
+
+ local_file = maybe_download(TRAIN_LABELS, train_dir)
+ train_labels = extract_labels(local_file, one_hot=one_hot)
+
+ local_file = maybe_download(TEST_IMAGES, train_dir)
+ test_images = extract_images(local_file)
+
+ local_file = maybe_download(TEST_LABELS, train_dir)
+ test_labels = extract_labels(local_file, one_hot=one_hot)
+
+ validation_images = train_images[:VALIDATION_SIZE]
+ validation_labels = train_labels[:VALIDATION_SIZE]
+ train_images = train_images[VALIDATION_SIZE:]
+ train_labels = train_labels[VALIDATION_SIZE:]
+
+ data_sets.train = DataSet(train_images, train_labels)
+ data_sets.validation = DataSet(validation_images, validation_labels)
+ data_sets.test = DataSet(test_images, test_labels)
+
+ return data_sets
diff --git a/tensorflow/g3doc/tutorials/mnist/mnist.py b/tensorflow/g3doc/tutorials/mnist/mnist.py
new file mode 100644
index 0000000000..acf4d01dd1
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/mnist.py
@@ -0,0 +1,148 @@
+"""Builds the MNIST network.
+
+Implements the inference/loss/training pattern for model building.
+
+1. inference() - Builds the model as far as is required for running the network
+forward to make predictions.
+2. loss() - Adds to the inference model the layers required to generate loss.
+3. training() - Adds to the loss model the Ops required to generate and
+apply gradients.
+
+This file is used by the various "fully_connected_*.py" files and not meant to
+be run.
+
+TensorFlow install instructions:
+https://tensorflow.org/get_started/os_setup.html
+
+MNIST tutorial:
+https://tensorflow.org/tutorials/mnist/tf/index.html
+"""
+import math
+
+import tensorflow.python.platform
+import tensorflow as tf
+
+# The MNIST dataset has 10 classes, representing the digits 0 through 9.
+NUM_CLASSES = 10
+
+# The MNIST images are always 28x28 pixels.
+IMAGE_SIZE = 28
+IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE
+
+
+def inference(images, hidden1_units, hidden2_units):
+ """Build the MNIST model up to where it may be used for inference.
+
+ Args:
+ images: Images placeholder, from inputs().
+    hidden1_units: Size of the first hidden layer.
+    hidden2_units: Size of the second hidden layer.
+
+ Returns:
+ softmax_linear: Output tensor with the computed logits.
+ """
+ # Hidden 1
+ with tf.name_scope('hidden1') as scope:
+ weights = tf.Variable(
+ tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
+ stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
+ name='weights')
+ biases = tf.Variable(tf.zeros([hidden1_units]),
+ name='biases')
+ hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
+ # Hidden 2
+ with tf.name_scope('hidden2') as scope:
+ weights = tf.Variable(
+ tf.truncated_normal([hidden1_units, hidden2_units],
+ stddev=1.0 / math.sqrt(float(hidden1_units))),
+ name='weights')
+ biases = tf.Variable(tf.zeros([hidden2_units]),
+ name='biases')
+ hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
+ # Linear
+ with tf.name_scope('softmax_linear') as scope:
+ weights = tf.Variable(
+ tf.truncated_normal([hidden2_units, NUM_CLASSES],
+ stddev=1.0 / math.sqrt(float(hidden2_units))),
+ name='weights')
+ biases = tf.Variable(tf.zeros([NUM_CLASSES]),
+ name='biases')
+ logits = tf.matmul(hidden2, weights) + biases
+ return logits
+
+
+def loss(logits, labels):
+ """Calculates the loss from the logits and the labels.
+
+ Args:
+ logits: Logits tensor, float - [batch_size, NUM_CLASSES].
+ labels: Labels tensor, int32 - [batch_size].
+
+ Returns:
+ loss: Loss tensor of type float.
+ """
+  # Convert from sparse integer labels in the range [0, NUM_CLASSES)
+ # to 1-hot dense float vectors (that is we will have batch_size vectors,
+ # each with NUM_CLASSES values, all of which are 0.0 except there will
+ # be a 1.0 in the entry corresponding to the label).
+ batch_size = tf.size(labels)
+ labels = tf.expand_dims(labels, 1)
+ indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
+ concated = tf.concat(1, [indices, labels])
+ onehot_labels = tf.sparse_to_dense(
+ concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
+ cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
+ onehot_labels,
+ name='xentropy')
+ loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
+ return loss
+
+
+def training(loss, learning_rate):
+ """Sets up the training Ops.
+
+ Creates a summarizer to track the loss over time in TensorBoard.
+
+ Creates an optimizer and applies the gradients to all trainable variables.
+
+ The Op returned by this function is what must be passed to the
+ `sess.run()` call to cause the model to train.
+
+ Args:
+ loss: Loss tensor, from loss().
+ learning_rate: The learning rate to use for gradient descent.
+
+ Returns:
+ train_op: The Op for training.
+ """
+ # Add a scalar summary for the snapshot loss.
+ tf.scalar_summary(loss.op.name, loss)
+ # Create the gradient descent optimizer with the given learning rate.
+ optimizer = tf.train.GradientDescentOptimizer(learning_rate)
+ # Create a variable to track the global step.
+ global_step = tf.Variable(0, name='global_step', trainable=False)
+ # Use the optimizer to apply the gradients that minimize the loss
+ # (and also increment the global step counter) as a single training step.
+ train_op = optimizer.minimize(loss, global_step=global_step)
+ return train_op
+
+
+def evaluation(logits, labels):
+ """Evaluate the quality of the logits at predicting the label.
+
+ Args:
+ logits: Logits tensor, float - [batch_size, NUM_CLASSES].
+ labels: Labels tensor, int32 - [batch_size], with values in the
+ range [0, NUM_CLASSES).
+
+ Returns:
+ A scalar int32 tensor with the number of examples (out of batch_size)
+ that were predicted correctly.
+ """
+ # For a classifier model, we can use the in_top_k Op.
+ # It returns a bool tensor with shape [batch_size] that is true for
+  # the examples where the label is in the top k (here k=1)
+ # of all logits for that example.
+ correct = tf.nn.in_top_k(logits, labels, 1)
+ # Return the number of true entries.
+ return tf.reduce_sum(tf.cast(correct, tf.int32))
diff --git a/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
new file mode 100644
index 0000000000..640ea29dac
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/mnist_softmax.py
@@ -0,0 +1,33 @@
+"""A very simple MNIST classifer.
+
+See extensive documentation at ??????? (insert public URL)
+"""
+
+# Import data
+import input_data
+mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
+
+import tensorflow as tf
+sess = tf.InteractiveSession()
+
+# Create the model
+x = tf.placeholder("float", [None, 784])
+W = tf.Variable(tf.zeros([784,10]))
+b = tf.Variable(tf.zeros([10]))
+y = tf.nn.softmax(tf.matmul(x,W) + b)
+
+# Define loss and optimizer
+y_ = tf.placeholder("float", [None,10])
+cross_entropy = -tf.reduce_sum(y_*tf.log(y))
+train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
+
+# Train
+tf.initialize_all_variables().run()
+for i in range(1000):
+ batch_xs, batch_ys = mnist.train.next_batch(100)
+ train_step.run({x: batch_xs, y_: batch_ys})
+
+# Test trained model
+correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
+accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
+print accuracy.eval({x: mnist.test.images, y_: mnist.test.labels})
diff --git a/tensorflow/g3doc/tutorials/mnist/pros/index.md b/tensorflow/g3doc/tutorials/mnist/pros/index.md
new file mode 100644
index 0000000000..17696712b0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/pros/index.md
@@ -0,0 +1,390 @@
+# MNIST Deep Learning Example (For Experts)
+
+TensorFlow is a powerful library for doing large-scale numerical computation.
+One of the tasks at which it excels is implementing and training deep neural
+networks.
+In this tutorial we will learn the basic building blocks of a TensorFlow model
+while constructing a deep convolutional MNIST classifier.
+
+*This introduction assumes familiarity with neural networks and the MNIST
+dataset. If you don't have
+a background with them, check out the
+[introduction for beginners](../beginners/index.md).*
+
+## Setup
+
+Before we create our model, we will first load the MNIST dataset, and start a
+TensorFlow session.
+
+### Load MNIST Data
+
+For your convenience, we've included [a script](../input_data.py) which
+automatically downloads and imports the MNIST dataset. It will create a
+directory `'MNIST_data'` in which to store the data files.
+
+```python
+import input_data
+mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
+```
+
+Here `mnist` is a lightweight class which stores the training, validation, and
+testing sets as NumPy arrays.
+It also provides a function for iterating through data minibatches, which we
+will use below.
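+
+For example, a quick sanity check you can run yourself (the shapes shown in the
+comments are what the loader produces for the standard MNIST split):
+
+```python
+print mnist.train.images.shape   # (55000, 784)
+print mnist.train.labels.shape   # (55000, 10), one-hot labels
+batch = mnist.train.next_batch(50)
+print batch[0].shape, batch[1].shape  # (50, 784) (50, 10)
+```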
+
+### Start TensorFlow Session
+
+TensorFlow relies on a highly efficient C++ backend to do its computation. The
+connection to this backend is called a session. We will need to create a session
+before we can do any computation.
+
+```python
+import tensorflow as tf
+sess = tf.InteractiveSession()
+```
+
+Using an `InteractiveSession` makes TensorFlow more flexible about how you
+structure your code.
+It allows you to interleave operations which build a
+[computation graph](../../../get_started/basic_usage.md#the-computation-graph)
+with ones that run the graph.
+This is particularly convenient when working in interactive contexts like
+IPython.
+If you are not using an `InteractiveSession`, then you should build
+the entire computation graph before starting a session and [launching the
+graph](../../../get_started/basic_usage.md#launching-the-graph-in-a-session).
+
+#### Computation Graph
+
+To do efficient numerical computing in Python, we typically use libraries like
+NumPy that do expensive operations such as matrix multiplication outside Python,
+using highly efficient code implemented in another language.
+Unfortunately, there can still be a lot of overhead from switching back to
+Python for every operation. This overhead is especially bad if you want to run
+computations on GPUs or in a distributed manner, where there can be a high cost
+to transferring data.
+
+TensorFlow also does its heavy lifting outside Python,
+but it takes things a step further to avoid this overhead.
+Instead of running a single expensive operation independently
+from Python, TensorFlow lets us describe a graph of interacting operations that
+run entirely outside Python.
+This approach is similar to that used in Theano or Torch.
+
+The role of the Python code is therefore to build this external computation
+graph, and to dictate which parts of the computation graph should be run. See
+the
+[Computation Graph](../../../get_started/basic_usage.md#the-computation-graph)
+section of
+[Basic Usage](../../../get_started/basic_usage.md)
+for more detail.
+
+## Build a Softmax Regression Model
+
+In this section we will build a softmax regression model with a single linear
+layer. In the next section, we will extend this to the case of softmax
+regression with a multilayer convolutional network.
+
+### Placeholders
+
+We start building the computation graph by creating nodes for the
+input images and target output classes.
+
+```python
+x = tf.placeholder("float", shape=[None, 784])
+y_ = tf.placeholder("float", shape=[None, 10])
+```
+
+Here `x` and `y_` aren't specific values. Rather, they are each a `placeholder`
+-- a value that we'll input when we ask TensorFlow to run a computation.
+
+The input images `x` will consist of a 2d tensor of floating point numbers.
+Here we assign it a `shape` of `[None, 784]`, where `784` is the dimensionality of
+a single flattened MNIST image, and `None` indicates that the first dimension,
+corresponding to the batch size, can be of any size.
+The target output classes `y_` will also consist of a 2d tensor,
+where each row is a one-hot 10-dimensional vector indicating
+which digit class the corresponding MNIST image belongs to.
+
+The `shape` argument to `placeholder` is optional, but it allows TensorFlow
+to automatically catch bugs stemming from inconsistent tensor shapes.
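+
+For instance (a hypothetical snippet, not part of the tutorial code), feeding a
+batch whose second dimension is not 784 would fail with a shape error instead
+of silently producing wrong results:
+
+```python
+import numpy as np
+bad_batch = np.zeros((50, 100))  # wrong width: 100 instead of 784
+# sess.run(tf.shape(x), feed_dict={x: bad_batch})  # raises a shape mismatch error
+```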
+
+### Variables
+
+We now define the weights `W` and biases `b` for our model. We could imagine treating
+these like additional inputs, but TensorFlow has an even better way to handle
+them: `Variable`.
+A `Variable` is a value that lives in TensorFlow's computation graph.
+It can be used and even modified by the computation. In machine
+learning applications, the model parameters are generally `Variable`s.
+
+```python
+W = tf.Variable(tf.zeros([784,10]))
+b = tf.Variable(tf.zeros([10]))
+```
+
+We pass the initial value for each parameter in the call to `tf.Variable`.
+In this case, we initialize both `W` and `b` as tensors full of
+zeros. `W` is a 784x10 matrix (because we have 784 input features
+and 10 outputs) and `b` is a 10-dimensional vector (because we have 10 classes).
+
+Before `Variable`s can be used within a session, they must be initialized using
+that session.
+This step takes the initial values (in this case tensors full of zeros) that
+have already been specified, and assigns them to each `Variable`. This can be
+done for all `Variables` at once.
+
+```python
+sess.run(tf.initialize_all_variables())
+```
+
+### Predicted Class and Cost Function
+
+We can now implement our regression model. It only takes one line!
+We multiply the vectorized input images `x` by the weight matrix `W`, add
+the bias `b`, and compute the softmax probabilities that are assigned to each
+class.
+
+```python
+y = tf.nn.softmax(tf.matmul(x,W) + b)
+```
+
+The cost function to be minimized during training can be specified just as
+easily. Our cost function will be the cross-entropy between the target and the
+model's prediction.
+
+```python
+cross_entropy = -tf.reduce_sum(y_*tf.log(y))
+```
+
+Note that `tf.reduce_sum` sums across all images in the minibatch, as well as
+all classes. We are computing the cross entropy for the entire minibatch.
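+
+As a small illustration with made-up numbers, `tf.reduce_sum` with no reduction
+indices collapses every dimension down to a single scalar:
+
+```python
+per_example = tf.constant([[0.2, 0.1], [0.4, 0.3]])  # [batch, classes]
+total = tf.reduce_sum(per_example)                   # scalar: 1.0
+```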
+
+## Train the Model
+
+Now that we have defined our model and training cost function, it is
+straightforward to train using TensorFlow.
+Because TensorFlow knows the entire computation graph, it
+can use automatic differentiation to find the gradients of the cost with
+respect to each of the variables.
+TensorFlow has a variety of
+[built-in optimization algorithms](../../../api_docs/python/train.md#optimizers).
+For this example, we will use steepest gradient descent, with a step length of
+0.01, to descend the cross entropy.
+
+```python
+train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
+```
+
+What TensorFlow actually did in that single line was to add new operations to
+the computation graph. These operations included ones to compute gradients,
+compute parameter update steps, and apply update steps to the parameters.
+
+The returned operation `train_step`, when run, will apply the gradient
+descent updates to the parameters. Training the model can therefore be
+accomplished by repeatedly running `train_step`.
+
+```python
+for i in range(1000):
+ batch = mnist.train.next_batch(50)
+ train_step.run(feed_dict={x: batch[0], y_: batch[1]})
+```
+
+On each training iteration, we load 50 training examples. We then run the
+`train_step` operation, using `feed_dict` to replace the `placeholder` tensors
+`x` and `y_` with the training examples.
+Note that you can replace any tensor in your computation graph using `feed_dict`
+-- it's not restricted to just `placeholder`s.
+
+### Evaluate the Model
+
+How well did our model do?
+
+First we'll figure out where we predicted the correct label. `tf.argmax`
+is an extremely useful function which gives you the index of the highest entry
+in a tensor along some axis. For example, `tf.argmax(y,1)` is the label our
+model thinks is most likely for each input, while `tf.argmax(y_,1)` is the
+true label. We can use `tf.equal` to check if our prediction matches the
+truth.
+
+```python
+correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
+```
+
+That gives us a list of booleans. To determine what fraction are correct, we
+cast to floating point numbers and then take the mean. For example,
+`[True, False, True, True]` would become `[1,0,1,1]` which would become `0.75`.
+
+```python
+accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
+```
+
+Finally, we can evaluate our accuracy on the test data. This should be about
+91% correct.
+
+```python
+print accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
+```
+
+## Build a Multilayer Convolutional Network
+
+Getting 91% accuracy on MNIST is bad. It's almost embarrassingly bad. In this
+section, we'll fix that, jumping from a very simple model to something moderately
+sophisticated: a small convolutional neural network. This will get us to around
+99.2% accuracy -- not state of the art, but respectable.
+
+### Weight Initialization
+
+To create this model, we're going to need to create a lot of weights and biases.
+One should generally initialize weights with a small amount of noise for
+symmetry breaking, and to prevent 0 gradients. Since we're using ReLU neurons,
+it is also good practice to initialize them with a slightly positive initial
+bias to avoid "dead neurons." Instead of doing this repeatedly while we build
+the model, let's create two handy functions to do it for us.
+
+```python
+def weight_variable(shape):
+ initial = tf.truncated_normal(shape, stddev=0.1)
+ return tf.Variable(initial)
+
+def bias_variable(shape):
+ initial = tf.constant(0.1, shape=shape)
+ return tf.Variable(initial)
+```
+
+### Convolution and Pooling
+
+TensorFlow also gives us a lot of flexibility in convolution and pooling
+operations. How do we handle the boundaries? What is our stride size?
+In this example, we're always going to choose the vanilla version.
+Our convolutions use a stride of one and are zero padded so that the
+output is the same size as the input. Our pooling is plain old max pooling
+over 2x2 blocks. To keep our code cleaner, let's also abstract those operations
+into functions.
+
+```python
+def conv2d(x, W):
+ return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
+
+def max_pool_2x2(x):
+ return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
+ strides=[1, 2, 2, 1], padding='SAME')
+```
+
+### First Convolutional Layer
+
+We can now implement our first layer. It will consist of convolution, followed
+by max pooling. The convolution will compute 32 features for each 5x5 patch.
+Its weight tensor will have a shape of `[5, 5, 1, 32]`. The first two
+dimensions are the patch size, the next is the number of input channels, and
+the last is the number of output channels. We will also have a bias vector with
+a component for each output channel.
+
+```python
+W_conv1 = weight_variable([5, 5, 1, 32])
+b_conv1 = bias_variable([32])
+```
+
+To apply the layer, we first reshape `x` to a 4d tensor, with the second and
+third dimensions corresponding to image width and height, and the final
+dimension corresponding to the number of color channels.
+
+```python
+x_image = tf.reshape(x, [-1,28,28,1])
+```
+
+We then convolve `x_image` with the weight tensor, add the
+bias, apply the ReLU function, and finally max pool.
+
+```python
+h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
+h_pool1 = max_pool_2x2(h_conv1)
+```
+
+### Second Convolutional Layer
+
+In order to build a deep network, we stack several layers of this type. The
+second layer will have 64 features for each 5x5 patch.
+
+```python
+W_conv2 = weight_variable([5, 5, 32, 64])
+b_conv2 = bias_variable([64])
+
+h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
+h_pool2 = max_pool_2x2(h_conv2)
+```
+
+### Densely Connected Layer
+
+Now that the image size has been reduced to 7x7, we add a fully-connected layer
+with 1024 neurons to allow processing on the entire image. We reshape the tensor
+from the pooling layer into a batch of vectors,
+multiply by a weight matrix, add a bias, and apply a ReLU.
+
+```python
+W_fc1 = weight_variable([7 * 7 * 64, 1024])
+b_fc1 = bias_variable([1024])
+
+h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
+h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
+```
+
+#### Dropout
+
+To reduce overfitting, we will apply dropout before the readout layer.
+We create a `placeholder` for the probability that a neuron's output is kept
+during dropout. This allows us to turn dropout on during training, and turn it
+off during testing.
+TensorFlow's `tf.nn.dropout` op automatically handles scaling neuron outputs in
+addition to masking them, so dropout just works without any additional scaling.
+
+```python
+keep_prob = tf.placeholder("float")
+h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
+```
+
+### Readout Layer
+
+Finally, we add a softmax layer, just like for the one layer softmax regression
+above.
+
+```python
+W_fc2 = weight_variable([1024, 10])
+b_fc2 = bias_variable([10])
+
+y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
+```
+
+### Train and Evaluate the Model
+
+How well does this model do?
+To train and evaluate it we will use code that is nearly identical to that for
+the simple one-layer softmax network above.
+The differences are that we will replace the steepest gradient descent
+optimizer with the more sophisticated Adam optimizer; we will include the
+additional parameter `keep_prob` in `feed_dict` to control the dropout rate;
+and we will add logging to every 100th iteration in the training process.
+
+```python
+cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
+train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
+correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
+accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
+sess.run(tf.initialize_all_variables())
+for i in range(20000):
+ batch = mnist.train.next_batch(50)
+ if i%100 == 0:
+ train_accuracy = accuracy.eval(feed_dict={
+ x:batch[0], y_: batch[1], keep_prob: 1.0})
+ print "step %d, training accuracy %g"%(i, train_accuracy)
+ train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
+
+print "test accuracy %g"%accuracy.eval(feed_dict={
+ x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
+```
+
+The final test set accuracy after running this code should be approximately 99.2%.
+
+We have learned how to quickly and easily build, train, and evaluate a
+fairly sophisticated deep learning model using TensorFlow.
diff --git a/tensorflow/g3doc/tutorials/mnist/tf/index.md b/tensorflow/g3doc/tutorials/mnist/tf/index.md
new file mode 100644
index 0000000000..86f3296287
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/mnist/tf/index.md
@@ -0,0 +1,513 @@
+# Handwritten Digit Classification
+
+Code: [tensorflow/g3doc/tutorials/mnist/](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/g3doc/tutorials/mnist/)
+
+The goal of this tutorial is to show how to use TensorFlow to train and
+evaluate a simple feed-forward neural network for handwritten digit
+classification using the (classic) MNIST data set. The intended audience for
+this tutorial is experienced machine learning users interested in using
+TensorFlow.
+
+These tutorials are not intended for teaching Machine Learning in general.
+
+Please ensure you have followed the instructions to [`Install TensorFlow`](../../../get_started/os_setup.md).
+
+## Tutorial Files
+
+This tutorial references the following files:
+
+File | Purpose
+--- | ---
+[`mnist.py`](../mnist.py) | The code to build a fully-connected MNIST model.
+[`fully_connected_feed.py`](../fully_connected_feed.py) | The main code, to train the built MNIST model against the downloaded dataset using a feed dictionary.
+
+Simply run the `fully_connected_feed.py` file directly to start training:
+
+`python fully_connected_feed.py`
+
+## Prepare the Data
+
+MNIST is a classic problem in machine learning. The problem is to look at
+greyscale 28x28 pixel images of handwritten digits and determine which digit
+the image represents, for all the digits from zero to nine.
+
+![MNIST Digits](./mnist_digits.png "MNIST Digits")
+
+For more information, refer to [Yann LeCun's MNIST page](http://yann.lecun.com/exdb/mnist/)
+or [Chris Olah's visualizations of MNIST](http://colah.github.io/posts/2014-10-Visualizing-MNIST/).
+
+### Download
+
+At the top of the `run_training()` method, the `input_data.read_data_sets()`
+function will ensure that the correct data has been downloaded to your local
+training folder and then unpack that data to return a dictionary of `DataSet`
+instances.
+
+```python
+data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
+```
+
+**NOTE**: The `fake_data` flag is used for unit-testing purposes and may be
+safely ignored by the reader.
+
+Dataset | Purpose
+--- | ---
+`data_sets.train` | 55000 images and labels, for primary training.
+`data_sets.validation` | 5000 images and labels, for iterative validation of training accuracy.
+`data_sets.test` | 10000 images and labels, for final testing of trained accuracy.
+
+For more information about the data, please read the [`Download`](../download/index.md)
+tutorial.
+
+### Inputs and Placeholders
+
+The `placeholder_inputs()` function creates two [`tf.placeholder`](../../../api_docs/python/io_ops.md#placeholder)
+ops that define the shape of the inputs, including the `batch_size`, to the
+rest of the graph and into which the actual training examples will be fed.
+
+```python
+images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
+ IMAGE_PIXELS))
+labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
+```
+
+Further down, in the training loop, the full image and label datasets are
+sliced to fit the `batch_size` for each step, matched with these placeholder
+ops, and then passed into the `sess.run()` function using the `feed_dict`
+parameter.
+
+## Build the Graph
+
+After creating placeholders for the data, the graph is built from the
+`mnist.py` file according to a 3-stage pattern: `inference()`, `loss()`, and
+`training()`.
+
+1. `inference()` - Builds the graph as far as is required for running
+the network forward to make predictions.
+1. `loss()` - Adds to the inference graph the ops required to generate
+loss.
+1. `training()` - Adds to the loss graph the ops required to compute
+and apply gradients.
+
+<div style="width:95%; margin:auto; margin-bottom:10px; margin-top:20px;">
+ <img style="width:100%" src="./mnist_subgraph.png">
+</div>
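+
+Putting the three stages together, the graph-building portion of
+`run_training()` looks roughly like this (a sketch, with illustrative flag
+names; see `fully_connected_feed.py` for the exact code):
+
+```python
+logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
+loss = mnist.loss(logits, labels_placeholder)
+train_op = mnist.training(loss, FLAGS.learning_rate)
+```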
+
+### Inference
+
+The `inference()` function builds the graph as far as needed to
+return the tensor that would contain the output predictions.
+
+It takes the images placeholder as input and builds on top
+of it a pair of fully connected layers with ReLU activation followed by a ten
+node linear layer specifying the output logits.
+
+Each layer is created beneath a unique [`tf.name_scope`](../../../api_docs/python/framework.md#name_scope)
+that acts as a prefix to the items created within that scope.
+
+```python
+with tf.name_scope('hidden1') as scope:
+```
+
+Within the defined scope, the weights and biases to be used by each of these
+layers are generated into [`tf.Variable`](../../../api_docs/python/state_ops.md#Variable)
+instances, with their desired shapes:
+
+```python
+weights = tf.Variable(
+ tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
+ stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
+ name='weights')
+biases = tf.Variable(tf.zeros([hidden1_units]),
+ name='biases')
+```
+
+When, for instance, these are created under the `hidden1` scope, the unique
+name given to the weights variable would be "`hidden1/weights`".
+
+Each variable is given an initializer op as part of its construction.
+
+In this most common case, the weights are initialized with the
+[`tf.truncated_normal`](../../../api_docs/python/constant_op.md#truncated_normal)
+and given their shape of a 2d tensor with
+the first dim representing the number of units in the layer from which the
+weights connect and the second dim representing the number of
+units in the layer to which the weights connect. For the first layer, named
+`hidden1`, the dimensions are `[IMAGE_PIXELS, hidden1_units]` because the
+weights are connecting the image inputs to the hidden1 layer. The
+`tf.truncated_normal` initializer generates values drawn from a truncated
+normal distribution with a given mean and standard deviation.
+
+Then the biases are initialized with [`tf.zeros`](../../../api_docs/python/constant_op.md#zeros)
+to ensure they start with all zero values, and their shape is simply the number
+of units in the layer to which they connect.
+
+The graph's three primary ops -- two [`tf.nn.relu`](../../../api_docs/python/nn.md#relu)
+ops wrapping [`tf.matmul`](../../../api_docs/python/math_ops.md#matmul)
+for the hidden layers and one extra `tf.matmul` for the logits -- are then
+created, each in turn, with their `tf.Variable` instances connected to the
+input placeholder or the output tensor of the layer beneath each.
+
+```python
+hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
+```
+
+```python
+hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
+```
+
+```python
+logits = tf.matmul(hidden2, weights) + biases
+```
+
+Finally, the `logits` tensor that will contain the output is returned.
+
+### Loss
+
+The `loss()` function further builds the graph by adding the required loss
+ops.
+
+First, the values from the labels_placeholder are encoded as a tensor of 1-hot
+values. For example, if the class identifier is '3' the value is converted to:
+<br>`[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`
+
+```python
+batch_size = tf.size(labels)
+labels = tf.expand_dims(labels, 1)
+indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
+concated = tf.concat(1, [indices, labels])
+onehot_labels = tf.sparse_to_dense(
+ concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)
+```
+
+A [`tf.nn.softmax_cross_entropy_with_logits`](../../../api_docs/python/nn.md#softmax_cross_entropy_with_logits)
+op is then added to compare the output logits from the `inference()` function
+and the 1-hot labels.
+
+```python
+cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
+ onehot_labels,
+ name='xentropy')
+```
+
+It then uses [`tf.reduce_mean`](../../../api_docs/python/math_ops.md#reduce_mean)
+to average the cross entropy values across the batch dimension (the first
+dimension) as the total loss.
+
+```python
+loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')
+```
+
+And the tensor that will then contain the loss value is returned.
+
+> Note: Cross-entropy is an idea from information theory that allows us
+> to describe how bad it is to believe the predictions of the neural network,
+> given what is actually true. For more information, read the blog post
+> [Visual Information Theory](http://colah.github.io/posts/2015-09-Visual-Information/).
+
+### Training
+
+The `training()` function adds the operations needed to minimize the loss via
+gradient descent.
+
+Firstly, it takes the loss tensor from the `loss()` function and hands it to a
+[`tf.scalar_summary`](../../../api_docs/python/train.md#scalar_summary),
+an op for generating summary values into the events file when used with a
+`SummaryWriter` (see below). In this case, it will emit the snapshot value of
+the loss every time the summaries are written out.
+
+```python
+tf.scalar_summary(loss.op.name, loss)
+```
+
+Next, we instantiate a [`tf.train.GradientDescentOptimizer`](../../../api_docs/python/train.md#GradientDescentOptimizer)
+responsible for applying gradients with the requested learning rate.
+
+```python
+optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)
+```
+
+We then generate a single variable to contain a counter for the global
+training step, and the [`minimize()`](../../../api_docs/python/train.md#Optimizer.minimize)
+op is used to both update the trainable weights in the system and increment the
+global step. This is, by convention, known as the `train_op` and is what must
+be run by a TensorFlow session in order to induce one full step of training
+(see below).
+
+```python
+global_step = tf.Variable(0, name='global_step', trainable=False)
+train_op = optimizer.minimize(loss, global_step=global_step)
+```
+
+The tensor containing the outputs of the training op is returned.
+
+## Train the Model
+
+Once the graph is built, it can be iteratively trained and evaluated in a loop
+controlled by the user code in `fully_connected_feed.py`.
+
+### The Graph
+
+At the top of the `run_training()` function is a python `with` command that
+indicates all of the built ops are to be associated with the default
+global [`tf.Graph`](../../../api_docs/python/framework.md#Graph)
+instance.
+
+```python
+with tf.Graph().as_default():
+```
+
+A `tf.Graph` is a collection of ops that may be executed together as a group.
+Most TensorFlow uses will only need to rely on the single default graph.
+
+More complicated uses with multiple graphs are possible, but beyond the scope of
+this simple tutorial.
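+
+For reference, an additional graph can be created and used explicitly (a
+minimal sketch only; this tutorial does not need it):
+
+```python
+g = tf.Graph()
+with g.as_default():
+  # Ops created in this block are added to `g` rather than the default graph.
+  c = tf.constant(3.0)
+```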
+
+### The Session
+
+Once all of the build preparation has been completed and all of the necessary
+ops generated, a [`tf.Session`](../../../api_docs/python/client.md#Session)
+is created for running the graph.
+
+```python
+sess = tf.Session()
+```
+
+Alternately, a `Session` may be generated into a `with` block for scoping:
+
+```python
+with tf.Session() as sess:
+```
+
+The empty parameter to session indicates that this code will attach to
+(or create if not yet created) the default local session.
+
+Immediately after creating the session, all of the `tf.Variable`
+instances are initialized by calling `sess.run()` on their initialization op.
+
+```python
+init = tf.initialize_all_variables()
+sess.run(init)
+```
+
+The [`sess.run()`](../../../api_docs/python/client.md#Session.run)
+method will run the complete subset of the graph that
+corresponds to the op(s) passed as parameters. In this first call, the `init`
+op is a [`tf.group`](../../../api_docs/python/control_flow_ops.md#group)
+that contains only the initializers for the variables. None of the rest of the
+graph is run here; that happens in the training loop below.
+
+### Train Loop
+
+After initializing the variables with the session, training may begin.
+
+The user code controls the training per step, and the simplest loop that
+can do useful training is:
+
+```python
+for step in xrange(max_steps):
+ sess.run([train_op])
+```
+
+However, this tutorial is slightly more complicated in that it must also slice
+up the input data for each step to match the previously generated placeholders.
+
+#### Feed the Graph
+
+For each step, the code will generate a feed dictionary that will contain the
+set of examples on which to train for the step, keyed by the placeholder
+ops they represent.
+
+In the `fill_feed_dict()` function, the given `DataSet` is queried for its next
+`batch_size` set of images and labels, and tensors matching the placeholders are
+filled containing the next images and labels.
+
+```python
+images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)
+```
+
+A python dictionary object is then generated with the placeholders as keys and
+the representative feed tensors as values.
+
+```python
+feed_dict = {
+ images_placeholder: images_feed,
+ labels_placeholder: labels_feed,
+}
+```
+
+This is passed into the `sess.run()` function's `feed_dict` parameter to provide
+the input examples for this step of training.
+
+#### Check the Status
+
+The code specifies two op-tensors in its run call: `[train_op, loss]`:
+
+```python
+for step in xrange(FLAGS.max_steps):
+ feed_dict = fill_feed_dict(data_sets.train,
+ images_placeholder,
+ labels_placeholder)
+ _, loss_value = sess.run([train_op, loss],
+ feed_dict=feed_dict)
+```
+
+Because there are two tensors passed as parameters, the return from
+`sess.run()` is a tuple with two items. The returned items are themselves
+tensors, filled with the values of the passed op-tensors during this step of
+training.
+
+The value of the `train_op` is actually `None` and, thus, discarded. But the
+value of the `loss` tensor may become NaN if the model diverges during training.
+
+Assuming that the training runs fine without NaNs, the training loop also
+prints a simple status text every 100 steps to let the user know the state of
+training.
+
+```python
+if step % 100 == 0:
+ print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)
+```
+
+#### Visualize the Status
+
+In order to emit the events files used by [TensorBoard](../../../how_tos/summaries_and_tensorboard/index.md),
+all of the summaries (in this case, only one) are collected into a single op
+during the graph building phase.
+
+```python
+summary_op = tf.merge_all_summaries()
+```
+
+And then after the Session is generated, a [`tf.train.SummaryWriter`](../../../api_docs/python/train.md#SummaryWriter)
+may be instantiated to output into the given directory the events files,
+containing the Graph itself and the values of the summaries.
+
+```python
+summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+```
+
+Lastly, the events file will be updated with new summary values every time the
+`summary_op` is run and the output passed to the writer's `add_summary()`
+function.
+
+```python
+summary_str = sess.run(summary_op, feed_dict=feed_dict)
+summary_writer.add_summary(summary_str, step)
+```
+
+When the events files are written, TensorBoard may be run against the training
+folder to display the values from the summaries.
+
+![MNIST TensorBoard](./mnist_tensorboard.png "MNIST TensorBoard")
+
+**NOTE**: For more info about how to build and run TensorBoard, please see the accompanying tutorial [TensorBoard: Visualizing Your Training](../../../how_tos/summaries_and_tensorboard/index.md).
+
+#### Save a Checkpoint
+
+In order to emit a checkpoint file that may be used to later restore a model
+for further training or evaluation, we instantiate a
+[`tf.train.Saver`](../../../api_docs/python/state_ops.md#Saver).
+
+```python
+saver = tf.train.Saver()
+```
+
+In the training loop, the [`saver.save()`](../../../api_docs/python/state_ops.md#Saver.save)
+method will periodically be called to write a checkpoint file to the training
+directory with the current values of all the trainable variables.
+
+```python
+saver.save(sess, FLAGS.train_dir, global_step=step)
+```
+
+At some later point in the future, training might be resumed by using the
+[`saver.restore()`](../../../api_docs/python/state_ops.md#Saver.restore)
+method to reload the model parameters.
+
+```python
+saver.restore(sess, FLAGS.train_dir)
+```
+
+## Evaluate the Model
+
+Every thousand steps, the code will attempt to evaluate the model against the
+training, validation, and test datasets. The `do_eval()` function is called
+three times, once for each dataset.
+
+```python
+print 'Training Data Eval:'
+do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.train)
+print 'Validation Data Eval:'
+do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.validation)
+print 'Test Data Eval:'
+do_eval(sess,
+ eval_correct,
+ images_placeholder,
+ labels_placeholder,
+ data_sets.test)
+```
+
+> Note that more complicated usage would usually sequester the `data_sets.test`
+> to only be checked after significant amounts of hyperparameter tuning. For
+> the sake of a simple little MNIST problem, however, we evaluate against all of
+> the data.
+
+### Build the Eval Graph
+
+Before opening the default Graph, the test data should have been fetched by
+calling the `get_data(train=False)` function with the parameter set to grab
+the test dataset.
+
+```python
+test_all_images, test_all_labels = get_data(train=False)
+```
+
+Before entering the training loop, the Eval op should have been built
+by calling the `evaluation()` function from `mnist.py` with the same
+logits/labels parameters as the `loss()` function.
+
+```python
+eval_correct = mnist.evaluation(logits, labels_placeholder)
+```
+
+The `evaluation()` function simply generates a [`tf.nn.in_top_k`](../../../api_docs/python/nn.md#in_top_k)
+op that can automatically score each model output as correct if the true label
+can be found in the K most-likely predictions. In this case, we set the value
+of K to 1 to only consider a prediction correct if it is for the true label.
+
+```python
+eval_correct = tf.nn.in_top_k(logits, labels, 1)
+```
+
+### Eval Output
+
+One can then create a loop for filling a `feed_dict` and calling `sess.run()`
+against the `eval_correct` op to evaluate the model on the given dataset.
+
+```python
+for step in xrange(steps_per_epoch):
+ feed_dict = fill_feed_dict(data_set,
+ images_placeholder,
+ labels_placeholder)
+ true_count += sess.run(eval_correct, feed_dict=feed_dict)
+```
+
+The `true_count` variable simply accumulates all of the predictions that the
+`in_top_k` op has determined to be correct. From there, the precision may be
+calculated by simply dividing it by the total number of examples.
+
+```python
+precision = float(true_count) / float(num_examples)
+print ' Num examples: %d Num correct: %d Precision @ 1: %0.02f' % (
+ num_examples, true_count, precision)
+```
diff --git a/tensorflow/g3doc/tutorials/pdes/index.md b/tensorflow/g3doc/tutorials/pdes/index.md
new file mode 100755
index 0000000000..1f29e4037c
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/pdes/index.md
@@ -0,0 +1,129 @@
+
+## Basic Setup
+
+
+```
+#Import libraries for simulation
+import tensorflow as tf
+import numpy as np
+
+#Imports for visualization
+import PIL.Image
+from cStringIO import StringIO
+from IPython.display import clear_output, Image, display
+```
+
+
+```
+def DisplayArray(a, fmt='jpeg', rng=[0,1]):
+ """Display an array as a picture."""
+ a = (a - rng[0])/float(rng[1] - rng[0])*255
+ a = np.uint8(np.clip(a, 0, 255))
+ f = StringIO()
+ PIL.Image.fromarray(a).save(f, fmt)
+ display(Image(data=f.getvalue()))
+```
+
+
+```
+sess = tf.InteractiveSession()
+```
+
+## Computational Convenience Functions
+
+
+```
+def make_kernel(a):
+ """Transform a 2D array into a convolution kernel"""
+ a = np.asarray(a)
+ a = a.reshape(list(a.shape) + [1,1])
+ return tf.constant(a, dtype=1)
+
+def simple_conv(x, k):
+ """A simplified 2D convolution operation"""
+ x = tf.expand_dims(tf.expand_dims(x, 0), -1)
+ y = tf.nn.depthwise_conv2d(x, k, [1, 1, 1, 1], padding='SAME')
+ return y[0, :, :, 0]
+
+def laplace(x):
+ """Compute the 2D laplacian of an array"""
+ laplace_k = make_kernel([[0.5, 1.0, 0.5],
+ [1.0, -6., 1.0],
+ [0.5, 1.0, 0.5]])
+ return simple_conv(x, laplace_k)
+```
+
+## Define the PDE
+
+
+```
+N = 500
+```
+
+
+```
+# Initial Conditions -- some rain drops hit a pond
+
+# Set everything to zero
+u_init = np.zeros([N, N], dtype="float32")
+ut_init = np.zeros([N, N], dtype="float32")
+
+# Some rain drops hit a pond at random points
+for n in range(40):
+ a,b = np.random.randint(0, N, 2)
+ u_init[a,b] = np.random.uniform()
+
+DisplayArray(u_init, rng=[-0.1, 0.1])
+```
+
+
+![jpeg](output_8_0.jpe)
+
+
+
+```
+# parameters
+# eps -- time resolution
+# damping -- wave damping
+eps = tf.placeholder('float', shape=())
+damping = tf.placeholder('float', shape=())
+
+# create variables for simulation state
+U = tf.Variable(u_init)
+Ut = tf.Variable(ut_init)
+
+# discretized PDE update rules
+U_ = U + eps*Ut
+Ut_ = Ut + eps*(laplace(U) - damping*Ut)
+
+# operation to update the state
+step = tf.group(
+ U.assign(U_),
+ Ut.assign(Ut_))
+```
+
+## Run The Simulation
+
+
+```
+# initialize state to initial conditions
+tf.initialize_all_variables().run()
+
+# Run 1000 steps of PDE
+for i in range(1000):
+ # Step simulation
+ step.run({eps: 0.03, damping: 0.04})
+ # Visualize every 50 steps
+ if i % 50 == 0:
+ clear_output()
+ DisplayArray(U.eval(), rng=[-0.1, 0.1])
+```
+
+
+![jpeg](output_11_0.jpe)
+
+
diff --git a/tensorflow/g3doc/tutorials/pdes/output_11_0.jpe b/tensorflow/g3doc/tutorials/pdes/output_11_0.jpe
new file mode 100755
index 0000000000..8cd8cf02b5
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/pdes/output_11_0.jpe
Binary files differ
diff --git a/tensorflow/g3doc/tutorials/pdes/output_8_0.jpe b/tensorflow/g3doc/tutorials/pdes/output_8_0.jpe
new file mode 100755
index 0000000000..97954effc0
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/pdes/output_8_0.jpe
Binary files differ
diff --git a/tensorflow/g3doc/tutorials/recurrent/index.md b/tensorflow/g3doc/tutorials/recurrent/index.md
new file mode 100644
index 0000000000..29d058cd5d
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/recurrent/index.md
@@ -0,0 +1,209 @@
+# Recurrent Neural Networks
+
+## Introduction
+
+Take a look at [this great article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
+for an introduction to recurrent neural networks and LSTMs in particular.
+
+## Language Modeling
+
+In this tutorial we will show how to train a recurrent neural network on
+the challenging task of language modeling. The goal of the problem is to fit a
+probabilistic model which assigns probabilities to sentences. It does so by
+predicting next words in a text given a history of previous words. For this
+purpose we will use the Penn Tree Bank (PTB) dataset, which is a popular
+benchmark for measuring quality of these models, whilst being small and
+relatively fast to train.
+
+Language modeling is key to many interesting problems such as speech
+recognition, machine translation, or image captioning. It is also fun --
+take a look [here](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
+
+For the purpose of this tutorial, we will reproduce the results from
+[Zaremba et al., 2014](http://arxiv.org/abs/1409.2329), which achieves very
+good results on the PTB dataset.
+
+## Tutorial Files
+
+This tutorial references the following files from `models/rnn/ptb`:
+
+File | Purpose
+--- | ---
+`ptb_word_lm.py` | The code to train a language model on the PTB dataset.
+`reader.py` | The code to read the dataset.
+
+## Download and Prepare the Data
+
+The data required for this tutorial is in the data/ directory of the
+PTB dataset from Tomas Mikolov's webpage:
+http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
+
+The dataset is already preprocessed and contains overall 10000 different words,
+including the end-of-sentence marker and a special symbol (\<unk\>) for rare
+words. We convert all of them in the `reader.py` to unique integer identifiers
+to make it easy for the neural network to process.
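+
+Conceptually, the conversion amounts to something like the following simplified
+sketch (not the actual `reader.py` code; `words` here is a hypothetical list of
+tokens):
+
+```python
+import collections
+
+def build_word_ids(words):
+  # Assign the smallest ids to the most frequent words.
+  counter = collections.Counter(words)
+  word_to_id = {w: i for i, (w, _) in enumerate(counter.most_common())}
+  return [word_to_id[w] for w in words], word_to_id
+```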
+
+## The Model
+
+### LSTM
+
+The core of the model consists of an LSTM cell that processes one word at a
+time and computes probabilities of the possible continuations of the sentence.
+The memory state of the network is initialized with a vector of zeros and gets
+updated after reading each word. Also, for computational reasons, we will
+process data in mini-batches of size `batch_size`.
+
+The basic pseudocode looks as follows:
+
+```python
+lstm = rnn_cell.BasicLSTMCell(lstm_size)
+# Initial state of the LSTM memory.
+state = tf.zeros([batch_size, lstm.state_size])
+
+loss = 0.0
+for current_batch_of_words in words_in_dataset:
+ # The value of state is updated after processing each batch of words.
+ output, state = lstm(current_batch_of_words, state)
+
+ # The LSTM output can be used to make next word predictions
+ logits = tf.matmul(output, softmax_w) + softmax_b
+ probabilities = tf.nn.softmax(logits)
+ loss += loss_function(probabilities, target_words)
+```
+
+### Truncated Backpropagation
+
+In order to make the learning process tractable, it is a common practice to
+truncate the gradients for backpropagation to a fixed number (`num_steps`)
+of unrolled steps.
+This is easy to implement by feeding inputs of length `num_steps` at a time and
+doing backward pass after each iteration.
+
+A simplified version of the graph creation code for truncated
+backpropagation:
+
+```python
+# Placeholder for the inputs in a given iteration.
+words = tf.placeholder(tf.int32, [batch_size, num_steps])
+
+lstm = rnn_cell.BasicLSTMCell(lstm_size)
+# Initial state of the LSTM memory.
+initial_state = state = tf.zeros([batch_size, lstm.state_size])
+
+for i in range(num_steps):
+ # The value of state is updated after processing each batch of words.
+ output, state = lstm(words[:, i], state)
+
+ # The rest of the code.
+ # ...
+
+final_state = state
+```
+
+And this is how to implement an iteration over the whole dataset:
+
+```python
+# A numpy array holding the state of LSTM after each batch of words.
+numpy_state = initial_state.eval()
+total_loss = 0.0
+for current_batch_of_words in words_in_dataset:
+ numpy_state, current_loss = session.run([final_state, loss],
+ # Initialize the LSTM state from the previous iteration.
+ feed_dict={initial_state: numpy_state, words: current_batch_of_words})
+ total_loss += current_loss
+```
+
+### Inputs
+
+The word IDs will be embedded into a dense representation (see the
+[Vectors Representations Tutorial](../word2vec/index.md)) before feeding to
+the LSTM. This allows the model to efficiently represent the knowledge about
+particular words. It is also easy to write:
+
+```python
+# embedding_matrix is a tensor of shape [vocabulary_size, embedding_size]
+word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)
+```
+
+The embedding matrix will be initialized randomly and the model will learn to
+differentiate the meaning of words just by looking at the data.
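+
+The embedding matrix itself is just another trainable variable, created roughly
+like this (an illustrative sketch; the names are not taken from the tutorial
+code):
+
+```python
+embedding_matrix = tf.Variable(
+    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0),
+    name="embedding")
+```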
+
+### Loss Function
+
+We want to minimize the average negative log probability of the target words:
+
+$$ \text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i} $$
+
+It is not very difficult to implement but the function
+`sequence_loss_by_example` is already available, so we can just use it here.
+
+The typical measure reported in the papers is average per-word perplexity (often
+just called perplexity), which is equal to
+
+$$e^{-\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i}} = e^{\text{loss}} $$
+
+and we will monitor its value throughout the training process.
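+
+To make the relationship concrete, here is a minimal NumPy sketch that computes
+the average loss and the corresponding perplexity directly from hypothetical
+per-word probabilities of the targets (the tutorial code uses
+`sequence_loss_by_example` instead):
+
+```python
+import numpy as np
+
+target_probs = np.array([0.2, 0.5, 0.1])  # hypothetical p_target values
+loss = -np.mean(np.log(target_probs))     # average negative log probability
+perplexity = np.exp(loss)                 # e**loss
+```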
+
+### Stacking multiple LSTMs
+
+To give the model more expressive power, we can add multiple layers of LSTMs
+to process the data. The output of the first layer will become the input of
+the second and so on.
+
+We have a class called `MultiRNNCell` that makes the implementation seamless:
+
+```python
+lstm = rnn_cell.BasicLSTMCell(lstm_size)
+stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers)
+
+initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
+for i in range(num_steps):
+ # The value of state is updated after processing each batch of words.
+ output, state = stacked_lstm(words[:, i], state)
+
+ # The rest of the code.
+ # ...
+
+final_state = state
+```
+
+## Compile and Run the Code
+
+First, the library needs to be built. To compile it on CPU:
+
+```
+bazel build -c opt tensorflow/models/rnn/ptb:ptb_word_lm
+```
+
+And if you have a fast GPU, run the following:
+
+```
+bazel build -c opt tensorflow --config=cuda \
+ tensorflow/models/rnn/ptb:ptb_word_lm
+```
+
+Now we can run the model:
+
+```
+bazel-bin/.../ptb_word_lm \
+ --data_path=/tmp/simple-examples/data/ --alsologtostderr --model small
+```
+
+There are 3 supported model configurations in the tutorial code: "small",
+"medium" and "large". The difference between them is in size of the LSTMs and
+the set of hyperparameters used for training.
+
+The larger the model, the better results it should get. The `small` model should
+be able to reach perplexity below 120 on the test set and the `large` one below
+80, though it might take several hours to train.
+
+## What Next?
+
+There are several tricks that we haven't mentioned that make the model better,
+including:
+
+* decreasing learning rate schedule,
+* dropout between the LSTM layers.
+
+Study the code and modify it to improve the model even further.
diff --git a/tensorflow/g3doc/tutorials/seq2seq/index.md b/tensorflow/g3doc/tutorials/seq2seq/index.md
new file mode 100644
index 0000000000..e421c814aa
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/seq2seq/index.md
@@ -0,0 +1,331 @@
+# Sequence-to-Sequence Models: Learning to Translate
+
+Recurrent neural networks can learn to model language, as already discussed
+in the [RNN Tutorial](../recurrent/index.md)
+(if you did not read it, please go through it before proceeding with this one).
+This raises an interesting question: could we condition the generated words on
+some input and generate a meaningful response? For example, could we train
+a neural network to translate from English to French? It turns out that
+the answer is *yes*.
+
+This tutorial will show you how to build and train such a system end-to-end.
+You can start by running this binary.
+
+```
+bazel run -c opt <...>/models/rnn/translate/translate.py
+ --data_dir [your_data_directory]
+```
+
+It will download English-to-French translation data from the
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html),
+prepare it for training, and train. It takes about 20GB of disk space,
+and a while to download and prepare (see [later](#run_it) for details),
+so you can start and leave it running while reading this tutorial.
+
+This tutorial references the following files from `models/rnn`.
+
+File | What's in it?
+--- | ---
+`seq2seq.py` | Library for building sequence-to-sequence models.
+`translate/seq2seq_model.py` | Neural translation sequence-to-sequence model.
+`translate/data_utils.py` | Helper functions for preparing translation data.
+`translate/translate.py` | Binary that trains and runs the translation model.
+
+
+## Sequence-to-Sequence Basics
+
+A basic sequence-to-sequence model, as introduced in
+[Cho et al., 2014](http://arxiv.org/pdf/1406.1078v3.pdf),
+consists of two recurrent neural networks (RNNs): an *encoder* that
+processes the input and a *decoder* that generates the output.
+This basic architecture is depicted below.
+
+<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="basic_seq2seq.png" />
+</div>
+
+Each box in the picture above represents a cell of the RNN, most commonly
+a GRU cell or an LSTM cell (see the [RNN Tutorial](../recurrent/index.md)
+for an explanation of those). Encoder and decoder can share weights or,
+as is more common, use a different set of parameters. Multi-layer cells
+have been successfully used in sequence-to-sequence models too, e.g. for
+translation [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215).
+
+In the basic model depicted above, every input has to be encoded into
+a fixed-size state vector, as that is the only thing passed to the decoder.
+To allow the decoder more direct access to the input, an *attention* mechanism
+was introduced in [Bahdanau et al., 2014](http://arxiv.org/abs/1409.0473).
+We will not go into the details of the attention mechanism (see the paper);
+suffice it to say that it allows the decoder to peek into the input at every
+decoding step. A multi-layer sequence-to-sequence network with LSTM cells and
+attention mechanism in the decoder looks like this.
+
+<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="attention_seq2seq.png" />
+</div>
+
+## TensorFlow seq2seq Library
+
+As you can see above, there are many different sequence-to-sequence
+models. Each of these models can use different RNN cells, but all
+of them accept encoder inputs and decoder inputs. This motivates
+the interfaces in the TensorFlow seq2seq library (`models/rnn/seq2seq.py`).
+The basic RNN encoder-decoder sequence-to-sequence model works as follows.
+
+```python
+outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell)
+```
+
+In the above call, `encoder_inputs` are a list of tensors representing inputs
+to the encoder, i.e., corresponding to the letters *A, B, C* in the first
+picture above. Similarly, `decoder_inputs` are tensors representing inputs
+to the decoder, *GO, W, X, Y, Z* on the first picture.
+
+The `cell` argument is an instance of the `models.rnn.rnn_cell.RNNCell` class
+that determines which cell will be used inside the model. You can use
+an existing cell, such as `GRUCell` or `LSTMCell`, or you can write your own.
+Moreover, `rnn_cell` provides wrappers to construct multi-layer cells,
+add dropout to cell inputs or outputs, or to do other transformations,
+see the [RNN Tutorial](../recurrent/index.md) for examples.
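+
+For example, a GRU cell with dropout on its outputs, stacked into several
+layers, can be assembled like this (a sketch that assumes the `DropoutWrapper`
+and `MultiRNNCell` classes from `models/rnn/rnn_cell.py`; `size` and
+`num_layers` are illustrative):
+
+```python
+single_cell = rnn_cell.GRUCell(size)
+single_cell = rnn_cell.DropoutWrapper(single_cell, output_keep_prob=0.5)
+cell = rnn_cell.MultiRNNCell([single_cell] * num_layers)
+```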
+
+The call to `basic_rnn_seq2seq` returns two values: `outputs` and `states`.
+Both of them are lists of tensors of the same length as `decoder_inputs`.
+Naturally, `outputs` correspond to the outputs of the decoder in each time-step,
+in the first picture above that would be *W, X, Y, Z, EOS*. The returned
+`states` represent the internal state of the decoder at every time-step.
+
+In many applications of sequence-to-sequence models, the output of the decoder
+at time t is fed back and becomes the input of the decoder at time t+1. At test
+time, when decoding a sequence, this is how the sequence is constructed.
+During training, on the other hand, it is common to provide the correct input
+to the decoder at every time-step, even if the decoder made a mistake before.
+Functions in `seq2seq.py` support both modes using the `feed_previous` argument.
+For example, let's analyze the following use of an embedding RNN model.
+
+```python
+outputs, states = embedding_rnn_seq2seq(
+ encoder_inputs, decoder_inputs, cell,
+ num_encoder_symbols, num_decoder_symbols,
+ output_projection=None, feed_previous=False)
+```
+
+In the `embedding_rnn_seq2seq` model, all inputs (both `encoder_inputs` and
+`decoder_inputs`) are integer-tensors that represent discrete values.
+They will be embedded into a dense representation (see the
+[Vectors Representations Tutorial](../word2vec/index.md) for more details
+on embeddings), but to construct these embeddings we need to specify
+the maximum number of discrete symbols that will appear: `num_encoder_symbols`
+on the encoder side, and `num_decoder_symbols` on the decoder side.
+
+In the above invocation, we set `feed_previous` to False. This means that the
+decoder will use `decoder_inputs` tensors as provided. If we set `feed_previous`
+to True, the decoder would only use the first element of `decoder_inputs`.
+All other tensors from this list would be ignored, and instead the previous
+output of the encoder would be used. This is used for decoding translations
+in our translation model, but it can also be used during training, to make
+the model more robust to its own mistakes, similar
+to [Bengio et al., 2015](http://arxiv.org/pdf/1506.03099v2.pdf).
+
+One more important argument used above is `output_projection`. If not specified,
+the outputs of the embedding model will be tensors of shape batch-size by
+`num_decoder_symbols` as they represent the logits for each generated symbol.
+When training models with large output vocabularies, i.e., when
+`num_decoder_symbols` is large, it is not practical to store these large
+tensors. Instead, it is better to return smaller output tensors, which will
+later be projected onto a large output tensor using `output_projection`.
+This allows us to use our seq2seq models with a sampled softmax loss, as described
+in [Jean et. al., 2015](http://arxiv.org/pdf/1412.2007v2.pdf).
+
+In addition to `basic_rnn_seq2seq` and `embedding_rnn_seq2seq` there are a few
+more sequence-to-sequence models in `seq2seq.py`; take a look there. They all
+have similar interfaces, so we will not describe them in detail. We will use
+`embedding_attention_seq2seq` for our translation model below.
+
+## Neural Translation Model
+
+While the core of the sequence-to-sequence model is constructed by
+the functions in `models/rnn/seq2seq.py`, there are still a few tricks
+that are worth mentioning that are used in our translation model in
+`models/rnn/translate/seq2seq_model.py`.
+
+### Sampled softmax and output projection
+
+For one, as already mentioned above, we want to use sampled softmax to
+handle large output vocabulary. To decode from it, we need to keep track
+of the output projection. Both the sampled softmax loss and the output
+projections are constructed by the following code in `seq2seq_model.py`.
+
+```python
+ if num_samples > 0 and num_samples < self.target_vocab_size:
+ w = tf.get_variable("proj_w", [size, self.target_vocab_size])
+ w_t = tf.transpose(w)
+ b = tf.get_variable("proj_b", [self.target_vocab_size])
+ output_projection = (w, b)
+
+ def sampled_loss(inputs, labels):
+ labels = tf.reshape(labels, [-1, 1])
+ return tf.nn.sampled_softmax_loss(w_t, b, inputs, labels, num_samples,
+ self.target_vocab_size)
+```
+
+First, note that we only construct a sampled softmax if the number of samples
+(512 by default) is smaller than the target vocabulary size. For vocabularies
+smaller than 512 it might be a better idea to just use a standard softmax loss.
+
+Then, as you can see, we construct an output projection. It is a pair,
+consisting of a weight matrix and a bias vector. If used, the rnn cell
+will return vectors of shape batch-size by `size`, rather than batch-size
+by `target_vocab_size`. To recover logits, we need to multiply by the weight
+matrix and add the biases, as is done in lines 124-126 in `seq2seq_model.py`.
+
+```python
+if output_projection is not None:
+ self.outputs[b] = [tf.matmul(output, output_projection[0]) +
+ output_projection[1] for ...]
+```
+
+### Bucketing and padding
+
+In addition to sampled softmax, our translation model also makes use
+of *bucketing*, which is a method to efficiently handle sentences of
+different lengths. Let us first clarify the problem. When translating
+English to French, we will have English sentences of different lengths L1
+on input, and French sentences of different lengths L2 on output. Since
+the English sentence is passed as `encoder_inputs`, and the French sentence
+comes as `decoder_inputs` (prefixed by a GO symbol), we should in principle
+create a seq2seq model for every pair (L1, L2+1) of lengths of an English
+and French sentence. This would result in an enormous graph consisting of
+many very similar subgraphs. On the other hand, we could just pad every
+sentence with a special PAD symbol. Then we'd need only one seq2seq model,
+for the padded lengths. But on shorter sentences our model would be inefficient,
+encoding and decoding many PAD symbols that are useless.
+
+As a compromise between constructing a graph for every pair of lengths and
+padding to a single length, we use a number of *buckets* and pad each sentence
+to the length of the bucket above it. In `translate.py` we use the following
+default buckets.
+
+```python
+buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
+```
+
+This means that if the input is an English sentence with 3 tokens,
+and the corresponding output is a French sentence with 6 tokens,
+then they will be put in the first bucket and padded to length 5 for
+encoder inputs, and length 10 for decoder inputs. If we have an English
+sentence with 8 tokens and the corresponding French sentence has 18 tokens,
+then they will not fit into the (10, 15) bucket, and so the (20, 25) bucket
+will be used, i.e. the English sentence will be padded to 20, and the French
+one to 25.
+
+Remember that when constructing decoder inputs we prepend the special `GO`
+symbol to the input data. This is done in the `get_batch()` function in
+`seq2seq_model.py`, which also reverses the input English sentence.
+Reversing the inputs was shown to improve results for the neural translation
+model in [Sutskever et al., 2014](http://arxiv.org/abs/1409.3215).
+To put it all together, imagine we have the sentence "I go.", tokenized
+as `["I", "go", "."]` as input and the sentence "Je vais." as output,
+tokenized `["Je", "vais", "."]`. It will be put in the (5, 10) bucket,
+with encoder inputs representing `[PAD PAD "." "go" "I"]` and decoder
+inputs `[GO "Je" "vais" "." EOS PAD PAD PAD PAD PAD]`.
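+
+To make the bucket selection and padding concrete, here is a minimal sketch
+under the default buckets above (illustrative only; the real logic lives in
+`get_batch()`, and the `_PAD`/`_GO`/`_EOS` strings below are stand-ins for the
+actual special tokens and their ids).
+
+```python
+PAD, GO, EOS = "_PAD", "_GO", "_EOS"  # stand-ins for the real special tokens
+buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
+
+def bucket_and_pad(source_tokens, target_tokens):
+  """Pick the smallest bucket that fits the pair, then pad to its sizes."""
+  for encoder_size, decoder_size in buckets:
+    if (len(source_tokens) <= encoder_size and
+        len(target_tokens) + 2 <= decoder_size):  # room for GO and EOS
+      break
+  else:
+    raise ValueError("Sentence pair does not fit into any bucket.")
+  # Encoder inputs: the source sentence reversed, with padding at the front.
+  encoder_inputs = [PAD] * (encoder_size - len(source_tokens)) + source_tokens[::-1]
+  # Decoder inputs: GO, the target sentence, EOS, then padding at the end.
+  decoder_inputs = ([GO] + target_tokens + [EOS] +
+                    [PAD] * (decoder_size - len(target_tokens) - 2))
+  return encoder_inputs, decoder_inputs
+
+enc, dec = bucket_and_pad(["I", "go", "."], ["Je", "vais", "."])
+# enc == ['_PAD', '_PAD', '.', 'go', 'I']
+# dec == ['_GO', 'Je', 'vais', '.', '_EOS', '_PAD', '_PAD', '_PAD', '_PAD', '_PAD']
+```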
+
+
+## Let's Run It {#run_it}
+
+To train the model described above, we need a large English-French corpus.
+We will use the *10^9-French-English corpus* from the
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html)
+for training, and the 2013 news test from the same site as development set.
+Both data-sets will be downloaded to `data_dir` and training will start,
+saving checkpoints in `train_dir`, when this command is run.
+
+```
+bazel run -c opt <...>/models/rnn/translate:translate
+ --data_dir [your_data_directory] --train_dir [checkpoints_directory]
+ --en_vocab_size=40000 --fr_vocab_size=40000
+```
+
+It takes about 18GB of disk space and several hours to prepare the training
+corpus. It is unpacked, vocabulary files are created in `data_dir`, and then
+the corpus is tokenized and converted to integer ids. Note the parameters
+that determine vocabulary sizes. In the example above, all words outside
+the 40K most common ones will be converted to an `UNK` token representing
+unknown words. So if you change vocabulary size, the binary will re-map
+the corpus to token-ids again.
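+
+As a rough sketch of what this conversion looks like (illustrative only; the
+actual logic lives in `data_utils.py`, and the `UNK_ID` value here is an
+assumption), every token outside the vocabulary maps to the same `UNK` id.
+
+```python
+UNK_ID = 3  # assumed id of the unknown-word token
+
+def sentence_to_ids(tokens, vocabulary):
+  """`vocabulary` maps the most frequent words to integer ids."""
+  return [vocabulary.get(token, UNK_ID) for token in tokens]
+
+vocab = {"the": 4, "cat": 5, "sat": 6}  # toy vocabulary
+print(sentence_to_ids(["the", "aardvark", "sat"], vocab))  # [4, 3, 6]
+```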
+
+After the data is prepared, training starts. Default parameters in `translate`
+are set to quite large values. Large models trained over a long time give good
+results, but it might take too long or use too much memory for your GPU.
+You can request to train a smaller model as in the following example.
+
+```
+bazel run -c opt <...>/models/rnn/translate:translate
+ --data_dir [your_data_directory] --train_dir [checkpoints_directory]
+ --size=256 --num_layers=2 --steps_per_checkpoint=50
+```
+
+The above command will train a model with 2 layers (the default is 3),
+each layer with 256 units (default is 1024), and will save a checkpoint
+every 50 steps (the default is 200). You can play with these parameters
+to find out how large a model can be to fit into the memory of your GPU.
+
+During training, every `steps_per_checkpoint` steps the binary will print
+out statistics from recent steps. With the default parameters (3 layers
+of size 1024), first messages look like this.
+
+```
+global step 200 learning rate 0.5000 step-time 1.39 perplexity 1720.62
+ eval: bucket 0 perplexity 184.97
+ eval: bucket 1 perplexity 248.81
+ eval: bucket 2 perplexity 341.64
+ eval: bucket 3 perplexity 469.04
+global step 400 learning rate 0.5000 step-time 1.38 perplexity 379.89
+ eval: bucket 0 perplexity 151.32
+ eval: bucket 1 perplexity 190.36
+ eval: bucket 2 perplexity 227.46
+ eval: bucket 3 perplexity 238.66
+```
+
+You can see that each step takes just under 1.4 seconds, and that the output
+reports the perplexity on the training set as well as the perplexities on the
+development set for each bucket. After about 30K steps, we see perplexities on short
+sentences (bucket 0 and 1) going into single digits.
+Since the training corpus contains ~22M sentences, one epoch (going through
+the training data once) takes about 340K steps with batch-size of 64. At this
+point the model can be used for translating English sentences to French
+using the `--decode` option.
+
+```
+bazel run -c opt <...>/models/rnn/translate:translate --decode
+ --data_dir [your_data_directory] --train_dir [checkpoints_directory]
+
+Reading model parameters from /tmp/translate.ckpt-340000
+> Who is the president of the United States?
+ Qui est le président des États-Unis ?
+```
+
+## What Next?
+
+The example above shows how you can build your own English-to-French
+translator, end-to-end. Run it and see how the model performs for yourself.
+While it has reasonable quality, the default parameters will not give you
+the best translation model. Here are a few things you can improve.
+
+First of all, we use a very primitive tokenizer, the `basic_tokenizer` function
+in `data_utils`. A better tokenizer can be found on the
+[WMT'15 Website](http://www.statmt.org/wmt15/translation-task.html).
+Using that tokenizer, and a larger vocabulary, should improve your translations.
+
+Also, the default parameters of the translation model are not tuned.
+You can try changing the learning rate, decay, or initializing the weights
+of your model in a different way. You can also change the default
+`GradientDescentOptimizer` in `seq2seq_model.py` to a more advanced one, such
+as `AdagradOptimizer`. Try these things and see how they improve your results!
+
+Finally, the model presented above can be used for any sequence-to-sequence
+task, not only for translation. Even if you want to transform a sequence to
+a tree, for example to generate a parsing tree, the same model as above can
+give state-of-the-art results, as demonstrated in
+[Vinyals & Kaiser et al., 2015](http://arxiv.org/abs/1412.7449).
+So you can not only build your own translator, you can also build a parser,
+a chat-bot, or any program that comes to your mind. Experiment!
diff --git a/tensorflow/g3doc/tutorials/word2vec/__init__.py b/tensorflow/g3doc/tutorials/word2vec/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/word2vec/__init__.py
diff --git a/tensorflow/g3doc/tutorials/word2vec/index.md b/tensorflow/g3doc/tutorials/word2vec/index.md
new file mode 100644
index 0000000000..8779f33ad7
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/word2vec/index.md
@@ -0,0 +1,396 @@
+# Learning Vector Representations of Words
+
+In this tutorial we look at the word2vec model by
+[Mikolov et al.](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf).
+This model is used for learning vector representations of words, called *word
+embeddings*.
+
+## Highlights
+
+This tutorial is meant to highlight the interesting, substantive parts of
+building a word2vec model in TensorFlow.
+
+* We start by giving the motivation for why we would want to
+represent words as vectors.
+* We look at the intuition behind the model and how it is trained
+(with a splash of math for good measure).
+* We also show a simple implementation of the model in TensorFlow.
+* Finally, we look at ways to make the naive version scale better.
+
+We walk through the code later during the tutorial, but if you'd prefer to
+dive straight in, feel free to look at the minimalistic implementation in
+[tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py](./word2vec_basic.py).
+This basic example contains the code needed to download some data, train on it
+a bit and visualize the result. Once you get
+comfortable with reading and running the basic version, you can graduate to
+[tensorflow/models/embedding/word2vec.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec.py)
+which is a more serious implementation that showcases some more advanced
+TensorFlow principles about how to efficiently use threads to move data into a
+text model, how to checkpoint during training, etc.
+
+But first, let's look at why we would want to learn word embeddings in the first
+place. Feel free to skip this section if you're an Embedding Pro and you'd just
+like to get your hands dirty with the details.
+
+## Motivation: Why Learn Word Embeddings?
+
+Image and audio processing systems work with rich, high-dimensional datasets
+encoded as vectors of the individual raw pixel-intensities for image data, or
+e.g. power spectral density coefficients for audio data. For tasks like object
+or speech recognition we know that all the information required to successfully
+perform the task is encoded in the data (because humans can perform these tasks
+from the raw data). However, natural language processing systems traditionally
+treat words as discrete atomic symbols, and therefore 'cat' may be represented
+as `Id537` and 'dog' as `Id143`. These encodings are arbitrary, and provide
+no useful information to the system regarding the relationships that may exist
+between the individual symbols. This means that the model can leverage
+very little of what it has learned about 'cats' when it is processing data about
+'dogs' (for example, that they are both animals, four-legged, pets, etc.). Representing
+words as unique, discrete ids furthermore leads to data sparsity, and usually
+means that we may need more data in order to successfully train statistical
+models. Using vector representations can overcome some of these obstacles.
+
+<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/audio-image-text.png" alt>
+</div>
+
+[Vector space models](https://en.wikipedia.org/wiki/Vector_space_model) (VSMs)
+represent (embed) words in a continuous vector space where semantically
+similar words are mapped to nearby points ('are embedded nearby each other').
+VSMs have a long, rich history in NLP, but all methods depend in some way or
+another on the
+[Distributional Hypothesis](https://en.wikipedia.org/wiki/Distributional_semantics#Distributional_Hypothesis),
+which states that words that appear in the same contexts share
+semantic meaning. The different approaches that leverage this principle can be
+divided into two categories: *count-based methods* (e.g.
+[Latent Semantic Analysis](https://en.wikipedia.org/wiki/Latent_semantic_analysis)),
+and *predictive methods* (e.g.
+[neural probabilistic language models](http://www.scholarpedia.org/article/Neural_net_language_models)).
+
+This distinction is elaborated in much more detail by
+[Baroni et al.](http://clic.cimec.unitn.it/marco/publications/acl2014/baroni-etal-countpredict-acl2014.pdf),
+but in a nutshell: Count-based methods compute the statistics of
+how often some word co-occurs with its neighbor words in a large text corpus,
+and then map these count-statistics down to a small, dense vector for each word.
+Predictive models directly try to predict a word from its neighbors in terms of
+learned small, dense *embedding vectors* (considered parameters of the
+model).
+
+Word2vec is a particularly computationally-efficient predictive model for
+learning word embeddings from raw text. It comes in two flavors, the Continuous
+Bag-of-Words model (CBOW) and the Skip-Gram model. Algorithmically, these
+models are similar, except that CBOW predicts target words (e.g. 'mat') from
+source context words ('the cat sits on the'), while the skip-gram does the
+inverse and predicts source context-words from the target words. This inversion
+might seem like an arbitrary choice, but statistically it has the effect that
+CBOW smoothes over a lot of the distributional information (by treating an
+entire context as one observation). For the most part, this turns out to be a
+useful thing for smaller datasets. However, skip-gram treats each context-target
+pair as a new observation, and this tends to do better when we have larger
+datasets. We will focus on the skip-gram model in the rest of this tutorial.
+
+
+## Scaling up with Noise-Contrastive Training
+
+Neural probabilistic language models are traditionally trained using the
+[maximum likelihood](https://en.wikipedia.org/wiki/Maximum_likelihood) (ML)
+principle to maximize the probability of the next word $$w_t$$ (for 'target')
+given the previous words $$h$$ (for 'history') in terms of a
+[*softmax* function](https://en.wikipedia.org/wiki/Softmax_function),
+
+$$
+\begin{align}
+P(w_t | h) &= \text{softmax}(\text{score}(w_t, h)) \\
+ &= \frac{\exp \{ \text{score}(w_t, h) \} }
+ {\sum_\text{Word w' in Vocab} \exp \{ \text{score}(w', h) \} }.
+\end{align}
+$$
+
+where $$\text{score}(w_t, h)$$ computes the compatibility of word $$w_t$$ with
+the context $$h$$ (a dot product is commonly used). We train this model by
+maximizing its log-likelihood on the training set, i.e. by maximizing
+
+$$
+\begin{align}
+ J_\text{ML} &= \log P(w_t | h) \\
+ &= \text{score}(w_t, h) -
+ \log \left( \sum_\text{Word w' in Vocab} \exp \{ \text{score}(w', h) \} \right)
+\end{align}
+$$
+
+This yields a properly normalized probabilistic model for language modeling.
+However this is very expensive, because we need to compute and normalize each
+probability using the score for all other $$V$$ words $$w'$$ in the current
+context $$h$$, *at every training step*.
+
+<div style="width:60%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/softmax-nplm.png" alt>
+</div>
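+
+To see where the cost comes from, here is a rough numpy sketch (not the
+library code; the `score` function is a made-up stand-in) of the per-example
+loss above: normalizing one probability requires scoring every one of the
+$$V$$ words in the vocabulary.
+
+```python
+import numpy as np
+
+def full_softmax_loss(score, history, target, vocabulary):
+  """-log P(target | history): an O(V) sum over the whole vocabulary."""
+  scores = np.array([score(w, history) for w in vocabulary])  # V scores
+  log_normalizer = np.log(np.sum(np.exp(scores)))
+  return -(score(target, history) - log_normalizer)
+
+# Toy stand-in score: dot product of the word and mean-of-history embeddings.
+rng = np.random.RandomState(0)
+emb = {w: rng.randn(8) for w in ["the", "quick", "brown", "fox", "jumped"]}
+score = lambda w, h: float(np.dot(emb[w], np.mean([emb[x] for x in h], axis=0)))
+print(full_softmax_loss(score, ["the", "quick"], "brown", list(emb)))
+```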
+
+On the other hand, for feature learning in word2vec we do not need a full
+probabilistic model. The CBOW and skip-gram models are instead trained using a
+binary classification objective (logistic regression) to discriminate the real
+target words $$w_t$$ from $$k$$ imaginary (noise) words $$\tilde w$$, in the
+same context. We illustrate this below for a CBOW model. For skip-gram the
+direction is simply inverted.
+
+<div style="width:60%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/nce-nplm.png" alt>
+</div>
+
+Mathematically, the objective (for each example) is to maximize
+
+$$J_\text{NEG} = \log Q_\theta(D=1 |w_t, h) +
+ k \mathop{\mathbb{E}}_{\tilde w \sim P_\text{noise}}
+ \left[ \log Q_\theta(D = 0 |\tilde w, h) \right]$$,
+
+where $$Q_\theta(D=1 | w, h)$$ is the binary logistic regression probability
+under the model of seeing the word $$w$$ in the context $$h$$ in the dataset
+$$D$$, calculated in terms of the learned embedding vectors $$\theta$$. In
+practice we approximate the expectation by drawing $$k$$ contrastive words
+from the noise distribution (i.e. we compute a
+[Monte Carlo average](https://en.wikipedia.org/wiki/Monte_Carlo_integration)).
+
+This objective is maximized when the model assigns high probabilities
+to the real words, and low probabilities to noise words. Technically, this is
+called
+[Negative Sampling](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf),
+and there is good mathematical motivation for using this loss function:
+The updates it proposes approximate the updates of the softmax function in the
+limit. But computationally it is especially appealing because computing the
+loss function now scales only with the number of *noise words* that we
+select ($$k$$), and not *all words* in the vocabulary ($$V$$). This makes it
+much faster to train. We will actually make use of the very similar
+[noise-contrastive estimation (NCE)](http://papers.nips.cc/paper/5165-learning-word-embeddings-efficiently-with-noise-contrastive-estimation.pdf)
+loss, for which TensorFlow has a handy helper function `tf.nn.nce_loss()`.
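+
+As a rough sketch of the negative-sampling objective above (illustrative only;
+the tutorial code simply calls `tf.nn.nce_loss()`), note that evaluating it for
+one example only requires scoring the real word and the $$k$$ sampled noise
+words, never the whole vocabulary.
+
+```python
+import numpy as np
+
+def sigmoid(x):
+  return 1.0 / (1.0 + np.exp(-x))
+
+def negative_sampling_objective(score, history, target, noise_words):
+  """J_NEG for one example: label the real target word as 'real' (D=1)
+  and each of the k sampled noise words as 'fake' (D=0)."""
+  objective = np.log(sigmoid(score(target, history)))     # log Q(D=1 | w_t, h)
+  for noise in noise_words:                                # k noise words
+    objective += np.log(sigmoid(-score(noise, history)))   # log Q(D=0 | w~, h)
+  return objective  # to be maximized
+```
+
+For the `the`/`quick`/`sheep` example in the next section, this is exactly
+$$J^{(t)}_\text{NEG}$$ with $$k=1$$.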
+
+Let's get an intuitive feel for how this would work in practice!
+
+## The Skip-gram Model
+
+As an example, let's consider the dataset
+
+`the quick brown fox jumped over the lazy dog`
+
+We first form a dataset of words and the contexts in which they appear. We
+could define 'context' in any way that makes sense, and in fact people have
+looked at syntactic contexts (i.e. the syntactic dependents of the current
+target word, see e.g.
+[Levy et al.](https://levyomer.files.wordpress.com/2014/04/dependency-based-word-embeddings-acl-2014.pdf)),
+words-to-the-left of the target, words-to-the-right of the target, etc. For now,
+let's stick to the vanilla definition and define 'context' as the window
+of words to the left and to the right of a target word. Using a window
+size of 1, we then have the dataset
+
+`([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...`
+
+of `(context, target)` pairs. Recall that skip-gram inverts contexts and
+targets, and tries to predict each context word from its target word, so the
+task becomes to predict 'the' and 'brown' from 'quick', 'quick' and 'fox' from
+'brown', etc. Therefore our dataset becomes
+
+`(quick, the), (quick, brown), (brown, quick), (brown, fox), ...`
+
+of `(input, output)` pairs. The objective function is defined over the entire
+dataset, but we typically optimize this with
+[stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
+(SGD) using one example at a time (or a 'minibatch' of `batch_size` examples,
+where typically `16 <= batch_size <= 512`). So let's look at one step of
+this process.
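+
+Here is a minimal sketch of this pair generation on raw tokens (illustrative
+only; the tutorial's `generate_batch()` function works on integer ids and
+samples a fixed number of contexts per target word).
+
+```python
+def skipgram_pairs(words, window=1):
+  """Return (input, output) pairs: each word predicts its neighbours."""
+  pairs = []
+  for i, target in enumerate(words):
+    for j in range(max(0, i - window), min(len(words), i + window + 1)):
+      if j != i:
+        pairs.append((target, words[j]))
+  return pairs
+
+sentence = "the quick brown fox jumped over the lazy dog".split()
+print(skipgram_pairs(sentence)[:4])
+# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
+```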
+
+Let's imagine at training step $$t$$ we observe the first training case above,
+where the goal is to predict `the` from `quick`. We select `num_noise` number
+of noisy (contrastive) examples by drawing from some noise distribution,
+typically the unigram distribution, $$P(w)$$. For simplicity let's say
+`num_noise=1` and we select `sheep` as a noisy example. Next we compute the
+loss for this pair of observed and noisy examples, i.e. the objective at time
+step $$t$$ becomes
+
+$$J^{(t)}_\text{NEG} = \log Q_\theta(D=1 | \text{the, quick}) +
+ \log(Q_\theta(D=0 | \text{sheep, quick}))$$.
+
+The goal is to make an update to the embedding parameters $$\theta$$ to improve
+(in this case, maximize) this objective function. We do this by deriving the
+gradient of the loss with respect to the embedding parameters $$\theta$$, i.e.
+$$\frac{\partial}{\partial \theta} J_\text{NEG}$$ (luckily TensorFlow provides
+easy helper functions for doing this!). We then perform an update to the
+embeddings by taking a small step in the direction of the gradient. When this
+process is repeated over the entire training set, this has the effect of
+'moving' the embedding vectors around for each word until the model is
+successful at discriminating real words from noise words.
+
+We can visualize the learned vectors by projecting them down to 2 dimensions
+using for instance something like the
+[t-SNE dimensionality reduction technique](http://lvdmaaten.github.io/tsne/).
+When we inspect these visualizations it becomes apparent that the vectors
+capture some general, and in fact quite useful, semantic information about
+words and their relationships to one another. It was very interesting when we
+first discovered that certain directions in the induced vector space specialize
+towards certain semantic relationships, e.g. *male-female*, *gender* and
+even *country-capital* relationships between words, as illustrated in the figure
+below (see also for example
+[Mikolov et al., 2013](http://www.aclweb.org/anthology/N13-1090)).
+
+<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/linear-relationships.png" alt>
+</div>
+
+This explains why these vectors are also useful as features for many canonical
+NLP prediction tasks, such as part-of-speech tagging or named entity recognition
+(see for example the original work by
+[Collobert et al.](http://arxiv.org/pdf/1103.0398v1.pdf), or follow-up work by
+[Turian et al.](http://www.aclweb.org/anthology/P10-1040)).
+
+But for now, let's just use them to draw pretty pictures!
+
+## Building the Graph
+
+This is all about embeddings, so let's define our embedding matrix.
+This is just a big random matrix to start. We'll initialize the values to be
+uniform in the unit cube.
+
+```python
+embeddings = tf.Variable(
+ tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
+```
+
+The noise-contrastive estimation loss is defined in terms of a logistic regression
+model. For this, we need to define the weights and biases for each word in the
+vocabulary (also called the `output weights` as opposed to the `input
+embeddings`). So let's define that.
+
+```python
+nce_weights = tf.Variable(
+ tf.truncated_normal([vocabulary_size, embedding_size],
+ stddev=1.0 / math.sqrt(embedding_size)))
+nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
+```
+
+Now that we have the parameters in place, we can define our skip-gram model
+graph. For simplicity, let's suppose we've already integerized our text corpus
+with a vocabulary so that each word is represented as an integer (see
+[tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py](./word2vec_basic.py) for
+the details). The skip-gram model takes two inputs. One is a batch full of
+integers representing the source context words, the other is for the target
+words. Let's create placeholder nodes for these inputs, so that we can feed in
+data later.
+
+```python
+# Placeholders for inputs
+train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
+train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
+```
+
+Now what we need to do is look up the vector for each of the source words in
+the batch. TensorFlow has handy helpers that make this easy.
+
+```python
+embed = tf.nn.embedding_lookup(embeddings, train_inputs)
+```
+
+Ok, now that we have the embeddings for each word, we'd like to try to predict
+the target word using the noise-contrastive training objective.
+
+```python
+# Compute the NCE loss, using a sample of the negative labels each time.
+loss = tf.reduce_mean(
+ tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
+ num_sampled, vocabulary_size))
+```
+
+Now that we have a loss node, we need to add the nodes required to compute
+gradients and update the parameters, etc. For this we will use stochastic
+gradient descent, and TensorFlow has handy helpers to make this easy.
+
+```python
+# We use the SGD optimizer.
+optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)
+```
+
+## Training the Model
+
+Training the model is then as simple as using a `feed_dict` to push data into
+the placeholders and calling `session.run` with this new data in a loop.
+
+```python
+for inputs, labels in generate_batch(...):
+  feed_dict = {train_inputs: inputs, train_labels: labels}
+ _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)
+```
+
+See the full example code in
+[tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py](./word2vec_basic.py).
+
+## Visualizing the Learned Embeddings
+
+After training has finished we can visualize the learned embeddings using
+t-SNE.
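+
+A minimal sketch of such a projection, assuming scikit-learn and matplotlib
+are available and that `final_embeddings` and `reverse_dictionary` come from
+the basic example:
+
+```python
+from sklearn.manifold import TSNE
+import matplotlib.pyplot as plt
+
+tsne = TSNE(n_components=2, perplexity=30, init='pca', n_iter=5000)
+two_d = tsne.fit_transform(final_embeddings[:500])  # 500 most frequent words
+for i, (x, y) in enumerate(two_d):
+  plt.scatter(x, y)
+  plt.annotate(reverse_dictionary[i], xy=(x, y), xytext=(5, 2),
+               textcoords='offset points')
+plt.savefig('tsne.png')
+```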
+
+<div style="width:100%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="img/tsne.png" alt>
+</div>
+
+Et voila! As expected, words that are similar end up clustering nearby each
+other. For a more heavyweight implementation of word2vec that showcases more of
+the advanced features of TensorFlow, see the implementation in
+[tensorflow/models/embedding/word2vec.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec.py).
+
+## Evaluating Embeddings: Analogical Reasoning
+
+Embeddings are useful for a wide variety of prediction tasks in NLP. Short of
+training a full-blown part-of-speech model or named-entity model, one simple way
+to evaluate embeddings is to directly use them to predict syntactic and semantic
+relationships like `king is to queen as father is to ?`. This is called
+*analogical reasoning* and the task was introduced by
+[Mikolov and colleagues](http://msr-waypoint.com/en-us/um/people/gzweig/Pubs/NAACL2013Regularities.pdf),
+and the dataset can be downloaded from here:
+https://word2vec.googlecode.com/svn/trunk/questions-words.txt.
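+
+As a rough sketch of how one such question can be answered with the learned
+vectors (illustrative only; the full TensorFlow version lives in the
+`build_eval_graph()` function mentioned next), we normalize the embeddings,
+form `b - a + c`, and look up the nearest word.
+
+```python
+import numpy as np
+
+def answer_analogy(a, b, c, embeddings, word2id, id2word):
+  """Predict d in 'a is to b as c is to d' via the nearest cosine neighbour."""
+  norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
+  target = norm[word2id[b]] - norm[word2id[a]] + norm[word2id[c]]
+  scores = norm.dot(target)            # similarity of every word to the target
+  for idx in np.argsort(-scores):      # best match first
+    if id2word[idx] not in (a, b, c):  # skip the question words
+      return id2word[idx]
+
+# With good embeddings, answer_analogy('king', 'queen', 'father', ...) should
+# return 'mother'.
+```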
+
+To see how we do this evaluation, have a look at the `build_eval_graph()` and
+`eval()` functions in
+[tensorflow/models/embedding/word2vec.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec.py).
+
+The choice of hyperparameters can strongly influence the accuracy on this task.
+To achieve state-of-the-art performance on this task requires training over a
+very large dataset, carefully tuning the hyperparameters and making use of
+tricks like subsampling the data, which is out of the scope of this tutorial.
+
+
+## Optimizing the Implementation
+
+Our vanilla implementation showcases the flexibility of TensorFlow. For
+example, changing the training objective is as simple as swapping out the call
+to `tf.nn.nce_loss()` for an off-the-shelf alternative such as
+`tf.nn.sampled_softmax_loss()`. If you have a new idea for a loss function, you
+can manually write an expression for the new objective in TensorFlow and let
+the optimizer compute its derivatives. This flexibility is invaluable in the
+exploratory phase of machine learning model development, where we are trying
+out several different ideas and iterating quickly.
+
+Once you have a model structure you're satisfied with, it may be worth
+optimizing your implementation to run more efficiently (and cover more data in
+less time). For example, the naive code we used in this tutorial would suffer
+compromised speed because we use Python for reading and feeding data items --
+each of which requires very little work on the TensorFlow back-end. If you find
+your model is seriously bottlenecked on input data, you may want to implement a
+custom data reader for your problem, as described in [New Data
+Formats](../how_tos/new_data_formats/index.md). For the case of Skip-Gram
+modeling, we've actually already done this for you as an example in
+[tensorflow/models/embedding/word2vec.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec.py).
+
+If your model is no longer I/O bound but you want still more performance, you
+can take things further by writing your own TensorFlow Ops, as described in
+[Adding a New Op](../how_tos/adding_an_op/index.md). Again we've provided an
+example of this for the Skip-Gram case
+[tensorflow/models/embedding/word2vec_optimized.py](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/embedding/word2vec_optimized.py).
+Feel free to benchmark these against each other to measure performance
+improvements at each stage.
+
+## Conclusion
+
+In this tutorial we covered the word2vec model, a computationally efficient
+model for learning word embeddings. We motivated why embeddings are useful,
+discussed efficient training techniques and showed how to implement all of this
+in TensorFlow. Overall, we hope that this has showcased how TensorFlow affords
+you the flexibility you need for early experimentation, and the control you
+later need for a bespoke optimized implementation.
diff --git a/tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py b/tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py
new file mode 100644
index 0000000000..0a981570fa
--- /dev/null
+++ b/tensorflow/g3doc/tutorials/word2vec/word2vec_basic.py
@@ -0,0 +1,219 @@
+import collections
+import math
+import numpy as np
+import os
+import random
+import tensorflow as tf
+import urllib
+import zipfile
+
+# Step 1: Download the data.
+url = 'http://mattmahoney.net/dc/'
+
+def maybe_download(filename, expected_bytes):
+ """Download a file if not present, and make sure it's the right size."""
+ if not os.path.exists(filename):
+ filename, _ = urllib.urlretrieve(url + filename, filename)
+ statinfo = os.stat(filename)
+ if statinfo.st_size == expected_bytes:
+ print 'Found and verified', filename
+ else:
+ print statinfo.st_size
+ raise Exception(
+ 'Failed to verify ' + filename + '. Can you get to it with a browser?')
+ return filename
+
+filename = maybe_download('text8.zip', 31344016)
+
+# Read the data into a string.
+def read_data(filename):
+  f = zipfile.ZipFile(filename)
+  data = f.read(f.namelist()[0]).split()
+  f.close()
+  return data
+
+words = read_data(filename)
+print 'Data size', len(words)
+
+# Step 2: Build the dictionary and replace rare words with UNK token.
+vocabulary_size = 50000
+
+def build_dataset(words):
+ count = [['UNK', -1]]
+ count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
+ dictionary = dict()
+ for word, _ in count:
+ dictionary[word] = len(dictionary)
+ data = list()
+ unk_count = 0
+ for word in words:
+ if word in dictionary:
+ index = dictionary[word]
+ else:
+ index = 0 # dictionary['UNK']
+ unk_count = unk_count + 1
+ data.append(index)
+ count[0][1] = unk_count
+ reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
+ return data, count, dictionary, reverse_dictionary
+
+data, count, dictionary, reverse_dictionary = build_dataset(words)
+del words # Hint to reduce memory.
+print 'Most common words (+UNK)', count[:5]
+print 'Sample data', data[:10]
+
+data_index = 0
+
+# Step 4: Function to generate a training batch for the skip-gram model.
+def generate_batch(batch_size, num_skips, skip_window):
+ global data_index
+ assert batch_size % num_skips == 0
+ assert num_skips <= 2 * skip_window
+ batch = np.ndarray(shape=(batch_size), dtype=np.int32)
+ labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
+ span = 2 * skip_window + 1 # [ skip_window target skip_window ]
+ buffer = collections.deque(maxlen=span)
+ for _ in range(span):
+ buffer.append(data[data_index])
+ data_index = (data_index + 1) % len(data)
+ for i in range(batch_size / num_skips):
+ target = skip_window # target label at the center of the buffer
+ targets_to_avoid = [ skip_window ]
+ for j in range(num_skips):
+ while target in targets_to_avoid:
+ target = random.randint(0, span - 1)
+ targets_to_avoid.append(target)
+ batch[i * num_skips + j] = buffer[skip_window]
+ labels[i * num_skips + j, 0] = buffer[target]
+ buffer.append(data[data_index])
+ data_index = (data_index + 1) % len(data)
+ return batch, labels
+
+batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
+for i in range(8):
+ print batch[i], '->', labels[i, 0]
+ print reverse_dictionary[batch[i]], '->', reverse_dictionary[labels[i, 0]]
+
+# Step 5: Build and train a skip-gram model.
+
+batch_size = 128
+embedding_size = 128 # Dimension of the embedding vector.
+skip_window = 1 # How many words to consider left and right.
+num_skips = 2 # How many times to reuse an input to generate a label.
+
+# We pick a random validation set to sample nearest neighbors. Here we limit the
+# validation samples to the words that have a low numeric ID, which by
+# construction are also the most frequent.
+valid_size = 16 # Random set of words to evaluate similarity on.
+valid_window = 100 # Only pick dev samples in the head of the distribution.
+valid_examples = np.array(random.sample(xrange(valid_window), valid_size))
+num_sampled = 64 # Number of negative examples to sample.
+
+graph = tf.Graph()
+
+with graph.as_default():
+
+ # Input data.
+ train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
+ train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
+ valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
+
+ # Construct the variables.
+ embeddings = tf.Variable(
+ tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
+ nce_weights = tf.Variable(
+ tf.truncated_normal([vocabulary_size, embedding_size],
+ stddev=1.0 / math.sqrt(embedding_size)))
+ nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
+
+ # Look up embeddings for inputs.
+ embed = tf.nn.embedding_lookup(embeddings, train_inputs)
+
+ # Compute the average NCE loss for the batch.
+ # tf.nce_loss automatically draws a new sample of the negative labels each
+ # time we evaluate the loss.
+ loss = tf.reduce_mean(
+ tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
+ num_sampled, vocabulary_size))
+
+ # Construct the SGD optimizer using a learning rate of 1.0.
+ optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
+
+ # Compute the cosine similarity between minibatch examples and all embeddings.
+ norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
+ normalized_embeddings = embeddings / norm
+ valid_embeddings = tf.nn.embedding_lookup(
+ normalized_embeddings, valid_dataset)
+ similarity = tf.matmul(
+ valid_embeddings, normalized_embeddings, transpose_b=True)
+
+# Step 6: Begin training
+num_steps = 100001
+
+with tf.Session(graph=graph) as session:
+ # We must initialize all variables before we use them.
+ tf.initialize_all_variables().run()
+ print "Initialized"
+
+ average_loss = 0
+ for step in xrange(num_steps):
+ batch_inputs, batch_labels = generate_batch(
+ batch_size, num_skips, skip_window)
+ feed_dict = {train_inputs : batch_inputs, train_labels : batch_labels}
+
+    # We perform one update step by evaluating the optimizer op (including it
+    # in the list of returned values for session.run()).
+ _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
+ average_loss += loss_val
+
+ if step % 2000 == 0:
+ if step > 0:
+ average_loss = average_loss / 2000
+ # The average loss is an estimate of the loss over the last 2000 batches.
+ print "Average loss at step ", step, ": ", average_loss
+ average_loss = 0
+
+ # note that this is expensive (~20% slowdown if computed every 500 steps)
+ if step % 10000 == 0:
+ sim = similarity.eval()
+ for i in xrange(valid_size):
+ valid_word = reverse_dictionary[valid_examples[i]]
+ top_k = 8 # number of nearest neighbors
+ nearest = (-sim[i, :]).argsort()[1:top_k+1]
+ log_str = "Nearest to %s:" % valid_word
+ for k in xrange(top_k):
+ close_word = reverse_dictionary[nearest[k]]
+ log_str = "%s %s," % (log_str, close_word)
+ print log_str
+ final_embeddings = normalized_embeddings.eval()
+
+# Step 7: Visualize the embeddings.
+
+def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
+ assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
+ plt.figure(figsize=(18, 18)) #in inches
+ for i, label in enumerate(labels):
+ x, y = low_dim_embs[i,:]
+ plt.scatter(x, y)
+ plt.annotate(label,
+ xy=(x, y),
+ xytext=(5, 2),
+ textcoords='offset points',
+ ha='right',
+ va='bottom')
+
+ plt.savefig(filename)
+
+try:
+ from sklearn.manifold import TSNE
+ import matplotlib.pyplot as plt
+
+ tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
+ plot_only = 500
+ low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only,:])
+  labels = [reverse_dictionary[i] for i in xrange(plot_only)]
+ plot_with_labels(low_dim_embs, labels)
+
+except ImportError:
+ print "Please install sklearn and matplotlib to visualize embeddings."
+
diff --git a/tensorflow/models/embedding/BUILD b/tensorflow/models/embedding/BUILD
new file mode 100644
index 0000000000..0fb164b05e
--- /dev/null
+++ b/tensorflow/models/embedding/BUILD
@@ -0,0 +1,74 @@
+# Description:
+# TensorFlow model for word2vec
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("/tensorflow/tensorflow", "tf_gen_op_wrapper_py")
+
+py_binary(
+ name = "word2vec",
+ srcs = [
+ "word2vec.py",
+ ],
+ deps = [
+ ":gen_word2vec",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/python:platform",
+ ],
+)
+
+py_binary(
+ name = "word2vec_optimized",
+ srcs = [
+ "word2vec_optimized.py",
+ ],
+ deps = [
+ ":gen_word2vec",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/python:platform",
+ ],
+)
+
+cc_library(
+ name = "word2vec_ops",
+ srcs = [
+ "word2vec_ops.cc",
+ ],
+ visibility = ["//tensorflow:internal"],
+ deps = [
+ "//tensorflow/core:framework",
+ ],
+ alwayslink = 1,
+)
+
+cc_library(
+ name = "word2vec_kernels",
+ srcs = [
+ "word2vec_kernels.cc",
+ ],
+ visibility = ["//tensorflow:internal"],
+ deps = [
+ "//tensorflow/core",
+ ],
+ alwayslink = 1,
+)
+
+tf_gen_op_wrapper_py(
+ name = "gen_word2vec",
+ out = "gen_word2vec.py",
+ deps = [":word2vec_ops"],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/models/embedding/__init__.py b/tensorflow/models/embedding/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/models/embedding/__init__.py
diff --git a/tensorflow/models/embedding/word2vec.py b/tensorflow/models/embedding/word2vec.py
new file mode 100644
index 0000000000..4ebf3d6f27
--- /dev/null
+++ b/tensorflow/models/embedding/word2vec.py
@@ -0,0 +1,503 @@
+"""Multi-threaded word2vec mini-batched skip-gram model.
+
+Trains the model described in:
+(Mikolov et al.) Efficient Estimation of Word Representations in Vector Space
+ICLR 2013.
+http://arxiv.org/abs/1301.3781
+This model does traditional minibatching.
+
+The key ops used are:
+* placeholder for feeding in tensors for each example.
+* embedding_lookup for fetching rows from the embedding matrix.
+* sigmoid_cross_entropy_with_logits to calculate the loss.
+* GradientDescentOptimizer for optimizing the loss.
+* skipgram custom op that does input processing.
+"""
+
+import sys
+import threading
+import time
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.models.embedding import gen_word2vec as word2vec
+
+flags = tf.app.flags
+
+flags.DEFINE_string("save_path", None, "Directory to write the model and "
+ "training summaries.")
+flags.DEFINE_string("train_data", None, "Training text file. "
+ "E.g., unzipped file http://mattmahoney.net/dc/text8.zip.")
+flags.DEFINE_string(
+    "eval_data", None, "File consisting of analogies of four tokens. "
+    "embedding 2 - embedding 1 + embedding 3 should be close "
+    "to embedding 4. "
+ "E.g. https://word2vec.googlecode.com/svn/trunk/questions-words.txt.")
+flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.")
+flags.DEFINE_integer(
+ "epochs_to_train", 15,
+ "Number of epochs to train. Each epoch processes the training data once "
+ "completely.")
+flags.DEFINE_float("learning_rate", 0.2, "Initial learning rate.")
+flags.DEFINE_integer("num_neg_samples", 100,
+ "Negative samples per training example.")
+flags.DEFINE_integer("batch_size", 16,
+ "Number of training examples processed per step "
+ "(size of a minibatch).")
+flags.DEFINE_integer("concurrent_steps", 12,
+ "The number of concurrent training steps.")
+flags.DEFINE_integer("window_size", 5,
+ "The number of words to predict to the left and right "
+ "of the target word.")
+flags.DEFINE_integer("min_count", 5,
+ "The minimum number of word occurrences for it to be "
+ "included in the vocabulary.")
+flags.DEFINE_float("subsample", 1e-3,
+ "Subsample threshold for word occurrence. Words that appear "
+ "with higher frequency will be randomly down-sampled. Set "
+ "to 0 to disable.")
+flags.DEFINE_boolean(
+ "interactive", False,
+ "If true, enters an IPython interactive session to play with the trained "
+ "model. E.g., try model.analogy('france', 'paris', 'russia') and "
+    "model.nearby(['proton', 'elephant', 'maxwell']).")
+flags.DEFINE_integer("statistics_interval", 5,
+ "Print statistics every n seconds.")
+flags.DEFINE_integer("summary_interval", 5,
+ "Save training summary to file every n seconds (rounded "
+                     "up to statistics interval).")
+flags.DEFINE_integer("checkpoint_interval", 600,
+ "Checkpoint the model (i.e. save the parameters) every n "
+                     "seconds (rounded up to statistics interval).")
+
+FLAGS = flags.FLAGS
+
+
+class Options(object):
+ """Options used by our word2vec model."""
+
+ def __init__(self):
+ # Model options.
+
+ # Embedding dimension.
+ self.emb_dim = FLAGS.embedding_size
+
+ # Training options.
+ # The training text file.
+ self.train_data = FLAGS.train_data
+
+ # Number of negative samples per example.
+ self.num_samples = FLAGS.num_neg_samples
+
+ # The initial learning rate.
+ self.learning_rate = FLAGS.learning_rate
+
+ # Number of epochs to train. After these many epochs, the learning
+ # rate decays linearly to zero and the training stops.
+ self.epochs_to_train = FLAGS.epochs_to_train
+
+ # Concurrent training steps.
+ self.concurrent_steps = FLAGS.concurrent_steps
+
+ # Number of examples for one training step.
+ self.batch_size = FLAGS.batch_size
+
+ # The number of words to predict to the left and right of the target word.
+ self.window_size = FLAGS.window_size
+
+ # The minimum number of word occurrences for it to be included in the
+ # vocabulary.
+ self.min_count = FLAGS.min_count
+
+ # Subsampling threshold for word occurrence.
+ self.subsample = FLAGS.subsample
+
+ # How often to print statistics.
+ self.statistics_interval = FLAGS.statistics_interval
+
+ # How often to write to the summary file (rounds up to the nearest
+ # statistics_interval).
+ self.summary_interval = FLAGS.summary_interval
+
+ # How often to write checkpoints (rounds up to the nearest statistics
+ # interval).
+ self.checkpoint_interval = FLAGS.checkpoint_interval
+
+ # Where to write out summaries.
+ self.save_path = FLAGS.save_path
+
+ # Eval options.
+ # The text file for eval.
+ self.eval_data = FLAGS.eval_data
+
+
+class Word2Vec(object):
+ """Word2Vec model (Skipgram)."""
+
+ def __init__(self, options, session):
+ self._options = options
+ self._session = session
+ self._word2id = {}
+ self._id2word = []
+ self.build_graph()
+ self.build_eval_graph()
+ self.save_vocab()
+ self._read_analogies()
+
+ def _read_analogies(self):
+ """Reads through the analogy question file.
+
+ Returns:
+ questions: a [n, 4] numpy array containing the analogy question's
+ word ids.
+ questions_skipped: questions skipped due to unknown words.
+ """
+ questions = []
+ questions_skipped = 0
+ with open(self._options.eval_data) as analogy_f:
+ for line in analogy_f:
+ if line.startswith(":"): # Skip comments.
+ continue
+ words = line.strip().lower().split(" ")
+ ids = [self._word2id.get(w.strip()) for w in words]
+ if None in ids or len(ids) != 4:
+ questions_skipped += 1
+ else:
+ questions.append(np.array(ids))
+ print "Eval analogy file: ", self._options.eval_data
+ print "Questions: ", len(questions)
+ print "Skipped: ", questions_skipped
+ self._analogy_questions = np.array(questions, dtype=np.int32)
+
+ def forward(self, examples, labels):
+ """Build the graph for the forward pass."""
+ opts = self._options
+
+ # Declare all variables we need.
+ # Embedding: [vocab_size, emb_dim]
+ init_width = 0.5 / opts.emb_dim
+ emb = tf.Variable(
+ tf.random_uniform(
+ [opts.vocab_size, opts.emb_dim], -init_width, init_width),
+ name="emb")
+ self._emb = emb
+
+ # Softmax weight: [vocab_size, emb_dim]. Transposed.
+ sm_w_t = tf.Variable(
+ tf.zeros([opts.vocab_size, opts.emb_dim]),
+ name="sm_w_t")
+
+    # Softmax bias: [vocab_size].
+ sm_b = tf.Variable(tf.zeros([opts.vocab_size]), name="sm_b")
+
+ # Global step: scalar, i.e., shape [].
+ self.global_step = tf.Variable(0, name="global_step")
+
+ # Nodes to compute the nce loss w/ candidate sampling.
+ labels_matrix = tf.reshape(
+ tf.cast(labels,
+ dtype=tf.int64),
+ [opts.batch_size, 1])
+
+ # Negative sampling.
+ sampled_ids, _, _ = (tf.nn.fixed_unigram_candidate_sampler(
+ true_classes=labels_matrix,
+ num_true=1,
+ num_sampled=opts.num_samples,
+ unique=True,
+ range_max=opts.vocab_size,
+ distortion=0.75,
+ unigrams=opts.vocab_counts.tolist()))
+
+ # Embeddings for examples: [batch_size, emb_dim]
+ example_emb = tf.nn.embedding_lookup(emb, examples)
+
+ # Weights for labels: [batch_size, emb_dim]
+ true_w = tf.nn.embedding_lookup(sm_w_t, labels)
+ # Biases for labels: [batch_size, 1]
+ true_b = tf.nn.embedding_lookup(sm_b, labels)
+
+ # Weights for sampled ids: [num_sampled, emb_dim]
+ sampled_w = tf.nn.embedding_lookup(sm_w_t, sampled_ids)
+ # Biases for sampled ids: [num_sampled, 1]
+ sampled_b = tf.nn.embedding_lookup(sm_b, sampled_ids)
+
+ # True logits: [batch_size, 1]
+ true_logits = tf.reduce_sum(tf.mul(example_emb, true_w), 1) + true_b
+
+ # Sampled logits: [batch_size, num_sampled]
+    # We replicate sampled noise labels for all examples in the batch
+ # using the matmul.
+ sampled_b_vec = tf.reshape(sampled_b, [opts.num_samples])
+ sampled_logits = tf.matmul(example_emb,
+ sampled_w,
+ transpose_b=True) + sampled_b_vec
+ return true_logits, sampled_logits
+
+ def nce_loss(self, true_logits, sampled_logits):
+ """Build the graph for the NCE loss."""
+
+ # cross-entropy(logits, labels)
+ opts = self._options
+ true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
+ true_logits, tf.ones_like(true_logits))
+ sampled_xent = tf.nn.sigmoid_cross_entropy_with_logits(
+ sampled_logits, tf.zeros_like(sampled_logits))
+
+ # NCE-loss is the sum of the true and noise (sampled words)
+ # contributions, averaged over the batch.
+ nce_loss_tensor = (tf.reduce_sum(true_xent) +
+ tf.reduce_sum(sampled_xent)) / opts.batch_size
+ return nce_loss_tensor
+
+ def optimize(self, loss):
+ """Build the graph to optimize the loss function."""
+
+ # Optimizer nodes.
+ # Linear learning rate decay.
+ opts = self._options
+ words_to_train = float(opts.words_per_epoch * opts.epochs_to_train)
+ lr = opts.learning_rate * tf.maximum(
+ 0.0001, 1.0 - tf.cast(self._words, tf.float32) / words_to_train)
+ self._lr = lr
+ optimizer = tf.train.GradientDescentOptimizer(lr)
+ train = optimizer.minimize(loss,
+ global_step=self.global_step,
+ gate_gradients=optimizer.GATE_NONE)
+ self._train = train
+
+ def build_eval_graph(self):
+ """Build the eval graph."""
+ # Eval graph
+
+ # Each analogy task is to predict the 4th word (d) given three
+ # words: a, b, c. E.g., a=italy, b=rome, c=france, we should
+ # predict d=paris.
+
+ # The eval feeds three vectors of word ids for a, b, c, each of
+ # which is of size N, where N is the number of analogies we want to
+ # evaluate in one batch.
+ analogy_a = tf.placeholder(dtype=tf.int32) # [N]
+ analogy_b = tf.placeholder(dtype=tf.int32) # [N]
+ analogy_c = tf.placeholder(dtype=tf.int32) # [N]
+
+ # Normalized word embeddings of shape [vocab_size, emb_dim].
+ nemb = tf.nn.l2_normalize(self._emb, 1)
+
+ # Each row of a_emb, b_emb, c_emb is a word's embedding vector.
+ # They all have the shape [N, emb_dim]
+ a_emb = tf.gather(nemb, analogy_a) # a's embs
+ b_emb = tf.gather(nemb, analogy_b) # b's embs
+ c_emb = tf.gather(nemb, analogy_c) # c's embs
+
+    # We expect that d's embedding vector on the unit hyper-sphere is
+ # near: c_emb + (b_emb - a_emb), which has the shape [N, emb_dim].
+ target = c_emb + (b_emb - a_emb)
+
+ # Compute cosine distance between each pair of target and vocab.
+ # dist has shape [N, vocab_size].
+ dist = tf.matmul(target, nemb, transpose_b=True)
+
+ # For each question (row in dist), find the top 4 words.
+ _, pred_idx = tf.nn.top_k(dist, 4)
+
+ # Nodes for computing neighbors for a given word according to
+ # their cosine distance.
+ nearby_word = tf.placeholder(dtype=tf.int32) # word id
+ nearby_emb = tf.gather(nemb, nearby_word)
+ nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True)
+ nearby_val, nearby_idx = tf.nn.top_k(nearby_dist,
+ min(1000, self._options.vocab_size))
+
+ # Nodes in the construct graph which are used by training and
+ # evaluation to run/feed/fetch.
+ self._analogy_a = analogy_a
+ self._analogy_b = analogy_b
+ self._analogy_c = analogy_c
+ self._analogy_pred_idx = pred_idx
+ self._nearby_word = nearby_word
+ self._nearby_val = nearby_val
+ self._nearby_idx = nearby_idx
+
+ def build_graph(self):
+ """Build the graph for the full model."""
+ opts = self._options
+ # The training data. A text file.
+ (words, counts, words_per_epoch, self._epoch, self._words, examples,
+ labels) = word2vec.skipgram(filename=opts.train_data,
+ batch_size=opts.batch_size,
+ window_size=opts.window_size,
+ min_count=opts.min_count,
+ subsample=opts.subsample)
+ (opts.vocab_words, opts.vocab_counts,
+ opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch])
+ opts.vocab_size = len(opts.vocab_words)
+ print "Data file: ", opts.train_data
+ print "Vocab size: ", opts.vocab_size - 1, " + UNK"
+ print "Words per epoch: ", opts.words_per_epoch
+ self._examples = examples
+ self._labels = labels
+ self._id2word = opts.vocab_words
+ for i, w in enumerate(self._id2word):
+ self._word2id[w] = i
+ true_logits, sampled_logits = self.forward(examples, labels)
+ loss = self.nce_loss(true_logits, sampled_logits)
+ tf.scalar_summary("NCE loss", loss)
+ self._loss = loss
+ self.optimize(loss)
+
+ # Properly initialize all variables.
+ tf.initialize_all_variables().run()
+
+ self.saver = tf.train.Saver()
+
+ def save_vocab(self):
+ """Save the vocabulary to a file so the model can be reloaded."""
+ opts = self._options
+ with open(opts.save_path + "/vocab.txt", "w") as f:
+ for i in xrange(opts.vocab_size):
+ f.write(opts.vocab_words[i] + " " + str(opts.vocab_counts[i]) + "\n")
+
+ def _train_thread_body(self):
+ initial_epoch, = self._session.run([self._epoch])
+ while True:
+ _, epoch = self._session.run([self._train, self._epoch])
+ if epoch != initial_epoch:
+ break
+
+ def train(self):
+ """Train the model."""
+ opts = self._options
+
+ initial_epoch, initial_words = self._session.run([self._epoch, self._words])
+
+ summary_op = tf.merge_all_summaries()
+ summary_writer = tf.train.SummaryWriter(opts.save_path,
+ graph_def=self._session.graph_def)
+ workers = []
+ for _ in xrange(opts.concurrent_steps):
+ t = threading.Thread(target=self._train_thread_body)
+ t.start()
+ workers.append(t)
+
+ last_words, last_time, last_summary_time = initial_words, time.time(), 0
+ last_checkpoint_time = 0
+ while True:
+      time.sleep(opts.statistics_interval)  # Reports our progress once in a while.
+ (epoch, step, loss, words, lr) = self._session.run(
+ [self._epoch, self.global_step, self._loss, self._words, self._lr])
+ now = time.time()
+ last_words, last_time, rate = words, now, (words - last_words) / (
+ now - last_time)
+ print("Epoch %4d Step %8d: lr = %5.3f loss = %6.2f words/sec = %8.0f\r" %
+ (epoch, step, lr, loss, rate)),
+ sys.stdout.flush()
+ if now - last_summary_time > opts.summary_interval:
+ summary_str = self._session.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+ last_summary_time = now
+ if now - last_checkpoint_time > opts.checkpoint_interval:
+ self.saver.save(self._session,
+ opts.save_path + "model",
+ global_step=step)
+ last_checkpoint_time = now
+ if epoch != initial_epoch:
+ break
+
+ for t in workers:
+ t.join()
+
+ return epoch
+
+ def _predict(self, analogy):
+ """Predict the top 4 answers for analogy questions."""
+ idx, = self._session.run([self._analogy_pred_idx], {
+ self._analogy_a: analogy[:, 0],
+ self._analogy_b: analogy[:, 1],
+ self._analogy_c: analogy[:, 2]
+ })
+ return idx
+
+ def eval(self):
+    """Evaluate analogy questions and report accuracy."""
+
+ # How many questions we get right at precision@1.
+ correct = 0
+
+ total = self._analogy_questions.shape[0]
+ start = 0
+ while start < total:
+ limit = start + 2500
+ sub = self._analogy_questions[start:limit, :]
+ idx = self._predict(sub)
+ start = limit
+ for question in xrange(sub.shape[0]):
+ for j in xrange(4):
+ if idx[question, j] == sub[question, 3]:
+ # Bingo! We predicted correctly. E.g., [italy, rome, france, paris].
+ correct += 1
+ break
+ elif idx[question, j] in sub[question, :3]:
+ # We need to skip words already in the question.
+ continue
+ else:
+ # The correct label is not the precision@1
+ break
+ print
+ print "Eval %4d/%d accuracy = %4.1f%%" % (correct, total,
+ correct * 100.0 / total)
+
+ def analogy(self, w0, w1, w2):
+ """Predict word w3 as in w0:w1 vs w2:w3."""
+ wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]])
+ idx = self._predict(wid)
+ for c in [self._id2word[i] for i in idx[0, :]]:
+ if c not in [w0, w1, w2]:
+ return c
+ return "unknown"
+
+ def nearby(self, words, num=20):
+ """Prints out nearby words given a list of words."""
+ ids = np.array([self._word2id.get(x, 0) for x in words])
+ vals, idx = self._session.run(
+ [self._nearby_val, self._nearby_idx], {self._nearby_word: ids})
+ for i in xrange(len(words)):
+ print "\n%s\n=====================================" % (words[i])
+ for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]):
+ print "%-20s %6.4f" % (self._id2word[neighbor], distance)
+
+
+def _start_shell(local_ns=None):
+ # An interactive shell is useful for debugging/development.
+ import IPython
+ user_ns = {}
+ if local_ns:
+ user_ns.update(local_ns)
+ user_ns.update(globals())
+ IPython.start_ipython(argv=[], user_ns=user_ns)
+
+
+def main(_):
+ """Train a word2vec model."""
+ opts = Options()
+ with tf.Graph().as_default(), tf.Session() as session:
+ model = Word2Vec(opts, session)
+ for _ in xrange(opts.epochs_to_train):
+ model.train() # Process one epoch
+ model.eval() # Eval analogies.
+ # Perform a final save.
+ model.saver.save(session,
+ opts.save_path + "model",
+ global_step=model.global_step)
+ if FLAGS.interactive:
+ # E.g.,
+ # [0]: model.analogy('france', 'paris', 'russia')
+ # [1]: model.nearby(['proton', 'elephant', 'maxwell'])
+ _start_shell(locals())
+
+
+if __name__ == "__main__":
+ tf.app.run()
diff --git a/tensorflow/models/embedding/word2vec_kernels.cc b/tensorflow/models/embedding/word2vec_kernels.cc
new file mode 100644
index 0000000000..f68139fc91
--- /dev/null
+++ b/tensorflow/models/embedding/word2vec_kernels.cc
@@ -0,0 +1,287 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/random/distribution_sampler.h"
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/simple_philox.h"
+#include "tensorflow/core/platform/regexp.h"
+#include "tensorflow/core/platform/thread_annotations.h"
+#include "tensorflow/core/util/guarded_philox_random.h"
+
+namespace tensorflow {
+
+class SkipgramOp : public OpKernel {
+ public:
+ explicit SkipgramOp(OpKernelConstruction* ctx)
+ : OpKernel(ctx), rng_(&philox_) {
+ string filename;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("filename", &filename));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("batch_size", &batch_size_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("window_size", &window_size_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("min_count", &min_count_));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("subsample", &subsample_));
+ OP_REQUIRES_OK(ctx, Init(ctx->env(), filename));
+
+ mutex_lock l(mu_);
+ example_pos_ = corpus_size_;
+ label_pos_ = corpus_size_;
+ label_limit_ = corpus_size_;
+ }
+
+ void Compute(OpKernelContext* ctx) override {
+ Tensor words_per_epoch(DT_INT64, TensorShape({}));
+ Tensor current_epoch(DT_INT32, TensorShape({}));
+ Tensor total_words_processed(DT_INT64, TensorShape({}));
+ Tensor examples(DT_INT32, TensorShape({batch_size_}));
+ auto Texamples = examples.flat<int32>();
+ Tensor labels(DT_INT32, TensorShape({batch_size_}));
+ auto Tlabels = labels.flat<int32>();
+ {
+ mutex_lock l(mu_);
+ for (int i = 0; i < batch_size_; ++i) {
+ NextExample(&Texamples(i), &Tlabels(i));
+ }
+ words_per_epoch.scalar<int64>()() = corpus_size_;
+ current_epoch.scalar<int32>()() = current_epoch_;
+ total_words_processed.scalar<int64>()() = total_words_processed_;
+ }
+ ctx->set_output(0, word_);
+ ctx->set_output(1, freq_);
+ ctx->set_output(2, words_per_epoch);
+ ctx->set_output(3, current_epoch);
+ ctx->set_output(4, total_words_processed);
+ ctx->set_output(5, examples);
+ ctx->set_output(6, labels);
+ }
+
+ private:
+ int32 batch_size_ = 0;
+ int32 window_size_ = 5;
+ float subsample_ = 1e-3;
+ int min_count_ = 5;
+ int32 vocab_size_ = 0;
+ Tensor word_;
+ Tensor freq_;
+ int32 corpus_size_ = 0;
+ std::vector<int32> corpus_;
+
+ mutex mu_;
+ random::PhiloxRandom philox_ GUARDED_BY(mu_);
+ random::SimplePhilox rng_ GUARDED_BY(mu_);
+ int32 current_epoch_ GUARDED_BY(mu_) = -1;
+ int64 total_words_processed_ GUARDED_BY(mu_) = 0;
+ int32 example_pos_ GUARDED_BY(mu_);
+ int32 label_pos_ GUARDED_BY(mu_);
+ int32 label_limit_ GUARDED_BY(mu_);
+
+ // {example_pos_, label_pos_} is the cursor for the next example.
+  // example_pos_ wraps around at the end of corpus_. For each
+ // example, we randomly generate [label_pos_, label_limit) for
+ // labels.
+ void NextExample(int32* example, int32* label) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ while (true) {
+ if (label_pos_ >= label_limit_) {
+ if (example_pos_ + 1 >= corpus_size_) {
+ ++current_epoch_;
+ example_pos_ = 0;
+ } else {
+ ++example_pos_;
+ }
+ ++total_words_processed_;
+ int32 word_freq = freq_.flat<int32>()(corpus_[example_pos_]);
+ if (subsample_ > 0) {
+ // See Eq. 5 in http://arxiv.org/abs/1310.4546
+ float keep_prob =
+ (std::sqrt(word_freq / (subsample_ * corpus_size_)) + 1) *
+ (subsample_ * corpus_size_) / word_freq;
+ if (rng_.RandFloat() > keep_prob) continue;
+ }
+ const int32 skip = 1 + rng_.Uniform(window_size_);
+ label_pos_ = std::max<int32>(0, example_pos_ - skip);
+ label_limit_ = std::min<int32>(corpus_size_, example_pos_ + skip + 1);
+ }
+ if (example_pos_ != label_pos_) {
+ break;
+ }
+ ++label_pos_;
+ }
+ *example = corpus_[example_pos_];
+ *label = corpus_[label_pos_++];
+ }
+
+ Status Init(Env* env, const string& filename) {
+ string data;
+ TF_RETURN_IF_ERROR(ReadFileToString(env, filename, &data));
+ RE2 kWord("\\s*(\\S+)");
+ auto input = ToRegexpStringPiece(data);
+ string w;
+ corpus_size_ = 0;
+ std::unordered_map<string, int32> word_freq;
+ while (RE2::Consume(&input, kWord, &w)) {
+ ++(word_freq[w]);
+ ++corpus_size_;
+ }
+ if (corpus_size_ < window_size_ * 10) {
+ return errors::InvalidArgument("The text file ", filename,
+ " contains too little data: ",
+ corpus_size_, " words");
+ }
+ typedef std::pair<string, int32> WordFreq;
+ std::vector<WordFreq> ordered;
+ for (const auto& p : word_freq) {
+ if (p.second >= min_count_) ordered.push_back(p);
+ }
+ LOG(INFO) << "Data file: " << filename << " contains " << data.size()
+ << " bytes, " << corpus_size_ << " words, " << word_freq.size()
+ << " unique words, " << ordered.size()
+ << " unique frequent words.";
+ word_freq.clear();
+ std::sort(ordered.begin(), ordered.end(),
+ [](const WordFreq& x, const WordFreq& y) {
+ return x.second > y.second;
+ });
+ vocab_size_ = static_cast<int32>(1 + ordered.size());
+ Tensor word(DT_STRING, TensorShape({vocab_size_}));
+ Tensor freq(DT_INT32, TensorShape({vocab_size_}));
+ word.flat<string>()(0) = "UNK";
+ static const int32 kUnkId = 0;
+ std::unordered_map<string, int32> word_id;
+ int64 total_counted = 0;
+ for (std::size_t i = 0; i < ordered.size(); ++i) {
+ const auto& w = ordered[i].first;
+ auto id = i + 1;
+ word.flat<string>()(id) = w;
+ auto word_count = ordered[i].second;
+ freq.flat<int32>()(id) = word_count;
+ total_counted += word_count;
+ word_id[w] = id;
+ }
+ freq.flat<int32>()(kUnkId) = corpus_size_ - total_counted;
+ word_ = word;
+ freq_ = freq;
+ corpus_.reserve(corpus_size_);
+ input = ToRegexpStringPiece(data);
+ while (RE2::Consume(&input, kWord, &w)) {
+ corpus_.push_back(gtl::FindWithDefault(word_id, w, kUnkId));
+ }
+ return Status::OK();
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("Skipgram").Device(DEVICE_CPU), SkipgramOp);
+
+class NegTrainOp : public OpKernel {
+ public:
+ explicit NegTrainOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ base_.Init(0, 0);
+
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("num_negative_samples", &num_samples_));
+
+ std::vector<int32> vocab_count;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("vocab_count", &vocab_count));
+
+ std::vector<float> vocab_weights;
+ vocab_weights.reserve(vocab_count.size());
+ for (const auto& f : vocab_count) {
+ float r = std::pow(static_cast<float>(f), 0.75f);
+ vocab_weights.push_back(r);
+ }
+ sampler_ = new random::DistributionSampler(vocab_weights);
+ }
+
+ ~NegTrainOp() { delete sampler_; }
+
+ void Compute(OpKernelContext* ctx) override {
+ Tensor w_in = ctx->mutable_input(0, false);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(w_in.shape()),
+ errors::InvalidArgument("Must be a matrix"));
+ Tensor w_out = ctx->mutable_input(1, false);
+ OP_REQUIRES(ctx, w_in.shape() == w_out.shape(),
+ errors::InvalidArgument("w_in.shape == w_out.shape"));
+ const Tensor& examples = ctx->input(2);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsVector(examples.shape()),
+ errors::InvalidArgument("Must be a vector"));
+ const Tensor& labels = ctx->input(3);
+ OP_REQUIRES(ctx, examples.shape() == labels.shape(),
+ errors::InvalidArgument("examples.shape == labels.shape"));
+ const Tensor& learning_rate = ctx->input(4);
+ OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(learning_rate.shape()),
+ errors::InvalidArgument("Must be a scalar"));
+
+ auto Tw_in = w_in.matrix<float>();
+ auto Tw_out = w_out.matrix<float>();
+ auto Texamples = examples.flat<int32>();
+ auto Tlabels = labels.flat<int32>();
+ auto lr = learning_rate.scalar<float>()();
+ const int64 vocab_size = w_in.dim_size(0);
+ const int64 dims = w_in.dim_size(1);
+ const int64 batch_size = examples.dim_size(0);
+ OP_REQUIRES(ctx, vocab_size == sampler_->num(),
+ errors::InvalidArgument("vocab_size mismatches: ", vocab_size,
+ " vs. ", sampler_->num()));
+
+ // Gradient accumulator for v_in.
+ Tensor buf(DT_FLOAT, TensorShape({dims}));
+ auto Tbuf = buf.flat<float>();
+
+ // Scalar buffer to hold sigmoid(+/- dot).
+ Tensor g_buf(DT_FLOAT, TensorShape({}));
+ auto g = g_buf.scalar<float>();
+
+ // The following loop needs 2 random 32-bit values per negative
+ // sample. We reserve 8 values per sample just in case the
+ // underlying implementation changes.
+ auto rnd = base_.ReserveSamples32(batch_size * num_samples_ * 8);
+ random::SimplePhilox srnd(&rnd);
+
+ for (int64 i = 0; i < batch_size; ++i) {
+ const int32 example = Texamples(i);
+ DCHECK(0 <= example && example < vocab_size) << example;
+ const int32 label = Tlabels(i);
+ DCHECK(0 <= label && label < vocab_size) << label;
+ auto v_in = Tw_in.chip<0>(example);
+
+ // Positive: example predicts label.
+ // forward: x = v_in' * v_out
+ // l = log(sigmoid(x))
+ // backward: dl/dx = g = sigmoid(-x)
+ // dl/d(v_in) = g * v_out'
+ // dl/d(v_out) = v_in' * g
+ {
+ auto v_out = Tw_out.chip<0>(label);
+ auto dot = (v_in * v_out).sum();
+ g = (dot.exp() + 1.f).inverse();
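+ // Note: 1 / (1 + exp(x)) == sigmoid(-x), i.e. the g derived above.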
+ Tbuf = v_out * (g() * lr);
+ v_out += v_in * (g() * lr);
+ }
+
+ // Negative samples:
+ // forward: x = v_in' * v_sample
+ // l = log(sigmoid(-x))
+ // backward: dl/dx = g = -sigmoid(x)
+ // dl/d(v_in) = g * v_out'
+ // dl/d(v_out) = v_in' * g
+ for (int j = 0; j < num_samples_; ++j) {
+ const int sample = sampler_->Sample(&srnd);
+ if (sample == label) continue; // Skip.
+ auto v_sample = Tw_out.chip<0>(sample);
+ auto dot = (v_in * v_sample).sum();
+ g = -((-dot).exp() + 1.f).inverse();
+ Tbuf += v_sample * (g() * lr);
+ v_sample += v_in * (g() * lr);
+ }
+
+ // Applies the gradient on v_in.
+ v_in += Tbuf;
+ }
+ }
+
+ private:
+ int32 num_samples_ = 0;
+ random::DistributionSampler* sampler_ = nullptr;
+ GuardedPhiloxRandom base_;
+};
+
+REGISTER_KERNEL_BUILDER(Name("NegTrain").Device(DEVICE_CPU), NegTrainOp);
+
+} // end namespace tensorflow
diff --git a/tensorflow/models/embedding/word2vec_ops.cc b/tensorflow/models/embedding/word2vec_ops.cc
new file mode 100644
index 0000000000..abe03baaf4
--- /dev/null
+++ b/tensorflow/models/embedding/word2vec_ops.cc
@@ -0,0 +1,56 @@
+#include "tensorflow/core/framework/op.h"
+
+namespace tensorflow {
+
+REGISTER_OP("Skipgram")
+ .Output("vocab_word: string")
+ .Output("vocab_freq: int32")
+ .Output("words_per_epoch: int64")
+ .Output("current_epoch: int32")
+ .Output("total_words_processed: int64")
+ .Output("examples: int32")
+ .Output("labels: int32")
+ .Attr("filename: string")
+ .Attr("batch_size: int")
+ .Attr("window_size: int = 5")
+ .Attr("min_count: int = 5")
+ .Attr("subsample: float = 1e-3")
+ .Doc(R"doc(
+Parses a text file and creates a batch of examples.
+
+vocab_word: A vector of words in the corpus.
+vocab_freq: Frequencies of words. Sorted in non-ascending order.
+words_per_epoch: Number of words per epoch in the data file.
+current_epoch: The current epoch number.
+total_words_processed: The total number of words processed so far.
+examples: A vector of word ids.
+labels: A vector of word ids.
+filename: The corpus's text file name.
+batch_size: The size of the produced batch.
+window_size: The number of words to predict to the left and right of the target.
+min_count: The minimum number of word occurrences for it to be included in the
+ vocabulary.
+subsample: Threshold for word occurrence. Words that appear with higher
+ frequency will be randomly down-sampled. Set to 0 to disable.
+)doc");
+
+REGISTER_OP("NegTrain")
+ .Input("w_in: Ref(float)")
+ .Input("w_out: Ref(float)")
+ .Input("examples: int32")
+ .Input("labels: int32")
+ .Input("lr: float")
+ .Attr("vocab_count: list(int)")
+ .Attr("num_negative_samples: int")
+ .Doc(R"doc(
+Training via negative sampling.
+
+w_in: input word embedding.
+w_out: output word embedding.
+examples: A vector of word ids.
+labels: A vector of word ids.
+vocab_count: Count of words in the vocabulary.
+num_negative_samples: Number of negative samples per example.
+)doc");
+
+} // end namespace tensorflow
diff --git a/tensorflow/models/embedding/word2vec_optimized.py b/tensorflow/models/embedding/word2vec_optimized.py
new file mode 100644
index 0000000000..23e7645a0b
--- /dev/null
+++ b/tensorflow/models/embedding/word2vec_optimized.py
@@ -0,0 +1,405 @@
+"""Multi-threaded word2vec unbatched skip-gram model.
+
+Trains the model described in:
+(Mikolov et al.) Efficient Estimation of Word Representations in Vector Space
+ICLR 2013.
+http://arxiv.org/abs/1301.3781
+This model does true SGD (i.e. no minibatching). To do this efficiently, custom
+ops are used to sequentially process data within a 'batch'.
+
+The key ops used are:
+* skipgram custom op that does input processing.
+* neg_train custom op that efficiently calculates and applies the gradient using
+ true SGD.
+"""
+
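+# A minimal sketch of how the two custom ops are consumed from Python
+# (mirrors build_graph() below; "corpus.txt" and the literal values are
+# placeholders):
+#
+#   (words, counts, words_per_epoch, current_epoch, total_words_processed,
+#    examples, labels) = word2vec.skipgram(filename="corpus.txt",
+#                                          batch_size=500, window_size=5,
+#                                          min_count=5, subsample=1e-3)
+#   train = word2vec.neg_train(w_in, w_out, examples, labels, lr,
+#                              vocab_count=vocab_counts,
+#                              num_negative_samples=25)
+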
+import sys
+import threading
+import time
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.models.embedding import gen_word2vec as word2vec
+
+flags = tf.app.flags
+
+flags.DEFINE_string("save_path", None, "Directory to write the model.")
+flags.DEFINE_string(
+ "train_data", None,
+ "Training data. E.g., unzipped file http://mattmahoney.net/dc/text8.zip.")
+flags.DEFINE_string(
+ "eval_data", None, "Analogy questions. "
+ "https://word2vec.googlecode.com/svn/trunk/questions-words.txt.")
+flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.")
+flags.DEFINE_integer(
+ "epochs_to_train", 15,
+ "Number of epochs to train. Each epoch processes the training data once "
+ "completely.")
+flags.DEFINE_float("learning_rate", 0.025, "Initial learning rate.")
+flags.DEFINE_integer("num_neg_samples", 25,
+ "Negative samples per training example.")
+flags.DEFINE_integer("batch_size", 500,
+ "Numbers of training examples each step processes "
+ "(no minibatching).")
+flags.DEFINE_integer("concurrent_steps", 12,
+ "The number of concurrent training steps.")
+flags.DEFINE_integer("window_size", 5,
+ "The number of words to predict to the left and right "
+ "of the target word.")
+flags.DEFINE_integer("min_count", 5,
+ "The minimum number of word occurrences for it to be "
+ "included in the vocabulary.")
+flags.DEFINE_float("subsample", 1e-3,
+ "Subsample threshold for word occurrence. Words that appear "
+ "with higher frequency will be randomly down-sampled. Set "
+ "to 0 to disable.")
+flags.DEFINE_boolean(
+ "interactive", False,
+ "If true, enters an IPython interactive session to play with the trained "
+ "model. E.g., try model.analogy('france', 'paris', 'russia') and "
+ "model.nearby(['proton', 'elephant', 'maxwell']")
+
+FLAGS = flags.FLAGS
+
+
+class Options(object):
+ """Options used by our word2vec model."""
+
+ def __init__(self):
+ # Model options.
+
+ # Embedding dimension.
+ self.emb_dim = FLAGS.embedding_size
+
+ # Training options.
+
+ # The training text file.
+ self.train_data = FLAGS.train_data
+
+ # Number of negative samples per example.
+ self.num_samples = FLAGS.num_neg_samples
+
+ # The initial learning rate.
+ self.learning_rate = FLAGS.learning_rate
+
+ # Number of epochs to train. After this many epochs, the learning
+ # rate decays linearly to zero and the training stops.
+ self.epochs_to_train = FLAGS.epochs_to_train
+
+ # Concurrent training steps.
+ self.concurrent_steps = FLAGS.concurrent_steps
+
+ # Number of examples for one training step.
+ self.batch_size = FLAGS.batch_size
+
+ # The number of words to predict to the left and right of the target word.
+ self.window_size = FLAGS.window_size
+
+ # The minimum number of word occurrences for it to be included in the
+ # vocabulary.
+ self.min_count = FLAGS.min_count
+
+ # Subsampling threshold for word occurrence.
+ self.subsample = FLAGS.subsample
+
+ # Where to write out summaries.
+ self.save_path = FLAGS.save_path
+
+ # Eval options.
+
+ # The text file for eval.
+ self.eval_data = FLAGS.eval_data
+
+
+class Word2Vec(object):
+ """Word2Vec model (Skipgram)."""
+
+ def __init__(self, options, session):
+ self._options = options
+ self._session = session
+ self._word2id = {}
+ self._id2word = []
+ self.build_graph()
+ self.build_eval_graph()
+ self.save_vocab()
+ self._read_analogies()
+
+ def _read_analogies(self):
+ """Reads through the analogy question file.
+
+ Returns:
+ questions: a [n, 4] numpy array containing the analogy question's
+ word ids.
+ questions_skipped: questions skipped due to unknown words.
+ """
+ questions = []
+ questions_skipped = 0
+ with open(self._options.eval_data) as analogy_f:
+ for line in analogy_f:
+ if line.startswith(":"): # Skip comments.
+ continue
+ words = line.strip().lower().split(" ")
+ ids = [self._word2id.get(w.strip()) for w in words]
+ if None in ids or len(ids) != 4:
+ questions_skipped += 1
+ else:
+ questions.append(np.array(ids))
+ print "Eval analogy file: ", self._options.eval_data
+ print "Questions: ", len(questions)
+ print "Skipped: ", questions_skipped
+ self._analogy_questions = np.array(questions, dtype=np.int32)
+
+ def build_graph(self):
+ """Build the model graph."""
+ opts = self._options
+
+ # The training data. A text file.
+ (words, counts, words_per_epoch, current_epoch, total_words_processed,
+ examples, labels) = word2vec.skipgram(filename=opts.train_data,
+ batch_size=opts.batch_size,
+ window_size=opts.window_size,
+ min_count=opts.min_count,
+ subsample=opts.subsample)
+ (opts.vocab_words, opts.vocab_counts,
+ opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch])
+ opts.vocab_size = len(opts.vocab_words)
+ print "Data file: ", opts.train_data
+ print "Vocab size: ", opts.vocab_size - 1, " + UNK"
+ print "Words per epoch: ", opts.words_per_epoch
+
+ self._id2word = opts.vocab_words
+ for i, w in enumerate(self._id2word):
+ self._word2id[w] = i
+
+ # Declare all variables we need.
+ # Input words embedding: [vocab_size, emb_dim]
+ w_in = tf.Variable(
+ tf.random_uniform(
+ [opts.vocab_size,
+ opts.emb_dim], -0.5 / opts.emb_dim, 0.5 / opts.emb_dim),
+ name="w_in")
+
+ # Output words embedding: [vocab_size, emb_dim]
+ w_out = tf.Variable(tf.zeros([opts.vocab_size, opts.emb_dim]), name="w_out")
+
+ # Global step: []
+ global_step = tf.Variable(0, name="global_step")
+
+ # Linear learning rate decay.
+ words_to_train = float(opts.words_per_epoch * opts.epochs_to_train)
+ lr = opts.learning_rate * tf.maximum(
+ 0.0001,
+ 1.0 - tf.cast(total_words_processed, tf.float32) / words_to_train)
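+ # (That is, lr falls linearly from its initial value toward zero over
+ # epochs_to_train epochs, floored at 0.01% of the initial rate.)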
+
+ # Training nodes.
+ inc = global_step.assign_add(1)
+ with tf.control_dependencies([inc]):
+ train = word2vec.neg_train(w_in,
+ w_out,
+ examples,
+ labels,
+ lr,
+ vocab_count=opts.vocab_counts.tolist(),
+ num_negative_samples=opts.num_samples)
+
+ self._w_in = w_in
+ self._examples = examples
+ self._labels = labels
+ self._lr = lr
+ self._train = train
+ self.step = global_step
+ self._epoch = current_epoch
+ self._words = total_words_processed
+
+ def save_vocab(self):
+ """Save the vocabulary to a file so the model can be reloaded."""
+ opts = self._options
+ with open(opts.save_path + "/vocab.txt", "w") as f:
+ for i in xrange(opts.vocab_size):
+ f.write(opts.vocab_words[i] + " " + str(opts.vocab_counts[i]) + "\n")
+
+ def build_eval_graph(self):
+ """Build the evaluation graph."""
+ # Eval graph
+ opts = self._options
+
+ # Each analogy task is to predict the 4th word (d) given three
+ # words: a, b, c. E.g., a=italy, b=rome, c=france, we should
+ # predict d=paris.
+
+ # The eval feeds three vectors of word ids for a, b, c, each of
+ # which is of size N, where N is the number of analogies we want to
+ # evaluate in one batch.
+ analogy_a = tf.placeholder(dtype=tf.int32) # [N]
+ analogy_b = tf.placeholder(dtype=tf.int32) # [N]
+ analogy_c = tf.placeholder(dtype=tf.int32) # [N]
+
+ # Normalized word embeddings of shape [vocab_size, emb_dim].
+ nemb = tf.nn.l2_normalize(self._w_in, 1)
+
+ # Each row of a_emb, b_emb, c_emb is a word's embedding vector.
+ # They all have the shape [N, emb_dim]
+ a_emb = tf.gather(nemb, analogy_a) # a's embs
+ b_emb = tf.gather(nemb, analogy_b) # b's embs
+ c_emb = tf.gather(nemb, analogy_c) # c's embs
+
+ # We expect that d's embedding vector on the unit hyper-sphere is
+ # near c_emb + (b_emb - a_emb), which has the shape [N, emb_dim].
+ target = c_emb + (b_emb - a_emb)
+
+ # Compute cosine distance between each pair of target and vocab.
+ # dist has shape [N, vocab_size].
+ dist = tf.matmul(target, nemb, transpose_b=True)
+
+ # For each question (row in dist), find the top 4 words.
+ _, pred_idx = tf.nn.top_k(dist, 4)
+
+ # Nodes for computing neighbors for a given word according to
+ # their cosine distance.
+ nearby_word = tf.placeholder(dtype=tf.int32) # word id
+ nearby_emb = tf.gather(nemb, nearby_word)
+ nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True)
+ nearby_val, nearby_idx = tf.nn.top_k(nearby_dist,
+ min(1000, opts.vocab_size))
+
+ # Nodes in the construct graph which are used by training and
+ # evaluation to run/feed/fetch.
+ self._analogy_a = analogy_a
+ self._analogy_b = analogy_b
+ self._analogy_c = analogy_c
+ self._analogy_pred_idx = pred_idx
+ self._nearby_word = nearby_word
+ self._nearby_val = nearby_val
+ self._nearby_idx = nearby_idx
+
+ # Properly initialize all variables.
+ tf.initialize_all_variables().run()
+
+ self.saver = tf.train.Saver()
+
+ def _train_thread_body(self):
+ initial_epoch, = self._session.run([self._epoch])
+ while True:
+ _, epoch = self._session.run([self._train, self._epoch])
+ if epoch != initial_epoch:
+ break
+
+ def train(self):
+ """Train the model."""
+ opts = self._options
+
+ initial_epoch, initial_words = self._session.run([self._epoch, self._words])
+
+ workers = []
+ for _ in xrange(opts.concurrent_steps):
+ t = threading.Thread(target=self._train_thread_body)
+ t.start()
+ workers.append(t)
+
+ last_words, last_time = initial_words, time.time()
+ while True:
+ time.sleep(5)  # Reports our progress once in a while.
+ (epoch, step, words,
+ lr) = self._session.run([self._epoch, self.step, self._words, self._lr])
+ now = time.time()
+ last_words, last_time, rate = words, now, (words - last_words) / (
+ now - last_time)
+ print "Epoch %4d Step %8d: lr = %5.3f words/sec = %8.0f\r" % (epoch, step,
+ lr, rate),
+ sys.stdout.flush()
+ if epoch != initial_epoch:
+ break
+
+ for t in workers:
+ t.join()
+
+ def _predict(self, analogy):
+ """Predict the top 4 answers for analogy questions."""
+ idx, = self._session.run([self._analogy_pred_idx], {
+ self._analogy_a: analogy[:, 0],
+ self._analogy_b: analogy[:, 1],
+ self._analogy_c: analogy[:, 2]
+ })
+ return idx
+
+ def eval(self):
+ """Evaluate analogy questions and reports accuracy."""
+
+ # How many questions we get right at precision@1.
+ correct = 0
+
+ total = self._analogy_questions.shape[0]
+ start = 0
+ while start < total:
+ limit = start + 2500
+ sub = self._analogy_questions[start:limit, :]
+ idx = self._predict(sub)
+ start = limit
+ for question in xrange(sub.shape[0]):
+ for j in xrange(4):
+ if idx[question, j] == sub[question, 3]:
+ # Bingo! We predicted correctly. E.g., [italy, rome, france, paris].
+ correct += 1
+ break
+ elif idx[question, j] in sub[question, :3]:
+ # We need to skip words already in the question.
+ continue
+ else:
+ # The correct label is not the precision@1
+ break
+ print
+ print "Eval %4d/%d accuracy = %4.1f%%" % (correct, total,
+ correct * 100.0 / total)
+
+ def analogy(self, w0, w1, w2):
+ """Predict word w3 as in w0:w1 vs w2:w3."""
+ wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]])
+ idx = self._predict(wid)
+ for c in [self._id2word[i] for i in idx[0, :]]:
+ if c not in [w0, w1, w2]:
+ return c
+ return "unknown"
+
+ def nearby(self, words, num=20):
+ """Prints out nearby words given a list of words."""
+ ids = np.array([self._word2id.get(x, 0) for x in words])
+ vals, idx = self._session.run(
+ [self._nearby_val, self._nearby_idx], {self._nearby_word: ids})
+ for i in xrange(len(words)):
+ print "\n%s\n=====================================" % (words[i])
+ for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]):
+ print "%-20s %6.4f" % (self._id2word[neighbor], distance)
+
+
+def _start_shell(local_ns=None):
+ # An interactive shell is useful for debugging/development.
+ import IPython
+ user_ns = {}
+ if local_ns:
+ user_ns.update(local_ns)
+ user_ns.update(globals())
+ IPython.start_ipython(argv=[], user_ns=user_ns)
+
+
+def main(_):
+ """Train a word2vec model."""
+ opts = Options()
+ with tf.Graph().as_default(), tf.Session() as session:
+ model = Word2Vec(opts, session)
+ for _ in xrange(opts.epochs_to_train):
+ model.train() # Process one epoch
+ model.eval() # Eval analogies.
+ # Perform a final save.
+ model.saver.save(session, opts.save_path + "model", global_step=model.step)
+ if FLAGS.interactive:
+ # E.g.,
+ # [0]: model.analogy('france', 'paris', 'russia')
+ # [1]: model.nearby(['proton', 'elephant', 'maxwell'])
+ _start_shell(locals())
+
+
+if __name__ == "__main__":
+ tf.app.run()
diff --git a/tensorflow/models/image/alexnet/BUILD b/tensorflow/models/image/alexnet/BUILD
new file mode 100644
index 0000000000..e1b9cd6965
--- /dev/null
+++ b/tensorflow/models/image/alexnet/BUILD
@@ -0,0 +1,28 @@
+# Description:
+# Benchmark for AlexNet.
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+py_binary(
+ name = "alexnet_benchmark",
+ srcs = [
+ "alexnet_benchmark.py",
+ ],
+ deps = [
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/models/image/alexnet/__init__.py b/tensorflow/models/image/alexnet/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/models/image/alexnet/__init__.py
diff --git a/tensorflow/models/image/alexnet/alexnet_benchmark.py b/tensorflow/models/image/alexnet/alexnet_benchmark.py
new file mode 100644
index 0000000000..130948c4bf
--- /dev/null
+++ b/tensorflow/models/image/alexnet/alexnet_benchmark.py
@@ -0,0 +1,215 @@
+"""Timing benchmark for AlexNet inference.
+
+To run, use:
+ bazel run -c opt --config=cuda \
+ third_party/tensorflow/models/image/alexnet:alexnet_benchmark
+
+Across 100 steps on batch size = 128.
+
+Forward pass:
+Run on Tesla K40c: 145 +/- 1.5 ms / batch
+Run on Titan X: 70 +/- 0.1 ms / batch
+
+Forward-backward pass:
+Run on Tesla K40c: 480 +/- 48 ms / batch
+Run on Titan X: 244 +/- 30 ms / batch
+"""
+from datetime import datetime
+import math
+import time
+
+import tensorflow.python.platform
+import tensorflow as tf
+
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_integer('batch_size', 128,
+ """Batch size.""")
+tf.app.flags.DEFINE_integer('num_batches', 100,
+ """Number of batches to run.""")
+
+
+def print_activations(t):
+ print t.op.name, ' ', t.get_shape().as_list()
+
+
+def inference(images):
+ """Build the AlexNet model.
+
+ Args:
+ images: Images Tensor
+
+ Returns:
+ pool5: the last Tensor in the convolutional component of AlexNet.
+ parameters: a list of Tensors corresponding to the weights and biases of the
+ AlexNet model.
+ """
+ parameters = []
+ # conv1
+ with tf.name_scope('conv1') as scope:
+ kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], dtype=tf.float32,
+ stddev=1e-1), name='weights')
+ conv = tf.nn.conv2d(images, kernel, [1, 4, 4, 1], padding='VALID')
+ biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
+ trainable=True, name='biases')
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
+ conv1 = tf.nn.relu(bias, name=scope)
+ print_activations(conv1)
+ parameters += [kernel, biases]
+
+ # lrn1
+ # TODO(shlens, jiayq): Add a GPU version of local response normalization.
+
+ # pool1
+ pool1 = tf.nn.max_pool(conv1,
+ ksize=[1, 3, 3, 1],
+ strides=[1, 2, 2, 1],
+ padding='VALID',
+ name='pool1')
+ print_activations(pool1)
+
+ # conv2
+ with tf.name_scope('conv2') as scope:
+ kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], dtype=tf.float32,
+ stddev=1e-1), name='weights')
+ conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32),
+ trainable=True, name='biases')
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
+ conv2 = tf.nn.relu(bias, name=scope)
+ parameters += [kernel, biases]
+ print_activations(conv2)
+
+ # pool2
+ pool2 = tf.nn.max_pool(conv2,
+ ksize=[1, 3, 3, 1],
+ strides=[1, 2, 2, 1],
+ padding='VALID',
+ name='pool2')
+ print_activations(pool2)
+
+ # conv3
+ with tf.name_scope('conv3') as scope:
+ kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384],
+ dtype=tf.float32,
+ stddev=1e-1), name='weights')
+ conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32),
+ trainable=True, name='biases')
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
+ conv3 = tf.nn.relu(bias, name=scope)
+ parameters += [kernel, biases]
+ print_activations(conv3)
+
+ # conv4
+ with tf.name_scope('conv4') as scope:
+ kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256],
+ dtype=tf.float32,
+ stddev=1e-1), name='weights')
+ conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
+ trainable=True, name='biases')
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
+ conv4 = tf.nn.relu(bias, name=scope)
+ parameters += [kernel, biases]
+ print_activations(conv4)
+
+ # conv5
+ with tf.name_scope('conv5') as scope:
+ kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256],
+ dtype=tf.float32,
+ stddev=1e-1), name='weights')
+ conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
+ trainable=True, name='biases')
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
+ conv5 = tf.nn.relu(bias, name=scope)
+ parameters += [kernel, biases]
+ print_activations(conv5)
+
+ # pool5
+ pool5 = tf.nn.max_pool(conv5,
+ ksize=[1, 3, 3, 1],
+ strides=[1, 2, 2, 1],
+ padding='VALID',
+ name='pool5')
+ print_activations(pool5)
+
+ return pool5, parameters
+
+
+def time_tensorflow_run(session, target, info_string):
+ """Run the computation to obtain the target tensor and print timing stats.
+
+ Args:
+ session: the TensorFlow session to run the computation under.
+ target: the target Tensor that is passed to the session's run() function.
+ info_string: a string summarizing this run, to be printed with the stats.
+
+ Returns:
+ None
+ """
+ num_steps_burn_in = 10
+ total_duration = 0.0
+ total_duration_squared = 0.0
+ for i in xrange(FLAGS.num_batches + num_steps_burn_in):
+ start_time = time.time()
+ _ = session.run(target)
+ duration = time.time() - start_time
+ if i > num_steps_burn_in:
+ if not i % 10:
+ print ('%s: step %d, duration = %.3f' %
+ (datetime.now(), i - num_steps_burn_in, duration))
+ total_duration += duration
+ total_duration_squared += duration * duration
+ mn = total_duration / FLAGS.num_batches
+ vr = total_duration_squared / FLAGS.num_batches - mn * mn
+ sd = math.sqrt(vr)
+ print ('%s: %s across %d steps, %.3f +/- %.3f sec / batch' %
+ (datetime.now(), info_string, FLAGS.num_batches, mn, sd))
+
+
+
+def run_benchmark():
+ """Run the benchmark on AlexNet."""
+ with tf.Graph().as_default():
+ # Generate some dummy images.
+ image_size = 224
+ # Note that our padding definition is slightly different from cuda-convnet.
+ # In order to force the model to start with the same activations sizes,
+ # we add 3 to the image_size and employ VALID padding above.
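+ # With 227 x 227 inputs, the 11 x 11, stride-4 conv1 under VALID padding
+ # yields (227 - 11) / 4 + 1 = 55, i.e. the usual 55 x 55 activation map.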
+ images = tf.Variable(tf.random_normal([FLAGS.batch_size,
+ image_size + 3,
+ image_size + 3, 3],
+ dtype=tf.float32,
+ stddev=1e-1))
+
+ # Build a Graph that computes the logits predictions from the
+ # inference model.
+ pool5, parameters = inference(images)
+
+ # Build an initialization operation.
+ init = tf.initialize_all_variables()
+
+ # Start running operations on the Graph.
+ sess = tf.Session('')
+ sess.run(init)
+
+ # Run the forward benchmark.
+ time_tensorflow_run(sess, pool5, "Forward")
+
+ # Add a simple objective so we can calculate the backward pass.
+ objective = tf.nn.l2_loss(pool5)
+ # Compute the gradient with respect to all the parameters.
+ grad = tf.gradients(objective, parameters)
+ # Run the backward benchmark.
+ time_tensorflow_run(sess, grad, "Forward-backward")
+
+
+def main(_):
+ run_benchmark()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/models/image/cifar10/BUILD b/tensorflow/models/image/cifar10/BUILD
new file mode 100644
index 0000000000..adf9aaffd4
--- /dev/null
+++ b/tensorflow/models/image/cifar10/BUILD
@@ -0,0 +1,79 @@
+# Description:
+# Example TensorFlow models for CIFAR-10
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+py_library(
+ name = "cifar10_input",
+ srcs = ["cifar10_input.py"],
+ deps = [
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+py_test(
+ name = "cifar10_input_test",
+ srcs = ["cifar10_input_test.py"],
+ deps = [
+ ":cifar10_input",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/python:framework_test_lib",
+ "//tensorflow/python:platform_test",
+ ],
+)
+
+py_library(
+ name = "cifar10",
+ srcs = ["cifar10.py"],
+ deps = [
+ ":cifar10_input",
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+py_binary(
+ name = "cifar10_eval",
+ srcs = [
+ "cifar10_eval.py",
+ ],
+ visibility = ["//tensorflow:__subpackages__"],
+ deps = [
+ ":cifar10",
+ ],
+)
+
+py_binary(
+ name = "cifar10_train",
+ srcs = [
+ "cifar10_train.py",
+ ],
+ visibility = ["//tensorflow:__subpackages__"],
+ deps = [
+ ":cifar10",
+ ],
+)
+
+py_binary(
+ name = "cifar10_multi_gpu_train",
+ srcs = [
+ "cifar10_multi_gpu_train.py",
+ ],
+ visibility = ["//tensorflow:__subpackages__"],
+ deps = [
+ ":cifar10",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/models/image/cifar10/README.md b/tensorflow/models/image/cifar10/README.md
new file mode 100644
index 0000000000..67877aedc0
--- /dev/null
+++ b/tensorflow/models/image/cifar10/README.md
@@ -0,0 +1,10 @@
+CIFAR-10 is a common benchmark in machine learning for image recognition.
+
+http://www.cs.toronto.edu/~kriz/cifar.html
+
+Code in this directory demonstrates how to use TensorFlow to train and evaluate a convolutional neural network (CNN) on both CPU and GPU. We also demonstrate how to train a CNN over multiple GPUs.
+
+Detailed instructions on how to get started are available at:
+
+http://tensorflow.org/tutorials/deep_cnn/
+
diff --git a/tensorflow/models/image/cifar10/__init__.py b/tensorflow/models/image/cifar10/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/models/image/cifar10/__init__.py
diff --git a/tensorflow/models/image/cifar10/cifar10.py b/tensorflow/models/image/cifar10/cifar10.py
new file mode 100644
index 0000000000..7870080820
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10.py
@@ -0,0 +1,480 @@
+"""Builds the CIFAR-10 network.
+
+Summary of available functions:
+
+ # Compute input images and labels for training. If you would like to run
+ # evaluations, use inputs() instead.
+ inputs, labels = distorted_inputs()
+
+ # Compute inference on the model inputs to make a prediction.
+ predictions = inference(inputs)
+
+ # Compute the total loss of the prediction with respect to the labels.
+ loss = loss(predictions, labels)
+
+ # Create a graph to run one step of training with respect to the loss.
+ train_op = train(loss, global_step)
+"""
+# pylint: disable=missing-docstring
+import gzip
+import os
+import re
+import sys
+import tarfile
+import urllib
+
+import tensorflow.python.platform
+import tensorflow as tf
+
+from tensorflow.models.image.cifar10 import cifar10_input
+from tensorflow.python.platform import gfile
+
+FLAGS = tf.app.flags.FLAGS
+
+# Basic model parameters.
+tf.app.flags.DEFINE_integer('batch_size', 128,
+ """Number of images to process in a batch.""")
+tf.app.flags.DEFINE_string('data_dir', '/tmp/cifar10_data',
+ """Path to the CIFAR-10 data directory.""")
+
+# Process images of this size. Note that this differs from the original CIFAR
+# image size of 32 x 32. If one alters this number, then the entire model
+# architecture will change and any model would need to be retrained.
+IMAGE_SIZE = 24
+
+# Global constants describing the CIFAR-10 data set.
+NUM_CLASSES = 10
+NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
+NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000
+
+# Constants describing the training process.
+MOVING_AVERAGE_DECAY = 0.9999 # The decay to use for the moving average.
+NUM_EPOCHS_PER_DECAY = 350.0 # Epochs after which learning rate decays.
+LEARNING_RATE_DECAY_FACTOR = 0.1 # Learning rate decay factor.
+INITIAL_LEARNING_RATE = 0.1 # Initial learning rate.
+
+# If a model is trained with multiple GPUs, prefix all Op names with tower_name
+# to differentiate the operations. Note that this prefix is removed from the
+# names of the summaries when visualizing a model.
+TOWER_NAME = 'tower'
+
+DATA_URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'
+
+
+def _activation_summary(x):
+ """Helper to create summaries for activations.
+
+ Creates a summary that provides a histogram of activations.
+ Creates a summary that measures the sparsity of activations.
+
+ Args:
+ x: Tensor
+ Returns:
+ nothing
+ """
+ # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
+ # session. This helps the clarity of presentation on tensorboard.
+ tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
+ tf.histogram_summary(tensor_name + '/activations', x)
+ tf.scalar_summary(tensor_name + '/sparsity', tf.nn.zero_fraction(x))
+
+
+def _variable_on_cpu(name, shape, initializer):
+ """Helper to create a Variable stored on CPU memory.
+
+ Args:
+ name: name of the variable
+ shape: list of ints
+ initializer: initializer for Variable
+
+ Returns:
+ Variable Tensor
+ """
+ with tf.device('/cpu:0'):
+ var = tf.get_variable(name, shape, initializer=initializer)
+ return var
+
+
+def _variable_with_weight_decay(name, shape, stddev, wd):
+ """Helper to create an initialized Variable with weight decay.
+
+ Note that the Variable is initialized with a truncated normal distribution.
+ A weight decay is added only if one is specified.
+
+ Args:
+ name: name of the variable
+ shape: list of ints
+ stddev: standard deviation of a truncated Gaussian
+ wd: add L2Loss weight decay multiplied by this float. If None, weight
+ decay is not added for this Variable.
+
+ Returns:
+ Variable Tensor
+ """
+ var = _variable_on_cpu(name, shape,
+ tf.truncated_normal_initializer(stddev=stddev))
+ if wd:
+ weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
+ tf.add_to_collection('losses', weight_decay)
+ return var
+
+
+def _generate_image_and_label_batch(image, label, min_queue_examples):
+ """Construct a queued batch of images and labels.
+
+ Args:
+ image: 3-D Tensor of [IMAGE_SIZE, IMAGE_SIZE, 3] of type float32.
+ label: 1-D Tensor of type int32.
+ min_queue_examples: int32, minimum number of samples to retain
+ in the queue that provides batches of examples.
+
+ Returns:
+ images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
+ labels: Labels. 1D tensor of [batch_size] size.
+ """
+ # Create a queue that shuffles the examples, and then
+ # read 'FLAGS.batch_size' images + labels from the example queue.
+ num_preprocess_threads = 16
+ images, label_batch = tf.train.shuffle_batch(
+ [image, label],
+ batch_size=FLAGS.batch_size,
+ num_threads=num_preprocess_threads,
+ capacity=min_queue_examples + 3 * FLAGS.batch_size,
+ min_after_dequeue=min_queue_examples)
+
+ # Display the training images in the visualizer.
+ tf.image_summary('images', images)
+
+ return images, tf.reshape(label_batch, [FLAGS.batch_size])
+
+
+def distorted_inputs():
+ """Construct distorted input for CIFAR training using the Reader ops.
+
+ Raises:
+ ValueError: if no data_dir
+
+ Returns:
+ images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
+ labels: Labels. 1D tensor of [batch_size] size.
+ """
+ filenames = [os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin',
+ 'data_batch_%d.bin' % i)
+ for i in xrange(1, 6)]
+ for f in filenames:
+ if not gfile.Exists(f):
+ raise ValueError('Failed to find file: ' + f)
+
+ # Create a queue that produces the filenames to read.
+ filename_queue = tf.train.string_input_producer(filenames)
+
+ # Read examples from files in the filename queue.
+ read_input = cifar10_input.read_cifar10(filename_queue)
+ reshaped_image = tf.cast(read_input.uint8image, tf.float32)
+
+ height = IMAGE_SIZE
+ width = IMAGE_SIZE
+
+ # Image processing for training the network. Note the many random
+ # distortions applied to the image.
+
+ # Randomly crop a [height, width] section of the image.
+ distorted_image = tf.image.random_crop(reshaped_image, [height, width])
+
+ # Randomly flip the image horizontally.
+ distorted_image = tf.image.random_flip_left_right(distorted_image)
+
+ # Because these operations are not commutative, consider randomizing
+ # the order in which they are applied.
+ distorted_image = tf.image.random_brightness(distorted_image,
+ max_delta=63)
+ distorted_image = tf.image.random_contrast(distorted_image,
+ lower=0.2, upper=1.8)
+
+ # Subtract off the mean and divide by the variance of the pixels.
+ float_image = tf.image.per_image_whitening(distorted_image)
+
+ # Ensure that the random shuffling has good mixing properties.
+ min_fraction_of_examples_in_queue = 0.4
+ min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
+ min_fraction_of_examples_in_queue)
+ print ('Filling queue with %d CIFAR images before starting to train. '
+ 'This will take a few minutes.' % min_queue_examples)
+
+ # Generate a batch of images and labels by building up a queue of examples.
+ return _generate_image_and_label_batch(float_image, read_input.label,
+ min_queue_examples)
+
+
+def inputs(eval_data):
+ """Construct input for CIFAR evaluation using the Reader ops.
+
+ Args:
+ eval_data: bool, indicating if one should use the train or eval data set.
+
+ Raises:
+ ValueError: if no data_dir
+
+ Returns:
+ images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
+ labels: Labels. 1D tensor of [batch_size] size.
+ """
+ if not FLAGS.data_dir:
+ raise ValueError('Please supply a data_dir')
+
+ if not eval_data:
+ filenames = [os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin',
+ 'data_batch_%d.bin' % i)
+ for i in xrange(1, 6)]
+ num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
+ else:
+ filenames = [os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin',
+ 'test_batch.bin')]
+ num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
+
+ for f in filenames:
+ if not gfile.Exists(f):
+ raise ValueError('Failed to find file: ' + f)
+
+ # Create a queue that produces the filenames to read.
+ filename_queue = tf.train.string_input_producer(filenames)
+
+ # Read examples from files in the filename queue.
+ read_input = cifar10_input.read_cifar10(filename_queue)
+ reshaped_image = tf.cast(read_input.uint8image, tf.float32)
+
+ height = IMAGE_SIZE
+ width = IMAGE_SIZE
+
+ # Image processing for evaluation.
+ # Crop the central [height, width] of the image.
+ resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
+ width, height)
+
+ # Subtract off the mean and divide by the variance of the pixels.
+ float_image = tf.image.per_image_whitening(resized_image)
+
+ # Ensure that the random shuffling has good mixing properties.
+ min_fraction_of_examples_in_queue = 0.4
+ min_queue_examples = int(num_examples_per_epoch *
+ min_fraction_of_examples_in_queue)
+
+ # Generate a batch of images and labels by building up a queue of examples.
+ return _generate_image_and_label_batch(float_image, read_input.label,
+ min_queue_examples)
+
+
+def inference(images):
+ """Build the CIFAR-10 model.
+
+ Args:
+ images: Images returned from distorted_inputs() or inputs().
+
+ Returns:
+ Logits.
+ """
+ # We instantiate all variables using tf.get_variable() instead of
+ # tf.Variable() in order to share variables across multiple GPU training runs.
+ # If we only ran this model on a single GPU, we could simplify this function
+ # by replacing all instances of tf.get_variable() with tf.Variable().
+ #
+ # conv1
+ with tf.variable_scope('conv1') as scope:
+ kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
+ stddev=1e-4, wd=0.0)
+ conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape().as_list())
+ conv1 = tf.nn.relu(bias, name=scope.name)
+ _activation_summary(conv1)
+
+ # pool1
+ pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
+ padding='SAME', name='pool1')
+ # norm1
+ norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
+ name='norm1')
+
+ # conv2
+ with tf.variable_scope('conv2') as scope:
+ kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
+ stddev=1e-4, wd=0.0)
+ conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
+ biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
+ bias = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape().as_list())
+ conv2 = tf.nn.relu(bias, name=scope.name)
+ _activation_summary(conv2)
+
+ # norm2
+ norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
+ name='norm2')
+ # pool2
+ pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
+ strides=[1, 2, 2, 1], padding='SAME', name='pool2')
+
+ # local3
+ with tf.variable_scope('local3') as scope:
+ # Move everything into depth so we can perform a single matrix multiply.
+ dim = 1
+ for d in pool2.get_shape()[1:].as_list():
+ dim *= d
+ reshape = tf.reshape(pool2, [FLAGS.batch_size, dim])
+
+ weights = _variable_with_weight_decay('weights', shape=[dim, 384],
+ stddev=0.04, wd=0.004)
+ biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
+ local3 = tf.nn.relu_layer(reshape, weights, biases, name=scope.name)
+ _activation_summary(local3)
+
+ # local4
+ with tf.variable_scope('local4') as scope:
+ weights = _variable_with_weight_decay('weights', shape=[384, 192],
+ stddev=0.04, wd=0.004)
+ biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
+ local4 = tf.nn.relu_layer(local3, weights, biases, name=scope.name)
+ _activation_summary(local4)
+
+ # softmax, i.e. softmax(WX + b)
+ with tf.variable_scope('softmax_linear') as scope:
+ weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
+ stddev=1/192.0, wd=0.0)
+ biases = _variable_on_cpu('biases', [NUM_CLASSES],
+ tf.constant_initializer(0.0))
+ softmax_linear = tf.nn.xw_plus_b(local4, weights, biases, name=scope.name)
+ _activation_summary(softmax_linear)
+
+ return softmax_linear
+
+
+def loss(logits, labels):
+ """Add L2Loss to all the trainable variables.
+
+ Add summary for "Loss" and "Loss/avg".
+ Args:
+ logits: Logits from inference().
+ labels: Labels from distorted_inputs or inputs(). 1-D tensor
+ of shape [batch_size]
+
+ Returns:
+ Loss tensor of type float.
+ """
+ # Reshape the labels into a dense Tensor of
+ # shape [batch_size, NUM_CLASSES].
+ sparse_labels = tf.reshape(labels, [FLAGS.batch_size, 1])
+ indices = tf.reshape(tf.range(0, FLAGS.batch_size, 1), [FLAGS.batch_size, 1])
+ concated = tf.concat(1, [indices, sparse_labels])
+ dense_labels = tf.sparse_to_dense(concated,
+ [FLAGS.batch_size, NUM_CLASSES],
+ 1.0, 0.0)
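+ # For example, with batch_size 3 and labels [2, 0, 5], dense_labels is a
+ # 3 x NUM_CLASSES matrix with 1.0 at (0, 2), (1, 0) and (2, 5), and 0.0
+ # elsewhere.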
+
+ # Calculate the average cross entropy loss across the batch.
+ cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
+ logits, dense_labels, name='cross_entropy_per_example')
+ cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
+ tf.add_to_collection('losses', cross_entropy_mean)
+
+ # The total loss is defined as the cross entropy loss plus all of the weight
+ # decay terms (L2 loss).
+ return tf.add_n(tf.get_collection('losses'), name='total_loss')
+
+
+def _add_loss_summaries(total_loss):
+ """Add summaries for losses in CIFAR-10 model.
+
+ Generates moving average for all losses and associated summaries for
+ visualizing the performance of the network.
+
+ Args:
+ total_loss: Total loss from loss().
+ Returns:
+ loss_averages_op: op for generating moving averages of losses.
+ """
+ # Compute the moving average of all individual losses and the total loss.
+ loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
+ losses = tf.get_collection('losses')
+ loss_averages_op = loss_averages.apply(losses + [total_loss])
+
+ # Attach a scalar summary to all individual losses and the total loss; do the
+ # same for the averaged version of the losses.
+ for l in losses + [total_loss]:
+ # Name each loss as '(raw)' and name the moving average version of the loss
+ # as the original loss name.
+ tf.scalar_summary(l.op.name +' (raw)', l)
+ tf.scalar_summary(l.op.name, loss_averages.average(l))
+
+ return loss_averages_op
+
+
+def train(total_loss, global_step):
+ """Train CIFAR-10 model.
+
+ Create an optimizer and apply to all trainable variables. Add moving
+ average for all trainable variables.
+
+ Args:
+ total_loss: Total loss from loss().
+ global_step: Integer Variable counting the number of training steps
+ processed.
+ Returns:
+ train_op: op for training.
+ """
+ # Variables that affect learning rate.
+ num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
+ decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
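+ # With the defaults (50000 train examples, batch size 128, 350 epochs per
+ # decay) this is roughly 136,700 steps between learning-rate decays.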
+
+ # Decay the learning rate exponentially based on the number of steps.
+ lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
+ global_step,
+ decay_steps,
+ LEARNING_RATE_DECAY_FACTOR,
+ staircase=True)
+ tf.scalar_summary('learning_rate', lr)
+
+ # Generate moving averages of all losses and associated summaries.
+ loss_averages_op = _add_loss_summaries(total_loss)
+
+ # Compute gradients.
+ with tf.control_dependencies([loss_averages_op]):
+ opt = tf.train.GradientDescentOptimizer(lr)
+ grads = opt.compute_gradients(total_loss)
+
+ # Apply gradients.
+ apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
+
+ # Add histograms for trainable variables.
+ for var in tf.trainable_variables():
+ tf.histogram_summary(var.op.name, var)
+
+ # Add histograms for gradients.
+ for grad, var in grads:
+ if grad:
+ tf.histogram_summary(var.op.name + '/gradients', grad)
+
+ # Track the moving averages of all trainable variables.
+ variable_averages = tf.train.ExponentialMovingAverage(
+ MOVING_AVERAGE_DECAY, global_step)
+ variables_averages_op = variable_averages.apply(tf.trainable_variables())
+
+ with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
+ train_op = tf.no_op(name='train')
+
+ return train_op
+
+
+def maybe_download_and_extract():
+ """Download and extract the tarball from Alex's website."""
+ dest_directory = FLAGS.data_dir
+ if not os.path.exists(dest_directory):
+ os.makedirs(dest_directory)
+ filename = DATA_URL.split('/')[-1]
+ filepath = os.path.join(dest_directory, filename)
+ if not os.path.exists(filepath):
+ def _progress(count, block_size, total_size):
+ sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
+ float(count * block_size) / float(total_size) * 100.0))
+ sys.stdout.flush()
+ filepath, _ = urllib.urlretrieve(DATA_URL, filepath, reporthook=_progress)
+ print
+ statinfo = os.stat(filepath)
+ print 'Successfully downloaded', filename, statinfo.st_size, 'bytes.'
+ tarfile.open(filepath, 'r:gz').extractall(dest_directory)
diff --git a/tensorflow/models/image/cifar10/cifar10_eval.py b/tensorflow/models/image/cifar10/cifar10_eval.py
new file mode 100644
index 0000000000..73c224191d
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10_eval.py
@@ -0,0 +1,148 @@
+"""Evaluation for CIFAR-10.
+
+Accuracy:
+cifar10_train.py achieves 83.0% accuracy after 100K steps (256 epochs
+of data) as judged by cifar10_eval.py.
+
+Speed:
+On a single Tesla K40, cifar10_train.py processes a single batch of 128 images
+in 0.25-0.35 sec (i.e. 350 - 600 images/sec). The model reaches ~86%
+accuracy after 100K steps in 8 hours of training time.
+
+Usage:
+Please see the tutorial and website for how to download the CIFAR-10
+data set, compile the program and train the model.
+
+http://tensorflow.org/tutorials/deep_cnn/
+"""
+from datetime import datetime
+import math
+import time
+
+import tensorflow.python.platform
+from tensorflow.python.platform import gfile
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.models.image.cifar10 import cifar10
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string('eval_dir', '/tmp/cifar10_eval',
+ """Directory where to write event logs.""")
+tf.app.flags.DEFINE_string('eval_data', 'test',
+ """Either 'test' or 'train_eval'.""")
+tf.app.flags.DEFINE_string('checkpoint_dir', '/tmp/cifar10_train',
+ """Directory where to read model checkpoints.""")
+tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 5,
+ """How often to run the eval.""")
+tf.app.flags.DEFINE_integer('num_examples', 10000,
+ """Number of examples to run.""")
+tf.app.flags.DEFINE_boolean('run_once', False,
+ """Whether to run eval only once.""")
+
+
+def eval_once(saver, summary_writer, top_k_op, summary_op):
+ """Run Eval once.
+
+ Args:
+ saver: Saver.
+ summary_writer: Summary writer.
+ top_k_op: Top K op.
+ summary_op: Summary op.
+ """
+ with tf.Session() as sess:
+ ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
+ if ckpt and ckpt.model_checkpoint_path:
+ # Restores from checkpoint
+ saver.restore(sess, ckpt.model_checkpoint_path)
+ # Assuming model_checkpoint_path looks something like:
+ # /my-favorite-path/cifar10_train/model.ckpt-0,
+ # extract global_step from it.
+ global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
+ else:
+ print 'No checkpoint file found'
+ return
+
+ # Start the queue runners.
+ coord = tf.train.Coordinator()
+ try:
+ threads = []
+ for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
+ threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
+ start=True))
+
+ num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
+ true_count = 0 # Counts the number of correct predictions.
+ total_sample_count = num_iter * FLAGS.batch_size
+ step = 0
+ while step < num_iter and not coord.should_stop():
+ predictions = sess.run([top_k_op])
+ true_count += np.sum(predictions)
+ step += 1
+
+ # Compute precision @ 1.
+ precision = float(true_count) / float(total_sample_count)
+ print '%s: precision @ 1 = %.3f' % (datetime.now(), precision)
+
+ summary = tf.Summary()
+ summary.ParseFromString(sess.run(summary_op))
+ summary.value.add(tag='Precision @ 1', simple_value=precision)
+ summary_writer.add_summary(summary, global_step)
+ except Exception, e: # pylint: disable=broad-except
+ coord.request_stop(e)
+
+ coord.request_stop()
+ coord.join(threads, stop_grace_period_secs=10)
+
+
+def evaluate():
+ """Eval CIFAR-10 for a number of steps."""
+ with tf.Graph().as_default():
+ # Get images and labels for CIFAR-10.
+ eval_data = FLAGS.eval_data == 'test'
+ images, labels = cifar10.inputs(eval_data=eval_data)
+
+ # Build a Graph that computes the logits predictions from the
+ # inference model.
+ logits = cifar10.inference(images)
+
+ # Calculate predictions.
+ top_k_op = tf.nn.in_top_k(logits, labels, 1)
+
+ # Restore the moving average version of the learned variables for eval.
+ variable_averages = tf.train.ExponentialMovingAverage(
+ cifar10.MOVING_AVERAGE_DECAY)
+ variables_to_restore = {}
+ for v in tf.all_variables():
+ if v in tf.trainable_variables():
+ restore_name = variable_averages.average_name(v)
+ else:
+ restore_name = v.op.name
+ variables_to_restore[restore_name] = v
+ saver = tf.train.Saver(variables_to_restore)
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ graph_def = tf.get_default_graph().as_graph_def()
+ summary_writer = tf.train.SummaryWriter(FLAGS.eval_dir,
+ graph_def=graph_def)
+
+ while True:
+ eval_once(saver, summary_writer, top_k_op, summary_op)
+ if FLAGS.run_once:
+ break
+ time.sleep(FLAGS.eval_interval_secs)
+
+
+def main(argv=None): # pylint: disable=unused-argument
+ cifar10.maybe_download_and_extract()
+ if gfile.Exists(FLAGS.eval_dir):
+ gfile.DeleteRecursively(FLAGS.eval_dir)
+ gfile.MakeDirs(FLAGS.eval_dir)
+ evaluate()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/models/image/cifar10/cifar10_input.py b/tensorflow/models/image/cifar10/cifar10_input.py
new file mode 100644
index 0000000000..686f1bf987
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10_input.py
@@ -0,0 +1,65 @@
+"""Routine for decoding the CIFAR-10 binary file format."""
+
+import tensorflow.python.platform
+import tensorflow as tf
+
+
+def read_cifar10(filename_queue):
+ """Reads and parses examples from CIFAR10 data files.
+
+ Recommendation: if you want N-way read parallelism, call this function
+ N times. This will give you N independent Readers reading different
+ files & positions within those files, which will give better mixing of
+ examples.
+
+ Args:
+ filename_queue: A queue of strings with the filenames to read from.
+
+ Returns:
+ An object representing a single example, with the following fields:
+ height: number of rows in the result (32)
+ width: number of columns in the result (32)
+ depth: number of color channels in the result (3)
+ key: a scalar string Tensor describing the filename & record number
+ for this example.
+ label: an int32 Tensor with the label in the range 0..9.
+ uint8image: a [height, width, depth] uint8 Tensor with the image data
+ """
+
+ class CIFAR10Record(object):
+ pass
+ result = CIFAR10Record()
+
+ # Dimensions of the images in the CIFAR-10 dataset.
+ # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
+ # input format.
+ label_bytes = 1 # 2 for CIFAR-100
+ result.height = 32
+ result.width = 32
+ result.depth = 3
+ image_bytes = result.height * result.width * result.depth
+ # Every record consists of a label followed by the image, with a
+ # fixed number of bytes for each.
+ record_bytes = label_bytes + image_bytes
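+ # For CIFAR-10 this is 1 + 3 * 32 * 32 = 3073 bytes per record.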
+
+ # Read a record, getting filenames from the filename_queue. No
+ # header or footer in the CIFAR-10 format, so we leave header_bytes
+ # and footer_bytes at their default of 0.
+ reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
+ result.key, value = reader.read(filename_queue)
+
+ # Convert from a string to a vector of uint8 that is record_bytes long.
+ record_bytes = tf.decode_raw(value, tf.uint8)
+
+ # The first bytes represent the label, which we convert from uint8->int32.
+ result.label = tf.cast(
+ tf.slice(record_bytes, [0], [label_bytes]), tf.int32)
+
+ # The remaining bytes after the label represent the image, which we reshape
+ # from [depth * height * width] to [depth, height, width].
+ depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
+ [result.depth, result.height, result.width])
+ # Convert from [depth, height, width] to [height, width, depth].
+ result.uint8image = tf.transpose(depth_major, [1, 2, 0])
+
+ return result
diff --git a/tensorflow/models/image/cifar10/cifar10_input_test.py b/tensorflow/models/image/cifar10/cifar10_input_test.py
new file mode 100644
index 0000000000..d43f5aedcf
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10_input_test.py
@@ -0,0 +1,49 @@
+"""Tests for cifar10 input."""
+
+import os
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from tensorflow.models.image.cifar10 import cifar10_input
+
+
+class CIFAR10InputTest(tf.test.TestCase):
+
+ def _record(self, label, red, green, blue):
+ image_size = 32 * 32
+ record = "%s%s%s%s" % (chr(label), chr(red) * image_size,
+ chr(green) * image_size, chr(blue) * image_size)
+ expected = [[[red, green, blue]] * 32] * 32
+ return record, expected
+
+ def testSimple(self):
+ labels = [9, 3, 0]
+ records = [self._record(labels[0], 0, 128, 255),
+ self._record(labels[1], 255, 0, 1),
+ self._record(labels[2], 254, 255, 0)]
+ contents = "".join([record for record, _ in records])
+ expected = [expected for _, expected in records]
+ filename = os.path.join(self.get_temp_dir(), "cifar")
+ open(filename, "w").write(contents)
+
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(99, [tf.string], shapes=())
+ q.enqueue([filename]).run()
+ q.close().run()
+ result = cifar10_input.read_cifar10(q)
+
+ for i in range(3):
+ key, label, uint8image = sess.run([
+ result.key, result.label, result.uint8image])
+ self.assertEqual("%s:%d" % (filename, i), key)
+ self.assertEqual(labels[i], label)
+ self.assertAllEqual(expected[i], uint8image)
+
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run([result.key, result.uint8image])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py b/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py
new file mode 100644
index 0000000000..54bc41f444
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py
@@ -0,0 +1,265 @@
+"""A binary to train CIFAR-10 using multiple GPU's with synchronous updates.
+
+Accuracy:
+cifar10_multi_gpu_train.py achieves ~86% accuracy after 100K steps (256
+epochs of data) as judged by cifar10_eval.py.
+
+Speed: With batch_size 128.
+
+System | Step Time (sec/batch) | Accuracy
+--------------------------------------------------------------------
+1 Tesla K20m | 0.35-0.60 | ~86% at 60K steps (5 hours)
+1 Tesla K40m | 0.25-0.35 | ~86% at 100K steps (4 hours)
+2 Tesla K20m | 0.13-0.20 | ~84% at 30K steps (2.5 hours)
+3 Tesla K20m | 0.13-0.18 | ~84% at 30K steps
+4 Tesla K20m | ~0.10 | ~84% at 30K steps
+
+Usage:
+Please see the tutorial and website for how to download the CIFAR-10
+data set, compile the program and train the model.
+
+http://tensorflow.org/tutorials/deep_cnn/
+"""
+from datetime import datetime
+import os.path
+import re
+import time
+
+# pylint: disable=unused-import,g-bad-import-order
+import tensorflow.python.platform
+from tensorflow.python.platform import gfile
+import numpy as np
+import tensorflow as tf
+from tensorflow.models.image.cifar10 import cifar10
+# pylint: disable=unused-import,g-bad-import-order
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
+ """Directory where to write event logs """
+ """and checkpoint.""")
+tf.app.flags.DEFINE_integer('max_steps', 1000000,
+ """Number of batches to run.""")
+tf.app.flags.DEFINE_integer('num_gpus', 1,
+ """How many GPUs to use.""")
+tf.app.flags.DEFINE_boolean('log_device_placement', False,
+ """Whether to log device placement.""")
+
+
+def tower_loss(scope):
+ """Calculate the total loss on a single tower running the CIFAR model.
+
+ Args:
+ scope: unique prefix string identifying the CIFAR tower, e.g. 'tower_0'
+
+ Returns:
+ Tensor of shape [] containing the total loss for a batch of data
+ """
+ # Get images and labels for CIFAR-10.
+ images, labels = cifar10.distorted_inputs()
+
+ # Build inference Graph.
+ logits = cifar10.inference(images)
+
+ # Build the portion of the Graph calculating the losses. Note that we will
+ # assemble the total_loss using a custom function below.
+ _ = cifar10.loss(logits, labels)
+
+ # Assemble all of the losses for the current tower only.
+ losses = tf.get_collection('losses', scope)
+
+ # Calculate the total loss for the current tower.
+ total_loss = tf.add_n(losses, name='total_loss')
+
+ # Compute the moving average of all individual losses and the total loss.
+ loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
+ loss_averages_op = loss_averages.apply(losses + [total_loss])
+
+ # Attach a scalar summary to all individual losses and the total loss; do the
+ # same for the averaged version of the losses.
+ for l in losses + [total_loss]:
+ # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
+ # session. This helps the clarity of presentation on tensorboard.
+ loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
+ # Name each loss as '(raw)' and name the moving average version of the loss
+ # as the original loss name.
+ tf.scalar_summary(loss_name + ' (raw)', l)
+ tf.scalar_summary(loss_name, loss_averages.average(l))
+
+ with tf.control_dependencies([loss_averages_op]):
+ total_loss = tf.identity(total_loss)
+ return total_loss
+
+
+def average_gradients(tower_grads):
+ """Calculate the average gradient for each shared variable across all towers.
+
+ Note that this function provides a synchronization point across all towers.
+
+ Args:
+ tower_grads: List of lists of (gradient, variable) tuples. The outer list
+ is over the individual towers. The inner list is over the (gradient,
+ variable) pairs computed for each variable on that tower.
+ Returns:
+ List of pairs of (gradient, variable) where the gradient has been averaged
+ across all towers.
+ """
+ average_grads = []
+ for grad_and_vars in zip(*tower_grads):
+ # Note that each grad_and_vars looks like the following:
+ # ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
+ grads = []
+ for g, _ in grad_and_vars:
+ # Add a 0th dimension to the gradients to represent the tower.
+ expanded_g = tf.expand_dims(g, 0)
+
+ # Append on a 'tower' dimension which we will average over below.
+ grads.append(expanded_g)
+
+ # Average over the 'tower' dimension.
+ grad = tf.concat(0, grads)
+ grad = tf.reduce_mean(grad, 0)
+
+ # Keep in mind that the Variables are redundant because they are shared
+ # across towers, so we will just return the first tower's pointer to
+ # the Variable.
+ v = grad_and_vars[0][1]
+ grad_and_var = (grad, v)
+ average_grads.append(grad_and_var)
+ return average_grads
+
+
+def train():
+ """Train CIFAR-10 for a number of steps."""
+ with tf.Graph().as_default(), tf.device('/cpu:0'):
+ # Create a variable to count the number of train() calls. This equals the
+ # number of batches processed * FLAGS.num_gpus.
+ global_step = tf.get_variable(
+ 'global_step', [],
+ initializer=tf.constant_initializer(0), trainable=False)
+
+ # Calculate the learning rate schedule.
+ num_batches_per_epoch = (cifar10.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
+ FLAGS.batch_size)
+ decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY)
+
+ # Decay the learning rate exponentially based on the number of steps.
+ lr = tf.train.exponential_decay(cifar10.INITIAL_LEARNING_RATE,
+ global_step,
+ decay_steps,
+ cifar10.LEARNING_RATE_DECAY_FACTOR,
+ staircase=True)
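+ # With staircase=True this evaluates to
+ #   INITIAL_LEARNING_RATE * LEARNING_RATE_DECAY_FACTOR ** floor(global_step / decay_steps)
+ # so the rate drops by the decay factor once every NUM_EPOCHS_PER_DECAY epochs.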
+
+ # Create an optimizer that performs gradient descent.
+ opt = tf.train.GradientDescentOptimizer(lr)
+
+ # Calculate the gradients for each model tower.
+ tower_grads = []
+ for i in xrange(FLAGS.num_gpus):
+ with tf.device('/gpu:%d' % i):
+ with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
+ # Calculate the loss for one tower of the CIFAR model. This function
+ # constructs the entire CIFAR model but shares the variables across
+ # all towers.
+ loss = tower_loss(scope)
+
+ # Reuse variables for the next tower.
+ tf.get_variable_scope().reuse_variables()
+
+ # Retain the summaries from the final tower.
+ summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
+
+ # Calculate the gradients for the batch of data on this CIFAR tower.
+ grads = opt.compute_gradients(loss)
+
+ # Keep track of the gradients across all towers.
+ tower_grads.append(grads)
+
+ # We must calculate the mean of each gradient. Note that this is the
+ # synchronization point across all towers.
+ grads = average_gradients(tower_grads)
+
+ # Add a summary to track the learning rate.
+ summaries.append(tf.scalar_summary('learning_rate', lr))
+
+ # Add histograms for gradients.
+ for grad, var in grads:
+ if grad:
+ summaries.append(
+ tf.histogram_summary(var.op.name + '/gradients', grad))
+
+ # Apply the gradients to adjust the shared variables.
+ apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
+
+ # Add histograms for trainable variables.
+ for var in tf.trainable_variables():
+ summaries.append(tf.histogram_summary(var.op.name, var))
+
+ # Track the moving averages of all trainable variables.
+ variable_averages = tf.train.ExponentialMovingAverage(
+ cifar10.MOVING_AVERAGE_DECAY, global_step)
+ variables_averages_op = variable_averages.apply(tf.trainable_variables())
+
+ # Group all updates into a single train op.
+ train_op = tf.group(apply_gradient_op, variables_averages_op)
+
+ # Create a saver.
+ saver = tf.train.Saver(tf.all_variables())
+
+ # Build the summary operation from the last tower summaries.
+ summary_op = tf.merge_summary(summaries)
+
+ # Build an initialization operation to run below.
+ init = tf.initialize_all_variables()
+
+ # Start running operations on the Graph. allow_soft_placement must be set to
+ # True to build towers on GPU, as some of the ops do not have GPU
+ # implementations.
+ sess = tf.Session(config=tf.ConfigProto(
+ allow_soft_placement=True,
+ log_device_placement=FLAGS.log_device_placement))
+ sess.run(init)
+
+ # Start the queue runners.
+ tf.train.start_queue_runners(sess=sess)
+
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ for step in xrange(FLAGS.max_steps):
+ start_time = time.time()
+ _, loss_value = sess.run([train_op, loss])
+ duration = time.time() - start_time
+
+ assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
+
+ if step % 10 == 0:
+ num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
+ examples_per_sec = num_examples_per_step / float(duration)
+ sec_per_batch = float(duration) / FLAGS.num_gpus
+
+ format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
+ 'sec/batch)')
+ print (format_str % (datetime.now(), step, loss_value,
+ examples_per_sec, sec_per_batch))
+
+ if step % 100 == 0:
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+
+ # Save the model checkpoint periodically.
+ if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
+ checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
+ saver.save(sess, checkpoint_path, global_step=step)
+
+
+def main(argv=None): # pylint: disable=unused-argument
+ cifar10.maybe_download_and_extract()
+ if gfile.Exists(FLAGS.train_dir):
+ gfile.DeleteRecursively(FLAGS.train_dir)
+ gfile.MakeDirs(FLAGS.train_dir)
+ train()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/models/image/cifar10/cifar10_train.py b/tensorflow/models/image/cifar10/cifar10_train.py
new file mode 100644
index 0000000000..bcb6eeae58
--- /dev/null
+++ b/tensorflow/models/image/cifar10/cifar10_train.py
@@ -0,0 +1,119 @@
+"""A binary to train CIFAR-10 using a single GPU.
+
+Accuracy:
+cifar10_train.py achieves ~86% accuracy after 100K steps (256 epochs of
+data) as judged by cifar10_eval.py.
+
+Speed: With batch_size 128.
+
+System | Step Time (sec/batch) | Accuracy
+------------------------------------------------------------------
+1 Tesla K20m | 0.35-0.60 | ~86% at 60K steps (5 hours)
+1 Tesla K40m | 0.25-0.35 | ~86% at 100K steps (4 hours)
+
+Usage:
+Please see the tutorial and website for how to download the CIFAR-10
+data set, compile the program and train the model.
+
+http://tensorflow.org/tutorials/deep_cnn/
+"""
+from datetime import datetime
+import os.path
+import time
+
+import tensorflow.python.platform
+from tensorflow.python.platform import gfile
+
+import numpy as np
+
+import tensorflow as tf
+
+from tensorflow.models.image.cifar10 import cifar10
+
+FLAGS = tf.app.flags.FLAGS
+
+tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
+ """Directory where to write event logs """
+ """and checkpoint.""")
+tf.app.flags.DEFINE_integer('max_steps', 1000000,
+ """Number of batches to run.""")
+tf.app.flags.DEFINE_boolean('log_device_placement', False,
+ """Whether to log device placement.""")
+
+
+def train():
+ """Train CIFAR-10 for a number of steps."""
+ with tf.Graph().as_default():
+ global_step = tf.Variable(0, trainable=False)
+
+ # Get images and labels for CIFAR-10.
+ images, labels = cifar10.distorted_inputs()
+
+ # Build a Graph that computes the logits predictions from the
+ # inference model.
+ logits = cifar10.inference(images)
+
+ # Calculate loss.
+ loss = cifar10.loss(logits, labels)
+
+ # Build a Graph that trains the model with one batch of examples and
+ # updates the model parameters.
+ train_op = cifar10.train(loss, global_step)
+
+ # Create a saver.
+ saver = tf.train.Saver(tf.all_variables())
+
+ # Build the summary operation based on the TF collection of Summaries.
+ summary_op = tf.merge_all_summaries()
+
+ # Build an initialization operation to run below.
+ init = tf.initialize_all_variables()
+
+ # Start running operations on the Graph.
+ sess = tf.Session(config=tf.ConfigProto(
+ log_device_placement=FLAGS.log_device_placement))
+ sess.run(init)
+
+ # Start the queue runners.
+ tf.train.start_queue_runners(sess=sess)
+
+ summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
+ graph_def=sess.graph_def)
+
+ for step in xrange(FLAGS.max_steps):
+ start_time = time.time()
+ _, loss_value = sess.run([train_op, loss])
+ duration = time.time() - start_time
+
+ assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
+
+ if step % 10 == 0:
+ num_examples_per_step = FLAGS.batch_size
+ examples_per_sec = num_examples_per_step / float(duration)
+ sec_per_batch = float(duration)
+
+ format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
+ 'sec/batch)')
+ print (format_str % (datetime.now(), step, loss_value,
+ examples_per_sec, sec_per_batch))
+
+ if step % 100 == 0:
+ summary_str = sess.run(summary_op)
+ summary_writer.add_summary(summary_str, step)
+
+ # Save the model checkpoint periodically.
+ if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
+ checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
+ saver.save(sess, checkpoint_path, global_step=step)
+
+
+def main(argv=None): # pylint: disable=unused-argument
+ cifar10.maybe_download_and_extract()
+ if gfile.Exists(FLAGS.train_dir):
+ gfile.DeleteRecursively(FLAGS.train_dir)
+ gfile.MakeDirs(FLAGS.train_dir)
+ train()
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/models/image/mnist/BUILD b/tensorflow/models/image/mnist/BUILD
new file mode 100644
index 0000000000..76b31d0feb
--- /dev/null
+++ b/tensorflow/models/image/mnist/BUILD
@@ -0,0 +1,44 @@
+# Description:
+# Example TensorFlow models for MNIST that achieve high accuracy
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+py_binary(
+ name = "convolutional",
+ srcs = [
+ "convolutional.py",
+ ],
+ visibility = ["//tensorflow:__subpackages__"],
+ deps = [
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+py_test(
+ name = "convolutional_test",
+ size = "medium",
+ srcs = [
+ "convolutional.py",
+ ],
+ args = [
+ "--self_test=True",
+ ],
+ main = "convolutional.py",
+ deps = [
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/models/image/mnist/__init__.py b/tensorflow/models/image/mnist/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/models/image/mnist/__init__.py
diff --git a/tensorflow/models/image/mnist/convolutional.py b/tensorflow/models/image/mnist/convolutional.py
new file mode 100644
index 0000000000..8fb0e4dfb4
--- /dev/null
+++ b/tensorflow/models/image/mnist/convolutional.py
@@ -0,0 +1,270 @@
+"""Simple, end-to-end, LeNet-5-like convolutional MNIST model example.
+
+This should achieve a test error of 0.8%. Please keep this model as simple and
+linear as possible; it is meant as a tutorial for simple convolutional models.
+Run with --self_test on the command line to execute a short self-test.
+"""
+import gzip
+import os
+import sys
+import urllib
+
+import tensorflow.python.platform
+
+import numpy
+import tensorflow as tf
+
+SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'
+WORK_DIRECTORY = 'data'
+IMAGE_SIZE = 28
+NUM_CHANNELS = 1
+PIXEL_DEPTH = 255
+NUM_LABELS = 10
+VALIDATION_SIZE = 5000 # Size of the validation set.
+SEED = 66478 # Set to None for random seed.
+BATCH_SIZE = 64
+NUM_EPOCHS = 10
+
+
+tf.app.flags.DEFINE_boolean("self_test", False, "True if running a self test.")
+FLAGS = tf.app.flags.FLAGS
+
+
+def maybe_download(filename):
+ """Download the data from Yann's website, unless it's already here."""
+ if not os.path.exists(WORK_DIRECTORY):
+ os.mkdir(WORK_DIRECTORY)
+ filepath = os.path.join(WORK_DIRECTORY, filename)
+ if not os.path.exists(filepath):
+ filepath, _ = urllib.urlretrieve(SOURCE_URL + filename, filepath)
+ statinfo = os.stat(filepath)
+ print 'Successfully downloaded', filename, statinfo.st_size, 'bytes.'
+ return filepath
+
+
+def extract_data(filename, num_images):
+ """Extract the images into a 4D tensor [image index, y, x, channels].
+
+ Values are rescaled from [0, 255] down to [-0.5, 0.5].
+ """
+ print 'Extracting', filename
+ with gzip.open(filename) as bytestream:
+ bytestream.read(16)
+ buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images)
+ data = numpy.frombuffer(buf, dtype=numpy.uint8).astype(numpy.float32)
+ data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH
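+ # e.g. a raw pixel value of 0 maps to -0.5 and 255 maps to 0.5.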
+ data = data.reshape(num_images, IMAGE_SIZE, IMAGE_SIZE, 1)
+ return data
+
+
+def extract_labels(filename, num_images):
+ """Extract the labels into a 1-hot matrix [image index, label index]."""
+ print 'Extracting', filename
+ with gzip.open(filename) as bytestream:
+ bytestream.read(8)
+ buf = bytestream.read(1 * num_images)
+ labels = numpy.frombuffer(buf, dtype=numpy.uint8)
+ # Convert to dense 1-hot representation.
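+ # e.g. a label of 3 becomes the row [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].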
+ return (numpy.arange(NUM_LABELS) == labels[:, None]).astype(numpy.float32)
+
+
+def fake_data(num_images):
+ """Generate a fake dataset that matches the dimensions of MNIST."""
+ data = numpy.ndarray(
+ shape=(num_images, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS),
+ dtype=numpy.float32)
+ labels = numpy.zeros(shape=(num_images, NUM_LABELS), dtype=numpy.float32)
+ for image in xrange(num_images):
+ label = image % 2
+ data[image, :, :, 0] = label - 0.5
+ labels[image, label] = 1.0
+ return data, labels
+
+
+def error_rate(predictions, labels):
+ """Return the error rate based on dense predictions and 1-hot labels."""
+ return 100.0 - (
+ 100.0 *
+ numpy.sum(numpy.argmax(predictions, 1) == numpy.argmax(labels, 1)) /
+ predictions.shape[0])
+
+
+def main(argv=None): # pylint: disable=unused-argument
+ if FLAGS.self_test:
+ print 'Running self-test.'
+ train_data, train_labels = fake_data(256)
+ validation_data, validation_labels = fake_data(16)
+ test_data, test_labels = fake_data(256)
+ num_epochs = 1
+ else:
+ # Get the data.
+ train_data_filename = maybe_download('train-images-idx3-ubyte.gz')
+ train_labels_filename = maybe_download('train-labels-idx1-ubyte.gz')
+ test_data_filename = maybe_download('t10k-images-idx3-ubyte.gz')
+ test_labels_filename = maybe_download('t10k-labels-idx1-ubyte.gz')
+
+ # Extract it into numpy arrays.
+ train_data = extract_data(train_data_filename, 60000)
+ train_labels = extract_labels(train_labels_filename, 60000)
+ test_data = extract_data(test_data_filename, 10000)
+ test_labels = extract_labels(test_labels_filename, 10000)
+
+ # Generate a validation set.
+ validation_data = train_data[:VALIDATION_SIZE, :, :, :]
+ validation_labels = train_labels[:VALIDATION_SIZE]
+ train_data = train_data[VALIDATION_SIZE:, :, :, :]
+ train_labels = train_labels[VALIDATION_SIZE:]
+ num_epochs = NUM_EPOCHS
+ train_size = train_labels.shape[0]
+
+ # This is where training samples and labels are fed to the graph.
+ # These placeholder nodes will be fed a batch of training data at each
+ # training step using the {feed_dict} argument to the Run() call below.
+ train_data_node = tf.placeholder(
+ tf.float32,
+ shape=(BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
+ train_labels_node = tf.placeholder(tf.float32,
+ shape=(BATCH_SIZE, NUM_LABELS))
+ # For the validation and test data, we'll just hold the entire dataset in
+ # one constant node.
+ validation_data_node = tf.constant(validation_data)
+ test_data_node = tf.constant(test_data)
+
+ # The variables below hold all the trainable weights. They are passed an
+ # initial value which will be assigned when we call:
+ # {tf.initialize_all_variables().run()}
+ conv1_weights = tf.Variable(
+ tf.truncated_normal([5, 5, NUM_CHANNELS, 32], # 5x5 filter, depth 32.
+ stddev=0.1,
+ seed=SEED))
+ conv1_biases = tf.Variable(tf.zeros([32]))
+ conv2_weights = tf.Variable(
+ tf.truncated_normal([5, 5, 32, 64],
+ stddev=0.1,
+ seed=SEED))
+ conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]))
+ fc1_weights = tf.Variable( # fully connected, depth 512.
+ tf.truncated_normal([IMAGE_SIZE / 4 * IMAGE_SIZE / 4 * 64, 512],
+ stddev=0.1,
+ seed=SEED))
+ fc1_biases = tf.Variable(tf.constant(0.1, shape=[512]))
+ fc2_weights = tf.Variable(
+ tf.truncated_normal([512, NUM_LABELS],
+ stddev=0.1,
+ seed=SEED))
+ fc2_biases = tf.Variable(tf.constant(0.1, shape=[NUM_LABELS]))
+
+ # We will replicate the model structure for the training subgraph, as well
+ # as the evaluation subgraphs, while sharing the trainable parameters.
+ def model(data, train=False):
+ """The Model definition."""
+ # 2D convolution, with 'SAME' padding (i.e. the output feature map has
+ # the same size as the input). Note that {strides} is a 4D array whose
+ # shape matches the data layout: [image index, y, x, depth].
+ conv = tf.nn.conv2d(data,
+ conv1_weights,
+ strides=[1, 1, 1, 1],
+ padding='SAME')
+ # Bias and rectified linear non-linearity.
+ relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))
+ # Max pooling. The kernel size spec {ksize} also follows the layout of
+ # the data. Here we have a pooling window of 2, and a stride of 2.
+ pool = tf.nn.max_pool(relu,
+ ksize=[1, 2, 2, 1],
+ strides=[1, 2, 2, 1],
+ padding='SAME')
+ conv = tf.nn.conv2d(pool,
+ conv2_weights,
+ strides=[1, 1, 1, 1],
+ padding='SAME')
+ relu = tf.nn.relu(tf.nn.bias_add(conv, conv2_biases))
+ pool = tf.nn.max_pool(relu,
+ ksize=[1, 2, 2, 1],
+ strides=[1, 2, 2, 1],
+ padding='SAME')
+ # Reshape the feature map cuboid into a 2D matrix to feed it to the
+ # fully connected layers.
+ pool_shape = pool.get_shape().as_list()
+ reshape = tf.reshape(
+ pool,
+ [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])
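+ # After two 2x2 max-pooling layers the 28x28 input is reduced to 7x7x64,
+ # i.e. 3136 features per image, matching the fc1_weights shape above.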
+ # Fully connected layer. Note that the '+' operation automatically
+ # broadcasts the biases.
+ hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)
+ # Add a 50% dropout during training only. Dropout also scales
+ # activations such that no rescaling is needed at evaluation time.
+ if train:
+ hidden = tf.nn.dropout(hidden, 0.5, seed=SEED)
+ return tf.matmul(hidden, fc2_weights) + fc2_biases
+
+ # Training computation: logits + cross-entropy loss.
+ logits = model(train_data_node, True)
+ loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
+ logits, train_labels_node))
+
+ # L2 regularization for the fully connected parameters.
+ regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) +
+ tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))
+ # Add the regularization term to the loss.
+ loss += 5e-4 * regularizers
+
+ # Optimizer: set up a variable that's incremented once per batch and
+ # controls the learning rate decay.
+ batch = tf.Variable(0)
+ # Decay once per epoch, using an exponential schedule starting at 0.01.
+ learning_rate = tf.train.exponential_decay(
+ 0.01, # Base learning rate.
+ batch * BATCH_SIZE, # Current index into the dataset.
+ train_size, # Decay step.
+ 0.95, # Decay rate.
+ staircase=True)
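+ # i.e. learning_rate = 0.01 * 0.95 ** floor(batch * BATCH_SIZE / train_size),
+ # which lowers the rate by 5% after each full pass over the training data.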
+ # Use simple momentum for the optimization.
+ optimizer = tf.train.MomentumOptimizer(learning_rate,
+ 0.9).minimize(loss,
+ global_step=batch)
+
+ # Predictions for the minibatch, validation set and test set.
+ train_prediction = tf.nn.softmax(logits)
+ # We'll compute them only once in a while by calling their {eval()} method.
+ validation_prediction = tf.nn.softmax(model(validation_data_node))
+ test_prediction = tf.nn.softmax(model(test_data_node))
+
+ # Create a local session to run this computation.
+ with tf.Session() as s:
+ # Run all the initializers to prepare the trainable parameters.
+ tf.initialize_all_variables().run()
+ print 'Initialized!'
+ # Loop through training steps.
+ for step in xrange(int(num_epochs * train_size / BATCH_SIZE)):
+ # Compute the offset of the current minibatch in the data.
+ # Note that we could use better randomization across epochs.
+ offset = (step * BATCH_SIZE) % (train_size - BATCH_SIZE)
+ batch_data = train_data[offset:(offset + BATCH_SIZE), :, :, :]
+ batch_labels = train_labels[offset:(offset + BATCH_SIZE)]
+ # This dictionary maps the batch data (as a numpy array) to the
+ # node in the graph it should be fed to.
+ feed_dict = {train_data_node: batch_data,
+ train_labels_node: batch_labels}
+ # Run the graph and fetch some of the nodes.
+ _, l, lr, predictions = s.run(
+ [optimizer, loss, learning_rate, train_prediction],
+ feed_dict=feed_dict)
+ if step % 100 == 0:
+ print 'Epoch %.2f' % (float(step) * BATCH_SIZE / train_size)
+ print 'Minibatch loss: %.3f, learning rate: %.6f' % (l, lr)
+ print 'Minibatch error: %.1f%%' % error_rate(predictions,
+ batch_labels)
+ print 'Validation error: %.1f%%' % error_rate(
+ validation_prediction.eval(), validation_labels)
+ sys.stdout.flush()
+ # Finally print the result!
+ test_error = error_rate(test_prediction.eval(), test_labels)
+ print 'Test error: %.1f%%' % test_error
+ if FLAGS.self_test:
+ print 'test_error', test_error
+ assert test_error == 0.0, 'expected 0.0 test_error, got %.2f' % (
+ test_error,)
+
+
+if __name__ == '__main__':
+ tf.app.run()
diff --git a/tensorflow/opensource_only/__init__.py b/tensorflow/opensource_only/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/opensource_only/__init__.py
diff --git a/tensorflow/opensource_only/pip_package/__init__.py b/tensorflow/opensource_only/pip_package/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/opensource_only/pip_package/__init__.py
diff --git a/tensorflow/python/BUILD b/tensorflow/python/BUILD
new file mode 100644
index 0000000000..89eb22daba
--- /dev/null
+++ b/tensorflow/python/BUILD
@@ -0,0 +1,965 @@
+# Description:
+# Python support for TensorFlow.
+
+package(default_visibility = ["//tensorflow:internal"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("/tensorflow/tensorflow", "tf_cuda_library")
+load("/tensorflow/tensorflow", "tf_gen_op_wrapper_py")
+load("/tensorflow/tensorflow", "py_tests")
+load("/tensorflow/tensorflow", "cuda_py_tests")
+load("/tensorflow/tensorflow", "tf_py_wrap_cc")
+load("/tensorflow/core/platform/default/build_config", "tf_proto_library_py")
+
+config_setting(
+ name = "macosx",
+ values = {"cpu": "darwin"},
+)
+
+numpy_macosx_include_dir = select({
+ ":macosx": ["-I/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include"],
+ "//conditions:default": [],
+})
+
+py_library(
+ name = "python",
+ srcs = ["__init__.py"],
+ visibility = ["//tensorflow:__pkg__"],
+ deps = [
+ ":client",
+ ":client_testlib",
+ ":framework",
+ ":framework_test_lib",
+ ":platform",
+ ":platform_test",
+ ":summary",
+ ":training",
+ ],
+)
+
+py_library(
+ name = "platform",
+ srcs = glob(["platform/**/*.py"]),
+)
+
+py_library(
+ name = "platform_test",
+ srcs = [
+ "platform/default/_googletest.py",
+ "platform/googletest.py",
+ ],
+ deps = [":platform"],
+)
+
+py_tests(
+ name = "platform_tests",
+ srcs = glob(["platform/default/*_test.py"]),
+ additional_deps = [
+ ":platform",
+ ":platform_test",
+ ],
+ prefix = "platform",
+)
+
+cc_library(
+ name = "py_record_reader_lib",
+ srcs = [
+ "lib/io/py_record_reader.cc",
+ ],
+ hdrs = [
+ "lib/io/py_record_reader.h",
+ ],
+ deps = [
+ "//tensorflow/core:lib",
+ ],
+)
+
+cc_library(
+ name = "py_record_writer_lib",
+ srcs = [
+ "lib/io/py_record_writer.cc",
+ ],
+ hdrs = [
+ "lib/io/py_record_writer.h",
+ ],
+ deps = [
+ "//tensorflow/core:lib",
+ ],
+)
+
+py_test(
+ name = "pywrap_status_test",
+ size = "small",
+ srcs = ["lib/core/pywrap_status_test.py"],
+ deps = [
+ ":framework_test_lib",
+ ":platform_test",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+cc_library(
+ name = "python_op_gen_main",
+ srcs = [
+ "framework/python_op_gen.cc",
+ "framework/python_op_gen.h",
+ "framework/python_op_gen_main.cc",
+ ],
+ visibility = ["//visibility:public"],
+ deps = [
+ "//tensorflow/core:framework",
+ "//tensorflow/core:protos_cc",
+ ],
+)
+
+py_library(
+ name = "framework",
+ srcs = [
+ # TODO(mrry): Move this to framework.
+ "client/graph_util.py",
+ "framework/device.py",
+ "framework/errors.py",
+ "framework/framework_lib.py",
+ "framework/importer.py",
+ "framework/op_def_registry.py",
+ "framework/ops.py",
+ "framework/random_seed.py",
+ "framework/registry.py",
+ "framework/tensor_shape.py",
+ "framework/types.py",
+ "framework/tensor_util.py",
+ "ops/common_shapes.py",
+ ],
+ deps = [
+ ":platform",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+# subinclude("//third_party/py/cython:build_defs")
+
+py_library(
+ name = "extra_py_tests_deps",
+ deps = ["//tensorflow:tensorflow_py"],
+)
+
+py_library(
+ name = "framework_test_lib",
+ srcs = [
+ "framework/test_util.py",
+ ],
+ deps = [
+ ":framework",
+ ":platform_test",
+ ":pywrap_tensorflow",
+ ":session",
+ ":util",
+ ],
+)
+
+py_library(
+ name = "client_testlib",
+ srcs = [
+ "platform/test.py",
+ ],
+ deps = [
+ ":framework_test_lib",
+ ":platform_test",
+ ],
+)
+
+py_test(
+ name = "framework_errors_test",
+ srcs = ["framework/errors_test.py"],
+ main = "framework/errors_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":platform_test",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+py_test(
+ name = "framework_importer_test",
+ srcs = ["framework/importer_test.py"],
+ main = "framework/importer_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":ops",
+ ":platform_test",
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+tf_gen_op_wrapper_py(
+ name = "test_kernel_label_op",
+ out = "framework/test_kernel_label_op.py",
+ deps = [":test_kernel_label_op_kernel"],
+)
+
+cc_library(
+ name = "test_kernel_label_op_kernel",
+ srcs = ["framework/test_kernel_label_op.cc"],
+ linkstatic = 1,
+ deps = ["//tensorflow/core:framework"],
+ alwayslink = 1,
+)
+
+py_test(
+ name = "framework_ops_test",
+ srcs = ["framework/ops_test.py"],
+ main = "framework/ops_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":ops",
+ ":platform_test",
+ ":session",
+ ":test_kernel_label_op",
+ ],
+)
+
+py_test(
+ name = "framework_tensor_shape_test",
+ srcs = ["framework/tensor_shape_test.py"],
+ main = "framework/tensor_shape_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":platform_test",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+py_test(
+ name = "framework_tensor_util_test",
+ srcs = ["framework/tensor_util_test.py"],
+ main = "framework/tensor_util_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":ops",
+ ":platform_test",
+ ],
+)
+
+py_test(
+ name = "framework_test_util_test",
+ srcs = ["framework/test_util_test.py"],
+ main = "framework/test_util_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":ops",
+ ":platform_test",
+ ],
+)
+
+py_test(
+ name = "framework_types_test",
+ srcs = ["framework/types_test.py"],
+ main = "framework/types_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":platform_test",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+py_test(
+ name = "op_def_library_test",
+ srcs = ["ops/op_def_library_test.py"],
+ main = "ops/op_def_library_test.py",
+ deps = [
+ ":framework_test_lib",
+ ":ops",
+ ],
+)
+
+tf_gen_op_wrapper_py(
+ name = "array_ops",
+ hidden = [
+ "BroadcastGradientArgs",
+ "Concat",
+ "Const",
+ "EditDistance",
+ "Pack",
+ "Placeholder",
+ "RefIdentity",
+ "Split",
+ "Slice",
+ "TileGrad", # Exported through array_grad instead of array_ops.
+ "ZerosLike", # TODO(josh11b): Use this instead of the Python version.
+ "Unpack",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "attention_ops",
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "candidate_sampling_ops",
+ hidden = [
+ "AllCandidateSampler",
+ "ComputeAccidentalHits",
+ "FixedUnigramCandidateSampler",
+ "LogUniformCandidateSampler",
+ "ThreadUnsafeUnigramCandidateSampler",
+ "UniformCandidateSampler",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "control_flow_ops",
+ hidden = [
+ "Switch",
+ "Merge",
+ "Exit",
+ ],
+ require_shape_functions = True,
+ deps = [
+ "//tensorflow/core:control_flow_ops_op_lib",
+ "//tensorflow/core:no_op_op_lib",
+ ],
+)
+
+tf_gen_op_wrapper_py(
+ name = "data_flow_ops",
+ hidden = [
+ "FIFOQueue",
+ "HashTable",
+ "InitializeTable",
+ "LookupTableFind",
+ "LookupTableSize",
+ "Mutex",
+ "MutexAcquire",
+ "MutexRelease",
+ "QueueClose",
+ "QueueDequeue",
+ "QueueDequeueMany",
+ "QueueEnqueue",
+ "QueueEnqueueMany",
+ "QueueSize",
+ "RandomShuffleQueue",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "image_ops",
+ hidden = [
+ "ScaleImageGrad",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "io_ops",
+ hidden = [
+ "FixedLengthRecordReader",
+ "IdentityReader",
+ "ReaderClose",
+ "ReaderEnqueueWork",
+ "ReaderNumRecordsProduced",
+ "ReaderNumWorkUnitsCompleted",
+ "ReaderRead",
+ "ReaderReset",
+ "ReaderRestoreState",
+ "ReaderSerializeState",
+ "ReaderWorkQueueLength",
+ "Restore",
+ "RestoreSlice",
+ "Save",
+ "SaveSlices",
+ "ShardedFilename",
+ "ShardedFilespec",
+ "TextLineReader",
+ "TFRecordReader",
+ "WholeFileReader",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "linalg_ops",
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "logging_ops",
+ hidden = [
+ "Assert",
+ "Print",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "math_ops",
+ hidden = [
+ "Abs",
+ "All",
+ "Any",
+ "BatchMatMul",
+ "Complex",
+ "Max",
+ "Mean",
+ "Min",
+ "Pow",
+ "Prod",
+ "Range",
+ "SparseMatMul",
+ "Sum",
+ "MatMul",
+ "Sigmoid",
+ "Tanh",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "nn_ops",
+ hidden = [
+ "AvgPoolGrad", # "*Grad" accessible through nn_grad instead of nn_ops.
+ "BatchNormWithGlobalNormalizationGrad",
+ "SoftmaxCrossEntropyWithLogits",
+ "LRNGrad",
+ "MaxPoolGrad",
+ "MaxPoolGradWithArgmax",
+ "ReluGrad",
+ "Relu6Grad",
+ "SoftplusGrad",
+ "BiasAdd",
+ "Relu6",
+ "AvgPool",
+ "MaxPool",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "parsing_ops",
+ hidden = ["ParseExample"],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "random_ops",
+ hidden = [
+ "RandomUniform",
+ "RandomShuffle",
+ "RandomStandardNormal",
+ "TruncatedNormal",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "state_ops",
+ hidden = [
+ "Variable",
+ "TemporaryVariable",
+ "DestroyTemporaryVariable",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "sparse_ops",
+ hidden = [
+ "SparseConcat",
+ "SparseSelectLastK",
+ "SparseReorder",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "string_ops",
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "summary_ops",
+ hidden = [
+ "HistogramAccumulatorSummary",
+ "HistogramSummary",
+ "ImageSummary",
+ "MergeSummary",
+ "ScalarSummary",
+ ],
+ require_shape_functions = True,
+)
+
+tf_gen_op_wrapper_py(
+ name = "user_ops",
+ hidden = [
+ "Fact",
+ ],
+ require_shape_functions = False,
+)
+
+tf_gen_op_wrapper_py(
+ name = "training_ops",
+ out = "training/gen_training_ops.py",
+ require_shape_functions = True,
+)
+
+py_library(
+ name = "ops",
+ srcs = [
+ "ops/array_grad.py",
+ "ops/array_ops.py",
+ "ops/attention_ops.py",
+ "ops/candidate_sampling_ops.py",
+ "ops/clip_ops.py",
+ "ops/constant_op.py",
+ "ops/control_flow_grad.py",
+ "ops/control_flow_ops.py",
+ "ops/data_flow_grad.py",
+ "ops/data_flow_ops.py",
+ "ops/embedding_ops.py",
+ "ops/gen_array_ops.py",
+ "ops/gen_attention_ops.py",
+ "ops/gen_control_flow_ops.py",
+ "ops/gen_data_flow_ops.py",
+ "ops/gen_image_ops.py",
+ "ops/gen_io_ops.py",
+ "ops/gen_linalg_ops.py",
+ "ops/gen_logging_ops.py",
+ "ops/gen_math_ops.py",
+ "ops/gen_nn_ops.py",
+ "ops/gen_random_ops.py",
+ "ops/gen_state_ops.py",
+ "ops/gen_string_ops.py",
+ "ops/gen_summary_ops.py",
+ "ops/gradients.py",
+ "ops/image_ops.py",
+ "ops/init_ops.py",
+ "ops/io_ops.py",
+ "ops/linalg_grad.py",
+ "ops/linalg_ops.py",
+ "ops/logging_ops.py",
+ "ops/math_grad.py",
+ "ops/math_ops.py",
+ "ops/nn.py",
+ "ops/nn_grad.py",
+ "ops/nn_ops.py",
+ "ops/numerics.py",
+ "ops/op_def_library.py",
+ "ops/parsing_ops.py",
+ "ops/random_ops.py",
+ "ops/sparse_grad.py",
+ "ops/sparse_ops.py",
+ "ops/standard_ops.py",
+ "ops/state_grad.py",
+ "ops/state_ops.py",
+ "ops/string_ops.py",
+ "ops/summary_ops.py",
+ "ops/variable_scope.py",
+ "ops/variables.py",
+ "user_ops/user_ops.py",
+ ],
+ deps = [
+ ":array_ops",
+ ":candidate_sampling_ops",
+ ":control_flow_ops",
+ ":data_flow_ops",
+ ":framework",
+ ":io_ops",
+ ":linalg_ops",
+ ":logging_ops",
+ ":math_ops",
+ ":nn_ops",
+ ":parsing_ops",
+ ":random_ops",
+ ":sparse_ops",
+ ":string_ops",
+ ":summary_ops",
+ ":user_ops",
+ ],
+)
+
+py_library(
+ name = "training",
+ srcs = glob(
+ ["training/**/*.py"],
+ exclude = ["**/*test*"],
+ ),
+ deps = [
+ ":client",
+ ":framework",
+ ":lib",
+ ":ops",
+ ":protos_all_py",
+ ":pywrap_tensorflow",
+ ":training_ops",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+py_library(
+ name = "client",
+ srcs = glob(
+ ["client/**/*.py"],
+ exclude = ["**/*test*"],
+ ),
+ deps = [
+ ":framework",
+ ":ops",
+ ":session",
+ ":training_ops",
+ ],
+)
+
+py_library(
+ name = "util",
+ srcs = glob(["util/**/*.py"]),
+ deps = [
+ "//google/protobuf:protobuf_python",
+ ],
+)
+
+tf_proto_library_py(
+ name = "protos_all",
+ srcs = glob(
+ ["**/*.proto"],
+ exclude = ["util/protobuf/compare_test.proto"],
+ ),
+)
+
+tf_proto_library_py(
+ name = "compare_test_proto",
+ testonly = 1,
+ srcs = ["util/protobuf/compare_test.proto"],
+)
+
+py_test(
+ name = "protobuf_compare_test",
+ srcs = ["util/protobuf/compare_test.py"],
+ main = "util/protobuf/compare_test.py",
+ deps = [
+ ":compare_test_proto_py",
+ ":platform_test",
+ ":util",
+ ],
+)
+
+py_test(
+ name = "events_writer_test",
+ size = "small",
+ srcs = [
+ "client/events_writer_test.py",
+ ],
+ deps = [
+ ":framework_test_lib",
+ ":lib",
+ ":platform_test",
+ ],
+)
+
+tf_cuda_library(
+ name = "tf_session_helper",
+ srcs = ["client/tf_session_helper.cc"],
+ hdrs = ["client/tf_session_helper.h"],
+ copts = numpy_macosx_include_dir + ["-I/usr/include/python2.7"],
+ deps = [
+ ":construction_fails_op",
+ ":test_kernel_label_op_kernel",
+ "//tensorflow/core",
+ "//tensorflow/core:kernels",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:local",
+ "//tensorflow/core:protos_cc",
+ ],
+)
+
+tf_py_wrap_cc(
+ name = "client/pywraptensorflow_server_lib",
+ srcs = ["client/tensorflow_server.i"],
+ copts = numpy_macosx_include_dir,
+ swig_includes = [
+ "lib/core/status.i",
+ "lib/core/strings.i",
+ "platform/base.i",
+ ],
+ deps = [
+ "//tensorflow/core",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:protos_cc",
+ ],
+)
+
+tf_py_wrap_cc(
+ name = "pywrap_tensorflow",
+ srcs = ["tensorflow.i"],
+ copts = numpy_macosx_include_dir,
+ swig_includes = [
+ "client/events_writer.i",
+ "client/tf_session.i",
+ "lib/core/status.i",
+ "lib/core/status_helper.i",
+ "lib/core/strings.i",
+ "lib/io/py_record_reader.i",
+ "lib/io/py_record_writer.i",
+ "platform/base.i",
+ "platform/numpy.i",
+ "util/port.i",
+ ],
+ deps = [
+ ":py_record_reader_lib",
+ ":py_record_writer_lib",
+ ":tf_session_helper",
+ ],
+)
+
+py_library(
+ name = "lib",
+ srcs = glob(["lib/**/*.py"]),
+ deps = [
+ ":pywrap_tensorflow",
+ ],
+)
+
+py_library(
+ name = "session",
+ srcs = ["client/session.py"],
+ deps = [
+ ":framework",
+ ":ops",
+ ":pywrap_tensorflow",
+ ],
+)
+
+# Just used by tests.
+tf_cuda_library(
+ name = "construction_fails_op",
+ testonly = 1,
+ srcs = ["client/test_construction_fails_op.cc"],
+ deps = [
+ "//tensorflow/core",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:protos_cc",
+ ],
+ alwayslink = 1,
+)
+
+py_test(
+ name = "session_test",
+ srcs = ["client/session_test.py"],
+ deps = [
+ ":framework",
+ ":framework_test_lib",
+ ":session",
+ ],
+)
+
+py_test(
+ name = "graph_util_test",
+ srcs = ["client/graph_util_test.py"],
+ deps = [
+ ":framework",
+ ":framework_test_lib",
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+py_library(
+ name = "kernel_tests/gradient_checker",
+ srcs = ["kernel_tests/gradient_checker.py"],
+)
+
+cpu_only_kernel_test_list = glob([
+ "kernel_tests/attention_ops_test.py",
+ "kernel_tests/barrier_ops_test.py",
+ "kernel_tests/bcast_ops_test.py",
+ "kernel_tests/candidate_sampler_ops_test.py",
+ "kernel_tests/cholesky_op_test.py",
+ "kernel_tests/clip_ops_test.py",
+ "kernel_tests/decode_csv_op_test.py",
+ "kernel_tests/decode_raw_op_test.py",
+ "kernel_tests/determinant_op_test.py",
+ "kernel_tests/diag_op_test.py",
+ "kernel_tests/edit_distance_op_test.py",
+ "kernel_tests/fifo_queue_test.py",
+ "kernel_tests/identity_op_py_test.py",
+ "kernel_tests/in_topk_op_test.py",
+ "kernel_tests/io_ops_test.py",
+ "kernel_tests/listdiff_op_test.py",
+ "kernel_tests/logging_ops_test.py",
+ "kernel_tests/lookup_table_op_test.py",
+ "kernel_tests/lrn_op_py_test.py",
+ "kernel_tests/matrix_inverse_op_test.py",
+ "kernel_tests/mutex_ops_test.py",
+ "kernel_tests/parsing_ops_test.py",
+ "kernel_tests/queue_ops_test.py",
+ "kernel_tests/random_shuffle_queue_test.py",
+ "kernel_tests/save_restore_ops_test.py",
+ "kernel_tests/segment_reduction_ops_test.py",
+ "kernel_tests/sparse_concat_op_test.py",
+ "kernel_tests/sparse_reorder_op_test.py",
+ "kernel_tests/sparse_to_dense_op_test.py",
+ "kernel_tests/sparsemask_op_test.py",
+ "kernel_tests/summary_ops_test.py",
+ "kernel_tests/topk_op_test.py",
+ "kernel_tests/unique_op_test.py",
+ "kernel_tests/variable_scope_test.py",
+ "kernel_tests/variables_test.py",
+ "kernel_tests/where_op_test.py",
+])
+
+py_tests(
+ name = "cpu_only_kernel_tests",
+ srcs = cpu_only_kernel_test_list,
+)
+
+py_tests(
+ name = "reader_ops_test",
+ srcs = ["kernel_tests/reader_ops_test.py"],
+ additional_deps = [
+ ":lib",
+ ],
+)
+
+cuda_py_tests(
+ name = "op_tests",
+ srcs = glob(
+ ["ops/*_test.py"],
+ exclude = [
+ "ops/image_ops_test.py",
+ "ops/op_def_library_test.py",
+ ],
+ ),
+)
+
+cuda_py_tests(
+ name = "kernel_tests",
+ srcs = glob(
+ ["kernel_tests/*_test.py"],
+ exclude = [
+ "**/reader_ops_test.py",
+ # Sharded below
+ "**/cwise_ops_test.py",
+ "**/conv_ops_test.py",
+ "**/linalg_grad_test.py",
+ "**/pooling_ops_test.py",
+ ] + cpu_only_kernel_test_list,
+ ),
+)
+
+cuda_py_tests(
+ name = "kernel_tests_with_sharding",
+ srcs = [
+ "kernel_tests/conv_ops_test.py",
+ "kernel_tests/cwise_ops_test.py",
+ "kernel_tests/linalg_grad_test.py",
+ "kernel_tests/pooling_ops_test.py",
+ ],
+ shard_count = 2,
+)
+
+cuda_py_tests(
+ name = "image_ops_test",
+ srcs = [
+ "ops/image_ops_test.py",
+ ],
+ data = [
+ "//tensorflow/core:image_testdata",
+ ],
+ shard_count = 5,
+)
+
+cuda_py_tests(
+ name = "training_tests",
+ srcs = glob(
+ ["training/*_test.py"],
+ exclude = ["training/input_test.py"],
+ ),
+ additional_deps = [
+ ":training",
+ ],
+)
+
+py_tests(
+ name = "training_tests",
+ srcs = glob(
+ ["training/input_test.py"],
+ ),
+ additional_deps = [
+ ":training",
+ ],
+)
+
+py_library(
+ name = "summary",
+ srcs = glob(
+ ["summary/**/*.py"],
+ exclude = ["**/*test*"],
+ ),
+ deps = [
+ ":client",
+ ":framework",
+ ":pywrap_tensorflow",
+ "//tensorflow/core:protos_all_py",
+ ],
+)
+
+py_tests(
+ name = "summary_tests",
+ srcs = glob(["summary/**/*_test.py"]),
+ additional_deps = [
+ ":summary",
+ ":training",
+ ],
+)
+
+py_library(
+ name = "docs",
+ srcs = [
+ "framework/docs.py",
+ ],
+ deps = [
+ ":platform",
+ ],
+)
+
+py_binary(
+ name = "gen_docs_combined",
+ srcs = [
+ "framework/gen_docs_combined.py",
+ ],
+ main = "framework/gen_docs_combined.py",
+ deps = [
+ ":docs",
+ ":platform",
+ "//tensorflow:tensorflow_py",
+ ],
+)
+
+sh_test(
+ name = "gen_docs_test",
+ size = "small",
+ srcs = [
+ "framework/gen_docs_test.sh",
+ ],
+ data = [
+ ":gen_docs_combined",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/python/__init__.py b/tensorflow/python/__init__.py
new file mode 100644
index 0000000000..5527c01173
--- /dev/null
+++ b/tensorflow/python/__init__.py
@@ -0,0 +1,42 @@
+# pylint: disable=wildcard-import,unused-import,g-bad-import-order,line-too-long
+"""Import core names of TensorFlow.
+
+Programs that want to build Brain Ops and Graphs without having to import the
+constructors and utilities individually can import this file:
+
+import tensorflow.python.platform
+import tensorflow as tf
+
+"""
+
+import tensorflow.python.platform
+from tensorflow.core.framework.graph_pb2 import *
+from tensorflow.core.framework.summary_pb2 import *
+from tensorflow.core.framework.config_pb2 import *
+from tensorflow.core.util.event_pb2 import *
+
+# Framework
+from tensorflow.python.framework.framework_lib import *
+
+# Session
+from tensorflow.python.client.client_lib import *
+
+# Ops
+from tensorflow.python.ops.standard_ops import *
+
+# Bring in nn, image_ops and user_ops as subpackages
+from tensorflow.python.ops import nn
+from tensorflow.python.ops import image_ops as image
+from tensorflow.python.user_ops import user_ops
+
+# Import the names from python/training.py as train.Name.
+from tensorflow.python.training import training as train
+
+# Sub-package for performing i/o directly instead of via ops in a graph.
+from tensorflow.python.lib.io import python_io
+
+# Make some application and test modules available.
+from tensorflow.python.platform import app
+from tensorflow.python.platform import flags
+from tensorflow.python.platform import logging
+from tensorflow.python.platform import test
diff --git a/tensorflow/python/client/__init__.py b/tensorflow/python/client/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/client/__init__.py
diff --git a/tensorflow/python/client/client_lib.py b/tensorflow/python/client/client_lib.py
new file mode 100644
index 0000000000..9148ed17c0
--- /dev/null
+++ b/tensorflow/python/client/client_lib.py
@@ -0,0 +1,40 @@
+# pylint: disable=wildcard-import,unused-import,g-bad-import-order,line-too-long
+"""This library contains classes for launching graphs and executing operations.
+
+The [basic usage](../../get_started/index.md#basic-usage) guide has
+examples of how a graph is launched in a [`tf.Session`](#Session).
+
+## Session management
+
+@@Session
+
+@@get_default_session
+
+## Error classes
+
+@@OpError
+@@CancelledError
+@@UnknownError
+@@InvalidArgumentError
+@@DeadlineExceededError
+@@NotFoundError
+@@AlreadyExistsError
+@@PermissionDeniedError
+@@UnauthenticatedError
+@@ResourceExhaustedError
+@@FailedPreconditionError
+@@AbortedError
+@@OutOfRangeError
+@@UnimplementedError
+@@InternalError
+@@UnavailableError
+@@DataLossError
+"""
+
+from tensorflow.python.client.session import InteractiveSession
+from tensorflow.python.client.session import Session
+
+from tensorflow.python.framework import errors
+from tensorflow.python.framework.errors import OpError
+
+from tensorflow.python.framework.ops import get_default_session
diff --git a/tensorflow/python/client/events_writer.i b/tensorflow/python/client/events_writer.i
new file mode 100644
index 0000000000..cbf42e2791
--- /dev/null
+++ b/tensorflow/python/client/events_writer.i
@@ -0,0 +1,34 @@
+%include "tensorflow/python/platform/base.i"
+
+%{
+#include "tensorflow/core/util/events_writer.h"
+#include "tensorflow/core/util/event.pb.h"
+%}
+
+%nodefaultctor EventsWriter;
+
+%ignoreall
+%unignore tensorflow;
+%unignore tensorflow::EventsWriter;
+%unignore tensorflow::EventsWriter::EventsWriter;
+%unignore tensorflow::EventsWriter::~EventsWriter;
+%unignore tensorflow::EventsWriter::FileName;
+%rename("_WriteSerializedEvent") tensorflow::EventsWriter::WriteSerializedEvent;
+%unignore tensorflow::EventsWriter::Flush;
+%unignore tensorflow::EventsWriter::Close;
+%include "tensorflow/core/util/events_writer.h"
+%unignoreall
+
+%newobject tensorflow::EventsWriter::EventsWriter;
+
+
+%extend tensorflow::EventsWriter {
+%insert("python") %{
+ def WriteEvent(self, event):
+ from tensorflow.core.util.event_pb2 import Event
+ if not isinstance(event, Event):
+ raise TypeError("Expected an event_pb2.Event proto, "
+ " but got %s" % type(event))
+ return self._WriteSerializedEvent(event.SerializeToString())
+%}
+}
diff --git a/tensorflow/python/client/events_writer_test.py b/tensorflow/python/client/events_writer_test.py
new file mode 100644
index 0000000000..60bce49b1f
--- /dev/null
+++ b/tensorflow/python/client/events_writer_test.py
@@ -0,0 +1,54 @@
+"""Tests for the SWIG-wrapped events writer."""
+import os.path
+
+from tensorflow.core.framework import summary_pb2
+from tensorflow.core.util import event_pb2
+from tensorflow.python import pywrap_tensorflow
+from tensorflow.python.lib.io import tf_record
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import googletest
+
+
+class PywrapeventsWriterTest(test_util.TensorFlowTestCase):
+
+ def testWriteEvents(self):
+ file_prefix = os.path.join(self.get_temp_dir(), "events")
+ writer = pywrap_tensorflow.EventsWriter(file_prefix)
+ filename = writer.FileName()
+ event_written = event_pb2.Event(
+ wall_time=123.45, step=67,
+ summary=summary_pb2.Summary(
+ value=[summary_pb2.Summary.Value(tag="foo", simple_value=89.0)]))
+ writer.WriteEvent(event_written)
+ writer.Flush()
+ writer.Close()
+
+ with self.assertRaises(IOError):
+ for r in tf_record.tf_record_iterator(filename + "DOES_NOT_EXIST"):
+ self.assertTrue(False)
+
+ reader = tf_record.tf_record_iterator(filename)
+ event_read = event_pb2.Event()
+
+ event_read.ParseFromString(next(reader))
+ self.assertTrue(event_read.HasField("file_version"))
+
+ event_read.ParseFromString(next(reader))
+ # Second event
+ self.assertProtoEquals("""
+ wall_time: 123.45 step: 67
+ summary { value { tag: 'foo' simple_value: 89.0 } }
+ """, event_read)
+
+ with self.assertRaises(StopIteration):
+ next(reader)
+
+ def testWriteEventInvalidType(self):
+ class _Invalid(object):
+ def __str__(self): return "Invalid"
+ with self.assertRaisesRegexp(TypeError, "Invalid"):
+ pywrap_tensorflow.EventsWriter("foo").WriteEvent(_Invalid())
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/client/graph_util.py b/tensorflow/python/client/graph_util.py
new file mode 100644
index 0000000000..4c65a445ae
--- /dev/null
+++ b/tensorflow/python/client/graph_util.py
@@ -0,0 +1,138 @@
+"""Helpers to manipulate a tensor graph in python.
+"""
+
+import tensorflow.python.platform
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.framework import device as pydev
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.platform import logging
+
+_VARIABLE_OPS = {
+ "Assign",
+ "AssignAdd",
+ "AssignSub",
+ "Queue",
+ "RandomParameters",
+ "ScatterAdd",
+ "ScatterSub",
+ "ScatterUpdate",
+ "Variable",
+}
+
+
+def _is_variable_op(op):
+ """Returns true if 'op' refers to a Variable node."""
+ return op in _VARIABLE_OPS
+
+
+def set_cpu0(device_string):
+ """Creates a new device string based on `device_string' but using /CPU:0.
+
+ If the device is already on /CPU:0, this is a no-op.
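+ For example, "/job:worker/device:GPU:0" would become "/job:worker/device:CPU:0".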
+
+ Args:
+ device_string: A device string.
+
+ Returns:
+ A device string.
+ """
+ parsed_device = pydev.from_string(device_string)
+ parsed_device.device_type = "CPU"
+ parsed_device.device_index = 0
+ return parsed_device.to_string()
+
+
+def must_run_on_cpu(node, pin_variables_on_cpu=False):
+ """Returns True if the given node_def must run on CPU, otherwise False.
+
+ Args:
+ node: The node to be assigned to a device. Could be either an ops.Operation
+ or NodeDef.
+ pin_variables_on_cpu: If True, this function will return False if node_def
+ represents a variable-related op.
+
+ Returns:
+ True if the given node must run on CPU, otherwise False.
+ """
+
+ if isinstance(node, ops.Operation):
+ node_def = node.node_def
+ else:
+ assert isinstance(node, graph_pb2.NodeDef)
+ node_def = node
+
+ # If the op is a variable-related op, should we pin it on CPU?
+ if pin_variables_on_cpu and _is_variable_op(node_def.op):
+ return True
+
+ # Constant operations producing a string or int32 must run on CPU.
+ if node_def.op == "Const":
+ # Get the value of the 'dtype' attr
+ dtype = node_def.attr["dtype"].type
+ if dtype == types.string or dtype == types.int32:
+ return True
+
+ if node_def.op == "DynamicStitch":
+ dtype = node_def.attr["T"].type
+ if dtype == types.int32:
+ # DynamicStitch on GPU does not work for int32 values.
+ return True
+
+ if node_def.op in ["Cast"]:
+ dtype = node_def.attr["SrcT"].type
+ if dtype == types.int32:
+ # Cast on GPU does not work for int32 values.
+ return True
+ return False
+
+
+################################################################################
+#
+# device functions for use in with g.device(...)
+#
+################################################################################
+
+
+def pin_variables_on_cpu(op):
+ """Returns a CPU device for Variable nodes if the device is not specified.
+
+ Args:
+ op: The ops.Operation object describing the node for which a device
+ should be chosen. The op.device field is respected.
+
+ Returns:
+ A device containing "/device:CPU:0" if the node is related to a variable.
+ """
+ device = op.device if op.device is not None else ""
+ dev = pydev.from_string(device)
+
+ # If a device type exists already, do not override.
+ if dev.device_type:
+ return device
+
+ if isinstance(op, ops.Operation):
+ node_def = op.node_def
+ else:
+ assert isinstance(op, graph_pb2.NodeDef)
+ node_def = op
+
+ if _is_variable_op(node_def.op):
+ return set_cpu0(device)
+ return device
+
+
+def pin_to_cpu(op):
+ """Returns a CPU device for the given node."""
+ device = op.device if op.device is not None else ""
+ dev = pydev.from_string(device)
+
+ if not dev.device_type:
+ return set_cpu0(device)
+ if dev.device_type == "CPU":
+ return device
+
+ logging.info("Operation %s has been assigned to a non-CPU (%s), so "
+ "it will not be pinned to the CPU.", op.name, dev.device_type)
+ return device
diff --git a/tensorflow/python/client/graph_util_test.py b/tensorflow/python/client/graph_util_test.py
new file mode 100644
index 0000000000..8066f722a8
--- /dev/null
+++ b/tensorflow/python/client/graph_util_test.py
@@ -0,0 +1,126 @@
+"""Tests for tensorflow.python.client.graph_util."""
+import tensorflow.python.platform
+
+from tensorflow.python.client import graph_util
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import data_flow_ops
+# pylint: disable=unused-import
+from tensorflow.python.ops import math_ops
+# pylint: enable=unused-import
+from tensorflow.python.ops import state_ops
+from tensorflow.python.platform import googletest
+
+
+class DeviceFunctionsTest(googletest.TestCase):
+
+ def testPinToCpu(self):
+ with ops.Graph().as_default() as g, g.device(graph_util.pin_to_cpu):
+ const_a = constant_op.constant(5.0)
+ const_b = constant_op.constant(10.0)
+ add_c = const_a + const_b
+ var_v = state_ops.variable_op([], dtype=types.float32)
+ assign_c_to_v = state_ops.assign(var_v, add_c)
+ const_string = constant_op.constant("on a cpu")
+ dynamic_stitch_int_result = data_flow_ops.dynamic_stitch(
+ [[0, 1, 2], [2, 3]], [[12, 23, 34], [1, 2]])
+ dynamic_stitch_float_result = data_flow_ops.dynamic_stitch(
+ [[0, 1, 2], [2, 3]], [[12.0, 23.0, 34.0], [1.0, 2.0]])
+ self.assertEqual(const_a.device, "/device:CPU:0")
+ self.assertEqual(const_b.device, "/device:CPU:0")
+ self.assertEqual(add_c.device, "/device:CPU:0")
+ self.assertEqual(var_v.device, "/device:CPU:0")
+ self.assertEqual(assign_c_to_v.device, "/device:CPU:0")
+ self.assertEqual(const_string.device, "/device:CPU:0")
+ self.assertEqual(dynamic_stitch_int_result.device, "/device:CPU:0")
+ self.assertEqual(dynamic_stitch_float_result.device, "/device:CPU:0")
+
+ def testPinRequiredOpsOnCPU(self):
+ with ops.Graph().as_default() as g, g.device(
+ graph_util.pin_variables_on_cpu):
+ const_a = constant_op.constant(5.0)
+ const_b = constant_op.constant(10.0)
+ add_c = const_a + const_b
+ var_v = state_ops.variable_op([], dtype=types.float32)
+ assign_c_to_v = state_ops.assign(var_v, add_c)
+ dynamic_stitch_int_result = data_flow_ops.dynamic_stitch(
+ [[0, 1, 2], [2, 3]], [[12, 23, 34], [1, 2]])
+ dynamic_stitch_float_result = data_flow_ops.dynamic_stitch(
+ [[0, 1, 2], [2, 3]], [[12.0, 23.0, 34.0], [1.0, 2.0]])
+ # Non-variable ops should not specify a device
+ self.assertEqual(const_a.device, None)
+ self.assertEqual(const_b.device, None)
+ self.assertEqual(add_c.device, None)
+ # Variable ops specify a device
+ self.assertEqual(var_v.device, "/device:CPU:0")
+ self.assertEqual(assign_c_to_v.device, "/device:CPU:0")
+
+ def testTwoDeviceFunctions(self):
+ with ops.Graph().as_default() as g:
+ var_0 = state_ops.variable_op([1], dtype=types.float32)
+ with g.device(graph_util.pin_variables_on_cpu):
+ var_1 = state_ops.variable_op([1], dtype=types.float32)
+ var_2 = state_ops.variable_op([1], dtype=types.float32)
+ var_3 = state_ops.variable_op([1], dtype=types.float32)
+ with g.device(graph_util.pin_variables_on_cpu):
+ var_4 = state_ops.variable_op([1], dtype=types.float32)
+ with g.device("/device:GPU:0"):
+ var_5 = state_ops.variable_op([1], dtype=types.float32)
+ var_6 = state_ops.variable_op([1], dtype=types.float32)
+
+ self.assertEqual(var_0.device, None)
+ self.assertEqual(var_1.device, "/device:CPU:0")
+ self.assertEqual(var_2.device, None)
+ self.assertEqual(var_3.device, None)
+ self.assertEqual(var_4.device, "/device:CPU:0")
+ self.assertEqual(var_5.device, "/device:GPU:0")
+ self.assertEqual(var_6.device, "/device:CPU:0")
+
+ def testExplicitDevice(self):
+ with ops.Graph().as_default() as g:
+ const_0 = constant_op.constant(5.0)
+ with g.device("/device:GPU:0"):
+ const_1 = constant_op.constant(5.0)
+ with g.device("/device:GPU:1"):
+ const_2 = constant_op.constant(5.0)
+ with g.device("/device:CPU:0"):
+ const_3 = constant_op.constant(5.0)
+ with g.device("/device:CPU:1"):
+ const_4 = constant_op.constant(5.0)
+ with g.device("/job:ps"):
+ const_5 = constant_op.constant(5.0)
+
+ self.assertEqual(const_0.device, None)
+ self.assertEqual(const_1.device, "/device:GPU:0")
+ self.assertEqual(const_2.device, "/device:GPU:1")
+ self.assertEqual(const_3.device, "/device:CPU:0")
+ self.assertEqual(const_4.device, "/device:CPU:1")
+ self.assertEqual(const_5.device, "/job:ps")
+
+ def testDefaultDevice(self):
+ with ops.Graph().as_default() as g, g.device(
+ graph_util.pin_variables_on_cpu):
+ with g.device("/job:ps"):
+ const_0 = constant_op.constant(5.0)
+ with g.device("/device:GPU:0"):
+ const_1 = constant_op.constant(5.0)
+ with g.device("/device:GPU:1"):
+ const_2 = constant_op.constant(5.0)
+ with g.device("/device:CPU:0"):
+ const_3 = constant_op.constant(5.0)
+ with g.device("/device:CPU:1"):
+ const_4 = constant_op.constant(5.0)
+ with g.device("/replica:0"):
+ const_5 = constant_op.constant(5.0)
+
+ self.assertEqual(const_0.device, "/job:ps")
+ self.assertEqual(const_1.device, "/device:GPU:0")
+ self.assertEqual(const_2.device, "/device:GPU:1")
+ self.assertEqual(const_3.device, "/device:CPU:0")
+ self.assertEqual(const_4.device, "/device:CPU:1")
+ self.assertEqual(const_5.device, "/replica:0")
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/client/notebook.py b/tensorflow/python/client/notebook.py
new file mode 100644
index 0000000000..1871fbc632
--- /dev/null
+++ b/tensorflow/python/client/notebook.py
@@ -0,0 +1,104 @@
+"""Notebook front-end to TensorFlow.
+
+When you run this binary, you'll see something like the following, which
+indicates the serving URL of the notebook:
+
+ The IPython Notebook is running at: http://127.0.0.1:8888/
+
+Press "Shift+Enter" to execute a cell
+Press "Enter" on a cell to go into edit mode.
+Press "Escape" to go back into command mode and use arrow keys to navigate.
+Press "a" in command mode to insert cell above or "b" to insert cell below.
+
+Your root notebooks directory is FLAGS.notebook_dir
+"""
+
+
+import os
+import socket
+import sys
+
+# pylint: disable=g-import-not-at-top
+# Official recommended way of turning on fast protocol buffers as of 10/21/14
+os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "cpp"
+os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION_VERSION"] = "2"
+
+from tensorflow.python.platform import app
+from tensorflow.python.platform import flags
+
+FLAGS = flags.FLAGS
+
+flags.DEFINE_string(
+ "password", None,
+ "Password to require. If set, the server will allow public access."
+ " Only used if notebook config file does not exist.")
+
+flags.DEFINE_string("notebook_dir", "experimental/brain/notebooks",
+ "root location where to store notebooks")
+
+ORIG_ARGV = sys.argv
+# Main notebook process calls itself with argv[1]="kernel" to start kernel
+# subprocesses.
+IS_KERNEL = len(sys.argv) > 1 and sys.argv[1] == "kernel"
+
+
+def main(unused_argv):
+ sys.argv = ORIG_ARGV
+
+ if not IS_KERNEL:
+ # Drop all flags.
+ sys.argv = [sys.argv[0]]
+ # NOTE(sadovsky): For some reason, putting this import at the top level
+ # breaks inline plotting. It's probably a bug in the stone-age version of
+ # matplotlib.
+ from IPython.html.notebookapp import NotebookApp # pylint: disable=g-import-not-at-top
+ notebookapp = NotebookApp.instance()
+ notebookapp.open_browser = True
+
+    # Password functionality adapted from quality/ranklab/main/tools/notebook.py:
+    # add options to run with a "password".
+ if FLAGS.password:
+ from IPython.lib import passwd # pylint: disable=g-import-not-at-top
+ notebookapp.ip = "0.0.0.0"
+ notebookapp.password = passwd(FLAGS.password)
+ else:
+ print ("\nNo password specified; Notebook server will only be available"
+ " on the local machine.\n")
+ notebookapp.initialize(argv=["--notebook-dir", FLAGS.notebook_dir])
+
+ if notebookapp.ip == "0.0.0.0":
+ proto = "https" if notebookapp.certfile else "http"
+ url = "%s://%s:%d%s" % (proto, socket.gethostname(), notebookapp.port,
+ notebookapp.base_project_url)
+ print "\nNotebook server will be publicly available at: %s\n" % url
+
+ notebookapp.start()
+ return
+
+ # Drop the --flagfile flag so that notebook doesn't complain about an
+ # "unrecognized alias" when parsing sys.argv.
+ sys.argv = ([sys.argv[0]] +
+ [z for z in sys.argv[1:] if not z.startswith("--flagfile")])
+ from IPython.kernel.zmq.kernelapp import IPKernelApp # pylint: disable=g-import-not-at-top
+ kernelapp = IPKernelApp.instance()
+ kernelapp.initialize()
+
+ # Enable inline plotting. Equivalent to running "%matplotlib inline".
+ ipshell = kernelapp.shell
+ ipshell.enable_matplotlib("inline")
+
+ kernelapp.start()
+
+
+if __name__ == "__main__":
+ # When the user starts the main notebook process, we don't touch sys.argv.
+ # When the main process launches kernel subprocesses, it writes all flags
+ # to a tmpfile and sets --flagfile to that tmpfile, so for kernel
+ # subprocesses here we drop all flags *except* --flagfile, then call
+ # app.run(), and then (in main) restore all flags before starting the
+ # kernel app.
+ if IS_KERNEL:
+ # Drop everything except --flagfile.
+ sys.argv = ([sys.argv[0]] +
+ [x for x in sys.argv[1:] if x.startswith("--flagfile")])
+ app.run()
diff --git a/tensorflow/python/client/session.py b/tensorflow/python/client/session.py
new file mode 100644
index 0000000000..7da9b41cf4
--- /dev/null
+++ b/tensorflow/python/client/session.py
@@ -0,0 +1,567 @@
+"""A client interface for TensorFlow."""
+
+import re
+import sys
+import threading
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python import pywrap_tensorflow as tf_session
+from tensorflow.python.framework import errors
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import logging
+
+
+class SessionInterface(object):
+ """Base class for implementations of TensorFlow client sessions."""
+
+ @property
+ def graph(self):
+ """The underlying TensorFlow graph, to be used in building Operations."""
+ raise NotImplementedError('graph')
+
+ @property
+ def sess_str(self):
+ """The TensorFlow process to which this session will connect."""
+ raise NotImplementedError('sess_str')
+
+ def run(self, fetches, feed_dict=None):
+ """Runs operations in the session. See `Session.run()` for details."""
+ raise NotImplementedError('Run')
+
+
+class BaseSession(SessionInterface):
+ """A class for interacting with a TensorFlow computation.
+
+ The BaseSession enables incremental graph building with inline
+ execution of Operations and evaluation of Tensors.
+ """
+
+ def __init__(self, target='', graph=None, config=None):
+ """Constructs a new TensorFlow session.
+
+ Args:
+ target: (Optional) The TensorFlow execution engine to connect to.
+ graph: (Optional) The graph to be used. If this argument is None,
+ the default graph will be used.
+ config: (Optional) ConfigProto proto used to configure the session.
+
+ Raises:
+ RuntimeError: If an error occurs while creating the TensorFlow
+ session.
+ """
+ if graph is None:
+ self._graph = ops.get_default_graph()
+ else:
+ self._graph = graph
+
+ self._opened = False
+ self._closed = False
+
+ self._current_version = 0
+ self._extend_lock = threading.Lock()
+ self._target = target
+
+ self._session = None
+
+ try:
+ opts = tf_session.TF_NewSessionOptions(target=target, config=config)
+ status = tf_session.TF_NewStatus()
+ self._session = tf_session.TF_NewSession(opts, status)
+ if tf_session.TF_GetCode(status) != 0:
+ message = tf_session.TF_Message(status)
+ raise RuntimeError(message)
+
+ finally:
+ tf_session.TF_DeleteSessionOptions(opts)
+ tf_session.TF_DeleteStatus(status)
+
+ def close(self):
+ """Closes this session.
+
+ Calling this method frees all resources associated with the session.
+
+ Raises:
+ RuntimeError: If an error occurs while closing the session.
+ """
+ with self._extend_lock:
+ if self._opened and not self._closed:
+ self._closed = True
+ try:
+ status = tf_session.TF_NewStatus()
+ tf_session.TF_CloseSession(self._session, status)
+ if tf_session.TF_GetCode(status) != 0:
+ raise RuntimeError(tf_session.TF_Message(status))
+ finally:
+ tf_session.TF_DeleteStatus(status)
+
+ def __del__(self):
+ self.close()
+ try:
+ status = tf_session.TF_NewStatus()
+ if self._session is not None:
+ tf_session.TF_DeleteSession(self._session, status)
+ if tf_session.TF_GetCode(status) != 0:
+ raise RuntimeError(tf_session.TF_Message(status))
+ self._session = None
+ finally:
+ tf_session.TF_DeleteStatus(status)
+
+ @property
+ def graph(self):
+ """The graph that was launched in this session."""
+ return self._graph
+
+ @property
+ def graph_def(self):
+ """A serializable version of the underlying TensorFlow graph.
+
+ Returns:
+ A graph_pb2.GraphDef proto containing nodes for all of the Operations in
+ the underlying TensorFlow graph.
+ """
+ return self._graph.as_graph_def()
+
+ @property
+ def sess_str(self):
+ return self._target
+
+ def as_default(self):
+ """Returns a context manager that makes this object the default session.
+
+ Use with the `with` keyword to specify that calls to
+ [`Operation.run()`](framework.md#Operation.run) or
+    [`Tensor.eval()`](framework.md#Tensor.eval) should be executed in
+ this session.
+
+ ```python
+    c = tf.constant(...)
+ sess = tf.Session()
+
+ with sess.as_default():
+ assert tf.get_default_session() is sess
+ print c.eval()
+ ```
+
+ To get the current default session, use
+ [`tf.get_default_session()`](#get_default_session).
+
+
+ *N.B.* The `as_default` context manager *does not* close the
+ session when you exit the context, and you must close the session
+ explicitly.
+
+ ```python
+ c = tf.constant(...)
+ sess = tf.Session()
+ with sess.as_default():
+ print c.eval()
+ # ...
+ with sess.as_default():
+ print c.eval()
+
+ sess.close()
+ ```
+
+ Alternatively, you can use `with tf.Session():` to create a
+ session that is automatically closed on exiting the context,
+ including when an uncaught exception is raised.
+
+ *N.B.* The default graph is a property of the current thread. If you
+ create a new thread, and wish to use the default session in that
+ thread, you must explicitly add a `with sess.as_default():` in that
+ thread's function.
+
+ Returns:
+ A context manager using this session as the default session.
+
+ """
+ return ops.default_session(self)
+
+ # Eventually, this registration could be opened up to support custom
+ # Tensor expansions. Expects tuples of (Type, fetch_fn, feed_fn),
+ # where the signatures are:
+ # fetch_fn : Type -> (list of Tensors,
+ # lambda: list of fetched np.ndarray -> TypeVal)
+ # feed_fn : Type, TypeVal -> list of (Tensor, value)
+ # Conceptually, fetch_fn describes how to expand fetch into its
+  # component Tensors and how to contract the fetched results back into
+ # a single return value. feed_fn describes how to unpack a single fed
+ # value and map it to feeds of a Tensor and its corresponding value.
+ # pylint: disable=g-long-lambda
+ _REGISTERED_EXPANSIONS = [
+ # SparseTensors are fetched as SparseTensorValues. They can be fed
+ # SparseTensorValues or normal tuples.
+ (ops.SparseTensor,
+ lambda fetch: (
+ [fetch.indices, fetch.values, fetch.shape],
+ lambda fetched_vals: ops.SparseTensorValue(*fetched_vals)),
+ lambda feed, feed_val: list(zip(
+ [feed.indices, feed.values, feed.shape], feed_val))),
+ # The default catches all types and performs no expansions.
+ (object,
+ lambda fetch: ([fetch], lambda fetched_vals: fetched_vals[0]),
+ lambda feed, feed_val: [(feed, feed_val)])]
+ # pylint: enable=g-long-lambda
+
+ def run(self, fetches, feed_dict=None):
+ """Runs the operations and evaluates the tensors in `fetches`.
+
+ This method runs one "step" of TensorFlow computation, by
+ running the necessary graph fragment to execute every `Operation`
+ and evaluate every `Tensor` in `fetches`, substituting the values in
+ `feed_dict` for the corresponding input values.
+
+ The `fetches` argument may be a list of graph elements or a single
+ graph element, and these determine the return value of this
+ method. A graph element can be one of the following types:
+
+ * If the *i*th element of `fetches` is an
+ [`Operation`](framework.md#Operation), the *i*th return value
+ will be `None`.
+ * If the *i*th element of `fetches` is a
+ [`Tensor`](framework.md#Tensor), the *i*th return value will
+ be a numpy ndarray containing the value of that tensor.
+ * If the *i*th element of `fetches` is a
+ [`SparseTensor`](sparse_ops.md#SparseTensor), the *i*th
+ return value will be a
+ [`SparseTensorValue`](sparse_ops.md#SparseTensorValue)
+ containing the value of that sparse tensor.
+
+ The optional `feed_dict` argument allows the caller to override
+ the value of tensors in the graph. Each key in `feed_dict` can be
+ one of the following types:
+
+ * If the key is a [`Tensor`](framework.md#Tensor), the
+ value may be a Python scalar, string, list, or numpy ndarray
+ that can be converted to the same `dtype` as that
+ tensor. Additionally, if the key is a
+ [placeholder](io_ops.md#placeholder), the shape of the value
+ will be checked for compatibility with the placeholder.
+ * If the key is a [`SparseTensor`](sparse_ops.md#SparseTensor),
+ the value should be a
+ [`SparseTensorValue`](sparse_ops.md#SparseTensorValue).
+
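+    As a minimal sketch (the names `x` and `y` below are invented for this
+    example and assume the usual `tf.placeholder` constructor):
+
+    ```python
+    x = tf.placeholder(tf.float32, shape=[2])
+    y = x * 2.0
+    with tf.Session() as sess:
+      # Feed a value for the placeholder `x` and fetch the value of `y`.
+      y_val = sess.run(y, feed_dict={x: [1.0, 2.0]})  # ==> [2.0, 4.0]
+    ```
+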
+ Args:
+ fetches: A single graph element, or a list of graph elements
+ (described above).
+ feed_dict: A dictionary that maps graph elements to values
+ (described above).
+
+ Returns:
+ Either a single value if `fetches` is a single graph element, or
+ a list of values if `fetches` is a list (described above).
+
+ Raises:
+ RuntimeError: If this `Session` is in an invalid state (e.g. has been
+ closed).
+ TypeError: If `fetches` or `feed_dict` keys are of an inappropriate type.
+ ValueError: If `fetches` or `feed_dict` keys are invalid or refer to a
+ `Tensor` that doesn't exist.
+
+ """
+ def _fetch_fn(fetch):
+ for tensor_type, fetch_fn, _ in BaseSession._REGISTERED_EXPANSIONS:
+ if isinstance(fetch, tensor_type):
+ return fetch_fn(fetch)
+ raise TypeError('Fetch argument %r has invalid type %r'
+ % (fetch, type(fetch)))
+
+ def _feed_fn(feed, feed_val):
+ for tensor_type, _, feed_fn in BaseSession._REGISTERED_EXPANSIONS:
+ if isinstance(feed, tensor_type):
+ return feed_fn(feed, feed_val)
+ raise TypeError('Feed argument %r has invalid type %r'
+ % (feed, type(feed)))
+
+ # Check session.
+ if self._closed:
+ raise RuntimeError('Attempted to use a closed Session.')
+
+ # Validate and process fetches.
+ is_list_fetch = isinstance(fetches, (list, tuple))
+ if not is_list_fetch:
+ fetches = [fetches]
+
+ unique_fetch_targets = set()
+ target_list = []
+
+ fetch_info = []
+ for fetch in fetches:
+ subfetches, fetch_contraction_fn = _fetch_fn(fetch)
+ subfetch_names = []
+ for subfetch in subfetches:
+ try:
+ fetch_t = self.graph.as_graph_element(subfetch, allow_tensor=True,
+ allow_operation=True)
+ if isinstance(fetch_t, ops.Operation):
+ target_list.append(fetch_t.name)
+ else:
+ subfetch_names.append(fetch_t.name)
+ except TypeError as e:
+ raise TypeError('Fetch argument %r of %r has invalid type %r, '
+ 'must be a string or Tensor. (%s)'
+ % (subfetch, fetch, type(subfetch), e.message))
+ except ValueError as e:
+ raise ValueError('Fetch argument %r of %r cannot be interpreted as a '
+ 'Tensor. (%s)' % (subfetch, fetch, e.message))
+ except KeyError as e:
+ raise ValueError('Fetch argument %r of %r cannot be interpreted as a '
+ 'Tensor. (%s)' % (subfetch, fetch, e.message))
+ unique_fetch_targets.update(subfetch_names)
+ fetch_info.append((subfetch_names, fetch_contraction_fn))
+
+ unique_fetch_targets = list(unique_fetch_targets)
+
+ # Create request.
+ feed_dict_string = {}
+
+ # Validate and process feed_dict.
+ if feed_dict:
+ for feed, feed_val in feed_dict.iteritems():
+ for subfeed, subfeed_val in _feed_fn(feed, feed_val):
+ try:
+ subfeed_t = self.graph.as_graph_element(subfeed, allow_tensor=True,
+ allow_operation=False)
+ except Exception as e:
+ e.message = ('Cannot interpret feed_dict key as Tensor: '
+ + e.message)
+ e.args = (e.message,)
+ raise e
+ np_val = np.array(subfeed_val, dtype=subfeed_t.dtype.as_numpy_dtype)
+ if subfeed_t.op.type == 'Placeholder':
+ if not subfeed_t.get_shape().is_compatible_with(np_val.shape):
+ raise ValueError(
+ 'Cannot feed value of shape %r for Tensor %r, '
+ 'which has shape %r'
+ % (np_val.shape, subfeed_t.name,
+ tuple(subfeed_t.get_shape().dims)))
+ feed_dict_string[str(subfeed_t.name)] = np_val
+
+ # Run request and get response.
+ results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
+
+ # User may have fetched the same tensor multiple times, but we
+ # only fetch them from the runtime once. Furthermore, they may
+ # be wrapped as a tuple of tensors. Here we map the results back
+ # to what the client asked for.
+ fetched_results = dict(zip(unique_fetch_targets, results))
+ ret = []
+ for fetch_names, fetch_contraction_fn in fetch_info:
+ if fetch_names:
+ fetched_vals = [fetched_results[name] for name in fetch_names]
+ ret.append(fetch_contraction_fn(fetched_vals))
+ else:
+ ret.append(None)
+
+ if is_list_fetch:
+ return ret
+ else:
+ return ret[0]
+
+ # Captures the name of a node in an error status.
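+  # For example, it would extract "MatMul_1" from a status message containing
+  # "[[Node: MatMul_1 = MatMul(a, b)]]" (an illustrative message, not an
+  # exact runtime error string).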
+ _NODEDEF_NAME_RE = re.compile(r'\[\[Node: ([^ ]*?) =')
+
+ def _do_run(self, target_list, fetch_list, feed_dict):
+ """Runs a step based on the given fetches and feeds.
+
+ Args:
+ target_list: A list of strings corresponding to names of tensors
+        or operations to be run, but not fetched.
+ fetch_list: A list of strings corresponding to names of tensors to be
+ fetched and operations to be run.
+ feed_dict: A dictionary that maps tensor names to numpy ndarrays.
+
+ Returns:
+ A list of numpy ndarrays, corresponding to the elements of
+ `fetch_list`. If the ith element of `fetch_list` contains the
+ name of an operation, the first Tensor output of that operation
+ will be returned for that element.
+ """
+ try:
+ # Ensure any changes to the graph are reflected in the runtime.
+ with self._extend_lock:
+ if self._graph.version > self._current_version:
+ graph_def = self._graph.as_graph_def(
+ from_version=self._current_version)
+
+ try:
+ status = tf_session.TF_NewStatus()
+ tf_session.TF_ExtendGraph(
+ self._session, graph_def.SerializeToString(), status)
+ if tf_session.TF_GetCode(status) != 0:
+ raise RuntimeError(tf_session.TF_Message(status))
+ self._opened = True
+ finally:
+ tf_session.TF_DeleteStatus(status)
+
+ self._current_version = self._graph.version
+
+ return tf_session.TF_Run(self._session, feed_dict, fetch_list,
+ target_list)
+
+ except tf_session.StatusNotOK as e:
+ e_type, e_value, e_traceback = sys.exc_info()
+ m = BaseSession._NODEDEF_NAME_RE.search(e.error_message)
+ if m is not None:
+ node_name = m.group(1)
+ node_def = None
+ try:
+ op = self._graph.get_operation_by_name(node_name)
+ node_def = op.node_def
+ except KeyError:
+ op = None
+ # pylint: disable=protected-access
+ raise errors._make_specific_exception(node_def, op, e.error_message,
+ e.code)
+ # pylint: enable=protected-access
+ raise e_type, e_value, e_traceback
+
+
+class Session(BaseSession):
+ """A class for running TensorFlow operations.
+
+ A `Session` object encapsulates the environment in which `Operation`
+ objects are executed, and `Tensor` objects are evaluated. For
+ example:
+
+ ```python
+ # Build a graph.
+ a = tf.constant(5.0)
+ b = tf.constant(6.0)
+ c = a * b
+
+ # Launch the graph in a session.
+ sess = tf.Session()
+
+ # Evaluate the tensor `c`.
+ print sess.run(c)
+ ```
+
+ A session may own resources, such as
+ [variables](state_ops.md#Variable), [queues](io_ops.md#QueueBase),
+ and [readers](io_ops.md#ReaderBase). It is important to release
+ these resources when they are no longer required. To do this, either
+ invoke the [`close()`](#Session.close) method on the session, or use
+ the session as a context manager. The following two examples are
+ equivalent:
+
+ ```python
+ # Using the `close()` method.
+ sess = tf.Session()
+ sess.run(...)
+ sess.close()
+
+ # Using the context manager.
+ with tf.Session() as sess:
+ sess.run(...)
+ ```
+
+ The [`ConfigProto`]
+ (https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/config.proto)
+ protocol buffer exposes various configuration options for a
+ session. For example, to create a session that uses soft constraints
+ for device placement, and log the resulting placement decisions,
+ create a session as follows:
+
+ ```python
+ # Launch the graph in a session that allows soft device placement and
+ # logs the placement decisions.
+ sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
+ log_device_placement=True))
+ ```
+
+ @@__init__
+ @@run
+ @@close
+
+ @@graph
+
+ @@as_default
+
+ """
+
+ def __init__(self, target='', graph=None, config=None):
+ """Creates a new TensorFlow session.
+
+ If no `graph` argument is specified when constructing the session,
+ the default graph will be launched in the session. If you are
+    using more than one graph (created with `tf.Graph()`) in the same
+ process, you will have to use different sessions for each graph,
+ but each graph can be used in multiple sessions. In this case, it
+ is often clearer to pass the graph to be launched explicitly to
+ the session constructor.
+
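+    For instance, a minimal sketch of launching an explicitly constructed
+    graph:
+
+    ```python
+    g = tf.Graph()
+    with g.as_default():
+      c = tf.constant(30.0)
+    sess = tf.Session(graph=g)
+    print sess.run(c)
+    ```
+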
+ Args:
+ target: (Optional.) The execution engine to connect to.
+ Defaults to using an in-process engine. At present, no value
+ other than the empty string is supported.
+ graph: (Optional.) The `Graph` to be launched (described above).
+ config: (Optional.) A [`ConfigProto`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/config.proto)
+ protocol buffer with configuration options for the session.
+
+ """
+ super(Session, self).__init__(target, graph, config=config)
+ self._context_managers = [self.graph.as_default(), self.as_default()]
+
+ def __enter__(self):
+ for context_manager in self._context_managers:
+ context_manager.__enter__()
+ return self
+
+ def __exit__(self, exec_type, exec_value, exec_tb):
+ if exec_type is errors.OpError:
+ logging.error('Session closing due to OpError: %s', (exec_value,))
+
+ for context_manager in reversed(self._context_managers):
+ context_manager.__exit__(exec_type, exec_value, exec_tb)
+
+ self.close()
+
+
+class InteractiveSession(BaseSession):
+ """A TensorFlow `Session` for use in interactive contexts, such as a shell.
+
+ In some cases, such as interactive shells and IPython notebooks, it is
+ useful to be able to define a `Session` without using a with block: this
+ style enables statements to be executed immediately, rather than at the
+ termination of the block. In that case, it must be closed using
+ `Session.close()`. For example:
+
+ ```python
+ sess = InteractiveSession()
+ a = tf.constant(5.0)
+ b = tf.constant(6.0)
+ c = a * b
+  print c.eval()
+ sess.close()
+ ```
+
+ @@__init__
+ @@close
+ """
+
+ def __init__(self, target='', graph=None):
+ """Initializes an `InteractiveSession` object similar to `Session`.
+
+ Args:
+ target: Optional. The TensorFlow execution engine to connect to.
+ graph: Optional. The `Graph` object to be used. If this argument is None,
+ the default graph will be used.
+ """
+ super(InteractiveSession, self).__init__(target, graph)
+ self._default_session = self.as_default()
+ self._default_session.__enter__()
+ self._explicit_graph = graph
+ if self._explicit_graph is not None:
+ self._default_graph = graph.as_default()
+ self._default_graph.__enter__()
+
+ def close(self):
+ """Closes an `InteractiveSession`."""
+ super(InteractiveSession, self).close()
+ if self._explicit_graph is not None:
+ self._default_graph.__exit__(None, None, None)
+ self._default_session.__exit__(None, None, None)
diff --git a/tensorflow/python/client/session_test.py b/tensorflow/python/client/session_test.py
new file mode 100644
index 0000000000..4492840dcf
--- /dev/null
+++ b/tensorflow/python/client/session_test.py
@@ -0,0 +1,555 @@
+"""Tests for tensorflow.python.client.session.Session."""
+import threading
+import time
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.core.framework import config_pb2
+from tensorflow.core.lib.core import error_codes_pb2
+from tensorflow.python.client import session
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import googletest
+
+
+# NOTE(mrry): Dummy shape registration for op used in the tests.
+ops.RegisterShape('ConstructionFails')(None)
+
+
+class SessionTest(test_util.TensorFlowTestCase):
+
+ def testUseExistingGraph(self):
+ with ops.Graph().as_default() as g, ops.device('/cpu:0'):
+ a = constant_op.constant(6.0, shape=[1, 1])
+ b = constant_op.constant(7.0, shape=[1, 1])
+ c = math_ops.matmul(a, b, name='matmul')
+ with session.Session(graph=g):
+ result = c.eval()
+ self.assertAllEqual(result, [[42.0]])
+
+ def testUseDefaultGraph(self):
+ with ops.Graph().as_default(), ops.device('/cpu:0'):
+ a = constant_op.constant(6.0, shape=[1, 1])
+ b = constant_op.constant(7.0, shape=[1, 1])
+ c = math_ops.matmul(a, b, name='matmul')
+ with session.Session():
+ result = c.eval()
+ self.assertAllEqual(result, [[42.0]])
+
+ def testCreate(self):
+ with session.Session():
+ inp = constant_op.constant(10.0, name='W1')
+ copy = array_ops.identity(inp)
+ # Test with feed.
+ # TODO(mrry): Investigate why order='F' didn't work.
+ arr = np.asarray([[0, 1, 2], [3, 4, 5]], dtype=np.float32, order='C')
+ copy_val = copy.eval({'W1:0': arr})
+ self.assertAllEqual(arr, copy_val)
+ # Test without feed.
+ copy_val = copy.eval()
+ self.assertAllEqual(np.asarray(10.0, dtype=np.float32), copy_val)
+
+ def testManyCPUs(self):
+ # TODO(keveman): Implement ListDevices and test for the number of
+ # devices returned by ListDevices.
+ with session.Session(
+ config=config_pb2.ConfigProto(device_count={'CPU': 2})):
+ inp = constant_op.constant(10.0, name='W1')
+ self.assertAllEqual(inp.eval(), 10.0)
+
+ def testErrorsReported(self):
+ with session.Session() as s:
+ constant_op.constant(10.0, name='W1')
+ with self.assertRaises(ValueError):
+ s.run('foo:0')
+
+ def testErrorPayload(self):
+ with session.Session():
+ a = array_ops.placeholder(types.float32)
+ with self.assertRaisesOpError(lambda e: e.op == a.op):
+ a.eval()
+
+ def testOpConstructionErrorPayload(self):
+ with session.Session():
+ failing_op = ops.get_default_graph().create_op(
+ 'ConstructionFails', [], [], name='f')
+
+ def exc_predicate(e):
+ return (e.op == failing_op
+ and e.error_code == error_codes_pb2.INVALID_ARGUMENT)
+ with self.assertRaisesOpError(exc_predicate):
+ failing_op.run()
+
+ def testErrorBasedOn(self):
+ with session.Session() as sess:
+ a = constant_op.constant(0.0, shape=[2, 3])
+ # NOTE(mrry): The original_op is nonsense, but used here to test that the
+ # errors are reported correctly.
+ # pylint: disable=protected-access
+ with sess.graph._original_op(a.op):
+ b = array_ops.identity(a, name='id')
+ with sess.graph._original_op(b.op):
+ c = array_ops.placeholder(types.float32)
+ # pylint: enable=protected-access
+
+ def exc_predicate(e):
+ return (e.op == c.op
+ and e.op._original_op == b.op
+ and e.op._original_op._original_op == a.op)
+ with self.assertRaisesOpError(exc_predicate):
+ c.eval()
+
+ def testFetchTensorObject(self):
+ with session.Session() as s:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+ results_with_list = s.run([c])
+ self.assertAllEqual([[4.0, 4.0, 4.0]], results_with_list[0])
+ results_with_single = s.run(c)
+ self.assertAllEqual([[4.0, 4.0, 4.0]], results_with_single)
+ results_with_get = c.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], results_with_get)
+ a_val, b_val = s.run([a, b]) # Test multiple fetches.
+ self.assertAllEqual([[1.0, 1.0]], a_val)
+ self.assertAllEqual([[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]], b_val)
+
+ def testFetchScalar(self):
+ with session.Session() as s:
+ for scalar in np.int32, np.int64, np.float32, np.float64:
+ x = scalar(7)
+ y = scalar(8)
+ tf_x = constant_op.constant(x, shape=[])
+ tf_y = constant_op.constant(y)
+ tf_xy = math_ops.add(tf_x, tf_y)
+ # Single fetch
+ xy = s.run(tf_xy)
+ self.assertEqual(scalar, type(xy))
+ self.assertEqual(x + y, xy)
+ # List fetch
+ xy, = s.run([tf_xy])
+ self.assertEqual(scalar, type(xy))
+ self.assertEqual(x + y, xy)
+
+ def testFetchOperationObject(self):
+ with session.Session() as s:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ v = variables.Variable(a, name='testFetchOperationObject_v')
+ s.run(v.initializer)
+ v_val = s.run(v)
+ self.assertAllEqual([[1.0, 1.0]], v_val)
+
+ def testFetchSparseTensor(self):
+ with session.Session() as s:
+ indices = np.array([[3, 2, 0], [4, 5, 1]]).astype(np.int64)
+ values = np.array([1.0, 2.0]).astype(np.float32)
+ shape = np.array([7, 9, 2]).astype(np.int64)
+ sp = ops.SparseTensor(
+ constant_op.constant(indices),
+ constant_op.constant(values),
+ constant_op.constant(shape))
+ # Single fetch, use as tuple
+ sp_out = s.run(sp)
+ indices_out, values_out, shape_out = sp_out
+ self.assertAllEqual(indices_out, indices)
+ self.assertAllEqual(values_out, values)
+ self.assertAllEqual(shape_out, shape)
+ # Single fetch, use as SparseTensorValue
+ sp_out = s.run(sp)
+ self.assertAllEqual(sp_out.indices, indices)
+ self.assertAllEqual(sp_out.values, values)
+ self.assertAllEqual(sp_out.shape, shape)
+ # Tuple fetch, use as tuple
+ indices_out, values_out, shape_out = s.run(sp)
+ self.assertAllEqual(indices_out, indices)
+ self.assertAllEqual(values_out, values)
+ self.assertAllEqual(shape_out, shape)
+ # List fetch, use as tuple
+ (indices_out, values_out, shape_out), = s.run([sp])
+ self.assertAllEqual(indices_out, indices)
+ self.assertAllEqual(values_out, values)
+ self.assertAllEqual(shape_out, shape)
+ # List fetch, use as SparseTensorValue
+ sp_out, = s.run([sp])
+ self.assertAllEqual(sp_out.indices, indices)
+ self.assertAllEqual(sp_out.values, values)
+ self.assertAllEqual(sp_out.shape, shape)
+
+ def testFeedSparseTensor(self):
+ with session.Session() as s:
+ indices = np.array([[3, 2, 0], [4, 5, 1]]).astype(np.int64)
+ values = np.array([1.0, 2.0]).astype(np.float32)
+ shape = np.array([7, 9, 2]).astype(np.int64)
+ sp = ops.SparseTensor(
+ array_ops.placeholder(dtype=np.int64, shape=(2, 3)),
+ array_ops.placeholder(dtype=np.float32, shape=(2,)),
+ array_ops.placeholder(dtype=np.int64, shape=(3,)),)
+ sp_indices = array_ops.identity(sp.indices)
+ sp_values = array_ops.identity(sp.values)
+ sp_shape = array_ops.identity(sp.shape)
+ sp2 = ops.SparseTensor(sp_indices, sp_values, sp_shape)
+ # Feed with tuple
+ indices_out, values_out, shape_out = s.run(
+ [sp_indices, sp_values, sp_shape], {sp: (indices, values, shape)})
+ self.assertAllEqual(indices_out, indices)
+ self.assertAllEqual(values_out, values)
+ self.assertAllEqual(shape_out, shape)
+ # Feed with SparseTensorValue
+ indices_out, values_out, shape_out = s.run(
+ [sp_indices, sp_values, sp_shape],
+ {sp: ops.SparseTensorValue(indices, values, shape)})
+ self.assertAllEqual(indices_out, indices)
+ self.assertAllEqual(values_out, values)
+ self.assertAllEqual(shape_out, shape)
+ # Feed with SparseTensorValue, fetch SparseTensorValue
+ sp2_out = s.run(sp2, {sp: ops.SparseTensorValue(indices, values, shape)})
+ self.assertAllEqual(sp2_out.indices, indices)
+ self.assertAllEqual(sp2_out.values, values)
+ self.assertAllEqual(sp2_out.shape, shape)
+
+ def testExtendWithStatelessOperations(self):
+ with session.Session() as s:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+ c_val = s.run(c)
+ self.assertAllEqual([[4.0, 4.0, 4.0]], c_val)
+ d = constant_op.constant([1.0, 2.0, 3.0], shape=[3, 1])
+ e = math_ops.matmul(c, d)
+ # Extend will happen here.
+ e_val = s.run(e)
+ self.assertAllEqual([[24.0]], e_val)
+
+ def testExtendWithStatefulOperations(self):
+ with session.Session() as s:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+ v = variables.Variable(c, name='testExtendWithStatefulOperations_v')
+ v.initializer.run()
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ d = constant_op.constant(3.0, shape=[2, 3])
+ e = math_ops.matmul(a, d)
+ assign_e_to_v = state_ops.assign(v, e)
+ # Extend will happen here.
+ e_val = e.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], e_val)
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ s.run(assign_e_to_v)
+ v_val = v.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], v_val)
+
+ def testExtendWithGroupBy(self):
+ with session.Session() as s:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ p = variables.Variable(a, name='testExtendWithGroupBy_p')
+ a_val = a.eval() # Force an Extend after this op.
+ self.assertAllEqual([[1.0, 1.0]], a_val)
+
+ b = constant_op.constant(2.0, shape=[1, 2])
+ q = variables.Variable(b, name='testExtendWithGroupBy_q')
+ # Extend will happen here.
+ init = control_flow_ops.group(p.initializer, q.initializer)
+ s.run(init)
+ p_val, q_val = s.run([p, q])
+
+ self.assertAllEqual([[1.0, 1.0]], p_val)
+ self.assertAllEqual([[2.0, 2.0]], q_val)
+
+ def testTensorGetMethod(self):
+ with session.Session():
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+
+ c_val = c.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], c_val)
+
+ fed_c_val = c.eval(feed_dict={a.name: [[4.0, 4.0]]})
+ self.assertAllEqual([[16.0, 16.0, 16.0]], fed_c_val)
+
+ def testOperationRunMethod(self):
+ with session.Session():
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[1, 2], name='b')
+ v = variables.Variable(a, a.dtype)
+ assign_a_to_v = state_ops.assign(v, a)
+
+ assign_a_to_v.eval()
+
+ v_val = v.eval()
+ self.assertAllEqual([[1.0, 1.0]], v_val)
+
+ assign_b_to_v = state_ops.assign(v, b)
+
+ assign_b_to_v.eval()
+ v_val = v.eval()
+ self.assertAllEqual([[2.0, 2.0]], v_val)
+
+ assign_b_to_v.eval(feed_dict={'b:0': [[3.0, 3.0]]})
+ v_val = v.eval()
+ self.assertAllEqual([[3.0, 3.0]], v_val)
+
+ def testDefaultGraph(self):
+ with session.Session() as s:
+ self.assertEqual(ops.get_default_graph(), s.graph)
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ self.assertEqual(ops.get_default_graph(), a.graph)
+ self.assertEqual(ops.get_default_graph(), b.graph)
+ c = math_ops.matmul(a, b)
+ v = variables.Variable(c, name='testDefaultGraph_v')
+ v.initializer.run()
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ d = constant_op.constant(3.0, shape=[2, 3])
+ e = math_ops.matmul(a, d)
+ assign_e_to_v = state_ops.assign(v, e)
+ e_val = e.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], e_val)
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ s.run(assign_e_to_v)
+ v_val = v.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], v_val)
+ self.assertEqual(ops.get_default_graph(), s.graph)
+
+ def _testDefaultGraphInThread(self, constructed_event, continue_event, i):
+ with session.Session() as s:
+ self.assertEqual(ops.get_default_graph(), s.graph)
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+ v = variables.Variable(c, name='var_%d' % i)
+
+ # Block here until all threads have constructed their graph.
+ constructed_event.set()
+ continue_event.wait()
+
+ assign_c_to_v = state_ops.assign(v, c)
+ v.initializer.run()
+ assign_c_to_v.eval()
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ d = constant_op.constant(3.0, shape=[2, 3])
+ e = math_ops.matmul(a, d)
+ assign_e_to_v = state_ops.assign(v, e)
+ e_val = e.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], e_val)
+ v_val = v.eval()
+ self.assertAllEqual([[4.0, 4.0, 4.0]], v_val)
+ s.run(assign_e_to_v)
+ v_val = v.eval()
+ self.assertAllEqual([[6.0, 6.0, 6.0]], v_val)
+ self.assertEqual(ops.get_default_graph(), s.graph)
+
+ def testDefaultGraphWithThreads(self):
+ # Fork ten threads that use their thread-local default graph.
+ threads = []
+ constructed_events = [threading.Event() for _ in range(10)]
+ continue_event = threading.Event()
+ for i, constructed_event in enumerate(constructed_events):
+ t = self.checkedThread(target=self._testDefaultGraphInThread,
+ args=(constructed_event, continue_event, i))
+ threads.append(t)
+ for t in threads:
+ t.start()
+ for constructed_event in constructed_events:
+ constructed_event.wait()
+ continue_event.set()
+ for t in threads:
+ t.join()
+
+ def testParallelRun(self):
+ with session.Session() as sess:
+ c = constant_op.constant(5.0)
+ ev = threading.Event()
+
+ def run_step():
+ ev.wait()
+ val = c.eval(session=sess)
+ self.assertEqual(val, 5.0)
+ threads = [self.checkedThread(target=run_step) for _ in range(100)]
+ for t in threads:
+ t.start()
+ ev.set()
+ for t in threads:
+ t.join()
+
+ def testRunFeedDict(self):
+ with session.Session() as s:
+ x = array_ops.zeros([2])
+
+ y = s.run(2 * x, feed_dict={x: np.ones(2).astype(np.float32)})
+ self.assertAllEqual(y, 2 * np.ones(2))
+
+ y = s.run(2 * x, feed_dict={x.name: np.ones(2).astype(np.float32)})
+ self.assertAllEqual(y, 2 * np.ones(2))
+
+ y = s.run(2 * x, feed_dict={x: [1, 1]})
+ assert (y == 2 * np.ones(2)).all()
+
+ def testGraphDef(self):
+ with session.Session() as sess:
+ self.assertProtoEquals('', sess.graph_def)
+ c = constant_op.constant(5.0, name='c')
+ self.assertEquals(len(sess.graph_def.node), 1)
+ d = constant_op.constant(6.0, name='d')
+ self.assertEquals(len(sess.graph_def.node), 2)
+ self.assertAllEqual(c.eval(), 5.0)
+ self.assertAllEqual(d.eval(), 6.0)
+ e = constant_op.constant(7.0, name='e')
+ self.assertEquals(len(sess.graph_def.node), 3)
+ self.assertAllEqual(e.eval(), 7.0)
+
+ def testUseAfterClose(self):
+ with session.Session() as sess:
+ c = constant_op.constant(5.0)
+ self.assertAllEqual(sess.run(c), 5.0)
+ with self.assertRaisesWithPredicateMatch(
+ RuntimeError, lambda e: 'Attempted to use a closed Session.' in str(e)):
+ sess.run(c)
+
+ def testUseAfterCloseConcurrent(self):
+ with session.Session() as sess:
+ c = constant_op.constant(5.0)
+ self.assertAllEqual(sess.run(c), 5.0)
+
+ def update_thread():
+ with self.assertRaisesWithPredicateMatch(
+ RuntimeError,
+ lambda e: 'Attempted to use a closed Session.' in str(e)):
+ while True:
+ sess.run(c)
+ t = threading.Thread(target=update_thread)
+ t.start()
+ time.sleep(0.1)
+ sess.close()
+ t.join()
+
+ def testNotEntered(self):
+ # pylint: disable=protected-access
+ self.assertEqual(ops._default_session_stack.get_default(), None)
+ # pylint: enable=protected-access
+ with ops.device('/cpu:0'):
+ sess = session.Session()
+ c_1 = constant_op.constant(5.0)
+ with sess.graph.as_default():
+ c_2 = constant_op.constant(5.0)
+ self.assertEqual(c_1.graph, c_2.graph)
+ self.assertEqual(sess.run(c_2), 5.0)
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: 'No default session is registered.' in str(e)):
+ c_2.eval()
+
+ def testInteractive(self):
+ with ops.device('/cpu:0'):
+ sess = session.InteractiveSession()
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+ self.assertAllEqual([[4.0, 4.0, 4.0]], c.eval())
+ d = constant_op.constant([1.0, 2.0, 3.0], shape=[3, 1])
+ e = math_ops.matmul(c, d)
+ self.assertAllEqual([[24.0]], e.eval())
+ sess.close()
+
+ def testSharedGraph(self):
+ with ops.Graph().as_default() as g, ops.device('/cpu:0'):
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[2, 3])
+ c = math_ops.matmul(a, b)
+
+ with session.Session(graph=g) as sess1:
+ with session.Session(graph=g) as sess2:
+ self.assertAllEqual(sess1.run(c), sess2.run(c))
+
+ def testDuplicatedInputs(self):
+ with session.Session() as sess:
+ a = constant_op.constant(1.0, shape=[1, 2])
+ b = constant_op.constant(2.0, shape=[1, 3])
+ a_val, b_val, a2_val = sess.run([a, b, a])
+ self.assertAllEqual(a_val, [[1.0, 1.0]])
+ self.assertAllEqual(b_val, [[2.0, 2.0, 2.0]])
+ self.assertAllEqual(a2_val, [[1.0, 1.0]])
+
+ def testFeedAndFetch(self):
+ with session.Session():
+ for dtype in [types.float32,
+ types.float64,
+ types.int32,
+ types.uint8,
+ types.int16,
+ types.int8,
+ types.int64,
+ types.bool,
+ types.complex64]:
+ for shape in [(32, 4, 128), (37,), (2, 0, 6), (0, 0, 0)]:
+ np_dtype = dtype.as_numpy_dtype
+
+ feed_t = array_ops.placeholder(dtype=dtype, shape=shape)
+ out_t = array_ops.identity(feed_t)
+
+ np_array = np.random.randint(-10, 10, shape)
+
+ if dtype == types.bool:
+ np_array = np_array > 0
+ elif dtype == types.complex64:
+ np_array = np.sqrt(np_array.astype(np_dtype))
+ else:
+ np_array = np_array.astype(np_dtype)
+
+ self.assertAllEqual(np_array,
+ out_t.eval(feed_dict={feed_t: np_array}))
+
+ def testStringFetch(self):
+ with session.Session():
+ for shape in [(32, 4, 128), (37,), (2, 0, 6), (0, 0, 0)]:
+ size = 1
+ for s in shape:
+ size *= s
+ c_list = np.array([str(i) for i in xrange(size)],
+ dtype=np.object).reshape(shape) if size > 0 else []
+ c = constant_op.constant(c_list)
+ self.assertAllEqual(c.eval(), c_list)
+
+ def testStringFeed(self):
+ with session.Session():
+ for shape in [(32, 4, 128), (37,), (2, 0, 6), (0, 0, 0)]:
+ size = 1
+ for s in shape:
+ size *= s
+ c_list = np.array([str(i) for i in xrange(size)],
+ dtype=np.object).reshape(shape)
+ feed_t = array_ops.placeholder(dtype=types.string, shape=shape)
+ c = array_ops.identity(feed_t)
+ self.assertAllEqual(c.eval(feed_dict={feed_t: c_list}), c_list)
+
+ def testStringFeedWithNullCharacters(self):
+ with session.Session():
+ c_list = ['\n\x01\x00', '\n\x00\x01']
+ feed_t = array_ops.placeholder(dtype=types.string, shape=[2])
+ c = array_ops.identity(feed_t)
+ out = c.eval(feed_dict={feed_t: c_list})
+ self.assertEqual(c_list[0], out[0])
+ self.assertEqual(c_list[1], out[1])
+
+ def testInvalidTargetFails(self):
+ with self.assertRaises(RuntimeError):
+ session.Session("INVALID_TARGET")
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/client/tensorflow_server.i b/tensorflow/python/client/tensorflow_server.i
new file mode 100644
index 0000000000..65b3826961
--- /dev/null
+++ b/tensorflow/python/client/tensorflow_server.i
@@ -0,0 +1,16 @@
+%include "tensorflow/python/platform/base.i"
+%import(module="tensorflow.python.pywrap_tensorflow") "tensorflow/python/lib/core/status.i"
+
+%{
+#include "tensorflow/core/public/tensorflow_server.h"
+%}
+
+%ignoreall
+
+%unignore tensorflow;
+%unignore tensorflow::LaunchTensorFlow;
+
+%include "tensorflow/core/public/tensorflow_server.h"
+
+%unignoreall
+
diff --git a/tensorflow/python/client/test_construction_fails_op.cc b/tensorflow/python/client/test_construction_fails_op.cc
new file mode 100644
index 0000000000..47b2b5b49c
--- /dev/null
+++ b/tensorflow/python/client/test_construction_fails_op.cc
@@ -0,0 +1,22 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+REGISTER_OP("ConstructionFails");
+
+class ConstructionFailsOp : public OpKernel {
+ public:
+ explicit ConstructionFailsOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
+ OP_REQUIRES(ctx, false,
+ errors::InvalidArgument("Failure during construction."));
+ }
+
+ void Compute(OpKernelContext* ctx) override {}
+};
+
+REGISTER_KERNEL_BUILDER(Name("ConstructionFails").Device(DEVICE_CPU),
+ ConstructionFailsOp);
+
+} // end namespace tensorflow
diff --git a/tensorflow/python/client/tf_session.i b/tensorflow/python/client/tf_session.i
new file mode 100644
index 0000000000..30e80f779f
--- /dev/null
+++ b/tensorflow/python/client/tf_session.i
@@ -0,0 +1,235 @@
+%include "tensorflow/python/platform/base.i"
+
+%{
+
+#include "numpy/arrayobject.h"
+
+#include "tensorflow/python/client/tf_session_helper.h"
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/public/status.h"
+
+%}
+
+// Implements the StatusNotOK exception.
+%import(module="tensorflow.python.pywrap_tensorflow") "tensorflow/python/lib/core/status.i"
+
+// Required to use PyArray_* functions.
+%include "tensorflow/python/platform/numpy.i"
+%init %{
+import_array();
+%}
+
+// Release the Python GIL for the duration of most methods.
+%exception {
+ Py_BEGIN_ALLOW_THREADS;
+ $action
+ Py_END_ALLOW_THREADS;
+}
+
+// Proto input arguments to C API functions are passed as a (const
+// void*, size_t) pair. In Python, typemap these to a single string
+// argument.
+%typemap(in) (const void* proto, size_t proto_len) {
+ char* c_string;
+ Py_ssize_t py_size;
+ if (PyBytes_AsStringAndSize($input, &c_string, &py_size) == -1) {
+ // Python has raised an error (likely TypeError or UnicodeEncodeError).
+ SWIG_fail;
+ }
+ $1 = static_cast<void*>(c_string);
+ $2 = static_cast<size_t>(py_size);
+}
+
+////////////////////////////////////////////////////////////////////////////////
+// BEGIN TYPEMAPS FOR tensorflow::TF_Run_wrapper()
+////////////////////////////////////////////////////////////////////////////////
+
+// The wrapper takes a vector of pairs of feed names and feed
+// values. In Python this is represented as a dictionary mapping strings
+// to numpy arrays.
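+//
+// As an illustration (mirroring the call made in session.py; the names here
+// are made up for this comment), the Python side invokes the wrapper as:
+//   tf_session.TF_Run(session, {"x:0": np_array}, ["y:0"], ["train_op"])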
+%typemap(in) const tensorflow::FeedVector& inputs (
+ tensorflow::FeedVector temp,
+ tensorflow::Safe_PyObjectPtr temp_string_list(tensorflow::make_safe(nullptr)),
+ tensorflow::Safe_PyObjectPtr temp_array_list(tensorflow::make_safe(nullptr))) {
+ if (!PyDict_Check($input)) {
+ SWIG_fail;
+ }
+
+ temp_string_list = tensorflow::make_safe(PyList_New(0));
+ if (!temp_string_list) {
+ SWIG_fail;
+ }
+ temp_array_list = tensorflow::make_safe(PyList_New(0));
+ if (!temp_array_list) {
+ SWIG_fail;
+ }
+
+ PyObject* key;
+ PyObject* value;
+ Py_ssize_t pos = 0;
+ while (PyDict_Next($input, &pos, &key, &value)) {
+ const char* key_string = PyString_AsString(key);
+ if (!key_string) {
+ SWIG_fail;
+ }
+
+ // The ndarray must be stored as contiguous bytes in C (row-major) order.
+ PyObject* array_object = PyArray_FromAny(
+ value, nullptr, 0, 0, NPY_ARRAY_CARRAY, nullptr);
+ if (!array_object) {
+ SWIG_fail;
+ }
+ PyArrayObject* array = reinterpret_cast<PyArrayObject*>(array_object);
+
+ // Keep a reference to the key and the array, in case the incoming dict is
+ // modified, and/or to avoid leaking references on failure.
+ if (PyList_Append(temp_string_list.get(), key) == -1) {
+ SWIG_fail;
+ }
+ if (PyList_Append(temp_array_list.get(), array_object) == -1) {
+ SWIG_fail;
+ }
+
+ temp.push_back(std::make_pair(key_string, array));
+ }
+ $1 = &temp;
+}
+
+// The wrapper also takes a list of fetch and target names. In Python this is
+// represented as a list of strings.
+%typemap(in) const tensorflow::NameVector& (
+ tensorflow::NameVector temp,
+ tensorflow::Safe_PyObjectPtr temp_string_list(tensorflow::make_safe(nullptr))) {
+ if (!PyList_Check($input)) {
+ SWIG_fail;
+ }
+
+ Py_ssize_t len = PyList_Size($input);
+
+ temp_string_list = tensorflow::make_safe(PyList_New(len));
+ if (!temp_string_list) {
+ SWIG_fail;
+ }
+
+ for (Py_ssize_t i = 0; i < len; ++i) {
+ PyObject* elem = PyList_GetItem($input, i);
+ if (!elem) {
+ SWIG_fail;
+ }
+
+ // Keep a reference to the string in case the incoming list is modified.
+ PyList_SET_ITEM(temp_string_list.get(), i, elem);
+ Py_INCREF(elem);
+
+ const char* fetch_name = PyString_AsString(elem);
+ if (!fetch_name) {
+ PyErr_SetString(PyExc_TypeError,
+ "a fetch or target name was not a string");
+ SWIG_fail;
+ }
+
+ // TODO(mrry): Avoid copying the fetch name in, if this impacts performance.
+ temp.push_back(fetch_name);
+ }
+ $1 = &temp;
+}
+
+
+// The wrapper has two outputs: a tensorflow::Status, and a vector of
+// PyObjects containing the fetch results (iff the status is OK). Since
+// the interpretation of the vector depends on the status, we define
+// them as two consecutive out arguments, so that they can be accessed
+// together in a typemap.
+
+// Define temporaries for the argout outputs.
+%typemap(in, numinputs=0) tensorflow::Status* out_status (
+ tensorflow::Status temp) {
+ $1 = &temp;
+}
+%typemap(in, numinputs=0) tensorflow::PyObjectVector* out_values (
+ tensorflow::PyObjectVector temp) {
+ $1 = &temp;
+}
+
+// Raise a StatusNotOK exception if the out_status is not OK;
+// otherwise build a Python list of outputs and return it.
+%typemap(argout, fragment="StatusNotOK") (
+ tensorflow::Status* out_status, tensorflow::PyObjectVector* out_values) {
+ if (!$1->ok()) {
+ RaiseStatusNotOK(*$1, $descriptor(tensorflow::Status*));
+ SWIG_fail;
+ } else {
+ tensorflow::Safe_PyObjectVector out_values_safe;
+ for (int i = 0; i < $2->size(); ++i) {
+ out_values_safe.emplace_back(tensorflow::make_safe($2->at(i)));
+ }
+
+ $result = PyList_New($2->size());
+ if (!$result) {
+ SWIG_fail;
+ }
+
+ for (int i = 0; i < $2->size(); ++i) {
+ PyList_SET_ITEM($result, i, $2->at(i));
+ out_values_safe[i].release();
+ }
+ }
+}
+
+////////////////////////////////////////////////////////////////////////////////
+// END TYPEMAPS FOR tensorflow::TF_Run_wrapper()
+////////////////////////////////////////////////////////////////////////////////
+
+
+
+// Include the functions from tensor_c_api.h, except TF_Run.
+%ignoreall
+%unignore TF_Code;
+%unignore TF_Status;
+%unignore TF_NewStatus;
+%unignore TF_DeleteStatus;
+%unignore TF_GetCode;
+%unignore TF_Message;
+%unignore TF_SessionOptions;
+%rename("_TF_SetTarget") TF_SetTarget;
+%rename("_TF_SetConfig") TF_SetConfig;
+%rename("_TF_NewSessionOptions") TF_NewSessionOptions;
+%unignore TF_DeleteSessionOptions;
+%unignore TF_NewSession;
+%unignore TF_CloseSession;
+%unignore TF_DeleteSession;
+%unignore TF_ExtendGraph;
+%include "tensorflow/core/public/tensor_c_api.h"
+%ignoreall
+
+%insert("python") %{
+ def TF_NewSessionOptions(target=None, config=None):
+ opts = _TF_NewSessionOptions()
+ if target is not None:
+ _TF_SetTarget(opts, target)
+ if config is not None:
+ from tensorflow.core.framework import config_pb2
+ if not isinstance(config, config_pb2.ConfigProto):
+ raise TypeError("Expected config_pb2.ConfigProto, "
+ "but got %s" % type(config))
+ status = TF_NewStatus()
+ config_str = config.SerializeToString()
+ _TF_SetConfig(opts, config_str, len(config_str), status)
+ if TF_GetCode(status) != 0:
+ raise ValueError(TF_Message(status))
+ return opts
+%}
+
+// Include the wrapper for TF_Run from tf_session_helper.h.
+
+// The %exception block above releases the Python GIL for the length
+// of each wrapped method. We disable this behavior for TF_Run
+// because it uses the Python allocator.
+%noexception tensorflow::TF_Run_wrapper;
+%rename(TF_Run) tensorflow::TF_Run_wrapper;
+%unignore tensorflow;
+%unignore TF_Run;
+
+%include "tensorflow/python/client/tf_session_helper.h"
+
+%unignoreall
diff --git a/tensorflow/python/client/tf_session_helper.cc b/tensorflow/python/client/tf_session_helper.cc
new file mode 100644
index 0000000000..06483da87b
--- /dev/null
+++ b/tensorflow/python/client/tf_session_helper.cc
@@ -0,0 +1,518 @@
+#include "tensorflow/python/client/tf_session_helper.h"
+
+#include <cstring>
+
+#include "tensorflow/core/lib/core/coding.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+namespace {
+
+// Container types for the various temporary values used internally in
+// the wrapper.
+
+// A TF_TensorVector is a vector of borrowed pointers to TF_Tensors.
+typedef gtl::InlinedVector<TF_Tensor*, 8> TF_TensorVector;
+
+// Safe containers for (an) owned TF_Tensor(s). On destruction, the
+// tensor will be deleted by TF_DeleteTensor.
+typedef std::unique_ptr<TF_Tensor, decltype(&TF_DeleteTensor)>
+ Safe_TF_TensorPtr;
+typedef std::vector<Safe_TF_TensorPtr> Safe_TF_TensorVector;
+Safe_TF_TensorPtr make_safe(TF_Tensor* tensor) {
+ return Safe_TF_TensorPtr(tensor, TF_DeleteTensor);
+}
+
+// Safe container for an owned TF_Status. On destruction, the status
+// will be deleted by TF_DeleteStatus.
+typedef std::unique_ptr<TF_Status, decltype(&TF_DeleteStatus)>
+ Safe_TF_StatusPtr;
+Safe_TF_StatusPtr make_safe(TF_Status* status) {
+ return Safe_TF_StatusPtr(status, TF_DeleteStatus);
+}
+
+Status PyArrayDescr_to_TF_DataType(PyArray_Descr* descr,
+ TF_DataType* out_tf_datatype) {
+ PyObject* key;
+ PyObject* value;
+ Py_ssize_t pos = 0;
+ if (PyDict_Next(descr->fields, &pos, &key, &value)) {
+ const char* key_string = PyString_AsString(key);
+ if (!key_string) {
+ return errors::Internal("Corrupt numpy type descriptor");
+ }
+ tensorflow::string key = key_string;
+ // The typenames here should match the field names in the custom struct
+ // types constructed in test_util.py.
+ // TODO(mrry,keveman): Investigate Numpy type registration to replace this
+ // hard-coding of names.
+ if (key == "quint8") {
+ *out_tf_datatype = TF_QUINT8;
+ } else if (key == "qint8") {
+ *out_tf_datatype = TF_QINT8;
+ } else if (key == "qint32") {
+ *out_tf_datatype = TF_QINT32;
+ } else {
+ return errors::Internal("Unsupported numpy data type");
+ }
+ return Status::OK();
+ }
+ return errors::Internal("Unsupported numpy data type");
+}
+
+Status PyArray_TYPE_to_TF_DataType(PyArrayObject* array,
+ TF_DataType* out_tf_datatype) {
+ int pyarray_type = PyArray_TYPE(array);
+ PyArray_Descr* descr = array->descr;
+ switch (pyarray_type) {
+ case NPY_FLOAT32:
+ *out_tf_datatype = TF_FLOAT;
+ break;
+ case NPY_FLOAT64:
+ *out_tf_datatype = TF_DOUBLE;
+ break;
+ case NPY_INT32:
+ *out_tf_datatype = TF_INT32;
+ break;
+ case NPY_UINT8:
+ *out_tf_datatype = TF_UINT8;
+ break;
+ case NPY_INT16:
+ *out_tf_datatype = TF_INT16;
+ break;
+ case NPY_INT8:
+ *out_tf_datatype = TF_INT8;
+ break;
+ case NPY_INT64:
+ *out_tf_datatype = TF_INT64;
+ break;
+ case NPY_BOOL:
+ *out_tf_datatype = TF_BOOL;
+ break;
+ case NPY_COMPLEX64:
+ *out_tf_datatype = TF_COMPLEX;
+ break;
+ case NPY_OBJECT:
+ *out_tf_datatype = TF_STRING;
+ break;
+ case NPY_VOID:
+ // Quantized types are currently represented as custom struct types.
+ // PyArray_TYPE returns NPY_VOID for structs, and we should look into
+ // descr to derive the actual type.
+ return PyArrayDescr_to_TF_DataType(descr, out_tf_datatype);
+ default:
+ // TODO(mrry): Support these.
+ return errors::Internal("Unsupported feed type");
+ }
+ return Status::OK();
+}
+
+Status TF_DataType_to_PyArray_TYPE(TF_DataType tf_datatype,
+ int* out_pyarray_type) {
+ switch (tf_datatype) {
+ case TF_FLOAT:
+ *out_pyarray_type = NPY_FLOAT32;
+ break;
+ case TF_DOUBLE:
+ *out_pyarray_type = NPY_FLOAT64;
+ break;
+ case TF_INT32:
+ *out_pyarray_type = NPY_INT32;
+ break;
+ case TF_UINT8:
+ *out_pyarray_type = NPY_UINT8;
+ break;
+ case TF_INT16:
+ *out_pyarray_type = NPY_INT16;
+ break;
+ case TF_INT8:
+ *out_pyarray_type = NPY_INT8;
+ break;
+ case TF_INT64:
+ *out_pyarray_type = NPY_INT64;
+ break;
+ case TF_BOOL:
+ *out_pyarray_type = NPY_BOOL;
+ break;
+ case TF_COMPLEX:
+ *out_pyarray_type = NPY_COMPLEX64;
+ break;
+ case TF_STRING:
+ *out_pyarray_type = NPY_OBJECT;
+ break;
+ // TODO(keveman): These should be changed to NPY_VOID, and the type used for
+ // the resulting numpy array should be the custom struct types that we
+ // expect for quantized types.
+ case TF_QINT8:
+ *out_pyarray_type = NPY_INT8;
+ break;
+ case TF_QUINT8:
+ *out_pyarray_type = NPY_UINT8;
+ break;
+ case TF_QINT32:
+ *out_pyarray_type = NPY_INT32;
+ break;
+ case TF_BFLOAT16:
+ *out_pyarray_type = NPY_UINT16;
+ break;
+ default:
+ return errors::Internal("Unsupported fetch type");
+ }
+ return Status::OK();
+}
+
+// Iterate over the string array 'array', extract the ptr and len of each string
+// element and call f(ptr, len).
+template <typename F>
+Status PyStringArrayMap(PyArrayObject* array, F f) {
+ Safe_PyObjectPtr iter = tensorflow::make_safe(
+ PyArray_IterNew(reinterpret_cast<PyObject*>(array)));
+ while (PyArray_ITER_NOTDONE(iter.get())) {
+ auto item = tensorflow::make_safe(
+ PyArray_GETITEM(array, PyArray_ITER_DATA(iter.get())));
+ if (!item.get()) {
+ return errors::Internal("Unable to get element from the feed.");
+ }
+ char* ptr;
+ Py_ssize_t len;
+ int success = PyString_AsStringAndSize(item.get(), &ptr, &len);
+ if (success != 0) {
+ return errors::Internal("Unable to get element from the feed.");
+ }
+ f(ptr, len);
+ PyArray_ITER_NEXT(iter.get());
+ }
+ return Status::OK();
+}
+
+// Encode the strings in 'array' into a contiguous buffer and return the base of
+// the buffer. The caller takes ownership of the buffer.
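+//
+// A sketch of the resulting layout (illustrative): the buffer begins with
+// nelems uint64 offsets (relative to the start of the string data), followed
+// by each string encoded as a varint length and then its bytes. For the two
+// strings {"ab", "c"} this is the offsets [0, 3], then 0x02 'a' 'b', then
+// 0x01 'c' (21 bytes in total).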
+Status EncodePyStringArray(PyArrayObject* array, tensorflow::int64 nelems,
+ size_t* size, void** buffer) {
+ // Compute bytes needed for encoding.
+ *size = 0;
+ TF_RETURN_IF_ERROR(
+ PyStringArrayMap(array, [&size](char* ptr, Py_ssize_t len) {
+ *size += sizeof(tensorflow::uint64) +
+ tensorflow::core::VarintLength(len) + len;
+ }));
+ // Encode all strings.
+ std::unique_ptr<char[]> base_ptr(new char[*size]);
+ char* base = base_ptr.get();
+ char* data_start = base + sizeof(tensorflow::uint64) * nelems;
+ char* dst = data_start; // Where next string is encoded.
+ tensorflow::uint64* offsets = reinterpret_cast<tensorflow::uint64*>(base);
+
+ TF_RETURN_IF_ERROR(PyStringArrayMap(
+ array, [&base, &data_start, &dst, &offsets](char* ptr, Py_ssize_t len) {
+ *offsets = (dst - data_start);
+ offsets++;
+ dst = tensorflow::core::EncodeVarint64(dst, len);
+ memcpy(dst, ptr, len);
+ dst += len;
+ }));
+ CHECK_EQ(dst, base + *size);
+ *buffer = base_ptr.release();
+ return Status::OK();
+}
+
+// Determine the pointer and length of the string at index 'i' in the string
+// tensor 'src', which contains 'num_elements' strings in total.
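+// The expected layout mirrors the one produced by EncodePyStringArray above:
+// a table of uint64 offsets, one per element, followed by the
+// varint-length-prefixed string bytes.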
+static Status TF_StringTensor_GetPtrAndLen(const TF_Tensor* src,
+ tensorflow::int64 num_elements,
+ tensorflow::int64 i,
+ const char** ptr,
+ tensorflow::uint64* len) {
+ const char* input = reinterpret_cast<const char*>(TF_TensorData(src));
+ const size_t src_size = TF_TensorByteSize(src);
+ const char* data_start = input + sizeof(tensorflow::uint64) * num_elements;
+ const char* limit = input + src_size;
+ tensorflow::uint64 offset =
+ reinterpret_cast<const tensorflow::uint64*>(input)[i];
+ const char* p =
+ tensorflow::core::GetVarint64Ptr(data_start + offset, limit, len);
+ if (offset >= (limit - data_start) || !p || (*len > (limit - p))) {
+ return errors::InvalidArgument("Malformed TF_STRING tensor; element ", i,
+ " out of range");
+ }
+ *ptr = p;
+ return Status::OK();
+}
+
+// Copy the string at index 'i' in the (linearized) string tensor 'tensor' into
+// 'pyarray' at the position pointed to by the 'i_ptr' iterator.
+static Status CopyStringToPyArrayElement(PyArrayObject* pyarray, void* i_ptr,
+ TF_Tensor* tensor,
+ tensorflow::int64 num_elements,
+ tensorflow::int64 i) {
+ const char* ptr;
+ tensorflow::uint64 len;
+ TF_RETURN_IF_ERROR(
+ TF_StringTensor_GetPtrAndLen(tensor, num_elements, i, &ptr, &len));
+ auto py_string = tensorflow::make_safe(PyString_FromStringAndSize(ptr, len));
+ int success =
+ PyArray_SETITEM(pyarray, PyArray_ITER_DATA(i_ptr), py_string.get());
+ if (success != 0) {
+ return errors::Internal("Error setting element ", i);
+ }
+ return Status::OK();
+}
+
+// Converts the given TF_Tensor to a Numpy array.
+// If the returned status is OK, the caller becomes the owner of *out_array.
+Status TF_Tensor_to_PyObject(TF_Tensor* tensor, PyObject** out_array) {
+ // A fetched operation will correspond to a null tensor, and a None
+ // in Python.
+ if (tensor == nullptr) {
+ Py_INCREF(Py_None);
+ *out_array = Py_None;
+ return Status::OK();
+ }
+
+ const int ndims = TF_NumDims(tensor);
+ gtl::InlinedVector<npy_intp, 4> dims(ndims);
+ tensorflow::int64 nelems = 1;
+ for (int i = 0; i < ndims; ++i) {
+ dims[i] = TF_Dim(tensor, i);
+ nelems *= dims[i];
+ }
+
+ // Convert TensorFlow dtype to numpy type descriptor.
+ int type_num;
+ TF_RETURN_IF_ERROR(
+ TF_DataType_to_PyArray_TYPE(TF_TensorType(tensor), &type_num));
+ PyArray_Descr* descr = PyArray_DescrFromType(type_num);
+
+ // Copy the TF_TensorData into a newly-created ndarray and return it.
+ // TODO(mrry): Perhaps investigate zero-copy approaches. This would involve
+ // creating an ndarray-like object that wraps the TF_Tensor buffer, and
+ // maps its destructor to TF_DeleteTensor.
+ Safe_PyObjectPtr safe_out_array =
+ tensorflow::make_safe(PyArray_Empty(ndims, dims.data(), descr, 0));
+ if (!safe_out_array) {
+ return errors::Internal("Could not allocate ndarray");
+ }
+ PyArrayObject* py_array =
+ reinterpret_cast<PyArrayObject*>(safe_out_array.get());
+ if (PyArray_NBYTES(py_array) != TF_TensorByteSize(tensor)) {
+ if (TF_TensorType(tensor) == TF_STRING) {
+ // Copy element by element.
+ auto iter = tensorflow::make_safe(PyArray_IterNew(safe_out_array.get()));
+ for (tensorflow::int64 i = 0; i < nelems; ++i) {
+ auto s =
+ CopyStringToPyArrayElement(py_array, iter.get(), tensor, nelems, i);
+ if (!s.ok()) {
+ return s;
+ }
+ PyArray_ITER_NEXT(iter.get());
+ }
+ } else {
+ return errors::Internal("ndarray was ", PyArray_NBYTES(py_array),
+ " bytes but TF_Tensor was ",
+ TF_TensorByteSize(tensor), " bytes");
+ }
+ } else {
+ memcpy(py_array->data, TF_TensorData(tensor), PyArray_NBYTES(py_array));
+ }
+
+ // PyArray_Return turns rank 0 arrays into numpy scalars
+ *out_array = PyArray_Return(
+ reinterpret_cast<PyArrayObject*>(safe_out_array.release()));
+ return Status::OK();
+}
+
+tensorflow::Status TF_Status_to_Status(TF_Status* tf_status) {
+ TF_Code code = TF_GetCode(tf_status);
+ const string message(TF_Message(tf_status));
+
+ switch (code) {
+ case TF_OK:
+ return Status::OK();
+ case TF_CANCELLED:
+ return errors::Cancelled(message);
+ case TF_UNKNOWN:
+ return errors::Unknown(message);
+ case TF_INVALID_ARGUMENT:
+ return errors::InvalidArgument(message);
+ case TF_DEADLINE_EXCEEDED:
+ return errors::DeadlineExceeded(message);
+ case TF_NOT_FOUND:
+ return errors::NotFound(message);
+ case TF_ALREADY_EXISTS:
+ return errors::AlreadyExists(message);
+ case TF_PERMISSION_DENIED:
+ return errors::PermissionDenied(message);
+ case TF_UNAUTHENTICATED:
+ return errors::Unauthenticated(message);
+ case TF_RESOURCE_EXHAUSTED:
+ return errors::ResourceExhausted(message);
+ case TF_FAILED_PRECONDITION:
+ return errors::FailedPrecondition(message);
+ case TF_ABORTED:
+ return errors::Aborted(message);
+ case TF_OUT_OF_RANGE:
+ return errors::OutOfRange(message);
+ case TF_UNIMPLEMENTED:
+ return errors::Unimplemented(message);
+ case TF_INTERNAL:
+ return errors::Internal(message);
+ case TF_UNAVAILABLE:
+ return errors::Unavailable(message);
+ case TF_DATA_LOSS:
+ return errors::DataLoss(message);
+ default:
+ return errors::Internal("Got error with unknown code: ", code, " ",
+ message);
+ }
+}
+
+static bool numpy_imported = false;
+
+} // namespace
+
+Safe_PyObjectPtr make_safe(PyObject* o) {
+ return Safe_PyObjectPtr(o, Py_DECREF_wrapper);
+}
+
+// Wrapper for TF_Run that converts the arguments to appropriate types.
+// If *out_status is OK, the caller becomes the owner of the PyObjects
+// in *out_values.
+void TF_Run_wrapper(TF_Session* session, const FeedVector& inputs,
+ const NameVector& output_names,
+ const NameVector& target_nodes, Status* out_status,
+ PyObjectVector* out_values) {
+ // 0. Ensure that numpy has been imported.
+ if (!numpy_imported) {
+ import_array();
+ numpy_imported = true;
+ }
+
+ // 1. Convert the feed inputs to the appropriate form for TF_Run.
+ NameVector input_names;
+ Safe_PyObjectVector
+ py_inputs_safe; // Used to decref the input arrays on failure.
+ Safe_TF_TensorVector inputs_safe; // Used to delete tensors on failure.
+ TF_TensorVector inputs_unsafe; // Used to contain the arg to TF_Run.
+
+ for (const auto& name_and_array : inputs) {
+ py_inputs_safe.emplace_back(
+ make_safe(reinterpret_cast<PyObject*>(name_and_array.second)));
+ }
+
+ for (int i = 0; i < inputs.size(); ++i) {
+ input_names.push_back(inputs[i].first);
+ PyArrayObject* array = inputs[i].second;
+
+ // Convert numpy dtype to TensorFlow dtype.
+ TF_DataType dtype;
+ *out_status = PyArray_TYPE_to_TF_DataType(array, &dtype);
+ if (!out_status->ok()) {
+ return;
+ }
+
+ tensorflow::int64 nelems = 1;
+ gtl::InlinedVector<tensorflow::int64, 4> dims;
+ for (int i = 0; i < PyArray_NDIM(array); ++i) {
+ dims.push_back(PyArray_SHAPE(array)[i]);
+ nelems *= dims[i];
+ }
+
+ // Create a TF_Tensor based on the fed data. For non-string data, the numpy
+ // buffer is copied into freshly allocated memory that the TF_Tensor owns
+ // and frees on deallocation. For strings, a new temporary buffer is
+ // allocated into which the strings are encoded.
+ if (dtype != TF_STRING) {
+ // NOTE(mrry): We currently copy the numpy array into a new
+ // buffer to avoid possible issues on deallocation (such as
+ // having to acquire the Python Global Interpreter Lock).
+ // TODO(mrry): Investigate in what cases we can safely avoid this copy.
+ size_t size = PyArray_NBYTES(array);
+ // NOTE(mrry): 32 is the upper bound on current alignment
+ // requirements for tensorflow::Tensor. We hard code this here to
+ // avoid taking a dependency on Eigen in the client code.
+ void* data = tensorflow::cpu_allocator()->AllocateRaw(32, size);
+ std::memcpy(data, array->data, size);
+ inputs_safe.emplace_back(make_safe(
+ TF_NewTensor(dtype, dims.data(), dims.size(), data, size,
+ [](void* data, size_t len, void* arg) {
+ tensorflow::cpu_allocator()->DeallocateRaw(data);
+ },
+ nullptr)));
+ // The fed data has been copied, so release our reference to the numpy
+ // array now.
+ py_inputs_safe[i].reset();
+ } else {
+ size_t size;
+ void* encoded;
+ Status s = EncodePyStringArray(array, nelems, &size, &encoded);
+ if (!s.ok()) {
+ *out_status = s;
+ return;
+ }
+ inputs_safe.emplace_back(
+ make_safe(TF_NewTensor(dtype, dims.data(), dims.size(), encoded, size,
+ [](void* data, size_t len, void* arg) {
+ delete[] reinterpret_cast<char*>(data);
+ },
+ array)));
+ // The strings have been encoded into a separate buffer, so release our
+ // reference to the numpy array now.
+ py_inputs_safe[i].reset();
+ }
+ inputs_unsafe.push_back(inputs_safe.back().get());
+ }
+
+ // 2. Allocate a container for the output data.
+ TF_TensorVector outputs(output_names.size());
+
+ Safe_TF_StatusPtr status = make_safe(TF_NewStatus());
+
+ // 3. Actually call TF_Run().
+ Py_BEGIN_ALLOW_THREADS;
+ TF_Run(session, input_names.data(), inputs_unsafe.data(), input_names.size(),
+ const_cast<const char**>(output_names.data()), outputs.data(),
+ output_names.size(), const_cast<const char**>(target_nodes.data()),
+ target_nodes.size(), status.get());
+ Py_END_ALLOW_THREADS;
+
+ // 4. The TensorFlow runtime has taken ownership of the fed tensors,
+ // so we release the safe pointers to them.
+ for (auto& input : inputs_safe) {
+ input.release();
+ }
+
+ if (TF_GetCode(status.get()) != TF_OK) {
+ *out_status = TF_Status_to_Status(status.get());
+ return;
+ }
+
+ // 5. We now own the fetched tensors, so set up a safe container to
+ // delete them when we exit this scope.
+ Safe_TF_TensorVector tf_outputs_safe;
+ for (const auto& output : outputs) {
+ tf_outputs_safe.emplace_back(make_safe(output));
+ }
+
+ // 6. Convert the fetched tensors into numpy ndarrays. Store them in a safe
+ // container so that we do not leak them if a later conversion fails.
+ Safe_PyObjectVector py_outputs_safe;
+ for (int i = 0; i < output_names.size(); ++i) {
+ PyObject* py_array;
+ *out_status = TF_Tensor_to_PyObject(outputs[i], &py_array);
+ if (!out_status->ok()) {
+ return;
+ }
+ py_outputs_safe.emplace_back(make_safe(py_array));
+ }
+
+ // 7. If we reach this point, we have successfully built a list of objects
+ // so we can release them from the safe container.
+ for (auto& output : py_outputs_safe) {
+ out_values->push_back(output.release());
+ }
+ *out_status = Status::OK();
+}
+
+} // namespace tensorflow
diff --git a/tensorflow/python/client/tf_session_helper.h b/tensorflow/python/client/tf_session_helper.h
new file mode 100644
index 0000000000..12a7527ed9
--- /dev/null
+++ b/tensorflow/python/client/tf_session_helper.h
@@ -0,0 +1,56 @@
+#ifndef TENSORFLOW_PYTHON_CLIENT_TF_SESSION_HELPER_H_
+#define TENSORFLOW_PYTHON_CLIENT_TF_SESSION_HELPER_H_
+
+#include <Python.h>
+
+#include "numpy/arrayobject.h"
+
+#include "tensorflow/core/lib/core/errors.h"
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/core/public/tensor_c_api.h"
+
+namespace tensorflow {
+
+// Container types for the various arguments and temporary values used
+// in the wrapper.
+
+// A FeedVector is a vector of tensor name and numpy array pairs. The
+// name is a borrowed C string.
+typedef tensorflow::gtl::InlinedVector<std::pair<const char*, PyArrayObject*>,
+ 8> FeedVector;
+
+// A NameVector is a vector of tensor or operation names, as borrowed
+// C strings.
+typedef tensorflow::gtl::InlinedVector<const char*, 8> NameVector;
+
+// A PyObjectVector is a vector of borrowed pointers to PyObjects.
+typedef tensorflow::gtl::InlinedVector<PyObject*, 8> PyObjectVector;
+
+// Safe containers for (an) owned PyObject(s). On destruction, the
+// reference count of the contained object will be decremented.
+inline void Py_DECREF_wrapper(PyObject* o) { Py_DECREF(o); }
+typedef void (*Py_DECREF_wrapper_type)(PyObject*);
+typedef std::unique_ptr<PyObject, Py_DECREF_wrapper_type> Safe_PyObjectPtr;
+typedef std::vector<Safe_PyObjectPtr> Safe_PyObjectVector;
+Safe_PyObjectPtr make_safe(PyObject* o);
+
+// Run the graph associated with the session starting with the
+// supplied inputs[]. Regardless of success or failure, inputs[] are
+// stolen by the implementation (i.e. the implementation will
+// eventually call Py_DECREF on each array input).
+//
+// On success, the tensors corresponding to output_names[0,noutputs-1]
+// are placed in out_values[], and these outputs[] become the property
+// of the caller (the caller must eventually call Py_DECREF on them).
+//
+// On failure, out_status contains a tensorflow::Status with an error
+// message.
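+//
+// Illustrative call (a sketch only; `session` and `x_array` are assumed to
+// exist in the caller's code):
+//
+//   FeedVector feeds;
+//   feeds.push_back({"x:0", x_array});
+//   NameVector fetches;
+//   fetches.push_back("y:0");
+//   NameVector targets;
+//   Status status;
+//   PyObjectVector fetched;
+//   TF_Run_wrapper(session, feeds, fetches, targets, &status, &fetched);
+//   // On success, the caller must eventually Py_DECREF each element of
+//   // `fetched`.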
+void TF_Run_wrapper(TF_Session* session, const FeedVector& inputs,
+ const NameVector& output_names,
+ const NameVector& target_nodes, Status* out_status,
+ PyObjectVector* out_values);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PYTHON_CLIENT_TF_SESSION_HELPER_H_
diff --git a/tensorflow/python/framework/__init__.py b/tensorflow/python/framework/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/framework/__init__.py
diff --git a/tensorflow/python/framework/device.py b/tensorflow/python/framework/device.py
new file mode 100644
index 0000000000..676e5f779a
--- /dev/null
+++ b/tensorflow/python/framework/device.py
@@ -0,0 +1,220 @@
+"""Class to represent a device."""
+import copy
+
+
+class Device(object):
+ """Represents a Device."""
+
+ def __init__(self, job=None, replica=None, task=None, device_type=None,
+ device_index=None):
+ """Create a new device object.
+
+ Args:
+ job: string. Optional device job name.
+ replica: int. Optional replica index.
+ task: int. Optional task index.
+ device_type: Optional device type string (e.g. "CPU" or "GPU")
+ device_index: int. Optional device index. If left
+ unspecified, device represents 'any' device_index.
+ """
+ self.job = job
+ self.replica = replica
+ self.task = task
+ if device_type == "cpu" or device_type == "gpu":
+ # For backwards compatibility only, we support lowercase variants of
+ # cpu and gpu but turn them into uppercase here.
+ self.device_type = device_type.upper()
+ else:
+ self.device_type = device_type
+ self.device_index = device_index
+
+ def _clear(self):
+ self._job = None
+ self._replica = None
+ self._task = None
+ self.device_type = None
+ self.device_index = None
+
+ @property
+ def job(self):
+ return self._job
+
+ @job.setter
+ def job(self, job):
+ if job is not None:
+ self._job = str(job)
+ else:
+ self._job = None
+
+ @property
+ def replica(self):
+ return self._replica
+
+ @replica.setter
+ def replica(self, replica):
+ if replica is not None:
+ self._replica = int(replica)
+ else:
+ self._replica = None
+
+ @property
+ def task(self):
+ return self._task
+
+ @task.setter
+ def task(self, task):
+ if task is not None:
+ self._task = int(task)
+ else:
+ self._task = None
+
+ def parse_from_string(self, spec):
+ """Parse a Device name into its components.
+
+ Args:
+ spec: a string of the form
+ /job:<name>/replica:<id>/task:<id>/device:CPU:<id>
+ or
+ /job:<name>/replica:<id>/task:<id>/device:GPU:<id>
+ as cpu and gpu are mutually exclusive.
+ All entries are optional.
+
+ Returns:
+ The Device, for convenience.
+
+ Raises:
+ ValueError: if the spec was not valid.
+ """
+ self._clear()
+ splits = [x.split(":") for x in spec.split("/")]
+ for y in splits:
+ ly = len(y)
+ if y:
+ # NOTE(mdevin): we use the property getters here.
+ if ly == 2 and y[0] == "job":
+ self.job = y[1]
+ elif ly == 2 and y[0] == "replica":
+ self.replica = y[1]
+ elif ly == 2 and y[0] == "task":
+ self.task = y[1]
+ elif ((ly == 1 or ly == 2) and
+ ((y[0].upper() == "GPU") or (y[0].upper() == "CPU"))):
+ if self.device_type is not None:
+ raise ValueError("Cannot specify multiple device types: %s" % spec)
+ self.device_type = y[0].upper()
+ if ly == 2 and y[1] != "*":
+ self.device_index = int(y[1])
+ elif ly == 3 and y[0] == "device":
+ if self.device_type is not None:
+ raise ValueError("Cannot specify multiple device types: %s" % spec)
+ self.device_type = y[1]
+ if y[2] != "*":
+ self.device_index = int(y[2])
+ elif ly and y[0] != "": # pylint: disable=g-explicit-bool-comparison
+ raise ValueError("Unknown attribute: '%s' in '%s'" % (y[0], spec))
+
+ return self
+
+ def merge_from(self, dev):
+ """Merge the properties of "dev" into this Device.
+
+ Args:
+ dev: a Device.
+ """
+ if dev.job is not None:
+ self.job = dev.job
+ if dev.replica is not None:
+ self.replica = dev.replica
+ if dev.task is not None:
+ self.task = dev.task
+ if dev.device_type is not None:
+ self.device_type = dev.device_type
+ if dev.device_index is not None:
+ self.device_index = dev.device_index
+
+ def to_string(self):
+ """Return a Device specification string.
+
+ Returns:
+ a string of the form /job:<name>/replica:<id>/task:<id>/device:CPU:<id>
+ or /job:<name>/replica:<id>/task:<id>/device:GPU:<id>.
+ """
+ dev = ""
+ if self.job is not None:
+ dev += "/job:" + self.job
+ if self.replica is not None:
+ dev += "/replica:" + str(self.replica)
+ if self.task is not None:
+ dev += "/task:" + str(self.task)
+ if self.device_type is not None:
+ device_index_string = "*"
+ if self.device_index is not None:
+ device_index_string = str(self.device_index)
+ dev += "/device:%s:%s" % (self.device_type, device_index_string)
+ return dev
+
+
+def from_string(spec):
+ """Construct a Device from a string.
+
+ Args:
+ spec: a string of the form
+ /job:<name>/replica:<id>/task:<id>/device:CPU:<id>
+ or
+ /job:<name>/replica:<id>/task:<id>/device:GPU:<id>
+ as cpu and gpu are mutually exclusive.
+ All entries are optional.
+
+ Returns:
+ A Device.
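+
+  For example (illustrative):
+
+    d = from_string("/job:worker/replica:0/task:1/device:GPU:3")
+    # d.job == "worker", d.replica == 0, d.task == 1,
+    # d.device_type == "GPU", d.device_index == 3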
+ """
+ return Device().parse_from_string(spec)
+
+
+def check_valid(spec):
+ """Check that a device spec is valid.
+
+ Args:
+ spec: a string.
+
+ Raises:
+ An exception if the spec is invalid.
+ """
+ # Construct a device. It will raise an exception if the spec is invalid.
+ from_string(spec)
+
+
+def merge_device(spec):
+ """Returns a device function that merges devices specifications.
+
+ This can be used to merge partial specifications of devices. The
+ innermost setting for a device field takes precedence. For example:
+
+    with tf.device(merge_device("/device:GPU:0")):
+      # Nodes created here have device "/device:GPU:0"
+      with tf.device(merge_device("/job:worker")):
+        # Nodes created here have device "/job:worker/device:GPU:0"
+        with tf.device(merge_device("/device:CPU:0")):
+          # Nodes created here have device "/job:worker/device:CPU:0"
+          with tf.device(merge_device("/job:ps")):
+            # Nodes created here have device "/job:ps/device:CPU:0"
+
+ Args:
+ spec: A device or a device spec string (partially) describing the
+ device that should be used for all nodes created in the scope of
+ the returned device function's with block.
+
+ Returns:
+ A device function with the above-described behavior.
+
+ Raises:
+ ValueError: if the spec was not valid.
+ """
+ if not isinstance(spec, Device):
+ spec = from_string(spec or "")
+ def _device_function(node_def):
+ current_device = from_string(node_def.device or "")
+ copy_spec = copy.copy(spec)
+ copy_spec.merge_from(current_device) # current_device takes precedence.
+ return copy_spec
+ return _device_function
diff --git a/tensorflow/python/framework/device_test.py b/tensorflow/python/framework/device_test.py
new file mode 100644
index 0000000000..0a244b0815
--- /dev/null
+++ b/tensorflow/python/framework/device_test.py
@@ -0,0 +1,122 @@
+"""Tests for tensorflow.python.framework.device."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import device
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import googletest
+
+
+class DeviceTest(test_util.TensorFlowTestCase):
+
+ def testEmpty(self):
+ d = device.Device()
+ self.assertEquals("", d.ToString())
+ d.parse_from_string("")
+ self.assertEquals("", d.ToString())
+
+ def testConstructor(self):
+ d = device.Device(job="j", replica=0, task=1,
+ device_type="CPU", device_index=2)
+ self.assertEquals("j", d.job)
+ self.assertEquals(0, d.replica)
+ self.assertEquals(1, d.task)
+ self.assertEquals("CPU", d.device_type)
+ self.assertEquals(2, d.device_index)
+ self.assertEquals("/job:j/replica:0/task:1/device:CPU:2", d.to_string())
+
+ d = device.Device(device_type="GPU", device_index=0)
+ self.assertEquals("/device:GPU:0", d.to_string())
+
+ def testto_string(self):
+ d = device.Device()
+ d.job = "foo"
+ self.assertEquals("/job:foo", d.to_string())
+ d.task = 3
+ self.assertEquals("/job:foo/task:3", d.to_string())
+ d.device_type = "CPU"
+ d.device_index = 0
+ self.assertEquals("/job:foo/task:3/device:CPU:0", d.to_string())
+ d.task = None
+ d.replica = 12
+ self.assertEquals("/job:foo/replica:12/device:CPU:0", d.to_string())
+ d.device_type = "GPU"
+ d.device_index = 2
+ self.assertEquals("/job:foo/replica:12/device:GPU:2", d.to_string())
+ d.device_type = "CPU"
+ d.device_index = 1
+ self.assertEquals("/job:foo/replica:12/device:CPU:1", d.to_string())
+ d.device_type = None
+ d.device_index = None
+ d.cpu = None
+ self.assertEquals("/job:foo/replica:12", d.to_string())
+
+ # Test wildcard
+ d = device.Device(job="foo", replica=12, task=3, device_type="GPU")
+ self.assertEquals("/job:foo/replica:12/task:3/device:GPU:*", d.to_string())
+
+ def testParse(self):
+ d = device.Device()
+ d.parse_from_string("/job:foo/replica:0")
+ self.assertEquals("/job:foo/replica:0", d.to_string())
+ d.parse_from_string("/replica:1/task:0/cpu:0")
+ self.assertEquals("/replica:1/task:0/device:CPU:0", d.to_string())
+ d.parse_from_string("/replica:1/task:0/device:CPU:0")
+ self.assertEquals("/replica:1/task:0/device:CPU:0", d.to_string())
+ d.parse_from_string("/job:muu/gpu:2")
+ self.assertEquals("/job:muu/device:GPU:2", d.to_string())
+ with self.assertRaises(Exception) as e:
+ d.parse_from_string("/job:muu/gpu:2/cpu:0")
+ self.assertTrue("Cannot specify multiple device" in e.exception.message)
+
+ def testFromString(self):
+ d = device.from_string("/job:foo/replica:0")
+ self.assertEquals("/job:foo/replica:0", d.to_string())
+ with self.assertRaises(Exception) as e:
+ d = device.from_string("/job:muu/gpu:2/cpu:0")
+ self.assertTrue("Cannot specify multiple device" in e.exception.message)
+
+ d = device.from_string("/job:foo/replica:0/task:3/cpu:*")
+ self.assertEquals(None, d.device_index)
+ d = device.from_string("/job:foo/replica:0/task:3/gpu:7")
+ self.assertEquals(7, d.device_index)
+ d = device.from_string("/job:foo/replica:0/task:3/device:GPU:7")
+ self.assertEquals(7, d.device_index)
+
+ def testMerge(self):
+ d = device.from_string("/job:foo/replica:0")
+ self.assertEquals("/job:foo/replica:0", d.to_string())
+ d.merge_from(device.from_string("/task:1/gpu:2"))
+ self.assertEquals("/job:foo/replica:0/task:1/device:GPU:2", d.to_string())
+
+ d = device.Device()
+ d.merge_from(device.from_string("/task:1/cpu:0"))
+ self.assertEquals("/task:1/device:CPU:0", d.to_string())
+ d.merge_from(device.from_string("/job:boo/gpu:0"))
+ self.assertEquals("/job:boo/task:1/device:GPU:0", d.to_string())
+ d.merge_from(device.from_string("/job:muu/cpu:2"))
+ self.assertEquals("/job:muu/task:1/device:CPU:2", d.to_string())
+ d.merge_from(device.from_string("/job:muu/device:MyFunnyDevice:2"))
+ self.assertEquals("/job:muu/task:1/device:MyFunnyDevice:2", d.to_string())
+
+ def testCheckValid(self):
+ device.CheckValid("/job:foo/replica:0")
+
+ with self.assertRaises(Exception) as e:
+ device.CheckValid("/job:j/replica:foo")
+ self.assertTrue("invalid literal for int" in e.exception.message)
+
+ with self.assertRaises(Exception) as e:
+ device.CheckValid("/job:j/task:bar")
+ self.assertTrue("invalid literal for int" in e.exception.message)
+
+ with self.assertRaises(Exception) as e:
+ device.CheckValid("/bar:muu/baz:2")
+ self.assertTrue("Unknown attribute: 'bar'" in e.exception.message)
+
+ with self.assertRaises(Exception) as e:
+ device.CheckValid("/cpu:0/gpu:2")
+ self.assertTrue("Cannot specify multiple device" in e.exception.message)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/framework/docs.py b/tensorflow/python/framework/docs.py
new file mode 100644
index 0000000000..68dbb3df72
--- /dev/null
+++ b/tensorflow/python/framework/docs.py
@@ -0,0 +1,492 @@
+"""Updates generated docs from Python doc comments.
+
+Updates the files in the file system and executes g4 commands to make sure
+any changes are ready to be submitted.
+"""
+
+import inspect
+import os
+import re
+import sys
+
+
+_arg_re = re.compile(" *([*]{0,2}[a-zA-Z][a-zA-Z0-9_]*):")
+_section_re = re.compile("([A-Z][a-zA-Z ]*):$")
+_always_drop_symbol_re = re.compile("_[_a-zA-Z0-9]")
+_anchor_re = re.compile(r"^[\w.]+$")
+_member_mark = "@@"
+
+
+class Document(object):
+ """Base class for an automatically generated document."""
+
+ def write_markdown_to_file(self, f):
+ """Writes a Markdown-formatted version of this document to file `f`.
+
+ Args:
+ f: The output file.
+ """
+ raise NotImplementedError("Document.write_markdown_to_file")
+
+
+class Index(Document):
+ """An automatically generated index for a collection of documents."""
+
+ def __init__(self, module_to_name, members, filename_to_library_map):
+ """Creates a new Index.
+
+ Args:
+ module_to_name: Dictionary mapping modules to short names.
+ members: Dictionary mapping member name to (fullname, member).
+ filename_to_library_map: A list of (filename, Library) pairs. The order
+ corresponds to the order in which the libraries appear in the index.
+ """
+ self._module_to_name = module_to_name
+ self._members = members
+ self._filename_to_library_map = filename_to_library_map
+
+ def write_markdown_to_file(self, f):
+ """Writes this index to file `f`.
+
+ The output is formatted as an unordered list. Each list element
+ contains the title of the library, followed by a list of symbols
+ in that library hyperlinked to the corresponding anchor in that
+ library.
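+
+    For example (illustrative; the title and filenames are hypothetical),
+    an entry in the generated index looks like:
+
+      * <b>[Framework](framework.md)</b>: [class Tensor](framework.md#Tensor),
+        [device](framework.md#device)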
+
+ Args:
+ f: The output file.
+ """
+ print >>f, "<!-- This file is machine generated: DO NOT EDIT! -->"
+ print >>f, ""
+ print >>f, "# TensorFlow Python reference documentation"
+ print >>f, ""
+ for filename, library in self._filename_to_library_map:
+ per_symbol_links = []
+ for name in sorted(library.mentioned):
+ if name in self._members:
+ fullname, member = self._members[name]
+ anchor = _get_anchor(self._module_to_name, fullname)
+ prefix = "class " * inspect.isclass(member)
+ per_symbol_links.append("[%s%s](%s#%s)" %
+ (prefix, name, filename, anchor))
+ if per_symbol_links:
+ print >>f, "* <b>[%s](%s)</b>: %s" % (library.title, filename,
+ ",\n ".join(per_symbol_links))
+ print >>f, ""
+
+ # actually include the files right here
+ print >>f, '<div class="sections-order" style="display: none;">\n<!--'
+ for filename, _ in self._filename_to_library_map:
+ print >>f, "<!-- %s -->" % filename
+ print >>f, "-->\n</div>"
+
+def collect_members(module_to_name):
+ """Collect all symbols from a list of modules.
+
+ Args:
+ module_to_name: Dictionary mapping modules to short names.
+
+ Returns:
+ Dictionary mapping name to (fullname, member) pairs.
+ """
+ members = {}
+ for module, module_name in module_to_name.iteritems():
+ for name, member in inspect.getmembers(module):
+ if ((inspect.isfunction(member) or inspect.isclass(member)) and
+ not _always_drop_symbol_re.match(name)):
+ fullname = '%s.%s' % (module_name, name)
+ if name in members:
+ other_fullname, other_member = members[name]
+ if member is not other_member:
+ raise RuntimeError("Short name collision between %s and %s" %
+ (fullname, other_fullname))
+ if len(fullname) == len(other_fullname):
+ raise RuntimeError("Can't decide whether to use %s or %s for %s: "
+ "both full names have length %d" %
+ (fullname, other_fullname, len(fullname)))
+ if len(fullname) > len(other_fullname):
+ continue # Use the shorter full name
+ members[name] = fullname, member
+ return members
+
+
+def _get_anchor(module_to_name, fullname):
+ """Turn a full member name into an anchor.
+
+ Args:
+ module_to_name: Dictionary mapping modules to short names.
+ fullname: Fully qualified name of symbol.
+
+ Returns:
+ HTML anchor string. The longest module name prefix of fullname is
+ removed to make the anchor.
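+
+    For example (illustrative), with module short names "tf" and "tf.nn",
+    "tf.nn.relu" maps to the anchor "relu" and "tf.constant" to "constant".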
+
+ Raises:
+ ValueError: If fullname uses characters invalid in an anchor.
+ """
+ if not _anchor_re.match(fullname):
+ raise ValueError("'%s' is not a valid anchor" % fullname)
+ anchor = fullname
+ for module_name in module_to_name.itervalues():
+ if fullname.startswith(module_name + "."):
+ rest = fullname[len(module_name)+1:]
+ # Use this prefix iff it is longer than any found before
+ if len(anchor) > len(rest):
+ anchor = rest
+ return anchor
+
+
+class Library(Document):
+ """An automatically generated document for a set of functions and classes."""
+
+ def __init__(self,
+ title,
+ module,
+ module_to_name,
+ members,
+ documented,
+ exclude_symbols=(),
+ catch_all=False):
+ """Creates a new Library.
+
+ Args:
+ title: A human-readable title for the library.
+ module: Module to pull high level docstring from (for table of contents,
+ list of Ops to document, etc.).
+ module_to_name: Dictionary mapping modules to short names.
+ members: Dictionary mapping member name to (fullname, member).
+ documented: Set of documented names to update.
+ exclude_symbols: A list of specific symbols to exclude.
+ """
+ self._title = title
+ self._module = module
+ self._module_to_name = module_to_name
+ self._members = dict(members) # Copy since we mutate it below
+ self._exclude_symbols = frozenset(exclude_symbols)
+ documented.update(exclude_symbols)
+ self._documented = documented
+ self._mentioned = set()
+
+ @property
+ def title(self):
+ """The human-readable title for this library."""
+ return self._title
+
+ @property
+ def mentioned(self):
+ """Set of names mentioned in this library."""
+ return self._mentioned
+
+ @property
+ def exclude_symbols(self):
+ """Set of excluded symbols."""
+ return self._exclude_symbols
+
+ def _should_include_member(self, name, member):
+ """Returns True if this member should be included in the document."""
+ # Always exclude symbols matching _always_drop_symbol_re.
+ if _always_drop_symbol_re.match(name):
+ return False
+ # Finally, exclude any specifically-excluded symbols.
+ if name in self._exclude_symbols:
+ return False
+ return True
+
+ def get_imported_modules(self, module):
+ """Returns the list of modules imported from `module`."""
+ for name, member in inspect.getmembers(module):
+ if inspect.ismodule(member):
+ yield name, member
+
+ def get_class_members(self, cls_name, cls):
+ """Returns the list of class members to document in `cls`.
+
+ This function filters the class member to ONLY return those
+ defined by the class. It drops the inherited ones.
+
+ Args:
+ cls_name: Qualified name of `cls`.
+ cls: An inspect object of type 'class'.
+
+ Yields:
+ name, member tuples.
+ """
+ for name, member in inspect.getmembers(cls):
+ # Only show methods and properties presently.
+ if not (inspect.ismethod(member) or isinstance(member, property)):
+ continue
+ if ((inspect.ismethod(member) and member.__name__ == "__init__")
+ or self._should_include_member(name, member)):
+ yield name, ("%s.%s" % (cls_name, name), member)
+
+ def _generate_signature_for_function(self, func):
+ """Given a function, returns a string representing its args."""
+ args_list = []
+ argspec = inspect.getargspec(func)
+ first_arg_with_default = (
+ len(argspec.args or []) - len(argspec.defaults or []))
+ for arg in argspec.args[:first_arg_with_default]:
+ if arg == "self":
+ # Python documentation typically skips `self` when printing method
+ # signatures.
+ continue
+ args_list.append(arg)
+ if argspec.defaults:
+ for arg, default in zip(
+ argspec.args[first_arg_with_default:], argspec.defaults):
+ args_list.append("%s=%r" % (arg, default))
+ if argspec.varargs:
+ args_list.append("*" + argspec.varargs)
+ if argspec.keywords:
+ args_list.append("**" + argspec.keywords)
+ return "(" + ", ".join(args_list) + ")"
+
+ def _remove_docstring_indent(self, docstring):
+ """Remove indenting.
+
+    We follow Python's convention and remove the minimum indent of the lines
+    after the first, preserving relative indentation; see:
+    https://www.python.org/dev/peps/pep-0257/#handling-docstring-indentation
+
+ Args:
+ docstring: A docstring.
+
+ Returns:
+ A list of strings, one per line, with the minimum indent stripped.
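+
+    For example (illustrative), the docstring
+
+      "Title.\n\n    Body line.\n      Indented detail."
+
+    becomes
+
+      ["Title.", "", "Body line.", "  Indented detail."]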
+ """
+ docstring = docstring or ""
+ lines = docstring.strip().split("\n")
+
+ min_indent = len(docstring)
+ for l in lines[1:]:
+ l = l.rstrip()
+ if l:
+ i = 0
+ while i < len(l) and l[i] == " ":
+ i += 1
+ if i < min_indent: min_indent = i
+ for i in range(1, len(lines)):
+ l = lines[i].rstrip()
+ if len(l) >= min_indent:
+ l = l[min_indent:]
+ lines[i] = l
+ return lines
+
+ def _print_formatted_docstring(self, docstring, f):
+ """Formats the given `docstring` as Markdown and prints it to `f`."""
+ lines = self._remove_docstring_indent(docstring)
+
+ # Output the lines, identifying "Args" and other section blocks.
+ i = 0
+
+ def _at_start_of_section():
+ """Returns the header if lines[i] is at start of a docstring section."""
+ l = lines[i]
+ match = _section_re.match(l)
+ if (match and i + 1 < len(lines) and
+     lines[i + 1].startswith(" ")):
+ return match.group(1)
+ else:
+ return None
+
+ while i < len(lines):
+ l = lines[i]
+
+ section_header = _at_start_of_section()
+ if section_header:
+ if i == 0 or lines[i-1]:
+ print >>f, ""
+ # Use at least H4 to keep these out of the TOC.
+ print >>f, "##### " + section_header + ":"
+ print >>f, ""
+ i += 1
+ outputting_list = False
+ while i < len(lines):
+ l = lines[i]
+ # A new section header terminates the section.
+ if _at_start_of_section():
+ break
+ match = _arg_re.match(l)
+ if match:
+ if not outputting_list:
+ # We need to start a list. In Markdown, a blank line needs to
+ # precede a list.
+ print >>f, ""
+ outputting_list = True
+ suffix = l[len(match.group()):].lstrip()
+ print >>f, "* <b>" + match.group(1) + "</b>: " + suffix
+ else:
+ # For lines that don't start with _arg_re, continue the list if it
+ # has enough indentation.
+ outputting_list &= l.startswith(" ")
+ print >>f, l
+ i += 1
+ else:
+ print >>f, l
+ i += 1
+
+ def _print_function(self, f, prefix, fullname, func):
+ """Prints the given function to `f`."""
+ heading = prefix + " " + fullname
+ if not isinstance(func, property):
+ heading += self._generate_signature_for_function(func)
+ heading += " {#%s}" % _get_anchor(self._module_to_name, fullname)
+ print >>f, heading
+ print >>f, ""
+ self._print_formatted_docstring(inspect.getdoc(func), f)
+ print >>f, ""
+
+ def _write_member_markdown_to_file(self, f, name, member):
+ """Print `member` to `f`."""
+ if inspect.isfunction(member):
+ print >>f, "- - -"
+ print >>f, ""
+ self._print_function(f, "###", name, member)
+ print >>f, ""
+ elif inspect.ismethod(member):
+ print >>f, "- - -"
+ print >>f, ""
+ self._print_function(f, "####", name, member)
+ print >>f, ""
+ elif isinstance(member, property):
+ print >>f, "- - -"
+ print >>f, ""
+ self._print_function(f, "####", name, member)
+ elif inspect.isclass(member):
+ print >>f, "- - -"
+ print >>f, ""
+ print >>f, "### class %s {#%s}" % (
+ name, _get_anchor(self._module_to_name, name))
+ print >>f, ""
+ self._write_class_markdown_to_file(f, name, member)
+ print >>f, ""
+ else:
+ raise RuntimeError("Member %s has unknown type %s" % (name, type(member)))
+
+ def _write_docstring_markdown_to_file(self, f, docstring, members, imports):
+ for l in self._remove_docstring_indent(docstring):
+ if l.startswith(_member_mark):
+ name = l[len(_member_mark):].strip(" \t")
+ if name in members:
+ self._documented.add(name)
+ self._mentioned.add(name)
+ self._write_member_markdown_to_file(f, *members[name])
+ del members[name]
+ elif name in imports:
+ self._write_module_markdown_to_file(f, imports[name])
+ else:
+ raise ValueError("%s: unknown member `%s`" % (self._title, name))
+ else:
+ print >>f, l
+
+ def _write_class_markdown_to_file(self, f, name, cls):
+ """Write the class doc to 'f'.
+
+ Args:
+ f: File to write to.
+      name: name to use.
+      cls: class object.
+ """
+ # Build the list of class methods to document.
+ methods = dict(self.get_class_members(name, cls))
+ # Used later to check if any methods were called out in the class
+ # docstring.
+ num_methods = len(methods)
+ self._write_docstring_markdown_to_file(f, inspect.getdoc(cls), methods, {})
+
+ # If some methods were not described, describe them now if they are
+ # defined by the class itself (not inherited). If NO methods were
+ # described, describe all methods.
+ #
+ # TODO(mdevin): when all methods have been categorized make it an error
+ # if some methods are not categorized.
+ any_method_called_out = (len(methods) != num_methods)
+ if any_method_called_out:
+ other_methods = {n: m for n, m in methods.iteritems()
+ if n in cls.__dict__}
+ if other_methods:
+ print >>f, "\n#### Other Methods"
+ else:
+ other_methods = methods
+ for name in sorted(other_methods):
+ self._write_member_markdown_to_file(f, *other_methods[name])
+
+ def _write_module_markdown_to_file(self, f, module):
+ imports = dict(self.get_imported_modules(module))
+ self._write_docstring_markdown_to_file(f, inspect.getdoc(module),
+ self._members, imports)
+
+ def write_markdown_to_file(self, f):
+ """Prints this library to file `f`.
+
+ Args:
+ f: File to write to.
+
+ Returns:
+ Dictionary of documented members.
+ """
+ print >>f, "<!-- This file is machine generated: DO NOT EDIT! -->"
+ print >>f, ""
+ # TODO(mdevin): Do not insert these. Let the doc writer put them in
+ # the module docstring explicitly.
+ print >>f, "#", self._title
+ print >>f, "[TOC]"
+ print >>f, ""
+ if self._module is not None:
+ self._write_module_markdown_to_file(f, self._module)
+
+ def write_other_members(self, f, catch_all=False):
+ """Writes the leftover members to `f`.
+
+ Args:
+ f: File to write to.
+ catch_all: If true, document all missing symbols from any module.
+ Otherwise, document missing symbols from just this module.
+ """
+ if catch_all:
+ names = self._members.iteritems()
+ else:
+ names = inspect.getmembers(self._module)
+ leftovers = []
+ for name, _ in names:
+ if name in self._members and name not in self._documented:
+ leftovers.append(name)
+ if leftovers:
+ print "%s: undocumented members: %d" % (self._title, len(leftovers))
+ print >>f, "\n## Other Functions and Classes"
+ for name in sorted(leftovers):
+ print " %s" % name
+ self._documented.add(name)
+ self._mentioned.add(name)
+ self._write_member_markdown_to_file(f, *self._members[name])
+
+ def assert_no_leftovers(self):
+ """Generate an error if there are leftover members."""
+ leftovers = []
+ for name in self._members.iterkeys():
+ if name in self._members and name not in self._documented:
+ leftovers.append(name)
+ if leftovers:
+ raise RuntimeError("%s: undocumented members: %s" %
+ (self._title, ", ".join(leftovers)))
+
+
+def write_libraries(dir, libraries):
+ """Write a list of libraries to disk.
+
+ Args:
+ dir: Output directory.
+ libraries: List of (filename, library) pairs.
+ """
+ files = [open(os.path.join(dir, k), "w") for k, _ in libraries]
+ # Document mentioned symbols for all libraries
+ for f, (_, v) in zip(files, libraries):
+ v.write_markdown_to_file(f)
+ # Document symbols that no library mentioned. We do this after writing
+ # out all libraries so that earlier libraries know what later libraries
+ # documented.
+ for f, (_, v) in zip(files, libraries):
+ v.write_other_members(f)
+ f.close()
diff --git a/tensorflow/python/framework/errors.py b/tensorflow/python/framework/errors.py
new file mode 100644
index 0000000000..fe8f107cec
--- /dev/null
+++ b/tensorflow/python/framework/errors.py
@@ -0,0 +1,410 @@
+"""Exception types for TensorFlow errors."""
+import traceback
+import warnings
+
+from tensorflow.core.lib.core import error_codes_pb2
+
+
+class OpError(Exception):
+ """A generic error that is raised when TensorFlow execution fails.
+
+ Whenever possible, the session will raise a more specific subclass
+ of `OpError` from the `tf.errors` module.
+
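+  For example (an illustrative sketch; `sess` and `train_op` are assumed to
+  exist in the caller's code):
+
+    try:
+      sess.run(train_op)
+    except tf.errors.OpError as e:
+      print(e.message)
+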
+ @@op
+ @@node_def
+ """
+
+ def __init__(self, node_def, op, message, error_code):
+ """Creates a new OpError indicating that a particular op failed.
+
+ Args:
+ node_def: The graph_pb2.NodeDef proto representing the op that failed.
+ op: The ops.Operation that failed, if known; otherwise None.
+ message: The message string describing the failure.
+ error_code: The error_codes_pb2.Code describing the error.
+ """
+ super(OpError, self).__init__()
+ self._message = message
+ self._node_def = node_def
+ self._op = op
+ self._error_code = error_code
+
+ @property
+ def message(self):
+ """The error message that describes the error."""
+ return self._message
+
+ @property
+ def op(self):
+ """The operation that failed, if known.
+
+ *N.B.* If the failed op was synthesized at runtime, e.g. a `Send`
+ or `Recv` op, there will be no corresponding
+ [`Operation`](framework.md#Operation) object. In that case, this
+ will return `None`, and you should instead use the
+ [`node_def`](OpError.node_def) to discover information about the op.
+
+ Returns:
+ The `Operation` that failed, or None.
+ """
+ return self._op
+
+ @property
+ def error_code(self):
+ """The integer error code that describes the error."""
+ return self._error_code
+
+ @property
+ def node_def(self):
+ """The `NodeDef` proto representing the op that failed."""
+ return self._node_def
+
+ def __str__(self):
+ if self._op is not None:
+ output = ["%s\nCaused by op %r, defined at:\n"
+ % (self.message, self._op.name,)]
+ curr_traceback_list = traceback.format_list(self._op.traceback)
+ output.extend(curr_traceback_list)
+ original_op = self._op._original_op
+ while original_op is not None:
+ output.append(
+ "\n...which was originally created as op %r, defined at:\n"
+ % (original_op.name,))
+ prev_traceback_list = curr_traceback_list
+ curr_traceback_list = traceback.format_list(original_op.traceback)
+
+ # Attempt to elide large common subsequences of the subsequent
+ # stack traces.
+ #
+ # TODO(mrry): Consider computing the actual longest common subsequence.
+ is_eliding = False
+ elide_count = 0
+ last_elided_line = None
+ for line, line_in_prev in zip(curr_traceback_list, prev_traceback_list):
+ if line == line_in_prev:
+ if is_eliding:
+ elide_count += 1
+ last_elided_line = line
+ else:
+ output.append(line)
+ is_eliding = True
+ elide_count = 0
+ else:
+ if is_eliding:
+ if elide_count > 0:
+ output.extend(
+ ["[elided %d identical lines from previous traceback]\n"
+ % (elide_count - 1,), last_elided_line])
+ is_eliding = False
+ output.append(line)
+
+ original_op = original_op._original_op
+ return ''.join(output)
+ else:
+ return self.message
+
+
+OK = error_codes_pb2.OK
+CANCELLED = error_codes_pb2.CANCELLED
+UNKNOWN = error_codes_pb2.UNKNOWN
+INVALID_ARGUMENT = error_codes_pb2.INVALID_ARGUMENT
+DEADLINE_EXCEEDED = error_codes_pb2.DEADLINE_EXCEEDED
+NOT_FOUND = error_codes_pb2.NOT_FOUND
+ALREADY_EXISTS = error_codes_pb2.ALREADY_EXISTS
+PERMISSION_DENIED = error_codes_pb2.PERMISSION_DENIED
+UNAUTHENTICATED = error_codes_pb2.UNAUTHENTICATED
+RESOURCE_EXHAUSTED = error_codes_pb2.RESOURCE_EXHAUSTED
+FAILED_PRECONDITION = error_codes_pb2.FAILED_PRECONDITION
+ABORTED = error_codes_pb2.ABORTED
+OUT_OF_RANGE = error_codes_pb2.OUT_OF_RANGE
+UNIMPLEMENTED = error_codes_pb2.UNIMPLEMENTED
+INTERNAL = error_codes_pb2.INTERNAL
+UNAVAILABLE = error_codes_pb2.UNAVAILABLE
+DATA_LOSS = error_codes_pb2.DATA_LOSS
+
+
+class CancelledError(OpError):
+ """Raised when an operation or step is cancelled.
+
+ For example, a long-running operation (e.g.
+ [`queue.enqueue()`](io_ops.md#QueueBase.enqueue)) may be cancelled by
+ running another operation (e.g.
+ [`queue.close(cancel_pending_enqueues=True)`](io_ops.md#QueueBase.close)),
+ or by [closing the session](client.md#Session.close). A step that is
+ running such a long-running operation will fail by raising `CancelledError`.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `CancelledError`."""
+ super(CancelledError, self).__init__(node_def, op, message, CANCELLED)
+
+
+class UnknownError(OpError):
+ """Unknown error.
+
+ An example of where this error may be returned is if a Status value
+ received from another address space belongs to an error-space that
+ is not known to this address space. Also errors raised by APIs that
+ do not return enough error information may be converted to this
+ error.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message, error_code=UNKNOWN):
+ """Creates an `UnknownError`."""
+ super(UnknownError, self).__init__(node_def, op, message, error_code)
+
+
+class InvalidArgumentError(OpError):
+ """Raised when an operation receives an invalid argument.
+
+ This may occur, for example, if an operation receives an input
+ tensor that has an invalid value or shape. For example, the
+ [`tf.matmul()`](math_ops.md#matmul) op will raise this error if it
+ receives an input that is not a matrix, and the
+ [`tf.reshape()`](array_ops.md#reshape) op will raise this error if
+ the new shape does not match the number of elements in the input
+ tensor.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `InvalidArgumentError`."""
+ super(InvalidArgumentError, self).__init__(node_def, op, message,
+ INVALID_ARGUMENT)
+
+
+class DeadlineExceededError(OpError):
+ """Raised when a deadline expires before an operation could complete.
+
+ This exception is not currently used.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `DeadlineExceededError`."""
+ super(DeadlineExceededError, self).__init__(node_def, op, message,
+ DEADLINE_EXCEEDED)
+
+
+class NotFoundError(OpError):
+ """Raised when a requested entity (e.g., a file or directory) was not found.
+
+ For example, running the
+ [`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation
+ could raise `NotFoundError` if it receives the name of a file that
+ does not exist.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `NotFoundError`."""
+ super(NotFoundError, self).__init__(node_def, op, message, NOT_FOUND)
+
+
+class AlreadyExistsError(OpError):
+ """Raised when an entity that we attempted to create already exists.
+
+ For example, running an operation that saves a file
+ (e.g. [`tf.train.Saver.save()`](train.md#Saver.save)) could
+ potentially raise this exception if an explicit filename for an
+ existing file was passed.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `AlreadyExistsError`."""
+ super(AlreadyExistsError, self).__init__(node_def, op, message,
+ ALREADY_EXISTS)
+
+
+class PermissionDeniedError(OpError):
+ """Raised when the caller does not have permission to run an operation.
+
+ For example, running the
+ [`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation
+ could raise `PermissionDeniedError` if it receives the name of a
+ file for which the user does not have the read file permission.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `PermissionDeniedError`."""
+ super(PermissionDeniedError, self).__init__(node_def, op, message,
+ PERMISSION_DENIED)
+
+
+class UnauthenticatedError(OpError):
+ """The request does not have valid authentication credentials.
+
+ This exception is not currently used.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `UnauthenticatedError`."""
+ super(UnauthenticatedError, self).__init__(node_def, op, message,
+ UNAUTHENTICATED)
+
+
+class ResourceExhaustedError(OpError):
+ """Some resource has been exhausted.
+
+ For example, this error might be raised if a per-user quota is
+ exhausted, or perhaps the entire file system is out of space.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `ResourceExhaustedError`."""
+ super(ResourceExhaustedError, self).__init__(node_def, op, message,
+ RESOURCE_EXHAUSTED)
+
+
+class FailedPreconditionError(OpError):
+ """Operation was rejected because the system is not in a state to execute it.
+
+ This exception is most commonly raised when running an operation
+ that reads a [`tf.Variable`](state_ops.md#Variable) before it has
+ been initialized.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `FailedPreconditionError`."""
+ super(FailedPreconditionError, self).__init__(node_def, op, message,
+ FAILED_PRECONDITION)
+
+
+class AbortedError(OpError):
+ """The operation was aborted, typically due to a concurrent action.
+
+ For example, running a [`queue.enqueue()`](io_ops.md#QueueBase.enqueue)
+ operation may raise `AbortedError` if a
+ [`queue.close()`](io_ops.md#QueueBase.close) operation previously ran.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `AbortedError`."""
+ super(AbortedError, self).__init__(node_def, op, message, ABORTED)
+
+
+class OutOfRangeError(OpError):
+ """Raised when an operation executed past the valid range.
+
+ This exception is raised in "end-of-file" conditions, such as when a
+ [`queue.dequeue()`](io_ops.md#QueueBase.dequeue) operation is
+ blocked on an empty queue, and a
+ [`queue.close()`](io_ops.md#QueueBase.close) operation executes.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `OutOfRangeError`."""
+ super(OutOfRangeError, self).__init__(node_def, op, message,
+ OUT_OF_RANGE)
+
+
+class UnimplementedError(OpError):
+ """Raised when an operation has not been implemented.
+
+ Some operations may raise this error when passed otherwise-valid
+ arguments that they do not currently support. For example, running
+ the [`tf.nn.max_pool()`](nn.md#max_pool) operation would raise this
+ error if pooling was requested on the batch dimension, because this
+ is not yet supported.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `UnimplementedError`."""
+ super(UnimplementedError, self).__init__(node_def, op, message,
+ UNIMPLEMENTED)
+
+
+class InternalError(OpError):
+ """Raised when the system experiences an internal error.
+
+ This exception is raised when some invariant expected by the runtime
+ has been broken. Catching this exception is not recommended.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `InternalError`."""
+ super(InternalError, self).__init__(node_def, op, message, INTERNAL)
+
+
+class UnavailableError(OpError):
+ """Raised when the runtime is currently unavailable.
+
+ This exception is not currently used.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates an `UnavailableError`."""
+ super(UnavailableError, self).__init__(node_def, op, message,
+ UNAVAILABLE)
+
+
+class DataLossError(OpError):
+ """Raised when unrecoverable data loss or corruption is encountered.
+
+ For example, this may be raised by running a
+ [`tf.WholeFileReader.read()`](io_ops.md#WholeFileReader) operation,
+ if the file is truncated while it is being read.
+
+ @@__init__
+ """
+
+ def __init__(self, node_def, op, message):
+ """Creates a `DataLossError`."""
+ super(DataLossError, self).__init__(node_def, op, message, DATA_LOSS)
+
+
+_CODE_TO_EXCEPTION_CLASS = {
+ CANCELLED: CancelledError,
+ UNKNOWN: UnknownError,
+ INVALID_ARGUMENT: InvalidArgumentError,
+ DEADLINE_EXCEEDED: DeadlineExceededError,
+ NOT_FOUND: NotFoundError,
+ ALREADY_EXISTS: AlreadyExistsError,
+ PERMISSION_DENIED: PermissionDeniedError,
+ UNAUTHENTICATED: UnauthenticatedError,
+ RESOURCE_EXHAUSTED: ResourceExhaustedError,
+ FAILED_PRECONDITION: FailedPreconditionError,
+ ABORTED: AbortedError,
+ OUT_OF_RANGE: OutOfRangeError,
+ UNIMPLEMENTED: UnimplementedError,
+ INTERNAL: InternalError,
+ UNAVAILABLE: UnavailableError,
+ DATA_LOSS: DataLossError,
+}
+
+
+def _make_specific_exception(node_def, op, message, error_code):
+ try:
+ exc_type = _CODE_TO_EXCEPTION_CLASS[error_code]
+ return exc_type(node_def, op, message)
+ except KeyError:
+ warnings.warn("Unknown error code: %d" % error_code)
+ return UnknownError(node_def, op, message, error_code)
diff --git a/tensorflow/python/framework/errors_test.py b/tensorflow/python/framework/errors_test.py
new file mode 100644
index 0000000000..ab59a729f6
--- /dev/null
+++ b/tensorflow/python/framework/errors_test.py
@@ -0,0 +1,63 @@
+"""Tests for tensorflow.python.framework.errors."""
+import tensorflow.python.platform
+
+import warnings
+
+import tensorflow as tf
+
+from tensorflow.core.lib.core import error_codes_pb2
+
+class ErrorsTest(tf.test.TestCase):
+
+ def testUniqueClassForEachErrorCode(self):
+ for error_code, exc_type in [
+ (tf.errors.CANCELLED, tf.errors.CancelledError),
+ (tf.errors.UNKNOWN, tf.errors.UnknownError),
+ (tf.errors.INVALID_ARGUMENT, tf.errors.InvalidArgumentError),
+ (tf.errors.DEADLINE_EXCEEDED, tf.errors.DeadlineExceededError),
+ (tf.errors.NOT_FOUND, tf.errors.NotFoundError),
+ (tf.errors.ALREADY_EXISTS, tf.errors.AlreadyExistsError),
+ (tf.errors.PERMISSION_DENIED, tf.errors.PermissionDeniedError),
+ (tf.errors.UNAUTHENTICATED, tf.errors.UnauthenticatedError),
+ (tf.errors.RESOURCE_EXHAUSTED, tf.errors.ResourceExhaustedError),
+ (tf.errors.FAILED_PRECONDITION, tf.errors.FailedPreconditionError),
+ (tf.errors.ABORTED, tf.errors.AbortedError),
+ (tf.errors.OUT_OF_RANGE, tf.errors.OutOfRangeError),
+ (tf.errors.UNIMPLEMENTED, tf.errors.UnimplementedError),
+ (tf.errors.INTERNAL, tf.errors.InternalError),
+ (tf.errors.UNAVAILABLE, tf.errors.UnavailableError),
+ (tf.errors.DATA_LOSS, tf.errors.DataLossError),
+ ]:
+ # pylint: disable=protected-access
+ self.assertTrue(isinstance(
+ tf.errors._make_specific_exception(None, None, None, error_code),
+ exc_type))
+ # pylint: enable=protected-access
+
+ def testKnownErrorClassForEachErrorCodeInProto(self):
+ for error_code in error_codes_pb2.Code.values():
+ # pylint: disable=line-too-long
+ if error_code in (error_codes_pb2.OK,
+ error_codes_pb2.DO_NOT_USE_RESERVED_FOR_FUTURE_EXPANSION_USE_DEFAULT_IN_SWITCH_INSTEAD_):
+ continue
+ # pylint: enable=line-too-long
+ with warnings.catch_warnings(record=True) as w:
+ # pylint: disable=protected-access
+ exc = tf.errors._make_specific_exception(None, None, None, error_code)
+ # pylint: enable=protected-access
+ self.assertEqual(0, len(w)) # No warning is raised.
+ self.assertTrue(isinstance(exc, tf.errors.OpError))
+ self.assertTrue(tf.errors.OpError in exc.__class__.__bases__)
+
+ def testUnknownErrorCodeCausesWarning(self):
+ with warnings.catch_warnings(record=True) as w:
+ # pylint: disable=protected-access
+ exc = tf.errors._make_specific_exception(None, None, None, 37)
+ # pylint: enable=protected-access
+ self.assertEqual(1, len(w))
+ self.assertTrue("Unknown error code: 37" in str(w[0].message))
+ self.assertTrue(isinstance(exc, tf.errors.OpError))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/framework/framework_lib.py b/tensorflow/python/framework/framework_lib.py
new file mode 100644
index 0000000000..e317cfda8d
--- /dev/null
+++ b/tensorflow/python/framework/framework_lib.py
@@ -0,0 +1,70 @@
+# pylint: disable=wildcard-import,unused-import,g-bad-import-order,line-too-long
+"""Import names from the framework library.
+
+## Core graph data structures
+
+@@Graph
+@@Operation
+@@Tensor
+
+## Tensor types
+
+@@DType
+@@as_dtype
+
+## Utility functions
+
+@@device
+@@name_scope
+@@control_dependencies
+@@convert_to_tensor
+@@get_default_graph
+@@import_graph_def
+
+## Graph collections
+
+@@add_to_collection
+@@get_collection
+@@GraphKeys
+
+## Defining new operations
+
+@@RegisterGradient
+@@NoGradient
+@@RegisterShape
+@@TensorShape
+@@Dimension
+@@op_scope
+@@get_seed
+"""
+
+# Classes used when building a Graph.
+from tensorflow.python.framework.ops import Graph
+from tensorflow.python.framework.ops import Operation
+from tensorflow.python.framework.ops import Tensor
+from tensorflow.python.framework.ops import SparseTensor
+from tensorflow.python.framework.ops import SparseTensorValue
+from tensorflow.python.framework.ops import IndexedSlices
+
+# Utilities used when building a Graph.
+from tensorflow.python.framework.ops import device
+from tensorflow.python.framework.ops import name_scope
+from tensorflow.python.framework.ops import op_scope
+from tensorflow.python.framework.ops import control_dependencies
+from tensorflow.python.framework.ops import get_default_graph
+from tensorflow.python.framework.ops import GraphKeys
+from tensorflow.python.framework.ops import add_to_collection
+from tensorflow.python.framework.ops import get_collection
+from tensorflow.python.framework.ops import convert_to_tensor
+from tensorflow.python.framework.random_seed import get_seed
+from tensorflow.python.framework.random_seed import set_random_seed
+from tensorflow.python.framework.importer import import_graph_def
+
+# Needed when you define a new Op in C++.
+from tensorflow.python.framework.ops import RegisterGradient
+from tensorflow.python.framework.ops import NoGradient
+from tensorflow.python.framework.ops import RegisterShape
+from tensorflow.python.framework.tensor_shape import Dimension
+from tensorflow.python.framework.tensor_shape import TensorShape
+
+from tensorflow.python.framework.types import *
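+
+# These names are re-exported at the top level of the `tf` package, so typical
+# user code looks like the following sketch:
+#
+#   with tf.Graph().as_default():
+#     with tf.device("/cpu:0"):
+#       c = tf.convert_to_tensor([1.0, 2.0])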
diff --git a/tensorflow/python/framework/gen_docs_combined.py b/tensorflow/python/framework/gen_docs_combined.py
new file mode 100644
index 0000000000..a726d880e7
--- /dev/null
+++ b/tensorflow/python/framework/gen_docs_combined.py
@@ -0,0 +1,114 @@
+"""Updates generated docs from Python doc comments."""
+
+import os.path
+
+import tensorflow.python.platform
+import sys
+import tensorflow as tf
+
+from tensorflow.python.framework import docs
+from tensorflow.python.framework import framework_lib
+from tensorflow.python.client import client_lib
+
+
+tf.flags.DEFINE_string("out_dir", None,
+ "Directory to which docs should be written.")
+tf.flags.DEFINE_boolean("print_hidden_regex", False,
+ "Dump a regular expression matching any hidden symbol")
+FLAGS = tf.flags.FLAGS
+
+
+def get_module_to_name():
+ return {tf: 'tf',
+ tf.errors: 'tf.errors',
+ tf.image: 'tf.image',
+ tf.nn: 'tf.nn',
+ tf.train: 'tf.train',
+ tf.python_io: 'tf.python_io'}
+
+def all_libraries(module_to_name, members, documented):
+ # A list of (filename, docs.Library) pairs representing the individual files
+ # that we want to create.
+ def library(name, title, module=None, **args):
+ if module is None:
+ module = sys.modules["tensorflow.python.ops" +
+ ("" if name == "ops" else "." + name)]
+ return (name + ".md", docs.Library(title=title,
+ module_to_name=module_to_name,
+ members=members,
+ documented=documented,
+ module=module,
+ **args))
+ return [
+ # Splits of module 'tf'.
+ library("framework", "Building Graphs", framework_lib),
+ library("constant_op", "Constants, Sequences, and Random Values"),
+ library("state_ops", "Variables"),
+ library("array_ops", "Tensor Transformations",
+ exclude_symbols=["list_diff"]),
+ library("math_ops", "Math",
+ exclude_symbols=["sparse_matmul", "arg_min", "arg_max",
+ "lin_space", "sparse_segment_mean_grad"]),
+ library("control_flow_ops", "Control Flow"),
+ library("image", "Images", tf.image, exclude_symbols=["ResizeMethod"]),
+ library("sparse_ops", "Sparse Tensors"),
+ library("io_ops", "Inputs and Readers",
+ exclude_symbols=["LookupTableBase", "HashTable",
+ "initialize_all_tables",
+ "string_to_hash_bucket"]),
+ library("python_io", "Data IO (Python functions)", tf.python_io),
+ library("nn", "Neural Network", tf.nn,
+ exclude_symbols=["deconv2d", "conv2d_backprop_input",
+ "conv2d_backprop_filter", "avg_pool_grad",
+ "max_pool_grad", "max_pool_grad_with_argmax",
+ "batch_norm_with_global_normalization_grad",
+ "lrn_grad", "relu6_grad", "softplus_grad",
+ "xw_plus_b", "relu_layer", "lrn",
+ "batch_norm_with_global_normalization",
+ "batch_norm_with_global_normalization_grad",
+ "all_candidate_sampler"]),
+ library('client', "Running Graphs", client_lib,
+ exclude_symbols=["InteractiveSession"]),
+ library("train", "Training", tf.train,
+ exclude_symbols=["Feature", "Features", "BytesList", "FloatList",
+ "Int64List", "Example", "InferenceExample",
+ "RankingExample", "SequenceExample"]),
+ ]
+
+_hidden_symbols = ["Event", "Summary",
+ "HistogramProto", "ConfigProto", "NodeDef", "GraphDef",
+ "GPUOptions", "SessionInterface", "BaseSession"]
+
+def main(unused_argv):
+ if not FLAGS.out_dir:
+ tf.logging.error("out_dir not specified")
+ return -1
+
+ # Document libraries
+ documented = set()
+ module_to_name = get_module_to_name()
+ members = docs.collect_members(module_to_name)
+ libraries = all_libraries(module_to_name, members, documented)
+ docs.write_libraries(FLAGS.out_dir, libraries)
+
+ # Make it easy to search for hidden symbols
+ if FLAGS.print_hidden_regex:
+ hidden = set(_hidden_symbols)
+ for _, lib in libraries:
+ hidden.update(lib.exclude_symbols)
+ print r"hidden symbols regex = r'\b(%s)\b'" % "|".join(sorted(hidden))
+
+ # Verify that all symbols are mentioned in some library doc.
+ catch_all = docs.Library(title="Catch All", module=None,
+ exclude_symbols=_hidden_symbols,
+ module_to_name=module_to_name, members=members,
+ documented=documented)
+ catch_all.assert_no_leftovers()
+
+ # Generate index
+ with open(os.path.join(FLAGS.out_dir, "index.md"), "w") as f:
+ docs.Index(module_to_name, members, libraries).write_markdown_to_file(f)
+
+
+if __name__ == "__main__":
+ tf.app.run()
diff --git a/tensorflow/python/framework/gen_docs_test.sh b/tensorflow/python/framework/gen_docs_test.sh
new file mode 100755
index 0000000000..fda214d93c
--- /dev/null
+++ b/tensorflow/python/framework/gen_docs_test.sh
@@ -0,0 +1,4 @@
+#!/bin/bash -eux
+DIR=$TEST_SRCDIR/tensorflow/python
+$DIR/gen_docs_combined --out_dir $TEST_TMPDIR
+echo "PASS"
diff --git a/tensorflow/python/framework/importer.py b/tensorflow/python/framework/importer.py
new file mode 100644
index 0000000000..6ad2a1b009
--- /dev/null
+++ b/tensorflow/python/framework/importer.py
@@ -0,0 +1,303 @@
+"""A utility function for importing TensorFlow graphs."""
+import contextlib
+
+import tensorflow.python.platform
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.core.framework import types_pb2
+from tensorflow.python.framework import op_def_registry
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types as types_lib
+
+
+# TODO(josh11b): SWIG the code from node_def_util instead of duplicating
+# the logic here.
+def _GetNodeAttr(node_def, attr_name):
+ if attr_name not in node_def.attr:
+ raise ValueError('Expected one attr with name %r in %s.'
+ % (attr_name, str(node_def)))
+ return node_def.attr[attr_name]
+
+
+def _ArgToTypesNoRef(node_def, arg_def):
+ if arg_def.number_attr:
+ repeats = _GetNodeAttr(node_def, arg_def.number_attr).i
+ if arg_def.type_attr:
+ dtype = _GetNodeAttr(node_def, arg_def.type_attr).type
+ else:
+ assert arg_def.type != types_pb2.DT_INVALID
+ dtype = arg_def.type
+ return [dtype] * repeats
+ elif arg_def.type_attr:
+ return [_GetNodeAttr(node_def, arg_def.type_attr).type]
+ elif arg_def.type_list_attr:
+ return _GetNodeAttr(node_def, arg_def.type_list_attr).list.type
+ else:
+ assert arg_def.type != types_pb2.DT_INVALID
+ return [arg_def.type]
+
+
+def _SingleArgToTypes(node_def, arg_def):
+ types = _ArgToTypesNoRef(node_def, arg_def)
+ if arg_def.is_ref:
+ return [types_lib.as_dtype(dt).as_ref.as_datatype_enum for dt in types]
+ return types
+
+
+def _ArgsToTypes(node_def, arg_list):
+ types = []
+ for arg_def in arg_list:
+ types.extend(_SingleArgToTypes(node_def, arg_def))
+ return types
+
+
+def _InputTypes(node_def, op_dict):
+ op_def = op_dict[node_def.op]
+ return _ArgsToTypes(node_def, op_def.input_arg)
+
+
+def _OutputTypes(node_def, op_dict):
+ op_def = op_dict[node_def.op]
+ return _ArgsToTypes(node_def, op_def.output_arg)
+
+
+def _IsControlInput(input_name):
+ # Expected format: '^operation_name' (control input).
+ return input_name.startswith('^')
+
+
+def _ParseTensorName(tensor_name):
+ """Parses a tensor name into an operation name and output index.
+
+ This function will canonicalize tensor names as follows:
+
+ * "foo:0" -> ("foo", 0)
+ * "foo:7" -> ("foo", 7)
+ * "foo" -> ("foo", 0)
+ * "foo:bar:baz" -> ValueError
+
+ Args:
+ tensor_name: The name of a tensor.
+
+ Returns:
+ A tuple containing the operation name, and the output index.
+
+ Raises:
+    ValueError: If `tensor_name` cannot be interpreted as the name of a tensor.
+ """
+ components = tensor_name.split(':')
+ if len(components) == 2:
+ # Expected format: 'operation_name:output_index'.
+ try:
+ output_index = int(components[1])
+ except ValueError:
+ raise ValueError('Cannot convert %r to a tensor name.' % (tensor_name,))
+ return components[0], output_index
+ elif len(components) == 1:
+ # Expected format: 'operation_name' (implicit 0th output).
+ return components[0], 0
+ else:
+ raise ValueError('Cannot convert %r to a tensor name.' % (tensor_name,))
+
+
+def _CanonicalInputName(input_name):
+ if _IsControlInput(input_name):
+ return input_name
+ input_op_name, output_index = _ParseTensorName(input_name)
+ return '%s:%d' % (input_op_name, output_index)
+
+
+def _InvalidNodeMessage(node, message):
+ return 'graph_def is invalid at node %r: %s.' % (node.name, message)
+
+
+@contextlib.contextmanager
+def _MaybeDevice(device):
+ """Applies the given device only if device is not None or empty."""
+ if device:
+ with ops.device(device):
+ yield
+ else:
+ yield
+
+
+def import_graph_def(graph_def, input_map=None, return_elements=None,
+ name=None, op_dict=None):
+ """Imports the TensorFlow graph in `graph_def` into the Python `Graph`.
+
+ This function provides a way to import a serialized TensorFlow
+ [`GraphDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+ protocol buffer, and extract individual objects in the `GraphDef` as
+ [`Tensor`](#Tensor) and [`Operation`](#Operation) objects. See
+ [`Graph.as_graph_def()`](#Graph.as_graph_def) for a way to create a
+ `GraphDef` proto.
+
+ Args:
+ graph_def: A `GraphDef` proto containing operations to be imported into
+ the default graph.
+ input_map: A dictionary mapping input names (as strings) in `graph_def`
+ to `Tensor` objects. The values of the named input tensors in the
+ imported graph will be re-mapped to the respective `Tensor` values.
+ return_elements: A list of strings containing operation names in
+ `graph_def` that will be returned as `Operation` objects; and/or
+ tensor names in `graph_def` that will be returned as `Tensor` objects.
+ name: (Optional.) A prefix that will be prepended to the names in
+ `graph_def`. Defaults to `"import"`.
+ op_dict: (Optional.) A dictionary mapping op type names to `OpDef` protos.
+ Must contain an `OpDef` proto for each op type named in `graph_def`.
+ If omitted, uses the `OpDef` protos registered in the global registry.
+
+ Returns:
+ A list of `Operation` and/or `Tensor` objects from the imported graph,
+    corresponding to the names in `return_elements`.
+
+ Raises:
+ TypeError: If `graph_def` is not a `GraphDef` proto,
+      `input_map` is not a dictionary mapping strings to `Tensor` objects,
+ or `return_elements` is not a list of strings.
+ ValueError: If `input_map`, or `return_elements` contains names that
+ do not appear in `graph_def`, or `graph_def` is not well-formed (e.g.
+ it refers to an unknown tensor).
+ """
+ # Type checks for inputs.
+ if not isinstance(graph_def, graph_pb2.GraphDef):
+ raise TypeError('graph_def must be a GraphDef proto.')
+ if input_map is None:
+ input_map = {}
+ else:
+ if not (isinstance(input_map, dict)
+ and all(isinstance(k, basestring) for k in input_map.keys())):
+ raise TypeError('input_map must be a dictionary mapping strings to '
+ 'Tensor objects.')
+ if (return_elements is not None
+ and not (isinstance(return_elements, (list, tuple))
+ and all(isinstance(x, basestring) for x in return_elements))):
+ raise TypeError('return_elements must be a list of strings.')
+
+ # Use a canonical representation for all tensor names.
+ input_map = {_CanonicalInputName(k): v for k, v in input_map.items()}
+ used_input_keys = set()
+
+ name_to_op = {}
+
+ if op_dict is None:
+ op_dict = op_def_registry.get_registered_ops()
+
+ with ops.op_scope(input_map.values(), name, 'import'):
+ g = ops.get_default_graph()
+
+ with ops.name_scope('_inputs'):
+ input_map = {k: ops.convert_to_tensor(v) for k, v in input_map.items()}
+
+ # NOTE(mrry): We do this in two passes, because there may be a cycle in
+    # `graph_def`.
+
+ # 1. Add operations without their inputs.
+ for node in graph_def.node:
+ output_types = _OutputTypes(node, op_dict)
+ with _MaybeDevice(node.device):
+ name_to_op[node.name] = g.create_op(
+ node.op, [], output_types, name=node.name, attrs=node.attr,
+ compute_shapes=False)
+
+ # 2. Add inputs to the operations.
+ for node in graph_def.node:
+ op = name_to_op[node.name]
+ input_types = _InputTypes(node, op_dict)
+
+ # NOTE(mrry): We cannot use zip here because control inputs do not appear
+ # in the list of input_types.
+ for i, input_name in enumerate(
+ [_CanonicalInputName(x) for x in node.input]):
+
+ if _IsControlInput(input_name):
+ # (a) Input is a control input that should be taken from an op
+ # in "graph_def".
+ try:
+ source_op = name_to_op[input_name[1:]]
+ except KeyError:
+ raise ValueError(
+ _InvalidNodeMessage(
+ node,
+ 'Control input %r not found in graph_def.' % (input_name,)))
+ # pylint: disable=protected-access
+ op._add_control_input(source_op)
+ # pylint: enable=protected-access
+
+ else:
+ try:
+ input_type = input_types[i]
+ except IndexError:
+ raise ValueError(_InvalidNodeMessage(
+ node, 'More inputs specified (%r) than the op expects.'
+ % (input_name,)))
+
+ if input_name in input_map:
+ # (b) Input should be replaced by a tensor from the caller.
+ source_tensor = input_map[input_name]
+ used_input_keys.add(input_name)
+
+ else:
+            # (c) Input should be taken from an op in `graph_def`.
+ operation_name, output_index = _ParseTensorName(input_name)
+ try:
+ source_op = name_to_op[operation_name]
+ source_tensor = source_op.values()[output_index]
+ except (KeyError, IndexError):
+ raise ValueError(
+ _InvalidNodeMessage(
+ node,
+ 'Input tensor %r not found in graph_def.'
+ % (input_name,)))
+
+ try:
+ # pylint: disable=protected-access
+ op._add_input(source_tensor, dtype=input_type)
+ # pylint: enable=protected-access
+ except TypeError as te:
+ raise ValueError(
+ _InvalidNodeMessage(node, 'Input tensor %r %s'
+ % (input_name, te.message)))
+
+      # pylint: disable=protected-access
+ if op._input_dtypes != input_types:
+ raise ValueError(
+ _InvalidNodeMessage(
+ node,
+ 'Input types mismatch (expected %r but got %r)'
+ % (", ".join(types_lib.as_dtype(x).name for x in input_types),
+ ", ".join(x.name for x in op._input_dtypes))))
+      # pylint: enable=protected-access
+
+ # Execute shape inference for this op.
+ # NOTE(mrry): If the graph contains a cycle, the full shape information
+ # may not be available for this op's inputs.
+ ops.set_shapes_for_outputs(op)
+
+ # Treat unused input mappings as an error, because they are likely to be
+ # due to a typo.
+ unused_input_keys = frozenset(input_map.keys()).difference(used_input_keys)
+ if unused_input_keys:
+ raise ValueError(
+ 'Attempted to map inputs that were not found in graph_def: [%s]'
+ % ', '.join(unused_input_keys))
+
+ if return_elements is None:
+ return None
+ else:
+ ret = []
+ for name in return_elements:
+ if ':' in name:
+ try:
+ operation_name, output_index = _ParseTensorName(name)
+ ret.append(name_to_op[operation_name].outputs[output_index])
+ except (ValueError, KeyError, IndexError):
+ raise ValueError(
+ 'Requested return_element %r not found in graph_def.' % name)
+ else:
+ try:
+ ret.append(name_to_op[name])
+ except KeyError:
+ raise ValueError(
+ 'Requested return_element %r not found in graph_def.' % name)
+ return ret
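+
+# A minimal round-trip sketch of the entry point above (exposed to users as
+# `tf.import_graph_def`; the tensor names "inp"/"out" are illustrative):
+#
+#   with tf.Graph().as_default() as g:
+#     inp = tf.placeholder(tf.float32, name="inp")
+#     out = tf.identity(inp, name="out")
+#     gdef = g.as_graph_def()
+#
+#   with tf.Graph().as_default():
+#     new_inp = tf.constant(1.0)
+#     out, = tf.import_graph_def(gdef,
+#                                input_map={"inp:0": new_inp},
+#                                return_elements=["out:0"])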
diff --git a/tensorflow/python/framework/importer_test.py b/tensorflow/python/framework/importer_test.py
new file mode 100644
index 0000000000..470092313a
--- /dev/null
+++ b/tensorflow/python/framework/importer_test.py
@@ -0,0 +1,546 @@
+"""Tests for tensorflow.python.framework.importer."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from google.protobuf import text_format
+
+from tensorflow.core.framework import op_def_pb2
+from tensorflow.python.framework import device
+from tensorflow.python.framework import op_def_registry
+
+
+_op_list = op_def_pb2.OpList()
+text_format.Merge("""
+ op {
+ name: 'None'
+ }
+ op {
+ name: 'Oi'
+ output_arg { name: 'a' type: DT_INT32 }
+ }
+ op {
+ name: 'Or'
+ output_arg { name: 'a' type: DT_INT32 is_ref: true }
+ }
+ op {
+ name: 'Of'
+ output_arg { name: 'a' type: DT_FLOAT }
+ }
+ op {
+ name: 'Ii'
+ input_arg { name: 'a' type: DT_INT32 }
+ }
+ op {
+ name: 'If'
+ input_arg { name: 'a' type: DT_FLOAT }
+ }
+ op {
+ name: 'Oii'
+ output_arg { name: 'a' type: DT_INT32 }
+ output_arg { name: 'b' type: DT_INT32 }
+ }
+ op {
+ name: 'Oif'
+ output_arg { name: 'a' type: DT_INT32 }
+ output_arg { name: 'b' type: DT_FLOAT }
+ }
+ op {
+ name: 'Iii'
+ input_arg { name: 'a' type: DT_INT32 }
+ input_arg { name: 'b' type: DT_INT32 }
+ }
+ op {
+ name: 'Iff'
+ input_arg { name: 'a' type: DT_FLOAT }
+ input_arg { name: 'b' type: DT_FLOAT }
+ }
+ op {
+ name: 'Iif'
+ input_arg { name: 'a' type: DT_INT32 }
+ input_arg { name: 'b' type: DT_FLOAT }
+ }
+ op {
+ name: 'Iri'
+ input_arg { name: 'a' type: DT_INT32 is_ref: true }
+ input_arg { name: 'b' type: DT_INT32 }
+ }
+ op {
+ name: 'In'
+ input_arg { name: 'a' number_attr: 'N' type_attr: 'T' }
+ attr { name: 'N' type: 'int' minimum: 1 }
+ attr { name: 'T' type: 'type' }
+ }
+ op {
+ name: 'Otl'
+ output_arg { name: 'a' type_list_attr: 't' }
+ attr { name: 'T' type: 'list(type)' minimum: 1 }
+ }
+ op {
+ name: 'Unary'
+ input_arg { name: 'a' type_attr: 'T' }
+ output_arg { name: 'b' type_attr: 'T' }
+ attr { name: 'T' type: 'type' }
+ }
+""", _op_list)
+op_def_registry.register_op_list(_op_list)
+# NOTE(mrry): Dummy shape registrations for ops used in the tests.
+for op_def in _op_list.op:
+ tf.RegisterShape(op_def.name)(None)
+
+class ImportGraphDefTest(tf.test.TestCase):
+
+ def _MakeGraphDef(self, text):
+ ret = tf.GraphDef()
+ text_format.Merge(text, ret)
+ return ret
+
+ def testBasic(self):
+ with tf.Graph().as_default():
+ a, b, c, d = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oif' }
+ node { name: 'B' op: 'Otl'
+ attr { key: 't'
+ value { list { type: DT_INT32 type: DT_FLOAT } } } }
+ node { name: 'C' op: 'In'
+ attr { key: 'N' value { i: 2 } }
+ attr { key: 'T' value { type: DT_INT32 } }
+ input: 'A:0' input: 'B:0' }
+ node { name: 'D' op: 'In'
+ attr { key: 'N' value { i: 2 } }
+ attr { key: 'T' value { type: DT_FLOAT } }
+ input: 'A:1' input: 'B:1' }
+ """),
+ return_elements=['A', 'B', 'C', 'D'],
+ name='import')
+
+ # Assert that the import process creates distinct tensors.
+ self.assertNotEqual(a.outputs[0].name, a.outputs[1].name)
+ self.assertNotEqual(b.outputs[0].name, b.outputs[1].name)
+ self.assertNotEqual(a.outputs[0].name, b.outputs[0].name)
+ self.assertNotEqual(a.outputs[0].name, b.outputs[1].name)
+ self.assertNotEqual(a.outputs[1].name, b.outputs[0].name)
+ self.assertNotEqual(a.outputs[1].name, b.outputs[1].name)
+
+ # Assert that the ops are connected according to the GraphDef topology.
+ self.assertEqual(c.inputs[0], a.outputs[0])
+ self.assertEqual(c.inputs[1], b.outputs[0])
+ self.assertEqual(d.inputs[0], a.outputs[1])
+ self.assertEqual(d.inputs[1], b.outputs[1])
+
+ # Check the types of the returned ops and tensors.
+ self.assertEqual(a.type, 'Oif')
+ self.assertEqual(b.type, 'Otl')
+ self.assertEqual(c.type, 'In')
+ self.assertEqual(d.type, 'In')
+ self.assertEqual(a.outputs[0].dtype, tf.int32)
+ self.assertEqual(a.outputs[1].dtype, tf.float32)
+ self.assertEqual(b.outputs[0].dtype, tf.int32)
+ self.assertEqual(b.outputs[1].dtype, tf.float32)
+
+ # Check the names of the returned ops.
+ self.assertEqual(a.name, 'import/A')
+ self.assertEqual(b.name, 'import/B')
+ self.assertEqual(c.name, 'import/C')
+ self.assertEqual(d.name, 'import/D')
+
+ def testInputMap(self):
+ with tf.Graph().as_default():
+ feed_a_0 = tf.constant(0, dtype=tf.int32)
+ feed_b_1 = tf.constant(1, dtype=tf.int32)
+
+ a, b, c, d = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oii' }
+ node { name: 'B' op: 'Oii' }
+ node { name: 'C' op: 'In'
+ attr { key: 'N' value { i: 2 } }
+ attr { key: 'T' value { type: DT_INT32 } }
+ input: 'A:0' input: 'B:0' }
+ node { name: 'D' op: 'In'
+ attr { key: 'N' value { i: 2 } }
+ attr { key: 'T' value { type: DT_INT32 } }
+ input: 'A:1' input: 'B:1' }
+ """),
+ input_map={'A:0': feed_a_0, 'B:1': feed_b_1},
+ return_elements=['A', 'B', 'C', 'D'])
+
+ self.assertEqual(c.inputs[0], feed_a_0)
+ self.assertEqual(c.inputs[1], b.outputs[0])
+ self.assertEqual(d.inputs[0], a.outputs[1])
+ self.assertEqual(d.inputs[1], feed_b_1)
+
+ def testImplicitZerothOutput(self):
+ with tf.Graph().as_default():
+ a, b = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oii' }
+ node { name: 'B' op: 'Ii' input: 'A' }
+ """),
+ return_elements=['A', 'B'])
+
+ self.assertEqual(b.inputs[0], a.outputs[0])
+
+ def testInputMapImplicitZerothOutput(self):
+ with tf.Graph().as_default():
+ feed_a_0 = tf.constant(0, dtype=tf.int32)
+ b, = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oii' }
+ node { name: 'B' op: 'Ii' input: 'A:0' }
+ """),
+ input_map={'A': feed_a_0},
+ return_elements=['B'])
+
+ self.assertEqual(b.inputs[0], feed_a_0)
+
+ def testWithControlDependency(self):
+ with tf.Graph().as_default():
+ a, b = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'None' }
+ node { name: 'B' op: 'None' input: '^A' }
+ """),
+ return_elements=['A', 'B'])
+
+ self.assertEqual(b.control_inputs, [a])
+
+ def testWithRefs(self):
+ with tf.Graph().as_default():
+ a, b, c, d = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Or' }
+ node { name: 'B' op: 'Oi' }
+ node { name: 'C' op: 'Iii' input: 'A:0' input: 'B:0' }
+ node { name: 'D' op: 'Iri' input: 'A:0' input: 'B:0' }
+ """),
+ return_elements=['A', 'B', 'C', 'D'])
+
+ self.assertEqual(c.inputs[0], a.outputs[0])
+ self.assertEqual(c.inputs[1], b.outputs[0])
+ self.assertEqual(d.inputs[0], a.outputs[0])
+ self.assertEqual(d.inputs[1], b.outputs[0])
+
+ self.assertEqual(a.outputs[0].dtype, tf.int32_ref)
+ self.assertEqual(c._input_dtypes, [tf.int32, tf.int32])
+ self.assertEqual(c.outputs, [])
+ self.assertEqual(d._input_dtypes,
+ [tf.int32_ref, tf.int32])
+ self.assertEqual(d.outputs, [])
+
+ def testCyclic(self):
+ with tf.Graph().as_default():
+ a, b = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Unary'
+ attr { key: 'T' value { type: DT_INT32 } } input: 'B:0' }
+ node { name: 'B' op: 'Unary'
+ attr { key: 'T' value { type: DT_INT32 } } input: 'A:0' }
+ """),
+ return_elements=['A', 'B'])
+
+ self.assertEqual(a.inputs[0], b.outputs[0])
+ self.assertEqual(b.inputs[0], a.outputs[0])
+
+ def testTypeMismatchInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ node { name: 'B' op: 'If' input: 'A:0' }
+ """))
+ self.assertTrue(
+ 'Cannot convert a tensor of type int32 to an input of type float' in
+ str(e.exception))
+
+ def testInvalidSignatureTooManyInputsInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ node { name: 'B' op: 'None' input: 'A:0' }
+ """))
+ self.assertTrue('More inputs specified (u\'A:0\') than the op expects' in
+ str(e.exception))
+
+ def testInvalidSignatureNotEnoughInputsInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ node { name: 'B' op: 'Iif' input: 'A:0' }
+ """))
+ self.assertTrue('Input types mismatch (expected \'int32, float32\' but '
+ 'got \'int32\')' in str(e.exception))
+
+ def testMissingInputOpInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'B' op: 'If' input: 'A:0' }
+ """))
+ self.assertTrue('Input tensor %r not found' % (u'A:0',) in
+ str(e.exception))
+
+ def testMissingInputOpInGraphDefButAppearsInInputMap(self):
+ with tf.Graph().as_default():
+ feed_a_0 = tf.constant(5.0)
+ b, = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'B' op: 'If' input: 'A:0' }
+ """),
+ input_map={'A:0': feed_a_0},
+ return_elements=['B'])
+ self.assertEqual(b.inputs[0], feed_a_0)
+
+ def testMissingInputTensorInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Of' }
+ node { name: 'B' op: 'If' input: 'A:1' }
+ """))
+ self.assertTrue('Input tensor %r not found' % (u'A:1',) in
+ str(e.exception))
+
+ def testMissingControlInputInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'B' op: 'None' input: '^A' }
+ """))
+ self.assertTrue('Control input %r not found' % (u'^A',) in
+ str(e.exception))
+
+ def testInvalidTensorNameOutputIndexInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'B' op: 'None' input: 'A:B' }
+ """))
+ self.assertEqual(
+ 'Cannot convert %r to a tensor name.' % (u'A:B',), str(e.exception))
+
+ def testInvalidTensorNameInGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'B' op: 'None' input: 'A:B:0' }
+ """))
+ self.assertEqual(
+ 'Cannot convert %r to a tensor name.' % (u'A:B:0',), str(e.exception))
+
+ def testMissingReturnOperation(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'None' }
+ """),
+ return_elements=['B'])
+ self.assertTrue('return_element %r not found in graph_def.' % ('B') in
+ str(e.exception))
+
+ def testMissingReturnTensor(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ """),
+ return_elements=['A:1'])
+ self.assertTrue('return_element %r not found in graph_def.' % ('A:1') in
+ str(e.exception))
+
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ """),
+ return_elements=['B:0'])
+ self.assertTrue('return_element %r not found in graph_def.' % ('B:0') in
+ str(e.exception))
+
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ """),
+ return_elements=['A:B:0'])
+ self.assertTrue('return_element %r not found in graph_def.' % ('A:B:0') in
+ str(e.exception))
+
+ def testMissingInputMap(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'None' }
+ """),
+ input_map={'B:0': tf.constant(5.0)})
+ self.assertTrue('not found in graph_def: [B:0]' in str(e.exception))
+
+ def testInputMapTypeMismatch(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(ValueError) as e:
+ tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'Oi' }
+ node { name: 'B' op: 'Ii' input: 'A:0' }
+ """),
+ input_map={'A:0': tf.constant(5.0)})
+ self.assertTrue(
+ 'Cannot convert a tensor of type float32 to an input of type int32.'
+ in str(e.exception))
+
+ def testNoReturns(self):
+ with tf.Graph().as_default() as g:
+ ret = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'None' }
+ """))
+ self.assertEqual(ret, None)
+
+ a = g.get_operation_by_name('import/A')
+ self.assertEqual(a.type, 'None')
+
+ def testOverrideNamePrefix(self):
+ with tf.Graph().as_default():
+ a, = tf.import_graph_def(
+ self._MakeGraphDef("""
+ node { name: 'A' op: 'None' }
+ """),
+ return_elements=['A'], name='imported_graph')
+ self.assertEqual(a.name, 'imported_graph/A')
+
+ def testEmptyGraph(self):
+ with tf.Graph().as_default() as g:
+ init_version = g.version
+ tf.import_graph_def(self._MakeGraphDef(''))
+ self.assertEqual(init_version, g.version)
+
+ def testInvalidInputForGraphDef(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(TypeError) as e:
+ tf.import_graph_def('')
+ self.assertEqual(
+ 'graph_def must be a GraphDef proto.', str(e.exception))
+
+ def testInvalidInputForInputMap(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(TypeError) as e:
+ tf.import_graph_def(self._MakeGraphDef(''),
+ input_map=[tf.constant(5.0)])
+ self.assertEqual('input_map must be a dictionary mapping strings to '
+ 'Tensor objects.', str(e.exception))
+
+ def testInvalidInputForReturnOperations(self):
+ with tf.Graph().as_default():
+ with self.assertRaises(TypeError) as e:
+ tf.import_graph_def(self._MakeGraphDef(''), return_elements=[7])
+ self.assertEqual(
+ 'return_elements must be a list of strings.', str(e.exception))
+
+ def testWithExtensionAndAttr(self):
+ with tf.Graph().as_default() as g:
+ c = tf.constant(5.0, dtype=tf.float32, name='c')
+ tf.pack([c, c], name='pack')
+ gdef = g.as_graph_def()
+
+ with self.test_session():
+ pack, = tf.import_graph_def(gdef, return_elements=['pack'])
+ self.assertAllEqual(pack.outputs[0].eval(), [5.0, 5.0])
+
+ def testWithDevice(self):
+ with tf.Graph().as_default() as g:
+ # No device.
+ a = tf.constant(3.0, name='a')
+
+ with tf.device('/cpu:0'):
+ b = tf.constant(4.0, name='b')
+ with tf.device('/job:worker'):
+ c = tf.constant(5.0, name='c')
+
+ gdef = g.as_graph_def()
+
+ with tf.Graph().as_default():
+ a2, b2, c2 = tf.import_graph_def(
+ gdef, return_elements=['a', 'b', 'c'])
+ self.assertEqual(a.device, a2.device)
+ self.assertEqual(b.device, b2.device)
+ self.assertEqual(c.device, c2.device)
+
+ with tf.Graph().as_default():
+ with tf.device(device.merge_device('/task:0')):
+ a3, b3, c3 = tf.import_graph_def(
+ gdef, return_elements=['a', 'b', 'c'])
+ self.assertEqual('/task:0', a3.device)
+ self.assertEqual('/task:0/device:CPU:0', b3.device) # canonicalized.
+ self.assertEqual(c.device + '/task:0', c3.device)
+
+ with tf.Graph().as_default():
+ with tf.device(device.merge_device('/job:ps')):
+ a4, b4, c4 = tf.import_graph_def(
+ gdef, return_elements=['a', 'b', 'c'])
+ self.assertEqual('/job:ps', a4.device)
+ self.assertEqual('/job:ps/device:CPU:0', b4.device) # canonicalized.
+ self.assertEqual(c.device, c4.device) # worker overrides ps.
+
+ with tf.Graph().as_default():
+ with tf.device(device.merge_device('/gpu:0')):
+ a5, b5, c5 = tf.import_graph_def(
+ gdef, return_elements=['a', 'b', 'c'])
+ self.assertEqual('/device:GPU:0', a5.device)
+ self.assertEqual('/device:CPU:0', b5.device) # cpu overrides gpu.
+ self.assertEqual(c.device + '/device:GPU:0', c5.device)
+
+ def testGradient(self):
+ with tf.Graph().as_default() as g:
+ inputs = tf.placeholder(tf.float32, shape=[None, 100], name="input")
+ weights = tf.placeholder(tf.float32, shape=[100, 10], name="weights")
+ biases = tf.placeholder(tf.float32, shape=[10], name="biases")
+ activations = tf.nn.relu(tf.matmul(inputs, weights) + biases,
+ name="activations")
+ loss = tf.reduce_mean(activations, name="loss")
+ gdef = g.as_graph_def()
+
+ with tf.Graph().as_default() as g:
+ input_placeholder = tf.placeholder(tf.float32, shape=[32, 100])
+ weights_var = tf.Variable(tf.truncated_normal([100, 10]), name="weights")
+ biases_var = tf.Variable(tf.zeros(10), name="biases")
+ activations, loss = tf.import_graph_def(
+ gdef,
+ input_map={"input:0": input_placeholder,
+ "weights:0": weights_var,
+ "biases:0": biases_var},
+ return_elements=["activations:0", "loss:0"])
+ self.assertEqual([32, 10], activations.get_shape())
+ self.assertEqual([], loss.get_shape())
+ weights_grad, biases_grad = tf.gradients(loss, [weights_var, biases_var])
+ self.assertEqual([100, 10], weights_grad.get_shape())
+ self.assertEqual([10], biases_grad.get_shape())
+
+ def testLargeGraph(self):
+ with self.test_session():
+      # The default message byte limit is 64M. Ours is 2G with a warning at
+      # 512M. Adding a 150M-entry float32 tensor should blow through the
+      # warning, but not the hard limit.
+ input_shape = [150, 1024, 1024]
+ tensor_input = tf.np.random.rand(*input_shape).astype(tf.np.float32)
+ t = tf.constant(tensor_input, shape=input_shape)
+ g = tf.identity(t)
+ g.eval()
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/framework/op_def_registry.py b/tensorflow/python/framework/op_def_registry.py
new file mode 100644
index 0000000000..2ec8c94a10
--- /dev/null
+++ b/tensorflow/python/framework/op_def_registry.py
@@ -0,0 +1,23 @@
+"""Global registry for OpDefs."""
+
+from tensorflow.core.framework import op_def_pb2
+
+
+_registered_ops = {}
+
+
+def register_op_list(op_list):
+ """Register all the ops in an op_def_pb2.OpList."""
+ if not isinstance(op_list, op_def_pb2.OpList):
+ raise TypeError("%s is %s, not an op_def_pb2.OpList" %
+ (op_list, type(op_list)))
+ for op_def in op_list.op:
+ if op_def.name in _registered_ops:
+ assert _registered_ops[op_def.name] == op_def
+ else:
+ _registered_ops[op_def.name] = op_def
+
+
+def get_registered_ops():
+ """Returns a dictionary mapping names to OpDefs."""
+ return _registered_ops
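+
+# A minimal usage sketch (assumes the protobuf text_format module is
+# available, as in the importer tests; 'MyExampleOp' is a hypothetical name):
+#
+#   from google.protobuf import text_format
+#   example_ops = op_def_pb2.OpList()
+#   text_format.Merge("op { name: 'MyExampleOp' }", example_ops)
+#   register_op_list(example_ops)
+#   assert "MyExampleOp" in get_registered_ops()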
diff --git a/tensorflow/python/framework/ops.py b/tensorflow/python/framework/ops.py
new file mode 100644
index 0000000000..0b0442cea1
--- /dev/null
+++ b/tensorflow/python/framework/ops.py
@@ -0,0 +1,2985 @@
+"""Classes and functions used to construct graphs."""
+# pylint: disable=g-bad-name
+import collections
+import contextlib
+import copy
+import linecache
+import re
+import sys
+import threading
+import weakref
+
+import tensorflow.python.platform
+
+from tensorflow.core.framework import attr_value_pb2
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.framework import device as pydev
+from tensorflow.python.framework import registry
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+
+
+def _convert_stack(stack):
+ """Converts a stack extracted using _extract_stack() to a traceback stack.
+
+ Args:
+ stack: A list of n 4-tuples, (filename, lineno, name, frame_globals).
+
+ Returns:
+ A list of n 4-tuples (filename, lineno, name, code), where the code tuple
+ element is calculated from the corresponding elements of the input tuple.
+ """
+ ret = []
+ for filename, lineno, name, frame_globals in stack:
+ linecache.checkcache(filename)
+ line = linecache.getline(filename, lineno, frame_globals)
+ if line:
+ line = line.strip()
+ else:
+ line = None
+ ret.append((filename, lineno, name, line))
+ return ret
+
+
+# pylint: disable=line-too-long
+def _extract_stack():
+ """A lightweight re-implementation of traceback.extract_stack.
+
+ NOTE(mrry): traceback.extract_stack eagerly retrieves the line of code for
+ each stack frame using linecache, which results in an abundance of stat()
+ calls. This implementation does not retrieve the code, and any consumer
+ should apply _convert_stack to the result to obtain a traceback that can
+ be formatted etc. using traceback methods.
+
+ Returns:
+ A list of 4-tuples (filename, lineno, name, frame_globals) corresponding to
+ the call stack of the current thread.
+ """
+ # pylint: enable=line-too-long
+ try:
+ raise ZeroDivisionError
+ except ZeroDivisionError:
+ f = sys.exc_info()[2].tb_frame.f_back
+ ret = []
+ while f is not None:
+ lineno = f.f_lineno
+ co = f.f_code
+ filename = co.co_filename
+ name = co.co_name
+ frame_globals = f.f_globals
+ ret.append((filename, lineno, name, frame_globals))
+ f = f.f_back
+ ret.reverse()
+ return ret
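+
+# A minimal usage sketch of the two helpers above: capture a cheap stack at
+# graph-construction time and only pay the linecache cost later, when a
+# traceback actually has to be formatted.
+#
+#   captured = _extract_stack()        # at op-creation time
+#   frames = _convert_stack(captured)  # when rendering an error message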
+
+
+class Tensor(object):
+ """Represents a value produced by an `Operation`.
+
+ A `Tensor` is a symbolic handle to one of the outputs of an
+ `Operation`. It does not hold the values of that operation's output,
+ but instead provides a means of computing those values in a
+ TensorFlow [`Session`](client.md#Session).
+
+ This class has two primary purposes:
+
+ 1. A `Tensor` can be passed as an input to another `Operation`.
+ This builds a dataflow connection between operations, which
+ enables TensorFlow to execute an entire `Graph` that represents a
+ large, multi-step computation.
+
+ 2. After the graph has been launched in a session, the value of the
+ `Tensor` can be computed by passing it to
+ [`Session.run()`](client.md#Session.run).
+ `t.eval()` is a shortcut for calling
+ `tf.get_default_session().run(t)`.
+
+ In the following example, `c`, `d`, and `e` are symbolic `Tensor`
+ objects, whereas `result` is a numpy array that stores a concrete
+ value:
+
+ ```python
+ # Build a dataflow graph.
+ c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
+ d = tf.constant([[1.0, 1.0], [0.0, 1.0]])
+ e = tf.matmul(c, d)
+
+  # Construct a `Session` to execute the graph.
+ sess = tf.Session()
+
+ # Execute the graph and store the value that `e` represents in `result`.
+ result = sess.run(e)
+ ```
+
+ @@dtype
+ @@name
+ @@value_index
+ @@graph
+ @@op
+ @@consumers
+
+ @@eval
+
+ @@get_shape
+ @@set_shape
+
+ """
+
+ # List of Python operators that we allow to override.
+ OVERLOADABLE_OPERATORS = {
+ # Binary.
+ "__add__", "__radd__",
+ "__sub__", "__rsub__",
+ "__mul__", "__rmul__",
+ "__div__", "__rdiv__",
+ "__truediv__", "__rtruediv__",
+ "__mod__", "__rmod__",
+ "__lt__", "__le__",
+ "__gt__", "__ge__",
+ "__and__", "__rand__",
+ "__or__", "__ror__",
+ "__xor__", "__rxor__",
+ "__getitem__",
+ # Unary.
+ "__invert__",
+ "__neg__", "__abs__"}
+
+ def __init__(self, op, value_index, dtype):
+ """Creates a new `Tensor`.
+
+ Args:
+ op: An `Operation`. `Operation` that computes this tensor.
+ value_index: An `int`. Index of the operation's endpoint that produces
+ this tensor.
+ dtype: A `types.DType`. Type of data stored in this tensor.
+
+ Raises:
+ TypeError: If the op is not an `Operation`.
+ """
+ if not isinstance(op, Operation):
+ raise TypeError("op needs to be an Operation: %s" % op)
+ self._op = op
+ self._value_index = value_index
+ self._dtype = types.as_dtype(dtype)
+ self._shape = tensor_shape.unknown_shape()
+ # List of operations that use this Tensor as input. We maintain this list
+ # to easily navigate a computation graph.
+ self._consumers = []
+
+ @property
+ def op(self):
+ """The `Operation` that produces this tensor as an output."""
+ return self._op
+
+ @property
+ def dtype(self):
+ """The `DType` of elements in this tensor."""
+ return self._dtype
+
+ @property
+ def graph(self):
+ """The `Graph` that contains this tensor."""
+ return self._op.graph
+
+ @property
+ def name(self):
+ """The string name of this tensor."""
+ if not self._op.name:
+ raise ValueError("Operation was not named: %s" % self._op)
+ return "%s:%d" % (self._op.name, self._value_index)
+
+ @property
+ def device(self):
+ """The name of the device on which this tensor will be produced, or None."""
+ return self._op.device
+
+ def _shape_as_list(self):
+ if self._shape.ndims is not None:
+ return [dim.value for dim in self._shape.dims]
+ else:
+ return None
+
+ def get_shape(self):
+ """Returns the `TensorShape` that represents the shape of this tensor.
+
+ The shape is computed using shape inference functions that are
+ registered for each `Operation` type using `tf.RegisterShape`.
+ See [`TensorShape`](framework.md#TensorShape) for more details of what a shape
+ represents.
+
+ The inferred shape of a tensor is used to provide shape
+ information without having to launch the graph in a session. This
+ can be used for debugging, and providing early error messages. For
+ example:
+
+ ```python
+ c = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+
+ print c.get_shape()
+ ==> TensorShape([Dimension(2), Dimension(3)])
+
+ d = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
+
+ print d.get_shape()
+ ==> TensorShape([Dimension(4), Dimension(2)])
+
+ # Raises a ValueError, because `c` and `d` do not have compatible
+ # inner dimensions.
+ e = tf.matmul(c, d)
+
+ f = tf.matmul(c, d, transpose_a=True, transpose_b=True)
+
+ print f.get_shape()
+ ==> TensorShape([Dimension(3), Dimension(4)])
+ ```
+
+ In some cases, the inferred shape may have unknown dimensions. If
+ the caller has additional information about the values of these
+ dimensions, `Tensor.set_shape()` can be used to augment the
+ inferred shape.
+
+ Returns:
+ A `TensorShape` representing the shape of this tensor.
+ """
+ return self._shape
+
+ def set_shape(self, shape):
+ """Updates the shape of this tensor.
+
+ This method can be called multiple times, and will merge the given
+ `shape` with the current shape of this tensor. It can be used to
+ provide additional information about the shape of this tensor that
+ cannot be inferred from the graph alone. For example, this can be used
+ to provide additional information about the shapes of images:
+
+ ```python
+ _, image_data = tf.TFRecordReader(...).read(...)
+ image = tf.image.decode_png(image_data, channels=3)
+
+ # The height and width dimensions of `image` are data dependent, and
+ # cannot be computed without executing the op.
+ print image.get_shape()
+ ==> TensorShape([Dimension(None), Dimension(None), Dimension(3)])
+
+ # We know that each image in this dataset is 28 x 28 pixels.
+ image.set_shape([28, 28, 3])
+ print image.get_shape()
+ ==> TensorShape([Dimension(28), Dimension(28), Dimension(3)])
+ ```
+
+ Args:
+ shape: A `TensorShape` representing the shape of this tensor.
+
+ Raises:
+ ValueError: If `shape` is not compatible with the current shape of
+ this tensor.
+ """
+ self._shape = self._shape.merge_with(shape)
+
+ @property
+ def value_index(self):
+ """The index of this tensor in the outputs of its `Operation`."""
+ return self._value_index
+
+ def consumers(self):
+ """Returns a list of `Operation`s that consume this tensor.
+
+ Returns:
+ A list of `Operation`s.
+ """
+ return self._consumers
+
+ def _add_consumer(self, consumer):
+ """Add a consumer to this tensor.
+
+ Args:
+ consumer: an Operation.
+
+ Raises:
+ TypeError: if the consumer is not an Operation.
+ """
+ if not isinstance(consumer, Operation):
+ raise TypeError("Consumer must be an Operation: %s" % consumer)
+ self._consumers.append(consumer)
+
+ def _as_node_def_input(self):
+ """Return a value to use for the NodeDef "input" attribute.
+
+ The returned string can be used in a NodeDef "input" attribute
+ to indicate that the NodeDef uses this Tensor as input.
+
+ Raises:
+ ValueError: if this Tensor's Operation does not have a name.
+
+ Returns:
+ a string.
+ """
+ if not self._op.name:
+ raise ValueError("Operation was not named: %s" % self._op)
+ if self._value_index == 0:
+ return self._op.name
+ else:
+ return "%s:%d" % (self._op.name, self._value_index)
+
+ def __str__(self):
+ return "Tensor(\"%s\"%s%s%s)" % (
+ self.name,
+ (", shape=%s" % self.get_shape())
+ if self.get_shape().ndims is not None else "",
+ (", dtype=%s" % self._dtype.name) if self._dtype else "",
+ (", device=%s" % self.device) if self.device else "")
+
+ def __hash__(self):
+ # Necessary to support Python's collection membership operators
+ return id(self)
+
+ def __eq__(self, other):
+ # Necessary to support Python's collection membership operators
+ return id(self) == id(other)
+
+ # NOTE(mrry): This enables the Tensor's overloaded "right" binary
+ # operators to run when the left operand is an ndarray, because it
+ # accords the Tensor class higher priority than an ndarray, or a
+ # numpy matrix.
+ # TODO(mrry): Convert this to using numpy's __numpy_ufunc__
+ # mechanism, which allows more control over how Tensors interact
+ # with ndarrays.
+ __array_priority__ = 100
+
+ @staticmethod
+ def _override_operator(operator, func):
+ """Overrides (string) operator on Tensors to call func.
+
+ Args:
+ operator: the string name of the operator to override.
+      func: the function that replaces the overridden operator.
+
+ Raises:
+ ValueError: If operator has already been overwritten,
+ or if operator is not allowed to be overwritten.
+ """
+ if getattr(Tensor, operator, None) is not None:
+ # check to see if this is a default method-wrapper which will be true
+ # for the comparison operators.
+ if not isinstance(getattr(Tensor, operator, None), type(all.__call__)):
+ raise ValueError("operator %s cannot be overwritten again." % operator)
+ if operator not in Tensor.OVERLOADABLE_OPERATORS:
+ raise ValueError("Overriding %s is disallowed" % operator)
+ setattr(Tensor, operator, func)
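+
+  # A minimal sketch of how an op library might hook an overload in through
+  # the helper above (the `add` wrapper named here is illustrative, not a
+  # function defined in this module):
+  #
+  #   def _add(x, y):
+  #     return add(x, y)
+  #   Tensor._override_operator("__add__", _add)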
+
+ def __iter__(self):
+ """Dummy method to prevent iteration. Do not call.
+
+ NOTE(mrry): If we register __getitem__ as an overloaded operator,
+ Python will valiantly attempt to iterate over the Tensor from 0 to
+ infinity. Declaring this method prevents this unintended
+ behavior.
+
+ Raises:
+ TypeError: when invoked.
+ """
+ raise TypeError("'Tensor' object is not iterable")
+
+ def eval(self, feed_dict=None, session=None):
+ """Evaluates this tensor in a `Session`.
+
+ Calling this method will execute all preceding operations that
+ produce the inputs needed for the operation that produces this
+ tensor.
+
+ *N.B.* Before invoking `Tensor.eval()`, its graph must have been
+ launched in a session, and either a default session must be
+ available, or `session` must be specified explicitly.
+
+ Args:
+ feed_dict: A dictionary that maps `Tensor` objects to feed values.
+ See [`Session.run()`](client.md#Session.run) for a description of
+ the valid feed values.
+ session: (Optional.) The `Session` to be used to evaluate this tensor. If
+ none, the default session will be used.
+
+ Returns:
+ A numpy array corresponding to the value of this tensor.
+
+ """
+ return _eval_using_default_session(self, feed_dict, self.graph, session)
+
+
+def _TensorTensorConversionFunction(t, dtype=None, name=None):
+ _ = name
+ if dtype and not dtype.is_compatible_with(t.dtype):
+ raise ValueError(
+ "Tensor conversion requested dtype %s for Tensor with dtype %s: %r"
+ % (dtype.name, t.dtype.name, str(t)))
+ return t
+
+
+_tensor_conversion_func_registry = {
+ 0: [(Tensor, _TensorTensorConversionFunction)]}
+
+
+def convert_to_tensor(value, dtype=None, name=None):
+ """Converts the given `value` to a `Tensor`.
+
+ This function converts Python objects of various types to `Tensor`
+ objects. It accepts `Tensor` objects, numpy arrays, Python lists,
+ and Python scalars. For example:
+
+ ```python
+ import numpy as np
+  array = np.random.rand(32, 100, 100)
+
+  def my_func(arg):
+    arg = tf.convert_to_tensor(arg, dtype=tf.float32)
+    return tf.matmul(arg, arg) + arg
+
+  # The following calls are equivalent.
+  value_1 = my_func(tf.constant([[1.0, 2.0], [3.0, 4.0]]))
+  value_2 = my_func([[1.0, 2.0], [3.0, 4.0]])
+  value_3 = my_func(np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32))
+ ```
+
+ This function can be useful when composing a new operation in Python
+ (such as `my_func` in the example above). All standard Python op
+ constructors apply this function to each of their Tensor-valued
+ inputs, which allows those ops to accept numpy arrays, Python lists,
+ and scalars in addition to `Tensor` objects.
+
+ Args:
+ value: An object whose type has a registered `Tensor` conversion function.
+ dtype: Optional element type for the returned tensor. If missing, the
+ type is inferred from the type of `value`.
+ name: Optional name to use if a new `Tensor` is created.
+
+ Returns:
+ A `Tensor` based on `value`.
+
+ Raises:
+ TypeError: If no conversion function is registered for `value`.
+ RuntimeError: If a registered conversion function returns an invalid value.
+
+ """
+ error_prefix = "" if name is None else "%s: " % name
+ if dtype is not None:
+ dtype = types.as_dtype(dtype)
+ for _, funcs_at_priority in sorted(_tensor_conversion_func_registry.items()):
+ for base_type, conversion_func in funcs_at_priority:
+ if isinstance(value, base_type):
+ ret = conversion_func(value, dtype=dtype, name=name)
+ if not isinstance(ret, Tensor):
+ raise RuntimeError(
+ "%sConversion function %r for type %s returned non-Tensor: %r"
+ % (error_prefix, conversion_func, base_type, ret))
+ if dtype and not dtype.is_compatible_with(ret.dtype):
+ raise RuntimeError(
+ "%sConversion function %r for type %s returned incompatible "
+ "dtype: requested = %s, actual = %s"
+ % (error_prefix, conversion_func, base_type,
+ dtype.name, ret.dtype.name))
+ return ret
+ raise TypeError("%sCannot convert %r with type %s to Tensor: "
+ "no conversion function registered."
+ % (error_prefix, value, type(value)))
+
+
+def convert_to_tensor_or_indexed_slices(value, dtype=None, name=None):
+ """Converts the given object to a `Tensor` or an `IndexedSlices`.
+
+ If `value` is an `IndexedSlices` it is returned
+ unmodified. Otherwise, it is converted to a `Tensor` using
+ `convert_to_tensor()`.
+
+ Args:
+ value: An `IndexedSlices` or an object that can be consumed by
+ `convert_to_tensor()`.
+ dtype: (Optional.) The required `DType` of the returned `Tensor` or
+ `IndexedSlices`.
+ name: (Optional.) A name to use if a new `Tensor` is created.
+
+ Returns:
+    A `Tensor` or an `IndexedSlices` based on `value`.
+
+ Raises:
+ ValueError: If `dtype` does not match the element type of `value`.
+ """
+ if isinstance(value, IndexedSlices):
+    if dtype and not types.as_dtype(dtype).is_compatible_with(value.dtype):
+      raise ValueError(
+          "Tensor conversion requested dtype %s for Tensor with dtype %s: %r"
+          % (types.as_dtype(dtype).name, value.dtype.name, str(value)))
+ return value
+ else:
+ return convert_to_tensor(value, dtype, name)
+
+
+def convert_n_to_tensor_or_indexed_slices(values, dtype=None, name=None):
+ """Converts `values` to a list of `Tensor` or `IndexedSlices` objects.
+
+ Args:
+ values: A list of `None`, `IndexedSlices`, or objects that can be consumed
+ by `convert_to_tensor()`.
+    dtype: (Optional.) The required `DType` of the returned `Tensor` or
+      `IndexedSlices`.
+    name: (Optional.) A name prefix to use when a new `Tensor` is
+      created, in which case element `i` will be given the name
+      `name + '_' + i`.
+
+ Returns:
+ A list of `Tensor` and/or `IndexedSlices` objects.
+
+ Raises:
+ TypeError: If no conversion function is registered for an element in
+ `values`.
+ RuntimeError: If a registered conversion function returns an invalid
+ value.
+ """
+ if not isinstance(values, collections.Sequence):
+ raise TypeError("values must be a list.")
+ ret = []
+ for i, value in enumerate(values):
+ if value is None:
+ ret.append(value)
+ else:
+ n = None if name is None else "%s_%d" % (name, i)
+ ret.append(
+ convert_to_tensor_or_indexed_slices(value, dtype=dtype, name=n))
+ return ret
+
+
+def register_tensor_conversion_function(base_type, conversion_func,
+ priority=100):
+ """Registers a function for converting objects of base_type to Tensor.
+
+ The conversion function must have the following signature:
+
+ def conversion_func(value, dtype=None, name=None):
+ # ...
+
+ It must return a Tensor with the given dtype if specified. If the
+ conversion function creates a new Tensor, it should use the given
+ name if specified. All exceptions will be propagated to the caller.
+
+ NOTE: The conversion functions will execute in order of priority,
+ followed by order of registration. To ensure that a conversion
+ function F runs before another conversion function G, ensure that
+ F is registered with a smaller priority than G.
+
+ Args:
+ base_type: The base type or tuple of base types for all objects that
+ `conversion_func` accepts.
+ conversion_func: A function that converts instances of base_type to Tensor.
+ priority: Optional integer that indicates the priority for applying this
+ conversion function. Conversion functions with smaller priority values
+ run earlier than conversion functions with larger priority values.
+ Defaults to 100.
+
+ Raises:
+ TypeError: If the arguments do not have the appropriate type.
+
+ """
+ if not (isinstance(base_type, type) or
+ (isinstance(base_type, tuple)
+ and all(isinstance(x, type) for x in base_type))):
+ raise TypeError("base_type must be a type or a tuple of types.")
+ if not callable(conversion_func):
+ raise TypeError("conversion_func must be callable.")
+
+ try:
+ funcs_at_priority = _tensor_conversion_func_registry[priority]
+ except KeyError:
+ funcs_at_priority = []
+ _tensor_conversion_func_registry[priority] = funcs_at_priority
+ funcs_at_priority.append((base_type, conversion_func))
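+
+# A minimal sketch of registering a conversion function for a hypothetical
+# wrapper class (`_TensorWrapper` and its `tensor` attribute are illustrative,
+# not defined in this module):
+#
+#   class _TensorWrapper(object):
+#     def __init__(self, tensor):
+#       self.tensor = tensor
+#
+#   def _wrapper_to_tensor(value, dtype=None, name=None):
+#     return convert_to_tensor(value.tensor, dtype=dtype, name=name)
+#
+#   register_tensor_conversion_function(_TensorWrapper, _wrapper_to_tensor)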
+
+
+class IndexedSlices(object):
+ """A sparse representation of a set of tensor slices at given indices.
+
+ This class is a simple wrapper for a pair of `Tensor` objects:
+
+ * `values`: A `Tensor` of any dtype with shape `[D0, D1, ..., Dn]`.
+ * `indices`: A 1-D integer `Tensor` with shape `[D0]`.
+
+ An `IndexedSlices` is typically used to represent a subset of a larger
+ tensor `dense` of shape `[LARGE0, D1, .. , DN]` where `LARGE0 >> D0`.
+ The values in `indices` are the indices in the first dimension of
+ the slices that have been extracted from the larger tensor.
+
+ The dense tensor `dense` represented by an `IndexedSlices` `slices` has
+
+ ```python
+ dense[slices.indices[i], :, :, :, ...] = slices.values[i, :, :, :, ...]
+ ```
+
+ The `IndexedSlices` class is used principally in the definition of
+ gradients for operations that have sparse gradients
+ (e.g. [`tf.gather`](array_ops.md#gather)).
+
+ Contrast this representation with
+ [`SparseTensor`](sparse_ops.md#SparseTensor),
+ which uses multi-dimensional indices and scalar values.
+
+ @@__init__
+
+ @@values
+ @@indices
+ @@dense_shape
+
+ @@name
+ @@dtype
+ @@device
+ @@op
+ """
+
+ def __init__(self, values, indices, dense_shape=None):
+ """Creates an `IndexedSlices`."""
+ self._values = values
+ self._indices = indices
+ self._dense_shape = dense_shape
+
+ @property
+ def values(self):
+ """A `Tensor` containing the values of the slices."""
+ return self._values
+
+ @property
+ def indices(self):
+ """A 1-D `Tensor` containing the indices of the slices."""
+ return self._indices
+
+ @property
+ def dense_shape(self):
+ """A 1-D `Tensor` containing the shape of the corresponding dense tensor."""
+ return self._dense_shape
+
+ @property
+ def name(self):
+ """The name of this `IndexedSlices`."""
+ return self.values.name
+
+ @property
+ def device(self):
+ """The name of the device on which `values` will be produced, or `None`."""
+ return self.values.device
+
+ @property
+ def op(self):
+ """The `Operation` that produces `values` as an output."""
+ return self.values.op
+
+ @property
+ def dtype(self):
+ """The `DType` of elements in this tensor."""
+ return self.values.dtype
+
+ def __str__(self):
+ return "IndexedSlices(indices=%s, values=%s)" % (
+ self._indices, self._values)
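+
+# A minimal sketch of the correspondence described in the class docstring
+# (the literal values are illustrative): an IndexedSlices whose components are
+#
+#   values      = [[1., 2.], [3., 4.]]   # shape [2, 2]
+#   indices     = [0, 2]
+#   dense_shape = [4, 2]
+#
+# stands for the dense tensor
+#
+#   [[1., 2.],
+#    [0., 0.],
+#    [3., 4.],
+#    [0., 0.]]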
+
+
+def assert_same_graph(items, expected_graph=None):
+ """Asserts all items are from the same graph.
+
+ Args:
+ items: List of graph items (e.g., Variable, Tensor, SparseTensor,
+ Operation, or IndexedSlices).
+ expected_graph: Expected graph. If not specified, assert all tensors are
+ from the same graph.
+ Returns:
+ items, for chaining.
+ Raises:
+ ValueError: If any graphs do not match.
+ """
+ for item in items:
+ if not expected_graph:
+ expected_graph = item.graph
+ elif expected_graph != item.graph:
+ raise ValueError("Items must be from the same graph.")
+ return items
+
+
+class SparseTensor(object):
+ """Represents a sparse tensor.
+
+  TensorFlow represents a sparse tensor as three separate dense tensors:
+ `indices`, `values`, and `dense_shape`. In Python, the three tensors are
+ collected into a `SparseTensor` class for ease of use. If you have separate
+ `indices`, `values`, and `dense_shape` tensors, wrap them in a `SparseTensor`
+ object before passing to the Ops below.
+
+  Concretely, the sparse tensor `SparseTensor(indices, values, dense_shape)` is
+
+ * `indices`: A 2-D int64 tensor of shape `[N, ndims]`.
+ * `values`: A 1-D tensor of any type and shape `[N]`.
+ * `dense_shape`: A 1-D int64 tensor of shape `[ndims]`.
+
+  where `N` and `ndims` are the number of values and the number of dimensions
+  in the `SparseTensor`, respectively.
+
+ The corresponding dense tensor satisfies
+
+ ```python
+ dense.shape = dense_shape
+ dense[tuple(indices[i])] = values[i]
+ ```
+
+ By convention, `indices` should be sorted in row-major order (or equivalently
+  lexicographic order on the tuples `indices[i]`). This is not enforced when
+ `SparseTensor` objects are constructed, but most Ops assume correct ordering.
+ If the ordering is wrong, it can be fixed by calling `sparse_reorder` on the
+ misordered `SparseTensor`.
+
+ Example: The sparse tensor
+
+ ```python
+ SparseTensor(values=[1, 2], indices=[[0, 0], [1, 2]], shape=[3, 4])
+ ```
+
+ represents the dense tensor
+
+ ```python
+ [[1, 0, 0, 0]
+ [0, 0, 2, 0]
+ [0, 0, 0, 0]]
+ ```
+
+ @@__init__
+ @@indices
+ @@values
+ @@dtype
+ @@shape
+ @@graph
+ """
+
+ def __init__(self, indices, values, shape):
+ """Creates a `SparseTensor`.
+
+ Args:
+ indices: A 2-D int64 tensor of shape `[N, ndims]`.
+ values: A 1-D tensor of any type and shape `[N]`.
+      shape: A 1-D int64 tensor of shape `[ndims]`.
+
+ Returns:
+ A `SparseTensor`
+ """
+ with op_scope([indices, values, shape], None, "SparseTensor"):
+ indices = convert_to_tensor(indices, name="indices")
+ values = convert_to_tensor(values, name="values")
+ shape = convert_to_tensor(shape, name="shape")
+ self._indices = indices
+ self._values = values
+ self._shape = shape
+
+ indices_shape = indices.get_shape().with_rank(2)
+ values_shape = values.get_shape().with_rank(1)
+ shape_shape = shape.get_shape().with_rank(1)
+
+ # Assert number of rows in indices match the number of elements in values.
+ indices_shape[0].merge_with(values_shape[0])
+ # Assert number of columns in indices matches the number of elements in
+ # shape.
+ indices_shape[1].merge_with(shape_shape[0])
+
+ @property
+ def indices(self):
+ """The indices of non-zero values in the represented dense tensor.
+
+ Returns:
+ A 2-D Tensor of int64 with shape `[N, ndims]`, where `N` is the
+ number of non-zero values in the tensor, and `ndims` is the rank.
+ """
+ return self._indices
+
+ @property
+ def values(self):
+ """The non-zero values in the represented dense tensor.
+
+ Returns:
+ A 1-D Tensor of any data type.
+ """
+ return self._values
+
+ @property
+ def dtype(self):
+ """The `DType` of elements in this tensor."""
+ return self._values.dtype
+
+ @property
+ def shape(self):
+ """A 1-D Tensor of int64 representing the shape of the dense tensor."""
+ return self._shape
+
+ @property
+ def graph(self):
+ """The `Graph` that contains the index, value, and shape tensors."""
+ return self._indices.graph
+
+ def __str__(self):
+ return "SparseTensor(indices=%s, values=%s, shape=%s)" % (
+ self._indices, self._values, self._shape)
+
+
+SparseTensorValue = collections.namedtuple("SparseTensorValue",
+ ["indices", "values", "shape"])
+
+
+def _device_string(dev_spec):
+ if isinstance(dev_spec, pydev.Device):
+ return dev_spec.to_string()
+ else:
+ return dev_spec
+
+
+def _NodeDef(op_type, name, device=None, attrs=None):
+ """Create a NodeDef proto.
+
+ Args:
+ op_type: Value for the "op" attribute of the NodeDef proto.
+ name: Value for the "name" attribute of the NodeDef proto.
+ device: string, device, or function from NodeDef to string.
+ Value for the "device" attribute of the NodeDef proto.
+    attrs: optional dict mapping attr names to `AttrValue` protos for the "attr" field of the NodeDef proto.
+
+ Returns:
+ A graph_pb2.NodeDef protocol buffer.
+ """
+ node_def = graph_pb2.NodeDef()
+ node_def.op = str(op_type)
+ node_def.name = str(name)
+ if attrs is not None:
+ for k, v in attrs.iteritems():
+ node_def.attr[k].CopyFrom(v)
+ if device is not None:
+ if callable(device):
+ node_def.device = device(node_def)
+ else:
+ node_def.device = _device_string(device)
+ return node_def
+
+
+# Copied from core/framework/node_def_util.cc
+# TODO(mrry,josh11b): Consolidate this validation in C++ code.
+_VALID_OP_NAME_REGEX = re.compile("[A-Za-z0-9.][A-Za-z0-9_.\\-/]*")
+
+
+class Operation(object):
+ """Represents a graph node that performs computation on tensors.
+
+ An `Operation` is a node in a TensorFlow `Graph` that takes zero or
+ more `Tensor` objects as input, and produces zero or more `Tensor`
+ objects as output. Objects of type `Operation` are created by
+ calling a Python op constructor (such as [`tf.matmul()`](math_ops.md#matmul))
+ or [`Graph.create_op()`](framework.md#Graph.create_op).
+
+ For example `c = tf.matmul(a, b)` creates an `Operation` of type
+ "MatMul" that takes tensors `a` and `b` as input, and produces `c`
+ as output.
+
+ After the graph has been launched in a session, an `Operation` can
+ be executed by passing it to [`Session.run()`](client.md#Session.run).
+ `op.run()` is a shortcut for calling `tf.get_default_session().run(op)`.
+
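+  For illustration, a small sketch of inspecting an op (assuming tensors `a`
+  and `b` already exist in the default graph):
+
+  ```python
+  c = tf.matmul(a, b)
+  op = c.op
+  print(op.type)                      # "MatMul"
+  print(op.name)                      # e.g. "MatMul" or "MatMul_1"
+  print([t.name for t in op.inputs])  # The names of `a` and `b`.
+  print(op.outputs[0] is c)           # True
+  ```
+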
+ @@name
+ @@type
+ @@inputs
+ @@control_inputs
+ @@outputs
+ @@device
+ @@graph
+
+ @@run
+
+ @@get_attr
+ @@traceback
+ """
+
+ def __init__(self, node_def, g, inputs=None, output_types=None,
+ control_inputs=None, input_types=None, original_op=None,
+ op_def=None):
+ """Creates an `Operation`.
+
+ NOTE: This constructor validates the name of the Operation (passed
+ as "node_def.name"). Valid Operation names match the following
+ regular expression:
+
+ [A-Za-z0-9.][A-Za-z0-9_.\\-/]*
+
+ Args:
+ node_def: graph_pb2.NodeDef. NodeDef for the Operation.
+ Used for attributes of graph_pb2.NodeDef, typically "name",
+ "op", and "device". The "input" attribute is irrelevant here
+ as it will be computed when generating the model.
+ g: Graph. The parent graph.
+ inputs: list of Tensor objects. The inputs to this Operation.
+ output_types: list of types_pb2.DataType. List of the types of the
+ Tensors computed by this operation. The length of this list indicates
+ the number of output endpoints of the Operation.
+ control_inputs: list of operations or tensors from which to have a
+ control dependency.
+ input_types: List of types_pb2.DataType representing the
+ types of the Tensors accepted by the Operation. By default
+ uses [x.dtype.base_dtype for x in inputs]. Operations that expect
+ reference-typed inputs must specify these explicitly.
+ original_op: Optional. Used to associate the new Operation with an
+ existing Operation (for example, a replica with the op that was
+ replicated).
+ op_def: Optional. The op_def_pb2.OpDef proto that describes the
+ op type that this Operation represents.
+
+ Raises:
+ TypeError: if control inputs are not Operations or Tensors,
+ or if node_def is not a NodeDef,
+ or if g is not a Graph,
+ or if inputs are not Tensors,
+ or if inputs and input_types are incompatible.
+ ValueError: if the node_def name is not valid.
+ """
+ if not isinstance(node_def, graph_pb2.NodeDef):
+ raise TypeError("node_def needs to be a NodeDef: %s" % node_def)
+ if node_def.ByteSize() >= (1 << 31) or node_def.ByteSize() < 0:
+ raise ValueError(
+ "Cannot create an Operation with a NodeDef larger than 2GB.")
+ if not _VALID_OP_NAME_REGEX.match(node_def.name):
+ raise ValueError("'%s' is not a valid node name" % node_def.name)
+ if not isinstance(g, Graph):
+ raise TypeError("g needs to be a Graph: %s" % g)
+ self._node_def = copy.deepcopy(node_def)
+ self._graph = g
+ if inputs is None:
+ inputs = []
+ self._inputs = inputs
+ for a in self._inputs:
+ if not isinstance(a, Tensor):
+ raise TypeError("input needs to be a Tensor: %s" % a)
+ # Mark that we consume the inputs.
+ a._add_consumer(self) # pylint: disable=protected-access
+ if output_types is None:
+ output_types = []
+ self._output_types = output_types
+ self._outputs = [Tensor(self, i, output_types[i])
+ for i in xrange(len(output_types))]
+ if input_types is None:
+ input_types = [i.dtype.base_dtype for i in self._inputs]
+ else:
+ if not all(x.is_compatible_with(i.dtype)
+ for i, x in zip(self._inputs, input_types)):
+ raise TypeError("Inputs are not compatible with input types")
+ self._input_types = input_types
+
+ # Build the list of control inputs.
+ self._control_inputs = []
+ if control_inputs:
+ for c in control_inputs:
+ c_op = None
+ if isinstance(c, Operation):
+ c_op = c
+ elif isinstance(c, (Tensor, IndexedSlices)):
+ c_op = c.op
+ else:
+ raise TypeError("Control input must be an Operation, "
+ "a Tensor, or IndexedSlices: %s" % c)
+ self._control_inputs.append(c_op)
+
+ self._original_op = original_op
+ self._op_def = op_def
+ self._traceback = _extract_stack()
+ # Add this op to the current control flow context:
+ self._control_flow_context = g._get_control_flow_context()
+ if g._get_control_flow_context() is not None:
+ g._get_control_flow_context().AddOp(self)
+ # NOTE(keveman): Control flow context's AddOp could be creating new ops and
+ # setting op.inputs[index] = new_op. Thus the new ops' id could be larger
+    # than this op's id even though this op depends on them. Therefore, we delay
+    # assigning an id to this op until all the ops it could depend on have been
+    # created.
+ self._id_value = self._graph._next_id() # pylint: disable=protected-access
+ self._recompute_node_def()
+
+ def values(self):
+ """DEPRECATED: Use outputs."""
+ return tuple(self.outputs)
+
+ def _get_control_flow_context(self):
+ """Returns the current control flow context.
+
+ Returns:
+ A context object.
+ """
+ return self._control_flow_context
+
+ @property
+ def name(self):
+ """The full name of this operation."""
+ return self._node_def.name
+
+ @property
+ def _id(self):
+ """The unique integer id of this operation."""
+ return self._id_value
+
+ @property
+ def device(self):
+ """The name of the device to which this op has been assigned, if any.
+
+ Returns:
+ The string name of the device to which this op has been
+ assigned, or None if it has not been assigned to a device.
+ """
+ dev = self._node_def.device
+ return None if not dev else dev
+
+ def _set_device(self, device):
+ """Set the device of this operation.
+
+ Args:
+      device: string or device. The device to set.
+ """
+ self._node_def.device = _device_string(device)
+
+ def _add_input(self, tensor, dtype=None):
+ """Add a new input to this operation.
+
+ Args:
+ tensor: the Tensor to add as an input.
+ dtype: types.DType: type of the input; defaults to
+ the tensor's dtype.
+
+ Raises:
+ TypeError: if tensor is not a Tensor,
+ or if input tensor type is not convertible to dtype.
+ ValueError: if the Tensor is from a different graph.
+ """
+ if not isinstance(tensor, Tensor):
+ raise TypeError("tensor must be a Tensor: %s" % tensor)
+ assert_same_graph([self, tensor])
+ if dtype is None:
+ dtype = tensor.dtype
+ else:
+ dtype = types.as_dtype(dtype)
+ if not dtype.is_compatible_with(tensor.dtype):
+ raise TypeError(
+ "Cannot convert a tensor of type %s to an input of type %s"
+ % (tensor.dtype.name, dtype.name))
+ self._inputs.append(tensor)
+ self._input_types.append(dtype)
+ tensor._add_consumer(self) # pylint: disable=protected-access
+ self._recompute_node_def()
+
+ def _update_input(self, index, tensor, dtype=None):
+ """Update the input to this operation at the given index.
+
+ NOTE: This is for TF internal use only. Please don't use it.
+
+ Args:
+ index: the index of the input to update.
+ tensor: the Tensor to be used as the input at the given index.
+ dtype: types.DType: type of the input; defaults to
+ the tensor's dtype.
+
+ Raises:
+ TypeError: if tensor is not a Tensor,
+ or if input tensor type is not convertible to dtype.
+ ValueError: if the Tensor is from a different graph.
+ """
+ if not isinstance(tensor, Tensor):
+ raise TypeError("tensor must be a Tensor: %s" % tensor)
+ assert_same_graph([self, tensor])
+ if dtype is None:
+ dtype = tensor.dtype
+ else:
+ dtype = types.as_dtype(dtype)
+ if not dtype.is_compatible_with(tensor.dtype):
+ raise TypeError(
+ "Cannot convert a tensor of type %s to an input of type %s"
+ % (tensor.dtype.name, dtype.name))
+
+ self._inputs[index].consumers().remove(self)
+ self._inputs[index] = tensor
+ self._input_types[index] = dtype
+ tensor._add_consumer(self) # pylint: disable=protected-access
+ self._recompute_node_def()
+
+ def _add_control_input(self, op):
+ """Add a new control input to this operation.
+
+ Args:
+ op: the Operation to add as control input.
+
+ Raises:
+ TypeError: if op is not an Operation.
+ ValueError: if op is from a different graph.
+ """
+ if not isinstance(op, Operation):
+ raise TypeError("op must be an Operation: %s" % op)
+ assert_same_graph([self, op])
+ self._control_inputs.append(op)
+ self._recompute_node_def()
+
+ # Methods below are used when building the NodeDef and Graph proto.
+ def _recompute_node_def(self):
+ del self._node_def.input[:]
+ self._node_def.input.extend([t._as_node_def_input() for t in self._inputs])
+ if self._control_inputs:
+ self._node_def.input.extend(["^%s" % op.name for op in
+ self._control_inputs])
+
+ def __str__(self):
+ return str(self._node_def)
+
+ @property
+ def outputs(self):
+ """The list of `Tensor` objects representing the outputs of this op."""
+ return self._outputs
+
+# pylint: disable=protected-access
+ class _InputList(object):
+ """Immutable input list wrapper."""
+
+ def __init__(self, op):
+ self._op = op
+
+ def __iter__(self):
+ return iter(self._op._inputs)
+
+ def __len__(self):
+ return len(self._op._inputs)
+
+ def __bool__(self):
+ return bool(self._op._inputs)
+
+ def __getitem__(self, i):
+ return self._op._inputs[i]
+# pylint: enable=protected-access
+
+ @property
+ def inputs(self):
+ """The list of `Tensor` objects representing the data inputs of this op."""
+ return Operation._InputList(self)
+
+ @property
+ def _input_dtypes(self):
+ return self._input_types
+
+ @property
+ def control_inputs(self):
+ """The `Operation` objects on which this op has a control dependency.
+
+ Before this op is executed, TensorFlow will ensure that the
+ operations in `self.control_inputs` have finished executing. This
+ mechanism can be used to run ops sequentially for performance
+ reasons, or to ensure that the side effects of an op are observed
+ in the correct order.
+
+ Returns:
+ A list of `Operation` objects.
+
+ """
+ return self._control_inputs
+
+ @property
+ def type(self):
+ """The type of the op (e.g. `"MatMul"`)."""
+ return self._node_def.op
+
+ @property
+ def graph(self):
+ """The `Graph` that contains this operation."""
+ return self._graph
+
+ @property
+ def node_def(self):
+ """Returns a serialized `NodeDef` representation of this operation.
+
+ Returns:
+ A
+ [`NodeDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+ protocol buffer.
+ """
+ return self._node_def
+
+ @property
+ def op_def(self):
+ """Returns the `OpDef` proto that represents the type of this op.
+
+ Returns:
+ An
+ [`OpDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/op_def.proto)
+ protocol buffer.
+ """
+ return self._op_def
+
+ @property
+ def traceback(self):
+ """Returns the call stack from when this operation was constructed."""
+ return _convert_stack(self._traceback)
+
+ def get_attr(self, name):
+ """Returns the value of the attr of this op with the given `name`.
+
+ Args:
+ name: The name of the attr to fetch.
+
+ Returns:
+ The value of the attr, as a Python object.
+
+ Raises:
+ ValueError: If this op does not have an attr with the given `name`.
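+
+    For example, a small sketch using the `Const` op created by `tf.constant()`:
+
+    ```python
+    c = tf.constant(1.0, name="c")
+    # The "dtype" attr of a Const op, as a `DataType` enum value.
+    print(c.op.get_attr("dtype"))
+    ```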
+ """
+ fields = ["s", "i", "f", "b", "type", "shape", "tensor"]
+ if name not in self._node_def.attr:
+ raise ValueError("No attr named '" + name + "' in " +
+ str(self._node_def))
+ x = self._node_def.attr[name]
+ # Treat an empty oneof value as an empty list.
+ if not x.WhichOneof("value"):
+ return []
+ if x.HasField("list"):
+ for f in fields:
+ if getattr(x.list, f):
+ return list(getattr(x.list, f))
+ return []
+ else:
+ for f in fields:
+ if x.HasField(f):
+ return getattr(x, f)
+ assert False, "Unsupported field type in " + str(x)
+
+ def run(self, feed_dict=None, session=None):
+ """Runs this operation in a `Session`.
+
+ Calling this method will execute all preceding operations that
+ produce the inputs needed for this operation.
+
+ *N.B.* Before invoking `Operation.run()`, its graph must have been
+ launched in a session, and either a default session must be
+ available, or `session` must be specified explicitly.
+
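+    For example, a minimal sketch with an explicit session:
+
+    ```python
+    sess = tf.Session()
+    c = tf.constant(42.0)
+    c.op.run(session=sess)  # Executes the Const op that produces `c`.
+    sess.close()
+    ```
+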
+ Args:
+ feed_dict: A dictionary that maps `Tensor` objects to feed values.
+ See [`Session.run()`](client.md#Session.run) for a description of the
+ valid feed values.
+      session: (Optional.) The `Session` to be used to run this operation. If
+ none, the default session will be used.
+ """
+ _run_using_default_session(self, feed_dict, self.graph, session)
+
+
+_gradient_registry = registry.Registry("gradient")
+
+
+class RegisterGradient(object):
+ """A decorator for registering the gradient function for an op type.
+
+ This decorator is only used when defining a new op type. For an op
+  with `m` inputs and `n` outputs, the gradient function is a function
+ that takes the original `Operation` and `n` `Tensor` objects
+ (representing the gradients with respect to each output of the op),
+ and returns `m` `Tensor` objects (representing the partial gradients
+ with respect to each input of the op).
+
+ For example, assuming that operations of type `"Sub"` take two
+ inputs `x` and `y`, and return a single output `x - y`, the
+ following gradient function would be registered:
+
+ ```python
+ @tf.RegisterGradient("Sub")
+ def _sub_grad(unused_op, grad):
+    return grad, tf.neg(grad)
+ ```
+
+ The decorator argument `op_type` is the string type of an
+ operation. This corresponds to the `OpDef.name` field for the proto
+ that defines the operation.
+
+ @@__init__
+ """
+
+ def __init__(self, op_type):
+ """Creates a new decorator with `op_type` as the Operation type.
+
+ Args:
+ op_type: The string type of an operation. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+ """
+ if not isinstance(op_type, basestring):
+ raise TypeError("op_type must be a string")
+ self._op_type = op_type
+
+ def __call__(self, f):
+ """Registers the function `f` as gradient function for `op_type`."""
+ _gradient_registry.register(f, self._op_type)
+ return f
+
+
+def NoGradient(op_type):
+ """Specifies that ops of type `op_type` do not have a defined gradient.
+
+ This function is only used when defining a new op type. It may be
+ used for ops such as `tf.size()` that are not differentiable. For
+ example:
+
+ ```python
+ tf.NoGradient("Size")
+ ```
+
+ Args:
+ op_type: The string type of an operation. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+
+ Raises:
+ TypeError: If `op_type` is not a string.
+
+ """
+ if not isinstance(op_type, basestring):
+ raise TypeError("op_type must be a string")
+ _gradient_registry.register(None, op_type)
+
+
+def get_gradient_function(op):
+ """Returns the function that computes gradients for "op"."""
+ if not op.inputs: return None
+ try:
+ op_type = op.get_attr("_gradient_op_type")
+ except ValueError:
+ op_type = op.type
+ return _gradient_registry.lookup(op_type)
+
+
+_shape_registry = registry.Registry("shape functions")
+_default_shape_function_registry = registry.Registry("default shape functions")
+
+class RegisterShape(object):
+ """A decorator for registering the shape function for an op type.
+
+ This decorator is only used when defining a new op type. A shape
+ function is a function from an `Operation` object to a list of
+ `TensorShape` objects, with one `TensorShape` for each output of the
+ operation.
+
+ For example, assuming that operations of type `"Sub"` take two
+ inputs `x` and `y`, and return a single output `x - y`, all with the
+ same shape, the following shape function would be registered:
+
+ ```python
+ @tf.RegisterShape("Sub")
+ def _sub_shape(op):
+ return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
+ ```
+
+ The decorator argument `op_type` is the string type of an
+ operation. This corresponds to the `OpDef.name` field for the proto
+ that defines the operation.
+
+ """
+
+ def __init__(self, op_type):
+ """Saves the "op_type" as the Operation type."""
+ if not isinstance(op_type, basestring):
+ raise TypeError("op_type must be a string")
+ self._op_type = op_type
+
+ def __call__(self, f):
+ """Registers "f" as the shape function for "op_type"."""
+ if f is None:
+ # None is a special "weak" value that provides a default shape function,
+ # and can be overridden by a non-None registration.
+ try:
+ _default_shape_function_registry.register(_no_shape_function,
+ self._op_type)
+ except KeyError:
+ # Ignore duplicate registrations of the weak value. This can
+ # occur if the op library input to wrapper generation
+ # inadvertently links in one or more of the standard op
+ # libraries.
+ pass
+ else:
+ _shape_registry.register(f, self._op_type)
+ return f
+
+
+def _no_shape_function(op):
+ return [tensor_shape.unknown_shape() for _ in op.outputs]
+
+
+def set_shapes_for_outputs(op):
+ """Uses the registered shape functions to set the shapes for op's outputs."""
+ try:
+ shape_func = _shape_registry.lookup(op.type)
+ except LookupError:
+ try:
+ shape_func = _default_shape_function_registry.lookup(op.type)
+ except LookupError:
+ raise RuntimeError("No shape function registered for standard op: %s"
+ % op.type)
+ shapes = shape_func(op)
+ if len(op.outputs) != len(shapes):
+    raise RuntimeError(
+        "Shape function for op %s returned %d shapes but expected %d" %
+        (op, len(shapes), len(op.outputs)))
+ for output, s in zip(op.outputs, shapes):
+ output.set_shape(s)
+
+
+class Graph(object):
+ """A TensorFlow computation, represented as a dataflow graph.
+
+ A `Graph` contains a set of [`Operation`](framework.md#Operation) objects,
+ which represent units of computation; and [`Tensor`](framework.md#Tensor)
+ objects, which represent the units of data that flow between operations.
+
+ A default `Graph` is always registered, and accessible by calling
+ [`tf.get_default_graph()`](framework.md#get_default_graph). To add an
+ operation to the default graph, simply call one of the functions that defines
+ a new `Operation`:
+
+ ```
+ c = tf.constant(4.0)
+ assert c.graph is tf.get_default_graph()
+ ```
+
+ Another typical usage involves the
+ [`Graph.as_default()`](framework.md#Graph.as_default)
+ context manager, which overrides the current default graph for the
+ lifetime of the context:
+
+ ```python
+ g = tf.Graph()
+ with g.as_default():
+ # Define operations and tensors in `g`.
+ c = tf.constant(30.0)
+ assert c.graph is g
+ ```
+
+ Important note: This class *is not* thread-safe for graph construction. All
+ operations should be created from a single thread, or external
+ synchronization must be provided. Unless otherwise specified, all methods
+ are not thread-safe.
+
+ @@__init__
+ @@as_default
+ @@as_graph_def
+ @@finalize
+ @@finalized
+
+ @@control_dependencies
+ @@device
+ @@name_scope
+
+ A `Graph` instance supports an arbitrary number of "collections"
+ that are identified by name. For convenience when building a large
+ graph, collections can store groups of related objects: for
+  example, the `tf.Variable` class uses a collection (named
+ [`tf.GraphKeys.VARIABLES`](framework.md#GraphKeys)) for all variables that are
+ created during the construction of a graph. The caller may define
+ additional collections by specifying a new name.
+
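+  For example, a small sketch using a custom collection name:
+
+  ```python
+  g = tf.Graph()
+  with g.as_default():
+    c = tf.constant(1.0, name="c")
+  g.add_to_collection("my_ops", c.op)
+  assert g.get_collection("my_ops") == [c.op]
+  ```
+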
+ @@add_to_collection
+ @@get_collection
+
+ @@as_graph_element
+ @@get_operation_by_name
+ @@get_tensor_by_name
+ @@get_operations
+
+ @@get_default_device
+ @@seed
+ @@unique_name
+ @@version
+
+ @@create_op
+ @@gradient_override_map
+ """
+
+ def __init__(self):
+ """Creates a new, empty Graph."""
+ self._nodes_by_id = dict()
+ self._next_node_id = [dict()]
+ self._next_id_counter = 0
+ self._nodes_by_name = dict()
+ # Current name stack: a pair of uniquified names and plain names.
+ self._name_stack = ("", "")
+ # Maps a name used in the graph to the next id to use for that name.
+ self._names_in_use = {}
+ # Default device applied to new ops.
+ self._default_device = None
+ # Functions that will be applied to choose a device if none is specified.
+ self._device_function_stack = []
+ # Default original_op applied to new ops.
+ self._default_original_op = None
+ # Current control flow context. It could be either CondContext or
+ # WhileContext defined in ops/control_flow_ops.py
+ self._control_flow_context = None
+    # A new node will depend on the union of all of the nodes in the stack.
+ self._control_dependencies_stack = []
+    # Arbitrary collections of objects.
+ self._collections = {}
+ # The graph-level random seed
+ self._seed = None
+ # A map from op type to the kernel label that should be used.
+ self._op_to_kernel_label_map = {}
+ # A map from op type to an alternative op type that should be used when
+ # computing gradients.
+ self._gradient_override_map = {}
+ # True if the graph is considered "finalized". In that case no
+ # new operations can be added.
+ self._finalized = False
+
+ def _check_not_finalized(self):
+ """Check if the graph is finalized.
+
+ Raises:
+      RuntimeError: If the graph is finalized.
+ """
+ if self._finalized:
+ raise RuntimeError("Graph is finalized and cannot be modified.")
+
+ def _add_op(self, op):
+ """Adds 'op' to the graph.
+
+ Args:
+ op: the Operator or Tensor to add.
+
+ Raises:
+ TypeError: if op is not an Operation or Tensor.
+ ValueError: if the op.name or op._id are already used.
+ """
+ self._check_not_finalized()
+ if not isinstance(op, (Tensor, Operation)):
+ raise TypeError("op must be a Tensor or Operation: %s" % op)
+
+ if op._id in self._nodes_by_id:
+ raise ValueError("cannot add an op with id %d as it already "
+ "exists in the graph" % op._id)
+ if op.name in self._nodes_by_name:
+ raise ValueError("cannot add op with name %s as that name "
+ "is already used" % op.name)
+ self._nodes_by_id[op._id] = op
+ self._nodes_by_name[op.name] = op
+
+ @property
+ def version(self):
+ """Returns a version number that increases as ops are added to the graph."""
+ return self._next_id_counter
+
+ @property
+ def seed(self):
+ return self._seed
+
+ @seed.setter
+ def seed(self, seed):
+ self._seed = seed
+
+ @property
+ def finalized(self):
+ """True if this graph has been finalized."""
+ return self._finalized
+
+ def finalize(self):
+ """Finalizes this graph, making it read-only.
+
+ After calling `g.finalize()`, no new operations can be added to
+ `g`. This method is used to ensure that no operations are added
+ to a graph when it is shared between multiple threads, for example
+ when using a [`QueueRunner`](train.md#QueueRunner).
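+
+    For example, a small sketch:
+
+    ```python
+    g = tf.Graph()
+    with g.as_default():
+      c = tf.constant(1.0)
+    g.finalize()
+    assert g.finalized
+    # Adding further ops to `g` would now raise a RuntimeError.
+    ```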
+ """
+ self._finalized = True
+
+ def _get_control_flow_context(self):
+ """Returns the current control flow context.
+
+ Returns:
+ A context object.
+ """
+ return self._control_flow_context
+
+ def _set_control_flow_context(self, context):
+ """Sets the current control flow context.
+
+ Args:
+ context: a context object.
+ """
+ self._control_flow_context = context
+
+ def as_graph_def(self, from_version=None):
+ """Returns a serialized `GraphDef` representation of this graph.
+
+ This method is thread-safe.
+
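+    For example, a small sketch of the `from_version` filtering:
+
+    ```python
+    g = tf.Graph()
+    with g.as_default():
+      tf.constant(1.0)
+      version_after_first = g.version
+      tf.constant(2.0)
+    # Contains only the node added after `version_after_first` was read.
+    partial = g.as_graph_def(from_version=version_after_first)
+    assert len(partial.node) == 1
+    ```
+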
+ Args:
+ from_version: Optional. If this is set, returns a `GraphDef`
+ containing only the nodes that were added to this graph since
+ its `version` property had the given value.
+
+ Returns:
+ A
+ [`GraphDef`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/graph.proto)
+ protocol buffer.
+ """
+ graph = graph_pb2.GraphDef()
+ bytesize = 0
+ for op_id in sorted(self._nodes_by_id):
+ op = self._nodes_by_id[op_id]
+ if from_version is None or op_id > from_version:
+ graph.node.extend([op.node_def])
+ bytesize += op.node_def.ByteSize()
+ if bytesize >= (1 << 31) or bytesize < 0:
+ raise ValueError("GraphDef cannot be larger than 2GB.")
+ return graph
+
+ # Helper functions to create operations.
+ def create_op(self, op_type, inputs, dtypes,
+ input_types=None, name=None, attrs=None, op_def=None,
+ compute_shapes=True):
+ """Creates an `Operation` in this graph.
+
+ This is a low-level interface for creating an `Operation`. Most
+ programs will not call this method directly, and instead use the
+ Python op constructors, such as `tf.constant()`, which add ops to
+ the default graph.
+
+ Args:
+ op_type: The `Operation` type to create. This corresponds to the
+ `OpDef.name` field for the proto that defines the operation.
+ inputs: A list of `Tensor` objects that will be inputs to the `Operation`.
+ dtypes: A list of `DType` objects that will be the types of the tensors
+ that the operation produces.
+ input_types: (Optional.) A list of `DType`s that will be the types of
+ the tensors that the operation consumes. By default, uses the base
+ `DType` of each input in `inputs`. Operations that expect
+ reference-typed inputs must specify `input_types` explicitly.
+ name: (Optional.) A string name for the operation. If not specified, a
+ name is generated based on `op_type`.
+ attrs: (Optional.) A list of `AttrValue` protos for the `attr` field of
+ the `NodeDef` proto that will represent the operation.
+ op_def: (Optional.) The `OpDef` proto that describes the `op_type` that
+ the operation will have.
+ compute_shapes: (Optional.) If True, shape inference will be performed
+ to compute the shapes of the outputs.
+
+ Raises:
+ TypeError: if any of the inputs is not a `Tensor`.
+
+ Returns:
+ An `Operation` object.
+
+ """
+ self._check_not_finalized()
+ for idx, a in enumerate(inputs):
+ if not isinstance(a, Tensor):
+ raise TypeError("Input #%d is not a tensor: %s" % (idx, a))
+ if name is None:
+ name = op_type
+    # If a name ends with a '/', it is a "name scope" and we use it as-is,
+ # after removing the trailing '/'.
+ if name and name[-1] == "/":
+ name = name[:-1]
+ else:
+ name = self.unique_name(name)
+
+ node_def = _NodeDef(
+ op_type, name, device=self._default_device or None, attrs=attrs)
+
+ # Apply a kernel label if one has been specified for this op_type.
+ try:
+ kernel_label = self._op_to_kernel_label_map[op_type]
+ node_def.attr["_kernel"].CopyFrom(
+ attr_value_pb2.AttrValue(s=kernel_label))
+ except KeyError:
+ pass
+
+ # Apply the overriding op_type for gradients if one has been
+ # specified for this op_type.
+ try:
+ mapped_op_type = self._gradient_override_map[op_type]
+ node_def.attr["_gradient_op_type"].CopyFrom(
+ attr_value_pb2.AttrValue(s=mapped_op_type))
+ except KeyError:
+ pass
+
+ control_inputs = self._control_dependencies_for_inputs(inputs)
+ ret = Operation(node_def, self, inputs=inputs, output_types=dtypes,
+ control_inputs=control_inputs, input_types=input_types,
+ original_op=self._default_original_op, op_def=op_def)
+ if compute_shapes:
+ set_shapes_for_outputs(ret)
+ self._add_op(ret)
+ self._record_op_seen_by_control_dependencies(ret)
+ # Apply any device functions in reverse order, so that the most recently
+ # pushed function has the first chance to apply a device to the op.
+ # We apply here because the result can depend on the Operation's
+ # signature, which is computed in the Operation constructor.
+ for device_function in reversed(self._device_function_stack):
+ ret._set_device(device_function(ret))
+ return ret
+
+ def as_graph_element(self, obj, allow_tensor=True, allow_operation=True):
+ """Returns the object referred to by `obj`, as an `Operation` or `Tensor`.
+
+ This function validates that `obj` represents an element of this
+ graph, and gives an informative error message if it is not.
+
+ This function is the canonical way to get/validate an object of
+ one of the allowed types from an external argument reference in the
+ Session API.
+
+ This method may be called concurrently from multiple threads.
+
+ Args:
+ obj: A `Tensor`, an `Operation`, or the name of a tensor or operation.
+ Can also be any object with an `_as_graph_element()` method that returns
+ a value of one of these types.
+ allow_tensor: If true, `obj` may refer to a `Tensor`.
+ allow_operation: If true, `obj` may refer to an `Operation`.
+
+ Returns:
+ The `Tensor` or `Operation` in the Graph corresponding to `obj`.
+
+ Raises:
+      TypeError: If `obj` is not of a type that can be converted to a
+        `Tensor` or an `Operation`.
+ ValueError: If `obj` is of an appropriate type but invalid. For
+ example, an invalid string.
+ KeyError: If `obj` is not an object in the graph.
+ """
+
+ # The vast majority of this function is figuring
+ # out what an API user might be doing wrong, so
+ # that we can give helpful error messages.
+ #
+ # Ideally, it would be nice to split it up, but we
+ # need context to generate nice error messages.
+
+ if allow_tensor and allow_operation:
+ types_str = "Tensor or Operation"
+ elif allow_tensor:
+ types_str = "Tensor"
+ elif allow_operation:
+ types_str = "Operation"
+ else:
+ raise ValueError("allow_tensor and allow_operation can't both be False.")
+
+ conv_fn = getattr(obj, "_as_graph_element", None)
+ if conv_fn and callable(conv_fn):
+ obj = conv_fn()
+
+ # If obj appears to be a name...
+ if isinstance(obj, basestring):
+ name = obj
+
+ if ":" in name and allow_tensor:
+ # Looks like a Tensor name and can be a Tensor.
+ try:
+ op_name, out_n = name.split(":")
+ out_n = int(out_n)
+ except:
+ raise ValueError("The name %s looks a like a Tensor name, but is "
+ "not a valid one. Tensor names must be of the "
+ "form \"<op_name>:<output_index>\"." % repr(name))
+ if op_name in self._nodes_by_name:
+ op = self._nodes_by_name[op_name]
+ else:
+ raise KeyError("The name %s refers to a Tensor which does not "
+ "exist. The operation, %s, does not exist in the "
+ "graph." % (repr(name), repr(op_name)))
+ try:
+ return op.outputs[out_n]
+ except:
+ raise KeyError("The name %s refers to a Tensor which does not "
+ "exist. The operation, %s, exists but only has "
+ "%s outputs."
+ % (repr(name), repr(op_name), len(op.outputs)))
+
+ elif ":" in name and not allow_tensor:
+ # Looks like a Tensor name but can't be a Tensor.
+ raise ValueError("Name %s appears to refer to a Tensor, not a %s."
+ % (repr(name), types_str))
+
+ elif ":" not in name and allow_operation:
+ # Looks like an Operation name and can be an Operation.
+ if name not in self._nodes_by_name:
+ raise KeyError("The name %s refers to an Operation not in the "
+ "graph." % repr(name))
+ return self._nodes_by_name[name]
+
+ elif ":" not in name and not allow_operation:
+ # Looks like an Operation name but can't be an Operation.
+ if name in self._nodes_by_name:
+ # Yep, it's an Operation name
+ err_msg = ("The name %s refers to an Operation, not a %s."
+ % (repr(name), types_str))
+ else:
+ err_msg = ("The name %s looks like an (invalid) Operation name, "
+ "not a %s." % (repr(name), types_str))
+ err_msg += (" Tensor names must be of the form "
+ "\"<op_name>:<output_index>\".")
+ raise ValueError(err_msg)
+
+ elif isinstance(obj, Tensor) and allow_tensor:
+ # Actually obj is just the object it's referring to.
+ return obj
+ elif isinstance(obj, Operation) and allow_operation:
+ # Actually obj is just the object it's referring to.
+ return obj
+ else:
+ # We give up!
+ raise TypeError("Can not convert a %s into a %s."
+ % (type(obj).__name__, types_str))
+
+ def get_operations(self):
+ """Return the list of operations in the graph.
+
+    You can modify the operations in place, but modifications to the list,
+    such as inserts and deletes, have no effect on the list of operations
+    known to the graph.
+
+ This method may be called concurrently from multiple threads.
+
+ Returns:
+ A list of Operations.
+ """
+ return self._nodes_by_id.values()
+
+ def get_operation_by_name(self, name):
+ """Returns the `Operation` with the given `name`.
+
+ This method may be called concurrently from multiple threads.
+
+ Args:
+ name: The name of the `Operation` to return.
+
+ Returns:
+ The `Operation` with the given `name`.
+
+ Raises:
+ TypeError: If `name` is not a string.
+ KeyError: If `name` does not correspond to an operation in this graph.
+ """
+
+ if not isinstance(name, basestring):
+ raise TypeError("Operation names are strings (or similar), not %s."
+ % type(name).__name__)
+ return self.as_graph_element(name, allow_tensor=False, allow_operation=True)
+
+ def get_tensor_by_name(self, name):
+ """Returns the `Tensor` with the given `name`.
+
+ This method may be called concurrently from multiple threads.
+
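+    For example, a small sketch (tensor names take the form
+    `"<op_name>:<output_index>"`):
+
+    ```python
+    g = tf.Graph()
+    with g.as_default():
+      c = tf.constant(1.0, name="c")
+    assert g.get_tensor_by_name("c:0") is c
+    assert g.get_operation_by_name("c") is c.op
+    ```
+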
+ Args:
+ name: The name of the `Tensor` to return.
+
+ Returns:
+ The `Tensor` with the given `name`.
+
+ Raises:
+ TypeError: If `name` is not a string.
+ KeyError: If `name` does not correspond to a tensor in this graph.
+ """
+ # Names should be strings.
+ if not isinstance(name, basestring):
+ raise TypeError("Tensor names are strings (or similar), not %s."
+ % type(name).__name__)
+ return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
+
+ def _next_id(self):
+ """Id for next Operation instance. Also increments the internal id."""
+ self._check_not_finalized()
+ self._next_id_counter += 1
+ return self._next_id_counter
+
+ @property
+ def _last_id(self):
+ return self._next_id_counter
+
+ def as_default(self):
+ """Returns a context manager that makes this `Graph` the default graph.
+
+ This method should be used if you want to create multiple graphs
+ in the same process. For convenience, a global default graph is
+ provided, and all ops will be added to this graph if you do not
+    create a new graph explicitly. Use this method with the `with` keyword
+ to specify that ops created within the scope of a block should be
+ added to this graph.
+
+ The default graph is a property of the current thread. If you
+ create a new thread, and wish to use the default graph in that
+ thread, you must explicitly add a `with g.as_default():` in that
+ thread's function.
+
+ The following code examples are equivalent:
+
+ ```python
+ # 1. Using Graph.as_default():
+ g = tf.Graph()
+ with g.as_default():
+ c = tf.constant(5.0)
+ assert c.graph is g
+
+ # 2. Constructing and making default:
+ with tf.Graph().as_default() as g:
+ c = tf.constant(5.0)
+ assert c.graph is g
+ ```
+
+ Returns:
+ A context manager for using this graph as the default graph.
+ """
+ return _default_graph_stack.get_controller(self)
+
+ def add_to_collection(self, name, value):
+ """Stores `value` in the collection with the given `name`.
+
+ Args:
+ name: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+ value: The value to add to the collection.
+ """
+ self._check_not_finalized()
+ if name not in self._collections:
+ self._collections[name] = [value]
+ else:
+ self._collections[name].append(value)
+
+ def get_collection(self, name, scope=None):
+ """Returns a list of values in the collection with the given `name`.
+
+ Args:
+      name: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+ scope: (Optional.) If supplied, the resulting list is filtered to include
+ only items whose name begins with this string.
+
+ Returns:
+ The list of values in the collection with the given `name`, or
+ an empty list if no value has been added to that collection. The
+ list contains the values in the order under which they were
+ collected.
+ """
+ if scope is None:
+ return self._collections.get(name, list())
+ else:
+ c = []
+ for item in self._collections.get(name, list()):
+ if hasattr(item, 'name') and item.name.startswith(scope):
+ c.append(item)
+ return c
+
+ @contextlib.contextmanager
+ def _original_op(self, op):
+ """Python 'with' handler to help annotate ops with their originator.
+
+ An op may have an 'original_op' property that indicates the op on which
+ it was based. For example a replica op is based on the op that was
+ replicated and a gradient op is based on the op that was differentiated.
+
+ All ops created in the scope of this 'with' handler will have
+ the given 'op' as their original op.
+
+ Args:
+ op: The Operation that all ops created in this scope will have as their
+ original op.
+
+ Yields:
+ Nothing.
+ """
+ old_original_op = self._default_original_op
+ try:
+ self._default_original_op = op
+ yield
+ finally:
+ self._default_original_op = old_original_op
+
+ # pylint: disable=g-doc-return-or-yield
+ @contextlib.contextmanager
+ def name_scope(self, name):
+ """Returns a context manager that creates hierarchical names for operations.
+
+ A graph maintains a stack of name scopes. A `with name_scope(...):`
+ statement pushes a new name onto the stack for the lifetime of the context.
+
+ The `name` argument will be interpreted as follows:
+
+ * A string (not ending with '/') will create a new name scope, in which
+ `name` is appended to the prefix of all operations created in the
+ context. If `name` has been used before, it will be made unique by
+ calling `self.unique_name(name)`.
+ * A scope previously captured from a `with g.name_scope(...) as
+ scope:` statement will be treated as an "absolute" name scope, which
+ makes it possible to re-enter existing scopes.
+ * A value of `None` or the empty string will reset the current name scope
+ to the top-level (empty) name scope.
+
+ For example:
+
+ ```python
+ with tf.Graph().as_default() as g:
+ c = tf.constant(5.0, name="c")
+      assert c.op.name == "c"
+      c_1 = tf.constant(6.0, name="c")
+      assert c_1.op.name == "c_1"
+
+ # Creates a scope called "nested"
+ with g.name_scope("nested") as scope:
+ nested_c = tf.constant(10.0, name="c")
+        assert nested_c.op.name == "nested/c"
+
+ # Creates a nested scope called "inner".
+ with g.name_scope("inner"):
+ nested_inner_c = tf.constant(20.0, name="c")
+          assert nested_inner_c.op.name == "nested/inner/c"
+
+ # Create a nested scope called "inner_1".
+ with g.name_scope("inner"):
+ nested_inner_1_c = tf.constant(30.0, name="c")
+          assert nested_inner_1_c.op.name == "nested/inner_1/c"
+
+ # Treats `scope` as an absolute name scope, and
+ # switches to the "nested/" scope.
+ with g.name_scope(scope):
+ nested_d = tf.constant(40.0, name="d")
+          assert nested_d.op.name == "nested/d"
+
+ with g.name_scope(""):
+ e = tf.constant(50.0, name="e")
+          assert e.op.name == "e"
+ ```
+
+ The name of the scope itself can be captured by `with
+ g.name_scope(...) as scope:`, which stores the name of the scope
+ in the variable `scope`. This value can be used to name an
+ operation that represents the overall result of executing the ops
+ in a scope. For example:
+
+ ```python
+ inputs = tf.constant(...)
+ with g.name_scope('my_layer') as scope:
+ weights = tf.Variable(..., name="weights")
+ biases = tf.Variable(..., name="biases")
+ affine = tf.matmul(inputs, weights) + biases
+ output = tf.nn.relu(affine, name=scope)
+ ```
+
+
+ Args:
+ name: A name for the scope.
+
+ Returns:
+ A context manager that installs `name` as a new name scope.
+ """
+ try:
+ old_stack = self._name_stack
+ if not name: # Both for name=None nad name="" we re-set to empty scope.
+ new_stack = (None, None)
+ elif name and name[-1] == "/":
+ new_stack = (name[:-1], name[:-1])
+ else:
+ new_stack = (self.unique_name(name), self._plain_name(name))
+ self._name_stack = new_stack
+ yield "" if new_stack[0] is None else new_stack[0] + "/"
+ finally:
+ self._name_stack = old_stack
+ # pylint: enable=g-doc-return-or-yield
+
+ def unique_name(self, name):
+ """Return a unique Operation name for "name".
+
+ Note: You rarely need to call unique_name() directly. Most of the time you
+ just need to create "with g.name_scope()" blocks to generate structured
+ names.
+
+ `unique_name` is used to generate structured names, separated by "/",
+ to help identify Operations when debugging a Graph. Operation names
+ are displayed in error messages reported by the TensorFlow runtime,
+ and in various visualization tools such as TensorBoard.
+
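+    For illustration, a minimal sketch of the uniquification behaviour:
+
+    ```python
+    g = tf.Graph()
+    assert g.unique_name("foo") == "foo"
+    assert g.unique_name("foo") == "foo_1"
+    assert g.unique_name("foo") == "foo_2"
+    ```
+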
+ Args:
+ name: The name for an `Operation`.
+
+ Returns:
+ A string to be passed to `create_op()` that will be used
+ to name the operation being created.
+ """
+ if self._name_stack[0]:
+ name = self._name_stack[0] + "/" + name
+ i = self._names_in_use.get(name, 0)
+ # Increment the number for "name".
+ self._names_in_use[name] = i + 1
+ if i > 0:
+ base_name = name
+ # Make sure the composed name is not already used.
+ while name in self._names_in_use:
+ name = "%s_%d" % (base_name, i)
+ i += 1
+ # Mark the composed name as used in case someone wants
+ # to call unique_name("name_1").
+ self._names_in_use[name] = 1
+ return name
+
+ # TODO(mdevin): remove
+ def _plain_name(self, name):
+ """Return the fully scoped 'name'.
+
+ Args:
+ name: a string.
+
+ Returns:
+ 'name' scoped in the current name stack, without any uniquified
+ elements.
+ """
+ if self._name_stack[1]:
+ return self._name_stack[1] + "/" + name
+ else:
+ return name
+
+ def _set_default_device(self, dev):
+ """Set the default device properties.
+
+ Args:
+ dev: string or Device.
+ """
+ self._default_device = _device_string(dev)
+
+ def get_default_device(self):
+ """Returns the default device.
+
+ Returns:
+ A string.
+ """
+ return self._default_device
+
+ def _push_default_device_function(self, device_function):
+ """Pushes the given function onto the stack of device functions.
+
+ See Graph.device for more details.
+
+ Args:
+ device_function: The function to be pushed onto the stack of device
+ functions.
+ """
+ self._device_function_stack.append(device_function)
+
+ def _pop_default_device_function(self, device_function):
+ """Pops the given function from the stack of device functions.
+
+ See Graph.device for more details.
+
+ Args:
+ device_function: The function to be popped from the stack of device
+ functions.
+
+ Raises:
+ ValueError: if the device_function to be popped is not top of the stack,
+ or if the stack is empty.
+ """
+ if not self._device_function_stack:
+ raise ValueError("Tried to pop, but the device function stack is empty")
+ if self._device_function_stack[-1] is not device_function:
+ raise ValueError("Tried to pop device function, but it was not on top "
+ "of the stack")
+
+ self._device_function_stack.pop()
+
+ @contextlib.contextmanager
+ def device(self, device_name_or_function):
+ """Returns a context manager that specifies the default device to use.
+
+ The `device_name_or_function` argument may either be a device name
+ string, a device function, or None:
+
+ * If it is a device name string, all operations constructed in
+ this context will be assigned to the device with that name.
+    * If it is a function, it will be treated as a function from
+ Operation objects to device name strings, and invoked each time
+ a new Operation is created. The Operation will be assigned to
+ the device with the returned name.
+ * If it is None, the default device will be cleared.
+
+ For example:
+
+ ```python
+ with g.device('/gpu:0'):
+ # All operations constructed in this context will be placed
+ # on GPU 0.
+ with g.device(None):
+ # All operations constructed in this context will have no
+ # assigned device.
+
+ # Defines a function from `Operation` to device string.
+ def matmul_on_gpu(n):
+ if n.type == "MatMul":
+ return "/gpu:0"
+ else:
+ return "/cpu:0"
+
+ with g.device(matmul_on_gpu):
+ # All operations of type "MatMul" constructed in this context
+ # will be placed on GPU 0; all other operations will be placed
+ # on CPU 0.
+ ```
+
+ Args:
+ device_name_or_function: The device name or function to use in
+ the context.
+
+ Returns:
+ A context manager that specifies the default device to use for newly
+ created ops.
+ """
+ if callable(device_name_or_function):
+ try:
+ self._push_default_device_function(device_name_or_function)
+ yield
+ finally:
+ self._pop_default_device_function(device_name_or_function)
+ else:
+ try:
+ old_dev = self.get_default_device()
+ self._set_default_device(_device_string(device_name_or_function))
+ yield
+ finally:
+ self._set_default_device(old_dev)
+
+ class _ControlDependenciesController(object):
+ """Context manager for `control_dependencies()`."""
+
+ def __init__(self, graph, control_inputs):
+ self._graph = graph
+ self._control_inputs = control_inputs
+ self._seen_nodes = set()
+
+# pylint: disable=protected-access
+ def __enter__(self):
+ self._graph._push_control_dependencies_controller(self)
+
+ def __exit__(self, unused_type, unused_value, unused_traceback):
+ self._graph._pop_control_dependencies_controller(self)
+# pylint: enable=protected-access
+
+ @property
+ def control_inputs(self):
+ return self._control_inputs
+
+ def add_op(self, op):
+ self._seen_nodes.add(op)
+
+ def op_in_group(self, op):
+ return op in self._seen_nodes
+
+ def _push_control_dependencies_controller(self, controller):
+ self._control_dependencies_stack.append(controller)
+
+ def _pop_control_dependencies_controller(self, controller):
+ assert self._control_dependencies_stack[-1] is controller
+ self._control_dependencies_stack.pop()
+
+ def _current_control_dependencies(self):
+ ret = set()
+ for controller in self._control_dependencies_stack:
+ for op in controller.control_inputs:
+ ret.add(op)
+ return ret
+
+ def _control_dependencies_for_inputs(self, input_tensors):
+ """For an op that takes `input_tensors` as inputs, compute control inputs.
+
+ The returned control dependencies should yield an execution that
+ is equivalent to adding all control inputs in
+ self._control_dependencies_stack to a newly created op. However,
+ this function attempts to prune the returned control dependencies
+ by observing that nodes created within the same `with
+ control_dependencies(...):` block may have data dependencies that make
+ the explicit approach redundant.
+
+ Args:
+ input_tensors: The direct data dependencies for an op to be created.
+
+ Returns:
+ A list of control inputs for the op to be created.
+ """
+ ret = []
+ input_ops = set([t.op for t in input_tensors])
+ for controller in self._control_dependencies_stack:
+ # If any of the input_ops already depends on the inputs from controller,
+ # we say that the new op is dominated (by that input), and we therefore
+ # do not need to add control dependences for this controller's inputs.
+ dominated = False
+ for op in input_ops:
+ if controller.op_in_group(op):
+ dominated = True
+ break
+ if not dominated:
+        # Don't add a control input if we already have a data dependency on it.
+ # NOTE(mrry): We do not currently track transitive data dependencies,
+ # so we may add redundant control inputs.
+ ret.extend([c for c in controller.control_inputs if c not in input_ops])
+ return ret
+
+ def _record_op_seen_by_control_dependencies(self, op):
+ """Record that the given op depends on all registered control dependencies.
+
+ Args:
+ op: An Operation.
+ """
+ for controller in self._control_dependencies_stack:
+ controller.add_op(op)
+
+ def control_dependencies(self, control_inputs):
+ """Returns a context manager that specifies control dependencies.
+
+ Use with the `with` keyword to specify that all operations constructed
+ within the context should have control dependencies on
+ `control_inputs`. For example:
+
+ ```python
+ with g.control_dependencies([a, b, c]):
+ # `d` and `e` will only run after `a`, `b`, and `c` have executed.
+ d = ...
+ e = ...
+ ```
+
+ Multiple calls to `control_dependencies()` can be nested, and in
+ that case a new `Operation` will have control dependencies on the union
+ of `control_inputs` from all active contexts.
+
+ ```python
+ with g.control_dependencies([a, b]):
+ # Ops declared here run after `a` and `b`.
+ with g.control_dependencies([c, d]):
+ # Ops declared here run after `a`, `b`, `c`, and `d`.
+ ```
+
+ *N.B.* The control dependencies context applies *only* to ops that
+ are constructed within the context. Merely using an op or tensor
+ in the context does not add a control dependency. The following
+ example illustrates this point:
+
+ ```python
+ # WRONG
+ def my_func(pred, tensor):
+ t = tf.matmul(tensor, tensor)
+ with tf.control_dependencies([pred]):
+ # The matmul op is created outside the context, so no control
+ # dependency will be added.
+ return t
+
+ # RIGHT
+ def my_func(pred, tensor):
+ with tf.control_dependencies([pred]):
+ # The matmul op is created in the context, so a control dependency
+ # will be added.
+ return tf.matmul(tensor, tensor)
+ ```
+
+ Args:
+ control_inputs: A list of `Operation` or `Tensor` objects, which
+ must be executed or computed before running the operations
+ defined in the context.
+
+ Returns:
+ A context manager that specifies control dependencies for all
+ operations constructed within the context.
+
+ Raises:
+ TypeError: If `control_inputs` is not a list of `Operation` or
+ `Tensor` objects.
+ """
+ # First convert the inputs to ops, and deduplicate them.
+ # NOTE(mrry): Other than deduplication, we do not currently track direct
+ # or indirect dependencies between control_inputs, which may result in
+ # redundant control inputs.
+ control_ops = []
+ current = self._current_control_dependencies()
+ for c in control_inputs:
+ if isinstance(c, Tensor):
+ c = c.op
+ elif not isinstance(c, Operation):
+ raise TypeError("Control input must be Operation or Tensor: %s" % c)
+ if c not in current:
+ control_ops.append(c)
+ current.add(c)
+ return self._ControlDependenciesController(self, control_ops)
+
+ # pylint: disable=g-doc-return-or-yield
+ @contextlib.contextmanager
+ def _kernel_label_map(self, op_to_kernel_label_map):
+ """EXPERIMENTAL: A context manager for setting kernel labels.
+
+ This context manager can be used to select particular
+ implementations of kernels within the scope of the context.
+
+ For example:
+
+ with ops.Graph().as_default() as g:
+ f_1 = Foo() # Uses the default registered kernel for the Foo op.
+ with g.kernel_label_map({"Foo": "v_2"}):
+ f_2 = Foo() # Uses the registered kernel with label "v_2"
+ # for the Foo op.
+ with g.kernel_label_map({"Foo": "v_3"}):
+ f_3 = Foo() # Uses the registered kernel with label "v_3"
+ # for the Foo op.
+ with g.kernel_label_map({"Foo": ""}):
+ f_4 = Foo() # Uses the default registered kernel
+ # for the Foo op.
+
+ Args:
+ op_to_kernel_label_map: A dictionary mapping op type strings to
+ kernel label strings.
+
+ Returns:
+ A context manager that sets the kernel label to be used for one or more
+ ops created in that context.
+
+ Raises:
+ TypeError: If op_to_kernel_label_map is not a dictionary mapping
+ strings to strings.
+ """
+ if not isinstance(op_to_kernel_label_map, dict):
+ raise TypeError("op_to_kernel_label_map must be a dictionary mapping "
+ "strings to strings")
+ # The saved_labels dictionary stores any currently-set labels that
+ # will be overridden by this context manager.
+ saved_labels = {}
+ # Install the given label
+ for op_type, label in op_to_kernel_label_map.items():
+ if not (isinstance(op_type, basestring)
+ and isinstance(label, basestring)):
+ raise TypeError("op_to_kernel_label_map must be a dictionary mapping "
+ "strings to strings")
+ try:
+ saved_labels[op_type] = self._op_to_kernel_label_map[op_type]
+ except KeyError:
+ pass
+ self._op_to_kernel_label_map[op_type] = label
+ try:
+ yield # The code within the context runs here.
+ finally:
+ # Remove the labels set for this context, and restore any saved labels.
+ for op_type, label in op_to_kernel_label_map.items():
+ try:
+ self._op_to_kernel_label_map[op_type] = saved_labels[op_type]
+ except KeyError:
+ del self._op_to_kernel_label_map[op_type]
+ # pylint: enable=g-doc-return-or-yield
+
+ # pylint: disable=g-doc-return-or-yield
+ @contextlib.contextmanager
+ def gradient_override_map(self, op_type_map):
+ """EXPERIMENTAL: A context manager for overriding gradient functions.
+
+ This context manager can be used to override the gradient function
+ that will be used for ops within the scope of the context.
+
+ For example:
+
+ ```python
+ @tf.RegisterGradient("CustomSquare")
+ def _custom_square_grad(op, grad):
+ # ...
+
+ with tf.Graph().as_default() as g:
+ c = tf.constant(5.0)
+ s_1 = tf.square(c) # Uses the default gradient for tf.square.
+ with g.gradient_override_map({"Square": "CustomSquare"}):
+ s_2 = tf.square(c)  # Uses _custom_square_grad to compute the
+ # gradient of s_2.
+ ```
+
+ Args:
+ op_type_map: A dictionary mapping op type strings to alternative op
+ type strings.
+
+ Returns:
+ A context manager that sets the alternative op type to be used for one
+ or more ops created in that context.
+
+ Raises:
+ TypeError: If `op_type_map` is not a dictionary mapping strings to
+ strings.
+ """
+ if not isinstance(op_type_map, dict):
+ raise TypeError("op_type_map must be a dictionary mapping "
+ "strings to strings")
+ # The saved_mappings dictionary stores any currently-set mappings that
+ # will be overridden by this context manager.
+ saved_mappings = {}
+ # Install the given overrides.
+ for op_type, mapped_op_type in op_type_map.items():
+ if not (isinstance(op_type, basestring)
+ and isinstance(mapped_op_type, basestring)):
+ raise TypeError("op_type_map must be a dictionary mapping "
+ "strings to strings")
+ try:
+ saved_mappings[op_type] = self._gradient_override_map[op_type]
+ except KeyError:
+ pass
+ self._gradient_override_map[op_type] = mapped_op_type
+ try:
+ yield # The code within the context runs here.
+ finally:
+ # Remove the overrides set for this context, and restore any saved mappings.
+ for op_type, mapped_op_type in op_type_map.items():
+ try:
+ self._gradient_override_map[op_type] = saved_mappings[op_type]
+ except KeyError:
+ del self._gradient_override_map[op_type]
+ # pylint: enable=g-doc-return-or-yield
+
+
+def device(dev):
+ """Wrapper for `Graph.device()` using the default graph.
+
+ See [`Graph.device()`](framework.md#Graph.device) for more details.
+
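+ For example (an illustrative sketch; the device string follows the same
+ format accepted by `Graph.device()`):
+
+ ```python
+ with tf.device("/device:GPU:0"):
+   # Ops constructed in this block are assigned to GPU 0 by default.
+   c = tf.constant(5.0)
+ ```
+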
+ Args:
+ dev: The device name or function to use in
+ the context.
+
+ Returns:
+ A context manager that specifies the default device to use for newly
+ created ops.
+ """
+ return get_default_graph().device(dev)
+
+
+def name_scope(name):
+ """Wrapper for `Graph.name_scope()` using the default graph.
+
+ See [`Graph.name_scope()`](framework.md#Graph.name_scope) for more details.
+
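+ For example (an illustrative sketch):
+
+ ```python
+ with tf.name_scope("layer1"):
+   # The resulting op is named "layer1/c" in the default graph.
+   c = tf.constant(5.0, name="c")
+ ```
+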
+ Args:
+ name: A name for the scope.
+
+ Returns:
+ A context manager that installs `name` as a new name scope in the
+ default graph.
+ """
+ return get_default_graph().name_scope(name)
+
+
+def control_dependencies(control_inputs):
+ """Wrapper for `Graph.control_dependencies()` using the default graph.
+
+ See [`Graph.control_dependencies()`](framework.md#Graph.control_dependencies)
+ for more details.
+
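+ For example (an illustrative sketch; `a` and `b` stand for previously
+ created `Operation` or `Tensor` objects):
+
+ ```python
+ with tf.control_dependencies([a, b]):
+   # `c` will only run after both `a` and `b` have executed.
+   c = tf.constant(5.0)
+ ```
+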
+ Args:
+ control_inputs: A list of `Operation` or `Tensor` objects, which
+ must be executed or computed before running the operations
+ defined in the context.
+
+ Returns:
+ A context manager that specifies control dependencies for all
+ operations constructed within the context.
+ """
+ return get_default_graph().control_dependencies(control_inputs)
+
+
+class _DefaultStack(threading.local):
+ """A thread-local stack of objects for providing implicit defaults."""
+
+ def __init__(self):
+ super(_DefaultStack, self).__init__()
+ self.stack = []
+
+ def get_default(self):
+ return self.stack[-1] if len(self.stack) >= 1 else None
+
+ def reset(self):
+ self.stack = []
+
+ @contextlib.contextmanager
+ def get_controller(self, default):
+ """A context manager for manipulating a default stack."""
+ try:
+ self.stack.append(default)
+ yield default
+ finally:
+ assert self.stack[-1] is default
+ self.stack.pop()
+
+
+_default_session_stack = _DefaultStack()
+
+
+def default_session(session):
+ """Python "with" handler for defining a default session.
+
+ This function provides a means of registering a session for handling
+ Tensor.eval() and Operation.run() calls. It is primarily intended for use
+ by session.Session, but can be used with any object that implements
+ the Session.run() interface.
+
+ Use with the "with" keyword to specify that Tensor.eval() and Operation.run()
+ invocations within the scope of a block should be executed by a particular
+ session.
+
+ The default session applies to the current thread only, so it is always
+ possible to inspect the call stack and determine the scope of a default
+ session. If you create a new thread, and wish to use the default session
+ in that thread, you must explicitly add a "with ops.default_session(sess):"
+ block in that thread's function.
+
+ Example:
+ The following code examples are equivalent:
+
+ # 1. Using the Session object directly:
+ sess = ...
+ c = tf.constant(5.0)
+ result = sess.run(c)
+
+ # 2. Using default_session():
+ sess = ...
+ with ops.default_session(sess):
+ c = tf.constant(5.0)
+ result = c.eval()
+
+ # 3. Overriding default_session():
+ sess = ...
+ with ops.default_session(sess):
+ c = tf.constant(5.0)
+ with ops.default_session(...):
+ c.eval(session=sess)
+
+ Args:
+ session: The session to be installed as the default session.
+
+ Returns:
+ A context manager for the default session.
+ """
+ return _default_session_stack.get_controller(weakref.ref(session))
+
+
+def get_default_session():
+ """Returns the default session for the current thread.
+
+ The returned `Session` will be the innermost session on which a
+ `Session` or `Session.as_default()` context has been entered.
+
+ *N.B.* The default session is a property of the current thread. If you
+ create a new thread, and wish to use the default session in that
+ thread, you must explicitly add a `with sess.as_default():` in that
+ thread's function.
+
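+ For example (an illustrative sketch):
+
+ ```python
+ sess = tf.Session()
+ with sess.as_default():
+   assert tf.get_default_session() is sess
+ ```
+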
+ Returns:
+ The default `Session` being used in the current thread.
+ """
+ ref = _default_session_stack.get_default()
+ if ref is None:
+ # No default session has been registered.
+ return None
+ else:
+ # De-reference ref.
+ ret = ref()
+ if ret is None:
+ # This should never happen with the current session implementations.
+ raise RuntimeError("Default session has been garbage collected.")
+ return ret
+
+
+def _eval_using_default_session(tensors, feed_dict, graph, session=None):
+ """Uses the default session to evaluate one or more tensors.
+
+ Args:
+ tensors: A single Tensor, or a list of Tensor objects.
+ feed_dict: A dictionary that maps Tensor objects (or tensor names) to lists,
+ numpy ndarrays, TensorProtos, or strings.
+ graph: The graph in which the tensors are defined.
+ session: (Optional) A different session to use to evaluate "tensors".
+
+ Returns:
+ Either a single numpy ndarray if "tensors" is a single tensor; or a list
+ of numpy ndarrays that each correspond to the respective element in
+ "tensors".
+
+ Raises:
+ ValueError: If no default session is available; the default session
+ does not have "graph" as its graph; or if "session" is specified,
+ and it does not have "graph" as its graph.
+ """
+ if session is None:
+ session = get_default_session()
+ if session is None:
+ raise ValueError("Cannot evaluate tensor using eval(): No default "
+ "session is registered. Use 'with "
+ "DefaultSession(sess)' or pass an explicit session to "
+ "eval(session=sess)")
+ if session.graph is not graph:
+ raise ValueError("Cannot use the default session to evaluate tensor: "
+ "the tensor's graph is different from the session's "
+ "graph. Pass an explicit session to "
+ "eval(session=sess).")
+ else:
+ if session.graph is not graph:
+ raise ValueError("Cannot use the given session to evaluate tensor: "
+ "the tensor's graph is different from the session's "
+ "graph.")
+ return session.run(tensors, feed_dict)
+
+
+def _run_using_default_session(operation, feed_dict, graph, session=None):
+ """Uses the default session to run "operation".
+
+ Args:
+ operation: The Operation to be run.
+ feed_dict: A dictionary that maps Tensor objects (or tensor names) to lists,
+ numpy ndarrays, TensorProtos, or strings.
+ graph: The graph in which "operation" is defined.
+ session: (Optional) A different session to use to run "operation".
+
+ Raises:
+ ValueError: If no default session is available; the default session
+ does not have "graph" as its graph; or if "session" is specified,
+ and it does not have "graph" as its graph.
+ """
+ if session is None:
+ session = get_default_session()
+ if session is None:
+ raise ValueError("Cannot execute operation using Run(): No default "
+ "session is registered. Use 'with "
+ "default_session(sess)' or pass an explicit session to "
+ "Run(session=sess)")
+ if session.graph is not graph:
+ raise ValueError("Cannot use the default session to execute operation: "
+ "the operation's graph is different from the "
+ "session's graph. Pass an explicit session to "
+ "Run(session=sess).")
+ else:
+ if session.graph is not graph:
+ raise ValueError("Cannot use the given session to execute operation: "
+ "the operation's graph is different from the session's "
+ "graph.")
+ session.run(operation, feed_dict)
+
+
+class _DefaultGraphStack(_DefaultStack):
+ """A thread-local stack of objects for providing an implicit default graph."""
+
+ def __init__(self):
+ super(_DefaultGraphStack, self).__init__()
+ self._global_default_graph = None
+
+ def get_default(self):
+ """Override that returns a global default if the stack is empty."""
+ ret = super(_DefaultGraphStack, self).get_default()
+ if ret is None:
+ ret = self._GetGlobalDefaultGraph()
+ return ret
+
+ def _GetGlobalDefaultGraph(self):
+ if self._global_default_graph is None:
+ # TODO(mrry): Perhaps log that the default graph is being used, or
+ # provide some other feedback to prevent confusion when the global
+ # default graph and an explicit graph are mixed in the same process.
+ self._global_default_graph = Graph()
+ return self._global_default_graph
+
+ def reset(self):
+ super(_DefaultGraphStack, self).reset()
+ self._global_default_graph = None
+
+_default_graph_stack = _DefaultGraphStack()
+
+
+def reset_default_graph():
+ """Clears the default graph stack and resets the global default graph.
+
+ *N.B.* The default graph is a property of the current thread. This
+ function applies only to the current thread.
+ """
+ _default_graph_stack.reset()
+
+
+def get_default_graph():
+ """Returns the default graph for the current thread.
+
+ The returned graph will be the innermost graph on which a
+ `Graph.as_default()` context has been entered, or a global default
+ graph if none has been explicitly created.
+
+ *N.B.* The default graph is a property of the current thread. If you
+ create a new thread, and wish to use the default graph in that
+ thread, you must explicitly add a `with g.as_default():` in that
+ thread's function.
+
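+ For example (an illustrative sketch):
+
+ ```python
+ g = tf.Graph()
+ with g.as_default():
+   assert tf.get_default_graph() is g
+ ```
+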
+ Returns:
+ The default `Graph` being used in the current thread.
+ """
+ return _default_graph_stack.get_default()
+
+
+def _get_graph_from_inputs(op_input_list, graph=None):
+ """Returns the appropriate graph to use for the given inputs.
+
+ This library method provides a consistent algorithm for choosing the graph
+ in which an Operation should be constructed:
+
+ 1. If the "graph" is specified explicitly, we validate that all of the inputs
+ in "op_input_list" are compatible with that graph.
+ 2. Otherwise, we attempt to select a graph from the first Operation-
+ or Tensor-valued input in "op_input_list", and validate that all other
+ such inputs are in the same graph.
+ 3. If the graph was not specified and it could not be inferred from
+ "op_input_list", we attempt to use the default graph.
+
+ Args:
+ op_input_list: A list of inputs to an operation, which may include Tensor
+ and Operation objects.
+ graph: (Optional) The explicit graph to use.
+
+ Raises:
+ TypeError: If op_input_list is not a list or tuple, or if graph is not a
+ Graph.
+ ValueError: If a graph is explicitly passed and not all inputs are from it,
+ or if the inputs are from multiple graphs, or we could not find a graph
+ and there was no default graph.
+
+ Returns:
+ The appropriate graph to use for the given inputs.
+ """
+ if not isinstance(op_input_list, (list, tuple)):
+ raise TypeError("The op_input_list must be a list or tuple")
+
+ # 1. If the graph is specified explicitly, we validate that all of the inputs
+ # are compatible with that graph.
+ if graph is not None:
+ if not isinstance(graph, Graph):
+ raise TypeError("Input graph needs to be a Graph: %s" % graph)
+ for op_input in op_input_list:
+ if isinstance(op_input, Operation):
+ if op_input.graph is not graph:
+ raise ValueError("Operation %s is not from the passed-in graph"
+ % op_input)
+ elif isinstance(op_input, Tensor):
+ if op_input.graph is not graph:
+ raise ValueError("Tensor %s is not from the passed-in graph"
+ % op_input)
+ return graph
+
+ # 2. Otherwise, we attempt to select a graph from one of the Operation-
+ # or Tensor-valued inputs.
+ original_input = None
+ for op_input in op_input_list:
+ if isinstance(op_input, (Operation, Tensor)):
+ if original_input is None:
+ original_input = op_input
+ else:
+ assert_same_graph([original_input, op_input])
+ if original_input is not None:
+ return original_input.graph
+
+ # 3. If all else fails, we use the default graph, which is always there.
+ return get_default_graph()
+
+
+class GraphKeys(object):
+ """Standard names to use for graph collections.
+
+ The standard library uses various well-known names to collect and
+ retrieve values associated with a graph. For example, the
+ `tf.Optimizer` subclasses default to optimizing the variables
+ collected under `tf.GraphKeys.TRAINABLE_VARIABLES` if none is
+ specified, but it is also possible to pass an explicit list of
+ variables.
+
+ The following standard keys are defined:
+
+ * `VARIABLES`: the `Variable` objects that comprise a model, and
+ must be saved and restored together. See
+ [`tf.all_variables()`](state_ops.md#all_variables) for more details.
+ * `TRAINABLE_VARIABLES`: the subset of `Variable` objects that will
+ be trained by an optimizer. See
+ [`tf.trainable_variables()`](state_ops.md#trainable_variables)
+ for more details.
+ * `SUMMARIES`: the summary `Tensor` objects that have been created
+ in the graph. See [`tf.merge_all_summaries()`](train.md#merge_all_summaries)
+ for more details.
+ * `QUEUE_RUNNERS`: the `QueueRunner` objects that are used to
+ produce input for a computation. See
+ [`tf.start_queue_runners()`](train.md#start_queue_runners) for more details.
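+
+ For example (an illustrative sketch), the variables that an optimizer
+ trains by default can be retrieved from the default graph with:
+
+ ```python
+ trainable_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
+ ```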
+ """
+
+ # Key to collect variables.Variable objects that must be saved and restored
+ # by the model.
+ VARIABLES = "variables"
+ # Key to collect variables.Variable objects that will be trained by the
+ # optimizers.
+ TRAINABLE_VARIABLES = "trainable_variables"
+ # Key to collect summaries.
+ SUMMARIES = "summaries"
+ # Key to collect QueueRunners.
+ QUEUE_RUNNERS = "queue_runners"
+ # Key to collect table initializers.
+ TABLE_INITIALIZERS = "table_initializer"
+
+
+def add_to_collection(name, value):
+ """Wrapper for `Graph.add_to_collection()` using the default graph.
+
+ See [`Graph.add_to_collection()`](framework.md#Graph.add_to_collection)
+ for more details.
+
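+ For example (an illustrative sketch; `my_loss` stands for a previously
+ created `Tensor`):
+
+ ```python
+ tf.add_to_collection("losses", my_loss)
+ # Later, all values added under the "losses" key can be retrieved:
+ losses = tf.get_collection("losses")
+ ```
+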
+ Args:
+ name: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+ value: The value to add to the collection.
+ """
+ get_default_graph().add_to_collection(name, value)
+
+
+def get_collection(key, scope=None):
+ """Wrapper for `Graph.get_collection()` using the default graph.
+
+ See [`Graph.get_collection()`](framework.md#Graph.get_collection)
+ for more details.
+
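+ For example (an illustrative sketch; `l_1` and `l_2` stand for previously
+ created `Tensor` objects):
+
+ ```python
+ tf.add_to_collection("losses", l_1)
+ tf.add_to_collection("losses", l_2)
+ tf.get_collection("losses")  # Returns [l_1, l_2], in the order added.
+ ```
+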
+ Args:
+ key: The key for the collection. For example, the `GraphKeys` class
+ contains many standard names for collections.
+ scope: (Optional.) If supplied, the resulting list is filtered to include
+ only items whose name begins with this string.
+
+ Returns:
+ The list of values in the collection with the given `name`, or
+ an empty list if no value has been added to that collection. The
+ list contains the values in the order under which they were
+ collected.
+ """
+ return get_default_graph().get_collection(key, scope)
+
+
+# pylint: disable=g-doc-return-or-yield
+@contextlib.contextmanager
+def op_scope(values, name, default_name):
+ """Returns a context manager for use when defining a Python op.
+
+ This context manager validates that the given `values` are from the
+ same graph, ensures that that graph is the default graph, and pushes a
+ name scope.
+
+ For example, to define a new Python op called `my_op`:
+
+ ```python
+ def my_op(a, b, c, name=None):
+ with tf.op_scope([a, b, c], name, "MyOp") as scope:
+ a = tf.convert_to_tensor(a, name="a")
+ b = tf.convert_to_tensor(b, name="b")
+ c = tf.convert_to_tensor(c, name="c")
+ # Define some computation that uses `a`, `b`, and `c`.
+ return foo_op(..., name=scope)
+ ```
+
+ Args:
+ values: The list of `Tensor` arguments that are passed to the op function.
+ name: The name argument that is passed to the op function.
+ default_name: The default name to use if the `name` argument is `None`.
+
+ Returns:
+ A context manager for use in defining a Python op.
+ """
+ g = _get_graph_from_inputs(values)
+ n = default_name if name is None else name
+ with g.as_default(), g.name_scope(n) as scope:
+ yield scope
+# pylint: enable=g-doc-return-or-yield
diff --git a/tensorflow/python/framework/ops_test.py b/tensorflow/python/framework/ops_test.py
new file mode 100644
index 0000000000..a406c5e56e
--- /dev/null
+++ b/tensorflow/python/framework/ops_test.py
@@ -0,0 +1,825 @@
+"""Tests for tensorflow.python.framework.ops."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import device as pydev
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import test_kernel_label_op
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.platform import googletest
+
+
+class TensorTest(test_util.TensorFlowTestCase):
+
+ def testShape(self):
+ op = ops.Operation(ops._NodeDef("noop", "myop"), ops.Graph(),
+ [], [types.float32])
+ t = op.outputs[0]
+ self.assertEquals(tensor_shape.unknown_shape(), t.get_shape())
+ t.set_shape([1, 2, 3])
+ self.assertEquals([1, 2, 3], t.get_shape())
+
+
+class NodeDefConstructorTest(test_util.TensorFlowTestCase):
+
+ def testNoArgs(self):
+ nodedef = ops._NodeDef("noop", "bar")
+ self.assertProtoEquals("op: 'noop' name: 'bar'", nodedef)
+
+ def testArgs(self):
+ nodedef = ops._NodeDef("foo", "bar", device="/device:baz:*")
+ self.assertProtoEquals("op:'foo' name:'bar' device:'/device:baz:*'",
+ nodedef)
+ nodedef = ops._NodeDef("foo", "bar", device=pydev.Device(job="j"))
+ self.assertProtoEquals("op:'foo' name:'bar' device:'/job:j'", nodedef)
+
+
+# NOTE(mrry): Dummy shape registrations for ops used in the tests.
+ops.RegisterShape("a")(None)
+ops.RegisterShape("b")(None)
+ops.RegisterShape("c")(None)
+ops.RegisterShape("add")(None)
+ops.RegisterShape("an_op")(None)
+ops.RegisterShape("const")(None)
+ops.RegisterShape("copy")(None)
+ops.RegisterShape("foo")(None)
+ops.RegisterShape("identity")(None)
+ops.RegisterShape("mul")(None)
+ops.RegisterShape("nonrefop")(None)
+ops.RegisterShape("noop")(None)
+ops.RegisterShape("refop")(None)
+
+
+def _apply_op(g, *args, **kwargs):
+ op = g.create_op(*args, **kwargs)
+ if len(op.outputs) == 1:
+ return op.outputs[0]
+ else:
+ return op.outputs
+
+
+class OperationTest(test_util.TensorFlowTestCase):
+
+ def testNoInputs(self):
+ op = ops.Operation(ops._NodeDef("noop", "myop"), ops.Graph(),
+ [],
+ [types.float32, types.string])
+ self.assertEquals(2, len(op.values()))
+ self.assertEquals(0, len(op.inputs))
+ self.assertEquals("myop", op.name)
+
+ float_t, label_str_t = op.values()
+ self.assertEquals(types.float32, float_t.dtype)
+ self.assertEquals(op, float_t.op)
+ self.assertEquals(0, float_t._value_index)
+ self.assertEquals(0, len(float_t._consumers))
+ self.assertEquals("myop", float_t._as_node_def_input())
+
+ self.assertEquals(types.string, label_str_t.dtype)
+ self.assertEquals(op, label_str_t.op)
+ self.assertEquals(1, label_str_t._value_index)
+ self.assertEquals(0, len(label_str_t._consumers))
+ self.assertEquals("myop:1", label_str_t._as_node_def_input())
+
+ self.assertProtoEquals("op:'noop' name:'myop'", op.node_def)
+
+ def testNoOutputs(self):
+ g = ops.Graph()
+ op1 = ops.Operation(
+ ops._NodeDef("noop", "myop1"), g, [], [types.float32])
+ float_t, = op1.values()
+ op2 = ops.Operation(ops._NodeDef("reop", "myop2"), g, [float_t], [])
+ self.assertEquals(0, len(op2.values()))
+ self.assertEquals(1, len(op2.inputs))
+ self.assertIs(float_t, op2.inputs[0])
+
+ self.assertEquals(1, len(float_t._consumers))
+ self.assertEquals(op2, float_t._consumers[0])
+
+ self.assertProtoEquals("op:'noop' name:'myop1'", op1.node_def)
+ self.assertProtoEquals("op:'reop' name:'myop2' input:'myop1'",
+ op2.node_def)
+
+ def testInputsAndOutputs(self):
+ g = ops.Graph()
+ op1 = ops.Operation(
+ ops._NodeDef("noop", "myop1"), g, [], [types.float32])
+ self.assertEquals(1, len(op1.values()))
+ float1_t, = op1.values()
+
+ op2 = ops.Operation(ops._NodeDef("reop", "myop2"), g,
+ [], [types.float32, types.string])
+ self.assertEquals(2, len(op2.values()))
+ float2_t, label2_str_t = op2.values()
+
+ # Note that we consume label2_str_t twice here.
+ op3 = ops.Operation(ops._NodeDef("add", "myop3"), g,
+ [float1_t, label2_str_t, label2_str_t],
+ [types.float32, types.int32])
+ self.assertEquals(2, len(op3.values()))
+
+ self.assertEquals(1, len(float1_t._consumers))
+ self.assertEquals(op3, float1_t._consumers[0])
+
+ self.assertEquals(0, len(float2_t._consumers))
+
+ self.assertEquals(2, len(label2_str_t._consumers))
+ self.assertEquals(op3, label2_str_t._consumers[0])
+ self.assertEquals(op3, label2_str_t._consumers[1])
+
+ self.assertProtoEquals("""
+ op:'add' name:'myop3'
+ input:'myop1' input:'myop2:1' input:'myop2:1'
+ """, op3.node_def)
+
+ def testDeviceObject(self):
+ op = ops.Operation(ops._NodeDef("noop", "myop"), ops.Graph(), [], [])
+ op._set_device("/job:goo/device:GPU:0")
+ self.assertProtoEquals(
+ "op:'noop' name:'myop' device:'/job:goo/device:GPU:0' ",
+ op.node_def)
+ op = ops.Operation(ops._NodeDef("noop", "op2"), ops.Graph(), [], [])
+ op._set_device(pydev.Device(job="muu", device_type="CPU", device_index=0))
+ self.assertProtoEquals(
+ "op:'noop' name:'op2' device:'/job:muu/device:CPU:0'",
+ op.node_def)
+
+ def testReferenceInput(self):
+ g = ops.Graph()
+ op1 = ops.Operation(ops._NodeDef("noop", "op1"), g, [],
+ [types.float32_ref, types.float32])
+ self.assertProtoEquals("op:'noop' name:'op1'",
+ op1.node_def)
+ ref_t, nonref_t = op1.values()
+ # NOTE(mrry): Must specify input_types to preserve ref-typed input.
+ op2 = ops.Operation(
+ ops._NodeDef("refop", "op2"), g, [ref_t, nonref_t], [],
+ input_types=[types.float32_ref, types.float32])
+ self.assertProtoEquals("op:'refop' name:'op2' input:'op1' input:'op1:1'",
+ op2.node_def)
+ op3 = ops.Operation(
+ ops._NodeDef("nonrefop", "op3"), g, [ref_t, nonref_t], [])
+ self.assertProtoEquals("op:'nonrefop' name:'op3' input:'op1' input:'op1:1'",
+ op3.node_def)
+
+ def testInvalidNames(self):
+ g = ops.Graph()
+ with self.assertRaises(ValueError):
+ ops.Operation(ops._NodeDef("op", ""), g)
+ with self.assertRaises(ValueError):
+ ops.Operation(ops._NodeDef("op", "_invalid"), g)
+ with self.assertRaises(ValueError):
+ ops.Operation(ops._NodeDef("op", "-invalid"), g)
+ with self.assertRaises(ValueError):
+ ops.Operation(ops._NodeDef("op", "/invalid"), g)
+
+ def testShapeFunctionAbsence(self):
+ def _test():
+ pass
+ g = ops.Graph()
+ with self.assertRaises(RuntimeError):
+ g.create_op("shapeless_op", [], [types.float32])
+
+ def testNoShapeFunction(self):
+ g = ops.Graph()
+ op = ops.Operation(ops._NodeDef("op", "an_op"), g,
+ output_types=[types.float32])
+ self.assertEquals(tensor_shape.unknown_shape(),
+ _apply_op(g, "an_op", [], [types.float32]).get_shape())
+
+
+class CreateOpTest(test_util.TensorFlowTestCase):
+
+ def testNodeDefArgs(self):
+ g = ops.Graph()
+ op1 = g.create_op("const", [], [types.float32], None, name="myop1")
+ with g.device("/device:GPU"):
+ op2 = g.create_op("add",
+ [],
+ [types.float32, types.string], None,
+ name="myop2")
+ op3 = g.create_op(
+ "foo",
+ [op1.values()[0], op2.values()[1], op2.values()[0]],
+ [types.float32, types.int32], None,
+ name="myop3")
+ self.assertEquals(None, op1.device)
+ self.assertEquals("/device:GPU", op2.device)
+ self.assertEquals(None, op3.device)
+ self.assertProtoEquals("name:'myop1' op:'const'", op1.node_def)
+ self.assertProtoEquals("name:'myop2' op:'add' device:'/device:GPU'",
+ op2.node_def)
+ self.assertProtoEquals(
+ "name:'myop3' input:'myop1' input:'myop2:1' input:'myop2' op:'foo'",
+ op3.node_def)
+
+ def testReferenceInput(self):
+ g = ops.Graph()
+ op1 = g.create_op("noop", [],
+ [types.float32_ref, types.float32], name="op1")
+ self.assertProtoEquals("op:'noop' name:'op1'", op1.node_def)
+ ref_t, nonref_t = op1.values()
+ # NOTE(mrry): Must specify input_types to preserve ref-typed input.
+ op2 = g.create_op("refop", [ref_t, nonref_t], [],
+ input_types=[types.float32_ref, types.float32],
+ name="op2")
+ self.assertProtoEquals("op:'refop' name:'op2' input:'op1' input:'op1:1'",
+ op2.node_def)
+ op3 = g.create_op("nonrefop", [ref_t, nonref_t], [], name="op3")
+ self.assertProtoEquals("op:'nonrefop' name:'op3' input:'op1' input:'op1:1'",
+ op3.node_def)
+
+ def testFinalized(self):
+ g = ops.Graph()
+ g.finalize()
+ with self.assertRaises(RuntimeError):
+ g.create_op("const", [], [types.float32], None, name="myop1")
+
+
+class ApplyOpTest(test_util.TensorFlowTestCase):
+
+ def testNodeDefArgs(self):
+ g = ops.Graph()
+ t1 = _apply_op(g, "const", [], [types.float32], name="myop1")
+ with g.device("/device:GPU"):
+ t2 = _apply_op(g, "add",
+ [],
+ [types.float32, types.string],
+ name="myop2")
+ t3 = _apply_op(g, "foo", [t1, t2[1], t2[0]],
+ [types.float32, types.int32], name="myop3")
+ self.assertTrue(isinstance(t1, ops.Tensor))
+ self.assertTrue(isinstance(t2, list))
+ self.assertTrue(isinstance(t3, list))
+ self.assertTrue(isinstance(t3[0], ops.Tensor))
+ self.assertEquals("myop1", t1._as_node_def_input())
+ self.assertEquals("myop2", t2[0]._as_node_def_input())
+ self.assertEquals("myop2:1", t2[1]._as_node_def_input())
+ self.assertEquals("myop3", t3[0]._as_node_def_input())
+ # Validate that we got the right ops as well
+ self.assertProtoEquals("name:'myop1' op:'const'", t1.op.node_def)
+ self.assertProtoEquals("name:'myop2' op:'add' device:'/device:GPU'",
+ t2[0].op.node_def)
+ self.assertProtoEquals(
+ "name:'myop3' input:'myop1' input:'myop2:1' input:'myop2' op:'foo'",
+ t3[0].op.node_def)
+
+ def testReferenceInput(self):
+ g = ops.Graph()
+ ref_t, nonref_t = _apply_op(
+ g, "noop", [], [types.float32_ref, types.float32], name="op1")
+ self.assertProtoEquals("op:'noop' name:'op1'", ref_t.op.node_def)
+ # NOTE(mrry): Must specify input_types to preserve ref-typed input.
+ out_2 = _apply_op(g, "refop", [ref_t, nonref_t], [types.int32],
+ input_types=[types.float32_ref, types.float32],
+ name="op2")
+ self.assertProtoEquals("op:'refop' name:'op2' input:'op1' input:'op1:1'",
+ out_2.op.node_def)
+ out_3 = _apply_op(g, "nonrefop", [ref_t, nonref_t], [types.int32],
+ name="op3")
+ self.assertProtoEquals("op:'nonrefop' name:'op3' input:'op1' input:'op1:1'",
+ out_3.op.node_def)
+
+
+class NameStackTest(test_util.TensorFlowTestCase):
+
+ def testBasics(self):
+ g = ops.Graph()
+ self.assertEquals("foo", g.unique_name("foo"))
+ self.assertEquals("foo_1", g.unique_name("foo"))
+ self.assertEquals("foo_2", g.unique_name("foo"))
+ self.assertEquals("foo_1_1", g.unique_name("foo_1"))
+ self.assertEquals("foo_1_2", g.unique_name("foo_1"))
+ self.assertEquals("foo_1_2_1", g.unique_name("foo_1_2"))
+ with g.name_scope("bar"):
+ self.assertEquals("bar/foo", g.unique_name("foo"))
+ self.assertEquals("bar/foo_1", g.unique_name("foo"))
+ with g.name_scope(None):
+ self.assertEquals("foo_3", g.unique_name("foo"))
+ with g.name_scope("baz"):
+ self.assertEquals("bar/baz/foo", g.unique_name("foo"))
+ self.assertEquals("bar/baz/foo_1", g.unique_name("foo"))
+ with g.name_scope("baz"):
+ self.assertEquals("bar/baz_1/foo", g.unique_name("foo"))
+ self.assertEquals("bar/baz_1/foo_1", g.unique_name("foo"))
+ with g.name_scope("quux"):
+ self.assertEquals("quux/foo", g.unique_name("foo"))
+ with g.name_scope("bar"):
+ with g.name_scope("baz"):
+ self.assertEquals("bar_1/baz/foo", g.unique_name("foo"))
+ self.assertEquals("foo_4", g.unique_name("foo"))
+ self.assertEquals("bar_2", g.unique_name("bar"))
+
+ def testOutOfOrderUniqueName(self):
+ g = ops.Graph()
+ self.assertEquals("foo_2", g.unique_name("foo_2"))
+ self.assertEquals("foo", g.unique_name("foo"))
+ self.assertEquals("foo_1", g.unique_name("foo"))
+ self.assertEquals("foo_3", g.unique_name("foo"))
+
+
+class NameTest(test_util.TensorFlowTestCase):
+
+ def testGenerateName(self):
+ g = ops.Graph()
+ op0 = g.create_op("const", [], [types.float32, types.float32])
+ self.assertEquals("const", op0.name)
+ self.assertEquals("const:0", op0.outputs[0].name)
+ self.assertEquals("const:1", op0.outputs[1].name)
+
+ op1 = g.create_op("const", [], [types.float32])
+ self.assertEquals("const_1", op1.name)
+ self.assertEquals("const_1:0", op1.outputs[0].name)
+
+ op2 = g.create_op("const", [], [types.float32], name="my_op")
+ self.assertEquals("my_op", op2.name)
+ self.assertEquals("my_op:0", op2.outputs[0].name)
+
+ def testname_scope(self):
+ g = ops.Graph()
+
+ with g.name_scope("foo") as foo:
+ self.assertEquals(foo, "foo/")
+ with g.name_scope("foo2") as foo2:
+ self.assertEquals(foo2, "foo/foo2/")
+ with g.name_scope(None) as empty1:
+ self.assertEquals(empty1, "")
+ with g.name_scope("foo3") as foo3:
+ self.assertEquals(foo3, "foo3/")
+ with g.name_scope("") as empty2:
+ self.assertEquals(empty2, "")
+
+ self.assertEquals("const",
+ g.create_op("const", [], [types.float32]).name)
+ with g.name_scope("bar") as scope:
+ self.assertEquals("bar/const",
+ g.create_op("const", [], [types.float32]).name)
+ self.assertEquals("bar/const_1",
+ g.create_op("const", [], [types.float32]).name)
+ # If you use the value from "with .. as", that values is used as-is.
+ self.assertEquals(
+ "bar",
+ g.create_op("const", [], [types.float32], name=scope).name)
+ with g.name_scope("baz") as scope:
+ with g.name_scope("quux"):
+ self.assertEquals("baz/quux/const",
+ g.create_op("const", [], [types.float32]).name)
+ # If you use the value from the enclosing "with .. as", nothing is pushed.
+ with g.name_scope(scope):
+ self.assertEquals("baz/const",
+ g.create_op("const", [], [types.float32]).name)
+ self.assertEquals("baz",
+ g.create_op("const", [], [types.float32],
+ name=scope).name)
+ self.assertEquals("trailing",
+ g.create_op("const", [], [types.float32],
+ name="trailing/").name)
+ with g.name_scope("bar"):
+ self.assertEquals("bar_1/const",
+ g.create_op("const", [], [types.float32]).name)
+ with g.name_scope("bar/"):
+ self.assertEquals("bar/const_2",
+ g.create_op("const", [], [types.float32]).name)
+
+
+class DeviceTest(test_util.TensorFlowTestCase):
+
+ def testNoDevice(self):
+ g = ops.Graph()
+ op = g.create_op("an_op", [], [types.float32])
+ self.assertEqual(None, op.device)
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op" }
+ """, gd)
+
+ def testDevicePartialString(self):
+ g = ops.Graph()
+ with g.device("/job:worker/replica:2"):
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op" device: "/job:worker/replica:2" }
+ """, gd)
+
+ def testDeviceFull(self):
+ g = ops.Graph()
+ with g.device(pydev.Device(job="worker", replica=2, task=0,
+ device_type="CPU",
+ device_index=3)):
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/job:worker/replica:2/task:0/device:CPU:3" }
+ """, gd)
+
+ def testNesting(self):
+ g = ops.Graph()
+ with g.device("/job:worker/replica:2"):
+ g.create_op("an_op", [], [types.float32])
+ with g.device("/job:worker/replica:3/task:0"):
+ g.create_op("an_op", [], [types.float32])
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/job:worker/replica:2" }
+ node { name: "an_op_1" op: "an_op"
+ device: "/job:worker/replica:3/task:0" }
+ node { name: "an_op_2" op: "an_op"
+ device: "/job:worker/replica:2" }
+ """, gd)
+
+ def testNestingString(self):
+ g = ops.Graph()
+ with g.device("/job:worker/replica:2"):
+ g.create_op("an_op", [], [types.float32])
+ with g.device("/job:worker/replica:3/task:0"):
+ g.create_op("an_op", [], [types.float32])
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/job:worker/replica:2" }
+ node { name: "an_op_1" op: "an_op"
+ device: "/job:worker/replica:3/task:0" }
+ node { name: "an_op_2" op: "an_op"
+ device: "/job:worker/replica:2" }
+ """, gd)
+
+ def testNestingOverrideGpuCpu(self):
+ g = ops.Graph()
+ with g.device("/job:worker/replica:2/device:CPU:1"):
+ g.create_op("an_op", [], [types.float32])
+ with g.device("/job:worker/replica:2/device:GPU:2"):
+ g.create_op("an_op", [], [types.float32])
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/job:worker/replica:2/device:CPU:1" }
+ node { name: "an_op_1" op: "an_op"
+ device: "/job:worker/replica:2/device:GPU:2" }
+ node { name: "an_op_2" op: "an_op"
+ device: "/job:worker/replica:2/device:CPU:1" }
+ """, gd)
+
+ def testNestingWithMergeDeviceFunction(self):
+ g = ops.Graph()
+
+ with g.device(pydev.merge_device("/device:GPU:0")):
+ g.create_op("an_op", [], [types.float32])
+ with g.device(pydev.merge_device("/job:worker")):
+ g.create_op("an_op", [], [types.float32])
+ with g.device(pydev.merge_device("/device:CPU:0")):
+ g.create_op("an_op", [], [types.float32])
+ with g.device(pydev.merge_device("/job:ps")):
+ g.create_op("an_op", [], [types.float32])
+ with g.device(pydev.merge_device(None)):
+ g.create_op("an_op", [], [types.float32])
+
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/device:GPU:0" }
+ node { name: "an_op_1" op: "an_op"
+ device: "/job:worker/device:GPU:0" }
+ node { name: "an_op_2" op: "an_op"
+ device: "/job:worker/device:CPU:0" }
+ node { name: "an_op_3" op: "an_op"
+ device: "/job:ps/device:CPU:0" }
+ node { name: "an_op_4" op: "an_op"
+ device: "/job:ps/device:CPU:0" }
+ """, gd)
+
+ def testNoneClearsDefault(self):
+ g = ops.Graph()
+ with g.device("/job:worker/replica:2/device:CPU:1"):
+ g.create_op("an_op", [], [types.float32])
+ with g.device(None):
+ g.create_op("an_op", [], [types.float32])
+ g.create_op("an_op", [], [types.float32])
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "an_op" op: "an_op"
+ device: "/job:worker/replica:2/device:CPU:1" }
+ node { name: "an_op_1" op: "an_op" }
+ node { name: "an_op_2" op: "an_op"
+ device: "/job:worker/replica:2/device:CPU:1" }
+ """, gd)
+
+
+class ObjectWithName(object):
+
+ def __init__(self, name):
+ self._name = name
+
+ @property
+ def name(self):
+ return self._name
+
+
+class CollectionTest(test_util.TensorFlowTestCase):
+
+ def testadd_to_collection(self):
+ g = ops.Graph()
+ g.add_to_collection("key", 12)
+ g.add_to_collection("other", "foo")
+ g.add_to_collection("key", 34)
+
+ # Only blank1 (named "prefix/foo") matches the scoped lookup below.
+ g.add_to_collection("blah", 27)
+ blank1 = ObjectWithName("prefix/foo")
+ g.add_to_collection("blah", blank1)
+ blank2 = ObjectWithName("junk/foo")
+ g.add_to_collection("blah", blank2)
+
+ self.assertEquals(["foo"], g.get_collection("other"))
+ self.assertEquals([12, 34], g.get_collection("key"))
+ self.assertEquals([], g.get_collection("nothing"))
+ self.assertEquals([27, blank1, blank2], g.get_collection("blah"))
+ self.assertEquals([blank1], g.get_collection("blah", "prefix"))
+
+ def testDefaultGraph(self):
+ with ops.Graph().as_default():
+ ops.add_to_collection("key", 90)
+ ops.add_to_collection("key", 100)
+ # Collections are ordered.
+ self.assertEquals([90, 100], ops.get_collection("key"))
+
+
+def an_op(g):
+ return _apply_op(g, "an_op", [], [types.float32])
+
+
+ops.NoGradient("an_op")
+
+
+def copy_op(x):
+ return _apply_op(x.graph, "copy", [x], [x.dtype])
+
+
+@ops.RegisterGradient("copy")
+def _CopyGrad(op, x_grad):
+ _ = op
+ return x_grad
+
+
+@ops.RegisterGradient("copy_override")
+def _CopyOverrideGrad(op, x_grad):
+ _ = op
+ return x_grad
+
+
+class RegistrationTest(test_util.TensorFlowTestCase):
+
+ def testRegisterGradients(self):
+ g = ops.Graph()
+ x = an_op(g)
+ y = copy_op(x)
+ fn = ops.get_gradient_function(y.op)
+ self.assertEquals(_CopyGrad, fn)
+
+ def testOverrideGradients(self):
+ g = ops.Graph()
+ x = an_op(g)
+ with g.gradient_override_map({"copy": "copy_override"}):
+ y = copy_op(x)
+ fn = ops.get_gradient_function(y.op)
+ self.assertEquals(_CopyOverrideGrad, fn)
+
+ def testNonExistentOverride(self):
+ g = ops.Graph()
+ x = an_op(g)
+ with g.gradient_override_map({"copy": "unknown_override"}):
+ y = copy_op(x)
+ with self.assertRaisesRegexp(LookupError, "unknown_override"):
+ fn = ops.get_gradient_function(y.op)
+
+
+class ComparisonTest(test_util.TensorFlowTestCase):
+
+ def testMembershipAllowed(self):
+ g = ops.Graph()
+ t1 = _apply_op(g, "const", [], [types.float32], name="myop1")
+ t2 = _apply_op(g, "const", [], [types.float32], name="myop2")
+ self.assertTrue(isinstance(t1, ops.Tensor))
+ self.assertTrue(isinstance(t2, ops.Tensor))
+ self.assertTrue(t1 in [t1])
+ self.assertTrue(t1 not in [t2])
+
+
+class ControlDependenciesTest(test_util.TensorFlowTestCase):
+
+ def testBasic(self):
+ g = ops.Graph()
+ a = _apply_op(g, "const", [], [types.float32])
+ b = _apply_op(g, "const", [], [types.float32])
+ with g.control_dependencies([a]):
+ c = _apply_op(g, "const", [], [types.float32])
+ d = _apply_op(g, "identity", [b], [types.float32])
+ e = _apply_op(g, "identity", [c], [types.float32])
+
+ self.assertEqual(c.op.control_inputs, [a.op])
+ self.assertEqual(d.op.control_inputs, [a.op])
+ # e should be dominated by c.
+ self.assertEqual(e.op.control_inputs, [])
+
+ def testNested(self):
+ g = ops.Graph()
+ a_1 = _apply_op(g, "const", [], [types.float32])
+ a_2 = _apply_op(g, "const", [], [types.float32])
+ a_3 = _apply_op(g, "const", [], [types.float32])
+ a_4 = _apply_op(g, "const", [], [types.float32])
+
+ with g.control_dependencies([a_1, a_2, a_3, a_4]):
+ b_1 = _apply_op(g, "const", [], [types.float32])
+
+ with g.control_dependencies([a_1]):
+ with g.control_dependencies([a_2]):
+ with g.control_dependencies([a_3]):
+ with g.control_dependencies([a_4]):
+ b_2 = _apply_op(g, "const", [], [types.float32])
+
+ self.assertItemsEqual(
+ [a_1.op, a_2.op, a_3.op, a_4.op], b_1.op.control_inputs)
+ self.assertItemsEqual(b_1.op.control_inputs, b_2.op.control_inputs)
+
+ def testComplex(self):
+ g = ops.Graph()
+
+ # Usage pattern:
+ # * Nodes a_i are constants defined at the outermost scope, and are used
+ # as control inputs for the ith nested scope.
+ # * Nodes b_i are defined as Mul(a_3, a_4) at each scope.
+ # * Nodes c_i are defined as Mul(a_1, b_1) at each scope.
+ # * Nodes d_i are defined as Mul(b_i, c_i) at each scope.
+ # * Nodes e_i are defined as Mul(e_i-1, e_i-1) at each scope i > 1.
+
+ a_1 = _apply_op(g, "const", [], [types.float32])
+ a_2 = _apply_op(g, "const", [], [types.float32])
+ a_3 = _apply_op(g, "const", [], [types.float32])
+ a_4 = _apply_op(g, "const", [], [types.float32])
+
+ with g.control_dependencies([a_1]):
+ b_1 = _apply_op(g, "mul", [a_3, a_4], [types.float32])
+ c_1 = _apply_op(g, "mul", [a_1, b_1], [types.float32])
+ d_1 = _apply_op(g, "mul", [b_1, c_1], [types.float32])
+ e_1 = _apply_op(g, "const", [], [types.float32])
+ with g.control_dependencies([a_2]):
+ b_2 = _apply_op(g, "mul", [a_3, a_4], [types.float32])
+ c_2 = _apply_op(g, "mul", [a_1, b_1], [types.float32])
+ d_2 = _apply_op(g, "mul", [b_2, c_2], [types.float32])
+ e_2 = _apply_op(g, "mul", [e_1, e_1], [types.float32])
+ with g.control_dependencies([a_3]):
+ b_3 = _apply_op(g, "mul", [a_3, a_4], [types.float32])
+ c_3 = _apply_op(g, "mul", [a_1, b_1], [types.float32])
+ d_3 = _apply_op(g, "mul", [b_3, c_3], [types.float32])
+ e_3 = _apply_op(g, "mul", [e_2, e_2], [types.float32])
+ with g.control_dependencies([a_4]):
+ b_4 = _apply_op(g, "mul", [a_3, a_4], [types.float32])
+ c_4 = _apply_op(g, "mul", [a_1, b_1], [types.float32])
+ d_4 = _apply_op(g, "mul", [b_4, c_4], [types.float32])
+ e_4 = _apply_op(g, "mul", [e_3, e_3], [types.float32])
+
+ self.assertItemsEqual([a_1.op], b_1.op.control_inputs)
+ self.assertItemsEqual([a_1.op, a_2.op], b_2.op.control_inputs)
+ self.assertItemsEqual([a_1.op, a_2.op], b_3.op.control_inputs)
+ self.assertItemsEqual([a_1.op, a_2.op], b_4.op.control_inputs)
+
+ self.assertItemsEqual([], c_1.op.control_inputs)
+ self.assertItemsEqual([a_2.op], c_2.op.control_inputs)
+ self.assertItemsEqual([a_2.op, a_3.op], c_3.op.control_inputs)
+ self.assertItemsEqual([a_2.op, a_3.op, a_4.op], c_4.op.control_inputs)
+
+ self.assertItemsEqual([], d_1.op.control_inputs)
+ self.assertItemsEqual([], d_2.op.control_inputs)
+ self.assertItemsEqual([], d_3.op.control_inputs)
+ self.assertItemsEqual([], d_4.op.control_inputs)
+
+ self.assertItemsEqual([a_1.op], e_1.op.control_inputs)
+ self.assertItemsEqual([a_2.op], e_2.op.control_inputs)
+ self.assertItemsEqual([a_3.op], e_3.op.control_inputs)
+ self.assertItemsEqual([a_4.op], e_4.op.control_inputs)
+
+ def testRepeatedDependency(self):
+ g = ops.Graph()
+ a = g.create_op("foo", [], [types.float32, types.float32])
+ a_0, a_1 = a.outputs
+ with g.control_dependencies([a_0]):
+ b = _apply_op(g, "const", [], [types.float32])
+ with g.control_dependencies([a_1]):
+ c = _apply_op(g, "const", [], [types.float32])
+
+ self.assertEqual(b.op.control_inputs, [a])
+ self.assertEqual(c.op.control_inputs, [a])
+
+ def testNoControlDependencyWithDataDependency(self):
+ g = ops.Graph()
+ a = _apply_op(g, "const", [], [types.float32])
+ with g.control_dependencies([a]):
+ b = _apply_op(g, "identity", [a], [types.float32])
+
+ self.assertEqual(b.op.control_inputs, [])
+
+
+class GraphTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ ops.reset_default_graph()
+
+ def _AssertDefault(self, expected):
+ self.assertIs(expected, ops.get_default_graph())
+
+ def testGraphContextManager(self):
+ g0 = ops.Graph()
+ with g0.as_default() as g1:
+ self.assertIs(g0, g1)
+
+ def testDefaultGraph(self):
+ orig = ops.get_default_graph()
+ self._AssertDefault(orig)
+ g0 = ops.Graph()
+ self._AssertDefault(orig)
+ context_manager_0 = g0.as_default()
+ self._AssertDefault(orig)
+ with context_manager_0 as g0:
+ self._AssertDefault(g0)
+ with ops.Graph().as_default() as g1:
+ self._AssertDefault(g1)
+ self._AssertDefault(g0)
+ self._AssertDefault(orig)
+
+ def testAsGraphElementConversions(self):
+ class ConvertibleObj(object):
+
+ def _as_graph_element(self):
+ return "const:0"
+
+ class NonConvertibleObj(object):
+
+ pass
+
+ g = ops.Graph()
+ a = _apply_op(g, "const", [], [types.float32])
+ self.assertEqual(a, g.as_graph_element(ConvertibleObj()))
+ with self.assertRaises(TypeError):
+ g.as_graph_element(NonConvertibleObj())
+
+ def testAssertSameGraph(self):
+ g0 = ops.Graph()
+ a = g0.create_op("a", [], [types.float32])
+ b = g0.create_op("b", [], [types.float32])
+ ops.assert_same_graph([a, b])
+ ops.assert_same_graph([a, b], g0)
+ g1 = ops.Graph()
+ c = g1.create_op("c", [], [types.float32])
+ self.assertRaises(ValueError, ops.assert_same_graph, [a, b, c])
+ self.assertRaises(ValueError, ops.assert_same_graph, [c], g0)
+ self.assertRaises(ValueError, ops.assert_same_graph, [a], g1)
+
+ sparse = ops.SparseTensor(
+ _apply_op(g0, "const", [], [types.int64]),
+ _apply_op(g0, "const", [], [types.float32]),
+ _apply_op(g0, "const", [], [types.int64]))
+ ops.assert_same_graph([sparse, a, b])
+ ops.assert_same_graph([sparse, a, b], g0)
+ self.assertRaises(ValueError, ops.assert_same_graph, [sparse, a, c])
+ self.assertRaises(ValueError, ops.assert_same_graph, [sparse, a, c], g1)
+
+ops.RegisterShape("KernelLabel")(common_shapes.scalar_shape)
+
+
+class KernelLabelTest(test_util.TensorFlowTestCase):
+
+ def testNoLabel(self):
+ with self.test_session():
+ self.assertAllEqual("My label is: default",
+ test_kernel_label_op.kernel_label().eval())
+
+ def testLabelMap(self):
+ with self.test_session() as sess:
+ default_1 = test_kernel_label_op.kernel_label()
+ # pylint: disable=protected-access
+ with sess.graph._kernel_label_map({"KernelLabel": "overload_1"}):
+ overload_1_1 = test_kernel_label_op.kernel_label()
+ with sess.graph._kernel_label_map({"KernelLabel": "overload_2"}):
+ overload_2 = test_kernel_label_op.kernel_label()
+ with sess.graph._kernel_label_map({"KernelLabel": ""}):
+ default_2 = test_kernel_label_op.kernel_label()
+ overload_1_2 = test_kernel_label_op.kernel_label()
+ # pylint: enable=protected-access
+ default_3 = test_kernel_label_op.kernel_label()
+
+ self.assertAllEqual("My label is: default", default_1.eval())
+ self.assertAllEqual("My label is: default", default_2.eval())
+ self.assertAllEqual("My label is: default", default_3.eval())
+ self.assertAllEqual("My label is: overload_1", overload_1_1.eval())
+ self.assertAllEqual("My label is: overload_1", overload_1_2.eval())
+ self.assertAllEqual("My label is: overload_2", overload_2.eval())
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/framework/python_op_gen.cc b/tensorflow/python/framework/python_op_gen.cc
new file mode 100644
index 0000000000..5c1b4462d5
--- /dev/null
+++ b/tensorflow/python/framework/python_op_gen.cc
@@ -0,0 +1,678 @@
+#include "tensorflow/python/framework/python_op_gen.h"
+
+#include <stdio.h>
+#include <unordered_map>
+#include "tensorflow/core/framework/attr_value.pb.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/framework/op_def_util.h"
+#include "tensorflow/core/framework/op_gen_lib.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/lib/gtl/map_util.h"
+#include "tensorflow/core/lib/gtl/stl_util.h"
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+namespace {
+
+const int kRightMargin = 78;
+
+bool IsPythonReserved(const string& s) {
+ static const std::set<string>* const kPythonReserved = new std::set<string>(
+ {// Keywords in Python, from:
+ // import keyword
+ // print keyword.kwlist
+ "and", "as", "assert", "break", "class", "continue", "def", "del",
+ "elif", "else", "except", "exec", "finally", "for", "from", "global",
+ "if", "import", "in", "is", "lambda", "not", "or", "pass", "print",
+ "raise", "return", "try", "while", "with", "yield",
+ // Built-in functions and types in Python, from:
+ // [x for x in dir(__builtins__) if not x[0].islower()]
+ "ArithmeticError", "AssertionError", "AttributeError", "BaseException",
+ "BufferError", "BytesWarning", "DeprecationWarning", "EOFError",
+ "Ellipsis", "EnvironmentError", "Exception", "False",
+ "FloatingPointError", "FutureWarning", "GeneratorExit", "IOError",
+ "ImportError", "ImportWarning", "IndentationError", "IndexError",
+ "KeyError", "KeyboardInterrupt", "LookupError", "MemoryError",
+ "NameError", "None", "NotImplemented", "NotImplementedError", "OSError",
+ "OverflowError", "PendingDeprecationWarning", "ReferenceError",
+ "RuntimeError", "RuntimeWarning", "StandardError", "StopIteration",
+ "SyntaxError", "SyntaxWarning", "SystemError", "SystemExit", "TabError",
+ "True", "TypeError", "UnboundLocalError", "UnicodeDecodeError",
+ "UnicodeEncodeError", "UnicodeError", "UnicodeTranslateError",
+ "UnicodeWarning", "UserWarning", "ValueError", "Warning",
+ "ZeroDivisionError", "__debug__", "__doc__", "__import__", "__name__",
+ "__package__",
+ // Imports and symbols used in the generated code:
+ "_op_def_lib", "text_format", "op_def_pb2", "op_def_library", "ops"});
+
+ return kPythonReserved->count(s) > 0;
+}
+
+// Add a _ to the end of s if necessary to avoid a Python keyword or built-in.
+string AvoidPythonReserved(const string& s) {
+ if (IsPythonReserved(s)) return strings::StrCat(s, "_");
+ return s;
+}
+
+// Indent the first line by "initial" spaces and all following lines
+// by "rest" spaces.
+string Indent(int initial, int rest, StringPiece in) {
+ // TODO(josh11b): Also word-wrapping?
+ string copy(in.data(), in.size());
+ str_util::StripTrailingWhitespace(&copy);
+ std::vector<string> v = str_util::Split(copy, '\n');
+
+ string result;
+ bool first = true;
+ for (const string& line : v) {
+ if (first) {
+ result = strings::StrCat(Spaces(initial), line, "\n");
+ first = false;
+ } else {
+ if (line.empty()) {
+ strings::StrAppend(&result, "\n");
+ } else {
+ strings::StrAppend(&result, Spaces(rest), line, "\n");
+ }
+ }
+ }
+ return result;
+}
+
+// Appends "append" to *dest, preceded by a space if the first line of the
+// result stays within "width" characters, or by a newline otherwise.
+void AppendWithinWidth(string* dest, StringPiece append, int width) {
+ auto first_line = append.find('\n');
+ if (first_line == string::npos) first_line = append.size();
+ if (dest->size() + first_line + 1 /* space */ > static_cast<size_t>(width)) {
+ strings::StrAppend(dest, "\n", append);
+ } else {
+ strings::StrAppend(dest, " ", append);
+ }
+}
+
+void RemoveDescriptionsFromOpDef(OpDef* op_def) {
+ for (int i = 0; i < op_def->input_arg_size(); ++i) {
+ op_def->mutable_input_arg(i)->clear_description();
+ }
+ for (int i = 0; i < op_def->output_arg_size(); ++i) {
+ op_def->mutable_output_arg(i)->clear_description();
+ }
+ for (int i = 0; i < op_def->attr_size(); ++i) {
+ op_def->mutable_attr(i)->clear_description();
+ }
+ op_def->clear_summary();
+ op_def->clear_description();
+}
+
+// Like DataTypeString() but uses the Python names for the
+// float types.
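+// For example, DT_FLOAT maps to "float32" and DT_DOUBLE to "float64";
+// all other types fall through to DataTypeString().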
+string PythonDataTypeString(DataType dtype) {
+ switch (dtype) {
+ case DT_FLOAT:
+ return "float32";
+ case DT_DOUBLE:
+ return "float64";
+ default:
+ return DataTypeString(dtype);
+ }
+}
+
+string TypeString(DataType dtype, bool ref) {
+ if (ref) {
+ return strings::StrCat("mutable `", PythonDataTypeString(dtype), "`");
+ } else {
+ return strings::StrCat("`", PythonDataTypeString(dtype), "`");
+ }
+}
+
+string TypeListString(const AttrValue& value) {
+ string ret;
+ for (int t : value.list().type()) {
+ if (!ret.empty()) strings::StrAppend(&ret, ", ");
+ DataType dtype = static_cast<DataType>(t);
+ if (IsRefType(dtype)) {
+ strings::StrAppend(&ret, PythonDataTypeString(RemoveRefType(dtype)),
+ " mutable");
+ } else {
+ strings::StrAppend(&ret, "`", PythonDataTypeString(dtype), "`");
+ }
+ }
+ return ret;
+}
+
+string SingleTensorName(DataType dtype, bool is_ref) {
+ const string type_str = TypeString(dtype, is_ref);
+ return strings::StrCat("A `Tensor` of type ", type_str, ".");
+}
+
+const char kUnknownTensorType[] = {"A `Tensor`."};
+
+string ArgTypeName(const OpDef& op_def, const OpDef::ArgDef& arg,
+ const std::unordered_map<string, string>& inferred_attrs,
+ bool is_output) {
+ if (!arg.number_attr().empty()) {
+ // N Tensors with the same type
+ const string* original_arg =
+ gtl::FindOrNull(inferred_attrs, arg.number_attr());
+ string prefix;
+ if (original_arg == nullptr) {
+ prefix = strings::StrCat("A list of `", arg.number_attr(), "`");
+ } else if (*original_arg == arg.name()) {
+ const OpDef::AttrDef* attr = FindAttr(arg.number_attr(), op_def);
+ if (attr->has_minimum() && attr->minimum() > 0) {
+ prefix = strings::StrCat("A list of at least ", attr->minimum());
+ } else {
+ prefix = "A list of";
+ }
+ } else {
+ prefix = strings::StrCat(
+ "A list with the same number of `Tensor` objects as `",
+ AvoidPythonReserved(*original_arg), "` of");
+ }
+
+ if (arg.type() != DT_INVALID) {
+ return strings::StrCat(prefix, " `Tensor` objects of type ",
+ TypeString(arg.type(), arg.is_ref()), ".");
+ } else {
+ original_arg = gtl::FindOrNull(inferred_attrs, arg.type_attr());
+ if (arg.is_ref()) {
+ strings::StrAppend(&prefix, " mutable");
+ }
+ if (original_arg == nullptr) {
+ return strings::StrCat(prefix, " `Tensor` objects of type ",
+ arg.type_attr(), ".");
+ } else if (*original_arg == arg.name()) {
+ const OpDef::AttrDef* attr = FindAttr(arg.type_attr(), op_def);
+ if (attr->has_allowed_values()) {
+ return strings::StrCat(prefix,
+ " `Tensor` objects of the same type in: ",
+ TypeListString(attr->allowed_values()), ".");
+ } else {
+ return strings::StrCat(prefix, " `Tensor` objects of the same type.");
+ }
+ } else {
+ return strings::StrCat(prefix, " `Tensor` objects of the same type as ",
+ AvoidPythonReserved(*original_arg), ".");
+ }
+ }
+ } else if (!arg.type_attr().empty() || !arg.type_list_attr().empty()) {
+ const bool is_list = !arg.type_list_attr().empty();
+ const string attr_name = is_list ? arg.type_list_attr() : arg.type_attr();
+ const OpDef::AttrDef* attr = FindAttr(attr_name, op_def);
+ const string mutable_str = arg.is_ref() ? "mutable " : "";
+ const string prefix =
+ is_list ? strings::StrCat("A list of ", mutable_str, "`Tensor` objects")
+ : strings::StrCat("A ", mutable_str, "`Tensor`");
+ const string* original_arg = gtl::FindOrNull(inferred_attrs, attr_name);
+ if (original_arg == nullptr) {
+ return strings::StrCat(prefix, " of type `", attr_name, "`.");
+ } else if (*original_arg == arg.name()) {
+ if (attr->has_allowed_values()) {
+ if (is_list) {
+ return strings::StrCat(prefix, " with types from: ",
+ TypeListString(attr->allowed_values()), ".");
+ } else {
+ return strings::StrCat(
+ prefix, is_output ? ". Has one of the following types: "
+ : ". Must be one of the following types: ",
+ TypeListString(attr->allowed_values()), ".");
+ }
+ } else {
+ return strings::StrCat(prefix, ".");
+ }
+ } else {
+ return strings::StrCat(prefix,
+ is_output ? ". Has the same type as `"
+ : ". Must have the same type as `",
+ AvoidPythonReserved(*original_arg), "`.");
+ }
+ } else {
+ return SingleTensorName(arg.type(), arg.is_ref());
+ }
+}
+
+void PrintReturns(const OpDef& op_def,
+ const std::vector<string>& output_type_string) {
+ DCHECK_EQ(op_def.output_arg_size(), output_type_string.size());
+ const int num_outs = op_def.output_arg_size();
+ printf("\n Returns:\n");
+ if (num_outs == 0) {
+ printf(" The created Operation.\n");
+ } else {
+ if (num_outs == 1) {
+ StringPiece description = op_def.output_arg(0).description();
+ if (ConsumeEquals(&description)) { // Skip the generated type info.
+ printf("%s", Indent(4, 4, description).c_str());
+ } else {
+ // Special case of one output, don't use the name of the output unless
+ // there is no description.
+ string desc = output_type_string.empty() ? kUnknownTensorType
+ : output_type_string[0];
+ if (desc == kUnknownTensorType) {
+ // Special case where we don't understand how the output tensor type
+ // depends on the input tensor types, just use the output arg
+ // description if we can.
+ if (!description.empty()) {
+ desc = op_def.output_arg(0).description();
+ } else if (!op_def.output_arg(0).name().empty()) {
+ desc = strings::StrCat(" The ", op_def.output_arg(0).name(),
+ " `Tensor`.");
+ }
+ } else if (!description.empty()) {
+ AppendWithinWidth(&desc, description, kRightMargin - 4 /* indent */);
+ }
+ printf("%s", Indent(4, 4, desc).c_str());
+ }
+ } else {
+ std::vector<string> out_names(num_outs);
+ for (int i = 0; i < num_outs; ++i) {
+ if (!op_def.output_arg(i).name().empty()) {
+ out_names[i] = op_def.output_arg(i).name();
+ } else {
+ out_names[i] = strings::StrCat("output", i);
+ }
+ }
+ printf(" A tuple of `Tensor` objects (%s).\n",
+ str_util::Join(out_names, ", ").c_str());
+ for (int i = 0; i < num_outs; ++i) {
+ string desc = strings::StrCat(out_names[i], ": ");
+ StringPiece description = op_def.output_arg(i).description();
+ if (ConsumeEquals(&description)) { // Skip the generated type info.
+ strings::StrAppend(&desc, description);
+ } else {
+ const string type = static_cast<size_t>(i) < output_type_string.size()
+ ? output_type_string[i]
+ : kUnknownTensorType;
+ if (!description.empty()) {
+ if (type == kUnknownTensorType) {
+ // Special case where we don't understand how the output tensor
+ // type depends on the input tensor types, so we just use the
+ // output arg description.
+ strings::StrAppend(&desc, description);
+ } else {
+ strings::StrAppend(&desc, type, " ", description);
+ }
+ } else {
+ strings::StrAppend(&desc, type);
+ }
+ }
+ printf("%s", Indent(4, 6, desc).c_str());
+ }
+ }
+ }
+}
+
+string StringToPython(const string& str) {
+ return strings::StrCat("\"", str_util::CEscape(str), "\"");
+}
+
+string DataTypeToPython(DataType dtype) {
+ return strings::StrCat("tf.", PythonDataTypeString(dtype));
+}
+
+string ShapeToPython(const TensorShapeProto& shape) {
+ string python = "[";
+ for (const auto& dim : shape.dim()) {
+ if (python.size() > 1) strings::StrAppend(&python, ", ");
+ if (!dim.name().empty()) {
+ strings::StrAppend(&python, "(", StringToPython(dim.name()), ", ",
+ dim.size(), ")");
+ } else {
+ strings::StrAppend(&python, dim.size());
+ }
+ }
+ strings::StrAppend(&python, "]");
+ return python;
+}
+
+string AttrListToPython(const AttrValue& value) {
+ string ret;
+ if (value.list().s_size() > 0) {
+ for (int i = 0; i < value.list().s_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, StringToPython(value.list().s(i)));
+ }
+ } else if (value.list().i_size() > 0) {
+ for (int i = 0; i < value.list().i_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, value.list().i(i));
+ }
+ } else if (value.list().f_size() > 0) {
+ for (int i = 0; i < value.list().f_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, value.list().f(i));
+ }
+ } else if (value.list().b_size() > 0) {
+ for (int i = 0; i < value.list().b_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, value.list().b(i) ? "True" : "False");
+ }
+ } else if (value.list().type_size() > 0) {
+ for (int i = 0; i < value.list().type_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, DataTypeToPython(value.list().type(i)));
+ }
+ } else if (value.list().shape_size() > 0) {
+ for (int i = 0; i < value.list().shape_size(); ++i) {
+ if (i > 0) strings::StrAppend(&ret, ", ");
+ strings::StrAppend(&ret, ShapeToPython(value.list().shape(i)));
+ }
+ }
+ return ret;
+}
+
+string AttrValueToPython(const string& type, const AttrValue& value) {
+ if (type == "string") {
+ return StringToPython(value.s());
+ } else if (type == "int") {
+ return strings::StrCat(value.i());
+ } else if (type == "float") {
+ return strings::StrCat(value.f());
+ } else if (type == "bool") {
+ return value.b() ? "True" : "False";
+ } else if (type == "type") {
+ return DataTypeToPython(value.type());
+ } else if (type == "shape") {
+ return ShapeToPython(value.shape());
+ } else {
+ return strings::StrCat("[", AttrListToPython(value), "]");
+ }
+}
+
+// Requires: ValidateOpDef(op_def).ok()
+void PrintPythonOp(const OpDef& op_def, bool is_hidden, string op_name) {
+ // Map from attr name to the first input arg it is inferred from.
+ std::unordered_map<string, string> inferred_attrs;
+ // This has all the input args followed by those attrs that don't have
+ // defaults.
+ std::vector<string> args_no_default;
+ // The parameters with defaults (these have to be listed after those without).
+ // No input args are included, just attrs and the graph ("g") parameter.
+ std::vector<string> args_with_defaults;
+ for (int i = 0; i < op_def.input_arg_size(); ++i) {
+ const auto& arg(op_def.input_arg(i));
+ args_no_default.push_back(arg.name());
+ if (!arg.type_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.type_attr(), arg.name());
+ } else if (!arg.type_list_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.type_list_attr(),
+ arg.name());
+ }
+ if (!arg.number_attr().empty()) {
+ gtl::InsertIfNotPresent(&inferred_attrs, arg.number_attr(), arg.name());
+ }
+ }
+ for (int i = 0; i < op_def.attr_size(); ++i) {
+ const auto& attr(op_def.attr(i));
+ // Do not add inferred attrs to the Python function signature.
+ if (inferred_attrs.find(attr.name()) == inferred_attrs.end()) {
+ if (attr.has_default_value()) {
+ args_with_defaults.push_back(attr.name());
+ } else {
+ args_no_default.push_back(attr.name());
+ }
+ }
+ }
+
+ // Save the list of attr parameters (attrs that won't be inferred),
+ // those with defaults go at the end.
+ std::vector<string> attrs;
+  // Get the attrs in the order we want by taking the attrs without defaults
+  // from the end of args_no_default, and then appending args_with_defaults
+  // (done before "g" gets added to args_with_defaults, so it only has attrs).
+ attrs.reserve(args_no_default.size() - op_def.input_arg_size() +
+ args_with_defaults.size());
+ attrs.insert(attrs.end(), args_no_default.begin() + op_def.input_arg_size(),
+ args_no_default.end());
+ attrs.insert(attrs.end(), args_with_defaults.begin(),
+ args_with_defaults.end());
+
+ std::vector<string> param_names;
+ param_names.reserve(args_no_default.size() + args_with_defaults.size());
+ string parameters;
+ for (const string& name : args_no_default) {
+ if (!parameters.empty()) strings::StrAppend(&parameters, ", ");
+ const string param = AvoidPythonReserved(name);
+ strings::StrAppend(&parameters, param);
+ param_names.push_back(param);
+ }
+ for (const string& name : args_with_defaults) {
+ if (!parameters.empty()) strings::StrAppend(&parameters, ", ");
+ const string param = AvoidPythonReserved(name);
+ strings::StrAppend(&parameters, param, "=None");
+ param_names.push_back(param);
+ }
+ const bool has_args = args_no_default.size() + args_with_defaults.size() > 0;
+
+ // Print: def Function(parameters):
+ const string lower_op_name = strings::StrCat(is_hidden ? "_" : "", op_name);
+
+ const string def_prefix = strings::StrCat("def ", lower_op_name, "(");
+ const string def_suffix =
+ strings::StrCat(parameters, has_args ? ", " : "", "name=None):");
+
+ printf("%s\n", WordWrap(def_prefix, def_suffix, kRightMargin).c_str());
+
+ // Format the Op's descriptions so that it can be a Python docstring.
+ string comment;
+ if (op_def.summary().empty()) {
+ comment = "TODO: add doc.\n";
+ } else {
+ comment = strings::StrCat(op_def.summary(), "\n");
+ if (!op_def.description().empty()) {
+ strings::StrAppend(&comment, "\n", Indent(2, 2, op_def.description()));
+ }
+ }
+
+ printf(R"( r"""%s
+ Args:
+)",
+ comment.c_str());
+
+ // Inputs
+ for (int i = 0; i < op_def.input_arg_size(); ++i) {
+ const auto& arg(op_def.input_arg(i));
+ StringPiece description = op_def.input_arg(i).description();
+ string desc;
+ if (ConsumeEquals(&description)) { // Skip the generated type info.
+ desc = strings::StrCat(param_names[i], ": ");
+ } else {
+ desc = strings::StrCat(param_names[i], ": ",
+ ArgTypeName(op_def, arg, inferred_attrs, false));
+ }
+ if (!description.empty()) {
+ AppendWithinWidth(&desc, description, kRightMargin - 4 /* indent */);
+ }
+ printf("%s", Indent(4, 6, desc).c_str());
+ }
+
+ // Attrs
+ for (const string& name : attrs) {
+ const auto& attr = *FindAttr(name, op_def);
+ string desc = strings::StrCat(AvoidPythonReserved(name), ": ");
+
+ static const char* const kAttrTypeName[][2] = {
+ {"string", "`string`"},
+ {"list(string)", "list of `strings`"},
+ {"int", "`int`"},
+ {"list(int)", "list of `ints`"},
+ {"float", "`float`"},
+ {"list(float)", "list of `floats`"},
+ {"bool", "`bool`"},
+ {"list(bool)", "list of `bools`"},
+ {"type", "`tf.DType`"},
+ {"list(type)", "list of `tf.DTypes`"},
+ {"shape", "`tf.TensorShape` or list of `ints`"},
+ {"list(shape)",
+ "list of shapes (each a `tf.TensorShape` or list of `ints`)"},
+ };
+ for (size_t i = 0; i < TF_ARRAYSIZE(kAttrTypeName); ++i) {
+ if (attr.type() == kAttrTypeName[i][0]) {
+ string s;
+ if (attr.has_default_value()) {
+ s = strings::StrCat("optional ", kAttrTypeName[i][1]);
+ } else {
+ s = kAttrTypeName[i][1];
+ }
+ if (s[0] == 'o' || (s[0] == '`' && (s[1] == 'i' || s[1] == 'o'))) {
+ strings::StrAppend(&desc, "An ", s);
+ } else {
+ strings::StrAppend(&desc, "A ", s);
+ }
+ break;
+ }
+ }
+
+ if (attr.has_allowed_values()) {
+ strings::StrAppend(&desc, " from: `",
+ AttrListToPython(attr.allowed_values()), "`");
+ }
+
+ if (attr.has_minimum()) {
+ if (attr.type() == "int") {
+ strings::StrAppend(&desc, " that is `>= ", attr.minimum(), "`");
+ } else if (attr.minimum() > 0) {
+ strings::StrAppend(&desc, " that has length `>= ", attr.minimum(), "`");
+ }
+ }
+
+ strings::StrAppend(&desc, ".");
+
+ if (attr.has_default_value()) {
+ strings::StrAppend(&desc, " Defaults to `",
+ AttrValueToPython(attr.type(), attr.default_value()),
+ "`.");
+ }
+
+ if (!attr.description().empty()) {
+ AppendWithinWidth(&desc, attr.description(),
+ kRightMargin - 4 /* indent */);
+ }
+ printf("%s", Indent(4, 6, desc).c_str());
+ }
+
+ printf(" name: A name for the operation (optional).\n");
+
+ std::vector<string> output_type_string;
+ output_type_string.reserve(op_def.output_arg_size());
+ for (int i = 0; i < op_def.output_arg_size(); ++i) {
+ output_type_string.push_back(
+ ArgTypeName(op_def, op_def.output_arg(i), inferred_attrs, true));
+ }
+ PrintReturns(op_def, output_type_string);
+
+ string return_prefix = strings::StrCat(" return _op_def_lib.apply_op(");
+ string return_args = strings::StrCat("\"", op_def.name(), "\", ");
+ for (size_t i = 0; i < param_names.size(); ++i) {
+ strings::StrAppend(&return_args, param_names[i], "=", param_names[i], ", ");
+ }
+ strings::StrAppend(&return_args, "name=name)");
+
+ printf(R"( """
+%s
+)",
+ // Wrap the arguments, and indent to the (.
+ WordWrap(return_prefix, return_args, kRightMargin).c_str());
+
+ printf("\n\n");
+}
+
+void GenerateLowerCaseOpName(const string& str, string* result) {
+ char joiner = '_';
+ int last_index = str.size() - 1;
+ for (int i = 0; i <= last_index; ++i) {
+ char c = str[i];
+    // Emit a joiner only at a word boundary: when an upper-case letter
+    // follows a lower-case one, or is itself followed by a lower-case one.
+ if (isupper(c) && (i > 0)) {
+ if (islower(str[i - 1]) || ((i < last_index) && islower(str[i + 1]))) {
+ result->push_back(joiner);
+ }
+ }
+ result->push_back(tolower(c));
+ }
+}
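To make the joiner rule in GenerateLowerCaseOpName concrete, here is a minimal Python sketch of the same CamelCase-to-snake_case conversion; the op names used below are only illustrative.

```python
def lower_case_op_name(name):
    """Mirror of GenerateLowerCaseOpName: emit '_' before an upper-case letter
    when the previous character is lower-case, or when the next character is
    lower-case (never before the first character)."""
    out = []
    last = len(name) - 1
    for i, c in enumerate(name):
        if c.isupper() and i > 0:
            if name[i - 1].islower() or (i < last and name[i + 1].islower()):
                out.append('_')
        out.append(c.lower())
    return ''.join(out)

print(lower_case_op_name("MatMul"))   # mat_mul
print(lower_case_op_name("Conv2D"))   # conv2d
print(lower_case_op_name("BiasAdd"))  # bias_add
```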
+
+} // namespace
+
+void PrintPythonOps(const OpList& ops, const string& hidden_ops,
+ bool require_shapes) {
+ // Header
+ // TODO(josh11b): Mention the library for which wrappers are being generated.
+ printf(R"("""Python wrappers around Brain.
+
+This file is MACHINE GENERATED! Do not edit.
+"""
+
+from google.protobuf import text_format
+
+from tensorflow.core.framework import op_def_pb2
+from tensorflow.python.framework import op_def_registry
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import op_def_library
+
+
+)");
+
+ std::vector<string> hidden_vec = str_util::Split(hidden_ops, ',');
+
+ // We'll make a copy of ops that filters out descriptions.
+ OpList cleaned_ops;
+ auto out = cleaned_ops.mutable_op();
+ out->Reserve(ops.op_size());
+ for (const auto& op_def : ops.op()) {
+ bool is_hidden = false;
+ for (const string& hidden : hidden_vec) {
+ if (op_def.name() == hidden) {
+ is_hidden = true;
+ break;
+ }
+ }
+
+ // PrintPythonOp(op_def, is_hidden, op_def.name());
+ string lower_case_name;
+ GenerateLowerCaseOpName(op_def.name(), &lower_case_name);
+
+ // When users create custom python wrappers, they may link in the
+ // default op registry by accident, and because they can't
+ // enumerate all 'hidden' symbols, this guard is to prevent
+ // instantiating a python reserved word in their wrapper.
+ if (!is_hidden && IsPythonReserved(lower_case_name)) {
+ continue;
+ }
+
+ PrintPythonOp(op_def, is_hidden, lower_case_name);
+
+ if (!require_shapes) {
+ printf("ops.RegisterShape(\"%s\")(None)\n", op_def.name().c_str());
+ }
+
+ auto added = out->Add();
+ *added = op_def;
+ RemoveDescriptionsFromOpDef(added);
+ }
+
+ printf(R"(def _InitOpDefLibrary():
+ op_list = op_def_pb2.OpList()
+ text_format.Merge(_InitOpDefLibrary.op_list_ascii, op_list)
+ op_def_registry.register_op_list(op_list)
+ op_def_lib = op_def_library.OpDefLibrary()
+ op_def_lib.add_op_list(op_list)
+ return op_def_lib
+
+
+_InitOpDefLibrary.op_list_ascii = """%s"""
+
+
+_op_def_lib = _InitOpDefLibrary()
+)",
+ cleaned_ops.DebugString().c_str());
+}
+
+} // namespace tensorflow
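Putting the generator together: for each visible op it emits a wrapper with a `def` line, a raw docstring built from the op's summary plus Args and Returns sections, and a body that forwards to `_op_def_lib.apply_op`. A rough sketch of the generated output follows; the op name, arguments, and docstring text are invented for illustration.

```python
def my_op(x, dtype, name=None):
  r"""TODO: add doc.

  Args:
    x: A `Tensor`.
    dtype: A `tf.DType`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor` of type `dtype`.
  """
  return _op_def_lib.apply_op("MyOp", x=x, dtype=dtype, name=name)
```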
diff --git a/tensorflow/python/framework/python_op_gen.h b/tensorflow/python/framework/python_op_gen.h
new file mode 100644
index 0000000000..488f7431e0
--- /dev/null
+++ b/tensorflow/python/framework/python_op_gen.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_PYTHON_FRAMEWORK_PYTHON_OP_GEN_H_
+#define TENSORFLOW_PYTHON_FRAMEWORK_PYTHON_OP_GEN_H_
+
+#include <string>
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/platform/port.h"
+
+namespace tensorflow {
+
+// Result is printed to stdout. hidden_ops should be a comma-separated
+// list of Op names that should get a leading _ in the output.
+void PrintPythonOps(const OpList& ops, const string& hidden_ops,
+ bool require_shapes);
+
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PYTHON_FRAMEWORK_PYTHON_OP_GEN_H_
diff --git a/tensorflow/python/framework/python_op_gen_main.cc b/tensorflow/python/framework/python_op_gen_main.cc
new file mode 100644
index 0000000000..29afe35598
--- /dev/null
+++ b/tensorflow/python/framework/python_op_gen_main.cc
@@ -0,0 +1,30 @@
+#include "tensorflow/python/framework/python_op_gen.h"
+
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_def.pb.h"
+#include "tensorflow/core/platform/init_main.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+namespace {
+
+void PrintAllPythonOps(const char* hidden, bool require_shapes) {
+ OpList ops;
+ OpRegistry::Global()->Export(false, &ops);
+ PrintPythonOps(ops, hidden, require_shapes);
+}
+
+} // namespace
+} // namespace tensorflow
+
+int main(int argc, char* argv[]) {
+ tensorflow::port::InitMain(argv[0], &argc, &argv);
+ if (argc == 2) {
+ tensorflow::PrintAllPythonOps("", std::string(argv[1]) == "1");
+ } else if (argc == 3) {
+ tensorflow::PrintAllPythonOps(argv[1], std::string(argv[2]) == "1");
+ } else {
+ return -1;
+ }
+ return 0;
+}
diff --git a/tensorflow/python/framework/random_seed.py b/tensorflow/python/framework/random_seed.py
new file mode 100644
index 0000000000..d0ffee7042
--- /dev/null
+++ b/tensorflow/python/framework/random_seed.py
@@ -0,0 +1,136 @@
+"""For seeding individual ops based on a graph-level seed.
+"""
+
+from tensorflow.python.framework import ops
+
+
+_DEFAULT_GRAPH_SEED = 87654321
+
+
+def get_seed(op_seed):
+ """Returns the local seeds an operation should use given an op-specific seed.
+
+  Given an operation-specific seed, `op_seed`, this helper function returns
+  two seeds derived from the graph-level and op-level seeds. Many random
+  operations internally use the two seeds to allow the user to change the
+  seed globally for a graph, or for only specific operations.
+
+ For details on how the graph-level seed interacts with op seeds, see
+ [`set_random_seed`](constant_op.md#set_random_seed).
+
+ Args:
+ op_seed: integer.
+
+ Returns:
+ A tuple of two integers that should be used for the local seed of this
+ operation.
+ """
+ graph_seed = ops.get_default_graph().seed
+ if graph_seed is not None:
+ if op_seed is not None:
+ return graph_seed, op_seed
+ else:
+ return graph_seed, ops.get_default_graph()._last_id
+ else:
+ if op_seed is not None:
+ return _DEFAULT_GRAPH_SEED, op_seed
+ else:
+ return None, None
+
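The four graph-seed/op-seed combinations handled by `get_seed` can be summarized with a standalone sketch; the helper below only mirrors the branching shown above, and the `last_id` argument is a stand-in for the graph's internal op counter.

```python
_DEFAULT_GRAPH_SEED = 87654321  # same constant as in this module

def get_seed_sketch(graph_seed, op_seed, last_id=0):
    # Graph seed set: use it, falling back to the graph's last op id when no
    # op seed is given. Graph seed unset: use the default graph seed only if
    # an op seed was supplied, otherwise stay fully non-deterministic.
    if graph_seed is not None:
        return graph_seed, op_seed if op_seed is not None else last_id
    return (_DEFAULT_GRAPH_SEED, op_seed) if op_seed is not None else (None, None)

print(get_seed_sketch(None, None))     # (None, None): random each run
print(get_seed_sketch(1234, None, 7))  # (1234, 7): derived from the graph
print(get_seed_sketch(None, 42))       # (87654321, 42): default graph seed
print(get_seed_sketch(1234, 42))       # (1234, 42): both seeds used
```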
+
+def set_random_seed(seed):
+ """Sets the graph-level random seed.
+
+ Operations that rely on a random seed actually derive it from two seeds:
+ the graph-level and operation-level seeds. This sets the graph-level seed.
+
+  Its interactions with operation-level seeds are as follows:
+
+ 1. If neither the graph-level nor the operation seed is set:
+ A random seed is used for this op.
+ 2. If the graph-level seed is set, but the operation seed is not:
+ The system deterministically picks an operation seed in conjunction
+ with the graph-level seed so that it gets a unique random sequence.
+ 3. If the graph-level seed is not set, but the operation seed is set:
+ A default graph-level seed and the specified operation seed are used to
+ determine the random sequence.
+ 4. If both the graph-level and the operation seed are set:
+ Both seeds are used in conjunction to determine the random sequence.
+
+ To illustrate the user-visible effects, consider these examples:
+
+ To generate different sequences across sessions, set neither
+ graph-level nor op-level seeds:
+
+ ```python
+ a = tf.random_uniform([1])
+ b = tf.random_normal([1])
+
+ print "Session 1"
+ with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+ print "Session 2"
+ with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A3'
+ print sess2.run(a) # generates 'A4'
+ print sess2.run(b) # generates 'B3'
+ print sess2.run(b) # generates 'B4'
+ ```
+
+ To generate the same repeatable sequence for an op across sessions, set the
+ seed for the op:
+
+ ```python
+ a = tf.random_uniform([1], seed=1)
+ b = tf.random_normal([1])
+
+ # Repeatedly running this block with the same graph will generate the same
+ # sequence of values for 'a', but different sequences of values for 'b'.
+ print "Session 1"
+ with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+ print "Session 2"
+ with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A1'
+ print sess2.run(a) # generates 'A2'
+ print sess2.run(b) # generates 'B3'
+ print sess2.run(b) # generates 'B4'
+ ```
+
+ To make the random sequences generated by all ops be repeatable across
+ sessions, set a graph-level seed:
+
+ ```python
+ tf.set_random_seed(1234)
+ a = tf.random_uniform([1])
+ b = tf.random_normal([1])
+
+  # Repeatedly running this block with the same graph will generate the same
+  # sequences of 'a' and 'b'.
+ print "Session 1"
+ with tf.Session() as sess1:
+ print sess1.run(a) # generates 'A1'
+ print sess1.run(a) # generates 'A2'
+ print sess1.run(b) # generates 'B1'
+ print sess1.run(b) # generates 'B2'
+
+ print "Session 2"
+ with tf.Session() as sess2:
+ print sess2.run(a) # generates 'A1'
+ print sess2.run(a) # generates 'A2'
+ print sess2.run(b) # generates 'B1'
+ print sess2.run(b) # generates 'B2'
+ ```
+
+ Args:
+ seed: integer.
+ """
+ ops.get_default_graph().seed = seed
diff --git a/tensorflow/python/framework/registry.py b/tensorflow/python/framework/registry.py
new file mode 100644
index 0000000000..d9556f0a06
--- /dev/null
+++ b/tensorflow/python/framework/registry.py
@@ -0,0 +1,64 @@
+"""Registry mechanism for "registering" classes/functions for general use.
+
+This is typically used with a decorator that calls Register for adding
+a class or function to a registry.
+"""
+
+import traceback
+
+from tensorflow.python.platform import logging
+
+
+# Registry mechanism below is based on mapreduce.python.mrpython.Register.
+_LOCATION_TAG = "location"
+_TYPE_TAG = "type"
+
+
+class Registry(object):
+ """Provides a registry for saving objects."""
+
+ def __init__(self, name):
+ """Creates a new registry."""
+ self._name = name
+ self._registry = dict()
+
+ def register(self, candidate, name=None):
+ """Registers a Python object "candidate" for the given "name".
+
+ Args:
+ candidate: the candidate object to add to the registry.
+ name: an optional string specifying the registry key for the candidate.
+ If None, candidate.__name__ will be used.
+ Raises:
+      KeyError: If the same name is used twice.
+ """
+ if not name:
+ name = candidate.__name__
+ if name in self._registry:
+ (filename, line_number, function_name, _) = (
+ self._registry[name][_LOCATION_TAG])
+      raise KeyError("Registering two %s with name '%s'! "
+                     "(Previous registration was in %s %s:%d)" %
+                     (self._name, name, function_name, filename, line_number))
+
+ logging.vlog(1, "Registering %s (%s) in %s.", name, candidate, self._name)
+ # stack trace is [this_function, Register(), user_function,...]
+ # so the user function is #2.
+ stack = traceback.extract_stack()
+ self._registry[name] = {_TYPE_TAG: candidate, _LOCATION_TAG: stack[2]}
+
+ def lookup(self, name):
+ """Looks up "name".
+
+ Args:
+ name: a string specifying the registry key for the candidate.
+ Returns:
+ Registered object if found
+ Raises:
+ LookupError: if "name" has not been registered.
+ """
+ if name in self._registry:
+ return self._registry[name][_TYPE_TAG]
+ else:
+ raise LookupError(
+ "%s registry has no entry for: %s" % (self._name, name))
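As the module docstring notes, `Registry` is normally driven by a decorator that calls `register`. A minimal sketch of that pattern, assuming the module is importable as laid out in this change; the registry name and the registered function are made up for illustration.

```python
from tensorflow.python.framework import registry

_GRADIENT_REGISTRY = registry.Registry("gradient functions")  # illustrative name

def RegisterGradient(op_type):
    """Decorator that registers a function under `op_type`."""
    def _decorator(fn):
        _GRADIENT_REGISTRY.register(fn, op_type)
        return fn
    return _decorator

@RegisterGradient("MyOp")
def _my_op_grad(op, grad):  # hypothetical gradient function
    return grad

assert _GRADIENT_REGISTRY.lookup("MyOp") is _my_op_grad
```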
diff --git a/tensorflow/python/framework/registry_test.py b/tensorflow/python/framework/registry_test.py
new file mode 100644
index 0000000000..5b4f261ceb
--- /dev/null
+++ b/tensorflow/python/framework/registry_test.py
@@ -0,0 +1,38 @@
+"""Tests for tensorflow.ops.registry."""
+
+from tensorflow.python.framework import registry
+from tensorflow.python.platform import googletest
+
+
+class RegistryTest(googletest.TestCase):
+
+ class Foo(object):
+ pass
+
+ def testRegisterClass(self):
+ myreg = registry.Registry('testfoo')
+ with self.assertRaises(LookupError):
+ myreg.lookup('Foo')
+ myreg.register(RegistryTest.Foo, 'Foo')
+ assert myreg.lookup('Foo') == RegistryTest.Foo
+
+ def testRegisterFunction(self):
+ myreg = registry.Registry('testbar')
+ with self.assertRaises(LookupError):
+ myreg.lookup('Bar')
+ myreg.register(bar, 'Bar')
+ assert myreg.lookup('Bar') == bar
+
+ def testDuplicate(self):
+ myreg = registry.Registry('testbar')
+ myreg.register(bar, 'Bar')
+ with self.assertRaises(KeyError):
+ myreg.register(bar, 'Bar')
+
+
+def bar():
+ pass
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/framework/tensor_shape.py b/tensorflow/python/framework/tensor_shape.py
new file mode 100644
index 0000000000..d4f27696d4
--- /dev/null
+++ b/tensorflow/python/framework/tensor_shape.py
@@ -0,0 +1,743 @@
+"""Helper classes for tensor shape inference."""
+import tensorflow.python.platform
+
+
+class Dimension(object):
+ """Represents the value of one dimension in a TensorShape."""
+
+ def __init__(self, value):
+ """Creates a new Dimension with the given value."""
+ if value is None:
+ self._value = None
+ else:
+ self._value = int(value)
+
+ def __repr__(self):
+ return "Dimension(%s)" % repr(self._value)
+
+ def __eq__(self, other):
+ """Returns true if `other` has the same known value as this Dimension."""
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ return self._value == other.value
+
+ def __ne__(self, other):
+ """Returns true if `other` has a different known value from `self`."""
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ return self._value != other.value
+
+ def __int__(self):
+ return self._value
+
+ @property
+ def value(self):
+ """The value of this dimension, or None if it is unknown."""
+ return self._value
+
+ def is_compatible_with(self, other):
+ """Returns true if `other` is compatible with this Dimension.
+
+ Two known Dimensions are compatible if they have the same value.
+ An unknown Dimension is compatible with all other Dimensions.
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ True if this Dimension and `other` are compatible.
+ """
+ other = as_dimension(other)
+ return (self._value is None
+ or other.value is None
+ or self._value == other.value)
+
+ def assert_is_compatible_with(self, other):
+ """Raises an exception if `other` is not compatible with this Dimension.
+
+ Args:
+ other: Another Dimension.
+
+ Raises:
+ ValueError: If `self` and `other` are not compatible (see
+ is_compatible_with).
+ """
+ if not self.is_compatible_with(other):
+ raise ValueError("Dimensions %s and %s are not compatible"
+ % (self, other))
+
+ def merge_with(self, other):
+ """Returns a Dimension that combines the information in `self` and `other`.
+
+ Dimensions are combined as follows:
+
+ Dimension(n) .merge_with(Dimension(n)) == Dimension(n)
+ Dimension(n) .merge_with(Dimension(None)) == Dimension(n)
+ Dimension(None).merge_with(Dimension(n)) == Dimension(n)
+ Dimension(None).merge_with(Dimension(None)) == Dimension(None)
+ Dimension(n) .merge_with(Dimension(m)) raises ValueError for n != m
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ A Dimension containing the combined information of `self` and
+ `other`.
+
+ Raises:
+ ValueError: If `self` and `other` are not compatible (see
+ is_compatible_with).
+ """
+ other = as_dimension(other)
+ self.assert_is_compatible_with(other)
+ if self._value is None:
+ return Dimension(other.value)
+ else:
+ return Dimension(self._value)
+
+ def __add__(self, other):
+ """Returns the sum of `self` and `other`.
+
+ Dimensions are summed as follows:
+
+ Dimension(m) + Dimension(n) == Dimension(m + n)
+ Dimension(m) + Dimension(None) == Dimension(None)
+ Dimension(None) + Dimension(n) == Dimension(None)
+ Dimension(None) + Dimension(None) == Dimension(None)
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ A Dimension whose value is the sum of `self` and `other`.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return Dimension(None)
+ else:
+ return Dimension(self._value + other.value)
+
+ def __sub__(self, other):
+ """Returns the subtraction of `other` from `self`.
+
+ Dimensions are subtracted as follows:
+
+ Dimension(m) - Dimension(n) == Dimension(m - n)
+ Dimension(m) - Dimension(None) == Dimension(None)
+ Dimension(None) - Dimension(n) == Dimension(None)
+ Dimension(None) - Dimension(None) == Dimension(None)
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+      A Dimension whose value is the subtraction of `other` from `self`.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return Dimension(None)
+ else:
+ return Dimension(self._value - other.value)
+
+ def __mul__(self, other):
+ """Returns the product of `self` and `other`.
+
+    Dimensions are multiplied as follows:
+
+ Dimension(m) * Dimension(n) == Dimension(m * n)
+ Dimension(m) * Dimension(None) == Dimension(None)
+ Dimension(None) * Dimension(n) == Dimension(None)
+ Dimension(None) * Dimension(None) == Dimension(None)
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+      A Dimension whose value is the product of `self` and `other`.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return Dimension(None)
+ else:
+ return Dimension(self._value * other.value)
+
+ def __div__(self, other):
+ """Returns the quotient of `self` and `other`.
+
+    Dimensions are divided as follows:
+
+ Dimension(m) / Dimension(n) == Dimension(m / n)
+ Dimension(m) / Dimension(None) == Dimension(None)
+ Dimension(None) / Dimension(n) == Dimension(None)
+ Dimension(None) / Dimension(None) == Dimension(None)
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+      A Dimension whose value is the quotient of `self` and `other`.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return Dimension(None)
+ else:
+ return Dimension(self._value / other.value)
+
+ def __mod__(self, other):
+ """Returns `self` modulo `other.
+
+ Dimension moduli are computed as follows:
+
+ Dimension(m) % Dimension(n) == Dimension(m % n)
+ Dimension(m) % Dimension(None) == Dimension(None)
+ Dimension(None) % Dimension(n) == Dimension(None)
+ Dimension(None) % Dimension(None) == Dimension(None)
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ A Dimension whose value is `self` modulo `other`.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return Dimension(None)
+ else:
+ return Dimension(self._value % other.value)
+
+ def __lt__(self, other):
+ """Returns True if `self` is known to be less than `other`.
+
+ Dimensions are compared as follows:
+
+ Dimension(m) < Dimension(n) == m < n
+ Dimension(m) < Dimension(None) == None
+ Dimension(None) < Dimension(n) == None
+ Dimension(None) < Dimension(None) == None
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ The value of `self.value < other.value` if both are known, otherwise
+ None.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ else:
+ return self._value < other.value
+
+ def __le__(self, other):
+ """Returns True if `self` is known to be less than or equal to `other`.
+
+ Dimensions are compared as follows:
+
+ Dimension(m) <= Dimension(n) == m <= n
+ Dimension(m) <= Dimension(None) == None
+ Dimension(None) <= Dimension(n) == None
+ Dimension(None) <= Dimension(None) == None
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ The value of `self.value <= other.value` if both are known, otherwise
+ None.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ else:
+ return self._value <= other.value
+
+ def __gt__(self, other):
+ """Returns True if `self` is known to be greater than `other`.
+
+ Dimensions are compared as follows:
+
+ Dimension(m) > Dimension(n) == m > n
+ Dimension(m) > Dimension(None) == None
+ Dimension(None) > Dimension(n) == None
+ Dimension(None) > Dimension(None) == None
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ The value of `self.value > other.value` if both are known, otherwise
+ None.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ else:
+ return self._value > other.value
+
+ def __ge__(self, other):
+ """Returns True if `self` is known to be greater than or equal to `other`.
+
+ Dimensions are compared as follows:
+
+ Dimension(m) >= Dimension(n) == m >= n
+ Dimension(m) >= Dimension(None) == None
+ Dimension(None) >= Dimension(n) == None
+ Dimension(None) >= Dimension(None) == None
+
+ Args:
+ other: Another Dimension.
+
+ Returns:
+ The value of `self.value >= other.value` if both are known, otherwise
+ None.
+ """
+ other = as_dimension(other)
+ if self._value is None or other.value is None:
+ return None
+ else:
+ return self._value >= other.value
+
+
+def as_dimension(value):
+ """Converts the given value to a Dimension.
+
+  A Dimension input will be returned unmodified.
+ An input of `None` will be converted to an unknown Dimension.
+ An integer input will be converted to a Dimension with that value.
+
+ Args:
+ value: The value to be converted.
+
+ Returns:
+ A Dimension corresponding to the given value.
+ """
+ if isinstance(value, Dimension):
+ return value
+ else:
+ return Dimension(value)
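A short illustration of how unknown values propagate through `Dimension` arithmetic, comparisons, and merging (assuming the module is importable as `tensor_shape`):

```python
from tensorflow.python.framework import tensor_shape

known = tensor_shape.Dimension(12)
unknown = tensor_shape.Dimension(None)

print((known + 3).value)                    # 15
print((known * unknown).value)              # None: unknown propagates
print(known < tensor_shape.Dimension(13))   # True
print(known < unknown)                      # None: result is unknown
print(known.merge_with(unknown).value)      # 12: merge recovers the known value
```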
+
+
+class TensorShape(object):
+ """Represents the shape of a `Tensor`.
+
+ A `TensorShape` represents a possibly-partial shape specification for a
+ `Tensor`. It may be one of the following:
+
+ * *Fully-known shape:* has a known number of dimensions and a known size
+ for each dimension.
+ * *Partially-known shape:* has a known number of dimensions, and an unknown
+    size for one or more dimensions.
+ * *Unknown shape:* has an unknown number of dimensions, and an unknown
+ size in all dimensions.
+
+ If a tensor is produced by an operation of type `"Foo"`, its shape
+ may be inferred if there is a registered shape function for
+ `"Foo"`. See [`tf.RegisterShape()`](framework.md#RegisterShape)
+ for details of shape
+ functions and how to register them. Alternatively, the shape may be set
+ explicitly using [`Tensor.set_shape()`](framework.md#Tensor.set_shape).
+
+ @@merge_with
+ @@concatenate
+
+ @@ndims
+ @@dims
+ @@as_list
+ @@is_compatible_with
+ @@is_fully_defined
+
+ @@with_rank
+ @@with_rank_at_least
+ @@with_rank_at_most
+
+ @@assert_has_rank
+ @@assert_same_rank
+ @@assert_is_compatible_with
+ @@assert_is_fully_defined
+ """
+
+ def __init__(self, dims):
+ """Creates a new TensorShape with the given dimensions.
+
+ Args:
+ dims: A list of Dimensions, or None if the shape is unspecified.
+ DEPRECATED: A single integer is treated as a singleton list.
+ """
+ # TODO(irving): Eliminate the single integer special case.
+ if dims is None:
+ self._dims = None
+ else:
+ try:
+ dims_iter = iter(dims)
+ except TypeError:
+ # Treat as a singleton dimension
+ self._dims = [as_dimension(dims)]
+ else:
+ # Got a list of dimensions
+ self._dims = map(as_dimension, dims_iter)
+
+ def __repr__(self):
+ return "TensorShape(%s)" % str(self._dims)
+
+ @property
+ def dims(self):
+ """Returns a list of Dimensions, or None if the shape is unspecified."""
+ return self._dims
+
+ @property
+ def ndims(self):
+ """Returns the rank of this shape, or None if it is unspecified."""
+ if self._dims is None:
+ return None
+ else:
+ return len(self._dims)
+
+ def __len__(self):
+ """Returns the rank of this shape, or raises ValueError if unspecified."""
+ if self._dims is None:
+ raise ValueError("Cannot take the length of Shape with unknown rank.")
+ return len(self._dims)
+
+ def __nonzero__(self):
+ """Returns True if this shape contains non-zero information."""
+ return self._dims is not None
+
+ def __getitem__(self, key):
+ """Returns the value of a dimension or a shape, depending on the key.
+
+ Args:
+ key: If `key` is an integer, returns the dimension at that index;
+ otherwise if `key` is a slice, returns a TensorShape whose
+ dimensions are those selected by the slice from `self`.
+
+ Returns:
+ A dimension if `key` is an integer, or a `TensorShape` if `key` is a
+ slice.
+
+ Raises:
+ ValueError: If `key` is a slice, and any of its elements are negative, or
+ if `self` is completely unknown and the step is set.
+ """
+ if self._dims is not None:
+ if isinstance(key, slice):
+ return TensorShape(self._dims[key])
+ else:
+ return self._dims[key]
+ else:
+ if isinstance(key, slice):
+ start = key.start if key.start is not None else 0
+ stop = key.stop
+
+ if key.step is not None:
+ # TODO(mrry): Handle these maybe.
+ raise ValueError("Steps are not yet handled")
+ if stop is None:
+ # NOTE(mrry): This implies that TensorShape(None) is compatible with
+ # TensorShape(None)[1:], which is obviously not true. It would be
+ # possible to track the number of dimensions symbolically,
+ # and perhaps we should do that.
+ return unknown_shape()
+ elif start < 0 or stop < 0:
+ # TODO(mrry): Handle this better, as it will be useful for handling
+ # suffixes of otherwise unknown shapes.
+ return unknown_shape()
+ else:
+ return unknown_shape(ndims=stop-start)
+ else:
+ return Dimension(None)
+
+ def num_elements(self):
+ """Returns the total number of elements, or none for incomplete shapes."""
+ if self.is_fully_defined():
+ size = 1
+ for dim in self._dims:
+ size *= dim.value
+ return size
+ else:
+ return None
+
+ def merge_with(self, other):
+ """Returns a `TensorShape` combining the information in `self` and `other`.
+
+ The dimensions in `self` and `other` are merged elementwise,
+ according to the rules defined for `Dimension.merge_with()`.
+
+ Args:
+ other: Another `TensorShape`.
+
+ Returns:
+ A `TensorShape` containing the combined information of `self` and
+ `other`.
+
+ Raises:
+ ValueError: If `self` and `other` are not compatible.
+ """
+ other = as_shape(other)
+ if self._dims is None:
+ return other
+ else:
+ self.assert_same_rank(other)
+ new_dims = []
+ for i, dim in enumerate(self._dims):
+ new_dims.append(dim.merge_with(other[i]))
+ return TensorShape(new_dims)
+
+ def concatenate(self, other):
+ """Returns the concatenation of the dimension in `self` and `other`.
+
+ *N.B.* If either `self` or `other` is completely unknown,
+ concatenation will discard information about the other shape. In
+ future, we might support concatenation that preserves this
+ information for use with slicing.
+
+ Args:
+ other: Another `TensorShape`.
+
+ Returns:
+ A `TensorShape` whose dimensions are the concatenation of the
+ dimensions in `self` and `other`.
+ """
+ # TODO(mrry): Handle the case where we concatenate a known shape with a
+ # completely unknown shape, so that we can use the partial information.
+ other = as_shape(other)
+ if self._dims is None or other.dims is None:
+ return unknown_shape()
+ else:
+ return TensorShape(self._dims + other.dims)
+
+ def assert_same_rank(self, other):
+ """Raises an exception if `self` and `other` do not have compatible ranks.
+
+ Args:
+ other: Another `TensorShape`.
+
+ Raises:
+ ValueError: If `self` and `other` do not represent shapes with the
+ same rank.
+ """
+ other = as_shape(other)
+ if self.ndims is not None and other.ndims is not None:
+ if self.ndims != other.ndims:
+ raise ValueError(
+ "Shapes %s and %s must have the same rank" % (self, other))
+
+ def assert_has_rank(self, rank):
+ """Raises an exception if `self` is not compatible with the given `rank`.
+
+ Args:
+ rank: An integer.
+
+ Raises:
+ ValueError: If `self` does not represent a shape with the given `rank`.
+ """
+ if self.ndims not in (None, rank):
+ raise ValueError("Shape %s must have rank %d" % (self, rank))
+
+ def with_rank(self, rank):
+ """Returns a shape based on `self` with the given rank.
+
+ This method promotes a completely unknown shape to one with a
+ known rank.
+
+ Args:
+ rank: An integer.
+
+ Returns:
+ A shape that is at least as specific as `self` with the given rank.
+
+ Raises:
+ ValueError: If `self` does not represent a shape with the given `rank`.
+ """
+ return self.merge_with(unknown_shape(ndims=rank))
+
+ def with_rank_at_least(self, rank):
+ """Returns a shape based on `self` with at least the given rank.
+
+ Args:
+ rank: An integer.
+
+ Returns:
+ A shape that is at least as specific as `self` with at least the given
+ rank.
+
+ Raises:
+ ValueError: If `self` does not represent a shape with at least the given
+ `rank`.
+ """
+ if self.ndims is not None and self.ndims < rank:
+ raise ValueError("Shape %s must have rank at least %d" % (self, rank))
+ else:
+ return self
+
+ def with_rank_at_most(self, rank):
+ """Returns a shape based on `self` with at most the given rank.
+
+ Args:
+ rank: An integer.
+
+ Returns:
+ A shape that is at least as specific as `self` with at most the given
+ rank.
+
+ Raises:
+ ValueError: If `self` does not represent a shape with at most the given
+ `rank`.
+ """
+ if self.ndims is not None and self.ndims > rank:
+ raise ValueError("Shape %s must have rank at most %d" % (self, rank))
+ else:
+ return self
+
+ def is_compatible_with(self, other):
+ """Returns True iff `self` is compatible with `other`.
+
+ Two possibly-partially-defined shapes are compatible if there
+ exists a fully-defined shape that both shapes can represent. Thus,
+ compatibility allows the shape inference code to reason about
+ partially-defined shapes. For example:
+
+ * TensorShape(None) is compatible with all shapes.
+
+ * TensorShape([None, None]) is compatible with all two-dimensional
+ shapes, such as TensorShape([32, 784]), and also TensorShape(None). It is
+ not compatible with, for example, TensorShape([None]) or
+ TensorShape([None, None, None]).
+
+ * TensorShape([32, None]) is compatible with all two-dimensional shapes
+ with size 32 in the 0th dimension, and also TensorShape([None, None])
+ and TensorShape(None). It is not compatible with, for example,
+ TensorShape([32]), TensorShape([32, None, 1]) or TensorShape([64, None]).
+
+ * TensorShape([32, 784]) is compatible with itself, and also
+ TensorShape([32, None]), TensorShape([None, 784]), TensorShape([None,
+ None]) and TensorShape(None). It is not compatible with, for example,
+ TensorShape([32, 1, 784]) or TensorShape([None]).
+
+ The compatibility relation is reflexive and symmetric, but not
+ transitive. For example, TensorShape([32, 784]) is compatible with
+ TensorShape(None), and TensorShape(None) is compatible with
+ TensorShape([4, 4]), but TensorShape([32, 784]) is not compatible with
+ TensorShape([4, 4]).
+
+ Args:
+ other: Another TensorShape.
+
+ Returns:
+ True iff `self` is compatible with `other`.
+
+ """
+ other = as_shape(other)
+ if self._dims is not None and other.dims is not None:
+ if self.ndims != other.ndims:
+ return False
+ for x_dim, y_dim in zip(self._dims, other.dims):
+ if not x_dim.is_compatible_with(y_dim):
+ return False
+    return True
+
+ def assert_is_compatible_with(self, other):
+ """Raises exception if `self` and `other` do not represent the same shape.
+
+ This method can be used to assert that there exists a shape that both
+ `self` and `other` represent.
+
+ Args:
+ other: Another TensorShape.
+
+ Raises:
+ ValueError: If `self` and `other` do not represent the same shape.
+ """
+ if not self.is_compatible_with(other):
+ raise ValueError("Shapes %s and %s are incompatible" % (self, other))
+
+ def is_fully_defined(self):
+ """Returns True iff `self` is fully defined in every dimension."""
+ return (self._dims is not None
+ and all(dim.value is not None for dim in self._dims))
+
+ def assert_is_fully_defined(self):
+ """Raises an exception if `self` is not fully defined in every dimension.
+
+ Raises:
+ ValueError: If `self` does not have a known value for every dimension.
+ """
+ if not self.is_fully_defined():
+ raise ValueError("Shape %s is not fully defined" % self)
+
+ def as_dimension_list(self):
+ """DEPRECATED: use as_list()."""
+ self.assert_is_fully_defined()
+ return self.as_list()
+
+ def as_list(self):
+ """Returns a list of integers or None for each dimension."""
+ return [dim.value for dim in self._dims]
+
+ def __eq__(self, other):
+ """Returns True if `self` is equivalent to `other`."""
+ other = as_shape(other)
+ return self._dims == other.dims
+
+ def __ne__(self, other):
+ """Returns True if `self` is known to be different from `other`."""
+ other = as_shape(other)
+ if self.ndims is None or other.ndims is None:
+ raise ValueError("The inequality of unknown TensorShapes is undefined.")
+ if self.ndims != other.ndims:
+ return True
+ return self._dims != other.dims
+
+
+def as_shape(shape):
+ """Converts the given object to a TensorShape."""
+ if isinstance(shape, TensorShape):
+ return shape
+ else:
+ return TensorShape(shape)
+
+
+def unknown_shape(ndims=None):
+ """Returns an unknown TensorShape, optionally with a known rank.
+
+ Args:
+ ndims: (Optional) If specified, the number of dimensions in the shape.
+
+ Returns:
+ An unknown TensorShape.
+ """
+ if ndims is None:
+ return TensorShape(None)
+ else:
+ return TensorShape([Dimension(None) for _ in range(ndims)])
+
+
+def scalar():
+ """Returns a shape representing a scalar."""
+ return TensorShape([])
+
+
+def vector(length):
+ """Returns a shape representing a vector.
+
+ Args:
+ length: The length of the vector, which may be None if unknown.
+
+ Returns:
+ A TensorShape representing a vector of the given length.
+ """
+ return TensorShape([length])
+
+
+def matrix(rows, cols):
+ """Returns a shape representing a matrix.
+
+ Args:
+ rows: The number of rows in the matrix, which may be None if unknown.
+ cols: The number of columns in the matrix, which may be None if unknown.
+
+ Returns:
+ A TensorShape representing a matrix of the given size.
+ """
+ return TensorShape([rows, cols])
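The merge and compatibility rules above, together with the helper constructors, can be exercised in a few lines (again assuming the module is importable as `tensor_shape`):

```python
from tensorflow.python.framework import tensor_shape

partial = tensor_shape.TensorShape([3, None, 7])   # partially-known shape
other = tensor_shape.TensorShape([None, 4, 7])

print(partial.merge_with(other).as_list())                        # [3, 4, 7]
print(partial.is_compatible_with(tensor_shape.unknown_shape()))   # True
print(tensor_shape.matrix(94, 43).ndims)                          # 2
print(tensor_shape.vector(None).as_list())                        # [None]
```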
diff --git a/tensorflow/python/framework/tensor_shape_test.py b/tensorflow/python/framework/tensor_shape_test.py
new file mode 100644
index 0000000000..9743a8d199
--- /dev/null
+++ b/tensorflow/python/framework/tensor_shape_test.py
@@ -0,0 +1,232 @@
+"""Functional tests for shape inference helper classes."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import googletest
+
+
+class DimensionTest(test_util.TensorFlowTestCase):
+
+ def testDimension(self):
+ dim = tensor_shape.Dimension(12)
+ self.assertEqual(12, dim.value)
+ self.assertEqual(12, int(dim))
+ self.assertEqual(dim, tensor_shape.Dimension(12))
+ self.assertEqual(tensor_shape.Dimension(15),
+ dim + tensor_shape.Dimension(3))
+ self.assertEqual(tensor_shape.Dimension(15), dim + 3)
+ self.assertEqual(tensor_shape.Dimension(24),
+ dim * tensor_shape.Dimension(2))
+ self.assertEqual(tensor_shape.Dimension(24), dim * 2)
+ self.assertEqual(tensor_shape.Dimension(6), dim / tensor_shape.Dimension(2))
+ self.assertEqual(tensor_shape.Dimension(6), dim / 2)
+ self.assertEqual(tensor_shape.Dimension(12),
+ dim.merge_with(tensor_shape.Dimension(12)))
+ self.assertEqual(tensor_shape.Dimension(12), dim.merge_with(12))
+ self.assertLess(tensor_shape.Dimension(12), tensor_shape.Dimension(13))
+ self.assertGreater(tensor_shape.Dimension(13), tensor_shape.Dimension(12))
+ self.assertLessEqual(tensor_shape.Dimension(12), tensor_shape.Dimension(12))
+ self.assertLessEqual(tensor_shape.Dimension(12), tensor_shape.Dimension(13))
+ self.assertGreater(tensor_shape.Dimension(13), tensor_shape.Dimension(12))
+ self.assertGreaterEqual(tensor_shape.Dimension(12),
+ tensor_shape.Dimension(12))
+ self.assertGreaterEqual(tensor_shape.Dimension(13),
+ tensor_shape.Dimension(12))
+ with self.assertRaises(ValueError):
+ dim.merge_with(tensor_shape.Dimension(13))
+
+ def testUnknownDimension(self):
+ dim = tensor_shape.Dimension(None)
+ self.assertIs(None, dim.value)
+ self.assertEqual(dim.value, tensor_shape.Dimension(None).value)
+ self.assertEqual(tensor_shape.Dimension(None).value,
+ (dim + tensor_shape.Dimension(None)).value)
+ self.assertEqual(tensor_shape.Dimension(None).value,
+ (dim * tensor_shape.Dimension(None)).value)
+ self.assertEqual(tensor_shape.Dimension(None).value,
+ (dim / tensor_shape.Dimension(None)).value)
+ self.assertEqual(tensor_shape.Dimension(None).value,
+ dim.merge_with(tensor_shape.Dimension(None)).value)
+ self.assertIs(None,
+ tensor_shape.Dimension(None) < tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) <= tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) > tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) >= tensor_shape.Dimension(None))
+
+ def testKnownAndUnknownDimensions(self):
+ known = tensor_shape.Dimension(12)
+ unknown = tensor_shape.Dimension(None)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (known + unknown).value)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (unknown + known).value)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (known * unknown).value)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (unknown * known).value)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (known / unknown).value)
+ self.assertEqual(
+ tensor_shape.Dimension(None).value, (unknown / known).value)
+ self.assertEqual(
+ tensor_shape.Dimension(12), known.merge_with(unknown))
+ self.assertEqual(
+ tensor_shape.Dimension(12), unknown.merge_with(known))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) < tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) <= tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) > tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) >= tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) < tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) <= tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) > tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) >= tensor_shape.Dimension(12))
+
+ def testAsDimension(self):
+ self.assertEqual(tensor_shape.Dimension(12),
+ tensor_shape.as_dimension(tensor_shape.Dimension(12)))
+ self.assertEqual(tensor_shape.Dimension(12), tensor_shape.as_dimension(12))
+ self.assertEqual(
+ tensor_shape.Dimension(None).value,
+ tensor_shape.as_dimension(tensor_shape.Dimension(None)).value)
+ self.assertEqual(tensor_shape.Dimension(None).value,
+ tensor_shape.as_dimension(None).value)
+
+ def testEquality(self):
+ self.assertTrue(tensor_shape.Dimension(12) == tensor_shape.Dimension(12))
+ self.assertFalse(tensor_shape.Dimension(12) == tensor_shape.Dimension(13))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) == tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) == tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) == tensor_shape.Dimension(None))
+
+ def testInequality(self):
+ self.assertTrue(tensor_shape.Dimension(12) != tensor_shape.Dimension(13))
+ self.assertFalse(tensor_shape.Dimension(12) != tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(12) != tensor_shape.Dimension(None))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) != tensor_shape.Dimension(12))
+ self.assertIs(None,
+ tensor_shape.Dimension(None) != tensor_shape.Dimension(None))
+
+
+class ShapeTest(test_util.TensorFlowTestCase):
+
+ def testUnknownShape(self):
+ s = tensor_shape.TensorShape(None)
+ with self.assertRaises(ValueError):
+ s.assert_is_fully_defined()
+ self.assertIs(None, s.ndims)
+ with self.assertRaises(ValueError):
+ len(s)
+ self.assertFalse(s)
+ self.assertIs(None, s.dims)
+
+ def testFullyDefinedShape(self):
+ s = tensor_shape.TensorShape([tensor_shape.Dimension(3),
+ tensor_shape.Dimension(4),
+ tensor_shape.Dimension(7)])
+ s.assert_is_fully_defined()
+ self.assertEqual(3, s.ndims)
+ self.assertEqual(3, len(s))
+ self.assertTrue(s)
+ s.assert_has_rank(3)
+ self.assertEqual([tensor_shape.Dimension(3),
+ tensor_shape.Dimension(4),
+ tensor_shape.Dimension(7)], s.dims)
+ self.assertEqual(tensor_shape.Dimension(3), s[0])
+ self.assertEqual(tensor_shape.Dimension(4), s[1])
+ self.assertEqual(tensor_shape.Dimension(7), s[2])
+ self.assertEqual([3, 4, 7], s.as_list())
+ s.assert_is_compatible_with([3, 4, 7])
+ s.assert_same_rank([6, 3, 7])
+
+ def testPartiallyDefinedShape(self):
+ s = tensor_shape.TensorShape([tensor_shape.Dimension(3),
+ tensor_shape.Dimension(None),
+ tensor_shape.Dimension(7)])
+ with self.assertRaises(ValueError):
+ s.assert_is_fully_defined()
+ self.assertEqual(3, s.ndims)
+ self.assertEqual(3, len(s))
+ self.assertTrue(s)
+ s.assert_has_rank(3)
+ self.assertEqual(tensor_shape.Dimension(3), s[0])
+ self.assertEqual(tensor_shape.Dimension(None).value, s[1].value)
+ self.assertEqual(tensor_shape.Dimension(7), s[2])
+ s.assert_same_rank([6, 3, 7])
+
+ def testMergeFullShapes(self):
+ self.assertEqual([3, 4, 7],
+ tensor_shape.TensorShape([3, 4, 7]).merge_with(
+ tensor_shape.TensorShape([3, 4, 7])).as_list())
+ with self.assertRaises(ValueError):
+ tensor_shape.TensorShape([3, 4, 7]).merge_with(
+ tensor_shape.TensorShape([6, 3, 7]))
+
+ def testMergePartialShapes(self):
+ s1 = tensor_shape.TensorShape([tensor_shape.Dimension(3),
+ tensor_shape.Dimension(None),
+ tensor_shape.Dimension(7)])
+ s2 = tensor_shape.TensorShape([tensor_shape.Dimension(None),
+ tensor_shape.Dimension(4),
+ tensor_shape.Dimension(7)])
+ self.assertEqual([3, 4, 7], s1.merge_with(s2).as_list())
+
+ def testMergeFullAndUnknownShape(self):
+ self.assertEqual([3, 4, 7],
+ tensor_shape.TensorShape([3, 4, 7]).merge_with(
+ tensor_shape.TensorShape(None)).as_list())
+
+ def testSlice(self):
+ known = tensor_shape.TensorShape([0, 1, 2, 3, 4])
+ self.assertEqual(tensor_shape.Dimension(2), known[2])
+ tensor_shape.TensorShape([1, 2, 3]).assert_is_compatible_with(known[1:4])
+
+ unknown = tensor_shape.TensorShape(None)
+ self.assertEqual(tensor_shape.Dimension(None).value, unknown[2].value)
+ tensor_shape.TensorShape(
+ [None, None, None]).assert_is_compatible_with(unknown[1:4])
+
+ def testConcatenate(self):
+ tensor_shape.TensorShape([1, 2, 3, 4]).assert_is_compatible_with(
+ tensor_shape.TensorShape([1, 2]).concatenate(
+ tensor_shape.TensorShape([3, 4])))
+ tensor_shape.TensorShape([1, 2, 3, 4]).assert_is_compatible_with(
+ tensor_shape.TensorShape([1, 2]).concatenate(
+ tensor_shape.TensorShape(None)))
+ tensor_shape.TensorShape([1, 2, 3, 4]).assert_is_compatible_with(
+ tensor_shape.TensorShape(None).concatenate(
+ tensor_shape.TensorShape([3, 4])))
+ tensor_shape.TensorShape([1, 2, 3, 4]).assert_is_compatible_with(
+ tensor_shape.TensorShape(None).concatenate(
+ tensor_shape.TensorShape(None)))
+ tensor_shape.TensorShape([1, 2, 3]).assert_is_compatible_with(
+ tensor_shape.TensorShape([1, 2]).concatenate(
+ tensor_shape.Dimension(3)))
+
+ def testHelpers(self):
+ tensor_shape.TensorShape([]).assert_is_compatible_with(
+ tensor_shape.scalar())
+ tensor_shape.TensorShape([37]).assert_is_compatible_with(
+ tensor_shape.vector(37))
+ tensor_shape.TensorShape(
+ [94, 43]).assert_is_compatible_with(tensor_shape.matrix(94, 43))
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/framework/tensor_util.py b/tensorflow/python/framework/tensor_util.py
new file mode 100644
index 0000000000..81ed54c473
--- /dev/null
+++ b/tensorflow/python/framework/tensor_util.py
@@ -0,0 +1,511 @@
+"""Utilities to create TensorProtos."""
+import numbers
+import tensorflow.python.platform
+import numpy as np
+
+from tensorflow.core.framework import tensor_pb2
+from tensorflow.core.framework import tensor_shape_pb2
+
+# TODO(opensource): Add support for pyx_library in the open-source build.
+# For now, we use the slow versions that fast_tensor_util replaces.
+# pylint: disable=g-import-not-at-top
+try:
+ from tensorflow.python.framework import fast_tensor_util
+ _FAST_TENSOR_UTIL_AVAILABLE = True
+except ImportError:
+ _FAST_TENSOR_UTIL_AVAILABLE = False
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+# pylint: enable=g-import-not-at-top
+
+
+if _FAST_TENSOR_UTIL_AVAILABLE:
+ _NP_TO_APPEND_FN = {
+ np.float32: fast_tensor_util.AppendFloat32ArrayToTensorProto,
+ np.float64: fast_tensor_util.AppendFloat64ArrayToTensorProto,
+ np.int32: fast_tensor_util.AppendInt32ArrayToTensorProto,
+ np.int64: fast_tensor_util.AppendInt64ArrayToTensorProto,
+ np.uint8: fast_tensor_util.AppendUInt8ArrayToTensorProto,
+ np.int16: fast_tensor_util.AppendInt16ArrayToTensorProto,
+ np.int8: fast_tensor_util.AppendInt8ArrayToTensorProto,
+ np.complex64: fast_tensor_util.AppendComplex64ArrayToTensorProto,
+ np.complex128: fast_tensor_util.AppendComplex128ArrayToTensorProto,
+ np.object: fast_tensor_util.AppendObjectArrayToTensorProto,
+ np.bool: fast_tensor_util.AppendBoolArrayToTensorProto,
+ types.qint8.as_numpy_dtype:
+ fast_tensor_util.AppendInt8ArrayToTensorProto,
+ types.quint8.as_numpy_dtype:
+ fast_tensor_util.AppendUInt8ArrayToTensorProto,
+ types.qint32.as_numpy_dtype:
+ fast_tensor_util.AppendInt32ArrayToTensorProto,
+ # NOTE(mdevin): Intentionally no way to feed a DT_BFLOAT16.
+ }
+else:
+
+ def SlowAppendFloat32ArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.float_val.extend([np.asscalar(x) for x in proto_values])
+
+ def SlowAppendFloat64ArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.double_val.extend([np.asscalar(x) for x in proto_values])
+
+ def SlowAppendIntArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.int_val.extend([np.asscalar(x) for x in proto_values])
+
+ def SlowAppendInt64ArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.int64_val.extend([np.asscalar(x) for x in proto_values])
+
+ def SlowAppendComplexArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.scomplex_val.extend([np.asscalar(v)
+ for x in proto_values
+ for v in [x.real, x.imag]])
+
+ def SlowAppendObjectArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.string_val.extend([str(x) for x in proto_values])
+
+ def SlowAppendBoolArrayToTensorProto(tensor_proto, proto_values):
+ tensor_proto.bool_val.extend([np.asscalar(x) for x in proto_values])
+
+ _NP_TO_APPEND_FN = {
+ np.float32: SlowAppendFloat32ArrayToTensorProto,
+ np.float64: SlowAppendFloat64ArrayToTensorProto,
+ np.int32: SlowAppendIntArrayToTensorProto,
+ np.int64: SlowAppendInt64ArrayToTensorProto,
+ np.uint8: SlowAppendIntArrayToTensorProto,
+ np.int16: SlowAppendIntArrayToTensorProto,
+ np.int8: SlowAppendIntArrayToTensorProto,
+ np.complex64: SlowAppendComplexArrayToTensorProto,
+ np.complex128: SlowAppendComplexArrayToTensorProto,
+ np.object: SlowAppendObjectArrayToTensorProto,
+ np.bool: SlowAppendBoolArrayToTensorProto,
+ types.qint8.as_numpy_dtype: SlowAppendIntArrayToTensorProto,
+ types.quint8.as_numpy_dtype: SlowAppendIntArrayToTensorProto,
+ types.qint32.as_numpy_dtype: SlowAppendIntArrayToTensorProto,
+ # NOTE(mdevin): Intentionally no way to feed a DT_BFLOAT16.
+ }
+
+
+def GetFromNumpyDTypeDict(dtype_dict, dtype):
+  # NOTE: dtype_dict.get(dtype) always returns None, presumably because the
+  # np.dtype instance does not hash like the type-object keys even though it
+  # compares equal to them, so fall back to a linear scan using ==.
+ for key, val in dtype_dict.iteritems():
+ if key == dtype:
+ return val
+ return None
+
+
+def GetNumpyAppendFn(dtype):
+  # numpy dtypes for strings are variable length, so we cannot compare
+  # dtype against a single constant (np.string does not exist) to decide
+  # whether dtype is a "string" type. We need to compare dtype.type to be
+  # sure it's a string type.
+ if dtype.type == np.string_ or dtype.type == np.unicode_:
+ if _FAST_TENSOR_UTIL_AVAILABLE:
+ return fast_tensor_util.AppendObjectArrayToTensorProto
+ else:
+ return SlowAppendObjectArrayToTensorProto
+ return GetFromNumpyDTypeDict(_NP_TO_APPEND_FN, dtype)
+
+
+def MakeTensorShapeProto(shape):
+ """Create a TensorShapeProto.
+
+ Args:
+ shape: List of integers representing the dimensions of the tensor.
+
+ Returns:
+ A TensorShapeProto.
+ """
+ return tensor_shape_pb2.TensorShapeProto(
+ dim=[tensor_shape_pb2.TensorShapeProto.Dim(size=x) for x in shape])
+
+
+def TensorShapeProtoToList(shape):
+ """Convert a TensorShape to a list.
+
+ Args:
+ shape: A TensorShapeProto.
+
+ Returns:
+ List of integers representing the dimensions of the tensor.
+ """
+ return [dim.size for dim in shape.dim]
+
+
+def _GetDenseDimensions(list_of_lists):
+ """Returns the inferred dense dimensions of a list of lists."""
+ if not isinstance(list_of_lists, (list, tuple)):
+ return []
+ elif not list_of_lists:
+ return [0]
+ else:
+ return [len(list_of_lists)] + _GetDenseDimensions(list_of_lists[0])
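+
+# Editor's illustration (not part of the original module), assuming plain
+# nested python lists:
+#
+#   _GetDenseDimensions(3)                 => []
+#   _GetDenseDimensions([])                => [0]
+#   _GetDenseDimensions([[1, 2], [3, 4]])  => [2, 2]
+#   _GetDenseDimensions([[1, 2], [3]])     => [2, 2]; only the first sub-list
+#                                             is inspected, and the ragged
+#                                             case is rejected later by
+#                                             make_tensor_proto.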
+
+
+def _FlattenToStrings(nested_strings):
+ if isinstance(nested_strings, list):
+ for inner in nested_strings:
+ for flattened_string in _FlattenToStrings(inner):
+ yield flattened_string
+ else:
+ yield nested_strings
+
+
+_TENSOR_CONTENT_TYPES = frozenset([
+ types.float32, types.float64, types.int32, types.uint8, types.int16,
+ types.int8, types.int64
+])
+
+
+def _FirstNotNone(l):
+ for x in l:
+ if x is not None:
+ return x
+ return None
+
+
+def _FilterInt(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterInt(x) for x in v])
+ return None if isinstance(v, numbers.Integral) else repr(v)
+
+
+def _FilterFloat(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterFloat(x) for x in v])
+ return None if isinstance(v, numbers.Real) else repr(v)
+
+
+def _FilterComplex(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterComplex(x) for x in v])
+ return None if isinstance(v, numbers.Complex) else repr(v)
+
+
+def _FilterStr(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterStr(x) for x in v])
+ return None if isinstance(v, basestring) else repr(v)
+
+
+def _FilterBool(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterBool(x) for x in v])
+ return None if isinstance(v, bool) else repr(v)
+
+
+def _FilterNotTensor(v):
+ if isinstance(v, (list, tuple)):
+ return _FirstNotNone([_FilterNotTensor(x) for x in v])
+ return repr(v) if isinstance(v, ops.Tensor) else None
+
+
+_TF_TO_IS_OK = {
+ types.float32: _FilterFloat,
+ types.float64: _FilterFloat,
+ types.int32: _FilterInt,
+ types.uint8: _FilterInt,
+ types.int16: _FilterInt,
+ types.int8: _FilterInt,
+ types.string: _FilterStr,
+ types.complex64: _FilterComplex,
+ types.int64: _FilterInt,
+ types.bool: _FilterBool,
+ types.qint32: _FilterInt,
+ types.quint8: _FilterInt,
+ types.qint8: _FilterInt,
+}
+
+
+def _AssertCompatible(values, dtype):
+ fn = _TF_TO_IS_OK.get(dtype, _FilterNotTensor)
+ mismatch = fn(values)
+ if mismatch is not None:
+ if dtype is None:
+ raise TypeError("List of Tensors when single Tensor expected")
+ else:
+ raise TypeError("Expected %s, got %s instead." %
+ (dtype.name, mismatch))
+
+
+def make_tensor_proto(values, dtype=None, shape=None):
+ """Create a TensorProto.
+
+ Args:
+ values: Values to put in the TensorProto.
+ dtype: Optional tensor_pb2 DataType value.
+ shape: List of integers representing the dimensions of tensor.
+
+ Returns:
+ A TensorProto. Depending on the type, it may contain data in the
+ "tensor_content" attribute, which is not directly useful to Python programs.
+ To access the values you should convert the proto back to a numpy ndarray
+ with tensor_util.MakeNdarray(proto).
+
+ Raises:
+ TypeError: if unsupported types are provided.
+ ValueError: if arguments have inappropriate values.
+
+  make_tensor_proto accepts "values" as a python scalar, a python list, a
+  numpy ndarray, or a numpy scalar.
+
+  If "values" is a python scalar or a python list, make_tensor_proto
+  first converts it to a numpy ndarray. If dtype is None, the
+  conversion tries its best to infer the right numpy data
+  type. Otherwise, the resulting numpy array has a data type
+  compatible with the given dtype.
+
+  In either case above, the numpy ndarray (whether provided by the caller
+  or auto-converted) must have a type compatible with dtype.
+
+  make_tensor_proto then converts the numpy array to a tensor proto.
+
+  If "shape" is None, the resulting tensor proto represents the numpy
+  array precisely.
+
+  Otherwise, "shape" specifies the tensor's shape and the numpy array
+  cannot have more elements than "shape" specifies.
+
+ """
+ if dtype:
+ dtype = types.as_dtype(dtype)
+
+ # We first convert value to a numpy array or scalar.
+ if isinstance(values, (np.ndarray, np.generic)):
+ if dtype:
+ nparray = values.astype(dtype.as_numpy_dtype)
+ else:
+ nparray = values
+ else:
+ if values is None:
+ raise ValueError("None values not supported.")
+    # If dtype is provided, force the numpy array to have that type
+    # when possible.
+    np_dt = dtype.as_numpy_dtype if dtype else None
+    if shape is not None and np.prod(shape) == 0:
+ nparray = np.empty(shape, dtype=np_dt)
+ else:
+ _AssertCompatible(values, dtype)
+ nparray = np.array(values, dtype=np_dt)
+ if list(nparray.shape) != _GetDenseDimensions(values):
+ raise ValueError("Argument must be a dense tensor: %s" % values)
+ # python/numpy default float type is float64. We prefer float32 instead.
+ if (nparray.dtype == np.float64) and dtype is None:
+ nparray = nparray.astype(np.float32)
+ # python/numpy default int type is int64. We prefer int32 instead.
+ elif (nparray.dtype == np.int64) and dtype is None:
+ nparray = nparray.astype(np.int32)
+
+ # if dtype is provided, it must be compatible with what numpy
+ # conversion says.
+ numpy_dtype = types.as_dtype(nparray.dtype)
+ if numpy_dtype is None:
+ raise TypeError("Unrecognized data type: %s" % nparray.dtype)
+
+ # If dtype was specified and is a quantized type, we convert
+ # numpy_dtype back into the quantized version.
+ if dtype in [types.qint8, types.quint8, types.qint32]:
+ numpy_dtype = dtype
+
+  if dtype is not None and dtype.base_dtype != numpy_dtype.base_dtype:
+ raise TypeError("Incompatible types: %s vs. %s" % (dtype, nparray.dtype))
+
+ # If shape is not given, get the shape from the numpy array.
+ if shape is None:
+ shape = nparray.shape
+ is_same_size = True
+ shape_size = nparray.size
+ else:
+ shape = [int(dim) for dim in shape]
+ shape_size = np.prod(shape)
+ is_same_size = shape_size == nparray.size
+
+ if nparray.size > shape_size:
+ raise ValueError(
+ "Too many elements provided. Needed at most %d, but received %d" %
+ (shape_size, nparray.size))
+
+ tensor_proto = tensor_pb2.TensorProto(
+ dtype=numpy_dtype.as_datatype_enum,
+ tensor_shape=MakeTensorShapeProto(shape))
+
+ if is_same_size and numpy_dtype in _TENSOR_CONTENT_TYPES and shape_size > 1:
+ tensor_proto.tensor_content = nparray.tostring()
+ return tensor_proto
+
+ # If we were not given values as a numpy array, compute the proto_values
+ # from the given values directly, to avoid numpy trimming nulls from the
+ # strings. Since values could be a list of strings, or a multi-dimensional
+ # list of lists that might or might not correspond to the given shape,
+ # we flatten it conservatively.
+ if numpy_dtype == types.string and not isinstance(values, np.ndarray):
+ proto_values = _FlattenToStrings(values)
+ tensor_proto.string_val.extend([str(x) for x in proto_values])
+ return tensor_proto
+
+ # TensorFlow expects C order (a.k.a., eigen row major).
+ proto_values = nparray.ravel()
+
+ append_fn = GetNumpyAppendFn(proto_values.dtype)
+ if append_fn is None:
+ raise TypeError("Element type not supported in TensorProto: %s" %
+ numpy_dtype.name)
+ append_fn(tensor_proto, proto_values)
+
+ return tensor_proto
+
+
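+# Editor's sketch (not part of the original module): a typical round trip
+# through make_tensor_proto and MakeNdarray, assuming `types` is
+# tensorflow.python.framework.types as imported above:
+#
+#   proto = make_tensor_proto([1.0, 2.0, 3.0], dtype=types.float32,
+#                             shape=[1, 3])
+#   # Float data with more than one element is packed into tensor_content.
+#   MakeNdarray(proto)   # => array([[1., 2., 3.]], dtype=float32)
+#
+#   # A single value with a larger shape is stored once and repeated on decode.
+#   proto = make_tensor_proto(10.0, shape=[2, 2])
+#   MakeNdarray(proto)   # => a 2x2 float32 array filled with 10.0
+
+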
+def MakeNdarray(tensor):
+ """Create a numpy ndarray from a tensor.
+
+ Create a numpy ndarray with the same shape and data as the tensor.
+
+ Args:
+ tensor: A TensorProto.
+
+ Returns:
+ A numpy array with the tensor contents.
+
+ Raises:
+ TypeError: if tensor has unsupported type.
+
+ """
+ shape = [d.size for d in tensor.tensor_shape.dim]
+ num_elements = np.prod(shape)
+ tensor_dtype = types.as_dtype(tensor.dtype)
+ dtype = tensor_dtype.as_numpy_dtype
+
+ if tensor.tensor_content:
+ return np.fromstring(tensor.tensor_content, dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.float32:
+ if len(tensor.float_val) == 1:
+ return np.repeat(np.array(tensor.float_val[0], dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.fromiter(tensor.float_val, dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.float64:
+ if len(tensor.double_val) == 1:
+ return np.repeat(np.array(tensor.double_val[0], dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.fromiter(tensor.double_val, dtype=dtype).reshape(shape)
+ elif tensor_dtype in [types.int32, types.uint8, types.int16, types.int8,
+ types.qint32, types.quint8, types.qint8,
+ types.bfloat16]:
+ if len(tensor.int_val) == 1:
+ return np.repeat(np.array(tensor.int_val[0], dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.fromiter(tensor.int_val, dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.int64:
+ if len(tensor.int64_val) == 1:
+ return np.repeat(np.array(tensor.int64_val[0], dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.fromiter(tensor.int64_val, dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.string:
+ if len(tensor.string_val) == 1:
+ return np.repeat(np.array(str(tensor.string_val[0]), dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.array([str(x) for x in tensor.string_val],
+ dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.complex64:
+ it = iter(tensor.scomplex_val)
+ if len(tensor.scomplex_val) == 2:
+ return np.repeat(np.array(complex(tensor.scomplex_val[0],
+ tensor.scomplex_val[1]), dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.array([complex(x[0], x[1]) for x in zip(it, it)],
+ dtype=dtype).reshape(shape)
+ elif tensor_dtype == types.bool:
+ if len(tensor.bool_val) == 1:
+ return np.repeat(np.array(tensor.bool_val[0], dtype=dtype),
+ num_elements).reshape(shape)
+ else:
+ return np.fromiter(tensor.bool_val, dtype=dtype).reshape(shape)
+ else:
+ raise TypeError("Unsupported tensor type: %s" % tensor.dtype)
+
+
+def ShapeEquals(tensor_proto, shape):
+ """Returns True if "tensor_proto" has the given "shape".
+
+ Args:
+ tensor_proto: A TensorProto.
+    shape: A tensor shape, expressed as a TensorShapeProto, list, or tuple.
+
+ Returns:
+ True if "tensor_proto" has the given "shape", otherwise False.
+
+ Raises:
+ TypeError: If "tensor_proto" is not a TensorProto, or shape is not a
+ TensorShape, list, or tuple.
+ """
+ if not isinstance(tensor_proto, tensor_pb2.TensorProto):
+ raise TypeError("tensor_proto is not a tensor_pb2.TensorProto object")
+ if isinstance(shape, tensor_shape_pb2.TensorShapeProto):
+ shape = [d.size for d in shape.dim]
+ elif not isinstance(shape, (list, tuple)):
+ raise TypeError("shape is not a list or tuple")
+  tensor_shape_list = [d.size for d in tensor_proto.tensor_shape.dim]
+  # Compare ranks explicitly; zip() alone would silently ignore extra dims.
+  return (len(tensor_shape_list) == len(shape) and
+          all(x == y for x, y in zip(tensor_shape_list, shape)))
+
+
+def ConstantValue(tensor):
+ """Returns the constant value of the given tensor, if efficiently calculable.
+
+ This function attempts to partially evaluate the given tensor, and
+ returns its value as a numpy ndarray if this succeeds.
+
+ TODO(mrry): Consider whether this function should use a registration
+ mechanism like gradients and ShapeFunctions, so that it is easily
+ extensible.
+
+ Args:
+ tensor: The Tensor to be evaluated.
+
+ Returns:
+ A numpy ndarray containing the constant value of the given `tensor`,
+ or None if it cannot be calculated.
+
+ Raises:
+ TypeError: if tensor is not an ops.Tensor.
+ """
+ # TODO(mdevin): Support Variables?
+ if not isinstance(tensor, ops.Tensor):
+ raise TypeError("tensor is not a Tensor")
+ if tensor.op.type == "Const":
+ return MakeNdarray(tensor.op.get_attr("value"))
+ elif tensor.op.type == "Shape":
+ input_shape = tensor.op.inputs[0].get_shape()
+ if input_shape.is_fully_defined():
+ return np.array([dim.value for dim in input_shape.dims])
+ else:
+ return None
+ elif tensor.op.type == "Size":
+ input_shape = tensor.op.inputs[0].get_shape()
+ if input_shape.is_fully_defined():
+ return np.array([np.prod([dim.value for dim in input_shape.dims])])
+ else:
+ return None
+ elif tensor.op.type == "Rank":
+ input_shape = tensor.op.inputs[0].get_shape()
+ if input_shape.ndims is not None:
+ return np.array([input_shape.ndims])
+ else:
+ return None
+ elif tensor.op.type == "Range":
+ start = ConstantValue(tensor.op.inputs[0])
+ if start is None:
+ return None
+ limit = ConstantValue(tensor.op.inputs[1])
+ if limit is None:
+ return None
+ delta = ConstantValue(tensor.op.inputs[2])
+ if delta is None:
+ return None
+ return np.array(range(start, limit, delta),
+ dtype=tensor.dtype.as_numpy_dtype)
+ else:
+ return None
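+
+
+# Editor's sketch (not part of the original module): ConstantValue only
+# succeeds for a handful of op types, so callers must handle None. Assuming a
+# graph built with constant_op, array_ops and state_ops:
+#
+#   c = constant_op.constant([[1.0, 2.0]])
+#   ConstantValue(c)                    # => array([[1., 2.]], dtype=float32)
+#   ConstantValue(array_ops.shape(c))   # => array([1, 2])
+#   v = state_ops.variable_op([2], types.float32)
+#   ConstantValue(v)                    # => None (not statically known)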
diff --git a/tensorflow/python/framework/tensor_util_test.py b/tensorflow/python/framework/tensor_util_test.py
new file mode 100644
index 0000000000..7c1c0b8d3e
--- /dev/null
+++ b/tensorflow/python/framework/tensor_util_test.py
@@ -0,0 +1,379 @@
+"""Functional tests for tensor_util."""
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import state_ops
+from tensorflow.python.platform import googletest
+
+
+class TensorUtilTest(test_util.TensorFlowTestCase):
+
+ def testFloat(self):
+ t = tensor_util.make_tensor_proto(10.0)
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape {}
+ float_val: 10.0
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array(10.0, dtype=np.float32), a)
+
+ def testFloatN(self):
+ t = tensor_util.make_tensor_proto([10.0, 20.0, 30.0])
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 3 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([10.0, 20.0, 30.0], dtype=np.float32), a)
+
+ def testFloatTyped(self):
+ t = tensor_util.make_tensor_proto([10.0, 20.0, 30.0], dtype=types.float32)
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 3 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([10.0, 20.0, 30.0], dtype=np.float32), a)
+
+ def testFloatTypeCoerce(self):
+ t = tensor_util.make_tensor_proto([10, 20, 30], dtype=types.float32)
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 3 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([10.0, 20.0, 30.0], dtype=np.float32), a)
+
+ def testFloatTypeCoerceNdarray(self):
+ arr = np.asarray([10, 20, 30], dtype="int")
+ t = tensor_util.make_tensor_proto(arr, dtype=types.float32)
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 3 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([10.0, 20.0, 30.0], dtype=np.float32), a)
+
+ def testFloatSizes(self):
+ t = tensor_util.make_tensor_proto([10.0, 20.0, 30.0], shape=[1, 3])
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([[10.0, 20.0, 30.0]], dtype=np.float32), a)
+
+ def testFloatSizes2(self):
+ t = tensor_util.make_tensor_proto([10.0, 20.0, 30.0], shape=[3, 1])
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 3 } dim { size: 1 } }
+ tensor_content: "\000\000 A\000\000\240A\000\000\360A"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float32, a.dtype)
+ self.assertAllClose(np.array([[10.0], [20.0], [30.0]], dtype=np.float32),
+ a)
+
+ def testFloatSizesLessValues(self):
+ t = tensor_util.make_tensor_proto(10.0, shape=[1, 3])
+ self.assertProtoEquals("""
+ dtype: DT_FLOAT
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ float_val: 10.0
+ """, t)
+ # No conversion to Ndarray for this one: not enough values.
+
+ def testFloatNpArrayFloat64(self):
+ t = tensor_util.make_tensor_proto(
+ np.array([[10.0, 20.0, 30.0]], dtype=np.float64))
+ self.assertProtoEquals("""
+ dtype: DT_DOUBLE
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ tensor_content: "\000\000\000\000\000\000$@\000\000\000\000\000\0004@\000\000\000\000\000\000>@"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.float64, a.dtype)
+ self.assertAllClose(np.array([[10.0, 20.0, 30.0]], dtype=np.float64),
+ tensor_util.MakeNdarray(t))
+
+ def testFloatTypesWithImplicitRepeat(self):
+ for dtype, nptype in [
+ (types.float32, np.float32), (types.float64, np.float64)]:
+ t = tensor_util.make_tensor_proto([10.0], shape=[3, 4], dtype=dtype)
+ a = tensor_util.MakeNdarray(t)
+ self.assertAllClose(np.array([[10.0, 10.0, 10.0, 10.0],
+ [10.0, 10.0, 10.0, 10.0],
+ [10.0, 10.0, 10.0, 10.0]], dtype=nptype), a)
+
+ def testInt(self):
+ t = tensor_util.make_tensor_proto(10)
+ self.assertProtoEquals("""
+ dtype: DT_INT32
+ tensor_shape {}
+ int_val: 10
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.int32, a.dtype)
+ self.assertAllClose(np.array(10, dtype=np.int32), a)
+
+ def testIntNDefaultType(self):
+ t = tensor_util.make_tensor_proto([10, 20, 30, 40], shape=[2, 2])
+ self.assertProtoEquals("""
+ dtype: DT_INT32
+ tensor_shape { dim { size: 2 } dim { size: 2 } }
+ tensor_content: "\\n\000\000\000\024\000\000\000\036\000\000\000(\000\000\000"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.int32, a.dtype)
+ self.assertAllClose(np.array([[10, 20], [30, 40]], dtype=np.int32), a)
+
+ def testIntTypes(self):
+ for dtype, nptype in [
+ (types.int32, np.int32),
+ (types.uint8, np.uint8),
+ (types.int16, np.int16),
+ (types.int8, np.int8)]:
+ # Test with array.
+ t = tensor_util.make_tensor_proto([10, 20, 30], dtype=dtype)
+ self.assertEquals(dtype, t.dtype)
+ self.assertProtoEquals("dim { size: 3 }", t.tensor_shape)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(nptype, a.dtype)
+ self.assertAllClose(np.array([10, 20, 30], dtype=nptype), a)
+ # Test with ndarray.
+ t = tensor_util.make_tensor_proto(np.array([10, 20, 30], dtype=nptype))
+ self.assertEquals(dtype, t.dtype)
+ self.assertProtoEquals("dim { size: 3 }", t.tensor_shape)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(nptype, a.dtype)
+ self.assertAllClose(np.array([10, 20, 30], dtype=nptype), a)
+
+ def testIntTypesWithImplicitRepeat(self):
+ for dtype, nptype in [
+ (types.int64, np.int64),
+ (types.int32, np.int32),
+ (types.uint8, np.uint8),
+ (types.int16, np.int16),
+ (types.int8, np.int8)]:
+ t = tensor_util.make_tensor_proto([10], shape=[3, 4], dtype=dtype)
+ a = tensor_util.MakeNdarray(t)
+ self.assertAllEqual(np.array([[10, 10, 10, 10],
+ [10, 10, 10, 10],
+ [10, 10, 10, 10]], dtype=nptype), a)
+
+ def testLong(self):
+ t = tensor_util.make_tensor_proto(10, dtype=types.int64)
+ self.assertProtoEquals("""
+ dtype: DT_INT64
+ tensor_shape {}
+ int64_val: 10
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.int64, a.dtype)
+ self.assertAllClose(np.array(10, dtype=np.int64), a)
+
+ def testLongN(self):
+ t = tensor_util.make_tensor_proto([10, 20, 30], shape=[1, 3],
+ dtype=types.int64)
+ self.assertProtoEquals("""
+ dtype: DT_INT64
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ tensor_content: "\\n\000\000\000\000\000\000\000\024\000\000\000\000\000\000\000\036\000\000\000\000\000\000\000"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.int64, a.dtype)
+ self.assertAllClose(np.array([[10, 20, 30]], dtype=np.int64), a)
+
+ def testLongNpArray(self):
+ t = tensor_util.make_tensor_proto(np.array([10, 20, 30]))
+ self.assertProtoEquals("""
+ dtype: DT_INT64
+ tensor_shape { dim { size: 3 } }
+ tensor_content: "\\n\000\000\000\000\000\000\000\024\000\000\000\000\000\000\000\036\000\000\000\000\000\000\000"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.int64, a.dtype)
+ self.assertAllClose(np.array([10, 20, 30], dtype=np.int64), a)
+
+ def testString(self):
+ t = tensor_util.make_tensor_proto("foo")
+ self.assertProtoEquals("""
+ dtype: DT_STRING
+ tensor_shape {}
+ string_val: "foo"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.object, a.dtype)
+ self.assertEquals(["foo"], a)
+
+ def testStringWithImplicitRepeat(self):
+ t = tensor_util.make_tensor_proto("f", shape=[3, 4])
+ a = tensor_util.MakeNdarray(t)
+ self.assertAllEqual(np.array([["f", "f", "f", "f"],
+ ["f", "f", "f", "f"],
+ ["f", "f", "f", "f"]], dtype=np.object), a)
+
+ def testStringN(self):
+ t = tensor_util.make_tensor_proto(["foo", "bar", "baz"], shape=[1, 3])
+ self.assertProtoEquals("""
+ dtype: DT_STRING
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ string_val: "foo"
+ string_val: "bar"
+ string_val: "baz"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.object, a.dtype)
+ self.assertAllEqual(np.array([["foo", "bar", "baz"]]), a)
+
+ def testStringNpArray(self):
+ t = tensor_util.make_tensor_proto(np.array([["a", "ab"], ["abc", "abcd"]]))
+ self.assertProtoEquals("""
+ dtype: DT_STRING
+ tensor_shape { dim { size: 2 } dim { size: 2 } }
+ string_val: "a"
+ string_val: "ab"
+ string_val: "abc"
+ string_val: "abcd"
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.object, a.dtype)
+ self.assertAllEqual(np.array([["a", "ab"], ["abc", "abcd"]]), a)
+
+ def testComplex(self):
+ t = tensor_util.make_tensor_proto((1+2j), dtype=types.complex64)
+ self.assertProtoEquals("""
+ dtype: DT_COMPLEX64
+ tensor_shape {}
+ scomplex_val: 1
+ scomplex_val: 2
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.complex64, a.dtype)
+ self.assertAllEqual(np.array(1 + 2j), a)
+
+ def testComplexWithImplicitRepeat(self):
+ t = tensor_util.make_tensor_proto((1+1j), shape=[3, 4],
+ dtype=types.complex64)
+ a = tensor_util.MakeNdarray(t)
+ self.assertAllClose(np.array([[(1+1j), (1+1j), (1+1j), (1+1j)],
+ [(1+1j), (1+1j), (1+1j), (1+1j)],
+ [(1+1j), (1+1j), (1+1j), (1+1j)]],
+ dtype=np.complex64), a)
+
+ def testComplexN(self):
+ t = tensor_util.make_tensor_proto([(1+2j), (3+4j), (5+6j)], shape=[1, 3],
+ dtype=types.complex64)
+ self.assertProtoEquals("""
+ dtype: DT_COMPLEX64
+ tensor_shape { dim { size: 1 } dim { size: 3 } }
+ scomplex_val: 1
+ scomplex_val: 2
+ scomplex_val: 3
+ scomplex_val: 4
+ scomplex_val: 5
+ scomplex_val: 6
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.complex64, a.dtype)
+ self.assertAllEqual(np.array([[(1+2j), (3+4j), (5+6j)]]), a)
+
+ def testComplexNpArray(self):
+ t = tensor_util.make_tensor_proto(
+ np.array([[(1+2j), (3+4j)], [(5+6j), (7+8j)]]), dtype=types.complex64)
+ # scomplex_val are real_0, imag_0, real_1, imag_1, ...
+ self.assertProtoEquals("""
+ dtype: DT_COMPLEX64
+ tensor_shape { dim { size: 2 } dim { size: 2 } }
+ scomplex_val: 1
+ scomplex_val: 2
+ scomplex_val: 3
+ scomplex_val: 4
+ scomplex_val: 5
+ scomplex_val: 6
+ scomplex_val: 7
+ scomplex_val: 8
+ """, t)
+ a = tensor_util.MakeNdarray(t)
+ self.assertEquals(np.complex64, a.dtype)
+ self.assertAllEqual(np.array([[(1+2j), (3+4j)], [(5+6j), (7+8j)]]), a)
+
+ def testUnsupportedDType(self):
+ with self.assertRaises(TypeError):
+ tensor_util.make_tensor_proto(np.array([1]), 0)
+
+ def testShapeTooLarge(self):
+ with self.assertRaises(ValueError):
+ tensor_util.make_tensor_proto(np.array([1, 2]), shape=[1])
+
+ def testLowRankSupported(self):
+ t = tensor_util.make_tensor_proto(np.array(7))
+ self.assertProtoEquals("""
+ dtype: DT_INT64
+ tensor_shape {}
+ int64_val: 7
+ """, t)
+
+ def testShapeEquals(self):
+ t = tensor_util.make_tensor_proto([10, 20, 30, 40], shape=[2, 2])
+ self.assertTrue(tensor_util.ShapeEquals(t, [2, 2]))
+ self.assertTrue(tensor_util.ShapeEquals(t, (2, 2)))
+ self.assertTrue(
+ tensor_util.ShapeEquals(t, tensor_util.MakeTensorShapeProto([2, 2])))
+ self.assertFalse(tensor_util.ShapeEquals(t, [5, 3]))
+ self.assertFalse(tensor_util.ShapeEquals(t, [1, 4]))
+ self.assertFalse(tensor_util.ShapeEquals(t, [4]))
+
+
+class ConstantValueTest(test_util.TensorFlowTestCase):
+
+ def testConstant(self):
+ np_val = np.random.rand(3, 4, 7).astype(np.float32)
+ tf_val = constant_op.constant(np_val)
+ self.assertAllClose(np_val, tensor_util.ConstantValue(tf_val))
+
+ np_val = np.random.rand(3, 0, 7).astype(np.float32)
+ tf_val = constant_op.constant(np_val)
+ self.assertAllClose(np_val, tensor_util.ConstantValue(tf_val))
+
+ def testUnknown(self):
+ tf_val = state_ops.variable_op(shape=[3, 4, 7], dtype=types.float32)
+ self.assertIs(None, tensor_util.ConstantValue(tf_val))
+
+ def testShape(self):
+ np_val = np.array([1, 2, 3])
+ tf_val = array_ops.shape(constant_op.constant(0.0, shape=[1, 2, 3]))
+ self.assertAllEqual(np_val, tensor_util.ConstantValue(tf_val))
+
+ def testSize(self):
+ np_val = np.array([6])
+ tf_val = array_ops.size(constant_op.constant(0.0, shape=[1, 2, 3]))
+ self.assertAllEqual(np_val, tensor_util.ConstantValue(tf_val))
+
+ def testRank(self):
+ np_val = np.array([3])
+ tf_val = array_ops.rank(constant_op.constant(0.0, shape=[1, 2, 3]))
+ self.assertAllEqual(np_val, tensor_util.ConstantValue(tf_val))
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/framework/test_kernel_label_op.cc b/tensorflow/python/framework/test_kernel_label_op.cc
new file mode 100644
index 0000000000..50f8522e1b
--- /dev/null
+++ b/tensorflow/python/framework/test_kernel_label_op.cc
@@ -0,0 +1,47 @@
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+REGISTER_OP("KernelLabel").Output("result: string");
+
+namespace {
+enum KernelLabel { DEFAULT_LABEL, OVERLOAD_1_LABEL, OVERLOAD_2_LABEL };
+} // namespace
+
+template <KernelLabel KL>
+class KernelLabelOp : public OpKernel {
+ public:
+ using OpKernel::OpKernel;
+
+ void Compute(OpKernelContext* ctx) override {
+ Tensor* output;
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_output("result", TensorShape({}), &output));
+ switch (KL) {
+ case DEFAULT_LABEL:
+ output->scalar<string>()() = "My label is: default";
+ break;
+ case OVERLOAD_1_LABEL:
+ output->scalar<string>()() = "My label is: overload_1";
+ break;
+ case OVERLOAD_2_LABEL:
+ output->scalar<string>()() = "My label is: overload_2";
+ break;
+ }
+ }
+};
+
+REGISTER_KERNEL_BUILDER(Name("KernelLabel").Device(DEVICE_CPU),
+ KernelLabelOp<DEFAULT_LABEL>);
+REGISTER_KERNEL_BUILDER(Name("KernelLabel")
+ .Device(DEVICE_CPU)
+ .Label("overload_1"),
+ KernelLabelOp<OVERLOAD_1_LABEL>);
+REGISTER_KERNEL_BUILDER(Name("KernelLabel")
+ .Device(DEVICE_CPU)
+ .Label("overload_2"),
+ KernelLabelOp<OVERLOAD_2_LABEL>);
+
+} // end namespace tensorflow
diff --git a/tensorflow/python/framework/test_util.py b/tensorflow/python/framework/test_util.py
new file mode 100644
index 0000000000..597a5ad829
--- /dev/null
+++ b/tensorflow/python/framework/test_util.py
@@ -0,0 +1,437 @@
+# pylint: disable=invalid-name
+"""Test utils for tensorflow."""
+import contextlib
+import math
+import re
+import threading
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from google.protobuf import text_format
+
+from tensorflow.core.framework import config_pb2
+from tensorflow.python import pywrap_tensorflow
+from tensorflow.python.client import graph_util
+from tensorflow.python.client import session
+from tensorflow.python.framework import errors
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import googletest
+from tensorflow.python.platform import logging
+from tensorflow.python.util.protobuf import compare
+
+
+def IsGoogleCudaEnabled():
+ return pywrap_tensorflow.IsGoogleCudaEnabled()
+
+
+class TensorFlowTestCase(googletest.TestCase):
+ """Root class for tests that need to test tensor flow.
+ """
+
+ def __init__(self, methodName="runTest"):
+ super(TensorFlowTestCase, self).__init__(methodName)
+ self._threads = []
+ self._tempdir = None
+ self._cached_session = None
+
+ def setUp(self):
+ self._ClearCachedSession()
+ ops.reset_default_graph()
+
+ def tearDown(self):
+ for thread in self._threads:
+ self.assertFalse(thread.is_alive(), "A checkedThread did not terminate")
+ self._ClearCachedSession()
+
+ def _ClearCachedSession(self):
+ if self._cached_session is not None:
+ self._cached_session.close()
+ self._cached_session = None
+
+ def get_temp_dir(self):
+ if not self._tempdir:
+ self._tempdir = googletest.GetTempDir()
+ return self._tempdir
+
+ def _AssertProtoEquals(self, a, b):
+ """Asserts that a and b are the same proto.
+
+    Uses Proto2Cmp() first, as it returns correct results
+    for floating point attributes, and then uses assertProto2Equal()
+    in case of failure, as it provides good error messages.
+
+ Args:
+ a: a proto.
+ b: another proto.
+ """
+ if compare.Proto2Cmp(a, b) != 0:
+ compare.assertProto2Equal(self, a, b, normalize_numbers=True)
+
+ def assertProtoEquals(self, expected_message_maybe_ascii, message):
+ """Asserts that message is same as parsed expected_message_ascii.
+
+ Creates another prototype of message, reads the ascii message into it and
+ then compares them using self._AssertProtoEqual().
+
+ Args:
+ expected_message_maybe_ascii: proto message in original or ascii form
+ message: the message to validate
+ """
+
+ if type(expected_message_maybe_ascii) == type(message):
+ expected_message = expected_message_maybe_ascii
+ self._AssertProtoEquals(expected_message, message)
+ elif isinstance(expected_message_maybe_ascii, str):
+ expected_message = type(message)()
+ text_format.Merge(expected_message_maybe_ascii, expected_message)
+ self._AssertProtoEquals(expected_message, message)
+ else:
+ assert False, ("Can't compare protos of type " +
+ type(expected_message_maybe_ascii) + " and " +
+ type(message))
+
+ def assertStartsWith(self, actual, expected_start, msg=None):
+ """Assert that actual.startswith(expected_start) is True.
+
+ Args:
+ actual: str
+ expected_start: str
+ msg: Optional message to report on failure.
+ """
+ if not actual.startswith(expected_start):
+ fail_msg = "%r does not start with %r" % (actual, expected_start)
+ fail_msg += " : %r" % (msg) if msg else ""
+ self.fail(fail_msg)
+
+ # pylint: disable=g-doc-return-or-yield
+ @contextlib.contextmanager
+ def test_session(self,
+ graph=None,
+ config=None,
+ use_gpu=False,
+ force_gpu=False):
+ """Returns a TensorFlow Session for use in executing tests.
+
+ This method should be used for all functional tests.
+
+ Use the `use_gpu` and `force_gpu` options to control where ops are run. If
+ `force_gpu` is True, all ops are pinned to `/gpu:0`. Otherwise, if `use_gpu`
+ is True, TensorFlow tries to run as many ops on the GPU as possible. If both
+    `force_gpu` and `use_gpu` are False, all ops are pinned to the CPU.
+
+ Example:
+
+ class MyOperatorTest(test_util.TensorFlowTestCase):
+ def testMyOperator(self):
+ with self.test_session(use_gpu=True):
+ valid_input = [1.0, 2.0, 3.0, 4.0, 5.0]
+ result = MyOperator(valid_input).eval()
+          self.assertEqual(result, [1.0, 2.0, 3.0, 5.0, 8.0])
+ invalid_input = [-1.0, 2.0, 7.0]
+ with self.assertRaisesOpError("negative input not supported"):
+ MyOperator(invalid_input).eval()
+
+ Args:
+ graph: Optional graph to use during the returned session.
+ config: An optional config_pb2.ConfigProto to use to configure the
+ session.
+ use_gpu: If True, attempt to run as many ops as possible on GPU.
+ force_gpu: If True, pin all ops to `/gpu:0`.
+
+ Returns:
+ A Session object that should be used as a context manager to surround
+ the graph building and execution code in a test case.
+ """
+ def prepare_config(config):
+ if config is None:
+ config = config_pb2.ConfigProto()
+ config.allow_soft_placement = not force_gpu
+ config.gpu_options.per_process_gpu_memory_fraction = 0.3
+      elif force_gpu and config.allow_soft_placement:
+        # CopyFrom() returns None, so copy into a fresh proto explicitly.
+        copied = config_pb2.ConfigProto()
+        copied.CopyFrom(config)
+        copied.allow_soft_placement = False
+        config = copied
+ return config
+
+ if graph is None:
+ if self._cached_session is None:
+ self._cached_session = session.Session(graph=None,
+ config=prepare_config(config))
+ sess = self._cached_session
+ with sess.graph.as_default(), sess.as_default():
+ if force_gpu:
+ with sess.graph.device("/gpu:0"):
+ yield sess
+ elif use_gpu:
+ yield sess
+ else:
+ with sess.graph.device(graph_util.pin_to_cpu):
+ yield sess
+ else:
+ with session.Session(graph=graph, config=prepare_config(config)) as sess:
+ if force_gpu:
+ with sess.graph.device("/gpu:0"):
+ yield sess
+ elif use_gpu:
+ yield sess
+ else:
+ with sess.graph.device(graph_util.pin_to_cpu):
+ yield sess
+ # pylint: enable=g-doc-return-or-yield
+
+ class _CheckedThread(object):
+ """A wrapper class for Thread that asserts successful completion.
+
+ This class should be created using the TensorFlowTestCase.checkedThread()
+ method.
+ """
+
+ def __init__(self, testcase, target, args=None, kwargs=None):
+ """Constructs a new instance of _CheckedThread.
+
+ Args:
+ testcase: The TensorFlowTestCase for which this thread is being created.
+ target: A callable object representing the code to be executed in the
+ thread.
+ args: A tuple of positional arguments that will be passed to target.
+ kwargs: A dictionary of keyword arguments that will be passed to target.
+ """
+ self._testcase = testcase
+ self._target = target
+ self._args = () if args is None else args
+ self._kwargs = {} if kwargs is None else kwargs
+ self._thread = threading.Thread(target=self._protected_run)
+ self._exception = None
+
+ def _protected_run(self):
+ """Target for the wrapper thread. Sets self._exception on failure."""
+ try:
+ self._target(*self._args, **self._kwargs)
+# pylint: disable=broad-except
+ except Exception as e:
+ # pylint: enable=broad-except
+ self._exception = e
+
+ def start(self):
+ """Starts the thread's activity.
+
+ This must be called at most once per _CheckedThread object. It arranges
+ for the object's target to be invoked in a separate thread of control.
+ """
+ self._thread.start()
+
+ def join(self):
+ """Blocks until the thread terminates.
+
+ Raises:
+        self._testcase.failureException: If the thread terminates due to
+          an exception.
+ """
+ self._thread.join()
+ if self._exception is not None:
+ self._testcase.fail(
+ "Error in checkedThread: %s" % str(self._exception))
+
+ def is_alive(self):
+ """Returns whether the thread is alive.
+
+      This method returns True from just before the run() method starts
+      until just after the run() method terminates.
+
+ Returns:
+ True if the thread is alive, otherwise False.
+ """
+ return self._thread.is_alive()
+
+ def checkedThread(self, target, args=None, kwargs=None):
+ """Returns a Thread wrapper that asserts 'target' completes successfully.
+
+ This method should be used to create all threads in test cases, as
+ otherwise there is a risk that a thread will silently fail, and/or
+ assertions made in the thread will not be respected.
+
+ Args:
+ target: A callable object to be executed in the thread.
+ args: The argument tuple for the target invocation. Defaults to ().
+ kwargs: A dictionary of keyword arguments for the target invocation.
+ Defaults to {}.
+
+ Returns:
+ A wrapper for threading.Thread that supports start() and join() methods.
+ """
+ ret = TensorFlowTestCase._CheckedThread(self, target, args, kwargs)
+ self._threads.append(ret)
+ return ret
+# pylint: enable=invalid-name
+
+ def assertNear(self, f1, f2, err):
+ """Asserts that two floats are near each other.
+
+ Checks that |f1 - f2| < err and asserts a test failure
+ if not.
+
+ Args:
+ f1: a float value.
+ f2: a float value.
+ err: a float value.
+ """
+ self.assertTrue(math.fabs(f1 - f2) < err)
+
+ def assertArrayNear(self, farray1, farray2, err):
+ """Asserts that two float arrays are near each other.
+
+ Checks that for all elements of farray1 and farray2
+ |f1 - f2| < err. Asserts a test failure if not.
+
+ Args:
+ farray1: a list of float values.
+ farray2: a list of float values.
+ err: a float value.
+ """
+ for f1, f2 in zip(farray1, farray2):
+ self.assertNear(f1, f2, err)
+
+ def _NDArrayNear(self, ndarray1, ndarray2, err):
+ return np.linalg.norm(ndarray1 - ndarray2) < err
+
+ def assertNDArrayNear(self, ndarray1, ndarray2, err):
+ """Asserts that two numpy arrays have near values.
+
+ Args:
+ ndarray1: a numpy ndarray.
+ ndarray2: a numpy ndarray.
+ err: a float. The maximum absolute difference allowed.
+ """
+ self.assertTrue(self._NDArrayNear(ndarray1, ndarray2, err))
+
+ def _GetNdArray(self, a):
+ if not isinstance(a, np.ndarray):
+ a = np.array(a)
+ return a
+
+ def assertAllClose(self, a, b, rtol=1e-6, atol=1e-6):
+ """Asserts that two numpy arrays have near values.
+
+ Args:
+      a: a numpy ndarray or anything that can be converted to one.
+      b: a numpy ndarray or anything that can be converted to one.
+ rtol: relative tolerance
+ atol: absolute tolerance
+ """
+ a = self._GetNdArray(a)
+ b = self._GetNdArray(b)
+ self.assertEqual(
+ a.shape, b.shape,
+ "Shape mismatch: expected %s, got %s." % (a.shape, b.shape))
+ if not np.allclose(a, b, rtol=rtol, atol=atol):
+ # Prints more details than np.testing.assert_allclose.
+ #
+ # NOTE: numpy.allclose (and numpy.testing.assert_allclose)
+ # checks whether two arrays are element-wise equal within a
+ # tolerance. The relative difference (rtol * abs(b)) and the
+ # absolute difference atol are added together to compare against
+ # the absolute difference between a and b. Here, we want to
+ # print out which elements violate such conditions.
+ cond = np.abs(a - b) > atol + rtol * np.abs(b)
+ if a.ndim:
+ x = a[np.where(cond)]
+ y = b[np.where(cond)]
+ print "not close where = ", np.where(cond)
+ else:
+ # np.where is broken for scalars
+ x, y = a, b
+ print "not close lhs = ", x
+ print "not close rhs = ", y
+ print "not close dif = ", np.abs(x - y)
+ print "not close tol = ", atol + rtol * np.abs(y)
+ np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
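+
+  # Editor's note (not part of the original file): with the default
+  # tolerances rtol=1e-6 and atol=1e-6, two values a and b count as "close"
+  # when |a - b| <= atol + rtol * |b|. For example, a=1.000001 and b=1.0 pass
+  # (1e-6 <= 2e-6), while a=1.00001 and b=1.0 fail (1e-5 > 2e-6).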
+
+ def assertAllEqual(self, a, b):
+ """Asserts that two numpy arrays have the same values.
+
+ Args:
+      a: a numpy ndarray or anything that can be converted to one.
+      b: a numpy ndarray or anything that can be converted to one.
+ """
+ a = self._GetNdArray(a)
+ b = self._GetNdArray(b)
+ self.assertEqual(
+ a.shape, b.shape,
+ "Shape mismatch: expected %s, got %s." % (a.shape, b.shape))
+ same = (a == b)
+
+ if a.dtype == np.float32 or a.dtype == np.float64:
+ same = np.logical_or(same, np.logical_and(np.isnan(a), np.isnan(b)))
+ if not np.all(same):
+ # Prints more details than np.testing.assert_array_equal.
+ diff = np.logical_not(same)
+ if a.ndim:
+ x = a[np.where(diff)]
+ y = b[np.where(diff)]
+ print "not equal where = ", np.where(diff)
+ else:
+ # np.where is broken for scalars
+ x, y = a, b
+ print "not equal lhs = ", x
+ print "not equal rhs = ", y
+ np.testing.assert_array_equal(a, b)
+
+ # pylint: disable=g-doc-return-or-yield
+ @contextlib.contextmanager
+ def assertRaisesWithPredicateMatch(self, exception_type,
+ expected_err_re_or_predicate):
+ """Returns a context manager to enclose code expected to raise an exception.
+
+ Args:
+ exception_type: The expected type of exception that should be raised.
+ expected_err_re_or_predicate: If this is callable, it should be a function
+ of one argument that inspects the passed-in OpError exception and
+ returns True (success) or False (please fail the test). Otherwise, the
+ error message is expected to match this regular expression partially.
+
+ Returns:
+ A context manager to surround code that is expected to raise an
+ errors.OpError exception.
+ """
+ if callable(expected_err_re_or_predicate):
+ predicate = expected_err_re_or_predicate
+ else:
+ def predicate(e):
+ err_str = e.message
+ op = e.op
+ while op is not None:
+ err_str += "\nCaused by: " + op.name
+ op = op._original_op
+ logging.info("Searching within error strings: '%s' within '%s'",
+ expected_err_re_or_predicate, err_str)
+ return re.search(expected_err_re_or_predicate, err_str)
+ try:
+ yield
+ self.fail(exception_type.__name__ + " not raised")
+# pylint: disable=broad-except
+ except Exception as e:
+ # pylint: enable=broad-except
+ if not isinstance(e, exception_type) or not predicate(e):
+ raise AssertionError(e)
+ # pylint: enable=g-doc-return-or-yield
+
+ def assertRaisesOpError(self, expected_err_re_or_predicate):
+ return self.assertRaisesWithPredicateMatch(errors.OpError,
+ expected_err_re_or_predicate)
+
+ def assertShapeEqual(self, np_array, tf_tensor):
+ """Asserts that a Numpy ndarray and a TensorFlow tensor have the same shape.
+
+ Args:
+ np_array: A Numpy ndarray or Numpy scalar.
+ tf_tensor: A Tensor.
+
+ Raises:
+ TypeError: If the arguments have the wrong type.
+ """
+ if not isinstance(np_array, (np.ndarray, np.generic)):
+ raise TypeError("np_array must be a Numpy ndarray or Numpy scalar")
+ if not isinstance(tf_tensor, ops.Tensor):
+ raise TypeError("tf_tensor must be a Tensor")
+ self.assertAllEqual(np_array.shape, tf_tensor.get_shape().as_list())
diff --git a/tensorflow/python/framework/test_util_test.py b/tensorflow/python/framework/test_util_test.py
new file mode 100644
index 0000000000..e0618cfea4
--- /dev/null
+++ b/tensorflow/python/framework/test_util_test.py
@@ -0,0 +1,128 @@
+"""Tests for tensorflow.ops.test_util."""
+import threading
+
+import tensorflow.python.platform
+import numpy as np
+
+from google.protobuf import text_format
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.framework import errors
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.platform import googletest
+from tensorflow.python.ops import logging_ops
+
+class TestUtilTest(test_util.TensorFlowTestCase):
+
+ def testIsGoogleCudaEnabled(self):
+ # The test doesn't assert anything. It ensures the py wrapper
+ # function is generated correctly.
+ if test_util.IsGoogleCudaEnabled():
+ print "GoogleCuda is enabled"
+ else:
+ print "GoogleCuda is disabled"
+
+ def testAssertProtoEqualsStr(self):
+
+ graph_str = "node { name: 'w1' op: 'params' }"
+ graph_def = graph_pb2.GraphDef()
+ text_format.Merge(graph_str, graph_def)
+
+ # test string based comparison
+ self.assertProtoEquals(graph_str, graph_def)
+
+ # test original comparison
+ self.assertProtoEquals(graph_def, graph_def)
+
+ def testNDArrayNear(self):
+ a1 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+ a2 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+ a3 = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
+ self.assertTrue(self._NDArrayNear(a1, a2, 1e-5))
+ self.assertFalse(self._NDArrayNear(a1, a3, 1e-5))
+
+ def testCheckedThreadSucceeds(self):
+ def noop(ev):
+ ev.set()
+
+ event_arg = threading.Event()
+
+ self.assertFalse(event_arg.is_set())
+ t = self.checkedThread(target=noop, args=(event_arg,))
+ t.start()
+ t.join()
+ self.assertTrue(event_arg.is_set())
+
+ def testCheckedThreadFails(self):
+ def err_func():
+ return 1 / 0
+
+ t = self.checkedThread(target=err_func)
+ t.start()
+ with self.assertRaises(self.failureException) as fe:
+ t.join()
+ self.assertTrue("integer division or modulo by zero"
+ in fe.exception.message)
+
+ def testCheckedThreadWithWrongAssertionFails(self):
+ x = 37
+
+ def err_func():
+ self.assertTrue(x < 10)
+
+ t = self.checkedThread(target=err_func)
+ t.start()
+ with self.assertRaises(self.failureException) as fe:
+ t.join()
+ self.assertTrue("False is not true" in fe.exception.message)
+
+ def testMultipleThreadsWithOneFailure(self):
+ def err_func(i):
+ self.assertTrue(i != 7)
+
+ threads = [self.checkedThread(target=err_func, args=(i,))
+ for i in range(10)]
+ for t in threads:
+ t.start()
+ for i, t in enumerate(threads):
+ if i == 7:
+ with self.assertRaises(self.failureException):
+ t.join()
+ else:
+ t.join()
+
+ def _WeMustGoDeeper(self, msg):
+ with self.assertRaisesOpError(msg):
+ node_def = ops._NodeDef("op_type", "name")
+ node_def_orig = ops._NodeDef("op_type_orig", "orig")
+ op_orig = ops.Operation(node_def_orig, ops.get_default_graph())
+ op = ops.Operation(node_def, ops.get_default_graph(), original_op=op_orig)
+ raise errors.UnauthenticatedError(node_def, op, "true_err")
+
+ def testAssertRaisesOpErrorDoesNotPassMessageDueToLeakedStack(self):
+ with self.assertRaises(AssertionError):
+ self._WeMustGoDeeper("this_is_not_the_error_you_are_looking_for")
+
+ self._WeMustGoDeeper("true_err")
+ self._WeMustGoDeeper("name")
+ self._WeMustGoDeeper("orig")
+
+ def testAllCloseScalars(self):
+ self.assertAllClose(7, 7 + 1e-8)
+ with self.assertRaisesRegexp(AssertionError, r"Not equal to tolerance"):
+ self.assertAllClose(7, 8)
+
+ def testForceGPU(self):
+ with self.assertRaisesRegexp(errors.InvalidArgumentError,
+ "Cannot assign a device to node"):
+ with self.test_session(force_gpu=True):
+ # this relies on us not having a GPU implementation for assert, which
+ # seems sensible
+ x = [True]
+ y = [15]
+ logging_ops.Assert(x, y).run()
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/framework/types.py b/tensorflow/python/framework/types.py
new file mode 100644
index 0000000000..6a8c629fe4
--- /dev/null
+++ b/tensorflow/python/framework/types.py
@@ -0,0 +1,418 @@
+"""Library of dtypes (Tensor element types)."""
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.core.framework import types_pb2
+
+
+class DType(object):
+ """Represents the type of the elements in a `Tensor`.
+
+ The following `DType` objects are defined:
+
+ * `tf.float32`: 32-bit single-precision floating-point.
+ * `tf.float64`: 64-bit double-precision floating-point.
+ * `tf.bfloat16`: 16-bit truncated floating-point.
+ * `tf.complex64`: 64-bit single-precision complex.
+
+ * `tf.int8`: 8-bit signed integer.
+ * `tf.uint8`: 8-bit unsigned integer.
+ * `tf.int32`: 32-bit signed integer.
+ * `tf.int64`: 64-bit signed integer.
+
+ * `tf.bool`: Boolean.
+
+ * `tf.string`: String.
+
+ * `tf.qint8`: Quantized 8-bit signed integer.
+ * `tf.quint8`: Quantized 8-bit unsigned integer.
+ * `tf.qint32`: Quantized 32-bit signed integer.
+
+ In addition, variants of these types with the `_ref` suffix are
+ defined for reference-typed tensors.
+
+ The `tf.as_dtype()` function converts numpy types and string type
+ names to a `DType` object.
+
+ @@is_compatible_with
+ @@name
+ @@base_dtype
+ @@is_ref_dtype
+ @@as_ref
+ @@is_integer
+ @@is_quantized
+
+ @@as_numpy_dtype
+ @@as_datatype_enum
+ """
+
+ def __init__(self, type_enum):
+ """Creates a new `DataType`.
+
+ NOTE(mrry): In normal circumstances, you should not need to
+ construct a DataType object directly. Instead, use the
+ types.as_dtype() function.
+
+ Args:
+ type_enum: A `types_pb2.DataType` enum value.
+
+ Raises:
+      TypeError: If `type_enum` is not a valid `types_pb2.DataType` value.
+
+ """
+ # TODO(mrry): Make the necessary changes (using __new__) to ensure
+ # that calling this returns one of the interned values.
+ type_enum = int(type_enum)
+ if (type_enum not in types_pb2.DataType.values()
+ or type_enum == types_pb2.DT_INVALID):
+ raise TypeError(
+ "type_enum is not a valid types_pb2.DataType: %s" % type_enum)
+ self._type_enum = type_enum
+
+ @property
+ def is_ref_dtype(self):
+ """Returns `True` if this `DType` represents a reference type."""
+ return self._type_enum > 100
+
+ @property
+ def as_ref(self):
+ """Returns a reference `DType` based on this `DType`."""
+ if self.is_ref_dtype:
+ return self
+ else:
+ return _INTERN_TABLE[self._type_enum + 100]
+
+ @property
+ def base_dtype(self):
+ """Returns a non-reference `DType` based on this `DType`."""
+ if self.is_ref_dtype:
+ return _INTERN_TABLE[self._type_enum - 100]
+ else:
+ return self
+
+ @property
+ def as_numpy_dtype(self):
+ """Returns a `numpy.dtype` based on this `DType`."""
+ return _TF_TO_NP[self._type_enum]
+
+ @property
+ def as_datatype_enum(self):
+ """Returns a `types_pb2.DataType` enum value based on this `DType`."""
+ return self._type_enum
+
+ @property
+ def is_integer(self):
+ """Returns whether this is a (non-quantized) integer type."""
+ return (not self.is_quantized and
+ issubclass(self.as_numpy_dtype, np.integer))
+
+ @property
+ def is_quantized(self):
+ """Returns whether this is a quantized data type."""
+ return self.base_dtype in [qint8, quint8, qint32, bfloat16]
+
+ @property
+ def min(self):
+ """Returns the minimum representable value in this data type.
+
+ Raises:
+ TypeError: if this is a non-numeric, unordered, or quantized type.
+
+ """
+ if (self.is_quantized or self.base_dtype == bool or
+ self.base_dtype == string or self.base_dtype == complex64):
+ raise TypeError("Cannot find minimum value of %s." % self)
+
+    # There is no simple way to get the min value of a dtype, so we have to
+    # check float and int types separately.
+    try:
+      return np.finfo(self.as_numpy_dtype()).min
+    except:  # bare except, since the exceptions finfo may raise are undocumented
+ try:
+ return np.iinfo(self.as_numpy_dtype()).min
+ except:
+ raise TypeError("Cannot find minimum value of %s." % self)
+
+ @property
+ def max(self):
+ """Returns the maximum representable value in this data type.
+
+ Raises:
+ TypeError: if this is a non-numeric, unordered, or quantized type.
+
+ """
+ if (self.is_quantized or self.base_dtype == bool or
+ self.base_dtype == string or self.base_dtype == complex64):
+ raise TypeError("Cannot find maximum value of %s." % self)
+
+    # There is no simple way to get the max value of a dtype, so we have to
+    # check float and int types separately.
+    try:
+      return np.finfo(self.as_numpy_dtype()).max
+    except:  # bare except, since the exceptions finfo may raise are undocumented
+ try:
+ return np.iinfo(self.as_numpy_dtype()).max
+ except:
+ raise TypeError("Cannot find maximum value of %s." % self)
+
+ def is_compatible_with(self, other):
+ """Returns True if the `other` DType will be converted to this DType.
+
+ The conversion rules are as follows:
+
+ ```
+ DType(T) .is_compatible_with(DType(T)) == True
+ DType(T) .is_compatible_with(DType(T).as_ref) == True
+ DType(T).as_ref.is_compatible_with(DType(T)) == False
+ DType(T).as_ref.is_compatible_with(DType(T).as_ref) == True
+ ```
+
+ Args:
+ other: A `DType` (or object that may be converted to a `DType`).
+
+ Returns:
+ True if a Tensor of the `other` `DType` will be implicitly converted to
+ this `DType`.
+ """
+ other = as_dtype(other)
+ return self._type_enum in (
+ other.as_datatype_enum, other.base_dtype.as_datatype_enum)
+
+ def __eq__(self, other):
+ """Returns True iff this DType refers to the same type as `other`."""
+ return (other is not None
+ and self._type_enum == as_dtype(other).as_datatype_enum)
+
+ def __ne__(self, other):
+ """Returns True iff self != other."""
+ return not self.__eq__(other)
+
+ @property
+ def name(self):
+ """Returns the string name for this `DType`."""
+ return _TYPE_TO_STRING[self._type_enum]
+
+ def __str__(self):
+ return "<dtype: %r>" % self.name
+
+ def __repr__(self):
+ return "tf." + self.name
+
+
+# Define standard wrappers for the types_pb2.DataType enum.
+float32 = DType(types_pb2.DT_FLOAT)
+float64 = DType(types_pb2.DT_DOUBLE)
+double = float64
+int32 = DType(types_pb2.DT_INT32)
+uint8 = DType(types_pb2.DT_UINT8)
+int16 = DType(types_pb2.DT_INT16)
+int8 = DType(types_pb2.DT_INT8)
+string = DType(types_pb2.DT_STRING)
+complex64 = DType(types_pb2.DT_COMPLEX64)
+int64 = DType(types_pb2.DT_INT64)
+bool = DType(types_pb2.DT_BOOL)
+qint8 = DType(types_pb2.DT_QINT8)
+quint8 = DType(types_pb2.DT_QUINT8)
+qint32 = DType(types_pb2.DT_QINT32)
+bfloat16 = DType(types_pb2.DT_BFLOAT16)
+float32_ref = DType(types_pb2.DT_FLOAT_REF)
+float64_ref = DType(types_pb2.DT_DOUBLE_REF)
+double_ref = float64_ref
+int32_ref = DType(types_pb2.DT_INT32_REF)
+uint8_ref = DType(types_pb2.DT_UINT8_REF)
+int16_ref = DType(types_pb2.DT_INT16_REF)
+int8_ref = DType(types_pb2.DT_INT8_REF)
+string_ref = DType(types_pb2.DT_STRING_REF)
+complex64_ref = DType(types_pb2.DT_COMPLEX64_REF)
+int64_ref = DType(types_pb2.DT_INT64_REF)
+bool_ref = DType(types_pb2.DT_BOOL_REF)
+qint8_ref = DType(types_pb2.DT_QINT8_REF)
+quint8_ref = DType(types_pb2.DT_QUINT8_REF)
+qint32_ref = DType(types_pb2.DT_QINT32_REF)
+bfloat16_ref = DType(types_pb2.DT_BFLOAT16_REF)
+
+
+# Maintain an intern table so that we don't have to create a large
+# number of small objects.
+_INTERN_TABLE = {
+ types_pb2.DT_FLOAT: float32,
+ types_pb2.DT_DOUBLE: float64,
+ types_pb2.DT_INT32: int32,
+ types_pb2.DT_UINT8: uint8,
+ types_pb2.DT_INT16: int16,
+ types_pb2.DT_INT8: int8,
+ types_pb2.DT_STRING: string,
+ types_pb2.DT_COMPLEX64: complex64,
+ types_pb2.DT_INT64: int64,
+ types_pb2.DT_BOOL: bool,
+ types_pb2.DT_QINT8: qint8,
+ types_pb2.DT_QUINT8: quint8,
+ types_pb2.DT_QINT32: qint32,
+ types_pb2.DT_BFLOAT16: bfloat16,
+ types_pb2.DT_FLOAT_REF: float32_ref,
+ types_pb2.DT_DOUBLE_REF: float64_ref,
+ types_pb2.DT_INT32_REF: int32_ref,
+ types_pb2.DT_UINT8_REF: uint8_ref,
+ types_pb2.DT_INT16_REF: int16_ref,
+ types_pb2.DT_INT8_REF: int8_ref,
+ types_pb2.DT_STRING_REF: string_ref,
+ types_pb2.DT_COMPLEX64_REF: complex64_ref,
+ types_pb2.DT_INT64_REF: int64_ref,
+ types_pb2.DT_BOOL_REF: bool_ref,
+ types_pb2.DT_QINT8_REF: qint8_ref,
+ types_pb2.DT_QUINT8_REF: quint8_ref,
+ types_pb2.DT_QINT32_REF: qint32_ref,
+ types_pb2.DT_BFLOAT16_REF: bfloat16_ref,
+}
+
+
+# Standard mappings between types_pb2.DataType values and string names.
+_TYPE_TO_STRING = {
+ types_pb2.DT_FLOAT: "float32",
+ types_pb2.DT_DOUBLE: "float64",
+ types_pb2.DT_INT32: "int32",
+ types_pb2.DT_UINT8: "uint8",
+ types_pb2.DT_INT16: "int16",
+ types_pb2.DT_INT8: "int8",
+ types_pb2.DT_STRING: "string",
+ types_pb2.DT_COMPLEX64: "complex64",
+ types_pb2.DT_INT64: "int64",
+ types_pb2.DT_BOOL: "bool",
+ types_pb2.DT_QINT8: "qint8",
+ types_pb2.DT_QUINT8: "quint8",
+ types_pb2.DT_QINT32: "qint32",
+ types_pb2.DT_BFLOAT16: "bfloat16",
+ types_pb2.DT_FLOAT_REF: "float32_ref",
+ types_pb2.DT_DOUBLE_REF: "float64_ref",
+ types_pb2.DT_INT32_REF: "int32_ref",
+ types_pb2.DT_UINT8_REF: "uint8_ref",
+ types_pb2.DT_INT16_REF: "int16_ref",
+ types_pb2.DT_INT8_REF: "int8_ref",
+ types_pb2.DT_STRING_REF: "string_ref",
+ types_pb2.DT_COMPLEX64_REF: "complex64_ref",
+ types_pb2.DT_INT64_REF: "int64_ref",
+ types_pb2.DT_BOOL_REF: "bool_ref",
+ types_pb2.DT_QINT8_REF: "qint8_ref",
+ types_pb2.DT_QUINT8_REF: "quint8_ref",
+ types_pb2.DT_QINT32_REF: "qint32_ref",
+ types_pb2.DT_BFLOAT16_REF: "bfloat16_ref",
+}
+_STRING_TO_TF = {value: _INTERN_TABLE[key]
+ for key, value in _TYPE_TO_STRING.iteritems()}
+# Add non-canonical aliases.
+_STRING_TO_TF["float"] = float32
+_STRING_TO_TF["float_ref"] = float32_ref
+_STRING_TO_TF["double"] = float64
+_STRING_TO_TF["double_ref"] = float64_ref
+
+
+# Numpy representation for quantized dtypes.
+#
+# These are magic strings that are used in the swig wrapper to identify
+# quantized types.
+# TODO(mrry,keveman): Investigate Numpy type registration to replace this
+# hard-coding of names.
+_np_qint8 = np.dtype([("qint8", np.int8, 1)])
+_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
+_np_qint32 = np.dtype([("qint32", np.int32, 1)])
+
+# Standard mappings between types_pb2.DataType values and numpy.dtypes.
+_NP_TO_TF = frozenset([
+ (np.float32, float32),
+ (np.float64, float64),
+ (np.int32, int32),
+ (np.int64, int64),
+ (np.uint8, uint8),
+ (np.int16, int16),
+ (np.int8, int8),
+ (np.complex64, complex64),
+ (np.object, string),
+ (np.bool, bool),
+ (_np_qint8, qint8),
+ (_np_quint8, quint8),
+ (_np_qint32, qint32),
+ # NOTE(mdevin): Intentionally no way to feed a DT_BFLOAT16.
+])
+_TF_TO_NP = {
+ types_pb2.DT_FLOAT: np.float32,
+ types_pb2.DT_DOUBLE: np.float64,
+ types_pb2.DT_INT32: np.int32,
+ types_pb2.DT_UINT8: np.uint8,
+ types_pb2.DT_INT16: np.int16,
+ types_pb2.DT_INT8: np.int8,
+ # NOTE(mdevin): For strings we use np.object as it supports variable length
+ # strings.
+ types_pb2.DT_STRING: np.object,
+ types_pb2.DT_COMPLEX64: np.complex64,
+ types_pb2.DT_INT64: np.int64,
+ types_pb2.DT_BOOL: np.bool,
+ types_pb2.DT_QINT8: _np_qint8,
+ types_pb2.DT_QUINT8: _np_quint8,
+ types_pb2.DT_QINT32: _np_qint32,
+ types_pb2.DT_BFLOAT16: np.uint16,
+
+ # Ref types
+ types_pb2.DT_FLOAT_REF: np.float32,
+ types_pb2.DT_DOUBLE_REF: np.float64,
+ types_pb2.DT_INT32_REF: np.int32,
+ types_pb2.DT_UINT8_REF: np.uint8,
+ types_pb2.DT_INT16_REF: np.int16,
+ types_pb2.DT_INT8_REF: np.int8,
+ types_pb2.DT_STRING_REF: np.object,
+ types_pb2.DT_COMPLEX64_REF: np.complex64,
+ types_pb2.DT_INT64_REF: np.int64,
+ types_pb2.DT_BOOL_REF: np.bool,
+ types_pb2.DT_QINT8_REF: _np_qint8,
+ types_pb2.DT_QUINT8_REF: _np_quint8,
+ types_pb2.DT_QINT32_REF: _np_qint32,
+ types_pb2.DT_BFLOAT16_REF: np.uint16,
+}
+
+
+QUANTIZED_DTYPES = frozenset(
+ [qint8, quint8, qint32, qint8_ref, quint8_ref, qint32_ref])
+
+
+def as_dtype(type_value):
+ """Converts the given `type_value` to a `DType`.
+
+ Args:
+ type_value: A value that can be converted to a `tf.DType`
+ object. This may currently be a `tf.DType` object, a
+ [`DataType` enum](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/types.proto),
+ a string type name, or a `numpy.dtype`.
+
+ Returns:
+ A `DType` corresponding to `type_value`.
+
+ Raises:
+ TypeError: If `type_value` cannot be converted to a `DType`.
+ """
+ if isinstance(type_value, DType):
+ return type_value
+
+ try:
+ return _INTERN_TABLE[type_value]
+ except KeyError:
+ pass
+
+ try:
+ return _STRING_TO_TF[type_value]
+ except KeyError:
+ pass
+
+ if isinstance(type_value, np.dtype):
+    # The numpy dtype for strings is variable length. We cannot compare it
+    # against a single constant (np.string does not exist), so to decide
+    # whether a dtype is a string type we compare dtype.type instead.
+ if type_value.type == np.string_ or type_value.type == np.unicode_:
+ return string
+
+ for key, val in _NP_TO_TF:
+ if key == type_value:
+ return val
+
+ raise TypeError(
+ "Cannot convert value %r to a TensorFlow DType." % type_value)
diff --git a/tensorflow/python/framework/types_test.py b/tensorflow/python/framework/types_test.py
new file mode 100644
index 0000000000..acd2994339
--- /dev/null
+++ b/tensorflow/python/framework/types_test.py
@@ -0,0 +1,174 @@
+"""Tests for tensorflow.python.framework.importer."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.core.framework import types_pb2
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.platform import googletest
+
+
+class TypesTest(test_util.TensorFlowTestCase):
+
+ def testAllTypesConstructible(self):
+ for datatype_enum in types_pb2.DataType.values():
+ if datatype_enum == types_pb2.DT_INVALID:
+ continue
+ self.assertEqual(
+ datatype_enum, types.DType(datatype_enum).as_datatype_enum)
+
+ def testAllTypesConvertibleToDType(self):
+ for datatype_enum in types_pb2.DataType.values():
+ if datatype_enum == types_pb2.DT_INVALID:
+ continue
+ self.assertEqual(
+ datatype_enum, types.as_dtype(datatype_enum).as_datatype_enum)
+
+ def testAllTypesConvertibleToNumpyDtype(self):
+ for datatype_enum in types_pb2.DataType.values():
+ if datatype_enum == types_pb2.DT_INVALID:
+ continue
+ dtype = types.as_dtype(datatype_enum)
+ numpy_dtype = dtype.as_numpy_dtype
+ _ = np.empty((1, 1, 1, 1), dtype=numpy_dtype)
+ if dtype.base_dtype != types.bfloat16:
+ # NOTE(mdevin): Intentionally no way to feed a DT_BFLOAT16.
+ self.assertEqual(
+ types.as_dtype(datatype_enum).base_dtype, types.as_dtype(numpy_dtype))
+
+ def testInvalid(self):
+ with self.assertRaises(TypeError):
+ types.DType(types_pb2.DT_INVALID)
+ with self.assertRaises(TypeError):
+ types.as_dtype(types_pb2.DT_INVALID)
+
+ def testNumpyConversion(self):
+ self.assertIs(types.float32, types.as_dtype(np.float32))
+ self.assertIs(types.float64, types.as_dtype(np.float64))
+ self.assertIs(types.int32, types.as_dtype(np.int32))
+ self.assertIs(types.int64, types.as_dtype(np.int64))
+ self.assertIs(types.uint8, types.as_dtype(np.uint8))
+ self.assertIs(types.int16, types.as_dtype(np.int16))
+ self.assertIs(types.int8, types.as_dtype(np.int8))
+ self.assertIs(types.complex64, types.as_dtype(np.complex64))
+ self.assertIs(types.string, types.as_dtype(np.object))
+ self.assertIs(types.string, types.as_dtype(np.array(["foo", "bar"]).dtype))
+ self.assertIs(types.bool, types.as_dtype(np.bool))
+ with self.assertRaises(TypeError):
+ types.as_dtype(np.dtype([("f1", np.uint), ("f2", np.int32)]))
+
+ def testStringConversion(self):
+ self.assertIs(types.float32, types.as_dtype("float32"))
+ self.assertIs(types.float64, types.as_dtype("float64"))
+ self.assertIs(types.int32, types.as_dtype("int32"))
+ self.assertIs(types.uint8, types.as_dtype("uint8"))
+ self.assertIs(types.int16, types.as_dtype("int16"))
+ self.assertIs(types.int8, types.as_dtype("int8"))
+ self.assertIs(types.string, types.as_dtype("string"))
+ self.assertIs(types.complex64, types.as_dtype("complex64"))
+ self.assertIs(types.int64, types.as_dtype("int64"))
+ self.assertIs(types.bool, types.as_dtype("bool"))
+ self.assertIs(types.qint8, types.as_dtype("qint8"))
+ self.assertIs(types.quint8, types.as_dtype("quint8"))
+ self.assertIs(types.qint32, types.as_dtype("qint32"))
+ self.assertIs(types.bfloat16, types.as_dtype("bfloat16"))
+ self.assertIs(types.float32_ref, types.as_dtype("float32_ref"))
+ self.assertIs(types.float64_ref, types.as_dtype("float64_ref"))
+ self.assertIs(types.int32_ref, types.as_dtype("int32_ref"))
+ self.assertIs(types.uint8_ref, types.as_dtype("uint8_ref"))
+ self.assertIs(types.int16_ref, types.as_dtype("int16_ref"))
+ self.assertIs(types.int8_ref, types.as_dtype("int8_ref"))
+ self.assertIs(types.string_ref, types.as_dtype("string_ref"))
+ self.assertIs(types.complex64_ref, types.as_dtype("complex64_ref"))
+ self.assertIs(types.int64_ref, types.as_dtype("int64_ref"))
+ self.assertIs(types.bool_ref, types.as_dtype("bool_ref"))
+ self.assertIs(types.qint8_ref, types.as_dtype("qint8_ref"))
+ self.assertIs(types.quint8_ref, types.as_dtype("quint8_ref"))
+ self.assertIs(types.qint32_ref, types.as_dtype("qint32_ref"))
+ self.assertIs(types.bfloat16_ref, types.as_dtype("bfloat16_ref"))
+ with self.assertRaises(TypeError):
+ types.as_dtype("not_a_type")
+
+ def testDTypesHaveUniqueNames(self):
+ dtypes = []
+ names = set()
+ for datatype_enum in types_pb2.DataType.values():
+ if datatype_enum == types_pb2.DT_INVALID:
+ continue
+ dtype = types.as_dtype(datatype_enum)
+ dtypes.append(dtype)
+ names.add(dtype.name)
+ self.assertEqual(len(dtypes), len(names))
+
+ def testIsInteger(self):
+ self.assertEqual(types.as_dtype("int8").is_integer, True)
+ self.assertEqual(types.as_dtype("int16").is_integer, True)
+ self.assertEqual(types.as_dtype("int32").is_integer, True)
+ self.assertEqual(types.as_dtype("int64").is_integer, True)
+ self.assertEqual(types.as_dtype("uint8").is_integer, True)
+ self.assertEqual(types.as_dtype("complex64").is_integer, False)
+ self.assertEqual(types.as_dtype("float").is_integer, False)
+ self.assertEqual(types.as_dtype("double").is_integer, False)
+ self.assertEqual(types.as_dtype("string").is_integer, False)
+ self.assertEqual(types.as_dtype("bool").is_integer, False)
+
+ def testMinMax(self):
+ # make sure min/max evaluates for all data types that have min/max
+ for datatype_enum in types_pb2.DataType.values():
+ if datatype_enum == types_pb2.DT_INVALID:
+ continue
+ dtype = types.as_dtype(datatype_enum)
+ numpy_dtype = dtype.as_numpy_dtype
+
+ # ignore types for which there are no minimum/maximum (or we cannot
+ # compute it, such as for the q* types)
+ if (dtype.is_quantized or
+ dtype.base_dtype == types.bool or
+ dtype.base_dtype == types.string or
+ dtype.base_dtype == types.complex64):
+ continue
+
+ print "%s: %s - %s" % (dtype, dtype.min, dtype.max)
+
+ # check some values that are known
+ if numpy_dtype == np.bool_:
+ self.assertEquals(dtype.min, 0)
+ self.assertEquals(dtype.max, 1)
+ if numpy_dtype == np.int8:
+ self.assertEquals(dtype.min, -128)
+ self.assertEquals(dtype.max, 127)
+ if numpy_dtype == np.int16:
+ self.assertEquals(dtype.min, -32768)
+ self.assertEquals(dtype.max, 32767)
+ if numpy_dtype == np.int32:
+ self.assertEquals(dtype.min, -2147483648)
+ self.assertEquals(dtype.max, 2147483647)
+ if numpy_dtype == np.int64:
+ self.assertEquals(dtype.min, -9223372036854775808)
+ self.assertEquals(dtype.max, 9223372036854775807)
+ if numpy_dtype == np.uint8:
+ self.assertEquals(dtype.min, 0)
+ self.assertEquals(dtype.max, 255)
+      if numpy_dtype == np.uint16:
+        self.assertEquals(dtype.min, 0)
+        self.assertEquals(dtype.max, 65535)
+      if numpy_dtype == np.uint32:
+        self.assertEquals(dtype.min, 0)
+        self.assertEquals(dtype.max, 4294967295)
+ if numpy_dtype in (np.float16, np.float32, np.float64):
+ self.assertEquals(dtype.min, np.finfo(numpy_dtype).min)
+ self.assertEquals(dtype.max, np.finfo(numpy_dtype).max)
+
+ def testRepr(self):
+ for enum, name in types._TYPE_TO_STRING.iteritems():
+ dtype = types.DType(enum)
+ self.assertEquals(repr(dtype), 'tf.' + name)
+ dtype2 = eval(repr(dtype))
+ self.assertEquals(type(dtype2), types.DType)
+ self.assertEquals(dtype, dtype2)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/kernel_tests/__init__.py b/tensorflow/python/kernel_tests/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/kernel_tests/__init__.py
diff --git a/tensorflow/python/kernel_tests/argmax_op_test.py b/tensorflow/python/kernel_tests/argmax_op_test.py
new file mode 100644
index 0000000000..2cd6101a87
--- /dev/null
+++ b/tensorflow/python/kernel_tests/argmax_op_test.py
@@ -0,0 +1,61 @@
+"""Tests for tensorflow.ops.argmax_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+class ArgMaxTest(tf.test.TestCase):
+
+ def _testArg(self, method, x, dimension,
+ expected_values, use_gpu=False, expected_err_re=None):
+ with self.test_session(use_gpu=use_gpu):
+ ans = method(x, dimension=dimension)
+ if expected_err_re is None:
+ tf_ans = ans.eval()
+ self.assertAllEqual(tf_ans, expected_values)
+ self.assertShapeEqual(expected_values, ans)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ ans.eval()
+
+ def _testBothArg(self, method, x, dimension,
+ expected_values, expected_err_re=None):
+ self._testArg(method, x, dimension,
+ expected_values, True, expected_err_re)
+ self._testArg(method, x, dimension,
+ expected_values, False, expected_err_re)
+
+ def _testBasic(self, dtype):
+ x = np.asarray(100*np.random.randn(200), dtype=dtype)
+
+ # Check that argmin and argmax match numpy along the primary
+ # dimension
+ self._testBothArg(tf.argmax, x, 0, x.argmax())
+ self._testBothArg(tf.argmin, x, 0, x.argmin())
+
+ def _testDim(self, dtype):
+ x = np.asarray(100*np.random.randn(3, 2, 4, 5, 6), dtype=dtype)
+
+ # Check that argmin and argmax match numpy along all dimensions
+ for dim in range(5):
+ self._testBothArg(tf.argmax, x, dim, x.argmax(dim))
+ self._testBothArg(tf.argmin, x, dim, x.argmin(dim))
+
+ def testFloat(self):
+ self._testBasic(np.float32)
+ self._testDim(np.float32)
+
+ def testDouble(self):
+ self._testBasic(np.float64)
+ self._testDim(np.float64)
+
+ def testInt32(self):
+ self._testBasic(np.int32)
+ self._testDim(np.int32)
+
+ def testInt64(self):
+ self._testBasic(np.int64)
+ self._testDim(np.int64)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/array_ops_test.py b/tensorflow/python/kernel_tests/array_ops_test.py
new file mode 100644
index 0000000000..108cc7599e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/array_ops_test.py
@@ -0,0 +1,45 @@
+"""Tests for array_ops."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.ops import array_ops
+from tensorflow.python.platform import googletest
+
+
+class OperatorShapeTest(test_util.TensorFlowTestCase):
+
+ def testExpandScalar(self):
+ scalar = 'hello'
+ scalar_expanded = array_ops.expand_dims(scalar, [0])
+ self.assertEqual(scalar_expanded.get_shape(), (1,))
+
+ def testSqueeze(self):
+ scalar = 'hello'
+ scalar_squeezed = array_ops.squeeze(scalar, ())
+ self.assertEqual(scalar_squeezed.get_shape(), ())
+
+
+class ReverseTest(test_util.TensorFlowTestCase):
+
+ def testReverse0DimAuto(self):
+ x_np = 4
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = array_ops.reverse(x_np, []).eval()
+ self.assertAllEqual(x_tf, x_np)
+
+ def testReverse1DimAuto(self):
+ x_np = [1, 4, 9]
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = array_ops.reverse(x_np, [True]).eval()
+ self.assertAllEqual(x_tf, np.asarray(x_np)[::-1])
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/kernel_tests/attention_ops_test.py b/tensorflow/python/kernel_tests/attention_ops_test.py
new file mode 100644
index 0000000000..5541c541b2
--- /dev/null
+++ b/tensorflow/python/kernel_tests/attention_ops_test.py
@@ -0,0 +1,166 @@
+"""Tests for tensorflow.ops.attention_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from tensorflow.python.ops import attention_ops
+
+
+class ExtractGlimpseTest(tf.test.TestCase):
+
+ def _VerifyValues(
+ self, tensor_in_sizes, glimpse_sizes, offsets, expected_rows,
+ expected_cols):
+ """Verifies the output values of the glimpse extraction kernel.
+
+ Args:
+ tensor_in_sizes: Input tensor dimensions in [input_rows, input_cols].
+ glimpse_sizes: Dimensions of the glimpse in [glimpse_rows, glimpse_cols].
+ offsets: Relative location of the center of the glimpse in the input
+ image expressed as [row_offset, col_offset].
+      expected_rows: A list containing the expected row numbers (None for
+        out-of-bounds entries that are expected to be replaced by uniform
+        random entries in [0, 1)).
+ expected_cols: Same as expected_rows, but for column numbers.
+ """
+
+ rows = tensor_in_sizes[0]
+ cols = tensor_in_sizes[1]
+ # Row Tensor with entries by row.
+ # [[ 1 1 1 ... ]
+ # [ 2 2 2 ... ]
+ # [ 3 3 3 ... ]
+ # [ ...
+ # ]
+ t_rows = tf.tile(
+ [[1.0 * r] for r in range(1, rows + 1)], [1, cols],
+ name='tile_rows')
+
+ # Shuffle to switch to a convention of (batch_size, height, width, depth).
+ t_rows_4d = tf.transpose(
+ tf.expand_dims(
+ tf.expand_dims(t_rows, 0), 3), [0, 2, 1, 3])
+
+ # Column Tensor with entries by column.
+ # [[ 1 2 3 4 ... ]
+ # [ 1 2 3 4 ... ]
+ # [ 1 2 3 4 ... ]
+ # [ ... ]
+ # ]
+ t_cols = tf.tile(
+ [[1.0 * r for r in range(1, cols + 1)]],
+ [rows, 1], name='tile_cols')
+
+ # Shuffle to switch to a convention of (batch_size, height, width, depth).
+ t_cols_4d = tf.transpose(
+ tf.expand_dims(
+ tf.expand_dims(t_cols, 0), 3), [0, 2, 1, 3])
+
+ # extract_glimpses from Row and Column Tensor, respectively.
+    # Swap the order of glimpse_sizes and offsets to go from the (row, col)
+    # convention to TensorFlow's (height, width) convention.
+ t1 = tf.constant([glimpse_sizes[1], glimpse_sizes[0]], shape=[2])
+ t2 = tf.constant([offsets[1], offsets[0]], shape=[1, 2])
+ glimpse_rows = (tf.transpose(
+ attention_ops.extract_glimpse(t_rows_4d, t1, t2), [0, 2, 1, 3]))
+ glimpse_cols = (tf.transpose(
+ attention_ops.extract_glimpse(t_cols_4d, t1, t2), [0, 2, 1, 3]))
+
+    # Evaluate the TensorFlow graph.
+ with self.test_session() as sess:
+ value_rows, value_cols = sess.run([glimpse_rows, glimpse_cols])
+
+ # Check dimensions of returned glimpse.
+ self.assertEqual(value_rows.shape[1], glimpse_sizes[0])
+ self.assertEqual(value_rows.shape[2], glimpse_sizes[1])
+ self.assertEqual(value_cols.shape[1], glimpse_sizes[0])
+ self.assertEqual(value_cols.shape[2], glimpse_sizes[1])
+
+ # Check entries.
+ min_random_val = 0
+ max_random_val = max(rows, cols)
+ for i in range(0, glimpse_sizes[0]):
+ for j in range(0, glimpse_sizes[1]):
+ if expected_rows[i] is None or expected_cols[j] is None:
+ self.assertGreaterEqual(value_rows[0][i][j][0], min_random_val)
+ self.assertLessEqual(value_rows[0][i][j][0], max_random_val)
+ self.assertGreaterEqual(value_cols[0][i][j][0], min_random_val)
+ self.assertLessEqual(value_cols[0][i][j][0], max_random_val)
+ else:
+ self.assertEqual(value_rows[0][i][j][0], expected_rows[i])
+ self.assertEqual(value_cols[0][i][j][0], expected_cols[j])
+
+ def testCenterGlimpse(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[3, 5],
+ offsets=[0.0, 0.0],
+ expected_rows=[20, 21, 22],
+ expected_cols=[29, 30, 31, 32, 33])
+
+ def testLargeCenterGlimpse(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[41, 61],
+ offsets=[0.0, 0.0],
+ expected_rows=range(1, 42),
+ expected_cols=range(1, 62))
+
+ def testTooLargeCenterGlimpse(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[43, 63],
+ offsets=[0.0, 0.0],
+ expected_rows=[None] + range(1, 42) + [None],
+ expected_cols=[None] + range(1, 62) + [None])
+
+ def testGlimpseFullOverlap(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[3, 5],
+ offsets=[0.1, 0.3],
+ expected_rows=[22, 23, 24],
+ expected_cols=[38, 39, 40, 41, 42])
+
+ def testGlimpseFullOverlap2(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[11, 3],
+ offsets=[-0.7, -0.7],
+ expected_rows=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+ expected_cols=[8, 9, 10])
+
+ def testGlimpseBeforeLeftMargin(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[11, 5],
+ offsets=[-0.7, -0.9],
+ expected_rows=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+ expected_cols=[1, 2, 3, 4, 5])
+
+ def testGlimpseLowerRightCorner(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[7, 5],
+ offsets=[1.0, 1.0],
+ expected_rows=[38, 39, 40, 41, None, None, None],
+ expected_cols=[59, 60, 61, None, None])
+
+ def testGlimpseNoOverlap(self):
+ self._VerifyValues(tensor_in_sizes=[20, 30],
+ glimpse_sizes=[3, 3],
+ offsets=[-2.0, 2.0],
+ expected_rows=[None, None, None],
+ expected_cols=[None, None, None])
+
+ def testGlimpseOnLeftMargin(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[11, 7],
+ offsets=[-0.7, -1.0],
+ expected_rows=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
+ expected_cols=[None, None, None, 1, 2, 3, 4])
+
+ def testGlimpseUpperMargin(self):
+ self._VerifyValues(tensor_in_sizes=[41, 61],
+ glimpse_sizes=[7, 5],
+ offsets=[-1, 0.9],
+ expected_rows=[None, None, None, 1, 2, 3, 4],
+ expected_cols=[56, 57, 58, 59, 60])
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/batch_matmul_op_test.py b/tensorflow/python/kernel_tests/batch_matmul_op_test.py
new file mode 100644
index 0000000000..8ae37fec3a
--- /dev/null
+++ b/tensorflow/python/kernel_tests/batch_matmul_op_test.py
@@ -0,0 +1,195 @@
+"""Tests for tensorflow.ops.tf.BatchMatMul."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class BatchMatmulOpTest(tf.test.TestCase):
+
+ # Uses numpy to compute batch_matmul(x, y, adj_x, adj_y).
+ def _npBatchMatmul(self, x, y, adj_x, adj_y):
+ assert x.ndim >= 3
+ assert y.ndim >= 3
+    # The output's shape depends on adj_x and adj_y.
+ d0 = x.shape[-2] if not adj_x else x.shape[-1]
+ d2 = y.shape[-1] if not adj_y else y.shape[-2]
+ batch_dims = x.shape[:-2]
+ num = np.prod(batch_dims)
+ z = np.empty(list(batch_dims) + [d0, d2], dtype=x.dtype)
+ xr = x.reshape([num, x.shape[-2], x.shape[-1]])
+ yr = y.reshape([num, y.shape[-2], y.shape[-1]])
+ zr = z.reshape([num, z.shape[-2], z.shape[-1]])
+ for i in range(num):
+ a = np.matrix(xr[i, :, :])
+ if adj_x:
+ a = a.transpose().conj()
+ b = np.matrix(yr[i, :, :])
+ if adj_y:
+ b = b.transpose().conj()
+ zr[i, :, :] = a * b
+ return z
+
+  # Test that _npBatchMatmul works.
+ def testSimpleNpVersion(self):
+ x = np.array([0., 1., 2., 3.]).reshape([1, 2, 2])
+ y = np.array([1., 2., 3., 4.]).reshape([1, 2, 2])
+ z0 = self._npBatchMatmul(x, y, False, False)
+ z1 = np.array([3., 4., 11., 16.]).reshape([1, 2, 2])
+ self.assertTrue(np.array_equal(z0, z1))
+
+ x = np.array([1., (1j), (-1.), (-1j)]).reshape([1, 2, 2])
+    y = x * np.complex(1, 1)  # rotate x by 45 degrees and scale by sqrt(2)
+ z0 = self._npBatchMatmul(x, y, False, False)
+ z1 = np.array([2., (2.j), -2., (-2.j)]).reshape([1, 2, 2])
+ self.assertTrue(np.array_equal(z0, z1))
+
+ z0 = self._npBatchMatmul(x, y, False, True)
+ z1 = np.array([(2.-2.j), (-2.+2.j), (-2.+2.j), (2.-2.j)]).reshape([1, 2, 2])
+ self.assertTrue(np.array_equal(z0, z1))
+
+ z0 = self._npBatchMatmul(x, y, True, False)
+ z1 = np.array([(2.+2.j), (-2.+2.j), (2.-2.j), (2.+2.j)]).reshape([1, 2, 2])
+ self.assertTrue(np.array_equal(z0, z1))
+
+  # Compares tf.batch_matmul(x, y, adj_x, adj_y) against
+  # _npBatchMatmul(x, y, adj_x, adj_y).
+ def _compare(self, x, y, adj_x, adj_y, use_gpu=False):
+ with self.test_session(use_gpu=use_gpu):
+ z0 = tf.batch_matmul(x, y, adj_x=adj_x, adj_y=adj_y)
+ z0_val = z0.eval()
+ z1 = self._npBatchMatmul(x, y, adj_x, adj_y)
+ self.assertShapeEqual(z1, z0)
+ if z0_val.size != 0:
+ err = (np.abs(z0_val - z1) / np.maximum(1, np.abs(z0_val))).max()
+ tf.logging.info("error = %f", err)
+ self.assertTrue(err < 1e-4)
+
+ # Returns a random float np of "shape".
+ def _randFloat(self, shape):
+ vals = np.random.normal(0, 1, np.prod(shape)).reshape(shape)
+ return np.array(vals, dtype=np.float32)
+
+ def testSimpleFloat(self):
+ for use_gpu in [False, True]:
+ self._compare(self._randFloat([7, 2, 3]), self._randFloat([7, 3, 5]),
+ False, False, use_gpu)
+ self._compare(self._randFloat([7, 2, 3]), self._randFloat([7, 5, 3]),
+ False, True, use_gpu)
+ self._compare(self._randFloat([7, 3, 2]), self._randFloat([7, 3, 5]),
+ True, False, use_gpu)
+ self._compare(self._randFloat([7, 3, 2]), self._randFloat([7, 5, 3]),
+ True, True, use_gpu)
+
+ def testLargeFloat(self):
+ for use_gpu in [False, True]:
+ self._compare(self._randFloat([10, 64, 75]),
+ self._randFloat([10, 75, 30]), False, False, use_gpu)
+ self._compare(self._randFloat([10, 75, 64]),
+ self._randFloat([10, 75, 30]), True, False, use_gpu)
+ self._compare(self._randFloat([10, 64, 75]),
+ self._randFloat([10, 30, 75]), False, True, use_gpu)
+ self._compare(self._randFloat([10, 75, 64]),
+ self._randFloat([10, 30, 75]), True, True, use_gpu)
+
+ def testHighNDims(self):
+ for use_gpu in [False, True]:
+ self._compare(self._randFloat([5, 7, 2, 3]),
+ self._randFloat([5, 7, 3, 5]), False, False, use_gpu)
+ self._compare(self._randFloat([5, 7, 3, 2]),
+ self._randFloat([5, 7, 3, 5]), True, False, use_gpu)
+ self._compare(self._randFloat([5, 7, 2, 3]),
+ self._randFloat([5, 7, 5, 3]), False, True, use_gpu)
+ self._compare(self._randFloat([5, 7, 3, 2]),
+ self._randFloat([5, 7, 5, 3]), True, True, use_gpu)
+
+  # Returns a random complex64 numpy array of the given shape.
+ def _randComplex(self, shape):
+ real = np.random.normal(0, 1, np.prod(shape))
+ imag = np.random.normal(0, 1, np.prod(shape))
+ vals = [np.complex(v[0], v[1]) for v in zip(real, imag)]
+ return np.array(vals, dtype=np.complex64).reshape(shape)
+
+ def testSimpleComplex(self):
+ self._compare(self._randComplex([7, 2, 3]),
+ self._randComplex([7, 3, 5]), False, False)
+ self._compare(self._randComplex([7, 2, 3]),
+ self._randComplex([7, 5, 3]), False, True)
+ self._compare(self._randComplex([7, 3, 2]),
+ self._randComplex([7, 3, 5]), True, False)
+ self._compare(self._randComplex([7, 3, 2]),
+ self._randComplex([7, 5, 3]), True, True)
+
+ def testLargeComplex(self):
+ self._compare(self._randComplex([10, 64, 75]),
+ self._randComplex([10, 75, 30]), False,
+ False)
+ self._compare(self._randComplex([10, 64, 75]),
+ self._randComplex([10, 30, 75]), False, True)
+ self._compare(self._randComplex([10, 75, 64]),
+ self._randComplex([10, 75, 30]), True, False)
+ self._compare(self._randComplex([10, 75, 64]),
+ self._randComplex([10, 30, 75]), True, True)
+
+ def testEmpty(self):
+ self._compare(np.empty([0, 3, 2]).astype(np.float32),
+ np.empty([0, 2, 4]).astype(np.float32), False, False)
+ self._compare(np.empty([3, 2, 0]).astype(np.float32),
+ np.empty([3, 0, 5]).astype(np.float32), False, False)
+ self._compare(np.empty([3, 0, 2]).astype(np.float32),
+ np.empty([3, 2, 5]).astype(np.float32), False, False)
+ self._compare(np.empty([3, 3, 2]).astype(np.float32),
+ np.empty([3, 2, 0]).astype(np.float32), False, False)
+
+
+class BatchMatmulGradientTest(tf.test.TestCase):
+
+ # loss = sum(batch_matmul(x, y)). Verify dl/dx and dl/dy via the
+ # gradient checker.
+ def _checkGrad(self, x, y, adj_x, adj_y):
+ assert 3 == x.ndim
+ assert 3 == y.ndim
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ z = tf.batch_matmul(inx, iny, adj_x, adj_y)
+ loss = tf.reduce_sum(z)
+ epsilon = 1e-2
+ ((x_jacob_t, x_jacob_n), (y_jacob_t, y_jacob_n)) = gc.ComputeGradient(
+ [inx, iny], [x.shape, y.shape], loss, [1],
+ x_init_value=[x, y], delta=epsilon)
+
+ tf.logging.info("x_jacob_t = %s", x_jacob_t.reshape(x.shape))
+ tf.logging.info("x_jacob_n = %s", x_jacob_n.reshape(x.shape))
+ self.assertAllClose(x_jacob_t, x_jacob_n, rtol=1e-2, atol=epsilon)
+ tf.logging.info("y_jacob_t = %s", y_jacob_t.reshape(y.shape))
+ tf.logging.info("y_jacob_n = %s", y_jacob_n.reshape(y.shape))
+ self.assertAllClose(y_jacob_t, y_jacob_n, rtol=1e-2, atol=epsilon)
+
+  # Tests a batched matmul of x and y: x is a 3D tensor of shape
+  # [b, n, k], y is a 3D tensor of shape [b, k, m], and the batched
+  # matmul computes z of shape [b, n, m], where
+  # z[i, :, :] = x[i, :, :] matmul y[i, :, :].
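+  # For instance (illustrative values): with b=1 and n=k=m=2,
+  #   x = np.arange(4.).reshape([1, 2, 2]); y = np.ones([1, 2, 2])
+  # the expected z[0] is np.dot(x[0], y[0]) == [[1., 1.], [5., 5.]].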
+ def _compare(self, b, n, k, m):
+ x = np.random.normal(0, 1, b * n * k).astype(np.float32).reshape([b, n, k])
+ y = np.random.normal(0, 1, b * k * m).astype(np.float32).reshape([b, k, m])
+ self._checkGrad(x, y, False, False)
+ self._checkGrad(x.reshape([b, k, n]), y, True, False)
+ self._checkGrad(x, y.reshape([b, m, k]), False, True)
+ self._checkGrad(x.reshape([b, k, n]), y.reshape([b, m, k]), True, True)
+
+ def testSmall(self):
+ self._compare(1, 2, 3, 5)
+
+ def testMedium(self):
+ self._compare(3, 4, 7, 10)
+
+  # Can't do testLarge with very large inputs because the gradient
+  # checker would take far too long.
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/bcast_ops_test.py b/tensorflow/python/kernel_tests/bcast_ops_test.py
new file mode 100644
index 0000000000..c62a910496
--- /dev/null
+++ b/tensorflow/python/kernel_tests/bcast_ops_test.py
@@ -0,0 +1,76 @@
+"""Tests for tensorflow.kernels.bcast_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from tensorflow.python.ops.gen_array_ops import _broadcast_gradient_args
+
+
+class BcastOpsTest(tf.test.TestCase):
+
+ def _GetGradientArgs(self, xs, ys):
+ with self.test_session() as sess:
+ return sess.run(_broadcast_gradient_args(xs, ys))
+
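+  # _broadcast_gradient_args returns, for each of the two shapes, the axes
+  # along which that input is broadcast -- i.e. the axes its gradient has to
+  # be summed over.  For example, in the first check below, broadcasting [1]
+  # against [2, 3, 5] gives reduction axes [0, 1, 2] for the [1]-shaped input
+  # and none for the other.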
+ def testBasic(self):
+ r0, r1 = self._GetGradientArgs([2, 3, 5], [1])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0, 1, 2])
+
+ r0, r1 = self._GetGradientArgs([1], [2, 3, 5])
+ self.assertAllEqual(r0, [0, 1, 2])
+ self.assertAllEqual(r1, [])
+
+ r0, r1 = self._GetGradientArgs([2, 3, 5], [5])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0, 1])
+
+ r0, r1 = self._GetGradientArgs([5], [2, 3, 5])
+ self.assertAllEqual(r0, [0, 1])
+ self.assertAllEqual(r1, [])
+
+ r0, r1 = self._GetGradientArgs([2, 3, 5], [3, 5])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0])
+
+ r0, r1 = self._GetGradientArgs([3, 5], [2, 3, 5])
+ self.assertAllEqual(r0, [0])
+ self.assertAllEqual(r1, [])
+
+ r0, r1 = self._GetGradientArgs([2, 3, 5], [3, 1])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0, 2])
+
+ r0, r1 = self._GetGradientArgs([3, 1], [2, 3, 5])
+ self.assertAllEqual(r0, [0, 2])
+ self.assertAllEqual(r1, [])
+
+ r0, r1 = self._GetGradientArgs([2, 1, 5], [3, 1])
+ self.assertAllEqual(r0, [1])
+ self.assertAllEqual(r1, [0, 2])
+
+ r0, r1 = self._GetGradientArgs([3, 1], [2, 1, 5])
+ self.assertAllEqual(r0, [0, 2])
+ self.assertAllEqual(r1, [1])
+
+ def testZeroDims(self):
+ r0, r1 = self._GetGradientArgs([2, 0, 3, 0, 5], [3, 0, 5])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0, 1])
+
+ r0, r1 = self._GetGradientArgs([3, 0, 5], [2, 0, 3, 0, 5])
+ self.assertAllEqual(r0, [0, 1])
+ self.assertAllEqual(r1, [])
+
+ r0, r1 = self._GetGradientArgs([2, 0, 3, 0, 5], [3, 1, 5])
+ self.assertAllEqual(r0, [])
+ self.assertAllEqual(r1, [0, 1, 3])
+
+ r0, r1 = self._GetGradientArgs([3, 1, 5], [2, 0, 3, 0, 5])
+ self.assertAllEqual(r0, [0, 1, 3])
+ self.assertAllEqual(r1, [])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/bias_op_test.py b/tensorflow/python/kernel_tests/bias_op_test.py
new file mode 100644
index 0000000000..f3a26e2490
--- /dev/null
+++ b/tensorflow/python/kernel_tests/bias_op_test.py
@@ -0,0 +1,93 @@
+"""Functional tests for BiasAdd."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker
+
+
+class BiasAddTest(tf.test.TestCase):
+
+ def _npBias(self, inputs, bias):
+ assert len(bias.shape) == 1
+ print inputs.shape
+ print bias.shape
+ assert inputs.shape[-1] == bias.shape[0]
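+    # Reshape the bias to [1, ..., 1, len(bias)] so that numpy broadcasting
+    # adds it along the last dimension of `inputs`.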
+ return inputs + bias.reshape(([1] * (len(inputs.shape) - 1))
+ + [bias.shape[0]])
+
+ def testNpBias(self):
+ self.assertAllClose(np.array([[11, 22, 33], [41, 52, 63]]),
+ self._npBias(np.array([[10, 20, 30], [40, 50, 60]]),
+ np.array([1, 2, 3])))
+
+ def _testBias(self, np_inputs, np_bias, use_gpu=False):
+ np_val = self._npBias(np_inputs, np_bias)
+ with self.test_session(use_gpu=use_gpu):
+ tf_val = tf.nn.bias_add(np_inputs, np_bias).eval()
+ self.assertAllClose(np_val, tf_val)
+
+ def _testAll(self, np_inputs, np_bias):
+ self._testBias(np_inputs, np_bias, use_gpu=False)
+ if np_inputs.dtype == np.float32 or np_inputs.dtype == np.float64:
+ self._testBias(np_inputs, np_bias, use_gpu=True)
+
+ def testInputDims(self):
+ with self.assertRaises(ValueError):
+ tf.nn.bias_add([1, 2], [1])
+
+ def testBiasVec(self):
+ with self.assertRaises(ValueError):
+ tf.nn.bias_add(tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1, 2], shape=[1, 2]))
+
+ def testBiasInputsMatch(self):
+ with self.assertRaises(ValueError):
+ tf.nn.bias_add(tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1], shape=[1]))
+
+ def testIntTypes(self):
+ for t in [np.int8, np.int16, np.int32, np.int64]:
+ self._testAll(np.array([[10, 20, 30], [40, 50, 60]]).astype(t),
+ np.array([1, 2, 3]).astype(t))
+
+ def testFloatTypes(self):
+ for t in [np.float32, np.float64]:
+ self._testAll(np.random.rand(4, 3, 3).astype(t),
+ np.random.rand(3).astype(t))
+
+ def testGradientTensor(self):
+ with self.test_session():
+ t = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2],
+ dtype=tf.float64)
+ b = tf.constant([1.3, 2.4], dtype=tf.float64)
+ bo = tf.nn.bias_add(t, b)
+ err = gradient_checker.ComputeGradientError(t, [3, 2], bo, [3, 2])
+ print "bias add tensor gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradientBias(self):
+ with self.test_session():
+ t = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2],
+ dtype=tf.float64)
+ b = tf.constant([1.3, 2.4], dtype=tf.float64)
+ bo = tf.nn.bias_add(t, b)
+ err = gradient_checker.ComputeGradientError(b, [2], bo, [3, 2])
+ print "bias add bias gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradientTensor4D(self):
+ with self.test_session():
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float32)
+ t = tf.constant(x, shape=s, dtype=tf.float32)
+ b = tf.constant([1.3, 2.4], dtype=tf.float32)
+ bo = tf.nn.bias_add(t, b)
+ err = gradient_checker.ComputeGradientError(t, s, bo, s, x_init_value=x)
+ print "bias add tensor gradient err = ", err
+ self.assertLess(err, 1e-3)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/candidate_sampler_ops_test.py b/tensorflow/python/kernel_tests/candidate_sampler_ops_test.py
new file mode 100644
index 0000000000..a36b8587d5
--- /dev/null
+++ b/tensorflow/python/kernel_tests/candidate_sampler_ops_test.py
@@ -0,0 +1,114 @@
+"""Tests for CandidateSamplerOp."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class RangeSamplerOpsTest(tf.test.TestCase):
+
+ BATCH_SIZE = 3
+ NUM_TRUE = 2
+ RANGE = 5
+ NUM_SAMPLED = RANGE
+
+ TRUE_LABELS = [[1, 2], [0, 4], [3, 3]]
+
+ def testTrueCandidates(self):
+ with self.test_session() as sess:
+ indices = tf.constant([0, 0, 1, 1, 2, 2])
+ true_candidates_vec = tf.constant([1, 2, 0, 4, 3, 3])
+ true_candidates_matrix = tf.reshape(
+ true_candidates_vec, [self.BATCH_SIZE, self.NUM_TRUE])
+ indices_val, true_candidates_val = sess.run(
+ [indices, true_candidates_matrix])
+
+ self.assertAllEqual(indices_val, [0, 0, 1, 1, 2, 2])
+ self.assertAllEqual(true_candidates_val, self.TRUE_LABELS)
+
+ def testSampledCandidates(self):
+ with self.test_session():
+ true_classes = tf.constant([[1, 2], [0, 4], [3, 3]],
+ dtype=tf.int64)
+ sampled_candidates, _, _ = tf.nn.all_candidate_sampler(
+ true_classes, self.NUM_TRUE, self.NUM_SAMPLED, True)
+ result = sampled_candidates.eval()
+
+ expected_ids = [0, 1, 2, 3, 4]
+ self.assertAllEqual(result, expected_ids)
+ self.assertEqual(sampled_candidates.get_shape(), [self.NUM_SAMPLED])
+
+ def testTrueLogExpectedCount(self):
+ with self.test_session():
+ true_classes = tf.constant([[1, 2], [0, 4], [3, 3]],
+ dtype=tf.int64)
+ _, true_expected_count, _ = tf.nn.all_candidate_sampler(
+ true_classes, self.NUM_TRUE, self.NUM_SAMPLED, True)
+ true_log_expected_count = tf.log(true_expected_count)
+ result = true_log_expected_count.eval()
+
+ self.assertAllEqual(result, [[0.0] * self.NUM_TRUE] * self.BATCH_SIZE)
+ self.assertEqual(true_expected_count.get_shape(), [self.BATCH_SIZE,
+ self.NUM_TRUE])
+ self.assertEqual(true_log_expected_count.get_shape(), [self.BATCH_SIZE,
+ self.NUM_TRUE])
+
+ def testSampledLogExpectedCount(self):
+ with self.test_session():
+ true_classes = tf.constant([[1, 2], [0, 4], [3, 3]],
+ dtype=tf.int64)
+ _, _, sampled_expected_count = tf.nn.all_candidate_sampler(
+ true_classes, self.NUM_TRUE, self.NUM_SAMPLED, True)
+ sampled_log_expected_count = tf.log(sampled_expected_count)
+ result = sampled_log_expected_count.eval()
+
+ self.assertAllEqual(result, [0.0] * self.NUM_SAMPLED)
+ self.assertEqual(sampled_expected_count.get_shape(), [self.NUM_SAMPLED])
+ self.assertEqual(sampled_log_expected_count.get_shape(), [self.NUM_SAMPLED])
+
+ def testAccidentalHits(self):
+ with self.test_session() as sess:
+ true_classes = tf.constant([[1, 2], [0, 4], [3, 3]],
+ dtype=tf.int64)
+ sampled_candidates, _, _ = tf.nn.all_candidate_sampler(
+ true_classes, self.NUM_TRUE, self.NUM_SAMPLED, True)
+ accidental_hits = tf.nn.compute_accidental_hits(
+ true_classes, sampled_candidates, self.NUM_TRUE)
+ indices, ids, weights = sess.run(accidental_hits)
+
+ self.assertEqual(1, accidental_hits[0].get_shape().ndims)
+ self.assertEqual(1, accidental_hits[1].get_shape().ndims)
+ self.assertEqual(1, accidental_hits[2].get_shape().ndims)
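+      # Each (index, id, weight) triple marks a sampled candidate that
+      # collides with a true label of batch element `index`; the weight is a
+      # very large negative value (checked below to be < -1.0e37).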
+ for index, id_, weight in zip(indices, ids, weights):
+ self.assertTrue(id_ in self.TRUE_LABELS[index])
+ self.assertLess(weight, -1.0e37)
+
+ def testSeed(self):
+
+ def draw(seed):
+ with self.test_session():
+ true_classes = tf.constant([[1, 2], [0, 4], [3, 3]],
+ dtype=tf.int64)
+ sampled, _, _ = tf.nn.log_uniform_candidate_sampler(
+ true_classes,
+ self.NUM_TRUE,
+ self.NUM_SAMPLED,
+ True,
+ 5,
+ seed=seed)
+ return sampled.eval()
+ # Non-zero seed. Repeatable.
+ for seed in [1, 12, 123, 1234]:
+ self.assertAllEqual(draw(seed), draw(seed))
+ # Seed=0 means random seeds.
+ num_same = 0
+ for _ in range(10):
+ if np.allclose(draw(None), draw(None)):
+ num_same += 1
+ # Accounts for the fact that the same random seed may be picked
+ # twice very rarely.
+ self.assertLessEqual(num_same, 2)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/cast_op_test.py b/tensorflow/python/kernel_tests/cast_op_test.py
new file mode 100644
index 0000000000..21e8f71198
--- /dev/null
+++ b/tensorflow/python/kernel_tests/cast_op_test.py
@@ -0,0 +1,165 @@
+"""Tests for tensorflow.ops.tf.cast."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class CastOpTest(tf.test.TestCase):
+
+ def _toDataType(self, dtype):
+ """Returns TensorFlow data type for numpy type."""
+ if dtype == np.float32:
+ return tf.float32
+ elif dtype == np.float64:
+ return tf.float64
+ elif dtype == np.int32:
+ return tf.int32
+ elif dtype == np.int64:
+ return tf.int64
+ elif dtype == np.bool:
+ return tf.bool
+ else:
+ return None
+
+ def _cast(self, x, dtype, use_gpu=False):
+ with self.test_session(use_gpu=use_gpu):
+ val = tf.constant(x, self._toDataType(np.array([x]).dtype))
+ return tf.cast(val, self._toDataType(dtype), name="cast").eval()
+
+ def _test(self, x, dtype, use_gpu=False):
+ """Tests cast(x) to dtype behaves the same as numpy.astype."""
+ np_ans = x.astype(dtype)
+ tf_ans = self._cast(x, dtype, use_gpu)
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def _testTypes(self, x, use_gpu=False):
+ """Tests cast(x) to different tf."""
+ if use_gpu:
+ type_list = [np.float32, np.float64, np.int64]
+ else:
+ type_list = [np.float32, np.float64, np.int32, np.int64]
+ for from_type in type_list:
+ for to_type in type_list:
+ self._test(x.astype(from_type), to_type, use_gpu)
+
+ self._test(x.astype(np.bool), np.float32, use_gpu)
+ self._test(x.astype(np.uint8), np.float32, use_gpu)
+ if not use_gpu:
+ self._test(x.astype(np.bool), np.int32, use_gpu)
+ self._test(x.astype(np.int32), np.int32, use_gpu)
+
+ def _testAll(self, x):
+ self._testTypes(x, use_gpu=False)
+ if x.dtype == np.float32 or x.dtype == np.float64:
+ self._testTypes(x, use_gpu=True)
+
+ def testBasic(self):
+ self._testAll(np.arange(-10, 10).reshape(2, 10))
+ self._testAll(np.linspace(-10, 10, 17))
+
+ def testSmallValues(self):
+ f4 = np.finfo(np.float32)
+ f8 = np.finfo(np.float64)
+ self._testAll(np.array([0, -1, 1, -f4.resolution, f4.resolution,
+ f8.resolution, -f8.resolution]))
+
+ def testBfloat16(self):
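+    # rtol=1/128 below is a loose bound: bfloat16 keeps only about 8
+    # significand bits, so a float32 -> bfloat16 -> float32 round trip has a
+    # relative error of at most roughly 1/256.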
+ a = np.random.uniform(-100, 100, 100).astype(np.float32)
+ with self.test_session(use_gpu=False):
+ b = tf.cast(tf.cast(a, tf.bfloat16), tf.float32)
+ self.assertAllClose(a, b.eval(), rtol=1/128.)
+ with self.test_session(use_gpu=True):
+ b = tf.cast(tf.cast(a, tf.bfloat16), tf.float32)
+ self.assertAllClose(a, b.eval(), rtol=1/128.)
+
+ def testRandom(self):
+ self._testAll(np.random.normal(0, 10, 210).reshape([2, 3, 5, 7]))
+ self._testAll(np.random.normal(0, 1e6, 210).reshape([2, 3, 5, 7]))
+
+  # Special values like int32max, int64min, inf, -inf, and nan are cast to
+  # integer values in somewhat unexpected ways, and they behave differently
+  # on CPU and GPU.
+ def _compare(self, x, dst_dtype, expected, use_gpu=False):
+ np.testing.assert_equal(self._cast(x, dst_dtype, use_gpu=use_gpu),
+ dst_dtype(expected))
+
+ def testIntToFloatBoundary(self):
+ i4 = np.iinfo(np.int32)
+ i8 = np.iinfo(np.int64)
+
+ self._compare(i4.min, np.float32, i4.min, False)
+ self._compare(i4.max, np.float32, i4.max, False)
+ self._compare(i8.min, np.float32, i8.min, False)
+ self._compare(i8.max, np.float32, i8.max, False)
+ self._compare(i4.min, np.float64, i4.min, False)
+ self._compare(i4.max, np.float64, i4.max, False)
+ self._compare(i8.min, np.float64, i8.min, False)
+ self._compare(i8.max, np.float64, i8.max, False)
+ # NOTE: GPU does not support int32/int64 for casting.
+
+ def testInfNan(self):
+ i4 = np.iinfo(np.int32)
+ i8 = np.iinfo(np.int64)
+
+ self._compare(np.inf, np.float32, np.inf, False)
+ self._compare(np.inf, np.float64, np.inf, False)
+ self._compare(np.inf, np.int32, i4.min, False)
+ self._compare(np.inf, np.int64, i8.min, False)
+ self._compare(-np.inf, np.float32, -np.inf, False)
+ self._compare(-np.inf, np.float64, -np.inf, False)
+ self._compare(-np.inf, np.int32, i4.min, False)
+ self._compare(-np.inf, np.int64, i8.min, False)
+ self.assertAllEqual(np.isnan(self._cast(np.nan, np.float32, False)), True)
+ self.assertAllEqual(np.isnan(self._cast(np.nan, np.float64, False)), True)
+ self._compare(np.nan, np.int32, i4.min, False)
+ self._compare(np.nan, np.int64, i8.min, False)
+
+ self._compare(np.inf, np.float32, np.inf, True)
+ self._compare(np.inf, np.float64, np.inf, True)
+ self._compare(-np.inf, np.float32, -np.inf, True)
+ self._compare(-np.inf, np.float64, -np.inf, True)
+ self.assertAllEqual(np.isnan(self._cast(np.nan, np.float32, True)), True)
+ self.assertAllEqual(np.isnan(self._cast(np.nan, np.float64, True)), True)
+
+ def _OpError(self, x, dtype, err):
+ with self.test_session():
+ with self.assertRaisesOpError(err):
+ tf.cast(x, dtype).eval()
+
+ def testNotImplemented(self):
+ self._OpError(np.arange(0, 10), tf.string,
+ "Cast.*int64.*string.*")
+
+ def testGradients(self):
+ t = [tf.float32, tf.float64]
+ for src_t in t:
+ for dst_t in t:
+ with self.test_session():
+ x = tf.constant(1.0, src_t)
+ z = tf.identity(x)
+ y = tf.cast(z, dst_t)
+ err = gc.ComputeGradientError(x, [1], y, [1])
+ self.assertLess(err, 1e-3)
+
+
+class SparseTensorCastTest(tf.test.TestCase):
+
+ def testCast(self):
+ indices = tf.constant([[0L], [1L], [2L]])
+ values = tf.constant(np.array([1, 2, 3], np.int64))
+ shape = tf.constant([3L])
+ st = tf.SparseTensor(indices, values, shape)
+ st_cast = tf.cast(st, tf.float32)
+ with self.test_session():
+ self.assertAllEqual(st_cast.indices.eval(), [[0L], [1L], [2L]])
+ self.assertAllEqual(st_cast.values.eval(),
+ np.array([1, 2, 3], np.float32))
+ self.assertAllEqual(st_cast.shape.eval(), [3L])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/cholesky_op_test.py b/tensorflow/python/kernel_tests/cholesky_op_test.py
new file mode 100644
index 0000000000..17e8d116be
--- /dev/null
+++ b/tensorflow/python/kernel_tests/cholesky_op_test.py
@@ -0,0 +1,74 @@
+"""Tests for tensorflow.ops.tf.Cholesky."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class CholeskyOpTest(tf.test.TestCase):
+
+ def _verifyCholesky(self, x):
+ with self.test_session() as sess:
+ # Verify that LL^T == x.
+ if x.ndim == 2:
+ chol = tf.cholesky(x)
+ verification = tf.matmul(chol,
+ chol,
+ transpose_a=False,
+ transpose_b=True)
+ else:
+ chol = tf.batch_cholesky(x)
+ verification = tf.batch_matmul(chol, chol, adj_x=False, adj_y=True)
+ chol_np, verification_np = sess.run([chol, verification])
+ self.assertAllClose(x, verification_np)
+ self.assertShapeEqual(x, chol)
+ # Check that the cholesky is lower triangular, and has positive diagonal
+ # elements.
+ if chol_np.shape[-1] > 0:
+ chol_reshaped = np.reshape(chol_np, (-1, chol_np.shape[-2],
+ chol_np.shape[-1]))
+ for chol_matrix in chol_reshaped:
+ self.assertAllClose(chol_matrix, np.tril(chol_matrix))
+ self.assertTrue((np.diag(chol_matrix) > 0.0).all())
+
+ def testBasic(self):
+ self._verifyCholesky(np.array([[4., -1., 2.], [-1., 6., 0], [2., 0., 5.]]))
+
+ def testBatch(self):
+ simple_array = np.array([[[1., 0.], [0., 5.]]]) # shape (1, 2, 2)
+ self._verifyCholesky(simple_array)
+ self._verifyCholesky(np.vstack((simple_array, simple_array)))
+ odd_sized_array = np.array([[[4., -1., 2.], [-1., 6., 0], [2., 0., 5.]]])
+ self._verifyCholesky(np.vstack((odd_sized_array, odd_sized_array)))
+
+ # Generate random positive-definite matrices.
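+    # (A^T * A is symmetric positive semi-definite, and almost surely positive
+    # definite for a random A, so each matrix has a Cholesky factorization.)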
+ matrices = np.random.rand(10, 5, 5)
+ for i in xrange(10):
+ matrices[i] = np.dot(matrices[i].T, matrices[i])
+ self._verifyCholesky(matrices)
+
+ def testNonSquareMatrix(self):
+ with self.assertRaises(ValueError):
+ tf.cholesky(np.array([[1., 2., 3.], [3., 4., 5.]]))
+
+ def testWrongDimensions(self):
+ tensor3 = tf.constant([1., 2.])
+ with self.assertRaises(ValueError):
+ tf.cholesky(tensor3)
+
+ def testNotInvertible(self):
+ # The input should be invertible.
+ with self.test_session():
+ with self.assertRaisesOpError("LLT decomposition was not successful. The "
+ "input might not be valid."):
+ # All rows of the matrix below add to zero
+ self._verifyCholesky(np.array([[1., -1., 0.], [-1., 1., -1.], [0., -1.,
+ 1.]]))
+
+ def testEmpty(self):
+ self._verifyCholesky(np.empty([0, 2, 2]))
+ self._verifyCholesky(np.empty([2, 0, 0]))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/clip_ops_test.py b/tensorflow/python/kernel_tests/clip_ops_test.py
new file mode 100644
index 0000000000..46bba7514d
--- /dev/null
+++ b/tensorflow/python/kernel_tests/clip_ops_test.py
@@ -0,0 +1,222 @@
+"""Tests for tensorflow.ops.clip_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class ClipTest(tf.test.TestCase):
+
+ # ClipByValue test
+ def testClipByValue(self):
+ with self.test_session():
+ x = tf.constant([-5.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
+ np_ans = [[-4.4, 2.0, 3.0],
+ [4.0, 4.4, 4.4]]
+ clip_value = 4.4
+ ans = tf.clip_by_value(x, -clip_value, clip_value)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByValueNonFinite(self):
+ with self.test_session():
+ x = tf.constant([float('NaN'), float('Inf'), -float('Inf')])
+ np_ans = [float('NaN'), 4.0, -4.0]
+ clip_value = 4.0
+ ans = tf.clip_by_value(x, -clip_value, clip_value)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ # ClipByNorm tests
+ def testClipByNormClipped(self):
+ # Norm clipping when clip_norm < 5
+ with self.test_session():
+ x = tf.constant([-3.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ # Norm of x = sqrt(3^2 + 4^2) = 5
+ np_ans = [[-2.4, 0.0, 0.0],
+ [3.2, 0.0, 0.0]]
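+      # Each entry of np_ans is the corresponding entry of x scaled by
+      # clip_norm / norm = 4.0 / 5.0.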
+ clip_norm = 4.0
+ ans = tf.clip_by_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByNormNotClipped(self):
+ # No norm clipping when clip_norm >= 5
+ with self.test_session():
+ x = tf.constant([-3.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ # Norm of x = sqrt(3^2 + 4^2) = 5
+ np_ans = [[-3.0, 0.0, 0.0],
+ [4.0, 0.0, 0.0]]
+ clip_norm = 6.0
+ ans = tf.clip_by_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByNormZero(self):
+ # No norm clipping when norm = 0
+ with self.test_session():
+ x = tf.constant([0.0, 0.0, 0.0, 0.0, 0.0, 0.0], shape=[2, 3])
+ # Norm = 0, no changes
+ np_ans = [[0.0, 0.0, 0.0],
+ [0.0, 0.0, 0.0]]
+ clip_norm = 6.0
+ ans = tf.clip_by_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByGlobalNormClipped(self):
+ # Norm clipping when clip_norm < 5
+ with self.test_session():
+ x0 = tf.constant([-2.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ x1 = tf.constant([1.0, -2.0])
+ # Global norm of x0 and x1 = sqrt(1 + 4^2 + 2^2 + 2^2) = 5
+ clip_norm = 4.0
+
+ # Answers are the original tensors scaled by 4.0/5.0
+ np_ans_0 = [[-1.6, 0.0, 0.0],
+ [3.2, 0.0, 0.0]]
+ np_ans_1 = [0.8, -1.6]
+
+ ans, norm = tf.clip_by_global_norm((x0, x1), clip_norm)
+ tf_ans_1 = ans[0].eval()
+ tf_ans_2 = ans[1].eval()
+ tf_norm = norm.eval()
+
+ self.assertAllClose(tf_norm, 5.0)
+ self.assertAllClose(np_ans_0, tf_ans_1)
+ self.assertAllClose(np_ans_1, tf_ans_2)
+
+ def testClipByGlobalNormSupportsNone(self):
+ # Norm clipping when clip_norm < 5
+ with self.test_session():
+ x0 = tf.constant([-2.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ x1 = tf.constant([1.0, -2.0])
+ # Global norm of x0 and x1 = sqrt(1 + 4^2 + 2^2 + 2^2) = 5
+ clip_norm = 4.0
+
+ # Answers are the original tensors scaled by 4.0/5.0
+ np_ans_0 = [[-1.6, 0.0, 0.0],
+ [3.2, 0.0, 0.0]]
+ np_ans_1 = [0.8, -1.6]
+
+ ans, norm = tf.clip_by_global_norm((x0, None, x1, None), clip_norm)
+ self.assertTrue(ans[1] is None)
+ self.assertTrue(ans[3] is None)
+ tf_ans_1 = ans[0].eval()
+ tf_ans_2 = ans[2].eval()
+ tf_norm = norm.eval()
+
+ self.assertAllClose(tf_norm, 5.0)
+ self.assertAllClose(np_ans_0, tf_ans_1)
+ self.assertAllClose(np_ans_1, tf_ans_2)
+
+ # ClipByGlobalNorm tests
+ def testClipByGlobalNormWithIndexedSlicesClipped(self):
+ # Norm clipping when clip_norm < 5
+ with self.test_session():
+ x0 = tf.constant([-2.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ x1 = tf.IndexedSlices(tf.constant([1.0, -2.0]),
+ tf.constant([3, 4]))
+ # Global norm of x0 and x1 = sqrt(1 + 4^2 + 2^2 + 2^2) = 5
+ clip_norm = 4.0
+
+ # Answers are the original tensors scaled by 4.0/5.0
+ np_ans_0 = [[-1.6, 0.0, 0.0],
+ [3.2, 0.0, 0.0]]
+ np_ans_1 = [0.8, -1.6]
+
+ ans, norm = tf.clip_by_global_norm([x0, x1], clip_norm)
+ tf_ans_1 = ans[0].eval()
+ tf_ans_2 = ans[1].values.eval()
+ tf_norm = norm.eval()
+
+ self.assertAllClose(tf_norm, 5.0)
+ self.assertAllClose(np_ans_0, tf_ans_1)
+ self.assertAllClose(np_ans_1, tf_ans_2)
+
+ def testClipByGlobalNormNotClipped(self):
+ # No norm clipping when clip_norm >= 5
+ with self.test_session():
+ x0 = tf.constant([-2.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ x1 = tf.constant([1.0, -2.0])
+ # Global norm of x0 and x1 = sqrt(1 + 4^2 + 2^2 + 2^2) = 5
+ np_ans_0 = [[-2.0, 0.0, 0.0],
+ [4.0, 0.0, 0.0]]
+ np_ans_1 = [1.0, -2.0]
+ clip_norm = 6.0
+
+ ans, norm = tf.clip_by_global_norm([x0, x1], clip_norm)
+ tf_ans_1 = ans[0].eval()
+ tf_ans_2 = ans[1].eval()
+ tf_norm = norm.eval()
+
+ self.assertAllClose(tf_norm, 5.0)
+ self.assertAllClose(np_ans_0, tf_ans_1)
+ self.assertAllClose(np_ans_1, tf_ans_2)
+
+ def testClipByGlobalNormZero(self):
+ # No norm clipping when norm = 0
+ with self.test_session():
+ x0 = tf.constant([0.0, 0.0, 0.0, 0.0, 0.0, 0.0], shape=[2, 3])
+ x1 = tf.constant([0.0, 0.0])
+ # Norm = 0, no changes
+ np_ans_0 = [[0.0, 0.0, 0.0],
+ [0.0, 0.0, 0.0]]
+ np_ans_1 = [0.0, 0.0]
+ clip_norm = 6.0
+
+ ans, norm = tf.clip_by_global_norm([x0, x1], clip_norm)
+ tf_ans_1 = ans[0].eval()
+ tf_ans_2 = ans[1].eval()
+ tf_norm = norm.eval()
+
+ self.assertAllClose(tf_norm, 0.0)
+ self.assertAllClose(np_ans_0, tf_ans_1)
+ self.assertAllClose(np_ans_1, tf_ans_2)
+
+ def testClipByAverageNormClipped(self):
+ # Norm clipping when average clip_norm < 0.83333333
+ with self.test_session():
+ x = tf.constant([-3.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ # Average norm of x = sqrt(3^2 + 4^2) / 6 = 0.83333333
+ np_ans = [[-2.88, 0.0, 0.0],
+ [3.84, 0.0, 0.0]]
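+      # Each entry of np_ans is the corresponding entry of x scaled by
+      # clip_norm / average norm = 0.8 / (5.0 / 6.0) = 0.96.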
+ clip_norm = 0.8
+ ans = tf.clip_by_average_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByAverageNormNotClipped(self):
+ # No norm clipping when average clip_norm >= 0.83333333
+ with self.test_session():
+ x = tf.constant([-3.0, 0.0, 0.0, 4.0, 0.0, 0.0], shape=[2, 3])
+ # Average norm of x = sqrt(3^2 + 4^2) / 6 = 0.83333333
+ np_ans = [[-3.0, 0.0, 0.0],
+ [4.0, 0.0, 0.0]]
+ clip_norm = 0.9
+ ans = tf.clip_by_average_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testClipByAverageNormZero(self):
+ # No norm clipping when average clip_norm = 0
+ with self.test_session():
+ x = tf.constant([0.0, 0.0, 0.0, 0.0, 0.0, 0.0], shape=[2, 3])
+ # Average norm = 0, no changes
+ np_ans = [[0.0, 0.0, 0.0],
+ [0.0, 0.0, 0.0]]
+ clip_norm = 0.9
+ ans = tf.clip_by_average_norm(x, clip_norm)
+ tf_ans = ans.eval()
+
+ self.assertAllClose(np_ans, tf_ans)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/concat_op_test.py b/tensorflow/python/kernel_tests/concat_op_test.py
new file mode 100644
index 0000000000..3f6c43f0a6
--- /dev/null
+++ b/tensorflow/python/kernel_tests/concat_op_test.py
@@ -0,0 +1,276 @@
+"""Functional tests for Concat Op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class ConcatOpTest(tf.test.TestCase):
+
+ def testHStack(self):
+ with self.test_session():
+ p1 = tf.placeholder(tf.float32, shape=[4, 4])
+ p2 = tf.placeholder(tf.float32, shape=[4, 4])
+ c = tf.concat(0, [p1, p2])
+ params = {
+ p1: np.random.rand(4, 4).astype("f"),
+ p2: np.random.rand(4, 4).astype("f")
+ }
+ result = c.eval(feed_dict=params)
+
+ self.assertEqual(result.shape, c.get_shape())
+ self.assertAllEqual(result[:4, :], params[p1])
+ self.assertAllEqual(result[4:, :], params[p2])
+
+ def testVStack(self):
+ with self.test_session():
+ p1 = tf.placeholder(tf.float32, shape=[4, 4])
+ p2 = tf.placeholder(tf.float32, shape=[4, 4])
+ c = tf.concat(1, [p1, p2])
+ params = {
+ p1: np.random.rand(4, 4).astype("f"),
+ p2: np.random.rand(4, 4).astype("f")
+ }
+ result = c.eval(feed_dict=params)
+
+ self.assertEqual(result.shape, c.get_shape())
+ self.assertAllEqual(result[:, :4], params[p1])
+ self.assertAllEqual(result[:, 4:], params[p2])
+
+ def testInt32GPU(self):
+ with self.test_session(use_gpu=True):
+ p1 = np.random.rand(2, 3).astype("i")
+ p2 = np.random.rand(2, 3).astype("i")
+ x1 = tf.constant(p1)
+ x2 = tf.constant(p2)
+ c = tf.concat(0, [x1, x2])
+ result = c.eval()
+ self.assertAllEqual(result[:2, :], p1)
+ self.assertAllEqual(result[2:, :], p2)
+
+ def testRefType(self):
+ with self.test_session():
+ p1 = tf.placeholder(tf.float32_ref, shape=[4, 4])
+ p2 = tf.placeholder(tf.float32_ref, shape=[4, 4])
+ c = tf.concat(0, [p1, p2])
+ params = {
+ p1: np.random.rand(4, 4).astype("f"),
+ p2: np.random.rand(4, 4).astype("f")
+ }
+ result = c.eval(feed_dict=params)
+
+ self.assertEqual(result.shape, c.get_shape())
+ self.assertAllEqual(result[:4, :], params[p1])
+ self.assertAllEqual(result[4:, :], params[p2])
+
+ def _testRandom(self, dtype, use_gpu=False):
+ # Random dims of rank 5
+ shape = np.random.randint(1, 5, size=5)
+ # Random number of tensors, but always > 1.
+ num_tensors = np.random.randint(2, 10)
+ # Random dim to concat on
+ concat_dim = np.random.randint(5)
+ params = {}
+ with self.test_session(use_gpu=use_gpu):
+ p = []
+ for i in np.arange(num_tensors):
+ input_shape = shape
+ input_shape[concat_dim] = np.random.randint(1, 5)
+ placeholder = tf.placeholder(dtype, shape=input_shape)
+ p.append(placeholder)
+
+ t = dtype.as_numpy_dtype
+ params[placeholder] = np.random.rand(*input_shape).astype(t)
+
+ c = tf.concat(concat_dim, p)
+ result = c.eval(feed_dict=params)
+
+ self.assertEqual(result.shape, c.get_shape())
+ cur_offset = 0
+
+ for i in np.arange(num_tensors):
+ # The index into the result is the ':' along all dimensions
+ # except the concat_dim. slice(0, size) is used for ':', and
+ # a list of slices is used to index into result.
+ ind = [slice(0, params[p[i]].shape[j]) for j in np.arange(5)]
+ ind[concat_dim] = slice(cur_offset,
+ cur_offset + params[p[i]].shape[concat_dim])
+ cur_offset += params[p[i]].shape[concat_dim]
+ self.assertAllEqual(result[ind], params[p[i]])
+
+ def testRandom(self):
+ self._testRandom(tf.float32)
+ self._testRandom(tf.int16)
+ self._testRandom(tf.int32, use_gpu=True)
+ # Note that the following does not work since bfloat16 is not supported in
+ # numpy.
+ # self._testRandom(tf.bfloat16)
+
+ def _testGradientsSimple(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = []
+ inp_tensors = []
+ for x in [1, 2, 6]:
+ shape = [10, x, 2]
+ t = np.random.rand(*shape).astype("f")
+ inp.append(t)
+ inp_tensors.append(
+ tf.constant([float(y) for y in t.flatten()],
+ shape=shape, dtype=tf.float32))
+ c = tf.concat(1, inp_tensors)
+ output_shape = [10, 9, 2]
+ grad_inp = np.random.rand(*output_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=output_shape)
+ grad = tf.gradients([c], inp_tensors, [grad_tensor])
+ concated_grad = tf.concat(1, grad)
+ result = concated_grad.eval()
+
+ self.assertAllEqual(result, grad_inp)
+
+ def testGradientsSimpleAll(self):
+ self._testGradientsSimple(use_gpu=False)
+ self._testGradientsSimple(use_gpu=True)
+
+ def _testGradientsFirstDim(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = []
+ inp_tensors = []
+ for x in [1, 2, 6]:
+ shape = [x, 10, 2]
+ t = np.random.rand(*shape).astype("f")
+ inp.append(t)
+ inp_tensors.append(
+ tf.constant([float(y) for y in t.flatten()],
+ shape=shape, dtype=tf.float32))
+ c = tf.concat(0, inp_tensors)
+ output_shape = [9, 10, 2]
+ grad_inp = np.random.rand(*output_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=output_shape)
+ grad = tf.gradients([c], inp_tensors, [grad_tensor])
+ concated_grad = tf.concat(0, grad)
+ result = concated_grad.eval()
+
+ self.assertAllEqual(result, grad_inp)
+
+ def testGradientsFirstDimAll(self):
+ self._testGradientsFirstDim(use_gpu=False)
+ self._testGradientsFirstDim(use_gpu=True)
+
+ def _testGradientsLastDim(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = []
+ inp_tensors = []
+ for x in [1, 2, 6]:
+ shape = [10, 2, x]
+ t = np.random.rand(*shape).astype("f")
+ inp.append(t)
+ inp_tensors.append(
+ tf.constant([float(y) for y in t.flatten()],
+ shape=shape, dtype=tf.float32))
+ c = tf.concat(2, inp_tensors)
+ output_shape = [10, 2, 9]
+ grad_inp = np.random.rand(*output_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=output_shape)
+ grad = tf.gradients([c], inp_tensors, [grad_tensor])
+ concated_grad = tf.concat(2, grad)
+ result = concated_grad.eval()
+
+ self.assertAllEqual(result, grad_inp)
+
+ def testGradientsLastDimAll(self):
+ self._testGradientsLastDim(use_gpu=False)
+ self._testGradientsLastDim(use_gpu=True)
+
+ def _RunAndVerifyGradientsRandom(self, use_gpu):
+ # Random dims of rank 5
+ input_shape = np.random.randint(1, 5, size=5)
+ # Random number of tensors
+ num_tensors = np.random.randint(1, 10)
+ # Random dim to concat on
+ concat_dim = np.random.randint(5)
+ concat_dim_sizes = np.random.randint(1, 5, size=num_tensors)
+ with self.test_session(use_gpu=use_gpu):
+ inp = []
+ inp_tensors = []
+ for x in concat_dim_sizes:
+ shape = input_shape
+ shape[concat_dim] = x
+ t = np.random.rand(*shape).astype("f")
+ inp.append(t)
+ inp_tensors.append(
+ tf.constant([float(y) for y in t.flatten()],
+ shape=shape, dtype=tf.float32))
+ c = tf.concat(concat_dim, inp_tensors)
+ output_shape = input_shape
+ output_shape[concat_dim] = concat_dim_sizes.sum()
+ grad_inp = np.random.rand(*output_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=output_shape)
+ grad = tf.gradients([c], inp_tensors, [grad_tensor])
+ concated_grad = tf.concat(concat_dim, grad)
+ result = concated_grad.eval()
+
+ self.assertAllEqual(result, grad_inp)
+
+ def testGradientsRandom(self):
+ for _ in range(5):
+ self._RunAndVerifyGradientsRandom(use_gpu=False)
+ self._RunAndVerifyGradientsRandom(use_gpu=True)
+
+ def testShapeError(self):
+ # Rank doesn't match.
+ with self.assertRaises(ValueError):
+ tf.concat(1, [tf.constant(10.0, shape=[4, 4, 4, 4]),
+ tf.constant(20.0, shape=[4, 4, 4])])
+
+ # Dimensions don't match in a non-concat dim.
+ with self.assertRaises(ValueError):
+ tf.concat(1, [tf.constant(10.0, shape=[1, 2, 1]),
+ tf.constant(20.0, shape=[3, 2, 1])])
+
+ # concat_dim out of range.
+ with self.assertRaises(ValueError):
+ tf.concat(3, [tf.constant(10.0, shape=[4, 4, 4]),
+ tf.constant(20.0, shape=[4, 4, 4])])
+
+ def testShapeWithUnknownConcatDim(self):
+ p1 = tf.placeholder(tf.float32)
+ c1 = tf.constant(10.0, shape=[4, 4, 4, 4])
+ p2 = tf.placeholder(tf.float32)
+ c2 = tf.constant(20.0, shape=[4, 4, 4, 4])
+ dim = tf.placeholder(tf.int32)
+ concat = tf.concat(dim, [p1, c1, p2, c2])
+ self.assertEqual(4, concat.get_shape().ndims)
+
+ # Rank doesn't match.
+ c3 = tf.constant(30.0, shape=[4, 4, 4])
+ with self.assertRaises(ValueError):
+ tf.concat(dim, [p1, c1, p2, c3])
+
+ def testZeroSize(self):
+ # Verify that concat doesn't crash and burn for zero size inputs
+ np.random.seed(7)
+ for use_gpu in False, True:
+ with self.test_session(use_gpu=use_gpu) as sess:
+ for shape0 in (), (2,):
+ axis = len(shape0)
+ for shape1 in (), (3,):
+ for n0 in 0, 1, 2:
+ for n1 in 0, 1, 2:
+ x0 = np.random.randn(*(shape0 + (n0,) + shape1))
+ x1 = np.random.randn(*(shape0 + (n1,) + shape1))
+ correct = np.concatenate([x0, x1], axis=axis)
+ xs = map(tf.constant, [x0, x1])
+ c = tf.concat(axis, xs)
+ self.assertAllEqual(c.eval(), correct)
+ # Check gradients
+ dc = np.random.randn(*c.get_shape().as_list())
+ dxs = sess.run(tf.gradients(c, xs, dc))
+ self.assertAllEqual(dc, np.concatenate(dxs, axis=axis))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/constant_op_test.py b/tensorflow/python/kernel_tests/constant_op_test.py
new file mode 100644
index 0000000000..92f9b5fe4a
--- /dev/null
+++ b/tensorflow/python/kernel_tests/constant_op_test.py
@@ -0,0 +1,524 @@
+"""Tests for ConstantOp."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.ops import gen_array_ops
+
+
+class ConstantTest(tf.test.TestCase):
+
+ def _testCpu(self, x):
+ np_ans = np.array(x)
+ with self.test_session(use_gpu=False):
+ tf_ans = tf.convert_to_tensor(x).eval()
+ if np_ans.dtype in [np.float32, np.float64, np.complex64]:
+ self.assertAllClose(np_ans, tf_ans)
+ else:
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def _testGpu(self, x):
+ np_ans = np.array(x)
+ with self.test_session(use_gpu=True):
+ tf_ans = tf.convert_to_tensor(x).eval()
+ if np_ans.dtype in [np.float32, np.float64, np.complex64]:
+ self.assertAllClose(np_ans, tf_ans)
+ else:
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def _testAll(self, x):
+ self._testCpu(x)
+ self._testGpu(x)
+
+ def testFloat(self):
+ self._testAll(np.arange(-15, 15).reshape([2, 3, 5]).astype(np.float32))
+ self._testAll(
+ np.random.normal(size=30).reshape([2, 3, 5]).astype(np.float32))
+ self._testAll(np.empty((2, 0, 5)).astype(np.float32))
+
+ def testDouble(self):
+ self._testAll(np.arange(-15, 15).reshape([2, 3, 5]).astype(np.float64))
+ self._testAll(
+ np.random.normal(size=30).reshape([2, 3, 5]).astype(np.float64))
+ self._testAll(np.empty((2, 0, 5)).astype(np.float64))
+
+ def testInt32(self):
+ self._testAll(np.arange(-15, 15).reshape([2, 3, 5]).astype(np.int32))
+ self._testAll(
+ (100 * np.random.normal(size=30)).reshape([2, 3, 5]).astype(np.int32))
+ self._testAll(np.empty((2, 0, 5)).astype(np.int32))
+
+ def testInt64(self):
+ self._testAll(np.arange(-15, 15).reshape([2, 3, 5]).astype(np.int64))
+ self._testAll(
+ (100 * np.random.normal(size=30)).reshape([2, 3, 5]).astype(np.int64))
+ self._testAll(np.empty((2, 0, 5)).astype(np.int64))
+
+ def testSComplex(self):
+ self._testAll(
+ np.complex(1, 2) * np.arange(-15, 15).reshape([2, 3, 5]).astype(
+ np.complex64))
+ self._testAll(np.complex(
+ 1, 2) * np.random.normal(size=30).reshape([2, 3, 5]).astype(
+ np.complex64))
+ self._testAll(np.empty((2, 0, 5)).astype(np.complex64))
+
+ def testString(self):
+ self._testCpu(np.array([str(x) for x in np.arange(-15, 15)]).reshape(
+ [2, 3, 5]))
+ self._testCpu(np.empty((2, 0, 5)).astype(np.str_))
+
+ def testStringWithNulls(self):
+ with self.test_session():
+ val = tf.convert_to_tensor("\0\0\0\0").eval()
+ self.assertEqual(len(val), 4)
+ self.assertEqual(val, "\0\0\0\0")
+
+ with self.test_session():
+ val = tf.convert_to_tensor("xx\0xx").eval()
+ self.assertEqual(len(val), 5)
+ self.assertAllEqual(val, "xx\0xx")
+ nested = [["\0\0\0\0", "xx\0xx"], ["\0_\0_\0_\0", "\0"]]
+
+ with self.test_session():
+ val = tf.convert_to_tensor(nested).eval()
+ # NOTE(mrry): Do not use assertAllEqual, because it converts nested to a
+ # numpy array, which loses the null terminators.
+ self.assertEqual(val.tolist(), nested)
+
+ def testExplicitShapeNumPy(self):
+ with tf.Graph().as_default():
+ c = tf.constant(
+ np.arange(-15, 15).reshape([2, 3, 5]).astype(np.float32),
+ shape=[2, 3, 5])
+ self.assertEqual(c.get_shape(), [2, 3, 5])
+
+ def testImplicitShapeNumPy(self):
+ with tf.Graph().as_default():
+ c = tf.constant(
+ np.arange(-15, 15).reshape([2, 3, 5]).astype(np.float32))
+ self.assertEqual(c.get_shape(), [2, 3, 5])
+
+ def testExplicitShapeList(self):
+ with tf.Graph().as_default():
+ c = tf.constant([1, 2, 3, 4, 5, 6, 7], shape=[7])
+ self.assertEqual(c.get_shape(), [7])
+
+ def testImplicitShapeList(self):
+ with tf.Graph().as_default():
+ c = tf.constant([1, 2, 3, 4, 5, 6, 7])
+ self.assertEqual(c.get_shape(), [7])
+
+ def testExplicitShapeNumber(self):
+ with tf.Graph().as_default():
+ c = tf.constant(1, shape=[1])
+ self.assertEqual(c.get_shape(), [1])
+
+ def testImplicitShapeNumber(self):
+ with tf.Graph().as_default():
+ c = tf.constant(1)
+ self.assertEqual(c.get_shape(), [])
+
+ def testShapeInconsistent(self):
+ with tf.Graph().as_default():
+ c = tf.constant([1, 2, 3, 4, 5, 6, 7], shape=[10])
+ self.assertEqual(c.get_shape(), [10])
+
+ # pylint: disable=g-long-lambda
+ def testShapeWrong(self):
+ with tf.Graph().as_default():
+ with self.assertRaisesWithPredicateMatch(
+ ValueError,
+ lambda e: ("Too many elements provided. Needed at most 5, "
+ "but received 7" == str(e))):
+ tf.constant([1, 2, 3, 4, 5, 6, 7], shape=[5])
+ # pylint: enable=g-long-lambda
+
+ def testTooLargeConstant(self):
+ with tf.Graph().as_default():
+ large_array = np.zeros((512, 1024, 1024), dtype=np.float32)
+ with self.assertRaisesRegexp(
+ ValueError,
+ "Cannot create an Operation with a NodeDef larger than 2GB."):
+ c = tf.constant(large_array)
+
+ def testTooLargeGraph(self):
+ with tf.Graph().as_default() as g:
+ large_array = np.zeros((256, 1024, 1024), dtype=np.float32)
+ c = tf.constant(large_array)
+ d = tf.constant(large_array)
+ with self.assertRaisesRegexp(
+ ValueError, "GraphDef cannot be larger than 2GB."):
+ g.as_graph_def()
+
+ def testSparseValuesRaiseErrors(self):
+ with self.assertRaisesRegexp(ValueError,
+ "setting an array element with a sequence"):
+ c = tf.constant([[1, 2], [3]], dtype=tf.int32)
+
+ with self.assertRaisesRegexp(ValueError, "must be a dense"):
+ c = tf.constant([[1, 2], [3]])
+
+ with self.assertRaisesRegexp(ValueError, "must be a dense"):
+ c = tf.constant([[1, 2], [3], [4, 5]])
+
+
+class AsTensorTest(tf.test.TestCase):
+
+ def testAsTensorForTensorInput(self):
+ with tf.Graph().as_default():
+ t = tf.constant(10.0)
+ x = tf.convert_to_tensor(t)
+ self.assertIs(t, x)
+
+ def testAsTensorForNonTensorInput(self):
+ with tf.Graph().as_default():
+ x = tf.convert_to_tensor(10.0)
+ self.assertTrue(isinstance(x, tf.Tensor))
+
+ def testAsTensorForShapeInput(self):
+ with self.test_session():
+ x = tf.convert_to_tensor(tf.TensorShape([]))
+ self.assertEqual(tf.int32, x.dtype)
+ self.assertAllEqual([], x.eval())
+
+ x = tf.convert_to_tensor(tf.TensorShape([1, 2, 3]))
+ self.assertEqual(tf.int32, x.dtype)
+ self.assertAllEqual([1, 2, 3], x.eval())
+
+ x = tf.convert_to_tensor(tf.TensorShape([1, 2, 3]), dtype=tf.int64)
+ self.assertEqual(tf.int64, x.dtype)
+ self.assertAllEqual([1, 2, 3], x.eval())
+
+ x = tf.reshape(tf.zeros([6]), tf.TensorShape([2, 3]))
+ self.assertAllEqual([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], x.eval())
+
+ with self.assertRaisesRegexp(ValueError, "partially known"):
+ tf.convert_to_tensor(tf.TensorShape(None))
+
+ with self.assertRaisesRegexp(ValueError, "partially known"):
+ tf.convert_to_tensor(tf.TensorShape([1, None, 64]))
+
+ with self.assertRaises(TypeError):
+ tf.convert_to_tensor(tf.TensorShape([1, 2, 3]), dtype=tf.float32)
+
+ def testAsTensorForDimensionInput(self):
+ with self.test_session():
+ x = tf.convert_to_tensor(tf.TensorShape([1, 2, 3])[1])
+ self.assertEqual(tf.int32, x.dtype)
+ self.assertAllEqual(2, x.eval())
+
+ x = tf.convert_to_tensor(tf.TensorShape([1, 2, 3])[1], dtype=tf.int64)
+ self.assertEqual(tf.int64, x.dtype)
+ self.assertAllEqual(2, x.eval())
+
+ with self.assertRaisesRegexp(ValueError, "unknown Dimension"):
+ tf.convert_to_tensor(tf.TensorShape(None)[1])
+
+ with self.assertRaisesRegexp(ValueError, "unknown Dimension"):
+ tf.convert_to_tensor(tf.TensorShape([1, None, 64])[1])
+
+ with self.assertRaises(TypeError):
+ tf.convert_to_tensor(tf.TensorShape([1, 2, 3])[1], dtype=tf.float32)
+
+
+class IdentityOpTest(tf.test.TestCase):
+
+ def testIdTensor(self):
+ with tf.Graph().as_default():
+ x = tf.constant(2.0, shape=[6], name="input")
+ id_op = tf.identity(x, name="id")
+ self.assertTrue(isinstance(id_op.op.inputs[0], tf.Tensor))
+ self.assertProtoEquals(
+ "name: 'id' op: 'Identity' input: 'input' "
+ "attr { key: 'T' value { type: DT_FLOAT } }", id_op.op.node_def)
+
+
+class ZerosTest(tf.test.TestCase):
+
+ def _Zeros(self, shape):
+ with self.test_session():
+ ret = tf.zeros(shape)
+ self.assertEqual(shape, ret.get_shape())
+ return ret.eval()
+
+ def testConst(self):
+ self.assertTrue(np.array_equal(self._Zeros([2, 3]),
+ np.array([[0] * 3] * 2)))
+
+ def testDynamicSizes(self):
+ np_ans = np.array([[0] * 3] * 2)
+ with self.test_session():
+ # Creates a tensor of 2 x 3.
+ d = tf.fill([2, 3], 12., name="fill")
+ # Constructs a tensor of zeros of the same dimensions as "d".
+ z = tf.zeros(tf.shape(d))
+ out = z.eval()
+ self.assertAllEqual(np_ans, out)
+ self.assertShapeEqual(np_ans, d)
+ self.assertShapeEqual(np_ans, z)
+
+ def testDtype(self):
+ with self.test_session():
+ d = tf.fill([2, 3], 12., name="fill")
+ self.assertEqual(d.get_shape(), [2, 3])
+ # Test default type for both constant size and dynamic size
+ z = tf.zeros([2, 3])
+ self.assertEquals(z.dtype, tf.float32)
+ self.assertEqual([2, 3], z.get_shape())
+ z = tf.zeros(tf.shape(d))
+ self.assertEquals(z.dtype, tf.float32)
+ self.assertEqual([2, 3], z.get_shape())
+ # Test explicit type control
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ z = tf.zeros([2, 3], dtype=dtype)
+ self.assertEquals(z.dtype, dtype)
+ self.assertEquals([2, 3], z.get_shape())
+ z = tf.zeros(tf.shape(d), dtype=dtype)
+ self.assertEquals(z.dtype, dtype)
+ self.assertEquals([2, 3], z.get_shape())
+
+
+class ZerosLikeTest(tf.test.TestCase):
+
+ def testZerosLike(self):
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ numpy_dtype = dtype.as_numpy_dtype
+ with self.test_session():
+ # Creates a tensor of non-zero values with shape 2 x 3.
+ d = tf.constant(np.ones((2, 3), dtype=numpy_dtype), dtype=dtype)
+ # Constructs a tensor of zeros of the same dimensions and type as "d".
+ z_var = tf.zeros_like(d)
+ # Test that the type is correct
+ self.assertEquals(z_var.dtype, dtype)
+ z_value = z_var.eval()
+
+ # Test that the value is correct
+ self.assertTrue(np.array_equal(z_value, np.array([[0] * 3] * 2)))
+ self.assertEqual([2, 3], z_var.get_shape())
+
+ def testGenZerosLike(self):
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ numpy_dtype = dtype.as_numpy_dtype
+ with self.test_session():
+ # Creates a tensor of non-zero values with shape 2 x 3.
+ d = tf.constant(np.ones((2, 3), dtype=numpy_dtype), dtype=dtype)
+ # Constructs a tensor of zeros of the same dimensions and type as "d".
+ z_var = gen_array_ops._zeros_like(d)
+ # Test that the type is correct
+ self.assertEquals(z_var.dtype, dtype)
+ z_value = z_var.eval()
+
+ # Test that the value is correct
+ self.assertTrue(np.array_equal(z_value, np.array([[0] * 3] * 2)))
+ self.assertEqual([2, 3], z_var.get_shape())
+
+
+class OnesTest(tf.test.TestCase):
+
+ def _Ones(self, shape):
+ with self.test_session():
+ ret = tf.ones(shape)
+ self.assertEqual(shape, ret.get_shape())
+ return ret.eval()
+
+ def testConst(self):
+ self.assertTrue(np.array_equal(self._Ones([2, 3]), np.array([[1] * 3] * 2)))
+
+ def testDynamicSizes(self):
+ np_ans = np.array([[1] * 3] * 2)
+ with self.test_session():
+ # Creates a tensor of 2 x 3.
+ d = tf.fill([2, 3], 12., name="fill")
+ # Constructs a tensor of ones of the same dimensions as "d".
+ z = tf.ones(tf.shape(d))
+ out = z.eval()
+ self.assertAllEqual(np_ans, out)
+ self.assertShapeEqual(np_ans, d)
+ self.assertShapeEqual(np_ans, z)
+
+ def testDtype(self):
+ with self.test_session():
+ d = tf.fill([2, 3], 12., name="fill")
+ self.assertEqual(d.get_shape(), [2, 3])
+ # Test default type for both constant size and dynamic size
+ z = tf.ones([2, 3])
+ self.assertEquals(z.dtype, tf.float32)
+ self.assertEqual([2, 3], z.get_shape())
+ z = tf.ones(tf.shape(d))
+ self.assertEquals(z.dtype, tf.float32)
+ self.assertEqual([2, 3], z.get_shape())
+ # Test explicit type control
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ z = tf.ones([2, 3], dtype=dtype)
+ self.assertEquals(z.dtype, dtype)
+ self.assertEqual([2, 3], z.get_shape())
+ z = tf.ones(tf.shape(d), dtype=dtype)
+ self.assertEquals(z.dtype, dtype)
+ self.assertEqual([2, 3], z.get_shape())
+
+
+class OnesLikeTest(tf.test.TestCase):
+
+ def testOnesLike(self):
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ numpy_dtype = dtype.as_numpy_dtype
+ with self.test_session():
+ # Creates a tensor of non-zero values with shape 2 x 3.
+ d = tf.constant(np.ones((2, 3), dtype=numpy_dtype), dtype=dtype)
+ # Constructs a tensor of ones of the same dimensions and type as "d".
+ z_var = tf.ones_like(d)
+ # Test that the type is correct
+ self.assertEquals(z_var.dtype, dtype)
+ z_value = z_var.eval()
+
+ # Test that the value is correct
+ self.assertTrue(np.array_equal(z_value, np.array([[1] * 3] * 2)))
+ self.assertEqual([2, 3], z_var.get_shape())
+
+ def testGenOnesLike(self):
+ for dtype in [tf.float32, tf.float64, tf.int32,
+ tf.uint8, tf.int16, tf.int8,
+ tf.complex64, tf.int64]:
+ numpy_dtype = dtype.as_numpy_dtype
+ with self.test_session():
+ # Creates a tensor of non-zero values with shape 2 x 3.
+ d = tf.constant(np.ones((2, 3), dtype=numpy_dtype), dtype=dtype)
+ # Constructs a tensor of ones of the same dimensions and type as "d".
+ z_var = tf.ones_like(d)
+ # Test that the type is correct
+ self.assertEquals(z_var.dtype, dtype)
+ z_value = z_var.eval()
+
+ # Test that the value is correct
+ self.assertTrue(np.array_equal(z_value, np.array([[1] * 3] * 2)))
+ self.assertEqual([2, 3], z_var.get_shape())
+
+
+class FillTest(tf.test.TestCase):
+
+ def _compare(self, dims, val, np_ans, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.fill(dims, val, name="fill")
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ # Fill does not set the shape.
+ # self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, dims, val, np_ans):
+ self._compare(dims, val, np_ans, False)
+ self._compare(dims, val, np_ans, True)
+
+ def testFillFloat(self):
+ np_ans = np.array([[3.1415] * 3] * 2).astype(np.float32)
+ self._compareAll([2, 3], np_ans[0][0], np_ans)
+
+ def testFillDouble(self):
+ np_ans = np.array([[3.1415] * 3] * 2).astype(np.float64)
+ self._compareAll([2, 3], np_ans[0][0], np_ans)
+
+ def testFillInt32(self):
+ np_ans = np.array([[42] * 3] * 2).astype(np.int32)
+ self._compareAll([2, 3], np_ans[0][0], np_ans)
+
+ def testFillInt64(self):
+ np_ans = np.array([[-42] * 3] * 2).astype(np.int64)
+ self._compareAll([2, 3], np_ans[0][0], np_ans)
+
+ def testFillComplex(self):
+ np_ans = np.array([[0.15] * 3] * 2).astype(np.complex64)
+ self._compare([2, 3], np_ans[0][0], np_ans, use_gpu=False)
+
+ def testFillString(self):
+ np_ans = np.array([["yolo"] * 3] * 2)
+ with self.test_session(use_gpu=False):
+ tf_ans = tf.fill([2, 3], np_ans[0][0], name="fill").eval()
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def testShapeFunctionEdgeCases(self):
+ # Non-vector dimensions.
+ with self.assertRaises(ValueError):
+ tf.fill([[0, 1], [2, 3]], 1.0)
+
+ # Non-scalar value.
+ with self.assertRaises(ValueError):
+ tf.fill([3, 2], [1.0, 2.0])
+
+ # Partial dimension information.
+ f = tf.fill(
+ tf.placeholder(tf.int32, shape=(4,)), 3.0)
+ self.assertEqual([None, None, None, None], f.get_shape().as_list())
+
+
+class PlaceholderTest(tf.test.TestCase):
+
+ def testDtype(self):
+ with self.test_session():
+ p = tf.placeholder(tf.float32, name="p")
+ p_identity = tf.identity(p)
+ feed_array = np.random.rand(10, 10)
+ self.assertAllClose(p_identity.eval(feed_dict={p: feed_array}),
+ feed_array)
+
+ with self.assertRaisesOpError(
+ "must feed a value for placeholder tensor 'p' with dtype float"):
+ p_identity.eval()
+
+ def testShape(self):
+ with self.test_session():
+ p = tf.placeholder(tf.float32, shape=(10, 10), name="p")
+ p_identity = tf.identity(p)
+ feed_array = np.random.rand(10, 10)
+ self.assertAllClose(p_identity.eval(feed_dict={p: feed_array}),
+ feed_array)
+
+ with self.assertRaisesOpError(
+ "must feed a value for placeholder tensor 'p' with dtype float and "
+ "shape dim { size: 10 } dim { size: 10 }"):
+ p_identity.eval()
+
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Cannot feed value of shape" in e.message):
+ p_identity.eval(feed_dict={p: feed_array[:5, :5]})
+
+ def testPartialShape(self):
+ with self.test_session():
+ p = tf.placeholder(tf.float32, shape=[None, 3], name="p")
+ p_identity = tf.identity(p)
+ feed_array = np.random.rand(10, 3)
+ self.assertAllClose(p_identity.eval(feed_dict={p: feed_array}),
+ feed_array)
+
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Cannot feed value of shape" in e.message):
+ p_identity.eval(feed_dict={p: feed_array[:5, :2]})
+
+ def testControlDependency(self):
+ with self.test_session():
+ p = tf.placeholder(tf.int32, shape=[], name="p")
+ with tf.control_dependencies([p]):
+ c = tf.constant(5, tf.int32)
+ d = tf.mul(p, c)
+ self.assertEqual(10, d.eval(feed_dict={p: 2}))
+
+ def testFillNegative(self):
+ with self.test_session():
+ for shape in (-1,), (2, -1), (-1, 2):
+ with self.assertRaisesRegexp(tf.errors.InvalidArgumentError,
+ " must be nonnegative"):
+ tf.fill(shape, 7).eval()
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/control_flow_ops_py_test.py b/tensorflow/python/kernel_tests/control_flow_ops_py_test.py
new file mode 100644
index 0000000000..adf3552739
--- /dev/null
+++ b/tensorflow/python/kernel_tests/control_flow_ops_py_test.py
@@ -0,0 +1,1260 @@
+# pylint: disable=g-long-lambda
+"""Tests for tensorflow.ops.control_flow_ops."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import gradients
+from tensorflow.python.pywrap_tensorflow import StatusNotOK
+
+def check_op_order(graph):
+ """Sanity check on the ordering of op id."""
+
+ for op in graph.get_operations():
+ for v in op.inputs:
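+ # Merge is exempt because its back-edge input comes from a NextIteration
+ # op that is created after the Merge itself (see the while loop tests).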
+ assert v.op._id < op._id or op.type == "Merge", (
+ "The id of %s must be less than the id of %s" % (v.op.name, op.name))
+ return True
+
+
+def check_consumers(graph):
+ """Sanity check on the consumer list of the tensors."""
+
+ consumer_count = {}
+ for op in graph.get_operations():
+ for v in op.inputs:
+ cnt = consumer_count.get(v, 0)
+ consumer_count[v] = cnt + 1
+ for k, v in consumer_count.iteritems():
+ if len(k.consumers()) != v:
+ return False
+ return True
+
+
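+ # isum(s) adds the loop counter values 0 through 9 to s, so it returns
+ # s + 45 (see testWhile_2 below).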
+def isum(s):
+ i = tf.constant(0, name="i")
+ c = lambda i, s: tf.less(i, 10)
+ b = lambda i, s: [tf.add(i, 1), tf.add(i, s)]
+ _, r_s = control_flow_ops.While(c, b, [i, s])
+ return r_s
+
+
+class ControlFlowTest(tf.test.TestCase):
+
+ def testRefIdentity(self):
+ with self.test_session():
+ v = tf.Variable(7)
+
+ v = control_flow_ops._Identity(v)
+ op = tf.assign(v, 9)
+ v2 = control_flow_ops.with_dependencies([op], v)
+
+ self.assertTrue(check_op_order(v.graph))
+ self.assertTrue(isinstance(v2, tf.Tensor))
+ tf.initialize_all_variables().run()
+ self.assertEqual(9, v2.eval())
+
+ def testRefEnter(self):
+ with self.test_session():
+ v = tf.Variable(7)
+
+ enter_v = control_flow_ops._Enter(v, "foo_1")
+ nine = tf.constant(9)
+ enter_nine = control_flow_ops.enter(nine, "foo_1")
+ op = tf.assign(enter_v, enter_nine)
+ v2 = control_flow_ops.with_dependencies([op], enter_v)
+ v3 = control_flow_ops.exit(v2)
+ tf.initialize_all_variables().run()
+ self.assertEqual(9, v3.eval())
+
+ def testRefSwitch(self):
+ with self.test_session():
+ v = tf.Variable(7)
+
+ p = tf.constant(True)
+ v1 = control_flow_ops._SwitchRefOrTensor(v, p)
+ v2 = tf.assign(v1[1], 9)
+ tf.initialize_all_variables().run()
+ self.assertEqual(9, v2.eval())
+
+ def testEnterExit_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ enter_op = control_flow_ops.enter(data, "foo_1", False)
+ exit_op = control_flow_ops.exit(enter_op)
+
+ result = exit_op.eval()
+ self.assertAllEqual(np.array([1, 2, 3, 4, 5, 6]), result)
+
+ def testEnterMulExit_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ enter_data = control_flow_ops.enter(data, "foo_1", False)
+ five = tf.constant(5)
+ enter_five = control_flow_ops.enter(five, "foo_1", False)
+ mul_op = tf.mul(enter_data, enter_five)
+ exit_op = control_flow_ops.exit(mul_op)
+
+ result = exit_op.eval()
+ self.assertAllEqual(np.array([x * 5 for x in [1, 2, 3, 4, 5, 6]]), result)
+
+ def testEnterNextExit_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ enter_op = control_flow_ops.enter(data, "foo_1", False)
+ next_op = control_flow_ops.next_iteration(enter_op)
+ exit_op = control_flow_ops.exit(next_op)
+
+ result = exit_op.eval()
+ self.assertAllEqual(np.array([1, 2, 3, 4, 5, 6]), result)
+
+ def testSwitchMergeIndexedSlices(self):
+ with self.test_session():
+ values = tf.constant([1, 2, 3, 4, 5, 6])
+ indices = tf.constant([0, 2, 4, 6, 8, 10])
+ data = tf.IndexedSlices(values, indices)
+ pred = tf.convert_to_tensor(True)
+ switch_op = control_flow_ops.switch(data, pred)
+ merge_op = control_flow_ops.merge(switch_op)[0]
+
+ val = merge_op.values.eval()
+ ind = merge_op.indices.eval()
+ self.assertAllEqual(np.arange(1, 7), val)
+ self.assertAllEqual(np.arange(0, 12, 2), ind)
+
+ def _testSwitchMerge_1(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(True, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ merge_op = control_flow_ops.merge(switch_op)[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.arange(1, 7), result)
+
+ def testSwitchMerge_1(self):
+ self._testSwitchMerge_1(use_gpu=False)
+ self._testSwitchMerge_1(use_gpu=True)
+
+ def testSwitchDeadBranch(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(True, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ dead_branch = tf.identity(switch_op[0])
+
+ with self.assertRaisesWithPredicateMatch(
+ StatusNotOK, lambda e: 'The tensor returned for' in str(e)):
+ dead_branch.eval()
+
+ def testSwitchMergeIdentity_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(True, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ merge_op = control_flow_ops.merge(switch_op)[0]
+ id_op = tf.identity(merge_op)
+
+ result = id_op.eval()
+ self.assertAllEqual(np.arange(1, 7), result)
+
+ def testSwitchMergeLess_0(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ zero = tf.constant(0)
+ one = tf.constant(1)
+ less_op = tf.less(zero, one)
+ switch_op = control_flow_ops.switch(data, less_op)
+ merge_op = control_flow_ops.merge(switch_op)[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.arange(1, 7), result)
+
+ def testSwitchMergeLess_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ zero = tf.convert_to_tensor(0)
+ one = tf.convert_to_tensor(1)
+ less_op = tf.less(zero, one)
+ switch_op = control_flow_ops.switch(data, less_op)
+ merge_op = control_flow_ops.merge(switch_op)[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.arange(1, 7), result)
+
+ def testSwitchMergeAddIdentity_0(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(False, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ one = tf.constant(1)
+ add_op = tf.add(switch_op[0], one)
+ id_op = tf.identity(switch_op[1])
+ merge_op = control_flow_ops.merge([add_op, id_op])[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.array([x + 1 for x in [1, 2, 3, 4, 5, 6]]), result)
+
+ def testSwitchMergeAddIdentity_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(True, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ one = tf.constant(1)
+ add_op = tf.add(switch_op[0], one)
+ id_op = tf.identity(switch_op[1])
+ merge_op = control_flow_ops.merge([add_op, id_op])[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.arange(1, 7), result)
+
+ def testSwitchMergeAddMul_0(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(False, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ one = tf.constant(1)
+ add_op = tf.add(switch_op[0], one)
+ five = tf.constant(5)
+ mul_op = tf.mul(switch_op[1], five)
+ merge_op = control_flow_ops.merge([add_op, mul_op])[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.array([x + 1 for x in [1, 2, 3, 4, 5, 6]]), result)
+
+ def testSwitchMergeAddMul_1(self):
+ with self.test_session():
+ data = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ports = tf.convert_to_tensor(True, name="ports")
+ switch_op = control_flow_ops.switch(data, ports)
+ one = tf.constant(1)
+ add_op = tf.add(switch_op[0], one)
+ five = tf.constant(5)
+ mul_op = tf.mul(switch_op[1], five)
+ merge_op = control_flow_ops.merge([add_op, mul_op])[0]
+
+ result = merge_op.eval()
+ self.assertAllEqual(np.array([x * 5 for x in [1, 2, 3, 4, 5, 6]]), result)
+
+ def testLoop_false(self):
+ with self.test_session():
+ false = tf.convert_to_tensor(False)
+ n = tf.constant(10)
+
+ enter_false = control_flow_ops.enter(false, "foo_1", False)
+ enter_n = control_flow_ops.enter(n, "foo_1", False)
+
+ merge_n = control_flow_ops.merge([enter_n], name="merge_n")[0]
+ switch_n = control_flow_ops.switch(merge_n, enter_false)
+ exit_n = control_flow_ops.exit(switch_n[0])
+
+ result = exit_n.eval()
+ self.assertAllEqual(10, result)
+
+ def testLoop_false_1(self):
+ with self.test_session():
+ false = tf.convert_to_tensor(False)
+ n = tf.constant(10)
+
+ enter_false = control_flow_ops.enter(false, "foo_1", False)
+ enter_n = control_flow_ops.enter(n, "foo_1", False)
+
+ merge_n = control_flow_ops.merge([enter_n, enter_n], name="merge_n")[0]
+ switch_n = control_flow_ops.switch(merge_n, enter_false)
+ exit_n = control_flow_ops.exit(switch_n[0])
+ next_n = control_flow_ops.next_iteration(switch_n[0])
+ merge_n.op._update_input(1, next_n)
+
+ result = exit_n.eval()
+ self.assertAllEqual(10, result)
+
+ def testLoop_1(self):
+ with self.test_session():
+ zero = tf.convert_to_tensor(0)
+ one = tf.convert_to_tensor(1)
+ n = tf.constant(10)
+
+ enter_zero = control_flow_ops.enter(zero, "foo_1", False)
+ enter_one = control_flow_ops.enter(one, "foo_1", False)
+ enter_n = control_flow_ops.enter(n, "foo_1", False)
+ merge_zero = control_flow_ops.merge([enter_zero, enter_zero],
+ name="merge_zero")[0]
+ merge_one = control_flow_ops.merge([enter_one, enter_one],
+ name="merge_one")[0]
+ merge_n = control_flow_ops.merge([enter_n, enter_n], name="merge_n")[0]
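+ # The loop condition compares merge_n with itself, so it is always false;
+ # the loop exits immediately and n (10) flows straight to the Exit op.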
+ less_op = tf.less(merge_n, merge_n)
+ cond_op = control_flow_ops.loop_cond(less_op)
+ switch_zero = control_flow_ops.switch(merge_zero, cond_op)
+ switch_one = control_flow_ops.switch(merge_one, cond_op)
+ switch_n = control_flow_ops.switch(merge_n, cond_op)
+ next_zero = control_flow_ops.next_iteration(switch_zero[1])
+ next_one = control_flow_ops.next_iteration(switch_one[1])
+ next_n = control_flow_ops.next_iteration(switch_n[1])
+ merge_zero.op._update_input(1, next_zero)
+ merge_one.op._update_input(1, next_one)
+ merge_n.op._update_input(1, next_n)
+ exit_n = control_flow_ops.exit(switch_n[0])
+
+ result = exit_n.eval()
+ self.assertAllEqual(10, result)
+
+ def testCondIndexedSlices(self):
+ with self.test_session():
+ values = tf.constant(10)
+ indices = tf.constant(0)
+ x = tf.IndexedSlices(values, indices)
+ pred = tf.less(1, 2)
+ fn1 = lambda: tf.IndexedSlices(tf.add(x.values, 1), indices)
+ fn2 = lambda: tf.IndexedSlices(tf.sub(x.values, 1), indices)
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ val = r.values.eval()
+ ind = r.indices.eval()
+ self.assertTrue(check_op_order(x.values.graph))
+ self.assertAllEqual(11, val)
+ self.assertAllEqual(0, ind)
+
+ def testCondIndexedSlicesDifferentTypes(self):
+ with self.test_session():
+ values = tf.constant(10)
+ i_32 = tf.convert_to_tensor(0, name="one", dtype=tf.int32)
+ i_64 = tf.convert_to_tensor(0, name="one", dtype=tf.int64)
+ x = tf.IndexedSlices(values, i_32)
+ pred = tf.less(1, 2)
+ fn1 = lambda: tf.IndexedSlices(tf.add(x.values, 1), i_32)
+ fn2 = lambda: tf.IndexedSlices(tf.sub(x.values, 1), i_64)
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ val = r.values.eval()
+ ind = r.indices.eval()
+ self.assertTrue(check_op_order(x.values.graph))
+ self.assertAllEqual(11, val)
+ self.assertAllEqual(0, ind)
+ self.assertTrue(ind.dtype == np.int64)
+
+ def _testCond_1(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ x = tf.constant(10)
+ pred = tf.less(1, 2)
+ fn1 = lambda: tf.add(x, 1)
+ fn2 = lambda: tf.sub(x, 1)
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ result = r.eval()
+ self.assertTrue(check_op_order(x.graph))
+ self.assertAllEqual(11, result)
+
+ def testCond_1(self):
+ self._testCond_1(use_gpu=False)
+ self._testCond_1(use_gpu=True)
+
+ def testCond_2(self):
+ with self.test_session():
+ x = tf.constant(10)
+ r = control_flow_ops.cond(tf.less(1, 0), lambda: tf.add(x, 1),
+ lambda: tf.sub(x, 1))
+ result = r.eval()
+ self.assertTrue(check_op_order(x.graph))
+ self.assertAllEqual(9, result)
+
+ def testCond_3(self):
+ with self.test_session():
+ x = tf.constant(10)
+ pred = tf.less(1, 2)
+ fn1 = lambda: tf.add(x, 1)
+ fn2 = lambda: tf.sub(x, 1)
+ fn3 = lambda: tf.add(control_flow_ops.cond(pred, fn1, fn2), 1)
+ r = control_flow_ops.cond(pred, fn3, fn2)
+
+ result = r.eval()
+ self.assertTrue(check_op_order(x.graph))
+ self.assertAllEqual(12, result)
+
+ def testCond_4(self):
+ with self.test_session():
+ v1 = tf.Variable(7)
+ v2 = tf.Variable(7)
+ v3 = tf.Variable(7)
+
+ age = tf.constant(3)
+ max_age = tf.constant(2)
+ pred = tf.greater(age, max_age)
+ fn1 = lambda: [tf.assign(v1, 1).op, tf.assign(v2, 2).op]
+ fn2 = lambda: [tf.assign(v3, 3).op, tf.constant(10).op]
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ tf.initialize_all_variables().run()
+ self.assertEqual(len(r), 2)
+ result = r[1].eval()
+ self.assertTrue(check_op_order(age.graph))
+ self.assertAllEqual(True, result)
+ self.assertAllEqual(7, v1.eval())
+ self.assertAllEqual(2, v2.eval())
+ self.assertAllEqual(7, v3.eval())
+
+ def testCond_5(self):
+ with self.test_session():
+ alive = tf.constant(True, name="alive")
+ count = tf.constant(0, name="count")
+
+ def body(i):
+ return control_flow_ops.cond(
+ alive, lambda: [tf.less(i, 3), tf.add(count, 1)],
+ lambda: [alive, count])
+
+ for i in range(10):
+ alive, count = body(i)
+ self.assertAllEqual(4, count.eval())
+
+ def testCond_6(self):
+ with self.test_session():
+ v1 = tf.Variable([7])
+
+ age = tf.constant(3)
+ pred = tf.greater(age, 4)
+ fn1 = lambda: age
+ fn2 = lambda: v1
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ tf.initialize_all_variables().run()
+ result = r.eval()
+ self.assertAllEqual(np.array([7]), result)
+
+ def testCondGrad_1(self):
+ with self.test_session():
+ x = tf.constant(10.0, name="x")
+ pred = tf.less(1, 2)
+ fn1 = lambda: tf.identity(x)
+ fn2 = lambda: tf.identity(x)
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ grad = tf.gradients(r, [x])[0]
+ result = grad.eval()
+ self.assertAllEqual(1.0, result)
+
+ def testCondGrad_2(self):
+ with self.test_session():
+ c = tf.placeholder(tf.int32, shape=[])
+ x = tf.constant(10.0)
+ pred = tf.less(c, 2)
+ fn1 = lambda: tf.mul(x, 42.0)
+ fn2 = lambda: tf.mul(x, 3.0)
+ r = control_flow_ops.cond(pred, fn1, fn2)
+
+ grad = tf.gradients(r, [x])[0]
+ self.assertAllEqual(42.0, grad.eval(feed_dict={c: 1}))
+ self.assertAllEqual(3.0, grad.eval(feed_dict={c: 3}))
+
+ def testCondGrad_Gather(self):
+ with self.test_session() as sess:
+ v1 = tf.Variable([1.0, 42.0])
+ c = tf.placeholder(tf.int32, shape=[])
+ pred = tf.less(c, 2)
+ fn1 = lambda: tf.identity(v1)
+ fn2 = lambda: tf.gather(v1, [1, 1])
+ r = control_flow_ops.cond(pred, fn1, fn2)
+ grad = tf.gradients(r, [v1])[0]
+ tf.initialize_all_variables().run()
+ # Should just be [1, 1], but possibly a sparse representation
+ gv, gi = sess.run([grad.values, grad.indices], feed_dict={c: 1})
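+ # Densify the IndexedSlices gradient by summing the values that share an
+ # index.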
+ dense_gv = [sum([y for (x, y) in zip(gi, gv) if x == i])
+ for i in range(2)]
+ self.assertAllEqual(dense_gv, [1.0, 1.0])
+ # Should be [0, 2], as the else forwards v1[1] twice
+ gv, gi = sess.run([grad.values, grad.indices], feed_dict={c: 3})
+ dense_gv = [sum([y for (x, y) in zip(gi, gv) if x == i])
+ for i in range(2)]
+ self.assertAllEqual(dense_gv, [0.0, 2.0])
+
+ def testWhileGrad_1(self):
+ with self.test_session():
+ v = tf.constant(2.0, name="v")
+ c = lambda v: tf.less(v, 100.0)
+ b = tf.square
+ r = control_flow_ops.While(c, b, [v], parallel_iterations=1)
+
+ r = tf.gradients(r, v)
+ result = r[0].eval()
+ self.assertEqual(1024.0, result)
+
+ def testWhileGrad_2(self):
+ with self.test_session():
+ a = tf.constant(3.0, name="a")
+ v = tf.constant(2.0, name="v")
+ c = lambda v: tf.less(v, 100.0)
+ b = lambda v: tf.mul(v, a)
+ r = control_flow_ops.While(c, b, [v], parallel_iterations=1)
+
+ r = tf.gradients(r, a)
+ result = r[0].eval()
+ self.assertEqual(216.0, result)
+
+ def testWhileGrad_3(self):
+ with self.test_session():
+ a = tf.constant(3.0, name="a")
+ v = tf.constant(2.0, name="v")
+ c = lambda v: tf.less(v, 100.0)
+ b = lambda v: tf.mul(v, a)
+ r = control_flow_ops.While(c, b, [v], parallel_iterations=1)
+
+ r = tf.gradients(r, v)
+ result = r[0].eval()
+ self.assertEqual(81.0, result)
+
+ def testWhileGrad_4(self):
+ with self.test_session():
+ a = tf.Variable(3.0)
+ v = tf.constant(2.0, name="v")
+ c = lambda v: tf.less(v, 100.0)
+ b = lambda v: tf.mul(v, a)
+ r = control_flow_ops.While(c, b, [v], parallel_iterations=1)
+
+ r = tf.gradients(r, a)
+ tf.initialize_all_variables().run()
+ result = r[0].eval()
+ self.assertEqual(216.0, result)
+
+ def testWhileGrad_5(self):
+ with self.test_session():
+ x = tf.constant(3.0, name="x")
+ y = tf.constant(2.0, name="y")
+ c = lambda x, y: tf.less(x, 100.0)
+
+ def b(x, y):
+ y1 = tf.add(x, y)
+ x1 = tf.mul(x, y1)
+ return x1, y1
+
+ r = control_flow_ops.While(c, b, [x, y], parallel_iterations=1)
+
+ # Must use the complete r.
+ r = tf.gradients(r, x)
+ result = r[0].eval()
+ self.assertEqual(304.0, result)
+
+ def testWhileGrad_6(self):
+ with self.test_session():
+ i = tf.constant(0, name="i")
+ x = tf.constant(2.0, name="x")
+ c = lambda i, x: tf.less(i, 10)
+
+ def b(i, x):
+ x = tf.mul(x, 2.0)
+ i = tf.add(i, 1)
+ return i, x
+
+ r = control_flow_ops.While(c, b, [i, x], parallel_iterations=1)
+
+ # Must use the complete r.
+ r = tf.gradients(r, x)
+ r = r[0].eval()
+ self.assertEqual(1024.0, r)
+
+ def testWhileGrad_7(self):
+ with self.test_session():
+ v = tf.constant(2.0, name="v")
+ c = lambda v: tf.less(v, 100.0)
+ b = tf.square
+ r = control_flow_ops.While(c, b, [v], parallel_iterations=1,
+ back_prop=False)
+ r = tf.add(r, v)
+ r = tf.gradients(r, v)
+ result = r[0].eval()
+ self.assertEqual(1.0, result)
+
+ # Microbenchmark: 10,000 iterations took 0.21s.
+ def testWhile_1(self):
+ with self.test_session():
+ n = tf.constant(0)
+ c = lambda x: tf.less(x, 10000)
+ b = lambda x: tf.add(x, 1)
+ r = control_flow_ops.While(c, b, [n], parallel_iterations=20)
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertEqual(10000, result)
+
+ def testWhile_2(self):
+ with self.test_session():
+ s = tf.constant(0)
+ r = isum(s)
+
+ result = r.eval()
+ self.assertTrue(check_op_order(s.graph))
+ self.assertAllEqual(45, result)
+
+ # Uses more than 10 parallel iterations and hence exercises the k-bound
+ # most of the time.
+ def testWhile_3(self):
+ with self.test_session():
+
+ def compute(i, m, c, o):
+ m, c = [tf.add(m, 1), tf.add(c, 1)]
+ o = tf.add(o, m)
+ o = tf.add(o, c)
+ i = tf.add(i, 1)
+ return [i, m, c, o]
+
+ i = tf.convert_to_tensor(0)
+ m = tf.convert_to_tensor(0)
+ c = tf.convert_to_tensor(0)
+ o = tf.convert_to_tensor(0)
+ d = tf.convert_to_tensor(100)
+ r = control_flow_ops.While(
+ lambda i, m, c, o: tf.less(i, d), compute, [i, m, c, o])
+ result = r[3].eval()
+ self.assertTrue(check_op_order(i.graph))
+ self.assertAllEqual(10100, result)
+
+ def testWhile_4(self):
+ with self.test_session():
+
+ def compute(i, m, c, o):
+ m, c = [tf.gather(x, i), tf.gather(x, i)]
+ o = tf.add(o, m)
+ o = tf.add(o, c)
+ i = tf.add(i, 1)
+ return [i, m, c, o]
+
+ i = tf.convert_to_tensor(0)
+ m = tf.convert_to_tensor(0)
+ c = tf.convert_to_tensor(0)
+ o = tf.convert_to_tensor(0)
+ x = tf.convert_to_tensor([1, 2, 3, 4, 5, 6])
+ s = tf.size(x)
+ r = control_flow_ops.While(
+ lambda i, m, c, o: tf.less(i, s), compute, [i, m, c, o])
+ result = r[3].eval()
+ self.assertTrue(check_op_order(i.graph))
+ self.assertAllEqual(42, result)
+
+ def testWhile_5(self):
+ with self.test_session():
+
+ def compute(i, c, o):
+ c = tf.slice(x, tf.expand_dims(i, 0), [1])
+ o = tf.concat(0, [o, c])
+ i = tf.add(i, 1)
+ return [i, c, o]
+
+ i = tf.convert_to_tensor(0)
+ c = tf.convert_to_tensor(0)
+ o = tf.convert_to_tensor([0])
+ x = tf.convert_to_tensor([1, 2, 3, 4, 5, 6])
+ s = tf.size(x)
+ r = control_flow_ops.While(
+ lambda i, c, o: tf.less(i, s), compute, [i, c, o])
+ result = r[2].eval()
+ self.assertTrue(check_op_order(i.graph))
+ self.assertAllEqual(np.array([0, 1, 2, 3, 4, 5, 6]), result)
+
+ def _testWhile_Gpu_1(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ n = tf.constant(1.0)
+ c = lambda x: tf.less(x, 10.0)
+ b = lambda x: tf.add(x, 1.0)
+ r = control_flow_ops.While(c, b, [n])
+
+ result = r.eval()
+ self.assertEqual(10.0, result)
+
+ def testWhile_Gpu_1(self):
+ self._testWhile_Gpu_1(use_gpu=False)
+ self._testWhile_Gpu_1(use_gpu=True)
+
+ def _testWhile_Gpu_2(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ n = tf.constant(1.0)
+ c = lambda x: tf.less(x, 10.0)
+ def b(x):
+ with tf.device("/cpu:0"):
+ return tf.add(x, 1.0)
+ r = control_flow_ops.While(c, b, [n])
+
+ result = r.eval()
+ self.assertEqual(10.0, result)
+
+ def testWhile_Gpu_2(self):
+ self._testWhile_Gpu_2(use_gpu=False)
+ self._testWhile_Gpu_2(use_gpu=True)
+
+ def testWhileWithControl_1(self):
+ with self.test_session():
+ n = tf.constant(0)
+ r = tf.constant(0)
+ condition = lambda n_, r_: tf.less(n_, 10)
+
+ def body(n_, r_):
+ n_ = tf.add(n_, 1)
+ with r_.graph.control_dependencies([r_]):
+ r_ = tf.constant(12)
+ return [n_, r_]
+
+ res = control_flow_ops.While(condition,
+ body,
+ [n, r],
+ parallel_iterations=1)
+ result = res[1].eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(12, result)
+
+ def testWhileWithControl_2(self):
+ with self.test_session():
+ r = tf.constant(0)
+ condition = lambda r_: tf.less(r_, 10)
+
+ def body(r_):
+ with r_.graph.control_dependencies([r_]):
+ r_ = tf.constant(12)
+ return [r_]
+
+ res = control_flow_ops.While(condition, body, [r], parallel_iterations=1)
+ result = res.eval()
+ self.assertTrue(check_op_order(r.graph))
+ self.assertAllEqual(12, result)
+
+ def testCondWhile_1(self):
+ with self.test_session():
+ n = tf.convert_to_tensor(0, name="n")
+ c = lambda x: tf.less(x, 10)
+ b = lambda x: tf.add(x, 1)
+ r = control_flow_ops.cond(tf.less(0, 1),
+ lambda: control_flow_ops.While(c, b, [n]),
+ lambda: n)
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(10, result)
+
+ def testCondWhile_2(self):
+ with self.test_session():
+ n = tf.convert_to_tensor(0)
+ c = lambda x: tf.less(x, 10)
+ b = lambda x: tf.add(x, 1)
+ r = control_flow_ops.cond(tf.less(1, 0), lambda: tf.add(n, 1),
+ lambda: control_flow_ops.While(c, b, [n]))
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(10, result)
+
+ def testWhileCond_1(self):
+ with self.test_session():
+ i = tf.convert_to_tensor(0, name="i")
+ n = tf.convert_to_tensor(10, name="n")
+ one = tf.convert_to_tensor(1, name="one")
+ c = lambda x: tf.less(x, n)
+ b = lambda x: control_flow_ops.cond(tf.constant(True),
+ lambda: tf.add(x, one),
+ lambda: tf.sub(x, one))
+ r = control_flow_ops.While(c, b, [i])
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(10, result)
+
+ def testWhileCond_2(self):
+ with self.test_session():
+ n = tf.convert_to_tensor(0, name="n")
+ c = lambda x: tf.less(x, 10)
+ b = lambda x: control_flow_ops.cond(tf.constant(True),
+ lambda: tf.add(x, 1),
+ lambda: n)
+ r = control_flow_ops.While(c, b, [n])
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(10, result)
+
+ def testWhileCond_3(self):
+ with self.test_session():
+ n = tf.convert_to_tensor(0)
+ c = lambda x: tf.less(x, 10)
+ b = lambda x: control_flow_ops.cond(tf.less(0, 1),
+ lambda: tf.add(x, 1),
+ lambda: tf.sub(x, 1))
+ r = control_flow_ops.While(c, b, [n])
+
+ result = r.eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(10, result)
+
+ # NOTE: It is ok to have parallel_iterations > 1
+ def testWhileUpdateVariable_1(self):
+ with self.test_session():
+ select = tf.Variable([3.0, 4.0, 5.0])
+ n = tf.constant(0)
+
+ def loop_iterator(j):
+ return tf.less(j, 3)
+
+ def loop_body(j):
+ ns = tf.scatter_update(select, j, 10.0)
+ nj = tf.add(j, 1)
+ op = control_flow_ops.group(ns)
+ nj = control_flow_ops.with_dependencies([op], nj)
+ return [nj]
+
+ r = control_flow_ops.While(loop_iterator,
+ loop_body,
+ [n],
+ parallel_iterations=1)
+ self.assertTrue(check_op_order(n.graph))
+ tf.initialize_all_variables().run()
+ self.assertEqual(3, r.eval())
+ result = select.eval()
+ self.assertAllEqual(np.array([10.0, 10.0, 10.0]), result)
+
+ def testWhileUpdateVariable_2(self):
+ with self.test_session():
+ select1 = tf.Variable([3.0, 4.0, 5.0])
+ select2 = tf.Variable([3.0, 4.0, 5.0])
+ n = tf.constant(0)
+
+ def loop_iterator(j):
+ return tf.less(j, 3)
+
+ def loop_body(j):
+ ns1 = tf.scatter_update(select1, j, 10.0)
+ ns2 = tf.scatter_update(select2, j, 10.0)
+ nj = tf.add(j, 1)
+ op = control_flow_ops.group(ns1, ns2)
+ nj = control_flow_ops.with_dependencies([op], nj)
+ return [nj]
+
+ r = control_flow_ops.While(loop_iterator,
+ loop_body,
+ [n],
+ parallel_iterations=1)
+ self.assertTrue(check_op_order(n.graph))
+ tf.initialize_all_variables().run()
+ self.assertEqual(3, r.eval())
+ result1 = select1.eval()
+ self.assertAllEqual(np.array([10.0, 10.0, 10.0]), result1)
+ result2 = select2.eval()
+ self.assertAllEqual(np.array([10.0, 10.0, 10.0]), result2)
+
+ def testWhileUpdateVariable_3(self):
+ with self.test_session():
+ select = tf.Variable([3.0, 4.0, 5.0])
+ n = tf.constant(0)
+
+ def loop_iterator(j, _):
+ return tf.less(j, 3)
+
+ def loop_body(j, _):
+ ns = tf.scatter_update(select, j, 10.0)
+ nj = tf.add(j, 1)
+ return [nj, ns]
+
+ r = control_flow_ops.While(loop_iterator,
+ loop_body,
+ [n, tf.identity(select)],
+ parallel_iterations=1)
+ tf.initialize_all_variables().run()
+ result = r[1].eval()
+ self.assertTrue(check_op_order(n.graph))
+ self.assertAllEqual(np.array([10.0, 10.0, 10.0]), result)
+
+ # b/24814703
+ def testWhileUpdateVariable_4(self):
+ with self.test_session():
+ var_a = tf.Variable(0, name="a")
+ var_b = tf.Variable(0, name="b")
+ tf.initialize_all_variables().run()
+
+ c = tf.constant(0, name="c")
+ asn1 = tf.assign_add(var_a, 1, name="a_add")
+ # Loop condition
+ def pred(i):
+ return tf.less(i, 10)
+ # Loop body
+ def loop_body(i):
+ asn2 = tf.assign_add(var_b, asn1, name="b_add")
+ with tf.control_dependencies([asn2]):
+ ni = tf.add(i, 1, name="i_add")
+ return ni
+
+ lpa = control_flow_ops.While(pred, loop_body, [c],
+ parallel_iterations=1)
+
+ self.assertEqual(0, var_b.eval())
+ lpa.eval() # Run the loop
+ self.assertEqual(10, var_b.eval())
+
+ # b/24736492
+ def testWhileUpdateVariable_5(self):
+ with self.test_session():
+ # Create some variables.
+ var_a = tf.Variable(0, name="a")
+ var_b = tf.Variable(0, name="b")
+ tf.initialize_all_variables().run()
+
+ # Change condition to check var_b
+ def pred(i):
+ return tf.less(var_b, 10)
+
+ # Change body to increment var_b
+ def loop_body(i):
+ asn1 = tf.assign_add(var_a, tf.constant(1), name="a_add")
+ asn2 = tf.assign_add(var_b, tf.constant(1), name="b_add")
+ with tf.control_dependencies([asn1, asn2]):
+ inc_b = tf.identity(var_b)
+ return inc_b
+
+ lpa = control_flow_ops.While(pred, loop_body, [var_b], 1, name="loop")
+
+ self.assertEqual(0, var_b.eval())
+ lpa.eval() # Run the loop
+ self.assertEqual(10, var_a.eval())
+ self.assertEqual(10, var_b.eval())
+
+ def testWhileQueue_1(self):
+ with self.test_session():
+ q = tf.FIFOQueue(-1, tf.int32)
+ i = tf.constant(0)
+
+ def c(i):
+ return tf.less(i, 10)
+
+ def b(i):
+ ni = tf.add(i, 1)
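+ # Enqueue the current i before producing the next counter value; with
+ # parallel_iterations=1 the queue receives 0 through 9 in order.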
+ ni = control_flow_ops.with_dependencies([q.enqueue((i,))], ni)
+ return ni
+
+ r = control_flow_ops.While(c, b, [i], parallel_iterations=1)
+ self.assertEqual([10], r.eval())
+ for i in xrange(10):
+ self.assertEqual([i], q.dequeue().eval())
+
+ def testFold_1(self):
+ with self.test_session():
+ elems = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ r = control_flow_ops.fold(
+ lambda a, x: tf.mul(tf.add(a, x), 2), elems, [1])
+ result = r.eval()
+ self.assertTrue(check_op_order(elems.graph))
+ self.assertAllEqual(np.array([208]), result)
+
+ def testFold_2(self):
+ with self.test_session():
+ elems = tf.constant([1, 2, 3, 4, 5, 6], name="data")
+ ten = tf.convert_to_tensor(10)
+
+ def compute(a, x):
+ r = tf.mul(x, ten)
+ return tf.add(a, r)
+
+ r = control_flow_ops.fold(compute, elems, [1])
+ result = r.eval()
+ self.assertTrue(check_op_order(elems.graph))
+ self.assertAllEqual([201], result)
+
+ def testOneValueCond(self):
+ with self.test_session():
+ c = tf.placeholder(tf.int32, shape=[])
+ one = tf.convert_to_tensor(1, name="one")
+ two = tf.convert_to_tensor(2, name="two")
+ p = tf.greater_equal(c, 1)
+ i = control_flow_ops.cond(p, lambda: one, lambda: two)
+ self.assertTrue(isinstance(i, tf.Tensor))
+
+ # True case: c = 2 is >= 1
+ self.assertEqual([1], i.eval(feed_dict={c: 2}))
+
+ # False case: c = 0 is not >= 1
+ self.assertEqual([2], i.eval(feed_dict={c: 0}))
+
+ def testExampleCond(self):
+ with self.test_session():
+ x = tf.convert_to_tensor([-2.0, 2.0], name="x")
+ d = tf.placeholder(tf.int32, shape=[])
+
+ def l2():
+ return tf.sqrt(tf.reduce_sum(tf.square(x)))
+
+ def l1():
+ return tf.reduce_sum(tf.abs(x))
+
+ i = control_flow_ops.cond(tf.equal(d, 2), l2, l1)
+ self.assertEqual(4.0, i.eval(feed_dict={d: 1}))
+ self.assertAllClose(2.0 * math.sqrt(2), i.eval(feed_dict={d: 2}))
+
+ def testOneOpCond(self):
+ with self.test_session():
+ v = tf.Variable(0)
+ c = tf.convert_to_tensor(0)
+ one = tf.convert_to_tensor(1)
+ two = tf.convert_to_tensor(2)
+ p = tf.greater_equal(c, 1)
+
+ def a():
+ return tf.assign(v, one)
+
+ def b():
+ return tf.assign(v, two)
+
+ i = control_flow_ops.cond(p, a, b)
+ self.assertTrue(isinstance(i, tf.Tensor))
+ tf.initialize_all_variables().run()
+
+ self.assertEqual(0, v.eval())
+
+ # True case: c = 2 is >= 1, v is set to 1.
+ self.assertEqual(1, i.eval(feed_dict={c.name: 2}))
+ self.assertEqual(1, v.eval())
+
+ # False case: c = 0 is not >= 1, v is set to 2.
+ self.assertEqual(2, i.eval(feed_dict={c.name: 0}))
+ self.assertEqual(2, v.eval())
+
+ def testWithOpsDependencies(self):
+ with self.test_session() as sess:
+ v = tf.Variable(0.0)
+ c = tf.constant(10)
+
+ # Fetching v directly will result in an uninitialized error
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ sess.run([c, v])
+
+ # Use a control dependency to ensure init_variable is run
+ # while asking for c
+ real_v = control_flow_ops.with_dependencies(name="real_tensor",
+ output_tensor=v,
+ dependencies=[v.initializer])
+ c_val, real_v_val = sess.run([c, real_v])
+
+ # Ensure that fetching 'c' gives the expected value
+ self.assertAllEqual(10, c_val)
+
+ # Ensure that 'v' is initialized
+ self.assertAllClose(0.0, real_v_val)
+
+ def testWithTensorDependencies(self):
+ with self.test_session():
+ v = tf.Variable(0.0)
+ c1 = tf.constant(10)
+ c2 = tf.constant(20)
+
+ # c1_with_init_v depends on the init op for v
+ c1_with_init_v = control_flow_ops.with_dependencies(
+ name="c1_with_init_v",
+ output_tensor=c1,
+ dependencies=[v.initializer])
+ # c2_with_c1_dep depends on the value of c1_with_init_v
+ c2_with_c1_dep = control_flow_ops.with_dependencies(
+ name="c2_with_c1_dep",
+ output_tensor=c2,
+ dependencies=[c1_with_init_v])
+
+ # Fetching v directly will result in an uninitialized error
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v.eval()
+
+ # Get the value of 'c2_with_c1_dep', which should cause 'v'
+ # to be initialized.
+ self.assertAllEqual(20, c2_with_c1_dep.eval())
+
+ # Ensure that 'v' is initialized
+ self.assertAllClose(0.0, v.eval())
+
+ def testWithIndexedSlicesDependencies(self):
+ with self.test_session():
+ v = tf.Variable(
+ np.array([[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]]).astype(np.float32))
+ v_at_1 = tf.IndexedSlices(v, tf.constant([1]))
+ gather_v_at_1 = tf.gather(v_at_1.values, v_at_1.indices)
+ v_at_1_after_init = control_flow_ops.with_dependencies([v.initializer],
+ v_at_1)
+ gather_v_at_1_after_init = tf.gather(
+ v_at_1_after_init.values, v_at_1_after_init.indices)
+
+ # Fetching gather_v_at_1 will result in an uninitialized error
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ gather_v_at_1.eval()
+
+ # Getting gather_v_at_1_after_init will work, and initialize v.
+ self.assertAllEqual([[10.0, 11.0]], gather_v_at_1_after_init.eval())
+
+ # Double check that 'v' is initialized
+ self.assertAllClose([[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]], v.eval())
+
+ def testDependenciesDevice(self):
+ with tf.Graph().as_default():
+ # device set on tensor => same device on dep.
+ with tf.device("/job:ps"):
+ vd = tf.Variable([0.0])
+ with_vd_dep = control_flow_ops.with_dependencies([vd.initializer], vd)
+ self.assertTrue("/job:ps" in with_vd_dep.device)
+
+ # No device set on tensor => no device on dep.
+ vnod = tf.Variable([0.0])
+ with_vnod_dep = control_flow_ops.with_dependencies([vnod.initializer],
+ vnod)
+ self.assertEquals(None, with_vnod_dep.device)
+
+ # device set on tensor, default device on graph => default device on dep.
+ vdef = tf.Variable([0.0])
+ with tf.device("/job:worker/gpu:1"):
+ with_vdef_dep = control_flow_ops.with_dependencies([vdef.initializer],
+ vdef)
+ self.assertEquals("/job:worker/gpu:1", with_vdef_dep.device)
+
+ def testGroup(self):
+ with self.test_session() as sess:
+ v1 = tf.Variable([0.0])
+ v2 = tf.Variable([1.0])
+
+ # Group init1 and init2 and run.
+ init = control_flow_ops.group(v1.initializer, v2.initializer)
+ # Fetching v1 directly will result in an uninitialized error
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v1.eval()
+
+ # Runs "init" before fetching v1 and v2.
+ init.run()
+ v1_val, v2_val = sess.run([v1, v2])
+
+ # Ensure that v1 and v2 are initialized
+ self.assertAllClose([0.0], v1_val)
+ self.assertAllClose([1.0], v2_val)
+
+ def testMergeShapes(self):
+ # All inputs unknown.
+ p1 = tf.placeholder(tf.float32)
+ p2 = tf.placeholder(tf.float32)
+ p3 = tf.placeholder(tf.float32)
+ m, index = control_flow_ops.merge([p1, p2, p3])
+ self.assertIs(None, m.get_shape().ndims)
+ self.assertEqual([], index.get_shape())
+
+ # All inputs known but different.
+ p1 = tf.placeholder(tf.float32, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32, shape=[2, 1])
+ m, index = control_flow_ops.merge([p1, p2])
+ self.assertIs(None, m.get_shape().ndims)
+ self.assertEqual([], index.get_shape())
+
+ # All inputs known but same.
+ p1 = tf.placeholder(tf.float32, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32, shape=[1, 2])
+ m, index = control_flow_ops.merge([p1, p2])
+ self.assertEqual([1, 2], m.get_shape())
+ self.assertEqual([], index.get_shape())
+
+ # Possibly the same but not guaranteed.
+ p1 = tf.placeholder(tf.float32, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32)
+ p2.set_shape([None, 2])
+ m, index = control_flow_ops.merge([p1, p2])
+ self.assertIs(None, m.get_shape().ndims)
+ self.assertEqual([], index.get_shape())
+
+ def testRefSelect(self):
+ index = tf.placeholder(tf.int32)
+
+ # All inputs unknown.
+ p1 = tf.placeholder(tf.float32_ref)
+ p2 = tf.placeholder(tf.float32_ref)
+ p3 = tf.placeholder(tf.float32_ref)
+ s = control_flow_ops.ref_select(index, [p1, p2, p3])
+ self.assertIs(None, s.get_shape().ndims)
+
+ # All inputs known but different.
+ p1 = tf.placeholder(tf.float32_ref, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32_ref, shape=[2, 1])
+ s = control_flow_ops.ref_select(index, [p1, p2])
+ self.assertIs(None, s.get_shape().ndims)
+
+ # All inputs known but same.
+ p1 = tf.placeholder(tf.float32_ref, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32_ref, shape=[1, 2])
+ s = control_flow_ops.ref_select(index, [p1, p2])
+ self.assertEqual([1, 2], s.get_shape())
+
+ # Possibly the same but not guaranteed.
+ p1 = tf.placeholder(tf.float32_ref, shape=[1, 2])
+ p2 = tf.placeholder(tf.float32_ref)
+ p2.set_shape([None, 2])
+ s = control_flow_ops.ref_select(index, [p1, p2])
+ self.assertEqual(None, s.get_shape())
+
+
+class TupleTest(tf.test.TestCase):
+
+ def testTensors(self):
+ for v1_first in [True, False]:
+ with self.test_session():
+ v1 = tf.Variable([1.0])
+ add1 = tf.add(
+ control_flow_ops.with_dependencies([v1.initializer], v1),
+ 2.0)
+ v2 = tf.Variable([10.0])
+ add2 = tf.add(control_flow_ops.with_dependencies([v2.initializer],
+ v2),
+ 20.0)
+ t1, _, t2 = control_flow_ops.tuple([add1, None, add2])
+
+ # v1 is not initialized.
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v1.eval()
+
+ # v2 is not initialized.
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v2.eval()
+
+ if v1_first:
+ # Getting t1 initializes v2.
+ self.assertAllClose([3.0], t1.eval())
+ self.assertAllClose([10.0], v2.eval())
+ else:
+ # Getting t2 initializes v1.
+ self.assertAllClose([30.0], t2.eval())
+ self.assertAllClose([1.0], v1.eval())
+
+ def testIndexedSlices(self):
+ for v1_first in [True, False]:
+ with self.test_session():
+ v1 = tf.Variable(
+ np.array([[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]]).astype(
+ np.float32))
+ v1_at_1 = tf.IndexedSlices(
+ control_flow_ops.with_dependencies([v1.initializer], v1),
+ tf.constant([1]))
+
+ v2 = tf.Variable(
+ np.array([[0.1, 1.1], [10.1, 11.1], [20.1, 21.1]]).astype(
+ np.float32))
+ v2_at_1 = tf.IndexedSlices(
+ control_flow_ops.with_dependencies([v2.initializer], v2),
+ tf.constant([1]))
+
+ st1, st2 = control_flow_ops.tuple([v1_at_1, v2_at_1])
+ g1 = tf.gather(st1.values, st1.indices)
+ g2 = tf.gather(st2.values, st2.indices)
+
+ # v1 is not initialized.
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v1.eval()
+
+ # v2 is not initialized.
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ v2.eval()
+
+ if v1_first:
+ # Getting g1 initializes v2.
+ self.assertAllClose([[10.0, 11.0]], g1.eval())
+ self.assertAllClose([[0.1, 1.1], [10.1, 11.1], [20.1, 21.1]],
+ v2.eval())
+ else:
+ # Getting g2 initializes v1.
+ self.assertAllClose([[10.1, 11.1]], g2.eval())
+ self.assertAllClose([[0.0, 1.0], [10.0, 11.0], [20.0, 21.0]],
+ v1.eval())
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/conv_ops_test.py b/tensorflow/python/kernel_tests/conv_ops_test.py
new file mode 100644
index 0000000000..7f5d419c98
--- /dev/null
+++ b/tensorflow/python/kernel_tests/conv_ops_test.py
@@ -0,0 +1,1009 @@
+"""Functional tests for convolutional operations."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+def GetInceptionShapes():
+ """Iterator for the convolution shapes used in the Inception 2015 model.
+
+ Yields:
+ Tuple (input_size, filter_size, out_size, stride, padding), the convolution
+ parameters of Inception layers.
+ """
+ input_sizes = [[4, 5, 5, 1248], [4, 8, 8, 384], [4, 8, 8, 384],
+ [4, 8, 8, 2048], [4, 8, 8, 448], [4, 8, 8, 2048],
+ [4, 8, 8, 2048], [4, 8, 8, 2048], [4, 8, 8, 1760],
+ [4, 8, 8, 1760], [4, 8, 8, 1760], [4, 8, 8, 1760],
+ [4, 17, 17, 192], [4, 17, 17, 192], [4, 17, 17, 1248],
+ [4, 17, 17, 128], [4, 17, 17, 1248], [4, 17, 17, 224],
+ [4, 17, 17, 192], [4, 17, 17, 192], [4, 17, 17, 1216],
+ [4, 17, 17, 1216], [4, 17, 17, 224], [4, 17, 17, 192],
+ [4, 17, 17, 192], [4, 17, 17, 1152], [4, 17, 17, 1152],
+ [4, 17, 17, 192], [4, 17, 17, 160], [4, 17, 17, 1152],
+ [4, 17, 17, 1024], [4, 17, 17, 128], [4, 17, 17, 1024],
+ [4, 17, 17, 128], [4, 17, 17, 1024], [4, 17, 17, 128],
+ [4, 17, 17, 768], [4, 17, 17, 128], [4, 17, 17, 128],
+ [4, 17, 17, 768], [4, 17, 17, 768], [4, 35, 35, 96],
+ [4, 35, 35, 288], [4, 35, 35, 64], [4, 35, 35, 288],
+ [4, 35, 35, 256], [4, 35, 35, 48], [4, 35, 35, 256],
+ [4, 35, 35, 96], [4, 35, 35, 192], [4, 35, 35, 192],
+ [4, 35, 35, 192], [4, 73, 73, 64], [4, 73, 73, 64],
+ [4, 147, 147, 24]]
+ filter_sizes = [[1, 1, 1248, 128], [1, 3, 384, 384], [3, 1, 384, 384],
+ [1, 1, 2048, 192], [3, 3, 448, 384], [1, 1, 2048, 320],
+ [1, 1, 2048, 448], [1, 1, 2048, 384], [1, 1, 1760, 384],
+ [1, 1, 1760, 192], [1, 1, 1760, 448], [1, 1, 1760, 320],
+ [3, 3, 192, 192], [3, 3, 192, 192], [1, 1, 1248, 192],
+ [3, 3, 128, 320], [1, 1, 1248, 128], [1, 3, 224, 224],
+ [3, 1, 192, 256], [1, 3, 192, 256], [1, 1, 1216, 192],
+ [1, 1, 1216, 96], [3, 1, 224, 224], [3, 3, 192, 224],
+ [1, 3, 192, 192], [1, 1, 1152, 192], [1, 1, 1152, 128],
+ [3, 1, 192, 192], [3, 3, 160, 192], [1, 1, 1152, 160],
+ [1, 1, 1024, 128], [1, 3, 128, 192], [1, 1, 1024, 160],
+ [3, 1, 128, 192], [1, 1, 1024, 256], [3, 1, 128, 128],
+ [1, 1, 768, 192], [1, 3, 128, 128], [3, 3, 128, 128],
+ [1, 1, 768, 128], [1, 1, 768, 320], [3, 3, 96, 96],
+ [3, 3, 288, 384], [3, 3, 64, 96], [1, 1, 288, 64],
+ [1, 1, 256, 64], [5, 5, 48, 64], [1, 1, 256, 48],
+ [3, 3, 96, 96], [1, 1, 192, 32], [1, 1, 192, 64],
+ [1, 1, 192, 48], [3, 3, 64, 192], [1, 1, 64, 64],
+ [1, 1, 24, 64]]
+ out_sizes = [[4, 5, 5, 128], [4, 8, 8, 384], [4, 8, 8, 384],
+ [4, 8, 8, 192], [4, 8, 8, 384], [4, 8, 8, 320],
+ [4, 8, 8, 448], [4, 8, 8, 384], [4, 8, 8, 384],
+ [4, 8, 8, 192], [4, 8, 8, 448], [4, 8, 8, 320],
+ [4, 8, 8, 192], [4, 17, 17, 192], [4, 17, 17, 192],
+ [4, 8, 8, 320], [4, 17, 17, 128], [4, 17, 17, 224],
+ [4, 17, 17, 256], [4, 17, 17, 256], [4, 17, 17, 192],
+ [4, 17, 17, 96], [4, 17, 17, 224], [4, 17, 17, 224],
+ [4, 17, 17, 192], [4, 17, 17, 192], [4, 17, 17, 128],
+ [4, 17, 17, 192], [4, 17, 17, 192], [4, 17, 17, 160],
+ [4, 17, 17, 128], [4, 17, 17, 192], [4, 17, 17, 160],
+ [4, 17, 17, 192], [4, 17, 17, 256], [4, 17, 17, 128],
+ [4, 17, 17, 192], [4, 17, 17, 128], [4, 17, 17, 128],
+ [4, 17, 17, 128], [4, 17, 17, 320], [4, 17, 17, 96],
+ [4, 17, 17, 384], [4, 35, 35, 96], [4, 35, 35, 64],
+ [4, 35, 35, 64], [4, 35, 35, 64], [4, 35, 35, 48],
+ [4, 35, 35, 96], [4, 35, 35, 32], [4, 35, 35, 64],
+ [4, 35, 35, 48], [4, 71, 71, 192], [4, 73, 73, 64],
+ [4, 147, 147, 64]]
+ strides = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1,
+ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+ 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+ # pylint: disable=invalid-name
+ VALID = "VALID"
+ SAME = "SAME"
+ # pylint: enable=invalid-name
+ paddings = [SAME, SAME, SAME, SAME, SAME, SAME, SAME, SAME,
+ SAME, SAME, SAME, SAME, VALID, SAME, SAME, VALID,
+ SAME, SAME, SAME, SAME, SAME, SAME, SAME, SAME,
+ SAME, SAME, SAME, SAME, SAME, SAME, SAME, SAME,
+ SAME, SAME, SAME, SAME, SAME, SAME, SAME, SAME,
+ SAME, VALID, VALID, SAME, SAME, SAME, SAME, SAME,
+ SAME, SAME, SAME, SAME, VALID, VALID, VALID]
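+ # Note: zip() stops at the shortest argument, so the five lists above must
+ # stay the same length for every Inception shape to be exercised.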
+ for i, f, o, s, p in zip(input_sizes, filter_sizes, out_sizes,
+ strides, paddings):
+ yield i, f, o, s, p
+
+
+class Conv2DTest(tf.test.TestCase):
+
+ def _SetupValuesForDevice(self, tensor_in_sizes, filter_in_sizes, stride,
+ padding, use_gpu):
+ """Builds the convolution for the given sizes on the requested device.
+
+ Args:
+ tensor_in_sizes: Input tensor dimensions in
+ [batch, input_rows, input_cols, input_depth].
+ filter_in_sizes: Filter tensor dimensions in
+ [kernel_rows, kernel_cols, input_depth, output_depth].
+ stride: Stride.
+ padding: Padding type.
+ use_gpu: True if the operations should be run on GPU.
+ Returns:
+ The symbolic output tensor of the convolution, to be evaluated by the caller.
+ """
+ total_size_1 = 1
+ total_size_2 = 1
+ for s in tensor_in_sizes:
+ total_size_1 *= s
+ for s in filter_in_sizes:
+ total_size_2 *= s
+ # Initializes the input and filter tensors with arrays of incrementing
+ # numbers from 1.
+ x1 = [f * 1.0 for f in range(1, total_size_1 + 1)]
+ x2 = [f * 1.0 for f in range(1, total_size_2 + 1)]
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t1 = tf.constant(x1, shape=tensor_in_sizes)
+ t2 = tf.constant(x2, shape=filter_in_sizes)
+ conv = tf.nn.conv2d(t1, t2,
+ strides=[1, stride, stride, 1],
+ padding=padding)
+ return conv
+
+ def _CompareFwdValues(self, tensor_in_sizes, filter_in_sizes,
+ stride, padding):
+ """Verifies that CPU and GPU produce the same values.
+
+ Args:
+ tensor_in_sizes: Input tensor dimensions in
+ [batch, input_rows, input_cols, input_depth].
+ filter_in_sizes: Filter tensor dimensions in
+ [kernel_rows, kernel_cols, input_depth, output_depth].
+ stride: Stride.
+ padding: Padding type.
+ """
+ x1 = np.random.rand(*tensor_in_sizes).astype(np.float32)
+ x2 = np.random.rand(*filter_in_sizes).astype(np.float32)
+ def _SetupVal(use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ t1 = tf.constant(x1, shape=tensor_in_sizes)
+ t2 = tf.constant(x2, shape=filter_in_sizes)
+ conv = tf.nn.conv2d(t1, t2, strides=[1, stride, stride, 1],
+ padding=padding)
+ return conv
+ gpu_tensor = _SetupVal(use_gpu=True)
+ cpu_tensor = _SetupVal(use_gpu=False)
+ with self.test_session() as sess:
+ (gpu_value, cpu_value) = sess.run([gpu_tensor, cpu_tensor])
+ self.assertAllClose(cpu_value, gpu_value, rtol=1e-5, atol=1e-5)
+
+ def _VerifyValues(self, tensor_in_sizes, filter_in_sizes, stride,
+ padding, expected):
+ tensor_cpu = self._SetupValuesForDevice(tensor_in_sizes, filter_in_sizes,
+ stride, padding, use_gpu=False)
+ tensor_gpu = self._SetupValuesForDevice(tensor_in_sizes, filter_in_sizes,
+ stride, padding, use_gpu=True)
+ with self.test_session() as sess:
+ tensors = [tensor_cpu, tensor_gpu]
+ (value_cpu, value_gpu) = sess.run(tensors)
+ values = [value_cpu, value_gpu]
+ for i in range(len(tensors)):
+ conv = tensors[i]
+ value = values[i]
+ print "expected = ", expected
+ print "actual = ", value
+ self.assertArrayNear(expected, np.ravel(value), 1e-5)
+ self.assertShapeEqual(value, conv)
+
+ def testConv2D1x1Filter(self):
+ expected_output = [30.0, 36.0, 42.0, 66.0, 81.0, 96.0, 102.0, 126.0, 150.0,
+ 138.0, 171.0, 204.0, 174.0, 216.0, 258.0, 210.0, 261.0,
+ 312.0]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 3],
+ filter_in_sizes=[1, 1, 3, 3],
+ stride=1, padding="VALID",
+ expected=expected_output)
+
+ def testConv2D2x2Filter(self):
+ # The outputs are computed using third_party/py/IPython/notebook.
+ expected_output = [2271.0, 2367.0, 2463.0, 2901.0, 3033.0, 3165.0]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 3],
+ filter_in_sizes=[2, 2, 3, 3],
+ stride=1, padding="VALID",
+ expected=expected_output)
+
+ def testConv2D1x2Filter(self):
+ # The outputs are computed using third_party/py/IPython/notebook.
+ expected_output = [231.0, 252.0, 273.0, 384.0, 423.0, 462.0, 690.0,
+ 765.0, 840.0, 843.0, 936.0, 1029.0]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 3],
+ filter_in_sizes=[1, 2, 3, 3],
+ stride=1, padding="VALID",
+ expected=expected_output)
+
+ def testConv2D2x2FilterStride2(self):
+ expected_output = [2271.0, 2367.0, 2463.0]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 3],
+ filter_in_sizes=[2, 2, 3, 3],
+ stride=2, padding="VALID",
+ expected=expected_output)
+
+ def testConv2D2x2FilterStride2Same(self):
+ expected_output = [2271.0, 2367.0, 2463.0, 1230.0, 1305.0, 1380.0]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 3],
+ filter_in_sizes=[2, 2, 3, 3],
+ stride=2, padding="SAME",
+ expected=expected_output)
+
+ # Testing for backprops
+ def _RunAndVerifyBackpropInput(self, input_sizes, filter_sizes, output_sizes,
+ stride, padding, expected, use_gpu):
+ total_output_size = 1
+ total_filter_size = 1
+ for s in output_sizes:
+ total_output_size *= s
+ for s in filter_sizes:
+ total_filter_size *= s
+ # Initializes the filter and output-gradient tensors with arrays of
+ # incrementing numbers from 1.
+ x1 = [f * 1.0 for f in range(1, total_filter_size + 1)]
+ x2 = [f * 1.0 for f in range(1, total_output_size + 1)]
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t0 = tf.constant(input_sizes, shape=[len(input_sizes)])
+ t1 = tf.constant(x1, shape=filter_sizes)
+ t2 = tf.constant(x2, shape=output_sizes)
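+ # conv2d_backprop_input takes the *shape* of the original input (t0) rather
+ # than its values; only the filter (t1) and the output gradient (t2) carry
+ # data.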
+ conv = tf.nn.conv2d_backprop_input(t0, t1, t2,
+ strides=[1, stride, stride, 1],
+ padding=padding)
+ # Run the backprop op; "value" holds the computed gradient w.r.t. the input.
+ value = sess.run(conv)
+ self.assertShapeEqual(value, conv)
+ print "expected = ", expected
+ print "actual = ", value
+ self.assertArrayNear(expected, value.flatten(), 1e-5)
+
+ def _CompareBackpropInput(self, input_sizes, filter_sizes, output_sizes,
+ stride, padding):
+ x1 = np.random.rand(*filter_sizes).astype(np.float32)
+ x2 = np.random.rand(*output_sizes).astype(np.float32)
+ def _GetVal(use_gpu):
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t0 = tf.constant(input_sizes, shape=[len(input_sizes)])
+ t1 = tf.constant(x1, shape=filter_sizes)
+ t2 = tf.constant(x2, shape=output_sizes)
+ conv = tf.nn.conv2d_backprop_input(t0, t1, t2,
+ strides=[1, stride, stride, 1],
+ padding=padding)
+ ret = conv.eval()
+ self.assertShapeEqual(ret, conv)
+ return ret
+ gpu_value = _GetVal(use_gpu=True)
+ cpu_value = _GetVal(use_gpu=False)
+ self.assertAllClose(cpu_value, gpu_value, rtol=1e-4, atol=1e-4)
+
+ def testConv2D2x2Depth1ValidBackpropInput(self):
+ expected_output = [1.0, 4.0, 4.0, 3.0, 10.0, 8.0]
+ self._RunAndVerifyBackpropInput(input_sizes=[1, 2, 3, 1],
+ filter_sizes=[2, 2, 1, 1],
+ output_sizes=[1, 1, 2, 1],
+ stride=1, padding="VALID",
+ expected=expected_output, use_gpu=False)
+ self._RunAndVerifyBackpropInput(input_sizes=[1, 2, 3, 1],
+ filter_sizes=[2, 2, 1, 1],
+ output_sizes=[1, 1, 2, 1],
+ stride=1, padding="VALID",
+ expected=expected_output, use_gpu=True)
+
+ def testConv2D2x2Depth3ValidBackpropInput(self):
+ expected_output = [14.0, 32.0, 50.0,
+ 100.0, 163.0, 226.0,
+ 167.0, 212.0, 257.0,
+ 122.0, 140.0, 158.0,
+ 478.0, 541.0, 604.0,
+ 437.0, 482.0, 527.0]
+ self._RunAndVerifyBackpropInput(input_sizes=[1, 2, 3, 3],
+ filter_sizes=[2, 2, 3, 3],
+ output_sizes=[1, 1, 2, 3],
+ stride=1, padding="VALID",
+ expected=expected_output, use_gpu=False)
+ self._RunAndVerifyBackpropInput(input_sizes=[1, 2, 3, 3],
+ filter_sizes=[2, 2, 3, 3],
+ output_sizes=[1, 1, 2, 3],
+ stride=1, padding="VALID",
+ expected=expected_output, use_gpu=True)
+
+ # Testing for backprops
+ def _RunAndVerifyBackpropFilter(self, input_sizes, filter_sizes, output_sizes,
+ stride, padding, expected, use_gpu):
+ total_input_size = 1
+ total_output_size = 1
+ for s in input_sizes:
+ total_input_size *= s
+ for s in output_sizes:
+ total_output_size *= s
+ # Initializes the input and output-gradient tensors with arrays of
+ # incrementing numbers from 1.
+ x0 = [f * 1.0 for f in range(1, total_input_size + 1)]
+ x2 = [f * 1.0 for f in range(1, total_output_size + 1)]
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t0 = tf.constant(x0, shape=input_sizes)
+ t1 = tf.constant(filter_sizes, shape=[len(filter_sizes)])
+ t2 = tf.constant(x2, shape=output_sizes)
+ conv = tf.nn.conv2d_backprop_filter(t0, t1, t2,
+ strides=[1, stride, stride, 1],
+ padding=padding)
+ value = sess.run(conv)
+ self.assertShapeEqual(value, conv)
+ print "expected = ", expected
+ print "actual = ", value
+ self.assertArrayNear(expected, value.flatten(), 1e-5)
+
+ def _CompareBackFilter(self, input_sizes, filter_sizes, output_sizes,
+ stride, padding):
+ x0 = np.random.rand(*input_sizes).astype(np.float32)
+ x2 = np.random.rand(*output_sizes).astype(np.float32)
+ def _GetVal(use_gpu):
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t0 = tf.constant(x0, shape=input_sizes)
+ t1 = tf.constant(filter_sizes, shape=[len(filter_sizes)])
+ t2 = tf.constant(x2, shape=output_sizes)
+ conv = tf.nn.conv2d_backprop_filter(t0, t1, t2,
+ strides=[1, stride, stride, 1],
+ padding=padding)
+ ret = conv.eval()
+ self.assertShapeEqual(ret, conv)
+ return ret
+ gpu_value = _GetVal(use_gpu=True)
+ cpu_value = _GetVal(use_gpu=False)
+ self.assertAllClose(cpu_value, gpu_value, rtol=1e-4, atol=1e-4)
+
+ def testConv2D2x2Depth1ValidBackpropFilter(self):
+ expected = [5.0, 8.0, 14.0, 17.0]
+ self._RunAndVerifyBackpropFilter(input_sizes=[1, 2, 3, 1],
+ filter_sizes=[2, 2, 1, 1],
+ output_sizes=[1, 1, 2, 1],
+ stride=1, padding="VALID",
+ expected=expected, use_gpu=False)
+ self._RunAndVerifyBackpropFilter(input_sizes=[1, 2, 3, 1],
+ filter_sizes=[2, 2, 1, 1],
+ output_sizes=[1, 1, 2, 1],
+ stride=1, padding="VALID",
+ expected=expected, use_gpu=True)
+
+ def testConv2D2x2Depth3ValidBackpropFilter(self):
+ expected = [17.0, 22.0, 27.0, 22.0, 29.0, 36.0, 27.0, 36.0, 45.0,
+ 32.0, 43.0, 54.0, 37.0, 50.0, 63.0, 42.0, 57.0, 72.0,
+ 62.0, 85.0, 108.0, 67.0, 92.0, 117.0, 72.0, 99.0, 126.0,
+ 77.0, 106.0, 135.0, 82.0, 113.0, 144.0, 87.0, 120.0, 153.0]
+ self._RunAndVerifyBackpropFilter(input_sizes=[1, 2, 3, 3],
+ filter_sizes=[2, 2, 3, 3],
+ output_sizes=[1, 1, 2, 3],
+ stride=1, padding="VALID",
+ expected=expected, use_gpu=False)
+ self._RunAndVerifyBackpropFilter(input_sizes=[1, 2, 3, 3],
+ filter_sizes=[2, 2, 3, 3],
+ output_sizes=[1, 1, 2, 3],
+ stride=1, padding="VALID",
+ expected=expected, use_gpu=True)
+
+ # Gradient checkers
+ def ConstructAndTestGradient(self, batch, input_rows, input_cols, filter_rows,
+ filter_cols, in_depth, out_depth, stride,
+ padding, test_input, use_gpu):
+ input_shape = [batch, input_rows, input_cols, in_depth]
+ filter_shape = [filter_rows, filter_cols, in_depth, out_depth]
+ # TODO(yangke): re-factor the computation of output shape.
+ if padding == "VALID":
+ output_rows = int(math.ceil((input_rows - filter_rows + 1.0) / stride))
+ output_cols = int(math.ceil((input_cols - filter_cols + 1.0) / stride))
+ else:
+ output_rows = int(math.ceil(float(input_rows) / stride))
+ output_cols = int(math.ceil(float(input_cols) / stride))
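+ # Illustrative arithmetic (not part of the test logic): with input_rows=5,
+ # filter_rows=3 and stride=1, VALID padding gives ceil((5 - 3 + 1) / 1) = 3
+ # output rows, while SAME padding gives ceil(5 / 1) = 5.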
+ output_shape = [batch, output_rows, output_cols, out_depth]
+ input_size = 1
+ for x in input_shape:
+ input_size *= x
+ filter_size = 1
+ for x in filter_shape:
+ filter_size *= x
+ input_data = [x * 1.0 / input_size for x in range(0, input_size)]
+ filter_data = [x * 1.0 / filter_size for x in range(0, filter_size)]
+ with self.test_session(use_gpu=use_gpu):
+ # Conv2DGrad functions are not compiled for double due to
+ # a problem in the way Eigen's Conv2DGrad works for double.
+ # So we disable the DOUBLE path. We should re-enable this
+ # when double support returns for CPU and/or GPU.
+ # data_type = tf.float64
+ # tolerance = 1e-8
+
+ data_type = tf.float32
+ tolerance = 0.002
+
+ input_tensor = tf.constant(input_data, shape=input_shape,
+ dtype=data_type, name="input")
+ filter_tensor = tf.constant(filter_data, shape=filter_shape,
+ dtype=data_type, name="filter")
+ conv = tf.nn.conv2d(input_tensor, filter_tensor,
+ [1, stride, stride, 1], padding,
+ name="conv")
+ self.assertEqual(output_shape, conv.get_shape())
+ if test_input:
+ err = gc.ComputeGradientError(input_tensor, input_shape,
+ conv, output_shape)
+ else:
+ err = gc.ComputeGradientError(filter_tensor, filter_shape,
+ conv, output_shape)
+ print "conv_2d gradient error = ", err
+ self.assertLess(err, tolerance)
+
+ def testInputGradientValidPaddingStrideOne(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=5,
+ input_cols=4,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="VALID",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=5,
+ input_cols=4,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="VALID",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientValidPaddingStrideOne(self):
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="VALID",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="VALID",
+ test_input=False,
+ use_gpu=True)
+
+ def testInputGradientValidPaddingStrideTwo(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=4,
+ input_cols=5,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="VALID",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=4,
+ input_cols=5,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="VALID",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientValidPaddingStrideTwo(self):
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="VALID",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="VALID",
+ test_input=False,
+ use_gpu=True)
+
+ def testInputGradientValidPaddingStrideThree(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=4,
+ out_depth=5,
+ stride=3,
+ padding="VALID",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=4,
+ out_depth=5,
+ stride=3,
+ padding="VALID",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientValidPaddingStrideThree(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=8,
+ input_cols=7,
+ filter_rows=4,
+ filter_cols=4,
+ in_depth=2,
+ out_depth=3,
+ stride=3,
+ padding="VALID",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=8,
+ input_cols=7,
+ filter_rows=4,
+ filter_cols=4,
+ in_depth=2,
+ out_depth=3,
+ stride=3,
+ padding="VALID",
+ test_input=False,
+ use_gpu=True)
+
+ def testInputGradientSamePaddingStrideOne(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="SAME",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="SAME",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientSamePaddingStrideOne(self):
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="SAME",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=1,
+ padding="SAME",
+ test_input=False,
+ use_gpu=True)
+
+ def testInputGradientSamePaddingStrideTwo(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=5,
+ input_cols=4,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=3,
+ out_depth=3,
+ stride=2,
+ padding="SAME",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=5,
+ input_cols=4,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=3,
+ out_depth=3,
+ stride=2,
+ padding="SAME",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientSamePaddingStrideTwo(self):
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="SAME",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=4,
+ input_rows=6,
+ input_cols=5,
+ filter_rows=2,
+ filter_cols=2,
+ in_depth=2,
+ out_depth=3,
+ stride=2,
+ padding="SAME",
+ test_input=False,
+ use_gpu=True)
+
+ def testInputGradientSamePaddingStrideThree(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=4,
+ out_depth=5,
+ stride=3,
+ padding="SAME",
+ test_input=True,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=7,
+ input_cols=6,
+ filter_rows=3,
+ filter_cols=3,
+ in_depth=4,
+ out_depth=5,
+ stride=3,
+ padding="SAME",
+ test_input=True,
+ use_gpu=True)
+
+ def testFilterGradientSamePaddingStrideThree(self):
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=8,
+ input_cols=7,
+ filter_rows=4,
+ filter_cols=4,
+ in_depth=2,
+ out_depth=3,
+ stride=3,
+ padding="SAME",
+ test_input=False,
+ use_gpu=False)
+ self.ConstructAndTestGradient(
+ batch=2,
+ input_rows=8,
+ input_cols=7,
+ filter_rows=4,
+ filter_cols=4,
+ in_depth=2,
+ out_depth=3,
+ stride=3,
+ padding="SAME",
+ test_input=False,
+ use_gpu=True)
+
+ def testShapeFunctionEdgeCases(self):
+ # All shapes unknown.
+ c1 = tf.nn.conv2d(tf.placeholder(tf.float32),
+ tf.placeholder(tf.float32),
+ strides=[1, 1, 1, 1], padding="SAME")
+ self.assertEqual([None, None, None, None], c1.get_shape().as_list())
+
+ # Incorrect input shape.
+ with self.assertRaises(ValueError):
+ tf.nn.conv2d(tf.placeholder(tf.float32, shape=[1, 3]),
+ tf.placeholder(tf.float32),
+ strides=[1, 1, 1, 1], padding="SAME")
+
+ # Incorrect filter shape.
+ with self.assertRaises(ValueError):
+ tf.nn.conv2d(tf.placeholder(tf.float32),
+ tf.placeholder(tf.float32, shape=[1, 3]),
+ strides=[1, 1, 1, 1], padding="SAME")
+
+ # Depth mismatch.
+ with self.assertRaises(ValueError):
+ tf.nn.conv2d(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ tf.placeholder(tf.float32,
+ shape=[4, 4, 2, 2]),
+ strides=[1, 1, 1, 1], padding="SAME")
+
+ # Illegal strides.
+ with self.assertRaisesRegexp(ValueError, "strides in the batch and depth"):
+ tf.nn.conv2d(tf.placeholder(tf.float32),
+ tf.placeholder(tf.float32),
+ strides=[2, 1, 1, 1], padding="SAME")
+ with self.assertRaisesRegexp(ValueError, "strides in the batch and depth"):
+ tf.nn.conv2d(tf.placeholder(tf.float32),
+ tf.placeholder(tf.float32),
+ strides=[1, 1, 1, 2], padding="SAME")
+
+ # Filter larger than input.
+ with self.assertRaisesRegexp(ValueError,
+ "filter must not be larger than the input"):
+ tf.nn.conv2d(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ tf.placeholder(tf.float32,
+ shape=[20, 21, 3, 2]),
+ strides=[1, 1, 1, 1], padding="SAME")
+ with self.assertRaisesRegexp(ValueError,
+ "filter must not be larger than the input"):
+ tf.nn.conv2d(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ tf.placeholder(tf.float32,
+ shape=[21, 20, 3, 2]),
+ strides=[1, 1, 1, 1], padding="SAME")
+
+ # Stride larger than filter.
+ with self.assertRaisesRegexp(ValueError,
+ "stride must be less than or equal to filter"):
+ tf.nn.conv2d(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ tf.placeholder(tf.float32,
+ shape=[4, 5, 3, 2]),
+ strides=[1, 5, 5, 1], padding="SAME")
+ with self.assertRaisesRegexp(ValueError,
+ "stride must be less than or equal to filter"):
+ tf.nn.conv2d(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ tf.placeholder(tf.float32,
+ shape=[5, 4, 3, 2]),
+ strides=[1, 5, 5, 1], padding="SAME")
+
+ # Invalid rectangular stride.
+ with self.assertRaisesRegexp(ValueError,
+ "equal length strides in the row and column"):
+ tf.nn.conv2d(tf.placeholder(tf.float32),
+ tf.placeholder(tf.float32),
+ strides=[1, 3, 7, 1], padding="SAME")
+
+
+# This is only a very simple test. More comprehensive tests live in
+# //learning/dist_belief/experimental/brain_compatibility/conv_nn_test.py
+# where we compare the numeric results of the depthwise conv op with the
+# depthwise weighted sum transformer in dist_belief.
+class DepthwiseConv2DTest(tf.test.TestCase):
+
+ def _VerifyValues(self, tensor_in_sizes, filter_in_sizes, stride,
+ padding, expected):
+ """Verifies the output values of the convolution function.
+
+ Args:
+ tensor_in_sizes: Input tensor dimensions in
+ [batch, input_rows, input_cols, input_depth].
+ filter_in_sizes: Filter tensor dimensions in
+ [filter_rows, filter_cols, input_depth, depth_multiplier].
+ stride: Stride.
+ padding: Padding type.
+ expected: An array containing the expected operation outputs.
+ """
+ total_size_1 = 1
+ total_size_2 = 1
+ for s in tensor_in_sizes:
+ total_size_1 *= s
+ for s in filter_in_sizes:
+ total_size_2 *= s
+ # Initializes the input and filter tensors with arrays of incrementing
+ # numbers from 1.
+ x1 = [f * 1.0 for f in range(1, total_size_1 + 1)]
+ x2 = [f * 1.0 for f in range(1, total_size_2 + 1)]
+ with self.test_session() as sess:
+ t1 = tf.constant(x1, shape=tensor_in_sizes)
+ t1.set_shape(tensor_in_sizes)
+ t2 = tf.constant(x2, shape=filter_in_sizes)
+ conv = tf.nn.depthwise_conv2d(t1, t2, strides=[1, stride, stride, 1],
+ padding=padding)
+ value = sess.run(conv)
+ print "value = ", value
+ self.assertArrayNear(expected, np.ravel(value), 1e-5)
+ self.assertShapeEqual(value, conv)
+
+ def testConv2D2x2Filter(self):
+ # The inputs look like this (it's a 3 x 2 matrix, each of depth 2):
+ #
+ # [ (1.0, 2.0), (3.0, 4.0), ( 5.0, 6.0) ]
+ # [ (7.0, 8.0), (9.0, 10.0), (11.0, 12.0) ]
+ # We can view this as two inputs
+ #
+ # input depth 0:
+ #
+ # [ 1.0, 3.0, 5.0 ]
+ # [ 7.0, 9.0, 11.0 ]
+ #
+ # input depth 1:
+ #
+ # [ 2.0, 4.0, 6.0 ]
+ # [ 8.0, 10.0, 12.0 ]
+ #
+ # The filter looks like this (it has two 2 x 2 patches, each generating 2
+ # depths):
+ #
+ # filter #0:
+ #
+ # [ (1.0, 3.0), ( 5.0, 7.0)]
+ # [ (9.0, 11.0), (13.0, 15.0)]
+ #
+ # filter #1:
+ #
+ # [ ( 2.0, 4.0), ( 6.0, 8.0)]
+ # [ (10.0, 12.0), (14.0, 16.0)]
+ #
+ # So the outputs are:
+ #
+ # (position 0, 0: in_depth 0, output_depth 0 -- using filter #0)
+ # 1.0 * 1.0 + 7.0 * 9.0 + 3.0 * 5.0 + 9.0 * 13.0 = 196
+ # (position 0, 0: in_depth 0, output_depth 1 -- using filter #1)
+ # 1.0 * 2.0 + 7.0 * 10.0 + 3.0 * 6.0 + 9.0 * 14.0 = 216
+ # (position 0, 0: in_depth 1, output_depth 2 -- using filter #0)
+ # 2.0 * 3.0 + 8.0 * 11.0 + 4.0 * 7.0 + 10.0 * 15.0 = 272
+ # (position 0, 0: in_depth 1, output_depth 3 -- using filter #1)
+ # 2.0 * 4.0 + 8.0 * 12.0 + 4.0 * 8.0 + 10.0 * 16.0 = 296
+ #
+ # (position 1, 0: in_depth 0, output_depth 0 -- using filter #0)
+ # 3.0 * 1.0 + 9.0 * 9.0 + 5.0 * 5.0 + 11.0 * 13.0 = 252
+ # (position 1, 0: in_depth 0, output_depth 1 -- using filter #1)
+ # 3.0 * 2.0 + 9.0 * 10.0 + 5.0 * 6.0 + 11.0 * 14.0 = 280
+ # (position 1, 0: in_depth 1, output_depth 2 -- using filter #0)
+ # 4.0 * 3.0 + 10.0 * 11.0 + 6.0 * 7.0 + 12.0 * 15.0 = 344
+ # (position 1, 0: in_depth 1, output_depth 3 -- using filter #1)
+ # 4.0 * 4.0 + 10.0 * 12.0 + 6.0 * 8.0 + 12.0 * 16.0 = 376
+ expected_output = [196, 216, 272, 296, 252, 280, 344, 376]
+ self._VerifyValues(tensor_in_sizes=[1, 2, 3, 2],
+ filter_in_sizes=[2, 2, 2, 2],
+ stride=1, padding="VALID",
+ expected=expected_output)
+
+
+class SeparableConv2DTest(tf.test.TestCase):
+
+ def _InitValues(self, sizes):
+ """Initializes values for input tensors.
+
+ Args:
+ sizes: Tensor dimensions.
+
+ Returns:
+ A constant tensor of the given shape, filled with 0.5, 1.0, 1.5, ...
+ """
+ total_size = 1
+ for s in sizes:
+ total_size *= s
+ x = [f * 0.5 for f in range(1, total_size + 1)]
+ return tf.constant(x, shape=sizes)
+
+ def _VerifyValues(self, tensor_in_sizes, depthwise_filter_in_sizes,
+ pointwise_filter_in_sizes, stride, padding, expected):
+ """Verifies the output values of the separable convolution function.
+
+ Args:
+ tensor_in_sizes: Input tensor dimensions.
+ depthwise_filter_in_sizes: Depthwise filter tensor dimensions.
+ pointwise_filter_in_sizes: Pointwise filter tensor dimensions.
+ stride: Stride.
+ padding: Padding type.
+ expected: An array containing the expected operation outputs.
+ """
+ with self.test_session() as sess:
+ t1 = self._InitValues(tensor_in_sizes)
+ f1 = self._InitValues(depthwise_filter_in_sizes)
+ f1.set_shape(depthwise_filter_in_sizes)
+ f2 = self._InitValues(pointwise_filter_in_sizes)
+ conv = tf.nn.separable_conv2d(t1, f1, f2, strides=[1, stride, stride, 1],
+ padding=padding)
+ value = sess.run(conv)
+ print "value = ", value
+ self.assertArrayNear(expected, np.ravel(value), 1e-5)
+ self.assertShapeEqual(value, conv)
+
+ def testSeparableConv2D(self):
+ # The output is the result of two convolutions:
+ # First with tensor_in[1, 4, 4, 2] * depthwise_filter[2, 2, 2, 3], which
+ # produces an intermediate of shape [1, 4, 4, 6].
+ # Second with that intermediate * pointwise_filter[1, 1, 6, 7].
+ # In general the separable form costs O(f*f*d*m + d*m*o) per output position
+ # versus O(f*f*d*o) for a direct convolution over the same receptive field.
+ expected_output = [
+ 6644.5, 6971.5, 7298.5, 7625.5, 7952.5, 8279.5, 8606.5, 8154.5, 8556.5,
+ 8958.5, 9360.5, 9762.5, 10164.5, 10566.5, 9664.5, 10141.5, 10618.5,
+ 11095.5, 11572.5, 12049.5, 12526.5, 4145.5, 4346.5, 4547.5, 4748.5,
+ 4949.5, 5150.5, 5351.5, 12684.5, 13311.5, 13938.5, 14565.5, 15192.5,
+ 15819.5, 16446.5, 14194.5, 14896.5, 15598.5, 16300.5, 17002.5, 17704.5,
+ 18406.5, 15704.5, 16481.5, 17258.5, 18035.5, 18812.5, 19589.5, 20366.5,
+ 6499.5, 6814.5, 7129.5, 7444.5, 7759.5, 8074.5, 8389.5, 18724.5,
+ 19651.5, 20578.5, 21505.5, 22432.5, 23359.5, 24286.5, 20234.5, 21236.5,
+ 22238.5, 23240.5, 24242.5, 25244.5, 26246.5, 21744.5, 22821.5, 23898.5,
+ 24975.5, 26052.5, 27129.5, 28206.5, 8853.5, 9282.5, 9711.5, 10140.5,
+ 10569.5, 10998.5, 11427.5, 5746.75, 6010.75, 6274.75, 6538.75, 6802.75,
+ 7066.75, 7330.75, 6168.75, 6452.25, 6735.75, 7019.25, 7302.75, 7586.25,
+ 7869.75, 6590.75, 6893.75, 7196.75, 7499.75, 7802.75, 8105.75, 8408.75,
+ 2036.25, 2119.5, 2202.75, 2286.0, 2369.25, 2452.5, 2535.75]
+
+ self._VerifyValues(tensor_in_sizes=[1, 4, 4, 2],
+ depthwise_filter_in_sizes=[2, 2, 2, 3],
+ pointwise_filter_in_sizes=[1, 1, 6, 7],
+ stride=1, padding="SAME",
+ expected=expected_output)
+
+
+def GetInceptionFwdTest(input_size, filter_size, stride, padding):
+ def Test(self):
+ tf.logging.info("Testing InceptionFwd %s", (input_size, filter_size,
+ stride, padding))
+ self._CompareFwdValues(input_size, filter_size, stride, padding)
+ return Test
+
+
+def GetInceptionBackInputTest(input_size, filter_size, output_size,
+ stride, padding):
+ def Test(self):
+ tf.logging.info("Testing InceptionBackInput %s",
+ (input_size, filter_size, output_size, stride, padding))
+ self._CompareBackpropInput(input_size, filter_size, output_size,
+ stride, padding)
+ return Test
+
+
+def GetInceptionBackFilterTest(input_size, filter_size, output_size,
+ stride, padding):
+ def Test(self):
+ tf.logging.info("Testing InceptionBackFilter %s",
+ (input_size, filter_size, output_size, stride, padding))
+ self._CompareBackFilter(input_size, filter_size, output_size,
+ stride, padding)
+ return Test
+
+
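+# Each factory above returns a test closure bound to one Inception layer's
+# shapes; the loop below attaches them to Conv2DTest via setattr so every
+# layer shows up as its own named test method.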
+if __name__ == "__main__":
+ for index, (input_size_, filter_size_, output_size_, stride_,
+ padding_) in enumerate(GetInceptionShapes()):
+ setattr(Conv2DTest, "testInceptionFwd_" + str(index),
+ GetInceptionFwdTest(input_size_, filter_size_, stride_, padding_))
+ setattr(Conv2DTest, "testInceptionBackInput_" + str(index),
+ GetInceptionBackInputTest(input_size_, filter_size_, output_size_,
+ stride_, padding_))
+ setattr(Conv2DTest, "testInceptionBackFilter_" + str(index),
+ GetInceptionBackFilterTest(input_size_, filter_size_, output_size_,
+ stride_, padding_))
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/cwise_ops_test.py b/tensorflow/python/kernel_tests/cwise_ops_test.py
new file mode 100644
index 0000000000..22491f231a
--- /dev/null
+++ b/tensorflow/python/kernel_tests/cwise_ops_test.py
@@ -0,0 +1,1187 @@
+"""Functional tests for coefficient-wise operations.
+"""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+_ADD = lambda x, y: x + y
+_SUB = lambda x, y: x - y
+_MUL = lambda x, y: x * y
+_DIV = lambda x, y: x / y
+_MOD = lambda x, y: x % y
+_NEG = lambda x: -x
+_ABS = abs
+
+_LT = lambda x, y: x < y
+_LE = lambda x, y: x <= y
+_GT = lambda x, y: x > y
+_GE = lambda x, y: x >= y
+
+_AND = lambda x, y: x & y
+_OR = lambda x, y: x | y
+_XOR = lambda x, y: x ^ y
+_INV = lambda x: ~x
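+# These lambdas exercise the Python operator overloads on tf.Tensor (for
+# example _ADD(inx, iny) resolves to Tensor.__add__) rather than calling the
+# tf.* functions directly, so both entry points get covered below.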
+
+
+class UnaryOpTest(tf.test.TestCase):
+
+ def _compareCpu(self, x, np_func, tf_func):
+ np_ans = np_func(x)
+ with self.test_session(use_gpu=False):
+ inx = tf.convert_to_tensor(x)
+ y = tf_func(inx)
+ tf_cpu = y.eval()
+ self.assertShapeEqual(np_ans, y)
+ self.assertAllClose(np_ans, tf_cpu)
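+ # The gradient checker returns the symbolic Jacobian (jacob_t) and a
+ # finite-difference estimate (jacob_n); float32 warrants looser tolerances
+ # than float64.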
+ if x.dtype == np.float32:
+ s = list(np.shape(x))
+ jacob_t, jacob_n = gc.ComputeGradient(inx, s, y, s, x_init_value=x)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ s = list(np.shape(x))
+ jacob_t, jacob_n = gc.ComputeGradient(inx, s, y, s, x_init_value=x)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _compareGpu(self, x, np_func, tf_func):
+ np_ans = np_func(x)
+ with self.test_session(use_gpu=True):
+ result = tf_func(tf.convert_to_tensor(x))
+ tf_gpu = result.eval()
+ self.assertShapeEqual(np_ans, result)
+ self.assertAllClose(np_ans, tf_gpu)
+ # TODO(zhifengc/ke): make gradient checker work on GPU.
+
+ def _compareBoth(self, x, np_func, tf_func):
+ self._compareCpu(x, np_func, tf_func)
+ self._compareGpu(x, np_func, tf_func)
+
+ def _inv(self, x):
+ return 1.0 / x
+
+ def _rsqrt(self, x):
+ return self._inv(np.sqrt(x))
+
+ def _sigmoid(self, x):
+ return 1.0 / (1.0 + np.exp(-x))
+
+ def testFloatBasic(self):
+ x = np.arange(-3, 3).reshape(1, 3, 2).astype(np.float32)
+ y = (x + .5).astype(np.float32) # no zero
+ z = (x + 15.5).astype(np.float32) # all positive
+ self._compareBoth(x, np.abs, tf.abs)
+ self._compareBoth(x, np.abs, _ABS)
+ self._compareBoth(x, np.negative, tf.neg)
+ self._compareBoth(x, np.negative, _NEG)
+ self._compareBoth(y, self._inv, tf.inv)
+ self._compareBoth(x, np.square, tf.square)
+ self._compareBoth(z, np.sqrt, tf.sqrt)
+ self._compareBoth(z, self._rsqrt, tf.rsqrt)
+ self._compareBoth(x, np.exp, tf.exp)
+ self._compareBoth(z, np.log, tf.log)
+ self._compareBoth(x, np.tanh, tf.tanh)
+ self._compareBoth(x, self._sigmoid, tf.sigmoid)
+ self._compareBoth(y, np.sign, tf.sign)
+ self._compareBoth(x, np.sin, tf.sin)
+ self._compareBoth(x, np.cos, tf.cos)
+
+ def testFloatTanhEdge(self):
+ x = np.arange(40, 40 + 6).reshape(6).astype(np.float32)
+ self._compareBoth(x, np.tanh, tf.tanh)
+ x = np.arange(-40, -40 + 6).reshape(6).astype(np.float32)
+ self._compareBoth(x, np.tanh, tf.tanh)
+
+ def testFloatEmpty(self):
+ x = np.empty((2, 0, 5), dtype=np.float32)
+ self._compareBoth(x, np.abs, tf.abs)
+ self._compareBoth(x, np.abs, _ABS)
+ self._compareBoth(x, np.negative, tf.neg)
+ self._compareBoth(x, np.negative, _NEG)
+ self._compareBoth(x, self._inv, tf.inv)
+ self._compareBoth(x, np.square, tf.square)
+ self._compareBoth(x, np.sqrt, tf.sqrt)
+ self._compareBoth(x, self._rsqrt, tf.rsqrt)
+ self._compareBoth(x, np.exp, tf.exp)
+ self._compareBoth(x, np.log, tf.log)
+ self._compareBoth(x, np.tanh, tf.tanh)
+ self._compareBoth(x, self._sigmoid, tf.sigmoid)
+ self._compareBoth(x, np.sign, tf.sign)
+ self._compareBoth(x, np.sin, tf.sin)
+ self._compareBoth(x, np.cos, tf.cos)
+
+ def testDoubleBasic(self):
+ x = np.arange(-3, 3).reshape(1, 3, 2).astype(np.float64)
+ y = (x + .5).astype(np.float64) # no zero
+ z = (x + 15.5).astype(np.float64) # all positive
+ self._compareBoth(x, np.abs, tf.abs)
+ self._compareBoth(x, np.abs, _ABS)
+ self._compareBoth(x, np.negative, tf.neg)
+ self._compareBoth(x, np.negative, _NEG)
+ self._compareBoth(y, self._inv, tf.inv)
+ self._compareBoth(x, np.square, tf.square)
+ self._compareBoth(z, np.sqrt, tf.sqrt)
+ self._compareBoth(z, self._rsqrt, tf.rsqrt)
+ self._compareBoth(x, np.exp, tf.exp)
+ self._compareBoth(z, np.log, tf.log)
+ self._compareBoth(x, np.tanh, tf.tanh)
+ self._compareBoth(x, self._sigmoid, tf.sigmoid)
+ self._compareBoth(y, np.sign, tf.sign)
+ self._compareBoth(x, np.sin, tf.sin)
+ self._compareBoth(x, np.cos, tf.cos)
+
+ def testInt32Basic(self):
+ x = np.arange(-6, 6, 2).reshape(1, 3, 2).astype(np.int32)
+ self._compareCpu(x, np.abs, tf.abs)
+ self._compareCpu(x, np.abs, _ABS)
+ self._compareCpu(x, np.negative, tf.neg)
+ self._compareCpu(x, np.negative, _NEG)
+ self._compareCpu(x, np.square, tf.square)
+ self._compareCpu(x, np.sign, tf.sign)
+
+ def testInt64Basic(self):
+ x = np.arange(
+ -6 << 40, 6 << 40, 2 << 40).reshape(1, 3, 2).astype(np.int64)
+ self._compareCpu(x, np.abs, tf.abs)
+ self._compareCpu(x, np.abs, _ABS)
+ self._compareCpu(x, np.negative, tf.neg)
+ self._compareCpu(x, np.negative, _NEG)
+ self._compareCpu(x, np.square, tf.square)
+ self._compareCpu(x, np.sign, tf.sign)
+
+ def testComplex64Basic(self):
+ x = np.complex(1, 1) * np.arange(-3, 3).reshape(1, 3, 2).astype(
+ np.complex64)
+ y = x + 0.5 # no zeros
+ self._compareCpu(x, np.abs, tf.abs)
+ self._compareCpu(x, np.abs, _ABS)
+ self._compareCpu(x, np.negative, tf.neg)
+ self._compareCpu(x, np.negative, _NEG)
+ self._compareCpu(y, self._inv, tf.inv)
+ self._compareCpu(x, np.square, tf.square)
+ self._compareCpu(x, np.sqrt, tf.sqrt)
+ self._compareCpu(y, self._rsqrt, tf.rsqrt)
+ self._compareCpu(x, np.exp, tf.exp)
+ self._compareCpu(y, np.log, tf.log)
+ self._compareCpu(x, np.tanh, tf.tanh)
+ self._compareCpu(x, self._sigmoid, tf.sigmoid)
+ self._compareCpu(x, np.sin, tf.sin)
+ self._compareCpu(x, np.cos, tf.cos)
+
+
+class BinaryOpTest(tf.test.TestCase):
+
+ def _compareCpu(self, x, y, np_func, tf_func):
+ np_ans = np_func(x, y)
+ with self.test_session(use_gpu=False):
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf_func(inx, iny)
+ tf_cpu = out.eval()
+ # Also check that the Tensor operator overloads take precedence over
+ # numpy's when one operand is a raw numpy array (left and right cases).
+ np_left = tf_func(x, iny).eval()
+ np_right = tf_func(inx, y).eval()
+
+ self.assertAllClose(np_ans, tf_cpu)
+ self.assertAllClose(np_ans, np_left)
+ self.assertAllClose(np_ans, np_right)
+ self.assertShapeEqual(np_ans, out)
+
+ def _compareGradientX(self, x, y, np_func, tf_func):
+ z = np_func(x, y)
+ zs = list(z.shape)
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf_func(inx, iny)
+ xs = list(x.shape)
+ jacob_t, jacob_n = gc.ComputeGradient(inx, xs, out, zs, x_init_value=x)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _compareGradientY(self, x, y, np_func, tf_func):
+ z = np_func(x, y)
+ zs = list(z.shape)
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf_func(inx, iny)
+ ys = list(np.shape(y))
+ jacob_t, jacob_n = gc.ComputeGradient(iny, ys, out, zs, x_init_value=y)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _compareGpu(self, x, y, np_func, tf_func):
+ np_ans = np_func(x, y)
+ with self.test_session(use_gpu=True):
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf_func(inx, iny)
+ tf_gpu = out.eval()
+ self.assertAllClose(np_ans, tf_gpu)
+ self.assertShapeEqual(np_ans, out)
+ # TODO(zhifengc/ke): make gradient checker work on GPU.
+
+ def _compareBoth(self, x, y, np_func, tf_func):
+ self._compareCpu(x, y, np_func, tf_func)
+ if x.dtype == np.float32 or x.dtype == np.float64:
+ self._compareGradientX(x, y, np_func, tf_func)
+ self._compareGradientY(x, y, np_func, tf_func)
+ self._compareGpu(x, y, np_func, tf_func)
+
+ def testFloatBasic(self):
+ x = np.linspace(-10, 10, 6).reshape(1, 3, 2).astype(np.float32)
+ y = np.linspace(20, -20, 6).reshape(1, 3, 2).astype(np.float32)
+ self._compareBoth(x, y, np.add, tf.add)
+ self._compareBoth(x, y, np.subtract, tf.sub)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ self._compareBoth(x, y + 0.1, np.divide, tf.div)
+ self._compareBoth(x, y, np.add, _ADD)
+ self._compareBoth(x, y, np.subtract, _SUB)
+ self._compareBoth(x, y, np.multiply, _MUL)
+ self._compareBoth(x, y + 0.1, np.divide, _DIV)
+
+ def testFloatDifferentShapes(self):
+ x = np.array([1, 2, 3, 4]).reshape(2, 2).astype(np.float32)
+ y = np.array([1, 2]).reshape(2, 1).astype(np.float32)
+ with self.test_session() as sess:
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ s = tf.reduce_sum(inx * iny)
+ gx, gy = sess.run(tf.gradients(s, [inx, iny]))
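+ # Worked example (for clarity): with x = [[1, 2], [3, 4]] and y = [[1], [2]]
+ # broadcasting along columns, s = 1 + 2 + 6 + 8 = 17, so the expected
+ # gradients below are [[1, 1], [2, 2]] and [[3], [7]].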
+ # gx is simply the broadcasted y
+ self.assertAllEqual(gx, np.array([1, 1, 2, 2])
+ .reshape(2, 2).astype(np.float32))
+ # gy is x's column summed up
+ self.assertAllEqual(gy, np.array([3, 7]).
+ reshape(2, 1).astype(np.float32))
+
+ def testDoubleBasic(self):
+ x = np.linspace(-10, 10, 6).reshape(1, 3, 2).astype(np.float64)
+ y = np.linspace(20, -20, 6).reshape(1, 3, 2).astype(np.float64)
+ self._compareBoth(x, y, np.add, tf.add)
+ self._compareBoth(x, y, np.subtract, tf.sub)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ self._compareBoth(x, y + 0.1, np.divide, tf.div)
+ self._compareBoth(x, y, np.add, _ADD)
+ self._compareBoth(x, y, np.subtract, _SUB)
+ self._compareBoth(x, y, np.multiply, _MUL)
+ self._compareBoth(x, y + 0.1, np.divide, _DIV)
+
+ def testInt8Basic(self):
+ x = np.arange(1, 13, 2).reshape(1, 3, 2).astype(np.int8)
+ y = np.arange(1, 7, 1).reshape(1, 3, 2).astype(np.int8)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ self._compareBoth(x, y, np.multiply, _MUL)
+
+ def testInt16Basic(self):
+ x = np.arange(1, 13, 2).reshape(1, 3, 2).astype(np.int16)
+ y = np.arange(1, 7, 1).reshape(1, 3, 2).astype(np.int16)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ self._compareBoth(x, y, np.multiply, _MUL)
+
+ def testInt32Basic(self):
+ x = np.arange(1, 13, 2).reshape(1, 3, 2).astype(np.int32)
+ y = np.arange(1, 7, 1).reshape(1, 3, 2).astype(np.int32)
+ self._compareBoth(x, y, np.add, tf.add)
+ self._compareBoth(x, y, np.subtract, tf.sub)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ # NOTE: int32 division is ill-defined.
+ self._compareBoth(x, y, np.divide, tf.div)
+ self._compareBoth(x, y, np.mod, tf.mod)
+ self._compareBoth(x, y, np.add, _ADD)
+ self._compareBoth(x, y, np.subtract, _SUB)
+ self._compareBoth(x, y, np.multiply, _MUL)
+ # NOTE: int32 division is ill-defined.
+ self._compareBoth(x, y, np.divide, _DIV)
+ self._compareBoth(x, y, np.mod, _MOD)
+
+ def testInt64Basic(self):
+ x = np.arange(1 << 40, 13 << 40, 2 << 40).reshape(1, 3, 2).astype(np.int64)
+ y = np.arange(1, 7, 1).reshape(1, 3, 2).astype(np.int64)
+ self._compareBoth(x, y, np.subtract, tf.sub)
+ self._compareBoth(x, y, np.multiply, tf.mul)
+ # NOTE: int64 division is ill-defined.
+ self._compareBoth(x, y, np.divide, tf.div)
+ self._compareBoth(x, y, np.mod, tf.mod)
+ self._compareBoth(x, y, np.subtract, _SUB)
+ self._compareBoth(x, y, np.multiply, _MUL)
+ # NOTE: int64 division is ill-defined.
+ self._compareBoth(x, y, np.divide, _DIV)
+ self._compareBoth(x, y, np.mod, _MOD)
+
+ def testComplex64Basic(self):
+ x = np.complex(1, 1) * np.linspace(-10, 10, 6).reshape(1, 3, 2).astype(
+ np.complex64)
+ y = np.complex(1, 1) * np.linspace(20, -20, 6).reshape(1, 3, 2).astype(
+ np.complex64)
+ self._compareCpu(x, y, np.add, tf.add)
+ self._compareCpu(x, y, np.subtract, tf.sub)
+ self._compareCpu(x, y, np.multiply, tf.mul)
+ self._compareCpu(x, y + 0.1, np.divide, tf.div)
+ self._compareCpu(x, y, np.add, _ADD)
+ self._compareCpu(x, y, np.subtract, _SUB)
+ self._compareCpu(x, y, np.multiply, _MUL)
+ self._compareCpu(x, y + 0.1, np.divide, _DIV)
+
+ def _compareBCast(self, xs, ys, dtype, np_func, tf_func):
+ x = (1 + np.linspace(0, 5, np.prod(xs))).astype(dtype).reshape(xs)
+ y = (1 + np.linspace(0, 5, np.prod(ys))).astype(dtype).reshape(ys)
+ self._compareCpu(x, y, np_func, tf_func)
+ if x.dtype == np.float32 or x.dtype == np.float64:
+ self._compareGradientX(x, y, np_func, tf_func)
+ self._compareGradientY(x, y, np_func, tf_func)
+ self._compareGpu(x, y, np_func, tf_func)
+
+ # TODO(josh11b,vrv): Refactor this to use parameterized tests.
+ def _testBCastByFunc(self, funcs, xs, ys):
+ dtypes = [
+ np.float32,
+ np.float64,
+ np.int32,
+ np.int64,
+ np.complex64
+ ]
+ for dtype in dtypes:
+ for (np_func, tf_func) in funcs:
+ self._compareBCast(xs, ys, dtype, np_func, tf_func)
+ self._compareBCast(ys, xs, dtype, np_func, tf_func)
+
+ def _testBCastA(self, xs, ys):
+ funcs = [
+ (np.add, tf.add),
+ (np.add, _ADD),
+ ]
+ self._testBCastByFunc(funcs, xs, ys)
+
+ def _testBCastB(self, xs, ys):
+ funcs = [
+ (np.subtract, tf.sub),
+ (np.subtract, _SUB),
+ (np.power, tf.pow),
+ ]
+ self._testBCastByFunc(funcs, xs, ys)
+
+ def _testBCastC(self, xs, ys):
+ funcs = [
+ (np.multiply, tf.mul),
+ (np.multiply, _MUL),
+ ]
+ self._testBCastByFunc(funcs, xs, ys)
+
+ def _testBCastD(self, xs, ys):
+ funcs = [
+ (np.divide, tf.div),
+ (np.divide, _DIV)
+ ]
+ self._testBCastByFunc(funcs, xs, ys)
+
+ def testBCast_0A(self):
+ self._testBCastA([1, 3, 2], [1])
+
+ def testBCast_0B(self):
+ self._testBCastB([1, 3, 2], [1])
+
+ def testBCast_0C(self):
+ self._testBCastC([1, 3, 2], [1])
+
+ def testBCast_0D(self):
+ self._testBCastD([1, 3, 2], [1])
+
+ def testBCast_1A(self):
+ self._testBCastA([1, 3, 2], [2])
+
+ def testBCast_1B(self):
+ self._testBCastB([1, 3, 2], [2])
+
+ def testBCast_1C(self):
+ self._testBCastC([1, 3, 2], [2])
+
+ def testBCast_1D(self):
+ self._testBCastD([1, 3, 2], [2])
+
+ def testBCast_2A(self):
+ self._testBCastA([1, 3, 2], [3, 2])
+
+ def testBCast_2B(self):
+ self._testBCastB([1, 3, 2], [3, 2])
+
+ def testBCast_2C(self):
+ self._testBCastC([1, 3, 2], [3, 2])
+
+ def testBCast_2D(self):
+ self._testBCastD([1, 3, 2], [3, 2])
+
+ def testBCast_3A(self):
+ self._testBCastA([1, 3, 2], [3, 1])
+
+ def testBCast_3B(self):
+ self._testBCastB([1, 3, 2], [3, 1])
+
+ def testBCast_3C(self):
+ self._testBCastC([1, 3, 2], [3, 1])
+
+ def testBCast_3D(self):
+ self._testBCastD([1, 3, 2], [3, 1])
+
+ def testBCast_4A(self):
+ self._testBCastA([1, 3, 2], [1, 3, 2])
+
+ def testBCast_4B(self):
+ self._testBCastB([1, 3, 2], [1, 3, 2])
+
+ def testBCast_4C(self):
+ self._testBCastC([1, 3, 2], [1, 3, 2])
+
+ def testBCast_4D(self):
+ self._testBCastD([1, 3, 2], [1, 3, 2])
+
+ def testBCast_5A(self):
+ self._testBCastA([1, 3, 2], [2, 3, 1])
+
+ def testBCast_5B(self):
+ self._testBCastB([1, 3, 2], [2, 3, 1])
+
+ def testBCast_5C(self):
+ self._testBCastC([1, 3, 2], [2, 3, 1])
+
+ def testBCast_5D(self):
+ self._testBCastD([1, 3, 2], [2, 3, 1])
+
+ def testBCast_6A(self):
+ self._testBCastA([1, 3, 2], [2, 1, 1])
+
+ def testBCast_6B(self):
+ self._testBCastB([1, 3, 2], [2, 1, 1])
+
+ def testBCast_6C(self):
+ self._testBCastC([1, 3, 2], [2, 1, 1])
+
+ def testBCast_6D(self):
+ self._testBCastD([1, 3, 2], [2, 1, 1])
+
+ def testBCast_7A(self):
+ self._testBCastA([1, 3, 2], [1, 3, 1])
+
+ def testBCast_7B(self):
+ self._testBCastB([1, 3, 2], [1, 3, 1])
+
+ def testBCast_7C(self):
+ self._testBCastC([1, 3, 2], [1, 3, 1])
+
+ def testBCast_7D(self):
+ self._testBCastD([1, 3, 2], [1, 3, 1])
+
+ def testBCast_8A(self):
+ self._testBCastA([2, 1, 5], [2, 3, 1])
+
+ def testBCast_8B(self):
+ self._testBCastB([2, 1, 5], [2, 3, 1])
+
+ def testBCast_8C(self):
+ self._testBCastC([2, 1, 5], [2, 3, 1])
+
+ def testBCast_8D(self):
+ self._testBCastD([2, 1, 5], [2, 3, 1])
+
+ def testBCast_9A(self):
+ self._testBCastA([2, 0, 5], [2, 0, 1])
+
+ def testBCast_9B(self):
+ self._testBCastB([2, 0, 5], [2, 0, 1])
+
+ def testBCast_9C(self):
+ self._testBCastC([2, 0, 5], [2, 0, 1])
+
+ def testBCast_9D(self):
+ self._testBCastD([2, 0, 5], [2, 0, 1])
+
+ def testBCast_10A(self):
+ self._testBCastA([2, 3, 0], [2, 3, 1])
+
+ def testBCast_10B(self):
+ self._testBCastB([2, 3, 0], [2, 3, 1])
+
+ def testBCast_10C(self):
+ self._testBCastC([2, 3, 0], [2, 3, 1])
+
+ def testBCast_10D(self):
+ self._testBCastD([2, 3, 0], [2, 3, 1])
+
+ def testBCast_11A(self):
+ self._testBCastA([1, 3, 2], [1, 3, 2])
+
+ def testBCast_11B(self):
+ self._testBCastB([1, 3, 2], [1, 3, 2])
+
+ def testBCast_11C(self):
+ self._testBCastC([1, 3, 2], [1, 3, 2])
+
+ def testBCast_11D(self):
+ self._testBCastD([1, 3, 2], [1, 3, 2])
+
+ def testBCast_12A(self):
+ self._testBCastA([1, 1, 1, 1, 3, 2], [1, 3, 2])
+
+ def testBCast_12B(self):
+ self._testBCastB([1, 1, 1, 1, 3, 2], [1, 3, 2])
+
+ def testBCast_12C(self):
+ self._testBCastC([1, 1, 1, 1, 3, 2], [1, 3, 2])
+
+ def testBCast_12D(self):
+ self._testBCastD([1, 1, 1, 1, 3, 2], [1, 3, 2])
+
+ def testBCast_13A(self):
+ self._testBCastA([1, 3, 2, 1, 1], [1])
+
+ def testBCast_13B(self):
+ self._testBCastB([1, 3, 2, 1, 1], [1])
+
+ def testBCast_13C(self):
+ self._testBCastC([1, 3, 2, 1, 1], [1])
+
+ def testBCast_13D(self):
+ self._testBCastD([1, 3, 2, 1, 1], [1])
+
+ def testBCast_14A(self):
+ self._testBCastA([2, 3, 1, 1, 5], [1])
+
+ def testBCast_14B(self):
+ self._testBCastB([2, 3, 1, 1, 5], [1])
+
+ def testBCast_14C(self):
+ self._testBCastC([2, 3, 1, 1, 5], [1])
+
+ def testBCast_14D(self):
+ self._testBCastD([2, 3, 1, 1, 5], [1])
+
+ def testBCast_15A(self):
+ self._testBCastA([10, 3, 1, 2], [3, 1, 2])
+
+ def testBCast_15B(self):
+ self._testBCastB([10, 3, 1, 2], [3, 1, 2])
+
+ def testBCast_15C(self):
+ self._testBCastC([10, 3, 1, 2], [3, 1, 2])
+
+ def testBCast_15D(self):
+ self._testBCastD([10, 3, 1, 2], [3, 1, 2])
+
+ def testMismatchedDimensions(self):
+ for func in [tf.add, tf.sub, tf.mul, tf.div,
+ _ADD, _SUB, _MUL, _DIV]:
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Incompatible shapes" in e.message):
+ func(tf.convert_to_tensor([10.0, 20.0, 30.0]),
+ tf.convert_to_tensor([[40.0, 50.0], [60.0, 70.0]]))
+
+
+class ComparisonOpTest(tf.test.TestCase):
+
+ def _compare(self, func, x, y, dtype):
+ with self.test_session(use_gpu=False):
+ out = func(tf.convert_to_tensor(np.array([x]).astype(dtype)),
+ tf.convert_to_tensor(np.array([y]).astype(dtype)))
+ ret = out.eval()
+ return ret[0]
+
+ def testScalarCompareScalar(self):
+ dtypes = [np.float32, np.float64, np.int32, np.int64]
+ data = [-1, 0, 1]
+ for t in dtypes:
+ for x in data:
+ for y in data:
+ self.assertEqual(self._compare(tf.less, x, y, t),
+ x < y)
+ self.assertEqual(self._compare(tf.less_equal, x, y, t),
+ x <= y)
+ self.assertEqual(self._compare(tf.greater, x, y, t),
+ x > y)
+ self.assertEqual(self._compare(tf.greater_equal, x, y, t),
+ x >= y)
+ self.assertEqual(self._compare(tf.equal, x, y, t),
+ x == y)
+ self.assertEqual(self._compare(tf.not_equal, x, y, t),
+ x != y)
+
+ def _compareCpu(self, x, y, np_func, tf_func):
+ np_ans = np_func(x, y)
+ with self.test_session(use_gpu=False):
+ out = tf_func(tf.convert_to_tensor(x), tf.convert_to_tensor(y))
+ tf_cpu = out.eval()
+ self.assertAllEqual(np_ans, tf_cpu)
+
+ def _compareGpu(self, x, y, np_func, tf_func):
+ np_ans = np_func(x, y)
+ with self.test_session(use_gpu=True):
+ out = tf_func(tf.convert_to_tensor(x), tf.convert_to_tensor(y))
+ tf_gpu = out.eval()
+ self.assertAllEqual(np_ans, tf_gpu)
+
+ def _compareBoth(self, x, y, np_func, tf_func):
+ self._compareCpu(x, y, np_func, tf_func)
+ if x.dtype == np.float32 or x.dtype == np.float64:
+ self._compareGpu(x, y, np_func, tf_func)
+
+ def testTensorCompareTensor(self):
+ x = np.linspace(-15, 15, 6).reshape(1, 3, 2)
+ y = np.linspace(20, -10, 6).reshape(1, 3, 2)
+ for t in [np.float32, np.float64, np.int32, np.int64]:
+ xt = x.astype(t)
+ yt = y.astype(t)
+ self._compareBoth(xt, yt, np.less, tf.less)
+ self._compareBoth(xt, yt, np.less_equal, tf.less_equal)
+ self._compareBoth(xt, yt, np.greater, tf.greater)
+ self._compareBoth(xt, yt, np.greater_equal, tf.greater_equal)
+ self._compareBoth(xt, yt, np.equal, tf.equal)
+ self._compareBoth(xt, yt, np.not_equal, tf.not_equal)
+ # TODO(zhifengc): complex64 doesn't work on GPU yet.
+ self._compareCpu(x.astype(np.complex64), y.astype(np.complex64),
+ np.equal, tf.equal)
+ self._compareCpu(x.astype(np.complex64), y.astype(np.complex64),
+ np.not_equal, tf.not_equal)
+
+ def _compareBCast(self, xs, ys, dtype, np_func, tf_func):
+ x = np.linspace(-15, 15, np.prod(xs)).astype(dtype).reshape(xs)
+ y = np.linspace(20, -10, np.prod(ys)).astype(dtype).reshape(ys)
+ self._compareCpu(x, y, np_func, tf_func)
+ self._compareCpu(y, x, np_func, tf_func)
+ if x.dtype == np.float32 or x.dtype == np.float64:
+ self._compareGpu(x, y, np_func, tf_func)
+ self._compareGpu(y, x, np_func, tf_func)
+
+ def _testBCastByFunc(self, np_func, tf_func):
+ shapes = [
+ ([1, 3, 2], [1]),
+ ([1, 3, 2], [2]),
+ ([1, 3, 2], [3, 2]),
+ ([1, 3, 2], [3, 1]),
+ ([1, 3, 2], [1, 3, 2]),
+ ([1, 3, 2], [2, 3, 1]),
+ ([1, 3, 2], [2, 1, 1]),
+ ([1, 3, 2], [1, 3, 1]),
+ ([2, 1, 5], [2, 3, 1]),
+ ([2, 0, 5], [2, 0, 1]),
+ ([2, 3, 0], [2, 3, 1]),
+ ]
+ dtypes = [
+ np.float32,
+ np.float64,
+ np.int32,
+ np.int64,
+ ]
+ for (xs, ys) in shapes:
+ for dtype in dtypes:
+ self._compareBCast(xs, ys, dtype, np_func, tf_func)
+
+ def testBCastLess(self):
+ self._testBCastByFunc(np.less, tf.less)
+
+ def testBCastLessEqual(self):
+ self._testBCastByFunc(np.less_equal, tf.less_equal)
+
+ def testBCastGreater(self):
+ self._testBCastByFunc(np.greater, tf.greater)
+
+ def testBCastGreaterEqual(self):
+ self._testBCastByFunc(np.greater_equal, tf.greater_equal)
+
+ def testBCastEqual(self):
+ self._testBCastByFunc(np.equal, tf.equal)
+
+ def testBCastNotEqual(self):
+ self._testBCastByFunc(np.not_equal, tf.not_equal)
+
+ def testShapeMismatch(self):
+ dtypes = [np.float32, np.float64, np.int32, np.int64]
+ funcs = [tf.less, tf.less_equal, tf.greater,
+ tf.greater_equal, tf.equal, tf.not_equal]
+ x = np.arange(0, 10).reshape([2, 5])
+ y = np.arange(0, 10).reshape([5, 2])
+ for t in dtypes:
+ for f in funcs:
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Incompatible shapes" in e.message):
+ f(x.astype(t), y.astype(t))
+
+
+class LogicalOpTest(tf.test.TestCase):
+
+ def _compareBinary(self, x, y, np_func, tf_func, use_gpu=False):
+ np_ans = np_func(x, y)
+ with self.test_session(use_gpu=use_gpu):
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf_func(inx, iny)
+ tf_val = out.eval()
+ self.assertEqual(out.dtype, tf.bool)
+ self.assertAllEqual(np_ans, tf_val)
+ self.assertShapeEqual(np_ans, out)
+
+ def _not(self, x, use_gpu=False):
+ np_ans = np.logical_not(x)
+ with self.test_session(use_gpu=use_gpu):
+ out = tf.logical_not(tf.convert_to_tensor(x))
+ tf_val = out.eval()
+ self.assertEqual(out.dtype, tf.bool)
+ self.assertAllEqual(np_ans, tf_val)
+ self.assertShapeEqual(np_ans, out)
+
+ def testScalar(self):
+ data = [np.array([True]), np.array([False])]
+ for use_gpu in [True, False]:
+ for x in data:
+ self._not(x, use_gpu)
+ for x in data:
+ for y in data:
+ self._compareBinary(
+ x, y, np.logical_and, tf.logical_and, use_gpu)
+ self._compareBinary(
+ x, y, np.logical_or, tf.logical_or, use_gpu)
+ self._compareBinary(
+ x, y, np.logical_xor, tf.logical_xor, use_gpu)
+
+ def testTensor(self):
+ x = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ y = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ for use_gpu in [True, False]:
+ self._not(x, use_gpu)
+ self._compareBinary(x, y, np.logical_and, tf.logical_and, use_gpu)
+ self._compareBinary(x, y, np.logical_or, tf.logical_or, use_gpu)
+ self._compareBinary(x, y, np.logical_xor, tf.logical_xor, use_gpu)
+
+ def testBCast(self):
+ shapes = [
+ ([1, 3, 2], [1]),
+ ([1, 3, 2], [2]),
+ ([1, 3, 2], [3, 2]),
+ ([1, 3, 2], [3, 1]),
+ ([1, 3, 2], [1, 3, 2]),
+ ([1, 3, 2], [2, 3, 1]),
+ ([1, 3, 2], [2, 1, 1]),
+ ([1, 3, 2], [1, 3, 1]),
+ ([2, 1, 5], [2, 3, 1]),
+ ([2, 0, 5], [2, 0, 1]),
+ ([2, 3, 0], [2, 3, 1]),
+ ]
+ for (xs, ys) in shapes:
+ x = np.random.randint(0, 2, np.prod(xs)).astype(np.bool).reshape(xs)
+ y = np.random.randint(0, 2, np.prod(ys)).astype(np.bool).reshape(ys)
+ for use_gpu in [True, False]:
+ self._compareBinary(x, y, np.logical_and, tf.logical_and, use_gpu)
+ self._compareBinary(x, y, np.logical_or, tf.logical_or, use_gpu)
+ self._compareBinary(x, y, np.logical_xor, tf.logical_xor, use_gpu)
+
+ def testShapeMismatch(self):
+ x = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ y = np.random.randint(0, 2, 6).astype(np.bool).reshape(3, 2, 1)
+ for f in [tf.logical_and, tf.logical_or, tf.logical_xor]:
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Incompatible shapes" in e.message):
+ f(x, y)
+
+
+class SelectOpTest(tf.test.TestCase):
+
+ def _compare(self, c, x, y, use_gpu):
+ np_ans = np.where(c, x, y)
+ with self.test_session(use_gpu=use_gpu):
+ out = tf.select(c, x, y)
+ tf_ans = out.eval()
+ self.assertAllEqual(np_ans, tf_ans)
+ self.assertShapeEqual(np_ans, out)
+
+ def _compareGradientX(self, c, x, y):
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf.select(c, inx, iny)
+ s = list(np.shape(c))
+ jacob_t, jacob_n = gc.ComputeGradient(inx, s, out, s, x_init_value=x)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _compareGradientY(self, c, x, y):
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = tf.select(c, inx, iny)
+ s = list(np.shape(c))
+ jacob_t, jacob_n = gc.ComputeGradient(iny, s, out, s, x_init_value=y)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def testBasic(self):
+ c = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ x = np.random.rand(1, 3, 2) * 100
+ y = np.random.rand(1, 3, 2) * 100
+ for t in [np.float32, np.float64, np.int32, np.int64, np.complex64]:
+ xt = x.astype(t)
+ yt = y.astype(t)
+ self._compare(c, xt, yt, use_gpu=False)
+ if t in [np.float32, np.float64]:
+ self._compare(c, xt, yt, use_gpu=True)
+
+ def testGradients(self):
+ c = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ x = np.random.rand(1, 3, 2) * 100
+ y = np.random.rand(1, 3, 2) * 100
+ for t in [np.float32, np.float64]:
+ xt = x.astype(t)
+ yt = y.astype(t)
+ self._compareGradientX(c, xt, yt)
+ self._compareGradientY(c, xt, yt)
+
+ def testShapeMismatch(self):
+ c = np.random.randint(0, 2, 6).astype(np.bool).reshape(1, 3, 2)
+ x = np.random.rand(1, 3, 2) * 100
+ y = np.random.rand(2, 5, 3) * 100
+ for t in [np.float32, np.float64, np.int32, np.int64, np.complex64]:
+ xt = x.astype(t)
+ yt = y.astype(t)
+ with self.assertRaises(ValueError):
+ tf.select(c, xt, yt)
+
+
+class MinMaxOpTest(tf.test.TestCase):
+
+ def _compare(self, x, y, use_gpu):
+ np_min, np_max = np.minimum(x, y), np.maximum(x, y)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ omin, omax = tf.minimum(inx, iny), tf.maximum(inx, iny)
+ tf_min, tf_max = sess.run([omin, omax])
+ self.assertAllEqual(np_min, tf_min)
+ self.assertAllEqual(np_max, tf_max)
+
+ def testBasic(self):
+ x = np.random.rand(1, 3, 2) * 100.
+ y = np.random.rand(1, 3, 2) * 100.
+ for t in [np.float32, np.float64, np.int32, np.int64]:
+ self._compare(x.astype(t), y.astype(t), use_gpu=False)
+ self._compare(x.astype(t), y.astype(t), use_gpu=True)
+
+ def testDifferentShapes(self):
+ x = np.random.rand(1, 3, 2) * 100.
+ y = np.random.rand(2) * 100. # should broadcast
+ for t in [np.float32, np.float64, np.int32, np.int64]:
+ self._compare(x.astype(t), y.astype(t), use_gpu=False)
+ self._compare(x.astype(t), y.astype(t), use_gpu=True)
+
+ def testScalar(self):
+ x = np.random.rand(1, 3, 2) * 100.
+ y = np.asscalar(np.random.rand(1) * 100.) # should broadcast
+ # dropped np.float64, int64 because TF automatically converts to 32 bit
+ for t in [np.float32, np.int32]:
+ self._compare(x.astype(t), t(y), use_gpu=False)
+ self._compare(x.astype(t), t(y), use_gpu=True)
+
+ def _compareGradientX(self, func, x, y):
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = func(inx, iny)
+ s = list(np.shape(x))
+ jacob_t, jacob_n = gc.ComputeGradient(inx, s, out, s, x_init_value=x)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _compareGradientY(self, func, x, y):
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ iny = tf.convert_to_tensor(y)
+ out = func(inx, iny)
+ s = list(np.shape(x))
+ jacob_t, jacob_n = gc.ComputeGradient(iny, s, out, s, x_init_value=y)
+ if x.dtype == np.float32:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def testGradients(self):
+ x = np.random.rand(1, 3, 2) * 100.
+ # ensure x != y
+ y = x + (np.random.randint(2, size=x.shape) - .5) * 2 # -1 or +1
+ self._compareGradientX(tf.maximum, x, y)
+ self._compareGradientY(tf.maximum, x, y)
+ self._compareGradientX(tf.minimum, x, y)
+ self._compareGradientY(tf.minimum, x, y)
+
+
+class MathOpsOverloadTest(tf.test.TestCase):
+
+ def _computeTensorAndLiteral(self, x, y, dtype, func):
+ with self.test_session(use_gpu=False):
+ inx = tf.convert_to_tensor(x, dtype=dtype)
+ z = func(inx, y) # Should use __add__, __sub__, etc.
+ return z.eval()
+
+ def _computeLiteralAndTensor(self, x, y, dtype, func):
+ with self.test_session(use_gpu=False):
+ iny = tf.convert_to_tensor(y, dtype=dtype)
+ z = func(x, iny) # Should use __radd__, __rsub__, etc.
+ return z.eval()
+
+ def _compareBinary(self, x, y, dtype, np_func, tf_func):
+ np_ans = np_func(x, y)
+ self.assertAllClose(np_ans, self._computeTensorAndLiteral(
+ x, y, dtype, tf_func))
+ self.assertAllClose(np_ans, self._computeLiteralAndTensor(
+ x, y, dtype, tf_func))
+
+ def _compareUnary(self, x, dtype, np_func, tf_func):
+ np_ans = np_func(x)
+ with self.test_session(use_gpu=False):
+ self.assertAllClose(np_ans, tf_func(tf.convert_to_tensor(x, dtype=dtype)).eval())
+
+ def testOverload(self):
+ dtypes = [
+ tf.float32,
+ tf.float64,
+ tf.int32,
+ tf.int64,
+ tf.complex64,
+ ]
+ funcs = [
+ (np.add, _ADD),
+ (np.subtract, _SUB),
+ (np.multiply, _MUL),
+ (np.divide, _DIV)
+ ]
+ for dtype in dtypes:
+ for np_func, tf_func in funcs:
+ self._compareBinary(10, 5, dtype, np_func, tf_func)
+ # Mod only works for int32 and int64.
+ for dtype in [tf.int32, tf.int64]:
+ self._compareBinary(10, 3, dtype, np.mod, _MOD)
+
+ def testOverloadComparisons(self):
+ dtypes = [
+ tf.float32,
+ tf.float64,
+ tf.int32,
+ tf.int64,
+ ]
+ funcs = [
+ (np.less, _LT),
+ (np.less_equal, _LE),
+ (np.greater, _GT),
+ (np.greater_equal, _GE),
+ ]
+ for dtype in dtypes:
+ for np_func, tf_func in funcs:
+ self._compareBinary(10, 5, dtype, np_func, tf_func)
+ logical_funcs = [
+ (np.logical_and, _AND),
+ (np.logical_or, _OR),
+ (np.logical_xor, _XOR),
+ ]
+ for np_func, tf_func in logical_funcs:
+ self._compareBinary(True, False, tf.bool, np_func, tf_func)
+ self._compareBinary(True, True, tf.bool, np_func, tf_func)
+ self._compareBinary(False, False, tf.bool, np_func, tf_func)
+ self._compareBinary(False, True, tf.bool, np_func, tf_func)
+ self._compareBinary([True, True, False, False],
+ [True, False, True, False],
+ tf.bool, np_func, tf_func)
+ self._compareUnary(True, tf.bool, np.logical_not, _INV)
+ self._compareUnary(False, tf.bool, np.logical_not, _INV)
+ self._compareUnary([True, False], tf.bool, np.logical_not, _INV)
+
+
+class IsFiniteInfNanTest(tf.test.TestCase):
+
+ def _compare(self, x, use_gpu):
+ np_finite, np_inf, np_nan = np.isfinite(x), np.isinf(x), np.isnan(x)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inx = tf.convert_to_tensor(x)
+ ofinite, oinf, onan = tf.is_finite(inx), tf.is_inf(
+ inx), tf.is_nan(inx)
+ tf_finite, tf_inf, tf_nan = sess.run([ofinite, oinf, onan])
+ self.assertAllEqual(np_inf, tf_inf)
+ self.assertAllEqual(np_nan, tf_nan)
+ self.assertAllEqual(np_finite, tf_finite)
+ self.assertShapeEqual(np_inf, oinf)
+ self.assertShapeEqual(np_nan, onan)
+ self.assertShapeEqual(np_finite, ofinite)
+
+ def _testDtype(self, dtype):
+ fi = np.finfo(dtype)
+ data = np.array([0, -1, 1, fi.resolution, -fi.resolution, fi.min, fi.max,
+ -np.inf, np.inf, np.nan]).astype(dtype)
+ self._compare(data, use_gpu=False)
+ self._compare(data, use_gpu=True)
+
+ def testFloat(self):
+ self._testDtype(np.float32)
+
+ def testDouble(self):
+ self._testDtype(np.float64)
+
+
+class RoundingTest(tf.test.TestCase):
+
+ def _compare(self, x, use_gpu):
+ np_floor, np_ceil = np.floor(x), np.ceil(x)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inx = tf.convert_to_tensor(x)
+ ofloor, oceil = tf.floor(inx), tf.ceil(inx)
+ tf_floor, tf_ceil = sess.run([ofloor, oceil])
+ self.assertAllEqual(np_floor, tf_floor)
+ self.assertAllEqual(np_ceil, tf_ceil)
+ self.assertShapeEqual(np_floor, ofloor)
+ self.assertShapeEqual(np_ceil, oceil)
+
+ def _testDtype(self, dtype):
+ data = (np.arange(-3, 3) / 4.).reshape([1, 3, 2]).astype(dtype)
+    self._compare(data, use_gpu=False)
+    self._compare(data, use_gpu=True)
+
+ def testTypes(self):
+ for dtype in [np.float32, np.float64]:
+ self._testDtype(dtype)
+
+
+class ComplexMakeRealImagTest(tf.test.TestCase):
+
+ def _compareMake(self, real, imag, use_gpu):
+ np_ans = real + (1j) * imag
+ with self.test_session(use_gpu=use_gpu):
+ real = tf.convert_to_tensor(real)
+ imag = tf.convert_to_tensor(imag)
+ tf_ans = tf.complex(real, imag)
+ out = tf_ans.eval()
+ self.assertAllEqual(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def testMake(self):
+ real = (np.arange(-3, 3) / 4.).reshape([1, 3, 2]).astype(np.float32)
+ imag = (np.arange(-3, 3) / 5.).reshape([1, 3, 2]).astype(np.float32)
+ for use_gpu in [False, True]:
+ self._compareMake(real, imag, use_gpu)
+ self._compareMake(real, 12.0, use_gpu)
+ self._compareMake(23.0, imag, use_gpu)
+
+ def _compareRealImag(self, cplx, use_gpu):
+ np_real, np_imag = np.real(cplx), np.imag(cplx)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inx = tf.convert_to_tensor(cplx)
+ tf_real = tf.real(inx)
+ tf_imag = tf.imag(inx)
+ tf_real_val, tf_imag_val = sess.run([tf_real, tf_imag])
+ self.assertAllEqual(np_real, tf_real_val)
+ self.assertAllEqual(np_imag, tf_imag_val)
+ self.assertShapeEqual(np_real, tf_real)
+ self.assertShapeEqual(np_imag, tf_imag)
+
+ def testRealImag(self):
+ real = (np.arange(-3, 3) / 4.).reshape([1, 3, 2]).astype(np.float32)
+ imag = (np.arange(-3, 3) / 5.).reshape([1, 3, 2]).astype(np.float32)
+ cplx = real + (1j) * imag
+ self._compareRealImag(cplx, use_gpu=False)
+ self._compareRealImag(cplx, use_gpu=True)
+
+ def _compareConj(self, cplx, use_gpu):
+ np_ans = np.conj(cplx)
+ with self.test_session(use_gpu=use_gpu):
+ inx = tf.convert_to_tensor(cplx)
+ tf_conj = tf.conj(inx)
+ tf_ans = tf_conj.eval()
+ self.assertAllEqual(np_ans, tf_ans)
+ self.assertShapeEqual(np_ans, tf_conj)
+
+ def testConj(self):
+ real = (np.arange(-3, 3) / 4.).reshape([1, 3, 2]).astype(np.float32)
+ imag = (np.arange(-3, 3) / 5.).reshape([1, 3, 2]).astype(np.float32)
+ cplx = real + (1j) * imag
+ self._compareConj(cplx, use_gpu=False)
+ self._compareConj(cplx, use_gpu=True)
+
+ def _compareGradient(self, x):
+    # x[:, 0] is real, x[:, 1] is imag. We combine real and imag into
+    # complex numbers, then extract the real and imag parts and
+    # compute the squared sum. This is obviously the same as
+    # sum(real * real) + sum(imag * imag); we just want to make sure the
+    # gradient function is exercised.
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ real, imag = tf.split(1, 2, inx)
+ real, imag = tf.reshape(real, [-1]), tf.reshape(imag, [-1])
+ cplx = tf.complex(real, imag)
+ cplx = tf.conj(cplx)
+ loss = tf.reduce_sum(
+ tf.square(tf.real(cplx))) + tf.reduce_sum(
+ tf.square(tf.imag(cplx)))
+ epsilon = 1e-3
+ jacob_t, jacob_n = gc.ComputeGradient(inx, list(x.shape), loss, [1],
+ x_init_value=x, delta=epsilon)
+ self.assertAllClose(jacob_t, jacob_n, rtol=epsilon, atol=epsilon)
+
+ def testGradient(self):
+ data = np.arange(1, 2, 0.10).reshape([5, 2]).astype(np.float32)
+ self._compareGradient(data)
+
+ def _compareMulGradient(self, data):
+ # data is a float matrix of shape [n, 4]. data[:, 0], data[:, 1],
+ # data[:, 2], data[:, 3] are real parts of x, imaginary parts of
+ # x, real parts of y and imaginary parts of y.
+ with self.test_session():
+ inp = tf.convert_to_tensor(data)
+ xr, xi, yr, yi = tf.split(1, 4, inp)
+
+ def vec(x): # Reshape to a vector
+ return tf.reshape(x, [-1])
+ xr, xi, yr, yi = vec(xr), vec(xi), vec(yr), vec(yi)
+
+ def cplx(r, i): # Combine to a complex vector
+ return tf.complex(r, i)
+ x, y = cplx(xr, xi), cplx(yr, yi)
+ # z is x times y in complex plane.
+ z = x * y
+ # Defines the loss function as the sum of all coefficients of z.
+ loss = tf.reduce_sum(tf.real(z) + tf.imag(z))
+ epsilon = 0.005
+ jacob_t, jacob_n = gc.ComputeGradient(inp, list(data.shape), loss, [1],
+ x_init_value=data, delta=epsilon)
+ self.assertAllClose(jacob_t, jacob_n, rtol=epsilon, atol=epsilon)
+
+ def testMulGradient(self):
+ data = np.arange(1, 2, 0.125).reshape([2, 4]).astype(np.float32)
+ self._compareMulGradient(data)
+
+
+class AccumulateTest(tf.test.TestCase):
+
+ def testSimple(self):
+ with self.test_session():
+ random_arrays = [np.random.rand(16, 16, 16, 16).astype(np.float32)
+ for _ in range(20)]
+ random_tensors = [tf.convert_to_tensor(x, dtype=tf.float32)
+ for x in random_arrays]
+ tf_val = tf.accumulate_n(random_tensors)
+ np_val = random_arrays[0]
+ for random_array in random_arrays[1:]:
+ np_val += random_array
+ self.assertAllClose(np_val, tf_val.eval())
+
+ def testZeroArgs(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf_val = tf.accumulate_n([])
+ tf_val.eval()
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/decode_csv_op_test.py b/tensorflow/python/kernel_tests/decode_csv_op_test.py
new file mode 100644
index 0000000000..ae0917f8c4
--- /dev/null
+++ b/tensorflow/python/kernel_tests/decode_csv_op_test.py
@@ -0,0 +1,148 @@
+"""Tests for DecodeCSV op from parsing_ops."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class DecodeCSVOpTest(tf.test.TestCase):
+
+ def _test(self, args, expected_out=None, expected_err_re=None):
+ with self.test_session() as sess:
+ decode = tf.decode_csv(**args)
+
+ if expected_err_re is None:
+ out = sess.run(decode)
+
+ for i, field in enumerate(out):
+ if field.dtype == np.float32:
+ self.assertAllClose(field, expected_out[i])
+ else:
+ self.assertAllEqual(field, expected_out[i])
+
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ sess.run(decode)
+
+ def testSimple(self):
+ args = {"records": ["1", "2", '"3"'], "record_defaults": [[1]],}
+
+ expected_out = [[1, 2, 3]]
+
+ self._test(args, expected_out)
+
+ def testScalar(self):
+ args = {"records": '1,""', "record_defaults": [[3], [4]]}
+
+ expected_out = [1, 4]
+
+ self._test(args, expected_out)
+
+ def test2D(self):
+ args = {"records": [["1", "2"], ['""', "4"]], "record_defaults": [[5]]}
+ expected_out = [[[1, 2], [5, 4]]]
+
+ self._test(args, expected_out)
+
+ def testInt64(self):
+ args = {
+ "records": ["1", "2", '"2147483648"'],
+ "record_defaults": [np.array([],
+ dtype=np.int64)],
+ }
+
+ expected_out = [[1, 2, 2147483648]]
+
+ self._test(args, expected_out)
+
+ def testComplexString(self):
+ args = {
+ "records": ['"1.0"', '"ab , c"', '"a\nbc"', '"ab""c"', " abc "],
+ "record_defaults": [["1"]]
+ }
+
+ expected_out = [["1.0", "ab , c", "a\nbc", 'ab"c', " abc "]]
+
+ self._test(args, expected_out)
+
+ def testMultiRecords(self):
+ args = {
+ "records": ["1.0,4,aa", "0.2,5,bb", "3,6,cc"],
+ "record_defaults": [[1.0], [1], ["aa"]]
+ }
+
+ expected_out = [[1.0, 0.2, 3], [4, 5, 6], ["aa", "bb", "cc"]]
+
+ self._test(args, expected_out)
+
+ def testWithDefaults(self):
+ args = {
+ "records": [",1,", "0.2,3,bcd", "3.0,,"],
+ "record_defaults": [[1.0], [0], ["a"]]
+ }
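+    # Empty fields fall back to the corresponding record_defaults entry.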
+
+ expected_out = [[1.0, 0.2, 3.0], [1, 3, 0], ["a", "bcd", "a"]]
+
+ self._test(args, expected_out)
+
+ def testWithTabDelim(self):
+ args = {
+ "records": ["1\t1", "0.2\t3", "3.0\t"],
+ "record_defaults": [[1.0], [0]],
+ "field_delim": "\t"
+ }
+
+ expected_out = [[1.0, 0.2, 3.0], [1, 3, 0]]
+
+ self._test(args, expected_out)
+
+ def testWithoutDefaultsError(self):
+ args = {
+ "records": [",1", "0.2,3", "3.0,"],
+ "record_defaults": [[1.0], np.array([],
+ dtype=np.int32)]
+ }
+
+ self._test(args,
+ expected_err_re="Field 1 is required but missing in record 2!")
+
+ def testWrongFieldIntError(self):
+ args = {
+ "records": [",1", "0.2,234a", "3.0,2"],
+ "record_defaults": [[1.0], np.array([],
+ dtype=np.int32)]
+ }
+
+ self._test(args,
+ expected_err_re="Field 1 in record 1 is not a valid int32: 234a")
+
+ def testOutOfRangeError(self):
+ args = {
+ "records": ["1", "9999999999999999999999999", "3"],
+ "record_defaults": [[1]]
+ }
+
+ self._test(args,
+ expected_err_re="Field 0 in record 1 is not a valid int32: ")
+
+ def testWrongFieldFloatError(self):
+ args = {
+ "records": [",1", "0.2,2", "3.0adf,3"],
+ "record_defaults": [[1.0], np.array([],
+ dtype=np.int32)]
+ }
+
+ self._test(args,
+ expected_err_re="Field 0 in record 2 is not a valid float: ")
+
+ def testWrongFieldStringError(self):
+ args = {"records": ['"1,a,"', "0.22", 'a"bc'], "record_defaults": [["a"]]}
+
+ self._test(
+ args,
+ expected_err_re="Unquoted fields cannot have quotes/CRLFs inside")
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/decode_raw_op_test.py b/tensorflow/python/kernel_tests/decode_raw_op_test.py
new file mode 100644
index 0000000000..abd50a7527
--- /dev/null
+++ b/tensorflow/python/kernel_tests/decode_raw_op_test.py
@@ -0,0 +1,44 @@
+"""Tests for DecodeRaw op from parsing_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class DecodeRawOpTest(tf.test.TestCase):
+
+ def testToUint8(self):
+ with self.test_session():
+ in_bytes = tf.placeholder(tf.string, shape=[2])
+ decode = tf.decode_raw(in_bytes, out_type=tf.uint8)
+ self.assertEqual([2, None], decode.get_shape().as_list())
+
+ result = decode.eval(feed_dict={in_bytes: ["A", "a"]})
+ self.assertAllEqual([[ord("A")], [ord("a")]], result)
+
+ result = decode.eval(feed_dict={in_bytes: ["wer", "XYZ"]})
+ self.assertAllEqual([[ord("w"), ord("e"), ord("r")],
+ [ord("X"), ord("Y"), ord("Z")]], result)
+
+ with self.assertRaisesOpError(
+ "DecodeRaw requires input strings to all be the same size, but "
+ "element 1 has size 5 != 6"):
+ decode.eval(feed_dict={in_bytes: ["short", "longer"]})
+
+ def testToInt16(self):
+ with self.test_session():
+ in_bytes = tf.placeholder(tf.string, shape=[None])
+ decode = tf.decode_raw(in_bytes, out_type=tf.int16)
+ self.assertEqual([None, None], decode.get_shape().as_list())
+
+ result = decode.eval(feed_dict={in_bytes: ["AaBC"]})
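+      # Each pair of bytes is interpreted as a little-endian int16, so "Aa"
+      # decodes to ord("A") + ord("a") * 256 in the expected values below.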
+ self.assertAllEqual([[ord("A") + ord("a") * 256,
+ ord("B") + ord("C") * 256]], result)
+
+ with self.assertRaisesOpError(
+ "Input to DecodeRaw has length 3 that is not a multiple of 2, the "
+ "size of int16"):
+ decode.eval(feed_dict={in_bytes: ["123", "456"]})
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/dense_update_ops_no_tsan_test.py b/tensorflow/python/kernel_tests/dense_update_ops_no_tsan_test.py
new file mode 100644
index 0000000000..ad0724931e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/dense_update_ops_no_tsan_test.py
@@ -0,0 +1,60 @@
+"""Tests for state updating ops that may have benign race conditions."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class AssignOpTest(tf.test.TestCase):
+
+  # NOTE(mrry): We exclude these tests from the TSAN TAP target, because they
+ # contain benign and deliberate data races when multiple threads update
+ # the same parameters without a lock.
+ def testParallelUpdateWithoutLocking(self):
+ with self.test_session() as sess:
+ ones_t = tf.fill([1024, 1024], 1.0)
+ p = tf.Variable(tf.zeros([1024, 1024]))
+ adds = [tf.assign_add(p, ones_t, use_locking=False)
+ for _ in range(20)]
+ tf.initialize_all_variables().run()
+
+ def run_add(add_op):
+ sess.run(add_op)
+ threads = [self.checkedThread(target=run_add, args=(add_op,))
+ for add_op in adds]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+
+ vals = p.eval()
+ ones = np.ones((1024, 1024)).astype(np.float32)
+ self.assertTrue((vals >= ones).all())
+ self.assertTrue((vals <= ones * 20).all())
+
+ def testParallelAssignWithoutLocking(self):
+ with self.test_session() as sess:
+ ones_t = tf.fill([1024, 1024], float(1))
+ p = tf.Variable(tf.zeros([1024, 1024]))
+ assigns = [tf.assign(p, tf.mul(ones_t, float(i)), False)
+ for i in range(1, 21)]
+ tf.initialize_all_variables().run()
+
+ def run_assign(assign_op):
+ sess.run(assign_op)
+ threads = [self.checkedThread(target=run_assign, args=(assign_op,))
+ for assign_op in assigns]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+
+ vals = p.eval()
+
+ # Assert every element is taken from one of the assignments.
+ self.assertTrue((vals > 0).all())
+ self.assertTrue((vals <= 20).all())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/dense_update_ops_test.py b/tensorflow/python/kernel_tests/dense_update_ops_test.py
new file mode 100644
index 0000000000..2e1ea468c3
--- /dev/null
+++ b/tensorflow/python/kernel_tests/dense_update_ops_test.py
@@ -0,0 +1,151 @@
+"""Tests for tensorflow.ops.tf.Assign*."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class AssignOpTest(tf.test.TestCase):
+
+ def _initAssignFetch(self, x, y, use_gpu=False):
+ """Initialize a param to init and update it with y."""
+ super(AssignOpTest, self).setUp()
+ with self.test_session(use_gpu=use_gpu):
+ p = tf.Variable(x)
+ assign = tf.assign(p, y)
+ p.initializer.run()
+ new_value = assign.eval()
+ return p.eval(), new_value
+
+ def _initAssignAddFetch(self, x, y, use_gpu=False):
+ """Initialize a param to init, and compute param += y."""
+ with self.test_session(use_gpu=use_gpu):
+ p = tf.Variable(x)
+ add = tf.assign_add(p, y)
+ p.initializer.run()
+ new_value = add.eval()
+ return p.eval(), new_value
+
+ def _initAssignSubFetch(self, x, y, use_gpu=False):
+ """Initialize a param to init, and compute param -= y."""
+ with self.test_session(use_gpu=use_gpu):
+ p = tf.Variable(x)
+ sub = tf.assign_sub(p, y)
+ p.initializer.run()
+ new_value = sub.eval()
+ return p.eval(), new_value
+
+ def _testTypes(self, vals):
+ for dtype in [np.float32, np.float64, np.int32, np.int64]:
+ x = np.zeros(vals.shape).astype(dtype)
+ y = vals.astype(dtype)
+ var_value, op_value = self._initAssignFetch(x, y, use_gpu=False)
+ self.assertAllEqual(y, var_value)
+ self.assertAllEqual(y, op_value)
+ var_value, op_value = self._initAssignAddFetch(x, y, use_gpu=False)
+ self.assertAllEqual(x + y, var_value)
+ self.assertAllEqual(x + y, op_value)
+ var_value, op_value = self._initAssignSubFetch(x, y, use_gpu=False)
+ self.assertAllEqual(x - y, var_value)
+ self.assertAllEqual(x - y, op_value)
+ if tf.test.IsBuiltWithCuda() and dtype in [np.float32, np.float64]:
+ var_value, op_value = self._initAssignFetch(x, y, use_gpu=True)
+ self.assertAllEqual(y, var_value)
+ self.assertAllEqual(y, op_value)
+ var_value, op_value = self._initAssignAddFetch(x, y, use_gpu=True)
+ self.assertAllEqual(x + y, var_value)
+ self.assertAllEqual(x + y, op_value)
+        var_value, op_value = self._initAssignSubFetch(x, y, use_gpu=True)
+ self.assertAllEqual(x - y, var_value)
+ self.assertAllEqual(x - y, op_value)
+
+ def testBasic(self):
+ self._testTypes(np.arange(0, 20).reshape([4, 5]))
+
+ def testAssignNonStrictShapeChecking(self):
+ with self.test_session():
+ data = tf.fill([1024, 1024], 0)
+ p = tf.Variable([1])
+ a = tf.assign(p, data, validate_shape=False)
+ a.op.run()
+ self.assertAllEqual(p.eval(), data.eval())
+
+ # Assign to yet another shape
+ data2 = tf.fill([10, 10], 1)
+ a2 = tf.assign(p, data2, validate_shape=False)
+ a2.op.run()
+ self.assertAllEqual(p.eval(), data2.eval())
+
+ def testInitRequiredAssignAdd(self):
+ with self.test_session():
+ p = tf.Variable(tf.fill([1024, 1024], 1),
+ tf.int32)
+ a = tf.assign_add(p, tf.fill([1024, 1024], 0))
+ with self.assertRaisesOpError("use uninitialized"):
+ a.op.run()
+
+ def testInitRequiredAssignSub(self):
+ with self.test_session():
+ p = tf.Variable(tf.fill([1024, 1024], 1),
+ tf.int32)
+ a = tf.assign_sub(p, tf.fill([1024, 1024], 0))
+ with self.assertRaisesOpError("use uninitialized"):
+ a.op.run()
+
+ # NOTE(mrry): See also
+ # dense_update_ops_no_tsan_test.AssignOpTest, which contains a benign
+ # data race and must run without TSAN.
+ def testParallelUpdateWithLocking(self):
+ with self.test_session() as sess:
+ zeros_t = tf.fill([1024, 1024], 0.0)
+ ones_t = tf.fill([1024, 1024], 1.0)
+ p = tf.Variable(zeros_t)
+ adds = [tf.assign_add(p, ones_t, use_locking=True)
+ for _ in range(20)]
+ p.initializer.run()
+
+ def run_add(add_op):
+ sess.run(add_op)
+ threads = [
+ self.checkedThread(target=run_add, args=(add_op,)) for add_op in adds]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+
+ vals = p.eval()
+ ones = np.ones((1024, 1024)).astype(np.float32)
+ self.assertAllEqual(vals, ones * 20)
+
+ # NOTE(mrry): See also
+ # dense_update_ops_no_tsan_test.[...].testParallelAssignWithoutLocking,
+ # which contains a benign data race and must run without TSAN.
+ def testParallelAssignWithLocking(self):
+ with self.test_session() as sess:
+ zeros_t = tf.fill([1024, 1024], 0.0)
+ ones_t = tf.fill([1024, 1024], 1.0)
+ p = tf.Variable(zeros_t)
+ assigns = [tf.assign(p, tf.mul(ones_t, float(i)),
+ use_locking=True)
+ for i in range(1, 21)]
+ p.initializer.run()
+
+ def run_assign(assign_op):
+ sess.run(assign_op)
+ threads = [self.checkedThread(target=run_assign, args=(assign_op,))
+ for assign_op in assigns]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+
+ vals = p.eval()
+
+ # Assert every element is the same, and taken from one of the assignments.
+ self.assertTrue(vals[0, 0] > 0)
+ self.assertTrue(vals[0, 0] <= 20)
+ self.assertAllEqual(vals, np.ones([1024, 1024]) * vals[0, 0])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/determinant_op_test.py b/tensorflow/python/kernel_tests/determinant_op_test.py
new file mode 100644
index 0000000000..d4e2b88339
--- /dev/null
+++ b/tensorflow/python/kernel_tests/determinant_op_test.py
@@ -0,0 +1,72 @@
+"""Tests for tensorflow.ops.tf.MatrixDeterminant."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class DeterminantOpTest(tf.test.TestCase):
+
+ def _compareDeterminant(self, matrix_x):
+ with self.test_session():
+ if matrix_x.ndim == 2:
+ tf_ans = tf.matrix_determinant(matrix_x)
+ else:
+ tf_ans = tf.batch_matrix_determinant(matrix_x)
+ out = tf_ans.eval()
+ shape = matrix_x.shape
+ if shape[-1] == 0 and shape[-2] == 0:
+ np_ans = np.ones(shape[:-2]).astype(matrix_x.dtype)
+ else:
+ np_ans = np.array(np.linalg.det(matrix_x)).astype(matrix_x.dtype)
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def testBasic(self):
+ # 2x2 matrices
+ self._compareDeterminant(np.array([[2., 3.], [3., 4.]]).astype(np.float32))
+ self._compareDeterminant(np.array([[0., 0.], [0., 0.]]).astype(np.float32))
+ # 5x5 matrices (Eigen forces LU decomposition)
+ self._compareDeterminant(np.array(
+ [[2., 3., 4., 5., 6.], [3., 4., 9., 2., 0.], [2., 5., 8., 3., 8.],
+ [1., 6., 7., 4., 7.], [2., 3., 4., 5., 6.]]).astype(np.float32))
+ # A multidimensional batch of 2x2 matrices
+ self._compareDeterminant(np.random.rand(3, 4, 5, 2, 2).astype(np.float32))
+
+ def testBasicDouble(self):
+ # 2x2 matrices
+ self._compareDeterminant(np.array([[2., 3.], [3., 4.]]).astype(np.float64))
+ self._compareDeterminant(np.array([[0., 0.], [0., 0.]]).astype(np.float64))
+ # 5x5 matrices (Eigen forces LU decomposition)
+ self._compareDeterminant(np.array(
+ [[2., 3., 4., 5., 6.], [3., 4., 9., 2., 0.], [2., 5., 8., 3., 8.],
+ [1., 6., 7., 4., 7.], [2., 3., 4., 5., 6.]]).astype(np.float64))
+ # A multidimensional batch of 2x2 matrices
+ self._compareDeterminant(np.random.rand(3, 4, 5, 2, 2).astype(np.float64))
+
+ def testOverflow(self):
+ max_double = np.finfo("d").max
+ huge_matrix = np.array([[max_double, 0.0], [0.0, max_double]])
+ with self.assertRaisesOpError("not finite"):
+ self._compareDeterminant(huge_matrix)
+
+ def testNonSquareMatrix(self):
+    # Computing the determinant of a non-square matrix should raise an error.
+ with self.assertRaises(ValueError):
+ tf.matrix_determinant(
+ np.array([[1., 2., 3.], [3., 5., 4.]]).astype(np.float32))
+
+ def testWrongDimensions(self):
+ # The input to the determinant should be a 2-dimensional tensor.
+ tensor1 = tf.constant([1., 2.])
+ with self.assertRaises(ValueError):
+ tf.matrix_determinant(tensor1)
+
+ def testEmpty(self):
+ self._compareDeterminant(np.empty([0, 2, 2]))
+ self._compareDeterminant(np.empty([2, 0, 0]))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/diag_op_test.py b/tensorflow/python/kernel_tests/diag_op_test.py
new file mode 100644
index 0000000000..7b53ee26fa
--- /dev/null
+++ b/tensorflow/python/kernel_tests/diag_op_test.py
@@ -0,0 +1,80 @@
+import tensorflow.python.platform
+
+import numpy
+import tensorflow as tf
+
+
+class GenerateIdentityTensorTest(tf.test.TestCase):
+
+ def _testDiagOp(self, diag, dtype, expected_ans, use_gpu=False,
+ expected_err_re=None):
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.diag(tf.convert_to_tensor(diag.astype(dtype)))
+ out = tf_ans.eval()
+ self.assertAllClose(out, expected_ans)
+ self.assertShapeEqual(expected_ans, tf_ans)
+
+ def testEmptyTensor(self):
+ x = numpy.array([])
+ expected_ans = numpy.empty([0, 0])
+ self._testDiagOp(x, numpy.int32, expected_ans)
+
+ def testRankOneIntTensor(self):
+ x = numpy.array([1, 2, 3])
+ expected_ans = numpy.array(
+ [[1, 0, 0],
+ [0, 2, 0],
+ [0, 0, 3]])
+ self._testDiagOp(x, numpy.int32, expected_ans)
+ self._testDiagOp(x, numpy.int64, expected_ans)
+
+ def testRankOneFloatTensor(self):
+ x = numpy.array([1.1, 2.2, 3.3])
+ expected_ans = numpy.array(
+ [[1.1, 0, 0],
+ [0, 2.2, 0],
+ [0, 0, 3.3]])
+ self._testDiagOp(x, numpy.float32, expected_ans)
+ self._testDiagOp(x, numpy.float64, expected_ans)
+
+ def testRankTwoIntTensor(self):
+ x = numpy.array([[1, 2, 3], [4, 5, 6]])
+ expected_ans = numpy.array(
+ [[[[1, 0, 0], [0, 0, 0]],
+ [[0, 2, 0], [0, 0, 0]],
+ [[0, 0, 3], [0, 0, 0]]],
+ [[[0, 0, 0], [4, 0, 0]],
+ [[0, 0, 0], [0, 5, 0]],
+ [[0, 0, 0], [0, 0, 6]]]])
+ self._testDiagOp(x, numpy.int32, expected_ans)
+ self._testDiagOp(x, numpy.int64, expected_ans)
+
+ def testRankTwoFloatTensor(self):
+ x = numpy.array([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]])
+ expected_ans = numpy.array(
+ [[[[1.1, 0, 0], [0, 0, 0]],
+ [[0, 2.2, 0], [0, 0, 0]],
+ [[0, 0, 3.3], [0, 0, 0]]],
+ [[[0, 0, 0], [4.4, 0, 0]],
+ [[0, 0, 0], [0, 5.5, 0]],
+ [[0, 0, 0], [0, 0, 6.6]]]])
+ self._testDiagOp(x, numpy.float32, expected_ans)
+ self._testDiagOp(x, numpy.float64, expected_ans)
+
+ def testRankThreeFloatTensor(self):
+ x = numpy.array([[[1.1, 2.2], [3.3, 4.4]],
+ [[5.5, 6.6], [7.7, 8.8]]])
+ expected_ans = numpy.array(
+ [[[[[[1.1, 0], [0, 0]], [[0, 0], [0, 0]]],
+ [[[0, 2.2], [0, 0]], [[0, 0], [0, 0]]]],
+ [[[[0, 0], [3.3, 0]], [[0, 0], [0, 0]]],
+ [[[0, 0], [0, 4.4]], [[0, 0], [0, 0]]]]],
+ [[[[[0, 0], [0, 0]], [[5.5, 0], [0, 0]]],
+ [[[0, 0], [0, 0]], [[0, 6.6], [0, 0]]]],
+ [[[[0, 0], [0, 0]], [[0, 0], [7.7, 0]]],
+ [[[0, 0], [0, 0]], [[0, 0], [0, 8.8]]]]]])
+ self._testDiagOp(x, numpy.float32, expected_ans)
+ self._testDiagOp(x, numpy.float64, expected_ans)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/dynamic_partition_op_test.py b/tensorflow/python/kernel_tests/dynamic_partition_op_test.py
new file mode 100644
index 0000000000..a7a276893d
--- /dev/null
+++ b/tensorflow/python/kernel_tests/dynamic_partition_op_test.py
@@ -0,0 +1,99 @@
+"""Tests for the DynamicPartition op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class DynamicPartitionTest(tf.test.TestCase):
+
+ def testSimpleOneDimensional(self):
+ with self.test_session() as sess:
+ data = tf.constant([0, 13, 2, 39, 4, 17])
+ indices = tf.constant([0, 0, 2, 3, 2, 1])
+ partitions = tf.dynamic_partition(data, indices, num_partitions=4)
+ partition_vals = sess.run(partitions)
+
+ self.assertAllEqual([0, 13], partition_vals[0])
+ self.assertAllEqual([17], partition_vals[1])
+ self.assertAllEqual([2, 4], partition_vals[2])
+ self.assertAllEqual([39], partition_vals[3])
+ # Vector data input to DynamicPartition results in
+ # `num_partitions` vectors of unknown length.
+ self.assertEqual([None], partitions[0].get_shape().as_list())
+ self.assertEqual([None], partitions[1].get_shape().as_list())
+ self.assertEqual([None], partitions[2].get_shape().as_list())
+ self.assertEqual([None], partitions[3].get_shape().as_list())
+
+ def testSimpleTwoDimensional(self):
+ with self.test_session() as sess:
+ data = tf.constant([[0, 1, 2], [3, 4, 5], [6, 7, 8],
+ [9, 10, 11], [12, 13, 14], [15, 16, 17]])
+ indices = tf.constant([0, 0, 2, 3, 2, 1])
+ partitions = tf.dynamic_partition(data, indices, num_partitions=4)
+ partition_vals = sess.run(partitions)
+
+ self.assertAllEqual([[0, 1, 2], [3, 4, 5]], partition_vals[0])
+ self.assertAllEqual([[15, 16, 17]], partition_vals[1])
+ self.assertAllEqual([[6, 7, 8], [12, 13, 14]], partition_vals[2])
+ self.assertAllEqual([[9, 10, 11]], partition_vals[3])
+      # Matrix data input to DynamicPartition results in
+      # `num_partitions` matrices with an unknown number of rows and 3 columns.
+ self.assertEqual([None, 3], partitions[0].get_shape().as_list())
+ self.assertEqual([None, 3], partitions[1].get_shape().as_list())
+ self.assertEqual([None, 3], partitions[2].get_shape().as_list())
+ self.assertEqual([None, 3], partitions[3].get_shape().as_list())
+
+ def testHigherRank(self):
+ np.random.seed(7)
+ with self.test_session() as sess:
+ for n in 2, 3:
+ for shape in (4,), (4, 5), (4, 5, 2):
+ partitions = np.random.randint(n, size=np.prod(shape)).reshape(shape)
+ for extra_shape in (), (6,), (6, 7):
+ data = np.random.randn(*(shape + extra_shape))
+ outputs = tf.dynamic_partition(data, partitions, num_partitions=n)
+ self.assertEqual(n, len(outputs))
+ for i, output in enumerate(sess.run(outputs)):
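+            # Each output should contain exactly the entries of data whose
+            # partition index equals i.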
+ self.assertAllEqual(output, data[partitions == i])
+
+ def testErrorIndexOutOfRange(self):
+ with self.test_session() as sess:
+ data = tf.constant([[0, 1, 2], [3, 4, 5], [6, 7, 8],
+ [9, 10, 11], [12, 13, 14]])
+ indices = tf.constant([0, 2, 99, 2, 2])
+ partitions = tf.dynamic_partition(data, indices, num_partitions=4)
+ with self.assertRaisesOpError(r"partitions\[2\] = 99 is not in \[0, 4\)"):
+ sess.run(partitions)
+
+ def testScalarIndexOutOfRange(self):
+ with self.test_session() as sess:
+ bad = 17
+ data = np.zeros(5)
+ partitions = tf.dynamic_partition(data, bad, num_partitions=7)
+ with self.assertRaisesOpError(r"partitions = 17 is not in \[0, 7\)"):
+ sess.run(partitions)
+
+ def testHigherRankIndexOutOfRange(self):
+ with self.test_session() as sess:
+ shape = (2, 3)
+ indices = tf.placeholder(shape=shape, dtype=np.int32)
+ data = np.zeros(shape + (5,))
+ partitions = tf.dynamic_partition(data, indices, num_partitions=7)
+ for i in xrange(2):
+ for j in xrange(3):
+ bad = np.zeros(shape, dtype=np.int32)
+ bad[i, j] = 17
+ with self.assertRaisesOpError(
+ r"partitions\[%d,%d\] = 17 is not in \[0, 7\)" % (i, j)):
+ sess.run(partitions, feed_dict={indices: bad})
+
+ def testErrorWrongDimsIndices(self):
+ data = tf.constant([[0], [1], [2]])
+ indices = tf.constant([[0], [0]])
+ with self.assertRaises(ValueError):
+ tf.dynamic_partition(data, indices, num_partitions=4)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/dynamic_stitch_op_test.py b/tensorflow/python/kernel_tests/dynamic_stitch_op_test.py
new file mode 100644
index 0000000000..9ac49390b9
--- /dev/null
+++ b/tensorflow/python/kernel_tests/dynamic_stitch_op_test.py
@@ -0,0 +1,107 @@
+"""Tests for tensorflow.ops.data_flow_ops.dynamic_stitch."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class DynamicStitchTest(tf.test.TestCase):
+
+ def testScalar(self):
+ with self.test_session():
+ indices = [tf.constant(0), tf.constant(1)]
+ data = [tf.constant(40), tf.constant(60)]
+ for step in -1, 1:
+ stitched_t = tf.dynamic_stitch(indices[::step], data)
+ stitched_val = stitched_t.eval()
+ self.assertAllEqual([40, 60][::step], stitched_val)
+ # Dimension 0 is determined by the max index in indices, so we
+ # can only infer that the output is a vector of some unknown
+ # length.
+ self.assertEqual([None], stitched_t.get_shape().as_list())
+
+ def testSimpleOneDimensional(self):
+ with self.test_session():
+ indices = [tf.constant([0, 4, 7]),
+ tf.constant([1, 6, 2, 3, 5])]
+ data = [tf.constant([0, 40, 70]),
+ tf.constant([10, 60, 20, 30, 50])]
+ stitched_t = tf.dynamic_stitch(indices, data)
+ stitched_val = stitched_t.eval()
+ self.assertAllEqual([0, 10, 20, 30, 40, 50, 60, 70], stitched_val)
+ # Dimension 0 is determined by the max index in indices, so we
+ # can only infer that the output is a vector of some unknown
+ # length.
+ self.assertEqual([None], stitched_t.get_shape().as_list())
+
+ def testSimpleTwoDimensional(self):
+ with self.test_session():
+ indices = [tf.constant([0, 4, 7]),
+ tf.constant([1, 6]),
+ tf.constant([2, 3, 5])]
+ data = [tf.constant([[0, 1], [40, 41], [70, 71]]),
+ tf.constant([[10, 11], [60, 61]]),
+ tf.constant([[20, 21], [30, 31], [50, 51]])]
+ stitched_t = tf.dynamic_stitch(indices, data)
+ stitched_val = stitched_t.eval()
+ self.assertAllEqual(
+ [[0, 1], [10, 11], [20, 21], [30, 31],
+ [40, 41], [50, 51], [60, 61], [70, 71]], stitched_val)
+ # Dimension 0 is determined by the max index in indices, so we
+ # can only infer that the output is a matrix with 2 columns and
+ # some unknown number of rows.
+ self.assertEqual([None, 2], stitched_t.get_shape().as_list())
+
+ def testHigherRank(self):
+ with self.test_session() as sess:
+ indices = [tf.constant(6), tf.constant([4, 1]),
+ tf.constant([[5, 2], [0, 3]])]
+ data = [tf.constant([61, 62]), tf.constant([[41, 42], [11, 12]]),
+ tf.constant([[[51, 52], [21, 22]], [[1, 2], [31, 32]]])]
+ stitched_t = tf.dynamic_stitch(indices, data)
+ stitched_val = stitched_t.eval()
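+      # The indices scatter the data slices so that row i of the stitched
+      # result is [10 * i + 1, 10 * i + 2].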
+ correct = 10 * np.arange(7)[:, None] + [1, 2]
+ self.assertAllEqual(correct, stitched_val)
+ self.assertEqual([None, 2], stitched_t.get_shape().as_list())
+ # Test gradients
+ stitched_grad = 7 * stitched_val
+ grads = tf.gradients(stitched_t, indices + data, stitched_grad)
+ self.assertEqual(grads[:3], [None] * 3) # Indices have no gradients
+ for datum, grad in zip(data, sess.run(grads[3:])):
+ self.assertAllEqual(7 * datum.eval(), grad)
+
+ def testErrorIndicesMultiDimensional(self):
+ indices = [tf.constant([0, 4, 7]),
+ tf.constant([[1, 6, 2, 3, 5]])]
+ data = [tf.constant([[0, 40, 70]]),
+ tf.constant([10, 60, 20, 30, 50])]
+ with self.assertRaises(ValueError):
+ tf.dynamic_stitch(indices, data)
+
+ def testErrorDataNumDimsMismatch(self):
+ indices = [tf.constant([0, 4, 7]),
+ tf.constant([1, 6, 2, 3, 5])]
+ data = [tf.constant([0, 40, 70]),
+ tf.constant([[10, 60, 20, 30, 50]])]
+ with self.assertRaises(ValueError):
+ tf.dynamic_stitch(indices, data)
+
+ def testErrorDataDimSizeMismatch(self):
+ indices = [tf.constant([0, 4, 5]),
+ tf.constant([1, 6, 2, 3])]
+ data = [tf.constant([[0], [40], [70]]),
+ tf.constant([[10, 11], [60, 61], [20, 21], [30, 31]])]
+ with self.assertRaises(ValueError):
+ tf.dynamic_stitch(indices, data)
+
+ def testErrorDataAndIndicesSizeMismatch(self):
+ indices = [tf.constant([0, 4, 7]),
+ tf.constant([1, 6, 2, 3, 5])]
+ data = [tf.constant([0, 40, 70]),
+ tf.constant([10, 60, 20, 30])]
+ with self.assertRaises(ValueError):
+ tf.dynamic_stitch(indices, data)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/edit_distance_op_test.py b/tensorflow/python/kernel_tests/edit_distance_op_test.py
new file mode 100644
index 0000000000..5919adcfaf
--- /dev/null
+++ b/tensorflow/python/kernel_tests/edit_distance_op_test.py
@@ -0,0 +1,153 @@
+"""Tests for tensorflow.kernels.edit_distance_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+def ConstantOf(x):
+ x = np.asarray(x)
+ # Convert to int64 if it's not a string
+ if x.dtype.char != "S": x = np.asarray(x, dtype=np.int64)
+ return tf.constant(x)
+
+
+class EditDistanceTest(tf.test.TestCase):
+
+ def _testEditDistance(self, hypothesis, truth, normalize,
+ expected_output, expected_err_re=None):
+ # hypothesis and truth are (index, value, shape) tuples
+ hypothesis_st = tf.SparseTensor(*[ConstantOf(x) for x in hypothesis])
+ truth_st = tf.SparseTensor(*[ConstantOf(x) for x in truth])
+ edit_distance = tf.edit_distance(
+ hypothesis=hypothesis_st, truth=truth_st, normalize=normalize)
+
+ with self.test_session():
+ if expected_err_re is None:
+ # Shape inference figures out the shape from the shape variables
+ expected_shape = [
+ max(h, t) for h, t in zip(hypothesis[2], truth[2])[:-1]]
+ self.assertEqual(edit_distance.get_shape(), expected_shape)
+ output = edit_distance.eval()
+ self.assertAllClose(output, expected_output)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ edit_distance.eval()
+
+ def testEditDistanceNormalized(self):
+ hypothesis_indices = [[0, 0], [0, 1],
+ [1, 0], [1, 1]]
+ hypothesis_values = [0, 1,
+ 1, -1]
+ hypothesis_shape = [2, 2]
+ truth_indices = [[0, 0],
+ [1, 0], [1, 1]]
+ truth_values = [0,
+ 1, 1]
+ truth_shape = [2, 2]
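+    # Row 0: hypothesis [0, 1] vs. truth [0] needs one edit; normalized by
+    # truth length 1 -> 1.0. Row 1: [1, -1] vs. [1, 1] needs one substitution;
+    # normalized by truth length 2 -> 0.5.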
+ expected_output = [1.0, 0.5]
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=True,
+ expected_output=expected_output)
+
+ def testEditDistanceUnnormalized(self):
+ hypothesis_indices = [[0, 0],
+ [1, 0], [1, 1]]
+ hypothesis_values = [10,
+ 10, 11]
+ hypothesis_shape = [2, 2]
+ truth_indices = [[0, 0], [0, 1],
+ [1, 0], [1, 1]]
+ truth_values = [1, 2,
+ 1, -1]
+ truth_shape = [2, 3]
+ expected_output = [2.0, 2.0]
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=False,
+ expected_output=expected_output)
+
+ def testEditDistanceProperDistance(self):
+ # In this case, the values are individual characters stored in the
+ # SparseTensor (type DT_STRING)
+ hypothesis_indices = ([[0, i] for i, _ in enumerate("algorithm")] +
+ [[1, i] for i, _ in enumerate("altruistic")])
+ hypothesis_values = [x for x in "algorithm"] + [x for x in "altruistic"]
+ hypothesis_shape = [2, 11]
+ truth_indices = ([[0, i] for i, _ in enumerate("altruistic")] +
+ [[1, i] for i, _ in enumerate("algorithm")])
+ truth_values = [x for x in "altruistic"] + [x for x in "algorithm"]
+ truth_shape = [2, 11]
+ expected_unnormalized = [6.0, 6.0]
+ expected_normalized = [6.0/len("altruistic"),
+ 6.0/len("algorithm")]
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=False,
+ expected_output=expected_unnormalized)
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=True,
+ expected_output=expected_normalized)
+
+ def testEditDistance3D(self):
+ hypothesis_indices = [[0, 0, 0],
+ [1, 0, 0]]
+ hypothesis_values = [0, 1]
+ hypothesis_shape = [2, 1, 1]
+ truth_indices = [[0, 1, 0],
+ [1, 0, 0],
+ [1, 1, 0]]
+ truth_values = [0, 1, 1]
+ truth_shape = [2, 2, 1]
+ expected_output = [[np.inf, 1.0], # (0,0): no truth, (0,1): no hypothesis
+ [0.0, 1.0]] # (1,0): match, (1,1): no hypothesis
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=True,
+ expected_output=expected_output)
+
+ def testEditDistanceMissingHypothesis(self):
+ hypothesis_indices = np.empty((0, 2), dtype=np.int64)
+ hypothesis_values = []
+ hypothesis_shape = [1, 0]
+ truth_indices = [[0, 0]]
+ truth_values = [0]
+ truth_shape = [1, 1]
+ expected_output = [1.0]
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=True,
+ expected_output=expected_output)
+
+ def testEditDistanceMissingTruth(self):
+ hypothesis_indices = [[0, 0]]
+ hypothesis_values = [0]
+ hypothesis_shape = [1, 1]
+ truth_indices = np.empty((0, 2), dtype=np.int64)
+ truth_values = []
+ truth_shape = [1, 0]
+ expected_output = [np.inf] # Normalized, divide by zero
+
+ self._testEditDistance(
+ hypothesis=(hypothesis_indices, hypothesis_values, hypothesis_shape),
+ truth=(truth_indices, truth_values, truth_shape),
+ normalize=True,
+ expected_output=expected_output)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/embedding_ops_test.py b/tensorflow/python/kernel_tests/embedding_ops_test.py
new file mode 100644
index 0000000000..99aa2453dc
--- /dev/null
+++ b/tensorflow/python/kernel_tests/embedding_ops_test.py
@@ -0,0 +1,422 @@
+"""Functional tests for ops used with embeddings."""
+import itertools
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+def _AsLong(array):
+ """Casts arrays elements to long type. Used to convert from numpy tf."""
+ return [long(x) for x in array]
+
+
+class ScatterAddSubTest(tf.test.TestCase):
+
+ def _TestCase(self, shape, indices, scatter_op=tf.scatter_add):
+ """Run a random test case with the given shape and indices.
+
+ Args:
+ shape: Shape of the parameters array.
+      indices: One-dimensional array of ints, the indices into the first
+               dimension of the parameters to update.
+ scatter_op: ScatterAdd or ScatterSub.
+ """
+ super(ScatterAddSubTest, self).setUp()
+ with self.test_session(use_gpu=False):
+ # Create a random parameter array of given shape
+ p_init = np.random.rand(*shape).astype("f")
+      # Create the shape of the update array. All dimensions except the first
+      # match the parameter array; the first dimension equals the # of indices.
+ vals_shape = [len(indices)] + shape[1:]
+ vals_init = np.random.rand(*vals_shape).astype("f")
+ v_i = [float(x) for x in vals_init.ravel()]
+ p = tf.Variable(p_init)
+ vals = tf.constant(v_i, shape=vals_shape, name="vals")
+ ind = tf.constant(indices, dtype=tf.int32)
+ p2 = scatter_op(p, ind, vals, name="updated_p")
+ # p = init
+ tf.initialize_all_variables().run()
+ # p += vals
+ result = p2.eval()
+ # Compute the expected 'p' using numpy operations.
+ for i, ind in enumerate(indices):
+ if scatter_op == tf.scatter_add:
+ p_init.reshape(shape[0], -1)[ind, :] += (
+ vals_init.reshape(vals_shape[0], -1)[i, :])
+ else:
+ p_init.reshape(shape[0], -1)[ind, :] -= (
+ vals_init.reshape(vals_shape[0], -1)[i, :])
+ self.assertTrue(all((p_init == result).ravel()))
+
+ def testNoRepetitions(self):
+ self._TestCase([2, 2], [1])
+ self._TestCase([4, 4, 4], [2, 0])
+ self._TestCase([43, 20, 10, 10], [42, 5, 6, 1, 3, 5, 7, 9])
+
+ def testWithRepetitions(self):
+ self._TestCase([2, 2], [1, 1])
+ self._TestCase([5, 3, 9, 5], [2, 0, 4, 1, 3, 1, 4, 0, 4, 3])
+ self._TestCase([32, 4, 4], [31] * 8)
+
+ def testRandom(self):
+ # Random shapes of rank 4, random indices
+ for _ in range(5):
+ shape = np.random.randint(1, 20, size=4)
+ indices = np.random.randint(shape[0], size=2 * shape[0])
+ self._TestCase(_AsLong(list(shape)), list(indices))
+
+ def testSubRandom(self):
+ # Random shapes of rank 4, random indices
+ for _ in range(5):
+ shape = np.random.randint(1, 20, size=4)
+ indices = np.random.randint(shape[0], size=2 * shape[0])
+ self._TestCase(_AsLong(list(shape)), list(indices),
+ tf.scatter_sub)
+
+ def testWrongShape(self):
+ # Indices and values mismatch.
+ var = tf.Variable(tf.zeros(shape=[1024, 64, 64], dtype=tf.float32))
+ indices = tf.placeholder(tf.int32, shape=[32])
+ values = tf.placeholder(tf.float32, shape=[33, 64, 64])
+ with self.assertRaises(ValueError):
+ tf.scatter_add(var, indices, values)
+
+ # Var and values mismatch.
+ values = tf.placeholder(tf.float32, shape=[32, 64, 63])
+ with self.assertRaises(ValueError):
+ tf.scatter_add(var, indices, values)
+
+
+def _PName(param_id):
+ return "p" + str(param_id)
+
+
+def _EmbeddingParams(num_shards, vocab_size,
+ dtype=tf.float32,
+ shape=None):
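+  """Builds a sharded set of constant embedding parameters.
+
+  Returns `num_shards` constant tensors, each of shape
+  [vocab_size / num_shards] + shape, together with a dict of random values
+  keyed by parameter name and a feed dict mapping each constant to its value.
+  """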
+ p = []
+ params = {}
+ feed_dict = {}
+ if not shape: shape = [10]
+ assert not vocab_size % num_shards
+ shape = [vocab_size / num_shards] + shape
+ for i in range(num_shards):
+ param_name = _PName(i)
+ constant_t = tf.constant(1.0, shape=shape, dtype=dtype,
+ name=param_name)
+ p.append(constant_t)
+ np_type = "f" if dtype == tf.float32 else "d"
+ val = (np.random.rand(*shape).astype(np_type)) + 1
+ params[param_name + ":0"] = val
+ feed_dict[constant_t.name] = val
+ return p, params, feed_dict
+
+
+def _EmbeddingResult(params, id_vals, num_shards, weight_vals=None):
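+  """Computes the expected lookup result with numpy.
+
+  For each entry in id_vals, gathers row id / num_shards from shard
+  id % num_shards, scales it by the corresponding weight, and sums the
+  scaled rows. Returns the per-entry summed values and summed weights.
+  """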
+ if weight_vals is None:
+ weight_vals = np.copy(id_vals)
+ weight_vals.fill(1)
+ values = []
+ weights = []
+ for ids, wts in zip(id_vals, weight_vals):
+ val_aggr = None
+ wt_aggr = None
+ if isinstance(ids, int):
+ ids = [ids]
+ wts = [wts]
+ for i, wt_val in zip(ids, wts):
+ val = np.copy(params[_PName(i % num_shards) + ":0"]
+ [i / num_shards, :]) * wt_val
+ if val_aggr is None:
+ assert wt_aggr is None
+ val_aggr = val
+ wt_aggr = wt_val
+ else:
+ assert wt_aggr is not None
+ val_aggr += val
+ wt_aggr += wt_val
+ values.append(val_aggr)
+ weights.append(wt_aggr)
+ values = np.array(values).astype(np.float32)
+ weights = np.array(weights).astype(np.float32)
+ return values, weights
+
+
+class EmbeddingLookupTest(tf.test.TestCase):
+
+ # This test looks up [0, 0] in a parameter matrix sharded 2 ways. Since
+  # both the ids are in the first shard, one of the resulting lookup
+  # vectors is going to be empty. The subsequent DivOp fails because of that.
+ # TODO(keveman): Disabling the test until the underlying problem is fixed.
+ def testSimpleSharded(self):
+ with self.test_session():
+ num_shards = 2
+ vocab_size = 4
+ p, params, feed_dict = _EmbeddingParams(num_shards, vocab_size)
+
+ id_vals = np.array([0, 0])
+ ids = tf.constant(list(id_vals), dtype=tf.int32)
+ print "Construct ids", ids.get_shape()
+ embedding = tf.nn.embedding_lookup(p, ids)
+
+ tf_result = embedding.eval(feed_dict=feed_dict)
+ np_result, _ = _EmbeddingResult(params, id_vals, num_shards)
+ self.assertAllEqual(np_result, tf_result)
+ self.assertShapeEqual(np_result, embedding)
+
+ def testSharded(self):
+ with self.test_session():
+ num_shards = 5
+ vocab_size = 25
+      # The embedding dimension is 10. The vocab_size x 10 embedding
+      # parameters are spread across num_shards matrices, so each
+      # matrix is (vocab_size / num_shards) x 10.
+ p, params, feed_dict = _EmbeddingParams(num_shards, vocab_size)
+
+ num_vals = 30
+      # Fetch num_vals embeddings for random word ids. Since
+      # num_vals > vocab_size, there ought to be repetitions, so
+      # this also exercises the repeated-id case.
+ id_vals = np.random.randint(vocab_size, size=num_vals)
+ ids = tf.constant(list(id_vals), dtype=tf.int32)
+
+ embedding = tf.nn.embedding_lookup(p, ids)
+ tf_result = embedding.eval(feed_dict=feed_dict)
+ np_result, _ = _EmbeddingResult(params, id_vals, num_shards)
+ self.assertAllEqual(np_result, tf_result)
+ self.assertShapeEqual(np_result, embedding)
+
+ def testGradientsEmbeddingLookup(self):
+ vocab_size = 9
+ num_ids = 5
+ id_vals = list(np.random.randint(vocab_size, size=num_ids))
+ tf.logging.vlog(1, id_vals)
+ for num_shards in [1, 3]:
+ with self.test_session():
+ ids = tf.constant(id_vals, dtype=tf.int32)
+ x, params, _ = _EmbeddingParams(
+ num_shards, vocab_size, shape=[2])
+ y = tf.nn.embedding_lookup(x, ids)
+ y_shape = [num_ids] + list(params[_PName(0) + ":0"].shape[1:])
+ x_name = [_PName(i) for i in range(num_shards)]
+ x_init_value = [params[x_n + ":0"] for x_n in x_name]
+ x_shape = [i.shape for i in x_init_value]
+ err = gc.ComputeGradientError(x, x_shape, y, y_shape,
+ x_init_value=x_init_value)
+ self.assertLess(err, 1e-4)
+
+ def testGradientsEmbeddingLookupWithComputedParams(self):
+ vocab_size = 9
+ num_ids = 5
+ id_vals = list(np.random.randint(vocab_size, size=num_ids))
+ tf.logging.vlog(1, id_vals)
+ for num_shards in [1, 3]:
+ with self.test_session():
+ ids = tf.constant(id_vals, dtype=tf.int32)
+ x, params, _ = _EmbeddingParams(
+ num_shards, vocab_size, shape=[2])
+ # This will force a conversion from IndexedSlices to Tensor.
+ x_squared = [tf.square(elem) for elem in x]
+ y = tf.nn.embedding_lookup(x_squared, ids)
+ y_shape = [num_ids] + list(params[_PName(0) + ":0"].shape[1:])
+ x_name = [_PName(i) for i in range(num_shards)]
+ x_init_value = [params[x_n + ":0"] for x_n in x_name]
+ x_shape = [i.shape for i in x_init_value]
+ err = gc.ComputeGradientError(x, x_shape, y, y_shape,
+ x_init_value=x_init_value)
+ self.assertLess(err, 1e-3)
+
+ def testConstructionNonSharded(self):
+ with tf.Graph().as_default():
+ p = tf.Variable(tf.zeros(shape=[100, 100], dtype=tf.float32))
+ ids = tf.constant([0, 1, 1, 7], dtype=tf.int32)
+ tf.nn.embedding_lookup([p], ids)
+
+ def testConstructionSharded(self):
+ with tf.Graph().as_default():
+ p = []
+ for _ in range(2):
+ p += [tf.Variable(tf.zeros(shape=[100, 100], dtype=tf.float32))]
+ ids = tf.constant([0, 1, 1, 17], dtype=tf.int32)
+ tf.nn.embedding_lookup(p, ids)
+
+ def testHigherRank(self):
+ np.random.seed(8)
+ with self.test_session():
+ for params_shape in (12,), (6, 3):
+ params = np.random.randn(*params_shape)
+ for ids_shape in (3, 2), (4, 3):
+ ids = np.random.randint(params.shape[0],
+ size=np.prod(ids_shape)).reshape(ids_shape)
+ # Compare nonsharded to gather
+ simple = tf.nn.embedding_lookup(params, ids).eval()
+ self.assertAllEqual(simple, tf.gather(params, ids).eval())
+ # Run a few random sharded versions
+ for procs in 1, 2, 3:
+ stride = procs * tf.range(0, params.shape[0] / procs)
+ split_params = [tf.gather(params, stride + p)
+ for p in xrange(procs)]
+ sharded = tf.nn.embedding_lookup(split_params, ids).eval()
+ self.assertAllEqual(simple, sharded)
+
+
+class EmbeddingLookupSparseTest(tf.test.TestCase):
+
+ def _RandomIdsAndWeights(self, batch_size, vocab_size):
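+    """Generates random sparse ids and weights for a batch.
+
+    Each batch entry receives between 1 and max_val_per_entry - 1 random ids.
+    Returns the ids and weights as SparseTensors plus the flat ids, flat
+    weights, and per-entry counts.
+    """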
+ max_val_per_entry = 6
+ vals_per_batch_entry = np.random.randint(
+ 1, max_val_per_entry, size=batch_size)
+ num_vals = np.sum(vals_per_batch_entry)
+
+ ids = np.random.randint(vocab_size, size=num_vals)
+ weights = 1 + np.random.rand(num_vals)
+
+ indices = []
+ for batch_entry, num_val in enumerate(vals_per_batch_entry):
+ for val_index in range(num_val):
+ indices.append([batch_entry, val_index])
+
+ shape = [batch_size, max_val_per_entry]
+
+ sp_ids = tf.SparseTensor(
+ tf.constant(indices, tf.int64),
+ tf.constant(ids, tf.int32),
+ tf.constant(shape, tf.int64))
+ sp_weights = tf.SparseTensor(
+ tf.constant(indices, tf.int64),
+ tf.constant(weights, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ return sp_ids, sp_weights, ids, weights, vals_per_batch_entry
+
+ def _GroupByBatchEntry(self, vals, vals_per_batch_entry):
+ grouped_vals = []
+ index = 0
+ for num_val in vals_per_batch_entry:
+ grouped_vals.append(list(vals[index: (index + num_val)]))
+ index += num_val
+ return grouped_vals
+
+ def testEmbeddingLookupSparse(self):
+ vocab_size = 25
+ batch_size = 10
+ param_shape = [2, 5]
+
+ sp_ids, sp_weights, ids, weights, vals_per_batch_entry = (
+ self._RandomIdsAndWeights(batch_size, vocab_size))
+
+ grouped_ids = self._GroupByBatchEntry(ids, vals_per_batch_entry)
+ grouped_weights = self._GroupByBatchEntry(weights, vals_per_batch_entry)
+ grouped_ignored_weights = self._GroupByBatchEntry(
+ np.ones(np.sum(vals_per_batch_entry)), vals_per_batch_entry)
+
+ for num_shards, combiner, dtype, ignore_weights in itertools.product(
+ [1, 5],
+ ["sum", "mean"],
+ [tf.float32, tf.float64],
+ [True, False]):
+
+ with self.test_session():
+ p, params, feed_dict = _EmbeddingParams(num_shards, vocab_size,
+ shape=param_shape,
+ dtype=dtype)
+ embedding_sum = tf.nn.embedding_lookup_sparse(
+ p, sp_ids, None if ignore_weights else sp_weights,
+ combiner=combiner)
+ tf_embedding_sum = embedding_sum.eval(feed_dict=feed_dict)
+
+ np_embedding_sum, np_weight_sum = _EmbeddingResult(
+ params, grouped_ids, num_shards,
+ weight_vals=grouped_ignored_weights
+ if ignore_weights else grouped_weights)
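+        # For the "mean" combiner, the reference result is the weighted sum
+        # divided by the total weight of each batch entry.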
+ if combiner == "mean":
+ np_embedding_sum /= np.reshape(np_weight_sum, (batch_size, 1, 1))
+ self.assertAllClose(np_embedding_sum, tf_embedding_sum)
+
+ def testGradientsEmbeddingLookupSparse(self):
+ vocab_size = 12
+ batch_size = 4
+ param_shape = [2, 3]
+ sp_ids, sp_weights, _, _, _ = (
+ self._RandomIdsAndWeights(batch_size, vocab_size))
+
+ for num_shards, combiner, dtype, ignore_weights in itertools.product(
+ [1, 3],
+ ["sum", "mean"],
+ [tf.float32, tf.float64],
+ [True, False]):
+ with self.test_session():
+ x, params, _ = _EmbeddingParams(num_shards, vocab_size,
+ shape=param_shape,
+ dtype=dtype)
+
+ y = tf.nn.embedding_lookup_sparse(
+ x, sp_ids, None if ignore_weights else sp_weights,
+ combiner=combiner)
+ x_name = [_PName(i) for i in range(num_shards)]
+ x_init_value = [params[x_n + ":0"] for x_n in x_name]
+ x_shape = [i.shape for i in x_init_value]
+ y_shape = [batch_size] + list(params[_PName(0) + ":0"].shape[1:])
+ err = gc.ComputeGradientError(x, x_shape, y, y_shape,
+ x_init_value=x_init_value)
+ self.assertLess(err, 1e-5 if dtype == tf.float64 else 2e-3)
+
+
+class DynamicStitchOpTest(tf.test.TestCase):
+
+ def testCint32Cpu(self):
+ with self.test_session(use_gpu=False):
+ indices = [tf.convert_to_tensor([0, 1, 2]), tf.convert_to_tensor([2, 3])]
+ values = [tf.convert_to_tensor([12, 23, 34]), tf.convert_to_tensor([1, 2])]
+ self.assertAllEqual(
+ tf.dynamic_stitch(indices, values).eval(), [12, 23, 1, 2])
+
+ def testCint32Gpu(self):
+ with self.test_session(use_gpu=True):
+ indices = [tf.convert_to_tensor([0, 1, 2]), tf.convert_to_tensor([2, 3])]
+ values = [tf.convert_to_tensor([12, 23, 34]), tf.convert_to_tensor([1, 2])]
+ self.assertAllEqual(
+ tf.dynamic_stitch(indices, values).eval(), [12, 23, 1, 2])
+
+ def testInt32Cpu(self):
+ with self.test_session(use_gpu=False):
+ indices = [tf.convert_to_tensor([0, 1, 2]), tf.convert_to_tensor([2, 3])]
+ values = [tf.convert_to_tensor([12, 23, 34]), tf.convert_to_tensor([1, 2])]
+ self.assertAllEqual(
+ tf.dynamic_stitch(indices, values).eval(), [12, 23, 1, 2])
+
+ def testInt32Gpu(self):
+ with self.test_session(use_gpu=True):
+ indices = [tf.convert_to_tensor([0, 1, 2]), tf.convert_to_tensor([2, 3])]
+ values = [tf.convert_to_tensor([12, 23, 34]), tf.convert_to_tensor([1, 2])]
+ self.assertAllEqual(
+ tf.dynamic_stitch(indices, values).eval(), [12, 23, 1, 2])
+
+ def testSumGradArgs(self):
+ with self.test_session(use_gpu=False):
+ indices = [tf.convert_to_tensor([0, 1, 2, 3]),
+ tf.convert_to_tensor([2, 3])]
+ values = [tf.convert_to_tensor([2, 3, 5, 7]), tf.convert_to_tensor([1, 1])]
+ self.assertAllEqual(
+ tf.dynamic_stitch(indices, values).eval(), [2, 3, 1, 1])
+
+ # We expect that the values are merged in order.
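+  # When the same index appears in more than one input, dynamic_stitch keeps
+  # the value from the last input, so the stitched result should match
+  # np_values[-1].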
+ def testStitchOrder(self):
+ with self.test_session():
+ indices = []
+ np_values = []
+ values = []
+ for _ in range(10):
+ indices.extend([tf.convert_to_tensor(np.arange(100).astype(np.int32))])
+ np_values.extend([np.random.uniform(size=100)])
+ values.extend([tf.convert_to_tensor(np_values[-1])])
+ stitched = tf.dynamic_stitch(indices, values).eval()
+ self.assertAllEqual(np_values[-1], stitched)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/fifo_queue_test.py b/tensorflow/python/kernel_tests/fifo_queue_test.py
new file mode 100644
index 0000000000..57448db433
--- /dev/null
+++ b/tensorflow/python/kernel_tests/fifo_queue_test.py
@@ -0,0 +1,1043 @@
+"""Tests for tensorflow.ops.data_flow_ops.FIFOQueue."""
+import random
+import re
+import time
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class FIFOQueueTest(tf.test.TestCase):
+
+ def testConstructor(self):
+ with tf.Graph().as_default():
+ q = tf.FIFOQueue(10, tf.float32, name="Q")
+ self.assertTrue(isinstance(q.queue_ref, tf.Tensor))
+ self.assertEquals(tf.string_ref, q.queue_ref.dtype)
+ self.assertProtoEquals("""
+ name:'Q' op:'FIFOQueue'
+ attr { key: 'component_types' value { list { type: DT_FLOAT } } }
+ attr { key: 'shapes' value { list {} } }
+ attr { key: 'capacity' value { i: 10 } }
+ attr { key: 'container' value { s: '' } }
+ attr { key: 'shared_name' value { s: '' } }
+ """, q.queue_ref.op.node_def)
+
+ def testMultiQueueConstructor(self):
+ with tf.Graph().as_default():
+ q = tf.FIFOQueue(5, (tf.int32, tf.float32),
+ shared_name="foo", name="Q")
+ self.assertTrue(isinstance(q.queue_ref, tf.Tensor))
+ self.assertEquals(tf.string_ref, q.queue_ref.dtype)
+ self.assertProtoEquals("""
+ name:'Q' op:'FIFOQueue'
+ attr { key: 'component_types' value { list {
+ type: DT_INT32 type : DT_FLOAT
+ } } }
+ attr { key: 'shapes' value { list {} } }
+ attr { key: 'capacity' value { i: 5 } }
+ attr { key: 'container' value { s: '' } }
+ attr { key: 'shared_name' value { s: 'foo' } }
+ """, q.queue_ref.op.node_def)
+
+ def testConstructorWithShapes(self):
+ with tf.Graph().as_default():
+ q = tf.FIFOQueue(5, (tf.int32, tf.float32),
+ shapes=(tf.TensorShape([1, 1, 2, 3]),
+ tf.TensorShape([5, 8])), name="Q")
+ self.assertTrue(isinstance(q.queue_ref, tf.Tensor))
+ self.assertEquals(tf.string_ref, q.queue_ref.dtype)
+ self.assertProtoEquals("""
+ name:'Q' op:'FIFOQueue'
+ attr { key: 'component_types' value { list {
+ type: DT_INT32 type : DT_FLOAT
+ } } }
+ attr { key: 'shapes' value { list {
+ shape { dim { size: 1 }
+ dim { size: 1 }
+ dim { size: 2 }
+ dim { size: 3 } }
+ shape { dim { size: 5 }
+ dim { size: 8 } }
+ } } }
+ attr { key: 'capacity' value { i: 5 } }
+ attr { key: 'container' value { s: '' } }
+ attr { key: 'shared_name' value { s: '' } }
+ """, q.queue_ref.op.node_def)
+
+ def testEnqueue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ enqueue_op.run()
+
+ def testEnqueueWithShape(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32, shapes=(3, 2))
+ enqueue_correct_op = q.enqueue(([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],))
+ enqueue_correct_op.run()
+ with self.assertRaises(ValueError):
+ q.enqueue(([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],))
+ self.assertEqual(1, q.size().eval())
+
+ def testEnqueueManyWithShape(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, [tf.int32, tf.int32],
+ shapes=[(), (2,)])
+ q.enqueue_many([[1, 2, 3, 4], [[1, 1], [2, 2], [3, 3], [4, 4]]]).run()
+ self.assertEqual(4, q.size().eval())
+
+ def testParallelEnqueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ # Run one producer thread for each element in elems.
+ def enqueue(enqueue_op):
+ sess.run(enqueue_op)
+ threads = [self.checkedThread(target=enqueue, args=(e,))
+ for e in enqueue_ops]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+
+ # Dequeue every element using a single thread.
+ results = []
+ for _ in xrange(len(elems)):
+ results.append(dequeued_t.eval())
+ self.assertItemsEqual(elems, results)
+
+ def testParallelDequeue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ # Enqueue every element using a single thread.
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ # Run one consumer thread for each element in elems.
+ results = []
+
+ def dequeue():
+ results.append(sess.run(dequeued_t))
+ threads = [self.checkedThread(target=dequeue) for _ in enqueue_ops]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+ self.assertItemsEqual(elems, results)
+
+ def testDequeue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ for i in xrange(len(elems)):
+ vals = dequeued_t.eval()
+ self.assertEqual([elems[i]], vals)
+
+ def testEnqueueAndBlockingDequeue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(3, tf.float32)
+ elems = [10.0, 20.0, 30.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ def enqueue():
+ # The enqueue_ops should run after the dequeue op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ for enqueue_op in enqueue_ops:
+ sess.run(enqueue_op)
+
+ results = []
+
+ def dequeue():
+ for _ in xrange(len(elems)):
+ results.append(sess.run(dequeued_t))
+
+ enqueue_thread = self.checkedThread(target=enqueue)
+ dequeue_thread = self.checkedThread(target=dequeue)
+ enqueue_thread.start()
+ dequeue_thread.start()
+ enqueue_thread.join()
+ dequeue_thread.join()
+
+ for elem, result in zip(elems, results):
+ self.assertEqual([elem], result)
+
+ def testMultiEnqueueAndDequeue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, (tf.int32, tf.float32))
+ elems = [(5, 10.0), (10, 20.0), (15, 30.0)]
+ enqueue_ops = [q.enqueue((x, y)) for x, y in elems]
+ dequeued_t = q.dequeue()
+
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ for i in xrange(len(elems)):
+ x_val, y_val = sess.run(dequeued_t)
+ x, y = elems[i]
+ self.assertEqual([x], x_val)
+ self.assertEqual([y], y_val)
+
+ def testQueueSizeEmpty(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ self.assertEqual([0], q.size().eval())
+
+ def testQueueSizeAfterEnqueueAndDequeue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ dequeued_t = q.dequeue()
+ size = q.size()
+ self.assertEqual([], size.get_shape())
+
+ enqueue_op.run()
+ self.assertEqual(1, size.eval())
+ dequeued_t.op.run()
+ self.assertEqual(0, size.eval())
+
+ def testEnqueueMany(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue()
+ enqueue_op.run()
+ enqueue_op.run()
+
+ for i in range(8):
+ vals = dequeued_t.eval()
+ self.assertEqual([elems[i % 4]], vals)
+
+ def testEmptyEnqueueMany(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ empty_t = tf.constant([], dtype=tf.float32,
+ shape=[0, 2, 3])
+ enqueue_op = q.enqueue_many((empty_t,))
+ size_t = q.size()
+
+ self.assertEqual([0], size_t.eval())
+ enqueue_op.run()
+ self.assertEqual([0], size_t.eval())
+
+ def testEmptyDequeueMany(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32, shapes=())
+ enqueue_op = q.enqueue((10.0,))
+ dequeued_t = q.dequeue_many(0)
+
+ self.assertEqual([], dequeued_t.eval().tolist())
+ enqueue_op.run()
+ self.assertEqual([], dequeued_t.eval().tolist())
+
+ def testEmptyDequeueManyWithNoShape(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ # Expect the operation to fail due to the shape not being constrained.
+ with self.assertRaisesOpError("specified shapes"):
+ q.dequeue_many(0).eval()
+
+ def testMultiEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, (tf.float32, tf.int32))
+ float_elems = [10.0, 20.0, 30.0, 40.0]
+ int_elems = [[1, 2], [3, 4], [5, 6], [7, 8]]
+ enqueue_op = q.enqueue_many((float_elems, int_elems))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+ enqueue_op.run()
+
+ for i in range(8):
+ float_val, int_val = sess.run(dequeued_t)
+ self.assertEqual(float_elems[i % 4], float_val)
+ self.assertAllEqual(int_elems[i % 4], int_val)
+
+ def testDequeueMany(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32, ())
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(4)
+
+ enqueue_op.run()
+
+ self.assertAllEqual(elems[0:4], dequeued_t.eval())
+ self.assertAllEqual(elems[4:8], dequeued_t.eval())
+
+ def testMultiDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, (tf.float32, tf.int32),
+ shapes=((), (2,)))
+ float_elems = [
+ 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ int_elems = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10],
+ [11, 12], [13, 14], [15, 16], [17, 18], [19, 20]]
+ enqueue_op = q.enqueue_many((float_elems, int_elems))
+ dequeued_t = q.dequeue_many(4)
+ dequeued_single_t = q.dequeue()
+
+ enqueue_op.run()
+
+ float_val, int_val = sess.run(dequeued_t)
+ self.assertAllEqual(float_elems[0:4], float_val)
+ self.assertAllEqual(int_elems[0:4], int_val)
+ self.assertEqual(float_val.shape, dequeued_t[0].get_shape())
+ self.assertEqual(int_val.shape, dequeued_t[1].get_shape())
+
+ float_val, int_val = sess.run(dequeued_t)
+ self.assertAllEqual(float_elems[4:8], float_val)
+ self.assertAllEqual(int_elems[4:8], int_val)
+
+ float_val, int_val = sess.run(dequeued_single_t)
+ self.assertAllEqual(float_elems[8], float_val)
+ self.assertAllEqual(int_elems[8], int_val)
+ self.assertEqual(float_val.shape, dequeued_single_t[0].get_shape())
+ self.assertEqual(int_val.shape, dequeued_single_t[1].get_shape())
+
+ def testHighDimension(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.int32, (4, 4, 4, 4))
+ elems = np.array([[[[[x] * 4] * 4] * 4] * 4 for x in range(10)], np.int32)
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(10)
+
+ enqueue_op.run()
+ self.assertAllEqual(dequeued_t.eval(), elems)
+
+ def testParallelEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(1000, tf.float32, shapes=())
+ elems = [10.0 * x for x in range(100)]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(1000)
+
+ # Enqueue 100 items in parallel on 10 threads.
+ def enqueue():
+ sess.run(enqueue_op)
+ threads = [self.checkedThread(target=enqueue) for _ in range(10)]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+
+ self.assertItemsEqual(dequeued_t.eval(), elems * 10)
+
+ def testParallelDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(1000, tf.float32, shapes=())
+ elems = [10.0 * x for x in range(1000)]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(100)
+
+ enqueue_op.run()
+
+ # Dequeue 100 items in parallel on 10 threads.
+ dequeued_elems = []
+
+ def dequeue():
+ dequeued_elems.extend(sess.run(dequeued_t))
+ threads = [self.checkedThread(target=dequeue) for _ in range(10)]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+ self.assertItemsEqual(elems, dequeued_elems)
+
+ def testParallelEnqueueAndDequeue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(50, tf.float32, shapes=())
+ initial_elements = [10.0] * 49
+ q.enqueue_many((initial_elements,)).run()
+
+ enqueue_op = q.enqueue((20.0,))
+ dequeued_t = q.dequeue()
+
+ def enqueue():
+ for _ in xrange(100):
+ sess.run(enqueue_op)
+ def dequeue():
+ for _ in xrange(100):
+ self.assertTrue(sess.run(dequeued_t) in (10.0, 20.0))
+
+ enqueue_threads = [self.checkedThread(target=enqueue) for _ in range(10)]
+ dequeue_threads = [self.checkedThread(target=dequeue) for _ in range(10)]
+ for enqueue_thread in enqueue_threads:
+ enqueue_thread.start()
+ for dequeue_thread in dequeue_threads:
+ dequeue_thread.start()
+ for enqueue_thread in enqueue_threads:
+ enqueue_thread.join()
+ for dequeue_thread in dequeue_threads:
+ dequeue_thread.join()
+
+ # Dequeue the initial count of elements to clean up.
+ cleanup_elems = q.dequeue_many(49).eval()
+ for elem in cleanup_elems:
+ self.assertTrue(elem in (10.0, 20.0))
+
+ def testMixtureOfEnqueueAndEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.int32, shapes=())
+ enqueue_placeholder = tf.placeholder(tf.int32, shape=())
+ enqueue_op = q.enqueue((enqueue_placeholder,))
+ enqueuemany_placeholder = tf.placeholder(
+ tf.int32, shape=(None,))
+ enqueuemany_op = q.enqueue_many((enqueuemany_placeholder,))
+
+ dequeued_t = q.dequeue()
+ close_op = q.close()
+
+ def dequeue():
+ for i in xrange(250):
+ self.assertEqual(i, sess.run(dequeued_t))
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+
+ elements_enqueued = 0
+ while elements_enqueued < 250:
+ # With equal probability, run Enqueue or enqueue_many.
+ if random.random() > 0.5:
+ enqueue_op.run({enqueue_placeholder: elements_enqueued})
+ elements_enqueued += 1
+ else:
+ count = random.randint(0, min(20, 250 - elements_enqueued))
+ range_to_enqueue = range(elements_enqueued, elements_enqueued + count)
+ enqueuemany_op.run({enqueuemany_placeholder: range_to_enqueue})
+ elements_enqueued += count
+
+ close_op.run()
+ dequeue_thread.join()
+ self.assertEqual(0, q.size().eval())
+
+ def testMixtureOfDequeueAndDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.int32, shapes=())
+ enqueue_op = q.enqueue_many((range(250),))
+ dequeued_t = q.dequeue()
+ count_placeholder = tf.placeholder(tf.int32, shape=())
+ dequeuemany_t = q.dequeue_many(count_placeholder)
+
+ def enqueue():
+ sess.run(enqueue_op)
+ enqueue_thread = self.checkedThread(target=enqueue)
+ enqueue_thread.start()
+
+ elements_dequeued = 0
+ while elements_dequeued < 250:
+ # With equal probability, run Dequeue or dequeue_many.
+ if random.random() > 0.5:
+ self.assertEqual(elements_dequeued, dequeued_t.eval())
+ elements_dequeued += 1
+ else:
+ count = random.randint(0, min(20, 250 - elements_dequeued))
+ expected_range = range(elements_dequeued, elements_dequeued + count)
+ self.assertAllEqual(
+ expected_range, dequeuemany_t.eval({count_placeholder: count}))
+ elements_dequeued += count
+
+ q.close().run()
+ enqueue_thread.join()
+ self.assertEqual(0, q.size().eval())
+
+ def testBlockingDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32, ())
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(4)
+
+ dequeued_elems = []
+
+ def enqueue():
+ # The enqueue_op should run after the dequeue op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ sess.run(enqueue_op)
+
+ def dequeue():
+ dequeued_elems.extend(sess.run(dequeued_t).tolist())
+
+ enqueue_thread = self.checkedThread(target=enqueue)
+ dequeue_thread = self.checkedThread(target=dequeue)
+ enqueue_thread.start()
+ dequeue_thread.start()
+ enqueue_thread.join()
+ dequeue_thread.join()
+
+ self.assertAllEqual(elems, dequeued_elems)
+
+ def testDequeueManyWithTensorParameter(self):
+ with self.test_session():
+ # Define a first queue that contains integer counts.
+ dequeue_counts = [random.randint(1, 10) for _ in range(100)]
+ count_q = tf.FIFOQueue(100, tf.int32, ())
+ enqueue_counts_op = count_q.enqueue_many((dequeue_counts,))
+ total_count = sum(dequeue_counts)
+
+ # Define a second queue that contains total_count elements.
+ elems = [random.randint(0, 100) for _ in range(total_count)]
+ q = tf.FIFOQueue(total_count, tf.int32, ())
+ enqueue_elems_op = q.enqueue_many((elems,))
+
+ # Define a subgraph that first dequeues a count, then DequeuesMany
+ # that number of elements.
+ dequeued_t = q.dequeue_many(count_q.dequeue())
+
+ enqueue_counts_op.run()
+ enqueue_elems_op.run()
+
+ dequeued_elems = []
+ for _ in dequeue_counts:
+ dequeued_elems.extend(dequeued_t.eval())
+ self.assertEqual(elems, dequeued_elems)
+
+ def testDequeueFromClosedQueue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+ close_op.run()
+ for elem in elems:
+ self.assertEqual([elem], dequeued_t.eval())
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ dequeued_t.eval()
+
+ def testBlockingDequeueFromClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def dequeue():
+ for elem in elems:
+ self.assertEqual([elem], sess.run(dequeued_t))
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ close_op.run()
+ dequeue_thread.join()
+
+ def testBlockingDequeueFromClosedEmptyQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32)
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ def dequeue():
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ close_op.run()
+ dequeue_thread.join()
+
+ def testBlockingDequeueManyFromClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32, ())
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(4)
+
+ enqueue_op.run()
+
+ def dequeue():
+ self.assertAllEqual(elems, sess.run(dequeued_t))
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ close_op.run()
+ dequeue_thread.join()
+
+ def testEnqueueManyLargerThanCapacityWithConcurrentDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, tf.float32, ())
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(3)
+ cleanup_dequeue_t = q.dequeue()
+
+ def enqueue():
+ sess.run(enqueue_op)
+
+ def dequeue():
+ self.assertAllEqual(elems[0:3], sess.run(dequeued_t))
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(dequeued_t)
+ self.assertEqual(elems[3], sess.run(cleanup_dequeue_t))
+
+ def close():
+ sess.run(close_op)
+
+ enqueue_thread = self.checkedThread(target=enqueue)
+ enqueue_thread.start()
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ close_thread = self.checkedThread(target=close)
+ close_thread.start()
+
+ enqueue_thread.join()
+ dequeue_thread.join()
+ close_thread.join()
+
+ def testClosedBlockingDequeueManyRestoresPartialBatch(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, (tf.float32, tf.float32), ((), ()))
+ elems_a = [1.0, 2.0, 3.0]
+ elems_b = [10.0, 20.0, 30.0]
+ enqueue_op = q.enqueue_many((elems_a, elems_b))
+ dequeued_a_t, dequeued_b_t = q.dequeue_many(4)
+ cleanup_dequeue_a_t, cleanup_dequeue_b_t = q.dequeue()
+ close_op = q.close()
+
+ enqueue_op.run()
+
+ def dequeue():
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run([dequeued_a_t, dequeued_b_t])
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ close_op.run()
+ dequeue_thread.join()
+ # Test that the elements in the partially-dequeued batch are
+ # restored in the correct order.
+ for elem_a, elem_b in zip(elems_a, elems_b):
+ val_a, val_b = sess.run([cleanup_dequeue_a_t, cleanup_dequeue_b_t])
+ self.assertEqual(elem_a, val_a)
+ self.assertEqual(elem_b, val_b)
+ self.assertEqual(0, q.size().eval())
+
+ def testBlockingDequeueManyFromClosedEmptyQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(10, tf.float32, ())
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(4)
+
+ def dequeue():
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ close_op.run()
+ dequeue_thread.join()
+
+ def testEnqueueToClosedQueue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ close_op = q.close()
+
+ enqueue_op.run()
+ close_op.run()
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "is closed"):
+ enqueue_op.run()
+
+ def testEnqueueManyToClosedQueue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(10, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+
+ enqueue_op.run()
+ close_op.run()
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "is closed"):
+ enqueue_op.run()
+
+ def testBlockingEnqueueToFullQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue((50.0,))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ sess.run(blocking_enqueue_op)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+ # The dequeue ops should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ for elem in elems:
+ self.assertEqual([elem], dequeued_t.eval())
+ self.assertEqual([50.0], dequeued_t.eval())
+ thread.join()
+
+ def testBlockingEnqueueManyToFullQueue(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue_many(([50.0, 60.0],))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ sess.run(blocking_enqueue_op)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+ # The dequeue ops should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ for elem in elems:
+ self.assertEqual([elem], dequeued_t.eval())
+ time.sleep(0.01)
+      self.assertEqual([50.0], dequeued_t.eval())
+      self.assertEqual([60.0], dequeued_t.eval())
+      thread.join()
+
+ def testBlockingEnqueueBeforeClose(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue((50.0,))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ # Expect the operation to succeed once the dequeue op runs.
+ sess.run(blocking_enqueue_op)
+ enqueue_thread = self.checkedThread(target=blocking_enqueue)
+ enqueue_thread.start()
+
+ # The close_op should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ def close():
+ sess.run(close_op)
+ close_thread = self.checkedThread(target=close)
+ close_thread.start()
+
+ # The dequeue will unblock both threads.
+ self.assertEqual(10.0, dequeued_t.eval())
+ enqueue_thread.join()
+ close_thread.join()
+
+ for elem in [20.0, 30.0, 40.0, 50.0]:
+ self.assertEqual(elem, dequeued_t.eval())
+ self.assertEqual(0, q.size().eval())
+
+ def testBlockingEnqueueManyBeforeClose(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(4, tf.float32)
+ elems = [10.0, 20.0, 30.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue_many(([50.0, 60.0],))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ sess.run(blocking_enqueue_op)
+ enqueue_thread = self.checkedThread(target=blocking_enqueue)
+ enqueue_thread.start()
+
+ # The close_op should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ def close():
+ sess.run(close_op)
+ close_thread = self.checkedThread(target=close)
+ close_thread.start()
+
+ # The dequeue will unblock both threads.
+ self.assertEqual(10.0, dequeued_t.eval())
+ enqueue_thread.join()
+ close_thread.join()
+ for elem in [20.0, 30.0, 50.0, 60.0]:
+ self.assertEqual(elem, dequeued_t.eval())
+
+ def testDoesNotLoseValue(self):
+ with self.test_session():
+ q = tf.FIFOQueue(1, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ size_t = q.size()
+
+ enqueue_op.run()
+ for _ in range(500):
+ self.assertEqual(size_t.eval(), [1])
+
+ def testSharedQueueSameSession(self):
+ with self.test_session():
+ q1 = tf.FIFOQueue(
+ 1, tf.float32, shared_name="shared_queue")
+ q1.enqueue((10.0,)).run()
+
+ q2 = tf.FIFOQueue(
+ 1, tf.float32, shared_name="shared_queue")
+
+ q1_size_t = q1.size()
+ q2_size_t = q2.size()
+
+ self.assertEqual(q1_size_t.eval(), [1])
+ self.assertEqual(q2_size_t.eval(), [1])
+
+ self.assertEqual(q2.dequeue().eval(), [10.0])
+
+ self.assertEqual(q1_size_t.eval(), [0])
+ self.assertEqual(q2_size_t.eval(), [0])
+
+ q2.enqueue((20.0,)).run()
+
+ self.assertEqual(q1_size_t.eval(), [1])
+ self.assertEqual(q2_size_t.eval(), [1])
+
+ self.assertEqual(q1.dequeue().eval(), [20.0])
+
+ self.assertEqual(q1_size_t.eval(), [0])
+ self.assertEqual(q2_size_t.eval(), [0])
+
+ def testIncompatibleSharedQueueErrors(self):
+ with self.test_session():
+ q_a_1 = tf.FIFOQueue(10, tf.float32, shared_name="q_a")
+ q_a_2 = tf.FIFOQueue(15, tf.float32, shared_name="q_a")
+ q_a_1.queue_ref.eval()
+ with self.assertRaisesOpError("capacity"):
+ q_a_2.queue_ref.eval()
+
+ q_b_1 = tf.FIFOQueue(10, tf.float32, shared_name="q_b")
+ q_b_2 = tf.FIFOQueue(10, tf.int32, shared_name="q_b")
+ q_b_1.queue_ref.eval()
+ with self.assertRaisesOpError("component types"):
+ q_b_2.queue_ref.eval()
+
+ q_c_1 = tf.FIFOQueue(10, tf.float32, shared_name="q_c")
+ q_c_2 = tf.FIFOQueue(
+ 10, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_c")
+ q_c_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_c_2.queue_ref.eval()
+
+ q_d_1 = tf.FIFOQueue(
+ 10, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_d")
+ q_d_2 = tf.FIFOQueue(10, tf.float32, shared_name="q_d")
+ q_d_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_d_2.queue_ref.eval()
+
+ q_e_1 = tf.FIFOQueue(
+ 10, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_e")
+ q_e_2 = tf.FIFOQueue(
+ 10, tf.float32, shapes=[(1, 1, 2, 4)], shared_name="q_e")
+ q_e_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_e_2.queue_ref.eval()
+
+ q_f_1 = tf.FIFOQueue(10, tf.float32, shared_name="q_f")
+ q_f_2 = tf.FIFOQueue(
+ 10, (tf.float32, tf.int32), shared_name="q_f")
+ q_f_1.queue_ref.eval()
+ with self.assertRaisesOpError("component types"):
+ q_f_2.queue_ref.eval()
+
+ def testSelectQueue(self):
+ with self.test_session():
+ num_queues = 10
+ qlist = list()
+ for _ in xrange(num_queues):
+ qlist.append(tf.FIFOQueue(10, tf.float32))
+ # Enqueue/Dequeue into a dynamically selected queue
+ for _ in xrange(20):
+ index = np.random.randint(num_queues)
+ q = tf.FIFOQueue.from_list(index, qlist)
+ q.enqueue((10.,)).run()
+ self.assertEqual(q.dequeue().eval(), 10.0)
+
+ def testSelectQueueOutOfRange(self):
+ with self.test_session():
+ q1 = tf.FIFOQueue(10, tf.float32)
+ q2 = tf.FIFOQueue(15, tf.float32)
+ enq_q = tf.FIFOQueue.from_list(3, [q1, q2])
+ with self.assertRaisesOpError("Index must be in the range"):
+ enq_q.dequeue().eval()
+
+ def _blockingDequeue(self, sess, dequeue_op):
+ with self.assertRaisesOpError("Dequeue operation was cancelled"):
+ sess.run(dequeue_op)
+
+ def _blockingDequeueMany(self, sess, dequeue_many_op):
+ with self.assertRaisesOpError("Dequeue operation was cancelled"):
+ sess.run(dequeue_many_op)
+
+ def _blockingEnqueue(self, sess, enqueue_op):
+ with self.assertRaisesOpError("Enqueue operation was cancelled"):
+ sess.run(enqueue_op)
+
+ def _blockingEnqueueMany(self, sess, enqueue_many_op):
+ with self.assertRaisesOpError("Enqueue operation was cancelled"):
+ sess.run(enqueue_many_op)
+
+ def testResetOfBlockingOperation(self):
+ with self.test_session() as sess:
+ q_empty = tf.FIFOQueue(5, tf.float32, ())
+ dequeue_op = q_empty.dequeue()
+ dequeue_many_op = q_empty.dequeue_many(1)
+
+ q_full = tf.FIFOQueue(5, tf.float32)
+ sess.run(q_full.enqueue_many(([1.0, 2.0, 3.0, 4.0, 5.0],)))
+ enqueue_op = q_full.enqueue((6.0,))
+ enqueue_many_op = q_full.enqueue_many(([6.0],))
+
+ threads = [
+ self.checkedThread(self._blockingDequeue, args=(sess, dequeue_op)),
+ self.checkedThread(self._blockingDequeueMany, args=(sess,
+ dequeue_many_op)),
+ self.checkedThread(self._blockingEnqueue, args=(sess, enqueue_op)),
+ self.checkedThread(self._blockingEnqueueMany, args=(sess,
+ enqueue_many_op))]
+ for t in threads:
+ t.start()
+ time.sleep(0.1)
+ sess.close() # Will cancel the blocked operations.
+ for t in threads:
+ t.join()
+
+ def testBigEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(5, tf.int32, ((),))
+ elem = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ enq = q.enqueue_many((elem,))
+ deq = q.dequeue()
+ size_op = q.size()
+
+ enq_done = []
+ def blocking_enqueue():
+ enq_done.append(False)
+ # This will fill the queue and then block until enough dequeues happen.
+ sess.run(enq)
+ enq_done.append(True)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+
+ # The enqueue should start and then block.
+ results = []
+ results.append(deq.eval()) # Will only complete after the enqueue starts.
+ self.assertEqual(len(enq_done), 1)
+ self.assertEqual(sess.run(size_op), 5)
+
+ for _ in range(3):
+ results.append(deq.eval())
+
+ time.sleep(0.1)
+ self.assertEqual(len(enq_done), 1)
+ self.assertEqual(sess.run(size_op), 5)
+
+ # This dequeue will unblock the thread.
+ results.append(deq.eval())
+ time.sleep(0.1)
+ self.assertEqual(len(enq_done), 2)
+ thread.join()
+
+ for i in range(5):
+ self.assertEqual(size_op.eval(), 5 - i)
+ results.append(deq.eval())
+ self.assertEqual(size_op.eval(), 5 - i - 1)
+
+ self.assertAllEqual(elem, results)
+
+ def testBigDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.FIFOQueue(2, tf.int32, ((),))
+ elem = range(4)
+ enq_list = [q.enqueue((e,)) for e in elem]
+ deq = q.dequeue_many(4)
+
+ results = []
+ def blocking_dequeue():
+ # Will only complete after 4 enqueues complete.
+ results.extend(sess.run(deq))
+ thread = self.checkedThread(target=blocking_dequeue)
+ thread.start()
+ # The dequeue should start and then block.
+ for enq in enq_list:
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ self.assertEqual(len(results), 0)
+ sess.run(enq)
+
+ # Enough enqueued to unblock the dequeue
+ thread.join()
+ self.assertAllEqual(elem, results)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/gather_op_test.py b/tensorflow/python/kernel_tests/gather_op_test.py
new file mode 100644
index 0000000000..39e97531d2
--- /dev/null
+++ b/tensorflow/python/kernel_tests/gather_op_test.py
@@ -0,0 +1,71 @@
+"""Tests for tensorflow.ops.tf.gather."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class GatherTest(tf.test.TestCase):
+
+ def testScalar1D(self):
+ with self.test_session():
+ params = tf.constant([0, 1, 2, 3, 7, 5])
+ indices = tf.constant(4)
+ gather_t = tf.gather(params, indices)
+ gather_val = gather_t.eval()
+ self.assertAllEqual(7, gather_val)
+ self.assertEqual([], gather_t.get_shape())
+
+ def testScalar2D(self):
+ with self.test_session():
+ params = tf.constant([[0, 1, 2], [3, 4, 5], [6, 7, 8],
+ [9, 10, 11], [12, 13, 14]])
+ indices = tf.constant(2)
+ gather_t = tf.gather(params, indices)
+ gather_val = gather_t.eval()
+ self.assertAllEqual([6, 7, 8], gather_val)
+ self.assertEqual([3], gather_t.get_shape())
+
+ def testSimpleTwoD32(self):
+ with self.test_session():
+ params = tf.constant([[0, 1, 2], [3, 4, 5], [6, 7, 8],
+ [9, 10, 11], [12, 13, 14]])
+ indices = tf.constant([0, 4, 0, 2])
+ gather_t = tf.gather(params, indices)
+ gather_val = gather_t.eval()
+ self.assertAllEqual([[0, 1, 2], [12, 13, 14], [0, 1, 2], [6, 7, 8]],
+ gather_val)
+ self.assertEqual([4, 3], gather_t.get_shape())
+
+ def testHigherRank(self):
+ np.random.seed(1)
+ shape = (4, 3, 2)
+ params = np.random.randn(*shape)
+ indices = np.random.randint(shape[0], size=15).reshape(3, 5)
+ with self.test_session():
+ tf_params = tf.constant(params)
+ tf_indices = tf.constant(indices)
+ gather = tf.gather(tf_params, tf_indices)
+ self.assertAllEqual(params[indices], gather.eval())
+ self.assertEqual(indices.shape + params.shape[1:], gather.get_shape())
+ # Test gradients
+ gather_grad = np.random.randn(*gather.get_shape().as_list())
+ params_grad, indices_grad = tf.gradients(gather, [tf_params, tf_indices],
+ gather_grad)
+ self.assertEqual(indices_grad, None)
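+      # Gathering does not differentiate through the integer indices, and the
+      # gradient w.r.t. params only touches the gathered rows, so it is
+      # returned as tf.IndexedSlices rather than a dense Tensor.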
+ self.assertEqual(type(params_grad), tf.IndexedSlices)
+ params_grad = tf.convert_to_tensor(params_grad)
+ correct_params_grad = np.zeros(shape)
+ for i, g in zip(indices.ravel(), gather_grad.reshape((15,) + shape[1:])):
+ correct_params_grad[i] += g
+ self.assertAllEqual(correct_params_grad, params_grad.eval())
+
+ def testUnknownIndices(self):
+ params = tf.constant([[0, 1, 2]])
+ indices = tf.placeholder(tf.int32)
+ gather_t = tf.gather(params, indices)
+ self.assertEqual(None, gather_t.get_shape())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/gradient_checker.py b/tensorflow/python/kernel_tests/gradient_checker.py
new file mode 100644
index 0000000000..fe74768986
--- /dev/null
+++ b/tensorflow/python/kernel_tests/gradient_checker.py
@@ -0,0 +1,251 @@
+"""Gradient checker for any ops, graphs.
+
+The gradient checker verifies numerically that an op/graph properly
+computes the gradients.
+"""
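+# A minimal, illustrative usage sketch (assuming `import tensorflow as tf`);
+# the real entry points are ComputeGradient and ComputeGradientError defined
+# below, and gradient_checker_test.py exercises them properly:
+#
+#   with tf.Session():
+#     x = tf.constant(2.0, shape=(2, 3))
+#     y = tf.square(x)
+#     # Compares the symbolic Jacobian of y w.r.t. x against a central
+#     # finite-difference estimate and returns the max absolute difference.
+#     err = ComputeGradientError(x, (2, 3), y, (2, 3))
+#     assert err < 1e-4
+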
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import gradients
+from tensorflow.python.platform import logging
+
+
+def _Product(t):
+ if isinstance(t, int):
+ return t
+ else:
+ y = 1
+ for x in t:
+ y *= x
+ return y
+
+
+def _ComputeTheoricalJacobian(x, x_shape, x_data, dy, dy_shape, dx):
+ """Computes the theoretical Jacobian for dy/dx.
+
+ Computes the theoretical Jacobian using the ops generated by
+ ComputeGradient().
+
+ Args:
+ x: the tensor "x".
+ x_shape: the dimensions of x as a tuple or an array of ints.
+    x_data: a numpy array as the input data for x
+ dy: the tensor "dy".
+ dy_shape: the dimensions of dy as a tuple or an array of ints.
+ dx: Tensor or IndexedSlices representing dx
+
+ Returns:
+ A 2-d numpy array representing the Jacobian for dy/dx. It has "x_size" rows
+ and "dy_size" columns where "x_size" is the number of elements in x and
+ "dy_size" is the number of elements in dy.
+ """
+  # To compute the jacobian, we treat x and y as one-dimensional vectors
+ x_size = _Product(x_shape)
+ x_val_size = _Product(x_shape[1:]) # This is used for sparse gradients
+ dy_size = _Product(dy_shape)
+
+ jacobian = np.zeros((x_size, dy_size), dtype=x_data.dtype)
+  # For each entry of dy, we set it to 1 and everything else to 0, then
+  # compute the backprop -- this gives us one column of the Jacobian matrix.
+ for col in range(0, dy_size):
+ dy_data = np.zeros(dy_shape, dtype=x_data.dtype)
+ dy_data.flat[col] = 1
+ sess = ops.get_default_session()
+ if isinstance(dx, ops.IndexedSlices):
+ backprop_indices, backprop_values = sess.run(
+ [dx.indices, dx.values], feed_dict={x: x_data, dy: dy_data})
+ for i, v in zip(backprop_indices, backprop_values):
+ r_begin = i * x_val_size
+ r_end = r_begin + x_val_size
+ jacobian[r_begin:r_end, col] += v.flat
+ else:
+ assert isinstance(dx, ops.Tensor), "dx = " + str(dx)
+ backprop = sess.run(dx, feed_dict={x: x_data, dy: dy_data})
+ jacobian[:, col] = backprop.reshape(x_size)
+
+ logging.vlog(1, "Theoretical Jacobian =\n%s", jacobian)
+ return jacobian
+
+
+def _ComputeNumericJacobian(x, x_shape, x_data, y, y_shape, delta):
+ """Computes the numeric Jacobian for dy/dx.
+
+  Computes the numeric Jacobian by slightly perturbing the inputs and
+ measuring the differences on the output.
+
+ Args:
+ x: the tensor "x".
+ x_shape: the dimensions of x as a tuple or an array of ints.
+ x_data: a numpy array as the input data for x
+ y: the tensor "y".
+ y_shape: the dimensions of y as a tuple or an array of ints.
+ delta: the amount of perturbation we give to the input
+
+ Returns:
+ A 2-d numpy array representing the Jacobian for dy/dx. It has "x_size" rows
+ and "y_size" columns where "x_size" is the number of elements in x and
+ "y_size" is the number of elements in y.
+ """
+
+  # To compute the jacobian, we treat x and y as one-dimensional vectors
+ x_size = _Product(x_shape)
+ y_size = _Product(y_shape)
+
+ jacobian = np.zeros((x_size, y_size), dtype=x_data.dtype)
+  # For each entry of x, we slightly perturb it by adding and subtracting
+  # a delta, then compute the difference between the outputs. This gives us
+  # one row of the Jacobian matrix.
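+  # Central difference:
+  #   jacobian[row, :] ~= (y(x + delta*e_row) - y(x - delta*e_row)) / (2*delta)
+  # where e_row perturbs only the row-th element of x.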
+ for row in range(0, x_size):
+ x_pos = x_data.copy()
+ x_pos.flat[row] += delta
+ y_pos = y.eval(feed_dict={x: x_pos})
+ x_neg = x_data.copy()
+ x_neg.flat[row] -= delta
+ y_neg = y.eval(feed_dict={x: x_neg})
+ diff = (y_pos - y_neg) / (2 * delta)
+ jacobian[row, :] = diff.reshape(y_size)
+
+ logging.vlog(1, "Numeric Jacobian =\n%s", jacobian)
+ return jacobian
+
+
+def _ComputeDxAndDy(x, y, y_shape):
+  """Returns a node to compute the gradient of y wrt x."""
+ # We make up a dy so that we can compute the gradients. We don't really use
+ # the value of dy -- we will always feed it. We need to add an identity node
+ # so that we can always feed it properly. Otherwise, for the Add operation,
+ # dx is the same as dy and we cannot fetch the tensor that we are feeding.
+ with x.graph.as_default():
+ dy_orig = constant_op.constant(1.0, shape=y_shape, dtype=y.dtype)
+ dy = array_ops.identity(dy_orig)
+  # We compute the gradients of y with respect to x.
+ grads = gradients.gradients(y, x, dy)
+ assert len(grads) == 1
+ return grads[0], dy_orig
+
+
+def _ComputeGradient(x, x_shape, dx, y, y_shape, dy,
+ x_init_value=None, delta=1e-3):
+ """Computes the theoretical and numerical jacobian."""
+ t = types.as_dtype(x.dtype)
+ allowed_types = [types.float32, types.float64]
+ assert t.base_dtype in allowed_types, "Don't support type %s for x" % t.name
+ t2 = types.as_dtype(y.dtype)
+ assert t2.base_dtype in allowed_types, "Don't support type %s for y" % t2.name
+
+ if x_init_value is not None:
+ i_shape = list(x_init_value.shape)
+ assert(list(x_shape) == i_shape), "x_shape = %s, init_data shape = %s" % (
+ x_shape, i_shape)
+ x_data = x_init_value
+ else:
+ if t == types.float32:
+ dtype = np.float32
+ else:
+ dtype = np.float64
+ x_data = np.asfarray(np.random.random_sample(x_shape), dtype=dtype)
+
+ jacob_t = _ComputeTheoricalJacobian(x, x_shape, x_data, dy, y_shape, dx)
+ jacob_n = _ComputeNumericJacobian(x, x_shape, x_data, y, y_shape, delta)
+ return jacob_t, jacob_n
+
+
+def _ComputeGradientList(
+ x, x_shape, y, y_shape, x_init_value=None, delta=1e-3, init_targets=None):
+ """Compute gradients for a list of x values."""
+ assert isinstance(x, list)
+ dx, dy = zip(*[_ComputeDxAndDy(xi, y, y_shape) for xi in x])
+
+ if init_targets is not None:
+ assert isinstance(init_targets, (list, tuple))
+ for init in init_targets:
+ init.run()
+ if x_init_value is None:
+ x_init_value = [None] * len(x)
+ ret = [_ComputeGradient(xi, x_shapei, dxi, y, y_shape, dyi,
+ x_init_valuei, delta)
+ for xi, x_shapei, dxi, dyi, x_init_valuei in
+ zip(x, x_shape, dx, dy, x_init_value)]
+ return ret
+
+
+def ComputeGradient(
+ x, x_shape, y, y_shape, x_init_value=None, delta=1e-3, init_targets=None):
+ """Computes and returns the theoretical and numerical Jacobian.
+
+ Args:
+ x: a tensor or list of tensors
+ x_shape: the dimensions of x as a tuple or an array of ints. If x is a list,
+ then this is the list of shapes.
+ y: a tensor
+ y_shape: the dimensions of y as a tuple or an array of ints.
+ x_init_value: (optional) a numpy array of the same shape as "x"
+ representing the initial value of x. If x is a list, this should be a list
+ of numpy arrays. If this is none, the function will pick a random tensor
+ as the initial value.
+ delta: (optional) the amount of perturbation.
+ init_targets: list of targets to run to initialize model params.
+ TODO(mrry): remove this argument.
+
+ Returns:
+ Two 2-d numpy arrays representing the theoretical and numerical
+ Jacobian for dy/dx. Each has "x_size" rows and "y_size" columns
+ where "x_size" is the number of elements in x and "y_size" is the
+ number of elements in y. If x is a list, returns a list of two numpy arrays.
+ """
+ if isinstance(x, list):
+ return _ComputeGradientList(x, x_shape, y, y_shape, x_init_value,
+ delta, init_targets)
+ else:
+ if init_targets is not None:
+ assert isinstance(init_targets, (list, tuple))
+ for init in init_targets:
+ init.run()
+ dx, dy = _ComputeDxAndDy(x, y, y_shape)
+ ret = _ComputeGradient(x, x_shape, dx, y, y_shape, dy, x_init_value, delta)
+ return ret
+
+
+def ComputeGradientError(
+ x, x_shape, y, y_shape, x_init_value=None, delta=1e-3, init_targets=None):
+ """Computes the gradient error.
+
+ Computes the maximum error for dy/dx between the computed Jacobian and the
+ numerically estimated Jacobian.
+
+ This function will modify the tensors passed in as it adds more operations
+  and hence changes the consumers of the operations of the input tensors.
+
+  This function adds operations to the default graph. To compute the error
+ using a particular device, such as a GPU, use the standard methods for
+ setting a device (e.g. using with sess.graph.device() or setting a device
+ function in the session constructor).
+
+ Args:
+ x: a tensor or list of tensors
+ x_shape: the dimensions of x as a tuple or an array of ints. If x is a list,
+ then this is the list of shapes.
+ y: a tensor
+ y_shape: the dimensions of y as a tuple or an array of ints.
+ x_init_value: (optional) a numpy array of the same shape as "x"
+ representing the initial value of x. If x is a list, this should be a list
+ of numpy arrays. If this is none, the function will pick a random tensor
+ as the initial value.
+ delta: (optional) the amount of perturbation.
+ init_targets: list of targets to run to initialize model params.
+ TODO(mrry): Remove this argument.
+
+ Returns:
+    The maximum error between the two Jacobians.
+ """
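+  # The reported error is the max of |J_theoretical - J_numeric|, taken
+  # elementwise over every (theoretical, numeric) Jacobian pair returned by
+  # ComputeGradient.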
+ grad = ComputeGradient(x, x_shape, y, y_shape, x_init_value,
+ delta, init_targets)
+ if isinstance(grad, tuple):
+ grad = [grad]
+ return max(np.fabs(j_t - j_n).max() for j_t, j_n in grad)
diff --git a/tensorflow/python/kernel_tests/gradient_checker_test.py b/tensorflow/python/kernel_tests/gradient_checker_test.py
new file mode 100644
index 0000000000..a844b7c637
--- /dev/null
+++ b/tensorflow/python/kernel_tests/gradient_checker_test.py
@@ -0,0 +1,178 @@
+"""Tests for tensorflow.kernels.gradient_checker."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests.gradient_checker import ComputeGradientError
+
+
+class GradientCheckerTest(tf.test.TestCase):
+
+ def testAddSimple(self):
+ with self.test_session(use_gpu=False):
+ # a test case for Add operation
+ size = (2, 3)
+ x1 = tf.constant(2.0, shape=size, name="x1")
+ x2 = tf.constant(3.0, shape=size, name="x2")
+ y = tf.add(x1, x2, name="y")
+
+ # checking gradients for x1
+ error = ComputeGradientError(x1, size, y, size)
+ tf.logging.info("x1 error = %f", error)
+ assert error < 1e-4
+
+ def testAddSimpleGPU(self):
+ with self.test_session(use_gpu=True):
+ # a test case for Add operation
+ size = (2, 3)
+ x1 = tf.constant(2.0, shape=size, name="x1")
+ x2 = tf.constant(3.0, shape=size, name="x2")
+ y = tf.add(x1, x2, name="y")
+
+ # checking gradients for x1
+ error = ComputeGradientError(x1, size, y, size)
+ tf.logging.info("x1 error = %f", error)
+ assert error < 1e-4
+
+ def testAddCustomized(self):
+ with self.test_session():
+ # a test case for Add operation
+ size = (2, 3)
+ x1 = tf.constant(2.0, shape=size, dtype=tf.float64,
+ name="x1")
+ x2 = tf.constant(3.0, shape=size, dtype=tf.float64,
+ name="x2")
+ y = tf.add(x1, x2, name="y")
+
+      # checking gradients for x2 using a special init_value and delta
+ x_init_value = np.asarray(np.arange(6, dtype=np.float64).reshape(2, 3))
+ error = ComputeGradientError(x2, size, y, size, x_init_value=x_init_value,
+ delta=1e-2)
+ tf.logging.info("x2 error = %f", error)
+ assert error < 1e-10
+
+ def testGather(self):
+ with self.test_session():
+ p_shape = (4, 2)
+ p_size = 8
+ index_values = [1, 3]
+ y_shape = [2, 2]
+ params = tf.constant(np.arange(p_size).astype(np.float),
+ shape=p_shape, name="p")
+ indices = tf.constant(index_values, name="i")
+ y = tf.gather(params, indices, name="y")
+
+ error = ComputeGradientError(params, p_shape, y, y_shape)
+ tf.logging.info("gather error = %f", error)
+ assert error < 1e-4
+
+ def testNestedGather(self):
+ with self.test_session():
+ p_shape = (8, 2)
+ p_size = 16
+ index_values = [1, 3, 5, 6]
+ index_values2 = [0, 2]
+ y2_shape = [2, 2]
+
+ params = tf.constant(np.arange(p_size).astype(np.float),
+ shape=p_shape, name="p")
+ indices = tf.constant(index_values, name="i")
+ y = tf.gather(params, indices, name="y")
+ indices2 = tf.constant(index_values2, name="i2")
+ y2 = tf.gather(y, indices2, name="y2")
+
+ error = ComputeGradientError(params, p_shape, y2, y2_shape)
+ tf.logging.info("nested gather error = %f", error)
+ assert error < 1e-4
+
+
+# Gradient checker for MNIST.
+def BuildAndTestMiniMNIST(param_index, tag):
+ # Hyperparameters
+ batch = 3
+ inputs = 16
+ features = 32
+ classes = 10
+
+ # Define the parameters
+ inp_data = np.random.random_sample(inputs * batch)
+ hidden_weight_data = np.random.randn(inputs * features) / np.sqrt(inputs)
+ hidden_bias_data = np.random.random_sample(features)
+ sm_weight_data = np.random.randn(features * classes) / np.sqrt(features)
+ sm_bias_data = np.random.random_sample(classes)
+
+ # special care for labels since they need to be normalized per batch
+ label_data = np.random.random(batch * classes).reshape((batch, classes))
+ s = label_data.sum(axis=1)
+ label_data /= s[:, None]
+
+ with tf.Session():
+ # We treat the inputs as "parameters" here
+ inp = tf.constant(inp_data.tolist(), shape=[batch, inputs],
+ dtype=tf.float64, name="inp")
+ hidden_weight = tf.constant(hidden_weight_data.tolist(),
+ shape=[inputs, features],
+ dtype=tf.float64,
+ name="hidden_weight")
+ hidden_bias = tf.constant(hidden_bias_data.tolist(),
+ shape=[features],
+ dtype=tf.float64,
+ name="hidden_bias")
+ softmax_weight = tf.constant(sm_weight_data.tolist(),
+ shape=[features, classes],
+ dtype=tf.float64,
+ name="softmax_weight")
+ softmax_bias = tf.constant(sm_bias_data.tolist(), shape=[classes],
+ dtype=tf.float64,
+ name="softmax_bias")
+
+    # List all the parameters so that we can test them one at a time
+ all_params = [inp, hidden_weight, hidden_bias, softmax_weight, softmax_bias]
+ param_sizes = [[batch, inputs], # inp
+ [inputs, features], # hidden_weight,
+ [features], # hidden_bias
+ [features, classes], # softmax_weight,
+ [classes]] # softmax_bias
+
+    # Now build the mini MNIST network
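+    # The model is features = relu(inp * hidden_weight + hidden_bias),
+    # logits = features * softmax_weight + softmax_bias, and the cost is the
+    # softmax cross-entropy of the logits against the normalized labels.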
+ features = tf.nn.relu(tf.nn.xw_plus_b(inp, hidden_weight, hidden_bias),
+ name="features")
+ logits = tf.nn.xw_plus_b(features, softmax_weight, softmax_bias,
+ name="logits")
+ labels = tf.constant(label_data.tolist(),
+ shape=[batch, classes],
+ dtype=tf.float64,
+ name="labels")
+ cost = tf.nn.softmax_cross_entropy_with_logits(logits, labels, name="cost")
+
+ # Test the gradients.
+ err = ComputeGradientError(all_params[param_index],
+ param_sizes[param_index],
+ cost, [batch], delta=1e-5)
+
+ tf.logging.info("Mini MNIST: %s gradient error = %g", tag, err)
+ return err
+
+
+class MiniMNISTTest(tf.test.TestCase):
+
+ def testInputGradient(self):
+ self.assertLess(BuildAndTestMiniMNIST(0, "input"), 1e-8)
+
+ def testHiddenWeightGradient(self):
+ self.assertLess(BuildAndTestMiniMNIST(1, "hidden_weight"), 1e-8)
+
+ def testHiddenBiasGradient(self):
+ self.assertLess(BuildAndTestMiniMNIST(2, "hidden_bias"), 1e-8)
+
+ def testSoftmaxWeightGradient(self):
+ self.assertLess(BuildAndTestMiniMNIST(3, "softmax_weight"), 1e-8)
+
+ def testSoftmaxBiasGradient(self):
+ self.assertLess(BuildAndTestMiniMNIST(4, "softmax_bias"), 1e-8)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/identity_op_py_test.py b/tensorflow/python/kernel_tests/identity_op_py_test.py
new file mode 100644
index 0000000000..2209cf08ad
--- /dev/null
+++ b/tensorflow/python/kernel_tests/identity_op_py_test.py
@@ -0,0 +1,47 @@
+"""Tests for IdentityOp."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.ops import gen_array_ops
+
+
+class IdentityOpTest(tf.test.TestCase):
+
+ def testInt32_6(self):
+ with self.test_session():
+ value = tf.identity([1, 2, 3, 4, 5, 6]).eval()
+ self.assertAllEqual(np.array([1, 2, 3, 4, 5, 6]), value)
+
+ def testInt32_2_3(self):
+ with self.test_session():
+ inp = tf.constant([10, 20, 30, 40, 50, 60], shape=[2, 3])
+ value = tf.identity(inp).eval()
+ self.assertAllEqual(np.array([[10, 20, 30], [40, 50, 60]]), value)
+
+ def testString(self):
+ with self.test_session():
+ value = tf.identity(["A", "b", "C", "d", "E", "f"]).eval()
+ self.assertAllEqual(["A", "b", "C", "d", "E", "f"], value)
+
+ def testIdentityShape(self):
+ with self.test_session():
+ shape = [2, 3]
+ array_2x3 = [[1, 2, 3], [6, 5, 4]]
+ tensor = tf.constant(array_2x3)
+ self.assertEquals(shape, tensor.get_shape())
+ self.assertEquals(shape, tf.identity(tensor).get_shape())
+ self.assertEquals(shape, tf.identity(array_2x3).get_shape())
+ self.assertEquals(shape, tf.identity(np.array(array_2x3)).get_shape())
+
+ def testRefIdentityShape(self):
+ with self.test_session():
+ shape = [2, 3]
+ tensor = tf.Variable(tf.constant([[1, 2, 3], [6, 5, 4]], dtype=tf.int32))
+ self.assertEquals(shape, tensor.get_shape())
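+ # _ref_identity is the ref-preserving counterpart of tf.identity, so it can
+ # take the variable's ref tensor directly.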
+ self.assertEquals(shape, gen_array_ops._ref_identity(tensor).get_shape())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/in_topk_op_test.py b/tensorflow/python/kernel_tests/in_topk_op_test.py
new file mode 100644
index 0000000000..d2a51788c4
--- /dev/null
+++ b/tensorflow/python/kernel_tests/in_topk_op_test.py
@@ -0,0 +1,36 @@
+"""Tests for PrecisionOp."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class InTopKTest(tf.test.TestCase):
+
+ def _validateInTopK(self, predictions, target, k, expected):
+ np_ans = np.array(expected)
+ with self.test_session():
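+ # in_top_k yields one bool per row: True iff the target class's score is
+ # among the k largest predictions for that row.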
+ precision = tf.nn.in_top_k(predictions, target, k)
+ out = precision.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, precision)
+
+ def testInTop1(self):
+ predictions = [[0.1, 0.3, 0.2, 0.4], [0.1, 0.2, 0.3, 0.4]]
+ target = [3, 1]
+ self._validateInTopK(predictions, target, 1, [True, False])
+
+ def testInTop2(self):
+ predictions = [[0.1, 0.3, 0.2, 0.4], [0.1, 0.2, 0.3, 0.4]]
+ target = [0, 2]
+ self._validateInTopK(predictions, target, 2, [False, True])
+
+ def testInTop2Tie(self):
+ # Class 2 and 3 tie for 2nd, so both are considered in top 2.
+ predictions = [[0.1, 0.3, 0.2, 0.2], [0.1, 0.3, 0.2, 0.2]]
+ target = [2, 3]
+ self._validateInTopK(predictions, target, 2, [True, True])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/init_ops_test.py b/tensorflow/python/kernel_tests/init_ops_test.py
new file mode 100644
index 0000000000..4ce6081b7b
--- /dev/null
+++ b/tensorflow/python/kernel_tests/init_ops_test.py
@@ -0,0 +1,252 @@
+"""Tests for tensorflow.ops.ops."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.framework import random_seed
+from tensorflow.python.ops import init_ops
+
+
+# Returns True iff the two initializers produce the same tensor to
+# within a tiny tolerance.
+def identicaltest(tc, init1, init2, use_gpu):
+ """Tests if two initializations are identical to within tiny tolerances.
+
+ Args:
+ tc: An instance of TensorFlowTestCase.
+ init1: An Initializer that generates a tensor of a given shape
+ init2: An Initializer that generates a tensor of a given shape
+ use_gpu: Use gpu if true.
+ Returns:
+ True or False as determined by test.
+ """
+ num = 100
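+ # Evaluate each initializer in its own graph and session so the two draws are
+ # generated independently and depend only on the initializer's own seed.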
+ with tc.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t1 = init1([num]).eval()
+ with tc.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t2 = init2([num]).eval()
+ return np.allclose(t1, t2, rtol=1e-15, atol=1e-15)
+
+
+def duplicated_initializer(tc, init, use_gpu, graph_seed):
+ """Tests duplicated random initializer within the same graph.
+
+ This test generates two random kernels from the same initializer in the same
+ graph, and checks if the results are close enough. Even given the same global
+ seed, two different instances of random kernels should generate different
+ results.
+
+ Args:
+ tc: An instance of TensorFlowTestCase.
+ init: An Initializer that generates a tensor of a given shape
+ use_gpu: Use gpu if true.
+ graph_seed: A graph-level seed to use.
+ Returns:
+ True or False as determined by test.
+ """
+ num = 100
+ with tc.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ random_seed.set_random_seed(graph_seed)
+ t1 = init([num]).eval()
+ t2 = init([num]).eval()
+ return np.allclose(t1, t2, rtol=1e-15, atol=1e-15)
+
+
+def _init_sampler(tc, init, num, use_gpu):
+ """Returns a func to generate a random tensor of shape [num].
+
+ Args:
+ tc: An instance of TensorFlowTestCase.
+ init: An Initializer that generates a tensor of a given shape
+ num: Size of 1D tensor to create.
+ use_gpu: Use gpu if true.
+ Returns:
+ Function to generate a random tensor.
+ """
+ def func():
+ with tc.test_session(use_gpu=use_gpu):
+ return init([num]).eval()
+ return func
+
+
+class RandomNormalInitializationTest(tf.test.TestCase):
+
+ def testInitializerIdentical(self):
+ for use_gpu in [False, True]:
+ init1 = tf.random_normal_initializer(0.0, 1.0, seed=1)
+ init2 = tf.random_normal_initializer(0.0, 1.0, seed=1)
+ self.assertTrue(identicaltest(self, init1, init2, use_gpu))
+
+ def testInitializerDifferent(self):
+ for use_gpu in [False, True]:
+ init1 = tf.random_normal_initializer(0.0, 1.0, seed=1)
+ init2 = tf.random_normal_initializer(0.0, 1.0, seed=2)
+ self.assertFalse(identicaltest(self, init1, init2, use_gpu=use_gpu))
+
+ def testDuplicatedInitializer(self):
+ for use_gpu in [False, True]:
+ init = tf.random_normal_initializer(0.0, 1.0)
+ self.assertFalse(duplicated_initializer(self, init, use_gpu, 1))
+
+
+class TruncatedNormalInitializationTest(tf.test.TestCase):
+
+ def testInitializerIdentical(self):
+ for use_gpu in [False, True]:
+ init1 = tf.truncated_normal_initializer(0.0, 1.0, seed=1)
+ init2 = tf.truncated_normal_initializer(0.0, 1.0, seed=1)
+ self.assertTrue(identicaltest(self, init1, init2, use_gpu))
+
+ def testInitializerDifferent(self):
+ for use_gpu in [False, True]:
+ init1 = tf.truncated_normal_initializer(0.0, 1.0, seed=1)
+ init2 = tf.truncated_normal_initializer(0.0, 1.0, seed=2)
+ self.assertFalse(identicaltest(self, init1, init2, use_gpu=use_gpu))
+
+ def testDuplicatedInitializer(self):
+ for use_gpu in [False, True]:
+ init = tf.truncated_normal_initializer(0.0, 1.0)
+ self.assertFalse(duplicated_initializer(self, init, use_gpu, 1))
+
+
+class RandomUniformInitializationTest(tf.test.TestCase):
+
+ def testInitializerIdentical(self):
+ for use_gpu in [False, True]:
+ init1 = tf.random_uniform_initializer(0.0, 1.0, seed=1)
+ init2 = tf.random_uniform_initializer(0.0, 1.0, seed=1)
+ self.assertTrue(identicaltest(self, init1, init2, use_gpu))
+
+ def testInitializerDifferent(self):
+ for use_gpu in [False, True]:
+ init1 = tf.random_uniform_initializer(0.0, 1.0, seed=1)
+ init2 = tf.random_uniform_initializer(0.0, 1.0, seed=2)
+ self.assertFalse(identicaltest(self, init1, init2, use_gpu))
+
+ def testDuplicatedInitializer(self):
+ for use_gpu in [False, True]:
+ init = tf.random_uniform_initializer(0.0, 1.0)
+ self.assertFalse(duplicated_initializer(self, init, use_gpu, 1))
+
+
+class UniformUnitScalingInitializationTest(tf.test.TestCase):
+
+ def testInitializerIdentical(self):
+ for use_gpu in [False, True]:
+ init1 = tf.uniform_unit_scaling_initializer(seed=1)
+ init2 = tf.uniform_unit_scaling_initializer(seed=1)
+ self.assertTrue(identicaltest(self, init1, init2, use_gpu))
+ init3 = tf.uniform_unit_scaling_initializer(1.5, seed=1)
+ init4 = tf.uniform_unit_scaling_initializer(1.5, seed=1)
+ self.assertTrue(identicaltest(self, init3, init4, use_gpu))
+
+ def testInitializerDifferent(self):
+ for use_gpu in [False, True]:
+ init1 = tf.uniform_unit_scaling_initializer(seed=1)
+ init2 = tf.uniform_unit_scaling_initializer(seed=2)
+ init3 = tf.uniform_unit_scaling_initializer(1.5, seed=1)
+ self.assertFalse(identicaltest(self, init1, init2, use_gpu))
+ self.assertFalse(identicaltest(self, init1, init3, use_gpu))
+ self.assertFalse(identicaltest(self, init2, init3, use_gpu))
+
+ def testDuplicatedInitializer(self):
+ for use_gpu in [False, True]:
+ init = tf.uniform_unit_scaling_initializer()
+ self.assertFalse(duplicated_initializer(self, init, use_gpu, 1))
+
+
+class RandomWalkShapeTest(tf.test.TestCase):
+
+ def testRandomWalk(self):
+ # Fully known shape.
+ rnd1 = init_ops._random_walk([1, 2], tf.nn.relu)
+ self.assertEqual([1, 2], rnd1.get_shape())
+
+
+# TODO(vrv): move to sequence_ops_test?
+class RangeTest(tf.test.TestCase):
+
+ def _Range(self, start, limit, delta):
+ with self.test_session():
+ tf_ans = tf.range(start, limit, delta, name="range")
+ self.assertEqual([len(range(start, limit, delta))], tf_ans.get_shape())
+ return tf_ans.eval()
+
+ def testBasic(self):
+ self.assertTrue(np.array_equal(
+ self._Range(0, 5, 1), np.array([0, 1, 2, 3, 4])))
+ self.assertTrue(np.array_equal(
+ self._Range(0, 5, 2), np.array([0, 2, 4])))
+ self.assertTrue(np.array_equal(
+ self._Range(0, 6, 2), np.array([0, 2, 4])))
+ self.assertTrue(np.array_equal(
+ self._Range(13, 32, 7), np.array([13, 20, 27])))
+ self.assertTrue(np.array_equal(
+ self._Range(100, 500, 100), np.array([100, 200, 300, 400])))
+ self.assertEqual(tf.range(0, 5, 1).dtype, tf.int32)
+
+ def testEmpty(self):
+ for start in 0, 5:
+ self.assertTrue(np.array_equal(self._Range(start, start, 1), []))
+
+
+# TODO(vrv): move to sequence_ops_test?
+class LinSpaceTest(tf.test.TestCase):
+
+ def _LinSpace(self, start, stop, num):
+ with self.test_session():
+ tf_ans = tf.linspace(start, stop, num, name="linspace")
+ self.assertEqual([num], tf_ans.get_shape())
+ return tf_ans.eval()
+
+ def testPositive(self):
+ self.assertArrayNear(self._LinSpace(1., 5., 1), np.array([1.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(1., 5., 2), np.array([1., 5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(1., 5., 3),
+ np.array([1., 3., 5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(1., 5., 4),
+ np.array([1., 7. / 3., 11. / 3., 5.]), 1e-5)
+
+ def testNegative(self):
+ self.assertArrayNear(self._LinSpace(-1., -5., 1), np.array([-1.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., -5., 2),
+ np.array([-1., -5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., -5., 3),
+ np.array([-1., -3., -5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., -5., 4),
+ np.array([-1., -7. / 3., -11. / 3., -5.]), 1e-5)
+
+ def testNegativeToPositive(self):
+ self.assertArrayNear(self._LinSpace(-1., 5., 1), np.array([-1.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., 5., 2), np.array([-1., 5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., 5., 3),
+ np.array([-1., 2., 5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(-1., 5., 4),
+ np.array([-1., 1., 3., 5.]), 1e-5)
+
+ def testPoint(self):
+ self.assertArrayNear(self._LinSpace(5., 5., 1), np.array([5.]), 1e-5)
+ self.assertArrayNear(self._LinSpace(5., 5., 2), np.array([5.] * 2), 1e-5)
+ self.assertArrayNear(self._LinSpace(5., 5., 3), np.array([5.] * 3), 1e-5)
+ self.assertArrayNear(self._LinSpace(5., 5., 4), np.array([5.] * 4), 1e-5)
+
+
+class DeviceTest(tf.test.TestCase):
+
+ def testNoDevice(self):
+ with tf.Graph().as_default():
+ var = tf.Variable([[1.0, 1.0]])
+ self.assertEqual(None, var.device)
+ self.assertEqual(None, var.initializer.device)
+
+ def testDevice(self):
+ with tf.Graph().as_default():
+ with tf.device("/job:ps"):
+ var = tf.Variable([[1.0, 1.0]])
+ self.assertEqual("/job:ps", var.device)
+ self.assertEqual("/job:ps", var.initializer.device)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/io_ops_test.py b/tensorflow/python/kernel_tests/io_ops_test.py
new file mode 100644
index 0000000000..2eb8bdd26f
--- /dev/null
+++ b/tensorflow/python/kernel_tests/io_ops_test.py
@@ -0,0 +1,53 @@
+"""Tests for tensorflow.python.ops.io_ops."""
+# -*- coding: utf-8 -*-
+
+import tempfile
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class IoOpsTest(tf.test.TestCase):
+
+ def testReadFile(self):
+ cases = ['', 'Some contents', 'Неки садржаји на српском']
+ for contents in cases:
+ temp = tempfile.NamedTemporaryFile(prefix='ReadFileTest')
+ open(temp.name, 'wb').write(contents)
+ with self.test_session():
+ read = tf.read_file(temp.name)
+ self.assertEqual([], read.get_shape())
+ self.assertEqual(read.eval(), contents)
+
+ def _subset(self, files, indices):
+ return set([files[i].name for i in range(len(files)) if i in indices])
+
+ def testMatchingFiles(self):
+ cases = ['ABcDEF.GH', 'ABzDEF.GH', 'ABasdfjklDEF.GH', 'AB3DEF.GH',
+ 'AB4DEF.GH', 'ABDEF.GH', 'XYZ']
+ files = [tempfile.NamedTemporaryFile(prefix=c) for c in cases]
+
+ with self.test_session():
+ # Test exact match without wildcards.
+ for f in files:
+ self.assertEqual(tf.matching_files(f.name).eval(), f.name)
+
+ # We will look for files matching "ABxDEF.GH*" where "x" is some wildcard.
+ pos = files[0].name.find(cases[0])
+ pattern = files[0].name[:pos] + 'AB%sDEF.GH*'
+
+ self.assertEqual(set(tf.matching_files(pattern % 'z').eval()),
+ self._subset(files, [1]))
+ self.assertEqual(set(tf.matching_files(pattern % '?').eval()),
+ self._subset(files, [0, 1, 3, 4]))
+ self.assertEqual(set(tf.matching_files(pattern % '*').eval()),
+ self._subset(files, [0, 1, 2, 3, 4, 5]))
+ self.assertEqual(set(tf.matching_files(pattern % '[cxz]').eval()),
+ self._subset(files, [0, 1]))
+ self.assertEqual(set(tf.matching_files(pattern % '[0-9]').eval()),
+ self._subset(files, [3, 4]))
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/linalg_grad_test.py b/tensorflow/python/kernel_tests/linalg_grad_test.py
new file mode 100644
index 0000000000..50e5328c3e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/linalg_grad_test.py
@@ -0,0 +1,49 @@
+"""Tests for tensorflow.ops.linalg_grad."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class MatrixInverseGradientTest(tf.test.TestCase):
+ pass # Filled in below
+
+def _GetMatrixInverseGradientTest(dtype, shape):
+ def Test(self):
+ with self.test_session():
+ np.random.seed(1)
+ m = np.random.uniform(low=1.0, high=100.0, size=np.prod(shape)).reshape(
+ shape).astype(dtype)
+ a = tf.constant(m)
+ epsilon = np.finfo(dtype).eps
+ # Optimal stepsize for central difference is O(epsilon^{1/3}).
+ delta = epsilon ** (1.0 / 3.0)
+ tol = 1e-3
+
+ if len(shape) == 2:
+ ainv = tf.matrix_inverse(a)
+ else:
+ ainv = tf.batch_matrix_inverse(a)
+
+ theoretical, numerical = gc.ComputeGradient(a, shape, ainv, shape,
+ delta=delta)
+ self.assertAllClose(theoretical, numerical, atol=tol, rtol=tol)
+ return Test
+
+
+if __name__ == "__main__":
+ # TODO(rmlarsen,irving): Reenable float32 once tolerances are fixed
+ # The test used to loop over (np.float, np.double), both of which are float64.
+ for dtype in np.float64,:
+ for size in 2, 3, 5, 10:
+ # We skip the rank 4, size 10 case: it is slow and conceptually covered
+ # by the other cases.
+ for extra in [(), (2,), (3,)] + [(3, 2)] * (size < 10):
+ shape = extra + (size, size)
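+ # Attach one generated test method per (dtype, shape) combination.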
+ name = '%s_%s' % (dtype.__name__, '_'.join(map(str, shape)))
+ setattr(MatrixInverseGradientTest, 'testMatrixInverseGradient_' + name,
+ _GetMatrixInverseGradientTest(dtype, shape))
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/listdiff_op_test.py b/tensorflow/python/kernel_tests/listdiff_op_test.py
new file mode 100644
index 0000000000..b4607be1fb
--- /dev/null
+++ b/tensorflow/python/kernel_tests/listdiff_op_test.py
@@ -0,0 +1,117 @@
+"""Tests for tensorflow.kernels.listdiff_op."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class ListDiffTest(tf.test.TestCase):
+
+ def _testListDiff(self, x, y, out, idx, dtype=np.int32):
+ x = np.array(x, dtype=dtype)
+ y = np.array(y, dtype=dtype)
+ out = np.array(out, dtype=dtype)
+ idx = np.array(idx, dtype=dtype)
+
+ with self.test_session() as sess:
+ x_tensor = tf.convert_to_tensor(x)
+ y_tensor = tf.convert_to_tensor(y)
+ out_tensor, idx_tensor = tf.listdiff(x_tensor, y_tensor)
+ tf_out, tf_idx = sess.run([out_tensor, idx_tensor])
+
+ self.assertAllEqual(tf_out, out)
+ self.assertAllEqual(tf_idx, idx)
+ self.assertEqual(1, out_tensor.get_shape().ndims)
+ self.assertEqual(1, idx_tensor.get_shape().ndims)
+
+ def testBasic1(self):
+ x = [1, 2, 3, 4]
+ y = [1, 2]
+ out = [3, 4]
+ idx = [2, 3]
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testListDiff(x, y, out, idx, dtype=t)
+
+ def testBasic2(self):
+ x = [1, 2, 3, 4]
+ y = [2]
+ out = [1, 3, 4]
+ idx = [0, 2, 3]
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testListDiff(x, y, out, idx, dtype=t)
+
+ def testBasic3(self):
+ x = [1, 4, 3, 2]
+ y = [4, 2]
+ out = [1, 3]
+ idx = [0, 2]
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testListDiff(x, y, out, idx, dtype=t)
+
+ def testDuplicates(self):
+ x = [1, 2, 4, 3, 2, 3, 3, 1]
+ y = [4, 2]
+ out = [1, 3, 3, 3, 1]
+ idx = [0, 3, 5, 6, 7]
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testListDiff(x, y, out, idx, dtype=t)
+
+ def testRandom(self):
+ num_random_tests = 10
+ int_low = -7
+ int_high = 8
+ max_size = 50
+ for _ in xrange(num_random_tests):
+ x_size = np.random.randint(max_size + 1)
+ x = np.random.randint(int_low, int_high, size=x_size)
+ y_size = np.random.randint(max_size + 1)
+ y = np.random.randint(int_low, int_high, size=y_size)
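+ # Compute the expected result in pure Python: entries of x that do not
+ # appear in y, together with their positions in x.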
+ out_idx = [(entry, pos) for pos, entry in enumerate(x) if entry not in y]
+ if out_idx:
+ out_idx = map(list, zip(*out_idx))
+ out = out_idx[0]
+ idx = out_idx[1]
+ else:
+ out = []
+ idx = []
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testListDiff(x, y, out, idx, dtype=t)
+
+ def testInt32FullyOverlapping(self):
+ x = [1, 2, 3, 4]
+ y = [1, 2, 3, 4]
+ out = []
+ idx = []
+ self._testListDiff(x, y, out, idx)
+
+ def testInt32NonOverlapping(self):
+ x = [1, 2, 3, 4]
+ y = [5, 6]
+ out = x
+ idx = range(len(x))
+ self._testListDiff(x, y, out, idx)
+
+ def testInt32EmptyX(self):
+ x = []
+ y = [1, 2]
+ out = []
+ idx = []
+ self._testListDiff(x, y, out, idx)
+
+ def testInt32EmptyY(self):
+ x = [1, 2, 3, 4]
+ y = []
+ out = x
+ idx = range(len(x))
+ self._testListDiff(x, y, out, idx)
+
+ def testInt32EmptyXY(self):
+ x = []
+ y = []
+ out = []
+ idx = []
+ self._testListDiff(x, y, out, idx)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/logging_ops_test.py b/tensorflow/python/kernel_tests/logging_ops_test.py
new file mode 100644
index 0000000000..18ca441b23
--- /dev/null
+++ b/tensorflow/python/kernel_tests/logging_ops_test.py
@@ -0,0 +1,50 @@
+"""Tests for tensorflow.kernels.logging_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class LoggingOpsTest(tf.test.TestCase):
+
+ def testAssertDivideByZero(self):
+ with self.test_session() as sess:
+ epsilon = tf.convert_to_tensor(1e-20)
+ x = tf.convert_to_tensor(0.0)
+ y = tf.convert_to_tensor(1.0)
+ z = tf.convert_to_tensor(2.0)
+ # assert(epsilon < y)
+ # z / y
+ with sess.graph.control_dependencies(
+ [tf.Assert(tf.less(epsilon, y), ["Divide-by-zero"])]):
+ out = tf.div(z, y)
+ self.assertAllEqual(2.0, out.eval())
+ # assert(epsilon < x)
+ # z / x
+ #
+ # This tests printing out multiple tensors
+ with sess.graph.control_dependencies(
+ [tf.Assert(tf.less(epsilon, x),
+ ["Divide-by-zero", "less than x"])]):
+ out = tf.div(z, x)
+ with self.assertRaisesOpError("less than x"):
+ out.eval()
+
+
+class PrintGradientTest(tf.test.TestCase):
+
+ def testPrintGradient(self):
+ with self.test_session():
+ inp = tf.constant(2.0, shape=[100, 32], name="in")
+ w = tf.constant(4.0, shape=[10, 100], name="w")
+ wx = tf.matmul(w, inp, name="wx")
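+ # tf.Print passes its input through unchanged, so its gradient should equal
+ # the gradient of wx itself.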
+ wx_print = tf.Print(wx, [w, w, w])
+ wx_grad = tf.gradients(wx, w)[0]
+ wx_print_grad = tf.gradients(wx_print, w)[0]
+ wxg = wx_grad.eval()
+ wxpg = wx_print_grad.eval()
+ self.assertAllEqual(wxg, wxpg)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/lookup_table_op_test.py b/tensorflow/python/kernel_tests/lookup_table_op_test.py
new file mode 100644
index 0000000000..cd170876e6
--- /dev/null
+++ b/tensorflow/python/kernel_tests/lookup_table_op_test.py
@@ -0,0 +1,195 @@
+"""Tests for lookup table ops from tf."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class HashTableOpTest(tf.test.TestCase):
+
+ def testHashTable(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2], tf.int64)
+ init = table.initialize_from(keys, values)
+ init.run()
+ self.assertAllEqual(3, table.size().eval())
+
+ input_string = tf.constant(['brain', 'salad', 'tank'])
+ output = table.lookup(input_string)
+
+ result = output.eval()
+ self.assertAllEqual([0, 1, -1], result)
+
+ def testHashTableInitWithPythonArrays(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+ # Empty table.
+ self.assertAllEqual(0, table.size().eval())
+
+ # Initialize with keys and values tensors.
+ keys = ['brain', 'salad', 'surgery']
+ values = [0, 1, 2]
+ init = table.initialize_from(keys, values)
+ init.run()
+ self.assertAllEqual(3, table.size().eval())
+
+ input_string = tf.constant(['brain', 'salad', 'tank'])
+ output = table.lookup(input_string)
+
+ result = output.eval()
+ self.assertAllEqual([0, 1, -1], result)
+
+ def testHashTableInitWithNumPyArrays(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = np.array(['brain', 'salad', 'surgery'], dtype=np.str)
+ values = np.array([0, 1, 2], dtype=np.int64)
+ init = table.initialize_from(keys, values)
+ init.run()
+ self.assertAllEqual(3, table.size().eval())
+
+ input_string = tf.constant(['brain', 'salad', 'tank'])
+ output = table.lookup(input_string)
+
+ result = output.eval()
+ self.assertAllEqual([0, 1, -1], result)
+
+ def testMultipleHashTables(self):
+ with self.test_session() as sess:
+ shared_name = ''
+ default_val = -1
+ table1 = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+ table2 = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+ table3 = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2], tf.int64)
+ table1.initialize_from(keys, values)
+ table2.initialize_from(keys, values)
+ table3.initialize_from(keys, values)
+
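+ # initialize_all_tables returns a single op that runs every table
+ # initializer in the graph.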
+ tf.initialize_all_tables().run()
+ self.assertAllEqual(3, table1.size().eval())
+ self.assertAllEqual(3, table2.size().eval())
+ self.assertAllEqual(3, table3.size().eval())
+
+ input_string = tf.constant(['brain', 'salad', 'tank'])
+ output1 = table1.lookup(input_string)
+ output2 = table2.lookup(input_string)
+ output3 = table3.lookup(input_string)
+
+ out1, out2, out3 = sess.run([output1, output2, output3])
+ self.assertAllEqual([0, 1, -1], out1)
+ self.assertAllEqual([0, 1, -1], out2)
+ self.assertAllEqual([0, 1, -1], out3)
+
+ def testHashTableWithTensorDefault(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = tf.constant(-1, tf.int64)
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2], tf.int64)
+ init = table.initialize_from(keys, values)
+ init.run()
+
+ input_string = tf.constant(['brain', 'salad', 'tank'])
+ output = table.lookup(input_string)
+
+ result = output.eval()
+ self.assertAllEqual([0, 1, -1], result)
+
+ def testSignatureMismatch(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2], tf.int64)
+ init = table.initialize_from(keys, values)
+ init.run()
+
+ input_string = tf.constant([1, 2, 3], tf.int64)
+ with self.assertRaises(TypeError):
+ table.lookup(input_string)
+
+ with self.assertRaises(TypeError):
+ tf.HashTable(tf.string, tf.int64, 'UNK', shared_name)
+
+ def testDTypes(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ with self.assertRaises(TypeError):
+ tf.HashTable([tf.string], tf.string, default_val, shared_name)
+
+ def testNotInitialized(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ input_string = tf.constant(['brain', 'salad', 'surgery'])
+ output = table.lookup(input_string)
+
+ with self.assertRaisesOpError('Table not initialized'):
+ output.eval()
+
+ def testInitializeTwice(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2], tf.int64)
+ init = table.initialize_from(keys, values)
+ init.run()
+
+ with self.assertRaisesOpError('Table already initialized'):
+ init.run()
+
+ def testInitializationWithInvalidDimensions(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = tf.constant(['brain', 'salad', 'surgery'])
+ values = tf.constant([0, 1, 2, 3, 4], tf.int64)
+ with self.assertRaises(ValueError):
+ table.initialize_from(keys, values)
+
+ def testInitializationWithInvalidDataTypes(self):
+ with self.test_session():
+ shared_name = ''
+ default_val = -1
+ table = tf.HashTable(tf.string, tf.int64, default_val, shared_name)
+
+ # Initialize with keys and values tensors.
+ keys = [0, 1, 2]
+ values = ['brain', 'salad', 'surgery']
+ with self.assertRaises(TypeError):
+ table.initialize_from(keys, values)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/lrn_op_test.py b/tensorflow/python/kernel_tests/lrn_op_test.py
new file mode 100644
index 0000000000..7a3bb67938
--- /dev/null
+++ b/tensorflow/python/kernel_tests/lrn_op_test.py
@@ -0,0 +1,101 @@
+"""Tests for local response normalization."""
+import copy
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests.gradient_checker import ComputeGradientError
+
+
+
+class LRNOpTest(tf.test.TestCase):
+
+ def _LRN(self, input_image, lrn_depth_radius=5, bias=1.0,
+ alpha=1.0, beta=0.5):
+ """Compute expected result."""
+ output = copy.deepcopy(input_image)
+ batch_size = input_image.shape[0]
+ rows = input_image.shape[1]
+ cols = input_image.shape[2]
+ depth = input_image.shape[3]
+ for b in range(batch_size):
+ for r in range(rows):
+ for c in range(cols):
+ for d in range(depth):
+ begin = max(0, d - lrn_depth_radius)
+ end = min(depth, d + lrn_depth_radius + 1)
+ patch = input_image[b, r, c, begin:end]
+ output[b, r, c, d] /= (
+ np.power(bias + alpha * np.sum(patch * patch), beta))
+ return output
+
+ def _RunAndVerify(self):
+ with self.test_session():
+ # random shape
+ shape = np.random.randint(1, 16, size=4)
+ # Make depth at least 2 to make it meaningful
+ shape[3] += 1
+ p = tf.placeholder(tf.float32, shape=shape)
+ # random depth_radius, bias, alpha, beta
+ lrn_depth_radius = np.random.randint(1, shape[3])
+ bias = 1.0 + np.random.rand()
+ alpha = 2.0 * np.random.rand()
+ beta = 2.0 * np.random.rand()
+ lrn_t = tf.nn.local_response_normalization(
+ p, name="lrn", depth_radius=lrn_depth_radius, bias=bias,
+ alpha=alpha, beta=beta)
+ params = {p: np.random.rand(*shape).astype("f")}
+ result = lrn_t.eval(feed_dict=params)
+ expected = self._LRN(
+ params[p], lrn_depth_radius=lrn_depth_radius, bias=bias, alpha=alpha,
+ beta=beta)
+ self.assertTrue(np.amax(np.abs(result - expected)) < 1e-4)
+ self.assertShapeEqual(expected, lrn_t)
+
+ def testCompute(self):
+ for _ in range(2):
+ self._RunAndVerify()
+
+ def testGradientsZeroInput(self):
+ with self.test_session():
+ shape = [4, 4, 4, 4]
+ p = tf.placeholder(tf.float32, shape=shape)
+ inp_array = np.zeros(shape).astype("f")
+ lrn_op = tf.nn.local_response_normalization(p, 2, 1.0, 0.0,
+ 1.0, name="lrn")
+ grad = tf.gradients([lrn_op], [p])[0]
+ params = {p: inp_array}
+ r = grad.eval(feed_dict=params)
+ expected = np.ones(shape).astype("f")
+ self.assertAllClose(r, expected)
+ self.assertShapeEqual(expected, grad)
+
+ def _RunAndVerifyGradients(self):
+ with self.test_session():
+ # random shape
+ shape = np.random.randint(1, 5, size=4)
+ # Make depth at least 2 to make it meaningful
+ shape[3] += 1
+ # random depth_radius, bias, alpha, beta
+ lrn_depth_radius = np.random.randint(1, shape[3])
+ bias = 1.0 + np.random.rand()
+ alpha = 1.0 * np.random.rand()
+ beta = 1.0 * np.random.rand()
+ inp_array = np.random.rand(*shape).astype("f")
+ inp = tf.constant(list(inp_array.ravel(order="C")), shape=shape)
+ lrn_op = tf.nn.local_response_normalization(
+ inp, name="lrn", depth_radius=lrn_depth_radius, bias=bias,
+ alpha=alpha, beta=beta)
+ err = ComputeGradientError(inp, shape, lrn_op, shape)
+ print "LRN Gradient error ", err
+ self.assertLess(err, 1e-4)
+
+ def testGradients(self):
+ for _ in range(2):
+ self._RunAndVerifyGradients()
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/matmul_op_test.py b/tensorflow/python/kernel_tests/matmul_op_test.py
new file mode 100644
index 0000000000..5aeb736b9b
--- /dev/null
+++ b/tensorflow/python/kernel_tests/matmul_op_test.py
@@ -0,0 +1,206 @@
+"""Tests for tensorflow.ops.math_ops.matmul."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class MatMulTest(tf.test.TestCase):
+
+ def _testCpuMatmul(self, x, y, transpose_x=False, transpose_y=False):
+ x_mat = np.matrix(x).T if transpose_x else np.matrix(x)
+ y_mat = np.matrix(y).T if transpose_y else np.matrix(y)
+ np_ans = x_mat * y_mat
+ with self.test_session(use_gpu=False):
+ tf_ans = tf.matmul(x, y, transpose_x, transpose_y).eval()
+ self.assertAllClose(np_ans, tf_ans)
+ self.assertAllEqual(np_ans.shape, tf_ans.shape)
+
+ def _testGpuMatmul(self, x, y, transpose_x=False, transpose_y=False):
+ x_mat = np.matrix(x).T if transpose_x else np.matrix(x)
+ y_mat = np.matrix(y).T if transpose_y else np.matrix(y)
+ np_ans = x_mat * y_mat
+ with self.test_session(use_gpu=True):
+ tf_ans = tf.matmul(x, y, transpose_x, transpose_y).eval()
+ self.assertAllClose(np_ans, tf_ans)
+ self.assertAllEqual(np_ans.shape, tf_ans.shape)
+
+ def _randMatrix(self, rows, cols, dtype):
+ if dtype is np.complex64:
+ real = self._randMatrix(rows, cols, np.float32)
+ imag = self._randMatrix(rows, cols, np.float32)
+ return real + np.complex(0, 1) * imag
+ else:
+ return np.random.uniform(low=1.0, high=100.0, size=rows * cols).reshape(
+ [rows, cols]).astype(dtype)
+
+ # Basic test:
+ # [ [1],
+ # [2],
+ # [3], * [1, 2]
+ # [4] ]
+ def testFloatBasic(self):
+ x = np.arange(1., 5.).reshape([4, 1]).astype(np.float32)
+ y = np.arange(1., 3.).reshape([1, 2]).astype(np.float32)
+ self._testCpuMatmul(x, y)
+ self._testGpuMatmul(x, y)
+
+ def testDoubleBasic(self):
+ x = np.arange(1., 5.).reshape([4, 1]).astype(np.float64)
+ y = np.arange(1., 3.).reshape([1, 2]).astype(np.float64)
+ self._testCpuMatmul(x, y)
+
+ def testInt32Basic(self):
+ x = np.arange(1., 5.).reshape([4, 1]).astype(np.int32)
+ y = np.arange(1., 3.).reshape([1, 2]).astype(np.int32)
+ self._testCpuMatmul(x, y)
+
+ def testSComplexBasic(self):
+ x = np.arange(1., 5.).reshape([4, 1]).astype(np.complex64)
+ y = np.arange(1., 3.).reshape([1, 2]).astype(np.complex64)
+ self._testCpuMatmul(x, y)
+
+ # Tests testing random sized matrices.
+ def testFloatRandom(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(n, k, np.float32)
+ y = self._randMatrix(k, m, np.float32)
+ self._testCpuMatmul(x, y)
+ self._testGpuMatmul(x, y)
+
+ def testDoubleRandom(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(n, k, np.float64)
+ y = self._randMatrix(k, m, np.float64)
+ self._testCpuMatmul(x, y)
+
+ def testInt32Random(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(n, k, np.int32)
+ y = self._randMatrix(k, m, np.int32)
+ self._testCpuMatmul(x, y)
+
+ def testSComplexRandom(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(n, k, np.complex64)
+ y = self._randMatrix(k, m, np.complex64)
+ self._testCpuMatmul(x, y)
+
+ # Test the cases that transpose the matrices before multiplying.
+ # NOTE(keveman): The cases where only one of the inputs is
+ # transposed are covered by tf.matmul's gradient function.
+ def testFloatRandomTransposeBoth(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(k, n, np.float32)
+ y = self._randMatrix(m, k, np.float32)
+ self._testCpuMatmul(x, y, True, True)
+ self._testGpuMatmul(x, y, True, True)
+
+ def testDoubleRandomTransposeBoth(self):
+ for _ in range(10):
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = self._randMatrix(k, n, np.float64)
+ y = self._randMatrix(m, k, np.float64)
+ self._testCpuMatmul(x, y, True, True)
+
+ def testMatMul_OutEmpty_A(self):
+ n, k, m = 0, 8, 3
+ x = self._randMatrix(n, k, np.float32)
+ y = self._randMatrix(k, m, np.float32)
+ self._testCpuMatmul(x, y)
+ self._testGpuMatmul(x, y)
+
+ def testMatMul_OutEmpty_B(self):
+ n, k, m = 3, 8, 0
+ x = self._randMatrix(n, k, np.float32)
+ y = self._randMatrix(k, m, np.float32)
+ self._testCpuMatmul(x, y)
+ self._testGpuMatmul(x, y)
+
+ def testMatMul_Inputs_Empty(self):
+ n, k, m = 3, 0, 4
+ x = self._randMatrix(n, k, np.float32)
+ y = self._randMatrix(k, m, np.float32)
+ self._testCpuMatmul(x, y)
+ self._testGpuMatmul(x, y)
+
+
+# TODO(zhifengc): Figures out how to test matmul gradients on GPU.
+class MatMulGradientTest(tf.test.TestCase):
+
+ def testGradientInput0(self):
+ with self.test_session(use_gpu=False):
+ x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2],
+ dtype=tf.float64, name="x")
+ y = tf.constant([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
+ shape=[2, 4], dtype=tf.float64, name="y")
+ m = tf.matmul(x, y, name="matmul")
+ err = gc.ComputeGradientError(x, [3, 2], m, [3, 4])
+ print "matmul input0 gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradientInput1(self):
+ with self.test_session(use_gpu=False):
+ x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2],
+ dtype=tf.float64, name="x")
+ y = tf.constant([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
+ shape=[2, 4], dtype=tf.float64, name="y")
+ m = tf.matmul(x, y, name="matmul")
+ err = gc.ComputeGradientError(y, [2, 4], m, [3, 4])
+ print "matmul input1 gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def _VerifyInput0(self, transpose_a, transpose_b):
+ shape_x = [3, 2]
+ shape_y = [2, 4]
+ if transpose_a:
+ shape_x = list(reversed(shape_x))
+ if transpose_b:
+ shape_y = list(reversed(shape_y))
+ with self.test_session(use_gpu=False):
+ x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=shape_x,
+ dtype=tf.float64, name="x")
+ y = tf.constant([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
+ shape=shape_y, dtype=tf.float64, name="y")
+ m = tf.matmul(x, y, transpose_a, transpose_b, name="matmul")
+ err = gc.ComputeGradientError(x, shape_x, m, [3, 4])
+ print "matmul input0 gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradientInput0WithTranspose(self):
+ self._VerifyInput0(transpose_a=True, transpose_b=False)
+ self._VerifyInput0(transpose_a=False, transpose_b=True)
+ self._VerifyInput0(transpose_a=True, transpose_b=True)
+
+ def _VerifyInput1(self, transpose_a, transpose_b):
+ shape_x = [3, 2]
+ shape_y = [2, 4]
+ if transpose_a:
+ shape_x = list(reversed(shape_x))
+ if transpose_b:
+ shape_y = list(reversed(shape_y))
+ with self.test_session(use_gpu=False):
+ x = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=shape_x,
+ dtype=tf.float64, name="x")
+ y = tf.constant([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
+ shape=shape_y, dtype=tf.float64, name="y")
+ m = tf.matmul(x, y, transpose_a, transpose_b, name="matmul")
+ err = gc.ComputeGradientError(y, shape_y, m, [3, 4])
+ print "matmul input1 gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradientInput1WithTranspose(self):
+ self._VerifyInput1(transpose_a=True, transpose_b=False)
+ self._VerifyInput1(transpose_a=False, transpose_b=True)
+ self._VerifyInput1(transpose_a=True, transpose_b=True)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/matrix_inverse_op_test.py b/tensorflow/python/kernel_tests/matrix_inverse_op_test.py
new file mode 100644
index 0000000000..541a937185
--- /dev/null
+++ b/tensorflow/python/kernel_tests/matrix_inverse_op_test.py
@@ -0,0 +1,79 @@
+"""Tests for tensorflow.ops.math_ops.matrix_inverse."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class InverseOpTest(tf.test.TestCase):
+
+ def _verifyInverse(self, x):
+ for np_type in [np.float32, np.float64]:
+ y = x.astype(np_type)
+ with self.test_session():
+ # Verify that x^{-1} * x == Identity matrix.
+ if x.ndim == 2:
+ inv = tf.matrix_inverse(y)
+ tf_ans = tf.matmul(inv, y)
+ np_ans = np.identity(y.shape[-1])
+ else:
+ inv = tf.batch_matrix_inverse(y)
+ tf_ans = tf.batch_matmul(inv, y)
+ tiling = list(y.shape)
+ tiling[-2:] = [1, 1]
+ np_ans = np.tile(np.identity(y.shape[-1]), tiling)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(y, tf_ans)
+
+ def testBasic(self):
+ # 2x2 matrices
+ matrix1 = np.array([[1., 2.], [3., 4.]])
+ matrix2 = np.array([[1., 3.], [3., 5.]])
+ self._verifyInverse(matrix1)
+ self._verifyInverse(matrix2)
+ # A multidimensional batch of 2x2 matrices
+ matrix_batch = np.concatenate([np.expand_dims(matrix1, 0),
+ np.expand_dims(matrix2, 0)])
+ matrix_batch = np.tile(matrix_batch, [2, 3, 1, 1])
+ self._verifyInverse(matrix_batch)
+
+ def testNonSquareMatrix(self):
+ # Attempting to invert a non-square matrix should raise an error.
+ with self.assertRaises(ValueError):
+ tf.matrix_inverse(np.array([[1., 2., 3.], [3., 4., 5.]]))
+
+ def testWrongDimensions(self):
+ # The input to the inverse should be at least a 2-dimensional tensor.
+ tensor3 = tf.constant([1., 2.])
+ with self.assertRaises(ValueError):
+ tf.matrix_inverse(tensor3)
+
+ def testNotInvertible(self):
+ # The input should be invertible.
+ with self.test_session():
+ with self.assertRaisesOpError("Input is not invertible."):
+ # All rows of the matrix below add to zero
+ tensor3 = tf.constant([[1., 0., -1.], [-1., 1., 0.], [0., -1., 1.]])
+ tf.matrix_inverse(tensor3).eval()
+
+ with self.test_session():
+ with self.assertRaisesOpError("Input is not invertible."):
+ # Determinant of the matrix below is zero
+ tensor3 = tf.constant([[1., 1.], [1., 1.]])
+ tf.matrix_inverse(tensor3).eval()
+
+ with self.test_session():
+ with self.assertRaisesOpError("Input is not invertible."):
+ # Determinant of the matrix below is zero
+ tensor3 = tf.constant([[np.inf, 1.], [1., 1.]])
+ tf.matrix_inverse(tensor3).eval()
+
+ def testEmpty(self):
+ self._verifyInverse(np.empty([0, 2, 2]))
+ self._verifyInverse(np.empty([2, 0, 0]))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/numerics_test.py b/tensorflow/python/kernel_tests/numerics_test.py
new file mode 100644
index 0000000000..8cb2fe2f8b
--- /dev/null
+++ b/tensorflow/python/kernel_tests/numerics_test.py
@@ -0,0 +1,91 @@
+"""Tests for tensorflow.ops.numerics."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.ops import control_flow_ops
+
+
+class VerifyTensorAllFiniteTest(tf.test.TestCase):
+
+ def testVerifyTensorAllFiniteSucceeds(self):
+ x_shape = [5, 4]
+ x = np.random.random_sample(x_shape).astype(np.float32)
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ t = tf.constant(x, shape=x_shape, dtype=tf.float32)
+ t_verified = tf.verify_tensor_all_finite(t, "Input is not a number.")
+ self.assertAllClose(x, t_verified.eval())
+
+ def testVerifyTensorAllFiniteFails(self):
+ x_shape = [5, 4]
+ x = np.random.random_sample(x_shape).astype(np.float32)
+ my_msg = "Input is not a number."
+
+ # Test NaN.
+ x[0] = np.nan
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ with self.assertRaisesOpError(my_msg):
+ t = tf.constant(x, shape=x_shape, dtype=tf.float32)
+ t_verified = tf.verify_tensor_all_finite(t, my_msg)
+ t_verified.eval()
+
+ # Test Inf.
+ x[0] = np.inf
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ with self.assertRaisesOpError(my_msg):
+ t = tf.constant(x, shape=x_shape, dtype=tf.float32)
+ t_verified = tf.verify_tensor_all_finite(t, my_msg)
+ t_verified.eval()
+
+
+class NumericsTest(tf.test.TestCase):
+
+ def testInf(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t1 = tf.constant(1.0)
+ t2 = tf.constant(0.0)
+ a = tf.div(t1, t2)
+ check = tf.add_check_numerics_ops()
+ a = control_flow_ops.with_dependencies([check], a)
+ with self.assertRaisesOpError("Inf"):
+ a.eval()
+
+ def testNaN(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t1 = tf.constant(0.0)
+ t2 = tf.constant(0.0)
+ a = tf.div(t1, t2)
+ check = tf.add_check_numerics_ops()
+ a = control_flow_ops.with_dependencies([check], a)
+ with self.assertRaisesOpError("NaN"):
+ a.eval()
+
+ def testBoth(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t1 = tf.constant([1.0, 0.0])
+ t2 = tf.constant([0.0, 0.0])
+ a = tf.div(t1, t2)
+ check = tf.add_check_numerics_ops()
+ a = control_flow_ops.with_dependencies([check], a)
+ with self.assertRaisesOpError("Inf and NaN"):
+ a.eval()
+
+ def testPassThrough(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()):
+ t1 = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
+ checked = tf.check_numerics(t1, message="pass through test")
+ value = checked.eval()
+ self.assertAllEqual(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]), value)
+ self.assertEqual([2, 3], checked.get_shape())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/pack_op_test.py b/tensorflow/python/kernel_tests/pack_op_test.py
new file mode 100644
index 0000000000..5f3b1823c0
--- /dev/null
+++ b/tensorflow/python/kernel_tests/pack_op_test.py
@@ -0,0 +1,47 @@
+"""Functional tests for Pack Op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker
+
+
+class PackOpTest(tf.test.TestCase):
+
+ def testSimple(self):
+ np.random.seed(7)
+ for use_gpu in False, True:
+ with self.test_session(use_gpu=use_gpu):
+ for shape in (2,), (3,), (2, 3), (3, 2), (4, 3, 2):
+ data = np.random.randn(*shape)
+ # Convert [data[0], data[1], ...] separately to TensorFlow constants
+ xs = map(tf.constant, data)
+ # Pack them back into a single TensorFlow tensor
+ c = tf.pack(xs)
+ self.assertAllEqual(c.eval(), data)
+
+ def testGradients(self):
+ np.random.seed(7)
+ for use_gpu in False, True:
+ for shape in (2,), (3,), (2, 3), (3, 2), (4, 3, 2):
+ data = np.random.randn(*shape)
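+ # Each input to pack has the shape of a single slice of data along axis 0.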
+ shapes = [shape[1:]] * shape[0]
+ with self.test_session(use_gpu=use_gpu):
+ xs = map(tf.constant, data)
+ c = tf.pack(xs)
+ err = gradient_checker.ComputeGradientError(xs, shapes, c, shape)
+ self.assertLess(err, 1e-6)
+
+ def testZeroSize(self):
+ # Verify that pack doesn't crash for zero size inputs
+ for use_gpu in False, True:
+ with self.test_session(use_gpu=use_gpu):
+ for shape in (0,), (3, 0), (0, 3):
+ x = np.zeros((2,) + shape)
+ p = tf.pack(list(x)).eval()
+ self.assertAllEqual(p, x)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/pad_op_test.py b/tensorflow/python/kernel_tests/pad_op_test.py
new file mode 100644
index 0000000000..113aeb1ccf
--- /dev/null
+++ b/tensorflow/python/kernel_tests/pad_op_test.py
@@ -0,0 +1,140 @@
+"""Tests for tensorflow.ops.nn_ops.Pad."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class PadOpTest(tf.test.TestCase):
+
+ def _npPad(self, inp, paddings):
+ return np.pad(inp, paddings, mode="constant")
+
+ def testNpPad(self):
+ self.assertAllClose(
+ np.array([[0, 0, 0, 0, 0, 0],
+ [0, 3, 3, 0, 0, 0],
+ [0, 4, 4, 0, 0, 0],
+ [0, 5, 5, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0],
+ [0, 0, 0, 0, 0, 0]]),
+ self._npPad(np.array([[3, 3], [4, 4], [5, 5]]), [[1, 2], [1, 3]]))
+
+ def _testPad(self, np_inputs, paddings, use_gpu=False):
+ np_val = self._npPad(np_inputs, paddings)
+ with self.test_session(use_gpu=use_gpu):
+ tf_val = tf.pad(np_inputs, paddings)
+ out = tf_val.eval()
+ self.assertAllClose(np_val, out)
+ self.assertShapeEqual(np_val, tf_val)
+
+ def _testGradient(self, x, a):
+ with self.test_session():
+ inx = tf.convert_to_tensor(x)
+ xs = list(x.shape)
+ ina = tf.convert_to_tensor(a)
+ y = tf.pad(inx, ina)
+ # y's expected shape: the input shape plus the total padding along each dimension.
+ ys = list(np.array(x.shape) + np.sum(np.array(a), axis=1))
+ jacob_t, jacob_n = gc.ComputeGradient(inx, xs, y, ys, x_init_value=x)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _testAll(self, np_inputs, paddings):
+ self._testPad(np_inputs, paddings, use_gpu=False)
+ self._testPad(np_inputs, paddings, use_gpu=True)
+ if np_inputs.dtype == np.float32:
+ self._testGradient(np_inputs, paddings)
+
+ def testInputDims(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.pad(
+ tf.reshape([1, 2], shape=[1, 2, 1, 1, 1, 1]),
+ tf.reshape([1, 2], shape=[1, 2]))
+
+ def testPaddingsDim(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.pad(
+ tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1, 2], shape=[2]))
+
+ def testPaddingsDim2(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.pad(
+ tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1, 2], shape=[2, 1]))
+
+ def testPaddingsDim3(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.pad(
+ tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1, 2], shape=[1, 2]))
+
+ def testPaddingsDim4(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.pad(
+ tf.reshape([1, 2], shape=[1, 2]),
+ tf.reshape([1, 2, 3, 4, 5, 6], shape=[3, 2]))
+
+ def testPaddingsNonNegative(self):
+ with self.test_session():
+ with self.assertRaisesRegexp(ValueError, "must be non-negative"):
+ tf.pad(
+ tf.constant([1], shape=[1]),
+ tf.constant([-1, 0], shape=[1, 2]))
+
+ def testPaddingsNonNegative2(self):
+ with self.test_session():
+ with self.assertRaisesRegexp(ValueError, "must be non-negative"):
+ tf.pad(
+ tf.constant([1], shape=[1]),
+ tf.constant([-1, 0], shape=[1, 2]))
+
+ def testIntTypes(self):
+ # TODO(mdevin): Figure out why the padding tests do not work on GPU
+ # for int types and rank > 2.
+ for t in [np.int32, np.int64]:
+ self._testPad((np.random.rand(4, 3, 3) * 100).astype(t),
+ [[1, 0], [2, 3], [0, 2]])
+
+ def testFloatTypes(self):
+ for t in [np.float32, np.float64]:
+ self._testAll(np.random.rand(2, 5).astype(t),
+ [[1, 0], [2, 0]])
+
+ def testShapeFunctionEdgeCases(self):
+ # Unknown paddings shape.
+ inp = tf.constant(0.0, shape=[4, 4, 4, 4])
+ padded = tf.pad(inp, tf.placeholder(tf.int32))
+ self.assertEqual([None, None, None, None], padded.get_shape().as_list())
+
+ # Unknown input shape.
+ inp = tf.placeholder(tf.float32)
+ padded = tf.pad(inp, [[2, 2], [2, 2]])
+ self.assertEqual([None, None], padded.get_shape().as_list())
+
+ # Unknown input and paddings shape.
+ inp = tf.placeholder(tf.float32)
+ padded = tf.pad(inp, tf.placeholder(tf.int32))
+ self.assertAllEqual(None, padded.get_shape().ndims)
+
+ def testScalars(self):
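+ # A scalar input has rank 0, so paddings is a 0x2 matrix and padding is a no-op.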
+ paddings = np.zeros((0, 2), dtype=np.int32)
+ inp = np.asarray(7)
+ for use_gpu in False, True:
+ with self.test_session(use_gpu=use_gpu):
+ tf_val = tf.pad(inp, paddings)
+ out = tf_val.eval()
+ self.assertAllClose(inp, out)
+ self.assertShapeEqual(inp, tf_val)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/parsing_ops_test.py b/tensorflow/python/kernel_tests/parsing_ops_test.py
new file mode 100644
index 0000000000..fba7c705fb
--- /dev/null
+++ b/tensorflow/python/kernel_tests/parsing_ops_test.py
@@ -0,0 +1,414 @@
+"""Tests for tensorflow.ops.parsing_ops."""
+
+import itertools
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+# Helpers for creating Example objects
+example = tf.train.Example
+feature = tf.train.Feature
+features = lambda d: tf.train.Features(feature=d)
+bytes_feature = lambda v: feature(bytes_list=tf.train.BytesList(value=v))
+int64_feature = lambda v: feature(int64_list=tf.train.Int64List(value=v))
+float_feature = lambda v: feature(float_list=tf.train.FloatList(value=v))
+
+
+def flatten(list_of_lists):
+ """Flatten one level of nesting."""
+ return itertools.chain.from_iterable(list_of_lists)
+
+
+def flatten_values_tensors_or_sparse(tensors_list):
+ """Flatten each SparseTensor object into 3 Tensors for session.run()."""
+ return list(flatten([[v.indices, v.values, v.shape]
+ if isinstance(v, tf.SparseTensor) else [v]
+ for v in tensors_list]))
+
+
+def _compare_output_to_expected(
+ tester, dict_tensors, expected_tensors, flat_output):
+ tester.assertEqual(set(dict_tensors.keys()), set(expected_tensors.keys()))
+
+ i = 0 # Index into the flattened output of session.run()
+ for k, v in dict_tensors.iteritems():
+ expected_v = expected_tensors[k]
+ tf.logging.info("Comparing key: %s", k)
+ if isinstance(v, tf.SparseTensor):
+ # Three outputs for SparseTensor : indices, values, shape.
+ tester.assertEqual([k, 3], [k, len(expected_v)])
+ tester.assertAllEqual(flat_output[i], expected_v[0])
+ tester.assertAllEqual(flat_output[i + 1], expected_v[1])
+ tester.assertAllEqual(flat_output[i + 2], expected_v[2])
+ i += 3
+ else:
+ # One output for standard Tensor.
+ tester.assertAllEqual(flat_output[i], expected_v)
+ i += 1
+
+
+class ParseExampleTest(tf.test.TestCase):
+
+ def _test(self, kwargs, expected_values=None, expected_err_re=None):
+ with self.test_session() as sess:
+ # Pull out some keys to check shape inference
+ serialized = kwargs["serialized"]
+ dense_keys = kwargs["dense_keys"] if "dense_keys" in kwargs else []
+ sparse_keys = kwargs["sparse_keys"] if "sparse_keys" in kwargs else []
+ dense_shapes = kwargs["dense_shapes"] if "dense_shapes" in kwargs else []
+
+ # Returns dict w/ Tensors and SparseTensors
+ out = tf.parse_example(**kwargs)
+
+ # Check shapes; if serialized is a Tensor we need its size to
+ # properly check.
+ batch_size = (
+ serialized.eval().size if isinstance(serialized, tf.Tensor)
+ else np.asarray(serialized).size)
+ self.assertEqual(len(dense_keys), len(dense_shapes))
+ for (k, s) in zip(dense_keys, dense_shapes):
+ self.assertEqual(tuple(out[k].get_shape().as_list()), (batch_size,) + s)
+ for k in sparse_keys:
+ self.assertEqual(tuple(out[k].indices.get_shape().as_list()), (None, 2))
+ self.assertEqual(tuple(out[k].values.get_shape().as_list()), (None,))
+ self.assertEqual(tuple(out[k].shape.get_shape().as_list()), (2,))
+
+ # Check values
+ result = flatten_values_tensors_or_sparse(out.values()) # flatten values
+ if expected_err_re is None:
+ tf_result = sess.run(result)
+ _compare_output_to_expected(self, out, expected_values, tf_result)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ sess.run(result)
+
+ def testEmptySerializedWithAllDefaults(self):
+ dense_keys = ["a", "b", "c"]
+ dense_shapes = [(1, 3), (3, 3), (2,)]
+ dense_types = [tf.int64, tf.string, tf.float32]
+ dense_defaults = {
+ "a": [0, 42, 0],
+ "b": np.random.rand(3, 3).astype(np.str),
+ "c": np.random.rand(2).astype(np.float32),
+ }
+
+ expected_st_a = ( # indices, values, shape
+ np.empty((0, 2), dtype=np.int64), # indices
+ np.empty((0,), dtype=np.int64), # sp_a is DT_INT64
+ np.array([2, 0], dtype=np.int64)) # batch == 2, max_elems = 0
+
+ expected_output = {
+ "st_a": expected_st_a,
+ "a": np.array(2 * [[dense_defaults["a"]]]),
+ "b": np.array(2 * [dense_defaults["b"]]),
+ "c": np.array(2 * [dense_defaults["c"]]),
+ }
+
+ self._test(
+ {
+ "names": np.empty((0,), dtype=np.str),
+ # empty serialized input Examples
+ "serialized": tf.convert_to_tensor(["", ""]),
+ "dense_defaults": dense_defaults,
+ "sparse_keys": ["st_a"],
+ "sparse_types": [tf.int64],
+ "dense_keys": dense_keys,
+ "dense_types": dense_types,
+ "dense_shapes": dense_shapes
+ }, expected_output)
+
+ def testEmptySerializedWithoutDefaultsShouldFail(self):
+ dense_shapes = [(1, 3), (3, 3), (2,)]
+ dense_defaults = {
+ "a": [0, 42, 0],
+ "b": np.random.rand(3, 3).astype(np.str),
+ # Feature "c" is missing, since there's gaps it will cause failure.
+ }
+ self._test(
+ {
+ "serialized": ["", ""], # empty serialized input Examples
+ "names": ["in1", "in2"],
+ "dense_defaults": dense_defaults,
+ "sparse_keys": ["st_a"],
+ "sparse_types": [tf.int64],
+ "dense_keys": ["a", "b", "c"],
+ "dense_types": [tf.int64, tf.string, tf.float32],
+ "dense_shapes": dense_shapes
+ },
+ expected_err_re="Name: in1, Feature: c is required")
+
+ def testDenseNotMatchingShapeShouldFail(self):
+ dense_shapes = [(1, 3)]
+ dense_defaults = {
+ # no default!
+ }
+
+ original = [
+ example(features=features({
+ "a": float_feature([1, 1, 3]),
+ })),
+ example(features=features({
+ "a": float_feature([-1, -1]),
+ }))
+ ]
+
+ names = ["passing", "failing"]
+ serialized = [m.SerializeToString() for m in original]
+
+ self._test(
+ {
+ "serialized": tf.convert_to_tensor(serialized),
+ "names": names,
+ "dense_defaults": dense_defaults,
+ "dense_keys": ["a"],
+ "dense_types": [tf.float32],
+ "dense_shapes": dense_shapes,
+ },
+ expected_err_re="Name: failing, Key: a. Number of float values")
+
+ def testSerializedContainingSparse(self):
+ original = [
+ example(features=features({
+ "st_c": float_feature([3, 4])
+ })),
+ example(features=features({
+ "st_c": float_feature([]), # empty float list
+ })),
+ example(features=features({
+ "st_d": feature(), # feature with nothing in it
+ })),
+ example(features=features({
+ "st_c": float_feature([1, 2, -1]),
+ "st_d": bytes_feature(["hi"])
+ }))
+ ]
+
+ serialized = [m.SerializeToString() for m in original]
+
+ expected_st_c = ( # indices, values, shape
+ np.array([[0, 0], [0, 1], [3, 0], [3, 1], [3, 2]], dtype=np.int64),
+ np.array([3.0, 4.0, 1.0, 2.0, -1.0], dtype=np.float32),
+ np.array([4, 3], dtype=np.int64)) # batch == 4, max_elems = 3
+
+ expected_st_d = ( # indices, values, shape
+ np.array([[3, 0]], dtype=np.int64),
+ np.array(["hi"], dtype=np.str),
+ np.array([4, 1], dtype=np.int64)) # batch == 4, max_elems = 1
+
+ expected_output = {
+ "st_c": expected_st_c,
+ "st_d": expected_st_d,
+ }
+
+ self._test(
+ {
+ "serialized": tf.convert_to_tensor(serialized),
+ "sparse_keys": ["st_c", "st_d"],
+ "sparse_types": [tf.float32, tf.string],
+ }, expected_output)
+
+ def testSerializedContainingDense(self):
+ original = [
+ example(features=features({
+ "a": float_feature([1, 1]),
+ "b": bytes_feature(["b0_str"]),
+ })),
+ example(features=features({
+ "a": float_feature([-1, -1]),
+ "b": bytes_feature(["b1"]),
+ }))
+ ]
+
+ serialized = [m.SerializeToString() for m in original]
+
+ dense_shapes = [(1, 2, 1), (1, 1, 1, 1)]
+
+ expected_output = {
+ "a": np.array([[1, 1], [-1, -1]], dtype=np.float32).reshape(2, 1, 2, 1),
+ "b": np.array(["b0_str", "b1"], dtype=np.str).reshape(2, 1, 1, 1, 1),
+ }
+
+ # No defaults, values required
+ self._test(
+ {
+ "serialized": tf.convert_to_tensor(serialized),
+ "dense_keys": ["a", "b"],
+ "dense_types": [tf.float32, tf.string],
+ "dense_shapes": dense_shapes,
+ }, expected_output)
+
+ def testSerializedContainingDenseScalar(self):
+ original = [
+ example(features=features({
+ "a": float_feature([1]),
+ })),
+ example(features=features({}))
+ ]
+
+ serialized = [m.SerializeToString() for m in original]
+
+ expected_output = {
+ "a": np.array([[1], [-1]], dtype=np.float32) # 2x1 (column vector)
+ }
+
+ self._test(
+ {
+ "serialized": tf.convert_to_tensor(serialized),
+ "dense_defaults": {"a": -1},
+ "dense_shapes": [(1,)],
+ "dense_keys": ["a"],
+ "dense_types": [tf.float32],
+ }, expected_output)
+
+ def testSerializedContainingDenseWithDefaults(self):
+ original = [
+ example(features=features({
+ "a": float_feature([1, 1]),
+ })),
+ example(features=features({
+ "b": bytes_feature(["b1"]),
+ }))
+ ]
+
+ serialized = [m.SerializeToString() for m in original]
+
+ dense_shapes = [(1, 2, 1), (1, 1, 1, 1)]
+ dense_types = [tf.float32, tf.string]
+ dense_defaults = {
+ "a": [3.0, -3.0],
+ "b": "tmp_str",
+ }
+
+ expected_output = {
+ "a": np.array([[1, 1], [3, -3]], dtype=np.float32).reshape(2, 1, 2, 1),
+ "b": np.array(["tmp_str", "b1"], dtype=np.str).reshape(2, 1, 1, 1, 1),
+ }
+
+ self._test(
+ {
+ "serialized": tf.convert_to_tensor(serialized),
+ "dense_defaults": dense_defaults,
+ "dense_keys": ["a", "b"],
+ "dense_types": dense_types,
+ "dense_shapes": dense_shapes,
+ }, expected_output)
+
+ def testSerializedContainingSparseAndDenseWithNoDefault(self):
+ dense_defaults = {
+ "a": [1, 2, 3],
+ "b": np.random.rand(3, 3).astype(np.str),
+ # Feature "c" must be provided
+ }
+ dense_shapes = [(1, 3), (3, 3), (2,)]
+
+ expected_st_a = ( # indices, values, shape
+ np.empty((0, 2), dtype=np.int64), # indices
+ np.empty((0,), dtype=np.int64), # sp_a is DT_INT64
+ np.array([2, 0], dtype=np.int64)) # batch == 2, max_elems = 0
+
+ original = [
+ example(features=features({
+ "c": float_feature([3, 4])
+ })),
+ example(features=features({
+ "c": float_feature([1, 2])
+ }))
+ ]
+
+ names = ["in1", "in2"]
+ serialized = [m.SerializeToString() for m in original]
+
+ expected_output = {
+ "st_a": expected_st_a,
+ "a": np.array(2 * [[dense_defaults["a"]]]),
+ "b": np.array(2 * [dense_defaults["b"]]),
+ "c": np.array([[3, 4], [1, 2]], dtype=np.float32),
+ }
+
+ self._test(
+ {
+ "names": names,
+ "serialized": tf.convert_to_tensor(serialized),
+ "dense_defaults": dense_defaults,
+ "sparse_keys": ["st_a"],
+ "sparse_types": [tf.int64],
+ "dense_keys": ["a", "b", "c"],
+ "dense_types": [tf.int64, tf.string, tf.float32],
+ "dense_shapes": dense_shapes
+ }, expected_output)
+
+
+class ParseSingleExampleTest(tf.test.TestCase):
+
+ def _test(self, kwargs, expected_values=None, expected_err_re=None):
+ with self.test_session() as sess:
+ # Pull out some keys to check shape inference
+ dense_keys = kwargs["dense_keys"] if "dense_keys" in kwargs else []
+ sparse_keys = kwargs["sparse_keys"] if "sparse_keys" in kwargs else []
+ dense_shapes = kwargs["dense_shapes"] if "dense_shapes" in kwargs else []
+
+ # Returns dict w/ Tensors and SparseTensors
+ out = tf.parse_single_example(**kwargs)
+
+ # Check shapes
+ self.assertEqual(len(dense_keys), len(dense_shapes))
+ for (k, s) in zip(dense_keys, dense_shapes):
+ self.assertEqual(tuple(out[k].get_shape()), s)
+ for k in sparse_keys:
+ self.assertEqual(tuple(out[k].indices.get_shape().as_list()), (None, 1))
+ self.assertEqual(tuple(out[k].values.get_shape().as_list()), (None,))
+ self.assertEqual(tuple(out[k].shape.get_shape().as_list()), (1,))
+
+ # Check values
+ result = flatten_values_tensors_or_sparse(out.values()) # flatten values
+ if expected_err_re is None:
+ tf_result = sess.run(result)
+ _compare_output_to_expected(self, out, expected_values, tf_result)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ sess.run(result)
+
+ def testSingleExampleWithSparseAndDense(self):
+ dense_types = [tf.int64, tf.string, tf.float32]
+ dense_shapes = [(1, 3), (3, 3), (2,)]
+ dense_defaults = {
+ "a": [1, 2, 3],
+ "b": np.random.rand(3, 3).astype(np.str),
+ # Feature "c" must be provided
+ }
+
+ original = example(features=features(
+ {"c": float_feature([3, 4]),
+ "st_a": float_feature([3.0, 4.0])}))
+
+ serialized = original.SerializeToString()
+
+ expected_st_a = (
+ np.array([[0], [1]], dtype=np.int64), # indices
+ np.array([3.0, 4.0], dtype=np.float32), # values
+ np.array([2], dtype=np.int64)) # shape: max_values = 2
+
+ expected_output = {
+ "st_a": expected_st_a,
+ "a": [dense_defaults["a"]],
+ "b": dense_defaults["b"],
+ "c": np.array([3, 4], dtype=np.float32),
+ }
+
+ self._test(
+ {
+ "names": "in1",
+ "serialized": tf.convert_to_tensor(serialized),
+ "dense_defaults": dense_defaults,
+ "dense_types": dense_types,
+ "sparse_keys": ["st_a"],
+ "sparse_types": [tf.float32],
+ "dense_keys": ["a", "b", "c"],
+ "dense_shapes": dense_shapes
+ }, expected_output)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/pooling_ops_test.py b/tensorflow/python/kernel_tests/pooling_ops_test.py
new file mode 100644
index 0000000000..b9a65726ee
--- /dev/null
+++ b/tensorflow/python/kernel_tests/pooling_ops_test.py
@@ -0,0 +1,819 @@
+"""Functional tests for pooling operations."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+from tensorflow.python.ops import gen_nn_ops
+
+
+def GetInceptionMaxPoolShapes():
+ """Iterator for some of the max pool ops in the Inception 2015 model.
+
+ Yields:
+ Tuple (name, input_size, filter_size, out_size, strides, padding)
+ """
+ names = ["maxpool2", "maxpool3", "maxpool4", "maxpool5"]
+ input_sizes = [[32, 71, 71, 192],
+ [32, 35, 35, 288], [32, 17, 17, 1248], [32, 8, 8, 2048]]
+ filter_sizes = [[1, 3, 3, 1], [1, 3, 3, 1],
+ [1, 3, 3, 1], [1, 3, 3, 1]]
+ output_sizes = [[32, 35, 35, 192], [32, 17, 17, 288],
+ [32, 8, 8, 1248], [32, 8, 8, 2048]]
+ strides = [[1, 2, 2, 1], [1, 2, 2, 1], [1, 2, 2, 1],
+ [1, 1, 1, 1]]
+ paddings = ["VALID", "VALID", "VALID", "SAME"]
+ for n, i, f, o, s, p in zip(names, input_sizes, filter_sizes, output_sizes,
+ strides, paddings):
+ yield n, i, f, o, s, p
+
+
+class PoolingTest(tf.test.TestCase):
+
+ def _VerifyValues(self, pool_func, input_sizes, ksize, strides, padding,
+ expected, use_gpu):
+ """Verifies the output values of the pooling function.
+
+ Args:
+ pool_func: Function to be called, e.g. tf.nn.max_pool or
+ tf.nn.avg_pool.
+ input_sizes: Input tensor dimensions.
+ ksize: The kernel size dimensions
+ strides: The stride dimensions
+ padding: Padding type.
+ expected: An array containing the expected operation outputs.
+ use_gpu: Whether we are running on GPU.
+ """
+ total_size = 1
+ for s in input_sizes:
+ total_size *= s
+ # Initializes the input tensor with an array containing incrementing
+ # numbers from 1.
+ x = [f * 1.0 for f in range(1, total_size + 1)]
+ with self.test_session(use_gpu=use_gpu) as sess:
+ t = tf.constant(x, shape=input_sizes)
+ t = pool_func(t, ksize=ksize, strides=strides, padding=padding)
+ actual = t.eval()
+ self.assertAllClose(expected, actual.flatten())
+ self.assertShapeEqual(actual, t)
+
+ def _testAvgPoolValidPadding(self, use_gpu):
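+ # The input is filled with 1..27 in a [1, 3, 3, 3] layout, so a VALID
+ # 2x2 window with stride 2 fits exactly once per channel; channel 0
+ # averages the four top-left values, (1 + 4 + 10 + 13) / 4 = 7.0, and
+ # channels 1 and 2 give 8.0 and 9.0.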
+ expected_output = [7.0, 8.0, 9.0]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 3, 3, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="VALID",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testAvgPoolSamePadding(self, use_gpu):
+ expected_output = [8.5, 9.5, 10.5, 14.5, 15.5, 16.5]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 2, 4, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testAvgPoolSamePaddingNonSquareWindow(self, use_gpu):
+ # input is:
+ # [1.0, 2.0
+ # 3.0 4.0]
+ #
+ # Window of [x, x] should do:
+ # [avg(1.0, 2.0), avg(2.0, padded0),
+ # avg(3.0, 4.0), avg(4.0, padded0)]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 2, 2, 1],
+ ksize=[1, 1, 2, 1], strides=[1, 1, 1, 1],
+ padding="SAME",
+ expected=[1.5, 2.0, 3.5, 4.0], use_gpu=use_gpu)
+
+ # Window of [x,
+ # x] should do:
+ # [avg(1.0, 3.0), avg(2.0, 4.0)
+ # avg(3.0, padded0), avg(4.0, padded0)]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 2, 2, 1],
+ ksize=[1, 2, 1, 1], strides=[1, 1, 1, 1],
+ padding="SAME",
+ expected=[2.0, 3.0, 3.0, 4.0], use_gpu=use_gpu)
+
+ def _testAvgPoolSamePaddingNonSquareWindowMultiBatch(self, use_gpu):
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[2, 2, 2, 2],
+ ksize=[1, 1, 2, 1], strides=[1, 1, 1, 1],
+ padding="SAME",
+ expected=[2.0, 3.0, 3.0, 4.0,
+ 6.0, 7.0, 7.0, 8.0,
+ 10.0, 11.0, 11.0, 12.0,
+ 14.0, 15.0, 15.0, 16.0],
+ use_gpu=use_gpu)
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[2, 2, 2, 2],
+ ksize=[1, 2, 1, 1], strides=[1, 1, 1, 1],
+ padding="SAME",
+ expected=[3.0, 4.0, 5.0, 6.0,
+ 5.0, 6.0, 7.0, 8.0,
+ 11.0, 12.0, 13.0, 14.0,
+ 13.0, 14.0, 15.0, 16.0],
+ use_gpu=use_gpu)
+
+ def _testAvgPoolValidPaddingUnevenStride(self, use_gpu):
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 3, 3, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 1, 2, 1],
+ padding="VALID",
+ expected=[7.0, 8.0, 9.0, 16.0, 17.0, 18.0],
+ use_gpu=use_gpu)
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 3, 3, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 1, 1],
+ padding="VALID",
+ expected=[7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
+ use_gpu=use_gpu)
+
+ def _testAvgPoolSamePadding4(self, use_gpu):
+ expected_output = [11.0, 12.0, 13.0, 14.0, 19.0, 20.0, 21.0, 22.0, 43.0,
+ 44.0, 45.0, 46.0, 51.0, 52.0, 53.0, 54.0]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 4, 4, 4],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testAvgPoolSamePaddingPacket4(self, use_gpu):
+ expected_output = [21.0, 22.0, 23.0, 24.0, 27.0, 28.0, 29.0, 30.0,
+ 45.0, 46.0, 47.0, 48.0, 51.0, 52.0, 53.0, 54.0]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 4, 4, 4],
+ ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testAvgPoolSamePaddingPacket8(self, use_gpu):
+ expected_output = [73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 89.0,
+ 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0, 105.0, 106.0,
+ 107.0, 108.0, 109.0, 110.0, 111.0, 112.0, 117.0, 118.0,
+ 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 201.0, 202.0,
+ 203.0, 204.0, 205.0, 206.0, 207.0, 208.0, 217.0, 218.0,
+ 219.0, 220.0, 221.0, 222.0, 223.0, 224.0, 233.0, 234.0,
+ 235.0, 236.0, 237.0, 238.0, 239.0, 240.0, 245.0, 246.0,
+ 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 329.0, 330.0,
+ 331.0, 332.0, 333.0, 334.0, 335.0, 336.0, 345.0, 346.0,
+ 347.0, 348.0, 349.0, 350.0, 351.0, 352.0, 361.0, 362.0,
+ 363.0, 364.0, 365.0, 366.0, 367.0, 368.0, 373.0, 374.0,
+ 375.0, 376.0, 377.0, 378.0, 379.0, 380.0, 425.0, 426.0,
+ 427.0, 428.0, 429.0, 430.0, 431.0, 432.0, 441.0, 442.0,
+ 443.0, 444.0, 445.0, 446.0, 447.0, 448.0, 457.0, 458.0,
+ 459.0, 460.0, 461.0, 462.0, 463.0, 464.0, 469.0, 470.0,
+ 471.0, 472.0, 473.0, 474.0, 475.0, 476.0]
+ self._VerifyValues(tf.nn.avg_pool, input_sizes=[1, 8, 8, 8],
+ ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def testAvgPooling(self):
+ for use_gpu in True, False:
+ self._testAvgPoolValidPadding(use_gpu)
+ self._testAvgPoolSamePadding(use_gpu)
+ self._testAvgPoolSamePaddingNonSquareWindow(use_gpu)
+ self._testAvgPoolSamePaddingNonSquareWindowMultiBatch(use_gpu)
+ self._testAvgPoolValidPaddingUnevenStride(use_gpu)
+ self._testAvgPoolSamePadding4(use_gpu)
+ self._testAvgPoolSamePaddingPacket4(use_gpu)
+ self._testAvgPoolSamePaddingPacket8(use_gpu)
+
+ def _testMaxPoolValidPadding(self, use_gpu):
+ expected_output = [13.0, 14.0, 15.0]
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 3, 3, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="VALID",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testMaxPoolSamePadding(self, use_gpu):
+ expected_output = [13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 2, 3, 3],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testMaxPoolSamePaddingNonSquareWindow(self, use_gpu):
+ # input is:
+ # [1.0, 2.0
+ # 3.0 4.0]
+ #
+ # Window of [x, x] should do:
+ #
+ # [max(1.0, 2.0), max(2.0, padded0),
+ # max(3.0, 4.0), max(4.0, padded0)]
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 2, 2, 1],
+ ksize=[1, 1, 2, 1], strides=[1, 1, 1, 1],
+ padding="SAME",
+ expected=[2.0, 2.0, 4.0, 4.0], use_gpu=use_gpu)
+
+ def _testMaxPoolValidPaddingUnevenStride(self, use_gpu):
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 4, 4, 1],
+ ksize=[1, 2, 2, 1], strides=[1, 1, 2, 1],
+ padding="VALID",
+ expected=[6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
+ use_gpu=use_gpu)
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 4, 4, 1],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 1, 1],
+ padding="VALID",
+ expected=[6.0, 7.0, 8.0, 14.0, 15.0, 16.0],
+ use_gpu=use_gpu)
+
+ def _testMaxPoolSamePaddingPacket4(self, use_gpu):
+ expected_output = [21.0, 22.0, 23.0, 24.0, 29.0, 30.0, 31.0, 32.0, 53.0,
+ 54.0, 55.0, 56.0, 61.0, 62.0, 63.0, 64.0]
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 4, 4, 4],
+ ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def _testMaxPoolSamePaddingPacket8(self, use_gpu):
+ expected_output = [145.0, 146.0, 147.0, 148.0, 149.0, 150.0, 151.0, 152.0,
+ 161.0, 162.0, 163.0, 164.0, 165.0, 166.0, 167.0, 168.0,
+ 177.0, 178.0, 179.0, 180.0, 181.0, 182.0, 183.0, 184.0,
+ 185.0, 186.0, 187.0, 188.0, 189.0, 190.0, 191.0, 192.0,
+ 273.0, 274.0, 275.0, 276.0, 277.0, 278.0, 279.0, 280.0,
+ 289.0, 290.0, 291.0, 292.0, 293.0, 294.0, 295.0, 296.0,
+ 305.0, 306.0, 307.0, 308.0, 309.0, 310.0, 311.0, 312.0,
+ 313.0, 314.0, 315.0, 316.0, 317.0, 318.0, 319.0, 320.0,
+ 401.0, 402.0, 403.0, 404.0, 405.0, 406.0, 407.0, 408.0,
+ 417.0, 418.0, 419.0, 420.0, 421.0, 422.0, 423.0, 424.0,
+ 433.0, 434.0, 435.0, 436.0, 437.0, 438.0, 439.0, 440.0,
+ 441.0, 442.0, 443.0, 444.0, 445.0, 446.0, 447.0, 448.0,
+ 465.0, 466.0, 467.0, 468.0, 469.0, 470.0, 471.0, 472.0,
+ 481.0, 482.0, 483.0, 484.0, 485.0, 486.0, 487.0, 488.0,
+ 497.0, 498.0, 499.0, 500.0, 501.0, 502.0, 503.0, 504.0,
+ 505.0, 506.0, 507.0, 508.0, 509.0, 510.0, 511.0, 512.0]
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 8, 8, 8],
+ ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
+ padding="SAME",
+ expected=expected_output, use_gpu=use_gpu)
+
+ def testMaxPooling(self):
+ for use_gpu in True, False:
+ self._testMaxPoolValidPadding(use_gpu)
+ self._testMaxPoolSamePadding(use_gpu)
+ self._testMaxPoolSamePaddingNonSquareWindow(use_gpu)
+ self._testMaxPoolValidPaddingUnevenStride(use_gpu)
+ self._testMaxPoolSamePaddingPacket4(use_gpu)
+ self._testMaxPoolSamePaddingPacket8(use_gpu)
+
+ # Tests for DepthwiseMaxPooling on CPU only.
+ def testDepthwiseMaxPool1x1DepthWindow1(self):
+ # input is:
+ # [1.0, ..., 10.0] along depth,
+ #
+ # We maxpool by depth in patches of 2.
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 1, 1, 10],
+ ksize=[1, 1, 1, 2], strides=[1, 1, 1, 2],
+ padding="SAME",
+ expected=[2.0, 4.0, 6.0, 8.0, 10.0], use_gpu=False)
+
+ def testDepthwiseMaxPool2x2DepthWindow3(self):
+ # input is:
+ #
+ # a 2x2x6 cube, and we depthwise max across 3 to produce a 2x2x2
+ # output. Each node has contiguous values, so the depthwise max
+ # should be multiples of 3.0.
+ self._VerifyValues(tf.nn.max_pool, input_sizes=[1, 2, 2, 6],
+ ksize=[1, 1, 1, 3], strides=[1, 1, 1, 3],
+ padding="SAME",
+ expected=[3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0],
+ use_gpu=False)
+
+ def _testDepthwiseMaxPoolInvalidConfig(self, in_size, ksize, strides,
+ error_msg, use_gpu=False):
+ t = tf.constant(1.0, shape=in_size)
+ with self.assertRaisesRegexp(ValueError, error_msg):
+ t = tf.nn.max_pool(t, ksize=ksize, strides=strides, padding="SAME")
+
+ def testDepthwiseMaxPoolInvalidConfigs(self):
+ self._testDepthwiseMaxPoolInvalidConfig(
+ [1, 2, 2, 4], [1, 2, 2, 2],
+ [1, 1, 1, 2], "exactly one of pooling across depth")
+ self._testDepthwiseMaxPoolInvalidConfig(
+ [1, 2, 2, 4], [1, 1, 1, 2],
+ [1, 1, 1, 1], "depth window to equal the depth stride")
+ self._testDepthwiseMaxPoolInvalidConfig(
+ [1, 2, 2, 4], [1, 1, 1, 3],
+ [1, 1, 1, 3], "evenly divide")
+ if tf.test.IsBuiltWithCuda():
+ with self.test_session(use_gpu=True):
+ t = tf.constant(1.0, shape=[1, 2, 2, 4])
+ with self.assertRaisesOpError("for CPU devices"):
+ tf.nn.max_pool(t, ksize=[1, 1, 1, 2], strides=[1, 1, 1, 2],
+ padding="SAME").eval()
+
+ # The following are tests that verify that the CPU and GPU implementations
+ # produce the same results.
+ def _CompareMaxPoolingFwd(self, input_shape, ksize, strides, padding):
+ tensor_input = np.random.rand(*input_shape).astype(np.float32)
+ with self.test_session(use_gpu=True):
+ t = tf.constant(tensor_input, shape=input_shape)
+ out_op, _ = tf.nn.max_pool_with_argmax(t, ksize, strides, padding)
+ gpu_val = out_op.eval()
+ with self.test_session(use_gpu=False):
+ t = tf.constant(tensor_input, shape=input_shape)
+ out_op = tf.nn.max_pool(t, ksize, strides, padding)
+ cpu_val = out_op.eval()
+ self.assertAllClose(cpu_val, gpu_val, rtol=1e-5, atol=1e-5)
+
+ def _CompareMaxPoolingBk(self, input_shape, output_shape, ksize, strides,
+ padding):
+ # Generate numbers in a narrow range, so that there are many duplicates
+ # in the input.
+ tensor_input = np.random.random_integers(0, 3,
+ input_shape).astype(np.float32)
+ tensor_output = np.random.rand(*output_shape).astype(np.float32)
+ with self.test_session(use_gpu=True):
+ t = tf.constant(tensor_input, shape=input_shape)
+ _, argmax_op = tf.nn.max_pool_with_argmax(t, ksize, strides, padding)
+ argmax = argmax_op.eval()
+ grad_in = tf.constant(tensor_output, shape=output_shape)
+ out_op = gen_nn_ops._max_pool_grad_with_argmax(t, grad_in, argmax,
+ ksize, strides, padding)
+ gpu_val = out_op.eval()
+ self.assertShapeEqual(gpu_val, out_op)
+ with self.test_session(use_gpu=False):
+ t = tf.constant(tensor_input, shape=input_shape)
+ out_op = tf.nn.max_pool(t, ksize, strides, padding)
+ orig_out = out_op.eval()
+ grad_in = tf.constant(tensor_output, shape=output_shape)
+ out_op = gen_nn_ops._max_pool_grad(t, orig_out, grad_in, ksize,
+ strides, padding)
+ cpu_val = out_op.eval()
+ self.assertShapeEqual(cpu_val, out_op)
+ self.assertAllClose(cpu_val, gpu_val, rtol=1e-5, atol=1e-5)
+
+ def testMaxPoolingWithArgmax(self):
+ # MaxPoolWithArgMax is implemented only on GPU.
+ if not tf.test.IsBuiltWithCuda():
+ return
+ tensor_input = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
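+ # The argmax output holds the flattened input index of each window's
+ # maximum; the center value is 0.0, so every 2x2 window selects the
+ # first 1.0 it covers, giving flat positions 0, 1, 3 and 5.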
+ with self.test_session(use_gpu=True) as sess:
+ t = tf.constant(tensor_input, shape=[1, 3, 3, 1])
+ out_op, argmax_op = tf.nn.max_pool_with_argmax(t,
+ ksize=[1, 2, 2, 1],
+ strides=[1, 1, 1, 1],
+ Targmax=tf.int64,
+ padding="VALID")
+ out, argmax = sess.run([out_op, argmax_op])
+ self.assertShapeEqual(out, out_op)
+ self.assertShapeEqual(argmax, argmax_op)
+ self.assertAllClose(out.ravel(), [1.0, 1.0, 1.0, 1.0])
+ self.assertAllEqual(argmax.ravel(), [0, 1, 3, 5])
+
+ def testMaxPoolingGradWithArgmax(self):
+ # MaxPoolWithArgMax is implemented only on GPU.
+ if not tf.test.IsBuiltWithCuda():
+ return
+ orig_input = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0]
+ tensor_input = [11.0, 12.0, 13.0, 14.0]
+ tensor_argmax = list(np.array([0, 1, 3, 5], dtype=np.int64))
+ with self.test_session(use_gpu=True) as sess:
+ orig_in = tf.constant(orig_input, shape=[1, 3, 3, 1])
+ t = tf.constant(tensor_input, shape=[1, 2, 2, 1])
+ argmax = tf.constant(tensor_argmax, shape=[1, 2, 2, 1],
+ dtype=tf.int64)
+ out_op = gen_nn_ops._max_pool_grad_with_argmax(orig_in, t, argmax,
+ ksize=[1, 2, 2, 1],
+ strides=[1, 1, 1, 1],
+ padding="VALID")
+ out = out_op.eval().flatten()
+ self.assertAllClose(out, [11.0, 12.0, 0.0, 13.0, 0.0,
+ 14.0, 0.0, 0.0, 0.0])
+
+ def _ConstructAndTestGradient(self, pool_func, input_sizes, output_sizes,
+ window_rows, window_cols, row_stride,
+ col_stride, padding, use_gpu,
+ x_init_value=None):
+ """Verifies the gradients of the avg pooling function.
+
+ Args:
+ pool_func: Function to be called, e.g. tf.nn.max_pool or
+ tf.nn.avg_pool.
+ input_sizes: Input tensor dimensions.
+ output_sizes: Output tensor dimensions.
+ window_rows: kernel size in row dim
+ window_cols: kernel size in col dim
+ row_stride: Row Stride.
+ col_stride: Col Stride.
+ padding: Padding type.
+ use_gpu: whether we are running on GPU
+ x_init_value: Values to be passed to the gradient checker.
+ """
+ total_size = 1
+ for s in input_sizes:
+ total_size *= s
+ # Initializes the input tensor with an array containing incrementing
+ # numbers from 1.
+ x = [f * 1.0 for f in range(1, total_size + 1)]
+ with self.test_session(use_gpu=use_gpu):
+ input_tensor = tf.constant(x, shape=input_sizes, name="input")
+ if pool_func == tf.nn.avg_pool:
+ func_name = "avg_pool"
+ err_margin = 1e-4
+ else:
+ if x_init_value is None:
+ x_init_value = np.asfarray(
+ np.arange(1, total_size + 1),
+ dtype=np.float32).reshape(input_sizes)
+ func_name = "max_pool"
+ err_margin = 1e-3
+ t = pool_func(input_tensor, ksize=[1, window_rows, window_rows, 1],
+ strides=[1, row_stride, col_stride, 1],
+ padding=padding, name=func_name)
+ err = gc.ComputeGradientError(
+ input_tensor, input_sizes, t, output_sizes,
+ x_init_value=x_init_value, delta=1e-2)
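+ # ComputeGradientError compares the analytic Jacobian of the pooling op
+ # against a numerical estimate obtained by perturbing each input element
+ # (step size delta), and returns the largest absolute difference.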
+ print "%s gradient error = " % func_name, err
+ self.assertLess(err, err_margin)
+
+ def _testMaxPoolGradValidPadding1_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[1, 3, 3, 1],
+ output_sizes=[1, 3, 3, 1], window_rows=1, window_cols=1, row_stride=1,
+ col_stride=1, padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradValidPadding2_1_6(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 6, 6, 3],
+ output_sizes=[2, 5, 5, 3], window_rows=2, window_cols=2, row_stride=1,
+ col_stride=1, padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradValidPadding2_1_7(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 7, 7, 3],
+ output_sizes=[2, 6, 6, 3], window_rows=2, window_cols=2, row_stride=1,
+ col_stride=1, padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradValidPadding2_2(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 2, 2, 3],
+ output_sizes=[2, 1, 1, 3], window_rows=2, window_cols=2, row_stride=2,
+ col_stride=2, padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradSamePadding1_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 2, 4, 3], window_rows=1, window_cols=1, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def _testMaxPoolGradSamePadding2_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 2, 4, 3], window_rows=2, window_cols=2, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def _testMaxPoolGradSamePadding2_2(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 1, 2, 3], window_rows=2, window_cols=2, row_stride=2,
+ col_stride=2, padding="SAME", use_gpu=use_gpu)
+
+ def _testMaxPoolGradSamePadding3_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.max_pool, input_sizes=[1, 7, 7, 1],
+ output_sizes=[1, 7, 7, 1], window_rows=3, window_cols=3, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def testMaxPoolGrad(self):
+ for use_gpu in True, False:
+ self._testMaxPoolGradValidPadding1_1(use_gpu=use_gpu)
+ self._testMaxPoolGradValidPadding2_1_6(use_gpu=use_gpu)
+ self._testMaxPoolGradValidPadding2_1_7(use_gpu=use_gpu)
+ self._testMaxPoolGradValidPadding2_2(use_gpu=use_gpu)
+ self._testMaxPoolGradSamePadding1_1(use_gpu=use_gpu)
+ self._testMaxPoolGradSamePadding2_1(use_gpu=use_gpu)
+ self._testMaxPoolGradSamePadding2_2(use_gpu=use_gpu)
+ self._testMaxPoolGradSamePadding3_1(use_gpu=use_gpu)
+
+ def _MaxPoolGrad(self, orig_input, orig_output, grad, window_rows,
+ window_cols, row_stride, col_stride, padding):
+ """Max Pooling Gradient.
+
+ Args:
+ orig_input: A float Tensor. The original input tensor.
+ orig_output: A float Tensor. The original output tensor.
+ grad: A float Tensor.
+ The 4D (batch x rows x cols x depth) output backprop.
+ window_rows: integer. Kernel size along rows dimension.
+ window_cols: integer. Kernel size along cols dimension.
+ row_stride: integer. Stride along rows dimension
+ col_stride: integer. Stride along cols dimension
+ padding: PoolingOpDef.Padding. Padding type.
+
+ Returns:
+ A Tensor.
+ """
+ return gen_nn_ops._max_pool_grad(
+ orig_input, orig_output, grad,
+ [1, window_rows, window_cols, 1], [1, row_stride, col_stride, 1],
+ padding)
+
+ def _testMaxPoolGradDirect(self, input_data, output_backprop,
+ expected_input_backprop, input_sizes, output_sizes,
+ window_rows, window_cols, row_stride, col_stride,
+ padding, use_gpu):
+ with self.test_session(use_gpu=use_gpu) as sess:
+ input_tensor = tf.constant(input_data, shape=input_sizes)
+ output_tensor = tf.nn.max_pool(
+ input_tensor, [1, window_rows, window_cols, 1],
+ [1, row_stride, col_stride, 1], padding)
+ output_backprop_tensor = tf.constant(output_backprop,
+ shape=output_sizes)
+
+ input_backprop_tensor = self._MaxPoolGrad(
+ input_tensor, output_tensor, output_backprop_tensor,
+ window_rows, window_cols, row_stride, col_stride, padding)
+
+ actual_input_backprop = input_backprop_tensor.eval()
+ self.assertShapeEqual(actual_input_backprop, input_backprop_tensor)
+ actual_input_backprop = actual_input_backprop.flatten()
+ actual_input_backprop = self._GetNdArray(actual_input_backprop)
+
+ actual_output = output_tensor.eval().flatten()
+ actual_output = self._GetNdArray(actual_output)
+
+ self.assertAllClose(expected_input_backprop, actual_input_backprop,
+ rtol=1e-6, atol=1e-6)
+
+ def _testMaxPoolGradDirect1_1(self):
+ input_data = [
+ 1.0, 1.0, 1.0, 1.0,
+ 1.0, 1.0, 1.0, 1.0,
+ 1.0, 1.0, 1.0, 1.0,
+ 1.0, 1.0, 1.0, 1.0]
+ output_backprop = [
+ 11.0, 12.0, 13.0,
+ 15.0, 16.0, 17.0,
+ 19.0, 20.0, 21.0]
+ expected_input_backprop = [
+ 11.0, 12.0, 13.0, 0.0,
+ 15.0, 16.0, 17.0, 0.0,
+ 19.0, 20.0, 21.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+
+ for use_gpu in True, False:
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradDirect1_2(self):
+ input_data = [
+ 1.0, 0.0, 1.0, 0.0,
+ 0.0, 1.0, 0.0, 1.0,
+ 1.0, 0.0, 1.0, 0.0,
+ 0.0, 1.0, 0.0, 1.0]
+ output_backprop = [
+ 11.0, 12.0, 13.0,
+ 15.0, 16.0, 17.0,
+ 19.0, 20.0, 21.0]
+ expected_input_backprop = [
+ 11.0, 0.0, 25.0, 0.0,
+ 0.0, 31.0, 0.0, 17.0,
+ 19.0, 0.0, 41.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+
+ for use_gpu in True, False:
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=use_gpu)
+
+ def _testMaxPoolGradDirect1_3(self):
+ input_data = [
+ 1.0, 0.0, 1.0, 0.0,
+ 0.0, 1.0, 0.0, 1.0,
+ 1.0, 0.0, 1.0, 0.0,
+ 0.0, 1.0, 0.0, 1.0,]
+ output_backprop = [
+ 11.0, 12.0, 13.0, 14.0,
+ 15.0, 16.0, 17.0, 18.0,
+ 19.0, 20.0, 21.0, 22.0,
+ 23.0, 24.0, 25.0, 26.0]
+ expected_input_backprop = [
+ 54.0, 0.0, 62.0, 0.0,
+ 0.0, 60.0, 0.0, 22.0,
+ 47.0, 0.0, 51.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+
+ for use_gpu in True, False:
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 4, 4, 1],
+ window_rows=3, window_cols=3, row_stride=1, col_stride=1,
+ padding="SAME", use_gpu=use_gpu)
+
+ def _testMaxPoolGradDirectWithNans2_1(self):
+ input_data = [float("nan")] * 16
+ output_backprop = [
+ 11.0, 12.0, 13.0,
+ 15.0, 16.0, 17.0,
+ 19.0, 20.0, 21.0]
+ # Test the CPU implementation, which propagates diffs in case of NaN
+ expected_input_backprop_tf_cpu = [
+ 11.0, 12.0, 13.0, 0.0,
+ 15.0, 16.0, 17.0, 0.0,
+ 19.0, 20.0, 21.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop_tf_cpu,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=False)
+
+ if not tf.test.IsBuiltWithCuda():
+ return
+
+ # Test the GPU implementation that uses cudnn for now.
+ # It does not propagate the diff in cases of NaNs
+ expected_input_backprop_cudnn = [
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop_cudnn,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=True)
+
+ def _testMaxPoolGradDirectWithNans2_2(self):
+ input_data = [float("nan")] * 16
+ output_backprop = [
+ float("nan"), 12.0, 13.0,
+ 15.0, float("nan"), 17.0,
+ 19.0, 20.0, float("nan")]
+ # Test the CPU implementation, which propagates diffs in case of NaN
+ expected_input_backprop_tf_cpu = [
+ float("nan"), 12.0, 13.0, 0.0,
+ 15.0, float("nan"), 17.0, 0.0,
+ 19.0, 20.0, float("nan"), 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop_tf_cpu,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=False)
+
+ if not tf.test.IsBuiltWithCuda():
+ return
+
+ # Test the GPU implementation that uses cudnn for now.
+ # It does not propagate the diff in cases of NaNs
+ expected_input_backprop_cudnn = [
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0,
+ 0.0, 0.0, 0.0, 0.0]
+ self._testMaxPoolGradDirect(
+ input_data, output_backprop, expected_input_backprop_cudnn,
+ input_sizes=[1, 4, 4, 1], output_sizes=[1, 3, 3, 1],
+ window_rows=2, window_cols=2, row_stride=1, col_stride=1,
+ padding="VALID", use_gpu=True)
+
+ def testMaxPoolGradDirect(self):
+ self._testMaxPoolGradDirect1_1()
+ self._testMaxPoolGradDirect1_2()
+ self._testMaxPoolGradDirect1_3()
+ self._testMaxPoolGradDirectWithNans2_1()
+ self._testMaxPoolGradDirectWithNans2_2()
+
+ def testAvgPoolGrad(self):
+ for use_gpu in False, True:
+ self._testAvgPoolGradValidPadding1_1(use_gpu)
+ self._testAvgPoolGradValidPadding2_1(use_gpu)
+ self._testAvgPoolGradValidPadding2_2(use_gpu)
+ self._testAvgPoolGradSamePadding1_1(use_gpu)
+ self._testAvgPoolGradSamePadding2_1(use_gpu)
+ self._testAvgPoolGradSamePadding2_2(use_gpu)
+ self._testAvgPoolGradSamePadding3_1(use_gpu)
+
+ def _testAvgPoolGradValidPadding1_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 3, 3, 3],
+ output_sizes=[2, 3, 3, 3], window_rows=1, window_cols=1, row_stride=1,
+ col_stride=1, padding="VALID", use_gpu=use_gpu)
+
+ def _testAvgPoolGradValidPadding2_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 3, 3, 3],
+ output_sizes=[2, 2, 2, 3], window_rows=2, window_cols=2, row_stride=1,
+ col_stride=1, padding="VALID", use_gpu=use_gpu)
+
+ def _testAvgPoolGradValidPadding2_2(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 2, 2, 3],
+ output_sizes=[2, 1, 1, 3], window_rows=2, window_cols=2, row_stride=2,
+ col_stride=2, padding="VALID", use_gpu=use_gpu)
+
+ def _testAvgPoolGradSamePadding1_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 2, 4, 3], window_rows=1, window_cols=1, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def _testAvgPoolGradSamePadding2_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 2, 4, 3], window_rows=2, window_cols=2, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def _testAvgPoolGradSamePadding2_2(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[2, 2, 4, 3],
+ output_sizes=[2, 1, 2, 3], window_rows=2, window_cols=2, row_stride=2,
+ col_stride=2, padding="SAME", use_gpu=use_gpu)
+
+ def _testAvgPoolGradSamePadding3_1(self, use_gpu):
+ self._ConstructAndTestGradient(
+ tf.nn.avg_pool, input_sizes=[1, 7, 7, 1],
+ output_sizes=[1, 7, 7, 1], window_rows=3, window_cols=3, row_stride=1,
+ col_stride=1, padding="SAME", use_gpu=use_gpu)
+
+ def testShapeFunctionEdgeCases(self):
+ # All shapes unknown.
+ for pool_func in [tf.nn.max_pool, tf.nn.avg_pool]:
+ p = pool_func(tf.placeholder(tf.float32),
+ ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1],
+ padding="SAME")
+ self.assertEqual([None, None, None, None], p.get_shape().as_list())
+ p, am = tf.nn.max_pool_with_argmax(
+ tf.placeholder(tf.float32),
+ ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1],
+ padding="SAME")
+ self.assertEqual([None, None, None, None], p.get_shape().as_list())
+ self.assertEqual([None, None, None, None], am.get_shape().as_list())
+
+ # Incorrect input shape.
+ for pool_func in [tf.nn.max_pool, tf.nn.avg_pool,
+ tf.nn.max_pool_with_argmax]:
+ with self.assertRaises(ValueError):
+ pool_func(tf.placeholder(tf.float32, shape=[1, 3]),
+ ksize=[1, 1, 1, 1], strides=[1, 1, 1, 1], padding="SAME")
+
+ # Illegal strides.
+ for pool_func in [tf.nn.max_pool, tf.nn.avg_pool,
+ tf.nn.max_pool_with_argmax]:
+ with self.assertRaisesRegexp(ValueError, "strides in the batch"):
+ pool_func(tf.placeholder(tf.float32),
+ ksize=[1, 1, 1, 1], strides=[2, 1, 1, 1], padding="SAME")
+ with self.assertRaisesRegexp(ValueError, "strides in the batch and depth"):
+ tf.nn.avg_pool(tf.placeholder(tf.float32),
+ ksize=[1, 1, 1, 1], strides=[1, 1, 1, 2], padding="SAME")
+
+ # Filter larger than input.
+ for pool_func in [tf.nn.max_pool, tf.nn.avg_pool,
+ tf.nn.max_pool_with_argmax]:
+ with self.assertRaisesRegexp(ValueError,
+ "filter must not be larger than the input"):
+ pool_func(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ ksize=[1, 20, 21, 1], strides=[1, 1, 1, 1], padding="SAME")
+ with self.assertRaisesRegexp(ValueError,
+ "filter must not be larger than the input"):
+ pool_func(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ ksize=[1, 21, 20, 1], strides=[1, 1, 1, 1], padding="SAME")
+
+ # Stride larger than filter.
+ for pool_func in [tf.nn.max_pool, tf.nn.avg_pool,
+ tf.nn.max_pool_with_argmax]:
+ with self.assertRaisesRegexp(
+ ValueError, "stride must be less than or equal to filter"):
+ pool_func(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ ksize=[1, 5, 3, 1], strides=[1, 5, 5, 1], padding="SAME")
+ with self.assertRaisesRegexp(
+ ValueError, "stride must be less than or equal to filter"):
+ pool_func(tf.placeholder(tf.float32,
+ shape=[32, 20, 20, 3]),
+ ksize=[1, 3, 5, 1], strides=[1, 5, 5, 1], padding="SAME")
+
+
+def GetMaxPoolFwdTest(input_size, filter_size, strides, padding):
+ def Test(self):
+ # MaxPoolWithArgMax is implemented only on GPU.
+ if not tf.test.IsBuiltWithCuda():
+ return
+ self._CompareMaxPoolingFwd(input_size, filter_size, strides, padding)
+ return Test
+
+
+def GetMaxPoolGradTest(input_size, filter_size, output_size, strides, padding):
+ def Test(self):
+ # MaxPoolWithArgMax is implemented only on GPU.
+ if not tf.test.IsBuiltWithCuda():
+ return
+ self._CompareMaxPoolingBk(input_size, output_size,
+ filter_size, strides, padding)
+ return Test
+
+
+if __name__ == "__main__":
+ for (name_, input_size_, filter_size_, output_size_, stride_,
+ padding_) in GetInceptionMaxPoolShapes():
+ setattr(PoolingTest, "testMaxPoolFwd_" + name_,
+ GetMaxPoolFwdTest(input_size_, filter_size_, stride_, padding_))
+ setattr(PoolingTest, "testMaxPoolGrad_" + name_,
+ GetMaxPoolGradTest(input_size_, filter_size_, output_size_,
+ stride_, padding_))
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/random_ops_test.py b/tensorflow/python/kernel_tests/random_ops_test.py
new file mode 100644
index 0000000000..311f0e3e5e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/random_ops_test.py
@@ -0,0 +1,242 @@
+"""Tests for tensorflow.ops.random_ops."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class RandomNormalTest(tf.test.TestCase):
+
+ def _Sampler(self, num, mu, sigma, dtype, use_gpu, seed=None):
+ def func():
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
+ rng = tf.random_normal(
+ [num], mean=mu, stddev=sigma, dtype=dtype, seed=seed)
+ ret = np.empty([10, num])
+ for i in xrange(10):
+ ret[i, :] = sess.run(rng)
+ return ret
+ return func
+
+ # Asserts that different trials (1000 samples per trial) are unlikely
+ # to produce the same sequence of values. This catches buggy
+ # implementations that reuse the same random number seed.
+ def testDistinct(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu)
+ x = sampler()
+ y = sampler()
+ # Number of different samples.
+ count = (x == y).sum()
+ if count >= 10:
+ print "x = ", x
+ print "y = ", y
+ print "count = ", count
+ self.assertTrue(count < 10)
+
+ # Checks that the CPU and GPU implementations return the same results
+ # when given the same random seed.
+ def testCPUGPUMatch(self):
+ for dt in tf.float32, tf.float64:
+ results = {}
+ for use_gpu in [False, True]:
+ sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
+ results[use_gpu] = sampler()
+ self.assertAllClose(results[False], results[True], rtol=1e-6, atol=1e-6)
+
+ def testSeed(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sx = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ sy = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ self.assertAllEqual(sx(), sy())
+
+ def testNoCSE(self):
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ shape = [2, 3, 4]
+ rnd1 = tf.random_normal(shape, 0.0, 1.0, tf.float32)
+ rnd2 = tf.random_normal(shape, 0.0, 1.0, tf.float32)
+ diff = rnd2 - rnd1
+ self.assertTrue(np.linalg.norm(diff.eval()) > 0.1)
+
+
+class TruncatedNormalTest(tf.test.TestCase):
+
+ def _Sampler(self, num, mu, sigma, dtype, use_gpu, seed=None):
+ def func():
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
+ rng = tf.truncated_normal(
+ [num], mean=mu, stddev=sigma, dtype=dtype, seed=seed)
+ ret = np.empty([10, num])
+ for i in xrange(10):
+ ret[i, :] = sess.run(rng)
+ return ret
+ return func
+
+ # Asserts that different trials (1000 samples per trial) are unlikely
+ # to produce the same sequence of values. This catches buggy
+ # implementations that reuse the same random number seed.
+ def testDistinct(self):
+ # NOTE: RandomParameters on GPU is not supported.
+ for use_gpu in [False]:
+ for dt in tf.float32, tf.float64:
+ sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu)
+ x = sampler()
+ y = sampler()
+ # Number of different samples.
+ count = (x == y).sum()
+ if count >= 10:
+ print "x = ", x
+ print "y = ", y
+ print "count = ", count
+ self.assertTrue(count < 10)
+
+ # Checks that the CPU and GPU implementations return the same results
+ # when given the same random seed.
+ def testCPUGPUMatch(self):
+ for dt in tf.float32, tf.float64:
+ results = {}
+ for use_gpu in [False, True]:
+ # We need a particularly large number of samples to exercise multiple
+ # rounds on the GPU.
+ sampler = self._Sampler(1000000, 0.0, 1.0, dt, use_gpu=use_gpu,
+ seed=12345)
+ results[use_gpu] = sampler()
+ self.assertAllClose(results[False], results[True], rtol=1e-6, atol=1e-6)
+
+ def testSeed(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sx = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ sy = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ self.assertAllEqual(sx(), sy())
+
+ # The effective standard deviation of truncated normal is about 85% of
+ # the requested one.
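+ # (Assuming samples more than two standard deviations from the mean are
+ # dropped and re-drawn, the exact ratio is
+ # sqrt(1 - 4 * phi(2) / (Phi(2) - Phi(-2))) ~= 0.88, so the 0.85 figure
+ # used below, with its 0.04 tolerance, still covers it.)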
+ def testStdDev(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ stddev = 3.0
+ sampler = self._Sampler(100000, 0.0, stddev, dt, use_gpu=use_gpu)
+ x = sampler()
+ print "std(x)", np.std(x), abs(np.std(x) / stddev - 0.85)
+ self.assertTrue(abs(np.std(x) / stddev - 0.85) < 0.04)
+
+ def testNoCSE(self):
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ shape = [2, 3, 4]
+ rnd1 = tf.truncated_normal(shape, 0.0, 1.0, tf.float32)
+ rnd2 = tf.truncated_normal(shape, 0.0, 1.0, tf.float32)
+ diff = rnd2 - rnd1
+ self.assertTrue(np.linalg.norm(diff.eval()) > 0.1)
+
+
+class RandomUniformTest(tf.test.TestCase):
+
+ def _Sampler(self, num, minv, maxv, dtype, use_gpu, seed=None):
+ def func():
+ with self.test_session(use_gpu=use_gpu, graph=tf.Graph()) as sess:
+ rng = tf.random_uniform(
+ [num], minval=minv, maxval=maxv, dtype=dtype, seed=seed)
+ ret = np.empty([10, num])
+ for i in xrange(10):
+ ret[i, :] = sess.run(rng)
+ return ret
+ return func
+
+ def testRange(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sampler = self._Sampler(1000, -2., 8., dt, use_gpu=use_gpu)
+ x = sampler()
+ self.assertTrue(-2 <= np.min(x))
+ self.assertTrue(np.max(x) <= 8)
+
+ # Asserts that different trials (1000 samples per trial) are unlikely
+ # to produce the same sequence of values. This catches buggy
+ # implementations that reuse the same random number seed.
+ def testDistinct(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu)
+ x = sampler()
+ y = sampler()
+ count = (x == y).sum()
+ if count >= 10:
+ print "x = ", x
+ print "y = ", y
+ print "count = ", count
+ self.assertTrue(count < 10)
+
+ # Checks that the CPU and GPU implementations return the same results
+ # when given the same random seed.
+ def testCPUGPUMatch(self):
+ for dt in tf.float32, tf.float64:
+ results = {}
+ for use_gpu in [False, True]:
+ sampler = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=12345)
+ results[use_gpu] = sampler()
+ self.assertAllClose(results[False], results[True], rtol=1e-6, atol=1e-6)
+
+ def testSeed(self):
+ for use_gpu in [False, True]:
+ for dt in tf.float32, tf.float64:
+ sx = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ sy = self._Sampler(1000, 0.0, 1.0, dt, use_gpu=use_gpu, seed=345)
+ self.assertAllEqual(sx(), sy())
+
+ def testNoCSE(self):
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ shape = [2, 3, 4]
+ rnd1 = tf.random_uniform(shape, 0.0, 1.0,
+ dtype=tf.float32)
+ rnd2 = tf.random_uniform(shape, 0.0, 1.0,
+ dtype=tf.float32)
+ diff = (rnd2 - rnd1).eval()
+ self.assertTrue(np.linalg.norm(diff) > 0.1)
+
+
+class RandomShapeTest(tf.test.TestCase):
+
+ def testRandomParameters(self):
+ # Fully known shape.
+ rnd1 = tf.truncated_normal([1, 2, 3])
+ self.assertEqual([1, 2, 3], rnd1.get_shape())
+ # Partially known shape.
+ rnd2 = tf.truncated_normal(tf.placeholder(tf.int32, shape=(3,)))
+ self.assertEqual([None, None, None], rnd2.get_shape().as_list())
+ # Unknown shape.
+ rnd3 = tf.truncated_normal(tf.placeholder(tf.int32))
+ self.assertIs(None, rnd3.get_shape().ndims)
+
+ def testRandomNormal(self):
+ # Fully known shape.
+ rnd1 = tf.random_normal([1, 2, 3])
+ self.assertEqual([1, 2, 3], rnd1.get_shape())
+ # Partially known shape.
+ rnd2 = tf.random_normal(tf.placeholder(tf.int32, shape=(3,)))
+ self.assertEqual([None, None, None], rnd2.get_shape().as_list())
+ # Unknown shape.
+ rnd3 = tf.random_normal(tf.placeholder(tf.int32))
+ self.assertIs(None, rnd3.get_shape().ndims)
+
+ def testRandomUniform(self):
+ # Fully known shape.
+ rnd1 = tf.random_uniform([1, 2, 3])
+ self.assertEqual([1, 2, 3], rnd1.get_shape())
+ # Partially known shape.
+ rnd2 = tf.random_uniform(
+ tf.placeholder(tf.int32, shape=(3,)))
+ self.assertEqual([None, None, None], rnd2.get_shape().as_list())
+ # Unknown shape.
+ rnd3 = tf.random_uniform(tf.placeholder(tf.int32))
+ self.assertIs(None, rnd3.get_shape().ndims)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/random_shuffle_queue_test.py b/tensorflow/python/kernel_tests/random_shuffle_queue_test.py
new file mode 100644
index 0000000000..343ffdcb76
--- /dev/null
+++ b/tensorflow/python/kernel_tests/random_shuffle_queue_test.py
@@ -0,0 +1,1054 @@
+"""Tests for tensorflow.ops.data_flow_ops.Queue."""
+import random
+import re
+import time
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class RandomShuffleQueueTest(tf.test.TestCase):
+
+ def setUp(self):
+ # Useful for debugging when a test times out.
+ super(RandomShuffleQueueTest, self).setUp()
+ tf.logging.error("Starting: %s", self._testMethodName)
+
+ def tearDown(self):
+ super(RandomShuffleQueueTest, self).tearDown()
+ tf.logging.error("Finished: %s", self._testMethodName)
+
+ def testEnqueue(self):
+ with self.test_session():
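+ # The positional arguments are capacity, min_after_dequeue and dtypes:
+ # while the queue is open, a dequeue must leave at least
+ # min_after_dequeue elements behind.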
+ q = tf.RandomShuffleQueue(10, 5, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ self.assertAllEqual(0, q.size().eval())
+ enqueue_op.run()
+ self.assertAllEqual(1, q.size().eval())
+
+ def testEnqueueWithShape(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shapes=tf.TensorShape([3, 2]))
+ enqueue_correct_op = q.enqueue(([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],))
+ enqueue_correct_op.run()
+ self.assertAllEqual(1, q.size().eval())
+ with self.assertRaises(ValueError):
+ q.enqueue(([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],))
+
+ def testEnqueueManyWithShape(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(
+ 10, 5, [tf.int32, tf.int32],
+ shapes=[(), (2,)])
+ q.enqueue_many([[1, 2, 3, 4], [[1, 1], [2, 2], [3, 3], [4, 4]]]).run()
+ self.assertAllEqual(4, q.size().eval())
+
+ q2 = tf.RandomShuffleQueue(10, 5, tf.int32, shapes=tf.TensorShape([3]))
+ q2.enqueue(([1, 2, 3],))
+ q2.enqueue_many(([[1, 2, 3]],))
+
+ def testScalarShapes(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(
+ 10, 0, [tf.int32, tf.int32],
+ shapes=[(), (1,)])
+ q.enqueue_many([[1, 2, 3, 4], [[5], [6], [7], [8]]]).run()
+ q.enqueue([9, [10]]).run()
+ dequeue_t = q.dequeue()
+ results = []
+ for _ in range(2):
+ a, b = sess.run(dequeue_t)
+ results.append((a, b))
+ a, b = sess.run(q.dequeue_many(3))
+ for i in range(3):
+ results.append((a[i], b[i]))
+ self.assertItemsEqual([(1, [5]), (2, [6]), (3, [7]), (4, [8]), (9, [10])],
+ results)
+
+ def testParallelEnqueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ # Run one producer thread for each element in elems.
+ def enqueue(enqueue_op):
+ sess.run(enqueue_op)
+ threads = [self.checkedThread(target=enqueue, args=(e,))
+ for e in enqueue_ops]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+
+ # Dequeue every element using a single thread.
+ results = []
+ for _ in xrange(len(elems)):
+ results.append(dequeued_t.eval())
+ self.assertItemsEqual(elems, results)
+
+ def testParallelDequeue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ # Enqueue every element using a single thread.
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ # Run one consumer thread for each element in elems.
+ results = []
+
+ def dequeue():
+ results.append(sess.run(dequeued_t))
+ threads = [self.checkedThread(target=dequeue) for _ in enqueue_ops]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+ self.assertItemsEqual(elems, results)
+
+ def testDequeue(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ elems = [10.0, 20.0, 30.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ vals = [dequeued_t.eval() for _ in xrange(len(elems))]
+ self.assertItemsEqual(elems, vals)
+
+ def testEnqueueAndBlockingDequeue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(3, 0, tf.float32)
+ elems = [10.0, 20.0, 30.0]
+ enqueue_ops = [q.enqueue((x,)) for x in elems]
+ dequeued_t = q.dequeue()
+
+ def enqueue():
+ # The enqueue_ops should run after the dequeue op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ for enqueue_op in enqueue_ops:
+ sess.run(enqueue_op)
+
+ results = []
+
+ def dequeue():
+ for _ in xrange(len(elems)):
+ results.append(sess.run(dequeued_t))
+
+ enqueue_thread = self.checkedThread(target=enqueue)
+ dequeue_thread = self.checkedThread(target=dequeue)
+ enqueue_thread.start()
+ dequeue_thread.start()
+ enqueue_thread.join()
+ dequeue_thread.join()
+
+ self.assertItemsEqual(elems, results)
+
+ def testMultiEnqueueAndDequeue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(
+ 10, 0, (tf.int32, tf.float32))
+ elems = [(5, 10.0), (10, 20.0), (15, 30.0)]
+ enqueue_ops = [q.enqueue((x, y)) for x, y in elems]
+ dequeued_t = q.dequeue()
+
+ for enqueue_op in enqueue_ops:
+ enqueue_op.run()
+
+ results = []
+ for _ in xrange(len(elems)):
+ x, y = sess.run(dequeued_t)
+ results.append((x, y))
+ self.assertItemsEqual(elems, results)
+
+ def testQueueSizeEmpty(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 5, tf.float32)
+ self.assertEqual(0, q.size().eval())
+
+ def testQueueSizeAfterEnqueueAndDequeue(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ dequeued_t = q.dequeue()
+ size = q.size()
+ self.assertEqual([], size.get_shape())
+
+ enqueue_op.run()
+ self.assertEqual([1], size.eval())
+ dequeued_t.op.run()
+ self.assertEqual([0], size.eval())
+
+ def testEnqueueMany(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue()
+ enqueue_op.run()
+ enqueue_op.run()
+
+ results = []
+ for _ in range(8):
+ results.append(dequeued_t.eval())
+ self.assertItemsEqual(elems + elems, results)
+
+ def testEmptyEnqueueMany(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 5, tf.float32)
+ empty_t = tf.constant([], dtype=tf.float32,
+ shape=[0, 2, 3])
+ enqueue_op = q.enqueue_many((empty_t,))
+ size_t = q.size()
+
+ self.assertEqual(0, size_t.eval())
+ enqueue_op.run()
+ self.assertEqual(0, size_t.eval())
+
+ def testEmptyDequeueMany(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32, shapes=())
+ enqueue_op = q.enqueue((10.0,))
+ dequeued_t = q.dequeue_many(0)
+
+ self.assertEqual([], dequeued_t.eval().tolist())
+ enqueue_op.run()
+ self.assertEqual([], dequeued_t.eval().tolist())
+
+ def testEmptyDequeueManyWithNoShape(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ enqueue_op = q.enqueue(
+ (tf.constant([10.0, 20.0], shape=(1, 2)),))
+ dequeued_t = q.dequeue_many(0)
+
+ # Expect the operation to fail due to the shape not being constrained.
+ with self.assertRaisesOpError(
+ "requires the components to have specified shapes"):
+ dequeued_t.eval()
+
+ enqueue_op.run()
+
+ # Unlike tf.Queue, RandomShuffleQueue does not make any
+ # attempt to support DequeueMany with unspecified shapes, even if
+ # a shape could be inferred from the elements enqueued.
+ with self.assertRaisesOpError(
+ "requires the components to have specified shapes"):
+ dequeued_t.eval()
+
+ def testMultiEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(
+ 10, 0, (tf.float32, tf.int32))
+ float_elems = [10.0, 20.0, 30.0, 40.0]
+ int_elems = [[1, 2], [3, 4], [5, 6], [7, 8]]
+ enqueue_op = q.enqueue_many((float_elems, int_elems))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+ enqueue_op.run()
+
+ results = []
+ for _ in range(8):
+ float_val, int_val = sess.run(dequeued_t)
+ results.append((float_val, [int_val[0], int_val[1]]))
+ expected = zip(float_elems, int_elems) + zip(float_elems, int_elems)
+ self.assertItemsEqual(expected, results)
+
+ def testDequeueMany(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(5)
+
+ enqueue_op.run()
+
+ results = dequeued_t.eval().tolist()
+ results.extend(dequeued_t.eval())
+ self.assertItemsEqual(elems, results)
+
+ def testMultiDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(
+ 10, 0, (tf.float32, tf.int32),
+ shapes=((), (2,)))
+ float_elems = [
+ 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
+ int_elems = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10],
+ [11, 12], [13, 14], [15, 16], [17, 18], [19, 20]]
+ enqueue_op = q.enqueue_many((float_elems, int_elems))
+ dequeued_t = q.dequeue_many(4)
+ dequeued_single_t = q.dequeue()
+
+ enqueue_op.run()
+
+ results = []
+ float_val, int_val = sess.run(dequeued_t)
+ self.assertEqual(float_val.shape, dequeued_t[0].get_shape())
+ self.assertEqual(int_val.shape, dequeued_t[1].get_shape())
+ results.extend(zip(float_val, int_val.tolist()))
+
+ float_val, int_val = sess.run(dequeued_t)
+ results.extend(zip(float_val, int_val.tolist()))
+
+ float_val, int_val = sess.run(dequeued_single_t)
+ self.assertEqual(float_val.shape, dequeued_single_t[0].get_shape())
+ self.assertEqual(int_val.shape, dequeued_single_t[1].get_shape())
+ results.append((float_val, int_val.tolist()))
+
+ float_val, int_val = sess.run(dequeued_single_t)
+ results.append((float_val, int_val.tolist()))
+
+ self.assertItemsEqual(zip(float_elems, int_elems), results)
+
+ def testHighDimension(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(
+ 10, 0, tf.int32, ((4, 4, 4, 4)))
+ elems = np.array([[[[[x] * 4] * 4] * 4] * 4 for x in range(10)], np.int32)
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(10)
+
+ enqueue_op.run()
+ self.assertItemsEqual(dequeued_t.eval().tolist(), elems.tolist())
+
+ def testParallelEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(1000, 0, tf.float32, shapes=())
+ elems = [10.0 * x for x in range(100)]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(1000)
+
+ # Enqueue 100 items in parallel on 10 threads.
+ def enqueue():
+ sess.run(enqueue_op)
+ threads = [self.checkedThread(target=enqueue) for _ in range(10)]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+
+ self.assertItemsEqual(dequeued_t.eval(), elems * 10)
+
+ def testParallelDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(1000, 0, tf.float32, shapes=())
+ elems = [10.0 * x for x in range(1000)]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(100)
+
+ enqueue_op.run()
+
+ # Dequeue 100 items in parallel on 10 threads.
+ dequeued_elems = []
+
+ def dequeue():
+ dequeued_elems.extend(sess.run(dequeued_t))
+ threads = [self.checkedThread(target=dequeue) for _ in range(10)]
+ for thread in threads:
+ thread.start()
+ for thread in threads:
+ thread.join()
+ self.assertItemsEqual(elems, dequeued_elems)
+
+ def testBlockingDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ dequeued_t = q.dequeue_many(4)
+
+ dequeued_elems = []
+
+ def enqueue():
+ # The enqueue_op should run after the dequeue op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ sess.run(enqueue_op)
+
+ def dequeue():
+ dequeued_elems.extend(sess.run(dequeued_t).tolist())
+
+ enqueue_thread = self.checkedThread(target=enqueue)
+ dequeue_thread = self.checkedThread(target=dequeue)
+ enqueue_thread.start()
+ dequeue_thread.start()
+ enqueue_thread.join()
+ dequeue_thread.join()
+
+ self.assertItemsEqual(elems, dequeued_elems)
+
+ def testDequeueManyWithTensorParameter(self):
+ with self.test_session():
+ # Define a first queue that contains integer counts.
+ dequeue_counts = [random.randint(1, 10) for _ in range(100)]
+ count_q = tf.RandomShuffleQueue(100, 0, tf.int32)
+ enqueue_counts_op = count_q.enqueue_many((dequeue_counts,))
+ total_count = sum(dequeue_counts)
+
+ # Define a second queue that contains total_count elements.
+ elems = [random.randint(0, 100) for _ in range(total_count)]
+ q = tf.RandomShuffleQueue(
+ total_count, 0, tf.int32, ((),))
+ enqueue_elems_op = q.enqueue_many((elems,))
+
+ # Define a subgraph that first dequeues a count, then DequeuesMany
+ # that number of elements.
+ dequeued_t = q.dequeue_many(count_q.dequeue())
+
+ enqueue_counts_op.run()
+ enqueue_elems_op.run()
+
+ dequeued_elems = []
+ for _ in dequeue_counts:
+ dequeued_elems.extend(dequeued_t.eval())
+ self.assertItemsEqual(elems, dequeued_elems)
+
+ def testDequeueFromClosedQueue(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 2, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+ close_op.run()
+ results = [dequeued_t.eval() for _ in elems]
+ expected = [[elem] for elem in elems]
+ self.assertItemsEqual(expected, results)
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ dequeued_t.eval()
+
+ def testBlockingDequeueFromClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 2, tf.float32)
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ results = []
+ def dequeue():
+ for _ in elems:
+ results.append(sess.run(dequeued_t))
+ self.assertItemsEqual(elems, results)
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ # The dequeue thread blocked when it hit the min_after_dequeue
+ # requirement (2 elements must remain while the queue is open).
+ self.assertEqual(len(results), 2)
+ close_op.run()
+ dequeue_thread.join()
+ # Once the queue is closed, the min_after_dequeue requirement is lifted.
+ self.assertEqual(len(results), 4)
+
+ def testBlockingDequeueFromClosedEmptyQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32)
+ close_op = q.close()
+ dequeued_t = q.dequeue()
+
+ finished = [] # Needs to be a mutable type
+ def dequeue():
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+ finished.append(True)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ self.assertEqual(len(finished), 0)
+ close_op.run()
+ dequeue_thread.join()
+ self.assertEqual(len(finished), 1)
+
+ def testBlockingDequeueManyFromClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(4)
+
+ enqueue_op.run()
+
+ progress = [] # Must be mutable
+ def dequeue():
+ self.assertItemsEqual(elems, sess.run(dequeued_t))
+ progress.append(1)
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+ progress.append(2)
+
+ self.assertEqual(len(progress), 0)
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ for _ in range(100):
+ time.sleep(0.01)
+ if len(progress) == 1: break
+ self.assertEqual(len(progress), 1)
+ time.sleep(0.01)
+ close_op.run()
+ dequeue_thread.join()
+ self.assertEqual(len(progress), 2)
+
+ def testBlockingDequeueManyFromClosedQueueWithElementsRemaining(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(3)
+ cleanup_dequeue_t = q.dequeue_many(q.size())
+
+ enqueue_op.run()
+
+ results = []
+ def dequeue():
+ results.extend(sess.run(dequeued_t))
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+ # However, the last result was dequeued before the queue was closed,
+ # so nothing more is added to results.
+ results.extend(sess.run(cleanup_dequeue_t))
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ self.assertEqual(len(results), 3)
+ close_op.run()
+ dequeue_thread.join()
+ self.assertEqual(len(results), 3)
+
+ def testBlockingDequeueManyFromClosedEmptyQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(10, 5, tf.float32, ((),))
+ close_op = q.close()
+ dequeued_t = q.dequeue_many(4)
+
+ def dequeue():
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError,
+ "is closed and has insufficient"):
+ sess.run(dequeued_t)
+
+ dequeue_thread = self.checkedThread(target=dequeue)
+ dequeue_thread.start()
+ # The close_op should run after the dequeue_thread has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ close_op.run()
+ dequeue_thread.join()
+
+ def testEnqueueToClosedQueue(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 4, tf.float32)
+ enqueue_op = q.enqueue((10.0,))
+ close_op = q.close()
+
+ enqueue_op.run()
+ close_op.run()
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "is closed"):
+ enqueue_op.run()
+
+ def testEnqueueManyToClosedQueue(self):
+ with self.test_session():
+ q = tf.RandomShuffleQueue(10, 5, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ close_op = q.close()
+
+ enqueue_op.run()
+ close_op.run()
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "is closed"):
+ enqueue_op.run()
+
+ def testBlockingEnqueueToFullQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(4, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue((50.0,))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ sess.run(blocking_enqueue_op)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+ # The dequeue ops should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ results = []
+ for _ in elems:
+ results.append(dequeued_t.eval())
+ results.append(dequeued_t.eval())
+ self.assertItemsEqual(elems + [50.0], results)
+ # There wasn't room for 50.0 in the queue when the first element was
+ # dequeued.
+ self.assertNotEqual(50.0, results[0])
+ thread.join()
+
+ def testBlockingEnqueueManyToFullQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(4, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue_many(([50.0, 60.0],))
+ dequeued_t = q.dequeue()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ sess.run(blocking_enqueue_op)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+ # The dequeue ops should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ results = []
+ for _ in elems:
+ time.sleep(0.01)
+ results.append(dequeued_t.eval())
+ results.append(dequeued_t.eval())
+ results.append(dequeued_t.eval())
+ self.assertItemsEqual(elems + [50.0, 60.0], results)
+ # There wasn't room for 50.0 or 60.0 in the queue when the first
+ # element was dequeued.
+ self.assertNotEqual(50.0, results[0])
+ self.assertNotEqual(60.0, results[0])
+ # Similarly for 60.0 and the second element.
+ self.assertNotEqual(60.0, results[1])
+
+ def testBlockingEnqueueToClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(4, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0, 40.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue((50.0,))
+ dequeued_t = q.dequeue()
+ close_op = q.close()
+
+ enqueue_op.run()
+
+ def blocking_enqueue():
+ # Expect the operation to succeed since it will complete
+ # before the queue is closed.
+ sess.run(blocking_enqueue_op)
+
+ # Expect the operation to fail due to the queue being closed.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "closed"):
+ sess.run(blocking_enqueue_op)
+ thread1 = self.checkedThread(target=blocking_enqueue)
+ thread1.start()
+
+ # The close_op should run after the first blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ def blocking_close():
+ sess.run(close_op)
+ thread2 = self.checkedThread(target=blocking_close)
+ thread2.start()
+
+ # Wait for the close op to block before unblocking the enqueue.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ results = []
+ # Dequeue to unblock the first blocking_enqueue_op, after which the
+ # close will complete.
+ results.append(dequeued_t.eval())
+ self.assertTrue(results[0] in elems)
+ thread2.join()
+ thread1.join()
+
+ def testBlockingEnqueueManyToClosedQueue(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(4, 0, tf.float32, ((),))
+ elems = [10.0, 20.0, 30.0]
+ enqueue_op = q.enqueue_many((elems,))
+ blocking_enqueue_op = q.enqueue_many(([50.0, 60.0],))
+ close_op = q.close()
+ size_t = q.size()
+
+ enqueue_op.run()
+ self.assertEqual(size_t.eval(), 3)
+
+ def blocking_enqueue():
+ # This will block until the dequeue after the close.
+ sess.run(blocking_enqueue_op)
+ # At this point the close operation will become unblocked, so the
+ # next enqueue will fail.
+ with self.assertRaisesRegexp(tf.errors.AbortedError, "closed"):
+ sess.run(blocking_enqueue_op)
+ thread1 = self.checkedThread(target=blocking_enqueue)
+ thread1.start()
+ # The close_op should run after the blocking_enqueue_op has blocked.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ # The first blocking_enqueue_op in blocking_enqueue has enqueued 1 of its
+ # 2 elements, and is blocked waiting for one more element to be dequeued.
+ self.assertEqual(size_t.eval(), 4)
+
+ def blocking_close():
+ sess.run(close_op)
+ thread2 = self.checkedThread(target=blocking_close)
+ thread2.start()
+
+ # The close_op should run before the second blocking_enqueue_op
+ # has started.
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+
+ # Unblock the first blocking_enqueue_op in blocking_enqueue.
+ q.dequeue().eval()
+
+ thread2.join()
+ thread1.join()
+
+ def testSharedQueueSameSession(self):
+ with self.test_session():
+ q1 = tf.RandomShuffleQueue(
+ 1, 0, tf.float32, ((),), shared_name="shared_queue")
+ q1.enqueue((10.0,)).run()
+
+ q2 = tf.RandomShuffleQueue(
+ 1, 0, tf.float32, ((),), shared_name="shared_queue")
+
+ q1_size_t = q1.size()
+ q2_size_t = q2.size()
+
+ self.assertEqual(q1_size_t.eval(), 1)
+ self.assertEqual(q2_size_t.eval(), 1)
+
+ self.assertEqual(q2.dequeue().eval(), 10.0)
+
+ self.assertEqual(q1_size_t.eval(), 0)
+ self.assertEqual(q2_size_t.eval(), 0)
+
+ q2.enqueue((20.0,)).run()
+
+ self.assertEqual(q1_size_t.eval(), 1)
+ self.assertEqual(q2_size_t.eval(), 1)
+
+ self.assertEqual(q1.dequeue().eval(), 20.0)
+
+ self.assertEqual(q1_size_t.eval(), 0)
+ self.assertEqual(q2_size_t.eval(), 0)
+
+ def testIncompatibleSharedQueueErrors(self):
+ with self.test_session():
+ q_a_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_a")
+ q_a_2 = tf.RandomShuffleQueue(
+ 15, 5, tf.float32, shared_name="q_a")
+ q_a_1.queue_ref.eval()
+ with self.assertRaisesOpError("capacity"):
+ q_a_2.queue_ref.eval()
+
+ q_b_1 = tf.RandomShuffleQueue(
+ 10, 0, tf.float32, shared_name="q_b")
+ q_b_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_b")
+ q_b_1.queue_ref.eval()
+ with self.assertRaisesOpError("min_after_dequeue"):
+ q_b_2.queue_ref.eval()
+
+ q_c_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_c")
+ q_c_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.int32, shared_name="q_c")
+ q_c_1.queue_ref.eval()
+ with self.assertRaisesOpError("component types"):
+ q_c_2.queue_ref.eval()
+
+ q_d_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_d")
+ q_d_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_d")
+ q_d_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_d_2.queue_ref.eval()
+
+ q_e_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_e")
+ q_e_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_e")
+ q_e_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_e_2.queue_ref.eval()
+
+ q_f_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shapes=[(1, 1, 2, 3)], shared_name="q_f")
+ q_f_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shapes=[(1, 1, 2, 4)], shared_name="q_f")
+ q_f_1.queue_ref.eval()
+ with self.assertRaisesOpError("component shapes"):
+ q_f_2.queue_ref.eval()
+
+ q_g_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, shared_name="q_g")
+ q_g_2 = tf.RandomShuffleQueue(
+ 10, 5, (tf.float32, tf.int32), shared_name="q_g")
+ q_g_1.queue_ref.eval()
+ with self.assertRaisesOpError("component types"):
+ q_g_2.queue_ref.eval()
+
+ q_h_1 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, seed=12, shared_name="q_h")
+ q_h_2 = tf.RandomShuffleQueue(
+ 10, 5, tf.float32, seed=21, shared_name="q_h")
+ q_h_1.queue_ref.eval()
+ with self.assertRaisesOpError("random seeds"):
+ q_h_2.queue_ref.eval()
+
+ def testSelectQueue(self):
+ with self.test_session():
+ num_queues = 10
+ qlist = list()
+ for _ in xrange(num_queues):
+ qlist.append(
+ tf.RandomShuffleQueue(10, 0, tf.float32))
+ # Enqueue/Dequeue into a dynamically selected queue
+ for _ in xrange(20):
+ index = np.random.randint(num_queues)
+ q = tf.RandomShuffleQueue.from_list(index, qlist)
+ q.enqueue((10.,)).run()
+ self.assertEqual(q.dequeue().eval(), 10.0)
+
+ def testSelectQueueOutOfRange(self):
+ with self.test_session():
+ q1 = tf.RandomShuffleQueue(10, 0, tf.float32)
+ q2 = tf.RandomShuffleQueue(15, 0, tf.float32)
+ enq_q = tf.RandomShuffleQueue.from_list(3, [q1, q2])
+ with self.assertRaisesOpError("Index must be in the range"):
+ enq_q.dequeue().eval()
+
+ def _blockingDequeue(self, sess, dequeue_op):
+ with self.assertRaisesOpError("Dequeue operation was cancelled"):
+ sess.run(dequeue_op)
+
+ def _blockingDequeueMany(self, sess, dequeue_many_op):
+ with self.assertRaisesOpError("Dequeue operation was cancelled"):
+ sess.run(dequeue_many_op)
+
+ def _blockingEnqueue(self, sess, enqueue_op):
+ with self.assertRaisesOpError("Enqueue operation was cancelled"):
+ sess.run(enqueue_op)
+
+ def _blockingEnqueueMany(self, sess, enqueue_many_op):
+ with self.assertRaisesOpError("Enqueue operation was cancelled"):
+ sess.run(enqueue_many_op)
+
+ def testResetOfBlockingOperation(self):
+ with self.test_session() as sess:
+ q_empty = tf.RandomShuffleQueue(
+ 5, 0, tf.float32, ((),))
+ dequeue_op = q_empty.dequeue()
+ dequeue_many_op = q_empty.dequeue_many(1)
+
+ q_full = tf.RandomShuffleQueue(5, 0, tf.float32, ((),))
+ sess.run(q_full.enqueue_many(([1.0, 2.0, 3.0, 4.0, 5.0],)))
+ enqueue_op = q_full.enqueue((6.0,))
+ enqueue_many_op = q_full.enqueue_many(([6.0],))
+
+ threads = [
+ self.checkedThread(self._blockingDequeue, args=(sess, dequeue_op)),
+ self.checkedThread(self._blockingDequeueMany, args=(sess,
+ dequeue_many_op)),
+ self.checkedThread(self._blockingEnqueue, args=(sess, enqueue_op)),
+ self.checkedThread(self._blockingEnqueueMany, args=(sess,
+ enqueue_many_op))]
+ for t in threads:
+ t.start()
+ time.sleep(0.1)
+ sess.close() # Will cancel the blocked operations.
+ for t in threads:
+ t.join()
+
+ def testDequeueManyInDifferentOrders(self):
+ with self.test_session():
+ # Specify seeds to make the test deterministic
+ # (https://en.wikipedia.org/wiki/Taxicab_number).
+ q1 = tf.RandomShuffleQueue(10, 5, tf.int32,
+ ((),), seed=1729)
+ q2 = tf.RandomShuffleQueue(10, 5, tf.int32,
+ ((),), seed=87539319)
+ enq1 = q1.enqueue_many(([1, 2, 3, 4, 5],))
+ enq2 = q2.enqueue_many(([1, 2, 3, 4, 5],))
+ deq1 = q1.dequeue_many(5)
+ deq2 = q2.dequeue_many(5)
+
+ enq1.run()
+ enq1.run()
+ enq2.run()
+ enq2.run()
+
+ results = [[], [], [], []]
+
+ results[0].extend(deq1.eval())
+ results[1].extend(deq2.eval())
+
+ q1.close().run()
+ q2.close().run()
+
+ results[2].extend(deq1.eval())
+ results[3].extend(deq2.eval())
+
+ # No two should match
+ for i in range(1, 4):
+ for j in range(i):
+ self.assertNotEqual(results[i], results[j])
+
+ def testDequeueInDifferentOrders(self):
+ with self.test_session():
+ # Specify seeds to make the test deterministic
+ # (https://en.wikipedia.org/wiki/Taxicab_number).
+ q1 = tf.RandomShuffleQueue(10, 5, tf.int32,
+ ((),), seed=1729)
+ q2 = tf.RandomShuffleQueue(10, 5, tf.int32,
+ ((),), seed=87539319)
+ enq1 = q1.enqueue_many(([1, 2, 3, 4, 5],))
+ enq2 = q2.enqueue_many(([1, 2, 3, 4, 5],))
+ deq1 = q1.dequeue()
+ deq2 = q2.dequeue()
+
+ enq1.run()
+ enq1.run()
+ enq2.run()
+ enq2.run()
+
+ results = [[], [], [], []]
+
+ for _ in range(5):
+ results[0].append(deq1.eval())
+ results[1].append(deq2.eval())
+
+ q1.close().run()
+ q2.close().run()
+
+ for _ in range(5):
+ results[2].append(deq1.eval())
+ results[3].append(deq2.eval())
+
+ # No two should match
+ for i in range(1, 4):
+ for j in range(i):
+ self.assertNotEqual(results[i], results[j])
+
+ def testBigEnqueueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(
+ 5, 0, tf.int32, ((),))
+ elem = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ enq = q.enqueue_many((elem,))
+ deq = q.dequeue()
+ size_op = q.size()
+
+ enq_done = []
+ def blocking_enqueue():
+ enq_done.append(False)
+ # This will fill the queue and then block until enough dequeues happen.
+ sess.run(enq)
+ enq_done.append(True)
+ thread = self.checkedThread(target=blocking_enqueue)
+ thread.start()
+
+ # The enqueue should start and then block.
+ results = []
+ results.append(deq.eval()) # Will only complete after the enqueue starts.
+ self.assertEqual(len(enq_done), 1)
+ self.assertEqual(sess.run(size_op), 5)
+
+ for _ in range(3):
+ results.append(deq.eval())
+
+ time.sleep(0.1)
+ self.assertEqual(len(enq_done), 1)
+ self.assertEqual(sess.run(size_op), 5)
+
+ # This dequeue will unblock the thread.
+ results.append(deq.eval())
+ time.sleep(0.1)
+ self.assertEqual(len(enq_done), 2)
+ thread.join()
+
+ for i in range(5):
+ self.assertEqual(size_op.eval(), 5 - i)
+ results.append(deq.eval())
+ self.assertEqual(size_op.eval(), 5 - i - 1)
+
+ self.assertItemsEqual(elem, results)
+
+ def testBigDequeueMany(self):
+ with self.test_session() as sess:
+ q = tf.RandomShuffleQueue(2, 0, tf.int32, ((),))
+ elem = range(4)
+ enq_list = [q.enqueue((e,)) for e in elem]
+ deq = q.dequeue_many(4)
+
+ results = []
+ def blocking_dequeue():
+ # Will only complete after 4 enqueues complete.
+ results.extend(sess.run(deq))
+ thread = self.checkedThread(target=blocking_dequeue)
+ thread.start()
+ # The dequeue should start and then block.
+ for enq in enq_list:
+ # TODO(mrry): Figure out how to do this without sleeping.
+ time.sleep(0.1)
+ self.assertEqual(len(results), 0)
+ sess.run(enq)
+
+ # Enough enqueued to unblock the dequeue
+ thread.join()
+ self.assertItemsEqual(elem, results)
+
+
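+# A minimal sketch (not run by these tests) of the behaviour they exercise:
+# dequeues from an open RandomShuffleQueue respect min_after_dequeue, but
+# once the queue is closed it can be drained completely, and dequeuing from
+# a closed, empty queue raises OutOfRangeError. Assumes the `tf` import at
+# the top of this file; the values are illustrative only.
+def _example_drain_after_close(sess):
+ q = tf.RandomShuffleQueue(10, 2, tf.float32)
+ enqueue_op = q.enqueue_many(([10.0, 20.0, 30.0, 40.0],))
+ dequeue_op = q.dequeue()
+ close_op = q.close()
+ sess.run(enqueue_op)
+ sess.run(close_op)
+ drained = []
+ try:
+ while True:
+ drained.append(sess.run(dequeue_op))
+ except tf.errors.OutOfRangeError:
+ pass # Closed and empty: min_after_dequeue no longer applies.
+ return drained
+
+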
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/reader_ops_test.py b/tensorflow/python/kernel_tests/reader_ops_test.py
new file mode 100644
index 0000000000..484e3eca43
--- /dev/null
+++ b/tensorflow/python/kernel_tests/reader_ops_test.py
@@ -0,0 +1,362 @@
+"""Tests for Reader ops from io_ops."""
+
+import os
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class IdentityReaderTest(tf.test.TestCase):
+
+ def _ExpectRead(self, sess, key, value, expected):
+ k, v = sess.run([key, value])
+ self.assertAllEqual(expected, k)
+ self.assertAllEqual(expected, v)
+
+ def testOneEpoch(self):
+ with self.test_session() as sess:
+ reader = tf.IdentityReader("test_reader")
+ work_completed = reader.num_work_units_completed()
+ produced = reader.num_records_produced()
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ queued_length = queue.size()
+ key, value = reader.read(queue)
+
+ self.assertAllEqual(0, work_completed.eval())
+ self.assertAllEqual(0, produced.eval())
+ self.assertAllEqual(0, queued_length.eval())
+
+ queue.enqueue_many([["A", "B", "C"]]).run()
+ queue.close().run()
+ self.assertAllEqual(3, queued_length.eval())
+
+ self._ExpectRead(sess, key, value, "A")
+ self.assertAllEqual(1, produced.eval())
+
+ self._ExpectRead(sess, key, value, "B")
+
+ self._ExpectRead(sess, key, value, "C")
+ self.assertAllEqual(3, produced.eval())
+ self.assertAllEqual(0, queued_length.eval())
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ sess.run([key, value])
+
+ self.assertAllEqual(3, work_completed.eval())
+ self.assertAllEqual(3, produced.eval())
+ self.assertAllEqual(0, queued_length.eval())
+
+ def testMultipleEpochs(self):
+ with self.test_session() as sess:
+ reader = tf.IdentityReader("test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ enqueue = queue.enqueue_many([["DD", "EE"]])
+ key, value = reader.read(queue)
+
+ enqueue.run()
+ self._ExpectRead(sess, key, value, "DD")
+ self._ExpectRead(sess, key, value, "EE")
+ enqueue.run()
+ self._ExpectRead(sess, key, value, "DD")
+ self._ExpectRead(sess, key, value, "EE")
+ enqueue.run()
+ self._ExpectRead(sess, key, value, "DD")
+ self._ExpectRead(sess, key, value, "EE")
+ queue.close().run()
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ sess.run([key, value])
+
+ def testSerializeRestore(self):
+ with self.test_session() as sess:
+ reader = tf.IdentityReader("test_reader")
+ produced = reader.num_records_produced()
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ queue.enqueue_many([["X", "Y", "Z"]]).run()
+ key, value = reader.read(queue)
+
+ self._ExpectRead(sess, key, value, "X")
+ self.assertAllEqual(1, produced.eval())
+ state = reader.serialize_state().eval()
+
+ self._ExpectRead(sess, key, value, "Y")
+ self._ExpectRead(sess, key, value, "Z")
+ self.assertAllEqual(3, produced.eval())
+
+ queue.enqueue_many([["Y", "Z"]]).run()
+ queue.close().run()
+ reader.restore_state(state).run()
+ self.assertAllEqual(1, produced.eval())
+ self._ExpectRead(sess, key, value, "Y")
+ self._ExpectRead(sess, key, value, "Z")
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ sess.run([key, value])
+ self.assertAllEqual(3, produced.eval())
+
+ self.assertEqual(str, type(state))
+
+ with self.assertRaises(ValueError):
+ reader.restore_state([])
+
+ with self.assertRaises(ValueError):
+ reader.restore_state([state, state])
+
+ with self.assertRaisesOpError(
+ "Could not parse state for IdentityReader 'test_reader'"):
+ reader.restore_state(state[1:]).run()
+
+ with self.assertRaisesOpError(
+ "Could not parse state for IdentityReader 'test_reader'"):
+ reader.restore_state(state[:-1]).run()
+
+ with self.assertRaisesOpError(
+ "Could not parse state for IdentityReader 'test_reader'"):
+ reader.restore_state(state + "ExtraJunk").run()
+
+ with self.assertRaisesOpError(
+ "Could not parse state for IdentityReader 'test_reader'"):
+ reader.restore_state("PREFIX" + state).run()
+
+ with self.assertRaisesOpError(
+ "Could not parse state for IdentityReader 'test_reader'"):
+ reader.restore_state("BOGUS" + state[5:]).run()
+
+ def testReset(self):
+ with self.test_session() as sess:
+ reader = tf.IdentityReader("test_reader")
+ work_completed = reader.num_work_units_completed()
+ produced = reader.num_records_produced()
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ queued_length = queue.size()
+ key, value = reader.read(queue)
+
+ queue.enqueue_many([["X", "Y", "Z"]]).run()
+ self._ExpectRead(sess, key, value, "X")
+ self.assertLess(0, queued_length.eval())
+ self.assertAllEqual(1, produced.eval())
+
+ self._ExpectRead(sess, key, value, "Y")
+ self.assertLess(0, work_completed.eval())
+ self.assertAllEqual(2, produced.eval())
+
+ reader.reset().run()
+ self.assertAllEqual(0, work_completed.eval())
+ self.assertAllEqual(0, produced.eval())
+ self.assertAllEqual(1, queued_length.eval())
+ self._ExpectRead(sess, key, value, "Z")
+
+ queue.enqueue_many([["K", "L"]]).run()
+ self._ExpectRead(sess, key, value, "K")
+
+
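+# A minimal sketch (not run by these tests) of the reader pattern exercised
+# here: a reader repeatedly pulls work-unit strings from a queue and emits
+# (key, value) records, and reading from a closed, exhausted queue raises
+# OutOfRangeError. Assumes the `tf` import at the top of this file.
+def _example_read_all(sess, reader, queue):
+ key, value = reader.read(queue)
+ records = []
+ try:
+ while True:
+ records.append(sess.run([key, value]))
+ except tf.errors.OutOfRangeError:
+ pass # The work queue is closed and empty.
+ return records
+
+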
+class WholeFileReaderTest(tf.test.TestCase):
+
+ def setUp(self):
+ super(WholeFileReaderTest, self).setUp()
+ self._filenames = [os.path.join(self.get_temp_dir(), "whole_file.%d.txt" % i)
+ for i in range(3)]
+ self._content = ["One\na\nb\n", "Two\nC\nD", "Three x, y, z"]
+ for fn, c in zip(self._filenames, self._content):
+ open(fn, "w").write(c)
+
+ def tearDown(self):
+ super(WholeFileReaderTest, self).tearDown()
+ for fn in self._filenames:
+ os.remove(fn)
+
+ def _ExpectRead(self, sess, key, value, index):
+ k, v = sess.run([key, value])
+ self.assertAllEqual(self._filenames[index], k)
+ self.assertAllEqual(self._content[index], v)
+
+ def testOneEpoch(self):
+ with self.test_session() as sess:
+ reader = tf.WholeFileReader("test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ queue.enqueue_many([self._filenames]).run()
+ queue.close().run()
+ key, value = reader.read(queue)
+
+ self._ExpectRead(sess, key, value, 0)
+ self._ExpectRead(sess, key, value, 1)
+ self._ExpectRead(sess, key, value, 2)
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ sess.run([key, value])
+
+ def testInfiniteEpochs(self):
+ with self.test_session() as sess:
+ reader = tf.WholeFileReader("test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ enqueue = queue.enqueue_many([self._filenames])
+ key, value = reader.read(queue)
+
+ enqueue.run()
+ self._ExpectRead(sess, key, value, 0)
+ self._ExpectRead(sess, key, value, 1)
+ enqueue.run()
+ self._ExpectRead(sess, key, value, 2)
+ self._ExpectRead(sess, key, value, 0)
+ self._ExpectRead(sess, key, value, 1)
+ enqueue.run()
+ self._ExpectRead(sess, key, value, 2)
+ self._ExpectRead(sess, key, value, 0)
+
+
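+# The TextLineReader assertions below assume keys of the form
+# "<filename>:<line number>", with line numbers starting at 1. A small
+# plain-Python helper stating that convention:
+def _example_text_line_key(filename, line_index):
+ # line_index is 0-based; the expected key uses 1-based line numbers.
+ return "%s:%d" % (filename, line_index + 1)
+
+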
+class TextLineReaderTest(tf.test.TestCase):
+
+ def setUp(self):
+ super(TextLineReaderTest, self).setUp()
+ self._num_files = 2
+ self._num_lines = 5
+
+ def _LineText(self, f, l):
+ return "%d: %d" % (f, l)
+
+ def _CreateFiles(self):
+ filenames = []
+ for i in range(self._num_files):
+ fn = os.path.join(self.get_temp_dir(), "text_line.%d.txt" % i)
+ filenames.append(fn)
+ f = open(fn, "w")
+ for j in range(self._num_lines):
+ f.write(self._LineText(i, j))
+ # Always include a newline after the record unless it is
+ # at the end of the file, in which case we include it only for the
+ # first file (so both trailing-newline layouts are covered).
+ if j + 1 != self._num_lines or i == 0:
+ f.write("\n")
+ return filenames
+
+ def testOneEpoch(self):
+ files = self._CreateFiles()
+ with self.test_session() as sess:
+ reader = tf.TextLineReader(name="test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ key, value = reader.read(queue)
+
+ queue.enqueue_many([files]).run()
+ queue.close().run()
+ for i in range(self._num_files):
+ for j in range(self._num_lines):
+ k, v = sess.run([key, value])
+ self.assertAllEqual("%s:%d" % (files[i], j + 1), k)
+ self.assertAllEqual(self._LineText(i, j), v)
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ k, v = sess.run([key, value])
+
+ def testSkipHeaderLines(self):
+ files = self._CreateFiles()
+ with self.test_session() as sess:
+ reader = tf.TextLineReader(skip_header_lines=1, name="test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ key, value = reader.read(queue)
+
+ queue.enqueue_many([files]).run()
+ queue.close().run()
+ for i in range(self._num_files):
+ for j in range(self._num_lines - 1):
+ k, v = sess.run([key, value])
+ self.assertAllEqual("%s:%d" % (files[i], j + 2), k)
+ self.assertAllEqual(self._LineText(i, j + 1), v)
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ k, v = sess.run([key, value])
+
+
+class FixedLengthRecordReaderTest(tf.test.TestCase):
+
+ def setUp(self):
+ super(FixedLengthRecordReaderTest, self).setUp()
+ self._num_files = 2
+ self._num_records = 7
+ self._header_bytes = 5
+ self._record_bytes = 3
+ self._footer_bytes = 2
+
+ def _Record(self, f, r):
+ return str(f * 2 + r) * self._record_bytes
+
+ def _CreateFiles(self):
+ filenames = []
+ for i in range(self._num_files):
+ fn = os.path.join(self.get_temp_dir(), "fixed_length_record.%d.txt" % i)
+ filenames.append(fn)
+ f = open(fn, "w")
+ f.write("H" * self._header_bytes)
+ for j in range(self._num_records):
+ f.write(self._Record(i, j))
+ f.write("F" * self._footer_bytes)
+ return filenames
+
+ def testOneEpoch(self):
+ files = self._CreateFiles()
+ with self.test_session() as sess:
+ reader = tf.FixedLengthRecordReader(
+ header_bytes=self._header_bytes,
+ record_bytes=self._record_bytes,
+ footer_bytes=self._footer_bytes,
+ name="test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ key, value = reader.read(queue)
+
+ queue.enqueue_many([files]).run()
+ queue.close().run()
+ for i in range(self._num_files):
+ for j in range(self._num_records):
+ k, v = sess.run([key, value])
+ self.assertAllEqual("%s:%d" % (files[i], j), k)
+ self.assertAllEqual(self._Record(i, j), v)
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ k, v = sess.run([key, value])
+
+
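+# The fixed-length files written above have the byte layout
+# [header][record 0][record 1]...[footer], so record j starts at offset
+# header_bytes + j * record_bytes. A plain-Python sketch of that
+# arithmetic, independent of the reader implementation:
+def _example_fixed_length_record_offset(header_bytes, record_bytes, record_index):
+ return header_bytes + record_bytes * record_index
+
+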
+class TFRecordReaderTest(tf.test.TestCase):
+
+ def setUp(self):
+ super(TFRecordReaderTest, self).setUp()
+ self._num_files = 2
+ self._num_records = 7
+
+ def _Record(self, f, r):
+ return "Record %d of file %d" % (r, f)
+
+ def _CreateFiles(self):
+ filenames = []
+ for i in range(self._num_files):
+ fn = os.path.join(self.get_temp_dir(), "tf_record.%d.txt" % i)
+ filenames.append(fn)
+ writer = tf.python_io.TFRecordWriter(fn)
+ for j in range(self._num_records):
+ writer.write(self._Record(i, j))
+ return filenames
+
+ def testOneEpoch(self):
+ files = self._CreateFiles()
+ with self.test_session() as sess:
+ reader = tf.TFRecordReader(name="test_reader")
+ queue = tf.FIFOQueue(99, [tf.string], shapes=())
+ key, value = reader.read(queue)
+
+ queue.enqueue_many([files]).run()
+ queue.close().run()
+ for i in range(self._num_files):
+ for j in range(self._num_records):
+ k, v = sess.run([key, value])
+ self.assertTrue(k.startswith("%s:" % files[i]))
+ self.assertAllEqual(self._Record(i, j), v)
+
+ with self.assertRaisesOpError("is closed and has insufficient elements "
+ "\\(requested 1, current size 0\\)"):
+ k, v = sess.run([key, value])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/reduction_ops_test.py b/tensorflow/python/kernel_tests/reduction_ops_test.py
new file mode 100644
index 0000000000..e5cab62c09
--- /dev/null
+++ b/tensorflow/python/kernel_tests/reduction_ops_test.py
@@ -0,0 +1,533 @@
+"""Functional tests for reduction ops."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.kernel_tests import gradient_checker
+
+
+class SumReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims, use_gpu=False):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.sum(np_ans, keepdims=keep_dims)
+ else:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ for ra in reduction_axes.ravel()[::-1]:
+ np_ans = np.sum(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.reduce_sum(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ if reduction_axes is not None and np.shape(reduction_axes) == (1,):
+ # Test scalar reduction_axes argument
+ self._compareAll(x, reduction_axes[0])
+ self._compare(x, reduction_axes, False, use_gpu=True)
+ self._compare(x, reduction_axes, False, use_gpu=False)
+ self._compare(x, reduction_axes, True, use_gpu=True)
+ self._compare(x, reduction_axes, True, use_gpu=False)
+
+ def testFloatReduce1D(self):
+ # Create a 1D array of floats
+ np_arr = np.arange(1, 6).reshape([5]).astype(np.float32)
+ self._compareAll(np_arr, [0])
+
+ def testFloatReduce2D(self):
+ # Create a 2D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 10).reshape([2, 5]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [0, 1])
+
+ def testFloatReduce3D(self):
+ # Create a 3D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 30).reshape([2, 3, 5]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testFloatReduce4D(self):
+ # Create a 4D array of floats and reduce across some
+ # dimensions
+ np_arr = np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ # Need specialization for reduce(4D, [0, 2])
+ # self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+ self._compareAll(np_arr, [1, 2, 3])
+ self._compareAll(np_arr, [0, 1, 2, 3])
+
+ def testFloatReduce5D(self):
+ # Create a 5D array of floats and reduce across some dimensions
+ np_arr = np.arange(0, 840).reshape([2, 3, 5, 7, 4]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ # Need specialization for reduce(4D, [0, 2])
+ # self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+ self._compareAll(np_arr, [1, 2, 3])
+ self._compareAll(np_arr, [0, 1, 2, 3])
+ self._compareAll(np_arr, [1, 2, 3, 4])
+ self._compareAll(np_arr, [0, 1, 2, 3, 4])
+
+ # Simple tests for various types.
+ def testDoubleReduce1D(self):
+ np_arr = np.arange(1, 6).reshape([5]).astype(np.float64)
+ self._compare(np_arr, [], False)
+ self._compare(np_arr, [0], False)
+
+ def testInt32Reduce1D(self):
+ np_arr = np.arange(1, 6).reshape([5]).astype(np.int32)
+ self._compare(np_arr, [], False)
+ self._compare(np_arr, [0], False)
+
+ def testInvalidIndex(self):
+ np_arr = np.arange(0, 10).reshape([2, 5]).astype(np.float32)
+ input_tensor = tf.convert_to_tensor(np_arr)
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Invalid reduction dimension" in e.message):
+ tf.reduce_sum(input_tensor, [-1])
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Invalid reduction dimension" in e.message):
+ tf.reduce_sum(input_tensor, [2])
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: "Invalid reduction dimension" in e.message):
+ tf.reduce_sum(input_tensor, [0, 2])
+
+ # TODO: Add int64 tests.
+
+ def _compareGradient(self, shape, sum_shape, reduction_axes):
+ if reduction_axes is not None and np.shape(reduction_axes) == (1,):
+ # Test scalar reduction_axes argument
+ self._compareGradient(shape, sum_shape, reduction_axes[0])
+ x = np.arange(1.0, 49.0).reshape(shape).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_sum(t, reduction_axes)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t,
+ shape,
+ su,
+ sum_shape,
+ x_init_value=x,
+ delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient(self):
+ self._compareGradient([2, 3, 4, 2], [2, 2], [1, 2])
+
+ def testGradient2(self):
+ self._compareGradient([2, 3, 4, 2], [2, 4, 2], [1])
+
+ def testGradient3(self):
+ self._compareGradient([2, 3, 4, 2], [2, 3, 2], [2])
+
+ def testGradient4(self):
+ self._compareGradient([2, 3, 4, 2], [], None)
+
+
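+# The NumPy reference values in these tests are built by reducing one axis
+# at a time, walking reduction_axes from last to first so that, when
+# keep_dims is False, the remaining axis indices stay valid. A standalone
+# sketch of that helper, e.g. _example_np_reduce(np.sum, np_arr, [0, 2], False):
+def _example_np_reduce(np_op, x, reduction_axes, keep_dims):
+ out = x
+ for ra in reduction_axes[::-1]:
+ out = np_op(out, axis=ra, keepdims=keep_dims)
+ return out
+
+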
+class MeanReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims):
+ np_sum = x
+ count = 1
+ for ra in reduction_axes[::-1]:
+ np_sum = np.sum(np_sum, axis=ra, keepdims=keep_dims)
+ count *= x.shape[ra]
+ np_ans = np_sum / count
+ with self.test_session():
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_mean(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False)
+ self._compare(x, reduction_axes, True)
+
+ def testFloatReduce3D(self):
+ # Create a 3D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 30).reshape([2, 3, 5]).astype(np.float32)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testGradient(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float32)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_mean(t, [1, 2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ su = tf.reduce_mean(t, [0, 1, 2, 3])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [1], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ su = tf.reduce_mean(t, [])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 3, 4, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+
+class ProdReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.prod(np_ans, keepdims=keep_dims)
+ else:
+ for ra in reduction_axes[::-1]:
+ np_ans = np.prod(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session():
+ if reduction_axes is not None:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_prod(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False)
+ self._compare(x, reduction_axes, True)
+
+ def testFloatReduce3D(self):
+ # Create a 3D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 30).reshape([2, 3, 5]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testGradient(self):
+ s = [2, 3, 4, 2]
+ # NOTE(kearnes): divide by 20 so product is a reasonable size
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float32) / 20.
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+
+ su = tf.reduce_prod(t, [])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 3, 4, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ su = tf.reduce_prod(t, [1, 2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ su = tf.reduce_prod(t, [0, 1, 2, 3])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [1], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ # NOTE(kearnes): the current gradient calculation gives NaNs for 0 inputs
+ x = np.arange(0.0, 48.0).reshape(s).astype(np.float32) / 20.
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_prod(t, [])
+ jacob_t, _ = gradient_checker.ComputeGradient(
+ t, s, su, [2, 3, 4, 2], x_init_value=x, delta=1)
+ with self.assertRaisesOpError("Tensor had NaN values"):
+ tf.check_numerics(jacob_t, message="_ProdGrad NaN test").op.run()
+
+
+class MinReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims, use_gpu=False):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.amin(np_ans, keepdims=keep_dims)
+ else:
+ for ra in reduction_axes[::-1]:
+ np_ans = np.amin(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session(use_gpu=use_gpu):
+ if reduction_axes is not None:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_min(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False, use_gpu=True)
+ self._compare(x, reduction_axes, False, use_gpu=False)
+ self._compare(x, reduction_axes, True, use_gpu=True)
+ self._compare(x, reduction_axes, True, use_gpu=False)
+
+ def testFloatReduce3D(self):
+ # Create a 3D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 30).reshape([2, 3, 5]).astype(np.float32)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testGradient(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_min(t, [1, 2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient2(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_min(t, [1])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 4, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient3(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_min(t, [2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 3, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient4(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_min(t)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [1], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+
+class MaxReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims, use_gpu=False):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.amax(np_ans, keepdims=keep_dims)
+ else:
+ for ra in reduction_axes[::-1]:
+ np_ans = np.amax(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session(use_gpu=use_gpu):
+ if reduction_axes is not None:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_max(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False, use_gpu=True)
+ self._compare(x, reduction_axes, False, use_gpu=False)
+ self._compare(x, reduction_axes, True, use_gpu=True)
+ self._compare(x, reduction_axes, True, use_gpu=False)
+
+ def testFloatReduce3D(self):
+ # Create a 3D array of floats and reduce across all possible
+ # dimensions
+ np_arr = np.arange(0, 30).reshape([2, 3, 5]).astype(np.float32)
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testGradient(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_max(t, [1, 2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient2(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_max(t, [1])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 4, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient3(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_max(t, [2])
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [2, 3, 2], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+ def testGradient4(self):
+ s = [2, 3, 4, 2]
+ x = np.arange(1.0, 49.0).reshape(s).astype(np.float64)
+ with self.test_session():
+ t = tf.convert_to_tensor(x)
+ su = tf.reduce_max(t)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ t, s, su, [1], x_init_value=x, delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-8, atol=1e-8)
+
+
+class AllReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims, use_gpu=False):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.all(np_ans, keepdims=keep_dims)
+ else:
+ for ra in reduction_axes[::-1]:
+ np_ans = np.all(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session(use_gpu=use_gpu):
+ if reduction_axes is not None:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_all(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllEqual(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False, use_gpu=True)
+ self._compare(x, reduction_axes, False, use_gpu=False)
+ self._compare(x, reduction_axes, True, use_gpu=True)
+ self._compare(x, reduction_axes, True, use_gpu=False)
+
+ def testAll3D(self):
+ # Create a 3D array of bools and reduce across all possible
+ # dimensions
+ np_arr = (np.random.uniform(0, 1, 30) > 0.1).reshape([2, 3, 5])
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+
+class AnyReductionTest(tf.test.TestCase):
+
+ def _compare(self, x, reduction_axes, keep_dims, use_gpu=False):
+ np_ans = x
+ if reduction_axes is None:
+ np_ans = np.any(np_ans, keepdims=keep_dims)
+ else:
+ for ra in reduction_axes[::-1]:
+ np_ans = np.any(np_ans, axis=ra, keepdims=keep_dims)
+ with self.test_session(use_gpu=use_gpu):
+ if reduction_axes is not None:
+ reduction_axes = np.array(reduction_axes).astype(np.int32)
+ tf_ans = tf.reduce_any(x, reduction_axes, keep_dims)
+ out = tf_ans.eval()
+ self.assertAllEqual(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareAll(self, x, reduction_axes):
+ self._compare(x, reduction_axes, False, use_gpu=True)
+ self._compare(x, reduction_axes, False, use_gpu=False)
+ self._compare(x, reduction_axes, True, use_gpu=True)
+ self._compare(x, reduction_axes, True, use_gpu=False)
+
+ def testAny3D(self):
+ # Create a 3D array of bools and reduce across all possible
+ # dimensions
+ np_arr = (np.random.uniform(0, 1, 30) > 0.9).reshape([2, 3, 5])
+ self._compareAll(np_arr, None)
+ self._compareAll(np_arr, [])
+ self._compareAll(np_arr, [0])
+ self._compareAll(np_arr, [1])
+ self._compareAll(np_arr, [2])
+ self._compareAll(np_arr, [0, 1])
+ self._compareAll(np_arr, [1, 2])
+ self._compareAll(np_arr, [0, 2])
+ self._compareAll(np_arr, [0, 1, 2])
+
+ def testPartialShapes(self):
+ # Input shape is unknown.
+ c_unknown = tf.placeholder(tf.float32)
+ s_unknown = tf.reduce_sum(c_unknown, [1, 2])
+ self.assertEqual(tensor_shape.unknown_shape(), s_unknown.get_shape())
+
+ # Input shape only has known rank.
+ c_known_rank = tf.placeholder(tf.float32)
+ c_known_rank.set_shape(tensor_shape.unknown_shape(ndims=3))
+ s_known_rank = tf.reduce_sum(c_known_rank, [1, 2], keep_dims=True)
+ self.assertEqual(3, s_known_rank.get_shape().ndims)
+
+ # Reduction indices are unknown.
+ unknown_indices = tf.placeholder(tf.int32)
+ c_unknown_indices = tf.constant([[10.0], [20.0]])
+ s_unknown_indices = tf.reduce_sum(c_unknown_indices, unknown_indices,
+ keep_dims=False)
+ self.assertEqual(tensor_shape.unknown_shape(),
+ s_unknown_indices.get_shape())
+ s_unknown_indices_keep = tf.reduce_sum(c_unknown_indices, unknown_indices,
+ keep_dims=True)
+ self.assertEqual(2, s_unknown_indices_keep.get_shape().ndims)
+
+
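+# Shape inference for reductions, as exercised by testPartialShapes above:
+# with keep_dims=True the output keeps the input's rank (reduced axes become
+# size 1); with keep_dims=False each distinct reduced axis removes one
+# dimension. A plain-Python sketch of that rank rule:
+def _example_reduced_rank(input_rank, num_reduced_axes, keep_dims):
+ return input_rank if keep_dims else input_rank - num_reduced_axes
+
+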
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/relu_op_test.py b/tensorflow/python/kernel_tests/relu_op_test.py
new file mode 100644
index 0000000000..a4b353f253
--- /dev/null
+++ b/tensorflow/python/kernel_tests/relu_op_test.py
@@ -0,0 +1,181 @@
+"""Tests for Relu and ReluGrad."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class ReluTest(tf.test.TestCase):
+
+ def _npRelu(self, np_features):
+ return np.maximum(np_features, np.zeros(np_features.shape))
+
+ def testNpRelu(self):
+ self.assertAllClose(
+ np.array([[0.0, 0.7, 0.0, 0.3, 0.0],
+ [0.1, 0.0, 0.5, 0.0, 0.9]]),
+ self._npRelu(np.array([[-0.9, 0.7, -0.5, 0.3, -0.1],
+ [0.1, -0.3, 0.5, -0.7, 0.9]])))
+
+ def _testRelu(self, np_features, use_gpu=False):
+ np_relu = self._npRelu(np_features)
+ with self.test_session(use_gpu=use_gpu):
+ relu = tf.nn.relu(np_features)
+ tf_relu = relu.eval()
+ self.assertAllClose(np_relu, tf_relu)
+ self.assertShapeEqual(np_relu, relu)
+
+ def testNumbers(self):
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testRelu(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=False)
+ if t in [np.float, np.double]:
+ self._testRelu(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=True)
+
+ # The gradient test for ReLU is a bit tricky as the derivative is not well
+ # defined at around zero and we want to avoid that in terms of input values.
+ def testGradientFloat(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], name="x")
+ y = tf.nn.relu(x, name="relu")
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [0.1, 0.3, 0.5, 0.7, 0.9]],
+ dtype=np.float32, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], y, [2, 5], x_init_value=x_init)
+ print "relu (float) gradient err = ", err
+ self.assertLess(err, 1e-4)
+
+ def testGradientNaN(self):
+ with self.test_session():
+ # Note the NaN is injected as an input to the gradient calculation.
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, np.nan, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], name="x")
+ y = tf.nn.relu(x, name="relu")
+ grad_ys = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], name="ys")
+ g_op = tf.gradients(
+ [y], [x], grad_ys=[grad_ys], name="gradients")[0]
+ try:
+ g_op.op.run()
+ assert False, "ReluGrad should have failed due to CheckNumerics."
+ except Exception as e: # pylint: disable=broad-except
+ assert "ReluGrad input is not finite." in str(e)
+
+ def testGradientDouble(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], dtype=tf.float64, name="x")
+ y = tf.nn.relu(x, name="relu")
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [0.1, 0.3, 0.5, 0.7, 0.9]],
+ dtype=np.float64, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], y, [2, 5], x_init_value=x_init)
+ print "relu (double) gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+ def testGradGradFloat(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], name="x")
+ y = tf.nn.relu(x, name="relu")
+ z = tf.gradients(y, x)
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [0.1, 0.3, 0.5, 0.7, 0.9]],
+ dtype=np.float32, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], z[0], [2, 5],
+ x_init_value=x_init)
+ print "relu (float) gradient of gradient err = ", err
+ self.assertLess(err, 1e-4)
+
+ def testGradGradDouble(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], dtype=tf.float64, name="x")
+ y = tf.nn.relu(x, name="relu")
+ z = tf.gradients(y, x)
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [0.1, 0.3, 0.5, 0.7, 0.9]],
+ dtype=np.float64, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], z[0], [2, 5],
+ x_init_value=x_init)
+ print "relu (double) gradient of gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+
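+# The gradient checked above is the usual ReLU subgradient: upstream
+# gradients pass through where the input is positive and are zeroed
+# elsewhere (inputs at exactly 0, where the derivative is undefined, are
+# deliberately avoided). A NumPy sketch of that reference gradient, not of
+# the TensorFlow kernel itself:
+def _example_np_relu_grad(grad_ys, features):
+ return grad_ys * (features > 0)
+
+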
+class Relu6Test(tf.test.TestCase):
+
+ def _npRelu6(self, np_features):
+ sixes = np.copy(np_features)
+ sixes.fill(6.0)
+ return np.minimum(np.maximum(np_features, np.zeros(np_features.shape)),
+ sixes)
+
+ def testNpRelu6(self):
+ self.assertAllClose(
+ np.array([[0.0, 0.7, 0.0, 0.3, 6.0],
+ [0.1, 0.0, 6.0, 0.0, 0.9]]),
+ self._npRelu6(np.array([[-0.9, 0.7, -0.5, 0.3, 6.0],
+ [0.1, -0.3, 6.5, -0.7, 0.9]])))
+
+ def _testRelu6(self, np_features, use_gpu=False):
+ np_relu6 = self._npRelu6(np_features)
+ with self.test_session(use_gpu=use_gpu):
+ relu6 = tf.nn.relu6(np_features)
+ tf_relu6 = relu6.eval()
+ self.assertAllClose(np_relu6, tf_relu6)
+ self.assertShapeEqual(np_relu6, relu6)
+
+ def testNumbers(self):
+ for t in [np.int32, np.int64, np.float, np.double]:
+ self._testRelu6(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=False)
+ if t in [np.float, np.double]:
+ self._testRelu6(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=True)
+
+ # The gradient test for ReLU6 is a bit tricky as the derivative is
+ # not well defined at around zero and six and we want to avoid that
+ # in terms of input values.
+ def testGradientFloat(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 6.1, 6.3, 6.5, 6.7, 6.9],
+ shape=[2, 5], name="x")
+ y = tf.nn.relu6(x, name="relu6")
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [6.1, 6.3, 6.5, 6.7, 6.9]],
+ dtype=np.float32, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], y, [2, 5], x_init_value=x_init)
+ print "relu6 (float) gradient err = ", err
+ self.assertLess(err, 1e-4)
+
+ def testGradientDouble(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 6.1, 6.3, 6.5, 6.7, 6.9],
+ shape=[2, 5], dtype=tf.float64, name="x")
+ y = tf.nn.relu6(x, name="relu6")
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [6.1, 6.3, 6.5, 6.7, 6.9]],
+ dtype=np.float64, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], y, [2, 5], x_init_value=x_init)
+ print "relu6 (double) gradient err = ", err
+ self.assertLess(err, 1e-10)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/reshape_op_test.py b/tensorflow/python/kernel_tests/reshape_op_test.py
new file mode 100644
index 0000000000..65b0e6d4bf
--- /dev/null
+++ b/tensorflow/python/kernel_tests/reshape_op_test.py
@@ -0,0 +1,106 @@
+"""Tests for tensorflow.ops.reshape_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class ReshapeTest(tf.test.TestCase):
+
+ def _testReshape(self, x, y, use_gpu=False):
+ with self.test_session(use_gpu=use_gpu):
+ np_ans = x.reshape(y)
+ tf_ans = tf.reshape(x, y)
+ out = tf_ans.eval()
+ self.assertEqual(tf_ans.get_shape(), out.shape)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _testBothReshape(self, x, y):
+ self._testReshape(x, y, False)
+ self._testReshape(x, y, True)
+
+ def testFloatBasic(self):
+ x = np.arange(1., 7.).reshape([1, 6]).astype(np.float32)
+ self._testBothReshape(x, [2, 3])
+
+ def testDoubleBasic(self):
+ x = np.arange(1., 7.).reshape([1, 6]).astype(np.float64)
+ self._testBothReshape(x, [2, 3])
+
+ def testInt32Basic(self):
+ x = np.arange(1., 7.).reshape([1, 6]).astype(np.int32)
+ self._testBothReshape(x, [2, 3])
+
+ def testSComplexBasic(self):
+ x = np.arange(1., 7.).reshape([1, 6]).astype(np.complex64)
+ self._testBothReshape(x, [2, 3])
+
+ def testFloatReshapeThreeDimensions(self):
+ x = np.arange(1., 28.).reshape([1, 27]).astype(np.float32)
+ self._testBothReshape(x, [3, 3, 3])
+
+ def testFloatUnspecifiedDimOnly(self):
+ x = np.arange(1., 7.).reshape([6]).astype(np.float32)
+ self._testBothReshape(x, [-1])
+
+ def testFloatUnspecifiedDimBegin(self):
+ x = np.arange(1., 7.).reshape([6]).astype(np.float32)
+ self._testBothReshape(x, [-1, 2])
+
+ def testFloatUnspecifiedDimEnd(self):
+ x = np.arange(1., 7.).reshape([6]).astype(np.float32)
+ self._testBothReshape(x, [3, -1])
+
+ # TODO(vrv): Add tests for failure conditions once python test_util
+ # reports errors.
+
+ def testFloatReshapeGradThreeDimensions(self):
+ x = np.arange(1., 25.).reshape([1, 24]).astype(np.float32)
+ s = list(np.shape(x))
+ with self.test_session():
+ input_tensor = tf.constant(x, shape=[2, 3, 4])
+ reshape_out = tf.reshape(input_tensor, [1, 8, 3])
+ err = gc.ComputeGradientError(input_tensor, s,
+ reshape_out, s, x_init_value=x)
+ print "Reshape gradient error = " % err
+ self.assertLess(err, 1e-3)
+
+ def testFloatEmpty(self):
+ x = np.empty((0, 0, 0, 0), dtype=np.float32)
+ self._testBothReshape(x, [1, 2, 3, 0])
+ self._testBothReshape(x, [1, 0, 0, 4])
+ self._testBothReshape(x, [0, 0, 0, 0])
+ self._testBothReshape(x, [1, 2, 0])
+ self._testBothReshape(x, [0, 0, 0])
+ self._testBothReshape(x, [1, -1, 5])
+
+ def testErrors(self):
+ x = tf.constant(0.0, shape=[1, 0, 3])
+ with self.assertRaisesRegexp(
+ ValueError, "cannot infer the missing input size"):
+ tf.reshape(x, [0, -1, 5])
+
+ y = tf.constant(0.0, shape=[23, 29, 31])
+ with self.assertRaisesRegexp(ValueError, "isn't divisible by 17"):
+ tf.reshape(y, [17, -1])
+
+ def testPartialShapes(self):
+ x = tf.placeholder(tf.float32)
+
+ # Unknown input shape, partial new shape.
+ y = tf.reshape(x, [1, 1, -1, 1])
+ self.assertEqual([1, 1, None, 1], y.get_shape().as_list())
+
+ # Unknown input shape, unknown new shape.
+ y = tf.reshape(x, tf.placeholder(tf.int32))
+ self.assertEqual(None, y.get_shape().ndims)
+
+ # Unknown input shape, known rank for new shape.
+ y = tf.reshape(x, tf.placeholder(tf.int32, shape=(3,)))
+ self.assertEqual([None, None, None], y.get_shape().as_list())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/reverse_sequence_op_test.py b/tensorflow/python/kernel_tests/reverse_sequence_op_test.py
new file mode 100644
index 0000000000..7cfbcd7946
--- /dev/null
+++ b/tensorflow/python/kernel_tests/reverse_sequence_op_test.py
@@ -0,0 +1,109 @@
+"""Tests for tensorflow.ops.reverse_sequence_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class ReverseSequenceTest(tf.test.TestCase):
+
+ def _testReverseSequence(self, x, seq_dim, seq_lengths,
+ truth, use_gpu=False, expected_err_re=None):
+ with self.test_session(use_gpu=use_gpu):
+ ans = tf.reverse_sequence(x,
+ seq_dim=seq_dim,
+ seq_lengths=seq_lengths)
+ if expected_err_re is None:
+ tf_ans = ans.eval()
+ self.assertAllClose(tf_ans, truth, atol=1e-10)
+ self.assertShapeEqual(truth, ans)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ ans.eval()
+
+ def _testBothReverseSequence(self, x, seq_dim, seq_lengths,
+ truth, expected_err_re=None):
+ self._testReverseSequence(x, seq_dim, seq_lengths,
+ truth, True, expected_err_re)
+ self._testReverseSequence(x, seq_dim, seq_lengths,
+ truth, False, expected_err_re)
+
+ def _testBasic(self, dtype):
+ x = np.asarray([
+ [[1, 2, 3, 4], [5, 6, 7, 8]],
+ [[9, 10, 11, 12], [13, 14, 15, 16]],
+ [[17, 18, 19, 20], [21, 22, 23, 24]]], dtype=dtype)
+ x = x.reshape(3, 2, 4, 1, 1)
+
+ # Reverse along dim 2, with per-batch lengths 3, 0 (none) and 4 (all) along dim 0.
+ seq_dim = 2
+ seq_lengths = np.asarray([3, 0, 4], dtype=np.int64)
+
+ truth = np.asarray(
+ [[[3, 2, 1, 4], [7, 6, 5, 8]], # reverse 0:3
+ [[9, 10, 11, 12], [13, 14, 15, 16]], # reverse none
+ [[20, 19, 18, 17], [24, 23, 22, 21]]], # reverse 0:4 (all)
+ dtype=dtype)
+ truth = truth.reshape(3, 2, 4, 1, 1)
+ self._testBothReverseSequence(x, seq_dim, seq_lengths, truth)
+
+ def testFloatBasic(self):
+ self._testBasic(np.float32)
+
+ def testDoubleBasic(self):
+ self._testBasic(np.float64)
+
+ def testInt32Basic(self):
+ self._testBasic(np.int32)
+
+ def testInt64Basic(self):
+ self._testBasic(np.int64)
+
+ def testSComplexBasic(self):
+ self._testBasic(np.complex64)
+
+ def testFloatReverseSequenceGrad(self):
+ x = np.asarray([
+ [[1, 2, 3, 4], [5, 6, 7, 8]],
+ [[9, 10, 11, 12], [13, 14, 15, 16]],
+ [[17, 18, 19, 20], [21, 22, 23, 24]]], dtype=np.float)
+ x = x.reshape(3, 2, 4, 1, 1)
+
+ # Reverse along dim 2, with per-batch lengths 3, 0 (none) and 4 (all) along dim 0.
+ seq_dim = 2
+ seq_lengths = np.asarray([3, 0, 4], dtype=np.int64)
+
+ with self.test_session():
+ input_t = tf.constant(x, shape=x.shape)
+ seq_lengths_t = tf.constant(seq_lengths, shape=seq_lengths.shape)
+ reverse_sequence_out = tf.reverse_sequence(input_t,
+ seq_dim=seq_dim,
+ seq_lengths=seq_lengths_t)
+ err = gc.ComputeGradientError(input_t,
+ x.shape,
+ reverse_sequence_out,
+ x.shape,
+ x_init_value=x)
+ print "ReverseSequence gradient error = %g" % err
+ self.assertLess(err, 1e-8)
+
+ def testShapeFunctionEdgeCases(self):
+ # Batch size mismatched between input and seq_lengths.
+ with self.assertRaises(ValueError):
+ tf.reverse_sequence(
+ tf.placeholder(tf.float32, shape=(32, 2, 3)),
+ seq_lengths=tf.placeholder(tf.int64, shape=(33,)),
+ seq_dim=3)
+
+ # seq_dim out of bounds.
+ with self.assertRaisesRegexp(ValueError, "seq_dim must be < input.dims()"):
+ tf.reverse_sequence(
+ tf.placeholder(tf.float32, shape=(32, 2, 3)),
+ seq_lengths=tf.placeholder(tf.int64, shape=(32,)),
+ seq_dim=3)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/save_restore_ops_test.py b/tensorflow/python/kernel_tests/save_restore_ops_test.py
new file mode 100644
index 0000000000..d59d76c58f
--- /dev/null
+++ b/tensorflow/python/kernel_tests/save_restore_ops_test.py
@@ -0,0 +1,21 @@
+"""Tests for tensorflow.ops.io_ops."""
+import tensorflow.python.platform
+
+import tensorflow as tf
+from tensorflow.python.ops import gen_io_ops
+
+
+class ShardedFileOpsTest(tf.test.TestCase):
+
+ def testShardedFileName(self):
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})):
+ self.assertEqual(gen_io_ops._sharded_filename("foo", 4, 100).eval(),
+ "foo-00004-of-00100")
+ self.assertEqual(gen_io_ops._sharded_filespec("foo", 100).eval(),
+ "foo-?????-of-00100")
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/scatter_ops_test.py b/tensorflow/python/kernel_tests/scatter_ops_test.py
new file mode 100644
index 0000000000..dd645819a3
--- /dev/null
+++ b/tensorflow/python/kernel_tests/scatter_ops_test.py
@@ -0,0 +1,49 @@
+"""Tests for tensorflow.ops.tf.scatter."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class ScatterTest(tf.test.TestCase):
+
+ def _VariableRankTest(self, np_scatter, tf_scatter):
+ np.random.seed(8)
+ with self.test_session():
+ for indices_shape in (), (2,), (2, 3), (2, 3, 4):
+ for extra_shape in (), (5,), (5, 6):
+ # Generate random indices with no duplicates for easy numpy comparison
+ size = np.prod(indices_shape, dtype=np.int32)
+ indices = np.arange(2 * size)
+ np.random.shuffle(indices)
+ indices = indices[:size].reshape(indices_shape)
+ updates = np.random.randn(*(indices_shape + extra_shape))
+ old = np.random.randn(*((2 * size,) + extra_shape))
+ # Scatter via numpy
+ new = old.copy()
+ np_scatter(new, indices, updates)
+ # Scatter via tensorflow
+ ref = tf.Variable(old)
+ ref.initializer.run()
+ tf_scatter(ref, indices, updates).eval()
+ # Compare
+ self.assertAllClose(ref.eval(), new)
+
+ def testVariableRankUpdate(self):
+ def update(ref, indices, updates):
+ ref[indices] = updates
+ self._VariableRankTest(update, tf.scatter_update)
+
+ def testVariableRankAdd(self):
+ def add(ref, indices, updates):
+ ref[indices] += updates
+ self._VariableRankTest(add, tf.scatter_add)
+
+ def testVariableRankSub(self):
+ def sub(ref, indices, updates):
+ ref[indices] -= updates
+ self._VariableRankTest(sub, tf.scatter_sub)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/segment_reduction_ops_test.py b/tensorflow/python/kernel_tests/segment_reduction_ops_test.py
new file mode 100644
index 0000000000..558ce06285
--- /dev/null
+++ b/tensorflow/python/kernel_tests/segment_reduction_ops_test.py
@@ -0,0 +1,269 @@
+"""Functional tests for segment reduction ops."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker
+
+
+class SegmentReductionHelper(tf.test.TestCase):
+
+ def _input(self, input_shape, dtype=tf.int32):
+ num_elem = 1
+ for x in input_shape:
+ num_elem *= x
+ values = range(1, num_elem + 1)
+ np_values = np.array(values).reshape(input_shape).astype(
+ dtype.as_numpy_dtype)
+ return tf.constant(values, shape=input_shape,
+ dtype=dtype), np_values
+
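+ # Numpy reference implementation: rows of x that share a segment index are
+ # combined with op1, missing segments are zero-filled, and op2 (if given) is
+ # applied to each output row afterwards.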
+ def _segmentReduce(self, indices, x, op1, op2=None, num_out_rows=None):
+ if not x.size: return np.array([])
+ indices = np.asarray(indices)
+ if num_out_rows is None:
+ num_out_rows = indices[-1] + 1
+ output = [None] * num_out_rows
+ slice_shape = x.shape[indices.ndim:]
+ x_flat = x.reshape((indices.size,) + slice_shape)
+ for i, index in enumerate(indices.ravel()):
+ if output[index] is not None:
+ output[index] = op1(output[index], x_flat[i])
+ else:
+ output[index] = x_flat[i]
+ # Zero-initialize values that are still uncalculated.
+ output = [o if o is not None else np.zeros(slice_shape) for o in output]
+ if op2 is not None:
+ output = [op2(o) for o in output]
+ output = [o.reshape(slice_shape) for o in output]
+ return np.array(output)
+
+ def _assertAllClose(self, indices, np_x, tf_x):
+ for i in set(np.asarray(indices).ravel()):
+ self.assertAllClose(np_x[i], tf_x[i])
+
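+ # The mean is computed as a (running sum, count) pair: _mean_cum_op
+ # accumulates and _mean_reduce_op performs the final division.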
+ def _mean_cum_op(self, x, y):
+ return (x[0] + y, x[1] + 1) if isinstance(x, tuple) else (x + y, 2)
+
+ def _mean_reduce_op(self, x):
+ return x[0] / x[1] if isinstance(x, tuple) else x
+
+
+class SegmentReductionOpTest(SegmentReductionHelper):
+
+ def testValues(self):
+ dtypes = [tf.float32,
+ tf.float64,
+ tf.int64,
+ tf.int32]
+
+ # Each item is np_op1, np_op2, tf_op
+ ops_list = [(np.add, None, tf.segment_sum),
+ (self._mean_cum_op, self._mean_reduce_op,
+ tf.segment_mean),
+ (np.ndarray.__mul__, None, tf.segment_prod),
+ (np.minimum, None, tf.segment_min),
+ (np.maximum, None, tf.segment_max)]
+
+ n = 10
+ shape = [n, 2]
+ indices = [int(i / 3) for i in range(n)]
+ for dtype in dtypes:
+ with self.test_session(use_gpu=False):
+ tf_x, np_x = self._input(shape, dtype=dtype)
+ for np_op1, np_op2, tf_op in ops_list:
+ np_ans = self._segmentReduce(indices, np_x, np_op1, np_op2)
+ s = tf_op(data=tf_x, segment_ids=indices)
+ tf_ans = s.eval()
+ self._assertAllClose(indices, np_ans, tf_ans)
+ # NOTE(mrry): The static shape inference that computes
+ # `tf_ans.shape` can only infer the sizes from dimension 1
+ # onwards, because the size of dimension 0 is data-dependent
+ # and may therefore vary dynamically.
+ self.assertAllEqual(np_ans.shape[1:], tf_ans.shape[1:])
+
+ def testSegmentIdsShape(self):
+ shape = [4, 4]
+ tf_x, _ = self._input(shape)
+ indices = tf.constant([0, 1, 2, 2], shape=[2, 2])
+ with self.assertRaises(ValueError):
+ tf.segment_sum(data=tf_x, segment_ids=indices)
+
+ def testSegmentIdsSize(self):
+ shape = [4, 4]
+ with self.test_session():
+ tf_x, _ = self._input(shape)
+ indices = [0, 1]
+ s = tf.segment_sum(data=tf_x, segment_ids=indices)
+ with self.assertRaisesOpError("segment_ids should be the same size"):
+ s.eval()
+
+ def testGradient(self):
+ shape = [4, 4]
+ indices = [0, 1, 2, 2]
+ for tf_op in [tf.segment_sum,
+ tf.segment_mean,
+ tf.segment_min,
+ tf.segment_max]:
+ with self.test_session():
+ tf_x, np_x = self._input(shape, dtype=tf.float64)
+ s = tf_op(data=tf_x, segment_ids=indices)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ tf_x, shape, s, [3, 4], x_init_value=np_x.astype(np.double),
+ delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+
+class UnsortedSegmentSumTest(SegmentReductionHelper):
+
+ def testValues(self):
+ dtypes = [tf.float32,
+ tf.float64,
+ tf.int64,
+ tf.int32]
+ indices_flat = np.array([0, 4, 0, 8, 3, 8, 4, 7, 7, 3])
+ num_segments = 12
+ for indices in indices_flat, indices_flat.reshape(5, 2):
+ shape = indices.shape + (2,)
+ for dtype in dtypes:
+ with self.test_session(use_gpu=False):
+ tf_x, np_x = self._input(shape, dtype=dtype)
+ np_ans = self._segmentReduce(indices,
+ np_x,
+ np.add,
+ op2=None,
+ num_out_rows=num_segments)
+ s = tf.unsorted_segment_sum(data=tf_x,
+ segment_ids=indices,
+ num_segments=num_segments)
+ tf_ans = s.eval()
+ self._assertAllClose(indices, np_ans, tf_ans)
+ self.assertShapeEqual(np_ans, s)
+
+ def testGradient(self):
+ num_cols = 2
+ indices_flat = np.array([0, 4, 0, 8, 3, 8, 4, 7, 7, 3])
+ num_segments = max(indices_flat) + 3
+ for indices in indices_flat, indices_flat.reshape(5, 2):
+ shape = indices.shape + (num_cols,)
+ with self.test_session():
+ tf_x, np_x = self._input(shape, dtype=tf.float64)
+ s = tf.unsorted_segment_sum(data=tf_x,
+ segment_ids=indices,
+ num_segments=num_segments)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ tf_x,
+ shape,
+ s,
+ [num_segments, num_cols],
+ x_init_value=np_x.astype(np.double),
+ delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+ def testGradientMatchesSegmentSum(self):
+ # Strategy: compute the gradient for UnsortedSegmentSum and SegmentSum
+ # and compare the outputs, which should be identical.
+ # NB: for this test to work, indices must be valid for SegmentSum, namely
+ # it must be sorted, the indices must be contiguous, and num_segments
+ # must be max(indices) + 1.
+ indices = [0, 0, 1, 1, 1, 2, 3, 4, 5]
+ n = len(indices)
+ num_cols = 2
+ shape = [n, num_cols]
+ num_segments = max(indices) + 1
+ with self.test_session():
+ tf_x, np_x = self._input(shape, dtype=tf.float64)
+ # Results from UnsortedSegmentSum
+ unsorted_s = tf.unsorted_segment_sum(data=tf_x,
+ segment_ids=indices,
+ num_segments=num_segments)
+ unsorted_jacob_t, unsorted_jacob_n = gradient_checker.ComputeGradient(
+ tf_x, shape, unsorted_s, [num_segments, num_cols],
+ x_init_value=np_x.astype(np.double),
+ delta=1)
+ # Results from SegmentSum
+ sorted_s = tf.segment_sum(data=tf_x, segment_ids=indices)
+ sorted_jacob_t, sorted_jacob_n = gradient_checker.ComputeGradient(
+ tf_x, shape, sorted_s, [num_segments, num_cols],
+ x_init_value=np_x.astype(np.double),
+ delta=1)
+ self.assertAllClose(unsorted_jacob_t, sorted_jacob_t, rtol=1e-3, atol=1e-3)
+ self.assertAllClose(unsorted_jacob_n, sorted_jacob_n, rtol=1e-3, atol=1e-3)
+
+
+class SparseSegmentReductionHelper(SegmentReductionHelper):
+
+ def _sparse_input(self, input_shape, num_indices,
+ dtype=tf.int32):
+ a, b = super(SparseSegmentReductionHelper, self)._input(input_shape,
+ dtype)
+ indices = np.random.randint(0, input_shape[0], num_indices).astype(np.int32)
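+ # Returns (indices as a tf constant, indices as numpy, dense tf input, dense numpy input).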
+ return (tf.constant(indices, dtype=tf.int32),
+ indices, a, b)
+
+ def _sparseSegmentReduce(self, x, indices, segment_indices, op1, op2=None):
+ return self._segmentReduce(segment_indices, x[indices], op1, op2)
+
+
+class SparseSegmentReductionOpTest(SparseSegmentReductionHelper):
+
+ def testValues(self):
+ dtypes = [tf.float32,
+ tf.float64,
+ tf.int64,
+ tf.int32]
+
+ mean_dtypes = [tf.float32,
+ tf.float64]
+
+ # Each item is np_op1, np_op2, tf_op
+ ops_list = [(np.add, None, tf.sparse_segment_sum),
+ (self._mean_cum_op, self._mean_reduce_op,
+ tf.sparse_segment_mean)]
+
+ n = 400
+ shape = [n, 2]
+ segment_indices = []
+ for i in range(20):
+ for _ in range(i + 1):
+ segment_indices.append(i)
+ num_indices = len(segment_indices)
+ for dtype in dtypes:
+ with self.test_session(use_gpu=False):
+ tf_indices, np_indices, tf_x, np_x = self._sparse_input(shape,
+ num_indices,
+ dtype=dtype)
+ for np_op1, np_op2, tf_op in ops_list:
+ if tf_op == tf.sparse_segment_mean and dtype not in mean_dtypes:
+ continue
+ np_ans = self._sparseSegmentReduce(np_x, np_indices, segment_indices,
+ np_op1, np_op2)
+ s = tf_op(data=tf_x, indices=tf_indices, segment_ids=segment_indices)
+ tf_ans = s.eval()
+ self._assertAllClose(segment_indices, np_ans, tf_ans)
+ # NOTE(mrry): The static shape inference that computes
+ # `tf_ans.shape` can only infer the sizes from dimension 1
+ # onwards, because the size of dimension 0 is data-dependent
+ # and may therefore vary dynamically.
+ self.assertAllEqual(np_ans.shape[1:], tf_ans.shape[1:])
+
+ def testGradient(self):
+ shape = [10, 4]
+
+ segment_indices = [0, 1, 2, 2]
+ num_indices = len(segment_indices)
+ for tf_op in [tf.sparse_segment_sum,
+ tf.sparse_segment_mean]:
+ with self.test_session():
+ tf_indices, _, tf_x, np_x = self._sparse_input(
+ shape, num_indices, dtype=tf.float64)
+ s = tf_op(data=tf_x, indices=tf_indices, segment_ids=segment_indices)
+ jacob_t, jacob_n = gradient_checker.ComputeGradient(
+ tf_x, shape, s, [3, 4], x_init_value=np_x.astype(np.double),
+ delta=1)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-3, atol=1e-3)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/shape_ops_test.py b/tensorflow/python/kernel_tests/shape_ops_test.py
new file mode 100644
index 0000000000..ac97180dbe
--- /dev/null
+++ b/tensorflow/python/kernel_tests/shape_ops_test.py
@@ -0,0 +1,389 @@
+"""Tests for various tensorflow.ops.tf."""
+import tensorflow.python.platform
+
+import numpy as np
+
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class ShapeOpsTest(tf.test.TestCase):
+
+ def _compareShape(self, x, use_gpu=False):
+ np_ans = np.array(np.shape(x))
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.shape(x)
+ result = tf_ans.eval()
+ self.assertAllEqual(np_ans, result)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareRank(self, x, use_gpu=False):
+ np_ans = np.asarray(np.ndim(x))
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.rank(x)
+ result = tf_ans.eval()
+ self.assertAllEqual(np_ans, result)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _compareSize(self, x, use_gpu=False):
+ np_ans = np.asarray(np.size(x))
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.size(x)
+ result = tf_ans.eval()
+ self.assertAllEqual(np_ans, result)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def _testCpu(self, x):
+ self._compareShape(x, use_gpu=False)
+ self._compareRank(x, use_gpu=False)
+ self._compareSize(x, use_gpu=False)
+
+ def _testGpu(self, x):
+ self._compareShape(x, use_gpu=True)
+ self._compareRank(x, use_gpu=True)
+ self._compareSize(x, use_gpu=True)
+
+ def _testAll(self, x):
+ self._testCpu(x)
+ self._testGpu(x)
+
+ def testBasic(self):
+ self._testAll(np.zeros([2]))
+ self._testAll(np.zeros([2, 3]))
+ self._testAll(np.zeros([2, 3, 5]))
+ self._testAll(np.zeros([2, 3, 5, 7]))
+ self._testAll(np.zeros([2, 3, 5, 7, 11]))
+ self._testAll(np.zeros([2, 3, 5, 7, 11, 13]))
+
+ def _compareExpandDims(self, x, dim, use_gpu):
+ np_ans = np.expand_dims(x, axis=dim)
+ with self.test_session(use_gpu=use_gpu):
+ tensor = tf.expand_dims(x, dim)
+ tf_ans = tensor.eval()
+ self.assertShapeEqual(np_ans, tensor)
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def _compareExpandDimsAll(self, x, dim):
+ self._compareExpandDims(x, dim, False)
+ self._compareExpandDims(x, dim, True)
+
+ def testExpandDims(self):
+ self._compareExpandDimsAll(np.zeros([2]), 0)
+ self._compareExpandDimsAll(np.zeros([2]), 1)
+ self._compareExpandDimsAll(np.zeros([2]), -1)
+
+ self._compareExpandDimsAll(np.zeros([2, 3]), 0)
+ self._compareExpandDimsAll(np.zeros([2, 3]), 1)
+ self._compareExpandDimsAll(np.zeros([2, 3]), 2)
+ self._compareExpandDimsAll(np.zeros([2, 3]), -1)
+ self._compareExpandDimsAll(np.zeros([2, 3]), -2)
+
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), 0)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), 1)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), 2)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), 3)
+
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), -1)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), -2)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), -3)
+ self._compareExpandDimsAll(np.zeros([2, 3, 5]), -4)
+
+ def testExpandDimsErrors(self):
+ with self.test_session():
+ self.assertRaises(ValueError, tf.expand_dims, np.zeros([2, 3, 5]), -5)
+ self.assertRaises(ValueError, tf.expand_dims, np.zeros([2, 3, 5]), 4)
+
+ def testExpandDimsGradient(self):
+ with self.test_session():
+ inp = tf.constant(np.random.rand(4, 2).astype("f"),
+ dtype=tf.float32)
+ squeezed = tf.expand_dims(inp, 1)
+
+ err = gc.ComputeGradientError(inp, [4, 2], squeezed, [4, 1, 2])
+ self.assertLess(err, 1e-3)
+
+ def testExpandDimsScalar(self):
+ with self.test_session():
+ inp = tf.constant(7)
+ self.assertAllEqual([7], tf.expand_dims(inp, 0).eval())
+ self.assertAllEqual([7], tf.expand_dims(inp, -1).eval())
+
+ def _compareSqueeze(self, x, squeeze_dims, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
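+ # An empty squeeze_dims list means "squeeze every dimension of size 1".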
+ if squeeze_dims:
+ np_ans = np.squeeze(x, axis=tuple(squeeze_dims))
+ tensor = tf.squeeze(x, squeeze_dims)
+ tf_ans = tensor.eval()
+ else:
+ np_ans = np.squeeze(x)
+ tensor = tf.squeeze(x)
+ tf_ans = tensor.eval()
+ self.assertShapeEqual(np_ans, tensor)
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def _compareSqueezeAll(self, x, squeeze_dims=None):
+ if squeeze_dims is None:
+ squeeze_dims = []
+ self._compareSqueeze(x, squeeze_dims, False)
+ self._compareSqueeze(x, squeeze_dims, True)
+
+ def testSqueeze(self):
+ # Nothing to squeeze.
+ self._compareSqueezeAll(np.zeros([2]))
+ self._compareSqueezeAll(np.zeros([2, 3]))
+
+ # Squeeze the middle element away.
+ self._compareSqueezeAll(np.zeros([2, 1, 2]))
+
+ # Squeeze on both ends.
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]))
+
+ def testSqueezeSpecificDimension(self):
+ # Positive squeeze dim index.
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [0])
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [2, 4])
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [0, 4, 2])
+
+ # Negative squeeze dim index.
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [-1])
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [-3, -5])
+ self._compareSqueezeAll(np.zeros([1, 2, 1, 3, 1]), [-3, -5, -1])
+
+ def testSqueezeAllOnes(self):
+ # Numpy squeezes a 1 element tensor into a zero dimensional tensor.
+ # Verify that we do the same.
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ tensor = tf.squeeze(np.zeros([1, 1, 1]), [])
+ self.assertEqual(np.shape(1), tensor.get_shape())
+ tf_ans = tensor.eval()
+ self.assertEqual(np.shape(1), tf_ans.shape)
+
+ def testSqueezeOnlyOnes(self):
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ input_1x1x3 = np.zeros([1, 1, 3])
+ self._compareSqueezeAll(input_1x1x3)
+ self._compareSqueezeAll(input_1x1x3, [0])
+ self._compareSqueezeAll(input_1x1x3, [1])
+ self.assertRaises(ValueError, tf.squeeze, input_1x1x3, [2])
+
+ def testSqueezeErrors(self):
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ self.assertRaises(ValueError, tf.squeeze, np.zeros([1, 2, 1]), [-4])
+ self.assertRaises(ValueError, tf.squeeze, np.zeros([1, 2, 1]), [0, -4])
+ self.assertRaises(ValueError, tf.squeeze, np.zeros([1, 2, 1]), [3])
+ self.assertRaises(ValueError, tf.squeeze, np.zeros([1, 2, 1]), [2, 3])
+
+ def testSqueezeGradient(self):
+ with self.test_session():
+ inp = np.random.rand(4, 2).astype("f")
+ a = tf.reshape(inp, [4, 1, 2])
+ squeezed = tf.squeeze(a, [])
+
+ err = gc.ComputeGradientError(a, [4, 1, 2], squeezed, [4, 2])
+ self.assertLess(err, 1e-3)
+
+ def testSqueezeGradientWithSqueezeDims(self):
+ with self.test_session():
+ inp = np.random.rand(4, 2).astype("f")
+ a = tf.reshape(inp, [4, 1, 2, 1])
+ squeezed = tf.squeeze(a, [1])
+
+ err = gc.ComputeGradientError(a, [4, 1, 2, 1], squeezed, [4, 2, 1])
+ self.assertLess(err, 1e-3)
+
+
+class TileTest(tf.test.TestCase):
+
+ def testScalar(self):
+ with self.test_session():
+ a = tf.constant(7, shape=[], dtype=tf.float32)
+ tiled = tf.tile(a, [])
+ result = tiled.eval()
+ self.assertEqual(result.shape, ())
+ self.assertEqual([], tiled.get_shape())
+ self.assertEqual(7, result)
+
+ def testSimple(self):
+ with self.test_session():
+ inp = np.random.rand(4, 1).astype("f")
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=[4, 1], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 4])
+ result = tiled.eval()
+ self.assertEqual(result.shape, (4, 4))
+ self.assertEqual([4, 4], tiled.get_shape())
+ self.assertTrue((result == np.tile(inp, (1, 4))).all())
+
+ def testTypes(self):
+ types_to_test = {
+ "bool": (tf.bool, bool),
+ "float32": (tf.float32, float),
+ "float64": (tf.float64, float),
+ "uint8": (tf.uint8, int),
+ "int32": (tf.int32, int),
+ "int64": (tf.int64, int),
+ "string": (tf.string, str)
+ }
+ for dtype_np, v in types_to_test.iteritems():
+ with self.test_session():
+ dtype_tf = v[0]
+ cast = v[1]
+ inp = np.random.rand(4, 1).astype(dtype_np)
+ a = tf.constant([cast(x) for x in inp.ravel(order="C")],
+ shape=[4, 1],
+ dtype=dtype_tf)
+ tiled = tf.tile(a, [1, 4])
+ result = tiled.eval()
+ self.assertEqual(result.shape, (4, 4))
+ self.assertEqual([4, 4], tiled.get_shape())
+ self.assertTrue((result == np.tile(inp, (1, 4))).all())
+
+ def testInvalidDim(self):
+ with self.test_session():
+ inp = np.random.rand(4, 1).astype("f")
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=[4, 1], dtype=tf.float32)
+ # Wrong length of multiples.
+ with self.assertRaises(ValueError):
+ tf.tile(a, [1, 4, 2])
+ # Wrong rank for multiples.
+ with self.assertRaises(ValueError):
+ tf.tile(a, [[2, 3], [3, 4]]).eval()
+
+ def _RunAndVerifyResult(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ # Random dims of rank 5
+ input_shape = np.random.randint(1, 4, size=5)
+ inp = np.random.rand(*input_shape).astype("f")
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=input_shape, dtype=tf.float32)
+ multiples = np.random.randint(1, 4, size=5).astype(np.int32)
+ tiled = tf.tile(a, multiples)
+ result = tiled.eval()
+ self.assertTrue((np.array(multiples) * np.array(inp.shape) ==
+ np.array(result.shape)).all())
+ self.assertAllEqual(result, np.tile(inp, tuple(multiples)))
+ self.assertShapeEqual(result, tiled)
+
+ def testRandom(self):
+ for _ in range(5):
+ self._RunAndVerifyResult(use_gpu=False)
+ for _ in range(5):
+ self._RunAndVerifyResult(use_gpu=True)
+
+ def testGradientSimpleReduction(self):
+ with self.test_session():
+ inp = np.random.rand(4, 1).astype("f")
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=[4, 1], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 4])
+ grad_shape = [4, 4]
+ grad_inp = np.random.rand(*grad_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=grad_shape)
+ grad = tf.gradients([tiled], [a], [grad_tensor])[0]
+ self.assertShapeEqual(inp, grad)
+ result = grad.eval()
+ self.assertAllClose(np.sum(grad_inp, axis=1).reshape(4, 1), result, 1e-3)
+
+ def testGradientStridedReduction(self):
+ with self.test_session():
+ inp = np.random.rand(4, 2).astype("f")
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=[4, 2], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 2])
+ grad_shape = [4, 4]
+ grad_inp = np.random.rand(*grad_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=grad_shape)
+ grad = tf.gradients([tiled], [a], [grad_tensor])[0]
+ self.assertShapeEqual(inp, grad)
+ result = grad.eval()
+ expected_shape = [4, 2]
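+ # tile(a, [1, 2]) copies the 4x2 input into output columns 0-1 and 2-3,
+ # so each input column receives the sum of the two matching output columns.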
+ expected = np.zeros(expected_shape)
+ expected[:, 0] = grad_inp[:, 0] + grad_inp[:, 2]
+ expected[:, 1] = grad_inp[:, 1] + grad_inp[:, 3]
+ self.assertTrue((np.abs(expected - result) < 1e-3).all())
+
+ def testGradientSimpleReductionOnGPU(self):
+ with self.test_session(use_gpu=True):
+ inp = np.random.rand(4, 1).astype("f")
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=[4, 1], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 4])
+ grad_shape = [4, 4]
+ grad_inp = np.random.rand(*grad_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=grad_shape)
+ grad = tf.gradients([tiled], [a], [grad_tensor])[0]
+ result = grad.eval()
+ self.assertAllClose(np.sum(grad_inp, axis=1).reshape(4, 1), result, 1e-3)
+
+ def testGradientStridedReductionOnGPU(self):
+ with self.test_session(use_gpu=True):
+ inp = np.random.rand(4, 2).astype("f")
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=[4, 2], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 2])
+ grad_shape = [4, 4]
+ grad_inp = np.random.rand(*grad_shape).astype("f")
+ grad_tensor = tf.constant([float(x) for x in grad_inp.flatten()],
+ shape=grad_shape)
+ grad = tf.gradients([tiled], [a], [grad_tensor])[0]
+ result = grad.eval()
+ expected_shape = [4, 2]
+ expected = np.zeros(expected_shape)
+ expected[:, 0] = grad_inp[:, 0] + grad_inp[:, 2]
+ expected[:, 1] = grad_inp[:, 1] + grad_inp[:, 3]
+ self.assertAllClose(expected, result, 1e-3)
+
+ def _RunAndVerifyGradientResult(self, input_shape, multiples):
+ with self.test_session():
+ # Random values
+ inp = np.random.rand(*input_shape)
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=input_shape, dtype=tf.float64)
+ tiled = tf.tile(a, multiples)
+ grad_shape = list(np.array(multiples) * np.array(inp.shape))
+ err = gc.ComputeGradientError(a, list(input_shape), tiled, grad_shape,
+ x_init_value=inp)
+ print "tile(float) error = ", err
+ self.assertLess(err, 1e-3)
+
+ def testGradientRandom(self):
+ self._RunAndVerifyGradientResult([2, 2, 1, 1, 3], [1, 2, 1, 3, 1])
+ self._RunAndVerifyGradientResult([2, 3, 1, 1, 3], [3, 1, 1, 2, 2])
+ self._RunAndVerifyGradientResult([2, 1, 3, 3, 2], [1, 3, 3, 1, 2])
+
+ def testGradientStridedReductionGC(self):
+ with self.test_session():
+ inp = np.random.rand(4, 2).astype("f")
+ a = tf.constant([float(x) for x in inp.flatten()],
+ shape=[4, 2], dtype=tf.float32)
+ tiled = tf.tile(a, [1, 2])
+ err = gc.ComputeGradientError(a, [4, 2], tiled, [4, 4])
+ self.assertLess(err, 1e-3)
+
+ def testShapeFunctionEdgeCases(self):
+ # Unknown multiples shape.
+ inp = tf.constant(0.0, shape=[4, 4, 4, 4])
+ tiled = tf.tile(inp, tf.placeholder(tf.int32))
+ self.assertEqual([None, None, None, None], tiled.get_shape().as_list())
+
+ # Unknown input shape.
+ inp = tf.placeholder(tf.float32)
+ tiled = tf.tile(inp, [2, 2, 2, 2])
+ self.assertEqual([None, None, None, None], tiled.get_shape().as_list())
+
+ # Unknown input and multiples shape.
+ inp = tf.placeholder(tf.float32)
+ tiled = tf.tile(inp, tf.placeholder(tf.int32))
+ self.assertIs(None, tiled.get_shape().ndims)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/slice_op_test.py b/tensorflow/python/kernel_tests/slice_op_test.py
new file mode 100644
index 0000000000..62d7e31dfc
--- /dev/null
+++ b/tensorflow/python/kernel_tests/slice_op_test.py
@@ -0,0 +1,235 @@
+"""Functional tests for slice op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SliceTest(tf.test.TestCase):
+
+ def _testEmpty(self, use_gpu):
+ inp = np.random.rand(4, 4).astype("f")
+ for k in xrange(4):
+ with self.test_session(use_gpu=use_gpu):
+ a = tf.constant(inp, shape=[4, 4], dtype=tf.float32)
+ slice_t = a[2, k:k]
+ slice_val = slice_t.eval()
+ self.assertAllEqual(slice_val, inp[2, k:k])
+
+ def testEmptyAll(self):
+ self._testEmpty(use_gpu=False)
+ self._testEmpty(use_gpu=True)
+
+ def _testInt32(self, use_gpu):
+ inp = np.random.rand(4, 4).astype("i")
+ for k in xrange(4):
+ with self.test_session(use_gpu=use_gpu):
+ a = tf.constant(inp, shape=[4, 4], dtype=tf.int32)
+ slice_t = a[2, k:k]
+ slice_val = slice_t.eval()
+ self.assertAllEqual(slice_val, inp[2, k:k])
+
+ def testInt32(self):
+ self._testInt32(use_gpu=False)
+ self._testInt32(use_gpu=True)
+
+ def _testSelectAll(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = np.random.rand(4, 4, 4, 4).astype("f")
+ a = tf.constant(inp, shape=[4, 4, 4, 4],
+ dtype=tf.float32)
+
+ slice_explicit_t = tf.slice(a, [0, 0, 0, 0], [-1, -1, -1, -1])
+ slice_implicit_t = a[:, :, :, :]
+
+ self.assertAllEqual(inp, slice_explicit_t.eval())
+ self.assertAllEqual(inp, slice_implicit_t.eval())
+ self.assertEqual(inp.shape, slice_explicit_t.get_shape())
+ self.assertEqual(inp.shape, slice_implicit_t.get_shape())
+
+ def testSelectAll(self):
+ for _ in range(10):
+ self._testSelectAll(use_gpu=False)
+ self._testSelectAll(use_gpu=True)
+
+ def _testSingleDimension(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = np.random.rand(10).astype("f")
+ a = tf.constant(inp, shape=[10], dtype=tf.float32)
+
+ hi = np.random.random_integers(0, 9)
+ scalar_t = a[hi]
+ scalar_val = scalar_t.eval()
+ self.assertAllEqual(scalar_val, inp[hi])
+
+ lo = np.random.random_integers(0, hi)
+ slice_t = a[lo:hi]
+ slice_val = slice_t.eval()
+ self.assertAllEqual(slice_val, inp[lo:hi])
+
+ def testSingleDimension(self):
+ for _ in range(10):
+ self._testSingleDimension(use_gpu=False)
+ self._testSingleDimension(use_gpu=True)
+
+ def _testSliceMatrixDim0(self, x, begin, size, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ tf_ans = tf.slice(x, [begin, 0], [size, x.shape[1]]).eval()
+ np_ans = x[begin:begin+size, :]
+ self.assertAllEqual(tf_ans, np_ans)
+
+ def testSliceMatrixDim0(self):
+ for use_gpu in [False, True]:
+ x = np.random.rand(8, 4).astype("f")
+ self._testSliceMatrixDim0(x, 1, 2, use_gpu)
+ self._testSliceMatrixDim0(x, 3, 3, use_gpu)
+ y = np.random.rand(8, 7).astype("f") # 7 * sizeof(float) is not aligned
+ self._testSliceMatrixDim0(y, 1, 2, use_gpu)
+ self._testSliceMatrixDim0(y, 3, 3, use_gpu)
+
+ def _testIndexAndSlice(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = np.random.rand(4, 4).astype("f")
+ a = tf.constant(inp, shape=[4, 4], dtype=tf.float32)
+
+ x, y = np.random.random_integers(0, 3, size=2).tolist()
+ slice_t = a[x, 0:y]
+ slice_val = slice_t.eval()
+ self.assertAllEqual(slice_val, inp[x, 0:y])
+
+ def testSingleElementAll(self):
+ for _ in range(10):
+ self._testIndexAndSlice(use_gpu=False)
+ self._testIndexAndSlice(use_gpu=True)
+
+ def _testSimple(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inp = np.random.rand(4, 4).astype("f")
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=[4, 4], dtype=tf.float32)
+ slice_t = tf.slice(a, [0, 0], [2, 2])
+ slice2_t = a[:2, :2]
+ slice_val, slice2_val = sess.run([slice_t, slice2_t])
+ self.assertAllEqual(slice_val, inp[:2, :2])
+ self.assertAllEqual(slice2_val, inp[:2, :2])
+ self.assertEqual(slice_val.shape, slice_t.get_shape())
+ self.assertEqual(slice2_val.shape, slice2_t.get_shape())
+
+ def testSimpleAll(self):
+ self._testSimple(use_gpu=False)
+ self._testSimple(use_gpu=True)
+
+ def _testComplex(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = np.random.rand(4, 10, 10, 4).astype("f")
+ a = tf.constant(inp, dtype=tf.float32)
+
+ x = np.random.random_integers(0, 9)
+ z = np.random.random_integers(0, 9)
+ y = np.random.random_integers(0, z)
+ slice_t = a[:, x, y:z, :]
+ self.assertAllEqual(slice_t.eval(), inp[:, x, y:z, :])
+
+ def testComplex(self):
+ for _ in range(10):
+ self._testComplex(use_gpu=False)
+ self._testComplex(use_gpu=True)
+
+ def _RunAndVerifyResult(self, use_gpu):
+ # Random dims of rank 5
+ input_shape = np.random.randint(0, 20, size=5)
+ inp = np.random.rand(*input_shape).astype("f")
+ with self.test_session(use_gpu=use_gpu) as sess:
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=input_shape, dtype=tf.float32)
+ indices = [0 if x == 0 else np.random.randint(x) for x in input_shape]
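+ # For zero-sized dimensions the start index must be 0 (np.random.randint(0) would raise).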
+ sizes = [np.random.randint(0, input_shape[i] - indices[i] + 1)
+ for i in range(5)]
+ slice_t = tf.slice(a, indices, sizes)
+ slice2_t = a[indices[0]:indices[0]+sizes[0],
+ indices[1]:indices[1]+sizes[1],
+ indices[2]:indices[2]+sizes[2],
+ indices[3]:indices[3]+sizes[3],
+ indices[4]:indices[4]+sizes[4]]
+
+ slice_val, slice2_val = sess.run([slice_t, slice2_t])
+
+ expected_val = inp[indices[0]:indices[0]+sizes[0],
+ indices[1]:indices[1]+sizes[1],
+ indices[2]:indices[2]+sizes[2],
+ indices[3]:indices[3]+sizes[3],
+ indices[4]:indices[4]+sizes[4]]
+ self.assertAllEqual(slice_val, expected_val)
+ self.assertAllEqual(slice2_val, expected_val)
+ self.assertEqual(expected_val.shape, slice_t.get_shape())
+ self.assertEqual(expected_val.shape, slice2_t.get_shape())
+
+ def testRandom(self):
+ for _ in range(10):
+ self._RunAndVerifyResult(use_gpu=False)
+ self._RunAndVerifyResult(use_gpu=True)
+
+ def _testGradientSlice(self, input_shape, slice_begin, slice_size, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ num_inputs = np.prod(input_shape)
+ num_grads = np.prod(slice_size)
+ inp = np.random.rand(num_inputs).astype("f").reshape(input_shape)
+ a = tf.constant([float(x) for x in inp.ravel(order="C")],
+ shape=input_shape, dtype=tf.float32)
+ slice_t = tf.slice(a, slice_begin, slice_size)
+ grads = np.random.rand(num_grads).astype("f").reshape(slice_size)
+ grad_tensor = tf.constant(grads)
+ grad = tf.gradients(slice_t, [a], grad_tensor)[0]
+ result = grad.eval()
+
+ # Create a zero tensor of the input shape and place
+ # the grads into the right location to compare against TensorFlow.
+ np_ans = np.zeros(input_shape)
+ slices = []
+ for i in xrange(len(input_shape)):
+ slices.append(slice(slice_begin[i], slice_begin[i] + slice_size[i]))
+ np_ans[slices] = grads
+
+ self.assertAllClose(np_ans, result)
+
+ def _testGradientVariableSize(self, use_gpu):
+ with self.test_session(use_gpu=use_gpu):
+ inp = tf.constant([1.0, 2.0, 3.0], name="in")
+ out = tf.slice(inp, [1], [-1])
+ grad_actual = tf.gradients(out, inp)[0].eval()
+ self.assertAllClose([0., 1., 1.], grad_actual)
+
+ def _testGradientsSimple(self, use_gpu):
+ # Slice the middle square out of a 4x4 input
+ self._testGradientSlice([4, 4], [1, 1], [2, 2], use_gpu)
+
+ # Slice the upper left square out of a 4x4 input
+ self._testGradientSlice([4, 4], [0, 0], [2, 2], use_gpu)
+
+ # Slice a non-square input starting from (2,1)
+ self._testGradientSlice([4, 4], [2, 1], [1, 2], use_gpu)
+
+ # Slice a 3D tensor
+ self._testGradientSlice([3, 3, 3], [0, 1, 0], [2, 1, 1], use_gpu)
+
+ # Use -1 as a slice dimension.
+ self._testGradientVariableSize(use_gpu)
+
+ def testGradientsAll(self):
+ self._testGradientsSimple(use_gpu=False)
+ self._testGradientsSimple(use_gpu=True)
+
+ def testNotIterable(self):
+ # NOTE(mrry): If we register __getitem__ as an overloaded
+ # operator, Python will valiantly attempt to iterate over the
+ # Tensor from 0 to infinity. This test ensures that this
+ # unintended behavior is prevented.
+ c = tf.constant(5.0)
+ with self.assertRaisesWithPredicateMatch(
+ TypeError,
+ lambda e: "'Tensor' object is not iterable" in e.message):
+ for _ in c:
+ pass
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/softmax_op_test.py b/tensorflow/python/kernel_tests/softmax_op_test.py
new file mode 100644
index 0000000000..fd25970093
--- /dev/null
+++ b/tensorflow/python/kernel_tests/softmax_op_test.py
@@ -0,0 +1,65 @@
+"""Tests for SoftmaxOp."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SoftmaxTest(tf.test.TestCase):
+
+ def _npSoftmax(self, features):
+ batch_dim = 0
+ class_dim = 1
+ batch_size = features.shape[batch_dim]
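+ # Subtract the per-row max before exponentiating for numerical stability.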
+ e = np.exp(features -
+ np.reshape(np.amax(features, axis=class_dim), [batch_size, 1]))
+ return e / np.reshape(np.sum(e, axis=class_dim), [batch_size, 1])
+
+ def _testSoftmax(self, np_features, use_gpu=False):
+ np_softmax = self._npSoftmax(np_features)
+ with self.test_session(use_gpu=use_gpu):
+ tf_softmax = tf.nn.softmax(np_features)
+ out = tf_softmax.eval()
+ self.assertAllClose(np_softmax, out)
+ self.assertShapeEqual(np_softmax, tf_softmax)
+ # Bonus check: the softmaxes should add to one in each
+ # batch element.
+ self.assertAllClose(np.ones(out.shape[0]),
+ np.sum(out, axis=1))
+
+ def _testAll(self, features):
+ self._testSoftmax(features, use_gpu=False)
+ self._testSoftmax(features, use_gpu=True)
+
+ def testNpSoftmax(self):
+ features = [[1., 1., 1., 1.], [1., 2., 3., 4.]]
+ # Batch 0: All exps are 1. The expected result is
+ # [0.25, 0.25, 0.25, 0.25]
+ #
+ # Batch 1:
+ # exps = [1., 2.718, 7.389, 20.085]
+ # sum = 31.192
+ # Softmaxes = exps / sum = [0.0320586, 0.08714432, 0.23688282, 0.64391426]
+ np_sm = self._npSoftmax(np.array(features))
+ self.assertAllClose(
+ np.array([[0.25, 0.25, 0.25, 0.25],
+ [0.0320586, 0.08714432, 0.23688282, 0.64391426]]),
+ np_sm,
+ rtol=1.e-5, atol=1.e-5)
+
+ def testShapeMismatch(self):
+ with self.assertRaises(ValueError):
+ tf.nn.softmax([0., 1., 2., 3.])
+
+ def testFloat(self):
+ self._testAll(
+ np.array([[1., 1., 1., 1.], [1., 2., 3., 4.]]).astype(np.float32))
+
+ def testDouble(self):
+ self._testSoftmax(
+ np.array([[1., 1., 1., 1.], [1., 2., 3., 4.]]).astype(np.float64),
+ use_gpu=False)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/softplus_op_test.py b/tensorflow/python/kernel_tests/softplus_op_test.py
new file mode 100644
index 0000000000..25b68aa659
--- /dev/null
+++ b/tensorflow/python/kernel_tests/softplus_op_test.py
@@ -0,0 +1,47 @@
+"""Tests for Softplus and SoftplusGrad."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class SoftplusTest(tf.test.TestCase):
+
+ def _npSoftplus(self, np_features):
+ return np.log(1 + np.exp(np_features))
+
+ def _testSoftplus(self, np_features, use_gpu=False):
+ np_softplus = self._npSoftplus(np_features)
+ with self.test_session(use_gpu=use_gpu):
+ softplus = tf.nn.softplus(np_features)
+ tf_softplus = softplus.eval()
+ self.assertAllClose(np_softplus, tf_softplus)
+ self.assertShapeEqual(np_softplus, softplus)
+
+ def testNumbers(self):
+ for t in [np.float, np.double]:
+ self._testSoftplus(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=False)
+ self._testSoftplus(
+ np.array([[-9, 7, -5, 3, -1], [1, -3, 5, -7, 9]]).astype(t),
+ use_gpu=True)
+
+ def testGradient(self):
+ with self.test_session():
+ x = tf.constant(
+ [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9],
+ shape=[2, 5], name="x")
+ y = tf.nn.softplus(x, name="softplus")
+ x_init = np.asarray(
+ [[-0.9, -0.7, -0.5, -0.3, -0.1], [0.1, 0.3, 0.5, 0.7, 0.9]],
+ dtype=np.float32, order="F")
+ err = gc.ComputeGradientError(x, [2, 5], y, [2, 5], x_init_value=x_init)
+ print "softplus (float) gradient err = ", err
+ self.assertLess(err, 1e-4)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/sparse_concat_op_test.py b/tensorflow/python/kernel_tests/sparse_concat_op_test.py
new file mode 100644
index 0000000000..0f5650b89c
--- /dev/null
+++ b/tensorflow/python/kernel_tests/sparse_concat_op_test.py
@@ -0,0 +1,260 @@
+"""Tests for SparseConcat."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SparseConcatTest(tf.test.TestCase):
+
+ def _SparseTensor_UnknownShape(self, ind_shape=None, val_shape=None,
+ shape_shape=None):
+ return tf.SparseTensor(
+ tf.placeholder(tf.int64, shape=ind_shape),
+ tf.placeholder(tf.float32, shape=val_shape),
+ tf.placeholder(tf.int64, shape=shape_shape))
+
+ def _SparseTensor_3x3(self):
+ # [ 1]
+ # [2 ]
+ # [3 4]
+ ind = np.array([[0, 2], [1, 0], [2, 0], [2, 2]])
+ val = np.array([1, 2, 3, 4])
+ shape = np.array([3, 3])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_3x5(self):
+ # [ ]
+ # [ 1 ]
+ # [2 1 0]
+ ind = np.array([[1, 1], [2, 0], [2, 3], [2, 4]])
+ val = np.array([1, 2, 1, 0])
+ shape = np.array([3, 5])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_3x2(self):
+ # [ ]
+ # [1 ]
+ # [2 ]
+ ind = np.array([[1, 0], [2, 0]])
+ val = np.array([1, 2])
+ shape = np.array([3, 2])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_2x3(self):
+ # [ 1 ]
+ # [1 2]
+ ind = np.array([[0, 1], [1, 0], [1, 2]])
+ val = np.array([1, 1, 2])
+ shape = np.array([2, 3])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_2x3x4(self):
+ ind = np.array([
+ [0, 0, 1],
+ [0, 1, 0], [0, 1, 2],
+ [1, 0, 3],
+ [1, 1, 1], [1, 1, 3],
+ [1, 2, 2]])
+ val = np.array([1, 10, 12, 103, 111, 113, 122])
+ shape = np.array([2, 3, 4])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.float32),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_String3x3(self):
+ # [ a]
+ # [b ]
+ # [c d]
+ ind = np.array([[0, 2], [1, 0], [2, 0], [2, 2]])
+ val = np.array(["a", "b", "c", "d"])
+ shape = np.array([3, 3])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.string),
+ tf.constant(shape, tf.int64))
+
+ def _SparseTensor_String3x5(self):
+ # [ ]
+ # [ e ]
+ # [f g h]
+ ind = np.array([[1, 1], [2, 0], [2, 3], [2, 4]])
+ val = np.array(["e", "f", "g", "h"])
+ shape = np.array([3, 5])
+ return tf.SparseTensor(
+ tf.constant(ind, tf.int64),
+ tf.constant(val, tf.string),
+ tf.constant(shape, tf.int64))
+
+ def testConcat1(self):
+ with self.test_session(use_gpu=False) as sess:
+ # concat(A):
+ # [ 1]
+ # [2 ]
+ # [3 4]
+ sp_a = self._SparseTensor_3x3()
+
+ sp_concat = tf.sparse_concat(1, [sp_a])
+
+ self.assertEqual(sp_concat.indices.get_shape(), [4, 2])
+ self.assertEqual(sp_concat.values.get_shape(), [4])
+ self.assertEqual(sp_concat.shape.get_shape(), [2])
+
+ concat_out = sess.run(sp_concat)
+
+ self.assertAllEqual(
+ concat_out.indices, [[0, 2], [1, 0], [2, 0], [2, 2]])
+ self.assertAllEqual(concat_out.values, [1, 2, 3, 4])
+ self.assertAllEqual(concat_out.shape, [3, 3])
+
+ def testConcat2(self):
+ with self.test_session(use_gpu=False) as sess:
+ # concat(A, B):
+ # [ 1 ]
+ # [2 1 ]
+ # [3 4 2 1 0]
+ sp_a = self._SparseTensor_3x3()
+ sp_b = self._SparseTensor_3x5()
+
+ sp_concat = tf.sparse_concat(1, [sp_a, sp_b])
+
+ self.assertEqual(sp_concat.indices.get_shape(), [8, 2])
+ self.assertEqual(sp_concat.values.get_shape(), [8])
+ self.assertEqual(sp_concat.shape.get_shape(), [2])
+
+ concat_out = sess.run(sp_concat)
+
+ self.assertAllEqual(
+ concat_out.indices,
+ [[0, 2], [1, 0], [1, 4], [2, 0], [2, 2], [2, 3], [2, 6], [2, 7]])
+ self.assertAllEqual(concat_out.values, [1, 2, 1, 3, 4, 2, 1, 0])
+ self.assertAllEqual(concat_out.shape, [3, 8])
+
+ def testConcatDim0(self):
+ with self.test_session(use_gpu=False) as sess:
+ # concat(A, D):
+ # [ 1]
+ # [2 ]
+ # [3 4]
+ # [ 1 ]
+ # [1 2]
+ sp_a = self._SparseTensor_3x3()
+ sp_d = self._SparseTensor_2x3()
+
+ sp_concat = tf.sparse_concat(0, [sp_a, sp_d])
+
+ self.assertEqual(sp_concat.indices.get_shape(), [7, 2])
+ self.assertEqual(sp_concat.values.get_shape(), [7])
+ self.assertEqual(sp_concat.shape.get_shape(), [2])
+
+ concat_out = sess.run(sp_concat)
+
+ self.assertAllEqual(
+ concat_out.indices,
+ [[0, 2], [1, 0], [2, 0], [2, 2], [3, 1], [4, 0], [4, 2]])
+ self.assertAllEqual(
+ concat_out.values, np.array([1, 2, 3, 4, 1, 1, 2]))
+ self.assertAllEqual(
+ concat_out.shape, np.array([5, 3]))
+
+ def testConcat3(self):
+ with self.test_session(use_gpu=False) as sess:
+ # concat(A, B, C):
+ # [ 1 ]
+ # [2 1 1 ]
+ # [3 4 2 1 0 2 ]
+ sp_a = self._SparseTensor_3x3()
+ sp_b = self._SparseTensor_3x5()
+ sp_c = self._SparseTensor_3x2()
+
+ sp_concat = tf.sparse_concat(1, [sp_a, sp_b, sp_c])
+
+ self.assertEqual(sp_concat.indices.get_shape(), [10, 2])
+ self.assertEqual(sp_concat.values.get_shape(), [10])
+ self.assertEqual(sp_concat.shape.get_shape(), [2])
+
+ concat_out = sess.run(sp_concat)
+
+ self.assertAllEqual(
+ concat_out.indices,
+ [[0, 2], [1, 0], [1, 4], [1, 8], [2, 0], [2, 2], [2, 3], [2, 6],
+ [2, 7], [2, 8]])
+ self.assertAllEqual(concat_out.values, [1, 2, 1, 1, 3, 4, 2, 1, 0, 2])
+ self.assertAllEqual(concat_out.shape, [3, 10])
+
+ def testConcatNonNumeric(self):
+ with self.test_session(use_gpu=False) as sess:
+ # concat(A, B):
+ # [ a ]
+ # [b e ]
+ # [c d f g h]
+ sp_a = self._SparseTensor_String3x3()
+ sp_b = self._SparseTensor_String3x5()
+
+ sp_concat = tf.sparse_concat(1, [sp_a, sp_b])
+
+ self.assertEqual(sp_concat.indices.get_shape(), [8, 2])
+ self.assertEqual(sp_concat.values.get_shape(), [8])
+ self.assertEqual(sp_concat.shape.get_shape(), [2])
+
+ concat_out = sess.run(sp_concat)
+
+ self.assertAllEqual(
+ concat_out.indices,
+ [[0, 2], [1, 0], [1, 4], [2, 0], [2, 2], [2, 3], [2, 6], [2, 7]])
+ self.assertAllEqual(
+ concat_out.values, ["a", "b", "e", "c", "d", "f", "g", "h"])
+ self.assertAllEqual(concat_out.shape, [3, 8])
+
+ def testMismatchedRank(self):
+ with self.test_session(use_gpu=False):
+ sp_a = self._SparseTensor_3x3()
+ sp_e = self._SparseTensor_2x3x4()
+
+ # Rank mismatches can be caught at shape-inference time
+ with self.assertRaises(ValueError):
+ tf.sparse_concat(1, [sp_a, sp_e])
+
+ def testMismatchedShapes(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_a = self._SparseTensor_3x3()
+ sp_b = self._SparseTensor_3x5()
+ sp_c = self._SparseTensor_3x2()
+ sp_d = self._SparseTensor_2x3()
+ sp_concat = tf.sparse_concat(1, [sp_a, sp_b, sp_c, sp_d])
+
+ # Shape mismatches can only be caught when the op is run
+ with self.assertRaisesOpError("Input shapes must match"):
+ sess.run(sp_concat)
+
+ def testShapeInferenceUnknownShapes(self):
+ with self.test_session(use_gpu=False):
+ sp_inputs = [
+ self._SparseTensor_UnknownShape(),
+ self._SparseTensor_UnknownShape(val_shape=[3]),
+ self._SparseTensor_UnknownShape(ind_shape=[1, 3]),
+ self._SparseTensor_UnknownShape(shape_shape=[3])]
+
+ sp_concat = tf.sparse_concat(0, sp_inputs)
+
+ self.assertEqual(sp_concat.indices.get_shape().as_list(), [None, 3])
+ self.assertEqual(sp_concat.values.get_shape().as_list(), [None])
+ self.assertEqual(sp_concat.shape.get_shape(), [3])
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/sparse_matmul_op_test.py b/tensorflow/python/kernel_tests/sparse_matmul_op_test.py
new file mode 100644
index 0000000000..d87d15cae9
--- /dev/null
+++ b/tensorflow/python/kernel_tests/sparse_matmul_op_test.py
@@ -0,0 +1,82 @@
+"""Tests for tensorflow.ops.tf.matmul."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+def RandMatrix(rows, cols, tr):
+ if tr:
+ rows, cols = cols, rows
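+ # Clipping a uniform(-100, 100) draw at zero leaves roughly half the entries
+ # exactly zero, so the generated matrices are genuinely sparse.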
+ return (np.clip(np.random.uniform(low=-100.0, high=100.0, size=rows * cols),
+ 0, 100) / 100).reshape([rows, cols]).astype(np.float32)
+
+
+class SparseMatMulTest(tf.test.TestCase):
+
+ def _testCpuMatmul(self, x, y, tr_a=False, tr_b=False,
+ sp_a=True, sp_b=False):
+ x_mat = np.matrix(x)
+ if tr_a:
+ x_mat = np.transpose(x_mat)
+ y_mat = np.matrix(y)
+ if tr_b:
+ y_mat = np.transpose(y_mat)
+ np_ans = x_mat * y_mat
+ with self.test_session(use_gpu=False):
+ tf_ans = tf.matmul(x, y,
+ transpose_a=tr_a, transpose_b=tr_b,
+ a_is_sparse=sp_a,
+ b_is_sparse=sp_b)
+ out = tf_ans.eval()
+ self.assertAllClose(np_ans, out)
+ self.assertShapeEqual(np_ans, tf_ans)
+
+ def testFloatBasic(self):
+ x = np.arange(0., 4.).reshape([4, 1]).astype(np.float32)
+ y = np.arange(-1., 1.).reshape([1, 2]).astype(np.float32)
+ self._testCpuMatmul(x, y)
+
+ # Test random-sized matrices.
+ def testFloatRandom(self):
+ for _ in range(10):
+ for tr_a in [True, False]:
+ for tr_b in [True, False]:
+ for sp_a in [True, False]:
+ for sp_b in [True, False]:
+ n, k, m = np.random.randint(1, 100, size=3)
+ x = RandMatrix(n, k, tr_a)
+ y = RandMatrix(k, m, tr_b)
+ self._testCpuMatmul(x, y, tr_a, tr_b, sp_a, sp_b)
+
+
+class MatMulGradientTest(tf.test.TestCase):
+
+ def _testGradients(self, tr_a, tr_b, sp_a, sp_b, name):
+ with self.test_session():
+ a = tf.constant(RandMatrix(3, 2, tr_a), dtype=tf.float32)
+ b = tf.constant(RandMatrix(2, 4, tr_b), dtype=tf.float32)
+ m = tf.matmul(a, b,
+ name=name,
+ transpose_a=tr_a,
+ transpose_b=tr_b,
+ a_is_sparse=sp_a,
+ b_is_sparse=sp_b)
+ err = (gc.ComputeGradientError(a, [2, 3] if tr_a else [3, 2], m, [3, 4]) +
+ gc.ComputeGradientError(b, [4, 2] if tr_b else [2, 4], m, [3, 4]))
+ print "sparse_matmul gradient err = ", err
+ self.assertLess(err, 1e-3)
+
+ def testGradientInput(self):
+ for tr_a in [True, False]:
+ for tr_b in [True, False]:
+ for sp_a in [True, False]:
+ for sp_b in [True, False]:
+ name = "sparse_matmul_%s_%s_%s_%s" % (tr_a, tr_b, sp_a, sp_b)
+ self._testGradients(tr_a, tr_b, sp_a, sp_b, name)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/sparse_reorder_op_test.py b/tensorflow/python/kernel_tests/sparse_reorder_op_test.py
new file mode 100644
index 0000000000..c3bcc25311
--- /dev/null
+++ b/tensorflow/python/kernel_tests/sparse_reorder_op_test.py
@@ -0,0 +1,56 @@
+"""Tests for SparseReorder."""
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SparseReorderTest(tf.test.TestCase):
+
+ def _SparseTensorPlaceholder(self):
+ return tf.SparseTensor(
+ tf.placeholder(tf.int64),
+ tf.placeholder(tf.int32),
+ tf.placeholder(tf.int64))
+
+ def _SparseTensorValue_5x6(self, permutation):
+ ind = np.array([
+ [0, 0],
+ [1, 0], [1, 3], [1, 4],
+ [3, 2], [3, 3]]).astype(np.int64)
+ val = np.array([0, 10, 13, 14, 32, 33]).astype(np.int32)
+
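+ # Shuffle the entries according to the given permutation so tests can feed
+ # them out of row-major order.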
+ ind = ind[permutation]
+ val = val[permutation]
+
+ shape = np.array([5, 6]).astype(np.int64)
+ return tf.SparseTensorValue(ind, val, shape)
+
+ def testAlreadyInOrder(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensorPlaceholder()
+ input_val = self._SparseTensorValue_5x6(np.arange(6))
+ sp_output = tf.sparse_reorder(sp_input)
+
+ output_val = sess.run(sp_output, {sp_input: input_val})
+ self.assertAllEqual(output_val.indices, input_val.indices)
+ self.assertAllEqual(output_val.values, input_val.values)
+ self.assertAllEqual(output_val.shape, input_val.shape)
+
+ def testOutOfOrder(self):
+ expected_output_val = self._SparseTensorValue_5x6(np.arange(6))
+ with self.test_session(use_gpu=False) as sess:
+ for _ in range(5): # To test various random permutations
+ sp_input = self._SparseTensorPlaceholder()
+ input_val = self._SparseTensorValue_5x6(np.random.permutation(6))
+ sp_output = tf.sparse_reorder(sp_input)
+
+ output_val = sess.run(sp_output, {sp_input: input_val})
+ self.assertAllEqual(output_val.indices, expected_output_val.indices)
+ self.assertAllEqual(output_val.values, expected_output_val.values)
+ self.assertAllEqual(output_val.shape, expected_output_val.shape)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/sparse_to_dense_op_py_test.py b/tensorflow/python/kernel_tests/sparse_to_dense_op_py_test.py
new file mode 100644
index 0000000000..2bab89923e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/sparse_to_dense_op_py_test.py
@@ -0,0 +1,111 @@
+"""Tests for tensorflow.kernels.sparse_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+def _SparseToDense(sparse_indices, output_size, sparse_values,
+ default_value):
+ return tf.sparse_to_dense(sparse_indices, output_size,
+ sparse_values, default_value)
+
+
+class SparseToDenseTest(tf.test.TestCase):
+
+ def testInt(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([1, 3], [5], 1, 0).eval()
+ np_ans = np.array([0, 1, 0, 1, 0]).astype(np.int32)
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testFloat(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([1, 3], [5], 1.0, 0.0).eval()
+ np_ans = np.array([0, 1, 0, 1, 0]).astype(np.float32)
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testString(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([1, 3], [5], "a", "b").eval()
+ np_ans = np.array(["b", "a", "b", "a", "b"]).astype(np.string_)
+ self.assertAllEqual(np_ans, tf_ans)
+
+ def testSetValue(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([1, 3], [5], [1, 2], -1).eval()
+ np_ans = np.array([-1, 1, -1, 2, -1]).astype(np.int32)
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testSetSingleValue(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([1, 3], [5], 1, -1).eval()
+ np_ans = np.array([-1, 1, -1, 1, -1]).astype(np.int32)
+ self.assertAllClose(np_ans, tf_ans)
+
+ def test2d(self):
+ # pylint: disable=bad-whitespace
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([[1, 3], [2, 0]], [3, 4], 1, -1).eval()
+ np_ans = np.array([[-1, -1, -1, -1],
+ [-1, -1, -1, 1],
+ [ 1, -1, -1, -1]]).astype(np.int32)
+ self.assertAllClose(np_ans, tf_ans)
+
+ def test3d(self):
+ with self.test_session(use_gpu=False):
+ tf_ans = _SparseToDense([[1, 3, 0], [2, 0, 1]], [3, 4, 2], 1, -1).eval()
+ np_ans = np.ones((3, 4, 2), dtype=np.int32) * -1
+ np_ans[1, 3, 0] = 1
+ np_ans[2, 0, 1] = 1
+ self.assertAllClose(np_ans, tf_ans)
+
+ def testBadShape(self):
+ with self.test_session():
+ with self.assertRaisesWithPredicateMatch(
+ ValueError, lambda e: ("Input shape should be a vector" == str(e))):
+ _SparseToDense([1, 3], [[5], [3]], 1, -1)
+
+ def testBadValue(self):
+ with self.test_session():
+ dense = _SparseToDense([1, 3], [5], [[5], [3]], -1)
+ with self.assertRaisesOpError(
+ r"sparse_values has incorrect shape \[2,1\], "
+ r"should be \[\] or \[2\]"):
+ dense.eval()
+
+ def testBadNumValues(self):
+ with self.test_session():
+ dense = _SparseToDense([1, 3], [5], [1, 2, 3], -1)
+ with self.assertRaisesOpError(
+ r"sparse_values has incorrect shape \[3\], should be \[\] or \[2\]"):
+ dense.eval()
+
+ def testBadDefault(self):
+ with self.test_session():
+ dense = _SparseToDense([1, 3], [5], [1, 2], [1, 2])
+ with self.assertRaisesOpError("default_value should be a scalar"):
+ dense.eval()
+
+ def testShapeInferenceKnownShape(self):
+ with self.test_session(use_gpu=False):
+ indices = tf.placeholder(tf.int64)
+
+ shape = [4, 5, 6]
+ output = tf.sparse_to_dense(indices, shape, 1, 0)
+ self.assertEqual(output.get_shape(), [4, 5, 6])
+
+ shape = tf.placeholder(tf.int64, shape=(3,))
+ output = tf.sparse_to_dense(indices, shape, 1, 0)
+ self.assertEqual(output.get_shape().as_list(), [None, None, None])
+
+ def testShapeInferenceUnknownShape(self):
+ with self.test_session(use_gpu=False):
+ indices = tf.placeholder(tf.int64)
+ shape = tf.placeholder(tf.int64)
+ output = tf.sparse_to_dense(indices, shape, 1, 0)
+ self.assertEqual(output.get_shape().ndims, None)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/sparsemask_op_test.py b/tensorflow/python/kernel_tests/sparsemask_op_test.py
new file mode 100644
index 0000000000..ffde8f7944
--- /dev/null
+++ b/tensorflow/python/kernel_tests/sparsemask_op_test.py
@@ -0,0 +1,32 @@
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SparseMaskTest(tf.test.TestCase):
+
+ def testBasic(self):
+ values = np.random.rand(4, 4).astype(np.single)
+ indices = np.array([0, 2, 3, 4], dtype=np.int32)
+ mask_indices = np.array([0], dtype=np.int32)
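+ # Masking out index 0 should drop the first row of values, leaving
+ # indices [2, 3, 4].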
+
+ out_values = values[1:, :]
+ out_indices = np.array([2, 3, 4], dtype=np.int32)
+
+ with self.test_session() as sess:
+ values_tensor = tf.convert_to_tensor(values)
+ indices_tensor = tf.convert_to_tensor(indices)
+ mask_indices_tensor = tf.convert_to_tensor(mask_indices)
+
+ t = tf.IndexedSlices(values_tensor, indices_tensor)
+ masked_t = tf.sparse_mask(t, mask_indices_tensor)
+
+ tf_out_values, tf_out_indices = sess.run([masked_t.values,
+ masked_t.indices])
+
+ self.assertAllEqual(tf_out_values, out_values)
+ self.assertAllEqual(tf_out_indices, out_indices)
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/split_op_test.py b/tensorflow/python/kernel_tests/split_op_test.py
new file mode 100644
index 0000000000..19906aa02b
--- /dev/null
+++ b/tensorflow/python/kernel_tests/split_op_test.py
@@ -0,0 +1,132 @@
+"""Functional tests for Split Op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class SplitOpTest(tf.test.TestCase):
+
+ def _compare(self, x, dim, num, use_gpu):
+ np_ans = np.split(x, num, dim)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ tf_ans = tf.split(dim, num, x)
+ out = sess.run(tf_ans)
+ self.assertEqual(num, len(np_ans))
+ self.assertEqual(num, len(out))
+ for i in range(num):
+ self.assertAllEqual(np_ans[i], out[i])
+ self.assertShapeEqual(np_ans[i], tf_ans[i])
+
+ def _testSplitRows(self, use_gpu):
+ inp = np.random.rand(4, 4).astype("f")
+ self._compare(inp, 0, 4, use_gpu)
+
+ def testSplitRowsAll(self):
+ self._testSplitRows(use_gpu=False)
+ self._testSplitRows(use_gpu=True)
+
+ def _testSplitCols(self, use_gpu):
+ inp = np.random.rand(4, 4).astype("f")
+ self._compare(inp, 1, 4, use_gpu)
+
+ def testSplitColsAll(self):
+ self._testSplitCols(use_gpu=False)
+ self._testSplitCols(use_gpu=True)
+
+ def _testEmpty(self, x, dim, num, expected_shape):
+ with self.test_session() as sess:
+ tf_ans = tf.split(dim, num, x)
+ out = sess.run(tf_ans)
+ self.assertEqual(x.size, 0)
+ self.assertEqual(len(out), num)
+ for i in range(num):
+ self.assertEqual(out[i].shape, expected_shape)
+ self.assertEqual(expected_shape, tf_ans[i].get_shape())
+
+ def testEmpty(self):
+ # Note: np.split returns a rank-0 empty ndarray
+ # if the input ndarray is empty.
+ inp = np.random.rand(8, 0, 21).astype("f")
+ self._testEmpty(inp, 0, 2, (4, 0, 21))
+ self._testEmpty(inp, 0, 4, (2, 0, 21))
+ self._testEmpty(inp, 1, 4, (8, 0, 21))
+ self._testEmpty(inp, 2, 3, (8, 0, 7))
+ self._testEmpty(inp, 2, 7, (8, 0, 3))
+
+ def testIdentity(self):
+ inp = np.random.rand(2, 2, 2).astype("f")
+ for use_gpu in [False, True]:
+ self._compare(inp, 0, 1, use_gpu)
+ self._compare(inp, 1, 1, use_gpu)
+ self._compare(inp, 2, 1, use_gpu)
+
+ def testSplitDim0(self):
+ for use_gpu in [False, True]:
+ self._compare(np.random.rand(6, 10, 18).astype("f"), 0, 3, use_gpu)
+ self._compare(np.random.rand(6, 7, 18).astype("f"), 0, 3, use_gpu)
+ self._compare(np.random.rand(6, 7, 9).astype("f"), 0, 3, use_gpu)
+
+ def _RunAndVerify(self, use_gpu):
+ # Random dims of rank 5
+ shape = np.random.randint(0, 5, size=5)
+ split_dim = np.random.randint(0, 5)
+ num_split = np.random.randint(2, 8)
+ shape[split_dim] = np.random.randint(2, 5) * num_split
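+ # Make the split dimension a multiple of num_split so the split is even.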
+ inp = np.random.rand(*shape).astype("f")
+ with self.test_session(use_gpu=use_gpu) as sess:
+ result = sess.run(tf.split(split_dim, num_split, inp))
+ slices = [slice(0, x) for x in shape]
+ offset = 0
+ length = shape[split_dim] / num_split
+ for i in range(num_split):
+ slices[split_dim] = slice(offset, offset + length)
+ offset += length
+ self.assertAllEqual(result[i], inp[slices])
+
+ def testRandom(self):
+ for _ in range(5):
+ self._RunAndVerify(use_gpu=False)
+ self._RunAndVerify(use_gpu=True)
+
+ def _testGradientsSimple(self, use_gpu):
+ inp = np.random.rand(4, 4).astype("f")
+ with self.test_session(use_gpu=use_gpu):
+ inp_tensor = tf.convert_to_tensor(inp)
+ s = tf.split(1, 4, inp_tensor)
+ inp_grads = [np.random.rand(4, 1).astype("f") for _ in range(4)]
+ grad_tensors = [tf.constant(x) for x in inp_grads]
+ grad = tf.gradients(s, [inp_tensor], grad_tensors)[0]
+ result = grad.eval()
+ for i in range(4):
+ self.assertAllEqual(result[:, i:i+1], inp_grads[i])
+
+ def testGradientsAll(self):
+ self._testGradientsSimple(use_gpu=False)
+ self._testGradientsSimple(use_gpu=True)
+
+ def testShapeFunctionEdgeCases(self):
+ # split_dim greater than rank of input.
+ with self.assertRaises(ValueError):
+ tf.split(2, 4, [[0, 1], [2, 3]])
+
+ # num_split does not evenly divide the size in split_dim.
+ with self.assertRaisesRegexp(ValueError, "should evenly divide"):
+ tf.split(0, 3, [0, 1, 2, 3])
+
+ # Unknown split_dim.
+ splits = tf.split(tf.placeholder(tf.int32),
+ 4, [[0, 1, 2, 3]])
+ for s in splits:
+ self.assertEqual([None, None], s.get_shape().as_list())
+
+ # Unknown split_dim and input shape.
+ splits = tf.split(tf.placeholder(tf.int32),
+ 4, tf.placeholder(tf.float32))
+ for s in splits:
+ self.assertEqual(None, s.get_shape().ndims)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/string_to_hash_bucket_op_test.py b/tensorflow/python/kernel_tests/string_to_hash_bucket_op_test.py
new file mode 100644
index 0000000000..8615b271b8
--- /dev/null
+++ b/tensorflow/python/kernel_tests/string_to_hash_bucket_op_test.py
@@ -0,0 +1,34 @@
+"""Tests for StringToHashBucket op from string_ops."""
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class StringToHashBucketOpTest(tf.test.TestCase):
+
+ def testStringToOneHashBucket(self):
+ with self.test_session():
+ input_string = tf.placeholder(tf.string)
+ output = tf.string_to_hash_bucket(input_string, 1)
+ result = output.eval(feed_dict={
+ input_string: ['a', 'b', 'c']
+ })
+
+ self.assertAllEqual([0, 0, 0], result)
+
+ def testStringToHashBuckets(self):
+ with self.test_session():
+ input_string = tf.placeholder(tf.string)
+ output = tf.string_to_hash_bucket(input_string, 10)
+ result = output.eval(feed_dict={
+ input_string: ['a', 'b', 'c']
+ })
+
+ # Hash64('a') -> 2996632905371535868 -> mod 10 -> 8
+ # Hash64('b') -> 5795986006276551370 -> mod 10 -> 0
+ # Hash64('c') -> 14899841994519054197 -> mod 10 -> 7
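+ # (i.e. the op appears to bucket by Hash64(s) % num_buckets, as the
+ #  values above suggest.)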
+ self.assertAllEqual([8, 0, 7], result)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/string_to_number_op_test.py b/tensorflow/python/kernel_tests/string_to_number_op_test.py
new file mode 100644
index 0000000000..39505e18ba
--- /dev/null
+++ b/tensorflow/python/kernel_tests/string_to_number_op_test.py
@@ -0,0 +1,66 @@
+"""Tests for StringToNumber op from parsing_ops."""
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+_ERROR_MESSAGE = "StringToNumberOp could not correctly convert string: "
+
+
+class StringToNumberOpTest(tf.test.TestCase):
+
+ def testToFloat(self):
+ with self.test_session():
+ input_string = tf.placeholder(tf.string)
+ output = tf.string_to_number(
+ input_string,
+ out_type=tf.float32)
+
+ result = output.eval(feed_dict={
+ input_string: ["0",
+ "3",
+ "-1",
+ "1.12",
+ "0xF",
+ " -10.5",
+ "3.40282e+38",
+ # The next two exceed maximum value for float, so we
+ # expect +/-INF to be returned instead.
+ "3.40283e+38",
+ "-3.40283e+38",
+ "NAN",
+ "INF"]
+ })
+
+ self.assertAllClose([0, 3, -1, 1.12, 0xF, -10.5, 3.40282e+38,
+ float("INF"), float("-INF"), float("NAN"),
+ float("INF")], result)
+
+ with self.assertRaisesOpError(_ERROR_MESSAGE + "10foobar"):
+ output.eval(feed_dict={input_string: ["10foobar"]})
+
+ def testToInt32(self):
+ with self.test_session():
+ input_string = tf.placeholder(tf.string)
+ output = tf.string_to_number(
+ input_string,
+ out_type=tf.int32)
+
+ result = output.eval(feed_dict={
+ input_string: ["0", "3", "-1", " -10", "-2147483648", "2147483647"]
+ })
+
+ self.assertAllEqual([0, 3, -1, -10, -2147483648, 2147483647], result)
+
+ with self.assertRaisesOpError(_ERROR_MESSAGE + "2.9"):
+ output.eval(feed_dict={input_string: ["2.9"]})
+
+ # The next two exceed maximum value of int32.
+ for in_string in ["-2147483649", "2147483648"]:
+ with self.assertRaisesOpError(_ERROR_MESSAGE + in_string):
+ output.eval(feed_dict={input_string: [in_string]})
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/summary_image_op_test.py b/tensorflow/python/kernel_tests/summary_image_op_test.py
new file mode 100644
index 0000000000..dfdb2c8938
--- /dev/null
+++ b/tensorflow/python/kernel_tests/summary_image_op_test.py
@@ -0,0 +1,63 @@
+"""Tests for summary image op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.ops import image_ops
+
+
+class SummaryImageOpTest(tf.test.TestCase):
+
+ def _AsSummary(self, s):
+ summ = tf.Summary()
+ summ.ParseFromString(s)
+ return summ
+
+ def testImageSummary(self):
+ np.random.seed(7)
+ with self.test_session() as sess:
+ for depth in 1, 3, 4:
+ shape = (4, 5, 7) + (depth,)
+ bad_color = [255, 0, 0, 255][:depth]
+ for positive in False, True:
+ # Build a mostly random image with one nan
+ const = np.random.randn(*shape)
+ const[0, 1, 2] = 0 # Make the nan entry not the max
+ if positive:
+ const = 1 + np.maximum(const, 0)
+ scale = 255 / const.reshape(4, -1).max(axis=1)
+ offset = 0
+ else:
+ scale = 127 / np.abs(const.reshape(4, -1)).max(axis=1)
+ offset = 128
+ adjusted = np.floor(scale[:, None, None, None] * const + offset)
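+ # "adjusted" is the expected uint8 image: positive inputs scale into
+ # [0, 255] with offset 0; signed inputs scale by 127/max|x| with offset 128.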
+ const[0, 1, 2, depth / 2] = np.nan
+
+ # Summarize
+ summ = tf.image_summary("img", const)
+ value = sess.run(summ)
+ self.assertEqual([], summ.get_shape())
+ image_summ = self._AsSummary(value)
+
+ # Decode the first image and check consistency
+ image = image_ops.decode_png(
+ image_summ.value[0].image.encoded_image_string).eval()
+ self.assertAllEqual(image[1, 2], bad_color)
+ image[1, 2] = adjusted[0, 1, 2]
+ self.assertAllClose(image, adjusted[0])
+
+ # Check the rest of the proto
+ # Only the first 3 images are returned.
+ for v in image_summ.value:
+ v.image.ClearField("encoded_image_string")
+ expected = '\n'.join("""
+ value {
+ tag: "img/image/%d"
+ image { height: %d width: %d colorspace: %d }
+ }""" % ((i,) + shape[1:]) for i in xrange(3))
+ self.assertProtoEquals(expected, image_summ)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/summary_ops_test.py b/tensorflow/python/kernel_tests/summary_ops_test.py
new file mode 100644
index 0000000000..13e5021ccc
--- /dev/null
+++ b/tensorflow/python/kernel_tests/summary_ops_test.py
@@ -0,0 +1,83 @@
+"""Tests for summary ops."""
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+class SummaryOpsTest(tf.test.TestCase):
+
+ def _AsSummary(self, s):
+ summ = tf.Summary()
+ summ.ParseFromString(s)
+ return summ
+
+ def testScalarSummary(self):
+ with self.test_session() as sess:
+ const = tf.constant([10.0, 20.0])
+ summ = tf.scalar_summary(["c1", "c2"], const, name="mysumm")
+ value = sess.run(summ)
+ self.assertEqual([], summ.get_shape())
+ self.assertProtoEquals("""
+ value { tag: "c1" simple_value: 10.0 }
+ value { tag: "c2" simple_value: 20.0 }
+ """, self._AsSummary(value))
+
+ def testScalarSummaryDefaultName(self):
+ with self.test_session() as sess:
+ const = tf.constant([10.0, 20.0])
+ summ = tf.scalar_summary(["c1", "c2"], const)
+ value = sess.run(summ)
+ self.assertEqual([], summ.get_shape())
+ self.assertProtoEquals("""
+ value { tag: "c1" simple_value: 10.0 }
+ value { tag: "c2" simple_value: 20.0 }
+ """, self._AsSummary(value))
+
+ def testMergeSummary(self):
+ with self.test_session() as sess:
+ const = tf.constant(10.0)
+ summ1 = tf.histogram_summary("h", const, name="histo")
+ summ2 = tf.scalar_summary("c", const, name="summ")
+ merge = tf.merge_summary([summ1, summ2])
+ value = sess.run(merge)
+ self.assertEqual([], merge.get_shape())
+ self.assertProtoEquals("""
+ value {
+ tag: "h"
+ histo {
+ min: 10.0
+ max: 10.0
+ num: 1.0
+ sum: 10.0
+ sum_squares: 100.0
+ bucket_limit: 9.93809490288
+ bucket_limit: 10.9319043932
+ bucket_limit: 1.79769313486e+308
+ bucket: 0.0
+ bucket: 1.0
+ bucket: 0.0
+ }
+ }
+ value { tag: "c" simple_value: 10.0 }
+ """, self._AsSummary(value))
+
+ def testMergeAllSummaries(self):
+ with tf.Graph().as_default():
+ const = tf.constant(10.0)
+ summ1 = tf.histogram_summary("h", const, name="histo")
+ summ2 = tf.scalar_summary("o", const, name="oops",
+ collections=["foo_key"])
+ summ3 = tf.scalar_summary("c", const, name="summ")
+ merge = tf.merge_all_summaries()
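+ # summ2 was added only to the "foo_key" collection, so the default
+ # merge should pick up just summ1 and summ3.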
+ self.assertEqual("MergeSummary", merge.op.type)
+ self.assertEqual(2, len(merge.op.inputs))
+ self.assertEqual(summ1, merge.op.inputs[0])
+ self.assertEqual(summ3, merge.op.inputs[1])
+ merge = tf.merge_all_summaries("foo_key")
+ self.assertEqual("MergeSummary", merge.op.type)
+ self.assertEqual(1, len(merge.op.inputs))
+ self.assertEqual(summ2, merge.op.inputs[0])
+ self.assertTrue(tf.merge_all_summaries("bar_key") is None)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/topk_op_test.py b/tensorflow/python/kernel_tests/topk_op_test.py
new file mode 100644
index 0000000000..497dc9ac1e
--- /dev/null
+++ b/tensorflow/python/kernel_tests/topk_op_test.py
@@ -0,0 +1,52 @@
+"""Tests for TopK op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class TopKTest(tf.test.TestCase):
+
+ def _validateTopK(self, inputs, k, expected_values, expected_indices):
+ np_values = np.array(expected_values)
+ np_indices = np.array(expected_indices)
+ with self.test_session():
+ values_op, indices_op = tf.nn.top_k(inputs, k)
+ values = values_op.eval()
+ indices = indices_op.eval()
+ self.assertAllClose(np_values, values)
+ self.assertAllEqual(np_indices, indices)
+ self.assertShapeEqual(np_values, values_op)
+ self.assertShapeEqual(np_indices, indices_op)
+
+ def testTop1(self):
+ inputs = [[0.1, 0.3, 0.2, 0.4], [0.1, 0.3, 0.3, 0.2]]
+ self._validateTopK(inputs, 1,
+ [[0.4], [0.3]],
+ [[3], [1]])
+
+ def testTop2(self):
+ inputs = [[0.1, 0.3, 0.2, 0.4], [0.1, 0.3, 0.3, 0.2]]
+ self._validateTopK(inputs, 2,
+ [[0.4, 0.3], [0.3, 0.3]],
+ [[3, 1], [1, 2]])
+
+ def testTopAll(self):
+ inputs = [[0.1, 0.3, 0.2, 0.4], [0.1, 0.3, 0.3, 0.2]]
+ self._validateTopK(inputs, 4,
+ [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.2, 0.1]],
+ [[3, 1, 2, 0], [1, 2, 3, 0]])
+
+ def testKNegative(self):
+ inputs = [[0.1, 0.2], [0.3, 0.4]]
+ with self.assertRaisesRegexp(ValueError, "less than minimum 1"):
+ tf.nn.top_k(inputs, -1)
+
+ def testKTooLarge(self):
+ inputs = [[0.1, 0.2], [0.3, 0.4]]
+ with self.assertRaisesRegexp(ValueError, "input must have at least k"):
+ tf.nn.top_k(inputs, 4)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/transpose_op_test.py b/tensorflow/python/kernel_tests/transpose_op_test.py
new file mode 100644
index 0000000000..2786eaf37b
--- /dev/null
+++ b/tensorflow/python/kernel_tests/transpose_op_test.py
@@ -0,0 +1,176 @@
+"""Functional tests for Transpose op."""
+import itertools
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests.gradient_checker import ComputeGradient
+
+
+class TransposeTest(tf.test.TestCase):
+
+ def _np_transpose(self, x, perm):
+ ret = np.copy(x)
+ ret = ret.transpose(perm)
+ return ret
+
+ def _compareCpu(self, x, p):
+ np_ans = self._np_transpose(x, p)
+ with self.test_session(use_gpu=False):
+ inx = tf.convert_to_tensor(x)
+ y = tf.transpose(inx, p)
+ tf_ans = y.eval()
+ self.assertAllEqual(np_ans, tf_ans)
+ self.assertShapeEqual(np_ans, y)
+
+ jacob_t = None
+ # Gradient check on CPU.
+ xs = list(np.shape(x))
+ ys = list(np.shape(tf_ans))
+ if x.dtype == np.float32:
+ jacob_t, jacob_n = ComputeGradient(inx, xs, y, ys, x, 1e-2)
+ self.assertAllClose(jacob_t, jacob_n, 1e-3, 1e-3)
+ elif x.dtype == np.float64:
+ jacob_t, jacob_n = ComputeGradient(inx, xs, y, ys, x, 1e-2)
+ self.assertAllClose(jacob_t, jacob_n, 1e-6, 1e-6)
+
+ return tf_ans, jacob_t
+
+ def _compareGpu(self, x, p):
+ np_ans = self._np_transpose(x, p)
+ with self.test_session(use_gpu=True):
+ inx = tf.convert_to_tensor(x)
+ y = tf.transpose(inx, p)
+ tf_ans = y.eval()
+ self.assertAllEqual(np_ans, tf_ans)
+ self.assertShapeEqual(np_ans, y)
+
+ jacob_t = None
+ # Gradient check on GPU.
+ xs = list(np.shape(x))
+ ys = list(np.shape(tf_ans))
+ if x.dtype == np.float32:
+ jacob_t, jacob_n = ComputeGradient(inx, xs, y, ys, x, 1e-2)
+ self.assertAllClose(jacob_t, jacob_n, 1e-3, 1e-3)
+ elif x.dtype == np.float64:
+ jacob_t, jacob_n = ComputeGradient(inx, xs, y, ys, x, 1e-2)
+ self.assertAllClose(jacob_t, jacob_n, 1e-6, 1e-6)
+
+ return tf_ans, jacob_t
+
+ def _compare(self, x, use_gpu=False):
+ n = np.ndim(x)
+ # generate all permutations of [0, 1, ... n-1] in random order.
+ all_perm = np.random.permutation(
+ [p for p in itertools.permutations(range(n))]).astype(np.int32)
+ for p in all_perm[0:2]:
+ self._compareCpu(x, p)
+ if use_gpu:
+ self._compareGpu(x, p)
+
+ def _compare_cpu_gpu(self, x):
+ n = np.ndim(x)
+ # generate all permutation of [0, 1, ... n-1] in random order.
+ all_perm = np.random.permutation(
+ [p for p in itertools.permutations(range(n))]).astype(np.int32)
+ for p in all_perm[0:2]:
+ tf_a_cpu, tf_g_cpu = self._compareCpu(x, p)
+ tf_a_gpu, tf_g_gpu = self._compareGpu(x, p)
+ assert tf_g_cpu is not None
+ assert tf_g_gpu is not None
+ if x.dtype == np.float32:
+ self.assertAllClose(tf_a_cpu, tf_a_gpu, 1e-3, 1e-3)
+ self.assertAllClose(tf_g_cpu, tf_g_gpu, 1e-3, 1e-3)
+ elif x.dtype == np.float64:
+ self.assertAllClose(tf_a_cpu, tf_a_gpu, 1e-6, 1e-6)
+ self.assertAllClose(tf_g_cpu, tf_g_gpu, 1e-6, 1e-6)
+
+ def _testCpu(self, x):
+ self._compare(x, use_gpu=False)
+
+ def test1D(self):
+ self._compareCpu(np.arange(0., 2), [0])
+
+ def testNop(self):
+ self._compareCpu(np.arange(0, 6).reshape([3, 2]).astype(np.float32), [0, 1])
+
+ def testSimple(self):
+ self._compareCpu(np.arange(0, 8).reshape([2, 4]).astype(np.float32),
+ np.array([1, 0]).astype(np.int32))
+
+ def testFloat(self):
+ self._compare_cpu_gpu(np.arange(0, 21).reshape([3, 7]).astype(np.float32))
+ self._compare_cpu_gpu(
+ np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.float32))
+
+ def testDouble(self):
+ self._compare_cpu_gpu(np.arange(0, 21).reshape([3, 7]).astype(np.float64))
+ self._compare_cpu_gpu(
+ np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.float64))
+
+ def testSComplex(self):
+ self._testCpu(np.complex(1, 2) * np.arange(0, 21).reshape(
+ [3, 7]).astype(np.complex64))
+ self._testCpu(np.complex(1, 2) * np.arange(0, 210).reshape(
+ [2, 3, 5, 7]).astype(np.complex64))
+
+ def testInt8(self):
+ self._testCpu(np.arange(0, 21).reshape([3, 7]).astype(np.int8))
+ self._testCpu(np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.int8))
+
+ def testInt16(self):
+ self._testCpu(np.arange(0, 21).reshape([3, 7]).astype(np.int16))
+ self._testCpu(np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.int16))
+
+ def testInt32(self):
+ self._testCpu(np.arange(0, 21).reshape([3, 7]).astype(np.int32))
+ self._testCpu(np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.int32))
+
+ def testInt64(self):
+ self._testCpu(np.arange(0, 21).reshape([3, 7]).astype(np.int64))
+ self._testCpu(np.arange(0, 210).reshape([2, 3, 5, 7]).astype(np.int64))
+
+ def testTranspose2DAuto(self):
+ x_np = [[1, 2, 3], [4, 5, 6]]
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = tf.transpose(x_np).eval()
+ self.assertAllEqual(x_tf, [[1, 4], [2, 5], [3, 6]])
+
+ def testTransposeShapes(self):
+ self.assertEqual([], tf.transpose(
+ tf.placeholder(tf.int32, shape=[])).get_shape().dims)
+ self.assertEqual([100], tf.transpose(
+ tf.placeholder(tf.int32, shape=[100])).get_shape().dims)
+ self.assertEqual([37, 100], tf.transpose(
+ tf.placeholder(tf.int32, shape=[100, 37])).get_shape().dims)
+ self.assertEqual([100, 37], tf.transpose(
+ tf.placeholder(tf.int32, shape=[100, 37]), [0, 1]).get_shape().dims)
+ self.assertEqual([15, 37, 100], tf.transpose(
+ tf.placeholder(tf.int32, shape=[100, 37, 15])).get_shape().dims)
+ self.assertEqual([15, 100, 37], tf.transpose(
+ tf.placeholder(tf.int32,
+ shape=[100, 37, 15]), [2, 0, 1]).get_shape().dims)
+ self.assertEqual(tf.TensorShape(None), tf.transpose(
+ tf.placeholder(tf.int32)).get_shape())
+
+ def _testError(self, x, p, err):
+ with self.test_session():
+ with self.assertRaisesOpError(err):
+ tf.transpose(x, p).eval()
+
+ def testError(self):
+ with self.assertRaises(ValueError):
+ tf.transpose(np.arange(0., 30).reshape([2, 3, 5]), [[0, 1], [2, 3]])
+ self._testError(np.arange(0., 2 ** 10).reshape([2] * 10),
+ range(10),
+ "not implemented")
+ with self.assertRaises(IndexError):
+ tf.transpose(np.arange(0., 30).reshape([2, 3, 5]), [0, 1, 3])
+ self._testError(np.arange(0., 30).reshape([2, 3, 5]),
+ [0, 1, 1],
+ "2 is missing")
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/unique_op_test.py b/tensorflow/python/kernel_tests/unique_op_test.py
new file mode 100644
index 0000000000..4d6543a206
--- /dev/null
+++ b/tensorflow/python/kernel_tests/unique_op_test.py
@@ -0,0 +1,22 @@
+"""Tests for tensorflow.kernels.unique_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class UniqueTest(tf.test.TestCase):
+
+ def testInt32(self):
+ x = list(np.random.randint(2, high=10, size=7000))
+ with self.test_session() as sess:
+ y, idx = tf.unique(x)
+ tf_y, tf_idx = sess.run([y, idx])
+
+ self.assertEqual(len(x), len(tf_idx))
+ self.assertEqual(len(tf_y), len(np.unique(x)))
+ for i in range(len(x)):
+ self.assertEqual(x[i], tf_y[tf_idx[i]])
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/unpack_op_test.py b/tensorflow/python/kernel_tests/unpack_op_test.py
new file mode 100644
index 0000000000..4929af035f
--- /dev/null
+++ b/tensorflow/python/kernel_tests/unpack_op_test.py
@@ -0,0 +1,56 @@
+"""Functional tests for Unpack Op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker
+
+
+class UnpackOpTest(tf.test.TestCase):
+
+ def testSimple(self):
+ np.random.seed(7)
+ for use_gpu in False, True:
+ with self.test_session(use_gpu=use_gpu):
+ for shape in (2,), (3,), (2, 3), (3, 2), (4, 3, 2):
+ data = np.random.randn(*shape)
+ # Convert data to a single tensorflow tensor
+ x = tf.constant(data)
+ # Unpack into a list of tensors
+ cs = tf.unpack(x, num=shape[0])
+ self.assertEqual(type(cs), list)
+ self.assertEqual(len(cs), shape[0])
+ cs = [c.eval() for c in cs]
+ self.assertAllEqual(cs, data)
+
+ def testGradients(self):
+ for use_gpu in False, True:
+ for shape in (2,), (3,), (2, 3), (3, 2), (4, 3, 2):
+ data = np.random.randn(*shape)
+ shapes = [shape[1:]] * shape[0]
+ for i in xrange(shape[0]):
+ with self.test_session(use_gpu=use_gpu):
+ x = tf.constant(data)
+ cs = tf.unpack(x, num=shape[0])
+ err = gradient_checker.ComputeGradientError(x, shape, cs[i],
+ shapes[i])
+ self.assertLess(err, 1e-6)
+
+ def testInferNum(self):
+ with self.test_session():
+ for shape in (2,), (3,), (2, 3), (3, 2), (4, 3, 2):
+ x = tf.placeholder(np.float32, shape=shape)
+ cs = tf.unpack(x)
+ self.assertEqual(type(cs), list)
+ self.assertEqual(len(cs), shape[0])
+
+ def testCannotInferNum(self):
+ x = tf.placeholder(np.float32)
+ with self.assertRaisesRegexp(
+ ValueError, r'Cannot infer num from shape TensorShape\(None\)'):
+ tf.unpack(x)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/variable_ops_test.py b/tensorflow/python/kernel_tests/variable_ops_test.py
new file mode 100644
index 0000000000..aaa4237260
--- /dev/null
+++ b/tensorflow/python/kernel_tests/variable_ops_test.py
@@ -0,0 +1,225 @@
+"""Tests for tensorflow.ops.tf.variable_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.framework import errors
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.ops import gen_state_ops
+from tensorflow.python.ops import state_ops
+
+
+_NP_TO_TF = {
+ np.float32: tf.float32,
+ np.float64: tf.float64,
+ np.int32: tf.int32,
+ np.int64: tf.int64,
+}
+
+
+class VariableOpTest(tf.test.TestCase):
+
+ def _initFetch(self, x, tftype, use_gpu=None):
+ with self.test_session(use_gpu=use_gpu):
+ p = state_ops.variable_op(x.shape, tftype)
+ op = tf.assign(p, x)
+ op.op.run()
+ return p.eval()
+
+ def _testTypes(self, vals):
+ for dtype in [np.float32, np.float64, np.int32, np.int64]:
+ self.setUp()
+ x = vals.astype(dtype)
+ tftype = _NP_TO_TF[dtype]
+ self.assertAllEqual(x, self._initFetch(x, tftype, use_gpu=False))
+ # NOTE(mdevin): the GPU test should pass for all types, whether or not the
+ # Variable op has a GPU implementation for that type, because we expect
+ # Variable and Assign to have GPU implementations for the matching tf types.
+ self.assertAllEqual(x, self._initFetch(x, tftype, use_gpu=True))
+
+ def testBasic(self):
+ self._testTypes(np.arange(0, 20).reshape([4, 5]))
+
+ def testset_shape(self):
+ p = state_ops.variable_op([1, 2], tf.float32)
+ self.assertEqual([1, 2], p.get_shape())
+ p = state_ops.variable_op([1, 2], tf.float32, set_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), p.get_shape())
+
+ def testAssign(self):
+ value = np.array([[42.0, 43.0]])
+ var = state_ops.variable_op(value.shape, tf.float32)
+ self.assertShapeEqual(value, var)
+ assigned = tf.assign(var, value)
+ self.assertShapeEqual(value, assigned)
+
+ def testAssignNoValidateShape(self):
+ value = np.array([[42.0, 43.0]])
+ var = state_ops.variable_op(value.shape, tf.float32)
+ self.assertShapeEqual(value, var)
+ assigned = tf.assign(var, value, validate_shape=False)
+ self.assertShapeEqual(value, assigned)
+
+ def testAssignNoVarShape(self):
+ value = np.array([[42.0, 43.0]])
+ var = state_ops.variable_op(value.shape, tf.float32, set_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), var.get_shape())
+ assigned = tf.assign(var, value)
+ self.assertShapeEqual(value, assigned)
+
+ def testAssignNoVarShapeNoValidateShape(self):
+ value = np.array([[42.0, 43.0]])
+ var = state_ops.variable_op(value.shape, tf.float32, set_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), var.get_shape())
+ assigned = tf.assign(var, value, validate_shape=False)
+ self.assertShapeEqual(value, assigned)
+
+ def _NewShapelessTensor(self):
+ tensor = tf.placeholder(tf.float32)
+ self.assertEqual(tensor_shape.unknown_shape(), tensor.get_shape())
+ return tensor
+
+ def testAssignNoValueShape(self):
+ value = self._NewShapelessTensor()
+ shape = [1, 2]
+ var = state_ops.variable_op(shape, tf.float32)
+ assigned = tf.assign(var, value)
+ self.assertEqual(shape, var.get_shape())
+ self.assertEqual(shape, assigned.get_shape())
+
+ def testAssignNoValueShapeNoValidateShape(self):
+ value = self._NewShapelessTensor()
+ shape = [1, 2]
+ var = state_ops.variable_op(shape, tf.float32)
+ self.assertEqual(shape, var.get_shape())
+ assigned = tf.assign(var, value, validate_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), assigned.get_shape())
+
+ def testAssignNoShape(self):
+ with self.test_session():
+ value = self._NewShapelessTensor()
+ var = state_ops.variable_op([1, 2], tf.float32, set_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), var.get_shape())
+ self.assertEqual(tensor_shape.unknown_shape(),
+ tf.assign(var, value).get_shape())
+
+ def testAssignNoShapeNoValidateShape(self):
+ with self.test_session():
+ value = self._NewShapelessTensor()
+ var = state_ops.variable_op([1, 2], tf.float32, set_shape=False)
+ self.assertEqual(tensor_shape.unknown_shape(), var.get_shape())
+ self.assertEqual(tensor_shape.unknown_shape(),
+ tf.assign(var, value, validate_shape=False).get_shape())
+
+ def testAssignUpdate(self):
+ var = state_ops.variable_op([1, 2], tf.float32)
+ added = tf.assign_add(var, [[2.0, 3.0]])
+ self.assertEqual([1, 2], added.get_shape())
+ subbed = tf.assign_sub(var, [[12.0, 13.0]])
+ self.assertEqual([1, 2], subbed.get_shape())
+
+ def testAssignUpdateNoVarShape(self):
+ var = state_ops.variable_op([1, 2], tf.float32, set_shape=False)
+ added = tf.assign_add(var, [[2.0, 3.0]])
+ self.assertEqual([1, 2], added.get_shape())
+ subbed = tf.assign_sub(var, [[12.0, 13.0]])
+ self.assertEqual([1, 2], subbed.get_shape())
+
+ def testAssignUpdateNoValueShape(self):
+ var = state_ops.variable_op([1, 2], tf.float32)
+ added = tf.assign_add(var, self._NewShapelessTensor())
+ self.assertEqual([1, 2], added.get_shape())
+ subbed = tf.assign_sub(var, self._NewShapelessTensor())
+ self.assertEqual([1, 2], subbed.get_shape())
+
+ def testAssignUpdateNoShape(self):
+ var = state_ops.variable_op([1, 2], tf.float32, set_shape=False)
+ added = tf.assign_add(var, self._NewShapelessTensor())
+ self.assertEqual(tensor_shape.unknown_shape(), added.get_shape())
+ subbed = tf.assign_sub(var, self._NewShapelessTensor())
+ self.assertEqual(tensor_shape.unknown_shape(), subbed.get_shape())
+
+ def testTemporaryVariable(self):
+ with self.test_session(use_gpu=True):
+ var = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="foo")
+ var = tf.assign(var, [[4.0, 5.0]])
+ var = tf.assign_add(var, [[6.0, 7.0]])
+ final = gen_state_ops._destroy_temporary_variable(var, var_name="foo")
+ self.assertAllClose([[10.0, 12.0]], final.eval())
+
+ def testDestroyNonexistentTemporaryVariable(self):
+ with self.test_session(use_gpu=True):
+ var = gen_state_ops._temporary_variable([1, 2], tf.float32)
+ final = gen_state_ops._destroy_temporary_variable(var, var_name="bad")
+ with self.assertRaises(errors.NotFoundError):
+ final.eval()
+
+ def testDuplicateTemporaryVariable(self):
+ with self.test_session(use_gpu=True):
+ var1 = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="dup")
+ var1 = tf.assign(var1, [[1.0, 2.0]])
+ var2 = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="dup")
+ var2 = tf.assign(var2, [[3.0, 4.0]])
+ final = var1 + var2
+ with self.assertRaises(errors.AlreadyExistsError):
+ final.eval()
+
+ def testDestroyTemporaryVariableTwice(self):
+ with self.test_session(use_gpu=True):
+ var = gen_state_ops._temporary_variable([1, 2], tf.float32)
+ val1 = gen_state_ops._destroy_temporary_variable(var, var_name="dup")
+ val2 = gen_state_ops._destroy_temporary_variable(var, var_name="dup")
+ final = val1 + val2
+ with self.assertRaises(errors.NotFoundError):
+ final.eval()
+
+ def testTemporaryVariableNoLeak(self):
+ with self.test_session(use_gpu=True):
+ var = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="bar")
+ final = tf.identity(var)
+ final.eval()
+
+ def testTwoTemporaryVariablesNoLeaks(self):
+ with self.test_session(use_gpu=True):
+ var1 = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="var1")
+ var2 = gen_state_ops._temporary_variable(
+ [1, 2],
+ tf.float32,
+ var_name="var2")
+ final = var1 + var2
+ final.eval()
+
+ def testAssignDependencyAcrossDevices(self):
+ with self.test_session(use_gpu=True):
+ # The variable and an op to increment it are on the GPU.
+ var = state_ops.variable_op([1], tf.float32)
+ tf.assign(var, [1.0]).eval()
+ increment = tf.assign_add(var, [1.0])
+ with tf.control_dependencies([increment]):
+ with tf.device("/cpu:0"):
+ # This mul op is pinned to the CPU, but reads the variable from the
+ # GPU. The test ensures that the dependency on 'increment' is still
+ # honored, i.e., the Send and Recv from GPU to CPU should take place
+ # only after the increment.
+ result = tf.mul(var, var)
+ self.assertAllClose([4.0], result.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/variable_scope_test.py b/tensorflow/python/kernel_tests/variable_scope_test.py
new file mode 100644
index 0000000000..bb538198ea
--- /dev/null
+++ b/tensorflow/python/kernel_tests/variable_scope_test.py
@@ -0,0 +1,160 @@
+"""Tests for variable store."""
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from tensorflow.python.ops import variable_scope
+
+
+class VariableStoreTest(tf.test.TestCase):
+
+ def testGetVar(self):
+ vs = variable_scope._get_default_variable_store()
+ v = vs.get_variable("v", [1])
+ v1 = vs.get_variable("v", [1])
+ assert v == v1
+
+ def testNameExists(self):
+ vs = variable_scope._get_default_variable_store()
+ # No check by default, so we can both create and get existing names.
+ v = vs.get_variable("v", [1])
+ v1 = vs.get_variable("v", [1])
+ assert v == v1
+ # When reuse is False, we fail when variables are already there.
+ vs.get_variable("w", [1], reuse=False) # That's ok.
+ with self.assertRaises(ValueError):
+ vs.get_variable("v", [1], reuse=False) # That fails.
+ # When reuse is True, we fail when variables are new.
+ vs.get_variable("v", [1], reuse=True) # That's ok.
+ with self.assertRaises(ValueError):
+ vs.get_variable("u", [1], reuse=True) # That fails.
+
+ def testNamelessStore(self):
+ vs = variable_scope._get_default_variable_store()
+ vs.get_variable("v1", [2])
+ vs.get_variable("v2", [2])
+ expected_names = ["%s:0" % name for name in ["v1", "v2"]]
+ self.assertEqual(set(expected_names),
+ set([v.name for v in vs._vars.values()]))
+
+ def testVarScopeInitializer(self):
+ with self.test_session() as sess:
+ init = tf.constant_initializer(0.3)
+ with variable_scope.variable_scope("tower") as tower:
+ with variable_scope.variable_scope("foo", initializer=init):
+ v = variable_scope.get_variable("v", [])
+ sess.run(tf.initialize_variables([v]))
+ self.assertAllClose(v.eval(), 0.3)
+ with variable_scope.variable_scope(tower, initializer=init):
+ w = variable_scope.get_variable("w", [])
+ sess.run(tf.initialize_variables([w]))
+ self.assertAllClose(w.eval(), 0.3)
+
+ def testGetVariableScope(self):
+ # Test the get_variable_scope() function and setting properties of result.
+ with self.test_session() as sess:
+ init = tf.constant_initializer(0.3)
+ with variable_scope.variable_scope("foo"):
+ new_init1 = variable_scope.get_variable_scope().initializer
+ self.assertEqual(new_init1, None)
+ # Check that we can set initializer like this.
+ variable_scope.get_variable_scope().set_initializer(init)
+ v = variable_scope.get_variable("v", [])
+ sess.run(tf.initialize_variables([v]))
+ self.assertAllClose(v.eval(), 0.3)
+ # Check that we can set reuse.
+ variable_scope.get_variable_scope().reuse_variables()
+ with self.assertRaises(ValueError): # Fail, w does not exist yet.
+ variable_scope.get_variable("w", [1])
+ # Check that the set initializer goes away.
+ new_init = variable_scope.get_variable_scope().initializer
+ self.assertEqual(new_init, None)
+
+ def testVarScope(self):
+ with self.test_session():
+ with variable_scope.variable_scope("tower") as tower:
+ self.assertEqual(tower.name, "tower")
+ with tf.name_scope("scope") as sc:
+ self.assertEqual(sc, "tower/scope/")
+
+ with variable_scope.variable_scope("foo"):
+ with variable_scope.variable_scope("bar") as bar:
+ self.assertEqual(bar.name, "foo/bar")
+ with tf.name_scope("scope") as sc:
+ self.assertEqual(sc, "foo/bar/scope/")
+
+ with variable_scope.variable_scope("foo"):
+ with variable_scope.variable_scope(tower, reuse=True) as tower_shared:
+ self.assertEqual(tower_shared.name, "tower")
+ with tf.name_scope("scope") as sc:
+ self.assertEqual(sc, "foo_1/scope/")
+
+ def testVarScopeNameScope(self):
+ with self.test_session():
+ with tf.name_scope("scope1"):
+ with variable_scope.variable_scope("tower") as tower:
+ with tf.name_scope("scope2") as sc2:
+ self.assertEqual(sc2, "scope1/tower/scope2/")
+ with variable_scope.variable_scope("tower"): # Re-enter adds suffix.
+ with tf.name_scope("scope2") as sc2:
+ self.assertEqual(sc2, "scope1/tower_1/scope2/")
+
+ with tf.name_scope("scope3"):
+ with variable_scope.variable_scope("tower"):
+ with tf.name_scope("scope2") as sc2:
+ self.assertEqual(sc2, "scope3/tower/scope2/")
+ with variable_scope.variable_scope(tower):
+ with tf.name_scope("scope2") as sc2:
+ self.assertEqual(sc2, "scope3/scope2/")
+
+ def testVarScopeGetVar(self):
+ with self.test_session():
+ with variable_scope.variable_scope("root"):
+ with variable_scope.variable_scope("towerA") as tower_a:
+ va = variable_scope.get_variable("v", [1])
+ self.assertEqual(va.name, "root/towerA/v:0")
+
+ with variable_scope.variable_scope(tower_a, reuse=True):
+ va2 = variable_scope.get_variable("v", [1])
+ self.assertEqual(va2, va)
+
+ with variable_scope.variable_scope("towerB"):
+ vb = variable_scope.get_variable("v", [1])
+ self.assertEqual(vb.name, "root/towerB/v:0")
+
+ with self.assertRaises(ValueError) as exc:
+ with variable_scope.variable_scope("towerA"):
+ va2 = variable_scope.get_variable("v", [1])
+ self.assertEqual(exc.exception.message[:12], "Over-sharing")
+
+ with variable_scope.variable_scope("towerA", reuse=True):
+ va2 = variable_scope.get_variable("v", [1])
+ self.assertEqual(va2, va)
+
+ with variable_scope.variable_scope("foo"):
+ with variable_scope.variable_scope("bar"):
+ v = variable_scope.get_variable("v", [1])
+ self.assertEqual(v.name, "root/foo/bar/v:0")
+ with variable_scope.variable_scope(tower_a, reuse=True):
+ va3 = variable_scope.get_variable("v", [1])
+ self.assertEqual(va, va3)
+
+ with self.assertRaises(ValueError) as exc:
+ with variable_scope.variable_scope(tower_a, reuse=True):
+ with variable_scope.variable_scope("baz"):
+ variable_scope.get_variable("v", [1])
+ self.assertEqual(exc.exception.message[:13], "Under-sharing")
+
+ with self.assertRaises(ValueError) as exc:
+ with variable_scope.variable_scope(tower_a, reuse=True):
+ variable_scope.get_variable("v", [2]) # Different shape.
+ self.assertEqual("shape" in exc.exception.message, True)
+
+ with self.assertRaises(ValueError) as exc:
+ with variable_scope.variable_scope(tower_a, reuse=True):
+ variable_scope.get_variable("v", [1], dtype=tf.int32)
+ self.assertEqual("dtype" in exc.exception.message, True)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/variables_test.py b/tensorflow/python/kernel_tests/variables_test.py
new file mode 100644
index 0000000000..f2a7ea0af8
--- /dev/null
+++ b/tensorflow/python/kernel_tests/variables_test.py
@@ -0,0 +1,242 @@
+"""Tests for tf.py."""
+import operator
+
+import tensorflow.python.platform
+
+import numpy as np
+
+import tensorflow as tf
+from tensorflow.python.ops import random_ops
+
+
+class VariablesTestCase(tf.test.TestCase):
+
+ def testInitialization(self):
+ with self.test_session():
+ var0 = tf.Variable(0.0)
+ self.assertEqual("Variable:0", var0.name)
+ self.assertEqual([], var0.get_shape())
+ self.assertEqual([], var0.get_shape())
+
+ var1 = tf.Variable(1.1)
+ self.assertEqual("Variable_1:0", var1.name)
+ self.assertEqual([], var1.get_shape())
+ self.assertEqual([], var1.get_shape())
+
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ var0.eval()
+
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ var1.eval()
+
+ tf.initialize_all_variables().run()
+
+ self.assertAllClose(0.0, var0.eval())
+ self.assertAllClose(1.1, var1.eval())
+
+ def testInitializationOrder(self):
+ with self.test_session():
+ rnd = tf.Variable(random_ops.random_uniform([3, 6]), name="rnd")
+ self.assertEqual("rnd:0", rnd.name)
+ self.assertEqual([3, 6], rnd.get_shape())
+ self.assertEqual([3, 6], rnd.get_shape())
+
+ dep = tf.Variable(rnd.initialized_value(), name="dep")
+ self.assertEqual("dep:0", dep.name)
+ self.assertEqual([3, 6], dep.get_shape())
+ self.assertEqual([3, 6], dep.get_shape())
+
+ # Currently have to set the shape manually for Add.
+ added_val = rnd.initialized_value() + dep.initialized_value() + 2.0
+ added_val.set_shape(rnd.get_shape())
+
+ depdep = tf.Variable(added_val, name="depdep")
+ self.assertEqual("depdep:0", depdep.name)
+ self.assertEqual([3, 6], depdep.get_shape())
+ self.assertEqual([3, 6], depdep.get_shape())
+
+ tf.initialize_all_variables().run()
+
+ self.assertAllClose(rnd.eval(), dep.eval())
+ self.assertAllClose(rnd.eval() + dep.eval() + 2.0,
+ depdep.eval())
+
+ def testAssignments(self):
+ with self.test_session():
+ var = tf.Variable(0.0)
+ plus_one = var.assign_add(1.0)
+ minus_one = var.assign_sub(2.0)
+ four = var.assign(4.0)
+ tf.initialize_all_variables().run()
+ self.assertAllClose(0.0, var.eval())
+
+ self.assertAllClose(1.0, plus_one.eval())
+ self.assertAllClose(1.0, var.eval())
+
+ self.assertAllClose(-1.0, minus_one.eval())
+ self.assertAllClose(-1.0, var.eval())
+
+ self.assertAllClose(4.0, four.eval())
+ self.assertAllClose(4.0, var.eval())
+
+ def _countUpToTest(self, dtype):
+ with self.test_session():
+ zero = tf.constant(0, dtype=dtype)
+ var = tf.Variable(zero)
+ count_up_to = var.count_up_to(3)
+
+ tf.initialize_all_variables().run()
+ self.assertEqual(0, var.eval())
+
+ self.assertEqual(0, count_up_to.eval())
+ self.assertEqual(1, var.eval())
+
+ self.assertEqual(1, count_up_to.eval())
+ self.assertEqual(2, var.eval())
+
+ self.assertEqual(2, count_up_to.eval())
+ self.assertEqual(3, var.eval())
+
+ with self.assertRaisesOpError("Reached limit of 3"):
+ count_up_to.eval()
+ self.assertEqual(3, var.eval())
+
+ with self.assertRaisesOpError("Reached limit of 3"):
+ count_up_to.eval()
+ self.assertEqual(3, var.eval())
+
+ def testCountUpToInt32(self):
+ self._countUpToTest(tf.int32)
+
+ def testCountUpToInt64(self):
+ self._countUpToTest(tf.int64)
+
+ def testUseVariableAsTensor(self):
+ with self.test_session():
+ var_x = tf.Variable(2.0)
+ var_y = tf.Variable(3.0)
+ tf.initialize_all_variables().run()
+ self.assertAllClose(2.0, var_x.eval())
+ self.assertAllClose(3.0, var_y.eval())
+ self.assertAllClose(5.0, tf.add(var_x, var_y).eval())
+
+ def testCollections(self):
+ with self.test_session():
+ var_x = tf.Variable(2.0)
+ var_y = tf.Variable(2.0, trainable=False)
+ var_z = tf.Variable(2.0, trainable=True)
+ var_t = tf.Variable(
+ 2.0, trainable=True,
+ collections=[tf.GraphKeys.TRAINABLE_VARIABLES,
+ tf.GraphKeys.VARIABLES])
+ self.assertEqual([var_x, var_y, var_z, var_t], tf.all_variables())
+ self.assertEqual([var_x, var_z, var_t], tf.trainable_variables())
+
+ def testOperators(self):
+ with self.test_session():
+ var_f = tf.Variable([2.0])
+ add = var_f + 0.0
+ radd = 1.0 + var_f
+ sub = var_f - 1.0
+ rsub = 1.0 - var_f
+ mul = var_f * 10.0
+ rmul = 10.0 * var_f
+ div = var_f / 10.0
+ rdiv = 10.0 / var_f
+ lt = var_f < 3.0
+ rlt = 3.0 < var_f
+ le = var_f <= 2.0
+ rle = 2.0 <= var_f
+ gt = var_f > 3.0
+ rgt = 3.0 > var_f
+ ge = var_f >= 2.0
+ rge = 2.0 >= var_f
+ neg = -var_f
+ abs_v = abs(var_f)
+
+ var_i = tf.Variable([20])
+ mod = var_i % 7
+ rmod = 103 % var_i
+
+ var_b = tf.Variable([True, False])
+ and_v = operator.and_(var_b, [True, True])
+ or_v = operator.or_(var_b, [False, True])
+ xor_v = operator.xor(var_b, [False, False])
+ invert_v = ~var_b
+
+ rnd = np.random.rand(4, 4).astype("f")
+ var_t = tf.Variable(rnd)
+ slice_v = var_t[2, 0:0]
+
+ tf.initialize_all_variables().run()
+ self.assertAllClose([2.0], add.eval())
+ self.assertAllClose([3.0], radd.eval())
+ self.assertAllClose([1.0], sub.eval())
+ self.assertAllClose([-1.0], rsub.eval())
+ self.assertAllClose([20.0], mul.eval())
+ self.assertAllClose([20.0], rmul.eval())
+ self.assertAllClose([0.2], div.eval())
+ self.assertAllClose([5.0], rdiv.eval())
+ self.assertAllClose([-2.0], neg.eval())
+ self.assertAllClose([2.0], abs_v.eval())
+ self.assertAllClose([True], lt.eval())
+ self.assertAllClose([False], rlt.eval())
+ self.assertAllClose([True], le.eval())
+ self.assertAllClose([True], rle.eval())
+ self.assertAllClose([False], gt.eval())
+ self.assertAllClose([True], rgt.eval())
+ self.assertAllClose([True], ge.eval())
+ self.assertAllClose([True], rge.eval())
+
+ self.assertAllClose([6], mod.eval())
+ self.assertAllClose([3], rmod.eval())
+
+ self.assertAllClose([True, False], and_v.eval())
+ self.assertAllClose([True, True], or_v.eval())
+ self.assertAllClose([True, False], xor_v.eval())
+ self.assertAllClose([False, True], invert_v.eval())
+
+ self.assertAllClose(rnd[2, 0:0], slice_v.eval())
+
+ def testSession(self):
+ with self.test_session() as sess:
+ var = tf.Variable([1, 12])
+ tf.initialize_all_variables().run()
+ self.assertAllClose([1, 12], sess.run(var))
+
+
+class IsInitializedTest(tf.test.TestCase):
+
+ def testNoVars(self):
+ with tf.Graph().as_default():
+ self.assertEqual(None, tf.assert_variables_initialized())
+
+ def testVariables(self):
+ with tf.Graph().as_default(), self.test_session() as sess:
+ v = tf.Variable([1, 2])
+ w = tf.Variable([3, 4])
+ _ = v, w
+ inited = tf.assert_variables_initialized()
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ sess.run(inited)
+ tf.initialize_all_variables().run()
+ sess.run(inited)
+
+ def testVariableList(self):
+ with tf.Graph().as_default(), self.test_session() as sess:
+ v = tf.Variable([1, 2])
+ w = tf.Variable([3, 4])
+ inited = tf.assert_variables_initialized([v])
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ inited.op.run()
+ sess.run(w.initializer)
+ with self.assertRaisesOpError("Attempting to use uninitialized value"):
+ inited.op.run()
+ v.initializer.run()
+ inited.op.run()
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/where_op_test.py b/tensorflow/python/kernel_tests/where_op_test.py
new file mode 100644
index 0000000000..263f98f622
--- /dev/null
+++ b/tensorflow/python/kernel_tests/where_op_test.py
@@ -0,0 +1,43 @@
+"""Tests for tensorflow.ops.reverse_sequence_op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class WhereOpTest(tf.test.TestCase):
+
+ def _testWhere(self, x, truth, expected_err_re=None):
+ with self.test_session():
+ ans = tf.where(x)
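+ # tf.where returns one row of coordinates per True entry, in row-major order.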
+ self.assertEqual([None, x.ndim], ans.get_shape().as_list())
+ if expected_err_re is None:
+ tf_ans = ans.eval()
+ self.assertAllClose(tf_ans, truth, atol=1e-10)
+ else:
+ with self.assertRaisesOpError(expected_err_re):
+ ans.eval()
+
+ def testBasicMat(self):
+ x = np.asarray([[True, False], [True, False]])
+
+ # Ensure RowMajor mode
+ truth = np.asarray([[0, 0], [1, 0]], dtype=np.int64)
+
+ self._testWhere(x, truth)
+
+ def testBasic3Tensor(self):
+ x = np.asarray(
+ [[[True, False], [True, False]], [[False, True], [False, True]],
+ [[False, False], [False, True]]])
+
+ # Ensure RowMajor mode
+ truth = np.asarray(
+ [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1], [2, 1, 1]],
+ dtype=np.int64)
+
+ self._testWhere(x, truth)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/kernel_tests/xent_op_test.py b/tensorflow/python/kernel_tests/xent_op_test.py
new file mode 100644
index 0000000000..4e44472c0d
--- /dev/null
+++ b/tensorflow/python/kernel_tests/xent_op_test.py
@@ -0,0 +1,110 @@
+"""Tests for SoftmaxCrossEntropyWithLogits op."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+from tensorflow.python.kernel_tests import gradient_checker as gc
+
+
+class XentTest(tf.test.TestCase):
+
+ def _npXent(self, features, labels):
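+ # NumPy reference implementation: numerically-stable softmax over the
+ # class dimension, per-row cross-entropy loss, and backprop = probs - labels.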
+ batch_dim = 0
+ class_dim = 1
+ batch_size = features.shape[batch_dim]
+ e = np.exp(features -
+ np.reshape(np.amax(features, axis=class_dim), [batch_size, 1]))
+ probs = e / np.reshape(np.sum(e, axis=class_dim), [batch_size, 1])
+ bp = (probs - labels)
+ l = -np.sum(labels * np.log(probs + 1.0e-20), axis=1)
+ return l, bp
+
+ def _testXent(self, np_features, np_labels, use_gpu=False):
+ np_loss, np_backprop = self._npXent(np_features, np_labels)
+ with self.test_session(use_gpu=use_gpu) as sess:
+ loss = tf.nn.softmax_cross_entropy_with_logits(np_features, np_labels)
+ backprop = loss.op.outputs[1]
+ tf_loss, tf_backprop = sess.run([loss, backprop])
+ self.assertAllClose(np_loss, tf_loss)
+ self.assertAllClose(np_backprop, tf_backprop)
+
+ def _testAll(self, features, labels):
+ self._testXent(features, labels, use_gpu=False)
+ self._testXent(features, labels, use_gpu=True)
+
+ def testNpXent(self):
+ # We create 2 batches of logits for testing.
+ # batch 0 is the boring uniform distribution: 1, 1, 1, 1, with target 3.
+ # batch 1 has a bit of difference: 1, 2, 3, 4, with soft targets (1, 2).
+ features = [[1., 1., 1., 1.], [1., 2., 3., 4.]]
+ labels = [[0., 0., 0., 1.], [0., .5, .5, 0.]]
+
+ # For batch 0, we expect the uniform distribution: 0.25, 0.25, 0.25, 0.25
+ # With a hard target 3, the backprop is [0.25, 0.25, 0.25, -0.75]
+ # The loss for this batch is -log(0.25) = 1.386
+ #
+ # For batch 1, we have:
+ # exp(0) = 1
+ # exp(1) = 2.718
+ # exp(2) = 7.389
+ # exp(3) = 20.085
+ # SUM = 31.192
+ # So we have as probabilities:
+ # exp(0) / SUM = 0.032
+ # exp(1) / SUM = 0.087
+ # exp(2) / SUM = 0.237
+ # exp(3) / SUM = 0.644
+ # With a soft target (1, 2), the backprop is
+ # [0.032, 0.087 - 0.5 = -0.413, 0.237 - 0.5 = -0.263, 0.644]
+ # The loss for this batch is 0.5 * -log(0.087) + 0.5 * -log(0.237)
+ # = 1.9401, so the two batch losses are [1.3862, 1.9401].
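+ # (General form: probs = softmax(features); loss = -sum_i labels_i * log(probs_i);
+ #  backprop = probs - labels, as computed by _npXent above.)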
+ np_loss, np_backprop = self._npXent(np.array(features), np.array(labels))
+ self.assertAllClose(np.array([[0.25, 0.25, 0.25, -0.75],
+ [0.0321, -0.4129, -0.2632, 0.6439]]),
+ np_backprop,
+ rtol=1.e-3, atol=1.e-3)
+ self.assertAllClose(np.array([1.3862, 1.9401]), np_loss,
+ rtol=1.e-3, atol=1.e-3)
+
+ def testShapeMismatch(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.nn.softmax_cross_entropy_with_logits(
+ [[0., 1.], [2., 3.]], [[0., 1., 0.], [1., 0., 0.]])
+
+ def testNotMatrix(self):
+ with self.test_session():
+ with self.assertRaises(ValueError):
+ tf.nn.softmax_cross_entropy_with_logits([0., 1., 2., 3.],
+ [0., 1., 0., 1.])
+
+ def testFloat(self):
+ self._testAll(
+ np.array([[1., 1., 1., 1.], [1., 2., 3., 4.]]).astype(np.float32),
+ np.array([[0., 0., 0., 1.], [0., .5, .5, 0.]]).astype(np.float32))
+
+ def testDouble(self):
+ self._testXent(
+ np.array([[1., 1., 1., 1.], [1., 2., 3., 4.]]).astype(np.float64),
+ np.array([[0., 0., 0., 1.], [0., .5, .5, 0.]]).astype(np.float64),
+ use_gpu=False)
+
+ def testGradient(self):
+ with self.test_session():
+ l = tf.constant([0.0, 0.0, 1.0, 0.0,
+ 1.0, 0.0, 0.0, 0.0,
+ 0.0, 0.5, 0.0, 0.5], shape=[3, 4],
+ dtype=tf.float64, name="l")
+ f = tf.constant([0.1, 0.2, 0.3, 0.4,
+ 0.1, 0.4, 0.9, 1.6,
+ 0.1, 0.8, 2.7, 6.4], shape=[3, 4],
+ dtype=tf.float64, name="f")
+ x = tf.nn.softmax_cross_entropy_with_logits(f, l, name="xent")
+ err = gc.ComputeGradientError(f, [3, 4], x, [3])
+ print "cross entropy gradient err = ", err
+ self.assertLess(err, 5e-8)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/lib/__init__.py b/tensorflow/python/lib/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/lib/__init__.py
diff --git a/tensorflow/python/lib/core/__init__.py b/tensorflow/python/lib/core/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/lib/core/__init__.py
diff --git a/tensorflow/python/lib/core/pywrap_status_test.py b/tensorflow/python/lib/core/pywrap_status_test.py
new file mode 100644
index 0000000000..000a784b6c
--- /dev/null
+++ b/tensorflow/python/lib/core/pywrap_status_test.py
@@ -0,0 +1,35 @@
+"""Tests for SWIG wrapped brain::Status."""
+
+from tensorflow.core.lib.core import error_codes_pb2
+from tensorflow.python import pywrap_tensorflow
+from tensorflow.python.platform import googletest
+
+
+class StatusTest(googletest.TestCase):
+
+ def testDefaultOk(self):
+ status = pywrap_tensorflow.Status()
+ self.assertTrue(status.ok())
+
+ def testCodeAndMessage(self):
+ status = pywrap_tensorflow.Status(error_codes_pb2.INVALID_ARGUMENT, 'foo')
+ self.assertEqual(error_codes_pb2.INVALID_ARGUMENT, status.code())
+ self.assertEqual('foo', status.error_message())
+
+ def testToString(self):
+ status = pywrap_tensorflow.Status()
+ # .ToString was remapped in the .swig file, hence will not work
+ # self.assertIn('OK', status.ToString())
+ self.assertIn('OK', str(status))
+
+ def testException(self):
+ with self.assertRaises(pywrap_tensorflow.StatusNotOK) as context:
+ pywrap_tensorflow.NotOkay()
+ self.assertEqual(context.exception.code, error_codes_pb2.INVALID_ARGUMENT)
+ self.assertEqual(context.exception.error_message, 'Testing 1 2 3')
+ self.assertEqual(None, pywrap_tensorflow.Okay(),
+ 'Status wrapper should not return anything upon OK.')
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/lib/core/status.i b/tensorflow/python/lib/core/status.i
new file mode 100644
index 0000000000..fddbc31e24
--- /dev/null
+++ b/tensorflow/python/lib/core/status.i
@@ -0,0 +1,116 @@
+// SWIG wrapper for lib::tensorflow::Status
+
+%include "tensorflow/python/platform/base.i"
+%include "tensorflow/python/lib/core/strings.i"
+
+%apply int { tensorflow::error::Code }; // Treat the enum as an integer.
+
+%{
+#include "tensorflow/core/public/status.h"
+%}
+
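+// Out-typemap: a returned tensorflow::Status becomes None when OK; otherwise a
+// StatusNotOK Python exception is raised (see RaiseStatusNotOK below).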
+%typemap(out, fragment="StatusNotOK") tensorflow::Status {
+ if ($1.ok()) {
+ $result = SWIG_Py_Void();
+ } else {
+ RaiseStatusNotOK($1, $descriptor(tensorflow::Status*));
+ SWIG_fail;
+ }
+}
+
+%init %{
+// Setup the StatusNotOK exception class.
+PyObject *pywrap_status = PyImport_ImportModuleNoBlock(
+ "tensorflow.python.pywrap_tensorflow");
+if (pywrap_status) {
+ PyObject *exception = PyErr_NewException(
+ "tensorflow.python.pywrap_tensorflow.StatusNotOK",
+ NULL, NULL);
+ if (exception) {
+ PyModule_AddObject(pywrap_status, "StatusNotOK", exception); // Steals ref.
+ }
+ Py_DECREF(pywrap_status);
+}
+%}
+
+%fragment("StatusNotOK", "header") %{
+#include "tensorflow/core/public/status.h"
+
+namespace {
+// Initialized on the first call to RaiseStatusNotOK().
+static PyObject *StatusNotOKError = nullptr;
+
+inline void Py_DECREF_wrapper(PyObject *o) { Py_DECREF(o); }
+typedef std::unique_ptr<PyObject, decltype(&Py_DECREF_wrapper)> SafePyObjectPtr;
+SafePyObjectPtr make_safe(PyObject* o) {
+ return SafePyObjectPtr(o, Py_DECREF_wrapper);
+}
+
+void RaiseStatusNotOK(const tensorflow::Status& status, swig_type_info *type) {
+ const int code = status.code();
+ string fullmsg = status.ToString();
+
+ PyObject *exception = nullptr;
+
+ // We're holding the Python GIL, so we don't need to synchronize
+ // access to StatusNotOKError with a Mutex of our own.
+ if (!StatusNotOKError) {
+ PyObject *cls = nullptr;
+ auto pywrap = make_safe(PyImport_ImportModule(
+ "tensorflow.python.pywrap_tensorflow"));
+ if (pywrap) {
+ cls = PyObject_GetAttrString(pywrap.get(), "StatusNotOK");
+ }
+ if (!cls) {
+ cls = Py_None;
+ Py_INCREF(cls);
+ }
+ StatusNotOKError = cls;
+ }
+
+ if (StatusNotOKError != Py_None) {
+ auto fullmsg_ptr = make_safe(_SwigString_FromString(fullmsg));
+ auto exception_ptr = make_safe(PyObject_CallFunctionObjArgs(
+ StatusNotOKError, fullmsg_ptr.get(), NULL));
+ exception = exception_ptr.get();
+ if (exception) {
+ auto pycode = make_safe(PyInt_FromLong(static_cast<long>(code)));
+ auto pymsg = make_safe(_SwigString_FromString(status.error_message()));
+ auto pystatus = make_safe(SWIG_NewPointerObj(
+ SWIG_as_voidptr(new tensorflow::Status(status)), type, SWIG_POINTER_OWN));
+ PyObject_SetAttrString(exception, "code", pycode.get());
+ PyObject_SetAttrString(exception, "error_message", pymsg.get());
+ PyErr_SetObject(StatusNotOKError, exception);
+ }
+ }
+ if (!exception) {
+ fullmsg =
+ ("could not construct StatusNotOK (original error "
+         "was: " +
+ fullmsg + ")");
+ PyErr_SetString(PyExc_SystemError, fullmsg.c_str());
+ }
+}
+
+} // namespace
+%}
+
+%ignoreall
+
+%unignore tensorflow;
+%unignore tensorflow::lib;
+%unignore tensorflow::Status;
+%unignore tensorflow::Status::Status;
+%unignore tensorflow::Status::Status(tensorflow::error::Code, StringPiece);
+%unignore tensorflow::Status::~Status;
+%unignore tensorflow::Status::code;
+%unignore tensorflow::Status::ok;
+%unignore tensorflow::Status::error_message;
+%unignore tensorflow::Status::ToString;
+%ignore tensorflow::Status::operator=;
+
+%rename(__str__) tensorflow::Status::ToString;
+
+%include "tensorflow/core/public/status.h"
+
+%unignoreall
diff --git a/tensorflow/python/lib/core/status_helper.i b/tensorflow/python/lib/core/status_helper.i
new file mode 100644
index 0000000000..2e01e79ebd
--- /dev/null
+++ b/tensorflow/python/lib/core/status_helper.i
@@ -0,0 +1,16 @@
+// SWIG test helper for lib::tensorflow::Status
+
+%include "tensorflow/python/platform/base.i"
+%import(module="tensorflow.python.pywrap_tensorflow") "tensorflow/python/lib/core/status.i"
+
+%inline %{
+#include "tensorflow/core/public/status.h"
+
+tensorflow::Status NotOkay() {
+ return tensorflow::Status(tensorflow::error::INVALID_ARGUMENT, "Testing 1 2 3");
+}
+
+tensorflow::Status Okay() {
+ return tensorflow::Status();
+}
+%}
diff --git a/tensorflow/python/lib/core/strings.i b/tensorflow/python/lib/core/strings.i
new file mode 100644
index 0000000000..c88e426a54
--- /dev/null
+++ b/tensorflow/python/lib/core/strings.i
@@ -0,0 +1,94 @@
+// Wrapper functions to provide a scripting-language-friendly interface
+// to our string libraries.
+//
+// NOTE: as of 2005-01-13, this SWIG file is not used to generate a pywrap
+// library for manipulation of various string-related types or access
+// to the special string functions (Python has plenty). This SWIG file
+// should be %import'd so that other SWIG wrappers have proper access
+// to the types in //strings (such as the StringPiece object). We may
+// generate a pywrap at some point in the future.
+//
+// NOTE: (Dan Ardelean) as of 2005-11-15 added typemaps to convert Java String
+// arguments to C++ StringPiece& objects. This is required because a
+// StringPiece class does not make sense - the code SWIG generates for a
+// StringPiece class is useless, because it releases the buffer set in
+// StringPiece after creating the object. C++ StringPiece objects rely on
+// the buffer holding the data being allocated externally.
+
+// NOTE: for now, we'll just start with what is needed, and add stuff
+// as it comes up.
+
+%{
+#include "tensorflow/core/lib/core/stringpiece.h"
+%}
+
+%typemap(typecheck) tensorflow::StringPiece = char *;
+%typemap(typecheck) const tensorflow::StringPiece & = char *;
+
+// "tensorflow::StringPiece" arguments can be provided by a simple Python 'str' string
+// or a 'unicode' object. If 'unicode', it's translated using the default
+// encoding, i.e., sys.getdefaultencoding(). If passed None, a tensorflow::StringPiece
+// of zero length with a NULL pointer is provided.
+%typemap(in) tensorflow::StringPiece {
+ if ($input != Py_None) {
+ char * buf;
+ Py_ssize_t len;
+%#if PY_VERSION_HEX >= 0x03030000
+ /* Do unicode handling as PyBytes_AsStringAndSize doesn't in Python 3. */
+ if (PyUnicode_Check($input)) {
+ buf = PyUnicode_AsUTF8AndSize($input, &len);
+ if (buf == NULL)
+ SWIG_fail;
+ } else {
+%#elif PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION < 3
+%# error "Unsupported Python 3.x C API version (3.3 or later required)."
+%#endif
+ if (PyBytes_AsStringAndSize($input, &buf, &len) == -1) {
+ // Python has raised an error (likely TypeError or UnicodeEncodeError).
+ SWIG_fail;
+ }
+%#if PY_VERSION_HEX >= 0x03030000
+ }
+%#endif
+ $1.set(buf, len);
+ }
+}
+
+// "const tensorflow::StringPiece&" arguments can be provided the same as
+// "tensorflow::StringPiece", whose typemap is defined above.
+%typemap(in) const tensorflow::StringPiece & (tensorflow::StringPiece temp) {
+ if ($input != Py_None) {
+ char * buf;
+ Py_ssize_t len;
+%#if PY_VERSION_HEX >= 0x03030000
+ /* Do unicode handling as PyBytes_AsStringAndSize doesn't in Python 3. */
+ if (PyUnicode_Check($input)) {
+ buf = PyUnicode_AsUTF8AndSize($input, &len);
+ if (buf == NULL)
+ SWIG_fail;
+ } else {
+%#elif PY_MAJOR_VERSION == 3 && PY_MINOR_VERSION < 3
+%# error "Unsupported Python 3.x C API version (3.3 or later required)."
+%#endif
+ if (PyBytes_AsStringAndSize($input, &buf, &len) == -1) {
+ // Python has raised an error (likely TypeError or UnicodeEncodeError).
+ SWIG_fail;
+ }
+%#if PY_VERSION_HEX >= 0x03030000
+ }
+%#endif
+ temp.set(buf, len);
+ }
+ $1 = &temp;
+}
+
+// C++ functions returning tensorflow::StringPiece will simply return bytes in Python,
+// or None if the StringPiece contained a NULL pointer.
+%typemap(out) tensorflow::StringPiece {
+ if ($1.data()) {
+ $result = PyString_FromStringAndSize($1.data(), $1.size());
+ } else {
+ Py_INCREF(Py_None);
+ $result = Py_None;
+ }
+}
diff --git a/tensorflow/python/lib/io/__init__.py b/tensorflow/python/lib/io/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/lib/io/__init__.py
diff --git a/tensorflow/python/lib/io/py_record_reader.cc b/tensorflow/python/lib/io/py_record_reader.cc
new file mode 100644
index 0000000000..5cc5229a8b
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_reader.cc
@@ -0,0 +1,49 @@
+#include "tensorflow/python/lib/io/py_record_reader.h"
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/lib/io/record_reader.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+
+class RandomAccessFile;
+
+namespace io {
+
+PyRecordReader::PyRecordReader() {}
+
+PyRecordReader* PyRecordReader::New(const string& filename,
+ uint64 start_offset) {
+ RandomAccessFile* file;
+ Status s = Env::Default()->NewRandomAccessFile(filename, &file);
+ if (!s.ok()) {
+ return nullptr;
+ }
+ PyRecordReader* reader = new PyRecordReader;
+ reader->offset_ = start_offset;
+ reader->file_ = file;
+ reader->reader_ = new RecordReader(reader->file_);
+ return reader;
+}
+
+PyRecordReader::~PyRecordReader() {
+ delete reader_;
+ delete file_;
+}
+
+bool PyRecordReader::GetNext() {
+ if (reader_ == nullptr) return false;
+ Status s = reader_->ReadRecord(&offset_, &record_);
+ return s.ok();
+}
+
+void PyRecordReader::Close() {
+ delete reader_;
+ delete file_;
+ file_ = nullptr;
+ reader_ = nullptr;
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/python/lib/io/py_record_reader.h b/tensorflow/python/lib/io/py_record_reader.h
new file mode 100644
index 0000000000..5a775761df
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_reader.h
@@ -0,0 +1,50 @@
+#ifndef TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_READER_H_
+#define TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_READER_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class RandomAccessFile;
+
+namespace io {
+
+class RecordReader;
+
+// A wrapper around io::RecordReader that is more easily SWIG wrapped for
+// Python. An instance of this class is not safe for concurrent access
+// by multiple threads.
+class PyRecordReader {
+ public:
+ static PyRecordReader* New(const string& filename, uint64 start_offset);
+ ~PyRecordReader();
+
+ // Attempt to get the next record at "current_offset()". If
+  // successful, returns true, and the record contents can be retrieved
+ // with "this->record()". Otherwise, returns false.
+ bool GetNext();
+ // Return the current record contents. Only valid after the preceding call
+  // to GetNext() returned true.
+ string record() const { return record_; }
+ // Return the current offset in the file.
+ uint64 offset() const { return offset_; }
+
+ // Close the underlying file and release its resources.
+ void Close();
+
+ private:
+ PyRecordReader();
+
+ uint64 offset_;
+ RandomAccessFile* file_; // Owned
+ io::RecordReader* reader_; // Owned
+ string record_;
+ TF_DISALLOW_COPY_AND_ASSIGN(PyRecordReader);
+};
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_READER_H_
diff --git a/tensorflow/python/lib/io/py_record_reader.i b/tensorflow/python/lib/io/py_record_reader.i
new file mode 100644
index 0000000000..19f911bd52
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_reader.i
@@ -0,0 +1,39 @@
+%nothread tensorflow::io::PyRecordReader::GetNext;
+
+%include "tensorflow/python/platform/base.i"
+
+%feature("except") tensorflow::io::PyRecordReader::New {
+ // Let other threads run while we read
+ Py_BEGIN_ALLOW_THREADS
+ $action
+ Py_END_ALLOW_THREADS
+}
+
+%newobject tensorflow::io::PyRecordReader::New;
+
+%feature("except") tensorflow::io::PyRecordReader::GetNext {
+ // Let other threads run while we read
+ Py_BEGIN_ALLOW_THREADS
+ $action
+ Py_END_ALLOW_THREADS
+}
+
+%{
+#include "tensorflow/python/lib/io/py_record_reader.h"
+%}
+
+%ignoreall
+
+%unignore tensorflow;
+%unignore tensorflow::io;
+%unignore tensorflow::io::PyRecordReader;
+%unignore tensorflow::io::PyRecordReader::~PyRecordReader;
+%unignore tensorflow::io::PyRecordReader::GetNext;
+%unignore tensorflow::io::PyRecordReader::offset;
+%unignore tensorflow::io::PyRecordReader::record;
+%unignore tensorflow::io::PyRecordReader::Close;
+%unignore tensorflow::io::PyRecordReader::New;
+
+%include "tensorflow/python/lib/io/py_record_reader.h"
+
+%unignoreall
diff --git a/tensorflow/python/lib/io/py_record_writer.cc b/tensorflow/python/lib/io/py_record_writer.cc
new file mode 100644
index 0000000000..e557756cbc
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_writer.cc
@@ -0,0 +1,44 @@
+#include "tensorflow/python/lib/io/py_record_writer.h"
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/lib/io/record_writer.h"
+#include "tensorflow/core/public/env.h"
+
+namespace tensorflow {
+namespace io {
+
+PyRecordWriter::PyRecordWriter() {}
+
+PyRecordWriter* PyRecordWriter::New(const string& filename) {
+ WritableFile* file;
+ Status s = Env::Default()->NewWritableFile(filename, &file);
+ if (!s.ok()) {
+ return nullptr;
+ }
+ PyRecordWriter* writer = new PyRecordWriter;
+ writer->file_ = file;
+ writer->writer_ = new RecordWriter(writer->file_);
+ return writer;
+}
+
+PyRecordWriter::~PyRecordWriter() {
+ delete writer_;
+ delete file_;
+}
+
+bool PyRecordWriter::WriteRecord(::tensorflow::StringPiece record) {
+ if (writer_ == nullptr) return false;
+ Status s = writer_->WriteRecord(record);
+ return s.ok();
+}
+
+void PyRecordWriter::Close() {
+ delete writer_;
+ delete file_;
+ writer_ = nullptr;
+ file_ = nullptr;
+}
+
+} // namespace io
+} // namespace tensorflow
diff --git a/tensorflow/python/lib/io/py_record_writer.h b/tensorflow/python/lib/io/py_record_writer.h
new file mode 100644
index 0000000000..e3fd05bd9a
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_writer.h
@@ -0,0 +1,38 @@
+#ifndef THIRD_PARTY_TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_WRITER_H_
+#define THIRD_PARTY_TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_WRITER_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/core/platform/port.h"
+#include "tensorflow/core/public/status.h"
+
+namespace tensorflow {
+
+class WritableFile;
+
+namespace io {
+
+class RecordWriter;
+
+// A wrapper around io::RecordWriter that is more easily SWIG wrapped for
+// Python. An instance of this class is not safe for concurrent access
+// by multiple threads.
+class PyRecordWriter {
+ public:
+ static PyRecordWriter* New(const string& filename);
+ ~PyRecordWriter();
+
+ bool WriteRecord(::tensorflow::StringPiece record);
+ void Close();
+
+ private:
+ PyRecordWriter();
+
+ WritableFile* file_; // Owned
+ io::RecordWriter* writer_; // Owned
+ TF_DISALLOW_COPY_AND_ASSIGN(PyRecordWriter);
+};
+
+} // namespace io
+} // namespace tensorflow
+
+#endif // THIRD_PARTY_TENSORFLOW_PYTHON_LIB_IO_PY_RECORD_WRITER_H_
diff --git a/tensorflow/python/lib/io/py_record_writer.i b/tensorflow/python/lib/io/py_record_writer.i
new file mode 100644
index 0000000000..20fe52c495
--- /dev/null
+++ b/tensorflow/python/lib/io/py_record_writer.i
@@ -0,0 +1,38 @@
+%nothread tensorflow::io::PyRecordWriter::WriteRecord;
+
+%include "tensorflow/python/platform/base.i"
+%include "tensorflow/python/lib/core/strings.i"
+
+%feature("except") tensorflow::io::PyRecordWriter::New {
+ // Let other threads run while we write
+ Py_BEGIN_ALLOW_THREADS
+ $action
+ Py_END_ALLOW_THREADS
+}
+
+%newobject tensorflow::io::PyRecordWriter::New;
+
+%feature("except") tensorflow::io::PyRecordWriter::WriteRecord {
+ // Let other threads run while we write
+ Py_BEGIN_ALLOW_THREADS
+ $action
+ Py_END_ALLOW_THREADS
+}
+
+%{
+#include "tensorflow/python/lib/io/py_record_writer.h"
+%}
+
+%ignoreall
+
+%unignore tensorflow;
+%unignore tensorflow::io;
+%unignore tensorflow::io::PyRecordWriter;
+%unignore tensorflow::io::PyRecordWriter::~PyRecordWriter;
+%unignore tensorflow::io::PyRecordWriter::WriteRecord;
+%unignore tensorflow::io::PyRecordWriter::Close;
+%unignore tensorflow::io::PyRecordWriter::New;
+
+%include "tensorflow/python/lib/io/py_record_writer.h"
+
+%unignoreall
diff --git a/tensorflow/python/lib/io/python_io.py b/tensorflow/python/lib/io/python_io.py
new file mode 100644
index 0000000000..aedcd2ef03
--- /dev/null
+++ b/tensorflow/python/lib/io/python_io.py
@@ -0,0 +1,29 @@
+"""## Data IO (Python Functions)
+
+A TFRecords file represents a sequence of (binary) strings. The format is not
+random access, so it is suitable for streaming large amounts of data but not
+suitable if fast sharding or other non-sequential access is desired.
+
+@@TFRecordWriter
+@@tf_record_iterator
+
+- - -
+
+### TFRecords Format Details
+
+A TFRecords file contains a sequence of strings with CRC hashes. Each record
+has the format
+
+ uint64 length
+ uint32 masked_crc32_of_length
+ byte data[length]
+ uint32 masked_crc32_of_data
+
+and the records are concatenated together to produce the file. The CRC32s
+are [described here](https://en.wikipedia.org/wiki/Cyclic_redundancy_check),
+and the mask of a CRC is
+
+ masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
+"""
+
+from tensorflow.python.lib.io.tf_record import *
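A minimal sketch of the record framing documented above, assuming a CRC32C implementation is available (the `crc32c` callable below is a stand-in; none ships with this file) and assuming little-endian encoding of the length and CRC fields, which the docstring does not spell out:

```python
import struct


def masked_crc(data, crc32c):
    # Mask a CRC32C value as documented above: rotate right by 15 bits,
    # then add the constant 0xa282ead8 (all arithmetic modulo 2**32).
    crc = crc32c(data) & 0xFFFFFFFF
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(data, crc32c):
    # Frame one record: uint64 length, masked CRC of the length bytes,
    # the payload, and finally the masked CRC of the payload.
    length = struct.pack("<Q", len(data))
    return (length +
            struct.pack("<I", masked_crc(length, crc32c)) +
            data +
            struct.pack("<I", masked_crc(data, crc32c)))
```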
diff --git a/tensorflow/python/lib/io/tf_record.py b/tensorflow/python/lib/io/tf_record.py
new file mode 100644
index 0000000000..00825bbda2
--- /dev/null
+++ b/tensorflow/python/lib/io/tf_record.py
@@ -0,0 +1,68 @@
+"""For reading and writing TFRecords files."""
+
+from tensorflow.python import pywrap_tensorflow
+
+
+def tf_record_iterator(path):
+  """An iterator that reads the records from a TFRecords file.
+
+ Args:
+ path: The path to the TFRecords file.
+
+ Yields:
+ Strings.
+
+ Raises:
+ IOError: If `path` cannot be opened for reading.
+ """
+ reader = pywrap_tensorflow.PyRecordReader_New(path, 0)
+ if reader is None:
+ raise IOError("Could not open %s." % path)
+ while reader.GetNext():
+ yield reader.record()
+ reader.Close()
+
+
+class TFRecordWriter(object):
+ """A class to write records to a TFRecords file.
+
+ This class implements `__enter__` and `__exit__`, and can be used
+ in `with` blocks like a normal file.
+
+ @@__init__
+ @@write
+ @@close
+ """
+ # TODO(josh11b): Support appending?
+ def __init__(self, path):
+ """Opens file `path` and creates a `TFRecordWriter` writing to it.
+
+ Args:
+ path: The path to the TFRecords file.
+
+ Raises:
+ IOError: If `path` cannot be opened for writing.
+ """
+ self._writer = pywrap_tensorflow.PyRecordWriter_New(path)
+ if self._writer is None:
+ raise IOError("Could not write to %s." % path)
+
+ def __enter__(self):
+ """Enter a `with` block."""
+    # Return self so that `with TFRecordWriter(path) as writer:` binds the writer.
+    return self
+
+ def __exit__(self, unused_type, unused_value, unused_traceback):
+ """Exit a `with` block, closing the file."""
+ self.close()
+
+ def write(self, record):
+ """Write a string record to the file.
+
+ Args:
+ record: str
+ """
+ self._writer.WriteRecord(record)
+
+ def close(self):
+ """Close the file."""
+ self._writer.Close()
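A short usage sketch of the two entry points defined above; the path and record contents are illustrative:

```python
from tensorflow.python.lib.io import tf_record

path = "/tmp/example.tfrecords"  # illustrative path

# Write a few records, then read them back in order.
writer = tf_record.TFRecordWriter(path)
for i in range(3):
    writer.write("record %d" % i)
writer.close()

for record in tf_record.tf_record_iterator(path):
    print record
```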
diff --git a/tensorflow/python/ops/__init__.py b/tensorflow/python/ops/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/ops/__init__.py
diff --git a/tensorflow/python/ops/array_grad.py b/tensorflow/python/ops/array_grad.py
new file mode 100644
index 0000000000..2a463940d6
--- /dev/null
+++ b/tensorflow/python/ops/array_grad.py
@@ -0,0 +1,187 @@
+"""Gradients for operators defined in array_ops.py."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import gen_array_ops
+
+
+@ops.RegisterGradient("Pack")
+def _PackGrad(op, grad):
+ """Gradient for pack op."""
+ return array_ops.unpack(grad, num=op.get_attr('N'))
+
+
+@ops.RegisterGradient("Unpack")
+def _UnpackGrad(_, *grads):
+ """Gradient for unpack op."""
+ return array_ops.pack(grads)
+
+
+@ops.RegisterGradient("Concat")
+def _ConcatGrad(op, grad):
+ """Gradient for concat op."""
+ assert isinstance(grad, ops.Tensor)
+ # Degenerate concatenation, just return grad.
+ if len(op.inputs) == 2:
+ return [None, grad]
+ # Get the inputs' tensor shapes
+ sizes = [array_ops.shape(x) for x in op.inputs[1:]]
+ concat_dim = op.inputs[0]
+ # Since shape is 1-D, shape_of_shape = [rank-of-inputs]
+ shape_of_shape = array_ops.shape(sizes[0])
+ # Make a vector of length equal to the input's dimensions,
+ # with 0's everywhere and 1 in the concat dim position.
+ # Note: Can't use sparse_to_dense since it isn't GPU-capable (for now)
+ mask = array_ops.concat(0,
+ [array_ops.fill(
+ array_ops.expand_dims(concat_dim, 0), 0), [1],
+ array_ops.fill(shape_of_shape - concat_dim - 1, 0)])
+ out_grads = []
+ begin = array_ops.fill(shape_of_shape, 0)
+ for i in range(len(sizes)):
+ out_grads.append(array_ops.slice(grad, begin, sizes[i]))
+ # Lint complains begin = begin + ...
+ begin = math_ops.add(begin, sizes[i] * mask)
+ return [None] + out_grads
+
+
+@ops.RegisterGradient("Slice")
+def _SliceGrad(op, grad):
+ """Gradient for Slice op."""
+ # Create an Nx2 padding where the first column represents how many
+ # zeros are to be prepended for each dimension, and the second
+ # column indicates how many zeros are appended.
+ #
+ # The number of zeros to append is the shape of the input
+ # elementwise-subtracted by both the begin vector and sizes vector.
+ #
+ # Some more reshaping is needed to assemble this tensor with the
+ # right dimensions.
+ input_vec = op.inputs[0]
+ begin_vec = op.inputs[1]
+ input_rank = array_ops.rank(input_vec)
+ slice_size = array_ops.shape(op.outputs[0])
+
+ shape = array_ops.pack([input_rank, 1])
+ before_pad = array_ops.reshape(begin_vec, shape)
+ after_pad = array_ops.reshape(
+ array_ops.shape(input_vec) - slice_size - begin_vec, shape)
+ paddings = array_ops.concat(1, [before_pad, after_pad])
+ return array_ops.pad(grad, paddings), None, None
+
+
+@ops.RegisterGradient("Split")
+def _SplitGrad(op, *grads):
+ return None, array_ops.concat(op.inputs[0], list(grads))
+
+
+ops.NoGradient("Const")
+
+# TODO(liqzhang): The gradient for Diag operator would be
+# the diagonal of the backprop. Implement if there is a need.
+ops.NoGradient("Diag")
+
+# Edit Distance has no gradient (but can be used to eval seq2seq or CTC).
+ops.NoGradient("EditDistance")
+
+ops.NoGradient("Fill")
+
+
+@ops.RegisterGradient("Gather")
+def _GatherGrad(op, grad):
+ return [
+ ops.IndexedSlices(grad, op.inputs[1], array_ops.shape(op.inputs[0])), None
+ ]
+
+
+@ops.RegisterGradient("Identity")
+def _IdGrad(_, grad):
+ return grad
+
+
+@ops.RegisterGradient("RefIdentity")
+def _RefIdGrad(_, grad):
+ return grad
+
+
+ops.NoGradient("StopGradient")
+
+
+@ops.RegisterGradient("Reshape")
+def _ReshapeGrad(op, grad):
+ return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
+
+
+ops.NoGradient("InvertPermutation")
+
+
+def _ReshapeToInput(op, grad):
+ """Reshapes the gradient to the shape of the original input."""
+ return array_ops.reshape(grad, array_ops.shape(op.inputs[0]))
+
+
+@ops.RegisterGradient("ExpandDims")
+def _ExpandDimsGrad(op, grad):
+ return [_ReshapeToInput(op, grad), None]
+
+
+@ops.RegisterGradient("Squeeze")
+def _SqueezeGrad(op, grad):
+ return _ReshapeToInput(op, grad)
+
+
+@ops.RegisterGradient("Transpose")
+def _TransposeGrad(op, grad):
+ """Returns unshuffle(grad)."""
+ p = op.inputs[1]
+ return [array_ops.transpose(grad, array_ops.invert_permutation(p)), None]
+
+
+ops.NoGradient("Shape")
+
+
+ops.NoGradient("Rank")
+
+
+ops.NoGradient("Size")
+
+
+@ops.RegisterGradient("Tile")
+def _TileGrad(op, grad):
+ """Sum reduces grad along the tiled dimensions."""
+ assert isinstance(grad, ops.Tensor)
+ return [gen_array_ops._tile_grad(grad, op.inputs[1]), None]
+
+
+ops.NoGradient("TileGrad")
+
+
+ops.NoGradient("BroadcastGradientArgs")
+
+
+@ops.RegisterGradient("Pad")
+def _PadGrad(op, grad):
+ """Gradient for Pad."""
+ # Pad introduces values around the original tensor, so the gradient function
+  # slices the original shape out of the gradient.
+ x = op.inputs[0]
+ a = op.inputs[1] # [Rank(x), 2]
+ # Takes a slice of a. The 1st column. [Rank(x), 1].
+ pad_before = array_ops.slice(a, [0, 0],
+ array_ops.pack([array_ops.rank(x), 1]))
+ # Make it a 1-D tensor.
+ begin = array_ops.reshape(pad_before, [-1])
+ sizes = array_ops.shape(x)
+ return array_ops.slice(grad, begin, sizes), None
+
+
+# ReverseSequence is just a permutation. The gradient permutes back.
+@ops.RegisterGradient("ReverseSequence")
+def _ReverseSequenceGrad(op, grad):
+ seq_lengths = op.inputs[1]
+ return [array_ops.reverse_sequence(grad,
+ seq_dim=op.get_attr("seq_dim"),
+ seq_lengths=seq_lengths),
+ None]
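To make the bookkeeping in `_ConcatGrad` above concrete, here is a NumPy sketch of the same idea: the incoming gradient is cut back into one slice per concatenated input (shapes are illustrative):

```python
import numpy as np

# Two inputs concatenated along axis 1: shapes (2, 3) and (2, 2).
x1 = np.ones((2, 3))
x2 = np.ones((2, 2))
concat_dim = 1
grad = np.arange(10.0).reshape(2, 5)  # gradient w.r.t. the concat output

# Walk along concat_dim and slice out each input's share of the gradient.
out_grads = []
begin = 0
for x in (x1, x2):
    size = x.shape[concat_dim]
    out_grads.append(grad[:, begin:begin + size])
    begin += size

assert out_grads[0].shape == x1.shape
assert out_grads[1].shape == x2.shape
```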
diff --git a/tensorflow/python/ops/array_ops.py b/tensorflow/python/ops/array_ops.py
new file mode 100644
index 0000000000..ed780db625
--- /dev/null
+++ b/tensorflow/python/ops/array_ops.py
@@ -0,0 +1,1207 @@
+"""## Casting
+
+TensorFlow provides several operations that you can use to cast tensor data
+types in your graph.
+
+@@string_to_number
+@@to_double
+@@to_float
+@@to_bfloat16
+@@to_int32
+@@to_int64
+@@cast
+
+## Shapes and Shaping
+
+TensorFlow provides several operations that you can use to determine the shape
+of a tensor and change the shape of a tensor.
+
+@@shape
+@@size
+@@rank
+@@reshape
+@@squeeze
+@@expand_dims
+
+## Slicing and Joining
+
+TensorFlow provides several operations to slice or extract parts of a tensor,
+or join multiple tensors together.
+
+@@slice
+@@split
+@@tile
+@@pad
+@@concat
+@@pack
+@@unpack
+@@reverse_sequence
+@@reverse
+@@transpose
+@@gather
+@@dynamic_partition
+@@dynamic_stitch
+"""
+import sys
+import tensorflow.python.platform
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_array_ops
+from tensorflow.python.ops import gen_math_ops
+# pylint: disable=wildcard-import
+# 'Constant' gets imported in the module 'array_ops'.
+from tensorflow.python.ops.constant_op import constant
+from tensorflow.python.ops.gen_array_ops import *
+
+
+# We override the 'slice' for the "slice" op, so we keep python's
+# existing 'slice' for later use in this module.
+_baseslice = slice
+
+
+# Aliases for some automatically-generated names.
+listdiff = gen_array_ops.list_diff
+
+
+# pylint: disable=undefined-variable,protected-access
+def _SliceHelper(tensor, slice_spec):
+ """Overload for Tensor.__getitem__.
+
+ Currently the size of the slice must be statically known in each dimension,
+ i.e. the "stop" of the slice must not be omitted.
+
+ TODO(mrry): Support slices where the sizes are not specified.
+ TODO(mrry): Support negative indices in slices with numpy/Python semantics.
+
+ Args:
+ tensor: An ops.Tensor object.
+ slice_spec: The arguments to Tensor.__getitem__.
+
+ Returns:
+ The appropriate slice of "tensor", based on "slice_spec".
+
+ Raises:
+ ValueError: If a slice range is negative size.
+ TypeError: If the slice indices aren't int, slice, or Ellipsis.
+ """
+ if not isinstance(slice_spec, (list, tuple)):
+ slice_spec = [slice_spec]
+ indices = []
+ sizes = []
+ squeeze_dims = []
+ for dim, s in enumerate(slice_spec):
+ if isinstance(s, int):
+ if s < 0:
+ raise NotImplementedError("Negative indices are currently unsupported")
+ indices.append(s)
+ sizes.append(1)
+ squeeze_dims.append(dim)
+ elif isinstance(s, _baseslice):
+ if s.step not in (None, 1):
+ raise NotImplementedError(
+ "Steps other than 1 are not currently supported")
+ start = s.start if s.start is not None else 0
+ if start < 0:
+ raise NotImplementedError(
+ "Negative start indices are not currently supported")
+ indices.append(start)
+ if s.stop is not None and s.stop < 0:
+ raise NotImplementedError(
+ "Negative stop indices are not currently supported")
+ # NOTE(mrry): If the stop is not specified, Python substitutes
+ # sys.maxsize, which is typically (2 ** 63) - 1. Since Slice currently
+ # supports signed DT_INT32 arguments, we use -1 to specify that all
+ # elements should be captured.
+ if s.stop is None or s.stop == sys.maxsize:
+ sizes.append(-1)
+ else:
+ if start > s.stop:
+ raise ValueError("Stop must be at least start")
+ sizes.append(s.stop - start)
+ elif s is Ellipsis:
+ raise NotImplementedError("Ellipsis is not currently supported")
+ else:
+ raise TypeError("Bad slice index %s of type %s" % (s, type(s)))
+ sliced = slice(tensor, indices, sizes)
+ if squeeze_dims:
+ return squeeze(sliced, squeeze_dims=squeeze_dims)
+ else:
+ return sliced
+
+
+def slice(input_, begin, size, name=None):
+ """Extracts a slice from a tensor.
+
+ This operation extracts a slice of size `size` from a tensor `input` starting
+ at the location specified by `begin`. The slice `size` is represented as a
+ tensor shape, where `size[i]` is the number of elements of the 'i'th dimension
+ of `input` that you want to slice. The starting location (`begin`) for the
+ slice is represented as an offset in each dimension of `input`. In other
+ words, `begin[i]` is the offset into the 'i'th dimension of `input` that you
+ want to slice from.
+
+ `begin` is zero-based; `size` is one-based. If `size[i]` is -1,
+ all remaining elements in dimension i are included in the
+ slice. In other words, this is equivalent to setting:
+
+ `size[i] = input.dim_size(i) - begin[i]`
+
+ This operation requires that:
+
+ `0 <= begin[i] <= begin[i] + size[i] <= Di for i in [0, n]`
+
+ For example:
+
+ ```
+ # 'input' is [[[1, 1, 1], [2, 2, 2]],
+ # [[3, 3, 3], [4, 4, 4]],
+ # [[5, 5, 5], [6, 6, 6]]]
+ tf.slice(input, [1, 0, 0], [1, 1, 3]) ==> [[[3, 3, 3]]]
+ tf.slice(input, [1, 0, 0], [1, 2, 3]) ==> [[[3, 3, 3],
+ [4, 4, 4]]]
+ tf.slice(input, [1, 0, 0], [2, 1, 3]) ==> [[[3, 3, 3]],
+ [[5, 5, 5]]]
+ ```
+
+ Args:
+ input_: A `Tensor`.
+ begin: An `int32` or `int64` `Tensor`.
+ size: An `int32` or `int64` `Tensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` the same type as `input`.
+ """
+ return gen_array_ops._slice(input_, begin, size, name=name)
+
+
+ops.Tensor._override_operator("__getitem__", _SliceHelper)
+
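As an illustration of the `__getitem__` overload registered above, a slice expression decomposes into `slice` plus a `squeeze` of the integer-indexed dimensions; a minimal sketch with illustrative values:

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])

# t[1, 0:2] is handled by _SliceHelper as:
#   indices = [1, 0], sizes = [1, 2], squeeze_dims = [0]
explicit = tf.squeeze(tf.slice(t, [1, 0], [1, 2]), squeeze_dims=[0])
sugar = t[1, 0:2]

with tf.Session() as sess:
    print sess.run(explicit)  # ==> [4 5]
    print sess.run(sugar)     # ==> [4 5]
```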
+
+def pack(values, name="pack"):
+ """Packs a list of rank-`R` tensors into one rank-`(R+1)` tensor.
+
+ Packs tensors in `values` into a tensor with rank one higher than each tensor
+ in `values` and shape `[len(values)] + values[0].shape`. The output satisfies
+ `output[i, ...] = values[i][...]`.
+
+ This is the opposite of unpack. The numpy equivalent is
+
+ tf.pack([x, y, z]) = np.asarray([x, y, z])
+
+ Args:
+ values: A list of `Tensor` objects with the same shape and type.
+ name: A name for this operation (optional).
+
+ Returns:
+ output: A packed `Tensor` with the same type as `values`.
+ """
+ return gen_array_ops._pack(values, name=name)
+
+
+def unpack(value, num=None, name="unpack"):
+ """Unpacks the outer dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
+
+ Unpacks `num` tensors from `value` along the first dimension.
+ If `num` is not specified (the default), it is inferred from `value`'s shape.
+ If `value.shape[0]` is not known, `ValueError` is raised.
+
+ The ith tensor in `output` is the slice `value[i, ...]`. Each tensor in
+ `output` has shape `value.shape[1:]`.
+
+ This is the opposite of pack. The numpy equivalent is
+
+ tf.unpack(x, n) = list(x)
+
+ Args:
+ value: A rank `R > 0` `Tensor` to be unpacked.
+ num: An `int`. The first dimension of value. Automatically inferred if
+ `None` (the default).
+ name: A name for the operation (optional).
+
+ Returns:
+ The list of `Tensor` objects unpacked from `value`.
+
+ Raises:
+ ValueError: If `num` is unspecified and cannot be inferred.
+ """
+ if num is None:
+ value = ops.convert_to_tensor(value)
+ shape = value.get_shape()
+ num = shape[0].value
+ if num is None:
+ raise ValueError("Cannot infer num from shape %s" % shape)
+ return gen_array_ops._unpack(value, num=num, name=name)
+
+
+def concat(concat_dim, values, name="concat"):
+ """Concatenates tensors along one dimension.
+
+ Concatenates the list of tensors `values` along dimension `concat_dim`. If
+ `values[i].shape = [D0, D1, ... Dconcat_dim(i), ...Dn]`, the concatenated
+ result has shape
+
+ [D0, D1, ... Rconcat_dim, ...Dn]
+
+ where
+
+ Rconcat_dim = sum(Dconcat_dim(i))
+
+ That is, the data from the input tensors is joined along the `concat_dim`
+ dimension.
+
+ The number of dimensions of the input tensors must match, and all dimensions
+ except `concat_dim` must be equal.
+
+ For example:
+
+ ```python
+ t1 = [[1, 2, 3], [4, 5, 6]]
+ t2 = [[7, 8, 9], [10, 11, 12]]
+ tf.concat(0, [t1, t2]) ==> [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
+ tf.concat(1, [t1, t2]) ==> [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]
+
+ # tensor t3 with shape [2, 3]
+ # tensor t4 with shape [2, 3]
+ tf.shape(tf.concat(0, [t3, t4])) ==> [4, 3]
+ tf.shape(tf.concat(1, [t3, t4])) ==> [2, 6]
+ ```
+
+ Args:
+ concat_dim: 0-D `int32` `Tensor`. Dimension along which to concatenate.
+ values: A list of `Tensor` objects or a single `Tensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` resulting from concatenation of the input tensors.
+ """
+ if not isinstance(values, (list)):
+ values = [values]
+ # TODO(mrry): Change to return values?
+ if len(values) == 1: # Degenerate case of one tensor.
+ return identity(values[0], name=name)
+ return gen_array_ops._concat(concat_dim=concat_dim,
+ values=values,
+ name=name)
+
+
+@ops.RegisterShape("Pack")
+def _PackShape(op):
+ input_shape = op.inputs[0].get_shape()
+ for inp in op.inputs[1:]:
+ input_shape = input_shape.merge_with(inp.get_shape())
+ return [tensor_shape.TensorShape([len(op.inputs)]).concatenate(input_shape)]
+
+
+@ops.RegisterShape("Unpack")
+def _UnpackShape(op):
+ input_shape = op.inputs[0].get_shape()
+ return [input_shape[1:]] * op.get_attr("num")
+
+
+@ops.RegisterShape("Concat")
+def _ConcatShape(op):
+ concat_dim = tensor_util.ConstantValue(op.inputs[0])
+ if concat_dim is None:
+ # Return an unknown shape with the same rank as the inputs, or an
+ # unknown rank if no input's rank is known.
+ rank = None
+ for value in op.inputs[1:]:
+ if rank is not None:
+ value.get_shape().assert_has_rank(rank)
+ else:
+ rank = value.get_shape().ndims
+ return [tensor_shape.unknown_shape(ndims=max(rank, 1))]
+
+ else:
+ # Merge all the non-concat dims, and sum the concat dim to make an
+ # output shape.
+ concat_dim = int(concat_dim)
+ output_shape = op.inputs[1].get_shape()
+ # TODO(irving): Remove once !kAllowLegacyScalars.
+ if output_shape.ndims == 0:
+ output_shape = tensor_shape.TensorShape([1])
+ for value in op.inputs[2:]:
+ value_shape = value.get_shape()
+ if value_shape.ndims is not None and concat_dim >= value_shape.ndims:
+ if value_shape.ndims == 0 and concat_dim == 0:
+ # Let concat handle scalars
+ # TODO(irving): Remove once !kAllowLegacyScalars.
+ value_shape = tensor_shape.TensorShape([1])
+ else:
+ raise ValueError("concat_dim is out of range (values rank = %d)" %
+ value_shape.ndims)
+ before = output_shape[:concat_dim].merge_with(value_shape[:concat_dim])
+ at = output_shape[concat_dim] + value_shape[concat_dim]
+ after = output_shape[
+ concat_dim + 1:].merge_with(value_shape[concat_dim + 1:])
+ output_shape = before.concatenate(at).concatenate(after)
+ return [output_shape]
+
+
+def sparse_mask(a, mask_indices, name=None):
+ """Masks elements of `IndexedSlices`.
+
+ Given an `IndexedSlices` instance `a`, returns another `IndexedSlices` that
+ contains a subset of the slices of `a`. Only the slices at indices specified
+ in `mask_indices` are returned.
+
+ This is useful when you need to extract a subset of slices in an
+ `IndexedSlices` object.
+
+ For example:
+
+ ```python
+ # `a` contains slices at indices [12, 26, 37, 45] from a large tensor
+ # with shape [1000, 10]
+ a.indices => [12, 26, 37, 45]
+ tf.shape(a.values) => [4, 10]
+
+ # `b` will be the subset of `a` slices at its second and third indices, so
+  # we want to mask off its first and last indices (which are at absolute
+ # indices 12, 45)
+ b = tf.sparse_mask(a, [12, 45])
+
+ b.indices => [26, 37]
+ tf.shape(b.values) => [2, 10]
+
+ ```
+
+ Args:
+ * `a`: An `IndexedSlices` instance.
+ * `mask_indices`: Indices of elements to mask.
+ * `name`: A name for the operation (optional).
+
+ Returns:
+ The masked `IndexedSlices` instance.
+ """
+ with ops.op_scope([a, mask_indices], name, "sparse_mask") as name:
+ indices = a.indices
+ out_indices, to_gather = listdiff(indices, mask_indices)
+ out_values = gather(a.values, to_gather, name=name)
+ return ops.IndexedSlices(out_values, out_indices, a.dense_shape)
+
+
+def split(split_dim, num_split, value, name="split"):
+ """Splits a tensor into `num_split` tensors along one dimension.
+
+ Splits `value` along dimension `split_dim` into `num_split` smaller tensors.
+ Requires that `num_split` evenly divide `value.shape[split_dim]`.
+
+ For example:
+
+ ```python
+ # 'value' is a tensor with shape [5, 30]
+ # Split 'value' into 3 tensors along dimension 1
+ split0, split1, split2 = tf.split(1, 3, value)
+ tf.shape(split0) ==> [5, 10]
+ ```
+
+ Args:
+ split_dim: A 0-D `int32` `Tensor`. The dimension along which to split.
+ Must be in the range `[0, rank(value))`.
+ num_split: A 0-D `int32` `Tensor`. The number of ways to split.
+ value: The `Tensor` to split.
+ name: A name for the operation (optional).
+
+ Returns:
+ `num_split` `Tensor` objects resulting from splitting `value`.
+ """
+ return gen_array_ops._split(split_dim=split_dim,
+ num_split=num_split,
+ value=value,
+ name=name)
+
+
+@ops.RegisterShape("Reverse")
+def _ReverseShape(op):
+ return [op.inputs[0].get_shape().with_rank_at_most(8)]
+
+
+def transpose(a, perm=None, name="transpose"):
+ """Transposes `a`. Permutes the dimensions according to `perm`.
+
+ The returned tensor's dimension i will correspond to the input dimension
+ `perm[i]`. If `perm` is not given, it is set to (n-1...0), where n is
+ the rank of the input tensor. Hence by default, this operation performs a
+ regular matrix transpose on 2-D input Tensors.
+
+ For example:
+
+ ```python
+ # 'x' is [[1 2 3]
+ # [4 5 6]]
+ tf.transpose(x) ==> [[1 4]
+ [2 5]
+ [3 6]]
+
+ # Equivalently
+  tf.transpose(x, perm=[0, 1]) ==> [[1 4]
+ [2 5]
+ [3 6]]
+
+ # 'perm' is more useful for n-dimensional tensors, for n > 2
+ # 'x' is [[[1 2 3]
+ # [4 5 6]]
+ # [[7 8 9]
+ # [10 11 12]]]
+ # Take the transpose of the matrices in dimension-0
+  tf.transpose(x, perm=[0, 2, 1]) ==> [[[1 4]
+ [2 5]
+ [3 6]]
+
+ [[7 10]
+ [8 11]
+ [9 12]]]
+ ```
+
+ Args:
+ a: A `Tensor`.
+ perm: A permutation of the dimensions of `a`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A transposed `Tensor`.
+ """
+ with ops.op_scope([a], name, "transpose") as name:
+ if perm is None:
+ dims = gen_math_ops._range(0, gen_array_ops.rank(a), 1)
+ perm = gen_array_ops.reverse(dims, [True])
+ ret = gen_array_ops.transpose(a, perm, name=name)
+ # NOTE(mrry): Setting the shape explicitly because
+ # reverse is not handled by the shape function.
+ input_shape = ret.op.inputs[0].get_shape().dims
+ if input_shape is not None:
+ ret.set_shape(input_shape[::-1])
+ else:
+ ret = gen_array_ops.transpose(a, perm, name=name)
+ return ret
+
+
+def zeros(shape, dtype=types.float32, name=None):
+ """Creates a tensor with all elements set to zero.
+
+ This operation returns a tensor of type `dtype` with shape `shape` and
+ all elements set to zero.
+
+ For example:
+
+ ```python
+ tf.zeros([3, 4], int32) ==> [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
+ ```
+
+ Args:
+ shape: Either a list of integers, or a 1-D `Tensor` of type `int32`.
+ dtype: The type of an element in the resulting `Tensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with all elements set to zero.
+ """
+ with ops.op_scope([shape], name, "zeros") as name:
+ if isinstance(shape, list):
+ output = constant(0, shape=shape, dtype=dtype, name=name)
+ else:
+ shape = ops.convert_to_tensor(shape, name="shape")
+ output = fill(shape, constant(0, dtype=dtype), name=name)
+ assert output.dtype.base_dtype == types.as_dtype(dtype).base_dtype
+ return output
+
+
+def zeros_like(tensor, dtype=None, name=None):
+ """Creates a tensor with all elements set to zero.
+
+ Given a single tensor (`tensor`), this operation returns a tensor of the
+ same type and shape as `tensor` with all elements set to zero. Optionally,
+ you can use `dtype` to specify a new type for the returned tensor.
+
+ For example:
+
+ ```python
+ # 'tensor' is [[1, 2, 3], [4, 5, 6]]
+ tf.zeros_like(tensor) ==> [[0, 0, 0], [0, 0, 0]]
+ ```
+
+ Args:
+ tensor: A `Tensor`.
+ dtype: A type for the returned `Tensor`. Must be `float32`, `float64`,
+ `int8`, `int16`, `int32`, `int64`, `uint8`, or `complex64`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with all elements set to zero.
+ """
+ with ops.op_scope([tensor], name, "zeros_like") as name:
+ tensor = ops.convert_to_tensor(tensor, name="tensor")
+ zeros_shape = shape(tensor)
+ if dtype is None:
+ dtype = tensor.dtype
+ return zeros(zeros_shape, dtype=dtype, name=name)
+
+
+def ones_like(tensor, dtype=None, name=None):
+ """Creates a tensor with all elements set to 1.
+
+ Given a single tensor (`tensor`), this operation returns a tensor of the same
+ type and shape as `tensor` with all elements set to 1. Optionally, you can
+ specify a new type (`dtype`) for the returned tensor.
+
+ For example:
+
+ ```python
+ # 'tensor' is [[1, 2, 3], [4, 5, 6]]
+ tf.ones_like(tensor) ==> [[1, 1, 1], [1, 1, 1]]
+ ```
+
+ Args:
+ tensor: A `Tensor`.
+ dtype: A type for the returned `Tensor`. Must be `float32`, `float64`,
+ `int8`, `int16`, `int32`, `int64`, `uint8`, or `complex64`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with all elements set to 1.
+ """
+ with ops.op_scope([tensor], name, "ones_like") as name:
+ tensor = ops.convert_to_tensor(tensor, name="tensor")
+ ones_shape = shape(tensor)
+ if dtype is None:
+ dtype = tensor.dtype
+ return ones(ones_shape, dtype=dtype, name=name)
+
+
+def zeros_initializer(shape, dtype=types.float32):
+ """An adaptor for zeros() to match the Initializer spec."""
+ return zeros(shape, dtype)
+
+
+def ones(shape, dtype=types.float32, name=None):
+ """Creates a tensor with all elements set to 1.
+
+ This operation returns a tensor of type `dtype` with shape `shape` and all
+ elements set to 1.
+
+ For example:
+
+ ```python
+ tf.ones([2, 3], int32) ==> [[1, 1, 1], [1, 1, 1]]
+ ```
+
+ Args:
+ shape: Either a list of integers, or a 1-D `Tensor` of type `int32`.
+ dtype: The type of an element in the resulting `Tensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with all elements set to 1.
+ """
+ with ops.op_scope([shape], name, "ones") as name:
+ if isinstance(shape, list):
+ output = constant(1, shape=shape, dtype=dtype, name=name)
+ else:
+ shape = ops.convert_to_tensor(shape, name="shape")
+ output = fill(shape, constant(1, dtype=dtype), name=name)
+ assert output.dtype.base_dtype == types.as_dtype(dtype).base_dtype
+ return output
+
+
+def placeholder(dtype, shape=None, name=None):
+ """Inserts a placeholder for a tensor that will be always fed.
+
+ **Important**: This tensor will produce an error if evaluated. Its value must
+ be fed using the `feed_dict` optional argument to `Session.run()`,
+ `Tensor.eval()`, or `Operation.run()`.
+
+ For example:
+
+ ```python
+ x = tf.placeholder(float, shape=(1024, 1024))
+ y = tf.matmul(x, x)
+
+ with tf.Session() as sess:
+ print sess.run(y) # ERROR: will fail because x was not fed.
+
+ rand_array = np.random.rand(1024, 1024)
+ print sess.run(y, feed_dict={x: rand_array}) # Will succeed.
+ ```
+
+ Args:
+ dtype: The type of elements in the tensor to be fed.
+ shape: The shape of the tensor to be fed (optional). If the shape is not
+ specified, you can feed a tensor of any shape.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` that may be used as a handle for feeding a value, but not
+ evaluated directly.
+ """
+ shape = tensor_shape.as_shape(shape)
+ if shape.is_fully_defined():
+ dim_list = shape.as_list()
+ else:
+ dim_list = []
+ ret = gen_array_ops._placeholder(
+ dtype=dtype,
+ shape=dim_list,
+ name=name)
+ ret.set_shape(shape)
+ return ret
+
+
+@ops.RegisterShape("Placeholder")
+def _PlaceholderShape(op):
+ given_shape = tensor_util.TensorShapeProtoToList(op.get_attr("shape"))
+ if given_shape:
+ return [tensor_shape.TensorShape(given_shape)]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("CheckNumerics")
+@ops.RegisterShape("Identity")
+@ops.RegisterShape("RefIdentity")
+@ops.RegisterShape("StopGradient")
+def _UnchangedShape(op):
+ return [op.inputs[0].get_shape()]
+
+
+@ops.RegisterShape("Rank")
+@ops.RegisterShape("Size")
+def _ScalarShape(unused_op):
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape("Slice")
+def _SliceShape(op):
+ """Shape function for array_ops.slice."""
+ input_shape = op.inputs[0].get_shape()
+ begin_shape = op.inputs[1].get_shape().with_rank_at_most(1)
+ sizes_shape = op.inputs[2].get_shape().with_rank_at_most(1)
+ rank_vector_shape = begin_shape.merge_with(sizes_shape)
+ ndims = rank_vector_shape.num_elements()
+ if ndims is not None:
+ input_shape.assert_has_rank(ndims)
+ begin_value = tensor_util.ConstantValue(op.inputs[1])
+ sizes_value = tensor_util.ConstantValue(op.inputs[2])
+ if sizes_value is not None:
+ returned_dims = []
+ for i, slice_size in enumerate(sizes_value.ravel()):
+ if slice_size != -1:
+ returned_dims.append(slice_size)
+ elif begin_value is not None:
+ returned_dims.append(input_shape[i] - begin_value[i])
+ else:
+ returned_dims.append(None)
+ return [tensor_shape.TensorShape(returned_dims)]
+ else:
+ if input_shape.ndims is not None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)]
+ elif ndims is not None:
+ return [tensor_shape.unknown_shape(ndims=ndims)]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("Gather")
+def _GatherShape(op):
+ """Shape function for array_ops.gather."""
+ params_shape = op.inputs[0].get_shape()
+ indices_shape = op.inputs[1].get_shape()
+ return [indices_shape.concatenate(params_shape[1:])]
+
+
+@ops.RegisterShape("Unique")
+def _UniqueShape(op):
+ """Shape function for array_ops.Unique."""
+ # The output is a vector with data-dependent length.
+ input_shape = op.inputs[0].get_shape()
+ input_shape.assert_has_rank(1)
+ return [tensor_shape.vector(None), input_shape]
+
+
+@ops.RegisterShape("Diag")
+def _DiagShape(op):
+ """Shape function for array_ops.diag.
+
+ This op has one input (of rank k <= 3), and one output (of rank 2k),
+ where the shape of the output is the concatenation of the input
+ shape with itself.
+
+ Args:
+ op: A Diag Operation.
+
+ Returns:
+ A single-element list containing the shape of the output.
+ """
+ input_shape = op.inputs[0].get_shape().with_rank_at_most(3)
+ return [input_shape.concatenate(input_shape)]
+
+
+@ops.RegisterShape("ExpandDims")
+def _ExpandDimsShape(op):
+ """Determine shape for expand op's output tensor.
+
+ Args:
+ op: Operation for which to determine shape.
+ op.inputs[0] is the input tensor.
+ op.inputs[1] is the dimension in which to expand.
+ Returns:
+ Shape of op's output tensor.
+ Raises:
+ ValueError: If dim is outside of [-rank - 1, rank], where rank is the number
+ of dimensions in the input tensor.
+ """
+ input_shape = op.inputs[0].get_shape()
+ if input_shape.dims is None:
+ return [tensor_shape.unknown_shape()]
+ dim = tensor_util.ConstantValue(op.inputs[1])
+ input_ndims = input_shape.ndims
+ if dim < -input_ndims - 1 or dim > input_ndims:
+ raise ValueError(
+        "dim %d not in [%d, %d]." % (dim, -input_ndims - 1, input_ndims))
+ if dim < 0:
+ dim += (input_ndims + 1)
+ result_shape = list(input_shape.dims)
+ result_shape.insert(dim, 1)
+ return [tensor_shape.TensorShape(result_shape)]
+
+
+@ops.RegisterShape("Squeeze")
+def _SqueezeShape(op):
+ """Determine shape for squeeze op's output tensor.
+
+ Args:
+ op: Operation for which to determine shape.
+ Returns:
+ Shape of op's output tensor.
+ Raises:
+ ValueError: if squeeze_dims includes a dimension outside of [-rank, rank),
+ where rank is the number of dimensions in the input tensor. Or, if
+ squeeze_dims includes a dimension for which input shape has a value
+ not equal to 1.
+ """
+ input_shape = op.inputs[0].get_shape()
+ if input_shape.dims is None:
+ return [tensor_shape.unknown_shape()]
+
+ squeeze_dims = op.get_attr("squeeze_dims") or []
+ wrapped_squeeze_dims = []
+ input_ndims = input_shape.ndims
+ for i, squeeze_dim in enumerate(squeeze_dims):
+ if squeeze_dim < -input_ndims or squeeze_dim >= input_ndims:
+ raise ValueError(
+ "squeeze_dims[%d]=%d not in [%d, %d)." % (
+ i, squeeze_dim, -input_ndims, input_ndims))
+ if squeeze_dim < 0:
+ squeeze_dim += input_ndims
+ wrapped_squeeze_dims.append(squeeze_dim)
+
+ result_shape = []
+ for i, dim in enumerate([d.value for d in input_shape.dims]):
+ is_explicit_match = i in wrapped_squeeze_dims
+ if is_explicit_match or not wrapped_squeeze_dims:
+ if dim is None:
+ return [tensor_shape.unknown_shape()]
+ if dim != 1:
+ if is_explicit_match:
+ raise ValueError(
+ "Can not squeeze dim[%d], expected a dimension of 1, got %d." % (
+ i, dim))
+ result_shape.append(dim)
+ else:
+ result_shape.append(dim)
+ return [tensor_shape.TensorShape(result_shape)]
+
+
+@ops.RegisterShape("Reshape")
+def _ReshapeShape(op):
+ """Shape function for Reshape op."""
+ input_shape = op.inputs[0].get_shape()
+ new_shape_shape = op.inputs[1].get_shape().with_rank_at_most(1)
+ new_shape = tensor_util.ConstantValue(op.inputs[1])
+ if new_shape is None:
+ # Attempt to infer the rank of the output from the length of
+ # new_shape.
+ return [tensor_shape.unknown_shape(ndims=new_shape_shape.num_elements())]
+ new_shape = np.reshape(new_shape, -1).tolist()
+ if -1 not in new_shape:
+ # The new shape is fully defined.
+ return [tensor_shape.TensorShape(new_shape)]
+ elif input_shape.is_fully_defined():
+ # We know the input shape, so we can calculate the missing
+ # dimension in the new_shape.
+ num_elements = 1
+ for dim in input_shape.dims:
+ num_elements *= dim.value
+ known_elements = 1
+ unknown_index = None
+ for i, dim in enumerate(new_shape):
+ if dim == -1:
+ unknown_index = i
+ else:
+ known_elements *= dim
+ if known_elements == 0:
+ raise ValueError("cannot infer the missing input size for "
+ "an empty tensor unless all specified "
+ "input sizes are non-zero")
+ if num_elements % known_elements != 0:
+ raise ValueError("input has %s elements, which isn't divisible by %d" %
+ (num_elements, known_elements))
+ new_shape[unknown_index] = num_elements / known_elements
+ return [tensor_shape.TensorShape(new_shape)]
+ else:
+ # We don't know the input shape, but we know n-1 of the dimensions
+ # in the new shape.
+ new_shape[new_shape.index(-1)] = None
+ return [tensor_shape.TensorShape(new_shape)]
+
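The missing-dimension arithmetic in `_ReshapeShape` can be checked by hand; for example, with an illustrative input shape of `[2, 6]`:

```python
# Reshaping a [2, 6] tensor (12 elements) to [3, -1] must yield [3, 4].
num_elements = 2 * 6
known_elements = 3                        # product of the non -1 entries
assert num_elements % known_elements == 0
missing = num_elements / known_elements   # Python 2 integer division
print [3, missing]                        # ==> [3, 4]
```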
+
+@ops.RegisterShape("BroadcastGradientArgs")
+def _BroadcastGradientArgsShape(op):
+ """Shape function for the BroadcastGradientArgs op."""
+ # TODO(mrry): Implement ConstantValue for BroadcastGradientArgs?
+ op.inputs[0].get_shape().assert_has_rank(1)
+ op.inputs[1].get_shape().assert_has_rank(1)
+ return [tensor_shape.vector(None), tensor_shape.vector(None)]
+
+
+@ops.RegisterShape("Fill")
+def _FillShape(op):
+ """Shape function for the Fill op.
+
+ This op takes a vector of dimensions and a scalar, and produces a
+ tensor with the given dimensions.
+
+ Args:
+ op: A Fill Operation.
+
+ Returns:
+ A single-element list containing the shape of the output.
+ """
+ dimensions_shape = op.inputs[0].get_shape().with_rank_at_most(1)
+ op.inputs[1].get_shape().assert_is_compatible_with(tensor_shape.scalar())
+ fill_dims = tensor_util.ConstantValue(op.inputs[0])
+ if fill_dims is None:
+ # Attempt to infer the rank of the output from the length of
+ # dimensions.
+ return [tensor_shape.unknown_shape(ndims=dimensions_shape.num_elements())]
+ else:
+ return [tensor_shape.TensorShape(fill_dims.tolist())]
+
+
+@ops.RegisterShape("InvertPermutation")
+def _InvertPermutationShape(op):
+ """Shape function for the InvertPermutation op."""
+ return [op.inputs[0].get_shape().with_rank(1)]
+
+
+@ops.RegisterShape("ListDiff")
+def _ListDiffShape(op):
+ """Shape function for the ListDiff op."""
+ op.inputs[0].get_shape().assert_has_rank(1)
+ op.inputs[1].get_shape().assert_has_rank(1)
+ # TODO(mrry): Indicate that the length falls within an interval?
+ return [tensor_shape.vector(None)] * 2
+
+
+@ops.RegisterShape("Pad")
+def _PadShape(op):
+ """Shape function for the Pad op.
+
+ This op has two inputs:
+
+ * input: A rank-N tensor.
+ * paddings: An N-by-2 matrix, in which the i^th row contains the
+ number of padding elements to add before and after `input` in the
+ i^th dimension.
+
+ It has one output, which has the same rank as input, and additional
+ elements according to the values in paddings.
+
+ Args:
+ op: A Pad Operation.
+
+ Returns:
+ A single-element list containing the shape of the output.
+
+ Raises:
+ ValueError: If the input shapes are incompatible.
+ """
+ paddings_shape = op.inputs[1].get_shape().with_rank(2)
+ input_shape = op.inputs[0].get_shape()
+ if input_shape.ndims == 0 and paddings_shape[0].value == 1:
+ # TODO(irving): Remove once !kAllowLegacyScalars.
+ input_shape = tensor_shape.TensorShape([1])
+ else:
+ input_shape = input_shape.with_rank(paddings_shape[0].value)
+ paddings_shape = paddings_shape.merge_with(
+ tensor_shape.matrix(input_shape.ndims, 2))
+ paddings = tensor_util.ConstantValue(op.inputs[1])
+ if paddings is None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)]
+ else:
+ output_dims = []
+ for i, dim in enumerate(input_shape.dims):
+ if paddings[i, 0] < 0 or paddings[i, 1] < 0:
+ raise ValueError("paddings must be non-negative")
+ output_dims.append(dim + paddings[i, 0] + paddings[i, 1])
+ return [tensor_shape.TensorShape(output_dims)]
+
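A quick check of the padding arithmetic in `_PadShape`, with illustrative values:

```python
# A rank-2 input of shape [2, 3], padded with one row before and after in
# dimension 0 and two columns before in dimension 1.
input_shape = [2, 3]
paddings = [[1, 1], [2, 0]]

output_shape = [dim + before + after
                for dim, (before, after) in zip(input_shape, paddings)]
print output_shape  # ==> [4, 5]
```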
+
+@ops.RegisterShape("ReverseSequence")
+def _ReverseSequenceShape(op):
+ """Shape function for the ReverseSequence op.
+
+ This op has two inputs:
+
+ * input: A rank-N tensor with size B in the 0th dimension.
+ * seq_lens: A vector of length B.
+
+ It has one output, with the same size as input.
+
+ Args:
+ op: A ReverseSequence Operation.
+
+ Returns:
+ A single-element list containing the shape of the output.
+
+ Raises:
+ ValueError: If the input shapes are incompatible.
+ """
+ input_shape = op.inputs[0].get_shape()
+ seq_lens_shape = op.inputs[1].get_shape().with_rank(1)
+ batch_size = input_shape[0].merge_with(seq_lens_shape[0])
+ input_shape = tensor_shape.TensorShape([batch_size]).concatenate(
+ input_shape[1:])
+ seq_dim = op.get_attr("seq_dim")
+ if seq_dim >= input_shape.ndims:
+ raise ValueError("seq_dim must be < input.dims() (%d vs %d)" %
+ (seq_dim, input_shape.ndims))
+ return [input_shape]
+
+
+@ops.RegisterShape("Shape")
+def _ShapeShape(op):
+ """Shape function for the Shape op."""
+ input_shape = op.inputs[0].get_shape()
+ return [tensor_shape.vector(input_shape.ndims)]
+
+
+@ops.RegisterShape("Transpose")
+def _TransposeShape(op):
+ """Shape function for the Transpose op.
+
+ This op takes two inputs:
+
+ * input: a rank-N tensor of arbitrary shape.
+ * shuffle: a length-N vector.
+
+ Its output is the rank-N tensor computed by permuting the dimensions
+ of input according to shuffle.
+
+ Args:
+ op: A Transpose op.
+
+ Returns:
+ A single-element list containing the shape of the output.
+
+ Raises:
+ ValueError: If the shapes of input and shuffle are incompatible.
+ IndexError: If shuffle contains an index that is >= the rank of input.
+ """
+ input_shape = op.inputs[0].get_shape()
+ transpose_shape = op.inputs[1].get_shape().merge_with(tensor_shape.vector(
+ input_shape.ndims))
+ transpose_vec = tensor_util.ConstantValue(op.inputs[1])
+ if transpose_vec is None:
+ return [tensor_shape.unknown_shape(ndims=transpose_shape[0].value)]
+ else:
+ return [tensor_shape.TensorShape([input_shape[i]
+ for i in transpose_vec.tolist()])]
+
+
+@ops.RegisterShape("Split")
+def _SplitShape(op):
+ """Shape function for the Split op."""
+ split_dim = tensor_util.ConstantValue(op.inputs[0])
+ num_split = len(op.outputs)
+ input_shape = op.inputs[1].get_shape()
+ if split_dim is None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)] * num_split
+ else:
+ split_dim = int(split_dim)
+ input_shape = input_shape.with_rank_at_least(split_dim + 1)
+ if not (input_shape[split_dim] % num_split).is_compatible_with(0):
+ raise ValueError(
+ "Number of ways to split should evenly divide the split "
+ "dimension but got split_dim %d (size = %d) and num_split %d" %
+ (split_dim, input_shape[split_dim].value, num_split))
+ prefix = input_shape[:split_dim]
+ size_in_split_dim = input_shape[split_dim] / num_split
+ suffix = input_shape[split_dim + 1:]
+ output_shape = prefix.concatenate(size_in_split_dim).concatenate(suffix)
+ return [output_shape] * num_split
+
+
+@ops.RegisterShape("Tile")
+def _TileShape(op):
+ """Shape function for the Tile op.
+
+ This op has two inputs:
+
+ * input: A rank-N tensor.
+ * multiples: A length-N vector, in which the i^th element contains
+ the factor by which `input` will be tiled in the i^th dimension.
+
+ It has one output, which has the same rank as input, and additional
+  elements according to the values in multiples.
+
+ Args:
+ op: A Tile Operation.
+
+ Returns:
+ A single-element list containing the shape of the output.
+ """
+ multiples_shape = op.inputs[1].get_shape().with_rank_at_most(1)
+  input_shape = op.inputs[0].get_shape().with_rank(
+      multiples_shape.num_elements())
+ multiples = tensor_util.ConstantValue(op.inputs[1])
+ if multiples is None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)]
+ else:
+ output_dims = []
+ multiples = multiples.ravel()
+ for i, dim in enumerate(input_shape.dims):
+ output_dims.append(dim * multiples[i])
+ return [tensor_shape.TensorShape(output_dims)]
+
+
+@ops.RegisterShape("TileGrad")
+def _TileGradShape(op):
+ """Shape function for the TileGrad op."""
+ multiples_shape = op.inputs[1].get_shape().with_rank_at_most(1)
+  input_shape = op.inputs[0].get_shape().with_rank(
+      multiples_shape.num_elements())
+ multiples = tensor_util.ConstantValue(op.inputs[1])
+ if multiples is None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)]
+ else:
+ output_dims = []
+ for i, dim in enumerate(input_shape.dims):
+ output_dims.append(dim / multiples[i])
+ return [tensor_shape.TensorShape(output_dims)]
+
+
+@ops.RegisterShape("Where")
+def _WhereShape(op):
+ """Shape function for the Where op."""
+ input_shape = op.inputs[0].get_shape()
+ return [tensor_shape.matrix(None, input_shape.ndims)]
+
+
+@ops.RegisterShape("ZerosLike")
+def _ZerosLikeShape(op):
+ """Shape function for the ZerosLike op."""
+ return [op.inputs[0].get_shape()]
+
+
+def edit_distance(hypothesis, truth, normalize=True, name="edit_distance"):
+ """Computes the Levenshtein distance between sequences.
+
+ This operation takes variable-length sequences (`hypothesis` and `truth`),
+ each provided as a `SparseTensor`, and computes the Levenshtein distance.
+ You can normalize the edit distance by length of `truth` by setting
+ `normalize` to true.
+
+ For example, given the following input:
+
+ ```python
+ # 'hypothesis' is a tensor of shape `[2, 1]` with variable-length values:
+ # (0,0) = ["a"]
+ # (1,0) = ["b"]
+  hypothesis = tf.SparseTensor(
+      [[0, 0, 0],
+       [1, 0, 0]],
+      ["a", "b"],
+      (2, 1, 1))
+
+ # 'truth' is a tensor of shape `[2, 2]` with variable-length values:
+ # (0,0) = []
+ # (0,1) = ["a"]
+ # (1,0) = ["b", "c"]
+ # (1,1) = ["a"]
+  truth = tf.SparseTensor(
+      [[0, 1, 0],
+       [1, 0, 0],
+       [1, 0, 1],
+       [1, 1, 0]],
+      ["a", "b", "c", "a"],
+      (2, 2, 2))
+
+ normalize = True
+ ```
+
+ This operation would return the following:
+
+ ```python
+ # 'output' is a tensor of shape `[2, 2]` with edit distances normalized
+ # by 'truth' lengths.
+ output ==> [[inf, 1.0], # (0,0): no truth, (0,1): no hypothesis
+ [0.5, 1.0]] # (1,0): addition, (1,1): no hypothesis
+ ```
+
+ Args:
+ hypothesis: A `SparseTensor` containing hypothesis sequences.
+ truth: A `SparseTensor` containing truth sequences.
+    normalize: A `bool`. If `True`, normalizes the Levenshtein distance by
+      the length of `truth`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A dense `Tensor` with rank `R - 1`, where R is the rank of the
+ `SparseTensor` inputs `hypothesis` and `truth`.
+
+ Raises:
+ TypeError: If either `hypothesis` or `truth` are not a `SparseTensor`.
+ """
+ if not isinstance(hypothesis, ops.SparseTensor):
+ raise TypeError("Hypothesis must be a SparseTensor")
+ if not isinstance(truth, ops.SparseTensor):
+ raise TypeError("Truth must be a SparseTensor")
+
+ return gen_array_ops._edit_distance(hypothesis.indices,
+ hypothesis.values,
+ hypothesis.shape,
+ truth.indices,
+ truth.values,
+ truth.shape,
+ normalize=normalize,
+ name=name)
+
+
+@ops.RegisterShape("EditDistance")
+def _EditDistanceShape(op):
+ """Shape function for the EditDistance op."""
+ hypothesis_shape = tensor_util.ConstantValue(op.inputs[2])
+ truth_shape = tensor_util.ConstantValue(op.inputs[5])
+ if hypothesis_shape is not None and truth_shape is not None:
+ if len(hypothesis_shape) != len(truth_shape):
+ raise ValueError(
+ "Inconsistent ranks in hypothesis and truth. Saw shapes: %s and %s" %
+ (str(hypothesis_shape), str(truth_shape)))
+ return [tensor_shape.TensorShape(
+ [max(h, t) for h, t in zip(hypothesis_shape[:-1], truth_shape[:-1])])]
+
+ return [tensor_shape.unknown_shape()]
+
+
+# The remaining ops do not change the shape of their inputs.
+@ops.RegisterShape("Quantize")
+@ops.RegisterShape("Dequantize")
+def _QuantizeDequantizeShape(op):
+ unused_min_range = op.inputs[1].get_shape().merge_with(tensor_shape.scalar())
+ unused_max_range = op.inputs[2].get_shape().merge_with(tensor_shape.scalar())
+ return common_shapes.unchanged_shape(op)
diff --git a/tensorflow/python/ops/attention_ops.py b/tensorflow/python/ops/attention_ops.py
new file mode 100644
index 0000000000..4829bcd7cd
--- /dev/null
+++ b/tensorflow/python/ops/attention_ops.py
@@ -0,0 +1,34 @@
+"""Operations for implementing attention.
+"""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.ops import gen_attention_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_attention_ops import *
+
+
+# TODO(bsteiner): Implement the gradient function for extract_glimpse
+ops.NoGradient("ExtractGlimpse")
+
+
+@ops.RegisterShape("ExtractGlimpse")
+def _ExtractGlimpseShape(op):
+ """Shape function for ExtractGlimpse op."""
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ unused_size_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.vector(2))
+  unused_offsets_shape = op.inputs[2].get_shape().merge_with(
+      input_shape[:1].concatenate([2]))
+ size_value = tensor_util.ConstantValue(op.inputs[1])
+ if size_value is not None:
+ height = size_value[0]
+ width = size_value[1]
+ else:
+ height = None
+ width = None
+ return [tensor_shape.TensorShape(
+ [input_shape[0], height, width, input_shape[3]])]
diff --git a/tensorflow/python/ops/candidate_sampling_ops.py b/tensorflow/python/ops/candidate_sampling_ops.py
new file mode 100644
index 0000000000..06857c0adc
--- /dev/null
+++ b/tensorflow/python/ops/candidate_sampling_ops.py
@@ -0,0 +1,365 @@
+"""Wrappers for primitive Neural Net (NN) Operations."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import random_seed
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import gen_candidate_sampling_ops
+from tensorflow.python.ops import math_ops
+
+
+def uniform_candidate_sampler(true_classes, num_true, num_sampled, unique,
+ range_max, seed=None, name=None):
+ """Samples a set of classes using a uniform base distribution.
+
+ This operation randomly samples a tensor of sampled classes
+ (`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+ The elements of `sampled_candidates` are drawn without replacement
+ (if `unique=True`) or with replacement (if `unique=False`) from
+ the base distribution.
+
+ The base distribution for this operation is the uniform distribution
+ over the range of integers `[0, range_max]`.
+
+ In addition, this operation returns tensors `true_expected_count`
+ and `sampled_expected_count` representing the number of times each
+ of the target classes (`true_classes`) and the sampled
+ classes (`sampled_candidates`) is expected to occur in an average
+ tensor of sampled classes. These values correspond to `Q(y|x)`
+ defined in [this
+ document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ If `unique=True`, then these are post-rejection probabilities and we
+ compute them approximately.
+
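+  For example, a minimal sketch (illustrative values; the sampler is assumed
+  to be exposed as `tf.nn.uniform_candidate_sampler`):
+
+  ```python
+  # One target class per example, for a batch of 3 examples.
+  true_classes = tf.constant([[0], [7], [4]], dtype=tf.int64)
+  sampled, true_expected, sampled_expected = tf.nn.uniform_candidate_sampler(
+      true_classes, num_true=1, num_sampled=5, unique=True, range_max=10)
+  ```
+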
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_true: An `int`. The number of target classes per training example.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ unique: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+ range_max: An `int`. The number of possible classes.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+ true_expected_count: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+ sampled_expected_count: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._uniform_candidate_sampler(
+ true_classes, num_true, num_sampled, unique, range_max, seed=seed1,
+ seed2=seed2, name=name)
+
+
+def log_uniform_candidate_sampler(true_classes, num_true, num_sampled, unique,
+ range_max, seed=None, name=None):
+ """Samples a set of classes using a log-uniform (Zipfian) base distribution.
+
+ This operation randomly samples a tensor of sampled classes
+ (`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+ The elements of `sampled_candidates` are drawn without replacement
+ (if `unique=True`) or with replacement (if `unique=False`) from
+ the base distribution.
+
+ The base distribution for this operation is an approximately log-uniform
+ or Zipfian distribution:
+
+ `P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1)`
+
+ This sampler is useful when the target classes approximately follow such
+ a distribution - for example, if the classes represent words in a lexicon
+ sorted in decreasing order of frequency. If your classes are not ordered by
+ decreasing frequency, do not use this op.
+
+ In addition, this operation returns tensors `true_expected_count`
+ and `sampled_expected_count` representing the number of times each
+ of the target classes (`true_classes`) and the sampled
+ classes (`sampled_candidates`) is expected to occur in an average
+ tensor of sampled classes. These values correspond to `Q(y|x)`
+ defined in [this
+ document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ If `unique=True`, then these are post-rejection probabilities and we
+ compute them approximately.
+
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_true: An `int`. The number of target classes per training example.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ unique: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+ range_max: An `int`. The number of possible classes.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+ true_expected_count: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+ sampled_expected_count: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._log_uniform_candidate_sampler(
+ true_classes, num_true, num_sampled, unique, range_max, seed=seed1,
+ seed2=seed2, name=name)
+
+
+def learned_unigram_candidate_sampler(true_classes, num_true, num_sampled,
+ unique, range_max, seed=None, name=None):
+ """Samples a set of classes from a distribution learned during training.
+
+ This operation randomly samples a tensor of sampled classes
+ (`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+ The elements of `sampled_candidates` are drawn without replacement
+ (if `unique=True`) or with replacement (if `unique=False`) from
+ the base distribution.
+
+ The base distribution for this operation is constructed on the fly
+ during training. It is a unigram distribution over the target
+ classes seen so far during training. Every integer in `[0, range_max]`
+ begins with a weight of 1, and is incremented by 1 each time it is
+ seen as a target class. The base distribution is not saved to checkpoints,
+ so it is reset when the model is reloaded.
+
+ In addition, this operation returns tensors `true_expected_count`
+ and `sampled_expected_count` representing the number of times each
+ of the target classes (`true_classes`) and the sampled
+ classes (`sampled_candidates`) is expected to occur in an average
+ tensor of sampled classes. These values correspond to `Q(y|x)`
+ defined in [this
+ document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ If `unique=True`, then these are post-rejection probabilities and we
+ compute them approximately.
+
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_true: An `int`. The number of target classes per training example.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ unique: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+ range_max: An `int`. The number of possible classes.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+ true_expected_count: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+ sampled_expected_count: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._learned_unigram_candidate_sampler(
+ true_classes, num_true, num_sampled, unique, range_max, seed=seed1,
+ seed2=seed2, name=name)
+
+
+def fixed_unigram_candidate_sampler(true_classes, num_true, num_sampled, unique,
+ range_max, vocab_file='', distortion=0.0,
+ num_reserved_ids=0, num_shards=1, shard=0,
+                                    unigrams=(), seed=None, name=None):
+ """Samples a set of classes using the provided (fixed) base distribution.
+
+ This operation randomly samples a tensor of sampled classes
+ (`sampled_candidates`) from the range of integers `[0, range_max]`.
+
+ The elements of `sampled_candidates` are drawn without replacement
+ (if `unique=True`) or with replacement (if `unique=False`) from
+ the base distribution.
+
+ The base distribution is read from a file or passed in as an
+ in-memory array. There is also an option to skew the distribution by
+ applying a distortion power to the weights.
+
+ In addition, this operation returns tensors `true_expected_count`
+ and `sampled_expected_count` representing the number of times each
+ of the target classes (`true_classes`) and the sampled
+ classes (`sampled_candidates`) is expected to occur in an average
+ tensor of sampled classes. These values correspond to `Q(y|x)`
+ defined in [this
+ document](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ If `unique=True`, then these are post-rejection probabilities and we
+ compute them approximately.
+
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_true: An `int`. The number of target classes per training example.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ unique: A `bool`. Determines whether all sampled classes in a batch are
+ unique.
+ range_max: An `int`. The number of possible classes.
+ vocab_file: Each valid line in this file (which should have a CSV-like
+ format) corresponds to a valid word ID. IDs are in sequential order,
+ starting from num_reserved_ids. The last entry in each line is expected
+ to be a value corresponding to the count or relative probability. Exactly
+ one of `vocab_file` and `unigrams` needs to be passed to this operation.
+ distortion: The distortion is used to skew the unigram probability
+ distribution. Each weight is first raised to the distortion's power
+ before adding to the internal unigram distribution. As a result,
+ `distortion = 1.0` gives regular unigram sampling (as defined by the vocab
+ file), and `distortion = 0.0` gives a uniform distribution.
+ num_reserved_ids: Optionally some reserved IDs can be added in the range
+ `[0, num_reserved_ids]` by the users. One use case is that a special
+ unknown word token is used as ID 0. These IDs will have a sampling
+ probability of 0.
+ num_shards: A sampler can be used to sample from a subset of the original
+ range in order to speed up the whole computation through parallelism. This
+ parameter (together with `shard`) indicates the number of partitions that
+ are being used in the overall computation.
+ shard: A sampler can be used to sample from a subset of the original range
+ in order to speed up the whole computation through parallelism. This
+ parameter (together with `num_shards`) indicates the particular partition
+ number of the operation, when partitioning is being used.
+ unigrams: A list of unigram counts or probabilities, one per ID in
+ sequential order. Exactly one of `vocab_file` and `unigrams` should be
+ passed to this operation.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled classes.
+ true_expected_count: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`.
+ sampled_expected_count: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`.
+
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._fixed_unigram_candidate_sampler(
+ true_classes, num_true, num_sampled, unique, range_max,
+ vocab_file=vocab_file, distortion=distortion,
+ num_reserved_ids=num_reserved_ids, num_shards=num_shards, shard=shard,
+ unigrams=unigrams, seed=seed1, seed2=seed2, name=name)
+
+
+def all_candidate_sampler(true_classes, num_true, num_sampled, unique,
+ seed=None, name=None):
+ """Generate the set of all classes.
+
+ Deterministically generates and returns the set of all possible classes.
+  This is intended for testing purposes; there is no need to use it in
+  practice, since you might as well use full softmax or full logistic
+  regression.
+
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_true: An `int`. The number of target classes per training example.
+ num_sampled: An `int`. The number of possible classes.
+    unique: A `bool`. Ignored.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ This operation deterministically returns the entire range
+ `[0, num_sampled]`.
+ true_expected_count: A tensor of type `float`. Same shape as
+ `true_classes`. The expected counts under the sampling distribution
+ of each of `true_classes`. All returned values are 1.0.
+ sampled_expected_count: A tensor of type `float`. Same shape as
+ `sampled_candidates`. The expected counts under the sampling distribution
+ of each of `sampled_candidates`. All returned values are 1.0.
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._all_candidate_sampler(
+ true_classes, num_true, num_sampled, unique, seed=seed1, seed2=seed2,
+ name=name)
+
+
+def compute_accidental_hits(true_classes, sampled_candidates, num_true,
+ seed=None, name=None):
+ """Compute the ids of positions in sampled_candidates matching true_classes.
+
+ In Candidate Sampling, this operation facilitates virtually removing
+ sampled classes which happen to match target classes. This is done
+ in Sampled Softmax and Sampled Logistic.
+
+ See our [Candidate Sampling Algorithms
+ Reference](http://www.tensorflow.org/extras/candidate_sampling.pdf).
+
+ We presuppose that the `sampled_candidates` are unique.
+
+ We call it an 'accidental hit' when one of the target classes
+ matches one of the sampled classes. This operation reports
+ accidental hits as triples `(index, id, weight)`, where `index`
+ represents the row number in `true_classes`, `id` represents the
+ position in `sampled_candidates`, and weight is `-FLOAT_MAX`.
+
+ The result of this op should be passed through a `sparse_to_dense`
+ operation, then added to the logits of the sampled classes. This
+ removes the contradictory effect of accidentally sampling the true
+ target classes as noise classes for the same example.
+
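+  For example, a minimal sketch of masking accidental hits out of a
+  `[batch_size, num_sampled]` tensor of sampled logits (`true_classes`,
+  `sampled_candidates` and `sampled_logits` are hypothetical names; the ops
+  are assumed to be exposed as `tf.nn.compute_accidental_hits` and
+  `tf.sparse_to_dense`):
+
+  ```python
+  indices, ids, weights = tf.nn.compute_accidental_hits(
+      true_classes, sampled_candidates, num_true=1)
+  sparse_indices = tf.concat(1, [tf.expand_dims(indices, 1),
+                                 tf.expand_dims(tf.cast(ids, tf.int32), 1)])
+  # Each accidental hit gets weight -FLOAT_MAX; everything else stays 0.
+  mask = tf.sparse_to_dense(sparse_indices, tf.shape(sampled_logits),
+                            weights, 0.0)
+  sampled_logits += mask
+  ```
+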
+ Args:
+ true_classes: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ sampled_candidates: A tensor of type `int64` and shape `[num_sampled]`.
+ The sampled_candidates output of CandidateSampler.
+ num_true: An `int`. The number of target classes per training example.
+ seed: An `int`. An operation-specific seed. Default is 0.
+ name: A name for the operation (optional).
+
+ Returns:
+ indices: A `Tensor` of type `int32` and shape `[num_accidental_hits]`.
+ Values indicate rows in `true_classes`.
+ ids: A `Tensor` of type `int64` and shape `[num_accidental_hits]`.
+ Values indicate positions in `sampled_candidates`.
+ weights: A `Tensor` of type `float` and shape `[num_accidental_hits]`.
+ Each value is `-FLOAT_MAX`.
+
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_candidate_sampling_ops._compute_accidental_hits(
+ true_classes, sampled_candidates, num_true, seed=seed1, seed2=seed2,
+ name=name)
+
+
+@ops.RegisterShape("AllCandidateSampler")
+@ops.RegisterShape("FixedUnigramCandidateSampler")
+@ops.RegisterShape("LearnedUnigramCandidateSampler")
+@ops.RegisterShape("LogUniformCandidateSampler")
+@ops.RegisterShape("ThreadUnsafeUnigramCandidateSampler")
+@ops.RegisterShape("UniformCandidateSampler")
+def _CandidateSamplerShape(op):
+ true_classes_shape = op.inputs[0].get_shape().with_rank(2)
+ batch_size = true_classes_shape[0]
+ num_sampled = op.get_attr("num_sampled")
+ num_true = op.get_attr("num_true")
+ return [tensor_shape.vector(num_sampled),
+ tensor_shape.matrix(batch_size, num_true),
+ tensor_shape.vector(num_sampled)]
+
+
+@ops.RegisterShape("ComputeAccidentalHits")
+def _ComputeAccidentalHitsShape(op):
+ num_true = op.get_attr("num_true")
+ # Validate that the input shape matches the attrs, even though it
+ # does not influence the shape of the output.
+ true_candidates_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.matrix(None, num_true))
+ output_shape = tensor_shape.vector(None)
+ return [output_shape] * 3
diff --git a/tensorflow/python/ops/clip_ops.py b/tensorflow/python/ops/clip_ops.py
new file mode 100644
index 0000000000..08781932f9
--- /dev/null
+++ b/tensorflow/python/ops/clip_ops.py
@@ -0,0 +1,234 @@
+"""Operations for clipping (gradient, weight) tensors to min/max values."""
+
+import collections
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import math_ops
+
+
+def clip_by_value(t, clip_value_min, clip_value_max,
+ name=None):
+ """Clips tensor values to a specified min and max.
+
+ Given a tensor `t`, this operation returns a tensor of the same type and
+ shape as `t` with its values clipped to `clip_value_min` and `clip_value_max`.
+ Any values less than `clip_value_min` are set to `clip_value_min`. Any values
+ greater than `clip_value_max` are set to `clip_value_max`.
+
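+  For example, a minimal sketch (illustrative values; the op is assumed to be
+  exposed as `tf.clip_by_value`):
+
+  ```python
+  t = tf.constant([-3.0, -1.0, 0.0, 2.0, 5.0])
+  clipped = tf.clip_by_value(t, clip_value_min=-1.0, clip_value_max=2.0)
+  # clipped ==> [-1.0, -1.0, 0.0, 2.0, 2.0]
+  ```
+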
+ Args:
+ t: A `Tensor`.
+ clip_value_min: A 0-D (scalar) `Tensor`. The minimum value to clip by.
+ clip_value_max: A 0-D (scalar) `Tensor`. The maximum value to clip by.
+ name: A name for the operation (optional).
+
+ Returns:
+ A clipped `Tensor`.
+ """
+ with ops.op_scope([t, clip_value_min, clip_value_max], name,
+ "clip_by_value") as name:
+ t = ops.convert_to_tensor(t, name="t")
+
+    # Clip each element of t to the range [clip_value_min, clip_value_max].
+ t_min = math_ops.minimum(
+ t, array_ops.fill(array_ops.shape(t), clip_value_max))
+ t_max = math_ops.maximum(
+ t_min, array_ops.fill(array_ops.shape(t), clip_value_min),
+ name=name)
+
+ return t_max
+
+
+def clip_by_norm(t, clip_norm, name=None):
+ """Clips tensor values to a maximum L2-norm.
+
+ Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
+  normalizes `t` so that its L2-norm is less than or equal to `clip_norm`.
+ Specifically, if the L2-norm is already less than or equal to `clip_norm`,
+ then `t` is not modified. If the L2-norm is greater than `clip_norm`, then
+ this operation returns a tensor of the same type and shape as `t` with its
+ values set to:
+
+ `t * clip_norm / l2norm(t)`
+
+ In this case, the L2-norm of the output tensor is `clip_norm`.
+
+ This operation is typically used to clip gradients before applying them with
+ an optimizer.
+
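+  For example, a minimal sketch (illustrative values; the op is assumed to be
+  exposed as `tf.clip_by_norm`):
+
+  ```python
+  t = tf.constant([3.0, 4.0])  # l2norm(t) == 5.0
+  clipped = tf.clip_by_norm(t, clip_norm=1.0)
+  # clipped ==> [0.6, 0.8], whose L2-norm is 1.0
+  ```
+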
+ Args:
+ t: A `Tensor`.
+ clip_norm: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
+ name: A name for the operation (optional).
+
+ Returns:
+ A clipped `Tensor`.
+ """
+ with ops.op_scope([t, clip_norm], name, "clip_by_norm") as name:
+ t = ops.convert_to_tensor(t, name="t")
+
+ # Calculate L2-norm, clip elements by ratio of clip_norm to L2-norm
+ l2norm_inv = math_ops.rsqrt(
+ math_ops.reduce_sum(t * t, math_ops.range(0, array_ops.rank(t))))
+ tclip = array_ops.identity(t * clip_norm * math_ops.minimum(
+ l2norm_inv, constant_op.constant(1.0 / clip_norm)), name=name)
+
+ return tclip
+
+def global_norm(t_list, name=None):
+ """Computes the global norm of multiple tensors.
+
+ Given a tuple or list of tensors `t_list`, this operation returns the
+ global norm of the elements in all tensors in `t_list`. The global norm is
+ computed as:
+
+ `global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))`
+
+  Any entries in `t_list` that are `None` are ignored.
+
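+  For example, a minimal sketch (illustrative values; the op is assumed to be
+  exposed as `tf.global_norm`):
+
+  ```python
+  t_list = [tf.constant([3.0, 4.0]), tf.constant([12.0])]
+  norm = tf.global_norm(t_list)
+  # norm ==> 13.0, since sqrt(3**2 + 4**2 + 12**2) == 13.0
+  ```
+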
+ Args:
+ t_list: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
+ name: A name for the operation (optional).
+
+ Returns:
+ A 0-D (scalar) `Tensor` of type `float`.
+
+ Raises:
+ TypeError: If `t_list` is not a sequence.
+ """
+ if (not isinstance(t_list, collections.Sequence)
+ or isinstance(t_list, basestring)):
+ raise TypeError("t_list should be a sequence")
+ t_list = list(t_list)
+ with ops.op_scope(t_list, name, "global_norm") as name:
+ values = [
+ ops.convert_to_tensor(
+ t.values if isinstance(t, ops.IndexedSlices) else t,
+ name="t_%d" % i)
+ if t is not None else t
+ for i, t in enumerate(t_list)]
+    squared_norms = array_ops.pack(
+        [math_ops.reduce_sum(v * v) for v in values if v is not None])
+
+ norm = math_ops.sqrt(
+ math_ops.reduce_sum(squared_norms), name="global_norm")
+
+ return norm
+
+def clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None):
+ """Clips values of multiple tensors by the ratio of the sum of their norms.
+
+ Given a tuple or list of tensors `t_list`, and a clipping ratio `clip_norm`,
+ this operation returns a list of clipped tensors `list_clipped`
+ and the global norm (`global_norm`) of all tensors in `t_list`. Optionally,
+ if you've already computed the global norm for `t_list`, you can specify
+ the global norm with `use_norm`.
+
+ To perform the clipping, the values t_list[i] are set to:
+
+ `t_list[i] * clip_norm / max(global_norm, clip_norm)`
+
+ where:
+
+ `global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))`
+
+ If `clip_norm > global_norm` then the entries in `t_list` remain as they are,
+ otherwise they're all shrunk by the global ratio.
+
+  Any entries in `t_list` that are `None` are ignored.
+
+ This is the correct way to perform gradient clipping (for example, see
+ R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training
+ Recurrent Neural Networks". http://arxiv.org/abs/1211.5063)
+
+ However, it is slower than `clip_by_norm()` because all the parameters must be
+ ready before the clipping operation can be performed.
+
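+  For example, a minimal sketch of clipping gradients before applying them
+  (`loss`, `params` and `opt` are hypothetical names; the op is assumed to be
+  exposed as `tf.clip_by_global_norm`):
+
+  ```python
+  grads = tf.gradients(loss, params)
+  clipped_grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
+  train_op = opt.apply_gradients(zip(clipped_grads, params))
+  ```
+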
+ Args:
+ t_list: A tuple or list of mixed `Tensors`, `IndexedSlices`, or None.
+ clip_norm: A 0-D (scalar) `Tensor` > 0. The clipping ratio.
+ use_norm: A 0-D (scalar) `Tensor` of type `float` (optional). The global
+ norm to use. If not provided, `global_norm()` is used to compute the norm.
+ name: A name for the operation (optional).
+
+ Returns:
+    list_clipped: A list of `Tensors` of the same type as `t_list`.
+ global_norm: A 0-D (scalar) `Tensor` representing the global norm.
+
+ Raises:
+ TypeError: If `t_list` is not a sequence.
+ """
+ if (not isinstance(t_list, collections.Sequence)
+ or isinstance(t_list, basestring)):
+ raise TypeError("t_list should be a sequence")
+ t_list = list(t_list)
+ if use_norm is None:
+ use_norm = global_norm(t_list, name)
+
+ with ops.op_scope(t_list + [clip_norm], name, "clip_by_global_norm") as name:
+ # Calculate L2-norm, clip elements by ratio of clip_norm to L2-norm
+ scale = clip_norm * math_ops.minimum(
+ 1.0 / use_norm, constant_op.constant(1.0 / clip_norm))
+
+ values = [
+ ops.convert_to_tensor(
+ t.values if isinstance(t, ops.IndexedSlices) else t,
+ name="t_%d" % i)
+ if t is not None else t
+ for i, t in enumerate(t_list)]
+
+ values_clipped = [
+ array_ops.identity(v * scale, name="%s_%d" % (name, i))
+ if v is not None else None
+ for i, v in enumerate(values)]
+
+ list_clipped = [
+ ops.IndexedSlices(c_v, t.indices)
+ if isinstance(t, ops.IndexedSlices)
+ else c_v
+ for (c_v, t) in zip(values_clipped, t_list)]
+
+ return list_clipped, use_norm
+
+
+def clip_by_average_norm(t, clip_norm, name=None):
+ """Clips tensor values to a maximum average L2-norm.
+
+ Given a tensor `t`, and a maximum clip value `clip_norm`, this operation
+ normalizes `t` so that its average L2-norm is less than or equal to
+  `clip_norm`. Specifically, if the average L2-norm is already less than or
+ equal to `clip_norm`, then `t` is not modified. If the average L2-norm is
+ greater than `clip_norm`, then this operation returns a tensor of the same
+ type and shape as `t` with its values set to:
+
+ `t * clip_norm / l2norm_avg(t)`
+
+ In this case, the average L2-norm of the output tensor is `clip_norm`.
+
+ This operation is typically used to clip gradients before applying them with
+ an optimizer.
+
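+  For example, a minimal sketch (illustrative values; the op is assumed to be
+  exposed as `tf.clip_by_average_norm`):
+
+  ```python
+  t = tf.constant([3.0, 4.0])  # l2norm_avg(t) == 5.0 / 2 == 2.5
+  clipped = tf.clip_by_average_norm(t, clip_norm=1.0)
+  # clipped ==> [1.2, 1.6], whose average L2-norm is 1.0
+  ```
+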
+ Args:
+ t: A `Tensor`.
+ clip_norm: A 0-D (scalar) `Tensor` > 0. A maximum clipping value.
+ name: A name for the operation (optional).
+
+ Returns:
+ A clipped `Tensor`.
+ """
+ with ops.op_scope([t, clip_norm], name, "clip_by_average_norm") as name:
+ t = ops.convert_to_tensor(t, name="t")
+
+ # Calculate L2-norm per element, clip elements by ratio of clip_norm to
+ # L2-norm per element
+ n_element = math_ops.cast(array_ops.size(t), types.float32)
+ l2norm_inv = math_ops.rsqrt(
+ math_ops.reduce_sum(t * t, math_ops.range(0, array_ops.rank(t))))
+ tclip = array_ops.identity(
+ t * clip_norm * math_ops.minimum(
+ l2norm_inv * n_element, constant_op.constant(1.0 / clip_norm)),
+ name=name)
+
+ return tclip
diff --git a/tensorflow/python/ops/common_shapes.py b/tensorflow/python/ops/common_shapes.py
new file mode 100644
index 0000000000..c41d1ff71d
--- /dev/null
+++ b/tensorflow/python/ops/common_shapes.py
@@ -0,0 +1,371 @@
+"""A library of common shape functions."""
+import math
+
+from tensorflow.python.framework import tensor_shape
+
+
+def scalar_shape(unused_op):
+ """Shape function for ops that output a scalar value."""
+ return [tensor_shape.scalar()]
+
+
+def unchanged_shape(op):
+ """Shape function for ops that output an tensor like their first input."""
+ return [op.inputs[0].get_shape()]
+
+
+def unchanged_shape_with_rank(rank):
+ """Returns a shape function for ops that constrain the rank of their input.
+
+ Args:
+ rank: The exact rank of the input and output.
+
+ Returns:
+ A shape function for ops that output a tensor of the same size as their
+ input, with a particular rank.
+ """
+ def _ShapeFunction(op):
+ return [op.inputs[0].get_shape().with_rank(rank)]
+ return _ShapeFunction
+
+
+def unchanged_shape_with_rank_at_least(rank):
+ """Returns a shape function for ops that constrain the rank of their input.
+
+ Args:
+ rank: A lower bound on the rank of the input and output.
+
+ Returns:
+ A shape function for ops that output a tensor of the same size as their
+ input, with a particular rank.
+ """
+ def _ShapeFunction(op):
+ return [op.inputs[0].get_shape().with_rank_at_least(rank)]
+ return _ShapeFunction
+
+
+def unchanged_shape_with_rank_at_most(rank):
+ """Returns a shape function for ops that constrain the rank of their input.
+
+ Args:
+ rank: An upper bound on the rank of the input and output.
+
+ Returns:
+ A shape function for ops that output a tensor of the same size as their
+ input, with a particular rank.
+ """
+ def _ShapeFunction(op):
+ return [op.inputs[0].get_shape().with_rank_at_most(rank)]
+ return _ShapeFunction
+
+
+def matmul_shape(op):
+ """Shape function for a MatMul op."""
+ a_shape = op.inputs[0].get_shape().with_rank(2)
+ transpose_a = op.get_attr("transpose_a")
+ b_shape = op.inputs[1].get_shape().with_rank(2)
+ transpose_b = op.get_attr("transpose_b")
+ output_rows = a_shape[1] if transpose_a else a_shape[0]
+ output_cols = b_shape[0] if transpose_b else b_shape[1]
+ inner_a = a_shape[0] if transpose_a else a_shape[1]
+ inner_b = b_shape[1] if transpose_b else b_shape[0]
+ inner_a.assert_is_compatible_with(inner_b)
+ return [tensor_shape.TensorShape([output_rows, output_cols])]
+
+
+def bias_add_shape(op):
+ """Shape function for a BiasAdd op."""
+ input_shape = op.inputs[0].get_shape().with_rank_at_least(2)
+ bias_shape = op.inputs[1].get_shape().with_rank(1)
+ if input_shape.ndims is not None:
+ # Output has the same shape as input, and matches the length of
+ # bias in its last dimension.
+ output_shape = input_shape[0:-1].concatenate(
+ input_shape[-1].merge_with(bias_shape[0]))
+ else:
+ output_shape = tensor_shape.unknown_shape()
+ return [output_shape]
+
+
+def _Get2DOutputSize(input_height, input_width, filter_height, filter_width,
+ row_stride, col_stride, padding_type):
+ """Returns the number of rows and columns in a convolution/pooling output."""
+ input_height = tensor_shape.as_dimension(input_height)
+ input_width = tensor_shape.as_dimension(input_width)
+ filter_height = tensor_shape.as_dimension(filter_height)
+ filter_width = tensor_shape.as_dimension(filter_width)
+ row_stride = int(row_stride)
+ col_stride = int(col_stride)
+
+ if filter_height.value == 1 and filter_width.value == 1 and (
+ row_stride == 1 and col_stride == 1):
+ return input_height, input_width
+ else:
+ if filter_height > input_height or filter_width > input_width:
+ raise ValueError("filter must not be larger than the input: ",
+ "Filter: [", filter_height, "x", filter_width, "] ",
+ "Input: [", input_height, "x", input_width, "] ")
+ if row_stride > filter_height or col_stride > filter_width:
+ raise ValueError("stride must be less than or equal to filter size",
+ "stride: [", row_stride, "x", col_stride, "] ",
+ "filter: [", filter_height, "x", filter_width, "] ")
+
+ # Compute number of rows in the output, based on the padding.
+ if input_height.value is None or filter_height.value is None:
+ out_rows = None
+ elif padding_type == "VALID":
+ out_rows = int(
+ math.ceil((input_height.value - filter_height.value + 1.0)
+ / row_stride))
+ elif padding_type == "SAME":
+ out_rows = int(math.ceil(input_height.value * 1.0
+ / row_stride))
+ else:
+ raise ValueError("Invalid value for padding: %r" % padding_type)
+
+ # Compute number of columns in the output, based on the padding.
+ if input_width.value is None or filter_width.value is None:
+ out_cols = None
+ elif padding_type == "VALID":
+ out_cols = int(
+ math.ceil((input_width.value - filter_width.value + 1.0)
+ / col_stride))
+ elif padding_type == "SAME":
+ out_cols = int(math.ceil(input_width.value * 1.0 / col_stride))
+
+ return out_rows, out_cols
+
+
+def conv2d_shape(op):
+ """Shape function for a Conv2D op.
+
+ This op has two inputs:
+
+ * input, a 4D tensor with shape = [batch_size, rows, cols, depth_in]
+ * filter, a 4D tensor with shape = [filter_rows, filter_cols,
+ depth_in, depth_out]
+
+ The output is a 4D tensor with shape = [batch_size, out_rows,
+ out_cols, depth_out], where out_rows and out_cols depend on the
+ value of the op's "padding" and "strides" attrs.
+
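+  For example (illustrative numbers): with "SAME" padding the spatial output
+  size is ceil(input_size / stride), and with "VALID" padding it is
+  ceil((input_size - filter_size + 1) / stride), so a [10, 16, 16, 3] input
+  and a [3, 3, 3, 8] filter with stride 2 yield [10, 8, 8, 8] under "SAME"
+  and [10, 7, 7, 8] under "VALID".
+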
+ Args:
+ op: A Conv2D Operation.
+
+ Returns:
+ A list containing the Shape of the Conv2D output.
+
+ Raises:
+ ValueError: If the shapes of the input or filter are incompatible.
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ filter_shape = op.inputs[1].get_shape().with_rank(4)
+
+ batch_size = input_shape[0]
+ in_rows = input_shape[1]
+ in_cols = input_shape[2]
+
+ filter_rows = filter_shape[0]
+ filter_cols = filter_shape[1]
+ depth_out = filter_shape[3]
+ # Check that the input depths are compatible.
+ input_shape[3].assert_is_compatible_with(filter_shape[2])
+
+ stride_b, stride_r, stride_c, stride_d = op.get_attr("strides")
+ if stride_b != 1 or stride_d != 1:
+ raise ValueError("Current implementation does not yet support "
+ "strides in the batch and depth dimensions.")
+ if stride_r != stride_c:
+ # TODO(shlens): Add support for this.
+ raise ValueError("Current implementation only supports equal length "
+ "strides in the row and column dimensions.")
+
+ # TODO(mrry,shlens): Raise an error if the stride would cause
+ # information in the input to be ignored. This will require a change
+ # in the kernel implementation.
+ stride = stride_r
+ padding = op.get_attr("padding")
+ out_rows, out_cols = _Get2DOutputSize(
+ in_rows, in_cols, filter_rows, filter_cols, stride, stride, padding)
+
+ return [tensor_shape.TensorShape([batch_size, out_rows, out_cols, depth_out])]
+
+
+def separable_conv2d_shape(op):
+ """Shape function for a SeparableConv2D op.
+
+ This op has three inputs:
+
+ * input, a 4D tensor with shape = [batch_size, rows, cols, depth_in]
+
+ * depthwise_filter, a 4D tensor with shape = [filter_rows,
+ filter_cols, depth_in, depth_multiplier]
+
+ * pointwise_filter, a 4D tensor with shape = [1, 1, depth_in *
+ depth_multiplier, depth_out]
+
+ The output is a 4D tensor with shape = [batch_size, out_rows,
+ out_cols, depth_out], where out_rows and out_cols depend on the
+ value of the op's "padding" and "strides" attrs.
+
+ Args:
+ op: A SeparableConv2D Operation.
+
+ Returns:
+ A list containing the Shape of the SeparableConv2D output.
+
+ Raises:
+ ValueError: If the shapes of the input or filter are incompatible.
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ depthwise_filter_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.TensorShape([None, None, input_shape[3], None]))
+ pointwise_depth_in = depthwise_filter_shape[2] * depthwise_filter_shape[3]
+
+ pointwise_filter_shape = op.inputs[2].get_shape().merge_with(
+ tensor_shape.TensorShape([1, 1, pointwise_depth_in, None]))
+
+ batch_size = input_shape[0]
+ in_rows = input_shape[1]
+ in_cols = input_shape[2]
+
+ filter_rows = depthwise_filter_shape[0]
+ filter_cols = depthwise_filter_shape[1]
+ depth_out = pointwise_filter_shape[3]
+
+ stride_b, stride_r, stride_c, stride_d = op.get_attr("strides")
+ if stride_b != 1 or stride_d != 1:
+ raise ValueError("Current implementation does not yet support "
+ "strides in the batch and depth dimensions.")
+ if stride_r != stride_c:
+ # TODO(shlens): Add support for this.
+ raise ValueError("Current implementation only supports equal length "
+ "strides in the row and column dimensions.")
+
+ # TODO(mrry,shlens): Raise an error if the stride would cause
+ # information in the input to be ignored. This will require a change
+ # in the kernel implementation.
+ stride = stride_r
+ padding = op.get_attr("padding")
+ out_rows, out_cols = _Get2DOutputSize(
+ in_rows, in_cols, filter_rows, filter_cols, stride, stride, padding)
+
+ return [tensor_shape.TensorShape([batch_size, out_rows, out_cols, depth_out])]
+
+
+def avg_pool_shape(op):
+ """Shape function for an AvgPool op.
+
+ This op has one input:
+
+ * input, a 4D tensor with shape = [batch_size, rows, cols, depth]
+
+ The output is a 4D tensor with shape = [batch_size, out_rows,
+ out_cols, depth_out], where out_rows and out_cols depend on the
+ value of the op's "ksize", "strides", and "padding" attrs.
+
+ Args:
+ op: An AvgPool Operation.
+
+ Returns:
+ A single-element list containing the Shape of the AvgPool output.
+
+ Raises:
+ ValueError: If the shape of the input is invalid or incompatible with
+ the values of the attrs.
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ ksize_b, ksize_r, ksize_c, ksize_d = op.get_attr("ksize")
+ stride_b, stride_r, stride_c, stride_d = op.get_attr("strides")
+
+ batch_size = input_shape[0]
+ in_rows = input_shape[1]
+ in_cols = input_shape[2]
+ depth = input_shape[3]
+
+ if ksize_b != 1 or ksize_d != 1:
+ raise ValueError("Current implementation does not support pooling "
+ "in the batch and depth dimensions.")
+ if stride_b != 1 or stride_d != 1:
+ raise ValueError("Current implementation does not support strides "
+ "in the batch and depth dimensions.")
+
+ # TODO(mrry,shlens): Raise an error if the stride would cause
+ # information in the input to be ignored. This will require a change
+ # in the kernel implementation.
+ padding = op.get_attr("padding")
+
+ out_rows, out_cols = _Get2DOutputSize(
+ in_rows, in_cols, ksize_r, ksize_c, stride_r, stride_c, padding)
+
+ return [tensor_shape.TensorShape([batch_size, out_rows, out_cols, depth])]
+
+
+def max_pool_shape(op):
+ """Shape function for a MaxPool op.
+
+ This op has one input:
+
+ * input, a 4D tensor with shape = [batch_size, rows, cols, depth_in]
+
+ The output is a 4D tensor with shape = [batch_size, out_rows,
+ out_cols, depth_out], where out_rows, out_cols, and depth_out depend
+ on the value of the op's "ksize", "strides", and "padding" attrs.
+
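+  For example (illustrative), pooling a [32, 28, 28, 8] input spatially with
+  ksize [1, 2, 2, 1], strides [1, 2, 2, 1] and "SAME" padding produces
+  [32, 14, 14, 8], while depthwise pooling the same input with ksize
+  [1, 1, 1, 4] and strides [1, 1, 1, 4] produces [32, 28, 28, 2].
+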
+ Args:
+ op: A MaxPool Operation.
+
+ Returns:
+ A single-element list containing the Shape of the MaxPool output.
+
+ Raises:
+ ValueError: If the shape of the input is invalid or incompatible with
+ the values of the attrs.
+ """
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ ksize_b, ksize_r, ksize_c, ksize_d = op.get_attr("ksize")
+ stride_b, stride_r, stride_c, stride_d = op.get_attr("strides")
+
+ batch_size = input_shape[0]
+ in_rows = input_shape[1]
+ in_cols = input_shape[2]
+ depth = input_shape[3]
+
+ if ksize_b != 1:
+ raise ValueError("Current implementation does not support pooling "
+ "in the batch dimension.")
+ if stride_b != 1:
+ raise ValueError("Current implementation does not support strides "
+ "in the batch dimension.")
+
+ if not ((ksize_r == 1 and ksize_c == 1) or ksize_d == 1):
+ raise ValueError("MaxPooling supports exactly one of pooling across depth "
+ "or pooling across width/height.")
+
+ # TODO(mrry,shlens): Raise an error if the stride would cause
+ # information in the input to be ignored. This will require a change
+ # in the kernel implementation.
+ if ksize_d == 1:
+ padding = op.get_attr("padding")
+ out_rows, out_cols = _Get2DOutputSize(
+ in_rows, in_cols, ksize_r, ksize_c, stride_r, stride_c, padding)
+ return [tensor_shape.TensorShape([batch_size, out_rows, out_cols, depth])]
+ else:
+ if depth % ksize_d > 0:
+ raise ValueError("Depthwise max pooling requires the depth window "
+ "to evenly divide the input depth.")
+ if stride_d != ksize_d:
+ raise ValueError("Depthwise max pooling requires the depth window "
+ "to equal the depth stride.")
+ return [tensor_shape.TensorShape(
+ [batch_size, in_rows, in_cols, depth / ksize_d])]
+
+
+def no_outputs(unused_op):
+ """Shape function for use with ops that have no outputs."""
+ return []
+
+
+def unknown_shape(op):
+ """Shape function for use with ops whose output shapes are unknown."""
+ return [tensor_shape.unknown_shape() for _ in op.outputs]
diff --git a/tensorflow/python/ops/constant_op.py b/tensorflow/python/ops/constant_op.py
new file mode 100644
index 0000000000..7d9044b689
--- /dev/null
+++ b/tensorflow/python/ops/constant_op.py
@@ -0,0 +1,189 @@
+"""## Constant Value Tensors
+
+TensorFlow provides several operations that you can use to generate constants.
+
+@@zeros
+@@zeros_like
+
+@@ones
+@@ones_like
+
+@@fill
+
+@@constant
+
+## Sequences
+
+@@linspace
+
+@@range
+
+## Random Tensors
+
+TensorFlow has several ops that create random tensors with different
+distributions. The random ops are stateful, and create new random values each
+time they are evaluated.
+
+The `seed` keyword argument in these functions acts in conjunction with
+the graph-level random seed. Changing either the graph-level seed using
+[`set_random_seed`](constant_op.md#set_random_seed) or the op-level seed
+will change the underlying seed of these operations. Setting neither graph-level
+nor op-level seed results in a random seed for all operations.
+See [`set_random_seed`](constant_op.md#set_random_seed) for details on the
+interaction between operation-level and graph-level random seeds.
+
+### Examples:
+
+```python
+# Create a tensor of shape [2, 3] consisting of random normal values, with mean
+# -1 and standard deviation 4.
+norm = tf.random_normal([2, 3], mean=-1, stddev=4)
+
+# Shuffle the first dimension of a tensor
+c = tf.constant([[1, 2], [3, 4], [5, 6]])
+shuff = tf.random_shuffle(c)
+
+# Each time we run these ops, different results are generated
+sess = tf.Session()
+print sess.run(norm)
+print sess.run(norm)
+
+# Set an op-level seed to generate repeatable sequences across sessions.
+norm = tf.random_normal([2, 3], seed=1234)
+sess = tf.Session()
+print sess.run(norm)
+print sess.run(norm)
+```
+
+Another common use of random values is the initialization of variables. Also see
+the [Variables How To](../../how_tos/variables/index.md).
+
+```python
+# Use random uniform values in [0, 1) as the initializer for a variable of shape
+# [2, 3]. The default type is float32.
+var = tf.Variable(tf.random_uniform([2, 3]), name="var")
+init = tf.initialize_all_variables()
+
+sess = tf.Session()
+sess.run(init)
+print sess.run(var)
+```
+
+@@random_normal
+@@truncated_normal
+@@random_uniform
+@@random_shuffle
+@@set_random_seed
+
+"""
+"""Constant Operation.
+
+Has to be separate from array_ops to avoid a cyclic dependency.
+"""
+import tensorflow.python.platform
+import numpy as np
+
+from tensorflow.core.framework import attr_value_pb2
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+
+
+def constant(value, dtype=None, shape=None, name="Const"):
+ """Creates a constant tensor.
+
+ The resulting tensor is populated with values of type `dtype`, as
+ specified by arguments `value` and (optionally) `shape` (see examples
+ below).
+
+ The argument `value` can be a constant value, or a list of values of type
+ `dtype`. If `value` is a list, then the length of the list must be less
+ than or equal to the number of elements implied by the `shape` argument (if
+ specified). In the case where the list length is less than the number of
+ elements specified by `shape`, the last element in the list will be used
+ to fill the remaining entries.
+
+ The argument `shape` is optional. If present, it specifies the dimensions
+ of the resulting tensor. If not present, then the tensor is a scalar (0-D)
+ if `value` is a scalar, or 1-D otherwise.
+
+ If the argument `dtype` is not specified, then the type is inferred from
+ the type of `value`.
+
+ For example:
+
+ ```python
+ # Constant 1-D Tensor populated with value list.
+ tensor = tf.constant([1, 2, 3, 4, 5, 6, 7]) => [1 2 3 4 5 6 7]
+
+ # Constant 2-D tensor populated with scalar value -1.
+ tensor = tf.constant(-1.0, shape=[2, 3]) => [[-1. -1. -1.]
+ [-1. -1. -1.]]
+ ```
+
+ Args:
+ value: A constant value (or list) of output type `dtype`.
+
+ dtype: The type of the elements of the resulting tensor.
+
+ shape: Optional dimensions of resulting tensor.
+
+ name: Optional name for the tensor.
+
+ Returns:
+ A Constant Tensor.
+ """
+ g = ops.get_default_graph()
+ tensor_value = attr_value_pb2.AttrValue()
+ tensor_value.tensor.CopyFrom(
+ tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
+ dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
+ const_tensor = g.create_op(
+ "Const", [], [dtype_value.type],
+ attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
+ return const_tensor
+
+
+@ops.RegisterShape("Const")
+def _ConstantShape(op):
+ return [tensor_shape.TensorShape(
+ [d.size for d in op.get_attr("value").tensor_shape.dim])]
+
+
+ops.register_tensor_conversion_function((list, tuple), constant, 100)
+ops.register_tensor_conversion_function(np.ndarray, constant, 100)
+ops.register_tensor_conversion_function(np.generic, constant, 100)
+ops.register_tensor_conversion_function(object, constant, 200)
+
+def _tensor_shape_tensor_conversion_function(s, dtype=None, name=None):
+ if not s.is_fully_defined():
+ raise ValueError(
+ "Cannot convert a partially known TensorShape to a Tensor: %s" % s)
+ if dtype is not None:
+ if dtype not in (types.int32, types.int64):
+ raise TypeError("Cannot convert a TensorShape to dtype: %s" % dtype)
+ else:
+ dtype = types.int32
+ if name is None:
+ name = "shape_as_tensor"
+ return constant(s.as_list(), dtype=dtype, name=name)
+
+ops.register_tensor_conversion_function(
+ tensor_shape.TensorShape, _tensor_shape_tensor_conversion_function, 100)
+
+def _dimension_tensor_conversion_function(d, dtype=None, name=None):
+ if d.value is None:
+ raise ValueError("Cannot convert an unknown Dimension to a Tensor: %s" % d)
+ if dtype is not None:
+ if dtype not in (types.int32, types.int64):
+ raise TypeError("Cannot convert a TensorShape to dtype: %s" % dtype)
+ else:
+ dtype = types.int32
+ if name is None:
+ name = "shape_as_tensor"
+ return constant(d.value, dtype=dtype, name=name)
+
+ops.register_tensor_conversion_function(
+ tensor_shape.Dimension, _dimension_tensor_conversion_function, 100)
diff --git a/tensorflow/python/ops/control_flow_grad.py b/tensorflow/python/ops/control_flow_grad.py
new file mode 100644
index 0000000000..3a1a5b91c0
--- /dev/null
+++ b/tensorflow/python/ops/control_flow_grad.py
@@ -0,0 +1,100 @@
+"""Gradients for operators defined in control_flow_ops.py."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import math_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.control_flow_ops import *
+from tensorflow.python.ops.gen_control_flow_ops import *
+
+
+@ops.RegisterGradient("Switch")
+def _SwitchGrad(op, *grad):
+ op = GetRealOp(op)
+ ctxt = op._get_control_flow_context() # pylint: disable=protected-access
+ if isinstance(ctxt, WhileContext):
+ merge_op = ctxt.switch_map.get(op)
+ if merge_op:
+ merge_op._update_input(1, grad[1])
+ return None, None
+ else:
+ merge_op = merge(grad, name="b_switch")[0]
+ ctxt.switch_map[op] = merge_op.op
+ return merge_op, None
+ elif isinstance(ctxt, CondContext):
+ good_grad = grad[ctxt.branch]
+ zero_grad = grad[1 - ctxt.branch]
+ zero_grad = switch(zero_grad, ctxt.pred, name="grad_0")[1 - ctxt.branch]
+ return merge([good_grad, zero_grad], name="switch_grad")[0], None
+ else:
+ false_grad = switch(grad[0], op.inputs[1])[0]
+ true_grad = switch(grad[1], op.inputs[1])[1]
+ return merge([false_grad, true_grad])[0], None
+
+
+@ops.RegisterGradient("RefSwitch")
+def _RefSwitchGrad(op, *grad):
+ return _SwitchGrad(op, *grad)
+
+
+@ops.RegisterGradient("Merge")
+def _MergeGrad(op, grad, _):
+ op = GetRealOp(op)
+ input_op = op.inputs[0].op
+ # pylint: disable=protected-access
+ ctxt = input_op._get_control_flow_context()
+ # pylint: enable=protected-access
+ if isinstance(ctxt, WhileContext):
+ grad_ctxt = ctxt.grad_context
+ return switch(grad, grad_ctxt.pivot)
+ elif isinstance(ctxt, CondContext):
+ return switch(grad, ctxt.pred, name="merge_grad")
+ else:
+ num_inputs = len(op.inputs)
+ cond = [math_ops.equal(op.outputs[1], i) for i in xrange(num_inputs)]
+    return [switch(grad, cond[i])[1] for i in xrange(num_inputs)]
+
+
+@ops.RegisterGradient("Exit")
+def _ExitGrad(op, grad):
+ # pylint: disable=protected-access
+ forward_ctxt = op._get_control_flow_context()
+ # pylint: enable=protected-access
+ if not forward_ctxt.back_prop:
+ return None
+ grad_ctxt = forward_ctxt.grad_context
+ grad_ctxt.AddName(grad.name)
+ return enter(grad, grad_ctxt.name, is_constant=False,
+ parallel_iterations=forward_ctxt.parallel_iterations,
+ name="b_exit")
+
+
+@ops.RegisterGradient("NextIteration")
+def _NextIterationGrad(_, grad):
+ return next_iteration(grad)
+
+
+@ops.RegisterGradient("Enter")
+def _EnterGrad(op, grad):
+ op = GetRealOp(op)
+ # pylint: disable=protected-access
+ forward_ctxt = op._get_control_flow_context()
+ # pylint: enable=protected-access
+ grad_ctxt = forward_ctxt.grad_context
+ if grad_ctxt:
+ if op.get_attr("is_constant"):
+ # Add a gradient accumulator for every loop invariant.
+ result = grad_ctxt.AddBackPropAccumulateLoop(grad)
+ else:
+ result = exit(grad)
+ return result
+ else:
+ return grad
+
+
+@ops.RegisterGradient("RefEnter")
+def _RefEnterGrad(op, grad):
+ return _EnterGrad(op, grad)
+
+
+@ops.RegisterGradient("LoopCond")
+def _LoopCondGrad(_):
+ return None
diff --git a/tensorflow/python/ops/control_flow_ops.py b/tensorflow/python/ops/control_flow_ops.py
new file mode 100644
index 0000000000..068e3b5553
--- /dev/null
+++ b/tensorflow/python/ops/control_flow_ops.py
@@ -0,0 +1,1561 @@
+"""## Control Flow Operations
+
+TensorFlow provides several operations and classes that you can use to control
+the execution of operations and add conditional dependencies to your graph.
+
+@@identity
+@@tuple
+@@group
+@@no_op
+@@count_up_to
+
+## Logical Operators
+
+TensorFlow provides several operations that you can use to add logical operators
+to your graph.
+
+@@logical_and
+@@logical_not
+@@logical_or
+@@logical_xor
+
+## Comparison Operators
+
+TensorFlow provides several operations that you can use to add comparison
+operators to your graph.
+
+@@equal
+@@not_equal
+@@less
+@@less_equal
+@@greater
+@@greater_equal
+@@select
+@@where
+
+## Debugging Operations
+
+TensorFlow provides several operations that you can use to validate values and
+debug your graph.
+
+@@is_finite
+@@is_inf
+@@is_nan
+@@verify_tensor_all_finite
+@@check_numerics
+@@add_check_numerics_ops
+@@Assert
+@@Print
+"""
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import gen_control_flow_ops
+from tensorflow.python.ops import gen_array_ops
+from tensorflow.python.ops import logging_ops
+from tensorflow.python.ops import math_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.gen_control_flow_ops import *
+
+
+# We override the 'tuple' for a control flow op, so we keep python's
+# existing 'tuple' for later use in this module.
+_basetuple = tuple
+
+
+# pylint: disable=protected-access
+def _Identity(data, name=None):
+ """Return a tensor with the same shape and contents as the input tensor.
+
+ Args:
+ data: A Tensor.
+ name: A name for this operation (optional).
+
+ Returns:
+ A Tensor with the same type and value as the input Tensor.
+ """
+ if not data.dtype.is_ref_dtype:
+ return array_ops.identity(data, name=name)
+ else:
+ return gen_array_ops._ref_identity(data, name=name)
+
+
+def _Enter(data, frame_name, is_constant=False, parallel_iterations=10,
+ name=None):
+ """Creates or finds a child frame, and makes 'data' available to it.
+
+ The unique `frame_name` is used by the `Executor` to identify frames. If
+ `is_constant` is true, `output` is a constant in the child frame; otherwise
+ it may be changed in the child frame. At most `parallel_iterations` iterations
+ are run in parallel in the child frame.
+
+ Args:
+ data: The tensor to be made available to the child frame.
+ frame_name: The name of the child frame.
+ is_constant: If true, the output is constant within the child frame.
+ parallel_iterations: The number of iterations allowed to run in parallel.
+ name: A name for this operation (optional).
+
+ Returns:
+ The same tensor as 'data'.
+ """
+ if not data.dtype.is_ref_dtype:
+ return enter(data, frame_name, is_constant, parallel_iterations,
+ name=name)
+ else:
+ return ref_enter(data, frame_name, is_constant, parallel_iterations,
+ name=name)
+
+
+def exit(data, name=None):
+ """Exits the current frame to its parent frame.
+
+ Exit makes its input `data` available to the parent frame.
+
+ Args:
+ data: The tensor to be made available to the parent frame.
+ name: A name for this operation (optional).
+
+ Returns:
+ The same tensor as `data`.
+ """
+ return gen_control_flow_ops._exit(data, name)
+
+
+def switch(data, pred, name=None):
+ """Forwards `data` to an output determined by `pred`.
+
+ If `pred` is true, the `data` input is forwarded to `output_true`.
+ Otherwise, the data goes to `output_false`.
+
+ This op handles `Tensor`s and `IndexedSlices`.
+
+ Args:
+ data: The tensor to be forwarded to the appropriate output.
+ pred: A scalar that specifies which output port will receive data.
+ name: A name for this operation (optional).
+
+ Returns:
+ `(output_false, output_true)`: If `pred` is true, data will be forwarded to
+ `output_true`, otherwise it goes to `output_false`.
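+
+ For example, a minimal sketch (illustrative only; uses the modules imported
+ at the top of this file):
+
+ ```python
+ pred = math_ops.less(constant_op.constant(1), constant_op.constant(2))
+ out_false, out_true = switch(constant_op.constant(7), pred)
+ # Only the output selected by `pred` (here `out_true`) produces a value.
+ ```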
+ """
+ with ops.op_scope([data, pred], name, "Switch") as name:
+ data = ops.convert_to_tensor_or_indexed_slices(data, name="data")
+ pred = ops.convert_to_tensor(pred, name="pred")
+ if isinstance(data, ops.Tensor):
+ return gen_control_flow_ops._switch(data, pred, name=name)
+ else:
+ val, ind, dense_shape = data.values, data.indices, data.dense_shape
+ val_f, val_t = gen_control_flow_ops._switch(val, pred, name=name)
+ ind_f, ind_t = gen_control_flow_ops._switch(ind, pred, name="indices")
+ if dense_shape:
+ dense_shape_f, dense_shape_t = gen_control_flow_ops._switch(
+ dense_shape, pred, name="dense_shape")
+ else:
+ dense_shape_f, dense_shape_t = None, None
+ return (ops.IndexedSlices(val_f, ind_f, dense_shape_f),
+ ops.IndexedSlices(val_t, ind_t, dense_shape_t))
+
+
+def merge(inputs, name=None):
+ """Returns the value of an available element of `inputs`.
+
+ This op tests each of the tensors in `inputs` in turn to determine if any of
+ them is available. If it finds an available tensor, it returns it and its
+ index in `inputs`.
+
+ It is an error if more than one tensor in `inputs` is available. If no tensor
+ in `inputs` is available, the returned tensor and index are not set.
+
+ This op handles both `Tensor`s and `IndexedSlices`. If inputs has a mix of
+ `Tensor`s and `IndexedSlices`, all inputs are converted to IndexedSlices
+ before merging.
+
+ Args:
+ inputs: The input tensors, at most one of which is available.
+ name: A name for this operation (optional).
+
+ Returns:
+ A tuple containing the chosen input tensor and its index in `inputs`.
+
+ Raises:
+ ValueError: If inputs are IndexedSlices and some but not all have a
+ dense_shape property.
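+
+ For example, a minimal sketch (`data` and `pred` are hypothetical tensors):
+
+ ```python
+ out_false, out_true = switch(data, pred)
+ # Exactly one of the two Switch outputs is live; merge returns that value
+ # together with its index in the input list.
+ value, index = merge([out_false, out_true])
+ ```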
+ """
+ with ops.op_scope(inputs, name, "Merge") as name:
+ inputs = [ops.convert_to_tensor_or_indexed_slices(inp) for inp in inputs]
+ if all([isinstance(inp, ops.Tensor) for inp in inputs]):
+ return gen_control_flow_ops._merge(inputs, name=name)
+ else:
+ inputs = math_ops._as_indexed_slices_list(inputs)
+ values, _ = gen_control_flow_ops._merge([inp.values for inp in inputs],
+ name=name)
+ indices, chosen_index = gen_control_flow_ops._merge(
+ [inp.indices for inp in inputs], name="indices")
+ if any(inp.dense_shape for inp in inputs):
+ if not all(inp.dense_shape for inp in inputs):
+ raise ValueError("Either all merged IndexedSlices must have a "
+ "dense_shape, or none must have a dense_shape.")
+ dense_shape, _ = gen_control_flow_ops._merge(
+ [inp.dense_shape for inp in inputs], name="dense_shape")
+ else:
+ dense_shape = None
+ return ops.IndexedSlices(values, indices, dense_shape), chosen_index
+
+
+def _SwitchRefOrTensor(data, pred, name="Switch"):
+ """Forwards `data` to an output determined by `pred`.
+
+ If `pred` is true, the `data` input is forwarded to `output_true`.
+ Otherwise, the data goes to `output_false`.
+
+ This op handles `Tensor`s and `IndexedSlices`.
+
+ Args:
+ data: The tensor to be forwarded to the appropriate output.
+ pred: A scalar that specifies which output port will receive data.
+ name: A name for this operation (optional).
+
+ Returns:
+ `(output_false, output_true)`: If `pred` is true, data will be forwarded to
+ `output_true`, otherwise it goes to `output_false`.
+
+ Raises:
+ TypeError: if data is not a Tensor or IndexedSlices
+ """
+ data = ops.convert_to_tensor_or_indexed_slices(data, name="data")
+ if isinstance(data, ops.Tensor):
+ if not data.dtype.is_ref_dtype:
+ return switch(data, pred, name=name)
+ else:
+ return ref_switch(data, pred, name=name)
+ else:
+ return switch(data, pred, name=name)
+
+
+class ControlFlowOpInputs(object):
+ """An indirection to capture the input tensors needed in backprop."""
+
+ def __init__(self, op):
+ self._op = op
+ self._inputs = None
+
+ def __len__(self):
+ return len(self._op._inputs)
+
+ def __getitem__(self, index):
+ if self._inputs is None:
+ self._inputs = [None for _ in self._op.inputs]
+ if isinstance(index, int):
+ val = self._inputs[index]
+ if val is None:
+ f_val = self._op.inputs[index]
+ val = _GetRealValue(f_val)
+ self._inputs[index] = val
+ return val
+ elif isinstance(index, slice):
+ start, stop, step = index.indices(len(self))
+ vals = [self[i] for i in xrange(start, stop, step)]
+ return vals
+ else:
+ raise TypeError("index must be an integer or slice")
+
+
+class ControlFlowOpOutputs(object):
+ """An indirection to capture the output tensors needed in backprop."""
+
+ def __init__(self, op):
+ self._op = op
+ self._outputs = None
+
+ def __len__(self):
+ return len(self._op._outputs)
+
+ def __getitem__(self, index):
+ if self._outputs is None:
+ self._outputs = [None for _ in self._op.outputs]
+ if isinstance(index, int):
+ val = self._outputs[index]
+ if val is None:
+ f_val = self._op.outputs[index]
+ val = _GetRealValue(f_val)
+ self._outputs[index] = val
+ return val
+ elif isinstance(index, slice):
+ start, stop, step = index.indices(len(self))
+ vals = [self[i] for i in xrange(start, stop, step)]
+ return vals
+ else:
+ raise TypeError("index must be an integer or slice")
+
+
+class ControlFlowOpWrapper(object):
+ """A wrapper class for Operation."""
+
+ def __init__(self, op):
+ self._op = op
+ self._inputs = None
+ self._outputs = None
+
+ @property
+ def inputs(self):
+ if self._inputs is None:
+ self._inputs = ControlFlowOpInputs(self._op)
+ return self._inputs
+
+ @property
+ def outputs(self):
+ if self._outputs is None:
+ self._outputs = ControlFlowOpOutputs(self._op)
+ return self._outputs
+
+ @property
+ def op(self):
+ return self._op
+
+ @property
+ def name(self):
+ """Returns the name of this instance of op."""
+ return self._op.name
+
+ @property
+ def _id(self):
+ """Returns the unique id of this operation."""
+ return self._op._id
+
+ @property
+ def device(self):
+ """Returns the device of this operation.
+
+ Returns:
+ a string or None if the device was not set.
+ """
+ return self._op.device
+
+ @property
+ def output_types(self):
+ return self._op.output_types
+
+ @property
+ def input_types(self):
+ return self._op._input_types
+
+ @property
+ def type(self):
+ """Returns the type of the op."""
+ return self._op.type
+
+ @property
+ def graph(self):
+ """Returns the parent graph."""
+ return self._op.graph
+
+ def GetAttr(self, attr_name):
+ """Returns the value of attribute 'attr_name' of NodeDef."""
+ return self._op.get_attr(attr_name)
+
+ def _get_control_flow_context(self):
+ return self._op._get_control_flow_context()
+
+
+def GetRealOp(op):
+ while isinstance(op, ControlFlowOpWrapper):
+ op = op.op
+ return op
+
+
+def MakeWrapper(op):
+ """Make a wrapper for op if it is in a WhileContext."""
+ forward_ctxt = op._get_control_flow_context()
+ if forward_ctxt and isinstance(forward_ctxt, WhileContext):
+ return ControlFlowOpWrapper(op)
+ return op
+
+
+def EnterGradWhileContext(op):
+ """Enter the WhileContext for gradient computation."""
+ forward_ctxt = op._get_control_flow_context()
+ if forward_ctxt and isinstance(forward_ctxt, WhileContext):
+ grad_ctxt = forward_ctxt.CreateGradWhileContext()
+ grad_ctxt.Enter()
+
+
+def ExitGradWhileContext(op):
+ """Exit the WhileContext for gradient computation."""
+ forward_ctxt = op._get_control_flow_context()
+ if forward_ctxt and isinstance(forward_ctxt, WhileContext):
+ assert forward_ctxt.grad_context
+ forward_ctxt.grad_context.Exit()
+
+
+def _GetRealValue(value):
+ """Get the real value.
+
+ If backprop "uses" a value produced by forward inference, an
+ accumulator is added in the forward loop to accumulate its values,
+ so we use the accumulated value, indexed by the backprop counter.
+
+ Args:
+ value: A tensor to be captured.
+
+ Returns:
+ The same tensor value from the saved history.
+ """
+ real_value = value
+ forward_ctxt = value.op._get_control_flow_context()
+ real_value = forward_ctxt.history_map.get(value.name)
+ assert value.op.type != "Variable"
+ if real_value is None:
+ if value.op.type == "Enter" and value.op.get_attr("is_constant"):
+ # Use the input of this Enter node
+ real_value = GetRealOp(value.op).inputs[0]
+ else:
+ # Accumulate the history of this value.
+ # NOTE(yuanbyu): Don't accumulate for constants. One approach is
+ # to deepcopy the constants for the grad while context.
+ history_value = forward_ctxt.AddForwardAccumulateLoop(value)
+
+ # The shapes of the whole history and a single event element.
+ forward_ctxt.grad_context.Exit()
+ elem_rank = array_ops.rank(history_value) - 1
+ elem_rank_vec = array_ops.expand_dims(elem_rank, 0)
+ elem_shape = array_ops.slice(array_ops.shape(history_value), [1],
+ elem_rank_vec)
+ slice_shape = array_ops.concat(0, [[1], elem_shape])
+ forward_ctxt.grad_context.Enter()
+
+ # The begin position of the slice at slice_index.
+ slice_index = forward_ctxt.grad_context.index
+ b1 = array_ops.zeros(elem_rank_vec, dtype=types.int32)
+ b = array_ops.concat(0, [array_ops.expand_dims(slice_index, 0), b1])
+
+ # The slice at slice_index.
+ # TODO(irving): Replace with gather once that's GPU accelerated
+ real_value = array_ops.squeeze(
+ array_ops.slice(history_value,
+ b,
+ slice_shape,
+ name="real"),
+ squeeze_dims=[0])
+ forward_ctxt.history_map[value.name] = real_value
+ return real_value
+
+
+def IsLoopSwitch(op):
+ """Returns true if `op` is the Switch for a While loop."""
+ if op.type == "Switch":
+ ctxt = op._get_control_flow_context()
+ return ctxt and isinstance(ctxt, WhileContext)
+ return False
+
+
+class ControlFlowContext(object):
+ """The base class for control flow context.
+
+ The usage pattern is a sequence of (Enter, Exit) followed by a final
+ ExitResult.
+ """
+
+ def AddName(self, name):
+ self._values.add(name)
+
+ # pylint: disable=protected-access
+ def Enter(self):
+ """Enter the current context."""
+ self._outer_context = ops.get_default_graph()._get_control_flow_context()
+ ops.get_default_graph()._set_control_flow_context(self)
+
+ def Exit(self):
+ """Exit the current context."""
+ ops.get_default_graph()._set_control_flow_context(self._outer_context)
+ # pylint: enable=protected-access
+
+ def ExitResult(self, result):
+ """Make a list of tensors available in the outer context."""
+ if self._outer_context is not None:
+ for x in result:
+ self._outer_context.AddName(x.name)
+
+ def GetWhileContext(self):
+ """Get the current while context."""
+ if self._outer_context is not None:
+ return self._outer_context.GetWhileContext()
+ return None
+
+ def AddToWhileContext(self, op):
+ """Add a control dependency to the containing WhileContext.
+
+ The added control dependency ensures that the outputs of this op
+ belong to the WhileContext.
+
+ Args:
+ op: An operation.
+ """
+ while_ctxt = self.GetWhileContext()
+ if while_ctxt is not None:
+ # pylint: disable=protected-access
+ op._add_control_input(while_ctxt.GetControlPivot().op)
+ # pylint: enable=protected-access
+
+
+class CondContext(ControlFlowContext):
+ """The context for the conditional construct."""
+
+ def __init__(self, pred, pivot, branch):
+ self._pred = pred
+ self._outer_context = None
+ self._pivot = pivot
+ self._branch = branch
+ self._values = set()
+ self._values.add(pred.name)
+ self._values.add(pivot.name)
+ self._external_values = {}
+
+ @property
+ def pred(self):
+ return self._pred
+
+ @property
+ def pivot(self):
+ return self._pivot
+
+ @property
+ def branch(self):
+ return self._branch
+
+ def AddValue(self, val):
+ """Add 'val' to the current context and its outer context recursively."""
+ result = val
+ if val.name not in self._values:
+ self._values.add(val.name)
+ if self._outer_context is not None:
+ result = self._outer_context.AddValue(val)
+ result = with_dependencies([self._pivot], result)
+ self._external_values[val.name] = result
+ return result
+
+ def AddOp(self, op):
+ """Add 'op' to the current context."""
+ if not op.inputs:
+ # Add this op to the enclosing while context
+ self.AddToWhileContext(op)
+ # pylint: disable=protected-access
+ op._add_control_input(self._pivot.op)
+ # pylint: enable=protected-access
+ for x in op.outputs:
+ self._values.add(x.name)
+ else:
+ for index in range(len(op.inputs)):
+ x = op.inputs[index]
+ if x.name not in self._values:
+ self._values.add(x.name)
+ # Add this value to the parent contexts up to the context that
+ # creates this value.
+ real_x = x
+ if self._outer_context is not None:
+ real_x = self._outer_context.AddValue(x)
+ real_x = _SwitchRefOrTensor(real_x, self._pred)[self._branch]
+ self._external_values[x.name] = real_x
+ x = self._external_values.get(x.name)
+ if x is not None:
+ op._update_input(index, x)
+ for x in op.outputs:
+ self._values.add(x.name)
+
+ def BuildCondBranch(self, fn):
+ """Add the subgraph defined by fn() to the graph."""
+ r = fn()
+ result = []
+ if r is not None:
+ if not isinstance(r, list) and not isinstance(r, _basetuple):
+ r = [r]
+ for v in r:
+ if isinstance(v, ops.Operation):
+ v = with_dependencies([v], self._pivot)
+ elif v.name not in self._values:
+ self._values.add(v.name)
+ if self._outer_context is not None:
+ v = self._outer_context.AddValue(v)
+ v = _SwitchRefOrTensor(v, self._pred)[self._branch]
+ else:
+ external_v = self._external_values.get(v.name)
+ if external_v is not None:
+ v = external_v
+ result.append(v)
+ return result
+
+
+def cond(pred, fn1, fn2, name=None):
+ """Return either 'fn1()' or 'fn2()' based on the boolean predicate 'pred'.
+
+ `fn1` and `fn2` both return lists of output tensors. `fn1` and `fn2` must have
+ the same number and type of outputs.
+
+ Args:
+ pred: A scalar determining whether to return the result of `fn1` or `fn2`.
+ fn1: The function to be performed if pred is true.
+ fn2: The function to be performed if pred is false.
+ name: Optional name prefix for the returned tensors.
+
+ Returns:
+ Tensors returned by the call to either `fn1` or `fn2`. If the functions
+ return a singleton list, the element is extracted from the list.
+
+ Raises:
+ TypeError: if `fn1` or `fn2` is not callable.
+ ValueError: if `fn1` and `fn2` do not return the same number of tensors, or
+ return tensors of different types.
+
+ Example:
+ ```python
+ x = constant(2)
+ y = constant(5)
+ def f1(): return constant(17)
+ def f2(): return constant(23)
+ r = cond(math_ops.less(x, y), f1, f2)
+ # r is set to f1()
+ ```
+ """
+ with ops.op_scope([pred], name, "Cond") as name:
+ if not callable(fn1):
+ raise TypeError("fn1 must be callable.")
+ if not callable(fn2):
+ raise TypeError("fn2 must be callable.")
+
+ # Add the Switch to the graph.
+ p_2, p_1 = switch(pred, pred)
+ pivot_1 = array_ops.identity(p_1, name="switch_t")
+ pivot_2 = array_ops.identity(p_2, name="switch_f")
+ pred = array_ops.identity(pred, name="pred_id")
+
+ # Build the graph for the true branch in a new context.
+ context_t = CondContext(pred, pivot_1, 1)
+ context_t.Enter()
+ res_t = context_t.BuildCondBranch(fn1)
+ context_t.ExitResult(res_t)
+ context_t.Exit()
+
+ # Build the graph for the false branch in a new context.
+ context_f = CondContext(pred, pivot_2, 0)
+ context_f.Enter()
+ res_f = context_f.BuildCondBranch(fn2)
+ context_f.ExitResult(res_f)
+ context_f.Exit()
+
+ # Add the final merge to the graph.
+ if len(res_t) != len(res_f):
+ raise ValueError("fn1 and fn2 must return the same number of tensors.")
+ for x, y in zip(res_f, res_t):
+ assert ((isinstance(x, ops.IndexedSlices) and
+ isinstance(y, ops.IndexedSlices)) or
+ (isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor)))
+ val_x = x if isinstance(x, ops.Tensor) else x.values
+ val_y = y if isinstance(y, ops.Tensor) else y.values
+ if val_x.dtype.base_dtype != val_y.dtype.base_dtype:
+ raise ValueError("Outputs of fn1 and fn2 must have the same type: "
+ "%s, %s" % (val_x.dtype.name, val_y.dtype.name))
+ merges = [merge([x[0], x[1]])[0] for x in zip(res_f, res_t)]
+ return merges[0] if len(merges) == 1 else merges
+
+
+# TODO(yuanbyu): We should probably separate the notion of context so it
+# could be used not only for conditionals and loops but also subgraphs.
+class WhileContext(ControlFlowContext):
+ """The context for the loop construct."""
+
+ def __init__(self, parallel_iterations, back_prop, name):
+ self._name = ops.get_default_graph().unique_name(name)
+ self._parallel_iterations = parallel_iterations
+ self._back_prop = back_prop
+ self._outer_context = None
+ # We use this node to control constants created by the pred lambda.
+ self._pivot_for_pred = None
+ # We use this node to control constants created by the body lambda.
+ self._pivot_for_body = None
+ # The boolean tensor for the loop termination condition. Used in code
+ # generation for gradient computation.
+ self._pivot = None
+
+ # The tensors for the counters added by AddForwardCounterLoop or
+ # AddBackPropCounterLoop.
+ self._index = None
+
+ # Information needed by backprop
+ self._grad_context = None
+ self._total_iterations = None
+ self._history_map = {}
+ self._switch_map = {}
+
+ # values considered to have been already seen in this context
+ self._values = set()
+
+ # values referenced by but external to this context
+ self._external_values = {}
+
+ @property
+ def name(self):
+ return self._name
+
+ @property
+ def parallel_iterations(self):
+ """The number of iterations allowed to run in parallel."""
+ return self._parallel_iterations
+
+ @property
+ def back_prop(self):
+ """True iff backprop is enabled for this While loop."""
+ return self._back_prop
+
+ @property
+ def pivot(self):
+ """The boolean tensor representing the loop termination condition."""
+ return self._pivot
+
+ @property
+ def index(self):
+ """The loop index representing the current iteration."""
+ return self._index
+
+ @property
+ def grad_context(self):
+ """The corresponding WhileContext for gradient."""
+ return self._grad_context
+
+ @property
+ def history_map(self):
+ """The map that records all the tensors needed for backprop."""
+ return self._history_map
+
+ @property
+ def switch_map(self):
+ """The map that records all the Switch ops in the While loop."""
+ return self._switch_map
+
+ @property
+ def total_iterations(self):
+ """The total number of iterations of the while loop."""
+ return self._total_iterations
+
+ def GetWhileContext(self):
+ return self
+
+ def GetControlPivot(self):
+ if self._pivot_for_body:
+ return self._pivot_for_body
+ return self._pivot_for_pred
+
+ def AddValue(self, val):
+ """Add 'val' to the current context and its outer context recursively."""
+ result = val
+ if val.name not in self._values:
+ self._values.add(val.name)
+ if self._outer_context is not None:
+ result = self._outer_context.AddValue(val)
+ # Create an Enter that makes 'result' known to this context.
+ enter = _Enter(result, self._name, is_constant=True,
+ parallel_iterations=self._parallel_iterations)
+ self._values.add(enter.name)
+ self._external_values[val.name] = enter
+ result = enter
+ else:
+ actual_val = self._external_values.get(val.name)
+ if actual_val is not None:
+ result = actual_val
+ return result
+
+ def AddOp(self, op):
+ """Adds 'op' to the current context."""
+ if not op.inputs:
+ if not op.control_inputs:
+ # Add a control edge from the control pivot to this op.
+ # pylint: disable=protected-access
+ op._add_control_input(self.GetControlPivot().op)
+ # pylint: enable=protected-access
+ else:
+ # Control edges must be in the same context.
+ for x in op.control_inputs:
+ assert x._get_control_flow_context() == self, (
+ "Control inputs must come from Operations in the same while "
+ "loop context (not an outer context).")
+ for x in op.outputs:
+ self._values.add(x.name)
+ else:
+ for index in range(len(op.inputs)):
+ x = op.inputs[index]
+ self.AddValue(x)
+ real_x = self._external_values.get(x.name)
+ if real_x is not None:
+ op._update_input(index, real_x)
+ # Add a control dependency to prevent loop invariants from
+ # enabling ops that should not be executed.
+ if real_x.op.type == "RefEnter" and real_x.op.get_attr("is_constant"):
+ # pylint: disable=protected-access
+ op._add_control_input(self.GetControlPivot().op)
+ # pylint: enable=protected-access
+ for x in op.outputs:
+ self._values.add(x.name)
+
+ def CreateGradWhileContext(self):
+ """Creates the WhileContext for backprop gradient computation."""
+ if self._grad_context is None:
+ cnt = self.AddForwardCounterLoop()
+ self._grad_context = WhileContext(self._parallel_iterations,
+ self._back_prop, self._name)
+ self._grad_context.AddBackPropCounterLoop(cnt)
+ return self._grad_context
+
+ def AddForwardCounterLoop(self):
+ """Adds a loop that counts the number of iterations.
+
+ This is added to the forward loop at the time when we start to
+ create the loop for backprop gradient computation.
+
+ The pseudocode is:
+ `n = 0; while (_pivot) { n++; }`
+
+ Returns:
+ The number of iterations taken by the forward loop.
+ """
+ n = constant_op.constant(0, name="f_count")
+ self.Enter()
+ self.AddName(n.name)
+ enter_n = _Enter(n, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations,
+ name="f_count")
+ merge_n = merge([enter_n, enter_n])[0]
+ switch_n = switch(merge_n, self._pivot)
+ self._index = switch_n[1]
+
+ add_n = math_ops.add(self._index, 1)
+ next_n = next_iteration(add_n)
+ merge_n.op._update_input(1, next_n)
+
+ self._total_iterations = exit(switch_n[0], name="f_count")
+ self.Exit()
+ return self._total_iterations
+
+ def AddForwardAccumulateLoop(self, value):
+ """Add an accumulation loop for each value needed in backprop.
+
+ This is added to the forward loop at the first time when a value
+ in the forward loop is used by backprop gradient computation loop.
+
+ The pseudocode is:
+ ```
+ acc;
+ while (_pivot) {
+ if (index == 0) [value] else Concat(acc, [value]);
+ }
+ ```
+
+ Args:
+ value: The tensor that is accumulated.
+
+ Returns:
+ The accumulated history of value.
+
+ Raises:
+ ValueError: If the shape of "value" is not known statically.
+ """
+ if not value.get_shape().is_fully_defined():
+ raise ValueError("Must have known shape: %s" % value)
+ self._grad_context.Exit()
+ # TODO(irving): Now that acc starts out empty, most of the
+ # conditional logic can go away.
+ acc = constant_op.constant([],
+ value.dtype,
+ shape=[0] + value.get_shape().as_list(),
+ name="f_acc")
+ self.Enter()
+ self.AddName(acc.name)
+ enter_acc = _Enter(acc, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations,
+ name="f_acc")
+ merge_acc = merge([enter_acc, enter_acc])[0]
+ switch_acc = switch(merge_acc, self._pivot)
+
+ # If index = 0 then [value] else Concat(acc, [value]).
+ cond = math_ops.greater(self._index, 0)
+ switch_add_acc = switch(switch_acc[1], cond)
+ expand_value = array_ops.expand_dims(value, 0)
+ true_branch = array_ops.concat(0, [switch_add_acc[1], expand_value])
+ false_branch = array_ops.identity(switch_add_acc[0])
+ false_branch = with_dependencies([false_branch], expand_value)
+ add_acc = merge([false_branch, true_branch])[0]
+
+ next_acc = next_iteration(add_acc)
+ merge_acc.op._update_input(1, next_acc)
+
+ exit_acc = exit(switch_acc[0], name="f_acc")
+ self.Exit()
+ self._grad_context.Enter()
+ return exit_acc
+
+ def AddForwardAccumulateCondLoop(self, value):
+ """Add an accumulation loop for each conditional switch.
+
+ This is added to the forward loop at the first time when a conditional
+ switch in the forward loop is used by backprop gradient computation loop.
+
+ The pseudocode is:
+ ```
+ acc;
+ while (_pivot) {
+ Concat(acc, value);
+ }
+ ```
+
+ Args:
+ value: The boolean tensor that is accumulated.
+
+ Returns:
+ The accumulated history of value.
+ """
+ self._grad_context.Exit()
+ acc = constant_op.constant(False, name="f_acc")
+ self.Enter()
+ self.AddName(acc.name)
+ enter_acc = _Enter(acc, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations,
+ name="f_acc")
+ merge_acc = merge([enter_acc, enter_acc])[0]
+ switch_acc = switch(merge_acc, self._pivot)
+ acc = array_ops.concat(0, [switch_acc[1], value])
+ next_acc = next_iteration(acc)
+ merge_acc.op._update_input(1, next_acc)
+
+ exit_acc = exit(switch_acc[0], name="f_acc")
+ self.Exit()
+ self._grad_context.Enter()
+ return exit_acc
+
+ def AddBackPropCounterLoop(self, count):
+ """Add the backprop loop that controls the iterations.
+
+ This is added to the backprop loop. It is used to control the loop
+ termination and the slice index.
+
+ The pseudocode is:
+ `n = count; while (n >= 1) { n--; }`
+
+ Args:
+ count: The number of iterations for backprop.
+
+ Returns:
+ always 0.
+ """
+ one = constant_op.constant(1, name="b_count")
+ self.Enter()
+ self.AddName(count.name)
+ enter_count = _Enter(count, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations,
+ name="b_count")
+ merge_count = merge([enter_count, enter_count])[0]
+ self._pivot_for_pred = merge_count
+
+ cond = math_ops.greater_equal(merge_count, one)
+ self._pivot = loop_cond(cond, name="b_count")
+ switch_count = switch(merge_count, self._pivot)
+
+ # Add next_iteration right after Switch to match the gradient function.
+ next_count = next_iteration(switch_count[1])
+ self._pivot_for_body = next_count
+ self._index = math_ops.sub(next_count, one)
+ merge_count.op._update_input(1, self._index)
+
+ exit_count = exit(switch_count[0], name="b_count")
+ self.Exit()
+ return exit_count
+
+ def AddBackPropAccumulateLoop(self, value):
+ """Add an accumulation loop for every loop invariant.
+
+ This is added to the backprop loop. It is used to accumulate partial
+ gradients for each loop iteration. Called when in the while context
+ for gradient.
+
+ The pseudocode is:
+ ```
+ acc = 0;
+ while (_pivot) {
+ acc += value;
+ }
+ ```
+
+ Args:
+ value: The partial gradient of an iteration for a loop invariant.
+
+ Returns:
+ The gradient for a loop invariant.
+ """
+ self.Exit()
+ acc = constant_op.constant(0, value.dtype, name="b_acc")
+ self.Enter()
+ self.AddName(acc.name)
+ enter_acc = _Enter(acc, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations,
+ name="b_acc")
+ merge_acc = merge([enter_acc, enter_acc], name="b_acc")[0]
+ switch_acc = switch(merge_acc, self._pivot)
+
+ next_acc = next_iteration(switch_acc[1])
+ add_acc = math_ops.add(next_acc, value)
+ merge_acc.op._update_input(1, add_acc)
+
+ exit_acc = exit(switch_acc[0], name="b_acc")
+ return exit_acc
+
+ def BuildLoop(self, pred, body, loop_vars):
+ """Add the loop termination condition and body to the graph."""
+
+ loop_vars = ops.convert_n_to_tensor_or_indexed_slices(loop_vars)
+ # Let the context know the loop variables so the _Enter nodes below
+ # will be added into the context correctly.
+ self._values = set([x.name for x in loop_vars])
+ if self._outer_context is not None:
+ real_vars = [self._outer_context.AddValue(x) for x in loop_vars]
+ else:
+ real_vars = loop_vars
+ enter_vars = [_Enter(x, self._name, is_constant=False,
+ parallel_iterations=self._parallel_iterations)
+ for x in real_vars]
+ self._values = set([x.name for x in enter_vars])
+
+ merge_vars = [merge([x, x])[0] for x in enter_vars]
+ self._pivot_for_pred = merge_vars[0]
+
+ # Build the graph for pred.
+ c = ops.convert_to_tensor(pred(*merge_vars))
+ self._pivot = loop_cond(c, name="LoopCond")
+ switch_vars = [_SwitchRefOrTensor(x, self._pivot) for x in merge_vars]
+
+ # Build the graph for body.
+ vars_for_body = [_Identity(x[1]) for x in switch_vars]
+ self._pivot_for_body = vars_for_body[0]
+
+ body_result = body(*vars_for_body)
+ if not isinstance(body_result, (list, _basetuple)):
+ body_result = [body_result]
+ result = ops.convert_n_to_tensor_or_indexed_slices(body_result)
+ next_vars = [next_iteration(x) for x in result]
+
+ # Add the back edges to complete the loop.
+ assert len(merge_vars) == len(next_vars)
+ for x in zip(merge_vars, next_vars):
+ x[0].op._update_input(1, x[1])
+
+ # Add the exit ops.
+ exit_vars = [exit(x[0]) for x in switch_vars]
+
+ for m_var, n_var, e_var in zip(merge_vars, next_vars, exit_vars):
+ if m_var.get_shape().is_compatible_with(n_var.get_shape()):
+ e_var.set_shape(m_var.get_shape().merge_with(n_var.get_shape()))
+
+ # Exit the loop.
+ self.ExitResult(exit_vars)
+ self.Exit()
+ return exit_vars[0] if len(exit_vars) == 1 else exit_vars
+
+
+def While(cond, body, loop_vars, parallel_iterations=10, back_prop=True,
+ name=None):
+ """Repeat `body` while the condition `cond` is true.
+
+ `cond` is a function taking a list of tensors and returning a boolean scalar
+ tensor. `body` is a function taking a list of tensors and returning a list of
+ tensors of the same length and with the same types as the input. `loop_vars`
+ is a list of tensors that is passed to both `cond` and `body`.
+
+ While `cond` evaluates to true, `body` is executed.
+
+ Args:
+ cond: The termination condition of the loop.
+ body: A function that represents the loop body.
+ loop_vars: The list of variable input tensors.
+ parallel_iterations: The number of iterations allowed to run in parallel.
+ back_prop: Whether backprop is enabled for this while loop.
+ name: Optional name prefix for the returned tensors.
+
+ Returns:
+ The output tensors for the loop variables after the loop.
+
+ Raises:
+ TypeError: if `cond` or `body` is not callable.
+ ValueError: if `loop_vars` is empty.
+
+ Example:
+ ```python
+ i = Constant(0)
+ c = lambda i: math_ops.less(i, 10)
+ b = lambda i: math_ops.add(i, 1)
+ r = While(c, b, [i])
+ ```
+ """
+ with ops.op_scope(loop_vars, name, "While") as name:
+ if not loop_vars:
+ raise ValueError("No loop variables provided")
+ if not callable(cond):
+ raise TypeError("cond must be callable.")
+ if not callable(body):
+ raise TypeError("body must be callable.")
+
+ context = WhileContext(parallel_iterations, back_prop, name)
+ context.Enter()
+ return context.BuildLoop(cond, body, loop_vars)
+
+
+def _AsTensorList(x, p):
+ """Return x as a list of Tensors or IndexedSlices.
+
+ For entries of `x` that are Operations, this returns an Identity of `p`
+ with a dependency on the operation.
+
+ Args:
+ x: A Tensor/IndexedSlices/Operation or a list or tuple of them.
+ p: A Tensor to return for entries in `x` that are Operations.
+
+ Returns:
+ A list of Tensors or IndexedSlices.
+ """
+ if not isinstance(x, list) and not isinstance(x, _basetuple):
+ x = [x]
+
+ l = []
+ for v in x:
+ if isinstance(v, ops.Operation):
+ v = with_dependencies([v], p)
+ v = ops.convert_to_tensor_or_indexed_slices(v)
+ if isinstance(v, ops.Tensor):
+ l.append(array_ops.identity(v))
+ else:
+ l.append(ops.IndexedSlices(array_ops.identity(v.values),
+ array_ops.identity(v.indices)))
+ return l
+
+
+def _CheckResults(a, b):
+ assert len(a) == len(b), (
+ "Values returned by a() and b() must have the same length.")
+ for x, y in zip(a, b):
+ assert x.dtype == y.dtype, (
+ "Values returned by a() [%s] and b() [%s] must have "
+ "the same type: %s, %s." %
+ (x.name, y.name, x.dtype.name, y.dtype.name))
+
+
+def with_dependencies(dependencies, output_tensor, name=None):
+ """Produces the content of `output_tensor` only after `dependencies`.
+
+ In some cases, a user may want the output of an operation to be
+ consumed externally only after some other dependencies have run
+ first. This function returns `output_tensor`, but only after all
+ operations in `dependencies` have run. Note that this only constrains the
+ returned tensor: the operations that compute `output_tensor` itself may
+ still run before the `dependencies`.
+
+ See also `tuple` and `group`.
+
+ Args:
+ dependencies: A list of operations to run before this op finishes.
+ output_tensor: A `Tensor` or `IndexedSlices` that will be returned.
+ name: (Optional) A name for this operation.
+
+ Returns:
+ Same as `output_tensor`.
+
+ Raises:
+ TypeError: if `output_tensor` is not a `Tensor` or `IndexedSlices`.
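+
+ For example, a minimal sketch (`train_op` and `loss` are hypothetical,
+ pre-existing graph elements):
+
+ ```python
+ # The returned tensor has the value of `loss`, but is only produced
+ # after `train_op` has run.
+ gated_loss = with_dependencies([train_op], loss)
+ ```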
+ """
+ with ops.op_scope(dependencies + [output_tensor], name,
+ "control_dependency") as name:
+ with ops.device(output_tensor.device
+ or ops.get_default_graph().get_default_device()):
+ with ops.control_dependencies(dependencies):
+ output_tensor = ops.convert_to_tensor_or_indexed_slices(output_tensor)
+ if isinstance(output_tensor, ops.Tensor):
+ return _Identity(output_tensor, name=name)
+ else:
+ return ops.IndexedSlices(_Identity(output_tensor.values, name=name),
+ output_tensor.indices,
+ output_tensor.dense_shape)
+
+
+def _GroupControlDeps(dev, deps, name=None):
+ with ops.control_dependencies(deps):
+ if dev is None:
+ return no_op(name=name)
+ else:
+ with ops.device(dev):
+ return no_op(name=name)
+
+
+# TODO(mdevin): Accept "inputs" as a list.
+def group(*inputs, **kwargs):
+ """Create an op that groups multiple operations.
+
+ When this op finishes, all ops in `inputs` have finished. This op has no
+ output.
+
+ See also `tuple` and `with_dependencies`.
+
+ Args:
+ *inputs: One or more tensors to group.
+ **kwargs: Optional parameters to pass when constructing the NodeDef.
+ name: A name for this operation (optional).
+
+ Returns:
+ An Operation that executes all its inputs.
+
+ Raises:
+ ValueError: If an unknown keyword argument is provided, or if there are
+ no inputs.
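+
+ For example, a minimal sketch (`init_a` and `init_b` are hypothetical
+ operations):
+
+ ```python
+ # Running `init_all` runs both inputs; the returned op has no outputs.
+ init_all = group(init_a, init_b, name="init_all")
+ ```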
+ """
+ name = kwargs.pop("name", None)
+ if kwargs:
+ raise ValueError("Unknown keyword arguments: " + ", ".join(kwargs.keys()))
+ if not inputs:
+ # TODO(mdevin): Would make sense to return a NoOp.
+ raise ValueError("No inputs provided")
+ with ops.op_scope(inputs, name, "group_deps") as name:
+ # Group the inputs by their devices.
+ ops_on_device = {} # device -> operations specified on the device.
+ for inp in inputs:
+ dev = inp.device
+ if dev in ops_on_device:
+ ops_on_device[dev].append(inp)
+ else:
+ ops_on_device[dev] = [inp]
+ if len(ops_on_device) == 1:
+ # 1-level tree. The root node is the returned NoOp node.
+ dev, deps = ops_on_device.items()[0]
+ return _GroupControlDeps(dev, deps, name=name)
+ # 2-level tree. The root node is the returned NoOp node.
+ # deps contains 1 NoOp node for each device.
+ deps = []
+ for dev in sorted(ops_on_device.iterkeys()):
+ deps.append(_GroupControlDeps(dev, ops_on_device[dev]))
+ return _GroupControlDeps(None, deps, name=name)
+
+
+def tuple(tensors, name=None, control_inputs=None):
+ """Group tensors together.
+
+ This creates a tuple of tensors with the same values as the `tensors`
+ argument, except that the value of each tensor is only returned after the
+ values of all tensors have been computed.
+
+ `control_inputs` contains additional ops that have to finish before this op
+ finishes, but whose outputs are not returned.
+
+ This can be used as a "join" mechanism for parallel computations: all the
+ argument tensors can be computed in parallel, but the values of any tensor
+ returned by `tuple` are only available after all the parallel computations
+ are done.
+
+ See also `group` and `with_dependencies`.
+
+ Args:
+ tensors: A list of `Tensor`s or `IndexedSlices`, some entries can be `None`.
+ name: (optional) A name to use as a `name_scope` for the operation.
+ control_inputs: List of additional ops to finish before returning.
+
+ Returns:
+ Same as `tensors`.
+
+ Raises:
+ ValueError: If `tensors` does not contain any `Tensor` or `IndexedSlices`.
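+
+ For example, a minimal sketch (`a` and `b` are hypothetical tensors that
+ can be computed in parallel):
+
+ ```python
+ # Neither `a_out` nor `b_out` is available until both `a` and `b`
+ # have been computed.
+ a_out, b_out = tuple([a, b])
+ ```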
+
+ """
+ with ops.op_scope(tensors, name, "tuple") as name:
+ gating_ops = [t.op for t in tensors if t]
+ if control_inputs:
+ gating_ops += control_inputs
+ # Note that in order to ensure ordering in the pbtxt, we must take care to
+ # ensure the order here.
+ gating_ops = sorted(set(gating_ops), key=lambda op: op._id) # Uniquify ops.
+ if not gating_ops:
+ raise ValueError("Must have at least one Tensor: %s" % tensors)
+ gate = group(*gating_ops)
+ tpl = []
+ for t in tensors:
+ if t:
+ tpl.append(with_dependencies([gate], t))
+ else:
+ tpl.append(None)
+ return tpl
+
+
+# TODO(yuanbyu): It would be nicer if we could have the distributed list
+# support that Derek has been proposing.
+# TODO(yuanbyu, mrry): Handle stride to support sliding windows.
+def fold(fn, elems, elem_shape, name=None):
+ """The fold operator on slices of a tensor.
+
+ This fold operator applies the function `fn` to slices of `elems` on
+ dimension 0. The shape of the slices is specified by `elem_shape`. `elems`
+ must contain at least one slice (`shape(elems)[0] / elem_shape[0] > 0`).
+
+ Args:
+ fn: The function to be performed on each slice of the tensor.
+ elems: The tensor to whose slices we want to apply `fn`.
+ elem_shape: The shape definition for the slices.
+ name: Optional name prefix for the returned tensors.
+
+ Returns:
+ A tensor resulting from applying `fn` consecutively on each slice of
+ `elems`.
+
+ Raises:
+ TypeError: if `fn` is not callable.
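+
+ For example, a minimal sketch that sums a tensor one row at a time
+ (illustrative only):
+
+ ```python
+ elems = constant_op.constant([[1, 2], [3, 4], [5, 6]])
+ # r is the running elementwise sum of the [1, 2] row slices: [[9, 12]].
+ r = fold(lambda acc, x: math_ops.add(acc, x), elems, elem_shape=[1, 2])
+ ```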
+ """
+ with ops.op_scope([elems], name, "Fold") as name:
+ if not callable(fn):
+ raise TypeError("fn must be callable.")
+
+ s0 = array_ops.shape(elems)[0]
+ d0 = elem_shape[0]
+ n = math_ops.div(s0, d0)
+ b1 = array_ops.zeros(array_ops.expand_dims(array_ops.rank(elems) - 1, 0),
+ dtype=types.int32)
+ # Initialize the output with slice 0
+ b = array_ops.concat(0, [[0], b1])
+ o = array_ops.slice(elems, b, elem_shape)
+ i = ops.convert_to_tensor(d0)
+
+ def Compute(i, o):
+ b = array_ops.concat(0, [array_ops.expand_dims(i, 0), b1])
+ x = array_ops.slice(elems, b, elem_shape)
+ o = fn(o, x)
+ i = math_ops.add(i, d0)
+ return [i, o]
+ r = While(lambda i, o: math_ops.less(i, s0), Compute, [i, o])
+ return r[1]
+
+
+def case(pred_fn_pairs, default, exclusive=False, name="Case"):
+ """Create a Case operation.
+
+ The `pred_fn_pairs` parameter is a dict or list of pairs of size N.
+ Each pair contains a boolean scalar tensor and a python callable that
+ creates the tensors to be returned if the boolean evaluates to True. `default`
+ is a callable generating a list of tensors. All the callables in
+ `pred_fn_pairs` as well as `default` should return the same number and types
+ of tensors.
+
+ If `exclusive==True`, all predicates are evaluated, and a logging operation
+ with an error is returned if more than one of the predicates evaluates to
+ True. If `exclusive==False`, execution stops at the first predicate which
+ evaluates to True, and the tensors generated by the corresponding function
+ are returned immediately. If none of the predicates evaluate to True, this
+ operation returns the tensors generated by `default`.
+
+ Example 1:
+ Pseudocode:
+ ```
+ if (x < y) return 17;
+ else return 23;
+ ```
+
+ Expressions:
+ ```
+ f1 = lambda: Constant(17)
+ f2 = lambda: Constant(23)
+ r = Case([(math_ops.less(x, y), f1)], default=f2)
+ ```
+
+ Example 2:
+ Pseudocode:
+ ```
+ if (x < y && x > z) raise OpError("Only one predicate may evaluate true");
+ if (x < y) return 17;
+ else if (x > z) return 23;
+ else return -1;
+ ```
+
+ Expressions:
+ ```
+ def f1(): return Constant(17)
+ def f2(): return Constant(23)
+ def f3(): return Constant(-1)
+ r = Case({math_ops.less(x, y): f1, math_ops.greater(x, z): f2},
+ default=f3, exclusive=True)
+ ```
+
+ Args:
+ pred_fn_pairs: Dict or list of pairs of a boolean scalar tensor and a
+ callable which returns a list of tensors.
+ default: A callable that returns a list of tensors.
+ exclusive: True iff more than one predicate is allowed to evaluate to True.
+ name: A name for this operation (optional).
+
+ Returns:
+ The tensors returned by the first pair whose predicate evaluated to True, or
+ those returned by `default` if none does.
+
+ Raises:
+ TypeError: If `pred_fn_pairs` is not a list/dictionary.
+ TypeError: If `pred_fn_pairs` is a list but does not contain 2-tuples.
+ TypeError: If `fns[i]` is not callable for any i, or `default` is not
+ callable.
+ """
+ pfp = pred_fn_pairs # For readability
+ if not (isinstance(pfp, list) or isinstance(pfp, _basetuple)
+ or isinstance(pfp, dict)):
+ raise TypeError("fns must be a list, tuple, or dict")
+ if isinstance(pfp, dict):
+ pfp = pfp.items()
+ if not exclusive:
+ logging.warn("%s: Provided dictionary of predicate/fn pairs, but "
+ "exclusive=False. Order of conditional tests is "
+ "not guaranteed." % name)
+ for tup in pfp:
+ if not isinstance(tup, _basetuple) or len(tup) != 2:
+ raise TypeError("Each entry in pred_fn_pairs must be a 2-tuple")
+ pred, fn = tup
+ if pred.dtype != types.bool:
+ raise TypeError("pred must be of type bool: %s", pred.name)
+ if not callable(fn):
+ raise TypeError("fn for pred %s must be callable." % pred.name)
+ if not callable(default):
+ raise TypeError("default must be callable.")
+
+ preds, fns = map(list, zip(*pfp))
+ with ops.op_scope([[f() for f in fns] + preds + [default()]], name, "Case"):
+ if not preds:
+ return default()
+ not_preds = []
+ for i, p in enumerate(preds):
+ with ops.name_scope("not_%d" % i):
+ not_preds.append(math_ops.logical_not(p))
+ and_not_preds = [constant_op.constant(True, name="and_not_true")]
+ for i, notp in enumerate(not_preds[:-1]):
+ with ops.name_scope("and_not_%d" % i):
+ and_not_preds.append(math_ops.logical_and(and_not_preds[-1], notp))
+
+ # preds = [p1, p2, p3]
+ # fns = [f1, f2, f3]
+ # not_preds = [~p1, ~p2, ~p3]
+ # case_preds = [p1 & True,
+ # p2 & ~p1,
+ # p3 & ~p1 & ~ p2]
+ case_preds = []
+ for i, (p, and_not_p_prev) in enumerate(zip(preds, and_not_preds)):
+ with ops.name_scope("case_%d" % i):
+ case_preds.append(math_ops.logical_and(p, and_not_p_prev))
+
+ # case_sequence = [Cond(p3 & ..., f3, default),
+ # Cond(p2 & ..., f2, lambda: case_sequence[0]),
+ # ...
+ # Cond(p1 & True, f1, lambda: case_sequence[i-1])]
+ # and prev_case_seq will loop from case_sequence[0] to case_sequence[-1]
+ if exclusive:
+ # TODO(ebrevdo): Add Where() for DT_BOOL, replace with Size(Where(preds))
+ preds_c = array_ops.concat(0, preds, name="preds_c")
+ num_true_conditions = math_ops.reduce_sum(
+ math_ops.cast(preds_c, types.int32), name="num_true_conds")
+ at_most_one_true_condition = math_ops.less(
+ num_true_conditions, constant_op.constant(2, name="two_true_conds"))
+
+ error_msg = [
+ ("More than one condition evaluated as True but "
+ "exclusive=True. Conditions: (%s), Values:"
+ % ", ".join([p.name for p in preds])),
+ preds_c]
+ with ops.control_dependencies([
+ logging_ops.Assert(condition=at_most_one_true_condition,
+ data=error_msg, summarize=len(preds))]):
+ prev_case_seq = default()
+ for i, (cp, fn) in enumerate(zip(case_preds, fns)[::-1]):
+ prev_case_seq = cond(cp, fn, lambda: prev_case_seq, name="If_%d" % i)
+ else:
+ prev_case_seq = default()
+ for i, (cp, fn) in enumerate(zip(case_preds, fns)[::-1]):
+ prev_case_seq = cond(cp, fn, lambda: prev_case_seq, name="If_%d" % i)
+
+ return prev_case_seq
+
+
+ops.RegisterShape("Enter")(common_shapes.unchanged_shape)
+ops.RegisterShape("Exit")(common_shapes.unknown_shape)
+ops.RegisterShape("NextIteration")(common_shapes.unchanged_shape)
+ops.RegisterShape("RefEnter")(common_shapes.unchanged_shape)
+ops.RegisterShape("ControlTrigger")(common_shapes.no_outputs)
+ops.RegisterShape("NoOp")(common_shapes.no_outputs)
+
+
+@ops.RegisterShape("LoopCond")
+def _LoopCondShape(op):
+ """Shape function for the LoopCond op."""
+ return [op.inputs[0].get_shape().merge_with(tensor_shape.scalar())]
+
+
+@ops.RegisterShape("Merge")
+def _MergeShape(op):
+ """Shape function for the Merge op.
+
+ The Merge op takes many inputs of arbitrary shapes, and produces a
+ first output that is one of those inputs, and a second scalar
+ output.
+
+ This function conservatively assumes that if any of its inputs is
+ not fully defined, the output shape is unknown. If all of the inputs
+ have the exact same known shape, the output must have that shape.
+
+ Args:
+ op: A Merge Operation.
+
+ Returns:
+ A single-element list containing the Shape of the Merge op.
+
+ """
+ first_input_shape = op.inputs[0].get_shape()
+ if first_input_shape.is_fully_defined():
+ for input_ in op.inputs[1:]:
+ input_shape = input_.get_shape()
+ if (not input_shape.is_fully_defined()
+ or not input_shape.is_compatible_with(first_input_shape)):
+ return [tensor_shape.unknown_shape(), tensor_shape.scalar()]
+ return [first_input_shape, tensor_shape.scalar()]
+ else:
+ return [tensor_shape.unknown_shape(), tensor_shape.scalar()]
+
+
+@ops.RegisterShape("RefSelect")
+def _RefSelectShape(op):
+ """Shape function for the RefSelect op.
+
+ The RefSelect takes one scalar input and N inputs of arbitrary
+ shapes, and produces one output, which is one of those N inputs.
+
+ This function conservatively assumes that if any of the N inputs is
+ not fully defined, the output shape is unknown. If all of the N
+ inputs have the exact same known shape, the output must have that
+ shape.
+
+ Args:
+ op: A RefSelect Operation.
+
+ Returns:
+ A single-element list containing the Shape of the RefSelect op.
+ """
+ unused_shape = op.inputs[0].get_shape().merge_with(tensor_shape.scalar())
+ first_input_shape = op.inputs[1].get_shape()
+ if first_input_shape.is_fully_defined():
+ for input_ in op.inputs[2:]:
+ input_shape = input_.get_shape()
+ if (not input_shape.is_fully_defined()
+ or not input_shape.is_compatible_with(first_input_shape)):
+ return [tensor_shape.unknown_shape()]
+ return [first_input_shape]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("RefSwitch")
+@ops.RegisterShape("Switch")
+def _SwitchShape(op):
+ input_shape = op.inputs[0].get_shape()
+ unused_pred_shape = op.inputs[1].get_shape().merge_with(tensor_shape.scalar())
+ return [input_shape] * 2
diff --git a/tensorflow/python/ops/control_flow_ops_test.py b/tensorflow/python/ops/control_flow_ops_test.py
new file mode 100644
index 0000000000..34b1ab0a25
--- /dev/null
+++ b/tensorflow/python/ops/control_flow_ops_test.py
@@ -0,0 +1,88 @@
+"""Tests for control_flow_ops.py."""
+import tensorflow.python.platform
+
+from tensorflow.core.framework import graph_pb2
+from tensorflow.python.framework import ops
+from tensorflow.python.framework.test_util import TensorFlowTestCase
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import standard_ops as tf
+from tensorflow.python.platform import googletest
+
+
+class GroupTestCase(TensorFlowTestCase):
+
+ def _StripNode(self, nd):
+ snode = graph_pb2.NodeDef(name=nd.name, op=nd.op, input=nd.input)
+ if nd.device:
+ snode.device = nd.device
+ return snode
+
+ def _StripGraph(self, gd):
+ """Copy gd keeping only, node.name, node.op, node.input, and node.device."""
+ return graph_pb2.GraphDef(node=[self._StripNode(nd) for nd in gd.node])
+
+ def testGroup_NoDevices(self):
+ with ops.Graph().as_default() as g:
+ a = tf.constant(0, name="a")
+ b = tf.constant(0, name="b")
+ c = tf.constant(0, name="c")
+ tf.group(a.op, b.op, c.op, name="root")
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "a" op: "Const"}
+ node { name: "b" op: "Const"}
+ node { name: "c" op: "Const"}
+ node { name: "root" op: "NoOp" input: "^a" input: "^b" input: "^c" }
+ """, self._StripGraph(gd))
+
+ def testGroup_OneDevice(self):
+ with ops.Graph().as_default() as g:
+ with g.device("/task:0"):
+ a = tf.constant(0, name="a")
+ b = tf.constant(0, name="b")
+ tf.group(a.op, b.op, name="root")
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "a" op: "Const" device: "/task:0" }
+ node { name: "b" op: "Const" device: "/task:0" }
+ node { name: "root" op: "NoOp" input: "^a" input: "^b" device: "/task:0" }
+ """, self._StripGraph(gd))
+
+ def testGroup_MultiDevice(self):
+ with ops.Graph().as_default() as g:
+ with g.device("/task:0"):
+ a = tf.constant(0, name="a")
+ b = tf.constant(0, name="b")
+ with g.device("/task:1"):
+ c = tf.constant(0, name="c")
+ d = tf.constant(0, name="d")
+ with g.device("/task:2"):
+ tf.group(a.op, b.op, c.op, d.op, name="root")
+ gd = g.as_graph_def()
+ self.assertProtoEquals("""
+ node { name: "a" op: "Const" device: "/task:0"}
+ node { name: "b" op: "Const" device: "/task:0"}
+ node { name: "c" op: "Const" device: "/task:1"}
+ node { name: "d" op: "Const" device: "/task:1"}
+ node { name: "root/NoOp" op: "NoOp" input: "^a" input: "^b"
+ device: "/task:0" }
+ node { name: "root/NoOp_1" op: "NoOp" input: "^c" input: "^d"
+ device: "/task:1" }
+ node { name: "root" op: "NoOp" input: "^root/NoOp" input: "^root/NoOp_1"
+ device: "/task:2" }
+ """, self._StripGraph(gd))
+
+
+class ShapeTestCase(TensorFlowTestCase):
+
+ def testShape(self):
+ with ops.Graph().as_default():
+ tensor = tf.constant([1.0, 2.0])
+ self.assertEquals([2], tensor.get_shape())
+ self.assertEquals([2],
+ control_flow_ops.with_dependencies(
+ [tf.constant(1.0)], tensor).get_shape())
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/data_flow_grad.py b/tensorflow/python/ops/data_flow_grad.py
new file mode 100644
index 0000000000..d2473490ce
--- /dev/null
+++ b/tensorflow/python/ops/data_flow_grad.py
@@ -0,0 +1,37 @@
+"""Gradients for operators defined in data_flow_ops.py."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import data_flow_ops
+from tensorflow.python.ops import gen_data_flow_ops
+from tensorflow.python.ops import math_ops
+
+
+@ops.RegisterGradient("DynamicStitch")
+def _DynamicStitchGrads(op, grad):
+ """Gradients for DynamicStitch."""
+
+ num_values = len(op.inputs) / 2
+ indices_grad = [None] * num_values
+
+ def AsInt32(x):
+ return (x if op.inputs[0].dtype == types.int32 else
+ math_ops.cast(x, types.int32))
+ inputs = [AsInt32(op.inputs[i]) for i in range(num_values)]
+ if isinstance(grad, ops.IndexedSlices):
+ output_shape = array_ops.shape(op.outputs[0])
+ output_rows = output_shape[0]
+ grad = math_ops.unsorted_segment_sum(grad.values, grad.indices, output_rows)
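+ # Each value input's gradient is the output gradient gathered at that
+ # input's stitch indices; the indices inputs receive no gradient.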
+ values_grad = [array_ops.gather(grad, inp) for inp in inputs]
+ return indices_grad + values_grad
+
+
+ops.NoGradient("Queue")
+ops.NoGradient("QueueEnqueue")
+ops.NoGradient("QueueEnqueueMany")
+ops.NoGradient("QueueDequeue")
+ops.NoGradient("QueueDequeueMany")
+ops.NoGradient("QueueClose")
+ops.NoGradient("QueueSize")
diff --git a/tensorflow/python/ops/data_flow_ops.py b/tensorflow/python/ops/data_flow_ops.py
new file mode 100644
index 0000000000..5c8ab66297
--- /dev/null
+++ b/tensorflow/python/ops/data_flow_ops.py
@@ -0,0 +1,680 @@
+"""Data Flow Operations."""
+# pylint: disable=g-bad-name
+import re
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import random_seed
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import gen_data_flow_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_data_flow_ops import *
+
+
+def _as_type_list(dtypes):
+ """Convert dtypes to a list of types."""
+ assert dtypes is not None
+ if not (isinstance(dtypes, list) or isinstance(dtypes, tuple)):
+ # We have a single type.
+ return [dtypes]
+ else:
+ # We have a list or tuple of types.
+ return list(dtypes)
+
+
+def _as_shape_list(shapes, dtypes):
+ """Convert shapes to a list of tuples of int (or None)."""
+ if shapes is None: return None
+ if isinstance(shapes, tensor_shape.TensorShape):
+ shapes = [shapes]
+ if not isinstance(shapes, (tuple, list)):
+ raise TypeError(
+ "shapes must be a TensorShape or a list or tuple of TensorShapes.")
+ if all(isinstance(shape, int) for shape in shapes):
+ # We have a single shape.
+ shapes = [shapes]
+ shapes = [tensor_shape.as_shape(shape) for shape in shapes]
+ if any(not shape.is_fully_defined() for shape in shapes):
+ raise ValueError("All shapes must be fully defined.")
+ return shapes
+
+
+# pylint: disable=protected-access
+class QueueBase(object):
+ """Base class for queue implementations.
+
+ A queue is a TensorFlow data structure that stores tensors across
+ multiple steps, and exposes operations that enqueue and dequeue
+ tensors.
+
+ Each queue element is a tuple of one or more tensors, where each
+ tuple component has a static dtype, and may have a static shape. The
+ queue implementations support versions of enqueue and dequeue that
+ handle single elements, and versions that enqueue and dequeue a
+ batch of elements at once.
+
+ See [`tf.FIFOQueue`](#FIFOQueue) and
+ [`tf.RandomShuffleQueue`](#RandomShuffleQueue) for concrete
+ implementations of this class, and instructions on how to create
+ them.
+
+ @@enqueue
+ @@enqueue_many
+
+ @@dequeue
+ @@dequeue_many
+
+ @@size
+
+ @@close
+
+ """
+
+ def __init__(self, dtypes, shapes, queue_ref):
+ """Constructs a queue object from a queue reference.
+
+ Args:
+ dtypes: A list of types. The length of dtypes must equal the number
+ of tensors in each element.
+ shapes: Constraints on the shapes of tensors in an element:
+ A list of shape tuples or None. This list is the same length
+ as dtypes. If the shape of any tensor in the element is constrained,
+ all must be; shapes can be None if the shapes should not be constrained.
+ queue_ref: The queue reference, i.e. the output of the queue op.
+ """
+ self._dtypes = dtypes
+ if shapes is not None:
+ self._shapes = [tensor_shape.TensorShape(s) for s in shapes]
+ else:
+ self._shapes = [tensor_shape.unknown_shape() for _ in self._dtypes]
+ self._queue_ref = queue_ref
+ self._name = self._queue_ref.op.name.split("/")[-1]
+
+ @staticmethod
+ def from_list(index, queues):
+ """Create a queue using the queue reference from `queues[index]`.
+
+ Args:
+ index: An integer scalar tensor that determines the input that gets
+ selected.
+ queues: A list of `QueueBase` objects.
+
+ Returns:
+ A `QueueBase` object.
+
+ Raises:
+ TypeError: when `queues` is not a list of `QueueBase` objects,
+ or when the data types of `queues` are not all the same.
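+
+ For example, a minimal sketch (`q0`, `q1` and the scalar `index` tensor are
+ hypothetical):
+
+ ```python
+ # Selects q0 or q1 at run time based on the value of `index`.
+ q = QueueBase.from_list(index, [q0, q1])
+ ```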
+ """
+ if ((not queues) or
+ (not isinstance(queues, list)) or
+ (not all([isinstance(x, QueueBase) for x in queues]))):
+ raise TypeError("A list of queues expected")
+
+ dtypes = queues[0].dtypes
+ if not all([dtypes == q.dtypes for q in queues[1:]]):
+ raise TypeError("Queues do not have matching component dtypes.")
+
+ queue_refs = [x.queue_ref for x in queues]
+ selected_queue = control_flow_ops.ref_select(index, queue_refs)
+ # TODO(josh11b): Unify the shapes of the queues too?
+ return QueueBase(dtypes=dtypes, shapes=None, queue_ref=selected_queue)
+
+ @property
+ def queue_ref(self):
+ """The underlying queue reference."""
+ return self._queue_ref
+
+ @property
+ def name(self):
+ """The name of the underlying queue."""
+ return self._queue_ref.op.name
+
+ @property
+ def dtypes(self):
+ """The list of dtypes for each component of a queue element."""
+ return self._dtypes
+
+ def enqueue(self, vals, name=None):
+ """Enqueues one element to this queue.
+
+ If the queue is full when this operation executes, it will block
+ until the element has been enqueued.
+
+ Args:
+ vals: The tuple of `Tensor` objects to be enqueued.
+ name: A name for the operation (optional).
+
+ Returns:
+ The operation that enqueues a new tuple of tensors to the queue.
+ """
+ if name is None:
+ name = "%s_enqueue" % self._name
+ ret = gen_data_flow_ops._queue_enqueue(self._queue_ref, vals, name=name)
+
+ # NOTE(mrry): Not using a shape function because we need access to
+ # the Queue object.
+ for val, shape in zip(ret.inputs[1:], self._shapes):
+ val.get_shape().assert_is_compatible_with(shape)
+
+ return ret
+
+ def enqueue_many(self, vals, name=None):
+ """Enqueues zero or elements to this queue.
+
+ This operation slices each component tensor along the 0th dimension to
+ make multiple queue elements. All of the tensors in `vals` must have the
+ same size in the 0th dimension.
+
+ If the queue is full when this operation executes, it will block
+ until all of the elements have been enqueued.
+
+ Args:
+ vals: The tensor or tuple of tensors from which the queue elements
+ are taken.
+ name: A name for the operation (optional).
+
+ Returns:
+ The operation that enqueues a batch of tuples of tensors to the queue.
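+
+ For example, a minimal sketch (`q` is a hypothetical queue with a single
+ scalar float component):
+
+ ```python
+ # Enqueues three separate elements: 1.0, 2.0 and 3.0.
+ enqueue_op = q.enqueue_many([[1.0, 2.0, 3.0]])
+ ```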
+ """
+ if name is None:
+ name = "%s_EnqueueMany" % self._name
+
+ ret = gen_data_flow_ops._queue_enqueue_many(
+ self._queue_ref, vals, name=name)
+
+ # NOTE(mrry): Not using a shape function because we need access to
+ # the `QueueBase` object.
+ batch_dim = ret.inputs[1].get_shape()[0]
+ for val, shape in zip(ret.inputs[1:], self._shapes):
+ batch_dim.merge_with(val.get_shape()[0])
+ val.get_shape()[1:].assert_is_compatible_with(shape)
+
+ return ret
+
+ def dequeue(self, name=None):
+ """Dequeues one element from this queue.
+
+ If the queue is empty when this operation executes, it will block
+ until there is an element to dequeue.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ The tuple of tensors that was dequeued.
+ """
+ if name is None:
+ name = "%s_Dequeue" % self._name
+ ret = gen_data_flow_ops._queue_dequeue(
+ self._queue_ref, self._dtypes, name=name)
+
+ # NOTE(mrry): Not using a shape function because we need access to
+ # the `QueueBase` object.
+ op = ret[0].op
+ for output, shape in zip(op.values(), self._shapes):
+ output.set_shape(shape)
+
+ return ret if len(ret) != 1 else ret[0]
+
+ def dequeue_many(self, n, name=None):
+ """Dequeues and concatenates `n` elements from this queue.
+
+ This operation concatenates queue-element component tensors along
+ the 0th dimension to make a single component tensor. All of the
+ components in the dequeued tuple will have size `n` in the 0th dimension.
+
+ If the queue contains fewer than `n` elements when this operation
+ executes, it will block until `n` elements have been dequeued.
+
+ Args:
+ n: A scalar `Tensor` containing the number of elements to dequeue.
+ name: A name for the operation (optional).
+
+ Returns:
+ The tuple of concatenated tensors that was dequeued.
+ """
+ if name is None:
+ name = "%s_DequeueMany" % self._name
+
+ ret = gen_data_flow_ops._queue_dequeue_many(
+ self._queue_ref, n, self._dtypes, name=name)
+
+ # NOTE(mrry): Not using a shape function because we need access to
+ # the Queue object.
+ op = ret[0].op
+ batch_dim = tensor_shape.Dimension(tensor_util.ConstantValue(op.inputs[1]))
+ for output, shape in zip(op.values(), self._shapes):
+ output.set_shape(tensor_shape.TensorShape([batch_dim]).concatenate(shape))
+
+ return ret if len(ret) != 1 else ret[0]
+
+ def close(self, cancel_pending_enqueues=False, name=None):
+ """Closes this queue.
+
+ This operation signals that no more elements will be enqueued in
+ the given queue. Subsequent `enqueue` and `enqueue_many`
+ operations will fail. Subsequent `dequeue` and `dequeue_many`
+ operations will continue to succeed if sufficient elements remain
+ in the queue. Subsequent `dequeue` and `dequeue_many` operations
+ that would block will fail immediately.
+
+ If `cancel_pending_enqueues` is `True`, all pending requests will also
+ be cancelled.
+
+ Args:
+ cancel_pending_enqueues: (Optional.) A boolean, defaulting to
+ `False` (described above).
+ name: A name for the operation (optional).
+
+ Returns:
+ The operation that closes the queue.
+ """
+ if name is None:
+ name = "%s_Close" % self._name
+ return gen_data_flow_ops._queue_close(
+ self._queue_ref, cancel_pending_enqueues=cancel_pending_enqueues,
+ name=name)
+
+ def size(self, name=None):
+ """Compute the number of elements in this queue.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar tensor containing the number of elements in this queue.
+ """
+ if name is None:
+ name = "%s_Size" % self._name
+ return gen_data_flow_ops._queue_size(self._queue_ref, name=name)
+
+
+class RandomShuffleQueue(QueueBase):
+ """A queue implementation that dequeues elements in a random order.
+
+ See [`tf.QueueBase`](#QueueBase) for a description of the methods on
+ this class.
+
+ @@__init__
+ """
+
+ def __init__(self, capacity, min_after_dequeue, dtypes, shapes=None,
+ seed=None, shared_name=None, name="random_shuffle_queue"):
+ """Create a queue that dequeues elements in a random order.
+
+ A `RandomShuffleQueue` has bounded capacity; supports multiple
+ concurrent producers and consumers; and provides exactly-once
+ delivery.
+
+ A `RandomShuffleQueue` holds a list of up to `capacity`
+ elements. Each element is a fixed-length tuple of tensors whose
+ dtypes are described by `dtypes`, and whose shapes are optionally
+ described by the `shapes` argument.
+
+ If the `shapes` argument is specified, each component of a queue
+ element must have the respective fixed shape. If it is
+ unspecified, different queue elements may have different shapes,
+ but the use of `dequeue_many` is disallowed.
+
+ The `min_after_dequeue` argument allows the caller to specify a
+ minimum number of elements that will remain in the queue after a
+ `dequeue` or `dequeue_many` operation completes, to ensure a
+ minimum level of mixing of elements. This invariant is maintained
+ by blocking those operations until sufficient elements have been
+ enqueued. The `min_after_dequeue` argument is ignored after the
+ queue has been closed.
+
+ Args:
+ capacity: An integer. The upper bound on the number of elements
+ that may be stored in this queue.
+ min_after_dequeue: An integer (described above).
+ dtypes: A list of `DType` objects. The length of `dtypes` must equal
+ the number of tensors in each queue element.
+ shapes: (Optional.) A list of fully-defined `TensorShape` objects,
+ with the same length as `dtypes` or `None`.
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ shared_name: (Optional.) If non-empty, this queue will be shared under
+ the given name across multiple sessions.
+ name: Optional name for the queue operation.
+ """
+ dtypes = _as_type_list(dtypes)
+ shapes = _as_shape_list(shapes, dtypes)
+ seed1, seed2 = random_seed.get_seed(seed)
+ queue_ref = gen_data_flow_ops._random_shuffle_queue(
+ component_types=dtypes, shapes=shapes, capacity=capacity,
+ min_after_dequeue=min_after_dequeue, seed=seed1, seed2=seed2,
+ shared_name=shared_name, name=name)
+
+ super(RandomShuffleQueue, self).__init__(dtypes, shapes, queue_ref)
+
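+
+# Illustrative sketch of the min_after_dequeue behaviour described above
+# (the shapes argument form and the Session plumbing are assumptions, not
+# part of this module): with capacity=100 and min_after_dequeue=10, a
+# dequeue_many(5) only proceeds once at least 15 elements have been
+# enqueued, so 10 elements remain in the queue for continued shuffling.
+#
+#   q = RandomShuffleQueue(capacity=100, min_after_dequeue=10,
+#                          dtypes=[types.float32], shapes=[[]])
+#   batch = q.dequeue_many(5)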
+
+class FIFOQueue(QueueBase):
+ """A queue implementation that dequeues elements in first-in-first out order.
+
+ See [`tf.QueueBase`](#QueueBase) for a description of the methods on
+ this class.
+
+ @@__init__
+ """
+
+ def __init__(self, capacity, dtypes, shapes=None, shared_name=None,
+ name="fifo_queue"):
+ """Creates a queue that dequeues elements in a first-in first-out order.
+
+ A `FIFOQueue` has bounded capacity; supports multiple concurrent
+ producers and consumers; and provides exactly-once delivery.
+
+ A `FIFOQueue` holds a list of up to `capacity` elements. Each
+ element is a fixed-length tuple of tensors whose dtypes are
+ described by `dtypes`, and whose shapes are optionally described
+ by the `shapes` argument.
+
+ If the `shapes` argument is specified, each component of a queue
+ element must have the respective fixed shape. If it is
+ unspecified, different queue elements may have different shapes,
+ but the use of `dequeue_many` is disallowed.
+
+ Args:
+ capacity: An integer. The upper bound on the number of elements
+ that may be stored in this queue.
+ dtypes: A list of `DType` objects. The length of `dtypes` must equal
+ the number of tensors in each queue element.
+ shapes: (Optional.) A list of fully-defined `TensorShape` objects,
+ with the same length as `dtypes` or `None`.
+ shared_name: (Optional.) If non-empty, this queue will be shared under
+ the given name across multiple sessions.
+ name: Optional name for the queue operation.
+ """
+ dtypes = _as_type_list(dtypes)
+ shapes = _as_shape_list(shapes, dtypes)
+ queue_ref = gen_data_flow_ops._fifo_queue(
+ component_types=dtypes, shapes=shapes, capacity=capacity,
+ shared_name=shared_name, name=name)
+
+ super(FIFOQueue, self).__init__(dtypes, shapes, queue_ref)
+
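+
+# A minimal usage sketch (illustrative only; the input tensor and the
+# Session used to run these ops are assumed, not defined in this module):
+#
+#   q = FIFOQueue(capacity=10, dtypes=[types.float32])
+#   enqueue_op = q.enqueue([some_float_tensor])
+#   dequeued = q.dequeue()
+#   # Running enqueue_op and then dequeued in a Session returns the enqueued
+#   # values in first-in first-out order.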
+
+# TODO(josh11b): class BatchQueue(QueueBase):
+
+
+# pylint: disable=protected-access
+class LookupTableBase(object):
+ """Represents a lookup table that persists across different steps."""
+
+ def __init__(self, key_dtype, value_dtype, default_value, table_ref):
+ """Construct a table object from a table reference.
+
+ Args:
+ key_dtype: The key data type of the table.
+ value_dtype: The value data type of the table.
+ default_value: The scalar tensor to be used when a key is not present in
+ the table.
+ table_ref: The table reference, i.e. the output of the lookup table ops.
+ """
+ self._key_dtype = types.as_dtype(key_dtype)
+ self._value_dtype = types.as_dtype(value_dtype)
+ self._shapes = [tensor_shape.TensorShape([1])]
+ self._table_ref = table_ref
+ self._name = self._table_ref.op.name.split("/")[-1]
+ self._default_value = ops.convert_to_tensor(default_value,
+ dtype=self._value_dtype)
+ self._default_value.get_shape().merge_with(tensor_shape.scalar())
+
+ @property
+ def table_ref(self):
+ """Get the underlying table reference."""
+ return self._table_ref
+
+ @property
+ def key_dtype(self):
+ """The key dtype supported by the table."""
+ return self._key_dtype
+
+ @property
+ def value_dtype(self):
+ """The value dtype supported by the table."""
+ return self._value_dtype
+
+ @property
+ def name(self):
+ """The name of the table."""
+ return self._name
+
+ @property
+ def default_value(self):
+ """The default value of the table."""
+ return self._default_value
+
+ def size(self, name=None):
+ """Compute the number of elements in this table.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar tensor containing the number of elements in this table.
+ """
+ if name is None:
+ name = "%s_Size" % self._name
+ return gen_data_flow_ops._lookup_table_size(self._table_ref, name=name)
+
+ def lookup(self, keys, name=None):
+ """Returns the values for the given 'keys' tensor.
+
+ If an element of the keys tensor is not found in the table, the default_value
+ is used.
+
+ Args:
+ keys: The tensor for the keys.
+ name: Optional name for the op.
+
+ Returns:
+ The operation that looks up the keys.
+
+ Raises:
+ TypeError: when 'keys' or 'default_value' doesn't match the table data
+ types.
+ """
+ if name is None:
+ name = "%s_lookup_table_find" % self._name
+
+ if keys.dtype != self._key_dtype:
+ raise TypeError("Signature mismatch. Keys must be dtype %s, got %s." % (
+ self._key_dtype, keys.dtype))
+
+ return gen_data_flow_ops._lookup_table_find(
+ self._table_ref, keys, self._default_value, name=name)
+
+ def initialize_from(self, keys, values, name=None):
+ """Initialize the lookup table with the provided keys and values tensors.
+
+ Construct an initializer object from keys and value tensors.
+
+ Args:
+ keys: The tensor for the keys.
+ values: The tensor for the values.
+ name: Optional name for the op.
+
+ Returns:
+ The operation that initializes a lookup table.
+
+ Raises:
+ TypeError: when the 'keys' and 'values' data types do not match the table
+ key and value data types.
+ """
+ if name is None:
+ name = "%s_initialize_table" % self.name
+ with ops.op_scope([keys, values], None, name):
+ keys = ops.convert_to_tensor(keys, dtype=self.key_dtype, name="keys")
+ values = ops.convert_to_tensor(values, dtype=self.value_dtype,
+ name="values")
+
+ init_op = gen_data_flow_ops._initialize_table(
+ self.table_ref, keys, values, name=name)
+ ops.add_to_collection(ops.GraphKeys.TABLE_INITIALIZERS, init_op)
+ return init_op
+
+ def _check_table_dtypes(self, key_dtype, value_dtype):
+ """Check that the given key_dtype and value_dtype matches the table dtypes'.
+
+ Args:
+ key_dtype: The key data type to check.
+ value_dtype: The value data type to check.
+
+ Raises:
+ TypeError: when 'key_dtype' or 'value_dtype' doesn't match the table data
+ types.
+ """
+ if key_dtype != self.key_dtype:
+ raise TypeError("Invalid key dtype, expected %s but got %s." % (
+ self.key_dtype, key_dtype))
+ if value_dtype != self.value_dtype:
+ raise TypeError("Invalid value dtype, expected %s but got %s." % (
+ self.value_dtype, value_dtype))
+
+
+class HashTable(LookupTableBase):
+ """A generic hash table implementation."""
+
+ def __init__(self, key_dtype, value_dtype, default_value, shared_name=None,
+ name="hash_table"):
+ """Create a generic hash table.
+
+ A table holds key-value pairs. The key and value types are
+ described by key_dtype and value_dtype respectively.
+
+ Args:
+ key_dtype: The key data type of the table.
+ value_dtype: The value data type of the table.
+ default_value: The scalar tensor to be used when a key is not present in
+ the table.
+ shared_name: Optional. If non-empty, this table will be shared under
+ the given name across multiple sessions.
+ name: Optional name for the hash table op.
+
+ Returns:
+ A table object that can be used to lookup data.
+ """
+ table_ref = gen_data_flow_ops._hash_table(
+ shared_name=shared_name, key_dtype=key_dtype,
+ value_dtype=value_dtype, name=name)
+
+ super(HashTable, self).__init__(key_dtype, value_dtype, default_value,
+ table_ref)
+
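+
+# Illustrative sketch of table usage (the key/value tensors and the Session
+# run are assumptions, not part of this module): build the table, create its
+# initializer from key and value tensors, then look keys up; keys that are
+# missing map to default_value.
+#
+#   table = HashTable(types.string, types.int64, default_value=-1)
+#   init = table.initialize_from(["emerson", "lake", "palmer"], [0, 1, 2])
+#   ids = table.lookup(string_tensor)
+#   # Run init (or the op returned by initialize_all_tables()) once before
+#   # evaluating ids.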
+
+def initialize_all_tables(name="init_all_tables"):
+ """Returns an Op that initializes all tables of the default graph.
+
+ Returns:
+ An Op that initializes all tables. Note that if there are
+ no tables, the returned Op is a NoOp.
+ """
+ initializers = ops.get_collection(ops.GraphKeys.TABLE_INITIALIZERS)
+ if initializers:
+ return control_flow_ops.group(*initializers, name=name)
+ return control_flow_ops.no_op(name=name)
+
+
+ops.NoGradient("LookupTableFind")
+ops.NoGradient("LookupTableSize")
+ops.NoGradient("HashTable")
+ops.NoGradient("InitializeTable")
+
+
+ops.RegisterShape("QueueSize")(common_shapes.scalar_shape)
+ops.RegisterShape("Queue")(common_shapes.scalar_shape)
+ops.RegisterShape("FIFOQueue")(common_shapes.scalar_shape)
+ops.RegisterShape("RandomShuffleQueue")(common_shapes.scalar_shape)
+
+
+# NOTE(mrry): The following ops use higher-level information in the
+# Queue class to provide shape information.
+ops.RegisterShape("QueueDequeue")(common_shapes.unknown_shape)
+ops.RegisterShape("QueueDequeueMany")(common_shapes.unknown_shape)
+ops.RegisterShape("QueueEnqueue")(common_shapes.unknown_shape)
+ops.RegisterShape("QueueEnqueueMany")(common_shapes.unknown_shape)
+
+
+@ops.RegisterShape("QueueClose")
+def _ScalarToVoidShape(op):
+ """Shape function for ops that take a scalar and produce no outputs."""
+ unused_input_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ return []
+
+
+@ops.RegisterShape("DynamicPartition")
+def _DynamicPartitionShape(op):
+ """Shape function for data_flow_ops.dynamic_partition."""
+ data_shape = op.inputs[0].get_shape()
+ partitions_shape = op.inputs[1].get_shape()
+ # If we don't know the rank of partitions, we don't know anything
+ mid = partitions_shape.ndims
+ if mid is None:
+ result_shape = tensor_shape.unknown_shape()
+ else:
+ # data_shape must start with partitions_shape
+ partitions_shape.assert_is_compatible_with(data_shape[:mid])
+ # The partition shape is dynamic in the 0th dimension, and matches
+ # data_shape in the remaining dimensions.
+ result_shape = tensor_shape.TensorShape([None]).concatenate(
+ data_shape[mid:])
+ return [result_shape] * op.get_attr("num_partitions")
+
+
+@ops.RegisterShape("DynamicStitch")
+def _DynamicStitchShape(op):
+ """Shape function for data_flow_ops.dynamic_stitch."""
+ num_partitions = op.get_attr("N")
+ indices_shapes = [t.get_shape() for t in op.inputs[0:num_partitions]]
+ data_shapes = [t.get_shape() for t in op.inputs[num_partitions:]]
+ output_shape = tensor_shape.unknown_shape()
+ extra_shape = tensor_shape.TensorShape(None)
+ for indices_shape, data_shape in zip(indices_shapes, data_shapes):
+ indices_ndims = indices_shape.ndims
+ if indices_ndims is not None:
+ # Assert that data_shape starts with indices_shape
+ indices_shape.merge_with(data_shape[:indices_ndims])
+ # The rest belongs to output
+ extra_shape = extra_shape.merge_with(data_shape[indices_ndims:])
+ return [tensor_shape.TensorShape([None]).concatenate(extra_shape)]
+
+
+@ops.RegisterShape("LookupTableFind")
+def _LookupTableFindShape(op):
+ """Shape function for data_flow_ops._lookup_table_find."""
+ unused_table_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ shape_in = op.inputs[1].get_shape()
+ return [shape_in]
+
+
+@ops.RegisterShape("LookupTableSize")
+def _LookupTableSizeShape(op):
+ """Shape function for data_flow_ops._lookup_table_find."""
+ unused_table_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape("HashTable")
+def _HashTableShape(unused_op):
+ """Shape function for data_flow_ops._hash_table."""
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape("InitializeTable")
+def _InitializeLookupTableShape(op):
+ """Shape function for data_flow_ops._initialize_table."""
+ unused_table_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ keys_shape = op.inputs[1].get_shape().with_rank(1)
+ unused_values_shape = op.inputs[2].get_shape().merge_with(keys_shape)
+ return []
diff --git a/tensorflow/python/ops/embedding_ops.py b/tensorflow/python/ops/embedding_ops.py
new file mode 100644
index 0000000000..bc64593d23
--- /dev/null
+++ b/tensorflow/python/ops/embedding_ops.py
@@ -0,0 +1,197 @@
+"""Operations for embeddings."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import data_flow_ops
+from tensorflow.python.ops import math_ops
+
+
+def embedding_lookup(params, ids, name=None):
+ """Return a tensor of embedding values by looking up "ids" in "params".
+
+ Args:
+ params: List of tensors of the same shape. A single tensor is
+ treated as a singleton list.
+ ids: Tensor of integers containing the ids to be looked up in
+ 'params'. Let P be len(params). If P > 1, then the ids are
+ partitioned by id % P, and we do separate lookups in params[p]
+ for 0 <= p < P, and then stitch the results back together into
+ a single result tensor.
+ name: Optional name for the op.
+
+ Returns:
+ A tensor of shape ids.shape + params[0].shape[1:] containing the
+ values params[i % P][i // P] for each i in ids.
+
+ Raises:
+ ValueError: if some parameters are invalid.
+ """
+ if not isinstance(params, list):
+ params = [params]
+ with ops.op_scope(params + [ids], name, "embedding_lookup") as name:
+ if not params:
+ raise ValueError("Need at least one param")
+ np = len(params) # Number of partitions
+ params = ops.convert_n_to_tensor_or_indexed_slices(params, name="params")
+ if np == 1:
+ with ops.device(params[0].device):
+ return array_ops.gather(params[0], ids, name=name)
+ else:
+ ids = ops.convert_to_tensor(ids, name="ids")
+ flat_ids = array_ops.reshape(ids, [-1])
+ original_indices = math_ops.range(0, array_ops.size(flat_ids))
+ # Compute flat_ids % partitions for each id
+ ids_mod_p = flat_ids % np
+ if ids_mod_p.dtype != types.int32:
+ ids_mod_p = math_ops.cast(ids_mod_p, types.int32)
+ # Partition single list of ids based on ids % np into np separate lists
+ plist = data_flow_ops.dynamic_partition(flat_ids, ids_mod_p, np)
+ # Similarly, partition the original indices.
+ pindices = data_flow_ops.dynamic_partition(original_indices, ids_mod_p,
+ np)
+ # Do np separate lookups, finding embeddings for plist[p] in params[p]
+ partitioned_result = []
+ for p in range(np):
+ # TODO(agarwal): handle device allocations here and later in the
+ # colocate code.
+ gather_ids = plist[p] / np
+ with ops.device(params[p].device):
+ partitioned_result.append(array_ops.gather(params[p], gather_ids))
+ # Stitch these back together
+ ret = data_flow_ops.dynamic_stitch(pindices, partitioned_result,
+ name=name)
+ # Reshape to reverse the flattening of ids.
+ # It's important that we compute params[0].shape on the right device
+ # to avoid data motion.
+ with ops.device(params[0].device):
+ params_shape = array_ops.shape(params[0])
+ ret = array_ops.reshape(ret, array_ops.concat(0, [
+ array_ops.shape(ids), array_ops.slice(params_shape, [1], [-1])]))
+ # output shape = ids.shape + params[*].shape[1:]
+ # Normally the reshape is sufficient, but setting shape explicitly
+ # teaches shape inference that params[1:].get_shape() matters.
+ element_shape = params[0].get_shape()[1:]
+ for p in params[1:]:
+ element_shape = element_shape.merge_with(p.get_shape()[1:])
+ ret.set_shape(ids.get_shape().concatenate(element_shape))
+ return ret
+
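+
+# Worked sketch of the mod-partitioning above (illustrative; the shard
+# tensors are assumed): with P = 2 shards, ids are split by id % 2 and the
+# row fetched within a shard is id // 2.
+#
+#   params = [shard0, shard1]                      # P = 2
+#   ids = [0, 1, 2, 3]
+#   emb = embedding_lookup(params, ids)
+#   # id 0 -> shard0 row 0, id 1 -> shard1 row 0,
+#   # id 2 -> shard0 row 1, id 3 -> shard1 row 1,
+#   # and dynamic_stitch restores the original id order.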
+
+# TODO(lif): Add support for higher-rank SparseTensors
+def embedding_lookup_sparse(params, sp_ids, sp_weights,
+ name=None,
+ combiner="mean"):
+ """Computes embeddings for the given ids and weights.
+
+ This op assumes that there is at least one id for each row in the dense tensor
+ represented by sp_ids (i.e. there are no rows with empty features), and that
+ all the indices of sp_ids are in canonical row-major order.
+
+ It also assumes that all id values lie in the range [0, p0), where p0
+ is the sum of the size of params along dimension 0.
+
+ Args:
+ params: A single tensor representing the complete embedding tensor,
+ or a list of P tensors all of same shape except for the first dimension,
+ representing sharded embedding tensors. In the latter case, the ids are
+ partitioned by id % P, and we do separate lookups in params[p] for
+ 0 <= p < P, and then stitch the results back together into a single
+ result tensor. The first dimension is allowed to vary as the vocab
+ size is not necessarily a multiple of P.
+ sp_ids: N x M SparseTensor of int64 ids (typically from FeatureValueToId),
+ where N is typically batch size and M is arbitrary.
+ sp_weights: either a SparseTensor of float / double weights, or None to
+ indicate all weights should be taken to be 1. If specified, sp_weights
+ must have exactly the same shape and indices as sp_ids.
+ name: Optional name for the op.
+ combiner: A string specifying the reduction op. Currently "mean" and "sum"
+ are supported.
+ "sum" computes the weighted sum of the embedding results for each row.
+ "mean" is the weighted sum divided by the total weight.
+
+ Returns:
+ A dense tensor representing the combined embeddings for the
+ sparse ids. For each row in the dense tensor represented by sp_ids, the op
+ looks up the embeddings for all ids in that row, multiplies them by the
+ corresponding weight, and combines these embeddings as specified.
+
+ In other words, if
+ shape(combined params) = [p0, p1, ..., pm]
+ and
+ shape(sp_ids) = shape(sp_weights) = [d0, d1, ..., dn]
+ then
+ shape(output) = [d0, d1, ..., dn-1, p1, ..., pm].
+
+ For instance, if params is a 10x20 matrix, and sp_ids / sp_weights are
+
+ [0, 0]: id 1, weight 2.0
+ [0, 1]: id 3, weight 0.5
+ [1, 0]: id 0, weight 1.0
+ [2, 3]: id 1, weight 3.0
+
+ with combiner="mean", then the output will be a 3x20 matrix where
+ output[0, :] = (params[1, :] * 2.0 + params[3, :] * 0.5) / (2.0 + 0.5)
+ output[1, :] = params[0, :] * 1.0
+ output[2, :] = params[1, :] * 3.0
+
+ Raises:
+ TypeError: If sp_ids is not a SparseTensor, or if sp_weights is neither
+ None nor SparseTensor.
+ ValueError: If combiner is not one of {"mean", "sum"}.
+ """
+ if combiner not in ("mean", "sum"):
+ raise ValueError("combiner must be one of 'mean' or 'sum'")
+ if not isinstance(params, list):
+ params = [params]
+ if not isinstance(sp_ids, ops.SparseTensor):
+ raise TypeError("sp_ids must be SparseTensor")
+ ignore_weights = sp_weights is None
+ if not ignore_weights and not isinstance(sp_weights, ops.SparseTensor):
+ raise TypeError("sp_weights must be either None or SparseTensor")
+
+ with ops.op_scope(params + [sp_ids], name, "embedding_lookup_sparse") as name:
+ segment_ids = sp_ids.indices[:, 0]
+ if segment_ids.dtype != types.int32:
+ segment_ids = math_ops.cast(segment_ids, types.int32)
+
+ ids = sp_ids.values
+ if ignore_weights:
+ ids, idx = array_ops.unique(ids)
+ else:
+ idx = None
+
+ embeddings = embedding_lookup(params, ids)
+ if not ignore_weights:
+ weights = sp_weights.values
+ if weights.dtype != embeddings.dtype:
+ weights = math_ops.cast(weights, embeddings.dtype)
+
+ # Reshape weights to allow broadcast
+ ones = array_ops.fill(
+ array_ops.expand_dims(array_ops.rank(embeddings) - 1, 0), 1)
+ bcast_weights_shape = array_ops.concat(0, [
+ array_ops.shape(weights), ones])
+ weights = array_ops.reshape(weights, bcast_weights_shape)
+ embeddings *= weights
+
+ if combiner == "sum":
+ embeddings = math_ops.segment_sum(embeddings, segment_ids, name=name)
+ elif combiner == "mean":
+ embeddings = math_ops.segment_sum(embeddings, segment_ids)
+ weight_sum = math_ops.segment_sum(weights, segment_ids)
+ embeddings = math_ops.div(embeddings, weight_sum, name=name)
+ else:
+ assert False, "Unrecognized combiner"
+ else:
+ assert idx is not None
+ if combiner == "sum":
+ embeddings = math_ops.sparse_segment_sum(embeddings, idx, segment_ids,
+ name=name)
+ elif combiner == "mean":
+ embeddings = math_ops.sparse_segment_mean(embeddings, idx, segment_ids,
+ name=name)
+ else:
+ assert False, "Unrecognized combiner"
+
+ return embeddings
diff --git a/tensorflow/python/ops/gradients.py b/tensorflow/python/ops/gradients.py
new file mode 100644
index 0000000000..ffa7828c04
--- /dev/null
+++ b/tensorflow/python/ops/gradients.py
@@ -0,0 +1,661 @@
+"""Implements the graph generation for computation of gradients."""
+
+import collections
+import warnings
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+# pylint: disable=unused-import
+from tensorflow.python.ops import array_grad
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_grad
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import logging_ops
+from tensorflow.python.ops import linalg_grad
+from tensorflow.python.ops import math_grad
+# pylint: enable=unused-import
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import linalg_ops
+from tensorflow.python.platform import logging
+
+
+# Warn the user if we convert a sparse representation to dense with at
+# least this number of elements.
+_LARGE_SPARSE_NUM_ELEMENTS = 100000000
+
+
+def _IndexedSlicesToTensor(value, dtype=None, name=None):
+ """Converts an IndexedSlices object `value` to a Tensor.
+
+ NOTE(mrry): This function is potentially expensive.
+
+ Args:
+ value: An ops.IndexedSlices object.
+ dtype: The dtype of the Tensor to be returned.
+ name: Optional name to use for the returned Tensor.
+
+ Returns:
+ A dense Tensor representing the values in the given IndexedSlices.
+
+ Raises:
+ ValueError: If the IndexedSlices is not compatible with the requested dtype,
+ or if it does not have a known dense_shape.
+ """
+ if dtype and not dtype.is_compatible_with(value.dtype):
+ raise ValueError(
+ "Tensor conversion requested dtype %s for IndexedSlices with dtype %s"
+ % (dtype.name, value.dtype.name))
+ if value.dense_shape is None:
+ raise ValueError(
+ "Tensor conversion requested for IndexedSlices without dense_shape: %s"
+ % str(value))
+ # TODO(mrry): Consider adding static shape information to
+ # IndexedSlices, to avoid using numpy here.
+ dense_shape_value = tensor_util.ConstantValue(value.dense_shape)
+ if dense_shape_value is not None:
+ num_elements = np.prod(dense_shape_value)
+ if num_elements >= _LARGE_SPARSE_NUM_ELEMENTS:
+ warnings.warn(
+ "Converting sparse IndexedSlices to a dense Tensor with %d elements. "
+ "This may consume a large amount of memory." % num_elements)
+ else:
+ warnings.warn(
+ "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
+ "This may consume a large amount of memory.")
+ return math_ops.unsorted_segment_sum(
+ value.values, value.indices, value.dense_shape[0], name=name)
+
+
+ops.register_tensor_conversion_function(ops.IndexedSlices, _IndexedSlicesToTensor)
+
+
+def _MarkReachedOps(from_ops, reached_ops):
+ """Mark all ops reached from "from_ops".
+
+ Args:
+ from_ops: list of Operations.
+ reached_ops: list of booleans, indexed by operation id.
+ """
+ queue = collections.deque()
+ queue.extend(from_ops)
+ while queue:
+ op = queue.popleft()
+ if not reached_ops[op._id]:
+ reached_ops[op._id] = True
+ for output in op.outputs:
+ queue.extend(output.consumers())
+
+
+def _GatherInputs(to_ops, reached_ops):
+ """List all inputs of to_ops that are in reached_ops.
+
+ Args:
+ to_ops: list of Operations.
+ reached_ops: list of booleans, indexed by operation id.
+
+ Returns:
+ The list of all inputs of to_ops that are in reached_ops.
+ That list includes all elements of to_ops.
+ """
+ inputs = []
+ queue = collections.deque()
+ queue.extend(to_ops)
+ while queue:
+ op = queue.popleft()
+ # We are interested in this op.
+ if reached_ops[op._id]:
+ inputs.append(op)
+ # Clear the boolean so we won't add the inputs again.
+ reached_ops[op._id] = False
+ for inp in op.inputs:
+ queue.append(inp.op)
+ return inputs
+
+
+def _GetGradsDevice(op, colocate_gradients_with_ops):
+ """Gets the device to which to assign gradients of "op".
+
+ Args:
+ op: an Operation.
+ colocate_gradients_with_ops: If True, try colocating gradients with the
+ corresponding op.
+
+ Returns:
+ A device string.
+ """
+ if colocate_gradients_with_ops and op.device:
+ return op.device
+ else:
+ return op.graph.get_default_device()
+
+
+def _PendingCount(graph, to_ops, from_ops):
+ """Initialize the pending count for ops between two lists of Operations.
+
+ 'pending_count[op._id]' indicates the number of backprop inputs
+ to this operation.
+
+ Args:
+ graph: a Graph.
+ to_ops: list of Operations.
+ from_ops: list of Operations.
+
+ Returns:
+ A tuple containing: (1) a list of integers indexed by operation id,
+ indicating the number of backprop inputs to this operation, and (2)
+ a boolean which is True if any of the ops in between from_ops and to_ops
+ contain control flow loops.
+ """
+ # Mark reachable ops from from_ops.
+ reached_ops = [False] * (graph._last_id + 1)
+ for op in to_ops:
+ reached_ops[op._id] = True
+ _MarkReachedOps(from_ops, reached_ops)
+
+ # Mark between ops.
+ between_ops = [False] * (graph._last_id + 1)
+ between_op_list = []
+ queue = collections.deque()
+ queue.extend(to_ops)
+ while queue:
+ op = queue.popleft()
+ # We are interested in this op.
+ if reached_ops[op._id]:
+ between_ops[op._id] = True
+ between_op_list.append(op)
+ # Clear the boolean so we won't add the inputs again.
+ reached_ops[op._id] = False
+ for inp in op.inputs:
+ queue.append(inp.op)
+
+ # Initialize pending count for between ops.
+ pending_count = [0] * (graph._last_id + 1)
+ has_control_flow = False
+ for op in between_op_list:
+ for x in op.inputs:
+ if between_ops[x.op._id]:
+ pending_count[x.op._id] += 1
+ for x in op.control_inputs:
+ if between_ops[x._id]:
+ pending_count[x._id] += 1
+ if op.type == "Exit":
+ has_control_flow = True
+ return pending_count, has_control_flow
+
+
+def _AsList(x):
+ return x if isinstance(x, (list, tuple)) else [x]
+
+
+def _DefaultGradYs(grad_ys, ys, colocate_gradients_with_ops):
+ """Fill in default values for grad_ys.
+
+ Args:
+ grad_ys: List of gradients, can contain None.
+ ys: List of tensors.
+ colocate_gradients_with_ops: If True, try colocating gradients with
+ the corresponding op.
+
+ Returns:
+ A list of gradients to use, without None.
+
+ Raises:
+ ValueError: If one of the grad_ys is invalid.
+ """
+ if len(grad_ys) != len(ys):
+ raise ValueError("Passed %d grad_ys for %d ys" % (len(grad_ys), len(ys)))
+ grad_ys = ops.convert_n_to_tensor_or_indexed_slices(grad_ys, name="grad_y")
+ for i in xrange(len(grad_ys)):
+ grad_y = grad_ys[i]
+ y = ys[i]
+ if grad_y is None:
+ with ops.device(_GetGradsDevice(y.op, colocate_gradients_with_ops)):
+ grad_ys[i] = array_ops.fill(array_ops.shape(y),
+ constant_op.constant(1, dtype=y.dtype))
+ else:
+ if grad_y.dtype != y.dtype:
+ raise ValueError("Y and ys_grad must be of the same type, "
+ "not y: %s, ys_grad: %s " %
+ (types.as_dtype(y.dtype).name,
+ types.as_dtype(grad_y.dtype).name))
+ return grad_ys
+
+
+def _VerifyGeneratedGradients(grads, op):
+ """Verify that gradients are valid in number and type.
+
+ Args:
+ grads: List of generated gradients.
+ op: Operation for which the gradients were generated.
+
+ Raises:
+ ValueError: if the gradients are invalid.
+ """
+ if len(grads) != len(op.inputs):
+ raise ValueError("Num gradients %d generated for op %s do not match num "
+ "inputs %d" % (len(grads), op.node_def, len(op.inputs)))
+ for i in xrange(len(grads)):
+ grad = grads[i]
+ inp = op.inputs[i]
+ if grad is not None:
+ if not grad.dtype.is_compatible_with(inp.dtype):
+ raise ValueError(
+ "Gradient type %s generated for op %s does "
+ "not match input type %s" %
+ (types.as_dtype(grad.dtype).name, op.node_def,
+ types.as_dtype(inp.dtype).name))
+
+
+def _StopOps(from_ops, pending_count):
+ """The set of ops that terminate the gradient computation.
+
+ This computes the frontier of the forward graph *before* which backprop
+ should stop. Operations in the returned set will not be differentiated.
+ This set is defined as the subset of `from_ops` containing ops that have
+ no predecessor in `from_ops`. `pending_count` is the result of
+ `_PendingCount(g, xs, from_ops)`. An 'op' has predecessors in `from_ops`
+ iff pending_count[op._id] > 0.
+
+ Args:
+ from_ops: list of Operations.
+ pending_count: List of integers, indexed by operation id.
+
+ Returns:
+ The set of operations.
+ """
+ stop_ops = set()
+ for op in from_ops:
+ is_stop_op = True
+ for inp in op.inputs:
+ if pending_count[inp.op._id] > 0:
+ is_stop_op = False
+ break
+ if is_stop_op:
+ stop_ops.add(op._id)
+ return stop_ops
+
+
+def gradients(ys, xs, grad_ys=None, name="gradients",
+ colocate_gradients_with_ops=False,
+ gate_gradients=False,
+ aggregation_method=None):
+ """Constructs symbolic partial derivatives of `ys` w.r.t. x in `xs`.
+
+ `ys` and `xs` are each a `Tensor` or a list of tensors. `grad_ys`
+ is a list of `Tensor`, holding the gradients received by the
+ `ys`. The list must be the same length as `ys`.
+
+ `gradients()` adds ops to the graph to output the partial
+ derivatives of `ys` with respect to `xs`. It returns a list of
+ `Tensor` of length `len(xs)` where each tensor is the `sum(dy/dx)`
+ for y in `ys`.
+
+ `grad_ys` is a list of tensors of the same length as `ys` that holds
+ the initial gradients for each y in `ys`. When `grad_ys` is None,
+ we fill in a tensor of '1's of the shape of y for each y in `ys`. A
+ user can provide their own initial `grad_ys` to compute the
+ derivatives using a different initial gradient for each y (e.g., if
+ one wanted to weight the gradient differently for each value in
+ each y).
+
+ Args:
+ ys: A `Tensor` or list of tensors to be differentiated.
+ xs: A `Tensor` or list of tensors to be used for differentiation.
+ grad_ys: Optional. A `Tensor` or list of tensors the same size as
+ `ys` and holding the gradients computed for each y in `ys`.
+ name: Optional name to use for grouping all the gradient ops together.
+ Defaults to 'gradients'.
+ colocate_gradients_with_ops: If True, try colocating gradients with
+ the corresponding op.
+ gate_gradients: If True, add a tuple around the gradients returned
+ for an operation. This avoids some race conditions.
+ aggregation_method: Specifies the method used to combine gradient terms.
+ Accepted values are constants defined in the class `AggregationMethod`.
+
+ Returns:
+ A list of `sum(dy/dx)` for each x in `xs`.
+
+ Raises:
+ LookupError: if one of the operations between `x` and `y` does not
+ have a registered gradient function.
+ ValueError: if the arguments are invalid.
+
+ """
+ ys = _AsList(ys)
+ xs = _AsList(xs)
+ if grad_ys is None:
+ grad_ys = [None] * len(ys)
+ else:
+ grad_ys = _AsList(grad_ys)
+ with ops.op_scope(ys + xs + grad_ys, name, "gradients"):
+ ys = ops.convert_n_to_tensor_or_indexed_slices(ys, name="y")
+ xs = ops.convert_n_to_tensor_or_indexed_slices(xs, name="x")
+ grad_ys = _DefaultGradYs(grad_ys, ys, colocate_gradients_with_ops)
+
+ # The approach we take here is as follows: Create a list of all ops in the
+ # subgraph between the ys and xs. Visit these ops in reverse order of ids
+ # to ensure that when we visit an op the gradients w.r.t its outputs have
+ # been collected. Then aggregate these gradients if needed, call the op's
+ # gradient function, and add the generated gradients to the gradients for
+ # its input.
+
+ # Initialize the pending count for ops in the connected subgraph from ys
+ # to the xs.
+ to_ops = [t.op for t in ys]
+ from_ops = [t.op for t in xs]
+ pending_count, has_control_flow = _PendingCount(
+ ops.get_default_graph(), to_ops, from_ops)
+
+ # Iterate over the collected ops.
+ #
+ # grads: op => list of gradients received on each output endpoint of the
+ # op. The gradients for each endpoint are initially collected as a list.
+ # When it is time to call the op's gradient function, for each endpoint we
+ # aggregate the list of received gradients into an Add() Operation if there
+ # is more than one.
+ grads = {}
+
+ # Add the initial gradients for the ys.
+ for y, grad_y in zip(ys, grad_ys):
+ _SetGrad(grads, y, grad_y)
+
+ # Initialize queue with to_ops.
+ queue = collections.deque()
+ # Add the ops in 'to_ops' into the queue.
+ to_ops_set = set()
+ for op in to_ops:
+ if op._id not in to_ops_set:
+ to_ops_set.add(op._id)
+ queue.append(op)
+ # The set of 'from_ops'.
+ stop_ops = _StopOps(from_ops, pending_count)
+ while queue:
+ # generate gradient subgraph for op.
+ op = queue.popleft()
+ with ops.device(_GetGradsDevice(op, colocate_gradients_with_ops)):
+ if has_control_flow:
+ control_flow_ops.EnterGradWhileContext(op)
+ out_grads = _AggregatedGrads(grads, op, has_control_flow,
+ aggregation_method)
+ grad_fn = None
+ if any(out_grads) and op._id not in stop_ops:
+ # A grad_fn must be defined, either as a function or as None
+ # for ops that do not have gradients.
+ try:
+ grad_fn = ops.get_gradient_function(op)
+ except LookupError:
+ raise LookupError(
+ "No gradient defined for operation '%s' (op type: %s)" %
+ (op.name, op.type))
+ if grad_fn and any(out_grads):
+ # NOTE: If _AggregatedGrads didn't compute a value for the i'th
+ # output, it means that the cost does not depend on output[i],
+ # therefore dC/doutput[i] is 0.
+ for i, out_grad in enumerate(out_grads):
+ if (not out_grad
+ and types.as_dtype(op.outputs[i].dtype).base_dtype in (
+ types.float32, types.float64)):
+ # Only floating-point outputs get a zero gradient. Gradient
+ # functions should ignore the gradient for other outputs.
+ out_grads[i] = array_ops.zeros_like(op.outputs[i])
+ with ops.name_scope(op.name + "_grad"):
+ # pylint: disable=protected-access
+ with ops.get_default_graph()._original_op(op):
+ # pylint: enable=protected-access
+ op_wrapper = op
+ if has_control_flow:
+ op_wrapper = control_flow_ops.MakeWrapper(op)
+ in_grads = _AsList(grad_fn(op_wrapper, *out_grads))
+ _VerifyGeneratedGradients(in_grads, op)
+ if gate_gradients and len(in_grads) > 1:
+ in_grads = control_flow_ops.tuple(in_grads)
+ logging.vlog(1, "Gradient for '" + op.name + "'")
+ logging.vlog(1, " in --> %s",
+ ", ".join([x.name for x in out_grads if x]))
+ logging.vlog(1, " out --> %s",
+ ", ".join([x.name for x in in_grads if x]))
+ else:
+ # If no grad_fn is defined or none of out_grads is available,
+ # just propagate a list of Nones backwards.
+ in_grads = [None] * len(op.inputs)
+ for t_in, in_grad in zip(op.inputs, in_grads):
+ if in_grad:
+ _SetGrad(grads, t_in, in_grad)
+ if has_control_flow:
+ control_flow_ops.ExitGradWhileContext(op)
+
+ # update pending count for the inputs of op.
+ for x in op.inputs:
+ pending_count[x.op._id] -= 1
+ ready = (pending_count[x.op._id] == 0)
+ if has_control_flow and not ready:
+ ready = (pending_count[x.op._id] > 0 and
+ control_flow_ops.IsLoopSwitch(x.op))
+ if ready:
+ queue.append(x.op)
+ for x in op.control_inputs:
+ pending_count[x._id] -= 1
+ if pending_count[x._id] == 0:
+ queue.append(x)
+ return [_GetGrad(grads, x) for x in xs]
+
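+
+# Minimal sketch of the entry point above (illustrative; assumes the graph
+# is later evaluated in a Session):
+#
+#   x = constant_op.constant(3.0)
+#   y = constant_op.constant(4.0)
+#   z = math_ops.mul(x, y)
+#   dz_dx, dz_dy = gradients(z, [x, y])
+#   # dz_dx evaluates to 4.0 (= y) and dz_dy to 3.0 (= x), i.e. each entry
+#   # is sum(dz/dx) for the corresponding x.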
+
+def _SetGrad(grads, t, grad):
+ """Sets gradient "grad" in "grads" for tensor "t"."""
+ op = t.op
+ op_grads = grads.get(op)
+ if not op_grads:
+ op_grads = [[] for _ in xrange(len(op.outputs))]
+ grads[op] = op_grads
+ t_grads = op_grads[t.value_index]
+ if isinstance(t_grads, list):
+ t_grads.append(grad)
+ else:
+ assert op.type == "Switch"
+ op_grads[t.value_index] = grad
+
+
+def _GetGrad(grads, t):
+ """Gets gradient for tensor "t"."""
+ op = t.op
+ op_grads = grads.get(op)
+ if not op_grads: return None
+ t_grad = op_grads[t.value_index]
+ assert not isinstance(t_grad, list), (
+ "gradients list should have been aggregated by now.")
+ return t_grad
+
+
+def _GetGrads(grads, op):
+ """Gets all gradients for op."""
+ if op in grads:
+ return grads[op]
+ else:
+ return [[] for _ in xrange(len(op.outputs))]
+
+
+def _HandleNestedIndexedSlices(grad):
+ assert isinstance(grad, ops.IndexedSlices)
+ if isinstance(grad.values, ops.Tensor):
+ return grad
+ else:
+ assert isinstance(grad.values, ops.IndexedSlices)
+ g = _HandleNestedIndexedSlices(grad.values)
+ return ops.IndexedSlices(
+ g.values, array_ops.gather(grad.indices, g.indices), g.dense_shape)
+
+
+def _AccumulatorShape(inputs):
+ shape = tensor_shape.unknown_shape()
+ for i in inputs:
+ if isinstance(i, ops.Tensor):
+ shape = shape.merge_with(i.get_shape())
+ return shape
+
+
+class AggregationMethod(object):
+ """A class listing aggregation methods used to combine gradients.
+
+ Computing partial derivatives can require aggregating gradient
+ contributions. This class lists the various methods that can
+ be used to combine gradients in the graph:
+
+ * `ADD_N`: All of the gradient terms are summed as part of one
+ operation using the "AddN" op. It has the property that all
+ gradients must be ready before any aggregation is performed.
+ * `DEFAULT`: The system-chosen default aggregation method.
+ """
+ ADD_N = 0
+ DEFAULT = ADD_N
+ # The following are experimental and may not be supported in future releases.
+ EXPERIMENTAL_TREE = 1
+ EXPERIMENTAL_ACCUMULATE_N = 2
+
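+
+# Illustrative sketch of choosing an aggregation method (loss and variables
+# are assumed to be defined elsewhere):
+#
+#   grads = gradients(loss, variables,
+#                     aggregation_method=AggregationMethod.EXPERIMENTAL_TREE)
+#   # EXPERIMENTAL_TREE sums gradient terms pairwise, trading some speed for
+#   # a lower peak memory footprint than a single AddN over all terms.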
+
+def _AggregatedGrads(grads, op, has_control_flow, aggregation_method=None):
+ """Get the aggregated gradients for op.
+
+ Args:
+ grads: The map of memoized gradients.
+ op: The op to get gradients for.
+ has_control_flow: True iff the graph contains control flow ops.
+ aggregation_method: Specifies the method used to combine gradient terms.
+ Accepted values are constants defined in the class `AggregationMethod`.
+
+ Returns:
+ A list of gradients, one for each output of `op`. If the gradients
+ for a particular output are a list, this function aggregates them
+ before returning.
+
+ Raises:
+ TypeError: if the incoming grads are not Tensors or IndexedSlices.
+ ValueError: if the arguments are invalid.
+
+ """
+ if aggregation_method is None:
+ aggregation_method = AggregationMethod.DEFAULT
+ if aggregation_method not in [AggregationMethod.ADD_N,
+ AggregationMethod.EXPERIMENTAL_TREE,
+ AggregationMethod.EXPERIMENTAL_ACCUMULATE_N]:
+ raise ValueError("Invalid aggregation_method specified.")
+ out_grads = _GetGrads(grads, op)
+ for i, out_grad in enumerate(out_grads):
+ if has_control_flow:
+ if isinstance(out_grad, (ops.Tensor, ops.IndexedSlices)):
+ assert op.type == "Switch"
+ continue
+ # Grads have to be Tensors or IndexedSlices
+ if not all([isinstance(g, (ops.Tensor, ops.IndexedSlices))
+ for g in out_grad if g]):
+ raise TypeError("gradients have to be either all Tensors "
+ "or all IndexedSlices")
+ # Aggregate multiple gradients, and convert [] to None.
+ if out_grad:
+ if all([isinstance(g, ops.Tensor) for g in out_grad if g]):
+ tensor_shape = _AccumulatorShape(out_grad)
+ if len(out_grad) < 2:
+ used = "nop"
+ out_grads[i] = out_grad[0]
+ elif (aggregation_method == AggregationMethod.EXPERIMENTAL_ACCUMULATE_N
+ and len(out_grad) > 2 and tensor_shape.is_fully_defined()):
+ # The benefit of using AccumulateN is that its inputs can be combined
+ # in any order and this can allow the expression to be evaluated with
+ # a smaller memory footprint. When used with gpu_allocator_retry,
+ # it is possible to compute a sum of terms which are much larger than
+ # total GPU memory.
+ # AccumulateN can currently only be used if we know the shape for
+ # an accumulator variable. If this is not known, or if we only have
+ # 2 grads then we fall through to the "tree" case below.
+ used = "accumulate_n"
+ out_grads[i] = math_ops.accumulate_n(out_grad)
+ elif aggregation_method in [AggregationMethod.EXPERIMENTAL_TREE,
+ AggregationMethod.EXPERIMENTAL_ACCUMULATE_N
+ ]:
+ # Aggregate all gradients by doing pairwise sums: this may
+ # reduce performance, but it can improve memory because the
+ # gradients can be released earlier.
+ #
+ # TODO(vrv): Consider replacing this with a version of
+ # tf.AddN() that eagerly frees its inputs as soon as they are
+ # ready, so the order of this tree does not become a problem.
+ used = "tree"
+ with ops.name_scope(op.name + "_gradient_sum"):
+ running_sum = out_grad[0]
+ for grad in out_grad[1:]:
+ running_sum = math_ops.add_n([running_sum, grad])
+ out_grads[i] = running_sum
+ else:
+ used = "add_n"
+ out_grads[i] = math_ops.add_n(out_grad)
+ logging.vlog(2, " _AggregatedGrads %d x %s using %s", len(out_grad),
+ tensor_shape, used)
+ else:
+ out_grad = math_ops._as_indexed_slices_list([g for g in out_grad if g])
+ out_grad = [_HandleNestedIndexedSlices(x) for x in out_grad]
+ # Form IndexedSlices out of the concatenated values and
+ # indices.
+ out_grads[i] = ops.IndexedSlices(
+ array_ops.concat(0, [x.values for x in out_grad]),
+ array_ops.concat(0, [x.indices for x in out_grad]),
+ out_grad[0].dense_shape)
+ else:
+ out_grads[i] = []
+ return out_grads
+
+
+# TODO(vrv): Make this available when we want to make it public.
+def _hessian_vector_product(ys, xs, v):
+ """Multiply the Hessian of `ys` wrt `xs` by `v`.
+
+ This is an efficient construction that uses a backprop-like approach
+ to compute the product between the Hessian and another vector. The
+ Hessian is usually too large to be explicitly computed or even
+ represented, but this method allows us to at least multiply by it
+ for the same big-O cost as backprop.
+
+ Implicit Hessian-vector products are the main practical, scalable way
+ of using second derivatives with neural networks. They allow us to
+ do things like construct Krylov subspaces and approximate conjugate
+ gradient descent.
+
+ Example: if `y` = `x`^T A `x`, then `hessian_vector_product(y,
+ x, v)` will return an expression that evaluates to the same values
+ as (A + A.T) `v`.
+
+ Args:
+ ys: A scalar value, or a tensor or list of tensors to be summed to
+ yield a scalar.
+ xs: A list of tensors that we should construct the Hessian over.
+ v: A list of tensors, with the same shapes as xs, that we want to
+ multiply by the Hessian.
+
+ Returns:
+ A list of tensors (or if the list would be length 1, a single tensor)
+ containing the product between the Hessian and `v`.
+
+ Raises:
+ ValueError: `xs` and `v` have different length.
+
+ """
+
+ # Validate the input
+ length = len(xs)
+ if len(v) != length:
+ raise ValueError("xs and v must have the same length.")
+
+ # First backprop
+ grads = gradients(ys, xs)
+
+ assert len(grads) == length
+ elemwise_products = [math_ops.mul(grad_elem, array_ops.stop_gradient(v_elem))
+ for grad_elem, v_elem in zip(grads, v)
+ if grad_elem is not None]
+
+ # Second backprop
+ return gradients(elemwise_products, xs)
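+
+# Illustrative check of the construction above (a, x and v are assumed to be
+# compatible matrix/vector tensors): for y = x^T A x the Hessian is A + A^T,
+# so the product below evaluates to (A + A^T) v; see gradients_test.py.
+#
+#   y = math_ops.matmul(array_ops.transpose(x), math_ops.matmul(a, x))
+#   hvp = _hessian_vector_product(y, [x], [v])[0]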
diff --git a/tensorflow/python/ops/gradients_test.py b/tensorflow/python/ops/gradients_test.py
new file mode 100644
index 0000000000..dac0ebbb60
--- /dev/null
+++ b/tensorflow/python/ops/gradients_test.py
@@ -0,0 +1,337 @@
+"""Tests for tensorflow.ops.gradients."""
+import warnings
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+# pylint: disable=unused-import
+from tensorflow.python.ops import array_grad
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import data_flow_grad
+from tensorflow.python.ops import data_flow_ops
+from tensorflow.python.ops import gradients
+from tensorflow.python.ops import math_grad
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn_grad
+from tensorflow.python.ops import state_grad
+# pylint: enable=unused-import
+from tensorflow.python.ops.constant_op import constant
+from tensorflow.python.ops.nn_ops import bias_add
+from tensorflow.python.platform import googletest
+
+
+def _OpsBetween(graph, to_ops, from_ops):
+ """Build the list of operations between two lists of Operations.
+
+ Args:
+ graph: a Graph.
+ to_ops: list of Operations.
+ from_ops: list of Operations.
+
+ Returns:
+ The list of operations between "from_ops" and "to_ops", sorted by
+ decreasing operation id. This list contains all elements of to_ops.
+
+ TODO(mdevin): Think about returning an empty list if from_ops are not
+ reachable from to_ops. Presently it returns to_ops in that case.
+ """
+ # List of booleans, indexed by operation id, indicating if
+ # an op is reached from the output of "input_ops".
+ reached_ops = [False] * (graph._last_id + 1)
+ # We only care to reach up to "output_ops" so we mark the
+ # output ops as reached to avoid recursing past them.
+ for op in to_ops:
+ reached_ops[op._id] = True
+ gradients._MarkReachedOps(from_ops, reached_ops)
+ between_ops = gradients._GatherInputs(to_ops, reached_ops)
+ between_ops.sort(lambda x, y: y._id - x._id)
+ return between_ops
+
+
+class GradientsTest(test_util.TensorFlowTestCase):
+
+ def _OpNames(self, op_list):
+ return ["%s/%d" % (str(op.name), op._id) for op in op_list]
+
+ def _assertOpListEqual(self, ops1, ops2):
+ self.assertEquals(self._OpNames(ops1), self._OpNames(ops2))
+
+ def testOpsBetweenSimple(self):
+ with ops.Graph().as_default() as g:
+ t1 = constant(1.0)
+ t2 = constant(2.0)
+ t3 = array_ops.pack([t1, t2])
+ # Full graph
+ self._assertOpListEqual([t3.op, t2.op, t1.op],
+ _OpsBetween(g, [t3.op], [t1.op, t2.op]))
+ # Only t1, t3.
+ self._assertOpListEqual([t3.op, t1.op],
+ _OpsBetween(g, [t3.op], [t1.op]))
+
+ def testOpsBetweenUnreachable(self):
+ with ops.Graph().as_default() as g:
+ t1 = constant(1.0)
+ t2 = constant(2.0)
+ _ = array_ops.pack([t1, t2])
+ t4 = constant(1.0)
+ t5 = constant(2.0)
+ t6 = array_ops.pack([t4, t5])
+ # Elements of to_ops are always listed.
+ self._assertOpListEqual([t6.op], _OpsBetween(g, [t6.op], [t1.op]))
+
+ def testOpsBetweenCut(self):
+ with ops.Graph().as_default() as g:
+ t1 = constant(1.0)
+ t2 = constant(2.0)
+ t3 = array_ops.pack([t1, t2])
+ t4 = constant([1.0])
+ t5 = array_ops.concat(0, [t4, t3])
+ t6 = constant([2.0])
+ t7 = array_ops.concat(0, [t5, t6])
+ self._assertOpListEqual([t7.op, t5.op, t4.op],
+ _OpsBetween(g, [t7.op], [t4.op]))
+
+ def testOpsBetweenCycle(self):
+ with ops.Graph().as_default() as g:
+ t1 = constant(1.0)
+ t2 = constant(2.0)
+ t3 = array_ops.pack([t1, t2])
+ t4 = array_ops.concat(0, [t3, t3, t3])
+ t5 = constant([1.0])
+ t6 = array_ops.concat(0, [t4, t5])
+ t7 = array_ops.concat(0, [t6, t3])
+ self._assertOpListEqual([t6.op, t4.op, t3.op],
+ _OpsBetween(g, [t6.op], [t3.op]))
+ self._assertOpListEqual([t7.op, t6.op, t5.op, t4.op, t3.op, t1.op],
+ _OpsBetween(g, [t7.op], [t1.op, t5.op]))
+ self._assertOpListEqual([t6.op, t5.op, t4.op, t3.op, t2.op],
+ _OpsBetween(g, [t6.op], [t2.op, t5.op]))
+
+ def testGradients(self):
+ with ops.Graph().as_default():
+ inp = constant(1.0, shape=[32, 100], name="in")
+ w = constant(1.0, shape=[100, 10], name="w")
+ b = constant(1.0, shape=[10], name="b")
+ xw = math_ops.matmul(inp, w, name="xw")
+ h = bias_add(xw, b, name="h")
+ w_grad = gradients.gradients(h, w)[0]
+ self.assertEquals("MatMul", w_grad.op.type)
+ self.assertEquals(w_grad.op._original_op, xw.op)
+ self.assertTrue(w_grad.op.get_attr("transpose_a"))
+ self.assertFalse(w_grad.op.get_attr("transpose_b"))
+
+ def testUnusedOutput(self):
+ with ops.Graph().as_default():
+ w = constant(1.0, shape=[2, 2])
+ x = constant(1.0, shape=[2, 2])
+ wx = math_ops.matmul(w, x)
+ split_wx = array_ops.split(0, 2, wx)
+ c = math_ops.reduce_sum(split_wx[1])
+ gw = gradients.gradients(c, [w])[0]
+ self.assertEquals("MatMul", gw.op.type)
+
+ def testColocateGradients(self):
+ with ops.Graph().as_default() as g:
+ w = constant(1.0, shape=[1, 1])
+ x = constant(1.0, shape=[1, 2])
+ with g.device("/gpu:0"):
+ wx = math_ops.matmul(w, x)
+ gw = gradients.gradients(wx, [w], colocate_gradients_with_ops=True)[0]
+ self.assertEquals("/gpu:0", gw.device)
+
+ def testColocateGradientsWithAggregation(self):
+ with ops.Graph().as_default() as g:
+ with g.device("/gpu:1"):
+ w = constant(1.0, shape=[1, 1])
+ x = constant(1.0, shape=[1, 2])
+ y = constant(1.0, shape=[1, 2])
+ wx = math_ops.matmul(w, x)
+ wy = math_ops.matmul(w, y)
+ with g.device("/gpu:0"):
+ z = wx + wy
+ gw1 = gradients.gradients(z, [w], colocate_gradients_with_ops=True)[0]
+ self.assertEquals("/gpu:1", gw1.device)
+ gw2 = gradients.gradients(z, [w], colocate_gradients_with_ops=False)[0]
+ self.assertEquals(None, gw2.device)
+
+ def testBoundaryStop(self):
+ # Test that we don't differentiate 'x'. The gradient function for 'x' is
+ # set explicitly to None so we will get an exception if the gradient code
+ # tries to differentiate 'x'.
+ with ops.Graph().as_default() as g:
+ c = constant(1.0)
+ x = array_ops.identity(c)
+ y = x + 1.0
+ z = y + 1
+ grads = gradients.gradients(z, [x])
+ self.assertTrue(all([x for x in grads]))
+
+ def testBoundaryContinue(self):
+ # Test that we differentiate both 'x' and 'y' correctly when x is a
+ # predecessor of y.
+ with self.test_session():
+ x = constant(1.0)
+ y = x * 2.0
+ z = y * 3.0
+ grads = gradients.gradients(z, [x, y])
+ self.assertTrue(all([x for x in grads]))
+ self.assertEqual(6.0, grads[0].eval())
+
+ def testAggregationMethodAccumulateN(self):
+ with self.test_session():
+ x = constant(1.0)
+ y = x * 2.0
+ z = y + y + y + y + y + y + y + y + y + y
+ grads = gradients.gradients(
+ z,
+ [x, y],
+ aggregation_method=
+ gradients.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
+ self.assertTrue(all([x for x in grads]))
+ self.assertEqual(20.0, grads[0].eval())
+ self.assertEqual(10.0, grads[1].eval())
+
+ def testAggregationMethodAddN(self):
+ with self.test_session():
+ x = constant(1.0)
+ y = x * 2.0
+ z = y + y + y + y + y + y + y + y + y + y
+ grads = gradients.gradients(
+ z,
+ [x, y],
+ aggregation_method=gradients.AggregationMethod.ADD_N)
+ self.assertTrue(all([x for x in grads]))
+ self.assertEqual(20.0, grads[0].eval())
+ self.assertEqual(10.0, grads[1].eval())
+
+ def testAggregationMethodTree(self):
+ with self.test_session():
+ x = constant(1.0)
+ y = x * 2.0
+ z = y + y + y + y + y + y + y + y + y + y
+ grads = gradients.gradients(
+ z,
+ [x, y],
+ aggregation_method=gradients.AggregationMethod.EXPERIMENTAL_TREE)
+ self.assertTrue(all([x for x in grads]))
+ self.assertEqual(20.0, grads[0].eval())
+ self.assertEqual(10.0, grads[1].eval())
+
+ def testNoGradientForStringOutputs(self):
+ with ops.Graph().as_default() as g:
+ @ops.RegisterGradient("TestOp")
+ def _TestOpGrad(op, float_grad, string_grad):
+ """Gradient function for TestOp."""
+ self.assertEquals(float_grad.dtype, types.float32)
+ self.assertFalse(string_grad)
+ return float_grad
+ ops.RegisterShape("TestOp")(None)
+
+ c = constant(1.0)
+ x, y = g.create_op("TestOp", [c], [types.float32, types.string]).outputs
+ z = x * 2.0
+ w = z * 3.0
+ grads = gradients.gradients(z, [c])
+ self.assertTrue(isinstance(grads[0], ops.Tensor))
+
+
+class StopGradientTest(test_util.TensorFlowTestCase):
+
+ def testStopGradient(self):
+ with ops.Graph().as_default():
+ inp = constant(1.0, shape=[100, 32], name="in")
+ out = array_ops.stop_gradient(inp)
+ igrad = gradients.gradients(out, inp)[0]
+ assert igrad is None
+
+
+class HessianVectorProductTest(test_util.TensorFlowTestCase):
+
+ def testHessianVectorProduct(self):
+ # Manually compute the Hessian explicitly for a low-dimensional problem
+ # and check that HessianVectorProduct matches multiplication by the
+ # explicit Hessian.
+ # Specifically, the Hessian of f(x) = x^T A x is
+ # H = A + A^T.
+ # We expect HessianVectorProduct(f(x), x, v) to be H v.
+ m = 4
+ rng = np.random.RandomState([1, 2, 3])
+ mat_value = rng.randn(m, m).astype("float32")
+ v_value = rng.randn(m, 1).astype("float32")
+ x_value = rng.randn(m, 1).astype("float32")
+ hess_value = mat_value + mat_value.T
+ hess_v_value = np.dot(hess_value, v_value)
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ mat = constant_op.constant(mat_value)
+ v = constant_op.constant(v_value)
+ x = constant_op.constant(x_value)
+ mat_x = math_ops.matmul(mat, x, name="Ax")
+ x_mat_x = math_ops.matmul(array_ops.transpose(x), mat_x, name="xAx")
+ hess_v = gradients._hessian_vector_product(x_mat_x, [x], [v])[0]
+ hess_v_actual = hess_v.eval()
+ self.assertAllClose(hess_v_value, hess_v_actual)
+
+
+class IndexedSlicesToTensorTest(test_util.TensorFlowTestCase):
+
+ def testIndexedSlicesToTensor(self):
+ with self.test_session():
+ np_val = np.random.rand(4, 4, 4, 4).astype(np.float32)
+ c = constant_op.constant(np_val)
+ c_sparse = math_ops._as_indexed_slices(c)
+ self.assertAllEqual(np_val.shape, c_sparse.dense_shape.eval())
+ c_dense = math_ops.mul(c_sparse, 1.0)
+ self.assertAllClose(np_val, c_dense.eval())
+
+ def testInt64Indices(self):
+ with self.test_session():
+ np_val = np.random.rand(4, 4, 4, 4).astype(np.float32)
+ c = constant_op.constant(np_val)
+ c_sparse = math_ops._as_indexed_slices(c)
+ c_sparse = ops.IndexedSlices(
+ c_sparse.values, math_ops.cast(c_sparse.indices, types.int64),
+ c_sparse.dense_shape)
+ self.assertAllEqual(np_val.shape, c_sparse.dense_shape.eval())
+ c_dense = math_ops.mul(c_sparse, 1.0)
+ self.assertAllClose(np_val, c_dense.eval())
+
+ def testWarnings(self):
+ # Smaller than the threshold: no warning.
+ c_sparse = ops.IndexedSlices(array_ops.placeholder(types.float32),
+ array_ops.placeholder(types.int32),
+ constant([4, 4, 4, 4]))
+ with warnings.catch_warnings(record=True) as w:
+ math_ops.mul(c_sparse, 1.0)
+ self.assertEqual(0, len(w))
+
+ # Greater than or equal to the threshold: warning.
+ c_sparse = ops.IndexedSlices(array_ops.placeholder(types.float32),
+ array_ops.placeholder(types.int32),
+ constant([100, 100, 100, 100]))
+ with warnings.catch_warnings(record=True) as w:
+ math_ops.mul(c_sparse, 1.0)
+ self.assertEqual(1, len(w))
+ self.assertTrue(
+ "with 100000000 elements. This may consume a large amount of memory."
+ in str(w[0].message))
+
+ # Unknown dense shape: warning.
+ c_sparse = ops.IndexedSlices(array_ops.placeholder(types.float32),
+ array_ops.placeholder(types.int32),
+ array_ops.placeholder(types.int32))
+ with warnings.catch_warnings(record=True) as w:
+ math_ops.mul(c_sparse, 1.0)
+ self.assertEqual(1, len(w))
+ self.assertTrue(
+ "of unknown shape. This may consume a large amount of memory."
+ in str(w[0].message))
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/image_ops.py b/tensorflow/python/ops/image_ops.py
new file mode 100644
index 0000000000..1b4f4aef22
--- /dev/null
+++ b/tensorflow/python/ops/image_ops.py
@@ -0,0 +1,786 @@
+"""## Encoding and Decoding.
+
+TensorFlow provides Ops to decode and encode JPEG and PNG formats. Encoded
+images are represented by scalar string Tensors, decoded images by 3-D uint8
+tensors of shape `[height, width, channels]`.
+
+The encode and decode Ops apply to one image at a time. Their inputs and
+outputs are all of variable size. If you need fixed-size images, pass the output of
+the decode Ops to one of the cropping and resizing Ops.
+
+Note: The PNG encode and decode Ops support RGBA, but the conversion Ops
+presently only support RGB, HSV, and grayscale.
+
+@@decode_jpeg
+@@encode_jpeg
+
+@@decode_png
+@@encode_png
+
+## Resizing.
+
+The resizing Ops accept input images as tensors of several types. They always
+output resized images as float32 tensors.
+
+The convenience function [resize_images()](#resize_images) supports both 4-D
+and 3-D tensors as input and output. 4-D tensors are for batches of images,
+3-D tensors for individual images.
+
+Other resizing Ops only support 3-D individual images as input:
+[resize_area](#resize_area), [resize_bicubic](#resize_bicubic),
+[resize_bilinear](#resize_bilinear),
+[resize_nearest_neighbor](#resize_nearest_neighbor).
+
+Example:
+
+```python
+# Decode a JPG image and resize it to 299 by 299.
+image = tf.image.decode_jpeg(...)
+resized_image = tf.image.resize_bilinear(image, [299, 299])
+```
+
+<i>Maybe refer to the Queue examples that show how to add images to a Queue
+after resizing them to a fixed size, and how to dequeue batches of resized
+images from the Queue.</i>
+
+@@resize_images
+
+@@resize_area
+@@resize_bicubic
+@@resize_bilinear
+@@resize_nearest_neighbor
+
+
+## Cropping.
+
+@@resize_image_with_crop_or_pad
+
+@@pad_to_bounding_box
+@@crop_to_bounding_box
+@@random_crop
+@@extract_glimpse
+
+## Flipping and Transposing.
+
+@@flip_up_down
+@@random_flip_up_down
+
+@@flip_left_right
+@@random_flip_left_right
+
+@@transpose_image
+
+## Image Adjustments.
+
+TensorFlow provides functions to adjust images in various ways: brightness,
+contrast, hue, and saturation. Each adjustment can be done with predefined
+parameters or with random parameters picked from predefined intervals. Random
+adjustments are often useful to expand a training set and reduce overfitting.
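+
+For example, a small augmentation sketch (the parameter values here are
+illustrative only, not recommendations):
+
+```python
+# Randomly perturb brightness and contrast, then whiten, during training.
+image = tf.image.random_brightness(image, max_delta=32)
+image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
+image = tf.image.per_image_whitening(image)
+```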
+
+@@adjust_brightness
+@@random_brightness
+
+@@adjust_contrast
+@@random_contrast
+
+@@per_image_whitening
+"""
+import math
+
+import tensorflow.python.platform
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import random_seed
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import clip_ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import gen_image_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import random_ops
+
+
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_image_ops import *
+from tensorflow.python.ops.gen_attention_ops import *
+# pylint: enable=wildcard-import
+
+ops.NoGradient('ResizeBilinear')
+ops.NoGradient('RandomCrop')
+
+
+def _ImageDimensions(images):
+ """Returns the dimensions of an image tensor.
+
+ Args:
+ images: 4-D Tensor of shape [batch, height, width, channels]
+
+ Returns:
+ list of integers [batch, height, width, channels]
+ """
+ # A simple abstraction to provide names for each dimension. This abstraction
+ # should make it simpler to switch dimensions in the future (e.g. if we ever
+ # want to switch height and width.)
+ return images.get_shape().as_list()
+
+
+def _Check3DImage(image):
+ """Assert that we are working with properly shaped image.
+
+ Args:
+ image: 3-D Tensor of shape [height, width, channels]
+
+ Raises:
+ ValueError: if image.shape is not a [3] vector.
+ """
+ if not image.get_shape().is_fully_defined():
+ raise ValueError('\'image\' must be fully defined.')
+ if image.get_shape().ndims != 3:
+ raise ValueError('\'image\' must be three-dimensional.')
+ if not all(x > 0 for x in image.get_shape()):
+ raise ValueError('all dims of \'image.shape\' must be > 0: %s' %
+ image.get_shape())
+
+
+def _CheckAtLeast3DImage(image):
+ """Assert that we are working with properly shaped image.
+
+ Args:
+ image: >= 3-D Tensor of size [*, height, width, depth]
+
+ Raises:
+ ValueError: if image.shape is not a [>= 3] vector.
+ """
+ if not image.get_shape().is_fully_defined():
+ raise ValueError('\'image\' must be fully defined.')
+ if image.get_shape().ndims < 3:
+ raise ValueError('\'image\' must be at least three-dimensional.')
+ if not all(x > 0 for x in image.get_shape()):
+ raise ValueError('all dims of \'image.shape\' must be > 0: %s' %
+ image.get_shape())
+
+
+def random_flip_up_down(image, seed=None):
+ """Randomly flips an image vertically (upside down).
+
+ With a 1 in 2 chance, outputs the contents of `image` flipped along the first
+ dimension, which is `height`. Otherwise, outputs the image as-is.
+
+ Args:
+ image: A 3-D tensor of shape `[height, width, channels].`
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ A 3-D tensor of the same type and shape as `image`.
+
+ Raises:
+ ValueError: if the shape of `image` is not supported.
+ """
+ _Check3DImage(image)
+ uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
+ mirror = math_ops.less(array_ops.pack([uniform_random, 1.0, 1.0]), 0.5)
+ return array_ops.reverse(image, mirror)
+
+
+def random_flip_left_right(image, seed=None):
+ """Randomly flip an image horizontally (left to right).
+
+ With a 1 in 2 chance, outputs the contents of `image` flipped along the
+ second dimension, which is `width`. Otherwise, outputs the image as-is.
+
+ Args:
+ image: A 3-D tensor of shape `[height, width, channels].`
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ A 3-D tensor of the same type and shape as `image`.
+
+ Raises:
+ ValueError: if the shape of `image` is not supported.
+ """
+ _Check3DImage(image)
+ uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
+ mirror = math_ops.less(array_ops.pack([1.0, uniform_random, 1.0]), 0.5)
+ return array_ops.reverse(image, mirror)
+
+
+def flip_left_right(image):
+ """Flip an image horizontally (left to right).
+
+ Outputs the contents of `image` flipped along the second dimension, which is
+ `width`.
+
+ See also `reverse()`.
+
+ Args:
+ image: A 3-D tensor of shape `[height, width, channels].`
+
+ Returns:
+ A 3-D tensor of the same type and shape as `image`.
+
+ Raises:
+ ValueError: if the shape of `image` is not supported.
+ """
+ _Check3DImage(image)
+ return array_ops.reverse(image, [False, True, False])
+
+
+def flip_up_down(image):
+ """Flip an image horizontally (upside down).
+
+ Outputs the contents of `image` flipped along the first dimension, which is
+ `height`.
+
+ See also `reverse()`.
+
+ Args:
+ image: A 3-D tensor of shape `[height, width, channels].`
+
+ Returns:
+ A 3-D tensor of the same type and shape as `image`.
+
+ Raises:
+ ValueError: if the shape of `image` is not supported.
+ """
+ _Check3DImage(image)
+ return array_ops.reverse(image, [True, False, False])
+
+
+def transpose_image(image):
+ """Transpose an image by swapping the first and second dimension.
+
+ See also `transpose()`.
+
+ Args:
+ image: 3-D tensor of shape `[height, width, channels]`
+
+ Returns:
+ A 3-D tensor of shape `[width, height, channels]`
+
+ Raises:
+ ValueError: if the shape of `image` is not supported.
+ """
+ _Check3DImage(image)
+ return array_ops.transpose(image, [1, 0, 2], name='transpose_image')
+
+
+def pad_to_bounding_box(image, offset_height, offset_width, target_height,
+ target_width):
+ """Pad `image` with zeros to the specified `height` and `width`.
+
+ Adds `offset_height` rows of zeros on top, `offset_width` columns of
+ zeros on the left, and then pads the image on the bottom and right
+ with zeros until it has dimensions `target_height`, `target_width`.
+
+ This op does nothing if `offset_*` is zero and the image already has size
+ `target_height` by `target_width`.
+
+ Args:
+ image: 3-D tensor with shape `[height, width, channels]`
+ offset_height: Number of rows of zeros to add on top.
+ offset_width: Number of columns of zeros to add on the left.
+ target_height: Height of output image.
+ target_width: Width of output image.
+
+ Returns:
+ 3-D tensor of shape `[target_height, target_width, channels]`
+
+ Raises:
+ ValueError: If the shape of `image` is incompatible with the `offset_*` or
+ `target_*` arguments
+ """
+ _Check3DImage(image)
+ height, width, depth = _ImageDimensions(image)
+
+ if target_width < width:
+ raise ValueError('target_width must be >= width')
+ if target_height < height:
+ raise ValueError('target_height must be >= height')
+
+ after_padding_width = target_width - offset_width - width
+ after_padding_height = target_height - offset_height - height
+
+ if after_padding_width < 0:
+ raise ValueError('target_width not possible given '
+ 'offset_width and image width')
+ if after_padding_height < 0:
+ raise ValueError('target_height not possible given '
+ 'offset_height and image height')
+
+ # Do not pad on the depth dimensions.
+ if (offset_width or offset_height or after_padding_width or
+ after_padding_height):
+ paddings = [[offset_height, after_padding_height],
+ [offset_width, after_padding_width], [0, 0]]
+ padded = array_ops.pad(image, paddings)
+ padded.set_shape([target_height, target_width, depth])
+ else:
+ padded = image
+
+ return padded
+
+
+def crop_to_bounding_box(image, offset_height, offset_width, target_height,
+ target_width):
+ """Crops an image to a specified bounding box.
+
+ This op cuts a rectangular part out of `image`. The top-left corner of the
+ returned image is at `offset_height, offset_width` in `image`, and its
+ lower-right corner is at
+ `offset_height + target_height, offset_width + target_width`.
+
+ Args:
+ image: 3-D tensor with shape `[height, width, channels]`
+ offset_height: Vertical coordinate of the top-left corner of the result in
+ the input.
+ offset_width: Horizontal coordinate of the top-left corner of the result in
+ the input.
+ target_height: Height of the result.
+ target_width: Width of the result.
+
+ Returns:
+ 3-D tensor of image with shape `[target_height, target_width, channels]`
+
+ Raises:
+ ValueError: If the shape of `image` is incompatible with the `offset_*` or
+ `target_*` arguments
+ """
+ _Check3DImage(image)
+ height, width, _ = _ImageDimensions(image)
+
+ if offset_width < 0:
+ raise ValueError('offset_width must be >= 0.')
+ if offset_height < 0:
+ raise ValueError('offset_height must be >= 0.')
+
+ if width < (target_width + offset_width):
+ raise ValueError('width must be >= target + offset.')
+ if height < (target_height + offset_height):
+ raise ValueError('height must be >= target + offset.')
+
+ cropped = array_ops.slice(image, [offset_height, offset_width, 0],
+ [target_height, target_width, -1])
+
+ return cropped
+
+
+def resize_image_with_crop_or_pad(image, target_height, target_width):
+ """Crops and/or pads an image to a target width and height.
+
+ Resizes an image to a target width and height by either centrally
+ cropping the image or padding it evenly with zeros.
+
+ If `width` or `height` is greater than the specified `target_width` or
+ `target_height` respectively, this op centrally crops along that dimension.
+ If `width` or `height` is smaller than the specified `target_width` or
+ `target_height` respectively, this op centrally pads with 0 along that
+ dimension.
+
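+ For example, resizing a `[2, 4, 1]` image to `target_height=4` and
+ `target_width=2` crops one column from each side of the image and pads one
+ row of zeros above and one below the remaining pixels.
+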
+ Args:
+ image: 3-D tensor of shape [height, width, channels]
+ target_height: Target height.
+ target_width: Target width.
+
+ Raises:
+ ValueError: if `target_height` or `target_width` are zero or negative.
+
+ Returns:
+ Cropped and/or padded image of shape
+ `[target_height, target_width, channels]`
+ """
+ _Check3DImage(image)
+ original_height, original_width, _ = _ImageDimensions(image)
+
+ if target_width <= 0:
+ raise ValueError('target_width must be > 0.')
+ if target_height <= 0:
+ raise ValueError('target_height must be > 0.')
+
+ offset_crop_width = 0
+ offset_pad_width = 0
+ if target_width < original_width:
+ offset_crop_width = int((original_width - target_width) / 2)
+ elif target_width > original_width:
+ offset_pad_width = int((target_width - original_width) / 2)
+
+ offset_crop_height = 0
+ offset_pad_height = 0
+ if target_height < original_height:
+ offset_crop_height = int((original_height - target_height) / 2)
+ elif target_height > original_height:
+ offset_pad_height = int((target_height - original_height) / 2)
+
+ # Maybe crop if needed.
+ cropped = crop_to_bounding_box(image, offset_crop_height, offset_crop_width,
+ min(target_height, original_height),
+ min(target_width, original_width))
+
+ # Maybe pad if needed.
+ resized = pad_to_bounding_box(cropped, offset_pad_height, offset_pad_width,
+ target_height, target_width)
+
+ if resized.get_shape().ndims is None:
+ raise ValueError('resized contains no shape.')
+ if not resized.get_shape()[0].is_compatible_with(target_height):
+ raise ValueError('resized height is not correct.')
+ if not resized.get_shape()[1].is_compatible_with(target_width):
+ raise ValueError('resized width is not correct.')
+ return resized
+
+
+class ResizeMethod(object):
+ BILINEAR = 0
+ NEAREST_NEIGHBOR = 1
+ BICUBIC = 2
+ AREA = 3
+
+
+def resize_images(images, new_height, new_width, method=ResizeMethod.BILINEAR):
+ """Resize `images` to `new_width`, `new_height` using the specified `method`.
+
+ Resized images will be distorted if their original aspect ratio is not
+ the same as `new_width`, `new_height`. To avoid distortions see
+ [resize_image_with_crop_or_pad](#resize_image_with_crop_or_pad).
+
+ `method` can be one of:
+
+ * <b>ResizeMethod.BILINEAR</b>: [Bilinear interpolation.]
+ (https://en.wikipedia.org/wiki/Bilinear_interpolation)
+ * <b>ResizeMethod.NEAREST_NEIGHBOR</b>: [Nearest neighbor interpolation.]
+ (https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
+ * <b>ResizeMethod.BICUBIC</b>: [Bicubic interpolation.]
+ (https://en.wikipedia.org/wiki/Bicubic_interpolation)
+ * <b>ResizeMethod.AREA</b>: Area interpolation.
+
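+ For example, a minimal sketch, assuming the module is exposed as `tf.image`
+ (as in the module docstring above) and a 3-D `image` tensor is in scope:
+
+ resized = tf.image.resize_images(image, 128, 128, method=tf.image.ResizeMethod.AREA)
+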
+ Args:
+ images: 4-D Tensor of shape `[batch, height, width, channels]` or
+ 3-D Tensor of shape `[height, width, channels]`.
+ new_height: integer.
+ new_width: integer.
+ method: ResizeMethod. Defaults to `ResizeMethod.BILINEAR`.
+
+ Raises:
+ ValueError: if the shape of `images` is incompatible with the
+ shape arguments to this function
+ ValueError: if an unsupported resize method is specified.
+
+ Returns:
+ If `images` was 4-D, a 4-D float Tensor of shape
+ `[batch, new_height, new_width, channels]`.
+ If `images` was 3-D, a 3-D float Tensor of shape
+ `[new_height, new_width, channels]`.
+ """
+ if images.get_shape().ndims is None:
+ raise ValueError('\'images\' contains no shape.')
+ # TODO(shlens): Migrate this functionality to the underlying Ops.
+ is_batch = True
+ if len(images.get_shape()) == 3:
+ is_batch = False
+ images = array_ops.expand_dims(images, 0)
+
+ _, height, width, depth = _ImageDimensions(images)
+
+  if width == new_width and height == new_height:
+    if not is_batch:
+      # Restore the original 3-D shape before the early return.
+      images = array_ops.reshape(images, [new_height, new_width, depth])
+    return images
+
+ if method == ResizeMethod.BILINEAR:
+ images = gen_image_ops.resize_bilinear(images, [new_height, new_width])
+ elif method == ResizeMethod.NEAREST_NEIGHBOR:
+ images = gen_image_ops.resize_nearest_neighbor(images, [new_height,
+ new_width])
+ elif method == ResizeMethod.BICUBIC:
+ images = gen_image_ops.resize_bicubic(images, [new_height, new_width])
+ elif method == ResizeMethod.AREA:
+ images = gen_image_ops.resize_area(images, [new_height, new_width])
+ else:
+ raise ValueError('Resize method is not implemented.')
+
+ if not is_batch:
+ images = array_ops.reshape(images, [new_height, new_width, depth])
+ return images
+
+
+def per_image_whitening(image):
+ """Linearly scales `image` to have zero mean and unit norm.
+
+ This op computes `(x - mean) / adjusted_stddev`, where `mean` is the average
+ of all values in `image`, and
+ `adjusted_stddev = max(stddev, 1.0/sqrt(image.NumElements()))`.
+
+ `stddev` is the standard deviation of all values in `image`. It is capped
+ away from zero to protect against division by 0 when handling uniform images.
+
+ Note that this implementation is limited:
+ * It only whitens based on the statistics of an individual image.
+ * It does not take into account the covariance structure.
+
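+ A rough NumPy sketch of the same computation (illustrative only; `x` stands
+ for the image as a NumPy array):
+
+ mean = x.mean()
+ adjusted_stddev = max(x.std(), 1.0 / np.sqrt(x.size))
+ whitened = (x.astype(np.float32) - mean) / adjusted_stddev
+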
+ Args:
+ image: 3-D tensor of shape `[height, width, channels]`.
+
+ Returns:
+ The whitened image with same shape as `image`.
+
+ Raises:
+ ValueError: if the shape of 'image' is incompatible with this function.
+ """
+ _Check3DImage(image)
+ height, width, depth = _ImageDimensions(image)
+ num_pixels = height * width * depth
+
+ image = math_ops.cast(image, dtype=types.float32)
+ image_mean = math_ops.reduce_mean(image)
+
+ variance = (math_ops.reduce_mean(math_ops.square(image)) -
+ math_ops.square(image_mean))
+ stddev = math_ops.sqrt(variance)
+
+ # Apply a minimum normalization that protects us against uniform images.
+ min_stddev = constant_op.constant(1.0 / math.sqrt(num_pixels))
+ pixel_value_scale = math_ops.maximum(stddev, min_stddev)
+ pixel_value_offset = image_mean
+
+ image = math_ops.sub(image, pixel_value_offset)
+ image = math_ops.div(image, pixel_value_scale)
+ return image
+
+
+def random_brightness(image, max_delta, seed=None):
+ """Adjust the brightness of images by a random factor.
+
+ Equivalent to `adjust_brightness()` using a `delta` randomly picked in the
+ interval `[-max_delta, max_delta)`.
+
+ Note that `delta` is picked as a float. For integer-type images the
+ brightness-adjusted result is rounded before casting back, so integer images
+ may see modifications anywhere in the closed range `[-max_delta, max_delta]`.
+
+ Args:
+ image: 3-D tensor of shape `[height, width, channels]`.
+ max_delta: float, must be non-negative.
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ 3-D tensor of images of shape `[height, width, channels]`
+
+ Raises:
+ ValueError: if max_delta is negative.
+ """
+ _Check3DImage(image)
+
+ if max_delta < 0:
+ raise ValueError('max_delta must be non-negative.')
+
+ delta = random_ops.random_uniform([], -max_delta, max_delta, seed=seed)
+ return adjust_brightness(image, delta)
+
+
+def random_contrast(image, lower, upper, seed=None):
+ """Adjust the contrase of an image by a random factor.
+
+ Equivalent to `adjust_constrast()` but uses a `contrast_factor` randomly
+ picked in the interval `[lower, upper]`.
+
+ Args:
+ image: 3-D tensor of shape `[height, width, channels]`.
+ lower: float. Lower bound for the random contrast factor.
+ upper: float. Upper bound for the random contrast factor.
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ 3-D tensor of shape `[height, width, channels]`.
+
+ Raises:
+ ValueError: if `upper <= lower` or if `lower < 0`.
+ """
+ _Check3DImage(image)
+
+ if upper <= lower:
+ raise ValueError('upper must be > lower.')
+
+ if lower < 0:
+ raise ValueError('lower must be non-negative.')
+
+ # Generate a float in [lower, upper].
+ contrast_factor = random_ops.random_uniform([], lower, upper, seed=seed)
+ return adjust_contrast(image, contrast_factor)
+
+
+def adjust_brightness(image, delta, min_value=None, max_value=None):
+ """Adjust the brightness of RGB or Grayscale images.
+
+ The value `delta` is added to all components of the tensor `image`. `image`
+ and `delta` are cast to `float` before adding, and the resulting values are
+ clamped to `[min_value, max_value]`. Finally, the result is cast back to
+ `image.dtype`.
+
+ If `min_value` or `max_value` are not given, they are set to the minimum and
+ maximum allowed values for `image.dtype` respectively.
+
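+ For example, for a `uint8` image, `adjust_brightness(image, 10)` adds 10 to
+ every channel value, clipping the results at 255.
+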
+ Args:
+ image: A tensor.
+ delta: A scalar. Amount to add to the pixel values.
+ min_value: Minimum value for output.
+ max_value: Maximum value for output.
+
+ Returns:
+ A tensor of the same shape and type as `image`.
+ """
+ if min_value is None:
+ min_value = image.dtype.min
+ if max_value is None:
+ max_value = image.dtype.max
+
+ with ops.op_scope([image, delta, min_value, max_value], None,
+ 'adjust_brightness') as name:
+ adjusted = math_ops.add(
+ math_ops.cast(image, types.float32),
+ math_ops.cast(delta, types.float32),
+ name=name)
+ if image.dtype.is_integer:
+ rounded = math_ops.round(adjusted)
+ else:
+ rounded = adjusted
+ clipped = clip_ops.clip_by_value(rounded, float(min_value),
+ float(max_value))
+ output = math_ops.cast(clipped, image.dtype)
+ return output
+
+
+def adjust_contrast(images, contrast_factor, min_value=None, max_value=None):
+ """Adjust contrast of RGB or grayscale images.
+
+ `images` is a tensor of at least 3 dimensions. The last 3 dimensions are
+ interpreted as `[height, width, channels]`. The other dimensions only
+ represent a collection of images, such as `[batch, height, width, channels].`
+
+ Contrast is adjusted independently for each channel of each image.
+
+ For each channel, this Op first computes the mean of the image pixels in the
+ channel and then adjusts each component `x` of each pixel to
+ `(x - mean) * contrast_factor + mean`.
+
+ The adjusted values are then clipped to fit in the `[min_value, max_value]`
+ interval. If `min_value` or `max_value` is not given, it is replaced with the
+ minimum and maximum values for the data type of `images` respectively.
+
+ The contrast-adjusted image is always computed as `float`, and it is
+ cast back to its original type after clipping.
+
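+ For example, for a single-channel image with pixel values `[1., 3.]` (mean
+ 2.0) and `contrast_factor=2.0`, the adjusted values are `[0., 4.]` before
+ clipping.
+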
+ Args:
+ images: Images to adjust. At least 3-D.
+ contrast_factor: A float multiplier for adjusting contrast.
+ min_value: Minimum value for clipping the adjusted pixels.
+ max_value: Maximum value for clipping the adjusted pixels.
+
+ Returns:
+ The contrast-adjusted image or images.
+
+ Raises:
+ ValueError: if the arguments are invalid.
+ """
+ _CheckAtLeast3DImage(images)
+
+ # If these are None, the min/max should be a nop, but still prevent overflows
+ # from the cast back to images.dtype at the end of adjust_contrast.
+ if min_value is None:
+ min_value = images.dtype.min
+ if max_value is None:
+ max_value = images.dtype.max
+
+ with ops.op_scope(
+ [images, contrast_factor, min_value,
+ max_value], None, 'adjust_contrast') as name:
+ adjusted = gen_image_ops.adjust_contrast(images,
+ contrast_factor=contrast_factor,
+ min_value=min_value,
+ max_value=max_value,
+ name=name)
+ if images.dtype.is_integer:
+ return math_ops.cast(math_ops.round(adjusted), images.dtype)
+ else:
+ return math_ops.cast(adjusted, images.dtype)
+
+
+ops.RegisterShape('AdjustContrast')(
+ common_shapes.unchanged_shape_with_rank_at_least(3))
+
+
+@ops.RegisterShape('ResizeBilinear')
+@ops.RegisterShape('ResizeNearestNeighbor')
+@ops.RegisterShape('ResizeBicubic')
+@ops.RegisterShape('ResizeArea')
+def _ResizeShape(op):
+ """Shape function for the resize_bilinear and resize_nearest_neighbor ops."""
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ size = tensor_util.ConstantValue(op.inputs[1])
+ if size is not None:
+ height = size[0]
+ width = size[1]
+ else:
+ height = None
+ width = None
+ return [tensor_shape.TensorShape(
+ [input_shape[0], height, width, input_shape[3]])]
+
+
+@ops.RegisterShape('DecodeJpeg')
+@ops.RegisterShape('DecodePng')
+def _ImageDecodeShape(op):
+ """Shape function for image decoding ops."""
+ unused_input_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ channels = op.get_attr('channels') or None
+ return [tensor_shape.TensorShape([None, None, channels])]
+
+
+@ops.RegisterShape('EncodeJpeg')
+@ops.RegisterShape('EncodePng')
+def _ImageEncodeShape(op):
+ """Shape function for image encoding ops."""
+ unused_input_shape = op.inputs[0].get_shape().with_rank(3)
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape('RandomCrop')
+def _random_cropShape(op):
+ """Shape function for the random_crop op."""
+ input_shape = op.inputs[0].get_shape().with_rank(3)
+ unused_size_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.vector(2))
+ size = tensor_util.ConstantValue(op.inputs[1])
+ if size is not None:
+ height = size[0]
+ width = size[1]
+ else:
+ height = None
+ width = None
+ channels = input_shape[2]
+ return [tensor_shape.TensorShape([height, width, channels])]
+
+
+def random_crop(image, size, seed=None, name=None):
+ """Randomly crops `image` to size `[target_height, target_width]`.
+
+ The offset of the output within `image` is uniformly random. `image` always
+ fully contains the result.
+
+ Args:
+ image: 3-D tensor of shape `[height, width, channels]`
+ size: 1-D tensor with two elements, specifying target `[height, width]`
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for this operation (optional).
+
+ Returns:
+ A cropped 3-D tensor of shape `[target_height, target_width, channels]`.
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_image_ops.random_crop(image, size, seed=seed1, seed2=seed2,
+ name=name)
diff --git a/tensorflow/python/ops/image_ops_test.py b/tensorflow/python/ops/image_ops_test.py
new file mode 100644
index 0000000000..2c51299198
--- /dev/null
+++ b/tensorflow/python/ops/image_ops_test.py
@@ -0,0 +1,771 @@
+"""Tests for tensorflow.ops.image_ops."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import image_ops
+from tensorflow.python.ops import io_ops
+from tensorflow.python.platform import googletest
+
+
+class FlipTest(test_util.TensorFlowTestCase):
+
+ def testIdempotentLeftRight(self):
+ x_np = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.uint8).reshape([2, 3, 1])
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.flip_left_right(image_ops.flip_left_right(x_tf))
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testLeftRight(self):
+ x_np = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.uint8).reshape([2, 3, 1])
+ y_np = np.array([[3, 2, 1], [3, 2, 1]], dtype=np.uint8).reshape([2, 3, 1])
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.flip_left_right(x_tf)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+ def testIdempotentUpDown(self):
+ x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8).reshape([2, 3, 1])
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.flip_up_down(image_ops.flip_up_down(x_tf))
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testUpDown(self):
+ x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8).reshape([2, 3, 1])
+ y_np = np.array([[4, 5, 6], [1, 2, 3]], dtype=np.uint8).reshape([2, 3, 1])
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.flip_up_down(x_tf)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+ def testIdempotentTranspose(self):
+ x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8).reshape([2, 3, 1])
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.transpose_image(image_ops.transpose_image(x_tf))
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testTranspose(self):
+ x_np = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8).reshape([2, 3, 1])
+ y_np = np.array([[1, 4], [2, 5], [3, 6]], dtype=np.uint8).reshape([3, 2, 1])
+
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.transpose_image(x_tf)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+
+class RandomFlipTest(test_util.TensorFlowTestCase):
+
+ def testRandomLeftRight(self):
+ x_np = np.array([0, 1], dtype=np.uint8).reshape([1, 2, 1])
+ num_iterations = 500
+
+ hist = [0, 0]
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.random_flip_left_right(x_tf)
+ for _ in xrange(num_iterations):
+ y_np = y.eval().flatten()[0]
+ hist[y_np] += 1
+
+ # Ensure that each entry is observed within 4 standard deviations.
+ four_stddev = 4.0 * np.sqrt(num_iterations / 2.0)
+ self.assertAllClose(hist, [num_iterations / 2.0] * 2, atol=four_stddev)
+
+ def testRandomUpDown(self):
+ x_np = np.array([0, 1], dtype=np.uint8).reshape([2, 1, 1])
+ num_iterations = 500
+
+ hist = [0, 0]
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.random_flip_up_down(x_tf)
+ for _ in xrange(num_iterations):
+ y_np = y.eval().flatten()[0]
+ hist[y_np] += 1
+
+ # Ensure that each entry is observed within 4 standard deviations.
+ four_stddev = 4.0 * np.sqrt(num_iterations / 2.0)
+ self.assertAllClose(hist, [num_iterations / 2.0] * 2, atol=four_stddev)
+
+
+class AdjustContrastTest(test_util.TensorFlowTestCase):
+
+ def _testContrast(self, x_np, y_np, contrast_factor, min_value, max_value):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu):
+ x = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.adjust_contrast(x,
+ contrast_factor,
+ min_value=min_value,
+ max_value=max_value)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+ def testDoubleContrastUint8(self):
+ x_shape = [1, 2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.uint8).reshape(x_shape)
+
+ y_data = [0, 0, 0, 63, 169, 255, 29, 0, 255, 135, 255, 0]
+ y_np = np.array(y_data, dtype=np.uint8).reshape(x_shape)
+
+ self._testContrast(x_np,
+ y_np,
+ contrast_factor=2.0,
+ min_value=None,
+ max_value=None)
+
+ def testDoubleContrastFloat(self):
+ x_shape = [1, 2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.float).reshape(x_shape)
+
+ y_data = [0, 0, 0, 62.75, 169.25, 255, 28.75, 0, 255, 134.75, 255, 0]
+ y_np = np.array(y_data, dtype=np.float).reshape(x_shape)
+
+ self._testContrast(x_np,
+ y_np,
+ contrast_factor=2.0,
+ min_value=0,
+ max_value=255)
+
+ def testHalfContrastUint8(self):
+ x_shape = [1, 2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.uint8).reshape(x_shape)
+
+ y_data = [23, 53, 66, 50, 118, 172, 41, 54, 176, 68, 178, 60]
+ y_np = np.array(y_data, dtype=np.uint8).reshape(x_shape)
+
+ self._testContrast(x_np,
+ y_np,
+ contrast_factor=0.5,
+ min_value=None,
+ max_value=None)
+
+ def testBatchDoubleContrast(self):
+ x_shape = [2, 1, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.uint8).reshape(x_shape)
+
+ y_data = [0, 0, 0, 81, 200, 255, 11, 0, 255, 117, 255, 0]
+ y_np = np.array(y_data, dtype=np.uint8).reshape(x_shape)
+
+ self._testContrast(x_np,
+ y_np,
+ contrast_factor=2.0,
+ min_value=None,
+ max_value=None)
+
+
+class AdjustBrightnessTest(test_util.TensorFlowTestCase):
+
+ def _testBrightness(self, x_np, y_np, delta, min_value, max_value):
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.adjust_brightness(x,
+ delta,
+ min_value=min_value,
+ max_value=max_value)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+ def testPositiveDeltaUint8(self):
+ x_shape = [2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.uint8).reshape(x_shape)
+
+ y_data = [10, 15, 23, 64, 145, 236, 47, 18, 244, 100, 255, 11]
+ y_np = np.array(y_data, dtype=np.uint8).reshape(x_shape)
+
+ self._testBrightness(x_np, y_np, delta=10.0, min_value=None, max_value=None)
+
+ def testPositiveDeltaFloat(self):
+ x_shape = [2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.float32).reshape(x_shape)
+
+ y_data = [10, 15, 23, 64, 145, 236, 47, 18, 244, 100, 265, 11]
+ y_np = np.array(y_data, dtype=np.float32).reshape(x_shape)
+
+ self._testBrightness(x_np, y_np, delta=10.0, min_value=None, max_value=None)
+
+ def testNegativeDelta(self):
+ x_shape = [2, 2, 3]
+ x_data = [0, 5, 13, 54, 135, 226, 37, 8, 234, 90, 255, 1]
+ x_np = np.array(x_data, dtype=np.uint8).reshape(x_shape)
+
+ y_data = [5, 5, 5, 44, 125, 216, 27, 5, 224, 80, 245, 5]
+ y_np = np.array(y_data, dtype=np.uint8).reshape(x_shape)
+
+ self._testBrightness(x_np, y_np, delta=-10.0, min_value=5, max_value=None)
+
+
+class RandomCropTest(test_util.TensorFlowTestCase):
+
+ def testNoOp(self):
+ # No random cropping is performed since the target width and height
+ # match the image dimensions.
+ height = 4
+ width = 5
+ x_shape = [height, width, 3]
+ x_np = np.arange(0, np.prod(x_shape), dtype=np.int32).reshape(x_shape)
+ target_shape_np = np.array([height, width], dtype=np.int64)
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_shape)
+ target_shape = constant_op.constant(target_shape_np, shape=[2])
+ y = image_ops.random_crop(x, target_shape)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testRandomization(self):
+ # Run 1x1 crop num_samples times in an image and ensure that one finds each
+ # pixel 1/num_pixels of the time.
+ num_samples = 1000
+ height = 5
+ width = 4
+
+ num_pixels = height * width
+ data = np.arange(num_pixels).reshape([height, width, 1])
+ x_np = np.array(data).astype(np.int32)
+
+ target_shape_np = np.array([1, 1], dtype=np.int64)
+
+ y = []
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_np.shape)
+ target_shape = constant_op.constant(target_shape_np, shape=[2])
+ y_tf = image_ops.random_crop(x, target_shape)
+ for _ in xrange(num_samples):
+ y_np = y_tf.eval()
+ self.assertAllEqual(y_np.shape, [1, 1, 1])
+ y.extend(y_np.flatten())
+
+ # Calculate the mean and 4 * standard deviation.
+ mean = [num_samples / num_pixels] * num_pixels
+ four_stddev = 4.0 * np.sqrt(mean)
+
+ # Ensure that each entry is observed in 1/num_pixels of the samples
+ # within 4 standard deviations.
+ counts = np.bincount(y)
+ self.assertAllClose(counts, mean, atol=four_stddev)
+
+
+class PerImageWhiteningTest(test_util.TensorFlowTestCase):
+
+ def _NumpyPerImageWhitening(self, x):
+ num_pixels = np.prod(x.shape)
+ x2 = np.square(x).astype(np.float32)
+ mn = np.mean(x)
+ vr = np.mean(x2) - (mn * mn)
+ stddev = max(math.sqrt(vr), 1.0 / math.sqrt(num_pixels))
+
+ y = x.astype(np.float32)
+ y -= mn
+ y /= stddev
+ return y
+
+ def testBasic(self):
+ x_shape = [13, 9, 3]
+ x_np = np.arange(0, np.prod(x_shape), dtype=np.int32).reshape(x_shape)
+ y_np = self._NumpyPerImageWhitening(x_np)
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_shape)
+ y = image_ops.per_image_whitening(x)
+ y_tf = y.eval()
+ self.assertAllClose(y_tf, y_np, atol=1e-4)
+
+
+class CropToBoundingBoxTest(test_util.TensorFlowTestCase):
+
+ def testNoOp(self):
+ x_shape = [13, 9, 3]
+ x_np = np.ones(x_shape, dtype=np.float32)
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_shape)
+ target_height = x_shape[0]
+ target_width = x_shape[1]
+ y = image_ops.crop_to_bounding_box(x, 0, 0, target_height, target_width)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testCropping(self):
+ x_np = np.arange(0, 30, dtype=np.int32).reshape([6, 5, 1])
+
+ offset_height = 1
+ after_height = 2
+
+ offset_width = 0
+ after_width = 3
+
+ target_height = x_np.shape[0] - offset_height - after_height
+ target_width = x_np.shape[1] - offset_width - after_width
+
+ y_np = x_np[offset_height:offset_height + target_height,
+ offset_width:offset_width + target_width, :]
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_np.shape)
+ y = image_ops.crop_to_bounding_box(x, offset_height, offset_width,
+ target_height, target_width)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf.flatten(), y_np.flatten())
+
+
+class PadToBoundingBoxTest(test_util.TensorFlowTestCase):
+
+ def testNoOp(self):
+ x_shape = [13, 9, 3]
+ x_np = np.ones(x_shape, dtype=np.float32)
+
+ target_height = x_shape[0]
+ target_width = x_shape[1]
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_shape)
+ y = image_ops.pad_to_bounding_box(x, 0, 0, target_height, target_width)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, x_np)
+
+ def testPadding(self):
+ x_shape = [3, 4, 1]
+ x_np = np.ones(x_shape, dtype=np.float32)
+
+ offset_height = 2
+ after_height = 3
+
+ offset_width = 1
+ after_width = 4
+
+ target_height = x_shape[0] + offset_height + after_height
+ target_width = x_shape[1] + offset_width + after_width
+
+ # Note the paddings are along height, width and depth.
+ paddings = ((offset_height, after_height),
+ (offset_width, after_width),
+ (0, 0))
+
+ y_np = np.pad(x_np, paddings, 'constant')
+
+ with self.test_session():
+ x = constant_op.constant(x_np, shape=x_shape)
+ y = image_ops.pad_to_bounding_box(x, offset_height, offset_width,
+ target_height, target_width)
+ y_tf = y.eval()
+ self.assertAllEqual(y_tf, y_np)
+
+
+class ResizeImagesTest(test_util.TensorFlowTestCase):
+
+ OPTIONS = [image_ops.ResizeMethod.BILINEAR,
+ image_ops.ResizeMethod.NEAREST_NEIGHBOR,
+ image_ops.ResizeMethod.BICUBIC,
+ image_ops.ResizeMethod.AREA]
+
+ def testNoOp(self):
+ img_shape = [1, 6, 4, 1]
+ data = [128, 128, 64, 64,
+ 128, 128, 64, 64,
+ 64, 64, 128, 128,
+ 64, 64, 128, 128,
+ 50, 50, 100, 100,
+ 50, 50, 100, 100]
+ img_np = np.array(data, dtype=np.uint8).reshape(img_shape)
+
+ target_height = 6
+ target_width = 4
+
+ for opt in self.OPTIONS:
+ with self.test_session():
+ image = constant_op.constant(img_np, shape=img_shape)
+ y = image_ops.resize_images(image, target_height, target_width, opt)
+ resized = y.eval()
+ self.assertAllClose(resized, img_np, atol=1e-5)
+
+ def testResizeDown(self):
+
+ data = [128, 128, 64, 64,
+ 128, 128, 64, 64,
+ 64, 64, 128, 128,
+ 64, 64, 128, 128,
+ 50, 50, 100, 100,
+ 50, 50, 100, 100]
+ expected_data = [128, 64,
+ 64, 128,
+ 50, 100]
+ target_height = 3
+ target_width = 2
+
+ # Test out 3-D and 4-D image shapes.
+ img_shapes = [[1, 6, 4, 1], [6, 4, 1]]
+ target_shapes = [[1, target_height, target_width, 1],
+ [target_height, target_width, 1]]
+
+ for target_shape, img_shape in zip(target_shapes, img_shapes):
+ img_np = np.array(data, dtype=np.uint8).reshape(img_shape)
+
+ for opt in self.OPTIONS:
+ with self.test_session():
+ image = constant_op.constant(img_np, shape=img_shape)
+ y = image_ops.resize_images(image, target_height, target_width, opt)
+ expected = np.array(expected_data).reshape(target_shape)
+ resized = y.eval()
+ self.assertAllClose(resized, expected, atol=1e-5)
+
+ def testResizeUp(self):
+ img_shape = [1, 3, 2, 1]
+ data = [128, 64,
+ 64, 128,
+ 50, 100]
+ img_np = np.array(data, dtype=np.uint8).reshape(img_shape)
+
+ target_height = 6
+ target_width = 4
+ expected_data = {}
+ expected_data[image_ops.ResizeMethod.BILINEAR] = [
+ 128.0, 96.0, 64.0, 64.0,
+ 96.0, 96.0, 96.0, 96.0,
+ 64.0, 96.0, 128.0, 128.0,
+ 57.0, 85.5, 114.0, 114.0,
+ 50.0, 75.0, 100.0, 100.0,
+ 50.0, 75.0, 100.0, 100.0]
+ expected_data[image_ops.ResizeMethod.NEAREST_NEIGHBOR] = [
+ 128.0, 128.0, 64.0, 64.0,
+ 128.0, 128.0, 64.0, 64.0,
+ 64.0, 64.0, 128.0, 128.0,
+ 64.0, 64.0, 128.0, 128.0,
+ 50.0, 50.0, 100.0, 100.0,
+ 50.0, 50.0, 100.0, 100.0]
+ expected_data[image_ops.ResizeMethod.AREA] = [
+ 128.0, 128.0, 64.0, 64.0,
+ 128.0, 128.0, 64.0, 64.0,
+ 64.0, 64.0, 128.0, 128.0,
+ 64.0, 64.0, 128.0, 128.0,
+ 50.0, 50.0, 100.0, 100.0,
+ 50.0, 50.0, 100.0, 100.0]
+
+ for opt in [
+ image_ops.ResizeMethod.BILINEAR,
+ image_ops.ResizeMethod.NEAREST_NEIGHBOR,
+ image_ops.ResizeMethod.AREA]:
+ with self.test_session():
+ image = constant_op.constant(img_np, shape=img_shape)
+ y = image_ops.resize_images(image, target_height, target_width, opt)
+ resized = y.eval()
+ expected = np.array(expected_data[opt]).reshape(
+ [1, target_height, target_width, 1])
+ self.assertAllClose(resized, expected, atol=1e-05)
+
+ def testResizeUpBicubic(self):
+ img_shape = [1, 6, 6, 1]
+ data = [128, 128, 64, 64, 128, 128, 64, 64,
+ 64, 64, 128, 128, 64, 64, 128, 128,
+ 50, 50, 100, 100, 50, 50, 100, 100,
+ 50, 50, 100, 100, 50, 50, 100, 100,
+ 50, 50, 100, 100]
+ img_np = np.array(data, dtype=np.uint8).reshape(img_shape)
+
+ target_height = 8
+ target_width = 8
+ expected_data = [128, 135, 96, 55, 64, 114, 134, 128,
+ 78, 81, 68, 52, 57, 118, 144, 136,
+ 55, 49, 79, 109, 103, 89, 83, 84,
+ 74, 70, 95, 122, 115, 69, 49, 55,
+ 100, 105, 75, 43, 50, 89, 105, 100,
+ 57, 54, 74, 96, 91, 65, 55, 58,
+ 70, 69, 75, 81, 80, 72, 69, 70,
+ 105, 112, 75, 36, 45, 92, 111, 105]
+
+ with self.test_session():
+ image = constant_op.constant(img_np, shape=img_shape)
+ y = image_ops.resize_images(image, target_height, target_width,
+ image_ops.ResizeMethod.BICUBIC)
+ resized = y.eval()
+ expected = np.array(expected_data).reshape(
+ [1, target_height, target_width, 1])
+ self.assertAllClose(resized, expected, atol=1)
+
+ def testResizeDownArea(self):
+ img_shape = [1, 6, 6, 1]
+ data = [128, 64, 32, 16, 8, 4,
+ 4, 8, 16, 32, 64, 128,
+ 128, 64, 32, 16, 8, 4,
+ 5, 10, 15, 20, 25, 30,
+ 30, 25, 20, 15, 10, 5,
+ 5, 10, 15, 20, 25, 30]
+ img_np = np.array(data, dtype=np.uint8).reshape(img_shape)
+
+ target_height = 4
+ target_width = 4
+ expected_data = [73, 33, 23, 39,
+ 73, 33, 23, 39,
+ 14, 16, 19, 21,
+ 14, 16, 19, 21]
+
+ with self.test_session():
+ image = constant_op.constant(img_np, shape=img_shape)
+ y = image_ops.resize_images(image, target_height, target_width,
+ image_ops.ResizeMethod.AREA)
+ expected = np.array(expected_data).reshape(
+ [1, target_height, target_width, 1])
+ resized = y.eval()
+ self.assertAllClose(resized, expected, atol=1)
+
+
+class ResizeImageWithCropOrPadTest(test_util.TensorFlowTestCase):
+
+ def _ResizeImageWithCropOrPad(self, original, original_shape,
+ expected, expected_shape):
+ x_np = np.array(original, dtype=np.uint8).reshape(original_shape)
+ y_np = np.array(expected).reshape(expected_shape)
+
+ target_height = expected_shape[0]
+ target_width = expected_shape[1]
+
+ with self.test_session():
+ image = constant_op.constant(x_np, shape=original_shape)
+ y = image_ops.resize_image_with_crop_or_pad(image,
+ target_height,
+ target_width)
+ resized = y.eval()
+ self.assertAllClose(resized, y_np, atol=1e-5)
+
+ def testBasic(self):
+ # Basic no-op.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ original, [2, 4, 1])
+
+ def testPad(self):
+ # Pad even along col.
+ original = [1, 2, 3, 4, 5, 6, 7, 8]
+ expected = [0, 1, 2, 3, 4, 0,
+ 0, 5, 6, 7, 8, 0]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [2, 6, 1])
+ # Pad odd along col.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ expected = [0, 1, 2, 3, 4, 0, 0,
+ 0, 5, 6, 7, 8, 0, 0]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [2, 7, 1])
+
+ # Pad even along row.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ expected = [0, 0, 0, 0,
+ 1, 2, 3, 4,
+ 5, 6, 7, 8,
+ 0, 0, 0, 0]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [4, 4, 1])
+ # Pad odd along row.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ expected = [0, 0, 0, 0,
+ 1, 2, 3, 4,
+ 5, 6, 7, 8,
+ 0, 0, 0, 0,
+ 0, 0, 0, 0]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [5, 4, 1])
+
+ def testCrop(self):
+ # Crop even along col.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ expected = [2, 3,
+ 6, 7]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [2, 2, 1])
+ # Crop odd along col.
+
+ original = [1, 2, 3, 4, 5, 6,
+ 7, 8, 9, 10, 11, 12]
+ expected = [2, 3, 4,
+ 8, 9, 10]
+ self._ResizeImageWithCropOrPad(original, [2, 6, 1],
+ expected, [2, 3, 1])
+
+ # Crop even along row.
+ original = [1, 2,
+ 3, 4,
+ 5, 6,
+ 7, 8]
+ expected = [3, 4,
+ 5, 6]
+ self._ResizeImageWithCropOrPad(original, [4, 2, 1],
+ expected, [2, 2, 1])
+
+ # Crop odd along row.
+ original = [1, 2,
+ 3, 4,
+ 5, 6,
+ 7, 8,
+ 9, 10,
+ 11, 12,
+ 13, 14,
+ 15, 16]
+ expected = [3, 4,
+ 5, 6,
+ 7, 8,
+ 9, 10,
+ 11, 12]
+ self._ResizeImageWithCropOrPad(original, [8, 2, 1],
+ expected, [5, 2, 1])
+
+ def testCropAndPad(self):
+ # Pad along row but crop along col.
+ original = [1, 2, 3, 4,
+ 5, 6, 7, 8]
+ expected = [0, 0,
+ 2, 3,
+ 6, 7,
+ 0, 0]
+ self._ResizeImageWithCropOrPad(original, [2, 4, 1],
+ expected, [4, 2, 1])
+
+ # Crop along row but pad along col.
+ original = [1, 2,
+ 3, 4,
+ 5, 6,
+ 7, 8]
+ expected = [0, 3, 4, 0,
+ 0, 5, 6, 0]
+ self._ResizeImageWithCropOrPad(original, [4, 2, 1],
+ expected, [2, 4, 1])
+
+
+def _SimpleColorRamp():
+ """Build a simple color ramp RGB image."""
+ w, h = 256, 200
+ i = np.arange(h)[:, None]
+ j = np.arange(w)
+ image = np.empty((h, w, 3), dtype=np.uint8)
+ image[:, :, 0] = i
+ image[:, :, 1] = j
+ image[:, :, 2] = (i + j) >> 1
+ return image
+
+
+class JpegTest(test_util.TensorFlowTestCase):
+
+ # TODO(irving): Add self.assertAverageLess or similar to test_util
+ def averageError(self, image0, image1):
+ self.assertEqual(image0.shape, image1.shape)
+ image0 = image0.astype(int) # Avoid overflow
+ return np.abs(image0 - image1).sum() / float(np.prod(image0.shape))
+
+ def testExisting(self):
+ # Read a real jpeg and verify shape
+ path = ('tensorflow/core/lib/jpeg/testdata/'
+ 'jpeg_merge_test1.jpg')
+ with self.test_session() as sess:
+ jpeg0 = io_ops.read_file(path)
+ image0 = image_ops.decode_jpeg(jpeg0)
+ image1 = image_ops.decode_jpeg(image_ops.encode_jpeg(image0))
+ jpeg0, image0, image1 = sess.run([jpeg0, image0, image1])
+ self.assertEqual(len(jpeg0), 3771)
+ self.assertEqual(image0.shape, (256, 128, 3))
+ self.assertLess(self.averageError(image0, image1), 0.8)
+
+ def testSynthetic(self):
+ with self.test_session() as sess:
+ # Encode it, then decode it, then encode it
+ image0 = constant_op.constant(_SimpleColorRamp())
+ jpeg0 = image_ops.encode_jpeg(image0)
+ image1 = image_ops.decode_jpeg(jpeg0)
+ image2 = image_ops.decode_jpeg(image_ops.encode_jpeg(image1))
+ jpeg0, image0, image1, image2 = sess.run([jpeg0, image0, image1, image2])
+
+ # The decoded-encoded image should be similar to the input
+ self.assertLess(self.averageError(image0, image1), 0.6)
+
+ # We should be very close to a fixpoint
+ self.assertLess(self.averageError(image1, image2), 0.02)
+
+ # Smooth ramps compress well (input size is 153600)
+ self.assertGreaterEqual(len(jpeg0), 5000)
+ self.assertLessEqual(len(jpeg0), 6000)
+
+ def testShape(self):
+ with self.test_session() as sess:
+ jpeg = constant_op.constant('nonsense')
+ for channels in 0, 1, 3:
+ image = image_ops.decode_jpeg(jpeg, channels=channels)
+ self.assertEqual(image.get_shape().as_list(),
+ [None, None, channels or None])
+
+
+class PngTest(test_util.TensorFlowTestCase):
+
+ def testExisting(self):
+ # Read some real PNGs, converting to different channel numbers
+ prefix = 'tensorflow/core/lib/png/testdata/'
+ inputs = (1, 'lena_gray.png'), (4, 'lena_rgba.png')
+ for channels_in, filename in inputs:
+ for channels in 0, 1, 3, 4:
+ with self.test_session() as sess:
+ png0 = io_ops.read_file(prefix + filename)
+ image0 = image_ops.decode_png(png0, channels=channels)
+ png0, image0 = sess.run([png0, image0])
+ self.assertEqual(image0.shape, (26, 51, channels or channels_in))
+ if channels == channels_in:
+ image1 = image_ops.decode_png(image_ops.encode_png(image0))
+ self.assertAllEqual(image0, image1.eval())
+
+ def testSynthetic(self):
+ with self.test_session() as sess:
+ # Encode it, then decode it
+ image0 = constant_op.constant(_SimpleColorRamp())
+ png0 = image_ops.encode_png(image0, compression=7)
+ image1 = image_ops.decode_png(png0)
+ png0, image0, image1 = sess.run([png0, image0, image1])
+
+ # PNG is lossless
+ self.assertAllEqual(image0, image1)
+
+ # Smooth ramps compress well, but not too well
+ self.assertGreaterEqual(len(png0), 400)
+ self.assertLessEqual(len(png0), 750)
+
+ def testShape(self):
+ with self.test_session() as sess:
+ png = constant_op.constant('nonsense')
+ for channels in 0, 1, 3:
+ image = image_ops.decode_png(png, channels=channels)
+ self.assertEqual(image.get_shape().as_list(),
+ [None, None, channels or None])
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/ops/init_ops.py b/tensorflow/python/ops/init_ops.py
new file mode 100644
index 0000000000..09c8801e0e
--- /dev/null
+++ b/tensorflow/python/ops/init_ops.py
@@ -0,0 +1,181 @@
+"""Operations often used for initializing tensors."""
+
+import math
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn_ops
+from tensorflow.python.ops import random_ops
+
+
+# TODO(mrry): PEP8 these.
+def constant_initializer(value=0.0):
+ """Returns an initializer that generates Tensors with a single value.
+
+ Args:
+ value: A Python scalar. All elements of the initialized variable
+ will be set to this value.
+
+ Returns:
+ An initializer that generates Tensors with a single value.
+ """
+ def _initializer(shape, dtype=types.float32):
+ return constant_op.constant(value, dtype=dtype, shape=shape)
+ return _initializer
+
+def random_uniform_initializer(minval=0.0, maxval=1.0, seed=None):
+ """Returns an initializer that generates Tensors with a uniform distribution.
+
+ Args:
+ minval: a python scalar or a scalar tensor. lower bound of the range
+ of random values to generate.
+ maxval: a python scalar or a scalar tensor. upper bound of the range
+ of random values to generate.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ An initializer that generates Tensors with a uniform distribution.
+ """
+ def _initializer(shape, dtype=types.float32):
+ return random_ops.random_uniform(shape, minval, maxval, dtype, seed=seed)
+ return _initializer
+
+def random_normal_initializer(mean=0.0, stddev=1.0, seed=None):
+ """Returns an initializer that generates Tensors with a normal distribution.
+
+ Args:
+ mean: a python scalar or a scalar tensor. Mean of the random values
+ to generate.
+ stddev: a python scalar or a scalar tensor. Standard deviation of the
+ random values to generate.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ An initializer that generates Tensors with a normal distribution.
+ """
+ def _initializer(shape, dtype=types.float32):
+ return random_ops.random_normal(shape, mean, stddev, dtype, seed=seed)
+ return _initializer
+
+def truncated_normal_initializer(mean=0.0, stddev=1.0, seed=None):
+ """Returns an initializer that generates a truncated normal distribution.
+
+ These values are similar to values from a random_normal_initializer
+ except that values more than two standard deviations from the mean
+ are discarded and re-drawn. This is the recommended initializer for
+ neural network weights and filters.
+
+ Args:
+ mean: a python scalar or a scalar tensor. Mean of the random values
+ to generate.
+ stddev: a python scalar or a scalar tensor. Standard deviation of the
+ random values to generate.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ An initializer that generates Tensors with a truncated normal
+ distribution.
+ """
+ def _initializer(shape, dtype=types.float32):
+ return random_ops.truncated_normal(shape, mean, stddev, dtype, seed=seed)
+ return _initializer
+
+def uniform_unit_scaling_initializer(factor=1.0, seed=None):
+ """Returns an initializer that generates tensors without scaling variance.
+
+ When initializing a deep network, it is in principle advantageous to keep
+ the scale of the input variance constant, so it does not explode or diminish
+ by the time it reaches the final layer. If the input is `x` and the operation
+ is `x * W`, and we want to initialize `W` uniformly at random, we need to
+ pick `W` from
+
+ [-sqrt(3) / sqrt(dim), sqrt(3) / sqrt(dim)]
+
+ to keep the scale intact, where `dim = W.shape[0]` (the size of the input).
+ A similar calculation for convolutional networks gives an analogous result
+ with `dim` equal to the product of the first 3 dimensions. When
+ nonlinearities are present, we need to multiply this by a constant `factor`.
+ See <https://arxiv.org/pdf/1412.6558v3.pdf> for deeper motivation, experiments
+ and the calculation of constants. In section 2.3 there, the constants were
+ numerically computed: for a linear layer it's 1.0, relu: ~1.43, tanh: ~1.15.
+
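+ For example, an illustrative sketch (the factor 1.43 is the relu value
+ quoted above):
+
+ initializer = uniform_unit_scaling_initializer(factor=1.43)
+ weights = initializer([64, 128])  # uniform in +/- 1.43 * sqrt(3 / 64)
+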
+ Args:
+ factor: Float. A multiplicative factor by which the values will be scaled.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+
+ Returns:
+ An initializer that generates tensors with unit variance.
+ """
+ def _initializer(shape, dtype=types.float32):
+ input_size = 1.0
+ # Estimating input size is not possible to do perfectly, but we try.
+ # The estimate, obtained by multiplying all dimensions but the last one,
+ # is the right thing for matrix multiply and convolutions (see above).
+ for dim in shape[:-1]:
+ input_size *= float(dim)
+ max_val = math.sqrt(float(3) / float(input_size)) * factor
+ return random_ops.random_uniform(shape, -max_val, max_val,
+ dtype, seed=seed)
+ return _initializer
+
+# TODO(vrv): Unhide when we are ready to expose this publicly.
+def _random_walk(shape, nonlinearity, dtype=types.float32, seed=None,
+ name="random_walk"):
+ """Create a random tensor such that backprop neither vanishes nor explodes.
+
+ Args:
+ shape: a python array of int or a 1-d tensor. Sizes of the Tensor.
+ nonlinearity: the Python TensorFlow function that implements the
+ nonlinearity in the graph.
+ dtype: The type of the output.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: string. Optional name for the op.
+
+ Returns:
+ A Tensor of the specified sizes filled with random values.
+ """
+ assert len(shape) == 2, "Random Walk initialization only supports 2D tensors."
+ num_inputs = shape[0]
+ if nonlinearity == math_ops.tanh:
+ # No real formula for this case yet, but this works well for many
+ # layer widths.
+ rwg = 1.13
+ elif nonlinearity == array_ops.identity:
+ rwg = math.exp(1.0 / float(2.0 * num_inputs))
+ elif nonlinearity == nn_ops.relu:
+ rwg = math.sqrt(2.0) * math.exp(1.2 / float(max(num_inputs, 6) - 2.4))
+ else:
+ assert False, "Unsupported nonlinearity for Random Walk initialization."
+
+ mean = 0.0
+ stddev = rwg / math.sqrt(float(num_inputs))
+
+ return random_ops.random_normal(shape, mean=mean, stddev=stddev, dtype=dtype,
+ seed=seed, name=name)
+
+
+# TODO(vrv): Unhide when we are ready to expose this publicly.
+class _RandomWalkInitializer(object):
+ """An Initializer that generates a tensor for Random Walk Initialization."""
+
+ def __init__(self, nonlinearity, seed=None):
+ """Construct a RandomWalkInitializer.
+
+ Args:
+ nonlinearity: the python tensorflow function that computes a nonlinearity
+ in the graph, typically after a Wx+b type operation.
+ seed: A Python integer. Used to create random seeds.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ """
+ self._nonlinearity = nonlinearity
+ self._seed = seed
+
+ def __call__(self, shape, dtype=types.float32):
+ """Generate a tensor used to initialize a variable."""
+ return random_ops._random_walk(shape, self._nonlinearity, dtype,
+ seed=self._seed)
diff --git a/tensorflow/python/ops/io_ops.py b/tensorflow/python/ops/io_ops.py
new file mode 100644
index 0000000000..9eb3bdfae4
--- /dev/null
+++ b/tensorflow/python/ops/io_ops.py
@@ -0,0 +1,541 @@
+"""## Placeholders
+
+TensorFlow provides a placeholder operation that must be fed with data
+on execution. For more info, see the section on [Feeding
+data](../../how_tos/reading_data/index.md#feeding).
+
+@@placeholder
+
+## Readers
+
+TensorFlow provides a set of Reader classes for reading data formats.
+For more information on inputs and readers, see [Reading
+data](../../how_tos/reading_data/index.md).
+
+@@ReaderBase
+@@TextLineReader
+@@WholeFileReader
+@@IdentityReader
+@@TFRecordReader
+@@FixedLengthRecordReader
+
+## Converting
+
+TensorFlow provides several operations that you can use to convert various data
+formats into tensors.
+
+@@decode_csv
+@@decode_raw
+@@parse_example
+@@parse_single_example
+
+## Queues
+
+TensorFlow provides several implementations of 'Queues', which are
+structures within the TensorFlow computation graph to stage pipelines
+of tensors together. The following describe the basic Queue interface
+and some implementations. To see an example use, see [Threading and
+Queues](../../how_tos/threading_and_queues/index.md).
+
+@@QueueBase
+@@FIFOQueue
+@@RandomShuffleQueue
+
+## Dealing with the filesystem
+
+@@matching_files
+@@read_file
+
+## Input pipeline
+
+TensorFlow functions for setting up an input-prefetching pipeline.
+Please see the [reading data how-to](../../how_tos/reading_data.md)
+for context.
+
+### Beginning of an input pipeline
+
+The "producer" functions add a queue to the graph and a corresponding
+`QueueRunner` for running the subgraph that fills that queue.
+
+@@match_filenames_once
+@@limit_epochs
+@@range_input_producer
+@@slice_input_producer
+@@string_input_producer
+
+### Batching at the end of an input pipeline
+
+These functions add a queue to the graph to assemble a batch of examples, with
+possible shuffling. They also add a `QueueRunner` for running the subgraph
+that fills that queue.
+
+Use [batch](#batch) or [batch_join](#batch_join) for batching examples that have
+already been well shuffled. Use [shuffle_batch](#shuffle_batch) or
+[shuffle_batch_join](#shuffle_batch_join) for examples that
+would benefit from additional shuffling.
+
+Use [batch](#batch) or [shuffle_batch](#shuffle_batch) if you want a
+single thread producing examples to batch, or if you have a
+single subgraph producing examples but you want to run it in N threads
+(where you increase N until it can keep the queue full). Use
+[batch_join](#batch_join) or [shuffle_batch_join](#shuffle_batch_join)
+if you have N different subgraphs producing examples to batch and you
+want them run by N threads.
+
+@@batch
+@@batch_join
+@@shuffle_batch
+@@shuffle_batch_join
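+
+For example, a minimal sketch of an input pipeline combining these pieces
+(the filenames, record defaults, and batch parameters below are placeholders,
+and the producer and batching functions are assumed to live in the usual
+`tf.train` namespace):
+
+```python
+filename_queue = tf.train.string_input_producer(["file0.csv", "file1.csv"])
+reader = tf.TextLineReader()
+key, value = reader.read(filename_queue)
+col1, col2 = tf.decode_csv(value, record_defaults=[[0.0], [0.0]])
+feature_batch, label_batch = tf.train.shuffle_batch(
+    [col1, col2], batch_size=32, capacity=1000, min_after_dequeue=500)
+```
+
+Running the batching op also requires starting the queue runners, e.g. with
+`tf.train.start_queue_runners(sess)`, as described in the threading and
+queues how-to.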
+"""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_io_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_io_ops import *
+# pylint: enable=wildcard-import
+
+
+# pylint: disable=protected-access
+def _save(filename, tensor_names, tensors, tensor_slices=None, name="save"):
+ """Save a list of tensors to a file with given names.
+
+ Example usage without slice info:
+ Save("/foo/bar", ["w", "b"], [w, b])
+
+ Example usage with slices:
+ Save("/foo/bar", ["w", "w"], [slice0, slice1],
+ tensor_slices=["4 10 0,2:-", "4 10 2,2:-"])
+
+ Args:
+ filename: the file name of the sstable.
+ tensor_names: a list of strings.
+ tensors: the list of tensors to be saved.
+ tensor_slices: Optional list of strings to specify the shape and slices of
+ a larger virtual tensor that each tensor is a part of. If not specified
+ each tensor is saved as a full slice.
+ name: string. Optional name for the op.
+
+ Requires:
+ The length of tensors should match the size of tensor_names and of
+ tensor_slices.
+
+ Returns:
+ An Operation that saves the tensors.
+ """
+ if tensor_slices is None:
+ return gen_io_ops._save(filename, tensor_names, tensors, name=name)
+ else:
+ return gen_io_ops._save_slices(filename, tensor_names, tensor_slices,
+ tensors, name=name)
+
+
+def _restore_slice(file_pattern, tensor_name, shape_and_slice, tensor_type,
+ name="restore_slice", preferred_shard=-1):
+ """Restore a tensor slice from a set of files with a given pattern.
+
+ Example usage:
+ RestoreSlice("/foo/bar-?????-of-?????", "w", "10 10 0,2:-", DT_FLOAT)
+
+ Args:
+ file_pattern: the file pattern used to match a set of checkpoint files.
+ tensor_name: the name of the tensor to restore.
+ shape_and_slice: the shape-and-slice spec of the slice.
+ tensor_type: the type of the tensor to restore.
+ name: string. Optional name for the op.
+ preferred_shard: Int. Optional shard to open first in the checkpoint file.
+
+ Returns:
+ A tensor of type "tensor_type".
+ """
+ base_type = types.as_dtype(tensor_type).base_dtype
+ return gen_io_ops._restore_slice(
+ file_pattern, tensor_name, shape_and_slice, base_type,
+ preferred_shard, name=name)
+
+
+@ops.RegisterShape("Restore")
+def _RestoreShape(op):
+ """Shape function for Restore op."""
+ # Validate input shapes.
+ unused_file_pattern = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_tensor_name = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("RestoreSlice")
+def _RestoreSliceShape(op):
+ """Shape function for RestoreSlice op."""
+ # Validate input shapes.
+ unused_file_pattern = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_tensor_name = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_shape_and_slice_shape = op.inputs[2].get_shape().merge_with(
+ tensor_shape.scalar())
+ # TODO(mrry): Attempt to parse the shape_and_slice value and use it
+ # to form the shape of the output.
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("Save")
+def _SaveShape(op):
+ """Shape function for Save op."""
+ # Validate input shapes.
+ unused_filename = op.inputs[0].get_shape().merge_with(tensor_shape.scalar())
+ data_count = len(op.inputs) - 2
+ unused_tensor_names_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.vector(data_count))
+ return []
+
+
+@ops.RegisterShape("SaveSlices")
+def _SaveSlicesShape(op):
+ """Shape function for SaveSlices op."""
+ # Validate input shapes.
+ unused_filename = op.inputs[0].get_shape().merge_with(tensor_shape.scalar())
+ data_count = len(op.inputs) - 3
+ unused_tensor_names_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.vector(data_count))
+ unused_shapes_and_slices_shape = op.inputs[2].get_shape().merge_with(
+ tensor_shape.vector(data_count))
+ # TODO(mrry): Attempt to parse the shapes_and_slices values and use
+ # them to constrain the shape of the remaining inputs.
+ return []
+
+
+@ops.RegisterShape("ShardedFilename")
+def _ShardedFilenameShape(op):
+ """Shape function for ShardedFilename op."""
+ # Validate input shapes.
+ unused_basename_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_shard_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_num_shards_shape = op.inputs[2].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape("ShardedFilespec")
+def _ShardedFilespecShape(op):
+ """Shape function for ShardedFilespec op."""
+ # Validate input shapes.
+ unused_basename_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_num_shards_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.scalar()]
+
+
+class ReaderBase(object):
+ """Base class for different Reader types, that produce a record every step.
+
+ Conceptually, Readers convert string 'work units' into records (key,
+ value pairs). Typically the 'work units' are filenames and the
+ records are extracted from the contents of those files. We want a
+ single record produced per step, but a work unit can correspond to
+ many records.
+
+  Therefore we introduce some decoupling using a queue. The queue
+  contains the work units, and the Reader dequeues from the queue when
+  it is asked to produce a record (via Read()) and it has finished
+  processing the previous work unit.
+ """
+
+ def __init__(self, reader_ref, supports_serialize=False):
+ """Creates a new ReaderBase.
+
+ Args:
+ reader_ref: The operation that implements the reader.
+ supports_serialize: True if the reader implementation can
+ serialize its state.
+ """
+ self._reader_ref = reader_ref
+ self._supports_serialize = supports_serialize
+
+ @property
+ def reader_ref(self):
+ """Op that implements the reader."""
+ return self._reader_ref
+
+ def read(self, queue, name=None):
+ """Returns the next record (key, value pair) produced by a reader.
+
+    Will dequeue a work unit from the queue if necessary (e.g. when the
+ Reader needs to start reading from a new file since it has
+ finished with the previous file).
+
+ Args:
+ queue: A Queue or a mutable string Tensor representing a handle
+ to a Queue, with string work items.
+ name: A name for the operation (optional).
+
+ Returns:
+ A tuple of Tensors (key, value).
+ key: A string scalar Tensor.
+ value: A string scalar Tensor.
+ """
+ if isinstance(queue, ops.Tensor):
+ queue_ref = queue
+ else:
+ queue_ref = queue.queue_ref
+ return gen_io_ops._reader_read(self._reader_ref, queue_ref, name=name)
+
+ def num_records_produced(self, name=None):
+ """Returns the number of records this reader has produced.
+
+ This is the same as the number of Read executions that have
+ succeeded.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ An int64 Tensor.
+
+ """
+ return gen_io_ops._reader_num_records_produced(self._reader_ref, name=name)
+
+ def num_work_units_completed(self, name=None):
+ """Returns the number of work units this reader has finished processing.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ An int64 Tensor.
+ """
+ return gen_io_ops._reader_num_work_units_completed(self._reader_ref,
+ name=name)
+
+ def serialize_state(self, name=None):
+ """Produce a string tensor that encodes the state of a reader.
+
+ Not all Readers support being serialized, so this can produce an
+ Unimplemented error.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ A string Tensor.
+ """
+ return gen_io_ops._reader_serialize_state(self._reader_ref, name=name)
+
+ def restore_state(self, state, name=None):
+ """Restore a reader to a previously saved state.
+
+ Not all Readers support being restored, so this can produce an
+ Unimplemented error.
+
+ Args:
+ state: A string Tensor.
+ Result of a SerializeState of a Reader with matching type.
+ name: A name for the operation (optional).
+
+ Returns:
+ The created Operation.
+ """
+ return gen_io_ops._reader_restore_state(self._reader_ref, state, name=name)
+
+ @property
+ def supports_serialize(self):
+ """Whether the Reader implementation can serialize its state."""
+ return self._supports_serialize
+
+ def reset(self, name=None):
+ """Restore a reader to its initial clean state.
+
+ Args:
+ name: A name for the operation (optional).
+
+ Returns:
+ The created Operation.
+ """
+ return gen_io_ops._reader_reset(self._reader_ref, name=name)
+
+
+ops.NoGradient("ReaderRead")
+ops.NoGradient("ReaderNumRecordsProduced")
+ops.NoGradient("ReaderNumWorkUnitsCompleted")
+ops.NoGradient("ReaderSerializeState")
+ops.NoGradient("ReaderRestoreState")
+ops.NoGradient("ReaderReset")
+
+
+class WholeFileReader(ReaderBase):
+ """A Reader that outputs the entire contents of a file as a value.
+
+ To use, enqueue filenames in a Queue. The output of Read will
+ be a filename (key) and the contents of that file (value).
+
+ See ReaderBase for supported methods.
+ """
+
+ def __init__(self, name=None):
+ """Create a WholeFileReader.
+
+ Args:
+ name: A name for the operation (optional).
+ """
+ rr = gen_io_ops._whole_file_reader(name=name)
+ super(WholeFileReader, self).__init__(rr, supports_serialize=True)
+
+
+ops.NoGradient("WholeFileReader")
+
+
+class TextLineReader(ReaderBase):
+ """A Reader that outputs the lines of a file delimited by newlines.
+
+ Newlines are stripped from the output.
+ See ReaderBase for supported methods.
+ """
+ # TODO(josh11b): Support serializing and restoring state.
+
+ def __init__(self, skip_header_lines=None, name=None):
+ """Create a TextLineReader.
+
+ Args:
+ skip_header_lines: An optional int. Defaults to 0. Number of lines
+ to skip from the beginning of every file.
+ name: A name for the operation (optional).
+ """
+ rr = gen_io_ops._text_line_reader(skip_header_lines=skip_header_lines,
+ name=name)
+ super(TextLineReader, self).__init__(rr)
+
+
+ops.NoGradient("TextLineReader")
+
+
+class FixedLengthRecordReader(ReaderBase):
+ """A Reader that outputs fixed-length records from a file.
+
+ See ReaderBase for supported methods.
+ """
+ # TODO(josh11b): Support serializing and restoring state.
+
+ def __init__(self, record_bytes, header_bytes=None, footer_bytes=None,
+ name=None):
+ """Create a FixedLengthRecordReader.
+
+ Args:
+ record_bytes: An int.
+ header_bytes: An optional int. Defaults to 0.
+ footer_bytes: An optional int. Defaults to 0.
+ name: A name for the operation (optional).
+ """
+ rr = gen_io_ops._fixed_length_record_reader(
+ record_bytes=record_bytes, header_bytes=header_bytes,
+ footer_bytes=footer_bytes, name=name)
+ super(FixedLengthRecordReader, self).__init__(rr)
+
+
+ops.NoGradient("FixedLengthRecordReader")
+
+
+class TFRecordReader(ReaderBase):
+ """A Reader that outputs the records from a TFRecords file.
+
+ See ReaderBase for supported methods.
+ """
+ # TODO(josh11b): Support serializing and restoring state.
+
+ def __init__(self, name=None):
+ """Create a TFRecordReader.
+
+ Args:
+ name: A name for the operation (optional).
+ """
+ rr = gen_io_ops._tf_record_reader(name=name)
+ super(TFRecordReader, self).__init__(rr)
+
+
+ops.NoGradient("TFRecordReader")
+
+
+class IdentityReader(ReaderBase):
+ """A Reader that outputs the queued work as both the key and value.
+
+ To use, enqueue strings in a Queue. Read will take the front
+ work string and output (work, work).
+
+ See ReaderBase for supported methods.
+ """
+
+ def __init__(self, name=None):
+ """Create a IdentityReader.
+
+ Args:
+ name: A name for the operation (optional).
+ """
+ rr = gen_io_ops._identity_reader(name=name)
+ super(IdentityReader, self).__init__(rr, supports_serialize=True)
+
+
+ops.NoGradient("IdentityReader")
+
+
+ops.RegisterShape("FixedLengthRecordReader")(common_shapes.scalar_shape)
+ops.RegisterShape("IdentityReader")(common_shapes.scalar_shape)
+ops.RegisterShape("TextLineReader")(common_shapes.scalar_shape)
+ops.RegisterShape("WholeFileReader")(common_shapes.scalar_shape)
+ops.RegisterShape("TFRecordReader")(common_shapes.scalar_shape)
+
+
+@ops.RegisterShape("ReaderNumRecordsProduced")
+@ops.RegisterShape("ReaderNumWorkUnitsCompleted")
+@ops.RegisterShape("ReaderSerializeState")
+def _ReaderScalarShape(op):
+ """Shape function for ops that transform a reader to a scalar."""
+ unused_handle_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.scalar()]
+
+
+@ops.RegisterShape("ReaderRead")
+def _ReaderReadShape(op):
+ """Shape function for the ReaderBase.Read op."""
+ unused_handle_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_queue_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.scalar(), tensor_shape.scalar()]
+
+
+@ops.RegisterShape("ReaderReset")
+def _ReaderResetShape(op):
+ """Shape function for the ReaderBase.Reset op."""
+ unused_handle_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ return []
+
+
+@ops.RegisterShape("ReaderRestoreState")
+def _ReaderRestoreStateShape(op):
+ """Shape function for the ReaderBase.Restore op."""
+ unused_handle_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ unused_state_shape = op.inputs[1].get_shape().merge_with(
+ tensor_shape.scalar())
+ return []
+
+
+@ops.RegisterShape("ReadFile")
+def _ReadFileShape(op):
+ """Shape function for the ReadFile op."""
+ return [op.inputs[0].get_shape().merge_with(tensor_shape.scalar())]
+
+
+@ops.RegisterShape("MatchingFiles")
+def _MatchingFilesShape(op):
+ """Shape function for the MatchingFiles op."""
+  unused_pattern_shape = op.inputs[0].get_shape().merge_with(
+ tensor_shape.scalar())
+ return [tensor_shape.unknown_shape(ndims=1)]
diff --git a/tensorflow/python/ops/linalg_grad.py b/tensorflow/python/ops/linalg_grad.py
new file mode 100644
index 0000000000..893618c9dd
--- /dev/null
+++ b/tensorflow/python/ops/linalg_grad.py
@@ -0,0 +1,25 @@
+"""Gradients for operators defined in linalg_ops.py."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import linalg_ops
+from tensorflow.python.ops import math_ops
+
+@ops.RegisterGradient("MatrixInverse")
+def _MatrixInverseGrad(op, grad):
+ """Gradient for MatrixInverse."""
+ ainv = op.outputs[0]
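+  # For Y = A^-1 we have dY = -A^-1 dA A^-1, so the gradient w.r.t. A for an
+  # incoming gradient G is -A^-T G A^-T, i.e. -ainv^T * G * ainv^T, which is
+  # what the matmul expression below computes.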
+ return -math_ops.matmul(
+ ainv,
+ math_ops.matmul(grad, ainv, transpose_b=True),
+ transpose_a=True)
+
+@ops.RegisterGradient("BatchMatrixInverse")
+def _BatchMatrixInverseGrad(op, grad):
+ """Gradient for BatchMatrixInverse."""
+ ainv = op.outputs[0]
+ return -math_ops.batch_matmul(
+ ainv,
+ math_ops.batch_matmul(grad, ainv, adj_y=True),
+ adj_x=True)
diff --git a/tensorflow/python/ops/linalg_ops.py b/tensorflow/python/ops/linalg_ops.py
new file mode 100644
index 0000000000..76fd83fb3d
--- /dev/null
+++ b/tensorflow/python/ops/linalg_ops.py
@@ -0,0 +1,62 @@
+"""Operations for linear algebra."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.ops import gen_linalg_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_linalg_ops import *
+# pylint: enable=wildcard-import
+
+
+@ops.RegisterShape("Cholesky")
+def _CholeskyShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank(2)
+ # The matrix must be square.
+ input_shape[0].assert_is_compatible_with(input_shape[1])
+ return [input_shape]
+
+
+@ops.RegisterShape("BatchCholesky")
+def _BatchCholeskyShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank_at_least(3)
+ # The matrices in the batch must be square.
+ input_shape[-1].assert_is_compatible_with(input_shape[-2])
+ return [input_shape]
+
+
+@ops.RegisterShape("MatrixDeterminant")
+def _MatrixDeterminantShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank(2)
+ # The matrix must be square.
+ input_shape[0].assert_is_compatible_with(input_shape[1])
+ if input_shape.ndims is not None:
+ return [tensor_shape.scalar()]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("BatchMatrixDeterminant")
+def _BatchMatrixDeterminantShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank_at_least(3)
+ # The matrices in the batch must be square.
+ input_shape[-1].assert_is_compatible_with(input_shape[-2])
+ if input_shape.ndims is not None:
+ return [input_shape[:-2]]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+
+@ops.RegisterShape("MatrixInverse")
+def _MatrixInverseShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank(2)
+ # The matrix must be square.
+ input_shape[0].assert_is_compatible_with(input_shape[1])
+ return [input_shape]
+
+
+@ops.RegisterShape("BatchMatrixInverse")
+def _BatchMatrixInverseShape(op):
+ input_shape = op.inputs[0].get_shape().with_rank_at_least(3)
+ # The matrices in the batch must be square.
+ input_shape[-1].assert_is_compatible_with(input_shape[-2])
+ return [input_shape]
diff --git a/tensorflow/python/ops/logging_ops.py b/tensorflow/python/ops/logging_ops.py
new file mode 100644
index 0000000000..0fad4a2dde
--- /dev/null
+++ b/tensorflow/python/ops/logging_ops.py
@@ -0,0 +1,58 @@
+"""Logging Operations."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_logging_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_logging_ops import *
+# pylint: enable=wildcard-import
+
+
+# `assert` and `print` are reserved words in Python, so we must
+# use capitalized versions of them.
+def Assert(condition, data, summarize=None, name=None):
+ """Asserts that the given condition is true.
+
+ If `condition` evaluates to false, print the list of tensors in `data`.
+ `summarize` determines how many entries of the tensors to print.
+
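+  For example, a sketch of guarding a computation on a runtime condition (the
+  tensor `x` and the bound are placeholders):
+
+  ```python
+  assert_op = tf.Assert(tf.less_equal(tf.reduce_max(x), 100.), [x], summarize=5)
+  with tf.control_dependencies([assert_op]):
+    y = tf.identity(x)  # y is only computed after the assertion runs
+  ```
+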
+ Args:
+ condition: The condition to evaluate.
+ data: The tensors to print out when condition is false.
+ summarize: Print this many entries of each tensor.
+ name: A name for this operation (optional).
+ """
+ return gen_logging_ops._assert(condition, data, summarize, name)
+
+
+def Print(input_, data, message=None, first_n=None, summarize=None,
+ name=None):
+ """Prints a list of tensors.
+
+ This is an identity op with the side effect of printing `data` when
+ evaluating.
+
+ Args:
+ input_: A tensor passed through this op.
+ data: A list of tensors to print out when op is evaluated.
+    message: A string, prefix of the printed message.
+ first_n: Only log `first_n` number of times. Negative numbers log always;
+ this is the default.
+ summarize: Only print this many entries of each tensor.
+ name: A name for the operation (optional).
+
+ Returns:
+ Same tensor as `input_`.
+ """
+ return gen_logging_ops._print(input_, data, message, first_n, summarize, name)
+
+
+@ops.RegisterGradient("Print")
+def _PrintGrad(op, *grad):
+ return list(grad) + [None] * (len(op.inputs) - 1)
+
+
+# NOTE(mrry): Assert and Print produce an empty output, which is
+# presumably never read.
+ops.RegisterShape("Assert")(common_shapes.unknown_shape)
+ops.RegisterShape("Print")(common_shapes.unknown_shape)
diff --git a/tensorflow/python/ops/math_grad.py b/tensorflow/python/ops/math_grad.py
new file mode 100644
index 0000000000..cb808ff5b8
--- /dev/null
+++ b/tensorflow/python/ops/math_grad.py
@@ -0,0 +1,506 @@
+"""Gradients for operators defined in math_ops.py."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import data_flow_ops
+from tensorflow.python.ops import gen_array_ops
+from tensorflow.python.ops import gen_math_ops
+from tensorflow.python.ops import math_ops
+
+
+def _ReductionGradAssist(op):
+ """Reduction grads have much in common, so factor the commonality out."""
+ inp = op.inputs[0] # Example:
+ input_shape = array_ops.shape(inp) # [2, 3, 5, 7]
+ input_rank = array_ops.rank(inp) # 4
+ indices = op.inputs[1] # [1, 2]
+ indices_shape = array_ops.shape(indices) # [2]
+ new_output_shape = data_flow_ops.dynamic_stitch( # [2, 1, 1, 7]
+ [math_ops.range(0, input_rank), # [0, 1, 2, 3]
+ indices], # [1, 2]
+ [input_shape, # [2, 3, 5, 7]
+ array_ops.fill(indices_shape, 1)]) # [1, 1]
+ return inp, new_output_shape, input_shape
+
+
+@ops.RegisterGradient("Sum")
+def _SumGrad(op, grad):
+ """Gradient for Sum."""
+ _, new_output_shape, input_shape = _ReductionGradAssist(op)
+ tile_scaling = input_shape / new_output_shape
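+  # Continuing the example in _ReductionGradAssist: tile_scaling is
+  # [2, 3, 5, 7] / [2, 1, 1, 7] = [1, 3, 5, 1], so the incoming gradient is
+  # reshaped to [2, 1, 1, 7] and tiled back to the input shape [2, 3, 5, 7].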
+ grad = array_ops.reshape(grad, new_output_shape)
+ return [array_ops.tile(grad, tile_scaling), None]
+
+
+def _MinOrMaxGrad(op, grad):
+ """Gradient for Max or Max. Amazingly it's precisely the same code."""
+ inp, new_output_shape, _ = _ReductionGradAssist(op)
+ y = op.outputs[0]
+ y = array_ops.reshape(y, new_output_shape)
+ grad = array_ops.reshape(grad, new_output_shape)
+ indicators = math_ops.cast(math_ops.equal(y, inp), grad.dtype)
+ return [indicators * grad, None]
+
+
+@ops.RegisterGradient("Max")
+def _MaxGrad(op, grad):
+ """Gradient for Max."""
+ return _MinOrMaxGrad(op, grad)
+
+
+@ops.RegisterGradient("Min")
+def _MinGrad(op, grad):
+ return _MinOrMaxGrad(op, grad)
+
+
+@ops.RegisterGradient("Mean")
+def _MeanGrad(op, grad):
+ """Gradient for Mean."""
+ sum_grad = _SumGrad(op, grad)[0]
+ input_shape = array_ops.shape(op.inputs[0])
+ output_shape = array_ops.shape(op.outputs[0])
+ factor = (math_ops.reduce_prod(input_shape) /
+ math_ops.reduce_prod(output_shape))
+ return sum_grad / math_ops.cast(factor, sum_grad.dtype), None
+
+
+@ops.RegisterGradient("Prod")
+def _ProdGrad(op, grad):
+ """Gradient for Prod."""
+ # TODO(kearnes): this gives NaNs for 0s in the input tensor
+ _, new_output_shape, input_shape = _ReductionGradAssist(op)
+ tile_scaling = input_shape / new_output_shape
+ grad = array_ops.reshape(grad * op.outputs[0], new_output_shape)
+ grad = math_ops.div(array_ops.tile(grad, tile_scaling), op.inputs[0])
+ return grad, None
+
+
+@ops.RegisterGradient("SegmentSum")
+def _SegmentSumGrad(op, grad):
+ """Gradient for SegmentSum."""
+ return array_ops.gather(grad, op.inputs[1]), None
+
+
+@ops.RegisterGradient("SegmentMean")
+def _SegmentMeanGrad(op, grad):
+ """Gradient for SegmentMean."""
+ input_rank = array_ops.rank(op.inputs[0])
+ ones_shape = array_ops.concat(
+ 0, [array_ops.shape(op.inputs[1]),
+ array_ops.fill(array_ops.expand_dims(input_rank - 1, 0), 1)])
+ ones = array_ops.fill(ones_shape,
+ constant_op.constant(1, dtype=grad.dtype))
+ scaled_grad = grad * math_ops.inv(math_ops.segment_sum(ones, op.inputs[1]))
+ return array_ops.gather(scaled_grad, op.inputs[1]), None
+
+
+@ops.RegisterGradient("SparseSegmentSum")
+def _SparseSegmentSumGrad(op, grad):
+ """Gradient for SparseSegmentSum."""
+ input_rows = array_ops.shape(op.inputs[0])[0]
+ return (math_ops.unsorted_segment_sum(
+ array_ops.gather(grad, op.inputs[2]),
+ op.inputs[1], input_rows), None, None)
+
+
+@ops.RegisterGradient("SparseSegmentMean")
+def _SparseSegmentMeanGrad(op, grad):
+ """Gradient for SparseSegmentMean."""
+ dim0 = array_ops.shape(op.inputs[0])[0]
+ return (math_ops.sparse_segment_mean_grad(grad,
+ op.inputs[1],
+ op.inputs[2],
+ dim0),
+ None, None)
+
+
+@ops.RegisterGradient("SegmentMin")
+def _SegmentMinGrad(op, grad):
+ """Gradient for SegmentMin."""
+ zeros = array_ops.zeros(array_ops.shape(op.inputs[0]),
+ dtype=op.inputs[0].dtype)
+ gathered_grads = array_ops.gather(grad, op.inputs[1])
+ gathered_outputs = array_ops.gather(op.outputs[0], op.inputs[1])
+ return math_ops.select(math_ops.greater(op.inputs[0], gathered_outputs),
+ zeros,
+ gathered_grads), None
+
+
+@ops.RegisterGradient("SegmentMax")
+def _SegmentMaxGrad(op, grad):
+ """Gradient for SegmentMax."""
+ zeros = array_ops.zeros(array_ops.shape(op.inputs[0]),
+ dtype=op.inputs[0].dtype)
+ gathered_grads = array_ops.gather(grad, op.inputs[1])
+ gathered_outputs = array_ops.gather(op.outputs[0], op.inputs[1])
+ return math_ops.select(math_ops.less(op.inputs[0], gathered_outputs),
+ zeros,
+ gathered_grads), None
+
+
+@ops.RegisterGradient("UnsortedSegmentSum")
+def _UnsortedSegmentSumGrad(op, grad):
+ """Gradient for SegmentSum."""
+ return array_ops.gather(grad, op.inputs[1]), None, None
+
+
+@ops.RegisterGradient("Abs")
+def _AbsGrad(op, grad):
+ x = op.inputs[0]
+ return grad * math_ops.sign(x)
+
+
+@ops.RegisterGradient("Neg")
+def _NegGrad(_, grad):
+ """Returns -grad."""
+ return - grad
+
+
+@ops.RegisterGradient("Inv")
+def _InvGrad(op, grad):
+ """Returns -grad * (1 / x^2)."""
+ y = op.outputs[0] # y = 1 / x
+ return grad * (- math_ops.square(y))
+
+
+@ops.RegisterGradient("Square")
+def _SquareGrad(op, grad):
+ x = op.inputs[0]
+ return grad * (2.0 * x)
+
+
+@ops.RegisterGradient("Sqrt")
+def _SqrtGrad(op, grad):
+ y = op.outputs[0] # y = x^(1/2)
+ return grad * (.5 * math_ops.inv(y))
+
+
+@ops.RegisterGradient("Rsqrt")
+def _RsqrtGrad(op, grad):
+ x = op.inputs[0]
+ y = op.outputs[0] # y = x^(-1/2)
+ return grad * ((-0.5) * math_ops.inv(x) * y)
+
+
+@ops.RegisterGradient("Exp")
+def _ExpGrad(op, grad):
+ """Returns grad * exp(x)."""
+ y = op.outputs[0] # y = e^x
+ return grad * y
+
+
+@ops.RegisterGradient("Log")
+def _LogGrad(op, grad):
+ """Returns grad * (1/x)."""
+ x = op.inputs[0]
+ return grad * math_ops.inv(x)
+
+
+@ops.RegisterGradient("Tanh")
+def _TanhGrad(op, grad):
+ """Returns grad * (1 - tanh(x) * tanh(x))."""
+ y = op.outputs[0] # y = tanh(x)
+ return grad * (1 - math_ops.square(y))
+
+
+@ops.RegisterGradient("Sigmoid")
+def _SigmoidGrad(op, grad):
+ """Returns grad * sigmoid(x) * (1 - sigmoid(x))."""
+ y = op.outputs[0] # y = sigmoid(x)
+ return grad * (y * (1 - y))
+
+
+@ops.RegisterGradient("Sign")
+def _SignGrad(op, _):
+ """Returns 0."""
+ x = op.inputs[0]
+ return array_ops.zeros(array_ops.shape(x), dtype=x.dtype)
+
+
+@ops.RegisterGradient("Sin")
+def _SinGrad(op, grad):
+ """Returns grad * cos(x)."""
+ x = op.inputs[0]
+ return grad * math_ops.cos(x)
+
+
+@ops.RegisterGradient("Cos")
+def _CosGrad(op, grad):
+ """Returns grad * -sin(x)."""
+ x = op.inputs[0]
+ return -grad * math_ops.sin(x)
+
+
+@ops.RegisterGradient("AddN")
+def _AddNGrad(op, grad):
+ """Copies the gradient to all inputs."""
+ # Not broadcasting.
+ return [grad] * len(op.inputs)
+
+
+@ops.RegisterGradient("Add")
+def _AddGrad(op, grad):
+ x = op.inputs[0]
+ y = op.inputs[1]
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
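+  # rx and ry are the axes along which x and y were broadcast. Summing the
+  # incoming gradient over those axes and reshaping recovers gradients of the
+  # original input shapes; e.g. for x of shape [2, 3] and y of shape [3],
+  # ry = [0], so y's gradient is reduce_sum(grad, [0]) reshaped to [3].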
+ return (array_ops.reshape(math_ops.reduce_sum(grad, rx), sx),
+ array_ops.reshape(math_ops.reduce_sum(grad, ry), sy))
+
+
+@ops.RegisterGradient("Sub")
+def _SubGrad(op, grad):
+ x = op.inputs[0]
+ y = op.inputs[1]
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
+ return (array_ops.reshape(math_ops.reduce_sum(grad, rx), sx),
+ array_ops.reshape(-math_ops.reduce_sum(grad, ry), sy))
+
+
+@ops.RegisterGradient("Mul")
+def _MulGrad(op, grad):
+ x = op.inputs[0]
+ y = op.inputs[1]
+ assert x.dtype.base_dtype == y.dtype.base_dtype, (x.dtype, " vs. ", y.dtype)
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
+ if x.dtype.base_dtype == types.complex64:
+ return (array_ops.reshape(math_ops.reduce_sum(grad * math_ops.conj(y), rx), sx),
+ array_ops.reshape(math_ops.reduce_sum(math_ops.conj(x) * grad, ry), sy))
+ else:
+ return (array_ops.reshape(math_ops.reduce_sum(grad * y, rx), sx),
+ array_ops.reshape(math_ops.reduce_sum(x * grad, ry), sy))
+
+
+@ops.RegisterGradient("Div")
+def _DivGrad(op, grad):
+ x = op.inputs[0]
+ y = op.inputs[1]
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
+ return (array_ops.reshape(math_ops.reduce_sum(grad / y, rx), sx),
+ array_ops.reshape(math_ops.reduce_sum(grad *
+ (-x / math_ops.square(y)), ry), sy))
+
+
+@ops.RegisterGradient("Pow")
+def _PowGrad(op, grad):
+ """Returns grad * (y*x^(y-1), z*log(x))."""
+ x = op.inputs[0]
+ y = op.inputs[1]
+ z = op.outputs[0]
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
+ gx = array_ops.reshape(math_ops.reduce_sum(grad * y * math_ops.pow(x, y - 1), rx),
+ sx)
+ gy = array_ops.reshape(math_ops.reduce_sum(grad * z * math_ops.log(x), ry), sy)
+ return gx, gy
+
+
+def _MaximumMinimumGrad(op, grad, selector_op):
+ """Factor out the code for the gradient of Maximum or Minimum."""
+ x = op.inputs[0]
+ y = op.inputs[1]
+ gdtype = grad.dtype
+ sx = array_ops.shape(x)
+ sy = array_ops.shape(y)
+ gradshape = array_ops.shape(grad)
+ zeros = array_ops.zeros(gradshape, gdtype)
+ xmask = selector_op(x, y)
+ rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
+ xgrad = math_ops.select(xmask, grad, zeros)
+ ygrad = math_ops.select(math_ops.logical_not(xmask), grad, zeros)
+ gx = array_ops.reshape(math_ops.reduce_sum(xgrad, rx), sx)
+ gy = array_ops.reshape(math_ops.reduce_sum(ygrad, ry), sy)
+ return (gx, gy)
+
+
+@ops.RegisterGradient("Maximum")
+def _MaximumGrad(op, grad):
+ """Returns grad*(x > y, x <= y) with type of grad."""
+ return _MaximumMinimumGrad(op, grad, math_ops.greater_equal)
+
+
+@ops.RegisterGradient("Minimum")
+def _MinimumGrad(op, grad):
+ """Returns grad*(x < y, x >= y) with type of grad."""
+ return _MaximumMinimumGrad(op, grad, math_ops.less_equal)
+
+
+# Logical operations have no gradients.
+ops.NoGradient("Less")
+ops.NoGradient("LessEqual")
+ops.NoGradient("Greater")
+ops.NoGradient("GreaterEqual")
+ops.NoGradient("Equal")
+ops.NoGradient("NotEqual")
+ops.NoGradient("LogicalAnd")
+ops.NoGradient("LogicalOr")
+ops.NoGradient("LogicalNot")
+
+
+@ops.RegisterGradient("Select")
+def _SelectGrad(op, grad):
+ c = op.inputs[0]
+ x = op.inputs[1]
+ zeros = array_ops.zeros(array_ops.shape(c), dtype=x.dtype)
+ return (None, math_ops.select(c, grad, zeros),
+ math_ops.select(c, zeros, grad))
+
+
+@ops.RegisterGradient("MatMul")
+def _MatMulGrad(op, grad):
+ t_a = op.get_attr("transpose_a")
+ t_b = op.get_attr("transpose_b")
+ if not t_a and not t_b:
+ return (math_ops.matmul(grad, op.inputs[1], transpose_b=True),
+ math_ops.matmul(op.inputs[0], grad, transpose_a=True))
+ elif not t_a and t_b:
+ return (math_ops.matmul(grad, op.inputs[1]),
+ math_ops.matmul(grad, op.inputs[0], transpose_a=True))
+ elif t_a and not t_b:
+ return (math_ops.matmul(op.inputs[1], grad, transpose_b=True),
+ math_ops.matmul(op.inputs[0], grad))
+ elif t_a and t_b:
+ return (math_ops.matmul(op.inputs[1], grad, transpose_a=True,
+ transpose_b=True),
+ math_ops.matmul(grad, op.inputs[0], transpose_a=True,
+ transpose_b=True))
+
+
+@ops.RegisterGradient("SparseMatMul")
+def _SparseMatMulGrad(op, grad):
+ """Gradient for SparseMatMul."""
+
+ t_a = op.get_attr("transpose_a")
+ t_b = op.get_attr("transpose_b")
+ is_sparse = {
+ op.inputs[0]: op.get_attr("a_is_sparse"),
+ op.inputs[1]: op.get_attr("b_is_sparse"),
+ # Use heuristic to figure out if grad might be sparse
+ grad: (grad.op.type == "ReluGrad")
+ }
+ def _SparseMatMul(t1, t2, transpose_a=False, transpose_b=False):
+ """Helper function to create SparseMatMul op."""
+
+ assert t1 in is_sparse and t2 in is_sparse
+ t1_sparse = is_sparse[t1]
+ t2_sparse = is_sparse[t2]
+ if not t1_sparse and not t2_sparse:
+ return math_ops.matmul(t1, t2,
+ transpose_a=transpose_a,
+ transpose_b=transpose_b)
+ transpose_out = False
+ if not t1_sparse:
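+      # Ensure the first operand passed to matmul is the sparse one by
+      # rewriting t1 * t2 as (t2' * t1')': swap the operands, swap and negate
+      # the transpose flags, and transpose the result at the end.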
+ transpose_out = True
+ t1, t2 = t2, t1
+ t1_sparse, t2_sparse = t2_sparse, t1_sparse
+ assert t1_sparse
+ transpose_a, transpose_b = not transpose_b, not transpose_a
+
+ if transpose_b:
+ t2 = array_ops.transpose(t2)
+ transpose_b = False
+ m = math_ops.matmul(t1, t2,
+ transpose_a=transpose_a,
+ transpose_b=transpose_b,
+ a_is_sparse=t1_sparse,
+ b_is_sparse=t2_sparse)
+ if transpose_out:
+ m = array_ops.transpose(m)
+ return m
+
+ if not t_a and not t_b:
+ return (_SparseMatMul(grad, op.inputs[1], transpose_b=True),
+ _SparseMatMul(op.inputs[0], grad, transpose_a=True))
+ elif not t_a and t_b:
+ return (_SparseMatMul(grad, op.inputs[1]),
+ _SparseMatMul(grad, op.inputs[0], transpose_a=True))
+ elif t_a and not t_b:
+ return (_SparseMatMul(op.inputs[1], grad, transpose_b=True),
+ _SparseMatMul(op.inputs[0], grad))
+ elif t_a and t_b:
+ return (_SparseMatMul(op.inputs[1], grad,
+ transpose_a=True, transpose_b=True),
+ _SparseMatMul(grad, op.inputs[0],
+ transpose_a=True, transpose_b=True))
+
+
+@ops.RegisterGradient("Floor")
+def _FloorGrad(_, grad):
+ return grad
+
+
+@ops.RegisterGradient("BatchMatMul")
+def _BatchMatMul(op, grad):
+ """Returns the gradient of x and y given the gradient of x * y."""
+ x = op.inputs[0]
+ y = op.inputs[1]
+ adj_x = op.get_attr("adj_x")
+ adj_y = op.get_attr("adj_y")
+
+ if not adj_x:
+ if not adj_y:
+ grad_x = math_ops.batch_matmul(grad, y, False, True)
+ grad_y = math_ops.batch_matmul(x, grad, True, False)
+ else:
+ grad_x = math_ops.batch_matmul(grad, y, False, False)
+ grad_y = math_ops.batch_matmul(grad, x, True, False)
+ else:
+ if not adj_y:
+ grad_x = math_ops.batch_matmul(y, grad, False, True)
+ grad_y = math_ops.batch_matmul(x, grad, False, False)
+ else:
+ grad_x = math_ops.batch_matmul(y, grad, True, True)
+ grad_y = math_ops.batch_matmul(grad, x, True, True)
+
+ return grad_x, grad_y
+
+
+ops.NoGradient("Range")
+ops.NoGradient("LinSpace")
+
+
+@ops.RegisterGradient("Complex")
+def _ComplexGrad(_, grad):
+ """Returns the real and imaginary components of 'grad', respectively."""
+ return math_ops.real(grad), math_ops.imag(grad)
+
+
+@ops.RegisterGradient("Real")
+def _RealGrad(_, grad):
+ """Returns 'grad' as the real part and set the imaginary part 0."""
+ zero = constant_op.constant(0, dtype=grad.dtype)
+ return math_ops.complex(grad, zero)
+
+
+@ops.RegisterGradient("Imag")
+def _ImagGrad(_, grad):
+ """Returns 'grad' as the imaginary part and set the real part 0."""
+ zero = constant_op.constant(0, dtype=grad.dtype)
+ return math_ops.complex(zero, grad)
+
+
+@ops.RegisterGradient("Conj")
+def _ConjGrad(_, grad):
+ """Returns the complex conjugate of grad."""
+ return math_ops.conj(grad)
+
+
+@ops.RegisterGradient("Cast")
+def _CastGrad(op, grad):
+ t = [types.float32, types.float64, types.bfloat16]
+ src_type = op.inputs[0].dtype.base_dtype
+ dst_type = grad.dtype.base_dtype
+ if src_type in t and dst_type in t:
+ return math_ops.cast(grad, src_type)
+ else:
+ return None
diff --git a/tensorflow/python/ops/math_ops.py b/tensorflow/python/ops/math_ops.py
new file mode 100644
index 0000000000..d96320e96e
--- /dev/null
+++ b/tensorflow/python/ops/math_ops.py
@@ -0,0 +1,1201 @@
+"""## Arithmetic Operators
+
+TensorFlow provides several operations that you can use to add basic arithmetic
+operators to your graph.
+
+@@add
+@@sub
+@@mul
+@@div
+@@mod
+
+## Basic Math Functions
+
+TensorFlow provides several operations that you can use to add basic
+mathematical functions to your graph.
+
+@@add_n
+@@abs
+@@neg
+@@sign
+@@inv
+@@square
+@@round
+@@sqrt
+@@rsqrt
+@@pow
+@@exp
+@@log
+@@ceil
+@@floor
+@@maximum
+@@minimum
+@@cos
+@@sin
+
+## Matrix Math Functions
+
+TensorFlow provides several operations that you can use to add basic
+mathematical functions for matrices to your graph.
+
+@@diag
+@@transpose
+
+@@matmul
+@@batch_matmul
+
+@@matrix_determinant
+@@batch_matrix_determinant
+
+@@matrix_inverse
+@@batch_matrix_inverse
+
+@@cholesky
+@@batch_cholesky
+
+## Complex Number Functions
+
+TensorFlow provides several operations that you can use to add complex number
+functions to your graph.
+
+@@complex
+@@complex_abs
+@@conj
+@@imag
+@@real
+
+## Reduction
+
+TensorFlow provides several operations that you can use to perform
+common math computations that reduce various dimensions of a tensor.
+
+@@reduce_sum
+@@reduce_prod
+@@reduce_min
+@@reduce_max
+@@reduce_mean
+@@reduce_all
+@@reduce_any
+
+@@accumulate_n
+
+## Segmentation
+
+TensorFlow provides several operations that you can use to perform common
+math computations on tensor segments.
+Here a segmentation is a partitioning of a tensor along
+the first dimension, i.e. it defines a mapping from the first dimension onto
+`segment_ids`. The `segment_ids` tensor should be the size of
+the first dimension, `d0`, with consecutive IDs in the range `0` to `k`,
+where `k<d0`.
+In particular, a segmentation of a matrix tensor is a mapping of rows to
+segments.
+
+For example:
+
+```python
+c = tf.constant([[1,2,3,4], [-1,-2,-3,-4], [5,6,7,8]])
+tf.segment_sum(c, tf.constant([0, 0, 1]))
+ ==> [[0 0 0 0]
+ [5 6 7 8]]
+```
+
+@@segment_sum
+@@segment_prod
+@@segment_min
+@@segment_max
+@@segment_mean
+
+@@unsorted_segment_sum
+
+@@sparse_segment_sum
+@@sparse_segment_mean
+
+
+## Sequence Comparison and Indexing
+
+TensorFlow provides several operations that you can use to add sequence
+comparison and index extraction to your graph. You can use these operations to
+determine sequence differences and determine the indexes of specific values in
+a tensor.
+
+@@argmin
+@@argmax
+
+@@listdiff
+@@where
+@@unique
+
+@@edit_distance
+
+@@invert_permutation
+"""
+import itertools
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_math_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import gen_state_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.gen_math_ops import *
+
+
+# Aliases for some automatically-generated names.
+argmax = gen_math_ops.arg_max
+argmin = gen_math_ops.arg_min
+linspace = gen_math_ops.lin_space
+
+
+# pylint: disable=anomalous-backslash-in-string,protected-access
+def abs(x, name=None):
+ """Computes the absolute value of a tensor.
+
+ Given a tensor of real numbers `x`, this operation returns a tensor
+ containing the absolute value of each element in `x`. For example, if x is
+ an input element and y is an output element, this operation computes
+ \\\\(y = |x|\\\\).
+
+ See [`tf.complex_abs()`](#tf_complex_abs) to compute the absolute value of a complex
+ number.
+
+ Args:
+ x: A `Tensor` of type `float`, `double`, `int32`, or `int64`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` the same size and type as `x` with absolute values.
+ """
+ with ops.op_scope([x], name, "Abs") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ if x.dtype == types.complex64:
+ return gen_math_ops.complex_abs(x, name=name)
+ return gen_math_ops._abs(x, name=name)
+
+
+
+def pow(x, y, name=None):
+ """Computes the power of one value to another.
+
+ Given a tensor `x` and a tensor `y`, this operation computes \\\\(x^y\\\\) for
+ corresponding elements in `x` and `y`. For example:
+
+ ```
+  # tensor 'x' is [[2, 2], [3, 3]]
+ # tensor 'y' is [[8, 16], [2, 3]]
+ tf.pow(x, y) ==> [[256, 65536], [9, 27]]
+ ```
+
+ Args:
+ x: A `Tensor` of type `float`, `double`, `int32`, `complex64`, or `int64`.
+ y: A `Tensor` of type `float`, `double`, `int32`, `complex64`, or `int64`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor`.
+ """
+ with ops.op_scope([x], name, "Pow") as name:
+ return gen_math_ops._pow(x, y, name=name)
+
+
+def complex(real, imag, name=None):
+ """Converts two real numbers to a complex number.
+
+ Given a tensor `real` representing the real part of a complex number, and a
+ tensor `imag` representing the imaginary part of a complex number, this
+ operation computes complex numbers elementwise of the form \\\\(a + bj\\\\),
+ where *a* represents the `real` part and *b* represents the `imag` part.
+
+ The input tensors `real` and `imag` must be the same shape.
+
+ For example:
+
+ ```
+ # tensor 'real' is [2.25, 3.25]
+ # tensor `imag` is [4.75, 5.75]
+  tf.complex(real, imag) ==> [2.25 + 4.75j, 3.25 + 5.75j]
+ ```
+
+ Args:
+ real: A `Tensor` of type `float`.
+ imag: A `Tensor` of type `float`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` of type `complex64`.
+ """
+ with ops.op_scope([real, imag], name, "Complex") as name:
+ return gen_math_ops._complex(real, imag, name=name)
+
+
+def round(x, name=None):
+ """Rounds the values of a tensor to the nearest integer, element-wise.
+
+ For example:
+
+ ```python
+ # 'a' is [0.9, 2.5, 2.3, -4.4]
+ tf.round(a) ==> [ 1.0, 3.0, 2.0, -4.0 ]
+ ```
+
+ Args:
+ x: A `Tensor` of type `float` or `double`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` of same shape and type as `x`.
+ """
+ x = ops.convert_to_tensor(x, name="x")
+ if x.dtype.is_integer:
+ return x
+ else:
+ return floor(x + 0.5, name=name)
+
+
+def cast(x, dtype, name=None):
+ """Casts a tensor to a new type.
+
+ The operation casts `x` (in case of `Tensor`) or `x.values`
+ (in case of `SparseTensor`) to `dtype`.
+
+ For example:
+
+ ```python
+  # tensor `a` is [1.8, 2.2], dtype=tf.float32
+ tf.cast(a, tf.int32) ==> [1, 2] # dtype=tf.int32
+ ```
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ dtype: The destination type.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `dtype`.
+ """
+ with ops.op_scope([x], name, "Cast") as name:
+ if isinstance(x, ops.SparseTensor):
+ values_cast = cast(x.values, dtype, name=name)
+ return ops.SparseTensor(x.indices, values_cast, x.shape)
+ else:
+ # TODO(mdevin): Handle what Josh said.
+ #
+ # Could return ops.convert_to_tensor(x, dtype=dtype, ...) here, but that
+ # allows some conversions that cast() can't do, e.g. casting numbers to
+ # strings.
+ x = ops.convert_to_tensor(x, name="x")
+ if x.dtype.base_dtype == dtype:
+ return x
+ return gen_math_ops.cast(x, dtype, name=name)
+
+
+def to_float(x, name="ToFloat"):
+ """Casts a tensor to type `float32`.
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `float32`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `float32`.
+ """
+ return cast(x, types.float32, name=name)
+
+
+def to_double(x, name="ToDouble"):
+ """Casts a tensor to type `float64`.
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `float64`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `float64`.
+ """
+ return cast(x, types.float64, name=name)
+
+
+def to_int32(x, name="ToInt32"):
+ """Casts a tensor to type `int32`.
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `int32`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `int32`.
+ """
+ return cast(x, types.int32, name=name)
+
+
+def to_int64(x, name="ToInt64"):
+ """Casts a tensor to type `int64`.
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `int64`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `int64`.
+ """
+ return cast(x, types.int64, name=name)
+
+
+def to_bfloat16(x, name="ToBFloat16"):
+ """Casts a tensor to type `bfloat16`.
+
+ Args:
+ x: A `Tensor` or `SparseTensor`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` or `SparseTensor` with same shape as `x` with type `bfloat16`.
+
+ Raises:
+ TypeError: If `x` cannot be cast to the `bfloat16`.
+ """
+ return cast(x, types.bfloat16, name=name)
+
+
+ops.Tensor._override_operator("__neg__", neg)
+ops.Tensor._override_operator("__abs__", abs)
+# __invert__ corresponds to the ~ operator. Here we follow the numpy convention
+# ~ marks an elementwise bit-wise inverse. This is only implemented for boolean
+# tensors and will throw a TypeError if used on non-boolean arrays.
+ops.Tensor._override_operator("__invert__", logical_not)
+
+
+def _OverrideBinaryOperatorHelper(func, op_name):
+ """Register operators with different tensor and scalar versions.
+
+ Args:
+ func: the operator
+ op_name: name of the operator being overridden
+ """
+
+ def binary_op_wrapper(x, y):
+ with ops.op_scope([x, y], None, op_name) as name:
+ assert isinstance(x, ops.Tensor)
+ y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
+ return func(x, y, name=name)
+
+ ops.Tensor._override_operator("__%s__" % op_name, binary_op_wrapper)
+ del binary_op_wrapper
+
+ def r_binary_op_wrapper(y, x):
+ with ops.op_scope([x, y], None, op_name) as name:
+ assert isinstance(y, ops.Tensor)
+ x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
+ return func(x, y, name=name)
+
+ ops.Tensor._override_operator("__r%s__" % op_name, r_binary_op_wrapper)
+ del r_binary_op_wrapper
+
+
+_OverrideBinaryOperatorHelper(add, "add")
+_OverrideBinaryOperatorHelper(sub, "sub")
+_OverrideBinaryOperatorHelper(mul, "mul")
+_OverrideBinaryOperatorHelper(div, "div")
+_OverrideBinaryOperatorHelper(mod, "mod")
+
+
+def logical_xor(x, y, name="LogicalXor"):
+ """x ^ y = (x | y) & ~(x & y)."""
+ # TODO(alemi) Make this a cwise op if people end up relying on it.
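+  # e.g. logical_xor([True, True, False, False],
+  #                  [True, False, True, False]) ==> [False, True, True, False]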
+ return logical_and(logical_or(x, y), logical_not(logical_and(x, y)),
+ name=name)
+
+_OverrideBinaryOperatorHelper(logical_and, "and")
+_OverrideBinaryOperatorHelper(logical_or, "or")
+_OverrideBinaryOperatorHelper(logical_xor, "xor")
+
+ops.Tensor._override_operator("__lt__", less)
+ops.Tensor._override_operator("__le__", less_equal)
+ops.Tensor._override_operator("__gt__", greater)
+ops.Tensor._override_operator("__ge__", greater_equal)
+
+
+def range(start, limit, delta=1, name="range"):
+ """Creates a sequence of integers.
+
+ This operation creates a sequence of integers that begins at `start` and
+ extends by increments of `delta` up to but not including `limit`.
+
+ For example:
+
+ ```
+ # 'start' is 3
+ # 'limit' is 18
+ # 'delta' is 3
+ tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
+ ```
+
+ Args:
+ start: A 0-D (scalar) of type `int32`. First entry in sequence.
+ limit: A 0-D (scalar) of type `int32`. Upper limit of sequence,
+ exclusive.
+ delta: A 0-D `Tensor` (scalar) of type `int32`. Optional. Default is 1.
+ Number that increments `start`.
+ name: A name for the operation (optional).
+
+ Returns:
+    A 1-D `int32` `Tensor`.
+ """
+ return gen_math_ops._range(start, limit, delta, name=name)
+
+
+@ops.RegisterShape("Range")
+def _RangeShape(op):
+ start_value = tensor_util.ConstantValue(op.inputs[0])
+ limit_value = tensor_util.ConstantValue(op.inputs[1])
+ delta_value = tensor_util.ConstantValue(op.inputs[2])
+ if start_value is None or limit_value is None or delta_value is None:
+ return [tensor_shape.vector(None)]
+ else:
+ return [tensor_shape.vector(
+ (limit_value - start_value + delta_value - 1) / delta_value)]
+
+
+# Reduction operations
+def _ReductionDims(x, reduction_indices):
+ """Returns range(0, rank(x)) if reduction_indices is None."""
+ if reduction_indices is not None:
+ return reduction_indices
+ else:
+ return range(0, array_ops.rank(x))
+
+
+def reduce_sum(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the sum of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ For example:
+
+ ```python
+  # 'x' is [[1, 1, 1],
+  #         [1, 1, 1]]
+ tf.reduce_sum(x) ==> 6
+ tf.reduce_sum(x, 0) ==> [2, 2, 2]
+ tf.reduce_sum(x, 1) ==> [3, 3]
+ tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
+ tf.reduce_sum(x, [0, 1]) ==> 6
+ ```
+
+ Args:
+ input_tensor: The tensor to reduce. Should have numeric type.
+    reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._sum(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_mean(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the mean of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ For example:
+
+ ```python
+  # 'x' is [[1., 1.],
+  #         [2., 2.]]
+ tf.reduce_mean(x) ==> 1.5
+ tf.reduce_mean(x, 0) ==> [1.5, 1.5]
+ tf.reduce_mean(x, 1) ==> [1., 2.]
+ ```
+
+ Args:
+ input_tensor: The tensor to reduce. Should have numeric type.
+    reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._mean(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_prod(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the product of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ Args:
+ input_tensor: The tensor to reduce. Should have numeric type.
+    reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._prod(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_min(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the minimum of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ Args:
+ input_tensor: The tensor to reduce. Should have numeric type.
+    reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._min(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_max(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the maximum of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
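+ For example (an illustrative sketch with a hypothetical `x`):
+
+ ```python
+ # 'x' is [[1., 2.]
+ # [3., 4.]]
+ tf.reduce_max(x) ==> 4.
+ tf.reduce_max(x, 0) ==> [3., 4.]
+ tf.reduce_max(x, 1) ==> [2., 4.]
+ ```
+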
+ Args:
+ input_tensor: The tensor to reduce. Should have numeric type.
+ reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._max(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_all(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the "logical and" of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ For example:
+
+ ```python
+ # 'x' is [[True, True]
+ # [False, False]]
+ tf.reduce_all(x) ==> False
+ tf.reduce_all(x, 0) ==> [False, False]
+ tf.reduce_all(x, 1) ==> [True, False]
+ ```
+
+ Args:
+ input_tensor: The boolean tensor to reduce.
+ reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._all(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def reduce_any(input_tensor, reduction_indices=None, keep_dims=False,
+ name=None):
+ """Computes the "logical or" of elements across dimensions of a tensor.
+
+ Reduces `input_tensor` along the dimensions given in `reduction_indices`.
+ Unless `keep_dims` is true, the rank of the tensor is reduced by 1 for each
+ entry in `reduction_indices`. If `keep_dims` is true, the reduced dimensions
+ are retained with length 1.
+
+ If `reduction_indices` has no entries, all dimensions are reduced, and a
+ tensor with a single element is returned.
+
+ For example:
+
+ ```python
+ # 'x' is [[True, True]
+ # [False, False]]
+ tf.reduce_any(x) ==> True
+ tf.reduce_any(x, 0) ==> [True, True]
+ tf.reduce_any(x, 1) ==> [True, False]
+ ```
+
+ Args:
+ input_tensor: The boolean tensor to reduce.
+ reduction_indices: The dimensions to reduce. If `None` (the default),
+ reduces all dimensions.
+ keep_dims: If true, retains reduced dimensions with length 1.
+ name: A name for the operation (optional).
+
+ Returns:
+ The reduced tensor.
+ """
+ return gen_math_ops._any(input_tensor, _ReductionDims(input_tensor,
+ reduction_indices),
+ keep_dims, name=name)
+
+
+def matmul(a, b,
+ transpose_a=False, transpose_b=False,
+ a_is_sparse=False, b_is_sparse=False,
+ name=None):
+ """Multiplies matrix `a` by matrix `b`, producing `a` * `b`.
+
+ The inputs must be two-dimensional matrices, with matching inner dimensions,
+ possibly after transposition.
+
+ Both matrices must be of the same type. The supported types are:
+ `float`, `double`, `int32`, `complex64`.
+
+ Either matrix can be transposed on the fly by setting the corresponding flag
+ to `True`. This is `False` by default.
+
+ If one or both of the matrices contain a lot of zeros, a more efficient
+ multiplication algorithm can be used by setting the corresponding
+ `a_is_sparse` or `b_is_sparse` flag to `True`. These are `False` by default.
+
+ For example:
+
+ ```python
+ # 2-D tensor `a`
+ a = tf.constant([1, 2, 3, 4, 5, 6], shape=[2, 3]) => [[1 2 3]
+ [4 5 6]]
+ # 2-D tensor `b`
+ b = tf.constant([7, 8, 9, 10, 11, 12], shape=[3, 2]) => [[7 8]
+ [9 10]
+ [11 12]]
+ c = tf.matmul(a, b) => [[58 64]
+ [139 154]]
+ ```
+
+ Args:
+ a: `Tensor` of type `float`, `double`, `int32` or `complex64`.
+ b: `Tensor` with same type as `a`.
+ transpose_a: If `True`, `a` is transposed before multiplication.
+ transpose_b: If `True`, `b` is transposed before multiplication.
+ a_is_sparse: If `True`, `a` is treated as a sparse matrix.
+ b_is_sparse: If `True`, `b` is treated as a sparse matrix.
+ name: Name for the operation (optional).
+
+ Returns:
+ A `Tensor` of the same type as `a`.
+ """
+ with ops.op_scope([a, b], name, "MatMul") as name:
+ a = ops.convert_to_tensor(a, name="a")
+ b = ops.convert_to_tensor(b, name="b")
+ if a.dtype == types.float32 and (a_is_sparse or b_is_sparse):
+ return sparse_matmul(a, b,
+ transpose_a=transpose_a,
+ transpose_b=transpose_b,
+ a_is_sparse=a_is_sparse,
+ b_is_sparse=b_is_sparse,
+ name=name)
+ else:
+ return gen_math_ops._mat_mul(a, b,
+ transpose_a=transpose_a,
+ transpose_b=transpose_b,
+ name=name)
+
+sparse_matmul = gen_math_ops._sparse_mat_mul
+batch_matmul = gen_math_ops._batch_mat_mul
+
+ops.RegisterShape("MatMul")(common_shapes.matmul_shape)
+ops.RegisterShape("SparseMatMul")(common_shapes.matmul_shape)
+
+
+def _as_indexed_slices(x):
+ """Convert 'x' to IndexedSlices.
+
+ Convert a dense Tensor to a block-sparse IndexedSlices.
+
+ Args:
+ x: Either a Tensor object, or an IndexedSlices object.
+
+ Returns:
+ An IndexedSlices object.
+
+ Raises:
+ TypeError: If 'x' is not a Tensor or an IndexedSlices object.
+ """
+ # TODO(mdevin): op_scope
+ if not isinstance(x, (ops.Tensor, ops.IndexedSlices)):
+ raise TypeError("Not a Tensor or IndexedSlices: %s" % type(x))
+ if isinstance(x, ops.IndexedSlices):
+ return x
+ x_shape = array_ops.shape(x)
+ return ops.IndexedSlices(x, range(0, x_shape[0]), x_shape)
+
+
+def _as_indexed_slices_list(inputs):
+ """Convert all elements of 'inputs' to IndexedSlices.
+
+ Additionally, homogenize the types of all the indices to
+ either int32 or int64.
+
+ Args:
+ inputs: List containing either Tensor or IndexedSlices objects.
+
+ Returns:
+ A list of IndexedSlices objects.
+
+ Raises:
+ TypeError: If 'inputs' is not a list or a tuple.
+ """
+ if not isinstance(inputs, (list, tuple)):
+ raise TypeError("Expected a list or tuple, not a %s" % type(inputs))
+ outputs = [_as_indexed_slices(i) for i in inputs]
+ with_int32_index = [o.indices for o in outputs
+ if o.indices.dtype == types.int32]
+ if not with_int32_index or len(with_int32_index) == len(outputs):
+ return outputs
+ casted_outputs = []
+ for o in outputs:
+ if o.indices.dtype == types.int32:
+ casted_outputs.append(
+ ops.IndexedSlices(o.values, cast(o.indices, types.int64),
+ o.dense_shape))
+ else:
+ casted_outputs.append(o)
+ return casted_outputs
+
+
+def accumulate_n(inputs, shape=None, tensor_dtype=None, name=None):
+ """Returns the element-wise sum of a list of tensors.
+
+ Optionally, pass `shape` and `tensor_dtype` for shape and type checking,
+ otherwise, these are inferred.
+
+ For example:
+
+ ```python
+ # tensor 'a' is [[1, 2], [3, 4]]
+ # tensor `b` is [[5, 0], [0, 6]]
+ tf.accumulate_n([a, b, a]) ==> [[7, 4], [6, 14]]
+
+ # Explicitly pass shape and type
+ tf.accumulate_n([a, b, a], shape=[2, 2], tensor_dtype=tf.int32)
+ ==> [[7, 4], [6, 14]]
+ ```
+
+ Args:
+ inputs: A list of `Tensor` objects, each with same shape and type.
+ shape: Shape of elements of `inputs`.
+ tensor_dtype: The type of `inputs`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` of same shape and type as the elements of `inputs`.
+
+ Raises:
+ ValueError: If `inputs` don't all have same shape and dtype or the shape
+ cannot be inferred.
+ """
+ if tensor_dtype is None:
+ if not inputs or not isinstance(inputs, (list, tuple)):
+ raise ValueError("inputs must be a list of at least one Tensor with the "
+ "same dtype and shape")
+ inputs = ops.convert_n_to_tensor_or_indexed_slices(inputs)
+ if not all(isinstance(x, ops.Tensor) for x in inputs):
+ raise ValueError("inputs must be a list of at least one Tensor with the "
+ "same dtype and shape")
+ if not all(x.dtype == inputs[0].dtype for x in inputs):
+ raise ValueError("inputs must be a list of at least one Tensor with the "
+ "same dtype and shape")
+ tensor_dtype = inputs[0].dtype
+ if shape is not None:
+ shape = tensor_shape.as_shape(shape)
+ else:
+ shape = tensor_shape.unknown_shape()
+ for input_tensor in inputs:
+ if isinstance(input_tensor, ops.Tensor):
+ shape = shape.merge_with(input_tensor.get_shape())
+ if not shape.is_fully_defined():
+ # TODO(pbar): Make a version of assign_add that accepts an uninitialized
+ # lvalue, and takes its shape from that? This would allow accumulate_n to
+ # work in all situations that add_n currently works.
+ raise ValueError("Cannot infer the shape of the accumulator for "
+ "accumulate_n. Pass the shape argument, or set the shape "
+ "of at least one of the inputs.")
+ with ops.op_scope(inputs, name, "AccumulateN") as name:
+ var = gen_state_ops._temporary_variable(shape=shape, dtype=tensor_dtype)
+ var_name = var.op.name
+ var = state_ops.assign(var, array_ops.zeros_like(inputs[0]))
+ update_ops = []
+ for input_tensor in inputs:
+ op = state_ops.assign_add(var, input_tensor, use_locking=True)
+ update_ops.append(op)
+ with ops.control_dependencies(update_ops):
+ return gen_state_ops._destroy_temporary_variable(var,
+ var_name=var_name,
+ name=name)
+
+
+@ops.RegisterShape("BatchMatMul")
+def _BatchMatMulShape(op):
+ """Shape function for BatchMatMul op."""
+ a_shape = op.inputs[0].get_shape()
+ adj_a = op.get_attr("adj_x")
+ b_shape = op.inputs[1].get_shape()
+ adj_b = op.get_attr("adj_y")
+ if not a_shape.is_fully_defined() or not b_shape.is_fully_defined():
+ return [tensor_shape.unknown_shape()]
+ batch_dims = a_shape[:-2].merge_with(b_shape[:-2])
+ output_rows = a_shape[-1] if adj_a else a_shape[-2]
+ output_cols = b_shape[-2] if adj_b else b_shape[-1]
+ inner_a = a_shape[-2] if adj_a else a_shape[-1]
+ inner_b = b_shape[-1] if adj_b else b_shape[-2]
+ inner_a.assert_is_compatible_with(inner_b)
+ return [batch_dims.concatenate([output_rows, output_cols])]
+
+
+def sigmoid(x, name=None):
+ """Computes sigmoid of `x` element-wise.
+
+ Specifically, `y = 1 / (1 + exp(-x))`.
+
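+ For example (an illustrative sketch; output values are approximate):
+
+ ```python
+ # 'x' is [-1., 0., 1.]
+ tf.sigmoid(x) ==> [0.269, 0.5, 0.731]
+ ```
+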
+ Args:
+ x: A Tensor with type `float`, `double`, `int32`, `complex64`, `int64`,
+ or `qint32`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A Tensor with the same type as `x` if `x.dtype != qint32`
+ otherwise the return type is `quint8`.
+ """
+ with ops.op_scope([x], name, "Sigmoid") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ return gen_math_ops._sigmoid(x, name=name)
+
+
+def tanh(x, name=None):
+ """Computes hyperbolic tangent of `x` element-wise.
+
+ Args:
+ x: A Tensor with type `float`, `double`, `int32`, `complex64`, `int64`,
+ or `qint32`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A Tensor with the same type as `x` if `x.dtype != qint32` otherwise
+ the return type is `quint8`.
+ """
+ with ops.op_scope([x], name, "Tanh") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ return gen_math_ops._tanh(x, name=name)
+
+
+ops.RegisterShape("Abs")(common_shapes.unchanged_shape)
+ops.RegisterShape("Ceil")(common_shapes.unchanged_shape)
+ops.RegisterShape("Conj")(common_shapes.unchanged_shape)
+ops.RegisterShape("Cos")(common_shapes.unchanged_shape)
+ops.RegisterShape("Exp")(common_shapes.unchanged_shape)
+ops.RegisterShape("Floor")(common_shapes.unchanged_shape)
+ops.RegisterShape("Imag")(common_shapes.unchanged_shape)
+ops.RegisterShape("Inv")(common_shapes.unchanged_shape)
+ops.RegisterShape("IsFinite")(common_shapes.unchanged_shape)
+ops.RegisterShape("IsInf")(common_shapes.unchanged_shape)
+ops.RegisterShape("IsNan")(common_shapes.unchanged_shape)
+ops.RegisterShape("Log")(common_shapes.unchanged_shape)
+ops.RegisterShape("LogicalNot")(common_shapes.unchanged_shape)
+ops.RegisterShape("Neg")(common_shapes.unchanged_shape)
+ops.RegisterShape("Real")(common_shapes.unchanged_shape)
+ops.RegisterShape("Rsqrt")(common_shapes.unchanged_shape)
+ops.RegisterShape("Sign")(common_shapes.unchanged_shape)
+ops.RegisterShape("Sin")(common_shapes.unchanged_shape)
+ops.RegisterShape("Sqrt")(common_shapes.unchanged_shape)
+ops.RegisterShape("Square")(common_shapes.unchanged_shape)
+ops.RegisterShape("Sigmoid")(common_shapes.unchanged_shape)
+ops.RegisterShape("Tanh")(common_shapes.unchanged_shape)
+ops.RegisterShape("Cast")(common_shapes.unchanged_shape)
+ops.RegisterShape("ComplexAbs")(common_shapes.unchanged_shape)
+
+
+@ops.RegisterShape("Add")
+@ops.RegisterShape("Complex")
+@ops.RegisterShape("Div")
+@ops.RegisterShape("Equal")
+@ops.RegisterShape("Greater")
+@ops.RegisterShape("GreaterEqual")
+@ops.RegisterShape("Less")
+@ops.RegisterShape("LessEqual")
+@ops.RegisterShape("LogicalAnd")
+@ops.RegisterShape("LogicalOr")
+@ops.RegisterShape("Maximum")
+@ops.RegisterShape("Minimum")
+@ops.RegisterShape("Mod")
+@ops.RegisterShape("Mul")
+@ops.RegisterShape("NotEqual")
+@ops.RegisterShape("Pow")
+@ops.RegisterShape("Sub")
+def _BroadcastShape(op):
+ """Common shape function for binary operators that broadcast their inputs."""
+ shape_x = op.inputs[0].get_shape()
+ shape_y = op.inputs[1].get_shape()
+ if shape_x.ndims is None or shape_y.ndims is None:
+ return [tensor_shape.unknown_shape()]
+
+ # To compute the broadcasted dimensions, we zip together shape_x and shape_y,
+ # and pad with 1 to make them the same length.
+ broadcasted_dims = reversed(list(itertools.izip_longest(
+ reversed(shape_x.dims), reversed(shape_y.dims),
+ fillvalue=tensor_shape.Dimension(1))))
+ # Next we combine the dimensions according to the numpy broadcasting rules.
+ # http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
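+ # For example (illustrative): broadcasting shapes [2, 3, 1] and [3, 5] pads
+ # the shorter shape on the left to [1, 3, 5], then combines dimension-wise
+ # to produce an output shape of [2, 3, 5].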
+ return_dims = []
+ for (dim_x, dim_y) in broadcasted_dims:
+ if dim_x.value is None or dim_y.value is None:
+ # One or both dimensions is unknown. If either dimension is greater than
+ # 1, we assume that the program is correct, and the other dimension will
+ # be broadcast to match it.
+ # TODO(mrry): If we eliminate the shape checks in C++, we must still
+ # assert that the unknown dim is either 1 or the same as the known dim.
+ if dim_x.value is not None and dim_x.value > 1:
+ return_dims.append(dim_x)
+ elif dim_y.value is not None and dim_y.value > 1:
+ return_dims.append(dim_y)
+ else:
+ return_dims.append(None)
+ elif dim_x.value == 1:
+ # We will broadcast dim_x to dim_y.
+ return_dims.append(dim_y)
+ elif dim_y.value == 1:
+ # We will broadcast dim_y to dim_x.
+ return_dims.append(dim_x)
+ elif dim_x.value == dim_y.value:
+ # The dimensions are compatible, so output is the same size in that
+ # dimension.
+ return_dims.append(dim_x.merge_with(dim_y))
+ else:
+ raise ValueError("Incompatible shapes for broadcasting: %s and %s"
+ % (shape_x, shape_y))
+ return [tensor_shape.TensorShape(return_dims)]
+
+
+@ops.RegisterShape("AddN")
+def _AddNShape(op):
+ merged_shape = tensor_shape.unknown_shape()
+ for input_ in op.inputs:
+ merged_shape = merged_shape.merge_with(input_.get_shape())
+ return [merged_shape]
+
+
+@ops.RegisterShape("Select")
+def _SelectShape(op):
+ # All three inputs must have the same shape.
+ return [op.inputs[0].get_shape()
+ .merge_with(op.inputs[1].get_shape())
+ .merge_with(op.inputs[2].get_shape())]
+
+
+@ops.RegisterShape("ArgMax")
+@ops.RegisterShape("ArgMin")
+def _ArgOpShape(op):
+ """Common shape function for arg-reduction ops."""
+ dimension_shape = op.inputs[1].get_shape()
+ dimension_shape.assert_is_compatible_with(tensor_shape.scalar())
+ input_shape = op.inputs[0].get_shape()
+ if input_shape.ndims is None:
+ return [tensor_shape.unknown_shape()]
+ elif input_shape.ndims <= 1:
+ return [tensor_shape.scalar()]
+
+ dimension = tensor_util.ConstantValue(op.inputs[1])
+ if dimension is None:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims - 1)]
+ elif 0 <= dimension and dimension < input_shape.ndims:
+ returned_shape = []
+ for i, dim in enumerate(input_shape.dims):
+ if i != dimension:
+ returned_shape.append(dim)
+ return [tensor_shape.TensorShape(returned_shape)]
+ else:
+ raise ValueError(
+ "dimension (%d) must be in the range [0, %d), where %d is the number "
+ "of dimensions in the input"
+ % (dimension, input_shape.ndims, input_shape.ndims))
+
+
+@ops.RegisterShape("All")
+@ops.RegisterShape("Any")
+@ops.RegisterShape("Max")
+@ops.RegisterShape("Mean")
+@ops.RegisterShape("Min")
+@ops.RegisterShape("Prod")
+@ops.RegisterShape("Sum")
+def _ReductionShape(op):
+ """Common shape function for reduction ops."""
+ input_shape = op.inputs[0].get_shape()
+ reduction_indices = tensor_util.ConstantValue(op.inputs[1])
+ keep_dims = op.get_attr("keep_dims")
+ if reduction_indices is None or input_shape.ndims is None:
+ if keep_dims:
+ return [tensor_shape.unknown_shape(ndims=input_shape.ndims)]
+ else:
+ return [tensor_shape.unknown_shape()]
+
+ # Turn reduction_indices from scalar to vector if necessary
+ reduction_indices = np.ravel(reduction_indices)
+
+ for reduction_index in reduction_indices:
+ if reduction_index < 0 or reduction_index >= input_shape.ndims:
+ raise ValueError("Invalid reduction dimension %d for input with %d "
+ "dimensions" % (reduction_index, input_shape.ndims))
+
+ returned_dims = []
+ if keep_dims:
+ for i, dim in enumerate(input_shape.dims):
+ if i in reduction_indices:
+ returned_dims.append(1)
+ else:
+ returned_dims.append(dim)
+ else:
+ for i, dim in enumerate(input_shape.dims):
+ if i not in reduction_indices:
+ returned_dims.append(dim)
+ return [tensor_shape.TensorShape(returned_dims)]
+
+
+@ops.RegisterShape("SegmentMax")
+@ops.RegisterShape("SegmentMean")
+@ops.RegisterShape("SegmentMin")
+@ops.RegisterShape("SegmentProd")
+@ops.RegisterShape("SegmentSum")
+def _SegmentReductionShape(op):
+ """Common shape function for segment reduction ops."""
+ data_shape = op.inputs[0].get_shape()
+ segment_ids_shape = op.inputs[1].get_shape()
+ segment_ids_shape.assert_has_rank(1)
+ return [tensor_shape.TensorShape([None]).concatenate(data_shape[1:])]
+
+
+@ops.RegisterShape("SparseSegmentMean")
+@ops.RegisterShape("SparseSegmentSum")
+def _SparseSegmentReductionShape(op):
+ """Common shape function for sparse segment reduction ops."""
+ data_shape = op.inputs[0].get_shape()
+ indices_shape = op.inputs[1].get_shape()
+ indices_shape.assert_has_rank(1)
+ segment_ids_shape = op.inputs[2].get_shape()
+ segment_ids_shape.assert_has_rank(1)
+ indices_shape.assert_is_compatible_with(segment_ids_shape)
+ return [tensor_shape.TensorShape([None]).concatenate(data_shape[1:])]
+
+
+@ops.RegisterShape("SparseSegmentMeanGrad")
+def _SparseSegmentMeanGradShape(op):
+ """Shape function for the SparseSegmentMeanGrad op."""
+ input_shape = op.inputs[0].get_shape()
+ indices_shape = op.inputs[1].get_shape().with_rank(1)
+ unused_segment_ids_shape = op.inputs[2].get_shape().merge_with(indices_shape)
+ unused_output_dim0_shape = op.inputs[3].get_shape().merge_with(
+ tensor_shape.scalar())
+ output_dim0 = tensor_util.ConstantValue(op.inputs[3])
+ if output_dim0 is not None:
+ dim0 = output_dim0[0]
+ else:
+ dim0 = None
+ return [tensor_shape.TensorShape([dim0]).concatenate(input_shape[1:])]
+
+
+@ops.RegisterShape("UnsortedSegmentSum")
+def _UnsortedSegmentSumShape(op):
+ """Shape function for UnsortedSegmentSum."""
+ data_shape = op.inputs[0].get_shape()
+ segment_ids_shape = op.inputs[1].get_shape()
+ mid = segment_ids_shape.ndims
+ if mid is None:
+ return [tensor_shape.unknown_shape()]
+ else:
+ num_segments = tensor_util.ConstantValue(op.inputs[2])
+ return [tensor_shape.TensorShape([num_segments]).concatenate(
+ data_shape[mid:])]
+
+
+@ops.RegisterShape("LinSpace")
+def _LinspaceShape(op):
+ num = tensor_util.ConstantValue(op.inputs[2])
+ return [tensor_shape.vector(num)]
diff --git a/tensorflow/python/ops/math_ops_test.py b/tensorflow/python/ops/math_ops_test.py
new file mode 100644
index 0000000000..86ea04f54d
--- /dev/null
+++ b/tensorflow/python/ops/math_ops_test.py
@@ -0,0 +1,68 @@
+"""Tests for tensorflow.ops.math_ops."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import math_ops
+from tensorflow.python.platform import googletest
+
+exp = math.exp
+log = math.log
+
+class ReduceTest(test_util.TensorFlowTestCase):
+
+ def testReduceAllDims(self):
+ x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
+ with self.test_session():
+ y_tf = math_ops.reduce_sum(x).eval()
+ self.assertEqual(y_tf, 21)
+
+class RoundTest(test_util.TensorFlowTestCase):
+
+ def testRounding(self):
+ x = [0.49, 0.7, -0.3, -0.8]
+ for dtype in [np.float32, np.double]:
+ x_np = np.array(x, dtype=dtype)
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu):
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y_tf = math_ops.round(x_tf)
+ y_tf_np = y_tf.eval()
+ y_np = np.round(x_np)
+ self.assertAllClose(y_tf_np, y_np, atol=1e-2)
+
+
+class ModTest(test_util.TensorFlowTestCase):
+
+ def testFloat(self):
+ x = [0.5, 0.7, 0.3]
+ for dtype in [np.float32, np.double]:
+ # Test scalar and vector versions.
+ for denom in [x[0], [x[0]] * 3]:
+ x_np = np.array(x, dtype=dtype)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y_tf = math_ops.mod(x_tf, denom)
+ y_tf_np = y_tf.eval()
+ y_np = np.fmod(x_np, denom)
+ self.assertAllClose(y_tf_np, y_np, atol=1e-2)
+
+ def testFixed(self):
+ x = [5, 10, 23]
+ for dtype in [np.int32, np.int64]:
+ # Test scalar and vector versions.
+ for denom in [x[0], x]:
+ x_np = np.array(x, dtype=dtype)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, shape=x_np.shape)
+ y_tf = math_ops.mod(x_tf, denom)
+ y_tf_np = y_tf.eval()
+ y_np = np.mod(x_np, denom)
+ self.assertAllClose(y_tf_np, y_np)
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/nn.py b/tensorflow/python/ops/nn.py
new file mode 100644
index 0000000000..7a4dc25e8b
--- /dev/null
+++ b/tensorflow/python/ops/nn.py
@@ -0,0 +1,816 @@
+# pylint: disable=wildcard-import,unused-import,g-bad-import-order
+"""## Activation Functions
+
+The activation ops provide different types of nonlinearities for use in
+neural networks. These include smooth nonlinearities (`sigmoid`,
+`tanh`, and `softplus`), continuous but not everywhere differentiable
+functions (`relu`, `relu6`, and `relu_x`), and random regularization
+(`dropout`).
+
+All activation ops apply componentwise, and produce a tensor of the same
+shape as the input tensor.
+
+@@relu
+@@relu6
+@@softplus
+@@dropout
+@@bias_add
+@@sigmoid
+@@tanh
+
+## Convolution
+
+The convolution ops sweep a 2-D filter over a batch of images, applying the
+filter to each window of each image of the appropriate size. The different
+ops trade off between generic vs. specific filters:
+
+* `conv2d`: Arbitrary filters that can mix channels together.
+* `depthwise_conv2d`: Filters that operate on each channel independently.
+* `separable_conv2d`: A depthwise spatial filter followed by a pointwise filter.
+
+Note that although these ops are called "convolution", they are strictly
+speaking "cross-correlation" since the filter is combined with an input window
+without reversing the filter. For details, see [the properties of
+cross-correlation](https://en.wikipedia.org/wiki/Cross-correlation#Properties).
+
+The filter is applied to image patches of the same size as the filter and
+strided according to the `strides` argument. `strides = [1, 1, 1, 1]` applies
+the filter to a patch at every offset, `strides = [1, 2, 2, 1]` applies the
+filter to every other image patch in each dimension, etc.
+
+Ignoring channels for the moment, the spatial semantics of the convolution ops
+are as follows. If the 4-D `input` has shape
+`[batch, in_height, in_width, ...]` and the 4-D `filter` has shape
+`[filter_height, filter_width, ...]`, then
+
+ output.shape = [batch,
+ (in_height - filter_height + 1) / strides[1],
+ (in_width - filter_width + 1) / strides[2],
+ ...]
+
+ output[b, i, j, :] =
+ sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, ...] *
+ filter[di, dj, ...]
+
+Since `input` is 4-D, each `input[b, i, j, :]` is a vector. For `conv2d`, these
+vectors are multiplied by the `filter[di, dj, :, :]` matrices to produce new
+vectors. For `depthwise_conv_2d`, each scalar component `input[b, i, j, k]`
+is multiplied by a vector `filter[di, dj, k]`, and all the vectors are
+concatenated.
+
+In the formula for `output.shape`, the rounding direction depends on padding:
+
+* `padding = 'SAME'`: Round down (only full size windows are considered).
+* `padding = 'VALID'`: Round up (partial windows are included).
+
+@@conv2d
+@@depthwise_conv2d
+@@separable_conv2d
+
+## Pooling
+
+The pooling ops sweep a rectangular window over the input tensor, computing a
+reduction operation for each window (average, max, or max with argmax). Each
+pooling op uses rectangular windows of size `ksize` separated by offset
+`strides`. For example, if `strides` is all ones every window is used, if
+`strides` is all twos every other window is used in each dimension, etc.
+
+In detail, the output is
+
+ output[i] = reduce(value[strides * i:strides * i + ksize])
+
+for each tuple of indices `i`. The output shape is
+
+ output.shape = (value.shape - ksize + 1) / strides
+
+where the rounding direction depends on padding:
+
+* `padding = 'SAME'`: Round down (only full size windows are considered).
+* `padding = 'VALID'`: Round up (partial windows are included).
+
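+ For example (an illustrative sketch with a hypothetical 4-D `value`):
+
+ # 'value' has shape [1, 4, 4, 1] and holds 1., 2., ..., 16. in row-major
+ # order. A 2x2 average pool with stride 2 and 'VALID' padding produces a
+ # [1, 2, 2, 1] tensor containing [[3.5, 5.5], [11.5, 13.5]].
+ pooled = tf.nn.avg_pool(value, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
+ padding='VALID')
+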
+@@avg_pool
+@@max_pool
+@@max_pool_with_argmax
+
+## Normalization
+
+Normalization is useful to prevent neurons from saturating when inputs may
+have varying scale, and to aid generalization.
+
+@@l2_normalize
+@@local_response_normalization
+@@moments
+
+## Losses
+
+The loss ops measure error between two tensors, or between a tensor and zero.
+These can be used for measuring accuracy of a network in a regression task
+or for regularization purposes (weight decay).
+
+@@l2_loss
+
+## Classification
+
+TensorFlow provides several operations that help you perform classification.
+
+@@sigmoid_cross_entropy_with_logits
+@@softmax
+@@softmax_cross_entropy_with_logits
+
+## Embeddings
+
+TensorFlow provides several operations that help you compute embeddings.
+
+@@embedding_lookup
+@@embedding_lookup_sparse
+
+## Evaluation
+
+The evaluation ops are useful for measuring the performance of a network.
+Since they are nondifferentiable, they are typically used at evaluation time.
+
+@@top_k
+@@in_top_k
+
+## Candidate Sampling
+
+Do you want to train a multiclass or multilabel model with thousands
+or millions of output classes (for example, a language model with a
+large vocabulary)? Training with a full Softmax is slow in this case,
+since all of the classes are evaluated for every training example.
+Candidate Sampling training algorithms can speed up your step times by
+only considering a small randomly-chosen subset of contrastive classes
+(called candidates) for each batch of training examples.
+
+See our [Candidate Sampling Algorithms Reference]
+(http://www.tensorflow.org/extras/candidate_sampling.pdf)
+
+### Sampled Loss Functions
+
+TensorFlow provides the following sampled loss functions for faster training.
+
+@@nce_loss
+@@sampled_softmax_loss
+
+### Candidate Samplers
+
+TensorFlow provides the following samplers for randomly sampling candidate
+classes when using one of the sampled loss functions above.
+
+@@uniform_candidate_sampler
+@@log_uniform_candidate_sampler
+@@learned_unigram_candidate_sampler
+@@fixed_unigram_candidate_sampler
+
+### Miscellaneous candidate sampling utilities
+
+@@compute_accidental_hits
+
+"""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import candidate_sampling_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import embedding_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn_grad
+from tensorflow.python.ops import nn_ops
+from tensorflow.python.ops import numerics
+from tensorflow.python.ops import random_ops
+from tensorflow.python.ops import sparse_ops
+from tensorflow.python.ops.math_ops import sigmoid
+from tensorflow.python.ops.math_ops import tanh
+
+# Bring more nn-associated functionality into this package.
+from tensorflow.python.ops.nn_ops import *
+from tensorflow.python.ops.candidate_sampling_ops import *
+from tensorflow.python.ops.embedding_ops import *
+
+
+def sigmoid_cross_entropy_with_logits(logits, targets, name=None):
+ """Computes sigmoid cross entropy given `logits`.
+
+ Measures the probability error in discrete classification tasks in which each
+ class is independent and not mutually exclusive. For instance, one could
+ perform multilabel classification where a picture can contain both an elephant
+ and a dog at the same time.
+
+ For brevity, let `x = logits`, `z = targets`. The logistic loss is
+
+ x - x * z + log(1 + exp(-x))
+
+ To ensure stability and avoid overflow, the implementation uses
+
+ max(x, 0) - x * z + log(1 + exp(-abs(x)))
+
+ `logits` and `targets` must have the same type and shape.
+
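+ For example (an illustrative sketch; output values are approximate):
+
+ # 'logits' is [0., 2.], 'targets' is [1., 0.]
+ tf.nn.sigmoid_cross_entropy_with_logits(logits, targets)
+ ==> [0.693, 2.127]
+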
+ Args:
+ logits: A `Tensor` of type `float32` or `float64`.
+ targets: A `Tensor` of the same type and shape as `logits`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` of the same shape as `logits` with the componentwise
+ logistic losses.
+ """
+ with ops.op_scope([logits, targets], name, "logistic_loss") as name:
+ logits = ops.convert_to_tensor(logits, name="logits")
+ targets = ops.convert_to_tensor(targets, name="targets")
+ # The logistic loss formula from above is
+ # x - x * z + log(1 + exp(-x))
+ # For x < 0, a more numerically stable formula is
+ # -x * z + log(1 + exp(x))
+ # To avoid branching, we use the combined version
+ # max(x, 0) - x * z + log(1 + exp(-abs(x)))
+ return math_ops.add(nn_ops.relu(logits) - logits * targets,
+ math_ops.log(1 + math_ops.exp(-math_ops.abs(logits))),
+ name=name)
+
+
+def xw_plus_b(x, weights, biases, name=None):
+ """Computes matmul(x, weights) + biases.
+
+ Args:
+ x: a 2D tensor. Dimensions typically: batch, in_units
+ weights: a 2D tensor. Dimensions typically: in_units, out_units
+ biases: a 1D tensor. Dimensions: out_units
+ name: A name for the operation (optional). If not specified
+ "wx_plus_b" is used.
+
+ Returns:
+ A 2-D Tensor computing matmul(x, weights) + biases.
+ Dimensions typically: batch, out_units.
+ """
+ with ops.op_scope([x, weights, biases], name, "xw_plus_b") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ weights = ops.convert_to_tensor(weights, name="weights")
+ biases = ops.convert_to_tensor(biases, name="biases")
+ mm = math_ops.matmul(x, weights)
+ return nn_ops.bias_add(mm, biases, name=name)
+
+
+def relu_layer(x, weights, biases, name=None):
+ """Computes Relu(x * weight + biases).
+
+ Args:
+ x: a 2D tensor. Dimensions typically: batch, in_units
+ weights: a 2D tensor. Dimensions typically: in_units, out_units
+ biases: a 1D tensor. Dimensions: out_units
+ name: A name for the operation (optional). If not specified
+ "nn_relu_layer" is used.
+
+ Returns:
+ A 2-D Tensor computing relu(matmul(x, weights) + biases).
+ Dimensions typically: batch, out_units.
+ """
+ with ops.op_scope([x, weights, biases], name, "relu_layer") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ weights = ops.convert_to_tensor(weights, name="weights")
+ biases = ops.convert_to_tensor(biases, name="biases")
+ xw_plus_b = nn_ops.bias_add(math_ops.matmul(x, weights), biases)
+ return nn_ops.relu(xw_plus_b, name=name)
+
+
+def l2_normalize(x, dim, epsilon=1e-12, name=None):
+ """Normalizes along dimension `dim` using an L2 norm.
+
+ For a 1-D tensor with `dim = 0`, computes
+
+ output = x / sqrt(max(sum(x**2), epsilon))
+
+ For `x` with more dimensions, independently normalizes each 1-D slice along
+ dimension `dim`.
+
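+ For example (an illustrative sketch with a hypothetical `x`):
+
+ # 'x' is [3., 4.]
+ tf.nn.l2_normalize(x, 0) ==> [0.6, 0.8]
+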
+ Args:
+ x: A `Tensor`.
+ dim: Dimension along which to normalize.
+ epsilon: A lower bound value for the norm. Will use `sqrt(epsilon)` as the
+ divisor if `norm < sqrt(epsilon)`.
+ name: A name for this operation (optional).
+
+ Returns:
+ A `Tensor` with the same shape as `x`.
+ """
+ with ops.op_scope([x], name, "l2_normalize") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ square_sum = math_ops.reduce_sum(math_ops.square(x), [dim], keep_dims=True)
+ x_inv_norm = math_ops.rsqrt(math_ops.maximum(square_sum, epsilon))
+ return math_ops.mul(x, x_inv_norm, name=name)
+
+
+def zero_fraction(value, name=None):
+ """Returns the fraction of zeros in `value`.
+
+ If `value` is empty, the result is `nan`.
+
+ This is useful in summaries to measure and report sparsity. For example,
+
+ z = tf.nn.relu(...)
+ summ = tf.scalar_summary('sparsity', tf.zero_fraction(z))
+
+ Args:
+ value: A tensor of numeric type.
+ name: A name for the operation (optional).
+
+ Returns:
+ The fraction of zeros in `value`, with type `float32`.
+ """
+ with ops.op_scope([value], name, "zero_fraction"):
+ value = ops.convert_to_tensor(value, name="value")
+ zero = constant_op.constant(0, dtype=value.dtype, name="zero")
+ return math_ops.reduce_mean(math_ops.cast(math_ops.equal(value, zero),
+ types.float32))
+
+
+def dropout(x, keep_prob, noise_shape=None, seed=None, name=None):
+ """Computes dropout.
+
+ With probability `keep_prob`, outputs the input element scaled up by
+ `1 / keep_prob`, otherwise outputs `0`. The scaling is so that the expected
+ sum is unchanged.
+
+ By default, each element is kept or dropped independently. If `noise_shape`
+ is specified, it must be
+ [broadcastable](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
+ to the shape of `x`, and only dimensions with `noise_shape[i] == x.shape[i]`
+ will make independent decisions. For example, if `x.shape = [b, x, y, c]` and
+ `noise_shape = [b, 1, 1, c]`, each batch and channel component will be
+ kept independently and each row and column will be kept or not kept together.
+
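+ For example (an illustrative sketch; which elements survive is random, so
+ this is only one possible outcome):
+
+ # 'x' is [1., 2., 3., 4.] and keep_prob is 0.5. Each kept element is
+ # scaled by 1 / 0.5 = 2; dropped elements become 0.
+ tf.nn.dropout(x, 0.5) ==> [2., 0., 6., 8.]
+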
+ Args:
+ x: A tensor.
+ keep_prob: Float probability that each element is kept.
+ noise_shape: Shape for randomly generated keep/drop flags.
+ seed: A Python integer. Used to create a random seed.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for this operation (optional).
+
+ Returns:
+ A Tensor of the same shape of `x`.
+
+ Raises:
+ ValueError: If `keep_prob` is not in `(0, 1]`.
+ """
+ if not (0 < keep_prob <= 1):
+ raise ValueError("Expected keep_prob in (0, 1], got %g" % keep_prob)
+ with ops.op_scope([x], name, "dropout") as name:
+ x = ops.convert_to_tensor(x, name="x")
+ noise_shape = noise_shape or array_ops.shape(x)
+ # uniform [keep_prob, 1.0 + keep_prob)
+ random_tensor = keep_prob
+ random_tensor += random_ops.random_uniform(
+ noise_shape, seed=seed, dtype=x.dtype)
+ # 0. if [keep_prob, 1.0) and 1. if [1.0, 1.0 + keep_prob)
+ binary_tensor = math_ops.floor(random_tensor)
+ return x * (1.0 / keep_prob) * binary_tensor
+
+
+def depthwise_conv2d(input, filter, strides, padding, name=None):
+ """Depthwise 2-D convolution.
+
+ Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
+ and a filter tensor of shape
+ `[filter_height, filter_width, in_channels, channel_multiplier]`
+ containing `in_channels` convolutional filters of depth 1, `depthwise_conv2d`
+ applies a different filter to each input channel (expanding from 1 channel
+ to `channel_multiplier` channels for each), then concatenates the results
+ together. The output has `in_channels * channel_multiplier` channels.
+
+ In detail,
+
+ output[b, i, j, k * channel_multiplier + q] =
+ sum_{di, dj} input[b, strides[1] * i + di, strides[2] * j + dj, k] *
+ filter[di, dj, k, q]
+
+ Must have `strides[0] = strides[3] = 1`. For the most common case of the
+ same horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+ Args:
+ input: 4-D with shape `[batch, in_height, in_width, in_channels]`.
+ filter: 4-D with shape
+ `[filter_height, filter_width, in_channels, channel_multiplier]`.
+ strides: 1-D of size 4. The stride of the sliding window for each
+ dimension of `input`.
+ padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+ name: A name for this operation (optional).
+
+ Returns:
+ A 4-D `Tensor` of shape
+ `[batch, out_height, out_width, in_channels * channel_multiplier].`
+ """
+ with ops.op_scope([input, filter], name, "depthwise") as name:
+ input = ops.convert_to_tensor(input, name="tensor_in")
+ filter = ops.convert_to_tensor(filter, name="filter_in")
+ # A shape is required to statically compute the number of separable filters.
+ if filter.get_shape().ndims is not None:
+ assert len(filter.get_shape()) == 4
+ in_channels = filter.get_shape()[2]
+ # Sanity checks, if shape information is available for the inputs.
+ if input.get_shape().ndims is not None:
+ assert len(input.get_shape()) == 4
+ assert input.get_shape()[3] == in_channels, (
+ "Mismatched input depth %d and number of depthwise filters %d." % (
+ input.get_shape()[3].value, in_channels))
+ else:
+ assert input.get_shape().ndims is not None, (
+ "Either tensor must provide static shape information.")
+ assert input.get_shape().ndims == 4
+ in_channels = input.get_shape()[3]
+
+ if in_channels == 1:
+ return nn_ops.conv2d(input, filter, strides, padding, name=name)
+ else:
+ # Create one separate convolution per channel.
+ convs = []
+ for channel in xrange(in_channels):
+ with ops.name_scope("depth%d" % channel) as channel_scope:
+ t_in = array_ops.slice(input, [0, 0, 0, channel], [-1, -1, -1, 1],
+ name="slice_inputs")
+ f_in = array_ops.slice(filter, [0, 0, channel, 0], [-1, -1, 1, -1],
+ name="slice_params")
+ convs.append(nn_ops.conv2d(t_in, f_in,
+ strides, padding, name=channel_scope))
+ # Concatenate the per-channel convolutions along the channel dimension.
+ return array_ops.concat(3, convs, name=name)
+
+
+def separable_conv2d(input, depthwise_filter, pointwise_filter, strides,
+ padding,
+ name=None):
+ """2-D convolution with separable filters.
+
+ Performs a depthwise convolution that acts separately on channels followed by
+ a pointwise convolution that mixes channels. Note that this is separability
+ between dimensions `[1, 2]` and `3`, not spatial separability between
+ dimensions `1` and `2`.
+
+ In detail,
+
+ output[b, i, j, k] = sum_{di, dj, q, r]
+ input[b, strides[1] * i + di, strides[2] * j + dj, q] *
+ depthwise_filter[di, dj, q, r] *
+ pointwise_filter[0, 0, q * channel_multiplier + r, k]
+
+ `strides` controls the strides for the depthwise convolution only, since
+ the pointwise convolution has implicit strides of `[1, 1, 1, 1]`. Must have
+ `strides[0] = strides[3] = 1`. For the most common case of the same
+ horizontal and vertical strides, `strides = [1, stride, stride, 1]`.
+
+ Args:
+ input: 4-D `Tensor` with shape `[batch, in_height, in_width, in_channels]`.
+ depthwise_filter: 4-D `Tensor` with shape
+ `[filter_height, filter_width, in_channels, channel_multiplier]`.
+ Contains `in_channels` convolutional filters of depth 1.
+ pointwise_filter: 4-D `Tensor` with shape
+ `[1, 1, channel_multiplier * in_channels, out_channels]`. Pointwise
+ filter to mix channels after `depthwise_filter` has convolved spatially.
+ strides: 1-D of size 4. The strides for the depthwise convolution for
+ each dimension of `input`.
+ padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+ name: A name for this operation (optional).
+
+ Returns:
+ A 4-D `Tensor` of shape `[batch, out_height, out_width, out_channels]`.
+ """
+ with ops.op_scope([input, depthwise_filter, pointwise_filter],
+ name, "separable_conv2d") as name:
+ input = ops.convert_to_tensor(input, name="tensor_in")
+ depthwise_filter = ops.convert_to_tensor(depthwise_filter,
+ name="depthwise_filter")
+ pointwise_filter = ops.convert_to_tensor(pointwise_filter,
+ name="pointwise_filter")
+
+ if pointwise_filter.get_shape().ndims is not None:
+ assert len(pointwise_filter.get_shape()) == 4
+ assert pointwise_filter.get_shape()[0] == 1
+ assert pointwise_filter.get_shape()[1] == 1
+ if depthwise_filter.get_shape().ndims and input.get_shape().ndims:
+ channel_multiplier = depthwise_filter.get_shape()[3]
+ in_channels = input.get_shape()[3]
+ out_channels = pointwise_filter.get_shape()[3]
+ # Otherwise the separable convolution would be over-parameterized.
+ assert channel_multiplier * in_channels < out_channels
+ # The layout of the ops in the graph is expected to be as follows:
+ # separable_conv2d // Conv2D op corresponding to the pointwise conv.
+ # separable_conv2d/depthwise // Concat op for the depthwise outputs.
+ # separable_conv2d/depthwise/depth0 // Conv2D op for depth 0
+ # separable_conv2d/depthwise/depth1 // Conv2D op for depth 1
+ # separable_conv2d/depthwise/depth2 // Conv2D op for depth 2
+ depthwise = depthwise_conv2d(input, depthwise_filter, strides,
+ padding, name="depthwise")
+ return nn_ops.conv2d(depthwise, pointwise_filter, [1, 1, 1, 1],
+ padding="VALID", name=name)
+
+
+def moments(x, axes, name=None):
+ """Calculate the mean and variance of `x`.
+
+ The mean and variance are calculated by aggregating the contents of `x`
+ across `axes`. If `x` is 1-D and `axes = [0]` this is just the mean
+ and variance of a vector.
+
+ For so-called "global normalization" needed for convolutional filters pass
+ `axes=[0, 1, 2]` (batch, height, width). For batch normalization pass
+ `axes=[0]` (batch).
+
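+ For example (an illustrative sketch with a hypothetical `x`):
+
+ # 'x' is [1., 2., 3., 4.]
+ mean, variance = tf.nn.moments(x, [0])
+ # mean ==> 2.5, variance ==> 1.25
+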
+ Args:
+ x: A `Tensor`.
+ axes: array of ints. Axes along which to compute mean and
+ variance.
+ name: Name used to scope the operations that compute the moments.
+
+ Returns:
+ Two `Tensors`: `mean` and `variance`.
+ """
+ with ops.op_scope([x, axes], name, "moments"):
+ x = ops.convert_to_tensor(x, name="x")
+ divisor = 1.0
+ for d in xrange(len(x.get_shape())):
+ if d in axes:
+ divisor *= x.get_shape()[d].value
+ divisor = constant_op.constant(1.0 / divisor, x.dtype, name="divisor")
+ axes = constant_op.constant(axes, name="axes")
+ # Note: We do not use Mean here because it is very slow on GPU.
+ # Note 2: The expression below is potentially more stable.
+ # It is however a bit slower and stability doesn't appear to be an issue.
+ # mean = math_ops.reduce_sum(math_ops.mul(x, divisor), axes, name="mean")
+ # var = math_ops.reduce_sum(math_ops.mul(math_ops.square(x - mean),
+ # divisor), axes,
+ # name="variance")
+ mean = math_ops.mul(math_ops.reduce_sum(x, axes), divisor, name="mean")
+ var = math_ops.mul(math_ops.reduce_sum(math_ops.square(x - mean), axes),
+ divisor, name="variance")
+ return mean, var
+
+
+def _sum_rows(x):
+ """Returns a vector summing up each row of the matrix x."""
+ # _sum_rows(x) is equivalent to math_ops.reduce_sum(x, 1) when x is
+ # a matrix. The gradient of _sum_rows(x) is more efficient than
+ # reduce_sum(x, 1)'s gradient in today's implementation. Therefore,
+ # we use _sum_rows(x) in the nce_loss() computation since the loss
+ # is mostly used for training.
+ cols = array_ops.shape(x)[1]
+ ones_shape = array_ops.pack([cols, 1])
+ ones = array_ops.ones(ones_shape, x.dtype)
+ return array_ops.reshape(math_ops.matmul(x, ones), [-1])
+
+
+def _compute_sampled_logits(weights, biases, inputs, labels, num_sampled,
+ num_classes, num_true=1,
+ sampled_values=None,
+ subtract_log_q=True,
+ remove_accidental_hits=False,
+ name=None):
+ """Helper function for nce_loss and sampled_softmax_loss functions.
+
+ Computes sampled output training logits and labels suitable for implementing
+ e.g. noise-contrastive estimation (see nce_loss) or sampled softmax (see
+ sampled_softmax_loss).
+
+ Note: In the case where num_true > 1, we assign to each target class
+ the target probability 1 / num_true so that the target probabilities
+ sum to 1 per-example.
+
+ Args:
+ weights: tensor of label embeddings with shape = [num_classes, dim]
+ biases: tensor of num_classes label biases
+ inputs: tensor with shape = [batch_size, dim] corresponding to forward
+ activations of the input network
+ labels: int tensor with shape [batch_size, num_true]
+ num_sampled: number of label classes to sample per batch
+ num_classes: number of possible label classes in the data (e.g. vocab size)
+ num_true: number of target classes per example (default: 1)
+ sampled_values: a tuple of (sampled_candidates, true_expected_count,
+ sampled_expected_count) returned by a *CandidateSampler function to use
+ (if None, we default to LogUniformCandidateSampler)
+ subtract_log_q: subtract the log expected count of the labels in the sample
+ to get the logits of the true labels (default: True)
+ Turn off for Negative Sampling.
+ remove_accidental_hits: whether to remove "accidental hits" where a sampled
+ label equals the true labels (bool, default: False)
+ name: name for this op
+
+ Returns:
+ out_logits, out_labels: tensors with shape [batch_size, num_true +
+ num_sampled] for passing to either SigmoidCrossEntropyWithLogits (NCE)
+ or SoftmaxCrossEntropyWithLogits (sampled softmax).
+
+ """
+
+ with ops.op_scope(
+ [weights, biases, inputs, labels], name, "compute_sampled_logits"):
+ if labels.dtype != types.int64:
+ labels = math_ops.cast(labels, types.int64)
+ labels_flat = array_ops.reshape(labels, [-1])
+
+ # Sample the negative labels.
+ # sampled shape: num_sampled vector
+ # true_expected_count shape = [batch_size, 1]
+ # sampled_expected_count shape = num_sampled vector
+ if sampled_values is None:
+ sampled_values = candidate_sampling_ops.log_uniform_candidate_sampler(
+ true_classes=labels,
+ num_true=num_true,
+ num_sampled=num_sampled,
+ unique=True,
+ range_max=num_classes)
+ # NOTE: pylint cannot tell that 'sampled_values' is a sequence
+ # pylint: disable=unpacking-non-sequence
+ sampled, true_expected_count, sampled_expected_count = sampled_values
+ # pylint: enable=unpacking-non-sequence
+
+ # weights shape is [num_classes, dim]
+ # labels_flat is a [batch_size * num_true] vector
+ # true_w shape is [batch_size * num_true, dim]
+ # true_b is a [batch_size * num_true] vector
+ true_w = embedding_ops.embedding_lookup(weights, labels_flat)
+ true_b = embedding_ops.embedding_lookup(biases, labels_flat)
+
+ # inputs shape is [batch_size, dim]
+ # true_w shape is [batch_size * num_true, dim]
+ # row_wise_dots is [batch_size, num_true, dim]
+ dim = array_ops.shape(true_w)[1:2]
+ new_true_w_shape = array_ops.concat(0, [[-1, num_true], dim])
+ row_wise_dots = math_ops.mul(
+ array_ops.expand_dims(inputs, 1),
+ array_ops.reshape(true_w, new_true_w_shape))
+ # We want the row-wise dot plus biases which yields a
+ # [batch_size, num_true] tensor of true_logits.
+ dots_as_matrix = array_ops.reshape(row_wise_dots,
+ array_ops.concat(0, [[-1], dim]))
+ true_logits = array_ops.reshape(_sum_rows(dots_as_matrix), [-1, num_true])
+ true_b = array_ops.reshape(true_b, [-1, num_true])
+ true_logits += true_b
+
+ # Lookup weights and biases for sampled labels.
+ # sampled is a num_sampled int vector
+ # sampled_w shape is [num_sampled, dim]
+ # sampled_b is a num_sampled float vector
+ sampled_w = embedding_ops.embedding_lookup(weights, sampled)
+ sampled_b = embedding_ops.embedding_lookup(biases, sampled)
+
+ # inputs has shape [batch_size, dim]
+ # sampled_w has shape [num_sampled, dim]
+ # sampled_b has shape [num_sampled]
+ # Apply X*W'+B, which yields [batch_size, num_sampled]
+ sampled_logits = math_ops.matmul(inputs,
+ sampled_w,
+ transpose_b=True) + sampled_b
+
+ if remove_accidental_hits:
+ acc_hits = candidate_sampling_ops.compute_accidental_hits(
+ labels, sampled, num_true=num_true)
+ acc_indices, acc_ids, acc_weights = acc_hits
+
+ # This is how SparseToDense expects the indices.
+ acc_indices_2d = array_ops.reshape(acc_indices, [-1, 1])
+ acc_ids_2d_int32 = array_ops.reshape(math_ops.cast(
+ acc_ids, types.int32), [-1, 1])
+ sparse_indices = array_ops.concat(
+ 1, [acc_indices_2d, acc_ids_2d_int32], "sparse_indices")
+ # Create sampled_logits_shape = [batch_size, num_sampled]
+ sampled_logits_shape = array_ops.concat(
+ 0,
+ [array_ops.shape(labels)[:1], array_ops.expand_dims(num_sampled, 0)])
+ sampled_logits += sparse_ops.sparse_to_dense(
+ sparse_indices, sampled_logits_shape, acc_weights, 0.0)
+
+ if subtract_log_q:
+ # Subtract log of Q(l), prior probability that l appears in sampled.
+ true_logits -= math_ops.log(true_expected_count)
+ sampled_logits -= math_ops.log(sampled_expected_count)
+
+ # Construct output logits and labels. The true labels/logits start at col 0.
+ out_logits = array_ops.concat(1, [true_logits, sampled_logits])
+ # true_logits is a float tensor, ones_like(true_logits) is a float tensor
+ # of ones. We then divide by num_true to ensure the per-example labels sum
+ # to 1.0, i.e. form a proper probability distribution.
+ out_labels = array_ops.concat(
+ 1, [array_ops.ones_like(true_logits) / num_true,
+ array_ops.zeros_like(sampled_logits)])
+
+ return out_logits, out_labels
+
+
+def nce_loss(weights, biases, inputs, labels, num_sampled, num_classes,
+ num_true=1,
+ sampled_values=None,
+ remove_accidental_hits=False,
+ name="nce_loss"):
+ """Computes and returns the noise-contrastive estimation training loss.
+
+ See [Noise-contrastive estimation: A new estimation principle for
+ unnormalized statistical models]
+ (http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf).
+ Also see our [Candidate Sampling Algorithms Reference]
+ (http://www.tensorflow.org/extras/candidate_sampling.pdf)
+
+ Note: In the case where num_true > 1, we assign to each target class
+ the target probability 1 / num_true so that the target probabilities
+ sum to 1 per-example.
+
+ Note: It would be useful to allow a variable number of target classes per
+ example. We hope to provide this functionality in a future release.
+ For now, if you have a variable number of target classes, you can pad them
+ out to a constant number by either repeating them or by padding
+ with an otherwise unused class.
+
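+ A minimal usage sketch (the tensors `class_weights`, `class_biases`,
+ `hidden` and `label_ids`, and the sizes shown, are hypothetical):
+
+ loss = tf.nn.nce_loss(class_weights, class_biases, hidden, label_ids,
+ num_sampled=64, num_classes=10000)
+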
+ Args:
+ weights: A `Tensor` of shape [num_classes, dim]. The class embeddings.
+ biases: A `Tensor` of shape [num_classes]. The class biases.
+ inputs: A `Tensor` of shape [batch_size, dim]. The forward
+ activations of the input network.
+ labels: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ num_classes: An `int`. The number of possible classes.
+ num_true: An `int`. The number of target classes per training example.
+ sampled_values: a tuple of `(sampled_candidates, true_expected_count,
+ sampled_expected_count)` returned by a *_candidate_sampler function.
+ (if None, we default to LogUniformCandidateSampler)
+ remove_accidental_hits: A `bool`. Whether to remove "accidental hits"
+ where a sampled class equals one of the target classes. If set to
+ `True`, this is a "Sampled Logistic" loss instead of NCE, and we are
+ learning to generate log-odds instead of log probabilities. See
+ our [Candidate Sampling Algorithms Reference]
+ (http://www.tensorflow.org/extras/candidate_sampling.pdf).
+ Default is False.
+ name: A name for the operation (optional).
+
+ Returns:
+ A batch_size 1-D tensor of per-example NCE losses.
+ """
+ logits, labels = _compute_sampled_logits(
+ weights, biases, inputs, labels, num_sampled, num_classes,
+ num_true=num_true,
+ sampled_values=sampled_values,
+ subtract_log_q=True,
+ remove_accidental_hits=remove_accidental_hits,
+ name=name)
+ sampled_losses = sigmoid_cross_entropy_with_logits(logits,
+ labels,
+ name="sampled_losses")
+ # sampled_losses is batch_size x {true_loss, sampled_losses...}
+ # We sum out true and sampled losses.
+ return _sum_rows(sampled_losses)
+
+
+def sampled_softmax_loss(weights, biases, inputs, labels, num_sampled,
+ num_classes, num_true=1,
+ sampled_values=None,
+ remove_accidental_hits=True,
+ name="sampled_softmax_loss"):
+ """Computes and returns the sampled softmax training loss.
+
+ This is a faster way to train a softmax classifier over a huge number of
+ classes.
+
+ This operation is for training only. It is generally an underestimate of
+ the full softmax loss.
+
+ At inference time, you can compute full softmax probabilities with the
+ expression `tf.nn.softmax(tf.matmul(inputs, weights, transpose_b=True) + biases)`.
+
+ See our [Candidate Sampling Algorithms Reference]
+ (http://www.tensorflow.org/extras/candidate_sampling.pdf)
+
+ Also see Section 3 of http://arxiv.org/abs/1412.2007 for the math.
+
+ Args:
+ weights: A `Tensor` of shape [num_classes, dim]. The class embeddings.
+ biases: A `Tensor` of shape [num_classes]. The class biases.
+ inputs: A `Tensor` of shape [batch_size, dim]. The forward
+ activations of the input network.
+ labels: A `Tensor` of type `int64` and shape `[batch_size,
+ num_true]`. The target classes. Note that this format differs from
+ the `labels` argument of `nn.softmax_cross_entropy_with_logits`.
+ num_sampled: An `int`. The number of classes to randomly sample per batch.
+ num_classes: An `int`. The number of possible classes.
+ num_true: An `int`. The number of target classes per training example.
+ sampled_values: a tuple of `(sampled_candidates, true_expected_count,
+ sampled_expected_count)` returned by a *_candidate_sampler function.
+ (if None, we default to LogUniformCandidateSampler)
+ remove_accidental_hits: A `bool`. Whether to remove "accidental hits"
+ where a sampled class equals one of the target classes. Default is
+ True.
+ name: A name for the operation (optional).
+
+ Returns:
+ A batch_size 1-D tensor of per-example sampled softmax losses.
+
+ """
+ logits, labels = _compute_sampled_logits(
+ weights, biases, inputs, labels, num_sampled, num_classes,
+ num_true=num_true,
+ sampled_values=sampled_values,
+ subtract_log_q=True,
+ remove_accidental_hits=remove_accidental_hits,
+ name=name)
+ sampled_losses = nn_ops.softmax_cross_entropy_with_logits(logits, labels)
+ # sampled_losses is a batch_size vector.
+ return sampled_losses
diff --git a/tensorflow/python/ops/nn_grad.py b/tensorflow/python/ops/nn_grad.py
new file mode 100644
index 0000000000..0cf867d217
--- /dev/null
+++ b/tensorflow/python/ops/nn_grad.py
@@ -0,0 +1,229 @@
+"""Gradients for operators defined in nn_ops.py."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn_ops
+from tensorflow.python.ops import gen_nn_ops
+
+
+@ops.RegisterGradient("Conv2DBackpropInput")
+def _DeConv2DGrad(op, grad):
+ """The derivatives for deconvolution.
+
+ Args:
+ op: the Deconvolution op.
+ grad: the tensor representing the gradient w.r.t. the output
+
+ Returns:
+ the gradients w.r.t. the input and the filter
+ """
+ return [None,
+ nn_ops.conv2d_backprop_filter(grad,
+ array_ops.shape(op.inputs[1]),
+ op.inputs[2],
+ op.get_attr("strides"),
+ op.get_attr("padding")),
+ nn_ops.conv2d(grad,
+ op.inputs[1],
+ op.get_attr("strides"),
+ op.get_attr("padding"))]
+
+
+@ops.RegisterGradient("Softmax")
+def _SoftmaxGrad(op, grad_softmax):
+ """The derivative of the softmax nonlinearity.
+
+ We assume that probs is of shape [batch_size, dim].
+ The formula is: dsoftmax / dx = (diag(softmax) - softmax * softmax').
+ This matrix is diagonal minus a rank one matrix, so it is easy to implement
+ as follows:
+
+ grad_x = grad_softmax * softmax - sum(grad_softmax * softmax) * softmax
+
+ Args:
+ op: the Softmax op.
+ grad_softmax: the tensor representing the gradient w.r.t. the
+ softmax output.
+
+ Returns:
+ gradient w.r.t the input to the softmax
+
+ """
+ # TODO(ilyasu): assert that the tensor has two dimensions at
+ # graph-construction time? Alternatively: do different things
+ # depending on the dimensionality of the input tensors.
+ softmax = op.outputs[0]
+ grad_x = ((grad_softmax -
+ array_ops.reshape(math_ops.reduce_sum(grad_softmax * softmax, [1]),
+ [-1, 1]))
+ * softmax)
+ return grad_x
+
+
+@ops.RegisterGradient("BiasAdd")
+def _BiasAddGrad(unused_bias_op, received_grad):
+ """Return the gradients for the 2 inputs of bias_op.
+
+ The first input of unused_bias_op is the tensor t, and its gradient is
+ just the gradient the unused_bias_op received.
+
+ The second input of unused_bias_op is the bias vector which has one fewer
+ dimension than "received_grad" (the batch dimension.) Its gradient is the
+ received gradient Summed on the batch dimension, which is the first dimension.
+
+ Args:
+ unused_bias_op: The BiasOp for which we need to generate gradients.
+ received_grad: Tensor. The gradients passed to the BiasOp.
+
+ Returns:
+ Two tensors, the first one for the "tensor" input of the BiasOp,
+ the second one for the "bias" input of the BiasOp.
+ """
+ reduction_dim_tensor = math_ops.range(0, array_ops.rank(received_grad) - 1)
+  return (received_grad,
+          math_ops.reduce_sum(received_grad, reduction_dim_tensor))
+
+
+def _VerifyTensor(t, name, msg):
+ """Assert that the tensor does not contain any NaN's.
+
+ Args:
+    t: Tensor to check.
+    name: Name for the enclosing name scope.
+    msg: Message to pass to the check_numerics op.
+  Returns:
+    The same tensor `t`, with a control dependency on the numerics check.
+ """
+ with ops.name_scope(name):
+ with ops.device(t.device or ops.get_default_graph().get_default_device()):
+ verify_input = array_ops.check_numerics(t, message=msg)
+ out = control_flow_ops.with_dependencies([verify_input], t)
+ return out
+
+
+@ops.RegisterGradient("Relu")
+def _ReluGrad(op, grad):
+ t = _VerifyTensor(op.inputs[0], op.name, "ReluGrad input is not finite.")
+ return gen_nn_ops._relu_grad(grad, t)
+
+
+@ops.RegisterGradient("Relu6")
+def _Relu6Grad(op, grad):
+ return gen_nn_ops._relu6_grad(grad, op.inputs[0])
+
+
+@ops.RegisterGradient("Softplus")
+def _SoftplusGrad(op, grad):
+ return gen_nn_ops._softplus_grad(grad, op.inputs[0])
+
+
+@ops.RegisterGradient("ReluGrad")
+def _ReluGradGrad(op, grad):
+ x = op.inputs[1]
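+  # ReluGrad(grad, x) = grad * (x > 0); it depends on x only through the
+  # indicator x > 0, so the gradient w.r.t. x is zero (almost everywhere).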
+ return (gen_nn_ops._relu_grad(grad, x),
+ array_ops.zeros(shape=array_ops.shape(x), dtype=x.dtype))
+
+
+def _BroadcastMul(vec, mat):
+ """Multiply after broadcasting vec to match dimensions of mat.
+
+ Args:
+ vec: A 1-D tensor of dimension [D0]
+ mat: A 2-D tensor of dimension [D0, D1]
+
+ Returns:
+ A tensor of dimension [D0, D1], the result of vec * mat
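+
+  For example, `vec` of shape [3] and `mat` of shape [3, 4] produce a [3, 4]
+  tensor whose i-th row is vec[i] * mat[i, :].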
+ """
+ # Reshape vec to [D0, 1]
+ vec = array_ops.expand_dims(vec, -1)
+ return vec * mat
+
+
+@ops.RegisterGradient("SoftmaxCrossEntropyWithLogits")
+def _SoftmaxCrossEntropyWithLogitsGrad(op, grad_0, _):
+ # grad_0 is the backprop for cost, and we multiply it with the gradients
+ # (which is output[1])
+ # There is no gradient for the labels
+ return _BroadcastMul(grad_0, op.outputs[1]), None
+
+
+@ops.RegisterGradient("Conv2D")
+def _Conv2DGrad(op, grad):
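+  # Returns the pair (d_input, d_filter): the gradient w.r.t. the image input
+  # followed by the gradient w.r.t. the filter.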
+ return [nn_ops.conv2d_backprop_input(array_ops.shape(op.inputs[0]),
+ op.inputs[1],
+ grad,
+ op.get_attr("strides"),
+ op.get_attr("padding")),
+ nn_ops.conv2d_backprop_filter(op.inputs[0],
+ array_ops.shape(op.inputs[1]),
+ grad,
+ op.get_attr("strides"),
+ op.get_attr("padding"))]
+
+
+@ops.RegisterGradient("LRN")
+def _LRNGrad(op, grad):
+ depth_radius = op.get_attr("depth_radius")
+ bias = op.get_attr("bias")
+ alpha = op.get_attr("alpha")
+ beta = op.get_attr("beta")
+ return [gen_nn_ops._lrn_grad(grad, op.inputs[0], op.outputs[0],
+ depth_radius, bias, alpha, beta)]
+
+
+@ops.RegisterGradient("AvgPool")
+def _AvgPoolGrad(op, grad):
+ return gen_nn_ops._avg_pool_grad(array_ops.shape(op.inputs[0]), grad,
+ op.get_attr("ksize"),
+ op.get_attr("strides"),
+ op.get_attr("padding"))
+
+
+@ops.RegisterGradient("MaxPool")
+def _MaxPoolGrad(op, grad):
+ return gen_nn_ops._max_pool_grad(op.inputs[0], op.outputs[0], grad,
+ op.get_attr("ksize"),
+ op.get_attr("strides"),
+ padding=op.get_attr("padding"))
+
+
+@ops.RegisterGradient("BatchNormWithGlobalNormalization")
+def _BatchNormWithGlobalNormalizationGrad(op, grad):
+ """Return the gradients for the 5 inputs of BatchNormWithGlobalNormalization.
+
+ We do not backprop anything for the mean and var intentionally as they are
+ not being trained with backprop in the operation.
+
+ Args:
+ op: The BatchNormOp for which we need to generate gradients.
+ grad: Tensor. The gradients passed to the BatchNormOp.
+
+ Returns:
+ dx: Backprop for input, which is (grad * (g * rsqrt(v + epsilon)))
+ dm: Backprop for mean, which is
+ sum_over_rest(grad * g) * (-1 / rsqrt(v + epsilon))
+ dv: Backprop for variance, which is
+ sum_over_rest(grad * g * (x - m)) * (-1/2) * (v + epsilon) ^ (-3/2)
+ db: Backprop for beta, which is grad reduced in all except the
+ last dimension.
+ dg: Backprop for gamma, which is (grad * ((x - m) * rsqrt(v + epsilon)))
+ """
+ dx, dm, dv, db, dg = gen_nn_ops._batch_norm_with_global_normalization_grad(
+ op.inputs[0], op.inputs[1], op.inputs[2], op.inputs[4], grad,
+ op.get_attr("variance_epsilon"), op.get_attr("scale_after_normalization"))
+ return dx, dm, dv, db, dg
+
+
+@ops.RegisterGradient("L2Loss")
+def _L2LossGrad(op, grad):
+ """Return the gradients for L2Loss.
+
+ Args:
+ op: The L2LossOp for which we need to generate gradients.
+ grad: Tensor containing a single number.
+
+ Returns:
+ The gradient, which is (x * grad).
+ """
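+  # L2Loss computes sum(x ** 2) / 2, so its derivative w.r.t. x is x itself,
+  # scaled by the incoming (scalar) gradient.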
+ return op.inputs[0] * grad
diff --git a/tensorflow/python/ops/nn_ops.py b/tensorflow/python/ops/nn_ops.py
new file mode 100644
index 0000000000..0ffe95de2b
--- /dev/null
+++ b/tensorflow/python/ops/nn_ops.py
@@ -0,0 +1,365 @@
+"""Wrappers for primitive Neural Net (NN) Operations."""
+
+import tensorflow.python.platform
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_nn_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_nn_ops import *
+
+
+# Aliases for some automatically-generated names.
+local_response_normalization = gen_nn_ops.lrn
+
+
+def deconv2d(value, filter, output_shape, strides, padding="SAME",
+ name=None):
+ """The transpose of `conv2d`.
+
+ This used to be called "deconvolution", but it is actually the transpose
+ (gradient) of `conv2d`, not an actual deconvolution.
+
+ Args:
+ value: A 4-D `Tensor` of type `float` and shape
+ `[batch, height, width, in_channels]`.
+ filter: A 4-D `Tensor` with the same type as `value` and shape
+ `[height, width, output_channels, in_channels]`. `filter`'s
+ `in_channels` dimension must match that of `value`.
+ output_shape: A 1-D `Tensor` representing the output shape of the
+ deconvolution op.
+ strides: A list of ints. The stride of the sliding window for each
+ dimension of the input tensor.
+ padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+ name: Optional name for the returned tensor.
+
+ Returns:
+ A `Tensor` with the same type as `value`.
+
+ Raises:
+ ValueError: If input/output depth does not match `filter`'s shape, or if
+ padding is other than `'VALID'` or `'SAME'`.
+ """
+ with ops.op_scope([value, filter, output_shape], name, "DeConv2D") as name:
+ value = ops.convert_to_tensor(value, name="value")
+ filter = ops.convert_to_tensor(filter, name="filter")
+ if not value.get_shape()[3].is_compatible_with(filter.get_shape()[3]):
+ raise ValueError(
+ "input channels does not match filter's input channels, "
+ "{} != {}".format(value.get_shape()[3], filter.get_shape()[3]))
+
+ output_shape_ = ops.convert_to_tensor(output_shape, name="output_shape")
+ if not output_shape_.get_shape().is_compatible_with(tensor_shape.vector(4)):
+ raise ValueError("output_shape must have shape (4,), got {}"
+ .format(output_shape_.get_shape()))
+
+ if isinstance(output_shape, (list, np.ndarray)):
+      # output_shape's shape should be [4] if we reached this point.
+ if not filter.get_shape()[2].is_compatible_with(output_shape[3]):
+ raise ValueError(
+ "output_shape does not match filter's output channels, "
+ "{} != {}".format(output_shape[3], filter.get_shape()[2]))
+
+ if padding != "VALID" and padding != "SAME":
+ raise ValueError("padding must be either VALID or SAME:"
+ " {}".format(padding))
+
+ return gen_nn_ops.conv2d_backprop_input(input_sizes=output_shape_,
+ filter=filter,
+ out_backprop=value,
+ strides=strides,
+ padding=padding,
+ name=name)
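+
+# Illustrative shapes: a value of shape [batch, 6, 4, in_channels], a filter
+# of shape [3, 3, out_channels, in_channels], strides [1, 2, 2, 1] and "SAME"
+# padding pair with an output_shape of [batch, 12, 8, out_channels], i.e. the
+# input shape of the conv2d whose transpose this computes.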
+
+# pylint: disable=protected-access
+def bias_add(value, bias, name=None):
+ """Adds `bias` to `value`.
+
+ This is (mostly) a special case of `tf.add` where `bias` is restricted to 1-D.
+ Broadcasting is supported, so `value` may have any number of dimensions.
+ Unlike `tf.add`, the type of `bias` is allowed to differ from `value` in the
+ case where both types are quantized.
+
+ Args:
+ value: A `Tensor` with type `float`, `double`, `int64`, `int32`, `uint8`,
+ `int16`, `int8`, or `complex64`.
+ bias: A 1-D `Tensor` with size matching the last dimension of `value`.
+ Must be the same type as `value` unless `value` is a quantized type,
+ in which case a different quantized type may be used.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with the same type as `value`.
+ """
+ with ops.op_scope([value, bias], name, "BiasAdd") as name:
+ value = ops.convert_to_tensor(value, name="input")
+ bias = ops.convert_to_tensor(bias, dtype=value.dtype, name="bias")
+ return gen_nn_ops._bias_add(value, bias, name=name)
+
+
+ops.RegisterShape("BiasAdd")(common_shapes.bias_add_shape)
+
+
+
+def relu6(features, name=None):
+ """Computes Rectified Linear 6: `min(max(features, 0), 6)`.
+
+ Args:
+ features: A `Tensor` with type `float`, `double`, `int32`, `int64`, `uint8`,
+ `int16`, or `int8`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A `Tensor` with the same type as `features`.
+ """
+ with ops.op_scope([features], name, "Relu6") as name:
+ features = ops.convert_to_tensor(features, name="features")
+ return gen_nn_ops._relu6(features, name=name)
+
+
+def softmax_cross_entropy_with_logits(logits, labels, name=None):
+ """Computes softmax cross entropy between `logits` and `labels`.
+
+ Measures the probability error in discrete classification tasks in which the
+ classes are mutually exclusive (each entry is in exactly one class). For
+ example, each CIFAR-10 image is labeled with one and only one label: an image
+ can be a dog or a truck, but not both.
+
+ **WARNING:** This op expects unscaled logits, since it performs a `softmax`
+ on `logits` internally for efficiency. Do not call this op with the
+ output of `softmax`, as it will produce incorrect results.
+
+ `logits` and `labels` must have the same shape `[batch_size, num_classes]`
+ and the same dtype (either `float32` or `float64`).
+
+ Args:
+ logits: Unscaled log probabilities.
+ labels: Each row `labels[i]` must be a valid probability distribution.
+ name: A name for the operation (optional).
+
+ Returns:
+ A 1-D `Tensor` of length `batch_size` of the same type as `logits` with the
+ softmax cross entropy loss.
+ """
+  # The second output tensor contains the gradients.  We use it in
+  # _SoftmaxCrossEntropyWithLogitsGrad() in nn_grad but not here.
+ cost, unused_backprop = gen_nn_ops._softmax_cross_entropy_with_logits(
+ logits, labels, name=name)
+ return cost
+
+
+@ops.RegisterShape("SoftmaxCrossEntropyWithLogits")
+def _SoftmaxCrossEntropyWithLogitsShape(op):
+ """Shape function for SoftmaxCrossEntropyWithLogits op."""
+ logits_shape = op.inputs[0].get_shape()
+ labels_shape = op.inputs[1].get_shape()
+ input_shape = logits_shape.merge_with(labels_shape).with_rank(2)
+ batch_size = input_shape[0]
+ return [tensor_shape.vector(batch_size.value), input_shape]
+
+
+def avg_pool(value, ksize, strides, padding, name=None):
+ """Performs the average pooling on the input.
+
+ Each entry in `output` is the mean of the corresponding size `ksize`
+ window in `value`.
+
+ Args:
+ value: A 4-D `Tensor` of shape `[batch, height, width, channels]` and type
+ `float32`, `float64`, `qint8`, `quint8`, or `qint32`.
+ ksize: A list of ints that has length >= 4.
+ The size of the window for each dimension of the input tensor.
+ strides: A list of ints that has length >= 4.
+ The stride of the sliding window for each dimension of the
+ input tensor.
+ padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+ name: Optional name for the operation.
+
+ Returns:
+ A `Tensor` with the same type as `value`. The average pooled output tensor.
+ """
+ with ops.op_scope([value], name, "AvgPool") as name:
+ value = ops.convert_to_tensor(value, name="input")
+ return gen_nn_ops._avg_pool(value, ksize=ksize, strides=strides,
+ padding=padding,
+ name=name)
+
+
+def max_pool(value, ksize, strides, padding, name=None):
+ """Performs the max pooling on the input.
+
+ Args:
+ value: A 4-D `Tensor` with shape `[batch, height, width, channels]` and
+ type `float32`, `float64`, `qint8`, `quint8`, `qint32`.
+ ksize: A list of ints that has length >= 4. The size of the window for
+ each dimension of the input tensor.
+ strides: A list of ints that has length >= 4. The stride of the sliding
+ window for each dimension of the input tensor.
+ padding: A string, either `'VALID'` or `'SAME'`. The padding algorithm.
+ name: Optional name for the operation.
+
+ Returns:
+ A `Tensor` with the same type as `value`. The max pooled output tensor.
+ """
+ with ops.op_scope([value], name, "MaxPool") as name:
+ value = ops.convert_to_tensor(value, name="input")
+ return gen_nn_ops._max_pool(value, ksize=ksize, strides=strides,
+ padding=padding,
+ name=name)
+
+
+ops.RegisterShape("Relu")(common_shapes.unchanged_shape)
+ops.RegisterShape("Relu6")(common_shapes.unchanged_shape)
+ops.RegisterShape("Softplus")(common_shapes.unchanged_shape)
+
+
+@ops.RegisterShape("ReluGrad")
+@ops.RegisterShape("Relu6Grad")
+@ops.RegisterShape("SoftplusGrad")
+def _BinaryElementwiseShape(op):
+ """Returns same shape as both inputs to op.
+
+ Args:
+ op: Input operation.
+
+ Returns:
+ Shape of both inputs to `op`.
+ """
+ return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
+
+
+ops.RegisterShape("L2Loss")(common_shapes.scalar_shape)
+
+
+ops.RegisterShape("LRN")(common_shapes.unchanged_shape_with_rank(4))
+
+
+@ops.RegisterShape("LRNGrad")
+def _LRNGradShape(op):
+ """Shape function for LRNGrad op."""
+ in_grads_shape = op.inputs[0].get_shape().with_rank(4)
+ in_image_shape = op.inputs[1].get_shape().with_rank(4)
+ out_image_shape = op.inputs[2].get_shape().with_rank(4)
+ return [in_grads_shape.merge_with(in_image_shape).merge_with(out_image_shape)]
+
+
+ops.RegisterShape("Softmax")(
+ common_shapes.unchanged_shape_with_rank(2))
+
+
+@ops.RegisterShape("InTopK")
+def _InTopKShape(op):
+ """Shape function for InTopK op."""
+ predictions_shape = op.inputs[0].get_shape().with_rank(2)
+ targets_shape = op.inputs[1].get_shape().with_rank(1)
+ batch_size = predictions_shape[0].merge_with(targets_shape[0])
+ return [tensor_shape.vector(batch_size.value)]
+
+
+@ops.RegisterShape("TopK")
+def _TopKShape(op):
+ """Shape function for TopK op."""
+ input_shape = op.inputs[0].get_shape().with_rank(2)
+ k = op.get_attr("k")
+ num_rows = input_shape[0]
+ num_cols = input_shape[1]
+ if num_cols.value is not None and num_cols.value < k:
+ raise ValueError("input must have at least k (%d) columns" % k)
+ return [tensor_shape.TensorShape([num_rows, k]),
+ tensor_shape.TensorShape([num_rows, k])]
+
+
+@ops.RegisterShape("BatchNormWithGlobalNormalization")
+def _BatchNormShape(op):
+ """Shape function for BatchNormWithGlobalNormalization op."""
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ mean_shape = op.inputs[1].get_shape().with_rank(1)
+ var_shape = op.inputs[2].get_shape().with_rank(1)
+ beta_shape = op.inputs[3].get_shape().with_rank(1)
+ gamma_shape = op.inputs[4].get_shape().with_rank(1)
+ mean_shape[0].merge_with(input_shape[3])
+ var_shape[0].merge_with(input_shape[3])
+ beta_shape[0].merge_with(input_shape[3])
+ gamma_shape[0].merge_with(input_shape[3])
+ return [input_shape]
+
+
+@ops.RegisterShape("BatchNormWithGlobalNormalizationGrad")
+def _BatchNormGradShape(op):
+ """Shape function for BatchNormWithGlobalNormalizationGrad op."""
+ input_shape = op.inputs[0].get_shape().with_rank(4)
+ mean_shape = op.inputs[1].get_shape().with_rank(1)
+ var_shape = op.inputs[2].get_shape().with_rank(1)
+ beta_shape = op.inputs[3].get_shape().with_rank(1)
+ out_backprop_shape = op.inputs[4].get_shape().with_rank(4)
+ input_shape = input_shape.merge_with(out_backprop_shape)
+ vector_dim = input_shape[3]
+ vector_dim = vector_dim.merge_with(mean_shape[0])
+ vector_dim = vector_dim.merge_with(var_shape[0])
+ vector_dim = vector_dim.merge_with(beta_shape[0])
+ return [input_shape] + ([tensor_shape.vector(vector_dim)] * 4)
+
+
+ops.RegisterShape("Conv2D")(common_shapes.conv2d_shape)
+ops.RegisterShape("AvgPool")(common_shapes.avg_pool_shape)
+ops.RegisterShape("MaxPool")(common_shapes.max_pool_shape)
+
+
+@ops.RegisterShape("MaxPoolWithArgmax")
+def _MaxPoolWithArgMaxShape(op):
+ """Shape function for MaxPoolWithArgmax op."""
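+  # The op has two outputs (pooled values and argmax indices), both with the
+  # max-pool output shape.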
+ return common_shapes.max_pool_shape(op) * 2
+
+
+@ops.RegisterShape("AvgPoolGrad")
+def _AvgPoolGradShape(op):
+ """Shape function for the AvgPoolGrad op."""
+ orig_input_shape = tensor_util.ConstantValue(op.inputs[0])
+ if orig_input_shape is not None:
+ return [tensor_shape.TensorShape(orig_input_shape.tolist())]
+ else:
+ # NOTE(mrry): We could in principle work out the shape from the
+ # gradients and the attrs, but if we do not know orig_input_shape
+ # statically, then we are unlikely to know the shape of the
+ # gradients either.
+ return [tensor_shape.unknown_shape(ndims=4)]
+
+
+@ops.RegisterShape("Conv2DBackpropFilter")
+def _Conv2DBackpropFilterShape(op):
+ """Shape function for the Conv2DBackpropFilter op."""
+ filter_shape = tensor_util.ConstantValue(op.inputs[1])
+ if filter_shape is not None:
+ return [tensor_shape.TensorShape(filter_shape.tolist())]
+ else:
+ # NOTE(mrry): We could in principle work out the shape from the
+ # gradients and the attrs, but if we do not know filter_shape
+ # statically, then we are unlikely to know the shape of the
+ # gradients either.
+ return [tensor_shape.unknown_shape(ndims=4)]
+
+
+@ops.RegisterShape("Conv2DBackpropInput")
+def _Conv2DBackpropInputShape(op):
+ """Shape function for the Conv2DBackpropInput op."""
+ input_shape = tensor_util.ConstantValue(op.inputs[0])
+ if input_shape is not None:
+ return [tensor_shape.TensorShape(input_shape.tolist())]
+ else:
+ # NOTE(mrry): We could in principle work out the shape from the
+ # gradients and the attrs, but if we do not know input_shape
+ # statically, then we are unlikely to know the shape of the
+ # gradients either.
+ return [tensor_shape.unknown_shape(ndims=4)]
+
+
+@ops.RegisterShape("MaxPoolGrad")
+@ops.RegisterShape("MaxPoolGradWithArgmax")
+def _MaxPoolGradShape(op):
+ """Shape function for the MaxPoolGrad op."""
+ orig_input_shape = op.inputs[0].get_shape().with_rank(4)
+ return [orig_input_shape]
diff --git a/tensorflow/python/ops/nn_test.py b/tensorflow/python/ops/nn_test.py
new file mode 100644
index 0000000000..11ce56e359
--- /dev/null
+++ b/tensorflow/python/ops/nn_test.py
@@ -0,0 +1,882 @@
+"""Tests for tensorflow.ops.nn."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.kernel_tests import gradient_checker as gc
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import gen_nn_ops
+from tensorflow.python.ops import gradients
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import nn
+from tensorflow.python.ops import nn_grad
+from tensorflow.python.platform import googletest
+
+exp = math.exp
+log = math.log
+
+
+class SigmoidCrossEntropyWithLogitsTest(test_util.TensorFlowTestCase):
+
+ def _SigmoidCrossEntropyWithLogits(self, logits, targets):
+ assert len(logits) == len(targets)
+ pred = [1 / (1 + exp(-x)) for x in logits]
+ eps = 0.0001
+ pred = [min(max(p, eps), 1 - eps) for p in pred]
+ return [-z * log(y) - (1 - z) * log(1 - y) for y, z in zip(pred, targets)]
+
+ def _Inputs(self, x=None, y=None, dtype=types.float64, sizes=None):
+ x = [-100, -2, -2, 0, 2, 2, 2, 100] if x is None else x
+ y = [0, 0, 1, 0, 0, 1, 0.5, 1] if y is None else y
+ assert len(x) == len(y)
+ sizes = sizes if sizes else [len(x)]
+ logits = constant_op.constant(x, shape=sizes, dtype=dtype, name="logits")
+ targets = constant_op.constant(y, shape=sizes, dtype=dtype, name="targets")
+ losses = np.array(self._SigmoidCrossEntropyWithLogits(x, y)).reshape(*sizes)
+ return logits, targets, losses
+
+ def testConstructionNamed(self):
+ with self.test_session():
+ logits, targets, _ = self._Inputs()
+ loss = nn.sigmoid_cross_entropy_with_logits(logits, targets,
+ name="mylogistic")
+ self.assertEqual("mylogistic", loss.op.name)
+
+ def testLogisticOutput(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu):
+ logits, targets, losses = self._Inputs(dtype=types.float32)
+ loss = nn.sigmoid_cross_entropy_with_logits(logits, targets)
+ np_loss = np.array(losses).astype(np.float32)
+ tf_loss = loss.eval()
+ self.assertAllClose(np_loss, tf_loss, atol=0.001)
+
+ def testLogisticOutputMultiDim(self):
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu):
+ logits, targets, losses = self._Inputs(dtype=types.float32,
+ sizes=[2, 2, 2])
+ loss = nn.sigmoid_cross_entropy_with_logits(logits, targets)
+ np_loss = np.array(losses).astype(np.float32)
+ tf_loss = loss.eval()
+ self.assertAllClose(np_loss, tf_loss, atol=0.001)
+
+ def testGradient(self):
+ sizes = [4, 2]
+ with self.test_session():
+ logits, targets, _ = self._Inputs(sizes=sizes)
+ loss = nn.sigmoid_cross_entropy_with_logits(logits, targets)
+ err = gc.ComputeGradientError(logits, sizes, loss, sizes)
+ print "logistic loss gradient err = ", err
+ self.assertLess(err, 1e-7)
+
+
+class ZeroFractionTest(test_util.TensorFlowTestCase):
+
+ def _ZeroFraction(self, x):
+ assert x.shape
+ total_elements = float(np.prod(x.shape))
+ nonzeros = float(np.count_nonzero(x.flatten()))
+ return 1.0 - (nonzeros / total_elements)
+
+ def testZeroFraction(self):
+ x_shape = [5, 17]
+ x_np = np.random.randint(0, 2, size=x_shape).astype(np.float32)
+ y_np = self._ZeroFraction(x_np)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np)
+ x_tf.set_shape(x_shape)
+ y_tf = nn.zero_fraction(x_tf)
+ y_tf_np = y_tf.eval()
+ eps = 1e-8
+ self.assertAllClose(y_tf_np, y_np, eps)
+
+ def testZeroFractionEmpty(self):
+ with self.test_session():
+ x = np.zeros(0)
+ y = nn.zero_fraction(x).eval()
+ self.assertTrue(np.isnan(y))
+
+
+class SoftmaxTest(test_util.TensorFlowTestCase):
+
+ def _softmax(self, x):
+ assert len(x.shape) == 2
+ m = x.max(1)[:, np.newaxis]
+ u = np.exp(x - m)
+ z = u.sum(1)[:, np.newaxis]
+ return u / z
+
+ def testSoftmax(self):
+ x_shape = [5, 10]
+ x_np = np.random.randn(*x_shape).astype(np.float32)
+ y_np = self._softmax(x_np)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np)
+ y_tf = nn.softmax(x_tf)
+ y_tf_np = y_tf.eval()
+ eps = 1e-3
+ self.assertAllClose(y_tf_np, y_np, eps)
+
+ def testGradient(self):
+ x_shape = [5, 10]
+ x_np = np.random.randn(*x_shape).astype(np.float64)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np)
+ y_tf = nn.softmax(x_tf)
+ err = gc.ComputeGradientError(x_tf, x_shape, y_tf, x_shape)
+ eps = 1e-8
+ self.assertLess(err, eps)
+
+
+class DeConv2DTest(test_util.TensorFlowTestCase):
+
+ def testDeConv2DSingleStride(self):
+ with self.test_session():
+ strides = [1, 1, 1, 1]
+
+ # Input, output: [batch, height, width, depth]
+ x_shape = [2, 6, 4, 3]
+ y_shape = [2, 6, 4, 2]
+
+ # Filter: [kernel_height, kernel_width, output_depth, input_depth]
+ f_shape = [3, 3, 2, 3]
+
+ x = constant_op.constant(1.0, shape=x_shape, name="x",
+ dtype=types.float32)
+ f = constant_op.constant(1.0, shape=f_shape, name="filter",
+ dtype=types.float32)
+ output = nn.deconv2d(x, f, y_shape, strides=strides, padding="SAME")
+ value = output.eval()
+
+ # We count the number of cells being added at the locations in the output.
+ # At the center, #cells=kernel_height * kernel_width
+ # At the corners, #cells=ceil(kernel_height/2) * ceil(kernel_width/2)
+ # At the borders, #cells=ceil(kernel_height/2)*kernel_width or
+ # kernel_height * ceil(kernel_width/2)
+
+ for n in xrange(x_shape[0]):
+ for k in xrange(f_shape[2]):
+ for w in xrange(y_shape[2]):
+ for h in xrange(y_shape[1]):
+ target = 4 * 3.0
+ h_in = h > 0 and h < y_shape[1] - 1
+ w_in = w > 0 and w < y_shape[2] - 1
+ if h_in and w_in:
+ target += 5 * 3.0
+ elif h_in or w_in:
+ target += 2 * 3.0
+ self.assertAllClose(target, value[n, h, w, k])
+
+ def testDeConv2DSame(self):
+ with self.test_session():
+ strides = [1, 2, 2, 1]
+
+ # Input, output: [batch, height, width, depth]
+ x_shape = [2, 6, 4, 3]
+ y_shape = [2, 12, 8, 2]
+
+ # Filter: [kernel_height, kernel_width, output_depth, input_depth]
+ f_shape = [3, 3, 2, 3]
+
+ x = constant_op.constant(1.0, shape=x_shape, name="x",
+ dtype=types.float32)
+ f = constant_op.constant(1.0, shape=f_shape, name="filter",
+ dtype=types.float32)
+ output = nn.deconv2d(x, f, y_shape, strides=strides, padding="SAME")
+ value = output.eval()
+
+ for n in xrange(x_shape[0]):
+ for k in xrange(f_shape[2]):
+ for w in xrange(y_shape[2]):
+ for h in xrange(y_shape[1]):
+ target = 3.0
+ # We add a case for locations divisible by the stride.
+ h_in = h % strides[1] == 0 and h > 0 and h < y_shape[1] - 1
+ w_in = w % strides[2] == 0 and w > 0 and w < y_shape[2] - 1
+ if h_in and w_in:
+ target += 9.0
+ elif h_in or w_in:
+ target += 3.0
+ self.assertAllClose(target, value[n, h, w, k])
+
+ def testDeConv2DValid(self):
+ with self.test_session():
+ strides = [1, 2, 2, 1]
+
+ # Input, output: [batch, height, width, depth]
+ x_shape = [2, 6, 4, 3]
+ y_shape = [2, 13, 9, 2]
+
+ # Filter: [kernel_height, kernel_width, output_depth, input_depth]
+ f_shape = [3, 3, 2, 3]
+
+ x = constant_op.constant(1.0, shape=x_shape, name="x",
+ dtype=types.float32)
+ f = constant_op.constant(1.0, shape=f_shape, name="filter",
+ dtype=types.float32)
+ output = nn.deconv2d(x, f, y_shape, strides=strides, padding="VALID")
+ value = output.eval()
+
+ cache_values = np.zeros(y_shape, dtype=np.float32)
+
+ # The amount of padding added
+ pad = 1
+
+ for n in xrange(x_shape[0]):
+ for k in xrange(f_shape[2]):
+ for w in xrange(pad, y_shape[2] - pad):
+ for h in xrange(pad, y_shape[1] - pad):
+ target = 3.0
+ # We add a case for locations divisible by the stride.
+              h_in = (h % strides[1] == 0 and h > pad and
+                      h < y_shape[1] - 1 - pad)
+              w_in = (w % strides[2] == 0 and w > pad and
+                      w < y_shape[2] - 1 - pad)
+ if h_in and w_in:
+ target += 9.0
+ elif h_in or w_in:
+ target += 3.0
+ cache_values[n, h, w, k] = target
+
+ # copy values in the border
+ cache_values[n, :, 0, k] = cache_values[n, :, 1, k]
+ cache_values[n, :, -1, k] = cache_values[n, :, -2, k]
+ cache_values[n, 0, :, k] = cache_values[n, 1, :, k]
+ cache_values[n, -1, :, k] = cache_values[n, -2, :, k]
+
+ self.assertAllClose(cache_values, value)
+
+ def testGradient(self):
+ x_shape = [2, 6, 4, 3]
+ f_shape = [3, 3, 2, 3]
+ y_shape = [2, 12, 8, 2]
+ strides = [1, 2, 2, 1]
+ np.random.seed(1) # Make it reproducible.
+ x_val = np.random.random_sample(x_shape).astype(np.float64)
+ f_val = np.random.random_sample(f_shape).astype(np.float64)
+ with self.test_session():
+ x = constant_op.constant(x_val, name="x", dtype=types.float32)
+ f = constant_op.constant(f_val, name="f", dtype=types.float32)
+ output = nn.deconv2d(x, f, y_shape, strides=strides, padding="SAME")
+ err = gc.ComputeGradientError([x, f], [x_shape, f_shape], output, y_shape)
+ print "DeConv gradient err = %g " % err
+ err_tolerance = 0.0005
+ self.assertLess(err, err_tolerance)
+
+
+class L2LossTest(test_util.TensorFlowTestCase):
+
+ def testL2Loss(self):
+ with self.test_session():
+ x = constant_op.constant([1.0, 0.0, 3.0, 2.0], shape=[2, 2], name="x")
+ l2loss = nn.l2_loss(x)
+ value = l2loss.eval()
+ self.assertAllClose(7.0, value)
+
+ def testGradient(self):
+ x_shape = [20, 7, 3]
+ np.random.seed(1) # Make it reproducible.
+ x_val = np.random.random_sample(x_shape).astype(np.float64)
+ with self.test_session():
+ x = constant_op.constant(x_val, name="x")
+ output = nn.l2_loss(x)
+ err = gc.ComputeGradientError(x, x_shape, output, [1])
+ print "L2Loss gradient err = %g " % err
+ err_tolerance = 1e-11
+ self.assertLess(err, err_tolerance)
+
+
+class L2NormalizeTest(test_util.TensorFlowTestCase):
+
+ def _l2Normalize(self, x, dim):
+ norm = np.apply_along_axis(np.linalg.norm, dim, x)
+ return x / np.expand_dims(norm, dim)
+
+ def testL2Normalize(self):
+ x_shape = [20, 7, 3]
+ np.random.seed(1)
+ x_np = np.random.random_sample(x_shape).astype(np.float32)
+ for dim in range(len(x_shape)):
+ y_np = self._l2Normalize(x_np, dim)
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, name="x")
+ y_tf = nn.l2_normalize(x_tf, dim)
+ self.assertAllClose(y_np, y_tf.eval())
+
+ def testL2NormalizeGradient(self):
+ x_shape = [20, 7, 3]
+ np.random.seed(1)
+ x_np = np.random.random_sample(x_shape).astype(np.float64)
+ for dim in range(len(x_shape)):
+ with self.test_session():
+ x_tf = constant_op.constant(x_np, name="x")
+ y_tf = nn.l2_normalize(x_tf, dim)
+ err = gc.ComputeGradientError(x_tf, x_shape, y_tf, x_shape)
+ print "L2Normalize gradient err = %g " % err
+ self.assertLess(err, 1e-4)
+
+
+class DropoutTest(test_util.TensorFlowTestCase):
+
+ def testDropout(self):
+    # Runs dropout with a 0-1 tensor 10 times, sums the number of ones, and
+    # validates that it produces approximately the right number of ones over
+    # a large number of samples, based on the keep probability.
+ x_dim = 40
+ y_dim = 30
+ num_iter = 10
+ for keep_prob in [0.1, 0.5, 0.8]:
+ with self.test_session():
+ t = constant_op.constant(1.0,
+ shape=[x_dim, y_dim],
+ dtype=types.float32)
+ dropout = nn.dropout(t, keep_prob)
+ final_count = 0
+ self.assertEqual([x_dim, y_dim], dropout.get_shape())
+ for _ in xrange(0, num_iter):
+ value = dropout.eval()
+ final_count += np.count_nonzero(value)
+ # Verifies that there are only two values: 0 and 1/keep_prob.
+ sorted_value = np.unique(np.sort(value))
+ self.assertEqual(0, sorted_value[0])
+ self.assertAllClose(1 / keep_prob, sorted_value[1])
+ # Check that we are in the 15% error range
+ expected_count = x_dim * y_dim * keep_prob * num_iter
+ rel_error = math.fabs(final_count - expected_count) / expected_count
+ print rel_error
+ self.assertTrue(rel_error < 0.15)
+
+ def testShapedDropout(self):
+    # Runs dropout with a 0-1 tensor 10 times, sums the number of ones, and
+    # validates that it produces approximately the right number of ones over
+    # a large number of samples, based on the keep probability. This time
+    # with shaped noise.
+ x_dim = 40 * 30
+ y_dim = 3
+ num_iter = 10
+ for keep_prob in [0.1, 0.5, 0.8]:
+ with self.test_session():
+ t = constant_op.constant(1.0,
+ shape=[x_dim, y_dim],
+ dtype=types.float32)
+ dropout = nn.dropout(t, keep_prob, noise_shape=[x_dim, 1])
+ self.assertEqual([x_dim, y_dim], dropout.get_shape())
+ final_count = 0
+ for _ in xrange(0, num_iter):
+ value = dropout.eval()
+ final_count += np.count_nonzero(value)
+ # Verifies that there are only two values: 0 and 1/keep_prob.
+ sorted_value = np.unique(np.sort(value))
+ self.assertEqual(0, sorted_value[0])
+ self.assertAllClose(1 / keep_prob, sorted_value[1])
+ # Check that we are in the 15% error range
+ expected_count = x_dim * y_dim * keep_prob * num_iter
+ rel_error = math.fabs(final_count - expected_count) / expected_count
+ print rel_error
+ self.assertTrue(rel_error < 0.15)
+
+ def testShapedDropoutCorrelation(self):
+ # Runs a shaped dropout and tests that the correlations are correct.
+ x_dim = 40
+ y_dim = 30
+ num_iter = 10
+ for keep_prob in [0.1, 0.5, 0.8]:
+ with self.test_session():
+ t = constant_op.constant(1.0,
+ shape=[x_dim, y_dim],
+ dtype=types.float32)
+ dropout = nn.dropout(t, keep_prob, noise_shape=[x_dim, 1])
+ self.assertEqual([x_dim, y_dim], dropout.get_shape())
+ for _ in xrange(0, num_iter):
+ value = dropout.eval()
+          # Verifies that each y column has only one type of activation.
+ for i in xrange(x_dim):
+ sorted_value = np.unique(np.sort(value[i, :]))
+ self.assertEqual(sorted_value.size, 1)
+
+ def testShapedDropoutShapeError(self):
+ # Runs shaped dropout and verifies an error is thrown on misshapen noise.
+ x_dim = 40
+ y_dim = 30
+ keep_prob = 0.5
+ with self.test_session():
+ t = constant_op.constant(1.0,
+ shape=[x_dim, y_dim],
+ dtype=types.float32)
+ with self.assertRaises(ValueError):
+ _ = nn.dropout(t, keep_prob, noise_shape=[x_dim, y_dim + 10])
+ with self.assertRaises(ValueError):
+ _ = nn.dropout(t, keep_prob, noise_shape=[x_dim, y_dim, 5])
+ with self.assertRaises(ValueError):
+ _ = nn.dropout(t, keep_prob, noise_shape=[x_dim + 3])
+ with self.assertRaises(ValueError):
+ _ = nn.dropout(t, keep_prob, noise_shape=[x_dim])
+ # test that broadcasting proceeds
+ _ = nn.dropout(t, keep_prob, noise_shape=[y_dim])
+ _ = nn.dropout(t, keep_prob, noise_shape=[1, y_dim])
+ _ = nn.dropout(t, keep_prob, noise_shape=[x_dim, 1])
+ _ = nn.dropout(t, keep_prob, noise_shape=[1, 1])
+
+
+class BatchNormWithGlobalNormalizationTest(test_util.TensorFlowTestCase):
+
+ def _npBatchNorm(self, x, m, v, beta, gamma, epsilon,
+ scale_after_normalization):
+ y = (x - m) / np.sqrt(v + epsilon)
+ y = y * gamma if scale_after_normalization else y
+ y += beta
+ return y
+
+ def _opsBatchNorm(self, x, m, v, beta, gamma, epsilon,
+ scale_after_normalization):
+ y = (x - m) * math_ops.rsqrt(v + epsilon)
+ if scale_after_normalization:
+ y = gamma * y
+ y += beta
+ return y
+
+ def testBatchNorm(self):
+ x_shape = [3, 5, 4, 2]
+ param_shape = [2]
+ x_val = np.random.random_sample(x_shape).astype(np.float32)
+ m_val = np.random.random_sample(param_shape).astype(np.float32)
+ v_val = np.random.random_sample(param_shape).astype(np.float32)
+ beta_val = np.random.random_sample(param_shape).astype(np.float32)
+ gamma_val = np.random.random_sample(param_shape).astype(np.float32)
+ for use_gpu in [True, False]:
+ with self.test_session(use_gpu=use_gpu) as sess:
+ x = constant_op.constant(x_val, name="x")
+ m = constant_op.constant(m_val, name="m")
+ v = constant_op.constant(v_val, name="v")
+ beta = constant_op.constant(beta_val, name="beta")
+ gamma = constant_op.constant(gamma_val, name="gamma")
+ epsilon = 0.001
+ for scale_after_normalization in [True, False]:
+ bn = nn.batch_norm_with_global_normalization(
+ x, m, v, beta, gamma, epsilon, scale_after_normalization)
+ on = self._opsBatchNorm(
+ x, m, v, beta, gamma, epsilon, scale_after_normalization)
+ np_batch_norm = self._npBatchNorm(
+ x_val, m_val, v_val, beta_val, gamma_val, epsilon,
+ scale_after_normalization)
+ tf_batch_norm, ops_batch_norm = sess.run([bn, on])
+ self.assertAllClose(np_batch_norm, tf_batch_norm, atol=0.000001)
+ self.assertAllClose(np_batch_norm, ops_batch_norm, atol=0.000001)
+ self.assertAllClose(tf_batch_norm, ops_batch_norm, atol=0.000001)
+
+ def _testBatchNormGradient(self, param_index, tag, scale_after_normalization,
+ err_tolerance=1e-11):
+ x_shape = [3, 5, 4, 5]
+ param_shape = [5]
+ np.random.seed(1) # Make it reproducible.
+ x_val = np.random.random_sample(x_shape).astype(np.float64)
+ m_val = np.random.random_sample(param_shape).astype(np.float64)
+ v_val = np.random.random_sample(param_shape).astype(np.float64)
+ beta_val = np.random.random_sample(param_shape).astype(np.float64)
+ gamma_val = np.random.random_sample(param_shape).astype(np.float64)
+ with self.test_session():
+ x = constant_op.constant(x_val, name="x")
+ m = constant_op.constant(m_val, name="m")
+ v = constant_op.constant(v_val, name="v")
+ beta = constant_op.constant(beta_val, name="beta")
+ gamma = constant_op.constant(gamma_val, name="gamma")
+ epsilon = 0.001
+ # If scale_after_normalization is False, backprop for gamma
+ # will be 0. gamma is unchanged.
+ output = nn.batch_norm_with_global_normalization(
+ x, m, v, beta, gamma, epsilon, scale_after_normalization)
+ all_params = [x, m, v, beta, gamma]
+ all_shapes = [x_shape, param_shape, param_shape, param_shape, param_shape]
+ err = gc.ComputeGradientError(all_params[param_index],
+ all_shapes[param_index], output, x_shape)
+ print "Batch normalization %s gradient %s scale err = " % (
+ tag, "with" if scale_after_normalization else "without"
+ ), err
+ self.assertLess(err, err_tolerance)
+
+ def testBatchNormInputGradient(self):
+ for scale_after_normalization in [True, False]:
+ self._testBatchNormGradient(0, "x", scale_after_normalization)
+
+ def testBatchNormMeanGradient(self):
+ for scale_after_normalization in [True, False]:
+ self._testBatchNormGradient(1, "mean", scale_after_normalization)
+
+ def testBatchNormVarianceGradient(self):
+ for scale_after_normalization in [True, False]:
+ self._testBatchNormGradient(2, "variance", scale_after_normalization,
+ err_tolerance=1e-03)
+
+ def testBatchNormBetaGradient(self):
+ for scale_after_normalization in [True, False]:
+ self._testBatchNormGradient(3, "beta", scale_after_normalization)
+
+ def testBatchNormGammaGradient(self):
+ for scale_after_normalization in [True, False]:
+ self._testBatchNormGradient(4, "gamma", scale_after_normalization)
+
+ def testBatchNormGradImpl(self):
+ x_shape = [7, 5, 4, 6]
+ param_shape = [6]
+ np.random.seed(1) # Make it reproducible.
+ x_val = np.random.random_sample(x_shape).astype(np.float32)
+ m_val = np.random.random_sample(param_shape).astype(np.float32)
+ v_val = np.random.random_sample(param_shape).astype(np.float32)
+ beta_val = np.random.random_sample(param_shape).astype(np.float32)
+ gamma_val = np.random.random_sample(param_shape).astype(np.float32)
+ backprop_val = np.random.random_sample(x_shape).astype(np.float32)
+ for use_gpu in [False, True]:
+ with self.test_session(use_gpu=use_gpu) as sess:
+ x = constant_op.constant(x_val, name="x")
+ m = constant_op.constant(m_val, name="m")
+ v = constant_op.constant(v_val, name="v")
+ beta = constant_op.constant(beta_val, name="beta")
+ gamma = constant_op.constant(gamma_val, name="gamma")
+ backprop = constant_op.constant(backprop_val, name="backprop")
+ epsilon = 0.001
+ for scale_after_normalization in [True, False]:
+ dx, dm, dv, db, dg = (
+ gen_nn_ops._batch_norm_with_global_normalization_grad(
+ x, m, v, gamma, backprop, epsilon, scale_after_normalization))
+ on = self._opsBatchNorm(
+ x, m, v, beta, gamma, epsilon, scale_after_normalization)
+ odx, odm, odv, odb, odg = gradients.gradients(
+ [on], [x, m, v, beta, gamma], [backprop])
+ if scale_after_normalization:
+ all_grads = sess.run([dx, dm, dv, db, dg, odx, odm, odv, odb, odg])
+ to_check = ["dx", "dm", "dv", "db", "dg"]
+ else:
+ all_grads = sess.run([dx, dm, dv, db, odx, odm, odv, odb])
+ to_check = ["dx", "dm", "dv", "db"]
+ for i, n in enumerate(to_check):
+ print n
+ self.assertAllClose(
+ all_grads[i + len(to_check)], all_grads[i], atol=0.000001)
+
+
+class MomentsTest(test_util.TensorFlowTestCase):
+
+ def RunMomentTest(self, shape, global_norm):
+ with self.test_session():
+ # shape = [batch, width, height, depth]
+ assert len(shape) == 4
+
+ x_numpy = np.random.normal(size=shape).astype(np.float32)
+ x = constant_op.constant(x_numpy)
+ x.set_shape(shape)
+ axes = [0, 1, 2] if global_norm else [0]
+ mean, var = nn.moments(x, axes)
+
+ num_elements = np.prod([shape[i] for i in axes])
+
+ ax = (0, 1, 2) if global_norm else (0)
+ expected_mean = np.sum(x_numpy, axis=ax) / num_elements
+ expected_mean_squared = np.multiply(expected_mean, expected_mean)
+ expected_x_squared = np.sum(
+ np.multiply(x_numpy, x_numpy), axis=ax) / num_elements
+ expected_variance = expected_x_squared - expected_mean_squared
+
+ # Check that the moments are correct.
+ self.assertAllClose(expected_mean, mean.eval())
+ self.assertAllClose(expected_variance, var.eval())
+
+ def testBasic(self):
+ self.RunMomentTest(shape=[2, 3, 5, 4], global_norm=False)
+
+ def testGlobalNormalization(self):
+ self.RunMomentTest(shape=[2, 3, 5, 4], global_norm=True)
+
+ def _testGlobalGradient(self, from_y="mean"):
+ with self.test_session():
+ x_shape = [3, 5, 4, 2]
+ x_val = np.random.random_sample(x_shape).astype(np.float64)
+ x = constant_op.constant(x_val)
+ x.set_shape(x_shape)
+
+ axes = [0, 1, 2]
+ y_shape = [2] # Depth of x
+ out_mean, out_var = nn.moments(x, axes)
+ if from_y == "mean":
+ y = out_mean
+ elif from_y == "var":
+ y = out_var
+ err = gc.ComputeGradientError(x, x_shape, y, y_shape)
+ print "Moments %s gradient err = %g" % (from_y, err)
+ self.assertLess(err, 1e-11)
+
+ def testMeanGlobalGradient(self):
+ self._testGlobalGradient(from_y="mean")
+
+ def testVarGlobalGradient(self):
+ self._testGlobalGradient(from_y="var")
+
+
+class ComputeSampledLogitsTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ self._num_classes = 5
+ self._dim = 10
+ self._batch_size = 3
+
+ def _GenerateTestInputs(self):
+ np.random.seed(0)
+ weights = np.random.randn(self._num_classes, self._dim).astype(np.float32)
+ biases = np.random.randn(self._num_classes).astype(np.float32)
+ hidden_acts = np.random.randn(self._batch_size, self._dim).astype(
+ np.float32)
+
+ return weights, biases, hidden_acts
+
+ def _ComputeSampledLogitsNP(self, true_w, true_b, sampled_w, sampled_b,
+ hidden_acts,
+ num_true=1,
+ true_expected=None,
+ sampled_expected=None):
+
+ batch_size, dim = hidden_acts.shape
+ true_logits = np.sum(
+ hidden_acts.reshape((batch_size, 1, dim)) * true_w.reshape(
+ (batch_size, num_true, dim)),
+ axis=2)
+ true_b = true_b.reshape((batch_size, num_true))
+ true_logits += true_b
+ sampled_logits = np.dot(hidden_acts, sampled_w.T) + sampled_b
+
+ if true_expected is not None:
+ true_logits -= np.log(true_expected)
+ if sampled_expected is not None:
+ sampled_logits -= np.log(sampled_expected[np.newaxis, :])
+
+ out_logits = np.concatenate([true_logits, sampled_logits], axis=1)
+ out_labels = np.hstack((np.ones_like(true_logits) / num_true,
+ np.zeros_like(sampled_logits)))
+
+ return out_logits, out_labels
+
+ def _ComputeSampledLogitsTF(self, weights, biases, hidden_acts, labels,
+ num_sampled, num_classes, num_true, sampled_vals,
+ subtract_log_q, remove_accidental_hits,
+ name="sampled_loss_TF"):
+ # Should be called from within a `with test_session():` block
+ weights_tf = constant_op.constant(weights)
+ biases_tf = constant_op.constant(biases)
+ hidden_acts_tf = constant_op.constant(hidden_acts,
+ shape=(self._batch_size, self._dim))
+ labels_tf = constant_op.constant(labels, dtype=types.int64,
+ shape=(self._batch_size, num_true))
+
+ pred_logits_tf, pred_labels_tf = nn._compute_sampled_logits(
+ weights_tf, biases_tf, hidden_acts_tf, labels_tf, num_sampled,
+ num_classes, num_true, sampled_vals,
+ subtract_log_q=subtract_log_q,
+ remove_accidental_hits=remove_accidental_hits,
+ name=name)
+ return pred_logits_tf, pred_labels_tf
+
+ def testComputeSampledLogitsShapes(self):
+ # We just check that the shapes of the returned values are correct.
+ weights, biases, hidden_acts = self._GenerateTestInputs()
+ sampled = [1, 0, 2, 3]
+ num_sampled = len(sampled)
+ true_exp = sampled_exp = [1., 1., 1., 1.]
+ test_sampled_vals = (sampled, true_exp, sampled_exp)
+ sampled_w, sampled_b = weights[sampled], biases[sampled]
+
+ with self.test_session() as sess:
+ for num_true_test in range(1, 5):
+ labels = np.random.randint(low=0, high=self._num_classes,
+ size=self._batch_size * num_true_test)
+ true_w, true_b = weights[labels], biases[labels]
+
+ logits_np, labels_np = self._ComputeSampledLogitsNP(
+ true_w, true_b, sampled_w, sampled_b, hidden_acts,
+ num_true=num_true_test)
+
+ logits_tf, labels_tf = self._ComputeSampledLogitsTF(
+ weights, biases, hidden_acts, labels, num_sampled,
+ self._num_classes,
+ num_true=num_true_test,
+ sampled_vals=test_sampled_vals,
+ remove_accidental_hits=True,
+ subtract_log_q=False)
+
+ logits_tf_val, labels_tf_val = sess.run([logits_tf, labels_tf])
+ self.assertEqual(logits_np.shape, logits_tf_val.shape)
+ self.assertEqual(labels_np.shape, labels_tf_val.shape)
+
+ def testComputeSampledLogitsValues(self):
+ # Here we check the actual numerics.
+ weights, biases, hidden_acts = self._GenerateTestInputs()
+ eps = 1e-3
+ sampled = [1, 0, 2, 3]
+ num_sampled = len(sampled)
+ true_exp = np.empty([self._batch_size, 1], dtype=np.float32)
+ true_exp.fill(0.5)
+ sampled_exp = np.empty([num_sampled], dtype=np.float32)
+ sampled_exp.fill(0.5)
+ sampled_w, sampled_b = weights[sampled], biases[sampled]
+ test_sampled_vals = (sampled, true_exp, sampled_exp)
+
+ with self.test_session() as sess:
+ for num_true_test in range(1, 5):
+ # Generate test data for this run
+ labels = np.random.randint(low=0, high=self._num_classes,
+ size=self._batch_size * num_true_test)
+ true_w, true_b = weights[labels], biases[labels]
+
+ # Test 1: Without accidental hit removal or subtract_log_q
+ logits_np, labels_np = self._ComputeSampledLogitsNP(
+ true_w, true_b, sampled_w, sampled_b, hidden_acts,
+ num_true=num_true_test)
+ logits_tf, labels_tf = self._ComputeSampledLogitsTF(
+ weights, biases, hidden_acts, labels, num_sampled,
+ self._num_classes,
+ num_true=num_true_test,
+ sampled_vals=test_sampled_vals,
+ subtract_log_q=False,
+ remove_accidental_hits=False,
+ name="sampled_loss_test1_num_true%d" % num_true_test)
+
+ logits_tf_val, labels_tf_val = sess.run([logits_tf, labels_tf])
+ self.assertAllClose(logits_np, logits_tf_val, eps)
+ self.assertAllClose(labels_np, labels_tf_val, eps)
+
+ # Test 2: With accidental hit removal, no subtract_log_q
+ logits_tf, labels_tf = self._ComputeSampledLogitsTF(
+ weights, biases, hidden_acts, labels, num_sampled,
+ self._num_classes,
+ num_true=num_true_test,
+ sampled_vals=test_sampled_vals,
+ subtract_log_q=False,
+ remove_accidental_hits=True,
+ name="sampled_loss_test2_num_true%d" % num_true_test)
+
+ # Test that the exponentiated logits of accidental hits are near 0.
+ # First we need to find the hits in this random test run:
+ labels_reshape = labels.reshape((self._batch_size, num_true_test))
+ logits_tf_np = logits_tf.eval()
+ for row in xrange(self._batch_size):
+ row_labels = labels_reshape[row, :]
+ for col in xrange(num_sampled):
+ if sampled[col] in row_labels:
+ # We need to add the num_true_test offset into logits_*
+ self.assertNear(
+ np.exp(logits_tf_np[row, col + num_true_test]), 0., eps)
+
+ # Test 3: With subtract_log_q, no accidental hit removal
+ logits_np, labels_np = self._ComputeSampledLogitsNP(
+ true_w, true_b, sampled_w, sampled_b, hidden_acts,
+ num_true=num_true_test,
+ true_expected=true_exp,
+ sampled_expected=sampled_exp)
+ logits_tf, labels_tf = self._ComputeSampledLogitsTF(
+ weights, biases, hidden_acts, labels, num_sampled,
+ self._num_classes,
+ num_true=num_true_test,
+ sampled_vals=test_sampled_vals,
+ subtract_log_q=True,
+ remove_accidental_hits=False,
+ name="sampled_loss_test3_num_true%d" % num_true_test)
+
+ logits_tf_val, labels_tf_val = sess.run([logits_tf, labels_tf])
+ self.assertAllClose(logits_np, logits_tf_val, eps)
+ self.assertAllClose(labels_np, labels_tf_val, eps)
+
+ def testNCELoss(self):
+ # A simple test to verify the numerics.
+
+ def _SigmoidCrossEntropyWithLogits(logits, targets):
+ # logits, targets: float arrays of the same shape.
+ assert logits.shape == targets.shape
+ pred = 1. / (1. + np.exp(-logits))
+ eps = 0.0001
+ pred = np.minimum(np.maximum(pred, eps), 1 - eps)
+ return -targets * np.log(pred) - (1. - targets) * np.log(1. - pred)
+
+ weights, biases, hidden_acts = self._GenerateTestInputs()
+ labels = [0, 1, 2]
+ true_w, true_b = weights[labels], biases[labels]
+ sampled = [1, 0, 2, 3]
+ num_sampled = len(sampled)
+ true_exp = np.empty([self._batch_size, 1], dtype=np.float32)
+ true_exp.fill(0.5)
+ sampled_exp = np.empty([num_sampled], dtype=np.float32)
+ sampled_exp.fill(0.5)
+ sampled_w, sampled_b = weights[sampled], biases[sampled]
+ test_sampled_vals = (sampled, true_exp, sampled_exp)
+
+ with self.test_session():
+ logits_np, labels_np = self._ComputeSampledLogitsNP(
+ true_w, true_b, sampled_w, sampled_b, hidden_acts,
+ true_expected=true_exp,
+ sampled_expected=sampled_exp)
+ nce_loss_np = np.sum(
+ _SigmoidCrossEntropyWithLogits(logits_np, labels_np), 1)
+
+ labels_tf = constant_op.constant(labels, shape=(self._batch_size, 1))
+ weights_tf = constant_op.constant(weights)
+ biases_tf = constant_op.constant(biases)
+ inputs_tf = constant_op.constant(hidden_acts)
+
+ nce_loss_tf = nn.nce_loss(
+ weights_tf, biases_tf, inputs_tf, labels_tf,
+ num_sampled=1,
+ num_classes=self._num_classes,
+ num_true=1,
+ sampled_values=test_sampled_vals)
+
+ self.assertAllClose(nce_loss_np, nce_loss_tf.eval(), 1e-4)
+
+ def testSampledSoftmaxLoss(self):
+ # A simple test to verify the numerics.
+
+ def _SoftmaxCrossEntropyWithLogits(logits, targets):
+ # logits, targets: float arrays of the same shape.
+ assert logits.shape == targets.shape
+ stable_exp_logits = np.exp(logits - np.amax(
+ logits, axis=1, keepdims=True))
+ pred = stable_exp_logits / np.sum(stable_exp_logits, 1, keepdims=True)
+ return -np.sum(targets * np.log(pred + 1.0e-20), axis=1)
+
+ weights, biases, hidden_acts = self._GenerateTestInputs()
+ labels = [0, 1, 2]
+ true_w, true_b = weights[labels], biases[labels]
+ sampled = [1, 0, 2, 3]
+ num_sampled = len(sampled)
+ true_exp = np.full([self._batch_size, 1], fill_value=0.5, dtype=np.float32)
+ sampled_exp = np.full([num_sampled], fill_value=0.5, dtype=np.float32)
+ sampled_w, sampled_b = weights[sampled], biases[sampled]
+ test_sampled_vals = (sampled, true_exp, sampled_exp)
+
+ with self.test_session():
+ logits_np, labels_np = self._ComputeSampledLogitsNP(
+ true_w, true_b, sampled_w, sampled_b, hidden_acts,
+ true_expected=true_exp,
+ sampled_expected=sampled_exp)
+ sampled_softmax_loss_np = _SoftmaxCrossEntropyWithLogits(logits_np,
+ labels_np)
+
+ labels_tf = constant_op.constant(labels, shape=(self._batch_size, 1))
+ weights_tf = constant_op.constant(weights)
+ biases_tf = constant_op.constant(biases)
+ inputs_tf = constant_op.constant(hidden_acts)
+
+ sampled_softmax_loss_tf = nn.sampled_softmax_loss(
+ weights_tf, biases_tf, inputs_tf, labels_tf,
+ num_sampled=1,
+ num_classes=self._num_classes,
+ num_true=1,
+ sampled_values=test_sampled_vals,
+ remove_accidental_hits=False)
+
+ self.assertAllClose(
+ sampled_softmax_loss_np, sampled_softmax_loss_tf.eval(), 1e-4)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/numerics.py b/tensorflow/python/ops/numerics.py
new file mode 100644
index 0000000000..93f5d5db20
--- /dev/null
+++ b/tensorflow/python/ops/numerics.py
@@ -0,0 +1,50 @@
+"""Connects all float and double tensors to CheckNumericsOp."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import control_flow_ops
+
+
+def verify_tensor_all_finite(t, msg, name=None):
+ """Assert that the tensor does not contain any NaN's or Inf's.
+
+ Args:
+ t: Tensor to check.
+ msg: Message to log on failure.
+ name: A name for this operation (optional).
+
+ Returns:
+ Same tensor as `t`.
+ """
+ with ops.op_scope([t], name, "VerifyFinite") as name:
+ t = ops.convert_to_tensor(t, name="t")
+ with ops.device(t.device or t.graph.get_default_device()):
+ verify_input = array_ops.check_numerics(t, message=msg)
+ out = control_flow_ops.with_dependencies([verify_input], t)
+ return out
+
+
+def add_check_numerics_ops():
+ """Connect a check_numerics to every floating point tensor.
+
+ `check_numerics` operations themselves are added for each `float` or `double`
+ tensor in the graph. For all ops in the graph, the `check_numerics` op for
+ all of its (`float` or `double`) inputs is guaranteed to run before the
+ `check_numerics` op on any of its outputs.
+
+ Returns:
+ A `group` op depending on all `check_numerics` ops added.
+ """
+ check_op = []
+ # This code relies on the ordering of ops in get_operations().
+ # The consumer of a tensor always comes before that tensor's producer in
+ # this list. This is true because get_operations() returns ops in the order
+ # added, and ops can only be added once its inputs are added.
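+  # Each new check is created under a control dependency on the previous one,
+  # so the checks run one after another in graph-construction order.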
+ for op in ops.get_default_graph().get_operations():
+ for output in op.outputs:
+ if output.dtype in [types.float32, types.float64]:
+ message = op.name + ":" + str(output.value_index)
+ with ops.control_dependencies(check_op):
+ check_op = [array_ops.check_numerics(output, message=message)]
+ return control_flow_ops.group(*check_op)
diff --git a/tensorflow/python/ops/op_def_library.py b/tensorflow/python/ops/op_def_library.py
new file mode 100644
index 0000000000..5947b6df89
--- /dev/null
+++ b/tensorflow/python/ops/op_def_library.py
@@ -0,0 +1,640 @@
+"""Class to hold a library of OpDefs and use it to create Brain operations."""
+
+import numbers
+
+from tensorflow.core.framework import attr_value_pb2
+from tensorflow.core.framework import op_def_pb2
+from tensorflow.core.framework import tensor_pb2
+from tensorflow.core.framework import tensor_shape_pb2
+from tensorflow.core.framework import types_pb2
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types as types_lib
+from tensorflow.python.ops import constant_op
+from tensorflow.python.platform import logging
+
+
+def _Attr(op_def, name):
+ for attr in op_def.attr:
+ if attr.name == name:
+ return attr
+ raise TypeError("Inconsistent OpDef for '%s', missing attr '%s'" %
+ (op_def.name, name))
+
+
+def _AttrValue(attr_protos, name):
+ if name in attr_protos:
+ return attr_protos[name]
+ raise TypeError("Inconsistent OpDef, missing attr '%s' from '%s'." %
+ (name, attr_protos))
+
+
+def _SatisfiesTypeConstraint(dtype, attr_def):
+ if attr_def.HasField("allowed_values"):
+ allowed_list = attr_def.allowed_values.list.type
+ if dtype not in allowed_list:
+ raise TypeError(
+ "DataType %s for attr '%s' not in list of allowed values: %s" %
+ (types_lib.as_dtype(dtype).name, attr_def.name,
+ ", ".join(types_lib.as_dtype(x).name for x in allowed_list)))
+
+
+def _IsListParameter(arg):
+ if arg.number_attr:
+ return True
+ elif arg.type_list_attr:
+ return True
+ return False
+
+
+def _NumTypeFields(arg):
+ num = 0
+ if arg.type != types_pb2.DT_INVALID: num += 1
+ if arg.type_attr: num += 1
+ if arg.type_list_attr: num += 1
+ return num
+
+
+def _IsListValue(v):
+ return isinstance(v, (list, tuple))
+
+
+def _Flatten(l):
+ """Converts [1, 2, [3, 4], [5]] to [1, 2, 3, 4, 5]."""
+ # [1, 2, [3, 4], [5]] -> [[1], [2], [3, 4], [5]]
+ l_of_l = [x if _IsListValue(x) else [x] for x in l]
+ # [[1], [2], [3, 4], [5]] -> [1, 2, 3, 4, 5]
+ return [item for sublist in l_of_l for item in sublist]
+
+
+def _Restructure(l, structure):
+ """Returns the elements of list l structured according to the given structure.
+
+ A structure is represented by a list whose elements are either
+ `None` or a non-negative integer. `None` corresponds to a single
+ element in the output list, and an integer N corresponds to a nested
+ list of length N.
+
+ The function returns a data structure whose shape is given by
+ `structure`, and whose elements are taken from `l`. If `structure`
+ is a singleton, the function returns the single data structure
+ implied by the 0th element of `structure`. For example:
+
+ _Restructure(["foo", "bar", "baz", "qux"], [None, 2, None])
+ -> ["foo", ["bar", "baz"], "qux"]
+
+ _Restructure(["foo"], [None]) -> "foo"
+
+ _Restructure(["foo"], [1]) -> ["foo"]
+
+ _Restructure([], [0]) -> []
+
+ Args:
+ l: A list.
+ structure: A list whose elements are either `None` or a non-negative
+ integer.
+
+ Returns:
+ The elements of `l`, restructured according to `structure`. If
+ `structure` is a list of length 1, this function returns the
+ single data structure implied by `structure[0]`.
+
+ """
+ result = []
+ current_index = 0
+ for element in structure:
+ if element is None:
+ result.append(l[current_index])
+ current_index += 1
+ else:
+ result.append(l[current_index:current_index+element])
+ current_index += element
+
+ if len(result) == 1:
+ return result[0]
+ else:
+ return tuple(result)
+
+
+def _MakeFloat(v, arg_name):
+ if not isinstance(v, numbers.Real):
+ raise TypeError("Expected float for argument '%s' not %s." %
+ (arg_name, repr(v)))
+ return float(v)
+
+
+def _MakeInt(v, arg_name):
+ if isinstance(v, basestring):
+ raise TypeError("Expected int for argument '%s' not %s." %
+ (arg_name, repr(v)))
+ try:
+ return int(v)
+ except (ValueError, TypeError):
+ raise TypeError("Expected int for argument '%s' not %s." %
+ (arg_name, repr(v)))
+
+
+def _MakeStr(v, arg_name):
+ if not isinstance(v, basestring):
+ raise TypeError("Expected string for argument '%s' not %s." %
+ (arg_name, repr(v)))
+ return str(v) # Convert unicode strings to bytes.
+
+
+def _MakeBool(v, arg_name):
+ if not isinstance(v, bool):
+ raise TypeError("Expected bool for argument '%s' not %s." %
+ (arg_name, repr(v)))
+ return v
+
+
+def _MakeType(v, attr_def):
+ try:
+ v = types_lib.as_dtype(v)
+ except TypeError:
+ raise TypeError("Expected DataType for argument '%s' not %s." %
+ (attr_def.name, repr(v)))
+ i = v.as_datatype_enum
+ _SatisfiesTypeConstraint(i, attr_def)
+ return i
+
+
+def _MakeShape(v, arg_name):
+ """Convert v into a TensorShapeProto."""
+ # Args:
+ # v: A TensorShapeProto, a list of ints, or a tensor_shape.TensorShape.
+ # arg_name: String, for error messages.
+
+ # Returns:
+ # A TensorShapeProto.
+ if isinstance(v, tensor_shape_pb2.TensorShapeProto):
+ for d in v.dim:
+ if d.name:
+ logging.warning("Warning: TensorShapeProto with a named dimension: %s",
+ str(v))
+ break
+ return v
+ s = tensor_shape.as_shape(v)
+ ret = tensor_shape_pb2.TensorShapeProto()
+ for i in s.as_dimension_list():
+ ret.dim.add(size=i)
+ return ret
+
+
+def _MakeTensor(v, arg_name):
+ """Ensure v is a TensorProto."""
+ if isinstance(v, tensor_pb2.TensorProto):
+ return v
+ raise TypeError(
+ "Don't know how to convert %s to a TensorProto for argument '%s'" %
+ (repr(v), arg_name))
+
+
+class _OpInfo(object):
+ """All per-Op state we would like to precompute/validate."""
+
+ def __init__(self, op_def):
+ self.op_def = op_def
+ # TODO(josh11b): SWIG the ValidateOpDef() function from C++ and call it
+ # here, instead of these checks.
+ for arg in list(op_def.input_arg) + list(op_def.output_arg):
+ num_type_fields = _NumTypeFields(arg)
+ if num_type_fields != 1:
+ raise TypeError("Arg '%s' of '%s' must have one type field not %d" %
+ (arg.name, op_def.name, num_type_fields))
+ if arg.type_attr:
+ attr_type = _Attr(op_def, arg.type_attr).type
+ if attr_type != "type":
+ raise TypeError("Attr '%s' of '%s' used as a type_attr "
+ "but has type %s" %
+ (arg.type_attr, op_def.name, attr_type))
+ if arg.type_list_attr:
+ attr_type = _Attr(op_def, arg.type_list_attr).type
+ if attr_type != "list(type)":
+ raise TypeError(
+ "Attr '%s' of '%s' used as a type_list_attr but has type %s" %
+ (arg.type_list_attr, op_def.name, attr_type))
+ if arg.number_attr:
+ attr_type = _Attr(op_def, arg.number_attr).type
+ if attr_type != "int":
+ raise TypeError(
+ "Attr '%s' of '%s' used as a number_attr but has type %s" %
+ (arg.number_attr, op_def.name, attr_type))
+
+
+class OpDefLibrary(object):
+ """Holds a collection of OpDefs, can add the corresponding Ops to a graph."""
+
+ def __init__(self):
+ self._ops = {}
+
+ def add_op(self, op_def):
+ """Register an OpDef. May call apply_op with the name afterwards."""
+ if not isinstance(op_def, op_def_pb2.OpDef):
+ raise TypeError("%s is %s, not an op_def_pb2.OpDef" %
+ (op_def, type(op_def)))
+ if not op_def.name:
+ raise ValueError("%s missing name." % op_def)
+ if op_def.name in self._ops:
+ raise RuntimeError("Op name %s registered twice." % op_def.name)
+ self._ops[op_def.name] = _OpInfo(op_def)
+
+ def add_op_list(self, op_list):
+ """Register the OpDefs from an OpList."""
+ if not isinstance(op_list, op_def_pb2.OpList):
+ raise TypeError("%s is %s, not an op_def_pb2.OpList" %
+ (op_list, type(op_list)))
+ for op_def in op_list.op:
+ self.add_op(op_def)
+
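+ # Illustrative sketch (not part of the original change), mirroring the test
+ # file below: parse an OpDef from its text form, register it, then build a
+ # node with apply_op(). Variable names are hypothetical, and a default
+ # graph is assumed to be active, as in the tests.
+ #
+ #   from google.protobuf import text_format
+ #   op_def = op_def_pb2.OpDef()
+ #   text_format.Merge("name: 'Simple' input_arg { name: 'a' type: DT_INT32 } "
+ #                     "output_arg { name: 'out' type: DT_FLOAT }", op_def)
+ #   lib = OpDefLibrary()
+ #   lib.add_op(op_def)
+ #   out = lib.apply_op("Simple", a=3)  # a float32 Tensor from node "Simple"
+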
+ def apply_op(self, op_type_name, g=None, name=None, **keywords):
+ # pylint: disable=g-doc-args
+ """Add a node invoking a registered Op to a graph.
+
+ Config proto extensions must be provided via the 'ext' keyword argument.
+ Example usage:
+ # input1 and input2 can be Tensors or anything ops.convert_to_tensor()
+ # will convert to a Tensor.
+ op_def_library.apply_op("op", input1=input1, input2=input2)
+ # If none of the inputs are Tensors and your session doesn't have a
+ # default graph, you will have to specify the graph.
+ op_def_library.apply_op("op", input1=input1, g=g)
+ # Can specify a node name.
+ op_def_library.apply_op("op", input1=input1, name="node_name")
+ # Must use keyword arguments, with the names specified in the OpDef.
+ op_def_library.apply_op("op", input_name=input, attr_name=attr)
+
+ All attrs must either be inferred from an input or specified.
+ (If inferred, the attr must not be specified.) If an attr has a default
+ value specified in the Op's OpDef, then you may pass None as the value
+ of that attr to get the default.
+
+ Args:
+ op_type_name: string. Must match the name field of a registered Op.
+ g: The graph context (optional)
+ name: string. Optional name of the created op.
+ **keywords: input Tensor and attr arguments specified by name,
+ and optional parameters to pass when constructing the Operation.
+
+ Returns:
+ The Tensor(s) representing the output of the operation, or the Operation
+ itself if there are no outputs.
+
+ Raises:
+ RuntimeError: If `op_type_name` is not registered, or if the graph
+ context cannot be determined from the inputs.
+ TypeError: If an input or attr has the wrong type, is missing, or is
+ specified unexpectedly.
+ ValueError: If an attr value violates a constraint from the OpDef, such
+ as a list shorter than its minimum length or a string outside the
+ allowed values.
+ """
+ op_info = self._ops.get(op_type_name, None)
+ if op_info is None:
+ raise RuntimeError("Unrecognized Op name " + op_type_name)
+ op_def = op_info.op_def
+
+ # Determine the graph context.
+ try:
+ # Need to flatten all the arguments into a list.
+ # pylint: disable=protected-access
+ g = ops._get_graph_from_inputs(_Flatten(keywords.values()), graph=g)
+ # pylint: enable=protected-access
+ except AssertionError as e:
+ raise RuntimeError(
+ "Need to specify g=graph to Op '%s' (could not determine graph due "
+ "to: %s)" % (op_type_name, e.message))
+
+ # Default name if not specified.
+ if name is None:
+ name = op_type_name
+
+ # Requires that op_def has passed validation (using the C++
+ # ValidateOpDef() from ../framework/op_def_util.h).
+ attrs = {}
+ inputs = []
+ input_types = []
+ with g.as_default(), ops.name_scope(name) as scope:
+
+ # Perform input type inference
+ inferred_from = {}
+ for input_arg in op_def.input_arg:
+ input_name = input_arg.name
+ if input_name in keywords:
+ values = keywords.pop(input_name)
+ elif input_name + "_" in keywords:
+ # Handle the case where the name is a Python keyword or built-in,
+ # in which case callers pass the argument as name + "_".
+ input_name += "_"
+ values = keywords.pop(input_name)
+ else:
+ raise TypeError("No argument for input " + input_name)
+
+ # Goals:
+ # * Convert values to Tensors if it contains constants.
+ # * Verify that values is a list if that matches the input_arg's
+ # type.
+ # * If the input_arg's type is determined by attrs, either set
+ # those attrs and validate those attr values are legal (if
+ # they have not yet been set) or validate the input matches
+ # the type indicated by the attrs (if they have already been
+ # inferred via an earlier input).
+ # * If the input_arg has an explicit type, make sure the input
+ # conforms.
+
+ if _IsListParameter(input_arg):
+ if not _IsListValue(values):
+ raise TypeError(
+ "Expected list for '%s' argument to '%s' Op, not %s." %
+ (input_name, op_type_name, values))
+ # In cases where we expect all elements of the list to have the
+ # same dtype, try to cast non-Tensor elements to that type.
+ dtype = None
+ if input_arg.type != types_pb2.DT_INVALID:
+ dtype = input_arg.type
+ elif input_arg.number_attr:
+ if input_arg.type_attr in attrs:
+ dtype = attrs[input_arg.type_attr]
+ else:
+ for t in values:
+ if isinstance(t, ops.Tensor):
+ dtype = t.dtype
+ break
+
+ try:
+ values = ops.convert_n_to_tensor_or_indexed_slices(
+ values, name=input_arg.name,
+ dtype=types_lib.as_dtype(dtype).base_dtype if dtype else None)
+ except (TypeError, ValueError):
+ assert dtype is not None, "Should not fail if dtype is None"
+ assert input_arg.number_attr, "Should be number_attr case"
+ # What types does the conversion function think values have?
+ values = ops.convert_n_to_tensor_or_indexed_slices(values)
+ observed = ", ".join(v.dtype.base_dtype.name for v in values)
+
+ prefix = (
+ "Tensors in list passed to '%s' of '%s' Op have types [%s]" %
+ (input_name, op_type_name, observed))
+ if input_arg.type != types_pb2.DT_INVALID:
+ raise TypeError("%s that do not match expected type %s." %
+ (prefix, types_lib.as_dtype(dtype).name))
+ elif input_arg.type_attr in attrs:
+ raise TypeError("%s that do not match type %s inferred from "
+ "earlier arguments." %
+ (prefix, types_lib.as_dtype(dtype).name))
+ else:
+ raise TypeError("%s that don't all match." % prefix)
+
+ types = [x.dtype for x in values]
+ inputs.extend(values)
+ else:
+ # In cases where we have an expected type, try to convert non-Tensor
+ # arguments to that type.
+ dtype = None
+ if input_arg.type != types_pb2.DT_INVALID:
+ dtype = input_arg.type
+ elif input_arg.type_attr in attrs:
+ dtype = attrs[input_arg.type_attr]
+
+ try:
+ values = ops.convert_to_tensor(
+ values, name=input_arg.name, dtype=dtype)
+ except ValueError:
+ # What type does convert_to_tensor think it has?
+ observed = ops.convert_to_tensor(values).dtype.name
+ prefix = ("Input '%s' of '%s' Op has type %s that does not match" %
+ (input_name, op_type_name, observed))
+ if input_arg.type != types_pb2.DT_INVALID:
+ raise TypeError("%s expected type of %s." %
+ (prefix, types_lib.as_dtype(input_arg.type).name))
+ else:
+ raise TypeError(
+ "%s type %s of argument '%s'." %
+ (prefix, types_lib.as_dtype(attrs[input_arg.type_attr]).name,
+ inferred_from[input_arg.type_attr]))
+
+ types = [values.dtype]
+ inputs.append(values)
+ base_types = [x.base_dtype for x in types]
+
+ if input_arg.number_attr:
+ # <number-attr> * <type> or <number-attr> * <type-attr>
+ if input_arg.number_attr in attrs:
+ if len(values) != attrs[input_arg.number_attr]:
+ raise ValueError(
+ "List argument '%s' to '%s' Op with length %d must match "
+ "length %d of argument '%s'." %
+ (input_name, op_type_name, len(values),
+ attrs[input_arg.number_attr],
+ inferred_from[input_arg.number_attr]))
+ else:
+ attrs[input_arg.number_attr] = len(values)
+ inferred_from[input_arg.number_attr] = input_name
+ num_attr = _Attr(op_def, input_arg.number_attr)
+ if num_attr.has_minimum and len(values) < num_attr.minimum:
+ raise ValueError(
+ "List argument '%s' to '%s' Op with length %d shorter "
+ "than minimum length %d." %
+ (input_name, op_type_name, len(values), num_attr.minimum))
+ # All tensors must have the same base type.
+ if any([bt != base_types[0] for bt in base_types]):
+ raise TypeError(
+ "All tensors passed to '%s' of '%s' Op "
+ "must have the same type." %
+ (input_name, op_type_name))
+ if input_arg.type != types_pb2.DT_INVALID:
+ # <number-attr> * <type> case
+ if base_types and base_types[0] != input_arg.type:
+ assert False, "Unreachable"
+ elif input_arg.type_attr in attrs:
+ # <number-attr> * <type-attr> case, where <type-attr> already
+ # has an inferred value.
+ if base_types and base_types[0] != attrs[input_arg.type_attr]:
+ assert False, "Unreachable"
+ else:
+ # <number-attr> * <type-attr> case, where we are now setting
+ # the <type-attr> based on this input
+ if not base_types:
+ raise TypeError(
+ "Don't know how to infer type variable from empty input "
+ "list passed to input '%s' of '%s' Op." %
+ (input_name, op_type_name))
+ attrs[input_arg.type_attr] = base_types[0]
+ inferred_from[input_arg.type_attr] = input_name
+ type_attr = _Attr(op_def, input_arg.type_attr)
+ _SatisfiesTypeConstraint(base_types[0], type_attr)
+ elif input_arg.type_attr:
+ # <type-attr>
+ attr_value = base_types[0]
+ if input_arg.type_attr in attrs:
+ if attrs[input_arg.type_attr] != attr_value:
+ assert False, "Unreachable"
+ else:
+ for base_type in base_types:
+ _SatisfiesTypeConstraint(base_type,
+ _Attr(op_def, input_arg.type_attr))
+ attrs[input_arg.type_attr] = attr_value
+ inferred_from[input_arg.type_attr] = input_name
+ elif input_arg.type_list_attr:
+ # <type-list-attr>
+ attr_value = base_types
+ if input_arg.type_list_attr in attrs:
+ if attrs[input_arg.type_list_attr] != attr_value:
+ raise TypeError(
+ "Input '%s' of '%s' Op has type list of %s that does not "
+ "match type list %s of argument '%s'." %
+ (input_name, op_type_name,
+ ", ".join(types_lib.as_dtype(x).name for x in attr_value),
+ ", ".join(types_lib.as_dtype(x).name
+ for x in attrs[input_arg.type_list_attr]),
+ inferred_from[input_arg.type_list_attr]))
+ else:
+ for base_type in base_types:
+ _SatisfiesTypeConstraint(base_type,
+ _Attr(op_def, input_arg.type_list_attr))
+ attrs[input_arg.type_list_attr] = attr_value
+ inferred_from[input_arg.type_list_attr] = input_name
+ else:
+ # single Tensor with specified type
+ if base_types[0] != input_arg.type:
+ assert False, "Unreachable"
+
+ if input_arg.is_ref:
+ if not all(x.is_ref_dtype for x in types):
+ raise TypeError(
+ "Input '%s' of '%s' Op requires l-value input" %
+ (input_name, op_type_name))
+ input_types.extend(types)
+ else:
+ input_types.extend(base_types)
+
+ # Process remaining attrs
+ for attr in op_def.attr:
+ # Skip attrs that have already had their values inferred
+ if attr.name in attrs:
+ if attr.name in keywords:
+ raise TypeError(
+ "Should not specify value for inferred attr '%s'." % attr.name)
+ continue
+ if attr.name in keywords:
+ attrs[attr.name] = keywords.pop(attr.name)
+ elif attr.name + "_" in keywords:
+ # Attrs whose names match Python keywords have an extra '_'
+ # appended, so we must check for that as well.
+ attrs[attr.name] = keywords.pop(attr.name + "_")
+ else:
+ raise TypeError("No argument for attr " + attr.name)
+
+ # Convert attr values to AttrValue protos.
+ attr_protos = {}
+ for attr_def in op_def.attr:
+ key = attr_def.name
+ value = attrs[key]
+ attr_value = attr_value_pb2.AttrValue()
+ if attr_def.HasField("default_value") and value is None:
+ attr_value.CopyFrom(attr_def.default_value)
+ attr_protos[key] = attr_value
+ continue
+ if attr_def.type.startswith("list("):
+ if not _IsListValue(value):
+ raise TypeError("Expected list for attr " + key)
+ if attr_def.has_minimum:
+ if len(value) < attr_def.minimum:
+ raise ValueError("Attr '%s' of '%s' Op passed list of length %d "
+ "less than minimum %d." %
+ (key, op_type_name, len(value),
+ attr_def.minimum))
+ if attr_def.type == "string":
+ attr_value.s = _MakeStr(value, key)
+ if attr_def.HasField("allowed_values"):
+ if attr_value.s not in attr_def.allowed_values.list.s:
+ raise ValueError(
+ "Attr '%s' of '%s' Op passed string '%s' not in: \"%s\"." %
+ (key, op_type_name, attr_value.s,
+ '", "'.join(attr_def.allowed_values.list.s)))
+ elif attr_def.type == "list(string)":
+ attr_value.list.s.extend([_MakeStr(x, key) for x in value])
+ if attr_def.HasField("allowed_values"):
+ for x in attr_value.list.s:
+ if x not in attr_def.allowed_values.list.s:
+ raise ValueError(
+ "Attr '%s' of '%s' Op passed string '%s' not in: \"%s\"." %
+ (key, op_type_name, x,
+ '", "'.join(attr_def.allowed_values.list.s)))
+ elif attr_def.type == "int":
+ attr_value.i = _MakeInt(value, key)
+ if attr_def.has_minimum:
+ if attr_value.i < attr_def.minimum:
+ raise ValueError(
+ "Attr '%s' of '%s' Op passed %d less than minimum %d." %
+ (key, op_type_name, attr_value.i, attr_def.minimum))
+ elif attr_def.type == "list(int)":
+ attr_value.list.i.extend([_MakeInt(x, key) for x in value])
+ elif attr_def.type == "float":
+ attr_value.f = _MakeFloat(value, key)
+ elif attr_def.type == "list(float)":
+ attr_value.list.f.extend([_MakeFloat(x, key) for x in value])
+ elif attr_def.type == "bool":
+ attr_value.b = _MakeBool(value, key)
+ elif attr_def.type == "list(bool)":
+ attr_value.list.b.extend([_MakeBool(x, key) for x in value])
+ elif attr_def.type == "type":
+ attr_value.type = _MakeType(value, attr_def)
+ elif attr_def.type == "list(type)":
+ attr_value.list.type.extend(
+ [_MakeType(x, attr_def) for x in value])
+ elif attr_def.type == "shape":
+ attr_value.shape.CopyFrom(_MakeShape(value, key))
+ elif attr_def.type == "list(shape)":
+ attr_value.list.shape.extend(
+ [_MakeShape(x, key) for x in value])
+ elif attr_def.type == "tensor":
+ attr_value.tensor.CopyFrom(_MakeTensor(value, key))
+ elif attr_def.type == "list(tensor)":
+ attr_value.list.tensor.extend(
+ [_MakeTensor(x, key) for x in value])
+ else:
+ raise TypeError("Unrecognized Attr type " + attr_def.type)
+
+ attr_protos[key] = attr_value
+ del attrs # attrs is no longer authoritative, use attr_protos instead
+
+ # Determine output types (possibly using attrs)
+ output_types = []
+ output_structure = []
+ for arg in op_def.output_arg:
+ types = []
+ if arg.number_attr:
+ n = _AttrValue(attr_protos, arg.number_attr).i
+ if arg.type_attr:
+ types = [_AttrValue(attr_protos, arg.type_attr).type] * n
+ else:
+ types = [arg.type] * n
+ output_structure.append(n)
+ elif arg.type_attr:
+ t = _AttrValue(attr_protos, arg.type_attr)
+ types = [t.type]
+ output_structure.append(None)
+ elif arg.type_list_attr:
+ t = _AttrValue(attr_protos, arg.type_list_attr)
+ types = t.list.type
+ output_structure.append(len(t.list.type))
+ else:
+ types = [arg.type]
+ output_structure.append(None)
+ if arg.is_ref:
+ types = [types_lib.as_dtype(x).as_ref for x in types]
+ output_types.extend(types)
+
+ if keywords:
+ raise TypeError("apply_op() got unexpected keyword arguments: " +
+ ", ".join(sorted(keywords.keys())))
+
+ # Add Op to graph
+ if output_structure:
+ op = g.create_op(op_type_name, inputs, output_types, name=scope,
+ input_types=input_types, attrs=attr_protos,
+ op_def=op_def)
+ outputs = op.outputs
+ return _Restructure(ops.convert_n_to_tensor_or_indexed_slices(outputs),
+ output_structure)
+ else:
+ return g.create_op(op_type_name, inputs, output_types, name=scope,
+ input_types=input_types, attrs=attr_protos,
+ op_def=op_def)
diff --git a/tensorflow/python/ops/op_def_library_test.py b/tensorflow/python/ops/op_def_library_test.py
new file mode 100644
index 0000000000..72de4586a3
--- /dev/null
+++ b/tensorflow/python/ops/op_def_library_test.py
@@ -0,0 +1,1402 @@
+"""Tests for tensorflow.python.ops.op_def_library."""
+
+from google.protobuf import text_format
+
+from tensorflow.core.framework import op_def_pb2
+from tensorflow.core.framework import tensor_shape_pb2
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+from tensorflow.python.ops.op_def_library import OpDefLibrary
+from tensorflow.python.platform import googletest
+
+
+# NOTE(mrry): Dummy shape registrations for ops used in the tests.
+ops.RegisterShape("Attr")(None)
+ops.RegisterShape("AttrBool")(None)
+ops.RegisterShape("AttrBoolList")(None)
+ops.RegisterShape("AttrDefault")(None)
+ops.RegisterShape("AttrEmptyListDefault")(None)
+ops.RegisterShape("AttrEnum")(None)
+ops.RegisterShape("AttrEnumList")(None)
+ops.RegisterShape("AttrFloat")(None)
+ops.RegisterShape("AttrListDefault")(None)
+ops.RegisterShape("AttrListMin")(None)
+ops.RegisterShape("AttrMin")(None)
+ops.RegisterShape("AttrShape")(None)
+ops.RegisterShape("AttrShapeList")(None)
+ops.RegisterShape("Binary")(None)
+ops.RegisterShape("ComplexStruct")(None)
+ops.RegisterShape("InPolymorphicTwice")(None)
+ops.RegisterShape("MixedStruct")(None)
+ops.RegisterShape("NInPolymorphicTwice")(None)
+ops.RegisterShape("NInTwice")(None)
+ops.RegisterShape("NInTwoTypeVariables")(None)
+ops.RegisterShape("NIntsIn")(None)
+ops.RegisterShape("NIntsOut")(None)
+ops.RegisterShape("NIntsOutDefault")(None)
+ops.RegisterShape("NPolymorphicIn")(None)
+ops.RegisterShape("NPolymorphicOut")(None)
+ops.RegisterShape("NPolymorphicOutDefault")(None)
+ops.RegisterShape("NPolymorphicRestrictIn")(None)
+ops.RegisterShape("NPolymorphicRestrictOut")(None)
+ops.RegisterShape("OutT")(None)
+ops.RegisterShape("OutTypeList")(None)
+ops.RegisterShape("OutTypeListRestrict")(None)
+ops.RegisterShape("Polymorphic")(None)
+ops.RegisterShape("PolymorphicDefaultOut")(None)
+ops.RegisterShape("PolymorphicOut")(None)
+ops.RegisterShape("RefIn")(None)
+ops.RegisterShape("RefOut")(None)
+ops.RegisterShape("ReservedAttr")(None)
+ops.RegisterShape("ReservedInput")(None)
+ops.RegisterShape("Restrict")(None)
+ops.RegisterShape("Simple")(None)
+ops.RegisterShape("SimpleStruct")(None)
+ops.RegisterShape("TypeList")(None)
+ops.RegisterShape("TypeListRestrict")(None)
+ops.RegisterShape("TypeListTwice")(None)
+
+
+class OpDefLibraryTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ self._lib = OpDefLibrary()
+ self._g = ops.Graph()
+ self._default_graph_controller = self._g.as_default()
+ self._default_graph_controller.__enter__()
+ self._add_op("name: 'Simple' input_arg { name: 'a' type: DT_INT32 } "
+ "output_arg { name: 'out' type: DT_FLOAT }")
+ self._add_op("name: 'OutT' output_arg { name: 'a' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+
+ def tearDown(self):
+ self._default_graph_controller.__exit__(None, None, None)
+
+ def _add_op(self, ascii):
+ op_def = op_def_pb2.OpDef()
+ text_format.Merge(ascii, op_def)
+ self._lib.add_op(op_def)
+
+ def Tensor(self, t, name="in"):
+ return self._lib.apply_op("OutT", T=t, name=name)
+
+ def testNoRegisteredOpFails(self):
+ with self.assertRaises(RuntimeError) as cm:
+ self._lib.apply_op("unknown", g=self._g)
+ self.assertEqual(cm.exception.message, "Unrecognized Op name unknown")
+
+ def testAddOpValidation(self):
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'MissingTypeAttr' "
+ "input_arg { name: 'a' type_attr: 'T' } ")
+ self.assertEqual(cm.exception.message,
+ "Inconsistent OpDef for 'MissingTypeAttr', "
+ "missing attr 'T'")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'BadTypeAttr' "
+ "output_arg { name: 'a' type_attr: 'T' } "
+ "attr { name: 'T' type: 'int' }")
+ self.assertEqual(
+ cm.exception.message,
+ "Attr 'T' of 'BadTypeAttr' used as a type_attr but has type int")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'MissingNumberAttr' "
+ "input_arg { name: 'a' type: DT_INT32 number_attr: 'N' } ")
+ self.assertEqual(cm.exception.message,
+ "Inconsistent OpDef for 'MissingNumberAttr', "
+ "missing attr 'N'")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'BadNumberAttr' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'N' } "
+ "attr { name: 'N' type: 'type' }")
+ self.assertEqual(
+ cm.exception.message,
+ "Attr 'N' of 'BadNumberAttr' used as a number_attr but has type type")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'TwoTypesA' "
+ "input_arg { name: 'a' type: DT_INT32 type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+ self.assertEqual(cm.exception.message,
+ "Arg 'a' of 'TwoTypesA' must have one type field not 2")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'TwoTypesB' "
+ "input_arg { name: 'a' type: DT_INT32 type_list_attr: 'T' } "
+ "attr { name: 'T' type: 'list(type)' }")
+ self.assertEqual(cm.exception.message,
+ "Arg 'a' of 'TwoTypesB' must have one type field not 2")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'ThreeTypes' "
+ "input_arg { name: 'a' type: DT_INT32 type_attr: 'T' "
+ "type_list_attr: 'U' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'U' type: 'list(type)' }")
+ self.assertEqual(cm.exception.message,
+ "Arg 'a' of 'ThreeTypes' must have one type field not 3")
+
+ with self.assertRaises(TypeError) as cm:
+ self._add_op("name: 'NoTypes' output_arg { name: 'a' } ")
+ self.assertEqual(cm.exception.message,
+ "Arg 'a' of 'NoTypes' must have one type field not 0")
+
+ def testSimple(self):
+ out = self._lib.apply_op("Simple", a=3)
+ self.assertEquals(types.float32, out.dtype)
+ self.assertProtoEquals("""
+ name: 'Simple' op: 'Simple' input: 'Simple/a'
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Simple", a=4)
+ self.assertProtoEquals("""
+ name: 'Simple_1' op: 'Simple' input: 'Simple_1/a'
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Simple", a=5, name="named")
+ self.assertProtoEquals("""
+ name: 'named' op: 'Simple' input: 'named/a'
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Simple", a=[[1, 2, 3], [4, 5, 6]], name="two_d")
+ self.assertProtoEquals("""
+ name: 'two_d' op: 'Simple' input: 'two_d/a'
+ """, out.op.node_def)
+
+ def testSimpleFailures(self):
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", a="Bad string")
+ self.assertEqual(cm.exception.message,
+ "Expected int32, got 'Bad string' instead.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", a=self.Tensor(types.string))
+ self.assertEqual(cm.exception.message,
+ "Input 'a' of 'Simple' Op has type string "
+ "that does not match expected type of int32.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", a=6, extra="bogus")
+ self.assertEqual(cm.exception.message,
+ "apply_op() got unexpected keyword arguments: extra")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", a=6, extra1="bogus", extra2="also_bogus")
+ self.assertEqual(cm.exception.message,
+ "apply_op() got unexpected keyword arguments: extra1, "
+ "extra2")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple")
+ self.assertEqual(cm.exception.message, "No argument for input a")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", wrong=7)
+ self.assertEqual(cm.exception.message, "No argument for input a")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Simple", a=[self.Tensor(types.int32)])
+ self.assertStartsWith(cm.exception.message, "Expected int32, got")
+
+ def testReservedInput(self):
+ self._add_op("name: 'ReservedInput' "
+ "input_arg { name: 'input' type: DT_INT32 } ")
+ op = self._lib.apply_op("ReservedInput", input_=7, name="x")
+ self.assertProtoEquals("""
+ name: 'x' op: 'ReservedInput' input: 'x/input'
+ """, op.node_def)
+
+ def testPolymorphic(self):
+ self._add_op("name: 'Polymorphic' "
+ "input_arg { name: 'a' type_attr: 'T' } "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+
+ out = self._lib.apply_op("Polymorphic", a=7, name="p")
+ self.assertEquals(types.int32, out.dtype)
+ self.assertProtoEquals("""
+ name: 'p' op: 'Polymorphic' input: 'p/a'
+ attr { key: 'T' value { type: DT_INT32 } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Polymorphic", a="s", name="q")
+ self.assertEquals(types.string, out.dtype)
+ self.assertProtoEquals("""
+ name: 'q' op: 'Polymorphic' input: 'q/a'
+ attr { key: 'T' value { type: DT_STRING } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Polymorphic", a=["s", "t", "u"], name="r")
+ self.assertEquals(types.string, out.dtype)
+ self.assertProtoEquals("""
+ name: 'r' op: 'Polymorphic' input: 'r/a'
+ attr { key: 'T' value { type: DT_STRING } }
+ """, out.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Polymorphic", a="s", T=types.string)
+ self.assertEqual(cm.exception.message,
+ "Should not specify value for inferred attr 'T'.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Polymorphic", a=[self.Tensor(types.bool)])
+ self.assertEqual(cm.exception.message,
+ "List of Tensors when single Tensor expected")
+
+ def testPolymorphicOut(self):
+ self._add_op("name: 'PolymorphicOut' "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+
+ out = self._lib.apply_op("PolymorphicOut", T=types.int32, name="p")
+ self.assertEquals(types.int32, out.dtype)
+ self.assertProtoEquals("""
+ name: 'p' op: 'PolymorphicOut'
+ attr { key: 'T' value { type: DT_INT32 } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("PolymorphicOut", T=types.bool, name="q")
+ self.assertEquals(types.bool, out.dtype)
+ self.assertProtoEquals("""
+ name: 'q' op: 'PolymorphicOut'
+ attr { key: 'T' value { type: DT_BOOL } }
+ """, out.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("PolymorphicOut")
+ self.assertEqual(cm.exception.message,
+ "No argument for attr T")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("PolymorphicOut", T=None)
+ self.assertEqual(cm.exception.message,
+ "Expected DataType for argument 'T' not None.")
+
+ def testPolymorphicDefaultOut(self):
+ self._add_op("name: 'PolymorphicDefaultOut' "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' "
+ " default_value { type: DT_STRING } }")
+
+ out = self._lib.apply_op("PolymorphicDefaultOut", T=None, name="p")
+ self.assertEquals(types.string, out.dtype)
+ self.assertProtoEquals("""
+ name: 'p' op: 'PolymorphicDefaultOut'
+ attr { key: 'T' value { type: DT_STRING } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("PolymorphicDefaultOut", T=types.bool,
+ name="q")
+ self.assertEquals(types.bool, out.dtype)
+ self.assertProtoEquals("""
+ name: 'q' op: 'PolymorphicDefaultOut'
+ attr { key: 'T' value { type: DT_BOOL } }
+ """, out.op.node_def)
+
+ def testBinary(self):
+ self._add_op("name: 'Binary' "
+ "input_arg { name: 'a' type_attr: 'T' } "
+ "input_arg { name: 'b' type_attr: 'T' } "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+
+ out = self._lib.apply_op("Binary", a=8, b=9, name="b")
+ self.assertEquals(types.int32, out.dtype)
+ self.assertProtoEquals("""
+ name: 'b' op: 'Binary' input: 'b/a' input: 'b/b'
+ attr { key: 'T' value { type: DT_INT32 } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Binary", a="left", b="right", name="c")
+ self.assertEquals(types.string, out.dtype)
+ self.assertProtoEquals("""
+ name: 'c' op: 'Binary' input: 'c/a' input: 'c/b'
+ attr { key: 'T' value { type: DT_STRING } }
+ """, out.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Binary", a="left", b=12)
+ self.assertEqual(cm.exception.message,
+ "Expected string, got 12 instead.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Binary", a=self.Tensor(types.string),
+ b=self.Tensor(types.int32))
+ self.assertEqual(cm.exception.message,
+ "Input 'b' of 'Binary' Op has type int32 "
+ "that does not match type string of argument 'a'.")
+
+ def testRestrict(self):
+ self._add_op("name: 'Restrict' "
+ "input_arg { name: 'a' type_attr: 'T' } "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' allowed_values { list { "
+ " type: DT_STRING type: DT_BOOL } } }")
+
+ out = self._lib.apply_op("Restrict", a="foo", name="g")
+ self.assertEquals(types.string, out.dtype)
+ self.assertProtoEquals("""
+ name: 'g' op: 'Restrict' input: 'g/a'
+ attr { key: 'T' value { type: DT_STRING } }
+ """, out.op.node_def)
+
+ out = self._lib.apply_op("Restrict", a=True, name="h")
+ self.assertEquals(types.bool, out.dtype)
+ self.assertProtoEquals("""
+ name: 'h' op: 'Restrict' input: 'h/a'
+ attr { key: 'T' value { type: DT_BOOL } }
+ """, out.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Restrict", a=17)
+ self.assertEqual(cm.exception.message,
+ "DataType int32 for attr 'T' "
+ "not in list of allowed values: "
+ "string, bool")
+
+ def testTypeList(self):
+ self._add_op("name: 'TypeList' "
+ "input_arg { name: 'a' type_list_attr: 'T' } "
+ "attr { name: 'T' type: 'list(type)' }")
+
+ op = self._lib.apply_op("TypeList", a=["foo"], name="z")
+ self.assertProtoEquals("""
+ name: 'z' op: 'TypeList' input: 'z/a_0'
+ attr { key: 'T' value { list { type: DT_STRING } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("TypeList", a=[True, 12], name="y")
+ self.assertProtoEquals("""
+ name: 'y' op: 'TypeList' input: 'y/a_0' input: 'y/a_1'
+ attr { key: 'T' value { list { type: DT_BOOL type: DT_INT32 } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("TypeList", a=[], name="empty")
+ self.assertProtoEquals("""
+ name: 'empty' op: 'TypeList' attr { key: 'T' value { list { } } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("TypeList", a=17)
+ self.assertStartsWith(cm.exception.message,
+ "Expected list for 'a' "
+ "argument to 'TypeList' Op, not ")
+
+ def testTypeListTwice(self):
+ self._add_op("name: 'TypeListTwice' "
+ "input_arg { name: 'a' type_list_attr: 'T' } "
+ "input_arg { name: 'b' type_list_attr: 'T' } "
+ "attr { name: 'T' type: 'list(type)' }")
+
+ op = self._lib.apply_op("TypeListTwice", a=["foo", True], b=["bar", False],
+ name="z")
+ self.assertProtoEquals("""
+ name: 'z' op: 'TypeListTwice'
+ input: 'z/a_0' input: 'z/a_1' input: 'z/b_0' input: 'z/b_1'
+ attr { key: 'T' value { list { type: DT_STRING type: DT_BOOL } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("TypeListTwice", a=[], b=[], name="empty")
+ self.assertProtoEquals("""
+ name: 'empty' op: 'TypeListTwice' attr { key: 'T' value { list { } } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("TypeListTwice", a=["foo", True], b=["bar", 6])
+ self.assertEqual(cm.exception.message,
+ "Input 'b' of 'TypeListTwice' Op has type list of "
+ "string, int32 that does not match type list "
+ "string, bool of argument 'a'.")
+
+ def testOutTypeList(self):
+ self._add_op("name: 'OutTypeList' "
+ "output_arg { name: 'out' type_list_attr: 'T' } "
+ "attr { name: 'T' type: 'list(type)' }")
+
+ out, = self._lib.apply_op("OutTypeList", T=[types.float32], name="x")
+ self.assertEquals(types.float32, out.dtype)
+ self.assertProtoEquals("""
+ name: 'x' op: 'OutTypeList'
+ attr { key: 'T' value { list { type: DT_FLOAT } } }
+ """, out.op.node_def)
+
+ out1, out2 = self._lib.apply_op("OutTypeList",
+ T=[types.int32, types.bool],
+ name="w")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.bool, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'w' op: 'OutTypeList'
+ attr { key: 'T' value { list { type: DT_INT32 type: DT_BOOL } } }
+ """, out1.op.node_def)
+
+ out = self._lib.apply_op("OutTypeList", T=[], name="empty")
+ self.assertEqual([], out)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("OutTypeList", T=types.int32)
+ self.assertEqual(cm.exception.message, "Expected list for attr T")
+
+ def testTypeListRestrict(self):
+ self._add_op("name: 'TypeListRestrict' "
+ "input_arg { name: 'a' type_list_attr: 'T' } "
+ "attr { name: 'T' type: 'list(type)' allowed_values { list { "
+ " type: DT_STRING type: DT_BOOL } } }")
+
+ op = self._lib.apply_op("TypeListRestrict", a=["foo", False], name="v")
+ self.assertProtoEquals("""
+ name: 'v' op: 'TypeListRestrict' input: 'v/a_0' input: 'v/a_1'
+ attr { key: 'T' value { list { type: DT_STRING type: DT_BOOL } } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("TypeListRestrict", a=[True, 12])
+ self.assertEqual(cm.exception.message,
+ "DataType int32 for attr 'T' "
+ "not in list of allowed values: string, bool")
+
+ def testOutTypeListRestrict(self):
+ self._add_op("name: 'OutTypeListRestrict' "
+ "output_arg { name: 'out' type_list_attr: 't' } "
+ "attr { name: 't' type: 'list(type)' allowed_values { list { "
+ " type: DT_STRING type: DT_BOOL } } }")
+
+ out1, out2 = self._lib.apply_op("OutTypeListRestrict",
+ t=[types.bool, types.string],
+ name="u")
+ self.assertEquals(types.bool, out1.dtype)
+ self.assertEquals(types.string, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'u' op: 'OutTypeListRestrict'
+ attr { key: 't' value { list { type: DT_BOOL type: DT_STRING } } }
+ """, out1.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("OutTypeListRestrict",
+ t=[types.string, types.int32])
+ self.assertEqual(cm.exception.message,
+ "DataType int32 for attr 't' "
+ "not in list of allowed values: string, bool")
+
+ def testAttr(self):
+ self._add_op("name: 'Attr' attr { name: 'a' type: 'int' }")
+ op = self._lib.apply_op("Attr", a=12, name="t")
+ self.assertProtoEquals("""
+ name: 't' op: 'Attr' attr { key: 'a' value { i: 12 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("Attr", a=tensor_shape.Dimension(13), name="u")
+ self.assertProtoEquals("""
+ name: 'u' op: 'Attr' attr { key: 'a' value { i: 13 } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Attr", a="bad")
+ self.assertEqual(cm.exception.message,
+ "Expected int for argument 'a' not 'bad'.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Attr", a=[12])
+ self.assertEqual(cm.exception.message,
+ "Expected int for argument 'a' not [12].")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Attr", a=None)
+ self.assertEqual(cm.exception.message,
+ "Expected int for argument 'a' not None.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("Attr")
+ self.assertEqual(cm.exception.message, "No argument for attr a")
+
+ def testAttrFloat(self):
+ self._add_op("name: 'AttrFloat' attr { name: 'a' type: 'float' }")
+
+ op = self._lib.apply_op("AttrFloat", a=1.2, name="t")
+ self.assertProtoEquals("""
+ name: 't' op: 'AttrFloat' attr { key: 'a' value { f: 1.2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrFloat", a=12, name="u")
+ self.assertProtoEquals("""
+ name: 'u' op: 'AttrFloat' attr { key: 'a' value { f: 12 } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("AttrFloat", a="bad")
+ self.assertEqual(cm.exception.message,
+ "Expected float for argument 'a' not 'bad'.")
+
+ def testAttrBool(self):
+ self._add_op("name: 'AttrBool' attr { name: 'a' type: 'bool' }")
+
+ op = self._lib.apply_op("AttrBool", a=True, name="t")
+ self.assertProtoEquals("""
+ name: 't' op: 'AttrBool' attr { key: 'a' value { b: true } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrBool", a=False, name="u")
+ self.assertProtoEquals("""
+ name: 'u' op: 'AttrBool' attr { key: 'a' value { b: false } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("AttrBool", a=0)
+ self.assertEqual(cm.exception.message,
+ "Expected bool for argument 'a' not 0.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("AttrBool", a=1)
+ self.assertEqual(cm.exception.message,
+ "Expected bool for argument 'a' not 1.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("AttrBool", a=[])
+ self.assertEqual(cm.exception.message,
+ "Expected bool for argument 'a' not [].")
+
+ def testAttrBoolList(self):
+ self._add_op("name: 'AttrBoolList' attr { name: 'a' type: 'list(bool)' }")
+
+ op = self._lib.apply_op("AttrBoolList", a=[True, False, True], name="t")
+ self.assertProtoEquals("""
+ name: 't' op: 'AttrBoolList'
+ attr { key: 'a' value { list { b: true b: false b: true } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrBoolList", a=[], name="u")
+ self.assertProtoEquals("""
+ name: 'u' op: 'AttrBoolList' attr { key: 'a' value { list { } } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("AttrBoolList", a=[0])
+ self.assertEqual(cm.exception.message,
+ "Expected bool for argument 'a' not 0.")
+
+ def testAttrMin(self):
+ self._add_op("name: 'AttrMin' attr { name: 'a' type: 'int' "
+ "has_minimum: true minimum: 5 }")
+ op = self._lib.apply_op("AttrMin", a=12, name="s")
+ self.assertProtoEquals("""
+ name: 's' op: 'AttrMin' attr { key: 'a' value { i: 12 } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("AttrMin", a=2)
+ self.assertEqual(cm.exception.message,
+ "Attr 'a' of 'AttrMin' Op passed 2 less than minimum 5.")
+
+ def testAttrListMin(self):
+ self._add_op("name: 'AttrListMin' attr { name: 'a' type: 'list(int)' "
+ "has_minimum: true minimum: 2 }")
+
+ op = self._lib.apply_op("AttrListMin", a=[1, 2], name="r")
+ self.assertProtoEquals("""
+ name: 'r' op: 'AttrListMin'
+ attr { key: 'a' value { list { i: 1 i: 2 } } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("AttrListMin", a=[17])
+ self.assertEqual(cm.exception.message,
+ "Attr 'a' of 'AttrListMin' Op "
+ "passed list of length 1 less than minimum 2.")
+
+ def testAttrEnum(self):
+ self._add_op("name: 'AttrEnum' "
+ "attr { name: 'a' type: 'string' "
+ " allowed_values { list { s: 'apples' s: 'oranges' } } }")
+
+ op = self._lib.apply_op("AttrEnum", a="oranges", name="e")
+ self.assertProtoEquals("""
+ name: 'e' op: 'AttrEnum' attr { key: 'a' value { s: 'oranges' } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("AttrEnum", a="invalid")
+ self.assertEqual(cm.exception.message,
+ 'Attr \'a\' of \'AttrEnum\' Op '
+ 'passed string \'invalid\' not in: '
+ '"apples", "oranges".')
+
+ def testAttrEnumList(self):
+ self._add_op("name: 'AttrEnumList' "
+ "attr { name: 'a' type: 'list(string)' "
+ " allowed_values { list { s: 'apples' s: 'oranges' } } }")
+
+ op = self._lib.apply_op("AttrEnumList", a=["oranges", "apples"], name="f")
+ self.assertProtoEquals("""
+ name: 'f' op: 'AttrEnumList'
+ attr { key: 'a' value { list { s: 'oranges' s: 'apples' } } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("AttrEnumList", a=["apples", "invalid", "oranges"])
+ self.assertEqual(cm.exception.message,
+ 'Attr \'a\' of \'AttrEnumList\' Op '
+ 'passed string \'invalid\' not '
+ 'in: "apples", "oranges".')
+
+ def testAttrShape(self):
+ self._add_op("name: 'AttrShape' attr { name: 'a' type: 'shape' }")
+
+ op = self._lib.apply_op("AttrShape", a=[5], name="s1")
+ self.assertProtoEquals("""
+ name: 's1' op: 'AttrShape'
+ attr { key: 'a' value { shape { dim { size: 5 } } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrShape", a=(4, 3, 2), name="s2")
+ self.assertProtoEquals("""
+ name: 's2' op: 'AttrShape'
+ attr { key: 'a' value {
+ shape { dim { size: 4 } dim { size: 3 } dim { size: 2 } } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op(
+ "AttrShape", a=tensor_shape.TensorShape([3, 2]), name="s3")
+ self.assertProtoEquals("""
+ name: 's3' op: 'AttrShape'
+ attr { key: 'a' value {
+ shape { dim { size: 3 } dim { size: 2 } } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrShape", a=[], name="s4")
+ self.assertProtoEquals("""
+ name: 's4' op: 'AttrShape' attr { key: 'a' value { shape { } } }
+ """, op.node_def)
+
+ shape = tensor_shape_pb2.TensorShapeProto()
+ shape.dim.add().size = 6
+ shape.dim.add().size = 3
+ op = self._lib.apply_op("AttrShape", a=shape, name="s5")
+ self.assertProtoEquals("""
+ name: 's5' op: 'AttrShape'
+ attr { key: 'a' value { shape { dim { size: 6 } dim { size: 3 } } } }
+ """, op.node_def)
+
+ # TODO(josh11b): Re-enable this test once we stop promoting scalars to shapes.
+ # with self.assertRaises(TypeError) as cm:
+ # self._lib.apply_op("AttrShape", a=5)
+ # self.assertEqual(cm.exception.message,
+ # "Don't know how to convert 5 to a TensorShapeProto for "
+ # "argument 'a'")
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("AttrShape", a="ABC")
+
+ def testAttrShapeList(self):
+ self._add_op("name: 'AttrShapeList' attr { name: 'a' type: 'list(shape)' }")
+
+ op = self._lib.apply_op("AttrShapeList", a=[[3, 2], [6, 5, 4]], name="sl")
+ self.assertProtoEquals("""
+ name: 'sl' op: 'AttrShapeList'
+ attr { key: 'a' value { list {
+ shape { dim { size: 3 } dim { size: 2 } }
+ shape { dim { size: 6 } dim { size: 5 } dim { size: 4 } } } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrShapeList", a=[], name="esl")
+ self.assertProtoEquals("""
+ name: 'esl' op: 'AttrShapeList' attr { key: 'a' value { list { } } }
+ """, op.node_def)
+
+ def testAttrDefault(self):
+ self._add_op("name: 'AttrDefault' "
+ "attr { name: 'a' type: 'string' "
+ " default_value { s: 'banana' } }")
+
+ op = self._lib.apply_op("AttrDefault", a=None, name="d")
+ self.assertProtoEquals("""
+ name: 'd' op: 'AttrDefault' attr { key: 'a' value { s: 'banana' } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrDefault", a="kiwi", name="c")
+ self.assertProtoEquals("""
+ name: 'c' op: 'AttrDefault' attr { key: 'a' value { s: 'kiwi' } }
+ """, op.node_def)
+
+ def testAttrListDefault(self):
+ self._add_op("name: 'AttrListDefault' "
+ "attr { name: 'a' type: 'list(int)' "
+ " default_value { list { i: 5 i: 15 } } }")
+
+ op = self._lib.apply_op("AttrListDefault", a=None, name="b")
+ self.assertProtoEquals("""
+ name: 'b' op: 'AttrListDefault'
+ attr { key: 'a' value { list { i: 5 i: 15 } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrListDefault", a=[3], name="a")
+ self.assertProtoEquals("""
+ name: 'a' op: 'AttrListDefault'
+ attr { key: 'a' value { list { i: 3 } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrListDefault", a=[], name="empty")
+ self.assertProtoEquals("""
+ name: 'empty' op: 'AttrListDefault'
+ attr { key: 'a' value { list { } } }
+ """, op.node_def)
+
+ def testAttrEmptyListDefault(self):
+ self._add_op("name: 'AttrEmptyListDefault' "
+ "attr { name: 'a' type: 'list(float)' "
+ " default_value { list { } } }")
+
+ op = self._lib.apply_op("AttrEmptyListDefault", a=None, name="b")
+ self.assertProtoEquals("""
+ name: 'b' op: 'AttrEmptyListDefault'
+ attr { key: 'a' value { list { } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrEmptyListDefault", a=[3], name="a")
+ self.assertProtoEquals("""
+ name: 'a' op: 'AttrEmptyListDefault'
+ attr { key: 'a' value { list { f: 3 } } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("AttrEmptyListDefault", a=[], name="empty")
+ self.assertProtoEquals("""
+ name: 'empty' op: 'AttrEmptyListDefault'
+ attr { key: 'a' value { list { } } }
+ """, op.node_def)
+
+ def testReservedAttr(self):
+ self._add_op("name: 'ReservedAttr' "
+ "attr { name: 'range' type: 'int' } ")
+ op = self._lib.apply_op("ReservedAttr", range_=7, name="x")
+ self.assertProtoEquals("""
+ name: 'x' op: 'ReservedAttr' attr { key: 'range' value { i: 7 } }
+ """, op.node_def)
+
+ def testNIntsIn(self):
+ self._add_op("name: 'NIntsIn' "
+ "input_arg { name: 'a' type: DT_INT32 number_attr: 'N' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ op = self._lib.apply_op("NIntsIn", a=[1, 2], name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'NIntsIn' input: 'n/a_0' input: 'n/a_1'
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NIntsIn", a=[5, 4, 3, 2, 1], name="o")
+ self.assertProtoEquals("""
+ name: 'o' op: 'NIntsIn'
+ input: 'o/a_0' input: 'o/a_1' input: 'o/a_2' input: 'o/a_3' input: 'o/a_4'
+ attr { key: 'N' value { i: 5 } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsIn", a=["foo", "bar"])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NIntsIn' Op have types "
+ "[string, string] that do not match expected type int32.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsIn", a=[self.Tensor(types.string),
+ self.Tensor(types.string)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NIntsIn' Op have "
+ "types [string, string] that do not match expected type "
+ "int32.")
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NIntsIn", a=[99])
+ self.assertEqual(cm.exception.message,
+ "List argument 'a' to 'NIntsIn' Op "
+ "with length 1 shorter than "
+ "minimum length 2.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsIn", a=[38, "bar"])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NIntsIn' Op have types "
+ "[int32, string] that do not match expected type int32.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsIn", a=[self.Tensor(types.int32),
+ self.Tensor(types.string)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NIntsIn' Op "
+ "have types [int32, string] that do not match expected "
+ "type int32.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsIn", a=17)
+ self.assertStartsWith(cm.exception.message,
+ "Expected list for 'a' argument "
+ "to 'NIntsIn' Op, not ")
+
+ def testNPolymorphicIn(self):
+ self._add_op("name: 'NPolymorphicIn' "
+ "input_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ op = self._lib.apply_op("NPolymorphicIn", a=[1, 2], name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'NPolymorphicIn' input: 'n/a_0' input: 'n/a_1'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NPolymorphicIn", a=[5, 4, 3, 2, 1], name="o")
+ self.assertProtoEquals("""
+ name: 'o' op: 'NPolymorphicIn'
+ input: 'o/a_0' input: 'o/a_1' input: 'o/a_2' input: 'o/a_3' input: 'o/a_4'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 5 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NPolymorphicIn", a=["foo", "bar"], name="p")
+ self.assertProtoEquals("""
+ name: 'p' op: 'NPolymorphicIn' input: 'p/a_0' input: 'p/a_1'
+ attr { key: 'T' value { type: DT_STRING } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NPolymorphicIn",
+ a=[1, self.Tensor(types.float32, name="x")],
+ name="q")
+ self.assertProtoEquals("""
+ name: 'q' op: 'NPolymorphicIn' input: 'q/a_0' input: 'x'
+ attr { key: 'T' value { type: DT_FLOAT } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NPolymorphicIn", a=[99])
+ self.assertEqual(cm.exception.message,
+ "List argument 'a' to 'NPolymorphicIn' Op with length 1 "
+ "shorter than minimum length 2.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicIn", a=[38, "bar"])
+ self.assertEqual(cm.exception.message,
+ "All tensors passed to 'a' of 'NPolymorphicIn' "
+ "Op must have the same type.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicIn",
+ a=[38, self.Tensor(types.string)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NPolymorphicIn' Op "
+ "have types [int32, string] that don't all match.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicIn",
+ a=["abcd", self.Tensor(types.int32)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'a' of 'NPolymorphicIn' Op "
+ "have types [string, int32] that don't all match.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicIn", a=17)
+ self.assertStartsWith(cm.exception.message,
+ "Expected list for 'a' argument "
+ "to 'NPolymorphicIn' Op, not ")
+
+ def testNPolymorphicRestrictIn(self):
+ self._add_op("name: 'NPolymorphicRestrictIn' "
+ "input_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type' allowed_values { "
+ " list { type: DT_STRING type: DT_BOOL } } } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ op = self._lib.apply_op("NPolymorphicRestrictIn", a=["foo", "bar"],
+ name="p")
+ self.assertProtoEquals("""
+ name: 'p' op: 'NPolymorphicRestrictIn' input: 'p/a_0' input: 'p/a_1'
+ attr { key: 'T' value { type: DT_STRING } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NPolymorphicRestrictIn", a=[False, True, False],
+ name="b")
+ self.assertProtoEquals("""
+ name: 'b' op: 'NPolymorphicRestrictIn'
+ input: 'b/a_0' input: 'b/a_1' input: 'b/a_2'
+ attr { key: 'T' value { type: DT_BOOL } }
+ attr { key: 'N' value { i: 3 } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicRestrictIn", a=[1, 2])
+ self.assertEqual(cm.exception.message,
+ "DataType int32 for attr 'T' "
+ "not in list of allowed values: string, bool")
+
+ def testNInTwice(self):
+ self._add_op("name: 'NInTwice' "
+ "input_arg { name: 'a' type: DT_INT32 number_attr: 'N' } "
+ "input_arg { name: 'b' type: DT_STRING number_attr: 'N' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 0 }")
+
+ op = self._lib.apply_op("NInTwice", a=[1, 2], b=["one", "two"], name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'NInTwice'
+ input: 'n/a_0' input: 'n/a_1' input: 'n/b_0' input: 'n/b_1'
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NInTwice", a=[], b=[], name="o")
+ self.assertProtoEquals("""
+ name: 'o' op: 'NInTwice' attr { key: 'N' value { i: 0 } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NInTwice", a=[1, 2, 3], b=["too short"])
+ self.assertEqual(cm.exception.message,
+ "List argument 'b' to 'NInTwice' Op "
+ "with length 1 must match "
+ "length 3 of argument 'a'.")
+
+ def testNInPolymorphicTwice(self):
+ self._add_op("name: 'NInPolymorphicTwice' "
+ "input_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "input_arg { name: 'b' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 0 }")
+
+ op = self._lib.apply_op("NInPolymorphicTwice", a=[1, 2], b=[3, 4], name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'NInPolymorphicTwice'
+ input: 'n/a_0' input: 'n/a_1' input: 'n/b_0' input: 'n/b_1'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NInPolymorphicTwice", a=[1, 2, 3], b=[5])
+ self.assertEqual(cm.exception.message,
+ "List argument 'b' to 'NInPolymorphicTwice' Op "
+ "with length 1 "
+ "must match length 3 of argument 'a'.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NInPolymorphicTwice", a=[1, 2], b=["one", "two"])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'b' of 'NInPolymorphicTwice' "
+ "Op have types [string, string] that do not match type "
+ "int32 inferred from earlier arguments.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NInPolymorphicTwice",
+ a=[self.Tensor(types.int32)],
+ b=[self.Tensor(types.string)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'b' of "
+ "'NInPolymorphicTwice' Op have types [string] that do not "
+ "match type int32 inferred from earlier arguments.")
+
+ def testNInTwoTypeVariables(self):
+ self._add_op("name: 'NInTwoTypeVariables' "
+ "input_arg { name: 'a' type_attr: 'S' number_attr: 'N' } "
+ "input_arg { name: 'b' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'S' type: 'type' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 0 }")
+
+ op = self._lib.apply_op("NInTwoTypeVariables", a=[1, 2], b=[True, False],
+ name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'NInTwoTypeVariables'
+ input: 'n/a_0' input: 'n/a_1' input: 'n/b_0' input: 'n/b_1'
+ attr { key: 'S' value { type: DT_INT32 } }
+ attr { key: 'T' value { type: DT_BOOL } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NInTwoTypeVariables", a=[1, 2], b=[3, 4], name="o")
+ self.assertProtoEquals("""
+ name: 'o' op: 'NInTwoTypeVariables'
+ input: 'o/a_0' input: 'o/a_1' input: 'o/b_0' input: 'o/b_1'
+ attr { key: 'S' value { type: DT_INT32 } }
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 2 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("NInTwoTypeVariables",
+ a=[self.Tensor(types.int32, name="q")],
+ b=[self.Tensor(types.string, name="r")],
+ name="p")
+ self.assertProtoEquals("""
+ name: 'p' op: 'NInTwoTypeVariables' input: 'q' input: 'r'
+ attr { key: 'S' value { type: DT_INT32 } }
+ attr { key: 'T' value { type: DT_STRING } }
+ attr { key: 'N' value { i: 1 } }
+ """, op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NInTwoTypeVariables", a=[1, 2, 3], b=["5"])
+ self.assertEqual(cm.exception.message,
+ "List argument 'b' to 'NInTwoTypeVariables' Op "
+ "with length 1 "
+ "must match length 3 of argument 'a'.")
+
+ def testInPolymorphicTwice(self):
+ self._add_op("name: 'InPolymorphicTwice' "
+ "input_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "input_arg { name: 'b' type_attr: 'T' number_attr: 'M' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 0 } "
+ "attr { name: 'M' type: 'int' has_minimum: true minimum: 0 } ")
+
+ op = self._lib.apply_op("InPolymorphicTwice", a=[8], b=[3, 4, 5], name="n")
+ self.assertProtoEquals("""
+ name: 'n' op: 'InPolymorphicTwice'
+ input: 'n/a_0' input: 'n/b_0' input: 'n/b_1' input: 'n/b_2'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 1 } }
+ attr { key: 'M' value { i: 3 } }
+ """, op.node_def)
+
+ op = self._lib.apply_op("InPolymorphicTwice", a=[8], b=[], name="o")
+ self.assertProtoEquals("""
+ name: 'o' op: 'InPolymorphicTwice' input: 'o/a_0'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 1 } }
+ attr { key: 'M' value { i: 0 } }
+ """, op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("InPolymorphicTwice", a=[], b=[3, 4, 5])
+ self.assertEqual(cm.exception.message,
+ "Don't know how to infer type variable from empty input "
+ "list passed to input 'a' of 'InPolymorphicTwice' Op.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("InPolymorphicTwice", a=[1, 2], b=["one", "two"])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'b' of 'InPolymorphicTwice' Op "
+ "have types [string, string] that do not match type int32 "
+ "inferred from earlier arguments.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("InPolymorphicTwice",
+ a=[self.Tensor(types.int32)],
+ b=[self.Tensor(types.string)])
+ self.assertEqual(cm.exception.message,
+ "Tensors in list passed to 'b' of 'InPolymorphicTwice' "
+ "Op have types [string] that do not match type int32 "
+ "inferred from earlier arguments.")
+
+ def testNIntsOut(self):
+ self._add_op("name: 'NIntsOut' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'N' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ out1, out2 = self._lib.apply_op("NIntsOut", N=2, name="n")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'n' op: 'NIntsOut' attr { key: 'N' value { i: 2 } }
+ """, out1.op.node_def)
+
+ out1, out2, out3, out4, out5 = self._lib.apply_op(
+ "NIntsOut", N=5, name="o")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertEquals(types.int32, out3.dtype)
+ self.assertEquals(types.int32, out4.dtype)
+ self.assertEquals(types.int32, out5.dtype)
+ self.assertProtoEquals("""
+ name: 'o' op: 'NIntsOut' attr { key: 'N' value { i: 5 } }
+ """, out5.op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NIntsOut", N=1)
+ self.assertEqual(cm.exception.message,
+ "Attr 'N' of 'NIntsOut' Op passed 1 less than minimum 2.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NIntsOut", N=[3])
+ self.assertEqual(cm.exception.message,
+ "Expected int for argument 'N' not [3].")
+
+ def testNIntsOutDefault(self):
+ self._add_op("name: 'NIntsOutDefault' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'N' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2"
+ " default_value { i:3 } }")
+
+ out1, out2, out3 = self._lib.apply_op(
+ "NIntsOutDefault", N=None, name="z")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertEquals(types.int32, out3.dtype)
+ self.assertProtoEquals("""
+ name: 'z' op: 'NIntsOutDefault' attr { key: 'N' value { i: 3 } }
+ """, out1.op.node_def)
+
+ out1, out2 = self._lib.apply_op("NIntsOutDefault", N=2, name="y")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'y' op: 'NIntsOutDefault' attr { key: 'N' value { i: 2 } }
+ """, out2.op.node_def)
+
+ def testNPolymorphicOut(self):
+ self._add_op("name: 'NPolymorphicOut' "
+ "output_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type' } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ out1, out2 = self._lib.apply_op("NPolymorphicOut", N=2,
+ T=types.int32, name="n")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'n' op: 'NPolymorphicOut'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 2 } }
+ """, out1.op.node_def)
+
+ out1, out2, out3 = self._lib.apply_op(
+ "NPolymorphicOut", T=types.string, N=3, name="o")
+ self.assertEquals(types.string, out1.dtype)
+ self.assertEquals(types.string, out2.dtype)
+ self.assertEquals(types.string, out3.dtype)
+ self.assertProtoEquals("""
+ name: 'o' op: 'NPolymorphicOut'
+ attr { key: 'T' value { type: DT_STRING } }
+ attr { key: 'N' value { i: 3 } }
+ """, out3.op.node_def)
+
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("NPolymorphicOut", N=1, T=types.string)
+ self.assertEqual(cm.exception.message,
+ "Attr 'N' of 'NPolymorphicOut' Op "
+ "passed 1 less than minimum 2.")
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicOut", N=3, T=[types.string])
+ self.assertEqual(
+ cm.exception.message,
+ "Expected DataType for argument 'T' not [tf.string].")
+
+ def testNPolymorphicOutDefault(self):
+ self._add_op("name: 'NPolymorphicOutDefault' "
+ "output_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type'"
+ " default_value { type: DT_BOOL } } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 "
+ " default_value { i: 2 } }")
+
+ out1, out2 = self._lib.apply_op(
+ "NPolymorphicOutDefault", N=None, T=None, name="r")
+ self.assertEquals(types.bool, out1.dtype)
+ self.assertEquals(types.bool, out2.dtype)
+ self.assertProtoEquals("""
+ name: 'r' op: 'NPolymorphicOutDefault'
+ attr { key: 'T' value { type: DT_BOOL } }
+ attr { key: 'N' value { i: 2 } }
+ """, out1.op.node_def)
+
+ out1, out2, out3 = self._lib.apply_op(
+ "NPolymorphicOutDefault", N=3, T=None, name="s")
+ self.assertEquals(types.bool, out1.dtype)
+ self.assertEquals(types.bool, out2.dtype)
+ self.assertEquals(types.bool, out3.dtype)
+ self.assertProtoEquals("""
+ name: 's' op: 'NPolymorphicOutDefault'
+ attr { key: 'T' value { type: DT_BOOL } }
+ attr { key: 'N' value { i: 3 } }
+ """, out1.op.node_def)
+
+ out1, out2 = self._lib.apply_op(
+ "NPolymorphicOutDefault", N=None, T=types.int32, name="t")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertProtoEquals("""
+ name: 't' op: 'NPolymorphicOutDefault'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 2 } }
+ """, out1.op.node_def)
+
+ out1, out2, out3 = self._lib.apply_op(
+ "NPolymorphicOutDefault", N=3, T=types.int32, name="u")
+ self.assertEquals(types.int32, out1.dtype)
+ self.assertEquals(types.int32, out2.dtype)
+ self.assertEquals(types.int32, out3.dtype)
+ self.assertProtoEquals("""
+ name: 'u' op: 'NPolymorphicOutDefault'
+ attr { key: 'T' value { type: DT_INT32 } }
+ attr { key: 'N' value { i: 3 } }
+ """, out1.op.node_def)
+
+ def testNPolymorphicRestrictOut(self):
+ self._add_op("name: 'NPolymorphicRestrictOut' "
+ "output_arg { name: 'a' type_attr: 'T' number_attr: 'N' } "
+ "attr { name: 'T' type: 'type' allowed_values { "
+ " list { type: DT_STRING type: DT_BOOL } } } "
+ "attr { name: 'N' type: 'int' has_minimum: true minimum: 2 }")
+
+ out1, out2, out3 = self._lib.apply_op(
+ "NPolymorphicRestrictOut", N=3, T=types.bool, name="u")
+ self.assertEquals(types.bool, out1.dtype)
+ self.assertEquals(types.bool, out2.dtype)
+ self.assertEquals(types.bool, out3.dtype)
+ self.assertProtoEquals("""
+ name: 'u' op: 'NPolymorphicRestrictOut'
+ attr { key: 'T' value { type: DT_BOOL } }
+ attr { key: 'N' value { i: 3 } }
+ """, out1.op.node_def)
+
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("NPolymorphicRestrictOut", N=2, T=types.int32)
+ self.assertEqual(cm.exception.message,
+ "DataType int32 for attr 'T' "
+ "not in list of allowed values: string, bool")
+
+ def testRef(self):
+ self._add_op("name: 'RefIn' "
+ "input_arg { name: 'a' type_attr: 'T' is_ref: true } "
+ "attr { name: 'T' type: 'type' } ")
+ self._add_op("name: 'RefOut' "
+ "output_arg { name: 'a' type_attr: 'T' is_ref: true } "
+ "attr { name: 'T' type: 'type' } ")
+
+ out = self._lib.apply_op("RefOut", T=types.bool, name="o")
+ self.assertEquals(types.bool_ref, out.dtype)
+ self.assertProtoEquals("""
+ name: 'o' op: 'RefOut'
+ attr { key: 'T' value { type: DT_BOOL } }
+ """, out.op.node_def)
+
+ op = self._lib.apply_op("RefIn", a=out, name="i")
+ self.assertProtoEquals("""
+ name: 'i' op: 'RefIn' input: 'o'
+ attr { key: 'T' value { type: DT_BOOL } }
+ """, op.node_def)
+
+ # Can pass ref to non-ref input.
+ out = self._lib.apply_op("RefOut", T=types.int32, name="r")
+ out = self._lib.apply_op("Simple", a=out, name="s")
+ self.assertProtoEquals("""
+ name: 's' op: 'Simple' input: 'r'
+ """, out.op.node_def)
+
+ # Can't pass non-ref to ref input.
+ with self.assertRaises(TypeError) as cm:
+ self._lib.apply_op("RefIn", a=2)
+ self.assertEqual(cm.exception.message,
+ "Input 'a' of 'RefIn' Op requires l-value input")
+
+ def testSpecifyDevice(self):
+ with self._g.device("ADevice"):
+ self._lib.apply_op("Simple", a=3)
+ # We look at the whole graph here to make sure the Const op is also given
+ # the specified device.
+ graph_def = self._g.as_graph_def()
+ self.assertEqual(len(graph_def.node), 2)
+ for node in graph_def.node:
+ self.assertEqual(node.device, "ADevice")
+
+ def testStructuredOutputSingleList(self):
+ self._add_op("name: 'SimpleStruct' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'n_a' } "
+ "attr { name: 'n_a' type: 'int' }")
+ for n_a in [0, 1, 3]:
+ a = self._lib.apply_op("SimpleStruct", n_a=n_a)
+ self.assertTrue(isinstance(a, list))
+ self.assertEqual(n_a, len(a))
+
+ def testStructuredOutputListAndSingle(self):
+ self._add_op("name: 'MixedStruct' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'n_a' } "
+ "output_arg { name: 'b' type: DT_FLOAT } "
+ "attr { name: 'n_a' type: 'int' }")
+ for n_a in [0, 1, 3]:
+ a, b = self._lib.apply_op("MixedStruct", n_a=n_a)
+ self.assertTrue(isinstance(a, list))
+ self.assertEqual(n_a, len(a))
+ self.assertTrue(all(x.dtype == types.int32 for x in a))
+ self.assertTrue(isinstance(b, ops.Tensor))
+ self.assertEqual(types.float32, b.dtype)
+
+ def testStructuredOutputMultipleLists(self):
+ self._add_op("name: 'ComplexStruct' "
+ "output_arg { name: 'a' type: DT_INT32 number_attr: 'n_a' } "
+ "output_arg { name: 'b' type: DT_INT64 number_attr: 'n_b' } "
+ "output_arg { name: 'c' type_list_attr: 't_c' } "
+ "attr { name: 'n_a' type: 'int' } "
+ "attr { name: 'n_b' type: 'int' } "
+ "attr { name: 't_c' type: 'list(type)' }")
+ for n_a in [0, 1, 3]:
+ for n_b in [0, 1, 3]:
+ for t_c in [[],
+ [types.int32],
+ [types.int32, types.float32]]:
+ a, b, c = self._lib.apply_op("ComplexStruct",
+ n_a=n_a, n_b=n_b, t_c=t_c)
+
+ self.assertEqual(n_a, len(a))
+ self.assertTrue(all(x.dtype == types.int32 for x in a))
+ self.assertEqual(n_b, len(b))
+ self.assertTrue(all(x.dtype == types.int64 for x in b))
+ self.assertEqual(t_c, [x.dtype for x in c])
+
+
+class OpDefLibraryGraphTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ self._lib = OpDefLibrary()
+ self._g = ops.Graph()
+ self._add_op("name: 'Simple' input_arg { name: 'a' type: DT_INT32 } "
+ "output_arg { name: 'out' type: DT_FLOAT }")
+ self._add_op("name: 'Binary' "
+ "input_arg { name: 'a' type_attr: 'T' } "
+ "input_arg { name: 'b' type_attr: 'T' } "
+ "output_arg { name: 'out' type_attr: 'T' } "
+ "attr { name: 'T' type: 'type' }")
+
+ def _add_op(self, ascii):
+ op_def = op_def_pb2.OpDef()
+ text_format.Merge(ascii, op_def)
+ self._lib.add_op(op_def)
+
+ def testNoGraph(self):
+ out = self._lib.apply_op("Simple", a=3)
+ self.assertEquals(out.graph, ops.get_default_graph())
+
+ def testDefaultGraph(self):
+ with self._g.as_default():
+ out = self._lib.apply_op("Simple", a=3)
+ self.assertEquals(out.graph, self._g)
+
+ def testIgnoreDefaultGraphWithGraphArgument(self):
+ default_g = ops.Graph()
+ with default_g.as_default():
+ out = self._lib.apply_op("Simple", a=3, g=self._g)
+ self.assertEquals(ops.get_default_graph(), default_g)
+ self.assertEquals(out.graph, self._g)
+
+ def testDifferentGraphFails(self):
+ a = self._lib.apply_op("Simple", a=3, g=self._g)
+ other_g = ops.Graph()
+ b = self._lib.apply_op("Simple", a=4, g=other_g)
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("Binary", a=a, b=b)
+ self.assertTrue("must be from the same graph" in cm.exception.message)
+
+ def testDifferentGraphFailsWithGraphArgument(self):
+ other_g = ops.Graph()
+ a = self._lib.apply_op("Simple", a=3, g=other_g)
+ b = self._lib.apply_op("Simple", a=4, g=other_g)
+ with self.assertRaises(ValueError) as cm:
+ self._lib.apply_op("Binary", a=a, b=b, g=self._g)
+ self.assertTrue(
+ "not from the passed-in graph" in cm.exception.message)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/parsing_ops.py b/tensorflow/python/ops/parsing_ops.py
new file mode 100644
index 0000000000..dc954a3776
--- /dev/null
+++ b/tensorflow/python/ops/parsing_ops.py
@@ -0,0 +1,390 @@
+"""Parsing Ops."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import gen_parsing_ops
+from tensorflow.python.ops import logging_ops
+from tensorflow.python.ops import math_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.gen_parsing_ops import *
+
+
+ops.NoGradient("DecodeRaw")
+ops.NoGradient("StringToNumber")
+
+
+# pylint: disable=protected-access
+def parse_example(serialized,
+ names=None,
+ sparse_keys=None,
+ sparse_types=None,
+ dense_keys=None,
+ dense_types=None,
+ dense_defaults=None,
+ dense_shapes=None,
+ name="ParseExample"):
+ """Parse Example protos.
+
+ Args:
+ serialized: string vector, a batch of binary serialized Example protos.
+ names: A string vector, the names of the serialized protos.
+ "names" may contain, e.g., table key (descriptive) names for the
+      corresponding serialized protos. These are purely for debugging
+ purposes, and the presence of values here has no effect on the output.
+ "names" may be an empty vector, if no names are available.
+ If non-empty, this vector must be the same length as "serialized".
+ sparse_keys: A string list of keys in the Examples' features.
+ These keys are associated with sparse values.
+ sparse_types: A list of DTypes.
+ This list's length must match that of sparse_keys. Currently
+ parse_example supports tf.float32 (FloatList), tf.int64 (Int64List),
+ and tf.string (BytesList).
+ dense_keys: A string list of keys in the Examples' features.
+ These keys are associated with dense values.
+ dense_types: A list of DTypes.
+ This list's length must match that of dense_keys. Currently
+ parse_example supports tf.float32 (FloatList), tf.int64 (Int64List),
+ and tf.string (BytesList).
+ dense_defaults: A dict of {key:Tensor} (some may be missing).
+ The keys of the dict must match the dense_keys of the feature.
+ If a key is not present in this dictionary, the corresponding dense
+ Feature is required in all elements of serialized.
+ dense_shapes: A list of tuples.
+ Entries provide the shape of data in each dense Feature in features.
+ The length of dense_shapes must be the same as the length of dense_keys.
+ The number of elements in the Feature corresponding to dense_key[j]
+ must always have np.prod(dense_shapes[j]) entries.
+      If dense_shapes[j] == (D0, D1, ..., DN) then the shape of output
+ Tensor dense_values[j] will be (|serialized|, D0, D1, ..., DN):
+ The dense outputs are just the inputs row-stacked by batch.
+ name: (Optional) Name of Op in the graph.
+
+ Returns:
+ A dictionary mapping keys to Tensors and SparseTensors.
+
+ The key dense_keys[j] is mapped to a tensor of type dense_types[j] and
+ of shape (serialized.size(),) + dense_shapes[j] (i.e., the dense outputs are
+ inputs, reshaped in row-major format and then row-stacked by batch).
+
+ The key sparse_keys[j] is mapped to a SparseTensor of type sparse_types[j].
+ The SparseTensor represents a ragged matrix. Its indices are [batch, index]
+ where "batch" is is the batch entry the value is from, and "index" is the
+ value's index in the list of values associated with that feature
+ and example. For example, if one expects a tf.float32 sparse feature "ft"
+ and three serialized examples are provided:
+
+ serialized = [
+ features:
+ { feature: [ key: { "ft" value: float_list: { value: [1.0, 2.0] } } ] },
+ features:
+ { feature: [] },
+ features:
+ { feature: [ key: { "ft" value: float_list: { value: [3.0] } } ] }
+ ]
+
+ then the output will look like:
+
+ {"ft": SparseTensor(indices=[[0, 0], [0, 1], [2, 0]],
+ values=[1.0, 2.0, 3.0],
+ shape=(3, 2)) }
+
+ Raises:
+ ValueError: If sparse and dense keys intersect, or input lengths do not
+ match up for sparse_* (similarly for dense_*).
+ TypeError: If an input is malformed.
+
+ Example input, format, and output: Just Sparse Inputs
+  =====================================================
+
+ Given two brain.Example input protos:
+
+ serialized: // serialized versions of the protos below
+ [features: {
+ feature: { key: "kw" value: { bytes_list: { value: [ "knit", "big" ] } } }
+ feature: { key: "gps" value: { float_list: { value: [] } } }
+ },
+ features: {
+ feature: { key: "kw" value: { bytes_list: { value: [ "emmy" ] } } }
+ feature: { key: "dank" value: { int64_list: { value: [ 42 ] } } }
+ feature: { key: "gps" value: { } }
+ }]
+ names: ["input0", "input1"],
+ sparse_keys: ["kw", "dank", "gps"]
+ sparse_types: [DT_STRING, DT_INT64, DT_FLOAT]
+
+ Then the expected output is a dictionary:
+ {
+ "kw": SparseTensor(
+ indices=[[0, 0], [0, 1], [1, 0]],
+ values=["knit", "big", "emmy"]
+ shape=[2, 2]),
+ "dank": SparseTensor(
+ indices=[[1, 0]],
+ values=[42],
+ shape=[2, 1]),
+ "gps": SparseTensor(
+ indices=[],
+ values=[],
+ shape=[2, 0]),
+ }
+
+
+ Example input, format, and output: Dense Inputs (without defaults)
+ ==================================================================
+
+ Given two brain.Example input protos:
+
+ serialized: // serialized versions of the protos below
+ [features: {
+ feature: { key: "age" value: { int64_list: { value: [ 0 ] } } }
+ feature: { key: "gender" value: { bytes_list: { value: [ "f" ] } } }
+ },
+ features: {
+ feature: { key: "age" value: { int64_list: { value: [] } } }
+ feature: { key: "gender" value: { bytes_list: { value: [ "f" ] } } }
+ }]
+ names: ["input0", "input1"],
+ dense_keys: np.array(["age", "gender"])
+ dense_types: [tf.int64, tf.string]
+ dense_defaults: {
+ "age": -1 # defaults to -1 if missing
+ # "gender" has no specified default so it's required
+ }
+    dense_shapes: [(1,), (1,)]  # age, gender
+
+ Then the expected output is a dictionary:
+ {
+ "age": [[0], [-1]],
+ "gender": [["f"], ["f"]],
+ }
+
+
+ Example input, format, and output: Dense Inputs (with defaults)
+ ===============================================================
+
+ Given two brain.Example input protos:
+
+ serialized: // serialized versions of the protos below
+ [features: {
+ feature: { key: "weight" value: { float_list: { value: [ 1.0 ] } } }
+ },
+ features: {
+ feature: { key: "label" value: { float_list: { value: [ -1.0, 0.0 ] } } }
+ }]
+ names: ["input0", "input1"],
+ dense_keys: np.array(["label", "weight"])
+ dense_defaults: {
+ "label": [1.0, 2.0], # float (default: vector)
+ "weight": 5.0 # float (default: scalar, 5.0)
+ }
+    dense_shapes: [(2,), (1,)]  # label, weight
+
+ Then the expected output is a dictionary:
+ {
+ "label": [[1.0, 2.0], [-1.0, 0.0]],
+ "weight": [[1.0], [5.0]],
+ }
+ """
+ names = [] if names is None else names
+ dense_defaults = {} if dense_defaults is None else dense_defaults
+ sparse_keys = [] if sparse_keys is None else sparse_keys
+ sparse_types = [] if sparse_types is None else sparse_types
+ dense_keys = [] if dense_keys is None else dense_keys
+ dense_types = [] if dense_types is None else dense_types
+ dense_shapes = [
+ []] * len(dense_keys) if dense_shapes is None else dense_shapes
+
+ num_dense = len(dense_keys)
+ num_sparse = len(sparse_keys)
+
+ if len(dense_shapes) != num_dense:
+ raise ValueError("len(dense_shapes) != len(dense_keys): %d vs. %d"
+ % (len(dense_shapes), num_dense))
+ if len(dense_types) != num_dense:
+ raise ValueError("len(dense_types) != len(num_dense): %d vs. %d"
+ % (len(dense_types), num_dense))
+ if len(sparse_types) != num_sparse:
+ raise ValueError("len(sparse_types) != len(sparse_keys): %d vs. %d"
+ % (len(sparse_types), num_sparse))
+ if num_dense + num_sparse == 0:
+ raise ValueError("Must provide at least one sparse key or dense key")
+ if not set(dense_keys).isdisjoint(set(sparse_keys)):
+ raise ValueError(
+ "Dense and sparse keys must not intersect; intersection: %s" %
+ set(dense_keys).intersection(set(sparse_keys)))
+
+ dense_defaults_vec = []
+ for i, key in enumerate(dense_keys):
+ default_value = dense_defaults.get(key)
+ if default_value is None:
+ default_value = constant_op.constant([], dtype=dense_types[i])
+ elif not isinstance(default_value, ops.Tensor):
+ default_value = ops.convert_to_tensor(
+ default_value, dtype=dense_types[i], name=key)
+ default_value = array_ops.reshape(default_value, dense_shapes[i])
+
+ dense_defaults_vec.append(default_value)
+
+ dense_shapes = [tensor_util.MakeTensorShapeProto(shape)
+ if isinstance(shape, (list, tuple)) else shape
+ for shape in dense_shapes]
+
+ outputs = gen_parsing_ops._parse_example(
+ serialized=serialized,
+ names=names,
+ dense_defaults=dense_defaults_vec,
+ sparse_keys=sparse_keys,
+ sparse_types=sparse_types,
+ dense_keys=dense_keys,
+ dense_shapes=dense_shapes,
+ name=name)
+
+ (sparse_indices, sparse_values, sparse_shapes, dense_values) = outputs
+
+ sparse_tensors = [ops.SparseTensor(ix, val, shape) for (ix, val, shape)
+ in zip(sparse_indices, sparse_values, sparse_shapes)]
+
+ return dict(
+ zip(sparse_keys + dense_keys, sparse_tensors + dense_values))
+
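+# A minimal `parse_example` usage sketch (illustrative only; assumes the
+# public `tf` namespace for dtype constants and that `serialized` is a 1-D
+# string Tensor of serialized Example protos):
+#
+#   features = parse_example(
+#       serialized,
+#       sparse_keys=["kw"], sparse_types=[tf.string],
+#       dense_keys=["age"], dense_types=[tf.int64],
+#       dense_defaults={"age": -1}, dense_shapes=[(1,)])
+#   # features["kw"] is a SparseTensor; features["age"] is a dense Tensor
+#   # of shape (batch_size, 1), with -1 filled in for missing "age" values.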
+
+def parse_single_example(serialized, # pylint: disable=invalid-name
+ names=None,
+ sparse_keys=None,
+ sparse_types=None,
+ dense_keys=None,
+ dense_types=None,
+ dense_defaults=None,
+ dense_shapes=None,
+ name="ParseSingleExample"):
+ """Identical to parse_example but for scalar serialized and names.
+
+ Args:
+ serialized: A scalar string, a single serialized Example.
+ See parse_example documentation for more details.
+ names: (Optional) A scalar string, the associated name.
+ See parse_example documentation for more details.
+ sparse_keys: See parse_example documentation for more details.
+ sparse_types: See parse_example documentation for more details.
+ dense_keys: See parse_example documentation for more details.
+ dense_types: See parse_example documentation for more details.
+ dense_defaults: See parse_example documentation for more details.
+ dense_shapes: See parse_example documentation for more details.
+ name: Optional op name.
+
+ Returns:
+ A dictionary mapping keys to Tensors and SparseTensors.
+
+ For dense tensors, the Tensor is identical to the output of parse_example,
+ except it is one less dimension (the first, batch, dimension is removed).
+
+ For SparseTensors:
+ The first (batch) column of the indices matrix is removed
+ (it is now a column vector).
+ The values vector is unchanged.
+ The first (batch_size) entry of the shape vector is removed
+ (it is now a single element vector).
+
+ Raises:
+ ValueError: if "scalar" or "names" have known shapes, and are not scalars.
+ """
+ with ops.op_scope([serialized], name, "parse_single_example"):
+ serialized = ops.convert_to_tensor(serialized)
+ serialized_shape = serialized.get_shape()
+ if serialized_shape.ndims is not None:
+ if serialized_shape.ndims != 0:
+ raise ValueError("Input serialized must be a scalar")
+ else:
+ serialized = control_flow_ops.with_dependencies(
+ [logging_ops.Assert(
+ math_ops.equal(array_ops.rank(serialized), 0),
+ ["Input serialized must be a scalar"],
+ name="SerializedIsScalar")],
+ serialized,
+ name="SerializedDependencies")
+ serialized = array_ops.expand_dims(serialized, 0)
+ if names is not None:
+ names = ops.convert_to_tensor(names)
+ names_shape = names.get_shape()
+ if names_shape.ndims is not None:
+ if names_shape.ndims != 0:
+ raise ValueError("Input names must be a scalar")
+ else:
+ names = control_flow_ops.with_dependencies(
+ [logging_ops.Assert(
+ math_ops.equal(array_ops.rank(names), 0),
+ ["Input names must be a scalar"],
+ name="NamesIsScalar")],
+ names,
+ name="NamesDependencies")
+ names = array_ops.expand_dims(names, 0)
+
+ outputs = parse_example(serialized,
+ names=names,
+ sparse_keys=sparse_keys,
+ sparse_types=sparse_types,
+ dense_keys=dense_keys,
+ dense_types=dense_types,
+ dense_defaults=dense_defaults,
+ dense_shapes=dense_shapes,
+ name=name)
+ if dense_keys is not None:
+ for d in dense_keys:
+ outputs[d] = array_ops.squeeze(outputs[d], [0], name="Squeeze_%s" % d)
+ if sparse_keys is not None:
+ for s in sparse_keys:
+ outputs[s] = ops.SparseTensor(
+ array_ops.slice(outputs[s].indices,
+ [0, 1], [-1, -1], name="Slice_Indices_%s" % s),
+ outputs[s].values,
+ array_ops.slice(outputs[s].shape,
+ [1], [-1], name="Squeeze_Shape_%s" % s))
+ return outputs
+
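+# A minimal `parse_single_example` usage sketch (illustrative only; assumes
+# `one_proto` is a scalar string Tensor holding one serialized Example):
+#
+#   features = parse_single_example(
+#       one_proto,
+#       dense_keys=["age"], dense_types=[tf.int64],
+#       dense_defaults={"age": -1}, dense_shapes=[(1,)])
+#   # features["age"] has shape (1,); the batch dimension added internally
+#   # by parse_example is squeezed back out.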
+
+@ops.RegisterShape("ParseExample")
+def _ParseExampleShape(op):
+ """Shape function for the ParseExample op."""
+ input_shape = op.inputs[0].get_shape().with_rank(1)
+ num_sparse = op.get_attr("Nsparse")
+ num_dense = op.get_attr("Ndense")
+ dense_shapes = op.get_attr("dense_shapes")
+ sparse_index_shapes = [
+ tensor_shape.matrix(None, 2) for _ in range(num_sparse)]
+ sparse_value_shapes = [tensor_shape.vector(None) for _ in range(num_sparse)]
+ sparse_shape_shapes = [tensor_shape.vector(2) for _ in range(num_sparse)]
+ assert num_dense == len(dense_shapes)
+ dense_shapes = [
+ input_shape.concatenate((d.size for d in dense_shape.dim))
+ for dense_shape in dense_shapes]
+ return (sparse_index_shapes + sparse_value_shapes + sparse_shape_shapes +
+ dense_shapes)
+
+
+ops.RegisterShape("StringToNumber")(
+ common_shapes.unchanged_shape)
+
+
+@ops.RegisterShape("DecodeRaw")
+def _DecodeRawShape(op):
+ """Shape function for the DecodeRaw op."""
+ # NOTE(mrry): Last dimension is data-dependent.
+ return [op.inputs[0].get_shape().concatenate([None])]
+
+
+@ops.RegisterShape("DecodeCSV")
+def _DecodeCSVShape(op):
+ """Shape function for the DecodeCSV op."""
+ input_shape = op.inputs[0].get_shape()
+  # Optionally check that all of the other inputs are scalar or empty.
+ for default_input in op.inputs[1:]:
+ default_input_shape = default_input.get_shape().with_rank(1)
+ if default_input_shape[0] > 1:
+ raise ValueError(
+ "Shape of a default must be a length-0 or length-1 vector.")
+ return [input_shape] * len(op.outputs)
diff --git a/tensorflow/python/ops/random_ops.py b/tensorflow/python/ops/random_ops.py
new file mode 100644
index 0000000000..6bd8dd9e3d
--- /dev/null
+++ b/tensorflow/python/ops/random_ops.py
@@ -0,0 +1,181 @@
+"""Operations for generating random numbers."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.framework import random_seed
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_random_ops
+from tensorflow.python.ops import math_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_random_ops import *
+# pylint: enable=wildcard-import
+
+
+def _ShapeTensor(shape):
+ """Convert to an int32 or int64 tensor, defaulting to int32 if empty."""
+ if isinstance(shape, (tuple, list)) and not shape:
+ dtype = types.int32
+ else:
+ dtype = None
+ return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
+
+# pylint: disable=protected-access
+def random_normal(shape, mean=0.0, stddev=1.0, dtype=types.float32,
+ seed=None, name=None):
+ """Outputs random values from a normal distribution.
+
+ Args:
+ shape: A 1-D integer Tensor or Python array. The shape of the output tensor.
+ mean: A 0-D Tensor or Python value of type `dtype`. The mean of the normal
+ distribution.
+ stddev: A 0-D Tensor or Python value of type `dtype`. The standard deviation
+ of the normal distribution.
+ dtype: The type of the output.
+ seed: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for the operation (optional).
+
+ Returns:
+ A tensor of the specified shape filled with random normal values.
+ """
+ with ops.op_scope([shape, mean, stddev], name, "random_normal") as name:
+ shape_tensor = _ShapeTensor(shape)
+ mean_tensor = ops.convert_to_tensor(
+ mean, dtype=dtype, name="mean")
+ stddev_tensor = ops.convert_to_tensor(
+ stddev, dtype=dtype, name="stddev")
+ seed1, seed2 = random_seed.get_seed(seed)
+ rnd = gen_random_ops._random_standard_normal(shape_tensor, dtype,
+ seed=seed1,
+ seed2=seed2)
+ mul = rnd * stddev_tensor
+ value = math_ops.add(mul, mean_tensor, name=name)
+ return value
+
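+# A minimal `random_normal` usage sketch (assumes `import tensorflow as tf`
+# and an active Session):
+#
+#   samples = random_normal([2, 3], mean=0.0, stddev=2.0, seed=1)
+#   with tf.Session() as sess:
+#     print(sess.run(samples))  # a 2x3 array drawn from N(0, 2**2)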
+
+ops.NoGradient("RandomStandardNormal")
+
+
+def truncated_normal(shape, mean=0.0, stddev=1.0, dtype=types.float32,
+ seed=None, name=None):
+ """Outputs random values from a truncated normal distribution.
+
+ The generated values follow a normal distribution with specified mean and
+ standard deviation, except that values whose magnitude is more than 2 standard
+ deviations from the mean are dropped and re-picked.
+
+ Args:
+ shape: A 1-D integer Tensor or Python array. The shape of the output tensor.
+ mean: A 0-D Tensor or Python value of type `dtype`. The mean of the
+ truncated normal distribution.
+ stddev: A 0-D Tensor or Python value of type `dtype`. The standard deviation
+ of the truncated normal distribution.
+ dtype: The type of the output.
+ seed: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for the operation (optional).
+
+ Returns:
+ A tensor of the specified shape filled with random truncated normal values.
+ """
+ with ops.op_scope([shape, mean, stddev], name, "truncated_normal") as name:
+ shape_tensor = _ShapeTensor(shape)
+ mean_tensor = ops.convert_to_tensor(
+ mean, dtype=dtype, name="mean")
+ stddev_tensor = ops.convert_to_tensor(
+ stddev, dtype=dtype, name="stddev")
+ seed1, seed2 = random_seed.get_seed(seed)
+ rnd = gen_random_ops._truncated_normal(shape_tensor, dtype,
+ seed=seed1,
+ seed2=seed2)
+ mul = rnd * stddev_tensor
+ value = math_ops.add(mul, mean_tensor, name=name)
+ return value
+
+
+ops.NoGradient("TruncatedNormal")
+
+
+def random_uniform(shape, minval=0.0, maxval=1.0,
+ dtype=types.float32, seed=None,
+ name=None):
+ """Outputs random values from a uniform distribution.
+
+ The generated values follow a uniform distribution in the range
+ `[minval, maxval)`. The lower bound `minval` is included in the range, while
+ the upper bound `maxval` is excluded.
+
+ Args:
+ shape: A 1-D integer Tensor or Python array. The shape of the output tensor.
+ minval: A 0-D Tensor or Python value of type `dtype`. The lower bound on the
+ range of random values to generate.
+ maxval: A 0-D Tensor or Python value of type `dtype`. The upper bound on
+ the range of random values to generate.
+ dtype: The type of the output.
+ seed: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for the operation (optional).
+
+ Returns:
+ A tensor of the specified shape filled with random uniform values.
+ """
+ with ops.op_scope([shape, minval, maxval], name, "random_uniform") as name:
+ shape_tensor = _ShapeTensor(shape)
+ min_tensor = ops.convert_to_tensor(minval, dtype=dtype, name="min")
+ range_tensor = ops.convert_to_tensor(
+ maxval - minval, dtype=dtype, name="range")
+ seed1, seed2 = random_seed.get_seed(seed)
+ rnd = gen_random_ops._random_uniform(shape_tensor, dtype,
+ seed=seed1,
+ seed2=seed2)
+ mul = rnd * range_tensor
+ value = math_ops.add(mul, min_tensor, name=name)
+ return value
+
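+# A minimal `random_uniform` usage sketch (assumes `import tensorflow as tf`
+# and an active Session):
+#
+#   noise = random_uniform([4], minval=-1.0, maxval=1.0, seed=7)
+#   with tf.Session() as sess:
+#     print(sess.run(noise))  # four floats in [-1.0, 1.0)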
+
+def random_shuffle(value, seed=None, name=None):
+ """Randomly shuffles a tensor along its first dimension.
+
+ The tensor is shuffled along dimension 0, such that each `value[j]` is mapped
+ to one and only one `output[i]`. For example, a mapping that might occur for a
+ 3x2 tensor is:
+
+ ```python
+ [[1, 2], [[5, 6],
+ [3, 4], ==> [1, 2],
+ [5, 6]] [3, 4]]
+ ```
+
+ Args:
+ value: A Tensor to be shuffled.
+ seed: A Python integer. Used to create a random seed for the distribution.
+ See [`set_random_seed`](constant_op.md#set_random_seed) for behavior.
+ name: A name for the operation (optional).
+
+ Returns:
+ A tensor of same shape and type as `value`, shuffled along its first
+ dimension.
+ """
+ seed1, seed2 = random_seed.get_seed(seed)
+ return gen_random_ops._random_shuffle(value, seed=seed1, seed2=seed2,
+ name=name)
+
+
+ops.NoGradient("RandomUniform")
+
+
+@ops.RegisterShape("TruncatedNormal")
+@ops.RegisterShape("RandomStandardNormal")
+@ops.RegisterShape("RandomUniform")
+def _RandomShape(op):
+ shape_val = tensor_util.ConstantValue(op.inputs[0])
+ if shape_val is not None:
+ return [tensor_shape.TensorShape(shape_val.tolist())]
+ else:
+ shape_shape = op.inputs[0].get_shape().with_rank_at_most(1)
+ return [tensor_shape.unknown_shape(ndims=shape_shape.num_elements())]
+
+
+ops.RegisterShape("RandomShuffle")(common_shapes.unchanged_shape)
diff --git a/tensorflow/python/ops/sparse_grad.py b/tensorflow/python/ops/sparse_grad.py
new file mode 100644
index 0000000000..3685b671b7
--- /dev/null
+++ b/tensorflow/python/ops/sparse_grad.py
@@ -0,0 +1,12 @@
+"""Gradients for operators defined in sparse_ops.py."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import sparse_ops
+
+
+ops.NoGradient("SparseToDense")
+
+
+ops.NoGradient("SparseConcat")
+
+
+ops.NoGradient("SparseReorder")
diff --git a/tensorflow/python/ops/sparse_ops.py b/tensorflow/python/ops/sparse_ops.py
new file mode 100644
index 0000000000..c0dca6156d
--- /dev/null
+++ b/tensorflow/python/ops/sparse_ops.py
@@ -0,0 +1,458 @@
+"""## Sparse Tensor Representation.
+
+TensorFlow supports a `SparseTensor` representation for data that is sparse
+in multiple dimensions. Contrast this representation with `IndexedSlices`,
+which is efficient for representing tensors that are sparse in their first
+dimension, and dense along all other dimensions.
+
+@@SparseTensor
+@@SparseTensorValue
+
+## Sparse to Dense Conversion.
+
+@@sparse_to_dense
+@@sparse_tensor_to_dense
+@@sparse_to_indicator
+
+## Manipulation.
+
+@@sparse_concat
+@@sparse_reorder
+@@sparse_retain
+@@sparse_fill_empty_rows
+"""
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import gen_sparse_ops
+from tensorflow.python.ops import math_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.ops.gen_sparse_ops import *
+# pylint: enable=wildcard-import
+# pylint: disable=protected-access
+
+
+def sparse_concat(concat_dim, sp_inputs, name=None):
+ """Concatenates a list of `SparseTensor` along the specified dimension.
+
+ Concatenation is with respect to the dense versions of each sparse input.
+  It is assumed that each input is a `SparseTensor` whose elements are ordered
+ along increasing dimension number.
+
+ All inputs' shapes must match, except for the concat dimension. The
+ `indices`, `values`, and `shapes` lists must have the same length.
+
+ The output shape is identical to the inputs', except along the concat
+ dimension, where it is the sum of the inputs' sizes along that dimension.
+
+ The output elements will be resorted to preserve the sort order along
+ increasing dimension number.
+
+ This op runs in `O(M log M)` time, where `M` is the total number of non-empty
+ values across all inputs. This is due to the need for an internal sort in
+ order to concatenate efficiently across an arbitrary dimension.
+
+ For example, if `concat_dim = 1` and the inputs are
+
+ sp_inputs[0]: shape = [2, 3]
+ [0, 2]: "a"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+ sp_inputs[1]: shape = [2, 4]
+ [0, 1]: "d"
+ [0, 2]: "e"
+
+ then the output will be
+
+ shape = [2, 7]
+ [0, 2]: "a"
+ [0, 4]: "d"
+ [0, 5]: "e"
+ [1, 0]: "b"
+ [1, 1]: "c"
+
+ Graphically this is equivalent to doing
+
+ [ a] concat [ d e ] = [ a d e ]
+ [b c ] [ ] [b c ]
+
+ Args:
+ concat_dim: Dimension to concatenate along.
+ sp_inputs: List of `SparseTensor` to concatenate.
+ name: A name prefix for the returned tensors (optional).
+
+ Returns:
+ A `SparseTensor` with the concatenated output.
+
+ Raises:
+ TypeError: If `sp_inputs` is not a list of `SparseTensor`.
+ """
+ if not isinstance(sp_inputs, list):
+ raise TypeError("Inputs must be a list")
+ if not all(isinstance(sp_input, ops.SparseTensor) for sp_input in sp_inputs):
+ raise TypeError("All inputs must be SparseTensors")
+
+ if len(sp_inputs) == 1: # Degenerate case of one tensor.
+ return sp_inputs[0]
+
+ inds = [sp_input.indices for sp_input in sp_inputs]
+ vals = [sp_input.values for sp_input in sp_inputs]
+ shapes = [sp_input.shape for sp_input in sp_inputs]
+
+ output_ind, output_val, output_shape = (
+ gen_sparse_ops._sparse_concat(
+ inds,
+ vals,
+ shapes,
+ concat_dim,
+ name=name))
+
+ return ops.SparseTensor(output_ind, output_val, output_shape)
+
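+# A minimal `sparse_concat` usage sketch mirroring the docstring example
+# above (assumes `sp_a` and `sp_b` are SparseTensors of dense shapes
+# [2, 3] and [2, 4]):
+#
+#   sp_out = sparse_concat(1, [sp_a, sp_b])
+#   # sp_out has dense shape [2, 7] and its values remain in row-major order.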
+
+@ops.RegisterShape("SparseConcat")
+def _SparseConcatShape(op):
+ """Shape function for SparseConcat op."""
+ num_inputs = int(op.get_attr("N"))
+
+ # TF flattens and concatenates all list inputs, so reconstruct the lists here.
+ ind_shapes = [ind.get_shape().with_rank(2) for ind in op.inputs[0:num_inputs]]
+ val_shapes = [val.get_shape().with_rank(1)
+ for val in op.inputs[num_inputs:2 * num_inputs]]
+ shape_shapes = [shape.get_shape().with_rank(1)
+ for shape in op.inputs[2 * num_inputs:]]
+
+ output_ind_rows = tensor_shape.Dimension(0)
+ output_ind_cols = tensor_shape.Dimension(None)
+ output_val_elems = tensor_shape.Dimension(0)
+ output_shape_shape = tensor_shape.TensorShape(None)
+
+ for i in range(num_inputs):
+ num_elems_i = ind_shapes[i][0].merge_with(val_shapes[i][0])
+ output_ind_rows += num_elems_i
+ output_ind_cols = output_ind_cols.merge_with(ind_shapes[i][1])
+ output_val_elems += num_elems_i
+ output_shape_shape = output_shape_shape.merge_with(shape_shapes[i])
+
+ output_ind_shape = tensor_shape.matrix(output_ind_rows, output_ind_cols)
+ output_val_shape = tensor_shape.vector(output_val_elems)
+
+ return [output_ind_shape, output_val_shape, output_shape_shape]
+
+
+def sparse_reorder(sp_input, name=None):
+ """Reorders a `SparseTensor` into the canonical, row-major ordering.
+
+ Note that by convention, all sparse ops preserve the canonical ordering
+ along increasing dimension number. The only time ordering can be violated
+ is during manual manipulation of the indices and values to add entries.
+
+ Reordering does not affect the shape of the `SparseTensor`.
+
+ For example, if sp_input has shape `[4, 5]` and `indices` / `values`:
+
+ [0, 3]: b
+ [0, 1]: a
+ [3, 1]: d
+ [2, 0]: c
+
+ then the output will be a `SparseTensor` of shape `[4, 5]` and
+ `indices` / `values`:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+ Args:
+ sp_input: The input `SparseTensor`.
+ name: A name prefix for the returned tensors (optional)
+
+ Returns:
+ A `SparseTensor` with the same shape and non-empty values, but in
+ canonical ordering.
+
+ Raises:
+ TypeError: If `sp_input` is not a `SparseTensor`.
+ """
+ if not isinstance(sp_input, ops.SparseTensor):
+ raise TypeError("Input must be a SparseTensor")
+
+ reordered_ind, reordered_val = (
+ gen_sparse_ops._sparse_reorder(
+ sp_input.indices,
+ sp_input.values,
+ sp_input.shape,
+ name=name))
+
+ return ops.SparseTensor(
+ reordered_ind, reordered_val, array_ops.identity(sp_input.shape))
+
+
+@ops.RegisterShape("SparseReorder")
+def _SparseReorderShape(op):
+ """Shape function for SparseReorder op."""
+ input_indices_shape = op.inputs[0].get_shape().with_rank(2)
+ input_values_shape = op.inputs[1].get_shape().with_rank(1)
+ unused_shape_shape = op.inputs[2].get_shape().with_rank(1)
+
+ return [input_indices_shape, input_values_shape]
+
+
+@ops.RegisterShape("SparseToDense")
+def _SparseToDenseShape(op):
+ input_shape = tensor_util.ConstantValue(op.inputs[1])
+ if input_shape is not None:
+ if np.ndim(input_shape) > 1:
+ raise ValueError("Input shape should be a vector")
+ return [tensor_shape.TensorShape(input_shape.tolist())]
+ else:
+ input_shape_shape = op.inputs[1].get_shape().with_rank_at_most(1)
+ return [tensor_shape.unknown_shape(ndims=input_shape_shape.num_elements())]
+
+
+def sparse_tensor_to_dense(sp_input, default_value, name=None):
+ """Converts a `SparseTensor` into a dense tensor.
+
+ This op is a convenience wrapper around `sparse_to_dense` for `SparseTensor`s.
+
+ For example, if `sp_input` has shape `[3, 5]` and non-empty string values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+
+ and `default_value` is `x`, then the output will be a dense `[3, 5]`
+ string tensor with values:
+
+ [[x a x b x]
+ [x x x x x]
+ [c x x x x]]
+
+ Args:
+ sp_input: The input `SparseTensor`.
+ default_value: Scalar value to set for indices not specified in
+ `sp_input`.
+ name: A name prefix for the returned tensors (optional).
+
+ Returns:
+ A dense tensor with shape `sp_input.shape` and values specified by
+ the non-empty values in `sp_input`. Indices not in `sp_input` are assigned
+ `default_value`.
+
+ Raises:
+ TypeError: If `sp_input` is not a `SparseTensor`.
+ """
+ if not isinstance(sp_input, ops.SparseTensor):
+ raise TypeError("Input must be a SparseTensor")
+
+ return gen_sparse_ops.sparse_to_dense(
+ sp_input.indices,
+ sp_input.shape,
+ sp_input.values,
+ default_value,
+ name=name)
+
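+# A minimal `sparse_tensor_to_dense` usage sketch (assumes `sp_strings` is a
+# string SparseTensor, as in the docstring example above):
+#
+#   dense = sparse_tensor_to_dense(sp_strings, default_value="x")
+#   # Positions not listed in sp_strings.indices are filled with "x".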
+
+def sparse_to_indicator(sp_input, vocab_size, name=None):
+ """Converts a `SparseTensor` of ids into a dense bool indicator tensor.
+
+ The last dimension of `sp_input` is discarded and replaced with the values of
+ `sp_input`. If `sp_input.shape = [D0, D1, ..., Dn, K]`, then
+ `output.shape = [D0, D1, ..., Dn, vocab_size]`, where
+
+ output[d_0, d_1, ..., d_n, sp_input[d_0, d_1, ..., d_n, k]] = True
+
+ and False elsewhere in `output`.
+
+ For example, if `sp_input.shape = [2, 3, 4]` with non-empty values:
+
+ [0, 0, 0]: 0
+ [0, 1, 0]: 10
+ [1, 0, 3]: 103
+ [1, 1, 2]: 112
+ [1, 1, 3]: 113
+ [1, 2, 1]: 121
+
+ and `vocab_size = 200`, then the output will be a `[2, 3, 200]` dense bool
+ tensor with False everywhere except at positions
+
+ (0, 0, 0), (0, 1, 10), (1, 0, 103), (1, 1, 112), (1, 1, 113), (1, 2, 121).
+
+ This op is useful for converting `SparseTensor`s into dense formats for
+ compatibility with ops that expect dense tensors.
+
+ The input `SparseTensor` must be in row-major order.
+
+ Args:
+ sp_input: A `SparseTensor` of type `int32` or `int64`.
+ vocab_size: The new size of the last dimension, with
+ `all(0 <= sp_input.values < vocab_size)`.
+ name: A name prefix for the returned tensors (optional)
+
+ Returns:
+ A dense bool indicator tensor representing the indices with specified value.
+
+ Raises:
+ TypeError: If `sp_input` is not a `SparseTensor`.
+ """
+ if not isinstance(sp_input, ops.SparseTensor):
+ raise TypeError("Input must be a SparseTensor")
+
+ with ops.op_scope([sp_input], name, "SparseToIndicator") as name:
+ indices_shape = array_ops.shape(sp_input.indices)
+ num_entries = indices_shape[0]
+ rank = indices_shape[1]
+
+ ids = sp_input.values
+ if ids.dtype != types.int64:
+ ids = math_ops.cast(ids, types.int64)
+
+    # Slice off the last dimension of indices, then tack on the ids.
+ indices_columns_to_preserve = array_ops.slice(
+ sp_input.indices, [0, 0], array_ops.pack([-1, rank - 1]))
+ new_indices = array_ops.concat(
+ 1, [indices_columns_to_preserve, array_ops.reshape(ids, [-1, 1])])
+
+ new_values = array_ops.fill(array_ops.expand_dims(num_entries, 0), True)
+ new_shape = array_ops.concat(
+ 0, [array_ops.slice(sp_input.shape, [0],
+ array_ops.expand_dims(rank - 1, 0)), [vocab_size]])
+
+ sp_new = ops.SparseTensor(new_indices, new_values, new_shape)
+
+ return sparse_tensor_to_dense(sp_new, False, name=name)
+
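+# A minimal `sparse_to_indicator` usage sketch (assumes `sp_ids` is an int64
+# SparseTensor whose values all lie in [0, 200)):
+#
+#   indicator = sparse_to_indicator(sp_ids, vocab_size=200)
+#   # indicator is a dense bool Tensor whose last dimension has size 200.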
+
+def sparse_retain(sp_input, to_retain):
+ """Retains specified non-empty values within a `SparseTensor`.
+
+ For example, if `sp_input` has shape `[4, 5]` and 4 non-empty string values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+ and `to_retain = [True, False, False, True]`, then the output will
+ be a `SparseTensor` of shape `[4, 5]` with 2 non-empty values:
+
+ [0, 1]: a
+ [3, 1]: d
+
+ Args:
+ sp_input: The input `SparseTensor` with `N` non-empty elements.
+ to_retain: A bool vector of length `N` with `M` true values.
+
+ Returns:
+ A `SparseTensor` with the same shape as the input and `M` non-empty
+ elements corresponding to the true positions in `to_retain`.
+
+ Raises:
+ TypeError: If `sp_input` is not a `SparseTensor`.
+ """
+ if not isinstance(sp_input, ops.SparseTensor):
+ raise TypeError("Input must be a SparseTensor")
+
+ to_retain = ops.convert_to_tensor(to_retain)
+
+ # Shape checking, if shape is known at graph construction time
+ retain_shape = to_retain.get_shape()
+ retain_shape.assert_has_rank(1)
+ sp_input.values.get_shape()[0].merge_with(retain_shape[0])
+
+ where_true = array_ops.reshape(array_ops.where(to_retain), [-1])
+ new_indices = array_ops.gather(sp_input.indices, where_true)
+ new_values = array_ops.gather(sp_input.values, where_true)
+ return ops.SparseTensor(
+ new_indices, new_values, array_ops.identity(sp_input.shape))
+
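+# A minimal `sparse_retain` usage sketch (assumes `sp_input` has exactly four
+# non-empty values, as in the docstring example above):
+#
+#   kept = sparse_retain(sp_input, [True, False, False, True])
+#   # kept holds only the first and fourth non-empty values of sp_input.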
+
+def sparse_fill_empty_rows(sp_input, default_value, name=None):
+ """Fills empty rows in the input 2-D `SparseTensor` with a default value.
+
+ This op adds entries with the specified `default_value` at index
+ `[row, 0]` for any row in the input that does not already have a value.
+
+ For example, suppose `sp_input` has shape `[5, 6]` and non-empty values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [2, 0]: c
+ [3, 1]: d
+
+ Rows 1 and 4 are empty, so the output will be of shape `[5, 6]` with values:
+
+ [0, 1]: a
+ [0, 3]: b
+ [1, 0]: default_value
+ [2, 0]: c
+ [3, 1]: d
+ [4, 0]: default_value
+
+ Note that the input may have empty columns at the end, with no effect on
+ this op.
+
+ The output `SparseTensor` will be in row-major order and will have the
+ same shape as the input.
+
+ This op also returns an indicator vector such that
+
+ empty_row_indicator[i] = True iff row i was an empty row.
+
+ Args:
+ sp_input: A `SparseTensor` with shape `[N, M]`.
+ default_value: The value to fill for empty rows, with the same type as
+      `sp_input`.
+ name: A name prefix for the returned tensors (optional)
+
+ Returns:
+ sp_ordered_output: A `SparseTensor` with shape `[N, M]`, and with all empty
+ rows filled in with `default_value`.
+ empty_row_indicator: A bool vector of length `N` indicating whether each
+ input row was empty.
+
+ Raises:
+ TypeError: If `sp_input` is not a `SparseTensor`.
+ """
+ if not isinstance(sp_input, ops.SparseTensor):
+ raise TypeError("Input must be a SparseTensor")
+
+ with ops.op_scope([sp_input], name, "SparseFillEmptyRows"):
+ default_value = ops.convert_to_tensor(
+ default_value, dtype=sp_input.values.dtype)
+
+ num_rows = math_ops.cast(sp_input.shape[0], types.int32)
+ all_row_indices = math_ops.cast(
+ math_ops.range(0, num_rows, 1), types.int64)
+ empty_row_indices, _ = array_ops.list_diff(
+ all_row_indices, sp_input.indices[:, 0])
+ empty_row_indicator = gen_sparse_ops.sparse_to_dense(
+ empty_row_indices, array_ops.expand_dims(sp_input.shape[0], -1), True,
+ False)
+
+ empty_row_indices_as_column = array_ops.reshape(empty_row_indices, [-1, 1])
+ additional_indices = array_ops.concat(
+ 1,
+ [empty_row_indices_as_column,
+ array_ops.zeros_like(empty_row_indices_as_column)])
+ additional_values = array_ops.fill(array_ops.shape(empty_row_indices),
+ default_value)
+
+ all_indices_unordered = array_ops.concat(
+ 0, [sp_input.indices, additional_indices])
+ all_values_unordered = array_ops.concat(
+ 0, [sp_input.values, additional_values])
+ sp_unordered_output = ops.SparseTensor(
+ all_indices_unordered, all_values_unordered, sp_input.shape)
+ sp_ordered_output = sparse_reorder(sp_unordered_output)
+
+ return sp_ordered_output, empty_row_indicator
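+
+
+# A minimal `sparse_fill_empty_rows` usage sketch (assumes `sp_input` is a
+# 2-D SparseTensor):
+#
+#   filled, was_empty = sparse_fill_empty_rows(sp_input, -1)
+#   # was_empty[i] is True iff row i of sp_input had no values; each such row
+#   # of filled now holds a single -1 at column 0.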
diff --git a/tensorflow/python/ops/sparse_ops_test.py b/tensorflow/python/ops/sparse_ops_test.py
new file mode 100644
index 0000000000..07a5e6c6da
--- /dev/null
+++ b/tensorflow/python/ops/sparse_ops_test.py
@@ -0,0 +1,212 @@
+"""Tests for Python ops defined in sparse_ops."""
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import sparse_ops
+from tensorflow.python.platform import googletest
+
+
+class SparseToIndicatorTest(test_util.TensorFlowTestCase):
+
+ def _SparseTensor_5x6(self, dtype):
+ ind = np.array([
+ [0, 0],
+ [1, 0], [1, 3], [1, 4],
+ [3, 2], [3, 3]])
+ val = np.array([0, 10, 13, 14, 32, 33])
+ shape = np.array([5, 6])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, dtype),
+ constant_op.constant(shape, types.int64))
+
+ def _SparseTensor_2x3x4(self, dtype):
+ ind = np.array([
+ [0, 0, 1],
+ [0, 1, 0], [0, 1, 2],
+ [1, 0, 3],
+ [1, 1, 1], [1, 1, 3],
+ [1, 2, 2]])
+ val = np.array([1, 10, 12, 103, 111, 113, 122])
+ shape = np.array([2, 3, 4])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, dtype),
+ constant_op.constant(shape, types.int64))
+
+ def testInt32(self):
+ with self.test_session(use_gpu=False):
+ sp_input = self._SparseTensor_5x6(types.int32)
+ output = sparse_ops.sparse_to_indicator(sp_input, 50).eval()
+
+ expected_output = np.zeros((5, 50), dtype=np.bool)
+ expected_trues = ((0, 0), (1, 10), (1, 13), (1, 14), (3, 32), (3, 33))
+ for expected_true in expected_trues:
+ expected_output[expected_true] = True
+
+ self.assertAllEqual(output, expected_output)
+
+ def testInt64(self):
+ with self.test_session(use_gpu=False):
+ sp_input = self._SparseTensor_5x6(types.int64)
+ output = sparse_ops.sparse_to_indicator(sp_input, 50).eval()
+
+ expected_output = np.zeros((5, 50), dtype=np.bool)
+ expected_trues = [(0, 0), (1, 10), (1, 13), (1, 14), (3, 32), (3, 33)]
+ for expected_true in expected_trues:
+ expected_output[expected_true] = True
+
+ self.assertAllEqual(output, expected_output)
+
+ def testHigherRank(self):
+ with self.test_session(use_gpu=False):
+ sp_input = self._SparseTensor_2x3x4(types.int64)
+ output = sparse_ops.sparse_to_indicator(sp_input, 200).eval()
+
+ expected_output = np.zeros((2, 3, 200), dtype=np.bool)
+ expected_trues = [(0, 0, 1), (0, 1, 10), (0, 1, 12),
+ (1, 0, 103), (1, 1, 111), (1, 1, 113), (1, 2, 122)]
+ for expected_true in expected_trues:
+ expected_output[expected_true] = True
+
+ self.assertAllEqual(output, expected_output)
+
+
+class SparseRetainTest(test_util.TensorFlowTestCase):
+
+ def _SparseTensor_5x6(self):
+ ind = np.array([
+ [0, 0],
+ [1, 0], [1, 3], [1, 4],
+ [3, 2], [3, 3]])
+ val = np.array([0, 10, 13, 14, 32, 33])
+ shape = np.array([5, 6])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, types.int32),
+ constant_op.constant(shape, types.int64))
+
+ def testBasic(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensor_5x6()
+ to_retain = np.array([1, 0, 0, 1, 1, 0], dtype=np.bool)
+ sp_output = sparse_ops.sparse_retain(sp_input, to_retain)
+
+ output = sess.run(sp_output)
+
+ self.assertAllEqual(output.indices, [[0, 0], [1, 4], [3, 2]])
+ self.assertAllEqual(output.values, [0, 14, 32])
+ self.assertAllEqual(output.shape, [5, 6])
+
+ def testRetainNone(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensor_5x6()
+ to_retain = np.zeros((6,), dtype=np.bool)
+ sp_output = sparse_ops.sparse_retain(sp_input, to_retain)
+
+ output = sess.run(sp_output)
+
+ self.assertAllEqual(output.indices, np.array([]).reshape((0, 2)))
+ self.assertAllEqual(output.values, [])
+ self.assertAllEqual(output.shape, [5, 6])
+
+ def testMismatchedRetainShape(self):
+ with self.test_session(use_gpu=False):
+ sp_input = self._SparseTensor_5x6()
+ to_retain = np.array([1, 0, 0, 1, 0], dtype=np.bool)
+ with self.assertRaises(ValueError):
+ sparse_ops.sparse_retain(sp_input, to_retain)
+
+
+class SparseFillEmptyRowsTest(test_util.TensorFlowTestCase):
+
+ def _SparseTensor_5x6(self):
+ ind = np.array([
+ [0, 0],
+ [1, 0], [1, 3], [1, 4],
+ [3, 2], [3, 3]])
+ val = np.array([0, 10, 13, 14, 32, 33])
+ shape = np.array([5, 6])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, types.int32),
+ constant_op.constant(shape, types.int64))
+
+ def _SparseTensor_String5x6(self):
+ ind = np.array([
+ [0, 0],
+ [1, 0], [1, 3], [1, 4],
+ [3, 2], [3, 3]])
+ val = np.array(["a", "b", "c", "d", "e", "f"])
+ shape = np.array([5, 6])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, types.string),
+ constant_op.constant(shape, types.int64))
+
+ def _SparseTensor_2x6(self):
+ ind = np.array([[0, 0], [1, 0], [1, 3], [1, 4]])
+ val = np.array([0, 10, 13, 14])
+ shape = np.array([2, 6])
+ return ops.SparseTensor(
+ constant_op.constant(ind, types.int64),
+ constant_op.constant(val, types.int32),
+ constant_op.constant(shape, types.int64))
+
+ def testFillNumber(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensor_5x6()
+ sp_output, empty_row_indicator = (
+ sparse_ops.sparse_fill_empty_rows(sp_input, -1))
+
+ output, empty_row_indicator_out = sess.run(
+ [sp_output, empty_row_indicator])
+
+ self.assertAllEqual(
+ output.indices,
+ [[0, 0], [1, 0], [1, 3], [1, 4], [2, 0], [3, 2], [3, 3], [4, 0]])
+ self.assertAllEqual(output.values, [0, 10, 13, 14, -1, 32, 33, -1])
+ self.assertAllEqual(output.shape, [5, 6])
+ self.assertAllEqual(empty_row_indicator_out,
+ np.array([0, 0, 1, 0, 1]).astype(np.bool))
+
+ def testFillString(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensor_String5x6()
+ sp_output, empty_row_indicator = (
+ sparse_ops.sparse_fill_empty_rows(sp_input, ""))
+
+ output, empty_row_indicator_out = sess.run(
+ [sp_output, empty_row_indicator])
+
+ self.assertAllEqual(
+ output.indices,
+ [[0, 0], [1, 0], [1, 3], [1, 4], [2, 0], [3, 2], [3, 3], [4, 0]])
+ self.assertAllEqual(output.values, ["a", "b", "c", "d", "", "e", "f", ""])
+ self.assertAllEqual(output.shape, [5, 6])
+ self.assertAllEqual(empty_row_indicator_out,
+ np.array([0, 0, 1, 0, 1]).astype(np.bool))
+
+ def testNoEmptyRows(self):
+ with self.test_session(use_gpu=False) as sess:
+ sp_input = self._SparseTensor_2x6()
+ sp_output, empty_row_indicator = (
+ sparse_ops.sparse_fill_empty_rows(sp_input, -1))
+
+ output, empty_row_indicator_out = sess.run(
+ [sp_output, empty_row_indicator])
+
+ self.assertAllEqual(output.indices, [[0, 0], [1, 0], [1, 3], [1, 4]])
+ self.assertAllEqual(output.values, [0, 10, 13, 14])
+ self.assertAllEqual(output.shape, [2, 6])
+ self.assertAllEqual(empty_row_indicator_out, np.zeros(2).astype(np.bool))
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/ops/standard_ops.py b/tensorflow/python/ops/standard_ops.py
new file mode 100644
index 0000000000..beef8e75b5
--- /dev/null
+++ b/tensorflow/python/ops/standard_ops.py
@@ -0,0 +1,41 @@
+# pylint: disable=wildcard-import,unused-import
+"""Import names of Tensor Flow standard Ops."""
+
+# Imports the following modules so that @RegisterGradient get executed.
+from tensorflow.python.ops import array_grad
+from tensorflow.python.ops import data_flow_grad
+from tensorflow.python.ops import math_grad
+from tensorflow.python.ops import state_grad
+
+from tensorflow.python.ops.array_ops import *
+from tensorflow.python.ops.clip_ops import *
+# TODO(vrv): Switch to import * once we're okay with exposing the module.
+from tensorflow.python.ops.control_flow_ops import group
+from tensorflow.python.ops.control_flow_ops import no_op
+from tensorflow.python.ops.control_flow_ops import tuple
+from tensorflow.python.ops.data_flow_ops import *
+from tensorflow.python.ops.gradients import *
+from tensorflow.python.ops.init_ops import *
+from tensorflow.python.ops.io_ops import *
+from tensorflow.python.ops.linalg_ops import *
+from tensorflow.python.ops.logging_ops import *
+from tensorflow.python.ops.math_ops import *
+from tensorflow.python.ops.numerics import *
+from tensorflow.python.ops.parsing_ops import *
+from tensorflow.python.ops.random_ops import *
+from tensorflow.python.ops.sparse_ops import *
+from tensorflow.python.ops.state_ops import assign
+from tensorflow.python.ops.state_ops import assign_add
+from tensorflow.python.ops.state_ops import assign_sub
+from tensorflow.python.ops.state_ops import count_up_to
+from tensorflow.python.ops.state_ops import scatter_add
+from tensorflow.python.ops.state_ops import scatter_sub
+from tensorflow.python.ops.state_ops import scatter_update
+from tensorflow.python.ops.string_ops import *
+from tensorflow.python.ops.summary_ops import histogram_summary
+from tensorflow.python.ops.summary_ops import image_summary
+from tensorflow.python.ops.summary_ops import merge_all_summaries
+from tensorflow.python.ops.summary_ops import merge_summary
+from tensorflow.python.ops.summary_ops import scalar_summary
+from tensorflow.python.ops.variable_scope import *
+from tensorflow.python.ops.variables import *
diff --git a/tensorflow/python/ops/state_grad.py b/tensorflow/python/ops/state_grad.py
new file mode 100644
index 0000000000..d9b084693c
--- /dev/null
+++ b/tensorflow/python/ops/state_grad.py
@@ -0,0 +1,18 @@
+"""Gradients for operators defined in state_ops.py."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import state_ops
+
+ops.NoGradient("Assign")
+
+
+ops.NoGradient("AssignAdd")
+
+
+ops.NoGradient("AssignSub")
+
+
+ops.NoGradient("ScatterAdd")
+
+
+ops.NoGradient("ScatterSub")
diff --git a/tensorflow/python/ops/state_ops.py b/tensorflow/python/ops/state_ops.py
new file mode 100644
index 0000000000..1c8f38b94c
--- /dev/null
+++ b/tensorflow/python/ops/state_ops.py
@@ -0,0 +1,189 @@
+"""## Variables
+
+@@Variable
+
+## Variable helper functions
+
+TensorFlow provides a set of functions to help manage the set of variables
+collected in the graph.
+
+@@all_variables
+@@trainable_variables
+
+@@initialize_all_variables
+@@initialize_variables
+@@assert_variables_initialized
+
+## Saving and Restoring Variables
+
+@@Saver
+
+@@latest_checkpoint
+
+@@get_checkpoint_state
+@@update_checkpoint_state
+
+## Sharing Variables
+
+TensorFlow provides several classes and operations that you can use to
+create variables contingent on certain conditions.
+
+@@get_variable
+@@get_variable_scope
+@@variable_scope
+
+@@constant_initializer
+@@random_normal_initializer
+@@truncated_normal_initializer
+@@random_uniform_initializer
+@@uniform_unit_scaling_initializer
+@@zeros_initializer
+
+## Sparse Variable Updates
+
+The sparse update ops modify a subset of the entries in a dense `Variable`,
+either overwriting the entries or adding / subtracting a delta. These are
+useful for training embedding models and similar lookup-based networks, since
+only a small subset of embedding vectors change in any given step.
+
+Since a sparse update of a large tensor may be generated automatically during
+gradient computation (as in the gradient of [`tf.gather`](array_ops.md#gather)),
+an [`IndexedSlices`](#IndexedSlices) class is provided that encapsulates a set
+of sparse indices and values. `IndexedSlices` objects are detected and handled
+automatically by the optimizers in most cases.
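+
+For example, a minimal sketch of a direct sparse update (assumes an
+initialized session and a 3x2 float variable):
+
+```python
+v = tf.Variable([[1., 1.], [2., 2.], [3., 3.]])
+update = tf.scatter_update(v, [0, 2], [[9., 9.], [7., 7.]])
+# Running `update` overwrites rows 0 and 2 of `v`; row 1 is untouched.
+```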
+
+@@scatter_update
+@@scatter_add
+@@scatter_sub
+@@sparse_mask
+@@IndexedSlices
+"""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_state_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.gen_state_ops import *
+
+
+# pylint: disable=protected-access
+def variable_op(shape, dtype, name="Variable", set_shape=True, container="",
+ shared_name=""):
+ """Create a variable Operation.
+
+ See also variables.Variable.
+
+ Args:
+ shape: The shape of the tensor managed by this variable.
+ dtype: The underlying type of the tensor values.
+ name: optional name to use for the variable op.
+ set_shape: If True, set the shape property of the returned Tensor to
+ the shape argument.
+ container: An optional string. Defaults to "".
+ If non-empty, this variable is placed in the given container.
+ Otherwise, a default container is used.
+ shared_name: An optional string. Defaults to "".
+ If non-empty, this variable is named in the given bucket
+ with this shared_name. Otherwise, the node name is used instead.
+
+ Returns:
+ A variable tensor.
+ """
+ ret = gen_state_ops._variable(shape=shape, dtype=dtype, name=name,
+ container=container, shared_name=shared_name)
+ # TODO(mrry): Move this to where it is used, so we can get rid of this op
+ # wrapper?
+ if set_shape:
+ ret.set_shape(shape)
+ return ret
+
+
+# NOTE(mrry): Shapes are conditionally set in the Python wrapper.
+ops.RegisterShape("Variable")(common_shapes.unknown_shape)
+
+
+@ops.RegisterShape("TemporaryVariable")
+def _TemporaryVariableShape(op):
+ """Shape function for the TemporaryVariable op."""
+ shape = tensor_util.TensorShapeProtoToList(op.get_attr("shape"))
+ return [tensor_shape.TensorShape(shape)]
+
+
+@ops.RegisterShape("DestroyTemporaryVariable")
+def _DestroyTemporaryVariableShape(op):
+ """Shape function for the DestroyTemporaryVariable op."""
+ return [op.inputs[0].get_shape()]
+
+
+def init_variable(v, init, name="init"):
+ """Initializes variable with "init".
+
+ This op does the following:
+ if init is a Tensor, v = init
+ if callable(init): v = init(VariableShape(v), v.dtype)
+
+ Args:
+ v: Variable to initialize
+ init: Tensor to assign to v,
+ or an object convertible to a Tensor (e.g. a numpy ndarray),
+ or an Initializer that generates a tensor given the shape and type of v.
+ An "Initializer" is a callable that returns a tensor that "v" should be
+ set to. It will be called as init(shape, dtype).
+ name: Optional name for the op.
+
+ Returns:
+ The operation that initializes v.
+ """
+ with ops.op_scope([v, init], None, v.op.name + "/"):
+ with ops.name_scope(name) as scope:
+ with ops.device(v.device or ops.get_default_graph().get_default_device()):
+ if callable(init):
+ assert v.get_shape().is_fully_defined(), "Variable shape unknown."
+ # TODO(mrry): Convert to v.shape when the property and
+ # accessor are reconciled (and all initializers support
+ # tf.TensorShape objects).
+ value = init(v.get_shape().as_list(), v.dtype.base_dtype)
+ value = ops.convert_to_tensor(value, name="value")
+ return assign(v, value, name=scope)
+ else:
+ init = ops.convert_to_tensor(init, name="init")
+ return assign(v, init, name=scope)
+
+
+@ops.RegisterShape("Assign")
+def _AssignShape(op):
+ """Shape function for the Assign op."""
+ if op.get_attr("validate_shape"):
+ # NOTE(mrry): Return a known shape here. This makes it awkward to
+ # chain a validated-shape assignment and a reshaping assignment,
+ # but that is a sufficiently niche case that supporting it does
+ # not seem worthwhile.
+ return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
+ return [op.inputs[1].get_shape()]
+
+
+@ops.RegisterShape("AssignAdd")
+@ops.RegisterShape("AssignSub")
+def _AssignUpdateShape(op):
+ """Shape function for the AssignAdd and AssignSub dense update ops."""
+ return [op.inputs[0].get_shape().merge_with(op.inputs[1].get_shape())]
+
+
+@ops.RegisterShape("CountUpTo")
+def _CountUpToShape(op):
+ """Shape function for the CountUpTo op."""
+ return [op.inputs[0].get_shape().merge_with(tensor_shape.scalar())]
+
+
+@ops.RegisterShape("ScatterAdd")
+@ops.RegisterShape("ScatterSub")
+@ops.RegisterShape("ScatterUpdate")
+def _ScatterUpdateShape(op):
+ """Shape function for the sparse update ops."""
+ var_shape = op.inputs[0].get_shape()
+ indices_shape = op.inputs[1].get_shape()
+ unused_updates_shape = op.inputs[2].get_shape().merge_with(
+ indices_shape.concatenate(var_shape[1:]))
+ return [var_shape]
diff --git a/tensorflow/python/ops/string_ops.py b/tensorflow/python/ops/string_ops.py
new file mode 100644
index 0000000000..8181fe9a2a
--- /dev/null
+++ b/tensorflow/python/ops/string_ops.py
@@ -0,0 +1,12 @@
+"""String Ops."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_util
+from tensorflow.python.ops import common_shapes
+from tensorflow.python.ops import gen_string_ops
+# pylint: disable=wildcard-import,undefined-variable
+from tensorflow.python.ops.gen_string_ops import *
+
+ops.NoGradient("StringToHashBucket")
+
+ops.RegisterShape("StringToHashBucket")(common_shapes.unchanged_shape)
diff --git a/tensorflow/python/ops/summary_ops.py b/tensorflow/python/ops/summary_ops.py
new file mode 100644
index 0000000000..d65fd1ea7c
--- /dev/null
+++ b/tensorflow/python/ops/summary_ops.py
@@ -0,0 +1,177 @@
+"""Summary Operations."""
+# pylint: disable=wildcard-import,protected-access
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.ops import gen_summary_ops
+from tensorflow.python.ops.gen_summary_ops import *
+
+
+def _Collect(val, collections, default_collections):
+ if collections is None:
+ collections = default_collections
+ for key in collections:
+ ops.add_to_collection(key, val)
+
+
+def histogram_summary(tag, values, collections=None, name=None):
+ """Outputs a `Summary` protocol buffer with a histogram.
+
+ The generated
+ [`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+ has one summary value containing a histogram for `values`.
+
+ This op reports an `OutOfRange` error if any value is not finite.
+
+ Args:
+ tag: A `string` `Tensor`. 0-D. Tag to use for the summary value.
+ values: A `float32` `Tensor`. Any shape. Values to use to build the
+ histogram.
+ collections: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+ """
+ with ops.op_scope([tag, values], name, "HistogramSummary") as scope:
+ val = gen_summary_ops._histogram_summary(
+ tag=tag, values=values, name=scope)
+ _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
+ return val
+
+
+def image_summary(tag, tensor, max_images=None, collections=None, name=None):
+ """Outputs a `Summary` protocol buffer with images.
+
+ The summary has up to `max_images` summary values containing images. The
+ images are built from `tensor` which must be 4-D with shape `[batch_size,
+ height, width, channels]` and where `channels` can be:
+
+ * 1: `tensor` is interpreted as Grayscale.
+ * 3: `tensor` is interpreted as RGB.
+ * 4: `tensor` is interpreted as RGBA.
+
+ The images have the same number of channels as the input tensor. Their values
+ are normalized, one image at a time, to fit in the range `[0, 255]`. The
+ op uses two different normalization algorithms:
+
+ * If the input values are all positive, they are rescaled so the largest one
+ is 255.
+
+ * If any input value is negative, the values are shifted so input value 0.0
+ is at 127. They are then rescaled so that either the smallest value is 0,
+ or the largest one is 255.
+
+ The `tag` argument is a scalar `Tensor` of type `string`. It is used to
+ build the `tag` of the summary values:
+
+ * If `max_images` is 1, the summary value tag is '*tag*/image'.
+ * If `max_images` is greater than 1, the summary value tags are
+ generated sequentially as '*tag*/image/0', '*tag*/image/1', etc.
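+
+ For example, a minimal sketch (assumes `images` is a 4-D float tensor of
+ shape `[batch, 28, 28, 1]` built elsewhere in the graph):
+
+ ```python
+ img_summ = tf.image_summary("training_images", images, max_images=3)
+ ```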
+
+ Args:
+ tag: A scalar `Tensor` of type `string`. Used to build the `tag`
+ of the summary values.
+ tensor: A 4-D `float32` `Tensor` of shape `[batch_size, height, width,
+ channels]` where `channels` is 1, 3, or 4.
+ max_images: Max number of batch elements to generate images for.
+ collections: Optional list of ops.GraphKeys. The collections to add the
+ summary to. Defaults to [ops.GraphKeys.SUMMARIES]
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+ """
+ with ops.op_scope([tag, tensor], name, "ImageSummary") as scope:
+ val = gen_summary_ops._image_summary(
+ tag=tag, tensor=tensor, max_images=max_images, name=scope)
+ _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
+ return val
+
+
+def merge_summary(inputs, collections=None, name=None):
+ """Merges summaries.
+
+ This op creates a
+ [`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+ protocol buffer that contains the union of all the values in the input
+ summaries.
+
+ When the Op is run, it reports an `InvalidArgument` error if multiple values
+ in the summaries to merge use the same tag.
+
+ Args:
+ inputs: A list of `string` `Tensor` objects containing serialized `Summary`
+ protocol buffers.
+ collections: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer resulting from the merging.
+ """
+ with ops.op_scope(inputs, name, "MergeSummary") as scope:
+ val = gen_summary_ops._merge_summary(inputs=inputs, name=scope)
+ _Collect(val, collections, [])
+ return val
+
+
+def merge_all_summaries(key=ops.GraphKeys.SUMMARIES):
+ """Merges all summaries collected in the default graph.
+
+ Args:
+ key: `GraphKey` used to collect the summaries. Defaults to
+ `GraphKeys.SUMMARIES`.
+
+ Returns:
+ If no summaries were collected, returns None. Otherwise returns a scalar
+ `Tensor` of type `string` containing the serialized `Summary` protocol
+ buffer resulting from the merging.
+ """
+ summary_ops = ops.get_collection(key)
+ if not summary_ops:
+ return None
+ else:
+ return merge_summary(summary_ops)
+
+
+def scalar_summary(tags, values, collections=None, name=None):
+ """Outputs a `Summary` protocol buffer with scalar values.
+
+ The input `tags` and `values` must have the same shape. The generated
+ summary has a summary value for each tag-value pair in `tags` and `values`.
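+
+ For example, a minimal sketch (assumes `loss` is a scalar float tensor in
+ the graph):
+
+ ```python
+ loss_summ = tf.scalar_summary("loss", loss)
+ ```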
+
+ Args:
+ tags: A 1-D `string` `Tensor`. Tags for the summaries.
+ values: A 1-D `float32` or `float64` Tensor. Values for the summaries.
+ collections: Optional list of graph collections keys. The new summary op is
+ added to these collections. Defaults to `[GraphKeys.SUMMARIES]`.
+ name: A name for the operation (optional).
+
+ Returns:
+ A scalar `Tensor` of type `string`. The serialized `Summary` protocol
+ buffer.
+ """
+ with ops.op_scope([tags, values], name, "ScalarSummary") as scope:
+ val = gen_summary_ops._scalar_summary(tags=tags, values=values, name=scope)
+ _Collect(val, collections, [ops.GraphKeys.SUMMARIES])
+ return val
+
+
+ops.NoGradient("HistogramAccumulatorSummary")
+ops.NoGradient("HistogramSummary")
+ops.NoGradient("ImageSummary")
+ops.NoGradient("MergeSummary")
+ops.NoGradient("ScalarSummary")
+
+
+@ops.RegisterShape("HistogramAccumulatorSummary")
+@ops.RegisterShape("HistogramSummary")
+@ops.RegisterShape("ImageSummary")
+@ops.RegisterShape("MergeSummary")
+@ops.RegisterShape("ScalarSummary")
+def _ScalarShape(unused_op):
+ return [tensor_shape.scalar()]
diff --git a/tensorflow/python/ops/variable_scope.py b/tensorflow/python/ops/variable_scope.py
new file mode 100644
index 0000000000..c9c2cac0a5
--- /dev/null
+++ b/tensorflow/python/ops/variable_scope.py
@@ -0,0 +1,333 @@
+"""A class to store named variables and a scope operator to manage sharing."""
+
+import contextlib
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+from tensorflow.python.ops import init_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import logging
+
+
+class _VariableStore(object):
+ """Variable store that carries a number of named Variables.
+
+ New variable names and new variables can be created; all stored
+ variables are initialized with the initializer passed to __init__.
+
+ Attributes:
+ vars: a dictionary with string names (same as passed in GetVar) as keys
+ and the corresponding TensorFlow Variables as values.
+ """
+
+ def __init__(self):
+ """Create a variable store."""
+ self._vars = {} # A dictionary of the stored TensorFlow variables.
+
+ def get_variable(self, name, shape=None, dtype=types.float32,
+ initializer=None, reuse=None, trainable=True,
+ collections=None):
+ """Gets an existing variable with these parameters or create a new one.
+
+ If a variable with the given name is already stored, we return the stored
+ variable. Otherwise, we create a new one.
+
+ Set `reuse` to `True` when you only want to reuse existing Variables.
+ Set `reuse` to `False` when you only want to create new Variables.
+ If `reuse` is `None` (the default), both new and existing variables are
+ returned.
+
+ If initializer is `None` (the default), the default initializer passed in
+ the constructor is used. If that one is `None` too, we use a new
+ `UniformUnitScalingInitializer`.
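+
+ For example, a minimal sketch of the reuse check, using this internal store
+ directly:
+
+ ```python
+ store = _VariableStore()
+ w = store.get_variable("w", shape=[2, 3]) # Creates a new variable.
+ w_again = store.get_variable("w", reuse=True) # Returns the same object.
+ assert w is w_again
+ ```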
+
+ Args:
+ name: the name of the new or existing variable.
+ shape: shape of the new or existing variable.
+ dtype: type of the new or existing variable (defaults to `DT_FLOAT`).
+ initializer: initializer for the variable.
+ reuse: a Boolean or `None`. Controls reuse or creation of variables.
+ trainable: If `True` also add the variable to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES` (see variables.Variable).
+ collections: List of graph collections keys to add the Variable to.
+ Defaults to `[GraphKeys.VARIABLES]` (see variables.Variable).
+
+ Returns:
+ The created or existing variable.
+
+ Raises:
+ ValueError: when creating a new variable and shape is not declared,
+ when reusing a variable and specifying a conflicting shape,
+ or when violating reuse during variable creation.
+ """
+ should_check = reuse is not None
+ dtype = types.as_dtype(dtype)
+ shape = tensor_shape.as_shape(shape)
+ if name in self._vars:
+ # Here we handle the case when returning an existing variable.
+ if should_check and not reuse:
+ raise ValueError("Over-sharing: Variable %s already exists, disallowed."
+ " Did you mean to set reuse=True in VarScope?" % name)
+ found_var = self._vars[name]
+ if not shape.is_compatible_with(found_var.get_shape()):
+ raise ValueError("Trying to share variable %s, but specified shape %s"
+ " and found shape %s." % (name, str(shape),
+ str(found_var.get_shape())))
+ if not dtype.is_compatible_with(found_var.dtype):
+ dtype_str = dtype.name
+ found_type_str = found_var.dtype.name
+ raise ValueError("Trying to share variable %s, but specified dtype %s"
+ " and found dtype %s." % (name, str(dtype_str),
+ str(found_type_str)))
+ return found_var
+
+ # The code below handles only the case of creating a new variable.
+ if should_check and reuse:
+ raise ValueError("Under-sharing: Variable %s does not exist, disallowed."
+ " Did you mean to set reuse=None in VarScope?" % name)
+ if not shape.is_fully_defined():
+ raise ValueError("Shape of a new variable (%s) must be fully defined, "
+ "but instead was %s." % (name, shape))
+ if initializer is None:
+ initializer = init_ops.uniform_unit_scaling_initializer()
+ with ops.name_scope(name + "/Initializer/"):
+ init_val = initializer(shape.as_list(), dtype=dtype)
+ v = variables.Variable(init_val, name=name, trainable=trainable,
+ collections=collections)
+ self._vars[name] = v
+ logging.info("Created variable %s with shape %s and init %s", v.name,
+ format(shape), str(initializer))
+ return v
+
+
+class _VariableScope(object):
+ """Variable scope object to carry defaults to provide to get_variable.
+
+ Many of the arguments we need for get_variable in a variable store are most
+ easily handled with a context. This object is used for the defaults.
+
+ Attributes:
+ name: name of the current scope, used as prefix in get_variable.
+ initializer: default initializer passed to get_variable.
+ reuse: Boolean or None, setting the reuse in get_variable.
+ """
+
+ def __init__(self, reuse, name="", initializer=None):
+ self._name = name
+ self._initializer = initializer
+ self._reuse = reuse
+
+ @property
+ def name(self):
+ return self._name
+
+ @property
+ def reuse(self):
+ return self._reuse
+
+ @property
+ def initializer(self):
+ return self._initializer
+
+ def reuse_variables(self):
+ """Reuse variables in this scope."""
+ self._reuse = True
+
+ def set_initializer(self, initializer):
+ """Set initializer for this scope."""
+ self._initializer = initializer
+
+ def get_variable(self, var_store, name, shape=None, dtype=types.float32,
+ initializer=None, trainable=True, collections=None):
+ """Gets an existing variable with this name or create a new one."""
+ if initializer is None and self._initializer:
+ initializer = self._initializer
+ full_name = self.name + "/" + name if self.name else name
+ # Variable names only depend on variable_scope (full_name here),
+ # not name_scope, so we reset it below for the time of variable creation.
+ with ops.name_scope(None):
+ return var_store.get_variable(full_name, shape, dtype, initializer,
+ self.reuse, trainable, collections)
+
+
+_VARSTORE_KEY = ("__variable_store",)
+_VARSCOPE_KEY = ("__varscope",)
+
+
+def get_variable_scope():
+ """Returns the current variable scope."""
+ scope = ops.get_collection(_VARSCOPE_KEY)
+ if scope: # This collection has at most 1 element, the default scope at [0].
+ return scope[0]
+ scope = _VariableScope(False)
+ ops.add_to_collection(_VARSCOPE_KEY, scope)
+ return scope
+
+
+def _get_default_variable_store():
+ store = ops.get_collection(_VARSTORE_KEY)
+ if store:
+ return store[0]
+ store = _VariableStore()
+ ops.add_to_collection(_VARSTORE_KEY, store)
+ return store
+
+
+def get_variable(name, shape=None, dtype=types.float32, initializer=None,
+ trainable=True, collections=None):
+ """Gets an existing variable with these parameters or create a new one.
+
+ This function prefixes the name with the current variable scope
+ and performs reuse checks. See the
+ [Variable Scope How To](../../how_tos/variable_scope/index.md)
+ for an extensive description of how reusing works. Here is a basic example:
+
+ ```python
+ with tf.variable_scope("foo"):
+ v = get_variable("v", [1]) # v.name == "foo/v:0"
+ w = get_variable("w", [1]) # w.name == "foo/w:0"
+ with tf.variable_scope("foo", reuse=True)
+ v1 = get_variable("v") # The same as v above.
+ ```
+
+ If initializer is `None` (the default), the default initializer passed in
+ the constructor is used. If that one is `None` too, a
+ `UniformUnitScalingInitializer` will be used.
+
+ Args:
+ name: the name of the new or existing variable.
+ shape: shape of the new or existing variable.
+ dtype: type of the new or existing variable (defaults to `DT_FLOAT`).
+ initializer: initializer for the variable if one is created.
+ trainable: If `True` also add the variable to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES` (see variables.Variable).
+ collections: List of graph collections keys to add the Variable to.
+ Defaults to `[GraphKeys.VARIABLES]` (see variables.Variable).
+
+ Returns:
+ The created or existing variable.
+
+ Raises:
+ ValueError: when creating a new variable and shape is not declared,
+ or when violating reuse during variable creation. Reuse is set inside
+ `variable_scope`.
+ """
+ return get_variable_scope().get_variable(_get_default_variable_store(), name,
+ shape, dtype, initializer,
+ trainable, collections)
+
+
+@contextlib.contextmanager
+def variable_scope(name_or_scope, reuse=None, initializer=None):
+ """Returns a context for variable scope.
+
+ Variable scope allows you to create new variables and to share already
+ created ones, while providing checks against accidental creation or sharing.
+ For details, see the [Variable Scope How To](../../how_tos/variable_scope/index.md);
+ here we present only a few basic examples.
+
+ Simple example of how to create a new variable:
+
+ ```python
+ with tf.variable_scope("foo"):
+ with tf.variable_scope("bar"):
+ v = tf.get_variable("v", [1])
+ assert v.name == "foo/bar/v:0"
+ ```
+
+ Basic example of sharing a variable:
+
+ ```python
+ with tf.variable_scope("foo"):
+ v = get_variable("v", [1])
+ with tf.variable_scope("foo", reuse=True):
+ v1 = tf.get_variable("v", [1])
+ assert v1 == v
+ ```
+
+ Sharing a variable by capturing a scope and setting reuse:
+
+ ```python
+ with tf.variable_scope("foo") as scope.
+ v = get_variable("v", [1])
+ scope.reuse_variables()
+ v1 = tf.get_variable("v", [1])
+ assert v1 == v
+ ```
+
+ To prevent accidental sharing of variables, we raise an exception when
+ getting an existing variable in a non-reusing scope.
+
+ ```python
+ with tf.variable_scope("foo") as scope.
+ v = get_variable("v", [1])
+ v1 = tf.get_variable("v", [1])
+ # Raises ValueError("... v already exists ...").
+ ```
+
+ Similarly, we raise an exception when trying to get a variable that
+ does not exist in reuse mode.
+
+ ```python
+ with tf.variable_scope("foo", reuse=True):
+ v = get_variable("v", [1])
+ # Raises ValueError("... v does not exists ...").
+ ```
+
+ Note that the `reuse` flag is inherited: if we open a reusing scope,
+ then all its sub-scopes become reusing as well.
+
+ Args:
+ name_or_scope: `string` or `VariableScope`: the scope to open.
+ reuse: `True` or `None`; if `True`, we go into reuse mode for this scope as
+ well as all sub-scopes; if `None`, we just inherit the parent scope reuse.
+ initializer: default initializer for variables within this scope.
+
+ Yields:
+ A scope that can be captured and reused.
+
+ Raises:
+ ValueError: when trying to reuse within a creating scope, or to create
+ within a reusing scope, or if reuse is not `None` or `True`.
+ TypeError: when the types of some arguments are not appropriate.
+ """
+ if not isinstance(name_or_scope, (_VariableScope, basestring)):
+ raise TypeError("VariableScope: name_scope must be a string or "
+ "VariableScope.")
+ if reuse not in [None, True]:
+ raise ValueError("VariableScope reuse parameter must be True or None.")
+ if not reuse and isinstance(name_or_scope, (_VariableScope)):
+ logging.info("Passing VariableScope to a non-reusing scope, intended?")
+ if reuse and isinstance(name_or_scope, (basestring)):
+ logging.info("Re-using string-named scope, consider capturing as object.")
+ get_variable_scope() # Ensure that a default exists, then get a pointer.
+ default_varscope = ops.get_collection(_VARSCOPE_KEY)
+ try:
+ old = default_varscope[0]
+ reuse = reuse or old.reuse # Re-using is inherited by sub-scopes.
+ if isinstance(name_or_scope, _VariableScope):
+ # Handler for the case when we jump to a shared scope.
+ # In this case, we leave the current name_scope unchanged.
+ # We create a new VariableScope (default_varscope[0]) that contains
+ # a copy of the provided shared scope, possibly with changed reuse
+ # and initializer, if the user requested this.
+ default_varscope[0] = _VariableScope(reuse, name_or_scope.name,
+ name_or_scope.initializer)
+ if initializer:
+ default_varscope[0].set_initializer(initializer)
+ yield default_varscope[0]
+ else:
+ # Handler for the case when we just prolong current variable scope.
+ # In this case we prolong the current name_scope and create a new
+ # VariableScope with name extended by the provided one, and inherited
+ # reuse and initializer (except if the user provided values to set).
+ with ops.name_scope(name_or_scope):
+ new_name = old.name + "/" + name_or_scope if old.name else name_or_scope
+ default_varscope[0] = _VariableScope(reuse, name=new_name,
+ initializer=old.initializer)
+ if initializer:
+ default_varscope[0].set_initializer(initializer)
+ yield default_varscope[0]
+ finally:
+ default_varscope[0] = old
diff --git a/tensorflow/python/ops/variables.py b/tensorflow/python/ops/variables.py
new file mode 100644
index 0000000000..dafd3b8bdc
--- /dev/null
+++ b/tensorflow/python/ops/variables.py
@@ -0,0 +1,569 @@
+"""Variable class."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import state_ops
+
+
+class Variable(object):
+ """See the [Variables How To](../../how_tos/variables/index.md) for a high
+ level overview.
+
+ A variable maintains state in the graph across calls to `run()`. You add a
+ variable to the graph by constructing an instance of the class `Variable`.
+
+ The `Variable()` constructor requires an initial value for the variable,
+ which can be a `Tensor` of any type and shape. The initial value defines the
+ type and shape of the variable. After construction, the type and shape of
+ the variable are fixed. The value can be changed using one of the assign
+ methods.
+
+ If you want to change the shape of a variable later you have to use an
+ `assign` Op with `validate_shape=False`.
+
+ Just like any `Tensor`, variables created with `Variable()` can be used as
+ inputs for other Ops in the graph. Additionally, all the operators
+ overloaded for the `Tensor` class are carried over to variables, so you can
+ also add nodes to the graph by just doing arithmetic on variables.
+
+ ```python
+ import tensorflow as tf
+
+ # Create a variable.
+ w = tf.Variable(<initial-value>, name=<optional-name>)
+
+ # Use the variable in the graph like any Tensor.
+ y = tf.matmul(w, ...another variable or tensor...)
+
+ # The overloaded operators are available too.
+ z = tf.sigmoid(w + b)
+
+ # Assign a new value to the variable with `assign()` or a related method.
+ w.assign(w + 1.0)
+ w.assign_add(1.0)
+ ```
+
+ When you launch the graph, variables have to be explicitly initialized before
+ you can run Ops that use their value. You can initialize a variable by
+ running its *initializer op*, restoring the variable from a save file, or
+ simply running an `assign` Op that assigns a value to the variable. In fact,
+ the variable *initializer op* is just an `assign` Op that assigns the
+ variable's initial value to the variable itself.
+
+ ```python
+ # Launch the graph in a session.
+ with tf.Session() as sess:
+ # Run the variable initializer.
+ sess.run(w.initializer)
+ # ...you now can run ops that use the value of 'w'...
+ ```
+
+ The most common initialization pattern is to use the convenience function
+ `initialize_all_variables()` to add an Op to the graph that initializes
+ all the variables. You then run that Op after launching the graph.
+
+ ```python
+ # Add an Op to initialize all variables.
+ init_op = tf.initialize_all_variables()
+
+ # Launch the graph in a session.
+ with tf.Session() as sess:
+ # Run the Op that initializes all variables.
+ sess.run(init_op)
+ # ...you can now run any Op that uses variable values...
+ ```
+
+ If you need to create a variable with an initial value dependent on another
+ variable, use the other variable's `initialized_value()`. This ensures that
+ variables are initialized in the right order.
+
+ All variables are automatically collected in the graph where they are
+ created. By default, the constructor adds the new variable to the graph
+ collection `GraphKeys.VARIABLES`. The convenience function
+ `all_variables()` returns the contents of that collection.
+
+ When building a machine learning model it is often convenient to distinguish
+ between variables holding the trainable model parameters and other variables
+ such as a `global step` variable used to count training steps. To make this
+ easier, the variable constructor supports a `trainable=<bool>` parameter. If
+ `True`, the new variable is also added to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES`. The convenience function
+ `trainable_variables()` returns the contents of this collection. The
+ various `Optimizer` classes use this collection as the default list of
+ variables to optimize.
+
+
+ Creating a variable.
+
+ @@__init__
+ @@initialized_value
+
+ Changing a variable value.
+
+ @@assign
+ @@assign_add
+ @@assign_sub
+ @@scatter_sub
+ @@count_up_to
+
+ @@eval
+
+ Properties.
+
+ @@name
+ @@dtype
+ @@get_shape
+ @@device
+ @@initializer
+ @@graph
+ @@op
+ """
+
+ def __init__(self, initial_value, trainable=True, collections=None,
+ validate_shape=True, name=None):
+ """Creates a new variable with value `initial_value`.
+
+ The new variable is added to the graph collections listed in `collections`,
+ which defaults to `[GraphKeys.VARIABLES]`.
+
+ If `trainable` is `True` the variable is also added to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES`.
+
+ This constructor creates both a `variable` Op and an `assign` Op to set the
+ variable to its initial value.
+
+ Args:
+ initial_value: A `Tensor`, or Python object convertible to a `Tensor`.
+ The initial value for the Variable. Must have a shape specified unless
+ `validate_shape` is set to False.
+ trainable: If `True`, the default, also adds the variable to the graph
+ collection `GraphKeys.TRAINABLE_VARIABLES`. This collection is used as
+ the default list of variables to use by the `Optimizer` classes.
+ collections: List of graph collections keys. The new variable is added to
+ these collections. Defaults to `[GraphKeys.VARIABLES]`.
+ validate_shape: If `False`, allows the variable to be initialized with a
+ value of unknown shape. If `True`, the default, the shape of
+ `initial_value` must be known.
+ name: Optional name for the variable. Defaults to `'Variable'` and gets
+ uniquified automatically.
+
+ Returns:
+ A Variable.
+
+ Raises:
+ ValueError: If the initial value does not have a shape and
+ `validate_shape` is `True`.
+ """
+ if collections is None:
+ collections = [ops.GraphKeys.VARIABLES]
+ if trainable and ops.GraphKeys.TRAINABLE_VARIABLES not in collections:
+ # pylint: disable=g-no-augmented-assignment
+ #
+ # Pylint wants us to write collections += [...TRAINABLE_VARIABLES] which
+ # is not the same (it modifies the list in place.) Here, we only want to
+ # modify the value of the variable, not the list.
+ collections = collections + [ops.GraphKeys.TRAINABLE_VARIABLES]
+ # pylint: enable=g-no-augmented-assignment
+ with ops.op_scope([initial_value], name, "Variable") as name:
+ self._initial_value = ops.convert_to_tensor(initial_value,
+ name="initial_value")
+ if not self._initial_value.get_shape().is_fully_defined():
+ if validate_shape:
+ raise ValueError(
+ "initial_value must have a shape specified: %s"
+ % self._initial_value)
+ self._variable = state_ops.variable_op(
+ [], self._initial_value.dtype.base_dtype, set_shape=False,
+ name=name)
+ with ops.device(self._variable.device):
+ self._initializer_op = state_ops.assign(
+ self._variable, self._initial_value, validate_shape=False).op
+ else:
+ self._variable = state_ops.variable_op(
+ self._initial_value.get_shape(),
+ self._initial_value.dtype.base_dtype,
+ name=name)
+ with ops.device(self._variable.device):
+ self._initializer_op = state_ops.assign(
+ self._variable, self._initial_value).op
+ for key in collections:
+ ops.add_to_collection(key, self)
+ self._save_slice_info = None
+
+ def _as_graph_element(self):
+ """Conversion function for Graph.as_graph_element()."""
+ return self._variable
+
+ def _AsTensor(self):
+ """Conversion function for ops.convert_to_tensor()."""
+ return self._variable
+
+ def eval(self, session=None):
+ """In a session, computes and returns the value of this variable.
+
+ This is not a graph construction method, it does not add ops to the graph.
+
+ This convenience method requires a session where the graph containing this
+ variable has been launched. If no session is passed, the default session is
+ used. See the [Session class](../client.md#Session) for more information on
+ launching a graph and on sessions.
+
+ ```python
+ v = tf.Variable([1, 2])
+ init = tf.initialize_all_variables()
+
+ with tf.Session() as sess:
+ sess.run(init)
+ # Usage passing the session explicitly.
+ print v.eval(sess)
+ # Usage with the default session. The 'with' block
+ # above makes 'sess' the default session.
+ print v.eval()
+ ```
+
+ Args:
+ session: The session to use to evaluate this variable. If
+ none, the default session is used.
+
+ Returns:
+ A numpy `ndarray` with a copy of the value of this variable.
+ """
+ return self._variable.eval(session=session)
+
+ def initialized_value(self):
+ """Returns the value of the initialized variable.
+
+ You should use this instead of the variable itself to initialize another
+ variable with a value that depends on the value of this variable.
+
+ ```python
+ # Initialize 'v' with a random tensor.
+ v = tf.Variable(tf.truncated_normal([10, 40]))
+ # Use `initialized_value` to guarantee that `v` has been
+ # initialized before its value is used to initialize `w`.
+ # The random values are picked only once.
+ w = tf.Variable(v.initialized_value() * 2.0)
+ ```
+
+ Returns:
+ A `Tensor` holding the value of this variable after its initializer
+ has run.
+ """
+ return control_flow_ops.with_dependencies(
+ [self._initializer_op], self._variable)
+
+ def assign(self, value, use_locking=False):
+ """Assigns a new value to the variable.
+
+ This is essentially a shortcut for `assign(self, value)`.
+
+ Args:
+ value: A `Tensor`. The new value for this variable.
+ use_locking: If `True`, use locking during the assignment.
+
+ Returns:
+ A `Tensor` that will hold the new value of this variable after
+ the assignment has completed.
+ """
+ return state_ops.assign(self._variable, value, use_locking=use_locking)
+
+ def assign_add(self, delta, use_locking=False):
+ """Adds a value to this variable.
+
+ This is essentially a shortcut for `assign_add(self, delta)`.
+
+ Args:
+ delta: A `Tensor`. The value to add to this variable.
+ use_locking: If `True`, use locking during the operation.
+
+ Returns:
+ A `Tensor` that will hold the new value of this variable after
+ the addition has completed.
+ """
+ return state_ops.assign_add(self._variable, delta, use_locking=use_locking)
+
+ def assign_sub(self, delta, use_locking=False):
+ """Subtracts a value from this variable.
+
+ This is essentially a shortcut for `assign_sub(self, delta)`.
+
+ Args:
+ delta: A `Tensor`. The value to subtract from this variable.
+ use_locking: If `True`, use locking during the operation.
+
+ Returns:
+ A `Tensor` that will hold the new value of this variable after
+ the subtraction has completed.
+ """
+ return state_ops.assign_sub(self._variable, delta, use_locking=use_locking)
+
+ def scatter_sub(self, sparse_delta, use_locking=False):
+ """Subtracts `IndexedSlices` from this variable.
+
+ This is essentially a shortcut for `scatter_sub(self, sparse_delta.indices,
+ sparse_delta.values)`.
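+
+ For example, a minimal sketch (assumes `v` is a 2x2 float variable and an
+ initialized session):
+
+ ```python
+ delta = tf.IndexedSlices(values=tf.constant([[1., 1.]]), indices=tf.constant([0]))
+ sub = v.scatter_sub(delta)
+ # Running `sub` subtracts [1., 1.] from row 0 of `v`.
+ ```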
+
+ Args:
+ sparse_delta: `IndexedSlices` to be subtracted from this variable.
+ use_locking: If `True`, use locking during the operation.
+
+ Returns:
+ A `Tensor` that will hold the new value of this variable after
+ the scattered subtraction has completed.
+
+ Raises:
+ ValueError: if `sparse_delta` is not an `IndexedSlices`.
+ """
+ if not isinstance(sparse_delta, ops.IndexedSlices):
+ raise ValueError("sparse_delta is not IndexedSlices: %s" % sparse_delta)
+ return state_ops.scatter_sub(self._variable,
+ sparse_delta.indices,
+ sparse_delta.values,
+ use_locking=use_locking)
+
+ def count_up_to(self, limit):
+ """Increments this variable until it reaches `limit`.
+
+ When that Op is run it tries to increment the variable by `1`. If
+ incrementing the variable would bring it above `limit` then the Op raises
+ the exception `OutOfRangeError`.
+
+ If no error is raised, the Op outputs the value of the variable before
+ the increment.
+
+ This is essentially a shortcut for `count_up_to(self, limit)`.
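+
+ For example, a minimal sketch (assumes an initialized session):
+
+ ```python
+ counter = tf.Variable(0, name="counter")
+ next_val = counter.count_up_to(10)
+ # Each sess.run(next_val) returns the value before the increment; once the
+ # variable reaches 10, running it raises OutOfRangeError.
+ ```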
+
+ Args:
+ limit: value at which incrementing the variable raises an error.
+
+ Returns:
+ A `Tensor` that will hold the variable value before the increment. If no
+ other Op modifies this variable, the values produced will all be
+ distinct.
+ """
+ return state_ops.count_up_to(self._variable, limit=limit)
+
+ # Conversion to tensor.
+ @staticmethod
+ def _TensorConversionFunction(v, dtype=None, name=None):
+ """Utility function for converting a Variable to a Tensor."""
+ _ = name
+ ret = v._AsTensor() # pylint: disable=protected-access
+ if dtype and not dtype.is_compatible_with(v.dtype):
+ raise ValueError(
+ "Incompatible type conversion requested to type '%s' for variable "
+ "of type '%s'" % (dtype.name, v.dtype.name))
+ return ret
+
+ # Operator overloading.
+ #
+ # To carry over all overloaded operators from ops.Tensor to Variable, we
+ # register the _RunOp() static method as the implementation of all operators.
+ # That function dynamically discovers the overloaded operator in ops.Tensor
+ # and invokes it after converting the Variable to a tensor.
+ @staticmethod
+ def _OverloadAllOperators():
+ """Register overloads for all operators."""
+ for operator in ops.Tensor.OVERLOADABLE_OPERATORS:
+ Variable._OverloadOperator(operator)
+
+ @staticmethod
+ def _OverloadOperator(operator):
+ """Register _RunOp as the implementation of 'operator'.
+
+ Args:
+ operator: string. The operator name.
+ """
+ if operator in ["__invert__", "__neg__", "__abs__"]:
+ setattr(Variable, operator, lambda a: Variable._RunOp(operator, a, None))
+ else:
+ setattr(Variable, operator, lambda a, b: Variable._RunOp(operator, a, b))
+
+ @staticmethod
+ def _RunOp(operator, a, b):
+ """Run the operator 'op' for 'a'.
+
+ Args:
+ operator: string. The operator name.
+ a: A Variable.
+ b: Second argument to the operator. None if unary.
+
+ Returns:
+ The result of the operator.
+ """
+ # pylint: disable=protected-access
+ if b is not None:
+ return getattr(ops.Tensor, operator)(a._AsTensor(), b)
+ else:
+ return getattr(ops.Tensor, operator)(a._AsTensor())
+ # pylint: enable=protected-access
+
+ @property
+ def name(self):
+ """The name of this variable."""
+ return self._variable.name
+
+ @property
+ def initializer(self):
+ """The initializer operation for this variable."""
+ return self._initializer_op
+
+ @property
+ def device(self):
+ """The device of this variable."""
+ return self._variable.device
+
+ @property
+ def dtype(self):
+ """The `DType` of this variable."""
+ return self._variable.dtype
+
+ @property
+ def op(self):
+ """The `Operation` of this variable."""
+ return self._variable.op
+
+ @property
+ def graph(self):
+ """The `Graph` of this variable."""
+ return self._variable.graph
+
+ def get_shape(self):
+ """The `TensorShape` of this variable.
+
+ Returns:
+ A `TensorShape`.
+ """
+ return self._variable.get_shape()
+
+ # Experimental support for saving variables as slices of a larger variable.
+ class SaveSliceInfo(object):
+ """Information on how to save this Variable as a slice."""
+
+ def __init__(self, name, spec):
+ """Create a SliceInfo.
+
+ Args:
+ name: Name of the larger Tensor that this variable is a slice of.
+ spec: Slice specification for the saver.
+ """
+ self.name = name
+ self.spec = spec
+
+ def _set_save_slice_info(self, save_slice_info):
+ """Sets the slice info for this Variable.
+
+ Args:
+ save_slice_info: A Variable.SaveSliceInfo object.
+ """
+ self._save_slice_info = save_slice_info
+
+
+def all_variables():
+ """Returns all variables collected in the graph.
+
+ The `Variable()` constructor automatically adds new variables to the graph
+ collection `GraphKeys.VARIABLES`. This convenience function returns the
+ contents of that collection.
+
+ Returns:
+ A list of `Variable` objects.
+ """
+ return ops.get_collection(ops.GraphKeys.VARIABLES)
+
+
+def trainable_variables():
+ """Returns all variables created with `trainable=True`.
+
+ When passed `trainable=True`, the `Variable()` constructor automatically
+ adds new variables to the graph collection
+ `GraphKeys.TRAINABLE_VARIABLES`. This convenience function returns the
+ contents of that collection.
+
+ Returns:
+ A list of Variable objects.
+ """
+ return ops.get_collection(ops.GraphKeys.TRAINABLE_VARIABLES)
+
+
+def initialize_variables(var_list, name="init"):
+ """Returns an Op that initializes a list of variables.
+
+ After you launch the graph in a session, you can run the returned Op to
+ initialize all the variables in `var_list`. This Op runs all the
+ initializers of the variables in `var_list` in parallel.
+
+ Calling `initialize_variables()` is equivalent to passing the list of
+ initializers to `Group()`.
+
+ If `var_list` is empty, however, the function still returns an Op that can
+ be run. That Op just has no effect.
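+
+ For example, a minimal sketch (assumes `v1` and `v2` are `Variable` objects
+ and `sess` is a running `Session`):
+
+ ```python
+ init_subset = tf.initialize_variables([v1, v2])
+ sess.run(init_subset) # Runs only the initializers of v1 and v2.
+ ```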
+
+ Args:
+ var_list: List of `Variable` objects to initialize.
+ name: Optional name for the returned operation.
+
+ Returns:
+ An Op that runs the initializers of all the specified variables.
+ """
+ if var_list:
+ return control_flow_ops.group(
+ *[v.initializer for v in var_list], name=name)
+ return control_flow_ops.no_op(name=name)
+
+
+def initialize_all_variables():
+ """Returns an Op that initializes all variables.
+
+ This is just a shortcut for `initialize_variables(all_variables())`.
+
+ Returns:
+ An Op that initializes all variables in the graph.
+ """
+ return initialize_variables(all_variables())
+
+
+def assert_variables_initialized(var_list=None):
+ """Returns an Op to check if variables are initialized.
+
+ When run, the returned Op will raise the exception `FailedPreconditionError`
+ if any of the variables has not yet been initialized.
+
+ Note: This function is implemented by trying to fetch the values of the
+ variables. If one of the variables is not initialized, a message may be
+ logged by the C++ runtime. This is expected.
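+
+ For example, a minimal sketch (assumes `sess` is a running `Session` and the
+ graph contains at least one variable):
+
+ ```python
+ check = tf.assert_variables_initialized()
+ sess.run(check) # Raises FailedPreconditionError if any variable is uninitialized.
+ ```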
+
+ Args:
+ var_list: List of `Variable` objects to check. Defaults to the
+ value of `all_variables()`.
+
+ Returns:
+ An Op, or None if there are no variables.
+ """
+ if var_list is None:
+ var_list = all_variables()
+ # Backwards compatibility for old-style variables. TODO(mdevin): remove.
+ if not var_list:
+ var_list = []
+ for op in ops.get_default_graph().get_operations():
+ if op.type in ["Variable", "AutoReloadVariable"]:
+ var_list.append(op.outputs[0])
+ if not var_list:
+ return None
+ else:
+ ranks = []
+ for var in var_list:
+ with ops.device(var.device):
+ ranks.append(array_ops.rank(var))
+ if len(ranks) == 1:
+ return ranks[0]
+ else:
+ return array_ops.pack(ranks)
+
+
+# pylint: disable=protected-access
+ops.register_tensor_conversion_function(Variable,
+ Variable._TensorConversionFunction)
+Variable._OverloadAllOperators()
+# pylint: enable=protected-access
diff --git a/tensorflow/python/platform/__init__.py b/tensorflow/python/platform/__init__.py
new file mode 100644
index 0000000000..b545bac907
--- /dev/null
+++ b/tensorflow/python/platform/__init__.py
@@ -0,0 +1,6 @@
+"""Setup system-specific platform environment for TensorFlow."""
+import control_imports
+if control_imports.USE_OSS:
+ from tensorflow.python.platform.default._init import *
+else:
+ from tensorflow.python.platform.google._init import *
diff --git a/tensorflow/python/platform/app.py b/tensorflow/python/platform/app.py
new file mode 100644
index 0000000000..3d51bc74b2
--- /dev/null
+++ b/tensorflow/python/platform/app.py
@@ -0,0 +1,13 @@
+"""Switch between depending on pyglib.app or an OSS replacement."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_APP:
+ from tensorflow.python.platform.default._app import *
+else:
+ from tensorflow.python.platform.google._app import *
+
+# Import 'flags' into this module
+from tensorflow.python.platform import flags
diff --git a/tensorflow/python/platform/base.i b/tensorflow/python/platform/base.i
new file mode 100644
index 0000000000..85fa3968a1
--- /dev/null
+++ b/tensorflow/python/platform/base.i
@@ -0,0 +1,176 @@
+// Helper macros and typemaps for use in Tensorflow swig files.
+//
+%{
+ #include <memory>
+ #include "tensorflow/core/platform/port.h"
+ using tensorflow::uint64;
+ using tensorflow::string;
+
+ template<class T>
+ bool _PyObjAs(PyObject *pystr, T* cstr) {
+ T::undefined; // You need to define specialization _PyObjAs<T>
+ }
+
+ template<class T>
+ PyObject *_PyObjFrom(const T& c) {
+ T::undefined; // You need to define specialization _PyObjFrom<T>
+ }
+
+#ifdef HAS_GLOBAL_STRING
+ template<>
+ bool _PyObjAs(PyObject *pystr, ::string* cstr) {
+ char *buf;
+ Py_ssize_t len;
+#if PY_VERSION_HEX >= 0x03030000
+ if (PyUnicode_Check(pystr)) {
+ buf = PyUnicode_AsUTF8AndSize(pystr, &len);
+ if (!buf) return false;
+ } else // NOLINT
+#endif
+ if (PyBytes_AsStringAndSize(pystr, &buf, &len) == -1) return false;
+ if (cstr) cstr->assign(buf, len);
+ return true;
+ }
+#endif
+ template<>
+ bool _PyObjAs(PyObject *pystr, std::string* cstr) {
+ char *buf;
+ Py_ssize_t len;
+#if PY_VERSION_HEX >= 0x03030000
+ if (PyUnicode_Check(pystr)) {
+ buf = PyUnicode_AsUTF8AndSize(pystr, &len);
+ if (!buf) return false;
+ } else // NOLINT
+#endif
+ if (PyBytes_AsStringAndSize(pystr, &buf, &len) == -1) return false;
+ if (cstr) cstr->assign(buf, len);
+ return true;
+ }
+#ifdef HAS_GLOBAL_STRING
+ template<>
+ PyObject* _PyObjFrom(const ::string& c) {
+ return PyString_FromStringAndSize(c.data(), c.size());
+ }
+#endif
+ template<>
+ PyObject* _PyObjFrom(const std::string& c) {
+ return PyString_FromStringAndSize(c.data(), c.size());
+ }
+
+ PyObject* _SwigString_FromString(const string& s) {
+ return PyUnicode_FromStringAndSize(s.data(), s.size());
+ }
+%}
+
+%typemap(in) string {
+ if (!_PyObjAs<string>($input, &$1)) return NULL;
+}
+
+%typemap(in) const string& (string temp) {
+ if (!_PyObjAs<string>($input, &temp)) return NULL;
+ $1 = &temp;
+}
+
+%typemap(out) string {
+ $result = PyString_FromStringAndSize($1.data(), $1.size());
+}
+
+%typemap(out) const string& {
+ $result = PyString_FromStringAndSize($1->data(), $1->size());
+}
+
+%typemap(in, numinputs = 0) string* OUTPUT (string temp) {
+ $1 = &temp;
+}
+
+%typemap(argout) string * OUTPUT {
+ PyObject *str = PyString_FromStringAndSize($1->data(), $1->length());
+ if (!str) SWIG_fail;
+ %append_output(str);
+}
+
+%typemap(argout) string* INOUT = string* OUTPUT;
+
+%typemap(varout) string {
+ $result = PyString_FromStringAndSize($1.data(), $1.size());
+}
+
+%define _LIST_OUTPUT_TYPEMAP(type, py_converter)
+ %typemap(in) std::vector<type>(std::vector<type> temp) {
+ if (!vector_input_helper($input, &temp, _PyObjAs<type>)) {
+ if (!PyErr_Occurred())
+ PyErr_SetString(PyExc_TypeError, "sequence(type) expected");
+ return NULL;
+ }
+ $1 = temp;
+}
+%typemap(in) const std::vector<type>& (std::vector<type> temp),
+ const std::vector<type>* (std::vector<type> temp) {
+ if (!vector_input_helper($input, &temp, _PyObjAs<type>)) {
+ if (!PyErr_Occurred())
+ PyErr_SetString(PyExc_TypeError, "sequence(type) expected");
+ return NULL;
+ }
+ $1 = &temp;
+}
+%typemap(in,numinputs=0)
+std::vector<type>* OUTPUT (std::vector<type> temp),
+ hash_set<type>* OUTPUT (hash_set<type> temp),
+ set<type>* OUTPUT (set<type> temp) {
+ $1 = &temp;
+}
+%typemap(argout) std::vector<type>* OUTPUT, set<type>* OUTPUT, hash_set<type>* OUTPUT {
+ %append_output(list_output_helper($1, &py_converter));
+}
+%typemap(out) std::vector<type> {
+ $result = vector_output_helper(&$1, &py_converter);
+}
+%typemap(out) std::vector<type>*, const std::vector<type>& {
+ $result = vector_output_helper($1, &py_converter);
+}
+%enddef
+
+_LIST_OUTPUT_TYPEMAP(string, _SwigString_FromString);
+_LIST_OUTPUT_TYPEMAP(unsigned long long, PyLong_FromUnsignedLongLong);
+
+%typemap(in) uint64 {
+ // TODO(gps): Check if another implementation
+ // from hosting/images/util/image-hosting-utils.swig is better. Maybe not.
+%#if PY_MAJOR_VERSION < 3
+ if (PyInt_Check($input)) {
+ $1 = static_cast<uint64>(PyInt_AsLong($input));
+ } else
+%#endif
+ if (PyLong_Check($input)) {
+ $1 = static_cast<uint64>(PyLong_AsUnsignedLongLong($input));
+ } else {
+ PyErr_SetString(PyExc_TypeError,
+ "int or long value expected for argument \"$1_name\"");
+ }
+ // TODO(mrovner): Make consistent use of SWIG_fail vs. return NULL.
+ if (PyErr_Occurred()) return NULL;
+}
+
+%define _COPY_TYPEMAPS(oldtype, newtype)
+ typedef oldtype newtype;
+%apply oldtype * OUTPUT { newtype * OUTPUT };
+%apply oldtype & OUTPUT { newtype & OUTPUT };
+%apply oldtype * INPUT { newtype * INPUT };
+%apply oldtype & INPUT { newtype & INPUT };
+%apply oldtype * INOUT { newtype * INOUT };
+%apply oldtype & INOUT { newtype & INOUT };
+%apply std::vector<oldtype> * OUTPUT { std::vector<newtype> * OUTPUT };
+%enddef
+
+_COPY_TYPEMAPS(unsigned long long, uint64);
+
+// SWIG macros for explicit API declaration.
+// Usage:
+//
+// %ignoreall
+// %unignore SomeName; // namespace / class / method
+// %include "somelib.h"
+// %unignoreall // mandatory closing "bracket"
+%define %ignoreall %ignore ""; %enddef
+%define %unignore %rename("%s") %enddef
+%define %unignoreall %rename("%s") ""; %enddef
diff --git a/tensorflow/python/platform/control_imports.py b/tensorflow/python/platform/control_imports.py
new file mode 100644
index 0000000000..713caf3f4f
--- /dev/null
+++ b/tensorflow/python/platform/control_imports.py
@@ -0,0 +1,13 @@
+"""Switch between Google or open source dependencies."""
+# Switch between Google and OSS dependencies
+USE_OSS = True
+
+# Per-dependency switches determining whether each dependency is ready
+# to be replaced by its OSS equivalence.
+# TODO(danmane,mrry,opensource): Flip these switches, then remove them
+OSS_APP = True
+OSS_FLAGS = True
+OSS_GFILE = True
+OSS_GOOGLETEST = True
+OSS_LOGGING = True
+OSS_PARAMETERIZED = True
diff --git a/tensorflow/python/platform/default/__init__.py b/tensorflow/python/platform/default/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/platform/default/__init__.py
diff --git a/tensorflow/python/platform/default/_app.py b/tensorflow/python/platform/default/_app.py
new file mode 100644
index 0000000000..5917d00ce3
--- /dev/null
+++ b/tensorflow/python/platform/default/_app.py
@@ -0,0 +1,11 @@
+"""Generic entry point script."""
+import sys
+
+from tensorflow.python.platform import flags
+
+
+def run():
+ f = flags.FLAGS
+ f._parse_flags()
+ main = sys.modules['__main__'].main
+ sys.exit(main(sys.argv))
diff --git a/tensorflow/python/platform/default/_flags.py b/tensorflow/python/platform/default/_flags.py
new file mode 100644
index 0000000000..ceccda6e5c
--- /dev/null
+++ b/tensorflow/python/platform/default/_flags.py
@@ -0,0 +1,92 @@
+"""Implementation of the flags interface."""
+import tensorflow.python.platform
+
+import argparse
+
+_global_parser = argparse.ArgumentParser()
+
+class _FlagValues(object):
+
+ def __init__(self):
+ """Global container and accessor for flags and their values."""
+ self.__dict__['__flags'] = {}
+ self.__dict__['__parsed'] = False
+
+ def _parse_flags(self):
+ result = _global_parser.parse_args()
+ for flag_name, val in vars(result).items():
+ self.__dict__['__flags'][flag_name] = val
+ self.__dict__['__parsed'] = True
+
+ def __getattr__(self, name):
+ """Retrieves the 'value' attribute of the flag --name."""
+ if not self.__dict__['__parsed']:
+ self._parse_flags()
+ if name not in self.__dict__['__flags']:
+ raise AttributeError(name)
+ return self.__dict__['__flags'][name]
+
+ def __setattr__(self, name, value):
+ """Sets the 'value' attribute of the flag --name."""
+ if not self.__dict__['__parsed']:
+ self._parse_flags()
+ self.__dict__['__flags'][name] = value
+
+
+def _define_helper(flag_name, default_value, docstring, flagtype):
+ """Registers 'flag_name' with 'default_value' and 'docstring'."""
+ _global_parser.add_argument("--" + flag_name,
+ default=default_value,
+ help=docstring,
+ type=flagtype)
+
+
+# Provides the global object that can be used to access flags.
+FLAGS = _FlagValues()
+
+
+def DEFINE_string(flag_name, default_value, docstring):
+ """Defines a flag of type 'string'.
+
+ Args:
+ flag_name: The name of the flag as a string.
+ default_value: The default value the flag should take as a string.
+ docstring: A helpful message explaining the use of the flag.
+ """
+ _define_helper(flag_name, default_value, docstring, str)
+
+
+def DEFINE_integer(flag_name, default_value, docstring):
+ """Defines a flag of type 'int'.
+
+ Args:
+ flag_name: The name of the flag as a string.
+ default_value: The default value the flag should take as an int.
+ docstring: A helpful message explaining the use of the flag.
+ """
+ _define_helper(flag_name, default_value, docstring, int)
+
+
+def DEFINE_boolean(flag_name, default_value, docstring):
+ """Defines a flag of type 'boolean'.
+
+ Args:
+ flag_name: The name of the flag as a string.
+ default_value: The default value the flag should take as a boolean.
+ docstring: A helpful message explaining the use of the flag.
+ """
+ _define_helper(flag_name, default_value, docstring, bool)
+ _global_parser.add_argument('--no' + flag_name,
+ action='store_false',
+ dest=flag_name)
+
+
+def DEFINE_float(flag_name, default_value, docstring):
+ """Defines a flag of type 'float'.
+
+ Args:
+ flag_name: The name of the flag as a string.
+ default_value: The default value the flag should take as a float.
+ docstring: A helpful message explaining the use of the flag.
+ """
+ _define_helper(flag_name, default_value, docstring, float)
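+
+
+# Example usage (a sketch; the flag name below is illustrative only):
+#
+#   from tensorflow.python.platform import flags
+#   FLAGS = flags.FLAGS
+#   flags.DEFINE_integer("batch_size", 128, "Number of examples per batch.")
+#   ...
+#   print(FLAGS.batch_size)  # 128 unless overridden on the command line.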
diff --git a/tensorflow/python/platform/default/_gfile.py b/tensorflow/python/platform/default/_gfile.py
new file mode 100644
index 0000000000..cfd25bdf90
--- /dev/null
+++ b/tensorflow/python/platform/default/_gfile.py
@@ -0,0 +1,404 @@
+"""File processing utilities."""
+
+import errno
+import functools
+import glob as _glob
+import os
+import shutil
+import threading
+
+
+class FileError(IOError):
+ """An error occurred while reading or writing a file."""
+
+
+class GOSError(OSError):
+ """An error occurred while finding a file or in handling pathnames."""
+
+
+class _GFileBase(object):
+ """Base I/O wrapper class. Similar semantics to Python's file object."""
+
+ # pylint: disable=protected-access
+ def _error_wrapper(fn):
+ """Decorator wrapping GFileBase class method errors."""
+ @functools.wraps(fn) # Preserve methods' __doc__
+ def wrap(self, *args, **kwargs):
+ try:
+ return fn(self, *args, **kwargs)
+ except ValueError, e:
+ # Sometimes a ValueError is raised, e.g., a read() on a closed file.
+ raise FileError(errno.EIO, e.message, self._name)
+ except IOError, e:
+ e.filename = self._name
+ raise FileError(e)
+ except OSError, e:
+ raise GOSError(e)
+ return wrap
+
+ def _synchronized(fn):
+ """Synchronizes file I/O for methods in GFileBase."""
+ @functools.wraps(fn)
+ def sync(self, *args, **kwargs):
+ # Sometimes a GFileBase method is called before the instance
+ # has been properly initialized. Check that _locker is available.
+ if hasattr(self, '_locker'): self._locker.lock()
+ try:
+ return fn(self, *args, **kwargs)
+ finally:
+ if hasattr(self, '_locker'): self._locker.unlock()
+ return sync
+ # pylint: enable=protected-access
+
+ @_error_wrapper
+ def __init__(self, name, mode, locker):
+ """Create the GFileBase object with the given filename, mode, and locker.
+
+ Args:
+ name: string, the filename.
+ mode: string, the mode to open the file with (e.g. "r", "w", "a+").
+ locker: the thread locking object (e.g. _Pythonlocker) for controlling
+ thread access to the I/O methods of this class.
+ """
+ self._name = name
+ self._mode = mode
+ self._locker = locker
+ self._fp = open(name, mode)
+
+ def __enter__(self):
+ """Make GFileBase usable with "with" statement."""
+ return self
+
+ def __exit__(self, unused_type, unused_value, unused_traceback):
+ """Make GFileBase usable with "with" statement."""
+ self.close()
+
+ @_error_wrapper
+ @_synchronized
+ def __del__(self):
+ # __del__ is sometimes called before initialization, in which
+ # case the object is not fully constructed. Check for this here
+ # before trying to close the file handle.
+ if hasattr(self, '_fp'): self._fp.close()
+
+ @_error_wrapper
+ @_synchronized
+ def flush(self):
+ """Flush the underlying file handle."""
+ return self._fp.flush()
+
+ @property
+ @_error_wrapper
+ @_synchronized
+ def closed(self):
+ """Returns "True" if the file handle is closed. Otherwise False."""
+ return self._fp.closed
+
+ @_error_wrapper
+ @_synchronized
+ def write(self, data):
+ """Write data to the underlying file handle.
+
+ Args:
+ data: The string to write to the file handle.
+ """
+ self._fp.write(data)
+
+ @_error_wrapper
+ @_synchronized
+ def writelines(self, seq):
+ """Write a sequence of strings to the underlying file handle."""
+ self._fp.writelines(seq)
+
+ @_error_wrapper
+ @_synchronized
+ def tell(self):
+ """Return the location from the underlying file handle.
+
+ Returns:
+ An integer location (which can be used in e.g., seek).
+ """
+ return self._fp.tell()
+
+ @_error_wrapper
+ @_synchronized
+ def seek(self, offset, whence=0):
+ """Seek to offset (conditioned on whence) in the underlying file handle.
+
+ Args:
+ offset: int, the offset within the file to seek to.
+ whence: 0, 1, or 2. See python's seek() documentation for details.
+ """
+ self._fp.seek(offset, whence)
+
+ @_error_wrapper
+ @_synchronized
+ def truncate(self, new_size=None):
+ """Truncate the underlying file handle to new_size.
+
+ Args:
+ new_size: Size after truncation. If None, the file handle is truncated
+ to 0 bytes.
+ """
+ self._fp.truncate(new_size)
+
+ @_error_wrapper
+ @_synchronized
+ def readline(self, max_length=-1):
+ """Read a single line (up to max_length) from the underlying file handle.
+
+ Args:
+ max_length: The maximum number of characters to read.
+
+ Returns:
+ A string, including any newline at the end, or empty string if at EOF.
+ """
+ return self._fp.readline(max_length)
+
+ @_error_wrapper
+ @_synchronized
+ def readlines(self, sizehint=None):
+ """Read lines from the underlying file handle.
+
+ Args:
+ sizehint: See the python file.readlines() documentation.
+
+ Returns:
+ A list of strings from the underlying file handle.
+ """
+ if sizehint is not None:
+ return self._fp.readlines(sizehint)
+ else:
+ return self._fp.readlines()
+
+ def __iter__(self):
+ """Enable line iteration on the underlying handle (not synchronized)."""
+ return self
+
+ # Not synchronized
+ @_error_wrapper
+ def next(self):
+ """Enable line iteration on the underlying handle (not synchronized).
+
+ Returns:
+ The next line from the underlying file handle.
+
+ Example:
+ # read a file's lines by consuming the iterator with a list
+ with open("filename", "r") as fp: lines = list(fp)
+ """
+ return self._fp.next()
+
+ @_error_wrapper
+ @_synchronized
+ def Size(self): # pylint: disable=invalid-name
+ """Get byte size of the file from the underlying file handle."""
+ cur = self.tell()
+ try:
+ self.seek(0, 2)
+ size = self.tell()
+ finally:
+ self.seek(cur)
+ return size
+
+ @_error_wrapper
+ @_synchronized
+ def read(self, n=-1):
+ """Read n bytes from the underlying file handle.
+
+ Args:
+ n: Number of bytes to read (if negative, read to end of file handle.)
+
+ Returns:
+ A string of the bytes read, up to the end of file.
+ """
+ return self._fp.read(n)
+
+ @_error_wrapper
+ @_synchronized
+ def close(self):
+ """Close the underlying file handle."""
+ self._fp.close()
+
+ # Declare wrappers as staticmethods at the end so that we can
+ # use them as decorators.
+ _error_wrapper = staticmethod(_error_wrapper)
+ _synchronized = staticmethod(_synchronized)
+
+
+class GFile(_GFileBase):
+ """File I/O wrappers with thread locking."""
+
+ def __init__(self, name, mode='r'):
+ super(GFile, self).__init__(name, mode, _Pythonlocker())
+
+
+class FastGFile(_GFileBase):
+ """File I/O wrappers without thread locking."""
+
+ def __init__(self, name, mode='r'):
+ super(FastGFile, self).__init__(name, mode, _Nulllocker())
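+
+
+# A minimal usage sketch (hypothetical file name, for illustration only):
+#
+#   with GFile("/tmp/example.txt", "w") as f:
+#     f.write("hello\n")
+#   with FastGFile("/tmp/example.txt", "r") as f:  # same API, no locking
+#     contents = f.read()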
+
+
+# locker classes. Note that locks must be reentrant, so that multiple
+# lock() calls by the owning thread will not block.
+class _Pythonlocker(object):
+ """A locking strategy that uses standard locks from the thread module."""
+
+ def __init__(self):
+ self._lock = threading.RLock()
+
+ def lock(self):
+ self._lock.acquire()
+
+ def unlock(self):
+ self._lock.release()
+
+
+class _Nulllocker(object):
+ """A locking strategy where lock() and unlock() methods are no-ops."""
+
+ def lock(self):
+ pass
+
+ def unlock(self):
+ pass
+
+
+def _func_error_wrapper(fn):
+ """Decorator wrapping function errors."""
+ @functools.wraps(fn) # Preserve methods' __doc__
+ def wrap(*args, **kwargs):
+ try:
+ return fn(*args, **kwargs)
+ except ValueError, e:
+ raise FileError(errno.EIO, e.message)
+ except IOError, e:
+ raise FileError(e)
+ except OSError, e:
+ raise GOSError(e)
+ return wrap
+
+
+@_func_error_wrapper
+def Exists(path): # pylint: disable=invalid-name
+ """Retruns True iff "path" exists (as a dir, file, non-broken symlink)."""
+ return os.path.exists(path)
+
+
+@_func_error_wrapper
+def IsDirectory(path): # pylint: disable=invalid-name
+ """Return True iff "path" exists and is a directory."""
+ return os.path.isdir(path)
+
+
+@_func_error_wrapper
+def Glob(glob): # pylint: disable=invalid-name
+ """Return a list of filenames matching the glob "glob"."""
+ return _glob.glob(glob)
+
+
+@_func_error_wrapper
+def MkDir(path, mode=0755): # pylint: disable=invalid-name
+ """Create the directory "path" with the given mode.
+
+ Args:
+ path: The directory path
+ mode: The file mode for the directory
+
+ Returns:
+ None
+
+ Raises:
+ GOSError: if the path already exists
+ """
+ os.mkdir(path, mode)
+
+
+@_func_error_wrapper
+def MakeDirs(path, mode=0755): # pylint: disable=invalid-name
+ """Recursively create the directory "path" with the given mode.
+
+ Args:
+ path: The directory path
+ mode: The file mode for the created directories
+
+ Returns:
+ None
+
+ Raises:
+ GOSError: if the path already exists
+ """
+ os.makedirs(path, mode)
+
+
+@_func_error_wrapper
+def RmDir(directory): # pylint: disable=invalid-name
+ """Removes the directory "directory" iff the directory is empty.
+
+ Args:
+ directory: The directory to remove.
+
+ Raises:
+ GOSError: If the directory does not exist or is not empty.
+ """
+ os.rmdir(directory)
+
+
+@_func_error_wrapper
+def Remove(path): # pylint: disable=invalid-name
+ """Delete the (non-directory) file "path".
+
+ Args:
+ path: The file to remove.
+
+ Raises:
+ GOSError: If "path" does not exist, is a directory, or cannot be deleted.
+ """
+ os.remove(path)
+
+
+@_func_error_wrapper
+def DeleteRecursively(path): # pylint: disable=invalid-name
+ """Delete the file or directory "path" recursively.
+
+ Args:
+ path: The path to remove (may be a non-empty directory).
+
+ Raises:
+ GOSError: If the path does not exist or cannot be deleted.
+ """
+ if IsDirectory(path):
+ shutil.rmtree(path)
+ else:
+ Remove(path)
+
+
+@_func_error_wrapper
+def ListDirectory(directory, return_dotfiles=False): # pylint: disable=invalid-name
+ """Returns a list of files in dir.
+
+ As with the standard os.listdir(), the filenames in the returned list will be
+ the basenames of the files in dir (not absolute paths). To get a list of
+ absolute paths of files in a directory, a client could do:
+ file_list = gfile.ListDir(my_dir)
+ file_list = [os.path.join(my_dir, f) for f in file_list]
+ (assuming that my_dir itself specified an absolute path to a directory).
+
+ Args:
+ directory: the directory to list
+ return_dotfiles: if True, dotfiles will be returned as well. Even if
+ this arg is True, '.' and '..' will not be returned.
+
+ Returns:
+ ['list', 'of', 'files']. The entries '.' and '..' are never returned.
+ Other entries starting with a dot will only be returned if return_dotfiles
+ is True.
+ Raises:
+ GOSError: if there is an error retrieving the directory listing.
+ """
+ files = os.listdir(directory)
+ if not return_dotfiles:
+ files = [f for f in files if not f.startswith('.')]
+ return files
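+
+
+# A minimal sketch of the module-level helpers (hypothetical paths):
+#
+#   if not Exists("/tmp/example_dir"):
+#     MakeDirs("/tmp/example_dir/sub")
+#   names = ListDirectory("/tmp/example_dir")  # basenames only, no '.'/'..'
+#   matches = Glob("/tmp/example_dir/*")
+#   DeleteRecursively("/tmp/example_dir")  # works for non-empty directories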
diff --git a/tensorflow/python/platform/default/_googletest.py b/tensorflow/python/platform/default/_googletest.py
new file mode 100644
index 0000000000..d2686565a0
--- /dev/null
+++ b/tensorflow/python/platform/default/_googletest.py
@@ -0,0 +1,68 @@
+"""Imports unittest as a replacement for testing.pybase.googletest."""
+import inspect
+import itertools
+import os
+import sys
+import tempfile
+
+# pylint: disable=wildcard-import
+from unittest import *
+
+
+unittest_main = main
+
+
+# pylint: disable=invalid-name
+# pylint: disable=undefined-variable
+def main(*args, **kwargs):
+ """Delegate to unittest.main after redefining testLoader."""
+ if 'TEST_SHARD_STATUS_FILE' in os.environ:
+ try:
+ f = None
+ try:
+ f = open(os.environ['TEST_SHARD_STATUS_FILE'], 'w')
+ f.write('')
+ except IOError:
+ sys.stderr.write('Error opening TEST_SHARD_STATUS_FILE (%s). Exiting.'
+ % os.environ['TEST_SHARD_STATUS_FILE'])
+ sys.exit(1)
+ finally:
+ if f is not None: f.close()
+
+ if ('TEST_TOTAL_SHARDS' not in os.environ or
+ 'TEST_SHARD_INDEX' not in os.environ):
+ return unittest_main(*args, **kwargs)
+
+ total_shards = int(os.environ['TEST_TOTAL_SHARDS'])
+ shard_index = int(os.environ['TEST_SHARD_INDEX'])
+ base_loader = TestLoader()
+
+ delegate_get_names = base_loader.getTestCaseNames
+ bucket_iterator = itertools.cycle(range(total_shards))
+
+ def getShardedTestCaseNames(testCaseClass):
+ filtered_names = []
+ for testcase in sorted(delegate_get_names(testCaseClass)):
+ bucket = bucket_iterator.next()
+ if bucket == shard_index:
+ filtered_names.append(testcase)
+ return filtered_names
+
+ # Override getTestCaseNames
+ base_loader.getTestCaseNames = getShardedTestCaseNames
+
+ kwargs['testLoader'] = base_loader
+ unittest_main(*args, **kwargs)
+
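+# Sharding sketch (hypothetical values): with TEST_TOTAL_SHARDS=2 and sorted
+# test names ["testA", "testB", "testC"], names are dealt to shards
+# round-robin, so shard 0 runs ["testA", "testC"] and shard 1 runs ["testB"].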
+
+def GetTempDir():
+ first_frame = inspect.stack()[-1][0]
+ temp_dir = os.path.join(
+ tempfile.gettempdir(), os.path.basename(inspect.getfile(first_frame)))
+ temp_dir = temp_dir[:-3] if temp_dir.endswith('.py') else temp_dir  # strip the '.py' suffix (rstrip would strip characters)
+ if not os.path.isdir(temp_dir):
+ os.mkdir(temp_dir, 0755)
+ return temp_dir
+
+
+def StatefulSessionAvailable():
+ return False
diff --git a/tensorflow/python/platform/default/_init.py b/tensorflow/python/platform/default/_init.py
new file mode 100644
index 0000000000..916d598856
--- /dev/null
+++ b/tensorflow/python/platform/default/_init.py
@@ -0,0 +1 @@
+# Nothing to do for default platform
diff --git a/tensorflow/python/platform/default/_logging.py b/tensorflow/python/platform/default/_logging.py
new file mode 100644
index 0000000000..2e289b1abe
--- /dev/null
+++ b/tensorflow/python/platform/default/_logging.py
@@ -0,0 +1,182 @@
+"""Logging utilities."""
+# pylint: disable=unused-import
+# pylint: disable=g-bad-import-order
+# pylint: disable=invalid-name
+import os
+import sys
+import time
+import thread
+from logging import getLogger
+from logging import log
+from logging import debug
+from logging import error
+from logging import fatal
+from logging import info
+from logging import warn
+from logging import warning
+from logging import DEBUG
+from logging import ERROR
+from logging import FATAL
+from logging import INFO
+from logging import WARN
+
+# Controls which methods from pyglib.logging are available within the project
+# Do not add methods here without also adding to platform/google/_logging.py
+__all__ = ['log', 'debug', 'error', 'fatal', 'info', 'warn', 'warning',
+ 'DEBUG', 'ERROR', 'FATAL', 'INFO', 'WARN',
+ 'flush', 'log_every_n', 'log_first_n', 'vlog',
+ 'TaskLevelStatusMessage', 'get_verbosity', 'set_verbosity']
+
+warning = warn
+
+_level_names = {
+ FATAL: 'FATAL',
+ ERROR: 'ERROR',
+ WARN: 'WARN',
+ INFO: 'INFO',
+ DEBUG: 'DEBUG',
+}
+
+# Mask to convert integer thread ids to unsigned quantities for logging
+# purposes
+_THREAD_ID_MASK = 2 * sys.maxint + 1
+
+_log_prefix = None # later set to google2_log_prefix
+
+# Counter to keep track of number of log entries per token.
+_log_counter_per_token = {}
+
+
+def TaskLevelStatusMessage(msg):
+ error(msg)
+
+
+def flush():
+ raise NotImplementedError()
+
+
+# Code below is taken from pyglib/logging
+def vlog(level, msg, *args, **kwargs):
+ log(level, msg, *args, **kwargs)
+
+
+def _GetNextLogCountPerToken(token):
+ """Wrapper for _log_counter_per_token.
+
+ Args:
+ token: The token for which to look up the count.
+
+ Returns:
+ The number of times this function has been called with
+ *token* as an argument (starting at 0)
+ """
+ global _log_counter_per_token # pylint: disable=global-variable-not-assigned
+ _log_counter_per_token[token] = 1 + _log_counter_per_token.get(token, -1)
+ return _log_counter_per_token[token]
+
+
+def log_every_n(level, msg, n, *args):
+ """Log 'msg % args' at level 'level' once per 'n' times.
+
+ Logs the 1st call, (N+1)st call, (2N+1)st call, etc.
+ Not threadsafe.
+
+ Args:
+ level: The level at which to log.
+ msg: The message to be logged.
+ n: The number of times this should be called before it is logged.
+ *args: The args to be substituted into the msg.
+ """
+ count = _GetNextLogCountPerToken(_GetFileAndLine())
+ log_if(level, msg, not (count % n), *args)
+
+
+def log_first_n(level, msg, n, *args): # pylint: disable=g-bad-name
+ """Log 'msg % args' at level 'level' only first 'n' times.
+
+ Not threadsafe.
+
+ Args:
+ level: The level at which to log.
+ msg: The message to be logged.
+ n: The number of times this should be called before it is logged.
+ *args: The args to be substituted into the msg.
+ """
+ count = _GetNextLogCountPerToken(_GetFileAndLine())
+ log_if(level, msg, count < n, *args)
+
+
+def log_if(level, msg, condition, *args):
+ """Log 'msg % args' at level 'level' only if condition is fulfilled."""
+ if condition:
+ vlog(level, msg, *args)
+
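+# Usage sketch (hypothetical message): inside a loop,
+#
+#   log_every_n(INFO, "processed %d items", 100, i)
+#
+# logs on the 1st, 101st, 201st, ... call from this call site, while
+# log_first_n(INFO, "starting up", 3) logs only the first three calls.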
+
+def _GetFileAndLine():
+ """Returns (filename, linenumber) for the stack frame."""
+ # Use sys._getframe(). This avoids creating a traceback object.
+ # pylint: disable=protected-access
+ f = sys._getframe()
+ # pylint: enable=protected-access
+ our_file = f.f_code.co_filename
+ f = f.f_back
+ while f:
+ code = f.f_code
+ if code.co_filename != our_file:
+ return (code.co_filename, f.f_lineno)
+ f = f.f_back
+ return ('<unknown>', 0)
+
+
+def google2_log_prefix(level, timestamp=None, file_and_line=None):
+ """Assemble a logline prefix using the google2 format."""
+ # pylint: disable=global-variable-not-assigned
+ global _level_names
+ global _logfile_map, _logfile_map_mutex
+ # pylint: enable=global-variable-not-assigned
+
+ # Record current time
+ now = timestamp or time.time()
+ now_tuple = time.localtime(now)
+ now_microsecond = int(1e6 * (now % 1.0))
+
+ (filename, line) = file_and_line or _GetFileAndLine()
+ basename = os.path.basename(filename)
+
+ # Severity string
+ severity = 'I'
+ if level in _level_names:
+ severity = _level_names[level][0]
+
+ s = '%c%02d%02d %02d:%02d:%02d.%06d %5d %s:%d] ' % (
+ severity,
+ now_tuple[1], # month
+ now_tuple[2], # day
+ now_tuple[3], # hour
+ now_tuple[4], # min
+ now_tuple[5], # sec
+ now_microsecond,
+ _get_thread_id(),
+ basename,
+ line)
+
+ return s
+
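+# Example prefix (hypothetical values): a call at INFO level might yield
+# "I1108 14:05:02.123456  4321 my_script.py:42] " -- severity letter, month,
+# day, wall-clock time with microseconds, thread id, and basename:line.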
+
+def get_verbosity():
+ """Return how much logging output will be produced."""
+ return getLogger().getEffectiveLevel()
+
+
+def set_verbosity(verbosity):
+ """Sets the threshold for what messages will be logged."""
+ getLogger().setLevel(verbosity)
+
+
+def _get_thread_id():
+ """Get id of current thread, suitable for logging as an unsigned quantity."""
+ thread_id = thread.get_ident()
+ return thread_id & _THREAD_ID_MASK
+
+
+_log_prefix = google2_log_prefix
diff --git a/tensorflow/python/platform/default/_parameterized.py b/tensorflow/python/platform/default/_parameterized.py
new file mode 100644
index 0000000000..5d141568ed
--- /dev/null
+++ b/tensorflow/python/platform/default/_parameterized.py
@@ -0,0 +1,2 @@
+"""Extension to unittest to run parameterized tests."""
+raise ImportError("Not implemented yet.")
diff --git a/tensorflow/python/platform/default/_resource_loader.py b/tensorflow/python/platform/default/_resource_loader.py
new file mode 100644
index 0000000000..69f425072f
--- /dev/null
+++ b/tensorflow/python/platform/default/_resource_loader.py
@@ -0,0 +1,26 @@
+"""Read a file and return its contents."""
+
+import os.path
+
+from tensorflow.python.platform import logging
+
+
+def load_resource(path):
+ """Load the resource at given path, where path is relative to tensorflow/.
+
+ Args:
+ path: a string resource path relative to tensorflow/.
+
+ Returns:
+ The contents of that resource.
+
+ Raises:
+ IOError: If the path is not found, or the resource can't be opened.
+ """
+ path = os.path.join('tensorflow', path)
+ path = os.path.abspath(path)
+ try:
+ with open(path, 'rb') as f:
+ return f.read()
+ except IOError as e:
+ logging.warning('IOError %s on path %s' % (e, path))
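+
+
+# Usage sketch (hypothetical resource path, resolved relative to tensorflow/):
+#
+#   data = load_resource('python/platform/default/_resource_loader.py')
+#
+# On success this returns the file's raw bytes; if the file cannot be opened,
+# a warning is logged and None is returned.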
diff --git a/tensorflow/python/platform/default/_status_bar.py b/tensorflow/python/platform/default/_status_bar.py
new file mode 100644
index 0000000000..2953908724
--- /dev/null
+++ b/tensorflow/python/platform/default/_status_bar.py
@@ -0,0 +1,5 @@
+"""A no-op implementation of status bar functions."""
+
+
+def SetupStatusBarInsideGoogle(unused_link_text, unused_port):
+ pass
diff --git a/tensorflow/python/platform/default/flags_test.py b/tensorflow/python/platform/default/flags_test.py
new file mode 100644
index 0000000000..1b15ca138a
--- /dev/null
+++ b/tensorflow/python/platform/default/flags_test.py
@@ -0,0 +1,53 @@
+"""Tests for our flags implementation."""
+import sys
+
+from tensorflow.python.platform.default import _googletest as googletest
+
+from tensorflow.python.platform.default import _flags as flags
+
+
+flags.DEFINE_string("string_foo", "default_val", "HelpString")
+flags.DEFINE_boolean("bool_foo", True, "HelpString")
+flags.DEFINE_integer("int_foo", 42, "HelpString")
+flags.DEFINE_float("float_foo", 42.0, "HelpString")
+
+FLAGS = flags.FLAGS
+
+class FlagsTest(googletest.TestCase):
+
+ def testString(self):
+ res = FLAGS.string_foo
+ self.assertEqual(res, "default_val")
+ FLAGS.string_foo = "bar"
+ self.assertEqual("bar", FLAGS.string_foo)
+
+ def testBool(self):
+ res = FLAGS.bool_foo
+ self.assertTrue(res)
+ FLAGS.bool_foo = False
+ self.assertFalse(FLAGS.bool_foo)
+
+ def testNoBool(self):
+ FLAGS.bool_foo = True
+ try:
+ sys.argv.append("--nobool_foo")
+ FLAGS._parse_flags()
+ self.assertFalse(FLAGS.bool_foo)
+ finally:
+ sys.argv.pop()
+
+ def testInt(self):
+ res = FLAGS.int_foo
+ self.assertEquals(res, 42)
+ FLAGS.int_foo = -1
+ self.assertEqual(-1, FLAGS.int_foo)
+
+ def testFloat(self):
+ res = FLAGS.float_foo
+ self.assertEquals(42.0, res)
+ FLAGS.float_foo = -1.0
+ self.assertEqual(-1.0, FLAGS.float_foo)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/platform/default/gfile_test.py b/tensorflow/python/platform/default/gfile_test.py
new file mode 100644
index 0000000000..9eec952e95
--- /dev/null
+++ b/tensorflow/python/platform/default/gfile_test.py
@@ -0,0 +1,147 @@
+import os
+import shutil
+
+from tensorflow.python.platform.default import _gfile as gfile
+from tensorflow.python.platform.default import _googletest as googletest
+from tensorflow.python.platform.default import _logging as logging
+
+
+class _BaseTest(object):
+
+ @property
+ def tmp(self):
+ return self._tmp_dir
+
+ def setUp(self):
+ self._orig_dir = os.getcwd()
+ self._tmp_dir = googletest.GetTempDir() + "/"
+ try:
+ os.makedirs(self._tmp_dir)
+ except OSError:
+ pass # Directory already exists
+
+ def tearDown(self):
+ try:
+ shutil.rmtree(self._tmp_dir)
+ except OSError:
+ logging.warn("[%s] Post-test directory cleanup failed: %s"
+ % (self, self._tmp_dir))
+
+
+class _GFileBaseTest(_BaseTest):
+
+ @property
+ def gfile(self):
+ raise NotImplementedError("Do not use _GFileBaseTest directly.")
+
+ def testWith(self):
+ with self.gfile(self.tmp + "test_with", "w") as fh:
+ fh.write("hi")
+ with self.gfile(self.tmp + "test_with", "r") as fh:
+ self.assertEquals(fh.read(), "hi")
+
+ def testSizeAndTellAndSeek(self):
+ with self.gfile(self.tmp + "test_tell", "w") as fh:
+ fh.write("".join(["0"] * 1000))
+ with self.gfile(self.tmp + "test_tell", "r") as fh:
+ self.assertEqual(1000, fh.Size())
+ self.assertEqual(0, fh.tell())
+ fh.seek(0, 2)
+ self.assertEqual(1000, fh.tell())
+ fh.seek(0)
+ self.assertEqual(0, fh.tell())
+
+ def testReadAndWritelines(self):
+ with self.gfile(self.tmp + "test_writelines", "w") as fh:
+ fh.writelines(["%d\n" % d for d in range(10)])
+ with self.gfile(self.tmp + "test_writelines", "r") as fh:
+ self.assertEqual(["%d\n" % x for x in range(10)], fh.readlines())
+
+ def testWriteAndTruncate(self):
+ with self.gfile(self.tmp + "test_truncate", "w") as fh:
+ fh.write("ababab")
+ with self.gfile(self.tmp + "test_truncate", "a+") as fh:
+ fh.seek(0, 2)
+ fh.write("hjhjhj")
+ with self.gfile(self.tmp + "test_truncate", "a+") as fh:
+ self.assertEqual(fh.Size(), 12)
+ fh.truncate(6)
+ with self.gfile(self.tmp + "test_truncate", "r") as fh:
+ self.assertEqual(fh.read(), "ababab")
+
+ def testErrors(self):
+ self.assertRaises(
+ gfile.FileError, lambda: self.gfile(self.tmp + "doesnt_exist", "r"))
+ with self.gfile(self.tmp + "test_error", "w") as fh:
+ self.assertRaises(gfile.FileError, lambda: fh.seek(-1))
+ # test_error now exists, we can read from it:
+ with self.gfile(self.tmp + "test_error", "r") as fh:
+ self.assertRaises(gfile.FileError, lambda: fh.write("ack"))
+ fh = self.gfile(self.tmp + "test_error", "w")
+ self.assertFalse(fh.closed)
+ fh.close()
+ self.assertTrue(fh.closed)
+ self.assertRaises(gfile.FileError, lambda: fh.write("ack"))
+
+ def testIteration(self):
+ with self.gfile(self.tmp + "test_iter", "w") as fh:
+ fh.writelines(["a\n", "b\n", "c\n"])
+ with self.gfile(self.tmp + "test_iter", "r") as fh:
+ lines = list(fh)
+ self.assertEqual(["a\n", "b\n", "c\n"], lines)
+
+
+class GFileTest(_GFileBaseTest, googletest.TestCase):
+
+ @property
+ def gfile(self):
+ return gfile.GFile
+
+
+class FastGFileTest(_GFileBaseTest, googletest.TestCase):
+
+ @property
+ def gfile(self):
+ return gfile.FastGFile
+
+
+class FunctionTests(_BaseTest, googletest.TestCase):
+
+ def testExists(self):
+ self.assertFalse(gfile.Exists(self.tmp + "test_exists"))
+ with gfile.GFile(self.tmp + "test_exists", "w"):
+ pass
+ self.assertTrue(gfile.Exists(self.tmp + "test_exists"))
+
+ def testMkDirsGlobAndRmDirs(self):
+ self.assertFalse(gfile.Exists(self.tmp + "test_dir"))
+ gfile.MkDir(self.tmp + "test_dir")
+ self.assertTrue(gfile.Exists(self.tmp + "test_dir"))
+ gfile.RmDir(self.tmp + "test_dir")
+ self.assertFalse(gfile.Exists(self.tmp + "test_dir"))
+ gfile.MakeDirs(self.tmp + "test_dir/blah0")
+ gfile.MakeDirs(self.tmp + "test_dir/blah1")
+ self.assertEqual([self.tmp + "test_dir/blah0", self.tmp + "test_dir/blah1"],
+ sorted(gfile.Glob(self.tmp + "test_dir/*")))
+ gfile.DeleteRecursively(self.tmp + "test_dir")
+ self.assertFalse(gfile.Exists(self.tmp + "test_dir"))
+
+ def testErrors(self):
+ self.assertRaises(
+ gfile.GOSError, lambda: gfile.RmDir(self.tmp + "dir_doesnt_exist"))
+ self.assertRaises(
+ gfile.GOSError, lambda: gfile.Remove(self.tmp + "file_doesnt_exist"))
+ gfile.MkDir(self.tmp + "error_dir")
+ with gfile.GFile(self.tmp + "error_dir/file", "w"):
+ pass # Create file
+ self.assertRaises(
+ gfile.GOSError, lambda: gfile.Remove(self.tmp + "error_dir"))
+ self.assertRaises(
+ gfile.GOSError, lambda: gfile.RmDir(self.tmp + "error_dir"))
+ self.assertTrue(gfile.Exists(self.tmp + "error_dir"))
+ gfile.DeleteRecursively(self.tmp + "error_dir")
+ self.assertFalse(gfile.Exists(self.tmp + "error_dir"))
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/platform/default/logging_test.py b/tensorflow/python/platform/default/logging_test.py
new file mode 100644
index 0000000000..fd492bc384
--- /dev/null
+++ b/tensorflow/python/platform/default/logging_test.py
@@ -0,0 +1,13 @@
+from tensorflow.python.platform.default import _googletest as googletest
+from tensorflow.python.platform.default import _logging as logging
+
+
+class EventLoaderTest(googletest.TestCase):
+
+ def test_log(self):
+ # Just check that logging works without raising an exception.
+ logging.error("test log message")
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/platform/flags.py b/tensorflow/python/platform/flags.py
new file mode 100644
index 0000000000..d5b12d26df
--- /dev/null
+++ b/tensorflow/python/platform/flags.py
@@ -0,0 +1,10 @@
+"""Switch between depending on pyglib.flags or open-source gflags."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_FLAGS:
+ from tensorflow.python.platform.default._flags import *
+else:
+ from tensorflow.python.platform.google._flags import *
diff --git a/tensorflow/python/platform/gfile.py b/tensorflow/python/platform/gfile.py
new file mode 100644
index 0000000000..fc28811821
--- /dev/null
+++ b/tensorflow/python/platform/gfile.py
@@ -0,0 +1,10 @@
+"""Switch between depending on pyglib.gfile or an OSS replacement."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_GFILE:
+ from tensorflow.python.platform.default._gfile import *
+else:
+ from tensorflow.python.platform.google._gfile import *
diff --git a/tensorflow/python/platform/googletest.py b/tensorflow/python/platform/googletest.py
new file mode 100644
index 0000000000..ca22ec6e6b
--- /dev/null
+++ b/tensorflow/python/platform/googletest.py
@@ -0,0 +1,10 @@
+"""Switch between depending on googletest or unittest."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_GOOGLETEST:
+ from tensorflow.python.platform.default._googletest import *
+else:
+ from tensorflow.python.platform.google._googletest import *
diff --git a/tensorflow/python/platform/logging.py b/tensorflow/python/platform/logging.py
new file mode 100644
index 0000000000..b6d2e53dd4
--- /dev/null
+++ b/tensorflow/python/platform/logging.py
@@ -0,0 +1,10 @@
+"""Switch between depending on pyglib.logging or regular logging."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_LOGGING:
+ from tensorflow.python.platform.default._logging import *
+else:
+ from tensorflow.python.platform.google._logging import *
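+
+
+# control_imports (presumably provided elsewhere in this change) is expected to
+# expose booleans such as USE_OSS and OSS_LOGGING; a minimal sketch:
+#
+#   USE_OSS = True      # prefer the open-source implementations
+#   OSS_LOGGING = True  # use platform/default/_logging for logging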
diff --git a/tensorflow/python/platform/numpy.i b/tensorflow/python/platform/numpy.i
new file mode 100644
index 0000000000..217acd5bff
--- /dev/null
+++ b/tensorflow/python/platform/numpy.i
@@ -0,0 +1,3085 @@
+/* -*- C -*- (not really, but good for syntax highlighting) */
+#ifdef SWIGPYTHON
+
+%{
+#ifndef SWIG_FILE_WITH_INIT
+#define NO_IMPORT_ARRAY
+#endif
+#include "stdio.h"
+#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
+#include <numpy/arrayobject.h>
+%}
+
+/**********************************************************************/
+
+%fragment("NumPy_Backward_Compatibility", "header")
+{
+%#if NPY_API_VERSION < 0x00000007
+%#define NPY_ARRAY_DEFAULT NPY_DEFAULT
+%#define NPY_ARRAY_FARRAY NPY_FARRAY
+%#define NPY_FORTRANORDER NPY_FORTRAN
+%#endif
+}
+
+/**********************************************************************/
+
+/* The following code originally appeared in
+ * enthought/kiva/agg/src/numeric.i written by Eric Jones. It was
+ * translated from C++ to C by John Hunter. Bill Spotz has modified
+ * it to fix some minor bugs, upgrade from Numeric to numpy (all
+ * versions), add some comments and functionality, and convert from
+ * direct code insertion to SWIG fragments.
+ */
+
+%fragment("NumPy_Macros", "header")
+{
+/* Macros to extract array attributes.
+ */
+%#if NPY_API_VERSION < 0x00000007
+%#define is_array(a) ((a) && PyArray_Check((PyArrayObject*)a))
+%#define array_type(a) (int)(PyArray_TYPE((PyArrayObject*)a))
+%#define array_numdims(a) (((PyArrayObject*)a)->nd)
+%#define array_dimensions(a) (((PyArrayObject*)a)->dimensions)
+%#define array_size(a,i) (((PyArrayObject*)a)->dimensions[i])
+%#define array_strides(a) (((PyArrayObject*)a)->strides)
+%#define array_stride(a,i) (((PyArrayObject*)a)->strides[i])
+%#define array_data(a) (((PyArrayObject*)a)->data)
+%#define array_descr(a) (((PyArrayObject*)a)->descr)
+%#define array_flags(a) (((PyArrayObject*)a)->flags)
+%#define array_enableflags(a,f) (((PyArrayObject*)a)->flags) = f
+%#else
+%#define is_array(a) ((a) && PyArray_Check(a))
+%#define array_type(a) PyArray_TYPE((PyArrayObject*)a)
+%#define array_numdims(a) PyArray_NDIM((PyArrayObject*)a)
+%#define array_dimensions(a) PyArray_DIMS((PyArrayObject*)a)
+%#define array_strides(a) PyArray_STRIDES((PyArrayObject*)a)
+%#define array_stride(a,i) PyArray_STRIDE((PyArrayObject*)a,i)
+%#define array_size(a,i) PyArray_DIM((PyArrayObject*)a,i)
+%#define array_data(a) PyArray_DATA((PyArrayObject*)a)
+%#define array_descr(a) PyArray_DESCR((PyArrayObject*)a)
+%#define array_flags(a) PyArray_FLAGS((PyArrayObject*)a)
+%#define array_enableflags(a,f) PyArray_ENABLEFLAGS((PyArrayObject*)a,f)
+%#endif
+%#define array_is_contiguous(a) (PyArray_ISCONTIGUOUS((PyArrayObject*)a))
+%#define array_is_native(a) (PyArray_ISNOTSWAPPED((PyArrayObject*)a))
+%#define array_is_fortran(a) (PyArray_ISFORTRAN((PyArrayObject*)a))
+}
+
+/**********************************************************************/
+
+%fragment("NumPy_Utilities",
+ "header")
+{
+ /* Given a PyObject, return a string describing its type.
+ */
+ const char* pytype_string(PyObject* py_obj)
+ {
+ if (py_obj == NULL ) return "C NULL value";
+ if (py_obj == Py_None ) return "Python None" ;
+ if (PyCallable_Check(py_obj)) return "callable" ;
+ if (PyString_Check( py_obj)) return "string" ;
+ if (PyInt_Check( py_obj)) return "int" ;
+ if (PyFloat_Check( py_obj)) return "float" ;
+ if (PyDict_Check( py_obj)) return "dict" ;
+ if (PyList_Check( py_obj)) return "list" ;
+ if (PyTuple_Check( py_obj)) return "tuple" ;
+%#if PY_MAJOR_VERSION < 3
+ if (PyFile_Check( py_obj)) return "file" ;
+ if (PyModule_Check( py_obj)) return "module" ;
+ if (PyInstance_Check(py_obj)) return "instance" ;
+%#endif
+
+ return "unkown type";
+ }
+
+ /* Given a NumPy typecode, return a string describing the type.
+ */
+ const char* typecode_string(int typecode)
+ {
+ static const char* type_names[25] = {"bool",
+ "byte",
+ "unsigned byte",
+ "short",
+ "unsigned short",
+ "int",
+ "unsigned int",
+ "long",
+ "unsigned long",
+ "long long",
+ "unsigned long long",
+ "float",
+ "double",
+ "long double",
+ "complex float",
+ "complex double",
+ "complex long double",
+ "object",
+ "string",
+ "unicode",
+ "void",
+ "ntypes",
+ "notype",
+ "char",
+ "unknown"};
+ return typecode < 24 ? type_names[typecode] : type_names[24];
+ }
+
+ /* Make sure input has correct numpy type. This now just calls
+ PyArray_EquivTypenums().
+ */
+ int type_match(int actual_type,
+ int desired_type)
+ {
+ return PyArray_EquivTypenums(actual_type, desired_type);
+ }
+
+%#ifdef SWIGPY_USE_CAPSULE
+ void free_cap(PyObject * cap)
+ {
+ void* array = (void*) PyCapsule_GetPointer(cap,SWIGPY_CAPSULE_NAME);
+ if (array != NULL) free(array);
+ }
+%#endif
+
+
+}
+
+/**********************************************************************/
+
+%fragment("NumPy_Object_to_Array",
+ "header",
+ fragment="NumPy_Backward_Compatibility",
+ fragment="NumPy_Macros",
+ fragment="NumPy_Utilities")
+{
+ /* Given a PyObject pointer, cast it to a PyArrayObject pointer if
+ * legal. If not, set the python error string appropriately and
+ * return NULL.
+ */
+ PyArrayObject* obj_to_array_no_conversion(PyObject* input,
+ int typecode)
+ {
+ PyArrayObject* ary = NULL;
+ if (is_array(input) && (typecode == NPY_NOTYPE ||
+ PyArray_EquivTypenums(array_type(input), typecode)))
+ {
+ ary = (PyArrayObject*) input;
+ }
+ else if is_array(input)
+ {
+ const char* desired_type = typecode_string(typecode);
+ const char* actual_type = typecode_string(array_type(input));
+ PyErr_Format(PyExc_TypeError,
+ "Array of type '%s' required. Array of type '%s' given",
+ desired_type, actual_type);
+ ary = NULL;
+ }
+ else
+ {
+ const char* desired_type = typecode_string(typecode);
+ const char* actual_type = pytype_string(input);
+ PyErr_Format(PyExc_TypeError,
+ "Array of type '%s' required. A '%s' was given",
+ desired_type,
+ actual_type);
+ ary = NULL;
+ }
+ return ary;
+ }
+
+ /* Convert the given PyObject to a NumPy array with the given
+ * typecode. On success, return a valid PyArrayObject* with the
+ * correct type. On failure, the python error string will be set and
+ * the routine returns NULL.
+ */
+ PyArrayObject* obj_to_array_allow_conversion(PyObject* input,
+ int typecode,
+ int* is_new_object)
+ {
+ PyArrayObject* ary = NULL;
+ PyObject* py_obj;
+ if (is_array(input) && (typecode == NPY_NOTYPE ||
+ PyArray_EquivTypenums(array_type(input),typecode)))
+ {
+ ary = (PyArrayObject*) input;
+ *is_new_object = 0;
+ }
+ else
+ {
+ py_obj = PyArray_FROMANY(input, typecode, 0, 0, NPY_ARRAY_DEFAULT);
+ /* If NULL, PyArray_FromObject will have set python error value.*/
+ ary = (PyArrayObject*) py_obj;
+ *is_new_object = 1;
+ }
+ return ary;
+ }
+
+ /* Given a PyArrayObject, check to see if it is contiguous. If so,
+ * return the input pointer and flag it as not a new object. If it is
+ * not contiguous, create a new PyArrayObject using the original data,
+ * flag it as a new object and return the pointer.
+ */
+ PyArrayObject* make_contiguous(PyArrayObject* ary,
+ int* is_new_object,
+ int min_dims,
+ int max_dims)
+ {
+ PyArrayObject* result;
+ if (array_is_contiguous(ary))
+ {
+ result = ary;
+ *is_new_object = 0;
+ }
+ else
+ {
+ result = (PyArrayObject*) PyArray_ContiguousFromObject((PyObject*)ary,
+ array_type(ary),
+ min_dims,
+ max_dims);
+ *is_new_object = 1;
+ }
+ return result;
+ }
+
+ /* Given a PyArrayObject, check to see if it is Fortran-contiguous.
+ * If so, return the input pointer and flag it as not a new object.
+ * If it is not Fortran-contiguous, create a new
+ * PyArrayObject using the original data, flag it as a new object
+ * and return the pointer.
+ */
+ PyArrayObject* make_fortran(PyArrayObject* ary,
+ int* is_new_object)
+ {
+ PyArrayObject* result;
+ if (array_is_fortran(ary))
+ {
+ result = ary;
+ *is_new_object = 0;
+ }
+ else
+ {
+ Py_INCREF(array_descr(ary));
+ result = (PyArrayObject*) PyArray_FromArray(ary,
+ array_descr(ary),
+ NPY_FORTRANORDER);
+ *is_new_object = 1;
+ }
+ return result;
+ }
+
+ /* Convert a given PyObject to a contiguous PyArrayObject of the
+ * specified type. If the input object is not a contiguous
+ * PyArrayObject, a new one will be created and the new object flag
+ * will be set.
+ */
+ PyArrayObject* obj_to_array_contiguous_allow_conversion(PyObject* input,
+ int typecode,
+ int* is_new_object)
+ {
+ int is_new1 = 0;
+ int is_new2 = 0;
+ PyArrayObject* ary2;
+ PyArrayObject* ary1 = obj_to_array_allow_conversion(input,
+ typecode,
+ &is_new1);
+ if (ary1)
+ {
+ ary2 = make_contiguous(ary1, &is_new2, 0, 0);
+ if ( is_new1 && is_new2)
+ {
+ Py_DECREF(ary1);
+ }
+ ary1 = ary2;
+ }
+ *is_new_object = is_new1 || is_new2;
+ return ary1;
+ }
+
+ /* Convert a given PyObject to a Fortran-ordered PyArrayObject of the
+ * specified type. If the input object is not a Fortran-ordered
+ * PyArrayObject, a new one will be created and the new object flag
+ * will be set.
+ */
+ PyArrayObject* obj_to_array_fortran_allow_conversion(PyObject* input,
+ int typecode,
+ int* is_new_object)
+ {
+ int is_new1 = 0;
+ int is_new2 = 0;
+ PyArrayObject* ary2;
+ PyArrayObject* ary1 = obj_to_array_allow_conversion(input,
+ typecode,
+ &is_new1);
+ if (ary1)
+ {
+ ary2 = make_fortran(ary1, &is_new2);
+ if (is_new1 && is_new2)
+ {
+ Py_DECREF(ary1);
+ }
+ ary1 = ary2;
+ }
+ *is_new_object = is_new1 || is_new2;
+ return ary1;
+ }
+} /* end fragment */
+
+/**********************************************************************/
+
+%fragment("NumPy_Array_Requirements",
+ "header",
+ fragment="NumPy_Backward_Compatibility",
+ fragment="NumPy_Macros")
+{
+ /* Test whether a python object is contiguous. If array is
+ * contiguous, return 1. Otherwise, set the python error string and
+ * return 0.
+ */
+ int require_contiguous(PyArrayObject* ary)
+ {
+ int contiguous = 1;
+ if (!array_is_contiguous(ary))
+ {
+ PyErr_SetString(PyExc_TypeError,
+ "Array must be contiguous. A non-contiguous array was given");
+ contiguous = 0;
+ }
+ return contiguous;
+ }
+
+ /* Require that a numpy array is not byte-swapped. If the array is
+ * not byte-swapped, return 1. Otherwise, set the python error string
+ * and return 0.
+ */
+ int require_native(PyArrayObject* ary)
+ {
+ int native = 1;
+ if (!array_is_native(ary))
+ {
+ PyErr_SetString(PyExc_TypeError,
+ "Array must have native byteorder. "
+ "A byte-swapped array was given");
+ native = 0;
+ }
+ return native;
+ }
+
+ /* Require the given PyArrayObject to have a specified number of
+ * dimensions. If the array has the specified number of dimensions,
+ * return 1. Otherwise, set the python error string and return 0.
+ */
+ int require_dimensions(PyArrayObject* ary,
+ int exact_dimensions)
+ {
+ int success = 1;
+ if (array_numdims(ary) != exact_dimensions)
+ {
+ PyErr_Format(PyExc_TypeError,
+ "Array must have %d dimensions. Given array has %d dimensions",
+ exact_dimensions,
+ array_numdims(ary));
+ success = 0;
+ }
+ return success;
+ }
+
+ /* Require the given PyArrayObject to have one of a list of specified
+ * number of dimensions. If the array has one of the specified number
+ * of dimensions, return 1. Otherwise, set the python error string
+ * and return 0.
+ */
+ int require_dimensions_n(PyArrayObject* ary,
+ int* exact_dimensions,
+ int n)
+ {
+ int success = 0;
+ int i;
+ char dims_str[255] = "";
+ char s[255];
+ for (i = 0; i < n && !success; i++)
+ {
+ if (array_numdims(ary) == exact_dimensions[i])
+ {
+ success = 1;
+ }
+ }
+ if (!success)
+ {
+ for (i = 0; i < n-1; i++)
+ {
+ sprintf(s, "%d, ", exact_dimensions[i]);
+ strcat(dims_str,s);
+ }
+ sprintf(s, " or %d", exact_dimensions[n-1]);
+ strcat(dims_str,s);
+ PyErr_Format(PyExc_TypeError,
+ "Array must have %s dimensions. Given array has %d dimensions",
+ dims_str,
+ array_numdims(ary));
+ }
+ return success;
+ }
+
+ /* Require the given PyArrayObject to have a specified shape. If the
+ * array has the specified shape, return 1. Otherwise, set the python
+ * error string and return 0.
+ */
+ int require_size(PyArrayObject* ary,
+ npy_intp* size,
+ int n)
+ {
+ int i;
+ int success = 1;
+ int len;
+ char desired_dims[255] = "[";
+ char s[255];
+ char actual_dims[255] = "[";
+ for(i=0; i < n;i++)
+ {
+ if (size[i] != -1 && size[i] != array_size(ary,i))
+ {
+ success = 0;
+ }
+ }
+ if (!success)
+ {
+ for (i = 0; i < n; i++)
+ {
+ if (size[i] == -1)
+ {
+ sprintf(s, "*,");
+ }
+ else
+ {
+ sprintf(s, "%ld,", (long int)size[i]);
+ }
+ strcat(desired_dims,s);
+ }
+ len = strlen(desired_dims);
+ desired_dims[len-1] = ']';
+ for (i = 0; i < n; i++)
+ {
+ sprintf(s, "%ld,", (long int)array_size(ary,i));
+ strcat(actual_dims,s);
+ }
+ len = strlen(actual_dims);
+ actual_dims[len-1] = ']';
+ PyErr_Format(PyExc_TypeError,
+ "Array must have shape of %s. Given array has shape of %s",
+ desired_dims,
+ actual_dims);
+ }
+ return success;
+ }
+
+ /* Require the given PyArrayObject to be Fortran ordered. If the
+ * PyArrayObject is already Fortran ordered, do nothing. Else,
+ * set the Fortran ordering flag and recompute the strides.
+ */
+ int require_fortran(PyArrayObject* ary)
+ {
+ int success = 1;
+ int nd = array_numdims(ary);
+ int i;
+ npy_intp * strides = array_strides(ary);
+ if (array_is_fortran(ary)) return success;
+ /* Set the Fortran ordered flag */
+ array_enableflags(ary,NPY_ARRAY_FARRAY);
+ /* Recompute the strides */
+ strides[0] = strides[nd-1];
+ for (i=1; i < nd; ++i)
+ strides[i] = strides[i-1] * array_size(ary,i-1);
+ return success;
+ }
+}
+
+/* Combine all NumPy fragments into one for convenience */
+%fragment("NumPy_Fragments",
+ "header",
+ fragment="NumPy_Backward_Compatibility",
+ fragment="NumPy_Macros",
+ fragment="NumPy_Utilities",
+ fragment="NumPy_Object_to_Array",
+ fragment="NumPy_Array_Requirements")
+{
+}
+
+/* End John Hunter translation (with modifications by Bill Spotz)
+ */
+
+/* %numpy_typemaps() macro
+ *
+ * This macro defines a family of 74 typemaps that allow C arguments
+ * of the form
+ *
+ * 1. (DATA_TYPE IN_ARRAY1[ANY])
+ * 2. (DATA_TYPE* IN_ARRAY1, DIM_TYPE DIM1)
+ * 3. (DIM_TYPE DIM1, DATA_TYPE* IN_ARRAY1)
+ *
+ * 4. (DATA_TYPE IN_ARRAY2[ANY][ANY])
+ * 5. (DATA_TYPE* IN_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ * 6. (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_ARRAY2)
+ * 7. (DATA_TYPE* IN_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ * 8. (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_FARRAY2)
+ *
+ * 9. (DATA_TYPE IN_ARRAY3[ANY][ANY][ANY])
+ * 10. (DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 11. (DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 12. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_ARRAY3)
+ * 13. (DATA_TYPE* IN_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 14. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_FARRAY3)
+ *
+ * 15. (DATA_TYPE IN_ARRAY4[ANY][ANY][ANY][ANY])
+ * 16. (DATA_TYPE* IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 17. (DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 18. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_ARRAY4)
+ * 19. (DATA_TYPE* IN_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 20. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_FARRAY4)
+ *
+ * 21. (DATA_TYPE INPLACE_ARRAY1[ANY])
+ * 22. (DATA_TYPE* INPLACE_ARRAY1, DIM_TYPE DIM1)
+ * 23. (DIM_TYPE DIM1, DATA_TYPE* INPLACE_ARRAY1)
+ *
+ * 24. (DATA_TYPE INPLACE_ARRAY2[ANY][ANY])
+ * 25. (DATA_TYPE* INPLACE_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ * 26. (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_ARRAY2)
+ * 27. (DATA_TYPE* INPLACE_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ * 28. (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_FARRAY2)
+ *
+ * 29. (DATA_TYPE INPLACE_ARRAY3[ANY][ANY][ANY])
+ * 30. (DATA_TYPE* INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 31. (DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 32. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_ARRAY3)
+ * 33. (DATA_TYPE* INPLACE_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ * 34. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_FARRAY3)
+ *
+ * 35. (DATA_TYPE INPLACE_ARRAY4[ANY][ANY][ANY][ANY])
+ * 36. (DATA_TYPE* INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 37. (DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 38. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_ARRAY4)
+ * 39. (DATA_TYPE* INPLACE_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ * 40. (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_FARRAY4)
+ *
+ * 41. (DATA_TYPE ARGOUT_ARRAY1[ANY])
+ * 42. (DATA_TYPE* ARGOUT_ARRAY1, DIM_TYPE DIM1)
+ * 43. (DIM_TYPE DIM1, DATA_TYPE* ARGOUT_ARRAY1)
+ *
+ * 44. (DATA_TYPE ARGOUT_ARRAY2[ANY][ANY])
+ *
+ * 45. (DATA_TYPE ARGOUT_ARRAY3[ANY][ANY][ANY])
+ *
+ * 46. (DATA_TYPE ARGOUT_ARRAY4[ANY][ANY][ANY][ANY])
+ *
+ * 47. (DATA_TYPE** ARGOUTVIEW_ARRAY1, DIM_TYPE* DIM1)
+ * 48. (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEW_ARRAY1)
+ *
+ * 49. (DATA_TYPE** ARGOUTVIEW_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ * 50. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_ARRAY2)
+ * 51. (DATA_TYPE** ARGOUTVIEW_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ * 52. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_FARRAY2)
+ *
+ * 53. (DATA_TYPE** ARGOUTVIEW_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+ * 54. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEW_ARRAY3)
+ * 55. (DATA_TYPE** ARGOUTVIEW_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+ * 56. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEW_FARRAY3)
+ *
+ * 57. (DATA_TYPE** ARGOUTVIEW_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ * 58. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEW_ARRAY4)
+ * 59. (DATA_TYPE** ARGOUTVIEW_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ * 60. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEW_FARRAY4)
+ *
+ * 61. (DATA_TYPE** ARGOUTVIEWM_ARRAY1, DIM_TYPE* DIM1)
+ * 62. (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEWM_ARRAY1)
+ *
+ * 63. (DATA_TYPE** ARGOUTVIEWM_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ * 64. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_ARRAY2)
+ * 65. (DATA_TYPE** ARGOUTVIEWM_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ * 66. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_FARRAY2)
+ *
+ * 67. (DATA_TYPE** ARGOUTVIEWM_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+ * 68. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEWM_ARRAY3)
+ * 69. (DATA_TYPE** ARGOUTVIEWM_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+ * 70. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEWM_FARRAY3)
+ *
+ * 71. (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ * 72. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+ * 73. (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ * 74. (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+ *
+ * where "DATA_TYPE" is any type supported by the NumPy module, and
+ * "DIM_TYPE" is any int-like type suitable for specifying dimensions.
+ * The difference between "ARRAY" typemaps and "FARRAY" typemaps is
+ * that the "FARRAY" typemaps expect Fortran ordering of
+ * multidimensional arrays. In python, the dimensions will not need
+ * to be specified (except for the "DATA_TYPE* ARGOUT_ARRAY1"
+ * typemaps). The IN_ARRAYs can be a numpy array or any sequence that
+ * can be converted to a numpy array of the specified type. The
+ * INPLACE_ARRAYs must be numpy arrays of the appropriate type. The
+ * ARGOUT_ARRAYs will be returned as new numpy arrays of the
+ * appropriate type.
+ *
+ * These typemaps can be applied to existing functions using the
+ * %apply directive. For example:
+ *
+ * %apply (double* IN_ARRAY1, int DIM1) {(double* series, int length)};
+ * double prod(double* series, int length);
+ *
+ * %apply (int DIM1, int DIM2, double* INPLACE_ARRAY2)
+ * {(int rows, int cols, double* matrix )};
+ * void floor(int rows, int cols, double* matrix, double f);
+ *
+ * %apply (double IN_ARRAY3[ANY][ANY][ANY])
+ * {(double tensor[2][2][2] )};
+ * %apply (double ARGOUT_ARRAY3[ANY][ANY][ANY])
+ * {(double low[2][2][2] )};
+ * %apply (double ARGOUT_ARRAY3[ANY][ANY][ANY])
+ * {(double upp[2][2][2] )};
+ * void luSplit(double tensor[2][2][2],
+ * double low[2][2][2],
+ * double upp[2][2][2] );
+ *
+ * or directly with
+ *
+ * double prod(double* IN_ARRAY1, int DIM1);
+ *
+ * void floor(int DIM1, int DIM2, double* INPLACE_ARRAY2, double f);
+ *
+ * void luSplit(double IN_ARRAY3[ANY][ANY][ANY],
+ * double ARGOUT_ARRAY3[ANY][ANY][ANY],
+ * double ARGOUT_ARRAY3[ANY][ANY][ANY]);
+ */
+
+%define %numpy_typemaps(DATA_TYPE, DATA_TYPECODE, DIM_TYPE)
+
+/************************/
+/* Input Array Typemaps */
+/************************/
+
+/* Typemap suite for (DATA_TYPE IN_ARRAY1[ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE IN_ARRAY1[ANY])
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE IN_ARRAY1[ANY])
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[1] = { $1_dim0 };
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 1) ||
+ !require_size(array, size, 1)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(freearg)
+ (DATA_TYPE IN_ARRAY1[ANY])
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_ARRAY1, DIM_TYPE DIM1)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_ARRAY1, DIM_TYPE DIM1)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_ARRAY1, DIM_TYPE DIM1)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[1] = { -1 };
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 1) ||
+ !require_size(array, size, 1)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_ARRAY1, DIM_TYPE DIM1)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DATA_TYPE* IN_ARRAY1)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DATA_TYPE* IN_ARRAY1)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DATA_TYPE* IN_ARRAY1)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[1] = {-1};
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 1) ||
+ !require_size(array, size, 1)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DATA_TYPE* IN_ARRAY1)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE IN_ARRAY2[ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE IN_ARRAY2[ANY][ANY])
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE IN_ARRAY2[ANY][ANY])
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[2] = { $1_dim0, $1_dim1 };
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 2) ||
+ !require_size(array, size, 2)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(freearg)
+ (DATA_TYPE IN_ARRAY2[ANY][ANY])
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[2] = { -1, -1 };
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 2) ||
+ !require_size(array, size, 2)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_ARRAY2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_ARRAY2)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_ARRAY2)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[2] = { -1, -1 };
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 2) ||
+ !require_size(array, size, 2)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_ARRAY2)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[2] = { -1, -1 };
+ array = obj_to_array_fortran_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 2) ||
+ !require_size(array, size, 2) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_FARRAY2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_FARRAY2)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_FARRAY2)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[2] = { -1, -1 };
+ array = obj_to_array_fortran_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 2) ||
+ !require_size(array, size, 2) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* IN_FARRAY2)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE IN_ARRAY3[ANY][ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE IN_ARRAY3[ANY][ANY][ANY])
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE IN_ARRAY3[ANY][ANY][ANY])
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[3] = { $1_dim0, $1_dim1, $1_dim2 };
+ array = obj_to_array_contiguous_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 3) ||
+ !require_size(array, size, 3)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(freearg)
+ (DATA_TYPE IN_ARRAY3[ANY][ANY][ANY])
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 3) ||
+ !require_size(array, size, 3)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ /* for now, only concerned with lists */
+ $1 = PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (DATA_TYPE** array=NULL, PyArrayObject** object_array=NULL, int* is_new_object_array=NULL)
+{
+ npy_intp size[2] = { -1, -1 };
+ PyArrayObject* temp_array;
+ Py_ssize_t i;
+ int is_new_object;
+
+ /* length of the list */
+ $2 = PyList_Size($input);
+
+ /* the arrays */
+ array = (DATA_TYPE **)malloc($2*sizeof(DATA_TYPE *));
+ object_array = (PyArrayObject **)calloc($2,sizeof(PyArrayObject *));
+ is_new_object_array = (int *)calloc($2,sizeof(int));
+
+ if (array == NULL || object_array == NULL || is_new_object_array == NULL)
+ {
+ SWIG_fail;
+ }
+
+ for (i=0; i<$2; i++)
+ {
+ temp_array = obj_to_array_contiguous_allow_conversion(PySequence_GetItem($input,i), DATA_TYPECODE, &is_new_object);
+
+ /* the new array must be stored so that it can be destroyed in freearg */
+ object_array[i] = temp_array;
+ is_new_object_array[i] = is_new_object;
+
+ if (!temp_array || !require_dimensions(temp_array, 2)) SWIG_fail;
+
+ /* store the size of the first array in the list, then use that for comparison. */
+ if (i == 0)
+ {
+ size[0] = array_size(temp_array,0);
+ size[1] = array_size(temp_array,1);
+ }
+
+ if (!require_size(temp_array, size, 2)) SWIG_fail;
+
+ array[i] = (DATA_TYPE*) array_data(temp_array);
+ }
+
+ $1 = (DATA_TYPE**) array;
+ $3 = (DIM_TYPE) size[0];
+ $4 = (DIM_TYPE) size[1];
+}
+%typemap(freearg)
+ (DATA_TYPE** IN_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ Py_ssize_t i;
+
+ if (array$argnum!=NULL) free(array$argnum);
+
+ /*freeing the individual arrays if needed */
+ if (object_array$argnum!=NULL)
+ {
+ if (is_new_object_array$argnum!=NULL)
+ {
+ for (i=0; i<$2; i++)
+ {
+ if (object_array$argnum[i] != NULL && is_new_object_array$argnum[i])
+ { Py_DECREF(object_array$argnum[i]); }
+ }
+ free(is_new_object_array$argnum);
+ }
+ free(object_array$argnum);
+ }
+}
+
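+/* Usage sketch (illustrative; batch_op and its argument names are
+ * hypothetical): the DATA_TYPE** IN_ARRAY3 suite above accepts a Python
+ * list of equally shaped 2-D arrays, e.g. for
+ *
+ *     void batch_op(double** frames, int n, int rows, int cols);
+ *     %apply (double** IN_ARRAY3, int DIM1, int DIM2, int DIM3)
+ *            {(double** frames, int n, int rows, int cols)};
+ *
+ * Each list element is converted to a contiguous array of the target type;
+ * the conversions are tracked and any temporaries are released in the
+ * freearg typemap above.
+ */
+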
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
+ * DATA_TYPE* IN_ARRAY3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_ARRAY3)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_ARRAY3)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 3) ||
+ !require_size(array, size, 3)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_ARRAY3)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 3) ||
+ !require_size(array, size, 3) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
+ * DATA_TYPE* IN_FARRAY3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_FARRAY3)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_FARRAY3)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ array = obj_to_array_fortran_allow_conversion($input,
+ DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 3) ||
+ !require_size(array, size, 3) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* IN_FARRAY3)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE IN_ARRAY4[ANY][ANY][ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE IN_ARRAY4[ANY][ANY][ANY][ANY])
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE IN_ARRAY4[ANY][ANY][ANY][ANY])
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[4] = { $1_dim0, $1_dim1, $1_dim2 , $1_dim3};
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 4) ||
+ !require_size(array, size, 4)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(freearg)
+ (DATA_TYPE IN_ARRAY4[ANY][ANY][ANY][ANY])
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[4] = { -1, -1, -1, -1 };
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 4) ||
+ !require_size(array, size, 4)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+ $5 = (DIM_TYPE) array_size(array,3);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ /* for now, only concerned with lists */
+ $1 = PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (DATA_TYPE** array=NULL, PyArrayObject** object_array=NULL, int* is_new_object_array=NULL)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ PyArrayObject* temp_array;
+ Py_ssize_t i;
+ int is_new_object;
+
+ /* length of the list */
+ $2 = PyList_Size($input);
+
+ /* the arrays */
+ array = (DATA_TYPE **)malloc($2*sizeof(DATA_TYPE *));
+ object_array = (PyArrayObject **)calloc($2,sizeof(PyArrayObject *));
+ is_new_object_array = (int *)calloc($2,sizeof(int));
+
+ if (array == NULL || object_array == NULL || is_new_object_array == NULL)
+ {
+ SWIG_fail;
+ }
+
+ for (i=0; i<$2; i++)
+ {
+ temp_array = obj_to_array_contiguous_allow_conversion(PySequence_GetItem($input,i), DATA_TYPECODE, &is_new_object);
+
+ /* the new array must be stored so that it can be destroyed in freearg */
+ object_array[i] = temp_array;
+ is_new_object_array[i] = is_new_object;
+
+ if (!temp_array || !require_dimensions(temp_array, 3)) SWIG_fail;
+
+ /* store the size of the first array in the list, then use that for comparison. */
+ if (i == 0)
+ {
+ size[0] = array_size(temp_array,0);
+ size[1] = array_size(temp_array,1);
+ size[2] = array_size(temp_array,2);
+ }
+
+ if (!require_size(temp_array, size, 3)) SWIG_fail;
+
+ array[i] = (DATA_TYPE*) array_data(temp_array);
+ }
+
+ $1 = (DATA_TYPE**) array;
+ $3 = (DIM_TYPE) size[0];
+ $4 = (DIM_TYPE) size[1];
+ $5 = (DIM_TYPE) size[2];
+}
+%typemap(freearg)
+ (DATA_TYPE** IN_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ Py_ssize_t i;
+
+ if (array$argnum!=NULL) free(array$argnum);
+
+ /*freeing the individual arrays if needed */
+ if (object_array$argnum!=NULL)
+ {
+ if (is_new_object_array$argnum!=NULL)
+ {
+ for (i=0; i<$2; i++)
+ {
+ if (object_array$argnum[i] != NULL && is_new_object_array$argnum[i])
+ { Py_DECREF(object_array$argnum[i]); }
+ }
+ free(is_new_object_array$argnum);
+ }
+ free(object_array$argnum);
+ }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4,
+ * DATA_TYPE* IN_ARRAY4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_ARRAY4)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_ARRAY4)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[4] = { -1, -1, -1 , -1};
+ array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 4) ||
+ !require_size(array, size, 4)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DIM_TYPE) array_size(array,3);
+ $5 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_ARRAY4)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DATA_TYPE* IN_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* IN_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* IN_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[4] = { -1, -1, -1, -1 };
+ array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 4) ||
+ !require_size(array, size, 4) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+ $5 = (DIM_TYPE) array_size(array,3);
+}
+%typemap(freearg)
+ (DATA_TYPE* IN_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4,
+ * DATA_TYPE* IN_FARRAY4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_FARRAY4)
+{
+ $1 = is_array($input) || PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_FARRAY4)
+ (PyArrayObject* array=NULL, int is_new_object=0)
+{
+ npy_intp size[4] = { -1, -1, -1 , -1 };
+ array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE,
+ &is_new_object);
+ if (!array || !require_dimensions(array, 4) ||
+ !require_size(array, size, 4) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DIM_TYPE) array_size(array,3);
+ $5 = (DATA_TYPE*) array_data(array);
+}
+%typemap(freearg)
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* IN_FARRAY4)
+{
+ if (is_new_object$argnum && array$argnum)
+ { Py_DECREF(array$argnum); }
+}
+
+/***************************/
+/* In-Place Array Typemaps */
+/***************************/
+
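+/* Usage sketch (illustrative; scale_rows and its argument names are
+ * hypothetical): in-place typemaps hand the NumPy buffer directly to C,
+ * so the caller's array is modified without a copy, e.g.
+ *
+ *     void scale_rows(double* mat, int rows, int cols);
+ *     %apply (double* INPLACE_ARRAY2, int DIM1, int DIM2)
+ *            {(double* mat, int rows, int cols)};
+ *
+ * The Python argument must already be a contiguous, native-endian array of
+ * the matching type; otherwise the requirement checks below raise an error
+ * rather than converting.
+ */
+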
+/* Typemap suite for (DATA_TYPE INPLACE_ARRAY1[ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE INPLACE_ARRAY1[ANY])
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE INPLACE_ARRAY1[ANY])
+ (PyArrayObject* array=NULL)
+{
+ npy_intp size[1] = { $1_dim0 };
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,1) || !require_size(array, size, 1) ||
+ !require_contiguous(array) || !require_native(array)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_ARRAY1, DIM_TYPE DIM1)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_ARRAY1, DIM_TYPE DIM1)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_ARRAY1, DIM_TYPE DIM1)
+ (PyArrayObject* array=NULL, int i=1)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,1) || !require_contiguous(array)
+ || !require_native(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = 1;
+ for (i=0; i < array_numdims(array); ++i) $2 *= array_size(array,i);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DATA_TYPE* INPLACE_ARRAY1)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DATA_TYPE* INPLACE_ARRAY1)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DATA_TYPE* INPLACE_ARRAY1)
+ (PyArrayObject* array=NULL, int i=0)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,1) || !require_contiguous(array)
+ || !require_native(array)) SWIG_fail;
+ $1 = 1;
+ for (i=0; i < array_numdims(array); ++i) $1 *= array_size(array,i);
+ $2 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE INPLACE_ARRAY2[ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE INPLACE_ARRAY2[ANY][ANY])
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE INPLACE_ARRAY2[ANY][ANY])
+ (PyArrayObject* array=NULL)
+{
+ npy_intp size[2] = { $1_dim0, $1_dim1 };
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,2) || !require_size(array, size, 2) ||
+ !require_contiguous(array) || !require_native(array)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_ARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,2) || !require_contiguous(array)
+ || !require_native(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_ARRAY2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_ARRAY2)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_ARRAY2)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,2) || !require_contiguous(array) ||
+ !require_native(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_FARRAY2, DIM_TYPE DIM1, DIM_TYPE DIM2)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,2) || !require_contiguous(array)
+ || !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_FARRAY2)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_FARRAY2)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DATA_TYPE* INPLACE_FARRAY2)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,2) || !require_contiguous(array) ||
+ !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE INPLACE_ARRAY3[ANY][ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE INPLACE_ARRAY3[ANY][ANY][ANY])
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE INPLACE_ARRAY3[ANY][ANY][ANY])
+ (PyArrayObject* array=NULL)
+{
+ npy_intp size[3] = { $1_dim0, $1_dim1, $1_dim2 };
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,3) || !require_size(array, size, 3) ||
+ !require_contiguous(array) || !require_native(array)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,3) || !require_contiguous(array) ||
+ !require_native(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+}
+
+/* Typemap suite for (DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ $1 = PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (DATA_TYPE** array=NULL, PyArrayObject** object_array=NULL)
+{
+ npy_intp size[2] = { -1, -1 };
+ PyArrayObject* temp_array;
+ Py_ssize_t i;
+
+ /* length of the list */
+ $2 = PyList_Size($input);
+
+ /* the arrays */
+ array = (DATA_TYPE **)malloc($2*sizeof(DATA_TYPE *));
+ object_array = (PyArrayObject **)calloc($2,sizeof(PyArrayObject *));
+
+ if (array == NULL || object_array == NULL)
+ {
+ SWIG_fail;
+ }
+
+ for (i=0; i<$2; i++)
+ {
+ temp_array = obj_to_array_no_conversion(PySequence_GetItem($input,i), DATA_TYPECODE);
+
+ /* the new array must be stored so that it can be destroyed in freearg */
+ object_array[i] = temp_array;
+
+ if ( !temp_array || !require_dimensions(temp_array, 2) ||
+ !require_contiguous(temp_array) ||
+ !require_native(temp_array) ||
+ !PyArray_EquivTypenums(array_type(temp_array), DATA_TYPECODE)
+ ) SWIG_fail;
+
+ /* store the size of the first array in the list, then use that for comparison. */
+ if (i == 0)
+ {
+ size[0] = array_size(temp_array,0);
+ size[1] = array_size(temp_array,1);
+ }
+
+ if (!require_size(temp_array, size, 2)) SWIG_fail;
+
+ array[i] = (DATA_TYPE*) array_data(temp_array);
+ }
+
+ $1 = (DATA_TYPE**) array;
+ $3 = (DIM_TYPE) size[0];
+ $4 = (DIM_TYPE) size[1];
+}
+%typemap(freearg)
+ (DATA_TYPE** INPLACE_ARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ if (array$argnum!=NULL) free(array$argnum);
+ if (object_array$argnum!=NULL) free(object_array$argnum);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
+ * DATA_TYPE* INPLACE_ARRAY3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_ARRAY3)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_ARRAY3)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,3) || !require_contiguous(array)
+ || !require_native(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_FARRAY3, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,3) || !require_contiguous(array) ||
+ !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3,
+ * DATA_TYPE* INPLACE_FARRAY3)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_FARRAY3)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DATA_TYPE* INPLACE_FARRAY3)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,3) || !require_contiguous(array)
+ || !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE INPLACE_ARRAY4[ANY][ANY][ANY][ANY])
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE INPLACE_ARRAY4[ANY][ANY][ANY][ANY])
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE INPLACE_ARRAY4[ANY][ANY][ANY][ANY])
+ (PyArrayObject* array=NULL)
+{
+ npy_intp size[4] = { $1_dim0, $1_dim1, $1_dim2 , $1_dim3 };
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,4) || !require_size(array, size, 4) ||
+ !require_contiguous(array) || !require_native(array)) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,4) || !require_contiguous(array) ||
+ !require_native(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+ $5 = (DIM_TYPE) array_size(array,3);
+}
+
+/* Typemap suite for (DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ $1 = PySequence_Check($input);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (DATA_TYPE** array=NULL, PyArrayObject** object_array=NULL)
+{
+ npy_intp size[3] = { -1, -1, -1 };
+ PyArrayObject* temp_array;
+ Py_ssize_t i;
+
+ /* length of the list */
+ $2 = PyList_Size($input);
+
+ /* the arrays */
+ array = (DATA_TYPE **)malloc($2*sizeof(DATA_TYPE *));
+ object_array = (PyArrayObject **)calloc($2,sizeof(PyArrayObject *));
+
+ if (array == NULL || object_array == NULL)
+ {
+ SWIG_fail;
+ }
+
+ for (i=0; i<$2; i++)
+ {
+ temp_array = obj_to_array_no_conversion(PySequence_GetItem($input,i), DATA_TYPECODE);
+
+ /* the new array must be stored so that it can be destroyed in freearg */
+ object_array[i] = temp_array;
+
+ if ( !temp_array || !require_dimensions(temp_array, 3) ||
+ !require_contiguous(temp_array) ||
+ !require_native(temp_array) ||
+ !PyArray_EquivTypenums(array_type(temp_array), DATA_TYPECODE)
+ ) SWIG_fail;
+
+ /* store the size of the first array in the list, then use that for comparison. */
+ if (i == 0)
+ {
+ size[0] = array_size(temp_array,0);
+ size[1] = array_size(temp_array,1);
+ size[2] = array_size(temp_array,2);
+ }
+
+ if (!require_size(temp_array, size, 3)) SWIG_fail;
+
+ array[i] = (DATA_TYPE*) array_data(temp_array);
+ }
+
+ $1 = (DATA_TYPE**) array;
+ $3 = (DIM_TYPE) size[0];
+ $4 = (DIM_TYPE) size[1];
+ $5 = (DIM_TYPE) size[2];
+}
+%typemap(freearg)
+ (DATA_TYPE** INPLACE_ARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ if (array$argnum!=NULL) free(array$argnum);
+ if (object_array$argnum!=NULL) free(object_array$argnum);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4,
+ * DATA_TYPE* INPLACE_ARRAY4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_ARRAY4)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_ARRAY4)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,4) || !require_contiguous(array)
+ || !require_native(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DIM_TYPE) array_size(array,3);
+ $5 = (DATA_TYPE*) array_data(array);
+}
+
+/* Typemap suite for (DATA_TYPE* INPLACE_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2,
+ * DIM_TYPE DIM3, DIM_TYPE DIM4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DATA_TYPE* INPLACE_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* INPLACE_FARRAY4, DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,4) || !require_contiguous(array) ||
+ !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+ $2 = (DIM_TYPE) array_size(array,0);
+ $3 = (DIM_TYPE) array_size(array,1);
+ $4 = (DIM_TYPE) array_size(array,2);
+ $5 = (DIM_TYPE) array_size(array,3);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4,
+ * DATA_TYPE* INPLACE_FARRAY4)
+ */
+%typecheck(SWIG_TYPECHECK_DOUBLE_ARRAY,
+ fragment="NumPy_Macros")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_FARRAY4)
+{
+ $1 = is_array($input) && PyArray_EquivTypenums(array_type($input),
+ DATA_TYPECODE);
+}
+%typemap(in,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DIM_TYPE DIM2, DIM_TYPE DIM3, DIM_TYPE DIM4, DATA_TYPE* INPLACE_FARRAY4)
+ (PyArrayObject* array=NULL)
+{
+ array = obj_to_array_no_conversion($input, DATA_TYPECODE);
+ if (!array || !require_dimensions(array,4) || !require_contiguous(array)
+ || !require_native(array) || !require_fortran(array)) SWIG_fail;
+ $1 = (DIM_TYPE) array_size(array,0);
+ $2 = (DIM_TYPE) array_size(array,1);
+ $3 = (DIM_TYPE) array_size(array,2);
+ $4 = (DIM_TYPE) array_size(array,3);
+ $5 = (DATA_TYPE*) array_data(array);
+}
+
+/*************************/
+/* Argout Array Typemaps */
+/*************************/
+
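+/* Usage sketch (illustrative; linspace01 is a hypothetical name): argout
+ * typemaps allocate a fresh array inside the wrapper, let the C routine
+ * fill it, and append it to the Python return value, e.g.
+ *
+ *     void linspace01(double* out, int n);
+ *     %apply (double* ARGOUT_ARRAY1, int DIM1) {(double* out, int n)};
+ *
+ * From Python: out = module.linspace01(n) -- the integer argument supplies
+ * the length and the filled array is returned.
+ */
+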
+/* Typemap suite for (DATA_TYPE ARGOUT_ARRAY1[ANY])
+ */
+%typemap(in,numinputs=0,
+ fragment="NumPy_Backward_Compatibility,NumPy_Macros")
+ (DATA_TYPE ARGOUT_ARRAY1[ANY])
+ (PyObject* array = NULL)
+{
+ npy_intp dims[1] = { $1_dim0 };
+ array = PyArray_SimpleNew(1, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(argout)
+ (DATA_TYPE ARGOUT_ARRAY1[ANY])
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/* Typemap suite for (DATA_TYPE* ARGOUT_ARRAY1, DIM_TYPE DIM1)
+ */
+%typemap(in,numinputs=1,
+ fragment="NumPy_Fragments")
+ (DATA_TYPE* ARGOUT_ARRAY1, DIM_TYPE DIM1)
+ (PyObject* array = NULL)
+{
+ npy_intp dims[1];
+ if (!PyInt_Check($input))
+ {
+ const char* typestring = pytype_string($input);
+ PyErr_Format(PyExc_TypeError,
+ "Int dimension expected. '%s' given.",
+ typestring);
+ SWIG_fail;
+ }
+ $2 = (DIM_TYPE) PyInt_AsLong($input);
+ dims[0] = (npy_intp) $2;
+ array = PyArray_SimpleNew(1, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $1 = (DATA_TYPE*) array_data(array);
+}
+%typemap(argout)
+ (DATA_TYPE* ARGOUT_ARRAY1, DIM_TYPE DIM1)
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/* Typemap suite for (DIM_TYPE DIM1, DATA_TYPE* ARGOUT_ARRAY1)
+ */
+%typemap(in,numinputs=1,
+ fragment="NumPy_Fragments")
+ (DIM_TYPE DIM1, DATA_TYPE* ARGOUT_ARRAY1)
+ (PyObject* array = NULL)
+{
+ npy_intp dims[1];
+ if (!PyInt_Check($input))
+ {
+ const char* typestring = pytype_string($input);
+ PyErr_Format(PyExc_TypeError,
+ "Int dimension expected. '%s' given.",
+ typestring);
+ SWIG_fail;
+ }
+ $1 = (DIM_TYPE) PyInt_AsLong($input);
+ dims[0] = (npy_intp) $1;
+ array = PyArray_SimpleNew(1, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $2 = (DATA_TYPE*) array_data(array);
+}
+%typemap(argout)
+ (DIM_TYPE DIM1, DATA_TYPE* ARGOUT_ARRAY1)
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/* Typemap suite for (DATA_TYPE ARGOUT_ARRAY2[ANY][ANY])
+ */
+%typemap(in,numinputs=0,
+ fragment="NumPy_Backward_Compatibility,NumPy_Macros")
+ (DATA_TYPE ARGOUT_ARRAY2[ANY][ANY])
+ (PyObject* array = NULL)
+{
+ npy_intp dims[2] = { $1_dim0, $1_dim1 };
+ array = PyArray_SimpleNew(2, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(argout)
+ (DATA_TYPE ARGOUT_ARRAY2[ANY][ANY])
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/* Typemap suite for (DATA_TYPE ARGOUT_ARRAY3[ANY][ANY][ANY])
+ */
+%typemap(in,numinputs=0,
+ fragment="NumPy_Backward_Compatibility,NumPy_Macros")
+ (DATA_TYPE ARGOUT_ARRAY3[ANY][ANY][ANY])
+ (PyObject* array = NULL)
+{
+ npy_intp dims[3] = { $1_dim0, $1_dim1, $1_dim2 };
+ array = PyArray_SimpleNew(3, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(argout)
+ (DATA_TYPE ARGOUT_ARRAY3[ANY][ANY][ANY])
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/* Typemap suite for (DATA_TYPE ARGOUT_ARRAY4[ANY][ANY][ANY][ANY])
+ */
+%typemap(in,numinputs=0,
+ fragment="NumPy_Backward_Compatibility,NumPy_Macros")
+ (DATA_TYPE ARGOUT_ARRAY4[ANY][ANY][ANY][ANY])
+ (PyObject* array = NULL)
+{
+ npy_intp dims[4] = { $1_dim0, $1_dim1, $1_dim2, $1_dim3 };
+ array = PyArray_SimpleNew(4, dims, DATA_TYPECODE);
+ if (!array) SWIG_fail;
+ $1 = ($1_ltype) array_data(array);
+}
+%typemap(argout)
+ (DATA_TYPE ARGOUT_ARRAY4[ANY][ANY][ANY][ANY])
+{
+ $result = SWIG_Python_AppendOutput($result,(PyObject*)array$argnum);
+}
+
+/*****************************/
+/* Argoutview Array Typemaps */
+/*****************************/
+
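+/* Usage sketch (illustrative; get_buffer is a hypothetical name):
+ * argoutview typemaps return a NumPy array that is a view of memory owned
+ * by the wrapped library; no copy is made and Python does not take
+ * ownership, e.g.
+ *
+ *     void get_buffer(double** data, int* n);
+ *     %apply (double** ARGOUTVIEW_ARRAY1, int* DIM1) {(double** data, int* n)};
+ *
+ * The caller is responsible for keeping the underlying buffer alive for as
+ * long as the returned array is in use.
+ */
+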
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_ARRAY1, DIM_TYPE* DIM1)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_ARRAY1, DIM_TYPE* DIM1 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DATA_TYPE** ARGOUTVIEW_ARRAY1, DIM_TYPE* DIM1)
+{
+ npy_intp dims[1] = { *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(1, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEW_ARRAY1)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DATA_TYPE** ARGOUTVIEW_ARRAY1)
+ (DIM_TYPE dim_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim_temp;
+ $2 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEW_ARRAY1)
+{
+ npy_intp dims[1] = { *$1 };
+ PyObject* obj = PyArray_SimpleNewFromData(1, dims, DATA_TYPECODE, (void*)(*$2));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_ARRAY2, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DATA_TYPE** ARGOUTVIEW_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+{
+ npy_intp dims[2] = { *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_ARRAY2)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DATA_TYPE** ARGOUTVIEW_ARRAY2)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_ARRAY2)
+{
+ npy_intp dims[2] = { *$1, *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$3));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_FARRAY2, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DATA_TYPE** ARGOUTVIEW_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+{
+ npy_intp dims[2] = { *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_FARRAY2)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DATA_TYPE** ARGOUTVIEW_FARRAY2)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEW_FARRAY2)
+{
+ npy_intp dims[2] = { *$1, *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$3));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_ARRAY3, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DATA_TYPE** ARGOUTVIEW_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+{
+ npy_intp dims[3] = { *$2, *$3, *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3,
+ DATA_TYPE** ARGOUTVIEW_ARRAY3)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEW_ARRAY3)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DATA_TYPE* data_temp = NULL)
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEW_ARRAY3)
+{
+ npy_intp dims[3] = { *$1, *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$4));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_FARRAY3, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DATA_TYPE** ARGOUTVIEW_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+{
+ npy_intp dims[3] = { *$2, *$3, *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3,
+ DATA_TYPE** ARGOUTVIEW_FARRAY3)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DATA_TYPE** ARGOUTVIEW_FARRAY3)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEW_FARRAY3)
+{
+ npy_intp dims[3] = { *$1, *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$4));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_ARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DATA_TYPE** ARGOUTVIEW_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEW_ARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEW_ARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEW_ARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEW_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEW_FARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DATA_TYPE** ARGOUTVIEW_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEW_FARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEW_FARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEW_FARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/*************************************/
+/* Managed Argoutview Array Typemaps */
+/*************************************/
+
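+/* Usage sketch (illustrative; make_buffer is a hypothetical name): the
+ * managed variants behave like ARGOUTVIEW, but attach the data pointer to
+ * the returned array through a capsule whose destructor calls free(), so
+ * the buffer must come from malloc() and is released automatically when
+ * the array is garbage collected, e.g.
+ *
+ *     void make_buffer(double** data, int* n);
+ *     %apply (double** ARGOUTVIEWM_ARRAY1, int* DIM1) {(double** data, int* n)};
+ */
+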
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_ARRAY1, DIM_TYPE* DIM1)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY1, DIM_TYPE* DIM1 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY1, DIM_TYPE* DIM1)
+{
+ npy_intp dims[1] = { *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(1, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEWM_ARRAY1)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DATA_TYPE** ARGOUTVIEWM_ARRAY1)
+ (DIM_TYPE dim_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim_temp;
+ $2 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DATA_TYPE** ARGOUTVIEWM_ARRAY1)
+{
+ npy_intp dims[1] = { *$1 };
+ PyObject* obj = PyArray_SimpleNewFromData(1, dims, DATA_TYPECODE, (void*)(*$2));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$2), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$2), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY2, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+{
+ npy_intp dims[2] = { *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_ARRAY2)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DATA_TYPE** ARGOUTVIEWM_ARRAY2)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_ARRAY2)
+{
+ npy_intp dims[2] = { *$1, *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$3));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$3), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$3), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY2, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY2, DIM_TYPE* DIM1, DIM_TYPE* DIM2)
+{
+ npy_intp dims[2] = { *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_FARRAY2)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DATA_TYPE** ARGOUTVIEWM_FARRAY2)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DATA_TYPE** ARGOUTVIEWM_FARRAY2)
+{
+ npy_intp dims[2] = { *$1, *$2 };
+ PyObject* obj = PyArray_SimpleNewFromData(2, dims, DATA_TYPECODE, (void*)(*$3));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$3), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$3), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY3, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+{
+ npy_intp dims[3] = { *$2, *$3, *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3,
+ DATA_TYPE** ARGOUTVIEWM_ARRAY3)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DATA_TYPE** ARGOUTVIEWM_ARRAY3)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEWM_ARRAY3)
+{
+ npy_intp dims[3] = { *$1, *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$4));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$4), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$4), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY3, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY3, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3)
+{
+ npy_intp dims[3] = { *$2, *$3, *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3,
+ DATA_TYPE** ARGOUTVIEWM_FARRAY3)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DATA_TYPE** ARGOUTVIEWM_FARRAY3)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DATA_TYPE** ARGOUTVIEWM_FARRAY3)
+{
+ npy_intp dims[3] = { *$1, *$2, *$3 };
+ PyObject* obj = PyArray_SimpleNewFromData(3, dims, DATA_TYPECODE, (void*)(*$4));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$4), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$4), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$5), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$5), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$5), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$5), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_ARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_ARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$5), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$5), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2,
+ DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+ */
+%typemap(in,numinputs=0)
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 )
+ (DATA_TYPE* data_temp = NULL , DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp)
+{
+ $1 = &data_temp;
+ $2 = &dim1_temp;
+ $3 = &dim2_temp;
+ $4 = &dim3_temp;
+ $5 = &dim4_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DATA_TYPE** ARGOUTVIEWM_FARRAY4, DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4)
+{
+ npy_intp dims[4] = { *$2, *$3, *$4 , *$5 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$1));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$1), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$1), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+/* Typemap suite for (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4,
+ DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+ */
+%typemap(in,numinputs=0)
+ (DIM_TYPE* DIM1 , DIM_TYPE* DIM2 , DIM_TYPE* DIM3 , DIM_TYPE* DIM4 , DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+ (DIM_TYPE dim1_temp, DIM_TYPE dim2_temp, DIM_TYPE dim3_temp, DIM_TYPE dim4_temp, DATA_TYPE* data_temp = NULL )
+{
+ $1 = &dim1_temp;
+ $2 = &dim2_temp;
+ $3 = &dim3_temp;
+ $4 = &dim4_temp;
+ $5 = &data_temp;
+}
+%typemap(argout,
+ fragment="NumPy_Backward_Compatibility,NumPy_Array_Requirements,NumPy_Utilities")
+ (DIM_TYPE* DIM1, DIM_TYPE* DIM2, DIM_TYPE* DIM3, DIM_TYPE* DIM4, DATA_TYPE** ARGOUTVIEWM_FARRAY4)
+{
+ npy_intp dims[4] = { *$1, *$2, *$3 , *$4 };
+ PyObject* obj = PyArray_SimpleNewFromData(4, dims, DATA_TYPECODE, (void*)(*$5));
+ PyArrayObject* array = (PyArrayObject*) obj;
+
+ if (!array || !require_fortran(array)) SWIG_fail;
+
+%#ifdef SWIGPY_USE_CAPSULE
+ PyObject* cap = PyCapsule_New((void*)(*$5), SWIGPY_CAPSULE_NAME, free_cap);
+%#else
+ PyObject* cap = PyCObject_FromVoidPtr((void*)(*$5), free);
+%#endif
+
+%#if NPY_API_VERSION < 0x00000007
+ PyArray_BASE(array) = cap;
+%#else
+ PyArray_SetBaseObject(array,cap);
+%#endif
+
+ $result = SWIG_Python_AppendOutput($result,obj);
+}
+
+%enddef /* %numpy_typemaps() macro */
+/* *************************************************************** */
+
+/* Concrete instances of the %numpy_typemaps() macro: Each invocation
+ * below applies all of the typemaps above to the specified data type.
+ */
+%numpy_typemaps(signed char , NPY_BYTE , int)
+%numpy_typemaps(unsigned char , NPY_UBYTE , int)
+%numpy_typemaps(short , NPY_SHORT , int)
+%numpy_typemaps(unsigned short , NPY_USHORT , int)
+%numpy_typemaps(int , NPY_INT , int)
+%numpy_typemaps(unsigned int , NPY_UINT , int)
+%numpy_typemaps(long , NPY_LONG , int)
+%numpy_typemaps(unsigned long , NPY_ULONG , int)
+%numpy_typemaps(long long , NPY_LONGLONG , int)
+%numpy_typemaps(unsigned long long, NPY_ULONGLONG, int)
+%numpy_typemaps(float , NPY_FLOAT , int)
+%numpy_typemaps(double , NPY_DOUBLE , int)
+
+/* ***************************************************************
+ * The following macro expansion does not work, because C++ bool is 4
+ * bytes and NPY_BOOL is 1 byte
+ *
+ * %numpy_typemaps(bool, NPY_BOOL, int)
+ */
+
+/* ***************************************************************
+ * On my Mac, I get the following warning for this macro expansion:
+ * 'swig/python detected a memory leak of type 'long double *', no destructor found.'
+ *
+ * %numpy_typemaps(long double, NPY_LONGDOUBLE, int)
+ */
+
+/* ***************************************************************
+ * Swig complains about a syntax error for the following macro
+ * expansions:
+ *
+ * %numpy_typemaps(complex float, NPY_CFLOAT , int)
+ *
+ * %numpy_typemaps(complex double, NPY_CDOUBLE, int)
+ *
+ * %numpy_typemaps(complex long double, NPY_CLONGDOUBLE, int)
+ */
+
+#endif /* SWIGPYTHON */
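
The managed ARGOUTVIEW typemaps above wrap a C-allocated buffer in a NumPy array and install a capsule as the array's base object so the buffer is freed when the array is garbage collected. A rough pure-Python analogy of that ownership arrangement (a minimal sketch: no SWIG or C extension is involved, and the bytearray merely stands in for a malloc'd buffer):

```python
import numpy as np

# Stand-in for a buffer allocated outside NumPy (e.g. by a wrapped C routine).
raw = bytearray(8 * np.dtype(np.float64).itemsize)

# Zero-copy view over that memory, analogous to PyArray_SimpleNewFromData above.
arr = np.frombuffer(raw, dtype=np.float64)

print(arr.flags.owndata)   # False: the array does not own its storage
del raw                    # NumPy keeps a reference to the buffer via the array's base,
arr[:] = 1.0               # so the memory stays valid and writable
print(arr.sum())           # 8.0
```

In the typemaps themselves, the capsule plays the role of this base object for raw malloc'd memory, calling free() (or free_cap) once the array is collected.
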
diff --git a/tensorflow/python/platform/parameterized.py b/tensorflow/python/platform/parameterized.py
new file mode 100644
index 0000000000..cf01512bc1
--- /dev/null
+++ b/tensorflow/python/platform/parameterized.py
@@ -0,0 +1,10 @@
+"""Switch between depending on pyglib.gfile or an OSS replacement."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS and control_imports.OSS_PARAMETERIZED:
+ from tensorflow.python.platform.default._parameterized import *
+else:
+ from tensorflow.python.platform.google._parameterized import *
diff --git a/tensorflow/python/platform/resource_loader.py b/tensorflow/python/platform/resource_loader.py
new file mode 100644
index 0000000000..a0e6546c28
--- /dev/null
+++ b/tensorflow/python/platform/resource_loader.py
@@ -0,0 +1,10 @@
+"""Load a file resource and return the contents."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import control_imports
+import tensorflow.python.platform
+if control_imports.USE_OSS:
+ from tensorflow.python.platform.default._resource_loader import *
+else:
+ from tensorflow.python.platform.google._resource_loader import *
diff --git a/tensorflow/python/platform/status_bar.py b/tensorflow/python/platform/status_bar.py
new file mode 100644
index 0000000000..720b9d82c0
--- /dev/null
+++ b/tensorflow/python/platform/status_bar.py
@@ -0,0 +1,10 @@
+"""Switch between an internal status bar and a no-op version."""
+# pylint: disable=unused-import
+# pylint: disable=g-import-not-at-top
+# pylint: disable=wildcard-import
+import tensorflow.python.platform
+import control_imports
+if control_imports.USE_OSS:
+ from tensorflow.python.platform.default._status_bar import *
+else:
+ from tensorflow.python.platform.google._status_bar import *
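
Each of these shim modules keys off boolean flags in a `control_imports` module. A hypothetical minimal version of that module, defining only the flags referenced here (the real module may carry additional switches), might read:

```python
# Hypothetical minimal control_imports.py consistent with the shims above.
USE_OSS = True            # prefer the open-source implementations
OSS_PARAMETERIZED = True  # also use the OSS parameterized-test shim
```
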
diff --git a/tensorflow/python/platform/test.py b/tensorflow/python/platform/test.py
new file mode 100644
index 0000000000..7d46f9cbc2
--- /dev/null
+++ b/tensorflow/python/platform/test.py
@@ -0,0 +1,6 @@
+from tensorflow.python.platform.googletest import GetTempDir
+from tensorflow.python.platform.googletest import main
+from tensorflow.python.framework.test_util import TensorFlowTestCase as TestCase
+from tensorflow.python.framework.test_util import IsGoogleCudaEnabled as IsBuiltWithCuda
+
+get_temp_dir = GetTempDir
diff --git a/tensorflow/python/summary/README.md b/tensorflow/python/summary/README.md
new file mode 100644
index 0000000000..8a5fea0d9a
--- /dev/null
+++ b/tensorflow/python/summary/README.md
@@ -0,0 +1,15 @@
+# TensorFlow Event Processing
+
+This folder contains classes useful for analyzing and visualizing TensorFlow
+events files. The code is primarily being developed to support TensorBoard,
+but it can be used by anyone who wishes to analyze or visualize TensorFlow
+events files.
+
+If you wish to load TensorFlow events, you should use an EventAccumulator
+(to load from a single events file) or an EventMultiplexer (to load from
+multiple events files).
+
+The API around these tools has not solidified, and we may make backwards-
+incompatible changes without warning.
+
+If you have questions or requests, please contact danmane@google.com
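
As a concrete illustration of the accumulator workflow described above (a sketch only: the event directory and the tag name are placeholders, not part of this change):

```python
from tensorflow.python.summary import event_accumulator

# '/tmp/my_run' and 'loss' are hypothetical; substitute a real events directory and tag.
acc = event_accumulator.EventAccumulator('/tmp/my_run')
acc.Reload()                     # activate and load all pending events synchronously
print(acc.Tags())                # {'scalars': [...], 'histograms': [...], ...}
for event in acc.Scalars('loss'):
    print(event.step, event.value)
```
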
diff --git a/tensorflow/python/summary/__init__.py b/tensorflow/python/summary/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/summary/__init__.py
diff --git a/tensorflow/python/summary/event_accumulator.py b/tensorflow/python/summary/event_accumulator.py
new file mode 100644
index 0000000000..ae067d94fe
--- /dev/null
+++ b/tensorflow/python/summary/event_accumulator.py
@@ -0,0 +1,433 @@
+"""Takes a generator of values, and accumulates them for a frontend."""
+
+import collections
+import threading
+
+from tensorflow.python.platform import gfile
+from tensorflow.python.platform import logging
+from tensorflow.python.summary.impl import directory_watcher
+from tensorflow.python.summary.impl import event_file_loader
+from tensorflow.python.summary.impl import reservoir
+
+namedtuple = collections.namedtuple
+ScalarEvent = namedtuple('ScalarEvent',
+ ['wall_time', 'step', 'value'])
+
+CompressedHistogramEvent = namedtuple('CompressedHistogramEvent',
+ ['wall_time', 'step',
+ 'compressed_histogram_values'])
+
+CompressedHistogramValue = namedtuple('CompressedHistogramValue',
+ ['basis_point', 'value'])
+
+HistogramEvent = namedtuple('HistogramEvent',
+ ['wall_time', 'step', 'histogram_value'])
+
+HistogramValue = namedtuple('HistogramValue',
+ ['min', 'max', 'num', 'sum', 'sum_squares',
+ 'bucket_limit', 'bucket'])
+
+ImageEvent = namedtuple('ImageEvent',
+ ['wall_time', 'step', 'encoded_image_string',
+ 'width', 'height'])
+
+## The tagTypes below are just arbitrary strings chosen to pass the type
+## information of the tag from the backend to the frontend
+COMPRESSED_HISTOGRAMS = 'compressedHistograms'
+HISTOGRAMS = 'histograms'
+IMAGES = 'images'
+SCALARS = 'scalars'
+GRAPH = 'graph'
+
+## normal CDF for std_devs: (-Inf, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, Inf)
+## naturally gives bands around median of width 1 std dev, 2 std dev, 3 std dev,
+## and then the long tail.
+NORMAL_HISTOGRAM_BPS = (0, 668, 1587, 3085, 5000, 6915, 8413, 9332, 10000)
+
+DEFAULT_SIZE_GUIDANCE = {
+ COMPRESSED_HISTOGRAMS: 500,
+ IMAGES: 4,
+ SCALARS: 10000,
+ HISTOGRAMS: 1,
+}
+
+STORE_EVERYTHING_SIZE_GUIDANCE = {
+ COMPRESSED_HISTOGRAMS: 0,
+ IMAGES: 0,
+ SCALARS: 0,
+ HISTOGRAMS: 0,
+}
+
+
+def IsTensorFlowEventsFile(path):
+ """Check the path name to see if it is probably a TF Events file."""
+ return 'tfevents' in path
+
+
+class EventAccumulator(object):
+ """An `EventAccumulator` takes an event generator, and accumulates the values.
+
+ The `EventAccumulator` is intended to provide a convenient Python interface
+ for loading Event data written during a TensorFlow run. TensorFlow writes out
+ `Event` protobuf objects, which have a timestamp and step number, and often
+ contain a `Summary`. Summaries can have different kinds of data like an image,
+ a scalar value, or a histogram. The Summaries also have a tag, which we use to
+ organize logically related data. The `EventAccumulator` supports retrieving
+ the `Event` and `Summary` data by its tag.
+
+ Calling `Tags()` gets a map from `tagType` (e.g. `'images'`,
+ `'compressedHistograms'`, `'scalars'`, etc) to the associated tags for those
+ data types. Then, various functional endpoints (e.g.
+ `Accumulator.Scalars(tag)`) allow for the retrieval of all data
+ associated with that tag.
+
+ Before usage, the `EventAccumulator` must be activated via `Reload()` or
+ `AutoUpdate(interval)`.
+
+ If activated via `Reload()`, it loads synchronously, so calls to `Values` or
+ `Tags` will block until all outstanding events are processed. Afterwards,
+ `Reload()` may be called again to load any new data.
+
+ If activated via `AutoUpdate(interval)`, it loads asynchronously, so calls to
+ `Values` or `Tags` will immediately return a valid subset of the outstanding
+ event data. It reloads new data every `interval` seconds.
+
+ Histograms and images are very large, so storing all of them is not
+ recommended.
+
+ @@Reload
+ @@AutoUpdate
+ @@Tags
+ @@Scalars
+ @@Graph
+ @@Histograms
+ @@CompressedHistograms
+ @@Images
+ """
+
+ def __init__(self, path, size_guidance=DEFAULT_SIZE_GUIDANCE,
+ compression_bps=NORMAL_HISTOGRAM_BPS):
+ """Construct the `EventAccumulator`.
+
+ Args:
+ path: A file path to a directory containing tf events files, or a single
+ tf events file. The accumulator will load events from this path.
+ size_guidance: Information on how much data the EventAccumulator should
+ store in memory. The DEFAULT_SIZE_GUIDANCE tries not to store too much
+ so as to avoid OOMing the client. The size_guidance should be a map
+ from a `tagType` string to an integer representing the number of
+ items to keep per tag for items of that `tagType`. If the size is 0,
+ all events are stored.
+ compression_bps: Information on how the `EventAccumulator` should compress
+ histogram data for the `CompressedHistograms` tag (for details see
+ `ProcessCompressedHistogram`).
+ """
+ sizes = {}
+ for key in DEFAULT_SIZE_GUIDANCE:
+ if key in size_guidance:
+ sizes[key] = size_guidance[key]
+ else:
+ sizes[key] = DEFAULT_SIZE_GUIDANCE[key]
+
+ self._scalars = reservoir.Reservoir(size=sizes[SCALARS])
+ self._graph = None
+ self._histograms = reservoir.Reservoir(size=sizes[HISTOGRAMS])
+ self._compressed_histograms = reservoir.Reservoir(
+ size=sizes[COMPRESSED_HISTOGRAMS])
+ self._images = reservoir.Reservoir(size=sizes[IMAGES])
+ self._generator_mutex = threading.Lock()
+ self._generator = _GeneratorFromPath(path)
+ self._is_autoupdating = False
+ self._activated = False
+ self._compression_bps = compression_bps
+
+ def Reload(self):
+ """Loads all events added since the last call to `Reload`.
+
+ If `Reload` was never called, loads all events in the file.
+ Calling `Reload` activates the `EventAccumulator`.
+
+ Returns:
+ The `EventAccumulator`.
+ """
+ self._activated = True
+ with self._generator_mutex:
+ for event in self._generator.Load():
+ if event.HasField('graph_def'):
+ if self._graph is not None:
+ logging.warn(('Found more than one graph event per run. '
+ 'Overwriting the graph with the newest event'))
+ self._graph = event.graph_def
+ elif event.HasField('summary'):
+ for value in event.summary.value:
+ if value.HasField('simple_value'):
+ self._ProcessScalar(value.tag, event.wall_time, event.step,
+ value.simple_value)
+ elif value.HasField('histo'):
+ self._ProcessHistogram(value.tag, event.wall_time, event.step,
+ value.histo)
+ self._ProcessCompressedHistogram(value.tag, event.wall_time,
+ event.step, value.histo)
+ elif value.HasField('image'):
+ self._ProcessImage(value.tag, event.wall_time, event.step,
+ value.image)
+ return self
+
+ def AutoUpdate(self, interval=60):
+ """Asynchronously load all events, and periodically reload.
+
+ Calling this function is not thread safe.
+ Calling this function activates the `EventAccumulator`.
+
+ Args:
+ interval: how many seconds after each successful reload to load new events
+ (default 60)
+
+ Returns:
+ The `EventAccumulator`.
+ """
+ if self._is_autoupdating:
+ return
+ self._is_autoupdating = True
+ self._activated = True
+ def Update():
+ self.Reload()
+ logging.info('EventAccumulator update triggered')
+ t = threading.Timer(interval, Update)
+ t.daemon = True
+ t.start()
+ # Asynchronously start the update process, so that the accumulator can
+ # immediately serve data, even if there is a very large event file to parse
+ t = threading.Timer(0, Update)
+ t.daemon = True
+ t.start()
+ return self
+
+ def Tags(self):
+ """Return all tags found in the value stream.
+
+ Raises:
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ A `{tagType: ['list', 'of', 'tags']}` dictionary.
+ """
+ self._VerifyActivated()
+ return {IMAGES: self._images.Keys(),
+ HISTOGRAMS: self._histograms.Keys(),
+ SCALARS: self._scalars.Keys(),
+ COMPRESSED_HISTOGRAMS: self._compressed_histograms.Keys(),
+ GRAPH: self._graph is not None}
+
+ def Scalars(self, tag):
+ """Given a summary tag, return all associated `ScalarEvent`s.
+
+ Args:
+ tag: A string tag associated with the events.
+
+ Raises:
+ KeyError: If the tag is not found.
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `ScalarEvent`s.
+ """
+ self._VerifyActivated()
+ return self._scalars.Items(tag)
+
+ def Graph(self):
+ """Return the graph definition, if there is one.
+
+ Raises:
+ ValueError: If there is no graph for this run.
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ The `graph_def` proto.
+ """
+ self._VerifyActivated()
+ if self._graph is None:
+ raise ValueError('There is no graph in this EventAccumulator')
+ return self._graph
+
+ def Histograms(self, tag):
+ """Given a summary tag, return all associated histograms.
+
+ Args:
+ tag: A string tag associated with the events.
+
+ Raises:
+ KeyError: If the tag is not found.
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `HistogramEvent`s.
+ """
+ self._VerifyActivated()
+ return self._histograms.Items(tag)
+
+ def CompressedHistograms(self, tag):
+ """Given a summary tag, return all associated compressed histograms.
+
+ Args:
+ tag: A string tag associated with the events.
+
+ Raises:
+ KeyError: If the tag is not found.
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `CompressedHistogramEvent`s.
+ """
+ self._VerifyActivated()
+ return self._compressed_histograms.Items(tag)
+
+ def Images(self, tag):
+ """Given a summary tag, return all associated images.
+
+ Args:
+ tag: A string tag associated with the events.
+
+ Raises:
+ KeyError: If the tag is not found.
+ RuntimeError: If the `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `ImageEvent`s.
+ """
+ self._VerifyActivated()
+ return self._images.Items(tag)
+
+ def _VerifyActivated(self):
+ if not self._activated:
+ raise RuntimeError('Accumulator must be activated before it may be used.')
+
+ def _ProcessScalar(self, tag, wall_time, step, scalar):
+ """Processes a simple value by adding it to accumulated state."""
+ sv = ScalarEvent(wall_time=wall_time, step=step, value=scalar)
+ self._scalars.AddItem(tag, sv)
+
+ def _ProcessHistogram(self, tag, wall_time, step, histo):
+ """Processes a histogram by adding it to accumulated state."""
+ histogram_value = HistogramValue(
+ min=histo.min,
+ max=histo.max,
+ num=histo.num,
+ sum=histo.sum,
+ sum_squares=histo.sum_squares,
+ # convert from proto repeated to list
+ bucket_limit=list(histo.bucket_limit),
+ bucket=list(histo.bucket),
+ )
+ histogram_event = HistogramEvent(
+ wall_time=wall_time,
+ step=step,
+ histogram_value=histogram_value,
+ )
+ self._histograms.AddItem(tag, histogram_event)
+
+ def _Remap(self, x, x0, x1, y0, y1):
+ """Linearly map from [x0, x1] unto [y0, y1]."""
+ return y0 + (x - x0) * float(y1 - y0)/(x1 - x0)
+
+ def _Percentile(self, compression_bps, bucket_limit, cumsum_weights,
+ histo_min, histo_max, histo_num):
+ """Linearly interpolates a histogram weight for a particular basis point.
+
+ Uses clamping methods on `histo_min` and `histo_max` to produce tight
+ linear estimates of the histogram weight at a particular basis point.
+
+ Args:
+ compression_bps: The desired basis point at which to estimate the weight
+ bucket_limit: An array of the RHS histogram bucket limits
+ cumsum_weights: A cumulative sum of the fraction of weights in each
+ histogram bucket, represented in basis points.
+ histo_min: The minimum weight observed in the weight histogram
+ histo_max: The maximum weight observed in the weight histogram
+ histo_num: The number of items in the weight histogram
+
+ Returns:
+ A linearly interpolated value of the histogram weight estimate.
+ """
+ if histo_num == 0: return 0
+
+ for i, cumsum in enumerate(cumsum_weights):
+ if cumsum >= compression_bps:
+ cumsum_prev = cumsum_weights[i-1] if i > 0 else 0
+ # Prevent cumsum = 0, cumsum_prev = 0, lerp divide by zero.
+ if cumsum == cumsum_prev: continue
+
+ # Calculate the lower bound of interpolation
+ lhs = bucket_limit[i-1] if (i > 0 and cumsum_prev > 0) else histo_min
+ lhs = max(lhs, histo_min)
+
+ # Calculate the upper bound of interpolation
+ rhs = bucket_limit[i]
+ rhs = min(rhs, histo_max)
+
+ weight = self._Remap(compression_bps, cumsum_prev, cumsum, lhs, rhs)
+ return weight
+
+ ## We have not exceeded cumsum, so return the max observed.
+ return histo_max
+
+ def _ProcessCompressedHistogram(self, tag, wall_time, step, histo):
+ """Processes a histogram by adding a compression to accumulated state.
+
+ Adds a compressed histogram by linearly interpolating histogram buckets to
+ represent the histogram weight at multiple compression points. Uses
+ self._compression_bps (passed to EventAccumulator constructor) as the
+ compression points (represented in basis points, 1/100ths of a percent).
+
+ Args:
+ tag: A string name of the tag for which histograms are retrieved.
+ wall_time: Time in seconds since epoch
+ step: Number of steps that have passed
+ histo: proto2 histogram Object
+ """
+ def _CumulativeSum(arr):
+ return [sum(arr[:i+1]) for i in range(len(arr))]
+
+ # Convert from proto repeated field into a Python list.
+ bucket = list(histo.bucket)
+ bucket_limit = list(histo.bucket_limit)
+
+ bucket_total = sum(bucket)
+ fraction_weights = [float(10000*x)/bucket_total for x in bucket]
+ cumsum_weights = _CumulativeSum(fraction_weights)
+
+ percentiles = [
+ self._Percentile(bps, bucket_limit, cumsum_weights, histo.min,
+ histo.max, histo.num) for bps in self._compression_bps
+ ]
+
+ compressed_histogram_values = [CompressedHistogramValue(
+ basis_point=bps,
+ value=value) for bps, value in zip(self._compression_bps, percentiles)]
+ histogram_event = CompressedHistogramEvent(
+ wall_time=wall_time,
+ step=step,
+ compressed_histogram_values=compressed_histogram_values)
+
+ self._compressed_histograms.AddItem(tag, histogram_event)
+
+ def _ProcessImage(self, tag, wall_time, step, image):
+ """Processes an image by adding it to accumulated state."""
+ event = ImageEvent(
+ wall_time=wall_time,
+ step=step,
+ encoded_image_string=image.encoded_image_string,
+ width=image.width,
+ height=image.height
+ )
+ self._images.AddItem(tag, event)
+
+
+def _GeneratorFromPath(path):
+ """Create an event generator for file or directory at given path string."""
+ loader_factory = event_file_loader.EventFileLoader
+ if gfile.IsDirectory(path):
+ return directory_watcher.DirectoryWatcher(path, loader_factory,
+ IsTensorFlowEventsFile)
+ else:
+ return loader_factory(path)
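
The compressed-histogram path above boils down to the linear interpolation in `_Remap` and `_Percentile`. A standalone sketch of that computation on a toy histogram (the numbers are chosen to match the shape exercised in the unit test that follows) shows the resulting ramp between the observed min and max:

```python
def remap(x, x0, x1, y0, y1):
    """Linearly map x from [x0, x1] onto [y0, y1]."""
    return y0 + (x - x0) * float(y1 - y0) / (x1 - x0)

def percentile(bps, bucket_limit, cumsum_weights, hmin, hmax, hnum):
    """Mirror of EventAccumulator._Percentile for a single basis point."""
    if hnum == 0:
        return 0
    for i, cumsum in enumerate(cumsum_weights):
        if cumsum >= bps:
            prev = cumsum_weights[i - 1] if i > 0 else 0
            if cumsum == prev:          # skip empty leading buckets
                continue
            lhs = bucket_limit[i - 1] if (i > 0 and prev > 0) else hmin
            lhs = max(lhs, hmin)
            rhs = min(bucket_limit[i], hmax)
            return remap(bps, prev, cumsum, lhs, rhs)
    return hmax

# Toy histogram: three observations, all landing in the bucket (1, 2].
bucket = [0, 3, 0]
bucket_limit = [1, 2, 3]
total = float(sum(bucket))
fractions = [10000 * x / total for x in bucket]            # bucket weights in basis points
cumsum_weights = [sum(fractions[:i + 1]) for i in range(len(fractions))]

for bps in (0, 2500, 5000, 7500, 10000):
    print(bps, percentile(bps, bucket_limit, cumsum_weights, 1.0, 2.0, 3))
# Prints a linear ramp from the histogram min to max: 1.0, 1.25, 1.5, 1.75, 2.0
```
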
diff --git a/tensorflow/python/summary/event_accumulator_test.py b/tensorflow/python/summary/event_accumulator_test.py
new file mode 100644
index 0000000000..c8de80ccba
--- /dev/null
+++ b/tensorflow/python/summary/event_accumulator_test.py
@@ -0,0 +1,422 @@
+import os
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+from tensorflow.python.platform import gfile
+from tensorflow.python.summary import event_accumulator as ea
+
+
+class _EventGenerator(object):
+
+ def __init__(self):
+ self.items = []
+
+ def Load(self):
+ while self.items:
+ yield self.items.pop(0)
+
+ def AddScalar(self, tag, wall_time=0, step=0, value=0):
+ event = tf.Event(
+ wall_time=wall_time, step=step,
+ summary=tf.Summary(
+ value=[tf.Summary.Value(tag=tag, simple_value=value)]
+ )
+ )
+ self.AddEvent(event)
+
+ def AddHistogram(self, tag, wall_time=0, step=0, hmin=1, hmax=2, hnum=3,
+ hsum=4, hsum_squares=5, hbucket_limit=None, hbucket=None):
+ histo = tf.HistogramProto(min=hmin, max=hmax, num=hnum, sum=hsum,
+ sum_squares=hsum_squares,
+ bucket_limit=hbucket_limit,
+ bucket=hbucket)
+ event = tf.Event(
+ wall_time=wall_time,
+ step=step,
+ summary=tf.Summary(value=[tf.Summary.Value(tag=tag, histo=histo)]))
+ self.AddEvent(event)
+
+ def AddImage(self, tag, wall_time=0, step=0, encoded_image_string='imgstr',
+ width=150, height=100):
+ image = tf.Summary.Image(encoded_image_string=encoded_image_string,
+ width=width, height=height)
+ event = tf.Event(
+ wall_time=wall_time,
+ step=step,
+ summary=tf.Summary(
+ value=[tf.Summary.Value(tag=tag, image=image)]))
+ self.AddEvent(event)
+
+ def AddEvent(self, event):
+ self.items.append(event)
+
+
+class EventAccumulatorTest(tf.test.TestCase):
+
+ def assertTagsEqual(self, tags1, tags2):
+ # Make sure the two dictionaries have the same keys.
+ self.assertItemsEqual(tags1, tags2)
+ # Additionally, make sure each key in the dictionary maps to the same value.
+ for key in tags1:
+ if isinstance(tags1[key], list):
+ # We don't care about the order of the values in lists, so we only
+ # assert that they contain the same items.
+ self.assertItemsEqual(tags1[key], tags2[key])
+ else:
+ # Make sure the values are equal.
+ self.assertEqual(tags1[key], tags2[key])
+
+
+class MockingEventAccumulatorTest(EventAccumulatorTest):
+
+ def setUp(self):
+ super(MockingEventAccumulatorTest, self).setUp()
+ self.empty = {ea.IMAGES: [],
+ ea.SCALARS: [],
+ ea.HISTOGRAMS: [],
+ ea.COMPRESSED_HISTOGRAMS: [],
+ ea.GRAPH: False}
+ self._real_constructor = ea.EventAccumulator
+ self._real_generator = ea._GeneratorFromPath
+ def _FakeAccumulatorConstructor(generator, *args, **kwargs):
+ ea._GeneratorFromPath = lambda x: generator
+ return self._real_constructor(generator, *args, **kwargs)
+ ea.EventAccumulator = _FakeAccumulatorConstructor
+
+ def tearDown(self):
+ ea.EventAccumulator = self._real_constructor
+ ea._GeneratorFromPath = self._real_generator
+
+ def testEmptyAccumulator(self):
+ gen = _EventGenerator()
+ x = ea.EventAccumulator(gen)
+ x.Reload()
+ self.assertEqual(x.Tags(), self.empty)
+
+ def testTags(self):
+ gen = _EventGenerator()
+ gen.AddScalar('sv1')
+ gen.AddScalar('sv2')
+ gen.AddHistogram('hst1')
+ gen.AddHistogram('hst2')
+ gen.AddImage('im1')
+ gen.AddImage('im2')
+ acc = ea.EventAccumulator(gen)
+ acc.Reload()
+ self.assertTagsEqual(
+ acc.Tags(), {
+ ea.IMAGES: ['im1', 'im2'],
+ ea.SCALARS: ['sv1', 'sv2'],
+ ea.HISTOGRAMS: ['hst1', 'hst2'],
+ ea.COMPRESSED_HISTOGRAMS: ['hst1', 'hst2'],
+ ea.GRAPH: False})
+
+ def testReload(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ acc.Reload()
+ self.assertEqual(acc.Tags(), self.empty)
+ gen.AddScalar('sv1')
+ gen.AddScalar('sv2')
+ gen.AddHistogram('hst1')
+ gen.AddHistogram('hst2')
+ gen.AddImage('im1')
+ gen.AddImage('im2')
+ self.assertEqual(acc.Tags(), self.empty)
+ acc.Reload()
+ self.assertTagsEqual(acc.Tags(), {
+ ea.IMAGES: ['im1', 'im2'],
+ ea.SCALARS: ['sv1', 'sv2'],
+ ea.HISTOGRAMS: ['hst1', 'hst2'],
+ ea.COMPRESSED_HISTOGRAMS: ['hst1', 'hst2'],
+ ea.GRAPH: False})
+
+ def testScalars(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ sv1 = ea.ScalarEvent(wall_time=1, step=10, value=32)
+ sv2 = ea.ScalarEvent(wall_time=2, step=12, value=64)
+ gen.AddScalar('sv1', wall_time=1, step=10, value=32)
+ gen.AddScalar('sv2', wall_time=2, step=12, value=64)
+ acc.Reload()
+ self.assertEqual(acc.Scalars('sv1'), [sv1])
+ self.assertEqual(acc.Scalars('sv2'), [sv2])
+
+ def testHistograms(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+
+ val1 = ea.HistogramValue(min=1, max=2, num=3, sum=4, sum_squares=5,
+ bucket_limit=[1, 2, 3], bucket=[0, 3, 0])
+ val2 = ea.HistogramValue(min=-2, max=3, num=4, sum=5, sum_squares=6,
+ bucket_limit=[2, 3, 4], bucket=[1, 3, 0])
+
+ hst1 = ea.HistogramEvent(wall_time=1, step=10, histogram_value=val1)
+ hst2 = ea.HistogramEvent(wall_time=2, step=12, histogram_value=val2)
+ gen.AddHistogram('hst1', wall_time=1, step=10, hmin=1, hmax=2, hnum=3,
+ hsum=4, hsum_squares=5, hbucket_limit=[1, 2, 3],
+ hbucket=[0, 3, 0])
+ gen.AddHistogram('hst2', wall_time=2, step=12, hmin=-2, hmax=3, hnum=4,
+ hsum=5, hsum_squares=6, hbucket_limit=[2, 3, 4],
+ hbucket=[1, 3, 0])
+ acc.Reload()
+ self.assertEqual(acc.Histograms('hst1'), [hst1])
+ self.assertEqual(acc.Histograms('hst2'), [hst2])
+
+ def testCompressedHistograms(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen, compression_bps=(0, 2500, 5000, 7500, 10000))
+
+ gen.AddHistogram('hst1', wall_time=1, step=10, hmin=1, hmax=2, hnum=3,
+ hsum=4, hsum_squares=5, hbucket_limit=[1, 2, 3],
+ hbucket=[0, 3, 0])
+ gen.AddHistogram('hst2', wall_time=2, step=12, hmin=-2, hmax=3, hnum=4,
+ hsum=5, hsum_squares=6, hbucket_limit=[2, 3, 4],
+ hbucket=[1, 3, 0])
+ acc.Reload()
+
+ # Create the expected values after compressing hst1
+ expected_vals1 = [ea.CompressedHistogramValue(bp, val) for bp, val in [(
+ 0, 1.0), (2500, 1.25), (5000, 1.5), (7500, 1.75), (10000, 2.0)]]
+ expected_cmphst1 = ea.CompressedHistogramEvent(
+ wall_time=1,
+ step=10,
+ compressed_histogram_values=expected_vals1)
+ self.assertEqual(acc.CompressedHistograms('hst1'), [expected_cmphst1])
+
+ # Create the expected values after compressing hst2
+ expected_vals2 = [
+ ea.CompressedHistogramValue(bp, val)
+ for bp, val in [(0, -2), (2500, 2), (5000, 2 + float(1) / 3), (
+ 7500, 2 + float(2) / 3), (10000, 3)]
+ ]
+ expected_cmphst2 = ea.CompressedHistogramEvent(
+ wall_time=2,
+ step=12,
+ compressed_histogram_values=expected_vals2)
+ self.assertEqual(acc.CompressedHistograms('hst2'), [expected_cmphst2])
+
+ def testPercentile(self):
+
+ def AssertExpectedForBps(bps, expected):
+ output = acc._Percentile(
+ bps, bucket_limit, cumsum_weights, histo_min, histo_max, histo_num)
+ self.assertAlmostEqual(expected, output)
+
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+
+ bucket_limit = [1, 2, 3, 4]
+ histo_num = 100
+
+ ## All weights in the first bucket
+ cumsum_weights = [10000, 10000, 10000, 10000]
+ histo_min = -1
+ histo_max = .9
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(5000, acc._Remap(5000, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(7500, acc._Remap(7500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ ## All weights in second bucket
+ cumsum_weights = [0, 10000, 10000, 10000]
+ histo_min = 1.1
+ histo_max = 1.8
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(5000, acc._Remap(5000, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(7500, acc._Remap(7500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ ## All weights in the last bucket
+ cumsum_weights = [0, 0, 0, 10000]
+ histo_min = 3.1
+ histo_max = 3.6
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(5000, acc._Remap(5000, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(7500, acc._Remap(7500, 0, 10000, histo_min, histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ ## Weights distributed between two buckets
+ cumsum_weights = [0, 4000, 10000, 10000]
+ histo_min = 1.1
+ histo_max = 2.9
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 0, 4000, histo_min,
+ bucket_limit[1]))
+ AssertExpectedForBps(5000, acc._Remap(5000, 4000, 10000, bucket_limit[1],
+ histo_max))
+ AssertExpectedForBps(7500, acc._Remap(7500, 4000, 10000, bucket_limit[1],
+ histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ ## Weights distributed between all buckets
+ cumsum_weights = [1000, 4000, 8000, 10000]
+ histo_min = -1
+ histo_max = 3.9
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 1000, 4000, bucket_limit[0],
+ bucket_limit[1]))
+ AssertExpectedForBps(5000, acc._Remap(5000, 4000, 8000, bucket_limit[1],
+ bucket_limit[2]))
+ AssertExpectedForBps(7500, acc._Remap(7500, 4000, 8000, bucket_limit[1],
+ bucket_limit[2]))
+ AssertExpectedForBps(9000, acc._Remap(9000, 8000, 10000, bucket_limit[2],
+ histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ ## Most weight in first bucket
+ cumsum_weights = [9000, 10000, 10000, 10000]
+ histo_min = -1
+ histo_max = 1.1
+ AssertExpectedForBps(0, histo_min)
+ AssertExpectedForBps(2500, acc._Remap(2500, 0, 9000, histo_min,
+ bucket_limit[0]))
+ AssertExpectedForBps(5000, acc._Remap(5000, 0, 9000, histo_min,
+ bucket_limit[0]))
+ AssertExpectedForBps(7500, acc._Remap(7500, 0, 9000, histo_min,
+ bucket_limit[0]))
+ AssertExpectedForBps(9500, acc._Remap(9500, 9000, 10000, bucket_limit[0],
+ histo_max))
+ AssertExpectedForBps(10000, histo_max)
+
+ def testImages(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ im1 = ea.ImageEvent(wall_time=1, step=10, encoded_image_string='big',
+ width=400, height=300)
+ im2 = ea.ImageEvent(wall_time=2, step=12, encoded_image_string='small',
+ width=40, height=30)
+ gen.AddImage('im1', wall_time=1, step=10, encoded_image_string='big',
+ width=400, height=300)
+ gen.AddImage('im2', wall_time=2, step=12, encoded_image_string='small',
+ width=40, height=30)
+ acc.Reload()
+ self.assertEqual(acc.Images('im1'), [im1])
+ self.assertEqual(acc.Images('im2'), [im2])
+
+ def testActivation(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ self.assertFalse(acc._activated)
+ with self.assertRaises(RuntimeError):
+ acc.Tags()
+ with self.assertRaises(RuntimeError):
+ acc.Scalars('sv1')
+ acc.Reload()
+ self.assertTrue(acc._activated)
+ acc._activated = False
+
+ def testKeyError(self):
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ acc.Reload()
+ with self.assertRaises(KeyError):
+ acc.Scalars('sv1')
+ with self.assertRaises(KeyError):
+ acc.Scalars('hst1')
+ with self.assertRaises(KeyError):
+ acc.Scalars('im1')
+ with self.assertRaises(KeyError):
+ acc.Histograms('sv1')
+ with self.assertRaises(KeyError):
+ acc.Histograms('im1')
+ with self.assertRaises(KeyError):
+ acc.Images('sv1')
+ with self.assertRaises(KeyError):
+ acc.Images('hst1')
+
+ def testNonValueEvents(self):
+ """Tests that non-value events in the generator don't cause early exits."""
+ gen = _EventGenerator()
+ acc = ea.EventAccumulator(gen)
+ gen.AddScalar('sv1', wall_time=1, step=10, value=20)
+ gen.AddEvent(tf.Event(
+ wall_time=2, step=20, file_version='notsv2'))
+ gen.AddScalar('sv3', wall_time=3, step=100, value=1)
+ gen.AddHistogram('hst1')
+ gen.AddImage('im1')
+
+ acc.Reload()
+ self.assertTagsEqual(acc.Tags(), {
+ ea.IMAGES: ['im1'],
+ ea.SCALARS: ['sv1', 'sv3'],
+ ea.HISTOGRAMS: ['hst1'],
+ ea.COMPRESSED_HISTOGRAMS: ['hst1'],
+ ea.GRAPH: False})
+
+
+class RealisticEventAccumulatorTest(EventAccumulatorTest):
+
+ def setUp(self):
+ super(RealisticEventAccumulatorTest, self).setUp()
+
+ def testScalarsRealistically(self):
+ """Test accumulator by writing values and then reading them."""
+ def FakeScalarSummary(tag, value):
+ value = tf.Summary.Value(tag=tag, simple_value=value)
+ summary = tf.Summary(value=[value])
+ return summary
+
+ directory = os.path.join(self.get_temp_dir(), 'values_dir')
+ if gfile.IsDirectory(directory):
+ gfile.DeleteRecursively(directory)
+ gfile.MkDir(directory)
+
+ writer = tf.train.SummaryWriter(directory, max_queue=100)
+ graph_def = tf.GraphDef(node=[tf.NodeDef(name='A', op='Mul')])
+ # Add a graph to the summary writer.
+ writer.add_graph(graph_def)
+
+ # Write a bunch of events using the writer
+ for i in xrange(30):
+ summ_id = FakeScalarSummary('id', i)
+ summ_sq = FakeScalarSummary('sq', i*i)
+ writer.add_summary(summ_id, i*5)
+ writer.add_summary(summ_sq, i*5)
+ writer.flush()
+
+ # Verify that we can load those events properly
+ acc = ea.EventAccumulator(directory)
+ acc.Reload()
+ self.assertTagsEqual(acc.Tags(), {
+ ea.IMAGES: [],
+ ea.SCALARS: ['id', 'sq'],
+ ea.HISTOGRAMS: [],
+ ea.COMPRESSED_HISTOGRAMS: [],
+ ea.GRAPH: True})
+ id_events = acc.Scalars('id')
+ sq_events = acc.Scalars('sq')
+ self.assertEqual(30, len(id_events))
+ self.assertEqual(30, len(sq_events))
+ for i in xrange(30):
+ self.assertEqual(i*5, id_events[i].step)
+ self.assertEqual(i*5, sq_events[i].step)
+ self.assertEqual(i, id_events[i].value)
+ self.assertEqual(i*i, sq_events[i].value)
+
+ # Write a few more events to test incremental reloading
+ for i in xrange(30, 40):
+ summ_id = FakeScalarSummary('id', i)
+ summ_sq = FakeScalarSummary('sq', i*i)
+ writer.add_summary(summ_id, i*5)
+ writer.add_summary(summ_sq, i*5)
+ writer.flush()
+
+ # Verify we can now see all of the data
+ acc.Reload()
+ self.assertEqual(40, len(id_events))
+ self.assertEqual(40, len(sq_events))
+ for i in xrange(40):
+ self.assertEqual(i*5, id_events[i].step)
+ self.assertEqual(i*5, sq_events[i].step)
+ self.assertEqual(i, id_events[i].value)
+ self.assertEqual(i*i, sq_events[i].value)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/tensorflow/python/summary/event_multiplexer.py b/tensorflow/python/summary/event_multiplexer.py
new file mode 100644
index 0000000000..9966d76b21
--- /dev/null
+++ b/tensorflow/python/summary/event_multiplexer.py
@@ -0,0 +1,346 @@
+"""Provides an interface for working with multiple event files."""
+
+import os
+import threading
+
+from tensorflow.python.platform import gfile
+from tensorflow.python.platform import logging
+from tensorflow.python.summary import event_accumulator
+
+
+class EventMultiplexer(object):
+ """An `EventMultiplexer` manages access to multiple `EventAccumulator`s.
+
+ Each `EventAccumulator` is associated with a `run`, which is a self-contained
+ TensorFlow execution. The `EventMultiplexer` provides methods for extracting
+ information about events from multiple `run`s.
+
+ Example usage for loading specific runs from files:
+
+ ```python
+ x = EventMultiplexer({'run1': 'path/to/run1', 'run2': 'path/to/run2'})
+ x.Reload()
+ ```
+
+ Example usage for loading a directory where each subdirectory is a run
+
+ ```python
+ (eg:) /parent/directory/path/
+ /parent/directory/path/run1/
+ /parent/directory/path/run1/events.out.tfevents.1001
+ /parent/directory/path/run1/events.out.tfevents.1002
+
+ /parent/directory/path/run2/
+ /parent/directory/path/run2/events.out.tfevents.9232
+
+ /parent/directory/path/run3/
+ /parent/directory/path/run3/events.out.tfevents.9232
+ x = EventMultiplexer().AddRunsFromDirectory('/parent/directory/path')
+ (which is equivalent to:)
+ x = EventMultiplexer({'run1': '/parent/directory/path/run1', 'run2': ...})
+ ```
+
+ If you would like to watch `/parent/directory/path`, wait for it to be created
+ (if necessary) and then periodically pick up new runs, use
+ `AutoloadingMultiplexer`
+
+ @@__init__
+ @@AddRun
+ @@AddRunsFromDirectory
+ @@Reload
+ @@AutoUpdate
+ @@Runs
+ @@Scalars
+ @@Graph
+ @@Histograms
+ @@CompressedHistograms
+ @@Images
+ """
+
+ def __init__(self, run_path_map=None,
+ size_guidance=event_accumulator.DEFAULT_SIZE_GUIDANCE):
+ """Constructor for the `EventMultiplexer`.
+
+ Args:
+ run_path_map: Dict `{run: path}` which specifies the
+ name of a run, and the path to find the associated events. If it is
+ None, then the EventMultiplexer initializes without any runs.
+ size_guidance: A dictionary mapping from `tagType` to the number of items
+ to store for each tag of that type. See
+ `event_accumulator.EventAccumulator` for details.
+ """
+ self._accumulators_mutex = threading.Lock()
+ self._accumulators = {}
+ self._paths = {}
+ self._reload_called = False
+ self._autoupdate_called = False
+ self._autoupdate_interval = None
+ self._size_guidance = size_guidance
+ if run_path_map is not None:
+ for (run, path) in run_path_map.iteritems():
+ self.AddRun(path, run)
+
+ def AddRun(self, path, name=None):
+ """Add a run to the multiplexer.
+
+ If the name is not specified, it is the same as the path.
+
+ If a run by that name exists, and we are already watching the right path,
+ do nothing. If we are watching a different path, replace the event
+ accumulator.
+
+ If `AutoUpdate` or `Reload` have been called, it will `AutoUpdate` or
+ `Reload` the newly created accumulators. This maintains the invariant that
+ once the Multiplexer was activated, all of its accumulators are active.
+
+ Args:
+ path: Path to the event files (or event directory) for given run.
+ name: Name of the run to add. If not provided, is set to path.
+
+ Returns:
+ The `EventMultiplexer`.
+ """
+ if name is None or name == '':
+ name = path
+ accumulator = None
+ with self._accumulators_mutex:
+ if name not in self._accumulators or self._paths[name] != path:
+ if name in self._paths and self._paths[name] != path:
+ # TODO(danmane) - Make it impossible to overwrite an old path with
+ # a new path (just give the new path a distinct name)
+ logging.warning('Conflict for name %s: old path %s, new path %s' %
+ (name, self._paths[name], path))
+ logging.info('Constructing EventAccumulator for %s', path)
+ accumulator = event_accumulator.EventAccumulator(path,
+ self._size_guidance)
+ self._accumulators[name] = accumulator
+ self._paths[name] = path
+ if accumulator:
+ if self._reload_called:
+ accumulator.Reload()
+ if self._autoupdate_called:
+ accumulator.AutoUpdate(self._autoupdate_interval)
+ return self
+
+ def AddRunsFromDirectory(self, path, name=None):
+ """Load runs from a directory, assuming each subdirectory is a run.
+
+ If path doesn't exist, no-op. This ensures that it is safe to call
+ `AddRunsFromDirectory` multiple times, even before the directory is made.
+
+ If the directory contains TensorFlow event files, it is itself treated as a
+ run.
+
+ If the `EventMultiplexer` is already loaded or autoupdating, this will cause
+ the newly created accumulators to also `Reload()` or `AutoUpdate()`.
+
+ Args:
+ path: A string path to a directory to load runs from.
+ name: Optionally, what name to apply to the runs. If name is provided
+ and the directory contains run subdirectories, the name of each subrun
+ is the concatenation of the parent name and the subdirectory name. If
+ name is provided and the directory contains event files, then a run
+ named "name" is added, containing the events from that path.
+
+ Raises:
+ ValueError: If the path exists and isn't a directory.
+
+ Returns:
+ The `EventMultiplexer`.
+ """
+ if not gfile.Exists(path):
+ return # Maybe it hasn't been created yet, fail silently to retry later
+ if not gfile.IsDirectory(path):
+ raise ValueError('Path exists and is not a directory, %s' % path)
+ paths = gfile.ListDirectory(path)
+ is_directory = lambda x: gfile.IsDirectory(os.path.join(path, x))
+ subdirectories = filter(is_directory, paths)
+ for s in subdirectories:
+ if name:
+ subname = '/'.join([name, s])
+ else:
+ subname = s
+ self.AddRun(os.path.join(path, s), subname)
+
+ if filter(event_accumulator.IsTensorFlowEventsFile, paths):
+ directory_name = os.path.split(path)[1]
+ logging.info('Directory %s has event files; loading' % directory_name)
+ if name:
+ dname = name
+ else:
+ dname = directory_name
+ self.AddRun(path, dname)
+ return self
+
+ def Reload(self):
+ """Call `Reload` on every `EventAccumulator`."""
+ self._reload_called = True
+ with self._accumulators_mutex:
+ loaders = self._accumulators.values()
+
+ for l in loaders:
+ l.Reload()
+ return self
+
+ def AutoUpdate(self, interval=60):
+ """Call `AutoUpdate(interval)` on every `EventAccumulator`."""
+ self._autoupdate_interval = interval
+ self._autoupdate_called = True
+ with self._accumulators_mutex:
+ loaders = self._accumulators.values()
+ for l in loaders:
+ l.AutoUpdate(interval)
+ return self
+
+ def Scalars(self, run, tag):
+ """Retrieve the scalar events associated with a run and tag.
+
+ Args:
+ run: A string name of the run for which values are retrieved.
+ tag: A string name of the tag for which values are retrieved.
+
+ Raises:
+ KeyError: If the run is not found, or the tag is not available for
+ the given run.
+ RuntimeError: If the run's `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `event_accumulator.ScalarEvents`.
+ """
+ accumulator = self._GetAccumulator(run)
+ return accumulator.Scalars(tag)
+
+ def Graph(self, run):
+ """Retrieve the graphs associated with the provided run.
+
+ Args:
+ run: A string name of a run to load the graph for.
+
+ Raises:
+ KeyError: If the run is not found.
+ ValueError: If the run does not have an associated graph.
+ RuntimeError: If the run's EventAccumulator has not been activated.
+
+ Returns:
+ The `graph_def` protobuf data structure.
+ """
+ accumulator = self._GetAccumulator(run)
+ return accumulator.Graph()
+
+ def Histograms(self, run, tag):
+ """Retrieve the histogram events associated with a run and tag.
+
+ Args:
+ run: A string name of the run for which values are retrieved.
+ tag: A string name of the tag for which values are retrieved.
+
+ Raises:
+ KeyError: If the run is not found, or the tag is not available for
+ the given run.
+ RuntimeError: If the run's `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `event_accumulator.HistogramEvents`.
+ """
+ accumulator = self._GetAccumulator(run)
+ return accumulator.Histograms(tag)
+
+ def CompressedHistograms(self, run, tag):
+ """Retrieve the compressed histogram events associated with a run and tag.
+
+ Args:
+ run: A string name of the run for which values are retrieved.
+ tag: A string name of the tag for which values are retrieved.
+
+ Raises:
+ KeyError: If the run is not found, or the tag is not available for
+ the given run.
+ RuntimeError: If the run's EventAccumulator has not been activated.
+
+ Returns:
+ An array of `event_accumulator.CompressedHistogramEvents`.
+ """
+ accumulator = self._GetAccumulator(run)
+ return accumulator.CompressedHistograms(tag)
+
+ def Images(self, run, tag):
+ """Retrieve the image events associated with a run and tag.
+
+ Args:
+ run: A string name of the run for which values are retrieved.
+ tag: A string name of the tag for which values are retrieved.
+
+ Raises:
+ KeyError: If the run is not found, or the tag is not available for
+ the given run.
+ RuntimeError: If the run's `EventAccumulator` has not been activated.
+
+ Returns:
+ An array of `event_accumulator.ImageEvents`.
+ """
+ accumulator = self._GetAccumulator(run)
+ return accumulator.Images(tag)
+
+ def Runs(self):
+ """Return all the run names in the `EventMultiplexer`.
+
+ Returns:
+ ```
+ {runName: { images: [tag1, tag2, tag3],
+ scalarValues: [tagA, tagB, tagC],
+ histograms: [tagX, tagY, tagZ],
+ compressedHistograms: [tagX, tagY, tagZ],
+ graph: true}}
+ ```
+ """
+ with self._accumulators_mutex:
+ # To avoid nested locks, we construct a copy of the run-accumulator map
+ items = list(self._accumulators.iteritems())
+ return {
+ run_name: accumulator.Tags()
+ for run_name, accumulator in items
+ }
+
+ def _GetAccumulator(self, run):
+ with self._accumulators_mutex:
+ return self._accumulators[run]
+
+
+def AutoloadingMultiplexer(path_to_run, interval_secs=60,
+ size_guidance=event_accumulator.DEFAULT_SIZE_GUIDANCE):
+ """Create an `EventMultiplexer` that automatically loads runs in directories.
+
+ Args:
+ path_to_run: Dict `{path: name}` which specifies the path to a directory,
+ and its name (or `None`). The path may contain tfevents files (in which
+ case they are loaded, with name as the name of the run) and subdirectories
+ containing tfevents files (in which case each subdirectory is added as a
+ run, named `'name/subdirectory'`).
+
+ interval_secs: How often to poll the directory for new runs.
+ size_guidance: How much data to store for each tag of various types - see
+ `event_accumulator.EventAccumulator`.
+
+ Returns:
+ The multiplexer which will automatically load from the directories.
+
+ Raises:
+ ValueError: if `path_to_run` is `None`
+ TypeError: if `path_to_run` is not a dict
+ """
+ multiplexer = EventMultiplexer(size_guidance=size_guidance)
+ if path_to_run is None:
+ raise ValueError('Cannot construct an autoloading multiplexer without runs.')
+ if not isinstance(path_to_run, dict):
+ raise TypeError('path_to_run should be a dict, was %s' % path_to_run)
+ def Load():
+ for (path, name) in path_to_run.iteritems():
+ logging.info('Checking for new runs in %s', path)
+ multiplexer.AddRunsFromDirectory(path, name)
+ t = threading.Timer(interval_secs, Load)
+ t.daemon = True
+ t.start()
+ t = threading.Timer(0, Load)
+ t.daemon = True
+ t.start()
+ return multiplexer
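
A brief sketch of driving the multiplexer defined above; the run directories and the 'loss' tag are placeholders, and the calls assume events files already exist under those paths:

```python
from tensorflow.python.summary import event_multiplexer

# One accumulator per run; run names and paths here are hypothetical.
mux = event_multiplexer.EventMultiplexer({'train': '/tmp/exp/train',
                                          'eval': '/tmp/exp/eval'})
mux.Reload()                           # activates and reloads every accumulator
print(mux.Runs())                      # {'train': {tagType: [tags], ...}, 'eval': {...}}
print(mux.Scalars('train', 'loss'))    # 'loss' is a hypothetical tag name

# Or watch a parent directory whose subdirectories are runs, polling for new ones:
auto = event_multiplexer.AutoloadingMultiplexer({'/tmp/exp': None}, interval_secs=30)
```
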
diff --git a/tensorflow/python/summary/event_multiplexer_test.py b/tensorflow/python/summary/event_multiplexer_test.py
new file mode 100644
index 0000000000..35a8aed266
--- /dev/null
+++ b/tensorflow/python/summary/event_multiplexer_test.py
@@ -0,0 +1,244 @@
+import os
+
+import tensorflow.python.platform
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import gfile
+from tensorflow.python.platform import googletest
+from tensorflow.python.summary import event_accumulator
+from tensorflow.python.summary import event_multiplexer
+
+
+class _FakeAccumulator(object):
+
+ def __init__(self, path):
+ self._path = path
+ self.autoupdate_called = False
+ self.autoupdate_interval = None
+ self.reload_called = False
+
+ def Tags(self):
+ return {event_accumulator.IMAGES: ['im1', 'im2'],
+ event_accumulator.HISTOGRAMS: ['hst1', 'hst2'],
+ event_accumulator.COMPRESSED_HISTOGRAMS: ['cmphst1', 'cmphst2'],
+ event_accumulator.SCALARS: ['sv1', 'sv2']}
+
+ def Scalars(self, tag_name):
+ if tag_name not in self.Tags()[event_accumulator.SCALARS]:
+ raise KeyError
+ return ['%s/%s' % (self._path, tag_name)]
+
+ def Histograms(self, tag_name):
+ if tag_name not in self.Tags()[event_accumulator.HISTOGRAMS]:
+ raise KeyError
+ return ['%s/%s' % (self._path, tag_name)]
+
+ def CompressedHistograms(self, tag_name):
+ if tag_name not in self.Tags()[event_accumulator.COMPRESSED_HISTOGRAMS]:
+ raise KeyError
+ return ['%s/%s' % (self._path, tag_name)]
+
+ def Images(self, tag_name):
+ if tag_name not in self.Tags()[event_accumulator.IMAGES]:
+ raise KeyError
+ return ['%s/%s' % (self._path, tag_name)]
+
+ def AutoUpdate(self, interval):
+ self.autoupdate_called = True
+ self.autoupdate_interval = interval
+
+ def Reload(self):
+ self.reload_called = True
+
+
+def _GetFakeAccumulator(path, size_guidance): # pylint: disable=unused-argument
+ return _FakeAccumulator(path)
+
+
+class EventMultiplexerTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ super(EventMultiplexerTest, self).setUp()
+ event_accumulator.EventAccumulator = _GetFakeAccumulator
+
+ def testEmptyLoader(self):
+ x = event_multiplexer.EventMultiplexer()
+ self.assertEqual(x.Runs(), {})
+
+ def testRunNamesRespected(self):
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+ self.assertItemsEqual(x.Runs().keys(), ['run1', 'run2'])
+ self.assertEqual(x._GetAccumulator('run1')._path, 'path1')
+ self.assertEqual(x._GetAccumulator('run2')._path, 'path2')
+
+ def testReload(self):
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+ self.assertFalse(x._GetAccumulator('run1').reload_called)
+ self.assertFalse(x._GetAccumulator('run2').reload_called)
+ x.Reload()
+ self.assertTrue(x._GetAccumulator('run1').reload_called)
+ self.assertTrue(x._GetAccumulator('run2').reload_called)
+
+ def testAutoUpdate(self):
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+ x.AutoUpdate(5)
+ self.assertTrue(x._GetAccumulator('run1').autoupdate_called)
+ self.assertEqual(x._GetAccumulator('run1').autoupdate_interval, 5)
+ self.assertTrue(x._GetAccumulator('run2').autoupdate_called)
+ self.assertEqual(x._GetAccumulator('run2').autoupdate_interval, 5)
+
+ def testScalars(self):
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+
+ run1_actual = x.Scalars('run1', 'sv1')
+ run1_expected = ['path1/sv1']
+
+ self.assertEqual(run1_expected, run1_actual)
+
+ def testExceptions(self):
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+ with self.assertRaises(KeyError):
+ x.Scalars('sv1', 'xxx')
+
+ def testInitialization(self):
+ x = event_multiplexer.EventMultiplexer()
+ self.assertEqual(x.Runs(), {})
+ x = event_multiplexer.EventMultiplexer({'run1': 'path1', 'run2': 'path2'})
+ self.assertItemsEqual(x.Runs(), ['run1', 'run2'])
+ self.assertEqual(x._GetAccumulator('run1')._path, 'path1')
+ self.assertEqual(x._GetAccumulator('run2')._path, 'path2')
+
+ def testAddRunsFromDirectory(self):
+ x = event_multiplexer.EventMultiplexer()
+ tmpdir = self.get_temp_dir()
+ join = os.path.join
+ fakedir = join(tmpdir, 'fake_accumulator_directory')
+ realdir = join(tmpdir, 'real_accumulator_directory')
+ self.assertEqual(x.Runs(), {})
+ x.AddRunsFromDirectory(fakedir)
+ self.assertEqual(x.Runs(), {}, 'loading fakedir had no effect')
+
+ if gfile.IsDirectory(realdir):
+ gfile.DeleteRecursively(realdir)
+ gfile.MkDir(realdir)
+ x.AddRunsFromDirectory(realdir)
+ self.assertEqual(x.Runs(), {}, 'loading empty directory had no effect')
+
+ path1 = join(realdir, 'path1')
+ gfile.MkDir(path1)
+ x.AddRunsFromDirectory(realdir)
+ self.assertEqual(x.Runs().keys(), ['path1'], 'loaded run: path1')
+ loader1 = x._GetAccumulator('path1')
+ self.assertEqual(loader1._path, path1, 'has the correct path')
+
+ path2 = join(realdir, 'path2')
+ gfile.MkDir(path2)
+ x.AddRunsFromDirectory(realdir)
+ self.assertItemsEqual(x.Runs().keys(), ['path1', 'path2'])
+ self.assertEqual(x._GetAccumulator('path1'), loader1,
+ 'loader1 not regenerated')
+ loader2 = x._GetAccumulator('path2')
+
+ path2_2 = join(path2, 'path2')
+ gfile.MkDir(path2_2)
+ x.AddRunsFromDirectory(path2)
+ self.assertItemsEqual(x.Runs().keys(), ['path1', 'path2'])
+ self.assertNotEqual(loader2, x._GetAccumulator('path2'),
+ 'loader2 regenerated')
+ self.assertEqual(x._GetAccumulator('path2')._path, path2_2,
+ 'loader2 path correct')
+
+ def testAddRunsFromDirectoryThatContainsEvents(self):
+ x = event_multiplexer.EventMultiplexer()
+ tmpdir = self.get_temp_dir()
+ join = os.path.join
+ realdir = join(tmpdir, 'event_containing_directory')
+
+ if gfile.IsDirectory(realdir):
+ gfile.DeleteRecursively(realdir)
+ gfile.MkDir(realdir)
+
+ self.assertEqual(x.Runs(), {})
+
+ with gfile.GFile(join(realdir, 'hypothetical.tfevents.out'), 'w'):
+ pass
+ x.AddRunsFromDirectory(realdir)
+ self.assertItemsEqual(x.Runs(), ['event_containing_directory'])
+
+ subdir = join(realdir, 'subdir')
+ gfile.MkDir(subdir)
+ x.AddRunsFromDirectory(realdir)
+ self.assertItemsEqual(x.Runs(), ['event_containing_directory', 'subdir'])
+
+ def testAddRunsFromDirectoryWithRunNames(self):
+ x = event_multiplexer.EventMultiplexer()
+ tmpdir = self.get_temp_dir()
+ join = os.path.join
+ realdir = join(tmpdir, 'event_containing_directory')
+
+ if gfile.IsDirectory(realdir):
+ gfile.DeleteRecursively(realdir)
+ gfile.MkDir(realdir)
+
+ self.assertEqual(x.Runs(), {})
+
+ with gfile.GFile(join(realdir, 'hypothetical.tfevents.out'), 'w'):
+ pass
+ x.AddRunsFromDirectory(realdir, 'foo')
+ self.assertItemsEqual(x.Runs(), ['foo'])
+
+ subdir = join(realdir, 'subdir')
+ gfile.MkDir(subdir)
+ x.AddRunsFromDirectory(realdir, 'foo')
+ self.assertItemsEqual(x.Runs(), ['foo', 'foo/subdir'])
+
+ def testAddRunsFromDirectoryThrowsException(self):
+ x = event_multiplexer.EventMultiplexer()
+ tmpdir = self.get_temp_dir()
+
+ filepath = os.path.join(tmpdir, 'bad_file')
+ with gfile.GFile(filepath, 'w'):
+ pass
+
+ with self.assertRaises(ValueError):
+ x.AddRunsFromDirectory(filepath)
+
+ def testAddRun(self):
+ x = event_multiplexer.EventMultiplexer()
+ x.AddRun('run1_path', 'run1')
+ run1 = x._GetAccumulator('run1')
+ self.assertEqual(x.Runs().keys(), ['run1'])
+ self.assertEqual(run1._path, 'run1_path')
+
+ x.AddRun('run1_path', 'run1')
+ self.assertEqual(run1, x._GetAccumulator('run1'), 'loader not recreated')
+
+ x.AddRun('run2_path', 'run1')
+ new_run1 = x._GetAccumulator('run1')
+ self.assertEqual(new_run1._path, 'run2_path')
+ self.assertNotEqual(run1, new_run1)
+
+ x.AddRun('runName3')
+ self.assertItemsEqual(x.Runs().keys(), ['run1', 'runName3'])
+ self.assertEqual(x._GetAccumulator('runName3')._path, 'runName3')
+
+ def testAddRunMaintainsLoading(self):
+ x = event_multiplexer.EventMultiplexer()
+ x.Reload()
+ x.AddRun('run1')
+ x.AddRun('run2')
+ self.assertTrue(x._GetAccumulator('run1').reload_called)
+ self.assertTrue(x._GetAccumulator('run2').reload_called)
+
+ def testAddRunMaintainsAutoUpdate(self):
+ x = event_multiplexer.EventMultiplexer()
+ x.AutoUpdate(5)
+ x.AddRun('run1')
+ x.AddRun('run2')
+ self.assertTrue(x._GetAccumulator('run1').autoupdate_called)
+ self.assertTrue(x._GetAccumulator('run2').autoupdate_called)
+ self.assertEqual(x._GetAccumulator('run1').autoupdate_interval, 5)
+ self.assertEqual(x._GetAccumulator('run2').autoupdate_interval, 5)
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/summary/impl/__init__.py b/tensorflow/python/summary/impl/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/summary/impl/__init__.py
diff --git a/tensorflow/python/summary/impl/directory_watcher.py b/tensorflow/python/summary/impl/directory_watcher.py
new file mode 100644
index 0000000000..830e538cb6
--- /dev/null
+++ b/tensorflow/python/summary/impl/directory_watcher.py
@@ -0,0 +1,115 @@
+"""Contains the implementation for the DirectoryWatcher class."""
+import os
+
+from tensorflow.python.platform import gfile
+from tensorflow.python.platform import logging
+
+
+class DirectoryWatcher(object):
+ """A DirectoryWatcher wraps a loader to load from a directory.
+
+ A loader reads a file on disk and produces some kind of values as an
+ iterator. A DirectoryWatcher takes a directory to which files are written one
+ at a time, plus a factory for loaders, and watches all of those files at once.
+
+ This class is *only* valid under the assumption that files are never removed
+ and the only file ever changed is whichever one is lexicographically last.
+ """
+
+ def __init__(self, directory, loader_factory, path_filter=lambda x: True):
+ """Constructs a new DirectoryWatcher.
+
+ Args:
+ directory: The directory to watch. The directory doesn't have to exist.
+ loader_factory: A factory for creating loaders. The factory should take a
+ file path and return an object that has a Load method returning an
+ iterator that will yield all events that have not been yielded yet.
+ path_filter: Only files whose full path matches this predicate will be
+ loaded. If not specified, all files are loaded.
+
+ Raises:
+ ValueError: If directory or loader_factory is None.
+ """
+ if directory is None:
+ raise ValueError('A directory is required')
+ if loader_factory is None:
+ raise ValueError('A loader factory is required')
+ self._directory = directory
+ self._loader_factory = loader_factory
+ self._loader = None
+ self._path = None
+ self._path_filter = path_filter
+
+ def Load(self):
+ """Loads new values from disk.
+
+ The watcher will load from one file at a time; as soon as that file stops
+ yielding events, it will move on to the next file. We assume that old files
+ are never modified after a newer file has been written. As a result, Load()
+ can be called multiple times in a row without losing events that have not
+ been yielded yet. In other words, we guarantee that every event will be
+ yielded exactly once.
+
+ Yields:
+ All values that were written to disk that have not been yielded yet.
+ """
+
+ # Initialize the loader on the first call to Load().
+ if not self._loader:
+ self._InitializeLoader()
+
+ while True:
+ # Yield all the new events in the file we're currently loading from.
+ for event in self._loader.Load():
+ yield event
+
+ next_path = self._GetNextPath()
+ if not next_path:
+ logging.info('No more files in %s', self._directory)
+ # The current file is exhausted and there are no new files, so we're done.
+ return
+
+ # There's a new file, so check to make sure there weren't any events
+ # written between when we finished reading the current file and when we
+ # checked for the new one. The sequence of events might look something
+ # like this:
+ #
+ # 1. Event #1 written to file #1.
+ # 2. We check for events and yield event #1 from file #1
+ # 3. We check for events and see that there are no more events in file #1.
+ # 4. Event #2 is written to file #1.
+ # 5. Event #3 is written to file #2.
+ # 6. We check for a new file and see that file #2 exists.
+ #
+ # Without this loop, we would miss event #2. We're also guaranteed by the
+ # loader contract that no more events will be written to file #1 after
+ # events start being written to file #2, so we don't have to worry about
+ # that.
+ for event in self._loader.Load():
+ yield event
+
+ logging.info('Directory watcher for %s advancing to file %s',
+ self._directory, next_path)
+
+ # Advance to the next file and start over.
+ self._SetPath(next_path)
+
+ def _InitializeLoader(self):
+ path = self._GetNextPath()
+ if path:
+ self._SetPath(path)
+ else:
+ raise StopIteration
+
+ def _SetPath(self, path):
+ self._path = path
+ self._loader = self._loader_factory(path)
+
+ def _GetNextPath(self):
+ """Returns the path of the next file to use or None if no file exists."""
+ sorted_paths = [os.path.join(self._directory, path)
+ for path in sorted(gfile.ListDirectory(self._directory))]
+ # We filter here so the filter gets the full directory name.
+ filtered_paths = (path for path in sorted_paths
+ if self._path_filter(path) and path > self._path)
+ return next(filtered_paths, None)
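A DirectoryWatcher composes with any loader factory that takes a path and exposes a Load() generator. A short sketch with a hypothetical line-oriented loader watching an append-only directory (both the loader and the directory name are illustrative):

```python
from tensorflow.python.summary.impl import directory_watcher


class _LineLoader(object):
  """Yields the lines appended to a file since the previous call to Load()."""

  def __init__(self, path):
    self._f = open(path)

  def Load(self):
    while True:
      line = self._f.readline()
      if not line:
        return
      yield line.rstrip('\n')


watcher = directory_watcher.DirectoryWatcher('/tmp/append_only_logs', _LineLoader)
for value in watcher.Load():  # Yields every line written so far, exactly once.
  print value
```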
diff --git a/tensorflow/python/summary/impl/directory_watcher_test.py b/tensorflow/python/summary/impl/directory_watcher_test.py
new file mode 100644
index 0000000000..a22e3f2922
--- /dev/null
+++ b/tensorflow/python/summary/impl/directory_watcher_test.py
@@ -0,0 +1,102 @@
+"""Tests for directory_watcher."""
+
+import os
+import shutil
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import googletest
+from tensorflow.python.summary.impl import directory_watcher
+
+
+class _ByteLoader(object):
+ """A loader that loads individual bytes from a file."""
+
+ def __init__(self, path):
+ self._f = open(path)
+
+ def Load(self):
+ while True:
+ byte = self._f.read(1)
+ if byte:
+ yield byte
+ else:
+ return
+
+
+class DirectoryWatcherTest(test_util.TensorFlowTestCase):
+
+ def setUp(self):
+ # Put everything in a directory so it's easier to delete.
+ self._directory = os.path.join(self.get_temp_dir(), 'monitor_dir')
+ os.mkdir(self._directory)
+ self._watcher = directory_watcher.DirectoryWatcher(
+ self._directory, _ByteLoader)
+
+ def tearDown(self):
+ shutil.rmtree(self._directory)
+
+ def _WriteToFile(self, filename, data):
+ path = os.path.join(self._directory, filename)
+ with open(path, 'a') as f:
+ f.write(data)
+
+ def assertWatcherYields(self, values):
+ self.assertEqual(list(self._watcher.Load()), values)
+
+ def testRaisesWithBadArguments(self):
+ with self.assertRaises(ValueError):
+ directory_watcher.DirectoryWatcher(None, lambda x: [])
+ with self.assertRaises(ValueError):
+ directory_watcher.DirectoryWatcher('asdf', None)
+
+ def testEmptyDirectory(self):
+ self.assertWatcherYields([])
+
+ def testSingleWrite(self):
+ self._WriteToFile('a', 'abc')
+ self.assertWatcherYields(['a', 'b', 'c'])
+
+ def testMultipleWrites(self):
+ self._WriteToFile('a', 'abc')
+ self.assertWatcherYields(['a', 'b', 'c'])
+ self._WriteToFile('a', 'xyz')
+ self.assertWatcherYields(['x', 'y', 'z'])
+
+ def testMultipleLoads(self):
+ self._WriteToFile('a', 'a')
+ self._watcher.Load()
+ self._watcher.Load()
+ self.assertWatcherYields(['a'])
+
+ def testMultipleFilesAtOnce(self):
+ self._WriteToFile('b', 'b')
+ self._WriteToFile('a', 'a')
+ self.assertWatcherYields(['a', 'b'])
+
+ def testFinishesLoadingFileWhenSwitchingToNewFile(self):
+ self._WriteToFile('a', 'a')
+ # Empty the iterator.
+ self.assertEquals(['a'], list(self._watcher.Load()))
+ self._WriteToFile('a', 'b')
+ self._WriteToFile('b', 'c')
+ # The watcher should finish its current file before starting a new one.
+ self.assertWatcherYields(['b', 'c'])
+
+ def testIntermediateEmptyFiles(self):
+ self._WriteToFile('a', 'a')
+ self._WriteToFile('b', '')
+ self._WriteToFile('c', 'c')
+ self.assertWatcherYields(['a', 'c'])
+
+ def testFileFilter(self):
+ self._watcher = directory_watcher.DirectoryWatcher(
+ self._directory, _ByteLoader,
+ path_filter=lambda path: 'do_not_watch_me' not in path)
+
+ self._WriteToFile('a', 'a')
+ self._WriteToFile('do_not_watch_me', 'b')
+ self._WriteToFile('c', 'c')
+ self.assertWatcherYields(['a', 'c'])
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/summary/impl/event_file_loader.py b/tensorflow/python/summary/impl/event_file_loader.py
new file mode 100644
index 0000000000..0571bc84cb
--- /dev/null
+++ b/tensorflow/python/summary/impl/event_file_loader.py
@@ -0,0 +1,49 @@
+"""Functionality for loading events from a record file."""
+
+from tensorflow.core.util import event_pb2
+from tensorflow.python import pywrap_tensorflow
+from tensorflow.python.platform import app
+from tensorflow.python.platform import logging
+
+
+class EventFileLoader(object):
+ """An EventLoader is an iterator that yields Event protos."""
+
+ def __init__(self, file_path):
+ if file_path is None:
+ raise ValueError('A file path is required')
+ logging.debug('Opening a record reader pointing at %s', file_path)
+ self._reader = pywrap_tensorflow.PyRecordReader_New(file_path, 0)
+ # Store it for logging purposes.
+ self._file_path = file_path
+ if not self._reader:
+ raise IOError('Failed to open a record reader pointing to %s' % file_path)
+
+ def Load(self):
+ """Loads all new values from disk.
+
+ Calling Load multiple times in a row will not 'drop' events as long as the
+ return value is not iterated over.
+
+ Yields:
+ All values that were written to disk that have not been yielded yet.
+ """
+ while self._reader.GetNext():
+ logging.debug('Got an event from %s', self._file_path)
+ event = event_pb2.Event()
+ event.ParseFromString(self._reader.record())
+ yield event
+ logging.debug('No more events in %s', self._file_path)
+
+
+def main(argv):
+ if len(argv) != 2:
+ print 'Usage: event_file_loader <path-to-the-recordio-file>'
+ return 1
+ loader = EventFileLoader(argv[1])
+ for event in loader.Load():
+ print event
+
+
+if __name__ == '__main__':
+ app.run()
diff --git a/tensorflow/python/summary/impl/event_file_loader_test.py b/tensorflow/python/summary/impl/event_file_loader_test.py
new file mode 100644
index 0000000000..1dc29d85d5
--- /dev/null
+++ b/tensorflow/python/summary/impl/event_file_loader_test.py
@@ -0,0 +1,59 @@
+"""Tests for event_file_loader."""
+
+import os
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.platform import googletest
+from tensorflow.python.summary.impl import event_file_loader
+
+
+class EventFileLoaderTest(test_util.TensorFlowTestCase):
+ # A record containing a simple event.
+ RECORD = ('\x18\x00\x00\x00\x00\x00\x00\x00\xa3\x7fK"\t\x00\x00\xc0%\xddu'
+ '\xd5A\x1a\rbrain.Event:1\xec\xf32\x8d')
+
+ def _WriteToFile(self, filename, data):
+ path = os.path.join(self.get_temp_dir(), filename)
+ with open(path, 'ab') as f:
+ f.write(data)
+
+ def _LoaderForTestFile(self, filename):
+ return event_file_loader.EventFileLoader(
+ os.path.join(self.get_temp_dir(), filename))
+
+ def testEmptyEventFile(self):
+ self._WriteToFile('empty_event_file', '')
+ loader = self._LoaderForTestFile('empty_event_file')
+ self.assertEquals(len(list(loader.Load())), 0)
+
+ def testSingleWrite(self):
+ self._WriteToFile('single_event_file', EventFileLoaderTest.RECORD)
+ loader = self._LoaderForTestFile('single_event_file')
+ events = list(loader.Load())
+ self.assertEquals(len(events), 1)
+ self.assertEquals(events[0].wall_time, 1440183447.0)
+ self.assertEquals(len(list(loader.Load())), 0)
+
+ def testMultipleWrites(self):
+ self._WriteToFile('staggered_event_file', EventFileLoaderTest.RECORD)
+ loader = self._LoaderForTestFile('staggered_event_file')
+ self.assertEquals(len(list(loader.Load())), 1)
+ self._WriteToFile('staggered_event_file', EventFileLoaderTest.RECORD)
+ self.assertEquals(len(list(loader.Load())), 1)
+
+ def testMultipleLoads(self):
+ self._WriteToFile('multiple_loads_event_file', EventFileLoaderTest.RECORD)
+ loader = self._LoaderForTestFile('multiple_loads_event_file')
+ loader.Load()
+ loader.Load()
+ self.assertEquals(len(list(loader.Load())), 1)
+
+ def testMultipleWritesAtOnce(self):
+ self._WriteToFile('multiple_event_file', EventFileLoaderTest.RECORD)
+ self._WriteToFile('multiple_event_file', EventFileLoaderTest.RECORD)
+ loader = self._LoaderForTestFile('multiple_event_file')
+ self.assertEquals(len(list(loader.Load())), 2)
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/summary/impl/reservoir.py b/tensorflow/python/summary/impl/reservoir.py
new file mode 100644
index 0000000000..2c9b294841
--- /dev/null
+++ b/tensorflow/python/summary/impl/reservoir.py
@@ -0,0 +1,164 @@
+"""A key-value[] store that implements reservoir sampling on the values."""
+
+import collections
+import random
+import threading
+
+
+class Reservoir(object):
+ """A map-to-arrays container, with deterministic Reservoir Sampling.
+
+ Items are added with an associated key. Items may be retrieved by key, and
+ a list of keys can also be retrieved. If size is not zero, then it dictates
+ the maximum number of items that will be stored with each key. Once there are
+ more items for a given key, they are replaced via reservoir sampling, such
+ that each item has an equal probability of being included in the sample.
+
+ Deterministic means that for any given seed and bucket size, the sequence of
+ values that are kept for any given tag will always be the same, and that this
+ is independent of any insertions on other tags. That is:
+
+ >>> separate_reservoir = reservoir.Reservoir(10)
+ >>> interleaved_reservoir = reservoir.Reservoir(10)
+ >>> for i in xrange(100):
+ >>> separate_reservoir.AddItem('key1', i)
+ >>> for i in xrange(100):
+ >>> separate_reservoir.AddItem('key2', i)
+ >>> for i in xrange(100):
+ >>> interleaved_reservoir.AddItem('key1', i)
+ >>> interleaved_reservoir.AddItem('key2', i)
+
+ separate_reservoir and interleaved_reservoir will be in identical states.
+
+ See: https://en.wikipedia.org/wiki/Reservoir_sampling
+
+ Adding items has amortized O(1) runtime.
+
+ """
+
+ def __init__(self, size, seed=0):
+ """Creates a new reservoir.
+
+ Args:
+ size: The number of values to keep in the reservoir for each tag. If 0,
+ all values will be kept.
+ seed: The seed of the random number generator to use when sampling.
+ Different values for |seed| will produce different samples from the same
+ input items.
+
+ Raises:
+ ValueError: If size is negative or not an integer.
+ """
+ if size < 0 or size != round(size):
+ raise ValueError('size must be a nonnegative integer, was %s' % size)
+ self._buckets = collections.defaultdict(
+ lambda: _ReservoirBucket(size, random.Random(seed)))
+ # _mutex guards the keys - creating new keys, retrieving by key, etc.
+ # The internal items are guarded by the ReservoirBuckets' internal mutexes.
+ self._mutex = threading.Lock()
+
+ def Keys(self):
+ """Return all the keys in the reservoir.
+
+ Returns:
+ ['list', 'of', 'keys'] in the Reservoir.
+ """
+ with self._mutex:
+ return self._buckets.keys()
+
+ def Items(self, key):
+ """Return items associated with given key.
+
+ Args:
+ key: The key for which we are finding associated items.
+
+ Raises:
+ KeyError: If the key is not found in the reservoir.
+
+ Returns:
+ [list, of, items] associated with that key.
+ """
+ with self._mutex:
+ if key not in self._buckets:
+ raise KeyError('Key %s was not found in Reservoir' % key)
+ bucket = self._buckets[key]
+ return bucket.Items()
+
+ def AddItem(self, key, item):
+ """Add a new item to the Reservoir with the given tag.
+
+ The new item is guaranteed to be kept in the Reservoir. One other item might
+ be replaced.
+
+ Args:
+ key: The key to store the item under.
+ item: The item to add to the reservoir.
+ """
+ with self._mutex:
+ bucket = self._buckets[key]
+ bucket.AddItem(item)
+
+
+class _ReservoirBucket(object):
+ """A container for items from a stream, that implements reservoir sampling.
+
+ It always stores the most recent item as its final item.
+ """
+
+ def __init__(self, _max_size, _random=None):
+ """Create the _ReservoirBucket.
+
+ Args:
+ _max_size: The maximum size the reservoir bucket may grow to. If size is
+ zero, the bucket has unbounded size.
+ _random: The random number generator to use. If not specified, defaults to
+ random.Random(0).
+
+ Raises:
+ ValueError: if the size is not a nonnegative integer.
+ """
+ if _max_size < 0 or _max_size != round(_max_size):
+ raise ValueError('_max_size must be a nonnegative int, was %s' % _max_size)
+ self.items = []
+ # This mutex protects the internal items, ensuring that calls to Items and
+ # AddItem are thread-safe
+ self._mutex = threading.Lock()
+ self._max_size = _max_size
+ self._count = 0
+ if _random is not None:
+ self._random = _random
+ else:
+ self._random = random.Random(0)
+
+ def AddItem(self, item):
+ """Add an item to the ReservoirBucket, replacing an old item if necessary.
+
+ The new item is guaranteed to be added to the bucket, and to be the last
+ element in the bucket. If the bucket has reached capacity, then an old item
+ will be replaced. With probability (_max_size/_count) a random item in the
+ bucket will be popped out and the new item will be appended to the end. With
+ probability (1 - _max_size/_count) the last item in the bucket will be
+ replaced.
+
+ Since the O(n) replacements occur with O(1/_count) likelihood, the amortized
+ runtime is O(1).
+
+ Args:
+ item: The item to add to the bucket.
+ """
+ with self._mutex:
+ if len(self.items) < self._max_size or self._max_size == 0:
+ self.items.append(item)
+ else:
+ r = self._random.randint(0, self._count)
+ if r < self._max_size:
+ self.items.pop(r)
+ self.items.append(item)
+ else:
+ self.items[-1] = item
+ self._count += 1
+
+ def Items(self):
+ """Get all the items in the bucket."""
+ with self._mutex:
+ return self.items
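A short usage sketch of the container above; the key name and values are arbitrary:

```python
from tensorflow.python.summary.impl import reservoir

r = reservoir.Reservoir(size=5)
for step in xrange(1000):
  r.AddItem('loss', step)

print r.Keys()               # ['loss']
print len(r.Items('loss'))   # 5: a deterministic sample of the 1000 items.
print r.Items('loss')[-1]    # 999: the most recent item is always kept.
```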
diff --git a/tensorflow/python/summary/impl/reservoir_test.py b/tensorflow/python/summary/impl/reservoir_test.py
new file mode 100644
index 0000000000..46cbde5940
--- /dev/null
+++ b/tensorflow/python/summary/impl/reservoir_test.py
@@ -0,0 +1,178 @@
+import tensorflow.python.platform
+
+from tensorflow.python.platform import googletest
+from tensorflow.python.summary.impl import reservoir
+
+
+class ReservoirTest(googletest.TestCase):
+
+ def testEmptyReservoir(self):
+ r = reservoir.Reservoir(1)
+ self.assertFalse(r.Keys())
+
+ def testRespectsSize(self):
+ r = reservoir.Reservoir(42)
+ self.assertEqual(r._buckets['meaning of life']._max_size, 42)
+
+ def testItemsAndKeys(self):
+ r = reservoir.Reservoir(42)
+ r.AddItem('foo', 4)
+ r.AddItem('bar', 9)
+ r.AddItem('foo', 19)
+ self.assertItemsEqual(r.Keys(), ['foo', 'bar'])
+ self.assertEqual(r.Items('foo'), [4, 19])
+ self.assertEqual(r.Items('bar'), [9])
+
+ def testExceptions(self):
+ with self.assertRaises(ValueError):
+ reservoir.Reservoir(-1)
+ with self.assertRaises(ValueError):
+ reservoir.Reservoir(13.3)
+
+ r = reservoir.Reservoir(12)
+ with self.assertRaises(KeyError):
+ r.Items('missing key')
+
+ def testDeterminism(self):
+ """Tests that the reservoir is deterministic."""
+ key = 'key'
+ r1 = reservoir.Reservoir(10)
+ r2 = reservoir.Reservoir(10)
+ for i in xrange(100):
+ r1.AddItem('key', i)
+ r2.AddItem('key', i)
+
+ self.assertEqual(r1.Items(key), r2.Items(key))
+
+ def testBucketDeterminism(self):
+ """Tests that reservoirs are deterministic at a bucket level.
+
+ This means that only the order in which elements are added within a bucket matters.
+ """
+ separate_reservoir = reservoir.Reservoir(10)
+ interleaved_reservoir = reservoir.Reservoir(10)
+ for i in xrange(100):
+ separate_reservoir.AddItem('key1', i)
+ for i in xrange(100):
+ separate_reservoir.AddItem('key2', i)
+ for i in xrange(100):
+ interleaved_reservoir.AddItem('key1', i)
+ interleaved_reservoir.AddItem('key2', i)
+
+ for key in ['key1', 'key2']:
+ self.assertEqual(separate_reservoir.Items(key),
+ interleaved_reservoir.Items(key))
+
+ def testUsesSeed(self):
+ """Tests that reservoirs with different seeds keep different samples."""
+ key = 'key'
+ r1 = reservoir.Reservoir(10, seed=0)
+ r2 = reservoir.Reservoir(10, seed=1)
+ for i in xrange(100):
+ r1.AddItem('key', i)
+ r2.AddItem('key', i)
+ self.assertNotEqual(r1.Items(key), r2.Items(key))
+
+
+class ReservoirBucketTest(googletest.TestCase):
+
+ def testEmptyBucket(self):
+ b = reservoir._ReservoirBucket(1)
+ self.assertFalse(b.Items())
+
+ def testFillToSize(self):
+ b = reservoir._ReservoirBucket(100)
+ for i in xrange(100):
+ b.AddItem(i)
+ self.assertEqual(b.Items(), range(100))
+
+ def testDoesntOverfill(self):
+ b = reservoir._ReservoirBucket(10)
+ for i in xrange(1000):
+ b.AddItem(i)
+ self.assertEqual(len(b.Items()), 10)
+
+ def testMaintainsOrder(self):
+ b = reservoir._ReservoirBucket(100)
+ for i in xrange(10000):
+ b.AddItem(i)
+ items = b.Items()
+ prev = None
+ for item in items:
+ self.assertTrue(item > prev)
+ prev = item
+
+ def testKeepsLatestItem(self):
+ b = reservoir._ReservoirBucket(5)
+ for i in xrange(100):
+ b.AddItem(i)
+ last = b.Items()[-1]
+ self.assertEqual(last, i)
+
+ def testSizeOneBucket(self):
+ b = reservoir._ReservoirBucket(1)
+ for i in xrange(20):
+ b.AddItem(i)
+ self.assertEqual(b.Items(), [i])
+
+ def testSizeZeroBucket(self):
+ b = reservoir._ReservoirBucket(0)
+ for i in xrange(20):
+ b.AddItem(i)
+ self.assertEqual(b.Items(), range(i+1))
+
+ def testSizeRequirement(self):
+ with self.assertRaises(ValueError):
+ reservoir._ReservoirBucket(-1)
+ with self.assertRaises(ValueError):
+ reservoir._ReservoirBucket(10.3)
+
+
+class ReservoirBucketStatisticalDistributionTest(googletest.TestCase):
+
+ def setUp(self):
+ self.total = 1000000
+ self.samples = 10000
+ self.n_buckets = 100
+ self.total_per_bucket = self.total / self.n_buckets
+ self.assertEqual(self.total % self.n_buckets, 0, 'total must be evenly '
+ 'divisible by the number of buckets')
+ self.assertTrue(self.total > self.samples, 'need to have more items '
+ 'than samples')
+
+ def AssertBinomialQuantity(self, measured):
+ p = 1.0 * self.n_buckets / self.samples
+ mean = p * self.samples
+ variance = p * (1 - p) * self.samples
+ error = measured - mean
+ # Given that the buckets were actually binomially distributed, this
+ # fails with probability ~2E-9
+ passed = error * error <= 36.0 * variance
+ self.assertTrue(passed, 'found a bucket with measured %d '
+ 'too far from expected %d' % (measured, mean))
+
+ def testBucketReservoirSamplingViaStatisticalProperties(self):
+ # Not related to a 'ReservoirBucket', but instead number of buckets we put
+ # samples into for testing the shape of the distribution
+ b = reservoir._ReservoirBucket(_max_size=self.samples)
+ # add one extra item because we always keep the most recent item, which
+ # would skew the distribution; we can just slice it off the end instead.
+ for i in xrange(self.total + 1):
+ b.AddItem(i)
+
+ divbins = [0] * self.n_buckets
+ modbins = [0] * self.n_buckets
+ # Slice off the last item when we iterate.
+ for item in b.Items()[0:-1]:
+ divbins[item / self.total_per_bucket] += 1
+ modbins[item % self.n_buckets] += 1
+
+ for bucket_index in xrange(self.n_buckets):
+ divbin = divbins[bucket_index]
+ modbin = modbins[bucket_index]
+ self.AssertBinomialQuantity(divbin)
+ self.AssertBinomialQuantity(modbin)
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/tensorflow.i b/tensorflow/python/tensorflow.i
new file mode 100644
index 0000000000..d26f12a89c
--- /dev/null
+++ b/tensorflow/python/tensorflow.i
@@ -0,0 +1,14 @@
+/* SWIG wrapper for all of TensorFlow native functionality.
+ * The includes are intentionally not alphabetically sorted, as the order of
+ * includes follows dependency order */
+
+%include "tensorflow/python/util/port.i"
+
+%include "tensorflow/python/lib/core/status.i"
+%include "tensorflow/python/lib/core/status_helper.i"
+
+%include "tensorflow/python/lib/io/py_record_reader.i"
+%include "tensorflow/python/lib/io/py_record_writer.i"
+%include "tensorflow/python/client/events_writer.i"
+
+%include "tensorflow/python/client/tf_session.i"
diff --git a/tensorflow/python/training/__init__.py b/tensorflow/python/training/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/training/__init__.py
diff --git a/tensorflow/python/training/adagrad.py b/tensorflow/python/training/adagrad.py
new file mode 100644
index 0000000000..41cf2e00f4
--- /dev/null
+++ b/tensorflow/python/training/adagrad.py
@@ -0,0 +1,58 @@
+"""Adagrad for TensorFlow."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.training import optimizer
+from tensorflow.python.training import training_ops
+
+
+class AdagradOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the Adagrad algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate, initial_accumulator_value=0.1,
+ use_locking=False, name="Adagrad"):
+ """Construct a new Adagrad optimizer.
+
+ Args:
+ learning_rate: A `Tensor` or a floating point value. The learning rate.
+ initial_accumulator_value: A floating point value.
+ Starting value for the accumulators, must be positive.
+ use_locking: If `True` use locks for update operations.
+ name: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Adagrad".
+
+ Raises:
+ ValueError: If the initial_accumulator_value is invalid.
+ """
+ if initial_accumulator_value <= 0.0:
+ raise ValueError("initial_accumulator_value must be positive: %s" %
+ initial_accumulator_value)
+ super(AdagradOptimizer, self).__init__(use_locking, name)
+ self._learning_rate = learning_rate
+ self._initial_accumulator_value = initial_accumulator_value
+ # Created in _prepare().
+ self._learning_rate_tensor = None
+
+ def _create_slots(self, var_list):
+ for v in var_list:
+ val = constant_op.constant(self._initial_accumulator_value,
+ shape=v.get_shape())
+ self._get_or_make_slot(v, val, "accumulator", self._name)
+
+ def _prepare(self):
+ self._learning_rate_tensor = ops.convert_to_tensor(self._learning_rate,
+ name="learning_rate")
+
+ def _apply_dense(self, grad, var):
+ acc = self.get_slot(var, "accumulator")
+ return training_ops.apply_adagrad(
+ var, acc, self._learning_rate_tensor, grad,
+ use_locking=self._use_locking)
+
+ def _apply_sparse(self, grad, var):
+ acc = self.get_slot(var, "accumulator")
+ return training_ops.sparse_apply_adagrad(
+ var, acc, self._learning_rate_tensor, grad.values, grad.indices,
+ use_locking=self._use_locking)
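The dense update that apply_adagrad performs can be written out in a few lines of numpy. This is a sketch of the standard Adagrad rule (accumulate squared gradients, then scale the step), not the kernel itself, using the same numbers as the basic test below:

```python
import numpy as np


def adagrad_update_numpy(var, accum, grad, lr=3.0):
  """One dense Adagrad step: accum += g*g, var -= lr * g / sqrt(accum)."""
  accum = accum + grad * grad
  var = var - lr * grad / np.sqrt(accum)
  return var, accum


var = np.array([1.0, 2.0])
accum = np.full_like(var, 0.1)   # initial_accumulator_value
grad = np.array([0.1, 0.1])
for _ in range(3):
  var, accum = adagrad_update_numpy(var, accum, grad)
print var   # ~[-1.6026, -0.6026], the values asserted in adagrad_test.py below.
```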
diff --git a/tensorflow/python/training/adagrad_test.py b/tensorflow/python/training/adagrad_test.py
new file mode 100644
index 0000000000..ee83791eb5
--- /dev/null
+++ b/tensorflow/python/training/adagrad_test.py
@@ -0,0 +1,144 @@
+"""Functional tests for aggregate operations."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class AdagradOptimizerTest(tf.test.TestCase):
+
+ def testBasic(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ ada_opt = tf.train.AdagradOptimizer(3.0, initial_accumulator_value=0.1)
+ ada_update = ada_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Run 3 steps of adagrad
+ for _ in range(3):
+ ada_update.run()
+ # Validate updated params
+ self.assertAllClose(np.array([-1.6026098728179932, -0.6026098728179932]),
+ var0.eval())
+ self.assertAllClose(np.array([2.715679168701172, 3.715679168701172]),
+ var1.eval())
+
+ def testFloat64(self):
+ with self.test_session():
+ opt = tf.train.AdagradOptimizer(3.0, initial_accumulator_value=0.1)
+
+ # compute_gradients.
+ values = [1.0, 3.0]
+ good_vars = [tf.Variable([v]) for v in values]
+ bad_loss = tf.constant(2.0, tf.float64, name="bad_loss")
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_loss.*expected.*float32",
+ opt.compute_gradients, bad_loss, good_vars)
+ bad_vars = [
+ tf.Variable(np.array([v], np.float64), name="bad_var")
+ for v in values]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.compute_gradients, tf.cast(bad_vars[0] + bad_vars[1], tf.float32),
+ bad_vars)
+ opt.compute_gradients(good_vars[0] + good_vars[1], good_vars)
+
+ # apply_gradients.
+ bad_grads = [
+ tf.constant([0.1], dtype=np.float64, name="bad_grad"),
+ tf.constant([0.01])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_grad.*expected.*float32",
+ opt.apply_gradients, zip(bad_grads, good_vars))
+ good_grads = [tf.constant([0.01]), tf.constant([0.02])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.apply_gradients, zip(good_grads, bad_vars))
+ opt.apply_gradients(zip(good_grads, good_vars))
+
+ def testSparseBasic(self):
+ with self.test_session():
+ var0 = tf.Variable([[1.0], [2.0]])
+ var1 = tf.Variable([[3.0], [4.0]])
+ grads0 = tf.IndexedSlices(tf.constant([0.1], shape=[1, 1]),
+ tf.constant([0]),
+ tf.constant([2, 1]))
+ grads1 = tf.IndexedSlices(tf.constant([0.01], shape=[1, 1]),
+ tf.constant([1]),
+ tf.constant([2, 1]))
+ ada_opt = tf.train.AdagradOptimizer(3.0, initial_accumulator_value=0.1)
+ ada_update = ada_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+ # Fetch params to validate initial values
+ self.assertAllClose([[1.0], [2.0]], var0.eval())
+ self.assertAllClose([[3.0], [4.0]], var1.eval())
+ # Run 3 steps of adagrad
+ for _ in range(3):
+ ada_update.run()
+ # Validate updated params
+ self.assertAllClose([[-1.6026098728179932], [2.0]], var0.eval())
+ self.assertAllClose([[3.0], [3.715679168701172]], var1.eval())
+
+ def testSparseStability(self):
+ with self.test_session():
+ shape = [1, 6]
+ var0 = tf.Variable([[0.00872496, -0.106952, 0.110467, 0.226505,
+ -0.0147257, -0.0105945]])
+ grads0 = tf.IndexedSlices(
+ tf.constant(
+ [[-5.91278e-05, 5.31673e-05, -2.5779e-06, 4.29153e-05,
+ -8.4877e-05, -9.48906e-05]],
+ shape=shape),
+ tf.constant([0]),
+ tf.constant(shape))
+ ada_opt = tf.train.AdagradOptimizer(1.0, initial_accumulator_value=0.1)
+ ada_update = ada_opt.apply_gradients(zip([grads0], [var0]))
+ self.assertEqual(["accumulator"], ada_opt.get_slot_names())
+ slot0 = ada_opt.get_slot(var0, "accumulator")
+ init = tf.initialize_all_variables()
+ for _ in range(100):
+ init.run()
+ ada_update.run()
+ self.assertAllClose([[0.1, 0.1, 0.1, 0.1, 0.1, 0.1]], slot0.eval())
+ self.assertAllClose(
+ [[0.00891194, -0.10712013, 0.11047515, 0.22636929,
+ -0.0144573, -0.01029443]], var0.eval())
+
+ def testSharing(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ ada_opt = tf.train.AdagradOptimizer(3.0)
+ # Apply the optimizer twice. Both applications will use the same accums.
+ ada_update1 = ada_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ ada_update2 = ada_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ self.assertEqual(["accumulator"], ada_opt.get_slot_names())
+ slot0 = ada_opt.get_slot(var0, "accumulator")
+ self.assertEquals(slot0.get_shape(), var0.get_shape())
+ slot1 = ada_opt.get_slot(var1, "accumulator")
+ self.assertEquals(slot1.get_shape(), var1.get_shape())
+ tf.initialize_all_variables().run()
+
+ # Fetch params to validate initial values.
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Mix the first and the second adagrad for 3 steps.
+ ada_update1.run()
+ ada_update2.run()
+ ada_update1.run()
+ # Validate updated params (the same as with only 1 Adagrad).
+ self.assertAllClose(np.array([-1.6026098728179932, -0.6026098728179932]),
+ var0.eval())
+ self.assertAllClose(np.array([2.715679168701172, 3.715679168701172]),
+ var1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/adam.py b/tensorflow/python/training/adam.py
new file mode 100644
index 0000000000..266430bb13
--- /dev/null
+++ b/tensorflow/python/training/adam.py
@@ -0,0 +1,142 @@
+"""Adam for TensorFlow."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.training import optimizer
+from tensorflow.python.training import training_ops
+
+
+class AdamOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the Adam algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8,
+ use_locking=False, name="Adam"):
+ """Construct a new Adam optimizer.
+
+ Implementation is based on: http://arxiv.org/pdf/1412.6980v7.pdf
+
+ Initialization:
+
+ ```
+ m_0 <- 0 (Initialize initial 1st moment vector)
+ v_0 <- 0 (Initialize initial 2nd moment vector)
+ t <- 0 (Initialize timestep)
+ ```
+
+ The update rule for `variable` with gradient `g` uses an optimization
+ described at the end of section 2 of the paper:
+
+ ```
+ t <- t + 1
+ lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
+
+ m_t <- beta1 * m_{t-1} + (1 - beta1) * g
+ v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
+ variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
+ ```
+
+ The default value of 1e-8 for epsilon might not be a good choice in
+ general. For example, when training an Inception network on ImageNet, a
+ current good choice is 1.0 or 0.1.
+
+ Args:
+ learning_rate: A Tensor or a floating point value. The learning rate.
+ beta1: A float value or a constant float tensor.
+ The exponential decay rate for the 1st moment estimates.
+ beta2: A float value or a constant float tensor.
+ The exponential decay rate for the 2nd moment estimates.
+ epsilon: A small constant for numerical stability.
+ use_locking: If True use locks for update operations.
+ name: Optional name for the operations created when applying gradients.
+ Defaults to "Adam".
+ """
+ super(AdamOptimizer, self).__init__(use_locking, name)
+ self._lr = learning_rate
+ self._beta1 = beta1
+ self._beta2 = beta2
+ self._epsilon = epsilon
+
+ # Tensor versions of the constructor arguments, created in _prepare().
+ self._lr_t = None
+ self._beta1_t = None
+ self._beta2_t = None
+ self._epsilon_t = None
+
+ # Variables to accumulate the powers of the beta parameters.
+ # Created in _create_slots when we know the variables to optimize.
+ self._beta1_power = None
+ self._beta2_power = None
+
+ # Created in SparseApply if needed.
+ self._updated_lr = None
+
+ def _get_beta_accumulators(self):
+ return self._beta1_power, self._beta2_power
+
+ def _create_slots(self, var_list):
+ # Create the beta1 and beta2 accumulators on the same device as the first
+ # variable.
+ if self._beta1_power is None:
+ with ops.device(var_list[0].device):
+ self._beta1_power = variables.Variable(self._beta1, name="beta1_power")
+ self._beta2_power = variables.Variable(self._beta2, name="beta2_power")
+ # Create slots for the first and second moments.
+ for v in var_list:
+ self._zeros_slot(v, "m", self._name)
+ self._zeros_slot(v, "v", self._name)
+
+ def _prepare(self):
+ self._lr_t = ops.convert_to_tensor(self._lr, name="learning_rate")
+ self._beta1_t = ops.convert_to_tensor(self._beta1, name="beta1")
+ self._beta2_t = ops.convert_to_tensor(self._beta2, name="beta2")
+ self._epsilon_t = ops.convert_to_tensor(self._epsilon, name="epsilon")
+
+ def _apply_dense(self, grad, var):
+ m = self.get_slot(var, "m")
+ v = self.get_slot(var, "v")
+ return training_ops.apply_adam(
+ var, m, v, self._beta1_power, self._beta2_power,
+ self._lr_t, self._beta1_t, self._beta2_t,
+ self._epsilon_t, grad, use_locking=self._use_locking).op
+
+ def _apply_sparse(self, grad, var):
+ lr = (self._lr_t *
+ math_ops.sqrt(1 - self._beta2_power)
+ / (1 - self._beta1_power))
+ # m_t = beta1 * m + (1 - beta1) * g_t
+ m = self.get_slot(var, "m")
+ m_scaled_g_values = grad.values * (1 - self._beta1_t)
+ m_t = state_ops.assign(m, m * self._beta1_t,
+ use_locking=self._use_locking)
+ m_t = state_ops.scatter_add(m_t, grad.indices, m_scaled_g_values,
+ use_locking=self._use_locking)
+ # v_t = beta2 * v + (1 - beta2) * (g_t * g_t)
+ v = self.get_slot(var, "v")
+ v_scaled_g_values = (grad.values * grad.values) * (1 - self._beta2_t)
+ v_t = state_ops.assign(v, v * self._beta2_t, use_locking=self._use_locking)
+ v_t = state_ops.scatter_add(v_t, grad.indices, v_scaled_g_values,
+ use_locking=self._use_locking)
+ v_sqrt = math_ops.sqrt(v_t)
+ var_update = state_ops.assign_sub(var,
+ lr * m_t / (v_sqrt + self._epsilon_t),
+ use_locking=self._use_locking)
+ return control_flow_ops.group(*[var_update, m_t, v_t])
+
+ def _finish(self, update_ops, name_scope):
+ # Update the power accumulators.
+ with ops.control_dependencies(update_ops):
+ with ops.device(self._beta1_power.device):
+ update_beta1 = self._beta1_power.assign(
+ self._beta1_power * self._beta1_t,
+ use_locking=self._use_locking)
+ update_beta2 = self._beta2_power.assign(
+ self._beta2_power * self._beta2_t,
+ use_locking=self._use_locking)
+ return control_flow_ops.group(*update_ops + [update_beta1, update_beta2],
+ name=name_scope)
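The pseudocode in the docstring translates directly into numpy. The sketch below mirrors the adam_update_numpy helper used by the tests that follow and is included here only to make the update concrete:

```python
import numpy as np


def adam_step(var, grad, t, m, v,
              lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
  """One Adam step following the update rule documented above (t starts at 1)."""
  lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
  m = beta1 * m + (1 - beta1) * grad
  v = beta2 * v + (1 - beta2) * grad * grad
  var = var - lr_t * m / (np.sqrt(v) + epsilon)
  return var, m, v
```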
diff --git a/tensorflow/python/training/adam_test.py b/tensorflow/python/training/adam_test.py
new file mode 100644
index 0000000000..f92728d0c7
--- /dev/null
+++ b/tensorflow/python/training/adam_test.py
@@ -0,0 +1,174 @@
+"""Tests for Adam."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+def adam_update_numpy(param, g_t, t, m, v, alpha=0.001, beta1=0.9, beta2=0.999,
+ epsilon=1e-8):
+ alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
+
+ m_t = beta1 * m + (1 - beta1) * g_t
+ v_t = beta2 * v + (1 - beta2) * g_t * g_t
+
+ param_t = param - alpha_t * m_t / (np.sqrt(v_t) + epsilon)
+ return param_t, m_t, v_t
+
+
+class AdamOptimizerTest(tf.test.TestCase):
+
+ def testSparse(self):
+ with self.test_session():
+ # Initialize variables for numpy implementation.
+ m0, v0, m1, v1 = 0.0, 0.0, 0.0, 0.0
+ var0_np = np.array([1.0, 2.0], dtype=np.float32)
+ grads0_np = np.array([0.1, 0.1], dtype=np.float32)
+ var1_np = np.array([3.0, 4.0], dtype=np.float32)
+ grads1_np = np.array([0.01, 0.01], dtype=np.float32)
+
+ var0 = tf.Variable(var0_np)
+ var1 = tf.Variable(var1_np)
+ grads0_np_indices = np.array([0, 1], dtype=np.int32)
+ grads0 = tf.IndexedSlices(tf.constant(grads0_np),
+ tf.constant(grads0_np_indices),
+ tf.constant([2]))
+ grads1_np_indices = np.array([0, 1], dtype=np.int32)
+ grads1 = tf.IndexedSlices(tf.constant(grads1_np),
+ tf.constant(grads1_np_indices),
+ tf.constant([2]))
+ opt = tf.train.AdamOptimizer()
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+
+ beta1_power, beta2_power = opt._get_beta_accumulators()
+
+ # Run 3 steps of Adam
+ for t in range(1, 4):
+ self.assertAllClose(0.9 ** t, beta1_power.eval())
+ self.assertAllClose(0.999 ** t, beta2_power.eval())
+ update.run()
+
+ var0_np, m0, v0 = adam_update_numpy(var0_np, grads0_np, t, m0, v0)
+ var1_np, m1, v1 = adam_update_numpy(var1_np, grads1_np, t, m1, v1)
+
+ # Validate updated params
+ self.assertAllClose(var0_np, var0.eval())
+ self.assertAllClose(var1_np, var1.eval())
+
+ def testBasic(self):
+ with self.test_session():
+ # Initialize variables for numpy implementation.
+ m0, v0, m1, v1 = 0.0, 0.0, 0.0, 0.0
+ var0_np = np.array([1.0, 2.0], dtype=np.float32)
+ grads0_np = np.array([0.1, 0.1], dtype=np.float32)
+ var1_np = np.array([3.0, 4.0], dtype=np.float32)
+ grads1_np = np.array([0.01, 0.01], dtype=np.float32)
+
+ var0 = tf.Variable(var0_np)
+ var1 = tf.Variable(var1_np)
+ grads0 = tf.constant(grads0_np)
+ grads1 = tf.constant(grads1_np)
+ opt = tf.train.AdamOptimizer()
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+
+ beta1_power, beta2_power = opt._get_beta_accumulators()
+
+ # Run 3 steps of Adam
+ for t in range(1, 4):
+ self.assertAllClose(0.9 ** t, beta1_power.eval())
+ self.assertAllClose(0.999 ** t, beta2_power.eval())
+ update.run()
+
+ var0_np, m0, v0 = adam_update_numpy(var0_np, grads0_np, t, m0, v0)
+ var1_np, m1, v1 = adam_update_numpy(var1_np, grads1_np, t, m1, v1)
+
+ # Validate updated params
+ self.assertAllClose(var0_np, var0.eval())
+ self.assertAllClose(var1_np, var1.eval())
+
+ def testFloat64(self):
+ with self.test_session():
+ opt = tf.train.AdamOptimizer()
+
+ # compute_gradients.
+ values = [1.0, 3.0]
+ good_vars = [tf.Variable([v]) for v in values]
+ bad_loss = tf.constant(2.0, tf.float64, name="bad_loss")
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_loss.*expected.*float32",
+ opt.compute_gradients, bad_loss, good_vars)
+ bad_vars = [
+ tf.Variable(np.array([v], np.float64), name="bad_var")
+ for v in values]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.compute_gradients, tf.cast(bad_vars[0] + bad_vars[1], tf.float32),
+ bad_vars)
+ opt.compute_gradients(good_vars[0] + good_vars[1], good_vars)
+
+ # apply_gradients.
+ bad_grads = [
+ tf.constant([0.1], dtype=np.float64, name="bad_grad"),
+ tf.constant([0.01])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_grad.*expected.*float32",
+ opt.apply_gradients, zip(bad_grads, good_vars))
+ good_grads = [tf.constant([0.01]), tf.constant([0.02])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.apply_gradients, zip(good_grads, bad_vars))
+ opt.apply_gradients(zip(good_grads, good_vars))
+
+ def testSharing(self):
+ with self.test_session():
+ # Initialize variables for numpy implementation.
+ m0, v0, m1, v1 = 0.0, 0.0, 0.0, 0.0
+ var0_np = np.array([1.0, 2.0], dtype=np.float32)
+ grads0_np = np.array([0.1, 0.1], dtype=np.float32)
+ var1_np = np.array([3.0, 4.0], dtype=np.float32)
+ grads1_np = np.array([0.01, 0.01], dtype=np.float32)
+
+ var0 = tf.Variable(var0_np)
+ var1 = tf.Variable(var1_np)
+ grads0 = tf.constant(grads0_np)
+ grads1 = tf.constant(grads1_np)
+ opt = tf.train.AdamOptimizer()
+ update1 = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ update2 = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ beta1_power, beta2_power = opt._get_beta_accumulators()
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+
+ # Run 3 steps of intertwined Adam1 and Adam2.
+ for t in range(1, 4):
+ self.assertAllClose(0.9 ** t, beta1_power.eval())
+ self.assertAllClose(0.999 ** t, beta2_power.eval())
+ if t % 2 == 0:
+ update1.run()
+ else:
+ update2.run()
+
+ var0_np, m0, v0 = adam_update_numpy(var0_np, grads0_np, t, m0, v0)
+ var1_np, m1, v1 = adam_update_numpy(var1_np, grads1_np, t, m1, v1)
+
+ # Validate updated params
+ self.assertAllClose(var0_np, var0.eval())
+ self.assertAllClose(var1_np, var1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/checkpoint_state.proto b/tensorflow/python/training/checkpoint_state.proto
new file mode 100644
index 0000000000..1f521341f1
--- /dev/null
+++ b/tensorflow/python/training/checkpoint_state.proto
@@ -0,0 +1,18 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+// Protocol buffer representing the checkpoint state.
+//
+// TODO(mdevin): Add other attributes as needed.
+message CheckpointState {
+ // Path to the most-recent model checkpoint.
+ string model_checkpoint_path = 1;
+
+ // Paths to all not-yet-deleted model checkpoints, sorted from oldest to
+ // newest.
+ // Note that the value of model_checkpoint_path should be the last item in
+ // this list.
+ repeated string all_model_checkpoint_paths = 2;
+}
diff --git a/tensorflow/python/training/coordinator.py b/tensorflow/python/training/coordinator.py
new file mode 100644
index 0000000000..f090e6d222
--- /dev/null
+++ b/tensorflow/python/training/coordinator.py
@@ -0,0 +1,186 @@
+"""Coordinator to help multiple threads stop when requested."""
+import sys
+import threading
+import time
+
+from tensorflow.python.platform import logging
+
+
+class Coordinator(object):
+ """A coordinator for threads.
+
+ This class implements a simple mechanism to coordinate the termination of a
+ set of threads.
+
+ #### Usage:
+
+ ```python
+ # Create a coordinator.
+ coord = Coordinator()
+ # Start a number of threads, passing the coordinator to each of them.
+ ...start thread 1...(coord, ...)
+ ...start thread N...(coord, ...)
+ # Wait for all the threads to terminate.
+ coord.join(threads)
+ ```
+
+ Any of the threads can call `coord.request_stop()` to ask for all the threads
+ to stop. To cooperate with the requests, each thread must check for
+ `coord.should_stop()` on a regular basis. `coord.should_stop()` returns
+ `True` as soon as `coord.request_stop()` has been called.
+
+ A typical thread running with a Coordinator will do something like:
+
+ ```python
+ while not coord.should_stop():
+ ...do some work...
+ ```
+
+ #### Exception handling:
+
+ A thread can report an exception to the Coordinator as part of the
+ `should_stop()` call. The exception will be re-raised from the
+ `coord.join()` call.
+
+ Thread code:
+
+ ```python
+ try:
+ while not coord.should_stop():
+ ...do some work...
+ except Exception, e:
+ coord.request_stop(e)
+ ```
+
+ Main code:
+
+ ```python
+ try:
+ ...
+ coord = Coordinator()
+ # Start a number of threads, passing the coordinator to each of them.
+ ...start thread 1...(coord, ...)
+ ...start thread N...(coord, ...)
+ # Wait for all the threads to terminate.
+ coord.join(threads)
+ except Exception, e:
+ ...exception that was passed to coord.request_stop()
+ ```
+
+ #### Grace period for stopping:
+
+ After a thread has called `coord.request_stop()`, the other threads have a
+ fixed time to stop; this is called the 'stop grace period' and defaults to 2
+ minutes. If any of the threads is still alive after the grace period expires,
+ `coord.join()` raises a RuntimeError reporting the laggards.
+
+ ```
+ try:
+ ...
+ coord = Coordinator()
+ # Start a number of threads, passing the coordinator to each of them.
+ ...start thread 1...(coord, ...)
+ ...start thread N...(coord, ...)
+ # Wait for all the threads to terminate, give them 10s grace period
+ coord.join(threads, stop_grace_period_secs=10)
+ except RuntimeError:
+ ...one of the threads took more than 10s to stop after request_stop()
+ ...was called.
+ except Exception:
+ ...exception that was passed to coord.request_stop()
+ ```
+ """
+
+ def __init__(self):
+ """Create a new Coordinator."""
+ # Protects all attributes.
+ self._lock = threading.Lock()
+ # Event set when threads must stop.
+ self._stop_event = threading.Event()
+ # Python exc_info to report.
+ self._exc_info_to_raise = None
+
+ def request_stop(self, ex=None):
+ """Request that the threads stop.
+
+ After this is called, calls to should_stop() will return True.
+
+ Args:
+ ex: Optional Exception, or Python 'exc_info' tuple as returned by
+ sys.exc_info(). If this is the first call to request_stop() the
+ corresponding exception is recorded and re-raised from join().
+ """
+ with self._lock:
+ if not self._stop_event.is_set():
+ if ex and self._exc_info_to_raise is None:
+ if isinstance(ex, tuple):
+ logging.info("Error reported to Coordinator: %s", str(ex[1]))
+ self._exc_info_to_raise = ex
+ else:
+ logging.info("Error reported to Coordinator: %s", str(ex))
+ self._exc_info_to_raise = sys.exc_info()
+ self._stop_event.set()
+
+ def should_stop(self):
+ """Check if stop was requested.
+
+ Returns:
+ True if a stop was requested.
+ """
+ return self._stop_event.is_set()
+
+ def wait_for_stop(self, timeout=None):
+ """Wait till the Coordinator is told to stop.
+
+ Args:
+ timeout: float. Sleep for up to that many seconds waiting for
+ should_stop() to become True.
+
+ Returns:
+ True if the Coordinator is told to stop, False if the timeout expired.
+ """
+ return self._stop_event.wait(timeout)
+
+ def join(self, threads, stop_grace_period_secs=120):
+ """Wait for threads to terminate.
+
+ Blocks until all 'threads' have terminated or request_stop() is called.
+
+ After the threads stop, if an 'exc_info' was passed to request_stop, that
+ exception is re-raised.
+
+ Grace period handling: When request_stop() is called, threads are given
+ 'stop_grace_period_secs' seconds to terminate. If any of them is still
+ alive after that period expires, a RuntimeError is raised. Note that if
+ an 'exc_info' was passed to request_stop() then it is raised instead of
+ that RuntimeError.
+
+ Args:
+ threads: List of threading.Thread objects. The started threads to join.
+ stop_grace_period_secs: Number of seconds given to threads to stop after
+ request_stop() has been called.
+
+ Raises:
+ RuntimeError: If any thread is still alive after request_stop()
+ is called and the grace period expires.
+ """
+ # Wait for all threads to stop or for request_stop() to be called.
+ while any(t.is_alive() for t in threads) and not self.wait_for_stop(1.0):
+ pass
+
+ # If any thread is still alive, wait for the grace period to expire.
+ while any(t.is_alive() for t in threads) and stop_grace_period_secs >= 0.0:
+ stop_grace_period_secs -= 1.0
+ time.sleep(1.0)
+
+ # List the threads still alive after the grace period.
+ stragglers = [t.name for t in threads if t.is_alive()]
+
+ # Terminate with an exception if appropriate.
+ with self._lock:
+ if self._exc_info_to_raise:
+ exc_info = self._exc_info_to_raise
+ raise exc_info[0], exc_info[1], exc_info[2]
+ elif stragglers:
+ raise RuntimeError("Coordinator stopped with threads still running: %s" %
+ " ".join(stragglers))
diff --git a/tensorflow/python/training/coordinator_test.py b/tensorflow/python/training/coordinator_test.py
new file mode 100644
index 0000000000..ce9126caf4
--- /dev/null
+++ b/tensorflow/python/training/coordinator_test.py
@@ -0,0 +1,98 @@
+"""Tests for Coordinator."""
+import sys
+import threading
+import time
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+def StopInN(coord, n_secs):
+ time.sleep(n_secs)
+ coord.request_stop()
+
+
+def RaiseInN(coord, n_secs, ex, report_exception):
+ try:
+ time.sleep(n_secs)
+ raise ex
+ except RuntimeError, e:
+ if report_exception:
+ coord.request_stop(e)
+ else:
+ coord.request_stop(sys.exc_info())
+
+
+def SleepABit(n_secs):
+ time.sleep(n_secs)
+
+
+class CoordinatorTest(tf.test.TestCase):
+
+ def testStopAPI(self):
+ coord = tf.train.Coordinator()
+ self.assertFalse(coord.should_stop())
+ self.assertFalse(coord.wait_for_stop(0.01))
+ coord.request_stop()
+ self.assertTrue(coord.should_stop())
+ self.assertTrue(coord.wait_for_stop(0.01))
+
+ def testStopAsync(self):
+ coord = tf.train.Coordinator()
+ self.assertFalse(coord.should_stop())
+ self.assertFalse(coord.wait_for_stop(0.1))
+ threading.Thread(target=StopInN, args=(coord, 0.02)).start()
+ self.assertFalse(coord.should_stop())
+ self.assertFalse(coord.wait_for_stop(0.01))
+ self.assertTrue(coord.wait_for_stop(0.03))
+ self.assertTrue(coord.should_stop())
+
+ def testJoin(self):
+ coord = tf.train.Coordinator()
+ threads = [
+ threading.Thread(target=SleepABit, args=(0.01,)),
+ threading.Thread(target=SleepABit, args=(0.02,)),
+ threading.Thread(target=SleepABit, args=(0.01,))]
+ for t in threads:
+ t.start()
+ coord.join(threads)
+
+ def testJoinGraceExpires(self):
+ coord = tf.train.Coordinator()
+ threads = [
+ threading.Thread(target=StopInN, args=(coord, 0.01)),
+ threading.Thread(target=SleepABit, args=(10.0,))]
+ for t in threads:
+ t.daemon = True
+ t.start()
+ with self.assertRaisesRegexp(RuntimeError, "threads still running"):
+ coord.join(threads, stop_grace_period_secs=0.02)
+
+ def testJoinRaiseReportExcInfo(self):
+ coord = tf.train.Coordinator()
+ threads = [
+ threading.Thread(target=RaiseInN,
+ args=(coord, 0.01, RuntimeError("First"), False)),
+ threading.Thread(target=RaiseInN,
+ args=(coord, 0.02, RuntimeError("Too late"), False))]
+ for t in threads:
+ t.start()
+ with self.assertRaisesRegexp(RuntimeError, "First"):
+ coord.join(threads)
+
+ def testJoinRaiseReportException(self):
+ coord = tf.train.Coordinator()
+ threads = [
+ threading.Thread(target=RaiseInN,
+ args=(coord, 0.01, RuntimeError("First"), True)),
+ threading.Thread(target=RaiseInN,
+ args=(coord, 0.02, RuntimeError("Too late"), True))]
+ for t in threads:
+ t.start()
+ with self.assertRaisesRegexp(RuntimeError, "First"):
+ coord.join(threads)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/ftrl.py b/tensorflow/python/training/ftrl.py
new file mode 100644
index 0000000000..6b9471a5ed
--- /dev/null
+++ b/tensorflow/python/training/ftrl.py
@@ -0,0 +1,283 @@
+"""FTRL-Proximal for Tensor Flow."""
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.training import optimizer
+
+
+def _Solve(a, b, c):
+ """Return solution of a quadratic minimization.
+
+ The optimization equation is:
+ f(a, b, c) = argmin_w{1/2 * a * w^2 + b * w + c * |w|}
+ we get the optimal solution w*:
+ w* = -(b - sign(b)*c)/a if |b| > c else w* = 0
+
+ REQUIRES: Dimensionality of a and b must be the same.
+
+ Args:
+ a: A Tensor
+ b: A Tensor
+ c: A Tensor with one element.
+
+ Returns:
+ A Tensor w, which is the solution of the minimization above.
+ """
+ with ops.name_scope("solve_" + b.op.name):
+ c = ops.convert_to_tensor(c)
+ k = array_ops.fill(array_ops.shape(b), c)
+ zero_t = array_ops.zeros(array_ops.shape(b), dtype=b.dtype)
+ w = (c * math_ops.sign(b) - b) / a
+ w = math_ops.select(math_ops.less(math_ops.abs(b), k), zero_t, w)
+ return w
+
+
+def _Compute(accum, linear, base_lr, lr_power, l1, l2):
+ """Compute "variable" given current "accum" and "linear".
+
+ REQUIRES: Dimensionality of accum and linear must be same.
+
+ Args:
+ accum: A Tensor which is accumulated gradient square.
+ linear: A Tensor with same size of accum.
+ base_lr: A Tensor which is the base learning rate.
+ lr_power: A Tensor which is the learning rate power.
+ l1: A Tensor which is the l1 regularization strength.
+ l2: A Tensor which is the l2 regularization strength.
+ Returns:
+ A Tensor which is "variable" after the update.
+ """
+ with ops.name_scope("compute_" + accum.op.name):
+ one_t = constant_op.constant(1.0, dtype=types.float32)
+ two_t = constant_op.constant(2.0, dtype=types.float32)
+ learning_rate = math_ops.pow(accum, lr_power) * base_lr
+ quadratic = one_t / learning_rate + two_t * l2
+ w = _Solve(quadratic, linear, l1)
+ return w
+
+
+def _Update(variable, gradients, accum, linear, base_lr, lr_power, l1, l2):
+ """Update "variable", "accum", "linear" based on "gradients".
+
+ Notation: "variable" is W, "accum" is N, "linear" is Z,
+ "gradients" is G, and N(t) means "accum" at step t.
+ Assuming lr_power = -0.5, which corresponds to the adagrad learning rate:
+ "accum" updates as: N = N + G^2
+ "linear" updates as: Z = Z + G - W * (sqrt(N(t)) - sqrt(N(t-1)))/base_lr
+ REQUIRES: Dimensionality of variable, gradients, accum and linear
+ must be the same.
+
+ Args:
+ variable: A Variable.
+ gradients: A Tensor of same shape as 'variable'.
+ accum: A Variable containing the sum of the squares of gradients.
+ linear: A Variable containing approximation info.
+ base_lr: A constant representing the base learning rate.
+ lr_power: A constant used to adjust the learning rate.
+ l1: A constant representing the l1 regularization strength.
+ l2: A constant representing the l2 regularization strength.
+
+ Returns:
+ A group op including three Assign ops:
+ 1. Assign for "accum"
+ 2. Assign for "linear"
+ 3. Assign for "variable"
+ """
+ dtype = variable.dtype.base_dtype
+ base_lr = ops.convert_to_tensor(base_lr, dtype=dtype)
+ lr_power = ops.convert_to_tensor(lr_power, dtype=dtype)
+ l1 = ops.convert_to_tensor(l1, dtype=dtype)
+ l2 = ops.convert_to_tensor(l2, dtype=dtype)
+ # Compute the new accumulator
+ sqr_grad = math_ops.square(gradients)
+ accum_updated = sqr_grad + accum
+ # Compute the new linear
+ neg_lr_power = math_ops.neg(lr_power)
+ sigma = math_ops.pow(accum_updated, neg_lr_power) - math_ops.pow(
+ accum, neg_lr_power)
+ sigma /= base_lr
+ proximal_adjust = sigma * variable
+ linear_updated = linear + gradients - proximal_adjust
+ # Compute the "variable"
+ variable_updated = _Compute(accum_updated, linear_updated, base_lr,
+ lr_power, l1, l2)
+
+ with ops.control_dependencies([sigma]):
+ accum_update_op = state_ops.assign(accum, accum_updated)
+ linear_update_op = state_ops.assign(linear, linear_updated)
+ variable_update_op = state_ops.assign(variable, variable_updated)
+ group_op = control_flow_ops.group(linear_update_op, accum_update_op,
+ variable_update_op)
+ return group_op
+
+
+# TODO(xbing): Refactor code to make _SparseUpdate and _Update share
+# common routines.
+def _SparseUpdate(variable, gradients, accum, linear, base_lr,
+ lr_power, l1, l2):
+ """Sparse Update "variable", "accum", "linear" based on sparse "gradients".
+
+ See the description in _Update.
+
+ Args:
+ variable: A Variable.
+ gradients: A sparse Tensor (an `ops.IndexedSlices`).
+ accum: A Variable containing the sum of the squares of gradients.
+ linear: A Variable containing approximation info.
+ base_lr: A constant representing the base learning rate.
+ lr_power: A constant used to adjust the learning rate.
+ l1: A constant representing the l1 regularization strength.
+ l2: A constant representing the l2 regularization strength.
+
+ Returns:
+ A group op including three ScatterUpdate ops:
+ 1. ScatterUpdate for "accum"
+ 2. ScatterUpdate for "linear"
+ 3. ScatterUpdate for "variable"
+ """
+ assert isinstance(gradients, ops.IndexedSlices)
+ with ops.name_scope("sparse_update_" + variable.op.name) as scope:
+ dtype = variable.dtype.base_dtype
+ base_lr = ops.convert_to_tensor(base_lr, dtype=dtype)
+ lr_power = ops.convert_to_tensor(lr_power, dtype=dtype)
+ l1 = ops.convert_to_tensor(l1, dtype=dtype)
+ l2 = ops.convert_to_tensor(l2, dtype=dtype)
+
+ # Compute the new value for the accumulator
+ previous_accum = array_ops.gather(accum, gradients.indices)
+ sqr_grad = gradients.values * gradients.values
+ accum_updated = sqr_grad + previous_accum
+
+ # Compute the new linear
+ neg_lr_power = math_ops.neg(lr_power)
+ sigma = math_ops.pow(accum_updated, neg_lr_power) - math_ops.pow(
+ previous_accum, neg_lr_power)
+ sigma /= base_lr
+ variable_slice = array_ops.gather(variable, gradients.indices)
+ proximal_adjust = sigma * variable_slice
+ linear_slice = array_ops.gather(linear, gradients.indices)
+ linear_updated = linear_slice + gradients.values - proximal_adjust
+
+ # Compute the new "variable"
+ variable_updated = _Compute(accum_updated, linear_updated, base_lr,
+ lr_power, l1, l2)
+
+ with ops.control_dependencies([sigma]):
+ accum_update_op = state_ops.scatter_update(accum, gradients.indices,
+ accum_updated)
+ linear_update_op = state_ops.scatter_update(linear, gradients.indices,
+ linear_updated)
+ variable_update_op = state_ops.scatter_update(variable, gradients.indices,
+ variable_updated)
+ group_op = control_flow_ops.group(linear_update_op, accum_update_op,
+ variable_update_op, name=scope)
+ return group_op
+
+
+class FtrlOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the FTRL algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate,
+ learning_rate_power=-0.5,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0,
+ use_locking=False, name="Ftrl"):
+ """Construct a new FTRL optimizer.
+
+ The Ftrl-proximal algorithm, short for Follow-the-regularized-leader,
+ is described in the paper [Ad Click Prediction: a View from the Trenches](
+ https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf).
+
+ It can give a good performance vs. sparsity tradeoff.
+
+ Ftrl-proximal uses its own global base learning rate and can behave like
+ Adagrad with `learning_rate_power=-0.5`, or like gradient descent with
+ `learning_rate_power=0.0`.
+
+ The effective learning rate is adjusted per parameter, relative to this
+ base learning rate as:
+
+ ```
+ effective_learning_rate_i = (learning_rate /
+ pow(k + summed_squared_gradients_for_i, learning_rate_power));
+ ```
+
+ where k is the small constant `initial_accumulator_value`.
+
+ Note that the actual regularization coefficient of `|w|^2` in the
+ objective function is `1 / lambda_2` when you pass `l2 = lambda_2` to this
+ constructor.
+
+ Args:
+ learning_rate: A float value or a constant float `Tensor`.
+ learning_rate_power: A float value, must be less than or equal to zero.
+ initial_accumulator_value: The starting value for accumulators.
+ Only positive values are allowed.
+ l1_regularization_strength: A float value, must be greater than or
+ equal to zero.
+ l2_regularization_strength: A float value, must be greater than or
+ equal to zero.
+ use_locking: If `True` use locks for update operations.
+ name: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Ftrl".
+
+ Raises:
+ ValueError: if one of the arguments is invalid.
+ """
+ super(FtrlOptimizer, self).__init__(use_locking, name)
+
+ if initial_accumulator_value <= 0.0:
+ raise ValueError("initial_accumulator_value %f needs to be positive" %
+ initial_accumulator_value)
+ if learning_rate_power > 0.0:
+ raise ValueError("learning_rate_power %f needs to be negative or zero" %
+ learning_rate_power)
+ if l1_regularization_strength < 0.0:
+ raise ValueError(
+ "l1_regularization_strength %f needs to be positive or zero" %
+ l1_regularization_strength)
+ if l2_regularization_strength < 0.0:
+ raise ValueError(
+ "l2_regularization_strength %f needs to be positive or zero" %
+ l2_regularization_strength)
+
+ self._learning_rate = learning_rate
+ self._learning_rate_power = learning_rate_power
+ self._initial_accumulator_value = initial_accumulator_value
+ self._l1_regularization_strength = l1_regularization_strength
+ self._l2_regularization_strength = l2_regularization_strength
+
+ def _create_slots(self, var_list):
+ # Create the "accum" and "linear" slots.
+ for v in var_list:
+ self._get_or_make_slot(
+ v,
+ constant_op.constant(self._initial_accumulator_value,
+ dtype=v.dtype, shape=v.get_shape()),
+ "accum",
+ self._name)
+ self._zeros_slot(v, "linear", self._name)
+
+ def _apply_dense(self, grad, var):
+ accum = self.get_slot(var, "accum")
+ linear = self.get_slot(var, "linear")
+ return _Update(var, grad, accum, linear,
+ self._learning_rate, self._learning_rate_power,
+ self._l1_regularization_strength,
+ self._l2_regularization_strength)
+
+ def _apply_sparse(self, grad, var):
+ accum = self.get_slot(var, "accum")
+ linear = self.get_slot(var, "linear")
+ return _SparseUpdate(var, grad, accum, linear,
+ self._learning_rate, self._learning_rate_power,
+ self._l1_regularization_strength,
+ self._l2_regularization_strength)
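
For intuition, here is a scalar NumPy paraphrase of one FTRL-Proximal step as described in the `_Update`, `_Compute` and `_Solve` docstrings above, with `lr_power = -0.5` (the Adagrad-style learning rate). It is an illustrative sketch of the math rather than the graph code; the helper name `ftrl_step` is made up for this example.

```python
import numpy as np


def ftrl_step(w, n, z, g, base_lr=3.0, l1=0.0, l2=0.0):
  """One FTRL-Proximal update of a scalar weight, with lr_power = -0.5."""
  n_new = n + g * g                                 # accum:  N = N + G^2
  sigma = (np.sqrt(n_new) - np.sqrt(n)) / base_lr   # (sqrt(N(t)) - sqrt(N(t-1))) / base_lr
  z_new = z + g - sigma * w                         # linear: Z = Z + G - sigma * W
  quadratic = np.sqrt(n_new) / base_lr + 2.0 * l2   # 1 / learning_rate + 2 * l2
  if abs(z_new) < l1:                               # _Solve: soft-threshold to zero
    w_new = 0.0
  else:
    w_new = (l1 * np.sign(z_new) - z_new) / quadratic
  return w_new, n_new, z_new
```

Iterating this three times from `w=0.0, n=0.1, z=0.0` with `g=0.1` reproduces, up to rounding, the first expected entry (-2.6026) in `testFtrlwithoutRegularization` below.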
diff --git a/tensorflow/python/training/ftrl_test.py b/tensorflow/python/training/ftrl_test.py
new file mode 100644
index 0000000000..eb581048f1
--- /dev/null
+++ b/tensorflow/python/training/ftrl_test.py
@@ -0,0 +1,234 @@
+"""Functional tests for Ftrl operations."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class FtrlOptimizerTest(tf.test.TestCase):
+
+ def testFtrlwithoutRegularization(self):
+ with self.test_session() as sess:
+ var0 = tf.Variable([0.0, 0.0])
+ var1 = tf.Variable([0.0, 0.0])
+ grads0 = tf.constant([0.1, 0.2])
+ grads1 = tf.constant([0.01, 0.02])
+ opt = tf.train.FtrlOptimizer(3.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose([0.0, 0.0], v0_val)
+ self.assertAllClose([0.0, 0.0], v1_val)
+
+ # Run 3 steps FTRL
+ for _ in range(3):
+ update.run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose(np.array([-2.60260963, -4.29698515]),
+ v0_val)
+ self.assertAllClose(np.array([-0.28432083, -0.56694895]),
+ v1_val)
+
+ def testFtrlwithoutRegularization2(self):
+ with self.test_session() as sess:
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([4.0, 3.0])
+ grads0 = tf.constant([0.1, 0.2])
+ grads1 = tf.constant([0.01, 0.02])
+
+ opt = tf.train.FtrlOptimizer(3.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose([1.0, 2.0], v0_val)
+ self.assertAllClose([4.0, 3.0], v1_val)
+
+ # Run 3 steps FTRL
+ for _ in range(3):
+ update.run()
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose(np.array([-2.55607247, -3.98729396]),
+ v0_val)
+ self.assertAllClose(np.array([-0.28232238, -0.56096673]),
+ v1_val)
+
+ def testFtrlWithL1(self):
+ with self.test_session() as sess:
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([4.0, 3.0])
+ grads0 = tf.constant([0.1, 0.2])
+ grads1 = tf.constant([0.01, 0.02])
+
+ opt = tf.train.FtrlOptimizer(3.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.001,
+ l2_regularization_strength=0.0)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose([1.0, 2.0], v0_val)
+ self.assertAllClose([4.0, 3.0], v1_val)
+
+ # Run 10 steps FTRL
+ for _ in range(10):
+ update.run()
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose(np.array([-7.66718769, -10.91273689]),
+ v0_val)
+ self.assertAllClose(np.array([-0.93460727, -1.86147261]),
+ v1_val)
+
+ def testFtrlWithL1_L2(self):
+ with self.test_session() as sess:
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([4.0, 3.0])
+ grads0 = tf.constant([0.1, 0.2])
+ grads1 = tf.constant([0.01, 0.02])
+
+ opt = tf.train.FtrlOptimizer(3.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.001,
+ l2_regularization_strength=2.0)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose([1.0, 2.0], v0_val)
+ self.assertAllClose([4.0, 3.0], v1_val)
+
+ # Run 10 steps FTRL
+ for _ in range(10):
+ update.run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ self.assertAllClose(np.array([-0.24059935, -0.46829352]),
+ v0_val)
+ self.assertAllClose(np.array([-0.02406147, -0.04830509]),
+ v1_val)
+
+ def applyOptimizer(self, opt, steps=5, is_sparse=False):
+ if is_sparse:
+ var0 = tf.Variable([[0.0], [0.0]])
+ var1 = tf.Variable([[0.0], [0.0]])
+ grads0 = tf.IndexedSlices(tf.constant([0.1], shape=[1, 1]),
+ tf.constant([0]),
+ tf.constant([2, 1]))
+ grads1 = tf.IndexedSlices(tf.constant([0.02], shape=[1, 1]),
+ tf.constant([1]),
+ tf.constant([2, 1]))
+ else:
+ var0 = tf.Variable([0.0, 0.0])
+ var1 = tf.Variable([0.0, 0.0])
+ grads0 = tf.constant([0.1, 0.2])
+ grads1 = tf.constant([0.01, 0.02])
+
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ sess = tf.get_default_session()
+ v0_val, v1_val = sess.run([var0, var1])
+ if is_sparse:
+ self.assertAllClose([[0.0], [0.0]], v0_val)
+ self.assertAllClose([[0.0], [0.0]], v1_val)
+ else:
+ self.assertAllClose([0.0, 0.0], v0_val)
+ self.assertAllClose([0.0, 0.0], v1_val)
+
+ # Run Ftrl for a few steps
+ for _ in range(steps):
+ update.run()
+
+ v0_val, v1_val = sess.run([var0, var1])
+ return v0_val, v1_val
+
+ # When variables are initialized with zero, FTRL-Proximal has two properties:
+ # 1. Without L1 & L2 but with a fixed learning rate, FTRL-Proximal is
+ # identical to GradientDescent.
+ # 2. Without L1 & L2 but with an adaptive learning rate, FTRL-Proximal is
+ # identical to Adagrad.
+ # Based on these two properties, we test whether our implementation of
+ # FTRL-Proximal performs the same updates as Adagrad or GradientDescent.
+ def testEquivAdagradwithoutRegularization(self):
+ with self.test_session():
+ val0, val1 = self.applyOptimizer(
+ tf.train.FtrlOptimizer(3.0,
+ # Adagrad learning rate
+ learning_rate_power=-0.5,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0))
+
+ with self.test_session():
+ val2, val3 = self.applyOptimizer(
+ tf.train.AdagradOptimizer(3.0, initial_accumulator_value=0.1))
+
+ self.assertAllClose(val0, val2)
+ self.assertAllClose(val1, val3)
+
+ def testEquivSparseAdagradwithoutRegularization(self):
+ with self.test_session():
+ val0, val1 = self.applyOptimizer(
+ tf.train.FtrlOptimizer(3.0,
+ # Adagrad learning rate
+ learning_rate_power=-0.5,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0),
+ is_sparse=True)
+
+ with self.test_session():
+ val2, val3 = self.applyOptimizer(
+ tf.train.AdagradOptimizer(3.0, initial_accumulator_value=0.1),
+ is_sparse=True)
+
+ self.assertAllClose(val0, val2)
+ self.assertAllClose(val1, val3)
+
+ def testEquivSparseGradientDescentwithoutRegularization(self):
+ with self.test_session():
+ val0, val1 = self.applyOptimizer(
+ tf.train.FtrlOptimizer(3.0,
+ # Fixed learning rate
+ learning_rate_power=-0.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0),
+ is_sparse=True)
+
+ with self.test_session():
+ val2, val3 = self.applyOptimizer(
+ tf.train.GradientDescentOptimizer(3.0), is_sparse=True)
+
+ self.assertAllClose(val0, val2)
+ self.assertAllClose(val1, val3)
+
+ def testEquivGradientDescentwithoutRegularization(self):
+ with self.test_session():
+ val0, val1 = self.applyOptimizer(
+ tf.train.FtrlOptimizer(3.0,
+ # Fixed learning rate
+ learning_rate_power=-0.0,
+ initial_accumulator_value=0.1,
+ l1_regularization_strength=0.0,
+ l2_regularization_strength=0.0))
+
+ with self.test_session():
+ val2, val3 = self.applyOptimizer(
+ tf.train.GradientDescentOptimizer(3.0))
+
+ self.assertAllClose(val0, val2)
+ self.assertAllClose(val1, val3)
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/gradient_descent.py b/tensorflow/python/training/gradient_descent.py
new file mode 100644
index 0000000000..21247aacf3
--- /dev/null
+++ b/tensorflow/python/training/gradient_descent.py
@@ -0,0 +1,44 @@
+"""GradientDescent for TensorFlow."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import constant_op
+# pylint: disable=unused-import
+from tensorflow.python.ops import math_ops
+# pylint: enable=unused-import
+from tensorflow.python.training import optimizer
+from tensorflow.python.training import training_ops
+
+
+class GradientDescentOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the gradient descent algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate, use_locking=False, name="GradientDescent"):
+ """Construct a new gradient descent optimizer.
+
+ Args:
+ learning_rate: A Tensor or a floating point value. The learning
+ rate to use.
+ use_locking: If True, use locks for update operations.
+ name: Optional name prefix for the operations created when applying
+ gradients. Defaults to "GradientDescent".
+ """
+ super(GradientDescentOptimizer, self).__init__(use_locking, name)
+ self._learning_rate = learning_rate
+
+ def _apply_dense(self, grad, var):
+ return training_ops.apply_gradient_descent(
+ var,
+ self._learning_rate_tensor,
+ grad,
+ use_locking=self._use_locking).op
+
+ def _apply_sparse(self, grad, var):
+ delta = ops.IndexedSlices(grad.values * self._learning_rate_tensor,
+ grad.indices, grad.dense_shape)
+ return var.scatter_sub(delta, use_locking=self._use_locking)
+
+ def _prepare(self):
+ self._learning_rate_tensor = ops.convert_to_tensor(self._learning_rate,
+ name="learning_rate")
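
As a quick illustration of `_apply_sparse` above: the `IndexedSlices` delta scales the sparse gradient values by the learning rate, and `scatter_sub` subtracts them only at the touched rows. A tiny NumPy analogy (illustrative only, not the graph code):

```python
import numpy as np

var = np.array([[1.0], [2.0]])
learning_rate = 3.0
grad_values = np.array([[0.1]])  # values of the IndexedSlices
grad_indices = np.array([0])     # rows being updated

var[grad_indices] -= learning_rate * grad_values  # scatter_sub: only row 0 changes
# var is now [[0.7], [2.0]], matching testSparseBasic below.
```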
diff --git a/tensorflow/python/training/gradient_descent_test.py b/tensorflow/python/training/gradient_descent_test.py
new file mode 100644
index 0000000000..d5b0cae401
--- /dev/null
+++ b/tensorflow/python/training/gradient_descent_test.py
@@ -0,0 +1,105 @@
+"""Functional test for GradientDescent."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class GradientDescentOptimizerTest(tf.test.TestCase):
+
+ def testBasic(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ sgd_op = tf.train.GradientDescentOptimizer(3.0).apply_gradients(
+ zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Run 1 step of sgd
+ sgd_op.run()
+ # Validate updated params
+ self.assertAllClose([1.0 - 3.0 * 0.1, 2.0 - 3.0 * 0.1], var0.eval())
+ self.assertAllClose([3.0 - 3.0 * 0.01, 4.0 - 3.0 * 0.01], var1.eval())
+
+ def testFloat64(self):
+ with self.test_session():
+ opt = tf.train.GradientDescentOptimizer(3.0)
+
+ # compute_gradients.
+ values = [1.0, 3.0]
+ good_vars = [tf.Variable([v]) for v in values]
+ bad_loss = tf.constant(2.0, tf.float64, name="bad_loss")
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_loss.*expected.*float32",
+ opt.compute_gradients, bad_loss, good_vars)
+ bad_vars = [
+ tf.Variable(np.array([v], np.float64), name="bad_var")
+ for v in values]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.compute_gradients, tf.cast(bad_vars[0] + bad_vars[1], tf.float32),
+ bad_vars)
+ opt.compute_gradients(good_vars[0] + good_vars[1], good_vars)
+
+ # apply_gradients.
+ bad_grads = [
+ tf.constant([0.1], dtype=np.float64, name="bad_grad"),
+ tf.constant([0.01])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_grad.*expected.*float32",
+ opt.apply_gradients, zip(bad_grads, good_vars))
+ good_grads = [tf.constant([0.01]), tf.constant([0.02])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.apply_gradients, zip(good_grads, bad_vars))
+ opt.apply_gradients(zip(good_grads, good_vars))
+
+ def testWithGlobalStep(self):
+ with self.test_session():
+ global_step = tf.Variable(0, trainable=False)
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ sgd_op = tf.train.GradientDescentOptimizer(3.0).apply_gradients(
+ zip([grads0, grads1], [var0, var1]), global_step=global_step)
+ tf.initialize_all_variables().run()
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Run 1 step of sgd
+ sgd_op.run()
+ # Validate updated params and global_step
+ self.assertAllClose([1.0 - 3.0 * 0.1, 2.0 - 3.0 * 0.1], var0.eval())
+ self.assertAllClose([3.0 - 3.0 * 0.01, 4.0 - 3.0 * 0.01], var1.eval())
+ self.assertAllClose(1, global_step.eval())
+
+ def testSparseBasic(self):
+ with self.test_session():
+ var0 = tf.Variable([[1.0], [2.0]])
+ var1 = tf.Variable([[3.0], [4.0]])
+ grads0 = tf.IndexedSlices(tf.constant([0.1], shape=[1, 1]),
+ tf.constant([0]),
+ tf.constant([2, 1]))
+ grads1 = tf.IndexedSlices(tf.constant([0.01], shape=[1, 1]),
+ tf.constant([1]),
+ tf.constant([2, 1]))
+ sgd_op = tf.train.GradientDescentOptimizer(3.0).apply_gradients(
+ zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+ # Fetch params to validate initial values
+ self.assertAllClose([[1.0], [2.0]], var0.eval())
+ self.assertAllClose([[3.0], [4.0]], var1.eval())
+ # Run 1 step of sgd
+ sgd_op.run()
+ # Validate updated params
+ self.assertAllClose([[1.0 - 3.0 * 0.1], [2.0]], var0.eval())
+ self.assertAllClose([[3.0], [4.0 - 3.0 * 0.01]], var1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/input.py b/tensorflow/python/training/input.py
new file mode 100644
index 0000000000..413fc044f7
--- /dev/null
+++ b/tensorflow/python/training/input.py
@@ -0,0 +1,501 @@
+"""## Input pipeline
+
+TensorFlow functions for setting up an input-prefetching pipeline.
+Please see the [reading data how-to](../../how_tos/reading_data.md)
+for context.
+
+### Beginning of an input pipeline
+
+The "producer" functions add a queue to the graph and a corresponding
+QueueRunner for running the subgraph that fills that queue.
+
+@@match_filenames_once
+@@limit_epochs
+@@range_input_producer
+@@slice_input_producer
+@@string_input_producer
+
+### Batching at the end of an input pipeline
+
+These functions add a queue to the graph to assemble a batch of
+examples, with possible shuffling. They also add a QueueRunner for
+running the subgraph that fills that queue.
+
+Use [batch](#batch) or [batch_join](#batch_join) for batching examples that have
+already been well shuffled. Use [shuffle_batch](#shuffle_batch) or
+[shuffle_batch_join](#shuffle_batch_join) for examples that
+would benefit from additional shuffling.
+
+Use [batch](#batch) or [shuffle_batch](#shuffle_batch) if you want a
+single thread producing examples to batch, or if you have a
+single subgraph producing examples but you want to run it in N threads
+(where you increase N until it can keep the queue full). Use
+[batch_join](#batch_join) or [shuffle_batch_join](#shuffle_batch_join)
+if you have N different subgraphs producing examples to batch and you
+want them run by N threads.
+
+@@batch
+@@batch_join
+@@shuffle_batch
+@@shuffle_batch_join
+
+"""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import data_flow_ops
+from tensorflow.python.ops import io_ops
+from tensorflow.python.ops import random_ops
+from tensorflow.python.ops import summary_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.training import queue_runner
+
+
+def match_filenames_once(pattern, name=None):
+ """Save the list of files matching pattern, so it is only computed once.
+
+ Args:
+ pattern: A file pattern (glob).
+ name: A name for the operations (optional).
+
+ Returns:
+ A variable that is initialized to the list of files matching pattern.
+ """
+ with ops.op_scope([pattern], name, "matching_filenames") as name:
+ return variables.Variable(io_ops.matching_files(pattern), trainable=False,
+ name=name, validate_shape=False)
+
+
+def limit_epochs(tensor, num_epochs=None, name=None):
+ """Returns tensor num_epochs times and then raises an OutOfRange error.
+
+ Args:
+ tensor: Any Tensor.
+ num_epochs: An integer (optional). If specified, limits the number
+ of steps the output tensor may be evaluated.
+ name: A name for the operations (optional).
+
+ Returns:
+ tensor or OutOfRange.
+ """
+ if num_epochs is None:
+ return tensor
+ if num_epochs <= 0:
+ raise ValueError("num_epochs must be > 0 not %d." % num_epochs)
+ with ops.op_scope([tensor], name, "limit_epochs") as name:
+ zero64 = constant_op.constant(0, dtype=types.int64)
+ epochs = variables.Variable(zero64, name="epochs")
+ counter = epochs.count_up_to(num_epochs)
+ with ops.control_dependencies([counter]):
+ return array_ops.identity(tensor, name=name)
+
+
+def _input_producer(input_tensor, dtype, num_epochs, shuffle, seed, capacity,
+ name, summary_name):
+ if shuffle:
+ input_tensor = random_ops.random_shuffle(input_tensor, seed=seed)
+ input_tensor = limit_epochs(input_tensor, num_epochs)
+
+ q = data_flow_ops.FIFOQueue(capacity=capacity, dtypes=[dtype], shapes=[[]],
+ name=name)
+ enq = q.enqueue_many([input_tensor])
+ queue_runner.add_queue_runner(queue_runner.QueueRunner(q, [enq]))
+ summary_ops.scalar_summary("queue/%s/%s" % (q.name, summary_name),
+ math_ops.cast(q.size(), types.float32) *
+ (1. / capacity))
+ return q
+
+
+def string_input_producer(string_tensor, num_epochs=None, shuffle=True,
+ seed=None, capacity=32, name=None):
+ """Output strings (e.g. filenames) to a queue for an input pipeline.
+
+ Args:
+ string_tensor: A 1-D string tensor with the strings to produce.
+ num_epochs: An integer (optional). If specified, `string_input_producer`
+ produces each string from `string_tensor` `num_epochs` times before
+ generating an OutOfRange error. If not specified, `string_input_producer`
+ can cycle through the strings in `string_tensor` an unlimited number of
+ times.
+ shuffle: Boolean. If true, the strings are randomly shuffled within each
+ epoch.
+ seed: An integer (optional). Seed used if shuffle == True.
+ capacity: An integer. Sets the queue capacity.
+ name: A name for the operations (optional).
+
+ Returns:
+ A queue with the output strings. A QueueRunner for the Queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+ """
+ with ops.op_scope([string_tensor], name, "input_producer") as name:
+ return _input_producer(
+ string_tensor, types.string, num_epochs, shuffle, seed, capacity, name,
+ "fraction_of_%d_full" % capacity)
+
+
+def range_input_producer(limit, num_epochs=None, shuffle=True, seed=None,
+ capacity=32, name=None):
+ """Produces the integers from 0 to limit-1 in a queue.
+
+ Args:
+ limit: An int32 scalar tensor.
+ num_epochs: An integer (optional). If specified, `range_input_producer`
+ produces each integer `num_epochs` times before generating an
+ OutOfRange error. If not specified, `range_input_producer` can cycle
+ through the integers an unlimited number of times.
+ shuffle: Boolean. If true, the integers are randomly shuffled within each
+ epoch.
+ seed: An integer (optional). Seed used if shuffle == True.
+ capacity: An integer. Sets the queue capacity.
+ name: A name for the operations (optional).
+
+ Returns:
+ A Queue with the output integers. A QueueRunner for the Queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+ """
+ with ops.op_scope([limit], name, "input_producer") as name:
+ range_tensor = math_ops.range(0, limit)
+ return _input_producer(
+ range_tensor, types.int32, num_epochs, shuffle, seed, capacity, name,
+ "fraction_of_%d_full" % capacity)
+
+
+def slice_input_producer(tensor_list, num_epochs=None, shuffle=True, seed=None,
+ capacity=32, name=None):
+ """Produces a slice of each Tensor in tensor_list.
+
+ Implemented using a Queue -- a QueueRunner for the Queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+
+ Args:
+ tensor_list: A list of Tensors. Every Tensor in tensor_list must
+ have the same size in the first dimension.
+ num_epochs: An integer (optional). If specified, `slice_input_producer`
+ produces each slice `num_epochs` times before generating
+ an OutOfRange error. If not specified, `slice_input_producer` can cycle
+ through the slices an unlimited number of times.
+ shuffle: Boolean. If true, the slices are randomly shuffled within each
+ epoch.
+ seed: An integer (optional). Seed used if shuffle == True.
+ capacity: An integer. Sets the queue capacity.
+ name: A name for the operations (optional).
+
+ Returns:
+ A list of tensors, one for each element of tensor_list. If the tensor
+ in tensor_list has shape [N, a, b, .., z], then the corresponding output
+ tensor will have shape [a, b, ..., z].
+ """
+ with ops.op_scope(tensor_list, name, "input_producer"):
+ tensor_list = ops.convert_n_to_tensor_or_indexed_slices(tensor_list)
+ if not tensor_list:
+ raise ValueError(
+ "Expected at least one tensor in slice_input_producer().")
+ range_size = array_ops.shape(tensor_list[0])[0]
+ # TODO(josh11b): Add an assertion that the first dimension of
+ # everything in TensorList matches. Maybe just check the inferred shapes?
+ queue = range_input_producer(range_size, num_epochs=num_epochs,
+ shuffle=shuffle, seed=seed, capacity=capacity)
+ index = queue.dequeue()
+ output = [array_ops.gather(t, index) for t in tensor_list]
+ return output
+
+
+# Helpers for the batching functions ------------------------------------------
+
+def _flatten(tensor_list_list):
+ return [tensor for tensor_list in tensor_list_list for tensor in tensor_list]
+
+
+def _validate(tensor_list):
+ tensor_list = ops.convert_n_to_tensor_or_indexed_slices(tensor_list)
+ if not tensor_list:
+ raise ValueError("Expected at least one tensor in batch().")
+ return tensor_list
+
+
+def _validate_join(tensor_list_list):
+ tensor_list_list = [ops.convert_n_to_tensor_or_indexed_slices(tl)
+ for tl in tensor_list_list]
+ if not tensor_list_list:
+ raise ValueError("Expected at least one input in batch_join().")
+ return tensor_list_list
+
+
+def _dtypes(tensor_list_list):
+ all_dtypes = [[t.dtype for t in tl] for tl in tensor_list_list]
+ dtypes = all_dtypes[0]
+ for other_dtypes in all_dtypes[1:]:
+ if other_dtypes != dtypes:
+ raise TypeError("Expected types to be consistent: %s vs. %s." %
+ (", ".join(x.name for x in dtypes),
+ ", ".join(x.name for x in other_dtypes)))
+ return dtypes
+
+
+def _merge_shapes(shape_list, enqueue_many):
+ shape_list = [tensor_shape.as_shape(s) for s in shape_list]
+ if enqueue_many:
+ # We want the shapes without the leading batch dimension.
+ shape_list = [s.WithRankAtLeast(1)[1:] for s in shape_list]
+ merged_shape = shape_list[0]
+ for s in shape_list[1:]:
+ merged_shape.merge_with(s)
+ return merged_shape.as_list()
+
+
+def _shapes(tensor_list_list, shapes, enqueue_many):
+ if shapes is None:
+ l = len(tensor_list_list[0])
+ shapes = [_merge_shapes([tl[i].get_shape().as_list()
+ for tl in tensor_list_list],
+ enqueue_many) for i in range(l)]
+ return shapes
+
+
+def _enqueue_join(queue, tensor_list_list, enqueue_many):
+ if enqueue_many:
+ enqueue_ops = [queue.enqueue_many(tl) for tl in tensor_list_list]
+ else:
+ enqueue_ops = [queue.enqueue(tl) for tl in tensor_list_list]
+ queue_runner.add_queue_runner(queue_runner.QueueRunner(queue, enqueue_ops))
+
+
+def _enqueue(queue, tensor_list, threads, enqueue_many):
+ if enqueue_many:
+ enqueue_ops = [queue.enqueue_many(tensor_list)] * threads
+ else:
+ enqueue_ops = [queue.enqueue(tensor_list)] * threads
+ queue_runner.add_queue_runner(queue_runner.QueueRunner(queue, enqueue_ops))
+
+
+# Batching functions ----------------------------------------------------------
+
+def batch(tensor_list, batch_size, num_threads=1, capacity=32,
+ enqueue_many=False, shapes=None, name=None):
+ """Run tensor_list to fill a queue to create batches.
+
+ Implemented using a queue -- a QueueRunner for the queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+
+ Args:
+ tensor_list: The list of tensors to enqueue.
+ batch_size: The new batch size pulled from the queue.
+ num_threads: The number of threads enqueuing tensor_list.
+ capacity: Maximum number of elements in the queue; controls how far
+ ahead prefetching is allowed to get, and memory usage.
+ enqueue_many: If False, tensor_list is assumed to represent a
+ single example. If True, tensor_list is assumed to represent
+ a batch of examples, where the first dimension is indexed by
+ example, and all members of tensor_list should have the same
+ size in the first dimension.
+ shapes: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list (leaving off the first dimension
+ if enqueue_many is True).
+ name: A name for the operations (optional).
+
+ Returns:
+ A list of tensors with the same number and types as tensor_list.
+ If enqueue_many is false, then an input tensor with shape
+ `[x, y, z]` will be output as a tensor with shape
+ `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+ """
+ with ops.op_scope(tensor_list, name, "batch") as name:
+ tensor_list = _validate(tensor_list)
+ dtypes = _dtypes([tensor_list])
+ shapes = _shapes([tensor_list], shapes, enqueue_many)
+ # TODO(josh11b,mrry): Switch to BatchQueue once it is written.
+ queue = data_flow_ops.FIFOQueue(
+ capacity=capacity, dtypes=dtypes, shapes=shapes)
+ _enqueue(queue, tensor_list, num_threads, enqueue_many)
+ summary_ops.scalar_summary(
+ "queue/%s/fraction_of_%d_full" % (queue.name, capacity),
+ math_ops.cast(queue.size(), types.float32) * (1. / capacity))
+ return queue.dequeue_many(batch_size, name=name)
+
+
+# TODO(josh11b): Add a thread_multiplier or num_threads (that has to be
+# a multiple of len(tensor_list_list)?) parameter, to address the use
+# case where you want more parallelism than you can support different
+# readers (either because you don't have that many files or can't
+# read that many files in parallel due to the number of seeks required).
+# Once this is done, batch() can be written as a call to batch_join().
+def batch_join(tensor_list_list, batch_size, capacity=32, enqueue_many=False,
+ shapes=None, name=None):
+ """Run a list of tensors to fill a queue to create batches of examples.
+
+ This version enqueues a different list of tensors in different threads.
+ Implemented using a queue -- a QueueRunner for the queue
+ is added to the current Graph's QUEUE_RUNNER collection.
+
+ Args:
+ tensor_list_list: A list of tuples of tensors to enqueue.
+ len(tensor_list_list) threads will be started, with the i-th
+ thread enqueuing the tensors from tensor_list[i].
+ tensor_list[i1][j] must match tensor_list[i2][j] in type and
+ shape (except in the first dimension if enqueue_many is true).
+ batch_size: The new batch size pulled from the queue.
+ capacity: Maximum number of elements in the queue; controls how far
+ ahead prefetching is allowed to get, and memory usage.
+ enqueue_many: If False, each tensor_list_list[i] is assumed to
+ represent a single example. If True, tensor_list_list[i] is
+ assumed to represent a batch of examples, where the first
+ dimension is indexed by example, and all members of
+ tensor_list_list[i] should have the same size in the first
+ dimension.
+ shapes: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list_list[i] (which must match, after
+ leaving off the first dimension if enqueue_many is True).
+ name: A name for the operations (optional).
+
+ Returns:
+ A list of tensors with the same number and types as
+ tensor_list_list[i]. If enqueue_many is false, then an input
+ tensor with shape `[x, y, z]` will be output as a tensor with
+ shape `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+ """
+ with ops.op_scope(_flatten(tensor_list_list), name, "batch_join") as name:
+ tensor_list_list = _validate_join(tensor_list_list)
+ dtypes = _dtypes(tensor_list_list)
+ shapes = _shapes(tensor_list_list, shapes, enqueue_many)
+ # TODO(josh11b,mrry): Switch to BatchQueue once it is written.
+ queue = data_flow_ops.FIFOQueue(
+ capacity=capacity, dtypes=dtypes, shapes=shapes)
+ _enqueue_join(queue, tensor_list_list, enqueue_many)
+ summary_ops.scalar_summary(
+ "queue/%s/fraction_of_%d_full" % (queue.name, capacity),
+ math_ops.cast(queue.size(), types.float32) * (1. / capacity))
+ return queue.dequeue_many(batch_size, name=name)
+
+
+def shuffle_batch(tensor_list, batch_size, capacity, min_after_dequeue,
+ num_threads=1, seed=None, enqueue_many=False, shapes=None,
+ name=None):
+ """Create batches by randomly shuffling tensors.
+
+ This adds:
+
+ * a shuffling queue into which tensors from tensor_list are enqueued.
+ * a dequeue many operation to create batches from the queue,
+ * and a QueueRunner is added to the current Graph's QUEUE_RUNNER collection,
+ to enqueue the tensors from tensor_list.
+
+ Args:
+ tensor_list: The list of tensors to enqueue.
+ batch_size: The new batch size pulled from the queue.
+ capacity: Maximum number of elements in the queue; controls how far
+ ahead prefetching is allowed to get, and memory usage.
+ min_after_dequeue: Minimum number of elements in the queue after a
+ dequeue, used to ensure a level of mixing of elements.
+ num_threads: The number of threads enqueuing tensor_list.
+ seed: Seed for the random shuffling within the queue.
+ enqueue_many: If False, tensor_list is assumed to represent a
+ single example. If True, tensor_list is assumed to represent
+ a batch of examples, where the first dimension is indexed by
+ example, and all members of tensor_list should have the same
+ size in the first dimension.
+ shapes: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list (leaving off the first dimension
+ if enqueue_many is True).
+ name: A name for the operations (optional).
+
+ Returns:
+ A list of tensors with the same number and types as tensor_list.
+ If enqueue_many is false, then an input tensor with shape
+ `[x, y, z]` will be output as a tensor with shape
+ `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+ """
+ with ops.op_scope(tensor_list, name, "shuffle_batch") as name:
+ tensor_list = _validate(tensor_list)
+ dtypes = _dtypes([tensor_list])
+ shapes = _shapes([tensor_list], shapes, enqueue_many)
+ queue = data_flow_ops.RandomShuffleQueue(
+ capacity=capacity, min_after_dequeue=min_after_dequeue, seed=seed,
+ dtypes=dtypes, shapes=shapes)
+ _enqueue(queue, tensor_list, num_threads, enqueue_many)
+ full = (math_ops.cast(queue.size() - min_after_dequeue, types.float32) *
+ (1. / (capacity - min_after_dequeue)))
+ # Note that name contains a '/' at the end so we intentionally do not place
+ # a '/' after %s below.
+ summary_name = (
+ "queue/%sfraction_over_%d_of_%d_full" %
+ (name, min_after_dequeue, capacity - min_after_dequeue))
+ summary_ops.scalar_summary(summary_name, full)
+
+ return queue.dequeue_many(batch_size, name=name)
+
+
+def shuffle_batch_join(tensor_list_list, batch_size, capacity,
+ min_after_dequeue, seed=None, enqueue_many=False,
+ shapes=None, name=None):
+ """Create batches by randomly shuffling tensors.
+
+ This version enqueues a different list of tensors in different threads.
+ It adds:
+
+ * a shuffling queue into which tensors from tensor_list_list are enqueued.
+ * a dequeue many operation to create batches from the queue,
+ * and a QueueRunner is added to the current Graph's QUEUE_RUNNER collection,
+ to enqueue the tensors from tensor_list_list.
+
+ Args:
+ tensor_list_list: A list of tuples of tensors to enqueue.
+ len(tensor_list_list) threads will be started, with the i-th
+ thread enqueuing the tensors from tensor_list[i].
+ tensor_list[i1][j] must match tensor_list[i2][j] in type and
+ shape (except in the first dimension if enqueue_many is true).
+ batch_size: The new batch size pulled from the queue.
+ capacity: Maximum number of elements in the queue; controls how far
+ ahead prefetching is allowed to get, and memory usage.
+ min_after_dequeue: Minimum number of elements in the queue after a
+ dequeue, used to ensure a level of mixing of elements.
+ seed: Seed for the random shuffling within the queue.
+ enqueue_many: If False, each tensor_list_list[i] is assumed to
+ represent a single example. If True, tensor_list_list[i] is
+ assumed to represent a batch of examples, where the first
+ dimension is indexed by example, and all members of
+ tensor_list_list[i] should have the same size in the first
+ dimension.
+ shapes: Optional. The shapes for each example. Defaults to the
+ inferred shapes for tensor_list_list[i] (which must match, after
+ leaving off the first dimension if enqueue_many is True).
+ name: A name for the operations (optional).
+
+ Returns:
+ A list of tensors with the same number and types as
+ tensor_list_list[i]. If enqueue_many is false, then an input
+ tensor with shape `[x, y, z]` will be output as a tensor with
+ shape `[batch_size, x, y, z]`. If enqueue_many is True, and an
+ input tensor has shape `[*, x, y, z]`, the output will have
+ shape `[batch_size, x, y, z]`.
+ """
+ with ops.op_scope(
+ _flatten(tensor_list_list), name, "shuffle_batch_join") as name:
+ tensor_list_list = _validate_join(tensor_list_list)
+ dtypes = _dtypes(tensor_list_list)
+ shapes = _shapes(tensor_list_list, shapes, enqueue_many)
+ queue = data_flow_ops.RandomShuffleQueue(
+ capacity=capacity, min_after_dequeue=min_after_dequeue, seed=seed,
+ dtypes=dtypes, shapes=shapes)
+ _enqueue_join(queue, tensor_list_list, enqueue_many)
+ full = (math_ops.cast(queue.size() - min_after_dequeue, types.float32) *
+ (1. / (capacity - min_after_dequeue)))
+ # Note that name contains a '/' at the end so we intentionally do not place
+ # a '/' after %s below.
+ summary_name = (
+ "queue/%sfraction_over_%d_of_%d_full" %
+ (name, min_after_dequeue, capacity - min_after_dequeue))
+ summary_ops.scalar_summary(summary_name, full)
+ return queue.dequeue_many(batch_size, name=name)
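
Putting the pieces above together, a minimal pipeline sketch: slice two in-memory tensors with `slice_input_producer`, assemble shuffled batches with `shuffle_batch`, and drive the queue runners these functions register. `tf.Session`, `tf.initialize_all_variables` and `tf.train.start_queue_runners` are assumed to be available exactly as they are used in the tests below; the constants are made up for the example.

```python
import tensorflow as tf

images = tf.constant([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
labels = tf.constant([0, 1, 0, 1])

# One (image, label) slice is produced per dequeue; two epochs in total.
image, label = tf.train.slice_input_producer([images, labels], num_epochs=2)

# Group the slices into shuffled batches of 2; a QueueRunner is registered.
image_batch, label_batch = tf.train.shuffle_batch(
    [image, label], batch_size=2, capacity=32, min_after_dequeue=2)

with tf.Session() as sess:
  tf.initialize_all_variables().run()
  threads = tf.train.start_queue_runners()
  try:
    while True:
      print(sess.run([image_batch, label_batch]))
  except tf.errors.OutOfRangeError:
    pass  # the two epochs are exhausted
  for t in threads:
    t.join()
```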
diff --git a/tensorflow/python/training/input_test.py b/tensorflow/python/training/input_test.py
new file mode 100644
index 0000000000..fe8c195e77
--- /dev/null
+++ b/tensorflow/python/training/input_test.py
@@ -0,0 +1,477 @@
+"""Tests for training.input."""
+
+import os
+import itertools
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class MatchFilenamesOnceTest(tf.test.TestCase):
+
+ def test(self):
+ temp_dir = self.get_temp_dir()
+ filenames = [os.path.join(temp_dir, n) for n in os.listdir(temp_dir)]
+ additional = [os.path.join(self.get_temp_dir(), "match_filenames.%d" % i)
+ for i in range(3)]
+ for name in additional:
+ open(name, "w").write("Some contents")
+ filenames += additional
+ with self.test_session():
+ star = tf.train.match_filenames_once(
+ os.path.join(self.get_temp_dir(), "*"))
+ question = tf.train.match_filenames_once(
+ os.path.join(self.get_temp_dir(), "match_filenames.?"))
+ one = tf.train.match_filenames_once(additional[1])
+ tf.initialize_all_variables().run()
+ self.assertItemsEqual(filenames, star.eval())
+ self.assertItemsEqual(additional, question.eval())
+ self.assertItemsEqual([additional[1]], one.eval())
+
+
+class LimitEpochsTest(tf.test.TestCase):
+
+ def testNoLimit(self):
+ with self.test_session():
+ seven = tf.constant(7)
+ seven_forever = tf.train.limit_epochs(seven)
+ tf.initialize_all_variables().run()
+ for i in range(100):
+ self.assertEqual(7, seven_forever.eval())
+
+ def testLimit(self):
+ with self.test_session():
+ love_me = tf.constant("Love Me")
+ love_me_two_times = tf.train.limit_epochs(love_me, num_epochs=2)
+ tf.initialize_all_variables().run()
+ self.assertEqual("Love Me", love_me_two_times.eval())
+ self.assertEqual("Love Me", love_me_two_times.eval())
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ love_me_two_times.eval()
+
+
+class StringInputProducerTest(tf.test.TestCase):
+
+ def testNoShuffle(self):
+ with self.test_session():
+ strings = ["to", "be", "or", "not", "to", "be"]
+ num_epochs = 3
+ queue = tf.train.string_input_producer(
+ strings, num_epochs=num_epochs, shuffle=False)
+ dequeue_many = queue.dequeue_many(len(strings) * num_epochs)
+ dequeue = queue.dequeue()
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # No randomness, so just see repeated copies of the input.
+ output = dequeue_many.eval()
+ self.assertAllEqual(strings * num_epochs, output)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ dequeue.eval()
+ for thread in threads:
+ thread.join()
+
+ def testShuffle(self):
+ with self.test_session():
+ strings = ["a", "b", "c"]
+ num_epochs = 600
+ queue = tf.train.string_input_producer(
+ strings, num_epochs=num_epochs, shuffle=True, seed=271828)
+ dequeue_many = queue.dequeue_many(len(strings))
+ dequeue = queue.dequeue()
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # Validate that we only shuffle the strings within an epoch and
+ # count how often each possible order appears.
+ expected = ["abc", "acb", "bac", "bca", "cab", "cba"]
+ frequency = {}
+ for e in expected:
+ frequency[e] = 0
+ for _ in range(num_epochs):
+ output = dequeue_many.eval()
+ key = "".join(output)
+ self.assertIn(key, expected)
+ frequency[key] += 1
+
+ # Expect an approximately even distribution over all possible orders.
+ expected_frequency = num_epochs / len(expected)
+ margin = expected_frequency * 0.4
+ tf.logging.info("Observed counts: %s", frequency)
+ for key in expected:
+ value = frequency[key]
+ self.assertGreater(value, expected_frequency - margin)
+ self.assertLess(value, expected_frequency + margin)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ dequeue.eval()
+ for thread in threads:
+ thread.join()
+
+
+class RangeInputProducerTest(tf.test.TestCase):
+
+ def testNoShuffle(self):
+ with self.test_session():
+ num_epochs = 3
+ range_size = 5
+ queue = tf.train.range_input_producer(
+ range_size, num_epochs=num_epochs, shuffle=False)
+ dequeue_many = queue.dequeue_many(range_size * num_epochs)
+ dequeue = queue.dequeue()
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # No randomness, so just see repeated copies of the input.
+ output = dequeue_many.eval()
+ self.assertAllEqual(range(range_size) * num_epochs, output)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ dequeue.eval()
+ for thread in threads:
+ thread.join()
+
+ def testShuffle(self):
+ with self.test_session():
+ num_epochs = 200
+ range_size = 2
+ queue = tf.train.range_input_producer(
+ range_size, num_epochs=num_epochs, shuffle=True, seed=314159)
+ dequeue_many = queue.dequeue_many(range_size)
+ dequeue = queue.dequeue()
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # Validate that we only shuffle the integers within an epoch and
+ # count how often each possible order appears.
+ expected = [12, 21]
+ frequency = {}
+ for e in expected:
+ frequency[e] = 0
+ for _ in range(num_epochs):
+ output = dequeue_many.eval()
+ key = 10 * (output[0] + 1) + (output[1] + 1)
+ self.assertIn(key, expected)
+ frequency[key] += 1
+
+ # Expect an approximately even distribution over all possible orders.
+ expected_frequency = num_epochs / len(expected)
+ margin = expected_frequency * 0.4
+ tf.logging.info("Observed counts: %s", frequency)
+ for key in expected:
+ value = frequency[key]
+ self.assertGreater(value, expected_frequency - margin)
+ self.assertLess(value, expected_frequency + margin)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ dequeue.eval()
+ for thread in threads:
+ thread.join()
+
+
+class SliceInputProducerTest(tf.test.TestCase):
+
+ def testNoShuffle(self):
+ with self.test_session() as sess:
+ num_epochs = 3
+ source_strings = ["Alpha", "Beta", "Delta", "Gamma"]
+ source_ints = [2, 3, 5, 7]
+ slices = tf.train.slice_input_producer(
+ [source_strings, source_ints], num_epochs=num_epochs, shuffle=False)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # No randomness, so just see repeated copies of the input.
+ num_items = len(source_strings) * num_epochs
+ output = [sess.run(slices) for _ in range(num_items)]
+ out_strings, out_ints = zip(*output)
+ self.assertAllEqual(source_strings * num_epochs, out_strings)
+ self.assertAllEqual(source_ints * num_epochs, out_ints)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(slices)
+ for thread in threads:
+ thread.join()
+
+ def testShuffle(self):
+ with self.test_session() as sess:
+ num_epochs = 1200
+ source_strings = ["A", "B", "D", "G"]
+ source_ints = [7, 3, 5, 2]
+ slices = tf.train.slice_input_producer(
+ [source_strings, source_ints], num_epochs=num_epochs, shuffle=True,
+ seed=161803)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # Validate that we only shuffle the integers within an epoch and
+ # count how often each possible order appears.
+ expected = [",".join(x) for x in
+ itertools.permutations(["A7", "B3", "D5", "G2"])]
+ frequency = {}
+ for e in expected:
+ frequency[e] = 0
+ for _ in range(num_epochs):
+ output = [sess.run(slices) for _ in range(len(source_strings))]
+ key = ",".join([s + str(i) for s, i in output])
+ self.assertIn(key, expected)
+ frequency[key] += 1
+
+ # Expect an approximately even distribution over all possible orders.
+ expected_frequency = num_epochs / len(expected)
+ margin = expected_frequency * 0.4
+ tf.logging.info("Observed counts: %s", frequency)
+ for key in expected:
+ value = frequency[key]
+ self.assertGreater(value, expected_frequency - margin)
+ self.assertLess(value, expected_frequency + margin)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(slices)
+ for thread in threads:
+ thread.join()
+
+
+class BatchTest(tf.test.TestCase):
+
+ def testOneThread(self):
+ with self.test_session() as sess:
+ batch_size = 10
+ num_batches = 3
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_batches * batch_size)
+ batched = tf.train.batch([counter, "string"], batch_size=batch_size)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ for i in range(num_batches):
+ results = sess.run(batched)
+ self.assertAllEqual(results[0],
+ range(i * batch_size, (i + 1) * batch_size))
+ self.assertAllEqual(results[1], ["string"] * batch_size)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+ def testManyThreads(self):
+ with self.test_session() as sess:
+ batch_size = 10
+ num_batches = 3
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_batches * batch_size)
+ batched = tf.train.batch([counter, "string"], batch_size=batch_size,
+ num_threads=4)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ all_counts = []
+ for i in range(num_batches):
+ results = sess.run(batched)
+ tf.logging.info("Batch %d: %s", i, results[0])
+ self.assertEqual(len(results[0]), batch_size)
+ all_counts.extend(results[0])
+ self.assertAllEqual(results[1], ["string"] * batch_size)
+ self.assertItemsEqual(all_counts, range(num_batches * batch_size))
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+
+class BatchJoinTest(tf.test.TestCase):
+
+ def testTwoThreads(self):
+ with self.test_session() as sess:
+ # Two threads, the first generates (0..34, "a").
+ num_a = 35
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_a)
+
+ # The second generates (99, "b") 45 times and then stops.
+ num_b = 45
+ ninety_nine = tf.train.limit_epochs(
+ tf.constant(99, dtype=tf.int64), num_b)
+
+ # These get joined together and grouped into batches of 5.
+ batch_size = 5
+ batched = tf.train.batch_join([[counter, "a"], [ninety_nine, "b"]],
+ batch_size=batch_size)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # Should see the "a" and "b" threads mixed together.
+ all_a = []
+ seen_b = 0
+ saw_both = 0
+ num_batches = (num_a + num_b) / batch_size
+ for i in range(num_batches):
+ results = sess.run(batched)
+ tf.logging.info("Batch %d: %s", i, results[0])
+ self.assertEqual(len(results[0]), batch_size)
+ self.assertEqual(len(results[1]), batch_size)
+ which_a = [i for i, s in enumerate(results[1]) if s == "a"]
+ which_b = [i for i, s in enumerate(results[1]) if s == "b"]
+ self.assertEqual(len(which_a) + len(which_b), batch_size)
+ if len(which_a) > 0 and len(which_b) > 0: saw_both += 1
+ all_a.extend([results[0][i] for i in which_a])
+ seen_b += len(which_b)
+ self.assertAllEqual([99] * len(which_b),
+ [results[0][i] for i in which_b])
+
+ # Some minimum level of mixing of the results of both threads.
+ self.assertGreater(saw_both, 1)
+
+ # Verify the order of results from "a" were preserved.
+ self.assertAllEqual(all_a, range(num_a))
+ self.assertEqual(seen_b, num_b)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+
+class ShuffleBatchTest(tf.test.TestCase):
+
+ def testOneThread(self):
+ with self.test_session() as sess:
+ batch_size = 10
+ num_batches = 3
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_batches * batch_size)
+ batched = tf.train.shuffle_batch(
+ [counter, "string"], batch_size=batch_size, capacity=32,
+ min_after_dequeue=16, seed=141421)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ all_counts = []
+ for i in range(num_batches):
+ results = sess.run(batched)
+ self.assertEqual(len(results[0]), batch_size)
+ all_counts.extend(results[0])
+ self.assertAllEqual(results[1], ["string"] * batch_size)
+ # Results scrambled, but include all the expected numbers.
+ deltas = [all_counts[i + 1] - all_counts[i]
+ for i in range(len(all_counts) - 1)]
+ self.assertFalse(all(d == deltas[0] for d in deltas))
+ self.assertItemsEqual(all_counts, range(num_batches * batch_size))
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+ def testManyThreads(self):
+ with self.test_session() as sess:
+ batch_size = 10
+ num_batches = 3
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_batches * batch_size)
+ batched = tf.train.shuffle_batch(
+ [counter, "string"], batch_size=batch_size, capacity=32,
+ min_after_dequeue=16, seed=173205, num_threads=4)
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ all_counts = []
+ for i in range(num_batches):
+ results = sess.run(batched)
+ tf.logging.info("Batch %d: %s", i, results[0])
+ self.assertEqual(len(results[0]), batch_size)
+ all_counts.extend(results[0])
+ self.assertAllEqual(results[1], ["string"] * batch_size)
+ # Results scrambled, but include all the expected numbers.
+ deltas = [all_counts[i + 1] - all_counts[i]
+ for i in range(len(all_counts) - 1)]
+ self.assertFalse(all(d == deltas[0] for d in deltas))
+ self.assertItemsEqual(all_counts, range(num_batches * batch_size))
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+
+class ShuffleBatchJoinTest(tf.test.TestCase):
+
+ def testTwoThreads(self):
+ with self.test_session() as sess:
+ # Two threads, the first generates (0..24, "a").
+ num_a = 25
+ zero64 = tf.constant(0, dtype=tf.int64)
+ examples = tf.Variable(zero64)
+ counter = examples.count_up_to(num_a)
+
+ # The second generates (99, "b") 35 times and then stops.
+ num_b = 35
+ ninety_nine = tf.train.limit_epochs(
+ tf.constant(99, dtype=tf.int64), num_b)
+
+ # These get joined together and grouped into batches of 5.
+ batch_size = 5
+ batched = tf.train.shuffle_batch_join(
+ [[counter, "a"], [ninety_nine, "b"]], batch_size=batch_size,
+ capacity=32, min_after_dequeue=16, seed=223607)
+
+ tf.initialize_all_variables().run()
+ threads = tf.train.start_queue_runners()
+
+ # Should see the "a" and "b" threads mixed together.
+ all_a = []
+ seen_b = 0
+ saw_both = 0
+ num_batches = (num_a + num_b) / batch_size
+ for i in range(num_batches):
+ results = sess.run(batched)
+ tf.logging.info("Batch %d: %s", i, results[0])
+ self.assertEqual(len(results[0]), batch_size)
+ self.assertEqual(len(results[1]), batch_size)
+ which_a = [i for i, s in enumerate(results[1]) if s == "a"]
+ which_b = [i for i, s in enumerate(results[1]) if s == "b"]
+ self.assertEqual(len(which_a) + len(which_b), batch_size)
+ if len(which_a) > 0 and len(which_b) > 0: saw_both += 1
+ all_a.extend([results[0][i] for i in which_a])
+ seen_b += len(which_b)
+ self.assertAllEqual([99] * len(which_b),
+ [results[0][i] for i in which_b])
+
+ # Some minimum level of mixing of the results of both threads.
+ self.assertGreater(saw_both, 1)
+
+ # Saw all the items from "a", but scrambled.
+ self.assertItemsEqual(all_a, range(num_a))
+ deltas = [all_a[i + 1] - all_a[i]
+ for i in range(len(all_a) - 1)]
+ self.assertFalse(all(d == deltas[0] for d in deltas))
+ self.assertEqual(seen_b, num_b)
+
+ # Reached the limit.
+ with self.assertRaises(tf.errors.OutOfRangeError):
+ sess.run(batched)
+ for thread in threads:
+ thread.join()
+
+
+if __name__ == "__main__":
+ tf.test.main()
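The shuffle tests above validate randomness statistically: they count how often each possible within-epoch order appears and assert that every count stays within 40% of the uniform expectation. A standalone sketch of that check using only the Python standard library (the names and the use of random.shuffle here are illustrative, not part of the test file):

```python
# Standalone sketch of the even-distribution check used by the shuffle tests
# above; pure standard library, no TensorFlow.
import itertools
import random

num_epochs = 1200
items = ["A7", "B3", "D5", "G2"]
expected = [",".join(p) for p in itertools.permutations(items)]
frequency = dict((key, 0) for key in expected)

random.seed(161803)
for _ in range(num_epochs):
    order = list(items)
    random.shuffle(order)  # stands in for one epoch of shuffled queue output
    frequency[",".join(order)] += 1

# Mirror the tests' tolerance: each of the 24 orders should appear roughly
# num_epochs / 24 times, within a generous 40% margin.
expected_frequency = num_epochs / len(expected)
margin = expected_frequency * 0.4
outliers = [key for key in expected
            if abs(frequency[key] - expected_frequency) >= margin]
print("orders outside the margin: %d of %d" % (len(outliers), len(expected)))
```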
diff --git a/tensorflow/python/training/learning_rate_decay.py b/tensorflow/python/training/learning_rate_decay.py
new file mode 100644
index 0000000000..cafcb26d01
--- /dev/null
+++ b/tensorflow/python/training/learning_rate_decay.py
@@ -0,0 +1,65 @@
+"""Various learning rate decay functions."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import math_ops
+
+
+def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
+ staircase=False, name=None):
+ """Applies exponential decay to the learning rate.
+
+ When training a model, it is often recommended to lower the learning rate as
+ the training progresses. This function applies an exponential decay function
+ to a provided initial learning rate. It requires a `global_step` value to
+ compute the decayed learning rate. You can just pass a TensorFlow variable
+ that you increment at each training step.
+
+ The function returns the decayed learning rate. It is computed as:
+
+ ```python
+ decayed_learning_rate = learning_rate *
+ decay_rate ^ (global_step / decay_steps)
+ ```
+
+ If the argument `staircase` is `True`, then `global_step / decay_steps` is an
+ integer division and the decayed learning rate follows a staircase function.
+
+ Example: decay every 100000 steps with a base of 0.96:
+
+ ```python
+ ...
+ global_step = tf.Variable(0, trainable=False)
+ starter_learning_rate = 0.1
+ learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
+ 100000, 0.96, staircase=True)
+ optimizer = tf.train.GradientDescentOptimizer(learning_rate)
+ # Passing global_step to minimize() will increment it at each step.
+ optimizer.minimize(...my loss..., global_step=global_step)
+ ```
+
+ Args:
+ learning_rate: A scalar `float32` or `float64` `Tensor` or a
+ Python number. The initial learning rate.
+ global_step: A scalar `int32` or `int64` `Tensor` or a Python number.
+ Global step to use for the decay computation. Must not be negative.
+ decay_steps: A scalar `int32` or `int64` `Tensor` or a Python number.
+ Must be positive. See the decay computation above.
+ decay_rate: A scalar `float32` or `float64` `Tensor` or a
+ Python number. The decay rate.
+ staircase: Boolean. If `True`, decay the learning rate at discrete intervals.
+ name: String. Optional name of the operation. Defaults to 'ExponentialDecay'.
+
+ Returns:
+ A scalar `Tensor` of the same type as `learning_rate`. The decayed
+ learning rate.
+ """
+ with ops.op_scope([learning_rate, global_step, decay_steps, decay_rate],
+ name, "ExponentialDecay") as name:
+ learning_rate = ops.convert_to_tensor(learning_rate, name="learning_rate")
+ dtype = learning_rate.dtype
+ global_step = math_ops.cast(global_step, dtype)
+ decay_steps = math_ops.cast(decay_steps, dtype)
+ decay_rate = math_ops.cast(decay_rate, dtype)
+ p = global_step / decay_steps
+ if staircase:
+ p = math_ops.floor(p)
+ return math_ops.mul(learning_rate, math_ops.pow(decay_rate, p), name=name)
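As a quick sanity check of the formula above, here is a pure-Python sketch of the same schedule (no TensorFlow involved); the numbers line up with the expectations in the test file that follows.

```python
# Pure-Python sketch of the decay schedule implemented above. The staircase
# variant floors global_step / decay_steps, so the rate only drops once every
# decay_steps steps.
def decayed(learning_rate, global_step, decay_steps, decay_rate,
            staircase=False):
    p = float(global_step) / decay_steps
    if staircase:
        p = int(p)  # plays the role of math_ops.floor in the graph version
    return learning_rate * decay_rate ** p

# Continuous decay after 5 steps, matching LRDecayTest.testContinuous:
print(decayed(0.05, 5, 10, 0.96))  # 0.05 * 0.96 ** 0.5

# Staircase decay with decay_steps=3: flat for steps 1 and 2, drops at step 3.
print([decayed(0.1, step, 3, 0.96, staircase=True) for step in (1, 2, 3, 4)])
```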
diff --git a/tensorflow/python/training/learning_rate_decay_test.py b/tensorflow/python/training/learning_rate_decay_test.py
new file mode 100644
index 0000000000..b85d58cae7
--- /dev/null
+++ b/tensorflow/python/training/learning_rate_decay_test.py
@@ -0,0 +1,60 @@
+"""Functional test for learning rate decay."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import googletest
+from tensorflow.python.training import learning_rate_decay
+
+
+class LRDecayTest(test_util.TensorFlowTestCase):
+
+ def testContinuous(self):
+ with self.test_session():
+ step = 5
+ decayed_lr = learning_rate_decay.exponential_decay(0.05, step, 10, 0.96)
+ expected = .05 * 0.96 ** (5.0 / 10.0)
+ self.assertAllClose(decayed_lr.eval(), expected, 1e-6)
+
+ def testStaircase(self):
+ with self.test_session():
+ step = state_ops.variable_op([], types.int32)
+ assign_100 = state_ops.assign(step, 100)
+ assign_1 = state_ops.assign(step, 1)
+ assign_2 = state_ops.assign(step, 2)
+ decayed_lr = learning_rate_decay.exponential_decay(.1, step, 3, 0.96,
+ staircase=True)
+ # No change to learning rate
+ assign_1.op.run()
+ self.assertAllClose(decayed_lr.eval(), .1, 1e-6)
+ assign_2.op.run()
+ self.assertAllClose(decayed_lr.eval(), .1, 1e-6)
+ # Decayed learning rate
+ assign_100.op.run()
+ expected = .1 * 0.96 ** (100 / 3)
+ self.assertAllClose(decayed_lr.eval(), expected, 1e-6)
+
+ def testVariables(self):
+ with self.test_session():
+ step = variables.Variable(1)
+ assign_1 = step.assign(1)
+ assign_2 = step.assign(2)
+ assign_100 = step.assign(100)
+ decayed_lr = learning_rate_decay.exponential_decay(.1, step, 3, 0.96,
+ staircase=True)
+ variables.initialize_all_variables().run()
+ # No change to learning rate
+ assign_1.op.run()
+ self.assertAllClose(decayed_lr.eval(), .1, 1e-6)
+ assign_2.op.run()
+ self.assertAllClose(decayed_lr.eval(), .1, 1e-6)
+ # Decayed learning rate
+ assign_100.op.run()
+ expected = .1 * 0.96 ** (100 / 3)
+ self.assertAllClose(decayed_lr.eval(), expected, 1e-6)
+
+
+if __name__ == "__main__":
+ googletest.main()
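One subtlety in testStaircase and testVariables above: the expected value is written as `0.96 ** (100 / 3)`, and under Python 2 (which this code targets, judging by the `except Exception, e` and `xrange` usage elsewhere in the diff) `100 / 3` is integer division, i.e. 33, which is exactly the floored staircase step count. Spelled out:

```python
# The expected staircase value from the tests above, made explicit.
exponent = 100 // 3               # floor division: 33, same as floor(100 / 3)
expected = 0.1 * 0.96 ** exponent
print(expected)                   # ~0.026, the value assertAllClose compares against
```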
diff --git a/tensorflow/python/training/momentum.py b/tensorflow/python/training/momentum.py
new file mode 100644
index 0000000000..fdd434359f
--- /dev/null
+++ b/tensorflow/python/training/momentum.py
@@ -0,0 +1,51 @@
+"""Momentum for TensorFlow."""
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.training import optimizer
+from tensorflow.python.training import training_ops
+
+
+class MomentumOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the Momentum algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate, momentum,
+ use_locking=False, name="Momentum"):
+ """Construct a new Momentum optimizer.
+
+ Args:
+ learning_rate: A `Tensor` or a floating point value. The learning rate.
+ momentum: A `Tensor` or a floating point value. The momentum.
+ use_locking: If `True` use locks for update operations.
+ name: Optional name prefix for the operations created when applying
+ gradients. Defaults to "Momentum".
+ """
+ super(MomentumOptimizer, self).__init__(use_locking, name)
+ self._learning_rate = learning_rate
+ self._momentum = momentum
+
+ def _create_slots(self, var_list):
+ for v in var_list:
+ self._zeros_slot(v, "momentum", self._name)
+
+ def _prepare(self):
+ self._learning_rate_tensor = ops.convert_to_tensor(self._learning_rate,
+ name="learning_rate")
+ self._momentum_tensor = ops.convert_to_tensor(self._momentum,
+ name="momentum")
+
+ def _apply_dense(self, grad, var):
+ mom = self.get_slot(var, "momentum")
+ return training_ops.apply_momentum(
+ var, mom,
+ self._learning_rate_tensor, grad, self._momentum_tensor,
+ use_locking=self._use_locking).op
+
+ def _apply_sparse(self, grad, var):
+ mom = self.get_slot(var, "momentum")
+ return training_ops.sparse_apply_momentum(
+ var, mom,
+ self._learning_rate_tensor, grad.values, grad.indices,
+ self._momentum_tensor, use_locking=self._use_locking).op
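The dense and sparse kernels invoked above live in training_ops; for reference, a small NumPy sketch of the accumulator update that the tests in the next file verify by hand (this is an illustration of the recurrence, not the generated op itself):

```python
# NumPy sketch of the momentum recurrence the tests below check against:
#   accum <- momentum * accum + grad
#   var   <- var - learning_rate * accum
import numpy as np

learning_rate, momentum = 2.0, 0.9
var = np.array([1.0, 2.0])
grad = np.array([0.1, 0.1])
accum = np.zeros_like(var)

for _ in range(2):
    accum = momentum * accum + grad
    var = var - learning_rate * accum

print(accum)  # roughly [0.19, 0.19], i.e. 0.9 * 0.1 + 0.1
print(var)    # roughly [0.42, 1.42], matching the step-2 values in testBasic
```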
diff --git a/tensorflow/python/training/momentum_test.py b/tensorflow/python/training/momentum_test.py
new file mode 100644
index 0000000000..2cf86d97c9
--- /dev/null
+++ b/tensorflow/python/training/momentum_test.py
@@ -0,0 +1,258 @@
+"""Tests for Momentum."""
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class MomentumOptimizerTest(tf.test.TestCase):
+
+ def testBasic(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ mom_opt = tf.train.MomentumOptimizer(learning_rate=2.0, momentum=0.9)
+ mom_update = mom_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+ # Check we have slots
+ self.assertEqual(["momentum"], mom_opt.get_slot_names())
+ slot0 = mom_opt.get_slot(var0, "momentum")
+ self.assertEquals(slot0.get_shape(), var0.get_shape())
+ self.assertFalse(slot0 in tf.trainable_variables())
+ slot1 = mom_opt.get_slot(var1, "momentum")
+ self.assertEquals(slot1.get_shape(), var1.get_shape())
+ self.assertFalse(slot1 in tf.trainable_variables())
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Step 1: the momentum accumulators were 0, so we should see a normal
+ # update: v -= grad * learning_rate
+ mom_update.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([0.1, 0.1]), slot0.eval())
+ self.assertAllClose(np.array([0.01, 0.01]), slot1.eval())
+ # Check that the parameters have been updated.
+ self.assertAllClose(np.array([1.0 - (0.1 * 2.0),
+ 2.0 - (0.1 * 2.0)]),
+ var0.eval())
+ self.assertAllClose(np.array([3.0 - (0.01 * 2.0),
+ 4.0 - (0.01 * 2.0)]),
+ var1.eval())
+ # Step 2: the momentum accumulators contain the previous update.
+ mom_update.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([(0.9 * 0.1 + 0.1), (0.9 * 0.1 + 0.1)]),
+ slot0.eval())
+ self.assertAllClose(np.array([(0.9 * 0.01 + 0.01), (0.9 * 0.01 + 0.01)]),
+ slot1.eval())
+ # Check that the parameters have been updated.
+ self.assertAllClose(
+ np.array([1.0 - (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0),
+ 2.0 - (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0)]),
+ var0.eval())
+ self.assertAllClose(np.array([2.98 - ((0.9 * 0.01 + 0.01) * 2.0),
+ 3.98 - ((0.9 * 0.01 + 0.01) * 2.0)]),
+ var1.eval())
+
+ def testFloat64(self):
+ with self.test_session():
+ opt = tf.train.MomentumOptimizer(learning_rate=2.0, momentum=0.9)
+
+ # compute_gradients.
+ values = [1.0, 3.0]
+ good_vars = [tf.Variable([v]) for v in values]
+ bad_loss = tf.constant(2.0, tf.float64, name="bad_loss")
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_loss.*expected.*float32",
+ opt.compute_gradients, bad_loss, good_vars)
+ bad_vars = [
+ tf.Variable(np.array([v], np.float64), name="bad_var")
+ for v in values]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.compute_gradients, tf.cast(bad_vars[0] + bad_vars[1], tf.float32),
+ bad_vars)
+ opt.compute_gradients(good_vars[0] + good_vars[1], good_vars)
+
+ # apply_gradients.
+ bad_grads = [
+ tf.constant([0.1], dtype=np.float64, name="bad_grad"),
+ tf.constant([0.01])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_grad.*expected.*float32",
+ opt.apply_gradients, zip(bad_grads, good_vars))
+ good_grads = [tf.constant([0.01]), tf.constant([0.02])]
+ self.assertRaisesRegexp(
+ ValueError, r"Invalid type.*float64.*bad_var.*expected.*float32",
+ opt.apply_gradients, zip(good_grads, bad_vars))
+ opt.apply_gradients(zip(good_grads, good_vars))
+
+ def _dbParamsMom01(self):
+ """Return dist-belief momentum values.
+
+ The return values were generated from the dist-belief momentum unittest,
+ run with a learning rate of 0.1 and a momentum of 0.1.
+
+ These values record how a parameter vector of size 10, initialized with 0.0,
+ gets updated with 10 consecutive momentum steps. It uses random gradients.
+
+ Returns:
+ db_grad: The gradients to apply
+ db_out: The parameters after the momentum update.
+ """
+ db_grad = [[]] * 10
+ db_out = [[]] * 10
+ # pylint: disable=line-too-long
+ db_grad[0] = [0.00096264342, 0.17914793, 0.93945462, 0.41396621, 0.53037018, 0.93197989, 0.78648776, 0.50036013, 0.55345792, 0.96722615]
+ db_out[0] = [-9.6264346e-05, -0.017914793, -0.093945466, -0.041396622, -0.053037018, -0.093197994, -0.078648776, -0.050036013, -0.055345792, -0.096722618]
+ db_grad[1] = [0.17075552, 0.88821375, 0.20873757, 0.25236958, 0.57578111, 0.15312378, 0.5513742, 0.94687688, 0.16012503, 0.22159521]
+ db_out[1] = [-0.017181443, -0.10852765, -0.12421377, -0.070773244, -0.11591884, -0.11783017, -0.14165108, -0.14972731, -0.076892875, -0.1285544]
+ db_grad[2] = [0.35077485, 0.47304362, 0.44412705, 0.44368884, 0.078527533, 0.81223965, 0.31168157, 0.43203235, 0.16792089, 0.24644311]
+ db_out[2] = [-0.053967446, -0.1648933, -0.1716533, -0.1180798, -0.13005978, -0.20151734, -0.17911947, -0.20289968, -0.095839672, -0.15638189]
+ db_grad[3] = [0.9694621, 0.75035888, 0.28171822, 0.83813518, 0.53807181, 0.3728098, 0.81454384, 0.03848977, 0.89759839, 0.93665648]
+ db_out[3] = [-0.15459226, -0.24556576, -0.20456907, -0.20662397, -0.18528105, -0.24716705, -0.2643207, -0.21206589, -0.18749419, -0.2528303]
+ db_grad[4] = [0.38578293, 0.8536852, 0.88722926, 0.66276771, 0.13678469, 0.94036359, 0.69107032, 0.81897682, 0.5433259, 0.67860287]
+ db_out[4] = [-0.20323303, -0.33900154, -0.29658359, -0.28175515, -0.20448165, -0.34576839, -0.34194785, -0.29488021, -0.25099224, -0.33033544]
+ db_grad[5] = [0.27885768, 0.76100707, 0.24625534, 0.81354135, 0.18959245, 0.48038563, 0.84163809, 0.41172323, 0.83259648, 0.44941229]
+ db_out[5] = [-0.23598288, -0.42444581, -0.33041057, -0.3706224, -0.22536094, -0.40366709, -0.43387437, -0.34433398, -0.34060168, -0.38302717]
+ db_grad[6] = [0.27233034, 0.056316052, 0.5039115, 0.24105175, 0.35697976, 0.75913221, 0.73577434, 0.16014607, 0.57500273, 0.071136251]
+ db_out[6] = [-0.26649091, -0.43862185, -0.38418442, -0.40361428, -0.26314685, -0.48537019, -0.51664448, -0.36529395, -0.40706289, -0.39540997]
+ db_grad[7] = [0.58697265, 0.2494842, 0.08106143, 0.39954534, 0.15892942, 0.12683646, 0.74053431, 0.16033, 0.66625422, 0.73515922]
+ db_out[7] = [-0.32823896, -0.46498787, -0.39766794, -0.446868, -0.28281838, -0.50622416, -0.59897494, -0.38342294, -0.48033443, -0.47016418]
+ db_grad[8] = [0.8215279, 0.41994119, 0.95172721, 0.68000203, 0.79439718, 0.43384039, 0.55561525, 0.22567581, 0.93331909, 0.29438227]
+ db_out[8] = [-0.41656655, -0.50961858, -0.49418902, -0.51919359, -0.36422527, -0.55169362, -0.6627695, -0.40780342, -0.58099347, -0.50707781]
+ db_grad[9] = [0.68297005, 0.67758518, 0.1748755, 0.13266537, 0.70697063, 0.055731893, 0.68593478, 0.50580865, 0.12602448, 0.093537711]
+ db_out[9] = [-0.49369633, -0.58184016, -0.52132869, -0.5396927, -0.44306302, -0.56181377, -0.73774242, -0.46082234, -0.60366184, -0.52012295]
+ # pylint: enable=line-too-long
+ return db_grad, db_out
+
+ def testLikeDistBeliefMom01(self):
+ with self.test_session():
+ db_grad, db_out = self._dbParamsMom01()
+ num_samples = len(db_grad)
+ var0 = tf.Variable([0.0] * num_samples)
+ grads0 = tf.constant([0.0] * num_samples)
+ mom_opt = tf.train.MomentumOptimizer(learning_rate=0.1, momentum=0.1)
+ mom_update = mom_opt.apply_gradients(zip([grads0], [var0]))
+ tf.initialize_all_variables().run()
+ for i in xrange(num_samples):
+ mom_update.run(feed_dict={grads0: db_grad[i]})
+ self.assertAllClose(np.array(db_out[i]), var0.eval())
+
+ def testSparse(self):
+ with self.test_session():
+ var0 = tf.Variable(tf.zeros([4, 2]))
+ var1 = tf.Variable(
+ tf.constant(1.0, tf.float32, [4, 2]))
+ grads0 = tf.IndexedSlices(tf.constant([[.1, .1]]),
+ tf.constant([1]),
+ tf.constant([4, 2]))
+ grads1 = tf.IndexedSlices(tf.constant([[.01, .01], [.01, .01]]),
+ tf.constant([2, 3]),
+ tf.constant([4, 2]))
+ mom_opt = tf.train.MomentumOptimizer(learning_rate=2.0, momentum=0.9)
+ mom_update = mom_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ # Check we have slots
+ self.assertEqual(["momentum"], mom_opt.get_slot_names())
+ slot0 = mom_opt.get_slot(var0, "momentum")
+ self.assertEquals(slot0.get_shape(), var0.get_shape())
+ slot1 = mom_opt.get_slot(var1, "momentum")
+ self.assertEquals(slot1.get_shape(), var1.get_shape())
+
+ # Fetch params to validate initial values
+ self.assertAllClose([0, 0], var0.eval()[0])
+ self.assertAllClose([0, 0], var0.eval()[1])
+ self.assertAllClose([1, 1], var1.eval()[2])
+
+ # Step 1: the momentum accumulators are 0. So we should see a normal
+ # update: v -= grad * learning_rate
+ mom_update.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([0, 0]), slot0.eval()[0])
+ self.assertAllClose(np.array([.1, .1]), slot0.eval()[1])
+ self.assertAllClose(np.array([.01, .01]), slot1.eval()[2])
+ # Check that the parameters have been updated.
+ self.assertAllClose(np.array([0, 0]), var0.eval()[0])
+ self.assertAllClose(np.array([- (0.1 * 2.0),
+ - (0.1 * 2.0)]),
+ var0.eval()[1])
+ self.assertAllClose(np.array([1.0 - (0.01 * 2.0),
+ 1.0 - (0.01 * 2.0)]),
+ var1.eval()[2])
+ # Step 2: the momentum accumulators contain the previous update.
+ mom_update.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([0, 0]), slot0.eval()[0])
+ self.assertAllClose(np.array([(0.9 * 0.1 + 0.1),
+ (0.9 * 0.1 + 0.1)]),
+ slot0.eval()[1])
+ self.assertAllClose(np.array([(0.9 * 0.01 + 0.01),
+ (0.9 * 0.01 + 0.01)]),
+ slot1.eval()[2])
+ # Check that the parameters have been updated.
+ self.assertAllClose(np.array([0, 0]), var0.eval()[0])
+ self.assertAllClose(
+ np.array([- (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0),
+ - (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0)]),
+ var0.eval()[1])
+ self.assertAllClose(np.array([0.98 - ((0.9 * 0.01 + 0.01) * 2.0),
+ 0.98 - ((0.9 * 0.01 + 0.01) * 2.0)]),
+ var1.eval()[2])
+
+ def testSharing(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ mom_opt = tf.train.MomentumOptimizer(learning_rate=2.0, momentum=0.9)
+ mom_update1 = mom_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ mom_update2 = mom_opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ self.assertEqual(["momentum"], mom_opt.get_slot_names())
+ slot0 = mom_opt.get_slot(var0, "momentum")
+ self.assertEquals(slot0.get_shape(), var0.get_shape())
+ slot1 = mom_opt.get_slot(var1, "momentum")
+ self.assertEquals(slot1.get_shape(), var1.get_shape())
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Step 1: the momentum accumulators were 0, so we should see a normal
+ # update: v -= grad * learning_rate
+ mom_update1.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([0.1, 0.1]), slot0.eval())
+ self.assertAllClose(np.array([0.01, 0.01]), slot1.eval())
+ # Check that the parameters have been updated.
+ self.assertAllClose(np.array([1.0 - (0.1 * 2.0),
+ 2.0 - (0.1 * 2.0)]),
+ var0.eval())
+ self.assertAllClose(np.array([3.0 - (0.01 * 2.0),
+ 4.0 - (0.01 * 2.0)]),
+ var1.eval())
+ # Step 2: the second momentum accumulators contain the previous update.
+ mom_update2.run()
+ # Check that the momentum accumulators have been updated.
+ self.assertAllClose(np.array([(0.9 * 0.1 + 0.1), (0.9 * 0.1 + 0.1)]),
+ slot0.eval())
+ self.assertAllClose(np.array([(0.9 * 0.01 + 0.01), (0.9 * 0.01 + 0.01)]),
+ slot1.eval())
+ # Check that the parameters have been updated.
+ self.assertAllClose(
+ np.array([1.0 - (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0),
+ 2.0 - (0.1 * 2.0) - ((0.9 * 0.1 + 0.1) * 2.0)]),
+ var0.eval())
+ self.assertAllClose(np.array([2.98 - ((0.9 * 0.01 + 0.01) * 2.0),
+ 3.98 - ((0.9 * 0.01 + 0.01) * 2.0)]),
+ var1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/moving_averages.py b/tensorflow/python/training/moving_averages.py
new file mode 100644
index 0000000000..becc71dfa2
--- /dev/null
+++ b/tensorflow/python/training/moving_averages.py
@@ -0,0 +1,247 @@
+"""Maintain moving averages of parameters."""
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import math_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+
+
+# TODO(mdevin): switch to variables.Variable.
+def assign_moving_average(variable, value, decay, name=None):
+ """Compute the moving average of a variable.
+
+ The moving average of 'variable' updated with 'value' is:
+ variable * decay + value * (1 - decay)
+
+ The returned Operation sets 'variable' to the newly computed moving average.
+
+ The new value of 'variable' can be set with the 'AssignSub' op as:
+ variable -= (1 - decay) * (variable - value)
+
+ Args:
+ variable: A Variable.
+ value: A tensor with the same shape as 'variable'
+ decay: A float Tensor or float value. The moving average decay.
+ name: Optional name of the returned operation.
+
+ Returns:
+ An Operation that updates 'variable' with the newly computed
+ moving average.
+ """
+ with ops.op_scope([variable, value, decay], name, "AssignMovingAvg") as name:
+ with ops.device(variable.device):
+ decay = ops.convert_to_tensor(1.0 - decay, name="decay")
+ if decay.dtype != variable.dtype.base_dtype:
+ decay = math_ops.cast(decay, variable.dtype.base_dtype)
+ return state_ops.assign_sub(variable, (variable - value) * decay,
+ name=name)
+
+
+class ExponentialMovingAverage(object):
+ """Maintains moving averages of variables by employing an exponential decay.
+
+ When training a model, it is often beneficial to maintain moving averages of
+ the trained parameters. Evaluations that use averaged parameters sometimes
+ produce significantly better results than the final trained values.
+
+ The `apply()` method adds shadow copies of trained variables and adds ops that
+ maintain a moving average of the trained variables in their shadow copies.
+ It is used when building the training model. The ops that maintain moving
+ averages are typically run after each training step.
+ The `average()` and `average_name()` methods give access to the shadow
+ variables and their names. They are useful when building an evaluation
+ model, or when restoring a model from a checkpoint file. They help use the
+ moving averages in place of the last trained values for evaluations.
+
+ The moving averages are computed using exponential decay. You specify the
+ decay value when creating the `ExponentialMovingAverage` object. The shadow
+ variables are initialized with the same initial values as the trained
+ variables. When you run the ops to maintain the moving averages, each
+ shadow variable is updated with the formula:
+
+ `shadow_variable -= (1 - decay) * (shadow_variable - variable)`
+
+ This is mathematically equivalent to the classic formula below, but the use
+ of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless
+ updates to the variables:
+
+ `shadow_variable = decay * shadow_variable + (1 - decay) * variable`
+
+ Reasonable values for `decay` are close to 1.0, typically in the
+ multiple-nines range: 0.999, 0.9999, etc.
+
+ Example usage when creating a training model:
+
+ ```python
+ # Create variables.
+ var0 = tf.Variable(...)
+ var1 = tf.Variable(...)
+ # ... use the variables to build a training model...
+ ...
+ # Create an op that applies the optimizer. This is what we usually
+ # would use as a training op.
+ opt_op = opt.minimize(my_loss, [var0, var1])
+
+ # Create an ExponentialMovingAverage object
+ ema = tf.train.ExponentialMovingAverage(decay=0.9999)
+
+ # Create the shadow variables, and add ops to maintain moving averages
+ # of var0 and var1.
+ maintain_averages_op = ema.apply([var0, var1])
+
+ # Create an op that will update the moving averages after each training
+ # step. This is what we will use in place of the usual training op.
+ with tf.control_dependencies([opt_op]):
+ training_op = tf.group(maintain_averages_op)
+
+ ...train the model by running training_op...
+ ```
+
+ There are two ways to use the moving averages for evaluations:
+
+ * Build a model that uses the shadow variables instead of the variables.
+ For this, use the `average()` method which returns the shadow variable
+ for a given variable.
+ * Build a model normally but load the checkpoint files to evaluate by using
+ the shadow variable names. For this use the `average_name()` method. See
+ the [Saver class](train.md#Saver) for more information on restoring saved
+ variables.
+
+ Example of restoring the shadow variable values:
+
+ ```python
+ # Create a Saver that loads variables from their saved shadow values.
+ shadow_var0_name = ema.average_name(var0)
+ shadow_var1_name = ema.average_name(var1)
+ saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
+ saver.restore(...checkpoint filename...)
+ # var0 and var1 now hold the moving average values
+ ```
+
+ @@__init__
+ @@apply
+ @@average_name
+ @@average
+ """
+
+ def __init__(self, decay, num_updates=None,
+ name="ExponentialMovingAverage"):
+ """Creates a new ExponentialMovingAverage object.
+
+ The `apply()` method has to be called to create shadow variables and add
+ ops to maintain moving averages.
+
+ The optional `num_updates` parameter allows one to tweak the decay rate
+ dynamically. It is typical to pass the count of training steps, usually
+ kept in a variable that is incremented at each step, in which case the
+ decay rate is lower at the start of training. This makes moving averages
+ move faster. If passed, the actual decay rate used is:
+
+ `min(decay, (1 + num_updates) / (10 + num_updates))`
+
+ Args:
+ decay: Float. The decay to use.
+ num_updates: Optional count of number of updates applied to variables.
+ name: String. Optional prefix name to use for the name of ops added in
+ `apply()`.
+ """
+ self._decay = decay
+ self._num_updates = num_updates
+ self._name = name
+ self._averages = {}
+
+ def apply(self, var_list=None):
+ """Maintains moving averages of variables.
+
+ `var_list` must be a list of `Variable` or `Tensor` objects. This method
+ creates shadow variables for all elements of `var_list`. Shadow variables
+ for `Variable` objects are initialized to the variable's initial value.
+ For `Tensor` objects, the shadow variables are initialized to 0.
+
+ Shadow variables are created with `trainable=False` and added to the
+ `GraphKeys.ALL_VARIABLES` collection. They will be returned by calls to
+ `tf.all_variables()`.
+
+ Returns an op that updates all shadow variables as described above.
+
+ Note that `apply()` can be called multiple times with different lists of
+ variables.
+
+ Args:
+ var_list: A list of Variable or Tensor objects. The variables
+ and Tensors must be of types float32 or float64.
+
+ Returns:
+ An Operation that updates the moving averages.
+
+ Raises:
+ TypeError: If the arguments are not all float32 or float64.
+ ValueError: If the moving average of one of the variables is already
+ being computed.
+ """
+ # TODO(mdevin): op_scope
+ if var_list is None:
+ var_list = variables.trainable_variables()
+ for var in var_list:
+ if var.dtype.base_dtype not in [types.float32, types.float64]:
+ raise TypeError("The variables must be float or double: %s" % var)
+ if var in self._averages:
+ raise ValueError("Moving average already computed for: %s" % var)
+ with ops.name_scope(var.op.name + "/" + self._name) as scope:
+ with ops.device(var.device):
+ if isinstance(var, variables.Variable):
+ initial_value = var.initialized_value()
+ else:
+ initial_value = array_ops.zeros(var.get_shape().as_list())
+ avg = variables.Variable(initial_value, name=scope, trainable=False)
+ self._averages[var] = avg
+ with ops.name_scope(self._name) as scope:
+ decay = ops.convert_to_tensor(self._decay, name="decay")
+ if self._num_updates is not None:
+ num_updates = math_ops.cast(self._num_updates, types.float32,
+ name="num_updates")
+ decay = math_ops.minimum(decay,
+ (1.0 + num_updates) / (10.0 + num_updates))
+ updates = []
+ for var in var_list:
+ updates.append(assign_moving_average(self._averages[var], var, decay))
+ return control_flow_ops.group(*updates, name=scope)
+
+ def average(self, var):
+ """Returns the `Variable` holding the average of `var`.
+
+ Args:
+ var: A `Variable` object.
+
+ Returns:
+ A `Variable` object or `None` if the moving average of `var`
+ is not maintained..
+ """
+ return self._averages.get(var, None)
+
+ def average_name(self, var):
+ """Returns the name of the `Variable` holding the average for `var`.
+
+ The typical scenario for `ExponentialMovingAverage` is to compute moving
+ averages of variables during training, and restore the variables from the
+ computed moving averages during evaluations.
+
+ To restore variables, you have to know the name of the shadow variables.
+ That name and the original variable can then be passed to a `Saver()` object
+ to restore the variable from the moving average value with:
+ `saver = tf.train.Saver({ema.average_name(var): var})`
+
+ `average_name()` can be called whether or not `apply()` has been called.
+
+ Args:
+ var: A `Variable` object.
+
+ Returns:
+ A string: the name of the variable that will be used or was used
+ by the `ExponentialMovingAverage class` to hold the moving average of
+ `var`.
+ """
+ return var.op.name + "/" + self._name
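A tiny plain-Python check (not part of the module) that the `assign_sub` form used above matches the classic formula, plus the `num_updates` cap the tests rely on:

```python
# Plain-Python check of the two equivalent moving-average formulas above.
decay = 0.25
variable = 1.0
shadow = 10.0

classic = decay * shadow + (1 - decay) * variable  # 3.25
shadow -= (1 - decay) * (shadow - variable)        # also 3.25
print("%s %s" % (classic, shadow))

# With num_updates, the effective decay is capped early in training:
num_updates = 1
effective_decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
print(effective_decay)  # 0.1818..., the decay ExponentialMovingAverageTest expects
```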
diff --git a/tensorflow/python/training/moving_averages_test.py b/tensorflow/python/training/moving_averages_test.py
new file mode 100644
index 0000000000..73ee94b400
--- /dev/null
+++ b/tensorflow/python/training/moving_averages_test.py
@@ -0,0 +1,130 @@
+"""Functional test for moving_averages.py."""
+import tensorflow.python.platform
+
+from tensorflow.python.framework import test_util
+from tensorflow.python.framework import types
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import googletest
+from tensorflow.python.training import moving_averages
+
+
+class MovingAveragesTest(test_util.TensorFlowTestCase):
+
+ def testAssignMovingAverage(self):
+ with self.test_session():
+ var = variables.Variable([10.0, 11.0])
+ val = constant_op.constant([1.0, 2.0], types.float32)
+ decay = 0.25
+ assign = moving_averages.assign_moving_average(var, val, decay)
+ variables.initialize_all_variables().run()
+ self.assertAllClose([10.0, 11.0], var.eval())
+ assign.op.run()
+ self.assertAllClose([10.0 * 0.25 + 1.0 * (1.0 - 0.25),
+ 11.0 * 0.25 + 2.0 * (1.0 - 0.25)],
+ var.eval())
+
+def _Repeat(value, dim):
+ if dim == 1:
+ return value
+ return [value for _ in xrange(dim)]
+
+class ExponentialMovingAverageTest(test_util.TensorFlowTestCase):
+
+ def _CheckDecay(self, ema, actual_decay, dim):
+ tens = _Repeat(10.0, dim)
+ thirties = _Repeat(30.0, dim)
+ var0 = variables.Variable(tens, name="v0")
+ var1 = variables.Variable(thirties, name="v1")
+ variables.initialize_all_variables().run()
+ # Note that tensor2 is not a Variable but just a plain Tensor resulting
+ # from the sum operation.
+ tensor2 = var0 + var1
+ update = ema.apply([var0, var1, tensor2])
+ avg0 = ema.average(var0)
+ avg1 = ema.average(var1)
+ avg2 = ema.average(tensor2)
+
+ self.assertFalse(avg0 in variables.trainable_variables())
+ self.assertFalse(avg1 in variables.trainable_variables())
+ self.assertFalse(avg2 in variables.trainable_variables())
+ variables.initialize_all_variables().run()
+
+ self.assertEqual("v0/ExponentialMovingAverage:0", avg0.name)
+ self.assertEqual("v1/ExponentialMovingAverage:0", avg1.name)
+ self.assertEqual("add/ExponentialMovingAverage:0", avg2.name)
+
+ # Check initial values.
+ self.assertAllClose(tens, var0.eval())
+ self.assertAllClose(thirties, var1.eval())
+ self.assertAllClose(_Repeat(10.0 + 30.0, dim), tensor2.eval())
+
+ # Check that averages are initialized correctly.
+ self.assertAllClose(tens, avg0.eval())
+ self.assertAllClose(thirties, avg1.eval())
+ # Note that averages of Tensors initialize to zeros_like since no value
+ # of the Tensor is known because the Op has not been run (yet).
+ self.assertAllClose(_Repeat(0.0, dim), avg2.eval())
+
+ # Update the averages and check.
+ update.run()
+ dk = actual_decay
+
+ expected = _Repeat(10.0 * dk + 10.0 * (1 - dk), dim)
+ self.assertAllClose(expected, avg0.eval())
+ expected = _Repeat(30.0 * dk + 30.0 * (1 - dk), dim)
+ self.assertAllClose(expected, avg1.eval())
+ expected = _Repeat(0.0 * dk + (10.0 + 30.0) * (1 - dk), dim)
+ self.assertAllClose(expected, avg2.eval())
+
+ # Again, update the averages and check.
+ update.run()
+ expected = _Repeat((10.0 * dk + 10.0 * (1 - dk)) * dk + 10.0 * (1 - dk),
+ dim)
+ self.assertAllClose(expected, avg0.eval())
+ expected = _Repeat((30.0 * dk + 30.0 * (1 - dk)) * dk + 30.0 * (1 - dk),
+ dim)
+ self.assertAllClose(expected, avg1.eval())
+ expected = _Repeat(((0.0 * dk + (10.0 + 30.0) * (1 - dk)) * dk +
+ (10.0 + 30.0) * (1 - dk)),
+ dim)
+ self.assertAllClose(expected, avg2.eval())
+
+ def testAverageVariablesNoNumUpdates_Scalar(self):
+ with self.test_session():
+ ema = moving_averages.ExponentialMovingAverage(0.25)
+ self._CheckDecay(ema, actual_decay=0.25, dim=1)
+
+ def testAverageVariablesNoNumUpdates_Vector(self):
+ with self.test_session():
+ ema = moving_averages.ExponentialMovingAverage(0.25)
+ self._CheckDecay(ema, actual_decay=0.25, dim=5)
+
+ def testAverageVariablesNumUpdates_Scalar(self):
+ with self.test_session():
+ # With num_updates 1, the decay applied is 0.1818
+ ema = moving_averages.ExponentialMovingAverage(0.25, num_updates=1)
+ self._CheckDecay(ema, actual_decay=0.181818, dim=1)
+
+ def testAverageVariablesNumUpdates_Vector(self):
+ with self.test_session():
+ # With num_updates 1, the decay applied is 0.1818
+ ema = moving_averages.ExponentialMovingAverage(0.25, num_updates=1)
+ self._CheckDecay(ema, actual_decay=0.181818, dim=5)
+
+ def testAverageVariablesNames(self):
+ v0 = variables.Variable(10.0, name="v0")
+ v1 = variables.Variable(30.0, name="v1")
+ tensor2 = v0 + v1
+ ema = moving_averages.ExponentialMovingAverage(0.25, name="foo_avg")
+ self.assertEqual("v0/foo_avg", ema.average_name(v0))
+ self.assertEqual("v1/foo_avg", ema.average_name(v1))
+ self.assertEqual("add/foo_avg", ema.average_name(tensor2))
+ ema.apply([v0, v1, tensor2])
+ self.assertEqual(ema.average_name(v0), ema.average(v0).op.name)
+ self.assertEqual(ema.average_name(v1), ema.average(v1).op.name)
+ self.assertEqual(ema.average_name(tensor2), ema.average(tensor2).op.name)
+
+
+if __name__ == "__main__":
+ googletest.main()
diff --git a/tensorflow/python/training/optimizer.py b/tensorflow/python/training/optimizer.py
new file mode 100644
index 0000000000..1186117169
--- /dev/null
+++ b/tensorflow/python/training/optimizer.py
@@ -0,0 +1,426 @@
+"""Base class for optimizers."""
+# pylint: disable=g-bad-name
+import types
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import types as tf_types
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import gradients
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+
+
+class Optimizer(object):
+ """Base class for optimizers.
+
+ This class defines the API to add Ops to train a model. You never use this
+ class directly, but instead instantiate one of its subclasses such as
+ `GradientDescentOptimizer`, `AdagradOptimizer`, or `MomentumOptimizer`.
+
+ ### Usage
+
+ ```
+ # Create an optimizer with the desired parameters.
+ opt = GradientDescentOptimizer(learning_rate=0.1)
+ # Add Ops to the graph to minimize a cost by updating a list of variables.
+ # "cost" is a Tensor, and the list of variables contains variables.Variable
+ # objects.
+ opt_op = opt.minimize(cost, <list of variables>)
+ ```
+
+ In the training program you will just have to run the returned Op.
+
+ ```
+ # Execute opt_op to do one step of training:
+ opt_op.run()
+ ```
+
+ ### Processing gradients before applying them.
+
+ Calling `minimize()` takes care of both computing the gradients and
+ applying them to the variables. If you want to process the gradients
+ before applying them you can instead use the optimizer in three steps:
+
+ 1. Compute the gradients with `compute_gradients()`.
+ 2. Process the gradients as you wish.
+ 3. Apply the processed gradients with `apply_gradients()`.
+
+ Example:
+
+ ```
+ # Create an optimizer.
+ opt = GradientDescentOptimizer(learning_rate=0.1)
+
+ # Compute the gradients for a list of variables.
+ grads_and_vars = opt.compute_gradients(loss, <list of variables>)
+
+ # grads_and_vars is a list of tuples (gradient, variable). Do whatever you
+ # need to the 'gradient' part, for example cap them, etc.
+ capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]
+
+ # Ask the optimizer to apply the capped gradients.
+ opt.apply_gradients(capped_grads_and_vars)
+ ```
+
+ @@__init__
+
+ @@minimize
+ @@compute_gradients
+ @@apply_gradients
+
+ ### Gating Gradients
+
+ Both `minimize()` and `compute_gradients()` accept a `gate_gradient` argument
+ that controls the degree of parallelism during the application of the
+ gradients.
+
+ The possible values are: `GATE_NONE`, `GATE_OP`, and `GATE_GRAPH`.
+
+ <b>GATE_NONE</b>: Compute and apply gradients in parallel. This provides the
+ maximum parallelism in execution, at the cost of some non-reproducibility in
+ the results. For example the two gradients of MatMul depend on the input
+ values: With `GATE_NONE` one of the gradients could be applied to one of the
+ inputs _before_ the other gradient is computed resulting in non-reproducible
+ results.
+
+ <b>GATE_OP</b>: For each Op, make sure all gradients are computed before they
+ are used. This prevents race conditions for Ops that generate gradients for
+ multiple inputs where the gradients depend on the inputs.
+
+ <b>GATE_GRAPH</b>: Make sure all gradients for all variables are computed
+ before any one of them is used. This provides the least parallelism but can
+ be useful if you want to process all gradients before applying any of them.
+
+ ### Slots
+
+ Some optimizer subclasses, such as `MomentumOptimizer` and `AdagradOptimizer`
+ allocate and manage additional variables associated with the variables to
+ train. These are called <i>Slots</i>. Slots have names and you can ask the
+ optimizer for the names of the slots that it uses. Once you have a slot name
+ you can ask the optimizer for the variable it created to hold the slot value.
+
+ This can be useful if you want to debug a training algorithm, report stats
+ about the slots, etc.
+
+ @@get_slot_names
+ @@get_slot
+ """
+
+ # Values for gate_gradients.
+ GATE_NONE = 0
+ GATE_OP = 1
+ GATE_GRAPH = 2
+
+ def __init__(self, use_locking, name):
+ """Create a new Optimizer.
+
+ This must be called by the constructors of subclasses.
+
+ Args:
+ use_locking: Bool. If True apply use locks to prevent concurrent updates
+ to variables.
+ name: A non-empty string. The name to use for accumulators created
+ for the optimizer.
+
+ Raises:
+ ValueError: if name is malformed.
+ """
+ if not name:
+ raise ValueError("Must specify the optimizer name")
+ self._use_locking = use_locking
+ self._name = name
+ # Dictionary of slots.
+ # {slot_name : { variable_to_train: slot_for_the_variable, ...}, ... }
+ self._slots = {}
+
+ def minimize(self, loss, global_step=None, var_list=None,
+ gate_gradients=GATE_OP, name=None):
+ """Add operations to minimize 'loss' by updating 'var_list'.
+
+ This method simply combines calls to compute_gradients() and
+ apply_gradients(). If you want to process the gradients before applying them,
+ call compute_gradients() and apply_gradients() explicitly instead of using
+ this function.
+
+ Args:
+ loss: A Tensor containing the value to minimize.
+ global_step: Optional Variable to increment by one after the
+ variables have been updated.
+ var_list: Optional list of variables.Variable to update to minimize
+ 'loss'. Defaults to the list of variables collected in the graph
+ under the key GraphKeys.TRAINABLE_VARIABLES.
+ gate_gradients: How to gate the computation of gradients. Can be
+ GATE_NONE, GATE_OP, or GATE_GRAPH.
+ name: Optional name for the returned operation.
+
+ Returns:
+ An Operation that updates the variables in 'var_list'. If 'global_step'
+ was not None, that operation also increments global_step.
+
+ Raises:
+ ValueError: if some of the variables are not variables.Variable objects.
+ """
+ grads_and_vars = self.compute_gradients(loss, var_list=var_list,
+ gate_gradients=gate_gradients)
+ return self.apply_gradients(grads_and_vars, global_step=global_step,
+ name=name)
+
+ def compute_gradients(self, loss, var_list=None, gate_gradients=GATE_OP):
+ """Compute gradients of "loss" for the variables in "var_list".
+
+ This is the first part of minimize(). It returns a list
+ of (gradient, variable) pairs where "gradient" is the gradient
+ for "variable". Note that "gradient" can be a Tensor, a
+ IndexedSlices, or None if there is no gradient for the
+ given variable.
+
+ Args:
+ loss: A Tensor containing the value to minimize.
+ var_list: Optional list of variables.Variable to update to minimize
+ "loss". Defaults to the list of variables collected in the graph
+ under the key GraphKeys.TRAINABLE_VARIABLES.
+ gate_gradients: How to gate the computation of gradients. Can be
+ GATE_NONE, GATE_OP, or GATE_GRAPH.
+
+ Returns:
+ A list of (gradient, variable) pairs.
+
+ Raises:
+ TypeError: If var_list contains anything other than variables.Variable objects.
+ ValueError: If some arguments are invalid.
+ """
+ if gate_gradients not in [Optimizer.GATE_NONE, Optimizer.GATE_OP,
+ Optimizer.GATE_GRAPH]:
+ raise ValueError("gate_gradients must be one of: Optimizer.GATE_NONE, "
+ "Optimizer.GATE_OP, Optimizer.GATE_GRAPH. Not %s" %
+ gate_gradients)
+ self._assert_valid_dtypes([loss])
+ if var_list is None:
+ var_list = variables.trainable_variables()
+ for var in var_list:
+ if not isinstance(var, variables.Variable):
+ raise TypeError("Argument is not a variables.Variable: %s" % var)
+ grads = gradients.gradients(
+ loss, var_list, gate_gradients=(gate_gradients == Optimizer.GATE_OP))
+ if gate_gradients == Optimizer.GATE_GRAPH:
+ grads = control_flow_ops.tuple(grads)
+ grads_and_vars = zip(grads, var_list)
+ self._assert_valid_dtypes([v for g, v in grads_and_vars if g is not None])
+ return grads_and_vars
+
+ def apply_gradients(self, grads_and_vars, global_step=None, name=None):
+ """Apply gradients to variables.
+
+ This is the second part of minimize(). It returns an Operation that
+ applies gradients.
+
+ Args:
+ grads_and_vars: List of (gradient, variable) pairs as returned by
+ compute_gradients().
+ global_step: Optional Variable to increment by one after the
+ variables have been updated.
+ name: Optional name for the returned operation. Defaults to the
+ name passed to the Optimizer constructor.
+
+ Returns:
+ An Operation that applies the specified gradients. If 'global_step'
+ was not None, that operation also increments global_step.
+
+ Raises:
+ TypeError: if grads_and_vars is malformed.
+ """
+ # This is a default implementation of apply_gradients() that can be shared
+ # by most optimizers. It relies on the subclass implementing the following
+ # methods: _create_slots(), _prepare(), _apply_dense(), and _apply_sparse().
+ for g, v in grads_and_vars:
+ if not isinstance(g, (ops.Tensor, ops.IndexedSlices, types.NoneType)):
+ raise TypeError(
+ "Gradient must be a Tensor, IndexedSlices, or None: %s" % g)
+ if not isinstance(v, variables.Variable):
+ raise TypeError(
+ "Variable must be a variables.Variable: %s" % v)
+ if g is not None:
+ self._assert_valid_dtypes([g, v])
+ self._create_slots([v for g, v in grads_and_vars if g is not None])
+ update_ops = []
+ with ops.op_scope([], name, self._name) as name:
+ self._prepare()
+ for grad, var in grads_and_vars:
+ if not grad:
+ continue
+ with ops.name_scope("update_" + var.op.name), ops.device(var.device):
+ if isinstance(grad, ops.Tensor):
+ update_ops.append(self._apply_dense(grad, var))
+ else:
+ update_ops.append(self._apply_sparse(grad, var))
+ if global_step is None:
+ return self._finish(update_ops, name)
+ else:
+ with ops.control_dependencies([self._finish(update_ops, "update")]):
+ with ops.device(global_step.device):
+ return state_ops.assign_add(global_step, 1, name=name).op
+
+ def get_slot(self, var, name):
+ """Return a slot named "name" created for "var" by the Optimizer.
+
+ Some Optimizer subclasses use additional variables. For example
+ Momentum and Adagrad use variables to accumulate updates. This method
+ gives access to these Variables if for some reason you need them.
+
+ Use get_slot_names() to get the list of slot names created by the Optimizer.
+
+ Args:
+ var: A variable passed to minimize() or apply_gradients().
+ name: A string.
+
+ Returns:
+ The Variable for the slot if it was created, None otherwise.
+ """
+ named_slots = self._slots.get(name, None)
+ if not named_slots:
+ return None
+ return named_slots.get(var, None)
+
+ def get_slot_names(self):
+ """Return a list of the names of slots created by the Optimizer.
+
+ See get_slot().
+
+ Returns:
+ A list of strings.
+ """
+ return sorted(self._slots.keys())
+
+ def _assert_valid_dtypes(self, tensors):
+ """Asserts tensors are all valid types (see _valid_dtypes).
+
+ Args:
+ tensors: tensors to check.
+ Raises:
+ ValueError: if any tensor is not a valid type.
+ """
+ valid_dtypes = self._valid_dtypes()
+ for t in tensors:
+ dtype = t.dtype.base_dtype
+ if dtype not in valid_dtypes:
+ raise ValueError(
+ "Invalid type %s for %s, expected: %s." % (
+ dtype, t.name, [v for v in valid_dtypes]))
+
+ # --------------
+ # Methods to be implemented by subclasses if they want to use the
+ # inherited implementation of apply_gradients() or compute_gradients().
+ # --------------
+ def _valid_dtypes(self):
+ """Valid types for loss, variables and gradients.
+
+ Defaults to float32. Subclasses should override to allow other types.
+
+ Returns:
+ Valid types for loss, variables and gradients.
+ """
+ return set([tf_types.float32])
+
+ def _create_slots(self, var_list):
+ """Create all slots needed by the variables.
+
+ Args:
+ var_list: A list of variables.Variable.
+ """
+ # No slots needed by default
+ pass
+
+ def _prepare(self):
+ """Create all needed tensors before applying gradients.
+
+ This is called with the name_scope using the "name" that
+ users have chosen for the application of gradients.
+ """
+ pass
+
+ def _apply_dense(self, grad, var):
+ """Add ops to apply dense gradients to "var".
+
+ Args:
+ grad: A Tensor.
+ var: A variables.Variable.
+
+ Return:
+ An Operation.
+ """
+ raise NotImplementedError()
+
+ def _apply_sparse(self, grad, var):
+ """Add ops to apply sparse gradients to "var".
+
+ Args:
+ grad: IndexedSlices.
+ var: A variables.Variable.
+
+ Return:
+ An Operation.
+ """
+ raise NotImplementedError()
+
+ def _finish(self, update_ops, name_scope):
+ """Do what is needed to finish the update.
+
+ This is called with the name_scope using the "name" that
+ users have chosen for the application of gradients.
+
+ Args:
+ update_ops: List of Operations to update variables. This list contains
+ the values returned by the _apply_dense() and _apply_sparse() calls.
+ name_scope: string. Name to use for the returned operation.
+
+ Returns:
+ The operation to apply updates.
+ """
+ return control_flow_ops.group(*update_ops, name=name_scope)
+
+ # --------------
+ # Utility methods for subclasses.
+ # --------------
+
+ def _get_or_make_slot(self, var, val, slot_name, op_name):
+ """Find or create a slot for a variable.
+
+ Args:
+ var: A variables.Variable.
+ val: A Tensor. The initial value of the slot.
+ slot_name: Name for the slot.
+ op_name: Name to use when scoping the Variable that
+ needs to be created for the slot.
+
+ Returns:
+ A variables.Variable.
+ """
+ named_slots = self._slots.get(slot_name, None)
+ if named_slots is None:
+ named_slots = {}
+ self._slots[slot_name] = named_slots
+ slot = named_slots.get(var, None)
+ if slot is None:
+ # Scope the slot name in the namespace of the Variable and
+ # create the slot on the same device as the variable.
+ with ops.name_scope(var.op.name + "/" + op_name) as scope:
+ with ops.device(var.device):
+ slot = variables.Variable(val, name=scope, trainable=False)
+ named_slots[var] = slot
+ return slot
+
+ def _zeros_slot(self, var, slot_name, op_name):
+ """Find or create a slot initialized with 0.0.
+
+ Args:
+ var: A variables.Variable.
+ slot_name: Name for the slot.
+ op_name: Name to use when scoping the Variable that
+ needs to be created for the slot.
+
+ Returns:
+ A variables.Variable.
+ """
+ val = array_ops.zeros(var.get_shape().as_list(), dtype=var.dtype)
+ return self._get_or_make_slot(var, val, slot_name, op_name)
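For illustration only, a hypothetical minimal subclass built on the hooks above: a dense-only plain gradient-descent optimizer written with `state_ops.assign_sub`. The real `GradientDescentOptimizer` uses a dedicated training op, so treat this as a sketch of the extension points, not the shipped implementation.

```python
# Hypothetical sketch of a minimal Optimizer subclass using the hooks above.
# Dense gradients only; _apply_sparse is inherited from the base class and
# raises NotImplementedError.
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.training import optimizer


class SimpleGradientDescent(optimizer.Optimizer):
  """Plain gradient descent: var <- var - learning_rate * grad."""

  def __init__(self, learning_rate, use_locking=False, name="SimpleGD"):
    super(SimpleGradientDescent, self).__init__(use_locking, name)
    self._learning_rate = learning_rate

  def _prepare(self):
    # Convert the Python number once; reused by every _apply_dense call.
    self._lr_tensor = ops.convert_to_tensor(self._learning_rate,
                                            name="learning_rate")

  def _apply_dense(self, grad, var):
    delta = math_ops.mul(self._lr_tensor, grad)
    return state_ops.assign_sub(var, delta).op
```

Used like the optimizers above: construct `SimpleGradientDescent(0.1)` and call `minimize(loss)` or the `compute_gradients()` / `apply_gradients()` pair.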
diff --git a/tensorflow/python/training/queue_runner.py b/tensorflow/python/training/queue_runner.py
new file mode 100644
index 0000000000..fcf9927c79
--- /dev/null
+++ b/tensorflow/python/training/queue_runner.py
@@ -0,0 +1,233 @@
+"""Create threads to run multiple enqueue ops."""
+import threading
+
+import tensorflow.python.platform
+
+from tensorflow.python.framework import errors
+from tensorflow.python.framework import ops
+from tensorflow.python.platform import logging
+
+
+class QueueRunner(object):
+ """Holds a list of enqueue operations for a queue, each to be run in a thread.
+
+ Queues are a convenient TensorFlow mechanism to compute tensors
+ asynchronously using multiple threads. For example, in the canonical 'Input
+ Reader' setup, one set of threads generates filenames in a queue; a second set
+ of threads reads records from the files, processes them, and enqueues tensors
+ on a second queue; a third set of threads dequeues these input records to
+ construct batches and runs them through training operations.
+
+ There are several delicate issues when running multiple threads that way:
+ closing the queues in sequence as the input is exhausted, correctly catching
+ and reporting exceptions, etc.
+
+ The `QueueRunner`, combined with the `Coordinator`, helps handle these issues.
+ """
+
+ def __init__(self, queue, enqueue_ops):
+ """Create a QueueRunner.
+
+ On construction the `QueueRunner` adds an op to close the queue. That op
+ will be run if the enqueue ops raise exceptions.
+
+ When you later call the `create_threads()` method, the `QueueRunner` will
+ create one thread for each op in `enqueue_ops`. Each thread will run its
+ enqueue op in parallel with the other threads. The enqueue ops do not have
+ to all be the same op, but it is expected that they all enqueue tensors in
+ `queue`.
+
+ Args:
+ queue: A `Queue`.
+ enqueue_ops: List of enqueue ops to run in threads later.
+ """
+ self._queue = queue
+ self._enqueue_ops = enqueue_ops
+ # Close when no more will be produced, but pending enqueues should be
+ # preserved.
+ self._close_op = self._queue.close()
+ # Close and cancel pending enqueues since there was an error and we want
+ # to unblock everything so we can cleanly exit.
+ self._cancel_op = self._queue.close(cancel_pending_enqueues=True)
+ # Protect the count of runs to wait for.
+ self._lock = threading.Lock()
+ self._runs = 0
+ # List of exceptions raised by the running threads.
+ self._exceptions_raised = []
+
+ @property
+ def exceptions_raised(self):
+ """Exceptions raised but not handled by the `QueueRunner` threads.
+
+ Exceptions raised in queue runner threads are handled in one of two ways
+ depending on whether or not a `Coordinator` was passed to
+ `create_threads()`:
+
+ * With a `Coordinator`, exceptions are reported to the coordinator and
+ forgotten by the `QueueRunner`.
+ * Without a `Coordinator`, exceptions are captured by the `QueueRunner` and
+ made available in this `exceptions_raised` property.
+
+ Returns:
+ A list of Python `Exception` objects. The list is empty if no exception
+ was captured. (No exceptions are captured when using a Coordinator.)
+ """
+ return self._exceptions_raised
+
+ # pylint: disable=broad-except
+ def _run(self, sess, enqueue_op, coord=None):
+ """Execute the enqueue op in a loop, close the queue in case of error.
+
+ Args:
+ sess: A Session.
+ enqueue_op: The Operation to run.
+ coord: Optional Coordinator object for reporting errors and checking
+ for stop conditions.
+ """
+ decremented = False
+ try:
+ while True:
+ if coord and coord.should_stop():
+ break
+ try:
+ sess.run(enqueue_op)
+ except errors.OutOfRangeError:
+ # This exception indicates that a queue was closed.
+ with self._lock:
+ self._runs -= 1
+ decremented = True
+ if self._runs == 0:
+ try:
+ sess.run(self._close_op)
+ except Exception, e:
+ # Intentionally ignore errors from close_op.
+ logging.vlog(1, "Ignored exception: %s", str(e))
+ return
+ except Exception, e:
+ # This catches all other exceptions.
+ if coord:
+ coord.request_stop(e)
+ else:
+ logging.error("Exception in QueueRunner: %s", str(e))
+ with self._lock:
+ self._exceptions_raised.append(e)
+ raise
+ finally:
+ # Make sure we account for all terminations: normal or errors.
+ if not decremented:
+ with self._lock:
+ self._runs -= 1
+
+ def _close_on_stop(self, sess, cancel_op, coord):
+ """Close the queue when the Coordinator requests stop.
+
+ Args:
+ sess: A Session.
+ cancel_op: The Operation to run.
+ coord: Coordinator.
+ """
+ coord.wait_for_stop()
+ try:
+ sess.run(cancel_op)
+ except Exception, e:
+ # Intentionally ignore errors from cancel_op.
+ logging.vlog(1, "Ignored exception: %s", str(e))
+ # pylint: enable=broad-except
+
+ def create_threads(self, sess, coord=None, daemon=False, start=False):
+ """Create threads to run the enqueue ops.
+
+ This method requires a session in which the graph was launched. It creates
+ a list of threads, optionally starting them. There is one thread for each
+ op passed in `enqueue_ops`.
+
+ The `coord` argument is an optional coordinator that the threads will use
+ to terminate together and report exceptions. If a coordinator is given,
+ this method starts an additional thread to close the queue when the
+ coordinator requests a stop.
+
+ This method may be called again as long as all threads from a previous call
+ have stopped.
+
+ Args:
+ sess: A `Session`.
+ coord: Optional `Coordinator` object for reporting errors and checking
+ stop conditions.
+ daemon: Boolean. If `True` make the threads daemon threads.
+ start: Boolean. If `True` starts the threads. If `False` the
+ caller must call the `start()` method of the returned threads.
+
+ Returns:
+ A list of threads.
+
+ Raises:
+ RuntimeError: If threads from a previous call to `create_threads()` are
+ still running.
+ """
+ with self._lock:
+ if self._runs > 0:
+ raise RuntimeError(
+ "Threads are already running from a previous call to Threads() "
+ "for this queue runner.")
+ self._runs = len(self._enqueue_ops)
+ self._exceptions_raised = []
+
+ ret_threads = [threading.Thread(target=self._run, args=(sess, op, coord))
+ for op in self._enqueue_ops]
+ if coord:
+ ret_threads.append(threading.Thread(target=self._close_on_stop,
+ args=(sess, self._cancel_op, coord)))
+ for t in ret_threads:
+ if daemon:
+ t.daemon = True
+ if start:
+ t.start()
+ return ret_threads
+
+
+def add_queue_runner(qr, collection=ops.GraphKeys.QUEUE_RUNNERS):
+ """Adds a `QueueRunner` to a collection in the graph.
+
+ When building a complex model that uses many queues it is often difficult to
+ gather all the queue runners that need to be run. This convenience function
+ allows you to add a queue runner to a well known collection in the graph.
+
+ The companion method `start_queue_runners()` can be used to start threads for
+ all the collected queue runners.
+
+ Args:
+ qr: A `QueueRunner`.
+ collection: A `GraphKey` specifying the graph collection to add
+ the queue runner to. Defaults to `GraphKeys.QUEUE_RUNNERS`.
+ """
+ ops.add_to_collection(collection, qr)
+
+
+def start_queue_runners(sess=None, coord=None, daemon=True, start=True,
+ collection=ops.GraphKeys.QUEUE_RUNNERS):
+ """Starts all queue runners collected in the graph.
+
+ This is a companion method to `add_queue_runner()`. It just starts
+ threads for all queue runners collected in the graph. It returns
+ the list of all threads.
+
+ Args:
+ sess: `Session` used to run the queue ops. Defaults to the
+ default session.
+ coord: Optional `Coordinator` for coordinating the started threads.
+ daemon: Whether the threads should be marked as `daemons`, meaning
+ they don't block program exit.
+ start: Set to `False` to only create the threads, not start them.
+ collection: A `GraphKey` specifying the graph collection to
+ get the queue runners from. Defaults to `GraphKeys.QUEUE_RUNNERS`.
+
+ Returns:
+ A list of threads.
+ """
+ if sess is None:
+ sess = ops.get_default_session()
+ threads = []
+ for qr in ops.get_collection(collection):
+ threads.extend(qr.create_threads(sess, coord=coord, daemon=daemon,
+ start=start))
+ return threads
diff --git a/tensorflow/python/training/queue_runner_test.py b/tensorflow/python/training/queue_runner_test.py
new file mode 100644
index 0000000000..c94c02da66
--- /dev/null
+++ b/tensorflow/python/training/queue_runner_test.py
@@ -0,0 +1,186 @@
+"""Tests for QueueRunner."""
+import time
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class QueueRunnerTest(tf.test.TestCase):
+
+ def testBasic(self):
+ with self.test_session() as sess:
+ # CountUpTo will raise OUT_OF_RANGE when it reaches the count.
+ zero64 = tf.constant(0, dtype=tf.int64)
+ var = tf.Variable(zero64)
+ count_up_to = var.count_up_to(3)
+ queue = tf.FIFOQueue(10, tf.float32)
+ tf.initialize_all_variables().run()
+ qr = tf.train.QueueRunner(queue, [count_up_to])
+ threads = qr.create_threads(sess)
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+ self.assertEqual(0, len(qr.exceptions_raised))
+ # The variable should be 3.
+ self.assertEqual(3, var.eval())
+
+ def testTwoOps(self):
+ with self.test_session() as sess:
+ # CountUpTo will raise OUT_OF_RANGE when it reaches the count.
+ zero64 = tf.constant(0, dtype=tf.int64)
+ var0 = tf.Variable(zero64)
+ count_up_to_3 = var0.count_up_to(3)
+ var1 = tf.Variable(zero64)
+ count_up_to_30 = var1.count_up_to(30)
+ queue = tf.FIFOQueue(10, tf.float32)
+ qr = tf.train.QueueRunner(queue, [count_up_to_3, count_up_to_30])
+ threads = qr.create_threads(sess)
+ tf.initialize_all_variables().run()
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+ self.assertEqual(0, len(qr.exceptions_raised))
+ self.assertEqual(3, var0.eval())
+ self.assertEqual(30, var1.eval())
+
+ def testExceptionsCaptured(self):
+ with self.test_session() as sess:
+ queue = tf.FIFOQueue(10, tf.float32)
+ qr = tf.train.QueueRunner(queue, ["i fail", "so fail"])
+ threads = qr.create_threads(sess)
+ tf.initialize_all_variables().run()
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join()
+ exceptions = qr.exceptions_raised
+ self.assertEqual(2, len(exceptions))
+ self.assertTrue("Operation not in the graph" in str(exceptions[0]))
+ self.assertTrue("Operation not in the graph" in str(exceptions[1]))
+
+ def testRealDequeueEnqueue(self):
+ with self.test_session() as sess:
+ q0 = tf.FIFOQueue(3, tf.float32)
+ enqueue0 = q0.enqueue((10.0,))
+ close0 = q0.close()
+ q1 = tf.FIFOQueue(30, tf.float32)
+ enqueue1 = q1.enqueue((q0.dequeue(),))
+ dequeue1 = q1.dequeue()
+ qr = tf.train.QueueRunner(q1, [enqueue1])
+ threads = qr.create_threads(sess)
+ for t in threads:
+ t.start()
+ # Enqueue 2 values, then close queue0.
+ enqueue0.run()
+ enqueue0.run()
+ close0.run()
+ # Wait for the queue runner to terminate.
+ for t in threads:
+ t.join()
+ # It should have terminated cleanly.
+ self.assertEqual(0, len(qr.exceptions_raised))
+ # The 2 values should be in queue1.
+ self.assertEqual(10.0, dequeue1.eval())
+ self.assertEqual(10.0, dequeue1.eval())
+ # And queue1 should now be closed.
+ with self.assertRaisesRegexp(tf.errors.OutOfRangeError, "is closed"):
+ dequeue1.eval()
+
+ def testRespectCoordShouldStop(self):
+ with self.test_session() as sess:
+ # CountUpTo will raise OUT_OF_RANGE when it reaches the count.
+ zero64 = tf.constant(0, dtype=tf.int64)
+ var = tf.Variable(zero64)
+ count_up_to = var.count_up_to(3)
+ queue = tf.FIFOQueue(10, tf.float32)
+ tf.initialize_all_variables().run()
+ qr = tf.train.QueueRunner(queue, [count_up_to])
+ # Ask the coordinator to stop. The queue runner should
+ # finish immediately.
+ coord = tf.train.Coordinator()
+ coord.request_stop()
+ threads = qr.create_threads(sess, coord)
+ for t in threads:
+ t.start()
+ coord.join(threads)
+ self.assertEqual(0, len(qr.exceptions_raised))
+ # The variable should be 0.
+ self.assertEqual(0, var.eval())
+
+ def testRequestStopOnException(self):
+ with self.test_session() as sess:
+ queue = tf.FIFOQueue(10, tf.float32)
+ qr = tf.train.QueueRunner(queue, ["not an op"])
+ coord = tf.train.Coordinator()
+ threads = qr.create_threads(sess, coord)
+ for t in threads:
+ t.start()
+ # The exception should be re-raised when joining.
+ with self.assertRaisesRegexp(ValueError, "Operation not in the graph"):
+ coord.join(threads)
+
+ def testGracePeriod(self):
+ with self.test_session() as sess:
+ # The enqueue will quickly block.
+ queue = tf.FIFOQueue(2, tf.float32)
+ enqueue = queue.enqueue((10.0,))
+ dequeue = queue.dequeue()
+ qr = tf.train.QueueRunner(queue, [enqueue])
+ coord = tf.train.Coordinator()
+ threads = qr.create_threads(sess, coord, start=True)
+ # Dequeue one element and then request stop.
+ dequeue.op.run()
+ time.sleep(0.02)
+ coord.request_stop()
+ # We should be able to join because the RequestStop() will cause
+ # the queue to be closed and the enqueue to terminate.
+ coord.join(threads, stop_grace_period_secs=0.05)
+
+ def testNoMultiThreads(self):
+ with self.test_session() as sess:
+ # CountUpTo will raise OUT_OF_RANGE when it reaches the count.
+ zero64 = tf.constant(0, dtype=tf.int64)
+ var = tf.Variable(zero64)
+ count_up_to = var.count_up_to(3)
+ queue = tf.FIFOQueue(10, tf.float32)
+ tf.initialize_all_variables().run()
+ coord = tf.train.Coordinator()
+ qr = tf.train.QueueRunner(queue, [count_up_to])
+ threads = []
+ threads.extend(qr.create_threads(sess, coord=coord))
+ with self.assertRaisesRegexp(
+ RuntimeError,
+ "Threads are already running"):
+ threads.extend(qr.create_threads(sess, coord=coord))
+ coord.request_stop()
+ coord.join(threads, stop_grace_period_secs=0.5)
+
+ def testThreads(self):
+ with self.test_session() as sess:
+ # CountUpTo will raise OUT_OF_RANGE when it reaches the count.
+ zero64 = tf.constant(0, dtype=tf.int64)
+ var = tf.Variable(zero64)
+ count_up_to = var.count_up_to(3)
+ queue = tf.FIFOQueue(10, tf.float32)
+ tf.initialize_all_variables().run()
+ qr = tf.train.QueueRunner(queue, [count_up_to, "bad op"])
+ threads = qr.create_threads(sess, start=True)
+ for t in threads:
+ t.join()
+ exceptions = qr.exceptions_raised
+ self.assertEqual(1, len(exceptions))
+ self.assertTrue("Operation not in the graph" in str(exceptions[0]))
+
+ threads = qr.create_threads(sess, start=True)
+ for t in threads:
+ t.join()
+ exceptions = qr.exceptions_raised
+ self.assertEqual(1, len(exceptions))
+ self.assertTrue("Operation not in the graph" in str(exceptions[0]))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/rmsprop.py b/tensorflow/python/training/rmsprop.py
new file mode 100644
index 0000000000..6dc0ce11ea
--- /dev/null
+++ b/tensorflow/python/training/rmsprop.py
@@ -0,0 +1,81 @@
+"""One-line documentation for rmsprop module.
+
+rmsprop algorithm [tieleman2012rmsprop]
+
+A detailed description of rmsprop.
+
+- maintain a moving (discounted) average of the square of gradients
+- divide gradient by the root of this average
+
+mean_square = decay * mean_square{t-1} + (1-decay) * gradient ** 2
+mom = momentum * mom{t-1} + learning_rate * g_t / sqrt(mean_square + epsilon)
+delta = - mom
+
+"""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.training import optimizer
+from tensorflow.python.training import training_ops
+
+
+class RMSPropOptimizer(optimizer.Optimizer):
+ """Optimizer that implements the RMSProp algorithm.
+
+ @@__init__
+ """
+
+ def __init__(self, learning_rate, decay, momentum=0.0, epsilon=1e-10,
+ use_locking=False, name="RMSProp"):
+ """Construct a new RMSProp optimizer.
+
+ Args:
+ learning_rate: A Tensor or a floating point value. The learning rate.
+ decay: Discounting factor for the history/coming gradient.
+ momentum: A scalar tensor.
+ epsilon: Small value to avoid zero denominator.
+ use_locking: If True use locks for update operation.
+ name: Optional name prefix for the operations created when applying
+ gradients. Defaults to "RMSProp".
+ """
+ super(RMSPropOptimizer, self).__init__(use_locking, name)
+ self._learning_rate = learning_rate
+ self._decay = decay
+ self._momentum = momentum
+ self._epsilon = epsilon
+
+ # Tensors for learning rate and momentum. Created in _prepare.
+ self._learning_rate_tensor = None
+ self._decay_tensor = None
+ self._momentum_tensor = None
+ self._epsilon_tensor = None
+
+ def _create_slots(self, var_list):
+ for v in var_list:
+ self._get_or_make_slot(
+ v, constant_op.constant(1.0, dtype=v.dtype, shape=v.get_shape()),
+ "rms", self._name)
+ self._zeros_slot(v, "momentum", self._name)
+
+ def _prepare(self):
+ self._learning_rate_tensor = ops.convert_to_tensor(self._learning_rate,
+ name="learning_rate")
+ self._decay_tensor = ops.convert_to_tensor(self._decay, name="decay")
+ self._momentum_tensor = ops.convert_to_tensor(self._momentum,
+ name="momentum")
+ self._epsilon_tensor = ops.convert_to_tensor(self._epsilon,
+ name="epsilon")
+
+ def _apply_dense(self, grad, var):
+ rms = self.get_slot(var, "rms")
+ mom = self.get_slot(var, "momentum")
+ return training_ops.apply_rms_prop(
+ var, rms, mom,
+ self._learning_rate_tensor,
+ self._decay_tensor,
+ self._momentum_tensor,
+ self._epsilon_tensor,
+ grad, use_locking=self._use_locking).op
+
+ def _apply_sparse(self, grad, var):
+ raise NotImplementedError()
diff --git a/tensorflow/python/training/rmsprop_test.py b/tensorflow/python/training/rmsprop_test.py
new file mode 100644
index 0000000000..520df73ca8
--- /dev/null
+++ b/tensorflow/python/training/rmsprop_test.py
@@ -0,0 +1,158 @@
+"""Tests for rmsprop."""
+import math
+
+import tensorflow.python.platform
+
+import numpy as np
+import tensorflow as tf
+
+
+class RMSPropOptimizerTest(tf.test.TestCase):
+
+ def testWithoutMomentum(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+ opt = tf.train.RMSPropOptimizer(learning_rate=2.0, decay=0.9,
+ momentum=0.0, epsilon=1.0)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ rms0 = opt.get_slot(var0, "rms")
+ self.assertTrue(rms0 is not None)
+ rms1 = opt.get_slot(var1, "rms")
+ self.assertTrue(rms1 is not None)
+ mom0 = opt.get_slot(var0, "momentum")
+ self.assertTrue(mom0 is not None)
+ mom1 = opt.get_slot(var1, "momentum")
+ self.assertTrue(mom1 is not None)
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Step 1: the rms accumulators were 1. So we should see a normal
+ # update: v -= grad * learning_rate
+ update.run()
+ # Check the root mean square accumulators.
+ self.assertAllClose(np.array([0.901, 0.901]), rms0.eval())
+ self.assertAllClose(np.array([0.90001, 0.90001]), rms1.eval())
+ # Check the parameters.
+ self.assertAllClose(np.array([1.0 - (0.1 * 2.0 / math.sqrt(0.901+1.0)),
+ 2.0 - (0.1 * 2.0 / math.sqrt(0.901+1.0))]),
+ var0.eval())
+ self.assertAllClose(np.array([3.0 - (0.01 * 2.0
+ / math.sqrt(0.90001+1.0)),
+ 4.0 - (0.01 * 2.0
+ / math.sqrt(0.90001+1.0))]),
+ var1.eval())
+ # Step 2: the root mean square accumulators contain the previous update.
+ update.run()
+ # Check the rms accumulators.
+ self.assertAllClose(np.array([0.901*0.9+0.001, 0.901*0.9+0.001]),
+ rms0.eval())
+ self.assertAllClose(np.array([0.90001*0.9+1e-5, 0.90001*0.9+1e-5]),
+ rms1.eval())
+ # Check the parameters.
+ self.assertAllClose(
+ np.array([1.0 - (0.1 * 2.0 / math.sqrt(0.901+1.0))
+ - (0.1 * 2.0 / math.sqrt(0.901*0.9+0.001+1.0)),
+ 2.0 - (0.1 * 2.0 / math.sqrt(0.901+1.0))
+ - (0.1 * 2.0 / math.sqrt(0.901*0.9+0.001+1.0))]),
+ var0.eval())
+ self.assertAllClose(np.array([3.0 - (0.01 * 2.0 / math.sqrt(0.90001+1.0))
+ - (0.01 * 2.0 /
+ math.sqrt(0.90001*0.9+1e-5+1.0)),
+ 4.0 - (0.01 * 2.0 / math.sqrt(0.90001+1.0))
+ - (0.01 * 2.0 /
+ math.sqrt(0.90001*0.9+1e-5+1.0))]),
+ var1.eval())
+
+ def testWithMomentum(self):
+ with self.test_session():
+ var0 = tf.Variable([1.0, 2.0])
+ var1 = tf.Variable([3.0, 4.0])
+ grads0 = tf.constant([0.1, 0.1])
+ grads1 = tf.constant([0.01, 0.01])
+
+ opt = tf.train.RMSPropOptimizer(learning_rate=2.0, decay=0.9,
+ momentum=0.5, epsilon=1e-5)
+ update = opt.apply_gradients(zip([grads0, grads1], [var0, var1]))
+ tf.initialize_all_variables().run()
+
+ rms0 = opt.get_slot(var0, "rms")
+ self.assertTrue(rms0 is not None)
+ rms1 = opt.get_slot(var1, "rms")
+ self.assertTrue(rms1 is not None)
+ mom0 = opt.get_slot(var0, "momentum")
+ self.assertTrue(mom0 is not None)
+ mom1 = opt.get_slot(var1, "momentum")
+ self.assertTrue(mom1 is not None)
+
+ # Fetch params to validate initial values
+ self.assertAllClose([1.0, 2.0], var0.eval())
+ self.assertAllClose([3.0, 4.0], var1.eval())
+ # Step 1: rms = 1, mom = 0. So we should see a normal
+ # update: v -= grad * learning_rate
+ update.run()
+ # Check the root mean square accumulators.
+ self.assertAllClose(np.array([0.901, 0.901]), rms0.eval())
+ self.assertAllClose(np.array([0.90001, 0.90001]), rms1.eval())
+ # Check the momentum accumulators
+ self.assertAllClose(np.array([(0.1 * 2.0 / math.sqrt(0.901+1e-5)),
+ (0.1 * 2.0 / math.sqrt(0.901+1e-5))]),
+ mom0.eval())
+ self.assertAllClose(np.array([(0.01 * 2.0/ math.sqrt(0.90001+1e-5)),
+ (0.01 * 2.0/ math.sqrt(0.90001+1e-5))]),
+ mom1.eval())
+
+ # Check the parameters.
+ self.assertAllClose(np.array([1.0 - (0.1 * 2.0 / math.sqrt(0.901+1e-5)),
+ 2.0 - (0.1 * 2.0 / math.sqrt(0.901+1e-5))]),
+ var0.eval())
+ self.assertAllClose(np.array([3.0 - (0.01 * 2.0/ math.sqrt(0.90001+1e-5)),
+ 4.0 - (0.01 * 2.0/ math.sqrt(0.90001+1e-5))]
+ ),
+ var1.eval())
+
+ # Step 2: the root mean square accumulators contain the previous update.
+ update.run()
+ # Check the rms accumulators.
+ self.assertAllClose(np.array([0.901*0.9+0.001, 0.901*0.9+0.001]),
+ rms0.eval())
+ self.assertAllClose(np.array([0.90001*0.9+1e-5, 0.90001*0.9+1e-5]),
+ rms1.eval())
+ self.assertAllClose(np.array([0.5 * (0.1 * 2.0 / math.sqrt(0.901+1e-5)) +
+ (0.1*2.0/math.sqrt(0.901*0.9+0.001+1e-5)),
+ 0.5 * (0.1 * 2.0 / math.sqrt(0.901+1e-5)) +
+ (0.1*2.0/math.sqrt(0.901*0.9+0.001+1e-5))
+ ]), mom0.eval())
+ self.assertAllClose(np.array([0.5 *(0.01 * 2.0/ math.sqrt(0.90001+1e-5))+
+ (0.01 * 2.0 /math.sqrt(0.90001*0.9+2e-5)),
+ 0.5 *(0.01 * 2.0/ math.sqrt(0.90001+1e-5))+
+ (0.01 * 2.0 / math.sqrt(0.90001*0.9+2e-5))
+ ]), mom1.eval())
+
+ # Check the parameters.
+ self.assertAllClose(
+ np.array([1.0 - (0.1 * 2.0 / math.sqrt(0.901+1e-5)) - (0.5 * (
+ 0.1 * 2.0 / math.sqrt(0.901+1e-5)) +(
+ 0.1 * 2.0 / math.sqrt(0.901*0.9+0.001+1e-5))),
+ 2.0 - (0.1 * 2.0 / math.sqrt(0.901+1e-5)) - (0.5 * (
+ 0.1 * 2.0 / math.sqrt(0.901+1e-5)) +(
+ 0.1 * 2.0 / math.sqrt(0.901*0.9+0.001+1e-5)))
+ ]), var0.eval())
+
+ self.assertAllClose(
+ np.array([3.0 - (0.01 * 2.0 / math.sqrt(0.90001+1e-5))
+ - (0.5 *(0.01 * 2.0/ math.sqrt(0.90001+1e-5)) +
+ (0.01 * 2.0 /math.sqrt(0.90001*0.9+2e-5))),
+ 4.0 - (0.01 * 2.0 / math.sqrt(0.90001+1e-5))
+ - (0.5 *(0.01 * 2.0/ math.sqrt(0.90001+1e-5)) +
+ (0.01 * 2.0 / math.sqrt(0.90001*0.9+2e-5)))]),
+ var1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/saver.proto b/tensorflow/python/training/saver.proto
new file mode 100644
index 0000000000..b9ba9f7e3c
--- /dev/null
+++ b/tensorflow/python/training/saver.proto
@@ -0,0 +1,30 @@
+syntax = "proto3";
+
+package tensorflow;
+// option cc_enable_arenas = true;
+
+// Protocol buffer representing the configuration of a SaveRestoreHelper.
+message SaverDef {
+ // The name of the tensor in which to specify the filename when saving or
+ // restoring a model checkpoint.
+ string filename_tensor_name = 1;
+
+ // The operation to run when saving a model checkpoint.
+ string save_tensor_name = 2;
+
+ // The operation to run when restoring a model checkpoint.
+ string restore_op_name = 3;
+
+ // Maximum number of checkpoints to keep. If 0, no checkpoints are deleted.
+ int32 max_to_keep = 4;
+
+ // Shard the save files, one per device that has Parameters nodes.
+ bool sharded = 5;
+
+ // How often to keep an additional checkpoint. If not specified, only the
+ // last "max_to_keep" checkpoints are kept; if specified, in addition to
+ // keeping the last "max_to_keep" checkpoints, an additional checkpoint
+ // will be kept for every n hours of training.
+ float keep_checkpoint_every_n_hours = 6;
+}
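The fields above are produced and consumed by the Python `Saver` added in the next file; a minimal sketch of round-tripping a `SaverDef` (the variable name is illustrative):

```python
import tensorflow as tf

v = tf.Variable(1.0, name="v")
saver = tf.train.Saver({"v": v})

# Export the SaverDef proto describing the save/restore ops just added.
saver_def = saver.as_saver_def()
print(saver_def)

# Recreate an equivalent Saver for the same graph without rebuilding the ops.
saver_again = tf.train.Saver(saver_def=saver_def)
```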
diff --git a/tensorflow/python/training/saver.py b/tensorflow/python/training/saver.py
new file mode 100644
index 0000000000..505bbad4c6
--- /dev/null
+++ b/tensorflow/python/training/saver.py
@@ -0,0 +1,887 @@
+# pylint: disable=invalid-name
+"""Save and restore variables."""
+import collections
+import numbers
+import os.path
+import time
+
+from google.protobuf import text_format
+
+from tensorflow.python.client import graph_util
+from tensorflow.python.client import session
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import array_ops
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import gen_array_ops
+from tensorflow.python.ops import gen_io_ops
+from tensorflow.python.ops import io_ops
+from tensorflow.python.ops import state_ops
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import gfile
+from tensorflow.python.platform import logging
+from tensorflow.python.training import saver_pb2
+from tensorflow.python.training import training_util
+from tensorflow.python.training.checkpoint_state_pb2 import CheckpointState
+
+
+class BaseSaverBuilder(object):
+ """Base class for Savers.
+
+ Can be extended to create different Ops.
+ """
+
+ class VarToSave(object):
+ """Class used to describe variable slices that need to be saved."""
+
+ def __init__(self, var, slice_spec, name):
+ self.var = var
+ self.slice_spec = slice_spec
+ self.name = name
+
+ def __init__(self):
+ pass
+
+ def save_op(self, filename_tensor, vars_to_save):
+ """Create an Op to save 'vars_to_save'.
+
+ This is intended to be overridden by subclasses that want to generate
+ different Ops.
+
+ Args:
+ filename_tensor: String Tensor.
+ vars_to_save: a list of BaseSaverBuilder.VarToSave objects.
+
+ Returns:
+ An Operation that saves the variables.
+ """
+ return io_ops._save(
+ filename=filename_tensor,
+ tensor_names=[vs.name for vs in vars_to_save],
+ tensors=[vs.var for vs in vars_to_save],
+ tensor_slices=[vs.slice_spec for vs in vars_to_save])
+
+ def restore_op(self, filename_tensor, var_to_save, preferred_shard):
+ """Create an Op to read the variable 'var_to_save'.
+
+ This is intended to be overridden by subclasses that want to generate
+ different Ops.
+
+ Args:
+ filename_tensor: String Tensor.
+ var_to_save: a BaseSaverBuilder.VarToSave object.
+ preferred_shard: Int. Shard to open first when loading a sharded file.
+
+ Returns:
+ A Tensor resulting from reading 'var_to_save' from 'filename'.
+ """
+ return io_ops._restore_slice(
+ filename_tensor,
+ var_to_save.name,
+ var_to_save.slice_spec,
+ var_to_save.var.dtype,
+ preferred_shard=preferred_shard)
+
+ def sharded_filename(self, filename_tensor, shard, num_shards):
+ """Append sharding information to a filename.
+
+ Args:
+ filename_tensor: a string tensor.
+ shard: integer. The shard for the filename.
+ num_shards: an int Tensor for the number of shards.
+
+ Returns:
+ A string tensor.
+ """
+ return gen_io_ops._sharded_filename(filename_tensor, shard, num_shards)
+
+ def _AddSaveOps(self, filename_tensor, vars_to_save):
+ """Add ops to save variables that are on the same shard.
+
+ Args:
+ filename_tensor: String Tensor.
+ vars_to_save: a list of _VarToSave objects.
+
+ Returns:
+ A tensor with the filename used to save.
+ """
+ save = self.save_op(filename_tensor, vars_to_save)
+ return control_flow_ops.with_dependencies([save], filename_tensor)
+
+ def _AddShardedSaveOps(self, filename_tensor, per_device):
+ """Add ops to save the params per shard.
+
+ Args:
+ filename_tensor: String Tensor.
+ per_device: A list of (device, BaseSaverBuilder.VarToSave) pairs, as
+ returned by _GroupByDevices().
+
+ Returns:
+ An op to save the variables.
+ """
+ num_shards = len(per_device)
+ sharded_saves = []
+ num_shards_tensor = constant_op.constant(num_shards, name="num_shards")
+ for shard, (device, vars_to_save) in enumerate(per_device):
+ with ops.device(device):
+ sharded_filename = self.sharded_filename(
+ filename_tensor, shard, num_shards_tensor)
+ sharded_saves.append(self._AddSaveOps(sharded_filename, vars_to_save))
+ # Return the sharded name for the save path.
+ with ops.control_dependencies([x.op for x in sharded_saves]):
+ return gen_io_ops._sharded_filespec(filename_tensor, num_shards_tensor)
+
+ def _AddRestoreOps(self,
+ filename_tensor,
+ vars_to_save,
+ restore_sequentially,
+ reshape,
+ preferred_shard=-1,
+ name="restore_all"):
+ """Add operations to restore vars_to_save.
+
+ Args:
+ filename_tensor: Tensor for the path of the file to load.
+ vars_to_save: a list of _VarToSave objects.
+ restore_sequentially: True if we want to restore variables sequentially
+ within a shard.
+ reshape: True if we want to reshape loaded tensors to the shape of
+ the corresponding variable.
+ preferred_shard: Shard to open first when loading a sharded file.
+ name: Name for the returned op.
+
+ Returns:
+ An Operation that restores the variables.
+ """
+ assign_ops = []
+ for vs in vars_to_save:
+ v = vs.var
+ restore_control_inputs = assign_ops[-1:] if restore_sequentially else []
+ # Load and optionally reshape on the CPU, as string tensors are not
+ # available on the GPU.
+ # TODO(mdevin): Re-enable restore on GPU when we can support annotating
+ # string tensors as "HostMemory" inputs.
+ with ops.device(graph_util.set_cpu0(v.device) if v.device else None):
+ with ops.control_dependencies(restore_control_inputs):
+ values = self.restore_op(filename_tensor, vs, preferred_shard)
+ if reshape:
+ shape = v.get_shape()
+ if not shape.is_fully_defined():
+ shape = array_ops.shape(v)
+ values = array_ops.reshape(values, shape)
+
+ # Assign on the same device as the variable.
+ with ops.device(v.device):
+ assign_ops.append(state_ops.assign(v,
+ values,
+ validate_shape=not reshape))
+
+ # Create a Noop that has control dependencies from all the updates.
+ return control_flow_ops.group(*assign_ops, name=name)
+
+ def _AddShardedRestoreOps(self, filename_tensor, per_device,
+ restore_sequentially, reshape):
+ """Add Ops to save variables from multiple devices.
+
+ Args:
+ filename_tensor: Tensor for the path of the file to load.
+ per_device: A list of (device, _VarToSave) pairs, as
+ returned by _GroupByDevices().
+ restore_sequentially: True if we want to restore variables sequentially
+ within a shard.
+ reshape: True if we want to reshape loaded tensors to the shape of
+ the corresponding variable.
+
+ Returns:
+ An Operation that restores the variables.
+ """
+ sharded_restores = []
+ for shard, (device, vars_to_save) in enumerate(per_device):
+ with ops.device(device):
+ sharded_restores.append(self._AddRestoreOps(
+ filename_tensor,
+ vars_to_save,
+ restore_sequentially,
+ reshape,
+ preferred_shard=shard,
+ name="restore_shard"))
+ return control_flow_ops.group(*sharded_restores, name="restore_all")
+
+ def _IsVariable(self, v):
+ return isinstance(v, ops.Tensor) and (
+ v.op.type == "Variable" or v.op.type == "AutoReloadVariable")
+
+ def _GroupByDevices(self, vars_to_save):
+ """Group Variable tensor slices per device.
+
+ TODO(mdevin): Make sure that all the devices found are on different
+ job/replica/task/cpu|gpu. It would be bad if 2 were on the same device.
+ It can happen if the devices are unspecified.
+
+ Args:
+ vars_to_save: a list of BaseSaverBuilder.VarToSave objects.
+
+ Returns:
+ A list of (device_name, BaseSaverBuilder.VarToSave) tuples.
+ The list is sorted by ascending device_name.
+ """
+ per_device = collections.defaultdict(lambda: [])
+ for var_to_save in vars_to_save:
+ per_device[var_to_save.var.device].append(var_to_save)
+ return sorted([(dev, tup) for dev, tup in per_device.iteritems()],
+ key=lambda t: t[0])
+
+ def _VarListToDict(self, var_list):
+ """Create a dictionary of names to variable lists.
+
+ Args:
+ var_list: A list, tuple, or set of Variables.
+
+ Returns:
+ A dictionary of variable names to the variables that must be saved under
+ that name. Variables with save_slice_info are grouped together under the
+ same key in no particular order.
+
+ Raises:
+ TypeError: If the type of var_list or its elements is not supported.
+ ValueError: If at least two variables share the same name.
+ """
+ if not isinstance(var_list, (list, tuple, set)):
+ raise TypeError("Variables to save should be passed in a dict or a "
+ "list: %s" % var_list)
+ var_list = set(var_list)
+ names_to_variables = {}
+ for var in var_list:
+ # pylint: disable=protected-access
+ if isinstance(var, variables.Variable) and var._save_slice_info:
+ name = var._save_slice_info.name
+ if name in names_to_variables:
+ if not isinstance(names_to_variables[name], list):
+ raise ValueError("Mixing slices and non-slices with the same name: "
+ "%s" % name)
+ names_to_variables[name].append(var)
+ else:
+ names_to_variables[name] = [var]
+ else:
+ var = ops.convert_to_tensor(var)
+ if not self._IsVariable(var):
+ raise TypeError("Variable to save is not a Variable: %s" % var)
+ name = var.op.name
+ if name in names_to_variables:
+ raise ValueError("At least two variables have the same name: %s" %
+ name)
+ names_to_variables[name] = var
+ # pylint: enable=protected-access
+ return names_to_variables
+
+ def _ValidateAndSliceInputs(self, names_to_variables):
+ """Returns the variables and names that will be used for a Saver.
+
+ Args:
+ names_to_variables: A dict (k, v) where k is the name of a variable and v
+ is a Variable to save or a BaseSaverBuilder.Saver.
+
+ Returns:
+ A list of BaseSaverBuilder.VarToSave objects.
+
+ Raises:
+ TypeError: if any of the keys are not strings or any of the
+ values are not one of Tensor or Variable.
+ ValueError: if the same variable is given in more than one value
+ (this also applies to slices of SlicedVariables).
+ """
+ if not isinstance(names_to_variables, dict):
+ names_to_variables = self._VarListToDict(names_to_variables)
+
+ vars_to_save = []
+ seen_variables = set()
+ for name in sorted(names_to_variables.iterkeys()):
+ if not isinstance(name, basestring):
+ raise TypeError("names_to_variables must be a dict mapping string "
+ "names to variable Tensors. Name is not a string: %s" %
+ name)
+ v = names_to_variables[name]
+ if isinstance(v, (list, tuple)):
+ # A set of slices.
+ slice_name = None
+ # pylint: disable=protected-access
+ for variable in v:
+ if not isinstance(variable, variables.Variable):
+ raise ValueError("Slices must all be Variables: %s" % variable)
+ if not variable._save_slice_info:
+ raise ValueError("Slices must all be slices: %s" % variable)
+ if slice_name is None:
+ slice_name = variable._save_slice_info.name
+ elif slice_name != variable._save_slice_info.name:
+ raise ValueError("Slices must all be from the same tensor: %s != %s"
+ % (slice_name, variable._save_slice_info.name))
+ self._AddVarToSave(vars_to_save, seen_variables,
+ variable, variable._save_slice_info.spec, name)
+ # pylint: enable=protected-access
+ else:
+ # A variable or tensor.
+ variable = ops.convert_to_tensor(v)
+ if not self._IsVariable(variable):
+ raise TypeError("names_to_variables must be a dict mapping string "
+ "names to Tensors/Variables. Not a variable: %s" %
+ variable)
+ self._AddVarToSave(vars_to_save, seen_variables, variable, "", name)
+ return vars_to_save
+
+ def _AddVarToSave(self, vars_to_save, seen_variables, variable, slice_spec,
+ name):
+ """Create a VarToSave and add it to the vars_to_save list.
+
+ Args:
+ vars_to_save: List to append the new VarToSave to.
+ seen_variables: Set of variables already processed. Used to check
+ that each variable is only saved once.
+ variable: Variable to save.
+ slice_spec: String. Slice spec for the variable.
+ name: Name to use to save the variable.
+
+ Raises:
+ ValueError: If the variable has already been processed.
+ """
+ if variable in seen_variables:
+ raise ValueError("The same variable will be restored with two names: %s",
+ variable)
+ vars_to_save.append(BaseSaverBuilder.VarToSave(variable, slice_spec, name))
+ seen_variables.add(variable)
+
+ def build(self,
+ names_to_variables,
+ reshape=False,
+ sharded=False,
+ max_to_keep=5,
+ keep_checkpoint_every_n_hours=10000.0,
+ name=None,
+ restore_sequentially=False):
+ """Adds save/restore nodes to the graph and creates a SaverDef proto.
+
+ Args:
+ names_to_variables: A dictionary mapping name to a Variable.
+ Each name will be associated with the
+ corresponding variable in the checkpoint.
+ reshape: If True, allow restoring parameters from a checkpoint
+ where the parameters have a different shape. This is
+ only needed when you try to restore from a Dist-Belief checkpoint,
+ and only sometimes.
+ sharded: If True, shard the checkpoints, one per device that has
+ Parameters nodes.
+ max_to_keep: maximum number of checkpoints to keep. As new checkpoints
+ are created, old ones are deleted. If None or 0, no checkpoints are
+ deleted. Presently the number is only roughly enforced. For example
+ in case of restarts more than max_to_keep checkpoints may be kept.
+ keep_checkpoint_every_n_hours: How often checkpoints should be kept.
+ Defaults to 10,000 hours.
+ name: string. Optional name to use as a prefix when adding operations.
+ restore_sequentially: A Bool, which if true, causes restore of different
+ variables to happen sequentially within each device.
+
+ Returns:
+ A SaverDef proto.
+
+ Raises:
+ TypeError: If 'names_to_variables' is not a dictionary mapping string
+ keys to variable Tensors.
+ ValueError: If any of the keys or values in 'names_to_variables' is not
+ unique.
+ """
+ vars_to_save = self._ValidateAndSliceInputs(names_to_variables)
+ if max_to_keep is None:
+ max_to_keep = 0
+
+ with ops.op_scope([vs.var for vs in vars_to_save], name, "save") as name:
+ # Add the Constant string tensor for the filename.
+ filename_tensor = constant_op.constant("model")
+
+ # Add the save ops.
+ if sharded:
+ per_device = self._GroupByDevices(vars_to_save)
+ save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
+ restore_op = self._AddShardedRestoreOps(
+ filename_tensor, per_device, restore_sequentially, reshape)
+ else:
+ save_tensor = self._AddSaveOps(filename_tensor, vars_to_save)
+ restore_op = self._AddRestoreOps(
+ filename_tensor, vars_to_save, restore_sequentially, reshape)
+
+ assert restore_op.name.endswith("restore_all"), restore_op.name
+
+ return saver_pb2.SaverDef(
+ filename_tensor_name=filename_tensor.name,
+ save_tensor_name=save_tensor.name,
+ restore_op_name=restore_op.name,
+ max_to_keep=max_to_keep,
+ keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
+ sharded=sharded)
+
+def _GetCheckpointFilename(save_dir, latest_filename):
+ """Returns a filename for storing the CheckpointState.
+
+ Args:
+ save_dir: The directory for saving and restoring checkpoints.
+ latest_filename: Name of the file in 'save_dir' that is used
+ to store the CheckpointState.
+
+ Returns:
+ The path of the file that contains the CheckpointState proto.
+ """
+ if latest_filename is None:
+ latest_filename = "checkpoint"
+ return os.path.join(save_dir, latest_filename)
+
+
+def update_checkpoint_state(save_dir,
+ model_checkpoint_path,
+ all_model_checkpoint_paths=None,
+ latest_filename=None):
+ """Updates the content of the 'checkpoint' file.
+
+ This updates the checkpoint file containing a CheckpointState
+ proto.
+
+ Args:
+ save_dir: Directory where the model was saved.
+ model_checkpoint_path: The checkpoint file.
+ all_model_checkpoint_paths: list of strings. Paths to all not-yet-deleted
+ checkpoints, sorted from oldest to newest. If this is a non-empty list,
+ the last element must be equal to model_checkpoint_path. These paths
+ are also saved in the CheckpointState proto.
+ latest_filename: Optional name of the checkpoint file. Defaults to
+ 'checkpoint'.
+
+ Raises:
+ RuntimeError: If the save paths conflict.
+ """
+ if all_model_checkpoint_paths is None:
+ all_model_checkpoint_paths = []
+ elif all_model_checkpoint_paths[-1] != model_checkpoint_path:
+ logging.warning(
+ "%s is not in all_model_checkpoint_paths! Manually adding it.",
+ model_checkpoint_path)
+ all_model_checkpoint_paths.append(model_checkpoint_path)
+ # Writes the "checkpoint" file for the coordinator for later restoration.
+ coord_checkpoint_filename = _GetCheckpointFilename(save_dir, latest_filename)
+ if coord_checkpoint_filename == model_checkpoint_path:
+ raise RuntimeError("Save path '%s' conflicts with path used for "
+ "checkpoint state. Please use a different save path." %
+ model_checkpoint_path)
+ coord_checkpoint_proto = CheckpointState(
+ model_checkpoint_path=model_checkpoint_path,
+ all_model_checkpoint_paths=all_model_checkpoint_paths)
+ f = gfile.FastGFile(coord_checkpoint_filename, mode="w")
+ f.write(text_format.MessageToString(coord_checkpoint_proto))
+ f.close()
+
+
+def get_checkpoint_state(checkpoint_dir, latest_filename=None):
+ """Returns CheckpointState proto from the "checkpoint" file.
+
+ If the "checkpoint" file contains a valid CheckpointState
+ proto, returns it.
+
+ Args:
+ checkpoint_dir: The directory of checkpoints.
+ latest_filename: Optional name of the checkpoint file. Defaults to
+ 'checkpoint'.
+
+ Returns:
+ A CheckpointState if the state was available, None
+ otherwise.
+ """
+ ckpt = None
+ coord_checkpoint_filename = _GetCheckpointFilename(
+ checkpoint_dir, latest_filename)
+ f = None
+ try:
+ # Check that the file exists before opening it to avoid
+ # many lines of errors from colossus in the logs.
+ if gfile.Exists(coord_checkpoint_filename):
+ f = gfile.FastGFile(coord_checkpoint_filename, mode="r")
+ ckpt = CheckpointState()
+ text_format.Merge(f.read(), ckpt)
+ except gfile.FileError:
+ # It's ok if the file cannot be read
+ return None
+ except text_format.ParseError, e:
+ logging.warning(str(e))
+ logging.warning("%s: Checkpoint ignored", coord_checkpoint_filename)
+ return None
+ finally:
+ if f:
+ f.close()
+ return ckpt
+
+
+class Saver(object):
+ """Saves and restores variables.
+
+ See [Variables](../../how_tos/variables/index.md)
+ for an overview of variables, saving and restoring.
+
+ The `Saver` class adds ops to save and restore variables to and from
+ *checkpoints*. It also provides convenience methods to run these ops.
+
+ Checkpoints are binary files in a proprietary format which map variable names
+ to tensor values. The best way to examine the contents of a checkpoint is to
+ load it using a `Saver`.
+
+ Savers can automatically number checkpoint filenames with a provided counter.
+ This lets you keep multiple checkpoints at different steps while training a
+ model. For example you can number the checkpoint filenames with the training
+ step number. To avoid filling up disks, savers manage checkpoint files
+ automatically. For example, they can keep only the N most recent files, or
+ one checkpoint for every N hours of training.
+
+ You number checkpoint filenames by passing a value to the optional
+ `global_step` argument to `save()`:
+
+ ```python
+ saver.save(sess, 'my-model', global_step=0) ==> filename: 'my-model-0'
+ ...
+ saver.save(sess, 'my-model', global_step=1000) ==> filename: 'my-model-1000'
+ ```
+
+ Additionally, optional arguments to the `Saver()` constructor let you control
+ the proliferation of checkpoint files on disk:
+
+ * `max_to_keep` indicates the maximum number of recent checkpoint files to
+ keep. As new files are created, older files are deleted. If None or 0,
+ all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent
+ checkpoint files are kept).
+
+ * `keep_checkpoint_every_n_hours`: In addition to keeping the most recent
+ `max_to_keep` checkpoint files, you might want to keep one checkpoint file
+ for every N hours of training. This can be useful if you want to later
+ analyze how a model progressed during a long training session. For
+ example, passing `keep_checkpoint_every_n_hours=2` ensures that you keep
+ one checkpoint file for every 2 hours of training. The default value of
+ 10,000 hours effectively disables the feature.
+
+ Note that you still have to call the `save()` method to save the model.
+ Passing these arguments to the constructor will not save variables
+ automatically for you.
+
+ A training program that saves regularly looks like:
+
+ ```python
+ ...
+ # Create a saver.
+ saver = tf.train.Saver(...variables...)
+ # Launch the graph and train, saving the model every 1,000 steps.
+ sess = tf.Session()
+ for step in xrange(1000000):
+ sess.run(..training_op..)
+ if step % 1000 == 0:
+ # Append the step number to the checkpoint name:
+ saver.save(sess, 'my-model', global_step=step)
+ ```
+
+ In addition to checkpoint files, savers keep a protocol buffer on disk with
+ the list of recent checkpoints. This is used to manage numbered checkpoint
+ files and by `latest_checkpoint()`, which makes it easy to discover the path
+ to the most recent checkpoint. That protocol buffer is stored in a file named
+ 'checkpoint' next to the checkpoint files.
+
+ If you create several savers, you can specify a different filename for the
+ protocol buffer file in the call to `save()`.
+
+ @@__init__
+ @@save
+ @@restore
+
+ Other utility methods.
+
+ @@last_checkpoints
+ @@set_last_checkpoints
+ @@as_saver_def
+ """
+
+ def __init__(self,
+ var_list=None,
+ reshape=False,
+ sharded=False,
+ max_to_keep=5,
+ keep_checkpoint_every_n_hours=10000.0,
+ name=None,
+ restore_sequentially=False,
+ saver_def=None,
+ builder=None):
+ """Creates a `Saver`.
+
+ The constructor adds ops to save and restore variables.
+
+ `var_list` specifies the variables that will be saved and restored. It can
+ be passed as a `dict` or a list:
+
+ * A `dict` of names to variables: The keys are the names that will be
+ used to save or restore the variables in the checkpoint files.
+ * A list of variables: The variables will be keyed with their op name in
+ the checkpoint files.
+
+ For example:
+
+ ```python
+ v1 = tf.Variable(..., name='v1')
+ v2 = tf.Variable(..., name='v2')
+
+ # Pass the variables as a dict:
+ saver = tf.train.Saver({'v1': v1, 'v2': v2})
+
+ # Or pass them as a list.
+ saver = tf.train.Saver([v1, v2])
+ # Passing a list is equivalent to passing a dict with the variable op names
+ # as keys:
+ saver = tf.train.Saver({v.op.name: v for v in [v1, v2]})
+ ```
+
+ The optional `reshape` argument, if True, allows restoring a variable from
+ a save file where the variable had a different shape, but the same number
+ of elements and type. This is useful if you have reshaped a variable and
+ want to reload it from an older checkpoint.
+
+ The optional `sharded` argument, if True, instructs the saver to shard
+ checkpoints per device.
+
+ Args:
+ var_list: A list of Variables or a dictionary mapping names to
+ Variables. If None, defaults to the list of all variables.
+ reshape: If True, allows restoring parameters from a checkpoint
+ where the variables have a different shape.
+ sharded: If True, shard the checkpoints, one per device.
+ max_to_keep: maximum number of recent checkpoints to keep.
+ Defaults to 5.
+ keep_checkpoint_every_n_hours: How often to keep checkpoints.
+ Defaults to 10,000 hours.
+ name: string. Optional name to use as a prefix when adding operations.
+ restore_sequentially: A Bool, which if true, causes restore of different
+ variables to happen sequentially within each device. This can lower
+ memory usage when restoring very large models.
+ saver_def: Optional SaverDef proto to use instead of running the builder.
+ This is only useful for specialty code that wants to recreate a Saver
+ object for a previously built Graph that had a Saver. The saver_def
+ proto should be the one returned by the as_saver_def() call of the
+ Saver that was created for that Graph.
+ builder: Optional SaverBuilder to use if a saver_def was not provided.
+ Defaults to BaseSaverBuilder().
+
+ Raises:
+ TypeError: If `var_list` is invalid.
+ ValueError: If any of the keys or values in `var_list` is not unique.
+ """
+ if saver_def is None:
+ if builder is None:
+ builder = BaseSaverBuilder()
+ if var_list is None:
+ var_list = variables.all_variables()
+ if not var_list:
+ raise ValueError("No variables to save")
+ saver_def = builder.build(
+ var_list,
+ reshape=reshape,
+ sharded=sharded,
+ max_to_keep=max_to_keep,
+ keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
+ name=name,
+ restore_sequentially=restore_sequentially)
+ if not isinstance(saver_def, saver_pb2.SaverDef):
+ raise ValueError("saver_def must if a saver_pb2.SaverDef: %s" % saver_def)
+ if not saver_def.save_tensor_name:
+ raise ValueError("saver_def must specify the save_tensor_name: %s"
+ % str(saver_def))
+ if not saver_def.restore_op_name:
+ raise ValueError("saver_def must specify the restore_op_name: %s"
+ % str(saver_def))
+ self._filename_tensor_name = saver_def.filename_tensor_name
+ self._save_tensor_name = saver_def.save_tensor_name
+ self._restore_op_name = saver_def.restore_op_name
+ self._max_to_keep = saver_def.max_to_keep
+ # If keep_checkpoint_every_n_hours is not set, set it to 10000 hours.
+ self._keep_checkpoint_every_n_hours = (
+ saver_def.keep_checkpoint_every_n_hours if
+ saver_def.keep_checkpoint_every_n_hours else 10000)
+ self._next_checkpoint_time = (
+ time.time() + self._keep_checkpoint_every_n_hours * 3600)
+ self._sharded = saver_def.sharded
+ self._last_checkpoints = []
+
+ def _CheckpointFilename(self, p):
+ """Returns the checkpoint file name.
+
+ If p is (filename, time) pair, return p[0]; else return p.
+
+ Args:
+ p: (filename, time) pair or just checkpoint filename.
+
+ Returns:
+ Checkpoint file name.
+ """
+ return p[0] if isinstance(p, tuple) else p
+
+ def _MaybeDeleteOldCheckpoints(self, latest_save_path):
+ """Deletes old checkpoints if necessary.
+
+ Always keep the last max_to_keep checkpoints. If
+ keep_checkpoint_every_n_hours was specified, keep an additional checkpoint
+ every N hours. For example, if N is 0.5, an additional checkpoint is kept
+ for every 0.5 hours of training; if N is 10, an additional checkpoint is
+ kept for every 10 hours of training.
+
+ Args:
+ latest_save_path: Name including path of checkpoint file to save.
+ """
+ if not self._max_to_keep:
+ return
+ # Remove first from list if the same name was used before.
+ for p in self._last_checkpoints:
+ if latest_save_path == self._CheckpointFilename(p):
+ self._last_checkpoints.remove(p)
+ # Append new path to list
+ self._last_checkpoints.append((latest_save_path, time.time()))
+ # If more than max_to_keep, remove oldest.
+ if len(self._last_checkpoints) > self._max_to_keep:
+ p = self._last_checkpoints.pop(0)
+ # Do not delete the file if keep_checkpoint_every_n_hours is set and we
+ # have reached N hours of training.
+ should_keep = p[1] > self._next_checkpoint_time
+ if should_keep:
+ self._next_checkpoint_time += (
+ self._keep_checkpoint_every_n_hours * 3600)
+ return
+ # Otherwise delete the files.
+ for f in gfile.Glob(self._CheckpointFilename(p)):
+ try:
+ gfile.Remove(f)
+ except gfile.GOSError, e:
+ logging.warning("Ignoring: %s", str(e))
+
+ def as_saver_def(self):
+ """Generates a `SaverDef` representation of this saver.
+
+ Returns:
+ A `SaverDef` proto.
+ """
+ return saver_pb2.SaverDef(
+ filename_tensor_name=self._filename_tensor_name,
+ save_tensor_name=self._save_tensor_name,
+ restore_op_name=self._restore_op_name,
+ max_to_keep=self._max_to_keep,
+ keep_checkpoint_every_n_hours=self._keep_checkpoint_every_n_hours,
+ sharded=self._sharded)
+
+ @property
+ def last_checkpoints(self):
+ """List of not-yet-deleted checkpoint filenames.
+
+ You can pass any of the returned values to `restore()`.
+
+ Returns:
+ A list of checkpoint filenames, sorted from oldest to newest.
+ """
+ return list(self._CheckpointFilename(p) for p in self._last_checkpoints)
+
+ def set_last_checkpoints(self, last_checkpoints):
+ """Sets the list of not-yet-deleted checkpoint filenames.
+
+ Args:
+ last_checkpoints: a list of checkpoint filenames.
+
+ Raises:
+ AssertionError: if the list of checkpoint filenames has already been set.
+ """
+ assert not self._last_checkpoints
+ assert isinstance(last_checkpoints, list)
+ self._last_checkpoints = list(last_checkpoints)
+
+ def save(self, sess, save_path, global_step=None, latest_filename=None):
+ """Saves variables.
+
+ This method runs the ops added by the constructor for saving variables.
+ It requires a session in which the graph was launched. The variables to
+ save must also have been initialized.
+
+ The method returns the path of the newly created checkpoint file. This
+ path can be passed directly to a call to `restore()`.
+
+ Args:
+ sess: A Session to use to save the variables.
+ save_path: string. Path to the checkpoint filename. If the saver is
+ `sharded`, this is the prefix of the sharded checkpoint filename.
+ global_step: If provided the global step number is appended to
+ `save_path` to create the checkpoint filename. The optional argument
+ can be a Tensor, a Tensor name or an integer.
+ latest_filename: Optional name for the protocol buffer file that will
+ contain the list of most recent checkpoint filenames. That file,
+ kept in the same directory as the checkpoint files, is automatically
+ managed by the saver to keep track of recent checkpoints. Defaults to
+ 'checkpoint'.
+
+ Returns:
+ A string: path at which the variables were saved. If the saver is
+ sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn'
+ is the number of shards created.
+
+ Raises:
+ TypeError: If `sess` is not a Session.
+ """
+ if latest_filename is None:
+ latest_filename = "checkpoint"
+ if global_step is not None:
+ if not isinstance(global_step, numbers.Number):
+ global_step = training_util.global_step(sess, global_step)
+ checkpoint_file = "%s-%d" % (save_path, global_step)
+ else:
+ checkpoint_file = save_path
+ save_path = os.path.dirname(save_path)
+ if not isinstance(sess, session.SessionInterface):
+ raise TypeError("'sess' must be a Session; %s" % sess)
+
+ model_checkpoint_path = sess.run(
+ self._save_tensor_name, {self._filename_tensor_name: checkpoint_file})
+ model_checkpoint_path = str(model_checkpoint_path)
+ self._MaybeDeleteOldCheckpoints(model_checkpoint_path)
+ update_checkpoint_state(save_path, model_checkpoint_path,
+ self.last_checkpoints, latest_filename)
+ return model_checkpoint_path
+
+ def restore(self, sess, save_path):
+ """Restores previously saved variables.
+
+ This method runs the ops added by the constructor for restoring variables.
+ It requires a session in which the graph was launched. The variables to
+ restore do not have to have been initialized, as restoring is itself a way
+ to initialize variables.
+
+ The `save_path` argument is typically a value previously returned from a
+ `save()` call, or a call to `latest_checkpoint()`.
+
+ Args:
+ sess: A Session to use to restore the parameters.
+ save_path: Path where parameters were previously saved.
+ """
+ sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
+
+
+def latest_checkpoint(checkpoint_dir, latest_filename=None):
+ """Finds the filename of latest saved checkpoint file.
+
+ Args:
+ checkpoint_dir: Directory where the variables were saved.
+ latest_filename: Optional name for the protocol buffer file that
+ contains the list of most recent checkpoint filenames.
+ See the corresponding argument to `Saver.save()`.
+
+ Returns:
+ The full path to the latest checkpoint or None if no checkpoint was found.
+ """
+ # Pick the latest checkpoint based on checkpoint state.
+ ckpt = get_checkpoint_state(checkpoint_dir, latest_filename)
+ if ckpt and ckpt.model_checkpoint_path:
+ checkpoint_full_path = os.path.join(
+ checkpoint_dir, ckpt.model_checkpoint_path)
+ if gfile.Exists(checkpoint_full_path):
+ return checkpoint_full_path
+
+ return None
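To tie the pieces above together, a minimal save/restore sketch; the `/tmp` paths and step values are illustrative only.

```python
import tensorflow as tf

v = tf.Variable(0.0, name="v")
inc = tf.assign_add(v, 1.0)
saver = tf.train.Saver({"v": v})

with tf.Session() as sess:
  sess.run(tf.initialize_all_variables())
  for step in range(3):
    sess.run(inc)
    # Writes /tmp/my-model-0, /tmp/my-model-1, ... and updates the
    # 'checkpoint' state file next to them.
    saver.save(sess, "/tmp/my-model", global_step=step)

# Later, with the same graph: find the newest checkpoint and restore from it.
path = tf.train.latest_checkpoint("/tmp")
with tf.Session() as sess:
  saver.restore(sess, path)
  print(sess.run(v))  # 3.0
```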
diff --git a/tensorflow/python/training/saver_test.py b/tensorflow/python/training/saver_test.py
new file mode 100644
index 0000000000..db378e9637
--- /dev/null
+++ b/tensorflow/python/training/saver_test.py
@@ -0,0 +1,563 @@
+"""Tests for tensorflow.ops.io_ops."""
+import os.path
+import time
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+import numpy as np
+
+from tensorflow.python.platform import gfile
+
+
+class SaverTest(tf.test.TestCase):
+
+ def testBasics(self):
+ save_path = os.path.join(self.get_temp_dir(), "basics")
+
+ with self.test_session() as sess:
+ # Build a graph with 2 parameter nodes, and Save and
+ # Restore nodes for them.
+ v0 = tf.Variable(10.0, name="v0")
+ v1 = tf.Variable(20.0, name="v1")
+ save = tf.train.Saver({"v0": v0, "v1": v1}, restore_sequentially=True)
+ tf.initialize_all_variables().run()
+
+ # Check that the parameter nodes have been initialized.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Save the initialized values in the file at "save_path"
+ val = save.save(sess, save_path)
+ self.assertTrue(isinstance(val, basestring))
+ self.assertEqual(save_path, val)
+
+ # Start a second session. In that session the parameter nodes
+ # have not been initialized either.
+ with self.test_session() as sess:
+ v0 = tf.Variable(-1.0, name="v0")
+ v1 = tf.Variable(-1.0, name="v1")
+ save = tf.train.Saver({"v0": v0, "v1": v1})
+
+ with self.assertRaisesWithPredicateMatch(
+ tf.OpError, lambda e: "uninitialized value v0" in e.message):
+ sess.run(v0)
+ with self.assertRaisesWithPredicateMatch(
+ tf.OpError, lambda e: "uninitialized value v1" in e.message):
+ sess.run(v1)
+
+ # Restore the saved values in the parameter nodes.
+ save.restore(sess, save_path)
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Build another graph with 2 nodes, initialized
+ # differently, and a Restore node for them.
+ with self.test_session() as sess:
+ v0_2 = tf.Variable(1000.0, name="v0")
+ v1_2 = tf.Variable(2000.0, name="v1")
+ save2 = tf.train.Saver({"v0": v0_2, "v1": v1_2})
+ tf.initialize_all_variables().run()
+
+ # Check that the parameter nodes have been initialized.
+ self.assertEqual(1000.0, v0_2.eval())
+ self.assertEqual(2000.0, v1_2.eval())
+ # Restore the values saved earlier in the parameter nodes.
+ save2.restore(sess, save_path)
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0_2.eval())
+ self.assertEqual(20.0, v1_2.eval())
+
+ def testInt64(self):
+ save_path = os.path.join(self.get_temp_dir(), "int64")
+
+ with self.test_session() as sess:
+ # Build a graph with 1 node, and save and restore for them.
+ v = tf.Variable(np.int64(15), name="v")
+ save = tf.train.Saver({"v": v}, restore_sequentially=True)
+ tf.initialize_all_variables().run()
+
+ # Save the initialized values in the file at "save_path"
+ val = save.save(sess, save_path)
+ self.assertTrue(isinstance(val, basestring))
+ self.assertEqual(save_path, val)
+
+ with self.test_session() as sess:
+ v = tf.Variable(np.int64(-1), name="v")
+ save = tf.train.Saver({"v": v})
+
+ with self.assertRaisesWithPredicateMatch(
+ tf.OpError, lambda e: "uninitialized value v" in e.message):
+ sess.run(v)
+
+ # Restore the saved values in the parameter nodes.
+ save.restore(sess, save_path)
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(np.int64(15), v.eval())
+
+ def testSomeErrors(self):
+ with tf.Graph().as_default():
+ v0 = tf.Variable([10.0], name="v0")
+ v1 = tf.Variable([20.0], name="v1")
+ v2 = tf.Variable([20.0], name="v2")
+ v2._set_save_slice_info(tf.Variable.SaveSliceInfo("v1", ""))
+
+ # By default the name used for "v2" will be "v1" and raise an error.
+ with self.assertRaisesRegexp(ValueError, "same name: v1"):
+ tf.train.Saver([v0, v1, v2])
+
+ # The names are different and will work.
+ tf.train.Saver({"vee1": v1, "other": [v2]})
+
+ def testBasicsWithListOfVariables(self):
+ save_path = os.path.join(self.get_temp_dir(), "basics_with_list")
+
+ with self.test_session(graph=tf.Graph()) as sess:
+ # Build a graph with 2 parameter nodes, and Save and
+ # Restore nodes for them.
+ v0 = tf.Variable(10.0, name="v0")
+ v1 = tf.Variable(20.0, name="v1")
+ save = tf.train.Saver([v0, v1])
+ tf.initialize_all_variables().run()
+
+ # Check that the parameter nodes have been initialized.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Save the initialized values in the file at "save_path"
+ val = save.save(sess, save_path)
+ self.assertTrue(isinstance(val, basestring))
+ self.assertEqual(save_path, val)
+
+ # Start a second session. In that session the variables
+ # have not been initialized either.
+ with self.test_session(graph=tf.Graph()) as sess:
+ v0 = tf.Variable(-1.0, name="v0")
+ v1 = tf.Variable(-1.0, name="v1")
+ save = tf.train.Saver([v0, v1])
+
+ with self.assertRaisesWithPredicateMatch(
+ tf.OpError, lambda e: "uninitialized value v0" in e.message):
+ sess.run(v0)
+ with self.assertRaisesWithPredicateMatch(
+ tf.OpError, lambda e: "uninitialized value v1" in e.message):
+ sess.run(v1)
+
+ # Restore the saved values in the parameter nodes.
+ save.restore(sess, save_path)
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Build another graph with 2 nodes, initialized
+ # differently, and a Restore node for them.
+ with self.test_session(graph=tf.Graph()) as sess:
+ v0_2 = tf.Variable(1000.0, name="v0")
+ v1_2 = tf.Variable(2000.0, name="v1")
+ save2 = tf.train.Saver([v0_2, v1_2])
+ tf.initialize_all_variables().run()
+
+ # Check that the parameter nodes have been initialized.
+ self.assertEqual(1000.0, v0_2.eval())
+ self.assertEqual(2000.0, v1_2.eval())
+ # Restore the values saved earlier in the parameter nodes.
+ save2.restore(sess, save_path)
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0_2.eval())
+ self.assertEqual(20.0, v1_2.eval())
+
+ def _SaveAndLoad(self, var_name, var_value, other_value, save_path):
+ with self.test_session() as sess:
+ var = tf.Variable(var_value, name=var_name)
+ save = tf.train.Saver({var_name: var})
+ var.initializer.run()
+ val = save.save(sess, save_path)
+ self.assertEqual(save_path, val)
+ with self.test_session() as sess:
+ var = tf.Variable(other_value, name=var_name)
+ save = tf.train.Saver({var_name: var})
+ save.restore(sess, save_path)
+ self.assertAllClose(var_value, var.eval())
+
+ def testCacheRereadsFile(self):
+ save_path = os.path.join(self.get_temp_dir(), "cache_rereads")
+ # Save and reload one Variable named "var0".
+ self._SaveAndLoad("var0", 0.0, 1.0, save_path)
+ # Save and reload one Variable named "var1" in the same file.
+ # The cached readers should know to re-read the file.
+ self._SaveAndLoad("var1", 1.1, 2.2, save_path)
+
+ def testGPU(self):
+ if not tf.test.IsBuiltWithCuda():
+ return
+ save_path = os.path.join(self.get_temp_dir(), "gpu")
+ with tf.Session("", graph=tf.Graph()) as sess:
+ with sess.graph.device("/gpu:0"):
+ v0_1 = tf.Variable(123.45)
+ save = tf.train.Saver({"v0": v0_1})
+ tf.initialize_all_variables().run()
+ save.save(sess, save_path)
+
+ with tf.Session("", graph=tf.Graph()) as sess:
+ with sess.graph.device("/gpu:0"):
+ v0_2 = tf.Variable(543.21)
+ save = tf.train.Saver({"v0": v0_2})
+ tf.initialize_all_variables().run()
+ self.assertAllClose(543.21, v0_2.eval())
+ save.restore(sess, save_path)
+ self.assertAllClose(123.45, v0_2.eval())
+
+ def testVariables(self):
+ save_path = os.path.join(self.get_temp_dir(), "variables")
+ with tf.Session("", graph=tf.Graph()) as sess:
+ one = tf.Variable(1.0)
+ twos = tf.Variable([2.0, 2.0, 2.0])
+ init = tf.initialize_all_variables()
+ save = tf.train.Saver(tf.all_variables())
+ init.run()
+ save.save(sess, save_path)
+
+ with tf.Session("", graph=tf.Graph()) as sess:
+ one = tf.Variable(0.0)
+ twos = tf.Variable([0.0, 0.0, 0.0])
+ # Saver with no arg, defaults to 'all variables'.
+ save = tf.train.Saver()
+ save.restore(sess, save_path)
+ self.assertAllClose(1.0, one.eval())
+ self.assertAllClose([2.0, 2.0, 2.0], twos.eval())
+
+ def testSaveWithGlobalStep(self):
+ save_path = os.path.join(self.get_temp_dir(), "ckpt_with_global_step")
+ global_step_int = 5
+ # Save and reload one Variable named "var0".
+ self._SaveAndLoad("var0", 0.0, 1.0, save_path)
+ for use_tensor in [True, False]:
+ with self.test_session() as sess:
+ var = tf.Variable(1.0, name="var0")
+ save = tf.train.Saver({var.op.name: var})
+ var.initializer.run()
+ if use_tensor:
+ global_step = tf.constant(global_step_int)
+ val = save.save(sess, save_path, global_step=global_step)
+ else:
+ val = save.save(sess, save_path, global_step=global_step_int)
+ expected_save_path = "%s-%d" % (save_path, global_step_int)
+ self.assertEqual(expected_save_path, val)
+
+
+class SaveRestoreShardedTest(tf.test.TestCase):
+
+ def testBasics(self):
+ save_path = os.path.join(self.get_temp_dir(), "sharded")
+
+ # Build a graph with 2 parameter nodes on different devices.
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})) as sess:
+ with sess.graph.device("/cpu:0"):
+ v0 = tf.Variable(10, name="v0")
+ with sess.graph.device("/cpu:1"):
+ v1 = tf.Variable(20, name="v1")
+ save = tf.train.Saver({"v0": v0, "v1": v1}, sharded=True)
+ tf.initialize_all_variables().run()
+ val = save.save(sess, save_path)
+ self.assertEqual(save_path + "-?????-of-00002", val)
+
+ # Restore a different "v0" from shard 0 of the saved files.
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})) as sess:
+ with sess.graph.device("/cpu:0"):
+ v0 = tf.Variable(111, name="v0")
+ save = tf.train.Saver({"v0": v0}, sharded=True)
+ tf.initialize_all_variables().run()
+ self.assertEqual(111, v0.eval())
+ save.restore(sess, save_path + "-00000-of-00002")
+ self.assertEqual(10, v0.eval())
+
+ # Restore a different "v1" from shard 1 of the saved files.
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})) as sess:
+ with sess.graph.device("/cpu:0"):
+ v1 = tf.Variable(222)
+ save = tf.train.Saver({"v1": v1}, sharded=True)
+ tf.initialize_all_variables().run()
+ self.assertEqual(222, v1.eval())
+ save.restore(sess, save_path + "-00001-of-00002")
+ self.assertEqual(20, v1.eval())
+
+ # Now try a restore with the sharded filename.
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})) as sess:
+ with sess.graph.device("/cpu:0"):
+ v0 = tf.Variable(111, name="v0")
+ with sess.graph.device("/cpu:1"):
+ v1 = tf.Variable(222, name="v1")
+ save = tf.train.Saver({"v0": v0, "v1": v1}, sharded=True)
+ tf.initialize_all_variables().run()
+ self.assertEqual(111, v0.eval())
+ self.assertEqual(222, v1.eval())
+ save_path = os.path.join(self.get_temp_dir(), "sharded")
+ save.restore(sess, save_path + "-?????-of-?????")
+ self.assertEqual(10, v0.eval())
+ self.assertEqual(20, v1.eval())
+
+ def testSaverDef(self):
+ with self.test_session():
+ v0 = tf.Variable(123, name="v0")
+ save = tf.train.Saver({"v0": v0}, sharded=True)
+ sd = save.as_saver_def()
+ self.assertTrue(sd.sharded)
+
+
+class MaxToKeepTest(tf.test.TestCase):
+
+ def testNonSharded(self):
+ save_dir = os.path.join(self.get_temp_dir(), "max_to_keep_non_sharded")
+ try:
+ gfile.DeleteRecursively(save_dir)
+ except gfile.GOSError, _:
+ pass # Ignore
+ gfile.MakeDirs(save_dir)
+
+ with self.test_session() as sess:
+ v = tf.Variable(10.0, name="v")
+ save = tf.train.Saver({"v": v}, max_to_keep=2)
+ tf.initialize_all_variables().run()
+ self.assertEqual([], save.last_checkpoints)
+
+ s1 = save.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s1], save.last_checkpoints)
+ self.assertTrue(gfile.Exists(s1))
+
+ s2 = save.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s1, s2], save.last_checkpoints)
+ self.assertTrue(gfile.Exists(s1))
+ self.assertTrue(gfile.Exists(s2))
+
+ s3 = save.save(sess, os.path.join(save_dir, "s3"))
+ self.assertEqual([s2, s3], save.last_checkpoints)
+ self.assertFalse(gfile.Exists(s1))
+ self.assertTrue(gfile.Exists(s2))
+ self.assertTrue(gfile.Exists(s3))
+
+ # Create a second helper, identical to the first.
+ save2 = tf.train.Saver(saver_def=save.as_saver_def())
+ save2.set_last_checkpoints(save.last_checkpoints)
+
+ # Create a third helper, with the same configuration but no knowledge of
+ # previous checkpoints.
+ save3 = tf.train.Saver(saver_def=save.as_saver_def())
+
+ # Exercise the first helper.
+
+ # Adding s2 again (old s2 is removed first, then new s2 appended)
+ s2 = save.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s3, s2], save.last_checkpoints)
+ self.assertFalse(gfile.Exists(s1))
+ self.assertTrue(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+
+ # Adding s1 (s3 should now be deleted as oldest in list)
+ s1 = save.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s2, s1], save.last_checkpoints)
+ self.assertFalse(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+ self.assertTrue(gfile.Exists(s1))
+
+ # Exercise the second helper.
+
+ # Adding s2 again (old s2 is removed first, then new s2 appended)
+ s2 = save2.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s3, s2], save2.last_checkpoints)
+ # Created by the first helper.
+ self.assertTrue(gfile.Exists(s1))
+ # Deleted by the first helper.
+ self.assertFalse(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+
+ # Adding s1 (s3 should now be deleted as oldest in list)
+ s1 = save2.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s2, s1], save2.last_checkpoints)
+ self.assertFalse(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+ self.assertTrue(gfile.Exists(s1))
+
+ # Exercise the third helper.
+
+ # Adding s2 again (but helper is unaware of previous s2)
+ s2 = save3.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s2], save3.last_checkpoints)
+ # Created by the first helper.
+ self.assertTrue(gfile.Exists(s1))
+ # Deleted by the first helper.
+ self.assertFalse(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+
+ # Adding s1 (s3 should not be deleted because helper is unaware of it)
+ s1 = save3.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s2, s1], save3.last_checkpoints)
+ self.assertFalse(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s2))
+ self.assertTrue(gfile.Exists(s1))
+
+ def testSharded(self):
+ save_dir = os.path.join(self.get_temp_dir(), "max_to_keep_sharded")
+ try:
+ gfile.DeleteRecursively(save_dir)
+ except gfile.GOSError, _:
+ pass # Ignore
+ gfile.MakeDirs(save_dir)
+
+ with tf.Session(
+ target="",
+ config=tf.ConfigProto(device_count={"CPU": 2})) as sess:
+ with sess.graph.device("/cpu:0"):
+ v0 = tf.Variable(111, name="v0")
+ with sess.graph.device("/cpu:1"):
+ v1 = tf.Variable(222, name="v1")
+ save = tf.train.Saver({"v0": v0, "v1": v1}, sharded=True, max_to_keep=2)
+ tf.initialize_all_variables().run()
+ self.assertEqual([], save.last_checkpoints)
+
+ s1 = save.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s1], save.last_checkpoints)
+ self.assertEquals(2, len(gfile.Glob(s1)))
+
+ s2 = save.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s1, s2], save.last_checkpoints)
+ self.assertEquals(2, len(gfile.Glob(s1)))
+ self.assertEquals(2, len(gfile.Glob(s2)))
+
+ s3 = save.save(sess, os.path.join(save_dir, "s3"))
+ self.assertEqual([s2, s3], save.last_checkpoints)
+ self.assertEquals(0, len(gfile.Glob(s1)))
+ self.assertEquals(2, len(gfile.Glob(s2)))
+ self.assertEquals(2, len(gfile.Glob(s3)))
+
+
+class KeepCheckpointEveryNHoursTest(tf.test.TestCase):
+
+ def testNonSharded(self):
+ save_dir = os.path.join(self.get_temp_dir(),
+ "keep_checkpoint_every_n_hours")
+ try:
+ gfile.DeleteRecursively(save_dir)
+ except gfile.GOSError, _:
+ pass # Ignore
+ gfile.MakeDirs(save_dir)
+
+ with self.test_session() as sess:
+ v = tf.Variable([10.0], name="v")
+ # Run the initializer NOW to avoid the 0.5s overhead of the first Run()
+ # call, which throws the test timing off in fastbuild mode.
+ tf.initialize_all_variables().run()
+ # Create a saver that will keep the last 2 checkpoints plus one every 0.7
+ # seconds.
+ start_time = time.time()
+ save = tf.train.Saver({"v": v}, max_to_keep=2,
+ keep_checkpoint_every_n_hours=0.7 / 3600)
+ self.assertEqual([], save.last_checkpoints)
+
+ # Wait until 0.7 seconds have elapsed so s1 will be old enough to keep.
+ time.sleep((time.time() + 0.7) - start_time)
+ s1 = save.save(sess, os.path.join(save_dir, "s1"))
+ self.assertEqual([s1], save.last_checkpoints)
+
+ s2 = save.save(sess, os.path.join(save_dir, "s2"))
+ self.assertEqual([s1, s2], save.last_checkpoints)
+
+ # We now have 2 'last_checkpoints': [s1, s2]. The next call to Save()
+ # would normally delete s1, because max_to_keep is 2. However, s1 is
+ # older than 0.7s so we must keep it.
+ s3 = save.save(sess, os.path.join(save_dir, "s3"))
+ self.assertEqual([s2, s3], save.last_checkpoints)
+
+ # s1 should still be here; we are not checking now to reduce time
+ # variance in the test.
+
+ # We now have 2 'last_checkpoints': [s2, s3], and s1 on disk. The next
+ # call to Save() will delete s2, because max_to_keep is 2, and because
+ # we already kept the old s1. s2 is very close in time to s1 so it gets
+ # deleted.
+ s4 = save.save(sess, os.path.join(save_dir, "s4"))
+ self.assertEqual([s3, s4], save.last_checkpoints)
+
+ # Check that s1 is still here, but s2 is gone.
+ self.assertTrue(gfile.Exists(s1))
+ self.assertFalse(gfile.Exists(s2))
+ self.assertTrue(gfile.Exists(s3))
+ self.assertTrue(gfile.Exists(s4))
+
+
+class SaveRestoreWithVariableNameMap(tf.test.TestCase):
+
+ def testNonReshape(self):
+ save_path = os.path.join(self.get_temp_dir(), "basics")
+
+ with self.test_session() as sess:
+ # Build a graph with 2 parameter nodes, and Save and
+ # Restore nodes for them.
+ v0 = tf.Variable(10.0, name="v0")
+ v1 = tf.Variable(20.0, name="v1")
+ save = tf.train.Saver({"save_prefix/v0": v0, "save_prefix/v1": v1})
+ tf.initialize_all_variables().run()
+
+ # Check that the parameter nodes have been initialized.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Save the initialized values in the file at "save_path"
+ # Use a variable name map to set the saved tensor names
+ val = save.save(sess, save_path)
+ self.assertTrue(isinstance(val, basestring))
+ self.assertEqual(save_path, val)
+
+ # Verify that the original names are not in the saved file
+ save = tf.train.Saver({"v0": v0, "v1": v1})
+ with self.assertRaisesOpError("not found in checkpoint"):
+ save.restore(sess, save_path)
+
+ # Verify that the mapped names are present in the saved file and can be
+ # restored using remapped names.
+ with self.test_session() as sess:
+ v0 = tf.Variable(-1.0, name="v0")
+ v1 = tf.Variable(-1.0, name="v1")
+
+ with self.assertRaisesOpError("uninitialized value v0"):
+ sess.run(v0)
+ with self.assertRaisesOpError("uninitialized value v1"):
+ sess.run(v1)
+
+ save = tf.train.Saver({"save_prefix/v0": v0, "save_prefix/v1": v1})
+ save.restore(sess, save_path)
+
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+ # Add a prefix to the node names in the current graph and restore using
+ # remapped names.
+ with self.test_session() as sess:
+ v0 = tf.Variable(-1.0, name="restore_prefix/v0")
+ v1 = tf.Variable(-1.0, name="restore_prefix/v1")
+
+ with self.assertRaisesOpError("uninitialized value restore_prefix/v0"):
+ sess.run(v0)
+ with self.assertRaisesOpError("uninitialized value restore_prefix/v1"):
+ sess.run(v1)
+
+ # Restore the saved values in the parameter nodes.
+ save = tf.train.Saver({"save_prefix/v0": v0, "save_prefix/v1": v1})
+ save.restore(sess, save_path)
+
+ # Check that the parameter nodes have been restored.
+ self.assertEqual(10.0, v0.eval())
+ self.assertEqual(20.0, v1.eval())
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/summary_io.py b/tensorflow/python/training/summary_io.py
new file mode 100644
index 0000000000..dd994c5311
--- /dev/null
+++ b/tensorflow/python/training/summary_io.py
@@ -0,0 +1,226 @@
+"""Reads Summaries from and writes Summaries to event files."""
+
+import os.path
+import Queue
+import threading
+import time
+
+from tensorflow.core.framework import summary_pb2
+from tensorflow.core.util import event_pb2
+from tensorflow.python import pywrap_tensorflow
+from tensorflow.python.lib.io import tf_record
+from tensorflow.python.platform import gfile
+
+
+class SummaryWriter(object):
+ """Writes `Summary` protocol buffers to event files.
+
+ The `SummaryWriter` class provides a mechanism to create an event file in a
+ given directory and add summaries and events to it. The class updates the
+ file contents asynchronously. This allows a training program to call methods
+ to add data to the file directly from the training loop, without slowing down
+ training.
+
+ @@__init__
+
+ @@add_summary
+ @@add_event
+ @@add_graph
+
+ @@flush
+ @@close
+ """
+
+ def __init__(self, logdir, graph_def=None, max_queue=10, flush_secs=120):
+ """Creates a `SummaryWriter` and an event file.
+
+ On construction the summary writer creates a new event file in `logdir`.
+ This event file will contain `Event` protocol buffers constructed when you
+ call one of the following functions: `add_summary()`, `add_event()`, or
+ `add_graph()`.
+
+ If you pass a `graph_def` protocol buffer to the constructor it is added to
+ the event file. (This is equivalent to calling `add_graph()` later).
+
+ TensorBoard will pick up the graph from the file and display it graphically so
+ you can interactively explore the graph you built. You will usually pass
+ the graph from the session in which you launched it:
+
+ ```python
+ ...create a graph...
+ # Launch the graph in a session.
+ sess = tf.Session()
+ # Create a summary writer, add the 'graph_def' to the event file.
+ writer = tf.train.SummaryWriter(<some-directory>, sess.graph_def)
+ ```
+
+ The other arguments to the constructor control the asynchronous writes to
+ the event file:
+
+ * `flush_secs`: How often, in seconds, to flush the added summaries
+ and events to disk.
+ * `max_queue`: Maximum number of summaries or events pending to be
+ written to disk before one of the 'add' calls blocks.
+
+ Args:
+ logdir: A string. Directory where event file will be written.
+ graph_def: A `GraphDef` protocol buffer.
+ max_queue: Integer. Size of the queue for pending events and summaries.
+ flush_secs: Number. How often, in seconds, to flush the
+ pending events and summaries to disk.
+ """
+ self._logdir = logdir
+ if not gfile.IsDirectory(self._logdir):
+ gfile.MakeDirs(self._logdir)
+ self._event_queue = Queue.Queue(max_queue)
+ self._ev_writer = pywrap_tensorflow.EventsWriter(
+ os.path.join(self._logdir, "events"))
+ self._worker = _EventLoggerThread(self._event_queue, self._ev_writer,
+ flush_secs)
+ self._worker.start()
+ if graph_def is not None:
+ self.add_graph(graph_def)
+
+ def add_summary(self, summary, global_step=None):
+ """Adds a `Summary` protocol buffer to the event file.
+
+ This method wraps the provided summary in an `Event` protocol buffer
+ and adds it to the event file.
+
+ You can pass the output of any summary op, as-is, to this function. You
+ can also pass a `Summary` protocol buffer that you manufacture with your
+ own data. This is commonly done to report evaluation results in event
+ files.
+
+ Args:
+ summary: A `Summary` protocol buffer, optionally serialized as a string.
+ global_step: Number. Optional global step value to record with the
+ summary.
+ """
+ if isinstance(summary, basestring):
+ summ = summary_pb2.Summary()
+ summ.ParseFromString(summary)
+ summary = summ
+ event = event_pb2.Event(wall_time=time.time(), summary=summary)
+ if global_step is not None:
+ event.step = long(global_step)
+ self.add_event(event)
+
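+ # A short sketch of reporting an evaluation metric by hand, as described
+ # above (the tag and values are illustrative): build a `Summary` proto and
+ # pass it to `add_summary()`.
+ #
+ #   writer = tf.train.SummaryWriter("/tmp/eval_logs")
+ #   summ = tf.Summary(value=[
+ #       tf.Summary.Value(tag="eval_accuracy", simple_value=0.87)])
+ #   writer.add_summary(summ, global_step=1000)
+ #   writer.flush()
+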
+ def add_event(self, event):
+ """Adds an event to the event file.
+
+ Args:
+ event: An `Event` protocol buffer.
+ """
+ self._event_queue.put(event)
+
+ def add_graph(self, graph_def, global_step=None):
+ """Adds a `GraphDef` protocol buffer to the event file.
+
+ The graph described by the protocol buffer will be displayed by
+ TensorBoard. Most users pass a graph in the constructor instead.
+
+ Args:
+ graph_def: A `GraphDef` protocol buffer.
+ global_step: Number. Optional global step counter to record with the
+ graph.
+ """
+ event = event_pb2.Event(wall_time=time.time(), graph_def=graph_def)
+ if global_step is not None:
+ event.step = long(global_step)
+ self._event_queue.put(event)
+
+ def flush(self):
+ """Flushes the event file to disk.
+
+ Call this method to make sure that all pending events have been written to
+ disk.
+ """
+ self._event_queue.join()
+ self._ev_writer.Flush()
+
+ def close(self):
+ """Flushes the event file to disk and close the file.
+
+ Call this method when you do not need the summary writer anymore.
+ """
+ self.flush()
+ self._ev_writer.Close()
+
+
+class _EventLoggerThread(threading.Thread):
+ """Thread that logs events."""
+
+ def __init__(self, queue, ev_writer, flush_secs):
+ """Creates an _EventLoggerThread.
+
+ Args:
+ queue: a Queue from which to dequeue events.
+ ev_writer: an event writer. Used to log events for
+ the visualizer.
+ flush_secs: How often, in seconds, to flush the
+ pending file to disk.
+ """
+ threading.Thread.__init__(self)
+ self.daemon = True
+ self._queue = queue
+ self._ev_writer = ev_writer
+ self._flush_secs = flush_secs
+ # The first event will be flushed immediately.
+ self._next_event_flush_time = 0
+
+ def run(self):
+ while True:
+ event = self._queue.get()
+ try:
+ self._ev_writer.WriteEvent(event)
+ # Flush the event writer every so often.
+ now = time.time()
+ if now > self._next_event_flush_time:
+ self._ev_writer.Flush()
+ # Schedule the next flush `flush_secs` seconds from now.
+ self._next_event_flush_time = now + self._flush_secs
+ finally:
+ self._queue.task_done()
+
+
+def summary_iterator(path):
+ """An iterator for reading `Event` protocol buffers from an event file.
+
+ You can use this function to read events written to an event file. It returns
+ a Python iterator that yields `Event` protocol buffers.
+
+ Example: Print the contents of an events file.
+
+ ```python
+ for e in tf.summary_iterator(path to events file):
+ print e
+ ```
+
+ Example: Print selected summary values.
+
+ ```python
+ # This example supposes that the events file contains summaries with a
+ # summary value tag 'loss'. These could have been added by calling
+ # `add_summary()`, passing the output of a scalar summary op created
+ # with: `tf.scalar_summary(['loss'], loss_tensor)`.
+ for e in tf.summary_iterator(path to events file):
+ for v in e.summary.value:
+ if v.tag == 'loss':
+ print v.simple_value
+ ```
+
+ See the protocol buffer definitions of
+ [Event](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/util/event.proto)
+ and
+ [Summary](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+ for more information about their attributes.
+
+ Args:
+ path: The path to an event file created by a `SummaryWriter`.
+
+ Yields:
+ `Event` protocol buffers.
+ """
+ for r in tf_record.tf_record_iterator(path):
+ yield event_pb2.Event.FromString(r)
diff --git a/tensorflow/python/training/summary_writer_test.py b/tensorflow/python/training/summary_writer_test.py
new file mode 100644
index 0000000000..2ec416f68f
--- /dev/null
+++ b/tensorflow/python/training/summary_writer_test.py
@@ -0,0 +1,151 @@
+"""Tests for training_coordinator.py."""
+import glob
+import os.path
+import shutil
+import time
+
+import tensorflow.python.platform
+
+import tensorflow as tf
+
+
+class SummaryWriterTestCase(tf.test.TestCase):
+
+ def _TestDir(self, test_name):
+ test_dir = os.path.join(self.get_temp_dir(), test_name)
+ return test_dir
+
+ def _CleanTestDir(self, test_name):
+ test_dir = self._TestDir(test_name)
+ if os.path.exists(test_dir):
+ shutil.rmtree(test_dir)
+ return test_dir
+
+ def _EventsReader(self, test_dir):
+ event_paths = glob.glob(os.path.join(test_dir, "event*"))
+ # If the test runs multiple times in the same directory we can have
+ # more than one matching event file. We only want to read the last one.
+ self.assertTrue(event_paths)
+ return tf.train.summary_iterator(event_paths[-1])
+
+ def _assertRecent(self, t):
+ self.assertTrue(abs(t - time.time()) < 5)
+
+ def testBasics(self):
+ test_dir = self._CleanTestDir("basics")
+ sw = tf.train.SummaryWriter(test_dir)
+ sw.add_summary(tf.Summary(value=[tf.Summary.Value(tag="mee",
+ simple_value=10.0)]),
+ 10)
+ sw.add_summary(tf.Summary(value=[tf.Summary.Value(tag="boo",
+ simple_value=20.0)]),
+ 20)
+ with tf.Graph().as_default() as g:
+ tf.constant([0], name="zero")
+ gd = g.as_graph_def()
+ sw.add_graph(gd, global_step=30)
+ sw.close()
+ rr = self._EventsReader(test_dir)
+
+ # The first event should list the file_version.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals("brain.Event:1", ev.file_version)
+
+ # The next event should have the value 'mee=10.0'.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(10, ev.step)
+ self.assertProtoEquals("""
+ value { tag: 'mee' simple_value: 10.0 }
+ """, ev.summary)
+
+ # The next event should have the value 'boo=20.0'.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(20, ev.step)
+ self.assertProtoEquals("""
+ value { tag: 'boo' simple_value: 20.0 }
+ """, ev.summary)
+
+ # The next event should have the graph_def.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(30, ev.step)
+ self.assertProtoEquals(gd, ev.graph_def)
+
+ # We should be done.
+ self.assertRaises(StopIteration, lambda: next(rr))
+
+ def testConstructWithGraph(self):
+ test_dir = self._CleanTestDir("basics_with_graph")
+ with tf.Graph().as_default() as g:
+ tf.constant([12], name="douze")
+ gd = g.as_graph_def()
+ sw = tf.train.SummaryWriter(test_dir, graph_def=gd)
+ sw.close()
+ rr = self._EventsReader(test_dir)
+
+ # The first event should list the file_version.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals("brain.Event:1", ev.file_version)
+
+ # The next event should have the graph.
+ ev = next(rr)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(0, ev.step)
+ self.assertProtoEquals(gd, ev.graph_def)
+
+ # We should be done.
+ self.assertRaises(StopIteration, lambda: next(rr))
+
+ # Checks that values returned from session Run() calls are added correctly to
+ # summaries. These are numpy types so we need to check they fit in the
+ # protocol buffers correctly.
+ def testSummariesAndStopFromSessionRunCalls(self):
+ test_dir = self._CleanTestDir("global_step")
+ sw = tf.train.SummaryWriter(test_dir)
+ with self.test_session():
+ i = tf.constant(1, dtype=tf.int32, shape=[])
+ l = tf.constant(2, dtype=tf.int64, shape=[])
+ # Test the summary can be passed serialized.
+ summ = tf.Summary(value=[tf.Summary.Value(tag="i", simple_value=1.0)])
+ sw.add_summary(summ.SerializeToString(), i.eval())
+ sw.add_summary(tf.Summary(value=[tf.Summary.Value(tag="l",
+ simple_value=2.0)]),
+ l.eval())
+ sw.close()
+
+ rr = self._EventsReader(test_dir)
+
+ # File_version.
+ ev = next(rr)
+ self.assertTrue(ev)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals("brain.Event:1", ev.file_version)
+
+ # Summary passed serialized.
+ ev = next(rr)
+ self.assertTrue(ev)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(1, ev.step)
+ self.assertProtoEquals("""
+ value { tag: 'i' simple_value: 1.0 }
+ """, ev.summary)
+
+ # Summary passed as SummaryObject.
+ ev = next(rr)
+ self.assertTrue(ev)
+ self._assertRecent(ev.wall_time)
+ self.assertEquals(2, ev.step)
+ self.assertProtoEquals("""
+ value { tag: 'l' simple_value: 2.0 }
+ """, ev.summary)
+
+ # We should be done.
+ self.assertRaises(StopIteration, lambda: next(rr))
+
+
+if __name__ == "__main__":
+ tf.test.main()
diff --git a/tensorflow/python/training/training.py b/tensorflow/python/training/training.py
new file mode 100644
index 0000000000..a400e9fa7d
--- /dev/null
+++ b/tensorflow/python/training/training.py
@@ -0,0 +1,138 @@
+# pylint: disable=wildcard-import,unused-import,g-bad-import-order,line-too-long
+"""This library provides a set of classes and functions that helps train models.
+
+## Optimizers.
+
+The Optimizer base class provides methods to compute gradients for a loss and
+apply gradients to variables. A collection of subclasses implement classic
+optimization algorithms such as GradientDescent and Adagrad.
+
+You never instantiate the Optimizer class itself, but instead instantiate one
+of the subclasses.
+
+@@Optimizer
+
+@@GradientDescentOptimizer
+@@AdagradOptimizer
+@@MomentumOptimizer
+@@AdamOptimizer
+@@FtrlOptimizer
+@@RMSPropOptimizer
+
+## Gradient Computation.
+
+TensorFlow provides functions to compute the derivatives for a given
+TensorFlow computation graph, adding operations to the graph. The
+optimizer classes automatically compute derivatives on your graph, but
+creators of new Optimizers or expert users can call the lower-level
+functions below.
+
+@@gradients
+@@AggregationMethod
+
+@@stop_gradient
+
+
+## Gradient Clipping
+
+TensorFlow provides several operations that you can use to add clipping
+functions to your graph. You can use these functions to perform general data
+clipping, but they're particularly useful for handling exploding or vanishing
+gradients.
+
+@@clip_by_value
+@@clip_by_norm
+@@clip_by_average_norm
+@@clip_by_global_norm
+@@global_norm
+
+## Decaying the learning rate.
+@@exponential_decay
+
+## Moving Averages.
+
+Some training algorithms, such as GradientDescent and Momentum, often benefit
+from maintaining a moving average of variables during optimization. Using the
+moving averages for evaluations often improves results significantly.
+
+@@ExponentialMovingAverage
+
+## Coordinator and QueueRunner.
+
+See [Threading and Queues](../../how_tos/threading_and_queues/index.md)
+for how to use threads and queues. For documentation on the Queue API,
+see [Queues](../../api_docs/python/io_ops.md#queues).
+
+@@Coordinator
+@@QueueRunner
+@@add_queue_runner
+@@start_queue_runners
+
+## Summary Operations.
+
+The following ops output
+[`Summary`](https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/core/framework/summary.proto)
+protocol buffers as serialized string tensors.
+
+You can fetch the output of a summary op in a session, and pass it to a
+[SummaryWriter](train.md#SummaryWriter) to append it to an event file. You can
+then use TensorBoard to visualize the contents of the event files. See
+[TensorBoard and Summaries](../../how_tos/summaries_and_tensorboard/index.md)
+for more details.
+
+@@scalar_summary
+@@image_summary
+@@histogram_summary
+@@zero_fraction
+
+@@merge_summary
+@@merge_all_summaries
+
+## Adding Summaries to Event Files.
+
+See [Summaries and
+TensorBoard](../../how_tos/summaries_and_tensorboard/index.md) for an
+overview of summaries, event files, and visualization in TensorBoard.
+
+@@SummaryWriter
+@@summary_iterator
+
+## Training utilities.
+
+@@global_step
+@@write_graph
+"""
+
+# Optimizers.
+from tensorflow.python.training.adagrad import AdagradOptimizer
+from tensorflow.python.training.adam import AdamOptimizer
+from tensorflow.python.training.ftrl import FtrlOptimizer
+from tensorflow.python.training.momentum import MomentumOptimizer
+from tensorflow.python.training.moving_averages import ExponentialMovingAverage
+from tensorflow.python.training.optimizer import Optimizer
+from tensorflow.python.training.rmsprop import RMSPropOptimizer
+from tensorflow.python.training.gradient_descent import GradientDescentOptimizer
+
+# Utility classes for training.
+from tensorflow.python.training.coordinator import Coordinator
+from tensorflow.python.training.queue_runner import *
+
+# For the module level doc.
+from tensorflow.python.training import input as _input
+from tensorflow.python.training.input import *
+
+from tensorflow.python.training.saver import get_checkpoint_state
+from tensorflow.python.training.saver import latest_checkpoint
+from tensorflow.python.training.saver import Saver
+from tensorflow.python.training.saver import update_checkpoint_state
+from tensorflow.python.training.summary_io import summary_iterator
+from tensorflow.python.training.summary_io import SummaryWriter
+from tensorflow.python.training.training_util import write_graph
+from tensorflow.python.training.training_util import global_step
+
+# Training data protos.
+from tensorflow.core.example.example_pb2 import *
+from tensorflow.core.example.feature_pb2 import *
+
+# Utility op. Open Source. TODO(mdevin): move to nn?
+from tensorflow.python.training.learning_rate_decay import exponential_decay
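+
+
+# A minimal sketch of how a few of the pieces above are commonly combined
+# (the `loss` tensor is assumed to be defined elsewhere; values are
+# illustrative only):
+#
+#   global_step = tf.Variable(0, trainable=False, name="global_step")
+#   learning_rate = tf.train.exponential_decay(
+#       0.1, global_step, decay_steps=1000, decay_rate=0.96)
+#   optimizer = tf.train.GradientDescentOptimizer(learning_rate)
+#   train_op = optimizer.minimize(loss, global_step=global_step)
+#   # Each run of `train_op` applies one update and increments `global_step`.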
diff --git a/tensorflow/python/training/training_ops.py b/tensorflow/python/training/training_ops.py
new file mode 100644
index 0000000000..410b23e04d
--- /dev/null
+++ b/tensorflow/python/training/training_ops.py
@@ -0,0 +1,115 @@
+"""Python wrappers for training ops."""
+
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+from tensorflow.python.training import gen_training_ops
+# pylint: disable=wildcard-import
+from tensorflow.python.training.gen_training_ops import *
+# pylint: enable=wildcard-import
+
+
+# Shape functions for fused training ops
+# --------------------------------------
+#
+# The fused training ops all have the same basic structure: they take
+# one or more variables with the same shape, and emit a reference to
+# the original variable (which has the same shape as the first
+# input). In addition, they take one or more scalar tensors containing
+# hyperparameters.
+#
+# The sparse ops take the gradients as a Python IndexedSlices, which
+# means that the indices are a vector of length N, and the gradient
+# values are a tensor whose size is the same as the original variable,
+# except for the 0th dimension, which has size N.
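+#
+# For example (shapes are illustrative): a dense ApplyAdagrad over a variable
+# of shape [100, 10] sees var [100, 10], accum [100, 10], lr [] and grad
+# [100, 10], while the sparse variant updating N=3 rows sees grad [3, 10] and
+# indices [3].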
+
+
+def _AssertInputIsScalar(op, index):
+ """Raises ValueError if `op.inputs[index]` is not scalar."""
+ op.inputs[index].get_shape().assert_is_compatible_with(tensor_shape.scalar())
+
+
+@ops.RegisterShape("ApplyAdagrad")
+def _ApplyAdagradShape(op):
+ """Shape function for the ApplyAdagrad op."""
+ var_shape = op.inputs[0].get_shape()
+ accum_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ _AssertInputIsScalar(op, 2) # lr
+ grad_shape = op.inputs[3].get_shape().merge_with(accum_shape)
+ return [grad_shape]
+
+
+@ops.RegisterShape("ApplyAdam")
+def _ApplyAdamShape(op):
+ """Shape function for the ApplyAdam op."""
+ var_shape = op.inputs[0].get_shape()
+ m_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ v_shape = op.inputs[2].get_shape().merge_with(m_shape)
+ _AssertInputIsScalar(op, 3) # beta1_power
+ _AssertInputIsScalar(op, 4) # beta2_power
+ _AssertInputIsScalar(op, 5) # lr
+ _AssertInputIsScalar(op, 6) # beta1
+ _AssertInputIsScalar(op, 7) # beta2
+ _AssertInputIsScalar(op, 8) # epsilon
+ grad_shape = op.inputs[9].get_shape().merge_with(v_shape)
+ return [grad_shape]
+
+
+@ops.RegisterShape("ApplyMomentum")
+def _ApplyMomentumShape(op):
+ """Shape function for the ApplyMomentum op."""
+ var_shape = op.inputs[0].get_shape()
+ accum_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ _AssertInputIsScalar(op, 2) # lr
+ grad_shape = op.inputs[3].get_shape().merge_with(accum_shape)
+ _AssertInputIsScalar(op, 4) # momentum
+ return [grad_shape]
+
+
+@ops.RegisterShape("ApplyRMSProp")
+def _ApplyRMSPropShape(op):
+ """Shape function for the ApplyRMSProp op."""
+ var_shape = op.inputs[0].get_shape()
+ ms_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ mom_shape = op.inputs[2].get_shape().merge_with(ms_shape)
+ _AssertInputIsScalar(op, 3) # lr
+ _AssertInputIsScalar(op, 4) # rho
+ _AssertInputIsScalar(op, 5) # momentum
+ _AssertInputIsScalar(op, 6) # epsilon
+ grad_shape = op.inputs[7].get_shape().merge_with(mom_shape)
+ return [grad_shape]
+
+
+@ops.RegisterShape("ApplyGradientDescent")
+def _ApplyGradientDescentShape(op):
+ """Shape function for the ApplyGradientDescent op."""
+ var_shape = op.inputs[0].get_shape()
+ _AssertInputIsScalar(op, 1) # alpha
+ delta_shape = op.inputs[2].get_shape().merge_with(var_shape)
+ return [delta_shape]
+
+
+@ops.RegisterShape("SparseApplyAdagrad")
+def _SparseApplyAdagradShape(op):
+ """Shape function for the SparseApplyAdagrad op."""
+ var_shape = op.inputs[0].get_shape()
+ accum_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ _AssertInputIsScalar(op, 2) # lr
+ grad_shape = op.inputs[3].get_shape().merge_with(
+ tensor_shape.TensorShape([None]).concatenate(accum_shape[1:]))
+ unused_indices_shape = op.inputs[4].get_shape().merge_with(
+ tensor_shape.vector(grad_shape[0]))
+ return [accum_shape]
+
+
+@ops.RegisterShape("SparseApplyMomentum")
+def _SparseApplyMomentumShape(op):
+ """Shape function for the SparseApplyMomentum op."""
+ var_shape = op.inputs[0].get_shape()
+ accum_shape = op.inputs[1].get_shape().merge_with(var_shape)
+ _AssertInputIsScalar(op, 2) # lr
+ grad_shape = op.inputs[3].get_shape().merge_with(
+ tensor_shape.TensorShape([None]).concatenate(accum_shape[1:]))
+ unused_indices_shape = op.inputs[4].get_shape().merge_with(
+ tensor_shape.vector(grad_shape[0]))
+ _AssertInputIsScalar(op, 5) # momentum
+ return [accum_shape]
diff --git a/tensorflow/python/training/training_ops_test.py b/tensorflow/python/training/training_ops_test.py
new file mode 100644
index 0000000000..902b9b0d78
--- /dev/null
+++ b/tensorflow/python/training/training_ops_test.py
@@ -0,0 +1,159 @@
+"""Tests for tensorflow.learning.training_ops."""
+
+import itertools
+
+import tensorflow.python.platform
+
+import numpy as np
+
+from tensorflow.python.framework import types
+from tensorflow.python.framework.test_util import TensorFlowTestCase
+from tensorflow.python.ops import constant_op
+from tensorflow.python.ops import variables
+from tensorflow.python.platform import googletest
+from tensorflow.python.training import training_ops
+
+
+class TrainingOpsTest(TensorFlowTestCase):
+
+ def _toType(self, dtype):
+ if dtype == np.float32:
+ return types.float32
+ elif dtype == np.float64:
+ return types.float64
+ elif dtype == np.int32:
+ return types.int32
+ elif dtype == np.int64:
+ return types.int64
+ else:
+ assert False, (dtype)
+
+ def _testTypes(self, x, alpha, delta, use_gpu=None):
+ self.setUp()
+ with self.test_session(use_gpu=use_gpu):
+ var = variables.Variable(x)
+ variables.initialize_all_variables().run()
+ self.assertAllEqual(x, var.eval())
+ apply_sgd = training_ops.apply_gradient_descent(var, alpha, delta)
+ out = apply_sgd.eval()
+ self.assertShapeEqual(out, apply_sgd)
+ self.assertAllEqual(x - alpha * delta, out)
+
+ def testApplyGradientDescent(self):
+ for (dtype, use_gpu) in itertools.product(
+ [np.float32, np.float64], [False, True]):
+ x = np.arange(100).astype(dtype)
+ alpha = np.array(2.0).astype(dtype)
+ delta = np.arange(100).astype(dtype)
+ self._testTypes(x, alpha, delta, use_gpu)
+
+ def _testTypesForAdagrad(self, x, y, lr, grad, use_gpu=None):
+ self.setUp()
+ with self.test_session(use_gpu=use_gpu):
+ var = variables.Variable(x)
+ accum = variables.Variable(y)
+ variables.initialize_all_variables().run()
+
+ self.assertAllEqual(x, var.eval())
+ apply_adagrad = training_ops.apply_adagrad(var, accum, lr, grad)
+ out = apply_adagrad.eval()
+ self.assertShapeEqual(out, apply_adagrad)
+ self.assertAllClose(
+ x - lr * grad * (y + grad * grad) ** (-0.5), out)
+ self.assertAllEqual(y + grad * grad, accum.eval())
+
+ def testApplyAdagrad(self):
+ for (dtype, use_gpu) in itertools.product(
+ [np.float32, np.float64], [False, True]):
+ x = np.arange(100).astype(dtype)
+ y = np.arange(1, 101).astype(dtype)
+ lr = np.array(2.0).astype(dtype)
+ grad = np.arange(100).astype(dtype)
+ self._testTypesForAdagrad(x, y, lr, grad, use_gpu)
+
+ def _testTypesForSparseAdagrad(self, x, y, lr, grad, indices):
+ self.setUp()
+ with self.test_session(use_gpu=False):
+ var = variables.Variable(x)
+ accum = variables.Variable(y)
+ variables.initialize_all_variables().run()
+
+ self.assertAllEqual(x, var.eval())
+ sparse_apply_adagrad = training_ops.sparse_apply_adagrad(
+ var, accum, lr, grad,
+ constant_op.constant(indices, self._toType(indices.dtype)))
+ out = sparse_apply_adagrad.eval()
+ self.assertShapeEqual(out, sparse_apply_adagrad)
+
+ for (i, index) in enumerate(indices):
+ self.assertAllClose(
+ x[index] - lr * grad[i] * (y[index] + grad[i] * grad[i]) ** (-0.5),
+ var.eval()[index])
+ self.assertAllEqual(y[index] + grad[i] * grad[i], accum.eval()[index])
+
+ def testSparseApplyAdagrad(self):
+ for (dtype, index_type) in itertools.product(
+ [np.float32, np.float64], [np.int32, np.int64]):
+ x_val = [range(10), range(10, 20), range(20, 30)]
+ y_val = [range(1, 11), range(11, 21), range(21, 31)]
+ x = np.array(x_val).astype(dtype)
+ y = np.array(y_val).astype(dtype)
+ lr = np.array(2.0).astype(dtype)
+ grad_val = [range(10), range(10)]
+ grad = np.array(grad_val).astype(dtype)
+ indices = np.array([0, 2]).astype(index_type)
+ self._testTypesForSparseAdagrad(x, y, lr, grad, indices)
+
+ def testApplyAdam(self):
+ for dtype, use_gpu in itertools.product(
+ [np.float32, np.float64], [False, True]):
+ var = np.arange(100).astype(dtype)
+ m = np.arange(1, 101).astype(dtype)
+ v = np.arange(101, 201).astype(dtype)
+ grad = np.arange(100).astype(dtype)
+ self._testTypesForAdam(var, m, v, grad, use_gpu)
+
+ def _testTypesForAdam(self, var, m, v, grad, use_gpu):
+ self.setUp()
+ with self.test_session(use_gpu=use_gpu):
+ var_t = variables.Variable(var)
+ m_t = variables.Variable(m)
+ v_t = variables.Variable(v)
+
+ t = 1
+ beta1 = np.array(0.9, dtype=var.dtype)
+ beta2 = np.array(0.999, dtype=var.dtype)
+ beta1_power = beta1**t
+ beta2_power = beta2**t
+ lr = np.array(0.001, dtype=var.dtype)
+ epsilon = np.array(1e-8, dtype=var.dtype)
+ beta1_t = constant_op.constant(beta1, self._toType(var.dtype), [])
+ beta2_t = constant_op.constant(beta2, self._toType(var.dtype), [])
+ beta1_power_t = variables.Variable(beta1_power)
+ beta2_power_t = variables.Variable(beta2_power)
+ lr_t = constant_op.constant(lr, self._toType(var.dtype), [])
+ epsilon_t = constant_op.constant(epsilon, self._toType(var.dtype), [])
+ variables.initialize_all_variables().run()
+
+ self.assertAllEqual(var, var_t.eval())
+ new_var, _, _ = self._adamUpdateNumpy(var, grad, t, m, v,
+ lr, beta1, beta2, epsilon)
+ apply_adam = training_ops.apply_adam(var_t, m_t, v_t, beta1_power_t,
+ beta2_power_t, lr_t,
+ beta1_t, beta2_t, epsilon_t, grad)
+ out = apply_adam.eval()
+ self.assertShapeEqual(out, apply_adam)
+ self.assertAllClose(new_var, out)
+
+ def _adamUpdateNumpy(self, param, g_t, t, m, v, alpha, beta1,
+ beta2, epsilon):
+ alpha_t = alpha * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
+
+ m_t = beta1 * m + (1 - beta1) * g_t
+ v_t = beta2 * v + (1 - beta2) * g_t * g_t
+
+ param_t = param - alpha_t * m_t / (np.sqrt(v_t) + epsilon)
+ return param_t, m_t, v_t
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/python/training/training_util.py b/tensorflow/python/training/training_util.py
new file mode 100644
index 0000000000..14166e25c6
--- /dev/null
+++ b/tensorflow/python/training/training_util.py
@@ -0,0 +1,57 @@
+"""Utility functions for training."""
+import os.path
+
+from tensorflow.python.platform import gfile
+
+
+def global_step(sess, global_step_tensor):
+ """Small helper to get the global step.
+
+ ```python
+ # Creates a variable to hold the global_step.
+ global_step_tensor = tf.Variable(10, trainable=False, name='global_step')
+ # Creates a session.
+ sess = tf.Session()
+ # Initializes the variable.
+ sess.run(global_step_tensor.initializer)
+ print 'global_step:', tf.train.global_step(sess, global_step_tensor)
+
+ global_step: 10
+ ```
+
+ Args:
+ sess: A TensorFlow `Session` object.
+ global_step_tensor: `Tensor` or the `name` of the operation that contains
+ the global step.
+
+ Returns:
+ The global step value.
+ """
+ return int(sess.run(global_step_tensor))
+
+
+def write_graph(graph_def, logdir, name, as_text=True):
+ """Writes a graph proto on disk.
+
+ The graph is written as a binary proto unless as_text is `True`.
+
+ ```python
+ v = tf.Variable(0, name='my_variable')
+ sess = tf.Session()
+ tf.train.write_graph(sess.graph_def, '/tmp/my-model', 'train.pbtxt')
+ ```
+
+ Args:
+ graph_def: A `GraphDef` protocol buffer.
+ logdir: Directory where to write the graph.
+ name: Filename for the graph.
+ as_text: If `True`, writes the graph as an ASCII proto.
+ """
+ path = os.path.join(logdir, name)
+ gfile.MakeDirs(os.path.dirname(path))
+ f = gfile.FastGFile(path, "w")
+ if as_text:
+ f.write(str(graph_def))
+ else:
+ f.write(graph_def.SerializeToString())
+ f.close()
diff --git a/tensorflow/python/user_ops/__init__.py b/tensorflow/python/user_ops/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/user_ops/__init__.py
diff --git a/tensorflow/python/user_ops/user_ops.py b/tensorflow/python/user_ops/user_ops.py
new file mode 100644
index 0000000000..20e2604e05
--- /dev/null
+++ b/tensorflow/python/user_ops/user_ops.py
@@ -0,0 +1,10 @@
+"""All user ops."""
+
+import tensorflow.python.platform
+from tensorflow.python.ops import gen_user_ops
+from tensorflow.python.ops.gen_user_ops import *
+
+
+def my_fact():
+ """Example of overriding the generated code for an Op."""
+ return gen_user_ops._fact()
diff --git a/tensorflow/python/util/__init__.py b/tensorflow/python/util/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/util/__init__.py
diff --git a/tensorflow/python/util/port.i b/tensorflow/python/util/port.i
new file mode 100644
index 0000000000..fdb217dcc7
--- /dev/null
+++ b/tensorflow/python/util/port.i
@@ -0,0 +1,11 @@
+%include "tensorflow/python/platform/base.i"
+
+%{
+#include "tensorflow/core/util/port.h"
+%}
+
+%ignoreall
+%unignore tensorflow;
+%unignore tensorflow::IsGoogleCudaEnabled;
+%include "tensorflow/core/util/port.h"
+%unignoreall
diff --git a/tensorflow/python/util/protobuf/__init__.py b/tensorflow/python/util/protobuf/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/python/util/protobuf/__init__.py
diff --git a/tensorflow/python/util/protobuf/compare.py b/tensorflow/python/util/protobuf/compare.py
new file mode 100644
index 0000000000..19f7128f4e
--- /dev/null
+++ b/tensorflow/python/util/protobuf/compare.py
@@ -0,0 +1,384 @@
+#!/usr/bin/python2.4
+
+"""Utility functions for comparing proto2 messages in Python.
+
+Proto2Cmp() is a cmp-style comparison function. It can be passed to sort(), etc.
+See its docstring for details.
+
+ClearDefaultValuedFields() recursively clears the fields that are set to their
+default values. This is useful for comparing protocol buffers where the
+semantics of unset fields and default valued fields are the same.
+
+NormalizeRepeatedFields() sorts and optionally de-dupes repeated fields. This
+is useful for treating repeated fields as sets instead of lists.
+
+assertProto2Equal() and assertProto2SameElements() are useful for unit tests.
+They produce much more helpful output than assertEqual() and friends for proto2
+messages, e.g. this:
+
+ outer {
+ inner {
+- strings: "x"
+? ^
++ strings: "y"
+? ^
+ }
+ }
+
+...compared to the default output from assertEqual() that looks like this:
+
+AssertionError: <my.Msg object at 0x9fb353c> != <my.Msg object at 0x9fb35cc>
+
+Call them inside your unit test's googletest.TestCase subclasses like this:
+
+ from tensorflow.python.util.protobuf import compare
+
+ class MyTest(googletest.TestCase):
+ ...
+ def testXXX(self):
+ ...
+ compare.assertProto2Equal(self, a, b)
+ compare.assertProto2SameElements(self, a, c)
+
+Alternatively:
+
+ from tensorflow.python.util.protobuf import compare
+
+ class MyTest(compare.Proto2Assertions, googletest.TestCase):
+ ...
+ def testXXX(self):
+ ...
+ self.assertProto2Equal(a, b)
+ self.assertProto2SameElements(a, c)
+"""
+
+import copy
+
+from google.protobuf import descriptor
+from google.protobuf import message
+from google.protobuf import text_format
+
+
+def assertProto2Equal(self, a, b, check_initialized=True,
+ normalize_numbers=False, msg=None):
+ """Fails with a useful error if a and b aren't equal.
+
+ Comparison of repeated fields matches the semantics of
+ unittest.TestCase.assertEqual(), i.e. order and extra duplicate fields matter.
+
+ Args:
+ self: googletest.TestCase
+ a: proto2 PB instance, or text string representing one
+ b: proto2 PB instance -- message.Message or subclass thereof
+ check_initialized: boolean, whether to fail if either a or b isn't
+ initialized
+ normalize_numbers: boolean, whether to normalize types and precision of
+ numbers before comparison.
+ msg: if specified, is used as the error message on failure
+ """
+ if isinstance(a, basestring):
+ a = text_format.Merge(a, b.__class__())
+
+ for pb in a, b:
+ if check_initialized:
+ errors = pb.FindInitializationErrors()
+ if errors:
+ self.fail('Initialization errors: %s\n%s' % (errors, pb))
+ if normalize_numbers:
+ NormalizeNumberFields(pb)
+
+ self.assertMultiLineEqual(text_format.MessageToString(a),
+ text_format.MessageToString(b),
+ msg=msg)
+
+
+def assertProto2SameElements(self, a, b, number_matters=False,
+ check_initialized=True, normalize_numbers=False,
+ msg=None):
+ """Fails with a useful error if a and b aren't equivalent.
+
+ When comparing repeated fields, order doesn't matter and the number of times
+ each element appears (i.e. duplicates) only matters if number_matters is True.
+
+ By default, comparison of repeated fields follows set semantics and matches
+ googletest.TestCase.assertSameElements(): neither order nor number of a given
+ element matters.
+
+ Args:
+ self: googletest.TestCase
+ a: proto2 PB instance, or text string representing one
+ b: proto2 PB instance -- message.Message or subclass thereof
+ number_matters: boolean, whether the number of each element must match
+ check_initialized: boolean, whether to fail if either a or b isn't
+ initialized
+ normalize_numbers: boolean, whether to normalize types and precision of
+ numbers before comparison.
+ msg: if specified, is used as the error message on failure
+ """
+ if isinstance(a, basestring):
+ a = text_format.Merge(a, b.__class__())
+ else:
+ a = copy.deepcopy(a)
+ b = copy.deepcopy(b)
+ for pb in a, b:
+ NormalizeRepeatedFields(pb, dedupe=not number_matters)
+ assertProto2Equal(
+ self, a, b, check_initialized=check_initialized,
+ normalize_numbers=normalize_numbers, msg=msg)
+
+
+def assertProto2Contains(self, a, b, # pylint: disable=invalid-name
+ number_matters=False, check_initialized=True,
+ msg=None):
+ """Fails with a useful error if fields in a are not in b.
+
+ Useful to test whether expected fields are in b; allows tests to define
+ expected fields in string format.
+
+ Example:
+ compare.assertProto2Contains('group { field: "value" }', test_pb2)
+
+ Args:
+ self: googletest.TestCase
+ a: proto2 PB instance, or text string representing one
+ b: proto2 PB instance
+ number_matters: boolean, whether the number of each field must match
+ check_initialized: boolean, whether to fail if b isn't initialized
+ msg: if specified, is used as the error message on failure
+ """
+ if isinstance(a, basestring):
+ a = text_format.Merge(a, b.__class__())
+ else:
+ a = copy.deepcopy(a)
+ completed_a = copy.deepcopy(b)
+ completed_a.MergeFrom(a)
+ assertProto2SameElements(self, completed_a, b, number_matters=number_matters,
+ check_initialized=check_initialized, msg=msg)
+
+
+def ClearDefaultValuedFields(pb):
+ """Clears all fields in a proto2 message that are set to their default values.
+
+ The result has a more compact text / json / binary representation. It's also
+ easier to compare to other protos when the distinction between unset fields
+ and fields set to their default values doesn't change the proto's semantics.
+
+ Args:
+ pb: A proto2 message.
+ """
+ for field, value in pb.ListFields():
+ if field.type == field.TYPE_MESSAGE:
+ if field.label == field.LABEL_REPEATED:
+ for item in value:
+ ClearDefaultValuedFields(item)
+ else:
+ ClearDefaultValuedFields(value)
+ if field.label == field.LABEL_OPTIONAL and not value.ListFields():
+ pb.ClearField(field.name)
+ elif field.label == field.LABEL_OPTIONAL and value == field.default_value:
+ pb.ClearField(field.name)
+
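+# For example (hypothetical message): if `msg.x` was explicitly set to 0 and 0
+# is the default for `x`, `ClearDefaultValuedFields(msg)` clears `x`, so `msg`
+# then compares equal to a message that never set `x` at all.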
+
+def NormalizeRepeatedFields(pb, dedupe=True):
+ """Sorts all repeated fields and optionally removes duplicates.
+
+ Modifies pb in place. Recurses into nested objects. Uses Proto2Cmp for
+ sorting.
+
+ Args:
+ pb: proto2 message
+ dedupe: boolean, whether to remove duplicates
+
+ Returns:
+ the given pb, modified in place
+ """
+ for desc, values in pb.ListFields():
+ if desc.label is not descriptor.FieldDescriptor.LABEL_REPEATED:
+ values = [values]
+
+ if (desc.type == descriptor.FieldDescriptor.TYPE_MESSAGE and
+ desc.message_type.has_options and
+ desc.message_type.GetOptions().map_entry):
+ # This is a map, only recurse if the values have a message type.
+ if (desc.message_type.fields_by_number[2].type ==
+ descriptor.FieldDescriptor.TYPE_MESSAGE):
+ for v in values.itervalues():
+ NormalizeRepeatedFields(v, dedupe=dedupe)
+ else:
+ if (desc.type == descriptor.FieldDescriptor.TYPE_MESSAGE or
+ desc.type == descriptor.FieldDescriptor.TYPE_GROUP):
+ for v in values:
+ # recursive step
+ NormalizeRepeatedFields(v, dedupe=dedupe)
+
+ values.sort(Proto2Cmp)
+
+ if dedupe:
+ # De-dupe in place. Can't use set, etc. because messages aren't
+ # hashable. This is a heavily discussed toy problem. The code below is
+ # a simplified version of http://code.activestate.com/recipes/52560/
+ # and it requires that values is sorted.
+ for i in xrange(len(values) - 1, 0, -1):
+ if values[i] == values[i - 1]:
+ del values[i]
+
+ return pb
+
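+# For example (hypothetical message): a repeated field holding [3, 1, 3]
+# becomes [1, 3] after `NormalizeRepeatedFields(msg)` (sorted and de-duped),
+# or [1, 3, 3] with `dedupe=False`.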
+
+def NormalizeNumberFields(pb):
+ """Normalizes types and precisions of number fields in a protocol buffer.
+
+ Due to subtleties in the python protocol buffer implementation, it is possible
+ for values to have different types and precision depending on whether they
+ were set and retrieved directly or deserialized from a protobuf. This function
+  normalizes integer values to ints and longs based on width, rounds 32-bit
+  floats to six decimal places (since python always stores them as 64-bit
+  doubles), and converts doubles that were set to integer values to floating
+  point, rounding doubles to seven decimal places.
+
+ Modifies pb in place. Recurses into nested objects.
+
+ Args:
+ pb: proto2 message
+
+ Returns:
+ the given pb, modified in place
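+
+  Example (an illustrative sketch; 'ratio' stands for a hypothetical
+  'optional float ratio = 1;' field):
+    pb.ratio = 0.1111114   # held with 64-bit precision by python
+    NormalizeNumberFields(pb)
+    # pb.ratio is now rounded to six decimal places: 0.111111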
+ """
+ for desc, values in pb.ListFields():
+ is_repeated = True
+ if desc.label is not descriptor.FieldDescriptor.LABEL_REPEATED:
+ is_repeated = False
+ values = [values]
+
+ normalized_values = None
+
+ # We force 32-bit values to int and 64-bit values to long to make
+ # alternate implementations where the distinction is more significant
+ # (e.g. the C++ implementation) simpler.
+ if desc.type in (descriptor.FieldDescriptor.TYPE_INT64,
+ descriptor.FieldDescriptor.TYPE_UINT64,
+ descriptor.FieldDescriptor.TYPE_SINT64):
+ normalized_values = [long(x) for x in values]
+ elif desc.type in (descriptor.FieldDescriptor.TYPE_INT32,
+ descriptor.FieldDescriptor.TYPE_UINT32,
+ descriptor.FieldDescriptor.TYPE_SINT32,
+ descriptor.FieldDescriptor.TYPE_ENUM):
+ normalized_values = [int(x) for x in values]
+ elif desc.type == descriptor.FieldDescriptor.TYPE_FLOAT:
+ normalized_values = [round(x, 6) for x in values]
+ elif desc.type == descriptor.FieldDescriptor.TYPE_DOUBLE:
+ normalized_values = [round(float(x), 7) for x in values]
+
+ if normalized_values is not None:
+ if is_repeated:
+ pb.ClearField(desc.name)
+ getattr(pb, desc.name).extend(normalized_values)
+ else:
+ setattr(pb, desc.name, normalized_values[0])
+
+ if (desc.type == descriptor.FieldDescriptor.TYPE_MESSAGE or
+ desc.type == descriptor.FieldDescriptor.TYPE_GROUP):
+ if (desc.type == descriptor.FieldDescriptor.TYPE_MESSAGE and
+ desc.message_type.has_options and
+ desc.message_type.GetOptions().map_entry):
+ # This is a map, only recurse if the values have a message type.
+ if (desc.message_type.fields_by_number[2].type ==
+ descriptor.FieldDescriptor.TYPE_MESSAGE):
+ for v in values.itervalues():
+ NormalizeNumberFields(v)
+ else:
+ for v in values:
+ # recursive step
+ NormalizeNumberFields(v)
+
+ return pb
+
+
+def _IsRepeatedContainer(value):
+ if isinstance(value, basestring):
+ return False
+ try:
+ iter(value)
+ return True
+ except TypeError:
+ return False
+
+
+def Proto2Cmp(a, b):
+ """Compares two proto2 objects field by field, in ascending tag order.
+
+ Recurses into nested messages. Uses list (not set) semantics for comparing
+ repeated fields, ie duplicates and order matter. If one field is a prefix of
+ the other, the longer field is greater.
+
+ This function is intended to be used as a python cmp function, e.g. in sort.
+
+ Ordering fields by tag number has precedent in other google code, but it's
+ still somewhat arbitrary. The main value is to provide *some* stable ordering
+ for proto2 messages.
+
+  This would be easier as a __cmp__ method or a set of __le__, __gt__, etc. methods
+ in the proto2 Message class itself. That would take a little more care,
+ though, and probably some significant debate over whether they should exist at
+ all, so this was easier.
+
+ Args:
+ a, b: proto2 messages or primitives
+
+ Returns: integer > 0 if a > b, < 0 if a < b, 0 if a == b
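+
+  Example (an illustrative sketch):
+    pbs.sort(Proto2Cmp)  # sorts a list of proto2 messages into a stable order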
+ """
+ def Format(pb):
+ """Returns a dictionary that maps tag number (for messages) or element index
+ (for repeated fields) to value, or just pb unchanged if it's neither."""
+ if isinstance(pb, message.Message):
+ return dict((desc.number, value) for desc, value in pb.ListFields())
+ elif _IsRepeatedContainer(pb):
+ return dict(enumerate(list(pb)))
+ else:
+ return pb
+
+ a, b = Format(a), Format(b)
+
+ # base case
+ if not isinstance(a, dict) or not isinstance(b, dict):
+ return cmp(a, b)
+
+ # this list performs double duty: it compares two messages by tag value *or*
+  # two repeated fields by element, in order. The magic is in the Format()
+ # function, which converts them both to the same easily comparable format.
+ for tag in sorted(set(a.keys() + b.keys())):
+ if tag not in a:
+ return -1 # b is greater
+ elif tag not in b:
+ return 1 # a is greater
+ else:
+ # recursive step
+ cmped = Proto2Cmp(a[tag], b[tag])
+ if cmped != 0:
+ return cmped
+
+ # didn't find any values that differed, so they're equal!
+ return 0
+
+
+class Proto2Assertions(object):
+ """Mix this into a googletest.TestCase class to get proto2 assertions.
+
+ Usage:
+
+ class SomeTestCase(compare.Proto2Assertions, googletest.TestCase):
+ ...
+ def testSomething(self):
+ ...
+ self.assertProto2Equal(a, b)
+
+ See module-level definitions for method documentation.
+ """
+
+ # pylint: disable=invalid-name
+ def assertProto2Equal(self, *args, **kwargs):
+ return assertProto2Equal(self, *args, **kwargs)
+
+ def assertProto2SameElements(self, *args, **kwargs):
+ return assertProto2SameElements(self, *args, **kwargs)
+
+ def assertProto2Contains(self, *args, **kwargs):
+ return assertProto2Contains(self, *args, **kwargs)
diff --git a/tensorflow/python/util/protobuf/compare_test.proto b/tensorflow/python/util/protobuf/compare_test.proto
new file mode 100644
index 0000000000..fa0b5de9f0
--- /dev/null
+++ b/tensorflow/python/util/protobuf/compare_test.proto
@@ -0,0 +1,49 @@
+// Test messages used in compare_test.py.
+syntax = "proto2";
+
+package compare_test;
+// option cc_enable_arenas = true;
+
+enum Enum {
+ A = 0;
+ B = 1;
+ C = 2;
+}
+
+message Small {
+ repeated string strings = 1;
+};
+
+message Medium {
+ repeated int32 int32s = 1;
+ repeated Small smalls = 2;
+ repeated group GroupA = 3 {
+ repeated group GroupB = 4 {
+ required string strings = 5;
+ }
+ }
+ repeated float floats = 6;
+};
+
+message Large {
+ optional string string_ = 1;
+ optional int64 int64_ = 2;
+ optional float float_ = 3;
+ optional bool bool_ = 4;
+ optional Enum enum_ = 5;
+ repeated int64 int64s = 6;
+ optional Medium medium = 7;
+ optional Small small = 8;
+ optional double double_ = 9;
+ optional WithMap with_map = 10;
+};
+
+message Labeled {
+ required int32 required = 1;
+ optional int32 optional = 2;
+}
+
+message WithMap {
+ map<int32, Small> value_message = 1;
+ map<string, string> value_string = 2;
+}
diff --git a/tensorflow/python/util/protobuf/compare_test.py b/tensorflow/python/util/protobuf/compare_test.py
new file mode 100644
index 0000000000..9a03d123ae
--- /dev/null
+++ b/tensorflow/python/util/protobuf/compare_test.py
@@ -0,0 +1,652 @@
+#!/usr/bin/python2.4
+
+"""Tests for python.util.protobuf.compare."""
+
+import copy
+import re
+import textwrap
+
+from tensorflow.python.platform import googletest
+from tensorflow.python.util.protobuf import compare
+from tensorflow.python.util.protobuf import compare_test_pb2
+
+from google.protobuf import text_format
+
+
+def LargePbs(*args):
+ """Converts ASCII string Large PBs to messages."""
+ pbs = []
+ for arg in args:
+ pb = compare_test_pb2.Large()
+ text_format.Merge(arg, pb)
+ pbs.append(pb)
+
+ return pbs
+
+
+class Proto2CmpTest(googletest.TestCase):
+
+ def assertGreater(self, a, b):
+ """Asserts that Proto2Cmp says a > b."""
+ a, b = LargePbs(a, b)
+ googletest.TestCase.assertGreater(self, compare.Proto2Cmp(a, b), 0)
+ googletest.TestCase.assertLess(self, compare.Proto2Cmp(b, a), 0)
+
+ def assertEquals(self, a, b):
+ """Asserts that Proto2Cmp says a == b."""
+ a, b = LargePbs(a, b)
+ googletest.TestCase.assertEquals(self, compare.Proto2Cmp(a, b), 0)
+
+ def testPrimitives(self):
+ googletest.TestCase.assertEqual(self, 0, compare.Proto2Cmp('a', 'a'))
+ googletest.TestCase.assertLess(self, 0, compare.Proto2Cmp('b', 'a'))
+
+ pb = compare_test_pb2.Large()
+ googletest.TestCase.assertEquals(self, cmp('a', pb), compare.Proto2Cmp('a', pb))
+ googletest.TestCase.assertEqual(self, cmp(pb, 'a'), compare.Proto2Cmp(pb, 'a'))
+
+ def testEmpty(self):
+ self.assertEquals('', '')
+
+ def testPrimitiveFields(self):
+ self.assertGreater('string_: "a"', '')
+ self.assertEquals('string_: "a"', 'string_: "a"')
+ self.assertGreater('string_: "b"', 'string_: "a"')
+ self.assertGreater('string_: "ab"', 'string_: "aa"')
+
+ self.assertGreater('int64_: 0', '')
+ self.assertEquals('int64_: 0', 'int64_: 0')
+ self.assertGreater('int64_: -1', '')
+ self.assertGreater('int64_: 1', 'int64_: 0')
+ self.assertGreater('int64_: 0', 'int64_: -1')
+
+ self.assertGreater('float_: 0.0', '')
+ self.assertEquals('float_: 0.0', 'float_: 0.0')
+ self.assertGreater('float_: -0.1', '')
+ self.assertGreater('float_: 3.14', 'float_: 0')
+ self.assertGreater('float_: 0', 'float_: -0.1')
+ self.assertEquals('float_: -0.1', 'float_: -0.1')
+
+ self.assertGreater('bool_: true', '')
+ self.assertGreater('bool_: false', '')
+ self.assertGreater('bool_: true', 'bool_: false')
+ self.assertEquals('bool_: false', 'bool_: false')
+ self.assertEquals('bool_: true', 'bool_: true')
+
+ self.assertGreater('enum_: A', '')
+ self.assertGreater('enum_: B', 'enum_: A')
+ self.assertGreater('enum_: C', 'enum_: B')
+ self.assertEquals('enum_: C', 'enum_: C')
+
+ def testRepeatedPrimitives(self):
+ self.assertGreater('int64s: 0', '')
+ self.assertEquals('int64s: 0', 'int64s: 0')
+ self.assertGreater('int64s: 1', 'int64s: 0')
+ self.assertGreater('int64s: 0 int64s: 0', '')
+ self.assertGreater('int64s: 0 int64s: 0', 'int64s: 0')
+ self.assertGreater('int64s: 1 int64s: 0', 'int64s: 0')
+ self.assertGreater('int64s: 0 int64s: 1', 'int64s: 0')
+ self.assertGreater('int64s: 1', 'int64s: 0 int64s: 2')
+ self.assertGreater('int64s: 2 int64s: 0', 'int64s: 1')
+ self.assertEquals('int64s: 0 int64s: 0', 'int64s: 0 int64s: 0')
+ self.assertEquals('int64s: 0 int64s: 1', 'int64s: 0 int64s: 1')
+ self.assertGreater('int64s: 1 int64s: 0', 'int64s: 0 int64s: 0')
+ self.assertGreater('int64s: 1 int64s: 0', 'int64s: 0 int64s: 1')
+ self.assertGreater('int64s: 1 int64s: 0', 'int64s: 0 int64s: 2')
+ self.assertGreater('int64s: 1 int64s: 1', 'int64s: 1 int64s: 0')
+ self.assertGreater('int64s: 1 int64s: 1', 'int64s: 1 int64s: 0 int64s: 2')
+
+ def testMessage(self):
+ self.assertGreater('small <>', '')
+ self.assertEquals('small <>', 'small <>')
+ self.assertGreater('small < strings: "a" >', '')
+ self.assertGreater('small < strings: "a" >', 'small <>')
+ self.assertEquals('small < strings: "a" >', 'small < strings: "a" >')
+ self.assertGreater('small < strings: "b" >', 'small < strings: "a" >')
+ self.assertGreater('small < strings: "a" strings: "b" >',
+ 'small < strings: "a" >')
+
+ self.assertGreater('string_: "a"', 'small <>')
+ self.assertGreater('string_: "a"', 'small < strings: "b" >')
+ self.assertGreater('string_: "a"', 'small < strings: "b" strings: "c" >')
+ self.assertGreater('string_: "a" small <>', 'small <>')
+ self.assertGreater('string_: "a" small <>', 'small < strings: "b" >')
+ self.assertEquals('string_: "a" small <>', 'string_: "a" small <>')
+ self.assertGreater('string_: "a" small < strings: "a" >',
+ 'string_: "a" small <>')
+ self.assertEquals('string_: "a" small < strings: "a" >',
+ 'string_: "a" small < strings: "a" >')
+ self.assertGreater('string_: "a" small < strings: "a" >',
+ 'int64_: 1 small < strings: "a" >')
+ self.assertGreater('string_: "a" small < strings: "a" >', 'int64_: 1')
+ self.assertGreater('string_: "a"', 'int64_: 1 small < strings: "a" >')
+ self.assertGreater('string_: "a" int64_: 0 small < strings: "a" >',
+ 'int64_: 1 small < strings: "a" >')
+ self.assertGreater('string_: "a" int64_: 1 small < strings: "a" >',
+ 'string_: "a" int64_: 0 small < strings: "a" >')
+ self.assertEquals('string_: "a" int64_: 0 small < strings: "a" >',
+ 'string_: "a" int64_: 0 small < strings: "a" >')
+
+ def testNestedMessage(self):
+ self.assertGreater('medium <>', '')
+ self.assertEquals('medium <>', 'medium <>')
+ self.assertGreater('medium < smalls <> >', 'medium <>')
+ self.assertEquals('medium < smalls <> >', 'medium < smalls <> >')
+ self.assertGreater('medium < smalls <> smalls <> >', 'medium < smalls <> >')
+ self.assertEquals('medium < smalls <> smalls <> >',
+ 'medium < smalls <> smalls <> >')
+
+ self.assertGreater('medium < int32s: 0 >', 'medium < smalls <> >')
+
+ self.assertGreater('medium < smalls < strings: "a"> >',
+ 'medium < smalls <> >')
+
+ def testTagOrder(self):
+ """Tests that different fields are ordered by tag number.
+
+ For reference, here are the relevant tag numbers from compare_test.proto:
+ optional string string_ = 1;
+ optional int64 int64_ = 2;
+ optional float float_ = 3;
+      optional Medium medium = 7;
+      optional Small small = 8;
+ """
+ self.assertGreater('string_: "a" ',
+ ' int64_: 1 ')
+ self.assertGreater('string_: "a" int64_: 2 ',
+ ' int64_: 1 ')
+ self.assertGreater('string_: "b" int64_: 1 ',
+ 'string_: "a" int64_: 2 ')
+ self.assertEquals( 'string_: "a" int64_: 1 ',
+ 'string_: "a" int64_: 1 ')
+ self.assertGreater('string_: "a" int64_: 1 float_: 0.0',
+ 'string_: "a" int64_: 1 ')
+ self.assertEquals( 'string_: "a" int64_: 1 float_: 0.0',
+ 'string_: "a" int64_: 1 float_: 0.0')
+ self.assertGreater('string_: "a" int64_: 1 float_: 0.1',
+ 'string_: "a" int64_: 1 float_: 0.0')
+ self.assertGreater('string_: "a" int64_: 2 float_: 0.0',
+ 'string_: "a" int64_: 1 float_: 0.1')
+ self.assertGreater('string_: "a" ',
+ ' int64_: 1 float_: 0.1')
+ self.assertGreater('string_: "a" float_: 0.0',
+ ' int64_: 1 ')
+ self.assertGreater('string_: "b" float_: 0.0',
+ 'string_: "a" int64_: 1 ')
+
+ self.assertGreater('string_: "a"',
+ 'small < strings: "a" >')
+ self.assertGreater('string_: "a" small < strings: "a" >',
+ 'small < strings: "b" >')
+ self.assertGreater('string_: "a" small < strings: "b" >',
+ 'string_: "a" small < strings: "a" >')
+ self.assertEquals('string_: "a" small < strings: "a" >',
+ 'string_: "a" small < strings: "a" >')
+
+ self.assertGreater('string_: "a" medium <>',
+ 'string_: "a" small < strings: "a" >')
+ self.assertGreater('string_: "a" medium < smalls <> >',
+ 'string_: "a" small < strings: "a" >')
+ self.assertGreater('medium <>', 'small < strings: "a" >')
+ self.assertGreater('medium <> small <>', 'small < strings: "a" >')
+ self.assertGreater('medium < smalls <> >', 'small < strings: "a" >')
+ self.assertGreater('medium < smalls < strings: "a" > >',
+ 'small < strings: "b" >')
+
+
+class NormalizeRepeatedFieldsTest(googletest.TestCase):
+
+ def assertNormalizes(self, orig, expected_no_dedupe, expected_dedupe):
+ """Checks NormalizeRepeatedFields(orig) against the two expected results."""
+ orig, expected_no_dedupe, expected_dedupe = LargePbs(
+ orig, expected_no_dedupe, expected_dedupe)
+
+ actual = compare.NormalizeRepeatedFields(copy.deepcopy(orig), dedupe=False)
+ self.assertEqual(expected_no_dedupe, actual)
+
+ actual = compare.NormalizeRepeatedFields(copy.deepcopy(orig), dedupe=True)
+ self.assertEqual(expected_dedupe, actual)
+
+ def testIgnoreNonRepeatedFields(self):
+ orig = """string_: "a" int64_: 1 float_: 0.1 bool_: true enum_: A
+ medium: {} small: {}"""
+ self.assertNormalizes(orig, orig, orig)
+
+ def testRepeatedPrimitive(self):
+ self.assertNormalizes('int64s: 3 int64s: -1 int64s: 2 int64s: -1 int64s: 3',
+ 'int64s: -1 int64s: -1 int64s: 2 int64s: 3 int64s: 3',
+ 'int64s: -1 int64s: 2 int64s: 3')
+
+ def testRepeatedMessage(self):
+ self.assertNormalizes("""medium: { smalls: { strings: "c" }
+ smalls: { strings: "a" }
+ smalls: { strings: "b" }
+ smalls: { strings: "a" }
+ smalls: { strings: "c" } }
+ """,
+ """medium: { smalls: { strings: "a" }
+ smalls: { strings: "a" }
+ smalls: { strings: "b" }
+ smalls: { strings: "c" }
+ smalls: { strings: "c" } }
+ """,
+ """medium: { smalls: { strings: "a" }
+ smalls: { strings: "b" }
+ smalls: { strings: "c" } }
+ """)
+
+ def testNestedRepeatedGroup(self):
+ self.assertNormalizes("""medium { GroupA { GroupB { strings: "c" }
+ GroupB { strings: "a" }
+ GroupB { strings: "b" }
+ GroupB { strings: "a" }
+ GroupB { strings: "c" } } }
+ """,
+ """medium { GroupA { GroupB { strings: "a" }
+ GroupB { strings: "a" }
+ GroupB { strings: "b" }
+ GroupB { strings: "c" }
+ GroupB { strings: "c" } } }
+ """,
+ """medium { GroupA { GroupB { strings: "a" }
+ GroupB { strings: "b" }
+ GroupB { strings: "c" } } }
+ """)
+
+ def testMapNormalizes(self):
+ self.assertNormalizes(
+ """with_map: { value_message: { key: 2, value: { strings: "k2v1",
+ strings: "k2v2",
+ strings: "k2v1" } },
+ value_message: { key: 1, value: { strings: "k1v2",
+ strings: "k1v1" } } }
+ """,
+ """with_map: { value_message: { key: 1, value: { strings: "k1v1",
+ strings: "k1v2" } },
+ value_message: { key: 2, value: { strings: "k2v1",
+ strings: "k2v1",
+ strings: "k2v2" } } }
+ """,
+ """with_map: { value_message: { key: 1, value: { strings: "k1v1",
+ strings: "k1v2" } },
+ value_message: { key: 2, value: { strings: "k2v1",
+ strings: "k2v2" } } }
+ """)
+
+
+class NormalizeNumbersTest(googletest.TestCase):
+ """Tests for NormalizeNumberFields()."""
+
+ def testNormalizesInts(self):
+ pb = compare_test_pb2.Large()
+ pb.int64_ = 4
+ compare.NormalizeNumberFields(pb)
+ self.assertTrue(isinstance(pb.int64_, long))
+
+ pb.int64_ = 4L
+ compare.NormalizeNumberFields(pb)
+ self.assertTrue(isinstance(pb.int64_, long))
+
+ pb.int64_ = 9999999999999999L
+ compare.NormalizeNumberFields(pb)
+ self.assertTrue(isinstance(pb.int64_, long))
+
+ def testNormalizesRepeatedInts(self):
+ pb = compare_test_pb2.Large()
+ pb.int64s.extend([1L, 400, 999999999999999L])
+ compare.NormalizeNumberFields(pb)
+ self.assertTrue(isinstance(pb.int64s[0], long))
+ self.assertTrue(isinstance(pb.int64s[1], long))
+ self.assertTrue(isinstance(pb.int64s[2], long))
+
+ def testNormalizesFloats(self):
+ pb1 = compare_test_pb2.Large()
+ pb1.float_ = 1.2314352351231
+ pb2 = compare_test_pb2.Large()
+ pb2.float_ = 1.231435
+ self.assertNotEqual(pb1.float_, pb2.float_)
+ compare.NormalizeNumberFields(pb1)
+ compare.NormalizeNumberFields(pb2)
+ self.assertEqual(pb1.float_, pb2.float_)
+
+ def testNormalizesRepeatedFloats(self):
+ pb = compare_test_pb2.Large()
+ pb.medium.floats.extend([0.111111111, 0.111111])
+ compare.NormalizeNumberFields(pb)
+ for value in pb.medium.floats:
+ self.assertAlmostEqual(0.111111, value)
+
+ def testNormalizesDoubles(self):
+ pb1 = compare_test_pb2.Large()
+ pb1.double_ = 1.2314352351231
+ pb2 = compare_test_pb2.Large()
+ pb2.double_ = 1.2314352
+ self.assertNotEqual(pb1.double_, pb2.double_)
+ compare.NormalizeNumberFields(pb1)
+ compare.NormalizeNumberFields(pb2)
+ self.assertEqual(pb1.double_, pb2.double_)
+
+ def testNormalizesMaps(self):
+ pb = compare_test_pb2.WithMap()
+ pb.value_message[4].strings.extend(['a', 'b', 'c'])
+ pb.value_string['d'] = 'e'
+ compare.NormalizeNumberFields(pb)
+
+
+class AssertTest(googletest.TestCase):
+ """Tests both assertProto2Equal() and assertProto2SameElements()."""
+ def assertProto2Equal(self, a, b, **kwargs):
+ if isinstance(a, basestring) and isinstance(b, basestring):
+ a, b = LargePbs(a, b)
+ compare.assertProto2Equal(self, a, b, **kwargs)
+
+ def assertProto2SameElements(self, a, b, **kwargs):
+ if isinstance(a, basestring) and isinstance(b, basestring):
+ a, b = LargePbs(a, b)
+ compare.assertProto2SameElements(self, a, b, **kwargs)
+
+ def assertAll(self, a, **kwargs):
+ """Checks that all possible asserts pass."""
+ self.assertProto2Equal(a, a, **kwargs)
+ self.assertProto2SameElements(a, a, number_matters=False, **kwargs)
+ self.assertProto2SameElements(a, a, number_matters=True, **kwargs)
+
+ def assertSameNotEqual(self, a, b):
+ """Checks that assertProto2SameElements() passes with number_matters=False
+ and number_matters=True but not assertProto2Equal().
+ """
+ self.assertProto2SameElements(a, b, number_matters=False)
+ self.assertProto2SameElements(a, b, number_matters=True)
+ self.assertRaises(AssertionError, self.assertProto2Equal, a, b)
+
+ def assertSameExceptNumber(self, a, b):
+ """Checks that assertProto2SameElements() passes with number_matters=False
+ but not number_matters=True or assertProto2Equal().
+ """
+ self.assertProto2SameElements(a, b, number_matters=False)
+ self.assertRaises(AssertionError, self.assertProto2SameElements, a, b,
+ number_matters=True)
+ self.assertRaises(AssertionError, self.assertProto2Equal, a, b)
+
+ def assertNone(self, a, b, message, **kwargs):
+ """Checks that all possible asserts fail with the given message."""
+ message = re.escape(textwrap.dedent(message))
+ self.assertRaisesRegexp(AssertionError, message,
+ self.assertProto2SameElements, a, b,
+ number_matters=False, **kwargs)
+ self.assertRaisesRegexp(AssertionError, message,
+ self.assertProto2SameElements, a, b,
+ number_matters=True, **kwargs)
+ self.assertRaisesRegexp(AssertionError, message,
+ self.assertProto2Equal, a, b, **kwargs)
+
+ def testCheckInitialized(self):
+ # neither is initialized
+ a = compare_test_pb2.Labeled()
+ a.optional = 1
+ self.assertNone(a, a, 'Initialization errors: ', check_initialized=True)
+ self.assertAll(a, check_initialized=False)
+
+ # a is initialized, b isn't
+ b = copy.deepcopy(a)
+ a.required = 2
+ self.assertNone(a, b, 'Initialization errors: ', check_initialized=True)
+ self.assertNone(a, b,
+ """
+ - required: 2
+ optional: 1
+ """,
+ check_initialized=False)
+
+ # both are initialized
+ a = compare_test_pb2.Labeled()
+ a.required = 2
+ self.assertAll(a, check_initialized=True)
+ self.assertAll(a, check_initialized=False)
+
+ b = copy.deepcopy(a)
+ b.required = 3
+ message = """
+ - required: 2
+ ? ^
+ + required: 3
+ ? ^
+ """
+ self.assertNone(a, b, message, check_initialized=True)
+ self.assertNone(a, b, message, check_initialized=False)
+
+ def testAssertEqualWithStringArg(self):
+ pb = compare_test_pb2.Large()
+ pb.string_ = 'abc'
+ pb.float_ = 1.234
+ compare.assertProto2Equal(
+ self,
+ """
+ string_: 'abc'
+ float_: 1.234
+ """,
+ pb)
+
+ def testAssertSameElementsWithStringArg(self):
+ pb = compare_test_pb2.Large()
+ pb.string_ = 'abc'
+ pb.float_ = 1.234
+ pb.int64s.extend([7, 3, 5])
+ compare.assertProto2SameElements(
+ self,
+ """
+ string_: 'abc'
+ float_: 1.234
+ int64s: 3
+ int64s: 7
+ int64s: 5
+ """,
+ pb)
+
+ def testProto2ContainsString(self):
+ pb = compare_test_pb2.Large()
+ pb.string_ = 'abc'
+ pb.float_ = 1.234
+ pb.small.strings.append('xyz')
+ compare.assertProto2Contains(
+ self,
+ """
+ small {
+ strings: "xyz"
+ }
+ """,
+ pb)
+
+ def testProto2ContainsProto(self):
+ pb = compare_test_pb2.Large()
+ pb.string_ = 'abc'
+ pb.float_ = 1.234
+ pb.small.strings.append('xyz')
+ pb2 = compare_test_pb2.Large()
+ pb2.small.strings.append('xyz')
+ compare.assertProto2Contains(
+ self, pb2, pb)
+
+ def testNormalizesNumbers(self):
+ pb1 = compare_test_pb2.Large()
+ pb1.int64_ = 4
+ pb2 = compare_test_pb2.Large()
+ pb2.int64_ = 4L
+ compare.assertProto2Equal(self, pb1, pb2)
+
+ def testNormalizesFloat(self):
+ pb1 = compare_test_pb2.Large()
+ pb1.double_ = 4.0
+ pb2 = compare_test_pb2.Large()
+ pb2.double_ = 4L
+ compare.assertProto2Equal(self, pb1, pb2, normalize_numbers=True)
+
+ pb1 = compare_test_pb2.Medium()
+ pb1.floats.extend([4.0, 6.0])
+ pb2 = compare_test_pb2.Medium()
+ pb2.floats.extend([6L, 4L])
+ compare.assertProto2SameElements(self, pb1, pb2, normalize_numbers=True)
+
+ def testPrimitives(self):
+ self.assertAll('string_: "x"')
+ self.assertNone('string_: "x"',
+ 'string_: "y"',
+ """
+ - string_: "x"
+ ? ^
+ + string_: "y"
+ ? ^
+ """)
+
+ def testRepeatedPrimitives(self):
+ self.assertAll('int64s: 0 int64s: 1')
+
+ self.assertSameNotEqual('int64s: 0 int64s: 1', 'int64s: 1 int64s: 0')
+ self.assertSameNotEqual('int64s: 0 int64s: 1 int64s: 2',
+ 'int64s: 2 int64s: 1 int64s: 0')
+
+ self.assertSameExceptNumber('int64s: 0', 'int64s: 0 int64s: 0')
+ self.assertSameExceptNumber('int64s: 0 int64s: 1',
+ 'int64s: 1 int64s: 0 int64s: 1')
+
+ self.assertNone('int64s: 0',
+ 'int64s: 0 int64s: 2',
+ """
+ int64s: 0
+ + int64s: 2
+ """)
+ self.assertNone('int64s: 0 int64s: 1',
+ 'int64s: 0 int64s: 2',
+ """
+ int64s: 0
+ - int64s: 1
+ ? ^
+ + int64s: 2
+ ? ^
+ """)
+
+ def testMessage(self):
+ self.assertAll('medium: {}')
+ self.assertAll('medium: { smalls: {} }')
+ self.assertAll('medium: { int32s: 1 smalls: {} }')
+ self.assertAll('medium: { smalls: { strings: "x" } }')
+ self.assertAll('medium: { smalls: { strings: "x" } } small: { strings: "y" }')
+
+ self.assertSameNotEqual(
+ 'medium: { smalls: { strings: "x" strings: "y" } }',
+ 'medium: { smalls: { strings: "y" strings: "x" } }')
+ self.assertSameNotEqual(
+ 'medium: { smalls: { strings: "x" } smalls: { strings: "y" } }',
+ 'medium: { smalls: { strings: "y" } smalls: { strings: "x" } }')
+
+ self.assertSameExceptNumber(
+ 'medium: { smalls: { strings: "x" strings: "y" strings: "x" } }',
+ 'medium: { smalls: { strings: "y" strings: "x" } }')
+ self.assertSameExceptNumber(
+ 'medium: { smalls: { strings: "x" } int32s: 0 }',
+ 'medium: { int32s: 0 smalls: { strings: "x" } int32s: 0 }')
+
+ self.assertNone('medium: {}',
+ 'medium: { smalls: { strings: "x" } }',
+ """
+ medium {
+ + smalls {
+ + strings: "x"
+ + }
+ }
+ """)
+ self.assertNone('medium: { smalls: { strings: "x" } }',
+ 'medium: { smalls: {} }',
+ """
+ medium {
+ smalls {
+ - strings: "x"
+ }
+ }
+ """)
+ self.assertNone('medium: { int32s: 0 }',
+ 'medium: { int32s: 1 }',
+ """
+ medium {
+ - int32s: 0
+ ? ^
+ + int32s: 1
+ ? ^
+ }
+ """)
+
+ def testMsgPassdown(self):
+ self.assertRaisesRegexp(AssertionError, 'test message passed down',
+ self.assertProto2Equal,
+ 'medium: {}',
+ 'medium: { smalls: { strings: "x" } }',
+ msg='test message passed down')
+
+ def testRepeatedMessage(self):
+ self.assertAll('medium: { smalls: {} smalls: {} }')
+ self.assertAll('medium: { smalls: { strings: "x" } } medium: {}')
+ self.assertAll('medium: { smalls: { strings: "x" } } medium: { int32s: 0 }')
+ self.assertAll('medium: { smalls: {} smalls: { strings: "x" } } small: {}')
+
+ self.assertSameNotEqual('medium: { smalls: { strings: "x" } smalls: {} }',
+ 'medium: { smalls: {} smalls: { strings: "x" } }')
+
+ self.assertSameExceptNumber('medium: { smalls: {} }',
+ 'medium: { smalls: {} smalls: {} }')
+ self.assertSameExceptNumber('medium: { smalls: {} smalls: {} } medium: {}',
+ 'medium: {} medium: {} medium: { smalls: {} }')
+ self.assertSameExceptNumber(
+ 'medium: { smalls: { strings: "x" } smalls: {} }',
+ 'medium: { smalls: {} smalls: { strings: "x" } smalls: {} }')
+
+ self.assertNone('medium: {}',
+ 'medium: {} medium { smalls: {} }',
+ """
+ medium {
+ + smalls {
+ + }
+ }
+ """)
+ self.assertNone('medium: { smalls: {} smalls: { strings: "x" } }',
+ 'medium: { smalls: {} smalls: { strings: "y" } }',
+ """
+ medium {
+ smalls {
+ }
+ smalls {
+ - strings: "x"
+ ? ^
+ + strings: "y"
+ ? ^
+ }
+ }
+ """)
+
+
+class MixinTests(compare.Proto2Assertions, googletest.TestCase):
+
+ def testAssertEqualWithStringArg(self):
+ pb = compare_test_pb2.Large()
+ pb.string_ = 'abc'
+ pb.float_ = 1.234
+ self.assertProto2Equal(
+ """
+ string_: 'abc'
+ float_: 1.234
+ """,
+ pb)
+
+ def testAssertSameElements(self):
+ a = compare_test_pb2.Large()
+ a.string_ = 'abc'
+ a.float_ = 1.234
+ a.int64s[:] = [4, 3, 2]
+ b = compare_test_pb2.Large()
+ b.CopyFrom(a)
+ b.int64s[:] = [2, 4, 3]
+ self.assertProto2SameElements(a, b)
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/stream_executor/BUILD b/tensorflow/stream_executor/BUILD
new file mode 100644
index 0000000000..b91fe431f6
--- /dev/null
+++ b/tensorflow/stream_executor/BUILD
@@ -0,0 +1,39 @@
+licenses(["restricted"])
+
+load("/tensorflow/tensorflow", "if_cuda")
+
+cc_library(
+ name = "stream_executor",
+ srcs = glob(
+ [
+ "*.cc",
+ "lib/*.cc",
+ ],
+ exclude = [
+ "**/*_test.cc",
+ ],
+ ) + if_cuda(
+ glob([
+ "cuda/*.cc",
+ ]),
+ ),
+ hdrs = glob([
+ "*.h",
+ "lib/*.h",
+ "platform/**/*.h",
+ ]),
+ data = [
+ "//tensorflow/core:cuda",
+ "//third_party/gpus/cuda:cublas",
+ "//third_party/gpus/cuda:cudnn",
+ ],
+ linkopts = [
+ "-ldl",
+ ],
+ visibility = ["//visibility:public"],
+ deps = [
+ "//tensorflow/core:lib",
+ "//third_party/gpus/cuda:cuda_headers",
+ ],
+ alwayslink = 1,
+)
diff --git a/tensorflow/stream_executor/blas.cc b/tensorflow/stream_executor/blas.cc
new file mode 100644
index 0000000000..70a6bb7030
--- /dev/null
+++ b/tensorflow/stream_executor/blas.cc
@@ -0,0 +1,57 @@
+#include "tensorflow/stream_executor/blas.h"
+
+#include "tensorflow/stream_executor/lib/strcat.h"
+
+namespace perftools {
+namespace gputools {
+namespace blas {
+
+string TransposeString(Transpose t) {
+ switch (t) {
+ case Transpose::kNoTranspose:
+ return "NoTranspose";
+ case Transpose::kTranspose:
+ return "Transpose";
+ case Transpose::kConjugateTranspose:
+ return "ConjugateTranspose";
+ default:
+ LOG(FATAL) << "Unknown transpose " << static_cast<int32>(t);
+ }
+}
+
+string UpperLowerString(UpperLower ul) {
+ switch (ul) {
+ case UpperLower::kUpper:
+ return "Upper";
+ case UpperLower::kLower:
+ return "Lower";
+ default:
+ LOG(FATAL) << "Unknown upperlower " << static_cast<int32>(ul);
+ }
+}
+
+string DiagonalString(Diagonal d) {
+ switch (d) {
+ case Diagonal::kUnit:
+ return "Unit";
+ case Diagonal::kNonUnit:
+ return "NonUnit";
+ default:
+ LOG(FATAL) << "Unknown diagonal " << static_cast<int32>(d);
+ }
+}
+
+string SideString(Side s) {
+ switch (s) {
+ case Side::kLeft:
+ return "Left";
+ case Side::kRight:
+ return "Right";
+ default:
+ LOG(FATAL) << "Unknown side " << static_cast<int32>(s);
+ }
+}
+
+} // namespace blas
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/blas.h b/tensorflow/stream_executor/blas.h
new file mode 100644
index 0000000000..f6ee29837d
--- /dev/null
+++ b/tensorflow/stream_executor/blas.h
@@ -0,0 +1,1780 @@
+// Exposes the family of BLAS routines as pre-canned high performance calls for
+// use in conjunction with the StreamExecutor abstraction.
+//
+// Note that this interface is optionally supported by platforms; see
+// StreamExecutor::SupportsBlas() for details.
+//
+// This abstraction makes it simple to entrain BLAS operations on GPU data into
+// a Stream -- users typically will not use this API directly, but will use the
+// Stream builder methods to entrain these operations "under the hood". For
+// example:
+//
+// DeviceMemory<float> x = stream_exec->AllocateArray<float>(1024);
+// DeviceMemory<float> y = stream_exec->AllocateArray<float>(1024);
+// // ... populate x and y ...
+// Stream stream{stream_exec};
+// stream
+// .Init()
+// .ThenBlasAxpy(1024, 5.5, x, 1, &y, 1)
+// .BlockHostUntilDone();
+//
+// By using stream operations in this manner the user can easily intermix custom
+// kernel launches (via StreamExecutor::ThenLaunch()) with these pre-canned BLAS
+// routines.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_BLAS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_BLAS_H_
+
+#include <complex>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/array_slice.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+
+template <typename ElemT>
+class DeviceMemory;
+
+namespace blas {
+
+// Specifies whether the input matrix will be transposed or
+// transposed+conjugated before any BLAS operations.
+enum class Transpose { kNoTranspose, kTranspose, kConjugateTranspose };
+
+// Returns a name for t.
+string TransposeString(Transpose t);
+
+// Specifies whether the upper or lower triangular part of a
+// symmetric/Hermitian matrix is used.
+enum class UpperLower { kUpper, kLower };
+
+// Returns a name for ul.
+string UpperLowerString(UpperLower ul);
+
+// Specifies whether a matrix is unit triangular.
+enum class Diagonal { kUnit, kNonUnit };
+
+// Returns a name for d.
+string DiagonalString(Diagonal d);
+
+// Specifies whether a Hermitian matrix appears on the left or right in
+// the operation.
+enum class Side { kLeft, kRight };
+
+// Returns a name for s.
+string SideString(Side s);
+
+// BLAS support interface -- this can be derived from a GPU executor when the
+// underlying platform has a BLAS library implementation available. See
+// StreamExecutor::AsBlas().
+//
+// Thread-hostile: CUDA associates a CUDA-context with a particular thread in
+// the system. Any operation that a user attempts to perform by enqueueing BLAS
+// operations on a thread not associated with the CUDA-context has unknown
+// behavior at the current time; see b/13176597
+class BlasSupport {
+ public:
+ virtual ~BlasSupport() {}
+
+ // Computes the sum of magnitudes of the vector elements.
+  // result <- |Re x(1)| + |Im x(1)| + |Re x(2)| + |Im x(2)| + ... + |Re x(n)|
+ // + |Im x(n)|.
+ // Note that Im x(i) = 0 for real types float/double.
+ virtual bool DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *result) = 0;
+ virtual bool DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *result) = 0;
+ virtual bool DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result) = 0;
+ virtual bool DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result) = 0;
+
+ // Performs a BLAS y <- ax+y operation.
+ virtual bool DoBlasAxpy(Stream *stream, uint64 elem_count, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasAxpy(Stream *stream, uint64 elem_count, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy) = 0;
+ virtual bool DoBlasAxpy(Stream *stream, uint64 elem_count,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasAxpy(Stream *stream, uint64 elem_count,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Copies vector to another vector: y <- x.
+ virtual bool DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy) = 0;
+ virtual bool DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Performs a BLAS dot product result <- x . y.
+ virtual bool DoBlasDot(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *result) = 0;
+ virtual bool DoBlasDot(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *result) = 0;
+
+ // Performs a BLAS dot product result <- conj(x) . y for complex types.
+ virtual bool DoBlasDotc(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result) = 0;
+ virtual bool DoBlasDotc(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result) = 0;
+
+ // Performs a BLAS dot product result <- x . y for complex types. Note that
+ // x is unconjugated in this routine.
+ virtual bool DoBlasDotu(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result) = 0;
+ virtual bool DoBlasDotu(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result) = 0;
+
+ // Computes the Euclidean norm of a vector: result <- ||x||.
+ // See the following link for more information of Euclidean norm:
+ // http://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm
+ virtual bool DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *result) = 0;
+ virtual bool DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *result) = 0;
+ virtual bool DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result) = 0;
+ virtual bool DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result) = 0;
+
+ // Performs rotation of points in the plane:
+ // x(i) = c*x(i) + s*y(i)
+ // y(i) = c*y(i) - s*x(i).
+ virtual bool DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy, float c,
+ float s) = 0;
+ virtual bool DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy, double c,
+ double s) = 0;
+ virtual bool DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy,
+ float c, float s) = 0;
+ virtual bool DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy,
+ double c, double s) = 0;
+
+ // Computes the parameters for a Givens rotation.
+ // Given the Cartesian coordinates (a, b) of a point, these routines return
+ // the parameters c, s, r, and z associated with the Givens rotation. The
+ // parameters c and s define a unitary matrix such that:
+ //
+ // | c s |.| a | = | r |
+ // | -s c | | b | | 0 |
+ //
+  // The parameter z is defined such that if |a| > |b|, z is s; otherwise, if
+  // c is not 0, z is 1/c; otherwise z is 1.
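+  //
+  // For reference, in the standard BLAS rotg convention (a sketch, not a
+  // claim about this particular interface): with r = +/- sqrt(a^2 + b^2),
+  // the outputs satisfy c = a/r and s = b/r, and c = 1, s = 0 when r is 0.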
+ virtual bool DoBlasRotg(Stream *stream, DeviceMemory<float> *a,
+ DeviceMemory<float> *b, DeviceMemory<float> *c,
+ DeviceMemory<float> *s) = 0;
+ virtual bool DoBlasRotg(Stream *stream, DeviceMemory<double> *a,
+ DeviceMemory<double> *b, DeviceMemory<double> *c,
+ DeviceMemory<double> *s) = 0;
+ virtual bool DoBlasRotg(Stream *stream, DeviceMemory<std::complex<float>> *a,
+ DeviceMemory<std::complex<float>> *b,
+ DeviceMemory<float> *c,
+ DeviceMemory<std::complex<float>> *s) = 0;
+ virtual bool DoBlasRotg(Stream *stream, DeviceMemory<std::complex<double>> *a,
+ DeviceMemory<std::complex<double>> *b,
+ DeviceMemory<double> *c,
+ DeviceMemory<std::complex<double>> *s) = 0;
+
+ // Performs modified Givens rotation of points in the plane.
+ // Given two vectors x and y, each vector element of these vectors is replaced
+ // as follows:
+ //
+ // | x(i) | = H | x(i) |
+ // | y(i) | | y(i) |
+ //
+ // for i=1 to n, where H is a modified Givens transformation matrix whose
+  // values are stored in param[1] through param[4] of the param array.
+ // For more information please Google this routine.
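+  // (For reference, in the standard BLAS rotm convention -- a sketch, not a
+  // claim about this particular interface -- param[0] holds a flag selecting
+  // the form of H, and param[1] through param[4] hold h11, h21, h12 and h22.)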
+ virtual bool DoBlasRotm(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy,
+ const DeviceMemory<float> &param) = 0;
+ virtual bool DoBlasRotm(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy,
+ const DeviceMemory<double> &param) = 0;
+
+ // Computes the parameters for a modified Givens rotation.
+ // Given Cartesian coordinates (x1, y1) of an input vector, these routines
+ // compute the components of a modified Givens transformation matrix H that
+ // zeros the y-component of the resulting vector:
+ //
+  //     | x1 | = H | x1 * sqrt(d1) |
+  //     | 0  |     | y1 * sqrt(d2) |
+ //
+ // For more information please Google this routine.
+ virtual bool DoBlasRotmg(Stream *stream, DeviceMemory<float> *d1,
+ DeviceMemory<float> *d2, DeviceMemory<float> *x1,
+ const DeviceMemory<float> &y1,
+ DeviceMemory<float> *param) = 0;
+ virtual bool DoBlasRotmg(Stream *stream, DeviceMemory<double> *d1,
+ DeviceMemory<double> *d2, DeviceMemory<double> *x1,
+ const DeviceMemory<double> &y1,
+ DeviceMemory<double> *param) = 0;
+
+ // Computes the product of a vector by a scalar: x <- a*x.
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count, float alpha,
+ DeviceMemory<float> *x, int incx) = 0;
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count, double alpha,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count, float alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count, double alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count,
+ std::complex<float> alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasScal(Stream *stream, uint64 elem_count,
+ std::complex<double> alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+
+ // Swaps a vector with another vector.
+ virtual bool DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy) = 0;
+ virtual bool DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Finds the index of the element with maximum absolute value.
+ virtual bool DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) = 0;
+
+ // Finds the index of the element with minimum absolute value.
+ virtual bool DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result) = 0;
+ virtual bool DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) = 0;
+
+ // Computes a matrix-vector product using a general band matrix:
+ //
+ // y <- alpha * a * x + beta * y,
+ // or
+ // y <- alpha * a' * x + beta * y,
+ // or
+ // y <- alpha * conj(a') * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an m-by-n general band matrix, with kl
+ // sub-diagonals and ku super-diagonals; x is a vector with
+ // n(trans==kNoTranspose)/m(otherwise) elements;
+ // y is a vector with m(trans==kNoTranspose)/n(otherwise) elements.
+ virtual bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) = 0;
+ virtual bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Computes a matrix-vector product using a general matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ // or
+ // y <- alpha * a' * x + beta * y,
+ // or
+ // y <- alpha * conj(a') * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an m-by-n general matrix; x is a vector
+ // with n(trans==kNoTranspose)/m(otherwise) elements;
+ // y is a vector with m(trans==kNoTranspose)/n(otherwise) elements.
+ virtual bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &x, int incx,
+ float beta, DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) = 0;
+ virtual bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Performs a rank-1 update of a general matrix.
+ //
+ // a <- alpha * x * y' + a,
+ //
+ // alpha is a scalar; x is an m-element vector; y is an n-element vector; a is
+ // an m-by-n general matrix.
+ virtual bool DoBlasGer(Stream *stream, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) = 0;
+ virtual bool DoBlasGer(Stream *stream, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) = 0;
+
+ // Performs a rank-1 update (conjugated) of a general matrix.
+ //
+ // a <- alpha * x * conj(y') + a,
+ //
+ // alpha is a scalar; x is an m-element vector; y is an n-element vector; a is
+ // an m-by-n general matrix.
+ virtual bool DoBlasGerc(Stream *stream, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) = 0;
+ virtual bool DoBlasGerc(Stream *stream, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) = 0;
+
+ // Performs a rank-1 update (unconjugated) of a general matrix.
+ //
+ // a <- alpha * x * y' + a,
+ //
+ // alpha is a scalar; x is an m-element vector; y is an n-element vector; a is
+ // an m-by-n general matrix.
+ virtual bool DoBlasGeru(Stream *stream, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) = 0;
+ virtual bool DoBlasGeru(Stream *stream, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) = 0;
+
+ // Computes a matrix-vector product using a Hermitian band matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n Hermitian band matrix, with k
+ // super-diagonals; x and y are n-element vectors.
+ virtual bool DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Computes a matrix-vector product using a Hermitian matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n Hermitian matrix; x and y are
+ // n-element vectors.
+ virtual bool DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Performs a rank-1 update of a Hermitian matrix.
+ //
+ // a <- alpha * x * conj(x') + a,
+ //
+ // alpha is a scalar; x is an n-element vector; a is an n-by-n Hermitian
+ // matrix.
+ virtual bool DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *a, int lda) = 0;
+ virtual bool DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *a, int lda) = 0;
+
+ // Performs a rank-2 update of a Hermitian matrix.
+ //
+ // a <- alpha * x * conj(x') + conj(alpha) * y * conj(x') + a,
+ //
+ // alpha is a scalar; x and y are n-element vectors; a is an n-by-n Hermitian
+ // matrix.
+ virtual bool DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) = 0;
+ virtual bool DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) = 0;
+
+ // Computes a matrix-vector product using a Hermitian packed matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n Hermitian matrix, supplied in
+ // packed form; x and y are n-element vectors.
+ virtual bool DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &ap,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) = 0;
+ virtual bool DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &ap,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) = 0;
+
+ // Performs a rank-1 update of a Hermitian packed matrix.
+ //
+ // a <- alpha * x * conj(x') + a,
+ //
+ // alpha is a scalar; x is an n-element vector; a is an n-by-n Hermitian
+ // matrix, supplied in packed form.
+ virtual bool DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *ap) = 0;
+ virtual bool DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *ap) = 0;
+
+ // Performs a rank-2 update of a Hermitian packed matrix.
+ //
+ // a <- alpha * x * conj(x') + conj(alpha) * y * conj(x') + a,
+ //
+ // alpha is a scalar; x and y are n-element vectors; a is an n-by-n Hermitian
+ // matrix, supplied in packed form.
+ virtual bool DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *ap) = 0;
+ virtual bool DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *ap) = 0;
+
+ // Computes a matrix-vector product using a symmetric band matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n symmetric band matrix, with k
+ // super-diagonals; x and y are n-element vectors.
+ virtual bool DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &x, int incx,
+ float beta, DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) = 0;
+
+ // Computes a matrix-vector product using a symmetric packed matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n symmetric matrix, supplied in
+ // packed form; x and y are n-element vectors.
+ virtual bool DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &ap,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &ap,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) = 0;
+
+ // Performs a rank-1 update of a symmetric packed matrix.
+ //
+ // a <- alpha * x * x' + a,
+ //
+ // alpha is a scalar; x is an n-element vector; a is an n-by-n symmetric
+ // matrix, supplied in packed form.
+ virtual bool DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *ap) = 0;
+ virtual bool DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *ap) = 0;
+
+ // Performs a rank-2 update of a symmetric packed matrix.
+ //
+ // a <- alpha * x * x' + alpha * y * x' + a,
+ //
+ // alpha is a scalar; x and y are n-element vectors; a is an n-by-n symmetric
+ // matrix, supplied in packed form.
+ virtual bool DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *ap) = 0;
+ virtual bool DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *ap) = 0;
+
+ // Computes a matrix-vector product for a symmetric matrix.
+ //
+ // y <- alpha * a * x + beta * y,
+ //
+ // alpha and beta are scalars; a is an n-by-n symmetric matrix; x and y are
+ // n-element vectors.
+ virtual bool DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) = 0;
+ virtual bool DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) = 0;
+
+ // Performs a rank-1 update of a symmetric matrix.
+ //
+ // a <- alpha * x * x' + a,
+ //
+ // alpha is a scalar; x is an n-element vector; a is an n-by-n symmetric
+ // matrix.
+ virtual bool DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *a, int lda) = 0;
+ virtual bool DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *a, int lda) = 0;
+
+ // Performs a rank-2 update of a symmetric matrix.
+ //
+ // a <- alpha * x * y' + alpha * y * x' + a,
+ //
+ // alpha is a scalar; x and y are n-element vectors; a is an n-by-n symmetric
+ // matrix.
+ virtual bool DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) = 0;
+ virtual bool DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) = 0;
+
+ // Computes a matrix-vector product using a triangular band matrix.
+ //
+ // x <- a * x,
+ // or
+ // x <- a' * x,
+ // or
+ // x <- conj(a') * x,
+ //
+ // a is an n-by-n unit, or non-unit, upper or lower triangular band matrix,
+ // with k+1 diagonals; x is an n-element vector.
+ virtual bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) = 0;
+ virtual bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) = 0;
+ virtual bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) = 0;
+
+ // Solves a system of linear equations whose coefficients are in a triangular
+ // band matrix as below:
+ //
+ // a * x = b,
+ // or
+ // a' * x = b,
+ // or
+ // conj(a') * x = b,
+ //
+ // b and x are n-element vectors; a is an n-by-n unit, or non-unit, upper or
+ // lower triangular band matrix, with k+1 diagonals.
+ virtual bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) = 0;
+ virtual bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) = 0;
+ virtual bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) = 0;
+
+ // Computes a matrix-vector product using a triangular packed matrix.
+ //
+ // x <- a * x,
+ // or
+ // x <- a' * x,
+ // or
+ // x <- conj(a') * x,
+ //
+ // a is an n-by-n unit, or non-unit, upper or lower triangular matrix,
+ // supplied in packed form; x is an n-element vector.
+ virtual bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx) = 0;
+ virtual bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+
+ // Solves a system of linear equations whose coefficients are in a triangular
+ // packed matrix as below:
+ //
+ // a * x = b,
+ // or
+ // a' * x = b,
+ // or
+ // conj(a') * x = b,
+ //
+ // b and x are n-element vectors; a is an n-by-n unit, or non-unit, upper or
+ // lower triangular matrix, supplied in packed form.
+ virtual bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx) = 0;
+ virtual bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+
+ // Computes a matrix-vector product using a triangular matrix.
+ //
+ // x <- a * x,
+ // or
+ // x <- a' * x,
+ // or
+ // x <- conj(a') * x,
+ //
+ // a is an n-by-n unit, or non-unit, upper or lower triangular matrix; x is an
+ // n-element vector.
+ virtual bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) = 0;
+ virtual bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+
+ // Solves a system of linear equations whose coefficients are in a triangular
+ // matrix as below:
+ //
+ // a * x = b,
+ // or
+ // a' * x = b,
+ // or
+ // conj(a') * x = b,
+ //
+ // b and x are n-element vectors; a is an n-by-n unit, or non-unit, upper or
+ // lower triangular matrix.
+ virtual bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) = 0;
+ virtual bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) = 0;
+ virtual bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx) = 0;
+ virtual bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx) = 0;
+
+ // Computes a matrix-matrix product with general matrices:
+ //
+ // c <- alpha * op(a) * op(b) + beta * c,
+ //
+ // op(X) is one of op(X) = X, or op(X) = X', or op(X) = conj(X'); alpha and
+ // beta are scalars; a, b, and c are matrices; op(a) is an m-by-k matrix;
+ // op(b) is a k-by-n matrix; c is an m-by-n matrix.
+ virtual bool DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) = 0;
+ virtual bool DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) = 0;
+ virtual bool DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) = 0;
+ virtual bool DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) = 0;
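+ //
+ // Illustrative sketch only: given a BlasSupport *blas, a Stream *stream, and
+ // column-major DeviceMemory<float> buffers a (m-by-k), b (k-by-n), and
+ // c (m-by-n), computing c <- a * b could look like
+ //
+ //   bool ok = blas->DoBlasGemm(stream, blas::Transpose::kNoTranspose,
+ //                              blas::Transpose::kNoTranspose, m, n, k,
+ //                              /*alpha=*/1.0f, a, /*lda=*/m, b, /*ldb=*/k,
+ //                              /*beta=*/0.0f, &c, /*ldc=*/m);
+ //
+ // The kNoTranspose enumerator spelling and the column-major leading-dimension
+ // convention are assumptions of this sketch, not statements of the API.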
+
+ // Computes a batch of matrix-matrix products with general matrices.
+ // This is a batched version of DoBlasGemm.
+ // The batched GEMM computes one matrix product for each input/output triple
+ // in a, b, and c, each of which contains batch_count DeviceMemory objects.
+ virtual bool DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, float alpha,
+ const port::ArraySlice<DeviceMemory<float> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<float> *> &b, int ldb, float beta,
+ const port::ArraySlice<DeviceMemory<float> *> &c, int ldc,
+ int batch_count) = 0;
+ virtual bool DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, double alpha,
+ const port::ArraySlice<DeviceMemory<double> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<double> *> &b, int ldb, double beta,
+ const port::ArraySlice<DeviceMemory<double> *> &c, int ldc,
+ int batch_count) = 0;
+ virtual bool DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &b, int ldb,
+ std::complex<float> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &c, int ldc,
+ int batch_count) = 0;
+ virtual bool DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &b, int ldb,
+ std::complex<double> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &c, int ldc,
+ int batch_count) = 0;
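+ //
+ // Illustrative example: with batch_count == 2, a = {a0, a1}, b = {b0, b1},
+ // and c = {c0, c1}, a single call performs the two independent products
+ //
+ //   *c0 <- alpha * op(*a0) * op(*b0) + beta * *c0,
+ //   *c1 <- alpha * op(*a1) * op(*b1) + beta * *c1,
+ //
+ // with the same dimensions, transposes, and leading dimensions applied to
+ // every element of the batch.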
+
+ // Computes a matrix-matrix product where one input matrix is Hermitian:
+ //
+ // c <- alpha * a * b + beta * c,
+ // or
+ // c <- alpha * b * a + beta * c,
+ //
+ // alpha and beta are scalars; a is a Hermitian matrix; b and c are m-by-n
+ // matrices.
+ virtual bool DoBlasHemm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) = 0;
+ virtual bool DoBlasHemm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) = 0;
+
+ // Performs a Hermitian rank-k update.
+ //
+ // c <- alpha * a * conj(a') + beta * c,
+ // or
+ // c <- alpha * conj(a') * a + beta * c,
+ //
+ // alpha and beta are scalars; c is an n-by-n Hermitian matrix; a is an n-by-k
+ // matrix in the first case and a k-by-n matrix in the second case.
+ virtual bool DoBlasHerk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc) = 0;
+ virtual bool DoBlasHerk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc) = 0;
+
+ // Performs a Hermitian rank-2k update.
+ //
+ // c <- alpha * a * conj(b') + conj(alpha) * b * conj(a') + beta * c,
+ // or
+ // c <- alpha * conj(b') * a + conj(alpha) * conj(a') * b + beta * c,
+ //
+ // alpha and beta are scalars; c is an n-by-n Hermitian matrix; a and b are
+ // n-by-k matrices in the first case and k-by-n matrices in the second case.
+ virtual bool DoBlasHer2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc) = 0;
+ virtual bool DoBlasHer2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc) = 0;
+
+ // Computes a matrix-matrix product where one input matrix is symmetric.
+ //
+ // c <- alpha * a * b + beta * c,
+ // or
+ // c <- alpha * b * a + beta * c,
+ //
+ // alpha and beta are scalars; a is a symmetric matrix; b and c are m-by-n
+ // matrices.
+ virtual bool DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) = 0;
+ virtual bool DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) = 0;
+ virtual bool DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) = 0;
+ virtual bool DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) = 0;
+
+ // Performs a symmetric rank-k update.
+ //
+ // c <- alpha * a * a' + beta * c,
+ // or
+ // c <- alpha * a' * a + beta * c,
+ //
+ // alpha and beta are scalars; c is an n-by-n symmetric matrix; a is an n-by-k
+ // matrix in the first case and a k-by-n matrix in the second case.
+ virtual bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ float beta, DeviceMemory<float> *c, int ldc) = 0;
+ virtual bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ double beta, DeviceMemory<double> *c, int ldc) = 0;
+ virtual bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) = 0;
+ virtual bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) = 0;
+
+ // Performs a symmetric rank-2k update.
+ //
+ // c <- alpha * a * b' + alpha * b * a' + beta * c,
+ // or
+ // c <- alpha * b' * a + alpha * a' * b + beta * c,
+ //
+ // alpha and beta are scalars; c is an n-by-n symmetric matrix; a and b are
+ // n-by-k matrices in the first case and k-by-n matrices in the second case.
+ virtual bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) = 0;
+ virtual bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) = 0;
+ virtual bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) = 0;
+ virtual bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) = 0;
+
+ // Computes a matrix-matrix product where one input matrix is triangular.
+ //
+ // b <- alpha * op(a) * b,
+ // or
+ // b <- alpha * b * op(a)
+ //
+ // alpha is a scalar; b is an m-by-n matrix; a is a unit, or non-unit, upper
+ // or lower triangular matrix; op(a) is one of op(a) = a, or op(a) = a', or
+ // op(a) = conj(a').
+ virtual bool DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) = 0;
+ virtual bool DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) = 0;
+ virtual bool DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb) = 0;
+ virtual bool DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb) = 0;
+
+ // Solves a triangular matrix equation.
+ //
+ // op(a) * x = alpha * b,
+ // or
+ // x * op(a) = alpha * b
+ //
+ // alpha is a scalar; x and b are m-by-n matrices; a is a unit, or non-unit,
+ // upper or lower triangular matrix; op(a) is one of op(a) = a, or op(a) = a',
+ // or op(a) = conj(a').
+ virtual bool DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) = 0;
+ virtual bool DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) = 0;
+ virtual bool DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb) = 0;
+ virtual bool DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb) = 0;
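+ //
+ // Illustrative sketch only: solving a * x = alpha * b for a lower-triangular,
+ // non-unit-diagonal m-by-m matrix a, overwriting b with the solution x, might
+ // look like
+ //
+ //   bool ok = blas->DoBlasTrsm(stream, blas::Side::kLeft,
+ //                              blas::UpperLower::kLower,
+ //                              blas::Transpose::kNoTranspose,
+ //                              blas::Diagonal::kNonUnit, m, n, alpha,
+ //                              a, /*lda=*/m, &b, /*ldb=*/m);
+ //
+ // The enumerator names (kLeft, kLower, kNoTranspose, kNonUnit) are assumed
+ // spellings for the blas enum types used throughout this header.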
+
+ protected:
+ BlasSupport() {}
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(BlasSupport);
+};
+
+// Macro used to quickly declare overrides for abstract virtuals in the
+// BlasSupport base class.
+#define TENSORFLOW_STREAM_EXECUTOR_GPU_BLAS_SUPPORT_OVERRIDES \
+ bool DoBlasAsum(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *result) override; \
+ bool DoBlasAsum(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *result) override; \
+ bool DoBlasAsum(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<float> *result) override; \
+ bool DoBlasAsum(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<double> *result) override; \
+ bool DoBlasAxpy(Stream *stream, uint64 elem_count, float alpha, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasAxpy(Stream *stream, uint64 elem_count, double alpha, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasAxpy(Stream *stream, uint64 elem_count, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasAxpy(Stream *stream, uint64 elem_count, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasCopy(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasCopy(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasCopy(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasCopy(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasDot(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ const DeviceMemory<float> &y, int incy, \
+ DeviceMemory<float> *result) override; \
+ bool DoBlasDot(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ const DeviceMemory<double> &y, int incy, \
+ DeviceMemory<double> *result) override; \
+ bool DoBlasDotc(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *result) override; \
+ bool DoBlasDotc(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *result) override; \
+ bool DoBlasDotu(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *result) override; \
+ bool DoBlasDotu(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *result) override; \
+ bool DoBlasNrm2(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *result) override; \
+ bool DoBlasNrm2(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *result) override; \
+ bool DoBlasNrm2(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<float> *result) override; \
+ bool DoBlasNrm2(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<double> *result) override; \
+ bool DoBlasRot(Stream *stream, uint64 elem_count, DeviceMemory<float> *x, \
+ int incx, DeviceMemory<float> *y, int incy, float c, float s) \
+ override; \
+ bool DoBlasRot(Stream *stream, uint64 elem_count, DeviceMemory<double> *x, \
+ int incx, DeviceMemory<double> *y, int incy, double c, \
+ double s) override; \
+ bool DoBlasRot(Stream *stream, uint64 elem_count, \
+ DeviceMemory<std::complex<float>> *x, int incx, \
+ DeviceMemory<std::complex<float>> *y, int incy, float c, \
+ float s) override; \
+ bool DoBlasRot(Stream *stream, uint64 elem_count, \
+ DeviceMemory<std::complex<double>> *x, int incx, \
+ DeviceMemory<std::complex<double>> *y, int incy, double c, \
+ double s) override; \
+ bool DoBlasRotg(Stream *stream, DeviceMemory<float> *a, \
+ DeviceMemory<float> *b, DeviceMemory<float> *c, \
+ DeviceMemory<float> *s) override; \
+ bool DoBlasRotg(Stream *stream, DeviceMemory<double> *a, \
+ DeviceMemory<double> *b, DeviceMemory<double> *c, \
+ DeviceMemory<double> *s) override; \
+ bool DoBlasRotg(Stream *stream, DeviceMemory<std::complex<float>> *a, \
+ DeviceMemory<std::complex<float>> *b, \
+ DeviceMemory<float> *c, \
+ DeviceMemory<std::complex<float>> *s) override; \
+ bool DoBlasRotg(Stream *stream, DeviceMemory<std::complex<double>> *a, \
+ DeviceMemory<std::complex<double>> *b, \
+ DeviceMemory<double> *c, \
+ DeviceMemory<std::complex<double>> *s) override; \
+ bool DoBlasRotm(Stream *stream, uint64 elem_count, DeviceMemory<float> *x, \
+ int incx, DeviceMemory<float> *y, int incy, \
+ const DeviceMemory<float> &param) override; \
+ bool DoBlasRotm(Stream *stream, uint64 elem_count, DeviceMemory<double> *x, \
+ int incx, DeviceMemory<double> *y, int incy, \
+ const DeviceMemory<double> &param) override; \
+ bool DoBlasRotmg(Stream *stream, DeviceMemory<float> *d1, \
+ DeviceMemory<float> *d2, DeviceMemory<float> *x1, \
+ const DeviceMemory<float> &y1, DeviceMemory<float> *param) \
+ override; \
+ bool DoBlasRotmg(Stream *stream, DeviceMemory<double> *d1, \
+ DeviceMemory<double> *d2, DeviceMemory<double> *x1, \
+ const DeviceMemory<double> &y1, \
+ DeviceMemory<double> *param) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, float alpha, \
+ DeviceMemory<float> *x, int incx) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, double alpha, \
+ DeviceMemory<double> *x, int incx) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, float alpha, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, double alpha, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, \
+ std::complex<float> alpha, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasScal(Stream *stream, uint64 elem_count, \
+ std::complex<double> alpha, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasSwap(Stream *stream, uint64 elem_count, DeviceMemory<float> *x, \
+ int incx, DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasSwap(Stream *stream, uint64 elem_count, DeviceMemory<double> *x, \
+ int incx, DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasSwap(Stream *stream, uint64 elem_count, \
+ DeviceMemory<std::complex<float>> *x, int incx, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasSwap(Stream *stream, uint64 elem_count, \
+ DeviceMemory<std::complex<double>> *x, int incx, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasIamax(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamax(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamax(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamax(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamin(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamin(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamin(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasIamin(Stream *stream, uint64 elem_count, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ DeviceMemory<int> *result) override; \
+ bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ uint64 kl, uint64 ku, float alpha, \
+ const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &x, int incx, float beta, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ uint64 kl, uint64 ku, double alpha, \
+ const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &x, int incx, double beta, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ uint64 kl, uint64 ku, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ uint64 kl, uint64 ku, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ float alpha, const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &x, int incx, float beta, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ double alpha, const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &x, int incx, double beta, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasGer(Stream *stream, uint64 m, uint64 n, float alpha, \
+ const DeviceMemory<float> &x, int incx, \
+ const DeviceMemory<float> &y, int incy, \
+ DeviceMemory<float> *a, int lda) override; \
+ bool DoBlasGer(Stream *stream, uint64 m, uint64 n, double alpha, \
+ const DeviceMemory<double> &x, int incx, \
+ const DeviceMemory<double> &y, int incy, \
+ DeviceMemory<double> *a, int lda) override; \
+ bool DoBlasGerc(Stream *stream, uint64 m, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *a, int lda) override; \
+ bool DoBlasGerc(Stream *stream, uint64 m, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *a, int lda) override; \
+ bool DoBlasGeru(Stream *stream, uint64 m, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *a, int lda) override; \
+ bool DoBlasGeru(Stream *stream, uint64 m, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *a, int lda) override; \
+ bool DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n, uint64 k, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n, uint64 k, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n, float alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<std::complex<float>> *a, int lda) override; \
+ bool DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<std::complex<double>> &x, \
+ int incx, DeviceMemory<std::complex<double>> *a, int lda) \
+ override; \
+ bool DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *a, int lda) override; \
+ bool DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *a, int lda) override; \
+ bool DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &ap, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *y, int incy) override; \
+ bool DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &ap, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *y, int incy) override; \
+ bool DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n, float alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ DeviceMemory<std::complex<float>> *ap) override; \
+ bool DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<std::complex<double>> &x, \
+ int incx, DeviceMemory<std::complex<double>> *ap) override; \
+ bool DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &x, int incx, \
+ const DeviceMemory<std::complex<float>> &y, int incy, \
+ DeviceMemory<std::complex<float>> *ap) override; \
+ bool DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &x, int incx, \
+ const DeviceMemory<std::complex<double>> &y, int incy, \
+ DeviceMemory<std::complex<double>> *ap) override; \
+ bool DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n, uint64 k, \
+ float alpha, const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &x, int incx, float beta, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n, uint64 k, \
+ double alpha, const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &x, int incx, double beta, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ float alpha, const DeviceMemory<float> &ap, \
+ const DeviceMemory<float> &x, int incx, float beta, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &ap, \
+ const DeviceMemory<double> &x, int incx, double beta, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n, float alpha, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *ap) override; \
+ bool DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *ap) override; \
+ bool DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ float alpha, const DeviceMemory<float> &x, int incx, \
+ const DeviceMemory<float> &y, int incy, \
+ DeviceMemory<float> *ap) override; \
+ bool DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &x, int incx, \
+ const DeviceMemory<double> &y, int incy, \
+ DeviceMemory<double> *ap) override; \
+ bool DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ float alpha, const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &x, int incx, float beta, \
+ DeviceMemory<float> *y, int incy) override; \
+ bool DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &x, int incx, double beta, \
+ DeviceMemory<double> *y, int incy) override; \
+ bool DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n, float alpha, \
+ const DeviceMemory<float> &x, int incx, \
+ DeviceMemory<float> *a, int lda) override; \
+ bool DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &x, int incx, \
+ DeviceMemory<double> *a, int lda) override; \
+ bool DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ float alpha, const DeviceMemory<float> &x, int incx, \
+ const DeviceMemory<float> &y, int incy, \
+ DeviceMemory<float> *a, int lda) override; \
+ bool DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n, \
+ double alpha, const DeviceMemory<double> &x, int incx, \
+ const DeviceMemory<double> &y, int incy, \
+ DeviceMemory<double> *a, int lda) override; \
+ bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<float> &a, int lda, \
+ DeviceMemory<float> *x, int incx) override; \
+ bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<double> &a, int lda, \
+ DeviceMemory<double> *x, int incx) override; \
+ bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<std::complex<float>> &a, \
+ int lda, DeviceMemory<std::complex<float>> *x, int incx) \
+ override; \
+ bool DoBlasTbmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<std::complex<double>> &a, \
+ int lda, DeviceMemory<std::complex<double>> *x, int incx) \
+ override; \
+ bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<float> &a, int lda, \
+ DeviceMemory<float> *x, int incx) override; \
+ bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<double> &a, int lda, \
+ DeviceMemory<double> *x, int incx) override; \
+ bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<std::complex<float>> &a, \
+ int lda, DeviceMemory<std::complex<float>> *x, int incx) \
+ override; \
+ bool DoBlasTbsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ uint64 k, const DeviceMemory<std::complex<double>> &a, \
+ int lda, DeviceMemory<std::complex<double>> *x, int incx) \
+ override; \
+ bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x, \
+ int incx) override; \
+ bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<double> &ap, DeviceMemory<double> *x, \
+ int incx) override; \
+ bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<float>> &ap, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasTpmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<double>> &ap, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x, \
+ int incx) override; \
+ bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<double> &ap, DeviceMemory<double> *x, \
+ int incx) override; \
+ bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<float>> &ap, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasTpsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<double>> &ap, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<float> &a, int lda, \
+ DeviceMemory<float> *x, int incx) override; \
+ bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<double> &a, int lda, \
+ DeviceMemory<double> *x, int incx) override; \
+ bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasTrmv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<float> &a, int lda, \
+ DeviceMemory<float> *x, int incx) override; \
+ bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<double> &a, int lda, \
+ DeviceMemory<double> *x, int incx) override; \
+ bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ DeviceMemory<std::complex<float>> *x, int incx) override; \
+ bool DoBlasTrsv(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, blas::Diagonal diag, uint64 n, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ DeviceMemory<std::complex<double>> *x, int incx) override; \
+ bool DoBlasGemm(Stream *stream, blas::Transpose transa, \
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, \
+ float alpha, const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &b, int ldb, float beta, \
+ DeviceMemory<float> *c, int ldc) override; \
+ bool DoBlasGemm(Stream *stream, blas::Transpose transa, \
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, \
+ double alpha, const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &b, int ldb, double beta, \
+ DeviceMemory<double> *c, int ldc) override; \
+ bool DoBlasGemm(Stream *stream, blas::Transpose transa, \
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &b, int ldb, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasGemm(Stream *stream, blas::Transpose transa, \
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &b, int ldb, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasGemmBatched( \
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, \
+ uint64 m, uint64 n, uint64 k, float alpha, \
+ const port::ArraySlice<DeviceMemory<float> *> &a, int lda, \
+ const port::ArraySlice<DeviceMemory<float> *> &b, int ldb, float beta, \
+ const port::ArraySlice<DeviceMemory<float> *> &c, int ldc, \
+ int batch_count) override; \
+ bool DoBlasGemmBatched( \
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, \
+ uint64 m, uint64 n, uint64 k, double alpha, \
+ const port::ArraySlice<DeviceMemory<double> *> &a, int lda, \
+ const port::ArraySlice<DeviceMemory<double> *> &b, int ldb, double beta, \
+ const port::ArraySlice<DeviceMemory<double> *> &c, int ldc, \
+ int batch_count) override; \
+ bool DoBlasGemmBatched( \
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, \
+ uint64 m, uint64 n, uint64 k, std::complex<float> alpha, \
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &a, int lda, \
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &b, int ldb, \
+ std::complex<float> beta, \
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &c, int ldc, \
+ int batch_count) override; \
+ bool DoBlasGemmBatched( \
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, \
+ uint64 m, uint64 n, uint64 k, std::complex<double> alpha, \
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &a, \
+ int lda, \
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &b, \
+ int ldb, std::complex<double> beta, \
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &c, \
+ int ldc, int batch_count) override; \
+ bool DoBlasHemm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &b, int ldb, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasHemm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &b, int ldb, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasHerk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, float alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ float beta, DeviceMemory<std::complex<float>> *c, int ldc) \
+ override; \
+ bool DoBlasHerk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, double alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ double beta, DeviceMemory<std::complex<double>> *c, int ldc) \
+ override; \
+ bool DoBlasHer2k( \
+ Stream *stream, blas::UpperLower uplo, blas::Transpose trans, uint64 n, \
+ uint64 k, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &b, int ldb, float beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasHer2k( \
+ Stream *stream, blas::UpperLower uplo, blas::Transpose trans, uint64 n, \
+ uint64 k, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &b, int ldb, double beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasSymm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, float alpha, \
+ const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &b, int ldb, float beta, \
+ DeviceMemory<float> *c, int ldc) override; \
+ bool DoBlasSymm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, double alpha, \
+ const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &b, int ldb, double beta, \
+ DeviceMemory<double> *c, int ldc) override; \
+ bool DoBlasSymm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &b, int ldb, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasSymm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ uint64 m, uint64 n, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &b, int ldb, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, float alpha, \
+ const DeviceMemory<float> &a, int lda, float beta, \
+ DeviceMemory<float> *c, int ldc) override; \
+ bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, double alpha, \
+ const DeviceMemory<double> &a, int lda, double beta, \
+ DeviceMemory<double> *c, int ldc) override; \
+ bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasSyrk(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, float alpha, \
+ const DeviceMemory<float> &a, int lda, \
+ const DeviceMemory<float> &b, int ldb, float beta, \
+ DeviceMemory<float> *c, int ldc) override; \
+ bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, double alpha, \
+ const DeviceMemory<double> &a, int lda, \
+ const DeviceMemory<double> &b, int ldb, double beta, \
+ DeviceMemory<double> *c, int ldc) override; \
+ bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, \
+ std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ const DeviceMemory<std::complex<float>> &b, int ldb, \
+ std::complex<float> beta, \
+ DeviceMemory<std::complex<float>> *c, int ldc) override; \
+ bool DoBlasSyr2k(Stream *stream, blas::UpperLower uplo, \
+ blas::Transpose trans, uint64 n, uint64 k, \
+ std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ const DeviceMemory<std::complex<double>> &b, int ldb, \
+ std::complex<double> beta, \
+ DeviceMemory<std::complex<double>> *c, int ldc) override; \
+ bool DoBlasTrmm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, float alpha, const DeviceMemory<float> &a, \
+ int lda, DeviceMemory<float> *b, int ldb) override; \
+ bool DoBlasTrmm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, double alpha, const DeviceMemory<double> &a, \
+ int lda, DeviceMemory<double> *b, int ldb) override; \
+ bool DoBlasTrmm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ DeviceMemory<std::complex<float>> *b, int ldb) override; \
+ bool DoBlasTrmm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ DeviceMemory<std::complex<double>> *b, int ldb) override; \
+ bool DoBlasTrsm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, float alpha, const DeviceMemory<float> &a, \
+ int lda, DeviceMemory<float> *b, int ldb) override; \
+ bool DoBlasTrsm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, double alpha, const DeviceMemory<double> &a, \
+ int lda, DeviceMemory<double> *b, int ldb) override; \
+ bool DoBlasTrsm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, std::complex<float> alpha, \
+ const DeviceMemory<std::complex<float>> &a, int lda, \
+ DeviceMemory<std::complex<float>> *b, int ldb) override; \
+ bool DoBlasTrsm(Stream *stream, blas::Side side, blas::UpperLower uplo, \
+ blas::Transpose transa, blas::Diagonal diag, uint64 m, \
+ uint64 n, std::complex<double> alpha, \
+ const DeviceMemory<std::complex<double>> &a, int lda, \
+ DeviceMemory<std::complex<double>> *b, int ldb) override;
+
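+// Illustrative usage sketch (the class name below is hypothetical): a platform
+// BLAS implementation can declare all of the overrides above in one place:
+//
+//   class MyPlatformBlas : public blas::BlasSupport {
+//    public:
+//     TENSORFLOW_STREAM_EXECUTOR_GPU_BLAS_SUPPORT_OVERRIDES
+//   };
+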
+} // namespace blas
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_BLAS_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_activation.cc b/tensorflow/stream_executor/cuda/cuda_activation.cc
new file mode 100644
index 0000000000..32d2c0d424
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_activation.cc
@@ -0,0 +1,30 @@
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
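+// Defined alongside the CUDA executor implementation rather than in this
+// file; forward-declared here so we can reach the underlying CUDA objects
+// without pulling in the executor's header.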
+CUcontext ExtractCudaContext(CUDAExecutor *cuda_exec);
+CUDAExecutor *ExtractCudaExecutor(StreamExecutor *stream_exec);
+
+ScopedActivateExecutorContext::ScopedActivateExecutorContext(
+ CUDAExecutor *cuda_exec, MultiOpActivation moa)
+ : cuda_exec_(cuda_exec),
+ driver_scoped_activate_context_(
+ new ScopedActivateContext{ExtractCudaContext(cuda_exec), moa}) {}
+
+ScopedActivateExecutorContext::ScopedActivateExecutorContext(
+ StreamExecutor *stream_exec, MultiOpActivation moa)
+ : ScopedActivateExecutorContext(ExtractCudaExecutor(stream_exec), moa) {}
+
+ScopedActivateExecutorContext::~ScopedActivateExecutorContext() {
+ delete static_cast<ScopedActivateContext *>(driver_scoped_activate_context_);
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_activation.h b/tensorflow/stream_executor/cuda/cuda_activation.h
new file mode 100644
index 0000000000..4181d13d0a
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_activation.h
@@ -0,0 +1,53 @@
+// This file contains APIs that assume a StreamExecutor is backed by CUDA.
+// It reaches into the CUDA implementation to activate an underlying CUDA
+// context.
+//
+// Having this file separate from cuda_gpu_executor.h means that dependent
+// code does not also have to depend on cuda.h.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_ACTIVATION_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_ACTIVATION_H_
+
+#include "tensorflow/stream_executor/cuda/multi_op_activation.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class StreamExecutor;
+
+namespace cuda {
+
+class CUDAExecutor;
+class ScopedActivateContext;
+
+// Activates a CUDA context within an enclosing scope.
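+//
+// A rough usage sketch -- activation lasts for the lifetime of the object:
+//
+//   void IssueDriverCalls(StreamExecutor* stream_exec) {
+//     ScopedActivateExecutorContext activation{stream_exec};
+//     // CUDA driver calls made here run against the activated context.
+//   }  // The previously active context becomes current again on destruction.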
+class ScopedActivateExecutorContext {
+ public:
+ // Form that takes a CUDA executor implementation.
+ explicit ScopedActivateExecutorContext(
+ CUDAExecutor* cuda_exec, MultiOpActivation moa = MultiOpActivation::kNo);
+
+ // Form that takes a pImpl executor and extracts the CUDA implementation --
+ // fatal failure if the underlying implementation is not CUDA.
+ explicit ScopedActivateExecutorContext(
+ StreamExecutor* stream_exec,
+ MultiOpActivation moa = MultiOpActivation::kNo);
+
+ ~ScopedActivateExecutorContext();
+
+ private:
+ // The CUDA executor implementation whose context is activated.
+ CUDAExecutor* cuda_exec_;
+
+ // The cuda.h-using datatype that we wrap; owned here and deleted in the
+ // destructor.
+ ScopedActivateContext* driver_scoped_activate_context_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedActivateExecutorContext);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_ACTIVATION_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_blas.cc b/tensorflow/stream_executor/cuda/cuda_blas.cc
new file mode 100644
index 0000000000..ef1036bca3
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_blas.cc
@@ -0,0 +1,2184 @@
+#include "tensorflow/stream_executor/cuda/cuda_blas.h"
+
+#include <dlfcn.h>
+
+#include <complex>
+
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/cuda/cuda_helpers.h"
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/status_macros.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "third_party/gpus/cuda/include/cublas_v2.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(kCuBlasPlugin);
+
+namespace dynload {
+
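+// Each wrapped name below becomes a callable shim object in this namespace:
+// the first call dlsym()s the symbol out of the cuBLAS DSO, and every call
+// activates the given CUDAExecutor's context before forwarding the remaining
+// arguments to the real cuBLAS entry point. The "V2" spelling mirrors
+// cublas_v2.h's remapping of these names onto their *_v2 symbols (e.g.
+// cublasCreate -> cublasCreate_v2), which is why call sites below refer to
+// dynload::cublasCreate_v2 and friends.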
+#define PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(__name) \
+ struct DynLoadShim__##__name { \
+ static const char *kName; \
+ using FuncPointerT = std::add_pointer<decltype(::__name)>::type; \
+ static void *GetDsoHandle() { \
+ static auto status = internal::CachedDsoLoader::GetCublasDsoHandle(); \
+ return status.ValueOrDie(); \
+ } \
+ static FuncPointerT DynLoad() { \
+ static void *f = dlsym(GetDsoHandle(), kName); \
+ CHECK(f != nullptr) << "could not find " << kName \
+ << " in cuBLAS DSO; dlerror: " << dlerror(); \
+ return reinterpret_cast<FuncPointerT>(f); \
+ } \
+ template <typename... Args> \
+ cublasStatus_t operator()(CUDAExecutor * parent, Args... args) { \
+ cuda::ScopedActivateExecutorContext sac{parent}; \
+ return DynLoad()(args...); \
+ } \
+ } __name; \
+ const char *DynLoadShim__##__name::kName = #__name;
+
+#define PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(__name) \
+ PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(__name)
+
+#define CUBLAS_BLAS_ROUTINE_EACH(__macro) \
+ __macro(cublasSnrm2) \
+ __macro(cublasDnrm2) \
+ __macro(cublasScnrm2) \
+ __macro(cublasDznrm2) \
+ __macro(cublasSdot) \
+ __macro(cublasDdot) \
+ __macro(cublasCdotu) \
+ __macro(cublasCdotc) \
+ __macro(cublasZdotu) \
+ __macro(cublasZdotc) \
+ __macro(cublasSscal) \
+ __macro(cublasDscal) \
+ __macro(cublasCscal) \
+ __macro(cublasCsscal) \
+ __macro(cublasZscal) \
+ __macro(cublasZdscal) \
+ __macro(cublasSaxpy) \
+ __macro(cublasDaxpy) \
+ __macro(cublasCaxpy) \
+ __macro(cublasZaxpy) \
+ __macro(cublasScopy) \
+ __macro(cublasDcopy) \
+ __macro(cublasCcopy) \
+ __macro(cublasZcopy) \
+ __macro(cublasSswap) \
+ __macro(cublasDswap) \
+ __macro(cublasCswap) \
+ __macro(cublasZswap) \
+ __macro(cublasIsamax) \
+ __macro(cublasIdamax) \
+ __macro(cublasIcamax) \
+ __macro(cublasIzamax) \
+ __macro(cublasIsamin) \
+ __macro(cublasIdamin) \
+ __macro(cublasIcamin) \
+ __macro(cublasIzamin) \
+ __macro(cublasSasum) \
+ __macro(cublasDasum) \
+ __macro(cublasScasum) \
+ __macro(cublasDzasum) \
+ __macro(cublasSrot) \
+ __macro(cublasDrot) \
+ __macro(cublasCrot) \
+ __macro(cublasCsrot) \
+ __macro(cublasZrot) \
+ __macro(cublasZdrot) \
+ __macro(cublasSrotg) \
+ __macro(cublasDrotg) \
+ __macro(cublasCrotg) \
+ __macro(cublasZrotg) \
+ __macro(cublasSrotm) \
+ __macro(cublasDrotm) \
+ __macro(cublasSrotmg) \
+ __macro(cublasDrotmg) \
+ __macro(cublasSgemv) \
+ __macro(cublasDgemv) \
+ __macro(cublasCgemv) \
+ __macro(cublasZgemv) \
+ __macro(cublasSgbmv) \
+ __macro(cublasDgbmv) \
+ __macro(cublasCgbmv) \
+ __macro(cublasZgbmv) \
+ __macro(cublasStrmv) \
+ __macro(cublasDtrmv) \
+ __macro(cublasCtrmv) \
+ __macro(cublasZtrmv) \
+ __macro(cublasStbmv) \
+ __macro(cublasDtbmv) \
+ __macro(cublasCtbmv) \
+ __macro(cublasZtbmv) \
+ __macro(cublasStpmv) \
+ __macro(cublasDtpmv) \
+ __macro(cublasCtpmv) \
+ __macro(cublasZtpmv) \
+ __macro(cublasStrsv) \
+ __macro(cublasDtrsv) \
+ __macro(cublasCtrsv) \
+ __macro(cublasZtrsv) \
+ __macro(cublasStpsv) \
+ __macro(cublasDtpsv) \
+ __macro(cublasCtpsv) \
+ __macro(cublasZtpsv) \
+ __macro(cublasStbsv) \
+ __macro(cublasDtbsv) \
+ __macro(cublasCtbsv) \
+ __macro(cublasZtbsv) \
+ __macro(cublasSsymv) \
+ __macro(cublasDsymv) \
+ __macro(cublasCsymv) \
+ __macro(cublasZsymv) \
+ __macro(cublasChemv) \
+ __macro(cublasZhemv) \
+ __macro(cublasSsbmv) \
+ __macro(cublasDsbmv) \
+ __macro(cublasChbmv) \
+ __macro(cublasZhbmv) \
+ __macro(cublasSspmv) \
+ __macro(cublasDspmv) \
+ __macro(cublasChpmv) \
+ __macro(cublasZhpmv) \
+ __macro(cublasSger) \
+ __macro(cublasDger) \
+ __macro(cublasCgeru) \
+ __macro(cublasCgerc) \
+ __macro(cublasZgeru) \
+ __macro(cublasZgerc) \
+ __macro(cublasSsyr) \
+ __macro(cublasDsyr) \
+ __macro(cublasCsyr) \
+ __macro(cublasZsyr) \
+ __macro(cublasCher) \
+ __macro(cublasZher) \
+ __macro(cublasSspr) \
+ __macro(cublasDspr) \
+ __macro(cublasChpr) \
+ __macro(cublasZhpr) \
+ __macro(cublasSsyr2) \
+ __macro(cublasDsyr2) \
+ __macro(cublasCsyr2) \
+ __macro(cublasZsyr2) \
+ __macro(cublasCher2) \
+ __macro(cublasZher2) \
+ __macro(cublasSspr2) \
+ __macro(cublasDspr2) \
+ __macro(cublasChpr2) \
+ __macro(cublasZhpr2) \
+ __macro(cublasSgemm) \
+ __macro(cublasDgemm) \
+ __macro(cublasCgemm) \
+ __macro(cublasZgemm) \
+ __macro(cublasSsyrk) \
+ __macro(cublasDsyrk) \
+ __macro(cublasCsyrk) \
+ __macro(cublasZsyrk) \
+ __macro(cublasCherk) \
+ __macro(cublasZherk) \
+ __macro(cublasSsyr2k) \
+ __macro(cublasDsyr2k) \
+ __macro(cublasCsyr2k) \
+ __macro(cublasZsyr2k) \
+ __macro(cublasCher2k) \
+ __macro(cublasZher2k) \
+ __macro(cublasSsyrkx) \
+ __macro(cublasDsyrkx) \
+ __macro(cublasCsyrkx) \
+ __macro(cublasZsyrkx) \
+ __macro(cublasCherkx) \
+ __macro(cublasZherkx) \
+ __macro(cublasSsymm) \
+ __macro(cublasDsymm) \
+ __macro(cublasCsymm) \
+ __macro(cublasZsymm) \
+ __macro(cublasChemm) \
+ __macro(cublasZhemm) \
+ __macro(cublasStrsm) \
+ __macro(cublasDtrsm) \
+ __macro(cublasCtrsm) \
+ __macro(cublasZtrsm) \
+ __macro(cublasStrmm) \
+ __macro(cublasDtrmm) \
+ __macro(cublasCtrmm) \
+ __macro(cublasZtrmm) \
+ __macro(cublasSgeam) \
+ __macro(cublasDgeam) \
+ __macro(cublasCgeam) \
+ __macro(cublasZgeam) \
+ __macro(cublasSdgmm) \
+ __macro(cublasDdgmm) \
+ __macro(cublasCdgmm) \
+ __macro(cublasZdgmm)
+
+PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(cublasCreate)
+PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(cublasDestroy)
+PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(cublasSetStream)
+PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(cublasSetPointerMode)
+PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP(cublasGetPointerMode)
+PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(cublasSgemmBatched)
+PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(cublasDgemmBatched)
+PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(cublasCgemmBatched)
+PERFTOOLS_GPUTOOLS_CUBLAS_WRAP(cublasZgemmBatched)
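+// Instantiates one dynload::<name> shim per routine listed in
+// CUBLAS_BLAS_ROUTINE_EACH; a call then looks roughly like
+//   dynload::cublasSscal(parent, handle, elem_count, &alpha, device_ptr, incx);
+// with the executor first and the cuBLAS handle second, as in the wrappers
+// further below.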
+CUBLAS_BLAS_ROUTINE_EACH(PERFTOOLS_GPUTOOLS_CUBLAS_V2_WRAP)
+
+} // namespace dynload
+
+static string ToString(cublasStatus_t status) {
+ switch (status) {
+ case CUBLAS_STATUS_SUCCESS:
+ return "CUBLAS_STATUS_SUCCESS";
+ case CUBLAS_STATUS_NOT_INITIALIZED:
+ return "CUBLAS_STATUS_NOT_INITIALIZED";
+ case CUBLAS_STATUS_ALLOC_FAILED:
+ return "CUBLAS_STATUS_ALLOC_FAILED";
+ case CUBLAS_STATUS_INVALID_VALUE:
+ return "CUBLAS_STATUS_INVALID_VALUE";
+ case CUBLAS_STATUS_ARCH_MISMATCH:
+ return "CUBLAS_STATUS_ARCH_MISMATCH";
+ case CUBLAS_STATUS_MAPPING_ERROR:
+ return "CUBLAS_STATUS_MAPPING_ERROR";
+ case CUBLAS_STATUS_EXECUTION_FAILED:
+ return "CUBLAS_STATUS_EXECUTION_FAILED";
+ case CUBLAS_STATUS_INTERNAL_ERROR:
+ return "CUBLAS_STATUS_INTERNAL_ERROR";
+ default:
+ return port::StrCat("<invalid cublas status: ", status, ">");
+ }
+}
+
+// cuBLAS has interfaces that permit pointers to be passed from either the host
+// memory space or the device memory space; however, you must instruct it as to
+// which address space those pointers are in with cublasSetPointerMode.
+//
+// This helper sets the cuBLAS pointer mode to a desired value for the
+// duration of a scope, around the cuBLAS call(s) about to be performed.
+//
+// The prior cuBLAS pointer mode is retained and restored when this object goes
+// out of scope.
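+//
+// A sketch of the shape of a use (DoBlasInternal below is the real call site):
+//
+//   ScopedCublasPointerMode pointer_mode{parent, handle};
+//   if (!pointer_mode.Init(CUBLAS_POINTER_MODE_HOST)) return false;
+//   // ... perform cuBLAS calls with host-resident alpha/beta scalars ...
+//   // Destructor restores whatever pointer mode was previously in effect.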
+class ScopedCublasPointerMode {
+ public:
+ // Note that, because the setting of the cublas pointer mode is fallible,
+ // construction of this scoped datatype must be paired with a call to
+ // Init().
+ //
+ // Parameters:
+ // parent: Executor whose CUDA context is activated while the mode is changed.
+ // handle: The cublas library handle to act upon in setting the pointer mode.
+ explicit ScopedCublasPointerMode(CUDAExecutor *parent, cublasHandle_t handle)
+ : parent_(parent), handle_(handle), ok_(false) {}
+
+ // Attempts the switch to the requested scoped pointer mode, new_mode.
+ //
+ // Note that when false is returned, an appropriate error has already been
+ // logged.
+ bool Init(cublasPointerMode_t new_mode) {
+ cublasStatus_t ret =
+ dynload::cublasGetPointerMode_v2(parent_, handle_, &old_mode_);
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to get old cublas pointer mode: " << ToString(ret);
+ return ok_ = false;
+ }
+
+ ret = dynload::cublasSetPointerMode_v2(parent_, handle_, new_mode);
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set new cublas pointer mode: " << ToString(ret);
+ return ok_ = false;
+ }
+
+ return ok_ = true;
+ }
+
+ // Switches back to the prior pointer mode, if the switch operation was
+ // successful in the first place.
+ ~ScopedCublasPointerMode() {
+ if (ok_) {
+ cublasStatus_t ret =
+ dynload::cublasSetPointerMode_v2(parent_, handle_, old_mode_);
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set former cublas pointer mode: "
+ << ToString(ret);
+ }
+ }
+ }
+
+ private:
+ CUDAExecutor *parent_; // Executor on whose behalf the pointer mode is set.
+ cublasHandle_t handle_; // Handle to the cuBLAS instance of interest.
+ cublasPointerMode_t old_mode_; // Prior cuBLAS pointer mode, to be restored.
+ bool ok_; // Whether the change was successful.
+};
+
+bool CUDABlas::Init() {
+ cublasStatus_t ret = dynload::cublasCreate_v2(parent_, &blas_);
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to create cublas handle: " << ToString(ret);
+ return false;
+ }
+
+ return true;
+}
+
+CUDABlas::CUDABlas(cuda::CUDAExecutor *parent)
+ : parent_(CHECK_NOTNULL(parent)), blas_(nullptr) {}
+
+CUDABlas::~CUDABlas() {
+ if (blas_ != nullptr) {
+ dynload::cublasDestroy_v2(parent_, blas_);
+ }
+}
+
+bool CUDABlas::SetStream(Stream *stream) {
+ CHECK(stream != nullptr);
+ CHECK(AsCUDAStreamValue(stream) != nullptr);
+ CHECK(blas_ != nullptr);
+ cublasStatus_t ret =
+ dynload::cublasSetStream_v2(parent_, blas_, AsCUDAStreamValue(stream));
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set stream for cuBLAS calls: " << ToString(ret);
+ return false;
+ }
+
+ return true;
+}
+
+namespace {
+
+// Helper functions transforming blas arguments into cuBLAS arguments.
+
+cublasOperation_t CUDABlasTranspose(blas::Transpose trans) {
+ switch (trans) {
+ case blas::Transpose::kNoTranspose:
+ return CUBLAS_OP_N;
+ case blas::Transpose::kTranspose:
+ return CUBLAS_OP_T;
+ case blas::Transpose::kConjugateTranspose:
+ return CUBLAS_OP_C;
+ default:
+ LOG(FATAL) << "Invalid value of blas::Transpose.";
+ }
+}
+
+cublasFillMode_t CUDABlasUpperLower(blas::UpperLower uplo) {
+ switch (uplo) {
+ case blas::UpperLower::kUpper:
+ return CUBLAS_FILL_MODE_UPPER;
+ case blas::UpperLower::kLower:
+ return CUBLAS_FILL_MODE_LOWER;
+ default:
+ LOG(FATAL) << "Invalid value of blas::UpperLower.";
+ }
+}
+
+cublasDiagType_t CUDABlasDiagonal(blas::Diagonal diag) {
+ switch (diag) {
+ case blas::Diagonal::kUnit:
+ return CUBLAS_DIAG_UNIT;
+ case blas::Diagonal::kNonUnit:
+ return CUBLAS_DIAG_NON_UNIT;
+ default:
+ LOG(FATAL) << "Invalid value of blas::Diagonal.";
+ }
+}
+
+cublasSideMode_t CUDABlasSide(blas::Side side) {
+ switch (side) {
+ case blas::Side::kLeft:
+ return CUBLAS_SIDE_LEFT;
+ case blas::Side::kRight:
+ return CUBLAS_SIDE_RIGHT;
+ default:
+ LOG(FATAL) << "Invalid value of blas::Side.";
+ }
+}
+
+} // namespace
+
+template <typename FuncT, typename... Args>
+bool CUDABlas::DoBlasInternal(FuncT cublas_func, Stream *stream,
+ bool pointer_mode_host, Args... args) {
+ mutex_lock lock{mu_};
+
+ CHECK(blas_ != nullptr);
+ if (!SetStream(stream)) {
+ return false;
+ }
+
+ ScopedCublasPointerMode pointer_mode{parent_, blas_};
+ if (!pointer_mode.Init(pointer_mode_host ? CUBLAS_POINTER_MODE_HOST
+ : CUBLAS_POINTER_MODE_DEVICE)) {
+ return false;
+ }
+
+ cublasStatus_t ret = cublas_func(parent_, blas_, args...);
+ if (ret != CUBLAS_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to run cuBLAS routine " << cublas_func.kName << ": "
+ << ToString(ret);
+ return false;
+ }
+
+ return true;
+}
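+
+// In the wrappers below, pointer_mode_host == true means the scalar arguments
+// (alpha, beta, and the c/s values for rot) are passed as host pointers --
+// e.g. DoBlasAxpy hands cuBLAS the address of its alpha parameter -- whereas
+// false means the scalar inputs/outputs live in device memory, as when
+// DoBlasAsum writes its result into a DeviceMemory<float>.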
+
+bool CUDABlas::DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *result) {
+ return DoBlasInternal(dynload::cublasSasum, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *result) {
+ return DoBlasInternal(dynload::cublasDasum, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result) {
+ return DoBlasInternal(
+ dynload::cublasScasum, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasAsum(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result) {
+ return DoBlasInternal(
+ dynload::cublasDzasum, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasAxpy(Stream *stream, uint64 elem_count, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(dynload::cublasSaxpy, stream,
+ true /* = pointer_mode_host */, elem_count, &alpha,
+ CUDAMemory(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasAxpy(Stream *stream, uint64 elem_count, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(dynload::cublasDaxpy, stream,
+ true /* = pointer_mode_host */, elem_count, &alpha,
+ CUDAMemory(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasAxpy(Stream *stream, uint64 elem_count,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasCaxpy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasAxpy(Stream *stream, uint64 elem_count,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasZaxpy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(dynload::cublasScopy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(dynload::cublasDcopy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasCcopy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasCopy(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasZcopy, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasDot(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *result) {
+ return DoBlasInternal(
+ dynload::cublasSdot, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemory(y), incy, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasDot(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *result) {
+ return DoBlasInternal(
+ dynload::cublasDdot, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemory(y), incy, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasDotc(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result) {
+ return DoBlasInternal(
+ dynload::cublasCdotc, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(result)));
+}
+
+bool CUDABlas::DoBlasDotc(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result) {
+ return DoBlasInternal(
+ dynload::cublasZdotc, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(result)));
+}
+
+bool CUDABlas::DoBlasDotu(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result) {
+ return DoBlasInternal(
+ dynload::cublasCdotu, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(result)));
+}
+
+bool CUDABlas::DoBlasDotu(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result) {
+ return DoBlasInternal(
+ dynload::cublasZdotu, stream, false /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(result)));
+}
+
+bool CUDABlas::DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *result) {
+ return DoBlasInternal(dynload::cublasSnrm2, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *result) {
+ return DoBlasInternal(dynload::cublasDnrm2, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result) {
+ return DoBlasInternal(
+ dynload::cublasScnrm2, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasNrm2(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result) {
+ return DoBlasInternal(
+ dynload::cublasDznrm2, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy, float c, float s) {
+ return DoBlasInternal(
+ dynload::cublasSrot, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy, &c, &s);
+}
+
+bool CUDABlas::DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy, double c,
+ double s) {
+ return DoBlasInternal(
+ dynload::cublasDrot, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy, &c, &s);
+}
+
+bool CUDABlas::DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy,
+ float c, float s) {
+ return DoBlasInternal(dynload::cublasCsrot, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemoryMutable(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy, &c, &s);
+}
+
+bool CUDABlas::DoBlasRot(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy,
+ double c, double s) {
+ return DoBlasInternal(dynload::cublasZdrot, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemoryMutable(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy, &c, &s);
+}
+
+bool CUDABlas::DoBlasRotg(Stream *stream, DeviceMemory<float> *a,
+ DeviceMemory<float> *b, DeviceMemory<float> *c,
+ DeviceMemory<float> *s) {
+ return DoBlasInternal(dynload::cublasSrotg, stream,
+ false /* = pointer_mode_host */, CUDAMemoryMutable(a),
+ CUDAMemoryMutable(b), CUDAMemoryMutable(c),
+ CUDAMemoryMutable(s));
+}
+
+bool CUDABlas::DoBlasRotg(Stream *stream, DeviceMemory<double> *a,
+ DeviceMemory<double> *b, DeviceMemory<double> *c,
+ DeviceMemory<double> *s) {
+ return DoBlasInternal(dynload::cublasDrotg, stream,
+ false /* = pointer_mode_host */,
+ CUDAComplex(CUDAMemoryMutable(a)), CUDAMemoryMutable(b),
+ CUDAMemoryMutable(c), CUDAMemoryMutable(s));
+}
+
+bool CUDABlas::DoBlasRotg(Stream *stream, DeviceMemory<std::complex<float>> *a,
+ DeviceMemory<std::complex<float>> *b,
+ DeviceMemory<float> *c,
+ DeviceMemory<std::complex<float>> *s) {
+ return DoBlasInternal(
+ dynload::cublasCrotg, stream, false /* = pointer_mode_host */,
+ CUDAComplex(CUDAMemoryMutable(a)), CUDAComplex(CUDAMemoryMutable(b)),
+ CUDAComplex(CUDAMemoryMutable(c)), CUDAComplex(CUDAMemoryMutable(s)));
+}
+
+bool CUDABlas::DoBlasRotg(Stream *stream, DeviceMemory<std::complex<double>> *a,
+ DeviceMemory<std::complex<double>> *b,
+ DeviceMemory<double> *c,
+ DeviceMemory<std::complex<double>> *s) {
+ return DoBlasInternal(
+ dynload::cublasZrotg, stream, false /* = pointer_mode_host */,
+ CUDAComplex(CUDAMemoryMutable(a)), CUDAComplex(CUDAMemoryMutable(b)),
+ CUDAComplex(CUDAMemoryMutable(c)), CUDAComplex(CUDAMemoryMutable(s)));
+}
+
+bool CUDABlas::DoBlasRotm(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy,
+ const DeviceMemory<float> &param) {
+ return DoBlasInternal(dynload::cublasSrotm, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy,
+ CUDAMemory(param));
+}
+
+bool CUDABlas::DoBlasRotm(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy,
+ const DeviceMemory<double> &param) {
+ return DoBlasInternal(dynload::cublasDrotm, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy,
+ CUDAMemory(param));
+}
+
+bool CUDABlas::DoBlasRotmg(Stream *stream, DeviceMemory<float> *d1,
+ DeviceMemory<float> *d2, DeviceMemory<float> *x1,
+ const DeviceMemory<float> &y1,
+ DeviceMemory<float> *param) {
+ return DoBlasInternal(dynload::cublasSrotmg, stream,
+ false /* = pointer_mode_host */, CUDAMemoryMutable(d1),
+ CUDAMemoryMutable(d2), CUDAMemoryMutable(x1),
+ CUDAMemory(y1), CUDAMemoryMutable(param));
+}
+
+bool CUDABlas::DoBlasRotmg(Stream *stream, DeviceMemory<double> *d1,
+ DeviceMemory<double> *d2, DeviceMemory<double> *x1,
+ const DeviceMemory<double> &y1,
+ DeviceMemory<double> *param) {
+ return DoBlasInternal(dynload::cublasDrotmg, stream,
+ false /* = pointer_mode_host */, CUDAMemoryMutable(d1),
+ CUDAMemoryMutable(d2), CUDAMemoryMutable(x1),
+ CUDAMemory(y1), CUDAMemoryMutable(param));
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count, float alpha,
+ DeviceMemory<float> *x, int incx) {
+ return DoBlasInternal(dynload::cublasSscal, stream,
+ true /* = pointer_mode_host */, elem_count, &alpha,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count, double alpha,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(dynload::cublasDscal, stream,
+ true /* = pointer_mode_host */, elem_count, &alpha,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count, float alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasCsscal, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count, double alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasZdscal, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count,
+ std::complex<float> alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasCscal, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasScal(Stream *stream, uint64 elem_count,
+ std::complex<double> alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasZscal, stream, true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(dynload::cublasSswap, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(dynload::cublasDswap, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAMemoryMutable(x), incx, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasCswap, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemoryMutable(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasSwap(Stream *stream, uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(dynload::cublasZswap, stream,
+ true /* = pointer_mode_host */, elem_count,
+ CUDAComplex(CUDAMemoryMutable(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(dynload::cublasIsamax, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(dynload::cublasIdamax, stream,
+ false /* = pointer_mode_host */, elem_count,
+ CUDAMemory(x), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIcamax, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamax(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIzamax, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIsamin, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIdamin, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIcamin, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasIamin(Stream *stream, uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) {
+ return DoBlasInternal(
+ dynload::cublasIzamin, stream, false /* = pointer_mode_host */,
+ elem_count, CUDAComplex(CUDAMemory(x)), incx, CUDAMemoryMutable(result));
+}
+
+bool CUDABlas::DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasSgbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, kl, ku, &alpha, CUDAMemory(a), lda,
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasDgbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, kl, ku, &alpha, CUDAMemory(a), lda,
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasCgbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, kl, ku, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasGbmv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, uint64 kl, uint64 ku,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasZgbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, kl, ku, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &x, int incx,
+ float beta, DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasSgemv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, &alpha, CUDAMemory(a), lda, CUDAMemory(x),
+ incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasDgemv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, &alpha, CUDAMemory(a), lda, CUDAMemory(x),
+ incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasCgemv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasGemv(Stream *stream, blas::Transpose trans, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasZgemv, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(trans), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasGer(Stream *stream, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasSger, stream, true /* = pointer_mode_host */, m, n, &alpha,
+ CUDAMemory(x), incx, CUDAMemory(y), incy, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasGer(Stream *stream, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasDger, stream, true /* = pointer_mode_host */, m, n, &alpha,
+ CUDAMemory(x), incx, CUDAMemory(y), incy, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasGerc(Stream *stream, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasCgerc, stream, true /* = pointer_mode_host */, m, n,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemory(y)), incy, CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasGerc(Stream *stream, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasZgerc, stream, true /* = pointer_mode_host */, m, n,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemory(y)), incy, CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasGeru(Stream *stream, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasCgeru, stream, true /* = pointer_mode_host */, m, n,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemory(y)), incy, CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasGeru(Stream *stream, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasZgeru, stream, true /* = pointer_mode_host */, m, n,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemory(y)), incy, CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasChbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, k, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasZhbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, k, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasChemv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHemv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasZhemv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasCher, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasHer(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasZher, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasCher2, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasHer2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda) {
+ return DoBlasInternal(
+ dynload::cublasZher2, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(a)), lda);
+}
+
+bool CUDABlas::DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &ap,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasChpmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(ap)), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &ap,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasZhpmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(ap)), CUDAComplex(CUDAMemory(x)), incx,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(y)), incy);
+}
+
+bool CUDABlas::DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *ap) {
+ return DoBlasInternal(
+ dynload::cublasChpr, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemoryMutable(ap)));
+}
+
+bool CUDABlas::DoBlasHpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *ap) {
+ return DoBlasInternal(
+ dynload::cublasZhpr, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemoryMutable(ap)));
+}
+
+bool CUDABlas::DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *ap) {
+ return DoBlasInternal(
+ dynload::cublasChpr2, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(ap)));
+}
+
+bool CUDABlas::DoBlasHpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *ap) {
+ return DoBlasInternal(
+ dynload::cublasZhpr2, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(x)), incx, CUDAComplex(CUDAMemory(y)), incy,
+ CUDAComplex(CUDAMemoryMutable(ap)));
+}
+
+bool CUDABlas::DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &x, int incx,
+ float beta, DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasSsbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, k, &alpha, CUDAMemory(a), lda, CUDAMemory(x),
+ incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSbmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ uint64 k, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(
+ dynload::cublasDsbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, k, &alpha, CUDAMemory(a), lda, CUDAMemory(x),
+ incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &ap,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(dynload::cublasSspmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(ap),
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSpmv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &ap,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(dynload::cublasDspmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(ap),
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *ap) {
+ return DoBlasInternal(dynload::cublasSspr, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemoryMutable(ap));
+}
+
+bool CUDABlas::DoBlasSpr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *ap) {
+ return DoBlasInternal(dynload::cublasDspr, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemoryMutable(ap));
+}
+
+bool CUDABlas::DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *ap) {
+ return DoBlasInternal(dynload::cublasSspr2, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemory(y), incy, CUDAMemoryMutable(ap));
+}
+
+bool CUDABlas::DoBlasSpr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *ap) {
+ return DoBlasInternal(dynload::cublasDspr2, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemory(y), incy, CUDAMemoryMutable(ap));
+}
+
+bool CUDABlas::DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ return DoBlasInternal(dynload::cublasSsymv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(a), lda,
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSymv(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy) {
+ return DoBlasInternal(dynload::cublasDsymv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(a), lda,
+ CUDAMemory(x), incx, &beta, CUDAMemoryMutable(y), incy);
+}
+
+bool CUDABlas::DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *a, int lda) {
+ return DoBlasInternal(dynload::cublasSsyr, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasSyr(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *a, int lda) {
+ return DoBlasInternal(dynload::cublasDsyr, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ float alpha, const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) {
+ return DoBlasInternal(dynload::cublasSsyr2, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemory(y), incy, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasSyr2(Stream *stream, blas::UpperLower uplo, uint64 n,
+ double alpha, const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) {
+ return DoBlasInternal(dynload::cublasDsyr2, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), n, &alpha, CUDAMemory(x),
+ incx, CUDAMemory(y), incy, CUDAMemoryMutable(a), lda);
+}
+
+bool CUDABlas::DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ return DoBlasInternal(dynload::cublasStbmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(dynload::cublasDtbmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasCtbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTbmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasZtbmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ return DoBlasInternal(dynload::cublasStbsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(dynload::cublasDtbsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasCtbsv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTbsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ uint64 k, const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasZtbsv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, k, CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasStpmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(ap), CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasDtpmv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(ap), CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasCtpmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(ap)),
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTpmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasZtpmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(ap)),
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx) {
+ return DoBlasInternal(
+ dynload::cublasStpsv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(ap), CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(
+ dynload::cublasDtpsv, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(ap), CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasCtpsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(ap)),
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTpsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasZtpsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(ap)),
+ CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ return DoBlasInternal(dynload::cublasStrmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(dynload::cublasDtrmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasCtrmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(a)),
+ lda, CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTrmv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasZtrmv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(a)),
+ lda, CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ return DoBlasInternal(dynload::cublasStrsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ return DoBlasInternal(dynload::cublasDtrsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAMemory(a), lda,
+ CUDAMemoryMutable(x), incx);
+}
+
+bool CUDABlas::DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasCtrsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(a)),
+ lda, CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasTrsv(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ return DoBlasInternal(dynload::cublasZtrsv, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans),
+ CUDABlasDiagonal(diag), n, CUDAComplex(CUDAMemory(a)),
+ lda, CUDAComplex(CUDAMemoryMutable(x)), incx);
+}
+
+bool CUDABlas::DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ VLOG(1) << port::Printf(
+ "doing cuBLAS SGEMM: at=%d bt=%d m=%llu n=%llu "
+ "k=%llu alpha=%f a=%p lda=%d b=%p ldb=%d beta=%f "
+ "c=%p ldc=%d",
+ static_cast<int>(transa), static_cast<int>(transb), m, n, k, alpha,
+ a.opaque(), lda, b.opaque(), ldb, beta, c->opaque(), ldc);
+ if (transa == blas::Transpose::kNoTranspose) {
+ if (lda < static_cast<int64>(m)) {
+      LOG(WARNING) << "GEMM lda (" << lda << ") was smaller than m (" << m
+                   << ") (no transpose case); precondition violation";
+ }
+ } else {
+ if (lda < static_cast<int64>(k)) {
+ LOG(WARNING) << "GEMM lda (" << lda << ") was smaller than k (" << k
+ << ") (transpose case); precondition violation";
+ }
+ }
+ if (transb == blas::Transpose::kNoTranspose) {
+ if (ldb < static_cast<int64>(k)) {
+ LOG(WARNING) << "GEMM ldb (" << ldb << ") was smaller than k (" << k
+ << ") (no transpose case); precondition violation";
+ }
+ } else {
+ if (ldb < static_cast<int64>(n)) {
+      LOG(WARNING) << "GEMM ldb (" << ldb << ") was smaller than n (" << n
+                   << ") (transpose case); precondition violation";
+ }
+ }
+ return DoBlasInternal(
+ dynload::cublasSgemm, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k, &alpha,
+ CUDAMemory(a), lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
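+
+// Illustrative note on the leading-dimension checks above (cuBLAS assumes
+// column-major storage): for an untransposed m x k matrix A, consecutive
+// columns are lda elements apart, so lda must be at least m -- e.g. a tightly
+// packed 3x2 A needs lda = 3, and lda = 2 would trip the warning. The
+// analogous bounds are ldb >= k for an untransposed B and ldb >= n for a
+// transposed B.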
+
+bool CUDABlas::DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasDgemm, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k, &alpha,
+ CUDAMemory(a), lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasCgemm, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasGemm(Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasZgemm, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+template <typename T, typename FuncT>
+port::Status CUDABlas::DoBlasGemmBatchedInternal(
+ FuncT cublas_func, Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, T alpha,
+ const port::ArraySlice<DeviceMemory<T> *> &a_array, int lda,
+ const port::ArraySlice<DeviceMemory<T> *> &b_array, int ldb, T beta,
+ const port::ArraySlice<DeviceMemory<T> *> &c_array, int ldc,
+ int batch_count) {
+ std::vector<T *> a_ptr_vec, b_ptr_vec, c_ptr_vec;
+ for (int i = 0; i < batch_count; ++i) {
+ a_ptr_vec.push_back(static_cast<T *>(a_array[i]->opaque()));
+ b_ptr_vec.push_back(static_cast<T *>(b_array[i]->opaque()));
+ c_ptr_vec.push_back(static_cast<T *>(c_array[i]->opaque()));
+ }
+
+ typedef typename CUDAComplexT<T>::type CUDA_T;
+ SE_ASSIGN_OR_RETURN(
+ std::unique_ptr<TemporaryDeviceMemory<CUDA_T *>> a_ptr_array,
+ stream->AllocateTemporaryArray<CUDA_T *>(batch_count));
+ SE_ASSIGN_OR_RETURN(
+ std::unique_ptr<TemporaryDeviceMemory<CUDA_T *>> b_ptr_array,
+ stream->AllocateTemporaryArray<CUDA_T *>(batch_count));
+ SE_ASSIGN_OR_RETURN(
+ std::unique_ptr<TemporaryDeviceMemory<CUDA_T *>> c_ptr_array,
+ stream->AllocateTemporaryArray<CUDA_T *>(batch_count));
+
+ if (!stream->ThenMemcpy(a_ptr_array->mutable_device_memory(),
+ a_ptr_vec.data(), batch_count * sizeof(T *))
+ .ok() ||
+ !stream->ThenMemcpy(b_ptr_array->mutable_device_memory(),
+ b_ptr_vec.data(), batch_count * sizeof(T *))
+ .ok() ||
+ !stream->ThenMemcpy(c_ptr_array->mutable_device_memory(),
+ c_ptr_vec.data(), batch_count * sizeof(T *))
+ .ok()) {
+ return port::Status(port::error::INTERNAL,
+ "failed to copy memory from host to device in "
+ "CUDABlas::DoBlasGemmBatched");
+ }
+
+ bool ok = DoBlasInternal(
+ cublas_func, stream, true /* = pointer_mode_host */,
+ CUDABlasTranspose(transa), CUDABlasTranspose(transb), m, n, k,
+ CUDAComplex(&alpha),
+ const_cast<const CUDA_T **>(CUDAMemory(a_ptr_array->device_memory())),
+ lda,
+ const_cast<const CUDA_T **>(CUDAMemory(b_ptr_array->device_memory())),
+ ldb, CUDAComplex(&beta),
+ const_cast<CUDA_T **>(CUDAMemory(c_ptr_array->device_memory())), ldc,
+ batch_count);
+
+ if (ok) {
+ return port::Status::OK();
+ }
+ return port::Status(port::error::INTERNAL,
+ "failed BLAS call, see log for details");
+}
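+
+// Note on the staging above: the cublas<T>gemmBatched entry points expect the
+// A/B/C pointer arrays themselves to reside in device memory, which is why the
+// host-side vectors of raw pointers are copied into temporary device
+// allocations with ThenMemcpy before the batched call is issued. A rough usage
+// sketch (the buffer names below are illustrative, not part of this file):
+//   std::vector<DeviceMemory<float> *> a{&a0, &a1}, b{&b0, &b1}, c{&c0, &c1};
+//   blas->DoBlasGemmBatched(stream, blas::Transpose::kNoTranspose,
+//                           blas::Transpose::kNoTranspose, m, n, k, 1.0f,
+//                           a, lda, b, ldb, 0.0f, c, ldc, /*batch_count=*/2);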
+
+bool CUDABlas::DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, float alpha,
+ const port::ArraySlice<DeviceMemory<float> *> &a_array, int lda,
+ const port::ArraySlice<DeviceMemory<float> *> &b_array, int ldb, float beta,
+ const port::ArraySlice<DeviceMemory<float> *> &c_array, int ldc,
+ int batch_count) {
+ SE_RETURN_STATUS_AS_BOOL(DoBlasGemmBatchedInternal(
+ dynload::cublasSgemmBatched, stream, transa, transb, m, n, k, alpha,
+ a_array, lda, b_array, ldb, beta, c_array, ldc, batch_count));
+}
+
+bool CUDABlas::DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, double alpha,
+ const port::ArraySlice<DeviceMemory<double> *> &a_array, int lda,
+ const port::ArraySlice<DeviceMemory<double> *> &b_array, int ldb,
+ double beta, const port::ArraySlice<DeviceMemory<double> *> &c_array,
+ int ldc, int batch_count) {
+ SE_RETURN_STATUS_AS_BOOL(DoBlasGemmBatchedInternal(
+ dynload::cublasDgemmBatched, stream, transa, transb, m, n, k, alpha,
+ a_array, lda, b_array, ldb, beta, c_array, ldc, batch_count));
+}
+
+bool CUDABlas::DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &a_array,
+ int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &b_array,
+ int ldb, std::complex<float> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &c_array,
+ int ldc, int batch_count) {
+ SE_RETURN_STATUS_AS_BOOL(DoBlasGemmBatchedInternal(
+ dynload::cublasCgemmBatched, stream, transa, transb, m, n, k, alpha,
+ a_array, lda, b_array, ldb, beta, c_array, ldc, batch_count));
+}
+
+bool CUDABlas::DoBlasGemmBatched(
+ Stream *stream, blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &a_array,
+ int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &b_array,
+ int ldb, std::complex<double> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &c_array,
+ int ldc, int batch_count) {
+ SE_RETURN_STATUS_AS_BOOL(DoBlasGemmBatchedInternal(
+ dynload::cublasZgemmBatched, stream, transa, transb, m, n, k, alpha,
+ a_array, lda, b_array, ldb, beta, c_array, ldc, batch_count));
+}
+
+bool CUDABlas::DoBlasHemm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasChemm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(b)), ldb,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasHemm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasZhemm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(b)), ldb,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasHerk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc) {
+ return DoBlasInternal(dynload::cublasCherk, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ &beta, CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasHerk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc) {
+ return DoBlasInternal(dynload::cublasZherk, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ &beta, CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasHer2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc) {
+ return DoBlasInternal(dynload::cublasCher2k, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, &beta,
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasHer2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc) {
+ return DoBlasInternal(dynload::cublasZher2k, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, &beta,
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasSsymm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, &alpha, CUDAMemory(a),
+ lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasDsymm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, &alpha, CUDAMemory(a),
+ lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasCsymm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(b)), ldb,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSymm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasZsymm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemory(b)), ldb,
+ CUDAComplex(&beta), CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ float beta, DeviceMemory<float> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasSsyrk, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k, &alpha,
+ CUDAMemory(a), lda, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ double beta, DeviceMemory<double> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasDsyrk, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k, &alpha,
+ CUDAMemory(a), lda, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasCsyrk, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSyrk(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasZsyrk, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k,
+ CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasSsyr2k, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k, &alpha,
+ CUDAMemory(a), lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc) {
+ return DoBlasInternal(
+ dynload::cublasDsyr2k, stream, true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n, k, &alpha,
+ CUDAMemory(a), lda, CUDAMemory(b), ldb, &beta, CUDAMemoryMutable(c), ldc);
+}
+
+bool CUDABlas::DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ return DoBlasInternal(dynload::cublasCsyr2k, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasSyr2k(Stream *stream, blas::UpperLower uplo,
+ blas::Transpose trans, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ return DoBlasInternal(dynload::cublasZsyr2k, stream,
+ true /* = pointer_mode_host */,
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(trans), n,
+ k, CUDAComplex(&alpha), CUDAComplex(CUDAMemory(a)), lda,
+ CUDAComplex(CUDAMemory(b)), ldb, CUDAComplex(&beta),
+ CUDAComplex(CUDAMemoryMutable(c)), ldc);
+}
+
+bool CUDABlas::DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasStrmm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, &alpha, CUDAMemory(a), lda,
+ CUDAMemoryMutable(b), ldb, CUDAMemoryMutable(b), ldb);
+}
+
+bool CUDABlas::DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasDtrmm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, &alpha, CUDAMemory(a), lda,
+ CUDAMemoryMutable(b), ldb, CUDAMemoryMutable(b), ldb);
+}
+
+bool CUDABlas::DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasCtrmm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemoryMutable(b)), ldb,
+ CUDAComplex(CUDAMemoryMutable(b)), ldb);
+}
+
+bool CUDABlas::DoBlasTrmm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasZtrmm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemoryMutable(b)), ldb,
+ CUDAComplex(CUDAMemoryMutable(b)), ldb);
+}
+
+bool CUDABlas::DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) {
+ return DoBlasInternal(dynload::cublasStrsm, stream,
+ true /* = pointer_mode_host */, CUDABlasSide(side),
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, &alpha, CUDAMemory(a),
+ lda, CUDAMemoryMutable(b), ldb);
+}
+
+bool CUDABlas::DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) {
+ return DoBlasInternal(dynload::cublasDtrsm, stream,
+ true /* = pointer_mode_host */, CUDABlasSide(side),
+ CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, &alpha, CUDAMemory(a),
+ lda, CUDAMemoryMutable(b), ldb);
+}
+
+bool CUDABlas::DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasCtrsm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemoryMutable(b)), ldb);
+}
+
+bool CUDABlas::DoBlasTrsm(Stream *stream, blas::Side side,
+ blas::UpperLower uplo, blas::Transpose transa,
+ blas::Diagonal diag, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb) {
+ return DoBlasInternal(
+ dynload::cublasZtrsm, stream, true /* = pointer_mode_host */,
+ CUDABlasSide(side), CUDABlasUpperLower(uplo), CUDABlasTranspose(transa),
+ CUDABlasDiagonal(diag), m, n, CUDAComplex(&alpha),
+ CUDAComplex(CUDAMemory(a)), lda, CUDAComplex(CUDAMemoryMutable(b)), ldb);
+}
+
+} // namespace cuda
+
+namespace gpu = ::perftools::gputools;
+
+void initialize_cublas() {
+ gpu::port::Status status =
+ gpu::PluginRegistry::Instance()
+ ->RegisterFactory<gpu::PluginRegistry::BlasFactory>(
+ gpu::cuda::kCudaPlatformId, gpu::cuda::kCuBlasPlugin, "cuBLAS",
+ [](gpu::internal::StreamExecutorInterface
+ *parent) -> gpu::blas::BlasSupport * {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ dynamic_cast<gpu::cuda::CUDAExecutor *>(parent);
+ if (cuda_executor == nullptr) {
+ LOG(ERROR)
+ << "Attempting to initialize an instance of the cuBLAS "
+ << "support library with a non-CUDA StreamExecutor";
+ return nullptr;
+ }
+
+ gpu::cuda::CUDABlas *blas =
+ new gpu::cuda::CUDABlas(cuda_executor);
+ if (!blas->Init()) {
+ // Note: Init() will log a more specific error.
+ delete blas;
+ return nullptr;
+ }
+ return blas;
+ });
+
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to register cuBLAS factory: "
+ << status.error_message();
+ }
+
+ // Prime the cuBLAS DSO. The loader will log more information.
+ auto statusor = gpu::internal::CachedDsoLoader::GetCublasDsoHandle();
+ if (!statusor.ok()) {
+ LOG(INFO) << "Unable to load cuBLAS DSO.";
+ }
+
+ gpu::PluginRegistry::Instance()->SetDefaultFactory(gpu::cuda::kCudaPlatformId,
+ gpu::PluginKind::kBlas,
+ gpu::cuda::kCuBlasPlugin);
+}
+
+} // namespace gputools
+} // namespace perftools
+
+REGISTER_MODULE_INITIALIZER(register_cublas,
+ { perftools::gputools::initialize_cublas(); });
diff --git a/tensorflow/stream_executor/cuda/cuda_blas.h b/tensorflow/stream_executor/cuda/cuda_blas.h
new file mode 100644
index 0000000000..1dfec2ebc5
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_blas.h
@@ -0,0 +1,100 @@
+// CUDA-specific support for BLAS functionality -- this wraps the cuBLAS library
+// capabilities, and is only included into CUDA implementation code -- it will
+// not introduce cuda headers into other code.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_BLAS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_BLAS_H_
+
+#include "tensorflow/stream_executor/blas.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+
+typedef struct cublasContext *cublasHandle_t;
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+
+namespace cuda {
+
+// Opaque and unique identifier for the cuBLAS plugin.
+extern const PluginId kCuBlasPlugin;
+
+class CUDAExecutor;
+
+// BLAS plugin for CUDA platform via cuBLAS library.
+//
+// This satisfies the platform-agnostic BlasSupport interface.
+//
+// Note that the cuBLAS handle that this encapsulates is implicitly tied to the
+// context (and, as a result, the device) that the parent CUDAExecutor is tied
+// to. This simply happens as an artifact of creating the cuBLAS handle when a
+// CUDA context is active.
+//
+// Thread-safe post-initialization.
+class CUDABlas : public blas::BlasSupport {
+ public:
+ explicit CUDABlas(CUDAExecutor *parent);
+
+ // Allocates a cuBLAS handle.
+ bool Init();
+
+ // Releases the cuBLAS handle, if present.
+ ~CUDABlas() override;
+
+ TENSORFLOW_STREAM_EXECUTOR_GPU_BLAS_SUPPORT_OVERRIDES
+
+ private:
+ // Tells cuBLAS to enqueue the BLAS operation onto a particular Stream.
+ //
+  // cuBLAS is stateful, and can only be associated with one stream (in order
+  // to enqueue dispatch) at a given time. As a result, this generally must be
+  // invoked before calling into cuBLAS.
+ bool SetStream(Stream *stream) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // A helper function that calls the real cuBLAS function together with error
+ // handling.
+ //
+ // cublas_func: cuBLAS function pointer.
+ // cublas_name: cuBLAS function name.
+ // stream: Stream to enqueue the BLAS operation onto.
+  //   pointer_mode_host: Indicates whether the pointer to a scalar value
+  //     resides in host memory (true) or device memory (false).
+ // args: Arguments of cuBLAS function.
+ template <typename FuncT, typename... Args>
+ bool DoBlasInternal(FuncT cublas_func, Stream *stream, bool pointer_mode_host,
+ Args... args);
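+  // (A representative call, as used throughout the .cc file:
+  //    DoBlasInternal(dynload::cublasSgemm, stream,
+  //                   true /* = pointer_mode_host */, <translated args>...);
+  //  in rough terms the helper binds the handle to the stream via SetStream,
+  //  applies the requested pointer mode, invokes cublas_func, and maps the
+  //  resulting cublasStatus_t to a bool.)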
+
+ // A helper function to implement DoBlasGemmBatched interfaces for generic
+ // types.
+ template <typename T, typename FuncT>
+ port::Status DoBlasGemmBatchedInternal(
+ FuncT cublas_func, Stream *stream, blas::Transpose transa,
+ blas::Transpose transb, uint64 m, uint64 n, uint64 k, T alpha,
+ const port::ArraySlice<DeviceMemory<T> *> &a_array, int lda,
+ const port::ArraySlice<DeviceMemory<T> *> &b_array, int ldb, T beta,
+ const port::ArraySlice<DeviceMemory<T> *> &c_array, int ldc,
+ int batch_count);
+
+ // mutex that guards the cuBLAS handle for this device.
+ mutex mu_;
+
+ // CUDAExecutor which instantiated this CUDABlas.
+ // Immutable post-initialization.
+ CUDAExecutor *parent_;
+
+ // cuBLAS library handle on the device.
+ cublasHandle_t blas_ GUARDED_BY(mu_);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CUDABlas);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_BLAS_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_diagnostics.cc b/tensorflow/stream_executor/cuda/cuda_diagnostics.cc
new file mode 100644
index 0000000000..c01c9978a1
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_diagnostics.cc
@@ -0,0 +1,260 @@
+#include "tensorflow/stream_executor/cuda/cuda_diagnostics.h"
+
+#include <dirent.h>
+#include <limits.h>
+#include <link.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/sysmacros.h>
+#include <unistd.h>
+#include <algorithm>
+#include <memory>
+#include <vector>
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/inlined_vector.h"
+#include "tensorflow/stream_executor/lib/numbers.h"
+#include "tensorflow/stream_executor/lib/process_state.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/str_util.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+static const char *kDriverVersionPath = "/proc/driver/nvidia/version";
+
+string DriverVersionToString(DriverVersion version) {
+ return port::Printf("%d.%d", std::get<0>(version), std::get<1>(version));
+}
+
+string DriverVersionStatusToString(port::StatusOr<DriverVersion> version) {
+ if (!version.ok()) {
+ return version.status().ToString();
+ }
+
+ return DriverVersionToString(version.ValueOrDie());
+}
+
+port::StatusOr<DriverVersion> StringToDriverVersion(const string &value) {
+ std::vector<string> pieces = port::Split(value, '.');
+ if (pieces.size() != 2) {
+ return port::Status{
+ port::error::INVALID_ARGUMENT,
+ port::Printf("expected %%d.%%d form for driver version; got \"%s\"",
+ value.c_str())};
+ }
+
+ int major;
+ int minor;
+ if (!port::safe_strto32(pieces[0], &major)) {
+ return port::Status{
+ port::error::INVALID_ARGUMENT,
+ port::Printf("could not parse major version number \"%s\" as an "
+ "integer from string \"%s\"",
+ pieces[0].c_str(), value.c_str())};
+ }
+ if (!port::safe_strto32(pieces[1], &minor)) {
+ return port::Status{
+ port::error::INVALID_ARGUMENT,
+ port::Printf("could not parse minor version number \"%s\" as an "
+ "integer from string \"%s\"",
+ pieces[1].c_str(), value.c_str())};
+ }
+
+ DriverVersion result{major, minor};
+ VLOG(2) << "version string \"" << value << "\" made value "
+ << DriverVersionToString(result);
+ return result;
+}
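+
+// For example, StringToDriverVersion("346.46") yields DriverVersion{346, 46},
+// while inputs such as "346" or "a.b" produce an INVALID_ARGUMENT status
+// (the values here are illustrative).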
+
+// -- class Diagnostician
+
+string Diagnostician::GetDevNodePath(int dev_node_ordinal) {
+ return port::StrCat("/dev/nvidia", dev_node_ordinal);
+}
+
+void Diagnostician::LogDiagnosticInformation() {
+ if (access(kDriverVersionPath, F_OK) != 0) {
+ LOG(INFO) << "kernel driver does not appear to be running on this host "
+ << "(" << port::Hostname() << "): "
+ << "/proc/driver/nvidia/version does not exist";
+ return;
+ }
+ auto dev0_path = GetDevNodePath(0);
+ if (access(dev0_path.c_str(), F_OK) != 0) {
+ LOG(INFO) << "no NVIDIA GPU device is present: " << dev0_path
+ << " does not exist";
+ return;
+ }
+
+ LOG(INFO) << "retrieving CUDA diagnostic information for host: "
+ << port::Hostname();
+
+
+ LogDriverVersionInformation();
+}
+
+/* static */ void Diagnostician::LogDriverVersionInformation() {
+ LOG(INFO) << "hostname: " << port::Hostname();
+
+ if (VLOG_IS_ON(1)) {
+ const char *value = getenv("LD_LIBRARY_PATH");
+ string library_path = value == nullptr ? "" : value;
+ VLOG(1) << "LD_LIBRARY_PATH is: \"" << library_path << "\"";
+
+ std::vector<string> pieces = port::Split(library_path, ':');
+ for (auto piece : pieces) {
+ if (piece.empty()) {
+ continue;
+ }
+ DIR *dir = opendir(piece.c_str());
+ if (dir == nullptr) {
+ VLOG(1) << "could not open \"" << piece << "\"";
+ continue;
+ }
+ while (dirent *entity = readdir(dir)) {
+ VLOG(1) << piece << " :: " << entity->d_name;
+ }
+ closedir(dir);
+ }
+ }
+
+ port::StatusOr<DriverVersion> dso_version = FindDsoVersion();
+ LOG(INFO) << "libcuda reported version is: "
+ << DriverVersionStatusToString(dso_version);
+
+ port::StatusOr<DriverVersion> kernel_version = FindKernelDriverVersion();
+ LOG(INFO) << "kernel reported version is: "
+ << DriverVersionStatusToString(kernel_version);
+ if (kernel_version.ok() && dso_version.ok()) {
+ WarnOnDsoKernelMismatch(dso_version, kernel_version);
+ }
+}
+
+// Iterates through loaded DSOs with DlIteratePhdrCallback to find the
+// driver-interfacing DSO version number. Returns it as a DriverVersion, if
+// found.
+port::StatusOr<DriverVersion> Diagnostician::FindDsoVersion() {
+ port::StatusOr<DriverVersion> result{port::Status{
+ port::error::NOT_FOUND,
+ "was unable to find libcuda.so DSO loaded into this program"}};
+
+ // Callback used when iterating through DSOs. Looks for the driver-interfacing
+ // DSO and yields its version number into the callback data, when found.
+ auto iterate_phdr =
+ [](struct dl_phdr_info *info, size_t size, void *data) -> int {
+ if (strstr(info->dlpi_name, "libcuda.so")) {
+ VLOG(1) << "found DLL info with name: " << info->dlpi_name;
+ char resolved_path[PATH_MAX] = {0};
+ if (realpath(info->dlpi_name, resolved_path) == nullptr) {
+ return 0;
+ }
+ VLOG(1) << "found DLL info with resolved path: " << resolved_path;
+ const char *slash = rindex(resolved_path, '/');
+ if (slash == nullptr) {
+ return 0;
+ }
+ const char *so_suffix = ".so.";
+ const char *dot = strstr(slash, so_suffix);
+ if (dot == nullptr) {
+ return 0;
+ }
+ string dso_version = dot + strlen(so_suffix);
+ // TODO(b/22689637): Eliminate the explicit namespace if possible.
+ auto stripped_dso_version = port::StripSuffixString(dso_version, ".ld64");
+ auto result = static_cast<port::StatusOr<DriverVersion> *>(data);
+ *result = StringToDriverVersion(stripped_dso_version);
+ return 1;
+ }
+ return 0;
+ };
+
+ dl_iterate_phdr(iterate_phdr, &result);
+
+ return result;
+}
+
+port::StatusOr<DriverVersion> Diagnostician::FindKernelModuleVersion(
+ const string &driver_version_file_contents) {
+ static const char *kDriverFilePrelude = "Kernel Module ";
+ size_t offset = driver_version_file_contents.find(kDriverFilePrelude);
+ if (offset == string::npos) {
+ return port::Status{
+ port::error::NOT_FOUND,
+ port::StrCat("could not find kernel module information in "
+ "driver version file contents: \"",
+ driver_version_file_contents, "\"")};
+ }
+
+ string version_and_rest = driver_version_file_contents.substr(
+ offset + strlen(kDriverFilePrelude), string::npos);
+ size_t space_index = version_and_rest.find(" ");
+ auto kernel_version = version_and_rest.substr(0, space_index);
+ // TODO(b/22689637): Eliminate the explicit namespace if possible.
+ auto stripped_kernel_version =
+ port::StripSuffixString(kernel_version, ".ld64");
+ return StringToDriverVersion(stripped_kernel_version);
+}
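+
+// For example (illustrative contents): given a version file containing
+// "... Kernel Module 346.46 ...", the text following "Kernel Module " up to
+// the next space ("346.46") is parsed into DriverVersion{346, 46}.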
+
+void Diagnostician::WarnOnDsoKernelMismatch(
+ port::StatusOr<DriverVersion> dso_version,
+ port::StatusOr<DriverVersion> kernel_version) {
+ if (kernel_version.ok() && dso_version.ok() &&
+ dso_version.ValueOrDie() == kernel_version.ValueOrDie()) {
+ LOG(INFO) << "kernel version seems to match DSO: "
+ << DriverVersionToString(kernel_version.ValueOrDie());
+ } else {
+ LOG(ERROR) << "kernel version "
+ << DriverVersionStatusToString(kernel_version)
+ << " does not match DSO version "
+ << DriverVersionStatusToString(dso_version)
+ << " -- cannot find working devices in this configuration";
+ }
+}
+
+
+port::StatusOr<DriverVersion> Diagnostician::FindKernelDriverVersion() {
+ FILE *driver_version_file = fopen(kDriverVersionPath, "r");
+ if (driver_version_file == nullptr) {
+ return port::Status{
+ port::error::PERMISSION_DENIED,
+ port::StrCat("could not open driver version path for reading: ",
+ kDriverVersionPath)};
+ }
+
+ static const int kContentsSize = 1024;
+ port::InlinedVector<char, 4> contents(kContentsSize);
+ size_t retcode =
+ fread(contents.begin(), 1, kContentsSize - 2, driver_version_file);
+ if (retcode < kContentsSize - 1) {
+ contents[retcode] = '\0';
+ }
+ contents[kContentsSize - 1] = '\0';
+
+ if (retcode != 0) {
+ LOG(INFO) << "driver version file contents: \"\"\"" << contents.begin()
+ << "\"\"\"";
+ fclose(driver_version_file);
+ return FindKernelModuleVersion(string{contents.begin()});
+ }
+
+ auto status =
+ port::Status{port::error::INTERNAL,
+ port::StrCat("failed to read driver version file contents: ",
+ kDriverVersionPath, "; ferror: ",
+ ferror(driver_version_file))};
+ fclose(driver_version_file);
+ return status;
+}
+
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_diagnostics.h b/tensorflow/stream_executor/cuda/cuda_diagnostics.h
new file mode 100644
index 0000000000..005b3dc310
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_diagnostics.h
@@ -0,0 +1,85 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DIAGNOSTICS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DIAGNOSTICS_H_
+
+#include <tuple>
+
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// e.g. DriverVersion{331, 79}
+using DriverVersion = std::tuple<int, int>;
+
+// Converts a parsed driver version to string form.
+string DriverVersionToString(DriverVersion version);
+
+// Converts a parsed driver version or status value to natural string form.
+string DriverVersionStatusToString(port::StatusOr<DriverVersion> version);
+
+// Converts a string of a form like "331.79" to a DriverVersion{331, 79}.
+port::StatusOr<DriverVersion> StringToDriverVersion(const string &value);
+
+class Diagnostician {
+ public:
+ // Logs diagnostic information when CUDA appears to be misconfigured (e.g. is
+ // not initializing).
+ //
+  // Note: if we're running on a machine that has no GPUs, we don't want to
+  // produce very much log spew beyond saying, "looks like there's no CUDA
+  // kernel module running".
+ //
+ // Note: we use non-Google-File:: API here because we may be called before
+ // InitGoogle has completed.
+ static void LogDiagnosticInformation();
+
+  // Given the driver version file contents, finds the kernel module version
+  // and returns it as a DriverVersion.
+ //
+ // This is solely used for more informative log messages when the user is
+ // running on a machine that happens to have a libcuda/kernel driver mismatch.
+ static port::StatusOr<DriverVersion> FindKernelModuleVersion(
+ const string &driver_version_file_contents);
+
+ // Extracts the kernel driver version from the current host.
+ static port::StatusOr<DriverVersion> FindKernelDriverVersion();
+
+  // Iterates through loaded DSOs with DlIteratePhdrCallback to find the
+  // driver-interfacing DSO version number. Returns it as a DriverVersion, if
+  // found.
+ static port::StatusOr<DriverVersion> FindDsoVersion();
+
+ // Logs information about the kernel driver version and userspace driver
+ // library version.
+ static void LogDriverVersionInformation();
+
+ private:
+ // Logs information about the loaded nvidia-related kernel modules.
+ static void LogKernelModuleInformation();
+
+  // Given the DSO version and the kernel driver version, compares them and
+  // warns the user in the case of incompatibility.
+ //
+ // This is solely used for more informative log messages when the user is
+ // running on a machine that happens to have a libcuda/kernel driver mismatch.
+ static void WarnOnDsoKernelMismatch(
+ port::StatusOr<DriverVersion> dso_version,
+ port::StatusOr<DriverVersion> kernel_version);
+
+ // Logs information about the dev nodes present on this machine: their
+ // existence, permissions, accessibility from this uid/gid.
+ static void LogDevNodeDiagnosticInformation();
+
+ static string GetDevNodePath(int dev_node_ordinal);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(Diagnostician);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DIAGNOSTICS_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_dnn.cc b/tensorflow/stream_executor/cuda/cuda_dnn.cc
new file mode 100644
index 0000000000..6e4403512b
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_dnn.cc
@@ -0,0 +1,1074 @@
+#include "tensorflow/stream_executor/cuda/cuda_dnn.h"
+
+#include <dlfcn.h>
+#include <functional>
+
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/env.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/threadpool.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+#include "tensorflow/stream_executor/cuda/cuda_diagnostics.h"
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+#include "third_party/gpus/cuda/include/cudnn.h"
+
+namespace {
+
+// Converts (via narrowing) a WideT value to a NarrowT value, and checks that
+// the value does not change as a result of the conversion.
+template <typename WideT, typename NarrowT>
+NarrowT CheckedNarrowing(const WideT& wide) {
+ NarrowT narrow = wide;
+ CHECK_EQ(narrow, wide)
+ << "checked narrowing failed; values not equal post-conversion";
+ return narrow;
+}
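+
+// For example, CheckedNarrowing<int64, int>(64) returns 64, whereas
+// CheckedNarrowing<int64, int>(1LL << 40) would fail the CHECK because the
+// value does not survive the narrowing (illustrative values).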
+
+} // namespace
+
+namespace perftools {
+namespace gputools {
+
+using dnn::BatchDescriptor;
+using dnn::FilterDescriptor;
+using dnn::ConvolutionDescriptor;
+using dnn::PoolingDescriptor;
+
+namespace cuda {
+
+PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(kCuDnnPlugin);
+
+extern CUstream AsCUDAStreamValue(Stream* stream);
+
+string ToString(cudnnStatus_t status) {
+ switch (status) {
+ case CUDNN_STATUS_SUCCESS:
+ return "CUDNN_STATUS_SUCCESS";
+ case CUDNN_STATUS_NOT_INITIALIZED:
+ return "CUDNN_STATUS_NOT_INITIALIZED";
+ case CUDNN_STATUS_ALLOC_FAILED:
+ return "CUDNN_STATUS_ALLOC_FAILED";
+ case CUDNN_STATUS_BAD_PARAM:
+ return "CUDNN_STATUS_BAD_PARAM";
+ case CUDNN_STATUS_INTERNAL_ERROR:
+ return "CUDNN_STATUS_INTERNAL_ERROR";
+ case CUDNN_STATUS_INVALID_VALUE:
+ return "CUDNN_STATUS_INVALID_VALUE";
+ case CUDNN_STATUS_ARCH_MISMATCH:
+ return "CUDNN_STATUS_ARCH_MISMATCH";
+ case CUDNN_STATUS_MAPPING_ERROR:
+ return "CUDNN_STATUS_MAPPING_ERROR";
+ case CUDNN_STATUS_EXECUTION_FAILED:
+ return "CUDNN_STATUS_EXECUTION_FAILED";
+ case CUDNN_STATUS_NOT_SUPPORTED:
+ return "CUDNN_STATUS_NOT_SUPPORTED";
+ case CUDNN_STATUS_LICENSE_ERROR:
+ return "CUDNN_STATUS_LICENSE_ERROR";
+ default:
+ return port::StrCat("<unknown cudnn status: ", static_cast<int>(status),
+ ">");
+ }
+}
+
+namespace dynload {
+
+static port::ThreadPool* InitCudnnThreadpool() {
+ port::ThreadPool* cudnn_threadpool_;
+ port::ThreadOptions options;
+ // TBD(keveman): Conservatively setting the stack size and guard size to 2MB,
+ // until we can get some guarantees from NVIDIA on the minimum stack space
+ // they will work with.
+ options.stack_size = 2 * 1024 * 1024;
+ options.guard_size = 2 * 1024 * 1024;
+ cudnn_threadpool_ = new port::ThreadPool(port::Env::Default(), options,
+ "cudnn_threadpool", 1);
+ CHECK(cudnn_threadpool_);
+ return cudnn_threadpool_;
+}
+
+static mutex cudnn_threadpool_mu(LINKER_INITIALIZED);
+static port::ThreadPool* GetCudaThreadpool() {
+ mutex_lock lock(cudnn_threadpool_mu);
+ static port::ThreadPool* cudnn_threadpool = InitCudnnThreadpool();
+ return cudnn_threadpool;
+}
+
+#define PERFTOOLS_GPUTOOLS_CUDNN_WRAP(__name) \
+ struct DynLoadShim__##__name { \
+ static const char* kName; \
+ typedef std::add_pointer<decltype(::__name)>::type FuncPointerT; \
+ static void* GetDsoHandle() { \
+ static auto result = internal::CachedDsoLoader::GetCudnnDsoHandle(); \
+ return result.ValueOrDie(); \
+ } \
+ static FuncPointerT DynLoad() { \
+ static void* f = dlsym(GetDsoHandle(), kName); \
+ if (f == nullptr) { \
+ LOG(FATAL) << "could not find " << kName \
+ << " in cudnn DSO; dlerror: " << dlerror(); \
+ } \
+ return reinterpret_cast<FuncPointerT>(f); \
+ } \
+ template <typename... Args> \
+ void CallWrapper(CUDAExecutor* parent, port::Notification* n, \
+ cudnnStatus_t* retval, const Args&... args) { \
+ cuda::ScopedActivateExecutorContext sac{parent}; \
+ *retval = DynLoad()(args...); \
+ n->Notify(); \
+ } \
+ template <typename... Args> \
+ cudnnStatus_t operator()(CUDAExecutor* parent, Args... args) { \
+ port::Notification n; \
+ cudnnStatus_t retval; \
+ auto call_func_closure = \
+ std::bind(&DynLoadShim__##__name::CallWrapper<Args...>, this, \
+ parent, &n, &retval, args...); \
+ GetCudaThreadpool()->Schedule(call_func_closure); \
+ n.WaitForNotification(); \
+ return retval; \
+ } \
+ } __name; \
+ const char* DynLoadShim__##__name::kName = #__name;
+
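+// In rough terms, each name wrapped below becomes a callable shim object in
+// this namespace: e.g. dynload::cudnnCreate(parent, &handle) resolves the
+// symbol from the cudnn DSO on first use, schedules the actual call on the
+// dedicated single-thread pool with the parent executor's CUDA context
+// activated, and blocks until the cudnnStatus_t result is available.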
+#define CUDNN_DNN_ROUTINE_EACH(__macro) \
+ __macro(cudnnSetTensor4dDescriptor) __macro( \
+ cudnnGetConvolutionNdForwardOutputDim) \
+ __macro(cudnnGetConvolutionForwardAlgorithm) __macro( \
+ cudnnCreateTensorDescriptor) __macro(cudnnDestroyTensorDescriptor) \
+ __macro(cudnnCreateFilterDescriptor) \
+ __macro(cudnnSetFilter4dDescriptor) \
+ __macro(cudnnSetPooling2dDescriptor) \
+ __macro(cudnnDestroyFilterDescriptor) \
+ __macro(cudnnCreateConvolutionDescriptor) \
+ __macro(cudnnCreatePoolingDescriptor) \
+ __macro(cudnnAddTensor) \
+ __macro(cudnnDestroyPoolingDescriptor)
+
+CUDNN_DNN_ROUTINE_EACH(PERFTOOLS_GPUTOOLS_CUDNN_WRAP)
+#undef CUDNN_DNN_ROUTINE_EACH
+
+// clang-format off
+#define CUDNN_DNN_ROUTINE_EACH(__macro) \
+ __macro(cudnnSetConvolution2dDescriptor) \
+ __macro(cudnnDestroyConvolutionDescriptor) \
+ __macro(cudnnCreate) \
+ __macro(cudnnDestroy) \
+ __macro(cudnnSetStream) \
+ __macro(cudnnActivationForward) \
+ __macro(cudnnConvolutionForward) \
+ __macro(cudnnConvolutionBackwardData) \
+ __macro(cudnnConvolutionBackwardFilter) \
+ __macro(cudnnGetConvolutionForwardWorkspaceSize) \
+ __macro(cudnnTransformTensor) \
+ __macro(cudnnPoolingForward) \
+ __macro(cudnnPoolingBackward)
+// clang-format on
+
+CUDNN_DNN_ROUTINE_EACH(PERFTOOLS_GPUTOOLS_CUDNN_WRAP)
+#undef CUDNN_DNN_ROUTINE_EACH
+
+} // namespace dynload
+
+namespace {
+
+cudnnHandle_t ToHandle(void* opaque_handle) {
+ return static_cast<cudnnHandle_t>(opaque_handle);
+}
+
+} // namespace
+
+CudnnSupport::CudnnSupport(CUDAExecutor* parent)
+ : parent_(parent), dnn_handle_(nullptr) {}
+
+CudnnSupport::~CudnnSupport() {
+ auto status = dynload::cudnnDestroy(parent_, ToHandle(dnn_handle_));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not destroy cudnn handle: " << ToString(status);
+ }
+}
+
+port::Status CudnnSupport::Init() {
+ auto status = dynload::cudnnCreate(
+ parent_, reinterpret_cast<cudnnHandle_t*>(&dnn_handle_));
+ if (status == CUDNN_STATUS_SUCCESS) {
+ return port::Status::OK();
+ }
+
+ LOG(ERROR) << "could not create cudnn handle: " << ToString(status);
+ if (status == CUDNN_STATUS_NOT_INITIALIZED) {
+ // This is the error code that the driver returns when we're not running a
+ // sufficient CUDA driver -- cudnn requires 6.5+ compatibility, which
+ // starts with the 340.XX driver series.
+ auto result = cuda::Diagnostician::FindKernelDriverVersion();
+ if (!result.ok()) {
+ LOG(ERROR) << "error retrieving driver version: "
+ << DriverVersionStatusToString(result);
+ } else {
+ const auto& version = result.ValueOrDie();
+ LOG(INFO) << "running driver version: " << DriverVersionToString(version);
+ if (std::get<0>(version) < 340) {
+ LOG(ERROR)
+ << "cudnn library is only supported on 340.XX+ driver versions";
+ }
+ }
+ }
+ return port::Status{port::error::INTERNAL,
+ port::StrCat("cudnn library could not create a handle: ",
+ ToString(status))};
+}
+
+// Turns a BatchDescriptor structure into a cudnn tensor handle within a scope.
+class ScopedTensorDescriptor {
+ public:
+ ScopedTensorDescriptor(CUDAExecutor* parent,
+ const BatchDescriptor& batch_descriptor,
+ cudnnDataType_t elem_type)
+ : parent_(parent), handle_(nullptr) {
+ cudnnStatus_t status =
+ dynload::cudnnCreateTensorDescriptor(parent_, &handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not create cudnn tensor descriptor: "
+ << ToString(status);
+ }
+
+ cudnnTensorFormat_t format;
+ switch (batch_descriptor.layout()) {
+ case dnn::DataLayout::kBatchYXDepth:
+ format = CUDNN_TENSOR_NHWC;
+ break;
+ case dnn::DataLayout::kBatchDepthYX:
+ format = CUDNN_TENSOR_NCHW;
+ break;
+ default:
+ LOG(FATAL) << "Unsupported tensor format "
+ << DataLayoutString(batch_descriptor.layout());
+ break;
+ }
+
+ status = dynload::cudnnSetTensor4dDescriptor(
+ parent_, handle_, format, elem_type,
+ CheckedNarrowing<int64, int>(batch_descriptor.count()),
+ CheckedNarrowing<int64, int>(batch_descriptor.feature_map_count()),
+ CheckedNarrowing<int64, int>(batch_descriptor.height()),
+ CheckedNarrowing<int64, int>(batch_descriptor.width()));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not set cudnn tensor descriptor: "
+ << ToString(status);
+ }
+ }
+
+ ~ScopedTensorDescriptor() {
+ cudnnStatus_t status =
+ dynload::cudnnDestroyTensorDescriptor(parent_, handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not destroy cudnn tensor descriptor: "
+ << ToString(status);
+ }
+ }
+
+ cudnnTensorDescriptor_t handle() const { return handle_; }
+
+ private:
+ CUDAExecutor* parent_; // Parent executor. Not owned.
+ cudnnTensorDescriptor_t handle_; // Owned.
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedTensorDescriptor);
+};
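+
+// Example (as in DoConvolve below): constructing the wrapper creates and
+// configures the cudnn descriptor, and leaving the scope destroys it, so a
+// descriptor cannot outlive the call that uses it:
+//
+// ScopedTensorDescriptor input_4d{parent_, batch_descriptor,
+// CUDNN_DATA_FLOAT};
+// ... pass input_4d.handle() to the cudnn call ...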
+
+// Turns a FilterDescriptor structure into a cudnn filter handle within a scope.
+class ScopedFilterDescriptor {
+ public:
+ ScopedFilterDescriptor(CUDAExecutor* parent,
+ const FilterDescriptor& filter_descriptor,
+ cudnnDataType_t elem_type)
+ : parent_(parent), handle_(nullptr) {
+ cudnnStatus_t status =
+ dynload::cudnnCreateFilterDescriptor(parent_, &handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not create cudnn filter descriptor: "
+ << ToString(status);
+ }
+
+ // TODO(b/23032134): Even if the filter layout is not supported,
+ // cudnnSetFilter4dDescriptor will return CUDNN_STATUS_SUCCESS because it
+ // does not take layout as an input. Maybe force cuDNN by giving wrong
+ // inputs intentionally?
+ switch (filter_descriptor.layout()) {
+ case dnn::FilterLayout::kOutputInputYX:
+ break;
+ default:
+ LOG(FATAL) << "Unsupported filter format "
+ << FilterLayoutString(filter_descriptor.layout());
+ break;
+ }
+
+ status = dynload::cudnnSetFilter4dDescriptor(
+ parent_, handle_, elem_type,
+ CheckedNarrowing<int64, int>(
+ filter_descriptor.output_feature_map_count()),
+ CheckedNarrowing<int64, int>(
+ filter_descriptor.input_feature_map_count()),
+ CheckedNarrowing<int64, int>(filter_descriptor.input_filter_height()),
+ CheckedNarrowing<int64, int>(filter_descriptor.input_filter_width()));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not set cudnn filter descriptor: "
+ << ToString(status);
+ }
+ }
+
+ ~ScopedFilterDescriptor() {
+ cudnnStatus_t status =
+ dynload::cudnnDestroyFilterDescriptor(parent_, handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not destroy cudnn filter descriptor: "
+ << ToString(status);
+ }
+ }
+
+ cudnnFilterDescriptor_t handle() const { return handle_; }
+
+ private:
+ // Parent executor object. Not owned.
+ CUDAExecutor* parent_;
+
+ // cudnn filter descriptor this object creates. Owned.
+ cudnnFilterDescriptor_t handle_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedFilterDescriptor);
+};
+
+// Turns a ConvolutionDescriptor structure into a cudnn convolution handle
+// within a scope.
+class ScopedConvolutionDescriptor {
+ public:
+ ScopedConvolutionDescriptor(
+ CUDAExecutor* parent, const ConvolutionDescriptor& convolution_descriptor)
+ : parent_(parent), handle_(nullptr) {
+ cudnnStatus_t status =
+ dynload::cudnnCreateConvolutionDescriptor(parent_, &handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not create cudnn convolution descriptor: "
+ << ToString(status);
+ }
+
+ status = dynload::cudnnSetConvolution2dDescriptor(
+ parent_, handle_, CheckedNarrowing<int64, int>(
+ convolution_descriptor.zero_padding_height()),
+ CheckedNarrowing<int64, int>(
+ convolution_descriptor.zero_padding_width()),
+ CheckedNarrowing<int64, int>(
+ convolution_descriptor.vertical_filter_stride()),
+ CheckedNarrowing<int64, int>(
+ convolution_descriptor.horizontal_filter_stride()),
+ // TODO(leary) not sure what the following two params do.
+ 1 /* = upscale_input_x */, 1 /* = upscale_input_y */,
+ // NOTE(keveman): cuDNN supports convolution and cross correlation.
+ // However, almost all the use cases do cross correlation, so just hard
+ // coding it here.
+ CUDNN_CROSS_CORRELATION);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not set cudnn convolution descriptor: "
+ << ToString(status);
+ }
+ }
+
+ ~ScopedConvolutionDescriptor() {
+ cudnnStatus_t status =
+ dynload::cudnnDestroyConvolutionDescriptor(parent_, handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not destroy cudnn convolution descriptor: "
+ << ToString(status);
+ }
+ }
+
+ cudnnConvolutionDescriptor_t handle() const { return handle_; }
+
+ private:
+ CUDAExecutor* parent_; // Parent executor. Not owned.
+ cudnnConvolutionDescriptor_t handle_; // Owned.
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedConvolutionDescriptor);
+};
+
+// Turns a PoolingDescriptor structure into a cudnn pooling descriptor handle
+// within a scope.
+class ScopedPoolingDescriptor {
+ public:
+ ScopedPoolingDescriptor(CUDAExecutor* parent,
+ const PoolingDescriptor& pooling_descriptor)
+ : parent_(parent), handle_(nullptr) {
+ cudnnStatus_t status =
+ dynload::cudnnCreatePoolingDescriptor(parent_, &handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not create cudnn pooling descriptor: "
+ << ToString(status);
+ }
+ status = dynload::cudnnSetPooling2dDescriptor(
+ parent_, handle_,
+ (pooling_descriptor.mode() == dnn::PoolingMode::kMaximum
+ ? CUDNN_POOLING_MAX
+ : CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING),
+ CheckedNarrowing<int64, int>(pooling_descriptor.window_height()),
+ CheckedNarrowing<int64, int>(pooling_descriptor.window_width()),
+ CheckedNarrowing<int64, int>(pooling_descriptor.vertical_padding()),
+ CheckedNarrowing<int64, int>(pooling_descriptor.horizontal_padding()),
+ CheckedNarrowing<int64, int>(pooling_descriptor.vertical_stride()),
+ CheckedNarrowing<int64, int>(pooling_descriptor.horizontal_stride()));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "could not set cudnn pooling descriptor: "
+ << ToString(status);
+ }
+ }
+ ~ScopedPoolingDescriptor() {
+ cudnnStatus_t status =
+ dynload::cudnnDestroyPoolingDescriptor(parent_, handle_);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not destroy cudnn pooling descriptor: "
+ << ToString(status);
+ }
+ }
+
+ cudnnPoolingDescriptor_t handle() const { return handle_; }
+
+ private:
+ CUDAExecutor* parent_; // Parent executor. Not owned.
+ cudnnPoolingDescriptor_t handle_; // Owned.
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedPoolingDescriptor);
+};
+
+bool CudnnSupport::DoConvolve(
+ Stream* stream, const BatchDescriptor& batch_descriptor,
+ const DeviceMemory<float>& input_data,
+ const FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const BatchDescriptor& output_descriptor,
+ DeviceMemory<float>* output_data) {
+ ScopedTensorDescriptor input_4d{parent_, batch_descriptor, CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor output_4d{parent_, output_descriptor,
+ CUDNN_DATA_FLOAT};
+ ScopedFilterDescriptor filter{parent_, filter_descriptor, CUDNN_DATA_FLOAT};
+ ScopedConvolutionDescriptor conv{parent_, convolution_descriptor};
+
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to set stream for cudnn handle: " << ToString(status);
+ }
+ // Alpha is the scaling factor for input.
+ float alpha = 1.0;
+ // Beta is the scaling factor for output.
+ float beta = 0.0;
+
+ // The NO_WORKSPACE versions are possibly slower for certain shapes, but
+ // not so for the shapes currently used by Brain. Also, it seems prudent to
+ // keep cuMemAlloc off the critical path.
+ cudnnConvolutionFwdAlgo_t algo;
+ status = dynload::cudnnGetConvolutionForwardAlgorithm(
+ parent_, ToHandle(dnn_handle_), input_4d.handle(), filter.handle(),
+ conv.handle(), output_4d.handle(), CUDNN_CONVOLUTION_FWD_NO_WORKSPACE, 0,
+ &algo);
+
+ CHECK_EQ(status, CUDNN_STATUS_SUCCESS)
+ << "Unable to find a suitable algorithm for doing forward convolution";
+
+ status = dynload::cudnnConvolutionForward(
+ parent_, ToHandle(dnn_handle_), &alpha, input_4d.handle(),
+ input_data.opaque(), filter.handle(), filter_data.opaque(), conv.handle(),
+ algo, nullptr /* workspace ptr */, 0 /* workspace size */, &beta,
+ output_4d.handle(), output_data->opaque());
+
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to enqueue convolution on stream: "
+ << ToString(status);
+ return false;
+ }
+
+ return true;
+}
+
+bool CudnnSupport::DoConvolve(
+ Stream* stream, const BatchDescriptor& batch_descriptor,
+ const DeviceMemory<double>& input_data,
+ const FilterDescriptor& filter_descriptor,
+ const DeviceMemory<double>& filter_data,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const BatchDescriptor& output_descriptor,
+ DeviceMemory<double>* output_data) {
+ LOG(ERROR) << "double-based DNN not yet implemented";
+ return false;
+}
+
+DeviceMemory<float> CudnnSupport::MaybeTransformLayout(
+ Stream* stream, BatchDescriptor* output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ std::unique_ptr<TemporaryDeviceMemory<float>>* transform_scratch) {
+ if (output_descriptor->layout() == dnn::DataLayout::kBatchDepthYX) {
+ return backward_output_data;
+ }
+ CHECK(output_descriptor->layout() == dnn::DataLayout::kBatchYXDepth);
+ *transform_scratch =
+ stream->AllocateTemporaryArray<float>(backward_output_data.ElementCount())
+ .ConsumeValueOrDie();
+ BatchDescriptor transformed_output_descriptor;
+ transformed_output_descriptor.CloneFrom(*output_descriptor);
+ transformed_output_descriptor.set_layout(dnn::DataLayout::kBatchDepthYX);
+ ScopedTensorDescriptor orig_out_back_4d{parent_, *output_descriptor,
+ CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor transformed_out_back_4d{
+ parent_, transformed_output_descriptor, CUDNN_DATA_FLOAT};
+
+ float alpha = 1.0f;
+ float beta = 0.0f;
+ auto status = dynload::cudnnTransformTensor(
+ parent_, ToHandle(dnn_handle_), &alpha, orig_out_back_4d.handle(),
+ backward_output_data.opaque(), &beta, transformed_out_back_4d.handle(),
+ (*transform_scratch)->mutable_device_memory()->opaque());
+
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "Failed to transform the data layout.";
+ }
+ output_descriptor->set_layout(dnn::DataLayout::kBatchDepthYX);
+ return (*transform_scratch)->device_memory();
+}
+
+bool CudnnSupport::DoConvolveBackwardData(
+ Stream* stream, const FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const BatchDescriptor& output_descriptor_in,
+ DeviceMemory<float> backward_output_data,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const BatchDescriptor& input_descriptor,
+ DeviceMemory<float>* backward_input_data) {
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to set stream for cudnn handle: " << ToString(status);
+ }
+
+ // Alpha is the scaling factor for input.
+ float alpha = 1.0;
+ // Beta is the scaling factor for output.
+ float beta = 0.0;
+
+ // TBD(keveman): remove once cuDNN supports kBatchYXDepth for backward pass.
+ BatchDescriptor output_descriptor;
+ output_descriptor.CloneFrom(output_descriptor_in);
+ std::unique_ptr<TemporaryDeviceMemory<float>> transform_scratch;
+ backward_output_data = MaybeTransformLayout(
+ stream, &output_descriptor, backward_output_data, &transform_scratch);
+
+ ScopedTensorDescriptor out_back_4d{parent_, output_descriptor,
+ CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor in_back_4d{parent_, input_descriptor,
+ CUDNN_DATA_FLOAT};
+ ScopedFilterDescriptor filter{parent_, filter_descriptor, CUDNN_DATA_FLOAT};
+ ScopedConvolutionDescriptor conv{parent_, convolution_descriptor};
+
+ status = dynload::cudnnConvolutionBackwardData(
+ parent_, ToHandle(dnn_handle_), &alpha, filter.handle(),
+ filter_data.opaque(), out_back_4d.handle(), backward_output_data.opaque(),
+ conv.handle(), &beta, in_back_4d.handle(), backward_input_data->opaque());
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to enqueue convolution on stream: "
+ << ToString(status);
+ return false;
+ }
+ return true;
+}
+
+bool CudnnSupport::DoConvolveBackwardFilter(
+ Stream* stream, const dnn::BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_descriptor_in,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::FilterDescriptor& filter_descriptor,
+ DeviceMemory<float>* backward_filter_data) {
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to set stream for cudnn handle: " << ToString(status);
+ }
+
+ // Alpha is the scaling factor for input.
+ float alpha = 1.0;
+ // Beta is the scaling factor for output.
+ float beta = 0.0;
+
+ // TBD(keveman): remove once cuDNN supports kBatchYXDepth for backward pass.
+ BatchDescriptor output_descriptor;
+ output_descriptor.CloneFrom(output_descriptor_in);
+ std::unique_ptr<TemporaryDeviceMemory<float>> transform_scratch;
+ backward_output_data = MaybeTransformLayout(
+ stream, &output_descriptor, backward_output_data, &transform_scratch);
+
+ ScopedTensorDescriptor out_back_4d{parent_, output_descriptor,
+ CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor input_4d{parent_, input_descriptor, CUDNN_DATA_FLOAT};
+ ScopedFilterDescriptor filter{parent_, filter_descriptor, CUDNN_DATA_FLOAT};
+ ScopedConvolutionDescriptor conv{parent_, convolution_descriptor};
+
+ status = dynload::cudnnConvolutionBackwardFilter(
+ parent_, ToHandle(dnn_handle_), &alpha, input_4d.handle(),
+ input_data.opaque(), out_back_4d.handle(), backward_output_data.opaque(),
+ conv.handle(), &beta, filter.handle(), backward_filter_data->opaque());
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(FATAL) << "failed to enqueue convolution on stream: "
+ << ToString(status);
+ return false;
+ }
+ return true;
+}
+
+bool CudnnSupport::DoMatMul(Stream* stream,
+ const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& weights,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) {
+ if (input_dimensions.count() != output_dimensions.count()) {
+ LOG(ERROR) << "MatMul input and output dimensions are not compatible.";
+ return false;
+ }
+
+ // We do not permute the input or output; instead we just
+ // reinterpret the layout. We are working with row-major matrices
+ // and the rows of the input and output correspond to batch, so
+ // batch has to be outermost in both the input and output.
+ //
+ // By adding transposes to the BLAS gemm call we could perhaps make
+ // the kYXDepthBatch layout work as well, but there has been no need
+ // for that so far.
+ if (input_dimensions.layout() != dnn::DataLayout::kBatchYXDepth &&
+ input_dimensions.layout() != dnn::DataLayout::kBatchDepthYX) {
+ LOG(ERROR) << "Unsupported MatMul input layout.";
+ return false;
+ }
+ if (output_dimensions.layout() != dnn::DataLayout::kBatchYXDepth &&
+ output_dimensions.layout() != dnn::DataLayout::kBatchDepthYX) {
+ LOG(ERROR) << "Unsupported MatMul output layout.";
+ return false;
+ }
+
+ if (output_dimensions.width() == 1 && output_dimensions.height() == 1) {
+ // This is a fast path that also supports the kBatchYXDepth layout.
+
+ // The matrices here are in row-major format while BLAS expects
+ // column-major, i.e. our matrices are transposed as far as BLAS
+ // is concerned. We want output = input*weights, so in terms of the
+ // transposed views that BLAS sees we have to compute output^T =
+ // weights^T*input^T. There is no parameter for transposing the
+ // output in BLAS gemm, but none is needed here: the buffers BLAS
+ // sees already are those transposed views, so we only need to swap
+ // the order of weights and input in the matrix product to correct
+ // for the row-major versus column-major difference.
+ const float alpha = 1.0f; // Take the matrix product without scaling it.
+ const float beta = 0.0f; // Ignore the original values in output_data.
+ const int64 m = output_dimensions.NodesAcrossFeatureMaps();
+ const int64 n = input_dimensions.count();
+ const int64 k = input_dimensions.NodesAcrossFeatureMaps();
+ stream->ThenBlasGemm(blas::Transpose::kNoTranspose,
+ blas::Transpose::kNoTranspose, m, n, k, alpha, weights,
+ m, input_data, k, beta, output_data, m);
+ } else {
+ // This is a slower and more complex path that supports output
+ // width() * height() > 1, though it only supports the
+ // kBatchYXDepth layout. Does support kBatchDepthYX if output
+ // feature_map_count() == 1, as then there is no difference
+ // between the two layouts.
+ //
+ // The operation here is the same as above, except that we have to
+ // do the matrix multiplication for each (y,x) output coordinate
+ // separately. We then interpret weights as containing K = width()
+ // * height() different matrices, which we all multiply onto the
+ // matrix from input_data, yielding K matrix products. We then
+ // combine these together into one matrix by concatenating all the
+ // first rows of these matrices, then all the second rows and so
+ // on. We can do this with a batched matrix multiplication, where
+ // the result is written to a different submatrix of the output
+ // for each matrix multiplication.
+ //
+ // The reason that we only support the kBatchYXDepth output layout
+ // is that we have to do something in the depth for each (y,x)
+ // coordinate. The kBatchYXDepth layout has the depth information
+ // for each point (y,x) in contiguous memory while the
+ // kBatchDepthYX layout does not.
+ //
+ // TODO(broune): Consider a special case for when output depth ==
+ // 1, as then possibly this could all be done as one matrix
+ // multiplication instead of a batched one, which should be
+ // faster. Another possibility would be to add a weights layout
+ // parameter and then support kBatchDepthYX for a different
+ // weights layout.
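+ //
+ // Concretely, in the code below: a[i] is the weight matrix for output
+ // position i, b[i] is the whole input, and c[i] aliases the output
+ // buffer starting at the depth block of position i. Because ldc is set
+ // to the full output row length (NodesAcrossFeatureMaps()), column b of
+ // the i-th gemm result lands in output row b at exactly that depth
+ // block, so the concatenation described above happens in place.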
+ if (output_dimensions.layout() != dnn::DataLayout::kBatchYXDepth &&
+ !(output_dimensions.layout() == dnn::DataLayout::kBatchDepthYX &&
+ output_dimensions.feature_map_count() == 1)) {
+ LOG(ERROR) << "Unsupported MatMul output layout.";
+ return false;
+ }
+
+ const float alpha = 1.0f; // Take the matrix product without scaling it.
+ const float beta = 0.0f; // Ignore the original values in output_data.
+ const uint64 m = output_dimensions.feature_map_count();
+ const uint64 n = input_dimensions.count();
+ const uint64 k = input_dimensions.NodesAcrossFeatureMaps();
+ const int lda = m;
+ const int ldb = k;
+ const int ldc = output_dimensions.NodesAcrossFeatureMaps();
+ const int batch_count = output_dimensions.NodesPerFeatureMap();
+
+ std::vector<DeviceMemory<float>> a(batch_count);
+ std::vector<DeviceMemory<float>> b(batch_count);
+ std::vector<DeviceMemory<float>> c(batch_count);
+ for (int i = 0; i < batch_count; ++i) {
+ const int weights_offset = i * input_dimensions.NodesAcrossFeatureMaps() *
+ output_dimensions.feature_map_count();
+ a[i] = DeviceMemory<float>::MakeFromByteSize(
+ const_cast<float*>(reinterpret_cast<const float*>(weights.opaque())) +
+ weights_offset,
+ weights.ElementCount() - weights_offset);
+
+ b[i] = input_data;
+
+ const int output_offset = i * output_dimensions.feature_map_count();
+ c[i] = DeviceMemory<float>::MakeFromByteSize(
+ const_cast<float*>(
+ reinterpret_cast<const float*>(output_data->opaque())) +
+ output_offset,
+ output_data->ElementCount() - output_offset);
+ }
+ const auto toPtrs = [](std::vector<DeviceMemory<float>>& v) {
+ std::vector<DeviceMemory<float>*> ptrs;
+ for (auto& mem : v) {
+ ptrs.push_back(&mem);
+ }
+ return ptrs;
+ };
+
+ stream->ThenBlasGemmBatched(blas::Transpose::kNoTranspose,
+ blas::Transpose::kNoTranspose, m, n, k, alpha,
+ toPtrs(a), lda, toPtrs(b), ldb, beta, toPtrs(c),
+ ldc, batch_count);
+ }
+
+ return stream->ok();
+}
+
+bool CudnnSupport::DoBiasAdd(Stream* stream,
+ const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& biases,
+ const dnn::BatchDescriptor& dimensions,
+ DeviceMemory<float>* output_data) {
+ ScopedTensorDescriptor input_descriptor{parent_, dimensions,
+ CUDNN_DATA_FLOAT};
+
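+ // Describe the biases as a tensor with count, height and width of 1 and
+ // one entry per feature map; together with the CUDNN_ADD_SAME_C mode used
+ // below, cudnnAddTensor broadcasts each bias value across the batch and
+ // spatial dimensions of the output.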
+ BatchDescriptor bias_dimensions;
+ bias_dimensions.set_count(1)
+ .set_feature_map_count(dimensions.feature_map_count())
+ .set_height(1)
+ .set_width(1)
+ .set_layout(dnn::DataLayout::kBatchYXDepth);
+ ScopedTensorDescriptor bias_descriptor{parent_, bias_dimensions,
+ CUDNN_DATA_FLOAT};
+
+ // cudnnAddTensor is in-place, so we need to copy input_data to
+ // output_data before doing the addition, unless the input and
+ // output are at the same address.
+ if (input_data.opaque() != output_data->opaque()) {
+ stream->ThenMemcpy(output_data, input_data,
+ dimensions.ElementCount() * sizeof(float));
+ if (!stream->ok()) {
+ LOG(ERROR)
+ << "stream " << stream
+ << " could not enqueue a tensor copy as part of bias addition.";
+ return false;
+ }
+ }
+
+ mutex_lock lock{dnn_handle_mutex_};
+
+ const float alpha = 1.0f;
+ const float beta = 1.0f;
+ auto status = dynload::cudnnAddTensor(
+ parent_, ToHandle(dnn_handle_), CUDNN_ADD_SAME_C, &alpha,
+ bias_descriptor.handle(), biases.opaque(), &beta,
+ input_descriptor.handle(), output_data->opaque());
+
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "stream " << stream << " could not enqueue bias addition.";
+ return false;
+ }
+
+ return true;
+}
+
+bool CudnnSupport::DoActivate(Stream* stream,
+ dnn::ActivationMode activation_mode,
+ const dnn::BatchDescriptor& dimensions,
+ const DeviceMemory<float>& input_data,
+ DeviceMemory<float>* output_data) {
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set stream for cudnn handle: " << ToString(status);
+ return false;
+ }
+ cudnnActivationMode_t mode;
+ switch (activation_mode) {
+ case dnn::ActivationMode::kRelu6:
+ // TODO(leary) should probably do a post-pass to clip at 6?
+ LOG(WARNING) << "user requested Relu6, but providing Relu instead";
+ mode = CUDNN_ACTIVATION_RELU;
+ break;
+ case dnn::ActivationMode::kReluX:
+ // TODO(broune) should probably do a post-pass to clip at X?
+ LOG(WARNING) << "user requested ReluX, but providing Relu instead";
+ mode = CUDNN_ACTIVATION_RELU;
+ break;
+ case dnn::ActivationMode::kRelu:
+ mode = CUDNN_ACTIVATION_RELU;
+ break;
+ case dnn::ActivationMode::kSigmoid:
+ mode = CUDNN_ACTIVATION_SIGMOID;
+ break;
+ case dnn::ActivationMode::kTanh:
+ mode = CUDNN_ACTIVATION_TANH;
+ break;
+ default:
+ LOG(ERROR) << "unrecognized activation mode: "
+ << static_cast<int>(activation_mode);
+ return false;
+ }
+
+ ScopedTensorDescriptor input_4d{parent_, dimensions, CUDNN_DATA_FLOAT};
+ // Alpha is the input scaling factor.
+ float alpha = 1.0;
+ // Beta is the output scaling factor.
+ float beta = 0.0;
+ status = dynload::cudnnActivationForward(
+ parent_, ToHandle(dnn_handle_), mode, &alpha, input_4d.handle(),
+ input_data.opaque(), &beta, input_4d.handle(), output_data->opaque());
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "stream " << stream
+ << " could not enqueue activation: " << ToString(status);
+ return false;
+ }
+
+ return true;
+}
+
+bool CudnnSupport::DoPoolForward(
+ Stream* stream, const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) {
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set stream for cudnn handle: " << ToString(status);
+ return false;
+ }
+
+ // Alpha is the scaling factor for input.
+ float alpha = 1.0;
+ // Beta is the scaling factor for output.
+ float beta = 0.0;
+
+ ScopedTensorDescriptor src_desc{parent_, input_dimensions, CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor dest_desc{parent_, output_dimensions,
+ CUDNN_DATA_FLOAT};
+ ScopedPoolingDescriptor pooling_desc{parent_, pooling_dimensions};
+ status = dynload::cudnnPoolingForward(
+ parent_, ToHandle(dnn_handle_), pooling_desc.handle(), &alpha,
+ src_desc.handle(), input_data.opaque(), &beta, dest_desc.handle(),
+ output_data->opaque());
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to enqueue forward pooling on stream: "
+ << ToString(status);
+ return false;
+ }
+ return true;
+}
+
+bool CudnnSupport::DoPoolBackward(
+ Stream* stream, const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ const DeviceMemory<float>& output_data,
+ const DeviceMemory<float>& input_diff_data,
+ DeviceMemory<float>* output_diff_data) {
+ mutex_lock lock{dnn_handle_mutex_};
+ auto status = dynload::cudnnSetStream(parent_, ToHandle(dnn_handle_),
+ AsCUDAStreamValue(stream));
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set stream for cudnn handle: " << ToString(status);
+ return false;
+ }
+
+ // Alpha is the scaling factor for input.
+ float alpha = 1.0;
+ // Beta is the scaling factor for output.
+ float beta = 0.0;
+
+ ScopedTensorDescriptor src_desc{parent_, input_dimensions, CUDNN_DATA_FLOAT};
+ ScopedTensorDescriptor dest_desc{parent_, output_dimensions,
+ CUDNN_DATA_FLOAT};
+ ScopedPoolingDescriptor pooling_desc{parent_, pooling_dimensions};
+ status = dynload::cudnnPoolingBackward(
+ parent_, ToHandle(dnn_handle_), pooling_desc.handle(), &alpha,
+ dest_desc.handle(), output_data.opaque(), dest_desc.handle(),
+ input_diff_data.opaque(), src_desc.handle(), input_data.opaque(), &beta,
+ src_desc.handle(), output_diff_data->opaque());
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to enqueue backward pooling on stream: "
+ << ToString(status);
+ return false;
+ }
+ return true;
+}
+
+bool CudnnSupport::DoNormalize(
+ Stream* stream, const dnn::NormalizeDescriptor& normalize_descriptor,
+ const DeviceMemory<float>& input_data, DeviceMemory<float>* output_data) {
+ LOG(FATAL) << "not yet implemented"; // TODO(leary)
+}
+
+bool CudnnSupport::DoDepthConcatenate(
+ Stream* stream, port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ DeviceMemory<float>* output_data) {
+ LOG(FATAL) << "not yet implemented"; // TODO(leary)
+}
+
+bool CudnnSupport::DoElementwiseOperate(
+ Stream* stream, dnn::ElementwiseOperation operation,
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) {
+ LOG(FATAL) << "not yet implemented"; // TODO(leary)
+}
+
+bool CudnnSupport::DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& gpu_unquantized_src,
+ port::MutableArraySlice<uint8> host_dst) {
+ LOG(ERROR) << "quantized memcpy not supported by cuDNN";
+ return false;
+}
+
+bool CudnnSupport::DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& device_unquantized_src,
+ port::MutableArraySlice<uint16> host_dst) {
+ LOG(ERROR) << "quantized memcpy not supported by cuDNN";
+ return false;
+}
+
+bool CudnnSupport::DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& device_unquantized_src,
+ port::MutableArraySlice<int32> host_dst) {
+ LOG(ERROR) << "quantized memcpy not supported by cuDNN";
+ return false;
+}
+
+bool CudnnSupport::DoMemcpyH2DQuantized(
+ Stream* stream, port::ArraySlice<uint8> host_src,
+ DeviceMemory<float>* gpu_unquantized_dst) {
+ LOG(ERROR) << "quantized memcpy not supported by cuDNN";
+ return false;
+}
+
+bool CudnnSupport::DeriveOutputBatchDescriptor(
+ const BatchDescriptor& batch_descriptor,
+ const FilterDescriptor& filter_descriptor,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ dnn::BatchDescriptor* output_batch_descriptor) {
+ ScopedTensorDescriptor input_4d{parent_, batch_descriptor, CUDNN_DATA_FLOAT};
+ ScopedFilterDescriptor filter{parent_, filter_descriptor, CUDNN_DATA_FLOAT};
+ ScopedConvolutionDescriptor conv{parent_, convolution_descriptor};
+
+ int dims[4];
+ auto status = dynload::cudnnGetConvolutionNdForwardOutputDim(
+ parent_, conv.handle(), input_4d.handle(), filter.handle(), 4, dims);
+ if (status != CUDNN_STATUS_SUCCESS) {
+ LOG(ERROR) << "could not get output tensor for convolution: "
+ << ToString(status);
+ return false;
+ }
+
+ output_batch_descriptor->set_count(dims[0])
+ .set_feature_map_count(dims[1])
+ .set_height(dims[2])
+ .set_width(dims[3])
+ .set_layout(batch_descriptor.layout());
+ return true;
+}
+
+} // namespace cuda
+
+namespace gpu = ::perftools::gputools;
+
+void initialize_cudnn() {
+ gpu::port::Status status =
+ gpu::PluginRegistry::Instance()
+ ->RegisterFactory<gpu::PluginRegistry::DnnFactory>(
+ gpu::cuda::kCudaPlatformId, gpu::cuda::kCuDnnPlugin, "cuDNN",
+ [](gpu::internal::StreamExecutorInterface*
+ parent) -> gpu::dnn::DnnSupport* {
+ gpu::cuda::CUDAExecutor* cuda_executor =
+ dynamic_cast<gpu::cuda::CUDAExecutor*>(parent);
+ if (cuda_executor == nullptr) {
+ LOG(ERROR)
+ << "Attempting to initialize an instance of the cuBLAS "
+ << "support library with a non-CUDA StreamExecutor";
+ return nullptr;
+ }
+
+ gpu::cuda::CudnnSupport* dnn =
+ new gpu::cuda::CudnnSupport(cuda_executor);
+ if (!dnn->Init().ok()) {
+ // Note: Init() will log a more specific error.
+ delete dnn;
+ return nullptr;
+ }
+ return dnn;
+ });
+
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to register cuDNN factory: "
+ << status.error_message();
+ }
+
+ // Prime the cuDNN DSO. The loader will log more information.
+ auto statusor = gpu::internal::CachedDsoLoader::GetCudnnDsoHandle();
+ if (!statusor.ok()) {
+ LOG(INFO) << "Unable to load cuDNN DSO.";
+ }
+
+ gpu::PluginRegistry::Instance()->SetDefaultFactory(gpu::cuda::kCudaPlatformId,
+ gpu::PluginKind::kDnn,
+ gpu::cuda::kCuDnnPlugin);
+}
+
+} // namespace gputools
+} // namespace perftools
+
+REGISTER_MODULE_INITIALIZER(register_cudnn,
+ { perftools::gputools::initialize_cudnn(); });
diff --git a/tensorflow/stream_executor/cuda/cuda_dnn.h b/tensorflow/stream_executor/cuda/cuda_dnn.h
new file mode 100644
index 0000000000..08e952cee0
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_dnn.h
@@ -0,0 +1,206 @@
+// The CUDA-specific DNN library support, implementing the general DnnSupport
+// interface.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DNN_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DNN_H_
+
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/temporary_device_memory.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+class CUDAExecutor;
+
+// Opaque and unique identifier for the cuDNN plugin.
+extern const PluginId kCuDnnPlugin;
+
+// cudnn-library based DNN support. For details on overridden interface
+// functions, see dnn.h.
+class CudnnSupport : public dnn::DnnSupport {
+ public:
+ explicit CudnnSupport(CUDAExecutor* parent);
+ ~CudnnSupport() override;
+
+ port::Status Init() override;
+
+ bool DoConvolve(Stream* stream, const dnn::BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const dnn::FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoConvolve(Stream* stream, const dnn::BatchDescriptor& batch_descriptor,
+ const DeviceMemory<double>& input_data,
+ const dnn::FilterDescriptor& filter_descriptor,
+ const DeviceMemory<double>& filter_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<double>* output_data) override;
+
+ bool DoSeparableConvolve(
+ Stream* stream, const dnn::BatchDescriptor& batch_descriptor,
+ const DeviceMemory<float>& input_data,
+ const dnn::FilterDescriptor& filter_descriptor, int depth_multiplier,
+ const DeviceMemory<float>& first_weights,
+ const DeviceMemory<float>& second_weights,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<float>* output_data) override {
+ LOG(ERROR) << "separable convolution not supported by CUDNN";
+ return false;
+ }
+
+ bool DoConvolveBackwardData(
+ Stream* stream, const dnn::FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& input_descriptor,
+ DeviceMemory<float>* backward_input_data) override;
+
+ bool DoConvolveBackwardFilter(
+ Stream* stream, const dnn::BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::FilterDescriptor& filter_descriptor,
+ DeviceMemory<float>* backward_filter_data) override;
+
+ bool DoMatMul(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& weights,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoMatMulQuantized(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<int8>& quantized_weights,
+ const DeviceMemory<float>& weight_scales,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) override {
+ LOG(ERROR) << "DNN MatMulQuantized not supported by CUDNN";
+ return false;
+ }
+
+ bool DoMatMulQuantized(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<int16>& quantized_weights,
+ const DeviceMemory<float>& weight_scales,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) override {
+ LOG(ERROR) << "DNN MatMulQuantized not supported by CUDNN";
+ return false;
+ }
+
+ bool DoBiasAdd(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& biases,
+ const dnn::BatchDescriptor& dimensions,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoActivate(Stream* stream, dnn::ActivationMode activation_mode,
+ const dnn::BatchDescriptor& dimensions,
+ const DeviceMemory<float>& input_data,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoPoolForward(Stream* stream,
+ const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoPoolBackward(Stream* stream,
+ const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ const DeviceMemory<float>& output_data,
+ const DeviceMemory<float>& input_diff_data,
+ DeviceMemory<float>* output_diff_data) override;
+
+ bool DoNormalize(Stream* stream,
+ const dnn::NormalizeDescriptor& normalize_descriptor,
+ const DeviceMemory<float>& input_data,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoDepthConcatenate(
+ Stream* stream, port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoElementwiseOperate(
+ Stream* stream, dnn::ElementwiseOperation operation,
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) override;
+
+ bool DoMemcpyD2HQuantized(Stream* stream,
+ const DeviceMemory<float>& device_unquantized_src,
+ port::MutableArraySlice<uint8> host_dst) override;
+
+ bool DoMemcpyD2HQuantized(Stream* stream,
+ const DeviceMemory<float>& device_unquantized_src,
+ port::MutableArraySlice<uint16> host_dst) override;
+
+ bool DoMemcpyD2HQuantized(Stream* stream,
+ const DeviceMemory<float>& device_unquantized_src,
+ port::MutableArraySlice<int32> host_dst) override;
+
+ bool DoMemcpyH2DQuantized(
+ Stream* stream, port::ArraySlice<uint8> host_src,
+ DeviceMemory<float>* device_unquantized_dst) override;
+
+ // Derives an output batch descriptor from the input batch, filter, and
+ // convolution descriptors.
+ bool DeriveOutputBatchDescriptor(
+ const dnn::BatchDescriptor& batch_descriptor,
+ const dnn::FilterDescriptor& filter_descriptor,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ dnn::BatchDescriptor* output_batch_descriptor);
+
+ private:
+ // Guards the enqueueing of DNN operations via the dnn_handle_ below.
+ mutex dnn_handle_mutex_;
+
+ CUDAExecutor* parent_; // Parent executor object. Not owned.
+
+ // cudnn library handle. cudnnHandle_t type is not present in this header to
+ // prevent third-party library header inclusions from leaking outside the
+ // single cuda_dnn translation unit.
+ void* dnn_handle_ GUARDED_BY(dnn_handle_mutex_);
+
+ // NOTE(keveman): Temporary data layout transformation until cuDNN supports
+ // kBatchYXDepth for backward pass. This function allocates temporary memory,
+ // lays out the source data into the temporary memory in the kBatchDepthYX
+ // layout, and returns the temporary memory. The caller is responsible for
+ // deallocating the temporary. Since the allocation is done using Stream's
+ // AllocateTemporaryMemory, a later BlockHostUntilDone could be used for
+ // deallocation.
+ //
+ // transform_scratch is populated with a legitimate temporary allocation iff
+ // the original output data needs to be transformed.
+ DeviceMemory<float> MaybeTransformLayout(
+ Stream* stream, dnn::BatchDescriptor* output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ std::unique_ptr<TemporaryDeviceMemory<float>>* transform_scratch)
+ EXCLUSIVE_LOCKS_REQUIRED(dnn_handle_mutex_);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CudnnSupport);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DNN_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_driver.cc b/tensorflow/stream_executor/cuda/cuda_driver.cc
new file mode 100644
index 0000000000..8c4316b4c1
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_driver.cc
@@ -0,0 +1,1608 @@
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+
+#include <dlfcn.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <set>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/cuda/cuda_diagnostics.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/casts.h"
+#include "tensorflow/stream_executor/lib/env.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/human_readable.h"
+#include "tensorflow/stream_executor/lib/notification.h"
+#include "tensorflow/stream_executor/lib/threadpool.h"
+#include "tensorflow/stream_executor/lib/stacktrace.h"
+#include "tensorflow/stream_executor/lib/static_threadlocal.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/lib/inlined_vector.h"
+
+bool FLAGS_gpuexec_cuda_driver_inject_init_error = false;
+bool FLAGS_gpuexec_cuda_sync_around_driver_calls = false;
+bool FLAGS_gpuexec_cuda_device_0_only = false;
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+namespace dynload {
+
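+// Binds a libcuda entry point lazily: the first call resolves the symbol with
+// dlsym against the cached libcuda DSO handle and caches the function pointer,
+// so this file has no link-time dependency on libcuda. Each wrapped __name
+// becomes a callable object in this namespace with the driver API signature.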
+#define PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(__name) \
+ struct DynLoadShim__##__name { \
+ static const char *kName; \
+ using FuncPointerT = std::add_pointer<decltype(::__name)>::type; \
+ static void *GetDsoHandle() { \
+ static auto status = internal::CachedDsoLoader::GetLibcudaDsoHandle(); \
+ return status.ValueOrDie(); \
+ } \
+ static FuncPointerT DynLoad() { \
+ static void *f = dlsym(GetDsoHandle(), kName); \
+ CHECK(f != nullptr) << "could not find " << kName \
+ << "in libcuda DSO; dlerror: " << dlerror(); \
+ return reinterpret_cast<FuncPointerT>(f); \
+ } \
+ template <typename... Args> \
+ CUresult operator()(Args... args) { \
+ return DynLoad()(args...); \
+ } \
+ } __name; \
+ const char *DynLoadShim__##__name::kName = #__name;
+
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxCreate_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxDestroy);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxEnablePeerAccess);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxGetCurrent);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxGetDevice);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxGetSharedMemConfig);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxPopCurrent_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxSetCurrent);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxSetSharedMemConfig);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuCtxSynchronize);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceComputeCapability);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceCanAccessPeer);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGet);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGetAttribute);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGetCount);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGetName);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGetPCIBusId);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceGetProperties);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDeviceTotalMem);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuDriverGetVersion);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuEventCreate);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuEventDestroy_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuEventElapsedTime);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuEventQuery);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuEventRecord);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuFuncGetAttribute);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuFuncSetCacheConfig);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuGetErrorName);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuGetErrorString);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuInit);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuLaunchKernel);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemAlloc_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyDtoD_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyDtoH_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyHtoD_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyDtoDAsync_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyDtoHAsync_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemcpyHtoDAsync_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemGetAddressRange_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemFree_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemFreeHost);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemGetInfo_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemHostAlloc);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemHostRegister_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemHostUnregister);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemsetD32_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemsetD32Async);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuMemsetD8_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuModuleGetFunction);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuModuleGetGlobal_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuModuleLoadDataEx);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuModuleLoadFatBinary);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuModuleUnload);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuOccupancyMaxActiveBlocksPerMultiprocessor);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuPointerGetAttribute);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamAddCallback);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamCreate);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamDestroy_v2);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamQuery);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamSynchronize);
+PERFTOOLS_GPUTOOLS_LIBCUDA_WRAP(cuStreamWaitEvent);
+
+} // namespace dynload
+
+namespace {
+
+// Manages the singleton set of contexts that we've created. This is used for
+// checking that no CUDA-runtime-created contexts have been generated
+// accidentally. CUDA-runtime-created contexts are avoided, if triple angle
+// brace launches are required, by using the scoped activations in
+// cuda_activation.h.
+class CreatedContexts {
+ public:
+ // Returns whether context is a member of the live set.
+ static bool Has(CUcontext context) {
+ shared_lock lock{mu_};
+ return Live()->find(context) != Live()->end();
+ }
+
+ // Adds context to the live set.
+ static void Add(CUcontext context) {
+ CHECK(context != nullptr);
+ mutex_lock lock{mu_};
+ Live()->emplace(context);
+ }
+
+ // Removes context from the live set.
+ static void Remove(CUcontext context) {
+ CHECK(context != nullptr);
+ mutex_lock lock{mu_};
+ Live()->erase(context);
+ }
+
+ private:
+ // Returns the live set singleton.
+ static std::set<CUcontext> *Live() {
+ static auto singleton = new std::set<CUcontext>;
+ return singleton;
+ }
+
+ // Lock that guards access-to/mutation-of the live set.
+ static mutex mu_;
+};
+
+/* static */ mutex CreatedContexts::mu_{LINKER_INITIALIZED};
+
+// Formats CUresult to output prettified values into a log stream.
+// Error summaries taken from:
+// http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html#group__CUDA__TYPES_1gc6c391505e117393cc2558fff6bfc2e9
+//
+// TODO(leary) switch to cuGetErrorName when updated cuda.h is available.
+string ToString(CUresult result) {
+#define OSTREAM_CUDA_ERROR(__name) \
+ case CUDA_ERROR_##__name: \
+ return "CUDA_ERROR_" #__name;
+
+///////////////
+// NOTE: here we specify return code values outside of the enum explicitly
+// because our in-tree cuda.h is from the CUDA 5.5 SDK, but CUDA 6.0+ driver
+// libraries are deployed in the fleet. These error codes are backwards
+// compatible, but if we see a "new" one, we want to be able to identify it in
+// the logs.
+//
+// Once we get a cuda.h that has cuGetErrorName (TODO is above) we can
+// eliminate this function and just rely on the driver to provide us these
+// strings.
+//
+// NOTE: "Must reboot all context" below is shorthand for, "must
+// destroy/recreate the offending context and any allocation which come from
+// it if you are to continue using CUDA."
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wswitch"
+ switch (result) {
+ OSTREAM_CUDA_ERROR(INVALID_VALUE)
+ OSTREAM_CUDA_ERROR(OUT_OF_MEMORY)
+ OSTREAM_CUDA_ERROR(NOT_INITIALIZED)
+ OSTREAM_CUDA_ERROR(DEINITIALIZED)
+ OSTREAM_CUDA_ERROR(NO_DEVICE)
+ OSTREAM_CUDA_ERROR(INVALID_DEVICE)
+ OSTREAM_CUDA_ERROR(INVALID_IMAGE)
+ OSTREAM_CUDA_ERROR(INVALID_CONTEXT)
+ OSTREAM_CUDA_ERROR(INVALID_HANDLE)
+ OSTREAM_CUDA_ERROR(NOT_FOUND)
+ OSTREAM_CUDA_ERROR(NOT_READY)
+ OSTREAM_CUDA_ERROR(NO_BINARY_FOR_GPU)
+
+ // Encountered an uncorrectable ECC error during execution.
+ OSTREAM_CUDA_ERROR(ECC_UNCORRECTABLE)
+
+ // Load/store on an invalid address. Must reboot all context.
+ case 700:
+ return "CUDA_ERROR_ILLEGAL_ADDRESS";
+ // Passed too many / wrong arguments, too many threads for register count.
+ case 701:
+ return "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES";
+ // Kernel took too long to execute.
+ case 702:
+ return "CUDA_ERROR_LAUNCH_TIMEOUT";
+ // Kernel launch uses an incompatible texturing mode.
+ case 703:
+ return "CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING";
+ // Trying to re-enable peer access that already has it enabled.
+ case 704:
+ return "CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED";
+ // Trying to disable peer access that has not yet been enabled.
+ case 705:
+ return "CUDA_ERROR_PEER_ACCESS_NOT_ENABLED";
+ // Primary context for the specified device has already been initialized.
+ case 708:
+ return "CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE";
+ // Context current to calling thread has been destroyed or is a primary
+ // context that has not yet been initialized.
+ case 709:
+ return "CUDA_ERROR_CONTEXT_IS_DESTROYED";
+ // Device-side assert triggered during kernel execution. Must reboot all
+ // context.
+ case 710:
+ return "CUDA_ERROR_ASSERT";
+ // Hardware resources to enable peer access have been exhausted.
+ case 711:
+ return "CUDA_ERROR_TOO_MANY_PEERS";
+ // Memory range has already been registered.
+ case 712:
+ return "CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED";
+ // Pointer does not correspond to any currently registered memory region.
+ case 713:
+ return "CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED";
+ // Due to stack corruption or exceeding stack size limit. Must reboot all
+ // context.
+ case 714:
+ return "CUDA_ERROR_HARDWARE_STACK_ERROR";
+ case 715:
+ return "CUDA_ERROR_ILLEGAL_INSTRUCTION";
+ // Load/store on an unaligned memory address. Must reboot all context.
+ case 716:
+ return "CUDA_ERROR_MISALIGNED_ADDRESS";
+ // Device instruction with specific address space given address not
+ // belonging to allowed address space. Must reboot all context.
+ case 717:
+ return "CUDA_ERROR_INVALID_ADDRESS_SPACE";
+ // Device program counter wrapped its address space. Must reboot all
+ // context.
+ case 718:
+ return "CUDA_ERROR_INVALID_PC";
+ // Exception on device while executing a kernel; e.g. deref invalid device
+ // pointer, accessing OOB shared memory. Must reboot all context.
+ case 719:
+ return "CUDA_ERROR_LAUNCH_FAILED";
+
+ OSTREAM_CUDA_ERROR(CONTEXT_ALREADY_IN_USE)
+ OSTREAM_CUDA_ERROR(PEER_ACCESS_UNSUPPORTED)
+ OSTREAM_CUDA_ERROR(NOT_PERMITTED)
+ OSTREAM_CUDA_ERROR(NOT_SUPPORTED)
+ OSTREAM_CUDA_ERROR(UNKNOWN) // Unknown internal error to CUDA.
+ default:
+ return port::StrCat("CUresult(", static_cast<int>(result), ")");
+ }
+#pragma GCC diagnostic pop
+}
+
+// Returns the current context and checks that it is in the set of CUDA contexts
+// created by StreamExecutor (to ensure that the CUDA runtime didn't create a
+// context behind our backs).
+CUcontext CurrentContext() {
+ CUcontext current = nullptr;
+ CUresult result = dynload::cuCtxGetCurrent(&current);
+ if (result != CUDA_SUCCESS) {
+ LOG(FATAL) << "failed to query current context: " << ToString(result);
+ }
+ if (current != nullptr && !CreatedContexts::Has(current)) {
+ LOG(FATAL) << "current context was not created by the StreamExecutor "
+ "cuda_driver API: "
+ << current
+ << "; a CUDA runtime call "
+ "was likely performed without using a StreamExecutor context";
+ }
+ return current;
+}
+
+// "Pops" the current context, checks that it matches expected, and checks the
+// postcondition that the current context is nullptr.
+//
+// This is not done when we're nested within a MultiOpActivation, as we want to
+// persist the active context until the MultiOpActivation is popped.
+void PopContextAndCheckNowNull(CUcontext expected) {
+ CUcontext actual = CurrentContext();
+ CHECK_EQ(expected, actual) << "would pop unexpected context";
+ CUcontext popped;
+ CHECK_EQ(CUDA_SUCCESS, dynload::cuCtxPopCurrent_v2(&popped));
+ CHECK_EQ(expected, popped);
+ CHECK(nullptr == CurrentContext());
+ VLOG(3) << "popped context " << expected
+ << " and current context is now null";
+}
+
+// CUDA driver routines may require a large amount of stack (particularly
+// cuModuleLoadDataEx, in our experience). To avoid stack overflow when using
+// stack-limited threads (such as those spawned by a default-argument
+// thread::ThreadPool on some platforms), we run certain routines in this pool
+// and wait for completion.
+static mutex driver_executor_threadpool_mu(LINKER_INITIALIZED);
+static port::ThreadPool *InitializeDriverExecutor() {
+ return new port::ThreadPool(port::Env::Default(), port::ThreadOptions(),
+ "cuda_driver", 1);
+}
+
+port::ThreadPool *GetDriverExecutor() {
+ mutex_lock lock(driver_executor_threadpool_mu);
+ static port::ThreadPool *thread_pool = InitializeDriverExecutor();
+ return thread_pool;
+}
+
+} // namespace
+
+
+// Thread-local storage that indicates whether a CUDA context activation is
+// being nested within an outer MultiOpActivation. In that case, we should not
+// pop the context to nullptr when we are done with the current activation.
+SE_STATIC_THREAD_LOCAL_POD(bool, tls_in_multi_op_activation);
+
+string MemorySpaceString(MemorySpace memory_space) {
+ switch (memory_space) {
+ case MemorySpace::kHost:
+ return "host";
+ case MemorySpace::kDevice:
+ return "device";
+ default:
+ LOG(FATAL) << "impossible memory space";
+ }
+}
+
+// Implementation note: the CUDA context is held, per-thread, in TLS. We avoid
+// setting it on every call because it's not clear what side effects might
+// occur for a "set" operation, whereas a "get" operation we can reasonably
+// assume is a TLS read.
+//
+// We cannot race here because CUcontext is associated with a particular thread
+// and stored in TLS; and these interfaces should not be used from signal
+// handlers.
+ScopedActivateContext::ScopedActivateContext(CUcontext context,
+ MultiOpActivation moa)
+ : context_(CHECK_NOTNULL(context)),
+ previously_in_multi_op_activation_(tls_in_multi_op_activation.get()) {
+ if (static_cast<bool>(moa)) {
+ tls_in_multi_op_activation.get() = true;
+ }
+
+ CUcontext current = prior_context_ = CurrentContext();
+ if (current != context) {
+ VLOG(3) << "ScopedActivateContext switching context from " << current
+ << " to " << context;
+ CHECK_EQ(CUDA_SUCCESS, dynload::cuCtxSetCurrent(context));
+ if (FLAGS_gpuexec_cuda_sync_around_driver_calls) {
+ auto res = dynload::cuCtxSynchronize();
+ if (res != CUDA_SUCCESS) {
+ LOG(FATAL) << "gpuexec_cuda_sync_around_driver_calls found "
+ << ToString(res)
+ << " immediately after establishing the device context "
+ << context << " :: " << port::CurrentStackTrace();
+ }
+ }
+ }
+}
+
+ScopedActivateContext::~ScopedActivateContext() {
+ if (tls_in_multi_op_activation.get()) {
+ CHECK_EQ(context_, CurrentContext());
+ if (FLAGS_gpuexec_cuda_sync_around_driver_calls) {
+ auto res = dynload::cuCtxSynchronize();
+ if (res != CUDA_SUCCESS) {
+ LOG(FATAL) << "gpuexec_cuda_sync_around_driver_calls found "
+ << ToString(res)
+ << " immediately after de-establishing the device context "
+ << context_ << " :: " << port::CurrentStackTrace();
+ }
+ }
+ CHECK_EQ(CUDA_SUCCESS, dynload::cuCtxSetCurrent(prior_context_));
+ } else {
+ PopContextAndCheckNowNull(context_);
+ }
+ tls_in_multi_op_activation.get() = previously_in_multi_op_activation_;
+}
+
+namespace {
+
+// Returns a stringified device number associated with pointer, primarily for
+// logging purposes. Returns "?" if the device could not be successfully
+// queried.
+string CUDAPointerToDeviceString(CUdeviceptr pointer) {
+ auto value = CUDADriver::GetPointerDevice(pointer);
+ if (value.ok()) {
+ return port::StrCat(value.ValueOrDie());
+ }
+ LOG(ERROR) << "could not query device: " << value.status();
+ return "?";
+}
+
+// Returns a stringified memory space associated with pointer, primarily for
+// logging purposes. Returns "?" if the memory space could not be successfully
+// queried.
+string CUDAPointerToMemorySpaceString(CUdeviceptr pointer) {
+ auto value = CUDADriver::GetPointerMemorySpace(pointer);
+ if (value.ok()) {
+ return MemorySpaceString(value.ValueOrDie());
+ }
+ LOG(ERROR) << "could not query device: " << value.status();
+ return "?";
+}
+
+// Returns a stringified representation of whether or not peer access is
+// permitted between the "from" and "to" pointers' associated contexts,
+// primarily for logging purposes. Returns "error" if an error is encountered
+// in the process of querying.
+string CUDAPointersToCanAccessString(CUdeviceptr from, CUdeviceptr to) {
+ auto from_context = CUDADriver::GetPointerContext(from);
+ if (!from_context.ok()) {
+ LOG(ERROR) << "could not retrieve source pointer's context: "
+ << from_context.status();
+ return "error";
+ }
+ auto to_context = CUDADriver::GetPointerContext(to);
+ if (!to_context.ok()) {
+ LOG(ERROR) << "could not retrieve destination pointer's context: "
+ << to_context.status();
+ return "error";
+ }
+ return CUDADriver::CanEnablePeerAccess(from_context.ValueOrDie(),
+ to_context.ValueOrDie())
+ ? "true"
+ : "false";
+}
+
+
+// Actually performs the work of CUDA initialization. Wrapped up in a one-time
+// execution guard.
+static port::Status InternalInit() {
+ CUresult res = CUDA_ERROR_NO_DEVICE;
+ if (FLAGS_gpuexec_cuda_driver_inject_init_error) {
+ LOG(ERROR) << "injecting CUDA init error; initialization will fail";
+ } else if (internal::CachedDsoLoader::GetLibcudaDsoHandle().ok()) {
+ // We only call cuInit if we can dynload libcuda.
+
+ res = dynload::cuInit(0 /* = flags */);
+ }
+
+ if (res == CUDA_SUCCESS) {
+ return port::Status::OK();
+ }
+
+ LOG(ERROR) << "failed call to cuInit: " << ToString(res);
+ Diagnostician::LogDiagnosticInformation();
+ return port::Status{port::error::ABORTED,
+ port::StrCat("failed call to cuInit: ", ToString(res))};
+}
+
+} // namespace
+
+/* static */ port::Status CUDADriver::Init() {
+ // Cached return value from calling InternalInit(), as cuInit need only be
+ // called once, but CUDADriver::Init may be called many times.
+ static port::Status init_retval;
+ static bool set = false;
+ static mutex init_mu(LINKER_INITIALIZED);
+
+ mutex_lock lock(init_mu);
+ if (!set) {
+ init_retval = InternalInit();
+ set = true;
+ }
+
+ return init_retval;
+}
+
+/* static */ port::Status CUDADriver::GetDevice(int device_ordinal,
+ CUdevice *device) {
+ CUresult res = dynload::cuDeviceGet(device, device_ordinal);
+ if (res == CUDA_SUCCESS) {
+ return port::Status::OK();
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed call to cuDeviceGet: ", ToString(res))};
+}
+
+/* static */ bool CUDADriver::GetDeviceName(CUdevice device,
+ string *device_name) {
+ static const size_t kCharLimit = 64;
+ port::InlinedVector<char, 4> chars(kCharLimit);
+ CUresult res =
+ dynload::cuDeviceGetName(chars.begin(), kCharLimit - 1, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to get device name for " << device << ": "
+ << ToString(res);
+ return false;
+ }
+ chars[kCharLimit - 1] = '\0';
+ *device_name = chars.begin();
+ return true;
+}
+
+bool DeviceOptionsToContextFlags(DeviceOptions device_options, int *flags) {
+ static_assert(DeviceOptions::kMask == 0xf,
+ "needs update for new device options");
+
+ if (device_options.flags() & DeviceOptions::kDoNotReclaimStackAllocation) {
+ *flags |= CU_CTX_LMEM_RESIZE_TO_MAX;
+ }
+
+ // If no flags are set the default is CU_CTX_SCHED_AUTO, which
+ // in Google environments is very likely to mean SPIN.
+ if (device_options.flags() & DeviceOptions::kScheduleSpin) {
+ *flags |= CU_CTX_SCHED_SPIN;
+ }
+ if (device_options.flags() & DeviceOptions::kScheduleYield) {
+ *flags |= CU_CTX_SCHED_YIELD;
+ }
+ if (device_options.flags() & DeviceOptions::kScheduleBlockingSync) {
+ *flags |= CU_CTX_SCHED_BLOCKING_SYNC;
+ }
+
+ return true;
+}
+
+/* static */ port::Status CUDADriver::CreateContext(
+ CUdevice device, DeviceOptions device_options, CUcontext *context) {
+ CUcontext former_context = CurrentContext();
+ if (former_context != nullptr) {
+ LOG(WARNING) << "creating context when one is currently active; existing: "
+ << former_context;
+ }
+
+ int flags = 0;
+ if (!DeviceOptionsToContextFlags(device_options, &flags)) {
+ LOG(WARNING) << "could not convert all device options into context flags";
+ }
+
+ CUresult res;
+ {
+ // TODO(leary) Need to see if NVIDIA can expunge the leakiness in their
+ // context creation: see http://b/13248943
+
+ res = dynload::cuCtxCreate_v2(context, flags, device);
+ }
+ if (res == CUDA_SUCCESS) {
+ CreatedContexts::Add(*context);
+ PopContextAndCheckNowNull(*context);
+ CHECK(*context != nullptr)
+ << "success in this call must entail non-null result";
+ VLOG(2) << "created context " << *context << " for this thread";
+ return port::Status::OK();
+ }
+
+ string message = "failed call to cuCtxCreate: " + ToString(res);
+ if (res == CUDA_ERROR_OUT_OF_MEMORY) {
+ uint64 total_memory;
+ if (GetDeviceTotalMemory(device, &total_memory)) {
+ port::StrAppend(&message, "; total memory reported: ", total_memory);
+ } else {
+ port::StrAppend(&message, "; could not query total memory");
+ }
+ }
+
+ return port::Status{port::error::INTERNAL, message};
+}
+
+/* static */ void CUDADriver::DestroyContext(CUcontext context) {
+ if (context == nullptr) {
+ return;
+ }
+
+ CUresult res = dynload::cuCtxDestroy_v2(context);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to destroy CUDA context; leaking: " << ToString(res);
+ }
+
+ CreatedContexts::Remove(context);
+}
+
+/* static */ bool CUDADriver::FuncGetAttribute(CUfunction_attribute attribute,
+ CUfunction func,
+ int *attribute_value) {
+ CUresult res = dynload::cuFuncGetAttribute(attribute_value, attribute, func);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query kernel attribute. kernel: " << func
+ << ", attribute: " << attribute << ", result: " << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ bool CUDADriver::FuncSetCacheConfig(CUfunction function,
+ CUfunc_cache cache_config) {
+ CUresult res = dynload::cuFuncSetCacheConfig(function, cache_config);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to set CUDA kernel cache config. kernel: " << function
+ << ", config: " << cache_config << ", result: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ port::StatusOr<CUsharedconfig>
+CUDADriver::ContextGetSharedMemConfig(CUcontext context) {
+ CUsharedconfig shared_mem_config;
+ ScopedActivateContext activation{context};
+ CUresult result = dynload::cuCtxGetSharedMemConfig(&shared_mem_config);
+ if (result != CUDA_SUCCESS) {
+ CUdevice device;
+ dynload::cuCtxGetDevice(&device);
+ LOG(ERROR) << "failed to get CUDA device shared memory config. "
+ << "Context device ID: " << device
+ << ", result: " << ToString(result);
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed to get shared memory config: ", ToString(result))};
+ }
+ return shared_mem_config;
+}
+
+/* static */ port::Status CUDADriver::ContextSetSharedMemConfig(
+ CUcontext context, CUsharedconfig shared_mem_config) {
+ ScopedActivateContext activation{context};
+ CUresult result = dynload::cuCtxSetSharedMemConfig(shared_mem_config);
+ if (result != CUDA_SUCCESS) {
+ CUdevice device;
+ dynload::cuCtxGetDevice(&device);
+ LOG(ERROR) << "failed to set CUDA device shared memory config. "
+ << "Context device ID: " << device
+ << ", config: " << shared_mem_config
+ << ", result: " << ToString(result);
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed to set shared memory config: ", ToString(result))};
+ }
+ return port::Status::OK();
+}
+
+/* static */ bool CUDADriver::LaunchKernel(
+ CUcontext context, CUfunction function, unsigned int grid_dim_x,
+ unsigned int grid_dim_y, unsigned int grid_dim_z, unsigned int block_dim_x,
+ unsigned int block_dim_y, unsigned int block_dim_z,
+ unsigned int shared_mem_bytes, CUstream stream, void **kernel_params,
+ void **extra) {
+ ScopedActivateContext activation{context};
+ VLOG(2) << "launching kernel: " << function << "; gdx: " << grid_dim_x
+ << " gdy: " << grid_dim_y << " gdz: " << grid_dim_z
+ << " bdx: " << block_dim_x << " bdy: " << block_dim_y
+ << " bdz: " << block_dim_z;
+ CUresult res = dynload::cuLaunchKernel(
+ function, grid_dim_x, grid_dim_y, grid_dim_z, block_dim_x, block_dim_y,
+ block_dim_z, shared_mem_bytes, stream, kernel_params, extra);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to launch CUDA kernel: " << function
+ << "; result: " << ToString(res);
+ return false;
+ }
+ VLOG(2) << "successfully launched kernel";
+ return true;
+}
+
+/* static */ port::Status CUDADriver::LoadCubin(CUcontext context,
+ const char *cubin_bytes,
+ CUmodule *module) {
+ ScopedActivateContext activation{context};
+ CUresult result = dynload::cuModuleLoadFatBinary(module, cubin_bytes);
+ if (result != CUDA_SUCCESS) {
+ return port::Status{port::error::INTERNAL,
+ "failed to load in-memory CUBIN: " + ToString(result)};
+ }
+
+ return port::Status::OK();
+}
+
+/* static */ bool CUDADriver::LoadPtx(CUcontext context,
+ const char *ptx_contents,
+ CUmodule *module) {
+ port::Notification notification;
+ bool ret = true;
+ GetDriverExecutor()->Schedule([context, ptx_contents, module, &ret,
+ &notification]() {
+ ScopedActivateContext activation{context};
+ void *ptx_data = const_cast<char *>(ptx_contents);
+ static const unsigned int kLogBufferBytesLimit = 1024;
+ unsigned int error_log_buffer_bytes = kLogBufferBytesLimit;
+ unsigned int info_log_buffer_bytes = kLogBufferBytesLimit;
+ port::InlinedVector<char, 4> error_log_buffer(error_log_buffer_bytes);
+ port::InlinedVector<char, 4> info_log_buffer(info_log_buffer_bytes);
+ bool log_verbose = true;
+ CUjit_option options[] = {CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
+ CU_JIT_ERROR_LOG_BUFFER,
+ CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
+ CU_JIT_INFO_LOG_BUFFER, CU_JIT_LOG_VERBOSE};
+ // Note that the driver API wants the contents of these values to be stored
+ // in an array of void*s, so we coerce them accordingly.
+ void *option_values[] = {
+ port::bit_cast<void *>(uintptr_t(error_log_buffer_bytes)),
+ port::bit_cast<void *>(error_log_buffer.data()),
+ port::bit_cast<void *>(uintptr_t(info_log_buffer_bytes)),
+ port::bit_cast<void *>(info_log_buffer.data()),
+ port::bit_cast<void *>(uintptr_t(log_verbose))};
+ CHECK(ARRAYSIZE(options) == ARRAYSIZE(option_values));
+
+ CUresult res;
+ {
+ // TODO(leary) Need to see if NVIDIA can expunge the leakiness in their
+ // module loading: see http://b/13248943
+
+ res = dynload::cuModuleLoadDataEx(module, ptx_data, ARRAYSIZE(options),
+ options, option_values);
+ }
+
+ // The PTX JIT mutates the values in the option values array to reflect the
+ // size of the logs it output; now that we've made the call, read the values
+ // back out.
+ error_log_buffer_bytes = reinterpret_cast<uintptr_t>(option_values[0]);
+ info_log_buffer_bytes = reinterpret_cast<uintptr_t>(option_values[2]);
+ CHECK_LE(error_log_buffer_bytes, kLogBufferBytesLimit);
+ CHECK_LE(info_log_buffer_bytes, kLogBufferBytesLimit);
+
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to load PTX text as a module: " << ToString(res);
+ // As a precaution for null termination of the API-provided value, ensure
+ // that at least the last byte is null.
+ error_log_buffer[error_log_buffer_bytes ?
+ error_log_buffer_bytes - 1 : 0] = '\0';
+ LOG(ERROR) << "error log buffer (" << error_log_buffer_bytes
+ << " bytes): " << error_log_buffer.data();
+ ret = false;
+ notification.Notify();
+ // Return early so the notification below is not signalled a second time.
+ return;
+ }
+
+ VLOG(3) << "PTX compilation info log (" << info_log_buffer_bytes
+ << " bytes): " << info_log_buffer.data();
+ VLOG(3) << "PTX compilation error log (" << error_log_buffer_bytes
+ << " bytes): " << error_log_buffer.data();
+ CHECK(module != nullptr);
+ notification.Notify();
+ });
+ notification.WaitForNotification();
+
+ return ret;
+}
+
+/* static */ bool CUDADriver::SynchronousMemsetUint8(CUcontext context,
+ CUdeviceptr location,
+ uint8 value, size_t size) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemsetD8_v2(location, value, size);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to memset memory: " << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ bool CUDADriver::SynchronousMemsetUint32(CUcontext context,
+ CUdeviceptr location,
+ uint32 value,
+ size_t uint32_count) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemsetD32_v2(location, value, uint32_count);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to memset memory: " << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ bool CUDADriver::AsynchronousMemsetUint32(CUcontext context,
+ CUdeviceptr location,
+ uint32 value,
+ size_t uint32_count,
+ CUstream stream) {
+ ScopedActivateContext activation{context};
+ CUresult res =
+ dynload::cuMemsetD32Async(location, value, uint32_count, stream);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to enqueue async memset operation: " << ToString(res);
+ return false;
+ }
+ VLOG(2) << "successfully enqueued async memset operation";
+ return true;
+}
+
+/* static */ bool CUDADriver::AddStreamCallback(CUcontext context,
+ CUstream stream,
+ StreamCallback callback,
+ void *data) {
+ // Note: flags param is required to be zero according to CUDA 6.0.
+ CUresult res =
+ dynload::cuStreamAddCallback(stream, callback, data, 0 /* = flags */);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "unable to add host callback: " << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ bool CUDADriver::GetModuleFunction(CUcontext context,
+ CUmodule module,
+ const char *kernel_name,
+ CUfunction *function) {
+ ScopedActivateContext activated{context};
+ CHECK(module != nullptr && kernel_name != nullptr);
+ CUresult res = dynload::cuModuleGetFunction(function, module, kernel_name);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to get PTX kernel \"" << kernel_name
+ << "\" from module: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::GetModuleSymbol(CUcontext context,
+ CUmodule module,
+ const char *symbol_name,
+ CUdeviceptr *dptr,
+ size_t *bytes) {
+ ScopedActivateContext activated{context};
+ CHECK(module != nullptr && symbol_name != nullptr &&
+ (dptr != nullptr || bytes != nullptr));
+ CUresult res =
+ dynload::cuModuleGetGlobal_v2(dptr, bytes, module, symbol_name);
+ if (res != CUDA_SUCCESS) {
+ // symbol may not be found in the current module, but it may reside in
+ // another module.
+ VLOG(2) << "failed to get symbol \"" << symbol_name
+ << "\" from module: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ void CUDADriver::UnloadModule(CUcontext context, CUmodule module) {
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuModuleUnload(module);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to unload module " << module
+ << "; leaking: " << ToString(res);
+ }
+}
+
+/* static */ port::StatusOr<CUdevice> CUDADriver::DeviceFromContext(
+ CUcontext context) {
+ ScopedActivateContext activated{context};
+ CUdevice device = -1;
+ CUresult result = dynload::cuCtxGetDevice(&device);
+ if (result == CUDA_SUCCESS) {
+ return device;
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed to get device for context: ", ToString(result))};
+}
+
+/* static */ bool CUDADriver::CreateStream(CUcontext context, CUstream *out) {
+ // TODO(leary) can we switch this to CU_STREAM_NON_BLOCKING or will that mess
+ // up synchronization with respect to memsets and any other things that have
+ // to occur on the default stream?
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuStreamCreate(out, 0);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "could not allocate CUDA stream for context " << context
+ << ": " << ToString(res);
+ return false;
+ }
+
+ VLOG(2) << "successfully created stream " << *out << " for context "
+ << context << " on thread";
+ return true;
+}
+
+/* static */ void CUDADriver::DestroyStream(CUcontext context,
+ CUstream *stream) {
+ if (*stream == nullptr) {
+ return;
+ }
+
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuStreamDestroy_v2(*stream);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to destroy CUDA stream for context " << context
+ << ": " << ToString(res);
+ } else {
+ VLOG(2) << "successfully destroyed stream " << *stream << " for context "
+ << context;
+ *stream = nullptr;
+ }
+}
+
+/* static */ void *CUDADriver::DeviceAllocate(CUcontext context, uint64 bytes) {
+ ScopedActivateContext activated{context};
+ CUdeviceptr result = 0;
+ CUresult res = dynload::cuMemAlloc_v2(&result, bytes);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to allocate "
+ << port::HumanReadableNumBytes::ToString(bytes) << " (" << bytes
+ << " bytes) from device: " << ToString(res);
+ return nullptr;
+ }
+ void *ptr = reinterpret_cast<void *>(result);
+ VLOG(2) << "allocated " << ptr << " for context " << context << " of "
+ << bytes << " bytes";
+ return ptr;
+}
+
+/* static */ void CUDADriver::DeviceDeallocate(CUcontext context,
+ void *location) {
+ ScopedActivateContext activation{context};
+ CUdeviceptr pointer = port::bit_cast<CUdeviceptr>(location);
+ CUresult res = dynload::cuMemFree_v2(pointer);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to free device memory at " << location
+ << "; result: " << ToString(res);
+ } else {
+ VLOG(2) << "deallocated " << location << " for context " << context;
+ }
+}
+
+/* static */ void *CUDADriver::HostAllocate(CUcontext context, uint64 bytes) {
+ ScopedActivateContext activation{context};
+ void *host_mem = nullptr;
+ // "Portable" memory is visible to all CUDA contexts. Safe for our use model.
+ CUresult res =
+ dynload::cuMemHostAlloc(&host_mem, bytes, CU_MEMHOSTALLOC_PORTABLE);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to alloc " << bytes
+ << " bytes on host: " << ToString(res);
+ }
+ return host_mem;
+}
+
+/* static */ void CUDADriver::HostDeallocate(CUcontext context,
+ void *location) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemFreeHost(location);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "error deallocating host memory at " << location << ": "
+ << ToString(res);
+ }
+}
+
+/* static */ bool CUDADriver::HostRegister(CUcontext context, void *location,
+ uint64 bytes) {
+ ScopedActivateContext activation{context};
+ // "Portable" memory is visible to all CUDA contexts. Safe for our use model.
+ CUresult res =
+ dynload::cuMemHostRegister(location, bytes, CU_MEMHOSTREGISTER_PORTABLE);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "error registering host memory at " << location << ": "
+ << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ bool CUDADriver::HostUnregister(CUcontext context,
+ void *location) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemHostUnregister(location);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "error unregistering host memory at " << location << ": "
+ << ToString(res);
+ return false;
+ }
+ return true;
+}
+
+/* static */ port::Status CUDADriver::DestroyEvent(CUcontext context,
+ CUevent *event) {
+ if (*event == nullptr) {
+ return port::Status{port::error::INVALID_ARGUMENT,
+ "input event cannot be null"};
+ }
+
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuEventDestroy_v2(*event);
+ *event = nullptr;
+
+ switch (res) {
+ case CUDA_SUCCESS:
+ return port::Status::OK();
+ case CUDA_ERROR_DEINITIALIZED:
+ case CUDA_ERROR_NOT_INITIALIZED:
+ return port::Status{
+ port::error::FAILED_PRECONDITION,
+ port::Printf("error destroying CUDA event in context %p: %s", context,
+ ToString(res).c_str())};
+ default:
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("error destroying CUDA event in context %p: %s", context,
+ ToString(res).c_str())};
+ }
+}
+
+/* static */ port::Status CUDADriver::RecordEvent(CUcontext context,
+ CUevent event,
+ CUstream stream) {
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuEventRecord(event, stream);
+ switch (res) {
+ case CUDA_SUCCESS:
+ return port::Status::OK();
+ case CUDA_ERROR_DEINITIALIZED:
+ case CUDA_ERROR_NOT_INITIALIZED:
+ return port::Status{
+ port::error::FAILED_PRECONDITION,
+ port::Printf("error recording CUDA event on stream %p: %s", stream,
+ ToString(res).c_str())};
+ default:
+ return port::Status{
+ port::error::INVALID_ARGUMENT,
+ port::Printf("error recording CUDA event on stream %p: %s", stream,
+ ToString(res).c_str())};
+ }
+}
+
+/* static */ port::StatusOr<CUresult> CUDADriver::QueryEvent(CUcontext context,
+ CUevent event) {
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuEventQuery(event);
+ if (res != CUDA_SUCCESS && res != CUDA_ERROR_NOT_READY) {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to query event: %s", ToString(res).c_str())};
+ }
+
+ return res;
+}
+
+/* static */ bool CUDADriver::GetEventElapsedTime(CUcontext context,
+ float *elapsed_milliseconds,
+ CUevent start, CUevent stop) {
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuEventElapsedTime(elapsed_milliseconds, start, stop);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to get elapsed time between events: "
+ << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::WaitStreamOnEvent(CUcontext context,
+ CUstream stream,
+ CUevent event) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuStreamWaitEvent(stream, event, 0 /* = flags */);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "could not wait stream on event: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::SynchronizeContext(CUcontext context) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuCtxSynchronize();
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "could not synchronize on CUDA context: " << ToString(res)
+ << " :: " << port::CurrentStackTrace();
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::SynchronizeStream(CUcontext context,
+ CUstream stream) {
+ ScopedActivateContext activated{context};
+ CHECK(stream != nullptr);
+ CUresult res = dynload::cuStreamSynchronize(stream);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "could not synchronize on CUDA stream: " << ToString(res)
+ << " :: " << port::CurrentStackTrace();
+ return false;
+ }
+ VLOG(2) << "successfully synchronized stream " << stream << " on context "
+ << context;
+ return true;
+}
+
+/* static */ bool CUDADriver::IsStreamIdle(CUcontext context, CUstream stream) {
+ ScopedActivateContext activated{context};
+ CHECK(stream != nullptr);
+ CUresult res = dynload::cuStreamQuery(stream);
+ if (res == CUDA_SUCCESS) {
+ return true;
+ }
+
+ if (res != CUDA_ERROR_NOT_READY) {
+ LOG(ERROR) << "stream in bad state on status query: " << ToString(res);
+ }
+ return false;
+}
+
+/* static */ bool CUDADriver::SynchronousMemcpyD2H(CUcontext context,
+ void *host_dst,
+ CUdeviceptr gpu_src,
+ uint64 size) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemcpyDtoH_v2(host_dst, gpu_src, size);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to perform synchronous memcpy from device to host: %s; "
+ "host dst: %p; GPU src: %p; size: %llu=0x%llx",
+ ToString(res).c_str(), host_dst, port::bit_cast<void *>(gpu_src), size, size);
+ return false;
+ }
+ VLOG(2) << "successfully sync memcpy'd d2h of " << size << " bytes to "
+ << host_dst;
+ return true;
+}
+
+/* static */ bool CUDADriver::SynchronousMemcpyH2D(CUcontext context,
+ CUdeviceptr gpu_dst,
+ const void *host_src,
+ uint64 size) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemcpyHtoD_v2(gpu_dst, host_src, size);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to perform synchronous memcpy from host to device: %s; GPU dst: %p;"
+ " host src: %p; size: %llu=0x%llx",
+ ToString(res).c_str(), port::bit_cast<void *>(gpu_dst), host_src, size, size);
+ return false;
+ }
+ VLOG(2) << "successfully enqueued sync memcpy h2d of " << size << " bytes";
+ return true;
+}
+
+/* static */ bool CUDADriver::SynchronousMemcpyD2D(CUcontext context,
+ CUdeviceptr gpu_dst,
+ CUdeviceptr gpu_src,
+ uint64 size) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemcpyDtoD_v2(gpu_dst, gpu_src, size);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to perform synchronous memcpy from device to device: %s; GPU dst: %p; "
+ "GPU src: %p; size: %llu=0x%llx",
+ ToString(res).c_str(), port::bit_cast<void *>(gpu_dst),
+ port::bit_cast<void *>(gpu_src), size, size);
+ return false;
+ }
+ VLOG(2) << "successfully sync memcpy'd d2d of " << size << " bytes";
+ return true;
+}
+
+/* static */ bool CUDADriver::AsynchronousMemcpyD2H(CUcontext context,
+ void *host_dst,
+ CUdeviceptr gpu_src,
+ uint64 size,
+ CUstream stream) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemcpyDtoHAsync_v2(host_dst, gpu_src, size, stream);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to enqueue async memcpy from device to host: %s; host dst: %p; "
+ "GPU src: %p; size: %llu=0x%llx",
+ ToString(res).c_str(), host_dst, port::bit_cast<void *>(gpu_src), size, size);
+ return false;
+ }
+ VLOG(2) << "successfully enqueued async memcpy d2h of " << size
+ << " bytes from " << port::bit_cast<void *>(gpu_src) << " to " << host_dst
+ << " on stream " << stream;
+ return true;
+}
+
+/* static */ bool CUDADriver::AsynchronousMemcpyH2D(CUcontext context,
+ CUdeviceptr gpu_dst,
+ const void *host_src,
+ uint64 size,
+ CUstream stream) {
+ ScopedActivateContext activation{context};
+ CUresult res = dynload::cuMemcpyHtoDAsync_v2(gpu_dst, host_src, size, stream);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to enqueue async memcpy from host to device: %s; GPU dst: %p; "
+ "host src: %p; size: %llu=0x%llx",
+ ToString(res).c_str(), port::bit_cast<void *>(gpu_dst), host_src, size, size);
+ return false;
+ }
+ VLOG(2) << "successfully enqueued async memcpy h2d of " << size << " bytes"
+ << " on stream " << stream;
+ return true;
+}
+
+/* static */ bool CUDADriver::AsynchronousMemcpyD2D(CUcontext context,
+ CUdeviceptr gpu_dst,
+ CUdeviceptr gpu_src,
+ uint64 size,
+ CUstream stream) {
+ ScopedActivateContext activation{context};
+ CUresult result =
+ dynload::cuMemcpyDtoDAsync_v2(gpu_dst, gpu_src, size, stream);
+ if (result != CUDA_SUCCESS) {
+ LOG(ERROR) << port::Printf(
+ "failed to enqueue async memcpy from device to device: %s"
+ "; GPU dst: %p on %s %s"
+ "; GPU src: %p on %s %s"
+ "; can access? %s; size: %llu=0x%llx",
+ ToString(result).c_str(), port::bit_cast<void *>(gpu_dst),
+ CUDAPointerToMemorySpaceString(gpu_dst).c_str(),
+ CUDAPointerToDeviceString(gpu_dst).c_str(), port::bit_cast<void *>(gpu_src),
+ CUDAPointerToMemorySpaceString(gpu_src).c_str(),
+ CUDAPointerToDeviceString(gpu_src).c_str(),
+ CUDAPointersToCanAccessString(gpu_src, gpu_dst).c_str(), size, size);
+
+ return false;
+ }
+ VLOG(2) << "successfully enqueued async memcpy d2d of " << size << " bytes";
+ return true;
+}
+
+/* static */ port::Status CUDADriver::CreateEvent(CUcontext context,
+ CUevent *result,
+ EventFlags flags) {
+ int cuflags;
+ switch (flags) {
+ case EventFlags::kDefault:
+ cuflags = CU_EVENT_DEFAULT;
+ break;
+ case EventFlags::kDisableTiming:
+ cuflags = CU_EVENT_DISABLE_TIMING;
+ break;
+ default:
+ LOG(FATAL) << "impossible event flags: " << int(flags);
+ }
+
+ ScopedActivateContext activated{context};
+ CUresult res = dynload::cuEventCreate(result, cuflags);
+
+ if (res == CUDA_SUCCESS) {
+ return port::Status::OK();
+ } else if (res == CUDA_ERROR_OUT_OF_MEMORY) {
+ return port::Status{port::error::RESOURCE_EXHAUSTED,
+ "could not create CUDA event: out of device memory"};
+ } else {
+ return port::Status{
+ port::error::FAILED_PRECONDITION,
+ port::StrCat("could not create CUDA event: ", ToString(res))};
+ }
+}
+
+/* static */ int CUDADriver::GetDeviceCount() {
+ int device_count = 0;
+ CUresult res = dynload::cuDeviceGetCount(&device_count);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "could not retrieve CUDA device count: " << ToString(res);
+ return 0;
+ }
+
+ if (FLAGS_gpuexec_cuda_device_0_only && device_count > 1) {
+ device_count = 1;
+ }
+ return device_count;
+}
+
+/* static */ port::StatusOr<CUcontext> CUDADriver::GetPointerContext(
+ CUdeviceptr pointer) {
+ CUcontext context = nullptr;
+ CUresult result = dynload::cuPointerGetAttribute(
+ &context, CU_POINTER_ATTRIBUTE_CONTEXT, pointer);
+ if (result == CUDA_SUCCESS) {
+ CHECK(context != nullptr) << "success should entail non-null context";
+ return context;
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed to query device pointer for context: ",
+ ToString(result))};
+}
+
+/* static */ port::StatusOr<MemorySpace> CUDADriver::GetPointerMemorySpace(
+ CUdeviceptr pointer) {
+ unsigned int value;
+ CUresult result = dynload::cuPointerGetAttribute(
+ &value, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, pointer);
+ if (result == CUDA_SUCCESS) {
+ switch (value) {
+ case CU_MEMORYTYPE_DEVICE:
+ return MemorySpace::kDevice;
+ case CU_MEMORYTYPE_HOST:
+ return MemorySpace::kHost;
+ default:
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("unknown memory space provided by CUDA API: ", value)};
+ }
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::StrCat("failed to query device pointer for memory space: ",
+ ToString(result))};
+}
+
+/* static */ port::Status CUDADriver::GetPointerAddressRange(CUdeviceptr dptr,
+ CUdeviceptr *base,
+ size_t *size) {
+ CUresult result = dynload::cuMemGetAddressRange(base, size, dptr);
+ if (result == CUDA_SUCCESS) {
+ return port::Status::OK();
+ } else if (result == CUDA_ERROR_NOT_FOUND) {
+ // We differentiate between "this pointer is unknown" (return here) and
+ // "there was an internal error while performing this operation" (return
+ // below).
+ return port::Status{
+ port::error::NOT_FOUND,
+ port::Printf("not a device pointer %p; %s",
+ reinterpret_cast<void *>(dptr), ToString(result).c_str())};
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to get pointer info for device pointer %p; %s",
+ reinterpret_cast<void *>(dptr), ToString(result).c_str())};
+}
+
+/* static */ port::StatusOr<CUdevice> CUDADriver::GetPointerDevice(
+ CUdeviceptr pointer) {
+ auto result = GetPointerContext(pointer);
+ if (!result.ok()) {
+ return result.status();
+ }
+
+ return DeviceFromContext(result.ValueOrDie());
+}
+
+/* static */ port::Status CUDADriver::GetComputeCapability(int *cc_major,
+ int *cc_minor,
+ CUdevice device) {
+ *cc_major = 0;
+ *cc_minor = 0;
+ CUresult result =
+ dynload::cuDeviceComputeCapability(cc_major, cc_minor, device);
+ if (result == CUDA_SUCCESS) {
+ return port::Status::OK();
+ }
+
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to get compute capability for device: %s; %d",
+ ToString(result).c_str(), device)};
+}
+
+// Helper function that turns the integer output of cuDeviceGetAttribute into
+// type T and wraps it in a StatusOr.
+template <typename T>
+static port::StatusOr<T> GetSimpleAttribute(CUdevice device,
+ CUdevice_attribute attribute) {
+ int value = -1;
+ CUresult result = dynload::cuDeviceGetAttribute(&value, attribute, device);
+ if (result != CUDA_SUCCESS) {
+ return port::Status{
+ port::error::NOT_FOUND,
+ port::StrCat("could not retrieve CUDA device attribute (", attribute,
+ "): ", ToString(result))};
+ }
+ T converted = value;
+ return converted;
+}
+
+/* static */ port::StatusOr<int> CUDADriver::GetMultiprocessorCount(
+ CUdevice device) {
+ return GetSimpleAttribute<int>(device,
+ CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetMaxSharedMemoryPerCore(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(
+ device, CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetMaxSharedMemoryPerBlock(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(
+ device, CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetMaxThreadsPerMultiprocessor(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(
+ device, CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetMaxThreadsPerBlock(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(device,
+ CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetMaxRegistersPerBlock(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(device,
+ CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK);
+}
+
+/* static */ port::StatusOr<int64> CUDADriver::GetThreadsPerWarp(
+ CUdevice device) {
+ return GetSimpleAttribute<int64>(device, CU_DEVICE_ATTRIBUTE_WARP_SIZE);
+}
+
+/* static */ bool CUDADriver::GetGridLimits(int *x, int *y, int *z,
+ CUdevice device) {
+ int value;
+ CUresult res = dynload::cuDeviceGetAttribute(
+ &value, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query max grid dim x: " << ToString(res);
+ return false;
+ }
+ *x = value;
+
+ res = dynload::cuDeviceGetAttribute(
+ &value, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query max grid dim y: " << ToString(res);
+ return false;
+ }
+ *y = value;
+
+ res = dynload::cuDeviceGetAttribute(
+ &value, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query max grid dim z: " << ToString(res);
+ return false;
+ }
+ *z = value;
+ return true;
+}
+
+/* static */ bool CUDADriver::GetDriverVersion(int *driver_version) {
+ CUresult res = dynload::cuDriverGetVersion(driver_version);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query driver version: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::GetDeviceProperties(CUdevprop *device_properties,
+ int device_ordinal) {
+ CUresult res =
+ dynload::cuDeviceGetProperties(device_properties, device_ordinal);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query device properties: " << ToString(res);
+ return false;
+ }
+
+ return true;
+}
+
+/* static */ bool CUDADriver::IsEccEnabled(CUdevice device, bool *result) {
+ int value = -1;
+ CUresult res = dynload::cuDeviceGetAttribute(
+ &value, CU_DEVICE_ATTRIBUTE_ECC_ENABLED, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query ECC status: " << ToString(res);
+ return false;
+ }
+
+ *result = value;
+ return true;
+}
+
+/* static */ bool CUDADriver::GetDeviceMemoryInfo(CUcontext context,
+ int64 *free_out,
+ int64 *total_out) {
+ ScopedActivateContext activation{context};
+ size_t free = 0;
+ size_t total = 0;
+ CUresult res = dynload::cuMemGetInfo_v2(&free, &total);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query device memory info: " << ToString(res);
+ return false;
+ }
+
+ *free_out = free;
+ *total_out = total;
+ return true;
+}
+
+/* static */ bool CUDADriver::GetDeviceTotalMemory(CUdevice device,
+ uint64 *result) {
+ size_t value = -1;
+ CUresult res = dynload::cuDeviceTotalMem_v2(&value, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query total available memory: " << ToString(res);
+ return false;
+ }
+
+ *result = value;
+ return true;
+}
+
+/* static */ string CUDADriver::GetPCIBusID(CUdevice device) {
+ string pci_bus_id;
+ static const int kBufferSize = 64;
+ port::InlinedVector<char, 4> chars(kBufferSize);
+ chars[kBufferSize - 1] = '\0';
+ CUresult res =
+ dynload::cuDeviceGetPCIBusId(chars.begin(), kBufferSize - 1, device);
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to query PCI bus id for device: " << ToString(res);
+ return pci_bus_id;
+ }
+ pci_bus_id = chars.begin();
+ return pci_bus_id;
+}
+
+/* static */ bool CUDADriver::CanEnablePeerAccess(CUcontext from,
+ CUcontext to) {
+ if (from == to) {
+ return true; // A context can always access its own memory.
+ }
+
+ int can_access_peer = -1;
+ auto from_device = DeviceFromContext(from);
+ if (!from_device.ok()) {
+ LOG(ERROR) << "failed to resolve 'from' peer access context to a device: "
+ << from_device.status();
+ return false;
+ }
+ auto to_device = DeviceFromContext(to);
+ if (!to_device.ok()) {
+ LOG(ERROR) << "failed to resolve 'to' peer access context to a device: "
+ << to_device.status();
+ return false;
+ }
+ CUresult res = dynload::cuDeviceCanAccessPeer(
+ &can_access_peer, from_device.ValueOrDie(), to_device.ValueOrDie());
+ if (res != CUDA_SUCCESS) {
+ LOG(ERROR) << "failed to detect peer access capability: " << ToString(res);
+ return false;
+ }
+
+ return can_access_peer;
+}
+
+/* static */ port::Status CUDADriver::EnablePeerAccess(CUcontext from,
+ CUcontext to) {
+ if (from == to) {
+ return port::Status::OK(); // A context can always access its own memory.
+ }
+
+ ScopedActivateContext activated{from};
+ CUresult result = dynload::cuCtxEnablePeerAccess(to, 0 /* = flags */);
+ if (result != CUDA_SUCCESS &&
+ result != CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED) {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to enable peer access from %p to %p: %s", from, to,
+ ToString(result).c_str())};
+ }
+
+ return port::Status::OK();
+}
+
+/* static */ port::StatusOr<int> CUDADriver::GetMaxOccupiedBlocksPerCore(
+ CUcontext context, CUfunction kernel, int threads_per_block,
+ size_t dynamic_shared_memory_bytes) {
+ ScopedActivateContext activation{context};
+
+ int max_blocks;
+ CUresult result = dynload::cuOccupancyMaxActiveBlocksPerMultiprocessor(
+ &max_blocks, kernel, threads_per_block, dynamic_shared_memory_bytes);
+ if (result != CUDA_SUCCESS) {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to calculate occupancy of kernel %p: %s", kernel,
+ ToString(result).c_str())};
+ }
+
+ return max_blocks;
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_driver.h b/tensorflow/stream_executor/cuda/cuda_driver.h
new file mode 100644
index 0000000000..007db222d9
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_driver.h
@@ -0,0 +1,460 @@
+// CUDA userspace driver library wrapper functionality.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DRIVER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DRIVER_H_
+
+#include <stddef.h>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/cuda/multi_op_activation.h"
+#include "tensorflow/stream_executor/device_options.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "third_party/gpus/cuda/include/cuda.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// Identifies the memory space where an allocation resides. See
+// CUDADriver::GetPointerMemorySpace().
+enum class MemorySpace { kHost, kDevice };
+
+// Returns a short, human-readable string, such as "host", for the provided
+// memory space.
+string MemorySpaceString(MemorySpace memory_space);
+
+// CUDADriver contains wrappers for calls to the userspace library driver. It's
+// useful to isolate these calls and put basic wrappers around them to separate
+// userspace library driver behaviors from the rest of the program.
+//
+// At the moment it's simply used as a namespace.
+//
+// The calls log any specific errors internally and return to the caller whether
+// the operation was successful.
+//
+// The order of parameters is generally kept symmetric with the underlying CUDA
+// driver API.
+//
+// Links on functions are to specific documentation under
+// http://docs.nvidia.com/cuda/cuda-driver-api/
+//
+// Thread safety: these functions should not be used from signal handlers.
+class CUDADriver {
+ public:
+ // Wraps a call to cuInit with logging to help indicate what has gone wrong in
+ // the case of failure. Safe to call multiple times; will be fast on all calls
+ // after the first.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE_1g0a2f1517e1bd8502c7194c3a8c134bc3
+ static port::Status Init();
+
+ // Returns the device associated with the given context.
+ // device is an outparam owned by the caller, must not be null.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g4e84b109eba36cdaaade167f34ae881e
+ static port::StatusOr<CUdevice> DeviceFromContext(CUcontext context);
+
+ // Creates a new CUDA stream associated with the given context via
+ // cuStreamCreate.
+ // stream is an outparam owned by the caller, must not be null.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1ga581f0c5833e21ded8b5a56594e243f4
+ static bool CreateStream(CUcontext context, CUstream *stream);
+
+ // Destroys a CUDA stream associated with the given context.
+ // stream is owned by the caller, must not be null, and *stream is set to null
+ // if the stream is successfully destroyed.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g244c8833de4596bcd31a06cdf21ee758
+ static void DestroyStream(CUcontext context, CUstream *stream);
+
+ // CUDA events can explicitly disable event TSC retrieval for some presumed
+ // performance improvement if timing is unnecessary.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g450687e75f3ff992fe01662a43d9d3db
+ enum class EventFlags { kDefault, kDisableTiming };
+
+ // Creates a new event associated with the given context.
+ // result is an outparam owned by the caller and must not be null.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g450687e75f3ff992fe01662a43d9d3db
+ static port::Status CreateEvent(CUcontext context, CUevent *result,
+ EventFlags flags);
+
+ // Destroys *event and turns it into a nullptr. event may not be null, but
+ // *event may be, via cuEventDestroy.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g593ec73a8ec5a5fc031311d3e4dca1ef
+ static port::Status DestroyEvent(CUcontext context, CUevent *event);
+
+ // Allocates a GPU memory space of size bytes associated with the given
+ // context via cuMemAlloc.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467
+ static void *DeviceAllocate(CUcontext context, uint64 bytes);
+
+ // Deallocates a GPU memory space of size bytes associated with the given
+ // context via cuMemFree.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g89b3f154e17cc89b6eea277dbdf5c93a
+ static void DeviceDeallocate(CUcontext context, void *location);
+
+ // Allocates page-locked and CUDA-registered memory on the host via
+ // cuMemAllocHost.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gdd8311286d2c2691605362c689bc64e0
+ static void *HostAllocate(CUcontext context, uint64 bytes);
+
+ // Deallocates a location created by HostAllocate, via cuMemFreeHost.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g62e0fdbe181dab6b1c90fa1a51c7b92c
+ static void HostDeallocate(CUcontext context, void *location);
+
+ // Registers a memory region at location of size bytes via cuMemHostRegister.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf0a9fe11544326dabd743b7aa6b54223
+ static bool HostRegister(CUcontext context, void *location, uint64 bytes);
+
+ // Unregisters a memory region that was previously registered at location via
+ // cuMemHostUnregister.
+ //
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g63f450c8125359be87b7623b1c0b2a14
+ //
+ // TODO(leary) verify an error will be returned if the location wasn't
+ // previously registered.
+ static bool HostUnregister(CUcontext context, void *location);
+
+ // Given a device ordinal, returns a device handle into the device outparam,
+ // which must not be null.
+ //
+ // N.B. these device handles do not have a corresponding destroy function in
+ // the CUDA driver API.
+ static port::Status GetDevice(int device_ordinal, CUdevice *device);
+
+ // Given a device handle, returns the name reported by the driver for the
+ // device.
+ static bool GetDeviceName(CUdevice device, string *name_out);
+
+ // Given a device to create a context for, returns a context handle into the
+ // context outparam, which must not be null.
+ //
+ // N.B. CUDA contexts are weird. They are implicitly associated with the
+ // calling thread. Current documentation on contexts and their influence on
+ // userspace processes is given here:
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
+ static port::Status CreateContext(CUdevice device,
+ DeviceOptions device_options,
+ CUcontext *context);
+
+ // Destroys the provided context via cuCtxDestroy.
+ // Don't do this while clients could still be using the context; per the docs,
+ // bad things will happen.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g27a365aebb0eb548166309f58a1e8b8e
+ static void DestroyContext(CUcontext context);
+
+ // Queries the runtime for the specified attribute of the specified function.
+ // cuFuncGetAttribute (the underlying CUDA driver API routine) only operates
+ // in terms of integer-sized values, so there's no potential for overrun (as
+ // of CUDA 5.5).
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b
+ static bool FuncGetAttribute(CUfunction_attribute attribute,
+ CUfunction function, int *attribute_value);
+
+ // Sets the preferred cache configuration for the specified function.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g40f8c11e81def95dc0072a375f965681
+ static bool FuncSetCacheConfig(CUfunction function,
+ CUfunc_cache cache_config);
+
+ // Gets the preferred shared memory bank configuration for the specified
+ // CONTEXT (not function!), either default or four- or eight-byte bank size.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g17153a1b8b8c756f7ab8505686a4ad74
+ static port::StatusOr<CUsharedconfig> ContextGetSharedMemConfig(
+ CUcontext context);
+
+ // Sets the preferred shared memory bank configuration for the specified
+ // CONTEXT (not function!), either default or four- or eight-byte bank size.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2574235fa643f8f251bf7bc28fac3692
+ static port::Status ContextSetSharedMemConfig(
+ CUcontext context, CUsharedconfig shared_mem_config);
+
+ // Launches a CUDA kernel via cuLaunchKernel.
+ // TODO(leary) describe the structure of kernel_params and extra in a readable
+ // way.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15
+ static bool LaunchKernel(CUcontext context, CUfunction function,
+ unsigned int grid_dim_x, unsigned int grid_dim_y,
+ unsigned int grid_dim_z, unsigned int block_dim_x,
+ unsigned int block_dim_y, unsigned int block_dim_z,
+ unsigned int shared_mem_bytes, CUstream stream,
+ void **kernel_params, void **extra);
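+ //
+ // Illustrative sketch only (not part of the original interface docs), using
+ // the usual CUDA driver convention that kernel_params holds one void* per
+ // kernel argument, each pointing at that argument's value; the kernel
+ // signature and names below are assumptions made up for the example:
+ //
+ //   // __global__ void axpy(float a, float *x, float *y);
+ //   float a = 2.0f;
+ //   CUdeviceptr x = ..., y = ...;  // previously allocated device buffers
+ //   void *kernel_params[] = {&a, &x, &y};
+ //   CUDADriver::LaunchKernel(context, axpy_kernel, grid_x, 1, 1,
+ //                            block_x, 1, 1, 0 /* shared_mem_bytes */,
+ //                            stream, kernel_params, nullptr /* extra */);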
+
+ // Loads ptx_contents with the CUDA driver's PTX JIT and stores the resulting
+ // handle in "module". Any error logs that are produced are logged internally.
+ static bool LoadPtx(CUcontext context, const char *ptx_contents,
+ CUmodule *module);
+
+ // Loads cubin_bytes with the CUDA driver's blob loading interface and stores
+ // the resulting handle in "module".
+ static port::Status LoadCubin(CUcontext context, const char *cubin_bytes,
+ CUmodule *module);
+
+ // Retrieves a named kernel from a loaded module, and places the resulting
+ // handle into function (outparam) on success. Neither kernel_name nor
+ // function may be null. No ownership is taken of kernel_name.
+ static bool GetModuleFunction(CUcontext context, CUmodule module,
+ const char *kernel_name, CUfunction *function);
+
+ // Retrieves a named global/constant symbol from a loaded module, and returns
+ // a device pointer and size of the symbol on success. symbol_name may not be
+ // null. At least one of dptr or bytes should not be null. No ownership is
+ // taken of symbol_name.
+ static bool GetModuleSymbol(CUcontext context, CUmodule module,
+ const char *symbol_name, CUdeviceptr *dptr,
+ size_t *bytes);
+
+ // Unloads module from the current context via cuModuleUnload.
+ // TODO(leary) the documentation doesn't say what kind of disasters happen
+ // if you try to unload a module while its CUfunctions are in use.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g8ea3d716524369de3763104ced4ea57b
+ static void UnloadModule(CUcontext context, CUmodule module);
+
+ // Performs a synchronous memset of the device memory segment via cuMemsetD8.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b
+ static bool SynchronousMemsetUint8(CUcontext context, CUdeviceptr location,
+ uint8 value, size_t size);
+
+ // Performs a synchronous memset of the device memory segment via cuMemsetD32.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g983e8d8759acd1b64326317481fbf132
+ static bool SynchronousMemsetUint32(CUcontext context, CUdeviceptr location,
+ uint32 value, size_t uint32_count);
+
+ // Performs an asynchronous memset of the device memory segment via
+ // cuMemsetD32Async.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g58229da5d30f1c0cdf667b320ec2c0f5
+ static bool AsynchronousMemsetUint32(CUcontext context, CUdeviceptr location,
+ uint32 value, size_t uint32_count,
+ CUstream stream);
+
+ // -- Synchronous memcopies.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4d32266788c440b0220b1a9ba5795169
+
+ static bool SynchronousMemcpyD2H(CUcontext context, void *host_dst,
+ CUdeviceptr gpu_src, uint64 size);
+ static bool SynchronousMemcpyH2D(CUcontext context, CUdeviceptr gpu_dst,
+ const void *host_src, uint64 size);
+ static bool SynchronousMemcpyD2D(CUcontext context, CUdeviceptr gpu_dst,
+ CUdeviceptr gpu_src, uint64 size);
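+ //
+ // Illustrative sketch only (assumptions: a live context, "bytes" of valid
+ // host storage at host_src/host_dst, and a successful device allocation):
+ //
+ //   CUdeviceptr dev = reinterpret_cast<CUdeviceptr>(
+ //       CUDADriver::DeviceAllocate(context, bytes));
+ //   CUDADriver::SynchronousMemcpyH2D(context, dev, host_src, bytes);
+ //   CUDADriver::SynchronousMemcpyD2H(context, host_dst, dev, bytes);
+ //   CUDADriver::DeviceDeallocate(context, reinterpret_cast<void *>(dev));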
+
+ // -- Asynchronous memcopies.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g56f30236c7c5247f8e061b59d3268362
+
+ static bool AsynchronousMemcpyD2H(CUcontext context, void *host_dst,
+ CUdeviceptr gpu_src, uint64 size,
+ CUstream stream);
+ static bool AsynchronousMemcpyH2D(CUcontext context, CUdeviceptr gpu_dst,
+ const void *host_src, uint64 size,
+ CUstream stream);
+ static bool AsynchronousMemcpyD2D(CUcontext context, CUdeviceptr gpu_dst,
+ CUdeviceptr gpu_src, uint64 size,
+ CUstream stream);
+
+ // The CUDA stream callback type signature.
+ // The data passed to AddStreamCallback is subsequently passed to this
+ // callback when it fires.
+ //
+ // Some notable things:
+ // * Callbacks must not make any CUDA API calls.
+ // * Callbacks from independent streams execute in an undefined order and may
+ // be serialized.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
+ typedef void (*StreamCallback)(CUstream stream, CUresult status, void *data);
+
+ // Enqueues a callback operation into stream.
+ // See StreamCallback above and the NVIDIA documentation for additional
+ // details.
+ static bool AddStreamCallback(CUcontext context, CUstream stream,
+ StreamCallback callback, void *data);
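+ //
+ // Illustrative sketch only; the callback body and payload type here are
+ // assumptions made up for the example:
+ //
+ //   static void DoneCallback(CUstream stream, CUresult status, void *data) {
+ //     // Must not call back into the CUDA API from a stream callback.
+ //     static_cast<MyPayload *>(data)->SignalDone(status);
+ //   }
+ //   ...
+ //   CUDADriver::AddStreamCallback(context, stream, &DoneCallback, &payload);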
+
+ // Causes stream to wait for event to trigger before proceeding via
+ // cuStreamWaitEvent.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#axzz334nAXAhM
+ static bool WaitStreamOnEvent(CUcontext context, CUstream stream,
+ CUevent event);
+
+ // Blocks the calling thread until the operations enqueued onto stream have
+ // been completed, via cuStreamSynchronize.
+ //
+ // TODO(leary) if a pathological thread enqueues operations onto the stream
+ // while another thread blocks like this, can you wind up waiting an unbounded
+ // amount of time?
+ //
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g15e49dd91ec15991eb7c0a741beb7dad
+ static bool SynchronizeStream(CUcontext context, CUstream stream);
+
+ // Blocks the calling thread until the operations associated with the context
+ // have been completed, via cuCtxSynchronize.
+ //
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g7a54725f28d34b8c6299f0c6ca579616
+ static bool SynchronizeContext(CUcontext context);
+
+ // Returns true if all stream tasks have completed at the time of the call. Note
+ // the potential for races around this call (if another thread adds work to
+ // the stream immediately after this returns).
+ static bool IsStreamIdle(CUcontext context, CUstream stream);
+
+ // Returns whether code in the from context can access memory in the to
+ // context via cuDeviceCanAccessPeer.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g496bdaae1f632ebfb695b99d2c40f19e
+ static bool CanEnablePeerAccess(CUcontext from, CUcontext to);
+
+ // Enables peer access per CanEnablePeerAccess, via cuCtxEnablePeerAccess.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g0889ec6728e61c05ed359551d67b3f5a
+ static port::Status EnablePeerAccess(CUcontext from, CUcontext to);
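+ //
+ // Illustrative sketch only; note that peer access is granted per direction,
+ // so each direction is enabled separately:
+ //
+ //   if (CUDADriver::CanEnablePeerAccess(ctx_a, ctx_b)) {
+ //     CUDADriver::EnablePeerAccess(ctx_a, ctx_b);  // a -> b
+ //     CUDADriver::EnablePeerAccess(ctx_b, ctx_a);  // b -> a
+ //   }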
+
+ // Returns the elapsed milliseconds between start and stop via
+ // cuEventElapsedTime.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1gdfb1178807353bbcaa9e245da497cf97
+ static bool GetEventElapsedTime(CUcontext context,
+ float *elapsed_milliseconds, CUevent start,
+ CUevent stop);
+
+ // Records that an event occurred when execution reaches the current point in
+ // the stream via cuEventRecord.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g95424d3be52c4eb95d83861b70fb89d1
+ static port::Status RecordEvent(CUcontext context, CUevent event,
+ CUstream stream);
+
+ // Polls (without blocking) to determine the status of an event - pending or
+ // complete (or an error status).
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g6f0704d755066b0ee705749ae911deef
+ static port::StatusOr<CUresult> QueryEvent(CUcontext context, CUevent event);
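+ //
+ // Illustrative sketch only, combining the event calls above to time a span
+ // of work on a stream (error handling omitted; all names are assumptions):
+ //
+ //   CUevent start, stop;
+ //   CUDADriver::CreateEvent(context, &start, EventFlags::kDefault);
+ //   CUDADriver::CreateEvent(context, &stop, EventFlags::kDefault);
+ //   CUDADriver::RecordEvent(context, start, stream);
+ //   ... enqueue work on stream ...
+ //   CUDADriver::RecordEvent(context, stop, stream);
+ //   CUDADriver::SynchronizeStream(context, stream);
+ //   float ms;
+ //   CUDADriver::GetEventElapsedTime(context, &ms, start, stop);
+ //   CUDADriver::DestroyEvent(context, &start);
+ //   CUDADriver::DestroyEvent(context, &stop);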
+
+ // -- Pointer-specific calls.
+
+ // Returns the context in which pointer was allocated or registered.
+ static port::StatusOr<CUcontext> GetPointerContext(CUdeviceptr pointer);
+
+ // Returns the device associated with the context from GetPointerContext().
+ static port::StatusOr<CUdevice> GetPointerDevice(CUdeviceptr pointer);
+
+ // Returns the memory space addressed by pointer.
+ static port::StatusOr<MemorySpace> GetPointerMemorySpace(CUdeviceptr pointer);
+
+ // Returns the base address and size of the device pointer dptr.
+ static port::Status GetPointerAddressRange(CUdeviceptr dptr,
+ CUdeviceptr *base, size_t *size);
+
+ // -- Device-specific calls.
+
+ // Returns the compute capability for the device, e.g. (3, 5).
+ // This is currently done via the deprecated device API.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE__DEPRECATED.html#group__CUDA__DEVICE__DEPRECATED_1ge2091bbac7e1fb18c2821612115607ea
+ static port::Status GetComputeCapability(int *cc_major, int *cc_minor,
+ CUdevice device);
+
+ // Returns the number of multiprocessors on the device (note that the device
+ // may be multi-GPU-per-board).
+ static port::StatusOr<int> GetMultiprocessorCount(CUdevice device);
+
+ // Returns the limit on number of threads that can be resident in a single
+ // multiprocessor.
+ static port::StatusOr<int64> GetMaxThreadsPerMultiprocessor(CUdevice device);
+
+ // Returns the limit on number of threads which may be resident for a single
+ // block (cooperative thread array).
+ static port::StatusOr<int64> GetMaxThreadsPerBlock(CUdevice device);
+
+ // Returns the amount of shared memory available on a single GPU core (i.e.
+ // SM on NVIDIA devices).
+ static port::StatusOr<int64> GetMaxSharedMemoryPerCore(CUdevice device);
+
+ // Returns the amount of shared memory available for a single block
+ // (cooperative thread array).
+ static port::StatusOr<int64> GetMaxSharedMemoryPerBlock(CUdevice device);
+
+ // Returns the maximum supported number of registers per block.
+ static port::StatusOr<int64> GetMaxRegistersPerBlock(CUdevice device);
+
+ // Returns the number of threads per warp.
+ static port::StatusOr<int64> GetThreadsPerWarp(CUdevice device);
+
+ // Queries the grid limits for device with cuDeviceGetAttribute calls.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g9c3e1414f0ad901d3278a4d6645fc266
+ static bool GetGridLimits(int *x, int *y, int *z, CUdevice device);
+
+ // Returns a grab-bag of device properties in a caller-owned device_properties
+ // structure for device_ordinal via cuDeviceGetProperties.
+ // This call is deprecated in the NVIDIA driver API.
+ //
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE__DEPRECATED.html#group__CUDA__DEVICE__DEPRECATED_1g65a5b4e25186bd257df80b98c98cffe6
+ static bool GetDeviceProperties(CUdevprop *device_properties,
+ int device_ordinal);
+
+ // Returns whether ECC is enabled for the given CUdevice via
+ // cuDeviceGetAttribute with CU_DEVICE_ATTRIBUTE_ECC_ENABLED.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g9c3e1414f0ad901d3278a4d6645fc266
+ static bool IsEccEnabled(CUdevice device, bool *result);
+
+ // Returns the total amount of memory available for allocation by the CUDA
+ // context, in bytes, via cuDeviceTotalMem.
+ static bool GetDeviceTotalMemory(CUdevice device, uint64 *result);
+
+ // Returns the free amount of memory and total amount of memory, as reported
+ // by cuMemGetInfo.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0
+ static bool GetDeviceMemoryInfo(CUcontext context, int64 *free, int64 *total);
+
+ // Returns a PCI bus id string for the device.
+ // [domain]:[bus]:[device].[function]
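+ // e.g. "0000:04:00.0".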
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g85295e7d9745ab8f0aa80dd1e172acfc
+ static string GetPCIBusID(CUdevice device);
+
+ // -- Context- and device-independent calls.
+
+ // Returns the number of visible CUDA devices via cuDeviceGetCount.
+ // This should correspond to the set of device ordinals available.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g52b5ce05cb8c5fb6831b2c0ff2887c74
+ static int GetDeviceCount();
+
+ // Returns the driver version number via cuDriverGetVersion.
+ // This is, surprisingly, NOT the actual driver version (e.g. 331.79) but,
+ // instead, the CUDA toolkit release number that this driver is compatible
+ // with; e.g. 6000 (for a CUDA 6.0 compatible driver) or 6050 (for a CUDA 6.5
+ // compatible driver).
+ //
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__VERSION.html#group__CUDA__VERSION_1g8b7a10395392e049006e61bcdc8ebe71
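+ //
+ // Illustrative decoding of that encoding (a sketch, not part of this API):
+ // for the value v stored in *driver_version,
+ //   int major = v / 1000;          // e.g. 6050 -> 6
+ //   int minor = (v % 1000) / 10;   // e.g. 6050 -> 5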
+ static bool GetDriverVersion(int *driver_version);
+
+ // -- Other calls
+
+ // Returns the maximum number of blocks (per multiprocessor) occupied by the
+ // specified kernel/CUfunction when launched with the specified parameters.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__OCCUPANCY.html#group__CUDA__OCCUPANCY_1gcc6e1094d05cba2cee17fe33ddd04a98
+ static port::StatusOr<int> GetMaxOccupiedBlocksPerCore(
+ CUcontext context, CUfunction kernel, int threads_per_block,
+ size_t dynamic_shared_memory_bytes);
+
+ // Seam for injecting an error at CUDA initialization time for testing
+ // purposes.
+ static bool driver_inject_init_error_;
+};
+
+// Ensures a context is activated within a scope.
+class ScopedActivateContext {
+ public:
+ // Activates the context via cuCtxSetCurrent, if it is not the currently
+ // active context (a la cuCtxGetCurrent). Note the alternative push/pop
+ // mechanism is said by NVIDIA to be relatively slow and deprecated.
+ // http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gbe562ee6258b4fcc272ca6478ca2a2f7
+ explicit ScopedActivateContext(
+ CUcontext context, MultiOpActivation moa = MultiOpActivation::kNo);
+
+ // Checks that the context has remained activated for the duration of the
+ // scope.
+ ~ScopedActivateContext();
+
+ private:
+ CUcontext context_; // context being activated.
+
+ CUcontext prior_context_; // context that was active when we were activated.
+
+ // Stores whether this was instantiated during a MultiOpActivation, in which
+ // case we will not pop the context when we're destroyed (we will leave it to
+ // the parent MultiOpActivation that we were nested within).
+ bool previously_in_multi_op_activation_;
+};
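+
+// Example usage (an illustrative sketch only; assumes the caller already
+// holds a valid CUcontext obtained via CUDADriver::CreateContext or similar):
+//
+//   void IssueDriverCalls(CUcontext context) {
+//     ScopedActivateContext activation{context};
+//     // ... make CUDADriver calls that require `context` to be current ...
+//   }  // The prior context is restored on scope exit, unless nested inside a
+//      // MultiOpActivation, which then owns the restoration.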
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_DRIVER_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_event.cc b/tensorflow/stream_executor/cuda/cuda_event.cc
new file mode 100644
index 0000000000..a87c868c6b
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_event.cc
@@ -0,0 +1,56 @@
+#include "tensorflow/stream_executor/cuda/cuda_event.h"
+
+#include "tensorflow/stream_executor/cuda/cuda_stream.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+CUDAEvent::CUDAEvent(CUDAExecutor* parent)
+ : parent_(parent), cuda_event_(nullptr) {}
+
+CUDAEvent::~CUDAEvent() {}
+
+port::Status CUDAEvent::Init() {
+ return CUDADriver::CreateEvent(parent_->cuda_context(), &cuda_event_,
+ CUDADriver::EventFlags::kDisableTiming);
+}
+
+port::Status CUDAEvent::Destroy() {
+ return CUDADriver::DestroyEvent(parent_->cuda_context(), &cuda_event_);
+}
+
+port::Status CUDAEvent::Record(CUDAStream* stream) {
+ return CUDADriver::RecordEvent(parent_->cuda_context(), cuda_event_,
+ stream->cuda_stream());
+}
+
+Event::Status CUDAEvent::PollForStatus() {
+ port::StatusOr<CUresult> status =
+ CUDADriver::QueryEvent(parent_->cuda_context(), cuda_event_);
+ if (!status.ok()) {
+ LOG(ERROR) << "Error polling for event status: "
+ << status.status().error_message();
+ return Event::Status::kError;
+ }
+
+ switch (status.ValueOrDie()) {
+ case CUDA_SUCCESS:
+ return Event::Status::kComplete;
+ case CUDA_ERROR_NOT_READY:
+ return Event::Status::kPending;
+ default:
+ LOG(INFO) << "Error condition returned for event status: "
+ << status.ValueOrDie();
+ return Event::Status::kError;
+ }
+}
+
+const CUevent& CUDAEvent::cuda_event() {
+ return cuda_event_;
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_event.h b/tensorflow/stream_executor/cuda/cuda_event.h
new file mode 100644
index 0000000000..c5b65662db
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_event.h
@@ -0,0 +1,49 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_EVENT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_EVENT_H_
+
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_stream.h"
+#include "tensorflow/stream_executor/event.h"
+#include "tensorflow/stream_executor/lib/status.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// CUDAEvent wraps a CUevent in the platform-independent EventInterface
+// interface.
+class CUDAEvent : public internal::EventInterface {
+ public:
+ explicit CUDAEvent(CUDAExecutor* parent);
+
+ ~CUDAEvent() override;
+
+ // Populates the CUDA-platform-specific elements of this object.
+ port::Status Init();
+
+ // Deallocates any platform-specific elements of this object. This is broken
+ // out (not part of the destructor) to allow for error reporting.
+ port::Status Destroy();
+
+ // Inserts the event at the current position into the specified stream.
+ port::Status Record(CUDAStream* stream);
+
+ // Polls the CUDA platform for the event's current status.
+ Event::Status PollForStatus();
+
+ // The underlying CUDA event element.
+ const CUevent& cuda_event();
+
+ private:
+ // The Executor to which this object and its CUevent are bound.
+ CUDAExecutor* parent_;
+
+ // The underlying CUDA event element.
+ CUevent cuda_event_;
+};
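+
+// Typical lifecycle (illustrative sketch only; error handling elided, and
+// `executor`/`stream` are assumed to be already-initialized CUDAExecutor and
+// CUDAStream objects):
+//
+//   CUDAEvent event{executor};
+//   event.Init();
+//   event.Record(stream);
+//   while (event.PollForStatus() == Event::Status::kPending) {
+//     // Yield or do other work until the event completes.
+//   }
+//   event.Destroy();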
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_EVENT_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_fft.cc b/tensorflow/stream_executor/cuda/cuda_fft.cc
new file mode 100644
index 0000000000..59c3159895
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_fft.cc
@@ -0,0 +1,327 @@
+#include "tensorflow/stream_executor/cuda/cuda_fft.h"
+
+#include <dlfcn.h>
+
+#include <complex>
+
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/cuda/cuda_helpers.h"
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(kCuFftPlugin);
+
+namespace dynload {
+
+// This macro wraps a global identifier, given by __name, in a callable
+// structure that loads the DLL symbol out of the DSO handle in a thread-safe
+// manner on first use. This dynamic loading technique is used to avoid DSO
+// dependencies on vendor libraries which may or may not be available in the
+// deployed binary environment.
+#define PERFTOOLS_GPUTOOLS_CUFFT_WRAP(__name) \
+ struct DynLoadShim__##__name { \
+ static const char *kName; \
+ using FuncPointerT = std::add_pointer<decltype(::__name)>::type; \
+ static void *GetDsoHandle() { \
+ static auto status = internal::CachedDsoLoader::GetCufftDsoHandle(); \
+ return status.ValueOrDie(); \
+ } \
+ static FuncPointerT DynLoad() { \
+ static void *f = dlsym(GetDsoHandle(), kName); \
+ CHECK(f != nullptr) << "could not find " << kName \
+ << " in cuFFT DSO; dlerror: " << dlerror(); \
+ return reinterpret_cast<FuncPointerT>(f); \
+ } \
+ template <typename... Args> \
+ cufftResult operator()(CUDAExecutor * parent, Args... args) { \
+ cuda::ScopedActivateExecutorContext sac{parent}; \
+ return DynLoad()(args...); \
+ } \
+ } __name; \
+ const char *DynLoadShim__##__name::kName = #__name;
+
+#define CUFFT_ROUTINE_EACH(__macro) \
+ __macro(cufftDestroy) __macro(cufftSetStream) __macro(cufftPlan1d) \
+ __macro(cufftPlan2d) __macro(cufftPlan3d) __macro(cufftPlanMany) \
+ __macro(cufftExecD2Z) __macro(cufftExecZ2D) __macro(cufftExecC2C) \
+ __macro(cufftExecC2R) __macro(cufftExecZ2Z) \
+ __macro(cufftExecR2C)
+
+CUFFT_ROUTINE_EACH(PERFTOOLS_GPUTOOLS_CUFFT_WRAP)
+
+} // namespace dynload
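+
+// With the shims above, cuFFT entry points are reached through the dynload
+// namespace rather than by direct linkage; e.g. (illustrative only):
+//
+//   cufftHandle plan;
+//   auto ret = dynload::cufftPlan1d(parent, &plan, num_x, CUFFT_C2C,
+//                                   1 /* = batch */);
+//
+// The extra leading `parent` argument scope-activates that executor's CUDA
+// context before the underlying cuFFT function is invoked.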
+
+namespace {
+
+// A helper function transforming gpu_fft arguments into cuFFT arguments.
+cufftType CUDAFftType(fft::Type type) {
+ switch (type) {
+ case fft::Type::kC2CForward:
+ case fft::Type::kC2CInverse:
+ return CUFFT_C2C;
+ case fft::Type::kC2R:
+ return CUFFT_C2R;
+ case fft::Type::kR2C:
+ return CUFFT_R2C;
+ case fft::Type::kZ2ZForward:
+ case fft::Type::kZ2ZInverse:
+ return CUFFT_Z2Z;
+ case fft::Type::kZ2D:
+ return CUFFT_Z2D;
+ case fft::Type::kD2Z:
+ return CUFFT_D2Z;
+ default:
+ LOG(FATAL) << "Invalid value of fft::Type.";
+ }
+}
+
+// Associates the given stream with the given cuFFT plan.
+bool SetStream(CUDAExecutor *parent, cufftHandle plan, Stream *stream) {
+ auto ret = dynload::cufftSetStream(parent, plan, AsCUDAStreamValue(stream));
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to run cuFFT routine cufftSetStream: " << ret;
+ return false;
+ }
+ return true;
+}
+
+} // namespace
+
+CUDAFftPlan::CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, fft::Type type)
+ : parent_(parent), fft_type_(type) {
+ auto ret = dynload::cufftPlan1d(parent, &plan_, num_x, CUDAFftType(type),
+ 1 /* = batch */);
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to create cuFFT 1d plan:" << ret;
+ }
+}
+
+CUDAFftPlan::CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, uint64 num_y,
+ fft::Type type)
+ : parent_(parent), fft_type_(type) {
+ auto ret =
+ dynload::cufftPlan2d(parent, &plan_, num_x, num_y, CUDAFftType(type));
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to create cuFFT 2d plan:" << ret;
+ }
+}
+
+CUDAFftPlan::CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, uint64 num_y,
+ uint64 num_z, fft::Type type)
+ : parent_(parent), fft_type_(type) {
+ auto ret = dynload::cufftPlan3d(parent, &plan_, num_x, num_y, num_z,
+ CUDAFftType(type));
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to create cuFFT 3d plan:" << ret;
+ }
+}
+
+CUDAFftPlan::CUDAFftPlan(CUDAExecutor *parent, int rank, uint64 *elem_count,
+ uint64 *input_embed, uint64 input_stride,
+ uint64 input_distance, uint64 *output_embed,
+ uint64 output_stride, uint64 output_distance,
+ fft::Type type, int batch_count)
+ : parent_(parent), fft_type_(type) {
+ int elem_count_[3], input_embed_[3], output_embed_[3];
+ for (int i = 0; i < rank; ++i) {
+ elem_count_[i] = elem_count[i];
+ if (input_embed) {
+ input_embed_[i] = input_embed[i];
+ }
+ if (output_embed) {
+ output_embed_[i] = output_embed[i];
+ }
+ }
+ auto ret = dynload::cufftPlanMany(
+ parent, &plan_, rank, elem_count_, input_embed ? input_embed_ : nullptr,
+ input_stride, input_distance, output_embed ? output_embed_ : nullptr,
+ output_stride, output_distance, CUDAFftType(type), batch_count);
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to create cuFFT batched plan:" << ret;
+ }
+}
+
+CUDAFftPlan::~CUDAFftPlan() { dynload::cufftDestroy(parent_, plan_); }
+
+int CUDAFftPlan::GetFftDirection() const {
+ switch (fft_type_) {
+ case fft::Type::kC2CForward:
+ case fft::Type::kZ2ZForward:
+ case fft::Type::kR2C:
+ case fft::Type::kD2Z:
+ return CUFFT_FORWARD;
+ case fft::Type::kC2CInverse:
+ case fft::Type::kZ2ZInverse:
+ case fft::Type::kC2R:
+ case fft::Type::kZ2D:
+ return CUFFT_INVERSE;
+ default:
+ LOG(FATAL) << "Invalid value of fft::Type.";
+ }
+}
+
+std::unique_ptr<fft::Plan> CUDAFft::Create1dPlan(Stream *stream, uint64 num_x,
+ fft::Type type,
+ bool in_place_fft) {
+ std::unique_ptr<fft::Plan> plan{new CUDAFftPlan(parent_, num_x, type)};
+ return plan;
+}
+
+std::unique_ptr<fft::Plan> CUDAFft::Create2dPlan(Stream *stream, uint64 num_x,
+ uint64 num_y, fft::Type type,
+ bool in_place_fft) {
+ std::unique_ptr<fft::Plan> plan{new CUDAFftPlan(parent_, num_x, num_y, type)};
+ return plan;
+}
+
+std::unique_ptr<fft::Plan> CUDAFft::Create3dPlan(Stream *stream, uint64 num_x,
+ uint64 num_y, uint64 num_z,
+ fft::Type type,
+ bool in_place_fft) {
+ std::unique_ptr<fft::Plan> plan{
+ new CUDAFftPlan(parent_, num_x, num_y, num_z, type)};
+ return plan;
+}
+
+std::unique_ptr<fft::Plan> CUDAFft::CreateBatchedPlan(
+ Stream *stream, int rank, uint64 *elem_count, uint64 *input_embed,
+ uint64 input_stride, uint64 input_distance, uint64 *output_embed,
+ uint64 output_stride, uint64 output_distance, fft::Type type,
+ bool in_place_fft, int batch_count) {
+ std::unique_ptr<fft::Plan> plan{new CUDAFftPlan(
+ parent_, rank, elem_count, input_embed, input_stride, input_distance,
+ output_embed, output_stride, output_distance, type, batch_count)};
+ return plan;
+}
+
+template <typename FuncT, typename InputT, typename OutputT>
+bool CUDAFft::DoFftInternal(Stream *stream, fft::Plan *plan, FuncT cufftExec,
+ const DeviceMemory<InputT> &input,
+ DeviceMemory<OutputT> *output) {
+ CUDAFftPlan *cuda_fft_plan = dynamic_cast<CUDAFftPlan *>(plan);
+ if (cuda_fft_plan == nullptr) {
+ LOG(ERROR) << "the passed-in plan is not a CUDAFftPlan object.";
+ return false;
+ }
+
+ if (!SetStream(parent_, cuda_fft_plan->GetPlan(), stream)) {
+ return false;
+ }
+
+ auto ret = cufftExec(parent_, cuda_fft_plan->GetPlan(),
+ CUDAComplex(const_cast<InputT *>(CUDAMemory(input))),
+ CUDAComplex(CUDAMemoryMutable(output)));
+
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to run cuFFT routine: " << ret;
+ return false;
+ }
+
+ return true;
+}
+
+template <typename FuncT, typename InputT, typename OutputT>
+bool CUDAFft::DoFftWithDirectionInternal(Stream *stream, fft::Plan *plan,
+ FuncT cufftExec,
+ const DeviceMemory<InputT> &input,
+ DeviceMemory<OutputT> *output) {
+ CUDAFftPlan *cuda_fft_plan = dynamic_cast<CUDAFftPlan *>(plan);
+ if (cuda_fft_plan == nullptr) {
+ LOG(ERROR) << "the passed-in plan is not a CUDAFftPlan object.";
+ return false;
+ }
+
+ if (!SetStream(parent_, cuda_fft_plan->GetPlan(), stream)) {
+ return false;
+ }
+
+ auto ret = cufftExec(parent_, cuda_fft_plan->GetPlan(),
+ CUDAComplex(const_cast<InputT *>(CUDAMemory(input))),
+ CUDAComplex(CUDAMemoryMutable(output)),
+ cuda_fft_plan->GetFftDirection());
+
+ if (ret != CUFFT_SUCCESS) {
+ LOG(ERROR) << "failed to run cuFFT routine: " << ret;
+ return false;
+ }
+
+ return true;
+}
+
+#define PERFTOOLS_GPUTOOLS_CUDA_DEFINE_FFT(__type, __fft_type1, __fft_type2, \
+ __fft_type3) \
+ bool CUDAFft::DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<__type>> &input, \
+ DeviceMemory<std::complex<__type>> *output) { \
+ return DoFftWithDirectionInternal( \
+ stream, plan, dynload::cufftExec##__fft_type1, input, output); \
+ } \
+ bool CUDAFft::DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<__type> &input, \
+ DeviceMemory<std::complex<__type>> *output) { \
+ return DoFftInternal(stream, plan, dynload::cufftExec##__fft_type2, input, \
+ output); \
+ } \
+ bool CUDAFft::DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<__type>> &input, \
+ DeviceMemory<__type> *output) { \
+ return DoFftInternal(stream, plan, dynload::cufftExec##__fft_type3, input, \
+ output); \
+ }
+
+PERFTOOLS_GPUTOOLS_CUDA_DEFINE_FFT(float, C2C, R2C, C2R)
+PERFTOOLS_GPUTOOLS_CUDA_DEFINE_FFT(double, Z2Z, D2Z, Z2D)
+
+#undef PERFTOOLS_GPUTOOLS_CUDA_DEFINE_FFT
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+namespace gpu = ::perftools::gputools;
+
+REGISTER_MODULE_INITIALIZER(register_cufft, {
+ gpu::port::Status status =
+ gpu::PluginRegistry::Instance()
+ ->RegisterFactory<gpu::PluginRegistry::FftFactory>(
+ gpu::cuda::kCudaPlatformId, gpu::cuda::kCuFftPlugin, "cuFFT",
+ [](gpu::internal::StreamExecutorInterface
+ *parent) -> gpu::fft::FftSupport * {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ dynamic_cast<gpu::cuda::CUDAExecutor *>(parent);
+ if (cuda_executor == nullptr) {
+ LOG(ERROR)
+ << "Attempting to initialize an instance of the cuFFT "
+ << "support library with a non-CUDA StreamExecutor";
+ return nullptr;
+ }
+
+ return new gpu::cuda::CUDAFft(cuda_executor);
+ });
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to register cuFFT factory: "
+ << status.error_message();
+ }
+
+ // Prime the cuFFT DSO. The loader will log more information.
+ auto statusor = gpu::internal::CachedDsoLoader::GetCufftDsoHandle();
+ if (!statusor.ok()) {
+ LOG(INFO) << "Unable to load cuFFT DSO.";
+ }
+
+ gpu::PluginRegistry::Instance()->SetDefaultFactory(gpu::cuda::kCudaPlatformId,
+ gpu::PluginKind::kFft,
+ gpu::cuda::kCuFftPlugin);
+});
diff --git a/tensorflow/stream_executor/cuda/cuda_fft.h b/tensorflow/stream_executor/cuda/cuda_fft.h
new file mode 100644
index 0000000000..2577c2952e
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_fft.h
@@ -0,0 +1,95 @@
+// CUDA-specific support for FFT functionality -- this wraps the cuFFT library
+// capabilities, and is only included into CUDA implementation code -- it will
+// not introduce cuda headers into other code.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_FFT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_FFT_H_
+
+#include "tensorflow/stream_executor/fft.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "third_party/gpus/cuda/include/cufft.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+
+namespace cuda {
+
+class CUDAExecutor;
+
+// Opaque and unique identifier for the cuFFT plugin.
+extern const PluginId kCuFftPlugin;
+
+class CUDAFftPlan : public fft::Plan {
+ public:
+ // Constructor creating 1d FFT plan.
+ CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, fft::Type type);
+ // Constructor creating 2d FFT plan.
+ CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, uint64 num_y, fft::Type type);
+ // Constructor creating 3d FFT plan.
+ CUDAFftPlan(CUDAExecutor *parent, uint64 num_x, uint64 num_y, uint64 num_z,
+ fft::Type type);
+ // Constructor creating batched FFT plan.
+ CUDAFftPlan(CUDAExecutor *parent, int rank, uint64 *elem_count,
+ uint64 *input_embed, uint64 input_stride, uint64 input_distance,
+ uint64 *output_embed, uint64 output_stride,
+ uint64 output_distance, fft::Type type, int batch_count);
+ ~CUDAFftPlan() override;
+
+ // Get FFT direction in cuFFT based on FFT type.
+ int GetFftDirection() const;
+ cufftHandle GetPlan() const { return plan_; }
+
+ private:
+ CUDAExecutor *parent_;
+ cufftHandle plan_;
+ fft::Type fft_type_;
+};
+
+// FFT support for CUDA platform via cuFFT library.
+//
+// This satisfies the platform-agnostic FftSupport interface.
+//
+// Note that the cuFFT handle that this encapsulates is implicitly tied to the
+// context (and, as a result, the device) that the parent CUDAExecutor is tied
+// to. This simply happens as an artifact of creating the cuFFT handle when a
+// CUDA context is active.
+//
+// Thread-safe. The CUDA context associated with all operations is the CUDA
+// context of parent_, so all context is explicit.
+class CUDAFft : public fft::FftSupport {
+ public:
+ explicit CUDAFft(CUDAExecutor *parent) : parent_(parent) {}
+ ~CUDAFft() override {}
+
+ TENSORFLOW_STREAM_EXECUTOR_GPU_FFT_SUPPORT_OVERRIDES
+
+ private:
+ CUDAExecutor *parent_;
+
+ // Two helper functions that execute dynload::cufftExec?2?.
+
+ // This is for complex to complex FFT, when the direction is required.
+ template <typename FuncT, typename InputT, typename OutputT>
+ bool DoFftWithDirectionInternal(Stream *stream, fft::Plan *plan,
+ FuncT cufft_exec,
+ const DeviceMemory<InputT> &input,
+ DeviceMemory<OutputT> *output);
+
+ // This is for complex to real or real to complex FFT, when the direction
+ // is implied.
+ template <typename FuncT, typename InputT, typename OutputT>
+ bool DoFftInternal(Stream *stream, fft::Plan *plan, FuncT cufft_exec,
+ const DeviceMemory<InputT> &input,
+ DeviceMemory<OutputT> *output);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CUDAFft);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_FFT_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
new file mode 100644
index 0000000000..77f16e2a6e
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -0,0 +1,1082 @@
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+
+#include <unistd.h>
+
+#include "tensorflow/stream_executor/cuda/cuda_diagnostics.h"
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_event.h"
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+#include "tensorflow/stream_executor/cuda/cuda_stream.h"
+#include "tensorflow/stream_executor/cuda/cuda_timer.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/kernel_cache_config.h"
+#include "tensorflow/stream_executor/lib/casts.h"
+#include "tensorflow/stream_executor/lib/env.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/mathutil.h"
+#include "tensorflow/stream_executor/lib/path.h"
+#include "tensorflow/stream_executor/lib/process_state.h"
+#include "tensorflow/stream_executor/lib/ptr_util.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/lib/str_util.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+#include "tensorflow/stream_executor/timer.h"
+#include "tensorflow/stream_executor/lib/numbers.h"
+
+#ifdef PLATFORMS_GPUS_CUDA_DYNAMIC_LIBCUDA_DYNAMIC_LIBCUDA_H_
+#error \
+ "No driver calls in this file, wrap driver functionality in cuda_driver.cc."
+#endif
+
+#ifdef __CUDA_RUNTIME_H__
+#error \
+ "CUDA runtime being included into CUDA GPU executor; should be driver only."
+#endif
+
+extern bool FLAGS_check_gpu_leaks;
+tensorflow::int32 FLAGS_register_occupancy_warning_threshold;
+bool FLAGS_prefer_cubin_to_ptx = true;
+
+namespace perftools {
+namespace gputools {
+namespace rng {
+class RngSupport;
+} // namespace rng
+} // namespace gputools
+} // namespace perftools
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// Hook that can be used to CUBIN-ate PTX before it is loaded into the driver.
+// It has been observed that loading both PTX and cubins into the driver library
+// can cause it to crash, but loading only CUBINs avoids those crashes;
+// therefore, it's useful to have this hook to hack in uniform CUBIN-ation of
+// PTX code.
+//
+// As this is an implementation-detail workaround, the usage is to declare this
+// variable with extern linkage and populate it from another translation unit.
+std::function<string(const string &)> g_cubinate;
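+
+// For example, another translation unit might install the hook as follows
+// (an illustrative sketch; `PtxToCubin` is a hypothetical PTX-to-CUBIN
+// compiler wrapper, not something defined in this codebase):
+//
+//   extern std::function<string(const string &)> g_cubinate;
+//   ...
+//   g_cubinate = [](const string &ptx) { return PtxToCubin(ptx); };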
+
+static CUDAEvent *AsCUDAEvent(Event *event) {
+ DCHECK(event != nullptr);
+ return static_cast<CUDAEvent *>(event->implementation());
+}
+
+// Given a platform-independent stream datatype, returns the internal CUDA
+// platform implementation pointer.
+static CUDAStream *AsCUDAStream(Stream *stream) {
+ DCHECK(stream != nullptr);
+ return static_cast<CUDAStream *>(stream->implementation());
+}
+
+// Given a platform-independent stream datatype, returns the platform
+// implementation's internal value, suitable for passing directly to libcuda
+// APIs.
+CUstream AsCUDAStreamValue(Stream *stream) {
+ DCHECK(stream != nullptr);
+ return AsCUDAStream(stream)->cuda_stream();
+}
+
+// Given a platform-independent timer datatype, returns the internal CUDA
+// platform implementation pointer.
+static CUDATimer *AsCUDATimer(Timer *timer) {
+ DCHECK(timer != nullptr);
+ return static_cast<CUDATimer *>(timer->implementation());
+}
+
+// Given const GPU memory, returns a libcuda device pointer datatype, suitable
+// for passing directly to libcuda APIs.
+//
+// N.B. we must lose constness in order to pass a suitable type to the existing
+// libcuda APIs, so the caller should take care to only pass the result of const
+// GPU memory conversions to libcuda functions which will honor constness.
+static CUdeviceptr AsCudaDevicePtr(const DeviceMemoryBase &gpu_mem) {
+ return reinterpret_cast<CUdeviceptr>(gpu_mem.opaque());
+}
+
+// See description on const version above.
+static CUdeviceptr AsCudaDevicePtr(DeviceMemoryBase *gpu_mem) {
+ return AsCudaDevicePtr(*gpu_mem);
+}
+
+static CUcontext GetCudaContext(Stream *stream) {
+ return static_cast<CUDAExecutor *>(stream->parent()->implementation())
+ ->cuda_context();
+}
+
+CUcontext ExtractCudaContext(CUDAExecutor *cuda_exec) {
+ CHECK(cuda_exec != nullptr);
+ return cuda_exec->cuda_context();
+}
+
+CUDAExecutor *ExtractCudaExecutor(StreamExecutor *stream_exec) {
+ return static_cast<CUDAExecutor *>(stream_exec->implementation());
+}
+
+CUDAExecutor::~CUDAExecutor() {
+ for (auto &it : disk_modules_) {
+ CUDADriver::UnloadModule(context_, it.second);
+ }
+ for (auto &it : in_memory_modules_) {
+ CUDADriver::UnloadModule(context_, it.second);
+ }
+ if (context_ != nullptr) {
+ CUDADriver::DestroyContext(context_);
+ }
+}
+
+port::Status CUDAExecutor::Init(int device_ordinal,
+ DeviceOptions device_options) {
+ device_ordinal_ = device_ordinal;
+
+ auto status = CUDADriver::Init();
+ if (!status.ok()) {
+ return status;
+ }
+
+ status = CUDADriver::GetDevice(device_ordinal_, &device_);
+ if (!status.ok()) {
+ return status;
+ }
+
+ status = CUDADriver::CreateContext(device_, device_options, &context_);
+ if (!status.ok()) {
+ return status;
+ }
+
+ return CUDADriver::GetComputeCapability(&cc_major_, &cc_minor_, device_);
+}
+
+bool CUDAExecutor::FindOnDiskForComputeCapability(
+ port::StringPiece filename, port::StringPiece canonical_suffix,
+ string *found_filename) const {
+ if (cc_major_ == 0 && cc_minor_ == 0) {
+ return false;
+ }
+
+ // TODO(22689637): Eliminate unnecessary ToString()s when all dependencies
+ // have been migrated.
+ string cc_specific = port::StrCat(filename.ToString(), ".cc", cc_major_,
+ cc_minor_, canonical_suffix.ToString());
+ if (port::FileExists(cc_specific)) {
+ VLOG(2) << "found compute-capability-specific file, using that: "
+ << cc_specific;
+ *found_filename = cc_specific;
+ return true;
+ }
+
+ VLOG(2) << "could not find compute-capability specific file at: "
+ << cc_specific;
+ if (port::FileExists(filename.ToString())) {
+ *found_filename = filename.ToString();
+ return true;
+ }
+
+ return false;
+}
+
+// Returns the path to the running executable.
+// N.B. Derived from //knowledge/smalltalk/background_kb.cc
+// Arg: strip_exe: if true, remove the name of the executable itself from the
+// returned string. Example: calling this from /usr/bin/foo
+// would return /usr/bin.
+static string GetBinaryDir(bool strip_exe) {
+ char exe_path[PATH_MAX] = {0};
+ CHECK_ERR(readlink("/proc/self/exe", exe_path, sizeof(exe_path) - 1));
+ // Make sure it's null-terminated:
+ exe_path[sizeof(exe_path) - 1] = 0;
+
+ if (strip_exe) {
+ // The exe is the last component of the path, so remove one component.
+ std::vector<string> components = port::Split(exe_path, '/');
+ components.pop_back();
+ return port::Join(components, "/");
+ }
+ return exe_path;
+}
+
+// Returns the location of the runfiles directory.
+// This is the directory which "bazel run" sets as the current working directory
+// before the program starts.
+// N.B. The program does not need to be running under "bazel run" for this to
+// return the appropriate runfiles directory.
+static string GetRunfilesDir() {
+ return port::StrCat(GetBinaryDir(false), ".runfiles");
+}
+
+bool CUDAExecutor::GetKernel(const MultiKernelLoaderSpec &spec,
+ KernelBase *kernel) {
+ CUDAKernel *cuda_kernel = AsCUDAKernel(kernel);
+ CUmodule module = nullptr;
+ const string *kernelname;
+
+ const OnDiskKernelLoaderSpec *on_disk_spec = nullptr;
+ bool has_ptx = spec.has_cuda_ptx_on_disk();
+ bool has_cubin = spec.has_cuda_cubin_on_disk();
+ if (has_cubin && (!has_ptx || FLAGS_prefer_cubin_to_ptx)) {
+ on_disk_spec = &spec.cuda_cubin_on_disk();
+ } else if (has_ptx) {
+ on_disk_spec = &spec.cuda_ptx_on_disk();
+ }
+
+ if (on_disk_spec != nullptr) {
+ } else if (spec.has_cuda_ptx_in_memory()) {
+ kernelname = &spec.cuda_ptx_in_memory().kernelname();
+
+ if (cc_major_ == 0 && cc_minor_ == 0) {
+ return false;
+ }
+
+ // Note that the original ptx may be compressed, and the ptx we get below is
+ // the decompressed result. To cache the module we should use the original
+ // (compressed) ptx as the key, because the same compressed ptx may yield
+ // decompressed ptx at different pointer values on each call.
+ const char *ptx = spec.cuda_ptx_in_memory().text(cc_major_, cc_minor_);
+ const char *orig_ptx =
+ spec.cuda_ptx_in_memory().original_text(cc_major_, cc_minor_);
+ if (ptx == nullptr || orig_ptx == nullptr) {
+ ptx = spec.cuda_ptx_in_memory().default_text();
+ orig_ptx = spec.cuda_ptx_in_memory().original_default_text();
+ }
+ if (ptx == nullptr || orig_ptx == nullptr) {
+ LOG(FATAL) << "could not load ptx for kernel " << kernelname;
+ return false;
+ }
+
+ mutex_lock lock{in_memory_modules_mu_};
+ module = in_memory_modules_[orig_ptx];
+
+ if (module == nullptr) {
+ if (g_cubinate == nullptr) {
+ if (!CUDADriver::LoadPtx(context_, ptx, &module)) {
+ return false;
+ }
+ } else {
+ string cubin = g_cubinate(ptx);
+ auto load_status =
+ CUDADriver::LoadCubin(context_, cubin.c_str(), &module);
+ if (!load_status.ok()) {
+ LOG(ERROR) << "failed to load cubin via hook: " << load_status;
+ return false;
+ }
+ }
+ in_memory_modules_[orig_ptx] = module;
+ }
+ } else if (spec.has_cuda_cubin_in_memory()) {
+ kernelname = &spec.cuda_cubin_in_memory().kernelname();
+ const char *cubin = spec.cuda_cubin_in_memory().bytes();
+ mutex_lock lock{in_memory_modules_mu_};
+ module = in_memory_modules_[cubin];
+
+ if (module == nullptr) {
+ auto load_status = CUDADriver::LoadCubin(context_, cubin, &module);
+ if (!load_status.ok()) {
+ LOG(ERROR) << "failed to load CUBIN: " << load_status;
+ return false;
+ }
+
+ in_memory_modules_[cubin] = module;
+ }
+ } else {
+ LOG(WARNING) << "no method of loading CUDA kernel provided";
+ return false;
+ }
+
+ VLOG(2) << "getting function " << kernelname << " from module " << module;
+ if (!CUDADriver::GetModuleFunction(context_, module, kernelname->c_str(),
+ cuda_kernel->cuda_function_ptr())) {
+ return false;
+ }
+
+ // We have to trust the kernel loader spec arity because there doesn't appear
+ // to be a way to reflect on the number of expected arguments w/the CUDA API.
+ cuda_kernel->set_arity(spec.arity());
+
+ KernelMetadata kernel_metadata;
+ if (!GetKernelMetadata(cuda_kernel, &kernel_metadata)) {
+ LOG(WARNING) << "Unable to get metadata for kernel " << kernelname;
+ }
+ kernel->set_metadata(kernel_metadata);
+ kernel->set_name(*kernelname);
+ return true;
+}
+
+bool CUDAExecutor::GetKernelMetadata(CUDAKernel *cuda_kernel,
+ KernelMetadata *kernel_metadata) {
+ int value;
+ if (!CUDADriver::FuncGetAttribute(CU_FUNC_ATTRIBUTE_NUM_REGS,
+ *cuda_kernel->cuda_function_ptr(),
+ &value)) {
+ return false;
+ }
+ kernel_metadata->set_registers_per_thread(value);
+
+ if (!CUDADriver::FuncGetAttribute(CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES,
+ *cuda_kernel->cuda_function_ptr(),
+ &value)) {
+ return false;
+ }
+ kernel_metadata->set_shared_memory_bytes(value);
+
+ return true;
+}
+
+bool CUDAExecutor::Launch(Stream *stream, const ThreadDim &thread_dims,
+ const BlockDim &block_dims, const KernelBase &kernel,
+ const std::vector<KernelArg> &args) {
+ CHECK_EQ(kernel.Arity(), args.size());
+ CUstream custream = AsCUDAStreamValue(stream);
+ const CUDAKernel *cuda_kernel = AsCUDAKernel(&kernel);
+ CUfunction cufunc = cuda_kernel->AsCUDAFunctionValue();
+
+ std::vector<void *> addrs;
+ addrs.reserve(args.size());
+ int shmem_bytes = 0;
+ for (size_t i = 0; i < args.size(); i++) {
+ switch (args[i].type) {
+ case KernelArg::kNormal:
+ addrs.push_back(const_cast<void *>(
+ static_cast<const void *>(args[i].data.begin())));
+ break;
+ case KernelArg::kSharedMemory:
+ shmem_bytes += args[i].bytes;
+ break;
+ default:
+ LOG(ERROR) << "Invalid kernel arg type passed (" << args[i].type
+ << ") for arg " << i;
+ return false;
+ }
+ }
+
+ // Only perform/print the occupancy check once per kernel.
+ launched_kernels_mu_.lock();
+ if (launched_kernels_.find(cufunc) == launched_kernels_.end()) {
+ OccupancyCheck(kernel, thread_dims, block_dims);
+ // TODO(rspringer): Remove elements from launched_kernels_...if we ever
+ // expose a kernel/module deallocation method.
+ launched_kernels_.insert(cufunc);
+ }
+ launched_kernels_mu_.unlock();
+
+ if (cuda_kernel->GetPreferredCacheConfig() !=
+ KernelCacheConfig::kNoPreference) {
+ CUDADriver::FuncSetCacheConfig(cufunc, cuda_kernel->GetCUDACacheConfig());
+ }
+
+ if (!CUDADriver::LaunchKernel(
+ GetCudaContext(stream), cufunc, block_dims.x, block_dims.y,
+ block_dims.z, thread_dims.x, thread_dims.y, thread_dims.z,
+ shmem_bytes, custream, addrs.data(), nullptr /* = extra */)) {
+ LOG(ERROR) << "failed to launch CUDA kernel with args: " << args.size()
+ << "; thread dim: " << thread_dims.ToString()
+ << "; block dim: " << block_dims.ToString();
+ return false;
+ }
+
+ return true;
+}
+
+// This is a non-essential operation; if there's a failure, proceed without
+// logging an error. It's nearly certain that in case of failures, we'd never
+// get here in the first place; these are very low-impact routines.
+void CUDAExecutor::OccupancyCheck(const KernelBase &kernel,
+ const ThreadDim &thread_dims,
+ const BlockDim &block_dims) {
+ VLOG(2) << "Computing kernel occupancy for kernel "
+ << kernel.demangled_name();
+ VLOG(2) << "Thread dimensions (" << thread_dims.x << ", " << thread_dims.y
+ << ", " << thread_dims.z << ")";
+
+ int regs_per_thread;
+ if (!kernel.metadata().registers_per_thread(&regs_per_thread)) {
+ return;
+ }
+
+ int smem_per_block;
+ if (!kernel.metadata().shared_memory_bytes(&smem_per_block)) {
+ return;
+ }
+
+ const DeviceDescription &device_description =
+ kernel.parent()->GetDeviceDescription();
+
+ uint64 blocks_per_sm = CalculateOccupancy(
+ device_description, regs_per_thread, smem_per_block, thread_dims);
+ VLOG(2) << "Resident blocks per SM is " << blocks_per_sm;
+
+ // To increase occupancy, there must be a sufficient number of blocks
+ // available to spread across the SMs at this new, improved occupancy level.
+ int multiprocessor_count = device_description.core_count();
+ int block_count = block_dims.x * block_dims.y * block_dims.z;
+ int available_blocks_per_sm =
+ port::MathUtil::CeilOfRatio(block_count, multiprocessor_count);
+ if (available_blocks_per_sm <= static_cast<int64>(blocks_per_sm)) {
+ VLOG(2) << "Occupancy is limited by number of blocks available per sm.";
+ return;
+ }
+
+ uint64 improved_regs_per_thread = CalculateRegisterLimitForTargetOccupancy(
+ device_description, smem_per_block, thread_dims, blocks_per_sm + 1);
+ if (improved_regs_per_thread != 0) {
+ VLOG(2) << "Reducing register usage from " << regs_per_thread
+ << " to " << improved_regs_per_thread
+ << " could increase resident blocks per SM by one.";
+
+ uint64 reg_reduction = regs_per_thread - improved_regs_per_thread;
+ if (reg_reduction <=
+ static_cast<uint64>(FLAGS_register_occupancy_warning_threshold)) {
+ LOG(INFO) << "Notice: occupancy would increase if register usage was"
+ << " reduced from " << regs_per_thread
+ << " to " << improved_regs_per_thread
+ << " registers per thread for kernel: "
+ << kernel.demangled_name();
+ }
+ } else {
+ VLOG(2) << "Resident blocks per SM cannot be increased by reducing "
+ "register usage.";
+ }
+}
+
+void *CUDAExecutor::Allocate(uint64 size) {
+ return CUDADriver::DeviceAllocate(context_, size);
+}
+
+void *CUDAExecutor::AllocateSubBuffer(DeviceMemoryBase *mem,
+ uint64 offset_bytes, uint64 size_bytes) {
+ // offset and size are in bytes, so char* works as the pointer type.
+ return reinterpret_cast<char *>(mem->opaque()) + offset_bytes;
+}
+
+void CUDAExecutor::Deallocate(DeviceMemoryBase *mem) {
+ // CUDA "sub-buffers" are just pointer + offset, so no dealloc is necessary.
+ if (!mem->is_sub_buffer()) {
+ CUDADriver::DeviceDeallocate(context_, mem->opaque());
+ }
+}
+
+bool CUDAExecutor::HostMemoryRegister(void *location, uint64 size) {
+ if (location == nullptr || size == 0) {
+ LOG(WARNING) << "attempting to register null or zero-sized memory: "
+ << location << "; size " << size;
+ }
+ VLOG(2) << "registering " << location << " size " << size;
+ return CUDADriver::HostRegister(context_, location, size);
+}
+
+bool CUDAExecutor::HostMemoryUnregister(void *location) {
+ VLOG(2) << "unregistering " << location;
+ return CUDADriver::HostUnregister(context_, location);
+}
+
+bool CUDAExecutor::SynchronizeAllActivity() {
+ return CUDADriver::SynchronizeContext(context_);
+}
+
+bool CUDAExecutor::SynchronousMemZero(DeviceMemoryBase *location, uint64 size) {
+ if (reinterpret_cast<uintptr_t>(location->opaque()) % 4 == 0 &&
+ size % 4 == 0) {
+ return CUDADriver::SynchronousMemsetUint32(
+ context_, AsCudaDevicePtr(location), 0x0, size / 4);
+ }
+ return CUDADriver::SynchronousMemsetUint8(context_, AsCudaDevicePtr(location),
+ 0x0, size);
+}
+
+bool CUDAExecutor::SynchronousMemSet(DeviceMemoryBase *location, int value,
+ uint64 size) {
+ if (reinterpret_cast<uintptr_t>(location->opaque()) % 4 == 0 &&
+ size % 4 == 0) {
+ // cudaMemset reinterprets "value" as a uint8.
+ uint8 byte_value = static_cast<uint8>(value);
+ uint32 pattern = (byte_value << 24) | (byte_value << 16) |
+ (byte_value << 8) | byte_value;
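+ // E.g. value = 0xAB expands to pattern = 0xABABABAB, so the 32-bit-wide
+ // memset writes the same byte sequence a byte-wide memset would.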
+ return CUDADriver::SynchronousMemsetUint32(
+ context_, AsCudaDevicePtr(location), pattern, size / 4);
+ }
+ return CUDADriver::SynchronousMemsetUint8(context_, AsCudaDevicePtr(location),
+ value, size);
+}
+
+bool CUDAExecutor::SynchronousMemcpy(DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) {
+ return CUDADriver::SynchronousMemcpyH2D(context_, AsCudaDevicePtr(gpu_dst),
+ host_src, size);
+}
+
+bool CUDAExecutor::SynchronousMemcpy(void *host_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ return CUDADriver::SynchronousMemcpyD2H(context_, host_dst,
+ AsCudaDevicePtr(gpu_src), size);
+}
+
+bool CUDAExecutor::SynchronousMemcpyDeviceToDevice(
+ DeviceMemoryBase *gpu_dst, const DeviceMemoryBase &gpu_src, uint64 size) {
+ return CUDADriver::SynchronousMemcpyD2D(context_, AsCudaDevicePtr(gpu_dst),
+ AsCudaDevicePtr(gpu_src), size);
+}
+
+bool CUDAExecutor::MemZero(Stream *stream, DeviceMemoryBase *location,
+ uint64 size) {
+ return Memset32(stream, location, 0x0, size);
+}
+
+bool CUDAExecutor::Memset32(Stream *stream, DeviceMemoryBase *location,
+ uint32 pattern, uint64 size) {
+ VLOG(2) << "enqueueing memset32 operation onto stream " << stream
+ << " at location " << location << " with size " << size
+ << " and pattern " << std::hex << pattern;
+ CHECK(reinterpret_cast<uintptr_t>(location->opaque()) % 4 == 0 &&
+ size % 4 == 0);
+ return CUDADriver::AsynchronousMemsetUint32(
+ context_, AsCudaDevicePtr(location), pattern, size / 4,
+ AsCUDAStreamValue(stream));
+}
+
+bool CUDAExecutor::Memcpy(Stream *stream, void *host_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size) {
+ return CUDADriver::AsynchronousMemcpyD2H(context_, host_dst,
+ AsCudaDevicePtr(gpu_src), size,
+ AsCUDAStreamValue(stream));
+}
+
+bool CUDAExecutor::Memcpy(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) {
+ return CUDADriver::AsynchronousMemcpyH2D(context_, AsCudaDevicePtr(gpu_dst),
+ host_src, size,
+ AsCUDAStreamValue(stream));
+}
+
+bool CUDAExecutor::MemcpyDeviceToDevice(Stream *stream,
+ DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ return CUDADriver::AsynchronousMemcpyD2D(context_, AsCudaDevicePtr(gpu_dst),
+ AsCudaDevicePtr(gpu_src), size,
+ AsCUDAStreamValue(stream));
+}
+
+bool CUDAExecutor::HostCallback(Stream *stream,
+ std::function<void()> callback) {
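+ // Copy the callback onto the heap so it outlives this call; it is invoked
+ // once by InternalHostCallback and deleted there.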
+ auto callback_ptr = new std::function<void()>(callback);
+ return CUDADriver::AddStreamCallback(context_, AsCUDAStreamValue(stream),
+ InternalHostCallback, callback_ptr);
+}
+
+/* static */ void CUDAExecutor::InternalHostCallback(CUstream stream,
+ CUresult status,
+ void *data) {
+ std::function<void()> *callback =
+ reinterpret_cast<std::function<void()> *>(data);
+ (*callback)();
+ delete callback;
+}
+
+port::Status CUDAExecutor::AllocateEvent(Event *event) {
+ return AsCUDAEvent(event)->Init();
+}
+
+port::Status CUDAExecutor::DeallocateEvent(Event *event) {
+ return AsCUDAEvent(event)->Destroy();
+}
+
+port::Status CUDAExecutor::RecordEvent(Stream *stream, Event *event) {
+ return AsCUDAEvent(event)->Record(AsCUDAStream(stream));
+}
+
+port::Status CUDAExecutor::WaitForEvent(Stream *stream, Event *event) {
+ if (CUDADriver::WaitStreamOnEvent(context_,
+ AsCUDAStream(stream)->cuda_stream(),
+ AsCUDAEvent(event)->cuda_event())) {
+ return port::Status::OK();
+ } else {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf("error recording waiting for CUDA event on stream %p",
+ stream)};
+ }
+}
+
+Event::Status CUDAExecutor::PollForEventStatus(Event *event) {
+ return AsCUDAEvent(event)->PollForStatus();
+}
+
+bool CUDAExecutor::AllocateStream(Stream *stream) {
+ return AsCUDAStream(stream)->Init();
+}
+
+void CUDAExecutor::DeallocateStream(Stream *stream) {
+ CUDAStream *cuda_stream = AsCUDAStream(stream);
+ if (!cuda_stream->IsIdle()) {
+ LOG(ERROR) << "Deallocating stream with pending work";
+ }
+ cuda_stream->Destroy();
+}
+
+bool CUDAExecutor::AllocateTimer(Timer *timer) {
+ return AsCUDATimer(timer)->Init();
+}
+
+void CUDAExecutor::DeallocateTimer(Timer *timer) {
+ AsCUDATimer(timer)->Destroy();
+}
+
+bool CUDAExecutor::CreateStreamDependency(Stream *dependent, Stream *other) {
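+ // Make `dependent` wait on `other` by recording a completion event on
+ // `other` and enqueueing a wait on that event into `dependent`.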
+ CUevent other_completed_event;
+ bool ok =
+ AsCUDAStream(other)->GetOrCreateCompletedEvent(&other_completed_event);
+ if (!ok) {
+ LOG(ERROR) << "failed to get completion event from other; "
+ "therefore, failed to create inter-stream dependency";
+ return false;
+ }
+
+ ok = CUDADriver::RecordEvent(context_, other_completed_event,
+ AsCUDAStreamValue(other))
+ .ok();
+ if (!ok) {
+ LOG(ERROR) << "failed to record completion event; "
+ "therefore, failed to create inter-stream dependency";
+ return false;
+ }
+
+ return CUDADriver::WaitStreamOnEvent(context_, AsCUDAStreamValue(dependent),
+ other_completed_event);
+}
+
+bool CUDAExecutor::StartTimer(Stream *stream, Timer *timer) {
+ return AsCUDATimer(timer)->Start(AsCUDAStream(stream));
+}
+
+bool CUDAExecutor::StopTimer(Stream *stream, Timer *timer) {
+ return AsCUDATimer(timer)->Stop(AsCUDAStream(stream));
+}
+
+bool CUDAExecutor::BlockHostUntilDone(Stream *stream) {
+ return CUDADriver::SynchronizeStream(context_, AsCUDAStreamValue(stream));
+}
+
+blas::BlasSupport *CUDAExecutor::CreateBlas() {
+ PluginRegistry *registry = PluginRegistry::Instance();
+ port::StatusOr<PluginRegistry::BlasFactory> status =
+ registry->GetFactory<PluginRegistry::BlasFactory>(kCudaPlatformId,
+ plugin_config_.blas());
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to retrieve BLAS factory: "
+ << status.status().error_message();
+ return nullptr;
+ }
+
+ return status.ValueOrDie()(this);
+}
+
+dnn::DnnSupport *CUDAExecutor::CreateDnn() {
+ PluginRegistry *registry = PluginRegistry::Instance();
+ port::StatusOr<PluginRegistry::DnnFactory> status =
+ registry->GetFactory<PluginRegistry::DnnFactory>(kCudaPlatformId,
+ plugin_config_.dnn());
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to retrieve DNN factory: "
+ << status.status().error_message();
+ return nullptr;
+ }
+
+ return status.ValueOrDie()(this);
+}
+
+fft::FftSupport *CUDAExecutor::CreateFft() {
+ PluginRegistry *registry = PluginRegistry::Instance();
+ port::StatusOr<PluginRegistry::FftFactory> status =
+ registry->GetFactory<PluginRegistry::FftFactory>(kCudaPlatformId,
+ plugin_config_.fft());
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to retrieve FFT factory: "
+ << status.status().error_message();
+ return nullptr;
+ }
+
+ return status.ValueOrDie()(this);
+}
+
+rng::RngSupport *CUDAExecutor::CreateRng() {
+ PluginRegistry *registry = PluginRegistry::Instance();
+ port::StatusOr<PluginRegistry::RngFactory> status =
+ registry->GetFactory<PluginRegistry::RngFactory>(kCudaPlatformId,
+ plugin_config_.rng());
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to retrieve RNG factory: "
+ << status.status().error_message();
+ return nullptr;
+ }
+
+ return status.ValueOrDie()(this);
+}
+
+// TODO(rspringer): Remove in b/18544742.
+bool CUDAExecutor::SupportsDnn() const {
+ return true;
+}
+
+bool CUDAExecutor::CanEnablePeerAccessTo(StreamExecutorInterface *other) {
+ CUDAExecutor *cuda_other = static_cast<CUDAExecutor *>(other);
+ return CUDADriver::CanEnablePeerAccess(context_, cuda_other->context_);
+}
+
+port::Status CUDAExecutor::EnablePeerAccessTo(StreamExecutorInterface *other) {
+ CUDAExecutor *cuda_other = static_cast<CUDAExecutor *>(other);
+ return CUDADriver::EnablePeerAccess(context_, cuda_other->context_);
+}
+
+SharedMemoryConfig CUDAExecutor::GetDeviceSharedMemoryConfig() {
+ port::StatusOr<CUsharedconfig> cuda_config =
+ CUDADriver::ContextGetSharedMemConfig(context_);
+ if (!cuda_config.ok()) {
+ // Don't log; the failed call will log necessary output.
+ return SharedMemoryConfig::kDefault;
+ }
+
+ switch (cuda_config.ValueOrDie()) {
+ case CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE:
+ return SharedMemoryConfig::kDefault;
+ case CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE:
+ return SharedMemoryConfig::kFourByte;
+ case CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE:
+ return SharedMemoryConfig::kEightByte;
+ default:
+ LOG(FATAL) << "Invalid shared memory configuration returned: "
+ << cuda_config.ValueOrDie();
+ }
+}
+
+port::Status CUDAExecutor::SetDeviceSharedMemoryConfig(
+ SharedMemoryConfig config) {
+ CUsharedconfig cuda_config;
+ switch (config) {
+ case SharedMemoryConfig::kDefault:
+ cuda_config = CU_SHARED_MEM_CONFIG_DEFAULT_BANK_SIZE;
+ break;
+ case SharedMemoryConfig::kFourByte:
+ cuda_config = CU_SHARED_MEM_CONFIG_FOUR_BYTE_BANK_SIZE;
+ break;
+ case SharedMemoryConfig::kEightByte:
+ cuda_config = CU_SHARED_MEM_CONFIG_EIGHT_BYTE_BANK_SIZE;
+ break;
+ default:
+ LOG(FATAL) << "Invalid shared memory configuration specified: "
+ << static_cast<int>(config);
+ }
+ return CUDADriver::ContextSetSharedMemConfig(context_, cuda_config);
+}
+
+bool CUDAExecutor::DeviceMemoryUsage(int64 *free, int64 *total) const {
+ return CUDADriver::GetDeviceMemoryInfo(context_, free, total);
+}
+
+bool CUDAExecutor::GetSymbol(const string& symbol_name, void **mem,
+ size_t *bytes) {
+ { // give limited scope to mutex_lock
+ mutex_lock lock{disk_modules_mu_};
+ for (auto &it : disk_modules_) {
+ if (CUDADriver::GetModuleSymbol(context_, it.second, symbol_name.c_str(),
+ reinterpret_cast<CUdeviceptr *>(mem),
+ bytes)) {
+ return true;
+ }
+ }
+ }
+
+ { // give limited scope to mutex_lock
+ mutex_lock lock{in_memory_modules_mu_};
+ for (auto &it : in_memory_modules_) {
+ if (CUDADriver::GetModuleSymbol(context_, it.second, symbol_name.c_str(),
+ reinterpret_cast<CUdeviceptr *>(mem),
+ bytes)) {
+ return true;
+ }
+ }
+ }
+
+ LOG(INFO) << "Falied to find symbol in any modules: " << symbol_name;
+ return false;
+}
+
+bool CUDAExecutor::FillBlockDimLimit(BlockDim *block_dim_limit) const {
+ // The BlockDim name is a mismatch against these GRID_DIM_* queries because
+ // we use BlockDims to express the dimensions of blocks within a grid
+ // (as opposed to ThreadDim which expresses the dimensions of threads
+ // within a block).
+ int x, y, z;
+ if (!CUDADriver::GetGridLimits(&x, &y, &z, device_)) {
+ return false;
+ }
+
+ block_dim_limit->x = x;
+ block_dim_limit->y = y;
+ block_dim_limit->z = z;
+ return true;
+}
+
+KernelArg CUDAExecutor::DeviceMemoryToKernelArg(
+ const DeviceMemoryBase &gpu_mem) const {
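+ // The kernel argument is the device pointer value itself, so copy the bytes
+ // of the pointer (not the memory it addresses) into the arg data buffer.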
+ const void* arg = gpu_mem.opaque();
+ const uint8 *arg_ptr = reinterpret_cast<const uint8 *>(&arg);
+
+ KernelArg kernel_arg;
+ kernel_arg.type = KernelArg::kNormal;
+ kernel_arg.data = port::InlinedVector<uint8, 4>(arg_ptr, arg_ptr + sizeof(arg));
+ kernel_arg.bytes = sizeof(arg);
+ return kernel_arg;
+}
+
+bool CUDAExecutor::SupportsBlas() const { return true; }
+
+bool CUDAExecutor::SupportsFft() const { return true; }
+
+bool CUDAExecutor::SupportsRng() const { return true; }
+
+void *CUDAExecutor::CudaContextHack() { return context_; }
+
+CUcontext CUDAExecutor::cuda_context() { return context_; }
+
+// Attempts to read the NUMA node corresponding to the GPU device's PCI bus out
+// of SysFS. Returns -1 if it cannot.
+//
+// For anything more complicated/prod-focused than this, you'll likely want to
+// turn to gsys' topology modeling.
+static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+ VLOG(2) << "trying to read NUMA node for device ordinal: " << device_ordinal;
+ static const int kUnknownNumaNode = -1;
+
+ if (pci_bus_id.empty()) {
+ LOG(INFO) << "no PCI bus ID for device ordinal: " << device_ordinal;
+ return kUnknownNumaNode;
+ }
+
+ string filename =
+ port::Printf("/sys/bus/pci/devices/%s/numa_node", pci_bus_id.c_str());
+
+ // We have to use fopen/fread here so that the device properties can be
+ // populated before the InitGoogle procedure has completed (after which we
+ // could use the file::* utilities).
+ FILE *file = fopen(filename.c_str(), "r");
+ if (file == nullptr) {
+ LOG(ERROR) << "could not open file to read NUMA node: " << filename;
+ return kUnknownNumaNode;
+ }
+
+ string content;
+ char buf[32];
+ size_t did_read = fread(buf, sizeof(buf[0]), sizeof(buf) - 1, file);
+ buf[did_read] = '\0';
+ content = buf;
+ fclose(file);
+
+ int32 value;
+ if (port::safe_strto32(content, &value)) {
+ if (value < 0) { // See http://b/18228951 for details on this path.
+ LOG(INFO) << "successful NUMA node read from SysFS had negative value ("
+ << value << "), but there must be at least one NUMA node"
+ ", so returning NUMA node zero";
+ return 0;
+ }
+ return value;
+ }
+
+ LOG(WARNING)
+ << "could not convert SysFS file contents to integral NUMA node value: "
+ << content;
+
+ return kUnknownNumaNode;
+}
+
+// Set of compute capability specific device parameters that cannot be
+// queried from the driver API. These values instead are baked into a
+// lookup table indexed by compute capability version.
+struct UnqueryableDeviceParams {
+ int cc_major;
+ int cc_minor;
+ uint64 blocks_per_core_limit;
+ uint64 registers_per_core_limit;
+ uint64 registers_per_thread_limit;
+ uint64 warp_alloc_granularity;
+ uint64 register_alloc_granularity;
+ uint64 shared_memory_alloc_granularity;
+};
+
+static const UnqueryableDeviceParams kAllUnqueryableDeviceParams[] = {
+ {
+ 3, 5, // compute capability (3.5)
+ 16, // blocks_per_core_limit
+ 64 * 1024, // registers_per_core_limit
+ 255, // registers_per_thread_limit
+ 4, // warp_alloc_granularity
+ 256, // register_alloc_granularity
+ 256 // shared_memory_alloc_granularity
+ }
+};
+
+DeviceDescription *CUDAExecutor::PopulateDeviceDescription() const {
+ internal::DeviceDescriptionBuilder builder;
+
+ {
+ int driver_version = 0;
+ (void)CUDADriver::GetDriverVersion(&driver_version);
+ string augmented_driver_version = port::Printf(
+ "%d (%s)", driver_version,
+ DriverVersionStatusToString(Diagnostician::FindDsoVersion()).c_str());
+ builder.set_driver_version(augmented_driver_version);
+ }
+
+ {
+ string pci_bus_id = CUDADriver::GetPCIBusID(device_);
+
+ // Lower the hex characters to match sysfs.
+ pci_bus_id = port::Lowercase(pci_bus_id);
+ builder.set_pci_bus_id(pci_bus_id);
+
+ // Read the NUMA node corresponding to the PCI bus ID out of sysfs.
+ int numa_node = TryToReadNumaNode(pci_bus_id, device_ordinal_);
+ builder.set_numa_node(numa_node);
+ }
+
+ CUdevprop prop;
+ if (CUDADriver::GetDeviceProperties(&prop, device_ordinal_)) {
+ builder.set_threads_per_block_limit(prop.maxThreadsPerBlock);
+
+ ThreadDim thread_dim_limit;
+ thread_dim_limit.x = prop.maxThreadsDim[0];
+ thread_dim_limit.y = prop.maxThreadsDim[1];
+ thread_dim_limit.z = prop.maxThreadsDim[2];
+ builder.set_thread_dim_limit(thread_dim_limit);
+
+ float clock_rate_ghz = static_cast<float>(prop.clockRate) / 1e6;
+ builder.set_clock_rate_ghz(clock_rate_ghz);
+ }
+
+ {
+ bool ecc_enabled = false;
+ (void)CUDADriver::IsEccEnabled(device_, &ecc_enabled);
+ builder.set_ecc_enabled(ecc_enabled);
+ }
+
+ {
+ uint64 device_memory_size = -1;
+ (void)CUDADriver::GetDeviceTotalMemory(device_, &device_memory_size);
+ builder.set_device_memory_size(device_memory_size);
+ }
+
+ {
+ BlockDim block_dim_limit;
+ FillBlockDimLimit(&block_dim_limit);
+ builder.set_block_dim_limit(block_dim_limit);
+ }
+
+ {
+ string device_name;
+ (void)CUDADriver::GetDeviceName(device_, &device_name);
+ builder.set_name(device_name);
+ }
+
+ for (size_t i = 0; i < ARRAYSIZE(kAllUnqueryableDeviceParams); i++) {
+ const auto &params = kAllUnqueryableDeviceParams[i];
+ if (params.cc_major == cc_major_ && params.cc_minor == cc_minor_) {
+ builder.set_blocks_per_core_limit(params.blocks_per_core_limit);
+ builder.set_registers_per_core_limit(params.registers_per_core_limit);
+ builder.set_registers_per_thread_limit(params.registers_per_thread_limit);
+ builder.set_warp_alloc_granularity(params.warp_alloc_granularity);
+ builder.set_register_alloc_granularity(params.register_alloc_granularity);
+ builder.set_shared_memory_alloc_granularity(
+ params.shared_memory_alloc_granularity);
+ }
+ }
+
+ builder.set_platform_version(
+ port::StrCat("Compute Capability ", cc_major_, ".", cc_minor_));
+
+ // TODO(leary) should be a way to query this from the driver, but this is
+ // unlikely to change for us any time soon.
+ builder.set_device_address_bits(64);
+
+ builder.set_device_vendor("NVIDIA Corporation");
+ builder.set_cuda_compute_capability(cc_major_, cc_minor_);
+ builder.set_shared_memory_per_core(
+ CUDADriver::GetMaxSharedMemoryPerCore(device_).ValueOrDie());
+ builder.set_shared_memory_per_block(
+ CUDADriver::GetMaxSharedMemoryPerBlock(device_).ValueOrDie());
+ builder.set_core_count(
+ CUDADriver::GetMultiprocessorCount(device_).ValueOrDie());
+ builder.set_threads_per_core_limit(
+ CUDADriver::GetMaxThreadsPerMultiprocessor(device_).ValueOrDie());
+ builder.set_registers_per_block_limit(
+ CUDADriver::GetMaxRegistersPerBlock(device_).ValueOrDie());
+ builder.set_threads_per_warp(
+ CUDADriver::GetThreadsPerWarp(device_).ValueOrDie());
+
+ auto built = builder.Build();
+ return built.release();
+}
+
+} // namespace cuda
+
+namespace gpu = ::perftools::gputools;
+
+void initialize_cuda_gpu_executor() {
+ port::StatusOr<void *> status =
+ gpu::internal::CachedDsoLoader::GetLibcudaDsoHandle();
+ if (!status.ok()) {
+ gpu::cuda::Diagnostician::LogDriverVersionInformation();
+ LOG(INFO) << "LD_LIBRARY_PATH: " << getenv("LD_LIBRARY_PATH");
+ LOG(INFO) << "failed to find libcuda.so on this system: "
+ << status.status();
+ }
+
+ // TODO(b/22689637): Temporary until users are migrated off of PlatformKind.
+ gpu::PluginRegistry::Instance()->MapPlatformKindToId(
+ gpu::PlatformKind::kCuda, gpu::cuda::kCudaPlatformId);
+
+ *gpu::internal::MakeCUDAExecutorImplementation() = [](
+ const gpu::PluginConfig &config) {
+ return new gpu::cuda::CUDAExecutor{config};
+ };
+
+ *gpu::internal::MakeCUDAKernelImplementation() = []() {
+ return new gpu::cuda::CUDAKernel;
+ };
+
+ *gpu::internal::MakeCUDAEventImplementation() = [](
+ gpu::StreamExecutor *parent) {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ static_cast<gpu::cuda::CUDAExecutor *>(parent->implementation());
+ return new gpu::cuda::CUDAEvent{cuda_executor};
+ };
+
+ *gpu::internal::MakeCUDAStreamImplementation() = [](
+ gpu::StreamExecutor *parent) {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ static_cast<gpu::cuda::CUDAExecutor *>(parent->implementation());
+ return new gpu::cuda::CUDAStream{cuda_executor};
+ };
+ *gpu::internal::MakeCUDATimerImplementation() = [](
+ gpu::StreamExecutor *parent) {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ static_cast<gpu::cuda::CUDAExecutor *>(parent->implementation());
+ return new gpu::cuda::CUDATimer{cuda_executor};
+ };
+}
+
+} // namespace gputools
+} // namespace perftools
+
+REGISTER_MODULE_INITIALIZER(
+ cuda_gpu_executor, {perftools::gputools::initialize_cuda_gpu_executor();});
diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.h b/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
new file mode 100644
index 0000000000..fda89b9738
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.h
@@ -0,0 +1,270 @@
+// The CUDA implementation of the StreamExecutorInterface functionality.
+// CUDA inclusions are ideally confined to this implementation file.
+//
+// The notions from the StreamExecutor basically correspond to the CUDA streams
+// programming model provided by the libcuda.so driver APIs, so we don't have
+// to do much more than wrap the calls to the libraries appropriately.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_GPU_EXECUTOR_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_GPU_EXECUTOR_H_
+
+#include <map>
+#include <set>
+
+#include "tensorflow/stream_executor/cuda/cuda_kernel.h"
+#include "tensorflow/stream_executor/event.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+namespace blas {
+class BlasSupport;
+}
+namespace internal {
+class RngSupport;
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// CUDA-platform implementation of the platform-agnostic
+// StreamExecutorInterface.
+class CUDAExecutor : public internal::StreamExecutorInterface {
+ public:
+ // plugin_config specifies which plugin implementations (e.g. BLAS, FFT,
+ // RNG, DNN) to use with this executor.
+ explicit CUDAExecutor(const PluginConfig &plugin_config)
+ : device_(0),
+ context_(nullptr),
+ device_ordinal_(0),
+ cc_major_(0),
+ cc_minor_(0),
+ plugin_config_(plugin_config) {}
+
+ // See the corresponding StreamExecutor methods for method comments on the
+ // following overrides.
+
+ ~CUDAExecutor() override;
+
+ port::Status Init(int device_ordinal, DeviceOptions device_options) override;
+
+ bool GetKernel(const MultiKernelLoaderSpec &spec,
+ KernelBase *kernel) override;
+
+ bool Launch(Stream *stream, const ThreadDim &thread_dims,
+ const BlockDim &block_dims, const KernelBase &k,
+ const std::vector<KernelArg> &args) override;
+
+ void *Allocate(uint64 size) override;
+
+ void *AllocateSubBuffer(DeviceMemoryBase *mem, uint64 offset_bytes,
+ uint64 size_bytes) override;
+
+ void Deallocate(DeviceMemoryBase *mem) override;
+
+ // CUDA allocation/registration functions are necessary because the driver
+ // internally sets up buffers for DMA operations (and page locks them).
+ // There's no external interface for us to otherwise control these DMA
+ // settings.
+ void *HostMemoryAllocate(uint64 size) override {
+ return CUDADriver::HostAllocate(context_, size);
+ }
+
+ void HostMemoryDeallocate(void *location) override {
+ return CUDADriver::HostDeallocate(context_, location);
+ }
+
+ bool HostMemoryRegister(void *location, uint64 size) override;
+
+ bool HostMemoryUnregister(void *location) override;
+
+ bool SynchronizeAllActivity() override;
+
+ bool SynchronousMemZero(DeviceMemoryBase *location, uint64 size) override;
+
+ bool SynchronousMemSet(DeviceMemoryBase *location, int value,
+ uint64 size) override;
+
+ bool SynchronousMemcpy(DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size) override;
+
+ bool SynchronousMemcpy(void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size) override;
+
+ bool SynchronousMemcpyDeviceToDevice(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) override;
+
+ bool MemZero(Stream *stream, DeviceMemoryBase *location,
+ uint64 size) override;
+ bool Memset32(Stream *stream, DeviceMemoryBase *location, uint32 pattern,
+ uint64 size) override;
+
+ bool Memcpy(Stream *stream, void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size) override;
+
+ bool Memcpy(Stream *stream, DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size) override;
+
+ bool MemcpyDeviceToDevice(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) override;
+
+ bool HostCallback(Stream *stream, std::function<void()> callback) override;
+
+ bool AllocateStream(Stream *stream) override;
+
+ void DeallocateStream(Stream *stream) override;
+
+ bool CreateStreamDependency(Stream *dependent, Stream *other) override;
+
+ bool AllocateTimer(Timer *timer) override;
+
+ void DeallocateTimer(Timer *timer) override;
+
+ bool StartTimer(Stream *stream, Timer *timer) override;
+
+ bool StopTimer(Stream *stream, Timer *timer) override;
+
+ port::Status AllocateEvent(Event *event) override;
+
+ port::Status DeallocateEvent(Event *event) override;
+
+ port::Status RecordEvent(Stream *stream, Event *event) override;
+
+ port::Status WaitForEvent(Stream *stream, Event *event) override;
+
+ Event::Status PollForEventStatus(Event *event) override;
+
+ bool BlockHostUntilDone(Stream *stream) override;
+
+ int PlatformDeviceCount() override { return CUDADriver::GetDeviceCount(); }
+
+ port::Status EnablePeerAccessTo(StreamExecutorInterface *other) override;
+
+ bool CanEnablePeerAccessTo(StreamExecutorInterface *other) override;
+
+ SharedMemoryConfig GetDeviceSharedMemoryConfig() override;
+
+ port::Status SetDeviceSharedMemoryConfig(SharedMemoryConfig config) override;
+
+ bool DeviceMemoryUsage(int64 *free, int64 *total) const override;
+
+ // Search for the symbol and returns a device pointer and size.
+ // Returns false if symbol does not exist.
+ bool GetSymbol(const string& symbol_name, void **mem, size_t *bytes) override;
+
+ DeviceDescription *PopulateDeviceDescription() const override;
+
+ // Populates the block_dim_limit by querying the device driver API. If an
+ // error occurs at any point while asking the driver for block dim limits, it
+ // will be only partially populated as a result, and an error will be logged.
+ bool FillBlockDimLimit(BlockDim *block_dim_limit) const;
+
+ KernelArg DeviceMemoryToKernelArg(
+ const DeviceMemoryBase &gpu_mem) const override;
+
+ bool SupportsBlas() const override;
+
+ blas::BlasSupport *CreateBlas() override;
+
+ bool SupportsFft() const override;
+
+ fft::FftSupport *CreateFft() override;
+
+ bool SupportsRng() const override;
+
+ rng::RngSupport *CreateRng() override;
+
+ bool SupportsDnn() const override;
+
+ dnn::DnnSupport *CreateDnn() override;
+
+ void *CudaContextHack() override;
+
+ CUcontext cuda_context();
+
+ private:
+ // Attempts to find a more specific version of the file indicated by
+ // filename by looking for compute-capability-specific suffixed versions; i.e.
+ // looking for "foo.ptx" will check to see if "foo.ptx.cc30.ptx" is present if
+ // we're on a compute capability 3.0 machine.
+ bool FindOnDiskForComputeCapability(port::StringPiece filename,
+ port::StringPiece canonical_suffix,
+ string *found_filename) const;
+
+ // Host callback landing routine invoked by CUDA.
+ // data: the user-provided callback passed to HostCallback() above, captured
+ // as a std::function<void()>. Allocated/initialized inside HostCallback()
+ // and owned and deleted by this call.
+ static void InternalHostCallback(CUstream stream, CUresult status,
+ void *data);
+
+ // Collects metadata for the specified kernel.
+ bool GetKernelMetadata(CUDAKernel *cuda_kernel,
+ KernelMetadata *kernel_metadata);
+
+ // Determines if the given kernel's occupancy could be improved by only
+ // slightly reducing its register usage. If so, a message is emitted to the
+ // INFO log. The warning threshold is controlled by the flag
+ // register_occupancy_warning_threshold.
+ void OccupancyCheck(const KernelBase &kernel, const ThreadDim &thread_dims,
+ const BlockDim &block_dims);
+
+ // Guards the on-disk-module mapping.
+ mutex disk_modules_mu_;
+
+ // Mapping from filename to CUmodule, if it was already retrieved.
+ // Multiple CUfunctions are usually obtained from a single CUmodule so we
+ // attempt to hit in this mapping first, before retrieving it.
+ std::map<string, CUmodule> disk_modules_ GUARDED_BY(disk_modules_mu_);
+
+ // Guards the in-memory-module mapping.
+ mutex in_memory_modules_mu_;
+
+ std::map<const char *, CUmodule> in_memory_modules_
+ GUARDED_BY(in_memory_modules_mu_);
+
+ // Guards the launched kernel set.
+ mutex launched_kernels_mu_;
+
+ // Keeps track of the set of launched kernels. Currently used to suppress the
+ // occupancy check on subsequent launches.
+ std::set<CUfunction> launched_kernels_ GUARDED_BY(launched_kernels_mu_);
+
+ // Handle for the CUDA device being operated on. Immutable
+ // post-initialization.
+ CUdevice device_;
+
+ // Handle for session with the library/driver. Immutable post-initialization.
+ CUcontext context_;
+
+ // The device ordinal value that this executor was initialized with; recorded
+ // for use in getting device metadata. Immutable post-initialization.
+ int device_ordinal_;
+
+ // The major version of the compute capability for device_.
+ int cc_major_;
+
+ // The minor version of the compute capability for device_.
+ int cc_minor_;
+
+ // The plugin configuration associated with this instance.
+ PluginConfig plugin_config_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CUDAExecutor);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_GPU_EXECUTOR_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_helpers.h b/tensorflow/stream_executor/cuda/cuda_helpers.h
new file mode 100644
index 0000000000..2c5311cb3b
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_helpers.h
@@ -0,0 +1,95 @@
+// Common helper functions used for dealing with CUDA API datatypes.
+//
+// These are typically placed here for use by multiple source components (for
+// example, BLAS and executor components).
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_HELPERS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_HELPERS_H_
+
+#include <stddef.h>
+#include <complex>
+
+#include "third_party/gpus/cuda/include/cuComplex.h"
+#include "third_party/gpus/cuda/include/cuda.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+template <typename ElemT>
+class DeviceMemory;
+
+namespace cuda {
+
+// Converts a const DeviceMemory reference to its underlying typed pointer in
+// CUDA device memory.
+template <typename T>
+const T *CUDAMemory(const DeviceMemory<T> &mem) {
+ return static_cast<const T *>(mem.opaque());
+}
+
+// Converts a (non-const) DeviceMemory pointer reference to its underlying
+// typed pointer in CUDA device memory.
+template <typename T>
+T *CUDAMemoryMutable(DeviceMemory<T> *mem) {
+ return static_cast<T *>(mem->opaque());
+}
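+
+// Illustrative usage (editorial sketch; `mem` is a hypothetical
+// DeviceMemory<float> owned by the caller):
+//
+//   const float *src = CUDAMemory(mem);    // read-only device pointer
+//   float *dst = CUDAMemoryMutable(&mem);  // mutable device pointer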
+
+CUstream AsCUDAStreamValue(Stream *stream);
+
+static_assert(sizeof(std::complex<float>) == sizeof(cuComplex),
+ "std::complex<float> and cuComplex should have the same size");
+static_assert(offsetof(cuComplex, x) == 0,
+ "The real part of cuComplex should appear first.");
+static_assert(sizeof(std::complex<double>) == sizeof(cuDoubleComplex),
+ "std::complex<double> and cuDoubleComplex should have the same "
+ "size");
+static_assert(offsetof(cuDoubleComplex, x) == 0,
+ "The real part of cuDoubleComplex should appear first.");
+
+// Type traits to get CUDA complex types from std::complex<>.
+
+template <typename T>
+struct CUDAComplexT {
+ typedef T type;
+};
+
+template <>
+struct CUDAComplexT<std::complex<float>> {
+ typedef cuComplex type;
+};
+
+template <>
+struct CUDAComplexT<std::complex<double>> {
+ typedef cuDoubleComplex type;
+};
+
+// Converts pointers of std::complex<> to pointers of
+// cuComplex/cuDoubleComplex. No type conversion for non-complex types.
+
+template <typename T>
+inline const typename CUDAComplexT<T>::type *CUDAComplex(const T *p) {
+ return reinterpret_cast<const typename CUDAComplexT<T>::type *>(p);
+}
+
+template <typename T>
+inline typename CUDAComplexT<T>::type *CUDAComplex(T *p) {
+ return reinterpret_cast<typename CUDAComplexT<T>::type *>(p);
+}
+
+// Converts values of std::complex<float/double> to values of
+// cuComplex/cuDoubleComplex.
+inline cuComplex CUDAComplexValue(std::complex<float> val) {
+ return {val.real(), val.imag()};
+}
+
+inline cuDoubleComplex CUDAComplexValue(std::complex<double> val) {
+ return {val.real(), val.imag()};
+}
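+
+// Illustrative usage (editorial sketch; `alpha` is a hypothetical scalar and
+// GetHostBuffer() a hypothetical source of a host-side buffer that a BLAS
+// wrapper might hand to a CUDA library call):
+//
+//   std::complex<float> alpha(2.0f, 0.5f);
+//   cuComplex cu_alpha = CUDAComplexValue(alpha);        // by value
+//   const std::complex<float> *data = GetHostBuffer();   // hypothetical
+//   const cuComplex *cu_data = CUDAComplex(data);        // reinterpreted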
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_HELPERS_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_kernel.h b/tensorflow/stream_executor/cuda/cuda_kernel.h
new file mode 100644
index 0000000000..e8ad3955e9
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_kernel.h
@@ -0,0 +1,115 @@
+// The CUDA implementation of the StreamExecutorInterface functionality.
+// CUDA inclusions are ideally confined to this implementation file.
+//
+// The notions from the StreamExecutor basically correspond to the CUDA streams
+// programming model provided by the libcuda.so driver APIs, so we don't have
+// to do much more than wrap the calls to the libraries appropriately.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_KERNEL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_KERNEL_H_
+
+#include "tensorflow/stream_executor/kernel_cache_config.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/lib/casts.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "third_party/gpus/cuda/include/cuda.h"
+
+#ifdef PLATFORMS_GPUS_CUDA_DYNAMIC_LIBCUDA_DYNAMIC_LIBCUDA_H_
+#error \
+ "No driver calls in this file, wrap driver functionality in cuda_driver.cc."
+#endif
+
+#ifdef __CUDA_RUNTIME_H__
+#error \
+ "CUDA runtime being included into CUDA GPU executor; should be driver only."
+#endif
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// Wraps a CUfunction to implement the platform-independent KernelInterface.
+class CUDAKernel : public internal::KernelInterface {
+ public:
+ CUDAKernel() : cuda_function_(nullptr), arity_(0),
+ preferred_cache_config_(KernelCacheConfig::kNoPreference) {}
+
+ // Note that the function is unloaded when the module is unloaded, and the
+ // module that the function is contained in is owned by the CUDAExecutor.
+ ~CUDAKernel() override {}
+
+ // As arity cannot be reflected upon using the CUDA API, the arity is
+ // explicitly set during the CUDAExecutor::GetKernel initialization process.
+ void set_arity(unsigned arity) { arity_ = arity; }
+ unsigned Arity() const override { return arity_; }
+
+ // Returns the CUfunction value for passing to the CUDA API.
+ CUfunction AsCUDAFunctionValue() const {
+ DCHECK(cuda_function_ != nullptr);
+ return const_cast<CUfunction>(cuda_function_);
+ }
+
+ // Returns the slot that the CUfunction is stored within for this object,
+ // for the CUDA API which wants to load into a CUfunction*.
+ CUfunction *cuda_function_ptr() { return &cuda_function_; }
+
+ // CUDA supports setting the preferred cache configuration of a CUfunction
+ // (more-or-less equivalent to a CUDAKernel). We support this via the below
+ // functions; users can set a preference, and that is applied when the kernel
+ // is [lazy-]loaded (in CUDAExecutor::Launch). The alternative would be to
+ // load the kernel & set the preference when the user calls the setter below;
+ // either approach is valid.
+ // Sets the current kernel cache configuration preference.
+ void SetPreferredCacheConfig(KernelCacheConfig config) override {
+ preferred_cache_config_ = config;
+ }
+
+ // Returns the current kernel cache configuration preference.
+ KernelCacheConfig GetPreferredCacheConfig() const override {
+ return preferred_cache_config_;
+ }
+
+ // Returns the current kernel cache configuration preference as a
+ // CUfunc_cache.
+ CUfunc_cache GetCUDACacheConfig() const {
+ switch (preferred_cache_config_) {
+ case KernelCacheConfig::kNoPreference:
+ return CU_FUNC_CACHE_PREFER_NONE;
+ case KernelCacheConfig::kPreferShared:
+ return CU_FUNC_CACHE_PREFER_SHARED;
+ case KernelCacheConfig::kPreferL1:
+ return CU_FUNC_CACHE_PREFER_L1;
+ case KernelCacheConfig::kPreferEqual:
+ return CU_FUNC_CACHE_PREFER_EQUAL;
+ default:
+ LOG(FATAL) << "Unknown KernelCacheConfig"
+ << static_cast<int32>(preferred_cache_config_);
+ }
+ }
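+
+ // Illustrative usage (editorial sketch; `kernel` is a hypothetical
+ // CUDAKernel* for a shared-memory-heavy kernel):
+ //
+ //   kernel->SetPreferredCacheConfig(KernelCacheConfig::kPreferShared);
+ //
+ // The preference takes effect when the kernel is loaded for launch.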
+
+ private:
+ CUfunction cuda_function_; // Wrapped CUDA kernel handle.
+ unsigned arity_; // Number of formal parameters the kernel takes.
+
+ // Preferred (but not required) cache configuration for this kernel.
+ KernelCacheConfig preferred_cache_config_;
+};
+
+// Given a platform-independent kernel datatype, returns the (const) internal
+// CUDA platform implementation pointer.
+inline const CUDAKernel *AsCUDAKernel(const KernelBase *kernel) {
+ return static_cast<const CUDAKernel *>(kernel->implementation());
+}
+
+// Given a platform-independent kernel datatype, returns the (non-const)
+// internal CUDA platform implementation pointer.
+inline CUDAKernel *AsCUDAKernel(KernelBase *kernel) {
+ return static_cast<CUDAKernel *>(kernel->implementation());
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_KERNEL_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_platform.cc b/tensorflow/stream_executor/cuda/cuda_platform.cc
new file mode 100644
index 0000000000..ef88b89eda
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_platform.cc
@@ -0,0 +1,172 @@
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/ptr_util.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+PLATFORM_DEFINE_ID(kCudaPlatformId);
+
+CudaPlatform::CudaPlatform()
+ : name_("CUDA"), min_numa_node_(0), limit_numa_node_(0) {}
+
+CudaPlatform::~CudaPlatform() {}
+
+// Due to legacy issues in user code, we can't currently call InspectNumaNodes
+// at module initialization time, because non-GPU programs still include this
+// plugin via various methods. Instead, it has to be init-on-reference.
+void CudaPlatform::InspectNumaNodes() {
+ // To get NUMA node information, we need to create all executors, so we can
+ // examine their device descriptions to see their bus assignments.
+ static bool initialized = false;
+ static mutex numa_mutex(LINKER_INITIALIZED);
+ mutex_lock lock(numa_mutex);
+ if (initialized) {
+ return;
+ }
+
+ StreamExecutorConfig config;
+ for (int i = 0; i < VisibleDeviceCount(); i++) {
+ config.ordinal = i;
+ StreamExecutor* exec = GetExecutor(config).ValueOrDie();
+ if (i == 0) {
+ // NUMA nodes may not start at 0, so set the minimum node based on the
+ // first executor we see.
+ min_numa_node_ = exec->GetDeviceDescription().numa_node();
+ limit_numa_node_ = min_numa_node_ + 1;
+ } else {
+ min_numa_node_ =
+ std::min(min_numa_node_, exec->GetDeviceDescription().numa_node());
+ limit_numa_node_ = std::max(limit_numa_node_,
+ exec->GetDeviceDescription().numa_node() + 1);
+ }
+ }
+ initialized = true;
+}
+
+int CudaPlatform::BusCount() {
+ InspectNumaNodes();
+ return limit_numa_node_ - min_numa_node_;
+}
+
+int CudaPlatform::DeviceToBus(int device_ordinal) {
+ StreamExecutorConfig config;
+ config.ordinal = device_ordinal;
+ StreamExecutor* exec = GetExecutor(config).ValueOrDie();
+ return exec->GetDeviceDescription().numa_node() - min_numa_node_;
+}
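+
+// Worked example (editorial): on a machine whose GPUs report NUMA nodes
+// {1, 1, 2, 2}, InspectNumaNodes() sets min_numa_node_ = 1 and
+// limit_numa_node_ = 3, so BusCount() returns 2 and DeviceToBus() maps device
+// ordinals {0, 1, 2, 3} to buses {0, 0, 1, 1}.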
+
+port::StatusOr<StreamExecutor*> CudaPlatform::FirstExecutorForBus(
+ int bus_ordinal) {
+ InspectNumaNodes();
+ CHECK_LT(bus_ordinal, BusCount()) << "bus ordinal out of available range";
+ for (int i = 0; i < VisibleDeviceCount(); i++) {
+ if (DeviceToBus(i) == bus_ordinal) {
+ StreamExecutorConfig config;
+ config.ordinal = i;
+ return GetExecutor(config).ValueOrDie();
+ }
+ }
+
+ return port::Status{
+ port::error::NOT_FOUND,
+ port::Printf("Executor for bus %d not found.", bus_ordinal)};
+}
+
+Platform::Id CudaPlatform::id() const { return kCudaPlatformId; }
+
+int CudaPlatform::VisibleDeviceCount() const {
+ // Throw away the result - it logs internally, and this [containing] function
+ // isn't in the path of user control. It's safe to call this more than once.
+ if (!cuda::CUDADriver::Init().ok()) {
+ return -1;
+ }
+
+ return CUDADriver::GetDeviceCount();
+}
+
+const string& CudaPlatform::Name() const { return name_; }
+
+port::StatusOr<StreamExecutor*> CudaPlatform::ExecutorForDevice(int ordinal) {
+ StreamExecutorConfig config;
+ config.ordinal = ordinal;
+ config.plugin_config = PluginConfig();
+ config.device_options = DeviceOptions::Default();
+ return GetExecutor(config);
+}
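+
+// Illustrative caller-side usage (editorial sketch; assumes this platform has
+// been registered with MultiPlatformManager under the name "CUDA"):
+//
+//   Platform *platform =
+//       MultiPlatformManager::PlatformWithName("CUDA").ValueOrDie();
+//   StreamExecutor *exec = platform->ExecutorForDevice(0).ValueOrDie();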
+
+port::StatusOr<StreamExecutor*> CudaPlatform::ExecutorForDeviceWithPluginConfig(
+ int device_ordinal, const PluginConfig& plugin_config) {
+ StreamExecutorConfig config;
+ config.ordinal = device_ordinal;
+ config.plugin_config = plugin_config;
+ config.device_options = DeviceOptions::Default();
+ return GetExecutor(config);
+}
+
+port::StatusOr<StreamExecutor*> CudaPlatform::GetExecutor(
+ const StreamExecutorConfig& config) {
+ mutex_lock lock(mu_);
+
+ port::StatusOr<StreamExecutor*> status = executor_cache_.Get(config);
+ if (status.ok()) {
+ return status.ValueOrDie();
+ }
+
+ port::StatusOr<std::unique_ptr<StreamExecutor>> executor =
+ GetUncachedExecutor(config);
+ if (!executor.ok()) {
+ return executor.status();
+ }
+
+ StreamExecutor* naked_executor = executor.ValueOrDie().get();
+ executor_cache_.Insert(config, executor.ConsumeValueOrDie());
+ return naked_executor;
+}
+
+port::StatusOr<std::unique_ptr<StreamExecutor>>
+CudaPlatform::GetUncachedExecutor(const StreamExecutorConfig& config) {
+ auto executor = port::MakeUnique<StreamExecutor>(PlatformKind::kCuda,
+ config.plugin_config);
+ auto init_status = executor->Init(config.ordinal, config.device_options);
+ if (!init_status.ok()) {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf(
+ "failed initializing StreamExecutor for CUDA device ordinal %d: %s",
+ config.ordinal, init_status.ToString().c_str())};
+ }
+
+ return std::move(executor);
+}
+
+void CudaPlatform::RegisterTraceListener(
+ std::unique_ptr<TraceListener> listener) {
+ LOG(FATAL) << "not yet implemented: register CUDA trace listener";
+}
+
+void CudaPlatform::UnregisterTraceListener(TraceListener* listener) {
+ LOG(FATAL) << "not yet implemented: unregister CUDA trace listener";
+}
+
+} // namespace cuda
+
+static void InitializeCudaPlatform() {
+ // Disabling leak checking; MultiPlatformManager does not destroy its
+ // registered platforms.
+
+ std::unique_ptr<cuda::CudaPlatform> platform(new cuda::CudaPlatform);
+ SE_CHECK_OK(MultiPlatformManager::RegisterPlatform(std::move(platform)));
+}
+
+} // namespace gputools
+} // namespace perftools
+
+REGISTER_MODULE_INITIALIZER(cuda_platform,
+ perftools::gputools::InitializeCudaPlatform());
diff --git a/tensorflow/stream_executor/cuda/cuda_platform.h b/tensorflow/stream_executor/cuda/cuda_platform.h
new file mode 100644
index 0000000000..966d7343f7
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_platform.h
@@ -0,0 +1,98 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_PLATFORM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_PLATFORM_H_
+
+#include <memory>
+#include <vector>
+
+#include "tensorflow/stream_executor/executor_cache.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+#include "tensorflow/stream_executor/trace_listener.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// Opaque and unique identifier for the CUDA platform plugin.
+// This is needed so that plugins can refer to/identify this platform without
+// instantiating a CudaPlatform object.
+extern const Platform::Id kCudaPlatformId;
+
+// CUDA-specific platform plugin, registered as a singleton value via module
+// initializer.
+class CudaPlatform : public Platform {
+ public:
+ CudaPlatform();
+ ~CudaPlatform() override;
+
+ // CudaPlatform-specific functionality
+ // Returns the number of distinct buses / NUMA nodes on the machine.
+ int BusCount();
+
+ // Returns the bus/NUMA node for the specified device ordinal.
+ int DeviceToBus(int device_ordinal);
+
+ // Returns the lowest-ordinal-number StreamExecutor on the specified bus.
+ port::StatusOr<StreamExecutor*> FirstExecutorForBus(int bus_ordinal);
+
+ // Platform interface implementation:
+ // Returns the same value as kCudaPlatform above.
+ Platform::Id id() const override;
+
+ // Returns -1 as a sentinel on internal failure (and logs the error).
+ int VisibleDeviceCount() const override;
+
+ const string& Name() const override;
+
+ port::StatusOr<StreamExecutor*> ExecutorForDevice(int ordinal) override;
+
+ port::StatusOr<StreamExecutor*> ExecutorForDeviceWithPluginConfig(
+ int ordinal, const PluginConfig& config) override;
+
+ port::StatusOr<StreamExecutor*> GetExecutor(
+ const StreamExecutorConfig& config) override;
+
+ port::StatusOr<std::unique_ptr<StreamExecutor>> GetUncachedExecutor(
+ const StreamExecutorConfig& config) override;
+
+ void RegisterTraceListener(std::unique_ptr<TraceListener> listener) override;
+
+ void UnregisterTraceListener(TraceListener* listener) override;
+
+ private:
+ // Determines the number of NUMA nodes and the assignment of executors to each.
+ void InspectNumaNodes();
+
+ // This platform's name.
+ string name_;
+
+ // mutex that guards internal state.
+ mutable mutex mu_;
+
+ // Cache of created executors.
+ ExecutorCache executor_cache_;
+
+ // The smallest NUMA node value for any device managed by this machine
+ // manager. Used, along with limit_numa_node_, to convert NUMA nodes into bus
+ // ordinals. The NUMA node space occupied by GPUs is assumed to be dense.
+ int min_numa_node_;
+
+ // Larger than the NUMA node value for any device managed by this machine
+ // manager.
+ int limit_numa_node_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CudaPlatform);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_PLATFORM_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_rng.cc b/tensorflow/stream_executor/cuda/cuda_rng.cc
new file mode 100644
index 0000000000..ad48c8b59a
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_rng.cc
@@ -0,0 +1,317 @@
+#include "tensorflow/stream_executor/cuda/cuda_rng.h"
+
+#include <dlfcn.h>
+
+#include "tensorflow/stream_executor/cuda/cuda_activation.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/cuda/cuda_helpers.h"
+#include "tensorflow/stream_executor/cuda/cuda_platform.h"
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/initialize.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/rng.h"
+#include "third_party/gpus/cuda/include/curand.h"
+
+// Formats curandStatus_t to output prettified values into a log stream.
+std::ostream &operator<<(std::ostream &in, const curandStatus_t &status) {
+#define OSTREAM_CURAND_STATUS(__name) \
+ case CURAND_STATUS_##__name: \
+ in << "CURAND_STATUS_" #__name; \
+ return in;
+
+ switch (status) {
+ OSTREAM_CURAND_STATUS(SUCCESS)
+ OSTREAM_CURAND_STATUS(VERSION_MISMATCH)
+ OSTREAM_CURAND_STATUS(NOT_INITIALIZED)
+ OSTREAM_CURAND_STATUS(ALLOCATION_FAILED)
+ OSTREAM_CURAND_STATUS(TYPE_ERROR)
+ OSTREAM_CURAND_STATUS(OUT_OF_RANGE)
+ OSTREAM_CURAND_STATUS(LENGTH_NOT_MULTIPLE)
+ OSTREAM_CURAND_STATUS(LAUNCH_FAILURE)
+ OSTREAM_CURAND_STATUS(PREEXISTING_FAILURE)
+ OSTREAM_CURAND_STATUS(INITIALIZATION_FAILED)
+ OSTREAM_CURAND_STATUS(ARCH_MISMATCH)
+ OSTREAM_CURAND_STATUS(INTERNAL_ERROR)
+ default:
+ in << "curandStatus_t(" << static_cast<int>(status) << ")";
+ return in;
+ }
+}
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(kCuRandPlugin);
+
+namespace dynload {
+
+#define PERFTOOLS_GPUTOOLS_CURAND_WRAP(__name) \
+ struct DynLoadShim__##__name { \
+ static const char *kName; \
+ using FuncPointerT = std::add_pointer<decltype(::__name)>::type; \
+ static void *GetDsoHandle() { \
+ static auto status = internal::CachedDsoLoader::GetCurandDsoHandle(); \
+ return status.ValueOrDie(); \
+ } \
+ static FuncPointerT DynLoad() { \
+ static void *f = dlsym(GetDsoHandle(), kName); \
+ CHECK(f != nullptr) << "could not find " << kName \
+ << " in curand DSO; dlerror: " << dlerror(); \
+ return reinterpret_cast<FuncPointerT>(f); \
+ } \
+ template <typename... Args> \
+ curandStatus_t operator()(CUDAExecutor * parent, Args... args) { \
+ cuda::ScopedActivateExecutorContext sac{parent}; \
+ return DynLoad()(args...); \
+ } \
+ } __name; \
+ const char *DynLoadShim__##__name::kName = #__name;
+
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandCreateGenerator);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandDestroyGenerator);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandSetStream);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandGenerateUniform);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandGenerateUniformDouble);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandSetPseudoRandomGeneratorSeed);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandSetGeneratorOffset);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandGenerateNormal);
+PERFTOOLS_GPUTOOLS_CURAND_WRAP(curandGenerateNormalDouble);
+
+} // namespace dynload
+
+template <typename T>
+string TypeString();
+
+template <>
+string TypeString<float>() {
+ return "float";
+}
+
+template <>
+string TypeString<double>() {
+ return "double";
+}
+
+template <>
+string TypeString<std::complex<float>>() {
+ return "std::complex<float>";
+}
+
+template <>
+string TypeString<std::complex<double>>() {
+ return "std::complex<double>";
+}
+
+CUDARng::CUDARng(CUDAExecutor *parent) : parent_(parent), rng_(nullptr) {}
+
+CUDARng::~CUDARng() {
+ if (rng_ != nullptr) {
+ dynload::curandDestroyGenerator(parent_, rng_);
+ }
+}
+
+bool CUDARng::Init() {
+ mutex_lock lock{mu_};
+ CHECK(rng_ == nullptr);
+
+ curandStatus_t ret =
+ dynload::curandCreateGenerator(parent_, &rng_, CURAND_RNG_PSEUDO_DEFAULT);
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to create random number generator: " << ret;
+ return false;
+ }
+
+ CHECK(rng_ != nullptr);
+ return true;
+}
+
+bool CUDARng::SetStream(Stream *stream) {
+ curandStatus_t ret =
+ dynload::curandSetStream(parent_, rng_, AsCUDAStreamValue(stream));
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set stream for random generation: " << ret;
+ return false;
+ }
+
+ return true;
+}
+
+// Returns true if std::complex stores its contents as two consecutive
+// elements. Tests int, float and double, as the last two are independent
+// specializations.
+constexpr bool ComplexIsConsecutiveFloats() {
+ return sizeof(std::complex<int>) == 8 && sizeof(std::complex<float>) == 8 &&
+ sizeof(std::complex<double>) == 16;
+}
+
+template <typename T>
+bool CUDARng::DoPopulateRandUniformInternal(Stream *stream,
+ DeviceMemory<T> *v) {
+ mutex_lock lock{mu_};
+ static_assert(ComplexIsConsecutiveFloats(),
+ "std::complex values are not stored as consecutive values");
+
+ if (!SetStream(stream)) {
+ return false;
+ }
+
+ // std::complex<T> is currently implemented as two consecutive T variables.
+ uint64 element_count = v->ElementCount();
+ if (std::is_same<T, std::complex<float>>::value ||
+ std::is_same<T, std::complex<double>>::value) {
+ element_count *= 2;
+ }
+
+ curandStatus_t ret;
+ if (std::is_same<T, float>::value ||
+ std::is_same<T, std::complex<float>>::value) {
+ ret = dynload::curandGenerateUniform(
+ parent_, rng_, reinterpret_cast<float *>(CUDAMemoryMutable(v)),
+ element_count);
+ } else {
+ ret = dynload::curandGenerateUniformDouble(
+ parent_, rng_, reinterpret_cast<double *>(CUDAMemoryMutable(v)),
+ element_count);
+ }
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to do uniform generation of " << v->ElementCount()
+ << " " << TypeString<T>() << "s at " << v->opaque() << ": "
+ << ret;
+ return false;
+ }
+
+ return true;
+}
+
+bool CUDARng::DoPopulateRandUniform(Stream *stream, DeviceMemory<float> *v) {
+ return DoPopulateRandUniformInternal(stream, v);
+}
+
+bool CUDARng::DoPopulateRandUniform(Stream *stream, DeviceMemory<double> *v) {
+ return DoPopulateRandUniformInternal(stream, v);
+}
+
+bool CUDARng::DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<float>> *v) {
+ return DoPopulateRandUniformInternal(stream, v);
+}
+
+bool CUDARng::DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<double>> *v) {
+ return DoPopulateRandUniformInternal(stream, v);
+}
+
+template <typename ElemT, typename FuncT>
+bool CUDARng::DoPopulateRandGaussianInternal(Stream *stream, ElemT mean,
+ ElemT stddev,
+ DeviceMemory<ElemT> *v,
+ FuncT func) {
+ mutex_lock lock{mu_};
+
+ if (!SetStream(stream)) {
+ return false;
+ }
+
+ uint64 element_count = v->ElementCount();
+ curandStatus_t ret =
+ func(parent_, rng_, CUDAMemoryMutable(v), element_count, mean, stddev);
+
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to do gaussian generation of " << v->ElementCount()
+ << " floats at " << v->opaque() << ": " << ret;
+ return false;
+ }
+
+ return true;
+}
+
+bool CUDARng::DoPopulateRandGaussian(Stream *stream, float mean, float stddev,
+ DeviceMemory<float> *v) {
+ return DoPopulateRandGaussianInternal(stream, mean, stddev, v,
+ dynload::curandGenerateNormal);
+}
+
+bool CUDARng::DoPopulateRandGaussian(Stream *stream, double mean, double stddev,
+ DeviceMemory<double> *v) {
+ return DoPopulateRandGaussianInternal(stream, mean, stddev, v,
+ dynload::curandGenerateNormalDouble);
+}
+
+bool CUDARng::SetSeed(Stream *stream, const uint8 *seed, uint64 seed_bytes) {
+ mutex_lock lock{mu_};
+ CHECK(rng_ != nullptr);
+
+ if (!CheckSeed(seed, seed_bytes)) {
+ return false;
+ }
+
+ if (!SetStream(stream)) {
+ return false;
+ }
+
+ // Requires 8 bytes of seed data; checked in RngSupport::CheckSeed (above)
+ // (which itself requires 16 for API consistency with host RNG fallbacks).
+ curandStatus_t ret = dynload::curandSetPseudoRandomGeneratorSeed(
+ parent_, rng_, *(reinterpret_cast<const uint64 *>(seed)));
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to set rng seed: " << ret;
+ return false;
+ }
+
+ ret = dynload::curandSetGeneratorOffset(parent_, rng_, 0);
+ if (ret != CURAND_STATUS_SUCCESS) {
+ LOG(ERROR) << "failed to reset rng position: " << ret;
+ return false;
+ }
+ return true;
+}
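+
+// Illustrative usage (editorial sketch; `rng` is a hypothetical CUDARng that
+// has been Init()'d and `stream` an allocated Stream):
+//
+//   uint8 seed[16] = {};  // CheckSeed expects a 16-byte seed buffer.
+//   rng->SetSeed(stream, seed, sizeof(seed));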
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+namespace gpu = ::perftools::gputools;
+
+REGISTER_MODULE_INITIALIZER(register_curand, {
+ gpu::port::Status status =
+ gpu::PluginRegistry::Instance()
+ ->RegisterFactory<gpu::PluginRegistry::RngFactory>(
+ gpu::cuda::kCudaPlatformId, gpu::cuda::kCuRandPlugin, "cuRAND",
+ [](gpu::internal::StreamExecutorInterface
+ *parent) -> gpu::rng::RngSupport * {
+ gpu::cuda::CUDAExecutor *cuda_executor =
+ dynamic_cast<gpu::cuda::CUDAExecutor *>(parent);
+ if (cuda_executor == nullptr) {
+ LOG(ERROR)
+ << "Attempting to initialize an instance of the cuRAND "
+ << "support library with a non-CUDA StreamExecutor";
+ return nullptr;
+ }
+
+ gpu::cuda::CUDARng *rng = new gpu::cuda::CUDARng(cuda_executor);
+ if (!rng->Init()) {
+ // Note: Init() will log a more specific error.
+ delete rng;
+ return nullptr;
+ }
+ return rng;
+ });
+
+ if (!status.ok()) {
+ LOG(ERROR) << "Unable to register cuRAND factory: "
+ << status.error_message();
+ }
+
+ // Prime the cuRAND DSO. The loader will log more information.
+ auto statusor = gpu::internal::CachedDsoLoader::GetCurandDsoHandle();
+ if (!statusor.ok()) {
+ LOG(INFO) << "Unable to load cuRAND DSO.";
+ }
+
+ gpu::PluginRegistry::Instance()->SetDefaultFactory(gpu::cuda::kCudaPlatformId,
+ gpu::PluginKind::kRng,
+ gpu::cuda::kCuRandPlugin);
+});
diff --git a/tensorflow/stream_executor/cuda/cuda_rng.h b/tensorflow/stream_executor/cuda/cuda_rng.h
new file mode 100644
index 0000000000..4e1b82969b
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_rng.h
@@ -0,0 +1,89 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_RNG_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_RNG_H_
+
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/rng.h"
+
+typedef struct curandGenerator_st *curandGenerator_t;
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+template <typename ElemT>
+class DeviceMemory;
+
+namespace cuda {
+
+// Opaque and unique identifier for the cuRAND plugin.
+extern const PluginId kCuRandPlugin;
+
+class CUDAExecutor;
+
+// CUDA-platform implementation of the random number generation support
+// interface.
+//
+// Thread-safe post-initialization.
+class CUDARng : public rng::RngSupport {
+ public:
+ explicit CUDARng(CUDAExecutor *parent);
+
+ // Retrieves a curand library generator handle. This is necessary for
+ // enqueuing random number generation work onto the device.
+ // TODO(leary) provide a way for users to select the RNG algorithm.
+ bool Init();
+
+ // Releases a curand library generator handle, if one was acquired.
+ ~CUDARng() override;
+
+ // See rng::RngSupport for details on the following overrides.
+ bool DoPopulateRandUniform(Stream *stream, DeviceMemory<float> *v) override;
+ bool DoPopulateRandUniform(Stream *stream, DeviceMemory<double> *v) override;
+ bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<float>> *v) override;
+ bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<double>> *v) override;
+ bool DoPopulateRandGaussian(Stream *stream, float mean, float stddev,
+ DeviceMemory<float> *v) override;
+ bool DoPopulateRandGaussian(Stream *stream, double mean, double stddev,
+ DeviceMemory<double> *v) override;
+
+ bool SetSeed(Stream *stream, const uint8 *seed, uint64 seed_bytes) override;
+
+ private:
+ // Actually performs the work of generating random numbers - the public
+ // methods are thin wrappers to this interface.
+ template <typename T>
+ bool DoPopulateRandUniformInternal(Stream *stream, DeviceMemory<T> *v);
+ template <typename ElemT, typename FuncT>
+ bool DoPopulateRandGaussianInternal(Stream *stream, ElemT mean, ElemT stddev,
+ DeviceMemory<ElemT> *v, FuncT func);
+
+ // Sets the stream for the internal curand generator.
+ //
+ // This is a stateful operation, as the handle can only have one stream set at
+ // a given time, so it is usually performed right before enqueuing work to do
+ // with random number generation.
+ bool SetStream(Stream *stream) EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // mutex that guards the cuRAND handle for this device.
+ mutex mu_;
+
+ // CUDAExecutor which instantiated this CUDARng.
+ // Immutable post-initialization.
+ CUDAExecutor *parent_;
+
+ // cuRAND library handle on the device.
+ curandGenerator_t rng_ GUARDED_BY(mu_);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CUDARng);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_RNG_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_stream.cc b/tensorflow/stream_executor/cuda/cuda_stream.cc
new file mode 100644
index 0000000000..e70579b55c
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_stream.cc
@@ -0,0 +1,51 @@
+#include "tensorflow/stream_executor/cuda/cuda_stream.h"
+
+#include "tensorflow/stream_executor/lib/status.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+bool CUDAStream::Init() {
+ return CUDADriver::CreateStream(parent_->cuda_context(), &cuda_stream_);
+}
+
+void CUDAStream::Destroy() {
+ {
+ mutex_lock lock{mu_};
+ if (completed_event_ != nullptr) {
+ port::Status status =
+ CUDADriver::DestroyEvent(parent_->cuda_context(), &completed_event_);
+ if (!status.ok()) {
+ LOG(ERROR) << status.error_message();
+ }
+ }
+ }
+
+ CUDADriver::DestroyStream(parent_->cuda_context(), &cuda_stream_);
+}
+
+bool CUDAStream::IsIdle() const {
+ return CUDADriver::IsStreamIdle(parent_->cuda_context(), cuda_stream_);
+}
+
+bool CUDAStream::GetOrCreateCompletedEvent(CUevent *completed_event) {
+ mutex_lock lock{mu_};
+ if (completed_event_ != nullptr) {
+ *completed_event = completed_event_;
+ return true;
+ }
+
+ if (!CUDADriver::CreateEvent(parent_->cuda_context(), &completed_event_,
+ CUDADriver::EventFlags::kDisableTiming)
+ .ok()) {
+ return false;
+ }
+
+ *completed_event = completed_event_;
+ return true;
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_stream.h b/tensorflow/stream_executor/cuda/cuda_stream.h
new file mode 100644
index 0000000000..f6db64a1bf
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_stream.h
@@ -0,0 +1,74 @@
+// Defines the CUDAStream type - the CUDA-specific implementation of the generic
+// StreamExecutor Stream interface.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_STREAM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_STREAM_H_
+
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+class CUDAExecutor;
+
+// Wraps a CUstream in order to satisfy the platform-independent
+// StreamInterface.
+//
+// Thread-safe post-initialization.
+class CUDAStream : public internal::StreamInterface {
+ public:
+ explicit CUDAStream(CUDAExecutor *parent)
+ : parent_(parent), cuda_stream_(nullptr), completed_event_(nullptr) {}
+
+ // Note: teardown is handled by a parent's call to DeallocateStream.
+ ~CUDAStream() override {}
+
+ void *CudaStreamHack() override { return cuda_stream_; }
+ void **CudaStreamMemberHack() override {
+ return reinterpret_cast<void **>(&cuda_stream_);
+ }
+
+ // Explicitly initialize the CUDA resources associated with this stream, used
+ // by StreamExecutor::AllocateStream().
+ bool Init();
+
+ // Explicitly destroy the CUDA resources associated with this stream, used by
+ // StreamExecutor::DeallocateStream().
+ void Destroy();
+
+ // Returns true if no work is pending or executing on the stream.
+ bool IsIdle() const;
+
+ // Retrieves an event which indicates that all work enqueued into the stream
+ // has completed. Ownership of the event is not transferred to the caller;
+ // the event is owned by this stream.
+ bool GetOrCreateCompletedEvent(CUevent *completed_event);
+
+ // Returns the CUstream value for passing to the CUDA API.
+ //
+ // Precond: this CUDAStream has been allocated (otherwise passing a nullptr
+ // into the NVIDIA library causes difficult-to-understand faults).
+ CUstream cuda_stream() const {
+ DCHECK(cuda_stream_ != nullptr);
+ return const_cast<CUstream>(cuda_stream_);
+ }
+
+ CUDAExecutor *parent() const { return parent_; }
+
+ private:
+ mutex mu_; // mutex that guards the completion event.
+ CUDAExecutor *parent_; // Executor that spawned this stream.
+ CUstream cuda_stream_; // Wrapped CUDA stream handle.
+
+ // Event that indicates this stream has completed.
+ CUevent completed_event_ GUARDED_BY(mu_);
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_STREAM_H_
diff --git a/tensorflow/stream_executor/cuda/cuda_timer.cc b/tensorflow/stream_executor/cuda/cuda_timer.cc
new file mode 100644
index 0000000000..ad5e13ab6b
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_timer.cc
@@ -0,0 +1,73 @@
+#include "tensorflow/stream_executor/cuda/cuda_timer.h"
+
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+#include "tensorflow/stream_executor/cuda/cuda_stream.h"
+#include "tensorflow/stream_executor/lib/status.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+bool CUDATimer::Init() {
+ CHECK(start_event_ == nullptr && stop_event_ == nullptr);
+ CUcontext context = parent_->cuda_context();
+ if (!CUDADriver::CreateEvent(context, &start_event_,
+ CUDADriver::EventFlags::kDefault)
+ .ok()) {
+ return false;
+ }
+
+ if (!CUDADriver::CreateEvent(context, &stop_event_,
+ CUDADriver::EventFlags::kDefault)
+ .ok()) {
+ port::Status status = CUDADriver::DestroyEvent(context, &start_event_);
+ if (!status.ok()) {
+ LOG(ERROR) << status;
+ }
+ return false;
+ }
+
+ CHECK(start_event_ != nullptr && stop_event_ != nullptr);
+ return true;
+}
+
+void CUDATimer::Destroy() {
+ CUcontext context = parent_->cuda_context();
+ port::Status status = CUDADriver::DestroyEvent(context, &start_event_);
+ if (!status.ok()) {
+ LOG(ERROR) << status;
+ }
+
+ status = CUDADriver::DestroyEvent(context, &stop_event_);
+ if (!status.ok()) {
+ LOG(ERROR) << status;
+ }
+}
+
+float CUDATimer::GetElapsedMilliseconds() const {
+ CHECK(start_event_ != nullptr && stop_event_ != nullptr);
+ // TODO(leary) provide a way to query timer resolution?
+ // CUDA docs say a resolution of about 0.5us
+ float elapsed_milliseconds = NAN;
+ (void)CUDADriver::GetEventElapsedTime(parent_->cuda_context(),
+ &elapsed_milliseconds, start_event_,
+ stop_event_);
+ return elapsed_milliseconds;
+}
+
+bool CUDATimer::Start(CUDAStream *stream) {
+ return CUDADriver::RecordEvent(parent_->cuda_context(), start_event_,
+ stream->cuda_stream())
+ .ok();
+}
+
+bool CUDATimer::Stop(CUDAStream *stream) {
+ return CUDADriver::RecordEvent(parent_->cuda_context(), stop_event_,
+ stream->cuda_stream())
+ .ok();
+}
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/cuda/cuda_timer.h b/tensorflow/stream_executor/cuda/cuda_timer.h
new file mode 100644
index 0000000000..e49e212403
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/cuda_timer.h
@@ -0,0 +1,69 @@
+// Defines the CUDATimer type - the CUDA-specific implementation of the generic
+// StreamExecutor Timer interface.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_TIMER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_TIMER_H_
+
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/cuda/cuda_driver.h"
+#include "tensorflow/stream_executor/cuda/cuda_gpu_executor.h"
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+class CUDAExecutor;
+class CUDAStream;
+
+// Wraps a pair of CUevents in order to satisfy the platform-independent
+// TimerInterface -- both a start and a stop event are present, which may be
+// recorded in a stream.
+class CUDATimer : public internal::TimerInterface {
+ public:
+ explicit CUDATimer(CUDAExecutor *parent)
+ : parent_(parent), start_event_(nullptr), stop_event_(nullptr) {}
+
+ // Note: teardown is explicitly handled in this API by a call to
+ // StreamExecutor::DeallocateTimer(), which invokes Destroy().
+ ~CUDATimer() override {}
+
+ // Allocates the platform-specific pieces of the timer, called as part of
+ // StreamExecutor::AllocateTimer().
+ bool Init();
+
+ // Deallocates the platform-specific pieces of the timer, called as part of
+ // StreamExecutor::DeallocateTimer().
+ void Destroy();
+
+ // Records the "timer start" event at the current point in the stream.
+ bool Start(CUDAStream *stream);
+
+ // Records the "timer stop" event at the current point in the stream.
+ bool Stop(CUDAStream *stream);
+
+ // Returns the elapsed time, in milliseconds, between the start and stop
+ // events.
+ float GetElapsedMilliseconds() const;
+
+ // See perftools::gputools::Timer::Microseconds().
+ // TODO(leary) make this into an error code interface...
+ uint64 Microseconds() const override {
+ return GetElapsedMilliseconds() * 1e3;
+ }
+
+ // See perftools::gputools::Timer::Nanoseconds().
+ uint64 Nanoseconds() const override { return GetElapsedMilliseconds() * 1e6; }
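+
+ // Illustrative lifecycle (editorial sketch; `timer` and `stream` are
+ // hypothetical, already-allocated objects):
+ //
+ //   timer->Init();
+ //   timer->Start(stream);   // records start_event_ in the stream
+ //   // ... enqueue work on the stream ...
+ //   timer->Stop(stream);    // records stop_event_ in the stream
+ //   uint64 us = timer->Microseconds();  // valid once both events complete
+ //   timer->Destroy();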
+
+ private:
+ CUDAExecutor *parent_;
+ CUevent start_event_; // Event recorded to indicate the "start" timestamp
+ // executing in a stream.
+ CUevent stop_event_; // Event recorded to indicate the "stop" timestamp
+ // executing in a stream.
+};
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_CUDA_TIMER_H_
diff --git a/tensorflow/stream_executor/cuda/multi_op_activation.h b/tensorflow/stream_executor/cuda/multi_op_activation.h
new file mode 100644
index 0000000000..ba2bcd3a91
--- /dev/null
+++ b/tensorflow/stream_executor/cuda/multi_op_activation.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_CUDA_MULTI_OP_ACTIVATION_H_
+#define TENSORFLOW_STREAM_EXECUTOR_CUDA_MULTI_OP_ACTIVATION_H_
+
+namespace perftools {
+namespace gputools {
+namespace cuda {
+
+// Type-safe boolean wrapper: denotes whether a ScopedActivateExecutorContext
+// may have other ScopedActivateExecutorContexts nested within it.
+enum class MultiOpActivation { kNo = false, kYes = true };
+
+} // namespace cuda
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_CUDA_MULTI_OP_ACTIVATION_H_
diff --git a/tensorflow/stream_executor/device_description.cc b/tensorflow/stream_executor/device_description.cc
new file mode 100644
index 0000000000..23c110c2f3
--- /dev/null
+++ b/tensorflow/stream_executor/device_description.cc
@@ -0,0 +1,221 @@
+#include "tensorflow/stream_executor/device_description.h"
+
+#include <algorithm>
+
+#include "tensorflow/stream_executor/lib/human_readable.h"
+#include "tensorflow/stream_executor/lib/mathutil.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+
+namespace perftools {
+namespace gputools {
+
+static const uint64 kUninitializedUint64 = -1ULL;
+/* static */ const char *DeviceDescription::kUndefinedString = "<undefined>";
+
+DeviceDescription::DeviceDescription()
+ : device_vendor_(kUndefinedString),
+ platform_version_(kUndefinedString),
+ driver_version_(kUndefinedString),
+ runtime_version_(kUndefinedString),
+ pci_bus_id_(kUndefinedString),
+ name_(kUndefinedString),
+ thread_dim_limit_(kUninitializedUint64, kUninitializedUint64,
+ kUninitializedUint64),
+ block_dim_limit_(kUninitializedUint64, kUninitializedUint64,
+ kUninitializedUint64),
+ blocks_per_core_limit_(kUninitializedUint64),
+ threads_per_core_limit_(kUninitializedUint64),
+ threads_per_block_limit_(kUninitializedUint64),
+ threads_per_warp_(kUninitializedUint64),
+ registers_per_core_limit_(kUninitializedUint64),
+ registers_per_block_limit_(kUninitializedUint64),
+ registers_per_thread_limit_(kUninitializedUint64),
+ warp_alloc_granularity_(1),
+ register_alloc_granularity_(1),
+ shared_memory_alloc_granularity_(1),
+ device_address_bits_(kUninitializedUint64),
+ device_memory_size_(kUninitializedUint64),
+ shared_memory_per_core_(kUninitializedUint64),
+ shared_memory_per_block_(kUninitializedUint64),
+ clock_rate_ghz_(-1.0),
+ cuda_compute_capability_major_(-1),
+ cuda_compute_capability_minor_(-1),
+ numa_node_(-1),
+ core_count_(-1),
+ ecc_enabled_(false) {}
+
+std::unique_ptr<std::map<string, string>> DeviceDescription::ToMap() const {
+ std::unique_ptr<std::map<string, string>> owned_result{
+ new std::map<string, string>};
+ std::map<string, string> &result = *owned_result;
+ result["Device Vendor"] = device_vendor();
+ result["Platform Version"] = platform_version();
+ result["Driver Version"] = driver_version();
+ result["Runtime Version"] = runtime_version();
+ result["PCI bus ID"] = pci_bus_id_;
+ result["Device Name"] = name_;
+
+ const ThreadDim &thread_dim = thread_dim_limit();
+ result["ThreadDim Limit"] =
+ port::StrCat(thread_dim.x, ",", thread_dim.y, ",", thread_dim.z);
+ const BlockDim &block_dim = block_dim_limit();
+ result["BlockDim Limit"] =
+ port::StrCat(block_dim.x, ",", block_dim.y, ",", block_dim.z);
+
+ result["Threads Per Core Limit"] = port::StrCat(threads_per_core_limit());
+ result["Threads Per Block Limit"] = port::StrCat(threads_per_block_limit());
+ result["Registers Per Block Limit"] =
+ port::StrCat(registers_per_block_limit());
+
+ result["Device Address Bits"] = port::StrCat(device_address_bits());
+ result["Device Memory Size"] =
+ port::HumanReadableNumBytes::ToString(device_memory_size());
+
+ result["Shared Memory Per Core"] =
+ port::HumanReadableNumBytes::ToString(shared_memory_per_core_);
+ result["Shared Memory Per Block"] =
+ port::HumanReadableNumBytes::ToString(shared_memory_per_block_);
+
+ result["Clock Rate GHz"] = port::StrCat(clock_rate_ghz());
+
+ result["CUDA Compute Capability"] = port::StrCat(
+ cuda_compute_capability_major_, ".", cuda_compute_capability_minor_);
+
+ result["NUMA Node"] = port::StrCat(numa_node());
+ result["Core Count"] = port::StrCat(core_count());
+ result["ECC Enabled"] = port::StrCat(ecc_enabled());
+ return owned_result;
+}
+
+namespace internal {
+
+DeviceDescriptionBuilder::DeviceDescriptionBuilder()
+ : device_description_(new DeviceDescription) {}
+
+} // namespace internal
+
+bool DeviceDescription::cuda_compute_capability(int *major, int *minor) const {
+ *major = cuda_compute_capability_major_;
+ *minor = cuda_compute_capability_minor_;
+ return cuda_compute_capability_major_ != 0;
+}
+
+bool ThreadDimOk(const DeviceDescription &device_description,
+ const ThreadDim &thread_dim) {
+ auto total_threads = thread_dim.x * thread_dim.y * thread_dim.z;
+ auto threads_per_block_limit = device_description.threads_per_block_limit();
+ if (total_threads > threads_per_block_limit) {
+ VLOG(2) << "exceeded total-thread-per-block limit: " << total_threads
+ << " vs limit " << threads_per_block_limit;
+ return false;
+ }
+
+ const auto &limit = device_description.thread_dim_limit();
+ bool ok = thread_dim.x <= limit.x && thread_dim.y <= limit.y &&
+ thread_dim.z <= limit.z;
+ if (!ok) {
+ VLOG(2) << "thread dim " << thread_dim.ToString()
+ << " exceeds limit contraints of " << limit.ToString();
+ }
+ return ok;
+}
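+
+// Worked example (editorial): with a 1024 threads-per-block limit and a
+// per-dimension limit of (1024, 1024, 64), ThreadDim(32, 32, 1) passes
+// (32 * 32 * 1 = 1024 <= 1024), while ThreadDim(32, 32, 2) fails the total
+// limit even though each dimension is individually within range.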
+
+uint64 DivideCeil(uint64 x, uint64 y) {
+ return port::MathUtil::CeilOfRatio(x, y);
+}
+
+void CalculateDimensionality(const DeviceDescription &device_description,
+ uint64 element_count, uint64 *threads_per_block,
+ uint64 *block_count) {
+ *threads_per_block = device_description.threads_per_block_limit();
+ *block_count = DivideCeil(element_count, *threads_per_block);
+ if (*block_count == 1) {
+ CHECK_LE(element_count, *threads_per_block);
+ *threads_per_block = element_count;
+ }
+}
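+
+// Worked example (editorial): with a 1024 threads-per-block limit,
+// element_count = 1,000,000 yields threads_per_block = 1024 and
+// block_count = 977; element_count = 500 yields block_count = 1 with
+// threads_per_block clamped down to 500.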
+
+// Round value up to a multiple of n.
+static uint64 RoundUp(uint64 value, uint64 n) {
+ return port::MathUtil::CeilOfRatio(value, n) * n;
+}
+
+// Round value down to a multiple of n.
+static uint64 RoundDown(uint64 value, uint64 n) {
+ return port::MathUtil::FloorOfRatio(value, n) * n;
+}
+
+uint64 CalculateOccupancy(const DeviceDescription &device_description,
+ uint64 registers_per_thread,
+ uint64 shared_memory_per_block,
+ const ThreadDim &thread_dims) {
+ // Don't try to compute occupancy if necessary values are not initialized.
+ uint64 required_fields[] = { device_description.registers_per_thread_limit(),
+ device_description.threads_per_warp(),
+ device_description.warp_alloc_granularity(),
+ device_description.register_alloc_granularity(),
+ device_description.registers_per_block_limit(),
+ device_description.shared_memory_per_core(),
+ device_description.blocks_per_core_limit() };
+ for (auto value : required_fields) {
+ if (value == kUninitializedUint64) {
+ return 0;
+ }
+ }
+
+ if (registers_per_thread > device_description.registers_per_thread_limit()) {
+ return 0;
+ }
+
+ uint64 warps_per_block =
+ port::MathUtil::CeilOfRatio(thread_dims.x * thread_dims.y * thread_dims.z,
+ device_description.threads_per_warp());
+
+ // Warp resources are allocated at a particular granularity. This value is
+ // the effective number of warps for resource allocation purposes.
+ uint64 alloc_warps_per_block =
+ RoundUp(warps_per_block, device_description.warp_alloc_granularity());
+
+ uint64 alloc_regs_per_warp =
+ RoundUp(device_description.threads_per_warp() * registers_per_thread,
+ device_description.register_alloc_granularity());
+ uint64 regs_per_block = alloc_warps_per_block * alloc_regs_per_warp;
+ uint64 reg_limit =
+ device_description.registers_per_block_limit() / regs_per_block;
+
+ uint64 alloc_smem_per_block = RoundUp(
+ shared_memory_per_block,
+ device_description.shared_memory_alloc_granularity());
+ uint64 smem_limit = alloc_smem_per_block > 0 ?
+ device_description.shared_memory_per_core() / alloc_smem_per_block :
+ device_description.blocks_per_core_limit();
+
+ uint64 thread_limit = device_description.threads_per_core_limit()
+ / (warps_per_block * device_description.threads_per_warp());
+
+ return std::min({ device_description.blocks_per_core_limit(),
+ reg_limit, smem_limit, thread_limit });
+}
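
As a standalone illustration of the min-of-limits rule (all device numbers below are assumptions loosely modeled on a Kepler-class SM, not values read from any DeviceDescription): a 128-thread block using 32 registers per thread and 4 KiB of shared memory is limited to 16 blocks by registers, 12 by shared memory, 16 by threads, and 16 by the per-core block cap, so the resulting occupancy is 12 blocks per core.

#include <algorithm>
#include <cstdint>
#include <iostream>

// Standalone sketch of the min-of-limits occupancy rule above, using made-up
// device values; nothing here queries an actual GPU.
int main() {
  const uint64_t threads_per_warp = 32, warp_alloc_gran = 4;
  const uint64_t reg_alloc_gran = 256, regs_per_block_limit = 65536;
  const uint64_t smem_per_core = 49152, smem_alloc_gran = 256;
  const uint64_t blocks_per_core_limit = 16, threads_per_core_limit = 2048;

  // Hypothetical kernel: 128 threads/block, 32 registers/thread, 4 KiB smem.
  const uint64_t block_threads = 128, regs_per_thread = 32,
                 smem_per_block = 4096;

  auto round_up = [](uint64_t v, uint64_t n) { return (v + n - 1) / n * n; };

  uint64_t warps = (block_threads + threads_per_warp - 1) / threads_per_warp;
  uint64_t alloc_warps = round_up(warps, warp_alloc_gran);
  uint64_t regs_per_block =
      alloc_warps * round_up(threads_per_warp * regs_per_thread, reg_alloc_gran);
  uint64_t reg_limit = regs_per_block_limit / regs_per_block;  // 16
  uint64_t smem_limit =
      smem_per_core / round_up(smem_per_block, smem_alloc_gran);  // 12
  uint64_t thread_limit =
      threads_per_core_limit / (warps * threads_per_warp);  // 16

  std::cout << std::min({blocks_per_core_limit, reg_limit, smem_limit,
                         thread_limit})
            << "\n";  // prints 12
  return 0;
}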
+
+uint64 CalculateRegisterLimitForTargetOccupancy(
+ const DeviceDescription &device_description, uint64 shared_memory_per_block,
+ const ThreadDim &thread_dims, uint64 target_blocks_per_core) {
+ // Linear search from maximum number of registers down until the target
+ // blocks per SM is found.
+ // TODO(meheff): Compute this using a closed form solution.
+ int reg_step = device_description.register_alloc_granularity() /
+ device_description.threads_per_warp();
+ for (int r = device_description.registers_per_thread_limit(); r > 0;
+ r = RoundDown(r - 1, reg_step)) {
+ uint64 occupancy = CalculateOccupancy(
+ device_description, r, shared_memory_per_block, thread_dims);
+ if (occupancy >= target_blocks_per_core) {
+ return r;
+ }
+ }
+ return 0;
+}
+
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/device_description.h b/tensorflow/stream_executor/device_description.h
new file mode 100644
index 0000000000..e7b7102da5
--- /dev/null
+++ b/tensorflow/stream_executor/device_description.h
@@ -0,0 +1,370 @@
+// Describes the underlying platform for a StreamExecutor; e.g. OpenCL or CUDA
+// device and platform properties. Also contains convenience functions for
+// checking/calculating launch dimensionality based on device properties.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_DEVICE_DESCRIPTION_H_
+#define TENSORFLOW_STREAM_EXECUTOR_DEVICE_DESCRIPTION_H_
+
+#include <map>
+#include <memory>
+
+#include "tensorflow/stream_executor/launch_dim.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+class DeviceDescriptionBuilder;
+} // namespace internal
+
+// Data that describes the execution target of the StreamExecutor, in terms of
+// important logical parameters. These include dimensionality limits and
+// physical parameters of interest, such as number of cores present on the
+// device.
+//
+// Thread-safe: immutable post-initialization.
+class DeviceDescription {
+ public:
+ // Returns the platform being run on; this value is primarily intended for
+ // printing, and comes out something like "OpenCL 1.2" or "Compute Capability
+ // 3.5".
+ const string &platform_version() const { return platform_version_; }
+
+ // Returns the driver version interfacing with the underlying platform. Vendor
+ // dependent format.
+ const string &driver_version() const { return driver_version_; }
+
+ // Return the runtime version, if one is provided by the underlying platform.
+ // Vendor dependent format / usefulness.
+ const string &runtime_version() const { return runtime_version_; }
+
+ // Returns the name that the device reports. Vendor dependent.
+ const string &name() const { return name_; }
+
+ // Returns the PCI bus identifier for this device, of the form
+ // [domain]:[bus]:[device].[function]
+ const string &pci_bus_id() const { return pci_bus_id_; }
+
+ // Returns the NUMA node associated with this device, for use in
+ // determining socket locality. If the NUMA node could not be determined, -1
+ // is returned.
+ int numa_node() const { return numa_node_; }
+
+ // Number of cores (traditional notion of core; i.e. an SM on an NVIDIA device
+  // or an AMD Compute Unit).
+ int core_count() const { return core_count_; }
+
+ // Returns the limit on the thread dimensionality values in each of the
+ // respective dimensions. These limits affect what constitutes a legitimate
+ // kernel launch request.
+ const ThreadDim &thread_dim_limit() const { return thread_dim_limit_; }
+
+ // Returns the limit on the block dimensionality values in each of the
+ // respective dimensions. These limits may affect what constitutes a
+ // legitimate kernel launch request.
+ const BlockDim &block_dim_limit() const { return block_dim_limit_; }
+
+ // Returns the limit on the number of simultaneously resident blocks
+ // on a multiprocessor.
+ const uint64 blocks_per_core_limit() const { return blocks_per_core_limit_; }
+
+ // Returns the limit on the total number of threads that can be launched in a
+ // single block; i.e. the limit on x * y * z dimensions of a ThreadDim.
+ // This limit affects what constitutes a legitimate kernel launch request.
+ const uint64 &threads_per_block_limit() const {
+ return threads_per_block_limit_;
+ }
+
+ // Returns the limit on the total number of threads that can be simultaneously
+ // launched on a given multiprocessor.
+ const uint64 &threads_per_core_limit() const {
+ return threads_per_core_limit_;
+ }
+
+ // Returns the number of threads per warp/wavefront.
+ const uint64 &threads_per_warp() const { return threads_per_warp_; }
+
+ // Returns the limit on the total number of registers per core.
+ const uint64 &registers_per_core_limit() const {
+ return registers_per_core_limit_;
+ }
+
+ // Returns the limit on the total number of registers that can be
+ // simultaneously used by a block.
+ const uint64 &registers_per_block_limit() const {
+ return registers_per_block_limit_;
+ }
+
+ // Returns the limit on the total number of registers that can be
+ // allocated to a thread.
+ const uint64 &registers_per_thread_limit() const {
+ return registers_per_thread_limit_;
+ }
+
+ // Returns the granularity at which warps are allocated resources.
+ const uint64 &warp_alloc_granularity() const {
+ return warp_alloc_granularity_;
+ }
+
+ // Returns the granularity at which registers are allocated to warps.
+ const uint64 &register_alloc_granularity() const {
+ return register_alloc_granularity_;
+ }
+
+ // Returns the granularity at which shared memory is allocated to warps.
+ const uint64 &shared_memory_alloc_granularity() const {
+ return shared_memory_alloc_granularity_;
+ }
+
+ // Returns the number of address bits available to kernel code running on the
+ // platform. This affects things like the maximum allocation size and perhaps
+ // types used in kernel code such as size_t.
+ const uint64 &device_address_bits() const { return device_address_bits_; }
+
+ // Returns the device memory size in bytes.
+ uint64 device_memory_size() const { return device_memory_size_; }
+
+ // Returns the device's core clock rate in GHz.
+ const float clock_rate_ghz() const { return clock_rate_ghz_; }
+
+ // Returns whether ECC is enabled.
+ bool ecc_enabled() const { return ecc_enabled_; }
+
+ // Returns the device vendor string, e.g., "NVIDIA Corporation", "Advanced
+ // Micro Devices, Inc.", or "GenuineIntel".
+ const string &device_vendor() const { return device_vendor_; }
+
+ // Returns the CUDA compute capability if we're running on the CUDA platform.
+ // If a CUDA compute capability is not available, the major version will be
+ // zero, and the return value will be false.
+ bool cuda_compute_capability(int *major, int *minor) const;
+
+ // Returns the maximum amount of shared memory present on a single core
+ // (i.e. Streaming Multiprocessor on NVIDIA GPUs; Compute Unit for OpenCL
+  // devices). Note that some devices, such as NVIDIA's, have a configurable
+ // partitioning between shared memory and L1 cache.
+ uint64 shared_memory_per_core() const { return shared_memory_per_core_; }
+
+ // Returns the maximum amount of shared memory available for a single block.
+ uint64 shared_memory_per_block() const { return shared_memory_per_block_; }
+
+ // TODO(leary): resident blocks per core will be useful.
+
+ // Convenience typedef for the string-based DeviceDescription mapping.
+ typedef std::map<string, string> Map;
+
+ // Returns a mapping from readable names to readable values that describe the
+ // device. This is useful for things like printing.
+ std::unique_ptr<Map> ToMap() const;
+
+ // For string values that are not available via the underlying platform, this
+ // value will be provided.
+ static const char *kUndefinedString;
+
+ private:
+ friend class internal::DeviceDescriptionBuilder;
+
+ DeviceDescription();
+
+ // For description of the following members, see the corresponding accessor
+ // above.
+ //
+ // N.B. If another field is added, update ToMap() above.
+ string device_vendor_;
+ string platform_version_;
+ string driver_version_;
+ string runtime_version_;
+ string pci_bus_id_;
+ string name_;
+
+ ThreadDim thread_dim_limit_;
+ BlockDim block_dim_limit_;
+
+ uint64 blocks_per_core_limit_;
+
+ uint64 threads_per_core_limit_;
+ uint64 threads_per_block_limit_;
+ uint64 threads_per_warp_;
+
+ uint64 registers_per_core_limit_;
+ uint64 registers_per_block_limit_;
+ uint64 registers_per_thread_limit_;
+
+ uint64 warp_alloc_granularity_;
+ uint64 register_alloc_granularity_;
+ uint64 shared_memory_alloc_granularity_;
+
+ uint64 device_address_bits_;
+ uint64 device_memory_size_;
+
+ // Shared memory limits on a given device.
+ uint64 shared_memory_per_core_;
+ uint64 shared_memory_per_block_;
+
+ float clock_rate_ghz_;
+
+ // CUDA "CC" major value, -1 if not available.
+ int cuda_compute_capability_major_;
+ int cuda_compute_capability_minor_;
+
+ int numa_node_;
+ int core_count_;
+ bool ecc_enabled_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(DeviceDescription);
+};
+
+namespace internal {
+
+// Helper class that builds a device description, given that it has a large
+// number of fields that would be easily confused in constructor form.
+class DeviceDescriptionBuilder {
+ public:
+ DeviceDescriptionBuilder();
+
+ // For descriptions of the following fields, see comments on the corresponding
+ // DeviceDescription::* accessors above.
+
+ void set_device_vendor(const string &value) {
+ device_description_->device_vendor_ = value;
+ }
+ void set_platform_version(const string &value) {
+ device_description_->platform_version_ = value;
+ }
+ void set_driver_version(const string &value) {
+ device_description_->driver_version_ = value;
+ }
+ void set_runtime_version(const string &value) {
+ device_description_->runtime_version_ = value;
+ }
+ void set_pci_bus_id(const string &value) {
+ device_description_->pci_bus_id_ = value;
+ }
+ void set_name(const string &value) { device_description_->name_ = value; }
+
+ void set_thread_dim_limit(const ThreadDim &value) {
+ device_description_->thread_dim_limit_ = value;
+ }
+ void set_block_dim_limit(const BlockDim &value) {
+ device_description_->block_dim_limit_ = value;
+ }
+
+ void set_blocks_per_core_limit(uint64 value) {
+ device_description_->blocks_per_core_limit_ = value;
+ }
+
+ void set_threads_per_core_limit(uint64 value) {
+ device_description_->threads_per_core_limit_ = value;
+ }
+ void set_threads_per_block_limit(uint64 value) {
+ device_description_->threads_per_block_limit_ = value;
+ }
+ void set_threads_per_warp(uint64 value) {
+ device_description_->threads_per_warp_ = value;
+ }
+
+ void set_registers_per_core_limit(uint64 value) {
+ device_description_->registers_per_core_limit_ = value;
+ }
+ void set_registers_per_block_limit(uint64 value) {
+ device_description_->registers_per_block_limit_ = value;
+ }
+ void set_registers_per_thread_limit(uint64 value) {
+ device_description_->registers_per_thread_limit_ = value;
+ }
+
+ void set_warp_alloc_granularity(uint64 value) {
+ device_description_->warp_alloc_granularity_ = value;
+ }
+ void set_register_alloc_granularity(uint64 value) {
+ device_description_->register_alloc_granularity_ = value;
+ }
+ void set_shared_memory_alloc_granularity(uint64 value) {
+ device_description_->shared_memory_alloc_granularity_ = value;
+ }
+
+ void set_device_address_bits(uint64 value) {
+ device_description_->device_address_bits_ = value;
+ }
+ void set_device_memory_size(uint64 value) {
+ device_description_->device_memory_size_ = value;
+ }
+
+ void set_shared_memory_per_core(int64 value) {
+ device_description_->shared_memory_per_core_ = value;
+ }
+ void set_shared_memory_per_block(int64 value) {
+ device_description_->shared_memory_per_block_ = value;
+ }
+
+ void set_clock_rate_ghz(float value) {
+ device_description_->clock_rate_ghz_ = value;
+ }
+
+ void set_cuda_compute_capability(int major, int minor) {
+ device_description_->cuda_compute_capability_major_ = major;
+ device_description_->cuda_compute_capability_minor_ = minor;
+ }
+
+ void set_numa_node(int value) { device_description_->numa_node_ = value; }
+ void set_core_count(int value) { device_description_->core_count_ = value; }
+ void set_ecc_enabled(bool value) {
+ device_description_->ecc_enabled_ = value;
+ }
+
+ // Returns a built DeviceDescription with ownership transferred to the
+ // caller. There are currently no restrictions on which fields must be set in
+ // order to build the descriptor.
+ //
+ // Once the description is built, this builder object should be discarded.
+ std::unique_ptr<DeviceDescription> Build() {
+ return std::move(device_description_);
+ }
+
+ private:
+ std::unique_ptr<DeviceDescription> device_description_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(DeviceDescriptionBuilder);
+};
+
+} // namespace internal
+
+// Returns whether the given thread_dim is acceptable given the limits described
+// in device_description. For detailed reasons for failing the predicate, enable
+// VLOG(2) for this module.
+bool ThreadDimOk(const DeviceDescription &device_description,
+ const ThreadDim &thread_dim);
+
+// [deprecated] Use MathUtil::CeilOfRatio directly instead.
+//
+// Equivalent to ceil(double(element_count) / threads_per_block).
+uint64 DivideCeil(uint64 x, uint64 y);
+
+// Calculate the number of threads/blocks required to process element_count
+// elements. Note that you can still end up with more threads than
+// element_count due to rounding, so kernels often start with an "is this
+// thread id in the element_count range?" test.
+void CalculateDimensionality(const DeviceDescription &device_description,
+ uint64 element_count, uint64 *threads_per_block,
+ uint64 *block_count);
+
+// Compute and return maximum blocks per core (occupancy) based on the
+// device description, some kernel characteristics and the number of threads per
+// block. If unable to compute occupancy, zero is returned.
+uint64 CalculateOccupancy(const DeviceDescription &device_description,
+ uint64 registers_per_thread,
+ uint64 shared_memory_per_block,
+ const ThreadDim &thread_dims);
+
+// Compute and return the maximum number of registers per thread which
+// achieves the target occupancy. If the target is not possible then
+// zero is returned.
+uint64 CalculateRegisterLimitForTargetOccupancy(
+ const DeviceDescription &device_description, uint64 shared_memory_per_block,
+ const ThreadDim &thread_dims, uint64 target_blocks_per_core);
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_DEVICE_DESCRIPTION_H_
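
DeviceDescription itself exposes no public setters; it is populated through the builder and then read back through the accessors or flattened with ToMap(). A rough usage sketch follows, with made-up field values standing in for what a platform-specific StreamExecutor implementation would normally fill in:

#include <iostream>
#include <memory>

#include "tensorflow/stream_executor/device_description.h"

namespace se = perftools::gputools;

int main() {
  // All values below are hypothetical; they are not read from any device.
  se::internal::DeviceDescriptionBuilder builder;
  builder.set_name("Example GPU");
  builder.set_device_vendor("NVIDIA Corporation");
  builder.set_threads_per_warp(32);
  builder.set_threads_per_block_limit(1024);
  builder.set_cuda_compute_capability(3, 5);
  std::unique_ptr<se::DeviceDescription> description = builder.Build();

  // ToMap() flattens the description into printable key/value pairs.
  std::unique_ptr<se::DeviceDescription::Map> map = description->ToMap();
  for (const auto &entry : *map) {
    std::cout << entry.first << ": " << entry.second << "\n";
  }
  return 0;
}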
diff --git a/tensorflow/stream_executor/device_memory.h b/tensorflow/stream_executor/device_memory.h
new file mode 100644
index 0000000000..9e88180316
--- /dev/null
+++ b/tensorflow/stream_executor/device_memory.h
@@ -0,0 +1,284 @@
+// Suite of types that represent device memory allocations. These are
+// allocated by the StreamExecutor interface, which produces values appropriate
+// for the underlying platform (whether it be CUDA or OpenCL).
+//
+// The untyped base class (like a device void*) is DeviceMemoryBase, which can
+// be specialized for a given allocation type (like a device T*) using
+// DeviceMemory<T>.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_DEVICE_MEMORY_H_
+#define TENSORFLOW_STREAM_EXECUTOR_DEVICE_MEMORY_H_
+
+#include <stddef.h>
+
+#include "tensorflow/stream_executor/lib/casts.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class StreamExecutor;
+
+// void*-analogous device memory allocation. For the typed variation, see
+// DeviceMemory<T>.
+//
+// This is effectively a two-tuple of a pointer and size; however, note that the
+// pointer may not be to the virtual address itself -- in OpenCL the pointer is
+// to a cl_mem handle that describes the device allocation. Therefore,
+// DeviceMemoryBase::opaque does not necessarily produce a pointer that can be
+// referenced directly, so use it with caution.
+//
+// Thread-compatible.
+class DeviceMemoryBase {
+ public:
+ // Default constructor instantiates a null-pointed, zero-sized device memory
+ // region. An opaque pointer may be provided -- see header for details on the
+ // opacity of that pointer.
+ explicit DeviceMemoryBase(void *opaque = nullptr, uint64 size = 0,
+ bool is_sub_buffer = false)
+ : opaque_(opaque), size_(size), is_sub_buffer_(is_sub_buffer) {}
+
+ // Returns whether the backing memory is the null pointer.
+ // A `== nullptr` convenience method is also provided.
+ bool is_null() const { return opaque_ == nullptr; }
+ bool operator==(std::nullptr_t other) const { return is_null(); }
+ bool operator!=(std::nullptr_t other) const { return !is_null(); }
+
+ // Provides a partial order between device memory values.
+ //
+ // This operator is provided so that this object can be used as a key in an
+ // ordered map.
+ bool operator<(const DeviceMemoryBase &other) const {
+ return opaque() < other.opaque();
+ }
+
+ // Returns the size, in bytes, for the backing memory.
+ uint64 size() const { return size_; }
+
+ // Warning: note that the pointer returned is not necessarily directly to
+ // device virtual address space, but is platform-dependent.
+ void *opaque() { return opaque_; }
+ const void *opaque() const { return opaque_; }
+
+ // Returns true if this is an offset into another primary allocation.
+ bool is_sub_buffer() const { return is_sub_buffer_; }
+
+ // Returns whether the two DeviceMemoryBase segments are identical (both in
+ // their opaque pointer and size).
+ bool IsSameAs(const DeviceMemoryBase &other) const {
+ return opaque() == other.opaque() && size() == other.size();
+ }
+
+ protected:
+ friend class StreamExecutor;
+
+ // Resets the internal values of the opaque pointer and number of bytes in the
+ // memory region, just as in the constructor.
+ void Reset(void *opaque, uint64 bytes) {
+ opaque_ = opaque;
+ size_ = bytes;
+ }
+
+ private:
+ void *opaque_; // Platform-dependent value representing allocated memory.
+ uint64 size_; // Size in bytes of this allocation.
+ bool is_sub_buffer_; // Is this a primary allocation or a sub-buffer?
+};
+
+// Typed wrapper around "void *"-like DeviceMemoryBase.
+//
+// For example, DeviceMemory<int> is a simple wrapper around DeviceMemoryBase
+// that represents one or more integers in Device memory.
+//
+// Thread-compatible.
+template <typename ElemT>
+class DeviceMemory final : public DeviceMemoryBase {
+ public:
+ // Default constructor instantiates a null-pointed, zero-sized memory region.
+ DeviceMemory() : DeviceMemoryBase(nullptr, 0) {}
+
+ // Typed device memory regions may be constructed from untyped device memory
+ // regions, this effectively amounts to a cast from a void*.
+ explicit DeviceMemory(const DeviceMemoryBase &other)
+ : DeviceMemoryBase(const_cast<DeviceMemoryBase &>(other).opaque(),
+ other.size(), other.is_sub_buffer()) {}
+
+ static constexpr size_t kElemSize = sizeof(ElemT);
+
+ // Returns the number of elements of type ElemT that constitute this
+ // allocation.
+ uint64 ElementCount() const { return size() / kElemSize; }
+
+ // Returns whether this is a single-element allocation.
+ bool IsScalar() const { return ElementCount() == 1; }
+
+ // Create a typed area of DeviceMemory with a given opaque pointer and the
+ // quantity of bytes in the allocation. This function is broken out to
+ // distinguish bytes from an element count.
+ static DeviceMemory<ElemT> MakeFromByteSize(void *opaque, uint64 bytes) {
+ return DeviceMemory<ElemT>(opaque, bytes);
+ }
+
+ // Resets the DeviceMemory data, in MakeFromByteSize fashion.
+ // This simply clobbers the prior values.
+ void ResetFromByteSize(void *opaque, uint64 bytes) {
+ // TODO(leary) when NVCC is eliminated we can add this check (and the
+ // logging include it requires).
+ // CHECK_EQ(0, bytes % kElemSize);
+ DeviceMemoryBase::Reset(opaque, bytes);
+ }
+
+ // ------------------------------------------------------------
+ // DO NOT USE - FASTR TEAM-INTERNAL FUNCTIONS
+ // Used internally by gcudacc.
+#ifdef __GCUDACC__
+ // Implicit conversion operators needed to support mixed mode. Since buffer
+ // sizes aren't used in the CUDA launching process, and since the constructed
+ // objects are all temporary, this is safe.
+ // Linter warning disabled as we require an implicit conversion.
+ DeviceMemory(const ElemT *opaque) : // NOLINT
+ DeviceMemoryBase(reinterpret_cast<void *>(const_cast<ElemT *>(opaque)),
+ 0) {}
+
+ operator ElemT *() { return reinterpret_cast<ElemT *>(opaque()); }
+ operator const ElemT *() {
+ return const_cast<const ElemT *>(reinterpret_cast<ElemT *>(opaque()));
+ }
+#endif
+ // ------------------------------------------------------------
+
+ protected:
+ // This constructor is solely used from derived classes; it is made protected
+ // because it accepts a byte-size instead of an element count, which could
+ // potentially be misused given the ElementCount() nature of this interface.
+ //
+ // In order to specify the desire to use byte size instead of element count
+ // explicitly, use MakeFromByteSize.
+ DeviceMemory(void *opaque, uint64 size) : DeviceMemoryBase(opaque, size) {}
+};
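
A small sketch of the byte-versus-element bookkeeping; the host array below only stands in for an opaque pointer, since a real value would come from a StreamExecutor allocation:

#include <iostream>

#include "tensorflow/stream_executor/device_memory.h"

namespace se = perftools::gputools;

int main() {
  // Stand-in backing store; this only exercises the wrapper's arithmetic,
  // no device work happens here.
  float backing[256];
  se::DeviceMemory<float> mem =
      se::DeviceMemory<float>::MakeFromByteSize(backing, sizeof(backing));
  std::cout << mem.size() << " bytes, "              // 1024
            << mem.ElementCount() << " elements\n";  // 256
  std::cout << (mem == nullptr ? "null" : "non-null") << "\n";
  return 0;
}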
+
+// A class to encapsulate the type and size of a dynamic shared memory
+// buffer. Because the buffer exists solely on the device and is not copyable
+// to the host, memory objects of this type do not maintain buffer pointers
+// on the host.
+template <typename ElemT>
+class SharedDeviceMemory final : public DeviceMemoryBase {
+ public:
+ explicit SharedDeviceMemory(uint64 elem_count)
+ : DeviceMemoryBase(nullptr, elem_count * kElemSize) {}
+
+ static constexpr size_t kElemSize = sizeof(ElemT);
+
+ // Returns the number of elements of type ElemT that constitute this
+ // allocation.
+ uint64 ElementCount() const { return size() / kElemSize; }
+
+ // Returns whether this is a single-element allocation.
+ bool IsScalar() const { return ElementCount() == 1; }
+};
+
+// Similar to the typed DeviceMemory, but is the unique owner of its
+// memory, if any. ScopedDeviceMemory is thread-compatible. It is also
+// movable and uncopyable to represent unique ownership.
+template <typename ElemT>
+class ScopedDeviceMemory {
+ public:
+ // Parameters:
+ // parent: Executor used to deallocate memory when this instance goes
+ // out of scope.
+ // value: Already-allocated device memory value for this scoped mechanism to
+ // deallocate. This memory must have been allocated by parent.
+ ScopedDeviceMemory(StreamExecutor *parent, DeviceMemoryBase value);
+
+ // Constructor overload that places a literal array into device memory
+ ScopedDeviceMemory(StreamExecutor *parent,
+ std::initializer_list<ElemT> values);
+
+ // Moves ownership of the memory from other to the constructed
+ // object.
+ //
+ // Postcondition: other == nullptr.
+ ScopedDeviceMemory(ScopedDeviceMemory &&other) noexcept:
+ ScopedDeviceMemory(other.parent_, other.Release()) {}
+
+ // Releases the memory that was provided in the constructor, through the
+ // "parent" StreamExecutor.
+ ~ScopedDeviceMemory();
+
+ // Moves ownership of the memory from other to this object.
+ //
+ // Postcondition: other == nullptr.
+ ScopedDeviceMemory& operator=(ScopedDeviceMemory &&other) {
+ Reset(other.Release());
+ parent_ = other.parent_;
+ return *this;
+ }
+
+ // Returns the memory that backs this scoped allocation converted to
+ // DeviceMemory<T> apparent type. This is useful for cases where the
+ // DeviceMemory must be passed by const-ref, as the ScopedDeviceMemory doesn't
+ // allow copying, for scoped-object-lifetime reasons.
+ const DeviceMemory<ElemT> &cref() const { return wrapped_; }
+
+ // Returns a pointer to the DeviceMemory<T> apparent type for use in mutable
+ // operations. The value returned should not be used outside the scope of this
+ // ScopedDeviceMemory object's lifetime.
+ DeviceMemory<ElemT> *ptr() { return &wrapped_; }
+ const DeviceMemory<ElemT> *ptr() const { return &wrapped_; }
+
+ // Smart-pointer-like operators for the wrapped DeviceMemory.
+ // This reference must not be used outside the lifetime of this
+ // ScopedDeviceMemory.
+ const DeviceMemory<ElemT> &operator*() const { return cref(); }
+ DeviceMemory<ElemT> *operator->() { return ptr(); }
+ const DeviceMemory<ElemT> *operator->() const { return ptr(); }
+ bool operator==(std::nullptr_t other) const { return wrapped_.is_null(); }
+ bool operator!=(std::nullptr_t other) const { return !wrapped_.is_null(); }
+
+ // Analogous to std::unique_ptr::reset, frees the existing memory held in
+ // this scoped memory container and replaces it with updated. Ownership
+ // of updated is transferred to this object.
+ void Reset(DeviceMemory<ElemT> updated);
+ void Reset(std::nullptr_t);
+
+ // Analogous to std::unique_ptr::release, releases ownership of the held
+ // memory and transfers it to the caller.
+ //
+ // Postcondition: *this == nullptr
+ DeviceMemory<ElemT> Release() {
+ auto tmp = wrapped_;
+ wrapped_.ResetFromByteSize(nullptr, 0);
+ return tmp;
+ }
+
+ private:
+ DeviceMemory<ElemT> wrapped_; // Value we wrap with scoped-release.
+ StreamExecutor *parent_; // See constructor.
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ScopedDeviceMemory);
+};
+
+// Host-side representation of packed-and-aligned vector datatypes on the device
+// side. Since these can appear in device kernel signatures, we support
+// launching kernels that take these datatypes in their signatures.
+
+struct Float2 {
+ float x, y;
+};
+
+struct Float4 {
+ Float2 xz, yw;
+};
+
+struct Double2 {
+ double x, y;
+};
+
+static_assert(sizeof(Float2) == 2 * sizeof(float), "Float2 must be packed");
+static_assert(sizeof(Float4) == 4 * sizeof(float), "Float4 must be packed");
+static_assert(sizeof(Double2) == 2 * sizeof(double), "Double2 must be packed");
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_DEVICE_MEMORY_H_
diff --git a/tensorflow/stream_executor/device_options.h b/tensorflow/stream_executor/device_options.h
new file mode 100644
index 0000000000..bd393a6efb
--- /dev/null
+++ b/tensorflow/stream_executor/device_options.h
@@ -0,0 +1,70 @@
+// Contains device-level options that can be specified at a platform level.
+// Example usage:
+// auto device_options = DeviceOptions::Default();
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_DEVICE_OPTIONS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_DEVICE_OPTIONS_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/platform/logging.h"
+
+namespace perftools {
+namespace gputools {
+
+// Indicates a set of options for a device's usage, which generally must be
+// provided at StreamExecutor device-initialization time.
+//
+// These are intended to be useful-but-not-mandatorily-supported options for
+// using devices on the underlying platform. Presently, if the option requested
+// is not available on the target platform, a warning will be emitted.
+struct DeviceOptions {
+ public:
+ // When it is observed that more memory has to be allocated for thread stacks,
+ // this flag prevents it from ever being deallocated. Potentially saves
+ // thrashing the thread stack memory allocation, but at the potential cost of
+ // some memory space.
+ static const unsigned kDoNotReclaimStackAllocation = 0x1;
+
+ // The following options refer to synchronization options when
+ // using SynchronizeStream or SynchronizeContext.
+
+ // Synchronize with spinlocks.
+ static const unsigned kScheduleSpin = 0x02;
+ // Synchronize with spinlocks that also call CPU yield instructions.
+ static const unsigned kScheduleYield = 0x04;
+ // Synchronize with a "synchronization primitive" (e.g. mutex).
+ static const unsigned kScheduleBlockingSync = 0x08;
+
+ static const unsigned kMask = 0xf; // Mask of all available flags.
+
+ // Constructs an or-d together set of device options.
+ explicit DeviceOptions(unsigned flags) : flags_(flags) {
+ CHECK((flags & kMask) == flags);
+ }
+
+ // Factory for the default set of device options.
+ static DeviceOptions Default() { return DeviceOptions(0); }
+
+ unsigned flags() const { return flags_; }
+
+ bool operator==(const DeviceOptions& other) const {
+ return flags_ == other.flags_;
+ }
+
+ bool operator!=(const DeviceOptions& other) const {
+ return !(*this == other);
+ }
+
+ string ToString() {
+ return flags_ == 0 ? "none" : "kDoNotReclaimStackAllocation";
+ }
+
+ private:
+ unsigned flags_;
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_DEVICE_OPTIONS_H_
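
Flags are plain bits OR'd together into the constructor; a quick sketch (whether a platform actually honors the combination is platform-dependent, per the comment above, and the chosen flags here are just an example):

#include <iostream>

#include "tensorflow/stream_executor/device_options.h"

namespace se = perftools::gputools;

int main() {
  // Hypothetical combination: keep thread-stack memory and block on sync.
  se::DeviceOptions options(se::DeviceOptions::kDoNotReclaimStackAllocation |
                            se::DeviceOptions::kScheduleBlockingSync);
  std::cout << "flags: 0x" << std::hex << options.flags() << "\n";  // 0x9
  std::cout << (options == se::DeviceOptions::Default()) << "\n";   // 0
  return 0;
}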
diff --git a/tensorflow/stream_executor/dnn.cc b/tensorflow/stream_executor/dnn.cc
new file mode 100644
index 0000000000..020de7f7bb
--- /dev/null
+++ b/tensorflow/stream_executor/dnn.cc
@@ -0,0 +1,297 @@
+#include "tensorflow/stream_executor/dnn.h"
+
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+namespace dnn {
+
+string ActivationModeString(ActivationMode mode) {
+ switch (mode) {
+ case ActivationMode::kSigmoid:
+ return "sigmoid";
+ case ActivationMode::kRelu:
+ return "relu";
+ case ActivationMode::kRelu6:
+ return "relu6";
+ case ActivationMode::kReluX:
+ return "reluX";
+ case ActivationMode::kTanh:
+ return "tanh";
+ default:
+ LOG(FATAL) << "Unknown activation_mode " << static_cast<int32>(mode);
+ }
+}
+
+string ElementwiseOperationString(ElementwiseOperation op) {
+ switch (op) {
+ case ElementwiseOperation::kAdd:
+ return "add";
+ case ElementwiseOperation::kMultiply:
+ return "multiply";
+ default:
+ LOG(FATAL) << "Unknown elementwise op " << static_cast<int32>(op);
+ }
+}
+
+string DataLayoutString(DataLayout layout) {
+ switch (layout) {
+ case DataLayout::kYXDepthBatch:
+ return "YXDepthBatch";
+ case DataLayout::kYXBatchDepth:
+ return "YXBatchDepth";
+ case DataLayout::kBatchYXDepth:
+ return "BatchYXDepth";
+ case DataLayout::kBatchDepthYX:
+ return "BatchDepthYX";
+ default:
+ LOG(FATAL) << "Unknown data layout " << static_cast<int32>(layout);
+ }
+}
+
+string FilterLayoutString(FilterLayout layout) {
+ switch (layout) {
+ case FilterLayout::kOutputInputYX:
+ return "OutputInputYX";
+ case FilterLayout::kInputYXOutput:
+ return "InputYXOutput";
+ case FilterLayout::kYXInputOutput:
+ return "YXInputOutput";
+ default:
+ LOG(FATAL) << "Unknown filter layout " << static_cast<int32>(layout);
+ }
+}
+
+// -- BatchDescriptor
+
+BatchDescriptor::BatchDescriptor()
+ : count_(0),
+ feature_map_count_(0),
+ height_(0),
+ width_(0),
+ value_max_(0.0),
+ value_min_(0.0),
+ layout_(DataLayout::kYXDepthBatch),
+ quantized_activation_mode_(QuantizedActivationMode::k8Bit) {}
+
+void BatchDescriptor::CloneFrom(const BatchDescriptor& other) {
+ count_ = other.count_;
+ feature_map_count_ = other.feature_map_count_;
+ height_ = other.height_;
+ width_ = other.width_;
+ value_max_ = other.value_max_;
+ value_min_ = other.value_min_;
+ layout_ = other.layout_;
+ quantized_activation_mode_ = other.quantized_activation_mode_;
+}
+
+string BatchDescriptor::ToString() const {
+ return port::Printf(
+ "{count: %lld feature_map_count: %lld height: %lld width: %lld "
+ "value_min: %f value_max: %f layout: %s}",
+ count_, feature_map_count_, height_, width_, value_min_, value_max_,
+ DataLayoutString(layout_).c_str());
+}
+
+string BatchDescriptor::ToShortString() const {
+ // All the constituent strings are less than 15 characters, so the
+ // small string optimization ensures that there will be at most one
+ // heap memory allocation.
+ string x = port::StrCat("x", width());
+ string y = port::StrCat("y", height());
+ string depth = port::StrCat("d", feature_map_count());
+ string batch = port::StrCat("b", count());
+
+ string suffix;
+ if (value_min() != value_max()) {
+ port::StrAppend(&suffix, "[", value_min(), ";", value_max(), "]");
+ }
+ if (quantized_activation_mode() == QuantizedActivationMode::k16Bit) {
+ suffix += "_16bit";
+ }
+
+ switch (layout()) {
+ case DataLayout::kYXDepthBatch:
+ return port::StrCat(y, x, depth, batch, suffix);
+ case DataLayout::kYXBatchDepth:
+ return port::StrCat(y, x, batch, depth, suffix);
+ case DataLayout::kBatchYXDepth:
+ return port::StrCat(batch, y, x, depth, suffix);
+ case DataLayout::kBatchDepthYX:
+ return port::StrCat(batch, depth, y, x, suffix);
+ default:
+ LOG(FATAL) << "Unknown layout " << static_cast<int32>(layout());
+ }
+}
+
+int64 BatchDescriptor::NodesPerFeatureMap() const { return width_ * height_; }
+
+int64 BatchDescriptor::NodesAcrossFeatureMaps() const {
+ return NodesPerFeatureMap() * feature_map_count_;
+}
+
+int64 BatchDescriptor::ElementCount() const {
+ return count_ * feature_map_count_ * height_ * width_;
+}
+
+int64 BatchDescriptor::FullyConnectedWeightCount(
+ const BatchDescriptor& input, const BatchDescriptor& output) {
+ return input.NodesAcrossFeatureMaps() * output.NodesAcrossFeatureMaps();
+}
+
+int64 BatchDescriptor::FullyConnectedBiasCount(const BatchDescriptor& output) {
+ return output.NodesAcrossFeatureMaps();
+}
+
+// -- FilterDescriptor
+
+FilterDescriptor::FilterDescriptor()
+ : output_feature_map_count_(0),
+ input_feature_map_count_(0),
+ input_filter_height_(0),
+ input_filter_width_(0),
+ layout_(FilterLayout::kOutputInputYX) {}
+
+FilterDescriptor::~FilterDescriptor() {}
+
+void FilterDescriptor::CloneFrom(const FilterDescriptor& other) {
+ set_output_feature_map_count(other.output_feature_map_count())
+ .set_input_feature_map_count(other.input_feature_map_count())
+ .set_input_filter_height(other.input_filter_height())
+ .set_input_filter_width(other.input_filter_width())
+ .set_layout(other.layout());
+}
+
+string FilterDescriptor::ToString() const {
+ return port::Printf(
+ "{output_feature_map_count: %lld input_feature_map_count: %lld "
+ "input_filter_height: %lld input_filter_width: %lld layout: %s}",
+ output_feature_map_count_, input_feature_map_count_, input_filter_height_,
+ input_filter_width_, FilterLayoutString(layout_).c_str());
+}
+
+string FilterDescriptor::ToShortString() const {
+ // All the constituent strings are less than 15 characters, so the
+ // small string optimization ensures that there will be at most one
+ // heap memory allocation.
+ string od = port::StrCat("od", output_feature_map_count_);
+ string id = port::StrCat("id", input_feature_map_count_);
+ string y = port::StrCat("y", input_filter_height_);
+ string x = port::StrCat("x", input_filter_width_);
+
+ switch (layout_) {
+ case FilterLayout::kOutputInputYX:
+ return port::StrCat(od, id, y, x);
+ case FilterLayout::kInputYXOutput:
+ return port::StrCat(id, y, x, od);
+ case FilterLayout::kYXInputOutput:
+ return port::StrCat(y, x, id, od);
+ default:
+ LOG(FATAL) << "Unknown layout " << static_cast<int32>(layout_);
+ }
+}
+
+int64 FilterDescriptor::ComputeWeightCount() const {
+ return output_feature_map_count_ * input_feature_map_count_ *
+ input_filter_height_ * input_filter_width_;
+}
+
+// -- ConvolutionDescriptor
+
+ConvolutionDescriptor::ConvolutionDescriptor()
+ : zero_padding_height_(0),
+ zero_padding_width_(0),
+ vertical_filter_stride_(1),
+ horizontal_filter_stride_(1) {}
+
+ConvolutionDescriptor::~ConvolutionDescriptor() {}
+
+string ConvolutionDescriptor::ToString() const {
+ return port::Printf(
+ "{zero_padding_height: %lld zero_padding_width: %lld "
+ "vertical_filter_stride: %lld horizontal_filter_stride: %lld}",
+ zero_padding_height_, zero_padding_width_, vertical_filter_stride_,
+ horizontal_filter_stride_);
+}
+
+string ConvolutionDescriptor::ToShortString() const {
+ return port::StrCat("py:", zero_padding_height_, "_px:", zero_padding_width_,
+ "_sy:", vertical_filter_stride_, "_sx:",
+ horizontal_filter_stride_);
+}
+
+// -- PoolingDescriptor
+
+PoolingDescriptor::PoolingDescriptor()
+ : mode_(dnn::PoolingMode::kMaximum),
+ window_height_(0),
+ window_width_(0),
+ vertical_padding_(0),
+ horizontal_padding_(0),
+ vertical_stride_(0),
+ horizontal_stride_(0) {}
+
+void PoolingDescriptor::CloneFrom(const PoolingDescriptor& other) {
+ mode_ = other.mode_;
+ window_height_ = other.window_height_;
+ window_width_ = other.window_width_;
+ vertical_padding_ = other.vertical_padding_;
+ horizontal_padding_ = other.horizontal_padding_;
+ vertical_stride_ = other.vertical_stride_;
+ horizontal_stride_ = other.horizontal_stride_;
+}
+
+string PoolingDescriptor::ToString() const {
+ const char* mode_string =
+ mode_ == dnn::PoolingMode::kMaximum ? "kMaximum" : "kAverage";
+ return port::Printf(
+ "{mode: %s window_height: %lld window_width: %lld vertical_stride: %lld "
+ "horizontal_stride: %lld vertical padding: %lld horizontal padding: "
+ "%lld}",
+ mode_string, window_height_, window_width_, vertical_stride_,
+ horizontal_stride_, vertical_padding_, horizontal_padding_);
+}
+
+string PoolingDescriptor::ToShortString() const {
+ return port::StrCat(mode_ == dnn::PoolingMode::kMaximum ? "max" : "avg",
+ "_y:", window_height_, "_x:", window_width_, "_py:",
+ vertical_padding_, "_px:", horizontal_padding_, "_sy:",
+ vertical_stride_, "_sx:", horizontal_stride_);
+}
+
+// -- NormalizeDescriptor
+
+NormalizeDescriptor::NormalizeDescriptor()
+ : bias_(0.0),
+ range_(0),
+ alpha_(0.0),
+ beta_(0.0),
+ wrap_around_(false),
+ segment_size_(0) {}
+
+void NormalizeDescriptor::CloneFrom(const NormalizeDescriptor& other) {
+ bias_ = other.bias_;
+ range_ = other.range_;
+ alpha_ = other.alpha_;
+ beta_ = other.beta_;
+ wrap_around_ = other.wrap_around_;
+ segment_size_ = other.segment_size_;
+}
+
+string NormalizeDescriptor::ToString() const {
+ return port::Printf(
+ "{bias: %f range: %d alpha: %f beta: %f wrap_around: %d "
+ "segment_size: %d}",
+ bias_, range_, alpha_, beta_, wrap_around_, segment_size_);
+}
+
+string NormalizeDescriptor::ToShortString() const {
+ return port::StrCat("bias:", bias_, "_range:", range_, "_alpha:", alpha_,
+ "_beta:", beta_, "_wrap:", wrap_around_, "_size:",
+ segment_size_);
+}
+
+} // namespace dnn
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/dnn.h b/tensorflow/stream_executor/dnn.h
new file mode 100644
index 0000000000..e737d1c78f
--- /dev/null
+++ b/tensorflow/stream_executor/dnn.h
@@ -0,0 +1,895 @@
+// Neural Net operation support for StreamExecutor instances.
+//
+// This is an abstract interface for a platform to optionally support common
+// neural net operations; it accommodates implementations such as the cudnn
+// library operations.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_DNN_H_
+#define TENSORFLOW_STREAM_EXECUTOR_DNN_H_
+
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/lib/array_slice.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+
+namespace dnn {
+
+// Describes how an input or output layer's data is formatted.
+// Specify int64 so there's no padding in BatchDescriptor.
+enum class DataLayout : int64 {
+ kYXDepthBatch = 0, // Same as dist_belief::DF_DEPTH_MAJOR.
+ kYXBatchDepth, // Same as dist_belief::DF_BATCH_MAJOR.
+ kBatchYXDepth, // Same as run_brain output, and tensorflow's layout.
+  kBatchDepthYX,   // cuDNN's NCHW layout, data laid out as image, feature
+ // maps, rows, columns.
+};
+
+// Returns a string representation of the given data layout.
+string DataLayoutString(DataLayout layout);
+
+// Specifies a quantization for activations in a given BatchDescriptor.
+enum class QuantizedActivationMode {
+ k8Bit = 1,
+ k16Bit = 2,
+ k32Bit = 4,
+};
+
+// Describes the dimensions that a layer consumes/produces.
+//
+// This is a matrix (height, width), its "depth" (feature_map_count),
+// how many of these matrices are present (count),
+// and the maximum and minimum values expected in the matrix (value_max,
+// value_min).
+// If input is quantized, all values greater
+// than value_max will be clipped to value_max and all values less than
+// value_min will be clipped to value_min.
+// When quantized output is dequantized no value will be greater than
+// value_max or less than value_min.
+//
+// Uses the named argument construction form:
+//
+// auto input_batch_dimensions =
+// BatchDescriptor().set_count(42).set_feature_map_count(7)...
+//
+// Details:
+//
+// For a convolutional layer, a single inference takes a 3-dimensional matrix
+// of input and produces a 3-dimensional matrix of output. We call the three
+// dimensions height, width and feature_map_count, where for an image, the
+// height and width correspond to the Y and X pixel indices, respectively, and
+// the feature_map_count corresponds to the RGB dimension of the input data.
+// Then the count indicates how many 3D matrices are being presented to be
+// processed at once; this corresponds to the neural network concept of
+// minibatch size.
+//
+// For a fully connected layer, it's better to put the nodes of the layer in
+// the feature_map_count, and leave the height and width as degenerate (== 1).
+// Count indicates how many input vectors (degenerate 3D matrices) are to be
+// processed.
+//
+// If unspecified, value_max and value_min default to 0.0.
+// If value_max == value_min the Stream will attempt to derive valid values -
+// for example the output of Relu6 activation will always be in the range
+// [0.0, 6.0].
+//
+// If unspecified, layout defaults to kYXDepthBatch.
+class BatchDescriptor {
+ public:
+ // Creates a "blank" batch descriptor, which should be initialized via the
+ // named argument helpers.
+ BatchDescriptor();
+
+ // Clones values from 'other' for initialization.
+ void CloneFrom(const BatchDescriptor& other);
+
+ string ToString() const;
+ string ToShortString() const;
+
+ // Accessors.
+ int64 count() const { return count_; }
+ int64 feature_map_count() const { return feature_map_count_; }
+ int64 height() const { return height_; }
+ int64 width() const { return width_; }
+ float value_max() const { return value_max_; }
+ float value_min() const { return value_min_; }
+ DataLayout layout() const { return layout_; }
+ QuantizedActivationMode quantized_activation_mode() const {
+ return quantized_activation_mode_;
+ }
+
+ // Named-argument helpers for avoiding user error during construction.
+ BatchDescriptor& set_count(int64 value) {
+ count_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_feature_map_count(int64 value) {
+ feature_map_count_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_height(int64 value) {
+ height_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_width(int64 value) {
+ width_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_value_max(float value) {
+ value_max_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_value_min(float value) {
+ value_min_ = value;
+ return *this;
+ }
+ BatchDescriptor& set_layout(DataLayout layout) {
+ layout_ = layout;
+ return *this;
+ }
+ BatchDescriptor& set_quantized_activation_mode(
+ QuantizedActivationMode quantized_activation_mode) {
+ quantized_activation_mode_ = quantized_activation_mode;
+ return *this;
+ }
+
+ // Return the number of nodes in a single feature map.
+ int64 NodesPerFeatureMap() const;
+
+ // Return the number of nodes across all feature maps. Note that this is not
+ // affected by the batch count.
+ int64 NodesAcrossFeatureMaps() const;
+
+ // Returns the number of elements (e.g. RGB pixel values) required to hold a
+ // given batch descriptor, given a no-padding assumption. Note that this is
+ // affected by the batch count.
+ int64 ElementCount() const;
+
+ // Return the number of weights required to fully connect a layer with
+ // dimensions given by the 'input' descriptor with a layer with dimensions
+ // given by the 'output' descriptor.
+ static int64 FullyConnectedWeightCount(const BatchDescriptor& input,
+ const BatchDescriptor& output);
+
+ // Return the number of biases required to fully connect to an output layer
+ // with dimensions given the 'output' descriptor.
+ static int64 FullyConnectedBiasCount(const BatchDescriptor& output);
+
+ private:
+ int64 count_;
+ int64 feature_map_count_;
+ int64 height_;
+ int64 width_;
+ float value_max_;
+ float value_min_;
+ DataLayout layout_;
+ QuantizedActivationMode quantized_activation_mode_;
+};
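
A sketch of the named-argument form for a hypothetical minibatch of 32 RGB 28x28 images in the tensorflow-style layout; the derived counts follow directly from the accessors above:

#include <iostream>

#include "tensorflow/stream_executor/dnn.h"

namespace dnn = perftools::gputools::dnn;

int main() {
  // Illustrative dimensions only; nothing here touches a device.
  dnn::BatchDescriptor input;
  input.set_count(32)
      .set_height(28)
      .set_width(28)
      .set_feature_map_count(3)
      .set_layout(dnn::DataLayout::kBatchYXDepth);

  std::cout << input.ToShortString() << "\n";       // "b32y28x28d3"
  std::cout << input.NodesPerFeatureMap() << "\n";  // 28 * 28 = 784
  std::cout << input.ElementCount() << "\n";        // 32 * 3 * 784 = 75264
  return 0;
}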
+
+// Describes how a filter is laid out in the memory.
+// Specify int64 so there's no padding in FilterDescriptor.
+enum class FilterLayout : int64 {
+ kOutputInputYX = 0, // cuDNN's default filter layout, laid out as:
+ // (major) output feature maps >> input feature maps >>
+ // rows >> columns (minor).
+ kInputYXOutput, // Same as dist_belief's default filter layout.
+ kYXInputOutput, // Same as tensorflow's default filter layout.
+};
+
+// Returns a string representation of the given filter layout.
+string FilterLayoutString(FilterLayout layout);
+
+// Describes a filter for the convolution. This is the "window" from
+// height-by-width patches of each of the feature maps in the input layer to the
+// cells within the output feature map.
+//
+// Uses the named argument construction form:
+//
+// FilterDescriptor filter_dimensions;
+// filter_dimensions
+// .set_output_feature_map_count(42)
+// .set_input_feature_map_count(7)
+// ...
+//
+// Arguments:
+// - output_feature_map_count: number of feature maps in the output layer.
+// - input_feature_map_count: number of feature maps in the input layer (from
+// which the filter patch is taken).
+// - input_filter_height: "height" number of neurons used in the sliding window
+// over the input layer.
+// - input_filter_width: "width" number of neurons used in the sliding window
+// over the input layer.
+//
+// Sometimes names like "filter input height" are referred to by synonymous
+// terminology, such as "kernel y size".
+//
+// If unspecified, layout defaults to kOutputInputYX.
+class FilterDescriptor {
+ public:
+ // By default construction, all dimensions are set to zero, so they should all
+ // be populated by the user via the named-argument helpers below. (See class
+ // comment for details.)
+ FilterDescriptor();
+
+ ~FilterDescriptor();
+
+ // Named-argument helpers for avoiding user error during construction.
+ FilterDescriptor& set_output_feature_map_count(int64 value) {
+ output_feature_map_count_ = value;
+ return *this;
+ }
+ FilterDescriptor& set_input_feature_map_count(int64 value) {
+ input_feature_map_count_ = value;
+ return *this;
+ }
+ FilterDescriptor& set_input_filter_height(int64 value) {
+ input_filter_height_ = value;
+ return *this;
+ }
+ FilterDescriptor& set_input_filter_width(int64 value) {
+ input_filter_width_ = value;
+ return *this;
+ }
+ FilterDescriptor& set_layout(FilterLayout layout) {
+ layout_ = layout;
+ return *this;
+ }
+
+ void CloneFrom(const FilterDescriptor& other);
+
+ string ToString() const;
+ string ToShortString() const;
+
+ // Returns the number of weights required as parameters for a convolution
+ // using this filter descriptor.
+ int64 ComputeWeightCount() const;
+
+ // Returns the number of biases required as parameters for a convolution using
+ // this filter descriptor.
+ int64 bias_count() const { return output_feature_map_count_; }
+
+ int64 output_feature_map_count() const { return output_feature_map_count_; }
+ int64 input_feature_map_count() const { return input_feature_map_count_; }
+ int64 input_filter_height() const { return input_filter_height_; }
+ int64 input_filter_width() const { return input_filter_width_; }
+ FilterLayout layout() const { return layout_; }
+
+ private:
+ int64 output_feature_map_count_;
+ int64 input_feature_map_count_;
+ int64 input_filter_height_;
+ int64 input_filter_width_;
+ FilterLayout layout_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(FilterDescriptor);
+};
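
For example, a hypothetical 5x5 filter bank mapping 3 input feature maps to 64 output feature maps carries 64 * 3 * 5 * 5 = 4800 weights and 64 biases:

#include <iostream>

#include "tensorflow/stream_executor/dnn.h"

namespace dnn = perftools::gputools::dnn;

int main() {
  // Illustrative dimensions only; nothing here touches a device.
  dnn::FilterDescriptor filter;
  filter.set_output_feature_map_count(64)
      .set_input_feature_map_count(3)
      .set_input_filter_height(5)
      .set_input_filter_width(5);
  std::cout << filter.ComputeWeightCount() << "\n";  // 4800
  std::cout << filter.bias_count() << "\n";          // 64
  return 0;
}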
+
+// Describes a convolution.
+//
+// Uses the named argument construction form:
+//
+// ConvolutionDescriptor convolution_dimensions;
+// convolution_dimensions
+// .set_vertical_filter_stride(2)
+// .set_horizontal_filter_stride(2)
+// ...
+//
+// Arguments:
+// - zero_padding_height: padding of the "y dimension" of the input data. Note
+// that this is different from the height of the filter.
+//   - zero_padding_width: analogous to the height above, but in the "x
+// dimension".
+// - vertical_filter_stride: the convolution slides a 2-dimensional window of
+// filter-height-by-filter-width over the input layer -- the center of that
+// window is moved in the "y dimension" according to this stride value.
+// - horizontal_filter_stride: analogous to the vertical stride above, but in
+// the "x dimension".
+class ConvolutionDescriptor {
+ public:
+ // By default construction, there is no zero-padding and the filter stride is
+ // 1x1 (centering the filter on every cell in the input layer's
+ // width-by-height area).
+ ConvolutionDescriptor();
+ ~ConvolutionDescriptor();
+
+ string ToString() const;
+ string ToShortString() const;
+
+ ConvolutionDescriptor& set_zero_padding_height(int64 value) {
+ zero_padding_height_ = value;
+ return *this;
+ }
+ ConvolutionDescriptor& set_zero_padding_width(int64 value) {
+ zero_padding_width_ = value;
+ return *this;
+ }
+ ConvolutionDescriptor& set_vertical_filter_stride(int64 value) {
+ vertical_filter_stride_ = value;
+ return *this;
+ }
+ ConvolutionDescriptor& set_horizontal_filter_stride(int64 value) {
+ horizontal_filter_stride_ = value;
+ return *this;
+ }
+
+ int64 zero_padding_height() const { return zero_padding_height_; }
+ int64 zero_padding_width() const { return zero_padding_width_; }
+ int64 vertical_filter_stride() const { return vertical_filter_stride_; }
+ int64 horizontal_filter_stride() const { return horizontal_filter_stride_; }
+
+ private:
+ int64 zero_padding_height_;
+ int64 zero_padding_width_;
+ int64 vertical_filter_stride_;
+ int64 horizontal_filter_stride_;
+ // TODO(leary) cudnn provides these fields, but need to characterize what
+ // their effect is -- they may be boolean rather than integral.
+ // int64 upscale_input_x;
+ // int64 upscale_input_y;
+};
+
+// A patch of values in the input can be pooled via either a max or an average
+// operation.
+// Specify int64 so there's no padding in PoolingDescriptor.
+enum class PoolingMode : int64 {
+ kMaximum,
+ kAverage,
+};
+
+// Describes a pooling operation to be enqueued onto a stream via a platform's
+// DnnSupport.
+//
+// TODO(broune): describe how padding works and what happens if the
+// window height/width is not divisible by the vertical/horizontal
+// stride.
+//
+// Arguments:
+// pooling_mode: pooling operator to use on the input patch
+// window_height: height of input window
+// window_width: width of input window
+// vertical_stride: vertical delta for center of the input patch
+// horizontal_stride: horizontal delta for center of the input patch
+class PoolingDescriptor {
+ public:
+ PoolingDescriptor();
+
+ PoolingDescriptor& set_pooling_mode(PoolingMode value) {
+ mode_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_window_height(int64 value) {
+ window_height_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_window_width(int64 value) {
+ window_width_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_vertical_padding(int64 value) {
+ vertical_padding_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_horizontal_padding(int64 value) {
+ horizontal_padding_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_vertical_stride(int64 value) {
+ vertical_stride_ = value;
+ return *this;
+ }
+ PoolingDescriptor& set_horizontal_stride(int64 value) {
+ horizontal_stride_ = value;
+ return *this;
+ }
+
+ void CloneFrom(const PoolingDescriptor& other);
+
+ string ToString() const;
+ string ToShortString() const;
+
+ PoolingMode mode() const { return mode_; }
+ int64 window_height() const { return window_height_; }
+ int64 window_width() const { return window_width_; }
+ int64 vertical_padding() const { return vertical_padding_; }
+ int64 horizontal_padding() const { return horizontal_padding_; }
+ int64 vertical_stride() const { return vertical_stride_; }
+ int64 horizontal_stride() const { return horizontal_stride_; }
+
+ private:
+ PoolingMode mode_;
+ int64 window_height_;
+ int64 window_width_;
+ int64 vertical_padding_;
+ int64 horizontal_padding_;
+ int64 vertical_stride_;
+ int64 horizontal_stride_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(PoolingDescriptor);
+};
+
+// Describes a dist_belief local response normalization.
+// The normalization equation is:
+// y_i = x_i / (bias + alpha * (sum_j_{i - range}^{i + range} x_j^2)) ^ beta
+// where x_i is the input in feature map i, y_i is the output.
+// Each feature map is split into segment_size segments for performing the
+// sum_j_. If wrap_around is true, the sum_j_ for y_i on the left and right of
+// a segment wrap around at the edges of the segment, if wrap_around is false
+// zeros are inserted instead.
+class NormalizeDescriptor {
+ public:
+ NormalizeDescriptor();
+
+ NormalizeDescriptor& set_bias(float bias) {
+ bias_ = bias;
+ return *this;
+ }
+
+ NormalizeDescriptor& set_range(int32 range) {
+ range_ = range;
+ return *this;
+ }
+
+ NormalizeDescriptor& set_alpha(float alpha) {
+ alpha_ = alpha;
+ return *this;
+ }
+
+ NormalizeDescriptor& set_beta(float beta) {
+ beta_ = beta;
+ return *this;
+ }
+
+ NormalizeDescriptor& set_wrap_around(bool wrap_around) {
+ wrap_around_ = wrap_around;
+ return *this;
+ }
+
+ NormalizeDescriptor& set_segment_size(int32 segment_size) {
+ segment_size_ = segment_size;
+ return *this;
+ }
+
+ void CloneFrom(const NormalizeDescriptor& other);
+
+ string ToString() const;
+ string ToShortString() const;
+
+ float bias() const { return bias_; }
+ int32 range() const { return range_; }
+ float alpha() const { return alpha_; }
+ float beta() const { return beta_; }
+ bool wrap_around() const { return wrap_around_; }
+ int32 segment_size() const { return segment_size_; }
+
+ private:
+ float bias_;
+ int32 range_;
+ float alpha_;
+ float beta_;
+ bool wrap_around_;
+ int32 segment_size_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(NormalizeDescriptor);
+};
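
A scalar reference of the normalization formula above, ignoring segmenting and wrap-around and using made-up parameter values, just to make the equation concrete:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // Hypothetical normalization parameters; x holds one value per feature map.
  const float bias = 1.0f, alpha = 1e-4f, beta = 0.75f;
  const int range = 2;
  const std::vector<float> x = {1, 2, 3, 4, 5, 6, 7, 8};
  for (int i = 0; i < static_cast<int>(x.size()); ++i) {
    float sum = 0.0f;  // sum_{j = i - range}^{i + range} x_j^2; out-of-range
                       // terms contribute zero (no wrap-around).
    for (int j = std::max(0, i - range);
         j <= std::min(static_cast<int>(x.size()) - 1, i + range); ++j) {
      sum += x[j] * x[j];
    }
    std::printf("y[%d] = %f\n", i, x[i] / std::pow(bias + alpha * sum, beta));
  }
  return 0;
}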
+
+// Describes a kind of non-linearity (threshold-like mathematical function).
+enum class ActivationMode {
+ kSigmoid,
+ // Rectified linear activation: f(x) = x < 0 ? 0 : x
+ kRelu,
+ // Rectified linear activation, where upper maximum is 6.0.
+ kRelu6,
+ // Rectified linear activation, where upper maximum specified by
+ // BatchDescriptor::value_max().
+ kReluX,
+ kTanh,
+};
+
+// Returns a string representation of the given activation mode.
+string ActivationModeString(ActivationMode mode);
+
+// Describes the operation that DoElementwiseOperation should perform on its
+// inputs.
+enum class ElementwiseOperation {
+ kAdd,
+ kMultiply
+};
+
+string ElementwiseOperationString(ElementwiseOperation op);
+
+// Suite of operations typically used for implementing Deep/Convolutional Neural
+// Nets.
+class DnnSupport {
+ public:
+ DnnSupport() {}
+ virtual ~DnnSupport() {}
+
+ virtual port::Status Init() = 0;
+
+ // Enqueues a single-precision convolution operation onto the stream.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'convolve' operation
+ // should be enqueued onto.
+ // input_descriptor: dimensions of the input layer.
+ // input_data: un-owned device memory region which contains the
+ // convolution input.
+ // filter_descriptor: dimensions of the convolution filter.
+  //  weights: coefficients for the convolution filter; these are multiplied
+  //    against values in the input that the filter convolves over.
+ // convolution_descriptor: stride of the convolution filter.
+ // output_descriptor: dimensions of the output layer.
+ // output_data: un-owned device memory region in which to place the
+ // convolution result.
+ //
+ // input_descriptor, filter_descriptor, convolution_descriptor and
+ // output_descriptor together specify exactly how the convolution is aligned
+ // with the input data:
+ //
+ // * (input dimensions - filter size + 1) / filter stride == output dimensions
+ // corresponds to dist_belief padding = VALID, i.e. the input is not padded.
+ // * input dimensions / filter stride == output dimensions
+ // corresponds to dist_belief padding = SAME, i.e. input and output are the
+ // same size - this requires padding the input.
+  // * (input dimensions + filter size - 1) / filter stride == output dimensions
+  //   corresponds to dist_belief padding = FULL, i.e. the output is sized so
+  //   that if the inverse of the filter is applied to the output in VALID mode
+  //   the result is the same size as the input - this requires even more
+  //   padding of the input.
+ virtual bool DoConvolve(
+ Stream* stream, const dnn::BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const dnn::FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<float>* output_data) = 0;
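+
+  // For intuition, an illustrative one-dimensional example with arbitrary
+  // sizes: with an input dimension of 5, a filter size of 3 and a filter
+  // stride of 1, VALID yields an output dimension of (5 - 3 + 1) / 1 = 3 (no
+  // padding), SAME yields 5 / 1 = 5 (the input is padded by 1 on each side),
+  // and FULL yields (5 + 3 - 1) / 1 = 7 (the input is padded by 2 on each
+  // side).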
+
+ // Enqueues a double-precision convolution operation onto the stream.
+ // See DoConvolve above for argument details.
+ virtual bool DoConvolve(
+ Stream* stream, const dnn::BatchDescriptor& batch_descriptor,
+ const DeviceMemory<double>& input_data,
+ const dnn::FilterDescriptor& filter_descriptor,
+ const DeviceMemory<double>& filter_data,
+ const dnn::ConvolutionDescriptor& convolution_descriptor,
+ const dnn::BatchDescriptor& output_descriptor,
+ DeviceMemory<double>* output_data) = 0;
+
+ // Variation of the above with the weight matrix split into two matrices.
+ // first_weights: Coefficients of the first matrix.
+ // second_weights: Coefficients of the second matrix.
+ // depth_multiplier: specifies the columns of the first matrix and rows
+ // of the second one - first_weights columns = depth_multiplier,
+ // second_weights rows = depth_multiplier *
+ // filter_descriptor.input_feature_map_count().
+ // see go/separable for documentation on separable convolutions.
+ virtual bool DoSeparableConvolve(
+ Stream* stream, const BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const FilterDescriptor& filter_descriptor, int depth_multiplier,
+ const DeviceMemory<float>& first_weights,
+ const DeviceMemory<float>& second_weights,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const BatchDescriptor& output_descriptor,
+ DeviceMemory<float>* output_data) = 0;
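+
+  // Illustrative shape sketch with arbitrary example sizes: if
+  // filter_descriptor.input_feature_map_count() is 8 and depth_multiplier is
+  // 4, then first_weights has 4 columns and second_weights has 4 * 8 = 32
+  // rows.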
+
+ // Enqueues a single-precision backward convolution (for data) operation onto
+ // the stream.
+ //
+ // Arguments:
+ // stream: borrowed pointer to the stream that the 'convolve' operation
+ // should be enqueued onto.
+ // filter_descriptor: dimensions of the convolution filter.
+ // filter_data: coefficients for the convolution filter.
+  //  output_descriptor: dimensions of the output gradients, which are the same
+  //    as the dimensions of the output.
+ // backward_output_data: un-owned device memory region which contains the
+ // backprop of the output.
+ // convolution_descriptor: stride of the convolution filter.
+ // input_descriptor: dimensions of the input layer.
+ // backward_input_data: un-owned device memory region in which to place the
+ // backprop of the input.
+ virtual bool DoConvolveBackwardData(
+ Stream* stream, const FilterDescriptor& filter_descriptor,
+ const DeviceMemory<float>& filter_data,
+ const BatchDescriptor& output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const BatchDescriptor& input_descriptor,
+ DeviceMemory<float>* backward_input_data) = 0;
+
+  // Enqueues a single-precision backward convolution (for filter) operation
+  // onto the stream.
+ //
+ // Arguments:
+ // stream: borrowed pointer to the stream that the 'convolve' operation
+ // should be enqueued onto.
+ // input_descriptor: dimensions of the input layer.
+ // input_data: un-owned device memory region which contains the
+ // convolution input.
+  //  output_descriptor: dimensions of the output gradients, which are the same
+  //    as the dimensions of the output.
+ // backward_output_data: un-owned device memory region which contains the
+ // backprop of the output.
+ // convolution_descriptor: stride of the convolution filter.
+ // filter_descriptor: dimensions of the convolution filter.
+ // backward_filter_data: un-owned device memory region in which to place the
+ // backprop of the filter.
+ virtual bool DoConvolveBackwardFilter(
+ Stream* stream, const BatchDescriptor& input_descriptor,
+ const DeviceMemory<float>& input_data,
+ const BatchDescriptor& output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const ConvolutionDescriptor& convolution_descriptor,
+ const FilterDescriptor& filter_descriptor,
+ DeviceMemory<float>* backward_filter_data) = 0;
+
+ // Fully connects the "nodes" (float values) in input_data with
+ // shape input_dimensions to output_data with output_dimensions
+ // using provided weights. This is equivalent to computing a matrix
+ // product, hence the name MatMul.
+ //
+ // A BatchDescriptor has four dimensions: batch, y, x, depth. Matrix products
+ // happen in two dimensions. To get down to two dimensions, we consider the
+ // input y, x and depth dimension as one combined dimension T. For now,
+ // assume that the output height and width are 1 and let OD be the output
+ // depth.
+ //
+ // There are three device memory buffers passed in to this
+ // function. We can now view all three as matrices:
+ //
+ // input_data: A batch x T matrix
+ // weights: A T x OD matrix
+ // output_data: A batch x OD matrix
+ //
+ // This function then computes the matrix product of input_data and
+ // weights and writes the result into output_data.
+ //
+ // Here the weights buffer is in row major order, i.e. the first OD
+ // entries in weights are the first row, the second OD entries in
+ // weights are the second row and so on.
+ //
+ // The case for output width*height > 1 is more complicated. Let K =
+ // OY * OX where OY is the output height and OX is the output
+  // width. Then weights is divided into K sub-arrays W_i, for
+  // i = 0, ..., K-1, each representing a T x OD matrix. This function
+ // then computes the K matrix multiplications of input_data with
+ // each W_i. This creates K matrices with dimensions batch x
+ // OD. These K matrices are concatenated horizontally to form one
+ // larger matrix with dimensions batch x (K*OD); note that this is
+ // not the same as concatenating the bytes of the matrices. The
+ // combined matrix can then be interpreted as a tensor with
+ // dimensions (batch, OY, OX, OD). If the output tensor format is
+ // not kBatchYXDepth, this function would then need to arrange for
+ // the output to be in the requested layout, if that is
+ // supported. Note that the case K=1 is equivalent to the
+ // description above. It is recommended to prefer the case K=1.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'fully connect' operation
+ // should be enqueued onto.
+ // output_data: un-owned device memory region in which to place the
+ // fully connected result.
+ virtual bool DoMatMul(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& weights,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) = 0;
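+
+  // Illustrative shape sketch with arbitrary example sizes: for a batch of 32
+  // inputs with y = x = 1 and depth = 64, T = 1 * 1 * 64 = 64. With an output
+  // depth OD = 10, the three buffers are viewed as:
+  //   input_data:  a 32 x 64 matrix
+  //   weights:     a 64 x 10 matrix (row major, so the first 10 entries form
+  //                row 0)
+  //   output_data: a 32 x 10 matrix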
+
+ // Version of DoMatMul that uses pre-quantized 8 bit weights.
+ // weight_scales specifies the scaling of each column of weights:
+ // original float weight[row * num_columns + column] =
+  //      quantized_weight[row * num_columns + column] * weight_scales[column].
+ virtual bool DoMatMulQuantized(Stream* stream,
+ const DeviceMemory<float>& input_data,
+ const DeviceMemory<int8>& quantized_weights,
+ const DeviceMemory<float>& weight_scales,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Version of DoMatMul that uses pre-quantized 16 bit weights.
+ // weight_scales specifies the scaling of each column of weights:
+ // original float weight[row * num_columns + column] =
+  //      quantized_weight[row * num_columns + column] * weight_scales[column].
+ virtual bool DoMatMulQuantized(Stream* stream,
+ const DeviceMemory<float>& input_data,
+ const DeviceMemory<int16>& quantized_weights,
+ const DeviceMemory<float>& weight_scales,
+ const dnn::BatchDescriptor& input_dimensions,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Adds biases to the feature maps in input_data producing
+ // output_data. input_data can equal output_data, but must not
+ // partially overlap it.
+ //
+  // Let K = count() * height() * width() and N = feature_map_count()
+  // of 'dimensions'. Then input_data contains K*N values and biases
+  // contains N values. We can thus logically consider input_data to
+  // contain K vectors of N elements each. This function adds biases
+  // to each of those K vectors.
+ //
+ // TODO(broune): This works differently when width() * height() > 1
+ // and the call to ThenBiasAdd() follows a call to ThenMatMul(). In
+ // that case there should be width() * height() *
+ // feature_map_count() biases, but this is not implemented on all
+ // StreamExecutors.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'bias add' operation
+ // should be enqueued onto.
+ // input_data: un-owned device memory region containing the input.
+ // biases: un-owned device memory region containing biases to add to the
+ // input.
+ // dimensions: dimensions of input_data and output_data.
+ // output_data: un-owned device memory region in which to place the result.
+ virtual bool DoBiasAdd(Stream* stream, const DeviceMemory<float>& input_data,
+ const DeviceMemory<float>& biases,
+ const dnn::BatchDescriptor& dimensions,
+ DeviceMemory<float>* output_data) = 0;
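+
+  // Illustrative sketch with arbitrary example sizes: for dimensions with
+  // count() = 2, height() = width() = 1 and feature_map_count() = 3, we have
+  // K = 2 and N = 3, so input_data holds 6 values viewed as 2 vectors of 3
+  // elements each, and the 3 biases are added to each of those vectors.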
+
+ // Performs a forward pooling operation on input_data, writing to
+ // output_data. See PoolingDescriptor for how to configure the
+ // pooling operation.
+ //
+ // Pooling happens as a window that moves across the Y and X
+ // dimensions of input_data, where each position of the window
+ // yields one output value. E.g. for max pooling, the computed value
+ // is the maximum element in the window. The operation is applied
+ // independently to each batch and at each feature map (depth), so
+ // that the output depth and feature_map_count are the same as for
+ // the input. The output width and height can be different.
+ //
+ // See PoolingDescriptor for how to configure the pooling operation.
+ virtual bool DoPoolForward(Stream* stream,
+ const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) = 0;
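+
+  // Illustrative sketch with arbitrary example sizes: max pooling with a 2x2
+  // window and a stride of 2 in both dimensions maps an 8x8 input feature map
+  // to a 4x4 output, each output value being the maximum of the corresponding
+  // 2x2 input window; batch and feature_map_count are unchanged.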
+
+ // Performs differentiation of the pooling operation.
+ virtual bool DoPoolBackward(Stream* stream,
+ const dnn::PoolingDescriptor& pooling_dimensions,
+ const dnn::BatchDescriptor& input_dimensions,
+ const DeviceMemory<float>& input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ const DeviceMemory<float>& output_data,
+ const DeviceMemory<float>& input_diff_data,
+ DeviceMemory<float>* output_diff_data) = 0;
+
+ // Applies local response normalization to all of the values
+ // held on the device in 'input_data'.
+ virtual bool DoNormalize(Stream* stream,
+ const dnn::NormalizeDescriptor& normalize_descriptor,
+ const DeviceMemory<float>& input_data,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Applies an activation function (see ActivationMode) to all of the values
+ // held on the device in 'input_data', whose dimensions are described by
+ // 'dimensions'.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'activate' operation
+ // should be enqueued onto.
+ // activation_mode: Type of activation to perform.
+ // input_data: un-owned device memory region which contains the
+ // activate input.
+ // output_data: un-owned device memory region in which to place the
+ // activate result.
+ virtual bool DoActivate(Stream* stream, ActivationMode activation_mode,
+ const BatchDescriptor& dimensions,
+ const DeviceMemory<float>& input_data,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Concatenates several layers into one, by concatenating the depth of each
+ // layer at matching x and y coordinates.
+ // The inputs must all have the same width and height, the output will have
+ // the same width and height as the inputs and its depth will be the sum of
+ // the input depths.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'depth concatenate'
+ // operation should be enqueued onto.
+ // input_dimensions: The dimensions of each input.
+ // input_data: un-owned device memory region which contains the
+ // input data for each input layer.
+ // output_data: un-owned device memory region in which to place the
+ // depth concatenate result.
+ virtual bool DoDepthConcatenate(
+ Stream* stream, port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Computes the specified operation (e.g. addition or multiplication)
+ // between corresponding elements in the inputs and stores the result in the
+ // output element.
+ // The inputs and output must all have the same dimensions, but may have
+ // different quantization parameters (min_value and max_value).
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'elementwise operation'
+ // should be enqueued onto.
+ // operation: The operation to perform.
+ // input_dimensions: The dimensions of each input.
+ // input_data: un-owned device memory region which contains the
+ // input data for each input layer.
+ // output_dimensions: The dimensions of the output.
+ // output_data: un-owned device memory region in which to place the
+ // operation result.
+ virtual bool DoElementwiseOperate(
+ Stream* stream, ElementwiseOperation operation,
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float>*> input_data,
+ const dnn::BatchDescriptor& output_dimensions,
+ DeviceMemory<float>* output_data) = 0;
+
+ // Enqueues an asynchronous memcpy of the *quantized* output of a layer (that
+ // is, bytes instead of scaled floats) into 'host_dst' if they are available
+ // for the underlying DNN implementation. If this quantized output is not
+ // available, false is returned, which will place 'stream' into an error
+ // state.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'quantized memcpy'
+ // operation should be enqueued onto.
+ // gpu_unquantized_src: the device memory that contains the unquantized data
+ // -- this data should also have a corresponding quantized representation
+ // on the device for this operation to succeed.
+  //  host_dst: un-owned host memory region that is mutated in place;
+  //    it is clobbered by the values in 'gpu_unquantized_src' when the enqueued
+  //    (asynchronous) memcpy operation is performed.
+ // TODO(wgulland) Merge all these versions of DoMemcpyD2HQuantized.
+ virtual bool DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& gpu_unquantized_src,
+ port::MutableArraySlice<uint8> host_dst) = 0;
+
+ // As above, but for 16-bit values.
+ virtual bool DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& gpu_unquantized_src,
+ port::MutableArraySlice<uint16> host_dst) = 0;
+
+ // As above, but for signed 32-bit values.
+ virtual bool DoMemcpyD2HQuantized(
+ Stream* stream, const DeviceMemory<float>& gpu_unquantized_src,
+ port::MutableArraySlice<int32> host_dst) = 0;
+
+ // Enqueues an asynchronous memcpy of 'host_dst' into the *quantized* input
+ // of a layer (that is, bytes instead of scaled floats) if they are supported
+ // by the underlying DNN implementation. If this quantized input is not
+ // supported, false is returned, which will place 'stream' into an error
+ // state.
+ //
+ // Arguments (all borrowed):
+ // stream: borrowed pointer to the stream that the 'quantized memcpy'
+ // operation should be enqueued onto.
+ // host_src: un-owned host memory region that contains the quantized data.
+  //  gpu_unquantized_dst: the device memory that is clobbered by the values in
+  //    'host_src' when the enqueued (asynchronous) memcpy operation is
+  //    performed. This data should also have a corresponding quantized
+  //    representation on the device for this operation to succeed.
+ virtual bool DoMemcpyH2DQuantized(
+ Stream* stream, port::ArraySlice<uint8> host_src,
+ DeviceMemory<float>* gpu_unquantized_dst) = 0;
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(DnnSupport);
+};
+
+} // namespace dnn
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_DNN_H_
diff --git a/tensorflow/stream_executor/dso_loader.cc b/tensorflow/stream_executor/dso_loader.cc
new file mode 100644
index 0000000000..4ac14ea30b
--- /dev/null
+++ b/tensorflow/stream_executor/dso_loader.cc
@@ -0,0 +1,208 @@
+#include "tensorflow/stream_executor/dso_loader.h"
+
+#include <dlfcn.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <initializer_list>
+#include "tensorflow/stream_executor/platform/port.h"
+#include <vector>
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/str_util.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/lib/str_util.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+/* static */ port::Status DsoLoader::GetCublasDsoHandle(void** dso_handle) {
+ return GetDsoHandle(FindDsoPath("libcublas.so.7.0",
+ "third_party/gpus/cuda/lib64"),
+ dso_handle);
+}
+
+/* static */ port::Status DsoLoader::GetCudnnDsoHandle(void** dso_handle) {
+ // libcudnn is versioned differently than the other libraries. See b/22397368
+ // for some details about the complications surrounding this.
+ return GetDsoHandle(FindDsoPath("libcudnn.so.6.5",
+ "third_party/gpus/cuda/lib64"),
+ dso_handle);
+}
+
+/* static */ port::Status DsoLoader::GetCufftDsoHandle(void** dso_handle) {
+ return GetDsoHandle(FindDsoPath("libcufft.so.7.0",
+ "third_party/gpus/cuda/lib64"),
+ dso_handle);
+}
+
+/* static */ port::Status DsoLoader::GetCurandDsoHandle(void** dso_handle) {
+ return GetDsoHandle(FindDsoPath("libcurand.so.7.0",
+ "third_party/gpus/cuda/lib64"),
+ dso_handle);
+}
+
+/* static */ port::Status DsoLoader::GetLibcudaDsoHandle(void** dso_handle) {
+ return GetDsoHandle(FindDsoPath("libcuda.so",
+ "third_party/gpus/cuda/driver/lib64"),
+ dso_handle);
+}
+
+/* static */ port::Status DsoLoader::GetLibcuptiDsoHandle(void** dso_handle) {
+ return GetDsoHandle(
+ FindDsoPath("libcupti.so.7.0",
+ "third_party/gpus/cuda/extras/CUPTI/lib64"),
+ dso_handle);
+}
+
+/* static */ void DsoLoader::RegisterRpath(port::StringPiece path) {
+ mutex_lock lock{rpath_mutex_};
+ GetRpaths()->push_back(path.ToString());
+}
+
+
+/* static */ port::Status DsoLoader::GetDsoHandle(port::StringPiece path,
+ void** dso_handle,
+ LoadKind load_kind) {
+
+ int dynload_flags =
+ RTLD_LAZY | (load_kind == LoadKind::kLocal ? RTLD_LOCAL : RTLD_GLOBAL);
+ string path_string = path.ToString();
+ *dso_handle = dlopen(path_string.c_str(), dynload_flags);
+ if (*dso_handle == nullptr) {
+ LOG(INFO) << "LD_LIBRARY_PATH: " << getenv("LD_LIBRARY_PATH");
+ // TODO(b/22689637): Eliminate unnecessary ToString once StrCat has been
+ // moved to the open-sourceable version.
+ return port::Status(
+ port::error::FAILED_PRECONDITION,
+ port::StrCat("could not dlopen DSO: ", path, "; dlerror: ", dlerror()));
+ }
+
+ VLOG(2) << "loaded path \"" << path << "\" "
+ << (load_kind == LoadKind::kLocal ? "locally" : "globally");
+ return port::Status::OK();
+}
+
+/* static */ string DsoLoader::GetBinaryDirectory(bool strip_executable_name) {
+ char exe_path[PATH_MAX] = {0};
+ CHECK_ERR(readlink("/proc/self/exe", exe_path, sizeof(exe_path) - 1));
+ // Make sure it's null-terminated:
+ exe_path[sizeof(exe_path) - 1] = 0;
+
+ if (strip_executable_name) {
+ // The exe is the last component of the path, so remove one component.
+ std::vector<string> components = port::Split(exe_path, '/');
+ components.pop_back();
+ return port::Join(components, "/");
+ }
+ return exe_path;
+}
+
+// Creates a heap-allocated vector for initial rpaths.
+// Ownership is transferred to the caller.
+static std::vector<string>* CreatePrimordialRpaths() {
+ auto rpaths = new std::vector<string>;
+ rpaths->push_back(
+ "driver/driver_sh.runfiles/third_party/gpus/cuda/lib64");
+ return rpaths;
+}
+
+/* static */ mutex DsoLoader::rpath_mutex_{LINKER_INITIALIZED};
+/* static */ std::vector<string>* DsoLoader::GetRpaths() {
+ static std::vector<string>* rpaths = CreatePrimordialRpaths();
+ return rpaths;
+}
+
+/* static */ bool DsoLoader::TrySymbolicDereference(string* candidate) {
+ char buf[PATH_MAX];
+ char* result = realpath(candidate->c_str(), buf);
+ if (result == nullptr) {
+ return false;
+ }
+ VLOG(3) << "realpath resolved candidate path \"" << *candidate << "\" to \""
+ << result << "\"";
+ *candidate = result;
+ return true;
+}
+
+/* static */ string DsoLoader::FindDsoPath(port::StringPiece library_name,
+ port::StringPiece runfiles_relpath) {
+
+  // Keep a record of the paths we attempted so we can dump out meaningful
+  // diagnostics if no path is found.
+  std::vector<string> attempted;
+
+  using StringPieces = std::vector<port::StringPiece>;
+  string candidate;
+
+  // Try binary-plus-rpath locations.
+  string binary_directory =
+      GetBinaryDirectory(true /* = strip_executable_name */);
+  mutex_lock lock{rpath_mutex_};
+  for (const string& rpath : *GetRpaths()) {
+    candidate =
+        port::Join(StringPieces{binary_directory, rpath, library_name}, "/");
+    if (TrySymbolicDereference(&candidate)) {
+      return candidate;
+    }
+    attempted.push_back(candidate);
+  }
+
+  VLOG(1) << "Could not resolve " << library_name.ToString() << " via any of: "
+          << port::Join(attempted, ", ")
+          << "; falling back to the bare library name for dlopen.";
+  return library_name.ToString();
+}
+
+// -- CachedDsoLoader
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetCublasDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetCublasDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetCurandDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetCurandDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetCudnnDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetCudnnDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetCufftDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetCufftDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetLibcudaDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetLibcudaDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::GetLibcuptiDsoHandle() {
+ static port::StatusOr<void*> result =
+ FetchHandleResult(DsoLoader::GetLibcuptiDsoHandle);
+ return result;
+}
+
+/* static */ port::StatusOr<void*> CachedDsoLoader::FetchHandleResult(
+ std::function<port::Status(void**)> load_dso) {
+ void* handle;
+ auto status = load_dso(&handle);
+ if (!status.ok()) {
+ return status;
+ }
+ return handle;
+}
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/dso_loader.h b/tensorflow/stream_executor/dso_loader.h
new file mode 100644
index 0000000000..4dcc48d231
--- /dev/null
+++ b/tensorflow/stream_executor/dso_loader.h
@@ -0,0 +1,107 @@
+// Common DSO loading functionality: exposes callables that dlopen DSOs
+// in either the runfiles directories (via registered binary-relative rpaths)
+// or, failing that, via the dynamic loader's default search paths.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_DSO_LOADER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_DSO_LOADER_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+#include <vector>
+
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+// Permits StreamExecutor code to dynamically load a pre-determined set of
+// relevant DSOs via dlopen.
+//
+// Thread-safe.
+class DsoLoader {
+ public:
+ // The following methods either load the DSO of interest and return a dlopen
+ // handle or error status in the canonical namespace.
+
+ static port::Status GetCublasDsoHandle(void** dso_handle);
+ static port::Status GetCudnnDsoHandle(void** dso_handle);
+ static port::Status GetCufftDsoHandle(void** dso_handle);
+ static port::Status GetCurandDsoHandle(void** dso_handle);
+ static port::Status GetLibcudaDsoHandle(void** dso_handle);
+ static port::Status GetLibcuptiDsoHandle(void** dso_handle);
+
+ // Registers a new binary-relative path to use as a dlopen search path.
+ static void RegisterRpath(port::StringPiece path);
+
+ private:
+ // Registered rpaths (singleton vector) and a mutex that guards it.
+ static std::vector<string>* GetRpaths();
+ static mutex rpath_mutex_;
+
+ // Descriptive boolean wrapper to indicate whether symbols are made available
+ // to resolve in later-loaded libraries.
+ enum class LoadKind { kLocal, kGlobal };
+
+ // Loads a DSO from the given "path" (which can technically be any dlopen-able
+ // name). If the load kind is global, the symbols in the loaded DSO are
+ // visible to subsequent DSO loading operations.
+ static port::Status GetDsoHandle(port::StringPiece path, void** dso_handle,
+ LoadKind load_kind = LoadKind::kLocal);
+
+
+ // Returns the binary directory (or binary path) associated with the currently
+ // executing program. If strip_executable_name is true, the executable file is
+  // stripped from the path.
+ static string GetBinaryDirectory(bool strip_executable_name);
+
+ // Returns the location of the runfiles directory.
+ // * Manual invocation gets the runfiles as a relative path to the current
+ // executable.
+ static string GetRunfilesDirectory();
+
+ // Invokes realpath on the original path; updates candidate and returns true
+ // if it succeeds (i.e. a file exists at the path); otherwise, returns false.
+ static bool TrySymbolicDereference(string* candidate);
+
+ // Attempts to find a path to the DSO of interest, otherwise returns the
+ // bare library name:
+ // Arguments:
+ // library_name: the filename in tree; e.g. libOpenCL.so.1.0.0
+ // runfiles_relpath: where to look for the library relative to the runfiles
+ // root; e.g. third_party/gpus/cuda/lib64
+ static string FindDsoPath(port::StringPiece library_name,
+ port::StringPiece runfiles_relpath);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(DsoLoader);
+};
+
+// Wrapper around the DsoLoader that prevents us from dlopen'ing any of the DSOs
+// more than once.
+class CachedDsoLoader {
+ public:
+ // Cached versions of the corresponding DsoLoader methods above.
+ static port::StatusOr<void*> GetCublasDsoHandle();
+ static port::StatusOr<void*> GetCudnnDsoHandle();
+ static port::StatusOr<void*> GetCufftDsoHandle();
+ static port::StatusOr<void*> GetCurandDsoHandle();
+ static port::StatusOr<void*> GetLibcudaDsoHandle();
+ static port::StatusOr<void*> GetLibcuptiDsoHandle();
+
+ private:
+ // Fetches a DSO handle via "load_dso" and returns the StatusOr form of the
+ // result.
+ static port::StatusOr<void*> FetchHandleResult(
+ std::function<port::Status(void**)> load_dso);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CachedDsoLoader);
+};
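+
+// Illustrative sketch, assuming the cuBLAS DSO can be found on one of the
+// search paths and that port::StatusOr exposes ok() and ValueOrDie():
+//
+//   port::StatusOr<void*> handle = CachedDsoLoader::GetCublasDsoHandle();
+//   if (handle.ok()) {
+//     void* cublas_dso = handle.ValueOrDie();
+//     // ... dlsym() the required symbols out of cublas_dso ...
+//   }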
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_DSO_LOADER_H_
diff --git a/tensorflow/stream_executor/event.cc b/tensorflow/stream_executor/event.cc
new file mode 100644
index 0000000000..79c3d39f24
--- /dev/null
+++ b/tensorflow/stream_executor/event.cc
@@ -0,0 +1,48 @@
+#include "tensorflow/stream_executor/event.h"
+
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+#include "tensorflow/stream_executor/stream.h"
+
+namespace perftools {
+namespace gputools {
+
+internal::EventInterface* CreateEventImplementation(
+ StreamExecutor* stream_exec) {
+ PlatformKind platform_kind = stream_exec->platform_kind();
+ switch (platform_kind) {
+ case PlatformKind::kCuda:
+ return (*internal::MakeCUDAEventImplementation())(stream_exec);
+ default:
+ LOG(FATAL) << "Cannot create event implementation for platform kind: "
+ << PlatformKindString(platform_kind);
+ }
+}
+
+Event::Event(StreamExecutor* stream_exec)
+ : implementation_(CreateEventImplementation(stream_exec)),
+ stream_exec_(stream_exec) {}
+
+Event::~Event() {
+ auto status = stream_exec_->DeallocateEvent(this);
+ if (!status.ok()) {
+ LOG(ERROR) << status.error_message();
+ }
+}
+
+bool Event::Init() {
+ auto status = stream_exec_->AllocateEvent(this);
+ if (!status.ok()) {
+ LOG(ERROR) << status.error_message();
+ return false;
+ }
+
+ return true;
+}
+
+Event::Status Event::PollForStatus() {
+ return stream_exec_->PollForEventStatus(this);
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/event.h b/tensorflow/stream_executor/event.h
new file mode 100644
index 0000000000..fdd5112d9a
--- /dev/null
+++ b/tensorflow/stream_executor/event.h
@@ -0,0 +1,63 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_EVENT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_EVENT_H_
+
+#include <memory>
+
+namespace perftools {
+namespace gputools {
+
+namespace internal {
+class EventInterface;
+}
+
+class Stream;
+class StreamExecutor;
+
+// The Event class, when supported by a platform, enables low-overhead status
+// reporting for a Stream. An Event is inserted at a location in a stream via
+// the Stream::ThenRecordEvent() API. From then on, the Event's status can be
+// monitored via the nonblocking Event::PollForStatus() call.
+class Event {
+ public:
+ // Potential states for an Event. If PollForStatus() returns anything aside
+ // from kPending or kComplete, an error has occurred; kUnknown is a bad state.
+ // Not all implementations are able to return all enumeration values. Refer to
+ // the platform-specific implementation for details.
+ enum class Status {
+ kUnknown,
+ kError,
+ kPending,
+ kComplete,
+ };
+
+ explicit Event(StreamExecutor* stream_exec); // NOLINT
+
+ // Releases any resources held by the Event object.
+ ~Event();
+
+ // Performs any platform-specific or potentially error-generating
+ // initialization.
+ bool Init();
+
+ // Returns the current Status for the event.
+ Status PollForStatus();
+
+ // Returns a pointer to the underlying platform-specific implementation.
+ internal::EventInterface* implementation() { return implementation_.get(); }
+
+ private:
+ friend class Stream;
+
+ // Pointer to the platform-specific EventInterface implementation underlying
+ // the object. Owned.
+ std::unique_ptr<internal::EventInterface> implementation_;
+
+ // Pointer to the StreamExecutor interface used to create this object.
+ // Not owned.
+ StreamExecutor* stream_exec_;
+};
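+
+// Illustrative sketch, assuming a valid StreamExecutor* 'stream_exec' and a
+// Stream 'stream' onto which the event is recorded via
+// Stream::ThenRecordEvent():
+//
+//   Event event(stream_exec);
+//   if (event.Init()) {
+//     stream.ThenRecordEvent(&event);
+//     while (event.PollForStatus() == Event::Status::kPending) {
+//       // Do other useful work before polling again.
+//     }
+//   }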
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_EVENT_H_
diff --git a/tensorflow/stream_executor/executor_cache.cc b/tensorflow/stream_executor/executor_cache.cc
new file mode 100644
index 0000000000..7bf1a9aa4a
--- /dev/null
+++ b/tensorflow/stream_executor/executor_cache.cc
@@ -0,0 +1,43 @@
+#include "tensorflow/stream_executor/executor_cache.h"
+
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+
+port::Status ExecutorCache::Insert(const StreamExecutorConfig& config,
+ std::unique_ptr<StreamExecutor> entry) {
+ if (Get(config).ok()) {
+ return port::Status(port::error::ALREADY_EXISTS,
+ "An executor with a matching config already exists.");
+ }
+
+ cache_[config.ordinal].emplace_back(Entry(config, std::move(entry)));
+
+ return port::Status::OK();
+}
+
+port::StatusOr<StreamExecutor*> ExecutorCache::Get(
+ const StreamExecutorConfig& config) {
+ auto entries = cache_.find(config.ordinal);
+ if (entries == cache_.end()) {
+ return port::Status(
+ port::error::NOT_FOUND,
+ port::Printf("No executors registered for ordinal %d", config.ordinal));
+ }
+
+ for (const auto& iter : entries->second) {
+ if (iter.first.plugin_config == config.plugin_config &&
+ iter.first.device_options == config.device_options) {
+ return iter.second.get();
+ }
+ }
+
+ return port::Status(port::error::NOT_FOUND,
+ "No executor found with a matching config.");
+}
+
+void ExecutorCache::DestroyAllExecutors() { cache_.clear(); }
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/executor_cache.h b/tensorflow/stream_executor/executor_cache.h
new file mode 100644
index 0000000000..4d1d9ddb07
--- /dev/null
+++ b/tensorflow/stream_executor/executor_cache.h
@@ -0,0 +1,45 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_EXECUTOR_CACHE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_EXECUTOR_CACHE_H_
+
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+
+namespace perftools {
+namespace gputools {
+
+// Utility class to allow Platform objects to manage cached StreamExecutors.
+class ExecutorCache {
+ public:
+ ExecutorCache() {}
+
+ // Inserts a new StreamExecutor with the given configuration into the cache.
+ // Will not overwrite if called when a matching element is already present.
+ port::Status Insert(const StreamExecutorConfig& config,
+ std::unique_ptr<StreamExecutor> executor);
+
+ // Returns a pointer to the described executor (if one with a matching config
+ // has been created), or a NOT_FOUND status.
+ port::StatusOr<StreamExecutor*> Get(const StreamExecutorConfig& config);
+
+ // Destroys all Executors and clears the cache.
+ // Performs no synchronization - undefined behavior may occur if any executors
+ // are active!
+ void DestroyAllExecutors();
+
+ private:
+ typedef std::pair<StreamExecutorConfig, std::unique_ptr<StreamExecutor>>
+ Entry;
+
+ // Maps ordinal number to a list of cached executors for that ordinal.
+ // We key off of ordinal (instead of just looking up all fields in the
+ // StreamExecutorConfig) for a slight improvement in lookup time.
+ std::map<int, std::vector<Entry>> cache_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(ExecutorCache);
+};
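+
+// Illustrative sketch, assuming a populated StreamExecutorConfig 'config' and
+// a freshly constructed std::unique_ptr<StreamExecutor> 'executor' (ownership
+// passes to the cache on Insert):
+//
+//   ExecutorCache cache;
+//   if (cache.Insert(config, std::move(executor)).ok()) {
+//     port::StatusOr<StreamExecutor*> lookup = cache.Get(config);
+//     if (lookup.ok()) {
+//       StreamExecutor* cached = lookup.ValueOrDie();
+//     }
+//   }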
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_EXECUTOR_CACHE_H_
diff --git a/tensorflow/stream_executor/fft.h b/tensorflow/stream_executor/fft.h
new file mode 100644
index 0000000000..b47921d8f2
--- /dev/null
+++ b/tensorflow/stream_executor/fft.h
@@ -0,0 +1,187 @@
+// Exposes the family of FFT routines as pre-canned high performance calls for
+// use in conjunction with the StreamExecutor abstraction.
+//
+// Note that this interface is optionally supported by platforms; see
+// StreamExecutor::SupportsFft() for details.
+//
+// This abstraction makes it simple to entrain FFT operations on GPU data into
+// a Stream -- users typically will not use this API directly, but will use the
+// Stream builder methods to entrain these operations "under the hood". For
+// example:
+//
+// DeviceMemory<std::complex<float>> x =
+// stream_exec->AllocateArray<std::complex<float>>(1024);
+// DeviceMemory<std::complex<float>> y =
+// stream_exec->AllocateArray<std::complex<float>>(1024);
+// // ... populate x and y ...
+// Stream stream{stream_exec};
+// std::unique_ptr<Plan> plan =
+//      stream_exec->AsFft()->Create1dPlan(&stream, 1024, Type::kC2CForward,
+//                                         false /* = in_place_fft */);
+// stream
+// .Init()
+// .ThenFft(plan.get(), x, &y)
+// .BlockHostUntilDone();
+//
+// By using stream operations in this manner the user can easily intermix custom
+// kernel launches (via StreamExecutor::ThenLaunch()) with these pre-canned FFT
+// routines.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_FFT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_FFT_H_
+
+#include <complex>
+#include <memory>
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+template <typename ElemT>
+class DeviceMemory;
+
+namespace fft {
+
+// Specifies FFT input and output types, and the direction.
+// R, D, C, and Z stand for SP real, DP real, SP complex, and DP complex.
+enum class Type {
+ kC2CForward,
+ kC2CInverse,
+ kC2R,
+ kR2C,
+ kZ2ZForward,
+ kZ2ZInverse,
+ kZ2D,
+ kD2Z
+};
+
+// FFT plan class. Each FFT implementation should define a plan class that is
+// derived from this class. It does not provide any interface but serves
+// as a common type that is used to execute the plan.
+class Plan {
+ public:
+ virtual ~Plan() {}
+};
+
+// FFT support interface -- this can be derived from a GPU executor when the
+// underlying platform has an FFT library implementation available. See
+// StreamExecutor::AsFft().
+//
+// This support interface is not generally thread-safe; it is thread-safe only
+// for CUDA platform (cuFFT) usage. Host-side FFT support is known to be
+// thread-compatible, but not thread-safe.
+class FftSupport {
+ public:
+ virtual ~FftSupport() {}
+
+ // Creates a 1d FFT plan.
+ virtual std::unique_ptr<Plan> Create1dPlan(Stream *stream, uint64 num_x,
+ Type type, bool in_place_fft) = 0;
+
+ // Creates a 2d FFT plan.
+ virtual std::unique_ptr<Plan> Create2dPlan(Stream *stream, uint64 num_x,
+ uint64 num_y, Type type,
+ bool in_place_fft) = 0;
+
+ // Creates a 3d FFT plan.
+ virtual std::unique_ptr<Plan> Create3dPlan(Stream *stream, uint64 num_x,
+ uint64 num_y, uint64 num_z,
+ Type type, bool in_place_fft) = 0;
+
+ // Creates a batched FFT plan.
+ //
+ // stream: The GPU stream in which the FFT runs.
+ // rank: Dimensionality of the transform (1, 2, or 3).
+ // elem_count: Array of size rank, describing the size of each dimension.
+ // input_embed, output_embed:
+  //          Pointers to arrays of size rank that indicate the storage
+  //          dimensions of the input/output data in memory. If set to nullptr,
+  //          all other advanced data layout parameters are ignored.
+ // input_stride: Indicates the distance (number of elements; same below)
+ // between two successive input elements.
+ // input_distance: Indicates the distance between the first element of two
+ // consecutive signals in a batch of the input data.
+ // output_stride: Indicates the distance between two successive output
+ // elements.
+ // output_distance: Indicates the distance between the first element of two
+ // consecutive signals in a batch of the output data.
+ virtual std::unique_ptr<Plan> CreateBatchedPlan(
+ Stream *stream, int rank, uint64 *elem_count, uint64 *input_embed,
+ uint64 input_stride, uint64 input_distance, uint64 *output_embed,
+ uint64 output_stride, uint64 output_distance, Type type,
+ bool in_place_fft, int batch_count) = 0;
+
+ // Computes complex-to-complex FFT in the transform direction as specified
+ // by direction parameter.
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<std::complex<float>> *output) = 0;
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<std::complex<double>> *output) = 0;
+
+ // Computes real-to-complex FFT in forward direction.
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<float> &input,
+ DeviceMemory<std::complex<float>> *output) = 0;
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<double> &input,
+ DeviceMemory<std::complex<double>> *output) = 0;
+
+ // Computes complex-to-real FFT in inverse direction.
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<float> *output) = 0;
+ virtual bool DoFft(Stream *stream, Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<double> *output) = 0;
+
+ protected:
+ FftSupport() {}
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(FftSupport);
+};
+
+// Macro used to quickly declare overrides for abstract virtuals in the
+// fft::FftSupport base class. Assumes that it's emitted somewhere inside the
+// ::perftools::gputools namespace.
+#define TENSORFLOW_STREAM_EXECUTOR_GPU_FFT_SUPPORT_OVERRIDES \
+ std::unique_ptr<fft::Plan> Create1dPlan(Stream *stream, uint64 num_x, \
+ fft::Type type, bool in_place_fft) \
+ override; \
+ std::unique_ptr<fft::Plan> Create2dPlan(Stream *stream, uint64 num_x, \
+ uint64 num_y, fft::Type type, \
+ bool in_place_fft) override; \
+ std::unique_ptr<fft::Plan> Create3dPlan( \
+ Stream *stream, uint64 num_x, uint64 num_y, uint64 num_z, \
+ fft::Type type, bool in_place_fft) override; \
+ std::unique_ptr<fft::Plan> CreateBatchedPlan( \
+ Stream *stream, int rank, uint64 *elem_count, uint64 *input_embed, \
+ uint64 input_stride, uint64 input_distance, uint64 *output_embed, \
+ uint64 output_stride, uint64 output_distance, fft::Type type, \
+ bool in_place_fft, int batch_count) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<float>> &input, \
+ DeviceMemory<std::complex<float>> *output) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<double>> &input, \
+ DeviceMemory<std::complex<double>> *output) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<float> &input, \
+ DeviceMemory<std::complex<float>> *output) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<double> &input, \
+ DeviceMemory<std::complex<double>> *output) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<float>> &input, \
+ DeviceMemory<float> *output) override; \
+ bool DoFft(Stream *stream, fft::Plan *plan, \
+ const DeviceMemory<std::complex<double>> &input, \
+ DeviceMemory<double> *output) override;
+
+} // namespace fft
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_FFT_H_
diff --git a/tensorflow/stream_executor/gcuda.cc b/tensorflow/stream_executor/gcuda.cc
new file mode 100644
index 0000000000..505534c08f
--- /dev/null
+++ b/tensorflow/stream_executor/gcuda.cc
@@ -0,0 +1,87 @@
+#include "tensorflow/stream_executor/gcuda.h"
+
+namespace perftools {
+namespace gputools {
+
+// Returns the mapping of gcudacc kernel stub to preferred cache
+// configuration. C++ static singleton pattern.
+std::map<void *, KernelCacheConfig> &GetGcudaccStubToCacheConfigMap() {
+ static std::map<void *, KernelCacheConfig> cache_config_by_stub;
+ return cache_config_by_stub;
+}
+
+shared_mem_config::SharedMemConfig DeviceGetSharedMemConfig(
+ StreamExecutor *stream_exec) {
+ SharedMemoryConfig config = stream_exec->GetDeviceSharedMemoryConfig();
+
+ switch (config) {
+ case SharedMemoryConfig::kDefault:
+ return shared_mem_config::kDefaultBankSize;
+ case SharedMemoryConfig::kFourByte:
+ return shared_mem_config::kFourByteBankSize;
+ case SharedMemoryConfig::kEightByte:
+ return shared_mem_config::kEightByteBankSize;
+ default:
+ LOG(FATAL) << "Impossible shared memory config returned: "
+ << static_cast<int>(config);
+ }
+}
+
+void DeviceSetSharedMemConfig(StreamExecutor *stream_exec,
+ shared_mem_config::SharedMemConfig config) {
+ SharedMemoryConfig executor_config;
+ switch (config) {
+ case shared_mem_config::kDefaultBankSize:
+ executor_config = SharedMemoryConfig::kDefault;
+ break;
+ case shared_mem_config::kFourByteBankSize:
+ executor_config = SharedMemoryConfig::kFourByte;
+ break;
+ case shared_mem_config::kEightByteBankSize:
+ executor_config = SharedMemoryConfig::kEightByte;
+ break;
+ default:
+ LOG(FATAL) << "Impossible shared memory config specified: "
+ << static_cast<int>(config);
+ }
+
+ if (!stream_exec->SetDeviceSharedMemoryConfig(executor_config).ok()) {
+ // The message is logged at a higher level.
+    LOG(INFO) << "Unable to set shared memory configuration; proceeding.";
+ }
+}
+
+template <>
+void FuncSetCacheConfig<void *>(Stream *stream, void *fptr,
+ cache_config::CacheConfig cache_config) {
+ // Map from the legacy to the C++11 type.
+ KernelCacheConfig kernel_cache_config;
+ switch (cache_config) {
+ case cache_config::kPreferShared:
+ kernel_cache_config = KernelCacheConfig::kPreferShared;
+ break;
+ case cache_config::kPreferL1:
+ kernel_cache_config = KernelCacheConfig::kPreferL1;
+ break;
+ case cache_config::kPreferEqual:
+ kernel_cache_config = KernelCacheConfig::kPreferEqual;
+ break;
+ default:
+ kernel_cache_config = KernelCacheConfig::kNoPreference;
+ }
+  // Bind by reference: GetGcudaccStubToCacheConfigMap() returns the singleton
+  // map, and writing into a copy here would silently drop the preference.
+  auto& cache_config_map = GetGcudaccStubToCacheConfigMap();
+  cache_config_map[fptr] = kernel_cache_config;
+}
+
+template <>
+KernelCacheConfig FuncGetCacheConfig<void *>(void *fptr) {
+  // Bind by reference to avoid copying the singleton map on every lookup.
+  const auto& cache_config_map = GetGcudaccStubToCacheConfigMap();
+  auto iter = cache_config_map.find(fptr);
+  if (iter == cache_config_map.end()) {
+    return KernelCacheConfig::kNoPreference;
+  }
+  return iter->second;
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/gcuda.h b/tensorflow/stream_executor/gcuda.h
new file mode 100644
index 0000000000..24b09c5358
--- /dev/null
+++ b/tensorflow/stream_executor/gcuda.h
@@ -0,0 +1,415 @@
+// Common declarations and includes for mixed-mode GPU usage at Google.
+//
+// This header serves to define a "common baseline" for GPU usage,
+// either with gcudacc or nvcc, and on the host or device. The rule of thumb is,
+// "if you're working with mixed-mode GPU code at Google, include this header."
+#ifndef TENSORFLOW_STREAM_EXECUTOR_GCUDA_H_
+#define TENSORFLOW_STREAM_EXECUTOR_GCUDA_H_
+
+// Symbol glossary:
+// __CUDACC__: CUDA capable compiler, compiling host or device
+// __CUDA_ARCH__: Compiling device code
+// __GCUDACC__: Using gcudacc
+// __NVCC__: Using nvcc
+
+// For device code compiled with gcudacc, CUDA_ASSUME(X) tells the compiler
+// that it may assume that X is true. This can enable further optimization.
+// It is undefined behavior if X is not true. X should not have side-effects
+// and gcudacc will try to warn you if it does.
+#if defined(__CUDA_ARCH__) && defined(__GCUDACC__)
+#define CUDA_ASSUME(X) __builtin_assume(X)
+#else
+#define CUDA_ASSUME(X) do {} while (false)
+#endif
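+
+// Illustrative device-side sketch with a hypothetical kernel; the assumption
+// must actually hold at every call site, since violating it is undefined
+// behavior:
+//
+//   __global__ void ScaleBy2(float* data, int n) {
+//     CUDA_ASSUME(n % 4 == 0);  // May enable extra optimization under gcudacc.
+//     for (int i = threadIdx.x; i < n; i += blockDim.x) data[i] *= 2.0f;
+//   }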
+
+namespace perftools {
+namespace gputools {
+namespace cache_config {
+// A version of the KernelCacheConfig enum class, exposed for pre-C++11
+// compilers.
+enum CacheConfig {
+ // Indicates no preference for device L1/shared memory configuration.
+ kNoPreference,
+
+ // Indicates a preference for more shared memory than L1 cache.
+ kPreferShared,
+
+ // Indicates a preference for more L1 cache than shared memory.
+ kPreferL1,
+
+ // Indicates a preference for equal amounts of L1 cache and shared memory.
+ kPreferEqual,
+};
+} // namespace cache_config
+
+namespace shared_mem_config {
+// A compatibility-layer declaration of CUsharedconfig, needed to support
+// cuFuncSetSharedMemConfig/cudaDeviceSetSharedMemConfig. Declared here for
+// compatibility with pre-C++11 compilers.
+enum SharedMemConfig {
+ // Indicates that the context's shared memory config should be used.
+ kDefaultBankSize,
+
+ // Specifies a four-byte bank size for shared memory.
+ kFourByteBankSize,
+
+ // Specifies an eight-byte bank size for shared memory.
+ kEightByteBankSize,
+};
+} // namespace shared_mem_config
+} // namespace gputools
+} // namespace perftools
+
+#if !defined(__NVCC__) && !defined(GCUDACC_STANDALONE_MODE)
+// Using gcudacc, either device-only or mixed-mode code. No special declarations
+// are needed for host-only code being compiled under gcudacc.
+
+// These includes are required by the code introduced during gcudacc operation.
+// Since the user code may not directly include these headers, they may not be
+// present in the build environment without inclusion here.
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/kernel.h"
+#include "tensorflow/stream_executor/kernel_cache_config.h"
+#include "tensorflow/stream_executor/launch_dim.h"
+#include "tensorflow/stream_executor/machine_manager.h"
+#include "tensorflow/stream_executor/shared_memory_config.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+// cudaConfigureCall is a symbol used by Clang when it sees a CUDA triple-angle-
+// bracket launch, so we declare it here so the symbol resolves. It is not used
+// by gcudacc-generated code, however, so it is not defined anywhere.
+// In other words, this is a dummy declaration needed for parsing.
+
+#ifdef __GCUDACC__
+// These symbols only need to be defined during compilation with gcudacc.
+namespace perftools {
+namespace gputools {
+
+// This class defines all the implicit conversions necessary to match launch
+// dimensions against the cudaConfigureCall() signature, and sits where a dim3
+// usually would in triple angle launches. This supports the kernel launch
+// dimension styles:
+// kernel<<<1, 1>>>() and
+// kernel<<<BlockDim(...), ThreadDim(...)>>> and
+// kernel<<<dim3(1), dim3(1)>>>
+// All of these are predicated upon implicit conversions, which are frowned upon
+// by the style guide. Rather than add this CUDA-specific bad behavior to
+// StreamExecutor headers, we isolate it here.
+class LaunchDimConverter {
+ public:
+ LaunchDimConverter(unsigned long long int i) : _dim(i, 1, 1) {} // NOLINT
+ LaunchDimConverter(::perftools::gputools::BlockDim dim)
+ : // NOLINT
+ _dim(dim.x, dim.y, dim.z) {}
+ LaunchDimConverter(::perftools::gputools::ThreadDim dim)
+ : // NOLINT
+ _dim(dim.x, dim.y, dim.z) {}
+ LaunchDimConverter(dim3 dim) : // NOLINT
+ _dim(dim.x, dim.y, dim.z) {}
+
+ ::perftools::gputools::BlockDim AsBlockDim() {
+ return ::perftools::gputools::BlockDim(_dim.x, _dim.y, _dim.z);
+ }
+
+ ::perftools::gputools::ThreadDim AsThreadDim() {
+ return ::perftools::gputools::ThreadDim(_dim.x, _dim.y, _dim.z);
+ }
+
+ private:
+ ::perftools::gputools::Dim3D _dim;
+};
+} // namespace gputools
+} // namespace perftools
+
+int cudaConfigureCall(::perftools::gputools::LaunchDimConverter grid_size,
+ ::perftools::gputools::LaunchDimConverter block_size,
+ unsigned shared_size = 0,
+ ::perftools::gputools::Stream *stream = 0);
+#endif
+
+// The rest of the symbols in this block are needed during both StreamExecutor
+// and user library compilation.
+namespace perftools {
+namespace gputools {
+
+// Gets the preferred shared memory configuration for the device to which
+// the specified executor is bound.
+shared_mem_config::SharedMemConfig DeviceGetSharedMemConfig(
+ StreamExecutor *stream_exec);
+
+// Sets the preferred shared memory configuration for the device to which
+// the specified executor is bound.
+// Does not return an error if the current device is invalid.
+void DeviceSetSharedMemConfig(StreamExecutor *stream_exec,
+ shared_mem_config::SharedMemConfig config);
+
+// Sets the preferred cache configuration for the given kernel.
+template <typename KernelT>
+void FuncSetCacheConfig(Stream *stream, KernelT kernel,
+ cache_config::CacheConfig cache_config) {
+ FuncSetCacheConfig(stream, reinterpret_cast<void *>(kernel), cache_config);
+}
+
+// Internal specialization of the above.
+template <>
+void FuncSetCacheConfig<void *>(Stream *stream, void *kernel,
+ cache_config::CacheConfig cache_config);
+
+// Gets the preferred cache configuration for the given kernel.
+template <typename KernelT>
+KernelCacheConfig FuncGetCacheConfig(KernelT kernel) {
+ return FuncGetCacheConfig(reinterpret_cast<void *>(kernel));
+}
+
+// Internal specialization of the above.
+template <>
+KernelCacheConfig FuncGetCacheConfig<void *>(void *kernel);
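+
+// Illustrative sketch, assuming a gcudacc-generated kernel stub 'MyKernel' and
+// a valid Stream* 'stream':
+//
+//   FuncSetCacheConfig(stream, &MyKernel, cache_config::kPreferL1);
+//   KernelCacheConfig current = FuncGetCacheConfig(&MyKernel);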
+
+} // namespace gputools
+} // namespace perftools
+
+#elif defined(__NVCC__)
+// NVCC code compilation, device-only or mixed mode. As above, no special
+// declarations are needed for host-only code.
+namespace perftools {
+namespace gputools {
+class Stream;
+} // namespace gputools
+} // namespace perftools
+
+// --- BEGIN EXTERNALLY-DEFINED FUNCTIONS
+
+// The following functions must be defined in some external library linked in to
+// the final binary - they are _not_ defined in the StreamExecutor
+// (in nvcc mode).
+
+// Sets the preferred cache configuration for the specified kernel.
+template <typename KernelT>
+void SetCudaCacheConfig(perftools::gputools::Stream* stream, KernelT kernel,
+ ::perftools::gputools::cache_config::CacheConfig preference);
+
+// Gets the current device for use in CUDA runtime-emulating routines; the
+// returned value is the device ordinal as returned by
+// StreamExecutor::device_ordinal().
+int GetDevice();
+
+// Sets the current device for use in CUDA runtime-emulating routines.
+// "device" is the device ordinal as returned by
+// StreamExecutor::device_ordinal().
+void SetDevice(int device);
+
+// --- END EXTERNALLY-DEFINED FUNCTIONS
+
+namespace perftools {
+namespace gputools {
+template <typename KernelT>
+void FuncSetCacheConfig(Stream *stream, KernelT kernel,
+ cache_config::CacheConfig cache_config) {
+ SetCudaCacheConfig(stream, reinterpret_cast<void*>(kernel), cache_config);
+}
+} // namespace gputools
+} // namespace perftools
+
+// The following functions are declared extern "C" in CUDA's device_functions.h,
+// so we have to wrap them for compatibility with the cuda_builtin namespace.
+// Thin wrappers to break these functions out of cuda_builtin are defined below.
+__forceinline__ __device__ clock_t __gcuda_nvcc_clock() { return clock(); }
+__forceinline__ __device__ int __gcuda_nvcc__clz(int x) {
+ return __clz(x);
+}
+__forceinline__ __device__ int __gcuda_nvcc__clzll(long long int x) {
+ return __clzll(x);
+}
+__forceinline__ __device__ float __gcuda_nvcc__fdividef(float a, float b) {
+ return __fdividef(a, b);
+}
+__forceinline__ __device__ int __gcuda_nvcc__ffsll(long long int x) { // NOLINT
+ return __ffsll(x);
+}
+__forceinline__ __device__ int __gcuda_nvcc__popc(unsigned int x) {
+ return __popc(x);
+}
+__forceinline__ __device__ float __gcuda_nvcc__powf(float a, float b) {
+ return __powf(a, b);
+}
+__forceinline__ __device__ void __gcuda_nvcc__sincosf(
+ float x, float *sptr, float *cptr) {
+ __sincosf(x, sptr, cptr);
+}
+__forceinline__ __device__ unsigned int __gcuda_nvcc__umulhi(
+ unsigned int x, unsigned int y) {
+ return __umulhi(x, y);
+}
+
+#if __CUDA_ARCH__ >= 200 || !defined(__CUDA_ARCH__)
+__forceinline__ __device__ unsigned int __gcuda_nvcc__ballot(int x) {
+ return __ballot(x);
+}
+#endif // __CUDA_ARCH__ >= 200 || !defined(__CUDA_ARCH__)
+
+// Forward-declare printf as nvcc does not declare it by itself and we
+// need this file to compile even if it is included before including
+// stdio.h or cstdio.
+int printf(const char* format, ...);
+
+namespace cuda_builtin {
+using ::abs;
+using ::atomicAdd;
+using ::atomicCAS;
+using ::ceil;
+using ::ceilf;
+using ::cos;
+using ::cosf;
+using ::erfcinv;
+using ::erfcinvf;
+using ::exp;
+using ::expf;
+using ::fabs;
+using ::fabsf;
+using ::floor;
+using ::floorf;
+using ::fma;
+using ::fmaf;
+using ::fmax;
+using ::fmaxf;
+using ::fmin;
+using ::fminf;
+using ::log;
+using ::log1p;
+using ::log1pf;
+using ::logf;
+using ::max;
+using ::min;
+using ::powf;
+using ::printf;
+using ::sin;
+using ::sinf;
+using ::sincos;
+using ::sincosf;
+using ::sincospi;
+using ::sincospif;
+using ::sqrt;
+using ::sqrtf;
+using ::tanh;
+using ::trunc;
+using ::truncf;
+
+// rsqrt and rsqrtf are functions defined by nvcc in both host and device mode.
+// They are added to gcuda.h so that they are usable as host device functions:
+// on the device side they correspond to intrinsics, while explicit host-side
+// definitions are provided below.
+#ifdef __CUDA_ARCH__
+using ::rsqrt;
+using ::rsqrtf;
+#else
+__forceinline__ __host__ __device__ float rsqrtf(float x) {
+ return 1 / std::sqrt(x);
+}
+__forceinline__ __host__ __device__ double rsqrt(double x) {
+ return 1 / std::sqrt(x);
+}
+#endif
+
+__forceinline__ __device__ int clock() { return __gcuda_nvcc_clock(); }
+
+__forceinline__ __device__ int __clz(int x) {
+ return __gcuda_nvcc__clz(x);
+}
+
+__forceinline__ __device__ int __clzll(long long int x) {
+ return __gcuda_nvcc__clzll(x);
+}
+
+__forceinline__ __device__ float __fdividef(float a, float b) {
+ return __gcuda_nvcc__fdividef(a, b);
+}
+
+__forceinline__ __device__ int __ffsll(long long int x) { // NOLINT
+ return __gcuda_nvcc__ffsll(x);
+}
+
+__forceinline__ __device__ int __popc(unsigned int x) {
+ return __gcuda_nvcc__popc(x);
+}
+
+__forceinline__ __device__ float __powf(float a, float b) {
+ return __gcuda_nvcc__powf(a, b);
+}
+
+__forceinline__ __device__ void __sincosf(float x, float *sptr, float *cptr) {
+ __gcuda_nvcc__sincosf(x, sptr, cptr);
+}
+
+__forceinline__ __device__ unsigned int __umulhi(unsigned int x,
+ unsigned int y) {
+ return __gcuda_nvcc__umulhi(x, y);
+}
+
+#ifdef __CUDA_ARCH__
+// These symbols are only visible when parsing device code.
+using ::__double_as_longlong;
+using ::__int_as_float;
+using ::__float_as_int;
+using ::__longlong_as_double;
+#endif // __CUDA_ARCH__
+
+#if __CUDA_ARCH__ >= 200 || !defined(__CUDA_ARCH__)
+__forceinline__ __device__ unsigned int __ballot(int x) {
+ return __gcuda_nvcc__ballot(x);
+}
+#endif // __CUDA_ARCH__ >= 200 || !defined(__CUDA_ARCH__)
+
+#if __CUDA_ARCH__ >= 300 || !defined(__CUDA_ARCH__)
+using ::__shfl;
+using ::__shfl_down;
+using ::__shfl_up;
+using ::__shfl_xor;
+#endif // __CUDA_ARCH__ >= 300 || !defined(__CUDA_ARCH__)
+
+#if __CUDA_ARCH__ >= 320 || !defined(__CUDA_ARCH__)
+using ::__ldg;
+#endif // __CUDA_ARCH__ >= 320 || !defined(__CUDA_ARCH__)
+
+#if __CUDA_API_VERSION < 6050
+// CUDA < 6.5 defines isfinite as a macro, while CUDA >= 6.5 and gcudacc
+// define isfinite as a function. Work around this for the CUDA 5.5 case,
+// duplicating that macro definition.
+#undef isfinite
+#define __gcuda_nvcc_isfinite(x) \
+ (sizeof(x) == sizeof(float) ? __finitef(x) : \
+ sizeof(x) == sizeof(double) ? __finite(x) : __finitel(x))
+inline __device__ int isfinite(float x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+inline __device__ int isfinite(double x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+inline __device__ int isfinite(long double x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+#else
+// CUDA API >= v6.5
+using ::isfinite;
+#endif // __CUDA_API_VERSION < 6050
+} // namespace cuda_builtin
+
+#if __CUDA_API_VERSION >= 6050
+// The second part of the isfinite workaround.
+inline __device__ int isfinite(float x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+inline __device__ int isfinite(double x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+inline __device__ int isfinite(long double x) {
+ return __gcuda_nvcc_isfinite(x);
+}
+#endif // __CUDA_API_VERSION >= 6050
+
+#endif // defined(__NVCC__)
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_GCUDA_H_
diff --git a/tensorflow/stream_executor/gpu_launch_dim.h b/tensorflow/stream_executor/gpu_launch_dim.h
new file mode 100644
index 0000000000..51182b2d32
--- /dev/null
+++ b/tensorflow/stream_executor/gpu_launch_dim.h
@@ -0,0 +1,8 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_GPU_LAUNCH_DIM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_GPU_LAUNCH_DIM_H_
+
+// TODO(rspringer): Temporary redirection until all users - including gcudacc -
+// are using the new file.
+#include "tensorflow/stream_executor/launch_dim.h"
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_GPU_LAUNCH_DIM_H_
diff --git a/tensorflow/stream_executor/kernel.cc b/tensorflow/stream_executor/kernel.cc
new file mode 100644
index 0000000000..5e7fe95627
--- /dev/null
+++ b/tensorflow/stream_executor/kernel.cc
@@ -0,0 +1,95 @@
+// Implementation of the pointer-to-implementation wrapper for the data-parallel
+// kernel abstraction. KernelBase just delegates to the internal
+// platform-specific implementation instance.
+
+#include "tensorflow/stream_executor/kernel.h"
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/demangle.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+
+bool KernelMetadata::registers_per_thread(int *registers_per_thread) const {
+ if (has_registers_per_thread_) {
+ *registers_per_thread = registers_per_thread_;
+ return true;
+ }
+
+ return false;
+}
+
+void KernelMetadata::set_registers_per_thread(int registers_per_thread) {
+ registers_per_thread_ = registers_per_thread;
+ has_registers_per_thread_ = true;
+}
+
+bool KernelMetadata::shared_memory_bytes(int *shared_memory_bytes) const {
+ if (has_shared_memory_bytes_) {
+ *shared_memory_bytes = shared_memory_bytes_;
+ return true;
+ }
+
+ return false;
+}
+
+void KernelMetadata::set_shared_memory_bytes(int shared_memory_bytes) {
+ shared_memory_bytes_ = shared_memory_bytes;
+ has_shared_memory_bytes_ = true;
+}
+
+static internal::KernelInterface *KernelImplementationFromPlatformKind(
+ PlatformKind platform_kind) {
+ if (platform_kind == PlatformKind::kCuda) {
+ return (*internal::MakeCUDAKernelImplementation())();
+ } else if (platform_kind == PlatformKind::kOpenCL ||
+ platform_kind == PlatformKind::kOpenCLAltera) {
+ return (*internal::MakeOpenCLKernelImplementation())();
+ } else {
+ LOG(FATAL) << "cannot create kernel implementation for platform kind: "
+ << PlatformKindString(platform_kind);
+ }
+}
+
+KernelBase::KernelBase(StreamExecutor *parent)
+ : implementation_(
+ KernelImplementationFromPlatformKind(parent->platform_kind())),
+ parent_(parent) {
+ DCHECK(parent_ != nullptr);
+}
+
+KernelBase::KernelBase(StreamExecutor *parent,
+ internal::KernelInterface *implementation)
+ : implementation_(implementation), parent_(parent) {}
+
+KernelBase::~KernelBase() {}
+
+unsigned KernelBase::Arity() const { return implementation_->Arity(); }
+
+void KernelBase::SetPreferredCacheConfig(KernelCacheConfig config) {
+ return implementation_->SetPreferredCacheConfig(config);
+}
+
+KernelCacheConfig KernelBase::GetPreferredCacheConfig() const {
+ return implementation_->GetPreferredCacheConfig();
+}
+
+// Prefix used for stub functions emitted by the CUDA splitter.
+static const char *kStubPrefix = "__device_stub_";
+
+void KernelBase::set_name(port::StringPiece name) {
+ name_ = name.ToString();
+ port::StringPiece stubless_name = name;
+ if (name.starts_with(kStubPrefix)) {
+ stubless_name.remove_prefix(strlen(kStubPrefix));
+ }
+ demangled_name_ = port::Demangle(stubless_name.data());
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/kernel.h b/tensorflow/stream_executor/kernel.h
new file mode 100644
index 0000000000..da646d0f40
--- /dev/null
+++ b/tensorflow/stream_executor/kernel.h
@@ -0,0 +1,499 @@
+// Suite of datatypes to represent data-parallel kernel objects (code entities).
+// Kernel is the untyped variant, whereas TypedKernel takes a type signature
+// to do some template-based helper generation and give compile-time type
+// checking for kernel launch parameters.
+//
+// Users typically don't see KernelBase, they see typed kernels, analogous to a
+// typed function pointer. TypedKernels express their argument types via
+// template parameters like so:
+//
+// TypedKernel<DeviceMemory<int>*, int>
+//
+// Which expresses a data parallel kernel signature for:
+//
+// void(int*, int);
+//
+// And for a const memory region:
+//
+// TypedKernel<const DeviceMemory<int>&, int>
+//
+// Corresponds to a data parallel kernel signature for:
+//
+// void(const int*, int)
+//
+// Note that kernels always have a void return type, so results typically must
+// be memcpy'ied from device memory to the host.
+//
+// Also note that a scalar integer residing in device memory and an array of
+// integers residing in device memory have the same signature: DeviceMemory<T>.
+// However, in the future, checks may be added for additional safety that arrays
+// of minimum sizes are passed when those minimum sizes are contractually
+// expected by the kernel.
+//
+// For user-defined types whose definitions are appropriately shared between the
+// host code doing the launching and the kernel code being launched, the user
+// defined types are similarly permitted to be expressed as residing in device
+// memory:
+//
+// TypedKernel<DeviceMemory<MyUserDefinedStructure>>
+//
+// And, when the alignment and padding are agreed upon, POD types will also be
+// able to be passed by value; for example, it is a common idiom to specify a
+// bunch of options simultaneously with a structure:
+//
+// TypedKernel<MyOptionsStructurePassedByValue, DeviceMemory<float>>
+//
+// Which corresponds to a data parallel kernel signature like:
+//
+// void(MyOptionsStructurePassedByValue value, float *result);
+//
+// Users typically won't need to type out the TypedKernel signature in full; it
+// will be typedef'd by automatically generated code; for example, see
+// perftools::gputools::executor_sample::VecReduceAddKernel.
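+//
+// As a purely illustrative sketch (the kernel alias and "executor" below are
+// hypothetical, not part of this header), such a generated typedef and the
+// corresponding kernel object might look like:
+//
+//   // Kernel taking a read-only input array, an output array, and a count.
+//   typedef TypedKernel<const DeviceMemory<float> &, DeviceMemory<float> *,
+//                       int> ScaleKernel;
+//
+//   ScaleKernel scale_kernel(executor);  // Not loaded yet; see kernel_spec.h.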
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_KERNEL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_KERNEL_H_
+
+#include <memory>
+#include <tuple>
+#include <type_traits>
+#include <vector>
+
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/kernel_cache_config.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/lib/inlined_vector.h"
+
+namespace perftools {
+namespace gputools {
+
+class DeviceMemoryBase;
+template <typename ElemT>
+class DeviceMemory;
+class StreamExecutor;
+
+namespace internal {
+class KernelInterface;
+} // namespace internal
+
+// KernelMetadata holds runtime-queryable attributes of a loaded kernel, such as
+// registers allocated, shared memory used, etc.
+// Not all platforms support reporting of all information, so each accessor
+// returns false if the associated field is not populated in the underlying
+// platform.
+class KernelMetadata {
+ public:
+ KernelMetadata()
+ : has_registers_per_thread_(false), has_shared_memory_bytes_(false) {}
+
+ // Returns the number of registers used per thread executing this kernel.
+ bool registers_per_thread(int *registers_per_thread) const;
+
+ // Sets the number of registers used per thread executing this kernel.
+ void set_registers_per_thread(int registers_per_thread);
+
+ // Returns the amount of [static] shared memory used per block executing this
+ // kernel. Note that dynamic shared memory allocations are not (and cannot be)
+ // reported here (since they're not specified until kernel launch time).
+ bool shared_memory_bytes(int *shared_memory_bytes) const;
+
+ // Sets the amount of [static] shared memory used per block executing this
+ // kernel.
+ void set_shared_memory_bytes(int shared_memory_bytes);
+
+ private:
+ // Holds the value returned by registers_per_thread above.
+ bool has_registers_per_thread_;
+ int registers_per_thread_;
+
+ // Holds the value returned by shared_memory_bytes above.
+ bool has_shared_memory_bytes_;
+ int64 shared_memory_bytes_;
+};
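+
+// Purely illustrative use of the optional-style accessors above, assuming a
+// KernelMetadata instance named "metadata":
+//
+//   int registers;
+//   if (metadata.registers_per_thread(&registers)) {
+//     // "registers" is valid; otherwise the platform did not report it.
+//   }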
+
+// A data-parallel kernel (code entity) for launching via the StreamExecutor,
+// analogous to a void* device function pointer. See TypedKernel for the typed
+// variant.
+//
+// Thread-compatible.
+class KernelBase {
+ public:
+ // Constructs an "empty" (not-yet-loaded) kernel instance.
+ //
+ // parent is the StreamExecutor that will be responsible for loading the
+ // implementation of this kernel. It must not be null.
+ explicit KernelBase(StreamExecutor *parent);
+
+ // Test-only constructor that can take a mock KernelInterface implementation.
+ // Takes ownership of implementation; it should not be null.
+ KernelBase(StreamExecutor *parent, internal::KernelInterface *implementation);
+
+ // Releases resources associated with the kernel instance (i.e.
+ // platform-specific implementation).
+ ~KernelBase();
+
+ // Returns the number of parameters that this kernel accepts. (Arity refers to
+ // nullary, unary, ...).
+ unsigned Arity() const;
+
+ // Returns the StreamExecutor that represents the platform this kernel
+ // executes upon.
+ StreamExecutor *parent() const { return parent_; }
+
+ // Returns a const pointer to the (opaque) platform-dependent implementation.
+ const internal::KernelInterface *implementation() const {
+ return implementation_.get();
+ }
+
+ // Returns a non-const pointer to the (opaque) platform-dependent
+ // implementation.
+ internal::KernelInterface *implementation() { return implementation_.get(); }
+
+ void set_metadata(const KernelMetadata &metadata) { metadata_ = metadata; }
+
+ const KernelMetadata &metadata() const { return metadata_; }
+
+ // Sets the preferred cache configuration for a kernel. This is just a
+ // suggestion to the runtime, and may not be honored during execution.
+ void SetPreferredCacheConfig(KernelCacheConfig config);
+
+ // Gets the preferred cache configuration for a kernel.
+ KernelCacheConfig GetPreferredCacheConfig() const;
+
+ void set_name(port::StringPiece name);
+ const string &name() const { return name_; }
+ const string &demangled_name() const { return demangled_name_; }
+
+ private:
+ // Implementation delegated to for platform-specific functionality.
+ std::unique_ptr<internal::KernelInterface> implementation_;
+
+ // The StreamExecutor that loads this kernel object.
+ StreamExecutor *parent_;
+
+ string name_;
+ string demangled_name_;
+
+ KernelMetadata metadata_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(KernelBase);
+};
+
+// Whether T is a DeviceMemory-family pointer.
+template <typename T>
+struct IsDeviceMemoryPointer {
+ static constexpr bool value = false;
+};
+
+template <typename U>
+struct IsDeviceMemoryPointer<DeviceMemory<U> *> {
+ static constexpr bool value = true;
+};
+
+template <>
+struct IsDeviceMemoryPointer<DeviceMemoryBase *> {
+ static constexpr bool value = true;
+};
+
+// Whether T is a DeviceMemory-family value-like thing (which includes a
+// reference). This trait is useful because we pack values in the same manner as
+// references.
+template <typename T>
+struct IsDeviceMemoryValueLike {
+ static constexpr bool value = false;
+};
+
+template <typename U>
+struct IsDeviceMemoryValueLike<DeviceMemory<U> &> {
+ static constexpr bool value = true;
+};
+
+// We need to treat SharedDeviceMemory types differently than other DeviceMemory
+// types (since they maintain no allocations), hence these specializations.
+template <typename U>
+struct IsDeviceMemoryValueLike<SharedDeviceMemory<U> &> {
+ static constexpr bool value = false;
+};
+
+template <>
+struct IsDeviceMemoryValueLike<DeviceMemoryBase &> {
+ static constexpr bool value = true;
+};
+
+template <typename U>
+struct IsDeviceMemoryValueLike<DeviceMemory<U>> {
+ static constexpr bool value = true;
+};
+
+template <typename U>
+struct IsDeviceMemoryValueLike<SharedDeviceMemory<U>> {
+ static constexpr bool value = false;
+};
+
+template <>
+struct IsDeviceMemoryValueLike<DeviceMemoryBase> {
+ static constexpr bool value = true;
+};
+
+template <typename U>
+struct IsSharedDeviceMemory {
+ static constexpr bool value = false;
+};
+
+template <typename U>
+struct IsSharedDeviceMemory<SharedDeviceMemory<U> &> {
+ static constexpr bool value = true;
+};
+
+template <typename U>
+struct IsSharedDeviceMemory<SharedDeviceMemory<U>> {
+ static constexpr bool value = true;
+};
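+
+// Purely illustrative compile-time checks of the traits above:
+//
+//   static_assert(IsDeviceMemoryPointer<DeviceMemory<float> *>::value,
+//                 "DeviceMemory<T>* is treated as a device memory pointer");
+//   static_assert(!IsDeviceMemoryValueLike<SharedDeviceMemory<float>>::value,
+//                 "shared device memory is packed via its own path");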
+
+// KernelArg encapsulates the information necessary for a back-end executor to
+// configure a kernel to launch using the given argument.
+struct KernelArg {
+ // Indicates the type of an argument: normal, to be passed to the kernel
+ // in the standard manner, or shared memory, which has distinct
+ // rules for specification per backend.
+ enum Type {
+ kNormal,
+ kSharedMemory,
+ } type;
+
+ // The data to pass to the kernel - either a pointer to device memory, or the
+ // argument value. The inlined storage keeps small args (e.g. uint8, uint64)
+ // from requiring a heap allocation.
+ port::InlinedVector<uint8, 4> data;
+
+ // The size of this argument in bytes.
+ uint64 bytes;
+};
+
+// Typed variant of KernelBase, like a typed device function pointer. See the
+// file comment for details and example usage.
+//
+// This class contains template metaprogramming magic to type check the
+// parameters passed to a kernel launch are acceptable, and subsequently pack
+// them into a form which can be used by the StreamExecutorInterface
+// implementation. (i.e. CUDA and OpenCL both bind void*s with associated
+// sizes as kernel arguments.)
+//
+// Thread-compatible.
+template <typename... Params>
+class TypedKernel : public KernelBase {
+ public:
+ // Delegates to KernelBase::KernelBase(), see that constructor.
+ explicit TypedKernel(StreamExecutor *parent) : KernelBase(parent) {}
+
+ // Test-only constructor that can take a mock KernelInterface implementation.
+ // Takes ownership of implementation; it should not be null.
+ TypedKernel(StreamExecutor *parent, internal::KernelInterface *implementation)
+ : KernelBase(parent, implementation) {}
+
+ private:
+ // Stream needs access to the specific parameter-packing functionality that
+ // the TypedKernel provides for its corresponding type signature (and no other
+ // type signatures).
+ friend class Stream;
+
+ // This is the main entry point into the magic. Packs the parameters (which
+ // must type check against the class template) into the args vector as
+ // KernelArg entries.
+ //
+ // Const refs are taken as parameters on all of the handlers to avoid
+ // implicit type promotion of integers.
+ void PackParams(std::vector<KernelArg> *args, Params... params) const {
+ PackOneParam(args, params...);
+ }
+
+ template <typename T, typename... RestOfParams>
+ void PackOneParam(std::vector<KernelArg> *args, const T &arg,
+ const RestOfParams... rest) const {
+ PackOneParam(args, arg);
+ PackOneParam(args, rest...);
+ }
+
+ // Packs one (non-DeviceMemoryBase) parameter into the args vector.
+ // The enable_if<> is for excluding DeviceMemoryBase args, which have a
+ // separate implementation below.
+ template <typename T>
+ void PackOneParam(
+ std::vector<KernelArg> *args, const T &arg,
+ typename std::enable_if<!IsDeviceMemoryValueLike<T>::value &&
+ !IsDeviceMemoryPointer<T>::value &&
+ !IsSharedDeviceMemory<T>::value>::type * =
+ nullptr) const {
+ static_assert(!std::is_pointer<T>::value,
+ "cannot pass raw pointer to the device");
+ static_assert(!std::is_convertible<T, DeviceMemoryBase>::value,
+ "cannot pass device memory as a normal value");
+ const uint8 *arg_ptr = reinterpret_cast<const uint8 *>(&arg);
+ args->emplace_back(KernelArg{
+ KernelArg::kNormal,
+ port::InlinedVector<uint8, 4>{arg_ptr, arg_ptr + sizeof(arg)}, sizeof(arg)});
+ }
+
+ // DeviceMemoryBase family reference override.
+ template <typename T>
+ void PackOneParam(
+ std::vector<KernelArg> *args, const T &arg,
+ typename std::enable_if<IsDeviceMemoryValueLike<T>::value>::type * =
+ nullptr) const {
+ args->emplace_back(parent()->DeviceMemoryToKernelArg(arg));
+ }
+
+ // DeviceMemoryBase family pointer override.
+ template <typename T>
+ void PackOneParam(
+ std::vector<KernelArg> *args, T arg,
+ typename std::enable_if<IsDeviceMemoryPointer<T>::value>::type * =
+ nullptr) const {
+ DeviceMemoryBase *ptr = static_cast<DeviceMemoryBase *>(arg);
+ args->emplace_back(parent()->DeviceMemoryToKernelArg(*ptr));
+ }
+
+ // Dynamic shared device memory has a size, but no associated allocation on
+ // the host; internally, the device will allocate storage.
+ template <typename T>
+ void PackOneParam(
+ std::vector<KernelArg> *args, T arg,
+ typename std::enable_if<IsSharedDeviceMemory<T>::value>::type * =
+ nullptr) const {
+ args->emplace_back(KernelArg{KernelArg::kSharedMemory,
+ port::InlinedVector<uint8, 4>(), arg.size()});
+ }
+
+ // Base case for variadic template expansion - nothing to do!
+ void PackOneParam(std::vector<KernelArg> *args) const {}
+
+ SE_DISALLOW_COPY_AND_ASSIGN(TypedKernel);
+};
+
+// Template metaprogramming helper type that helps us produce better error
+// messages at compile time when there are mismatches between the parameter
+// type list and the argument type list.
+template <typename ParamTuple, typename ArgTuple>
+struct KernelInvocationChecker {
+ // Whether the parameter tuple and argument tuple match in length.
+ static constexpr bool kLengthMatches =
+ std::tuple_size<ParamTuple>::value == std::tuple_size<ArgTuple>::value;
+
+ // The (matching) length of the parameters and arguments type lists.
+ static constexpr int kTupleLength =
+ static_cast<int>(std::tuple_size<ArgTuple>::value);
+
+ // Helper trait to say whether the parameter wants a DeviceMemory-reference
+ // compatible type. This is for inexact type matches, so that it doesn't have
+ // to be precisely a const DeviceMemory<T>&, but can also be a value that
+ // represents the same.
+ template <typename ParamType, typename ArgType>
+ struct IsCompatibleDeviceMemoryRef {
+ static constexpr bool value = false;
+ };
+
+ // See type trait definition above.
+ template <typename U>
+ struct IsCompatibleDeviceMemoryRef<const DeviceMemory<U> &, DeviceMemory<U>> {
+ static constexpr bool value = true;
+ };
+
+ // See type trait definition above.
+ template <typename U>
+ struct IsCompatibleDeviceMemoryRef<const SharedDeviceMemory<U> &,
+ SharedDeviceMemory<U>> {
+ static constexpr bool value = true;
+ };
+
+ // Returns whether ParamT and ArgT are compatible for data parallel kernel
+ // parameter packing without any assert functionality.
+ template <typename ParamT, typename ArgT>
+ static constexpr bool CompatibleNoAssert() {
+ return std::is_same<typename std::remove_const<ParamT>::type,
+ ArgT>::value ||
+ IsCompatibleDeviceMemoryRef<ParamT, ArgT>::value;
+ }
+
+ // Checks whether ParamT and ArgT are compatible for data parallel kernel
+ // parameter packing. kArgumentNumber is unused; it is just for error display.
+ //
+ // NOTE: if you encounter an error here, you can see the mismatch by looking
+ // at the end of the last error message, which will be of the form:
+ //
+ // ...::Compatible<const perftools::gputools::DeviceMemory<OneThing> &,
+ // perftools::gputools::DeviceMemory<AnotherThing>, true,
+ // 0>'
+ // requested here
+ //
+ // This means that the 0th argument you passed to the kernel invocation should
+ // have been DeviceMemory<OneThing> but was observed to be
+ // DeviceMemory<AnotherThing>.
+ template <typename ParamT, typename ArgT, bool kShouldStaticAssert,
+ int kArgumentNumber>
+ static constexpr bool Compatible() {
+ static_assert(
+ kShouldStaticAssert ? CompatibleNoAssert<ParamT, ArgT>() : true,
+ "parameter type (LHS) is not compatible with argument type (RHS)");
+ return CompatibleNoAssert<ParamT, ArgT>();
+ }
+
+ // Checks the parameter/argument match at kArgumentNumber when the argument
+ // number is out of bounds (negative).
+ //
+ // This is the base case: we've run out of arguments to check, so we're all
+ // good.
+ template <int kArgumentNumber, bool kShouldStaticAssert>
+ static constexpr bool CheckParam(
+ typename std::enable_if<(kArgumentNumber < 0)>::type *dummy = nullptr) {
+ return true;
+ }
+
+ // Checks the parameter/argument match at kArgumentNumber.
+ // kShouldStaticAssert determines whether to assert out on a mismatch, or just
+ // yield the constexpr boolean value.
+ template <int kArgumentNumber, bool kShouldStaticAssert>
+ static constexpr bool CheckParam(
+ typename std::enable_if<kArgumentNumber >= 0>::type *dummy = nullptr) {
+ typedef typename std::tuple_element<kArgumentNumber, ParamTuple>::type
+ ParamT;
+ typedef typename std::tuple_element<kArgumentNumber, ArgTuple>::type ArgT;
+ return Compatible<ParamT, ArgT, kShouldStaticAssert, kArgumentNumber>() &&
+ CheckParam<kArgumentNumber - 1, kShouldStaticAssert>();
+ }
+
+ // Checks the parameters/arguments for a match, but doesn't static assert out.
+ // This is useful for inspecting whether a set of parameters matches, e.g. in
+ // tests.
+ static constexpr bool CheckAllNoStaticAssert() {
+ return kLengthMatches && CheckParam<kTupleLength - 1, false>();
+ }
+
+ // Checks the parameters and static asserts out with a helpful error message
+ // (and useful template parameters in the instantiation stack) if there is an
+ // error.
+ static constexpr bool CheckAllStaticAssert() {
+ static_assert(kLengthMatches,
+ "argument length mismatched against typed kernel parameters");
+ return kLengthMatches && CheckParam<kTupleLength - 1, true>();
+ }
+};
+
+// This is a convenience type for checking whether a typed kernel matches
+// against a type list.
+template <typename KernelT, typename... Params>
+struct KernelParamsOk {
+ static constexpr bool kResult = false;
+};
+
+// See above.
+template <typename... Params, typename... Args>
+struct KernelParamsOk<TypedKernel<Params...>, Args...> {
+ static constexpr bool kResult = KernelInvocationChecker<
+ std::tuple<Params...>, std::tuple<Args...>>::CheckAllNoStaticAssert();
+};
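+
+// Purely illustrative compile-time check using the helpers above (the kernel
+// alias is hypothetical):
+//
+//   using AddKernel = TypedKernel<const DeviceMemory<int> &, int>;
+//   static_assert(KernelParamsOk<AddKernel, DeviceMemory<int>, int>::kResult,
+//                 "argument types must match the kernel signature");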
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_KERNEL_H_
diff --git a/tensorflow/stream_executor/kernel_cache_config.h b/tensorflow/stream_executor/kernel_cache_config.h
new file mode 100644
index 0000000000..9675d2940c
--- /dev/null
+++ b/tensorflow/stream_executor/kernel_cache_config.h
@@ -0,0 +1,29 @@
+// This file contains declarations relating to kernel cache configuration
+// parameters recognized by the StreamExecutor.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_KERNEL_CACHE_CONFIG_H_
+#define TENSORFLOW_STREAM_EXECUTOR_KERNEL_CACHE_CONFIG_H_
+
+namespace perftools {
+namespace gputools {
+
+// This enum represents potential configurations of L1/shared memory when
+// running a particular kernel. These values represent user preference, and
+// the runtime is not required to respect these choices.
+enum class KernelCacheConfig {
+ // Indicates no preference for device L1/shared memory configuration.
+ kNoPreference,
+
+ // Indicates a preference for more shared memory than L1 cache.
+ kPreferShared,
+
+ // Indicates a preference for more L1 cache than shared memory.
+ kPreferL1,
+
+ // Indicates a preference for equal amounts of L1 cache and shared memory.
+ kPreferEqual,
+};
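+
+// Purely illustrative: a preference is attached to a loaded kernel object via
+// KernelBase::SetPreferredCacheConfig (see kernel.h), e.g.:
+//
+//   kernel.SetPreferredCacheConfig(KernelCacheConfig::kPreferShared);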
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_KERNEL_CACHE_CONFIG_H_
diff --git a/tensorflow/stream_executor/kernel_spec.cc b/tensorflow/stream_executor/kernel_spec.cc
new file mode 100644
index 0000000000..e3b4b0d951
--- /dev/null
+++ b/tensorflow/stream_executor/kernel_spec.cc
@@ -0,0 +1,236 @@
+#include "tensorflow/stream_executor/kernel_spec.h"
+
+
+namespace perftools {
+namespace gputools {
+
+KernelLoaderSpec::KernelLoaderSpec(port::StringPiece kernelname)
+ : kernelname_(kernelname.ToString()) {}
+
+OnDiskKernelLoaderSpec::OnDiskKernelLoaderSpec(port::StringPiece filename,
+ port::StringPiece kernelname)
+ : KernelLoaderSpec(kernelname), filename_(filename.ToString()) {}
+
+CudaPtxOnDisk::CudaPtxOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname)
+ : OnDiskKernelLoaderSpec(filename, kernelname) {}
+
+CudaCubinOnDisk::CudaCubinOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname)
+ : OnDiskKernelLoaderSpec(filename, kernelname) {}
+
+CudaCubinInMemory::CudaCubinInMemory(const char *bytes,
+ port::StringPiece kernelname)
+ : KernelLoaderSpec(kernelname), bytes_(bytes) {}
+
+bool CompareComputeCapability(const std::tuple<int, int> &lhs,
+ const std::tuple<int, int> &rhs) {
+ return std::get<0>(lhs) < std::get<0>(rhs) ||
+ (std::get<0>(lhs) == std::get<0>(rhs) &&
+ std::get<1>(lhs) < std::get<1>(rhs));
+}
+
+const std::tuple<int, int> CudaPtxInMemory::kMinimumCapability{1, 0};
+
+CudaPtxInMemory::CudaPtxInMemory(port::StringPiece ptx,
+ port::StringPiece kernel_name,
+ bool ptx_compressed)
+ : KernelLoaderSpec(kernel_name),
+ ptx_by_compute_capability_(CompareComputeCapability) {
+ if (ptx_compressed) {
+ // Lazy decompression. Put an empty string in decompressed_ptx_ showing that
+ // the original ptx is compressed.
+ decompressed_ptx_[ptx.data()] = "";
+ }
+ ptx_by_compute_capability_[kMinimumCapability] = ptx.data();
+}
+
+CudaPtxInMemory::CudaPtxInMemory(
+ const std::initializer_list<CudaPtxInMemory::PtxSpec> &spec_list,
+ port::StringPiece kernel_name, bool ptx_compressed)
+ : KernelLoaderSpec(kernel_name),
+ ptx_by_compute_capability_(CompareComputeCapability) {
+ for (const auto &spec : spec_list) {
+ int major, minor;
+ port::StringPiece ptx;
+ std::tie(major, minor, ptx) = spec;
+ if (ptx_compressed) {
+ // Lazy decompression. Put an empty string in decompressed_ptx_ showing
+ // that the original ptx is compressed.
+ decompressed_ptx_[ptx.data()] = "";
+ }
+ ptx_by_compute_capability_[std::tuple<int, int>{major, minor}] = ptx.data();
+ }
+}
+
+string CudaPtxInMemory::DecompressPtx(const char *ptx) {
+ // Get the length of the PTX string from the beginning of the buffer.
+ uint64 ptx_length = *reinterpret_cast<const uint64 *>(ptx);
+ // Get the PTX string from the buffer with offset and length.
+ string compressed_ptx(ptx + sizeof(uint64),
+ ptx + sizeof(uint64) + ptx_length);
+ string decompressed_ptx;
+ // Decompress the PTX string with bzip2.
+ LOG(FATAL) << "bzip2 decompression is not supported yet.";
+ return decompressed_ptx;
+}
+
+const char *CudaPtxInMemory::default_text() const {
+ if (ptx_by_compute_capability_.empty()) {
+ return nullptr;
+ }
+
+ mutex_lock lock{mu_};
+
+ auto ptx = ptx_by_compute_capability_.begin()->second;
+ // Check if there is an entry in the decompressed ptx table.
+ auto decompressed_ptx_iter = decompressed_ptx_.find(ptx);
+ if (decompressed_ptx_iter != decompressed_ptx_.end()) {
+ // If the decompressed string is empty, which means the ptx hasn't been
+ // decompressed, decompress it here.
+ if (decompressed_ptx_iter->second.size() == 0) {
+ decompressed_ptx_iter->second = DecompressPtx(ptx);
+ }
+ return decompressed_ptx_iter->second.c_str();
+ }
+ return ptx;
+}
+
+const char *CudaPtxInMemory::original_default_text() const {
+ if (ptx_by_compute_capability_.empty()) {
+ return nullptr;
+ }
+
+ return ptx_by_compute_capability_.begin()->second;
+}
+
+const char *CudaPtxInMemory::text(int compute_capability_major,
+ int compute_capability_minor) const {
+ std::tuple<int, int> capability{compute_capability_major,
+ compute_capability_minor};
+
+ auto ptx_iter = ptx_by_compute_capability_.find(capability);
+ if (ptx_iter == ptx_by_compute_capability_.end()) {
+ return nullptr;
+ }
+
+ mutex_lock lock{mu_};
+
+ // Check if there is an entry in the decompressed ptx table.
+ auto decompressed_ptx_iter = decompressed_ptx_.find(ptx_iter->second);
+ if (decompressed_ptx_iter != decompressed_ptx_.end()) {
+ // If the decompressed string is empty, which means the ptx hasn't been
+ // decompressed, decompress it here.
+ if (decompressed_ptx_iter->second.size() == 0) {
+ decompressed_ptx_iter->second = DecompressPtx(ptx_iter->second);
+ }
+ return decompressed_ptx_iter->second.c_str();
+ }
+ return ptx_iter->second;
+}
+
+const char *CudaPtxInMemory::original_text(int compute_capability_major,
+ int compute_capability_minor) const {
+ std::tuple<int, int> capability{compute_capability_major,
+ compute_capability_minor};
+
+ auto ptx_iter = ptx_by_compute_capability_.find(capability);
+ if (ptx_iter == ptx_by_compute_capability_.end()) {
+ return nullptr;
+ }
+
+ return ptx_iter->second;
+}
+
+OpenCLTextOnDisk::OpenCLTextOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname)
+ : OnDiskKernelLoaderSpec(filename, kernelname) {}
+
+OpenCLTextInMemory::OpenCLTextInMemory(port::StringPiece text,
+ port::StringPiece kernelname)
+ : KernelLoaderSpec(kernelname), text_(text.ToString()) {}
+
+OpenCLBinaryOnDisk::OpenCLBinaryOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname)
+ : OnDiskKernelLoaderSpec(filename, kernelname) {}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddOpenCLTextOnDisk(
+ port::StringPiece filename, port::StringPiece kernelname) {
+ CHECK(ocl_text_on_disk_ == nullptr);
+ ocl_text_on_disk_.reset(new OpenCLTextOnDisk{filename, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddOpenCLBinaryOnDisk(
+ port::StringPiece filename, port::StringPiece kernelname) {
+ CHECK(ocl_binary_on_disk_ == nullptr);
+ ocl_binary_on_disk_.reset(new OpenCLBinaryOnDisk{filename, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddOpenCLTextInMemory(
+ port::StringPiece ocl_text, port::StringPiece kernelname) {
+ CHECK(ocl_text_in_memory_ == nullptr);
+ ocl_text_in_memory_.reset(new OpenCLTextInMemory{ocl_text, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaPtxOnDisk(
+ port::StringPiece filename, port::StringPiece kernelname) {
+ CHECK(cuda_ptx_on_disk_ == nullptr);
+ cuda_ptx_on_disk_.reset(new CudaPtxOnDisk{filename, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaCubinInMemory(
+ const char *bytes, port::StringPiece kernelname) {
+ CHECK(cuda_cubin_in_memory_ == nullptr);
+ cuda_cubin_in_memory_.reset(new CudaCubinInMemory{bytes, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaCubinOnDisk(
+ port::StringPiece filename, port::StringPiece kernelname) {
+ CHECK(cuda_cubin_on_disk_ == nullptr);
+ cuda_cubin_on_disk_.reset(new CudaCubinOnDisk{filename, kernelname});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaPtxInMemory(
+ port::StringPiece ptx, port::StringPiece kernelname) {
+ CHECK(cuda_ptx_in_memory_ == nullptr);
+ cuda_ptx_in_memory_.reset(
+ new CudaPtxInMemory{ptx, kernelname, false /* ptx_compressed */});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaCompressedPtxInMemory(
+ port::StringPiece ptx, port::StringPiece kernelname) {
+ CHECK(cuda_ptx_in_memory_ == nullptr);
+ cuda_ptx_in_memory_.reset(
+ new CudaPtxInMemory{ptx, kernelname, true /* ptx_compressed */});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaPtxInMemory(
+ std::initializer_list<CudaPtxInMemory::PtxSpec> spec_list,
+ port::StringPiece kernelname) {
+ CHECK(cuda_ptx_in_memory_ == nullptr);
+ cuda_ptx_in_memory_.reset(
+ new CudaPtxInMemory{spec_list, kernelname, false /* ptx_compressed */});
+ return this;
+}
+
+MultiKernelLoaderSpec *MultiKernelLoaderSpec::AddCudaCompressedPtxInMemory(
+ std::initializer_list<CudaPtxInMemory::PtxSpec> spec_list,
+ port::StringPiece kernelname) {
+ CHECK(cuda_ptx_in_memory_ == nullptr);
+ cuda_ptx_in_memory_.reset(
+ new CudaPtxInMemory{spec_list, kernelname, true /* ptx_compressed */});
+ return this;
+}
+
+MultiKernelLoaderSpec::MultiKernelLoaderSpec(size_t arity) : arity_(arity) {}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/kernel_spec.h b/tensorflow/stream_executor/kernel_spec.h
new file mode 100644
index 0000000000..01a47ac253
--- /dev/null
+++ b/tensorflow/stream_executor/kernel_spec.h
@@ -0,0 +1,365 @@
+// Kernel-loader specs are structures that describe how to load a data-parallel
+// kernel on a given platform for subsequent launching. Headers that instantiate
+// these data structures will typically be auto-generated. However, users can
+// also instantiate them by hand.
+//
+// A kernel with the same exact functionality and type signature may be
+// implemented on several different platforms. Typical usage is to create a
+// singleton that describes how to load a kernel on the various supported
+// platforms:
+//
+// static const MultiKernelLoaderSpec &SaxpySpec() {
+// static auto *mkls =
+// (new MultiKernelLoaderSpec{4 /* = arity */})
+// ->AddCudaPtxOnDisk(ptx_file_path, ptx_kernelname)
+// ->AddOpenCLTextOnDisk(opencl_text_file_path, ocl_kernelname);
+//
+// return *mkls;
+// }
+//
+// This lazily instantiates an object that describes how to load CUDA PTX
+// present on disk that implements saxpy for the CUDA platform, or
+// OpenCL text present on disk that implements saxpy for an OpenCL-based
+// platform. The CudaPtxOnDisk and OpenCLTextOnDisk objects are subtypes of
+// KernelLoaderSpec -- KernelLoaderSpec describes how to load a kernel for
+// subsequent launching on a single platform.
+//
+// For the loader functionality that accepts these KernelLoaderSpecs in order
+// to grab the kernel appropriately, see StreamExecutor::GetKernel().
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_KERNEL_SPEC_H_
+#define TENSORFLOW_STREAM_EXECUTOR_KERNEL_SPEC_H_
+
+#include <stddef.h>
+#include <map>
+#include <memory>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+// Describes how to load a kernel on a target platform.
+//
+// This is an abstract base class, subclassed for specific platforms.
+// Subclasses carry the program location or contents (e.g. a PTX file path or
+// OpenCL program text); whether it is a filename or in-memory text is exposed
+// via more specifically named accessors in the subclasses.
+//
+// These kernel loader specifications are typically auto-generated into header
+// files at build time, but can also be specified manually.
+class KernelLoaderSpec {
+ public:
+ virtual ~KernelLoaderSpec() {}
+
+ // Returns the kernel name to load out of the program.
+ const string &kernelname() const { return kernelname_; }
+
+ protected:
+ explicit KernelLoaderSpec(port::StringPiece kernelname);
+
+ private:
+ // The kernel name that should be loaded out of the program description given
+ // above.
+ string kernelname_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(KernelLoaderSpec);
+};
+
+// An abstract kernel loader spec that has an associated file path, where
+// there's a canonical suffix for the filename; e.g. see CudaPtxOnDisk whose
+// canonical filename suffix is ".ptx".
+class OnDiskKernelLoaderSpec : public KernelLoaderSpec {
+ public:
+ ~OnDiskKernelLoaderSpec() override {}
+
+ // Returns the path to the on-disk loadable kernel file.
+ const string &filename() const { return filename_; }
+
+ // Returns the canonical suffix for this on-disk kernel loader spec format;
+ // e.g. PTX files on disk have a canonical suffix of ".ptx".
+ virtual const char *CanonicalSuffix() const = 0;
+
+ protected:
+ OnDiskKernelLoaderSpec(port::StringPiece filename,
+ port::StringPiece kernelname);
+
+ string filename_;
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(OnDiskKernelLoaderSpec);
+};
+
+// Kernel loader specification for PTX text that resides on disk.
+class CudaPtxOnDisk : public OnDiskKernelLoaderSpec {
+ public:
+ CudaPtxOnDisk(port::StringPiece filename, port::StringPiece kernelname);
+ ~CudaPtxOnDisk() override {}
+
+ const char *CanonicalSuffix() const override { return ".ptx"; }
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(CudaPtxOnDisk);
+};
+
+// Kernel loader specification for CUBIN binary that resides on disk.
+class CudaCubinOnDisk : public OnDiskKernelLoaderSpec {
+ public:
+ CudaCubinOnDisk(port::StringPiece filename, port::StringPiece kernelname);
+ ~CudaCubinOnDisk() override {}
+
+ const string &filename() const { return filename_; }
+
+ const char *CanonicalSuffix() const override { return ".cubin"; }
+
+ private:
+ string filename_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CudaCubinOnDisk);
+};
+
+// Kernel loader specification for PTX text that resides in memory.
+class CudaPtxInMemory : public KernelLoaderSpec {
+ public:
+ // Components: compute capability major number, compute capability minor
+ // number, and PTX source.
+ typedef std::tuple<int, int, port::StringPiece> PtxSpec;
+
+ // Single-PTX constructor. Adds the provided PTX version with an unknown
+ // compute capability. Since the CC is unknown, the PTX is assumed to be very
+ // generally usable - in other words, PTX specified in this manner is VERY
+ // likely to be used as the default! Note that the PTX can be compressed,
+ // which is indicated by the argument ptx_compressed.
+ //
+ // Warning: the string backing the provided port::StringPiece ptx must outlive this
+ // instance.
+ CudaPtxInMemory(port::StringPiece ptx, port::StringPiece kernelname,
+ bool ptx_compressed = false);
+
+ // Multiple-PTX-version constructor. Adds each item in spec_list to this
+ // object. Note that the PTX can be compressed, which is indicated by the
+ // argument ptx_compressed.
+ CudaPtxInMemory(const std::initializer_list<PtxSpec> &spec_list,
+ port::StringPiece kernel_name, bool ptx_compressed = false);
+ ~CudaPtxInMemory() override {}
+
+ // Add the PTX implementation described by ptx_spec to this object. On
+ // collision (i.e., if a version with the same compute_capability already
+ // exists), the existing implementation will be overwritten.
+ void AddSpec(PtxSpec ptx_spec);
+
+ // Returns a pointer to the ptx of the available implementation with the
+ // lowest-valued compute capability. For example, if PTX for CC2.0, 3.0, and
+ // 3.5 are all available, the CC2.0 version is returned. Returns nullptr if
+ // no implementation is available.
+ // When the ptx is compressed, returns the decompressed ptx.
+ const char *default_text() const;
+
+ // Similar to default_text().
+ // When the ptx is compressed, returns the original compressed ptx.
+ const char *original_default_text() const;
+
+ // Returns pointer to the ptx for the requested compute capability.
+ // Returns nullptr on failed lookup (if the requested version is not
+ // available).
+ // When the ptx is compressed, returns the decompressed ptx.
+ const char *text(int compute_capability_major,
+ int compute_capability_minor) const;
+
+ // Similar to text().
+ // When the ptx is compressed, returns the original compressed ptx.
+ const char *original_text(int compute_capability_major,
+ int compute_capability_minor) const;
+
+ // Decompresses the PTX string using bzip2.
+ static string DecompressPtx(const char *ptx);
+
+ private:
+ // PTX translation unit text contents in memory. The key is a tuple
+ // "<cc_major>,<cc_minor>", e.g., "2,0", "3,0", "3,5". Because CCs
+ // represented in this way have a clear sorting order, map::begin() will give
+ // the lowest-numbered version available, i.e. the default.
+ std::map<std::tuple<int, int>, const char *,
+ bool (*)(const std::tuple<int, int> &, const std::tuple<int, int> &)>
+ ptx_by_compute_capability_;
+
+ // Stores all decompressed ptx strings, with original ptx string as keys.
+ // It is marked as mutable for lazy decompression.
+ mutable std::map<const char *, string> decompressed_ptx_;
+ mutable mutex mu_;
+
+ // Defines the minimum compute capability possible. Used when PTX has no
+ // compute capability specified (in the single-PTX constructor).
+ static const std::tuple<int, int> kMinimumCapability;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CudaPtxInMemory);
+};
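+
+// Purely illustrative multi-version construction (the PTX strings and kernel
+// name here are hypothetical):
+//
+//   CudaPtxInMemory spec({CudaPtxInMemory::PtxSpec(2, 0, ptx_for_cc20),
+//                         CudaPtxInMemory::PtxSpec(3, 5, ptx_for_cc35)},
+//                        "saxpy");
+//   const char *ptx = spec.text(3, 5);  // nullptr if no CC3.5 entry exists.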
+
+// Kernel loader specification for OpenCL text that resides on disk.
+class OpenCLTextOnDisk : public OnDiskKernelLoaderSpec {
+ public:
+ OpenCLTextOnDisk(port::StringPiece filename, port::StringPiece kernelname);
+ ~OpenCLTextOnDisk() override {}
+
+ const char *CanonicalSuffix() const override { return ".ocl"; }
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(OpenCLTextOnDisk);
+};
+
+// Kernel loader specification for OpenCL binary that resides on disk.
+class OpenCLBinaryOnDisk : public OnDiskKernelLoaderSpec {
+ public:
+ OpenCLBinaryOnDisk(port::StringPiece filename, port::StringPiece kernelname);
+ ~OpenCLBinaryOnDisk() override {}
+
+ const char *CanonicalSuffix() const override { return ".aocx"; }
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(OpenCLBinaryOnDisk);
+};
+
+// Kernel loader specification for OpenCL text that resides in memory.
+class OpenCLTextInMemory : public KernelLoaderSpec {
+ public:
+ OpenCLTextInMemory(port::StringPiece text, port::StringPiece kernelname);
+ ~OpenCLTextInMemory() override {}
+
+ // Returns the OpenCL text contents.
+ const string &text() const { return text_; }
+
+ private:
+ // OpenCL translation unit text contents in memory.
+ string text_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(OpenCLTextInMemory);
+};
+
+// Kernel loader specification for a CUBIN blob that resides in memory.
+class CudaCubinInMemory : public KernelLoaderSpec {
+ public:
+ CudaCubinInMemory(const char *bytes, port::StringPiece kernelname);
+ ~CudaCubinInMemory() override {}
+
+ const char *bytes() const { return bytes_; }
+
+ private:
+ const char *bytes_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(CudaCubinInMemory);
+};
+
+// Describes how to load a kernel on any subset of a number of target platforms.
+class MultiKernelLoaderSpec {
+ public:
+ explicit MultiKernelLoaderSpec(size_t arity);
+
+ // Returns the number of arguments that this kernel accepts.
+ size_t arity() const { return arity_; }
+
+ // Convenience getters for testing whether these platform variants have
+ // kernel loader specifications available.
+ bool has_cuda_ptx_on_disk() const { return cuda_ptx_on_disk_ != nullptr; }
+ bool has_cuda_cubin_on_disk() const { return cuda_cubin_on_disk_ != nullptr; }
+ bool has_cuda_cubin_in_memory() const {
+ return cuda_cubin_in_memory_ != nullptr;
+ }
+ bool has_cuda_ptx_in_memory() const { return cuda_ptx_in_memory_ != nullptr; }
+ bool has_ocl_text_on_disk() const { return ocl_text_on_disk_ != nullptr; }
+ bool has_ocl_binary_on_disk() const { return ocl_binary_on_disk_ != nullptr; }
+ bool has_ocl_text_in_memory() const { return ocl_text_in_memory_ != nullptr; }
+
+ // Accessors for platform variant kernel load specifications.
+ // Precondition: corresponding has_* is true.
+ const CudaPtxOnDisk &cuda_ptx_on_disk() const {
+ CHECK(has_cuda_ptx_on_disk());
+ return *cuda_ptx_on_disk_;
+ }
+ const CudaCubinOnDisk &cuda_cubin_on_disk() const {
+ CHECK(has_cuda_cubin_on_disk());
+ return *cuda_cubin_on_disk_;
+ }
+ const CudaCubinInMemory &cuda_cubin_in_memory() const {
+ CHECK(has_cuda_cubin_in_memory());
+ return *cuda_cubin_in_memory_;
+ }
+ const CudaPtxInMemory &cuda_ptx_in_memory() const {
+ CHECK(has_cuda_ptx_in_memory());
+ return *cuda_ptx_in_memory_;
+ }
+ const OpenCLTextOnDisk &ocl_text_on_disk() const {
+ CHECK(has_ocl_text_on_disk());
+ return *ocl_text_on_disk_;
+ }
+ const OpenCLBinaryOnDisk &ocl_binary_on_disk() const {
+ CHECK(has_ocl_binary_on_disk());
+ return *ocl_binary_on_disk_;
+ }
+ const OpenCLTextInMemory &ocl_text_in_memory() const {
+ CHECK(has_ocl_text_in_memory());
+ return *ocl_text_in_memory_;
+ }
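+
+ // Purely illustrative guarded access, for a spec configured via the Add*
+ // builder methods declared below:
+ //
+ //   if (spec.has_cuda_ptx_in_memory()) {
+ //     const char *ptx = spec.cuda_ptx_in_memory().default_text();
+ //   }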
+
+ // Builder-pattern-like methods for use in initializing a
+ // MultiKernelLoaderSpec. Each of these should be used at most once for a
+ // single MultiKernelLoaderSpec object. See file comment for example usage.
+ //
+ // Note that the kernelname parameter must be consistent with the kernel in
+ // the PTX or OpenCL being loaded. Also be aware that in CUDA C++ the kernel
+ // name may be mangled by the compiler if it is not declared in an
+ // extern "C" scope.
+ MultiKernelLoaderSpec *AddOpenCLTextOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddOpenCLBinaryOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddOpenCLTextInMemory(port::StringPiece ocl_text,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaPtxOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaCubinOnDisk(port::StringPiece filename,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaCubinInMemory(const char *cubin_bytes,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaPtxInMemory(port::StringPiece ptx,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaCompressedPtxInMemory(
+ port::StringPiece ptx, port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaPtxInMemory(
+ std::initializer_list<CudaPtxInMemory::PtxSpec> spec_list,
+ port::StringPiece kernelname);
+ MultiKernelLoaderSpec *AddCudaCompressedPtxInMemory(
+ std::initializer_list<CudaPtxInMemory::PtxSpec> spec_list,
+ port::StringPiece kernelname);
+
+ private:
+ std::unique_ptr<CudaPtxOnDisk>
+ cuda_ptx_on_disk_; // PTX text that resides in a file.
+ std::unique_ptr<CudaCubinOnDisk>
+ cuda_cubin_on_disk_; // Binary CUDA program in a file.
+ std::unique_ptr<CudaCubinInMemory>
+ cuda_cubin_in_memory_; // Binary CUDA program in memory.
+ std::unique_ptr<CudaPtxInMemory>
+ cuda_ptx_in_memory_; // PTX text that resides in memory.
+ std::unique_ptr<OpenCLTextOnDisk>
+ ocl_text_on_disk_; // OpenCL text that resides on disk.
+ std::unique_ptr<OpenCLBinaryOnDisk>
+ ocl_binary_on_disk_; // OpenCL binary that resides on disk.
+ std::unique_ptr<OpenCLTextInMemory>
+ ocl_text_in_memory_; // OpenCL text that resides in memory.
+
+ // Number of parameters that the kernel takes. (This is nicer to have in a
+ // constexpr than having to determine it from the types via template
+ // metaprogramming).
+ size_t arity_;
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_KERNEL_SPEC_H_
diff --git a/tensorflow/stream_executor/launch_dim.h b/tensorflow/stream_executor/launch_dim.h
new file mode 100644
index 0000000000..9b870ed6aa
--- /dev/null
+++ b/tensorflow/stream_executor/launch_dim.h
@@ -0,0 +1,65 @@
+// Types to express dimensionality of a kernel launch. Blocks and threads
+// are (up to) 3-dimensional.
+//
+// A thread is conceptually like a SIMD lane. Some number of SIMD lanes,
+// typically 32 (though that fact should not be relied on), are tied together
+// with a single PC in a unit called a warp. There is a maximum number of threads
+// that can execute in a shared-context entity called a block. Presently, that
+// number is 1024 -- again, something that should not be relied on from this
+// comment, but checked via perftools::gputools::DeviceDescription.
+//
+// For additional information, see
+// http://docs.nvidia.com/cuda/kepler-tuning-guide/#device-utilization-and-occupancy
+//
+// Because of that modest thread-per-block limit, a kernel can be launched with
+// multiple blocks. Each block is indivisibly scheduled onto a single core.
+// Blocks can also be used in a multi-dimensional configuration, and the block
+// count has much less modest limits -- typically they're similar to the maximum
+// amount of addressable memory.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LAUNCH_DIM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LAUNCH_DIM_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+// Basic type that represents a 3-dimensional index space.
+struct Dim3D {
+ uint64 x, y, z;
+
+ Dim3D(uint64 x, uint64 y, uint64 z) : x(x), y(y), z(z) {}
+};
+
+// Thread dimensionality for use in a kernel launch. See file comment for
+// details.
+struct ThreadDim : public Dim3D {
+ explicit ThreadDim(uint64 x = 1, uint64 y = 1, uint64 z = 1)
+ : Dim3D(x, y, z) {}
+
+ // Returns a string representation of the thread dimensionality.
+ string ToString() const {
+ return port::StrCat("ThreadDim{", x, ", ", y, ", ", z, "}");
+ }
+};
+
+// Block dimensionality for use in a kernel launch. See file comment for
+// details.
+struct BlockDim : public Dim3D {
+ explicit BlockDim(uint64 x = 1, uint64 y = 1, uint64 z = 1)
+ : Dim3D(x, y, z) {}
+
+ // Returns a string representation of the block dimensionality.
+ string ToString() const {
+ return port::StrCat("BlockDim{", x, ", ", y, ", ", z, "}");
+ }
+};
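+
+// Purely illustrative construction: a 1-D launch shape of 128 blocks with 256
+// threads per block.
+//
+//   ThreadDim thread_dim(256);
+//   BlockDim block_dim(128);
+//   string shape = port::StrCat(block_dim.ToString(), " of ",
+//                               thread_dim.ToString());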
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LAUNCH_DIM_H_
diff --git a/tensorflow/stream_executor/lib/array_slice.h b/tensorflow/stream_executor/lib/array_slice.h
new file mode 100644
index 0000000000..271b1c15a0
--- /dev/null
+++ b/tensorflow/stream_executor/lib/array_slice.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_ARRAY_SLICE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_ARRAY_SLICE_H_
+
+#include "tensorflow/core/lib/gtl/array_slice.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::gtl::ArraySlice;
+using tensorflow::gtl::MutableArraySlice;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_ARRAY_SLICE_H_
diff --git a/tensorflow/stream_executor/lib/casts.h b/tensorflow/stream_executor/lib/casts.h
new file mode 100644
index 0000000000..61ff2ab00e
--- /dev/null
+++ b/tensorflow/stream_executor/lib/casts.h
@@ -0,0 +1,85 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_CASTS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_CASTS_H_
+
+#include <stdlib.h>
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+// port::bit_cast<Dest,Source> is a template function that implements the
+// equivalent of "*reinterpret_cast<Dest*>(&source)". We need this in
+// very low-level functions like the protobuf library and fast math
+// support.
+//
+// float f = 3.14159265358979;
+// int i = port::bit_cast<int32>(f);
+// // i = 0x40490fdb
+//
+// The classical address-casting method is:
+//
+// // WRONG
+// float f = 3.14159265358979; // WRONG
+// int i = * reinterpret_cast<int*>(&f); // WRONG
+//
+// The address-casting method actually produces undefined behavior
+// according to ISO C++ specification section 3.10 -15 -. Roughly, this
+// section says: if an object in memory has one type, and a program
+// accesses it with a different type, then the result is undefined
+// behavior for most values of "different type".
+//
+// This is true for any cast syntax, either *(int*)&f or
+// *reinterpret_cast<int*>(&f). And it is particularly true for
+// conversions between integral lvalues and floating-point lvalues.
+//
+// The purpose of 3.10 -15- is to allow optimizing compilers to assume
+// that expressions with different types refer to different memory. gcc
+// 4.0.1 has an optimizer that takes advantage of this. So a
+// non-conforming program quietly produces wildly incorrect output.
+//
+// The problem is not the use of reinterpret_cast. The problem is type
+// punning: holding an object in memory of one type and reading its bits
+// back using a different type.
+//
+// The C++ standard is more subtle and complex than this, but that
+// is the basic idea.
+//
+// Anyways ...
+//
+// port::bit_cast<> calls memcpy() which is blessed by the standard,
+// especially by the example in section 3.9 . Also, of course,
+// port::bit_cast<> wraps up the nasty logic in one place.
+//
+// Fortunately memcpy() is very fast. In optimized mode, with a
+// constant size, gcc 2.95.3, gcc 4.0.1, and msvc 7.1 produce inline
+// code with the minimal amount of data movement. On a 32-bit system,
+// memcpy(d,s,4) compiles to one load and one store, and memcpy(d,s,8)
+// compiles to two loads and two stores.
+//
+// I tested this code with gcc 2.95.3, gcc 4.0.1, icc 8.1, and msvc 7.1.
+//
+// WARNING: if Dest or Source is a non-POD type, the result of the memcpy
+// is likely to surprise you.
+//
+// Props to Bill Gibbons for the compile time assertion technique and
+// Art Komninos and Igor Tandetnik for the msvc experiments.
+//
+// -- mec 2005-10-17
+
+template <class Dest, class Source>
+inline Dest bit_cast(const Source& source) {
+ // Compile time assertion: sizeof(Dest) == sizeof(Source)
+ // A compile error here means your Dest and Source have different sizes.
+ static_assert(sizeof(Dest) == sizeof(Source),
+ "src and dst types must have equal sizes");
+
+ Dest dest;
+ memcpy(&dest, &source, sizeof(dest));
+ return dest;
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_CASTS_H_
diff --git a/tensorflow/stream_executor/lib/demangle.cc b/tensorflow/stream_executor/lib/demangle.cc
new file mode 100644
index 0000000000..6b837b803a
--- /dev/null
+++ b/tensorflow/stream_executor/lib/demangle.cc
@@ -0,0 +1,38 @@
+#include "tensorflow/stream_executor/lib/demangle.h"
+
+#if (__GNUC__ >= 4 || (__GNUC__ >= 3 && __GNUC_MINOR__ >= 4)) && \
+ !defined(__mips__)
+# define HAS_CXA_DEMANGLE 1
+#else
+# define HAS_CXA_DEMANGLE 0
+#endif
+
+#include <stdlib.h>
+#if HAS_CXA_DEMANGLE
+#include <cxxabi.h>
+#endif
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+// The API reference of abi::__cxa_demangle() can be found in
+// libstdc++'s manual.
+// https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a01696.html
+string Demangle(const char *mangled) {
+ string demangled;
+ int status = 0;
+ char *result = NULL;
+#if HAS_CXA_DEMANGLE
+ result = abi::__cxa_demangle(mangled, NULL, NULL, &status);
+#endif
+ if (status == 0 && result != NULL) { // Demangling succeeded.
+ demangled.append(result);
+ free(result);
+ }
+ return demangled;
+}
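+
+// Illustrative usage (editorial sketch, not part of the original change):
+//
+//   #include <typeinfo>
+//   #include <vector>
+//
+//   std::vector<int> v;
+//   string readable = port::Demangle(typeid(v).name());
+//   // With __cxa_demangle available, |readable| is something like
+//   // "std::vector<int, std::allocator<int> >"; otherwise it is empty.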
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/lib/demangle.h b/tensorflow/stream_executor/lib/demangle.h
new file mode 100644
index 0000000000..0420f7101f
--- /dev/null
+++ b/tensorflow/stream_executor/lib/demangle.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_DEMANGLE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_DEMANGLE_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+string Demangle(const char* mangled);
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_DEMANGLE_H_
diff --git a/tensorflow/stream_executor/lib/env.h b/tensorflow/stream_executor/lib/env.h
new file mode 100644
index 0000000000..74b50ad42d
--- /dev/null
+++ b/tensorflow/stream_executor/lib/env.h
@@ -0,0 +1,29 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_ENV_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_ENV_H_
+
+#include "tensorflow/core/public/env.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::Env;
+using tensorflow::ReadFileToString;
+using tensorflow::Thread;
+using tensorflow::WriteStringToFile;
+
+inline bool FileExists(const string& filename) {
+ return Env::Default()->FileExists(filename);
+}
+
+inline bool FileExists(const port::StringPiece& filename) {
+ return Env::Default()->FileExists(filename.ToString());
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_ENV_H_
diff --git a/tensorflow/stream_executor/lib/error.h b/tensorflow/stream_executor/lib/error.h
new file mode 100644
index 0000000000..376ddd3d07
--- /dev/null
+++ b/tensorflow/stream_executor/lib/error.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_ERROR_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_ERROR_H_
+
+#include "tensorflow/core/lib/core/error_codes.pb.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+namespace error = tensorflow::error;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_ERROR_H_
diff --git a/tensorflow/stream_executor/lib/human_readable.h b/tensorflow/stream_executor/lib/human_readable.h
new file mode 100644
index 0000000000..78df4a4a70
--- /dev/null
+++ b/tensorflow/stream_executor/lib/human_readable.h
@@ -0,0 +1,58 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_HUMAN_READABLE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_HUMAN_READABLE_H_
+
+#include <assert.h>
+#include <limits>
+
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+class HumanReadableNumBytes {
+ public:
+ static string ToString(int64 num_bytes) {
+ if (num_bytes == std::numeric_limits<int64>::min()) {
+ // Special case for the smallest value, whose negation is not representable.
+ return "-8E";
+ }
+
+ const char* neg_str = GetNegStr(&num_bytes);
+
+ // Special case for bytes.
+ if (num_bytes < 1024LL) {
+ // No fractions for bytes.
+ return port::Printf("%s%lldB", neg_str, num_bytes);
+ }
+
+ static const char units[] = "KMGTPE"; // int64 only goes up to E.
+ const char* unit = units;
+ while (num_bytes >= (1024LL) * (1024LL)) {
+ num_bytes /= (1024LL);
+ ++unit;
+ assert(unit < units + sizeof(units));
+ }
+
+ return port::Printf(((*unit == 'K') ? "%s%.1f%c" : "%s%.2f%c"), neg_str,
+ num_bytes / 1024.0, *unit);
+ }
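+
+ // Illustrative expected outputs (editorial sketch, derived from the logic
+ // above, not part of the original change):
+ //   ToString(123)     == "123B"
+ //   ToString(1536)    == "1.5K"
+ //   ToString(1 << 20) == "1.00M"
+ //   ToString(-2048)   == "-2.0K"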
+
+ private:
+ template <typename T>
+ static const char* GetNegStr(T* value) {
+ if (*value < 0) {
+ *value = -(*value);
+ return "-";
+ } else {
+ return "";
+ }
+ }
+};
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_HUMAN_READABLE_H_
diff --git a/tensorflow/stream_executor/lib/initialize.h b/tensorflow/stream_executor/lib/initialize.h
new file mode 100644
index 0000000000..d1832d6b26
--- /dev/null
+++ b/tensorflow/stream_executor/lib/initialize.h
@@ -0,0 +1,35 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_INITIALIZE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_INITIALIZE_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE)
+#else
+
+#undef REGISTER_MODULE_INITIALIZER
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+class Initializer {
+ public:
+ typedef void (*InitializerFunc)();
+ explicit Initializer(InitializerFunc func) { func(); }
+};
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#define REGISTER_INITIALIZER(type, name, body) \
+ static void google_init_##type##_##name() { body; } \
+ perftools::gputools::port::Initializer google_initializer_##type##_##name( \
+ google_init_##type##_##name)
+
+#define REGISTER_MODULE_INITIALIZER(name, body) \
+ REGISTER_INITIALIZER(module, name, body)
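+
+// Illustrative usage (editorial sketch, not part of the original change):
+//
+//   REGISTER_MODULE_INITIALIZER(my_module, {
+//     VLOG(1) << "my_module initialized";
+//   });
+//
+// In this open-source branch the body runs when the Initializer object is
+// constructed, i.e. during static initialization of the translation unit.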
+
+#endif // !defined(PLATFORM_GOOGLE)
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_INITIALIZE_H_
diff --git a/tensorflow/stream_executor/lib/inlined_vector.h b/tensorflow/stream_executor/lib/inlined_vector.h
new file mode 100644
index 0000000000..e1f7a29904
--- /dev/null
+++ b/tensorflow/stream_executor/lib/inlined_vector.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_INLINED_VECTOR_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_INLINED_VECTOR_H_
+
+#include "tensorflow/core/lib/gtl/inlined_vector.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::gtl::InlinedVector;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_INLINED_VECTOR_H_
diff --git a/tensorflow/stream_executor/lib/mathutil.h b/tensorflow/stream_executor/lib/mathutil.h
new file mode 100644
index 0000000000..dd3d37a19c
--- /dev/null
+++ b/tensorflow/stream_executor/lib/mathutil.h
@@ -0,0 +1,88 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_MATHUTIL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_MATHUTIL_H_
+
+#include <algorithm>
+#include <cmath>
+#include <limits>
+#include <type_traits>
+#include <vector>
+
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+class MathUtil {
+ public:
+ template <typename IntegralType>
+ static IntegralType CeilOfRatio(IntegralType numerator,
+ IntegralType denominator) {
+ return CeilOrFloorOfRatio<IntegralType, true>(numerator, denominator);
+ }
+ template <typename IntegralType>
+ static IntegralType FloorOfRatio(IntegralType numerator,
+ IntegralType denominator) {
+ return CeilOrFloorOfRatio<IntegralType, false>(numerator, denominator);
+ }
+ template <typename IntegralType, bool ceil>
+ static IntegralType CeilOrFloorOfRatio(IntegralType numerator,
+ IntegralType denominator);
+};
+
+// ---- CeilOrFloorOfRatio ----
+// This is a branching-free, cast-to-double-free implementation.
+//
+// Casting to double is in general incorrect because of loss of precision
+// when casting an int64 into a double.
+//
+// There are many 'recipes' on the web for computing an integer ceil (or
+// floor), and most of them are incorrect.
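+//
+// Illustrative values (editorial sketch, not part of the original change):
+//   CeilOfRatio(5, 2)  == 3     FloorOfRatio(5, 2)  == 2
+//   CeilOfRatio(-5, 2) == -2    FloorOfRatio(-5, 2) == -3
+//   CeilOfRatio(6, 3)  == 2     FloorOfRatio(6, 3)  == 2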
+template<typename IntegralType, bool ceil>
+IntegralType MathUtil::CeilOrFloorOfRatio(IntegralType numerator,
+ IntegralType denominator) {
+ static_assert(std::is_integral<IntegralType>::value,
+ "CeilOfRatio_is_only_defined_for_integral_types");
+ assert(denominator != 0);
+ // Dividing the smallest signed integer by -1 is not supported: it would
+ // SIGFPE
+ assert(!std::is_signed<IntegralType>::value ||
+ numerator != std::numeric_limits<IntegralType>::min() ||
+ denominator != -1);
+
+ const IntegralType rounded_toward_zero = numerator / denominator;
+ const IntegralType intermediate_product = rounded_toward_zero * denominator;
+
+ if (ceil) { // Compile-time condition: not an actual branching
+ // When rounded_toward_zero is negative, an adjustment is never needed:
+ // the real ratio is negative, and so rounded toward zero is the ceil.
+ // When rounded_toward_zero is non-negative, an adjustment is needed if the
+ // sign of the difference numerator - intermediate_product is the same as
+ // the sign of the denominator.
+ //
+ // Using a bool and then a static_cast to IntegralType is not strictly
+ // necessary, but it makes the code clear, and anyway the compiler should
+ // get rid of it.
+ const bool needs_adjustment = (rounded_toward_zero >= 0) &&
+ ((denominator > 0 && numerator > intermediate_product) ||
+ (denominator < 0 && numerator < intermediate_product));
+ const IntegralType adjustment = static_cast<IntegralType>(needs_adjustment);
+ const IntegralType ceil_of_ratio = rounded_toward_zero + adjustment;
+ return ceil_of_ratio;
+ } else {
+ // Floor case: symmetrical to the previous one
+ const bool needs_adjustment = (rounded_toward_zero <= 0) &&
+ ((denominator > 0 && numerator < intermediate_product) ||
+ (denominator < 0 && numerator > intermediate_product));
+ const IntegralType adjustment = static_cast<IntegralType>(needs_adjustment);
+ const IntegralType floor_of_ratio = rounded_toward_zero - adjustment;
+ return floor_of_ratio;
+ }
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_MATHUTIL_H_
diff --git a/tensorflow/stream_executor/lib/notification.h b/tensorflow/stream_executor/lib/notification.h
new file mode 100644
index 0000000000..2baa458fc9
--- /dev/null
+++ b/tensorflow/stream_executor/lib/notification.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_NOTIFICATION_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_NOTIFICATION_H_
+
+#include "tensorflow/core/lib/core/notification.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::Notification;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_NOTIFICATION_H_
diff --git a/tensorflow/stream_executor/lib/numbers.cc b/tensorflow/stream_executor/lib/numbers.cc
new file mode 100644
index 0000000000..a9981b0ce6
--- /dev/null
+++ b/tensorflow/stream_executor/lib/numbers.cc
@@ -0,0 +1,27 @@
+#include "tensorflow/stream_executor/lib/numbers.h"
+
+#include <ctype.h>  // for isspace(), used below.
+#include <stdlib.h>
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+bool safe_strto32(const char* str, int32* value) {
+ char* endptr;
+ *value = strtol(str, &endptr, 10); // NOLINT
+ if (endptr != str) {
+ while (isspace(*endptr)) ++endptr;
+ }
+ return *str != '\0' && *endptr == '\0';
+}
+
+// Converts a string to a 32-bit integer value.
+// Leading and trailing spaces are allowed.
+// Returns false if the string is empty or has trailing non-space characters.
+bool safe_strto32(const string& str, int32* value) {
+ return port::safe_strto32(str.c_str(), value);
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/lib/numbers.h b/tensorflow/stream_executor/lib/numbers.h
new file mode 100644
index 0000000000..17b2893743
--- /dev/null
+++ b/tensorflow/stream_executor/lib/numbers.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_NUMBERS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_NUMBERS_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+// Converts a string to a 32-bit integer value.
+// Leading and trailing spaces are allowed.
+// Returns false if the string is empty or has trailing non-space characters.
+bool safe_strto32(const string& str, int32* value);
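+
+// Illustrative usage (editorial sketch, not part of the original change):
+//   int32 value;
+//   port::safe_strto32("123", &value);   // returns true,  value == 123
+//   port::safe_strto32(" 42 ", &value);  // returns true,  value == 42
+//   port::safe_strto32("12a", &value);   // returns false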
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_NUMBERS_H_
diff --git a/tensorflow/stream_executor/lib/path.cc b/tensorflow/stream_executor/lib/path.cc
new file mode 100644
index 0000000000..a6e76e99b7
--- /dev/null
+++ b/tensorflow/stream_executor/lib/path.cc
@@ -0,0 +1,50 @@
+#include "tensorflow/stream_executor/lib/path.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+
+using ::perftools::gputools::port::StringPiece;
+using ::perftools::gputools::port::StrAppend;
+
+namespace perftools {
+namespace gputools {
+namespace port {
+namespace internal {
+
+static bool IsAbsolutePath(port::StringPiece path) {
+ return !path.empty() && path[0] == '/';
+}
+
+// Joins all of the given paths together, ensuring that the proper path
+// separators are inserted between them.
+string JoinPathImpl(std::initializer_list<port::StringPiece> paths) {
+ string result;
+
+ for (port::StringPiece path : paths) {
+ if (path.empty()) continue;
+
+ if (result.empty()) {
+ result = path.ToString();
+ continue;
+ }
+
+ if (result[result.size() - 1] == '/') {
+ if (IsAbsolutePath(path)) {
+ StrAppend(&result, path.substr(1));
+ } else {
+ StrAppend(&result, path);
+ }
+ } else {
+ if (IsAbsolutePath(path)) {
+ StrAppend(&result, path);
+ } else {
+ StrAppend(&result, "/", path);
+ }
+ }
+ }
+
+ return result;
+}
+
+} // namespace internal
+} // namespace port
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/lib/path.h b/tensorflow/stream_executor/lib/path.h
new file mode 100644
index 0000000000..1d648e8de1
--- /dev/null
+++ b/tensorflow/stream_executor/lib/path.h
@@ -0,0 +1,44 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_PATH_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_PATH_H_
+
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+namespace internal {
+// TODO(rspringer): Move to cc/implementation file.
+// Not part of the public API.
+string JoinPathImpl(std::initializer_list<port::StringPiece> paths);
+} // namespace internal
+
+// Join multiple paths together.
+// JoinPath unconditionally joins all paths together. For example:
+//
+// Arguments | JoinPath
+// ---------------------------+---------------------
+// '/foo', 'bar' | /foo/bar
+// '/foo/', 'bar' | /foo/bar
+// '/foo', '/bar' | /foo/bar
+// '/foo', '/bar', '/baz' | /foo/bar/baz
+//
+// All paths will be treated as relative paths, regardless of whether or not
+// they start with a leading '/'. That is, all paths will be concatenated
+// together, with the appropriate path separator inserted in between.
+// Arguments must be convertible to port::StringPiece.
+//
+// Usage:
+// string path = file::JoinPath("/var/log", dirname, filename);
+// string path = file::JoinPath(FLAGS_test_srcdir, filename);
+template <typename... T>
+inline string JoinPath(const T&... args) {
+ return internal::JoinPathImpl({args...});
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_PATH_H_
diff --git a/tensorflow/stream_executor/lib/process_state.cc b/tensorflow/stream_executor/lib/process_state.cc
new file mode 100644
index 0000000000..c20493b263
--- /dev/null
+++ b/tensorflow/stream_executor/lib/process_state.cc
@@ -0,0 +1,37 @@
+#include "tensorflow/stream_executor/lib/process_state.h"
+
+#include <errno.h>  // for errno, checked in GetCurrentDirectory().
+#include <unistd.h>
+
+#include <memory>
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+string Hostname() {
+ char hostname[1024];
+ gethostname(hostname, sizeof hostname);
+ hostname[sizeof hostname - 1] = 0;
+ return hostname;
+}
+
+bool GetCurrentDirectory(string* dir) {
+ size_t len = 128;
+ std::unique_ptr<char[]> a(new char[len]);
+ for (;;) {
+ char* p = getcwd(a.get(), len);
+ if (p != NULL) {
+ *dir = p;
+ return true;
+ } else if (errno == ERANGE) {
+ len += len;
+ a.reset(new char[len]);
+ } else {
+ return false;
+ }
+ }
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/lib/process_state.h b/tensorflow/stream_executor/lib/process_state.h
new file mode 100644
index 0000000000..b75879499b
--- /dev/null
+++ b/tensorflow/stream_executor/lib/process_state.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_PROCESS_STATE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_PROCESS_STATE_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+string Hostname();
+bool GetCurrentDirectory(string* dir);
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_PROCESS_STATE_H_
diff --git a/tensorflow/stream_executor/lib/ptr_util.h b/tensorflow/stream_executor/lib/ptr_util.h
new file mode 100644
index 0000000000..d10d0bcb8c
--- /dev/null
+++ b/tensorflow/stream_executor/lib/ptr_util.h
@@ -0,0 +1,48 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_PTR_UTIL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_PTR_UTIL_H_
+
+#include <stddef.h>
+
+#include <memory>
+#include <type_traits>
+#include <utility>
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+// Trait to select overloads and return types for MakeUnique.
+template <typename T>
+struct MakeUniqueResult {
+ using scalar = std::unique_ptr<T>;
+};
+template <typename T>
+struct MakeUniqueResult<T[]> {
+ using array = std::unique_ptr<T[]>;
+};
+template <typename T, size_t N>
+struct MakeUniqueResult<T[N]> {
+ using invalid = void;
+};
+
+// MakeUnique<T>(...) is an early implementation of C++14 std::make_unique.
+// It is designed to be 100% compatible with std::make_unique so that the
+// eventual switchover will be a simple renaming operation.
+template <typename T, typename... Args>
+typename MakeUniqueResult<T>::scalar MakeUnique(Args&&... args) { // NOLINT
+ return std::unique_ptr<T>(
+ new T(std::forward<Args>(args)...)); // NOLINT(build/c++11)
+}
+
+// Overload for array of unknown bound.
+// The allocation of arrays needs to use the array form of new,
+// and cannot take element constructor arguments.
+template <typename T>
+typename MakeUniqueResult<T>::array MakeUnique(size_t n) {
+ return std::unique_ptr<T>(new typename std::remove_extent<T>::type[n]());
+}
+
+// Reject arrays of known bound.
+template <typename T, typename... Args>
+typename MakeUniqueResult<T>::invalid MakeUnique(Args&&... /* args */) =
+ delete; // NOLINT
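+
+// Illustrative usage (editorial sketch, not part of the original change;
+// Widget stands in for any type constructible from an int):
+//
+//   auto widget = port::MakeUnique<Widget>(42);     // std::unique_ptr<Widget>
+//   auto buffer = port::MakeUnique<float[]>(1024);  // std::unique_ptr<float[]>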
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_PTR_UTIL_H_
diff --git a/tensorflow/stream_executor/lib/stacktrace.h b/tensorflow/stream_executor/lib/stacktrace.h
new file mode 100644
index 0000000000..e7d478efe3
--- /dev/null
+++ b/tensorflow/stream_executor/lib/stacktrace.h
@@ -0,0 +1,18 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STACKTRACE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STACKTRACE_H_
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+#if !defined(PLATFORM_GOOGLE)
+inline string CurrentStackTrace() { return "No stack trace available"; }
+#endif
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STACKTRACE_H_
diff --git a/tensorflow/stream_executor/lib/static_threadlocal.h b/tensorflow/stream_executor/lib/static_threadlocal.h
new file mode 100644
index 0000000000..9227b2cf0d
--- /dev/null
+++ b/tensorflow/stream_executor/lib/static_threadlocal.h
@@ -0,0 +1,30 @@
+// Copyright 2006 Google Inc.
+// All rights reserved.
+// Author: Yaz Saito (saito@google.com)
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STATIC_THREADLOCAL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STATIC_THREADLOCAL_H_
+
+#include "tensorflow/stream_executor/platform/port.h"  // SE_DISALLOW_COPY_AND_ASSIGN
+
+// For POD types in TLS mode, s_obj_VAR is the thread-local variable.
+#define SE_STATIC_THREAD_LOCAL_POD(_Type_, _var_) \
+ static thread_local _Type_ s_obj_##_var_; \
+ namespace { \
+ class ThreadLocal_##_var_ { \
+ public: \
+ ThreadLocal_##_var_() {} \
+ void Init() {} \
+ inline _Type_ *pointer() const { \
+ return &s_obj_##_var_; \
+ } \
+ inline _Type_ *safe_pointer() const { \
+ return &s_obj_##_var_; \
+ } \
+ _Type_ &get() const { \
+ return s_obj_##_var_; \
+ } \
+ bool is_native_tls() const { return true; } \
+ private: \
+ SE_DISALLOW_COPY_AND_ASSIGN(ThreadLocal_##_var_); \
+ } _var_; \
+ }
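+
+// Illustrative usage (editorial sketch, not part of the original change).
+// The macro must be used at namespace scope, since it opens an anonymous
+// namespace:
+//
+//   SE_STATIC_THREAD_LOCAL_POD(int, counter);
+//
+//   void Increment() {
+//     *counter.pointer() += 1;  // Each thread sees its own copy.
+//   }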
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STATIC_THREADLOCAL_H_
diff --git a/tensorflow/stream_executor/lib/status.h b/tensorflow/stream_executor/lib/status.h
new file mode 100644
index 0000000000..b3ad13b0ae
--- /dev/null
+++ b/tensorflow/stream_executor/lib/status.h
@@ -0,0 +1,23 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_H_
+
+#include "tensorflow/core/public/status.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::Status;
+
+#define SE_CHECK_OK(val) \
+ CHECK_EQ(::perftools::gputools::port::Status::OK(), (val))
+#define SE_ASSERT_OK(val) \
+ ASSERT_EQ(::perftools::gputools::port::Status::OK(), (val))
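+
+// Illustrative usage (editorial sketch, not part of the original change),
+// assuming a function port::Status DoWork() exists:
+//
+//   SE_CHECK_OK(DoWork());  // CHECK-fails (aborts) if DoWork() is not OK.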
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_H_
diff --git a/tensorflow/stream_executor/lib/status_macros.h b/tensorflow/stream_executor/lib/status_macros.h
new file mode 100644
index 0000000000..7e1de92a98
--- /dev/null
+++ b/tensorflow/stream_executor/lib/status_macros.h
@@ -0,0 +1,54 @@
+// Helper macros for dealing with the port::Status datatype.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_MACROS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_MACROS_H_
+
+// Early-returns the status if it is in error; otherwise, proceeds.
+//
+// The argument expression is guaranteed to be evaluated exactly once.
+#define SE_RETURN_IF_ERROR(__status) \
+ do { \
+ auto status = __status; \
+ if (!status.ok()) { \
+ return status; \
+ } \
+ } while (false)
+
+// Identifier concatenation helper macros.
+#define SE_MACRO_CONCAT_INNER(__x, __y) __x##__y
+#define SE_MACRO_CONCAT(__x, __y) SE_MACRO_CONCAT_INNER(__x, __y)
+
+// Implementation of SE_ASSIGN_OR_RETURN that uses a unique temporary
+// identifier to avoid name collisions in the enclosing scope.
+#define SE_ASSIGN_OR_RETURN_IMPL(__lhs, __rhs, __name) \
+ auto __name = (__rhs); \
+ if (!__name.ok()) { \
+ return __name.status(); \
+ } \
+ __lhs = __name.ConsumeValueOrDie();
+
+// Early-returns the status if it is in error; otherwise, assigns the
+// right-hand-side expression to the left-hand-side expression.
+//
+// The right-hand-side expression is guaranteed to be evaluated exactly once.
+#define SE_ASSIGN_OR_RETURN(__lhs, __rhs) \
+ SE_ASSIGN_OR_RETURN_IMPL(__lhs, __rhs, \
+ SE_MACRO_CONCAT(__status_or_value, __COUNTER__))
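+
+// Illustrative usage of SE_ASSIGN_OR_RETURN (editorial sketch, not part of
+// the original change), assuming a function
+// port::StatusOr<int> CountDevices() exists:
+//
+//   port::Status Run() {
+//     SE_ASSIGN_OR_RETURN(int device_count, CountDevices());
+//     LOG(INFO) << "found " << device_count << " devices";
+//     return port::Status::OK();
+//   }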
+
+// Logs the status and returns false if it is in error; otherwise, returns true.
+//
+// The argument expression is guaranteed to be evaluated exactly once.
+//
+// TODO(leary) remove as many of these as possible with port::Status
+// proliferation.
+#define SE_RETURN_STATUS_AS_BOOL(__status) \
+ do { \
+ auto status = __status; \
+ if (status.ok()) { \
+ return true; \
+ } \
+ LOG(ERROR) << status; \
+ return false; \
+ } while (false)
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STATUS_MACROS_H_
diff --git a/tensorflow/stream_executor/lib/statusor.h b/tensorflow/stream_executor/lib/statusor.h
new file mode 100644
index 0000000000..38ce35e46e
--- /dev/null
+++ b/tensorflow/stream_executor/lib/statusor.h
@@ -0,0 +1,234 @@
+// Copyright 2008 Google Inc. All Rights Reserved.
+// Author: acm@google.com (Andrew Morrow)
+// Author: zhengxq@google.com (Xiaoqiang Zheng)
+//
+// StatusOr<T> is the union of a Status object and a T
+// object. StatusOr models the concept of an object that is either a
+// usable value, or an error Status explaining why such a value is
+// not present. To this end, StatusOr<T> does not allow its Status
+// value to be Status::OK. Further, StatusOr<T*> does not allow the
+// contained pointer to be NULL.
+//
+// The primary use-case for StatusOr<T> is as the return value of a
+// function which may fail.
+//
+// Example client usage for a StatusOr<T>, where T is not a pointer:
+//
+// StatusOr<float> result = DoBigCalculationThatCouldFail();
+// if (result.ok()) {
+// float answer = result.ValueOrDie();
+// printf("Big calculation yielded: %f", answer);
+// } else {
+// LOG(ERROR) << result.status();
+// }
+//
+// Example client usage for a StatusOr<T*>:
+//
+// StatusOr<Foo*> result = FooFactory::MakeNewFoo(arg);
+// if (result.ok()) {
+// std::unique_ptr<Foo> foo(result.ValueOrDie());
+// foo->DoSomethingCool();
+// } else {
+// LOG(ERROR) << result.status();
+// }
+//
+// Example client usage for a StatusOr<std::unique_ptr<T>>:
+//
+// StatusOr<std::unique_ptr<Foo>> result = FooFactory::MakeNewFoo(arg);
+// if (result.ok()) {
+// std::unique_ptr<Foo> foo = result.ConsumeValueOrDie();
+// foo->DoSomethingCool();
+// } else {
+// LOG(ERROR) << result.status();
+// }
+//
+// Example factory implementation returning StatusOr<T*>:
+//
+// StatusOr<Foo*> FooFactory::MakeNewFoo(int arg) {
+// if (arg <= 0) {
+// return Status(port::error::INVALID_ARGUMENT,
+// "Arg must be positive");
+// } else {
+// return new Foo(arg);
+// }
+// }
+//
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STATUSOR_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STATUSOR_H_
+
+#include <new>
+#include "tensorflow/stream_executor/platform/port.h"
+#include <type_traits>
+#include <utility>
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+template<typename T>
+class StatusOr {
+ template<typename U> friend class StatusOr;
+
+ public:
+ // Construct a new StatusOr with Status::UNKNOWN status
+ StatusOr() : status_(error::UNKNOWN, "") {}
+
+ // Construct a new StatusOr with the given non-ok status. After calling
+ // this constructor, calls to ValueOrDie() are invalid.
+ //
+ // NOTE: Not explicit - we want to use StatusOr<T> as a return
+ // value, so it is convenient and sensible to be able to do 'return
+ // Status()' when the return type is StatusOr<T>.
+ //
+ // REQUIRES: status != Status::OK.
+ // In optimized builds, passing Status::OK here will have the effect
+ // of falling back to an error status with code INTERNAL.
+ StatusOr(const Status& status); // NOLINT
+
+ // Construct a new StatusOr with the given value. If T is a plain pointer,
+ // value must not be NULL. After calling this constructor, calls to
+ // ValueOrDie() will succeed, and calls to status() will return OK.
+ //
+ // NOTE: Not explicit - we want to use StatusOr<T> as a return type
+ // so it is convenient and sensible to be able to do 'return T()'
+ // when the return type is StatusOr<T>.
+ //
+ // REQUIRES: if T is a plain pointer, value != NULL.
+ // In optimized builds, passing a NULL pointer here will have
+ // the effect of falling back to an error status with code INTERNAL.
+ StatusOr(const T& value); // NOLINT
+
+ // Conversion copy constructor, T must be copy constructible from U
+ template <typename U>
+ StatusOr(const StatusOr<U>& other) // NOLINT
+ : status_(other.status_),
+ value_(other.value_) {}
+
+ // Conversion assignment operator, T must be assignable from U
+ template <typename U>
+ StatusOr& operator=(const StatusOr<U>& other) {
+ status_ = other.status_;
+ value_ = other.value_;
+ return *this;
+ }
+
+ // Rvalue-reference overloads of the other constructors and assignment
+ // operators, to support move-only types and avoid unnecessary copying.
+ StatusOr(T&& value); // NOLINT
+
+ // Move conversion constructor to avoid an unnecessary copy.
+ // T must be assignable from U.
+ // Not marked with explicit so the implicit conversion can happen.
+ template <typename U>
+ StatusOr(StatusOr<U>&& other) // NOLINT
+ : status_(std::move(other.status_)),
+ value_(std::move(other.value_)) {}
+
+ // Move assignment operator to avoid an unnecessary copy.
+ // T must be assignable from U.
+ template <typename U>
+ StatusOr& operator=(StatusOr<U>&& other) {
+ status_ = std::move(other.status_);
+ value_ = std::move(other.value_);
+ return *this;
+ }
+
+ // Returns a reference to our status. If this contains a T, then
+ // returns Status::OK.
+ const Status& status() const { return status_; }
+
+ // Returns this->status().ok()
+ bool ok() const { return status_.ok(); }
+
+ // Returns a reference to our current value, requires that this->ok().
+ // If you need to initialize a T object from the stored value,
+ // ConsumeValueOrDie() may be more efficient.
+ const T& ValueOrDie() const;
+
+ // Returns our current value, requires this->ok(). Use this if
+ // you would otherwise want to say std::move(s.ValueOrDie()), for example
+ // if you need to initialize a T object from the stored value and you don't
+ // need subsequent access to the stored value. It uses T's move constructor,
+ // if it has one, so it will work with move-only types, and will often be
+ // more efficient than ValueOrDie, but may leave the stored value
+ // in an arbitrary valid state.
+ T ConsumeValueOrDie();
+
+ private:
+ Status status_;
+ T value_;
+
+ void CheckValueNotNull(const T& value);
+
+ template <typename U>
+ struct IsNull {
+ // For non-pointer U, a reference can never be NULL.
+ static inline bool IsValueNull(const U& t) { return false; }
+ };
+
+ template <typename U>
+ struct IsNull<U*> {
+ static inline bool IsValueNull(const U* t) { return t == NULL; }
+ };
+};
+
+////////////////////////////////////////////////////////////////////////////////
+// Implementation details for StatusOr<T>
+
+template <typename T>
+StatusOr<T>::StatusOr(const T& value)
+ : status_(), value_(value) {
+ CheckValueNotNull(value);
+}
+
+template <typename T>
+const T& StatusOr<T>::ValueOrDie() const {
+ assert(status_.ok());
+ return value_;
+}
+
+template <typename T>
+T StatusOr<T>::ConsumeValueOrDie() {
+ assert(status_.ok());
+ return std::move(value_);
+}
+
+template <typename T>
+StatusOr<T>::StatusOr(const Status& status)
+ : status_(status) {
+ assert(!status.ok());
+ if (status.ok()) {
+ status_ =
+ Status(error::INTERNAL,
+ "Status::OK is not a valid constructor argument to StatusOr<T>");
+ }
+}
+
+template <typename T>
+StatusOr<T>::StatusOr(T&& value)
+ : status_() {
+ CheckValueNotNull(value);
+ value_ = std::move(value);
+}
+
+template <typename T>
+void StatusOr<T>::CheckValueNotNull(const T& value) {
+ assert(!IsNull<T>::IsValueNull(value));
+ if (IsNull<T>::IsValueNull(value)) {
+ status_ =
+ Status(error::INTERNAL,
+ "NULL is not a valid constructor argument to StatusOr<T*>");
+ }
+}
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STATUSOR_H_
diff --git a/tensorflow/stream_executor/lib/str_util.h b/tensorflow/stream_executor/lib/str_util.h
new file mode 100644
index 0000000000..021f54dfec
--- /dev/null
+++ b/tensorflow/stream_executor/lib/str_util.h
@@ -0,0 +1,30 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STR_UTIL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STR_UTIL_H_
+
+#include "tensorflow/core/lib/strings/str_util.h"
+#include "tensorflow/stream_executor/lib/stringpiece.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::str_util::Join;
+using tensorflow::str_util::Split;
+
+// Returns a copy of the input string 'str' with the given 'suffix'
+// removed. If the suffix doesn't match, returns a copy of the original string.
+inline string StripSuffixString(port::StringPiece str, port::StringPiece suffix) {
+ if (str.ends_with(suffix)) {
+ str.remove_suffix(suffix.size());
+ }
+ return str.ToString();
+}
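+
+// Illustrative values (editorial sketch, not part of the original change):
+//   StripSuffixString("libfoo.so", ".so") == "libfoo"
+//   StripSuffixString("libfoo.so", ".a")  == "libfoo.so"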
+
+using tensorflow::str_util::Lowercase;
+using tensorflow::str_util::Uppercase;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STR_UTIL_H_
diff --git a/tensorflow/stream_executor/lib/strcat.h b/tensorflow/stream_executor/lib/strcat.h
new file mode 100644
index 0000000000..b3fe4da327
--- /dev/null
+++ b/tensorflow/stream_executor/lib/strcat.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STRCAT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STRCAT_H_
+
+#include "tensorflow/core/lib/strings/strcat.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::strings::StrCat;
+using tensorflow::strings::StrAppend;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STRCAT_H_
diff --git a/tensorflow/stream_executor/lib/stringpiece.h b/tensorflow/stream_executor/lib/stringpiece.h
new file mode 100644
index 0000000000..14e6fc99d7
--- /dev/null
+++ b/tensorflow/stream_executor/lib/stringpiece.h
@@ -0,0 +1,17 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPIECE_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPIECE_H_
+
+#include "tensorflow/core/lib/core/stringpiece.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::StringPiece;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPIECE_H_
diff --git a/tensorflow/stream_executor/lib/stringprintf.h b/tensorflow/stream_executor/lib/stringprintf.h
new file mode 100644
index 0000000000..379e7e9a83
--- /dev/null
+++ b/tensorflow/stream_executor/lib/stringprintf.h
@@ -0,0 +1,18 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPRINTF_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPRINTF_H_
+
+#include "tensorflow/core/lib/strings/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::strings::Printf;
+using tensorflow::strings::Appendf;
+using tensorflow::strings::Appendv;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_STRINGPRINTF_H_
diff --git a/tensorflow/stream_executor/lib/thread_options.h b/tensorflow/stream_executor/lib/thread_options.h
new file mode 100644
index 0000000000..7d436578d6
--- /dev/null
+++ b/tensorflow/stream_executor/lib/thread_options.h
@@ -0,0 +1,16 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_THREAD_OPTIONS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_THREAD_OPTIONS_H_
+
+#include "tensorflow/core/public/env.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::ThreadOptions;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_THREAD_OPTIONS_H_
diff --git a/tensorflow/stream_executor/lib/threadpool.h b/tensorflow/stream_executor/lib/threadpool.h
new file mode 100644
index 0000000000..3cf297d57b
--- /dev/null
+++ b/tensorflow/stream_executor/lib/threadpool.h
@@ -0,0 +1,19 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_LIB_THREADPOOL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_LIB_THREADPOOL_H_
+
+#include "tensorflow/core/lib/core/threadpool.h"
+#include "tensorflow/stream_executor/lib/env.h"
+#include "tensorflow/stream_executor/lib/notification.h"
+#include "tensorflow/stream_executor/lib/thread_options.h"
+
+namespace perftools {
+namespace gputools {
+namespace port {
+
+using tensorflow::thread::ThreadPool;
+
+} // namespace port
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_LIB_THREADPOOL_H_
diff --git a/tensorflow/stream_executor/machine_manager.cc b/tensorflow/stream_executor/machine_manager.cc
new file mode 100644
index 0000000000..6d7bc50379
--- /dev/null
+++ b/tensorflow/stream_executor/machine_manager.cc
@@ -0,0 +1,276 @@
+#include "tensorflow/stream_executor/machine_manager.h"
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/dso_loader.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+mutex MachineManager::mu_{LINKER_INITIALIZED};
+
+MachineManager *MachineManager::singleton_ = nullptr;
+
+PlatformKind MachineManager::DetectPreferredPlatform() {
+// TODO(leary) for KNC card experiments, figure out a legitimate way to
+// determine this. For now, we use a compile-time hint so we can compile tests
+// for both.
+#if defined TENSORFLOW_STREAM_EXECUTOR_MACHINE_MANAGER_PREFER_OPENCL
+ return PlatformKind::kOpenCL;
+#elif defined TENSORFLOW_STREAM_EXECUTOR_MACHINE_MANAGER_PREFER_HOST
+ return PlatformKind::kHost;
+#else
+ return PlatformKind::kCuda;
+#endif
+}
+
+/* static */ port::StatusOr<std::unique_ptr<MachineManager>>
+MachineManager::Create(PlatformKind kind, DeviceOptions options,
+ const PluginConfig &config) {
+ std::unique_ptr<MachineManager> machine_manager{
+ new MachineManager{kind, options, config}};
+ auto init_status = machine_manager->Init();
+ if (!init_status.ok()) {
+ return init_status;
+ }
+
+ return std::move(machine_manager);
+}
+
+MachineManager::MachineManager(PlatformKind platform,
+ DeviceOptions device_options,
+ const PluginConfig &config)
+ : platform_(platform),
+ device_options_(device_options),
+ plugin_config_(config),
+ min_numa_node_(0),
+ limit_numa_node_(0) {}
+
+port::Status MachineManager::Init() {
+ // Initialize the first StreamExecutor, then use that platform interface to
+ // grab the device count.
+ executors_.resize(1);
+ executors_[0].reset(new StreamExecutor{platform_, plugin_config_});
+ auto status = executors_[0]->Init(0 /* = device_ordinal */, device_options_);
+ if (!status.ok()) {
+ return port::Status{
+ port::error::FAILED_PRECONDITION,
+ port::StrCat(
+ "failed to initialize StreamExecutor for device ordinal 0: ",
+ status.ToString())};
+ }
+ int device_count = executors_[0]->PlatformDeviceCount();
+ if (device_count == 0) {
+ LOG(WARNING) << "no devices found for platform "
+ << PlatformKindString(platform_);
+ min_numa_node_ = limit_numa_node_ = 0;
+ return port::Status::OK();
+ }
+
+ streams_.resize(device_count);
+ streams_[0].reset(new Stream(executors_[0].get()));
+ if (!streams_[0]->Init().ok()) {
+ return port::Status{
+ port::error::FAILED_PRECONDITION,
+ "failed to initialize default stream for device ordinal 0"};
+ }
+
+ min_numa_node_ = executors_[0]->GetDeviceDescription().numa_node();
+ limit_numa_node_ = min_numa_node_ + 1;
+
+ executors_.resize(device_count);
+ for (int device_ordinal = 1; device_ordinal < device_count;
+ ++device_ordinal) {
+ StreamExecutor *stream_exec = new StreamExecutor{platform_, plugin_config_};
+ executors_[device_ordinal].reset(stream_exec);
+ auto status = stream_exec->Init(device_ordinal, device_options_);
+ if (!status.ok()) {
+ return port::Status(
+ port::error::FAILED_PRECONDITION,
+ port::StrCat(
+ "failed to initialize StreamExecutor for device ordinal ",
+ device_ordinal, ": ", status.ToString()));
+ }
+
+ min_numa_node_ = std::min(min_numa_node_,
+ stream_exec->GetDeviceDescription().numa_node());
+ limit_numa_node_ = std::max(
+ limit_numa_node_, stream_exec->GetDeviceDescription().numa_node() + 1);
+
+ if (!stream_exec->GetDeviceDescription().ecc_enabled()) {
+ LOG(WARNING) << "ECC not enabled for device ordinal: " << device_ordinal;
+ }
+
+ streams_[device_ordinal].reset(
+ new Stream(executors_[device_ordinal].get()));
+ if (!streams_[device_ordinal]->Init().ok()) {
+ return port::Status(
+ port::error::FAILED_PRECONDITION,
+ port::StrCat(
+ "failed to initialize default stream for device ordinal ",
+ device_ordinal));
+ }
+ }
+
+ return port::Status::OK();
+}
+
+int MachineManager::device_count() const { return executors_.size(); }
+
+port::Status MachineManager::EnablePeerAccess() {
+ auto peer_access_map = GetPeerAccessMap();
+ for (const auto &access : *peer_access_map) {
+ auto devices = access.first;
+ if (access.second) {
+ StreamExecutor *from = executors_[devices.first].get();
+ StreamExecutor *to = executors_[devices.second].get();
+ auto status = from->EnablePeerAccessTo(to);
+ if (!status.ok()) {
+ return status;
+ }
+ } else {
+ LOG(INFO) << "cannot enable peer access from device ordinal "
+ << devices.first << " to device ordinal " << devices.second;
+ }
+ }
+ return port::Status::OK();
+}
+
+std::unique_ptr<std::map<std::pair<int, int>, bool>>
+MachineManager::GetPeerAccessMap() {
+ auto *map = new std::map<std::pair<int, int>, bool>;
+ for (int i = 0; i < device_count(); ++i) {
+ for (int j = 0; j < device_count(); ++j) {
+ StreamExecutor *from = executors_[i].get();
+ StreamExecutor *to = executors_[j].get();
+ (*map)[{i, j}] = from->CanEnablePeerAccessTo(to);
+ }
+ }
+
+ return std::unique_ptr<std::map<std::pair<int, int>, bool>>{map};
+}
+
+StreamExecutor *MachineManager::executor_for_device(int device_ordinal) const {
+ CHECK_GE(device_ordinal, 0) << "device ordinal must be non-negative";
+ CHECK(0 <= device_ordinal && device_ordinal < device_count())
+ << "device " << device_ordinal << " out of range with device count "
+ << device_count();
+ StreamExecutor *executor = executors_[device_ordinal].get();
+ CHECK(executor != nullptr);
+ return executor;
+}
+
+int MachineManager::ExecutorToBus(const StreamExecutor *stream_exec) const {
+ return stream_exec->GetDeviceDescription().numa_node() - min_numa_node_;
+}
+
+int MachineManager::DeviceToBus(int device_ordinal) const {
+ return ExecutorToBus(executor_for_device(device_ordinal));
+}
+
+int MachineManager::ExecutorToNumaNode(
+ const StreamExecutor *stream_exec) const {
+ return stream_exec->GetDeviceDescription().numa_node();
+}
+
+int MachineManager::DeviceToNumaNode(int device_ordinal) const {
+ return ExecutorToNumaNode(executor_for_device(device_ordinal));
+}
+
+StreamExecutor *MachineManager::first_executor_for_bus(int bus_ordinal) {
+ CHECK_LT(bus_ordinal, bus_count()) << "bus ordinal out of available range";
+ for (auto &executor : executors_) {
+ if (ExecutorToBus(executor.get()) == bus_ordinal) {
+ return executor.get();
+ }
+ }
+
+ LOG(WARNING) << "could not find executor requested for bus ordinal: "
+ << bus_ordinal;
+ return nullptr;
+}
+
+StreamExecutor *MachineManager::first_executor_for_numa_node(int numa_node) {
+ for (auto &executor : executors_) {
+ if (ExecutorToNumaNode(executor.get()) == numa_node) {
+ return executor.get();
+ }
+ }
+
+ LOG(WARNING) << "could not find executor requested for numa_node: "
+ << numa_node;
+ return nullptr;
+}
+
+Stream *MachineManager::stream_for_device(int device_ordinal) {
+ CHECK(0 <= device_ordinal && device_ordinal < device_count());
+ Stream *stream = streams_[device_ordinal].get();
+ CHECK(stream != nullptr);
+ return stream;
+}
+
+/* static */ port::StatusOr<MachineManager *>
+MachineManager::CreateSingletonInternal(PlatformKind platform,
+ DeviceOptions options,
+ const PluginConfig &config) {
+ if (singleton_ != nullptr) {
+ return port::Status{
+ port::error::ALREADY_EXISTS,
+ "cannot create machine manager singleton; one already exists"};
+ }
+
+ auto create_status = Create(platform, options, config);
+ if (!create_status.ok()) {
+ return create_status.status();
+ }
+
+ singleton_ = create_status.ConsumeValueOrDie().release();
+
+ VLOG(1) << "machine manager singleton is " << singleton_ << " with platform "
+ << PlatformKindString(platform) << " and device options "
+ << options.ToString();
+
+ return singleton_;
+}
+
+/* static */ MachineManager *MachineManager::CreateSingletonOrDie(
+ PlatformKind platform, DeviceOptions options, const PluginConfig &config) {
+ auto status = CreateSingleton(platform, options, config);
+ if (!status.ok()) {
+ LOG(FATAL) << "failed to create MachineManager singleton: "
+ << status.status();
+ }
+ return status.ValueOrDie();
+}
+
+/* static */ port::StatusOr<MachineManager *> MachineManager::CreateSingleton(
+ PlatformKind platform, DeviceOptions device_options,
+ const PluginConfig &config) {
+ mutex_lock lock{mu_};
+ return CreateSingletonInternal(platform, device_options, config);
+}
+
+/* static */ MachineManager *MachineManager::singleton() {
+ mutex_lock lock{mu_};
+ if (singleton_ == nullptr) {
+ PlatformKind platform = DetectPreferredPlatform();
+ DeviceOptions options = DeviceOptions::Default();
+ auto status = CreateSingletonInternal(platform, options, PluginConfig());
+ if (!status.ok()) {
+ LOG(FATAL)
+ << "failed to create MachineManager singleton: "
+ "singleton accessor attempted lazy construction but failed: "
+ << status.status();
+ }
+ return status.ValueOrDie();
+ }
+
+ return singleton_;
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/machine_manager.h b/tensorflow/stream_executor/machine_manager.h
new file mode 100644
index 0000000000..bcff7a9da0
--- /dev/null
+++ b/tensorflow/stream_executor/machine_manager.h
@@ -0,0 +1,197 @@
+// This interface provides a machine-wide resource management singleton
+// interface as a convenience for users who will want to exploit all of the GPU
+// resources present on the system.
+//
+// To use the singleton interface:
+//
+// // At start of program or in your module initializer.
+// // Do not call this with different sets of arguments!
+// MachineManager::CreateSingletonOrDie(
+// MachineManager::DetectPreferredPlatform(), DeviceOptions::Default());
+//
+// // At any point after that, this convenience interface avoids you having to
+// // pass those two parameters:
+// StreamExecutor *device0_executor =
+// MachineManager::singleton()->executor_for_device(0 /* = ordinal */);
+// ...
+
+// ----------------- THIS CLASS IS DEPRECATED - DO NOT USE ------------------
+// This class is not suitable for open-sourcing, as it does not support
+// plugins and depends on hardcoded PlatformKind enums. MultiPlatformManager and
+// Platform plugins are the replacements.
+// ----------------- THIS CLASS IS DEPRECATED - DO NOT USE ------------------
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_MACHINE_MANAGER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_MACHINE_MANAGER_H_
+
+#include <map>
+#include <memory>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/stream_executor/device_options.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+
+namespace perftools {
+namespace gputools {
+
+// MachineManager is used to instantiate and manage singleton resources for
+// all the GPUs present on a machine. This basically amounts to having a
+// StreamExecutor-per-device pool.
+//
+// Thread-safe.
+class MachineManager {
+ public:
+ // Inspects the host to determine the preferred GPU execution platform.
+ // To force OpenCL from a build target on a machine that has both OpenCL and
+ // CUDA capabilities, link against the :stream_executor_prefer_opencl target.
+ static PlatformKind DetectPreferredPlatform();
+
+ // Returns the machine manager singleton.
+ // If the singleton has not yet been created when this is invoked, this
+ // creates it with reasonable default options, otherwise it returns the
+ // already-created singleton. If there are errors during creation, this call
+ // will terminate the program.
+ static MachineManager *singleton();
+
+ // Returns a singleton instance of the machine manager -- it's generally
+ // assumed that users will have one of these for a real-world application as a
+ // form of resource manager.
+ //
+ // This should only be called once, at the initialization of an application,
+ // if at all -- MachineManager::singleton() will return a value with sensible
+ // defaults as determined by DetectPreferredPlatform. Attempts to create the
+ // singleton with options multiple times will result in an error.
+ static port::StatusOr<MachineManager *> CreateSingleton(
+ PlatformKind platform, DeviceOptions device_options,
+ const PluginConfig &config = PluginConfig());
+
+ // Convenience "or die" wrapper around the above call.
+ static MachineManager *CreateSingletonOrDie(
+ PlatformKind platform, DeviceOptions device_options,
+ const PluginConfig &config = PluginConfig());
+
+ // Creates a new instantiation of the MachineManager.
+ // Warning: generally users will want to use the singleton form, see
+ // MachineManager::singleton().
+ //
+ // The set of devices that the machine manager detects at creation does not
+ // change over the course of its lifetime. Hot-plugging of GPUs, or GPUs
+ // dropping off the bus in a recoverable manner, is not supported.
+ static port::StatusOr<std::unique_ptr<MachineManager>> Create(
+ PlatformKind kind, DeviceOptions options,
+ const PluginConfig &config = PluginConfig());
+
+ // Returns the number of devices visible to the machine manager.
+ int device_count() const;
+
+ // Returns the StreamExecutor for one of the machine-manager visible devices.
+ // Checks that device_ordinal is within device_count() bound.
+ StreamExecutor *executor_for_device(int device_ordinal) const;
+
+ // Returns the bus ordinal count (as determined by the span of NUMA nodes
+ // associated with the available devices).
+ int bus_count() const { return limit_numa_node_ - min_numa_node_; }
+
+ // Returns the bus ordinal associated with a given device ordinal.
+ int DeviceToBus(int device_ordinal) const;
+
+ // Returns the NUMA node associated with a given device ordinal.
+ int DeviceToNumaNode(int device_ordinal) const;
+
+ // Returns the first StreamExecutor (within device_count() ordinals) that
+ // has the corresponding bus ordinal, or nullptr if none is found.
+ //
+ // The valid bus ordinals can be enumerated by scanning through the executors
+ // and seeing what bus number they are on.
+ StreamExecutor *first_executor_for_bus(int bus_ordinal);
+
+ // Returns the first StreamExecutor associated with the specified
+ // numa_node, or nullptr if none is found.
+ StreamExecutor *first_executor_for_numa_node(int numa_node);
+
+ // Returns the default stream for the default executor (that returned by
+ // executor_for_device()). The same stream will be returned for all calls to
+ // stream_for_device() (with the same device_ordinal).
+ Stream *stream_for_device(int device_ordinal);
+
+ // Returns the platform that this machine manager was created to target.
+ PlatformKind platform() const { return platform_; }
+
+ // Enables peer access between all possible devices on this platform.
+ // Returns an error only if enabling peer access fails for a device pair
+ // that GetPeerAccessMap() reports as capable.
+ port::Status EnablePeerAccess();
+
+ // Returns a map that says, for pairs (device ordinal i, device ordinal j),
+ // whether i can access j's memory space.
+ std::unique_ptr<std::map<std::pair<int, int>, bool>> GetPeerAccessMap();
+
+ private:
+ // Guts of the singleton creation mechanism that requires the exclusive
+ // singleton lock to be held, in order to prevent deadlock due to method
+ // composition.
+ static port::StatusOr<MachineManager *> CreateSingletonInternal(
+ PlatformKind platform, DeviceOptions options, const PluginConfig &config)
+ EXCLUSIVE_LOCKS_REQUIRED(mu_);
+
+ // Private constructor used in singleton creation.
+ MachineManager(PlatformKind platform, DeviceOptions options,
+ const PluginConfig &config);
+
+ // Populates the executors_ vector with an executor per observable device
+ // ordinal on the platform. Logs and returns an error status if any of the
+ // StreamExecutors cannot be created.
+ port::Status Init();
+
+ // Converts a StreamExecutor's NUMA node association into a bus ordinal for
+ // this machine.
+ int ExecutorToBus(const StreamExecutor *stream_exec) const;
+
+ // Returns the NUMA node association for the StreamExecutor.
+ int ExecutorToNumaNode(const StreamExecutor *stream_exec) const;
+
+ // Mutex that guards the initialization of the machine manager static
+ // variable.
+ static mutex mu_;
+
+ // Singleton MachineManager value -- assignment to this is protected by a
+ // static singleton guard clause.
+ static MachineManager *singleton_ GUARDED_BY(mu_);
+
+ // Holds an executor associated with each device ordinal present in the
+ // system, which are the indices. Immutable after initialization.
+ std::vector<std::unique_ptr<StreamExecutor>> executors_;
+
+ // Holds a stream associated with each device ordinal present in the
+ // system, which are the indices. Immutable after initialization.
+ std::vector<std::unique_ptr<Stream>> streams_;
+
+ // The platform that this is managing for the machine.
+ PlatformKind platform_;
+
+ // Options used to create StreamExecutors on each of the respective devices.
+ DeviceOptions device_options_;
+
+ // Plugin configuration to use for all StreamExecutors created by this object.
+ PluginConfig plugin_config_;
+
+ // The smallest NUMA node value for any device managed by this machine
+ // manager. Used, along with limit_numa_node_, to convert NUMA nodes into bus
+ // ordinals. The NUMA node space occupied by GPUs is assumed to be dense.
+ int min_numa_node_;
+
+ // One greater than the largest NUMA node value for any device managed by
+ // this machine manager.
+ int limit_numa_node_;
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_MACHINE_MANAGER_H_
diff --git a/tensorflow/stream_executor/multi_platform_manager.cc b/tensorflow/stream_executor/multi_platform_manager.cc
new file mode 100644
index 0000000000..a65add05c5
--- /dev/null
+++ b/tensorflow/stream_executor/multi_platform_manager.cc
@@ -0,0 +1,66 @@
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/str_util.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+
+/* static */ mutex MultiPlatformManager::platforms_mutex_(LINKER_INITIALIZED);
+
+/* static */ port::Status MultiPlatformManager::RegisterPlatform(
+ std::unique_ptr<Platform> platform) {
+ CHECK(platform != nullptr);
+ string key = port::Lowercase(platform->Name());
+ mutex_lock lock(platforms_mutex_);
+ if (GetPlatformMap()->find(key) != GetPlatformMap()->end()) {
+ return port::Status(port::error::INTERNAL,
+ "platform is already registered with name: \"" +
+ platform->Name() + "\"");
+ }
+ GetPlatformByIdMap()->insert(std::make_pair(platform->id(), platform.get()));
+ // Release ownership/uniqueness to prevent destruction on program exit.
+ // This avoids Platforms "cleaning up" on program exit, because otherwise,
+ // there are _very_ tricky races between StreamExecutor and underlying
+ // platforms (CUDA, OpenCL) during exit. Since these are fixed-size and 1x per
+ // program, these are deemed acceptable.
+ (*GetPlatformMap())[key] = platform.release();
+ return port::Status::OK();
+}
+
+/* static */ port::StatusOr<Platform*> MultiPlatformManager::PlatformWithName(
+ const string& target) {
+ mutex_lock lock(platforms_mutex_);
+ auto it = GetPlatformMap()->find(port::Lowercase(target));
+
+ if (it == GetPlatformMap()->end()) {
+ return port::Status(
+ port::error::NOT_FOUND,
+ "could not find registered platform with name: \"" + target + "\"");
+ }
+
+ return it->second;
+}
+
+/* static */ port::StatusOr<Platform*> MultiPlatformManager::PlatformWithId(
+ const Platform::Id& id) {
+ mutex_lock lock(platforms_mutex_);
+ auto it = GetPlatformByIdMap()->find(id);
+ if (it == GetPlatformByIdMap()->end()) {
+ return port::Status(
+ port::error::NOT_FOUND,
+ port::Printf("could not find registered platform with id: 0x%p", id));
+ }
+
+ return it->second;
+}
+
+/* static */ void MultiPlatformManager::ClearPlatformRegistry() {
+ mutex_lock lock(platforms_mutex_);
+ GetPlatformMap()->clear();
+ GetPlatformByIdMap()->clear();
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/multi_platform_manager.h b/tensorflow/stream_executor/multi_platform_manager.h
new file mode 100644
index 0000000000..ade7fac24b
--- /dev/null
+++ b/tensorflow/stream_executor/multi_platform_manager.h
@@ -0,0 +1,144 @@
+// This is a registration-oriented interface for multiple platforms. It will
+// replace the MachineManager singleton interface, as MachineManager does not
+// currently support simultaneous use of multiple platforms.
+//
+// Usage:
+//
+// In your BUILD rule, add a dependency on a platform plugin that you'd like
+// to use, such as:
+//
+// //perftools/gputools/executor/cuda:cuda_platform
+// //perftools/gputools/executor/opencl:opencl_platform
+//
+// This will register platform plugins that can be discovered via this
+// interface. Sample API usage:
+//
+// port::StatusOr<Platform*> platform_status =
+// gpu::MultiPlatformManager::PlatformWithName("OpenCL");
+// if (!platform_status.ok()) { ... }
+// Platform* platform = platform_status.ValueOrDie();
+// LOG(INFO) << platform->VisibleDeviceCount() << " devices visible";
+// if (platform->VisibleDeviceCount() <= 0) { return; }
+//
+// for (int i = 0; i < platform->VisibleDeviceCount(); ++i) {
+// port::StatusOr<StreamExecutor*> executor_status =
+// platform->ExecutorForDevice(i);
+// if (!executor_status.ok()) {
+// LOG(INFO) << "could not retrieve executor for device ordinal " << i
+// << ": " << executor_status.status();
+// continue;
+// }
+// LOG(INFO) << "found usable executor: " << executor_status.ValueOrDie();
+// }
+//
+// A few things to note:
+// - There is no standard formatting/practice for identifying the name of a
+// platform. Ideally, a platform will list its registered name in its header
+// or in other associated documentation.
+// - Platform name lookup is case-insensitive. "OpenCL" or "opencl" (or even
+//   "OpEnCl") would work correctly in the above example.
+//
+// And similarly, for standard interfaces (BLAS, RNG, etc.) you can add
+// dependencies on support libraries, e.g.:
+//
+// //perftools/gputools/executor/cuda:pluton_blas_plugin
+// //perftools/gputools/executor/cuda:cudnn_plugin
+// //perftools/gputools/executor/cuda:cublas_plugin
+// //perftools/gputools/executor/cuda:curand_plugin
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_MULTI_PLATFORM_MANAGER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_MULTI_PLATFORM_MANAGER_H_
+
+#include <functional>
+#include <map>
+#include <memory>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+// Manages multiple platforms that may be present on the current machine.
+class MultiPlatformManager {
+ public:
+  // Registers a platform object, returning an error status if a platform with
+  // the same name has already been registered. Takes ownership of the platform,
+  // which the manager then retains for the lifetime of the program.
+ static port::Status RegisterPlatform(std::unique_ptr<Platform> platform);
+
+ // Retrieves the platform registered with the given platform name; e.g.
+ // "CUDA", "OpenCL", ...
+ //
+ // If the requested platform is not registered, an error status is returned.
+ // Ownership of the platform is NOT transferred to the caller --
+ // the MultiPlatformManager owns the platforms in a singleton-like fashion.
+ static port::StatusOr<Platform*> PlatformWithName(const string& target);
+
+ // Retrieves the platform registered with the given platform ID, which
+ // is an opaque (but comparable) value.
+ //
+ // If the requested platform is not registered, an error status is returned.
+ // Ownership of the platform is NOT transferred to the caller --
+ // the MultiPlatformManager owns the platforms in a singleton-like fashion.
+ static port::StatusOr<Platform*> PlatformWithId(const Platform::Id& id);
+
+ // Clears the set of registered platforms, primarily used for testing.
+ static void ClearPlatformRegistry();
+
+  // Although the MultiPlatformManager "owns" its platforms, it holds them as
+  // undecorated pointers to prevent races during program exit between this
+  // object's data and the underlying platforms (e.g., CUDA, OpenCL).
+  // Because certain platforms have unpredictable deinitialization
+  // times/sequences, it is not possible to structure a safe deinitialization
+  // sequence. Thus, we intentionally "leak" allocated platforms to defer
+  // cleanup to the OS. This should be acceptable, as these are one-time
+  // allocations per program invocation.
+  // The MultiPlatformManager should be considered the owner of any platforms
+  // registered with it, and leak checking should be disabled during allocation
+  // of such Platforms, to avoid spurious reporting at program exit.
+ using PlatformMap = std::map<string, Platform*>;
+
+ // Provides access to the available set of platforms under a lock.
+ static port::Status WithPlatforms(
+ std::function<port::Status(PlatformMap*)> callback) {
+ mutex_lock lock(platforms_mutex_);
+ return callback(GetPlatformMap());
+ }
+
+ private:
+ // mutex that guards the platform map.
+ static mutex platforms_mutex_;
+
+ // TODO(b/22689637): Clean up these two maps; make sure they coexist nicely.
+ // TODO(b/22689637): Move this (whatever the final/"official" map is) to
+  // plugin_registry.h, along with the associated functionality.
+ // Platform-name-to-object mapping. These platforms are registered via module
+ // initializers, and linkage determines which platforms are available to a
+ // given target.
+ static PlatformMap* GetPlatformMap() {
+ static PlatformMap* instance = new PlatformMap;
+ return instance;
+ }
+
+ // Holds a Platform::Id-to-object mapping.
+ // Unlike platforms_ above, this map does not own its contents.
+ static std::map<Platform::Id, Platform*>* GetPlatformByIdMap() {
+ using PlatformIdMap = std::map<Platform::Id, Platform*>;
+ static PlatformIdMap* instance = new PlatformIdMap;
+ return instance;
+ }
+
+ SE_DISALLOW_COPY_AND_ASSIGN(MultiPlatformManager);
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_MULTI_PLATFORM_MANAGER_H_
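A small sketch of WithPlatforms(), which runs a callback over the registered-platform map while holding the registry lock (assumes namespace perftools::gputools is in scope):

    port::Status status = MultiPlatformManager::WithPlatforms(
        [](MultiPlatformManager::PlatformMap* platforms) {
          for (const auto& entry : *platforms) {
            LOG(INFO) << "platform \"" << entry.first << "\" reports "
                      << entry.second->VisibleDeviceCount() << " device(s)";
          }
          return port::Status::OK();
        });

Because the callback runs with platforms_mutex_ held, it should not call back into MultiPlatformManager (that would self-deadlock on the non-recursive mutex) and should avoid long-running work.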
diff --git a/tensorflow/stream_executor/platform.cc b/tensorflow/stream_executor/platform.cc
new file mode 100644
index 0000000000..8be9353bbe
--- /dev/null
+++ b/tensorflow/stream_executor/platform.cc
@@ -0,0 +1,115 @@
+#include "tensorflow/stream_executor/platform.h"
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+
+namespace perftools {
+namespace gputools {
+
+string PlatformKindString(PlatformKind kind) {
+ switch (kind) {
+ case PlatformKind::kCuda:
+ return "CUDA";
+ case PlatformKind::kOpenCL:
+ return "OpenCL";
+ case PlatformKind::kOpenCLAltera:
+ return "OpenCL+Altera";
+ case PlatformKind::kHost:
+ return "Host";
+ case PlatformKind::kMock:
+ return "Mock";
+ default:
+ return port::StrCat("InvalidPlatformKind(", static_cast<int>(kind), ")");
+ }
+}
+
+PlatformKind PlatformKindFromString(string kind) {
+ for (int i = 0; i < static_cast<int>(PlatformKind::kSize); ++i) {
+ if (kind == PlatformKindString(static_cast<PlatformKind>(i))) {
+ return static_cast<PlatformKind>(i);
+ }
+ }
+
+ return PlatformKind::kInvalid;
+}
+
+bool PlatformIsRunnable(PlatformKind kind) {
+ switch (kind) {
+ case PlatformKind::kCuda:
+ case PlatformKind::kOpenCL:
+ case PlatformKind::kHost:
+ return true;
+ default:
+ return false;
+ }
+}
+
+bool PlatformIsRunnableOnDevice(PlatformKind kind) {
+ switch (kind) {
+ case PlatformKind::kCuda:
+ case PlatformKind::kOpenCL:
+ return true;
+ default:
+ return false;
+ }
+}
+
+void CheckPlatformKindIsValid(PlatformKind kind) {
+ CHECK(static_cast<int>(PlatformKind::kCuda) <= static_cast<int>(kind) &&
+ static_cast<int>(kind) <= static_cast<int>(PlatformKind::kMock))
+ << "invalid GPU executor kind: " << PlatformKindString(kind);
+}
+
+StreamExecutorConfig::StreamExecutorConfig()
+ : ordinal(-1), device_options(DeviceOptions::Default()) {}
+
+StreamExecutorConfig::StreamExecutorConfig(int ordinal_in)
+ : ordinal(ordinal_in), device_options(DeviceOptions::Default()) {}
+
+Platform::~Platform() {}
+
+port::Status Platform::ForceExecutorShutdown() {
+ return port::Status(port::error::UNIMPLEMENTED,
+ "executor shutdown is not supported on this platform");
+}
+
+std::unique_ptr<Platform::PeerAccessMap> Platform::GetPeerAccessMap() {
+ auto *map = new PeerAccessMap;
+
+ int device_count = VisibleDeviceCount();
+ for (int i = 0; i < device_count; ++i) {
+ for (int j = 0; j < device_count; ++j) {
+ StreamExecutor *from = ExecutorForDevice(i).ValueOrDie();
+ StreamExecutor *to = ExecutorForDevice(j).ValueOrDie();
+ (*map)[{i, j}] = from->CanEnablePeerAccessTo(to);
+ }
+ }
+
+ return std::unique_ptr<Platform::PeerAccessMap>{map};
+}
+
+port::Status Platform::EnablePeerAccess() {
+ auto peer_access_map = GetPeerAccessMap();
+ for (const auto &access : *peer_access_map) {
+ auto devices = access.first;
+ if (access.second) {
+ StreamExecutor *from = ExecutorForDevice(devices.first).ValueOrDie();
+ StreamExecutor *to = ExecutorForDevice(devices.second).ValueOrDie();
+ auto status = from->EnablePeerAccessTo(to);
+ if (!status.ok()) {
+ return status;
+ }
+ } else {
+ LOG(INFO) << "cannot enable peer access from device ordinal "
+ << devices.first << " to device ordinal " << devices.second;
+ }
+ }
+ return port::Status::OK();
+}
+
+} // namespace gputools
+} // namespace perftools
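A usage sketch for the peer-access helpers above, where platform is assumed to be a Platform* previously obtained from the MultiPlatformManager:

    port::Status status = platform->EnablePeerAccess();
    if (!status.ok()) {
      LOG(WARNING) << "could not enable all peer-to-peer links: " << status;
    }

EnablePeerAccess() walks the map returned by GetPeerAccessMap() and returns the first failure it hits; note that building that map forces creation of a default-configured StreamExecutor for every visible device ordinal.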
diff --git a/tensorflow/stream_executor/platform.h b/tensorflow/stream_executor/platform.h
new file mode 100644
index 0000000000..c8b500b424
--- /dev/null
+++ b/tensorflow/stream_executor/platform.h
@@ -0,0 +1,185 @@
+// Defines types and declares functions for identifying and extracting
+// information about the types of platforms and supporting libraries for which
+// StreamExecutor implementations exist.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_H_
+
+#include <map>
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/device_options.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin.h"
+#include "tensorflow/stream_executor/trace_listener.h"
+
+namespace perftools {
+namespace gputools {
+
+class StreamExecutor;
+
+// Describes the platform for a StreamExecutor instantiation to act upon.
+//
+// Implementors: if you add a value here be sure to update PlatformKindString
+// and CheckPlatformKindIsValid.
+enum class PlatformKind {
+ kInvalid,
+ kCuda,
+ kOpenCL,
+ kOpenCLAltera, // Altera FPGA OpenCL platform.
+ // See documentation: go/fpgaopencl
+ // (StreamExecutor integration)
+ kHost,
+ kMock,
+ kSize,
+};
+
+// Returns true if kind represents a valid platform capable of enqueuing items
+// on a stream, but not necessarily on an accelerator device.
+// Returns false for kMock, kOpenCLAltera, and any invalid PlatformKind values.
+bool PlatformIsRunnable(PlatformKind kind);
+
+// Returns true if kind represents a valid platform capable of running kernels
+// on an accelerator device. Returns false for kHost*, kMock and any invalid
+// PlatformKind values.
+bool PlatformIsRunnableOnDevice(PlatformKind kind);
+
+// Returns a printable description of a PlatformKind.
+string PlatformKindString(PlatformKind kind);
+
+// Returns the PlatformKind corresponding to the input string; returns kInvalid
+// in the case of no match.
+PlatformKind PlatformKindFromString(string platform_string);
+
+// Checks that kind takes on a valid value.
+void CheckPlatformKindIsValid(PlatformKind kind);
+
+// StreamExecutorConfig encapsulates the set of options for constructing a
+// StreamExecutor for a given platform.
+struct StreamExecutorConfig {
+ // Sets members to defaults: -1 for ordinal (must be changed), and default
+ // PluginConfig and DeviceOptions.
+ StreamExecutorConfig();
+
+ // Simple ordinal-setting constructor.
+ explicit StreamExecutorConfig(int ordinal);
+
+ // The ordinal of the device to be managed by the returned StreamExecutor.
+ int ordinal;
+
+ // The PluginConfig for the returned StreamExecutor.
+ PluginConfig plugin_config;
+
+ // The DeviceOptions for the returned StreamExecutor.
+ DeviceOptions device_options;
+};
+
+// Abstract base class for a platform registered with the MultiPlatformManager.
+class Platform {
+ public:
+ virtual ~Platform();
+
+ // A platform ID is a unique identifier for each registered platform type -
+ // each platform is required to expose an ID to ensure unique registration and
+ // as a target against which plugins can register.
+ //
+  // The macro below is provided to help generate a [process-unique] identifier.
+ using Id = void*;
+
+// Helper macro to define a platform ID. To be used only inside platform
+// implementation files. Works by "reserving" an address/value (guaranteed to be
+// unique) inside a process space.
+#define PLATFORM_DEFINE_ID(ID_VAR_NAME) \
+ namespace { \
+ int plugin_id_value; \
+ } \
+ const perftools::gputools::Platform::Id ID_VAR_NAME = &plugin_id_value;
+
+ // Returns a key uniquely identifying this platform.
+ virtual Id id() const = 0;
+
+ // Returns the number of devices accessible on this platform.
+ //
+ // Note that, though these devices are visible, if there is only one userspace
+ // context allowed for the device at a time and another process is using this
+ // device, a call to ExecutorForDevice may return an error status.
+ virtual int VisibleDeviceCount() const = 0;
+
+ // Name of this platform.
+ virtual const string& Name() const = 0;
+
+  // Returns a StreamExecutor for the device with the given ordinal on this
+  // platform, using a default plugin configuration. If no device with that
+  // ordinal can be found, or if there is an error opening a context to
+  // communicate with the device, an error status is returned.
+ //
+ // Ownership of the executor is NOT transferred to the caller --
+ // the Platform owns the executors in a singleton-like fashion.
+ virtual port::StatusOr<StreamExecutor*> ExecutorForDevice(int ordinal) = 0;
+
+ // Returns a device or error, as above, with the specified plugins.
+ //
+ // Ownership of the executor is NOT transferred to the caller.
+ virtual port::StatusOr<StreamExecutor*> ExecutorForDeviceWithPluginConfig(
+ int ordinal, const PluginConfig& plugin_config) = 0;
+
+ // Returns a device constructed with the options specified in "config".
+ // Ownership of the executor is NOT transferred to the caller.
+ virtual port::StatusOr<StreamExecutor*> GetExecutor(
+ const StreamExecutorConfig& config) = 0;
+
+ // Returns a device constructed with the options specified in "config" without
+ // looking in or storing to the Platform's executor cache.
+ // Ownership IS transferred to the caller.
+ virtual port::StatusOr<std::unique_ptr<StreamExecutor>> GetUncachedExecutor(
+ const StreamExecutorConfig& config) = 0;
+
+ // Warning: this is a dangerous API and should be used with caution.
+ //
+ // Forces the platform to delete executor instances, releasing their
+ // associated device contexts. There must be no held instances of the executor
+ // and there must be no outstanding activity on the devices for this platform.
+ //
+ // This is only useful on platforms which bind a device to a single process
+ // that has obtained the device context. May return UNIMPLEMENTED on platforms
+ // that have no reason to destroy device contexts.
+ virtual port::Status ForceExecutorShutdown();
+
+ // Registers a TraceListener to listen to all StreamExecutors for this
+ // platform.
+ // Takes ownership of listener.
+ virtual void RegisterTraceListener(
+ std::unique_ptr<TraceListener> listener) = 0;
+
+ // Removes the specified TraceListener from all StreamExecutors.
+ virtual void UnregisterTraceListener(TraceListener* listener) = 0;
+
+  // Map from a (from, to) pair of device ordinals to a boolean indicating
+  // whether the first executor can access the second executor's memory.
+ using PeerAccessMap = std::map<std::pair<int, int>, bool>;
+
+ // Returns a matrix indicating which executors can access which other
+ // executors' memory.
+ virtual std::unique_ptr<PeerAccessMap> GetPeerAccessMap();
+
+ // Attempts to enable all peer-to-peer access links described by the result of
+ // GetPeerAccessMap(). Note that calling this routine will force the creation
+ // of a default-argument (see StreamExecutorConfig) StreamExecutor object for
+ // each device ordinal in the system, should any not yet exist.
+ virtual port::Status EnablePeerAccess();
+
+ protected:
+  // SE_DISALLOW_COPY_AND_ASSIGN declares a copy constructor, which suppresses
+  // the implicitly-generated default constructor. This statement re-enables
+  // the default constructor, which simplifies subclassing.
+ Platform() = default;
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(Platform);
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_H_
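A sketch of how a concrete platform implementation might use PLATFORM_DEFINE_ID per the comment on the macro; FooPlatform and kFooPlatformId are hypothetical names:

    // In foo_platform.cc (illustrative only):
    PLATFORM_DEFINE_ID(kFooPlatformId);

    Platform::Id FooPlatform::id() const { return kFooPlatformId; }

The macro takes the address of an internal-linkage variable, so each expansion yields a distinct, process-unique Id that plugins can later use as a registration target.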
diff --git a/tensorflow/stream_executor/platform/default/mutex.h b/tensorflow/stream_executor/platform/default/mutex.h
new file mode 100644
index 0000000000..371eb7f156
--- /dev/null
+++ b/tensorflow/stream_executor/platform/default/mutex.h
@@ -0,0 +1,60 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_DEFAULT_MUTEX_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_DEFAULT_MUTEX_H_
+
+#include <chrono> // NOLINT
+#include <condition_variable> // NOLINT
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+// std::shared_timed_mutex is a C++14 feature.
+#if (__cplusplus >= 201402L)
+#define STREAM_EXECUTOR_USE_SHARED_MUTEX
+#endif // __cplusplus >= 201402L
+
+#ifdef STREAM_EXECUTOR_USE_SHARED_MUTEX
+#include <shared_mutex> // NOLINT
+#else
+#include <mutex> // NOLINT
+#endif
+
+namespace perftools {
+namespace gputools {
+
+enum ConditionResult { kCond_Timeout, kCond_MaybeNotified };
+
+#ifdef STREAM_EXECUTOR_USE_SHARED_MUTEX
+typedef std::shared_timed_mutex BaseMutex;
+#else
+typedef std::mutex BaseMutex;
+#endif
+
+// A class that wraps around the std::mutex implementation, only adding an
+// additional LinkerInitialized constructor interface.
+class mutex : public BaseMutex {
+ public:
+ mutex() {}
+  // The default implementation of std::mutex is safe to use after linker
+  // initialization, so the LinkerInitialized argument can be ignored here.
+ explicit mutex(LinkerInitialized x) {}
+};
+
+typedef std::unique_lock<BaseMutex> mutex_lock;
+
+#ifdef STREAM_EXECUTOR_USE_SHARED_MUTEX
+typedef std::shared_lock<BaseMutex> shared_lock;
+#else
+typedef std::unique_lock<BaseMutex> shared_lock;
+#endif
+
+using std::condition_variable;
+
+inline ConditionResult WaitForMilliseconds(mutex_lock* mu,
+ condition_variable* cv, int64 ms) {
+ std::cv_status s = cv->wait_for(*mu, std::chrono::milliseconds(ms));
+ return (s == std::cv_status::timeout) ? kCond_Timeout : kCond_MaybeNotified;
+}
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_DEFAULT_MUTEX_H_
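A brief sketch of the waiting helper defined above; mu, cv, and done are local to the example and stand in for whatever state a caller actually guards:

    mutex mu;
    condition_variable cv;
    bool done = false;  // guarded by mu

    mutex_lock lock(mu);
    while (!done) {
      if (WaitForMilliseconds(&lock, &cv, /*ms=*/100) == kCond_Timeout) {
        break;  // gave up after 100 ms without a notification
      }
    }

As the name kCond_MaybeNotified suggests, a non-timeout return does not guarantee the awaited condition holds, so the predicate is re-checked in a loop.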
diff --git a/tensorflow/stream_executor/platform/logging.h b/tensorflow/stream_executor/platform/logging.h
new file mode 100644
index 0000000000..a3e2385dd3
--- /dev/null
+++ b/tensorflow/stream_executor/platform/logging.h
@@ -0,0 +1,21 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_LOGGING_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_LOGGING_H_
+
+#include "tensorflow/core/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+#if !defined(PLATFORM_GOOGLE)
+
+// A CHECK() macro that lets you assert the success of a function that
+// returns -1 and sets errno in case of an error. E.g.
+//
+// CHECK_ERR(mkdir(path, 0700));
+//
+// or
+//
+// int fd = open(filename, flags); CHECK_ERR(fd) << ": open " << filename;
+#define CHECK_ERR(invocation) CHECK((invocation) != -1) << #invocation
+
+#endif
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_LOGGING_H_
diff --git a/tensorflow/stream_executor/platform/mutex.h b/tensorflow/stream_executor/platform/mutex.h
new file mode 100644
index 0000000000..21b1894737
--- /dev/null
+++ b/tensorflow/stream_executor/platform/mutex.h
@@ -0,0 +1,12 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_MUTEX_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_MUTEX_H_
+
+#include "tensorflow/core/platform/port.h"
+
+#if defined(PLATFORM_GOOGLE)
+#include "tensorflow/stream_executor/platform/google/mutex.h"
+#else
+#include "tensorflow/stream_executor/platform/default/mutex.h"
+#endif
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_MUTEX_H_
diff --git a/tensorflow/stream_executor/platform/port.h b/tensorflow/stream_executor/platform/port.h
new file mode 100644
index 0000000000..ebe0cf517b
--- /dev/null
+++ b/tensorflow/stream_executor/platform/port.h
@@ -0,0 +1,40 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_PORT_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_PORT_H_
+
+#include "tensorflow/core/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+using tensorflow::int8;
+using tensorflow::int16;
+using tensorflow::int32;
+using tensorflow::int64;
+
+using tensorflow::uint8;
+using tensorflow::uint16;
+using tensorflow::uint32;
+using tensorflow::uint64;
+
+#if !defined(PLATFORM_GOOGLE)
+using std::string;
+#endif
+
+#if !defined(COMPILER_MSVC)
+#define ARRAYSIZE(a) \
+ ((sizeof(a) / sizeof(*(a))) / \
+ static_cast<size_t>(!(sizeof(a) % sizeof(*(a)))))
+#endif
+
+using tensorflow::LinkerInitialized;
+using tensorflow::LINKER_INITIALIZED;
+
+#define SE_FALLTHROUGH_INTENDED TF_FALLTHROUGH_INTENDED
+
+} // namespace gputools
+} // namespace perftools
+
+#define SE_DISALLOW_COPY_AND_ASSIGN TF_DISALLOW_COPY_AND_ASSIGN
+#define SE_MUST_USE_RESULT TF_MUST_USE_RESULT
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_PORT_H_
diff --git a/tensorflow/stream_executor/platform/thread_annotations.h b/tensorflow/stream_executor/platform/thread_annotations.h
new file mode 100644
index 0000000000..bce4bb3794
--- /dev/null
+++ b/tensorflow/stream_executor/platform/thread_annotations.h
@@ -0,0 +1,6 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLATFORM_THREAD_ANNOTATIONS_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLATFORM_THREAD_ANNOTATIONS_H_
+
+#include "tensorflow/core/platform/thread_annotations.h"
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLATFORM_THREAD_ANNOTATIONS_H_
diff --git a/tensorflow/stream_executor/plugin.cc b/tensorflow/stream_executor/plugin.cc
new file mode 100644
index 0000000000..8ca8ecff38
--- /dev/null
+++ b/tensorflow/stream_executor/plugin.cc
@@ -0,0 +1,40 @@
+#include "tensorflow/stream_executor/plugin.h"
+
+namespace perftools {
+namespace gputools {
+
+// Mostly-arbitrary ID only used as a sentinel "not otherwise initialized"
+// value. This value should never [need to] be specified except by the
+// initialization functions defined in this file and in PluginRegistry.
+PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(PluginConfig::kDefault);
+
+PluginConfig::PluginConfig()
+ : blas_(kDefault), dnn_(kDefault), fft_(kDefault), rng_(kDefault) {}
+
+bool PluginConfig::operator==(const PluginConfig& rhs) const {
+ return blas_ == rhs.blas_ && dnn_ == rhs.dnn_ && fft_ == rhs.fft_ &&
+ rng_ == rhs.rng_;
+}
+
+PluginConfig& PluginConfig::SetBlas(PluginId blas) {
+ blas_ = blas;
+ return *this;
+}
+
+PluginConfig& PluginConfig::SetDnn(PluginId dnn) {
+ dnn_ = dnn;
+ return *this;
+}
+
+PluginConfig& PluginConfig::SetFft(PluginId fft) {
+ fft_ = fft;
+ return *this;
+}
+
+PluginConfig& PluginConfig::SetRng(PluginId rng) {
+ rng_ = rng;
+ return *this;
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/plugin.h b/tensorflow/stream_executor/plugin.h
new file mode 100644
index 0000000000..5dc39b7928
--- /dev/null
+++ b/tensorflow/stream_executor/plugin.h
@@ -0,0 +1,74 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLUGIN_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLUGIN_H_
+
+namespace perftools {
+namespace gputools {
+
+// A plugin ID is a unique identifier for each registered plugin type.
+typedef void* PluginId;
+
+// Helper macro to define a plugin ID. To be used only inside plugin
+// implementation files. Works by "reserving" an address/value (guaranteed to be
+// unique) inside a process space.
+#define PLUGIN_REGISTRY_DEFINE_PLUGIN_ID(ID_VAR_NAME) \
+ namespace { \
+ int plugin_id_value; \
+ } \
+ const PluginId ID_VAR_NAME = &plugin_id_value;
+
+// kNullPlugin denotes an invalid plugin identifier.
+extern const PluginId kNullPlugin;
+
+// Enumeration to list the supported types of plugins / support libraries.
+enum class PluginKind {
+ kInvalid,
+ kBlas,
+ kDnn,
+ kFft,
+ kRng,
+};
+
+// A PluginConfig describes the set of plugins to be used by a StreamExecutor
+// instance. Each plugin is defined by an arbitrary identifier, usually best set
+// to the address of a static member in the implementation (to avoid conflicts).
+//
+// A PluginConfig may be passed to the StreamExecutor constructor - the plugins
+// described therein will be used to provide BLAS, DNN, FFT, and RNG
+// functionality. Platform-appropriate defaults will be used for any unset
+// libraries. If a platform does not support a specified plugin (e.g., cuBLAS on
+// an OpenCL executor), then an error will be logged and no plugin operations
+// will succeed.
+//
+// The StreamExecutor BUILD target does not link ANY plugin libraries - even
+// common host fallbacks! Any plugins must be explicitly linked by dependent
+// targets. See the cuda, opencl and host BUILD files for implemented plugin
+// support (search for "plugin").
+class PluginConfig {
+ public:
+ // Value specifying the platform's default option for that plugin.
+ static const PluginId kDefault;
+
+ // Initializes all members to the default options.
+ PluginConfig();
+
+ bool operator==(const PluginConfig& rhs) const;
+
+ // Sets the appropriate library kind to that passed in.
+ PluginConfig& SetBlas(PluginId blas);
+ PluginConfig& SetDnn(PluginId dnn);
+ PluginConfig& SetFft(PluginId fft);
+ PluginConfig& SetRng(PluginId rng);
+
+ PluginId blas() const { return blas_; }
+ PluginId dnn() const { return dnn_; }
+ PluginId fft() const { return fft_; }
+ PluginId rng() const { return rng_; }
+
+ private:
+ PluginId blas_, dnn_, fft_, rng_;
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLUGIN_H_
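A configuration sketch; kFancyBlasPlugin stands in for a PluginId that a real plugin's header would export and is purely illustrative:

    PluginConfig config = PluginConfig().SetBlas(kFancyBlasPlugin);
    // DNN, FFT, and RNG stay at PluginConfig::kDefault, so the platform's
    // default implementations will be used for those.

The setters return a reference to the config, so selections can be chained before the config is handed to the StreamExecutor being constructed.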
diff --git a/tensorflow/stream_executor/plugin_registry.cc b/tensorflow/stream_executor/plugin_registry.cc
new file mode 100644
index 0000000000..eda44d1146
--- /dev/null
+++ b/tensorflow/stream_executor/plugin_registry.cc
@@ -0,0 +1,228 @@
+#include "tensorflow/stream_executor/plugin_registry.h"
+
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/multi_platform_manager.h"
+
+namespace perftools {
+namespace gputools {
+
+const PluginId kNullPlugin = nullptr;
+
+// Returns the string representation of the specified PluginKind.
+string PluginKindString(PluginKind plugin_kind) {
+ switch (plugin_kind) {
+ case PluginKind::kBlas:
+ return "BLAS";
+ case PluginKind::kDnn:
+ return "DNN";
+ case PluginKind::kFft:
+ return "FFT";
+ case PluginKind::kRng:
+ return "RNG";
+ case PluginKind::kInvalid:
+ default:
+ return "kInvalid";
+ }
+}
+
+PluginRegistry::DefaultFactories::DefaultFactories() :
+ blas(kNullPlugin), dnn(kNullPlugin), fft(kNullPlugin), rng(kNullPlugin) { }
+
+/* static */ mutex PluginRegistry::mu_(LINKER_INITIALIZED);
+/* static */ PluginRegistry* PluginRegistry::instance_ = nullptr;
+
+PluginRegistry::PluginRegistry() {}
+
+/* static */ PluginRegistry* PluginRegistry::Instance() {
+ mutex_lock lock{mu_};
+ if (instance_ == nullptr) {
+ instance_ = new PluginRegistry();
+ }
+ return instance_;
+}
+
+void PluginRegistry::MapPlatformKindToId(PlatformKind platform_kind,
+ Platform::Id platform_id) {
+ platform_id_by_kind_[platform_kind] = platform_id;
+}
+
+template <typename FACTORY_TYPE>
+port::Status PluginRegistry::RegisterFactoryInternal(
+ PluginId plugin_id, const string& plugin_name, FACTORY_TYPE factory,
+ std::map<PluginId, FACTORY_TYPE>* factories) {
+ mutex_lock lock{mu_};
+
+ if (factories->find(plugin_id) != factories->end()) {
+ return port::Status{
+ port::error::ALREADY_EXISTS,
+ port::Printf("Attempting to register factory for plugin %s when "
+ "one has already been registered",
+ plugin_name.c_str())};
+ }
+
+ (*factories)[plugin_id] = factory;
+ plugin_names_[plugin_id] = plugin_name;
+ return port::Status::OK();
+}
+
+template <typename FACTORY_TYPE>
+port::StatusOr<FACTORY_TYPE> PluginRegistry::GetFactoryInternal(
+ PluginId plugin_id, const std::map<PluginId, FACTORY_TYPE>& factories,
+ const std::map<PluginId, FACTORY_TYPE>& generic_factories) const {
+ auto iter = factories.find(plugin_id);
+ if (iter == factories.end()) {
+ iter = generic_factories.find(plugin_id);
+ if (iter == generic_factories.end()) {
+ return port::Status{
+ port::error::NOT_FOUND,
+ port::Printf("Plugin ID %p not registered.", plugin_id)};
+ }
+ }
+
+ return iter->second;
+}
+
+bool PluginRegistry::SetDefaultFactory(Platform::Id platform_id,
+ PluginKind plugin_kind,
+ PluginId plugin_id) {
+ if (!HasFactory(platform_id, plugin_kind, plugin_id)) {
+ port::StatusOr<Platform*> status =
+ MultiPlatformManager::PlatformWithId(platform_id);
+ string platform_name = "<unregistered platform>";
+ if (status.ok()) {
+ platform_name = status.ValueOrDie()->Name();
+ }
+
+ LOG(ERROR) << "A factory must be registered for a platform before being "
+ << "set as default! "
+ << "Platform name: " << platform_name
+ << ", PluginKind: " << PluginKindString(plugin_kind)
+ << ", PluginId: " << plugin_id;
+ return false;
+ }
+
+ switch (plugin_kind) {
+ case PluginKind::kBlas:
+ default_factories_[platform_id].blas = plugin_id;
+ break;
+ case PluginKind::kDnn:
+ default_factories_[platform_id].dnn = plugin_id;
+ break;
+ case PluginKind::kFft:
+ default_factories_[platform_id].fft = plugin_id;
+ break;
+ case PluginKind::kRng:
+ default_factories_[platform_id].rng = plugin_id;
+ break;
+ default:
+ LOG(ERROR) << "Invalid plugin kind specified: "
+ << static_cast<int>(plugin_kind);
+ return false;
+ }
+
+ return true;
+}
+
+bool PluginRegistry::HasFactory(const PluginFactories& factories,
+ PluginKind plugin_kind,
+ PluginId plugin_id) const {
+ switch (plugin_kind) {
+ case PluginKind::kBlas:
+ return factories.blas.find(plugin_id) != factories.blas.end();
+ case PluginKind::kDnn:
+ return factories.dnn.find(plugin_id) != factories.dnn.end();
+ case PluginKind::kFft:
+ return factories.fft.find(plugin_id) != factories.fft.end();
+ case PluginKind::kRng:
+ return factories.rng.find(plugin_id) != factories.rng.end();
+ default:
+ LOG(ERROR) << "Invalid plugin kind specified: "
+ << PluginKindString(plugin_kind);
+ return false;
+ }
+}
+
+bool PluginRegistry::HasFactory(Platform::Id platform_id,
+ PluginKind plugin_kind,
+ PluginId plugin_id) const {
+ auto iter = factories_.find(platform_id);
+ if (iter != factories_.end()) {
+ if (HasFactory(iter->second, plugin_kind, plugin_id)) {
+ return true;
+ }
+ }
+
+ return HasFactory(generic_factories_, plugin_kind, plugin_id);
+}
+
+// Explicit instantiations to support types exposed in user/public API.
+#define EMIT_PLUGIN_SPECIALIZATIONS(FACTORY_TYPE, FACTORY_VAR, PLUGIN_STRING) \
+ template port::StatusOr<PluginRegistry::FACTORY_TYPE> \
+ PluginRegistry::GetFactoryInternal<PluginRegistry::FACTORY_TYPE>( \
+ PluginId plugin_id, \
+ const std::map<PluginId, PluginRegistry::FACTORY_TYPE>& factories, \
+ const std::map<PluginId, PluginRegistry::FACTORY_TYPE>& \
+ generic_factories) const; \
+ \
+ template port::Status \
+ PluginRegistry::RegisterFactoryInternal<PluginRegistry::FACTORY_TYPE>( \
+ PluginId plugin_id, const string& plugin_name, \
+ PluginRegistry::FACTORY_TYPE factory, \
+ std::map<PluginId, PluginRegistry::FACTORY_TYPE>* factories); \
+ \
+ template <> \
+ port::Status PluginRegistry::RegisterFactory<PluginRegistry::FACTORY_TYPE>( \
+ Platform::Id platform_id, PluginId plugin_id, const string& name, \
+ PluginRegistry::FACTORY_TYPE factory) { \
+ return RegisterFactoryInternal(plugin_id, name, factory, \
+ &factories_[platform_id].FACTORY_VAR); \
+ } \
+ \
+ template <> \
+ port::Status PluginRegistry::RegisterFactoryForAllPlatforms< \
+ PluginRegistry::FACTORY_TYPE>(PluginId plugin_id, const string& name, \
+ PluginRegistry::FACTORY_TYPE factory) { \
+ return RegisterFactoryInternal(plugin_id, name, factory, \
+ &generic_factories_.FACTORY_VAR); \
+ } \
+ \
+ template <> \
+ port::StatusOr<PluginRegistry::FACTORY_TYPE> PluginRegistry::GetFactory( \
+ Platform::Id platform_id, PluginId plugin_id) { \
+ if (plugin_id == PluginConfig::kDefault) { \
+ plugin_id = default_factories_[platform_id].FACTORY_VAR; \
+ \
+ if (plugin_id == kNullPlugin) { \
+ return port::Status{port::error::FAILED_PRECONDITION, \
+ "No suitable " PLUGIN_STRING \
+ " plugin registered, default or otherwise."}; \
+ } else { \
+ VLOG(2) << "Selecting default " PLUGIN_STRING " plugin, " \
+ << plugin_names_[plugin_id]; \
+ } \
+ } \
+ return GetFactoryInternal(plugin_id, factories_[platform_id].FACTORY_VAR, \
+ generic_factories_.FACTORY_VAR); \
+ } \
+ \
+ /* TODO(b/22689637): Also temporary WRT MultiPlatformManager */ \
+ template <> \
+ port::StatusOr<PluginRegistry::FACTORY_TYPE> PluginRegistry::GetFactory( \
+ PlatformKind platform_kind, PluginId plugin_id) { \
+ auto iter = platform_id_by_kind_.find(platform_kind); \
+ if (iter == platform_id_by_kind_.end()) { \
+ return port::Status{port::error::FAILED_PRECONDITION, \
+ port::Printf("Platform kind %d not registered.", \
+ static_cast<int>(platform_kind))}; \
+ } \
+ return GetFactory<PluginRegistry::FACTORY_TYPE>(iter->second, plugin_id); \
+ }
+
+EMIT_PLUGIN_SPECIALIZATIONS(BlasFactory, blas, "BLAS");
+EMIT_PLUGIN_SPECIALIZATIONS(DnnFactory, dnn, "DNN");
+EMIT_PLUGIN_SPECIALIZATIONS(FftFactory, fft, "FFT");
+EMIT_PLUGIN_SPECIALIZATIONS(RngFactory, rng, "RNG");
+
+} // namespace gputools
+} // namespace perftools
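A lookup sketch for the consumer side of the registry, assuming platform_id and plugin_config are whatever the calling StreamExecutor was constructed with:

    port::StatusOr<PluginRegistry::BlasFactory> factory =
        PluginRegistry::Instance()->GetFactory<PluginRegistry::BlasFactory>(
            platform_id, plugin_config.blas());
    if (!factory.ok()) {
      LOG(ERROR) << "no usable BLAS plugin: " << factory.status();
    }

Passing PluginConfig::kDefault as the plugin ID selects whatever default was installed via SetDefaultFactory(), or fails with FAILED_PRECONDITION if no default exists.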
diff --git a/tensorflow/stream_executor/plugin_registry.h b/tensorflow/stream_executor/plugin_registry.h
new file mode 100644
index 0000000000..f1ea59853d
--- /dev/null
+++ b/tensorflow/stream_executor/plugin_registry.h
@@ -0,0 +1,155 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_PLUGIN_REGISTRY_H_
+#define TENSORFLOW_STREAM_EXECUTOR_PLUGIN_REGISTRY_H_
+
+#include <map>
+
+#include "tensorflow/stream_executor/blas.h"
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/fft.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/plugin.h"
+#include "tensorflow/stream_executor/rng.h"
+
+namespace perftools {
+namespace gputools {
+
+namespace internal {
+class StreamExecutorInterface;
+}
+
+// The PluginRegistry is a singleton that maintains the set of registered
+// "support library" plugins. Currently, there are four kinds of plugins:
+// BLAS, DNN, FFT, and RNG. Each interface is defined in the corresponding
+// gpu_{kind}.h header.
+//
+// At runtime, a StreamExecutor object will query the singleton registry to
+// retrieve the plugins that the StreamExecutor was configured with (refer to
+// the StreamExecutor and PluginConfig declarations).
+//
+// Plugin libraries are best registered using REGISTER_MODULE_INITIALIZER,
+// but can be registered at any time. When registering a DSO-backed plugin, it
+// is usually a good idea to load the DSO at registration time, to prevent
+// late-loading from distorting performance/benchmarks as much as possible.
+class PluginRegistry {
+ public:
+ typedef blas::BlasSupport* (*BlasFactory)(internal::StreamExecutorInterface*);
+ typedef dnn::DnnSupport* (*DnnFactory)(internal::StreamExecutorInterface*);
+ typedef fft::FftSupport* (*FftFactory)(internal::StreamExecutorInterface*);
+ typedef rng::RngSupport* (*RngFactory)(internal::StreamExecutorInterface*);
+
+ // Gets (and creates, if necessary) the singleton PluginRegistry instance.
+ static PluginRegistry* Instance();
+
+ // Registers the specified factory with the specified platform.
+ // Returns a non-successful status if the factory has already been registered
+ // with that platform (but execution should be otherwise unaffected).
+ template <typename FactoryT>
+ port::Status RegisterFactory(Platform::Id platform_id, PluginId plugin_id,
+ const string& name, FactoryT factory);
+
+ // Registers the specified factory as usable by _all_ platform types.
+ // Reports errors just as RegisterFactory.
+ template <typename FactoryT>
+ port::Status RegisterFactoryForAllPlatforms(PluginId plugin_id,
+ const string& name,
+ FactoryT factory);
+
+ // TODO(b/22689637): Setter for temporary mapping until all users are using
+ // MultiPlatformManager / PlatformId.
+ void MapPlatformKindToId(PlatformKind platform_kind,
+ Platform::Id platform_id);
+
+ // Potentially sets the plugin identified by plugin_id to be the default
+ // for the specified platform and plugin kind. If this routine is called
+  // multiple times for the same PluginKind, the PluginId given in the last call
+ // will be used.
+ bool SetDefaultFactory(Platform::Id platform_id, PluginKind plugin_kind,
+ PluginId plugin_id);
+
+  // Returns true if the factory/id has been registered for the
+ // specified platform and plugin kind and false otherwise.
+ bool HasFactory(Platform::Id platform_id, PluginKind plugin_kind,
+ PluginId plugin) const;
+
+ // Retrieves the factory registered for the specified kind,
+ // or a port::Status on error.
+ template <typename FactoryT>
+ port::StatusOr<FactoryT> GetFactory(Platform::Id platform_id,
+ PluginId plugin_id);
+
+ // TODO(b/22689637): Deprecated/temporary. Will be deleted once all users are
+ // on MultiPlatformManager / PlatformId.
+ template <typename FactoryT>
+ port::StatusOr<FactoryT> GetFactory(PlatformKind platform_kind,
+ PluginId plugin_id);
+
+ private:
+ // Containers for the sets of registered factories, by plugin kind.
+ struct PluginFactories {
+ std::map<PluginId, BlasFactory> blas;
+ std::map<PluginId, DnnFactory> dnn;
+ std::map<PluginId, FftFactory> fft;
+ std::map<PluginId, RngFactory> rng;
+ };
+
+ // Simple structure to hold the currently configured default plugins (for a
+ // particular Platform).
+ struct DefaultFactories {
+ DefaultFactories();
+ PluginId blas, dnn, fft, rng;
+ };
+
+ PluginRegistry();
+
+ // Actually performs the work of registration.
+ template <typename FactoryT>
+ port::Status RegisterFactoryInternal(PluginId plugin_id,
+ const string& plugin_name,
+ FactoryT factory,
+ std::map<PluginId, FactoryT>* factories);
+
+ // Actually performs the work of factory retrieval.
+ template <typename FactoryT>
+ port::StatusOr<FactoryT> GetFactoryInternal(
+ PluginId plugin_id, const std::map<PluginId, FactoryT>& factories,
+ const std::map<PluginId, FactoryT>& generic_factories) const;
+
+ // Returns true if the specified plugin has been registered with the specified
+ // platform factories. Unlike the other overload of this method, this does
+ // not implicitly examine the default factory lists.
+ bool HasFactory(const PluginFactories& factories, PluginKind plugin_kind,
+ PluginId plugin) const;
+
+ // As this object is a singleton, a global mutex can be used for static and
+ // instance protection.
+ static mutex mu_;
+
+ // The singleton itself.
+ static PluginRegistry* instance_;
+
+ // TODO(b/22689637): Temporary mapping until all users are using
+ // MultiPlatformManager / PlatformId.
+ std::map<PlatformKind, Platform::Id> platform_id_by_kind_;
+
+ // The set of registered factories, keyed by platform ID.
+ std::map<Platform::Id, PluginFactories> factories_;
+
+ // Plugins supported for all platform kinds.
+ PluginFactories generic_factories_;
+
+ // The sets of default factories, keyed by platform ID.
+ std::map<Platform::Id, DefaultFactories> default_factories_;
+
+ // Lookup table for plugin names.
+ std::map<PluginId, string> plugin_names_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(PluginRegistry);
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_PLUGIN_REGISTRY_H_
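A registration sketch, typically run from a module initializer as the class comment suggests; FancyBlasSupport, CreateFancyBlas, kFooPlatformId, and kFancyBlasPlugin are hypothetical names:

    blas::BlasSupport* CreateFancyBlas(internal::StreamExecutorInterface* parent) {
      return new FancyBlasSupport(parent);  // hypothetical BlasSupport subclass
    }

    void RegisterFancyBlas() {
      PluginRegistry* registry = PluginRegistry::Instance();
      port::Status status =
          registry->RegisterFactory<PluginRegistry::BlasFactory>(
              kFooPlatformId, kFancyBlasPlugin, "FancyBLAS", &CreateFancyBlas);
      if (!status.ok()) {
        LOG(ERROR) << "FancyBLAS registration failed: " << status;
      }
      registry->SetDefaultFactory(kFooPlatformId, PluginKind::kBlas,
                                  kFancyBlasPlugin);
    }

Registering the same plugin ID twice for a platform yields ALREADY_EXISTS but leaves the first registration in place.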
diff --git a/tensorflow/stream_executor/rng.cc b/tensorflow/stream_executor/rng.cc
new file mode 100644
index 0000000000..052b502194
--- /dev/null
+++ b/tensorflow/stream_executor/rng.cc
@@ -0,0 +1,36 @@
+#include "tensorflow/stream_executor/rng.h"
+
+#include "tensorflow/stream_executor/platform/logging.h"
+
+namespace perftools {
+namespace gputools {
+namespace rng {
+
+bool RngSupport::CheckSeed(const uint8 *seed, uint64 seed_bytes) {
+ CHECK(seed != nullptr);
+
+ if (seed_bytes < kMinSeedBytes) {
+ LOG(INFO) << "Insufficient RNG seed data specified: " << seed_bytes
+ << ". At least " << RngSupport::kMinSeedBytes
+ << " bytes are required.";
+ return false;
+ }
+
+ if (seed_bytes > kMaxSeedBytes) {
+ LOG(INFO) << "Too much RNG seed data specified: " << seed_bytes
+ << ". At most " << RngSupport::kMaxSeedBytes
+ << " bytes may be provided.";
+ return false;
+ }
+
+ return true;
+}
+
+#if defined(__APPLE__)
+const int RngSupport::kMinSeedBytes;
+const int RngSupport::kMaxSeedBytes;
+#endif
+
+} // namespace rng
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/rng.h b/tensorflow/stream_executor/rng.h
new file mode 100644
index 0000000000..797631d01d
--- /dev/null
+++ b/tensorflow/stream_executor/rng.h
@@ -0,0 +1,80 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_RNG_H_
+#define TENSORFLOW_STREAM_EXECUTOR_RNG_H_
+
+#include <limits.h>
+#include <complex>
+
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+template <typename ElemT>
+class DeviceMemory;
+
+namespace rng {
+
+// Random-number-generation support interface -- this can be derived from a GPU
+// executor when the underlying platform has an RNG library implementation
+// available. See StreamExecutor::AsRng().
+// When a seed is not specified, the backing RNG will be initialized with the
+// default seed for that implementation.
+//
+// Thread-hostile: see StreamExecutor class comment for details on
+// thread-hostility.
+class RngSupport {
+ public:
+ static const int kMinSeedBytes = 16;
+ static const int kMaxSeedBytes = INT_MAX;
+
+ // Releases any random-number-generation resources associated with this
+ // support object in the underlying platform implementation.
+ virtual ~RngSupport() {}
+
+ // Populates a GPU memory allocation with random values appropriate for the
+ // DeviceMemory element type; i.e. populates DeviceMemory<float> with random
+ // float values.
+ virtual bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<float> *v) = 0;
+ virtual bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<double> *v) = 0;
+ virtual bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<float>> *v) = 0;
+ virtual bool DoPopulateRandUniform(Stream *stream,
+ DeviceMemory<std::complex<double>> *v) = 0;
+
+ // Populates a GPU memory allocation with random values sampled from a
+ // Gaussian distribution with the given mean and standard deviation.
+ virtual bool DoPopulateRandGaussian(Stream *stream, float mean, float stddev,
+ DeviceMemory<float> *v) {
+ LOG(ERROR)
+ << "platform's random number generator does not support gaussian";
+ return false;
+ }
+ virtual bool DoPopulateRandGaussian(Stream *stream, double mean,
+ double stddev, DeviceMemory<double> *v) {
+ LOG(ERROR)
+ << "platform's random number generator does not support gaussian";
+ return false;
+ }
+
+ // Specifies the seed used to initialize the RNG.
+ // This call does not transfer ownership of the buffer seed; its data should
+ // not be altered for the lifetime of this call. At least 16 bytes of seed
+ // data must be provided, but not all seed data will necessarily be used.
+ // seed: Pointer to seed data. Must not be null.
+ // seed_bytes: Size of seed buffer in bytes. Must be >= 16.
+ virtual bool SetSeed(Stream *stream, const uint8 *seed,
+ uint64 seed_bytes) = 0;
+
+ protected:
+ static bool CheckSeed(const uint8 *seed, uint64 seed_bytes);
+};
+
+} // namespace rng
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_RNG_H_
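A sketch of how an implementation might use the protected CheckSeed() helper when overriding SetSeed(); FooRngSupport and foo_set_rng_seed are hypothetical:

    bool FooRngSupport::SetSeed(Stream* stream, const uint8* seed,
                                uint64 seed_bytes) {
      if (!CheckSeed(seed, seed_bytes)) {
        return false;  // CheckSeed already logged why the seed was rejected
      }
      return foo_set_rng_seed(stream, seed, seed_bytes);
    }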
diff --git a/tensorflow/stream_executor/shared_memory_config.h b/tensorflow/stream_executor/shared_memory_config.h
new file mode 100644
index 0000000000..f2bfe27117
--- /dev/null
+++ b/tensorflow/stream_executor/shared_memory_config.h
@@ -0,0 +1,21 @@
+// This file defines a uniform interface to configuration options for shared
+// memory for supported devices. As with many StreamExecutor-supported features,
+// support for the options defined herein is device-dependent.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_SHARED_MEMORY_CONFIG_H_
+#define TENSORFLOW_STREAM_EXECUTOR_SHARED_MEMORY_CONFIG_H_
+
+namespace perftools {
+namespace gputools {
+
+// SharedMemoryConfig enum describes potential widths of shared memory banks for
+// a device or kernel.
+enum class SharedMemoryConfig {
+ kDefault, // Use the device default configuration.
+ kFourByte, // Sets shared memory banks to be four bytes wide.
+ kEightByte, // Sets shared memory banks to be eight bytes wide.
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_SHARED_MEMORY_CONFIG_H_
diff --git a/tensorflow/stream_executor/stream.cc b/tensorflow/stream_executor/stream.cc
new file mode 100644
index 0000000000..ca3ef9aa1a
--- /dev/null
+++ b/tensorflow/stream_executor/stream.cc
@@ -0,0 +1,3329 @@
+#include "tensorflow/stream_executor/stream.h"
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/blas.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/rng.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+
+namespace perftools {
+namespace gputools {
+
+namespace {
+static internal::StreamInterface *CreateStreamImplementation(
+ StreamExecutor *parent) {
+ PlatformKind platform_kind = parent->platform_kind();
+ if (platform_kind == PlatformKind::kCuda) {
+ return (*internal::MakeCUDAStreamImplementation())(parent);
+ } else if (platform_kind == PlatformKind::kOpenCL ||
+ platform_kind == PlatformKind::kOpenCLAltera) {
+ return (*internal::MakeOpenCLStreamImplementation())(parent);
+ } else if (platform_kind == PlatformKind::kHost) {
+ return internal::MakeHostStreamImplementation(parent);
+ } else {
+ LOG(FATAL) << "cannot create stream implementation for platform kind: "
+ << PlatformKindString(platform_kind);
+ }
+}
+
+// Code to turn parameters to functions on stream into strings that
+// will be VLOG'ed. We need overloads, instead of
+// e.g. BatchDescriptorToVlogString(), as the code that calls these
+// functions does not know what the type of the parameter is.
+string ToVlogString(const dnn::BatchDescriptor &descriptor) {
+ return descriptor.ToShortString();
+}
+
+string ToVlogString(const dnn::FilterDescriptor &descriptor) {
+ return descriptor.ToShortString();
+}
+
+string ToVlogString(const dnn::ConvolutionDescriptor &descriptor) {
+ return descriptor.ToShortString();
+}
+
+string ToVlogString(const dnn::PoolingDescriptor &descriptor) {
+ return descriptor.ToShortString();
+}
+
+string ToVlogString(const dnn::NormalizeDescriptor &descriptor) {
+ return descriptor.ToShortString();
+}
+
+string ToVlogString(dnn::ActivationMode mode) {
+ return dnn::ActivationModeString(mode);
+}
+
+string ToVlogString(dnn::ElementwiseOperation op) {
+ return dnn::ElementwiseOperationString(op);
+}
+
+string ToVlogString(blas::Transpose t) { return blas::TransposeString(t); }
+
+string ToVlogString(blas::UpperLower ul) { return blas::UpperLowerString(ul); }
+
+string ToVlogString(blas::Diagonal d) { return blas::DiagonalString(d); }
+
+string ToVlogString(blas::Side s) { return blas::SideString(s); }
+
+string ToVlogString(const void *ptr) {
+ if (ptr == nullptr) {
+ return "null";
+ }
+
+ // StrCat does not convert pointers to text.
+ std::ostringstream out;
+ out << ptr;
+ return out.str();
+}
+
+template <class T>
+string ToVlogString(const std::complex<T> &c) {
+ // StrCat does not convert std::complex to text.
+ std::ostringstream out;
+ out << c;
+ return out.str();
+}
+
+template <class T>
+string ToVlogString(const std::function<T> &f) {
+ return f == nullptr ? "null" : "<non-null function>";
+}
+
+string ToVlogString(const DeviceMemoryBase &memory) {
+ return ToVlogString(memory.opaque());
+}
+
+string ToVlogString(const DeviceMemoryBase *memory) {
+ return ToVlogString(*memory);
+}
+
+string ToVlogString(int i) { return port::StrCat(i); }
+
+string ToVlogString(uint32 i) { return port::StrCat(i); }
+
+string ToVlogString(uint64 i) { return port::StrCat(i); }
+
+string ToVlogString(float f) { return port::StrCat(f); }
+
+string ToVlogString(double d) { return port::StrCat(d); }
+
+template <class T>
+string ToVlogString(port::ArraySlice<T> elements) {
+ string str = port::StrCat(
+ ToVlogString(reinterpret_cast<const void *>(elements.data())), "[",
+ elements.size(), "]{");
+ const char *separator = "";
+ size_t max_to_show = std::numeric_limits<size_t>::max();
+ if (!VLOG_IS_ON(2)) {
+ max_to_show = 5;
+ } else if (!VLOG_IS_ON(3)) {
+ max_to_show = 20;
+ } else if (!VLOG_IS_ON(11)) {
+ max_to_show = 1000;
+ }
+ for (size_t i = 0; i < elements.size(); ++i) {
+ if (i == max_to_show) {
+ str += ", ...";
+ break;
+ }
+ port::StrAppend(&str, separator, ToVlogString(elements[i]));
+ separator = ", ";
+ }
+ str += "}";
+ return str;
+}
+
+template <class T>
+string ToVlogString(port::MutableArraySlice<T> elements) {
+ return ToVlogString(port::ArraySlice<T>(elements));
+}
+
+// Used together with PARAM to VLOG calls made to the stream. Intended
+// to be used like this:
+//
+// VLOG(1) << CallStr("MyFunction", this, {PARAM(a), PARAM(b)});
+//
+// where a and b are the parameters to MyFunction.
+//
+// See VLOG_CALL for a short-hand for this. This way of doing it saves
+// a tremendous amount of boilerplate code given how many functions
+// there are on Stream and how many parameters they each have.
+string CallStr(const char *function_name, Stream *stream,
+ std::vector<std::pair<const char *, string>> params) {
+ // Do not call this function unless VLOG is on since just
+ // constructing all the strings in params is expensive.
+ CHECK(VLOG_IS_ON(1));
+
+ string str = port::StrCat("Called Stream::", function_name, "(");
+ const char *separator = "";
+ for (const auto &param : params) {
+ port::StrAppend(&str, separator, param.first, "=", param.second);
+ separator = ", ";
+ }
+ port::StrAppend(&str, ") stream=", ToVlogString(stream));
+ return str;
+}
+
+// Use this macro to avoid having to type every parameter twice to log
+// it with VLOG and CallStr.
+#define PARAM(parameter) \
+ { #parameter, ToVlogString(parameter) }
+
+// Use this macro to avoid having to type out the name of each
+// function and to save some boilerplate. Intended to be used like this:
+//
+// VLOG_CALL(PARAM(a), PARAM(b))
+//
+// This saves a tremendous amount of boilerplate compared to the alternative:
+//
+// VLOG(1) << "Calling MyFunction(a=" << ToVlogString(a)
+// << ", b=" << ToVlogString(b);
+//
+// Note here that most of the parameter names are not short and that
+// most of the functions take many more than 2 parameters.
+#define VLOG_CALL(...) VLOG(1) << CallStr(__func__, this, {__VA_ARGS__})
+
+} // namespace
+
+Stream::Stream(StreamExecutor *parent)
+ : implementation_(CreateStreamImplementation(parent)),
+ parent_(parent),
+ allocated_(false),
+ ok_(false),
+ temporary_memory_manager_(this) {
+ VLOG_CALL(PARAM(parent));
+}
+
+Stream::Stream(StreamExecutor *parent,
+ internal::StreamInterface *implementation)
+ : implementation_(implementation),
+ parent_(parent),
+ allocated_(false),
+ ok_(false),
+ temporary_memory_manager_(this) {
+ VLOG_CALL(PARAM(parent), PARAM(implementation));
+}
+
+Stream::~Stream() {
+ VLOG_CALL();
+
+ temporary_memory_manager_.ForceDeallocateAll();
+
+ if (allocated_) {
+ parent_->DeallocateStream(this);
+ }
+}
+
+Stream &Stream::Init() {
+ VLOG_CALL();
+
+ mutex_lock lock{mu_};
+ CHECK_EQ(false, allocated_)
+ << "stream appears to already have been initialized";
+ CHECK(!ok_) << "stream should be in !ok() state pre-initialization";
+
+ if (parent_->AllocateStream(this)) {
+ // Successful initialization!
+ allocated_ = true;
+ ok_ = true;
+ } else {
+ LOG(ERROR) << "failed to allocate stream during initialization";
+ }
+
+ return *this;
+}
+
+Stream &Stream::InitTimer(Timer *timer) {
+ VLOG_CALL(PARAM(timer));
+
+ if (ok()) {
+ CheckError(parent_->AllocateTimer(timer));
+ } else {
+ LOG(INFO) << "did not allocate timer: " << timer;
+ }
+ return *this;
+}
+
+Stream &Stream::InitWithTimer(Timer *timer) {
+ VLOG_CALL(PARAM(timer));
+
+ return Init().InitTimer(timer);
+}
+
+Stream &Stream::ThenRecordEvent(Event *event) {
+ VLOG_CALL(PARAM(event));
+
+ port::Status status = parent_->RecordEvent(this, event);
+ if (!status.ok()) {
+ LOG(ERROR) << "Error recording event in stream: " << status.error_message()
+ << "; not marking stream as bad, as the Event object may be "
+ << "at fault. Monitor for further errors.";
+ }
+
+ return *this;
+}
+
+Stream &Stream::ThenConvolve(
+ const dnn::BatchDescriptor &batch_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::FilterDescriptor &filter_descriptor,
+ const DeviceMemory<float> &filter_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> *output) {
+ VLOG_CALL(PARAM(batch_descriptor), PARAM(input_data),
+ PARAM(filter_descriptor), PARAM(filter_data),
+ PARAM(convolution_descriptor), PARAM(output_descriptor),
+ PARAM(output));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoConvolve(
+ this, batch_descriptor, input_data, filter_descriptor, filter_data,
+ convolution_descriptor, output_descriptor, output));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
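The Then* methods all return *this, so work can be enqueued fluently and the error state checked once at the end; a sketch, assuming executor is a StreamExecutor* and the descriptors and DeviceMemory handles were created elsewhere:

    Stream stream(executor);
    stream.Init()
        .ThenConvolve(batch_desc, input, filter_desc, filters, conv_desc,
                      output_desc, &output)
        .ThenBiasAdd(output, biases, output_desc, &biased);
    if (!stream.ok()) {
      LOG(ERROR) << "stream entered an error state while enqueuing work";
    }

Each Then* call is a no-op once the stream is not ok(), so a single check after the chain is enough to detect that something upstream failed.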
+
+Stream &Stream::ThenSeparableConvolve(
+ const dnn::BatchDescriptor &batch_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::FilterDescriptor &filter_descriptor, int depth_multiplier,
+ const DeviceMemory<float> &first_weights,
+ const DeviceMemory<float> &second_weights,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> *output) {
+ VLOG_CALL(
+ PARAM(batch_descriptor), PARAM(input_data), PARAM(filter_descriptor),
+ PARAM(depth_multiplier), PARAM(first_weights), PARAM(second_weights),
+ PARAM(convolution_descriptor), PARAM(output_descriptor), PARAM(output));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoSeparableConvolve(
+ this, batch_descriptor, input_data, filter_descriptor,
+ depth_multiplier, first_weights, second_weights,
+ convolution_descriptor, output_descriptor, output));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenConvolveBackwardData(
+ const dnn::FilterDescriptor &filter_descriptor,
+ const DeviceMemory<float> &filter_data,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &input_descriptor,
+ DeviceMemory<float> *backward_input_data) {
+ VLOG_CALL(PARAM(filter_descriptor), PARAM(filter_data),
+ PARAM(output_descriptor), PARAM(backward_output_data),
+ PARAM(convolution_descriptor), PARAM(input_descriptor),
+ PARAM(backward_input_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoConvolveBackwardData(
+ this, filter_descriptor, filter_data, output_descriptor,
+ backward_output_data, convolution_descriptor, input_descriptor,
+ backward_input_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenConvolveBackwardFilter(
+ const dnn::BatchDescriptor &input_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::FilterDescriptor &filter_descriptor,
+ DeviceMemory<float> *backward_filter_data) {
+ VLOG_CALL(PARAM(input_descriptor), PARAM(input_data),
+ PARAM(output_descriptor), PARAM(backward_output_data),
+ PARAM(convolution_descriptor), PARAM(filter_descriptor),
+ PARAM(backward_filter_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoConvolveBackwardFilter(
+ this, input_descriptor, input_data, output_descriptor,
+ backward_output_data, convolution_descriptor, filter_descriptor,
+ backward_filter_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMatMul(const DeviceMemory<float> &input_data,
+ const DeviceMemory<float> &weights,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(input_data), PARAM(weights), PARAM(input_dimensions),
+ PARAM(output_dimensions), PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoMatMul(this, input_data, weights, input_dimensions,
+ output_dimensions, output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMatMulQuantized(
+ const DeviceMemory<float> &input_data, const DeviceMemory<int8> &weights,
+ const DeviceMemory<float> &weight_scales,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(input_data), PARAM(weights), PARAM(weight_scales),
+ PARAM(input_dimensions), PARAM(output_dimensions),
+ PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoMatMulQuantized(this, input_data, weights,
+ weight_scales, input_dimensions,
+ output_dimensions, output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMatMulQuantized(
+ const DeviceMemory<float> &input_data, const DeviceMemory<int16> &weights,
+ const DeviceMemory<float> &weight_scales,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(input_data), PARAM(weights), PARAM(weight_scales),
+ PARAM(input_dimensions), PARAM(output_dimensions),
+ PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoMatMulQuantized(this, input_data, weights,
+ weight_scales, input_dimensions,
+ output_dimensions, output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenBiasAdd(const DeviceMemory<float> &input_data,
+ const DeviceMemory<float> &biases,
+ const dnn::BatchDescriptor &dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(input_data), PARAM(biases), PARAM(dimensions),
+ PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(
+ dnn->DoBiasAdd(this, input_data, biases, dimensions, output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPoolForward(
+ const dnn::PoolingDescriptor &pooling_dimensions,
+ const dnn::BatchDescriptor &input_dimensions,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(pooling_dimensions), PARAM(input_dimensions),
+ PARAM(input_data), PARAM(output_dimensions), PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoPoolForward(this, pooling_dimensions, input_dimensions,
+ input_data, output_dimensions,
+ output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPoolBackward(
+ const dnn::PoolingDescriptor &pooling_dimensions,
+ const dnn::BatchDescriptor &input_dimensions,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ const DeviceMemory<float> &output_data,
+ const DeviceMemory<float> &input_diff_data,
+ DeviceMemory<float> *output_diff_data) {
+ VLOG_CALL(PARAM(pooling_dimensions), PARAM(input_dimensions),
+ PARAM(input_data), PARAM(output_dimensions), PARAM(output_data),
+ PARAM(input_diff_data), PARAM(output_diff_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoPoolBackward(this, pooling_dimensions, input_dimensions,
+ input_data, output_dimensions, output_data,
+ input_diff_data, output_diff_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenNormalize(
+ const dnn::NormalizeDescriptor &normalize_descriptor,
+ const DeviceMemory<float> &input_data, DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(normalize_descriptor), PARAM(input_data), PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoNormalize(this, normalize_descriptor, input_data,
+ output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenActivate(dnn::ActivationMode activation_mode,
+ const dnn::BatchDescriptor &dimensions,
+ const DeviceMemory<float> &input_data,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(activation_mode), PARAM(dimensions), PARAM(input_data),
+ PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoActivate(this, activation_mode, dimensions, input_data,
+ output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenDepthConcatenate(
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float> *> input_data,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(input_dimensions), PARAM(input_data), PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoDepthConcatenate(this, input_dimensions, input_data,
+ output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenElementwiseOperate(
+ dnn::ElementwiseOperation operation,
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float> *> input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data) {
+ VLOG_CALL(PARAM(operation), PARAM(input_dimensions), PARAM(input_data),
+ PARAM(output_dimensions), PARAM(output_data));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(dnn->DoElementwiseOperate(this, operation, input_dimensions,
+ input_data, output_dimensions,
+ output_data));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
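+// The quantized memcpy overloads below copy between unquantized float data on
+// the device and quantized host buffers (uint8, uint16, or int32 elements),
+// delegating the actual conversion to the DNN backend.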
+Stream &Stream::ThenMemcpyD2HQuantized(
+ const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<uint8> host_dst) {
+ VLOG_CALL(PARAM(gpu_unquantized_src), PARAM(host_dst));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(
+ dnn->DoMemcpyD2HQuantized(this, gpu_unquantized_src, host_dst));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemcpyD2HQuantized(
+ const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<uint16> host_dst) {
+ VLOG_CALL(PARAM(gpu_unquantized_src), PARAM(host_dst));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(
+ dnn->DoMemcpyD2HQuantized(this, gpu_unquantized_src, host_dst));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemcpyD2HQuantized(
+ const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<int32> host_dst) {
+ VLOG_CALL(PARAM(gpu_unquantized_src), PARAM(host_dst));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(
+ dnn->DoMemcpyD2HQuantized(this, gpu_unquantized_src, host_dst));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemcpyH2DQuantized(
+ port::ArraySlice<uint8> host_src,
+ DeviceMemory<float> *gpu_unquantized_dst) {
+ VLOG_CALL(PARAM(host_src), PARAM(gpu_unquantized_dst));
+
+ if (ok()) {
+ if (dnn::DnnSupport *dnn = parent_->AsDnn()) {
+ CheckError(
+ dnn->DoMemcpyH2DQuantized(this, host_src, gpu_unquantized_dst));
+ } else {
+ SetError();
+ LOG(WARNING)
+ << "attempting to perform DNN operation using StreamExecutor "
+ "without DNN support";
+ }
+ }
+ return *this;
+}
+
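+// Sub-streams are pooled: GetOrCreateSubStream() reuses an idle sub-stream if
+// one is available (marked true in sub_streams_), otherwise it creates and
+// initializes a new one; ReturnSubStream() marks a sub-stream as idle again.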
+Stream *Stream::GetOrCreateSubStream() {
+ mutex_lock lock{mu_};
+ for (auto &stream : sub_streams_) {
+ if (stream.second) {
+ stream.second = false;
+ return stream.first.get();
+ }
+ }
+ sub_streams_.emplace_back(std::unique_ptr<Stream>{new Stream{parent_}},
+ false);
+ Stream *sub_stream = sub_streams_.back().first.get();
+ sub_stream->Init();
+ CHECK(sub_stream->ok()) << "sub-stream failed to be initialized";
+
+ return sub_stream;
+}
+
+void Stream::ReturnSubStream(Stream *sub_stream) {
+ mutex_lock lock{mu_};
+ for (auto &stream : sub_streams_) {
+ if (stream.first.get() == sub_stream) {
+ stream.second = true;
+ return;
+ }
+ }
+ LOG(FATAL) << "the sub-stream to be returned was not created by this stream";
+}
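+
+// A minimal usage sketch for the sub-stream API above (illustrative only;
+// `parent` is an assumed, already-initialized Stream and EnqueueWorkOn is a
+// hypothetical helper):
+//
+//   Stream *sub = parent.GetOrCreateSubStream();
+//   EnqueueWorkOn(sub);           // enqueue independent work on the fork
+//   parent.ThenWaitFor(sub);      // join back into the parent stream
+//   parent.ReturnSubStream(sub);  // mark the sub-stream reusable again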
+
+Stream &Stream::ThenStartTimer(Timer *t) {
+ VLOG_CALL(PARAM(t));
+
+ if (ok()) {
+ CheckError(parent_->StartTimer(this, t));
+ } else {
+ LOG(INFO) << "stream " << this << " did not enqueue 'start timer': " << t;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenStopTimer(Timer *t) {
+ VLOG_CALL(PARAM(t));
+
+ if (ok()) {
+ CheckError(parent_->StopTimer(this, t));
+ } else {
+ LOG(INFO) << "stream " << this << " did not enqueue 'stop timer': " << t;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenWaitFor(Stream *other) {
+ VLOG_CALL(PARAM(other));
+
+ CHECK(this != other) << "stream cannot wait for itself";
+ if (ok() && other->ok()) {
+ CheckError(parent_->CreateStreamDependency(this, other));
+ } else {
+ SetError();
+ LOG(INFO) << "stream " << this << " did not wait for stream: " << other;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenWaitFor(std::vector<std::unique_ptr<Stream>> *others) {
+ VLOG_CALL(PARAM(others));
+
+ for (auto &stream : *others) {
+ CHECK_NE(stream.get(), this);
+ ThenWaitFor(stream.get());
+ }
+ return *this;
+}
+
+Stream &Stream::ThenWaitFor(Event *event) {
+ VLOG_CALL(PARAM(event));
+
+ if (ok()) {
+ port::Status status = parent_->WaitForEvent(this, event);
+ if (!status.ok()) {
+ LOG(ERROR) << "Error waiting for event in stream: "
+ << status.error_message()
+ << "; not marking stream as bad, as the Event object may be "
+ << "at fault. Monitor for further errors.";
+ }
+ } else {
+ LOG(INFO) << "stream " << this << " did not wait for an event.";
+ }
+ return *this;
+}
+
+// A functor that implements the ThenBlasXXX interfaces: it invokes the
+// corresponding DoBlasXXX function and checks for (and logs) errors.
+template <typename... Args>
+struct ThenBlasImpl {
+ // blas_func is the DoBlasXXX member function pointer, and args are its
+ // arguments, excluding the leading Stream* argument.
+ Stream &operator()(Stream *stream,
+ bool (blas::BlasSupport::*blas_func)(Stream *, Args...),
+ Args... args);
+};
+
+template <typename... Args>
+Stream &ThenBlasImpl<Args...>::operator()(
+ Stream *stream, bool (blas::BlasSupport::*blas_func)(Stream *, Args...),
+ Args... args) {
+ if (stream->ok()) {
+ if (blas::BlasSupport *blas = stream->parent_->AsBlas()) {
+ stream->CheckError((blas->*blas_func)(stream, args...));
+ } else {
+ stream->CheckError(false);
+ LOG(WARNING)
+ << "attempting to perform BLAS operation using StreamExecutor "
+ "without BLAS support";
+ }
+ }
+ return *stream;
+}
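+
+// Each ThenBlasXXX wrapper below instantiates ThenBlasImpl with the argument
+// types of the corresponding DoBlasXXX member function and forwards its
+// parameters; ThenBlasAsum immediately below is the first such example.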
+
+Stream &Stream::ThenBlasAsum(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int, DeviceMemory<float> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasAsum, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasAsum(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAsum, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasAsum(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<float> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAsum, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasAsum(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<double> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAsum, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasAxpy(uint64 elem_count, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<uint64, float, const DeviceMemory<float> &, int,
+ DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAxpy, elem_count, alpha, x, incx,
+ y, incy);
+}
+
+Stream &Stream::ThenBlasAxpy(uint64 elem_count, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<uint64, double, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAxpy, elem_count, alpha, x, incx,
+ y, incy);
+}
+
+Stream &Stream::ThenBlasAxpy(uint64 elem_count, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<std::complex<float>> *y,
+ int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAxpy, elem_count, alpha, x, incx,
+ y, incy);
+}
+
+Stream &Stream::ThenBlasAxpy(uint64 elem_count, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<std::complex<double>> *y,
+ int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasAxpy, elem_count, alpha, x, incx,
+ y, incy);
+}
+
+Stream &Stream::ThenBlasCopy(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasCopy, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasCopy(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasCopy, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasCopy(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<std::complex<float>> *y,
+ int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasCopy, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasCopy(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<std::complex<double>> *y,
+ int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasCopy, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasDot(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int,
+ const DeviceMemory<float> &, int, DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDot, elem_count, x, incx, y, incy,
+ result);
+}
+
+Stream &Stream::ThenBlasDot(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int,
+ const DeviceMemory<double> &, int, DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDot, elem_count, x, incx, y, incy,
+ result);
+}
+
+Stream &Stream::ThenBlasDotc(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy,
+ DeviceMemory<std::complex<float>> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDotc, elem_count, x, incx, y,
+ incy, result);
+}
+
+Stream &Stream::ThenBlasDotc(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy,
+ DeviceMemory<std::complex<double>> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDotc, elem_count, x, incx, y,
+ incy, result);
+}
+
+Stream &Stream::ThenBlasDotu(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy,
+ DeviceMemory<std::complex<float>> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDotu, elem_count, x, incx, y,
+ incy, result);
+}
+
+Stream &Stream::ThenBlasDotu(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy,
+ DeviceMemory<std::complex<double>> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasDotu, elem_count, x, incx, y,
+ incy, result);
+}
+
+Stream &Stream::ThenBlasNrm2(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int, DeviceMemory<float> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasNrm2, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasNrm2(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasNrm2, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasNrm2(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<float> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasNrm2, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasNrm2(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<double> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasNrm2, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasRot(uint64 elem_count, DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy, float c,
+ float s) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(c), PARAM(s));
+
+ ThenBlasImpl<uint64, DeviceMemory<float> *, int, DeviceMemory<float> *, int,
+ float, float> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRot, elem_count, x, incx, y, incy,
+ c, s);
+}
+
+Stream &Stream::ThenBlasRot(uint64 elem_count, DeviceMemory<double> *x,
+ int incx, DeviceMemory<double> *y, int incy,
+ double c, double s) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(c), PARAM(s));
+
+ ThenBlasImpl<uint64, DeviceMemory<double> *, int, DeviceMemory<double> *, int,
+ double, double> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRot, elem_count, x, incx, y, incy,
+ c, s);
+}
+
+Stream &Stream::ThenBlasRot(uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy,
+ float c, float s) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(c), PARAM(s));
+
+ ThenBlasImpl<uint64, DeviceMemory<std::complex<float>> *, int,
+ DeviceMemory<std::complex<float>> *, int, float, float> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRot, elem_count, x, incx, y, incy,
+ c, s);
+}
+
+Stream &Stream::ThenBlasRot(uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy,
+ double c, double s) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(c), PARAM(s));
+
+ ThenBlasImpl<uint64, DeviceMemory<std::complex<double>> *, int,
+ DeviceMemory<std::complex<double>> *, int, double, double> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRot, elem_count, x, incx, y, incy,
+ c, s);
+}
+
+Stream &Stream::ThenBlasRotg(DeviceMemory<float> *a, DeviceMemory<float> *b,
+ DeviceMemory<float> *c, DeviceMemory<float> *s) {
+ VLOG_CALL(PARAM(a), PARAM(b), PARAM(c), PARAM(s));
+
+ ThenBlasImpl<DeviceMemory<float> *, DeviceMemory<float> *,
+ DeviceMemory<float> *, DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotg, a, b, c, s);
+}
+
+Stream &Stream::ThenBlasRotg(DeviceMemory<double> *a, DeviceMemory<double> *b,
+ DeviceMemory<double> *c, DeviceMemory<double> *s) {
+ VLOG_CALL(PARAM(a), PARAM(b), PARAM(c), PARAM(s));
+
+ ThenBlasImpl<DeviceMemory<double> *, DeviceMemory<double> *,
+ DeviceMemory<double> *, DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotg, a, b, c, s);
+}
+
+Stream &Stream::ThenBlasRotg(DeviceMemory<std::complex<float>> *a,
+ DeviceMemory<std::complex<float>> *b,
+ DeviceMemory<float> *c,
+ DeviceMemory<std::complex<float>> *s) {
+ VLOG_CALL(PARAM(a), PARAM(b), PARAM(c), PARAM(s));
+
+ ThenBlasImpl<DeviceMemory<std::complex<float>> *,
+ DeviceMemory<std::complex<float>> *, DeviceMemory<float> *,
+ DeviceMemory<std::complex<float>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotg, a, b, c, s);
+}
+
+Stream &Stream::ThenBlasRotg(DeviceMemory<std::complex<double>> *a,
+ DeviceMemory<std::complex<double>> *b,
+ DeviceMemory<double> *c,
+ DeviceMemory<std::complex<double>> *s) {
+ VLOG_CALL(PARAM(a), PARAM(b), PARAM(c), PARAM(s));
+
+ ThenBlasImpl<DeviceMemory<std::complex<double>> *,
+ DeviceMemory<std::complex<double>> *, DeviceMemory<double> *,
+ DeviceMemory<std::complex<double>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotg, a, b, c, s);
+}
+
+Stream &Stream::ThenBlasRotm(uint64 elem_count, DeviceMemory<float> *x,
+ int incx, DeviceMemory<float> *y, int incy,
+ const DeviceMemory<float> &param) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(param));
+
+ ThenBlasImpl<uint64, DeviceMemory<float> *, int, DeviceMemory<float> *, int,
+ const DeviceMemory<float> &> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotm, elem_count, x, incx, y,
+ incy, param);
+}
+
+Stream &Stream::ThenBlasRotm(uint64 elem_count, DeviceMemory<double> *x,
+ int incx, DeviceMemory<double> *y, int incy,
+ const DeviceMemory<double> &param) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy),
+ PARAM(param));
+
+ ThenBlasImpl<uint64, DeviceMemory<double> *, int, DeviceMemory<double> *, int,
+ const DeviceMemory<double> &> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotm, elem_count, x, incx, y,
+ incy, param);
+}
+
+Stream &Stream::ThenBlasRotmg(DeviceMemory<float> *d1, DeviceMemory<float> *d2,
+ DeviceMemory<float> *x1,
+ const DeviceMemory<float> &y1,
+ DeviceMemory<float> *param) {
+ VLOG_CALL(PARAM(d1), PARAM(d2), PARAM(x1), PARAM(y1), PARAM(param));
+
+ ThenBlasImpl<DeviceMemory<float> *, DeviceMemory<float> *,
+ DeviceMemory<float> *, const DeviceMemory<float> &,
+ DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotmg, d1, d2, x1, y1, param);
+}
+
+Stream &Stream::ThenBlasRotmg(DeviceMemory<double> *d1,
+ DeviceMemory<double> *d2,
+ DeviceMemory<double> *x1,
+ const DeviceMemory<double> &y1,
+ DeviceMemory<double> *param) {
+ VLOG_CALL(PARAM(d1), PARAM(d2), PARAM(x1), PARAM(y1), PARAM(param));
+
+ ThenBlasImpl<DeviceMemory<double> *, DeviceMemory<double> *,
+ DeviceMemory<double> *, const DeviceMemory<double> &,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasRotmg, d1, d2, x1, y1, param);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, float alpha,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, double alpha,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, float alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, float, DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, double alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, double, DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, std::complex<float> alpha,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasScal(uint64 elem_count, std::complex<double> alpha,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ VLOG_CALL(PARAM(elem_count), PARAM(alpha), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<uint64, std::complex<double>,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasScal, elem_count, alpha, x, incx);
+}
+
+Stream &Stream::ThenBlasSwap(uint64 elem_count, DeviceMemory<float> *x,
+ int incx, DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, DeviceMemory<float> *, int, DeviceMemory<float> *, int>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasSwap, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasSwap(uint64 elem_count, DeviceMemory<double> *x,
+ int incx, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, DeviceMemory<double> *, int, DeviceMemory<double> *, int>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasSwap, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasSwap(uint64 elem_count,
+ DeviceMemory<std::complex<float>> *x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, DeviceMemory<std::complex<float>> *, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSwap, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasSwap(uint64 elem_count,
+ DeviceMemory<std::complex<double>> *x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<uint64, DeviceMemory<std::complex<double>> *, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSwap, elem_count, x, incx, y,
+ incy);
+}
+
+Stream &Stream::ThenBlasIamax(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int, DeviceMemory<int> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamax, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamax(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int, DeviceMemory<int> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamax, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamax(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<int> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamax, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamax(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<int> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamax, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamin(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<float> &, int, DeviceMemory<int> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamin, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamin(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<double> &, int, DeviceMemory<int> *>
+ impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamin, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamin(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<int> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamin, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasIamin(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<int> *result) {
+ VLOG_CALL(PARAM(elem_count), PARAM(x), PARAM(incx), PARAM(result));
+
+ ThenBlasImpl<uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<int> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasIamin, elem_count, x, incx,
+ result);
+}
+
+Stream &Stream::ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n,
+ uint64 kl, uint64 ku, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(kl), PARAM(ku),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x), PARAM(incx),
+ PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGbmv, trans, m, n, kl, ku, alpha,
+ a, lda, x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n,
+ uint64 kl, uint64 ku, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(kl), PARAM(ku),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x), PARAM(incx),
+ PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGbmv, trans, m, n, kl, ku, alpha,
+ a, lda, x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n,
+ uint64 kl, uint64 ku, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(kl), PARAM(ku),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x), PARAM(incx),
+ PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGbmv, trans, m, n, kl, ku, alpha,
+ a, lda, x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n,
+ uint64 kl, uint64 ku, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(kl), PARAM(ku),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x), PARAM(incx),
+ PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGbmv, trans, m, n, kl, ku, alpha,
+ a, lda, x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(alpha), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx), PARAM(beta), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemv, trans, m, n, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(alpha), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx), PARAM(beta), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemv, trans, m, n, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(alpha), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx), PARAM(beta), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemv, trans, m, n, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(trans), PARAM(m), PARAM(n), PARAM(alpha), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx), PARAM(beta), PARAM(y),
+ PARAM(incy));
+
+ ThenBlasImpl<blas::Transpose, uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemv, trans, m, n, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasGer(uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, float, const DeviceMemory<float> &, int,
+ const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGer, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasGer(uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, double, const DeviceMemory<double> &, int,
+ const DeviceMemory<double> &, int, DeviceMemory<double> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGer, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasGerc(uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy, DeviceMemory<std::complex<float>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGerc, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasGerc(uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy, DeviceMemory<std::complex<double>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGerc, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasGeru(uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy, DeviceMemory<std::complex<float>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGeru, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasGeru(uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy, DeviceMemory<std::complex<double>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(m), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx), PARAM(y),
+ PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGeru, m, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasHbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(k), PARAM(alpha), PARAM(a), PARAM(lda),
+ PARAM(x), PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHbmv, uplo, n, k, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(k), PARAM(alpha), PARAM(a), PARAM(lda),
+ PARAM(x), PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHbmv, uplo, n, k, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHemv(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHemv, uplo, n, alpha, a, lda, x,
+ incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHemv(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHemv, uplo, n, alpha, a, lda, x,
+ incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHer(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<std::complex<float>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer, uplo, n, alpha, x, incx, a,
+ lda);
+}
+
+Stream &Stream::ThenBlasHer(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<std::complex<double>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer, uplo, n, alpha, x, incx, a,
+ lda);
+}
+
+Stream &Stream::ThenBlasHer2(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy, DeviceMemory<std::complex<float>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer2, uplo, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasHer2(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy, DeviceMemory<std::complex<double>> *a,
+ int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer2, uplo, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasHpmv(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &ap,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(ap), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &,
+ const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpmv, uplo, n, alpha, ap, x, incx,
+ beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHpmv(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &ap,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(ap), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &,
+ const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpmv, uplo, n, alpha, ap, x, incx,
+ beta, y, incy);
+}
+
+Stream &Stream::ThenBlasHpr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx, DeviceMemory<std::complex<float>> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpr, uplo, n, alpha, x, incx, ap);
+}
+
+Stream &Stream::ThenBlasHpr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx, DeviceMemory<std::complex<double>> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpr, uplo, n, alpha, x, incx, ap);
+}
+
+Stream &Stream::ThenBlasHpr2(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x,
+ int incx,
+ const DeviceMemory<std::complex<float>> &y,
+ int incy, DeviceMemory<std::complex<float>> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpr2, uplo, n, alpha, x, incx, y,
+ incy, ap);
+}
+
+Stream &Stream::ThenBlasHpr2(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x,
+ int incx,
+ const DeviceMemory<std::complex<double>> &y,
+ int incy, DeviceMemory<std::complex<double>> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHpr2, uplo, n, alpha, x, incx, y,
+ incy, ap);
+}
+
+Stream &Stream::ThenBlasSbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ float alpha, const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(k), PARAM(alpha), PARAM(a), PARAM(lda),
+ PARAM(x), PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSbmv, uplo, n, k, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(k), PARAM(alpha), PARAM(a), PARAM(lda),
+ PARAM(x), PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSbmv, uplo, n, k, alpha, a, lda,
+ x, incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSpmv(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &ap,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(ap), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ const DeviceMemory<float> &, int, float, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpmv, uplo, n, alpha, ap, x, incx,
+ beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSpmv(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &ap,
+ const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(ap), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ const DeviceMemory<double> &, int, double,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpmv, uplo, n, alpha, ap, x, incx,
+ beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSpr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ int, DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpr, uplo, n, alpha, x, incx, ap);
+}
+
+Stream &Stream::ThenBlasSpr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ int, DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpr, uplo, n, alpha, x, incx, ap);
+}
+
+Stream &Stream::ThenBlasSpr2(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ int, const DeviceMemory<float> &, int,
+ DeviceMemory<float> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpr2, uplo, n, alpha, x, incx, y,
+ incy, ap);
+}
+
+Stream &Stream::ThenBlasSpr2(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *ap) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(ap));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ int, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSpr2, uplo, n, alpha, x, incx, y,
+ incy, ap);
+}
+
+Stream &Stream::ThenBlasSymv(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ int, const DeviceMemory<float> &, int, float,
+ DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymv, uplo, n, alpha, a, lda, x,
+ incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSymv(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(x),
+ PARAM(incx), PARAM(beta), PARAM(y), PARAM(incy));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ int, const DeviceMemory<double> &, int, double,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymv, uplo, n, alpha, a, lda, x,
+ incx, beta, y, incy);
+}
+
+Stream &Stream::ThenBlasSyr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *a, int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ int, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr, uplo, n, alpha, x, incx, a,
+ lda);
+}
+
+Stream &Stream::ThenBlasSyr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *a, int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ int, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr, uplo, n, alpha, x, incx, a,
+ lda);
+}
+
+Stream &Stream::ThenBlasSyr2(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, float, const DeviceMemory<float> &,
+ int, const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2, uplo, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasSyr2(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda) {
+ VLOG_CALL(PARAM(uplo), PARAM(n), PARAM(alpha), PARAM(x), PARAM(incx),
+ PARAM(y), PARAM(incy), PARAM(a), PARAM(lda));
+
+ ThenBlasImpl<blas::UpperLower, uint64, double, const DeviceMemory<double> &,
+ int, const DeviceMemory<double> &, int, DeviceMemory<double> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2, uplo, n, alpha, x, incx, y,
+ incy, a, lda);
+}
+
+Stream &Stream::ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbmv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbmv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbmv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbmv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbsv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbsv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbsv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(k),
+ PARAM(a), PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ uint64, const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTbsv, uplo, trans, diag, n, k, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<float> &, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpmv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<double> &, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpmv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<float>> &,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpmv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<double>> &,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpmv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<float> &, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpsv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<double> &, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpsv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<float>> &,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpsv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(ap),
+ PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<double>> &,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTpsv, uplo, trans, diag, n, ap, x,
+ incx);
+}
+
+Stream &Stream::ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<double> &, int, DeviceMemory<double> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<float> &, int, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<double> &, int, DeviceMemory<double> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *x,
+ int incx) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(diag), PARAM(n), PARAM(a),
+ PARAM(lda), PARAM(x), PARAM(incx));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, blas::Diagonal, uint64,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsv, uplo, trans, diag, n, a,
+ lda, x, incx);
+}
+
+Stream &Stream::ThenBlasGemm(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemm, transa, transb, m, n, k,
+ alpha, a, lda, b, ldb, beta, c, ldc);
+}
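+
+// Usage sketch for the GEMM entry point above (not authoritative; the buffer
+// names are hypothetical and the blas::Transpose enumerator names are assumed
+// from blas.h):
+//
+//   DeviceMemory<float> a = ..., b = ..., c = ...;  // caller-owned matrices
+//   stream.ThenBlasGemm(blas::Transpose::kNoTranspose,
+//                       blas::Transpose::kNoTranspose, m, n, k,
+//                       /*alpha=*/1.0f, a, lda, b, ldb, /*beta=*/0.0f, &c,
+//                       ldc);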
+
+Stream &Stream::ThenBlasGemm(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb,
+ double beta, DeviceMemory<double> *c, int ldc) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemm, transa, transb, m, n, k,
+ alpha, a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasGemm(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &b,
+ int ldb, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemm, transa, transb, m, n, k,
+ alpha, a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasGemm(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &b,
+ int ldb, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemm, transa, transb, m, n, k,
+ alpha, a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHemm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &b,
+ int ldb, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHemm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHemm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &b,
+ int ldb, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHemm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHerk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, float alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, float beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, float,
+ const DeviceMemory<std::complex<float>> &, int, float,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHerk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHerk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, double alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, double beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, double,
+ const DeviceMemory<std::complex<double>> &, int, double,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHerk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHer2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &b,
+ int ldb, float beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int, float,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasHer2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &b,
+ int ldb, double beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int, double,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasHer2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb,
+ double beta, DeviceMemory<double> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &b,
+ int ldb, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &b,
+ int ldb, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(m), PARAM(n), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSymm, side, uplo, m, n, alpha, a,
+ lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, float alpha,
+ const DeviceMemory<float> &a, int lda, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, float,
+ const DeviceMemory<float> &, int, float, DeviceMemory<float> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyrk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ double beta, DeviceMemory<double> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, double,
+ const DeviceMemory<double> &, int, double,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyrk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyrk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(beta), PARAM(c), PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyrk, uplo, trans, n, k, alpha, a,
+ lda, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, float,
+ const DeviceMemory<float> &, int, const DeviceMemory<float> &,
+ int, float, DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb,
+ double beta, DeviceMemory<double> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64, double,
+ const DeviceMemory<double> &, int, const DeviceMemory<double> &,
+ int, double, DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda,
+ const DeviceMemory<std::complex<float>> &b,
+ int ldb, std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<float>, const DeviceMemory<std::complex<float>> &,
+ int, const DeviceMemory<std::complex<float>> &, int,
+ std::complex<float>, DeviceMemory<std::complex<float>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda,
+ const DeviceMemory<std::complex<double>> &b,
+ int ldb, std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc) {
+ VLOG_CALL(PARAM(uplo), PARAM(trans), PARAM(n), PARAM(k), PARAM(alpha),
+ PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb), PARAM(beta), PARAM(c),
+ PARAM(ldc));
+
+ ThenBlasImpl<blas::UpperLower, blas::Transpose, uint64, uint64,
+ std::complex<double>, const DeviceMemory<std::complex<double>> &,
+ int, const DeviceMemory<std::complex<double>> &, int,
+ std::complex<double>, DeviceMemory<std::complex<double>> *,
+ int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasSyr2k, uplo, trans, n, k, alpha,
+ a, lda, b, ldb, beta, c, ldc);
+}
+
+Stream &Stream::ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, float, const DeviceMemory<float> &, int,
+ DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, double, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *b,
+ int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *b,
+ int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrmm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *b, int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, float, const DeviceMemory<float> &, int,
+ DeviceMemory<float> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *b, int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, double, const DeviceMemory<double> &, int,
+ DeviceMemory<double> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a,
+ int lda, DeviceMemory<std::complex<float>> *b,
+ int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, std::complex<float>,
+ const DeviceMemory<std::complex<float>> &, int,
+ DeviceMemory<std::complex<float>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag,
+ uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a,
+ int lda, DeviceMemory<std::complex<double>> *b,
+ int ldb) {
+ VLOG_CALL(PARAM(side), PARAM(uplo), PARAM(transa), PARAM(diag), PARAM(m),
+ PARAM(n), PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb));
+
+ ThenBlasImpl<blas::Side, blas::UpperLower, blas::Transpose, blas::Diagonal,
+ uint64, uint64, std::complex<double>,
+ const DeviceMemory<std::complex<double>> &, int,
+ DeviceMemory<std::complex<double>> *, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasTrsm, side, uplo, transa, diag, m,
+ n, alpha, a, lda, b, ldb);
+}
+
+Stream &Stream::ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, float alpha, const port::ArraySlice<DeviceMemory<float> *> &a,
+ int lda, const port::ArraySlice<DeviceMemory<float> *> &b, int ldb,
+ float beta, const port::ArraySlice<DeviceMemory<float> *> &c, int ldc,
+ int batch_count) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc), PARAM(batch_count));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64, float,
+ const port::ArraySlice<DeviceMemory<float> *> &, int,
+ const port::ArraySlice<DeviceMemory<float> *> &, int, float,
+ const port::ArraySlice<DeviceMemory<float> *> &, int, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemmBatched, transa, transb, m, n,
+ k, alpha, a, lda, b, ldb, beta, c, ldc, batch_count);
+}
+
+Stream &Stream::ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, double alpha, const port::ArraySlice<DeviceMemory<double> *> &a,
+ int lda, const port::ArraySlice<DeviceMemory<double> *> &b, int ldb,
+ double beta, const port::ArraySlice<DeviceMemory<double> *> &c, int ldc,
+ int batch_count) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc), PARAM(batch_count));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64, double,
+ const port::ArraySlice<DeviceMemory<double> *> &, int,
+ const port::ArraySlice<DeviceMemory<double> *> &, int, double,
+ const port::ArraySlice<DeviceMemory<double> *> &, int, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemmBatched, transa, transb, m, n,
+ k, alpha, a, lda, b, ldb, beta, c, ldc, batch_count);
+}
+
+Stream &Stream::ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &b, int ldb,
+ std::complex<float> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &c, int ldc,
+ int batch_count) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc), PARAM(batch_count));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64,
+ std::complex<float>,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &,
+ int,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &,
+ int, std::complex<float>,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &,
+ int, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemmBatched, transa, transb, m, n,
+ k, alpha, a, lda, b, ldb, beta, c, ldc, batch_count);
+}
+
+Stream &Stream::ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &b, int ldb,
+ std::complex<double> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &c, int ldc,
+ int batch_count) {
+ VLOG_CALL(PARAM(transa), PARAM(transb), PARAM(m), PARAM(n), PARAM(k),
+ PARAM(alpha), PARAM(a), PARAM(lda), PARAM(b), PARAM(ldb),
+ PARAM(beta), PARAM(c), PARAM(ldc), PARAM(batch_count));
+
+ ThenBlasImpl<blas::Transpose, blas::Transpose, uint64, uint64, uint64,
+ std::complex<double>,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &,
+ int,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &,
+ int, std::complex<double>,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &,
+ int, int> impl;
+ return impl(this, &blas::BlasSupport::DoBlasGemmBatched, transa, transb, m, n,
+ k, alpha, a, lda, b, ldb, beta, c, ldc, batch_count);
+}
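+
+// Usage sketch for the batched GEMM entry points above (a sketch; the pointer
+// vectors are hypothetical and must each hold batch_count entries):
+//
+//   std::vector<DeviceMemory<float> *> a_ptrs, b_ptrs, c_ptrs;
+//   stream.ThenBlasGemmBatched(blas::Transpose::kNoTranspose,
+//                              blas::Transpose::kNoTranspose, m, n, k, 1.0f,
+//                              a_ptrs, lda, b_ptrs, ldb, 0.0f, c_ptrs, ldc,
+//                              batch_count);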
+
+Stream &Stream::ThenSetRngSeed(const uint8 *seed, uint64 seed_bytes) {
+ VLOG_CALL(PARAM(seed), PARAM(seed_bytes));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->SetSeed(this, seed, seed_bytes));
+ } else {
+ SetError();
+ LOG(INFO) << "stream " << this << " unable to initialize RNG";
+ }
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not set RNG seed: " << static_cast<const void *>(seed)
+ << "; bytes: " << seed_bytes;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandUniform(DeviceMemory<float> *values) {
+ VLOG_CALL(PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandUniform(this, values));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandGaussian(float mean, float sd,
+ DeviceMemory<float> *values) {
+ VLOG_CALL(PARAM(mean), PARAM(sd), PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandGaussian(this, mean, sd, values));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandGaussian(double mean, double sd,
+ DeviceMemory<double> *values) {
+ VLOG_CALL(PARAM(mean), PARAM(sd), PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandGaussian(this, mean, sd, values));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandUniform(DeviceMemory<double> *values) {
+ VLOG_CALL(PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandUniform(this, values));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandUniform(
+ DeviceMemory<std::complex<float>> *values) {
+ VLOG_CALL(PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandUniform(this, values));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenPopulateRandUniform(
+ DeviceMemory<std::complex<double>> *values) {
+ VLOG_CALL(PARAM(values));
+
+ if (ok()) {
+ if (rng::RngSupport *rng = parent_->AsRng()) {
+ CheckError(rng->DoPopulateRandUniform(this, values));
+ } else {
+ SetError();
+ LOG(INFO) << "stream " << this
+ << " attempting to perform RNG operation using StreamExecutor "
+ "without RNG support.";
+ }
+ }
+ return *this;
+}
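+
+// Usage sketch for the RNG entry points above (a sketch; the seed length and
+// distribution parameters are illustrative only):
+//
+//   uint8 seed[16] = {};
+//   DeviceMemory<float> values = ...;  // caller-allocated output buffer
+//   stream.ThenSetRngSeed(seed, sizeof(seed))
+//       .ThenPopulateRandUniform(&values)
+//       .ThenPopulateRandGaussian(/*mean=*/0.0f, /*sd=*/1.0f, &values);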
+
+Stream &Stream::ThenMemcpy(void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ VLOG_CALL(PARAM(host_dst), PARAM(gpu_src), PARAM(size));
+
+ if (ok()) {
+ CheckError(parent_->Memcpy(this, host_dst, gpu_src, size));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not memcpy device-to-host; source: " << gpu_src.opaque();
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemcpy(DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size) {
+ VLOG_CALL(PARAM(gpu_dst), PARAM(host_src), PARAM(size));
+
+ if (ok()) {
+ CheckError(parent_->Memcpy(this, gpu_dst, host_src, size));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not memcpy host-to-device; source: " << host_src;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemcpy(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size) {
+ VLOG_CALL(PARAM(gpu_dst), PARAM(gpu_src), PARAM(size));
+
+ if (ok()) {
+ CheckError(parent_->MemcpyDeviceToDevice(this, gpu_dst, gpu_src, size));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not memcpy gpu-to-gpu; source: " << &gpu_src;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemZero(DeviceMemoryBase *location, uint64 size) {
+ VLOG_CALL(PARAM(location), PARAM(size));
+
+ if (ok()) {
+ CheckError(parent_->MemZero(this, location, size));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not memzero GPU location; source: " << location;
+ }
+ return *this;
+}
+
+Stream &Stream::ThenMemset32(DeviceMemoryBase *location, const uint32 &pattern,
+ uint64 size) {
+ VLOG_CALL(PARAM(location), PARAM(pattern), PARAM(size));
+
+ if (ok()) {
+ CheckError(parent_->Memset32(this, location, pattern, size));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " did not memset GPU location; source: " << location
+ << "; size: " << size << "; pattern: " << std::hex << pattern;
+ }
+ return *this;
+}
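+
+// Usage sketch for the memory entry points above (a sketch; buffer names are
+// hypothetical and sizes are in bytes):
+//
+//   stream.ThenMemcpy(&device_buf, host_src, size)     // host -> device
+//       .ThenMemZero(&scratch, scratch_size)           // clear device scratch
+//       .ThenMemset32(&flags, 0xdeadbeef, flags_size)  // fill 32-bit pattern
+//       .ThenMemcpy(host_dst, device_buf, size);       // device -> host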
+
+Stream &Stream::ThenDoHostCallbackForTest(std::function<void()> callback) {
+ VLOG_CALL(PARAM(callback));
+
+ return ThenDoHostCallback(callback);
+}
+
+Stream &Stream::ThenDoHostCallback(std::function<void()> callback) {
+ VLOG_CALL(PARAM(callback));
+
+ if (ok()) {
+ CheckError(parent_->HostCallback(this, callback));
+ } else {
+ LOG(INFO) << "stream " << this
+ << " was in error state before adding host callback";
+ }
+ return *this;
+}
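+
+// Usage sketch: the callback runs on the host once the stream's execution
+// reaches this point in its work queue (a sketch; the lambda body is
+// illustrative):
+//
+//   stream.ThenDoHostCallback([] { LOG(INFO) << "stream reached checkpoint"; });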
+
+Stream &Stream::ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<std::complex<float>> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<std::complex<double>> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenFft(fft::Plan *plan, const DeviceMemory<float> &input,
+ DeviceMemory<std::complex<float>> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenFft(fft::Plan *plan, const DeviceMemory<double> &input,
+ DeviceMemory<std::complex<double>> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<float> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
+
+Stream &Stream::ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<double> *output) {
+ VLOG_CALL(PARAM(plan), PARAM(input), PARAM(output));
+
+ if (ok()) {
+ if (fft::FftSupport *fft = parent_->AsFft()) {
+ CheckError(fft->DoFft(this, plan, input, output));
+ } else {
+ SetError();
+ LOG(INFO) << "attempting to perform FFT operation using StreamExecutor "
+ "without FFT support";
+ }
+ }
+ return *this;
+}
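+
+// Usage sketch for the FFT entry points above (a sketch; creating the
+// fft::Plan is platform-specific and not shown here):
+//
+//   fft::Plan *plan = ...;  // obtained from the executor's FFT support
+//   DeviceMemory<std::complex<float>> in = ..., out = ...;
+//   stream.ThenFft(plan, in, &out);
+//   stream.BlockHostUntilDone();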
+
+// It looks confusing, but all this is doing is inserting a callback at the
+// present point in the stream to then enqueue a task on the host executor.
+Stream &Stream::ThenEnqueueOnBackgroundThread(
+ std::function<void(StreamExecutor *)> task) {
+ VLOG_CALL(PARAM(task));
+
+ StreamExecutor *stream_executor = this->parent_;
+ std::function<void()> bound_task = std::bind(task, stream_executor);
+
+ return ThenDoHostCallback([stream_executor, bound_task]() {
+ stream_executor->EnqueueOnBackgroundThread(bound_task);
+ });
+}
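+
+// Usage sketch (the lambda body is illustrative): the task is handed to the
+// executor's background thread once the stream reaches this point.
+//
+//   stream.ThenEnqueueOnBackgroundThread(
+//       [](StreamExecutor *executor) { /* long-running host-side work */ });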
+
+bool Stream::BlockHostUntilDone() {
+ VLOG_CALL();
+
+ if (!ok()) {
+ LOG(INFO)
+ << "stream " << this
+ << " did not block host until done; was already in an error state";
+ return false;
+ }
+
+ {
+ // Wait until all active sub-streams have done their tasks.
+ mutex_lock lock{mu_};
+ for (auto &stream : sub_streams_) {
+ if (!stream.second) {
+ CheckError(stream.first->BlockHostUntilDone());
+ // Set this sub-stream as available.
+ stream.second = true;
+ }
+ }
+ }
+
+ temporary_memory_manager_.DeallocateFinalizedTemporaries();
+
+ CheckError(parent_->BlockHostUntilDone(this));
+ return ok();
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/stream.h b/tensorflow/stream_executor/stream.h
new file mode 100644
index 0000000000..d4d5e7729b
--- /dev/null
+++ b/tensorflow/stream_executor/stream.h
@@ -0,0 +1,1258 @@
+// The Stream is used in conjunction with the StreamExecutor "parent" to
+// perform actions with a linear stream of dependencies. Dependencies can also
+// be created between Streams to do task management (i.e. limit which tasks
+// can be performed concurrently and specify what task dependencies exist).
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_STREAM_H_
+#define TENSORFLOW_STREAM_EXECUTOR_STREAM_H_
+
+#include <complex>
+#include <functional>
+#include <memory>
+
+#include "tensorflow/stream_executor/blas.h"
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/event.h"
+#include "tensorflow/stream_executor/fft.h"
+#include "tensorflow/stream_executor/kernel.h"
+#include "tensorflow/stream_executor/launch_dim.h"
+#include "tensorflow/stream_executor/lib/array_slice.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/temporary_memory_manager.h"
+
+namespace perftools {
+namespace gputools {
+
+namespace host {
+class HostBlas;
+class HostFft;
+class HostRng;
+class HostTimer;
+} // namespace host
+
+namespace ocl {
+class CLBlas;
+} // namespace ocl
+
+namespace internal {
+class StreamInterface;
+} // namespace internal
+
+class DeviceMemoryBase;
+template <typename ElemT>
+class DeviceMemory;
+
+class Timer;
+
+namespace dnn {
+struct BatchDescriptor;
+struct FilterDescriptor;
+struct ConvolutionDescriptor;
+} // namespace dnn
+
+class StreamExecutor;
+
+// Represents a stream of dependent computations on a GPU device.
+//
+// The operations within a stream execute linearly and asynchronously until
+// BlockHostUntilDone() is invoked, which synchronously joins host code with
+// the execution of the stream.
+//
+// If any given operation fails when entraining work for the stream, ok() will
+// indicate that an error has occurred. After initialization, once a stream is
+// !ok(), it will never be ok().
+//
+// Thread-safe post-initialization.
+class Stream {
+ public:
+ // Instantiate a stream tied to parent as a platform executor. Work
+ // entrained onto this stream will be launched/managed on that
+ // StreamExecutor's platform.
+ explicit Stream(StreamExecutor *parent);
+
+ // Test only. Use an externally-populated value (like a mock) for the
+ // platform-specific stream implementation.
+ Stream(StreamExecutor *parent, internal::StreamInterface *implementation);
+
+ // Deallocates any stream resources that the parent StreamExecutor has
+ // bestowed upon this object.
+ ~Stream();
+
+ // Returns whether any errors have occurred while entraining work for this
+ // stream.
+ bool ok() const { return !InErrorState(); }
+
+ // Initialize the stream. This must be performed before entraining any other
+ // operations.
+ Stream &Init();
+
+ // Initializes timer t via the StreamExecutor.
+ Stream &InitTimer(Timer *t);
+
+ // Convenience wrapper around Init() and InitTimer().
+ Stream &InitWithTimer(Timer *t);
+
+ // Gets or creates a sub-stream from this stream. If a sub-stream in the
+ // pool can be reused, it is returned; otherwise a new sub-stream is created.
+ //
+ // Warning! After calling BlockHostUntilDone(), all sub-streams will be
+ // returned and hence invalid. This may be a temporary solution to the issue
+ // b/18070215.
+ Stream *GetOrCreateSubStream();
+
+ // Returns the sub-stream back to this stream's pool so that it can be
+ // reused later.
+ void ReturnSubStream(Stream *sub_stream);
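+
+ // A minimal sketch of the sub-stream pattern (the scratch buffer and its
+ // size are hypothetical):
+ //
+ //   Stream *sub = stream.GetOrCreateSubStream();
+ //   sub->ThenMemZero(&scratch, scratch_bytes);
+ //   stream.ThenWaitFor(sub);
+ //   stream.ReturnSubStream(sub);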
+
+ // Allocates temporary device memory. The stream deallocates such temporaries
+ // when it is blocked on (via BlockHostUntilDone) or destroyed.
+ template <typename T>
+ port::StatusOr<std::unique_ptr<TemporaryDeviceMemory<T>>>
+ AllocateTemporaryArray(uint64 element_count);
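+
+ // For example (a sketch; the element count is arbitrary):
+ //
+ //   auto temp_or = stream.AllocateTemporaryArray<float>(1024);
+ //   if (!temp_or.ok()) { /* handle allocation failure */ }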
+
+ // Entrains onto the stream of operations: a kernel launch with the given
+ // (variadic) parameters for the invocation. These arguments can be things
+ // like DeviceMemory or primitive types such as int. What arguments you may
+ // pass to a given kernel are noted as the template parameters to the
+ // TypedKernel type that the machocc compiler generates.
+ //
+ // Template parameters:
+ // Params... The type list of formal parameters that the typed kernel
+ // expects, which is matched against Args...
+ // Args... The deduced type list for passed actual arguments
+ //
+ // Implementation: A compile-time compatibility check is performed that has
+ // some leniency versus an exact parameter pack match -- for example,
+ // `const DeviceMemory<T>` is considered "pack compatible" with a
+ // `const DeviceMemory<T>&` formal parameter; in part, because we don't have
+ // perfect forwarding support without rvalue references. It also attempts to
+ // spit out helpful static_assert error traces with information as to the
+ // argument number and types that were mismatched.
+ template <typename... Params, typename... Args>
+ Stream &ThenLaunch(ThreadDim thread_dims, BlockDim block_dims,
+ const TypedKernel<Params...> &kernel, Args... args);
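+
+ // For example, given a hypothetical TypedKernel<DeviceMemory<float>> named
+ // add_one and a DeviceMemory<float> named data, a launch looks like:
+ //
+ //   stream.ThenLaunch(ThreadDim{1024}, BlockDim{1}, add_one, data);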
+
+ // Record a "start" event for the interval timer at this point in the
+ // stream's
+ // execution (relative to the previously and subsequently enqueued items in
+ // the stream's execution). Streams may be started/stopped multiple times.
+ Stream &ThenStartTimer(Timer *t);
+
+ // Record a "stop" event for the interval timer at this point in the
+ // stream's
+ // execution. See also Stream::ThenStartTimer.
+ Stream &ThenStopTimer(Timer *t);
+
+ // TODO(leary) If work is added to the stream that is being depended upon,
+ // then what? Have to describe what happens.
+ template <typename... Params>
+ Stream &ThenWaitFor(Stream *other, Params... more_streams) {
+ return ThenWaitFor(more_streams...).ThenWaitFor(other);
+ }
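+
+ // For example, to make this stream's next work wait on two other streams
+ // (a and b are hypothetical):
+ //
+ //   stream.ThenWaitFor(&a, &b);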
+
+ // Create a dependency for this stream's next work on the other stream
+ // completing. Does not take ownership of other, and other must not be
+ // null.
+ //
+ // Checks that a stream does not wait for itself; it is up to the user to
+ // guarantee that a stream does not come to wait on itself in a cyclic
+ // manner, in which case behavior is undefined.
+ //
+ // N.B. Base recursion case for the variadic ThenWaitFor.
+ Stream &ThenWaitFor(Stream *other);
+
+ // Waits for all streams in others.
+ // Checks that there is no shallow circular wait (i.e. that "this" is not in
+ // others).
+ Stream &ThenWaitFor(std::vector<std::unique_ptr<Stream>> *others);
+
+ // Waits for an event object to be set.
+ // Note that ThenRecordEvent must have been called on the event before
+ // you call this function; otherwise the event will be considered complete
+ // and this wait will do nothing.
+ Stream &ThenWaitFor(Event *event);
+
+ // Inserts the specified event into the end of this stream. Once the stream
+ // has processed all events prior to the insertion point, the event will be
+ // marked as completed.
+ // The stream does not take ownership of event - meaning that event's lifetime
+ // must extend past the point at which it is marked complete!
+ Stream &ThenRecordEvent(Event *event);
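+
+ // A typical cross-stream signaling sketch (event initialization and lifetime
+ // management are elided):
+ //
+ //   producer_stream.ThenRecordEvent(&event);
+ //   consumer_stream.ThenWaitFor(&event);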
+
+ ////////////////
+ // DNN support
+ //
+ // See DnnSupport::* for comments on the following methods.
+
+ // TODO(leary) add double-precision version of this interface.
+ Stream &ThenConvolve(const dnn::BatchDescriptor &input_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::FilterDescriptor &filter_descriptor,
+ const DeviceMemory<float> &filter_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> *output);
+
+ Stream &ThenSeparableConvolve(
+ const dnn::BatchDescriptor &input_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::FilterDescriptor &filter_descriptor, int depth_multiplier,
+ const DeviceMemory<float> &first_weights,
+ const DeviceMemory<float> &second_weights,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> *output);
+
+ Stream &ThenConvolveBackwardData(
+ const dnn::FilterDescriptor &filter_descriptor,
+ const DeviceMemory<float> &filter_data,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::BatchDescriptor &input_descriptor,
+ DeviceMemory<float> *backward_input_data);
+
+ Stream &ThenConvolveBackwardFilter(
+ const dnn::BatchDescriptor &input_descriptor,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_descriptor,
+ DeviceMemory<float> backward_output_data,
+ const dnn::ConvolutionDescriptor &convolution_descriptor,
+ const dnn::FilterDescriptor &filter_descriptor,
+ DeviceMemory<float> *backward_filter_data);
+
+ Stream &ThenMatMul(const DeviceMemory<float> &input_data,
+ const DeviceMemory<float> &weights,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenMatMulQuantized(const DeviceMemory<float> &input_data,
+ const DeviceMemory<int8> &weights,
+ const DeviceMemory<float> &weight_scales,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenMatMulQuantized(const DeviceMemory<float> &input_data,
+ const DeviceMemory<int16> &weights,
+ const DeviceMemory<float> &weight_scales,
+ const dnn::BatchDescriptor &input_dimensions,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenBiasAdd(const DeviceMemory<float> &input_data,
+ const DeviceMemory<float> &biases,
+ const dnn::BatchDescriptor &dimensions,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenPoolForward(const dnn::PoolingDescriptor &pooling_dimensions,
+ const dnn::BatchDescriptor &input_dimensions,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenPoolBackward(const dnn::PoolingDescriptor &pooling_dimensions,
+ const dnn::BatchDescriptor &input_dimensions,
+ const DeviceMemory<float> &input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ const DeviceMemory<float> &output_data,
+ const DeviceMemory<float> &input_diff_data,
+ DeviceMemory<float> *output_diff_data);
+
+ Stream &ThenNormalize(const dnn::NormalizeDescriptor &normalize_descriptor,
+ const DeviceMemory<float> &input_data,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenActivate(dnn::ActivationMode activation_mode,
+ const dnn::BatchDescriptor &dimensions,
+ const DeviceMemory<float> &input_data,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenDepthConcatenate(
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float> *> input_data,
+ DeviceMemory<float> *output_data);
+
+ Stream &ThenElementwiseOperate(
+ dnn::ElementwiseOperation operation,
+ port::ArraySlice<dnn::BatchDescriptor> input_dimensions,
+ port::ArraySlice<const DeviceMemory<float> *> input_data,
+ const dnn::BatchDescriptor &output_dimensions,
+ DeviceMemory<float> *output_data);
+
+ // See DnnSupport::DoMemcpyD2HQuantized.
+ // TODO(wgulland) Use a template to merge the versions of
+ // ThenMemcpyD2HQuantized.
+ Stream &ThenMemcpyD2HQuantized(const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<uint8> host_dst);
+
+ // See DnnSupport::DoMemcpyD2HQuantized.
+ Stream &ThenMemcpyD2HQuantized(const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<uint16> host_dst);
+
+ // See DnnSupport::DoMemcpyD2HQuantized.
+ Stream &ThenMemcpyD2HQuantized(const DeviceMemory<float> &gpu_unquantized_src,
+ port::MutableArraySlice<int32> host_dst);
+
+ // See DnnSupport::DoMemcpyH2DQuantized.
+ Stream &ThenMemcpyH2DQuantized(port::ArraySlice<uint8> host_src,
+ DeviceMemory<float> *gpu_unquantized_dst);
+
+ /////////////////
+ // BLAS support
+
+ // See BlasSupport::DoBlasAsum.
+ Stream &ThenBlasAsum(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *result);
+ Stream &ThenBlasAsum(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *result);
+ Stream &ThenBlasAsum(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result);
+ Stream &ThenBlasAsum(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result);
+
+ // See BlasSupport::DoBlasAxpy. Note that, even for the case where alpha is
+ // present in DeviceMemory, it must be an execution-time constant (i.e. a
+ // value that the stream does not change or populate during the course of
+ // execution). The value is effectively captured at stream-enqueue time.
+ Stream &ThenBlasAxpy(uint64 elem_count, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasAxpy(uint64 elem_count, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *y, int incy);
+ Stream &ThenBlasAxpy(uint64 elem_count, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasAxpy(uint64 elem_count, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy);
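+
+ // For example, y = 2.0f * x + y over elem_count elements (a sketch; x and y
+ // are hypothetical DeviceMemory<float> buffers with unit stride):
+ //
+ //   stream.ThenBlasAxpy(elem_count, 2.0f, x, 1, &y, 1);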
+
+ // See BlasSupport::DoBlasCopy.
+ Stream &ThenBlasCopy(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasCopy(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *y, int incy);
+ Stream &ThenBlasCopy(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasCopy(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasDot.
+ Stream &ThenBlasDot(uint64 elem_count, const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *result);
+ Stream &ThenBlasDot(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *result);
+
+ // See BlasSupport::DoBlasDotc.
+ Stream &ThenBlasDotc(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result);
+ Stream &ThenBlasDotc(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result);
+
+ // See BlasSupport::DoBlasDotu.
+ Stream &ThenBlasDotu(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *result);
+ Stream &ThenBlasDotu(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *result);
+
+ // See BlasSupport::DoBlasNrm2.
+ Stream &ThenBlasNrm2(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<float> *result);
+ Stream &ThenBlasNrm2(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<double> *result);
+ Stream &ThenBlasNrm2(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<float> *result);
+ Stream &ThenBlasNrm2(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<double> *result);
+
+ // See BlasSupport::DoBlasRot.
+ Stream &ThenBlasRot(uint64 elem_count, DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy, float c, float s);
+ Stream &ThenBlasRot(uint64 elem_count, DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy, double c, double s);
+ Stream &ThenBlasRot(uint64 elem_count, DeviceMemory<std::complex<float>> *x,
+ int incx, DeviceMemory<std::complex<float>> *y, int incy,
+ float c, float s);
+ Stream &ThenBlasRot(uint64 elem_count, DeviceMemory<std::complex<double>> *x,
+ int incx, DeviceMemory<std::complex<double>> *y, int incy,
+ double c, double s);
+
+ // See BlasSupport::DoBlasRotg.
+ Stream &ThenBlasRotg(DeviceMemory<float> *a, DeviceMemory<float> *b,
+ DeviceMemory<float> *c, DeviceMemory<float> *s);
+ Stream &ThenBlasRotg(DeviceMemory<double> *a, DeviceMemory<double> *b,
+ DeviceMemory<double> *c, DeviceMemory<double> *s);
+ Stream &ThenBlasRotg(DeviceMemory<std::complex<float>> *a,
+ DeviceMemory<std::complex<float>> *b,
+ DeviceMemory<float> *c,
+ DeviceMemory<std::complex<float>> *s);
+ Stream &ThenBlasRotg(DeviceMemory<std::complex<double>> *a,
+ DeviceMemory<std::complex<double>> *b,
+ DeviceMemory<double> *c,
+ DeviceMemory<std::complex<double>> *s);
+
+ // See BlasSupport::DoBlasRotm.
+ Stream &ThenBlasRotm(uint64 elem_count, DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy,
+ const DeviceMemory<float> &param);
+ Stream &ThenBlasRotm(uint64 elem_count, DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy,
+ const DeviceMemory<double> &param);
+
+ // See BlasSupport::DoBlasRotmg.
+ Stream &ThenBlasRotmg(DeviceMemory<float> *d1, DeviceMemory<float> *d2,
+ DeviceMemory<float> *x1, const DeviceMemory<float> &y1,
+ DeviceMemory<float> *param);
+ Stream &ThenBlasRotmg(DeviceMemory<double> *d1, DeviceMemory<double> *d2,
+ DeviceMemory<double> *x1,
+ const DeviceMemory<double> &y1,
+ DeviceMemory<double> *param);
+
+ // See BlasSupport::DoBlasScal.
+ Stream &ThenBlasScal(uint64 elem_count, float alpha, DeviceMemory<float> *x,
+ int incx);
+ Stream &ThenBlasScal(uint64 elem_count, double alpha, DeviceMemory<double> *x,
+ int incx);
+ Stream &ThenBlasScal(uint64 elem_count, float alpha,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasScal(uint64 elem_count, double alpha,
+ DeviceMemory<std::complex<double>> *x, int incx);
+ Stream &ThenBlasScal(uint64 elem_count, std::complex<float> alpha,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasScal(uint64 elem_count, std::complex<double> alpha,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasSwap.
+ Stream &ThenBlasSwap(uint64 elem_count, DeviceMemory<float> *x, int incx,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasSwap(uint64 elem_count, DeviceMemory<double> *x, int incx,
+ DeviceMemory<double> *y, int incy);
+ Stream &ThenBlasSwap(uint64 elem_count, DeviceMemory<std::complex<float>> *x,
+ int incx, DeviceMemory<std::complex<float>> *y,
+ int incy);
+ Stream &ThenBlasSwap(uint64 elem_count, DeviceMemory<std::complex<double>> *x,
+ int incx, DeviceMemory<std::complex<double>> *y,
+ int incy);
+
+ // See BlasSupport::DoBlasIamax.
+ Stream &ThenBlasIamax(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<int> *result);
+ Stream &ThenBlasIamax(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<int> *result);
+ Stream &ThenBlasIamax(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result);
+ Stream &ThenBlasIamax(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<int> *result);
+
+ // See BlasSupport::DoBlasIamin.
+ Stream &ThenBlasIamin(uint64 elem_count, const DeviceMemory<float> &x,
+ int incx, DeviceMemory<int> *result);
+ Stream &ThenBlasIamin(uint64 elem_count, const DeviceMemory<double> &x,
+ int incx, DeviceMemory<int> *result);
+ Stream &ThenBlasIamin(uint64 elem_count,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<int> *result);
+ Stream &ThenBlasIamin(uint64 elem_count,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<int> *result);
+
+ // See BlasSupport::DoBlasGbmv.
+ Stream &ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n, uint64 kl,
+ uint64 ku, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &x, int incx,
+ float beta, DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n, uint64 kl,
+ uint64 ku, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &x, int incx,
+ double beta, DeviceMemory<double> *y, int incy);
+ Stream &ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n, uint64 kl,
+ uint64 ku, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasGbmv(blas::Transpose trans, uint64 m, uint64 n, uint64 kl,
+ uint64 ku, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasGemv.
+ Stream &ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy);
+ Stream &ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasGemv(blas::Transpose trans, uint64 m, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasGer.
+ Stream &ThenBlasGer(uint64 m, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda);
+ Stream &ThenBlasGer(uint64 m, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda);
+
+ // See BlasSupport::DoBlasGerc.
+ Stream &ThenBlasGerc(uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda);
+ Stream &ThenBlasGerc(uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda);
+
+ // See BlasSupport::DoBlasGeru.
+ Stream &ThenBlasGeru(uint64 m, uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda);
+ Stream &ThenBlasGeru(uint64 m, uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda);
+
+ // See BlasSupport::DoBlasHbmv.
+ Stream &ThenBlasHbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasHbmv(blas::UpperLower uplo, uint64 n, uint64 k,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasHemv.
+ Stream &ThenBlasHemv(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasHemv(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasHer.
+ Stream &ThenBlasHer(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *a, int lda);
+ Stream &ThenBlasHer(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *a, int lda);
+
+ // See BlasSupport::DoBlasHer2.
+ Stream &ThenBlasHer2(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *a, int lda);
+ Stream &ThenBlasHer2(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *a, int lda);
+
+ // See BlasSupport::DoBlasHpmv.
+ Stream &ThenBlasHpmv(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &ap,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *y, int incy);
+ Stream &ThenBlasHpmv(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &ap,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *y, int incy);
+
+ // See BlasSupport::DoBlasHpr.
+ Stream &ThenBlasHpr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ DeviceMemory<std::complex<float>> *ap);
+ Stream &ThenBlasHpr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ DeviceMemory<std::complex<double>> *ap);
+
+ // See BlasSupport::DoBlasHpr2.
+ Stream &ThenBlasHpr2(blas::UpperLower uplo, uint64 n,
+ std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &x, int incx,
+ const DeviceMemory<std::complex<float>> &y, int incy,
+ DeviceMemory<std::complex<float>> *ap);
+ Stream &ThenBlasHpr2(blas::UpperLower uplo, uint64 n,
+ std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &x, int incx,
+ const DeviceMemory<std::complex<double>> &y, int incy,
+ DeviceMemory<std::complex<double>> *ap);
+
+ // See BlasSupport::DoBlasSbmv.
+ Stream &ThenBlasSbmv(blas::UpperLower uplo, uint64 n, uint64 k, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasSbmv(blas::UpperLower uplo, uint64 n, uint64 k, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy);
+
+ // See BlasSupport::DoBlasSpmv.
+ Stream &ThenBlasSpmv(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &ap,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasSpmv(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &ap,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy);
+
+ // See BlasSupport::DoBlasSpr.
+ Stream &ThenBlasSpr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *ap);
+ Stream &ThenBlasSpr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *ap);
+
+ // See BlasSupport::DoBlasSpr2.
+ Stream &ThenBlasSpr2(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *ap);
+ Stream &ThenBlasSpr2(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *ap);
+
+ // See BlasSupport::DoBlasSymv.
+ Stream &ThenBlasSymv(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &x, int incx, float beta,
+ DeviceMemory<float> *y, int incy);
+ Stream &ThenBlasSymv(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &x, int incx, double beta,
+ DeviceMemory<double> *y, int incy);
+
+ // See BlasSupport::DoBlasSyr.
+ Stream &ThenBlasSyr(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ DeviceMemory<float> *a, int lda);
+ Stream &ThenBlasSyr(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ DeviceMemory<double> *a, int lda);
+
+ // See BlasSupport::DoBlasSyr2.
+ Stream &ThenBlasSyr2(blas::UpperLower uplo, uint64 n, float alpha,
+ const DeviceMemory<float> &x, int incx,
+ const DeviceMemory<float> &y, int incy,
+ DeviceMemory<float> *a, int lda);
+ Stream &ThenBlasSyr2(blas::UpperLower uplo, uint64 n, double alpha,
+ const DeviceMemory<double> &x, int incx,
+ const DeviceMemory<double> &y, int incy,
+ DeviceMemory<double> *a, int lda);
+
+ // See BlasSupport::DoBlasTbmv.
+ Stream &ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx);
+ Stream &ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx);
+ Stream &ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTbmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasTbsv.
+ Stream &ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx);
+ Stream &ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx);
+ Stream &ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTbsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n, uint64 k,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasTpmv.
+ Stream &ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx);
+ Stream &ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap, DeviceMemory<double> *x,
+ int incx);
+ Stream &ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTpmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasTpsv.
+ Stream &ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &ap, DeviceMemory<float> *x,
+ int incx);
+ Stream &ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &ap, DeviceMemory<double> *x,
+ int incx);
+ Stream &ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &ap,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTpsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &ap,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasTrmv.
+ Stream &ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx);
+ Stream &ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx);
+ Stream &ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTrmv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasTrsv.
+ Stream &ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<float> &a, int lda,
+ DeviceMemory<float> *x, int incx);
+ Stream &ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<double> &a, int lda,
+ DeviceMemory<double> *x, int incx);
+ Stream &ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *x, int incx);
+ Stream &ThenBlasTrsv(blas::UpperLower uplo, blas::Transpose trans,
+ blas::Diagonal diag, uint64 n,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *x, int incx);
+
+ // See BlasSupport::DoBlasGemm.
+ Stream &ThenBlasGemm(blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, float alpha,
+ const DeviceMemory<float> &a, int lda,
+ const DeviceMemory<float> &b, int ldb, float beta,
+ DeviceMemory<float> *c, int ldc);
+ Stream &ThenBlasGemm(blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, double alpha,
+ const DeviceMemory<double> &a, int lda,
+ const DeviceMemory<double> &b, int ldb, double beta,
+ DeviceMemory<double> *c, int ldc);
+ Stream &ThenBlasGemm(blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc);
+ Stream &ThenBlasGemm(blas::Transpose transa, blas::Transpose transb, uint64 m,
+ uint64 n, uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc);
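+
+ // For example, C = alpha * A * B + beta * C with column-major A (m x k),
+ // B (k x n) and C (m x n) (a sketch; a, b and c are hypothetical
+ // DeviceMemory<float> buffers):
+ //
+ //   stream.ThenBlasGemm(blas::Transpose::kNoTranspose,
+ //                       blas::Transpose::kNoTranspose, m, n, k, 1.0f, a, m,
+ //                       b, k, 0.0f, &c, m);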
+
+ // See BlasSupport::DoBlasGemmBatched.
+ Stream &ThenBlasGemmBatched(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k, float alpha,
+ const port::ArraySlice<DeviceMemory<float> *> &a,
+ int lda,
+ const port::ArraySlice<DeviceMemory<float> *> &b,
+ int ldb, float beta,
+ const port::ArraySlice<DeviceMemory<float> *> &c,
+ int ldc, int batch_count);
+ Stream &ThenBlasGemmBatched(blas::Transpose transa, blas::Transpose transb,
+ uint64 m, uint64 n, uint64 k, double alpha,
+ const port::ArraySlice<DeviceMemory<double> *> &a,
+ int lda,
+ const port::ArraySlice<DeviceMemory<double> *> &b,
+ int ldb, double beta,
+ const port::ArraySlice<DeviceMemory<double> *> &c,
+ int ldc, int batch_count);
+ Stream &ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &b, int ldb,
+ std::complex<float> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<float>> *> &c, int ldc,
+ int batch_count);
+ Stream &ThenBlasGemmBatched(
+ blas::Transpose transa, blas::Transpose transb, uint64 m, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &a, int lda,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &b, int ldb,
+ std::complex<double> beta,
+ const port::ArraySlice<DeviceMemory<std::complex<double>> *> &c, int ldc,
+ int batch_count);
+
+ // See BlasSupport::DoBlasHemm.
+ Stream &ThenBlasHemm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc);
+ Stream &ThenBlasHemm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc);
+
+ // See BlasSupport::DoBlasHerk.
+ Stream &ThenBlasHerk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, float alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc);
+ Stream &ThenBlasHerk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, double alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc);
+
+ // See BlasSupport::DoBlasHer2k.
+ Stream &ThenBlasHer2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ float beta, DeviceMemory<std::complex<float>> *c,
+ int ldc);
+ Stream &ThenBlasHer2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ double beta, DeviceMemory<std::complex<double>> *c,
+ int ldc);
+
+ // See BlasSupport::DoBlasSymm.
+ Stream &ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &b, int ldb,
+ float beta, DeviceMemory<float> *c, int ldc);
+ Stream &ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &b, int ldb,
+ double beta, DeviceMemory<double> *c, int ldc);
+ Stream &ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc);
+ Stream &ThenBlasSymm(blas::Side side, blas::UpperLower uplo, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc);
+
+ // See BlasSupport::DoBlasSyrk.
+ Stream &ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, float alpha, const DeviceMemory<float> &a,
+ int lda, float beta, DeviceMemory<float> *c, int ldc);
+ Stream &ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, double alpha, const DeviceMemory<double> &a,
+ int lda, double beta, DeviceMemory<double> *c, int ldc);
+ Stream &ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc);
+ Stream &ThenBlasSyrk(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc);
+
+ // See BlasSupport::DoBlasSyr2k.
+ Stream &ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, float alpha, const DeviceMemory<float> &a,
+ int lda, const DeviceMemory<float> &b, int ldb,
+ float beta, DeviceMemory<float> *c, int ldc);
+ Stream &ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, double alpha, const DeviceMemory<double> &a,
+ int lda, const DeviceMemory<double> &b, int ldb,
+ double beta, DeviceMemory<double> *c, int ldc);
+ Stream &ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ const DeviceMemory<std::complex<float>> &b, int ldb,
+ std::complex<float> beta,
+ DeviceMemory<std::complex<float>> *c, int ldc);
+ Stream &ThenBlasSyr2k(blas::UpperLower uplo, blas::Transpose trans, uint64 n,
+ uint64 k, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ const DeviceMemory<std::complex<double>> &b, int ldb,
+ std::complex<double> beta,
+ DeviceMemory<std::complex<double>> *c, int ldc);
+
+ // See BlasSupport::DoBlasTrmm.
+ Stream &ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, float alpha, const DeviceMemory<float> &a,
+ int lda, DeviceMemory<float> *b, int ldb);
+ Stream &ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, double alpha, const DeviceMemory<double> &a,
+ int lda, DeviceMemory<double> *b, int ldb);
+ Stream &ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb);
+ Stream &ThenBlasTrmm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb);
+
+ // See BlasSupport::DoBlasTrsm.
+ Stream &ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, float alpha, const DeviceMemory<float> &a,
+ int lda, DeviceMemory<float> *b, int ldb);
+ Stream &ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, double alpha, const DeviceMemory<double> &a,
+ int lda, DeviceMemory<double> *b, int ldb);
+ Stream &ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, std::complex<float> alpha,
+ const DeviceMemory<std::complex<float>> &a, int lda,
+ DeviceMemory<std::complex<float>> *b, int ldb);
+ Stream &ThenBlasTrsm(blas::Side side, blas::UpperLower uplo,
+ blas::Transpose transa, blas::Diagonal diag, uint64 m,
+ uint64 n, std::complex<double> alpha,
+ const DeviceMemory<std::complex<double>> &a, int lda,
+ DeviceMemory<std::complex<double>> *b, int ldb);
+
+ // See FftSupport::DoFft.
+ Stream &ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<std::complex<float>> *output);
+ Stream &ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<std::complex<double>> *output);
+ Stream &ThenFft(fft::Plan *plan, const DeviceMemory<float> &input,
+ DeviceMemory<std::complex<float>> *output);
+ Stream &ThenFft(fft::Plan *plan, const DeviceMemory<double> &input,
+ DeviceMemory<std::complex<double>> *output);
+ Stream &ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<float>> &input,
+ DeviceMemory<float> *output);
+ Stream &ThenFft(fft::Plan *plan,
+ const DeviceMemory<std::complex<double>> &input,
+ DeviceMemory<double> *output);
+
+ // Makes the RNG use the provided value as the basis for further generation.
+ // /dev/urandom (good) and /dev/random (better, but sometimes slow) are good
+ // sources of seed data if the default (high quality) sources are not
+ // desired.
+ // For most use cases, this function will not be necessary; each provided
+ // back-end implementation will be appropriately seeded by default.
+ // At a minimum 16 bytes of data are required in the seed buffer.
+ //
+ // To seed with good (non-reproducible) data:
+ // File* f = File::Open("/dev/random", "r");
+ // int64 bytes_read = f->Read(seed_data, bytes_to_read);
+ // < error checking >
+ // stream.ThenSetRngSeed(seed_data, bytes_read);
+ //
+ // To seed with reproducible data:
+ // uint64_t seed_data[2] = { <data> };
+ // stream.ThenSetRngSeed(seed_data, 16);
+ Stream &ThenSetRngSeed(const uint8 *seed, uint64 seed_bytes);
+
+ // Populates the memory indicated by values with uniform-random-distribution
+ // values. TODO(leary) seeding API/description
+ //
+ // Uses the type and size of the DeviceMemory to infer what data should be
+ // populated.
+ Stream &ThenPopulateRandUniform(DeviceMemory<float> *values);
+ Stream &ThenPopulateRandUniform(DeviceMemory<double> *values);
+ Stream &ThenPopulateRandUniform(DeviceMemory<std::complex<float>> *values);
+ Stream &ThenPopulateRandUniform(DeviceMemory<std::complex<double>> *values);
+ Stream &ThenPopulateRandGaussian(float mean, float stddev,
+ DeviceMemory<float> *values);
+ Stream &ThenPopulateRandGaussian(double mean, double stddev,
+ DeviceMemory<double> *values);
+
+ // Entrain onto the stream: a memcpy to a host destination from a GPU source
+ // of the given target size. host_dst must be a pointer to host memory
+ // allocated by StreamExecutor::HostMemoryAllocate or otherwise allocated and
+ // then registered with StreamExecutor::HostMemoryRegister.
+ Stream &ThenMemcpy(void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size);
+
+ // Entrain onto the stream: a memcpy to a GPU destination from a host source
+ // of the given target size. host_src must be a pointer to host memory
+ // allocated by StreamExecutor::HostMemoryAllocate or otherwise allocated and
+ // then registered with StreamExecutor::HostMemoryRegister.
+ Stream &ThenMemcpy(DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size);
+
+ // Alternative interface for memcpying from device to host that takes an
+ // array slice. Checks that the host destination can accommodate the device
+ // source size.
+ template <typename T>
+ Stream &ThenMemcpyD2H(const DeviceMemory<T> &gpu_src,
+ port::MutableArraySlice<T> host_dst) {
+ auto host_size = host_dst.size() * sizeof(T);
+ CHECK(gpu_src.size() == 0 || host_size >= gpu_src.size());
+ return ThenMemcpy(host_dst.begin(), gpu_src, host_size);
+ }
+
+ // Alternative interface for memcpying from host to device that takes an
+ // array slice. Checks that the device destination can accommodate the host
+ // slice size.
+ template <typename T>
+ Stream &ThenMemcpyH2D(port::ArraySlice<T> host_src,
+ DeviceMemory<T> *gpu_dst) {
+ auto host_size = host_src.size() * sizeof(T);
+ CHECK(gpu_dst->size() == 0 || gpu_dst->size() >= host_size);
+ return ThenMemcpy(gpu_dst, host_src.begin(), host_size);
+ }
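+
+ // For example (a sketch; host_slice and gpu_buffer are hypothetical and
+ // sized consistently):
+ //
+ //   stream.ThenMemcpyH2D<float>(host_slice, &gpu_buffer);  // host -> device
+ //   stream.ThenMemcpyD2H<float>(gpu_buffer, host_slice);   // device -> host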
+
+ // Entrain onto the stream: a memcpy to a GPU destination from a GPU source
+ // of the given target size. gpu_src/dst must be pointers to GPU memory and
+ // peer access must be enabled between their owning StreamExecutors.
+ Stream &ThenMemcpy(DeviceMemoryBase *gpu_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size);
+
+ // Forwards to the device-to-device copy overload of ThenMemcpy -- useful for
+ // ensuring that the host pointer isn't accidentally confused with a device
+ // pointer if you're not doing metaprogramming against the API.
+ Stream &ThenMemcpyD2D(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size) {
+ return ThenMemcpy(gpu_dst, gpu_src, size);
+ }
+
+ // Entrain onto the stream: a memset of zero at a GPU location of size bytes.
+ // The location must not be null.
+ // TODO(leary) Presently the size must be a 4-byte multiple.
+ Stream &ThenMemZero(DeviceMemoryBase *location, uint64 size);
+
+ // Entrain onto the stream: a memset of a 32-bit pattern at a GPU location of
+ // size bytes, where size must be evenly divisible by 4 (i.e. a whole number
+ // of 32-bit words). The location must not be null.
+ Stream &ThenMemset32(DeviceMemoryBase *location, const uint32 &pattern,
+ uint64 size);
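+
+ // For example, to fill a buffer with the 32-bit pattern 0xDEADBEEF
+ // (a sketch; buffer_bytes must be a multiple of 4):
+ //
+ //   stream.ThenMemset32(&gpu_buffer, 0xDEADBEEF, buffer_bytes);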
+
+ // (Synchronously) block the host code waiting for the operations entrained
+ // on the stream (enqueued to this point in program execution) to complete.
+ bool BlockHostUntilDone();
+
+ // Warning! This method interacts with internal threads in
+ // sometimes-unpredictable ways and is intended for GPU-Executor-internal use
+ // only. Please check with a member of the FASTR team before making use of
+ // this method.
+ //
+ // Entrains onto the stream a function to be executed on the host at some
+ // point in the future.
+ // Async host callbacks DO NOT block the stream as device functions (or as
+ // synchronous host callbacks). No synchronization is possible with
+ // asynchronous callbacks; they are strictly fire-and-forget.
+ // This method is private due to the potential for undefined behavior with
+ // synchronization using OpenCL user events.
+ // The ONLY lifetime guarantee in these calls is that the StreamExecutor
+ // parameter will still be valid - this Stream may not be!
+ // Any callbacks requiring device API calls must use this method.
+ Stream &ThenEnqueueOnBackgroundThread(
+ std::function<void(StreamExecutor *)> task);
+
+ // Returns the (opaque) platform-specific backing object. Ownership is not
+ // transferred to the caller.
+ internal::StreamInterface *implementation() { return implementation_.get(); }
+
+ // Entrains onto the stream a callback to the host (from the device).
+ // Host callbacks block/occupy the stream just as device functions
+ // (execute one at a time, block later stream operations).
+ // Behavior is undefined when synchronizing using OpenCL user events.
+ // Behavior is undefined if host callbacks call device routines or insert
+ // them into any stream.
+ // On certain platforms, ThenDoHostCallback is expected to have significant
+ // negative effects on performance.
+ Stream &ThenDoHostCallback(std::function<void()> callback);
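+
+ // For example, to log from the host once all previously enqueued work on the
+ // stream has executed (a sketch):
+ //
+ //   stream.ThenDoHostCallback([] { LOG(INFO) << "prior work completed"; });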
+
+ // Identical to ThenDoHostCallback; only exposed for testing purposes.
+ Stream &ThenDoHostCallbackForTest(std::function<void()> callback);
+
+ // Returns the StreamExecutor (parent object) associated with this stream.
+ StreamExecutor *parent() const {
+ CHECK(parent_ != nullptr);
+ return parent_;
+ }
+
+ // Returns the (internal usage) temporary-memory-allocation manager associated
+ // with this stream.
+ internal::TemporaryMemoryManager *temporary_memory_manager();
+
+ private:
+ friend class host::HostBlas; // for parent_.
+ friend class host::HostFft; // for parent_.
+ friend class host::HostRng; // for parent_.
+ template <typename... Args>
+ friend struct ThenBlasImpl; // for implementing ThenBlasXXX.
+ friend class ocl::CLBlas; // for parent_.
+
+ bool InErrorState() const {
+ shared_lock lock{mu_};
+ return !ok_;
+ }
+
+ // Sets the error state if operation_retcode is false.
+ // This is a useful shorthand for many stream routines.
+ void CheckError(bool operation_retcode) {
+ if (operation_retcode) {
+ return;
+ }
+ mutex_lock lock{mu_};
+ ok_ = false;
+ }
+
+ void SetError() { CheckError(false /* = operation_retcode */); }
+
+ // The platform-dependent implementation that the StreamExecutor interface
+ // delegates to.
+ std::unique_ptr<internal::StreamInterface> implementation_;
+
+ // The StreamExecutor that supports the operation of this stream.
+ StreamExecutor *parent_;
+
+ // mutex that guards the allocation / error state flags.
+ // Mutable so that it can be obtained via const reader lock.
+ mutable mutex mu_;
+
+ // Whether Init() was successfully called to allocate this stream on the
+ // underlying platform. It simply flips from 0 to 1 with a sanity check.
+ // See StreamExecutor::AllocateStream.
+ bool allocated_ GUARDED_BY(mu_);
+
+ // Whether all operations have entrained successfully to the current program
+ // point.
+ bool ok_ GUARDED_BY(mu_);
+
+ // Sub-streams that are generated from this stream. Each element has a
+ // pointer to a sub-stream and a boolean value indicating whether this
+ // sub-stream is ready to be reused.
+ std::vector<std::pair<std::unique_ptr<Stream>, bool>> sub_streams_
+ GUARDED_BY(mu_);
+
+ // Streams can allocate temporary memories to help with work they enqueue
+ // (e.g. for scratch memory spaces). This member tracks those allocations and
+ // notes when they can be reclaimed -- reclamation is attempted when
+ // BlockHostUntilDone() is called.
+ internal::TemporaryMemoryManager temporary_memory_manager_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(Stream);
+};
+
+////////////
+// Inlines
+
+template <typename T>
+inline port::StatusOr<std::unique_ptr<TemporaryDeviceMemory<T>>>
+Stream::AllocateTemporaryArray(uint64 element_count) {
+ return temporary_memory_manager_.AllocateArray<T>(element_count);
+}
+
+inline internal::TemporaryMemoryManager *Stream::temporary_memory_manager() {
+ return &temporary_memory_manager_;
+}
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_STREAM_H_
diff --git a/tensorflow/stream_executor/stream_executor.h b/tensorflow/stream_executor/stream_executor.h
new file mode 100644
index 0000000000..3bccaec5e3
--- /dev/null
+++ b/tensorflow/stream_executor/stream_executor.h
@@ -0,0 +1,50 @@
+// The StreamExecutor is a single-device abstraction for:
+//
+// * Loading/launching data-parallel-kernels
+// * Invoking pre-canned high-performance library routines (like matrix
+// multiply)
+//
+// The appropriately-typed kernel and "loader spec" are automatically generated
+// for the user within a namespace by the gcudacc compiler output, so typical
+// use looks like so:
+//
+// namespace gpu = ::perftools::gputools;
+// namespace gcudacc = ::platforms::gpus::gcudacc;
+//
+// gpu::StreamExecutor stream_exec{PlatformKind::kCuda};
+// gcudacc::kernel::MyKernel my_kernel{&stream_exec};
+// bool ok = stream_exec.GetKernel(gcudacc::spec::MyKernelSpec(),
+// &my_kernel);
+// if (!ok) { ... }
+// gpu::DeviceMemory<int> result = stream_exec.AllocateZeroed<int>();
+// if (result == nullptr) { ... }
+// int host_result;
+// gpu::Stream my_stream{&stream_exec};
+// my_stream
+// .Init()
+// .ThenLaunch(ThreadDim{1024}, BlockDim{1}, my_kernel, result)
+// .ThenMemcpy(&host_result, result, sizeof(host_result))
+// .BlockHostUntilDone();
+// if (!my_stream.ok()) { ... }
+// printf("%d\n", host_result);
+//
+// Since the device may operate asynchronously to the host, the
+// Stream::BlockHostUntilDone() call forces the calling host thread to wait for
+// the chain of commands specified for the Stream to complete execution.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_H_
+#define TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_H_
+
+#include "tensorflow/stream_executor/device_description.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/device_memory.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/device_options.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/event.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/kernel.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/kernel_spec.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/launch_dim.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/platform.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/stream.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/stream_executor_pimpl.h" // IWYU pragma: export
+#include "tensorflow/stream_executor/timer.h" // IWYU pragma: export
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_H_
diff --git a/tensorflow/stream_executor/stream_executor_internal.cc b/tensorflow/stream_executor/stream_executor_internal.cc
new file mode 100644
index 0000000000..b2785e0874
--- /dev/null
+++ b/tensorflow/stream_executor/stream_executor_internal.cc
@@ -0,0 +1,65 @@
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+// -- CUDA
+
+StreamExecutorFactory* MakeCUDAExecutorImplementation() {
+ static StreamExecutorFactory instance;
+ return &instance;
+}
+EventFactory* MakeCUDAEventImplementation() {
+ static EventFactory instance;
+ return &instance;
+}
+StreamFactory* MakeCUDAStreamImplementation() {
+ static StreamFactory instance;
+ return &instance;
+}
+TimerFactory* MakeCUDATimerImplementation() {
+ static TimerFactory instance;
+ return &instance;
+}
+KernelFactory* MakeCUDAKernelImplementation() {
+ static KernelFactory instance;
+ return &instance;
+}
+
+// -- OpenCL
+
+StreamExecutorFactory* MakeOpenCLExecutorImplementation() {
+ static StreamExecutorFactory instance;
+ return &instance;
+}
+StreamExecutorFactory* MakeOpenCLAlteraExecutorImplementation() {
+ static StreamExecutorFactory instance;
+ return &instance;
+}
+StreamFactory* MakeOpenCLStreamImplementation() {
+ static StreamFactory instance;
+ return &instance;
+}
+TimerFactory* MakeOpenCLTimerImplementation() {
+ static TimerFactory instance;
+ return &instance;
+}
+KernelFactory* MakeOpenCLKernelImplementation() {
+ static KernelFactory instance;
+ return &instance;
+}
+
+// -- Host
+
+StreamExecutorFactory MakeHostExecutorImplementation;
+StreamFactory MakeHostStreamImplementation;
+TimerFactory MakeHostTimerImplementation;
+
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/stream_executor_internal.h b/tensorflow/stream_executor/stream_executor_internal.h
new file mode 100644
index 0000000000..5b4e596cfe
--- /dev/null
+++ b/tensorflow/stream_executor/stream_executor_internal.h
@@ -0,0 +1,364 @@
+// Interfaces for platform-dependent implementations to satisfy. These are
+// delegated to from the StreamExecutor in pointer-to-implementation style; i.e.
+// the StreamExecutor is just a husk that delegates calls to the
+// platform-specific objects which implement the interfaces defined here.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_INTERNAL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_INTERNAL_H_
+
+#include <functional>
+#include <map>
+#include <memory>
+#include <utility>
+#include <vector>
+
+#include "tensorflow/stream_executor/device_description.h"
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/device_options.h"
+#include "tensorflow/stream_executor/dnn.h"
+#include "tensorflow/stream_executor/event.h"
+#include "tensorflow/stream_executor/kernel.h"
+#include "tensorflow/stream_executor/kernel_cache_config.h"
+#include "tensorflow/stream_executor/kernel_spec.h"
+#include "tensorflow/stream_executor/launch_dim.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/plugin_registry.h"
+#include "tensorflow/stream_executor/shared_memory_config.h"
+#include "tensorflow/stream_executor/trace_listener.h"
+#include "tensorflow/stream_executor/lib/inlined_vector.h"
+
+namespace perftools {
+namespace gputools {
+
+class KernelBase;
+class Stream;
+class Timer;
+
+namespace blas {
+class BlasSupport;
+} // namespace blas
+
+namespace fft {
+class Support;
+} // namespace fft
+
+namespace rng {
+class RngSupport;
+} // namespace rng
+
+} // namespace gputools
+} // namespace perftools
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+// Interface for the different StreamExecutor platforms (i.e. CUDA, OpenCL).
+//
+// Various platforms will provide an implementation that satisfies this interface.
+class StreamExecutorInterface {
+ public:
+ // Default constructor for the abstract interface.
+ StreamExecutorInterface() {}
+
+ // Default destructor for the abstract interface.
+ virtual ~StreamExecutorInterface() {}
+
+ // Returns the (transitively) wrapped executor if this executor is
+ // wrapping another executor; otherwise, returns this.
+ virtual StreamExecutorInterface *GetUnderlyingExecutor() { return this; }
+
+ // See the StreamExecutor interface for comments on the same-named methods.
+ virtual port::Status Init(int device_ordinal,
+ DeviceOptions device_options) = 0;
+ virtual bool GetKernel(const MultiKernelLoaderSpec &spec,
+ KernelBase *kernel) {
+ return false;
+ }
+ virtual bool Launch(Stream *stream, const ThreadDim &thread_dims,
+ const BlockDim &block_dims, const KernelBase &k,
+ const std::vector<KernelArg> &args) {
+ return false;
+ }
+ virtual void *Allocate(uint64 size) = 0;
+ virtual void *AllocateSubBuffer(DeviceMemoryBase *parent, uint64 offset,
+ uint64 size) = 0;
+ virtual void Deallocate(DeviceMemoryBase *mem) = 0;
+ virtual void *HostMemoryAllocate(uint64 size) = 0;
+ virtual void HostMemoryDeallocate(void *mem) = 0;
+ virtual bool HostMemoryRegister(void *mem, uint64 size) = 0;
+ virtual bool HostMemoryUnregister(void *mem) = 0;
+ virtual bool SynchronizeAllActivity() = 0;
+ virtual bool SynchronousMemZero(DeviceMemoryBase *location, uint64 size) = 0;
+ virtual bool SynchronousMemSet(DeviceMemoryBase *location, int value,
+ uint64 size) = 0;
+ virtual bool SynchronousMemcpy(DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) = 0;
+ virtual bool SynchronousMemcpy(void *host_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) = 0;
+ virtual bool SynchronousMemcpyDeviceToDevice(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) = 0;
+ virtual bool MemZero(Stream *stream, DeviceMemoryBase *location,
+ uint64 size) = 0;
+ virtual bool Memset32(Stream *stream, DeviceMemoryBase *location,
+ uint32 pattern, uint64 size) = 0;
+ virtual bool Memcpy(Stream *stream, void *host_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size) = 0;
+ virtual bool Memcpy(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) = 0;
+ virtual bool MemcpyDeviceToDevice(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &host_src,
+ uint64 size) = 0;
+ virtual bool HostCallback(Stream *stream, std::function<void()> callback) = 0;
+ virtual port::Status AllocateEvent(Event *event) = 0;
+ virtual port::Status DeallocateEvent(Event *event) = 0;
+ virtual port::Status RecordEvent(Stream *stream, Event *event) = 0;
+ virtual port::Status WaitForEvent(Stream *stream, Event *event) = 0;
+ virtual Event::Status PollForEventStatus(Event *event) = 0;
+ virtual bool AllocateStream(Stream *stream) = 0;
+ virtual void DeallocateStream(Stream *stream) = 0;
+ virtual bool CreateStreamDependency(Stream *dependent, Stream *other) = 0;
+ virtual bool AllocateTimer(Timer *timer) = 0;
+ virtual void DeallocateTimer(Timer *timer) = 0;
+ virtual bool StartTimer(Stream *stream, Timer *timer) = 0;
+ virtual bool StopTimer(Stream *stream, Timer *timer) = 0;
+ virtual bool BlockHostUntilDone(Stream *stream) = 0;
+ virtual int PlatformDeviceCount() = 0;
+ virtual port::Status EnablePeerAccessTo(StreamExecutorInterface *other) = 0;
+ virtual bool CanEnablePeerAccessTo(StreamExecutorInterface *other) = 0;
+ virtual SharedMemoryConfig GetDeviceSharedMemoryConfig() = 0;
+ virtual port::Status SetDeviceSharedMemoryConfig(
+ SharedMemoryConfig config) = 0;
+
+ virtual bool DeviceMemoryUsage(int64 *free, int64 *total) const {
+ return false;
+ }
+
+  // Retrieves the device pointer and size for a symbol. The device pointer is
+  // stored at mem, and the size is stored at bytes. Either mem or bytes may be
+  // null, but not both at the same time. To use constant memory in CUDA,
+  // GetSymbol has to be used. Returns true if the symbol is found.
+ virtual bool GetSymbol(const string& symbol_name, void **mem, size_t *bytes) {
+ return false;
+ }
+
+ // Creates a new DeviceDescription object. Ownership is transferred to the
+ // caller.
+ virtual DeviceDescription *PopulateDeviceDescription() const = 0;
+
+ virtual KernelArg DeviceMemoryToKernelArg(
+ const DeviceMemoryBase &gpu_mem) const = 0;
+
+ // Attempts to register the provided TraceListener with the device-specific
+ // Executor implementation. When this is called, the PIMPL interface has
+ // already taken ownership of the object and is managing the generic tracing
+ // events. The device-specific implementation must determine if the passed
+ // listener is of a type appropriate for it to trace during registration (and
+ // before dispatching events to it).
+ // Returns true if the listener was successfully registered, false otherwise.
+ // Does not take ownership of listener.
+ virtual bool RegisterTraceListener(TraceListener* listener) { return false; }
+
+ // Unregisters the specified listener from the device-specific Executor.
+  // Returns true if the listener was successfully unregistered, false
+  // otherwise.
+ virtual bool UnregisterTraceListener(TraceListener* listener) {
+ return false;
+ }
+
+ // Returns whether this StreamExecutor has BLAS support for its underlying
+ // platform.
+ virtual bool SupportsBlas() const { return false; }
+
+ // Creates a new BlasSupport object, ownership is transferred to the caller.
+ // If SupportsBlas() is false, this will always return null.
+ //
+ // If SupportsBlas() is true, this may return null, for example, if the BLAS
+ // initialization fails.
+ virtual blas::BlasSupport *CreateBlas() { return nullptr; }
+
+ // Returns whether this StreamExecutor has FFT support for its underlying
+ // platform.
+ virtual bool SupportsFft() const { return false; }
+
+ // Creates a new fft::FftSupport object, ownership is transferred to the
+ // caller.
+ // If SupportsFft() is false, this will always return null.
+ //
+ // If SupportsFft() is true, this may return null, for example, if the FFT
+ // initialization fails.
+ virtual fft::FftSupport *CreateFft() { return nullptr; }
+
+  // Returns whether this StreamExecutor has Random Number Generation support
+  // for its underlying platform.
+ virtual bool SupportsRng() const { return false; }
+
+  // Returns whether this StreamExecutor has neural net support for its
+  // underlying platform.
+ virtual bool SupportsDnn() const { return false; }
+
+ // Creates a new RngSupport object, ownership is transferred to the caller.
+ // If SupportsRng() is false, this will always return null.
+ //
+ // If SupportsRng() is true, this may return null, for example, if the RNG
+ // initialization fails.
+ virtual rng::RngSupport *CreateRng() { return nullptr; }
+
+ // Creates a new DnnSupport object, ownership is transferred to the caller.
+ // If SupportsDnn() is false, this will always return null.
+ //
+  // If SupportsDnn() is true, this may return null, for example, if the DNN
+  // initialization fails.
+ virtual dnn::DnnSupport *CreateDnn() { return nullptr; }
+
+ // Please read the warning below. This method is only temporary. See
+ // http://b/15759750
+ //
+ // Returns the CUDA context associated with this StreamExecutor platform
+ // implementation.
+ //
+ // WARNING: checks that the underlying platform is, in fact, CUDA, causing a
+ // fatal error if it is not. This hack is made available solely for use from
+ // distbelief code, which temporarily has strong ties to CUDA as a platform.
+ virtual void *CudaContextHack() { return nullptr; }
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(StreamExecutorInterface);
+};
+
+// Pointer-to-implementation object type (i.e. the KernelBase class delegates to
+// this interface) with virtual destruction. This class exists for the
+// platform-dependent code to hang any kernel data/resource info/functionality
+// off of.
+class KernelInterface {
+ public:
+ // Default constructor for the abstract interface.
+ KernelInterface() {}
+
+ // Default destructor for the abstract interface.
+ virtual ~KernelInterface() {}
+
+ // Returns the number of formal parameters that this kernel accepts.
+ virtual unsigned Arity() const = 0;
+
+ // Sets the preferred cache configuration.
+ virtual void SetPreferredCacheConfig(KernelCacheConfig config) = 0;
+
+ // Gets the preferred cache configuration.
+ virtual KernelCacheConfig GetPreferredCacheConfig() const = 0;
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(KernelInterface);
+};
+
+// Platform-dependent interface class for the generic Events interface, in
+// the PIMPL style.
+class EventInterface {
+ public:
+ EventInterface() {}
+ virtual ~EventInterface() {}
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(EventInterface);
+};
+
+// Pointer-to-implementation object type (i.e. the Stream class delegates to
+// this interface) with virtual destruction. This class exists for the
+// platform-dependent code to hang any kernel data/resource info/functionality
+// off of.
+class StreamInterface {
+ public:
+ // Default constructor for the abstract interface.
+ StreamInterface() {}
+
+ // Default destructor for the abstract interface.
+ virtual ~StreamInterface() {}
+
+ // Please read the warning below. This method is only temporary. See
+ // http://b/15759750
+ //
+ // Returns the CUDA stream associated with this platform's stream
+ // implementation.
+ //
+ // WARNING: checks that the underlying platform is, in fact, CUDA, causing a
+ // fatal error if it is not. This hack is made available solely for use from
+ // distbelief code, which temporarily has strong ties to CUDA as a platform.
+ virtual void *CudaStreamHack() { return nullptr; }
+
+ // Please read the warning above. This method is only temporary. See
+ // http://b/15759750
+ //
+ // See the above comment on CudaStreamHack -- this further breaks abstraction
+ // for Eigen within distbelief, which has strong ties to CUDA as a platform,
+ // and a historical attachment to a programming model which takes a
+ // stream-slot rather than a stream-value.
+ virtual void **CudaStreamMemberHack() { return nullptr; }
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(StreamInterface);
+};
+
+// Pointer-to-implementation object type (i.e. the Timer class delegates to
+// this interface) with virtual destruction. This class exists for the
+// platform-dependent code to hang any timer data/resource info/functionality
+// off of.
+class TimerInterface {
+ public:
+ // Default constructor for the abstract interface.
+ TimerInterface() {}
+
+ // Default destructor for the abstract interface.
+ virtual ~TimerInterface() {}
+
+ // Returns the number of microseconds elapsed in a completed timer.
+ virtual uint64 Microseconds() const = 0;
+
+ // Returns the number of nanoseconds elapsed in a completed timer.
+ virtual uint64 Nanoseconds() const = 0;
+
+ private:
+ SE_DISALLOW_COPY_AND_ASSIGN(TimerInterface);
+};
+
+// Extern functions for constructing platform-specific instances that conform to
+// the StreamExecutor interface. (Defining constructor functions extern in this
+// way prevents CUDA/OpenCL headers from leaking into any shared header files.)
+//
+// TODO(leary) switch this all over to registries.
+
+using StreamExecutorFactory =
+ std::function<StreamExecutorInterface *(const PluginConfig &)>;
+using EventFactory = std::function<EventInterface *(StreamExecutor *)>;
+using StreamFactory = std::function<StreamInterface *(StreamExecutor *)>;
+using TimerFactory = std::function<TimerInterface *(StreamExecutor *)>;
+using KernelFactory = std::function<KernelInterface*()>;
+
+EventFactory* MakeCUDAEventImplementation();
+StreamExecutorFactory* MakeCUDAExecutorImplementation();
+StreamFactory* MakeCUDAStreamImplementation();
+TimerFactory* MakeCUDATimerImplementation();
+KernelFactory* MakeCUDAKernelImplementation();
+
+StreamExecutorFactory* MakeOpenCLExecutorImplementation();
+StreamExecutorFactory* MakeOpenCLAlteraExecutorImplementation();
+StreamFactory* MakeOpenCLStreamImplementation();
+TimerFactory* MakeOpenCLTimerImplementation();
+KernelFactory* MakeOpenCLKernelImplementation();
+
+extern StreamExecutorFactory MakeHostExecutorImplementation;
+extern StreamFactory MakeHostStreamImplementation;
+extern TimerFactory MakeHostTimerImplementation;
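+
+// As an illustration only (the concrete executor type below is hypothetical),
+// a platform implementation typically assigns a callable into its factory slot
+// during static initialization, along the lines of:
+//
+//   *MakeCUDAExecutorImplementation() = [](const PluginConfig &config) {
+//     return new MyCudaExecutor(config);  // hypothetical concrete type
+//   };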
+
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_INTERNAL_H_
diff --git a/tensorflow/stream_executor/stream_executor_pimpl.cc b/tensorflow/stream_executor/stream_executor_pimpl.cc
new file mode 100644
index 0000000000..22b7a50b79
--- /dev/null
+++ b/tensorflow/stream_executor/stream_executor_pimpl.cc
@@ -0,0 +1,642 @@
+// Implements the StreamExecutor interface by passing through to its
+// implementation_ value (in pointer-to-implementation style), which
+// implements StreamExecutorInterface.
+
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+
+#include <atomic>
+
+#include "tensorflow/stream_executor/blas.h"
+#include "tensorflow/stream_executor/fft.h"
+#include "tensorflow/stream_executor/lib/env.h"
+#include "tensorflow/stream_executor/lib/error.h"
+#include "tensorflow/stream_executor/lib/notification.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/lib/threadpool.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/rng.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace {
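+// Build-time stand-in for a --check_gpu_leaks command-line flag: allocation
+// records (see CreateAllocRecord below) are only kept when this is true.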
+bool FLAGS_check_gpu_leaks = false;
+} // namespace
+
+namespace perftools {
+namespace gputools {
+namespace {
+
+// Maximum stack depth to report when generating backtrace on mem allocation
+// (for GPU memory leak checker)
+static const int kMaxStackDepth = 256;
+
+// Make sure the executor is done with its work; we know (because this isn't
+// publicly visible) that all enqueued work is quick.
+void BlockOnThreadExecutor(port::ThreadPool *executor) {
+ port::Notification n;
+ executor->Schedule([&n]() { n.Notify(); });
+ n.WaitForNotification();
+}
+
+internal::StreamExecutorInterface *StreamExecutorImplementationFromPlatformKind(
+ PlatformKind platform_kind, const PluginConfig &plugin_config) {
+ // Note: we use this factory-assignment-in-switch pattern instead of just
+ // invoking the callable in case linkage is messed up -- instead of invoking a
+ // nullptr std::function (due to failed registration) we give a nice
+ // LOG(FATAL) message.
+ internal::StreamExecutorFactory factory;
+ switch (platform_kind) {
+ case PlatformKind::kCuda:
+ factory = *internal::MakeCUDAExecutorImplementation();
+ break;
+ case PlatformKind::kOpenCL:
+ factory = *internal::MakeOpenCLExecutorImplementation();
+ break;
+ case PlatformKind::kOpenCLAltera:
+ factory = *internal::MakeOpenCLAlteraExecutorImplementation();
+ break;
+ case PlatformKind::kHost:
+ factory = internal::MakeHostExecutorImplementation;
+ break;
+ default:
+ factory = nullptr;
+ }
+ if (factory == nullptr) {
+ LOG(FATAL)
+ << "cannot create GPU executor implementation for platform kind: "
+ << PlatformKindString(platform_kind);
+ }
+ return factory(plugin_config);
+}
+
+std::atomic_int_fast64_t correlation_id_generator(0);
+
+} // namespace
+
+template <typename BeginCallT, typename CompleteCallT,
+ typename ReturnT, typename... BeginArgsT>
+class ScopedTracer {
+ public:
+ ScopedTracer(StreamExecutor *stream_exec, BeginCallT begin_call,
+ CompleteCallT complete_call, const ReturnT *result,
+ BeginArgsT... begin_args)
+ : stream_exec_(stream_exec),
+ complete_call_(complete_call),
+ result_(result) {
+ if (stream_exec_->tracing_enabled_) {
+ correlation_id_ =
+ correlation_id_generator.fetch_add(1, std::memory_order_relaxed) - 1;
+ Trace(begin_call, begin_args...);
+ }
+ }
+
+ ~ScopedTracer() {
+ if (stream_exec_->tracing_enabled_) {
+ Trace(complete_call_, result_);
+ }
+ }
+
+ private:
+ template <typename CallbackT, typename... TraceArgsT>
+ void Trace(CallbackT callback, TraceArgsT... args) {
+ {
+ // Instance tracers held in a block to limit the lock lifetime.
+ shared_lock lock{stream_exec_->mu_};
+ for (TraceListener *listener : stream_exec_->listeners_) {
+ (listener->*callback)(correlation_id_,
+ std::forward<TraceArgsT>(args)...);
+ }
+ }
+ }
+
+ StreamExecutor *stream_exec_;
+ CompleteCallT complete_call_;
+ const ReturnT* result_;
+ int64 correlation_id_;
+};
+
+template <typename BeginCallT, typename CompleteCallT, typename ReturnT,
+ typename... BeginArgsT>
+ScopedTracer<BeginCallT, CompleteCallT, ReturnT, BeginArgsT...>
+MakeScopedTracer(StreamExecutor *stream_exec, BeginCallT begin_call,
+ CompleteCallT complete_call, ReturnT *result,
+ BeginArgsT... begin_args) {
+ return ScopedTracer<BeginCallT, CompleteCallT, ReturnT, BeginArgsT...>(
+ stream_exec, begin_call, complete_call, result,
+ std::forward<BeginArgsT>(begin_args)...);
+}
+
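+// SCOPED_TRACE(LOC, ...) below instantiates a ScopedTracer for the enclosing
+// scope. For example, SCOPED_TRACE(TraceListener::BlockHostUntilDone, &result,
+// stream) invokes TraceListener::BlockHostUntilDoneBegin on construction and
+// TraceListener::BlockHostUntilDoneComplete (passing &result) at end of scope,
+// once per registered listener.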
+#define SCOPED_TRACE(LOC, ...) \
+ auto tracer = MakeScopedTracer(this, &LOC ## Begin, \
+ &LOC ## Complete, ## __VA_ARGS__);
+
+/* static */ mutex StreamExecutor::static_mu_{LINKER_INITIALIZED};
+
+StreamExecutor::StreamExecutor(PlatformKind platform_kind,
+ const PluginConfig &plugin_config)
+ : implementation_(StreamExecutorImplementationFromPlatformKind(
+ platform_kind, plugin_config)),
+ platform_kind_(platform_kind),
+ device_ordinal_(-1),
+ background_threads_(new port::ThreadPool(
+ port::Env::Default(), "stream_executor", kNumBackgroundThreads)),
+ live_stream_count_(0),
+ tracing_enabled_(false) {
+ CheckPlatformKindIsValid(platform_kind);
+}
+
+StreamExecutor::StreamExecutor(
+ PlatformKind platform_kind,
+ internal::StreamExecutorInterface *implementation)
+ : implementation_(implementation),
+ platform_kind_(platform_kind),
+ device_ordinal_(-1),
+ background_threads_(new port::ThreadPool(
+ port::Env::Default(), "stream_executor", kNumBackgroundThreads)),
+ live_stream_count_(0),
+ tracing_enabled_(false) {
+ CheckPlatformKindIsValid(platform_kind);
+}
+
+StreamExecutor::~StreamExecutor() {
+ BlockOnThreadExecutor(background_threads_.get());
+
+ if (live_stream_count_.load() != 0) {
+ LOG(WARNING) << "Not all streams were deallocated at executor destruction "
+ << "time. This may lead to unexpected/bad behavior - "
+ << "especially if any stream is still active!";
+ }
+
+ if (FLAGS_check_gpu_leaks) {
+ for (auto it : mem_allocs_) {
+ LOG(INFO) << "Memory alloced at executor exit: addr: "
+ << port::Printf("%p", it.first)
+ << ", bytes: " << it.second.bytes << ", trace: \n"
+ << it.second.stack_trace;
+ }
+ }
+}
+
+port::Status StreamExecutor::Init(int device_ordinal,
+ DeviceOptions device_options) {
+ device_ordinal_ = device_ordinal;
+ return implementation_->Init(device_ordinal, device_options);
+}
+
+port::Status StreamExecutor::Init() {
+ return Init(0, DeviceOptions::Default());
+}
+
+bool StreamExecutor::GetKernel(const MultiKernelLoaderSpec &spec,
+ KernelBase *kernel) {
+ return implementation_->GetKernel(spec, kernel);
+}
+
+void StreamExecutor::Deallocate(DeviceMemoryBase *mem) {
+ VLOG(1) << "Called StreamExecutor::Deallocate(mem=" << mem->opaque()
+ << ") mem->size()=" << mem->size();
+
+ if (mem->opaque() != nullptr) {
+ EraseAllocRecord(mem->opaque());
+ }
+ implementation_->Deallocate(mem);
+ mem->Reset(nullptr, 0);
+}
+
+void StreamExecutor::GetMemAllocs(std::map<void *, AllocRecord> *records_out) {
+ shared_lock lock{mu_};
+ *records_out = mem_allocs_;
+}
+
+bool StreamExecutor::CanEnablePeerAccessTo(StreamExecutor *other) {
+ return implementation_->CanEnablePeerAccessTo(other->implementation_.get());
+}
+
+port::Status StreamExecutor::EnablePeerAccessTo(StreamExecutor *other) {
+ return implementation_->EnablePeerAccessTo(other->implementation_.get());
+}
+
+SharedMemoryConfig StreamExecutor::GetDeviceSharedMemoryConfig() {
+ return implementation_->GetDeviceSharedMemoryConfig();
+}
+
+port::Status StreamExecutor::SetDeviceSharedMemoryConfig(
+ SharedMemoryConfig config) {
+ if (config != SharedMemoryConfig::kDefault &&
+ config != SharedMemoryConfig::kFourByte &&
+ config != SharedMemoryConfig::kEightByte) {
+ string error_msg = port::Printf(
+ "Invalid shared memory config specified: %d", static_cast<int>(config));
+ LOG(ERROR) << error_msg;
+ return port::Status{port::error::INVALID_ARGUMENT, error_msg};
+ }
+ return implementation_->SetDeviceSharedMemoryConfig(config);
+}
+
+const DeviceDescription &StreamExecutor::GetDeviceDescription() const {
+ mutex_lock lock{mu_};
+ if (device_description_ != nullptr) {
+ return *device_description_;
+ }
+
+ device_description_.reset(PopulateDeviceDescription());
+ return *device_description_;
+}
+
+int StreamExecutor::PlatformDeviceCount() const {
+ return implementation_->PlatformDeviceCount();
+}
+
+bool StreamExecutor::SupportsBlas() const {
+ return implementation_->SupportsBlas();
+}
+
+bool StreamExecutor::SupportsRng() const {
+ return implementation_->SupportsRng();
+}
+
+bool StreamExecutor::SupportsDnn() const {
+ return implementation_->SupportsDnn();
+}
+
+dnn::DnnSupport *StreamExecutor::AsDnn() {
+ mutex_lock lock{mu_};
+ if (dnn_ != nullptr) {
+ return dnn_.get();
+ }
+
+ dnn_.reset(implementation_->CreateDnn());
+ return dnn_.get();
+}
+
+blas::BlasSupport *StreamExecutor::AsBlas() {
+ mutex_lock lock{mu_};
+ if (blas_ != nullptr) {
+ return blas_.get();
+ }
+
+ blas_.reset(implementation_->CreateBlas());
+ return blas_.get();
+}
+
+fft::FftSupport *StreamExecutor::AsFft() {
+ mutex_lock lock{mu_};
+ if (fft_ != nullptr) {
+ return fft_.get();
+ }
+
+ fft_.reset(implementation_->CreateFft());
+ return fft_.get();
+}
+
+rng::RngSupport *StreamExecutor::AsRng() {
+ mutex_lock lock{mu_};
+ if (rng_ != nullptr) {
+ return rng_.get();
+ }
+
+ rng_.reset(implementation_->CreateRng());
+ return rng_.get();
+}
+
+bool StreamExecutor::Launch(Stream *stream, const ThreadDim &thread_dims,
+ const BlockDim &block_dims,
+ const KernelBase &kernel,
+ const std::vector<KernelArg> &args) {
+ SubmitTrace(&TraceListener::LaunchSubmit, stream, thread_dims, block_dims,
+ kernel, args);
+
+ return implementation_->Launch(stream, thread_dims, block_dims, kernel, args);
+}
+
+bool StreamExecutor::BlockHostUntilDone(Stream *stream) {
+ bool result;
+ SCOPED_TRACE(TraceListener::BlockHostUntilDone, &result, stream);
+
+ result = implementation_->BlockHostUntilDone(stream);
+ return result;
+}
+
+void *StreamExecutor::Allocate(uint64 size) {
+ void *buf = implementation_->Allocate(size);
+ VLOG(1) << "Called StreamExecutor::Allocate(size=" << size
+ << ") returns " << buf;
+ CreateAllocRecord(buf, size);
+
+ return buf;
+}
+
+bool StreamExecutor::GetSymbol(const string &symbol_name, void **mem,
+ size_t *bytes) {
+ return implementation_->GetSymbol(symbol_name, mem, bytes);
+}
+
+void *StreamExecutor::HostMemoryAllocate(uint64 size) {
+ void *buffer = implementation_->HostMemoryAllocate(size);
+ VLOG(1) << "Called StreamExecutor::HostMemoryAllocate(size=" << size
+ << ") returns " << buffer;
+ return buffer;
+}
+
+void StreamExecutor::HostMemoryDeallocate(void *location) {
+ VLOG(1) << "Called StreamExecutor::HostMemoryDeallocate(location="
+ << location << ")";
+
+ return implementation_->HostMemoryDeallocate(location);
+}
+
+bool StreamExecutor::HostMemoryRegister(void *location, uint64 size) {
+ VLOG(1) << "Called StreamExecutor::HostMemoryRegister(location=" << location
+ << ", size=" << size << ")";
+ if (location == nullptr || size == 0) {
+ LOG(WARNING) << "attempting to register null or zero-sized memory: "
+ << location << "; size " << size;
+ }
+ return implementation_->HostMemoryRegister(location, size);
+}
+
+bool StreamExecutor::HostMemoryUnregister(void *location) {
+ VLOG(1) << "Called StreamExecutor::HostMemoryUnregister(location=" << location
+ << ")";
+ return implementation_->HostMemoryUnregister(location);
+}
+
+bool StreamExecutor::SynchronizeAllActivity() {
+ VLOG(1) << "Called StreamExecutor::SynchronizeAllActivity()";
+ bool ok = implementation_->SynchronizeAllActivity();
+
+ // This should all be quick and infallible work, so we can perform the
+ // synchronization even in the case of failure.
+ BlockOnThreadExecutor(background_threads_.get());
+
+ return ok;
+}
+
+bool StreamExecutor::SynchronousMemZero(DeviceMemoryBase *location,
+ uint64 size) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemZero(location="
+ << location << ", size=" << size << ")";
+
+ return implementation_->SynchronousMemZero(location, size);
+}
+
+bool StreamExecutor::SynchronousMemSet(DeviceMemoryBase *location, int value,
+ uint64 size) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemSet(location="
+ << location << ", value=" << value << ", size=" << size << ")";
+
+ return implementation_->SynchronousMemSet(location, value, size);
+}
+
+bool StreamExecutor::SynchronousMemcpy(DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemcpy(gpu_dst="
+ << gpu_dst->opaque() << ", host_src=" << host_src << ", size=" << size
+ << ") H2D";
+
+ // Tracing overloaded methods is very difficult due to issues with type
+ // inference on template args. Since use of these overloaded methods is
+ // discouraged anyway, this isn't a huge deal.
+ return implementation_->SynchronousMemcpy(gpu_dst, host_src, size);
+}
+
+bool StreamExecutor::SynchronousMemcpy(void *host_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemcpy(host_dst="
+ << host_dst << ", gpu_src=" << gpu_src.opaque() << ", size=" << size
+ << ") D2H";
+
+ return implementation_->SynchronousMemcpy(host_dst, gpu_src, size);
+}
+
+bool StreamExecutor::SynchronousMemcpy(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemcpy(gpu_dst="
+ << gpu_dst->opaque() << ", gpu_src=" << gpu_src.opaque() << ", size=" << size
+ << ") D2D";
+
+ return implementation_->SynchronousMemcpyDeviceToDevice(gpu_dst, gpu_src,
+ size);
+}
+
+port::Status StreamExecutor::SynchronousMemcpyD2H(
+ const DeviceMemoryBase &gpu_src, int64 size, void *host_dst) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemcpyD2H(gpu_src="
+ << gpu_src.opaque() << ", size=" << size << ", host_dst=" << host_dst << ")";
+
+ port::Status result{port::Status::OK()};
+ SCOPED_TRACE(TraceListener::SynchronousMemcpyD2H,
+ &result, gpu_src, size, host_dst);
+
+ if (!implementation_->SynchronousMemcpy(host_dst, gpu_src, size)) {
+ return port::Status{
+ port::error::INTERNAL,
+ port::Printf(
+ "failed to synchronously memcpy device-to-host: GPU %p to host %p "
+ "size %lld",
+ gpu_src.opaque(), host_dst, size)};
+ }
+
+ return result;
+}
+
+port::Status StreamExecutor::SynchronousMemcpyH2D(const void *host_src,
+ int64 size,
+ DeviceMemoryBase *gpu_dst) {
+ VLOG(1) << "Called StreamExecutor::SynchronousMemcpyH2D(host_src="
+          << host_src << ", size=" << size << ", gpu_dst=" << gpu_dst->opaque() << ")";
+
+ port::Status result{port::Status::OK()};
+ SCOPED_TRACE(TraceListener::SynchronousMemcpyH2D,
+ &result, host_src, size, gpu_dst);
+
+ if (!implementation_->SynchronousMemcpy(gpu_dst, host_src, size)) {
+ result = port::Status{
+ port::error::INTERNAL,
+ port::Printf("failed to synchronously memcpy host-to-device: host "
+ "%p to GPU %p size %lld",
+ host_src, gpu_dst->opaque(), size)};
+ }
+
+ return result;
+}
+
+bool StreamExecutor::Memcpy(Stream *stream, void *host_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size) {
+ return implementation_->Memcpy(stream, host_dst, gpu_src, size);
+}
+
+bool StreamExecutor::Memcpy(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const void *host_src, uint64 size) {
+ return implementation_->Memcpy(stream, gpu_dst, host_src, size);
+}
+
+bool StreamExecutor::MemcpyDeviceToDevice(Stream *stream,
+ DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) {
+ return implementation_->MemcpyDeviceToDevice(stream, gpu_dst, gpu_src, size);
+}
+
+bool StreamExecutor::MemZero(Stream *stream, DeviceMemoryBase *location,
+ uint64 size) {
+ return implementation_->MemZero(stream, location, size);
+}
+
+bool StreamExecutor::Memset32(Stream *stream, DeviceMemoryBase *location,
+ uint32 pattern, uint64 size) {
+ CHECK_EQ(0, size % 4)
+ << "need 32-bit multiple size to fill with 32-bit pattern";
+ return implementation_->Memset32(stream, location, pattern, size);
+}
+
+bool StreamExecutor::HostCallback(Stream *stream,
+ std::function<void()> callback) {
+ return implementation_->HostCallback(stream, callback);
+}
+
+port::Status StreamExecutor::AllocateEvent(Event *event) {
+ return implementation_->AllocateEvent(event);
+}
+
+port::Status StreamExecutor::DeallocateEvent(Event *event) {
+ return implementation_->DeallocateEvent(event);
+}
+
+port::Status StreamExecutor::RecordEvent(Stream *stream, Event *event) {
+ return implementation_->RecordEvent(stream, event);
+}
+
+port::Status StreamExecutor::WaitForEvent(Stream *stream, Event *event) {
+ return implementation_->WaitForEvent(stream, event);
+}
+
+Event::Status StreamExecutor::PollForEventStatus(Event *event) {
+ return implementation_->PollForEventStatus(event);
+}
+
+bool StreamExecutor::AllocateStream(Stream *stream) {
+ live_stream_count_.fetch_add(1, std::memory_order_relaxed);
+ if (!implementation_->AllocateStream(stream)) {
+ auto count = live_stream_count_.fetch_sub(1);
+ CHECK_GE(count, 0) << "live stream count should not dip below zero";
+ LOG(INFO) << "failed to allocate stream; live stream count: " << count;
+ return false;
+ }
+
+ return true;
+}
+
+void StreamExecutor::DeallocateStream(Stream *stream) {
+ implementation_->DeallocateStream(stream);
+ CHECK_GE(live_stream_count_.fetch_sub(1), 0)
+ << "live stream count should not dip below zero";
+}
+
+bool StreamExecutor::CreateStreamDependency(Stream *dependent, Stream *other) {
+ return implementation_->CreateStreamDependency(dependent, other);
+}
+
+bool StreamExecutor::AllocateTimer(Timer *timer) {
+ return implementation_->AllocateTimer(timer);
+}
+
+void StreamExecutor::DeallocateTimer(Timer *timer) {
+ return implementation_->DeallocateTimer(timer);
+}
+
+bool StreamExecutor::StartTimer(Stream *stream, Timer *timer) {
+ return implementation_->StartTimer(stream, timer);
+}
+
+bool StreamExecutor::StopTimer(Stream *stream, Timer *timer) {
+ return implementation_->StopTimer(stream, timer);
+}
+
+DeviceDescription *StreamExecutor::PopulateDeviceDescription() const {
+ return implementation_->PopulateDeviceDescription();
+}
+
+bool StreamExecutor::DeviceMemoryUsage(int64 *free, int64 *total) const {
+ return implementation_->DeviceMemoryUsage(free, total);
+}
+
+KernelArg StreamExecutor::DeviceMemoryToKernelArg(
+ const DeviceMemoryBase &gpu_mem) const {
+ return implementation_->DeviceMemoryToKernelArg(gpu_mem);
+}
+
+void StreamExecutor::EnqueueOnBackgroundThread(std::function<void()> task) {
+ background_threads_->Schedule(task);
+}
+
+void StreamExecutor::CreateAllocRecord(void *opaque, uint64 bytes) {
+ if (FLAGS_check_gpu_leaks && opaque != nullptr && bytes != 0) {
+ mutex_lock lock{mu_};
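+    // Note: the stack-trace slot of the record is left empty here; only the
+    // allocation size is captured.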
+    mem_allocs_[opaque] = AllocRecord{bytes, ""};
+ }
+}
+
+void StreamExecutor::EraseAllocRecord(void *opaque) {
+ if (FLAGS_check_gpu_leaks && opaque != nullptr) {
+ mutex_lock lock{mu_};
+ if (mem_allocs_.find(opaque) == mem_allocs_.end()) {
+ LOG(ERROR) << "Deallocating unknown pointer: "
+ << port::Printf("0x%p", opaque);
+ } else {
+ mem_allocs_.erase(opaque);
+ }
+ }
+}
+
+void StreamExecutor::EnableTracing(bool enabled) { tracing_enabled_ = enabled; }
+
+void StreamExecutor::RegisterTraceListener(TraceListener *listener) {
+ {
+ mutex_lock lock{mu_};
+ if (listeners_.find(listener) != listeners_.end()) {
+ LOG(INFO) << "Attempt to register already-registered listener, "
+ << listener;
+ } else {
+ listeners_.insert(listener);
+ }
+ }
+
+ implementation_->RegisterTraceListener(listener);
+}
+
+bool StreamExecutor::UnregisterTraceListener(TraceListener *listener) {
+ {
+ mutex_lock lock{mu_};
+ if (listeners_.find(listener) == listeners_.end()) {
+ LOG(INFO) << "Attempt to unregister unknown listener, " << listener;
+ return false;
+ }
+ listeners_.erase(listener);
+ }
+
+ implementation_->UnregisterTraceListener(listener);
+ return true;
+}
+
+template <typename TraceCallT, typename... ArgsT>
+void StreamExecutor::SubmitTrace(TraceCallT trace_call, ArgsT &&... args) {
+ if (tracing_enabled_) {
+ {
+ // instance tracers held in a block to limit the lock lifetime.
+ shared_lock lock{mu_};
+ for (TraceListener *listener : listeners_) {
+ (listener->*trace_call)(std::forward<ArgsT>(args)...);
+ }
+ }
+ }
+}
+
+internal::StreamExecutorInterface *StreamExecutor::implementation() {
+ return implementation_->GetUnderlyingExecutor();
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/stream_executor_pimpl.h b/tensorflow/stream_executor/stream_executor_pimpl.h
new file mode 100644
index 0000000000..29ab235d0e
--- /dev/null
+++ b/tensorflow/stream_executor/stream_executor_pimpl.h
@@ -0,0 +1,725 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_PIMPL_H_
+#define TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_PIMPL_H_
+
+#include <atomic>
+#include <set>
+#include <tuple>
+#include <vector>
+
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/lib/strcat.h"
+#include "tensorflow/stream_executor/lib/threadpool.h"
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/port.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/rng.h"
+#include "tensorflow/stream_executor/shared_memory_config.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+#include "tensorflow/stream_executor/trace_listener.h"
+
+namespace perftools {
+namespace gputools {
+
+// Structure used for device memory leak checking.
+struct AllocRecord {
+ // The requested allocation size of the buffer.
+ uint64 bytes;
+
+ // Holds a representation of the stack at the time the associated buffer was
+ // allocated. Produced in a form described in
+ // //util/symbolize/symbolized_stacktrace.h.
+ string stack_trace;
+};
+
+// Forward declaration of private friend class.
+template <typename BeginCallT, typename CompleteCallT,
+ typename ReturnT, typename... BeginArgsT>
+class ScopedTracer;
+
+// A StreamExecutor manages a single device, in terms of executing work (kernel
+// launches) and memory management (allocation/deallocation, memory copies to
+// and from the device). It is conceptually the "handle" for a device -- Stream
+// objects, which are used to enqueue work to run on the coprocessor, have a
+// StreamExecutor instance as their "parent" object.
+//
+// StreamExecutor objects have an underlying platform that is specified up
+// front; e.g. it is either a CUDA or OpenCL executor.
+//
+// Thread-safe after initialization.
+// StreamExecutor interface should not be invoked from a signal handler.
+class StreamExecutor {
+ public:
+ explicit StreamExecutor(PlatformKind kind,
+ const PluginConfig &plugin_config = PluginConfig());
+
+ // Primarily used for testing.
+ StreamExecutor(PlatformKind kind,
+ internal::StreamExecutorInterface *implementation);
+
+ ~StreamExecutor();
+
+ port::Status Init();
+ port::Status Init(int device_ordinal, DeviceOptions device_options);
+
+ // Returns the platform that this StreamExecutor is acting upon.
+ PlatformKind platform_kind() const { return platform_kind_; }
+
+ // Retrieves (loads) a kernel for the platform this StreamExecutor is acting
+ // upon, if one exists.
+ //
+ // Parameters:
+ // spec: The MultiKernelLoaderSpec is usually generated as a compile-time
+ // constant into an appropriate namespace. For example, see
+ // perftools::gputools::executor_sample::kKernelLoaderSpecs, from which a
+ // MultiKernelLoaderSpec is selected.
+ // kernel: Outparam that the kernel is loaded into. A given Kernel
+ // instantiation should not be loaded into more than once.
+ //
+ // If an error occurs, or there is no kernel available for the StreamExecutor
+ // platform, false is returned.
+ bool GetKernel(const MultiKernelLoaderSpec &spec, KernelBase *kernel);
+
+ // Synchronously allocates an array on the GPU device of type T with
+ // element_count elements.
+ template <typename T>
+ DeviceMemory<T> AllocateArray(uint64 element_count);
+
+ // As AllocateArray(), but returns a ScopedDeviceMemory<T>.
+ template <typename T>
+ ScopedDeviceMemory<T> AllocateOwnedArray(uint64 element_count) {
+ return ScopedDeviceMemory<T>(this, AllocateArray<T>(element_count));
+ }
+
+ // Convenience wrapper that allocates space for a single element of type T
+ // in GPU memory.
+ template <typename T>
+ DeviceMemory<T> AllocateScalar() {
+ return AllocateArray<T>(1);
+ }
+
+ // As AllocateScalar(), but returns a ScopedDeviceMemory<T>.
+ template <typename T>
+ ScopedDeviceMemory<T> AllocateOwnedScalar() {
+ return AllocateOwnedArray<T>(1);
+ }
+
+ // Synchronously allocates a scalar of type T on the GPU device that is
+ // (POD) zero-byte initialized.
+ template <typename T>
+ DeviceMemory<T> AllocateZeroed();
+
+ // As AllocateZeroed(), but returns a ScopedDeviceMemory<T>.
+ template <typename T>
+ ScopedDeviceMemory<T> AllocateOwnedZeroed() {
+ return ScopedDeviceMemory<T>(this, AllocateZeroed<T>());
+ }
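+
+  // A typical allocation round-trip, as a sketch ("executor" is assumed to be
+  // an initialized StreamExecutor and "host_data" to point at 1024 floats of
+  // host memory):
+  //
+  //   DeviceMemory<float> device_buf = executor.AllocateArray<float>(1024);
+  //   executor.SynchronousMemcpyH2D(host_data, 1024 * sizeof(float),
+  //                                 &device_buf);
+  //   executor.Deallocate(&device_buf);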
+
+ // Allocate a memory region inside another allocated memory region.
+ // Offset and size are specified in terms of T elements.
+ // Warning: Do not free a parent buffer before its sub-buffers; this may cause
+ // use-after-free issues (the specific behavior is not consistent across
+ // platforms).
+ // - Note: OpenCL uses refcounting to manage buffer lifetimes, so use of a
+ // sub-buffer after parent deallocation is expected to be safe. This will
+ // render your code non-platform-portable, however.
+ template <typename T>
+ DeviceMemory<T> AllocateSubBuffer(DeviceMemory<T> *parent,
+ uint64 element_offset,
+ uint64 element_count);
+
+ // As AllocateSubBuffer(), but returns a ScopedDeviceMemory<T>.
+ template <typename T>
+ ScopedDeviceMemory<T> AllocateOwnedSubBuffer(DeviceMemory<T> *parent,
+ uint64 element_offset,
+ uint64 element_count) {
+ return ScopedDeviceMemory<T>(
+ this, AllocateSubBuffer<T>(parent, element_offset, element_count));
+ }
+
+ // Finds a symbol and returns device memory allocated to the symbol. The
+ // symbol is searched in any kernels that were previously loaded through
+ // GetKernel() before the GetSymbol() call. The user has to make sure that the
+ // type of symbol and T match.
+ // - Note: symbol_name should include its namespace as well. For example,
+ // pass "nms0::symbol" if referring to nms0::symbol.
+ template <typename T>
+ port::StatusOr<DeviceMemory<T>> GetSymbol(const string &symbol_name);
+
+ // Deallocate the DeviceMemory previously allocated via this interface.
+ // Deallocation of a nullptr-representative value is permitted.
+ //
+ // Resets the internal contents of mem to be null-representative, but this
+ // null-out effect should not be relied upon in client code.
+ void Deallocate(DeviceMemoryBase *mem);
+
+ // Retrieves a mapping of active opaque GPU memory pointer to a string
+ // representation of the [allocating thread's] stack at the time the pointer
+ // was allocated. Useful for tracking GPU memory leaks.
+ //
+ // Note: this will only be populated if --check_gpu_leaks flag is activated.
+ void GetMemAllocs(std::map<void *, AllocRecord> *records_out);
+
+ // Allocates a region of host memory and registers it with the platform API.
+ // Memory allocated in this manner (or allocated and registered with
+  // HostMemoryRegister()) is required for use in asynchronous memcpy operations,
+ // such as Stream::ThenMemcpy.
+ void *HostMemoryAllocate(uint64 bytes);
+
+ // Deallocates a region of host memory allocated by HostMemoryAllocate().
+ void HostMemoryDeallocate(void *location);
+
+ // Registers a region of host memory with the platform API. Registered memory
+ // (or memory allocated with HostMemoryAllocate) is required for use with
+ // asynchronous memcpy operations, such as Stream::ThenMemcpy. This method
+ // is used to register memory allocated outside the StreamExecutor;
+ // HostMemoryAllocate implicitly registers its allocations and
+ // HostMemoryDeallocate implicitly deregisters on deallocation.
+ bool HostMemoryRegister(void *location, uint64 size) SE_MUST_USE_RESULT;
+
+ // Unregisters a region of host memory registered with HostMemoryRegister.
+ // This should be done before deallocating the region with delete[]/free/etc.
+ bool HostMemoryUnregister(void *location) SE_MUST_USE_RESULT;
+
+  // Synchronizes all activity occurring in the StreamExecutor's context (most
+ // likely a whole device).
+ bool SynchronizeAllActivity() SE_MUST_USE_RESULT;
+
+ // Blocks the caller while "size" bytes are zeroed out (in POD fashion) at the
+ // given location in GPU memory.
+ bool SynchronousMemZero(DeviceMemoryBase *location,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // Blocks the caller while "size" bytes are initialized to "value" (in POD
+ // fashion) at the given location in GPU memory.
+ bool SynchronousMemSet(DeviceMemoryBase *location, int value,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // [deprecated] Blocks the caller while a data segment of the given size is
+ // copied from the host source to the GPU destination.
+ //
+ // Deprecation: prefer explicit H2D below, to avoid error-prone API usage.
+ bool SynchronousMemcpy(DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // [deprecated] Blocks the caller while a data segment of the given size is
+ // copied from the GPU source to the host destination.
+ //
+ // Deprecation: prefer explicit D2H below, to avoid error-prone API usage.
+ bool SynchronousMemcpy(void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // Same as SynchronousMemcpy(DeviceMemoryBase*, ...) above.
+ port::Status SynchronousMemcpyH2D(const void *host_src, int64 size,
+ DeviceMemoryBase *gpu_dst);
+
+ // Alternative interface for memcpying from host to device that takes an
+  // array slice. Checks that the destination size can accommodate the host
+ // slice size.
+ template <class T>
+ port::Status SynchronousMemcpyH2D(port::ArraySlice<T> host_src,
+ DeviceMemoryBase *gpu_dst) {
+ auto host_size = host_src.size() * sizeof(T);
+ CHECK(gpu_dst->size() == 0 || gpu_dst->size() >= host_size);
+ return SynchronousMemcpyH2D(host_src.begin(), host_size, gpu_dst);
+ }
+
+ // Same as SynchronousMemcpy(void*, ...) above.
+ port::Status SynchronousMemcpyD2H(const DeviceMemoryBase &gpu_src, int64 size,
+ void *host_dst);
+
+ // Alternative interface for memcpying from device to host that takes an
+  // array slice. Checks that the host destination can accommodate the device
+  // source size.
+ template <typename T>
+ port::Status SynchronousMemcpyD2H(const DeviceMemory<T> &gpu_src,
+ port::MutableArraySlice<T> host_dst) {
+ auto host_size = host_dst.size() * sizeof(T);
+ CHECK(gpu_src.size() == 0 || host_size >= gpu_src.size());
+ return SynchronousMemcpyD2H(gpu_src, host_size, host_dst.begin());
+ }
+
+ // Blocks the caller while a data segment of the given size is copied from the
+ // GPU source to the GPU destination.
+ bool SynchronousMemcpy(DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // Enqueues an operation onto stream to zero out size bytes at the given GPU
+ // memory location. Neither stream nor location may be null. Returns whether
+ // the operation was successfully enqueued onto the stream.
+ bool MemZero(Stream *stream, DeviceMemoryBase *location,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // Enqueues an operation onto stream to set 32-bit patterns starting at
+  // location, for byte count given by size. The size must be evenly divisible
+  // by 4. Returns whether the operation was
+ // successfully enqueued onto the stream.
+ bool Memset32(Stream *stream, DeviceMemoryBase *location, uint32 pattern,
+ uint64 size) SE_MUST_USE_RESULT;
+
+ // Enables peer access from this StreamExecutor to memory
+ // allocated by other, such that launched device code, memcpies, etc may
+ // access it directly.
+ //
+  // Both this StreamExecutor and other must be backed by the same platform
+  // implementation (as in CUDA vs OpenCL).
+ port::Status EnablePeerAccessTo(StreamExecutor *other);
+
+  // Returns whether it's possible to enable peer access from this
+  // StreamExecutor to memory allocated by another.
+ //
+ // Even when this returns true, EnablePeerAccessTo may fail for other reasons;
+ // this is more an up-front test as to whether it's expressly forbidden.
+ bool CanEnablePeerAccessTo(StreamExecutor *other);
+
+ // Gets the preferred shared memory configuration for the device to which this
+ // executor is bound.
+ SharedMemoryConfig GetDeviceSharedMemoryConfig();
+
+ // Sets the preferred shared memory configuration for the device to which this
+ // executor is bound.
+ port::Status SetDeviceSharedMemoryConfig(SharedMemoryConfig config);
+
+ // Obtains metadata about the underlying device.
+ // The value is cached on first use.
+ const DeviceDescription &GetDeviceDescription() const;
+
+ // Returns the underlying device memory usage information, if it is available.
+ // If it is not available (false is returned), free/total may not be
+ // initialized.
+ //
+ // Note: "Free" reflects the amount of free memory on the underlying device,
+  // so allocations via other StreamExecutors that have the same underlying
+  // device will be reflected in "free".
+ bool DeviceMemoryUsage(int64 *free, int64 *total) const;
+
+ // The device count reported by this StreamExecutor's platform.
+ // Note: on OpenCL we implicitly select platform zero at the moment.
+ int PlatformDeviceCount() const;
+
+ // Returns whether the StreamExecutor supports BLAS routines for the platform
+ // that underlies this interface.
+ bool SupportsBlas() const;
+
+ // Returns whether the StreamExecutor supports FFT routines for the platform
+ // that underlies this interface.
+ bool SupportsFft() const;
+
+ // Returns whether the StreamExecutor supports RNG routines for the platform
+ // that underlies this interface.
+ bool SupportsRng() const;
+
+  // Returns whether the StreamExecutor supports neural net routines for the
+ // platform that underlies this interface.
+ bool SupportsDnn() const;
+
+ // Returns the device ordinal that this StreamExecutor was initialized with.
+ // Meaningless before initialization.
+ int device_ordinal() const { return device_ordinal_; }
+
+ // Returns a borrowed pointer to the underlying StreamExecutor implementation.
+ internal::StreamExecutorInterface *implementation();
+
+ // Warning: use Stream::ThenLaunch instead, this method is not for general
+ // consumption. However, this is the only way to launch a kernel for which
+ // the type signature is only known at runtime; say, if an application
+ // supports loading/launching kernels with arbitrary type signatures.
+ // In this case, the application is expected to know how to do parameter
+ // packing that obeys the contract of the underlying platform implementation.
+ //
+ // Launches a data parallel kernel with the given thread/block
+ // dimensionality and already-packed args/sizes to pass to the underlying
+ // platform driver.
+ //
+ // This is called by Stream::Launch() to delegate to the platform's launch
+ // implementation in StreamExecutorInterface::Launch().
+ bool Launch(Stream *stream, const ThreadDim &thread_dims,
+ const BlockDim &block_dims, const KernelBase &kernel,
+ const std::vector<KernelArg> &args);
+
+ // Gets-or-creates (creates with memoization) a FftSupport datatype that can
+ // be used to execute FFT routines on the current platform.
+ //
+  // Ownership and user-facing behavior are the same as for AsBlas() below.
+ //
+ // Returns null if there was an error initializing the FFT support for the
+ // underlying platform.
+ fft::FftSupport *AsFft();
+
+ // Gets-or-creates (creates with memoization) a DnnSupport datatype that can
+ // be used for neural network routines on the current platform.
+ //
+  // Ownership and user-facing behavior are the same as for AsBlas() below.
+ //
+ // Returns null if there was an error initializing the DNN support for the
+ // underlying platform.
+ dnn::DnnSupport *AsDnn();
+
+ // Turns StreamExecutor operation tracing on or off.
+ void EnableTracing(bool enable);
+
+ // Registers a trace listener to receive callbacks for only a single
+ // StreamExecutor instance.
+ // To register a listener for all executors for a given platform, see
+ // Platform::RegisterTraceListener().
+ // Does not take ownership of listener.
+ void RegisterTraceListener(TraceListener* listener);
+
+ // Removes a TraceListener from this StreamExecutor instance.
+ // Returns false (and logs) in cases where the argument listener was not
+ // previously registered.
+ bool UnregisterTraceListener(TraceListener* listener);
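+
+  // A typical tracing setup, as a sketch ("MyListener" stands for a
+  // hypothetical TraceListener subclass):
+  //
+  //   MyListener listener;
+  //   executor.RegisterTraceListener(&listener);
+  //   executor.EnableTracing(true);
+  //   ...  // work submitted here is reported to the listener
+  //   executor.EnableTracing(false);
+  //   executor.UnregisterTraceListener(&listener);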
+
+ // Converts a DeviceMemory object into a KernelArg object for passing to the
+ // device driver for kernel launch.
+ KernelArg DeviceMemoryToKernelArg(const DeviceMemoryBase &gpu_mem) const;
+
+ private:
+ template <typename BeginCallT, typename CompleteCallT,
+ typename ReturnT, typename... BeginArgsT>
+ friend class ScopedTracer;
+ friend class Event;
+ friend class Stream;
+ friend class Timer;
+ template <typename... Params>
+ friend class TypedKernel;
+ template <typename... Args>
+ friend struct ThenBlasImpl;
+
+ // Gets-or-creates (creates with memoization) a BlasSupport datatype that can
+ // be used to execute BLAS routines on the current platform. This is typically
+ // not user-facing, as users will use the Stream::ThenBlas* family of routines
+ // to entrain BLAS operations. See blas.h for additional details.
+ //
+ // Ownership is not transferred to the caller -- ownership is retained by this
+ // object for memoization. This BLAS interface is also only expected to be
+ // used by a Stream for entraining calls to BLAS functionality.
+ //
+ // Returns null if there was an error initializing the BLAS support for the
+ // underlying platform.
+ blas::BlasSupport *AsBlas();
+
+ // Gets-or-creates (creates with memoization) an RngSupport datatype that can
+ // be used for random-number-generation routines on the current platform.
+ //
+  // Ownership and user-facing behavior are the same as for AsBlas() above.
+ //
+ // Returns null if there was an error initializing the RNG support for the
+ // underlying platform.
+ rng::RngSupport *AsRng();
+
+ // Causes the host code to synchronously wait for operations entrained onto
+ // stream to complete. Effectively a join on the asynchronous GPU operations
+ // enqueued on the stream before this program point.
+ bool BlockHostUntilDone(Stream *stream);
+
+ // Synchronously allocates size bytes on the underlying platform and returns
+ // an opaque void* representing that allocation. In the case of failure,
+ // nullptr is returned.
+ void *Allocate(uint64 size);
+
+ // Finds and retrieves device memory for the symbol on the underlying
+ // platform.
+ bool GetSymbol(const string& symbol_name, void **mem, size_t *bytes);
+
+ // Entrains a memcpy operation onto stream, with a host destination location
+ // host_dst and a GPU memory source, with target size size.
+ bool Memcpy(Stream *stream, void *host_dst, const DeviceMemoryBase &gpu_src,
+ uint64 size);
+
+ // Entrains a memcpy operation onto stream, with a GPU destination location
+ // and a host memory source, with target size size.
+ bool Memcpy(Stream *stream, DeviceMemoryBase *gpu_dst, const void *host_src,
+ uint64 size);
+
+ // Entrains a memcpy operation onto stream, with a GPU destination location
+ // and a GPU source location, with target size size. Peer access should have
+ // been enabled between the StreamExecutors owning the GPU memory regions.
+ bool MemcpyDeviceToDevice(Stream *stream, DeviceMemoryBase *gpu_dst,
+ const DeviceMemoryBase &gpu_src, uint64 size);
+
+ // Entrains on a stream a user-specified function to be run on the host.
+ // See Stream::ThenDoHostCallback for full details.
+ bool HostCallback(Stream *stream, std::function<void()> callback);
+
+ // Performs platform-specific allocation and initialization of an event.
+ port::Status AllocateEvent(Event *event);
+
+ // Performs platform-specific deallocation and cleanup of an event.
+ port::Status DeallocateEvent(Event *event);
+
+ // Inserts the specified event at the end of the specified stream.
+ port::Status RecordEvent(Stream *stream, Event *event);
+
+ // Wait for the specified event at the end of the specified stream.
+ port::Status WaitForEvent(Stream *stream, Event *event);
+
+ // Requests the current status of the event from the underlying platform.
+ Event::Status PollForEventStatus(Event *event);
+
+ // Allocates stream resources on the underlying platform for subject and
+ // initializes its internals.
+ bool AllocateStream(Stream *subject);
+
+ // Deallocates stream resources on the underlying platform.
+ void DeallocateStream(Stream *subject);
+
+ // Causes dependent to not begin execution until other has finished its
+ // last-enqueued work.
+ bool CreateStreamDependency(Stream *dependent, Stream *other);
+
+ // Allocates timer resources on the underlying platform for subject and
+ // initializes its internals.
+ bool AllocateTimer(Timer *subject);
+
+ // Deallocates timer resources on the underlying platform.
+ void DeallocateTimer(Timer *subject);
+
+ // Records a start event for an interval timer.
+ bool StartTimer(Stream *stream, Timer *timer);
+
+ // Records a stop event for an interval timer.
+ bool StopTimer(Stream *stream, Timer *timer);
+
+ // Allocates a new metadata object, appropriately populated, on the heap, with
+ // ownership transfer to caller.
+ DeviceDescription *PopulateDeviceDescription() const;
+
+ // Adds a task to the port::ThreadPool work queue. These tasks must be
+ // fire-and-forget and have no external data or timing dependencies; their
+ // execution order and completion time have no guarantees.
+ // For an example of an appropriate task, see HostBlas::DoBlasGemmInternal;
+ // there, temporary internal buffers are freed using this method.
+ void EnqueueOnBackgroundThread(std::function<void()> task);
+
+ // Adds an AllocRecord for 'opaque' of size 'bytes' to the record map, for
+ // leak checking. NULL buffer pointers and buffer sizes of 0 will not be
+ // tracked.
+ void CreateAllocRecord(void *opaque, uint64 size);
+
+ // Removes the AllocRecord keyed by 'opaque' from the record map. NULL
+ // pointers will not be erased (as they're not tracked, per above).
+ void EraseAllocRecord(void *opaque);
+
+ // Calls the relevant TraceListener routine to begin tracing for the specified
+ // asynchronous method.
+ template <typename TraceCallT, typename... ArgsT>
+ void SubmitTrace(TraceCallT trace_call, ArgsT&&... args);
+
+ // Reader/writer lock for class-static StreamExecutor members.
+ static mutex static_mu_;
+
+ // Reader/writer lock for mutable data structures on this StreamExecutor.
+ //
+ // Mutable so that caching functions (like DeviceDescription, AsBlas, etc.)
+ // can acquire the lock on their first (mutating) call as well.
+ mutable mutex mu_;
+
+ // A mapping of pointer (to GPU memory) to string representation of the stack
+ // (of the allocating thread) at the time at which the pointer was allocated.
+ std::map<void *, AllocRecord> mem_allocs_ GUARDED_BY(mu_);
+
+ // Pointer to the platform-specific-interface implementation. This is
+ // delegated to by the interface routines in pointer-to-implementation
+ // fashion.
+ std::unique_ptr<internal::StreamExecutorInterface> implementation_;
+
+ // Memoized BLAS support object -- we only want to create this once when asked
+ // for a BLAS interface.
+ std::unique_ptr<blas::BlasSupport> blas_ GUARDED_BY(mu_);
+
+ // Memoized DNN support object -- we only want to create this once when asked
+ // for a DNN interface.
+ std::unique_ptr<dnn::DnnSupport> dnn_ GUARDED_BY(mu_);
+
+ // Memoized FFT support object -- we only want to create this once when asked
+ // for an FFT interface.
+ std::unique_ptr<fft::FftSupport> fft_;
+
+ // Memoized RNG support object -- we only want to create this once when asked
+ // for an RNG interface.
+ std::unique_ptr<rng::RngSupport> rng_ GUARDED_BY(mu_);
+
+ // Slot to cache the owned DeviceDescription for the underlying device
+ // once it has been queried from DeviceDescription().
+ mutable std::unique_ptr<DeviceDescription> device_description_
+ GUARDED_BY(mu_);
+
+ // The kind of the underlying platform that is being targeted, as passed
+ // during construction.
+ //
+ // Immutable post-initialization.
+ PlatformKind platform_kind_;
+
+ // The device ordinal that this object was initialized with.
+ //
+ // Immutable post-initialization.
+ int device_ordinal_;
+
+ // Executor for handling host callback work that cannot be performed
+ // by a host callback thread - for example, cleanup after a host BLAS routine
+ // (which may make device API calls). This work cannot block the host
+ // callback thread, will be completed asynchronously, and should be treated
+ // as fire-and-forget. Assume no ordering guarantees WRT the tasks enqueued
+ // here.
+ //
+ // Immutable post-initialization. Object is thread-safe.
+ std::unique_ptr<port::ThreadPool> background_threads_;
+
+ // Counter for the current number of live streams. This is used to check
+ // for accidentally-outstanding streams at StreamExecutor teardown time, as
+ // well as to indicate leaks (via a large outstanding count being logged) in
+ // the case we can't allocate more streams.
+ std::atomic_int_fast32_t live_stream_count_;
+
+ // Only one worker thread is needed; little work will be done by the
+ // executor.
+ static const int kNumBackgroundThreads = 1;
+
+ // Indicates if StreamExecutor operation tracing should be performed.
+ bool tracing_enabled_;
+
+ // The set of TraceListeners registered for this StreamExecutor.
+ std::set<TraceListener*> listeners_ GUARDED_BY(mu_);
+
+ SE_DISALLOW_COPY_AND_ASSIGN(StreamExecutor);
+};
+
+////////////
+// Inlines
+
+template <typename T>
+inline DeviceMemory<T> StreamExecutor::AllocateArray(uint64 element_count) {
+ uint64 bytes = sizeof(T) * element_count;
+ void *opaque = Allocate(bytes);
+ return DeviceMemory<T>::MakeFromByteSize(opaque, bytes);
+}
+
+template <typename T>
+inline port::StatusOr<DeviceMemory<T>> StreamExecutor::GetSymbol(
+ const string &symbol_name) {
+ // If the symbol lookup fails, opaque/bytes are left unchanged. Initialize
+ // them to nullptr/0 for consistency with DeviceMemory semantics.
+ void *opaque = nullptr;
+ size_t bytes = 0;
+ if (GetSymbol(symbol_name, &opaque, &bytes)) {
+ CHECK_EQ(bytes % sizeof(T), 0);
+ return DeviceMemory<T>::MakeFromByteSize(opaque, bytes);
+ }
+ return port::Status(
+ port::error::NOT_FOUND,
+ port::StrCat("Check if kernel using the symbol is loaded: ",
+ symbol_name));
+}
+
+template <typename ElemT>
+ScopedDeviceMemory<ElemT>::ScopedDeviceMemory(StreamExecutor *parent,
+ DeviceMemoryBase value)
+ : wrapped_(value), parent_(parent) {}
+
+template <typename ElemT>
+ScopedDeviceMemory<ElemT>::ScopedDeviceMemory(
+ StreamExecutor *parent, std::initializer_list<ElemT> values)
+ : ScopedDeviceMemory(parent, parent->AllocateArray<ElemT>(values.size())) {
+ if (ptr() != nullptr) {
+ std::vector<ElemT> local(values);
+ if (!parent->SynchronousMemcpy(ptr(), const_cast<const ElemT *>(&local[0]),
+ ptr()->size())) {
+ Reset(nullptr);
+ }
+ }
+}
+
+template <typename ElemT>
+ScopedDeviceMemory<ElemT>::~ScopedDeviceMemory() {
+ parent_->Deallocate(&wrapped_);
+}
+
+template <typename ElemT>
+void ScopedDeviceMemory<ElemT>::Reset(DeviceMemory<ElemT> updated) {
+ parent_->Deallocate(&wrapped_);
+ wrapped_ = updated;
+}
+
+template <typename ElemT>
+void ScopedDeviceMemory<ElemT>::Reset(std::nullptr_t) {
+ parent_->Deallocate(&wrapped_);
+ wrapped_ = DeviceMemory<ElemT>{};
+}
+
+template <typename T>
+DeviceMemory<T> StreamExecutor::AllocateZeroed() {
+ void *opaque = Allocate(sizeof(T));
+ if (opaque == nullptr) {
+ return DeviceMemory<T>{};
+ }
+
+ DeviceMemory<T> result = DeviceMemory<T>::MakeFromByteSize(opaque, sizeof(T));
+ bool ok = SynchronousMemZero(&result, sizeof(T));
+ if (!ok) {
+ Deallocate(&result);
+ return DeviceMemory<T>{};
+ }
+
+ return result;
+}
+
+template <typename T>
+DeviceMemory<T> StreamExecutor::AllocateSubBuffer(DeviceMemory<T> *parent,
+ uint64 element_offset,
+ uint64 element_count) {
+ if (element_offset + element_count > parent->ElementCount()) {
+ LOG(ERROR) << "requested sub-buffer allocation (offset + size) is greater "
+ << "than parent allocation size: (" << element_offset << " + "
+ << element_count << ") vs. (" << parent->ElementCount() << ")";
+ return DeviceMemory<T>{};
+ }
+
+ void *opaque = implementation_->AllocateSubBuffer(
+ parent, sizeof(T) * element_offset, sizeof(T) * element_count);
+ if (opaque == nullptr) {
+ return DeviceMemory<T>{};
+ }
+ CreateAllocRecord(opaque, sizeof(T) * element_count);
+ return DeviceMemory<T>(DeviceMemoryBase(opaque, sizeof(T) * element_count,
+ true /* = is_sub_buffer */));
+}
+
+template <typename... Params, typename... Args>
+inline Stream &Stream::ThenLaunch(ThreadDim thread_dims, BlockDim block_dims,
+ const TypedKernel<Params...> &kernel,
+ Args... args) {
+ KernelInvocationChecker<std::tuple<Params...>,
+ std::tuple<Args...>>::CheckAllStaticAssert();
+ if (ok()) {
+ // This is the core that allows type-safe kernel launching.
+ // Since the platforms take kernel arguments as tuples of (void *, size),
+ // we pack the variadic parameters passed as ...args into the desired
+ // tuple form and pass that packed form to the StreamExecutor::Launch()
+ // implementation.
+ std::vector<KernelArg> kernel_args;
+ kernel_args.reserve(kernel.Arity());
+ kernel.PackParams(&kernel_args, args...);
+ bool ok =
+ parent_->Launch(this, thread_dims, block_dims, kernel, kernel_args);
+ if (!ok) {
+ SetError();
+ LOG(WARNING) << "parent failed to launch kernel: " << &kernel;
+ }
+ }
+ return *this;
+}
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_STREAM_EXECUTOR_PIMPL_H_
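A minimal usage sketch (not part of the patch) for the typed allocation helpers declared above. It assumes a valid StreamExecutor obtained elsewhere; the function name, buffer size, and symbol name are illustrative only, and it uses only methods declared in this header.

```
// Sketch: allocate a typed device buffer, copy host data into it, and look up
// a device symbol via the StreamExecutor interface above.
#include <vector>

#include "tensorflow/stream_executor/stream_executor.h"

namespace gpu = perftools::gputools;

void FillDeviceBuffer(gpu::StreamExecutor* executor) {
  gpu::DeviceMemory<float> device_buf = executor->AllocateArray<float>(1024);
  if (device_buf == nullptr) {
    return;  // Allocate() returned nullptr.
  }

  std::vector<float> host_buf(1024, 1.0f);
  if (!executor->SynchronousMemcpy(&device_buf, host_buf.data(),
                                   host_buf.size() * sizeof(float))) {
    // Handle the copy failure.
  }

  // Hypothetical symbol name; GetSymbol returns NOT_FOUND if the kernel
  // module owning the symbol is not loaded.
  auto symbol = executor->GetSymbol<float>("some_device_symbol");
  if (symbol.ok()) {
    // ... use symbol.ConsumeValueOrDie() ...
  }

  executor->Deallocate(&device_buf);
}
```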
diff --git a/tensorflow/stream_executor/temporary_device_memory.cc b/tensorflow/stream_executor/temporary_device_memory.cc
new file mode 100644
index 0000000000..d11b58813d
--- /dev/null
+++ b/tensorflow/stream_executor/temporary_device_memory.cc
@@ -0,0 +1,53 @@
+#include "tensorflow/stream_executor/temporary_device_memory.h"
+
+#include "tensorflow/stream_executor/stream.h"
+
+namespace perftools {
+namespace gputools {
+
+TemporaryDeviceMemoryBase::~TemporaryDeviceMemoryBase() {
+ parent_->temporary_memory_manager()->MarkFinalized(device_memory_,
+ allocation_generation_,
+ /*must_exist=*/false);
+}
+
+DeviceMemoryBase* TemporaryDeviceMemoryBase::mutable_device_memory() {
+ DCHECK(!IsFinalized())
+ << "should not access device memory after finalization";
+ return &device_memory_;
+}
+
+const DeviceMemoryBase& TemporaryDeviceMemoryBase::device_memory() const {
+ DCHECK(!IsFinalized())
+ << "should not access device memory after finalization";
+ return device_memory_;
+}
+
+void TemporaryDeviceMemoryBase::Finalize() {
+ DCHECK(!IsFinalized()) << "should not finalize more than once";
+ parent_->temporary_memory_manager()->MarkFinalized(device_memory_,
+ allocation_generation_,
+ /*must_exist=*/true);
+}
+
+bool TemporaryDeviceMemoryBase::IsFinalized() const {
+ return parent_->temporary_memory_manager()->IsFinalized(
+ device_memory_, allocation_generation_);
+}
+
+bool TemporaryDeviceMemoryBase::IsAllocated() const {
+ return parent_->temporary_memory_manager()->HasAllocated(
+ device_memory_, allocation_generation_);
+}
+
+TemporaryDeviceMemoryBase::TemporaryDeviceMemoryBase(
+ Stream* parent, DeviceMemoryBase device_memory,
+ uint64 allocation_generation)
+ : device_memory_(device_memory),
+ allocation_generation_(allocation_generation),
+ parent_(parent) {
+ DCHECK(IsAllocated());
+}
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/temporary_device_memory.h b/tensorflow/stream_executor/temporary_device_memory.h
new file mode 100644
index 0000000000..4e7c63056b
--- /dev/null
+++ b/tensorflow/stream_executor/temporary_device_memory.h
@@ -0,0 +1,123 @@
+// Temporary memories are used to allocate scratch space required by an
+// operation about to be enqueued onto a stream.
+//
+// std::unique_ptr<TemporaryDeviceMemory<float>> temporary_memory =
+// stream.AllocateTemporaryArray<float>(1024).ConsumeValueOrDie();
+// // ... enqueue stuff onto the stream using the temporary memory ...
+// // Note that the memory is accessible via
+// // temporary_memory->device_memory() and similar.
+//
+// // Finalize the temporary memory. The underlying device memory may
+// // be released any time after this program point, as another thread may
+// // call Stream::BlockHostUntilDone, causing synchronization. This
+// // finalization also happens automatically for the user if the unique_ptr
+// // goes out of scope.
+// temporary_memory->Finalize();
+//
+// WARNING: do NOT hold onto the device memory associated with temporary_memory
+// after finalization. If temporary_memory->device_memory() is used after the
+// temporary memory is finalized, it will cause a DCHECK failure.
+//
+// Note that standard usage takes advantage of the type-safe wrapper,
+// TemporaryDeviceMemory<T>, defined below.
+//
+// Also see tests for executable sample usage.
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_DEVICE_MEMORY_H_
+#define TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_DEVICE_MEMORY_H_
+
+#include "tensorflow/stream_executor/device_memory.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+namespace internal {
+class TemporaryMemoryManager;
+}
+
+// Untyped base class (analogous to a void*) for temporary device memory
+// allocations associated with a stream.
+class TemporaryDeviceMemoryBase {
+ public:
+ // Marks the temporary memory as finalized if it is not already marked as
+ // such.
+ ~TemporaryDeviceMemoryBase();
+
+ // Precondition: !IsFinalized()
+ DeviceMemoryBase* mutable_device_memory();
+
+ // Precondition: !IsFinalized()
+ const DeviceMemoryBase& device_memory() const;
+
+ // "Finalizes" this temporary memory, making it acceptable to release at the
+ // next stream synchronization point -- the device memory can be reclaimed at
+ // any time after the temporary memory is marked as finalized (e.g. if a
+ // separate thread is calls Stream::BlockHostUntilDone). This may only be
+ // called once -- see the precondition below.
+ //
+ // Precondition: !IsFinalized()
+ void Finalize();
+
+ // Returns true iff the temporary memory is finalized (that is, the user is
+ // done referring to the temporary device memory, and thus it can be released
+ // at the next stream synchronization point).
+ bool IsFinalized() const;
+
+ // Returns true iff the temporary memory is still allocated.
+ //
+ // Note: this is a polling call, no guarantee is made that the temporary
+ // memory is still allocated after the call has completed.
+ bool IsAllocated() const;
+
+ private:
+ friend class internal::TemporaryMemoryManager;
+ friend class TemporaryDeviceMemoryTest;
+
+ // Note: construction DCHECKs that the memory is known-allocated in the
+ // stream's temporary-allocation-manager.
+ TemporaryDeviceMemoryBase(Stream* parent, DeviceMemoryBase device_memory,
+ uint64 allocation_generation);
+
+ // The device memory region that has been allocated.
+ DeviceMemoryBase device_memory_;
+
+ // The generation counter value for the temporary memory record in the
+ // temporary memory manager.
+ uint64 allocation_generation_;
+
+ // The stream that this temporary memory was allocated for.
+ Stream* parent_;
+};
+
+// Type-safe wrapper around the base type (which is analogous to a void*).
+template <typename T>
+class TemporaryDeviceMemory : public TemporaryDeviceMemoryBase {
+ public:
+ // Type-safe wrapper around TemporaryDeviceMemoryBase::mutable_device_memory.
+ DeviceMemory<T>* mutable_device_memory() {
+ StaticSlicingAssertionDummy();
+ return reinterpret_cast<DeviceMemory<T>*>(
+ TemporaryDeviceMemoryBase::mutable_device_memory());
+ }
+
+ // Type-safe wrapper around TemporaryDeviceMemoryBase::device_memory.
+ const DeviceMemory<T>& device_memory() const {
+ StaticSlicingAssertionDummy();
+ return reinterpret_cast<const DeviceMemory<T>&>(
+ TemporaryDeviceMemoryBase::device_memory());
+ }
+
+ private:
+ static void StaticSlicingAssertionDummy() {
+ static_assert(
+ sizeof(TemporaryDeviceMemory) == sizeof(TemporaryDeviceMemoryBase),
+ "derived class is simply a wrapper, no members may be added due to "
+ "slicing");
+ }
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_DEVICE_MEMORY_H_
diff --git a/tensorflow/stream_executor/temporary_memory_manager.cc b/tensorflow/stream_executor/temporary_memory_manager.cc
new file mode 100644
index 0000000000..0352aa4b2b
--- /dev/null
+++ b/tensorflow/stream_executor/temporary_memory_manager.cc
@@ -0,0 +1,113 @@
+#include "tensorflow/stream_executor/temporary_memory_manager.h"
+
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/lib/stringprintf.h"
+#include "tensorflow/stream_executor/lib/ptr_util.h"
+#include "tensorflow/stream_executor/stream.h"
+#include "tensorflow/stream_executor/stream_executor_pimpl.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+void TemporaryMemoryManager::ForceDeallocateAll() {
+ mutex_lock lock(mutex_);
+ VLOG(1) << "force-deallocating " << records_.size() << " remaining records";
+ for (auto it = records_.begin(); it != records_.end(); ++it) {
+ DeviceMemoryBase device_memory = it->first;
+ stream_->parent()->Deallocate(&device_memory);
+ }
+}
+
+void TemporaryMemoryManager::MarkFinalized(
+ const DeviceMemoryBase& device_memory, uint64 generation, bool must_exist) {
+ mutex_lock lock(mutex_);
+ auto it = records_.find(device_memory);
+ if (it == records_.end()) {
+ if (must_exist) {
+ LOG(FATAL) << "attempted to mark finalization for temporary "
+ "memory that does not exist";
+ }
+ return;
+ }
+ it->second.finalized = true;
+}
+
+void TemporaryMemoryManager::DeallocateFinalizedTemporaries() {
+ mutex_lock lock(mutex_);
+ int deallocated_count = 0;
+ for (auto it = records_.begin(); it != records_.end();) {
+ if (it->second.finalized) {
+ DeviceMemoryBase device_memory = it->first;
+ stream_->parent()->Deallocate(&device_memory);
+ ++deallocated_count;
+ it = records_.erase(it);
+ } else {
+ ++it;
+ }
+ }
+ VLOG(1) << "deallocated " << deallocated_count << " finalized temporaries";
+}
+
+bool TemporaryMemoryManager::IsFinalized(const DeviceMemoryBase& device_memory,
+ uint64 allocation_generation) const {
+ mutex_lock lock(mutex_);
+ auto it = records_.find(device_memory);
+ if (it == records_.end()) {
+ return true; // If there's no record present it's vacuously finalized.
+ }
+
+ if (it->second.allocation_generation == allocation_generation) {
+ return it->second.finalized;
+ }
+
+ // If the allocation generation did not match, it's vacuously true.
+ return true;
+}
+
+bool TemporaryMemoryManager::HasAllocated(const DeviceMemoryBase& device_memory,
+ uint64 generation) const {
+ mutex_lock lock(mutex_);
+ auto it = records_.find(device_memory);
+ if (it == records_.end()) {
+ return false;
+ }
+ return it->second.allocation_generation == generation;
+}
+
+port::StatusOr<std::unique_ptr<TemporaryDeviceMemoryBase>>
+TemporaryMemoryManager::AllocateArrayBase(uint64 element_count,
+ uint64 element_size) {
+ uint64 byte_size = element_count * element_size;
+ DeviceMemoryBase device_memory =
+ stream_->parent()->AllocateArray<uint8>(byte_size);
+ if (device_memory == nullptr) {
+ return port::Status(port::error::RESOURCE_EXHAUSTED,
+ port::StrCat("could not allocate temporary memory of ",
+ byte_size, " bytes"));
+ }
+
+ uint64 generation;
+
+ // Add the record before instantiating the device memory instance so we can
+ // check the allocation invariant at TemporaryDeviceMemory construction time.
+ {
+ mutex_lock lock(mutex_);
+ generation = ++generation_;
+ DCHECK(records_.find(device_memory) == records_.end());
+ records_[device_memory] = {generation,
+ /*finalized=*/false};
+ }
+
+ VLOG(1) << port::Printf(
+ "stream %p allocated temporary device memory at %p (size %llu) in "
+ "generation %llu",
+ stream_, device_memory.opaque(), byte_size, generation);
+ std::unique_ptr<TemporaryDeviceMemoryBase> result(
+ new TemporaryDeviceMemoryBase(stream_, device_memory, generation));
+ return std::move(result);
+}
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/temporary_memory_manager.h b/tensorflow/stream_executor/temporary_memory_manager.h
new file mode 100644
index 0000000000..847f0f2182
--- /dev/null
+++ b/tensorflow/stream_executor/temporary_memory_manager.h
@@ -0,0 +1,138 @@
+// The temporary-memory-manager is a helper class for a Stream to keep track of
+// temporary allocations. These allocations defer their deallocation to the next
+// Stream::BlockHostUntilDone call for efficiency purposes (as deallocation
+// itself generally forces synchronization to occur).
+
+#ifndef TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_MEMORY_MANAGER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_MEMORY_MANAGER_H_
+
+#include <map>
+#include <memory>
+
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/lib/status.h"
+#include "tensorflow/stream_executor/lib/statusor.h"
+#include "tensorflow/stream_executor/platform/mutex.h"
+#include "tensorflow/stream_executor/platform/thread_annotations.h"
+#include "tensorflow/stream_executor/temporary_device_memory.h"
+
+namespace perftools {
+namespace gputools {
+namespace internal {
+
+// Record used inside the TemporaryMemoryManager as metadata for a given device
+// memory region.
+struct TemporaryMemoryRecord {
+ // What "generation" this record was allocated in.
+ //
+ // Currently the generation counter is bumped for every allocation, but this
+ // could be made coarser if necessary.
+ uint64 allocation_generation;
+
+ // Notes whether the temporary memory has been marked as finalized, such that
+ // we can release the DeviceMemory associated with this record at
+ // synchronization time.
+ bool finalized;
+};
+
+// Manages temporary memories associated with a stream -- keeps records of
+// outstanding temporaries and their state, and can deallocate them
+// appropriately at points in the Stream lifecycle (e.g. BlockHostUntilDone,
+// destruction).
+class TemporaryMemoryManager {
+ public:
+ explicit TemporaryMemoryManager(Stream* stream)
+ : generation_(0), stream_(stream) {}
+
+ // Allocates a temporary array that is then managed by this object.
+ template <typename T>
+ port::StatusOr<std::unique_ptr<TemporaryDeviceMemory<T>>> AllocateArray(
+ uint64 element_count);
+
+ // Forces deallocation of all managed temporary memory regions.
+ //
+ // Called, for example, when the Stream owning this temporary memory manager
+ // is destroyed.
+ //
+ // Note: These calls to Deallocate will likely force synchronization.
+ void ForceDeallocateAll();
+
+ // Marks the given memory region as finalized.
+ //
+ // If must_exist is set, this will check-fail if the temporary memory record
+ // is not found.
+ void MarkFinalized(const DeviceMemoryBase& device_memory, uint64 generation,
+ bool must_exist);
+
+ // Deallocates temporary memories that have been finalized.
+ //
+ // Note: These calls to Deallocate will likely force synchronization, so this
+ // is meant to be called just before a "BlockHostUntilDone" is performed.
+ void DeallocateFinalizedTemporaries();
+
+ // Returns whether the provided device_memory is finalized.
+ //
+ // In the vacuous case where the device memory doesn't appear in the temporary
+ // memory records, it is either not a temporary at all, or has already been
+ // deallocated, and thus returns true.
+ bool IsFinalized(const DeviceMemoryBase& device_memory,
+ uint64 allocation_generation) const;
+
+ // Returns whether the manager has a live allocation record for the given
+ // device memory pointer with the given generation counter.
+ //
+ // Note: this is a polling call -- there is no guarantee that the region is
+ // still allocated once the call has completed.
+ bool HasAllocated(const DeviceMemoryBase& device_memory,
+ uint64 generation) const;
+
+ private:
+ // Allocates an array without type parameterization, so that the
+ // implementation can live in the source file. Without this base allocation
+ // method, we incur a circular dependency between the StreamExecutor
+ // definition and this class' definition.
+ port::StatusOr<std::unique_ptr<TemporaryDeviceMemoryBase>> AllocateArrayBase(
+ uint64 element_count, uint64 element_size);
+
+ // Mutex to guard temporary record state.
+ mutable mutex mutex_;
+
+ // Mapping from device memory to the current (live) temporary memory record.
+ //
+ // If a device memory is not in this mapping, it is not a temporary currently
+ // allocated and owned by this temporary memory manager.
+ std::map<DeviceMemoryBase, TemporaryMemoryRecord> records_ GUARDED_BY(mutex_);
+
+ // Allocation generation -- we bump this counter to distinguish temporary
+ // memory handles that have been deallocated and later reallocated at the same
+ // device memory address.
+ uint64 generation_ GUARDED_BY(mutex_);
+
+ // The stream (parent object) for this temporary memory manager -- allocations
+ // are performed through this stream handle.
+ Stream* stream_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(TemporaryMemoryManager);
+};
+
+////////////
+// Inlines
+
+template <typename T>
+port::StatusOr<std::unique_ptr<TemporaryDeviceMemory<T>>>
+TemporaryMemoryManager::AllocateArray(uint64 element_count) {
+ port::StatusOr<std::unique_ptr<TemporaryDeviceMemoryBase>> temporary_memory =
+ AllocateArrayBase(element_count, sizeof(T));
+ if (!temporary_memory.ok()) {
+ return temporary_memory.status();
+ }
+
+ return std::unique_ptr<TemporaryDeviceMemory<T>>(
+ reinterpret_cast<TemporaryDeviceMemory<T>*>(
+ temporary_memory.ConsumeValueOrDie().release()));
+}
+
+} // namespace internal
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_TEMPORARY_MEMORY_MANAGER_H_
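A minimal sketch (not part of the patch) of how the manager is driven around a host synchronization point. It assumes a live Stream whose temporary_memory_manager() accessor (used by TemporaryDeviceMemoryBase above) returns this manager; the function name and buffer size are illustrative, and end users would normally go through the Stream-level AllocateTemporaryArray shown in the temporary_device_memory.h comment instead.

```
// Sketch: allocate scratch space, finalize it, and reclaim finalized
// temporaries just before the host blocks on the stream.
#include <memory>

#include "tensorflow/stream_executor/stream.h"
#include "tensorflow/stream_executor/temporary_memory_manager.h"

namespace gpu = perftools::gputools;

void UseScratch(gpu::Stream* stream) {
  auto manager = stream->temporary_memory_manager();
  auto scratch_or = manager->AllocateArray<float>(1024);
  if (!scratch_or.ok()) {
    return;  // RESOURCE_EXHAUSTED: could not allocate the scratch buffer.
  }
  std::unique_ptr<gpu::TemporaryDeviceMemory<float>> scratch =
      scratch_or.ConsumeValueOrDie();

  // ... enqueue operations onto the stream that use scratch->device_memory() ...

  // Mark the scratch as reclaimable; the memory is released at a later
  // synchronization point rather than immediately.
  scratch->Finalize();
  manager->DeallocateFinalizedTemporaries();  // Typically done right before
                                              // BlockHostUntilDone.
}
```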
diff --git a/tensorflow/stream_executor/timer.cc b/tensorflow/stream_executor/timer.cc
new file mode 100644
index 0000000000..46210a2346
--- /dev/null
+++ b/tensorflow/stream_executor/timer.cc
@@ -0,0 +1,41 @@
+#include "tensorflow/stream_executor/timer.h"
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+#include "tensorflow/stream_executor/platform.h"
+#include "tensorflow/stream_executor/platform/logging.h"
+#include "tensorflow/stream_executor/stream_executor.h"
+#include "tensorflow/stream_executor/stream_executor_internal.h"
+
+namespace perftools {
+namespace gputools {
+
+static internal::TimerInterface *CreateTimerImplementation(
+ StreamExecutor *parent) {
+ PlatformKind platform_kind = parent->platform_kind();
+ if (platform_kind == PlatformKind::kCuda) {
+ return (*internal::MakeCUDATimerImplementation())(parent);
+ } else if (platform_kind == PlatformKind::kOpenCL ||
+ platform_kind == PlatformKind::kOpenCLAltera) {
+ return (*internal::MakeOpenCLTimerImplementation())(parent);
+ } else if (platform_kind == PlatformKind::kHost) {
+ return internal::MakeHostTimerImplementation(parent);
+ } else if (platform_kind == PlatformKind::kMock) {
+ return nullptr;
+ } else {
+ LOG(FATAL) << "cannot create timer implementation for platform kind: "
+ << PlatformKindString(platform_kind);
+ }
+}
+
+Timer::Timer(StreamExecutor *parent)
+ : implementation_(CreateTimerImplementation(parent)), parent_(parent) {}
+
+Timer::~Timer() { parent_->DeallocateTimer(this); }
+
+uint64 Timer::Microseconds() const { return implementation_->Microseconds(); }
+
+uint64 Timer::Nanoseconds() const { return implementation_->Nanoseconds(); }
+
+} // namespace gputools
+} // namespace perftools
diff --git a/tensorflow/stream_executor/timer.h b/tensorflow/stream_executor/timer.h
new file mode 100644
index 0000000000..ff54c06180
--- /dev/null
+++ b/tensorflow/stream_executor/timer.h
@@ -0,0 +1,60 @@
+#ifndef TENSORFLOW_STREAM_EXECUTOR_TIMER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_TIMER_H_
+
+#include <memory>
+
+#include "tensorflow/stream_executor/platform/port.h"
+
+namespace perftools {
+namespace gputools {
+
+namespace internal {
+class TimerInterface;
+} // namespace internal
+
+class StreamExecutor;
+
+// An interval timer, suitable for use in timing the operations which occur in
+// streams.
+//
+// Thread-hostile: CUDA associates a CUDA context with a particular thread in
+// the system. Any operation that a user attempts to perform by using a Timer
+// on a thread not associated with the CUDA context has unknown behavior at the
+// current time; see b/13176597
+class Timer {
+ public:
+ // Instantiate a timer tied to parent as a platform executor.
+ explicit Timer(StreamExecutor *parent);
+
+ // Deallocates any timer resources that the parent StreamExecutor has bestowed
+ // upon this object.
+ ~Timer();
+
+ // Returns the elapsed number of microseconds for a completed timer.
+ // Completed means has been through a start/stop lifecycle.
+ uint64 Microseconds() const;
+
+ // Returns the elapsed number of nanoseconds for a completed timer.
+ // Completed means has been through a start/stop lifecycle.
+ uint64 Nanoseconds() const;
+
+ // Returns the (opaque) backing platform ITimer instance. Ownership is
+ // not transferred to the caller.
+ internal::TimerInterface *implementation() { return implementation_.get(); }
+
+ private:
+ // Platform-dependent implementation of the timer internals for the underlying
+ // platform. This class just delegates to this opaque instance.
+ std::unique_ptr<internal::TimerInterface> implementation_;
+
+ // The StreamExecutor that manages the platform-specific internals for this
+ // timer.
+ StreamExecutor *parent_;
+
+ SE_DISALLOW_COPY_AND_ASSIGN(Timer);
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_TIMER_H_
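A minimal sketch (not part of the patch) of timing stream work with this interval timer. It goes directly through the StreamExecutor methods shown earlier in this diff (AllocateTimer, StartTimer, StopTimer, BlockHostUntilDone); the function name and the assumption that the caller already holds a valid executor and stream are illustrative only.

```
// Sketch: measure the wall time of work enqueued between start and stop.
#include "tensorflow/stream_executor/stream_executor.h"
#include "tensorflow/stream_executor/timer.h"

namespace gpu = perftools::gputools;

gpu::uint64 TimeEnqueuedWork(gpu::StreamExecutor* executor,
                             gpu::Stream* stream) {
  gpu::Timer timer(executor);
  if (!executor->AllocateTimer(&timer)) {
    return 0;  // Could not allocate platform timer resources.
  }

  executor->StartTimer(stream, &timer);
  // ... enqueue the work to be measured onto the stream ...
  executor->StopTimer(stream, &timer);

  // The timer only reports elapsed time after a completed start/stop
  // lifecycle, so wait for the stream to drain before reading it.
  executor->BlockHostUntilDone(stream);
  return timer.Microseconds();
}
```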
diff --git a/tensorflow/stream_executor/trace_listener.h b/tensorflow/stream_executor/trace_listener.h
new file mode 100644
index 0000000000..dcbb223f4f
--- /dev/null
+++ b/tensorflow/stream_executor/trace_listener.h
@@ -0,0 +1,59 @@
+// This file defines the StreamExecutor trace listener, used for inserting
+// non-device-specific instrumentation into the StreamExecutor.
+#ifndef TENSORFLOW_STREAM_EXECUTOR_TRACE_LISTENER_H_
+#define TENSORFLOW_STREAM_EXECUTOR_TRACE_LISTENER_H_
+
+#include "tensorflow/stream_executor/device_memory.h"
+#include "tensorflow/stream_executor/kernel.h"
+#include "tensorflow/stream_executor/launch_dim.h"
+#include "tensorflow/stream_executor/lib/status.h"
+
+namespace perftools {
+namespace gputools {
+
+class Stream;
+
+// Traces StreamExecutor PIMPL-level events.
+// The few StreamExecutor interfaces that are synchronous have both Begin and
+// Complete versions of their trace calls. Asynchronous operations only have
+// Submit calls, as execution of the underlying operations is device-specific.
+// As all tracing calls mirror StreamExecutor routines, documentation here is
+// minimal.
+//
+// All calls have default implementations that perform no work; subclasses
+// should override functionality of interest. Keep in mind that these routines
+// are not called on a dedicated thread, so callbacks should execute quickly.
+//
+// Note: This API is constructed on an as-needed basis. Users should add
+// support for further StreamExecutor operations as required. By enforced
+// convention (see SCOPED_TRACE in stream_executor_pimpl.cc), synchronous
+// tracepoints should be named NameBegin and NameComplete.
+class TraceListener {
+ public:
+ virtual ~TraceListener() {}
+
+ virtual void LaunchSubmit(Stream* stream, const ThreadDim& thread_dims,
+ const BlockDim& block_dims,
+ const KernelBase& kernel,
+ const std::vector<KernelArg>& args) {}
+
+ virtual void SynchronousMemcpyH2DBegin(int64 correlation_id,
+ const void* host_src, int64 size,
+ DeviceMemoryBase* gpu_dst) {}
+ virtual void SynchronousMemcpyH2DComplete(int64 correlation_id,
+ const port::Status* result) {}
+
+ virtual void SynchronousMemcpyD2HBegin(int64 correlation_id,
+ const DeviceMemoryBase& gpu_src,
+ int64 size, void* host_dst) {}
+ virtual void SynchronousMemcpyD2HComplete(int64 correlation_id,
+ const port::Status* result) {}
+
+ virtual void BlockHostUntilDoneBegin(int64 correlation_id, Stream* stream) {}
+ virtual void BlockHostUntilDoneComplete(int64 correlation_id, bool result) {}
+};
+
+} // namespace gputools
+} // namespace perftools
+
+#endif // TENSORFLOW_STREAM_EXECUTOR_TRACE_LISTENER_H_
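A minimal sketch (not part of the patch) of a TraceListener subclass: it overrides only the host-to-device memcpy hooks declared above and logs them. The class name is illustrative, and the hook for registering a listener with a StreamExecutor is not shown in this hunk, so registration is omitted.

```
// Sketch: a listener that logs synchronous host-to-device copies.
#include "tensorflow/stream_executor/platform/logging.h"
#include "tensorflow/stream_executor/trace_listener.h"

namespace gpu = perftools::gputools;

class LoggingTraceListener : public gpu::TraceListener {
 public:
  void SynchronousMemcpyH2DBegin(gpu::int64 correlation_id,
                                 const void* host_src, gpu::int64 size,
                                 gpu::DeviceMemoryBase* gpu_dst) override {
    LOG(INFO) << "H2D copy " << correlation_id << " begins: " << size
              << " bytes";
  }

  void SynchronousMemcpyH2DComplete(gpu::int64 correlation_id,
                                    const gpu::port::Status* result) override {
    LOG(INFO) << "H2D copy " << correlation_id << " complete";
  }
};
```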
diff --git a/tensorflow/tensorboard/BUILD b/tensorflow/tensorboard/BUILD
new file mode 100644
index 0000000000..2dcb5e4fa9
--- /dev/null
+++ b/tensorflow/tensorboard/BUILD
@@ -0,0 +1,49 @@
+# Description:
+# TensorBoard, a dashboard for investigating TensorFlow
+
+package(default_visibility = ["//tensorflow:internal"])
+
+filegroup(
+ name = "tensorboard_frontend",
+ srcs = [
+ "dist/index.html",
+ "dist/tf-tensorboard.html",
+ "//tensorflow/tensorboard/bower:bower",
+ ] + glob(["lib/**/*"]),
+)
+
+py_library(
+ name = "tensorboard_handler",
+ srcs = ["tensorboard_handler.py"],
+ deps = [
+ ":float_wrapper",
+ "//tensorflow/python:platform",
+ "//tensorflow/python:summary",
+ ],
+)
+
+py_library(
+ name = "float_wrapper",
+ srcs = ["float_wrapper.py"],
+)
+
+py_test(
+ name = "float_wrapper_test",
+ size = "small",
+ srcs = ["float_wrapper_test.py"],
+ deps = [
+ ":float_wrapper",
+ "//tensorflow/python:platform_test",
+ ],
+)
+
+py_binary(
+ name = "tensorboard",
+ srcs = ["tensorboard.py"],
+ data = [":tensorboard_frontend"],
+ deps = [
+ ":tensorboard_handler",
+ "//tensorflow/python:platform",
+ "//tensorflow/python:summary",
+ ],
+)
diff --git a/tensorflow/tensorboard/README.md b/tensorflow/tensorboard/README.md
new file mode 100644
index 0000000000..eb85a1e461
--- /dev/null
+++ b/tensorflow/tensorboard/README.md
@@ -0,0 +1,66 @@
+# TensorBoard
+
+TensorBoard is a suite of web applications for inspecting and understanding your
+TensorFlow runs and graphs.
+
+Example Usage:
+
+```
+# if running directly from the source tree
+python tensorflow/tensorboard/tensorboard.py --logdir=path/to/logs
+
+# if installed via pip
+tensorboard --logdir=path/to/logs
+
+# if building from source
+bazel build tensorflow/tensorboard:tensorboard
+./bazel-bin/tensorflow/tensorboard/tensorboard --logdir=path/to/logs
+
+# then connect to http://localhost:6006
+```
+
+Note that TensorBoard requires a `logdir` to read logs from. For info on
+configuring TensorBoard, run `tensorboard --help`.
+
+TensorBoard includes a backend (tensorboard.py) that reads TensorFlow event data
+from the *tfevents* files, and then serves this data to the browser. It also
+includes a frontend (app/tf-tensorboard.html) that contains HTML and JavaScript
+for displaying this data in a UI.
+
+
+## Building the TensorBoard frontend
+
+### Install Node, npm, gulp, bower, and tsd on your machine
+
+Get nodejs and npm through whatever package distribution system is appropriate
+for your machine. For example, on Ubuntu 14.04, run
+`sudo apt-get install nodejs nodejs-legacy npm`. Then, run
+`sudo npm install -g gulp bower tsd`.
+
+### Install project dependencies
+
+Inside this directory (`tensorflow/tensorboard`),
+run the following commands.
+
+ npm install
+ bower install
+ tsd install
+
+### Run Gulp Vulcanize
+
+Inside this directory, run `gulp vulcanize`. That will compile all of the
+html/js/css dependencies for TensorBoard into a monolithic index.html file under
+dist/. Once you've done this, you can locally run your own TensorBoard instance
+and it will have a working frontend.
+
+### Frontend General Dev Instructions
+
+To speed up the development process, we can run the frontend code independently
+of the backend, and mock out the backend with static JSON files. This allows
+testing the frontend's correctness without needing to find real data and spin
+up a real server. Look at app/demo/index.html for an example.
+
+The following gulp commands are useful:
+
+* `gulp test` - build, test, and lint the code
+* `gulp watch` - build, test, and rebuild on change
+* `gulp server` - start a livereload server on localhost:8000
+* `gulp` - alias for `gulp watch`
+* `gulp vulcanize` - regenerate the monolithic index.html bundle under dist/
diff --git a/tensorflow/tensorboard/__init__.py b/tensorflow/tensorboard/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/tensorboard/__init__.py
diff --git a/tensorflow/tensorboard/app/demo/data/cos.json b/tensorflow/tensorboard/app/demo/data/cos.json
new file mode 100644
index 0000000000..807e1f6dc0
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/cos.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 1.0], [1434932928.0, 20000.0, 0.9999210442038161], [1434934656.0, 40000.0, 0.9996841892832999], [1434936384.0, 60000.0, 0.9992894726405892], [1434938112.0, 80000.0, 0.9987369566060175], [1434939840.0, 100000.0, 0.9980267284282716], [1434941568.0, 120000.0, 0.9971589002606139], [1434943296.0, 140000.0, 0.9961336091431725], [1434945024.0, 160000.0, 0.9949510169813002], [1434946752.0, 180000.0, 0.9936113105200084], [1434948480.0, 200000.0, 0.9921147013144779], [1434950208.0, 220000.0, 0.9904614256966512], [1434951936.0, 240000.0, 0.9886517447379141], [1434953664.0, 260000.0, 0.986685944207868], [1434955392.0, 280000.0, 0.9845643345292053], [1434957120.0, 300000.0, 0.9822872507286887], [1434958848.0, 320000.0, 0.9798550523842469], [1434960576.0, 340000.0, 0.9772681235681935], [1434962304.0, 360000.0, 0.9745268727865771], [1434964032.0, 380000.0, 0.971631732914674], [1434965760.0, 400000.0, 0.9685831611286311], [1434967488.0, 420000.0, 0.9653816388332739], [1434969216.0, 440000.0, 0.9620276715860859], [1434970944.0, 460000.0, 0.9585217890173758], [1434972672.0, 480000.0, 0.954864544746643], [1434974400.0, 500000.0, 0.9510565162951535], [1434976128.0, 520000.0, 0.9470983049947443], [1434977856.0, 540000.0, 0.9429905358928645], [1434979584.0, 560000.0, 0.9387338576538741], [1434981312.0, 580000.0, 0.934328942456612], [1434983040.0, 600000.0, 0.9297764858882515], [1434984768.0, 620000.0, 0.925077206834458], [1434986496.0, 640000.0, 0.9202318473658704], [1434988224.0, 660000.0, 0.9152411726209175], [1434989952.0, 680000.0, 0.9101059706849957], [1434991680.0, 700000.0, 0.9048270524660195], [1434993408.0, 720000.0, 0.8994052515663711], [1434995136.0, 740000.0, 0.8938414241512638], [1434996864.0, 760000.0, 0.8881364488135446], [1434998592.0, 780000.0, 0.8822912264349533], [1435000320.0, 800000.0, 0.8763066800438636], [1435002048.0, 820000.0, 0.8701837546695257], [1435003776.0, 840000.0, 0.8639234171928353], [1435005504.0, 860000.0, 0.8575266561936523], [1435007232.0, 880000.0, 0.850994481794692], [1435008960.0, 900000.0, 0.8443279255020151], [1435010688.0, 920000.0, 0.8375280400421418], [1435012416.0, 940000.0, 0.8305958991958127], [1435014144.0, 960000.0, 0.8235325976284275], [1435015872.0, 980000.0, 0.8163392507171839], [1435017600.0, 1000000.0, 0.8090169943749475], [1435019328.0, 1020000.0, 0.8015669848708766], [1435021056.0, 1040000.0, 0.7939903986478354], [1435022784.0, 1060000.0, 0.7862884321366189], [1435024512.0, 1080000.0, 0.7784623015670235], [1435026240.0, 1100000.0, 0.7705132427757893], [1435027968.0, 1120000.0, 0.7624425110114479], [1435029696.0, 1140000.0, 0.7542513807361038], [1435031424.0, 1160000.0, 0.7459411454241821], [1435033152.0, 1180000.0, 0.7375131173581739], [1435034880.0, 1200000.0, 0.7289686274214116], [1435036608.0, 1220000.0, 0.7203090248879069], [1435038336.0, 1240000.0, 0.7115356772092855], [1435040064.0, 1260000.0, 0.7026499697988492], [1435041792.0, 1280000.0, 0.6936533058128049], [1435043520.0, 1300000.0, 0.6845471059286886], [1435045248.0, 1320000.0, 0.6753328081210245], [1435046976.0, 1340000.0, 0.6660118674342517], [1435048704.0, 1360000.0, 0.6565857557529564], [1435050432.0, 1380000.0, 0.6470559615694443], [1435052160.0, 1400000.0, 0.6374239897486896], [1435053888.0, 1420000.0, 0.6276913612907006], [1435055616.0, 1440000.0, 0.6178596130903344], [1435057344.0, 1460000.0, 0.6079302976946055], [1435059072.0, 1480000.0, 0.5979049830575189], [1435060800.0, 1500000.0, 0.5877852522924731], [1435062528.0, 1520000.0, 0.5775727034222676], 
[1435064256.0, 1540000.0, 0.5672689491267565], [1435065984.0, 1560000.0, 0.5568756164881881], [1435067712.0, 1580000.0, 0.5463943467342691], [1435069440.0, 1600000.0, 0.5358267949789965], [1435071168.0, 1620000.0, 0.5251746299612957], [1435072896.0, 1640000.0, 0.5144395337815064], [1435074624.0, 1660000.0, 0.5036232016357609], [1435076352.0, 1680000.0, 0.49272734154829156], [1435078080.0, 1700000.0, 0.48175367410171516], [1435079808.0, 1720000.0, 0.47070393216533274], [1435081536.0, 1740000.0, 0.4595798606214879], [1435083264.0, 1760000.0, 0.44838321609003245], [1435084992.0, 1780000.0, 0.437115766650933], [1435086720.0, 1800000.0, 0.42577929156507266], [1435088448.0, 1820000.0, 0.41437558099328425], [1435090176.0, 1840000.0, 0.4029064357136627], [1435091904.0, 1860000.0, 0.39137366683720254], [1435093632.0, 1880000.0, 0.3797790955218011], [1435095360.0, 1900000.0, 0.3681245526846781], [1435097088.0, 1920000.0, 0.35641187871325075], [1435098816.0, 1940000.0, 0.344642923174517], [1435100544.0, 1960000.0, 0.3328195445229867], [1435102272.0, 1980000.0, 0.32094360980720943], [1435104000.0, 2000000.0, 0.30901699437494745], [1435105728.0, 2020000.0, 0.29704158157703486], [1435107456.0, 2040000.0, 0.28501926246997616], [1435109184.0, 2060000.0, 0.2729519355173254], [1435110912.0, 2080000.0, 0.260841506289897], [1435112640.0, 2100000.0, 0.24868988716485496], [1435114368.0, 2120000.0, 0.23649899702372476], [1435116096.0, 2140000.0, 0.22427076094938117], [1435117824.0, 2160000.0, 0.21200710992205474], [1435119552.0, 2180000.0, 0.199709980514407], [1435121280.0, 2200000.0, 0.18738131458572474], [1435123008.0, 2220000.0, 0.17502305897527604], [1435124736.0, 2240000.0, 0.16263716519488372], [1435126464.0, 2260000.0, 0.15022558912075706], [1435128192.0, 2280000.0, 0.13779029068463797], [1435129920.0, 2300000.0, 0.12533323356430426], [1435131648.0, 2320000.0, 0.1128563848734816], [1435133376.0, 2340000.0, 0.10036171485121491], [1435135104.0, 2360000.0, 0.08785119655074332], [1435136832.0, 2380000.0, 0.07532680552793275], [1435138560.0, 2400000.0, 0.06279051952931353], [1435140288.0, 2420000.0, 0.0502443181797696], [1435142016.0, 2440000.0, 0.037690182669934694], [1435143744.0, 2460000.0, 0.02513009544333753], [1435145472.0, 2480000.0, 0.012566039883352776], [1435147200.0, 2500000.0, 6.123233995736766e-17], [1435148928.0, 2520000.0, -0.012566039883352653], [1435150656.0, 2540000.0, -0.025130095443337407], [1435152384.0, 2560000.0, -0.037690182669934576], [1435154112.0, 2580000.0, -0.05024431817976947], [1435155840.0, 2600000.0, -0.0627905195293134], [1435157568.0, 2620000.0, -0.07532680552793262], [1435159296.0, 2640000.0, -0.0878511965507432], [1435161024.0, 2660000.0, -0.100361714851215], [1435162752.0, 2680000.0, -0.1128563848734817], [1435164480.0, 2700000.0, -0.12533323356430437], [1435166208.0, 2720000.0, -0.13779029068463805], [1435167936.0, 2740000.0, -0.15022558912075715], [1435169664.0, 2760000.0, -0.16263716519488358], [1435171392.0, 2780000.0, -0.17502305897527615], [1435173120.0, 2800000.0, -0.18738131458572482], [1435174848.0, 2820000.0, -0.1997099805144069], [1435176576.0, 2840000.0, -0.21200710992205438], [1435178304.0, 2860000.0, -0.22427076094938103], [1435180032.0, 2880000.0, -0.23649899702372443], [1435181760.0, 2900000.0, -0.24868988716485463], [1435183488.0, 2920000.0, -0.26084150628989666], [1435185216.0, 2940000.0, -0.27295193551732505], [1435186944.0, 2960000.0, -0.28501926246997605], [1435188672.0, 2980000.0, -0.29704158157703475], [1435190400.0, 3000000.0, -0.30901699437494734], 
[1435192128.0, 3020000.0, -0.3209436098072093], [1435193856.0, 3040000.0, -0.33281954452298657], [1435195584.0, 3060000.0, -0.3446429231745169], [1435197312.0, 3080000.0, -0.35641187871325064], [1435199040.0, 3100000.0, -0.368124552684678], [1435200768.0, 3120000.0, -0.379779095521801], [1435202496.0, 3140000.0, -0.3913736668372024], [1435204224.0, 3160000.0, -0.4029064357136626], [1435205952.0, 3180000.0, -0.41437558099328414], [1435207680.0, 3200000.0, -0.4257792915650727], [1435209408.0, 3220000.0, -0.4371157666509327], [1435211136.0, 3240000.0, -0.4483832160900321], [1435212864.0, 3260000.0, -0.4595798606214878], [1435214592.0, 3280000.0, -0.47070393216533263], [1435216320.0, 3300000.0, -0.48175367410171543], [1435218048.0, 3320000.0, -0.49272734154829145], [1435219776.0, 3340000.0, -0.5036232016357608], [1435221504.0, 3360000.0, -0.5144395337815065], [1435223232.0, 3380000.0, -0.5251746299612958], [1435224960.0, 3400000.0, -0.5358267949789969], [1435226688.0, 3420000.0, -0.546394346734269], [1435228416.0, 3440000.0, -0.5568756164881876], [1435230144.0, 3460000.0, -0.5672689491267563], [1435231872.0, 3480000.0, -0.5775727034222674], [1435233600.0, 3500000.0, -0.587785252292473], [1435235328.0, 3520000.0, -0.5979049830575185], [1435237056.0, 3540000.0, -0.6079302976946052], [1435238784.0, 3560000.0, -0.6178596130903342], [1435240512.0, 3580000.0, -0.6276913612907004], [1435242240.0, 3600000.0, -0.6374239897486897], [1435243968.0, 3620000.0, -0.647055961569444], [1435245696.0, 3640000.0, -0.6565857557529563], [1435247424.0, 3660000.0, -0.6660118674342516], [1435249152.0, 3680000.0, -0.6753328081210245], [1435250880.0, 3700000.0, -0.6845471059286887], [1435252608.0, 3720000.0, -0.6936533058128048], [1435254336.0, 3740000.0, -0.7026499697988491], [1435256064.0, 3760000.0, -0.7115356772092853], [1435257792.0, 3780000.0, -0.7203090248879069], [1435259520.0, 3800000.0, -0.7289686274214113], [1435261248.0, 3820000.0, -0.7375131173581738], [1435262976.0, 3840000.0, -0.7459411454241821], [1435264704.0, 3860000.0, -0.7542513807361039], [1435266432.0, 3880000.0, -0.762442511011448], [1435268160.0, 3900000.0, -0.7705132427757891], [1435269888.0, 3920000.0, -0.7784623015670233], [1435271616.0, 3940000.0, -0.7862884321366189], [1435273344.0, 3960000.0, -0.7939903986478354], [1435275072.0, 3980000.0, -0.8015669848708767], [1435276800.0, 4000000.0, -0.8090169943749473], [1435278528.0, 4020000.0, -0.8163392507171839], [1435280256.0, 4040000.0, -0.8235325976284275], [1435281984.0, 4060000.0, -0.8305958991958128], [1435283712.0, 4080000.0, -0.8375280400421417], [1435285440.0, 4100000.0, -0.8443279255020149], [1435287168.0, 4120000.0, -0.8509944817946917], [1435288896.0, 4140000.0, -0.8575266561936521], [1435290624.0, 4160000.0, -0.8639234171928353], [1435292352.0, 4180000.0, -0.8701837546695257], [1435294080.0, 4200000.0, -0.8763066800438634], [1435295808.0, 4220000.0, -0.8822912264349532], [1435297536.0, 4240000.0, -0.8881364488135445], [1435299264.0, 4260000.0, -0.8938414241512638], [1435300992.0, 4280000.0, -0.8994052515663711], [1435302720.0, 4300000.0, -0.9048270524660194], [1435304448.0, 4320000.0, -0.9101059706849957], [1435306176.0, 4340000.0, -0.9152411726209175], [1435307904.0, 4360000.0, -0.9202318473658704], [1435309632.0, 4380000.0, -0.9250772068344579], [1435311360.0, 4400000.0, -0.9297764858882513], [1435313088.0, 4420000.0, -0.934328942456612], [1435314816.0, 4440000.0, -0.9387338576538741], [1435316544.0, 4460000.0, -0.9429905358928645], [1435318272.0, 4480000.0, -0.9470983049947442], 
[1435320000.0, 4500000.0, -0.9510565162951535], [1435321728.0, 4520000.0, -0.954864544746643], [1435323456.0, 4540000.0, -0.958521789017376], [1435325184.0, 4560000.0, -0.9620276715860859], [1435326912.0, 4580000.0, -0.9653816388332739], [1435328640.0, 4600000.0, -0.9685831611286311], [1435330368.0, 4620000.0, -0.971631732914674], [1435332096.0, 4640000.0, -0.9745268727865772], [1435333824.0, 4660000.0, -0.9772681235681935], [1435335552.0, 4680000.0, -0.9798550523842469], [1435337280.0, 4700000.0, -0.9822872507286886], [1435339008.0, 4720000.0, -0.9845643345292053], [1435340736.0, 4740000.0, -0.986685944207868], [1435342464.0, 4760000.0, -0.988651744737914], [1435344192.0, 4780000.0, -0.9904614256966512], [1435345920.0, 4800000.0, -0.9921147013144778], [1435347648.0, 4820000.0, -0.9936113105200084], [1435349376.0, 4840000.0, -0.9949510169813002], [1435351104.0, 4860000.0, -0.9961336091431725], [1435352832.0, 4880000.0, -0.9971589002606139], [1435354560.0, 4900000.0, -0.9980267284282716], [1435356288.0, 4920000.0, -0.9987369566060175], [1435358016.0, 4940000.0, -0.9992894726405892], [1435359744.0, 4960000.0, -0.9996841892832999], [1435361472.0, 4980000.0, -0.9999210442038161], [1435363200.0, 5000000.0, -1.0], [1435364928.0, 5020000.0, -0.9999210442038161], [1435366656.0, 5040000.0, -0.9996841892832999], [1435368384.0, 5060000.0, -0.9992894726405892], [1435370112.0, 5080000.0, -0.9987369566060175], [1435371840.0, 5100000.0, -0.9980267284282716], [1435373568.0, 5120000.0, -0.9971589002606139], [1435375296.0, 5140000.0, -0.9961336091431725], [1435377024.0, 5160000.0, -0.9949510169813002], [1435378752.0, 5180000.0, -0.9936113105200084], [1435380480.0, 5200000.0, -0.9921147013144779], [1435382208.0, 5220000.0, -0.9904614256966512], [1435383936.0, 5240000.0, -0.9886517447379141], [1435385664.0, 5260000.0, -0.986685944207868], [1435387392.0, 5280000.0, -0.9845643345292053], [1435389120.0, 5300000.0, -0.9822872507286886], [1435390848.0, 5320000.0, -0.9798550523842469], [1435392576.0, 5340000.0, -0.9772681235681935], [1435394304.0, 5360000.0, -0.9745268727865771], [1435396032.0, 5380000.0, -0.9716317329146739], [1435397760.0, 5400000.0, -0.9685831611286311], [1435399488.0, 5420000.0, -0.9653816388332739], [1435401216.0, 5440000.0, -0.9620276715860859], [1435402944.0, 5460000.0, -0.9585217890173758], [1435404672.0, 5480000.0, -0.954864544746643], [1435406400.0, 5500000.0, -0.9510565162951535], [1435408128.0, 5520000.0, -0.9470983049947443], [1435409856.0, 5540000.0, -0.9429905358928644], [1435411584.0, 5560000.0, -0.9387338576538741], [1435413312.0, 5580000.0, -0.9343289424566119], [1435415040.0, 5600000.0, -0.9297764858882512], [1435416768.0, 5620000.0, -0.925077206834458], [1435418496.0, 5640000.0, -0.9202318473658705], [1435420224.0, 5660000.0, -0.9152411726209179], [1435421952.0, 5680000.0, -0.9101059706849959], [1435423680.0, 5700000.0, -0.9048270524660197], [1435425408.0, 5720000.0, -0.8994052515663712], [1435427136.0, 5740000.0, -0.8938414241512641], [1435428864.0, 5760000.0, -0.8881364488135448], [1435430592.0, 5780000.0, -0.8822912264349535], [1435432320.0, 5800000.0, -0.8763066800438637], [1435434048.0, 5820000.0, -0.8701837546695258], [1435435776.0, 5840000.0, -0.8639234171928356], [1435437504.0, 5860000.0, -0.8575266561936525], [1435439232.0, 5880000.0, -0.8509944817946921], [1435440960.0, 5900000.0, -0.8443279255020152], [1435442688.0, 5920000.0, -0.8375280400421418], [1435444416.0, 5940000.0, -0.8305958991958129], [1435446144.0, 5960000.0, -0.8235325976284277], [1435447872.0, 5980000.0, 
-0.816339250717184], [1435449600.0, 6000000.0, -0.8090169943749476], [1435451328.0, 6020000.0, -0.8015669848708769], [1435453056.0, 6040000.0, -0.7939903986478356], [1435454784.0, 6060000.0, -0.786288432136619], [1435456512.0, 6080000.0, -0.7784623015670236], [1435458240.0, 6100000.0, -0.7705132427757893], [1435459968.0, 6120000.0, -0.7624425110114481], [1435461696.0, 6140000.0, -0.754251380736104], [1435463424.0, 6160000.0, -0.7459411454241822], [1435465152.0, 6180000.0, -0.7375131173581739], [1435466880.0, 6200000.0, -0.7289686274214116], [1435468608.0, 6220000.0, -0.7203090248879072], [1435470336.0, 6240000.0, -0.7115356772092855], [1435472064.0, 6260000.0, -0.7026499697988493], [1435473792.0, 6280000.0, -0.6936533058128049], [1435475520.0, 6300000.0, -0.684547105928689], [1435477248.0, 6320000.0, -0.6753328081210246], [1435478976.0, 6340000.0, -0.6660118674342517], [1435480704.0, 6360000.0, -0.6565857557529564], [1435482432.0, 6380000.0, -0.6470559615694442], [1435484160.0, 6400000.0, -0.6374239897486895], [1435485888.0, 6420000.0, -0.6276913612907002], [1435487616.0, 6440000.0, -0.6178596130903348], [1435489344.0, 6460000.0, -0.6079302976946057], [1435491072.0, 6480000.0, -0.597904983057519], [1435492800.0, 6500000.0, -0.5877852522924732], [1435494528.0, 6520000.0, -0.5775727034222676], [1435496256.0, 6540000.0, -0.5672689491267564], [1435497984.0, 6560000.0, -0.5568756164881878], [1435499712.0, 6580000.0, -0.5463943467342689], [1435501440.0, 6600000.0, -0.5358267949789963], [1435503168.0, 6620000.0, -0.5251746299612953], [1435504896.0, 6640000.0, -0.5144395337815068], [1435506624.0, 6660000.0, -0.503623201635761], [1435508352.0, 6680000.0, -0.4927273415482917], [1435510080.0, 6700000.0, -0.48175367410171527], [1435511808.0, 6720000.0, -0.47070393216533246], [1435513536.0, 6740000.0, -0.45957986062148765], [1435515264.0, 6760000.0, -0.44838321609003196], [1435516992.0, 6780000.0, -0.4371157666509325], [1435518720.0, 6800000.0, -0.42577929156507216], [1435520448.0, 6820000.0, -0.41437558099328436], [1435522176.0, 6840000.0, -0.4029064357136628], [1435523904.0, 6860000.0, -0.3913736668372024], [1435525632.0, 6880000.0, -0.37977909552180183], [1435527360.0, 6900000.0, -0.3681245526846786], [1435529088.0, 6920000.0, -0.35641187871325125], [1435530816.0, 6940000.0, -0.3446429231745175], [1435532544.0, 6960000.0, -0.332819544522987], [1435534272.0, 6980000.0, -0.32094360980720976], [1435536000.0, 7000000.0, -0.30901699437494756], [1435537728.0, 7020000.0, -0.297041581577035], [1435539456.0, 7040000.0, -0.28501926246997694], [1435541184.0, 7060000.0, -0.27295193551732594], [1435542912.0, 7080000.0, -0.26084150628989755], [1435544640.0, 7100000.0, -0.2486898871648553], [1435546368.0, 7120000.0, -0.2364989970237251], [1435548096.0, 7140000.0, -0.2242707609493815], [1435549824.0, 7160000.0, -0.21200710992205485], [1435551552.0, 7180000.0, -0.1997099805144071], [1435553280.0, 7200000.0, -0.18738131458572463], [1435555008.0, 7220000.0, -0.17502305897527595], [1435556736.0, 7240000.0, -0.16263716519488428], [1435558464.0, 7260000.0, -0.15022558912075762], [1435560192.0, 7280000.0, -0.13779029068463852], [1435561920.0, 7300000.0, -0.1253332335643046], [1435563648.0, 7320000.0, -0.11285638487348193], [1435565376.0, 7340000.0, -0.10036171485121503], [1435567104.0, 7360000.0, -0.08785119655074321], [1435568832.0, 7380000.0, -0.07532680552793265], [1435570560.0, 7400000.0, -0.06279051952931321], [1435572288.0, 7420000.0, -0.05024431817977017], [1435574016.0, 7440000.0, -0.03769018266993504], 
[1435575744.0, 7460000.0, -0.025130095443337875], [1435577472.0, 7480000.0, -0.012566039883352897], [1435579200.0, 7500000.0, -1.8369701987210297e-16], [1435580928.0, 7520000.0, 0.012566039883352531], [1435582656.0, 7540000.0, 0.025130095443337507], [1435584384.0, 7560000.0, 0.03769018266993467], [1435586112.0, 7580000.0, 0.0502443181797698], [1435587840.0, 7600000.0, 0.06279051952931283], [1435589568.0, 7620000.0, 0.07532680552793229], [1435591296.0, 7640000.0, 0.08785119655074285], [1435593024.0, 7660000.0, 0.10036171485121467], [1435594752.0, 7680000.0, 0.11285638487348157], [1435596480.0, 7700000.0, 0.12533323356430423], [1435598208.0, 7720000.0, 0.13779029068463816], [1435599936.0, 7740000.0, 0.15022558912075726], [1435601664.0, 7760000.0, 0.16263716519488391], [1435603392.0, 7780000.0, 0.1750230589752756], [1435605120.0, 7800000.0, 0.18738131458572427], [1435606848.0, 7820000.0, 0.19970998051440675], [1435608576.0, 7840000.0, 0.2120071099220545], [1435610304.0, 7860000.0, 0.22427076094938114], [1435612032.0, 7880000.0, 0.23649899702372473], [1435613760.0, 7900000.0, 0.24868988716485493], [1435615488.0, 7920000.0, 0.2608415062898972], [1435617216.0, 7940000.0, 0.2729519355173256], [1435618944.0, 7960000.0, 0.2850192624699766], [1435620672.0, 7980000.0, 0.29704158157703464], [1435622400.0, 8000000.0, 0.30901699437494723], [1435624128.0, 8020000.0, 0.32094360980720943], [1435625856.0, 8040000.0, 0.3328195445229867], [1435627584.0, 8060000.0, 0.34464292317451717], [1435629312.0, 8080000.0, 0.3564118787132509], [1435631040.0, 8100000.0, 0.36812455268467825], [1435632768.0, 8120000.0, 0.3797790955218015], [1435634496.0, 8140000.0, 0.3913736668372021], [1435636224.0, 8160000.0, 0.40290643571366247], [1435637952.0, 8180000.0, 0.41437558099328403], [1435639680.0, 8200000.0, 0.4257792915650718], [1435641408.0, 8220000.0, 0.43711576665093216], [1435643136.0, 8240000.0, 0.4483832160900316], [1435644864.0, 8260000.0, 0.4595798606214873], [1435646592.0, 8280000.0, 0.47070393216533213], [1435648320.0, 8300000.0, 0.48175367410171493], [1435650048.0, 8320000.0, 0.49272734154829134], [1435651776.0, 8340000.0, 0.5036232016357607], [1435653504.0, 8360000.0, 0.5144395337815064], [1435655232.0, 8380000.0, 0.5251746299612949], [1435656960.0, 8400000.0, 0.535826794978996], [1435658688.0, 8420000.0, 0.5463943467342686], [1435660416.0, 8440000.0, 0.5568756164881876], [1435662144.0, 8460000.0, 0.5672689491267562], [1435663872.0, 8480000.0, 0.5775727034222673], [1435665600.0, 8500000.0, 0.5877852522924729], [1435667328.0, 8520000.0, 0.5979049830575188], [1435669056.0, 8540000.0, 0.6079302976946054], [1435670784.0, 8560000.0, 0.6178596130903344], [1435672512.0, 8580000.0, 0.6276913612906999], [1435674240.0, 8600000.0, 0.6374239897486893], [1435675968.0, 8620000.0, 0.647055961569444], [1435677696.0, 8640000.0, 0.6565857557529562], [1435679424.0, 8660000.0, 0.6660118674342514], [1435681152.0, 8680000.0, 0.6753328081210244], [1435682880.0, 8700000.0, 0.6845471059286886], [1435684608.0, 8720000.0, 0.693653305812805], [1435686336.0, 8740000.0, 0.7026499697988493], [1435688064.0, 8760000.0, 0.7115356772092849], [1435689792.0, 8780000.0, 0.7203090248879065], [1435691520.0, 8800000.0, 0.7289686274214112], [1435693248.0, 8820000.0, 0.7375131173581737], [1435694976.0, 8840000.0, 0.745941145424182], [1435696704.0, 8860000.0, 0.7542513807361038], [1435698432.0, 8880000.0, 0.7624425110114479], [1435700160.0, 8900000.0, 0.7705132427757894], [1435701888.0, 8920000.0, 0.7784623015670236], [1435703616.0, 8940000.0, 
0.7862884321366186], [1435705344.0, 8960000.0, 0.793990398647835], [1435707072.0, 8980000.0, 0.8015669848708764], [1435708800.0, 9000000.0, 0.8090169943749473], [1435710528.0, 9020000.0, 0.8163392507171839], [1435712256.0, 9040000.0, 0.8235325976284275], [1435713984.0, 9060000.0, 0.8305958991958127], [1435715712.0, 9080000.0, 0.8375280400421419], [1435717440.0, 9100000.0, 0.8443279255020153], [1435719168.0, 9120000.0, 0.8509944817946921], [1435720896.0, 9140000.0, 0.8575266561936521], [1435722624.0, 9160000.0, 0.8639234171928352], [1435724352.0, 9180000.0, 0.8701837546695256], [1435726080.0, 9200000.0, 0.8763066800438636], [1435727808.0, 9220000.0, 0.8822912264349533], [1435729536.0, 9240000.0, 0.8881364488135446], [1435731264.0, 9260000.0, 0.8938414241512639], [1435732992.0, 9280000.0, 0.8994052515663712], [1435734720.0, 9300000.0, 0.9048270524660197], [1435736448.0, 9320000.0, 0.9101059706849955], [1435738176.0, 9340000.0, 0.9152411726209175], [1435739904.0, 9360000.0, 0.9202318473658703], [1435741632.0, 9380000.0, 0.9250772068344577], [1435743360.0, 9400000.0, 0.9297764858882511], [1435745088.0, 9420000.0, 0.9343289424566118], [1435746816.0, 9440000.0, 0.9387338576538738], [1435748544.0, 9460000.0, 0.9429905358928643], [1435750272.0, 9480000.0, 0.9470983049947441], [1435752000.0, 9500000.0, 0.9510565162951535], [1435753728.0, 9520000.0, 0.954864544746643], [1435755456.0, 9540000.0, 0.9585217890173756], [1435757184.0, 9560000.0, 0.9620276715860857], [1435758912.0, 9580000.0, 0.9653816388332737], [1435760640.0, 9600000.0, 0.968583161128631], [1435762368.0, 9620000.0, 0.9716317329146739], [1435764096.0, 9640000.0, 0.974526872786577], [1435765824.0, 9660000.0, 0.9772681235681934], [1435767552.0, 9680000.0, 0.9798550523842469], [1435769280.0, 9700000.0, 0.9822872507286887], [1435771008.0, 9720000.0, 0.9845643345292054], [1435772736.0, 9740000.0, 0.9866859442078679], [1435774464.0, 9760000.0, 0.988651744737914], [1435776192.0, 9780000.0, 0.9904614256966512], [1435777920.0, 9800000.0, 0.9921147013144778], [1435779648.0, 9820000.0, 0.9936113105200084], [1435781376.0, 9840000.0, 0.9949510169813002], [1435783104.0, 9860000.0, 0.9961336091431725], [1435784832.0, 9880000.0, 0.9971589002606139], [1435786560.0, 9900000.0, 0.9980267284282716], [1435788288.0, 9920000.0, 0.9987369566060175], [1435790016.0, 9940000.0, 0.9992894726405892], [1435791744.0, 9960000.0, 0.9996841892832999], [1435793472.0, 9980000.0, 0.9999210442038161], [1435795200.0, 10000000.0, 1.0], [1435796928.0, 10020000.0, 0.9999210442038161], [1435798656.0, 10040000.0, 0.9996841892832999], [1435800384.0, 10060000.0, 0.9992894726405892], [1435802112.0, 10080000.0, 0.9987369566060175], [1435803840.0, 10100000.0, 0.9980267284282716], [1435805568.0, 10120000.0, 0.9971589002606139], [1435807296.0, 10140000.0, 0.9961336091431725], [1435809024.0, 10160000.0, 0.9949510169813002], [1435810752.0, 10180000.0, 0.9936113105200084], [1435812480.0, 10200000.0, 0.9921147013144779], [1435814208.0, 10220000.0, 0.9904614256966512], [1435815936.0, 10240000.0, 0.988651744737914], [1435817664.0, 10260000.0, 0.986685944207868], [1435819392.0, 10280000.0, 0.9845643345292054], [1435821120.0, 10300000.0, 0.9822872507286887], [1435822848.0, 10320000.0, 0.979855052384247], [1435824576.0, 10340000.0, 0.9772681235681935], [1435826304.0, 10360000.0, 0.9745268727865771], [1435828032.0, 10380000.0, 0.971631732914674], [1435829760.0, 10400000.0, 0.9685831611286311], [1435831488.0, 10420000.0, 0.9653816388332739], [1435833216.0, 10440000.0, 0.9620276715860858], 
[1435834944.0, 10460000.0, 0.9585217890173757], [1435836672.0, 10480000.0, 0.9548645447466431], [1435838400.0, 10500000.0, 0.9510565162951536], [1435840128.0, 10520000.0, 0.9470983049947443], [1435841856.0, 10540000.0, 0.9429905358928645], [1435843584.0, 10560000.0, 0.9387338576538741], [1435845312.0, 10580000.0, 0.934328942456612], [1435847040.0, 10600000.0, 0.9297764858882513], [1435848768.0, 10620000.0, 0.9250772068344579], [1435850496.0, 10640000.0, 0.9202318473658702], [1435852224.0, 10660000.0, 0.9152411726209176], [1435853952.0, 10680000.0, 0.9101059706849958], [1435855680.0, 10700000.0, 0.9048270524660196], [1435857408.0, 10720000.0, 0.8994052515663711], [1435859136.0, 10740000.0, 0.8938414241512638], [1435860864.0, 10760000.0, 0.8881364488135444], [1435862592.0, 10780000.0, 0.8822912264349532], [1435864320.0, 10800000.0, 0.8763066800438634], [1435866048.0, 10820000.0, 0.8701837546695254], [1435867776.0, 10840000.0, 0.8639234171928354], [1435869504.0, 10860000.0, 0.8575266561936523], [1435871232.0, 10880000.0, 0.8509944817946918], [1435872960.0, 10900000.0, 0.8443279255020151], [1435874688.0, 10920000.0, 0.8375280400421417], [1435876416.0, 10940000.0, 0.8305958991958124], [1435878144.0, 10960000.0, 0.8235325976284272], [1435879872.0, 10980000.0, 0.8163392507171836], [1435881600.0, 11000000.0, 0.809016994374947], [1435883328.0, 11020000.0, 0.8015669848708762], [1435885056.0, 11040000.0, 0.7939903986478354], [1435886784.0, 11060000.0, 0.7862884321366189], [1435888512.0, 11080000.0, 0.7784623015670233], [1435890240.0, 11100000.0, 0.770513242775789], [1435891968.0, 11120000.0, 0.7624425110114477], [1435893696.0, 11140000.0, 0.7542513807361035], [1435895424.0, 11160000.0, 0.7459411454241818], [1435897152.0, 11180000.0, 0.7375131173581735], [1435898880.0, 11200000.0, 0.728968627421411], [1435900608.0, 11220000.0, 0.7203090248879069], [1435902336.0, 11240000.0, 0.7115356772092852], [1435904064.0, 11260000.0, 0.7026499697988496], [1435905792.0, 11280000.0, 0.6936533058128054], [1435907520.0, 11300000.0, 0.6845471059286896], [1435909248.0, 11320000.0, 0.6753328081210254], [1435910976.0, 11340000.0, 0.6660118674342524], [1435912704.0, 11360000.0, 0.6565857557529572], [1435914432.0, 11380000.0, 0.647055961569445], [1435916160.0, 11400000.0, 0.6374239897486903], [1435917888.0, 11420000.0, 0.627691361290701], [1435919616.0, 11440000.0, 0.6178596130903348], [1435921344.0, 11460000.0, 0.6079302976946058], [1435923072.0, 11480000.0, 0.5979049830575199], [1435924800.0, 11500000.0, 0.587785252292474], [1435926528.0, 11520000.0, 0.5775727034222684], [1435928256.0, 11540000.0, 0.5672689491267573], [1435929984.0, 11560000.0, 0.5568756164881887], [1435931712.0, 11580000.0, 0.5463943467342697], [1435933440.0, 11600000.0, 0.5358267949789972], [1435935168.0, 11620000.0, 0.5251746299612962], [1435936896.0, 11640000.0, 0.5144395337815069], [1435938624.0, 11660000.0, 0.5036232016357611], [1435940352.0, 11680000.0, 0.4927273415482925], [1435942080.0, 11700000.0, 0.48175367410171616], [1435943808.0, 11720000.0, 0.47070393216533335], [1435945536.0, 11740000.0, 0.45957986062148853], [1435947264.0, 11760000.0, 0.44838321609003284], [1435948992.0, 11780000.0, 0.43711576665093343], [1435950720.0, 11800000.0, 0.42577929156507305], [1435952448.0, 11820000.0, 0.4143755809932845], [1435954176.0, 11840000.0, 0.4029064357136629], [1435955904.0, 11860000.0, 0.39137366683720337], [1435957632.0, 11880000.0, 0.37977909552180195], [1435959360.0, 11900000.0, 0.36812455268467875], [1435961088.0, 11920000.0, 0.35641187871325136], 
[1435962816.0, 11940000.0, 0.3446429231745176], [1435964544.0, 11960000.0, 0.3328195445229871], [1435966272.0, 11980000.0, 0.3209436098072099], [1435968000.0, 12000000.0, 0.30901699437494773], [1435969728.0, 12020000.0, 0.2970415815770351], [1435971456.0, 12040000.0, 0.28501926246997705], [1435973184.0, 12060000.0, 0.27295193551732605], [1435974912.0, 12080000.0, 0.26084150628989766], [1435976640.0, 12100000.0, 0.24868988716485543], [1435978368.0, 12120000.0, 0.2364989970237252], [1435980096.0, 12140000.0, 0.22427076094938161], [1435981824.0, 12160000.0, 0.21200710992205496], [1435983552.0, 12180000.0, 0.19970998051440725], [1435985280.0, 12200000.0, 0.18738131458572474], [1435987008.0, 12220000.0, 0.17502305897527606], [1435988736.0, 12240000.0, 0.1626371651948844], [1435990464.0, 12260000.0, 0.15022558912075773], [1435992192.0, 12280000.0, 0.13779029068463863], [1435993920.0, 12300000.0, 0.12533323356430473], [1435995648.0, 12320000.0, 0.11285638487348205], [1435997376.0, 12340000.0, 0.10036171485121516], [1435999104.0, 12360000.0, 0.08785119655074333], [1436000832.0, 12380000.0, 0.07532680552793278], [1436002560.0, 12400000.0, 0.06279051952931332], [1436004288.0, 12420000.0, 0.050244318179770285], [1436006016.0, 12440000.0, 0.037690182669935166], [1436007744.0, 12460000.0, 0.025130095443337996], [1436009472.0, 12480000.0, 0.01256603988335302], [1436011200.0, 12500000.0, 3.061616997868383e-16], [1436012928.0, 12520000.0, -0.012566039883352408], [1436014656.0, 12540000.0, -0.025130095443337386], [1436016384.0, 12560000.0, -0.03769018266993455], [1436018112.0, 12580000.0, -0.050244318179769674], [1436019840.0, 12600000.0, -0.06279051952931271], [1436021568.0, 12620000.0, -0.07532680552793217], [1436023296.0, 12640000.0, -0.08785119655074272], [1436025024.0, 12660000.0, -0.10036171485121455], [1436026752.0, 12680000.0, -0.11285638487348144], [1436028480.0, 12700000.0, -0.12533323356430412], [1436030208.0, 12720000.0, -0.13779029068463805], [1436031936.0, 12740000.0, -0.15022558912075626], [1436033664.0, 12760000.0, -0.16263716519488378], [1436035392.0, 12780000.0, -0.17502305897527545], [1436037120.0, 12800000.0, -0.18738131458572502], [1436038848.0, 12820000.0, -0.19970998051440664], [1436040576.0, 12840000.0, -0.21200710992205524], [1436042304.0, 12860000.0, -0.224270760949381], [1436044032.0, 12880000.0, -0.23649899702372376], [1436045760.0, 12900000.0, -0.24868988716485482], [1436047488.0, 12920000.0, -0.2608415062898962], [1436049216.0, 12940000.0, -0.2729519355173255], [1436050944.0, 12960000.0, -0.2850192624699756], [1436052672.0, 12980000.0, -0.29704158157703536], [1436054400.0, 13000000.0, -0.3090169943749471], [1436056128.0, 13020000.0, -0.32094360980721015], [1436057856.0, 13040000.0, -0.33281954452298657], [1436059584.0, 13060000.0, -0.3446429231745162], [1436061312.0, 13080000.0, -0.3564118787132508], [1436063040.0, 13100000.0, -0.3681245526846773], [1436064768.0, 13120000.0, -0.3797790955218014], [1436066496.0, 13140000.0, -0.391373666837202], [1436068224.0, 13160000.0, -0.40290643571366314], [1436069952.0, 13180000.0, -0.4143755809932839], [1436071680.0, 13200000.0, -0.4257792915650733], [1436073408.0, 13220000.0, -0.4371157666509329], [1436075136.0, 13240000.0, -0.4483832160900331], [1436076864.0, 13260000.0, -0.45957986062148803], [1436078592.0, 13280000.0, -0.470703932165332], [1436080320.0, 13300000.0, -0.4817536741017156], [1436082048.0, 13320000.0, -0.49272734154829123], [1436083776.0, 13340000.0, -0.5036232016357614], [1436085504.0, 13360000.0, -0.5144395337815063], 
[1436087232.0, 13380000.0, -0.5251746299612964], [1436088960.0, 13400000.0, -0.5358267949789967], [1436090688.0, 13420000.0, -0.5463943467342699], [1436092416.0, 13440000.0, -0.5568756164881882], [1436094144.0, 13460000.0, -0.5672689491267561], [1436095872.0, 13480000.0, -0.577572703422268], [1436097600.0, 13500000.0, -0.5877852522924729], [1436099328.0, 13520000.0, -0.5979049830575194], [1436101056.0, 13540000.0, -0.6079302976946053], [1436102784.0, 13560000.0, -0.617859613090335], [1436104512.0, 13580000.0, -0.6276913612907006], [1436106240.0, 13600000.0, -0.6374239897486905], [1436107968.0, 13620000.0, -0.6470559615694446], [1436109696.0, 13640000.0, -0.6565857557529561], [1436111424.0, 13660000.0, -0.666011867434252], [1436113152.0, 13680000.0, -0.6753328081210243], [1436114880.0, 13700000.0, -0.6845471059286892], [1436116608.0, 13720000.0, -0.6936533058128049], [1436118336.0, 13740000.0, -0.7026499697988499], [1436120064.0, 13760000.0, -0.7115356772092842], [1436121792.0, 13780000.0, -0.7203090248879065], [1436123520.0, 13800000.0, -0.7289686274214106], [1436125248.0, 13820000.0, -0.7375131173581736], [1436126976.0, 13840000.0, -0.7459411454241813], [1436128704.0, 13860000.0, -0.7542513807361036], [1436130432.0, 13880000.0, -0.7624425110114472], [1436132160.0, 13900000.0, -0.7705132427757881], [1436133888.0, 13920000.0, -0.7784623015670229], [1436135616.0, 13940000.0, -0.7862884321366179], [1436137344.0, 13960000.0, -0.7939903986478349], [1436139072.0, 13980000.0, -0.8015669848708757], [1436140800.0, 14000000.0, -0.8090169943749472], [1436142528.0, 14020000.0, -0.8163392507171833], [1436144256.0, 14040000.0, -0.8235325976284273], [1436145984.0, 14060000.0, -0.8305958991958121], [1436147712.0, 14080000.0, -0.8375280400421408], [1436149440.0, 14100000.0, -0.8443279255020147], [1436151168.0, 14120000.0, -0.8509944817946911], [1436152896.0, 14140000.0, -0.857526656193652], [1436154624.0, 14160000.0, -0.8639234171928347], [1436156352.0, 14180000.0, -0.8701837546695256], [1436158080.0, 14200000.0, -0.876306680043863], [1436159808.0, 14220000.0, -0.8822912264349533], [1436161536.0, 14240000.0, -0.8881364488135441], [1436163264.0, 14260000.0, -0.893841424151263], [1436164992.0, 14280000.0, -0.8994052515663707], [1436166720.0, 14300000.0, -0.9048270524660189], [1436168448.0, 14320000.0, -0.9101059706849955], [1436170176.0, 14340000.0, -0.9152411726209171], [1436171904.0, 14360000.0, -0.9202318473658703], [1436173632.0, 14380000.0, -0.9250772068344577], [1436175360.0, 14400000.0, -0.9297764858882515], [1436177088.0, 14420000.0, -0.9343289424566118], [1436178816.0, 14440000.0, -0.9387338576538742], [1436180544.0, 14460000.0, -0.9429905358928643], [1436182272.0, 14480000.0, -0.9470983049947439], [1436184000.0, 14500000.0, -0.9510565162951534], [1436185728.0, 14520000.0, -0.9548645447466426], [1436187456.0, 14540000.0, -0.9585217890173758], [1436189184.0, 14560000.0, -0.9620276715860856], [1436190912.0, 14580000.0, -0.9653816388332739], [1436192640.0, 14600000.0, -0.968583161128631], [1436194368.0, 14620000.0, -0.971631732914674], [1436196096.0, 14640000.0, -0.974526872786577], [1436197824.0, 14660000.0, -0.9772681235681931], [1436199552.0, 14680000.0, -0.9798550523842469], [1436201280.0, 14700000.0, -0.9822872507286885], [1436203008.0, 14720000.0, -0.9845643345292053], [1436204736.0, 14740000.0, -0.9866859442078679], [1436206464.0, 14760000.0, -0.9886517447379141], [1436208192.0, 14780000.0, -0.9904614256966511], [1436209920.0, 14800000.0, -0.9921147013144779], [1436211648.0, 14820000.0, 
-0.9936113105200084], [1436213376.0, 14840000.0, -0.9949510169813001], [1436215104.0, 14860000.0, -0.9961336091431724], [1436216832.0, 14880000.0, -0.9971589002606138], [1436218560.0, 14900000.0, -0.9980267284282716], [1436220288.0, 14920000.0, -0.9987369566060175], [1436222016.0, 14940000.0, -0.9992894726405892], [1436223744.0, 14960000.0, -0.9996841892832999], [1436225472.0, 14980000.0, -0.9999210442038161], [1436227200.0, 15000000.0, -1.0], [1436228928.0, 15020000.0, -0.9999210442038161], [1436230656.0, 15040000.0, -0.9996841892832999], [1436232384.0, 15060000.0, -0.9992894726405893], [1436234112.0, 15080000.0, -0.9987369566060175], [1436235840.0, 15100000.0, -0.9980267284282716], [1436237568.0, 15120000.0, -0.9971589002606139], [1436239296.0, 15140000.0, -0.9961336091431725], [1436241024.0, 15160000.0, -0.9949510169813001], [1436242752.0, 15180000.0, -0.9936113105200084], [1436244480.0, 15200000.0, -0.992114701314478], [1436246208.0, 15220000.0, -0.9904614256966512], [1436247936.0, 15240000.0, -0.9886517447379142], [1436249664.0, 15260000.0, -0.986685944207868], [1436251392.0, 15280000.0, -0.9845643345292054], [1436253120.0, 15300000.0, -0.9822872507286886], [1436254848.0, 15320000.0, -0.979855052384247], [1436256576.0, 15340000.0, -0.9772681235681934], [1436258304.0, 15360000.0, -0.9745268727865772], [1436260032.0, 15380000.0, -0.9716317329146742], [1436261760.0, 15400000.0, -0.9685831611286311], [1436263488.0, 15420000.0, -0.9653816388332741], [1436265216.0, 15440000.0, -0.9620276715860858], [1436266944.0, 15460000.0, -0.9585217890173761], [1436268672.0, 15480000.0, -0.9548645447466428], [1436270400.0, 15500000.0, -0.9510565162951536], [1436272128.0, 15520000.0, -0.9470983049947441], [1436273856.0, 15540000.0, -0.9429905358928645], [1436275584.0, 15560000.0, -0.9387338576538744], [1436277312.0, 15580000.0, -0.934328942456612], [1436279040.0, 15600000.0, -0.9297764858882517], [1436280768.0, 15620000.0, -0.9250772068344579], [1436282496.0, 15640000.0, -0.9202318473658705], [1436284224.0, 15660000.0, -0.9152411726209174], [1436285952.0, 15680000.0, -0.9101059706849958], [1436287680.0, 15700000.0, -0.9048270524660192], [1436289408.0, 15720000.0, -0.8994052515663711], [1436291136.0, 15740000.0, -0.8938414241512633], [1436292864.0, 15760000.0, -0.8881364488135445], [1436294592.0, 15780000.0, -0.8822912264349536], [1436296320.0, 15800000.0, -0.8763066800438635], [1436298048.0, 15820000.0, -0.8701837546695259], [1436299776.0, 15840000.0, -0.863923417192835], [1436301504.0, 15860000.0, -0.8575266561936524], [1436303232.0, 15880000.0, -0.8509944817946915], [1436304960.0, 15900000.0, -0.8443279255020151], [1436306688.0, 15920000.0, -0.8375280400421412], [1436308416.0, 15940000.0, -0.8305958991958126], [1436310144.0, 15960000.0, -0.8235325976284278], [1436311872.0, 15980000.0, -0.8163392507171837], [1436313600.0, 16000000.0, -0.8090169943749477], [1436315328.0, 16020000.0, -0.8015669848708762], [1436317056.0, 16040000.0, -0.7939903986478355], [1436318784.0, 16060000.0, -0.7862884321366184], [1436320512.0, 16080000.0, -0.7784623015670235], [1436322240.0, 16100000.0, -0.7705132427757886], [1436323968.0, 16120000.0, -0.7624425110114477], [1436325696.0, 16140000.0, -0.7542513807361042], [1436327424.0, 16160000.0, -0.7459411454241819], [1436329152.0, 16180000.0, -0.7375131173581742], [1436330880.0, 16200000.0, -0.7289686274214111], [1436332608.0, 16220000.0, -0.7203090248879069], [1436334336.0, 16240000.0, -0.7115356772092847], [1436336064.0, 16260000.0, -0.7026499697988504], [1436337792.0, 
16280000.0, -0.6936533058128055], [1436339520.0, 16300000.0, -0.6845471059286897], [1436341248.0, 16320000.0, -0.6753328081210248], [1436342976.0, 16340000.0, -0.6660118674342526], [1436344704.0, 16360000.0, -0.6565857557529566], [1436346432.0, 16380000.0, -0.6470559615694451], [1436348160.0, 16400000.0, -0.6374239897486911], [1436349888.0, 16420000.0, -0.6276913612907011], [1436351616.0, 16440000.0, -0.6178596130903357], [1436353344.0, 16460000.0, -0.6079302976946059], [1436355072.0, 16480000.0, -0.59790498305752], [1436356800.0, 16500000.0, -0.5877852522924735], [1436358528.0, 16520000.0, -0.5775727034222685], [1436360256.0, 16540000.0, -0.5672689491267566], [1436361984.0, 16560000.0, -0.5568756164881888], [1436363712.0, 16580000.0, -0.5463943467342706], [1436365440.0, 16600000.0, -0.5358267949789972], [1436367168.0, 16620000.0, -0.525174629961297], [1436368896.0, 16640000.0, -0.514439533781507], [1436370624.0, 16660000.0, -0.503623201635762], [1436372352.0, 16680000.0, -0.49272734154829184], [1436374080.0, 16700000.0, -0.48175367410171627], [1436375808.0, 16720000.0, -0.4707039321653327], [1436377536.0, 16740000.0, -0.45957986062148865], [1436379264.0, 16760000.0, -0.4483832160900338], [1436380992.0, 16780000.0, -0.43711576665093355], [1436382720.0, 16800000.0, -0.425779291565074], [1436384448.0, 16820000.0, -0.4143755809932846], [1436386176.0, 16840000.0, -0.4029064357136638], [1436387904.0, 16860000.0, -0.39137366683720265], [1436389632.0, 16880000.0, -0.37977909552180206], [1436391360.0, 16900000.0, -0.36812455268467803], [1436393088.0, 16920000.0, -0.3564118787132515], [1436394816.0, 16940000.0, -0.3446429231745169], [1436396544.0, 16960000.0, -0.33281954452298723], [1436398272.0, 16980000.0, -0.3209436098072108], [1436400000.0, 17000000.0, -0.30901699437494784], [1436401728.0, 17020000.0, -0.29704158157703603], [1436403456.0, 17040000.0, -0.28501926246997633], [1436405184.0, 17060000.0, -0.27295193551732616], [1436406912.0, 17080000.0, -0.26084150628989694], [1436408640.0, 17100000.0, -0.24868988716485554], [1436410368.0, 17120000.0, -0.23649899702372448], [1436412096.0, 17140000.0, -0.22427076094938173], [1436413824.0, 17160000.0, -0.21200710992205596], [1436415552.0, 17180000.0, -0.19970998051440736], [1436417280.0, 17200000.0, -0.18738131458572574], [1436419008.0, 17220000.0, -0.17502305897527617], [1436420736.0, 17240000.0, -0.1626371651948845], [1436422464.0, 17260000.0, -0.15022558912075698], [1436424192.0, 17280000.0, -0.13779029068463877], [1436425920.0, 17300000.0, -0.12533323356430395], [1436427648.0, 17320000.0, -0.11285638487348218], [1436429376.0, 17340000.0, -0.10036171485121616], [1436431104.0, 17360000.0, -0.08785119655074346], [1436432832.0, 17380000.0, -0.07532680552793379], [1436434560.0, 17400000.0, -0.06279051952931344], [1436436288.0, 17420000.0, -0.05024431817977041], [1436438016.0, 17440000.0, -0.037690182669934395], [1436439744.0, 17460000.0, -0.025130095443338118], [1436441472.0, 17480000.0, -0.012566039883352255], [1436443200.0, 17500000.0, -4.286263797015736e-16], [1436444928.0, 17520000.0, 0.012566039883351397], [1436446656.0, 17540000.0, 0.02513009544333726], [1436448384.0, 17560000.0, 0.03769018266993354], [1436450112.0, 17580000.0, 0.05024431817976955], [1436451840.0, 17600000.0, 0.0627905195293126], [1436453568.0, 17620000.0, 0.07532680552793293], [1436455296.0, 17640000.0, 0.08785119655074261], [1436457024.0, 17660000.0, 0.10036171485121531], [1436458752.0, 17680000.0, 0.11285638487348133], [1436460480.0, 17700000.0, 0.12533323356430312], 
[1436462208.0, 17720000.0, 0.1377902906846379], [1436463936.0, 17740000.0, 0.15022558912075612], [1436465664.0, 17760000.0, 0.16263716519488366], [1436467392.0, 17780000.0, 0.17502305897527534], [1436469120.0, 17800000.0, 0.1873813145857249], [1436470848.0, 17820000.0, 0.19970998051440653], [1436472576.0, 17840000.0, 0.21200710992205513], [1436474304.0, 17860000.0, 0.2242707609493809], [1436476032.0, 17880000.0, 0.23649899702372365], [1436477760.0, 17900000.0, 0.2486898871648547], [1436479488.0, 17920000.0, 0.2608415062898961], [1436481216.0, 17940000.0, 0.2729519355173254], [1436482944.0, 17960000.0, 0.2850192624699755], [1436484672.0, 17980000.0, 0.29704158157703525], [1436486400.0, 18000000.0, 0.309016994374947], [1436488128.0, 18020000.0, 0.32094360980721004], [1436489856.0, 18040000.0, 0.3328195445229864], [1436491584.0, 18060000.0, 0.3446429231745161], [1436493312.0, 18080000.0, 0.3564118787132507], [1436495040.0, 18100000.0, 0.3681245526846772], [1436496768.0, 18120000.0, 0.3797790955218013], [1436498496.0, 18140000.0, 0.3913736668372019], [1436500224.0, 18160000.0, 0.402906435713663], [1436501952.0, 18180000.0, 0.4143755809932838], [1436503680.0, 18200000.0, 0.4257792915650732], [1436505408.0, 18220000.0, 0.43711576665093277], [1436507136.0, 18240000.0, 0.448383216090033], [1436508864.0, 18260000.0, 0.4595798606214879], [1436510592.0, 18280000.0, 0.4707039321653319], [1436512320.0, 18300000.0, 0.4817536741017155], [1436514048.0, 18320000.0, 0.4927273415482911], [1436515776.0, 18340000.0, 0.5036232016357612], [1436517504.0, 18360000.0, 0.5144395337815062], [1436519232.0, 18380000.0, 0.5251746299612963], [1436520960.0, 18400000.0, 0.5358267949789965], [1436522688.0, 18420000.0, 0.5463943467342698], [1436524416.0, 18440000.0, 0.5568756164881881], [1436526144.0, 18460000.0, 0.567268949126756], [1436527872.0, 18480000.0, 0.5775727034222679], [1436529600.0, 18500000.0, 0.5877852522924728], [1436531328.0, 18520000.0, 0.5979049830575193], [1436533056.0, 18540000.0, 0.6079302976946052], [1436534784.0, 18560000.0, 0.617859613090335], [1436536512.0, 18580000.0, 0.6276913612907005], [1436538240.0, 18600000.0, 0.6374239897486904], [1436539968.0, 18620000.0, 0.6470559615694444], [1436541696.0, 18640000.0, 0.656585755752956], [1436543424.0, 18660000.0, 0.6660118674342519], [1436545152.0, 18680000.0, 0.6753328081210241], [1436546880.0, 18700000.0, 0.6845471059286892], [1436548608.0, 18720000.0, 0.6936533058128048], [1436550336.0, 18740000.0, 0.7026499697988497], [1436552064.0, 18760000.0, 0.7115356772092841], [1436553792.0, 18780000.0, 0.7203090248879064], [1436555520.0, 18800000.0, 0.7289686274214104], [1436557248.0, 18820000.0, 0.7375131173581735], [1436558976.0, 18840000.0, 0.7459411454241812], [1436560704.0, 18860000.0, 0.7542513807361036], [1436562432.0, 18880000.0, 0.7624425110114471], [1436564160.0, 18900000.0, 0.770513242775788], [1436565888.0, 18920000.0, 0.7784623015670229], [1436567616.0, 18940000.0, 0.7862884321366178], [1436569344.0, 18960000.0, 0.7939903986478349], [1436571072.0, 18980000.0, 0.8015669848708757], [1436572800.0, 19000000.0, 0.8090169943749471], [1436574528.0, 19020000.0, 0.8163392507171833], [1436576256.0, 19040000.0, 0.8235325976284272], [1436577984.0, 19060000.0, 0.8305958991958121], [1436579712.0, 19080000.0, 0.8375280400421408], [1436581440.0, 19100000.0, 0.8443279255020146], [1436583168.0, 19120000.0, 0.8509944817946911], [1436584896.0, 19140000.0, 0.857526656193652], [1436586624.0, 19160000.0, 0.8639234171928346], [1436588352.0, 19180000.0, 0.8701837546695255], 
[1436590080.0, 19200000.0, 0.876306680043863], [1436591808.0, 19220000.0, 0.8822912264349532], [1436593536.0, 19240000.0, 0.888136448813544], [1436595264.0, 19260000.0, 0.893841424151263], [1436596992.0, 19280000.0, 0.8994052515663707], [1436598720.0, 19300000.0, 0.9048270524660189], [1436600448.0, 19320000.0, 0.9101059706849954], [1436602176.0, 19340000.0, 0.915241172620917], [1436603904.0, 19360000.0, 0.9202318473658702], [1436605632.0, 19380000.0, 0.9250772068344576], [1436607360.0, 19400000.0, 0.9297764858882513], [1436609088.0, 19420000.0, 0.9343289424566117], [1436610816.0, 19440000.0, 0.9387338576538741], [1436612544.0, 19460000.0, 0.9429905358928642], [1436614272.0, 19480000.0, 0.9470983049947438], [1436616000.0, 19500000.0, 0.9510565162951534], [1436617728.0, 19520000.0, 0.9548645447466426], [1436619456.0, 19540000.0, 0.9585217890173758], [1436621184.0, 19560000.0, 0.9620276715860856], [1436622912.0, 19580000.0, 0.9653816388332739], [1436624640.0, 19600000.0, 0.968583161128631], [1436626368.0, 19620000.0, 0.971631732914674], [1436628096.0, 19640000.0, 0.974526872786577], [1436629824.0, 19660000.0, 0.9772681235681931], [1436631552.0, 19680000.0, 0.9798550523842467], [1436633280.0, 19700000.0, 0.9822872507286885], [1436635008.0, 19720000.0, 0.9845643345292053], [1436636736.0, 19740000.0, 0.9866859442078679], [1436638464.0, 19760000.0, 0.9886517447379141], [1436640192.0, 19780000.0, 0.9904614256966511], [1436641920.0, 19800000.0, 0.9921147013144779], [1436643648.0, 19820000.0, 0.9936113105200084], [1436645376.0, 19840000.0, 0.9949510169813001], [1436647104.0, 19860000.0, 0.9961336091431724], [1436648832.0, 19880000.0, 0.9971589002606138], [1436650560.0, 19900000.0, 0.9980267284282716], [1436652288.0, 19920000.0, 0.9987369566060175], [1436654016.0, 19940000.0, 0.9992894726405892], [1436655744.0, 19960000.0, 0.9996841892832999], [1436657472.0, 19980000.0, 0.9999210442038161], [1436659200.0, 20000000.0, 1.0], [1436660928.0, 20020000.0, 0.9999210442038161], [1436662656.0, 20040000.0, 0.9996841892832999], [1436664384.0, 20060000.0, 0.9992894726405893], [1436666112.0, 20080000.0, 0.9987369566060175], [1436667840.0, 20100000.0, 0.9980267284282718], [1436669568.0, 20120000.0, 0.9971589002606139], [1436671296.0, 20140000.0, 0.9961336091431727], [1436673024.0, 20160000.0, 0.9949510169813002], [1436674752.0, 20180000.0, 0.9936113105200086], [1436676480.0, 20200000.0, 0.992114701314478], [1436678208.0, 20220000.0, 0.9904614256966515], [1436679936.0, 20240000.0, 0.9886517447379142], [1436681664.0, 20260000.0, 0.9866859442078684], [1436683392.0, 20280000.0, 0.9845643345292056], [1436685120.0, 20300000.0, 0.9822872507286889], [1436686848.0, 20320000.0, 0.979855052384247], [1436688576.0, 20340000.0, 0.9772681235681937], [1436690304.0, 20360000.0, 0.9745268727865772], [1436692032.0, 20380000.0, 0.9716317329146742], [1436693760.0, 20400000.0, 0.9685831611286312], [1436695488.0, 20420000.0, 0.9653816388332741], [1436697216.0, 20440000.0, 0.9620276715860858], [1436698944.0, 20460000.0, 0.9585217890173766], [1436700672.0, 20480000.0, 0.9548645447466428], [1436702400.0, 20500000.0, 0.9510565162951543], [1436704128.0, 20520000.0, 0.9470983049947441], [1436705856.0, 20540000.0, 0.9429905358928652], [1436707584.0, 20560000.0, 0.9387338576538744], [1436709312.0, 20580000.0, 0.9343289424566127], [1436711040.0, 20600000.0, 0.9297764858882517], [1436712768.0, 20620000.0, 0.9250772068344587], [1436714496.0, 20640000.0, 0.9202318473658706], [1436716224.0, 20660000.0, 0.9152411726209181], [1436717952.0, 20680000.0, 
0.9101059706849959], [1436719680.0, 20700000.0, 0.90482705246602], [1436721408.0, 20720000.0, 0.8994052515663712], [1436723136.0, 20740000.0, 0.8938414241512642], [1436724864.0, 20760000.0, 0.8881364488135445], [1436726592.0, 20780000.0, 0.8822912264349536], [1436728320.0, 20800000.0, 0.8763066800438635], [1436730048.0, 20820000.0, 0.8701837546695259], [1436731776.0, 20840000.0, 0.8639234171928352], [1436733504.0, 20860000.0, 0.8575266561936533], [1436735232.0, 20880000.0, 0.8509944817946915], [1436736960.0, 20900000.0, 0.8443279255020161], [1436738688.0, 20920000.0, 0.8375280400421412], [1436740416.0, 20940000.0, 0.8305958991958136], [1436742144.0, 20960000.0, 0.8235325976284278], [1436743872.0, 20980000.0, 0.8163392507171848], [1436745600.0, 21000000.0, 0.8090169943749477], [1436747328.0, 21020000.0, 0.8015669848708773], [1436749056.0, 21040000.0, 0.7939903986478355], [1436750784.0, 21060000.0, 0.7862884321366196], [1436752512.0, 21080000.0, 0.7784623015670235], [1436754240.0, 21100000.0, 0.7705132427757898], [1436755968.0, 21120000.0, 0.7624425110114478], [1436757696.0, 21140000.0, 0.7542513807361042], [1436759424.0, 21160000.0, 0.7459411454241819], [1436761152.0, 21180000.0, 0.7375131173581742], [1436762880.0, 21200000.0, 0.7289686274214111], [1436764608.0, 21220000.0, 0.7203090248879083], [1436766336.0, 21240000.0, 0.7115356772092848], [1436768064.0, 21260000.0, 0.7026499697988505], [1436769792.0, 21280000.0, 0.6936533058128043], [1436771520.0, 21300000.0, 0.6845471059286898], [1436773248.0, 21320000.0, 0.6753328081210249], [1436774976.0, 21340000.0, 0.6660118674342527], [1436776704.0, 21360000.0, 0.6565857557529567], [1436778432.0, 21380000.0, 0.6470559615694452], [1436780160.0, 21400000.0, 0.6374239897486899], [1436781888.0, 21420000.0, 0.6276913612907012], [1436783616.0, 21440000.0, 0.6178596130903343], [1436785344.0, 21460000.0, 0.6079302976946059], [1436787072.0, 21480000.0, 0.5979049830575187], [1436788800.0, 21500000.0, 0.5877852522924736], [1436790528.0, 21520000.0, 0.5775727034222672], [1436792256.0, 21540000.0, 0.5672689491267567], [1436793984.0, 21560000.0, 0.5568756164881874], [1436795712.0, 21580000.0, 0.5463943467342707], [1436797440.0, 21600000.0, 0.5358267949789959], [1436799168.0, 21620000.0, 0.5251746299612972], [1436800896.0, 21640000.0, 0.5144395337815055], [1436802624.0, 21660000.0, 0.5036232016357621], [1436804352.0, 21680000.0, 0.49272734154829195], [1436806080.0, 21700000.0, 0.4817536741017164], [1436807808.0, 21720000.0, 0.4707039321653328], [1436809536.0, 21740000.0, 0.45957986062148876], [1436811264.0, 21760000.0, 0.4483832160900323], [1436812992.0, 21780000.0, 0.43711576665093366], [1436814720.0, 21800000.0, 0.4257792915650725], [1436816448.0, 21820000.0, 0.4143755809932847], [1436818176.0, 21840000.0, 0.4029064357136623], [1436819904.0, 21860000.0, 0.39137366683720276], [1436821632.0, 21880000.0, 0.37977909552180056], [1436823360.0, 21900000.0, 0.36812455268467814], [1436825088.0, 21920000.0, 0.3564118787132499], [1436826816.0, 21940000.0, 0.344642923174517], [1436828544.0, 21960000.0, 0.3328195445229857], [1436830272.0, 21980000.0, 0.32094360980721093], [1436832000.0, 22000000.0, 0.30901699437494623], [1436833728.0, 22020000.0, 0.2970415815770362], [1436835456.0, 22040000.0, 0.2850192624699747], [1436837184.0, 22060000.0, 0.2729519355173263], [1436838912.0, 22080000.0, 0.26084150628989705], [1436840640.0, 22100000.0, 0.24868988716485566], [1436842368.0, 22120000.0, 0.2364989970237246], [1436844096.0, 22140000.0, 0.22427076094938186], [1436845824.0, 
22160000.0, 0.21200710992205435], [1436847552.0, 22180000.0, 0.19970998051440747], [1436849280.0, 22200000.0, 0.18738131458572413], [1436851008.0, 22220000.0, 0.1750230589752763], [1436852736.0, 22240000.0, 0.1626371651948829], [1436854464.0, 22260000.0, 0.1502255891207571], [1436856192.0, 22280000.0, 0.13779029068463713], [1436857920.0, 22300000.0, 0.1253332335643041], [1436859648.0, 22320000.0, 0.11285638487348054], [1436861376.0, 22340000.0, 0.10036171485121628], [1436863104.0, 22360000.0, 0.0878511965507418], [1436864832.0, 22380000.0, 0.0753268055279339], [1436866560.0, 22400000.0, 0.06279051952931179], [1436868288.0, 22420000.0, 0.050244318179770535], [1436870016.0, 22440000.0, 0.03769018266993452], [1436871744.0, 22460000.0, 0.025130095443338243], [1436873472.0, 22480000.0, 0.012566039883352377], [1436875200.0, 22500000.0, 5.51091059616309e-16], [1436876928.0, 22520000.0, -0.012566039883351275], [1436878656.0, 22540000.0, -0.02513009544333714], [1436880384.0, 22560000.0, -0.03769018266993342], [1436882112.0, 22580000.0, -0.05024431817976943], [1436883840.0, 22600000.0, -0.0627905195293107], [1436885568.0, 22620000.0, -0.0753268055279328], [1436887296.0, 22640000.0, -0.08785119655074071], [1436889024.0, 22660000.0, -0.10036171485121519], [1436890752.0, 22680000.0, -0.11285638487347945], [1436892480.0, 22700000.0, -0.12533323356430298], [1436894208.0, 22720000.0, -0.13779029068463602], [1436895936.0, 22740000.0, -0.150225589120756], [1436897664.0, 22760000.0, -0.16263716519488178], [1436899392.0, 22780000.0, -0.17502305897527523], [1436901120.0, 22800000.0, -0.18738131458572305], [1436902848.0, 22820000.0, -0.1997099805144064], [1436904576.0, 22840000.0, -0.21200710992205327], [1436906304.0, 22860000.0, -0.22427076094938078], [1436908032.0, 22880000.0, -0.2364989970237235], [1436909760.0, 22900000.0, -0.2486898871648546], [1436911488.0, 22920000.0, -0.260841506289896], [1436913216.0, 22940000.0, -0.2729519355173252], [1436914944.0, 22960000.0, -0.28501926246997367], [1436916672.0, 22980000.0, -0.29704158157703514], [1436918400.0, 23000000.0, -0.3090169943749452], [1436920128.0, 23020000.0, -0.3209436098072099], [1436921856.0, 23040000.0, -0.3328195445229846], [1436923584.0, 23060000.0, -0.344642923174516], [1436925312.0, 23080000.0, -0.3564118787132489], [1436927040.0, 23100000.0, -0.3681245526846771], [1436928768.0, 23120000.0, -0.3797790955217995], [1436930496.0, 23140000.0, -0.39137366683720176], [1436932224.0, 23160000.0, -0.4029064357136613], [1436933952.0, 23180000.0, -0.4143755809932837], [1436935680.0, 23200000.0, -0.4257792915650715], [1436937408.0, 23220000.0, -0.43711576665093266], [1436939136.0, 23240000.0, -0.4483832160900313], [1436940864.0, 23260000.0, -0.4595798606214878], [1436942592.0, 23280000.0, -0.4707039321653318], [1436944320.0, 23300000.0, -0.4817536741017154], [1436946048.0, 23320000.0, -0.492727341548291], [1436947776.0, 23340000.0, -0.5036232016357611], [1436949504.0, 23360000.0, -0.5144395337815045], [1436951232.0, 23380000.0, -0.5251746299612962], [1436952960.0, 23400000.0, -0.5358267949789949], [1436954688.0, 23420000.0, -0.5463943467342698], [1436956416.0, 23440000.0, -0.5568756164881865], [1436958144.0, 23460000.0, -0.5672689491267559], [1436959872.0, 23480000.0, -0.5775727034222663], [1436961600.0, 23500000.0, -0.5877852522924727], [1436963328.0, 23520000.0, -0.5979049830575178], [1436965056.0, 23540000.0, -0.607930297694605], [1436966784.0, 23560000.0, -0.6178596130903334], [1436968512.0, 23580000.0, -0.6276913612907004], [1436970240.0, 23600000.0, 
-0.637423989748689], [1436971968.0, 23620000.0, -0.6470559615694443], [1436973696.0, 23640000.0, -0.6565857557529559], [1436975424.0, 23660000.0, -0.6660118674342518], [1436977152.0, 23680000.0, -0.675332808121024], [1436978880.0, 23700000.0, -0.6845471059286891], [1436980608.0, 23720000.0, -0.6936533058128035], [1436982336.0, 23740000.0, -0.7026499697988497], [1436984064.0, 23760000.0, -0.711535677209284], [1436985792.0, 23780000.0, -0.7203090248879075], [1436987520.0, 23800000.0, -0.7289686274214104], [1436989248.0, 23820000.0, -0.7375131173581735], [1436990976.0, 23840000.0, -0.7459411454241812], [1436992704.0, 23860000.0, -0.7542513807361035], [1436994432.0, 23880000.0, -0.7624425110114471], [1436996160.0, 23900000.0, -0.770513242775789], [1436997888.0, 23920000.0, -0.7784623015670228], [1436999616.0, 23940000.0, -0.7862884321366189], [1437001344.0, 23960000.0, -0.7939903986478348], [1437003072.0, 23980000.0, -0.8015669848708766], [1437004800.0, 24000000.0, -0.8090169943749471], [1437006528.0, 24020000.0, -0.8163392507171842], [1437008256.0, 24040000.0, -0.8235325976284272], [1437009984.0, 24060000.0, -0.830595899195813], [1437011712.0, 24080000.0, -0.8375280400421407], [1437013440.0, 24100000.0, -0.8443279255020155], [1437015168.0, 24120000.0, -0.850994481794691], [1437016896.0, 24140000.0, -0.8575266561936528], [1437018624.0, 24160000.0, -0.8639234171928346], [1437020352.0, 24180000.0, -0.8701837546695254], [1437022080.0, 24200000.0, -0.8763066800438629], [1437023808.0, 24220000.0, -0.8822912264349532], [1437025536.0, 24240000.0, -0.888136448813544], [1437027264.0, 24260000.0, -0.8938414241512638], [1437028992.0, 24280000.0, -0.8994052515663706], [1437030720.0, 24300000.0, -0.9048270524660196], [1437032448.0, 24320000.0, -0.9101059706849954], [1437034176.0, 24340000.0, -0.9152411726209176], [1437035904.0, 24360000.0, -0.9202318473658702], [1437037632.0, 24380000.0, -0.9250772068344583], [1437039360.0, 24400000.0, -0.9297764858882513], [1437041088.0, 24420000.0, -0.9343289424566124], [1437042816.0, 24440000.0, -0.9387338576538741], [1437044544.0, 24460000.0, -0.9429905358928647], [1437046272.0, 24480000.0, -0.9470983049947438], [1437048000.0, 24500000.0, -0.951056516295154], [1437049728.0, 24520000.0, -0.9548645447466425], [1437051456.0, 24540000.0, -0.9585217890173763], [1437053184.0, 24560000.0, -0.9620276715860856], [1437054912.0, 24580000.0, -0.9653816388332739], [1437056640.0, 24600000.0, -0.9685831611286309], [1437058368.0, 24620000.0, -0.971631732914674], [1437060096.0, 24640000.0, -0.974526872786577], [1437061824.0, 24660000.0, -0.9772681235681935], [1437063552.0, 24680000.0, -0.9798550523842467], [1437065280.0, 24700000.0, -0.9822872507286887], [1437067008.0, 24720000.0, -0.9845643345292053], [1437068736.0, 24740000.0, -0.9866859442078681], [1437070464.0, 24760000.0, -0.988651744737914], [1437072192.0, 24780000.0, -0.9904614256966513], [1437073920.0, 24800000.0, -0.9921147013144779], [1437075648.0, 24820000.0, -0.9936113105200085], [1437077376.0, 24840000.0, -0.9949510169813001], [1437079104.0, 24860000.0, -0.9961336091431726], [1437080832.0, 24880000.0, -0.9971589002606138], [1437082560.0, 24900000.0, -0.9980267284282717], [1437084288.0, 24920000.0, -0.9987369566060175], [1437086016.0, 24940000.0, -0.9992894726405892], [1437087744.0, 24960000.0, -0.9996841892832999], [1437089472.0, 24980000.0, -0.9999210442038161], [1437091200.0, 25000000.0, -1.0], [1437092928.0, 25020000.0, -0.9999210442038161], [1437094656.0, 25040000.0, -0.9996841892832999], [1437096384.0, 25060000.0, 
-0.9992894726405893], [1437098112.0, 25080000.0, -0.9987369566060175], [1437099840.0, 25100000.0, -0.9980267284282718], [1437101568.0, 25120000.0, -0.9971589002606139], [1437103296.0, 25140000.0, -0.9961336091431727], [1437105024.0, 25160000.0, -0.9949510169813002], [1437106752.0, 25180000.0, -0.9936113105200086], [1437108480.0, 25200000.0, -0.992114701314478], [1437110208.0, 25220000.0, -0.9904614256966515], [1437111936.0, 25240000.0, -0.9886517447379142], [1437113664.0, 25260000.0, -0.9866859442078684], [1437115392.0, 25280000.0, -0.9845643345292056], [1437117120.0, 25300000.0, -0.9822872507286889], [1437118848.0, 25320000.0, -0.979855052384247], [1437120576.0, 25340000.0, -0.9772681235681938], [1437122304.0, 25360000.0, -0.9745268727865772], [1437124032.0, 25380000.0, -0.9716317329146742], [1437125760.0, 25400000.0, -0.9685831611286312], [1437127488.0, 25420000.0, -0.9653816388332741], [1437129216.0, 25440000.0, -0.9620276715860859], [1437130944.0, 25460000.0, -0.9585217890173766], [1437132672.0, 25480000.0, -0.9548645447466434], [1437134400.0, 25500000.0, -0.9510565162951543], [1437136128.0, 25520000.0, -0.9470983049947441], [1437137856.0, 25540000.0, -0.9429905358928646], [1437139584.0, 25560000.0, -0.9387338576538745], [1437141312.0, 25580000.0, -0.9343289424566128], [1437143040.0, 25600000.0, -0.9297764858882511], [1437144768.0, 25620000.0, -0.925077206834458], [1437146496.0, 25640000.0, -0.9202318473658706], [1437148224.0, 25660000.0, -0.9152411726209182], [1437149952.0, 25680000.0, -0.9101059706849952], [1437151680.0, 25700000.0, -0.9048270524660209], [1437153408.0, 25720000.0, -0.8994052515663712], [1437155136.0, 25740000.0, -0.8938414241512643], [1437156864.0, 25760000.0, -0.8881364488135454], [1437158592.0, 25780000.0, -0.8822912264349545], [1437160320.0, 25800000.0, -0.8763066800438636], [1437162048.0, 25820000.0, -0.870183754669526], [1437163776.0, 25840000.0, -0.863923417192836], [1437165504.0, 25860000.0, -0.8575266561936534], [1437167232.0, 25880000.0, -0.8509944817946916], [1437168960.0, 25900000.0, -0.8443279255020152], [1437170688.0, 25920000.0, -0.8375280400421423], [1437172416.0, 25940000.0, -0.8305958991958137], [1437174144.0, 25960000.0, -0.8235325976284269], [1437175872.0, 25980000.0, -0.8163392507171838], [1437177600.0, 26000000.0, -0.8090169943749478], [1437179328.0, 26020000.0, -0.8015669848708774], [1437181056.0, 26040000.0, -0.7939903986478345], [1437182784.0, 26060000.0, -0.7862884321366207], [1437184512.0, 26080000.0, -0.7784623015670236], [1437186240.0, 26100000.0, -0.7705132427757899], [1437187968.0, 26120000.0, -0.762442511011449], [1437189696.0, 26140000.0, -0.7542513807361055], [1437191424.0, 26160000.0, -0.745941145424182], [1437193152.0, 26180000.0, -0.7375131173581743], [1437194880.0, 26200000.0, -0.7289686274214124], [1437196608.0, 26220000.0, -0.7203090248879084], [1437198336.0, 26240000.0, -0.7115356772092849], [1437200064.0, 26260000.0, -0.7026499697988493], [1437201792.0, 26280000.0, -0.6936533058128056], [1437203520.0, 26300000.0, -0.68454710592869], [1437205248.0, 26320000.0, -0.6753328081210237], [1437206976.0, 26340000.0, -0.6660118674342514], [1437208704.0, 26360000.0, -0.6565857557529569], [1437210432.0, 26380000.0, -0.6470559615694453], [1437212160.0, 26400000.0, -0.6374239897486885], [1437213888.0, 26420000.0, -0.6276913612907027], [1437215616.0, 26440000.0, -0.6178596130903344], [1437217344.0, 26460000.0, -0.607930297694606], [1437219072.0, 26480000.0, -0.5979049830575174], [1437220800.0, 26500000.0, -0.5877852522924751], [1437222528.0, 
26520000.0, -0.5775727034222673], [1437224256.0, 26540000.0, -0.5672689491267568], [1437225984.0, 26560000.0, -0.5568756164881891], [1437227712.0, 26580000.0, -0.5463943467342708], [1437229440.0, 26600000.0, -0.535826794978996], [1437231168.0, 26620000.0, -0.5251746299612957], [1437232896.0, 26640000.0, -0.5144395337815071], [1437234624.0, 26660000.0, -0.5036232016357622], [1437236352.0, 26680000.0, -0.4927273415482905], [1437238080.0, 26700000.0, -0.48175367410171493], [1437239808.0, 26720000.0, -0.4707039321653329], [1437241536.0, 26740000.0, -0.45957986062148887], [1437243264.0, 26760000.0, -0.4483832160900308], [1437244992.0, 26780000.0, -0.4371157666509353], [1437246720.0, 26800000.0, -0.4257792915650726], [1437248448.0, 26820000.0, -0.4143755809932848], [1437250176.0, 26840000.0, -0.4029064357136608], [1437251904.0, 26860000.0, -0.39137366683720454], [1437253632.0, 26880000.0, -0.37977909552180067], [1437255360.0, 26900000.0, -0.36812455268467825], [1437257088.0, 26920000.0, -0.3564118787132517], [1437258816.0, 26940000.0, -0.34464292317451883], [1437260544.0, 26960000.0, -0.3328195445229858], [1437262272.0, 26980000.0, -0.3209436098072094], [1437264000.0, 27000000.0, -0.30901699437494806], [1437265728.0, 27020000.0, -0.2970415815770363], [1437267456.0, 27040000.0, -0.28501926246997483], [1437269184.0, 27060000.0, -0.2729519355173247], [1437270912.0, 27080000.0, -0.26084150628989716], [1437272640.0, 27100000.0, -0.24868988716485577], [1437274368.0, 27120000.0, -0.23649899702372298], [1437276096.0, 27140000.0, -0.2242707609493837], [1437277824.0, 27160000.0, -0.21200710992205446], [1437279552.0, 27180000.0, -0.1997099805144076], [1437281280.0, 27200000.0, -0.1873813145857225], [1437283008.0, 27220000.0, -0.17502305897527817], [1437284736.0, 27240000.0, -0.162637165194883], [1437286464.0, 27260000.0, -0.15022558912075723], [1437288192.0, 27280000.0, -0.137790290684639], [1437289920.0, 27300000.0, -0.12533323356430598], [1437291648.0, 27320000.0, -0.11285638487348065], [1437293376.0, 27340000.0, -0.10036171485121463], [1437295104.0, 27360000.0, -0.08785119655074371], [1437296832.0, 27380000.0, -0.07532680552793403], [1437298560.0, 27400000.0, -0.06279051952931192], [1437300288.0, 27420000.0, -0.050244318179768876], [1437302016.0, 27440000.0, -0.037690182669934645], [1437303744.0, 27460000.0, -0.025130095443338364], [1437305472.0, 27480000.0, -0.012566039883350723], [1437307200.0, 27500000.0, -2.4499125789312946e-15], [1437308928.0, 27520000.0, 0.012566039883349376], [1437310656.0, 27540000.0, 0.025130095443337018], [1437312384.0, 27560000.0, 0.0376901826699333], [1437314112.0, 27580000.0, 0.05024431817976754], [1437315840.0, 27600000.0, 0.06279051952931057], [1437317568.0, 27620000.0, 0.07532680552793268], [1437319296.0, 27640000.0, 0.08785119655074236], [1437321024.0, 27660000.0, 0.1003617148512133], [1437322752.0, 27680000.0, 0.11285638487347932], [1437324480.0, 27700000.0, 0.12533323356430462], [1437326208.0, 27720000.0, 0.13779029068463766], [1437327936.0, 27740000.0, 0.1502255891207559], [1437329664.0, 27760000.0, 0.16263716519488167], [1437331392.0, 27780000.0, 0.17502305897527684], [1437333120.0, 27800000.0, 0.18738131458572116], [1437334848.0, 27820000.0, 0.19970998051440628], [1437336576.0, 27840000.0, 0.21200710992205313], [1437338304.0, 27860000.0, 0.2242707609493824], [1437340032.0, 27880000.0, 0.23649899702372168], [1437341760.0, 27900000.0, 0.24868988716485446], [1437343488.0, 27920000.0, 0.2608415062898959], [1437345216.0, 27940000.0, 0.2729519355173234], [1437346944.0, 
27960000.0, 0.28501926246997356], [1437348672.0, 27980000.0, 0.29704158157703503], [1437350400.0, 28000000.0, 0.3090169943749468], [1437352128.0, 28020000.0, 0.3209436098072081], [1437353856.0, 28040000.0, 0.3328195445229845], [1437355584.0, 28060000.0, 0.34464292317451756], [1437357312.0, 28080000.0, 0.3564118787132505], [1437359040.0, 28100000.0, 0.368124552684677], [1437360768.0, 28120000.0, 0.3797790955217994], [1437362496.0, 28140000.0, 0.39137366683720326], [1437364224.0, 28160000.0, 0.4029064357136596], [1437365952.0, 28180000.0, 0.4143755809932836], [1437367680.0, 28200000.0, 0.4257792915650714], [1437369408.0, 28220000.0, 0.43711576665093416], [1437371136.0, 28240000.0, 0.4483832160900296], [1437372864.0, 28260000.0, 0.4595798606214877], [1437374592.0, 28280000.0, 0.4707039321653317], [1437376320.0, 28300000.0, 0.4817536741017137], [1437378048.0, 28320000.0, 0.49272734154828934], [1437379776.0, 28340000.0, 0.503623201635761], [1437381504.0, 28360000.0, 0.514439533781506], [1437383232.0, 28380000.0, 0.5251746299612946], [1437384960.0, 28400000.0, 0.5358267949789949], [1437386688.0, 28420000.0, 0.5463943467342697], [1437388416.0, 28440000.0, 0.556875616488188], [1437390144.0, 28460000.0, 0.5672689491267557], [1437391872.0, 28480000.0, 0.5775727034222662], [1437393600.0, 28500000.0, 0.587785252292474], [1437395328.0, 28520000.0, 0.5979049830575163], [1437397056.0, 28540000.0, 0.607930297694605], [1437398784.0, 28560000.0, 0.6178596130903333], [1437400512.0, 28580000.0, 0.6276913612907017], [1437402240.0, 28600000.0, 0.6374239897486875], [1437403968.0, 28620000.0, 0.6470559615694442], [1437405696.0, 28640000.0, 0.6565857557529557], [1437407424.0, 28660000.0, 0.6660118674342504], [1437409152.0, 28680000.0, 0.6753328081210227], [1437410880.0, 28700000.0, 0.684547105928689], [1437412608.0, 28720000.0, 0.6936533058128047], [1437414336.0, 28740000.0, 0.7026499697988483], [1437416064.0, 28760000.0, 0.7115356772092839], [1437417792.0, 28780000.0, 0.7203090248879075], [1437419520.0, 28800000.0, 0.7289686274214116], [1437421248.0, 28820000.0, 0.7375131173581734], [1437422976.0, 28840000.0, 0.7459411454241811], [1437424704.0, 28860000.0, 0.7542513807361046], [1437426432.0, 28880000.0, 0.7624425110114481], [1437428160.0, 28900000.0, 0.770513242775789], [1437429888.0, 28920000.0, 0.7784623015670227], [1437431616.0, 28940000.0, 0.7862884321366199], [1437433344.0, 28960000.0, 0.7939903986478337], [1437435072.0, 28980000.0, 0.8015669848708766], [1437436800.0, 29000000.0, 0.809016994374947], [1437438528.0, 29020000.0, 0.816339250717183], [1437440256.0, 29040000.0, 0.8235325976284261], [1437441984.0, 29060000.0, 0.8305958991958129], [1437443712.0, 29080000.0, 0.8375280400421415], [1437445440.0, 29100000.0, 0.8443279255020145], [1437447168.0, 29120000.0, 0.8509944817946908], [1437448896.0, 29140000.0, 0.8575266561936528], [1437450624.0, 29160000.0, 0.8639234171928354], [1437452352.0, 29180000.0, 0.8701837546695254], [1437454080.0, 29200000.0, 0.8763066800438629], [1437455808.0, 29220000.0, 0.882291226434954], [1437457536.0, 29240000.0, 0.8881364488135448], [1437459264.0, 29260000.0, 0.8938414241512637], [1437460992.0, 29280000.0, 0.8994052515663706], [1437462720.0, 29300000.0, 0.9048270524660202], [1437464448.0, 29320000.0, 0.9101059706849947], [1437466176.0, 29340000.0, 0.9152411726209176], [1437467904.0, 29360000.0, 0.9202318473658702], [1437469632.0, 29380000.0, 0.9250772068344575], [1437471360.0, 29400000.0, 0.9297764858882506], [1437473088.0, 29420000.0, 0.9343289424566122], [1437474816.0, 
29440000.0, 0.9387338576538741], [1437476544.0, 29460000.0, 0.9429905358928642], [1437478272.0, 29480000.0, 0.9470983049947437], [1437480000.0, 29500000.0, 0.9510565162951539], [1437481728.0, 29520000.0, 0.9548645447466431], [1437483456.0, 29540000.0, 0.9585217890173757], [1437485184.0, 29560000.0, 0.9620276715860855], [1437486912.0, 29580000.0, 0.9653816388332743], [1437488640.0, 29600000.0, 0.9685831611286313], [1437490368.0, 29620000.0, 0.9716317329146739], [1437492096.0, 29640000.0, 0.9745268727865769], [1437493824.0, 29660000.0, 0.9772681235681938], [1437495552.0, 29680000.0, 0.9798550523842464], [1437497280.0, 29700000.0, 0.9822872507286887], [1437499008.0, 29720000.0, 0.9845643345292053], [1437500736.0, 29740000.0, 0.9866859442078678], [1437502464.0, 29760000.0, 0.9886517447379137], [1437504192.0, 29780000.0, 0.9904614256966513], [1437505920.0, 29800000.0, 0.9921147013144779], [1437507648.0, 29820000.0, 0.9936113105200083], [1437509376.0, 29840000.0, 0.9949510169813], [1437511104.0, 29860000.0, 0.9961336091431726], [1437512832.0, 29880000.0, 0.9971589002606139], [1437514560.0, 29900000.0, 0.9980267284282716], [1437516288.0, 29920000.0, 0.9987369566060175], [1437518016.0, 29940000.0, 0.9992894726405893], [1437519744.0, 29960000.0, 0.9996841892833], [1437521472.0, 29980000.0, 0.9999210442038161], [1437523200.0, 30000000.0, 1.0], [1437524928.0, 30020000.0, 0.9999210442038161], [1437526656.0, 30040000.0, 0.9996841892833], [1437528384.0, 30060000.0, 0.9992894726405893], [1437530112.0, 30080000.0, 0.9987369566060175], [1437531840.0, 30100000.0, 0.9980267284282717], [1437533568.0, 30120000.0, 0.997158900260614], [1437535296.0, 30140000.0, 0.9961336091431727], [1437537024.0, 30160000.0, 0.9949510169813002], [1437538752.0, 30180000.0, 0.9936113105200085], [1437540480.0, 30200000.0, 0.992114701314478], [1437542208.0, 30220000.0, 0.9904614256966515], [1437543936.0, 30240000.0, 0.988651744737914], [1437545664.0, 30260000.0, 0.9866859442078681], [1437547392.0, 30280000.0, 0.9845643345292056], [1437549120.0, 30300000.0, 0.982287250728689], [1437550848.0, 30320000.0, 0.9798550523842466], [1437552576.0, 30340000.0, 0.9772681235681941], [1437554304.0, 30360000.0, 0.9745268727865772], [1437556032.0, 30380000.0, 0.9716317329146743], [1437557760.0, 30400000.0, 0.9685831611286316], [1437559488.0, 30420000.0, 0.9653816388332747], [1437561216.0, 30440000.0, 0.9620276715860859], [1437562944.0, 30460000.0, 0.9585217890173762], [1437564672.0, 30480000.0, 0.9548645447466435], [1437566400.0, 30500000.0, 0.9510565162951543], [1437568128.0, 30520000.0, 0.9470983049947442], [1437569856.0, 30540000.0, 0.9429905358928646], [1437571584.0, 30560000.0, 0.9387338576538745], [1437573312.0, 30580000.0, 0.9343289424566128], [1437575040.0, 30600000.0, 0.9297764858882511], [1437576768.0, 30620000.0, 0.925077206834458], [1437578496.0, 30640000.0, 0.9202318473658707], [1437580224.0, 30660000.0, 0.9152411726209182], [1437581952.0, 30680000.0, 0.9101059706849952], [1437583680.0, 30700000.0, 0.9048270524660209], [1437585408.0, 30720000.0, 0.8994052515663712], [1437587136.0, 30740000.0, 0.8938414241512643], [1437588864.0, 30760000.0, 0.8881364488135455], [1437590592.0, 30780000.0, 0.8822912264349546], [1437592320.0, 30800000.0, 0.8763066800438636], [1437594048.0, 30820000.0, 0.870183754669526], [1437595776.0, 30840000.0, 0.8639234171928362], [1437597504.0, 30860000.0, 0.8575266561936535], [1437599232.0, 30880000.0, 0.8509944817946916], [1437600960.0, 30900000.0, 0.8443279255020153], [1437602688.0, 30920000.0, 0.8375280400421423], 
[1437604416.0, 30940000.0, 0.8305958991958138], [1437606144.0, 30960000.0, 0.823532597628427], [1437607872.0, 30980000.0, 0.8163392507171839], [1437609600.0, 31000000.0, 0.8090169943749479], [1437611328.0, 31020000.0, 0.8015669848708775], [1437613056.0, 31040000.0, 0.7939903986478346], [1437614784.0, 31060000.0, 0.7862884321366208], [1437616512.0, 31080000.0, 0.7784623015670237], [1437618240.0, 31100000.0, 0.7705132427757899], [1437619968.0, 31120000.0, 0.7624425110114491], [1437621696.0, 31140000.0, 0.7542513807361055], [1437623424.0, 31160000.0, 0.7459411454241821], [1437625152.0, 31180000.0, 0.7375131173581744], [1437626880.0, 31200000.0, 0.7289686274214126], [1437628608.0, 31220000.0, 0.7203090248879085], [1437630336.0, 31240000.0, 0.711535677209285], [1437632064.0, 31260000.0, 0.7026499697988494], [1437633792.0, 31280000.0, 0.6936533058128057], [1437635520.0, 31300000.0, 0.6845471059286901], [1437637248.0, 31320000.0, 0.6753328081210237], [1437638976.0, 31340000.0, 0.6660118674342516], [1437640704.0, 31360000.0, 0.6565857557529569], [1437642432.0, 31380000.0, 0.6470559615694453], [1437644160.0, 31400000.0, 0.6374239897486886], [1437645888.0, 31420000.0, 0.6276913612907028], [1437647616.0, 31440000.0, 0.6178596130903345], [1437649344.0, 31460000.0, 0.6079302976946062], [1437651072.0, 31480000.0, 0.5979049830575175], [1437652800.0, 31500000.0, 0.5877852522924751], [1437654528.0, 31520000.0, 0.5775727034222674], [1437656256.0, 31540000.0, 0.567268949126757], [1437657984.0, 31560000.0, 0.5568756164881892], [1437659712.0, 31580000.0, 0.5463943467342709], [1437661440.0, 31600000.0, 0.5358267949789961], [1437663168.0, 31620000.0, 0.5251746299612958], [1437664896.0, 31640000.0, 0.5144395337815072], [1437666624.0, 31660000.0, 0.5036232016357624], [1437668352.0, 31680000.0, 0.4927273415482906], [1437670080.0, 31700000.0, 0.48175367410171505], [1437671808.0, 31720000.0, 0.470703932165333], [1437673536.0, 31740000.0, 0.459579860621489], [1437675264.0, 31760000.0, 0.4483832160900309], [1437676992.0, 31780000.0, 0.43711576665093543], [1437678720.0, 31800000.0, 0.4257792915650727], [1437680448.0, 31820000.0, 0.4143755809932849], [1437682176.0, 31840000.0, 0.4029064357136609], [1437683904.0, 31860000.0, 0.39137366683720465], [1437685632.0, 31880000.0, 0.3797790955218008], [1437687360.0, 31900000.0, 0.36812455268467836], [1437689088.0, 31920000.0, 0.35641187871325186], [1437690816.0, 31940000.0, 0.34464292317451894], [1437692544.0, 31960000.0, 0.3328195445229859], [1437694272.0, 31980000.0, 0.3209436098072095], [1437696000.0, 32000000.0, 0.3090169943749482], [1437697728.0, 32020000.0, 0.2970415815770364], [1437699456.0, 32040000.0, 0.285019262469975], [1437701184.0, 32060000.0, 0.27295193551732483], [1437702912.0, 32080000.0, 0.2608415062898973], [1437704640.0, 32100000.0, 0.2486898871648559], [1437706368.0, 32120000.0, 0.2364989970237231], [1437708096.0, 32140000.0, 0.22427076094938383], [1437709824.0, 32160000.0, 0.21200710992205457], [1437711552.0, 32180000.0, 0.19970998051440772], [1437713280.0, 32200000.0, 0.1873813145857226], [1437715008.0, 32220000.0, 0.17502305897527828], [1437716736.0, 32240000.0, 0.1626371651948831], [1437718464.0, 32260000.0, 0.15022558912075734], [1437720192.0, 32280000.0, 0.13779029068463913], [1437721920.0, 32300000.0, 0.1253332335643061], [1437723648.0, 32320000.0, 0.11285638487348078], [1437725376.0, 32340000.0, 0.10036171485121476], [1437727104.0, 32360000.0, 0.08785119655074382], [1437728832.0, 32380000.0, 0.07532680552793415], [1437730560.0, 32400000.0, 
0.06279051952931204], [1437732288.0, 32420000.0, 0.050244318179769], [1437734016.0, 32440000.0, 0.03769018266993476], [1437735744.0, 32460000.0, 0.025130095443338486], [1437737472.0, 32480000.0, 0.012566039883350845], [1437739200.0, 32500000.0, -9.803364199544708e-16], [1437740928.0, 32520000.0, -0.012566039883349254], [1437742656.0, 32540000.0, -0.025130095443336897], [1437744384.0, 32560000.0, -0.037690182669933174], [1437746112.0, 32580000.0, -0.05024431817976741], [1437747840.0, 32600000.0, -0.06279051952931045], [1437749568.0, 32620000.0, -0.07532680552793257], [1437751296.0, 32640000.0, -0.08785119655074224], [1437753024.0, 32660000.0, -0.10036171485121317], [1437754752.0, 32680000.0, -0.1128563848734792], [1437756480.0, 32700000.0, -0.1253332335643045], [1437758208.0, 32720000.0, -0.13779029068463755], [1437759936.0, 32740000.0, -0.15022558912075576], [1437761664.0, 32760000.0, -0.16263716519488156], [1437763392.0, 32780000.0, -0.17502305897527673], [1437765120.0, 32800000.0, -0.18738131458572105], [1437766848.0, 32820000.0, -0.19970998051440617], [1437768576.0, 32840000.0, -0.21200710992205302], [1437770304.0, 32860000.0, -0.22427076094938228], [1437772032.0, 32880000.0, -0.23649899702372154], [1437773760.0, 32900000.0, -0.24868988716485435], [1437775488.0, 32920000.0, -0.2608415062898957], [1437777216.0, 32940000.0, -0.2729519355173233], [1437778944.0, 32960000.0, -0.28501926246997344], [1437780672.0, 32980000.0, -0.29704158157703486], [1437782400.0, 33000000.0, -0.3090169943749467], [1437784128.0, 33020000.0, -0.320943609807208], [1437785856.0, 33040000.0, -0.3328195445229844], [1437787584.0, 33060000.0, -0.34464292317451745], [1437789312.0, 33080000.0, -0.35641187871325036], [1437791040.0, 33100000.0, -0.36812455268467686], [1437792768.0, 33120000.0, -0.3797790955217993], [1437794496.0, 33140000.0, -0.39137366683720315], [1437796224.0, 33160000.0, -0.4029064357136594], [1437797952.0, 33180000.0, -0.4143755809932835], [1437799680.0, 33200000.0, -0.42577929156507127], [1437801408.0, 33220000.0, -0.43711576665093405], [1437803136.0, 33240000.0, -0.4483832160900295], [1437804864.0, 33260000.0, -0.4595798606214876], [1437806592.0, 33280000.0, -0.4707039321653316], [1437808320.0, 33300000.0, -0.4817536741017136], [1437810048.0, 33320000.0, -0.49272734154828923], [1437811776.0, 33340000.0, -0.5036232016357609], [1437813504.0, 33360000.0, -0.5144395337815059], [1437815232.0, 33380000.0, -0.5251746299612945], [1437816960.0, 33400000.0, -0.5358267949789948], [1437818688.0, 33420000.0, -0.5463943467342696], [1437820416.0, 33440000.0, -0.5568756164881878], [1437822144.0, 33460000.0, -0.5672689491267556], [1437823872.0, 33480000.0, -0.5775727034222661], [1437825600.0, 33500000.0, -0.5877852522924739], [1437827328.0, 33520000.0, -0.5979049830575162], [1437829056.0, 33540000.0, -0.6079302976946049], [1437830784.0, 33560000.0, -0.6178596130903332], [1437832512.0, 33580000.0, -0.6276913612907016], [1437834240.0, 33600000.0, -0.6374239897486874], [1437835968.0, 33620000.0, -0.6470559615694442], [1437837696.0, 33640000.0, -0.6565857557529557], [1437839424.0, 33660000.0, -0.6660118674342503], [1437841152.0, 33680000.0, -0.6753328081210226], [1437842880.0, 33700000.0, -0.6845471059286888], [1437844608.0, 33720000.0, -0.6936533058128046], [1437846336.0, 33740000.0, -0.7026499697988483], [1437848064.0, 33760000.0, -0.7115356772092839], [1437849792.0, 33780000.0, -0.7203090248879074], [1437851520.0, 33800000.0, -0.7289686274214114], [1437853248.0, 33820000.0, -0.7375131173581733], [1437854976.0, 
33840000.0, -0.745941145424181], [1437856704.0, 33860000.0, -0.7542513807361045], [1437858432.0, 33880000.0, -0.7624425110114481], [1437860160.0, 33900000.0, -0.7705132427757889], [1437861888.0, 33920000.0, -0.7784623015670227], [1437863616.0, 33940000.0, -0.7862884321366198], [1437865344.0, 33960000.0, -0.7939903986478336], [1437867072.0, 33980000.0, -0.8015669848708765], [1437868800.0, 34000000.0, -0.8090169943749469], [1437870528.0, 34020000.0, -0.816339250717183], [1437872256.0, 34040000.0, -0.8235325976284261], [1437873984.0, 34060000.0, -0.8305958991958129], [1437875712.0, 34080000.0, -0.8375280400421415], [1437877440.0, 34100000.0, -0.8443279255020144], [1437879168.0, 34120000.0, -0.8509944817946908], [1437880896.0, 34140000.0, -0.8575266561936526], [1437882624.0, 34160000.0, -0.8639234171928353], [1437884352.0, 34180000.0, -0.8701837546695252], [1437886080.0, 34200000.0, -0.8763066800438628], [1437887808.0, 34220000.0, -0.8822912264349538], [1437889536.0, 34240000.0, -0.8881364488135447], [1437891264.0, 34260000.0, -0.8938414241512637], [1437892992.0, 34280000.0, -0.8994052515663705], [1437894720.0, 34300000.0, -0.9048270524660202], [1437896448.0, 34320000.0, -0.9101059706849945], [1437898176.0, 34340000.0, -0.9152411726209176], [1437899904.0, 34360000.0, -0.92023184736587], [1437901632.0, 34380000.0, -0.9250772068344575], [1437903360.0, 34400000.0, -0.9297764858882506], [1437905088.0, 34420000.0, -0.9343289424566122], [1437906816.0, 34440000.0, -0.938733857653874], [1437908544.0, 34460000.0, -0.9429905358928641], [1437910272.0, 34480000.0, -0.9470983049947437], [1437912000.0, 34500000.0, -0.9510565162951539], [1437913728.0, 34520000.0, -0.954864544746643], [1437915456.0, 34540000.0, -0.9585217890173757], [1437917184.0, 34560000.0, -0.9620276715860855], [1437918912.0, 34580000.0, -0.9653816388332742], [1437920640.0, 34600000.0, -0.9685831611286313], [1437922368.0, 34620000.0, -0.9716317329146739], [1437924096.0, 34640000.0, -0.9745268727865769], [1437925824.0, 34660000.0, -0.9772681235681938], [1437927552.0, 34680000.0, -0.9798550523842463], [1437929280.0, 34700000.0, -0.9822872507286887], [1437931008.0, 34720000.0, -0.9845643345292053], [1437932736.0, 34740000.0, -0.9866859442078678], [1437934464.0, 34760000.0, -0.9886517447379137], [1437936192.0, 34780000.0, -0.9904614256966513], [1437937920.0, 34800000.0, -0.9921147013144778], [1437939648.0, 34820000.0, -0.9936113105200083], [1437941376.0, 34840000.0, -0.9949510169813], [1437943104.0, 34860000.0, -0.9961336091431725], [1437944832.0, 34880000.0, -0.9971589002606139], [1437946560.0, 34900000.0, -0.9980267284282716], [1437948288.0, 34920000.0, -0.9987369566060175], [1437950016.0, 34940000.0, -0.9992894726405893], [1437951744.0, 34960000.0, -0.9996841892833], [1437953472.0, 34980000.0, -0.9999210442038161], [1437955200.0, 35000000.0, -1.0], [1437956928.0, 35020000.0, -0.9999210442038161], [1437958656.0, 35040000.0, -0.9996841892833], [1437960384.0, 35060000.0, -0.9992894726405893], [1437962112.0, 35080000.0, -0.9987369566060175], [1437963840.0, 35100000.0, -0.9980267284282717], [1437965568.0, 35120000.0, -0.997158900260614], [1437967296.0, 35140000.0, -0.9961336091431727], [1437969024.0, 35160000.0, -0.9949510169813002], [1437970752.0, 35180000.0, -0.9936113105200085], [1437972480.0, 35200000.0, -0.992114701314478], [1437974208.0, 35220000.0, -0.9904614256966515], [1437975936.0, 35240000.0, -0.988651744737914], [1437977664.0, 35260000.0, -0.9866859442078681], [1437979392.0, 35280000.0, -0.9845643345292056], [1437981120.0, 
35300000.0, -0.982287250728689], [1437982848.0, 35320000.0, -0.9798550523842467], [1437984576.0, 35340000.0, -0.9772681235681941], [1437986304.0, 35360000.0, -0.9745268727865773], [1437988032.0, 35380000.0, -0.9716317329146743], [1437989760.0, 35400000.0, -0.9685831611286317], [1437991488.0, 35420000.0, -0.9653816388332747], [1437993216.0, 35440000.0, -0.9620276715860859], [1437994944.0, 35460000.0, -0.9585217890173762], [1437996672.0, 35480000.0, -0.9548645447466435], [1437998400.0, 35500000.0, -0.9510565162951544], [1438000128.0, 35520000.0, -0.9470983049947442], [1438001856.0, 35540000.0, -0.9429905358928646], [1438003584.0, 35560000.0, -0.9387338576538746], [1438005312.0, 35580000.0, -0.9343289424566128], [1438007040.0, 35600000.0, -0.9297764858882512], [1438008768.0, 35620000.0, -0.9250772068344582], [1438010496.0, 35640000.0, -0.9202318473658707], [1438012224.0, 35660000.0, -0.9152411726209183], [1438013952.0, 35680000.0, -0.9101059706849953], [1438015680.0, 35700000.0, -0.9048270524660209], [1438017408.0, 35720000.0, -0.8994052515663713], [1438019136.0, 35740000.0, -0.8938414241512644], [1438020864.0, 35760000.0, -0.8881364488135455], [1438022592.0, 35780000.0, -0.8822912264349546], [1438024320.0, 35800000.0, -0.8763066800438637], [1438026048.0, 35820000.0, -0.8701837546695261], [1438027776.0, 35840000.0, -0.8639234171928362], [1438029504.0, 35860000.0, -0.8575266561936535], [1438031232.0, 35880000.0, -0.8509944817946917], [1438032960.0, 35900000.0, -0.8443279255020154], [1438034688.0, 35920000.0, -0.8375280400421424], [1438036416.0, 35940000.0, -0.8305958991958138], [1438038144.0, 35960000.0, -0.823532597628427], [1438039872.0, 35980000.0, -0.816339250717184], [1438041600.0, 36000000.0, -0.8090169943749479], [1438043328.0, 36020000.0, -0.8015669848708775], [1438045056.0, 36040000.0, -0.7939903986478346], [1438046784.0, 36060000.0, -0.7862884321366209], [1438048512.0, 36080000.0, -0.7784623015670237], [1438050240.0, 36100000.0, -0.77051324277579], [1438051968.0, 36120000.0, -0.7624425110114492], [1438053696.0, 36140000.0, -0.7542513807361056], [1438055424.0, 36160000.0, -0.7459411454241821], [1438057152.0, 36180000.0, -0.7375131173581745], [1438058880.0, 36200000.0, -0.7289686274214127], [1438060608.0, 36220000.0, -0.7203090248879085], [1438062336.0, 36240000.0, -0.7115356772092851], [1438064064.0, 36260000.0, -0.7026499697988495], [1438065792.0, 36280000.0, -0.6936533058128058], [1438067520.0, 36300000.0, -0.6845471059286901], [1438069248.0, 36320000.0, -0.6753328081210238], [1438070976.0, 36340000.0, -0.6660118674342516], [1438072704.0, 36360000.0, -0.656585755752957], [1438074432.0, 36380000.0, -0.6470559615694454], [1438076160.0, 36400000.0, -0.6374239897486887], [1438077888.0, 36420000.0, -0.6276913612907029], [1438079616.0, 36440000.0, -0.6178596130903345], [1438081344.0, 36460000.0, -0.6079302976946063], [1438083072.0, 36480000.0, -0.5979049830575175], [1438084800.0, 36500000.0, -0.5877852522924752], [1438086528.0, 36520000.0, -0.5775727034222675], [1438088256.0, 36540000.0, -0.5672689491267571], [1438089984.0, 36560000.0, -0.5568756164881892], [1438091712.0, 36580000.0, -0.546394346734271], [1438093440.0, 36600000.0, -0.5358267949789962], [1438095168.0, 36620000.0, -0.5251746299612959], [1438096896.0, 36640000.0, -0.5144395337815073], [1438098624.0, 36660000.0, -0.5036232016357625], [1438100352.0, 36680000.0, -0.49272734154829073], [1438102080.0, 36700000.0, -0.4817536741017151], [1438103808.0, 36720000.0, -0.4707039321653331], [1438105536.0, 36740000.0, 
-0.4595798606214891], [1438107264.0, 36760000.0, -0.448383216090031], [1438108992.0, 36780000.0, -0.43711576665093554], [1438110720.0, 36800000.0, -0.4257792915650728], [1438112448.0, 36820000.0, -0.41437558099328503], [1438114176.0, 36840000.0, -0.402906435713661], [1438115904.0, 36860000.0, -0.39137366683720476], [1438117632.0, 36880000.0, -0.3797790955218009], [1438119360.0, 36900000.0, -0.3681245526846785], [1438121088.0, 36920000.0, -0.356411878713252], [1438122816.0, 36940000.0, -0.34464292317451906], [1438124544.0, 36960000.0, -0.332819544522986], [1438126272.0, 36980000.0, -0.3209436098072096], [1438128000.0, 37000000.0, -0.3090169943749483], [1438129728.0, 37020000.0, -0.29704158157703653], [1438131456.0, 37040000.0, -0.2850192624699751], [1438133184.0, 37060000.0, -0.27295193551732494], [1438134912.0, 37080000.0, -0.2608415062898974], [1438136640.0, 37100000.0, -0.24868988716485602], [1438138368.0, 37120000.0, -0.2364989970237232], [1438140096.0, 37140000.0, -0.22427076094938395], [1438141824.0, 37160000.0, -0.2120071099220547], [1438143552.0, 37180000.0, -0.19970998051440783], [1438145280.0, 37200000.0, -0.18738131458572274], [1438147008.0, 37220000.0, -0.17502305897527842], [1438148736.0, 37240000.0, -0.16263716519488325], [1438150464.0, 37260000.0, -0.15022558912075745], [1438152192.0, 37280000.0, -0.13779029068463924], [1438153920.0, 37300000.0, -0.1253332335643062], [1438155648.0, 37320000.0, -0.1128563848734809], [1438157376.0, 37340000.0, -0.10036171485121488], [1438159104.0, 37360000.0, -0.08785119655074394], [1438160832.0, 37380000.0, -0.07532680552793428], [1438162560.0, 37400000.0, -0.06279051952931217], [1438164288.0, 37420000.0, -0.050244318179769126], [1438166016.0, 37440000.0, -0.03769018266993489], [1438167744.0, 37460000.0, -0.02513009544333861], [1438169472.0, 37480000.0, -0.012566039883350968], [1438171200.0, 37500000.0, -2.6948419387607653e-15], [1438172928.0, 37520000.0, 0.012566039883349131], [1438174656.0, 37540000.0, 0.02513009544333677], [1438176384.0, 37560000.0, 0.03769018266993305], [1438178112.0, 37580000.0, 0.05024431817976729], [1438179840.0, 37600000.0, 0.06279051952931033], [1438181568.0, 37620000.0, 0.07532680552793244], [1438183296.0, 37640000.0, 0.08785119655074211], [1438185024.0, 37660000.0, 0.10036171485121305], [1438186752.0, 37680000.0, 0.11285638487347907], [1438188480.0, 37700000.0, 0.1253332335643044], [1438190208.0, 37720000.0, 0.13779029068463744], [1438191936.0, 37740000.0, 0.15022558912075565], [1438193664.0, 37760000.0, 0.16263716519488142], [1438195392.0, 37780000.0, 0.17502305897527662], [1438197120.0, 37800000.0, 0.18738131458572094], [1438198848.0, 37820000.0, 0.19970998051440603], [1438200576.0, 37840000.0, 0.2120071099220529], [1438202304.0, 37860000.0, 0.22427076094938214], [1438204032.0, 37880000.0, 0.23649899702372143], [1438205760.0, 37900000.0, 0.24868988716485424], [1438207488.0, 37920000.0, 0.2608415062898956], [1438209216.0, 37940000.0, 0.27295193551732316], [1438210944.0, 37960000.0, 0.28501926246997333], [1438212672.0, 37980000.0, 0.29704158157703475], [1438214400.0, 38000000.0, 0.30901699437494656], [1438216128.0, 38020000.0, 0.3209436098072079], [1438217856.0, 38040000.0, 0.3328195445229843], [1438219584.0, 38060000.0, 0.34464292317451733], [1438221312.0, 38080000.0, 0.35641187871325025], [1438223040.0, 38100000.0, 0.36812455268467675], [1438224768.0, 38120000.0, 0.37977909552179917], [1438226496.0, 38140000.0, 0.39137366683720304], [1438228224.0, 38160000.0, 0.4029064357136593], [1438229952.0, 38180000.0, 
0.41437558099328337], [1438231680.0, 38200000.0, 0.42577929156507116], [1438233408.0, 38220000.0, 0.43711576665093393], [1438235136.0, 38240000.0, 0.4483832160900294], [1438236864.0, 38260000.0, 0.4595798606214875], [1438238592.0, 38280000.0, 0.47070393216533146], [1438240320.0, 38300000.0, 0.4817536741017135], [1438242048.0, 38320000.0, 0.4927273415482891], [1438243776.0, 38340000.0, 0.5036232016357608], [1438245504.0, 38360000.0, 0.5144395337815058], [1438247232.0, 38380000.0, 0.5251746299612944], [1438248960.0, 38400000.0, 0.5358267949789947], [1438250688.0, 38420000.0, 0.5463943467342695], [1438252416.0, 38440000.0, 0.5568756164881877], [1438254144.0, 38460000.0, 0.5672689491267555], [1438255872.0, 38480000.0, 0.577572703422266], [1438257600.0, 38500000.0, 0.5877852522924738], [1438259328.0, 38520000.0, 0.597904983057516], [1438261056.0, 38540000.0, 0.6079302976946048], [1438262784.0, 38560000.0, 0.6178596130903331], [1438264512.0, 38580000.0, 0.6276913612907015], [1438266240.0, 38600000.0, 0.6374239897486873], [1438267968.0, 38620000.0, 0.6470559615694441], [1438269696.0, 38640000.0, 0.6565857557529556], [1438271424.0, 38660000.0, 0.6660118674342502], [1438273152.0, 38680000.0, 0.6753328081210225], [1438274880.0, 38700000.0, 0.6845471059286887], [1438276608.0, 38720000.0, 0.6936533058128045], [1438278336.0, 38740000.0, 0.7026499697988482], [1438280064.0, 38760000.0, 0.7115356772092838], [1438281792.0, 38780000.0, 0.7203090248879073], [1438283520.0, 38800000.0, 0.7289686274214113], [1438285248.0, 38820000.0, 0.7375131173581733], [1438286976.0, 38840000.0, 0.7459411454241809], [1438288704.0, 38860000.0, 0.7542513807361044], [1438290432.0, 38880000.0, 0.762442511011448], [1438292160.0, 38900000.0, 0.7705132427757888], [1438293888.0, 38920000.0, 0.7784623015670226], [1438295616.0, 38940000.0, 0.7862884321366198], [1438297344.0, 38960000.0, 0.7939903986478335], [1438299072.0, 38980000.0, 0.8015669848708765], [1438300800.0, 39000000.0, 0.8090169943749469], [1438302528.0, 39020000.0, 0.8163392507171829], [1438304256.0, 39040000.0, 0.823532597628426], [1438305984.0, 39060000.0, 0.8305958991958128], [1438307712.0, 39080000.0, 0.8375280400421414], [1438309440.0, 39100000.0, 0.8443279255020144], [1438311168.0, 39120000.0, 0.8509944817946907], [1438312896.0, 39140000.0, 0.8575266561936526], [1438314624.0, 39160000.0, 0.8639234171928353], [1438316352.0, 39180000.0, 0.8701837546695252], [1438318080.0, 39200000.0, 0.8763066800438628], [1438319808.0, 39220000.0, 0.8822912264349538], [1438321536.0, 39240000.0, 0.8881364488135447], [1438323264.0, 39260000.0, 0.8938414241512636], [1438324992.0, 39280000.0, 0.8994052515663705], [1438326720.0, 39300000.0, 0.9048270524660201], [1438328448.0, 39320000.0, 0.9101059706849945], [1438330176.0, 39340000.0, 0.9152411726209175], [1438331904.0, 39360000.0, 0.92023184736587], [1438333632.0, 39380000.0, 0.9250772068344575], [1438335360.0, 39400000.0, 0.9297764858882506], [1438337088.0, 39420000.0, 0.9343289424566121], [1438338816.0, 39440000.0, 0.938733857653874], [1438340544.0, 39460000.0, 0.9429905358928641], [1438342272.0, 39480000.0, 0.9470983049947437], [1438344000.0, 39500000.0, 0.9510565162951539], [1438345728.0, 39520000.0, 0.954864544746643], [1438347456.0, 39540000.0, 0.9585217890173757], [1438349184.0, 39560000.0, 0.9620276715860855], [1438350912.0, 39580000.0, 0.9653816388332742], [1438352640.0, 39600000.0, 0.9685831611286312], [1438354368.0, 39620000.0, 0.9716317329146739], [1438356096.0, 39640000.0, 0.9745268727865769], [1438357824.0, 39660000.0, 
0.9772681235681938], [1438359552.0, 39680000.0, 0.9798550523842463], [1438361280.0, 39700000.0, 0.9822872507286887], [1438363008.0, 39720000.0, 0.9845643345292052], [1438364736.0, 39740000.0, 0.9866859442078678], [1438366464.0, 39760000.0, 0.9886517447379136], [1438368192.0, 39780000.0, 0.9904614256966513], [1438369920.0, 39800000.0, 0.9921147013144778], [1438371648.0, 39820000.0, 0.9936113105200083], [1438373376.0, 39840000.0, 0.9949510169813], [1438375104.0, 39860000.0, 0.9961336091431725], [1438376832.0, 39880000.0, 0.9971589002606139], [1438378560.0, 39900000.0, 0.9980267284282716], [1438380288.0, 39920000.0, 0.9987369566060175], [1438382016.0, 39940000.0, 0.9992894726405893], [1438383744.0, 39960000.0, 0.9996841892833], [1438385472.0, 39980000.0, 0.9999210442038161], [1438387200.0, 40000000.0, 1.0], [1438388928.0, 40020000.0, 0.9999210442038161], [1438390656.0, 40040000.0, 0.9996841892833], [1438392384.0, 40060000.0, 0.9992894726405892], [1438394112.0, 40080000.0, 0.9987369566060176], [1438395840.0, 40100000.0, 0.9980267284282717], [1438397568.0, 40120000.0, 0.997158900260614], [1438399296.0, 40140000.0, 0.9961336091431724], [1438401024.0, 40160000.0, 0.9949510169813002], [1438402752.0, 40180000.0, 0.9936113105200085], [1438404480.0, 40200000.0, 0.9921147013144785], [1438406208.0, 40220000.0, 0.9904614256966511], [1438407936.0, 40240000.0, 0.988651744737914], [1438409664.0, 40260000.0, 0.9866859442078681], [1438411392.0, 40280000.0, 0.9845643345292062], [1438413120.0, 40300000.0, 0.9822872507286884], [1438414848.0, 40320000.0, 0.9798550523842467], [1438416576.0, 40340000.0, 0.9772681235681943], [1438418304.0, 40360000.0, 0.9745268727865781], [1438420032.0, 40380000.0, 0.9716317329146735], [1438421760.0, 40400000.0, 0.9685831611286317], [1438423488.0, 40420000.0, 0.9653816388332748], [1438425216.0, 40440000.0, 0.9620276715860869], [1438426944.0, 40460000.0, 0.9585217890173752], [1438428672.0, 40480000.0, 0.9548645447466435], [1438430400.0, 40500000.0, 0.9510565162951544], [1438432128.0, 40520000.0, 0.9470983049947455], [1438433856.0, 40540000.0, 0.9429905358928647], [1438435584.0, 40560000.0, 0.9387338576538746], [1438437312.0, 40580000.0, 0.9343289424566129], [1438439040.0, 40600000.0, 0.9297764858882526], [1438440768.0, 40620000.0, 0.9250772068344582], [1438442496.0, 40640000.0, 0.9202318473658708], [1438444224.0, 40660000.0, 0.9152411726209183], [1438445952.0, 40680000.0, 0.9101059706849968], [1438447680.0, 40700000.0, 0.9048270524660195], [1438449408.0, 40720000.0, 0.8994052515663713], [1438451136.0, 40740000.0, 0.8938414241512644], [1438452864.0, 40760000.0, 0.8881364488135456], [1438454592.0, 40780000.0, 0.8822912264349531], [1438456320.0, 40800000.0, 0.8763066800438637], [1438458048.0, 40820000.0, 0.8701837546695262], [1438459776.0, 40840000.0, 0.8639234171928363], [1438461504.0, 40860000.0, 0.8575266561936518], [1438463232.0, 40880000.0, 0.8509944817946918], [1438464960.0, 40900000.0, 0.8443279255020154], [1438466688.0, 40920000.0, 0.8375280400421444], [1438468416.0, 40940000.0, 0.8305958991958119], [1438470144.0, 40960000.0, 0.8235325976284271], [1438471872.0, 40980000.0, 0.816339250717184], [1438473600.0, 41000000.0, 0.8090169943749501], [1438475328.0, 41020000.0, 0.8015669848708755], [1438477056.0, 41040000.0, 0.7939903986478347], [1438478784.0, 41060000.0, 0.7862884321366209], [1438480512.0, 41080000.0, 0.778462301567026], [1438482240.0, 41100000.0, 0.7705132427757878], [1438483968.0, 41120000.0, 0.7624425110114492], [1438485696.0, 41140000.0, 0.7542513807361058], 
[1438487424.0, 41160000.0, 0.7459411454241847], [1438489152.0, 41180000.0, 0.7375131173581722], [1438490880.0, 41200000.0, 0.7289686274214127], [1438492608.0, 41220000.0, 0.7203090248879086], [1438494336.0, 41240000.0, 0.7115356772092877], [1438496064.0, 41260000.0, 0.7026499697988495], [1438497792.0, 41280000.0, 0.6936533058128059], [1438499520.0, 41300000.0, 0.6845471059286902], [1438501248.0, 41320000.0, 0.6753328081210266], [1438502976.0, 41340000.0, 0.6660118674342517], [1438504704.0, 41360000.0, 0.6565857557529571], [1438506432.0, 41380000.0, 0.6470559615694456], [1438508160.0, 41400000.0, 0.6374239897486916], [1438509888.0, 41420000.0, 0.6276913612907002], [1438511616.0, 41440000.0, 0.6178596130903347], [1438513344.0, 41460000.0, 0.6079302976946064], [1438515072.0, 41480000.0, 0.5979049830575205], [1438516800.0, 41500000.0, 0.5877852522924725], [1438518528.0, 41520000.0, 0.5775727034222676], [1438520256.0, 41540000.0, 0.5672689491267572], [1438521984.0, 41560000.0, 0.5568756164881893], [1438523712.0, 41580000.0, 0.5463943467342681], [1438525440.0, 41600000.0, 0.5358267949789963], [1438527168.0, 41620000.0, 0.525174629961296], [1438528896.0, 41640000.0, 0.5144395337815074], [1438530624.0, 41660000.0, 0.5036232016357595], [1438532352.0, 41680000.0, 0.49272734154829084], [1438534080.0, 41700000.0, 0.4817536741017152], [1438535808.0, 41720000.0, 0.47070393216533635], [1438537536.0, 41740000.0, 0.45957986062148604], [1438539264.0, 41760000.0, 0.4483832160900311], [1438540992.0, 41780000.0, 0.43711576665093566], [1438542720.0, 41800000.0, 0.42577929156507616], [1438544448.0, 41820000.0, 0.4143755809932819], [1438546176.0, 41840000.0, 0.40290643571366114], [1438547904.0, 41860000.0, 0.39137366683720487], [1438549632.0, 41880000.0, 0.3797790955218043], [1438551360.0, 41900000.0, 0.36812455268467525], [1438553088.0, 41920000.0, 0.3564118787132521], [1438554816.0, 41940000.0, 0.34464292317451917], [1438556544.0, 41960000.0, 0.3328195445229895], [1438558272.0, 41980000.0, 0.3209436098072097], [1438560000.0, 42000000.0, 0.3090169943749484], [1438561728.0, 42020000.0, 0.29704158157703664], [1438563456.0, 42040000.0, 0.2850192624699786], [1438565184.0, 42060000.0, 0.27295193551732505], [1438566912.0, 42080000.0, 0.2608415062898975], [1438568640.0, 42100000.0, 0.24868988716485613], [1438570368.0, 42120000.0, 0.2364989970237268], [1438572096.0, 42140000.0, 0.2242707609493806], [1438573824.0, 42160000.0, 0.21200710992205482], [1438575552.0, 42180000.0, 0.19970998051440797], [1438577280.0, 42200000.0, 0.18738131458572635], [1438579008.0, 42220000.0, 0.17502305897527504], [1438580736.0, 42240000.0, 0.16263716519488336], [1438582464.0, 42260000.0, 0.1502255891207576], [1438584192.0, 42280000.0, 0.13779029068463938], [1438585920.0, 42300000.0, 0.12533323356430282], [1438587648.0, 42320000.0, 0.11285638487348103], [1438589376.0, 42340000.0, 0.100361714851215], [1438591104.0, 42360000.0, 0.08785119655074407], [1438592832.0, 42380000.0, 0.07532680552793085], [1438594560.0, 42400000.0, 0.06279051952931229], [1438596288.0, 42420000.0, 0.050244318179769244], [1438598016.0, 42440000.0, 0.03769018266993856], [1438599744.0, 42460000.0, 0.02513009544333518], [1438601472.0, 42480000.0, 0.012566039883351091], [1438603200.0, 42500000.0, -7.354070601250002e-16], [1438604928.0, 42520000.0, -0.01256603988334901], [1438606656.0, 42540000.0, -0.025130095443333098], [1438608384.0, 42560000.0, -0.03769018266993648], [1438610112.0, 42580000.0, -0.05024431817976717], [1438611840.0, 42600000.0, -0.06279051952931021], 
[1438613568.0, 42620000.0, -0.07532680552792878], [1438615296.0, 42640000.0, -0.087851196550742], [1438617024.0, 42660000.0, -0.10036171485121292], [1438618752.0, 42680000.0, -0.11285638487347896], [1438620480.0, 42700000.0, -0.12533323356430073], [1438622208.0, 42720000.0, -0.1377902906846373], [1438623936.0, 42740000.0, -0.15022558912075554], [1438625664.0, 42760000.0, -0.1626371651948813], [1438627392.0, 42780000.0, -0.17502305897527298], [1438629120.0, 42800000.0, -0.1873813145857243], [1438630848.0, 42820000.0, -0.19970998051440592], [1438632576.0, 42840000.0, -0.2120071099220528], [1438634304.0, 42860000.0, -0.22427076094937856], [1438636032.0, 42880000.0, -0.23649899702372476], [1438637760.0, 42900000.0, -0.2486898871648541], [1438639488.0, 42920000.0, -0.2608415062898955], [1438641216.0, 42940000.0, -0.27295193551732305], [1438642944.0, 42960000.0, -0.2850192624699766], [1438644672.0, 42980000.0, -0.29704158157703464], [1438646400.0, 43000000.0, -0.3090169943749464], [1438648128.0, 43020000.0, -0.32094360980720776], [1438649856.0, 43040000.0, -0.3328195445229875], [1438651584.0, 43060000.0, -0.3446429231745172], [1438653312.0, 43080000.0, -0.35641187871325014], [1438655040.0, 43100000.0, -0.36812455268467337], [1438656768.0, 43120000.0, -0.3797790955218024], [1438658496.0, 43140000.0, -0.3913736668372029], [1438660224.0, 43160000.0, -0.4029064357136592], [1438661952.0, 43180000.0, -0.41437558099328], [1438663680.0, 43200000.0, -0.42577929156507427], [1438665408.0, 43220000.0, -0.4371157666509338], [1438667136.0, 43240000.0, -0.4483832160900293], [1438668864.0, 43260000.0, -0.4595798606214842], [1438670592.0, 43280000.0, -0.4707039321653345], [1438672320.0, 43300000.0, -0.48175367410171344], [1438674048.0, 43320000.0, -0.492727341548289], [1438675776.0, 43340000.0, -0.5036232016357577], [1438677504.0, 43360000.0, -0.5144395337815056], [1438679232.0, 43380000.0, -0.5251746299612943], [1438680960.0, 43400000.0, -0.5358267949789945], [1438682688.0, 43420000.0, -0.5463943467342663], [1438684416.0, 43440000.0, -0.5568756164881876], [1438686144.0, 43460000.0, -0.5672689491267554], [1438687872.0, 43480000.0, -0.577572703422266], [1438689600.0, 43500000.0, -0.5877852522924708], [1438691328.0, 43520000.0, -0.5979049830575188], [1438693056.0, 43540000.0, -0.6079302976946047], [1438694784.0, 43560000.0, -0.6178596130903331], [1438696512.0, 43580000.0, -0.6276913612906986], [1438698240.0, 43600000.0, -0.63742398974869], [1438699968.0, 43620000.0, -0.647055961569444], [1438701696.0, 43640000.0, -0.6565857557529555], [1438703424.0, 43660000.0, -0.6660118674342501], [1438705152.0, 43680000.0, -0.675332808121025], [1438706880.0, 43700000.0, -0.6845471059286886], [1438708608.0, 43720000.0, -0.6936533058128044], [1438710336.0, 43740000.0, -0.7026499697988481], [1438712064.0, 43760000.0, -0.7115356772092862], [1438713792.0, 43780000.0, -0.7203090248879072], [1438715520.0, 43800000.0, -0.7289686274214113], [1438717248.0, 43820000.0, -0.7375131173581707], [1438718976.0, 43840000.0, -0.7459411454241832], [1438720704.0, 43860000.0, -0.7542513807361043], [1438722432.0, 43880000.0, -0.7624425110114479], [1438724160.0, 43900000.0, -0.7705132427757865], [1438725888.0, 43920000.0, -0.7784623015670247], [1438727616.0, 43940000.0, -0.7862884321366197], [1438729344.0, 43960000.0, -0.7939903986478335], [1438731072.0, 43980000.0, -0.8015669848708743], [1438732800.0, 44000000.0, -0.8090169943749489], [1438734528.0, 44020000.0, -0.8163392507171828], [1438736256.0, 44040000.0, -0.8235325976284259], [1438737984.0, 
44060000.0, -0.8305958991958108], [1438739712.0, 44080000.0, -0.8375280400421433], [1438741440.0, 44100000.0, -0.8443279255020143], [1438743168.0, 44120000.0, -0.8509944817946907], [1438744896.0, 44140000.0, -0.8575266561936508], [1438746624.0, 44160000.0, -0.8639234171928352], [1438748352.0, 44180000.0, -0.8701837546695251], [1438750080.0, 44200000.0, -0.8763066800438627], [1438751808.0, 44220000.0, -0.8822912264349521], [1438753536.0, 44240000.0, -0.8881364488135446], [1438755264.0, 44260000.0, -0.8938414241512636], [1438756992.0, 44280000.0, -0.8994052515663704], [1438758720.0, 44300000.0, -0.9048270524660186], [1438760448.0, 44320000.0, -0.910105970684996], [1438762176.0, 44340000.0, -0.9152411726209175], [1438763904.0, 44360000.0, -0.9202318473658699], [1438765632.0, 44380000.0, -0.9250772068344574], [1438767360.0, 44400000.0, -0.9297764858882518], [1438769088.0, 44420000.0, -0.9343289424566121], [1438770816.0, 44440000.0, -0.938733857653874], [1438772544.0, 44460000.0, -0.9429905358928641], [1438774272.0, 44480000.0, -0.9470983049947448], [1438776000.0, 44500000.0, -0.9510565162951538], [1438777728.0, 44520000.0, -0.954864544746643], [1438779456.0, 44540000.0, -0.9585217890173746], [1438781184.0, 44560000.0, -0.9620276715860864], [1438782912.0, 44580000.0, -0.9653816388332742], [1438784640.0, 44600000.0, -0.9685831611286312], [1438786368.0, 44620000.0, -0.971631732914673], [1438788096.0, 44640000.0, -0.9745268727865777], [1438789824.0, 44660000.0, -0.9772681235681938], [1438791552.0, 44680000.0, -0.9798550523842463], [1438793280.0, 44700000.0, -0.982287250728688], [1438795008.0, 44720000.0, -0.9845643345292059], [1438796736.0, 44740000.0, -0.9866859442078678], [1438798464.0, 44760000.0, -0.9886517447379136], [1438800192.0, 44780000.0, -0.9904614256966507], [1438801920.0, 44800000.0, -0.9921147013144782], [1438803648.0, 44820000.0, -0.9936113105200083], [1438805376.0, 44840000.0, -0.9949510169813], [1438807104.0, 44860000.0, -0.9961336091431722], [1438808832.0, 44880000.0, -0.9971589002606139], [1438810560.0, 44900000.0, -0.9980267284282716], [1438812288.0, 44920000.0, -0.9987369566060175], [1438814016.0, 44940000.0, -0.9992894726405892], [1438815744.0, 44960000.0, -0.9996841892832999], [1438817472.0, 44980000.0, -0.9999210442038161], [1438819200.0, 45000000.0, -1.0], [1438820928.0, 45020000.0, -0.9999210442038161], [1438822656.0, 45040000.0, -0.9996841892833], [1438824384.0, 45060000.0, -0.9992894726405892], [1438826112.0, 45080000.0, -0.9987369566060176], [1438827840.0, 45100000.0, -0.9980267284282717], [1438829568.0, 45120000.0, -0.9971589002606142], [1438831296.0, 45140000.0, -0.9961336091431724], [1438833024.0, 45160000.0, -0.9949510169813002], [1438834752.0, 45180000.0, -0.9936113105200085], [1438836480.0, 45200000.0, -0.9921147013144785], [1438838208.0, 45220000.0, -0.9904614256966511], [1438839936.0, 45240000.0, -0.988651744737914], [1438841664.0, 45260000.0, -0.9866859442078681], [1438843392.0, 45280000.0, -0.9845643345292062], [1438845120.0, 45300000.0, -0.9822872507286884], [1438846848.0, 45320000.0, -0.9798550523842467], [1438848576.0, 45340000.0, -0.9772681235681943], [1438850304.0, 45360000.0, -0.9745268727865781], [1438852032.0, 45380000.0, -0.9716317329146735], [1438853760.0, 45400000.0, -0.9685831611286317], [1438855488.0, 45420000.0, -0.9653816388332748], [1438857216.0, 45440000.0, -0.962027671586087], [1438858944.0, 45460000.0, -0.9585217890173753], [1438860672.0, 45480000.0, -0.9548645447466436], [1438862400.0, 45500000.0, -0.9510565162951544], [1438864128.0, 
45520000.0, -0.9470983049947455], [1438865856.0, 45540000.0, -0.9429905358928647], [1438867584.0, 45560000.0, -0.9387338576538746], [1438869312.0, 45580000.0, -0.9343289424566129], [1438871040.0, 45600000.0, -0.9297764858882526], [1438872768.0, 45620000.0, -0.9250772068344583], [1438874496.0, 45640000.0, -0.9202318473658708], [1438876224.0, 45660000.0, -0.9152411726209184], [1438877952.0, 45680000.0, -0.9101059706849969], [1438879680.0, 45700000.0, -0.9048270524660196], [1438881408.0, 45720000.0, -0.8994052515663714], [1438883136.0, 45740000.0, -0.8938414241512646], [1438884864.0, 45760000.0, -0.8881364488135456], [1438886592.0, 45780000.0, -0.8822912264349531], [1438888320.0, 45800000.0, -0.8763066800438638], [1438890048.0, 45820000.0, -0.8701837546695262], [1438891776.0, 45840000.0, -0.8639234171928363], [1438893504.0, 45860000.0, -0.8575266561936519], [1438895232.0, 45880000.0, -0.8509944817946918], [1438896960.0, 45900000.0, -0.8443279255020155], [1438898688.0, 45920000.0, -0.8375280400421445], [1438900416.0, 45940000.0, -0.830595899195812], [1438902144.0, 45960000.0, -0.8235325976284272], [1438903872.0, 45980000.0, -0.8163392507171842], [1438905600.0, 46000000.0, -0.8090169943749502], [1438907328.0, 46020000.0, -0.8015669848708756], [1438909056.0, 46040000.0, -0.7939903986478348], [1438910784.0, 46060000.0, -0.786288432136621], [1438912512.0, 46080000.0, -0.7784623015670261], [1438914240.0, 46100000.0, -0.7705132427757879], [1438915968.0, 46120000.0, -0.7624425110114493], [1438917696.0, 46140000.0, -0.7542513807361058], [1438919424.0, 46160000.0, -0.7459411454241847], [1438921152.0, 46180000.0, -0.7375131173581723], [1438922880.0, 46200000.0, -0.7289686274214128], [1438924608.0, 46220000.0, -0.7203090248879087], [1438926336.0, 46240000.0, -0.7115356772092878], [1438928064.0, 46260000.0, -0.7026499697988496], [1438929792.0, 46280000.0, -0.693653305812806], [1438931520.0, 46300000.0, -0.6845471059286903], [1438933248.0, 46320000.0, -0.6753328081210266], [1438934976.0, 46340000.0, -0.6660118674342518], [1438936704.0, 46360000.0, -0.6565857557529572], [1438938432.0, 46380000.0, -0.6470559615694457], [1438940160.0, 46400000.0, -0.6374239897486916], [1438941888.0, 46420000.0, -0.6276913612907004], [1438943616.0, 46440000.0, -0.6178596130903348], [1438945344.0, 46460000.0, -0.6079302976946065], [1438947072.0, 46480000.0, -0.5979049830575206], [1438948800.0, 46500000.0, -0.5877852522924726], [1438950528.0, 46520000.0, -0.5775727034222677], [1438952256.0, 46540000.0, -0.5672689491267573], [1438953984.0, 46560000.0, -0.5568756164881894], [1438955712.0, 46580000.0, -0.5463943467342682], [1438957440.0, 46600000.0, -0.5358267949789964], [1438959168.0, 46620000.0, -0.5251746299612962], [1438960896.0, 46640000.0, -0.5144395337815075], [1438962624.0, 46660000.0, -0.5036232016357596], [1438964352.0, 46680000.0, -0.49272734154829095], [1438966080.0, 46700000.0, -0.4817536741017153], [1438967808.0, 46720000.0, -0.47070393216533646], [1438969536.0, 46740000.0, -0.45957986062148615], [1438971264.0, 46760000.0, -0.44838321609003123], [1438972992.0, 46780000.0, -0.43711576665093577], [1438974720.0, 46800000.0, -0.42577929156507627], [1438976448.0, 46820000.0, -0.41437558099328203], [1438978176.0, 46840000.0, -0.40290643571366125], [1438979904.0, 46860000.0, -0.391373666837205], [1438981632.0, 46880000.0, -0.3797790955218044], [1438983360.0, 46900000.0, -0.3681245526846754], [1438985088.0, 46920000.0, -0.3564118787132522], [1438986816.0, 46940000.0, -0.3446429231745193], [1438988544.0, 46960000.0, 
-0.3328195445229896], [1438990272.0, 46980000.0, -0.3209436098072098], [1438992000.0, 47000000.0, -0.3090169943749485], [1438993728.0, 47020000.0, -0.29704158157703675], [1438995456.0, 47040000.0, -0.2850192624699787], [1438997184.0, 47060000.0, -0.27295193551732516], [1438998912.0, 47080000.0, -0.2608415062898976], [1439000640.0, 47100000.0, -0.24868988716485624], [1439002368.0, 47120000.0, -0.2364989970237269], [1439004096.0, 47140000.0, -0.22427076094938073], [1439005824.0, 47160000.0, -0.21200710992205493], [1439007552.0, 47180000.0, -0.19970998051440808], [1439009280.0, 47200000.0, -0.18738131458572646], [1439011008.0, 47220000.0, -0.17502305897527515], [1439012736.0, 47240000.0, -0.16263716519488347], [1439014464.0, 47260000.0, -0.1502255891207577], [1439016192.0, 47280000.0, -0.1377902906846395], [1439017920.0, 47300000.0, -0.12533323356430293], [1439019648.0, 47320000.0, -0.11285638487348114], [1439021376.0, 47340000.0, -0.10036171485121512], [1439023104.0, 47360000.0, -0.08785119655074419], [1439024832.0, 47380000.0, -0.07532680552793097], [1439026560.0, 47400000.0, -0.0627905195293124], [1439028288.0, 47420000.0, -0.05024431817976937], [1439030016.0, 47440000.0, -0.037690182669938684], [1439031744.0, 47460000.0, -0.025130095443335304], [1439033472.0, 47480000.0, -0.012566039883351213], [1439035200.0, 47500000.0, -2.939771298590236e-15], [1439036928.0, 47520000.0, 0.012566039883348886], [1439038656.0, 47540000.0, 0.025130095443332976], [1439040384.0, 47560000.0, 0.03769018266993636], [1439042112.0, 47580000.0, 0.050244318179767045], [1439043840.0, 47600000.0, 0.06279051952931008], [1439045568.0, 47620000.0, 0.07532680552792866], [1439047296.0, 47640000.0, 0.08785119655074188], [1439049024.0, 47660000.0, 0.10036171485121281], [1439050752.0, 47680000.0, 0.11285638487347883], [1439052480.0, 47700000.0, 0.12533323356430062], [1439054208.0, 47720000.0, 0.1377902906846372], [1439055936.0, 47740000.0, 0.1502255891207554], [1439057664.0, 47760000.0, 0.1626371651948812], [1439059392.0, 47780000.0, 0.17502305897527287], [1439061120.0, 47800000.0, 0.18738131458572418], [1439062848.0, 47820000.0, 0.1997099805144058], [1439064576.0, 47840000.0, 0.21200710992205266], [1439066304.0, 47860000.0, 0.22427076094937845], [1439068032.0, 47880000.0, 0.23649899702372465], [1439069760.0, 47900000.0, 0.248689887164854], [1439071488.0, 47920000.0, 0.2608415062898954], [1439073216.0, 47940000.0, 0.27295193551732294], [1439074944.0, 47960000.0, 0.2850192624699765], [1439076672.0, 47980000.0, 0.29704158157703453], [1439078400.0, 48000000.0, 0.3090169943749463], [1439080128.0, 48020000.0, 0.32094360980720765], [1439081856.0, 48040000.0, 0.3328195445229874], [1439083584.0, 48060000.0, 0.3446429231745171], [1439085312.0, 48080000.0, 0.35641187871325003], [1439087040.0, 48100000.0, 0.36812455268467326], [1439088768.0, 48120000.0, 0.3797790955218022], [1439090496.0, 48140000.0, 0.3913736668372028], [1439092224.0, 48160000.0, 0.4029064357136591], [1439093952.0, 48180000.0, 0.41437558099327987], [1439095680.0, 48200000.0, 0.42577929156507416], [1439097408.0, 48220000.0, 0.4371157666509337], [1439099136.0, 48240000.0, 0.4483832160900292], [1439100864.0, 48260000.0, 0.4595798606214841], [1439102592.0, 48280000.0, 0.4707039321653344], [1439104320.0, 48300000.0, 0.4817536741017133], [1439106048.0, 48320000.0, 0.49272734154828896], [1439107776.0, 48340000.0, 0.5036232016357576], [1439109504.0, 48360000.0, 0.5144395337815055], [1439111232.0, 48380000.0, 0.5251746299612942], [1439112960.0, 48400000.0, 0.5358267949789944], 
[1439114688.0, 48420000.0, 0.5463943467342662], [1439116416.0, 48440000.0, 0.5568756164881875], [1439118144.0, 48460000.0, 0.5672689491267553], [1439119872.0, 48480000.0, 0.5775727034222659], [1439121600.0, 48500000.0, 0.5877852522924707], [1439123328.0, 48520000.0, 0.5979049830575187], [1439125056.0, 48540000.0, 0.6079302976946046], [1439126784.0, 48560000.0, 0.617859613090333], [1439128512.0, 48580000.0, 0.6276913612906985], [1439130240.0, 48600000.0, 0.6374239897486899], [1439131968.0, 48620000.0, 0.6470559615694439], [1439133696.0, 48640000.0, 0.6565857557529554], [1439135424.0, 48660000.0, 0.66601186743425], [1439137152.0, 48680000.0, 0.6753328081210249], [1439138880.0, 48700000.0, 0.6845471059286886], [1439140608.0, 48720000.0, 0.6936533058128043], [1439142336.0, 48740000.0, 0.702649969798848], [1439144064.0, 48760000.0, 0.7115356772092861], [1439145792.0, 48780000.0, 0.720309024887907], [1439147520.0, 48800000.0, 0.7289686274214112], [1439149248.0, 48820000.0, 0.7375131173581706], [1439150976.0, 48840000.0, 0.7459411454241831], [1439152704.0, 48860000.0, 0.7542513807361043], [1439154432.0, 48880000.0, 0.7624425110114478], [1439156160.0, 48900000.0, 0.7705132427757865], [1439157888.0, 48920000.0, 0.7784623015670247], [1439159616.0, 48940000.0, 0.7862884321366196], [1439161344.0, 48960000.0, 0.7939903986478334], [1439163072.0, 48980000.0, 0.8015669848708742], [1439164800.0, 49000000.0, 0.8090169943749488], [1439166528.0, 49020000.0, 0.8163392507171828], [1439168256.0, 49040000.0, 0.8235325976284259], [1439169984.0, 49060000.0, 0.8305958991958107], [1439171712.0, 49080000.0, 0.8375280400421432], [1439173440.0, 49100000.0, 0.8443279255020143], [1439175168.0, 49120000.0, 0.8509944817946906], [1439176896.0, 49140000.0, 0.8575266561936506], [1439178624.0, 49160000.0, 0.8639234171928352], [1439180352.0, 49180000.0, 0.8701837546695251], [1439182080.0, 49200000.0, 0.8763066800438627], [1439183808.0, 49220000.0, 0.8822912264349521], [1439185536.0, 49240000.0, 0.8881364488135446], [1439187264.0, 49260000.0, 0.8938414241512634], [1439188992.0, 49280000.0, 0.8994052515663704], [1439190720.0, 49300000.0, 0.9048270524660186], [1439192448.0, 49320000.0, 0.9101059706849959], [1439194176.0, 49340000.0, 0.9152411726209174], [1439195904.0, 49360000.0, 0.9202318473658699], [1439197632.0, 49380000.0, 0.9250772068344574], [1439199360.0, 49400000.0, 0.9297764858882518], [1439201088.0, 49420000.0, 0.9343289424566121], [1439202816.0, 49440000.0, 0.9387338576538738], [1439204544.0, 49460000.0, 0.942990535892864], [1439206272.0, 49480000.0, 0.9470983049947447], [1439208000.0, 49500000.0, 0.9510565162951538], [1439209728.0, 49520000.0, 0.9548645447466428], [1439211456.0, 49540000.0, 0.9585217890173746], [1439213184.0, 49560000.0, 0.9620276715860864], [1439214912.0, 49580000.0, 0.9653816388332741], [1439216640.0, 49600000.0, 0.9685831611286312], [1439218368.0, 49620000.0, 0.971631732914673], [1439220096.0, 49640000.0, 0.9745268727865777], [1439221824.0, 49660000.0, 0.9772681235681938], [1439223552.0, 49680000.0, 0.9798550523842463], [1439225280.0, 49700000.0, 0.9822872507286879], [1439227008.0, 49720000.0, 0.9845643345292059], [1439228736.0, 49740000.0, 0.9866859442078678], [1439230464.0, 49760000.0, 0.9886517447379136], [1439232192.0, 49780000.0, 0.9904614256966507], [1439233920.0, 49800000.0, 0.9921147013144782], [1439235648.0, 49820000.0, 0.9936113105200083], [1439237376.0, 49840000.0, 0.9949510169813], [1439239104.0, 49860000.0, 0.9961336091431722], [1439240832.0, 49880000.0, 0.9971589002606139], 
[1439242560.0, 49900000.0, 0.9980267284282716], [1439244288.0, 49920000.0, 0.9987369566060175], [1439246016.0, 49940000.0, 0.9992894726405892], [1439247744.0, 49960000.0, 0.9996841892832999], [1439249472.0, 49980000.0, 0.9999210442038161], [1439251200.0, 50000000.0, 1.0], [1439252928.0, 50020000.0, 0.9999210442038161], [1439254656.0, 50040000.0, 0.9996841892833], [1439256384.0, 50060000.0, 0.9992894726405892], [1439258112.0, 50080000.0, 0.9987369566060176], [1439259840.0, 50100000.0, 0.9980267284282717], [1439261568.0, 50120000.0, 0.9971589002606142], [1439263296.0, 50140000.0, 0.9961336091431724], [1439265024.0, 50160000.0, 0.9949510169813002], [1439266752.0, 50180000.0, 0.9936113105200085], [1439268480.0, 50200000.0, 0.9921147013144785], [1439270208.0, 50220000.0, 0.9904614256966511], [1439271936.0, 50240000.0, 0.988651744737914], [1439273664.0, 50260000.0, 0.9866859442078681], [1439275392.0, 50280000.0, 0.9845643345292062], [1439277120.0, 50300000.0, 0.9822872507286885], [1439278848.0, 50320000.0, 0.9798550523842467], [1439280576.0, 50340000.0, 0.9772681235681943], [1439282304.0, 50360000.0, 0.9745268727865782], [1439284032.0, 50380000.0, 0.9716317329146735], [1439285760.0, 50400000.0, 0.9685831611286317], [1439287488.0, 50420000.0, 0.9653816388332748], [1439289216.0, 50440000.0, 0.962027671586087], [1439290944.0, 50460000.0, 0.9585217890173753], [1439292672.0, 50480000.0, 0.9548645447466436], [1439294400.0, 50500000.0, 0.9510565162951545], [1439296128.0, 50520000.0, 0.9470983049947455], [1439297856.0, 50540000.0, 0.9429905358928649], [1439299584.0, 50560000.0, 0.9387338576538747], [1439301312.0, 50580000.0, 0.934328942456613], [1439303040.0, 50600000.0, 0.9297764858882527], [1439304768.0, 50620000.0, 0.9250772068344583], [1439306496.0, 50640000.0, 0.9202318473658708], [1439308224.0, 50660000.0, 0.9152411726209184], [1439309952.0, 50680000.0, 0.9101059706849969], [1439311680.0, 50700000.0, 0.9048270524660196], [1439313408.0, 50720000.0, 0.8994052515663714], [1439315136.0, 50740000.0, 0.8938414241512646], [1439316864.0, 50760000.0, 0.8881364488135457], [1439318592.0, 50780000.0, 0.8822912264349532], [1439320320.0, 50800000.0, 0.8763066800438638], [1439322048.0, 50820000.0, 0.8701837546695264], [1439323776.0, 50840000.0, 0.8639234171928364], [1439325504.0, 50860000.0, 0.8575266561936519], [1439327232.0, 50880000.0, 0.850994481794692], [1439328960.0, 50900000.0, 0.8443279255020155], [1439330688.0, 50920000.0, 0.8375280400421445], [1439332416.0, 50940000.0, 0.830595899195814], [1439334144.0, 50960000.0, 0.8235325976284292], [1439335872.0, 50980000.0, 0.8163392507171863], [1439337600.0, 51000000.0, 0.8090169943749502], [1439339328.0, 51020000.0, 0.8015669848708756], [1439341056.0, 51040000.0, 0.7939903986478348], [1439342784.0, 51060000.0, 0.7862884321366189], [1439344512.0, 51080000.0, 0.7784623015670239], [1439346240.0, 51100000.0, 0.7705132427757903], [1439347968.0, 51120000.0, 0.7624425110114494], [1439349696.0, 51140000.0, 0.7542513807361059], [1439351424.0, 51160000.0, 0.7459411454241848], [1439353152.0, 51180000.0, 0.7375131173581723], [1439354880.0, 51200000.0, 0.7289686274214104], [1439356608.0, 51220000.0, 0.7203090248879064], [1439358336.0, 51240000.0, 0.7115356772092853], [1439360064.0, 51260000.0, 0.7026499697988497], [1439361792.0, 51280000.0, 0.693653305812806], [1439363520.0, 51300000.0, 0.6845471059286904], [1439365248.0, 51320000.0, 0.6753328081210267], [1439366976.0, 51340000.0, 0.6660118674342492], [1439368704.0, 51360000.0, 0.6565857557529546], [1439370432.0, 51380000.0, 
0.647055961569443], [1439372160.0, 51400000.0, 0.6374239897486945], [1439373888.0, 51420000.0, 0.6276913612907004], [1439375616.0, 51440000.0, 0.6178596130903349], [1439377344.0, 51460000.0, 0.6079302976946066], [1439379072.0, 51480000.0, 0.5979049830575207], [1439380800.0, 51500000.0, 0.5877852522924698], [1439382528.0, 51520000.0, 0.5775727034222707], [1439384256.0, 51540000.0, 0.5672689491267603], [1439385984.0, 51560000.0, 0.5568756164881925], [1439387712.0, 51580000.0, 0.5463943467342683], [1439389440.0, 51600000.0, 0.5358267949789964], [1439391168.0, 51620000.0, 0.5251746299612963], [1439392896.0, 51640000.0, 0.5144395337815076], [1439394624.0, 51660000.0, 0.5036232016357627], [1439396352.0, 51680000.0, 0.4927273415482942], [1439398080.0, 51700000.0, 0.48175367410171854], [1439399808.0, 51720000.0, 0.47070393216533657], [1439401536.0, 51740000.0, 0.45957986062148626], [1439403264.0, 51760000.0, 0.44838321609003134], [1439404992.0, 51780000.0, 0.4371157666509327], [1439406720.0, 51800000.0, 0.42577929156507316], [1439408448.0, 51820000.0, 0.41437558099328536], [1439410176.0, 51840000.0, 0.4029064357136646], [1439411904.0, 51860000.0, 0.3913736668372051], [1439413632.0, 51880000.0, 0.3797790955218045], [1439415360.0, 51900000.0, 0.36812455268467553], [1439417088.0, 51920000.0, 0.356411878713249], [1439418816.0, 51940000.0, 0.34464292317451606], [1439420544.0, 51960000.0, 0.33281954452298634], [1439422272.0, 51980000.0, 0.32094360980720993], [1439424000.0, 52000000.0, 0.3090169943749486], [1439425728.0, 52020000.0, 0.29704158157703686], [1439427456.0, 52040000.0, 0.28501926246997883], [1439429184.0, 52060000.0, 0.2729519355173219], [1439430912.0, 52080000.0, 0.26084150628989433], [1439432640.0, 52100000.0, 0.24868988716485294], [1439434368.0, 52120000.0, 0.23649899702373048], [1439436096.0, 52140000.0, 0.22427076094938084], [1439437824.0, 52160000.0, 0.21200710992205504], [1439439552.0, 52180000.0, 0.1997099805144082], [1439441280.0, 52200000.0, 0.18738131458572657], [1439443008.0, 52220000.0, 0.1750230589752718], [1439444736.0, 52240000.0, 0.1626371651948871], [1439446464.0, 52260000.0, 0.15022558912076134], [1439448192.0, 52280000.0, 0.13779029068464313], [1439449920.0, 52300000.0, 0.12533323356430304], [1439451648.0, 52320000.0, 0.11285638487348126], [1439453376.0, 52340000.0, 0.10036171485121524], [1439455104.0, 52360000.0, 0.08785119655074432], [1439456832.0, 52380000.0, 0.07532680552793464], [1439458560.0, 52400000.0, 0.06279051952931608], [1439460288.0, 52420000.0, 0.05024431817977304], [1439462016.0, 52440000.0, 0.0376901826699388], [1439463744.0, 52460000.0, 0.025130095443335426], [1439465472.0, 52480000.0, 0.012566039883351336], [1439467200.0, 52500000.0, -4.904777002955296e-16], [1439468928.0, 52520000.0, -0.012566039883352316], [1439470656.0, 52540000.0, -0.025130095443336404], [1439472384.0, 52560000.0, -0.03769018266993268], [1439474112.0, 52580000.0, -0.05024431817976692], [1439475840.0, 52600000.0, -0.06279051952930996], [1439477568.0, 52620000.0, -0.07532680552792853], [1439479296.0, 52640000.0, -0.08785119655074529], [1439481024.0, 52660000.0, -0.10036171485121623], [1439482752.0, 52680000.0, -0.11285638487348224], [1439484480.0, 52700000.0, -0.125333233564297], [1439486208.0, 52720000.0, -0.13779029068463708], [1439487936.0, 52740000.0, -0.1502255891207553], [1439489664.0, 52760000.0, -0.16263716519488106], [1439491392.0, 52780000.0, -0.17502305897527273], [1439493120.0, 52800000.0, -0.18738131458572754], [1439494848.0, 52820000.0, -0.19970998051440916], 
[1439496576.0, 52840000.0, -0.21200710992204908], [1439498304.0, 52860000.0, -0.22427076094937487], [1439500032.0, 52880000.0, -0.23649899702372454], [1439501760.0, 52900000.0, -0.24868988716485388], [1439503488.0, 52920000.0, -0.2608415062898953], [1439505216.0, 52940000.0, -0.27295193551732283], [1439506944.0, 52960000.0, -0.2850192624699798], [1439508672.0, 52980000.0, -0.29704158157703103], [1439510400.0, 53000000.0, -0.3090169943749428], [1439512128.0, 53020000.0, -0.32094360980720416], [1439513856.0, 53040000.0, -0.3328195445229873], [1439515584.0, 53060000.0, -0.34464292317451695], [1439517312.0, 53080000.0, -0.35641187871324986], [1439519040.0, 53100000.0, -0.3681245526846764], [1439520768.0, 53120000.0, -0.37977909552179884], [1439522496.0, 53140000.0, -0.39137366683719943], [1439524224.0, 53160000.0, -0.402906435713659], [1439525952.0, 53180000.0, -0.41437558099327976], [1439527680.0, 53200000.0, -0.42577929156507405], [1439529408.0, 53220000.0, -0.4371157666509336], [1439531136.0, 53240000.0, -0.44838321609003223], [1439532864.0, 53260000.0, -0.45957986062148715], [1439534592.0, 53280000.0, -0.4707039321653312], [1439536320.0, 53300000.0, -0.4817536741017132], [1439538048.0, 53320000.0, -0.49272734154828884], [1439539776.0, 53340000.0, -0.5036232016357575], [1439541504.0, 53360000.0, -0.5144395337815085], [1439543232.0, 53380000.0, -0.525174629961297], [1439544960.0, 53400000.0, -0.5358267949789973], [1439546688.0, 53420000.0, -0.5463943467342631], [1439548416.0, 53440000.0, -0.5568756164881874], [1439550144.0, 53460000.0, -0.5672689491267553], [1439551872.0, 53480000.0, -0.5775727034222657], [1439553600.0, 53500000.0, -0.5877852522924706], [1439555328.0, 53520000.0, -0.5979049830575215], [1439557056.0, 53540000.0, -0.6079302976946074], [1439558784.0, 53560000.0, -0.6178596130903301], [1439560512.0, 53580000.0, -0.6276913612906957], [1439562240.0, 53600000.0, -0.6374239897486897], [1439563968.0, 53620000.0, -0.6470559615694438], [1439565696.0, 53640000.0, -0.6565857557529553], [1439567424.0, 53660000.0, -0.66601186743425], [1439569152.0, 53680000.0, -0.6753328081210275], [1439570880.0, 53700000.0, -0.684547105928686], [1439572608.0, 53720000.0, -0.6936533058128017], [1439574336.0, 53740000.0, -0.7026499697988454], [1439576064.0, 53760000.0, -0.711535677209286], [1439577792.0, 53780000.0, -0.720309024887907], [1439579520.0, 53800000.0, -0.7289686274214111], [1439581248.0, 53820000.0, -0.7375131173581729], [1439582976.0, 53840000.0, -0.7459411454241807], [1439584704.0, 53860000.0, -0.7542513807361019], [1439586432.0, 53880000.0, -0.7624425110114454], [1439588160.0, 53900000.0, -0.7705132427757864], [1439589888.0, 53920000.0, -0.7784623015670246], [1439591616.0, 53940000.0, -0.7862884321366195], [1439593344.0, 53960000.0, -0.7939903986478355], [1439595072.0, 53980000.0, -0.8015669848708763], [1439596800.0, 54000000.0, -0.8090169943749467], [1439598528.0, 54020000.0, -0.8163392507171827], [1439600256.0, 54040000.0, -0.8235325976284258], [1439601984.0, 54060000.0, -0.8305958991958106], [1439603712.0, 54080000.0, -0.8375280400421432], [1439605440.0, 54100000.0, -0.8443279255020161], [1439607168.0, 54120000.0, -0.8509944817946924], [1439608896.0, 54140000.0, -0.8575266561936524], [1439610624.0, 54160000.0, -0.863923417192835], [1439612352.0, 54180000.0, -0.870183754669525], [1439614080.0, 54200000.0, -0.8763066800438626], [1439615808.0, 54220000.0, -0.882291226434952], [1439617536.0, 54240000.0, -0.8881364488135461], [1439619264.0, 54260000.0, -0.893841424151265], [1439620992.0, 
54280000.0, -0.8994052515663687], [1439622720.0, 54300000.0, -0.904827052466017], [1439624448.0, 54320000.0, -0.9101059706849959], [1439626176.0, 54340000.0, -0.9152411726209174], [1439627904.0, 54360000.0, -0.9202318473658698], [1439629632.0, 54380000.0, -0.9250772068344573], [1439631360.0, 54400000.0, -0.929776485888253], [1439633088.0, 54420000.0, -0.9343289424566108], [1439634816.0, 54440000.0, -0.9387338576538726], [1439636544.0, 54460000.0, -0.9429905358928627], [1439638272.0, 54480000.0, -0.9470983049947447], [1439640000.0, 54500000.0, -0.9510565162951538], [1439641728.0, 54520000.0, -0.9548645447466428], [1439643456.0, 54540000.0, -0.9585217890173756], [1439645184.0, 54560000.0, -0.9620276715860854], [1439646912.0, 54580000.0, -0.9653816388332732], [1439648640.0, 54600000.0, -0.9685831611286303], [1439650368.0, 54620000.0, -0.971631732914673], [1439652096.0, 54640000.0, -0.9745268727865776], [1439653824.0, 54660000.0, -0.9772681235681937], [1439655552.0, 54680000.0, -0.979855052384247], [1439657280.0, 54700000.0, -0.9822872507286886], [1439659008.0, 54720000.0, -0.9845643345292052], [1439660736.0, 54740000.0, -0.9866859442078678], [1439662464.0, 54760000.0, -0.9886517447379136], [1439664192.0, 54780000.0, -0.9904614256966507], [1439665920.0, 54800000.0, -0.9921147013144782], [1439667648.0, 54820000.0, -0.9936113105200086], [1439669376.0, 54840000.0, -0.9949510169813003], [1439671104.0, 54860000.0, -0.9961336091431725], [1439672832.0, 54880000.0, -0.9971589002606139], [1439674560.0, 54900000.0, -0.9980267284282714], [1439676288.0, 54920000.0, -0.9987369566060174], [1439678016.0, 54940000.0, -0.9992894726405892], [1439679744.0, 54960000.0, -0.9996841892833], [1439681472.0, 54980000.0, -0.9999210442038161], [1439683200.0, 55000000.0, -1.0], [1439684928.0, 55020000.0, -0.9999210442038162], [1439686656.0, 55040000.0, -0.9996841892833002], [1439688384.0, 55060000.0, -0.9992894726405892], [1439690112.0, 55080000.0, -0.9987369566060176], [1439691840.0, 55100000.0, -0.9980267284282717], [1439693568.0, 55120000.0, -0.9971589002606142], [1439695296.0, 55140000.0, -0.9961336091431727], [1439697024.0, 55160000.0, -0.9949510169813006], [1439698752.0, 55180000.0, -0.993611310520009], [1439700480.0, 55200000.0, -0.9921147013144785], [1439702208.0, 55220000.0, -0.9904614256966511], [1439703936.0, 55240000.0, -0.9886517447379141], [1439705664.0, 55260000.0, -0.9866859442078683], [1439707392.0, 55280000.0, -0.9845643345292057], [1439709120.0, 55300000.0, -0.9822872507286892], [1439710848.0, 55320000.0, -0.9798550523842475], [1439712576.0, 55340000.0, -0.9772681235681943], [1439714304.0, 55360000.0, -0.9745268727865782], [1439716032.0, 55380000.0, -0.9716317329146735], [1439717760.0, 55400000.0, -0.968583161128631], [1439719488.0, 55420000.0, -0.9653816388332739], [1439721216.0, 55440000.0, -0.9620276715860862], [1439722944.0, 55460000.0, -0.9585217890173764], [1439724672.0, 55480000.0, -0.9548645447466436], [1439726400.0, 55500000.0, -0.9510565162951545], [1439728128.0, 55520000.0, -0.9470983049947456], [1439729856.0, 55540000.0, -0.9429905358928636], [1439731584.0, 55560000.0, -0.9387338576538735], [1439733312.0, 55580000.0, -0.9343289424566118], [1439735040.0, 55600000.0, -0.929776485888254], [1439736768.0, 55620000.0, -0.9250772068344583], [1439738496.0, 55640000.0, -0.9202318473658709], [1439740224.0, 55660000.0, -0.9152411726209185], [1439741952.0, 55680000.0, -0.910105970684997], [1439743680.0, 55700000.0, -0.9048270524660181], [1439745408.0, 55720000.0, -0.89940525156637], [1439747136.0, 
55740000.0, -0.8938414241512662], [1439748864.0, 55760000.0, -0.8881364488135474], [1439750592.0, 55780000.0, -0.8822912264349532], [1439752320.0, 55800000.0, -0.8763066800438639], [1439754048.0, 55820000.0, -0.8701837546695264], [1439755776.0, 55840000.0, -0.8639234171928364], [1439757504.0, 55860000.0, -0.8575266561936538], [1439759232.0, 55880000.0, -0.8509944817946938], [1439760960.0, 55900000.0, -0.8443279255020175], [1439762688.0, 55920000.0, -0.8375280400421447], [1439764416.0, 55940000.0, -0.8305958991958121], [1439766144.0, 55960000.0, -0.8235325976284273], [1439767872.0, 55980000.0, -0.8163392507171843], [1439769600.0, 56000000.0, -0.8090169943749482], [1439771328.0, 56020000.0, -0.8015669848708779], [1439773056.0, 56040000.0, -0.7939903986478372], [1439774784.0, 56060000.0, -0.7862884321366211], [1439776512.0, 56080000.0, -0.7784623015670262], [1439778240.0, 56100000.0, -0.770513242775788], [1439779968.0, 56120000.0, -0.7624425110114472], [1439781696.0, 56140000.0, -0.7542513807361036], [1439783424.0, 56160000.0, -0.7459411454241824], [1439785152.0, 56180000.0, -0.7375131173581748], [1439786880.0, 56200000.0, -0.728968627421413], [1439788608.0, 56220000.0, -0.7203090248879089], [1439790336.0, 56240000.0, -0.7115356772092879], [1439792064.0, 56260000.0, -0.7026499697988473], [1439793792.0, 56280000.0, -0.6936533058128036], [1439795520.0, 56300000.0, -0.6845471059286878], [1439797248.0, 56320000.0, -0.6753328081210295], [1439798976.0, 56340000.0, -0.666011867434252], [1439800704.0, 56360000.0, -0.6565857557529574], [1439802432.0, 56380000.0, -0.6470559615694459], [1439804160.0, 56400000.0, -0.6374239897486919], [1439805888.0, 56420000.0, -0.6276913612906977], [1439807616.0, 56440000.0, -0.6178596130903322], [1439809344.0, 56460000.0, -0.6079302976946095], [1439811072.0, 56480000.0, -0.5979049830575236], [1439812800.0, 56500000.0, -0.5877852522924728], [1439814528.0, 56520000.0, -0.577572703422268], [1439816256.0, 56540000.0, -0.5672689491267574], [1439817984.0, 56560000.0, -0.5568756164881896], [1439819712.0, 56580000.0, -0.5463943467342655], [1439821440.0, 56600000.0, -0.5358267949789995], [1439823168.0, 56620000.0, -0.5251746299612994], [1439824896.0, 56640000.0, -0.5144395337815109], [1439826624.0, 56660000.0, -0.5036232016357598], [1439828352.0, 56680000.0, -0.4927273415482912], [1439830080.0, 56700000.0, -0.48175367410171555], [1439831808.0, 56720000.0, -0.4707039321653335], [1439833536.0, 56740000.0, -0.45957986062148953], [1439835264.0, 56760000.0, -0.4483832160900346], [1439836992.0, 56780000.0, -0.437115766650936], [1439838720.0, 56800000.0, -0.4257792915650765], [1439840448.0, 56820000.0, -0.41437558099328226], [1439842176.0, 56840000.0, -0.40290643571366147], [1439843904.0, 56860000.0, -0.39137366683720193], [1439845632.0, 56880000.0, -0.37977909552180134], [1439847360.0, 56900000.0, -0.3681245526846789], [1439849088.0, 56920000.0, -0.3564118787132524], [1439850816.0, 56940000.0, -0.3446429231745195], [1439852544.0, 56960000.0, -0.33281954452298984], [1439854272.0, 56980000.0, -0.3209436098072067], [1439856000.0, 57000000.0, -0.3090169943749454], [1439857728.0, 57020000.0, -0.2970415815770336], [1439859456.0, 57040000.0, -0.2850192624699824], [1439861184.0, 57060000.0, -0.27295193551732544], [1439862912.0, 57080000.0, -0.2608415062898979], [1439864640.0, 57100000.0, -0.2486898871648565], [1439866368.0, 57120000.0, -0.23649899702372715], [1439868096.0, 57140000.0, -0.2242707609493775], [1439869824.0, 57160000.0, -0.2120071099220517], [1439871552.0, 57180000.0, 
-0.1997099805144118], [1439873280.0, 57200000.0, -0.1873813145857302], [1439875008.0, 57220000.0, -0.1750230589752754], [1439876736.0, 57240000.0, -0.16263716519488372], [1439878464.0, 57260000.0, -0.15022558912075795], [1439880192.0, 57280000.0, -0.13779029068463974], [1439881920.0, 57300000.0, -0.12533323356429965], [1439883648.0, 57320000.0, -0.11285638487348491], [1439885376.0, 57340000.0, -0.1003617148512189], [1439887104.0, 57360000.0, -0.08785119655074797], [1439888832.0, 57380000.0, -0.07532680552793122], [1439890560.0, 57400000.0, -0.06279051952931265], [1439892288.0, 57420000.0, -0.05024431817976961], [1439894016.0, 57440000.0, -0.037690182669935374], [1439895744.0, 57460000.0, -0.0251300954433391], [1439897472.0, 57480000.0, -0.01256603988335501], [1439899200.0, 57500000.0, -3.1847006584197066e-15], [1439900928.0, 57520000.0, 0.012566039883348642], [1439902656.0, 57540000.0, 0.02513009544333273], [1439904384.0, 57560000.0, 0.03769018266993611], [1439906112.0, 57580000.0, 0.05024431817977035], [1439907840.0, 57600000.0, 0.06279051952931339], [1439909568.0, 57620000.0, 0.07532680552793196], [1439911296.0, 57640000.0, 0.08785119655074163], [1439913024.0, 57660000.0, 0.10036171485121256], [1439914752.0, 57680000.0, 0.11285638487347859], [1439916480.0, 57700000.0, 0.12533323356430037], [1439918208.0, 57720000.0, 0.13779029068464047], [1439919936.0, 57740000.0, 0.15022558912075867], [1439921664.0, 57760000.0, 0.16263716519488444], [1439923392.0, 57780000.0, 0.17502305897526912], [1439925120.0, 57800000.0, 0.18738131458572393], [1439926848.0, 57820000.0, 0.19970998051440556], [1439928576.0, 57840000.0, 0.21200710992205243], [1439930304.0, 57860000.0, 0.2242707609493782], [1439932032.0, 57880000.0, 0.23649899702372787], [1439933760.0, 57900000.0, 0.24868988716485033], [1439935488.0, 57920000.0, 0.2608415062898917], [1439937216.0, 57940000.0, 0.2729519355173193], [1439938944.0, 57960000.0, 0.2850192624699763], [1439940672.0, 57980000.0, 0.2970415815770343], [1439942400.0, 58000000.0, 0.30901699437494606], [1439944128.0, 58020000.0, 0.32094360980720743], [1439945856.0, 58040000.0, 0.33281954452298385], [1439947584.0, 58060000.0, 0.3446429231745135], [1439949312.0, 58080000.0, 0.3564118787132465], [1439951040.0, 58100000.0, 0.368124552684673], [1439952768.0, 58120000.0, 0.379779095521802], [1439954496.0, 58140000.0, 0.3913736668372026], [1439956224.0, 58160000.0, 0.40290643571366214], [1439957952.0, 58180000.0, 0.4143755809932829], [1439959680.0, 58200000.0, 0.4257792915650707], [1439961408.0, 58220000.0, 0.43711576665093027], [1439963136.0, 58240000.0, 0.44838321609002896], [1439964864.0, 58260000.0, 0.45957986062148387], [1439966592.0, 58280000.0, 0.4707039321653342], [1439968320.0, 58300000.0, 0.4817536741017162], [1439970048.0, 58320000.0, 0.4927273415482918], [1439971776.0, 58340000.0, 0.5036232016357604], [1439973504.0, 58360000.0, 0.5144395337815053], [1439975232.0, 58380000.0, 0.5251746299612939], [1439976960.0, 58400000.0, 0.5358267949789942], [1439978688.0, 58420000.0, 0.546394346734266], [1439980416.0, 58440000.0, 0.5568756164881903], [1439982144.0, 58460000.0, 0.5672689491267581], [1439983872.0, 58480000.0, 0.5775727034222685], [1439985600.0, 58500000.0, 0.5877852522924677], [1439987328.0, 58520000.0, 0.5979049830575185], [1439989056.0, 58540000.0, 0.6079302976946044], [1439990784.0, 58560000.0, 0.6178596130903328], [1439992512.0, 58580000.0, 0.6276913612906984], [1439994240.0, 58600000.0, 0.6374239897486924], [1439995968.0, 58620000.0, 0.647055961569441], [1439997696.0, 
58640000.0, 0.6565857557529525], [1439999424.0, 58660000.0, 0.6660118674342472], [1440001152.0, 58680000.0, 0.6753328081210247], [1440002880.0, 58700000.0, 0.6845471059286884], [1440004608.0, 58720000.0, 0.6936533058128042], [1440006336.0, 58740000.0, 0.7026499697988479], [1440008064.0, 58760000.0, 0.7115356772092835], [1440009792.0, 58780000.0, 0.7203090248879045], [1440011520.0, 58800000.0, 0.7289686274214086], [1440013248.0, 58820000.0, 0.7375131173581705], [1440014976.0, 58840000.0, 0.745941145424183], [1440016704.0, 58860000.0, 0.7542513807361041], [1440018432.0, 58880000.0, 0.7624425110114477], [1440020160.0, 58900000.0, 0.7705132427757886], [1440021888.0, 58920000.0, 0.7784623015670222], [1440023616.0, 58940000.0, 0.7862884321366173], [1440025344.0, 58960000.0, 0.7939903986478333], [1440027072.0, 58980000.0, 0.8015669848708741], [1440028800.0, 59000000.0, 0.8090169943749487], [1440030528.0, 59020000.0, 0.8163392507171847], [1440032256.0, 59040000.0, 0.8235325976284278], [1440033984.0, 59060000.0, 0.8305958991958126], [1440035712.0, 59080000.0, 0.8375280400421412], [1440037440.0, 59100000.0, 0.8443279255020141], [1440039168.0, 59120000.0, 0.8509944817946905], [1440040896.0, 59140000.0, 0.8575266561936505], [1440042624.0, 59160000.0, 0.8639234171928368], [1440044352.0, 59180000.0, 0.8701837546695267], [1440046080.0, 59200000.0, 0.8763066800438643], [1440047808.0, 59220000.0, 0.8822912264349502], [1440049536.0, 59240000.0, 0.8881364488135445], [1440051264.0, 59260000.0, 0.8938414241512633], [1440052992.0, 59280000.0, 0.8994052515663703], [1440054720.0, 59300000.0, 0.9048270524660185], [1440056448.0, 59320000.0, 0.9101059706849972], [1440058176.0, 59340000.0, 0.9152411726209188], [1440059904.0, 59360000.0, 0.9202318473658684], [1440061632.0, 59380000.0, 0.9250772068344559], [1440063360.0, 59400000.0, 0.9297764858882517], [1440065088.0, 59420000.0, 0.934328942456612], [1440066816.0, 59440000.0, 0.9387338576538737], [1440068544.0, 59460000.0, 0.942990535892864], [1440070272.0, 59480000.0, 0.9470983049947435], [1440072000.0, 59500000.0, 0.9510565162951525], [1440073728.0, 59520000.0, 0.9548645447466417], [1440075456.0, 59540000.0, 0.9585217890173745], [1440077184.0, 59560000.0, 0.9620276715860863], [1440078912.0, 59580000.0, 0.9653816388332741], [1440080640.0, 59600000.0, 0.9685831611286311], [1440082368.0, 59620000.0, 0.9716317329146738], [1440084096.0, 59640000.0, 0.9745268727865768], [1440085824.0, 59660000.0, 0.9772681235681929], [1440087552.0, 59680000.0, 0.9798550523842462], [1440089280.0, 59700000.0, 0.9822872507286879], [1440091008.0, 59720000.0, 0.9845643345292058], [1440092736.0, 59740000.0, 0.9866859442078684], [1440094464.0, 59760000.0, 0.9886517447379142], [1440096192.0, 59780000.0, 0.9904614256966512], [1440097920.0, 59800000.0, 0.9921147013144778], [1440099648.0, 59820000.0, 0.9936113105200083], [1440101376.0, 59840000.0, 0.9949510169813], [1440103104.0, 59860000.0, 0.9961336091431722], [1440104832.0, 59880000.0, 0.9971589002606142], [1440106560.0, 59900000.0, 0.9980267284282717], [1440108288.0, 59920000.0, 0.9987369566060176], [1440110016.0, 59940000.0, 0.999289472640589], [1440111744.0, 59960000.0, 0.9996841892832999], [1440113472.0, 59980000.0, 0.9999210442038161], [1440115200.0, 60000000.0, 1.0], [1440116928.0, 60020000.0, 0.9999210442038161], [1440118656.0, 60040000.0, 0.9996841892833], [1440120384.0, 60060000.0, 0.9992894726405891], [1440122112.0, 60080000.0, 0.9987369566060177], [1440123840.0, 60100000.0, 0.9980267284282719], [1440125568.0, 60120000.0, 
0.9971589002606144], [1440127296.0, 60140000.0, 0.9961336091431725], [1440129024.0, 60160000.0, 0.9949510169813002], [1440130752.0, 60180000.0, 0.9936113105200086], [1440132480.0, 60200000.0, 0.9921147013144781], [1440134208.0, 60220000.0, 0.9904614256966516], [1440135936.0, 60240000.0, 0.9886517447379146], [1440137664.0, 60260000.0, 0.9866859442078688], [1440139392.0, 60280000.0, 0.9845643345292063], [1440141120.0, 60300000.0, 0.9822872507286885], [1440142848.0, 60320000.0, 0.9798550523842469], [1440144576.0, 60340000.0, 0.9772681235681936], [1440146304.0, 60360000.0, 0.9745268727865775], [1440148032.0, 60380000.0, 0.9716317329146744], [1440149760.0, 60400000.0, 0.9685831611286319], [1440151488.0, 60420000.0, 0.9653816388332749], [1440153216.0, 60440000.0, 0.9620276715860872], [1440154944.0, 60460000.0, 0.9585217890173754], [1440156672.0, 60480000.0, 0.9548645447466426], [1440158400.0, 60500000.0, 0.9510565162951535], [1440160128.0, 60520000.0, 0.9470983049947445], [1440161856.0, 60540000.0, 0.9429905358928649], [1440163584.0, 60560000.0, 0.9387338576538747], [1440165312.0, 60580000.0, 0.934328942456613], [1440167040.0, 60600000.0, 0.9297764858882527], [1440168768.0, 60620000.0, 0.925077206834457], [1440170496.0, 60640000.0, 0.9202318473658696], [1440172224.0, 60660000.0, 0.91524117262092], [1440173952.0, 60680000.0, 0.9101059706849984], [1440175680.0, 60700000.0, 0.9048270524660197], [1440177408.0, 60720000.0, 0.8994052515663715], [1440179136.0, 60740000.0, 0.8938414241512647], [1440180864.0, 60760000.0, 0.8881364488135458], [1440182592.0, 60780000.0, 0.8822912264349516], [1440184320.0, 60800000.0, 0.8763066800438657], [1440186048.0, 60820000.0, 0.8701837546695281], [1440187776.0, 60840000.0, 0.8639234171928383], [1440189504.0, 60860000.0, 0.857526656193652], [1440191232.0, 60880000.0, 0.8509944817946921], [1440192960.0, 60900000.0, 0.8443279255020157], [1440194688.0, 60920000.0, 0.8375280400421428], [1440196416.0, 60940000.0, 0.8305958991958141], [1440198144.0, 60960000.0, 0.8235325976284295], [1440199872.0, 60980000.0, 0.8163392507171864], [1440201600.0, 61000000.0, 0.8090169943749503], [1440203328.0, 61020000.0, 0.8015669848708759], [1440205056.0, 61040000.0, 0.793990398647835], [1440206784.0, 61060000.0, 0.786288432136619], [1440208512.0, 61080000.0, 0.7784623015670241], [1440210240.0, 61100000.0, 0.7705132427757904], [1440211968.0, 61120000.0, 0.7624425110114496], [1440213696.0, 61140000.0, 0.7542513807361061], [1440215424.0, 61160000.0, 0.7459411454241849], [1440217152.0, 61180000.0, 0.7375131173581725], [1440218880.0, 61200000.0, 0.7289686274214106], [1440220608.0, 61220000.0, 0.7203090248879065], [1440222336.0, 61240000.0, 0.7115356772092855], [1440224064.0, 61260000.0, 0.7026499697988499], [1440225792.0, 61280000.0, 0.6936533058128063], [1440227520.0, 61300000.0, 0.6845471059286905], [1440229248.0, 61320000.0, 0.6753328081210269], [1440230976.0, 61340000.0, 0.6660118674342494], [1440232704.0, 61360000.0, 0.6565857557529547], [1440234432.0, 61380000.0, 0.6470559615694432], [1440236160.0, 61400000.0, 0.6374239897486947], [1440237888.0, 61420000.0, 0.6276913612907006], [1440239616.0, 61440000.0, 0.6178596130903351], [1440241344.0, 61460000.0, 0.6079302976946067], [1440243072.0, 61480000.0, 0.5979049830575209], [1440244800.0, 61500000.0, 0.58778525229247], [1440246528.0, 61520000.0, 0.577572703422271], [1440248256.0, 61540000.0, 0.5672689491267605], [1440249984.0, 61560000.0, 0.5568756164881927], [1440251712.0, 61580000.0, 0.5463943467342685], [1440253440.0, 61600000.0, 
0.5358267949789967], [1440255168.0, 61620000.0, 0.5251746299612965], [1440256896.0, 61640000.0, 0.5144395337815079], [1440258624.0, 61660000.0, 0.5036232016357629], [1440260352.0, 61680000.0, 0.49272734154829434], [1440262080.0, 61700000.0, 0.48175367410171877], [1440263808.0, 61720000.0, 0.4707039321653368], [1440265536.0, 61740000.0, 0.4595798606214865], [1440267264.0, 61760000.0, 0.44838321609003157], [1440268992.0, 61780000.0, 0.43711576665093294], [1440270720.0, 61800000.0, 0.4257792915650734], [1440272448.0, 61820000.0, 0.4143755809932856], [1440274176.0, 61840000.0, 0.4029064357136648], [1440275904.0, 61860000.0, 0.3913736668372053], [1440277632.0, 61880000.0, 0.3797790955218047], [1440279360.0, 61900000.0, 0.36812455268467575], [1440281088.0, 61920000.0, 0.3564118787132492], [1440282816.0, 61940000.0, 0.3446429231745163], [1440284544.0, 61960000.0, 0.3328195445229866], [1440286272.0, 61980000.0, 0.3209436098072102], [1440288000.0, 62000000.0, 0.3090169943749489], [1440289728.0, 62020000.0, 0.2970415815770371], [1440291456.0, 62040000.0, 0.2850192624699791], [1440293184.0, 62060000.0, 0.2729519355173221], [1440294912.0, 62080000.0, 0.26084150628989455], [1440296640.0, 62100000.0, 0.24868988716485316], [1440298368.0, 62120000.0, 0.23649899702373073], [1440300096.0, 62140000.0, 0.2242707609493811], [1440301824.0, 62160000.0, 0.2120071099220553], [1440303552.0, 62180000.0, 0.19970998051440844], [1440305280.0, 62200000.0, 0.18738131458572682], [1440307008.0, 62220000.0, 0.175023058975272], [1440308736.0, 62240000.0, 0.16263716519488736], [1440310464.0, 62260000.0, 0.1502255891207616], [1440312192.0, 62280000.0, 0.13779029068464338], [1440313920.0, 62300000.0, 0.1253332335643033], [1440315648.0, 62320000.0, 0.11285638487348151], [1440317376.0, 62340000.0, 0.10036171485121549], [1440319104.0, 62360000.0, 0.08785119655074455], [1440320832.0, 62380000.0, 0.07532680552793489], [1440322560.0, 62400000.0, 0.06279051952931632], [1440324288.0, 62420000.0, 0.05024431817977328], [1440326016.0, 62440000.0, 0.03769018266993905], [1440327744.0, 62460000.0, 0.02513009544333567], [1440329472.0, 62480000.0, 0.01256603988335158], [1440331200.0, 62500000.0, -2.45548340466059e-16], [1440332928.0, 62520000.0, -0.012566039883352071], [1440334656.0, 62540000.0, -0.02513009544333616], [1440336384.0, 62560000.0, -0.03769018266993244], [1440338112.0, 62580000.0, -0.05024431817976668], [1440339840.0, 62600000.0, -0.06279051952930972], [1440341568.0, 62620000.0, -0.07532680552792828], [1440343296.0, 62640000.0, -0.08785119655074504], [1440345024.0, 62660000.0, -0.10036171485121598], [1440346752.0, 62680000.0, -0.112856384873482], [1440348480.0, 62700000.0, -0.12533323356429674], [1440350208.0, 62720000.0, -0.13779029068463683], [1440351936.0, 62740000.0, -0.15022558912075504], [1440353664.0, 62760000.0, -0.16263716519488083], [1440355392.0, 62780000.0, -0.1750230589752725], [1440357120.0, 62800000.0, -0.18738131458572732], [1440358848.0, 62820000.0, -0.1997099805144089], [1440360576.0, 62840000.0, -0.21200710992204883], [1440362304.0, 62860000.0, -0.22427076094937462], [1440364032.0, 62880000.0, -0.2364989970237243], [1440365760.0, 62900000.0, -0.24868988716485363], [1440367488.0, 62920000.0, -0.26084150628989505], [1440369216.0, 62940000.0, -0.2729519355173226], [1440370944.0, 62960000.0, -0.28501926246997955], [1440372672.0, 62980000.0, -0.2970415815770308], [1440374400.0, 63000000.0, -0.30901699437494257], [1440376128.0, 63020000.0, -0.32094360980720393], [1440377856.0, 63040000.0, -0.33281954452298707], 
[1440379584.0, 63060000.0, -0.3446429231745167], [1440381312.0, 63080000.0, -0.35641187871324964], [1440383040.0, 63100000.0, -0.3681245526846762], [1440384768.0, 63120000.0, -0.3797790955217986], [1440386496.0, 63140000.0, -0.3913736668371992], [1440388224.0, 63160000.0, -0.40290643571365875], [1440389952.0, 63180000.0, -0.41437558099327954], [1440391680.0, 63200000.0, -0.4257792915650738], [1440393408.0, 63220000.0, -0.4371157666509334], [1440395136.0, 63240000.0, -0.448383216090032], [1440396864.0, 63260000.0, -0.4595798606214869], [1440398592.0, 63280000.0, -0.47070393216533096], [1440400320.0, 63300000.0, -0.481753674101713], [1440402048.0, 63320000.0, -0.4927273415482886], [1440403776.0, 63340000.0, -0.5036232016357572], [1440405504.0, 63360000.0, -0.5144395337815083], [1440407232.0, 63380000.0, -0.5251746299612968], [1440408960.0, 63400000.0, -0.5358267949789971], [1440410688.0, 63420000.0, -0.546394346734263], [1440412416.0, 63440000.0, -0.5568756164881872], [1440414144.0, 63460000.0, -0.5672689491267551], [1440415872.0, 63480000.0, -0.5775727034222655], [1440417600.0, 63500000.0, -0.5877852522924705], [1440419328.0, 63520000.0, -0.5979049830575213], [1440421056.0, 63540000.0, -0.6079302976946072], [1440422784.0, 63560000.0, -0.6178596130903299], [1440424512.0, 63580000.0, -0.6276913612906955], [1440426240.0, 63600000.0, -0.6374239897486896], [1440427968.0, 63620000.0, -0.6470559615694436], [1440429696.0, 63640000.0, -0.6565857557529552], [1440431424.0, 63660000.0, -0.6660118674342498], [1440433152.0, 63680000.0, -0.6753328081210273], [1440434880.0, 63700000.0, -0.6845471059286857], [1440436608.0, 63720000.0, -0.6936533058128015], [1440438336.0, 63740000.0, -0.7026499697988452], [1440440064.0, 63760000.0, -0.7115356772092858], [1440441792.0, 63780000.0, -0.7203090248879068], [1440443520.0, 63800000.0, -0.7289686274214109], [1440445248.0, 63820000.0, -0.7375131173581728], [1440446976.0, 63840000.0, -0.7459411454241806], [1440448704.0, 63860000.0, -0.7542513807361018], [1440450432.0, 63880000.0, -0.7624425110114453], [1440452160.0, 63900000.0, -0.7705132427757861], [1440453888.0, 63920000.0, -0.7784623015670245], [1440455616.0, 63940000.0, -0.7862884321366194], [1440457344.0, 63960000.0, -0.7939903986478353], [1440459072.0, 63980000.0, -0.8015669848708761], [1440460800.0, 64000000.0, -0.8090169943749465], [1440462528.0, 64020000.0, -0.8163392507171826], [1440464256.0, 64040000.0, -0.8235325976284257], [1440465984.0, 64060000.0, -0.8305958991958104], [1440467712.0, 64080000.0, -0.837528040042143], [1440469440.0, 64100000.0, -0.844327925502016], [1440471168.0, 64120000.0, -0.8509944817946923], [1440472896.0, 64140000.0, -0.8575266561936523], [1440474624.0, 64160000.0, -0.8639234171928349], [1440476352.0, 64180000.0, -0.8701837546695249], [1440478080.0, 64200000.0, -0.8763066800438625], [1440479808.0, 64220000.0, -0.8822912264349518], [1440481536.0, 64240000.0, -0.888136448813546], [1440483264.0, 64260000.0, -0.8938414241512649], [1440484992.0, 64280000.0, -0.8994052515663686], [1440486720.0, 64300000.0, -0.9048270524660169], [1440488448.0, 64320000.0, -0.9101059706849958], [1440490176.0, 64340000.0, -0.9152411726209173], [1440491904.0, 64360000.0, -0.9202318473658698], [1440493632.0, 64380000.0, -0.9250772068344572], [1440495360.0, 64400000.0, -0.9297764858882529], [1440497088.0, 64420000.0, -0.9343289424566107], [1440498816.0, 64440000.0, -0.9387338576538725], [1440500544.0, 64460000.0, -0.9429905358928626], [1440502272.0, 64480000.0, -0.9470983049947446], [1440504000.0, 64500000.0, 
-0.9510565162951536], [1440505728.0, 64520000.0, -0.9548645447466428], [1440507456.0, 64540000.0, -0.9585217890173755], [1440509184.0, 64560000.0, -0.9620276715860853], [1440510912.0, 64580000.0, -0.9653816388332731], [1440512640.0, 64600000.0, -0.9685831611286302], [1440514368.0, 64620000.0, -0.9716317329146729], [1440516096.0, 64640000.0, -0.9745268727865776], [1440517824.0, 64660000.0, -0.9772681235681937], [1440519552.0, 64680000.0, -0.979855052384247], [1440521280.0, 64700000.0, -0.9822872507286886], [1440523008.0, 64720000.0, -0.9845643345292051], [1440524736.0, 64740000.0, -0.9866859442078677], [1440526464.0, 64760000.0, -0.9886517447379136], [1440528192.0, 64780000.0, -0.9904614256966507], [1440529920.0, 64800000.0, -0.9921147013144782], [1440531648.0, 64820000.0, -0.9936113105200086], [1440533376.0, 64840000.0, -0.9949510169813003], [1440535104.0, 64860000.0, -0.9961336091431725], [1440536832.0, 64880000.0, -0.9971589002606139], [1440538560.0, 64900000.0, -0.9980267284282714], [1440540288.0, 64920000.0, -0.9987369566060174], [1440542016.0, 64940000.0, -0.9992894726405891], [1440543744.0, 64960000.0, -0.9996841892833], [1440545472.0, 64980000.0, -0.9999210442038161], [1440547200.0, 65000000.0, -1.0], [1440548928.0, 65020000.0, -0.9999210442038162], [1440550656.0, 65040000.0, -0.9996841892833002], [1440552384.0, 65060000.0, -0.9992894726405893], [1440554112.0, 65080000.0, -0.9987369566060176], [1440555840.0, 65100000.0, -0.9980267284282717], [1440557568.0, 65120000.0, -0.9971589002606142], [1440559296.0, 65140000.0, -0.9961336091431727], [1440561024.0, 65160000.0, -0.9949510169813006], [1440562752.0, 65180000.0, -0.993611310520009], [1440564480.0, 65200000.0, -0.9921147013144785], [1440566208.0, 65220000.0, -0.9904614256966512], [1440567936.0, 65240000.0, -0.9886517447379141], [1440569664.0, 65260000.0, -0.9866859442078683], [1440571392.0, 65280000.0, -0.9845643345292057], [1440573120.0, 65300000.0, -0.9822872507286892], [1440574848.0, 65320000.0, -0.9798550523842475], [1440576576.0, 65340000.0, -0.9772681235681944], [1440578304.0, 65360000.0, -0.9745268727865782], [1440580032.0, 65380000.0, -0.9716317329146736], [1440581760.0, 65400000.0, -0.968583161128631], [1440583488.0, 65420000.0, -0.965381638833274], [1440585216.0, 65440000.0, -0.9620276715860862], [1440586944.0, 65460000.0, -0.9585217890173764], [1440588672.0, 65480000.0, -0.9548645447466437], [1440590400.0, 65500000.0, -0.9510565162951546], [1440592128.0, 65520000.0, -0.9470983049947456], [1440593856.0, 65540000.0, -0.9429905358928637], [1440595584.0, 65560000.0, -0.9387338576538736], [1440597312.0, 65580000.0, -0.9343289424566118], [1440599040.0, 65600000.0, -0.9297764858882541], [1440600768.0, 65620000.0, -0.9250772068344584], [1440602496.0, 65640000.0, -0.920231847365871], [1440604224.0, 65660000.0, -0.9152411726209185], [1440605952.0, 65680000.0, -0.9101059706849971], [1440607680.0, 65700000.0, -0.9048270524660182], [1440609408.0, 65720000.0, -0.8994052515663701], [1440611136.0, 65740000.0, -0.8938414241512663], [1440612864.0, 65760000.0, -0.8881364488135475], [1440614592.0, 65780000.0, -0.8822912264349534], [1440616320.0, 65800000.0, -0.876306680043864], [1440618048.0, 65820000.0, -0.8701837546695265], [1440619776.0, 65840000.0, -0.8639234171928366], [1440621504.0, 65860000.0, -0.857526656193654], [1440623232.0, 65880000.0, -0.850994481794694], [1440624960.0, 65900000.0, -0.8443279255020176], [1440626688.0, 65920000.0, -0.8375280400421448], [1440628416.0, 65940000.0, -0.8305958991958122], [1440630144.0, 65960000.0, 
-0.8235325976284275], [1440631872.0, 65980000.0, -0.8163392507171844], [1440633600.0, 66000000.0, -0.8090169943749483], [1440635328.0, 66020000.0, -0.801566984870878], [1440637056.0, 66040000.0, -0.7939903986478373], [1440638784.0, 66060000.0, -0.7862884321366214], [1440640512.0, 66080000.0, -0.7784623015670264], [1440642240.0, 66100000.0, -0.7705132427757883], [1440643968.0, 66120000.0, -0.7624425110114473], [1440645696.0, 66140000.0, -0.7542513807361038], [1440647424.0, 66160000.0, -0.7459411454241827], [1440649152.0, 66180000.0, -0.7375131173581749], [1440650880.0, 66200000.0, -0.7289686274214131], [1440652608.0, 66220000.0, -0.720309024887909], [1440654336.0, 66240000.0, -0.7115356772092881], [1440656064.0, 66260000.0, -0.7026499697988475], [1440657792.0, 66280000.0, -0.6936533058128038], [1440659520.0, 66300000.0, -0.6845471059286881], [1440661248.0, 66320000.0, -0.6753328081210296], [1440662976.0, 66340000.0, -0.6660118674342521], [1440664704.0, 66360000.0, -0.6565857557529575], [1440666432.0, 66380000.0, -0.647055961569446], [1440668160.0, 66400000.0, -0.6374239897486921], [1440669888.0, 66420000.0, -0.6276913612906979], [1440671616.0, 66440000.0, -0.6178596130903324], [1440673344.0, 66460000.0, -0.6079302976946097], [1440675072.0, 66480000.0, -0.5979049830575238], [1440676800.0, 66500000.0, -0.587785252292473], [1440678528.0, 66520000.0, -0.5775727034222681], [1440680256.0, 66540000.0, -0.5672689491267576], [1440681984.0, 66560000.0, -0.5568756164881898], [1440683712.0, 66580000.0, -0.5463943467342657], [1440685440.0, 66600000.0, -0.5358267949789998], [1440687168.0, 66620000.0, -0.5251746299612996], [1440688896.0, 66640000.0, -0.514439533781511], [1440690624.0, 66660000.0, -0.50362320163576], [1440692352.0, 66680000.0, -0.4927273415482914], [1440694080.0, 66700000.0, -0.48175367410171577], [1440695808.0, 66720000.0, -0.47070393216533374], [1440697536.0, 66740000.0, -0.45957986062148976], [1440699264.0, 66760000.0, -0.44838321609003484], [1440700992.0, 66780000.0, -0.4371157666509362], [1440702720.0, 66800000.0, -0.4257792915650767], [1440704448.0, 66820000.0, -0.4143755809932825], [1440706176.0, 66840000.0, -0.4029064357136617], [1440707904.0, 66860000.0, -0.39137366683720215], [1440709632.0, 66880000.0, -0.37977909552180156], [1440711360.0, 66900000.0, -0.36812455268467914], [1440713088.0, 66920000.0, -0.35641187871325264], [1440714816.0, 66940000.0, -0.3446429231745197], [1440716544.0, 66960000.0, -0.33281954452299006], [1440718272.0, 66980000.0, -0.32094360980720693], [1440720000.0, 67000000.0, -0.3090169943749456], [1440721728.0, 67020000.0, -0.2970415815770338], [1440723456.0, 67040000.0, -0.2850192624699826], [1440725184.0, 67060000.0, -0.27295193551732566], [1440726912.0, 67080000.0, -0.2608415062898981], [1440728640.0, 67100000.0, -0.24868988716485674], [1440730368.0, 67120000.0, -0.2364989970237274], [1440732096.0, 67140000.0, -0.22427076094937773], [1440733824.0, 67160000.0, -0.21200710992205193], [1440735552.0, 67180000.0, -0.19970998051441205], [1440737280.0, 67200000.0, -0.18738131458573043], [1440739008.0, 67220000.0, -0.17502305897527565], [1440740736.0, 67240000.0, -0.16263716519488397], [1440742464.0, 67260000.0, -0.1502255891207582], [1440744192.0, 67280000.0, -0.13779029068463997], [1440745920.0, 67300000.0, -0.1253332335642999], [1440747648.0, 67320000.0, -0.11285638487348516], [1440749376.0, 67340000.0, -0.10036171485121914], [1440751104.0, 67360000.0, -0.08785119655074822], [1440752832.0, 67380000.0, -0.07532680552793146], [1440754560.0, 67400000.0, 
-0.0627905195293129], [1440756288.0, 67420000.0, -0.050244318179769855], [1440758016.0, 67440000.0, -0.037690182669935623], [1440759744.0, 67460000.0, -0.025130095443339343], [1440761472.0, 67480000.0, -0.012566039883355255], [1440763200.0, 67500000.0, -3.4296300182491773e-15], [1440764928.0, 67520000.0, 0.012566039883348397], [1440766656.0, 67540000.0, 0.025130095443332487], [1440768384.0, 67560000.0, 0.037690182669935866], [1440770112.0, 67580000.0, 0.050244318179770105], [1440771840.0, 67600000.0, 0.06279051952931314], [1440773568.0, 67620000.0, 0.07532680552793171], [1440775296.0, 67640000.0, 0.08785119655074139], [1440777024.0, 67660000.0, 0.10036171485121233], [1440778752.0, 67680000.0, 0.11285638487347835], [1440780480.0, 67700000.0, 0.12533323356430012], [1440782208.0, 67720000.0, 0.13779029068464022], [1440783936.0, 67740000.0, 0.15022558912075842], [1440785664.0, 67760000.0, 0.16263716519488422], [1440787392.0, 67780000.0, 0.17502305897526887], [1440789120.0, 67800000.0, 0.18738131458572369], [1440790848.0, 67820000.0, 0.19970998051440533], [1440792576.0, 67840000.0, 0.21200710992205218], [1440794304.0, 67860000.0, 0.22427076094937798], [1440796032.0, 67880000.0, 0.23649899702372762], [1440797760.0, 67900000.0, 0.24868988716485008], [1440799488.0, 67920000.0, 0.2608415062898915], [1440801216.0, 67940000.0, 0.27295193551731906], [1440802944.0, 67960000.0, 0.28501926246997605], [1440804672.0, 67980000.0, 0.2970415815770341], [1440806400.0, 68000000.0, 0.30901699437494584], [1440808128.0, 68020000.0, 0.32094360980720715], [1440809856.0, 68040000.0, 0.3328195445229836], [1440811584.0, 68060000.0, 0.3446429231745133], [1440813312.0, 68080000.0, 0.3564118787132462], [1440815040.0, 68100000.0, 0.36812455268467276], [1440816768.0, 68120000.0, 0.3797790955218018], [1440818496.0, 68140000.0, 0.39137366683720237], [1440820224.0, 68160000.0, 0.4029064357136619], [1440821952.0, 68180000.0, 0.4143755809932827], [1440823680.0, 68200000.0, 0.4257792915650705], [1440825408.0, 68220000.0, 0.43711576665093005], [1440827136.0, 68240000.0, 0.44838321609002874], [1440828864.0, 68260000.0, 0.45957986062148365], [1440830592.0, 68280000.0, 0.47070393216533396], [1440832320.0, 68300000.0, 0.481753674101716], [1440834048.0, 68320000.0, 0.4927273415482916], [1440835776.0, 68340000.0, 0.5036232016357602], [1440837504.0, 68360000.0, 0.5144395337815051], [1440839232.0, 68380000.0, 0.5251746299612937], [1440840960.0, 68400000.0, 0.535826794978994], [1440842688.0, 68420000.0, 0.5463943467342658], [1440844416.0, 68440000.0, 0.5568756164881901], [1440846144.0, 68460000.0, 0.5672689491267578], [1440847872.0, 68480000.0, 0.5775727034222683], [1440849600.0, 68500000.0, 0.5877852522924675], [1440851328.0, 68520000.0, 0.5979049830575184], [1440853056.0, 68540000.0, 0.6079302976946043], [1440854784.0, 68560000.0, 0.6178596130903325], [1440856512.0, 68580000.0, 0.6276913612906981], [1440858240.0, 68600000.0, 0.6374239897486923], [1440859968.0, 68620000.0, 0.6470559615694408], [1440861696.0, 68640000.0, 0.6565857557529524], [1440863424.0, 68660000.0, 0.666011867434247], [1440865152.0, 68680000.0, 0.6753328081210246], [1440866880.0, 68700000.0, 0.6845471059286882], [1440868608.0, 68720000.0, 0.6936533058128039], [1440870336.0, 68740000.0, 0.7026499697988476], [1440872064.0, 68760000.0, 0.7115356772092832], [1440873792.0, 68780000.0, 0.7203090248879043], [1440875520.0, 68800000.0, 0.7289686274214084], [1440877248.0, 68820000.0, 0.7375131173581703], [1440878976.0, 68840000.0, 0.7459411454241828], [1440880704.0, 68860000.0, 
0.754251380736104], [1440882432.0, 68880000.0, 0.7624425110114476], [1440884160.0, 68900000.0, 0.7705132427757884], [1440885888.0, 68920000.0, 0.7784623015670221], [1440887616.0, 68940000.0, 0.7862884321366171], [1440889344.0, 68960000.0, 0.793990398647833], [1440891072.0, 68980000.0, 0.8015669848708739], [1440892800.0, 69000000.0, 0.8090169943749486], [1440894528.0, 69020000.0, 0.8163392507171846], [1440896256.0, 69040000.0, 0.8235325976284276], [1440897984.0, 69060000.0, 0.8305958991958123], [1440899712.0, 69080000.0, 0.837528040042141], [1440901440.0, 69100000.0, 0.844327925502014], [1440903168.0, 69120000.0, 0.8509944817946904], [1440904896.0, 69140000.0, 0.8575266561936504], [1440906624.0, 69160000.0, 0.8639234171928367], [1440908352.0, 69180000.0, 0.8701837546695266], [1440910080.0, 69200000.0, 0.8763066800438641], [1440911808.0, 69220000.0, 0.8822912264349501], [1440913536.0, 69240000.0, 0.8881364488135444], [1440915264.0, 69260000.0, 0.8938414241512632], [1440916992.0, 69280000.0, 0.8994052515663702], [1440918720.0, 69300000.0, 0.9048270524660184], [1440920448.0, 69320000.0, 0.9101059706849972], [1440922176.0, 69340000.0, 0.9152411726209186], [1440923904.0, 69360000.0, 0.9202318473658684], [1440925632.0, 69380000.0, 0.9250772068344558], [1440927360.0, 69400000.0, 0.9297764858882516], [1440929088.0, 69420000.0, 0.9343289424566119], [1440930816.0, 69440000.0, 0.9387338576538737], [1440932544.0, 69460000.0, 0.9429905358928639], [1440934272.0, 69480000.0, 0.9470983049947435], [1440936000.0, 69500000.0, 0.9510565162951525], [1440937728.0, 69520000.0, 0.9548645447466417], [1440939456.0, 69540000.0, 0.9585217890173745], [1440941184.0, 69560000.0, 0.9620276715860863], [1440942912.0, 69580000.0, 0.965381638833274], [1440944640.0, 69600000.0, 0.9685831611286311], [1440946368.0, 69620000.0, 0.9716317329146736], [1440948096.0, 69640000.0, 0.9745268727865767], [1440949824.0, 69660000.0, 0.9772681235681929], [1440951552.0, 69680000.0, 0.9798550523842462], [1440953280.0, 69700000.0, 0.9822872507286879], [1440955008.0, 69720000.0, 0.9845643345292058], [1440956736.0, 69740000.0, 0.9866859442078683], [1440958464.0, 69760000.0, 0.9886517447379141], [1440960192.0, 69780000.0, 0.9904614256966512], [1440961920.0, 69800000.0, 0.9921147013144777], [1440963648.0, 69820000.0, 0.9936113105200082], [1440965376.0, 69840000.0, 0.9949510169813], [1440967104.0, 69860000.0, 0.9961336091431722], [1440968832.0, 69880000.0, 0.9971589002606142], [1440970560.0, 69900000.0, 0.9980267284282717], [1440972288.0, 69920000.0, 0.9987369566060176], [1440974016.0, 69940000.0, 0.999289472640589], [1440975744.0, 69960000.0, 0.9996841892832999], [1440977472.0, 69980000.0, 0.9999210442038161], [1440979200.0, 70000000.0, 1.0], [1440980928.0, 70020000.0, 0.9999210442038161], [1440982656.0, 70040000.0, 0.9996841892833], [1440984384.0, 70060000.0, 0.9992894726405891], [1440986112.0, 70080000.0, 0.9987369566060177], [1440987840.0, 70100000.0, 0.9980267284282719], [1440989568.0, 70120000.0, 0.9971589002606144], [1440991296.0, 70140000.0, 0.9961336091431725], [1440993024.0, 70160000.0, 0.9949510169813003], [1440994752.0, 70180000.0, 0.9936113105200086], [1440996480.0, 70200000.0, 0.9921147013144781], [1440998208.0, 70220000.0, 0.9904614256966516], [1440999936.0, 70240000.0, 0.9886517447379146], [1441001664.0, 70260000.0, 0.9866859442078688], [1441003392.0, 70280000.0, 0.9845643345292063], [1441005120.0, 70300000.0, 0.9822872507286885], [1441006848.0, 70320000.0, 0.9798550523842469], [1441008576.0, 70340000.0, 0.9772681235681936], 
[1441010304.0, 70360000.0, 0.9745268727865775], [1441012032.0, 70380000.0, 0.9716317329146745], [1441013760.0, 70400000.0, 0.9685831611286319], [1441015488.0, 70420000.0, 0.9653816388332749], [1441017216.0, 70440000.0, 0.9620276715860872], [1441018944.0, 70460000.0, 0.9585217890173754], [1441020672.0, 70480000.0, 0.9548645447466427], [1441022400.0, 70500000.0, 0.9510565162951535], [1441024128.0, 70520000.0, 0.9470983049947446], [1441025856.0, 70540000.0, 0.942990535892865], [1441027584.0, 70560000.0, 0.9387338576538748], [1441029312.0, 70580000.0, 0.9343289424566131], [1441031040.0, 70600000.0, 0.9297764858882528], [1441032768.0, 70620000.0, 0.9250772068344572], [1441034496.0, 70640000.0, 0.9202318473658697], [1441036224.0, 70660000.0, 0.9152411726209201], [1441037952.0, 70680000.0, 0.9101059706849985], [1441039680.0, 70700000.0, 0.9048270524660198], [1441041408.0, 70720000.0, 0.8994052515663716], [1441043136.0, 70740000.0, 0.8938414241512648], [1441044864.0, 70760000.0, 0.8881364488135459], [1441046592.0, 70780000.0, 0.8822912264349517], [1441048320.0, 70800000.0, 0.8763066800438658], [1441050048.0, 70820000.0, 0.8701837546695284], [1441051776.0, 70840000.0, 0.8639234171928384], [1441053504.0, 70860000.0, 0.8575266561936522], [1441055232.0, 70880000.0, 0.8509944817946922], [1441056960.0, 70900000.0, 0.8443279255020159], [1441058688.0, 70920000.0, 0.8375280400421429], [1441060416.0, 70940000.0, 0.8305958991958143], [1441062144.0, 70960000.0, 0.8235325976284296], [1441063872.0, 70980000.0, 0.8163392507171866], [1441065600.0, 71000000.0, 0.8090169943749506], [1441067328.0, 71020000.0, 0.801566984870876], [1441069056.0, 71040000.0, 0.7939903986478352], [1441070784.0, 71060000.0, 0.7862884321366193], [1441072512.0, 71080000.0, 0.7784623015670242], [1441074240.0, 71100000.0, 0.7705132427757906], [1441075968.0, 71120000.0, 0.7624425110114498], [1441077696.0, 71140000.0, 0.7542513807361062], [1441079424.0, 71160000.0, 0.7459411454241851], [1441081152.0, 71180000.0, 0.7375131173581726], [1441082880.0, 71200000.0, 0.7289686274214108], [1441084608.0, 71220000.0, 0.7203090248879067], [1441086336.0, 71240000.0, 0.7115356772092857], [1441088064.0, 71260000.0, 0.7026499697988501], [1441089792.0, 71280000.0, 0.6936533058128064], [1441091520.0, 71300000.0, 0.6845471059286907], [1441093248.0, 71320000.0, 0.6753328081210271], [1441094976.0, 71340000.0, 0.6660118674342496], [1441096704.0, 71360000.0, 0.656585755752955], [1441098432.0, 71380000.0, 0.6470559615694434], [1441100160.0, 71400000.0, 0.6374239897486949], [1441101888.0, 71420000.0, 0.6276913612907008], [1441103616.0, 71440000.0, 0.6178596130903353], [1441105344.0, 71460000.0, 0.6079302976946069], [1441107072.0, 71480000.0, 0.597904983057521], [1441108800.0, 71500000.0, 0.5877852522924703], [1441110528.0, 71520000.0, 0.5775727034222711], [1441112256.0, 71540000.0, 0.5672689491267607], [1441113984.0, 71560000.0, 0.5568756164881928], [1441115712.0, 71580000.0, 0.5463943467342687], [1441117440.0, 71600000.0, 0.5358267949789969], [1441119168.0, 71620000.0, 0.5251746299612966], [1441120896.0, 71640000.0, 0.5144395337815081], [1441122624.0, 71660000.0, 0.5036232016357631], [1441124352.0, 71680000.0, 0.49272734154829456], [1441126080.0, 71700000.0, 0.481753674101719], [1441127808.0, 71720000.0, 0.470703932165337], [1441129536.0, 71740000.0, 0.4595798606214867], [1441131264.0, 71760000.0, 0.4483832160900318], [1441132992.0, 71780000.0, 0.43711576665093316], [1441134720.0, 71800000.0, 0.4257792915650736], [1441136448.0, 71820000.0, 0.4143755809932858], 
[1441138176.0, 71840000.0, 0.402906435713665], [1441139904.0, 71860000.0, 0.39137366683720554], [1441141632.0, 71880000.0, 0.37977909552180494], [1441143360.0, 71900000.0, 0.368124552684676], [1441145088.0, 71920000.0, 0.3564118787132494], [1441146816.0, 71940000.0, 0.3446429231745165], [1441148544.0, 71960000.0, 0.33281954452298684], [1441150272.0, 71980000.0, 0.32094360980721043], [1441152000.0, 72000000.0, 0.3090169943749491], [1441153728.0, 72020000.0, 0.29704158157703736], [1441155456.0, 72040000.0, 0.28501926246997933], [1441157184.0, 72060000.0, 0.27295193551732233], [1441158912.0, 72080000.0, 0.2608415062898948], [1441160640.0, 72100000.0, 0.2486898871648534], [1441162368.0, 72120000.0, 0.23649899702373095], [1441164096.0, 72140000.0, 0.2242707609493813], [1441165824.0, 72160000.0, 0.21200710992205554], [1441167552.0, 72180000.0, 0.1997099805144087], [1441169280.0, 72200000.0, 0.18738131458572707], [1441171008.0, 72220000.0, 0.17502305897527226], [1441172736.0, 72240000.0, 0.16263716519488758], [1441174464.0, 72260000.0, 0.15022558912076184], [1441176192.0, 72280000.0, 0.13779029068464363], [1441177920.0, 72300000.0, 0.12533323356430354], [1441179648.0, 72320000.0, 0.11285638487348175], [1441181376.0, 72340000.0, 0.10036171485121573], [1441183104.0, 72360000.0, 0.0878511965507448], [1441184832.0, 72380000.0, 0.07532680552793512], [1441186560.0, 72400000.0, 0.06279051952931657], [1441188288.0, 72420000.0, 0.050244318179773526], [1441190016.0, 72440000.0, 0.037690182669939294], [1441191744.0, 72460000.0, 0.025130095443335915], [1441193472.0, 72480000.0, 0.012566039883351825], [1441195200.0, 72500000.0, -6.189806365883577e-19], [1441196928.0, 72520000.0, -0.012566039883351827], [1441198656.0, 72540000.0, -0.025130095443335915], [1441200384.0, 72560000.0, -0.037690182669932196], [1441202112.0, 72580000.0, -0.050244318179766434], [1441203840.0, 72600000.0, -0.06279051952930947], [1441205568.0, 72620000.0, -0.07532680552792805], [1441207296.0, 72640000.0, -0.0878511965507448], [1441209024.0, 72660000.0, -0.10036171485121573], [1441210752.0, 72680000.0, -0.11285638487348175], [1441212480.0, 72700000.0, -0.1253332335642965], [1441214208.0, 72720000.0, -0.13779029068463658], [1441215936.0, 72740000.0, -0.15022558912075482], [1441217664.0, 72760000.0, -0.16263716519488058], [1441219392.0, 72780000.0, -0.17502305897527226], [1441221120.0, 72800000.0, -0.18738131458572707], [1441222848.0, 72820000.0, -0.1997099805144087], [1441224576.0, 72840000.0, -0.2120071099220486], [1441226304.0, 72860000.0, -0.2242707609493744], [1441228032.0, 72880000.0, -0.23649899702372407], [1441229760.0, 72900000.0, -0.2486898871648534], [1441231488.0, 72920000.0, -0.2608415062898948], [1441233216.0, 72940000.0, -0.27295193551732233], [1441234944.0, 72960000.0, -0.28501926246997933], [1441236672.0, 72980000.0, -0.29704158157703053], [1441238400.0, 73000000.0, -0.30901699437494234], [1441240128.0, 73020000.0, -0.3209436098072037], [1441241856.0, 73040000.0, -0.33281954452298684], [1441243584.0, 73060000.0, -0.3446429231745165], [1441245312.0, 73080000.0, -0.3564118787132494], [1441247040.0, 73100000.0, -0.368124552684676], [1441248768.0, 73120000.0, -0.3797790955217984], [1441250496.0, 73140000.0, -0.391373666837199], [1441252224.0, 73160000.0, -0.40290643571365853], [1441253952.0, 73180000.0, -0.4143755809932793], [1441255680.0, 73200000.0, -0.4257792915650736], [1441257408.0, 73220000.0, -0.43711576665093316], [1441259136.0, 73240000.0, -0.4483832160900318], [1441260864.0, 73260000.0, -0.4595798606214867], 
[1441262592.0, 73280000.0, -0.47070393216533074], [1441264320.0, 73300000.0, -0.48175367410171277], [1441266048.0, 73320000.0, -0.4927273415482884], [1441267776.0, 73340000.0, -0.503623201635757], [1441269504.0, 73360000.0, -0.5144395337815081], [1441271232.0, 73380000.0, -0.5251746299612966], [1441272960.0, 73400000.0, -0.5358267949789969], [1441274688.0, 73420000.0, -0.5463943467342628], [1441276416.0, 73440000.0, -0.556875616488187], [1441278144.0, 73460000.0, -0.5672689491267549], [1441279872.0, 73480000.0, -0.5775727034222653], [1441281600.0, 73500000.0, -0.5877852522924703], [1441283328.0, 73520000.0, -0.597904983057521], [1441285056.0, 73540000.0, -0.6079302976946069], [1441286784.0, 73560000.0, -0.6178596130903297], [1441288512.0, 73580000.0, -0.6276913612906952], [1441290240.0, 73600000.0, -0.6374239897486894], [1441291968.0, 73620000.0, -0.6470559615694434], [1441293696.0, 73640000.0, -0.656585755752955], [1441295424.0, 73660000.0, -0.6660118674342496], [1441297152.0, 73680000.0, -0.6753328081210271], [1441298880.0, 73700000.0, -0.6845471059286855], [1441300608.0, 73720000.0, -0.6936533058128013], [1441302336.0, 73740000.0, -0.7026499697988451], [1441304064.0, 73760000.0, -0.7115356772092857], [1441305792.0, 73780000.0, -0.7203090248879067], [1441307520.0, 73800000.0, -0.7289686274214108], [1441309248.0, 73820000.0, -0.7375131173581726], [1441310976.0, 73840000.0, -0.7459411454241803], [1441312704.0, 73860000.0, -0.7542513807361015], [1441314432.0, 73880000.0, -0.7624425110114451], [1441316160.0, 73900000.0, -0.770513242775786], [1441317888.0, 73920000.0, -0.7784623015670242], [1441319616.0, 73940000.0, -0.7862884321366193], [1441321344.0, 73960000.0, -0.7939903986478352], [1441323072.0, 73980000.0, -0.801566984870876], [1441324800.0, 74000000.0, -0.8090169943749463], [1441326528.0, 74020000.0, -0.8163392507171825], [1441328256.0, 74040000.0, -0.8235325976284256], [1441329984.0, 74060000.0, -0.8305958991958103], [1441331712.0, 74080000.0, -0.8375280400421429], [1441333440.0, 74100000.0, -0.8443279255020159], [1441335168.0, 74120000.0, -0.8509944817946922], [1441336896.0, 74140000.0, -0.8575266561936522], [1441338624.0, 74160000.0, -0.8639234171928348], [1441340352.0, 74180000.0, -0.8701837546695248], [1441342080.0, 74200000.0, -0.8763066800438624], [1441343808.0, 74220000.0, -0.8822912264349517], [1441345536.0, 74240000.0, -0.8881364488135459], [1441347264.0, 74260000.0, -0.8938414241512648], [1441348992.0, 74280000.0, -0.8994052515663685], [1441350720.0, 74300000.0, -0.9048270524660168], [1441352448.0, 74320000.0, -0.9101059706849957], [1441354176.0, 74340000.0, -0.9152411726209172], [1441355904.0, 74360000.0, -0.9202318473658697], [1441357632.0, 74380000.0, -0.9250772068344572], [1441359360.0, 74400000.0, -0.9297764858882528], [1441361088.0, 74420000.0, -0.9343289424566106], [1441362816.0, 74440000.0, -0.9387338576538724], [1441364544.0, 74460000.0, -0.9429905358928626], [1441366272.0, 74480000.0, -0.9470983049947446], [1441368000.0, 74500000.0, -0.9510565162951535], [1441369728.0, 74520000.0, -0.9548645447466427], [1441371456.0, 74540000.0, -0.9585217890173754], [1441373184.0, 74560000.0, -0.9620276715860853], [1441374912.0, 74580000.0, -0.9653816388332731], [1441376640.0, 74600000.0, -0.9685831611286302], [1441378368.0, 74620000.0, -0.9716317329146729], [1441380096.0, 74640000.0, -0.9745268727865775], [1441381824.0, 74660000.0, -0.9772681235681936], [1441383552.0, 74680000.0, -0.9798550523842469], [1441385280.0, 74700000.0, -0.9822872507286885], [1441387008.0, 74720000.0, 
-0.9845643345292051], [1441388736.0, 74740000.0, -0.9866859442078677], [1441390464.0, 74760000.0, -0.9886517447379135], [1441392192.0, 74780000.0, -0.9904614256966506], [1441393920.0, 74800000.0, -0.9921147013144781], [1441395648.0, 74820000.0, -0.9936113105200086], [1441397376.0, 74840000.0, -0.9949510169813003], [1441399104.0, 74860000.0, -0.9961336091431725], [1441400832.0, 74880000.0, -0.9971589002606139], [1441402560.0, 74900000.0, -0.9980267284282714], [1441404288.0, 74920000.0, -0.9987369566060174], [1441406016.0, 74940000.0, -0.9992894726405891], [1441407744.0, 74960000.0, -0.9996841892833], [1441409472.0, 74980000.0, -0.9999210442038161], [1441411200.0, 75000000.0, -1.0], [1441412928.0, 75020000.0, -0.9999210442038162], [1441414656.0, 75040000.0, -0.9996841892833002], [1441416384.0, 75060000.0, -0.9992894726405893], [1441418112.0, 75080000.0, -0.9987369566060176], [1441419840.0, 75100000.0, -0.9980267284282717], [1441421568.0, 75120000.0, -0.9971589002606142], [1441423296.0, 75140000.0, -0.9961336091431728], [1441425024.0, 75160000.0, -0.9949510169813006], [1441426752.0, 75180000.0, -0.9936113105200091], [1441428480.0, 75200000.0, -0.9921147013144785], [1441430208.0, 75220000.0, -0.9904614256966512], [1441431936.0, 75240000.0, -0.9886517447379141], [1441433664.0, 75260000.0, -0.9866859442078683], [1441435392.0, 75280000.0, -0.9845643345292058], [1441437120.0, 75300000.0, -0.9822872507286893], [1441438848.0, 75320000.0, -0.9798550523842476], [1441440576.0, 75340000.0, -0.9772681235681944], [1441442304.0, 75360000.0, -0.9745268727865783], [1441444032.0, 75380000.0, -0.9716317329146736], [1441445760.0, 75400000.0, -0.9685831611286311], [1441447488.0, 75420000.0, -0.965381638833274], [1441449216.0, 75440000.0, -0.9620276715860863], [1441450944.0, 75460000.0, -0.9585217890173765], [1441452672.0, 75480000.0, -0.9548645447466438], [1441454400.0, 75500000.0, -0.9510565162951546], [1441456128.0, 75520000.0, -0.9470983049947457], [1441457856.0, 75540000.0, -0.9429905358928639], [1441459584.0, 75560000.0, -0.9387338576538737], [1441461312.0, 75580000.0, -0.9343289424566119], [1441463040.0, 75600000.0, -0.9297764858882541], [1441464768.0, 75620000.0, -0.9250772068344585], [1441466496.0, 75640000.0, -0.9202318473658712], [1441468224.0, 75660000.0, -0.9152411726209186], [1441469952.0, 75680000.0, -0.9101059706849971], [1441471680.0, 75700000.0, -0.9048270524660184], [1441473408.0, 75720000.0, -0.8994052515663702], [1441475136.0, 75740000.0, -0.8938414241512664], [1441476864.0, 75760000.0, -0.8881364488135476], [1441478592.0, 75780000.0, -0.8822912264349535], [1441480320.0, 75800000.0, -0.8763066800438641], [1441482048.0, 75820000.0, -0.8701837546695266], [1441483776.0, 75840000.0, -0.8639234171928367], [1441485504.0, 75860000.0, -0.8575266561936541], [1441487232.0, 75880000.0, -0.8509944817946941], [1441488960.0, 75900000.0, -0.8443279255020177], [1441490688.0, 75920000.0, -0.8375280400421449], [1441492416.0, 75940000.0, -0.8305958991958123], [1441494144.0, 75960000.0, -0.8235325976284276], [1441495872.0, 75980000.0, -0.8163392507171846], [1441497600.0, 76000000.0, -0.8090169943749486], [1441499328.0, 76020000.0, -0.8015669848708782], [1441501056.0, 76040000.0, -0.7939903986478374], [1441502784.0, 76060000.0, -0.7862884321366215], [1441504512.0, 76080000.0, -0.7784623015670266], [1441506240.0, 76100000.0, -0.7705132427757884], [1441507968.0, 76120000.0, -0.7624425110114476], [1441509696.0, 76140000.0, -0.754251380736104], [1441511424.0, 76160000.0, -0.7459411454241828], [1441513152.0, 
76180000.0, -0.7375131173581752], [1441514880.0, 76200000.0, -0.7289686274214133], [1441516608.0, 76220000.0, -0.7203090248879093], [1441518336.0, 76240000.0, -0.7115356772092882], [1441520064.0, 76260000.0, -0.7026499697988476], [1441521792.0, 76280000.0, -0.6936533058128039], [1441523520.0, 76300000.0, -0.6845471059286882], [1441525248.0, 76320000.0, -0.6753328081210298], [1441526976.0, 76340000.0, -0.6660118674342523], [1441528704.0, 76360000.0, -0.6565857557529577], [1441530432.0, 76380000.0, -0.6470559615694462], [1441532160.0, 76400000.0, -0.6374239897486923], [1441533888.0, 76420000.0, -0.6276913612906981], [1441535616.0, 76440000.0, -0.6178596130903325], [1441537344.0, 76460000.0, -0.6079302976946098], [1441539072.0, 76480000.0, -0.597904983057524], [1441540800.0, 76500000.0, -0.5877852522924732], [1441542528.0, 76520000.0, -0.5775727034222683], [1441544256.0, 76540000.0, -0.5672689491267578], [1441545984.0, 76560000.0, -0.5568756164881901], [1441547712.0, 76580000.0, -0.5463943467342658], [1441549440.0, 76600000.0, -0.535826794979], [1441551168.0, 76620000.0, -0.5251746299612998], [1441552896.0, 76640000.0, -0.5144395337815112], [1441554624.0, 76660000.0, -0.5036232016357602], [1441556352.0, 76680000.0, -0.4927273415482916], [1441558080.0, 76700000.0, -0.481753674101716], [1441559808.0, 76720000.0, -0.47070393216533396], [1441561536.0, 76740000.0, -0.45957986062149], [1441563264.0, 76760000.0, -0.44838321609003506], [1441564992.0, 76780000.0, -0.43711576665093643], [1441566720.0, 76800000.0, -0.42577929156507693], [1441568448.0, 76820000.0, -0.4143755809932827], [1441570176.0, 76840000.0, -0.4029064357136619], [1441571904.0, 76860000.0, -0.39137366683720237], [1441573632.0, 76880000.0, -0.3797790955218018], [1441575360.0, 76900000.0, -0.36812455268467936], [1441577088.0, 76920000.0, -0.35641187871325286], [1441578816.0, 76940000.0, -0.34464292317451994], [1441580544.0, 76960000.0, -0.3328195445229903], [1441582272.0, 76980000.0, -0.32094360980720715], [1441584000.0, 77000000.0, -0.30901699437494584], [1441585728.0, 77020000.0, -0.2970415815770341], [1441587456.0, 77040000.0, -0.2850192624699828], [1441589184.0, 77060000.0, -0.2729519355173259], [1441590912.0, 77080000.0, -0.26084150628989833], [1441592640.0, 77100000.0, -0.24868988716485696], [1441594368.0, 77120000.0, -0.23649899702372762], [1441596096.0, 77140000.0, -0.22427076094937798], [1441597824.0, 77160000.0, -0.21200710992205218], [1441599552.0, 77180000.0, -0.19970998051441227], [1441601280.0, 77200000.0, -0.18738131458573068], [1441603008.0, 77220000.0, -0.17502305897527587], [1441604736.0, 77240000.0, -0.16263716519488422], [1441606464.0, 77260000.0, -0.15022558912075842], [1441608192.0, 77280000.0, -0.13779029068464022], [1441609920.0, 77300000.0, -0.12533323356430012], [1441611648.0, 77320000.0, -0.1128563848734854], [1441613376.0, 77340000.0, -0.10036171485121939], [1441615104.0, 77360000.0, -0.08785119655074847], [1441616832.0, 77380000.0, -0.07532680552793171], [1441618560.0, 77400000.0, -0.06279051952931314], [1441620288.0, 77420000.0, -0.050244318179770105], [1441622016.0, 77440000.0, -0.037690182669935866], [1441623744.0, 77460000.0, -0.02513009544333959], [1441625472.0, 77480000.0, -0.012566039883355501], [1441627200.0, 77500000.0, -3.674559378078648e-15], [1441628928.0, 77520000.0, 0.012566039883348151], [1441630656.0, 77540000.0, 0.02513009544333224], [1441632384.0, 77560000.0, 0.037690182669935623], [1441634112.0, 77580000.0, 0.05024431817976986], [1441635840.0, 77600000.0, 0.0627905195293129], 
[1441637568.0, 77620000.0, 0.07532680552793146], [1441639296.0, 77640000.0, 0.08785119655074114], [1441641024.0, 77660000.0, 0.10036171485121208], [1441642752.0, 77680000.0, 0.1128563848734781], [1441644480.0, 77700000.0, 0.1253332335642999], [1441646208.0, 77720000.0, 0.13779029068463997], [1441647936.0, 77740000.0, 0.1502255891207582], [1441649664.0, 77760000.0, 0.16263716519488397], [1441651392.0, 77780000.0, 0.17502305897526865], [1441653120.0, 77800000.0, 0.18738131458572346], [1441654848.0, 77820000.0, 0.19970998051440508], [1441656576.0, 77840000.0, 0.21200710992205193], [1441658304.0, 77860000.0, 0.22427076094937773], [1441660032.0, 77880000.0, 0.2364989970237274], [1441661760.0, 77900000.0, 0.24868988716484985], [1441663488.0, 77920000.0, 0.2608415062898912], [1441665216.0, 77940000.0, 0.27295193551731883], [1441666944.0, 77960000.0, 0.2850192624699758], [1441668672.0, 77980000.0, 0.2970415815770338], [1441670400.0, 78000000.0, 0.3090169943749456], [1441672128.0, 78020000.0, 0.32094360980720693], [1441673856.0, 78040000.0, 0.33281954452298335], [1441675584.0, 78060000.0, 0.34464292317451306], [1441677312.0, 78080000.0, 0.356411878713246], [1441679040.0, 78100000.0, 0.36812455268467253], [1441680768.0, 78120000.0, 0.37977909552180156], [1441682496.0, 78140000.0, 0.39137366683720215], [1441684224.0, 78160000.0, 0.4029064357136617], [1441685952.0, 78180000.0, 0.4143755809932825], [1441687680.0, 78200000.0, 0.4257792915650703], [1441689408.0, 78220000.0, 0.4371157666509298], [1441691136.0, 78240000.0, 0.4483832160900285], [1441692864.0, 78260000.0, 0.4595798606214834], [1441694592.0, 78280000.0, 0.47070393216533374], [1441696320.0, 78300000.0, 0.48175367410171577], [1441698048.0, 78320000.0, 0.4927273415482914], [1441699776.0, 78340000.0, 0.50362320163576], [1441701504.0, 78360000.0, 0.514439533781505], [1441703232.0, 78380000.0, 0.5251746299612935], [1441704960.0, 78400000.0, 0.5358267949789938], [1441706688.0, 78420000.0, 0.5463943467342657], [1441708416.0, 78440000.0, 0.5568756164881898], [1441710144.0, 78460000.0, 0.5672689491267576], [1441711872.0, 78480000.0, 0.5775727034222681], [1441713600.0, 78500000.0, 0.5877852522924673], [1441715328.0, 78520000.0, 0.5979049830575182], [1441717056.0, 78540000.0, 0.607930297694604], [1441718784.0, 78560000.0, 0.6178596130903324], [1441720512.0, 78580000.0, 0.6276913612906979], [1441722240.0, 78600000.0, 0.6374239897486921], [1441723968.0, 78620000.0, 0.6470559615694407], [1441725696.0, 78640000.0, 0.6565857557529522], [1441727424.0, 78660000.0, 0.6660118674342469], [1441729152.0, 78680000.0, 0.6753328081210244], [1441730880.0, 78700000.0, 0.6845471059286881], [1441732608.0, 78720000.0, 0.6936533058128038], [1441734336.0, 78740000.0, 0.7026499697988475], [1441736064.0, 78760000.0, 0.7115356772092831], [1441737792.0, 78780000.0, 0.7203090248879042], [1441739520.0, 78800000.0, 0.7289686274214082], [1441741248.0, 78820000.0, 0.7375131173581702], [1441742976.0, 78840000.0, 0.7459411454241827], [1441744704.0, 78860000.0, 0.7542513807361038], [1441746432.0, 78880000.0, 0.7624425110114473], [1441748160.0, 78900000.0, 0.7705132427757883], [1441749888.0, 78920000.0, 0.7784623015670219], [1441751616.0, 78940000.0, 0.7862884321366169], [1441753344.0, 78960000.0, 0.7939903986478329], [1441755072.0, 78980000.0, 0.8015669848708737], [1441756800.0, 79000000.0, 0.8090169943749483], [1441758528.0, 79020000.0, 0.8163392507171844], [1441760256.0, 79040000.0, 0.8235325976284275], [1441761984.0, 79060000.0, 0.8305958991958122], [1441763712.0, 79080000.0, 
0.8375280400421409], [1441765440.0, 79100000.0, 0.8443279255020139], [1441767168.0, 79120000.0, 0.8509944817946903], [1441768896.0, 79140000.0, 0.8575266561936503], [1441770624.0, 79160000.0, 0.8639234171928366], [1441772352.0, 79180000.0, 0.8701837546695265], [1441774080.0, 79200000.0, 0.876306680043864], [1441775808.0, 79220000.0, 0.88229122643495], [1441777536.0, 79240000.0, 0.8881364488135443], [1441779264.0, 79260000.0, 0.8938414241512631], [1441780992.0, 79280000.0, 0.8994052515663701], [1441782720.0, 79300000.0, 0.9048270524660182], [1441784448.0, 79320000.0, 0.9101059706849971], [1441786176.0, 79340000.0, 0.9152411726209185], [1441787904.0, 79360000.0, 0.9202318473658683], [1441789632.0, 79380000.0, 0.9250772068344557], [1441791360.0, 79400000.0, 0.9297764858882515], [1441793088.0, 79420000.0, 0.9343289424566118], [1441794816.0, 79440000.0, 0.9387338576538736], [1441796544.0, 79460000.0, 0.9429905358928637], [1441798272.0, 79480000.0, 0.9470983049947433], [1441800000.0, 79500000.0, 0.9510565162951524], [1441801728.0, 79520000.0, 0.9548645447466416], [1441803456.0, 79540000.0, 0.9585217890173744], [1441805184.0, 79560000.0, 0.9620276715860862], [1441806912.0, 79580000.0, 0.965381638833274], [1441808640.0, 79600000.0, 0.968583161128631], [1441810368.0, 79620000.0, 0.9716317329146736], [1441812096.0, 79640000.0, 0.9745268727865767], [1441813824.0, 79660000.0, 0.9772681235681928], [1441815552.0, 79680000.0, 0.9798550523842461], [1441817280.0, 79700000.0, 0.9822872507286878], [1441819008.0, 79720000.0, 0.9845643345292057], [1441820736.0, 79740000.0, 0.9866859442078683], [1441822464.0, 79760000.0, 0.9886517447379141], [1441824192.0, 79780000.0, 0.9904614256966512], [1441825920.0, 79800000.0, 0.9921147013144777], [1441827648.0, 79820000.0, 0.9936113105200082], [1441829376.0, 79840000.0, 0.9949510169812998], [1441831104.0, 79860000.0, 0.9961336091431722], [1441832832.0, 79880000.0, 0.9971589002606142], [1441834560.0, 79900000.0, 0.9980267284282717], [1441836288.0, 79920000.0, 0.9987369566060176], [1441838016.0, 79940000.0, 0.999289472640589], [1441839744.0, 79960000.0, 0.9996841892832999], [1441841472.0, 79980000.0, 0.9999210442038161], [1441843200.0, 80000000.0, 1.0], [1441844928.0, 80020000.0, 0.9999210442038161], [1441846656.0, 80040000.0, 0.9996841892833], [1441848384.0, 80060000.0, 0.9992894726405891], [1441850112.0, 80080000.0, 0.9987369566060178], [1441851840.0, 80100000.0, 0.9980267284282719], [1441853568.0, 80120000.0, 0.9971589002606139], [1441855296.0, 80140000.0, 0.9961336091431732], [1441857024.0, 80160000.0, 0.9949510169813003], [1441858752.0, 80180000.0, 0.9936113105200078], [1441860480.0, 80200000.0, 0.9921147013144782], [1441862208.0, 80220000.0, 0.9904614256966517], [1441863936.0, 80240000.0, 0.9886517447379146], [1441865664.0, 80260000.0, 0.9866859442078689], [1441867392.0, 80280000.0, 0.9845643345292051], [1441869120.0, 80300000.0, 0.9822872507286899], [1441870848.0, 80320000.0, 0.979855052384247], [1441872576.0, 80340000.0, 0.9772681235681937], [1441874304.0, 80360000.0, 0.9745268727865776], [1441876032.0, 80380000.0, 0.9716317329146745], [1441877760.0, 80400000.0, 0.9685831611286337], [1441879488.0, 80420000.0, 0.965381638833275], [1441881216.0, 80440000.0, 0.9620276715860853], [1441882944.0, 80460000.0, 0.9585217890173775], [1441884672.0, 80480000.0, 0.9548645447466427], [1441886400.0, 80500000.0, 0.9510565162951536], [1441888128.0, 80520000.0, 0.9470983049947446], [1441889856.0, 80540000.0, 0.9429905358928651], [1441891584.0, 80560000.0, 0.9387338576538774], 
[1441893312.0, 80580000.0, 0.9343289424566132], [1441895040.0, 80600000.0, 0.9297764858882503], [1441896768.0, 80620000.0, 0.9250772068344599], [1441898496.0, 80640000.0, 0.9202318473658698], [1441900224.0, 80660000.0, 0.9152411726209173], [1441901952.0, 80680000.0, 0.9101059706849987], [1441903680.0, 80700000.0, 0.9048270524660199], [1441905408.0, 80720000.0, 0.8994052515663749], [1441907136.0, 80740000.0, 0.8938414241512649], [1441908864.0, 80760000.0, 0.8881364488135428], [1441910592.0, 80780000.0, 0.8822912264349552], [1441912320.0, 80800000.0, 0.8763066800438659], [1441914048.0, 80820000.0, 0.8701837546695249], [1441915776.0, 80840000.0, 0.8639234171928385], [1441917504.0, 80860000.0, 0.8575266561936523], [1441919232.0, 80880000.0, 0.8509944817946961], [1441920960.0, 80900000.0, 0.844327925502016], [1441922688.0, 80920000.0, 0.8375280400421391], [1441924416.0, 80940000.0, 0.8305958991958144], [1441926144.0, 80960000.0, 0.8235325976284297], [1441927872.0, 80980000.0, 0.8163392507171826], [1441929600.0, 81000000.0, 0.8090169943749507], [1441931328.0, 81020000.0, 0.8015669848708761], [1441933056.0, 81040000.0, 0.7939903986478396], [1441934784.0, 81060000.0, 0.7862884321366194], [1441936512.0, 81080000.0, 0.7784623015670245], [1441938240.0, 81100000.0, 0.7705132427757907], [1441939968.0, 81120000.0, 0.7624425110114499], [1441941696.0, 81140000.0, 0.7542513807361018], [1441943424.0, 81160000.0, 0.7459411454241852], [1441945152.0, 81180000.0, 0.7375131173581728], [1441946880.0, 81200000.0, 0.7289686274214158], [1441948608.0, 81220000.0, 0.7203090248879068], [1441950336.0, 81240000.0, 0.7115356772092858], [1441952064.0, 81260000.0, 0.7026499697988553], [1441953792.0, 81280000.0, 0.6936533058128066], [1441955520.0, 81300000.0, 0.6845471059286857], [1441957248.0, 81320000.0, 0.6753328081210273], [1441958976.0, 81340000.0, 0.6660118674342498], [1441960704.0, 81360000.0, 0.6565857557529605], [1441962432.0, 81380000.0, 0.6470559615694436], [1441964160.0, 81400000.0, 0.6374239897486896], [1441965888.0, 81420000.0, 0.6276913612907065], [1441967616.0, 81440000.0, 0.6178596130903354], [1441969344.0, 81460000.0, 0.6079302976946015], [1441971072.0, 81480000.0, 0.5979049830575213], [1441972800.0, 81500000.0, 0.5877852522924705], [1441974528.0, 81520000.0, 0.5775727034222713], [1441976256.0, 81540000.0, 0.5672689491267608], [1441977984.0, 81560000.0, 0.5568756164881872], [1441979712.0, 81580000.0, 0.5463943467342749], [1441981440.0, 81600000.0, 0.5358267949789971], [1441983168.0, 81620000.0, 0.5251746299612908], [1441984896.0, 81640000.0, 0.5144395337815083], [1441986624.0, 81660000.0, 0.5036232016357634], [1441988352.0, 81680000.0, 0.4927273415482948], [1441990080.0, 81700000.0, 0.4817536741017192], [1441991808.0, 81720000.0, 0.47070393216533096], [1441993536.0, 81740000.0, 0.45957986062149325], [1441995264.0, 81760000.0, 0.448383216090032], [1441996992.0, 81780000.0, 0.437115766650927], [1441998720.0, 81800000.0, 0.4257792915650738], [1442000448.0, 81820000.0, 0.41437558099328603], [1442002176.0, 81840000.0, 0.4029064357136718], [1442003904.0, 81860000.0, 0.39137366683720576], [1442005632.0, 81880000.0, 0.3797790955217986], [1442007360.0, 81900000.0, 0.3681245526846828], [1442009088.0, 81920000.0, 0.35641187871324964], [1442010816.0, 81940000.0, 0.3446429231745167], [1442012544.0, 81960000.0, 0.33281954452298707], [1442014272.0, 81980000.0, 0.32094360980721065], [1442016000.0, 82000000.0, 0.3090169943749561], [1442017728.0, 82020000.0, 0.2970415815770376], [1442019456.0, 82040000.0, 
0.2850192624699727], [1442021184.0, 82060000.0, 0.27295193551732944], [1442022912.0, 82080000.0, 0.26084150628989505], [1442024640.0, 82100000.0, 0.24868988716485363], [1442026368.0, 82120000.0, 0.2364989970237312], [1442028096.0, 82140000.0, 0.22427076094938156], [1442029824.0, 82160000.0, 0.21200710992206273], [1442031552.0, 82180000.0, 0.1997099805144089], [1442033280.0, 82200000.0, 0.18738131458572033], [1442035008.0, 82220000.0, 0.1750230589752795], [1442036736.0, 82240000.0, 0.16263716519488783], [1442038464.0, 82260000.0, 0.15022558912075504], [1442040192.0, 82280000.0, 0.13779029068464385], [1442041920.0, 82300000.0, 0.1253332335643038], [1442043648.0, 82320000.0, 0.11285638487348906], [1442045376.0, 82340000.0, 0.10036171485121598], [1442047104.0, 82360000.0, 0.08785119655073796], [1442048832.0, 82380000.0, 0.07532680552793537], [1442050560.0, 82400000.0, 0.0627905195293168], [1442052288.0, 82420000.0, 0.05024431817976668], [1442054016.0, 82440000.0, 0.03769018266993954], [1442055744.0, 82460000.0, 0.02513009544333616], [1442057472.0, 82480000.0, 0.012566039883359175], [1442059200.0, 82500000.0, 2.443103791928823e-16], [1442060928.0, 82520000.0, -0.012566039883351582], [1442062656.0, 82540000.0, -0.025130095443335672], [1442064384.0, 82560000.0, -0.03769018266993195], [1442066112.0, 82580000.0, -0.05024431817975909], [1442067840.0, 82600000.0, -0.06279051952930922], [1442069568.0, 82620000.0, -0.07532680552793489], [1442071296.0, 82640000.0, -0.08785119655073748], [1442073024.0, 82660000.0, -0.10036171485121549], [1442074752.0, 82680000.0, -0.11285638487348151], [1442076480.0, 82700000.0, -0.12533323356429624], [1442078208.0, 82720000.0, -0.13779029068463633], [1442079936.0, 82740000.0, -0.15022558912074754], [1442081664.0, 82760000.0, -0.16263716519488033], [1442083392.0, 82780000.0, -0.175023058975279], [1442085120.0, 82800000.0, -0.18738131458571985], [1442086848.0, 82820000.0, -0.19970998051440844], [1442088576.0, 82840000.0, -0.2120071099220553], [1442090304.0, 82860000.0, -0.22427076094937415], [1442092032.0, 82880000.0, -0.23649899702372382], [1442093760.0, 82900000.0, -0.24868988716484627], [1442095488.0, 82920000.0, -0.26084150628989455], [1442097216.0, 82940000.0, -0.27295193551732894], [1442098944.0, 82960000.0, -0.2850192624699723], [1442100672.0, 82980000.0, -0.2970415815770303], [1442102400.0, 83000000.0, -0.3090169943749489], [1442104128.0, 83020000.0, -0.32094360980720343], [1442105856.0, 83040000.0, -0.3328195445229866], [1442107584.0, 83060000.0, -0.3446429231745096], [1442109312.0, 83080000.0, -0.3564118787132492], [1442111040.0, 83100000.0, -0.36812455268467575], [1442112768.0, 83120000.0, -0.37977909552179817], [1442114496.0, 83140000.0, -0.39137366683719876], [1442116224.0, 83160000.0, -0.4029064357136648], [1442117952.0, 83180000.0, -0.4143755809932791], [1442119680.0, 83200000.0, -0.4257792915650734], [1442121408.0, 83220000.0, -0.43711576665092655], [1442123136.0, 83240000.0, -0.44838321609003157], [1442124864.0, 83260000.0, -0.4595798606214865], [1442126592.0, 83280000.0, -0.4707039321653305], [1442128320.0, 83300000.0, -0.48175367410171255], [1442130048.0, 83320000.0, -0.49272734154829434], [1442131776.0, 83340000.0, -0.5036232016357568], [1442133504.0, 83360000.0, -0.5144395337815079], [1442135232.0, 83380000.0, -0.5251746299612904], [1442136960.0, 83400000.0, -0.5358267949789967], [1442138688.0, 83420000.0, -0.5463943467342685], [1442140416.0, 83440000.0, -0.5568756164881808], [1442142144.0, 83460000.0, -0.5672689491267546], [1442143872.0, 83480000.0, 
-0.577572703422271], [1442145600.0, 83500000.0, -0.58778525229247], [1442147328.0, 83520000.0, -0.5979049830575209], [1442149056.0, 83540000.0, -0.6079302976946012], [1442150784.0, 83560000.0, -0.6178596130903296], [1442152512.0, 83580000.0, -0.6276913612907006], [1442154240.0, 83600000.0, -0.6374239897486838], [1442155968.0, 83620000.0, -0.6470559615694432], [1442157696.0, 83640000.0, -0.6565857557529602], [1442159424.0, 83660000.0, -0.6660118674342494], [1442161152.0, 83680000.0, -0.6753328081210269], [1442162880.0, 83700000.0, -0.6845471059286854], [1442164608.0, 83720000.0, -0.6936533058128012], [1442166336.0, 83740000.0, -0.70264996979885], [1442168064.0, 83760000.0, -0.7115356772092805], [1442169792.0, 83780000.0, -0.7203090248879065], [1442171520.0, 83800000.0, -0.7289686274214154], [1442173248.0, 83820000.0, -0.7375131173581725], [1442174976.0, 83840000.0, -0.7459411454241802], [1442176704.0, 83860000.0, -0.7542513807361014], [1442178432.0, 83880000.0, -0.762442511011445], [1442180160.0, 83900000.0, -0.7705132427757904], [1442181888.0, 83920000.0, -0.7784623015670197], [1442183616.0, 83940000.0, -0.786288432136619], [1442185344.0, 83960000.0, -0.793990398647835], [1442187072.0, 83980000.0, -0.8015669848708759], [1442188800.0, 84000000.0, -0.8090169943749462], [1442190528.0, 84020000.0, -0.8163392507171782], [1442192256.0, 84040000.0, -0.8235325976284253], [1442193984.0, 84060000.0, -0.8305958991958141], [1442195712.0, 84080000.0, -0.8375280400421389], [1442197440.0, 84100000.0, -0.8443279255020157], [1442199168.0, 84120000.0, -0.8509944817946921], [1442200896.0, 84140000.0, -0.8575266561936521], [1442202624.0, 84160000.0, -0.8639234171928347], [1442204352.0, 84180000.0, -0.8701837546695211], [1442206080.0, 84200000.0, -0.8763066800438623], [1442207808.0, 84220000.0, -0.882291226434955], [1442209536.0, 84240000.0, -0.8881364488135425], [1442211264.0, 84260000.0, -0.8938414241512647], [1442212992.0, 84280000.0, -0.8994052515663715], [1442214720.0, 84300000.0, -0.9048270524660167], [1442216448.0, 84320000.0, -0.9101059706849955], [1442218176.0, 84340000.0, -0.9152411726209142], [1442219904.0, 84360000.0, -0.9202318473658696], [1442221632.0, 84380000.0, -0.9250772068344597], [1442223360.0, 84400000.0, -0.9297764858882501], [1442225088.0, 84420000.0, -0.9343289424566106], [1442226816.0, 84440000.0, -0.9387338576538748], [1442228544.0, 84460000.0, -0.9429905358928625], [1442230272.0, 84480000.0, -0.9470983049947445], [1442232000.0, 84500000.0, -0.9510565162951513], [1442233728.0, 84520000.0, -0.9548645447466426], [1442235456.0, 84540000.0, -0.9585217890173774], [1442237184.0, 84560000.0, -0.9620276715860852], [1442238912.0, 84580000.0, -0.965381638833273], [1442240640.0, 84600000.0, -0.9685831611286319], [1442242368.0, 84620000.0, -0.9716317329146728], [1442244096.0, 84640000.0, -0.9745268727865775], [1442245824.0, 84660000.0, -0.977268123568192], [1442247552.0, 84680000.0, -0.9798550523842469], [1442249280.0, 84700000.0, -0.9822872507286885], [1442251008.0, 84720000.0, -0.9845643345292051], [1442252736.0, 84740000.0, -0.9866859442078677], [1442254464.0, 84760000.0, -0.9886517447379146], [1442256192.0, 84780000.0, -0.9904614256966506], [1442257920.0, 84800000.0, -0.9921147013144781], [1442259648.0, 84820000.0, -0.9936113105200077], [1442261376.0, 84840000.0, -0.9949510169813002], [1442263104.0, 84860000.0, -0.9961336091431725], [1442264832.0, 84880000.0, -0.9971589002606134], [1442266560.0, 84900000.0, -0.9980267284282714], [1442268288.0, 84920000.0, -0.9987369566060177], [1442270016.0, 
84940000.0, -0.9992894726405891], [1442271744.0, 84960000.0, -0.9996841892833], [1442273472.0, 84980000.0, -0.9999210442038161], [1442275200.0, 85000000.0, -1.0], [1442276928.0, 85020000.0, -0.9999210442038161], [1442278656.0, 85040000.0, -0.9996841892833002], [1442280384.0, 85060000.0, -0.9992894726405893], [1442282112.0, 85080000.0, -0.9987369566060179], [1442283840.0, 85100000.0, -0.9980267284282717], [1442285568.0, 85120000.0, -0.9971589002606136], [1442287296.0, 85140000.0, -0.9961336091431728], [1442289024.0, 85160000.0, -0.9949510169813006], [1442290752.0, 85180000.0, -0.9936113105200083], [1442292480.0, 85200000.0, -0.9921147013144787], [1442294208.0, 85220000.0, -0.9904614256966512], [1442295936.0, 85240000.0, -0.9886517447379152], [1442297664.0, 85260000.0, -0.9866859442078684], [1442299392.0, 85280000.0, -0.9845643345292058], [1442301120.0, 85300000.0, -0.9822872507286893], [1442302848.0, 85320000.0, -0.9798550523842476], [1442304576.0, 85340000.0, -0.9772681235681929], [1442306304.0, 85360000.0, -0.9745268727865783], [1442308032.0, 85380000.0, -0.9716317329146738], [1442309760.0, 85400000.0, -0.9685831611286329], [1442311488.0, 85420000.0, -0.9653816388332741], [1442313216.0, 85440000.0, -0.9620276715860863], [1442314944.0, 85460000.0, -0.9585217890173786], [1442316672.0, 85480000.0, -0.9548645447466438], [1442318400.0, 85500000.0, -0.9510565162951525], [1442320128.0, 85520000.0, -0.9470983049947458], [1442321856.0, 85540000.0, -0.942990535892864], [1442323584.0, 85560000.0, -0.9387338576538762], [1442325312.0, 85580000.0, -0.934328942456612], [1442327040.0, 85600000.0, -0.9297764858882517], [1442328768.0, 85620000.0, -0.9250772068344613], [1442330496.0, 85640000.0, -0.9202318473658713], [1442332224.0, 85660000.0, -0.9152411726209159], [1442333952.0, 85680000.0, -0.9101059706849972], [1442335680.0, 85700000.0, -0.9048270524660185], [1442337408.0, 85720000.0, -0.8994052515663734], [1442339136.0, 85740000.0, -0.8938414241512666], [1442340864.0, 85760000.0, -0.8881364488135445], [1442342592.0, 85780000.0, -0.8822912264349569], [1442344320.0, 85800000.0, -0.8763066800438643], [1442346048.0, 85820000.0, -0.8701837546695232], [1442347776.0, 85840000.0, -0.8639234171928368], [1442349504.0, 85860000.0, -0.8575266561936542], [1442351232.0, 85880000.0, -0.8509944817946943], [1442352960.0, 85900000.0, -0.844327925502018], [1442354688.0, 85920000.0, -0.8375280400421411], [1442356416.0, 85940000.0, -0.8305958991958164], [1442358144.0, 85960000.0, -0.8235325976284278], [1442359872.0, 85980000.0, -0.8163392507171806], [1442361600.0, 86000000.0, -0.8090169943749487], [1442363328.0, 86020000.0, -0.8015669848708783], [1442365056.0, 86040000.0, -0.7939903986478376], [1442366784.0, 86060000.0, -0.7862884321366217], [1442368512.0, 86080000.0, -0.7784623015670222], [1442370240.0, 86100000.0, -0.770513242775793], [1442371968.0, 86120000.0, -0.7624425110114477], [1442373696.0, 86140000.0, -0.7542513807361041], [1442375424.0, 86160000.0, -0.745941145424183], [1442377152.0, 86180000.0, -0.7375131173581753], [1442378880.0, 86200000.0, -0.7289686274214183], [1442380608.0, 86220000.0, -0.7203090248879094], [1442382336.0, 86240000.0, -0.7115356772092835], [1442384064.0, 86260000.0, -0.7026499697988529], [1442385792.0, 86280000.0, -0.6936533058128042], [1442387520.0, 86300000.0, -0.6845471059286884], [1442389248.0, 86320000.0, -0.67533280812103], [1442390976.0, 86340000.0, -0.6660118674342526], [1442392704.0, 86360000.0, -0.6565857557529633], [1442394432.0, 86380000.0, -0.6470559615694464], [1442396160.0, 
86400000.0, -0.637423989748687], [1442397888.0, 86420000.0, -0.6276913612907038], [1442399616.0, 86440000.0, -0.6178596130903328], [1442401344.0, 86460000.0, -0.6079302976946044], [1442403072.0, 86480000.0, -0.5979049830575243], [1442404800.0, 86500000.0, -0.5877852522924734], [1442406528.0, 86520000.0, -0.5775727034222743], [1442408256.0, 86540000.0, -0.5672689491267581], [1442409984.0, 86560000.0, -0.5568756164881843], [1442411712.0, 86580000.0, -0.546394346734272], [1442413440.0, 86600000.0, -0.5358267949790002], [1442415168.0, 86620000.0, -0.5251746299612939], [1442416896.0, 86640000.0, -0.5144395337815114], [1442418624.0, 86660000.0, -0.5036232016357604], [1442420352.0, 86680000.0, -0.492727341548298], [1442422080.0, 86700000.0, -0.4817536741017162], [1442423808.0, 86720000.0, -0.4707039321653342], [1442425536.0, 86740000.0, -0.4595798606214902], [1442427264.0, 86760000.0, -0.4483832160900353], [1442428992.0, 86780000.0, -0.43711576665093027], [1442430720.0, 86800000.0, -0.42577929156507716], [1442432448.0, 86820000.0, -0.4143755809932829], [1442434176.0, 86840000.0, -0.40290643571366863], [1442435904.0, 86860000.0, -0.3913736668372026], [1442437632.0, 86880000.0, -0.379779095521802], [1442439360.0, 86900000.0, -0.3681245526846796], [1442441088.0, 86920000.0, -0.3564118787132531], [1442442816.0, 86940000.0, -0.3446429231745135], [1442444544.0, 86960000.0, -0.3328195445229905], [1442446272.0, 86980000.0, -0.3209436098072074], [1442448000.0, 87000000.0, -0.30901699437495284], [1442449728.0, 87020000.0, -0.2970415815770343], [1442451456.0, 87040000.0, -0.2850192624699763], [1442453184.0, 87060000.0, -0.27295193551733293], [1442454912.0, 87080000.0, -0.2608415062898986], [1442456640.0, 87100000.0, -0.24868988716485033], [1442458368.0, 87120000.0, -0.23649899702372787], [1442460096.0, 87140000.0, -0.2242707609493782], [1442461824.0, 87160000.0, -0.21200710992205937], [1442463552.0, 87180000.0, -0.19970998051441252], [1442465280.0, 87200000.0, -0.18738131458572393], [1442467008.0, 87220000.0, -0.1750230589752831], [1442468736.0, 87240000.0, -0.16263716519488444], [1442470464.0, 87260000.0, -0.15022558912075165], [1442472192.0, 87280000.0, -0.13779029068464047], [1442473920.0, 87300000.0, -0.12533323356430037], [1442475648.0, 87320000.0, -0.11285638487348565], [1442477376.0, 87340000.0, -0.10036171485121963], [1442479104.0, 87360000.0, -0.08785119655074163], [1442480832.0, 87380000.0, -0.07532680552793904], [1442482560.0, 87400000.0, -0.06279051952931339], [1442484288.0, 87420000.0, -0.05024431817976325], [1442486016.0, 87440000.0, -0.03769018266993611], [1442487744.0, 87460000.0, -0.02513009544333983], [1442489472.0, 87480000.0, -0.012566039883355746], [1442491200.0, 87500000.0, -3.919488737908119e-15], [1442492928.0, 87520000.0, 0.012566039883355012], [1442494656.0, 87540000.0, 0.025130095443331998], [1442496384.0, 87560000.0, 0.03769018266993538], [1442498112.0, 87580000.0, 0.05024431817976252], [1442499840.0, 87600000.0, 0.06279051952931265], [1442501568.0, 87620000.0, 0.07532680552793122], [1442503296.0, 87640000.0, 0.08785119655073381], [1442505024.0, 87660000.0, 0.10036171485121183], [1442506752.0, 87680000.0, 0.11285638487348491], [1442508480.0, 87700000.0, 0.12533323356429965], [1442510208.0, 87720000.0, 0.13779029068463974], [1442511936.0, 87740000.0, 0.15022558912075093], [1442513664.0, 87760000.0, 0.16263716519488372], [1442515392.0, 87780000.0, 0.1750230589752754], [1442517120.0, 87800000.0, 0.18738131458571625], [1442518848.0, 87820000.0, 0.19970998051440483], [1442520576.0, 
87840000.0, 0.21200710992205865], [1442522304.0, 87860000.0, 0.2242707609493775], [1442524032.0, 87880000.0, 0.23649899702372715], [1442525760.0, 87900000.0, 0.2486898871648496], [1442527488.0, 87920000.0, 0.260841506289891], [1442529216.0, 87940000.0, 0.27295193551732544], [1442530944.0, 87960000.0, 0.2850192624699688], [1442532672.0, 87980000.0, 0.2970415815770336], [1442534400.0, 88000000.0, 0.3090169943749521], [1442536128.0, 88020000.0, 0.3209436098072067], [1442537856.0, 88040000.0, 0.3328195445229831], [1442539584.0, 88060000.0, 0.34464292317451284], [1442541312.0, 88080000.0, 0.35641187871324576], [1442543040.0, 88100000.0, 0.3681245526846789], [1442544768.0, 88120000.0, 0.3797790955217948], [1442546496.0, 88140000.0, 0.39137366683720193], [1442548224.0, 88160000.0, 0.40290643571366797], [1442549952.0, 88180000.0, 0.41437558099328226], [1442551680.0, 88200000.0, 0.42577929156507005], [1442553408.0, 88220000.0, 0.4371157666509232], [1442555136.0, 88240000.0, 0.4483832160900283], [1442556864.0, 88260000.0, 0.45957986062148953], [1442558592.0, 88280000.0, 0.47070393216532724], [1442560320.0, 88300000.0, 0.48175367410171555], [1442562048.0, 88320000.0, 0.4927273415482912], [1442563776.0, 88340000.0, 0.5036232016357598], [1442565504.0, 88360000.0, 0.5144395337815048], [1442567232.0, 88380000.0, 0.5251746299612873], [1442568960.0, 88400000.0, 0.5358267949789935], [1442570688.0, 88420000.0, 0.5463943467342713], [1442572416.0, 88440000.0, 0.5568756164881837], [1442574144.0, 88460000.0, 0.5672689491267574], [1442575872.0, 88480000.0, 0.577572703422268], [1442577600.0, 88500000.0, 0.587785252292467], [1442579328.0, 88520000.0, 0.5979049830575179], [1442581056.0, 88540000.0, 0.6079302976945982], [1442582784.0, 88560000.0, 0.6178596130903322], [1442584512.0, 88580000.0, 0.6276913612907032], [1442586240.0, 88600000.0, 0.6374239897486864], [1442587968.0, 88620000.0, 0.6470559615694405], [1442589696.0, 88640000.0, 0.6565857557529574], [1442591424.0, 88660000.0, 0.6660118674342467], [1442593152.0, 88680000.0, 0.6753328081210243], [1442594880.0, 88700000.0, 0.6845471059286827], [1442596608.0, 88720000.0, 0.6936533058128036], [1442598336.0, 88740000.0, 0.7026499697988523], [1442600064.0, 88760000.0, 0.7115356772092829], [1442601792.0, 88780000.0, 0.7203090248879039], [1442603520.0, 88800000.0, 0.728968627421413], [1442605248.0, 88820000.0, 0.7375131173581699], [1442606976.0, 88840000.0, 0.7459411454241824], [1442608704.0, 88860000.0, 0.754251380736099], [1442610432.0, 88880000.0, 0.7624425110114472], [1442612160.0, 88900000.0, 0.770513242775788], [1442613888.0, 88920000.0, 0.7784623015670218], [1442615616.0, 88940000.0, 0.7862884321366168], [1442617344.0, 88960000.0, 0.7939903986478372], [1442619072.0, 88980000.0, 0.8015669848708736], [1442620800.0, 89000000.0, 0.8090169943749482], [1442622528.0, 89020000.0, 0.8163392507171802], [1442624256.0, 89040000.0, 0.8235325976284273], [1442625984.0, 89060000.0, 0.8305958991958121], [1442627712.0, 89080000.0, 0.8375280400421369], [1442629440.0, 89100000.0, 0.8443279255020137], [1442631168.0, 89120000.0, 0.8509944817946938], [1442632896.0, 89140000.0, 0.8575266561936501], [1442634624.0, 89160000.0, 0.8639234171928364], [1442636352.0, 89180000.0, 0.8701837546695229], [1442638080.0, 89200000.0, 0.8763066800438639], [1442639808.0, 89220000.0, 0.8822912264349532], [1442641536.0, 89240000.0, 0.8881364488135408], [1442643264.0, 89260000.0, 0.893841424151263], [1442644992.0, 89280000.0, 0.8994052515663731], [1442646720.0, 89300000.0, 0.9048270524660181], 
[1442648448.0, 89320000.0, 0.910105970684997], [1442650176.0, 89340000.0, 0.9152411726209156], [1442651904.0, 89360000.0, 0.9202318473658682], [1442653632.0, 89380000.0, 0.9250772068344583], [1442655360.0, 89400000.0, 0.9297764858882488], [1442657088.0, 89420000.0, 0.9343289424566118], [1442658816.0, 89440000.0, 0.938733857653876], [1442660544.0, 89460000.0, 0.9429905358928636], [1442662272.0, 89480000.0, 0.9470983049947432], [1442664000.0, 89500000.0, 0.9510565162951523], [1442665728.0, 89520000.0, 0.9548645447466415], [1442667456.0, 89540000.0, 0.9585217890173764], [1442669184.0, 89560000.0, 0.9620276715860842], [1442670912.0, 89580000.0, 0.9653816388332739], [1442672640.0, 89600000.0, 0.9685831611286327], [1442674368.0, 89620000.0, 0.9716317329146735], [1442676096.0, 89640000.0, 0.9745268727865766], [1442677824.0, 89660000.0, 0.9772681235681928], [1442679552.0, 89680000.0, 0.9798550523842461], [1442681280.0, 89700000.0, 0.9822872507286892], [1442683008.0, 89720000.0, 0.9845643345292044], [1442684736.0, 89740000.0, 0.9866859442078683], [1442686464.0, 89760000.0, 0.9886517447379141], [1442688192.0, 89780000.0, 0.9904614256966511], [1442689920.0, 89800000.0, 0.9921147013144777], [1442691648.0, 89820000.0, 0.9936113105200074], [1442693376.0, 89840000.0, 0.9949510169812998], [1442695104.0, 89860000.0, 0.9961336091431727], [1442696832.0, 89880000.0, 0.9971589002606136], [1442698560.0, 89900000.0, 0.9980267284282717], [1442700288.0, 89920000.0, 0.9987369566060176], [1442702016.0, 89940000.0, 0.999289472640589], [1442703744.0, 89960000.0, 0.9996841892832999], [1442705472.0, 89980000.0, 0.999921044203816], [1442707200.0, 90000000.0, 1.0], [1442708928.0, 90020000.0, 0.9999210442038161], [1442710656.0, 90040000.0, 0.9996841892833], [1442712384.0, 90060000.0, 0.9992894726405892], [1442714112.0, 90080000.0, 0.9987369566060178], [1442715840.0, 90100000.0, 0.9980267284282719], [1442717568.0, 90120000.0, 0.9971589002606139], [1442719296.0, 90140000.0, 0.9961336091431732], [1442721024.0, 90160000.0, 0.9949510169813003], [1442722752.0, 90180000.0, 0.9936113105200078], [1442724480.0, 90200000.0, 0.9921147013144782], [1442726208.0, 90220000.0, 0.9904614256966517], [1442727936.0, 90240000.0, 0.9886517447379147], [1442729664.0, 90260000.0, 0.9866859442078689], [1442731392.0, 90280000.0, 0.9845643345292052], [1442733120.0, 90300000.0, 0.9822872507286899], [1442734848.0, 90320000.0, 0.979855052384247], [1442736576.0, 90340000.0, 0.9772681235681937], [1442738304.0, 90360000.0, 0.9745268727865776], [1442740032.0, 90380000.0, 0.9716317329146746], [1442741760.0, 90400000.0, 0.9685831611286337], [1442743488.0, 90420000.0, 0.965381638833275], [1442745216.0, 90440000.0, 0.9620276715860854], [1442746944.0, 90460000.0, 0.9585217890173776], [1442748672.0, 90480000.0, 0.9548645447466428], [1442750400.0, 90500000.0, 0.9510565162951538], [1442752128.0, 90520000.0, 0.9470983049947447], [1442753856.0, 90540000.0, 0.9429905358928652], [1442755584.0, 90560000.0, 0.9387338576538775], [1442757312.0, 90580000.0, 0.9343289424566134], [1442759040.0, 90600000.0, 0.9297764858882503], [1442760768.0, 90620000.0, 0.92507720683446], [1442762496.0, 90640000.0, 0.9202318473658698], [1442764224.0, 90660000.0, 0.9152411726209174], [1442765952.0, 90680000.0, 0.9101059706849988], [1442767680.0, 90700000.0, 0.90482705246602], [1442769408.0, 90720000.0, 0.899405251566375], [1442771136.0, 90740000.0, 0.893841424151265], [1442772864.0, 90760000.0, 0.8881364488135429], [1442774592.0, 90780000.0, 0.8822912264349553], [1442776320.0, 90800000.0, 
0.876306680043866], [1442778048.0, 90820000.0, 0.870183754669525], [1442779776.0, 90840000.0, 0.8639234171928386], [1442781504.0, 90860000.0, 0.8575266561936524], [1442783232.0, 90880000.0, 0.8509944817946962], [1442784960.0, 90900000.0, 0.8443279255020161], [1442786688.0, 90920000.0, 0.8375280400421393], [1442788416.0, 90940000.0, 0.8305958991958146], [1442790144.0, 90960000.0, 0.8235325976284298], [1442791872.0, 90980000.0, 0.8163392507171827], [1442793600.0, 91000000.0, 0.8090169943749508], [1442795328.0, 91020000.0, 0.8015669848708763], [1442797056.0, 91040000.0, 0.7939903986478398], [1442798784.0, 91060000.0, 0.7862884321366195], [1442800512.0, 91080000.0, 0.7784623015670246], [1442802240.0, 91100000.0, 0.7705132427757909], [1442803968.0, 91120000.0, 0.76244251101145], [1442805696.0, 91140000.0, 0.7542513807361019], [1442807424.0, 91160000.0, 0.7459411454241854], [1442809152.0, 91180000.0, 0.7375131173581729], [1442810880.0, 91200000.0, 0.728968627421416], [1442812608.0, 91220000.0, 0.720309024887907], [1442814336.0, 91240000.0, 0.711535677209286], [1442816064.0, 91260000.0, 0.7026499697988555], [1442817792.0, 91280000.0, 0.6936533058128068], [1442819520.0, 91300000.0, 0.684547105928686], [1442821248.0, 91320000.0, 0.6753328081210275], [1442822976.0, 91340000.0, 0.66601186743425], [1442824704.0, 91360000.0, 0.6565857557529607], [1442826432.0, 91380000.0, 0.6470559615694438], [1442828160.0, 91400000.0, 0.6374239897486897], [1442829888.0, 91420000.0, 0.6276913612907067], [1442831616.0, 91440000.0, 0.6178596130903357], [1442833344.0, 91460000.0, 0.6079302976946017], [1442835072.0, 91480000.0, 0.5979049830575215], [1442836800.0, 91500000.0, 0.5877852522924706], [1442838528.0, 91520000.0, 0.5775727034222715], [1442840256.0, 91540000.0, 0.5672689491267611], [1442841984.0, 91560000.0, 0.5568756164881874], [1442843712.0, 91580000.0, 0.5463943467342751], [1442845440.0, 91600000.0, 0.5358267949789973], [1442847168.0, 91620000.0, 0.525174629961291], [1442848896.0, 91640000.0, 0.5144395337815085], [1442850624.0, 91660000.0, 0.5036232016357636], [1442852352.0, 91680000.0, 0.492727341548295], [1442854080.0, 91700000.0, 0.48175367410171943], [1442855808.0, 91720000.0, 0.4707039321653312], [1442857536.0, 91740000.0, 0.4595798606214935], [1442859264.0, 91760000.0, 0.44838321609003223], [1442860992.0, 91780000.0, 0.4371157666509272], [1442862720.0, 91800000.0, 0.42577929156507405], [1442864448.0, 91820000.0, 0.41437558099328625], [1442866176.0, 91840000.0, 0.402906435713672], [1442867904.0, 91860000.0, 0.391373666837206], [1442869632.0, 91880000.0, 0.37977909552179884], [1442871360.0, 91900000.0, 0.368124552684683], [1442873088.0, 91920000.0, 0.35641187871324986], [1442874816.0, 91940000.0, 0.34464292317451695], [1442876544.0, 91960000.0, 0.3328195445229873], [1442878272.0, 91980000.0, 0.3209436098072109], [1442880000.0, 92000000.0, 0.30901699437495633], [1442881728.0, 92020000.0, 0.2970415815770378], [1442883456.0, 92040000.0, 0.285019262469973], [1442885184.0, 92060000.0, 0.27295193551732966], [1442886912.0, 92080000.0, 0.2608415062898953], [1442888640.0, 92100000.0, 0.24868988716485388], [1442890368.0, 92120000.0, 0.23649899702373142], [1442892096.0, 92140000.0, 0.22427076094938178], [1442893824.0, 92160000.0, 0.21200710992206295], [1442895552.0, 92180000.0, 0.19970998051440916], [1442897280.0, 92200000.0, 0.18738131458572058], [1442899008.0, 92220000.0, 0.17502305897527973], [1442900736.0, 92240000.0, 0.16263716519488808], [1442902464.0, 92260000.0, 0.1502255891207553], [1442904192.0, 92280000.0, 
0.1377902906846441], [1442905920.0, 92300000.0, 0.125333233564304], [1442907648.0, 92320000.0, 0.1128563848734893], [1442909376.0, 92340000.0, 0.10036171485121621], [1442911104.0, 92360000.0, 0.08785119655073821], [1442912832.0, 92380000.0, 0.07532680552793562], [1442914560.0, 92400000.0, 0.06279051952931705], [1442916288.0, 92420000.0, 0.05024431817976692], [1442918016.0, 92440000.0, 0.03769018266993978], [1442919744.0, 92460000.0, 0.025130095443336404], [1442921472.0, 92480000.0, 0.01256603988335942], [1442923200.0, 92500000.0, 4.892397390223529e-16], [1442924928.0, 92520000.0, -0.012566039883351336], [1442926656.0, 92540000.0, -0.025130095443335426], [1442928384.0, 92560000.0, -0.0376901826699317], [1442930112.0, 92580000.0, -0.05024431817975885], [1442931840.0, 92600000.0, -0.06279051952930899], [1442933568.0, 92620000.0, -0.07532680552793464], [1442935296.0, 92640000.0, -0.08785119655073724], [1442937024.0, 92660000.0, -0.10036171485121524], [1442938752.0, 92680000.0, -0.11285638487348126], [1442940480.0, 92700000.0, -0.125333233564296], [1442942208.0, 92720000.0, -0.1377902906846361], [1442943936.0, 92740000.0, -0.1502255891207473], [1442945664.0, 92760000.0, -0.1626371651948801], [1442947392.0, 92780000.0, -0.17502305897527878], [1442949120.0, 92800000.0, -0.1873813145857196], [1442950848.0, 92820000.0, -0.1997099805144082], [1442952576.0, 92840000.0, -0.21200710992205507], [1442954304.0, 92860000.0, -0.22427076094937393], [1442956032.0, 92880000.0, -0.23649899702372357], [1442957760.0, 92900000.0, -0.24868988716484605], [1442959488.0, 92920000.0, -0.26084150628989433], [1442961216.0, 92940000.0, -0.2729519355173287], [1442962944.0, 92960000.0, -0.28501926246997206], [1442964672.0, 92980000.0, -0.2970415815770301], [1442966400.0, 93000000.0, -0.3090169943749486], [1442968128.0, 93020000.0, -0.3209436098072032], [1442969856.0, 93040000.0, -0.33281954452298634], [1442971584.0, 93060000.0, -0.3446429231745094], [1442973312.0, 93080000.0, -0.356411878713249], [1442975040.0, 93100000.0, -0.36812455268467553], [1442976768.0, 93120000.0, -0.37977909552179795], [1442978496.0, 93140000.0, -0.39137366683719854], [1442980224.0, 93160000.0, -0.4029064357136646], [1442981952.0, 93180000.0, -0.41437558099327887], [1442983680.0, 93200000.0, -0.42577929156507316], [1442985408.0, 93220000.0, -0.43711576665092633], [1442987136.0, 93240000.0, -0.44838321609003134], [1442988864.0, 93260000.0, -0.45957986062148626], [1442990592.0, 93280000.0, -0.4707039321653303], [1442992320.0, 93300000.0, -0.4817536741017123], [1442994048.0, 93320000.0, -0.4927273415482942], [1442995776.0, 93340000.0, -0.5036232016357566], [1442997504.0, 93360000.0, -0.5144395337815076], [1442999232.0, 93380000.0, -0.5251746299612902], [1443000960.0, 93400000.0, -0.5358267949789964], [1443002688.0, 93420000.0, -0.5463943467342683], [1443004416.0, 93440000.0, -0.5568756164881806], [1443006144.0, 93460000.0, -0.5672689491267544], [1443007872.0, 93480000.0, -0.5775727034222707], [1443009600.0, 93500000.0, -0.5877852522924698], [1443011328.0, 93520000.0, -0.5979049830575207], [1443013056.0, 93540000.0, -0.6079302976946009], [1443014784.0, 93560000.0, -0.6178596130903293], [1443016512.0, 93580000.0, -0.6276913612907004], [1443018240.0, 93600000.0, -0.6374239897486835], [1443019968.0, 93620000.0, -0.647055961569443], [1443021696.0, 93640000.0, -0.65658575575296], [1443023424.0, 93660000.0, -0.6660118674342492], [1443025152.0, 93680000.0, -0.6753328081210267], [1443026880.0, 93700000.0, -0.6845471059286852], [1443028608.0, 93720000.0, 
-0.6936533058128009], [1443030336.0, 93740000.0, -0.7026499697988497], [1443032064.0, 93760000.0, -0.7115356772092803], [1443033792.0, 93780000.0, -0.7203090248879064], [1443035520.0, 93800000.0, -0.7289686274214153], [1443037248.0, 93820000.0, -0.7375131173581723], [1443038976.0, 93840000.0, -0.74594114542418], [1443040704.0, 93860000.0, -0.7542513807361012], [1443042432.0, 93880000.0, -0.7624425110114448], [1443044160.0, 93900000.0, -0.7705132427757903], [1443045888.0, 93920000.0, -0.7784623015670195], [1443047616.0, 93940000.0, -0.7862884321366189], [1443049344.0, 93960000.0, -0.7939903986478348], [1443051072.0, 93980000.0, -0.8015669848708756], [1443052800.0, 94000000.0, -0.8090169943749461], [1443054528.0, 94020000.0, -0.816339250717178], [1443056256.0, 94040000.0, -0.8235325976284252], [1443057984.0, 94060000.0, -0.830595899195814], [1443059712.0, 94080000.0, -0.8375280400421388], [1443061440.0, 94100000.0, -0.8443279255020155], [1443063168.0, 94120000.0, -0.850994481794692], [1443064896.0, 94140000.0, -0.8575266561936519], [1443066624.0, 94160000.0, -0.8639234171928346], [1443068352.0, 94180000.0, -0.870183754669521], [1443070080.0, 94200000.0, -0.8763066800438621], [1443071808.0, 94220000.0, -0.8822912264349548], [1443073536.0, 94240000.0, -0.8881364488135424], [1443075264.0, 94260000.0, -0.8938414241512646], [1443076992.0, 94280000.0, -0.8994052515663714], [1443078720.0, 94300000.0, -0.9048270524660166], [1443080448.0, 94320000.0, -0.9101059706849954], [1443082176.0, 94340000.0, -0.9152411726209141], [1443083904.0, 94360000.0, -0.9202318473658695], [1443085632.0, 94380000.0, -0.9250772068344596], [1443087360.0, 94400000.0, -0.92977648588825], [1443089088.0, 94420000.0, -0.9343289424566105], [1443090816.0, 94440000.0, -0.9387338576538747], [1443092544.0, 94460000.0, -0.9429905358928624], [1443094272.0, 94480000.0, -0.9470983049947443], [1443096000.0, 94500000.0, -0.9510565162951512], [1443097728.0, 94520000.0, -0.9548645447466426], [1443099456.0, 94540000.0, -0.9585217890173773], [1443101184.0, 94560000.0, -0.9620276715860852], [1443102912.0, 94580000.0, -0.9653816388332729], [1443104640.0, 94600000.0, -0.9685831611286317], [1443106368.0, 94620000.0, -0.9716317329146728], [1443108096.0, 94640000.0, -0.9745268727865773], [1443109824.0, 94660000.0, -0.977268123568192], [1443111552.0, 94680000.0, -0.9798550523842467], [1443113280.0, 94700000.0, -0.9822872507286885], [1443115008.0, 94720000.0, -0.984564334529205], [1443116736.0, 94740000.0, -0.9866859442078676], [1443118464.0, 94760000.0, -0.9886517447379145], [1443120192.0, 94780000.0, -0.9904614256966506], [1443121920.0, 94800000.0, -0.9921147013144781], [1443123648.0, 94820000.0, -0.9936113105200077], [1443125376.0, 94840000.0, -0.9949510169813002], [1443127104.0, 94860000.0, -0.9961336091431724], [1443128832.0, 94880000.0, -0.9971589002606133], [1443130560.0, 94900000.0, -0.9980267284282714], [1443132288.0, 94920000.0, -0.9987369566060177], [1443134016.0, 94940000.0, -0.9992894726405891], [1443135744.0, 94960000.0, -0.9996841892833], [1443137472.0, 94980000.0, -0.999921044203816], [1443139200.0, 95000000.0, -1.0], [1443140928.0, 95020000.0, -0.9999210442038161], [1443142656.0, 95040000.0, -0.9996841892833002], [1443144384.0, 95060000.0, -0.9992894726405893], [1443146112.0, 95080000.0, -0.9987369566060179], [1443147840.0, 95100000.0, -0.9980267284282718], [1443149568.0, 95120000.0, -0.9971589002606137], [1443151296.0, 95140000.0, -0.9961336091431728], [1443153024.0, 95160000.0, -0.9949510169813006], [1443154752.0, 95180000.0, 
-0.9936113105200083], [1443156480.0, 95200000.0, -0.9921147013144787], [1443158208.0, 95220000.0, -0.9904614256966513], [1443159936.0, 95240000.0, -0.9886517447379153], [1443161664.0, 95260000.0, -0.9866859442078684], [1443163392.0, 95280000.0, -0.9845643345292059], [1443165120.0, 95300000.0, -0.9822872507286893], [1443166848.0, 95320000.0, -0.9798550523842477], [1443168576.0, 95340000.0, -0.977268123568193], [1443170304.0, 95360000.0, -0.9745268727865785], [1443172032.0, 95380000.0, -0.9716317329146738], [1443173760.0, 95400000.0, -0.968583161128633], [1443175488.0, 95420000.0, -0.9653816388332741], [1443177216.0, 95440000.0, -0.9620276715860864], [1443178944.0, 95460000.0, -0.9585217890173786], [1443180672.0, 95480000.0, -0.954864544746644], [1443182400.0, 95500000.0, -0.9510565162951526], [1443184128.0, 95520000.0, -0.9470983049947459], [1443185856.0, 95540000.0, -0.942990535892864], [1443187584.0, 95560000.0, -0.9387338576538763], [1443189312.0, 95580000.0, -0.9343289424566121], [1443191040.0, 95600000.0, -0.9297764858882518], [1443192768.0, 95620000.0, -0.9250772068344614], [1443194496.0, 95640000.0, -0.9202318473658713], [1443196224.0, 95660000.0, -0.915241172620916], [1443197952.0, 95680000.0, -0.9101059706849973], [1443199680.0, 95700000.0, -0.9048270524660186], [1443201408.0, 95720000.0, -0.8994052515663735], [1443203136.0, 95740000.0, -0.8938414241512667], [1443204864.0, 95760000.0, -0.8881364488135446], [1443206592.0, 95780000.0, -0.882291226434957], [1443208320.0, 95800000.0, -0.8763066800438644], [1443210048.0, 95820000.0, -0.8701837546695234], [1443211776.0, 95840000.0, -0.8639234171928369], [1443213504.0, 95860000.0, -0.8575266561936543], [1443215232.0, 95880000.0, -0.8509944817946944], [1443216960.0, 95900000.0, -0.8443279255020181], [1443218688.0, 95920000.0, -0.8375280400421413], [1443220416.0, 95940000.0, -0.8305958991958166], [1443222144.0, 95960000.0, -0.8235325976284279], [1443223872.0, 95980000.0, -0.8163392507171807], [1443225600.0, 96000000.0, -0.8090169943749488], [1443227328.0, 96020000.0, -0.8015669848708784], [1443229056.0, 96040000.0, -0.7939903986478377], [1443230784.0, 96060000.0, -0.7862884321366218], [1443232512.0, 96080000.0, -0.7784623015670225], [1443234240.0, 96100000.0, -0.7705132427757933], [1443235968.0, 96120000.0, -0.7624425110114478], [1443237696.0, 96140000.0, -0.7542513807361043], [1443239424.0, 96160000.0, -0.7459411454241831], [1443241152.0, 96180000.0, -0.7375131173581755], [1443242880.0, 96200000.0, -0.7289686274214184], [1443244608.0, 96220000.0, -0.7203090248879096], [1443246336.0, 96240000.0, -0.7115356772092836], [1443248064.0, 96260000.0, -0.7026499697988531], [1443249792.0, 96280000.0, -0.6936533058128043], [1443251520.0, 96300000.0, -0.6845471059286886], [1443253248.0, 96320000.0, -0.6753328081210301], [1443254976.0, 96340000.0, -0.6660118674342527], [1443256704.0, 96360000.0, -0.6565857557529635], [1443258432.0, 96380000.0, -0.6470559615694466], [1443260160.0, 96400000.0, -0.6374239897486872], [1443261888.0, 96420000.0, -0.627691361290704], [1443263616.0, 96440000.0, -0.617859613090333], [1443265344.0, 96460000.0, -0.6079302976946046], [1443267072.0, 96480000.0, -0.5979049830575244], [1443268800.0, 96500000.0, -0.5877852522924736], [1443270528.0, 96520000.0, -0.5775727034222745], [1443272256.0, 96540000.0, -0.5672689491267583], [1443273984.0, 96560000.0, -0.5568756164881845], [1443275712.0, 96580000.0, -0.5463943467342722], [1443277440.0, 96600000.0, -0.5358267949790004], [1443279168.0, 96620000.0, -0.5251746299612942], 
[1443280896.0, 96640000.0, -0.5144395337815116], [1443282624.0, 96660000.0, -0.5036232016357606], [1443284352.0, 96680000.0, -0.4927273415482982], [1443286080.0, 96700000.0, -0.48175367410171643], [1443287808.0, 96720000.0, -0.4707039321653344], [1443289536.0, 96740000.0, -0.4595798606214904], [1443291264.0, 96760000.0, -0.4483832160900355], [1443292992.0, 96780000.0, -0.4371157666509305], [1443294720.0, 96800000.0, -0.4257792915650774], [1443296448.0, 96820000.0, -0.41437558099328314], [1443298176.0, 96840000.0, -0.40290643571366885], [1443299904.0, 96860000.0, -0.3913736668372028], [1443301632.0, 96880000.0, -0.3797790955218022], [1443303360.0, 96900000.0, -0.36812455268467986], [1443305088.0, 96920000.0, -0.3564118787132533], [1443306816.0, 96940000.0, -0.3446429231745137], [1443308544.0, 96960000.0, -0.3328195445229908], [1443310272.0, 96980000.0, -0.32094360980720765], [1443312000.0, 97000000.0, -0.30901699437495306], [1443313728.0, 97020000.0, -0.29704158157703453], [1443315456.0, 97040000.0, -0.2850192624699765], [1443317184.0, 97060000.0, -0.2729519355173332], [1443318912.0, 97080000.0, -0.26084150628989883], [1443320640.0, 97100000.0, -0.24868988716485055], [1443322368.0, 97120000.0, -0.2364989970237281], [1443324096.0, 97140000.0, -0.22427076094937845], [1443325824.0, 97160000.0, -0.2120071099220596], [1443327552.0, 97180000.0, -0.19970998051441277], [1443329280.0, 97200000.0, -0.18738131458572418], [1443331008.0, 97220000.0, -0.17502305897528336], [1443332736.0, 97240000.0, -0.1626371651948847], [1443334464.0, 97260000.0, -0.1502255891207519], [1443336192.0, 97280000.0, -0.13779029068464071], [1443337920.0, 97300000.0, -0.12533323356430062], [1443339648.0, 97320000.0, -0.11285638487348588], [1443341376.0, 97340000.0, -0.10036171485121988], [1443343104.0, 97360000.0, -0.08785119655074188], [1443344832.0, 97380000.0, -0.07532680552793929], [1443346560.0, 97400000.0, -0.06279051952931362], [1443348288.0, 97420000.0, -0.0502443181797635], [1443350016.0, 97440000.0, -0.03769018266993635], [1443351744.0, 97460000.0, -0.025130095443340078], [1443353472.0, 97480000.0, -0.01256603988335599], [1443355200.0, 97500000.0, -4.164418097737589e-15], [1443356928.0, 97520000.0, 0.012566039883354767], [1443358656.0, 97540000.0, 0.02513009544333175], [1443360384.0, 97560000.0, 0.03769018266993513], [1443362112.0, 97580000.0, 0.05024431817976227], [1443363840.0, 97600000.0, 0.0627905195293124], [1443365568.0, 97620000.0, 0.07532680552793097], [1443367296.0, 97640000.0, 0.08785119655073358], [1443369024.0, 97660000.0, 0.10036171485121159], [1443370752.0, 97680000.0, 0.11285638487348468], [1443372480.0, 97700000.0, 0.1253332335642994], [1443374208.0, 97720000.0, 0.1377902906846395], [1443375936.0, 97740000.0, 0.15022558912075068], [1443377664.0, 97760000.0, 0.16263716519488347], [1443379392.0, 97780000.0, 0.17502305897527515], [1443381120.0, 97800000.0, 0.187381314585716], [1443382848.0, 97820000.0, 0.1997099805144046], [1443384576.0, 97840000.0, 0.2120071099220584], [1443386304.0, 97860000.0, 0.22427076094937726], [1443388032.0, 97880000.0, 0.2364989970237269], [1443389760.0, 97900000.0, 0.24868988716484938], [1443391488.0, 97920000.0, 0.2608415062898908], [1443393216.0, 97940000.0, 0.27295193551732516], [1443394944.0, 97960000.0, 0.2850192624699685], [1443396672.0, 97980000.0, 0.29704158157703336], [1443398400.0, 98000000.0, 0.3090169943749519], [1443400128.0, 98020000.0, 0.3209436098072065], [1443401856.0, 98040000.0, 0.3328195445229829], [1443403584.0, 98060000.0, 0.3446429231745126], 
[1443405312.0, 98080000.0, 0.35641187871324553], [1443407040.0, 98100000.0, 0.3681245526846787], [1443408768.0, 98120000.0, 0.37977909552179456], [1443410496.0, 98140000.0, 0.3913736668372017], [1443412224.0, 98160000.0, 0.40290643571366774], [1443413952.0, 98180000.0, 0.41437558099328203], [1443415680.0, 98200000.0, 0.42577929156506983], [1443417408.0, 98220000.0, 0.437115766650923], [1443419136.0, 98240000.0, 0.44838321609002807], [1443420864.0, 98260000.0, 0.4595798606214893], [1443422592.0, 98280000.0, 0.4707039321653271], [1443424320.0, 98300000.0, 0.4817536741017153], [1443426048.0, 98320000.0, 0.49272734154829095], [1443427776.0, 98340000.0, 0.5036232016357596], [1443429504.0, 98360000.0, 0.5144395337815045], [1443431232.0, 98380000.0, 0.525174629961287], [1443432960.0, 98400000.0, 0.5358267949789934], [1443434688.0, 98420000.0, 0.5463943467342711], [1443436416.0, 98440000.0, 0.5568756164881835], [1443438144.0, 98460000.0, 0.5672689491267573], [1443439872.0, 98480000.0, 0.5775727034222677], [1443441600.0, 98500000.0, 0.5877852522924669], [1443443328.0, 98520000.0, 0.5979049830575177], [1443445056.0, 98540000.0, 0.607930297694598], [1443446784.0, 98560000.0, 0.617859613090332], [1443448512.0, 98580000.0, 0.6276913612907031], [1443450240.0, 98600000.0, 0.6374239897486862], [1443451968.0, 98620000.0, 0.6470559615694402], [1443453696.0, 98640000.0, 0.6565857557529572], [1443455424.0, 98660000.0, 0.6660118674342465], [1443457152.0, 98680000.0, 0.675332808121024], [1443458880.0, 98700000.0, 0.6845471059286825], [1443460608.0, 98720000.0, 0.6936533058128034], [1443462336.0, 98740000.0, 0.7026499697988522], [1443464064.0, 98760000.0, 0.7115356772092828], [1443465792.0, 98780000.0, 0.7203090248879038], [1443467520.0, 98800000.0, 0.7289686274214128], [1443469248.0, 98820000.0, 0.7375131173581698], [1443470976.0, 98840000.0, 0.7459411454241823], [1443472704.0, 98860000.0, 0.7542513807360988], [1443474432.0, 98880000.0, 0.762442511011447], [1443476160.0, 98900000.0, 0.7705132427757879], [1443477888.0, 98920000.0, 0.7784623015670217], [1443479616.0, 98940000.0, 0.7862884321366167], [1443481344.0, 98960000.0, 0.7939903986478369], [1443483072.0, 98980000.0, 0.8015669848708734], [1443484800.0, 99000000.0, 0.8090169943749481], [1443486528.0, 99020000.0, 0.81633925071718], [1443488256.0, 99040000.0, 0.8235325976284272], [1443489984.0, 99060000.0, 0.830595899195812], [1443491712.0, 99080000.0, 0.8375280400421368], [1443493440.0, 99100000.0, 0.8443279255020136], [1443495168.0, 99120000.0, 0.8509944817946937], [1443496896.0, 99140000.0, 0.85752665619365], [1443498624.0, 99160000.0, 0.8639234171928363], [1443500352.0, 99180000.0, 0.8701837546695228], [1443502080.0, 99200000.0, 0.8763066800438638], [1443503808.0, 99220000.0, 0.8822912264349531], [1443505536.0, 99240000.0, 0.8881364488135407], [1443507264.0, 99260000.0, 0.8938414241512629], [1443508992.0, 99280000.0, 0.899405251566373], [1443510720.0, 99300000.0, 0.904827052466018], [1443512448.0, 99320000.0, 0.9101059706849969], [1443514176.0, 99340000.0, 0.9152411726209155], [1443515904.0, 99360000.0, 0.920231847365868], [1443517632.0, 99380000.0, 0.9250772068344583], [1443519360.0, 99400000.0, 0.9297764858882487], [1443521088.0, 99420000.0, 0.9343289424566117], [1443522816.0, 99440000.0, 0.9387338576538758], [1443524544.0, 99460000.0, 0.9429905358928636], [1443526272.0, 99480000.0, 0.9470983049947432], [1443528000.0, 99500000.0, 0.9510565162951523], [1443529728.0, 99520000.0, 0.9548645447466415], [1443531456.0, 99540000.0, 0.9585217890173763], 
[1443533184.0, 99560000.0, 0.9620276715860842], [1443534912.0, 99580000.0, 0.9653816388332738], [1443536640.0, 99600000.0, 0.9685831611286326], [1443538368.0, 99620000.0, 0.9716317329146735], [1443540096.0, 99640000.0, 0.9745268727865766], [1443541824.0, 99660000.0, 0.9772681235681927], [1443543552.0, 99680000.0, 0.9798550523842461], [1443545280.0, 99700000.0, 0.982287250728689], [1443547008.0, 99720000.0, 0.9845643345292043], [1443548736.0, 99740000.0, 0.9866859442078681], [1443550464.0, 99760000.0, 0.988651744737914], [1443552192.0, 99780000.0, 0.9904614256966511], [1443553920.0, 99800000.0, 0.9921147013144777], [1443555648.0, 99820000.0, 0.9936113105200073], [1443557376.0, 99840000.0, 0.9949510169812998], [1443559104.0, 99860000.0, 0.9961336091431727], [1443560832.0, 99880000.0, 0.9971589002606136], [1443562560.0, 99900000.0, 0.9980267284282717], [1443564288.0, 99920000.0, 0.9987369566060176], [1443566016.0, 99940000.0, 0.999289472640589], [1443567744.0, 99960000.0, 0.9996841892832999], [1443569472.0, 99980000.0, 0.999921044203816]] \ No newline at end of file
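(The demo series above, like the cubic.json file added next, is a flat JSON array of [wall_time, step, value] triples. Below is a minimal sketch, not taken from the TensorFlow sources, of how such a series could be regenerated; the start time, step spacing, seconds-per-step, point count, and the cosine/cubic value functions are assumptions inferred from the samples visible in this diff.)

```python
# Hedged sketch: rebuild demo scalar series in the [wall_time, step, value]
# format used by these TensorBoard demo JSON files. All constants below are
# guesses inferred from the data shown in the diff, not authoritative values.
import json
import math

START_WALL_TIME = 1434931200.0  # assumed: first timestamp seen in cubic.json
STEP_INCREMENT = 20000          # assumed: steps advance by 20000 per sample
SECONDS_PER_STEP = 0.0864       # assumed: timestamps advance 1728 s per 20000 steps


def make_series(fn, num_points=5000):
    """Return a list of [wall_time, step, value] triples."""
    series = []
    for i in range(num_points):
        step = i * STEP_INCREMENT
        wall_time = START_WALL_TIME + step * SECONDS_PER_STEP
        series.append([wall_time, float(step), fn(step)])
    return series


# Value functions that roughly reproduce the displayed samples; both are
# illustrative guesses (e.g. the visible sinusoid matches cos(2*pi*step/1e7)).
sinusoid = lambda step: math.cos(2 * math.pi * step / 1e7)
cubic = lambda step: -2.0 * (step / 1e6) + 0.001 * (step / 1e6) ** 3

if __name__ == "__main__":
    # Writes a single-line JSON array, matching the one-line file in the diff.
    with open("cubic.json", "w") as f:
        json.dump(make_series(cubic), f)
```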
diff --git a/tensorflow/tensorboard/app/demo/data/cubic.json b/tensorflow/tensorboard/app/demo/data/cubic.json
new file mode 100644
index 0000000000..67dc173187
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/cubic.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 0.0], [1434932928.0, 20000.0, -0.039999992], [1434934656.0, 40000.0, -0.07999993600000001], [1434936384.0, 60000.0, -0.119999784], [1434938112.0, 80000.0, -0.159999488], [1434939840.0, 100000.0, -0.199999], [1434941568.0, 120000.0, -0.23999827199999998], [1434943296.0, 140000.0, -0.279997256], [1434945024.0, 160000.0, -0.319995904], [1434946752.0, 180000.0, -0.359994168], [1434948480.0, 200000.0, -0.399992], [1434950208.0, 220000.0, -0.43998935199999994], [1434951936.0, 240000.0, -0.479986176], [1434953664.0, 260000.0, -0.519982424], [1434955392.0, 280000.0, -0.5599780480000001], [1434957120.0, 300000.0, -0.599973], [1434958848.0, 320000.0, -0.639967232], [1434960576.0, 340000.0, -0.6799606960000001], [1434962304.0, 360000.0, -0.719953344], [1434964032.0, 380000.0, -0.759945128], [1434965760.0, 400000.0, -0.7999360000000001], [1434967488.0, 420000.0, -0.8399259120000001], [1434969216.0, 440000.0, -0.8799148159999999], [1434970944.0, 460000.0, -0.919902664], [1434972672.0, 480000.0, -0.959889408], [1434974400.0, 500000.0, -0.999875], [1434976128.0, 520000.0, -1.039859392], [1434977856.0, 540000.0, -1.0798425360000001], [1434979584.0, 560000.0, -1.1198243840000002], [1434981312.0, 580000.0, -1.159804888], [1434983040.0, 600000.0, -1.199784], [1434984768.0, 620000.0, -1.239761672], [1434986496.0, 640000.0, -1.2797378560000001], [1434988224.0, 660000.0, -1.319712504], [1434989952.0, 680000.0, -1.3596855680000002], [1434991680.0, 700000.0, -1.3996570000000002], [1434993408.0, 720000.0, -1.439626752], [1434995136.0, 740000.0, -1.479594776], [1434996864.0, 760000.0, -1.519561024], [1434998592.0, 780000.0, -1.559525448], [1435000320.0, 800000.0, -1.599488], [1435002048.0, 820000.0, -1.6394486320000001], [1435003776.0, 840000.0, -1.6794072960000002], [1435005504.0, 860000.0, -1.7193639439999997], [1435007232.0, 880000.0, -1.7593185279999999], [1435008960.0, 900000.0, -1.7992709999999998], [1435010688.0, 920000.0, -1.8392213119999998], [1435012416.0, 940000.0, -1.8791694159999999], [1435014144.0, 960000.0, -1.919115264], [1435015872.0, 980000.0, -1.959058808], [1435017600.0, 1000000.0, -1.999], [1435019328.0, 1020000.0, -2.038938792], [1435021056.0, 1040000.0, -2.078875136], [1435022784.0, 1060000.0, -2.118808984], [1435024512.0, 1080000.0, -2.158740288], [1435026240.0, 1100000.0, -2.198669], [1435027968.0, 1120000.0, -2.2385950720000003], [1435029696.0, 1140000.0, -2.2785184560000005], [1435031424.0, 1160000.0, -2.3184391040000003], [1435033152.0, 1180000.0, -2.358356968], [1435034880.0, 1200000.0, -2.398272], [1435036608.0, 1220000.0, -2.438184152], [1435038336.0, 1240000.0, -2.478093376], [1435040064.0, 1260000.0, -2.517999624], [1435041792.0, 1280000.0, -2.557902848], [1435043520.0, 1300000.0, -2.5978030000000003], [1435045248.0, 1320000.0, -2.637700032], [1435046976.0, 1340000.0, -2.6775938960000003], [1435048704.0, 1360000.0, -2.7174845440000004], [1435050432.0, 1380000.0, -2.7573719280000004], [1435052160.0, 1400000.0, -2.7972560000000004], [1435053888.0, 1420000.0, -2.837136712], [1435055616.0, 1440000.0, -2.877014016], [1435057344.0, 1460000.0, -2.916887864], [1435059072.0, 1480000.0, -2.956758208], [1435060800.0, 1500000.0, -2.996625], [1435062528.0, 1520000.0, -3.036488192], [1435064256.0, 1540000.0, -3.076347736], [1435065984.0, 1560000.0, -3.116203584], [1435067712.0, 1580000.0, -3.1560556880000004], [1435069440.0, 1600000.0, -3.195904], [1435071168.0, 1620000.0, -3.235748472], [1435072896.0, 1640000.0, -3.2755890560000003], [1435074624.0, 1660000.0, 
-3.3154257040000004], [1435076352.0, 1680000.0, -3.3552583680000003], [1435078080.0, 1700000.0, -3.395087], [1435079808.0, 1720000.0, -3.4349115519999995], [1435081536.0, 1740000.0, -3.4747319759999997], [1435083264.0, 1760000.0, -3.5145482239999994], [1435084992.0, 1780000.0, -3.5543602479999996], [1435086720.0, 1800000.0, -3.594168], [1435088448.0, 1820000.0, -3.6339714319999996], [1435090176.0, 1840000.0, -3.6737704959999995], [1435091904.0, 1860000.0, -3.713565144], [1435093632.0, 1880000.0, -3.7533553279999996], [1435095360.0, 1900000.0, -3.793141], [1435097088.0, 1920000.0, -3.832922112], [1435098816.0, 1940000.0, -3.872698616], [1435100544.0, 1960000.0, -3.912470464], [1435102272.0, 1980000.0, -3.952237608], [1435104000.0, 2000000.0, -3.992], [1435105728.0, 2020000.0, -4.031757592], [1435107456.0, 2040000.0, -4.071510336], [1435109184.0, 2060000.0, -4.111258184], [1435110912.0, 2080000.0, -4.151001088], [1435112640.0, 2100000.0, -4.190739], [1435114368.0, 2120000.0, -4.230471872], [1435116096.0, 2140000.0, -4.270199656], [1435117824.0, 2160000.0, -4.3099223040000005], [1435119552.0, 2180000.0, -4.349639768], [1435121280.0, 2200000.0, -4.389352000000001], [1435123008.0, 2220000.0, -4.429058952], [1435124736.0, 2240000.0, -4.468760576], [1435126464.0, 2260000.0, -4.5084568240000005], [1435128192.0, 2280000.0, -4.5481476480000005], [1435129920.0, 2300000.0, -4.587833000000001], [1435131648.0, 2320000.0, -4.627512832000001], [1435133376.0, 2340000.0, -4.667187096000001], [1435135104.0, 2360000.0, -4.706855743999999], [1435136832.0, 2380000.0, -4.746518728], [1435138560.0, 2400000.0, -4.786176], [1435140288.0, 2420000.0, -4.825827512], [1435142016.0, 2440000.0, -4.865473216], [1435143744.0, 2460000.0, -4.905113064], [1435145472.0, 2480000.0, -4.944747008], [1435147200.0, 2500000.0, -4.984375], [1435148928.0, 2520000.0, -5.023996992], [1435150656.0, 2540000.0, -5.063612936], [1435152384.0, 2560000.0, -5.103222784], [1435154112.0, 2580000.0, -5.142826488], [1435155840.0, 2600000.0, -5.182424], [1435157568.0, 2620000.0, -5.222015272], [1435159296.0, 2640000.0, -5.261600256], [1435161024.0, 2660000.0, -5.301178904], [1435162752.0, 2680000.0, -5.340751168000001], [1435164480.0, 2700000.0, -5.380317000000001], [1435166208.0, 2720000.0, -5.419876352], [1435167936.0, 2740000.0, -5.459429176], [1435169664.0, 2760000.0, -5.498975424], [1435171392.0, 2780000.0, -5.538515048000001], [1435173120.0, 2800000.0, -5.578048000000001], [1435174848.0, 2820000.0, -5.617574232], [1435176576.0, 2840000.0, -5.657093696], [1435178304.0, 2860000.0, -5.696606344], [1435180032.0, 2880000.0, -5.736112127999999], [1435181760.0, 2900000.0, -5.775611], [1435183488.0, 2920000.0, -5.8151029119999995], [1435185216.0, 2940000.0, -5.854587816], [1435186944.0, 2960000.0, -5.894065664], [1435188672.0, 2980000.0, -5.933536408], [1435190400.0, 3000000.0, -5.973], [1435192128.0, 3020000.0, -6.012456392], [1435193856.0, 3040000.0, -6.0519055360000005], [1435195584.0, 3060000.0, -6.0913473840000005], [1435197312.0, 3080000.0, -6.1307818880000005], [1435199040.0, 3100000.0, -6.170209], [1435200768.0, 3120000.0, -6.209628672], [1435202496.0, 3140000.0, -6.249040856000001], [1435204224.0, 3160000.0, -6.288445504], [1435205952.0, 3180000.0, -6.327842568], [1435207680.0, 3200000.0, -6.3672320000000004], [1435209408.0, 3220000.0, -6.406613752], [1435211136.0, 3240000.0, -6.445987776000001], [1435212864.0, 3260000.0, -6.485354024], [1435214592.0, 3280000.0, -6.524712448000001], [1435216320.0, 3300000.0, -6.564063000000001], 
[1435218048.0, 3320000.0, -6.603405632], [1435219776.0, 3340000.0, -6.642740296], [1435221504.0, 3360000.0, -6.682066944000001], [1435223232.0, 3380000.0, -6.721385528000001], [1435224960.0, 3400000.0, -6.760696], [1435226688.0, 3420000.0, -6.7999983120000005], [1435228416.0, 3440000.0, -6.839292415999999], [1435230144.0, 3460000.0, -6.878578264], [1435231872.0, 3480000.0, -6.917855807999999], [1435233600.0, 3500000.0, -6.9571250000000004], [1435235328.0, 3520000.0, -6.996385791999999], [1435237056.0, 3540000.0, -7.035638136], [1435238784.0, 3560000.0, -7.074881983999999], [1435240512.0, 3580000.0, -7.114117288], [1435242240.0, 3600000.0, -7.153344], [1435243968.0, 3620000.0, -7.192562072], [1435245696.0, 3640000.0, -7.231771456], [1435247424.0, 3660000.0, -7.270972104], [1435249152.0, 3680000.0, -7.3101639679999995], [1435250880.0, 3700000.0, -7.349347000000001], [1435252608.0, 3720000.0, -7.388521151999999], [1435254336.0, 3740000.0, -7.4276863760000005], [1435256064.0, 3760000.0, -7.466842624], [1435257792.0, 3780000.0, -7.5059898480000005], [1435259520.0, 3800000.0, -7.545128], [1435261248.0, 3820000.0, -7.584257032000001], [1435262976.0, 3840000.0, -7.623376896], [1435264704.0, 3860000.0, -7.662487544], [1435266432.0, 3880000.0, -7.701588928], [1435268160.0, 3900000.0, -7.740681], [1435269888.0, 3920000.0, -7.779763712], [1435271616.0, 3940000.0, -7.818837016000001], [1435273344.0, 3960000.0, -7.857900864], [1435275072.0, 3980000.0, -7.8969552080000005], [1435276800.0, 4000000.0, -7.936], [1435278528.0, 4020000.0, -7.975035192000001], [1435280256.0, 4040000.0, -8.014060736], [1435281984.0, 4060000.0, -8.053076584000001], [1435283712.0, 4080000.0, -8.092082688], [1435285440.0, 4100000.0, -8.131079], [1435287168.0, 4120000.0, -8.170065472000001], [1435288896.0, 4140000.0, -8.209042056], [1435290624.0, 4160000.0, -8.248008704], [1435292352.0, 4180000.0, -8.286965367999999], [1435294080.0, 4200000.0, -8.325912], [1435295808.0, 4220000.0, -8.364848552], [1435297536.0, 4240000.0, -8.403774976000001], [1435299264.0, 4260000.0, -8.442691223999999], [1435300992.0, 4280000.0, -8.481597248], [1435302720.0, 4300000.0, -8.520493], [1435304448.0, 4320000.0, -8.559378432], [1435306176.0, 4340000.0, -8.598253496], [1435307904.0, 4360000.0, -8.637118144], [1435309632.0, 4380000.0, -8.675972328], [1435311360.0, 4400000.0, -8.714816], [1435313088.0, 4420000.0, -8.753649112], [1435314816.0, 4440000.0, -8.792471616], [1435316544.0, 4460000.0, -8.831283464], [1435318272.0, 4480000.0, -8.870084608], [1435320000.0, 4500000.0, -8.908875], [1435321728.0, 4520000.0, -8.947654592000001], [1435323456.0, 4540000.0, -8.986423336], [1435325184.0, 4560000.0, -9.025181184000001], [1435326912.0, 4580000.0, -9.063928088], [1435328640.0, 4600000.0, -9.102664], [1435330368.0, 4620000.0, -9.141388872], [1435332096.0, 4640000.0, -9.180102656], [1435333824.0, 4660000.0, -9.218805304], [1435335552.0, 4680000.0, -9.257496768000001], [1435337280.0, 4700000.0, -9.296176999999998], [1435339008.0, 4720000.0, -9.334845952], [1435340736.0, 4740000.0, -9.373503576000001], [1435342464.0, 4760000.0, -9.412149824], [1435344192.0, 4780000.0, -9.450784647999999], [1435345920.0, 4800000.0, -9.489408], [1435347648.0, 4820000.0, -9.528019832], [1435349376.0, 4840000.0, -9.566620096], [1435351104.0, 4860000.0, -9.605208743999999], [1435352832.0, 4880000.0, -9.643785728], [1435354560.0, 4900000.0, -9.682351], [1435356288.0, 4920000.0, -9.720904512], [1435358016.0, 4940000.0, -9.759446215999999], [1435359744.0, 4960000.0, -9.797976064], 
[1435361472.0, 4980000.0, -9.836494008], [1435363200.0, 5000000.0, -9.875], [1435364928.0, 5020000.0, -9.913493992], [1435366656.0, 5040000.0, -9.951975936], [1435368384.0, 5060000.0, -9.990445784], [1435370112.0, 5080000.0, -10.028903488000001], [1435371840.0, 5100000.0, -10.067349], [1435373568.0, 5120000.0, -10.105782272], [1435375296.0, 5140000.0, -10.144203256], [1435377024.0, 5160000.0, -10.182611904], [1435378752.0, 5180000.0, -10.221008168], [1435380480.0, 5200000.0, -10.259392], [1435382208.0, 5220000.0, -10.297763352], [1435383936.0, 5240000.0, -10.336122176], [1435385664.0, 5260000.0, -10.374468424], [1435387392.0, 5280000.0, -10.412802048], [1435389120.0, 5300000.0, -10.451123], [1435390848.0, 5320000.0, -10.489431232000001], [1435392576.0, 5340000.0, -10.527726696], [1435394304.0, 5360000.0, -10.566009344000001], [1435396032.0, 5380000.0, -10.604279128000002], [1435397760.0, 5400000.0, -10.642536000000002], [1435399488.0, 5420000.0, -10.680779912], [1435401216.0, 5440000.0, -10.719010816], [1435402944.0, 5460000.0, -10.757228664000001], [1435404672.0, 5480000.0, -10.795433408000001], [1435406400.0, 5500000.0, -10.833625], [1435408128.0, 5520000.0, -10.871803392], [1435409856.0, 5540000.0, -10.909968536000001], [1435411584.0, 5560000.0, -10.948120384000001], [1435413312.0, 5580000.0, -10.986258888], [1435415040.0, 5600000.0, -11.024384000000001], [1435416768.0, 5620000.0, -11.062495672000003], [1435418496.0, 5640000.0, -11.100593856], [1435420224.0, 5660000.0, -11.138678503999998], [1435421952.0, 5680000.0, -11.176749568], [1435423680.0, 5700000.0, -11.214806999999999], [1435425408.0, 5720000.0, -11.252850751999999], [1435427136.0, 5740000.0, -11.290880775999998], [1435428864.0, 5760000.0, -11.328897024], [1435430592.0, 5780000.0, -11.366899447999998], [1435432320.0, 5800000.0, -11.404888], [1435434048.0, 5820000.0, -11.442862631999999], [1435435776.0, 5840000.0, -11.480823296], [1435437504.0, 5860000.0, -11.518769943999999], [1435439232.0, 5880000.0, -11.556702528], [1435440960.0, 5900000.0, -11.594620999999998], [1435442688.0, 5920000.0, -11.632525312], [1435444416.0, 5940000.0, -11.670415415999999], [1435446144.0, 5960000.0, -11.708291264], [1435447872.0, 5980000.0, -11.746152808], [1435449600.0, 6000000.0, -11.784], [1435451328.0, 6020000.0, -11.821832791999999], [1435453056.0, 6040000.0, -11.859651136], [1435454784.0, 6060000.0, -11.897454984], [1435456512.0, 6080000.0, -11.935244288], [1435458240.0, 6100000.0, -11.973018999999999], [1435459968.0, 6120000.0, -12.010779072], [1435461696.0, 6140000.0, -12.048524455999999], [1435463424.0, 6160000.0, -12.086255104000001], [1435465152.0, 6180000.0, -12.123970968], [1435466880.0, 6200000.0, -12.161672000000001], [1435468608.0, 6220000.0, -12.199358152], [1435470336.0, 6240000.0, -12.237029376], [1435472064.0, 6260000.0, -12.274685624], [1435473792.0, 6280000.0, -12.312326848], [1435475520.0, 6300000.0, -12.349953], [1435477248.0, 6320000.0, -12.387564032], [1435478976.0, 6340000.0, -12.425159896], [1435480704.0, 6360000.0, -12.462740544], [1435482432.0, 6380000.0, -12.500305928], [1435484160.0, 6400000.0, -12.537856000000001], [1435485888.0, 6420000.0, -12.575390711999999], [1435487616.0, 6440000.0, -12.612910016], [1435489344.0, 6460000.0, -12.650413864], [1435491072.0, 6480000.0, -12.687902208], [1435492800.0, 6500000.0, -12.725375], [1435494528.0, 6520000.0, -12.762832192000001], [1435496256.0, 6540000.0, -12.800273736], [1435497984.0, 6560000.0, -12.837699584000001], [1435499712.0, 6580000.0, -12.875109688], [1435501440.0, 
6600000.0, -12.912504], [1435503168.0, 6620000.0, -12.949882472], [1435504896.0, 6640000.0, -12.987245056], [1435506624.0, 6660000.0, -13.024591704], [1435508352.0, 6680000.0, -13.061922368000001], [1435510080.0, 6700000.0, -13.099237], [1435511808.0, 6720000.0, -13.136535552000002], [1435513536.0, 6740000.0, -13.173817976], [1435515264.0, 6760000.0, -13.211084224000002], [1435516992.0, 6780000.0, -13.248334248], [1435518720.0, 6800000.0, -13.285568000000001], [1435520448.0, 6820000.0, -13.322785432], [1435522176.0, 6840000.0, -13.359986496000001], [1435523904.0, 6860000.0, -13.397171144000001], [1435525632.0, 6880000.0, -13.434339327999998], [1435527360.0, 6900000.0, -13.471490999999999], [1435529088.0, 6920000.0, -13.508626112], [1435530816.0, 6940000.0, -13.545744615999999], [1435532544.0, 6960000.0, -13.582846463999998], [1435534272.0, 6980000.0, -13.619931608], [1435536000.0, 7000000.0, -13.657], [1435537728.0, 7020000.0, -13.694051592], [1435539456.0, 7040000.0, -13.731086335999999], [1435541184.0, 7060000.0, -13.768104183999998], [1435542912.0, 7080000.0, -13.805105088], [1435544640.0, 7100000.0, -13.842089], [1435546368.0, 7120000.0, -13.879055871999999], [1435548096.0, 7140000.0, -13.916005656], [1435549824.0, 7160000.0, -13.952938304], [1435551552.0, 7180000.0, -13.989853768], [1435553280.0, 7200000.0, -14.026751999999998], [1435555008.0, 7220000.0, -14.063632951999999], [1435556736.0, 7240000.0, -14.100496576000001], [1435558464.0, 7260000.0, -14.137342824], [1435560192.0, 7280000.0, -14.174171647999998], [1435561920.0, 7300000.0, -14.210982999999999], [1435563648.0, 7320000.0, -14.247776832000001], [1435565376.0, 7340000.0, -14.284553096], [1435567104.0, 7360000.0, -14.321311743999999], [1435568832.0, 7380000.0, -14.358052728], [1435570560.0, 7400000.0, -14.394776], [1435572288.0, 7420000.0, -14.431481512], [1435574016.0, 7440000.0, -14.468169216], [1435575744.0, 7460000.0, -14.504839064], [1435577472.0, 7480000.0, -14.541491008000001], [1435579200.0, 7500000.0, -14.578125], [1435580928.0, 7520000.0, -14.614740992], [1435582656.0, 7540000.0, -14.651338936], [1435584384.0, 7560000.0, -14.687918784], [1435586112.0, 7580000.0, -14.724480488], [1435587840.0, 7600000.0, -14.761023999999999], [1435589568.0, 7620000.0, -14.797549272], [1435591296.0, 7640000.0, -14.834056256000002], [1435593024.0, 7660000.0, -14.870544904], [1435594752.0, 7680000.0, -14.907015168], [1435596480.0, 7700000.0, -14.943467], [1435598208.0, 7720000.0, -14.979900352000001], [1435599936.0, 7740000.0, -15.016315176], [1435601664.0, 7760000.0, -15.052711424], [1435603392.0, 7780000.0, -15.089089048], [1435605120.0, 7800000.0, -15.125448000000002], [1435606848.0, 7820000.0, -15.161788232000001], [1435608576.0, 7840000.0, -15.198109696], [1435610304.0, 7860000.0, -15.234412344], [1435612032.0, 7880000.0, -15.270696128], [1435613760.0, 7900000.0, -15.306961000000001], [1435615488.0, 7920000.0, -15.343206912], [1435617216.0, 7940000.0, -15.379433816], [1435618944.0, 7960000.0, -15.415641664000002], [1435620672.0, 7980000.0, -15.451830408000001], [1435622400.0, 8000000.0, -15.488], [1435624128.0, 8020000.0, -15.524150392], [1435625856.0, 8040000.0, -15.560281536000002], [1435627584.0, 8060000.0, -15.596393384], [1435629312.0, 8080000.0, -15.632485888], [1435631040.0, 8100000.0, -15.668559000000002], [1435632768.0, 8120000.0, -15.704612672000001], [1435634496.0, 8140000.0, -15.740646855999998], [1435636224.0, 8160000.0, -15.776661504], [1435637952.0, 8180000.0, -15.812656568], [1435639680.0, 8200000.0, 
-15.848631999999998], [1435641408.0, 8220000.0, -15.884587751999998], [1435643136.0, 8240000.0, -15.920523776000001], [1435644864.0, 8260000.0, -15.956440023999999], [1435646592.0, 8280000.0, -15.992336448], [1435648320.0, 8300000.0, -16.028212999999997], [1435650048.0, 8320000.0, -16.064069632], [1435651776.0, 8340000.0, -16.099906296], [1435653504.0, 8360000.0, -16.135722943999998], [1435655232.0, 8380000.0, -16.171519527999997], [1435656960.0, 8400000.0, -16.207296], [1435658688.0, 8420000.0, -16.243052312], [1435660416.0, 8440000.0, -16.278788415999998], [1435662144.0, 8460000.0, -16.314504264], [1435663872.0, 8480000.0, -16.350199808], [1435665600.0, 8500000.0, -16.385875], [1435667328.0, 8520000.0, -16.421529791999998], [1435669056.0, 8540000.0, -16.457164136], [1435670784.0, 8560000.0, -16.492777984], [1435672512.0, 8580000.0, -16.528371288], [1435674240.0, 8600000.0, -16.563944], [1435675968.0, 8620000.0, -16.599496071999997], [1435677696.0, 8640000.0, -16.635027456], [1435679424.0, 8660000.0, -16.670538104000002], [1435681152.0, 8680000.0, -16.706027968], [1435682880.0, 8700000.0, -16.741497], [1435684608.0, 8720000.0, -16.776945152], [1435686336.0, 8740000.0, -16.812372376], [1435688064.0, 8760000.0, -16.847778624], [1435689792.0, 8780000.0, -16.883163848], [1435691520.0, 8800000.0, -16.918528000000002], [1435693248.0, 8820000.0, -16.953871032000002], [1435694976.0, 8840000.0, -16.989192896], [1435696704.0, 8860000.0, -17.024493544], [1435698432.0, 8880000.0, -17.059772928], [1435700160.0, 8900000.0, -17.095031000000002], [1435701888.0, 8920000.0, -17.130267712], [1435703616.0, 8940000.0, -17.165483016], [1435705344.0, 8960000.0, -17.200676864000002], [1435707072.0, 8980000.0, -17.235849208], [1435708800.0, 9000000.0, -17.271], [1435710528.0, 9020000.0, -17.306129192], [1435712256.0, 9040000.0, -17.341236736000003], [1435713984.0, 9060000.0, -17.376322584], [1435715712.0, 9080000.0, -17.411386688], [1435717440.0, 9100000.0, -17.446429], [1435719168.0, 9120000.0, -17.481449472], [1435720896.0, 9140000.0, -17.516448056], [1435722624.0, 9160000.0, -17.551424704], [1435724352.0, 9180000.0, -17.586379368], [1435726080.0, 9200000.0, -17.621312000000003], [1435727808.0, 9220000.0, -17.656222552000003], [1435729536.0, 9240000.0, -17.691110976], [1435731264.0, 9260000.0, -17.725977224], [1435732992.0, 9280000.0, -17.760821248000003], [1435734720.0, 9300000.0, -17.795643000000002], [1435736448.0, 9320000.0, -17.830442432], [1435738176.0, 9340000.0, -17.865219495999998], [1435739904.0, 9360000.0, -17.899974144], [1435741632.0, 9380000.0, -17.934706327999997], [1435743360.0, 9400000.0, -17.969416], [1435745088.0, 9420000.0, -18.004103112], [1435746816.0, 9440000.0, -18.038767615999998], [1435748544.0, 9460000.0, -18.073409463999997], [1435750272.0, 9480000.0, -18.108028608], [1435752000.0, 9500000.0, -18.142625], [1435753728.0, 9520000.0, -18.177198592], [1435755456.0, 9540000.0, -18.211749335999997], [1435757184.0, 9560000.0, -18.246277183999997], [1435758912.0, 9580000.0, -18.280782088], [1435760640.0, 9600000.0, -18.315264], [1435762368.0, 9620000.0, -18.349722871999997], [1435764096.0, 9640000.0, -18.384158656], [1435765824.0, 9660000.0, -18.418571304], [1435767552.0, 9680000.0, -18.452960768], [1435769280.0, 9700000.0, -18.487326999999997], [1435771008.0, 9720000.0, -18.521669951999996], [1435772736.0, 9740000.0, -18.555989576], [1435774464.0, 9760000.0, -18.590285824], [1435776192.0, 9780000.0, -18.624558647999997], [1435777920.0, 9800000.0, -18.658808], [1435779648.0, 9820000.0, 
-18.693033832], [1435781376.0, 9840000.0, -18.727236096], [1435783104.0, 9860000.0, -18.761414744], [1435784832.0, 9880000.0, -18.795569727999997], [1435786560.0, 9900000.0, -18.829701], [1435788288.0, 9920000.0, -18.863808512], [1435790016.0, 9940000.0, -18.897892216], [1435791744.0, 9960000.0, -18.931952064], [1435793472.0, 9980000.0, -18.965988008], [1435795200.0, 10000000.0, -19.0], [1435796928.0, 10020000.0, -19.033987992], [1435798656.0, 10040000.0, -19.067951936], [1435800384.0, 10060000.0, -19.101891784000003], [1435802112.0, 10080000.0, -19.135807488], [1435803840.0, 10100000.0, -19.169698999999998], [1435805568.0, 10120000.0, -19.203566272000003], [1435807296.0, 10140000.0, -19.237409256], [1435809024.0, 10160000.0, -19.271227904], [1435810752.0, 10180000.0, -19.305022168], [1435812480.0, 10200000.0, -19.338791999999998], [1435814208.0, 10220000.0, -19.372537352000002], [1435815936.0, 10240000.0, -19.406258176], [1435817664.0, 10260000.0, -19.439954424], [1435819392.0, 10280000.0, -19.473626048000003], [1435821120.0, 10300000.0, -19.507273], [1435822848.0, 10320000.0, -19.540895232], [1435824576.0, 10340000.0, -19.574492696], [1435826304.0, 10360000.0, -19.608065344], [1435828032.0, 10380000.0, -19.641613128000003], [1435829760.0, 10400000.0, -19.675136000000002], [1435831488.0, 10420000.0, -19.708633912], [1435833216.0, 10440000.0, -19.742106816000003], [1435834944.0, 10460000.0, -19.775554664], [1435836672.0, 10480000.0, -19.808977408], [1435838400.0, 10500000.0, -19.842375], [1435840128.0, 10520000.0, -19.875747391999997], [1435841856.0, 10540000.0, -19.909094536], [1435843584.0, 10560000.0, -19.942416384], [1435845312.0, 10580000.0, -19.975712888], [1435847040.0, 10600000.0, -20.008984], [1435848768.0, 10620000.0, -20.042229672], [1435850496.0, 10640000.0, -20.075449856000002], [1435852224.0, 10660000.0, -20.108644504], [1435853952.0, 10680000.0, -20.141813568], [1435855680.0, 10700000.0, -20.174957000000003], [1435857408.0, 10720000.0, -20.208074752], [1435859136.0, 10740000.0, -20.241166776], [1435860864.0, 10760000.0, -20.274233024000004], [1435862592.0, 10780000.0, -20.307273448000004], [1435864320.0, 10800000.0, -20.340288], [1435866048.0, 10820000.0, -20.373276632], [1435867776.0, 10840000.0, -20.406239296], [1435869504.0, 10860000.0, -20.439175944000002], [1435871232.0, 10880000.0, -20.472086528000002], [1435872960.0, 10900000.0, -20.504971], [1435874688.0, 10920000.0, -20.537829312000003], [1435876416.0, 10940000.0, -20.570661416000004], [1435878144.0, 10960000.0, -20.603467264000003], [1435879872.0, 10980000.0, -20.636246808], [1435881600.0, 11000000.0, -20.669], [1435883328.0, 11020000.0, -20.701726792000002], [1435885056.0, 11040000.0, -20.734427136], [1435886784.0, 11060000.0, -20.767100984000002], [1435888512.0, 11080000.0, -20.799748288000004], [1435890240.0, 11100000.0, -20.832369000000003], [1435891968.0, 11120000.0, -20.864963072000002], [1435893696.0, 11140000.0, -20.897530456000002], [1435895424.0, 11160000.0, -20.930071104], [1435897152.0, 11180000.0, -20.962584968], [1435898880.0, 11200000.0, -20.995072], [1435900608.0, 11220000.0, -21.027532152], [1435902336.0, 11240000.0, -21.059965376000005], [1435904064.0, 11260000.0, -21.092371623999995], [1435905792.0, 11280000.0, -21.124750847999998], [1435907520.0, 11300000.0, -21.157103], [1435909248.0, 11320000.0, -21.189428032], [1435910976.0, 11340000.0, -21.221725896], [1435912704.0, 11360000.0, -21.253996544], [1435914432.0, 11380000.0, -21.286239927999997], [1435916160.0, 11400000.0, -21.318455999999998], 
[1435917888.0, 11420000.0, -21.350644711999998], [1435919616.0, 11440000.0, -21.382806016], [1435921344.0, 11460000.0, -21.414939863999997], [1435923072.0, 11480000.0, -21.447046207999996], [1435924800.0, 11500000.0, -21.479125], [1435926528.0, 11520000.0, -21.511176192], [1435928256.0, 11540000.0, -21.543199736], [1435929984.0, 11560000.0, -21.575195584], [1435931712.0, 11580000.0, -21.607163687999996], [1435933440.0, 11600000.0, -21.639104], [1435935168.0, 11620000.0, -21.671016471999998], [1435936896.0, 11640000.0, -21.702901055999998], [1435938624.0, 11660000.0, -21.734757704], [1435940352.0, 11680000.0, -21.766586368], [1435942080.0, 11700000.0, -21.798386999999998], [1435943808.0, 11720000.0, -21.830159551999998], [1435945536.0, 11740000.0, -21.861903975999997], [1435947264.0, 11760000.0, -21.893620224], [1435948992.0, 11780000.0, -21.925308248], [1435950720.0, 11800000.0, -21.956967999999996], [1435952448.0, 11820000.0, -21.988599432], [1435954176.0, 11840000.0, -22.020202496], [1435955904.0, 11860000.0, -22.051777144], [1435957632.0, 11880000.0, -22.083323328], [1435959360.0, 11900000.0, -22.114841], [1435961088.0, 11920000.0, -22.146330112], [1435962816.0, 11940000.0, -22.177790616], [1435964544.0, 11960000.0, -22.209222464], [1435966272.0, 11980000.0, -22.240625608000002], [1435968000.0, 12000000.0, -22.272], [1435969728.0, 12020000.0, -22.303345592], [1435971456.0, 12040000.0, -22.334662335999997], [1435973184.0, 12060000.0, -22.365950184], [1435974912.0, 12080000.0, -22.397209088], [1435976640.0, 12100000.0, -22.428439], [1435978368.0, 12120000.0, -22.459639871999997], [1435980096.0, 12140000.0, -22.490811656], [1435981824.0, 12160000.0, -22.521954304], [1435983552.0, 12180000.0, -22.553067768], [1435985280.0, 12200000.0, -22.584152], [1435987008.0, 12220000.0, -22.615206951999998], [1435988736.0, 12240000.0, -22.646232576], [1435990464.0, 12260000.0, -22.677228824], [1435992192.0, 12280000.0, -22.708195648], [1435993920.0, 12300000.0, -22.739133000000002], [1435995648.0, 12320000.0, -22.770040832], [1435997376.0, 12340000.0, -22.800919096], [1435999104.0, 12360000.0, -22.831767744], [1436000832.0, 12380000.0, -22.862586727999997], [1436002560.0, 12400000.0, -22.893376], [1436004288.0, 12420000.0, -22.924135512], [1436006016.0, 12440000.0, -22.954865215999998], [1436007744.0, 12460000.0, -22.985565064000003], [1436009472.0, 12480000.0, -23.016235008000002], [1436011200.0, 12500000.0, -23.046875], [1436012928.0, 12520000.0, -23.077484992], [1436014656.0, 12540000.0, -23.108064935999998], [1436016384.0, 12560000.0, -23.138614784], [1436018112.0, 12580000.0, -23.169134488], [1436019840.0, 12600000.0, -23.199624], [1436021568.0, 12620000.0, -23.230083272], [1436023296.0, 12640000.0, -23.260512256000002], [1436025024.0, 12660000.0, -23.290910904], [1436026752.0, 12680000.0, -23.321279168], [1436028480.0, 12700000.0, -23.351616999999997], [1436030208.0, 12720000.0, -23.381924352000002], [1436031936.0, 12740000.0, -23.412201176], [1436033664.0, 12760000.0, -23.442447424], [1436035392.0, 12780000.0, -23.472663048], [1436037120.0, 12800000.0, -23.502848], [1436038848.0, 12820000.0, -23.533002232], [1436040576.0, 12840000.0, -23.563125696], [1436042304.0, 12860000.0, -23.593218344], [1436044032.0, 12880000.0, -23.623280128], [1436045760.0, 12900000.0, -23.653311000000002], [1436047488.0, 12920000.0, -23.683310912], [1436049216.0, 12940000.0, -23.713279816000004], [1436050944.0, 12960000.0, -23.743217664000003], [1436052672.0, 12980000.0, -23.773124408], [1436054400.0, 13000000.0, 
-23.803], [1436056128.0, 13020000.0, -23.832844392], [1436057856.0, 13040000.0, -23.862657536], [1436059584.0, 13060000.0, -23.892439384], [1436061312.0, 13080000.0, -23.922189888], [1436063040.0, 13100000.0, -23.951909], [1436064768.0, 13120000.0, -23.981596672000002], [1436066496.0, 13140000.0, -24.011252856000002], [1436068224.0, 13160000.0, -24.040877504], [1436069952.0, 13180000.0, -24.070470567999998], [1436071680.0, 13200000.0, -24.100032000000002], [1436073408.0, 13220000.0, -24.129561752], [1436075136.0, 13240000.0, -24.159059776], [1436076864.0, 13260000.0, -24.188526024], [1436078592.0, 13280000.0, -24.217960448000003], [1436080320.0, 13300000.0, -24.247363], [1436082048.0, 13320000.0, -24.276733632], [1436083776.0, 13340000.0, -24.306072296], [1436085504.0, 13360000.0, -24.335378944000002], [1436087232.0, 13380000.0, -24.364653528], [1436088960.0, 13400000.0, -24.393896], [1436090688.0, 13420000.0, -24.423106312], [1436092416.0, 13440000.0, -24.452284416], [1436094144.0, 13460000.0, -24.481430264], [1436095872.0, 13480000.0, -24.510543808], [1436097600.0, 13500000.0, -24.539625], [1436099328.0, 13520000.0, -24.568673792000002], [1436101056.0, 13540000.0, -24.597690136], [1436102784.0, 13560000.0, -24.626673984], [1436104512.0, 13580000.0, -24.655625288000003], [1436106240.0, 13600000.0, -24.684544000000002], [1436107968.0, 13620000.0, -24.713430072], [1436109696.0, 13640000.0, -24.742283456], [1436111424.0, 13660000.0, -24.771104104], [1436113152.0, 13680000.0, -24.799891968000004], [1436114880.0, 13700000.0, -24.828647], [1436116608.0, 13720000.0, -24.857369152], [1436118336.0, 13740000.0, -24.886058376000005], [1436120064.0, 13760000.0, -24.914714624], [1436121792.0, 13780000.0, -24.943337848], [1436123520.0, 13800000.0, -24.971928], [1436125248.0, 13820000.0, -25.000485031999997], [1436126976.0, 13840000.0, -25.029008896], [1436128704.0, 13860000.0, -25.057499544], [1436130432.0, 13880000.0, -25.085956927999998], [1436132160.0, 13900000.0, -25.114380999999998], [1436133888.0, 13920000.0, -25.142771712], [1436135616.0, 13940000.0, -25.171129016], [1436137344.0, 13960000.0, -25.199452863999998], [1436139072.0, 13980000.0, -25.227743208], [1436140800.0, 14000000.0, -25.256], [1436142528.0, 14020000.0, -25.284223192], [1436144256.0, 14040000.0, -25.312412736], [1436145984.0, 14060000.0, -25.340568583999996], [1436147712.0, 14080000.0, -25.368690687999997], [1436149440.0, 14100000.0, -25.396779], [1436151168.0, 14120000.0, -25.424833472], [1436152896.0, 14140000.0, -25.452854056], [1436154624.0, 14160000.0, -25.480840704000002], [1436156352.0, 14180000.0, -25.508793368], [1436158080.0, 14200000.0, -25.536711999999998], [1436159808.0, 14220000.0, -25.564596551999998], [1436161536.0, 14240000.0, -25.592446975999998], [1436163264.0, 14260000.0, -25.620263224], [1436164992.0, 14280000.0, -25.648045248], [1436166720.0, 14300000.0, -25.675793], [1436168448.0, 14320000.0, -25.703506432], [1436170176.0, 14340000.0, -25.731185496000002], [1436171904.0, 14360000.0, -25.758830144], [1436173632.0, 14380000.0, -25.786440327999998], [1436175360.0, 14400000.0, -25.814016], [1436177088.0, 14420000.0, -25.841557112], [1436178816.0, 14440000.0, -25.869063616], [1436180544.0, 14460000.0, -25.896535464], [1436182272.0, 14480000.0, -25.923972608], [1436184000.0, 14500000.0, -25.951375], [1436185728.0, 14520000.0, -25.978742592], [1436187456.0, 14540000.0, -26.006075336], [1436189184.0, 14560000.0, -26.033373184], [1436190912.0, 14580000.0, -26.060636088000003], [1436192640.0, 14600000.0, -26.087864], 
[1436194368.0, 14620000.0, -26.115056871999997], [1436196096.0, 14640000.0, -26.142214656], [1436197824.0, 14660000.0, -26.169337304], [1436199552.0, 14680000.0, -26.196424768], [1436201280.0, 14700000.0, -26.223477], [1436203008.0, 14720000.0, -26.250493952], [1436204736.0, 14740000.0, -26.277475576], [1436206464.0, 14760000.0, -26.304421824], [1436208192.0, 14780000.0, -26.331332648], [1436209920.0, 14800000.0, -26.358208], [1436211648.0, 14820000.0, -26.385047832], [1436213376.0, 14840000.0, -26.411852096], [1436215104.0, 14860000.0, -26.438620743999998], [1436216832.0, 14880000.0, -26.465353727999997], [1436218560.0, 14900000.0, -26.492051], [1436220288.0, 14920000.0, -26.518712512], [1436222016.0, 14940000.0, -26.545338215999998], [1436223744.0, 14960000.0, -26.571928064], [1436225472.0, 14980000.0, -26.598482008], [1436227200.0, 15000000.0, -26.625], [1436228928.0, 15020000.0, -26.651481992], [1436230656.0, 15040000.0, -26.677927936], [1436232384.0, 15060000.0, -26.704337784], [1436234112.0, 15080000.0, -26.730711488], [1436235840.0, 15100000.0, -26.757049], [1436237568.0, 15120000.0, -26.783350272], [1436239296.0, 15140000.0, -26.809615256], [1436241024.0, 15160000.0, -26.835843904], [1436242752.0, 15180000.0, -26.862036168], [1436244480.0, 15200000.0, -26.888191999999997], [1436246208.0, 15220000.0, -26.914311352000002], [1436247936.0, 15240000.0, -26.940394175999998], [1436249664.0, 15260000.0, -26.966440423999998], [1436251392.0, 15280000.0, -26.992450048000002], [1436253120.0, 15300000.0, -27.018423000000002], [1436254848.0, 15320000.0, -27.044359232], [1436256576.0, 15340000.0, -27.070258696], [1436258304.0, 15360000.0, -27.096121344], [1436260032.0, 15380000.0, -27.121947128000002], [1436261760.0, 15400000.0, -27.147736000000002], [1436263488.0, 15420000.0, -27.173487912], [1436265216.0, 15440000.0, -27.199202816000003], [1436266944.0, 15460000.0, -27.224880664], [1436268672.0, 15480000.0, -27.250521408], [1436270400.0, 15500000.0, -27.276125], [1436272128.0, 15520000.0, -27.301691392], [1436273856.0, 15540000.0, -27.327220536000002], [1436275584.0, 15560000.0, -27.352712384], [1436277312.0, 15580000.0, -27.378166888], [1436279040.0, 15600000.0, -27.403584000000002], [1436280768.0, 15620000.0, -27.428963672000002], [1436282496.0, 15640000.0, -27.454305856], [1436284224.0, 15660000.0, -27.479610504], [1436285952.0, 15680000.0, -27.504877567999998], [1436287680.0, 15700000.0, -27.530107], [1436289408.0, 15720000.0, -27.555298752000002], [1436291136.0, 15740000.0, -27.580452776], [1436292864.0, 15760000.0, -27.605569024], [1436294592.0, 15780000.0, -27.630647448], [1436296320.0, 15800000.0, -27.655688], [1436298048.0, 15820000.0, -27.680690632], [1436299776.0, 15840000.0, -27.705655296], [1436301504.0, 15860000.0, -27.730581944], [1436303232.0, 15880000.0, -27.755470528], [1436304960.0, 15900000.0, -27.780321], [1436306688.0, 15920000.0, -27.805133312000002], [1436308416.0, 15940000.0, -27.829907416], [1436310144.0, 15960000.0, -27.854643264000003], [1436311872.0, 15980000.0, -27.879340808000002], [1436313600.0, 16000000.0, -27.904], [1436315328.0, 16020000.0, -27.928620791999997], [1436317056.0, 16040000.0, -27.953203136], [1436318784.0, 16060000.0, -27.977746984000003], [1436320512.0, 16080000.0, -28.002252288], [1436322240.0, 16100000.0, -28.026719], [1436323968.0, 16120000.0, -28.051147072], [1436325696.0, 16140000.0, -28.075536456000002], [1436327424.0, 16160000.0, -28.099887104], [1436329152.0, 16180000.0, -28.124198967999998], [1436330880.0, 16200000.0, 
-28.148472000000005], [1436332608.0, 16220000.0, -28.172706152000003], [1436334336.0, 16240000.0, -28.196901376000003], [1436336064.0, 16260000.0, -28.221057623999997], [1436337792.0, 16280000.0, -28.245174847999998], [1436339520.0, 16300000.0, -28.269252999999996], [1436341248.0, 16320000.0, -28.293292032000004], [1436342976.0, 16340000.0, -28.317291896], [1436344704.0, 16360000.0, -28.341252544], [1436346432.0, 16380000.0, -28.365173927999997], [1436348160.0, 16400000.0, -28.389055999999997], [1436349888.0, 16420000.0, -28.412898711999997], [1436351616.0, 16440000.0, -28.436702015999998], [1436353344.0, 16460000.0, -28.460465864000003], [1436355072.0, 16480000.0, -28.484190208], [1436356800.0, 16500000.0, -28.507875], [1436358528.0, 16520000.0, -28.531520192000002], [1436360256.0, 16540000.0, -28.555125736], [1436361984.0, 16560000.0, -28.578691583999998], [1436363712.0, 16580000.0, -28.602217687999996], [1436365440.0, 16600000.0, -28.625703999999995], [1436367168.0, 16620000.0, -28.649150471999995], [1436368896.0, 16640000.0, -28.672557056000002], [1436370624.0, 16660000.0, -28.695923704000002], [1436372352.0, 16680000.0, -28.719250368], [1436374080.0, 16700000.0, -28.742537], [1436375808.0, 16720000.0, -28.765783552], [1436377536.0, 16740000.0, -28.788989975999996], [1436379264.0, 16760000.0, -28.812156223999995], [1436380992.0, 16780000.0, -28.835282248000002], [1436382720.0, 16800000.0, -28.858368000000002], [1436384448.0, 16820000.0, -28.881413432000002], [1436386176.0, 16840000.0, -28.904418496], [1436387904.0, 16860000.0, -28.927383144], [1436389632.0, 16880000.0, -28.950307327999997], [1436391360.0, 16900000.0, -28.973191], [1436393088.0, 16920000.0, -28.996034111999997], [1436394816.0, 16940000.0, -29.018836615999994], [1436396544.0, 16960000.0, -29.041598464000003], [1436398272.0, 16980000.0, -29.064319608], [1436400000.0, 17000000.0, -29.087], [1436401728.0, 17020000.0, -29.109639592], [1436403456.0, 17040000.0, -29.132238336], [1436405184.0, 17060000.0, -29.154796184], [1436406912.0, 17080000.0, -29.177313088], [1436408640.0, 17100000.0, -29.199789000000003], [1436410368.0, 17120000.0, -29.222223872], [1436412096.0, 17140000.0, -29.244617656000003], [1436413824.0, 17160000.0, -29.266970304], [1436415552.0, 17180000.0, -29.289281768], [1436417280.0, 17200000.0, -29.311552], [1436419008.0, 17220000.0, -29.333780951999998], [1436420736.0, 17240000.0, -29.355968576], [1436422464.0, 17260000.0, -29.378114823999997], [1436424192.0, 17280000.0, -29.400219648000004], [1436425920.0, 17300000.0, -29.422283], [1436427648.0, 17320000.0, -29.444304832], [1436429376.0, 17340000.0, -29.466285096], [1436431104.0, 17360000.0, -29.488223744], [1436432832.0, 17380000.0, -29.510120727999997], [1436434560.0, 17400000.0, -29.531975999999997], [1436436288.0, 17420000.0, -29.553789512], [1436438016.0, 17440000.0, -29.575561216000004], [1436439744.0, 17460000.0, -29.597291064000004], [1436441472.0, 17480000.0, -29.618979008], [1436443200.0, 17500000.0, -29.640625], [1436444928.0, 17520000.0, -29.662228992], [1436446656.0, 17540000.0, -29.683790935999998], [1436448384.0, 17560000.0, -29.705310783999998], [1436450112.0, 17580000.0, -29.726788487999997], [1436451840.0, 17600000.0, -29.748224000000004], [1436453568.0, 17620000.0, -29.769617272], [1436455296.0, 17640000.0, -29.790968256], [1436457024.0, 17660000.0, -29.812276904], [1436458752.0, 17680000.0, -29.833543168], [1436460480.0, 17700000.0, -29.854767], [1436462208.0, 17720000.0, -29.875948351999998], [1436463936.0, 17740000.0, 
-29.897087176000003], [1436465664.0, 17760000.0, -29.918183424000002], [1436467392.0, 17780000.0, -29.939237048000003], [1436469120.0, 17800000.0, -29.960248], [1436470848.0, 17820000.0, -29.981216232], [1436472576.0, 17840000.0, -30.002141696], [1436474304.0, 17860000.0, -30.023024344], [1436476032.0, 17880000.0, -30.043864127999996], [1436477760.0, 17900000.0, -30.064660999999997], [1436479488.0, 17920000.0, -30.085414912000005], [1436481216.0, 17940000.0, -30.106125816000002], [1436482944.0, 17960000.0, -30.126793664], [1436484672.0, 17980000.0, -30.147418408], [1436486400.0, 18000000.0, -30.168], [1436488128.0, 18020000.0, -30.188538391999998], [1436489856.0, 18040000.0, -30.209033536], [1436491584.0, 18060000.0, -30.229485384000004], [1436493312.0, 18080000.0, -30.249893888000003], [1436495040.0, 18100000.0, -30.270259000000003], [1436496768.0, 18120000.0, -30.290580672], [1436498496.0, 18140000.0, -30.310858856], [1436500224.0, 18160000.0, -30.331093504], [1436501952.0, 18180000.0, -30.351284567999997], [1436503680.0, 18200000.0, -30.371432], [1436505408.0, 18220000.0, -30.391535751999996], [1436507136.0, 18240000.0, -30.411595776000002], [1436508864.0, 18260000.0, -30.431612024000003], [1436510592.0, 18280000.0, -30.451584448000002], [1436512320.0, 18300000.0, -30.471513], [1436514048.0, 18320000.0, -30.491397632], [1436515776.0, 18340000.0, -30.511238296], [1436517504.0, 18360000.0, -30.531034943999998], [1436519232.0, 18380000.0, -30.550787528000004], [1436520960.0, 18400000.0, -30.570496000000006], [1436522688.0, 18420000.0, -30.590160312000002], [1436524416.0, 18440000.0, -30.609780416000003], [1436526144.0, 18460000.0, -30.629356264000002], [1436527872.0, 18480000.0, -30.648887807999998], [1436529600.0, 18500000.0, -30.668374999999997], [1436531328.0, 18520000.0, -30.687817791999997], [1436533056.0, 18540000.0, -30.707216135999996], [1436534784.0, 18560000.0, -30.726569984000005], [1436536512.0, 18580000.0, -30.745879288000005], [1436538240.0, 18600000.0, -30.765144000000003], [1436539968.0, 18620000.0, -30.784364072000002], [1436541696.0, 18640000.0, -30.803539456], [1436543424.0, 18660000.0, -30.822670104], [1436545152.0, 18680000.0, -30.841755967999998], [1436546880.0, 18700000.0, -30.860797000000005], [1436548608.0, 18720000.0, -30.879793152000005], [1436550336.0, 18740000.0, -30.898744376000003], [1436552064.0, 18760000.0, -30.917650623999997], [1436553792.0, 18780000.0, -30.936511847999995], [1436555520.0, 18800000.0, -30.955327999999994], [1436557248.0, 18820000.0, -30.974099032], [1436558976.0, 18840000.0, -30.992824896000002], [1436560704.0, 18860000.0, -31.011505544], [1436562432.0, 18880000.0, -31.030140927999998], [1436564160.0, 18900000.0, -31.048730999999997], [1436565888.0, 18920000.0, -31.067275711999997], [1436567616.0, 18940000.0, -31.085775015999996], [1436569344.0, 18960000.0, -31.104228864000003], [1436571072.0, 18980000.0, -31.122637208], [1436572800.0, 19000000.0, -31.141000000000002], [1436574528.0, 19020000.0, -31.159317192], [1436576256.0, 19040000.0, -31.177588736], [1436577984.0, 19060000.0, -31.195814583999997], [1436579712.0, 19080000.0, -31.213994688], [1436581440.0, 19100000.0, -31.232128999999997], [1436583168.0, 19120000.0, -31.250217471999996], [1436584896.0, 19140000.0, -31.268260056000003], [1436586624.0, 19160000.0, -31.286256704000003], [1436588352.0, 19180000.0, -31.304207368], [1436590080.0, 19200000.0, -31.322111999999997], [1436591808.0, 19220000.0, -31.339970551999997], [1436593536.0, 19240000.0, -31.357782975999996], [1436595264.0, 
19260000.0, -31.375549223999997], [1436596992.0, 19280000.0, -31.393269248000003], [1436598720.0, 19300000.0, -31.410943000000003], [1436600448.0, 19320000.0, -31.428570432], [1436602176.0, 19340000.0, -31.446151496], [1436603904.0, 19360000.0, -31.463686144], [1436605632.0, 19380000.0, -31.481174327999998], [1436607360.0, 19400000.0, -31.498616], [1436609088.0, 19420000.0, -31.516011111999997], [1436610816.0, 19440000.0, -31.533359615999995], [1436612544.0, 19460000.0, -31.550661464], [1436614272.0, 19480000.0, -31.567916608], [1436616000.0, 19500000.0, -31.585125], [1436617728.0, 19520000.0, -31.602286592], [1436619456.0, 19540000.0, -31.619401336], [1436621184.0, 19560000.0, -31.636469184], [1436622912.0, 19580000.0, -31.653490087999998], [1436624640.0, 19600000.0, -31.670464000000003], [1436626368.0, 19620000.0, -31.687390872], [1436628096.0, 19640000.0, -31.704270656000002], [1436629824.0, 19660000.0, -31.721103304], [1436631552.0, 19680000.0, -31.737888767999998], [1436633280.0, 19700000.0, -31.754627], [1436635008.0, 19720000.0, -31.771317951999997], [1436636736.0, 19740000.0, -31.787961575999997], [1436638464.0, 19760000.0, -31.804557823999996], [1436640192.0, 19780000.0, -31.821106648000004], [1436641920.0, 19800000.0, -31.837608000000003], [1436643648.0, 19820000.0, -31.854061832], [1436645376.0, 19840000.0, -31.870468096], [1436647104.0, 19860000.0, -31.886826744], [1436648832.0, 19880000.0, -31.903137727999997], [1436650560.0, 19900000.0, -31.919400999999997], [1436652288.0, 19920000.0, -31.935616512000003], [1436654016.0, 19940000.0, -31.951784216000004], [1436655744.0, 19960000.0, -31.967904064000003], [1436657472.0, 19980000.0, -31.983976008], [1436659200.0, 20000000.0, -32.0], [1436660928.0, 20020000.0, -32.015975991999994], [1436662656.0, 20040000.0, -32.031903936], [1436664384.0, 20060000.0, -32.047783784], [1436666112.0, 20080000.0, -32.063615487999996], [1436667840.0, 20100000.0, -32.079398999999995], [1436669568.0, 20120000.0, -32.095134272], [1436671296.0, 20140000.0, -32.110821255999994], [1436673024.0, 20160000.0, -32.126459904], [1436674752.0, 20180000.0, -32.142050168000004], [1436676480.0, 20200000.0, -32.157592], [1436678208.0, 20220000.0, -32.173085352], [1436679936.0, 20240000.0, -32.188530176], [1436681664.0, 20260000.0, -32.203926423999995], [1436683392.0, 20280000.0, -32.219274048], [1436685120.0, 20300000.0, -32.234573], [1436686848.0, 20320000.0, -32.249823232], [1436688576.0, 20340000.0, -32.265024696], [1436690304.0, 20360000.0, -32.280177343999995], [1436692032.0, 20380000.0, -32.295281128], [1436693760.0, 20400000.0, -32.310336], [1436695488.0, 20420000.0, -32.325341912], [1436697216.0, 20440000.0, -32.340298816], [1436698944.0, 20460000.0, -32.355206663999994], [1436700672.0, 20480000.0, -32.370065408], [1436702400.0, 20500000.0, -32.384875], [1436704128.0, 20520000.0, -32.399635392], [1436705856.0, 20540000.0, -32.414346536], [1436707584.0, 20560000.0, -32.42900838400001], [1436709312.0, 20580000.0, -32.443620888], [1436711040.0, 20600000.0, -32.458184], [1436712768.0, 20620000.0, -32.472697671999995], [1436714496.0, 20640000.0, -32.487161856], [1436716224.0, 20660000.0, -32.501576504], [1436717952.0, 20680000.0, -32.515941568], [1436719680.0, 20700000.0, -32.530257], [1436721408.0, 20720000.0, -32.544522752], [1436723136.0, 20740000.0, -32.558738776], [1436724864.0, 20760000.0, -32.572905024], [1436726592.0, 20780000.0, -32.587021448], [1436728320.0, 20800000.0, -32.601088000000004], [1436730048.0, 20820000.0, -32.615104632], [1436731776.0, 
20840000.0, -32.629071296], [1436733504.0, 20860000.0, -32.642987944], [1436735232.0, 20880000.0, -32.656854528000004], [1436736960.0, 20900000.0, -32.670671], [1436738688.0, 20920000.0, -32.684437312], [1436740416.0, 20940000.0, -32.698153416], [1436742144.0, 20960000.0, -32.711819264], [1436743872.0, 20980000.0, -32.725434807999996], [1436745600.0, 21000000.0, -32.739], [1436747328.0, 21020000.0, -32.752514792], [1436749056.0, 21040000.0, -32.765979136], [1436750784.0, 21060000.0, -32.779392984], [1436752512.0, 21080000.0, -32.79275628800001], [1436754240.0, 21100000.0, -32.806068999999994], [1436755968.0, 21120000.0, -32.819331072], [1436757696.0, 21140000.0, -32.832542456], [1436759424.0, 21160000.0, -32.845703104], [1436761152.0, 21180000.0, -32.858812968], [1436762880.0, 21200000.0, -32.871872], [1436764608.0, 21220000.0, -32.884880152], [1436766336.0, 21240000.0, -32.897837376000005], [1436768064.0, 21260000.0, -32.910743624], [1436769792.0, 21280000.0, -32.923598848], [1436771520.0, 21300000.0, -32.936403], [1436773248.0, 21320000.0, -32.949156032], [1436774976.0, 21340000.0, -32.961857896], [1436776704.0, 21360000.0, -32.974508543999995], [1436778432.0, 21380000.0, -32.987107928], [1436780160.0, 21400000.0, -32.999656], [1436781888.0, 21420000.0, -33.012152711999995], [1436783616.0, 21440000.0, -33.024598016], [1436785344.0, 21460000.0, -33.036991864], [1436787072.0, 21480000.0, -33.049334208], [1436788800.0, 21500000.0, -33.061625], [1436790528.0, 21520000.0, -33.073864192], [1436792256.0, 21540000.0, -33.086051736], [1436793984.0, 21560000.0, -33.098187584], [1436795712.0, 21580000.0, -33.110271688], [1436797440.0, 21600000.0, -33.122304], [1436799168.0, 21620000.0, -33.134284472], [1436800896.0, 21640000.0, -33.146213056], [1436802624.0, 21660000.0, -33.158089704000005], [1436804352.0, 21680000.0, -33.16991436799999], [1436806080.0, 21700000.0, -33.181687], [1436807808.0, 21720000.0, -33.193407552000004], [1436809536.0, 21740000.0, -33.205075975999996], [1436811264.0, 21760000.0, -33.216692224], [1436812992.0, 21780000.0, -33.228256248], [1436814720.0, 21800000.0, -33.239768], [1436816448.0, 21820000.0, -33.251227432], [1436818176.0, 21840000.0, -33.262634496000004], [1436819904.0, 21860000.0, -33.273989144], [1436821632.0, 21880000.0, -33.285291328], [1436823360.0, 21900000.0, -33.296541], [1436825088.0, 21920000.0, -33.307738112], [1436826816.0, 21940000.0, -33.318882615999996], [1436828544.0, 21960000.0, -33.329974464], [1436830272.0, 21980000.0, -33.341013608000004], [1436832000.0, 22000000.0, -33.352], [1436833728.0, 22020000.0, -33.362933592000005], [1436835456.0, 22040000.0, -33.373814336], [1436837184.0, 22060000.0, -33.384642184], [1436838912.0, 22080000.0, -33.395417088], [1436840640.0, 22100000.0, -33.406139], [1436842368.0, 22120000.0, -33.416807872], [1436844096.0, 22140000.0, -33.427423656], [1436845824.0, 22160000.0, -33.437986304000006], [1436847552.0, 22180000.0, -33.448495768], [1436849280.0, 22200000.0, -33.458952000000004], [1436851008.0, 22220000.0, -33.469354951999996], [1436852736.0, 22240000.0, -33.479704576], [1436854464.0, 22260000.0, -33.49000082399999], [1436856192.0, 22280000.0, -33.500243648], [1436857920.0, 22300000.0, -33.510433], [1436859648.0, 22320000.0, -33.520568831999995], [1436861376.0, 22340000.0, -33.530651096], [1436863104.0, 22360000.0, -33.540679744], [1436864832.0, 22380000.0, -33.550654728], [1436866560.0, 22400000.0, -33.560576], [1436868288.0, 22420000.0, -33.570443512000004], [1436870016.0, 22440000.0, -33.580257216], 
[1436871744.0, 22460000.0, -33.590017064], [1436873472.0, 22480000.0, -33.599723008000005], [1436875200.0, 22500000.0, -33.609375], [1436876928.0, 22520000.0, -33.618972991999996], [1436878656.0, 22540000.0, -33.628516936], [1436880384.0, 22560000.0, -33.638006784], [1436882112.0, 22580000.0, -33.647442487999996], [1436883840.0, 22600000.0, -33.656824], [1436885568.0, 22620000.0, -33.666151272], [1436887296.0, 22640000.0, -33.675424256], [1436889024.0, 22660000.0, -33.684642904], [1436890752.0, 22680000.0, -33.693807168], [1436892480.0, 22700000.0, -33.702917], [1436894208.0, 22720000.0, -33.711972352000004], [1436895936.0, 22740000.0, -33.720973176], [1436897664.0, 22760000.0, -33.729919424], [1436899392.0, 22780000.0, -33.738811048], [1436901120.0, 22800000.0, -33.747648], [1436902848.0, 22820000.0, -33.756430232], [1436904576.0, 22840000.0, -33.765157695999996], [1436906304.0, 22860000.0, -33.773830344], [1436908032.0, 22880000.0, -33.782448128], [1436909760.0, 22900000.0, -33.791011], [1436911488.0, 22920000.0, -33.799518911999996], [1436913216.0, 22940000.0, -33.807971816000006], [1436914944.0, 22960000.0, -33.816369664], [1436916672.0, 22980000.0, -33.824712408], [1436918400.0, 23000000.0, -33.833000000000006], [1436920128.0, 23020000.0, -33.841232391999995], [1436921856.0, 23040000.0, -33.849409536], [1436923584.0, 23060000.0, -33.857531384000005], [1436925312.0, 23080000.0, -33.865597887999996], [1436927040.0, 23100000.0, -33.873609], [1436928768.0, 23120000.0, -33.881564671999996], [1436930496.0, 23140000.0, -33.889464856000004], [1436932224.0, 23160000.0, -33.89730950399999], [1436933952.0, 23180000.0, -33.905098568], [1436935680.0, 23200000.0, -33.912832], [1436937408.0, 23220000.0, -33.920509751999994], [1436939136.0, 23240000.0, -33.928131776], [1436940864.0, 23260000.0, -33.935698024000004], [1436942592.0, 23280000.0, -33.94320844799999], [1436944320.0, 23300000.0, -33.950663], [1436946048.0, 23320000.0, -33.958061632], [1436947776.0, 23340000.0, -33.965404295999996], [1436949504.0, 23360000.0, -33.972690944], [1436951232.0, 23380000.0, -33.979921528000006], [1436952960.0, 23400000.0, -33.987096], [1436954688.0, 23420000.0, -33.994214312000004], [1436956416.0, 23440000.0, -34.001276415999996], [1436958144.0, 23460000.0, -34.008282264], [1436959872.0, 23480000.0, -34.015231807999996], [1436961600.0, 23500000.0, -34.022124999999996], [1436963328.0, 23520000.0, -34.028961792000004], [1436965056.0, 23540000.0, -34.035742135999996], [1436966784.0, 23560000.0, -34.042465984], [1436968512.0, 23580000.0, -34.049133288], [1436970240.0, 23600000.0, -34.055744], [1436971968.0, 23620000.0, -34.062298072000004], [1436973696.0, 23640000.0, -34.068795456000004], [1436975424.0, 23660000.0, -34.075236104], [1436977152.0, 23680000.0, -34.081619968], [1436978880.0, 23700000.0, -34.08794700000001], [1436980608.0, 23720000.0, -34.094217152], [1436982336.0, 23740000.0, -34.100430376000006], [1436984064.0, 23760000.0, -34.106586623999995], [1436985792.0, 23780000.0, -34.112685848], [1436987520.0, 23800000.0, -34.118728], [1436989248.0, 23820000.0, -34.124713032], [1436990976.0, 23840000.0, -34.130640896], [1436992704.0, 23860000.0, -34.136511543999994], [1436994432.0, 23880000.0, -34.142324928], [1436996160.0, 23900000.0, -34.148081000000005], [1436997888.0, 23920000.0, -34.153779711999995], [1436999616.0, 23940000.0, -34.159421015999996], [1437001344.0, 23960000.0, -34.165004864000004], [1437003072.0, 23980000.0, -34.170531208], [1437004800.0, 24000000.0, -34.176], [1437006528.0, 24020000.0, 
-34.181411192], [1437008256.0, 24040000.0, -34.186764736], [1437009984.0, 24060000.0, -34.192060584000004], [1437011712.0, 24080000.0, -34.197298688], [1437013440.0, 24100000.0, -34.202479], [1437015168.0, 24120000.0, -34.20760147199999], [1437016896.0, 24140000.0, -34.212666055999996], [1437018624.0, 24160000.0, -34.217672704], [1437020352.0, 24180000.0, -34.222621368], [1437022080.0, 24200000.0, -34.227512], [1437023808.0, 24220000.0, -34.232344552], [1437025536.0, 24240000.0, -34.237118976], [1437027264.0, 24260000.0, -34.241835224], [1437028992.0, 24280000.0, -34.24649324800001], [1437030720.0, 24300000.0, -34.251093], [1437032448.0, 24320000.0, -34.255634432], [1437034176.0, 24340000.0, -34.26011749600001], [1437035904.0, 24360000.0, -34.264542144], [1437037632.0, 24380000.0, -34.268908328], [1437039360.0, 24400000.0, -34.273216], [1437041088.0, 24420000.0, -34.277465112], [1437042816.0, 24440000.0, -34.281655615999995], [1437044544.0, 24460000.0, -34.285787463999995], [1437046272.0, 24480000.0, -34.289860608], [1437048000.0, 24500000.0, -34.293875], [1437049728.0, 24520000.0, -34.297830592], [1437051456.0, 24540000.0, -34.301727336], [1437053184.0, 24560000.0, -34.305565183999995], [1437054912.0, 24580000.0, -34.309344088], [1437056640.0, 24600000.0, -34.313064000000004], [1437058368.0, 24620000.0, -34.316724871999995], [1437060096.0, 24640000.0, -34.320326656], [1437061824.0, 24660000.0, -34.323869304000006], [1437063552.0, 24680000.0, -34.327352768], [1437065280.0, 24700000.0, -34.330777], [1437067008.0, 24720000.0, -34.334141951999996], [1437068736.0, 24740000.0, -34.337447576], [1437070464.0, 24760000.0, -34.340693824], [1437072192.0, 24780000.0, -34.343880647999995], [1437073920.0, 24800000.0, -34.347008], [1437075648.0, 24820000.0, -34.350075831999995], [1437077376.0, 24840000.0, -34.353084096], [1437079104.0, 24860000.0, -34.356032744000004], [1437080832.0, 24880000.0, -34.358921728], [1437082560.0, 24900000.0, -34.361751], [1437084288.0, 24920000.0, -34.364520512000006], [1437086016.0, 24940000.0, -34.367230215999996], [1437087744.0, 24960000.0, -34.369880064], [1437089472.0, 24980000.0, -34.37247000800001], [1437091200.0, 25000000.0, -34.375], [1437092928.0, 25020000.0, -34.377469991999995], [1437094656.0, 25040000.0, -34.379879935999995], [1437096384.0, 25060000.0, -34.382229784], [1437098112.0, 25080000.0, -34.384519487999995], [1437099840.0, 25100000.0, -34.386749], [1437101568.0, 25120000.0, -34.388918272], [1437103296.0, 25140000.0, -34.391027256], [1437105024.0, 25160000.0, -34.393075904], [1437106752.0, 25180000.0, -34.395064168000005], [1437108480.0, 25200000.0, -34.396992], [1437110208.0, 25220000.0, -34.398859352], [1437111936.0, 25240000.0, -34.400666176], [1437113664.0, 25260000.0, -34.402412424], [1437115392.0, 25280000.0, -34.404098048], [1437117120.0, 25300000.0, -34.405722999999995], [1437118848.0, 25320000.0, -34.407287232], [1437120576.0, 25340000.0, -34.408790696], [1437122304.0, 25360000.0, -34.410233344], [1437124032.0, 25380000.0, -34.411615128], [1437125760.0, 25400000.0, -34.412936], [1437127488.0, 25420000.0, -34.414195912], [1437129216.0, 25440000.0, -34.415394816], [1437130944.0, 25460000.0, -34.416532664], [1437132672.0, 25480000.0, -34.417609408000004], [1437134400.0, 25500000.0, -34.418625000000006], [1437136128.0, 25520000.0, -34.419579392], [1437137856.0, 25540000.0, -34.420472536000005], [1437139584.0, 25560000.0, -34.42130438400001], [1437141312.0, 25580000.0, -34.422074888], [1437143040.0, 25600000.0, -34.422784], [1437144768.0, 25620000.0, 
-34.42343167199999], [1437146496.0, 25640000.0, -34.424017856000006], [1437148224.0, 25660000.0, -34.424542504], [1437149952.0, 25680000.0, -34.425005568], [1437151680.0, 25700000.0, -34.42540700000001], [1437153408.0, 25720000.0, -34.425746751999995], [1437155136.0, 25740000.0, -34.426024776000006], [1437156864.0, 25760000.0, -34.426241024], [1437158592.0, 25780000.0, -34.426395447999994], [1437160320.0, 25800000.0, -34.426488], [1437162048.0, 25820000.0, -34.426518632000004], [1437163776.0, 25840000.0, -34.426487296], [1437165504.0, 25860000.0, -34.426393944], [1437167232.0, 25880000.0, -34.426238528], [1437168960.0, 25900000.0, -34.426021000000006], [1437170688.0, 25920000.0, -34.425741312], [1437172416.0, 25940000.0, -34.425399416], [1437174144.0, 25960000.0, -34.424995264], [1437175872.0, 25980000.0, -34.42452880799999], [1437177600.0, 26000000.0, -34.42399999999999], [1437179328.0, 26020000.0, -34.423408792000004], [1437181056.0, 26040000.0, -34.42275513599999], [1437182784.0, 26060000.0, -34.422038984], [1437184512.0, 26080000.0, -34.421260288], [1437186240.0, 26100000.0, -34.420418999999995], [1437187968.0, 26120000.0, -34.419515071999996], [1437189696.0, 26140000.0, -34.418548456], [1437191424.0, 26160000.0, -34.41751910399999], [1437193152.0, 26180000.0, -34.416426968], [1437194880.0, 26200000.0, -34.415272], [1437196608.0, 26220000.0, -34.414054152], [1437198336.0, 26240000.0, -34.412773376000004], [1437200064.0, 26260000.0, -34.41142962399999], [1437201792.0, 26280000.0, -34.410022848], [1437203520.0, 26300000.0, -34.408553], [1437205248.0, 26320000.0, -34.407020032], [1437206976.0, 26340000.0, -34.405423896], [1437208704.0, 26360000.0, -34.403764544], [1437210432.0, 26380000.0, -34.402041928], [1437212160.0, 26400000.0, -34.400256], [1437213888.0, 26420000.0, -34.398406711999996], [1437215616.0, 26440000.0, -34.396494016], [1437217344.0, 26460000.0, -34.39451786400001], [1437219072.0, 26480000.0, -34.392478208], [1437220800.0, 26500000.0, -34.390375000000006], [1437222528.0, 26520000.0, -34.38820819200001], [1437224256.0, 26540000.0, -34.385977736], [1437225984.0, 26560000.0, -34.383683583999996], [1437227712.0, 26580000.0, -34.381325688], [1437229440.0, 26600000.0, -34.378904000000006], [1437231168.0, 26620000.0, -34.376418472], [1437232896.0, 26640000.0, -34.373869056], [1437234624.0, 26660000.0, -34.371255704000006], [1437236352.0, 26680000.0, -34.368578368], [1437238080.0, 26700000.0, -34.365837], [1437239808.0, 26720000.0, -34.363031552], [1437241536.0, 26740000.0, -34.360161976], [1437243264.0, 26760000.0, -34.357228223999996], [1437244992.0, 26780000.0, -34.35423024800001], [1437246720.0, 26800000.0, -34.351168], [1437248448.0, 26820000.0, -34.348041432], [1437250176.0, 26840000.0, -34.34485049600001], [1437251904.0, 26860000.0, -34.341595143999996], [1437253632.0, 26880000.0, -34.338275328], [1437255360.0, 26900000.0, -34.334891], [1437257088.0, 26920000.0, -34.331442112], [1437258816.0, 26940000.0, -34.327928615999994], [1437260544.0, 26960000.0, -34.324350464], [1437262272.0, 26980000.0, -34.320707608000006], [1437264000.0, 27000000.0, -34.31699999999999], [1437265728.0, 27020000.0, -34.313227592000004], [1437267456.0, 27040000.0, -34.30939033600001], [1437269184.0, 27060000.0, -34.305488184], [1437270912.0, 27080000.0, -34.301521088], [1437272640.0, 27100000.0, -34.297489], [1437274368.0, 27120000.0, -34.293391872], [1437276096.0, 27140000.0, -34.289229656], [1437277824.0, 27160000.0, -34.285002304], [1437279552.0, 27180000.0, -34.280709768], [1437281280.0, 
27200000.0, -34.276352], [1437283008.0, 27220000.0, -34.271928951999996], [1437284736.0, 27240000.0, -34.267440576], [1437286464.0, 27260000.0, -34.26288682399999], [1437288192.0, 27280000.0, -34.258267648], [1437289920.0, 27300000.0, -34.253583000000006], [1437291648.0, 27320000.0, -34.24883283199999], [1437293376.0, 27340000.0, -34.244017096], [1437295104.0, 27360000.0, -34.239135744], [1437296832.0, 27380000.0, -34.234188728], [1437298560.0, 27400000.0, -34.229175999999995], [1437300288.0, 27420000.0, -34.224097512], [1437302016.0, 27440000.0, -34.218953216], [1437303744.0, 27460000.0, -34.213743064], [1437305472.0, 27480000.0, -34.208467008], [1437307200.0, 27500000.0, -34.203125], [1437308928.0, 27520000.0, -34.197716992], [1437310656.0, 27540000.0, -34.192242936], [1437312384.0, 27560000.0, -34.186702784000005], [1437314112.0, 27580000.0, -34.181096487999994], [1437315840.0, 27600000.0, -34.175424], [1437317568.0, 27620000.0, -34.169685272], [1437319296.0, 27640000.0, -34.163880256], [1437321024.0, 27660000.0, -34.158008904], [1437322752.0, 27680000.0, -34.152071168000006], [1437324480.0, 27700000.0, -34.146067], [1437326208.0, 27720000.0, -34.139996352000004], [1437327936.0, 27740000.0, -34.133859176], [1437329664.0, 27760000.0, -34.127655424], [1437331392.0, 27780000.0, -34.121385048], [1437333120.0, 27800000.0, -34.115048], [1437334848.0, 27820000.0, -34.108644232], [1437336576.0, 27840000.0, -34.102173695999994], [1437338304.0, 27860000.0, -34.095636344], [1437340032.0, 27880000.0, -34.089032128], [1437341760.0, 27900000.0, -34.08236099999999], [1437343488.0, 27920000.0, -34.075622912], [1437345216.0, 27940000.0, -34.068817816], [1437346944.0, 27960000.0, -34.061945664], [1437348672.0, 27980000.0, -34.055006408], [1437350400.0, 28000000.0, -34.048], [1437352128.0, 28020000.0, -34.040926392], [1437353856.0, 28040000.0, -34.033785536], [1437355584.0, 28060000.0, -34.02657738400001], [1437357312.0, 28080000.0, -34.019301888], [1437359040.0, 28100000.0, -34.011959000000004], [1437360768.0, 28120000.0, -34.004548672], [1437362496.0, 28140000.0, -33.997070856], [1437364224.0, 28160000.0, -33.989525504], [1437365952.0, 28180000.0, -33.981912568], [1437367680.0, 28200000.0, -33.974232], [1437369408.0, 28220000.0, -33.966483751999995], [1437371136.0, 28240000.0, -33.958667776], [1437372864.0, 28260000.0, -33.950784024], [1437374592.0, 28280000.0, -33.942832448000004], [1437376320.0, 28300000.0, -33.934813], [1437378048.0, 28320000.0, -33.926725632], [1437379776.0, 28340000.0, -33.918570296], [1437381504.0, 28360000.0, -33.910346944000004], [1437383232.0, 28380000.0, -33.902055528000005], [1437384960.0, 28400000.0, -33.893696], [1437386688.0, 28420000.0, -33.885268312], [1437388416.0, 28440000.0, -33.876772415999994], [1437390144.0, 28460000.0, -33.868208264], [1437391872.0, 28480000.0, -33.859575808], [1437393600.0, 28500000.0, -33.850875], [1437395328.0, 28520000.0, -33.842105792], [1437397056.0, 28540000.0, -33.833268135999994], [1437398784.0, 28560000.0, -33.82436198400001], [1437400512.0, 28580000.0, -33.815387288], [1437402240.0, 28600000.0, -33.806343999999996], [1437403968.0, 28620000.0, -33.797232072], [1437405696.0, 28640000.0, -33.788051456000005], [1437407424.0, 28660000.0, -33.77880210399999], [1437409152.0, 28680000.0, -33.769483968], [1437410880.0, 28700000.0, -33.760097], [1437412608.0, 28720000.0, -33.750641152], [1437414336.0, 28740000.0, -33.741116376], [1437416064.0, 28760000.0, -33.73152262399999], [1437417792.0, 28780000.0, -33.721859847999994], [1437419520.0, 
28800000.0, -33.71212799999999], [1437421248.0, 28820000.0, -33.702327032], [1437422976.0, 28840000.0, -33.692456895999996], [1437424704.0, 28860000.0, -33.68251754399999], [1437426432.0, 28880000.0, -33.672508928], [1437428160.0, 28900000.0, -33.662431], [1437429888.0, 28920000.0, -33.652283712], [1437431616.0, 28940000.0, -33.642067016], [1437433344.0, 28960000.0, -33.63178086400001], [1437435072.0, 28980000.0, -33.621425208], [1437436800.0, 29000000.0, -33.611000000000004], [1437438528.0, 29020000.0, -33.600505192], [1437440256.0, 29040000.0, -33.589940736], [1437441984.0, 29060000.0, -33.579306584], [1437443712.0, 29080000.0, -33.568602688], [1437445440.0, 29100000.0, -33.557829], [1437447168.0, 29120000.0, -33.546985471999996], [1437448896.0, 29140000.0, -33.536072055999995], [1437450624.0, 29160000.0, -33.525088704000005], [1437452352.0, 29180000.0, -33.514035367999995], [1437454080.0, 29200000.0, -33.502912], [1437455808.0, 29220000.0, -33.491718551999995], [1437457536.0, 29240000.0, -33.480454976], [1437459264.0, 29260000.0, -33.469121224], [1437460992.0, 29280000.0, -33.45771724800001], [1437462720.0, 29300000.0, -33.446242999999996], [1437464448.0, 29320000.0, -33.434698432000005], [1437466176.0, 29340000.0, -33.423083496000004], [1437467904.0, 29360000.0, -33.411398144], [1437469632.0, 29380000.0, -33.399642328], [1437471360.0, 29400000.0, -33.387816], [1437473088.0, 29420000.0, -33.375919112000005], [1437474816.0, 29440000.0, -33.363951615999994], [1437476544.0, 29460000.0, -33.351913464], [1437478272.0, 29480000.0, -33.339804608], [1437480000.0, 29500000.0, -33.327625], [1437481728.0, 29520000.0, -33.315374592], [1437483456.0, 29540000.0, -33.303053336000005], [1437485184.0, 29560000.0, -33.290661184], [1437486912.0, 29580000.0, -33.278198087999996], [1437488640.0, 29600000.0, -33.265664], [1437490368.0, 29620000.0, -33.253058872], [1437492096.0, 29640000.0, -33.240382655999994], [1437493824.0, 29660000.0, -33.227635304], [1437495552.0, 29680000.0, -33.214816768000006], [1437497280.0, 29700000.0, -33.201927], [1437499008.0, 29720000.0, -33.188965952], [1437500736.0, 29740000.0, -33.175933576], [1437502464.0, 29760000.0, -33.162829824], [1437504192.0, 29780000.0, -33.149654647999995], [1437505920.0, 29800000.0, -33.136408], [1437507648.0, 29820000.0, -33.12308983199999], [1437509376.0, 29840000.0, -33.109700096], [1437511104.0, 29860000.0, -33.096238744000004], [1437512832.0, 29880000.0, -33.082705728], [1437514560.0, 29900000.0, -33.069101], [1437516288.0, 29920000.0, -33.055424512], [1437518016.0, 29940000.0, -33.041676216], [1437519744.0, 29960000.0, -33.027856064000005], [1437521472.0, 29980000.0, -33.013964008], [1437523200.0, 30000000.0, -33.0], [1437524928.0, 30020000.0, -32.985963991999995], [1437526656.0, 30040000.0, -32.971855936], [1437528384.0, 30060000.0, -32.957675784], [1437530112.0, 30080000.0, -32.94342348799999], [1437531840.0, 30100000.0, -32.929099], [1437533568.0, 30120000.0, -32.914702272], [1437535296.0, 30140000.0, -32.900233256], [1437537024.0, 30160000.0, -32.885691904], [1437538752.0, 30180000.0, -32.871078168000004], [1437540480.0, 30200000.0, -32.856392], [1437542208.0, 30220000.0, -32.841633352], [1437543936.0, 30240000.0, -32.826802176], [1437545664.0, 30260000.0, -32.811898424], [1437547392.0, 30280000.0, -32.796922048], [1437549120.0, 30300000.0, -32.781873], [1437550848.0, 30320000.0, -32.766751232000004], [1437552576.0, 30340000.0, -32.751556695999994], [1437554304.0, 30360000.0, -32.736289344], [1437556032.0, 30380000.0, -32.720949128], 
[1437557760.0, 30400000.0, -32.705535999999995], [1437559488.0, 30420000.0, -32.690049912000006], [1437561216.0, 30440000.0, -32.674490816], [1437562944.0, 30460000.0, -32.658858664], [1437564672.0, 30480000.0, -32.643153407999996], [1437566400.0, 30500000.0, -32.62737500000001], [1437568128.0, 30520000.0, -32.611523391999995], [1437569856.0, 30540000.0, -32.595598536000004], [1437571584.0, 30560000.0, -32.579600384], [1437573312.0, 30580000.0, -32.56352888800001], [1437575040.0, 30600000.0, -32.547384], [1437576768.0, 30620000.0, -32.531165672], [1437578496.0, 30640000.0, -32.514873855999994], [1437580224.0, 30660000.0, -32.498508504], [1437581952.0, 30680000.0, -32.482069568], [1437583680.0, 30700000.0, -32.465557000000004], [1437585408.0, 30720000.0, -32.448970751999994], [1437587136.0, 30740000.0, -32.432310776], [1437588864.0, 30760000.0, -32.415577024], [1437590592.0, 30780000.0, -32.398769447999996], [1437592320.0, 30800000.0, -32.381888000000004], [1437594048.0, 30820000.0, -32.364932632000006], [1437595776.0, 30840000.0, -32.347903296], [1437597504.0, 30860000.0, -32.330799944], [1437599232.0, 30880000.0, -32.313622527999996], [1437600960.0, 30900000.0, -32.296371], [1437602688.0, 30920000.0, -32.27904531200001], [1437604416.0, 30940000.0, -32.261645416], [1437606144.0, 30960000.0, -32.244171264], [1437607872.0, 30980000.0, -32.226622808], [1437609600.0, 31000000.0, -32.208999999999996], [1437611328.0, 31020000.0, -32.191302792], [1437613056.0, 31040000.0, -32.173531135999994], [1437614784.0, 31060000.0, -32.155684984000004], [1437616512.0, 31080000.0, -32.137764288], [1437618240.0, 31100000.0, -32.119769000000005], [1437619968.0, 31120000.0, -32.101699072], [1437621696.0, 31140000.0, -32.083554456], [1437623424.0, 31160000.0, -32.065335104], [1437625152.0, 31180000.0, -32.047040968000005], [1437626880.0, 31200000.0, -32.028672], [1437628608.0, 31220000.0, -32.010228151999996], [1437630336.0, 31240000.0, -31.991709376], [1437632064.0, 31260000.0, -31.973115624000002], [1437633792.0, 31280000.0, -31.954446847999996], [1437635520.0, 31300000.0, -31.935702999999997], [1437637248.0, 31320000.0, -31.916884032], [1437638976.0, 31340000.0, -31.897989896], [1437640704.0, 31360000.0, -31.879020543999992], [1437642432.0, 31380000.0, -31.859975928000004], [1437644160.0, 31400000.0, -31.840856000000002], [1437645888.0, 31420000.0, -31.821660712], [1437647616.0, 31440000.0, -31.802390016], [1437649344.0, 31460000.0, -31.783043864000003], [1437651072.0, 31480000.0, -31.763622207999997], [1437652800.0, 31500000.0, -31.744125000000004], [1437654528.0, 31520000.0, -31.724552192], [1437656256.0, 31540000.0, -31.704903736], [1437657984.0, 31560000.0, -31.685179583999997], [1437659712.0, 31580000.0, -31.665379687999998], [1437661440.0, 31600000.0, -31.645503999999995], [1437663168.0, 31620000.0, -31.625552471999995], [1437664896.0, 31640000.0, -31.605525055999998], [1437666624.0, 31660000.0, -31.585421704000005], [1437668352.0, 31680000.0, -31.565242367999996], [1437670080.0, 31700000.0, -31.544987], [1437671808.0, 31720000.0, -31.524655552], [1437673536.0, 31740000.0, -31.504247976], [1437675264.0, 31760000.0, -31.483764223999998], [1437676992.0, 31780000.0, -31.463204248000004], [1437678720.0, 31800000.0, -31.442567999999994], [1437680448.0, 31820000.0, -31.421855432], [1437682176.0, 31840000.0, -31.401066496], [1437683904.0, 31860000.0, -31.380201143999997], [1437685632.0, 31880000.0, -31.359259328], [1437687360.0, 31900000.0, -31.338240999999996], [1437689088.0, 31920000.0, -31.317146112000003], 
[1437690816.0, 31940000.0, -31.295974616000002], [1437692544.0, 31960000.0, -31.274726463999997], [1437694272.0, 31980000.0, -31.253401607999997], [1437696000.0, 32000000.0, -31.231999999999992], [1437697728.0, 32020000.0, -31.210521591999992], [1437699456.0, 32040000.0, -31.188966335999993], [1437701184.0, 32060000.0, -31.167334184000005], [1437702912.0, 32080000.0, -31.14562508799999], [1437704640.0, 32100000.0, -31.123839000000004], [1437706368.0, 32120000.0, -31.101975872000004], [1437708096.0, 32140000.0, -31.080035656], [1437709824.0, 32160000.0, -31.058018304], [1437711552.0, 32180000.0, -31.035923768000004], [1437713280.0, 32200000.0, -31.013751999999997], [1437715008.0, 32220000.0, -30.991502951999998], [1437716736.0, 32240000.0, -30.969176575999995], [1437718464.0, 32260000.0, -30.946772823999993], [1437720192.0, 32280000.0, -30.924291647999993], [1437721920.0, 32300000.0, -30.901732999999993], [1437723648.0, 32320000.0, -30.879096831999995], [1437725376.0, 32340000.0, -30.85638309600001], [1437727104.0, 32360000.0, -30.83359174399999], [1437728832.0, 32380000.0, -30.81072272800001], [1437730560.0, 32400000.0, -30.787776], [1437732288.0, 32420000.0, -30.764751512000004], [1437734016.0, 32440000.0, -30.741649216], [1437735744.0, 32460000.0, -30.718469063999997], [1437737472.0, 32480000.0, -30.695211008], [1437739200.0, 32500000.0, -30.671875], [1437740928.0, 32520000.0, -30.648460991999997], [1437742656.0, 32540000.0, -30.624968935999995], [1437744384.0, 32560000.0, -30.601398783999997], [1437746112.0, 32580000.0, -30.577750488], [1437747840.0, 32600000.0, -30.55402399999999], [1437749568.0, 32620000.0, -30.530219271999997], [1437751296.0, 32640000.0, -30.50633625600001], [1437753024.0, 32660000.0, -30.482374903999997], [1437754752.0, 32680000.0, -30.458335168000005], [1437756480.0, 32700000.0, -30.434217000000004], [1437758208.0, 32720000.0, -30.410020352000004], [1437759936.0, 32740000.0, -30.385745176000007], [1437761664.0, 32760000.0, -30.361391424000004], [1437763392.0, 32780000.0, -30.336959048000004], [1437765120.0, 32800000.0, -30.312448000000003], [1437766848.0, 32820000.0, -30.287858232000005], [1437768576.0, 32840000.0, -30.263189695999998], [1437770304.0, 32860000.0, -30.238442344], [1437772032.0, 32880000.0, -30.213616127999998], [1437773760.0, 32900000.0, -30.188710999999998], [1437775488.0, 32920000.0, -30.16372691200001], [1437777216.0, 32940000.0, -30.138663815999998], [1437778944.0, 32960000.0, -30.11352166400001], [1437780672.0, 32980000.0, -30.08830040800001], [1437782400.0, 33000000.0, -30.063000000000002], [1437784128.0, 33020000.0, -30.037620392], [1437785856.0, 33040000.0, -30.012161536000008], [1437787584.0, 33060000.0, -29.986623384000005], [1437789312.0, 33080000.0, -29.961005888000003], [1437791040.0, 33100000.0, -29.935308999999997], [1437792768.0, 33120000.0, -29.909532671999997], [1437794496.0, 33140000.0, -29.883676856], [1437796224.0, 33160000.0, -29.857741504000003], [1437797952.0, 33180000.0, -29.831726568], [1437799680.0, 33200000.0, -29.805631999999996], [1437801408.0, 33220000.0, -29.779457752], [1437803136.0, 33240000.0, -29.753203775999992], [1437804864.0, 33260000.0, -29.726870023999993], [1437806592.0, 33280000.0, -29.700456448000004], [1437808320.0, 33300000.0, -29.673962999999993], [1437810048.0, 33320000.0, -29.647389632000007], [1437811776.0, 33340000.0, -29.620736296000004], [1437813504.0, 33360000.0, -29.594002944000003], [1437815232.0, 33380000.0, -29.567189528000007], [1437816960.0, 33400000.0, -29.540295999999998], [1437818688.0, 
33420000.0, -29.513322312], [1437820416.0, 33440000.0, -29.486268416], [1437822144.0, 33460000.0, -29.459134264], [1437823872.0, 33480000.0, -29.431919807999996], [1437825600.0, 33500000.0, -29.404624999999996], [1437827328.0, 33520000.0, -29.377249791999994], [1437829056.0, 33540000.0, -29.349794135999993], [1437830784.0, 33560000.0, -29.32225798400001], [1437832512.0, 33580000.0, -29.294641287999994], [1437834240.0, 33600000.0, -29.26694400000001], [1437835968.0, 33620000.0, -29.23916607200001], [1437837696.0, 33640000.0, -29.211307456000007], [1437839424.0, 33660000.0, -29.183368104000003], [1437841152.0, 33680000.0, -29.155347968000008], [1437842880.0, 33700000.0, -29.127246999999997], [1437844608.0, 33720000.0, -29.099065152], [1437846336.0, 33740000.0, -29.070802376000003], [1437848064.0, 33760000.0, -29.042458624], [1437849792.0, 33780000.0, -29.014033847999997], [1437851520.0, 33800000.0, -28.985528000000002], [1437853248.0, 33820000.0, -28.956941031999996], [1437854976.0, 33840000.0, -28.928272895999996], [1437856704.0, 33860000.0, -28.899523543999997], [1437858432.0, 33880000.0, -28.870692927999997], [1437860160.0, 33900000.0, -28.84178099999999], [1437861888.0, 33920000.0, -28.81278771200001], [1437863616.0, 33940000.0, -28.783713015999993], [1437865344.0, 33960000.0, -28.75455686400001], [1437867072.0, 33980000.0, -28.725319208000002], [1437868800.0, 34000000.0, -28.696000000000005], [1437870528.0, 34020000.0, -28.666599192], [1437872256.0, 34040000.0, -28.637116736000003], [1437873984.0, 34060000.0, -28.607552583999997], [1437875712.0, 34080000.0, -28.577906688], [1437877440.0, 34100000.0, -28.548178999999998], [1437879168.0, 34120000.0, -28.518369471999996], [1437880896.0, 34140000.0, -28.48847805599999], [1437882624.0, 34160000.0, -28.458504704], [1437884352.0, 34180000.0, -28.428449367999995], [1437886080.0, 34200000.0, -28.39831200000001], [1437887808.0, 34220000.0, -28.368092551999993], [1437889536.0, 34240000.0, -28.337790976000008], [1437891264.0, 34260000.0, -28.307407224000002], [1437892992.0, 34280000.0, -28.276941248], [1437894720.0, 34300000.0, -28.246393000000005], [1437896448.0, 34320000.0, -28.215762432000005], [1437898176.0, 34340000.0, -28.185049495999998], [1437899904.0, 34360000.0, -28.154254144], [1437901632.0, 34380000.0, -28.123376328], [1437903360.0, 34400000.0, -28.092416], [1437905088.0, 34420000.0, -28.061373112], [1437906816.0, 34440000.0, -28.030247615999997], [1437908544.0, 34460000.0, -27.999039464], [1437910272.0, 34480000.0, -27.967748608], [1437912000.0, 34500000.0, -27.93637499999999], [1437913728.0, 34520000.0, -27.904918591999994], [1437915456.0, 34540000.0, -27.873379335999992], [1437917184.0, 34560000.0, -27.84175718400001], [1437918912.0, 34580000.0, -27.810052087999992], [1437920640.0, 34600000.0, -27.778264], [1437922368.0, 34620000.0, -27.746392872], [1437924096.0, 34640000.0, -27.714438656000006], [1437925824.0, 34660000.0, -27.682401303999995], [1437927552.0, 34680000.0, -27.650280768000002], [1437929280.0, 34700000.0, -27.618077], [1437931008.0, 34720000.0, -27.585789952], [1437932736.0, 34740000.0, -27.553419575999996], [1437934464.0, 34760000.0, -27.520965824], [1437936192.0, 34780000.0, -27.488428647999996], [1437937920.0, 34800000.0, -27.45580799999999], [1437939648.0, 34820000.0, -27.423103831999995], [1437941376.0, 34840000.0, -27.390316096000006], [1437943104.0, 34860000.0, -27.35744474399999], [1437944832.0, 34880000.0, -27.32448972800001], [1437946560.0, 34900000.0, -27.291451000000002], [1437948288.0, 34920000.0, 
-27.258328512000006], [1437950016.0, 34940000.0, -27.225122216000003], [1437951744.0, 34960000.0, -27.191832064000003], [1437953472.0, 34980000.0, -27.158458008000004], [1437955200.0, 35000000.0, -27.125], [1437956928.0, 35020000.0, -27.091457992000002], [1437958656.0, 35040000.0, -27.057831936], [1437960384.0, 35060000.0, -27.024121784000002], [1437962112.0, 35080000.0, -26.990327488], [1437963840.0, 35100000.0, -26.956449], [1437965568.0, 35120000.0, -26.922486271999993], [1437967296.0, 35140000.0, -26.888439256000012], [1437969024.0, 35160000.0, -26.854307903999995], [1437970752.0, 35180000.0, -26.82009216800001], [1437972480.0, 35200000.0, -26.785792000000008], [1437974208.0, 35220000.0, -26.751407352000008], [1437975936.0, 35240000.0, -26.716938176000006], [1437977664.0, 35260000.0, -26.682384424000006], [1437979392.0, 35280000.0, -26.647746048000002], [1437981120.0, 35300000.0, -26.613023], [1437982848.0, 35320000.0, -26.578215232000005], [1437984576.0, 35340000.0, -26.543322695999997], [1437986304.0, 35360000.0, -26.508345344], [1437988032.0, 35380000.0, -26.473283128], [1437989760.0, 35400000.0, -26.438135999999993], [1437991488.0, 35420000.0, -26.402903912000014], [1437993216.0, 35440000.0, -26.367586815999992], [1437994944.0, 35460000.0, -26.33218466400001], [1437996672.0, 35480000.0, -26.296697408000007], [1437998400.0, 35500000.0, -26.261125000000007], [1438000128.0, 35520000.0, -26.225467392000006], [1438001856.0, 35540000.0, -26.189724536000007], [1438003584.0, 35560000.0, -26.153896384], [1438005312.0, 35580000.0, -26.117982888000007], [1438007040.0, 35600000.0, -26.081984], [1438008768.0, 35620000.0, -26.045899672000004], [1438010496.0, 35640000.0, -26.009729856], [1438012224.0, 35660000.0, -25.973474503999995], [1438013952.0, 35680000.0, -25.937133567999993], [1438015680.0, 35700000.0, -25.900706999999997], [1438017408.0, 35720000.0, -25.864194751999996], [1438019136.0, 35740000.0, -25.827596775999993], [1438020864.0, 35760000.0, -25.79091302399999], [1438022592.0, 35780000.0, -25.754143448000008], [1438024320.0, 35800000.0, -25.717287999999996], [1438026048.0, 35820000.0, -25.68034663200001], [1438027776.0, 35840000.0, -25.643319296], [1438029504.0, 35860000.0, -25.606205944000003], [1438031232.0, 35880000.0, -25.569006528000003], [1438032960.0, 35900000.0, -25.531720999999997], [1438034688.0, 35920000.0, -25.494349312000004], [1438036416.0, 35940000.0, -25.456891416000005], [1438038144.0, 35960000.0, -25.419347263999995], [1438039872.0, 35980000.0, -25.381716808], [1438041600.0, 36000000.0, -25.343999999999994], [1438043328.0, 36020000.0, -25.306196791999994], [1438045056.0, 36040000.0, -25.268307135999997], [1438046784.0, 36060000.0, -25.230330984000005], [1438048512.0, 36080000.0, -25.192268287999994], [1438050240.0, 36100000.0, -25.15411900000001], [1438051968.0, 36120000.0, -25.115883072000003], [1438053696.0, 36140000.0, -25.077560456000008], [1438055424.0, 36160000.0, -25.039151104000005], [1438057152.0, 36180000.0, -25.000654968], [1438058880.0, 36200000.0, -24.962072000000006], [1438060608.0, 36220000.0, -24.923402152], [1438062336.0, 36240000.0, -24.884645376], [1438064064.0, 36260000.0, -24.845801624000003], [1438065792.0, 36280000.0, -24.806870847999996], [1438067520.0, 36300000.0, -24.767852999999995], [1438069248.0, 36320000.0, -24.72874803199999], [1438070976.0, 36340000.0, -24.689555896], [1438072704.0, 36360000.0, -24.650276543999993], [1438074432.0, 36380000.0, -24.610909927999998], [1438076160.0, 36400000.0, -24.57145599999999], [1438077888.0, 
36420000.0, -24.53191471200001], [1438079616.0, 36440000.0, -24.492286015999994], [1438081344.0, 36460000.0, -24.452569864000004], [1438083072.0, 36480000.0, -24.412766208], [1438084800.0, 36500000.0, -24.372875], [1438086528.0, 36520000.0, -24.332896192], [1438088256.0, 36540000.0, -24.292829736], [1438089984.0, 36560000.0, -24.252675584000002], [1438091712.0, 36580000.0, -24.212433688000004], [1438093440.0, 36600000.0, -24.172103999999997], [1438095168.0, 36620000.0, -24.131686472], [1438096896.0, 36640000.0, -24.091181055999996], [1438098624.0, 36660000.0, -24.050587703999994], [1438100352.0, 36680000.0, -24.00990636799999], [1438102080.0, 36700000.0, -23.96913700000001], [1438103808.0, 36720000.0, -23.928279551999992], [1438105536.0, 36740000.0, -23.887333976000008], [1438107264.0, 36760000.0, -23.846300224000004], [1438108992.0, 36780000.0, -23.805178248000004], [1438110720.0, 36800000.0, -23.763968000000006], [1438112448.0, 36820000.0, -23.722669432000004], [1438114176.0, 36840000.0, -23.681282496], [1438115904.0, 36860000.0, -23.639807144000002], [1438117632.0, 36880000.0, -23.598243328000002], [1438119360.0, 36900000.0, -23.556590999999997], [1438121088.0, 36920000.0, -23.514850111999998], [1438122816.0, 36940000.0, -23.473020616], [1438124544.0, 36960000.0, -23.43110246399999], [1438126272.0, 36980000.0, -23.389095607999998], [1438128000.0, 37000000.0, -23.346999999999994], [1438129728.0, 37020000.0, -23.304815591999997], [1438131456.0, 37040000.0, -23.26254233599999], [1438133184.0, 37060000.0, -23.220180184], [1438134912.0, 37080000.0, -23.177729087999985], [1438136640.0, 37100000.0, -23.135189000000004], [1438138368.0, 37120000.0, -23.092559872000002], [1438140096.0, 37140000.0, -23.049841655999998], [1438141824.0, 37160000.0, -23.007034304], [1438143552.0, 37180000.0, -22.964137768], [1438145280.0, 37200000.0, -22.921152], [1438147008.0, 37220000.0, -22.878076951999994], [1438148736.0, 37240000.0, -22.834912575999994], [1438150464.0, 37260000.0, -22.791658824000002], [1438152192.0, 37280000.0, -22.748315647999995], [1438153920.0, 37300000.0, -22.704882999999995], [1438155648.0, 37320000.0, -22.661360831999993], [1438157376.0, 37340000.0, -22.61774909600001], [1438159104.0, 37360000.0, -22.57404774399999], [1438160832.0, 37380000.0, -22.530256728000005], [1438162560.0, 37400000.0, -22.486376], [1438164288.0, 37420000.0, -22.442405512000008], [1438166016.0, 37440000.0, -22.398345216000003], [1438167744.0, 37460000.0, -22.354195064000002], [1438169472.0, 37480000.0, -22.309955007999996], [1438171200.0, 37500000.0, -22.265625], [1438172928.0, 37520000.0, -22.221204992000004], [1438174656.0, 37540000.0, -22.176694935999997], [1438176384.0, 37560000.0, -22.132094784000003], [1438178112.0, 37580000.0, -22.087404487999997], [1438179840.0, 37600000.0, -22.042623999999996], [1438181568.0, 37620000.0, -21.99775327199999], [1438183296.0, 37640000.0, -21.952792256000016], [1438185024.0, 37660000.0, -21.907740903999994], [1438186752.0, 37680000.0, -21.86259916800001], [1438188480.0, 37700000.0, -21.817367000000004], [1438190208.0, 37720000.0, -21.772044352], [1438191936.0, 37740000.0, -21.726631176000005], [1438193664.0, 37760000.0, -21.681127424000003], [1438195392.0, 37780000.0, -21.635533048], [1438197120.0, 37800000.0, -21.589847999999996], [1438198848.0, 37820000.0, -21.544072232000005], [1438200576.0, 37840000.0, -21.498205696], [1438202304.0, 37860000.0, -21.452248343999997], [1438204032.0, 37880000.0, -21.406200128000002], [1438205760.0, 37900000.0, -21.360060999999995], 
[1438207488.0, 37920000.0, -21.313830912000007], [1438209216.0, 37940000.0, -21.267509815999993], [1438210944.0, 37960000.0, -21.221097664000013], [1438212672.0, 37980000.0, -21.174594408000004], [1438214400.0, 38000000.0, -21.128000000000007], [1438216128.0, 38020000.0, -21.081314392000003], [1438217856.0, 38040000.0, -21.03453753600001], [1438219584.0, 38060000.0, -20.987669384000007], [1438221312.0, 38080000.0, -20.940709888000008], [1438223040.0, 38100000.0, -20.893659], [1438224768.0, 38120000.0, -20.846516672], [1438226496.0, 38140000.0, -20.799282855999998], [1438228224.0, 38160000.0, -20.751957504000003], [1438229952.0, 38180000.0, -20.70454056799999], [1438231680.0, 38200000.0, -20.657031999999994], [1438233408.0, 38220000.0, -20.609431751999992], [1438235136.0, 38240000.0, -20.561739775999996], [1438236864.0, 38260000.0, -20.513956023999988], [1438238592.0, 38280000.0, -20.466080448000007], [1438240320.0, 38300000.0, -20.41811299999999], [1438242048.0, 38320000.0, -20.37005363200001], [1438243776.0, 38340000.0, -20.321902296000005], [1438245504.0, 38360000.0, -20.273658944000005], [1438247232.0, 38380000.0, -20.225323528000004], [1438248960.0, 38400000.0, -20.176896], [1438250688.0, 38420000.0, -20.128376312], [1438252416.0, 38440000.0, -20.079764416000003], [1438254144.0, 38460000.0, -20.031060263999997], [1438255872.0, 38480000.0, -19.982263808], [1438257600.0, 38500000.0, -19.93337499999999], [1438259328.0, 38520000.0, -19.884393792000004], [1438261056.0, 38540000.0, -19.835320135999993], [1438262784.0, 38560000.0, -19.78615398400001], [1438264512.0, 38580000.0, -19.736895287999992], [1438266240.0, 38600000.0, -19.68754400000001], [1438267968.0, 38620000.0, -19.638100072000007], [1438269696.0, 38640000.0, -19.588563456000003], [1438271424.0, 38660000.0, -19.538934104000006], [1438273152.0, 38680000.0, -19.489211968000006], [1438274880.0, 38700000.0, -19.439397], [1438276608.0, 38720000.0, -19.389489152000003], [1438278336.0, 38740000.0, -19.339488376], [1438280064.0, 38760000.0, -19.289394624000003], [1438281792.0, 38780000.0, -19.239207847999992], [1438283520.0, 38800000.0, -19.188927999999997], [1438285248.0, 38820000.0, -19.138555032], [1438286976.0, 38840000.0, -19.088088895999995], [1438288704.0, 38860000.0, -19.037529543999995], [1438290432.0, 38880000.0, -18.986876927999994], [1438292160.0, 38900000.0, -18.93613099999999], [1438293888.0, 38920000.0, -18.88529171200001], [1438295616.0, 38940000.0, -18.834359015999986], [1438297344.0, 38960000.0, -18.783332864000002], [1438299072.0, 38980000.0, -18.732213208000005], [1438300800.0, 39000000.0, -18.681000000000004], [1438302528.0, 39020000.0, -18.629693192000005], [1438304256.0, 39040000.0, -18.578292736], [1438305984.0, 39060000.0, -18.526798583999998], [1438307712.0, 39080000.0, -18.475210687999997], [1438309440.0, 39100000.0, -18.423528999999995], [1438311168.0, 39120000.0, -18.371753471999995], [1438312896.0, 39140000.0, -18.319884055999992], [1438314624.0, 39160000.0, -18.267920703999998], [1438316352.0, 39180000.0, -18.215863367999994], [1438318080.0, 39200000.0, -18.16371200000001], [1438319808.0, 39220000.0, -18.11146655199999], [1438321536.0, 39240000.0, -18.059126976], [1438323264.0, 39260000.0, -18.006693224000003], [1438324992.0, 39280000.0, -17.954165248000002], [1438326720.0, 39300000.0, -17.901543000000004], [1438328448.0, 39320000.0, -17.848826432000003], [1438330176.0, 39340000.0, -17.796015496000003], [1438331904.0, 39360000.0, -17.743110144], [1438333632.0, 39380000.0, -17.690110327999996], [1438335360.0, 
39400000.0, -17.637015999999996], [1438337088.0, 39420000.0, -17.583827111999994], [1438338816.0, 39440000.0, -17.530543616000003], [1438340544.0, 39460000.0, -17.477165463999988], [1438342272.0, 39480000.0, -17.423692607999996], [1438344000.0, 39500000.0, -17.370124999999994], [1438345728.0, 39520000.0, -17.316462591999993], [1438347456.0, 39540000.0, -17.262705335999996], [1438349184.0, 39560000.0, -17.208853184000006], [1438350912.0, 39580000.0, -17.15490608799999], [1438352640.0, 39600000.0, -17.10086400000001], [1438354368.0, 39620000.0, -17.046726872], [1438356096.0, 39640000.0, -16.992494656000005], [1438357824.0, 39660000.0, -16.938167303999997], [1438359552.0, 39680000.0, -16.883744768000007], [1438361280.0, 39700000.0, -16.829226999999996], [1438363008.0, 39720000.0, -16.774613952000003], [1438364736.0, 39740000.0, -16.719905575999995], [1438366464.0, 39760000.0, -16.665101823999997], [1438368192.0, 39780000.0, -16.61020264799999], [1438369920.0, 39800000.0, -16.555207999999993], [1438371648.0, 39820000.0, -16.500117831999987], [1438373376.0, 39840000.0, -16.44493209600001], [1438375104.0, 39860000.0, -16.389650743999987], [1438376832.0, 39880000.0, -16.334273728000007], [1438378560.0, 39900000.0, -16.278801], [1438380288.0, 39920000.0, -16.223232512000003], [1438382016.0, 39940000.0, -16.167568216], [1438383744.0, 39960000.0, -16.111808064], [1438385472.0, 39980000.0, -16.055952008], [1438387200.0, 40000000.0, -16.0], [1438388928.0, 40020000.0, -15.943951991999995], [1438390656.0, 40040000.0, -15.887807936000016], [1438392384.0, 40060000.0, -15.831567783999986], [1438394112.0, 40080000.0, -15.775231488000003], [1438395840.0, 40100000.0, -15.718799000000004], [1438397568.0, 40120000.0, -15.662270272000015], [1438399296.0, 40140000.0, -15.605645255999988], [1438401024.0, 40160000.0, -15.548923903999992], [1438402752.0, 40180000.0, -15.492106168000007], [1438404480.0, 40200000.0, -15.435192000000015], [1438406208.0, 40220000.0, -15.378181351999984], [1438407936.0, 40240000.0, -15.32107417600001], [1438409664.0, 40260000.0, -15.263870424000018], [1438411392.0, 40280000.0, -15.206570048000017], [1438413120.0, 40300000.0, -15.14917299999999], [1438414848.0, 40320000.0, -15.09167923199999], [1438416576.0, 40340000.0, -15.034088695999998], [1438418304.0, 40360000.0, -14.976401344000024], [1438420032.0, 40380000.0, -14.918617127999994], [1438421760.0, 40400000.0, -14.860736000000003], [1438423488.0, 40420000.0, -14.802757912000004], [1438425216.0, 40440000.0, -14.744682816000008], [1438426944.0, 40460000.0, -14.686510663999982], [1438428672.0, 40480000.0, -14.628241408000008], [1438430400.0, 40500000.0, -14.56987500000001], [1438432128.0, 40520000.0, -14.511411392], [1438433856.0, 40540000.0, -14.452850536], [1438435584.0, 40560000.0, -14.394192383999993], [1438437312.0, 40580000.0, -14.335436888000004], [1438439040.0, 40600000.0, -14.276584000000014], [1438440768.0, 40620000.0, -14.217633671999991], [1438442496.0, 40640000.0, -14.158585856000002], [1438444224.0, 40660000.0, -14.099440504], [1438445952.0, 40680000.0, -14.04019756800001], [1438447680.0, 40700000.0, -13.980857], [1438449408.0, 40720000.0, -13.921418751999994], [1438451136.0, 40740000.0, -13.861882775999987], [1438452864.0, 40760000.0, -13.802249024000005], [1438454592.0, 40780000.0, -13.742517447999987], [1438456320.0, 40800000.0, -13.682687999999999], [1438458048.0, 40820000.0, -13.62276063200001], [1438459776.0, 40840000.0, -13.562735296000014], [1438461504.0, 40860000.0, -13.50261194399998], [1438463232.0, 40880000.0, 
-13.442390528000004], [1438464960.0, 40900000.0, -13.38207100000001], [1438466688.0, 40920000.0, -13.32165331200001], [1438468416.0, 40940000.0, -13.261137415999997], [1438470144.0, 40960000.0, -13.200523263999997], [1438471872.0, 40980000.0, -13.139810808000007], [1438473600.0, 41000000.0, -13.079000000000008], [1438475328.0, 41020000.0, -13.018090791999981], [1438477056.0, 41040000.0, -12.957083135999994], [1438478784.0, 41060000.0, -12.895976984000015], [1438480512.0, 41080000.0, -12.83477228800001], [1438482240.0, 41100000.0, -12.773468999999992], [1438483968.0, 41120000.0, -12.71206707200001], [1438485696.0, 41140000.0, -12.650566456000007], [1438487424.0, 41160000.0, -12.588967104000005], [1438489152.0, 41180000.0, -12.527268968000001], [1438490880.0, 41200000.0, -12.465472000000005], [1438492608.0, 41220000.0, -12.403576152000014], [1438494336.0, 41240000.0, -12.341581376000008], [1438496064.0, 41260000.0, -12.279487623999998], [1438497792.0, 41280000.0, -12.217294848000009], [1438499520.0, 41300000.0, -12.155003000000008], [1438501248.0, 41320000.0, -12.092612032000005], [1438502976.0, 41340000.0, -12.030121895999983], [1438504704.0, 41360000.0, -11.967532543999994], [1438506432.0, 41380000.0, -11.904843927999991], [1438508160.0, 41400000.0, -11.842056000000014], [1438509888.0, 41420000.0, -11.779168711999986], [1438511616.0, 41440000.0, -11.71618201599999], [1438513344.0, 41460000.0, -11.653095864000008], [1438515072.0, 41480000.0, -11.589910208000006], [1438516800.0, 41500000.0, -11.526624999999981], [1438518528.0, 41520000.0, -11.463240192], [1438520256.0, 41540000.0, -11.399755736000003], [1438521984.0, 41560000.0, -11.336171584000013], [1438523712.0, 41580000.0, -11.272487687999984], [1438525440.0, 41600000.0, -11.208703999999997], [1438527168.0, 41620000.0, -11.144820472000006], [1438528896.0, 41640000.0, -11.080837056000007], [1438530624.0, 41660000.0, -11.016753703999981], [1438532352.0, 41680000.0, -10.952570367999982], [1438534080.0, 41700000.0, -10.888287000000005], [1438535808.0, 41720000.0, -10.823903552000019], [1438537536.0, 41740000.0, -10.759419975999975], [1438539264.0, 41760000.0, -10.694836224], [1438540992.0, 41780000.0, -10.630152248000016], [1438542720.0, 41800000.0, -10.565368000000007], [1438544448.0, 41820000.0, -10.500483431999996], [1438546176.0, 41840000.0, -10.435498495999994], [1438547904.0, 41860000.0, -10.370413143999997], [1438549632.0, 41880000.0, -10.305227328000015], [1438551360.0, 41900000.0, -10.239940999999988], [1438553088.0, 41920000.0, -10.174554111999996], [1438554816.0, 41940000.0, -10.109066616000007], [1438556544.0, 41960000.0, -10.043478463999989], [1438558272.0, 41980000.0, -9.97778960799998], [1438560000.0, 42000000.0, -9.911999999999992], [1438561728.0, 42020000.0, -9.84610959199999], [1438563456.0, 42040000.0, -9.780118336000001], [1438565184.0, 42060000.0, -9.71402618399999], [1438566912.0, 42080000.0, -9.647833087999985], [1438568640.0, 42100000.0, -9.581539000000006], [1438570368.0, 42120000.0, -9.51514387200001], [1438572096.0, 42140000.0, -9.448647655999977], [1438573824.0, 42160000.0, -9.382050304000003], [1438575552.0, 42180000.0, -9.315351768], [1438577280.0, 42200000.0, -9.248552000000004], [1438579008.0, 42220000.0, -9.181650951999984], [1438580736.0, 42240000.0, -9.114648575999993], [1438582464.0, 42260000.0, -9.047544824], [1438584192.0, 42280000.0, -8.980339648000012], [1438585920.0, 42300000.0, -8.913032999999984], [1438587648.0, 42320000.0, -8.845624831999999], [1438589376.0, 42340000.0, -8.778115096000008], 
[1438591104.0, 42360000.0, -8.710503744000007], [1438592832.0, 42380000.0, -8.642790727999994], [1438594560.0, 42400000.0, -8.574975999999992], [1438596288.0, 42420000.0, -8.507059511999998], [1438598016.0, 42440000.0, -8.439041216000007], [1438599744.0, 42460000.0, -8.370921064000001], [1438601472.0, 42480000.0, -8.302699008000005], [1438603200.0, 42500000.0, -8.234375], [1438604928.0, 42520000.0, -8.165948992000011], [1438606656.0, 42540000.0, -8.097420936000006], [1438608384.0, 42560000.0, -8.02879078399998], [1438610112.0, 42580000.0, -7.960058488000001], [1438611840.0, 42600000.0, -7.891224000000008], [1438613568.0, 42620000.0, -7.822287272000025], [1438615296.0, 42640000.0, -7.753248255999992], [1438617024.0, 42660000.0, -7.684106903999989], [1438618752.0, 42680000.0, -7.614863168000014], [1438620480.0, 42700000.0, -7.545517000000018], [1438622208.0, 42720000.0, -7.476068351999984], [1438623936.0, 42740000.0, -7.406517176000008], [1438625664.0, 42760000.0, -7.336863424000001], [1438627392.0, 42780000.0, -7.267107048000014], [1438629120.0, 42800000.0, -7.197248000000002], [1438630848.0, 42820000.0, -7.127286232000003], [1438632576.0, 42840000.0, -7.057221695999999], [1438634304.0, 42860000.0, -6.987054344000015], [1438636032.0, 42880000.0, -6.916784127999989], [1438637760.0, 42900000.0, -6.846411000000003], [1438639488.0, 42920000.0, -6.775934912000025], [1438641216.0, 42940000.0, -6.705355816000022], [1438642944.0, 42960000.0, -6.63467366399999], [1438644672.0, 42980000.0, -6.563888408000011], [1438646400.0, 43000000.0, -6.493000000000009], [1438648128.0, 43020000.0, -6.422008392000009], [1438649856.0, 43040000.0, -6.350913535999993], [1438651584.0, 43060000.0, -6.279715383999999], [1438653312.0, 43080000.0, -6.2084138879999955], [1438655040.0, 43100000.0, -6.137009000000006], [1438656768.0, 43120000.0, -6.065500671999999], [1438658496.0, 43140000.0, -5.993888855999998], [1438660224.0, 43160000.0, -5.922173504], [1438661952.0, 43180000.0, -5.850354568000014], [1438663680.0, 43200000.0, -5.778431999999995], [1438665408.0, 43220000.0, -5.706405751999995], [1438667136.0, 43240000.0, -5.6342757759999955], [1438668864.0, 43260000.0, -5.562042024000007], [1438670592.0, 43280000.0, -5.4897044479999835], [1438672320.0, 43300000.0, -5.417262999999991], [1438674048.0, 43320000.0, -5.3447176320000125], [1438675776.0, 43340000.0, -5.272068296000015], [1438677504.0, 43360000.0, -5.19931494399998], [1438679232.0, 43380000.0, -5.126457528000003], [1438680960.0, 43400000.0, -5.0534959999999955], [1438682688.0, 43420000.0, -4.98043031200001], [1438684416.0, 43440000.0, -4.907260416], [1438686144.0, 43460000.0, -4.833986263999989], [1438687872.0, 43480000.0, -4.760607808000003], [1438689600.0, 43500000.0, -4.687125000000023], [1438691328.0, 43520000.0, -4.613537791999988], [1438693056.0, 43540000.0, -4.53984613599998], [1438694784.0, 43560000.0, -4.4660499840000085], [1438696512.0, 43580000.0, -4.392149288000027], [1438698240.0, 43600000.0, -4.3181439999999895], [1438699968.0, 43620000.0, -4.244034072000005], [1438701696.0, 43640000.0, -4.169819456000013], [1438703424.0, 43660000.0, -4.09550010400001], [1438705152.0, 43680000.0, -4.021075967999991], [1438706880.0, 43700000.0, -3.9465469999999954], [1438708608.0, 43720000.0, -3.8719131520000047], [1438710336.0, 43740000.0, -3.797174376000001], [1438712064.0, 43760000.0, -3.7223306239999943], [1438713792.0, 43780000.0, -3.647381847999995], [1438715520.0, 43800000.0, -3.572327999999999], [1438717248.0, 43820000.0, -3.497169032000002], [1438718976.0, 
43840000.0, -3.4219048959999867], [1438720704.0, 43860000.0, -3.346535543999991], [1438722432.0, 43880000.0, -3.271060927999997], [1438724160.0, 43900000.0, -3.195481000000015], [1438725888.0, 43920000.0, -3.119795711999984], [1438727616.0, 43940000.0, -3.044005016], [1438729344.0, 43960000.0, -2.9681088640000155], [1438731072.0, 43980000.0, -2.892107208000013], [1438732800.0, 44000000.0, -2.815999999999974], [1438734528.0, 44020000.0, -2.7397871919999943], [1438736256.0, 44040000.0, -2.663468736000013], [1438737984.0, 44060000.0, -2.5870445839999974], [1438739712.0, 44080000.0, -2.5105146879999864], [1438741440.0, 44100000.0, -2.4338790000000046], [1438743168.0, 44120000.0, -2.357137471999991], [1438744896.0, 44140000.0, -2.2802900560000126], [1438746624.0, 44160000.0, -2.2033367039999945], [1438748352.0, 44180000.0, -2.1262773679999896], [1438750080.0, 44200000.0, -2.049112000000008], [1438751808.0, 44220000.0, -1.9718405520000175], [1438753536.0, 44240000.0, -1.8944629759999856], [1438755264.0, 44260000.0, -1.8169792240000078], [1438756992.0, 44280000.0, -1.739389248000009], [1438758720.0, 44300000.0, -1.6616930000000139], [1438760448.0, 44320000.0, -1.5838904319999898], [1438762176.0, 44340000.0, -1.505981496000004], [1438763904.0, 44360000.0, -1.4279661439999956], [1438765632.0, 44380000.0, -1.3498443280000032], [1438767360.0, 44400000.0, -1.2716159999999945], [1438769088.0, 44420000.0, -1.193281111999994], [1438770816.0, 44440000.0, -1.1148396159999976], [1438772544.0, 44460000.0, -1.0362914640000014], [1438774272.0, 44480000.0, -0.957636607999973], [1438776000.0, 44500000.0, -0.8788749999999936], [1438777728.0, 44520000.0, -0.8000065919999884], [1438779456.0, 44540000.0, -0.72103133600001], [1438781184.0, 44560000.0, -0.6419491839999836], [1438782912.0, 44580000.0, -0.5627600879999903], [1438784640.0, 44600000.0, -0.4834639999999979], [1438786368.0, 44620000.0, -0.4040608720000165], [1438788096.0, 44640000.0, -0.3245506559999711], [1438789824.0, 44660000.0, -0.2449333039999999], [1438791552.0, 44680000.0, -0.1652087679999994], [1438793280.0, 44700000.0, -0.08537700000000825], [1438795008.0, 44720000.0, -0.005437951999979873], [1438796736.0, 44740000.0, 0.07460842400000445], [1438798464.0, 44760000.0, 0.15476217600000552], [1438800192.0, 44780000.0, 0.2350233519999705], [1438801920.0, 44800000.0, 0.315392000000017], [1438803648.0, 44820000.0, 0.3958681680000211], [1438805376.0, 44840000.0, 0.4764519039999868], [1438807104.0, 44860000.0, 0.5571432559999892], [1438808832.0, 44880000.0, 0.6379422720000321], [1438810560.0, 44900000.0, 0.7188490000000058], [1438812288.0, 44920000.0, 0.7998634879999855], [1438814016.0, 44940000.0, 0.8809857840000035], [1438815744.0, 44960000.0, 0.9622159360000069], [1438817472.0, 44980000.0, 1.0435539920000139], [1438819200.0, 45000000.0, 1.125], [1438820928.0, 45020000.0, 1.2065540079999977], [1438822656.0, 45040000.0, 1.2882160639999967], [1438824384.0, 45060000.0, 1.3699862160000151], [1438826112.0, 45080000.0, 1.4518645120000144], [1438827840.0, 45100000.0, 1.5338509999999985], [1438829568.0, 45120000.0, 1.6159457279999714], [1438831296.0, 45140000.0, 1.6981487440000222], [1438833024.0, 45160000.0, 1.7804600959999988], [1438834752.0, 45180000.0, 1.8628798319999902], [1438836480.0, 45200000.0, 1.945407999999972], [1438838208.0, 45220000.0, 2.028044648000005], [1438839936.0, 45240000.0, 2.110789823999994], [1438841664.0, 45260000.0, 2.1936435759999853], [1438843392.0, 45280000.0, 2.276605951999983], [1438845120.0, 45300000.0, 2.359677000000005], 
[1438846848.0, 45320000.0, 2.442856768000013], [1438848576.0, 45340000.0, 2.5261453039999964], [1438850304.0, 45360000.0, 2.609542655999988], [1438852032.0, 45380000.0, 2.6930488720000056], [1438853760.0, 45400000.0, 2.7766639999999967], [1438855488.0, 45420000.0, 2.8603880879999792], [1438857216.0, 45440000.0, 2.9442211839999857], [1438858944.0, 45460000.0, 3.02816333600002], [1438860672.0, 45480000.0, 3.112214592000001], [1438862400.0, 45500000.0, 3.196374999999989], [1438864128.0, 45520000.0, 3.2806446079999887], [1438865856.0, 45540000.0, 3.3650234640000036], [1438867584.0, 45560000.0, 3.4495116159999952], [1438869312.0, 45580000.0, 3.534109111999996], [1438871040.0, 45600000.0, 3.6188159999999954], [1438872768.0, 45620000.0, 3.703632328000012], [1438874496.0, 45640000.0, 3.7885581440000067], [1438876224.0, 45660000.0, 3.873593495999998], [1438877952.0, 45680000.0, 3.958738432000004], [1438879680.0, 45700000.0, 4.043993000000015], [1438881408.0, 45720000.0, 4.129357248000005], [1438883136.0, 45740000.0, 4.214831223999994], [1438884864.0, 45760000.0, 4.300414975999971], [1438886592.0, 45780000.0, 4.3861085520000245], [1438888320.0, 45800000.0, 4.471912000000017], [1438890048.0, 45820000.0, 4.557825367999982], [1438891776.0, 45840000.0, 4.6438487039999785], [1438893504.0, 45860000.0, 4.729982056000026], [1438895232.0, 45880000.0, 4.816225471999999], [1438896960.0, 45900000.0, 4.902578999999989], [1438898688.0, 45920000.0, 4.989042687999984], [1438900416.0, 45940000.0, 5.075616584000002], [1438902144.0, 45960000.0, 5.162300735999992], [1438903872.0, 45980000.0, 5.249095191999984], [1438905600.0, 46000000.0, 5.33599999999997], [1438907328.0, 46020000.0, 5.423015208000024], [1438909056.0, 46040000.0, 5.510140864000007], [1438910784.0, 46060000.0, 5.597377015999982], [1438912512.0, 46080000.0, 5.684723711999979], [1438914240.0, 46100000.0, 5.7721810000000175], [1438915968.0, 46120000.0, 5.859748928000002], [1438917696.0, 46140000.0, 5.947427543999979], [1438919424.0, 46160000.0, 6.035216895999994], [1438921152.0, 46180000.0, 6.12311703200001], [1438922880.0, 46200000.0, 6.211128000000002], [1438924608.0, 46220000.0, 6.299249847999988], [1438926336.0, 46240000.0, 6.387482623999986], [1438928064.0, 46260000.0, 6.475826376000015], [1438929792.0, 46280000.0, 6.564281151999992], [1438931520.0, 46300000.0, 6.652846999999994], [1438933248.0, 46320000.0, 6.741523967999996], [1438934976.0, 46340000.0, 6.830312104000029], [1438936704.0, 46360000.0, 6.9192114560000135], [1438938432.0, 46380000.0, 7.008222071999995], [1438940160.0, 46400000.0, 7.0973439999999925], [1438941888.0, 46420000.0, 7.186577288000024], [1438943616.0, 46440000.0, 7.275921984000021], [1438945344.0, 46460000.0, 7.36537813599999], [1438947072.0, 46480000.0, 7.4549457919999895], [1438948800.0, 46500000.0, 7.544625000000025], [1438950528.0, 46520000.0, 7.634415808], [1438952256.0, 46540000.0, 7.724318264000004], [1438953984.0, 46560000.0, 7.814332415999999], [1438955712.0, 46580000.0, 7.904458312000017], [1438957440.0, 46600000.0, 7.994696000000005], [1438959168.0, 46620000.0, 8.085045528000009], [1438960896.0, 46640000.0, 8.175506943999977], [1438962624.0, 46660000.0, 8.266080296000013], [1438964352.0, 46680000.0, 8.35676563200002], [1438966080.0, 46700000.0, 8.447562999999988], [1438967808.0, 46720000.0, 8.538472447999993], [1438969536.0, 46740000.0, 8.629494024000024], [1438971264.0, 46760000.0, 8.720627776], [1438972992.0, 46780000.0, 8.811873751999997], [1438974720.0, 46800000.0, 8.903231999999988], [1438976448.0, 46820000.0, 
8.994702568000008], [1438978176.0, 46840000.0, 9.086285504000003], [1438979904.0, 46860000.0, 9.177980856000005], [1438981632.0, 46880000.0, 9.26978867199999], [1438983360.0, 46900000.0, 9.361709000000005], [1438985088.0, 46920000.0, 9.45374188800001], [1438986816.0, 46940000.0, 9.545887383999997], [1438988544.0, 46960000.0, 9.638145535999996], [1438990272.0, 46980000.0, 9.730516392000027], [1438992000.0, 47000000.0, 9.823000000000022], [1438993728.0, 47020000.0, 9.915596407999999], [1438995456.0, 47040000.0, 10.008305663999977], [1438997184.0, 47060000.0, 10.10112781600003], [1438998912.0, 47080000.0, 10.194062912000007], [1439000640.0, 47100000.0, 10.287110999999996], [1439002368.0, 47120000.0, 10.380272127999987], [1439004096.0, 47140000.0, 10.473546344000027], [1439005824.0, 47160000.0, 10.566933696000007], [1439007552.0, 47180000.0, 10.660434232], [1439009280.0, 47200000.0, 10.754047999999997], [1439011008.0, 47220000.0, 10.847775048000017], [1439012736.0, 47240000.0, 10.941615424000005], [1439014464.0, 47260000.0, 11.03556917600001], [1439016192.0, 47280000.0, 11.129636351999977], [1439017920.0, 47300000.0, 11.223817000000025], [1439019648.0, 47320000.0, 11.318111168000016], [1439021376.0, 47340000.0, 11.412518903999995], [1439023104.0, 47360000.0, 11.507040255999982], [1439024832.0, 47380000.0, 11.601675272000023], [1439026560.0, 47400000.0, 11.696423999999993], [1439028288.0, 47420000.0, 11.791286487999997], [1439030016.0, 47440000.0, 11.886262783999996], [1439031744.0, 47460000.0, 11.981352936000022], [1439033472.0, 47480000.0, 12.076556992000008], [1439035200.0, 47500000.0, 12.171875], [1439036928.0, 47520000.0, 12.267307008000003], [1439038656.0, 47540000.0, 12.362853063999978], [1439040384.0, 47560000.0, 12.458513216000014], [1439042112.0, 47580000.0, 12.554287512000002], [1439043840.0, 47600000.0, 12.650176000000002], [1439045568.0, 47620000.0, 12.746178727999975], [1439047296.0, 47640000.0, 12.842295744000012], [1439049024.0, 47660000.0, 12.938527096000016], [1439050752.0, 47680000.0, 13.03487283199999], [1439052480.0, 47700000.0, 13.131332999999984], [1439054208.0, 47720000.0, 13.227907648000027], [1439055936.0, 47740000.0, 13.324596823999997], [1439057664.0, 47760000.0, 13.421400575999996], [1439059392.0, 47780000.0, 13.518318951999987], [1439061120.0, 47800000.0, 13.615352000000001], [1439062848.0, 47820000.0, 13.712499768], [1439064576.0, 47840000.0, 13.809762304000003], [1439066304.0, 47860000.0, 13.90713965599997], [1439068032.0, 47880000.0, 14.004631872000019], [1439069760.0, 47900000.0, 14.102238999999997], [1439071488.0, 47920000.0, 14.19996108799998], [1439073216.0, 47940000.0, 14.297798183999973], [1439074944.0, 47960000.0, 14.39575033600002], [1439076672.0, 47980000.0, 14.493817592], [1439078400.0, 48000000.0, 14.591999999999999], [1439080128.0, 48020000.0, 14.69029760799998], [1439081856.0, 48040000.0, 14.788710464000019], [1439083584.0, 48060000.0, 14.887238616000005], [1439085312.0, 48080000.0, 14.985882111999999], [1439087040.0, 48100000.0, 15.084640999999976], [1439088768.0, 48120000.0, 15.183515328000013], [1439090496.0, 48140000.0, 15.282505144000012], [1439092224.0, 48160000.0, 15.381610495999993], [1439093952.0, 48180000.0, 15.480831431999988], [1439095680.0, 48200000.0, 15.580168000000015], [1439097408.0, 48220000.0, 15.679620248000006], [1439099136.0, 48240000.0, 15.779188223999995], [1439100864.0, 48260000.0, 15.87887197599997], [1439102592.0, 48280000.0, 15.978671552000023], [1439104320.0, 48300000.0, 16.078587000000013], [1439106048.0, 48320000.0, 
16.178618367999988], [1439107776.0, 48340000.0, 16.27876570399998], [1439109504.0, 48360000.0, 16.37902905600002], [1439111232.0, 48380000.0, 16.479408472000003], [1439112960.0, 48400000.0, 16.579904], [1439114688.0, 48420000.0, 16.680515687999986], [1439116416.0, 48440000.0, 16.78124358400001], [1439118144.0, 48460000.0, 16.882087736000003], [1439119872.0, 48480000.0, 16.983048192], [1439121600.0, 48500000.0, 17.084124999999972], [1439123328.0, 48520000.0, 17.185318208000012], [1439125056.0, 48540000.0, 17.28662786400001], [1439126784.0, 48560000.0, 17.388054015999984], [1439128512.0, 48580000.0, 17.48959671199998], [1439130240.0, 48600000.0, 17.591256000000016], [1439131968.0, 48620000.0, 17.693031927999996], [1439133696.0, 48640000.0, 17.794924543999997], [1439135424.0, 48660000.0, 17.896933895999993], [1439137152.0, 48680000.0, 17.999060032000003], [1439138880.0, 48700000.0, 18.101303000000016], [1439140608.0, 48720000.0, 18.203662847999993], [1439142336.0, 48740000.0, 18.306139623999982], [1439144064.0, 48760000.0, 18.408733376000015], [1439145792.0, 48780000.0, 18.511444151999996], [1439147520.0, 48800000.0, 18.614272], [1439149248.0, 48820000.0, 18.71721696799999], [1439150976.0, 48840000.0, 18.820279104000022], [1439152704.0, 48860000.0, 18.923458456000006], [1439154432.0, 48880000.0, 19.026755072], [1439156160.0, 48900000.0, 19.130168999999967], [1439157888.0, 48920000.0, 19.233700288000023], [1439159616.0, 48940000.0, 19.337348984000016], [1439161344.0, 48960000.0, 19.441115135999993], [1439163072.0, 48980000.0, 19.544998791999987], [1439164800.0, 49000000.0, 19.64900000000003], [1439166528.0, 49020000.0, 19.75311880800001], [1439168256.0, 49040000.0, 19.85735526399999], [1439169984.0, 49060000.0, 19.96170941599999], [1439171712.0, 49080000.0, 20.066181312000012], [1439173440.0, 49100000.0, 20.170771000000002], [1439175168.0, 49120000.0, 20.275478528000008], [1439176896.0, 49140000.0, 20.380303943999976], [1439178624.0, 49160000.0, 20.48524729600001], [1439180352.0, 49180000.0, 20.590308632000017], [1439182080.0, 49200000.0, 20.695487999999983], [1439183808.0, 49220000.0, 20.800785447999985], [1439185536.0, 49240000.0, 20.906201024000026], [1439187264.0, 49260000.0, 21.01173477600001], [1439188992.0, 49280000.0, 21.117386752], [1439190720.0, 49300000.0, 21.223156999999986], [1439192448.0, 49320000.0, 21.329045568000012], [1439194176.0, 49340000.0, 21.435052503999998], [1439195904.0, 49360000.0, 21.541177856000004], [1439197632.0, 49380000.0, 21.64742167199998], [1439199360.0, 49400000.0, 21.753784000000024], [1439201088.0, 49420000.0, 21.860264888000017], [1439202816.0, 49440000.0, 21.966864384000004], [1439204544.0, 49460000.0, 22.07358253599999], [1439206272.0, 49480000.0, 22.180419392000033], [1439208000.0, 49500000.0, 22.28737500000001], [1439209728.0, 49520000.0, 22.394449408], [1439211456.0, 49540000.0, 22.501642663999974], [1439213184.0, 49560000.0, 22.608954816000022], [1439214912.0, 49580000.0, 22.71638591200002], [1439216640.0, 49600000.0, 22.823936000000003], [1439218368.0, 49620000.0, 22.931605127999987], [1439220096.0, 49640000.0, 23.039393344000032], [1439221824.0, 49660000.0, 23.147300696000002], [1439223552.0, 49680000.0, 23.255327232], [1439225280.0, 49700000.0, 23.363472999999985], [1439227008.0, 49720000.0, 23.47173804800002], [1439228736.0, 49740000.0, 23.58012242400001], [1439230464.0, 49760000.0, 23.688626176], [1439232192.0, 49780000.0, 23.79724935199998], [1439233920.0, 49800000.0, 23.905992000000012], [1439235648.0, 49820000.0, 24.014854168000014], 
[1439237376.0, 49840000.0, 24.12383590399999], [1439239104.0, 49860000.0, 24.232937255999985], [1439240832.0, 49880000.0, 24.342158272000034], [1439242560.0, 49900000.0, 24.451499], [1439244288.0, 49920000.0, 24.560959487999995], [1439246016.0, 49940000.0, 24.670539784], [1439247744.0, 49960000.0, 24.780239936000015], [1439249472.0, 49980000.0, 24.890059992000005], [1439251200.0, 50000000.0, 25.0], [1439252928.0, 50020000.0, 25.11006000799999], [1439254656.0, 50040000.0, 25.22024006399998], [1439256384.0, 50060000.0, 25.330540216000017], [1439258112.0, 50080000.0, 25.440960512000004], [1439259840.0, 50100000.0, 25.551500999999988], [1439261568.0, 50120000.0, 25.662161727999973], [1439263296.0, 50140000.0, 25.77294274400002], [1439265024.0, 50160000.0, 25.883844096000004], [1439266752.0, 50180000.0, 25.994865831999988], [1439268480.0, 50200000.0, 26.106007999999974], [1439270208.0, 50220000.0, 26.217270648000024], [1439271936.0, 50240000.0, 26.328653824], [1439273664.0, 50260000.0, 26.440157575999976], [1439275392.0, 50280000.0, 26.551781951999985], [1439277120.0, 50300000.0, 26.663527000000002], [1439278848.0, 50320000.0, 26.775392768000003], [1439280576.0, 50340000.0, 26.887379303999992], [1439282304.0, 50360000.0, 26.999486655999974], [1439284032.0, 50380000.0, 27.111714872000007], [1439285760.0, 50400000.0, 27.224064000000013], [1439287488.0, 50420000.0, 27.33653408799998], [1439289216.0, 50440000.0, 27.449125183999968], [1439290944.0, 50460000.0, 27.56183733600001], [1439292672.0, 50480000.0, 27.674670591999984], [1439294400.0, 50500000.0, 27.78762499999999], [1439296128.0, 50520000.0, 27.900700607999994], [1439297856.0, 50540000.0, 28.013897463999996], [1439299584.0, 50560000.0, 28.127215616], [1439301312.0, 50580000.0, 28.240655111999985], [1439303040.0, 50600000.0, 28.35421599999998], [1439304768.0, 50620000.0, 28.467898328000018], [1439306496.0, 50640000.0, 28.58170214400002], [1439308224.0, 50660000.0, 28.695627495999986], [1439309952.0, 50680000.0, 28.80967443199998], [1439311680.0, 50700000.0, 28.923843000000005], [1439313408.0, 50720000.0, 29.03813324800001], [1439315136.0, 50740000.0, 29.152545223999994], [1439316864.0, 50760000.0, 29.26707897599998], [1439318592.0, 50780000.0, 29.38173455200001], [1439320320.0, 50800000.0, 29.496511999999996], [1439322048.0, 50820000.0, 29.61141136799999], [1439323776.0, 50840000.0, 29.726432703999976], [1439325504.0, 50860000.0, 29.841576056000036], [1439327232.0, 50880000.0, 29.956841471999994], [1439328960.0, 50900000.0, 30.07222899999998], [1439330688.0, 50920000.0, 30.187738687999982], [1439332416.0, 50940000.0, 30.303370584000007], [1439334144.0, 50960000.0, 30.419124736], [1439335872.0, 50980000.0, 30.535001191999996], [1439337600.0, 51000000.0, 30.650999999999982], [1439339328.0, 51020000.0, 30.767121208000034], [1439341056.0, 51040000.0, 30.883364864], [1439342784.0, 51060000.0, 30.999731016], [1439344512.0, 51080000.0, 31.116219711999975], [1439346240.0, 51100000.0, 31.23283100000002], [1439347968.0, 51120000.0, 31.349564927999978], [1439349696.0, 51140000.0, 31.466421543999985], [1439351424.0, 51160000.0, 31.583400896], [1439353152.0, 51180000.0, 31.700503032], [1439354880.0, 51200000.0, 31.817728000000017], [1439356608.0, 51220000.0, 31.935075847999997], [1439358336.0, 51240000.0, 32.052546624], [1439360064.0, 51260000.0, 32.170140376000006], [1439361792.0, 51280000.0, 32.28785715199999], [1439363520.0, 51300000.0, 32.405697], [1439365248.0, 51320000.0, 32.523659967999976], [1439366976.0, 51340000.0, 32.64174610400002], 
[1439368704.0, 51360000.0, 32.759955456], [1439370432.0, 51380000.0, 32.878288072000004], [1439372160.0, 51400000.0, 32.996743999999964], [1439373888.0, 51420000.0, 33.11532328800001], [1439375616.0, 51440000.0, 33.234025984], [1439377344.0, 51460000.0, 33.35285213599998], [1439379072.0, 51480000.0, 33.471801791999965], [1439380800.0, 51500000.0, 33.59087500000004], [1439382528.0, 51520000.0, 33.71007180800002], [1439384256.0, 51540000.0, 33.82939226399999], [1439385984.0, 51560000.0, 33.94883641599999], [1439387712.0, 51580000.0, 34.068404312], [1439389440.0, 51600000.0, 34.188096000000016], [1439391168.0, 51620000.0, 34.30791152799999], [1439392896.0, 51640000.0, 34.42785094399997], [1439394624.0, 51660000.0, 34.54791429600003], [1439396352.0, 51680000.0, 34.66810163200002], [1439398080.0, 51700000.0, 34.78841299999999], [1439399808.0, 51720000.0, 34.908848447999986], [1439401536.0, 51740000.0, 35.029408024000006], [1439403264.0, 51760000.0, 35.15009177600001], [1439404992.0, 51780000.0, 35.27089975199999], [1439406720.0, 51800000.0, 35.391831999999965], [1439408448.0, 51820000.0, 35.51288856800002], [1439410176.0, 51840000.0, 35.634069503999996], [1439411904.0, 51860000.0, 35.755374856], [1439413632.0, 51880000.0, 35.87680467199999], [1439415360.0, 51900000.0, 35.99835900000002], [1439417088.0, 51920000.0, 36.12003788800001], [1439418816.0, 51940000.0, 36.241841384], [1439420544.0, 51960000.0, 36.36376953600001], [1439422272.0, 51980000.0, 36.485822392000046], [1439424000.0, 52000000.0, 36.60800000000003], [1439425728.0, 52020000.0, 36.730302408], [1439427456.0, 52040000.0, 36.852729663999966], [1439429184.0, 52060000.0, 36.975281816000034], [1439430912.0, 52080000.0, 37.097958912000024], [1439432640.0, 52100000.0, 37.220760999999996], [1439434368.0, 52120000.0, 37.34368812799998], [1439436096.0, 52140000.0, 37.466740344000016], [1439437824.0, 52160000.0, 37.589917696], [1439439552.0, 52180000.0, 37.71322023199998], [1439441280.0, 52200000.0, 37.83664799999998], [1439443008.0, 52220000.0, 37.96020104800003], [1439444736.0, 52240000.0, 38.08387942400002], [1439446464.0, 52260000.0, 38.207683176], [1439448192.0, 52280000.0, 38.33161235199998], [1439449920.0, 52300000.0, 38.45566700000002], [1439451648.0, 52320000.0, 38.57984716800003], [1439453376.0, 52340000.0, 38.70415290400001], [1439455104.0, 52360000.0, 38.82858425599997], [1439456832.0, 52380000.0, 38.953141272000025], [1439458560.0, 52400000.0, 39.07782400000002], [1439460288.0, 52420000.0, 39.20263248799998], [1439462016.0, 52440000.0, 39.327566784], [1439463744.0, 52460000.0, 39.45262693600003], [1439465472.0, 52480000.0, 39.57781299200002], [1439467200.0, 52500000.0, 39.703125], [1439468928.0, 52520000.0, 39.828563008], [1439470656.0, 52540000.0, 39.95412706399998], [1439472384.0, 52560000.0, 40.07981721600001], [1439474112.0, 52580000.0, 40.20563351200002], [1439475840.0, 52600000.0, 40.331575999999984], [1439477568.0, 52620000.0, 40.45764472799998], [1439479296.0, 52640000.0, 40.58383974400002], [1439481024.0, 52660000.0, 40.71016109600001], [1439482752.0, 52680000.0, 40.83660883199998], [1439484480.0, 52700000.0, 40.96318299999997], [1439486208.0, 52720000.0, 41.08988364800001], [1439487936.0, 52740000.0, 41.216710824], [1439489664.0, 52760000.0, 41.343664575999995], [1439491392.0, 52780000.0, 41.470744951999976], [1439493120.0, 52800000.0, 41.59795200000001], [1439494848.0, 52820000.0, 41.725285768000006], [1439496576.0, 52840000.0, 41.85274630399998], [1439498304.0, 52860000.0, 41.98033365599997], [1439500032.0, 52880000.0, 
42.10804787200003], [1439501760.0, 52900000.0, 42.235889], [1439503488.0, 52920000.0, 42.363857087999975], [1439505216.0, 52940000.0, 42.491952183999985], [1439506944.0, 52960000.0, 42.620174336000005], [1439508672.0, 52980000.0, 42.748523592], [1439510400.0, 53000000.0, 42.87699999999998], [1439512128.0, 53020000.0, 43.00560360799997], [1439513856.0, 53040000.0, 43.134334464000005], [1439515584.0, 53060000.0, 43.263192616], [1439517312.0, 53080000.0, 43.39217811199998], [1439519040.0, 53100000.0, 43.52129099999996], [1439520768.0, 53120000.0, 43.65053132800003], [1439522496.0, 53140000.0, 43.77989914400001], [1439524224.0, 53160000.0, 43.909394496000004], [1439525952.0, 53180000.0, 44.03901743199998], [1439527680.0, 53200000.0, 44.168768], [1439529408.0, 53220000.0, 44.29864624800001], [1439531136.0, 53240000.0, 44.42865222399999], [1439532864.0, 53260000.0, 44.55878597599998], [1439534592.0, 53280000.0, 44.689047552000034], [1439536320.0, 53300000.0, 44.81943699999999], [1439538048.0, 53320000.0, 44.94995436799998], [1439539776.0, 53340000.0, 45.080599703999965], [1439541504.0, 53360000.0, 45.21137305600001], [1439543232.0, 53380000.0, 45.342274472], [1439544960.0, 53400000.0, 45.473304], [1439546688.0, 53420000.0, 45.604461688], [1439548416.0, 53440000.0, 45.73574758400001], [1439550144.0, 53460000.0, 45.867161736], [1439551872.0, 53480000.0, 45.998704192000005], [1439553600.0, 53500000.0, 46.13037499999996], [1439555328.0, 53520000.0, 46.26217420800002], [1439557056.0, 53540000.0, 46.39410186400001], [1439558784.0, 53560000.0, 46.52615801599998], [1439560512.0, 53580000.0, 46.65834271199998], [1439562240.0, 53600000.0, 46.79065600000003], [1439563968.0, 53620000.0, 46.923097928000004], [1439565696.0, 53640000.0, 47.055668543999985], [1439567424.0, 53660000.0, 47.18836789599999], [1439569152.0, 53680000.0, 47.32119603200002], [1439570880.0, 53700000.0, 47.45415299999999], [1439572608.0, 53720000.0, 47.587238848], [1439574336.0, 53740000.0, 47.72045362399999], [1439576064.0, 53760000.0, 47.85379737600002], [1439577792.0, 53780000.0, 47.98727015200001], [1439579520.0, 53800000.0, 48.12087199999999], [1439581248.0, 53820000.0, 48.254602968], [1439582976.0, 53840000.0, 48.38846310400004], [1439584704.0, 53860000.0, 48.522452455999996], [1439586432.0, 53880000.0, 48.65657107199999], [1439588160.0, 53900000.0, 48.790818999999985], [1439589888.0, 53920000.0, 48.92519628800002], [1439591616.0, 53940000.0, 49.05970298400001], [1439593344.0, 53960000.0, 49.19433913599998], [1439595072.0, 53980000.0, 49.329104791999995], [1439596800.0, 54000000.0, 49.46400000000003], [1439598528.0, 54020000.0, 49.59902480800001], [1439600256.0, 54040000.0, 49.73417926399999], [1439601984.0, 54060000.0, 49.86946341599999], [1439603712.0, 54080000.0, 50.004877312000005], [1439605440.0, 54100000.0, 50.14042100000002], [1439607168.0, 54120000.0, 50.276094528], [1439608896.0, 54140000.0, 50.411897943999975], [1439610624.0, 54160000.0, 50.54783129600001], [1439612352.0, 54180000.0, 50.68389463200002], [1439614080.0, 54200000.0, 50.820088], [1439615808.0, 54220000.0, 50.95641144799998], [1439617536.0, 54240000.0, 51.092865024000034], [1439619264.0, 54260000.0, 51.229448776], [1439620992.0, 54280000.0, 51.36616275199998], [1439622720.0, 54300000.0, 51.50300699999997], [1439624448.0, 54320000.0, 51.639981568000024], [1439626176.0, 54340000.0, 51.77708650400001], [1439627904.0, 54360000.0, 51.91432185599999], [1439629632.0, 54380000.0, 52.051687671999986], [1439631360.0, 54400000.0, 52.18918400000004], [1439633088.0, 
54420000.0, 52.32681088800001], [1439634816.0, 54440000.0, 52.46456838399999], [1439636544.0, 54460000.0, 52.602456536000005], [1439638272.0, 54480000.0, 52.740475392000036], [1439640000.0, 54500000.0, 52.878625], [1439641728.0, 54520000.0, 53.016905408000014], [1439643456.0, 54540000.0, 53.15531666399998], [1439645184.0, 54560000.0, 53.29385881600004], [1439646912.0, 54580000.0, 53.43253191200003], [1439648640.0, 54600000.0, 53.57133599999999], [1439650368.0, 54620000.0, 53.710271127999974], [1439652096.0, 54640000.0, 53.84933734400005], [1439653824.0, 54660000.0, 53.988534696], [1439655552.0, 54680000.0, 54.127863231999996], [1439657280.0, 54700000.0, 54.26732299999999], [1439659008.0, 54720000.0, 54.40691404800002], [1439660736.0, 54740000.0, 54.546636424], [1439662464.0, 54760000.0, 54.68649017599999], [1439664192.0, 54780000.0, 54.82647535199996], [1439665920.0, 54800000.0, 54.966592000000034], [1439667648.0, 54820000.0, 55.106840168000005], [1439669376.0, 54840000.0, 55.24721990399999], [1439671104.0, 54860000.0, 55.387731255999995], [1439672832.0, 54880000.0, 55.52837427200002], [1439674560.0, 54900000.0, 55.669149000000004], [1439676288.0, 54920000.0, 55.81005548799999], [1439678016.0, 54940000.0, 55.951093783999994], [1439679744.0, 54960000.0, 56.09226393600002], [1439681472.0, 54980000.0, 56.233565992000024], [1439683200.0, 55000000.0, 56.375], [1439684928.0, 55020000.0, 56.516566007999984], [1439686656.0, 55040000.0, 56.65826406399998], [1439688384.0, 55060000.0, 56.80009421600002], [1439690112.0, 55080000.0, 56.94205651199999], [1439691840.0, 55100000.0, 57.08415099999999], [1439693568.0, 55120000.0, 57.226377727999974], [1439695296.0, 55140000.0, 57.36873674400002], [1439697024.0, 55160000.0, 57.511228095999996], [1439698752.0, 55180000.0, 57.653851832], [1439700480.0, 55200000.0, 57.79660799999998], [1439702208.0, 55220000.0, 57.939496648000016], [1439703936.0, 55240000.0, 58.08251782399999], [1439705664.0, 55260000.0, 58.22567157599998], [1439707392.0, 55280000.0, 58.368957951999974], [1439709120.0, 55300000.0, 58.51237700000003], [1439710848.0, 55320000.0, 58.65592876800001], [1439712576.0, 55340000.0, 58.799613303999976], [1439714304.0, 55360000.0, 58.943430655999975], [1439716032.0, 55380000.0, 59.087380872000026], [1439717760.0, 55400000.0, 59.231464], [1439719488.0, 55420000.0, 59.37568008799997], [1439721216.0, 55440000.0, 59.52002918399995], [1439722944.0, 55460000.0, 59.66451133600002], [1439724672.0, 55480000.0, 59.80912659200001], [1439726400.0, 55500000.0, 59.95387499999998], [1439728128.0, 55520000.0, 60.098756607999974], [1439729856.0, 55540000.0, 60.24377146400002], [1439731584.0, 55560000.0, 60.38891961600001], [1439733312.0, 55580000.0, 60.534201112000005], [1439735040.0, 55600000.0, 60.67961599999998], [1439736768.0, 55620000.0, 60.82516432800003], [1439738496.0, 55640000.0, 60.97084614400001], [1439740224.0, 55660000.0, 61.11666149599998], [1439741952.0, 55680000.0, 61.262610431999974], [1439743680.0, 55700000.0, 61.40869300000003], [1439745408.0, 55720000.0, 61.554909248], [1439747136.0, 55740000.0, 61.70125922399998], [1439748864.0, 55760000.0, 61.847742975999964], [1439750592.0, 55780000.0, 61.99436055200002], [1439752320.0, 55800000.0, 62.14111200000002], [1439754048.0, 55820000.0, 62.28799736799998], [1439755776.0, 55840000.0, 62.43501670399998], [1439757504.0, 55860000.0, 62.58217005600005], [1439759232.0, 55880000.0, 62.72945747200002], [1439760960.0, 55900000.0, 62.87687899999999], [1439762688.0, 55920000.0, 63.02443468799997], [1439764416.0, 
55940000.0, 63.17212458400003], [1439766144.0, 55960000.0, 63.319948736], [1439767872.0, 55980000.0, 63.467907192], [1439769600.0, 56000000.0, 63.61599999999996], [1439771328.0, 56020000.0, 63.76422720800004], [1439773056.0, 56040000.0, 63.912588864], [1439774784.0, 56060000.0, 64.06108501599996], [1439776512.0, 56080000.0, 64.20971571199996], [1439778240.0, 56100000.0, 64.35848100000003], [1439779968.0, 56120000.0, 64.507380928], [1439781696.0, 56140000.0, 64.656415544], [1439783424.0, 56160000.0, 64.80558489599997], [1439785152.0, 56180000.0, 64.95488903200001], [1439786880.0, 56200000.0, 65.10432800000001], [1439788608.0, 56220000.0, 65.25390184799997], [1439790336.0, 56240000.0, 65.40361062399998], [1439792064.0, 56260000.0, 65.55345437600002], [1439793792.0, 56280000.0, 65.703433152], [1439795520.0, 56300000.0, 65.85354699999999], [1439797248.0, 56320000.0, 66.00379596799999], [1439798976.0, 56340000.0, 66.15418010400003], [1439800704.0, 56360000.0, 66.30469945600001], [1439802432.0, 56380000.0, 66.455354072], [1439804160.0, 56400000.0, 66.60614399999996], [1439805888.0, 56420000.0, 66.75706928800003], [1439807616.0, 56440000.0, 66.90812998400003], [1439809344.0, 56460000.0, 67.059326136], [1439811072.0, 56480000.0, 67.21065779199996], [1439812800.0, 56500000.0, 67.36212500000002], [1439814528.0, 56520000.0, 67.51372780800001], [1439816256.0, 56540000.0, 67.66546626399999], [1439817984.0, 56560000.0, 67.81734041599996], [1439819712.0, 56580000.0, 67.96935031200003], [1439821440.0, 56600000.0, 68.12149600000002], [1439823168.0, 56620000.0, 68.27377752800002], [1439824896.0, 56640000.0, 68.42619494399997], [1439826624.0, 56660000.0, 68.57874829600003], [1439828352.0, 56680000.0, 68.73143763200001], [1439830080.0, 56700000.0, 68.88426299999998], [1439831808.0, 56720000.0, 69.03722444799996], [1439833536.0, 56740000.0, 69.19032202400003], [1439835264.0, 56760000.0, 69.34355577600002], [1439836992.0, 56780000.0, 69.49692575199998], [1439838720.0, 56800000.0, 69.650432], [1439840448.0, 56820000.0, 69.804074568], [1439842176.0, 56840000.0, 69.95785350400001], [1439843904.0, 56860000.0, 70.111768856], [1439845632.0, 56880000.0, 70.26582067199999], [1439847360.0, 56900000.0, 70.420009], [1439849088.0, 56920000.0, 70.57433388800001], [1439850816.0, 56940000.0, 70.728795384], [1439852544.0, 56960000.0, 70.88339353599997], [1439854272.0, 56980000.0, 71.03812839200003], [1439856000.0, 57000000.0, 71.19300000000001], [1439857728.0, 57020000.0, 71.34800840800003], [1439859456.0, 57040000.0, 71.50315366399998], [1439861184.0, 57060000.0, 71.65843581600006], [1439862912.0, 57080000.0, 71.81385491200004], [1439864640.0, 57100000.0, 71.96941100000002], [1439866368.0, 57120000.0, 72.12510412799996], [1439868096.0, 57140000.0, 72.28093434400003], [1439869824.0, 57160000.0, 72.43690169600002], [1439871552.0, 57180000.0, 72.59300623200001], [1439873280.0, 57200000.0, 72.74924799999998], [1439875008.0, 57220000.0, 72.90562704800006], [1439876736.0, 57240000.0, 73.06214342399998], [1439878464.0, 57260000.0, 73.218797176], [1439880192.0, 57280000.0, 73.375588352], [1439881920.0, 57300000.0, 73.53251700000003], [1439883648.0, 57320000.0, 73.68958316800003], [1439885376.0, 57340000.0, 73.846786904], [1439887104.0, 57360000.0, 74.00412825599997], [1439888832.0, 57380000.0, 74.16160727200004], [1439890560.0, 57400000.0, 74.31922399999999], [1439892288.0, 57420000.0, 74.47697848800001], [1439894016.0, 57440000.0, 74.63487078399999], [1439895744.0, 57460000.0, 74.79290093600002], [1439897472.0, 57480000.0, 
74.95106899200002], [1439899200.0, 57500000.0, 75.109375], [1439900928.0, 57520000.0, 75.267819008], [1439902656.0, 57540000.0, 75.42640106399998], [1439904384.0, 57560000.0, 75.58512121600003], [1439906112.0, 57580000.0, 75.74397951200001], [1439907840.0, 57600000.0, 75.902976], [1439909568.0, 57620000.0, 76.06211072799998], [1439911296.0, 57640000.0, 76.22138374400004], [1439913024.0, 57660000.0, 76.38079509599999], [1439914752.0, 57680000.0, 76.540344832], [1439916480.0, 57700000.0, 76.70003299999998], [1439918208.0, 57720000.0, 76.85985964800003], [1439919936.0, 57740000.0, 77.019824824], [1439921664.0, 57760000.0, 77.179928576], [1439923392.0, 57780000.0, 77.34017095199998], [1439925120.0, 57800000.0, 77.50055200000001], [1439926848.0, 57820000.0, 77.66107176800001], [1439928576.0, 57840000.0, 77.82173030400001], [1439930304.0, 57860000.0, 77.98252765599997], [1439932032.0, 57880000.0, 78.14346387200003], [1439933760.0, 57900000.0, 78.30453899999999], [1439935488.0, 57920000.0, 78.46575308799999], [1439937216.0, 57940000.0, 78.62710618399996], [1439938944.0, 57960000.0, 78.78859833600002], [1439940672.0, 57980000.0, 78.950229592], [1439942400.0, 58000000.0, 79.112], [1439944128.0, 58020000.0, 79.27390960799997], [1439945856.0, 58040000.0, 79.43595846400001], [1439947584.0, 58060000.0, 79.59814661600001], [1439949312.0, 58080000.0, 79.760474112], [1439951040.0, 58100000.0, 79.92294099999998], [1439952768.0, 58120000.0, 80.08554732800002], [1439954496.0, 58140000.0, 80.248293144], [1439956224.0, 58160000.0, 80.41117849599996], [1439957952.0, 58180000.0, 80.57420343199999], [1439959680.0, 58200000.0, 80.73736800000003], [1439961408.0, 58220000.0, 80.90067224800003], [1439963136.0, 58240000.0, 81.064116224], [1439964864.0, 58260000.0, 81.22769997599995], [1439966592.0, 58280000.0, 81.39142355200005], [1439968320.0, 58300000.0, 81.55528700000002], [1439970048.0, 58320000.0, 81.71929036799996], [1439971776.0, 58340000.0, 81.88343370399996], [1439973504.0, 58360000.0, 82.04771705600004], [1439975232.0, 58380000.0, 82.212140472], [1439976960.0, 58400000.0, 82.37670399999998], [1439978688.0, 58420000.0, 82.54140768799998], [1439980416.0, 58440000.0, 82.70625158400004], [1439982144.0, 58460000.0, 82.871235736], [1439983872.0, 58480000.0, 83.036360192], [1439985600.0, 58500000.0, 83.20162499999998], [1439987328.0, 58520000.0, 83.36703020800003], [1439989056.0, 58540000.0, 83.53257586400004], [1439990784.0, 58560000.0, 83.69826201599997], [1439992512.0, 58580000.0, 83.86408871199995], [1439994240.0, 58600000.0, 84.03005600000004], [1439995968.0, 58620000.0, 84.196163928], [1439997696.0, 58640000.0, 84.362412544], [1439999424.0, 58660000.0, 84.52880189599995], [1440001152.0, 58680000.0, 84.69533203200001], [1440002880.0, 58700000.0, 84.86200300000002], [1440004608.0, 58720000.0, 85.028814848], [1440006336.0, 58740000.0, 85.19576762399998], [1440008064.0, 58760000.0, 85.36286137600001], [1440009792.0, 58780000.0, 85.53009615200003], [1440011520.0, 58800000.0, 85.69747199999998], [1440013248.0, 58820000.0, 85.864988968], [1440014976.0, 58840000.0, 86.032647104], [1440016704.0, 58860000.0, 86.20044645600001], [1440018432.0, 58880000.0, 86.36838707199999], [1440020160.0, 58900000.0, 86.53646899999997], [1440021888.0, 58920000.0, 86.70469228800002], [1440023616.0, 58940000.0, 86.87305698399999], [1440025344.0, 58960000.0, 87.041563136], [1440027072.0, 58980000.0, 87.21021079199998], [1440028800.0, 59000000.0, 87.37900000000002], [1440030528.0, 59020000.0, 87.547930808], [1440032256.0, 59040000.0, 
87.71700326400001], [1440033984.0, 59060000.0, 87.88621741599998], [1440035712.0, 59080000.0, 88.05557331200002], [1440037440.0, 59100000.0, 88.225071], [1440039168.0, 59120000.0, 88.39471052799998], [1440040896.0, 59140000.0, 88.56449194399997], [1440042624.0, 59160000.0, 88.73441529600002], [1440044352.0, 59180000.0, 88.90448063200002], [1440046080.0, 59200000.0, 89.07468799999998], [1440047808.0, 59220000.0, 89.24503744799995], [1440049536.0, 59240000.0, 89.41552902400004], [1440051264.0, 59260000.0, 89.58616277600001], [1440052992.0, 59280000.0, 89.75693875200002], [1440054720.0, 59300000.0, 89.92785699999996], [1440056448.0, 59320000.0, 90.09891756800002], [1440058176.0, 59340000.0, 90.270120504], [1440059904.0, 59360000.0, 90.44146585599998], [1440061632.0, 59380000.0, 90.61295367199997], [1440063360.0, 59400000.0, 90.78458400000005], [1440065088.0, 59420000.0, 90.95635688799999], [1440066816.0, 59440000.0, 91.12827238400001], [1440068544.0, 59460000.0, 91.30033053599999], [1440070272.0, 59480000.0, 91.47253139200004], [1440072000.0, 59500000.0, 91.64487500000001], [1440073728.0, 59520000.0, 91.81736140800001], [1440075456.0, 59540000.0, 91.98999066399996], [1440077184.0, 59560000.0, 92.16276281600003], [1440078912.0, 59580000.0, 92.33567791200002], [1440080640.0, 59600000.0, 92.50873599999998], [1440082368.0, 59620000.0, 92.68193712799997], [1440084096.0, 59640000.0, 92.85528134400008], [1440085824.0, 59660000.0, 93.028768696], [1440087552.0, 59680000.0, 93.20239923199999], [1440089280.0, 59700000.0, 93.37617299999998], [1440091008.0, 59720000.0, 93.55009004800003], [1440092736.0, 59740000.0, 93.724150424], [1440094464.0, 59760000.0, 93.89835417599998], [1440096192.0, 59780000.0, 94.07270135199997], [1440097920.0, 59800000.0, 94.24719200000003], [1440099648.0, 59820000.0, 94.42182616800004], [1440101376.0, 59840000.0, 94.596603904], [1440103104.0, 59860000.0, 94.77152525599996], [1440104832.0, 59880000.0, 94.94659027200005], [1440106560.0, 59900000.0, 95.12179900000004], [1440108288.0, 59920000.0, 95.297151488], [1440110016.0, 59940000.0, 95.472647784], [1440111744.0, 59960000.0, 95.64828793600003], [1440113472.0, 59980000.0, 95.82407199200003], [1440115200.0, 60000000.0, 96.0], [1440116928.0, 60020000.0, 96.17607200799998], [1440118656.0, 60040000.0, 96.35228806399999], [1440120384.0, 60060000.0, 96.52864821600002], [1440122112.0, 60080000.0, 96.70515251199998], [1440123840.0, 60100000.0, 96.881801], [1440125568.0, 60120000.0, 97.05859372799996], [1440127296.0, 60140000.0, 97.23553074400002], [1440129024.0, 60160000.0, 97.412612096], [1440130752.0, 60180000.0, 97.58983783199999], [1440132480.0, 60200000.0, 97.76720799999997], [1440134208.0, 60220000.0, 97.94472264800004], [1440135936.0, 60240000.0, 98.12238182399999], [1440137664.0, 60260000.0, 98.30018557599998], [1440139392.0, 60280000.0, 98.47813395199996], [1440141120.0, 60300000.0, 98.65622700000002], [1440142848.0, 60320000.0, 98.83446476800002], [1440144576.0, 60340000.0, 99.01284730399998], [1440146304.0, 60360000.0, 99.19137465599997], [1440148032.0, 60380000.0, 99.37004687200003], [1440149760.0, 60400000.0, 99.548864], [1440151488.0, 60420000.0, 99.72782608799997], [1440153216.0, 60440000.0, 99.90693318399994], [1440154944.0, 60460000.0, 100.08618533600001], [1440156672.0, 60480000.0, 100.26558259200002], [1440158400.0, 60500000.0, 100.44512499999999], [1440160128.0, 60520000.0, 100.62481260799998], [1440161856.0, 60540000.0, 100.804645464], [1440163584.0, 60560000.0, 100.98462361600002], [1440165312.0, 60580000.0, 
101.16474711199996], [1440167040.0, 60600000.0, 101.34501599999999], [1440168768.0, 60620000.0, 101.525430328], [1440170496.0, 60640000.0, 101.705990144], [1440172224.0, 60660000.0, 101.88669549599999], [1440173952.0, 60680000.0, 102.06754643199997], [1440175680.0, 60700000.0, 102.24854300000001], [1440177408.0, 60720000.0, 102.42968524800003], [1440179136.0, 60740000.0, 102.61097322399999], [1440180864.0, 60760000.0, 102.79240697599998], [1440182592.0, 60780000.0, 102.97398655200001], [1440184320.0, 60800000.0, 103.15571200000002], [1440186048.0, 60820000.0, 103.33758336799998], [1440187776.0, 60840000.0, 103.51960070399996], [1440189504.0, 60860000.0, 103.70176405600003], [1440191232.0, 60880000.0, 103.884073472], [1440192960.0, 60900000.0, 104.06652899999999], [1440194688.0, 60920000.0, 104.24913068799997], [1440196416.0, 60940000.0, 104.43187858400002], [1440198144.0, 60960000.0, 104.61477273600003], [1440199872.0, 60980000.0, 104.79781319199999], [1440201600.0, 61000000.0, 104.98099999999994], [1440203328.0, 61020000.0, 105.16433320800003], [1440205056.0, 61040000.0, 105.34781286400003], [1440206784.0, 61060000.0, 105.53143901599995], [1440208512.0, 61080000.0, 105.71521171199996], [1440210240.0, 61100000.0, 105.89913100000003], [1440211968.0, 61120000.0, 106.08319692799998], [1440213696.0, 61140000.0, 106.26740954399997], [1440215424.0, 61160000.0, 106.45176889599995], [1440217152.0, 61180000.0, 106.63627503200004], [1440218880.0, 61200000.0, 106.82092800000001], [1440220608.0, 61220000.0, 107.00572784799999], [1440222336.0, 61240000.0, 107.19067462399997], [1440224064.0, 61260000.0, 107.37576837600005], [1440225792.0, 61280000.0, 107.56100915200003], [1440227520.0, 61300000.0, 107.746397], [1440229248.0, 61320000.0, 107.93193196799996], [1440230976.0, 61340000.0, 108.11761410400001], [1440232704.0, 61360000.0, 108.30344345600002], [1440234432.0, 61380000.0, 108.489420072], [1440236160.0, 61400000.0, 108.67554399999996], [1440237888.0, 61420000.0, 108.86181528800003], [1440239616.0, 61440000.0, 109.048233984], [1440241344.0, 61460000.0, 109.234800136], [1440243072.0, 61480000.0, 109.42151379199997], [1440244800.0, 61500000.0, 109.60837500000005], [1440246528.0, 61520000.0, 109.79538380800001], [1440248256.0, 61540000.0, 109.98254026400001], [1440249984.0, 61560000.0, 110.16984441599998], [1440251712.0, 61580000.0, 110.35729631200003], [1440253440.0, 61600000.0, 110.54489600000001], [1440255168.0, 61620000.0, 110.732643528], [1440256896.0, 61640000.0, 110.92053894399996], [1440258624.0, 61660000.0, 111.10858229600002], [1440260352.0, 61680000.0, 111.29677363200001], [1440262080.0, 61700000.0, 111.48511299999998], [1440263808.0, 61720000.0, 111.673600448], [1440265536.0, 61740000.0, 111.86223602400001], [1440267264.0, 61760000.0, 112.05101977600003], [1440268992.0, 61780000.0, 112.239951752], [1440270720.0, 61800000.0, 112.42903199999998], [1440272448.0, 61820000.0, 112.61826056800004], [1440274176.0, 61840000.0, 112.80763750399998], [1440275904.0, 61860000.0, 112.99716285599999], [1440277632.0, 61880000.0, 113.18683667199997], [1440279360.0, 61900000.0, 113.37665900000005], [1440281088.0, 61920000.0, 113.56662988800002], [1440282816.0, 61940000.0, 113.75674938400002], [1440284544.0, 61960000.0, 113.94701753599998], [1440286272.0, 61980000.0, 114.13743439200002], [1440288000.0, 62000000.0, 114.32800000000003], [1440289728.0, 62020000.0, 114.51871440800002], [1440291456.0, 62040000.0, 114.70957766399998], [1440293184.0, 62060000.0, 114.90058981600004], [1440294912.0, 62080000.0, 
115.09175091200004], [1440296640.0, 62100000.0, 115.28306099999996], [1440298368.0, 62120000.0, 115.47452012799998], [1440300096.0, 62140000.0, 115.66612834400007], [1440301824.0, 62160000.0, 115.85788569600003], [1440303552.0, 62180000.0, 116.049792232], [1440305280.0, 62200000.0, 116.24184799999996], [1440307008.0, 62220000.0, 116.43405304800005], [1440308736.0, 62240000.0, 116.62640742400002], [1440310464.0, 62260000.0, 116.81891117599999], [1440312192.0, 62280000.0, 117.01156435199997], [1440313920.0, 62300000.0, 117.20436700000003], [1440315648.0, 62320000.0, 117.39731916800004], [1440317376.0, 62340000.0, 117.59042090399998], [1440319104.0, 62360000.0, 117.78367225599999], [1440320832.0, 62380000.0, 117.97707327200006], [1440322560.0, 62400000.0, 118.170624], [1440324288.0, 62420000.0, 118.364324488], [1440326016.0, 62440000.0, 118.55817478399999], [1440327744.0, 62460000.0, 118.75217493600005], [1440329472.0, 62480000.0, 118.94632499200003], [1440331200.0, 62500000.0, 119.140625], [1440332928.0, 62520000.0, 119.33507500799996], [1440334656.0, 62540000.0, 119.52967506399995], [1440336384.0, 62560000.0, 119.72442521600004], [1440338112.0, 62580000.0, 119.91932551200003], [1440339840.0, 62600000.0, 120.114376], [1440341568.0, 62620000.0, 120.30957672799998], [1440343296.0, 62640000.0, 120.50492774400001], [1440345024.0, 62660000.0, 120.70042909600002], [1440346752.0, 62680000.0, 120.89608083200001], [1440348480.0, 62700000.0, 121.09188299999995], [1440350208.0, 62720000.0, 121.28783564800005], [1440351936.0, 62740000.0, 121.48393882399999], [1440353664.0, 62760000.0, 121.68019257599995], [1440355392.0, 62780000.0, 121.87659695199999], [1440357120.0, 62800000.0, 122.07315200000001], [1440358848.0, 62820000.0, 122.269857768], [1440360576.0, 62840000.0, 122.46671430399998], [1440362304.0, 62860000.0, 122.66372165599992], [1440364032.0, 62880000.0, 122.86087987200001], [1440365760.0, 62900000.0, 123.05818900000001], [1440367488.0, 62920000.0, 123.25564908799998], [1440369216.0, 62940000.0, 123.45326018399999], [1440370944.0, 62960000.0, 123.65102233600003], [1440372672.0, 62980000.0, 123.848935592], [1440374400.0, 63000000.0, 124.04699999999997], [1440376128.0, 63020000.0, 124.24521560799994], [1440377856.0, 63040000.0, 124.44358246400003], [1440379584.0, 63060000.0, 124.642100616], [1440381312.0, 63080000.0, 124.840770112], [1440383040.0, 63100000.0, 125.03959099999994], [1440384768.0, 63120000.0, 125.23856332800005], [1440386496.0, 63140000.0, 125.43768714400002], [1440388224.0, 63160000.0, 125.636962496], [1440389952.0, 63180000.0, 125.83638943199998], [1440391680.0, 63200000.0, 126.03596800000005], [1440393408.0, 63220000.0, 126.235698248], [1440395136.0, 63240000.0, 126.435580224], [1440396864.0, 63260000.0, 126.63561397599996], [1440398592.0, 63280000.0, 126.83579955200003], [1440400320.0, 63300000.0, 127.036137], [1440402048.0, 63320000.0, 127.23662636799996], [1440403776.0, 63340000.0, 127.43726770399998], [1440405504.0, 63360000.0, 127.63806105600003], [1440407232.0, 63380000.0, 127.83900647200001], [1440408960.0, 63400000.0, 128.04010399999999], [1440410688.0, 63420000.0, 128.24135368799995], [1440412416.0, 63440000.0, 128.44275558400005], [1440414144.0, 63460000.0, 128.64430973599997], [1440415872.0, 63480000.0, 128.84601619199998], [1440417600.0, 63500000.0, 129.04787499999998], [1440419328.0, 63520000.0, 129.24988620800002], [1440421056.0, 63540000.0, 129.45204986400006], [1440422784.0, 63560000.0, 129.65436601599998], [1440424512.0, 63580000.0, 129.856834712], [1440426240.0, 
63600000.0, 130.05945600000007], [1440427968.0, 63620000.0, 130.262229928], [1440429696.0, 63640000.0, 130.465156544], [1440431424.0, 63660000.0, 130.66823589599994], [1440433152.0, 63680000.0, 130.87146803200005], [1440434880.0, 63700000.0, 131.074853], [1440436608.0, 63720000.0, 131.27839084800002], [1440438336.0, 63740000.0, 131.48208162399996], [1440440064.0, 63760000.0, 131.68592537600003], [1440441792.0, 63780000.0, 131.889922152], [1440443520.0, 63800000.0, 132.094072], [1440445248.0, 63820000.0, 132.29837496799996], [1440446976.0, 63840000.0, 132.502831104], [1440448704.0, 63860000.0, 132.707440456], [1440450432.0, 63880000.0, 132.91220307199995], [1440452160.0, 63900000.0, 133.11711899999995], [1440453888.0, 63920000.0, 133.32218828800004], [1440455616.0, 63940000.0, 133.52741098400003], [1440457344.0, 63960000.0, 133.732787136], [1440459072.0, 63980000.0, 133.93831679199997], [1440460800.0, 64000000.0, 134.14400000000006], [1440462528.0, 64020000.0, 134.34983680800005], [1440464256.0, 64040000.0, 134.55582726400002], [1440465984.0, 64060000.0, 134.76197141599994], [1440467712.0, 64080000.0, 134.96826931200005], [1440469440.0, 64100000.0, 135.17472100000003], [1440471168.0, 64120000.0, 135.381326528], [1440472896.0, 64140000.0, 135.58808594399997], [1440474624.0, 64160000.0, 135.79499929600007], [1440476352.0, 64180000.0, 136.00206663200004], [1440478080.0, 64200000.0, 136.209288], [1440479808.0, 64220000.0, 136.416663448], [1440481536.0, 64240000.0, 136.62419302400002], [1440483264.0, 64260000.0, 136.83187677600003], [1440484992.0, 64280000.0, 137.039714752], [1440486720.0, 64300000.0, 137.24770699999996], [1440488448.0, 64320000.0, 137.45585356800004], [1440490176.0, 64340000.0, 137.664154504], [1440491904.0, 64360000.0, 137.87260985599997], [1440493632.0, 64380000.0, 138.08121967199997], [1440495360.0, 64400000.0, 138.28998400000006], [1440497088.0, 64420000.0, 138.498902888], [1440498816.0, 64440000.0, 138.707976384], [1440500544.0, 64460000.0, 138.91720453599996], [1440502272.0, 64480000.0, 139.12658739200006], [1440504000.0, 64500000.0, 139.33612499999998], [1440505728.0, 64520000.0, 139.54581740800003], [1440507456.0, 64540000.0, 139.755664664], [1440509184.0, 64560000.0, 139.96566681600007], [1440510912.0, 64580000.0, 140.17582391200003], [1440512640.0, 64600000.0, 140.38613600000002], [1440514368.0, 64620000.0, 140.59660312799997], [1440516096.0, 64640000.0, 140.80722534400005], [1440517824.0, 64660000.0, 141.01800269600005], [1440519552.0, 64680000.0, 141.22893523199997], [1440521280.0, 64700000.0, 141.44002299999997], [1440523008.0, 64720000.0, 141.65126604800008], [1440524736.0, 64740000.0, 141.862664424], [1440526464.0, 64760000.0, 142.07421817599996], [1440528192.0, 64780000.0, 142.285927352], [1440529920.0, 64800000.0, 142.49779200000006], [1440531648.0, 64820000.0, 142.709812168], [1440533376.0, 64840000.0, 142.921987904], [1440535104.0, 64860000.0, 143.134319256], [1440536832.0, 64880000.0, 143.34680627200007], [1440538560.0, 64900000.0, 143.55944900000003], [1440540288.0, 64920000.0, 143.77224748800003], [1440542016.0, 64940000.0, 143.98520178399997], [1440543744.0, 64960000.0, 144.19831193600004], [1440545472.0, 64980000.0, 144.41157799200002], [1440547200.0, 65000000.0, 144.625], [1440548928.0, 65020000.0, 144.83857800799998], [1440550656.0, 65040000.0, 145.05231206399998], [1440552384.0, 65060000.0, 145.266202216], [1440554112.0, 65080000.0, 145.48024851200003], [1440555840.0, 65100000.0, 145.69445100000002], [1440557568.0, 65120000.0, 145.90880972799997], 
[1440559296.0, 65140000.0, 146.12332474400003], [1440561024.0, 65160000.0, 146.33799609599998], [1440562752.0, 65180000.0, 146.55282383200003], [1440564480.0, 65200000.0, 146.767808], [1440566208.0, 65220000.0, 146.98294864800005], [1440567936.0, 65240000.0, 147.198245824], [1440569664.0, 65260000.0, 147.413699576], [1440571392.0, 65280000.0, 147.62930995199991], [1440573120.0, 65300000.0, 147.84507700000003], [1440574848.0, 65320000.0, 148.06100076799999], [1440576576.0, 65340000.0, 148.27708130399998], [1440578304.0, 65360000.0, 148.49331865599996], [1440580032.0, 65380000.0, 148.70971287200007], [1440581760.0, 65400000.0, 148.926264], [1440583488.0, 65420000.0, 149.142972088], [1440585216.0, 65440000.0, 149.35983718399996], [1440586944.0, 65460000.0, 149.57685933600004], [1440588672.0, 65480000.0, 149.79403859199996], [1440590400.0, 65500000.0, 150.011375], [1440592128.0, 65520000.0, 150.22886860799994], [1440593856.0, 65540000.0, 150.44651946400003], [1440595584.0, 65560000.0, 150.66432761599998], [1440597312.0, 65580000.0, 150.88229311199999], [1440599040.0, 65600000.0, 151.10041599999994], [1440600768.0, 65620000.0, 151.31869632800004], [1440602496.0, 65640000.0, 151.53713414399996], [1440604224.0, 65660000.0, 151.75572949600001], [1440605952.0, 65680000.0, 151.97448243199997], [1440607680.0, 65700000.0, 152.19339300000004], [1440609408.0, 65720000.0, 152.412461248], [1440611136.0, 65740000.0, 152.631687224], [1440612864.0, 65760000.0, 152.85107097599996], [1440614592.0, 65780000.0, 153.07061255200006], [1440616320.0, 65800000.0, 153.290312], [1440618048.0, 65820000.0, 153.510169368], [1440619776.0, 65840000.0, 153.73018470399995], [1440621504.0, 65860000.0, 153.95035805600006], [1440623232.0, 65880000.0, 154.170689472], [1440624960.0, 65900000.0, 154.39117899999997], [1440626688.0, 65920000.0, 154.61182668799992], [1440628416.0, 65940000.0, 154.83263258400007], [1440630144.0, 65960000.0, 155.05359673599997], [1440631872.0, 65980000.0, 155.274719192], [1440633600.0, 66000000.0, 155.49599999999998], [1440635328.0, 66020000.0, 155.71743920800003], [1440637056.0, 66040000.0, 155.93903686400003], [1440638784.0, 66060000.0, 156.16079301599996], [1440640512.0, 66080000.0, 156.38270771199993], [1440642240.0, 66100000.0, 156.60478100000003], [1440643968.0, 66120000.0, 156.827012928], [1440645696.0, 66140000.0, 157.049403544], [1440647424.0, 66160000.0, 157.27195289599996], [1440649152.0, 66180000.0, 157.494661032], [1440650880.0, 66200000.0, 157.71752800000004], [1440652608.0, 66220000.0, 157.94055384799998], [1440654336.0, 66240000.0, 158.163738624], [1440656064.0, 66260000.0, 158.38708237600005], [1440657792.0, 66280000.0, 158.610585152], [1440659520.0, 66300000.0, 158.83424699999998], [1440661248.0, 66320000.0, 159.05806796799993], [1440662976.0, 66340000.0, 159.282048104], [1440664704.0, 66360000.0, 159.506187456], [1440666432.0, 66380000.0, 159.73048607199996], [1440668160.0, 66400000.0, 159.95494399999998], [1440669888.0, 66420000.0, 160.17956128800003], [1440671616.0, 66440000.0, 160.404337984], [1440673344.0, 66460000.0, 160.62927413600002], [1440675072.0, 66480000.0, 160.854369792], [1440676800.0, 66500000.0, 161.07962500000002], [1440678528.0, 66520000.0, 161.30503980800003], [1440680256.0, 66540000.0, 161.530614264], [1440681984.0, 66560000.0, 161.75634841599998], [1440683712.0, 66580000.0, 161.98224231200007], [1440685440.0, 66600000.0, 162.20829600000002], [1440687168.0, 66620000.0, 162.43450952799998], [1440688896.0, 66640000.0, 162.66088294399995], [1440690624.0, 66660000.0, 
162.88741629600008], [1440692352.0, 66680000.0, 163.114109632], [1440694080.0, 66700000.0, 163.34096300000002], [1440695808.0, 66720000.0, 163.56797644799997], [1440697536.0, 66740000.0, 163.79515002400007], [1440699264.0, 66760000.0, 164.02248377599997], [1440700992.0, 66780000.0, 164.249977752], [1440702720.0, 66800000.0, 164.477632], [1440704448.0, 66820000.0, 164.70544656800004], [1440706176.0, 66840000.0, 164.93342150400002], [1440707904.0, 66860000.0, 165.16155685599998], [1440709632.0, 66880000.0, 165.38985267199996], [1440711360.0, 66900000.0, 165.618309], [1440713088.0, 66920000.0, 165.84692588800002], [1440714816.0, 66940000.0, 166.075703384], [1440716544.0, 66960000.0, 166.304641536], [1440718272.0, 66980000.0, 166.53374039200006], [1440720000.0, 67000000.0, 166.76300000000003], [1440721728.0, 67020000.0, 166.99242040800002], [1440723456.0, 67040000.0, 167.222001664], [1440725184.0, 67060000.0, 167.45174381600003], [1440726912.0, 67080000.0, 167.68164691200005], [1440728640.0, 67100000.0, 167.91171099999997], [1440730368.0, 67120000.0, 168.14193612799994], [1440732096.0, 67140000.0, 168.37232234400008], [1440733824.0, 67160000.0, 168.60286969600003], [1440735552.0, 67180000.0, 168.83357823199998], [1440737280.0, 67200000.0, 169.06444799999994], [1440739008.0, 67220000.0, 169.29547904800006], [1440740736.0, 67240000.0, 169.52667142399997], [1440742464.0, 67260000.0, 169.75802517599996], [1440744192.0, 67280000.0, 169.98954035199995], [1440745920.0, 67300000.0, 170.22121700000002], [1440747648.0, 67320000.0, 170.45305516800002], [1440749376.0, 67340000.0, 170.68505490399997], [1440751104.0, 67360000.0, 170.91721625599993], [1440752832.0, 67380000.0, 171.14953927200006], [1440754560.0, 67400000.0, 171.38202400000006], [1440756288.0, 67420000.0, 171.61467048799997], [1440758016.0, 67440000.0, 171.84747878399997], [1440759744.0, 67460000.0, 172.08044893600004], [1440761472.0, 67480000.0, 172.313580992], [1440763200.0, 67500000.0, 172.546875], [1440764928.0, 67520000.0, 172.780331008], [1440766656.0, 67540000.0, 173.01394906399997], [1440768384.0, 67560000.0, 173.24772921600004], [1440770112.0, 67580000.0, 173.48167151200002], [1440771840.0, 67600000.0, 173.71577599999995], [1440773568.0, 67620000.0, 173.95004272799994], [1440775296.0, 67640000.0, 174.18447174400004], [1440777024.0, 67660000.0, 174.419063096], [1440778752.0, 67680000.0, 174.653816832], [1440780480.0, 67700000.0, 174.88873299999995], [1440782208.0, 67720000.0, 175.12381164800001], [1440783936.0, 67740000.0, 175.35905282400003], [1440785664.0, 67760000.0, 175.59445657599997], [1440787392.0, 67780000.0, 175.83002295199998], [1440789120.0, 67800000.0, 176.06575200000006], [1440790848.0, 67820000.0, 176.30164376800002], [1440792576.0, 67840000.0, 176.53769830399995], [1440794304.0, 67860000.0, 176.77391565599996], [1440796032.0, 67880000.0, 177.01029587200003], [1440797760.0, 67900000.0, 177.24683900000002], [1440799488.0, 67920000.0, 177.48354508799994], [1440801216.0, 67940000.0, 177.72041418399994], [1440802944.0, 67960000.0, 177.95744633600003], [1440804672.0, 67980000.0, 178.194641592], [1440806400.0, 68000000.0, 178.43199999999996], [1440808128.0, 68020000.0, 178.66952160799994], [1440809856.0, 68040000.0, 178.90720646400004], [1440811584.0, 68060000.0, 179.14505461599998], [1440813312.0, 68080000.0, 179.38306611199997], [1440815040.0, 68100000.0, 179.62124099999994], [1440816768.0, 68120000.0, 179.85957932800005], [1440818496.0, 68140000.0, 180.09808114400002], [1440820224.0, 68160000.0, 180.336746496], 
[1440821952.0, 68180000.0, 180.57557543199997], [1440823680.0, 68200000.0, 180.81456800000004], [1440825408.0, 68220000.0, 181.05372424800004], [1440827136.0, 68240000.0, 181.293044224], [1440828864.0, 68260000.0, 181.53252797599998], [1440830592.0, 68280000.0, 181.77217555200008], [1440832320.0, 68300000.0, 182.011987], [1440834048.0, 68320000.0, 182.25196236799997], [1440835776.0, 68340000.0, 182.49210170399994], [1440837504.0, 68360000.0, 182.73240505600003], [1440839232.0, 68380000.0, 182.972872472], [1440840960.0, 68400000.0, 183.21350399999994], [1440842688.0, 68420000.0, 183.45429968799996], [1440844416.0, 68440000.0, 183.69525958400004], [1440846144.0, 68460000.0, 183.93638373599998], [1440847872.0, 68480000.0, 184.17767219199996], [1440849600.0, 68500000.0, 184.41912499999995], [1440851328.0, 68520000.0, 184.66074220800004], [1440853056.0, 68540000.0, 184.90252386400002], [1440854784.0, 68560000.0, 185.144470016], [1440856512.0, 68580000.0, 185.38658071199993], [1440858240.0, 68600000.0, 185.628856], [1440859968.0, 68620000.0, 185.871295928], [1440861696.0, 68640000.0, 186.11390054399996], [1440863424.0, 68660000.0, 186.35666989599997], [1440865152.0, 68680000.0, 186.59960403200006], [1440866880.0, 68700000.0, 186.842703], [1440868608.0, 68720000.0, 187.085966848], [1440870336.0, 68740000.0, 187.32939562399994], [1440872064.0, 68760000.0, 187.57298937600004], [1440873792.0, 68780000.0, 187.816748152], [1440875520.0, 68800000.0, 188.06067199999998], [1440877248.0, 68820000.0, 188.30476096799998], [1440878976.0, 68840000.0, 188.54901510400003], [1440880704.0, 68860000.0, 188.79343445600003], [1440882432.0, 68880000.0, 189.038019072], [1440884160.0, 68900000.0, 189.282769], [1440885888.0, 68920000.0, 189.52768428800002], [1440887616.0, 68940000.0, 189.77276498400005], [1440889344.0, 68960000.0, 190.01801113599996], [1440891072.0, 68980000.0, 190.26342279199997], [1440892800.0, 69000000.0, 190.50900000000007], [1440894528.0, 69020000.0, 190.75474280800003], [1440896256.0, 69040000.0, 191.000651264], [1440897984.0, 69060000.0, 191.24672541599995], [1440899712.0, 69080000.0, 191.49296531200005], [1440901440.0, 69100000.0, 191.739371], [1440903168.0, 69120000.0, 191.98594252799995], [1440904896.0, 69140000.0, 192.23267994399995], [1440906624.0, 69160000.0, 192.47958329600004], [1440908352.0, 69180000.0, 192.72665263200003], [1440910080.0, 69200000.0, 192.97388800000002], [1440911808.0, 69220000.0, 193.22128944799994], [1440913536.0, 69240000.0, 193.46885702400004], [1440915264.0, 69260000.0, 193.716590776], [1440916992.0, 69280000.0, 193.96449075199996], [1440918720.0, 69300000.0, 194.21255699999998], [1440920448.0, 69320000.0, 194.46078956800008], [1440922176.0, 69340000.0, 194.709188504], [1440923904.0, 69360000.0, 194.95775385599998], [1440925632.0, 69380000.0, 195.20648567199999], [1440927360.0, 69400000.0, 195.45538400000004], [1440929088.0, 69420000.0, 195.70444888800003], [1440930816.0, 69440000.0, 195.953680384], [1440932544.0, 69460000.0, 196.203078536], [1440934272.0, 69480000.0, 196.45264339200006], [1440936000.0, 69500000.0, 196.70237500000002], [1440937728.0, 69520000.0, 196.95227340799997], [1440939456.0, 69540000.0, 197.20233866399997], [1440941184.0, 69560000.0, 197.45257081600005], [1440942912.0, 69580000.0, 197.70296991200004], [1440944640.0, 69600000.0, 197.95353600000004], [1440946368.0, 69620000.0, 198.20426912799996], [1440948096.0, 69640000.0, 198.45516934400004], [1440949824.0, 69660000.0, 198.70623669600002], [1440951552.0, 69680000.0, 198.957471232], 
[1440953280.0, 69700000.0, 199.20887299999995], [1440955008.0, 69720000.0, 199.46044204800006], [1440956736.0, 69740000.0, 199.712178424], [1440958464.0, 69760000.0, 199.96408217599995], [1440960192.0, 69780000.0, 200.21615335199994], [1440961920.0, 69800000.0, 200.46839200000005], [1440963648.0, 69820000.0, 200.72079816800002], [1440965376.0, 69840000.0, 200.97337190399998], [1440967104.0, 69860000.0, 201.226113256], [1440968832.0, 69880000.0, 201.47902227200004], [1440970560.0, 69900000.0, 201.732099], [1440972288.0, 69920000.0, 201.98534348799998], [1440974016.0, 69940000.0, 202.23875578399998], [1440975744.0, 69960000.0, 202.49233593600002], [1440977472.0, 69980000.0, 202.74608399200005], [1440979200.0, 70000000.0, 203.0], [1440980928.0, 70020000.0, 203.25408400799998], [1440982656.0, 70040000.0, 203.50833606399993], [1440984384.0, 70060000.0, 203.762756216], [1440986112.0, 70080000.0, 204.017344512], [1440987840.0, 70100000.0, 204.27210099999996], [1440989568.0, 70120000.0, 204.52702572799993], [1440991296.0, 70140000.0, 204.78211874400003], [1440993024.0, 70160000.0, 205.037380096], [1440994752.0, 70180000.0, 205.29280983199996], [1440996480.0, 70200000.0, 205.54840799999994], [1440998208.0, 70220000.0, 205.80417464800001], [1440999936.0, 70240000.0, 206.06010982400002], [1441001664.0, 70260000.0, 206.316213576], [1441003392.0, 70280000.0, 206.5724859519999], [1441005120.0, 70300000.0, 206.82892700000005], [1441006848.0, 70320000.0, 207.085536768], [1441008576.0, 70340000.0, 207.342315304], [1441010304.0, 70360000.0, 207.59926265599992], [1441012032.0, 70380000.0, 207.85637887200005], [1441013760.0, 70400000.0, 208.11366399999997], [1441015488.0, 70420000.0, 208.37111808799997], [1441017216.0, 70440000.0, 208.62874118399992], [1441018944.0, 70460000.0, 208.886533336], [1441020672.0, 70480000.0, 209.14449459199997], [1441022400.0, 70500000.0, 209.402625], [1441024128.0, 70520000.0, 209.66092460799993], [1441025856.0, 70540000.0, 209.91939346400002], [1441027584.0, 70560000.0, 210.178031616], [1441029312.0, 70580000.0, 210.43683911199994], [1441031040.0, 70600000.0, 210.69581599999998], [1441032768.0, 70620000.0, 210.95496232800002], [1441034496.0, 70640000.0, 211.21427814399996], [1441036224.0, 70660000.0, 211.473763496], [1441037952.0, 70680000.0, 211.73341843199998], [1441039680.0, 70700000.0, 211.99324300000004], [1441041408.0, 70720000.0, 212.253237248], [1441043136.0, 70740000.0, 212.51340122399998], [1441044864.0, 70760000.0, 212.77373497599996], [1441046592.0, 70780000.0, 213.0342385520001], [1441048320.0, 70800000.0, 213.29491200000004], [1441050048.0, 70820000.0, 213.55575536799995], [1441051776.0, 70840000.0, 213.8167687039999], [1441053504.0, 70860000.0, 214.07795205600004], [1441055232.0, 70880000.0, 214.33930547200004], [1441056960.0, 70900000.0, 214.60082899999998], [1441058688.0, 70920000.0, 214.86252268799993], [1441060416.0, 70940000.0, 215.12438658400004], [1441062144.0, 70960000.0, 215.386420736], [1441063872.0, 70980000.0, 215.64862519199997], [1441065600.0, 71000000.0, 215.91099999999994], [1441067328.0, 71020000.0, 216.173545208], [1441069056.0, 71040000.0, 216.436260864], [1441070784.0, 71060000.0, 216.69914701599998], [1441072512.0, 71080000.0, 216.96220371199993], [1441074240.0, 71100000.0, 217.22543100000004], [1441075968.0, 71120000.0, 217.48882892800003], [1441077696.0, 71140000.0, 217.752397544], [1441079424.0, 71160000.0, 218.01613689599992], [1441081152.0, 71180000.0, 218.28004703200003], [1441082880.0, 71200000.0, 218.54412800000003], [1441084608.0, 
71220000.0, 218.80837984800002], [1441086336.0, 71240000.0, 219.07280262399993], [1441088064.0, 71260000.0, 219.33739637600004], [1441089792.0, 71280000.0, 219.602161152], [1441091520.0, 71300000.0, 219.86709699999997], [1441093248.0, 71320000.0, 220.132203968], [1441094976.0, 71340000.0, 220.397482104], [1441096704.0, 71360000.0, 220.66293145600005], [1441098432.0, 71380000.0, 220.928552072], [1441100160.0, 71400000.0, 221.19434399999997], [1441101888.0, 71420000.0, 221.46030728800005], [1441103616.0, 71440000.0, 221.72644198400002], [1441105344.0, 71460000.0, 221.99274813599996], [1441107072.0, 71480000.0, 222.259225792], [1441108800.0, 71500000.0, 222.52587500000004], [1441110528.0, 71520000.0, 222.79269580800005], [1441112256.0, 71540000.0, 223.059688264], [1441113984.0, 71560000.0, 223.32685241599995], [1441115712.0, 71580000.0, 223.59418831200006], [1441117440.0, 71600000.0, 223.861696], [1441119168.0, 71620000.0, 224.12937552799997], [1441120896.0, 71640000.0, 224.39722694399993], [1441122624.0, 71660000.0, 224.66525029600007], [1441124352.0, 71680000.0, 224.93344563200003], [1441126080.0, 71700000.0, 225.201813], [1441127808.0, 71720000.0, 225.47035244799997], [1441129536.0, 71740000.0, 225.73906402400007], [1441131264.0, 71760000.0, 226.007947776], [1441132992.0, 71780000.0, 226.27700375199998], [1441134720.0, 71800000.0, 226.546232], [1441136448.0, 71820000.0, 226.81563256800004], [1441138176.0, 71840000.0, 227.085205504], [1441139904.0, 71860000.0, 227.354950856], [1441141632.0, 71880000.0, 227.62486867199993], [1441143360.0, 71900000.0, 227.89495900000009], [1441145088.0, 71920000.0, 228.16522188800005], [1441146816.0, 71940000.0, 228.43565738400002], [1441148544.0, 71960000.0, 228.70626553599996], [1441150272.0, 71980000.0, 228.97704639200006], [1441152000.0, 72000000.0, 229.24800000000005], [1441153728.0, 72020000.0, 229.519126408], [1441155456.0, 72040000.0, 229.790425664], [1441157184.0, 72060000.0, 230.06189781600006], [1441158912.0, 72080000.0, 230.333542912], [1441160640.0, 72100000.0, 230.60536100000002], [1441162368.0, 72120000.0, 230.87735212799998], [1441164096.0, 72140000.0, 231.14951634400003], [1441165824.0, 72160000.0, 231.42185369600003], [1441167552.0, 72180000.0, 231.694364232], [1441169280.0, 72200000.0, 231.96704799999995], [1441171008.0, 72220000.0, 232.23990504800008], [1441172736.0, 72240000.0, 232.51293542400003], [1441174464.0, 72260000.0, 232.78613917599998], [1441176192.0, 72280000.0, 233.05951635199995], [1441177920.0, 72300000.0, 233.33306700000003], [1441179648.0, 72320000.0, 233.606791168], [1441181376.0, 72340000.0, 233.880688904], [1441183104.0, 72360000.0, 234.154760256], [1441184832.0, 72380000.0, 234.42900527200007], [1441186560.0, 72400000.0, 234.70342399999998], [1441188288.0, 72420000.0, 234.978016488], [1441190016.0, 72440000.0, 235.25278278399998], [1441191744.0, 72460000.0, 235.5277229360001], [1441193472.0, 72480000.0, 235.802836992], [1441195200.0, 72500000.0, 236.078125], [1441196928.0, 72520000.0, 236.35358700799995], [1441198656.0, 72540000.0, 236.62922306399997], [1441200384.0, 72560000.0, 236.90503321600005], [1441202112.0, 72580000.0, 237.181017512], [1441203840.0, 72600000.0, 237.457176], [1441205568.0, 72620000.0, 237.73350872799992], [1441207296.0, 72640000.0, 238.01001574400007], [1441209024.0, 72660000.0, 238.286697096], [1441210752.0, 72680000.0, 238.56355283199994], [1441212480.0, 72700000.0, 238.84058299999992], [1441214208.0, 72720000.0, 239.11778764800005], [1441215936.0, 72740000.0, 239.39516682400003], [1441217664.0, 
72760000.0, 239.67272057599996], [1441219392.0, 72780000.0, 239.95044895199993], [1441221120.0, 72800000.0, 240.22835200000006], [1441222848.0, 72820000.0, 240.50642976800003], [1441224576.0, 72840000.0, 240.78468230399994], [1441226304.0, 72860000.0, 241.06310965599997], [1441228032.0, 72880000.0, 241.34171187200002], [1441229760.0, 72900000.0, 241.62048900000002], [1441231488.0, 72920000.0, 241.89944108799997], [1441233216.0, 72940000.0, 242.17856818399991], [1441234944.0, 72960000.0, 242.45787033600004], [1441236672.0, 72980000.0, 242.73734759200002], [1441238400.0, 73000000.0, 243.017], [1441240128.0, 73020000.0, 243.29682760799992], [1441241856.0, 73040000.0, 243.57683046400004], [1441243584.0, 73060000.0, 243.85700861600003], [1441245312.0, 73080000.0, 244.13736211199998], [1441247040.0, 73100000.0, 244.41789099999994], [1441248768.0, 73120000.0, 244.698595328], [1441250496.0, 73140000.0, 244.97947514400002], [1441252224.0, 73160000.0, 245.26053049599994], [1441253952.0, 73180000.0, 245.54176143199996], [1441255680.0, 73200000.0, 245.82316800000004], [1441257408.0, 73220000.0, 246.10475024800002], [1441259136.0, 73240000.0, 246.38650822399998], [1441260864.0, 73260000.0, 246.66844197599994], [1441262592.0, 73280000.0, 246.95055155200004], [1441264320.0, 73300000.0, 247.23283700000005], [1441266048.0, 73320000.0, 247.515298368], [1441267776.0, 73340000.0, 247.79793570399994], [1441269504.0, 73360000.0, 248.0807490560001], [1441271232.0, 73380000.0, 248.36373847200002], [1441272960.0, 73400000.0, 248.64690399999995], [1441274688.0, 73420000.0, 248.93024568799993], [1441276416.0, 73440000.0, 249.21376358400005], [1441278144.0, 73460000.0, 249.497457736], [1441279872.0, 73480000.0, 249.78132819199996], [1441281600.0, 73500000.0, 250.06537499999996], [1441283328.0, 73520000.0, 250.34959820800003], [1441285056.0, 73540000.0, 250.633997864], [1441286784.0, 73560000.0, 250.91857401599998], [1441288512.0, 73580000.0, 251.20332671199995], [1441290240.0, 73600000.0, 251.488256], [1441291968.0, 73620000.0, 251.77336192799999], [1441293696.0, 73640000.0, 252.05864454399997], [1441295424.0, 73660000.0, 252.34410389599992], [1441297152.0, 73680000.0, 252.62974003200003], [1441298880.0, 73700000.0, 252.91555300000002], [1441300608.0, 73720000.0, 253.20154284799997], [1441302336.0, 73740000.0, 253.48770962399996], [1441304064.0, 73760000.0, 253.774053376], [1441305792.0, 73780000.0, 254.06057415200002], [1441307520.0, 73800000.0, 254.347272], [1441309248.0, 73820000.0, 254.63414696799992], [1441310976.0, 73840000.0, 254.92119910400004], [1441312704.0, 73860000.0, 255.208428456], [1441314432.0, 73880000.0, 255.49583507199998], [1441316160.0, 73900000.0, 255.783419], [1441317888.0, 73920000.0, 256.07118028800005], [1441319616.0, 73940000.0, 256.359118984], [1441321344.0, 73960000.0, 256.64723513599995], [1441323072.0, 73980000.0, 256.93552879199996], [1441324800.0, 74000000.0, 257.22400000000005], [1441326528.0, 74020000.0, 257.512648808], [1441328256.0, 74040000.0, 257.801475264], [1441329984.0, 74060000.0, 258.090479416], [1441331712.0, 74080000.0, 258.37966131200005], [1441333440.0, 74100000.0, 258.66902100000004], [1441335168.0, 74120000.0, 258.958558528], [1441336896.0, 74140000.0, 259.24827394399995], [1441338624.0, 74160000.0, 259.5381672960001], [1441340352.0, 74180000.0, 259.828238632], [1441342080.0, 74200000.0, 260.11848799999996], [1441343808.0, 74220000.0, 260.4089154479999], [1441345536.0, 74240000.0, 260.69952102400003], [1441347264.0, 74260000.0, 260.990304776], [1441348992.0, 
74280000.0, 261.281266752], [1441350720.0, 74300000.0, 261.572407], [1441352448.0, 74320000.0, 261.86372556800006], [1441354176.0, 74340000.0, 262.155222504], [1441355904.0, 74360000.0, 262.44689785599996], [1441357632.0, 74380000.0, 262.7387516719999], [1441359360.0, 74400000.0, 263.03078400000004], [1441361088.0, 74420000.0, 263.32299488800004], [1441362816.0, 74440000.0, 263.61538438400004], [1441364544.0, 74460000.0, 263.9079525359999], [1441366272.0, 74480000.0, 264.20069939200005], [1441368000.0, 74500000.0, 264.49362500000007], [1441369728.0, 74520000.0, 264.7867294079999], [1441371456.0, 74540000.0, 265.080012664], [1441373184.0, 74560000.0, 265.37347481600005], [1441374912.0, 74580000.0, 265.6671159120001], [1441376640.0, 74600000.0, 265.960936], [1441378368.0, 74620000.0, 266.25493512799994], [1441380096.0, 74640000.0, 266.54911334400003], [1441381824.0, 74660000.0, 266.84347069600005], [1441383552.0, 74680000.0, 267.13800723199995], [1441385280.0, 74700000.0, 267.432723], [1441387008.0, 74720000.0, 267.72761804800007], [1441388736.0, 74740000.0, 268.022692424], [1441390464.0, 74760000.0, 268.31794617599996], [1441392192.0, 74780000.0, 268.6133793519999], [1441393920.0, 74800000.0, 268.90899200000007], [1441395648.0, 74820000.0, 269.20478416800006], [1441397376.0, 74840000.0, 269.50075590399996], [1441399104.0, 74860000.0, 269.79690725599994], [1441400832.0, 74880000.0, 270.093238272], [1441402560.0, 74900000.0, 270.38974900000005], [1441404288.0, 74920000.0, 270.686439488], [1441406016.0, 74940000.0, 270.98330978399997], [1441407744.0, 74960000.0, 271.2803599360001], [1441409472.0, 74980000.0, 271.57758999200007], [1441411200.0, 75000000.0, 271.875], [1441412928.0, 75020000.0, 272.17259000799993], [1441414656.0, 75040000.0, 272.4703600639999], [1441416384.0, 75060000.0, 272.76831021600003], [1441418112.0, 75080000.0, 273.066440512], [1441419840.0, 75100000.0, 273.36475099999996], [1441421568.0, 75120000.0, 273.6632417279999], [1441423296.0, 75140000.0, 273.9619127440001], [1441425024.0, 75160000.0, 274.260764096], [1441426752.0, 75180000.0, 274.55979583199996], [1441428480.0, 75200000.0, 274.85900799999996], [1441430208.0, 75220000.0, 275.15840064800005], [1441431936.0, 75240000.0, 275.4579738240001], [1441433664.0, 75260000.0, 275.757727576], [1441435392.0, 75280000.0, 276.0576619519999], [1441437120.0, 75300000.0, 276.35777700000006], [1441438848.0, 75320000.0, 276.658072768], [1441440576.0, 75340000.0, 276.958549304], [1441442304.0, 75360000.0, 277.25920665599995], [1441444032.0, 75380000.0, 277.56004487200005], [1441445760.0, 75400000.0, 277.861064], [1441447488.0, 75420000.0, 278.1622640879999], [1441449216.0, 75440000.0, 278.463645184], [1441450944.0, 75460000.0, 278.76520733600006], [1441452672.0, 75480000.0, 279.066950592], [1441454400.0, 75500000.0, 279.36887499999995], [1441456128.0, 75520000.0, 279.6709806079999], [1441457856.0, 75540000.0, 279.97326746400006], [1441459584.0, 75560000.0, 280.275735616], [1441461312.0, 75580000.0, 280.578385112], [1441463040.0, 75600000.0, 280.881216], [1441464768.0, 75620000.0, 281.1842283280001], [1441466496.0, 75640000.0, 281.487422144], [1441468224.0, 75660000.0, 281.790797496], [1441469952.0, 75680000.0, 282.09435443199993], [1441471680.0, 75700000.0, 282.398093], [1441473408.0, 75720000.0, 282.702013248], [1441475136.0, 75740000.0, 283.00611522400004], [1441476864.0, 75760000.0, 283.31039897599993], [1441478592.0, 75780000.0, 283.61486455200003], [1441480320.0, 75800000.0, 283.91951200000005], [1441482048.0, 75820000.0, 
284.22434136799995], [1441483776.0, 75840000.0, 284.52935270399996], [1441485504.0, 75860000.0, 284.834546056], [1441487232.0, 75880000.0, 285.139921472], [1441488960.0, 75900000.0, 285.445479], [1441490688.0, 75920000.0, 285.7512186879999], [1441492416.0, 75940000.0, 286.0571405840001], [1441494144.0, 75960000.0, 286.363244736], [1441495872.0, 75980000.0, 286.6695311919999], [1441497600.0, 76000000.0, 286.97599999999994], [1441499328.0, 76020000.0, 287.282651208], [1441501056.0, 76040000.0, 287.58948486400004], [1441502784.0, 76060000.0, 287.896501016], [1441504512.0, 76080000.0, 288.2036997119999], [1441506240.0, 76100000.0, 288.511081], [1441507968.0, 76120000.0, 288.81864492799997], [1441509696.0, 76140000.0, 289.12639154399994], [1441511424.0, 76160000.0, 289.4343208959999], [1441513152.0, 76180000.0, 289.74243303200006], [1441514880.0, 76200000.0, 290.05072800000005], [1441516608.0, 76220000.0, 290.359205848], [1441518336.0, 76240000.0, 290.667866624], [1441520064.0, 76260000.0, 290.976710376], [1441521792.0, 76280000.0, 291.285737152], [1441523520.0, 76300000.0, 291.59494699999993], [1441525248.0, 76320000.0, 291.90433996799993], [1441526976.0, 76340000.0, 292.213916104], [1441528704.0, 76360000.0, 292.5236754560001], [1441530432.0, 76380000.0, 292.83361807200004], [1441532160.0, 76400000.0, 293.14374399999997], [1441533888.0, 76420000.0, 293.45405328800007], [1441535616.0, 76440000.0, 293.76454598400005], [1441537344.0, 76460000.0, 294.075222136], [1441539072.0, 76480000.0, 294.38608179199997], [1441540800.0, 76500000.0, 294.6971250000001], [1441542528.0, 76520000.0, 295.00835180800004], [1441544256.0, 76540000.0, 295.31976226399996], [1441545984.0, 76560000.0, 295.63135641599996], [1441547712.0, 76580000.0, 295.9431343120001], [1441549440.0, 76600000.0, 296.25509600000004], [1441551168.0, 76620000.0, 296.56724152799995], [1441552896.0, 76640000.0, 296.87957094399997], [1441554624.0, 76660000.0, 297.1920842960001], [1441556352.0, 76680000.0, 297.504781632], [1441558080.0, 76700000.0, 297.8176629999999], [1441559808.0, 76720000.0, 298.13072844799996], [1441561536.0, 76740000.0, 298.44397802400005], [1441563264.0, 76760000.0, 298.757411776], [1441564992.0, 76780000.0, 299.071029752], [1441566720.0, 76800000.0, 299.38483199999996], [1441568448.0, 76820000.0, 299.6988185680001], [1441570176.0, 76840000.0, 300.012989504], [1441571904.0, 76860000.0, 300.32734485599997], [1441573632.0, 76880000.0, 300.64188467199995], [1441575360.0, 76900000.0, 300.95660900000007], [1441577088.0, 76920000.0, 301.27151788800006], [1441578816.0, 76940000.0, 301.586611384], [1441580544.0, 76960000.0, 301.901889536], [1441582272.0, 76980000.0, 302.21735239200007], [1441584000.0, 77000000.0, 302.5330000000001], [1441585728.0, 77020000.0, 302.848832408], [1441587456.0, 77040000.0, 303.1648496639999], [1441589184.0, 77060000.0, 303.48105181600005], [1441590912.0, 77080000.0, 303.7974389120001], [1441592640.0, 77100000.0, 304.114011], [1441594368.0, 77120000.0, 304.43076812799995], [1441596096.0, 77140000.0, 304.7477103440001], [1441597824.0, 77160000.0, 305.06483769600004], [1441599552.0, 77180000.0, 305.38215023199996], [1441601280.0, 77200000.0, 305.6996479999999], [1441603008.0, 77220000.0, 306.0173310480001], [1441604736.0, 77240000.0, 306.335199424], [1441606464.0, 77260000.0, 306.653253176], [1441608192.0, 77280000.0, 306.971492352], [1441609920.0, 77300000.0, 307.28991700000006], [1441611648.0, 77320000.0, 307.608527168], [1441613376.0, 77340000.0, 307.927322904], [1441615104.0, 77360000.0, 
308.2463042559999], [1441616832.0, 77380000.0, 308.5654712720001], [1441618560.0, 77400000.0, 308.88482400000004], [1441620288.0, 77420000.0, 309.204362488], [1441622016.0, 77440000.0, 309.52408678399996], [1441623744.0, 77460000.0, 309.84399693600005], [1441625472.0, 77480000.0, 310.16409299200006], [1441627200.0, 77500000.0, 310.484375], [1441628928.0, 77520000.0, 310.8048430079999], [1441630656.0, 77540000.0, 311.12549706399994], [1441632384.0, 77560000.0, 311.4463372160001], [1441634112.0, 77580000.0, 311.767363512], [1441635840.0, 77600000.0, 312.088576], [1441637568.0, 77620000.0, 312.40997472799995], [1441639296.0, 77640000.0, 312.73155974400004], [1441641024.0, 77660000.0, 313.05333109599997], [1441642752.0, 77680000.0, 313.375288832], [1441644480.0, 77700000.0, 313.69743299999993], [1441646208.0, 77720000.0, 314.01976364800004], [1441647936.0, 77740000.0, 314.342280824], [1441649664.0, 77760000.0, 314.664984576], [1441651392.0, 77780000.0, 314.9878749519999], [1441653120.0, 77800000.0, 315.31095200000004], [1441654848.0, 77820000.0, 315.63421576800005], [1441656576.0, 77840000.0, 315.95766630399993], [1441658304.0, 77860000.0, 316.281303656], [1441660032.0, 77880000.0, 316.6051278720001], [1441661760.0, 77900000.0, 316.929139], [1441663488.0, 77920000.0, 317.25333708799997], [1441665216.0, 77940000.0, 317.5777221839999], [1441666944.0, 77960000.0, 317.902294336], [1441668672.0, 77980000.0, 318.22705359200006], [1441670400.0, 78000000.0, 318.55199999999996], [1441672128.0, 78020000.0, 318.8771336079999], [1441673856.0, 78040000.0, 319.20245446399997], [1441675584.0, 78060000.0, 319.527962616], [1441677312.0, 78080000.0, 319.85365811199995], [1441679040.0, 78100000.0, 320.1795409999999], [1441680768.0, 78120000.0, 320.50561132800004], [1441682496.0, 78140000.0, 320.83186914400005], [1441684224.0, 78160000.0, 321.158314496], [1441685952.0, 78180000.0, 321.4849474319999], [1441687680.0, 78200000.0, 321.81176800000003], [1441689408.0, 78220000.0, 322.138776248], [1441691136.0, 78240000.0, 322.465972224], [1441692864.0, 78260000.0, 322.79335597599993], [1441694592.0, 78280000.0, 323.12092755200007], [1441696320.0, 78300000.0, 323.44868700000006], [1441698048.0, 78320000.0, 323.776634368], [1441699776.0, 78340000.0, 324.1047697039999], [1441701504.0, 78360000.0, 324.4330930560001], [1441703232.0, 78380000.0, 324.761604472], [1441704960.0, 78400000.0, 325.09030399999995], [1441706688.0, 78420000.0, 325.41919168799996], [1441708416.0, 78440000.0, 325.7482675840001], [1441710144.0, 78460000.0, 326.077531736], [1441711872.0, 78480000.0, 326.406984192], [1441713600.0, 78500000.0, 326.73662499999995], [1441715328.0, 78520000.0, 327.06645420800004], [1441717056.0, 78540000.0, 327.396471864], [1441718784.0, 78560000.0, 327.726678016], [1441720512.0, 78580000.0, 328.0570727119999], [1441722240.0, 78600000.0, 328.387656], [1441723968.0, 78620000.0, 328.71842792800004], [1441725696.0, 78640000.0, 329.04938854399995], [1441727424.0, 78660000.0, 329.3805378959999], [1441729152.0, 78680000.0, 329.711876032], [1441730880.0, 78700000.0, 330.043403], [1441732608.0, 78720000.0, 330.375118848], [1441734336.0, 78740000.0, 330.70702362399993], [1441736064.0, 78760000.0, 331.03911737600004], [1441737792.0, 78780000.0, 331.37140015200004], [1441739520.0, 78800000.0, 331.70387200000005], [1441741248.0, 78820000.0, 332.03653296799996], [1441742976.0, 78840000.0, 332.36938310400006], [1441744704.0, 78860000.0, 332.702422456], [1441746432.0, 78880000.0, 333.03565107199995], [1441748160.0, 78900000.0, 
333.36906899999997], [1441749888.0, 78920000.0, 333.70267628800013], [1441751616.0, 78940000.0, 334.036472984], [1441753344.0, 78960000.0, 334.370459136], [1441755072.0, 78980000.0, 334.704634792], [1441756800.0, 79000000.0, 335.03900000000004], [1441758528.0, 79020000.0, 335.3735548080001], [1441760256.0, 79040000.0, 335.708299264], [1441761984.0, 79060000.0, 336.04323341599996], [1441763712.0, 79080000.0, 336.37835731200005], [1441765440.0, 79100000.0, 336.71367100000003], [1441767168.0, 79120000.0, 337.049174528], [1441768896.0, 79140000.0, 337.3848679439999], [1441770624.0, 79160000.0, 337.72075129600006], [1441772352.0, 79180000.0, 338.05682463200003], [1441774080.0, 79200000.0, 338.3930879999999], [1441775808.0, 79220000.0, 338.72954144799996], [1441777536.0, 79240000.0, 339.06618502400005], [1441779264.0, 79260000.0, 339.40301877600007], [1441780992.0, 79280000.0, 339.74004275199997], [1441782720.0, 79300000.0, 340.0772569999999], [1441784448.0, 79320000.0, 340.41466156800004], [1441786176.0, 79340000.0, 340.75225650400006], [1441787904.0, 79360000.0, 341.09004185599997], [1441789632.0, 79380000.0, 341.42801767199995], [1441791360.0, 79400000.0, 341.76618400000007], [1441793088.0, 79420000.0, 342.104540888], [1441794816.0, 79440000.0, 342.44308838399996], [1441796544.0, 79460000.0, 342.7818265359999], [1441798272.0, 79480000.0, 343.1207553920001], [1441800000.0, 79500000.0, 343.459875], [1441801728.0, 79520000.0, 343.79918540799997], [1441803456.0, 79540000.0, 344.138686664], [1441805184.0, 79560000.0, 344.4783788160001], [1441806912.0, 79580000.0, 344.818261912], [1441808640.0, 79600000.0, 345.158336], [1441810368.0, 79620000.0, 345.49860112799996], [1441812096.0, 79640000.0, 345.83905734400014], [1441813824.0, 79660000.0, 346.17970469600004], [1441815552.0, 79680000.0, 346.52054323199997], [1441817280.0, 79700000.0, 346.8615729999999], [1441819008.0, 79720000.0, 347.2027940480001], [1441820736.0, 79740000.0, 347.544206424], [1441822464.0, 79760000.0, 347.88581017599995], [1441824192.0, 79780000.0, 348.22760535199996], [1441825920.0, 79800000.0, 348.56959200000006], [1441827648.0, 79820000.0, 348.911770168], [1441829376.0, 79840000.0, 349.254139904], [1441831104.0, 79860000.0, 349.59670125599996], [1441832832.0, 79880000.0, 349.93945427200003], [1441834560.0, 79900000.0, 350.28239900000005], [1441836288.0, 79920000.0, 350.62553548799997], [1441838016.0, 79940000.0, 350.96886378399995], [1441839744.0, 79960000.0, 351.31238393600006], [1441841472.0, 79980000.0, 351.656095992], [1441843200.0, 80000000.0, 352.0], [1441844928.0, 80020000.0, 352.3440960080001], [1441846656.0, 80040000.0, 352.688384064], [1441848384.0, 80060000.0, 353.032864216], [1441850112.0, 80080000.0, 353.3775365119998], [1441851840.0, 80100000.0, 353.72240100000005], [1441853568.0, 80120000.0, 354.06745772800014], [1441855296.0, 80140000.0, 354.41270674399993], [1441857024.0, 80160000.0, 354.75814809599996], [1441858752.0, 80180000.0, 355.1037818320002], [1441860480.0, 80200000.0, 355.4496079999999], [1441862208.0, 80220000.0, 355.795626648], [1441863936.0, 80240000.0, 356.1418378239998], [1441865664.0, 80260000.0, 356.48824157599995], [1441867392.0, 80280000.0, 356.8348379520001], [1441869120.0, 80300000.0, 357.1816269999998], [1441870848.0, 80320000.0, 357.528608768], [1441872576.0, 80340000.0, 357.8757833040001], [1441874304.0, 80360000.0, 358.2231506559999], [1441876032.0, 80380000.0, 358.57071087200006], [1441877760.0, 80400000.0, 358.91846399999986], [1441879488.0, 80420000.0, 359.26641008799993], 
[1441881216.0, 80440000.0, 359.6145491840001], [1441882944.0, 80460000.0, 359.9628813359999], [1441884672.0, 80480000.0, 360.3114065919999], [1441886400.0, 80500000.0, 360.6601250000001], [1441888128.0, 80520000.0, 361.00903660799986], [1441889856.0, 80540000.0, 361.358141464], [1441891584.0, 80560000.0, 361.70743961599976], [1441893312.0, 80580000.0, 362.05693111200003], [1441895040.0, 80600000.0, 362.4066160000001], [1441896768.0, 80620000.0, 362.7564943279999], [1441898496.0, 80640000.0, 363.1065661440001], [1441900224.0, 80660000.0, 363.45683149600006], [1441901952.0, 80680000.0, 363.80729043199995], [1441903680.0, 80700000.0, 364.15794300000016], [1441905408.0, 80720000.0, 364.5087892479998], [1441907136.0, 80740000.0, 364.859829224], [1441908864.0, 80760000.0, 365.2110629760001], [1441910592.0, 80780000.0, 365.56249055199993], [1441912320.0, 80800000.0, 365.91411199999993], [1441914048.0, 80820000.0, 366.26592736800023], [1441915776.0, 80840000.0, 366.617936704], [1441917504.0, 80860000.0, 366.970140056], [1441919232.0, 80880000.0, 367.3225374719999], [1441920960.0, 80900000.0, 367.6751289999999], [1441922688.0, 80920000.0, 368.0279146880001], [1441924416.0, 80940000.0, 368.38089458399986], [1441926144.0, 80960000.0, 368.734068736], [1441927872.0, 80980000.0, 369.08743719200004], [1441929600.0, 81000000.0, 369.4409999999999], [1441931328.0, 81020000.0, 369.794757208], [1441933056.0, 81040000.0, 370.14870886399996], [1441934784.0, 81060000.0, 370.5028550159999], [1441936512.0, 81080000.0, 370.8571957120001], [1441938240.0, 81100000.0, 371.21173099999993], [1441939968.0, 81120000.0, 371.5664609280001], [1441941696.0, 81140000.0, 371.9213855440001], [1441943424.0, 81160000.0, 372.27650489599995], [1441945152.0, 81180000.0, 372.63181903200007], [1441946880.0, 81200000.0, 372.9873279999998], [1441948608.0, 81220000.0, 373.34303184799995], [1441950336.0, 81240000.0, 373.6989306240001], [1441952064.0, 81260000.0, 374.055024376], [1441953792.0, 81280000.0, 374.411313152], [1441955520.0, 81300000.0, 374.7677970000002], [1441957248.0, 81320000.0, 375.12447596799996], [1441958976.0, 81340000.0, 375.481350104], [1441960704.0, 81360000.0, 375.8384194559998], [1441962432.0, 81380000.0, 376.19568407199995], [1441964160.0, 81400000.0, 376.55314400000003], [1441965888.0, 81420000.0, 376.91079928799985], [1441967616.0, 81440000.0, 377.26864998400004], [1441969344.0, 81460000.0, 377.62669613600013], [1441971072.0, 81480000.0, 377.98493779200004], [1441972800.0, 81500000.0, 378.34337500000004], [1441974528.0, 81520000.0, 378.70200780799996], [1441976256.0, 81540000.0, 379.060836264], [1441977984.0, 81560000.0, 379.4198604160001], [1441979712.0, 81580000.0, 379.7790803119999], [1441981440.0, 81600000.0, 380.138496], [1441983168.0, 81620000.0, 380.4981075280002], [1441984896.0, 81640000.0, 380.85791494399996], [1441986624.0, 81660000.0, 381.21791829600016], [1441988352.0, 81680000.0, 381.5781176319998], [1441990080.0, 81700000.0, 381.93851299999994], [1441991808.0, 81720000.0, 382.29910444800015], [1441993536.0, 81740000.0, 382.6598920239999], [1441995264.0, 81760000.0, 383.020875776], [1441996992.0, 81780000.0, 383.3820557520001], [1441998720.0, 81800000.0, 383.74343199999987], [1442000448.0, 81820000.0, 384.10500456800014], [1442002176.0, 81840000.0, 384.4667735039999], [1442003904.0, 81860000.0, 384.82873885599986], [1442005632.0, 81880000.0, 385.1909006720001], [1442007360.0, 81900000.0, 385.553259], [1442009088.0, 81920000.0, 385.915813888], [1442010816.0, 81940000.0, 386.2785653840002], 
[1442012544.0, 81960000.0, 386.64151353599993], [1442014272.0, 81980000.0, 387.004658392], [1442016000.0, 82000000.0, 387.36799999999994], [1442017728.0, 82020000.0, 387.731538408], [1442019456.0, 82040000.0, 388.09527366400016], [1442021184.0, 82060000.0, 388.4592058159999], [1442022912.0, 82080000.0, 388.82333491200006], [1442024640.0, 82100000.0, 389.18766100000016], [1442026368.0, 82120000.0, 389.5521841279999], [1442028096.0, 82140000.0, 389.91690434400005], [1442029824.0, 82160000.0, 390.2818216959999], [1442031552.0, 82180000.0, 390.6469362319999], [1442033280.0, 82200000.0, 391.0122480000001], [1442035008.0, 82220000.0, 391.37775704799986], [1442036736.0, 82240000.0, 391.74346342399997], [1442038464.0, 82260000.0, 392.1093671760001], [1442040192.0, 82280000.0, 392.47546835199995], [1442041920.0, 82300000.0, 392.841767], [1442043648.0, 82320000.0, 393.2082631679999], [1442045376.0, 82340000.0, 393.574956904], [1442047104.0, 82360000.0, 393.94184825600007], [1442048832.0, 82380000.0, 394.308937272], [1442050560.0, 82400000.0, 394.676224], [1442052288.0, 82420000.0, 395.0437084880002], [1442054016.0, 82440000.0, 395.4113907839999], [1442055744.0, 82460000.0, 395.77927093600016], [1442057472.0, 82480000.0, 396.1473489919999], [1442059200.0, 82500000.0, 396.515625], [1442060928.0, 82520000.0, 396.8840990080001], [1442062656.0, 82540000.0, 397.2527710639999], [1442064384.0, 82560000.0, 397.62164121599994], [1442066112.0, 82580000.0, 397.99070951199985], [1442067840.0, 82600000.0, 398.3599759999999], [1442069568.0, 82620000.0, 398.7294407280001], [1442071296.0, 82640000.0, 399.0991037439999], [1442073024.0, 82660000.0, 399.4689650960001], [1442074752.0, 82680000.0, 399.8390248320002], [1442076480.0, 82700000.0, 400.209283], [1442078208.0, 82720000.0, 400.57973964800004], [1442079936.0, 82740000.0, 400.9503948239998], [1442081664.0, 82760000.0, 401.321248576], [1442083392.0, 82780000.0, 401.69230095200015], [1442085120.0, 82800000.0, 402.06355199999985], [1442086848.0, 82820000.0, 402.4350017680001], [1442088576.0, 82840000.0, 402.80665030400013], [1442090304.0, 82860000.0, 403.1784976559999], [1442092032.0, 82880000.0, 403.55054387200005], [1442093760.0, 82900000.0, 403.92278899999997], [1442095488.0, 82920000.0, 404.2952330879999], [1442097216.0, 82940000.0, 404.6678761840002], [1442098944.0, 82960000.0, 405.04071833599994], [1442100672.0, 82980000.0, 405.4137595919999], [1442102400.0, 83000000.0, 405.78700000000015], [1442104128.0, 83020000.0, 406.16043960800005], [1442105856.0, 83040000.0, 406.534078464], [1442107584.0, 83060000.0, 406.90791661599985], [1442109312.0, 83080000.0, 407.281954112], [1442111040.0, 83100000.0, 407.65619100000015], [1442112768.0, 83120000.0, 408.0306273279998], [1442114496.0, 83140000.0, 408.40526314400006], [1442116224.0, 83160000.0, 408.78009849600016], [1442117952.0, 83180000.0, 409.15513343199984], [1442119680.0, 83200000.0, 409.53036800000007], [1442121408.0, 83220000.0, 409.9058022479998], [1442123136.0, 83240000.0, 410.2814362239999], [1442124864.0, 83260000.0, 410.65726997600007], [1442126592.0, 83280000.0, 411.03330355199995], [1442128320.0, 83300000.0, 411.409537], [1442130048.0, 83320000.0, 411.7859703680002], [1442131776.0, 83340000.0, 412.1626037039999], [1442133504.0, 83360000.0, 412.5394370560001], [1442135232.0, 83380000.0, 412.9164704719998], [1442136960.0, 83400000.0, 413.293704], [1442138688.0, 83420000.0, 413.67113768800004], [1442140416.0, 83440000.0, 414.04877158399984], [1442142144.0, 83460000.0, 414.426605736], [1442143872.0, 
83480000.0, 414.8046401920002], [1442145600.0, 83500000.0, 415.18287499999997], [1442147328.0, 83520000.0, 415.56131020800007], [1442149056.0, 83540000.0, 415.93994586399987], [1442150784.0, 83560000.0, 416.3187820159999], [1442152512.0, 83580000.0, 416.697818712], [1442154240.0, 83600000.0, 417.0770559999999], [1442155968.0, 83620000.0, 417.456493928], [1442157696.0, 83640000.0, 417.8361325440001], [1442159424.0, 83660000.0, 418.21597189599987], [1442161152.0, 83680000.0, 418.5960120320001], [1442162880.0, 83700000.0, 418.9762529999998], [1442164608.0, 83720000.0, 419.356694848], [1442166336.0, 83740000.0, 419.73733762400013], [1442168064.0, 83760000.0, 420.1181813759998], [1442169792.0, 83780000.0, 420.49922615199995], [1442171520.0, 83800000.0, 420.88047200000017], [1442173248.0, 83820000.0, 421.2619189679999], [1442174976.0, 83840000.0, 421.64356710400006], [1442176704.0, 83860000.0, 422.02541645599985], [1442178432.0, 83880000.0, 422.4074670719999], [1442180160.0, 83900000.0, 422.78971900000016], [1442181888.0, 83920000.0, 423.172172288], [1442183616.0, 83940000.0, 423.5548269840001], [1442185344.0, 83960000.0, 423.9376831360002], [1442187072.0, 83980000.0, 424.320740792], [1442188800.0, 84000000.0, 424.70400000000006], [1442190528.0, 84020000.0, 425.08746080799983], [1442192256.0, 84040000.0, 425.471123264], [1442193984.0, 84060000.0, 425.8549874160001], [1442195712.0, 84080000.0, 426.239053312], [1442197440.0, 84100000.0, 426.623321], [1442199168.0, 84120000.0, 427.0077905280001], [1442200896.0, 84140000.0, 427.39246194399993], [1442202624.0, 84160000.0, 427.7773352960001], [1442204352.0, 84180000.0, 428.1624106319998], [1442206080.0, 84200000.0, 428.547688], [1442207808.0, 84220000.0, 428.9331674480001], [1442209536.0, 84240000.0, 429.31884902399986], [1442211264.0, 84260000.0, 429.7047327760001], [1442212992.0, 84280000.0, 430.0908187520002], [1442214720.0, 84300000.0, 430.47710699999993], [1442216448.0, 84320000.0, 430.86359756800005], [1442218176.0, 84340000.0, 431.25029050399985], [1442219904.0, 84360000.0, 431.637185856], [1442221632.0, 84380000.0, 432.02428367200014], [1442223360.0, 84400000.0, 432.41158399999995], [1442225088.0, 84420000.0, 432.79908688800003], [1442226816.0, 84440000.0, 433.1867923840002], [1442228544.0, 84460000.0, 433.5747005359999], [1442230272.0, 84480000.0, 433.96281139200005], [1442232000.0, 84500000.0, 434.3511249999999], [1442233728.0, 84520000.0, 434.739641408], [1442235456.0, 84540000.0, 435.12836066400007], [1442237184.0, 84560000.0, 435.5172828159999], [1442238912.0, 84580000.0, 435.9064079120001], [1442240640.0, 84600000.0, 436.29573600000015], [1442242368.0, 84620000.0, 436.68526712799985], [1442244096.0, 84640000.0, 437.07500134400004], [1442245824.0, 84660000.0, 437.46493869599993], [1442247552.0, 84680000.0, 437.855079232], [1442249280.0, 84700000.0, 438.2454230000002], [1442251008.0, 84720000.0, 438.63597004799993], [1442252736.0, 84740000.0, 439.026720424], [1442254464.0, 84760000.0, 439.4176741760001], [1442256192.0, 84780000.0, 439.8088313519999], [1442257920.0, 84800000.0, 440.20019200000013], [1442259648.0, 84820000.0, 440.59175616799996], [1442261376.0, 84840000.0, 440.98352390400004], [1442263104.0, 84860000.0, 441.37549525600014], [1442264832.0, 84880000.0, 441.7676702719999], [1442266560.0, 84900000.0, 442.160049], [1442268288.0, 84920000.0, 442.5526314880001], [1442270016.0, 84940000.0, 442.9454177839999], [1442271744.0, 84960000.0, 443.338407936], [1442273472.0, 84980000.0, 443.73160199199987], [1442275200.0, 85000000.0, 
444.125], [1442276928.0, 85020000.0, 444.5186020080001], [1442278656.0, 85040000.0, 444.91240806399986], [1442280384.0, 85060000.0, 445.306418216], [1442282112.0, 85080000.0, 445.70063251199986], [1442283840.0, 85100000.0, 446.09505099999996], [1442285568.0, 85120000.0, 446.4896737280002], [1442287296.0, 85140000.0, 446.88450074399987], [1442289024.0, 85160000.0, 447.27953209599997], [1442290752.0, 85180000.0, 447.67476783200016], [1442292480.0, 85200000.0, 448.07020799999987], [1442294208.0, 85220000.0, 448.46585264800007], [1442295936.0, 85240000.0, 448.86170182399974], [1442297664.0, 85260000.0, 449.2577555759999], [1442299392.0, 85280000.0, 449.65401395200007], [1442301120.0, 85300000.0, 450.0504769999999], [1442302848.0, 85320000.0, 450.44714476800004], [1442304576.0, 85340000.0, 450.84401730400015], [1442306304.0, 85360000.0, 451.24109465599986], [1442308032.0, 85380000.0, 451.6383768720001], [1442309760.0, 85400000.0, 452.03586399999983], [1442311488.0, 85420000.0, 452.43355608799993], [1442313216.0, 85440000.0, 452.8314531840001], [1442314944.0, 85460000.0, 453.22955533599986], [1442316672.0, 85480000.0, 453.62786259199993], [1442318400.0, 85500000.0, 454.02637500000014], [1442320128.0, 85520000.0, 454.425092608], [1442321856.0, 85540000.0, 454.824015464], [1442323584.0, 85560000.0, 455.2231436159998], [1442325312.0, 85580000.0, 455.62247711199996], [1442327040.0, 85600000.0, 456.022016], [1442328768.0, 85620000.0, 456.4217603279999], [1442330496.0, 85640000.0, 456.821710144], [1442332224.0, 85660000.0, 457.2218654960002], [1442333952.0, 85680000.0, 457.62222643199993], [1442335680.0, 85700000.0, 458.0227930000001], [1442337408.0, 85720000.0, 458.4235652479999], [1442339136.0, 85740000.0, 458.82454322399997], [1442340864.0, 85760000.0, 459.22572697600015], [1442342592.0, 85780000.0, 459.6271165519999], [1442344320.0, 85800000.0, 460.0287119999999], [1442346048.0, 85820000.0, 460.4305133680001], [1442347776.0, 85840000.0, 460.8325207039998], [1442349504.0, 85860000.0, 461.234734056], [1442351232.0, 85880000.0, 461.6371534719998], [1442352960.0, 85900000.0, 462.039779], [1442354688.0, 85920000.0, 462.44261068800006], [1442356416.0, 85940000.0, 462.84564858399983], [1442358144.0, 85960000.0, 463.24889273599996], [1442359872.0, 85980000.0, 463.6523431920001], [1442361600.0, 86000000.0, 464.0559999999999], [1442363328.0, 86020000.0, 464.45986320800006], [1442365056.0, 86040000.0, 464.8639328639999], [1442366784.0, 86060000.0, 465.268209016], [1442368512.0, 86080000.0, 465.67269171200013], [1442370240.0, 86100000.0, 466.0773809999999], [1442371968.0, 86120000.0, 466.48227692800003], [1442373696.0, 86140000.0, 466.8873795440002], [1442375424.0, 86160000.0, 467.292688896], [1442377152.0, 86180000.0, 467.69820503200003], [1442378880.0, 86200000.0, 468.1039279999999], [1442380608.0, 86220000.0, 468.50985784799997], [1442382336.0, 86240000.0, 468.91599462400006], [1442384064.0, 86260000.0, 469.32233837599983], [1442385792.0, 86280000.0, 469.728889152], [1442387520.0, 86300000.0, 470.1356470000002], [1442389248.0, 86320000.0, 470.54261196799996], [1442390976.0, 86340000.0, 470.949784104], [1442392704.0, 86360000.0, 471.3571634559998], [1442394432.0, 86380000.0, 471.76475007199997], [1442396160.0, 86400000.0, 472.1725440000001], [1442397888.0, 86420000.0, 472.580545288], [1442399616.0, 86440000.0, 472.988753984], [1442401344.0, 86460000.0, 473.39717013600017], [1442403072.0, 86480000.0, 473.805793792], [1442404800.0, 86500000.0, 474.21462500000007], [1442406528.0, 86520000.0, 
474.62366380799995], [1442408256.0, 86540000.0, 475.032910264], [1442409984.0, 86560000.0, 475.44236441600015], [1442411712.0, 86580000.0, 475.8520263119999], [1442413440.0, 86600000.0, 476.26189600000004], [1442415168.0, 86620000.0, 476.6719735280002], [1442416896.0, 86640000.0, 477.08225894399993], [1442418624.0, 86660000.0, 477.4927522960001], [1442420352.0, 86680000.0, 477.9034536319998], [1442422080.0, 86700000.0, 478.31436299999996], [1442423808.0, 86720000.0, 478.72548044800016], [1442425536.0, 86740000.0, 479.13680602399984], [1442427264.0, 86760000.0, 479.54833977600003], [1442428992.0, 86780000.0, 479.9600817520001], [1442430720.0, 86800000.0, 480.372032], [1442432448.0, 86820000.0, 480.7841905680001], [1442434176.0, 86840000.0, 481.1965575039999], [1442435904.0, 86860000.0, 481.609132856], [1442437632.0, 86880000.0, 482.0219166720001], [1442439360.0, 86900000.0, 482.43490899999995], [1442441088.0, 86920000.0, 482.84810988800007], [1442442816.0, 86940000.0, 483.26151938400017], [1442444544.0, 86960000.0, 483.67513753599997], [1442446272.0, 86980000.0, 484.0889643920001], [1442448000.0, 87000000.0, 484.5029999999998], [1442449728.0, 87020000.0, 484.9172444080001], [1442451456.0, 87040000.0, 485.3316976640001], [1442453184.0, 87060000.0, 485.7463598159999], [1442454912.0, 87080000.0, 486.1612309120002], [1442456640.0, 87100000.0, 486.57631100000026], [1442458368.0, 87120000.0, 486.99160012799996], [1442460096.0, 87140000.0, 487.407098344], [1442461824.0, 87160000.0, 487.82280569599976], [1442463552.0, 87180000.0, 488.238722232], [1442465280.0, 87200000.0, 488.65484800000013], [1442467008.0, 87220000.0, 489.0711830479998], [1442468736.0, 87240000.0, 489.487727424], [1442470464.0, 87260000.0, 489.90448117600033], [1442472192.0, 87280000.0, 490.3214443519999], [1442473920.0, 87300000.0, 490.7386170000001], [1442475648.0, 87320000.0, 491.1559991679999], [1442477376.0, 87340000.0, 491.573590904], [1442479104.0, 87360000.0, 491.99139225600015], [1442480832.0, 87380000.0, 492.40940327199996], [1442482560.0, 87400000.0, 492.82762400000007], [1442484288.0, 87420000.0, 493.24605448800014], [1442486016.0, 87440000.0, 493.66469478399995], [1442487744.0, 87460000.0, 494.0835449360001], [1442489472.0, 87480000.0, 494.50260499199993], [1442491200.0, 87500000.0, 494.921875], [1442492928.0, 87520000.0, 495.3413550080001], [1442494656.0, 87540000.0, 495.7610450639999], [1442496384.0, 87560000.0, 496.18094521600005], [1442498112.0, 87580000.0, 496.6010555119998], [1442499840.0, 87600000.0, 497.021376], [1442501568.0, 87620000.0, 497.44190672800016], [1442503296.0, 87640000.0, 497.8626477439999], [1442505024.0, 87660000.0, 498.28359909599993], [1442506752.0, 87680000.0, 498.70476083200015], [1442508480.0, 87700000.0, 499.126133], [1442510208.0, 87720000.0, 499.54771564800006], [1442511936.0, 87740000.0, 499.9695088239997], [1442513664.0, 87760000.0, 500.39151257599997], [1442515392.0, 87780000.0, 500.8137269520001], [1442517120.0, 87800000.0, 501.23615199999983], [1442518848.0, 87820000.0, 501.658787768], [1442520576.0, 87840000.0, 502.08163430400015], [1442522304.0, 87860000.0, 502.50469165599986], [1442524032.0, 87880000.0, 502.927959872], [1442525760.0, 87900000.0, 503.3514389999998], [1442527488.0, 87920000.0, 503.77512908799986], [1442529216.0, 87940000.0, 504.1990301840001], [1442530944.0, 87960000.0, 504.6231423359999], [1442532672.0, 87980000.0, 505.047465592], [1442534400.0, 88000000.0, 505.4720000000002], [1442536128.0, 88020000.0, 505.89674560799995], [1442537856.0, 88040000.0, 
506.32170246400005], [1442539584.0, 88060000.0, 506.7468706159998], [1442541312.0, 88080000.0, 507.1722501119999], [1442543040.0, 88100000.0, 507.597841], [1442544768.0, 88120000.0, 508.02364332799993], [1442546496.0, 88140000.0, 508.44965714399996], [1442548224.0, 88160000.0, 508.87588249600014], [1442549952.0, 88180000.0, 509.3023194319999], [1442551680.0, 88200000.0, 509.728968], [1442553408.0, 88220000.0, 510.15582824799986], [1442555136.0, 88240000.0, 510.582900224], [1442556864.0, 88260000.0, 511.0101839760001], [1442558592.0, 88280000.0, 511.4376795519999], [1442560320.0, 88300000.0, 511.86538700000006], [1442562048.0, 88320000.0, 512.2933063680001], [1442563776.0, 88340000.0, 512.721437704], [1442565504.0, 88360000.0, 513.149781056], [1442567232.0, 88380000.0, 513.5783364719998], [1442568960.0, 88400000.0, 514.007104], [1442570688.0, 88420000.0, 514.436083688], [1442572416.0, 88440000.0, 514.8652755839998], [1442574144.0, 88460000.0, 515.294679736], [1442575872.0, 88480000.0, 515.7242961920001], [1442577600.0, 88500000.0, 516.1541249999999], [1442579328.0, 88520000.0, 516.5841662079999], [1442581056.0, 88540000.0, 517.0144198639998], [1442582784.0, 88560000.0, 517.4448860159999], [1442584512.0, 88580000.0, 517.8755647120001], [1442586240.0, 88600000.0, 518.3064559999998], [1442587968.0, 88620000.0, 518.737559928], [1442589696.0, 88640000.0, 519.1688765440001], [1442591424.0, 88660000.0, 519.600405896], [1442593152.0, 88680000.0, 520.032148032], [1442594880.0, 88700000.0, 520.4641029999999], [1442596608.0, 88720000.0, 520.8962708480001], [1442598336.0, 88740000.0, 521.328651624], [1442600064.0, 88760000.0, 521.7612453759999], [1442601792.0, 88780000.0, 522.194052152], [1442603520.0, 88800000.0, 522.6270720000001], [1442605248.0, 88820000.0, 523.060304968], [1442606976.0, 88840000.0, 523.493751104], [1442608704.0, 88860000.0, 523.9274104559997], [1442610432.0, 88880000.0, 524.361283072], [1442612160.0, 88900000.0, 524.7953690000002], [1442613888.0, 88920000.0, 525.2296682879999], [1442615616.0, 88940000.0, 525.6641809839999], [1442617344.0, 88960000.0, 526.0989071360002], [1442619072.0, 88980000.0, 526.5338467919998], [1442620800.0, 89000000.0, 526.969], [1442622528.0, 89020000.0, 527.4043668079998], [1442624256.0, 89040000.0, 527.8399472640001], [1442625984.0, 89060000.0, 528.2757414160001], [1442627712.0, 89080000.0, 528.7117493119999], [1442629440.0, 89100000.0, 529.1479710000001], [1442631168.0, 89120000.0, 529.5844065280002], [1442632896.0, 89140000.0, 530.021055944], [1442634624.0, 89160000.0, 530.457919296], [1442636352.0, 89180000.0, 530.8949966319999], [1442638080.0, 89200000.0, 531.3322880000001], [1442639808.0, 89220000.0, 531.7697934480002], [1442641536.0, 89240000.0, 532.2075130239998], [1442643264.0, 89260000.0, 532.6454467760001], [1442644992.0, 89280000.0, 533.0835947520002], [1442646720.0, 89300000.0, 533.5219569999999], [1442648448.0, 89320000.0, 533.9605335680001], [1442650176.0, 89340000.0, 534.3993245039999], [1442651904.0, 89360000.0, 534.838329856], [1442653632.0, 89380000.0, 535.2775496720002], [1442655360.0, 89400000.0, 535.7169839999999], [1442657088.0, 89420000.0, 536.1566328880001], [1442658816.0, 89440000.0, 536.5964963840003], [1442660544.0, 89460000.0, 537.036574536], [1442662272.0, 89480000.0, 537.476867392], [1442664000.0, 89500000.0, 537.9173749999999], [1442665728.0, 89520000.0, 538.358097408], [1442667456.0, 89540000.0, 538.7990346640001], [1442669184.0, 89560000.0, 539.2401868159998], [1442670912.0, 89580000.0, 539.6815539120001], [1442672640.0, 
89600000.0, 540.1231360000002], [1442674368.0, 89620000.0, 540.5649331279999], [1442676096.0, 89640000.0, 541.0069453440002], [1442677824.0, 89660000.0, 541.4491726959998], [1442679552.0, 89680000.0, 541.8916152319999], [1442681280.0, 89700000.0, 542.3342730000002], [1442683008.0, 89720000.0, 542.7771460479998], [1442684736.0, 89740000.0, 543.2202344240001], [1442686464.0, 89760000.0, 543.6635381760003], [1442688192.0, 89780000.0, 544.1070573520001], [1442689920.0, 89800000.0, 544.5507920000001], [1442691648.0, 89820000.0, 544.9947421679999], [1442693376.0, 89840000.0, 545.438907904], [1442695104.0, 89860000.0, 545.8832892560001], [1442696832.0, 89880000.0, 546.327886272], [1442698560.0, 89900000.0, 546.7726990000001], [1442700288.0, 89920000.0, 547.2177274880002], [1442702016.0, 89940000.0, 547.6629717839999], [1442703744.0, 89960000.0, 548.1084319360002], [1442705472.0, 89980000.0, 548.5541079919999], [1442707200.0, 90000000.0, 549.0], [1442708928.0, 90020000.0, 549.446108008], [1442710656.0, 90040000.0, 549.8924320639999], [1442712384.0, 90060000.0, 550.338972216], [1442714112.0, 90080000.0, 550.7857285119999], [1442715840.0, 90100000.0, 551.2327009999999], [1442717568.0, 90120000.0, 551.6798897280001], [1442719296.0, 90140000.0, 552.1272947439999], [1442721024.0, 90160000.0, 552.5749160960002], [1442722752.0, 90180000.0, 553.0227538320003], [1442724480.0, 90200000.0, 553.4708079999999], [1442726208.0, 90220000.0, 553.9190786480001], [1442727936.0, 90240000.0, 554.3675658239997], [1442729664.0, 90260000.0, 554.816269576], [1442731392.0, 90280000.0, 555.2651899520001], [1442733120.0, 90300000.0, 555.7143269999999], [1442734848.0, 90320000.0, 556.163680768], [1442736576.0, 90340000.0, 556.6132513040002], [1442738304.0, 90360000.0, 557.0630386559999], [1442740032.0, 90380000.0, 557.513042872], [1442741760.0, 90400000.0, 557.9632639999998], [1442743488.0, 90420000.0, 558.4137020879999], [1442745216.0, 90440000.0, 558.864357184], [1442746944.0, 90460000.0, 559.3152293359999], [1442748672.0, 90480000.0, 559.766318592], [1442750400.0, 90500000.0, 560.2176250000002], [1442752128.0, 90520000.0, 560.6691486079999], [1442753856.0, 90540000.0, 561.120889464], [1442755584.0, 90560000.0, 561.5728476159998], [1442757312.0, 90580000.0, 562.025023112], [1442759040.0, 90600000.0, 562.4774160000001], [1442760768.0, 90620000.0, 562.9300263279998], [1442762496.0, 90640000.0, 563.3828541440001], [1442764224.0, 90660000.0, 563.8358994960001], [1442765952.0, 90680000.0, 564.2891624319999], [1442767680.0, 90700000.0, 564.7426430000002], [1442769408.0, 90720000.0, 565.196341248], [1442771136.0, 90740000.0, 565.6502572239999], [1442772864.0, 90760000.0, 566.1043909760001], [1442774592.0, 90780000.0, 566.5587425519998], [1442776320.0, 90800000.0, 567.0133119999999], [1442778048.0, 90820000.0, 567.4680993680003], [1442779776.0, 90840000.0, 567.9231047039998], [1442781504.0, 90860000.0, 568.378328056], [1442783232.0, 90880000.0, 568.8337694719999], [1442784960.0, 90900000.0, 569.2894289999999], [1442786688.0, 90920000.0, 569.7453066880001], [1442788416.0, 90940000.0, 570.2014025839999], [1442790144.0, 90960000.0, 570.6577167360001], [1442791872.0, 90980000.0, 571.1142491920002], [1442793600.0, 91000000.0, 571.5709999999999], [1442795328.0, 91020000.0, 572.0279692080001], [1442797056.0, 91040000.0, 572.4851568639999], [1442798784.0, 91060000.0, 572.942563016], [1442800512.0, 91080000.0, 573.4001877120002], [1442802240.0, 91100000.0, 573.858031], [1442803968.0, 91120000.0, 574.316092928], [1442805696.0, 91140000.0, 
574.7743735440001], [1442807424.0, 91160000.0, 575.2328728959999], [1442809152.0, 91180000.0, 575.6915910320001], [1442810880.0, 91200000.0, 576.1505279999999], [1442812608.0, 91220000.0, 576.609683848], [1442814336.0, 91240000.0, 577.0690586240001], [1442816064.0, 91260000.0, 577.5286523759999], [1442817792.0, 91280000.0, 577.9884651520001], [1442819520.0, 91300000.0, 578.4484970000001], [1442821248.0, 91320000.0, 578.908747968], [1442822976.0, 91340000.0, 579.3692181040001], [1442824704.0, 91360000.0, 579.829907456], [1442826432.0, 91380000.0, 580.290816072], [1442828160.0, 91400000.0, 580.7519440000001], [1442829888.0, 91420000.0, 581.2132912879999], [1442831616.0, 91440000.0, 581.674857984], [1442833344.0, 91460000.0, 582.1366441360001], [1442835072.0, 91480000.0, 582.5986497919998], [1442836800.0, 91500000.0, 583.0608750000001], [1442838528.0, 91520000.0, 583.5233198079998], [1442840256.0, 91540000.0, 583.9859842640001], [1442841984.0, 91560000.0, 584.4488684160002], [1442843712.0, 91580000.0, 584.9119723119999], [1442845440.0, 91600000.0, 585.3752960000002], [1442847168.0, 91620000.0, 585.8388395280002], [1442848896.0, 91640000.0, 586.3026029439999], [1442850624.0, 91660000.0, 586.766586296], [1442852352.0, 91680000.0, 587.2307896319998], [1442854080.0, 91700000.0, 587.695213], [1442855808.0, 91720000.0, 588.1598564480003], [1442857536.0, 91740000.0, 588.6247200239999], [1442859264.0, 91760000.0, 589.089803776], [1442860992.0, 91780000.0, 589.5551077520004], [1442862720.0, 91800000.0, 590.0206319999999], [1442864448.0, 91820000.0, 590.4863765680002], [1442866176.0, 91840000.0, 590.9523415039998], [1442867904.0, 91860000.0, 591.4185268559999], [1442869632.0, 91880000.0, 591.8849326720001], [1442871360.0, 91900000.0, 592.351559], [1442873088.0, 91920000.0, 592.8184058879999], [1442874816.0, 91940000.0, 593.2854733840002], [1442876544.0, 91960000.0, 593.7527615359999], [1442878272.0, 91980000.0, 594.2202703920001], [1442880000.0, 92000000.0, 594.6879999999998], [1442881728.0, 92020000.0, 595.1559504080001], [1442883456.0, 92040000.0, 595.6241216640002], [1442885184.0, 92060000.0, 596.092513816], [1442886912.0, 92080000.0, 596.5611269120001], [1442888640.0, 92100000.0, 597.0299610000002], [1442890368.0, 92120000.0, 597.4990161279999], [1442892096.0, 92140000.0, 597.9682923440001], [1442893824.0, 92160000.0, 598.4377896959998], [1442895552.0, 92180000.0, 598.907508232], [1442897280.0, 92200000.0, 599.3774480000002], [1442899008.0, 92220000.0, 599.8476090479999], [1442900736.0, 92240000.0, 600.3179914240001], [1442902464.0, 92260000.0, 600.7885951760003], [1442904192.0, 92280000.0, 601.2594203519998], [1442905920.0, 92300000.0, 601.7304670000001], [1442907648.0, 92320000.0, 602.2017351679999], [1442909376.0, 92340000.0, 602.6732249040001], [1442911104.0, 92360000.0, 603.1449362560002], [1442912832.0, 92380000.0, 603.6168692719999], [1442914560.0, 92400000.0, 604.0890240000001], [1442916288.0, 92420000.0, 604.5614004880001], [1442918016.0, 92440000.0, 605.0339987839999], [1442919744.0, 92460000.0, 605.5068189360002], [1442921472.0, 92480000.0, 605.9798609919999], [1442923200.0, 92500000.0, 606.453125], [1442924928.0, 92520000.0, 606.9266110080002], [1442926656.0, 92540000.0, 607.4003190639999], [1442928384.0, 92560000.0, 607.874249216], [1442930112.0, 92580000.0, 608.3484015119998], [1442931840.0, 92600000.0, 608.822776], [1442933568.0, 92620000.0, 609.2973727280001], [1442935296.0, 92640000.0, 609.7721917439999], [1442937024.0, 92660000.0, 610.247233096], [1442938752.0, 92680000.0, 
610.7224968320003], [1442940480.0, 92700000.0, 611.1979829999999], [1442942208.0, 92720000.0, 611.6736916480002], [1442943936.0, 92740000.0, 612.1496228239997], [1442945664.0, 92760000.0, 612.6257765759999], [1442947392.0, 92780000.0, 613.1021529520001], [1442949120.0, 92800000.0, 613.5787519999999], [1442950848.0, 92820000.0, 614.055573768], [1442952576.0, 92840000.0, 614.5326183040002], [1442954304.0, 92860000.0, 615.0098856559998], [1442956032.0, 92880000.0, 615.4873758720001], [1442957760.0, 92900000.0, 615.9650889999999], [1442959488.0, 92920000.0, 616.4430250879999], [1442961216.0, 92940000.0, 616.9211841840001], [1442962944.0, 92960000.0, 617.3995663359999], [1442964672.0, 92980000.0, 617.878171592], [1442966400.0, 93000000.0, 618.3570000000002], [1442968128.0, 93020000.0, 618.8360516079999], [1442969856.0, 93040000.0, 619.315326464], [1442971584.0, 93060000.0, 619.7948246159998], [1442973312.0, 93080000.0, 620.274546112], [1442975040.0, 93100000.0, 620.754491], [1442976768.0, 93120000.0, 621.2346593279999], [1442978496.0, 93140000.0, 621.7150511440001], [1442980224.0, 93160000.0, 622.1956664960002], [1442981952.0, 93180000.0, 622.6765054319999], [1442983680.0, 93200000.0, 623.1575680000001], [1442985408.0, 93220000.0, 623.6388542479997], [1442987136.0, 93240000.0, 624.120364224], [1442988864.0, 93260000.0, 624.6020979760001], [1442990592.0, 93280000.0, 625.0840555519999], [1442992320.0, 93300000.0, 625.566237], [1442994048.0, 93320000.0, 626.0486423680002], [1442995776.0, 93340000.0, 626.5312717039999], [1442997504.0, 93360000.0, 627.0141250560001], [1442999232.0, 93380000.0, 627.4972024719998], [1443000960.0, 93400000.0, 627.9805039999999], [1443002688.0, 93420000.0, 628.4640296880002], [1443004416.0, 93440000.0, 628.9477795839999], [1443006144.0, 93460000.0, 629.431753736], [1443007872.0, 93480000.0, 629.9159521920002], [1443009600.0, 93500000.0, 630.4003749999999], [1443011328.0, 93520000.0, 630.885022208], [1443013056.0, 93540000.0, 631.3698938639998], [1443014784.0, 93560000.0, 631.854990016], [1443016512.0, 93580000.0, 632.3403107120002], [1443018240.0, 93600000.0, 632.8258559999999], [1443019968.0, 93620000.0, 633.3116259280001], [1443021696.0, 93640000.0, 633.7976205440002], [1443023424.0, 93660000.0, 634.2838398959998], [1443025152.0, 93680000.0, 634.7702840320001], [1443026880.0, 93700000.0, 635.2569529999997], [1443028608.0, 93720000.0, 635.743846848], [1443030336.0, 93740000.0, 636.2309656240002], [1443032064.0, 93760000.0, 636.7183093759999], [1443033792.0, 93780000.0, 637.205878152], [1443035520.0, 93800000.0, 637.6936720000001], [1443037248.0, 93820000.0, 638.181690968], [1443038976.0, 93840000.0, 638.6699351040002], [1443040704.0, 93860000.0, 639.158404456], [1443042432.0, 93880000.0, 639.647099072], [1443044160.0, 93900000.0, 640.1360190000003], [1443045888.0, 93920000.0, 640.625164288], [1443047616.0, 93940000.0, 641.114534984], [1443049344.0, 93960000.0, 641.6041311360002], [1443051072.0, 93980000.0, 642.093952792], [1443052800.0, 94000000.0, 642.5840000000002], [1443054528.0, 94020000.0, 643.0742728079998], [1443056256.0, 94040000.0, 643.564771264], [1443057984.0, 94060000.0, 644.0554954160002], [1443059712.0, 94080000.0, 644.5464453119998], [1443061440.0, 94100000.0, 645.0376210000002], [1443063168.0, 94120000.0, 645.5290225280003], [1443064896.0, 94140000.0, 646.020649944], [1443066624.0, 94160000.0, 646.512503296], [1443068352.0, 94180000.0, 647.0045826319998], [1443070080.0, 94200000.0, 647.496888], [1443071808.0, 94220000.0, 647.9894194480003], 
[1443073536.0, 94240000.0, 648.4821770239998], [1443075264.0, 94260000.0, 648.9751607760002], [1443076992.0, 94280000.0, 649.4683707520003], [1443078720.0, 94300000.0, 649.9618069999999], [1443080448.0, 94320000.0, 650.4554695680001], [1443082176.0, 94340000.0, 650.9493585039999], [1443083904.0, 94360000.0, 651.443473856], [1443085632.0, 94380000.0, 651.9378156720003], [1443087360.0, 94400000.0, 652.432384], [1443089088.0, 94420000.0, 652.927178888], [1443090816.0, 94440000.0, 653.4222003840002], [1443092544.0, 94460000.0, 653.9174485359999], [1443094272.0, 94480000.0, 654.412923392], [1443096000.0, 94500000.0, 654.9086249999998], [1443097728.0, 94520000.0, 655.4045534080001], [1443099456.0, 94540000.0, 655.900708664], [1443101184.0, 94560000.0, 656.3970908159998], [1443102912.0, 94580000.0, 656.8936999120001], [1443104640.0, 94600000.0, 657.3905360000002], [1443106368.0, 94620000.0, 657.8875991279999], [1443108096.0, 94640000.0, 658.3848893440002], [1443109824.0, 94660000.0, 658.8824066959999], [1443111552.0, 94680000.0, 659.380151232], [1443113280.0, 94700000.0, 659.8781230000002], [1443115008.0, 94720000.0, 660.3763220479998], [1443116736.0, 94740000.0, 660.874748424], [1443118464.0, 94760000.0, 661.3734021760002], [1443120192.0, 94780000.0, 661.8722833519998], [1443121920.0, 94800000.0, 662.371392], [1443123648.0, 94820000.0, 662.8707281679999], [1443125376.0, 94840000.0, 663.3702919039999], [1443127104.0, 94860000.0, 663.8700832560002], [1443128832.0, 94880000.0, 664.3701022719999], [1443130560.0, 94900000.0, 664.870349], [1443132288.0, 94920000.0, 665.3708234880003], [1443134016.0, 94940000.0, 665.8715257839999], [1443135744.0, 94960000.0, 666.372455936], [1443137472.0, 94980000.0, 666.873613992], [1443139200.0, 95000000.0, 667.375], [1443140928.0, 95020000.0, 667.8766140080002], [1443142656.0, 95040000.0, 668.3784560639999], [1443144384.0, 95060000.0, 668.880526216], [1443146112.0, 95080000.0, 669.3828245119997], [1443147840.0, 95100000.0, 669.8853509999999], [1443149568.0, 95120000.0, 670.3881057280001], [1443151296.0, 95140000.0, 670.891088744], [1443153024.0, 95160000.0, 671.394300096], [1443154752.0, 95180000.0, 671.8977398320002], [1443156480.0, 95200000.0, 672.401408], [1443158208.0, 95220000.0, 672.9053046480001], [1443159936.0, 95240000.0, 673.4094298239997], [1443161664.0, 95260000.0, 673.913783576], [1443163392.0, 95280000.0, 674.4183659520002], [1443165120.0, 95300000.0, 674.9231769999999], [1443166848.0, 95320000.0, 675.4282167680001], [1443168576.0, 95340000.0, 675.9334853040002], [1443170304.0, 95360000.0, 676.4389826559999], [1443172032.0, 95380000.0, 676.9447088720001], [1443173760.0, 95400000.0, 677.4506639999998], [1443175488.0, 95420000.0, 677.9568480879999], [1443177216.0, 95440000.0, 678.4632611840002], [1443178944.0, 95460000.0, 678.9699033359998], [1443180672.0, 95480000.0, 679.476774592], [1443182400.0, 95500000.0, 679.9838750000002], [1443184128.0, 95520000.0, 680.491204608], [1443185856.0, 95540000.0, 680.998763464], [1443187584.0, 95560000.0, 681.5065516159998], [1443189312.0, 95580000.0, 682.014569112], [1443191040.0, 95600000.0, 682.522816], [1443192768.0, 95620000.0, 683.0312923279998], [1443194496.0, 95640000.0, 683.539998144], [1443196224.0, 95660000.0, 684.0489334960002], [1443197952.0, 95680000.0, 684.558098432], [1443199680.0, 95700000.0, 685.0674930000001], [1443201408.0, 95720000.0, 685.5771172479997], [1443203136.0, 95740000.0, 686.0869712239999], [1443204864.0, 95760000.0, 686.5970549760002], [1443206592.0, 95780000.0, 687.1073685519998], 
[1443208320.0, 95800000.0, 687.6179119999999], [1443210048.0, 95820000.0, 688.1286853680002], [1443211776.0, 95840000.0, 688.6396887039998], [1443213504.0, 95860000.0, 689.150922056], [1443215232.0, 95880000.0, 689.6623854719998], [1443216960.0, 95900000.0, 690.1740789999999], [1443218688.0, 95920000.0, 690.6860026880001], [1443220416.0, 95940000.0, 691.1981565839999], [1443222144.0, 95960000.0, 691.710540736], [1443223872.0, 95980000.0, 692.2231551920003], [1443225600.0, 96000000.0, 692.736], [1443227328.0, 96020000.0, 693.249075208], [1443229056.0, 96040000.0, 693.7623808639999], [1443230784.0, 96060000.0, 694.2759170159999], [1443232512.0, 96080000.0, 694.7896837120002], [1443234240.0, 96100000.0, 695.3036809999999], [1443235968.0, 96120000.0, 695.8179089280001], [1443237696.0, 96140000.0, 696.3323675440001], [1443239424.0, 96160000.0, 696.8470568959999], [1443241152.0, 96180000.0, 697.3619770320001], [1443242880.0, 96200000.0, 697.8771279999997], [1443244608.0, 96220000.0, 698.3925098479999], [1443246336.0, 96240000.0, 698.9081226240002], [1443248064.0, 96260000.0, 699.4239663759998], [1443249792.0, 96280000.0, 699.940041152], [1443251520.0, 96300000.0, 700.4563470000003], [1443253248.0, 96320000.0, 700.9728839679999], [1443254976.0, 96340000.0, 701.489652104], [1443256704.0, 96360000.0, 702.0066514559999], [1443258432.0, 96380000.0, 702.523882072], [1443260160.0, 96400000.0, 703.0413440000002], [1443261888.0, 96420000.0, 703.5590372879999], [1443263616.0, 96440000.0, 704.076961984], [1443265344.0, 96460000.0, 704.5951181360001], [1443267072.0, 96480000.0, 705.1135057919998], [1443268800.0, 96500000.0, 705.6321250000001], [1443270528.0, 96520000.0, 706.1509758079998], [1443272256.0, 96540000.0, 706.6700582639999], [1443273984.0, 96560000.0, 707.1893724160002], [1443275712.0, 96580000.0, 707.7089183119999], [1443277440.0, 96600000.0, 708.2286960000001], [1443279168.0, 96620000.0, 708.7487055280003], [1443280896.0, 96640000.0, 709.2689469439999], [1443282624.0, 96660000.0, 709.7894202960001], [1443284352.0, 96680000.0, 710.3101256319998], [1443286080.0, 96700000.0, 710.831063], [1443287808.0, 96720000.0, 711.3522324480002], [1443289536.0, 96740000.0, 711.8736340239998], [1443291264.0, 96760000.0, 712.3952677760001], [1443292992.0, 96780000.0, 712.9171337520002], [1443294720.0, 96800000.0, 713.439232], [1443296448.0, 96820000.0, 713.9615625680002], [1443298176.0, 96840000.0, 714.4841255039998], [1443299904.0, 96860000.0, 715.006920856], [1443301632.0, 96880000.0, 715.5299486720002], [1443303360.0, 96900000.0, 716.0532089999999], [1443305088.0, 96920000.0, 716.576701888], [1443306816.0, 96940000.0, 717.1004273840002], [1443308544.0, 96960000.0, 717.624385536], [1443310272.0, 96980000.0, 718.1485763920001], [1443312000.0, 97000000.0, 718.6729999999998], [1443313728.0, 97020000.0, 719.1976564080001], [1443315456.0, 97040000.0, 719.7225456640001], [1443317184.0, 97060000.0, 720.2476678159999], [1443318912.0, 97080000.0, 720.7730229120001], [1443320640.0, 97100000.0, 721.2986110000003], [1443322368.0, 97120000.0, 721.8244321279999], [1443324096.0, 97140000.0, 722.3504863440002], [1443325824.0, 97160000.0, 722.8767736959999], [1443327552.0, 97180000.0, 723.403294232], [1443329280.0, 97200000.0, 723.9300480000002], [1443331008.0, 97220000.0, 724.4570350479999], [1443332736.0, 97240000.0, 724.984255424], [1443334464.0, 97260000.0, 725.5117091760003], [1443336192.0, 97280000.0, 726.039396352], [1443337920.0, 97300000.0, 726.5673170000001], [1443339648.0, 97320000.0, 727.0954711679999], 
[1443341376.0, 97340000.0, 727.6238589039999], [1443343104.0, 97360000.0, 728.1524802560001], [1443344832.0, 97380000.0, 728.681335272], [1443346560.0, 97400000.0, 729.2104240000001], [1443348288.0, 97420000.0, 729.7397464880003], [1443350016.0, 97440000.0, 730.2693027839999], [1443351744.0, 97460000.0, 730.7990929360001], [1443353472.0, 97480000.0, 731.3291169919999], [1443355200.0, 97500000.0, 731.859375], [1443356928.0, 97520000.0, 732.3898670080002], [1443358656.0, 97540000.0, 732.9205930639998], [1443360384.0, 97560000.0, 733.451553216], [1443362112.0, 97580000.0, 733.9827475119998], [1443363840.0, 97600000.0, 734.5141759999999], [1443365568.0, 97620000.0, 735.0458387280001], [1443367296.0, 97640000.0, 735.5777357439998], [1443369024.0, 97660000.0, 736.109867096], [1443370752.0, 97680000.0, 736.6422328320002], [1443372480.0, 97700000.0, 737.1748329999998], [1443374208.0, 97720000.0, 737.7076676480001], [1443375936.0, 97740000.0, 738.2407368239998], [1443377664.0, 97760000.0, 738.774040576], [1443379392.0, 97780000.0, 739.3075789520001], [1443381120.0, 97800000.0, 739.8413519999997], [1443382848.0, 97820000.0, 740.375359768], [1443384576.0, 97840000.0, 740.9096023040001], [1443386304.0, 97860000.0, 741.4440796559999], [1443388032.0, 97880000.0, 741.9787918720001], [1443389760.0, 97900000.0, 742.5137389999998], [1443391488.0, 97920000.0, 743.0489210879999], [1443393216.0, 97940000.0, 743.5843381840001], [1443394944.0, 97960000.0, 744.1199903359999], [1443396672.0, 97980000.0, 744.655877592], [1443398400.0, 98000000.0, 745.1920000000002], [1443400128.0, 98020000.0, 745.7283576079999], [1443401856.0, 98040000.0, 746.2649504640001], [1443403584.0, 98060000.0, 746.8017786159998], [1443405312.0, 98080000.0, 747.338842112], [1443407040.0, 98100000.0, 747.8761410000001], [1443408768.0, 98120000.0, 748.4136753279998], [1443410496.0, 98140000.0, 748.951445144], [1443412224.0, 98160000.0, 749.4894504960001], [1443413952.0, 98180000.0, 750.0276914319998], [1443415680.0, 98200000.0, 750.5661680000001], [1443417408.0, 98220000.0, 751.1048802479997], [1443419136.0, 98240000.0, 751.643828224], [1443420864.0, 98260000.0, 752.1830119760002], [1443422592.0, 98280000.0, 752.7224315519998], [1443424320.0, 98300000.0, 753.262087], [1443426048.0, 98320000.0, 753.8019783680002], [1443427776.0, 98340000.0, 754.342105704], [1443429504.0, 98360000.0, 754.8824690560001], [1443431232.0, 98380000.0, 755.4230684719997], [1443432960.0, 98400000.0, 755.963904], [1443434688.0, 98420000.0, 756.5049756880002], [1443436416.0, 98440000.0, 757.0462835839999], [1443438144.0, 98460000.0, 757.587827736], [1443439872.0, 98480000.0, 758.1296081920002], [1443441600.0, 98500000.0, 758.671625], [1443443328.0, 98520000.0, 759.2138782080001], [1443445056.0, 98540000.0, 759.7563678639997], [1443446784.0, 98560000.0, 760.299094016], [1443448512.0, 98580000.0, 760.8420567120002], [1443450240.0, 98600000.0, 761.3852559999998], [1443451968.0, 98620000.0, 761.928691928], [1443453696.0, 98640000.0, 762.4723645440001], [1443455424.0, 98660000.0, 763.0162738959998], [1443457152.0, 98680000.0, 763.560420032], [1443458880.0, 98700000.0, 764.1048029999998], [1443460608.0, 98720000.0, 764.6494228480001], [1443462336.0, 98740000.0, 765.1942796240002], [1443464064.0, 98760000.0, 765.7393733759998], [1443465792.0, 98780000.0, 766.2847041519999], [1443467520.0, 98800000.0, 766.8302720000003], [1443469248.0, 98820000.0, 767.3760769679999], [1443470976.0, 98840000.0, 767.9221191040001], [1443472704.0, 98860000.0, 768.4683984559997], [1443474432.0, 
98880000.0, 769.014915072], [1443476160.0, 98900000.0, 769.5616690000002], [1443477888.0, 98920000.0, 770.1086602879998], [1443479616.0, 98940000.0, 770.6558889840001], [1443481344.0, 98960000.0, 771.2033551360003], [1443483072.0, 98980000.0, 771.7510587919999], [1443484800.0, 99000000.0, 772.2990000000001], [1443486528.0, 99020000.0, 772.8471788079999], [1443488256.0, 99040000.0, 773.3955952639999], [1443489984.0, 99060000.0, 773.9442494160002], [1443491712.0, 99080000.0, 774.4931413119998], [1443493440.0, 99100000.0, 775.042271], [1443495168.0, 99120000.0, 775.5916385280002], [1443496896.0, 99140000.0, 776.141243944], [1443498624.0, 99160000.0, 776.6910872960002], [1443500352.0, 99180000.0, 777.2411686319998], [1443502080.0, 99200000.0, 777.7914880000001], [1443503808.0, 99220000.0, 778.3420454480001], [1443505536.0, 99240000.0, 778.8928410239998], [1443507264.0, 99260000.0, 779.443874776], [1443508992.0, 99280000.0, 779.9951467520002], [1443510720.0, 99300000.0, 780.546657], [1443512448.0, 99320000.0, 781.0984055680001], [1443514176.0, 99340000.0, 781.6503925039998], [1443515904.0, 99360000.0, 782.202617856], [1443517632.0, 99380000.0, 782.7550816720001], [1443519360.0, 99400000.0, 783.3077839999999], [1443521088.0, 99420000.0, 783.860724888], [1443522816.0, 99440000.0, 784.4139043840003], [1443524544.0, 99460000.0, 784.967322536], [1443526272.0, 99480000.0, 785.5209793920001], [1443528000.0, 99500000.0, 786.0748749999998], [1443529728.0, 99520000.0, 786.629009408], [1443531456.0, 99540000.0, 787.1833826640002], [1443533184.0, 99560000.0, 787.7379948159999], [1443534912.0, 99580000.0, 788.2928459120001], [1443536640.0, 99600000.0, 788.8479360000001], [1443538368.0, 99620000.0, 789.4032651279999], [1443540096.0, 99640000.0, 789.9588333440001], [1443541824.0, 99660000.0, 790.5146406959998], [1443543552.0, 99680000.0, 791.070687232], [1443545280.0, 99700000.0, 791.6269730000001], [1443547008.0, 99720000.0, 792.1834980479998], [1443548736.0, 99740000.0, 792.740262424], [1443550464.0, 99760000.0, 793.2972661760003], [1443552192.0, 99780000.0, 793.8545093519999], [1443553920.0, 99800000.0, 794.411992], [1443555648.0, 99820000.0, 794.9697141679999], [1443557376.0, 99840000.0, 795.527675904], [1443559104.0, 99860000.0, 796.0858772560002], [1443560832.0, 99880000.0, 796.644318272], [1443562560.0, 99900000.0, 797.2029990000001], [1443564288.0, 99920000.0, 797.7619194880002], [1443566016.0, 99940000.0, 798.3210797839998], [1443567744.0, 99960000.0, 798.880479936], [1443569472.0, 99980000.0, 799.4401199919998]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/app/demo/data/linear.json b/tensorflow/tensorboard/app/demo/data/linear.json
new file mode 100644
index 0000000000..3279365a23
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/linear.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 5.0], [1434932928.0, 20000.0, 5.04], [1434934656.0, 40000.0, 5.08], [1434936384.0, 60000.0, 5.12], [1434938112.0, 80000.0, 5.16], [1434939840.0, 100000.0, 5.2], [1434941568.0, 120000.0, 5.24], [1434943296.0, 140000.0, 5.28], [1434945024.0, 160000.0, 5.32], [1434946752.0, 180000.0, 5.36], [1434948480.0, 200000.0, 5.4], [1434950208.0, 220000.0, 5.4399999999999995], [1434951936.0, 240000.0, 5.48], [1434953664.0, 260000.0, 5.52], [1434955392.0, 280000.0, 5.5600000000000005], [1434957120.0, 300000.0, 5.6], [1434958848.0, 320000.0, 5.64], [1434960576.0, 340000.0, 5.68], [1434962304.0, 360000.0, 5.72], [1434964032.0, 380000.0, 5.76], [1434965760.0, 400000.0, 5.8], [1434967488.0, 420000.0, 5.84], [1434969216.0, 440000.0, 5.88], [1434970944.0, 460000.0, 5.92], [1434972672.0, 480000.0, 5.96], [1434974400.0, 500000.0, 6.0], [1434976128.0, 520000.0, 6.04], [1434977856.0, 540000.0, 6.08], [1434979584.0, 560000.0, 6.12], [1434981312.0, 580000.0, 6.16], [1434983040.0, 600000.0, 6.2], [1434984768.0, 620000.0, 6.24], [1434986496.0, 640000.0, 6.28], [1434988224.0, 660000.0, 6.32], [1434989952.0, 680000.0, 6.36], [1434991680.0, 700000.0, 6.4], [1434993408.0, 720000.0, 6.4399999999999995], [1434995136.0, 740000.0, 6.48], [1434996864.0, 760000.0, 6.52], [1434998592.0, 780000.0, 6.5600000000000005], [1435000320.0, 800000.0, 6.6], [1435002048.0, 820000.0, 6.640000000000001], [1435003776.0, 840000.0, 6.68], [1435005504.0, 860000.0, 6.72], [1435007232.0, 880000.0, 6.76], [1435008960.0, 900000.0, 6.8], [1435010688.0, 920000.0, 6.84], [1435012416.0, 940000.0, 6.88], [1435014144.0, 960000.0, 6.92], [1435015872.0, 980000.0, 6.96], [1435017600.0, 1000000.0, 7.0], [1435019328.0, 1020000.0, 7.04], [1435021056.0, 1040000.0, 7.08], [1435022784.0, 1060000.0, 7.12], [1435024512.0, 1080000.0, 7.16], [1435026240.0, 1100000.0, 7.2], [1435027968.0, 1120000.0, 7.24], [1435029696.0, 1140000.0, 7.28], [1435031424.0, 1160000.0, 7.32], [1435033152.0, 1180000.0, 7.359999999999999], [1435034880.0, 1200000.0, 7.4], [1435036608.0, 1220000.0, 7.4399999999999995], [1435038336.0, 1240000.0, 7.48], [1435040064.0, 1260000.0, 7.52], [1435041792.0, 1280000.0, 7.5600000000000005], [1435043520.0, 1300000.0, 7.6], [1435045248.0, 1320000.0, 7.640000000000001], [1435046976.0, 1340000.0, 7.68], [1435048704.0, 1360000.0, 7.720000000000001], [1435050432.0, 1380000.0, 7.76], [1435052160.0, 1400000.0, 7.800000000000001], [1435053888.0, 1420000.0, 7.84], [1435055616.0, 1440000.0, 7.88], [1435057344.0, 1460000.0, 7.92], [1435059072.0, 1480000.0, 7.96], [1435060800.0, 1500000.0, 8.0], [1435062528.0, 1520000.0, 8.04], [1435064256.0, 1540000.0, 8.08], [1435065984.0, 1560000.0, 8.120000000000001], [1435067712.0, 1580000.0, 8.16], [1435069440.0, 1600000.0, 8.2], [1435071168.0, 1620000.0, 8.24], [1435072896.0, 1640000.0, 8.280000000000001], [1435074624.0, 1660000.0, 8.32], [1435076352.0, 1680000.0, 8.36], [1435078080.0, 1700000.0, 8.4], [1435079808.0, 1720000.0, 8.44], [1435081536.0, 1740000.0, 8.48], [1435083264.0, 1760000.0, 8.52], [1435084992.0, 1780000.0, 8.559999999999999], [1435086720.0, 1800000.0, 8.6], [1435088448.0, 1820000.0, 8.64], [1435090176.0, 1840000.0, 8.68], [1435091904.0, 1860000.0, 8.719999999999999], [1435093632.0, 1880000.0, 8.76], [1435095360.0, 1900000.0, 8.8], [1435097088.0, 1920000.0, 8.84], [1435098816.0, 1940000.0, 8.879999999999999], [1435100544.0, 1960000.0, 8.92], [1435102272.0, 1980000.0, 8.96], [1435104000.0, 2000000.0, 9.0], [1435105728.0, 2020000.0, 9.04], [1435107456.0, 2040000.0, 9.08], [1435109184.0, 
2060000.0, 9.120000000000001], [1435110912.0, 2080000.0, 9.16], [1435112640.0, 2100000.0, 9.2], [1435114368.0, 2120000.0, 9.24], [1435116096.0, 2140000.0, 9.280000000000001], [1435117824.0, 2160000.0, 9.32], [1435119552.0, 2180000.0, 9.36], [1435121280.0, 2200000.0, 9.4], [1435123008.0, 2220000.0, 9.440000000000001], [1435124736.0, 2240000.0, 9.48], [1435126464.0, 2260000.0, 9.52], [1435128192.0, 2280000.0, 9.56], [1435129920.0, 2300000.0, 9.600000000000001], [1435131648.0, 2320000.0, 9.64], [1435133376.0, 2340000.0, 9.68], [1435135104.0, 2360000.0, 9.719999999999999], [1435136832.0, 2380000.0, 9.76], [1435138560.0, 2400000.0, 9.8], [1435140288.0, 2420000.0, 9.84], [1435142016.0, 2440000.0, 9.879999999999999], [1435143744.0, 2460000.0, 9.92], [1435145472.0, 2480000.0, 9.96], [1435147200.0, 2500000.0, 10.0], [1435148928.0, 2520000.0, 10.04], [1435150656.0, 2540000.0, 10.08], [1435152384.0, 2560000.0, 10.120000000000001], [1435154112.0, 2580000.0, 10.16], [1435155840.0, 2600000.0, 10.2], [1435157568.0, 2620000.0, 10.24], [1435159296.0, 2640000.0, 10.280000000000001], [1435161024.0, 2660000.0, 10.32], [1435162752.0, 2680000.0, 10.36], [1435164480.0, 2700000.0, 10.4], [1435166208.0, 2720000.0, 10.440000000000001], [1435167936.0, 2740000.0, 10.48], [1435169664.0, 2760000.0, 10.52], [1435171392.0, 2780000.0, 10.56], [1435173120.0, 2800000.0, 10.600000000000001], [1435174848.0, 2820000.0, 10.64], [1435176576.0, 2840000.0, 10.68], [1435178304.0, 2860000.0, 10.719999999999999], [1435180032.0, 2880000.0, 10.76], [1435181760.0, 2900000.0, 10.8], [1435183488.0, 2920000.0, 10.84], [1435185216.0, 2940000.0, 10.879999999999999], [1435186944.0, 2960000.0, 10.92], [1435188672.0, 2980000.0, 10.96], [1435190400.0, 3000000.0, 11.0], [1435192128.0, 3020000.0, 11.04], [1435193856.0, 3040000.0, 11.08], [1435195584.0, 3060000.0, 11.120000000000001], [1435197312.0, 3080000.0, 11.16], [1435199040.0, 3100000.0, 11.2], [1435200768.0, 3120000.0, 11.24], [1435202496.0, 3140000.0, 11.280000000000001], [1435204224.0, 3160000.0, 11.32], [1435205952.0, 3180000.0, 11.36], [1435207680.0, 3200000.0, 11.4], [1435209408.0, 3220000.0, 11.440000000000001], [1435211136.0, 3240000.0, 11.48], [1435212864.0, 3260000.0, 11.52], [1435214592.0, 3280000.0, 11.56], [1435216320.0, 3300000.0, 11.600000000000001], [1435218048.0, 3320000.0, 11.64], [1435219776.0, 3340000.0, 11.68], [1435221504.0, 3360000.0, 11.72], [1435223232.0, 3380000.0, 11.760000000000002], [1435224960.0, 3400000.0, 11.8], [1435226688.0, 3420000.0, 11.84], [1435228416.0, 3440000.0, 11.879999999999999], [1435230144.0, 3460000.0, 11.92], [1435231872.0, 3480000.0, 11.959999999999999], [1435233600.0, 3500000.0, 12.0], [1435235328.0, 3520000.0, 12.04], [1435237056.0, 3540000.0, 12.08], [1435238784.0, 3560000.0, 12.12], [1435240512.0, 3580000.0, 12.16], [1435242240.0, 3600000.0, 12.2], [1435243968.0, 3620000.0, 12.24], [1435245696.0, 3640000.0, 12.28], [1435247424.0, 3660000.0, 12.32], [1435249152.0, 3680000.0, 12.36], [1435250880.0, 3700000.0, 12.4], [1435252608.0, 3720000.0, 12.44], [1435254336.0, 3740000.0, 12.48], [1435256064.0, 3760000.0, 12.52], [1435257792.0, 3780000.0, 12.56], [1435259520.0, 3800000.0, 12.6], [1435261248.0, 3820000.0, 12.64], [1435262976.0, 3840000.0, 12.68], [1435264704.0, 3860000.0, 12.72], [1435266432.0, 3880000.0, 12.76], [1435268160.0, 3900000.0, 12.8], [1435269888.0, 3920000.0, 12.84], [1435271616.0, 3940000.0, 12.88], [1435273344.0, 3960000.0, 12.92], [1435275072.0, 3980000.0, 12.96], [1435276800.0, 4000000.0, 13.0], [1435278528.0, 4020000.0, 
13.040000000000001], [1435280256.0, 4040000.0, 13.08], [1435281984.0, 4060000.0, 13.120000000000001], [1435283712.0, 4080000.0, 13.16], [1435285440.0, 4100000.0, 13.2], [1435287168.0, 4120000.0, 13.24], [1435288896.0, 4140000.0, 13.28], [1435290624.0, 4160000.0, 13.32], [1435292352.0, 4180000.0, 13.36], [1435294080.0, 4200000.0, 13.4], [1435295808.0, 4220000.0, 13.44], [1435297536.0, 4240000.0, 13.48], [1435299264.0, 4260000.0, 13.52], [1435300992.0, 4280000.0, 13.56], [1435302720.0, 4300000.0, 13.6], [1435304448.0, 4320000.0, 13.64], [1435306176.0, 4340000.0, 13.68], [1435307904.0, 4360000.0, 13.72], [1435309632.0, 4380000.0, 13.76], [1435311360.0, 4400000.0, 13.8], [1435313088.0, 4420000.0, 13.84], [1435314816.0, 4440000.0, 13.88], [1435316544.0, 4460000.0, 13.92], [1435318272.0, 4480000.0, 13.96], [1435320000.0, 4500000.0, 14.0], [1435321728.0, 4520000.0, 14.040000000000001], [1435323456.0, 4540000.0, 14.08], [1435325184.0, 4560000.0, 14.120000000000001], [1435326912.0, 4580000.0, 14.16], [1435328640.0, 4600000.0, 14.200000000000001], [1435330368.0, 4620000.0, 14.24], [1435332096.0, 4640000.0, 14.280000000000001], [1435333824.0, 4660000.0, 14.32], [1435335552.0, 4680000.0, 14.360000000000001], [1435337280.0, 4700000.0, 14.399999999999999], [1435339008.0, 4720000.0, 14.44], [1435340736.0, 4740000.0, 14.48], [1435342464.0, 4760000.0, 14.52], [1435344192.0, 4780000.0, 14.559999999999999], [1435345920.0, 4800000.0, 14.6], [1435347648.0, 4820000.0, 14.64], [1435349376.0, 4840000.0, 14.68], [1435351104.0, 4860000.0, 14.719999999999999], [1435352832.0, 4880000.0, 14.76], [1435354560.0, 4900000.0, 14.8], [1435356288.0, 4920000.0, 14.84], [1435358016.0, 4940000.0, 14.879999999999999], [1435359744.0, 4960000.0, 14.92], [1435361472.0, 4980000.0, 14.96], [1435363200.0, 5000000.0, 15.0], [1435364928.0, 5020000.0, 15.04], [1435366656.0, 5040000.0, 15.08], [1435368384.0, 5060000.0, 15.120000000000001], [1435370112.0, 5080000.0, 15.16], [1435371840.0, 5100000.0, 15.2], [1435373568.0, 5120000.0, 15.24], [1435375296.0, 5140000.0, 15.280000000000001], [1435377024.0, 5160000.0, 15.32], [1435378752.0, 5180000.0, 15.36], [1435380480.0, 5200000.0, 15.4], [1435382208.0, 5220000.0, 15.440000000000001], [1435383936.0, 5240000.0, 15.48], [1435385664.0, 5260000.0, 15.52], [1435387392.0, 5280000.0, 15.56], [1435389120.0, 5300000.0, 15.600000000000001], [1435390848.0, 5320000.0, 15.64], [1435392576.0, 5340000.0, 15.68], [1435394304.0, 5360000.0, 15.72], [1435396032.0, 5380000.0, 15.760000000000002], [1435397760.0, 5400000.0, 15.8], [1435399488.0, 5420000.0, 15.84], [1435401216.0, 5440000.0, 15.88], [1435402944.0, 5460000.0, 15.920000000000002], [1435404672.0, 5480000.0, 15.96], [1435406400.0, 5500000.0, 16.0], [1435408128.0, 5520000.0, 16.04], [1435409856.0, 5540000.0, 16.080000000000002], [1435411584.0, 5560000.0, 16.12], [1435413312.0, 5580000.0, 16.16], [1435415040.0, 5600000.0, 16.200000000000003], [1435416768.0, 5620000.0, 16.240000000000002], [1435418496.0, 5640000.0, 16.28], [1435420224.0, 5660000.0, 16.32], [1435421952.0, 5680000.0, 16.36], [1435423680.0, 5700000.0, 16.4], [1435425408.0, 5720000.0, 16.439999999999998], [1435427136.0, 5740000.0, 16.479999999999997], [1435428864.0, 5760000.0, 16.52], [1435430592.0, 5780000.0, 16.56], [1435432320.0, 5800000.0, 16.6], [1435434048.0, 5820000.0, 16.64], [1435435776.0, 5840000.0, 16.68], [1435437504.0, 5860000.0, 16.72], [1435439232.0, 5880000.0, 16.759999999999998], [1435440960.0, 5900000.0, 16.799999999999997], [1435442688.0, 5920000.0, 16.84], [1435444416.0, 
5940000.0, 16.88], [1435446144.0, 5960000.0, 16.92], [1435447872.0, 5980000.0, 16.96], [1435449600.0, 6000000.0, 17.0], [1435451328.0, 6020000.0, 17.04], [1435453056.0, 6040000.0, 17.08], [1435454784.0, 6060000.0, 17.119999999999997], [1435456512.0, 6080000.0, 17.16], [1435458240.0, 6100000.0, 17.2], [1435459968.0, 6120000.0, 17.240000000000002], [1435461696.0, 6140000.0, 17.28], [1435463424.0, 6160000.0, 17.32], [1435465152.0, 6180000.0, 17.36], [1435466880.0, 6200000.0, 17.4], [1435468608.0, 6220000.0, 17.439999999999998], [1435470336.0, 6240000.0, 17.48], [1435472064.0, 6260000.0, 17.52], [1435473792.0, 6280000.0, 17.560000000000002], [1435475520.0, 6300000.0, 17.6], [1435477248.0, 6320000.0, 17.64], [1435478976.0, 6340000.0, 17.68], [1435480704.0, 6360000.0, 17.72], [1435482432.0, 6380000.0, 17.759999999999998], [1435484160.0, 6400000.0, 17.8], [1435485888.0, 6420000.0, 17.84], [1435487616.0, 6440000.0, 17.880000000000003], [1435489344.0, 6460000.0, 17.92], [1435491072.0, 6480000.0, 17.96], [1435492800.0, 6500000.0, 18.0], [1435494528.0, 6520000.0, 18.04], [1435496256.0, 6540000.0, 18.08], [1435497984.0, 6560000.0, 18.12], [1435499712.0, 6580000.0, 18.16], [1435501440.0, 6600000.0, 18.200000000000003], [1435503168.0, 6620000.0, 18.240000000000002], [1435504896.0, 6640000.0, 18.28], [1435506624.0, 6660000.0, 18.32], [1435508352.0, 6680000.0, 18.36], [1435510080.0, 6700000.0, 18.4], [1435511808.0, 6720000.0, 18.44], [1435513536.0, 6740000.0, 18.48], [1435515264.0, 6760000.0, 18.520000000000003], [1435516992.0, 6780000.0, 18.560000000000002], [1435518720.0, 6800000.0, 18.6], [1435520448.0, 6820000.0, 18.64], [1435522176.0, 6840000.0, 18.68], [1435523904.0, 6860000.0, 18.72], [1435525632.0, 6880000.0, 18.759999999999998], [1435527360.0, 6900000.0, 18.799999999999997], [1435529088.0, 6920000.0, 18.84], [1435530816.0, 6940000.0, 18.88], [1435532544.0, 6960000.0, 18.919999999999998], [1435534272.0, 6980000.0, 18.96], [1435536000.0, 7000000.0, 19.0], [1435537728.0, 7020000.0, 19.04], [1435539456.0, 7040000.0, 19.08], [1435541184.0, 7060000.0, 19.119999999999997], [1435542912.0, 7080000.0, 19.16], [1435544640.0, 7100000.0, 19.2], [1435546368.0, 7120000.0, 19.24], [1435548096.0, 7140000.0, 19.28], [1435549824.0, 7160000.0, 19.32], [1435551552.0, 7180000.0, 19.36], [1435553280.0, 7200000.0, 19.4], [1435555008.0, 7220000.0, 19.439999999999998], [1435556736.0, 7240000.0, 19.48], [1435558464.0, 7260000.0, 19.52], [1435560192.0, 7280000.0, 19.56], [1435561920.0, 7300000.0, 19.6], [1435563648.0, 7320000.0, 19.64], [1435565376.0, 7340000.0, 19.68], [1435567104.0, 7360000.0, 19.72], [1435568832.0, 7380000.0, 19.759999999999998], [1435570560.0, 7400000.0, 19.8], [1435572288.0, 7420000.0, 19.84], [1435574016.0, 7440000.0, 19.88], [1435575744.0, 7460000.0, 19.92], [1435577472.0, 7480000.0, 19.96], [1435579200.0, 7500000.0, 20.0], [1435580928.0, 7520000.0, 20.04], [1435582656.0, 7540000.0, 20.08], [1435584384.0, 7560000.0, 20.12], [1435586112.0, 7580000.0, 20.16], [1435587840.0, 7600000.0, 20.2], [1435589568.0, 7620000.0, 20.240000000000002], [1435591296.0, 7640000.0, 20.28], [1435593024.0, 7660000.0, 20.32], [1435594752.0, 7680000.0, 20.36], [1435596480.0, 7700000.0, 20.4], [1435598208.0, 7720000.0, 20.44], [1435599936.0, 7740000.0, 20.48], [1435601664.0, 7760000.0, 20.52], [1435603392.0, 7780000.0, 20.560000000000002], [1435605120.0, 7800000.0, 20.6], [1435606848.0, 7820000.0, 20.64], [1435608576.0, 7840000.0, 20.68], [1435610304.0, 7860000.0, 20.72], [1435612032.0, 7880000.0, 20.76], [1435613760.0, 
7900000.0, 20.8], [1435615488.0, 7920000.0, 20.84], [1435617216.0, 7940000.0, 20.880000000000003], [1435618944.0, 7960000.0, 20.92], [1435620672.0, 7980000.0, 20.96], [1435622400.0, 8000000.0, 21.0], [1435624128.0, 8020000.0, 21.04], [1435625856.0, 8040000.0, 21.080000000000002], [1435627584.0, 8060000.0, 21.12], [1435629312.0, 8080000.0, 21.16], [1435631040.0, 8100000.0, 21.200000000000003], [1435632768.0, 8120000.0, 21.240000000000002], [1435634496.0, 8140000.0, 21.279999999999998], [1435636224.0, 8160000.0, 21.32], [1435637952.0, 8180000.0, 21.36], [1435639680.0, 8200000.0, 21.4], [1435641408.0, 8220000.0, 21.439999999999998], [1435643136.0, 8240000.0, 21.48], [1435644864.0, 8260000.0, 21.52], [1435646592.0, 8280000.0, 21.56], [1435648320.0, 8300000.0, 21.599999999999998], [1435650048.0, 8320000.0, 21.64], [1435651776.0, 8340000.0, 21.68], [1435653504.0, 8360000.0, 21.72], [1435655232.0, 8380000.0, 21.759999999999998], [1435656960.0, 8400000.0, 21.8], [1435658688.0, 8420000.0, 21.84], [1435660416.0, 8440000.0, 21.88], [1435662144.0, 8460000.0, 21.919999999999998], [1435663872.0, 8480000.0, 21.96], [1435665600.0, 8500000.0, 22.0], [1435667328.0, 8520000.0, 22.04], [1435669056.0, 8540000.0, 22.08], [1435670784.0, 8560000.0, 22.12], [1435672512.0, 8580000.0, 22.16], [1435674240.0, 8600000.0, 22.2], [1435675968.0, 8620000.0, 22.24], [1435677696.0, 8640000.0, 22.28], [1435679424.0, 8660000.0, 22.32], [1435681152.0, 8680000.0, 22.36], [1435682880.0, 8700000.0, 22.4], [1435684608.0, 8720000.0, 22.44], [1435686336.0, 8740000.0, 22.48], [1435688064.0, 8760000.0, 22.52], [1435689792.0, 8780000.0, 22.56], [1435691520.0, 8800000.0, 22.6], [1435693248.0, 8820000.0, 22.64], [1435694976.0, 8840000.0, 22.68], [1435696704.0, 8860000.0, 22.72], [1435698432.0, 8880000.0, 22.76], [1435700160.0, 8900000.0, 22.8], [1435701888.0, 8920000.0, 22.84], [1435703616.0, 8940000.0, 22.88], [1435705344.0, 8960000.0, 22.92], [1435707072.0, 8980000.0, 22.96], [1435708800.0, 9000000.0, 23.0], [1435710528.0, 9020000.0, 23.04], [1435712256.0, 9040000.0, 23.080000000000002], [1435713984.0, 9060000.0, 23.12], [1435715712.0, 9080000.0, 23.16], [1435717440.0, 9100000.0, 23.2], [1435719168.0, 9120000.0, 23.240000000000002], [1435720896.0, 9140000.0, 23.28], [1435722624.0, 9160000.0, 23.32], [1435724352.0, 9180000.0, 23.36], [1435726080.0, 9200000.0, 23.400000000000002], [1435727808.0, 9220000.0, 23.44], [1435729536.0, 9240000.0, 23.48], [1435731264.0, 9260000.0, 23.52], [1435732992.0, 9280000.0, 23.560000000000002], [1435734720.0, 9300000.0, 23.6], [1435736448.0, 9320000.0, 23.64], [1435738176.0, 9340000.0, 23.68], [1435739904.0, 9360000.0, 23.720000000000002], [1435741632.0, 9380000.0, 23.759999999999998], [1435743360.0, 9400000.0, 23.799999999999997], [1435745088.0, 9420000.0, 23.84], [1435746816.0, 9440000.0, 23.88], [1435748544.0, 9460000.0, 23.919999999999998], [1435750272.0, 9480000.0, 23.96], [1435752000.0, 9500000.0, 24.0], [1435753728.0, 9520000.0, 24.04], [1435755456.0, 9540000.0, 24.08], [1435757184.0, 9560000.0, 24.119999999999997], [1435758912.0, 9580000.0, 24.16], [1435760640.0, 9600000.0, 24.2], [1435762368.0, 9620000.0, 24.24], [1435764096.0, 9640000.0, 24.28], [1435765824.0, 9660000.0, 24.32], [1435767552.0, 9680000.0, 24.36], [1435769280.0, 9700000.0, 24.4], [1435771008.0, 9720000.0, 24.439999999999998], [1435772736.0, 9740000.0, 24.48], [1435774464.0, 9760000.0, 24.52], [1435776192.0, 9780000.0, 24.56], [1435777920.0, 9800000.0, 24.6], [1435779648.0, 9820000.0, 24.64], [1435781376.0, 9840000.0, 24.68], 
[1435783104.0, 9860000.0, 24.72], [1435784832.0, 9880000.0, 24.759999999999998], [1435786560.0, 9900000.0, 24.8], [1435788288.0, 9920000.0, 24.84], [1435790016.0, 9940000.0, 24.88], [1435791744.0, 9960000.0, 24.92], [1435793472.0, 9980000.0, 24.96], [1435795200.0, 10000000.0, 25.0], [1435796928.0, 10020000.0, 25.04], [1435798656.0, 10040000.0, 25.08], [1435800384.0, 10060000.0, 25.12], [1435802112.0, 10080000.0, 25.16], [1435803840.0, 10100000.0, 25.2], [1435805568.0, 10120000.0, 25.240000000000002], [1435807296.0, 10140000.0, 25.28], [1435809024.0, 10160000.0, 25.32], [1435810752.0, 10180000.0, 25.36], [1435812480.0, 10200000.0, 25.4], [1435814208.0, 10220000.0, 25.44], [1435815936.0, 10240000.0, 25.48], [1435817664.0, 10260000.0, 25.52], [1435819392.0, 10280000.0, 25.560000000000002], [1435821120.0, 10300000.0, 25.6], [1435822848.0, 10320000.0, 25.64], [1435824576.0, 10340000.0, 25.68], [1435826304.0, 10360000.0, 25.72], [1435828032.0, 10380000.0, 25.76], [1435829760.0, 10400000.0, 25.8], [1435831488.0, 10420000.0, 25.84], [1435833216.0, 10440000.0, 25.880000000000003], [1435834944.0, 10460000.0, 25.92], [1435836672.0, 10480000.0, 25.96], [1435838400.0, 10500000.0, 26.0], [1435840128.0, 10520000.0, 26.04], [1435841856.0, 10540000.0, 26.080000000000002], [1435843584.0, 10560000.0, 26.12], [1435845312.0, 10580000.0, 26.16], [1435847040.0, 10600000.0, 26.200000000000003], [1435848768.0, 10620000.0, 26.240000000000002], [1435850496.0, 10640000.0, 26.28], [1435852224.0, 10660000.0, 26.32], [1435853952.0, 10680000.0, 26.36], [1435855680.0, 10700000.0, 26.400000000000002], [1435857408.0, 10720000.0, 26.44], [1435859136.0, 10740000.0, 26.48], [1435860864.0, 10760000.0, 26.520000000000003], [1435862592.0, 10780000.0, 26.560000000000002], [1435864320.0, 10800000.0, 26.6], [1435866048.0, 10820000.0, 26.64], [1435867776.0, 10840000.0, 26.68], [1435869504.0, 10860000.0, 26.720000000000002], [1435871232.0, 10880000.0, 26.76], [1435872960.0, 10900000.0, 26.8], [1435874688.0, 10920000.0, 26.840000000000003], [1435876416.0, 10940000.0, 26.880000000000003], [1435878144.0, 10960000.0, 26.92], [1435879872.0, 10980000.0, 26.96], [1435881600.0, 11000000.0, 27.0], [1435883328.0, 11020000.0, 27.040000000000003], [1435885056.0, 11040000.0, 27.080000000000002], [1435886784.0, 11060000.0, 27.12], [1435888512.0, 11080000.0, 27.160000000000004], [1435890240.0, 11100000.0, 27.200000000000003], [1435891968.0, 11120000.0, 27.240000000000002], [1435893696.0, 11140000.0, 27.28], [1435895424.0, 11160000.0, 27.32], [1435897152.0, 11180000.0, 27.360000000000003], [1435898880.0, 11200000.0, 27.400000000000002], [1435900608.0, 11220000.0, 27.44], [1435902336.0, 11240000.0, 27.480000000000004], [1435904064.0, 11260000.0, 27.519999999999996], [1435905792.0, 11280000.0, 27.56], [1435907520.0, 11300000.0, 27.599999999999998], [1435909248.0, 11320000.0, 27.639999999999997], [1435910976.0, 11340000.0, 27.68], [1435912704.0, 11360000.0, 27.72], [1435914432.0, 11380000.0, 27.759999999999998], [1435916160.0, 11400000.0, 27.799999999999997], [1435917888.0, 11420000.0, 27.839999999999996], [1435919616.0, 11440000.0, 27.88], [1435921344.0, 11460000.0, 27.919999999999998], [1435923072.0, 11480000.0, 27.959999999999997], [1435924800.0, 11500000.0, 28.0], [1435926528.0, 11520000.0, 28.04], [1435928256.0, 11540000.0, 28.08], [1435929984.0, 11560000.0, 28.119999999999997], [1435931712.0, 11580000.0, 28.159999999999997], [1435933440.0, 11600000.0, 28.2], [1435935168.0, 11620000.0, 28.24], [1435936896.0, 11640000.0, 28.279999999999998], 
[1435938624.0, 11660000.0, 28.32], [1435940352.0, 11680000.0, 28.36], [1435942080.0, 11700000.0, 28.4], [1435943808.0, 11720000.0, 28.439999999999998], [1435945536.0, 11740000.0, 28.479999999999997], [1435947264.0, 11760000.0, 28.52], [1435948992.0, 11780000.0, 28.56], [1435950720.0, 11800000.0, 28.599999999999998], [1435952448.0, 11820000.0, 28.64], [1435954176.0, 11840000.0, 28.68], [1435955904.0, 11860000.0, 28.72], [1435957632.0, 11880000.0, 28.759999999999998], [1435959360.0, 11900000.0, 28.799999999999997], [1435961088.0, 11920000.0, 28.84], [1435962816.0, 11940000.0, 28.88], [1435964544.0, 11960000.0, 28.919999999999998], [1435966272.0, 11980000.0, 28.96], [1435968000.0, 12000000.0, 29.0], [1435969728.0, 12020000.0, 29.04], [1435971456.0, 12040000.0, 29.08], [1435973184.0, 12060000.0, 29.119999999999997], [1435974912.0, 12080000.0, 29.16], [1435976640.0, 12100000.0, 29.2], [1435978368.0, 12120000.0, 29.24], [1435980096.0, 12140000.0, 29.28], [1435981824.0, 12160000.0, 29.32], [1435983552.0, 12180000.0, 29.36], [1435985280.0, 12200000.0, 29.4], [1435987008.0, 12220000.0, 29.439999999999998], [1435988736.0, 12240000.0, 29.48], [1435990464.0, 12260000.0, 29.52], [1435992192.0, 12280000.0, 29.56], [1435993920.0, 12300000.0, 29.6], [1435995648.0, 12320000.0, 29.64], [1435997376.0, 12340000.0, 29.68], [1435999104.0, 12360000.0, 29.72], [1436000832.0, 12380000.0, 29.759999999999998], [1436002560.0, 12400000.0, 29.8], [1436004288.0, 12420000.0, 29.84], [1436006016.0, 12440000.0, 29.88], [1436007744.0, 12460000.0, 29.92], [1436009472.0, 12480000.0, 29.96], [1436011200.0, 12500000.0, 30.0], [1436012928.0, 12520000.0, 30.04], [1436014656.0, 12540000.0, 30.08], [1436016384.0, 12560000.0, 30.12], [1436018112.0, 12580000.0, 30.16], [1436019840.0, 12600000.0, 30.2], [1436021568.0, 12620000.0, 30.240000000000002], [1436023296.0, 12640000.0, 30.28], [1436025024.0, 12660000.0, 30.32], [1436026752.0, 12680000.0, 30.36], [1436028480.0, 12700000.0, 30.4], [1436030208.0, 12720000.0, 30.44], [1436031936.0, 12740000.0, 30.48], [1436033664.0, 12760000.0, 30.52], [1436035392.0, 12780000.0, 30.560000000000002], [1436037120.0, 12800000.0, 30.6], [1436038848.0, 12820000.0, 30.64], [1436040576.0, 12840000.0, 30.68], [1436042304.0, 12860000.0, 30.72], [1436044032.0, 12880000.0, 30.76], [1436045760.0, 12900000.0, 30.8], [1436047488.0, 12920000.0, 30.84], [1436049216.0, 12940000.0, 30.880000000000003], [1436050944.0, 12960000.0, 30.92], [1436052672.0, 12980000.0, 30.96], [1436054400.0, 13000000.0, 31.0], [1436056128.0, 13020000.0, 31.04], [1436057856.0, 13040000.0, 31.080000000000002], [1436059584.0, 13060000.0, 31.12], [1436061312.0, 13080000.0, 31.16], [1436063040.0, 13100000.0, 31.200000000000003], [1436064768.0, 13120000.0, 31.240000000000002], [1436066496.0, 13140000.0, 31.28], [1436068224.0, 13160000.0, 31.32], [1436069952.0, 13180000.0, 31.36], [1436071680.0, 13200000.0, 31.400000000000002], [1436073408.0, 13220000.0, 31.44], [1436075136.0, 13240000.0, 31.48], [1436076864.0, 13260000.0, 31.520000000000003], [1436078592.0, 13280000.0, 31.560000000000002], [1436080320.0, 13300000.0, 31.6], [1436082048.0, 13320000.0, 31.64], [1436083776.0, 13340000.0, 31.68], [1436085504.0, 13360000.0, 31.720000000000002], [1436087232.0, 13380000.0, 31.76], [1436088960.0, 13400000.0, 31.8], [1436090688.0, 13420000.0, 31.840000000000003], [1436092416.0, 13440000.0, 31.880000000000003], [1436094144.0, 13460000.0, 31.92], [1436095872.0, 13480000.0, 31.96], [1436097600.0, 13500000.0, 32.0], [1436099328.0, 13520000.0, 
32.040000000000006], [1436101056.0, 13540000.0, 32.08], [1436102784.0, 13560000.0, 32.120000000000005], [1436104512.0, 13580000.0, 32.160000000000004], [1436106240.0, 13600000.0, 32.2], [1436107968.0, 13620000.0, 32.24], [1436109696.0, 13640000.0, 32.28], [1436111424.0, 13660000.0, 32.32], [1436113152.0, 13680000.0, 32.36], [1436114880.0, 13700000.0, 32.400000000000006], [1436116608.0, 13720000.0, 32.44], [1436118336.0, 13740000.0, 32.480000000000004], [1436120064.0, 13760000.0, 32.519999999999996], [1436121792.0, 13780000.0, 32.56], [1436123520.0, 13800000.0, 32.599999999999994], [1436125248.0, 13820000.0, 32.64], [1436126976.0, 13840000.0, 32.68], [1436128704.0, 13860000.0, 32.72], [1436130432.0, 13880000.0, 32.76], [1436132160.0, 13900000.0, 32.8], [1436133888.0, 13920000.0, 32.839999999999996], [1436135616.0, 13940000.0, 32.879999999999995], [1436137344.0, 13960000.0, 32.92], [1436139072.0, 13980000.0, 32.959999999999994], [1436140800.0, 14000000.0, 33.0], [1436142528.0, 14020000.0, 33.04], [1436144256.0, 14040000.0, 33.08], [1436145984.0, 14060000.0, 33.12], [1436147712.0, 14080000.0, 33.16], [1436149440.0, 14100000.0, 33.2], [1436151168.0, 14120000.0, 33.239999999999995], [1436152896.0, 14140000.0, 33.28], [1436154624.0, 14160000.0, 33.32], [1436156352.0, 14180000.0, 33.36], [1436158080.0, 14200000.0, 33.4], [1436159808.0, 14220000.0, 33.44], [1436161536.0, 14240000.0, 33.48], [1436163264.0, 14260000.0, 33.519999999999996], [1436164992.0, 14280000.0, 33.56], [1436166720.0, 14300000.0, 33.599999999999994], [1436168448.0, 14320000.0, 33.64], [1436170176.0, 14340000.0, 33.68], [1436171904.0, 14360000.0, 33.72], [1436173632.0, 14380000.0, 33.76], [1436175360.0, 14400000.0, 33.8], [1436177088.0, 14420000.0, 33.84], [1436178816.0, 14440000.0, 33.879999999999995], [1436180544.0, 14460000.0, 33.92], [1436182272.0, 14480000.0, 33.96], [1436184000.0, 14500000.0, 34.0], [1436185728.0, 14520000.0, 34.04], [1436187456.0, 14540000.0, 34.08], [1436189184.0, 14560000.0, 34.12], [1436190912.0, 14580000.0, 34.16], [1436192640.0, 14600000.0, 34.2], [1436194368.0, 14620000.0, 34.239999999999995], [1436196096.0, 14640000.0, 34.28], [1436197824.0, 14660000.0, 34.32], [1436199552.0, 14680000.0, 34.36], [1436201280.0, 14700000.0, 34.4], [1436203008.0, 14720000.0, 34.44], [1436204736.0, 14740000.0, 34.480000000000004], [1436206464.0, 14760000.0, 34.519999999999996], [1436208192.0, 14780000.0, 34.56], [1436209920.0, 14800000.0, 34.6], [1436211648.0, 14820000.0, 34.64], [1436213376.0, 14840000.0, 34.68], [1436215104.0, 14860000.0, 34.72], [1436216832.0, 14880000.0, 34.76], [1436218560.0, 14900000.0, 34.8], [1436220288.0, 14920000.0, 34.84], [1436222016.0, 14940000.0, 34.879999999999995], [1436223744.0, 14960000.0, 34.92], [1436225472.0, 14980000.0, 34.96], [1436227200.0, 15000000.0, 35.0], [1436228928.0, 15020000.0, 35.04], [1436230656.0, 15040000.0, 35.08], [1436232384.0, 15060000.0, 35.120000000000005], [1436234112.0, 15080000.0, 35.16], [1436235840.0, 15100000.0, 35.2], [1436237568.0, 15120000.0, 35.24], [1436239296.0, 15140000.0, 35.28], [1436241024.0, 15160000.0, 35.32], [1436242752.0, 15180000.0, 35.36], [1436244480.0, 15200000.0, 35.4], [1436246208.0, 15220000.0, 35.44], [1436247936.0, 15240000.0, 35.480000000000004], [1436249664.0, 15260000.0, 35.519999999999996], [1436251392.0, 15280000.0, 35.56], [1436253120.0, 15300000.0, 35.6], [1436254848.0, 15320000.0, 35.64], [1436256576.0, 15340000.0, 35.68], [1436258304.0, 15360000.0, 35.72], [1436260032.0, 15380000.0, 35.760000000000005], [1436261760.0, 
15400000.0, 35.8], [1436263488.0, 15420000.0, 35.84], [1436265216.0, 15440000.0, 35.88], [1436266944.0, 15460000.0, 35.92], [1436268672.0, 15480000.0, 35.96], [1436270400.0, 15500000.0, 36.0], [1436272128.0, 15520000.0, 36.04], [1436273856.0, 15540000.0, 36.08], [1436275584.0, 15560000.0, 36.120000000000005], [1436277312.0, 15580000.0, 36.16], [1436279040.0, 15600000.0, 36.2], [1436280768.0, 15620000.0, 36.24], [1436282496.0, 15640000.0, 36.28], [1436284224.0, 15660000.0, 36.32], [1436285952.0, 15680000.0, 36.36], [1436287680.0, 15700000.0, 36.400000000000006], [1436289408.0, 15720000.0, 36.44], [1436291136.0, 15740000.0, 36.480000000000004], [1436292864.0, 15760000.0, 36.52], [1436294592.0, 15780000.0, 36.56], [1436296320.0, 15800000.0, 36.6], [1436298048.0, 15820000.0, 36.64], [1436299776.0, 15840000.0, 36.68], [1436301504.0, 15860000.0, 36.72], [1436303232.0, 15880000.0, 36.760000000000005], [1436304960.0, 15900000.0, 36.8], [1436306688.0, 15920000.0, 36.84], [1436308416.0, 15940000.0, 36.88], [1436310144.0, 15960000.0, 36.92], [1436311872.0, 15980000.0, 36.96], [1436313600.0, 16000000.0, 37.0], [1436315328.0, 16020000.0, 37.04], [1436317056.0, 16040000.0, 37.08], [1436318784.0, 16060000.0, 37.120000000000005], [1436320512.0, 16080000.0, 37.160000000000004], [1436322240.0, 16100000.0, 37.2], [1436323968.0, 16120000.0, 37.24], [1436325696.0, 16140000.0, 37.28], [1436327424.0, 16160000.0, 37.32], [1436329152.0, 16180000.0, 37.36], [1436330880.0, 16200000.0, 37.400000000000006], [1436332608.0, 16220000.0, 37.440000000000005], [1436334336.0, 16240000.0, 37.480000000000004], [1436336064.0, 16260000.0, 37.519999999999996], [1436337792.0, 16280000.0, 37.559999999999995], [1436339520.0, 16300000.0, 37.599999999999994], [1436341248.0, 16320000.0, 37.64], [1436342976.0, 16340000.0, 37.68], [1436344704.0, 16360000.0, 37.72], [1436346432.0, 16380000.0, 37.76], [1436348160.0, 16400000.0, 37.8], [1436349888.0, 16420000.0, 37.839999999999996], [1436351616.0, 16440000.0, 37.879999999999995], [1436353344.0, 16460000.0, 37.92], [1436355072.0, 16480000.0, 37.96], [1436356800.0, 16500000.0, 38.0], [1436358528.0, 16520000.0, 38.04], [1436360256.0, 16540000.0, 38.08], [1436361984.0, 16560000.0, 38.12], [1436363712.0, 16580000.0, 38.16], [1436365440.0, 16600000.0, 38.199999999999996], [1436367168.0, 16620000.0, 38.239999999999995], [1436368896.0, 16640000.0, 38.28], [1436370624.0, 16660000.0, 38.32], [1436372352.0, 16680000.0, 38.36], [1436374080.0, 16700000.0, 38.4], [1436375808.0, 16720000.0, 38.44], [1436377536.0, 16740000.0, 38.48], [1436379264.0, 16760000.0, 38.519999999999996], [1436380992.0, 16780000.0, 38.56], [1436382720.0, 16800000.0, 38.6], [1436384448.0, 16820000.0, 38.64], [1436386176.0, 16840000.0, 38.68], [1436387904.0, 16860000.0, 38.72], [1436389632.0, 16880000.0, 38.76], [1436391360.0, 16900000.0, 38.8], [1436393088.0, 16920000.0, 38.839999999999996], [1436394816.0, 16940000.0, 38.879999999999995], [1436396544.0, 16960000.0, 38.92], [1436398272.0, 16980000.0, 38.96], [1436400000.0, 17000000.0, 39.0], [1436401728.0, 17020000.0, 39.04], [1436403456.0, 17040000.0, 39.08], [1436405184.0, 17060000.0, 39.12], [1436406912.0, 17080000.0, 39.16], [1436408640.0, 17100000.0, 39.2], [1436410368.0, 17120000.0, 39.24], [1436412096.0, 17140000.0, 39.28], [1436413824.0, 17160000.0, 39.32], [1436415552.0, 17180000.0, 39.36], [1436417280.0, 17200000.0, 39.4], [1436419008.0, 17220000.0, 39.44], [1436420736.0, 17240000.0, 39.48], [1436422464.0, 17260000.0, 39.519999999999996], [1436424192.0, 17280000.0, 39.56], 
[1436425920.0, 17300000.0, 39.6], [1436427648.0, 17320000.0, 39.64], [1436429376.0, 17340000.0, 39.68], [1436431104.0, 17360000.0, 39.72], [1436432832.0, 17380000.0, 39.76], [1436434560.0, 17400000.0, 39.8], [1436436288.0, 17420000.0, 39.84], [1436438016.0, 17440000.0, 39.88], [1436439744.0, 17460000.0, 39.92], [1436441472.0, 17480000.0, 39.96], [1436443200.0, 17500000.0, 40.0], [1436444928.0, 17520000.0, 40.04], [1436446656.0, 17540000.0, 40.08], [1436448384.0, 17560000.0, 40.12], [1436450112.0, 17580000.0, 40.16], [1436451840.0, 17600000.0, 40.2], [1436453568.0, 17620000.0, 40.24], [1436455296.0, 17640000.0, 40.28], [1436457024.0, 17660000.0, 40.32], [1436458752.0, 17680000.0, 40.36], [1436460480.0, 17700000.0, 40.4], [1436462208.0, 17720000.0, 40.44], [1436463936.0, 17740000.0, 40.480000000000004], [1436465664.0, 17760000.0, 40.52], [1436467392.0, 17780000.0, 40.56], [1436469120.0, 17800000.0, 40.6], [1436470848.0, 17820000.0, 40.64], [1436472576.0, 17840000.0, 40.68], [1436474304.0, 17860000.0, 40.72], [1436476032.0, 17880000.0, 40.76], [1436477760.0, 17900000.0, 40.8], [1436479488.0, 17920000.0, 40.84], [1436481216.0, 17940000.0, 40.88], [1436482944.0, 17960000.0, 40.92], [1436484672.0, 17980000.0, 40.96], [1436486400.0, 18000000.0, 41.0], [1436488128.0, 18020000.0, 41.04], [1436489856.0, 18040000.0, 41.08], [1436491584.0, 18060000.0, 41.120000000000005], [1436493312.0, 18080000.0, 41.160000000000004], [1436495040.0, 18100000.0, 41.2], [1436496768.0, 18120000.0, 41.24], [1436498496.0, 18140000.0, 41.28], [1436500224.0, 18160000.0, 41.32], [1436501952.0, 18180000.0, 41.36], [1436503680.0, 18200000.0, 41.4], [1436505408.0, 18220000.0, 41.44], [1436507136.0, 18240000.0, 41.480000000000004], [1436508864.0, 18260000.0, 41.52], [1436510592.0, 18280000.0, 41.56], [1436512320.0, 18300000.0, 41.6], [1436514048.0, 18320000.0, 41.64], [1436515776.0, 18340000.0, 41.68], [1436517504.0, 18360000.0, 41.72], [1436519232.0, 18380000.0, 41.760000000000005], [1436520960.0, 18400000.0, 41.800000000000004], [1436522688.0, 18420000.0, 41.84], [1436524416.0, 18440000.0, 41.88], [1436526144.0, 18460000.0, 41.92], [1436527872.0, 18480000.0, 41.96], [1436529600.0, 18500000.0, 42.0], [1436531328.0, 18520000.0, 42.04], [1436533056.0, 18540000.0, 42.08], [1436534784.0, 18560000.0, 42.120000000000005], [1436536512.0, 18580000.0, 42.160000000000004], [1436538240.0, 18600000.0, 42.2], [1436539968.0, 18620000.0, 42.24], [1436541696.0, 18640000.0, 42.28], [1436543424.0, 18660000.0, 42.32], [1436545152.0, 18680000.0, 42.36], [1436546880.0, 18700000.0, 42.400000000000006], [1436548608.0, 18720000.0, 42.440000000000005], [1436550336.0, 18740000.0, 42.480000000000004], [1436552064.0, 18760000.0, 42.519999999999996], [1436553792.0, 18780000.0, 42.559999999999995], [1436555520.0, 18800000.0, 42.599999999999994], [1436557248.0, 18820000.0, 42.64], [1436558976.0, 18840000.0, 42.68], [1436560704.0, 18860000.0, 42.72], [1436562432.0, 18880000.0, 42.76], [1436564160.0, 18900000.0, 42.8], [1436565888.0, 18920000.0, 42.839999999999996], [1436567616.0, 18940000.0, 42.879999999999995], [1436569344.0, 18960000.0, 42.92], [1436571072.0, 18980000.0, 42.96], [1436572800.0, 19000000.0, 43.0], [1436574528.0, 19020000.0, 43.04], [1436576256.0, 19040000.0, 43.08], [1436577984.0, 19060000.0, 43.12], [1436579712.0, 19080000.0, 43.16], [1436581440.0, 19100000.0, 43.199999999999996], [1436583168.0, 19120000.0, 43.239999999999995], [1436584896.0, 19140000.0, 43.28], [1436586624.0, 19160000.0, 43.32], [1436588352.0, 19180000.0, 43.36], 
[1436590080.0, 19200000.0, 43.4], [1436591808.0, 19220000.0, 43.44], [1436593536.0, 19240000.0, 43.48], [1436595264.0, 19260000.0, 43.519999999999996], [1436596992.0, 19280000.0, 43.56], [1436598720.0, 19300000.0, 43.6], [1436600448.0, 19320000.0, 43.64], [1436602176.0, 19340000.0, 43.68], [1436603904.0, 19360000.0, 43.72], [1436605632.0, 19380000.0, 43.76], [1436607360.0, 19400000.0, 43.8], [1436609088.0, 19420000.0, 43.839999999999996], [1436610816.0, 19440000.0, 43.879999999999995], [1436612544.0, 19460000.0, 43.92], [1436614272.0, 19480000.0, 43.96], [1436616000.0, 19500000.0, 44.0], [1436617728.0, 19520000.0, 44.04], [1436619456.0, 19540000.0, 44.08], [1436621184.0, 19560000.0, 44.12], [1436622912.0, 19580000.0, 44.16], [1436624640.0, 19600000.0, 44.2], [1436626368.0, 19620000.0, 44.24], [1436628096.0, 19640000.0, 44.28], [1436629824.0, 19660000.0, 44.32], [1436631552.0, 19680000.0, 44.36], [1436633280.0, 19700000.0, 44.4], [1436635008.0, 19720000.0, 44.44], [1436636736.0, 19740000.0, 44.48], [1436638464.0, 19760000.0, 44.519999999999996], [1436640192.0, 19780000.0, 44.56], [1436641920.0, 19800000.0, 44.6], [1436643648.0, 19820000.0, 44.64], [1436645376.0, 19840000.0, 44.68], [1436647104.0, 19860000.0, 44.72], [1436648832.0, 19880000.0, 44.76], [1436650560.0, 19900000.0, 44.8], [1436652288.0, 19920000.0, 44.84], [1436654016.0, 19940000.0, 44.88], [1436655744.0, 19960000.0, 44.92], [1436657472.0, 19980000.0, 44.96], [1436659200.0, 20000000.0, 45.0], [1436660928.0, 20020000.0, 45.03999999999999], [1436662656.0, 20040000.0, 45.08], [1436664384.0, 20060000.0, 45.12], [1436666112.0, 20080000.0, 45.16], [1436667840.0, 20100000.0, 45.199999999999996], [1436669568.0, 20120000.0, 45.24], [1436671296.0, 20140000.0, 45.279999999999994], [1436673024.0, 20160000.0, 45.32], [1436674752.0, 20180000.0, 45.36], [1436676480.0, 20200000.0, 45.4], [1436678208.0, 20220000.0, 45.44], [1436679936.0, 20240000.0, 45.480000000000004], [1436681664.0, 20260000.0, 45.519999999999996], [1436683392.0, 20280000.0, 45.56], [1436685120.0, 20300000.0, 45.599999999999994], [1436686848.0, 20320000.0, 45.64], [1436688576.0, 20340000.0, 45.67999999999999], [1436690304.0, 20360000.0, 45.72], [1436692032.0, 20380000.0, 45.76], [1436693760.0, 20400000.0, 45.8], [1436695488.0, 20420000.0, 45.839999999999996], [1436697216.0, 20440000.0, 45.88], [1436698944.0, 20460000.0, 45.919999999999995], [1436700672.0, 20480000.0, 45.96], [1436702400.0, 20500000.0, 46.0], [1436704128.0, 20520000.0, 46.04], [1436705856.0, 20540000.0, 46.08], [1436707584.0, 20560000.0, 46.120000000000005], [1436709312.0, 20580000.0, 46.16], [1436711040.0, 20600000.0, 46.2], [1436712768.0, 20620000.0, 46.239999999999995], [1436714496.0, 20640000.0, 46.28], [1436716224.0, 20660000.0, 46.31999999999999], [1436717952.0, 20680000.0, 46.36], [1436719680.0, 20700000.0, 46.4], [1436721408.0, 20720000.0, 46.44], [1436723136.0, 20740000.0, 46.48], [1436724864.0, 20760000.0, 46.52], [1436726592.0, 20780000.0, 46.559999999999995], [1436728320.0, 20800000.0, 46.6], [1436730048.0, 20820000.0, 46.64], [1436731776.0, 20840000.0, 46.68], [1436733504.0, 20860000.0, 46.72], [1436735232.0, 20880000.0, 46.760000000000005], [1436736960.0, 20900000.0, 46.8], [1436738688.0, 20920000.0, 46.84], [1436740416.0, 20940000.0, 46.879999999999995], [1436742144.0, 20960000.0, 46.92], [1436743872.0, 20980000.0, 46.959999999999994], [1436745600.0, 21000000.0, 47.0], [1436747328.0, 21020000.0, 47.04], [1436749056.0, 21040000.0, 47.08], [1436750784.0, 21060000.0, 47.12], [1436752512.0, 
21080000.0, 47.160000000000004], [1436754240.0, 21100000.0, 47.199999999999996], [1436755968.0, 21120000.0, 47.24], [1436757696.0, 21140000.0, 47.28], [1436759424.0, 21160000.0, 47.32], [1436761152.0, 21180000.0, 47.36], [1436762880.0, 21200000.0, 47.400000000000006], [1436764608.0, 21220000.0, 47.44], [1436766336.0, 21240000.0, 47.480000000000004], [1436768064.0, 21260000.0, 47.519999999999996], [1436769792.0, 21280000.0, 47.56], [1436771520.0, 21300000.0, 47.599999999999994], [1436773248.0, 21320000.0, 47.64], [1436774976.0, 21340000.0, 47.68], [1436776704.0, 21360000.0, 47.72], [1436778432.0, 21380000.0, 47.76], [1436780160.0, 21400000.0, 47.800000000000004], [1436781888.0, 21420000.0, 47.839999999999996], [1436783616.0, 21440000.0, 47.88], [1436785344.0, 21460000.0, 47.92], [1436787072.0, 21480000.0, 47.96], [1436788800.0, 21500000.0, 48.0], [1436790528.0, 21520000.0, 48.040000000000006], [1436792256.0, 21540000.0, 48.08], [1436793984.0, 21560000.0, 48.120000000000005], [1436795712.0, 21580000.0, 48.16], [1436797440.0, 21600000.0, 48.2], [1436799168.0, 21620000.0, 48.239999999999995], [1436800896.0, 21640000.0, 48.28], [1436802624.0, 21660000.0, 48.32], [1436804352.0, 21680000.0, 48.36], [1436806080.0, 21700000.0, 48.4], [1436807808.0, 21720000.0, 48.440000000000005], [1436809536.0, 21740000.0, 48.48], [1436811264.0, 21760000.0, 48.52], [1436812992.0, 21780000.0, 48.56], [1436814720.0, 21800000.0, 48.6], [1436816448.0, 21820000.0, 48.64], [1436818176.0, 21840000.0, 48.68000000000001], [1436819904.0, 21860000.0, 48.72], [1436821632.0, 21880000.0, 48.760000000000005], [1436823360.0, 21900000.0, 48.8], [1436825088.0, 21920000.0, 48.84], [1436826816.0, 21940000.0, 48.879999999999995], [1436828544.0, 21960000.0, 48.92], [1436830272.0, 21980000.0, 48.96], [1436832000.0, 22000000.0, 49.0], [1436833728.0, 22020000.0, 49.04], [1436835456.0, 22040000.0, 49.080000000000005], [1436837184.0, 22060000.0, 49.12], [1436838912.0, 22080000.0, 49.160000000000004], [1436840640.0, 22100000.0, 49.2], [1436842368.0, 22120000.0, 49.24], [1436844096.0, 22140000.0, 49.28], [1436845824.0, 22160000.0, 49.32000000000001], [1436847552.0, 22180000.0, 49.36], [1436849280.0, 22200000.0, 49.400000000000006], [1436851008.0, 22220000.0, 49.44], [1436852736.0, 22240000.0, 49.480000000000004], [1436854464.0, 22260000.0, 49.519999999999996], [1436856192.0, 22280000.0, 49.56], [1436857920.0, 22300000.0, 49.6], [1436859648.0, 22320000.0, 49.64], [1436861376.0, 22340000.0, 49.68], [1436863104.0, 22360000.0, 49.720000000000006], [1436864832.0, 22380000.0, 49.76], [1436866560.0, 22400000.0, 49.800000000000004], [1436868288.0, 22420000.0, 49.84], [1436870016.0, 22440000.0, 49.88], [1436871744.0, 22460000.0, 49.92], [1436873472.0, 22480000.0, 49.96000000000001], [1436875200.0, 22500000.0, 50.0], [1436876928.0, 22520000.0, 50.03999999999999], [1436878656.0, 22540000.0, 50.08], [1436880384.0, 22560000.0, 50.12], [1436882112.0, 22580000.0, 50.16], [1436883840.0, 22600000.0, 50.199999999999996], [1436885568.0, 22620000.0, 50.24], [1436887296.0, 22640000.0, 50.279999999999994], [1436889024.0, 22660000.0, 50.32], [1436890752.0, 22680000.0, 50.36], [1436892480.0, 22700000.0, 50.4], [1436894208.0, 22720000.0, 50.44], [1436895936.0, 22740000.0, 50.480000000000004], [1436897664.0, 22760000.0, 50.519999999999996], [1436899392.0, 22780000.0, 50.56], [1436901120.0, 22800000.0, 50.599999999999994], [1436902848.0, 22820000.0, 50.64], [1436904576.0, 22840000.0, 50.67999999999999], [1436906304.0, 22860000.0, 50.72], [1436908032.0, 22880000.0, 
50.76], [1436909760.0, 22900000.0, 50.8], [1436911488.0, 22920000.0, 50.839999999999996], [1436913216.0, 22940000.0, 50.88], [1436914944.0, 22960000.0, 50.919999999999995], [1436916672.0, 22980000.0, 50.96], [1436918400.0, 23000000.0, 51.0], [1436920128.0, 23020000.0, 51.04], [1436921856.0, 23040000.0, 51.08], [1436923584.0, 23060000.0, 51.120000000000005], [1436925312.0, 23080000.0, 51.16], [1436927040.0, 23100000.0, 51.2], [1436928768.0, 23120000.0, 51.239999999999995], [1436930496.0, 23140000.0, 51.28], [1436932224.0, 23160000.0, 51.31999999999999], [1436933952.0, 23180000.0, 51.36], [1436935680.0, 23200000.0, 51.4], [1436937408.0, 23220000.0, 51.44], [1436939136.0, 23240000.0, 51.48], [1436940864.0, 23260000.0, 51.52], [1436942592.0, 23280000.0, 51.559999999999995], [1436944320.0, 23300000.0, 51.6], [1436946048.0, 23320000.0, 51.64], [1436947776.0, 23340000.0, 51.68], [1436949504.0, 23360000.0, 51.72], [1436951232.0, 23380000.0, 51.760000000000005], [1436952960.0, 23400000.0, 51.8], [1436954688.0, 23420000.0, 51.84], [1436956416.0, 23440000.0, 51.879999999999995], [1436958144.0, 23460000.0, 51.92], [1436959872.0, 23480000.0, 51.959999999999994], [1436961600.0, 23500000.0, 52.0], [1436963328.0, 23520000.0, 52.04], [1436965056.0, 23540000.0, 52.08], [1436966784.0, 23560000.0, 52.12], [1436968512.0, 23580000.0, 52.160000000000004], [1436970240.0, 23600000.0, 52.199999999999996], [1436971968.0, 23620000.0, 52.24], [1436973696.0, 23640000.0, 52.28], [1436975424.0, 23660000.0, 52.32], [1436977152.0, 23680000.0, 52.36], [1436978880.0, 23700000.0, 52.400000000000006], [1436980608.0, 23720000.0, 52.44], [1436982336.0, 23740000.0, 52.480000000000004], [1436984064.0, 23760000.0, 52.519999999999996], [1436985792.0, 23780000.0, 52.56], [1436987520.0, 23800000.0, 52.599999999999994], [1436989248.0, 23820000.0, 52.64], [1436990976.0, 23840000.0, 52.68], [1436992704.0, 23860000.0, 52.72], [1436994432.0, 23880000.0, 52.76], [1436996160.0, 23900000.0, 52.800000000000004], [1436997888.0, 23920000.0, 52.839999999999996], [1436999616.0, 23940000.0, 52.88], [1437001344.0, 23960000.0, 52.92], [1437003072.0, 23980000.0, 52.96], [1437004800.0, 24000000.0, 53.0], [1437006528.0, 24020000.0, 53.040000000000006], [1437008256.0, 24040000.0, 53.08], [1437009984.0, 24060000.0, 53.120000000000005], [1437011712.0, 24080000.0, 53.16], [1437013440.0, 24100000.0, 53.2], [1437015168.0, 24120000.0, 53.239999999999995], [1437016896.0, 24140000.0, 53.28], [1437018624.0, 24160000.0, 53.32], [1437020352.0, 24180000.0, 53.36], [1437022080.0, 24200000.0, 53.4], [1437023808.0, 24220000.0, 53.440000000000005], [1437025536.0, 24240000.0, 53.48], [1437027264.0, 24260000.0, 53.52], [1437028992.0, 24280000.0, 53.56], [1437030720.0, 24300000.0, 53.6], [1437032448.0, 24320000.0, 53.64], [1437034176.0, 24340000.0, 53.68000000000001], [1437035904.0, 24360000.0, 53.72], [1437037632.0, 24380000.0, 53.760000000000005], [1437039360.0, 24400000.0, 53.8], [1437041088.0, 24420000.0, 53.84], [1437042816.0, 24440000.0, 53.879999999999995], [1437044544.0, 24460000.0, 53.92], [1437046272.0, 24480000.0, 53.96], [1437048000.0, 24500000.0, 54.0], [1437049728.0, 24520000.0, 54.04], [1437051456.0, 24540000.0, 54.080000000000005], [1437053184.0, 24560000.0, 54.12], [1437054912.0, 24580000.0, 54.160000000000004], [1437056640.0, 24600000.0, 54.2], [1437058368.0, 24620000.0, 54.24], [1437060096.0, 24640000.0, 54.28], [1437061824.0, 24660000.0, 54.32000000000001], [1437063552.0, 24680000.0, 54.36], [1437065280.0, 24700000.0, 54.400000000000006], [1437067008.0, 
24720000.0, 54.44], [1437068736.0, 24740000.0, 54.480000000000004], [1437070464.0, 24760000.0, 54.519999999999996], [1437072192.0, 24780000.0, 54.56], [1437073920.0, 24800000.0, 54.6], [1437075648.0, 24820000.0, 54.64], [1437077376.0, 24840000.0, 54.68], [1437079104.0, 24860000.0, 54.720000000000006], [1437080832.0, 24880000.0, 54.76], [1437082560.0, 24900000.0, 54.800000000000004], [1437084288.0, 24920000.0, 54.84], [1437086016.0, 24940000.0, 54.88], [1437087744.0, 24960000.0, 54.92], [1437089472.0, 24980000.0, 54.96000000000001], [1437091200.0, 25000000.0, 55.0], [1437092928.0, 25020000.0, 55.03999999999999], [1437094656.0, 25040000.0, 55.08], [1437096384.0, 25060000.0, 55.12], [1437098112.0, 25080000.0, 55.16], [1437099840.0, 25100000.0, 55.199999999999996], [1437101568.0, 25120000.0, 55.24], [1437103296.0, 25140000.0, 55.279999999999994], [1437105024.0, 25160000.0, 55.32], [1437106752.0, 25180000.0, 55.36], [1437108480.0, 25200000.0, 55.4], [1437110208.0, 25220000.0, 55.44], [1437111936.0, 25240000.0, 55.480000000000004], [1437113664.0, 25260000.0, 55.519999999999996], [1437115392.0, 25280000.0, 55.56], [1437117120.0, 25300000.0, 55.599999999999994], [1437118848.0, 25320000.0, 55.64], [1437120576.0, 25340000.0, 55.67999999999999], [1437122304.0, 25360000.0, 55.72], [1437124032.0, 25380000.0, 55.76], [1437125760.0, 25400000.0, 55.8], [1437127488.0, 25420000.0, 55.839999999999996], [1437129216.0, 25440000.0, 55.88], [1437130944.0, 25460000.0, 55.919999999999995], [1437132672.0, 25480000.0, 55.96], [1437134400.0, 25500000.0, 56.0], [1437136128.0, 25520000.0, 56.04], [1437137856.0, 25540000.0, 56.08], [1437139584.0, 25560000.0, 56.120000000000005], [1437141312.0, 25580000.0, 56.16], [1437143040.0, 25600000.0, 56.2], [1437144768.0, 25620000.0, 56.239999999999995], [1437146496.0, 25640000.0, 56.28], [1437148224.0, 25660000.0, 56.31999999999999], [1437149952.0, 25680000.0, 56.36], [1437151680.0, 25700000.0, 56.4], [1437153408.0, 25720000.0, 56.44], [1437155136.0, 25740000.0, 56.48], [1437156864.0, 25760000.0, 56.52], [1437158592.0, 25780000.0, 56.559999999999995], [1437160320.0, 25800000.0, 56.6], [1437162048.0, 25820000.0, 56.64], [1437163776.0, 25840000.0, 56.68], [1437165504.0, 25860000.0, 56.72], [1437167232.0, 25880000.0, 56.760000000000005], [1437168960.0, 25900000.0, 56.8], [1437170688.0, 25920000.0, 56.84], [1437172416.0, 25940000.0, 56.879999999999995], [1437174144.0, 25960000.0, 56.92], [1437175872.0, 25980000.0, 56.959999999999994], [1437177600.0, 26000000.0, 57.0], [1437179328.0, 26020000.0, 57.04], [1437181056.0, 26040000.0, 57.08], [1437182784.0, 26060000.0, 57.12], [1437184512.0, 26080000.0, 57.160000000000004], [1437186240.0, 26100000.0, 57.199999999999996], [1437187968.0, 26120000.0, 57.24], [1437189696.0, 26140000.0, 57.28], [1437191424.0, 26160000.0, 57.32], [1437193152.0, 26180000.0, 57.36], [1437194880.0, 26200000.0, 57.400000000000006], [1437196608.0, 26220000.0, 57.44], [1437198336.0, 26240000.0, 57.480000000000004], [1437200064.0, 26260000.0, 57.519999999999996], [1437201792.0, 26280000.0, 57.56], [1437203520.0, 26300000.0, 57.599999999999994], [1437205248.0, 26320000.0, 57.64], [1437206976.0, 26340000.0, 57.68], [1437208704.0, 26360000.0, 57.72], [1437210432.0, 26380000.0, 57.76], [1437212160.0, 26400000.0, 57.800000000000004], [1437213888.0, 26420000.0, 57.839999999999996], [1437215616.0, 26440000.0, 57.88], [1437217344.0, 26460000.0, 57.92], [1437219072.0, 26480000.0, 57.96], [1437220800.0, 26500000.0, 58.0], [1437222528.0, 26520000.0, 58.040000000000006], 
[1437224256.0, 26540000.0, 58.08], [1437225984.0, 26560000.0, 58.120000000000005], [1437227712.0, 26580000.0, 58.16], [1437229440.0, 26600000.0, 58.2], [1437231168.0, 26620000.0, 58.239999999999995], [1437232896.0, 26640000.0, 58.28], [1437234624.0, 26660000.0, 58.32], [1437236352.0, 26680000.0, 58.36], [1437238080.0, 26700000.0, 58.4], [1437239808.0, 26720000.0, 58.440000000000005], [1437241536.0, 26740000.0, 58.48], [1437243264.0, 26760000.0, 58.52], [1437244992.0, 26780000.0, 58.56], [1437246720.0, 26800000.0, 58.6], [1437248448.0, 26820000.0, 58.64], [1437250176.0, 26840000.0, 58.68000000000001], [1437251904.0, 26860000.0, 58.72], [1437253632.0, 26880000.0, 58.760000000000005], [1437255360.0, 26900000.0, 58.8], [1437257088.0, 26920000.0, 58.84], [1437258816.0, 26940000.0, 58.879999999999995], [1437260544.0, 26960000.0, 58.92], [1437262272.0, 26980000.0, 58.96], [1437264000.0, 27000000.0, 59.0], [1437265728.0, 27020000.0, 59.04], [1437267456.0, 27040000.0, 59.080000000000005], [1437269184.0, 27060000.0, 59.12], [1437270912.0, 27080000.0, 59.160000000000004], [1437272640.0, 27100000.0, 59.2], [1437274368.0, 27120000.0, 59.24], [1437276096.0, 27140000.0, 59.28], [1437277824.0, 27160000.0, 59.32000000000001], [1437279552.0, 27180000.0, 59.36], [1437281280.0, 27200000.0, 59.400000000000006], [1437283008.0, 27220000.0, 59.44], [1437284736.0, 27240000.0, 59.480000000000004], [1437286464.0, 27260000.0, 59.519999999999996], [1437288192.0, 27280000.0, 59.56], [1437289920.0, 27300000.0, 59.6], [1437291648.0, 27320000.0, 59.64], [1437293376.0, 27340000.0, 59.68], [1437295104.0, 27360000.0, 59.720000000000006], [1437296832.0, 27380000.0, 59.76], [1437298560.0, 27400000.0, 59.800000000000004], [1437300288.0, 27420000.0, 59.84], [1437302016.0, 27440000.0, 59.88], [1437303744.0, 27460000.0, 59.92], [1437305472.0, 27480000.0, 59.96000000000001], [1437307200.0, 27500000.0, 60.0], [1437308928.0, 27520000.0, 60.03999999999999], [1437310656.0, 27540000.0, 60.08], [1437312384.0, 27560000.0, 60.12], [1437314112.0, 27580000.0, 60.16], [1437315840.0, 27600000.0, 60.199999999999996], [1437317568.0, 27620000.0, 60.24], [1437319296.0, 27640000.0, 60.279999999999994], [1437321024.0, 27660000.0, 60.32], [1437322752.0, 27680000.0, 60.36], [1437324480.0, 27700000.0, 60.4], [1437326208.0, 27720000.0, 60.44], [1437327936.0, 27740000.0, 60.480000000000004], [1437329664.0, 27760000.0, 60.519999999999996], [1437331392.0, 27780000.0, 60.56], [1437333120.0, 27800000.0, 60.599999999999994], [1437334848.0, 27820000.0, 60.64], [1437336576.0, 27840000.0, 60.67999999999999], [1437338304.0, 27860000.0, 60.72], [1437340032.0, 27880000.0, 60.76], [1437341760.0, 27900000.0, 60.8], [1437343488.0, 27920000.0, 60.839999999999996], [1437345216.0, 27940000.0, 60.88], [1437346944.0, 27960000.0, 60.919999999999995], [1437348672.0, 27980000.0, 60.96], [1437350400.0, 28000000.0, 61.0], [1437352128.0, 28020000.0, 61.04], [1437353856.0, 28040000.0, 61.08], [1437355584.0, 28060000.0, 61.120000000000005], [1437357312.0, 28080000.0, 61.16], [1437359040.0, 28100000.0, 61.2], [1437360768.0, 28120000.0, 61.239999999999995], [1437362496.0, 28140000.0, 61.28], [1437364224.0, 28160000.0, 61.31999999999999], [1437365952.0, 28180000.0, 61.36], [1437367680.0, 28200000.0, 61.4], [1437369408.0, 28220000.0, 61.44], [1437371136.0, 28240000.0, 61.48], [1437372864.0, 28260000.0, 61.52], [1437374592.0, 28280000.0, 61.559999999999995], [1437376320.0, 28300000.0, 61.6], [1437378048.0, 28320000.0, 61.64], [1437379776.0, 28340000.0, 61.68], [1437381504.0, 
28360000.0, 61.72], [1437383232.0, 28380000.0, 61.760000000000005], [1437384960.0, 28400000.0, 61.8], [1437386688.0, 28420000.0, 61.84], [1437388416.0, 28440000.0, 61.879999999999995], [1437390144.0, 28460000.0, 61.92], [1437391872.0, 28480000.0, 61.959999999999994], [1437393600.0, 28500000.0, 62.0], [1437395328.0, 28520000.0, 62.04], [1437397056.0, 28540000.0, 62.08], [1437398784.0, 28560000.0, 62.12], [1437400512.0, 28580000.0, 62.160000000000004], [1437402240.0, 28600000.0, 62.199999999999996], [1437403968.0, 28620000.0, 62.24], [1437405696.0, 28640000.0, 62.28], [1437407424.0, 28660000.0, 62.32], [1437409152.0, 28680000.0, 62.36], [1437410880.0, 28700000.0, 62.400000000000006], [1437412608.0, 28720000.0, 62.44], [1437414336.0, 28740000.0, 62.480000000000004], [1437416064.0, 28760000.0, 62.519999999999996], [1437417792.0, 28780000.0, 62.56], [1437419520.0, 28800000.0, 62.599999999999994], [1437421248.0, 28820000.0, 62.64], [1437422976.0, 28840000.0, 62.68], [1437424704.0, 28860000.0, 62.72], [1437426432.0, 28880000.0, 62.76], [1437428160.0, 28900000.0, 62.800000000000004], [1437429888.0, 28920000.0, 62.839999999999996], [1437431616.0, 28940000.0, 62.88], [1437433344.0, 28960000.0, 62.92], [1437435072.0, 28980000.0, 62.96], [1437436800.0, 29000000.0, 63.0], [1437438528.0, 29020000.0, 63.040000000000006], [1437440256.0, 29040000.0, 63.08], [1437441984.0, 29060000.0, 63.120000000000005], [1437443712.0, 29080000.0, 63.16], [1437445440.0, 29100000.0, 63.2], [1437447168.0, 29120000.0, 63.239999999999995], [1437448896.0, 29140000.0, 63.28], [1437450624.0, 29160000.0, 63.32], [1437452352.0, 29180000.0, 63.36], [1437454080.0, 29200000.0, 63.4], [1437455808.0, 29220000.0, 63.440000000000005], [1437457536.0, 29240000.0, 63.48], [1437459264.0, 29260000.0, 63.52], [1437460992.0, 29280000.0, 63.56], [1437462720.0, 29300000.0, 63.6], [1437464448.0, 29320000.0, 63.64], [1437466176.0, 29340000.0, 63.68000000000001], [1437467904.0, 29360000.0, 63.72], [1437469632.0, 29380000.0, 63.760000000000005], [1437471360.0, 29400000.0, 63.8], [1437473088.0, 29420000.0, 63.84], [1437474816.0, 29440000.0, 63.879999999999995], [1437476544.0, 29460000.0, 63.92], [1437478272.0, 29480000.0, 63.96], [1437480000.0, 29500000.0, 64.0], [1437481728.0, 29520000.0, 64.03999999999999], [1437483456.0, 29540000.0, 64.08000000000001], [1437485184.0, 29560000.0, 64.12], [1437486912.0, 29580000.0, 64.16], [1437488640.0, 29600000.0, 64.2], [1437490368.0, 29620000.0, 64.24000000000001], [1437492096.0, 29640000.0, 64.28], [1437493824.0, 29660000.0, 64.32000000000001], [1437495552.0, 29680000.0, 64.36], [1437497280.0, 29700000.0, 64.4], [1437499008.0, 29720000.0, 64.44], [1437500736.0, 29740000.0, 64.48], [1437502464.0, 29760000.0, 64.52], [1437504192.0, 29780000.0, 64.56], [1437505920.0, 29800000.0, 64.6], [1437507648.0, 29820000.0, 64.64], [1437509376.0, 29840000.0, 64.68], [1437511104.0, 29860000.0, 64.72], [1437512832.0, 29880000.0, 64.75999999999999], [1437514560.0, 29900000.0, 64.80000000000001], [1437516288.0, 29920000.0, 64.84], [1437518016.0, 29940000.0, 64.88], [1437519744.0, 29960000.0, 64.92], [1437521472.0, 29980000.0, 64.96000000000001], [1437523200.0, 30000000.0, 65.0], [1437524928.0, 30020000.0, 65.03999999999999], [1437526656.0, 30040000.0, 65.08], [1437528384.0, 30060000.0, 65.12], [1437530112.0, 30080000.0, 65.16], [1437531840.0, 30100000.0, 65.19999999999999], [1437533568.0, 30120000.0, 65.24000000000001], [1437535296.0, 30140000.0, 65.28], [1437537024.0, 30160000.0, 65.32], [1437538752.0, 30180000.0, 65.36], 
[1437540480.0, 30200000.0, 65.4], [1437542208.0, 30220000.0, 65.44], [1437543936.0, 30240000.0, 65.48], [1437545664.0, 30260000.0, 65.52], [1437547392.0, 30280000.0, 65.56], [1437549120.0, 30300000.0, 65.6], [1437550848.0, 30320000.0, 65.64], [1437552576.0, 30340000.0, 65.67999999999999], [1437554304.0, 30360000.0, 65.72], [1437556032.0, 30380000.0, 65.75999999999999], [1437557760.0, 30400000.0, 65.8], [1437559488.0, 30420000.0, 65.84], [1437561216.0, 30440000.0, 65.88], [1437562944.0, 30460000.0, 65.91999999999999], [1437564672.0, 30480000.0, 65.96000000000001], [1437566400.0, 30500000.0, 66.0], [1437568128.0, 30520000.0, 66.03999999999999], [1437569856.0, 30540000.0, 66.08], [1437571584.0, 30560000.0, 66.12], [1437573312.0, 30580000.0, 66.16], [1437575040.0, 30600000.0, 66.2], [1437576768.0, 30620000.0, 66.24], [1437578496.0, 30640000.0, 66.28], [1437580224.0, 30660000.0, 66.32], [1437581952.0, 30680000.0, 66.36], [1437583680.0, 30700000.0, 66.4], [1437585408.0, 30720000.0, 66.44], [1437587136.0, 30740000.0, 66.47999999999999], [1437588864.0, 30760000.0, 66.52000000000001], [1437590592.0, 30780000.0, 66.56], [1437592320.0, 30800000.0, 66.6], [1437594048.0, 30820000.0, 66.64], [1437595776.0, 30840000.0, 66.68], [1437597504.0, 30860000.0, 66.72], [1437599232.0, 30880000.0, 66.76], [1437600960.0, 30900000.0, 66.8], [1437602688.0, 30920000.0, 66.84], [1437604416.0, 30940000.0, 66.88], [1437606144.0, 30960000.0, 66.92], [1437607872.0, 30980000.0, 66.96], [1437609600.0, 31000000.0, 67.0], [1437611328.0, 31020000.0, 67.03999999999999], [1437613056.0, 31040000.0, 67.08], [1437614784.0, 31060000.0, 67.12], [1437616512.0, 31080000.0, 67.16], [1437618240.0, 31100000.0, 67.19999999999999], [1437619968.0, 31120000.0, 67.24000000000001], [1437621696.0, 31140000.0, 67.28], [1437623424.0, 31160000.0, 67.32], [1437625152.0, 31180000.0, 67.36], [1437626880.0, 31200000.0, 67.4], [1437628608.0, 31220000.0, 67.44], [1437630336.0, 31240000.0, 67.48], [1437632064.0, 31260000.0, 67.52], [1437633792.0, 31280000.0, 67.56], [1437635520.0, 31300000.0, 67.6], [1437637248.0, 31320000.0, 67.64], [1437638976.0, 31340000.0, 67.68], [1437640704.0, 31360000.0, 67.72], [1437642432.0, 31380000.0, 67.75999999999999], [1437644160.0, 31400000.0, 67.80000000000001], [1437645888.0, 31420000.0, 67.84], [1437647616.0, 31440000.0, 67.88], [1437649344.0, 31460000.0, 67.92], [1437651072.0, 31480000.0, 67.96000000000001], [1437652800.0, 31500000.0, 68.0], [1437654528.0, 31520000.0, 68.04], [1437656256.0, 31540000.0, 68.08], [1437657984.0, 31560000.0, 68.12], [1437659712.0, 31580000.0, 68.16], [1437661440.0, 31600000.0, 68.2], [1437663168.0, 31620000.0, 68.24], [1437664896.0, 31640000.0, 68.28], [1437666624.0, 31660000.0, 68.32], [1437668352.0, 31680000.0, 68.36], [1437670080.0, 31700000.0, 68.4], [1437671808.0, 31720000.0, 68.44], [1437673536.0, 31740000.0, 68.47999999999999], [1437675264.0, 31760000.0, 68.52000000000001], [1437676992.0, 31780000.0, 68.56], [1437678720.0, 31800000.0, 68.6], [1437680448.0, 31820000.0, 68.64], [1437682176.0, 31840000.0, 68.68], [1437683904.0, 31860000.0, 68.72], [1437685632.0, 31880000.0, 68.76], [1437687360.0, 31900000.0, 68.8], [1437689088.0, 31920000.0, 68.84], [1437690816.0, 31940000.0, 68.88], [1437692544.0, 31960000.0, 68.92], [1437694272.0, 31980000.0, 68.96000000000001], [1437696000.0, 32000000.0, 69.0], [1437697728.0, 32020000.0, 69.03999999999999], [1437699456.0, 32040000.0, 69.08], [1437701184.0, 32060000.0, 69.12], [1437702912.0, 32080000.0, 69.16], [1437704640.0, 32100000.0, 69.2], 
[1437706368.0, 32120000.0, 69.24000000000001], [1437708096.0, 32140000.0, 69.28], [1437709824.0, 32160000.0, 69.32000000000001], [1437711552.0, 32180000.0, 69.36], [1437713280.0, 32200000.0, 69.4], [1437715008.0, 32220000.0, 69.44], [1437716736.0, 32240000.0, 69.48], [1437718464.0, 32260000.0, 69.52], [1437720192.0, 32280000.0, 69.56], [1437721920.0, 32300000.0, 69.6], [1437723648.0, 32320000.0, 69.64], [1437725376.0, 32340000.0, 69.68], [1437727104.0, 32360000.0, 69.72], [1437728832.0, 32380000.0, 69.76], [1437730560.0, 32400000.0, 69.80000000000001], [1437732288.0, 32420000.0, 69.84], [1437734016.0, 32440000.0, 69.88000000000001], [1437735744.0, 32460000.0, 69.92], [1437737472.0, 32480000.0, 69.96000000000001], [1437739200.0, 32500000.0, 70.0], [1437740928.0, 32520000.0, 70.03999999999999], [1437742656.0, 32540000.0, 70.08], [1437744384.0, 32560000.0, 70.11999999999999], [1437746112.0, 32580000.0, 70.16], [1437747840.0, 32600000.0, 70.19999999999999], [1437749568.0, 32620000.0, 70.24], [1437751296.0, 32640000.0, 70.28], [1437753024.0, 32660000.0, 70.32], [1437754752.0, 32680000.0, 70.36], [1437756480.0, 32700000.0, 70.4], [1437758208.0, 32720000.0, 70.44], [1437759936.0, 32740000.0, 70.48], [1437761664.0, 32760000.0, 70.52], [1437763392.0, 32780000.0, 70.56], [1437765120.0, 32800000.0, 70.6], [1437766848.0, 32820000.0, 70.64], [1437768576.0, 32840000.0, 70.67999999999999], [1437770304.0, 32860000.0, 70.72], [1437772032.0, 32880000.0, 70.75999999999999], [1437773760.0, 32900000.0, 70.8], [1437775488.0, 32920000.0, 70.84], [1437777216.0, 32940000.0, 70.88], [1437778944.0, 32960000.0, 70.92], [1437780672.0, 32980000.0, 70.96000000000001], [1437782400.0, 33000000.0, 71.0], [1437784128.0, 33020000.0, 71.04], [1437785856.0, 33040000.0, 71.08], [1437787584.0, 33060000.0, 71.12], [1437789312.0, 33080000.0, 71.16], [1437791040.0, 33100000.0, 71.2], [1437792768.0, 33120000.0, 71.24], [1437794496.0, 33140000.0, 71.28], [1437796224.0, 33160000.0, 71.32], [1437797952.0, 33180000.0, 71.36], [1437799680.0, 33200000.0, 71.39999999999999], [1437801408.0, 33220000.0, 71.44], [1437803136.0, 33240000.0, 71.47999999999999], [1437804864.0, 33260000.0, 71.52], [1437806592.0, 33280000.0, 71.56], [1437808320.0, 33300000.0, 71.6], [1437810048.0, 33320000.0, 71.64], [1437811776.0, 33340000.0, 71.68], [1437813504.0, 33360000.0, 71.72], [1437815232.0, 33380000.0, 71.76], [1437816960.0, 33400000.0, 71.8], [1437818688.0, 33420000.0, 71.84], [1437820416.0, 33440000.0, 71.88], [1437822144.0, 33460000.0, 71.92], [1437823872.0, 33480000.0, 71.96], [1437825600.0, 33500000.0, 72.0], [1437827328.0, 33520000.0, 72.03999999999999], [1437829056.0, 33540000.0, 72.08], [1437830784.0, 33560000.0, 72.12], [1437832512.0, 33580000.0, 72.16], [1437834240.0, 33600000.0, 72.2], [1437835968.0, 33620000.0, 72.24000000000001], [1437837696.0, 33640000.0, 72.28], [1437839424.0, 33660000.0, 72.32000000000001], [1437841152.0, 33680000.0, 72.36], [1437842880.0, 33700000.0, 72.4], [1437844608.0, 33720000.0, 72.44], [1437846336.0, 33740000.0, 72.48], [1437848064.0, 33760000.0, 72.52], [1437849792.0, 33780000.0, 72.56], [1437851520.0, 33800000.0, 72.6], [1437853248.0, 33820000.0, 72.64], [1437854976.0, 33840000.0, 72.67999999999999], [1437856704.0, 33860000.0, 72.72], [1437858432.0, 33880000.0, 72.75999999999999], [1437860160.0, 33900000.0, 72.8], [1437861888.0, 33920000.0, 72.84], [1437863616.0, 33940000.0, 72.88], [1437865344.0, 33960000.0, 72.92], [1437867072.0, 33980000.0, 72.96000000000001], [1437868800.0, 34000000.0, 73.0], [1437870528.0, 
34020000.0, 73.04], [1437872256.0, 34040000.0, 73.08], [1437873984.0, 34060000.0, 73.12], [1437875712.0, 34080000.0, 73.16], [1437877440.0, 34100000.0, 73.2], [1437879168.0, 34120000.0, 73.24], [1437880896.0, 34140000.0, 73.28], [1437882624.0, 34160000.0, 73.32], [1437884352.0, 34180000.0, 73.36], [1437886080.0, 34200000.0, 73.4], [1437887808.0, 34220000.0, 73.44], [1437889536.0, 34240000.0, 73.48], [1437891264.0, 34260000.0, 73.52000000000001], [1437892992.0, 34280000.0, 73.56], [1437894720.0, 34300000.0, 73.60000000000001], [1437896448.0, 34320000.0, 73.64], [1437898176.0, 34340000.0, 73.68], [1437899904.0, 34360000.0, 73.72], [1437901632.0, 34380000.0, 73.76], [1437903360.0, 34400000.0, 73.8], [1437905088.0, 34420000.0, 73.84], [1437906816.0, 34440000.0, 73.88], [1437908544.0, 34460000.0, 73.92], [1437910272.0, 34480000.0, 73.96], [1437912000.0, 34500000.0, 74.0], [1437913728.0, 34520000.0, 74.03999999999999], [1437915456.0, 34540000.0, 74.08], [1437917184.0, 34560000.0, 74.12], [1437918912.0, 34580000.0, 74.16], [1437920640.0, 34600000.0, 74.2], [1437922368.0, 34620000.0, 74.24000000000001], [1437924096.0, 34640000.0, 74.28], [1437925824.0, 34660000.0, 74.32000000000001], [1437927552.0, 34680000.0, 74.36], [1437929280.0, 34700000.0, 74.4], [1437931008.0, 34720000.0, 74.44], [1437932736.0, 34740000.0, 74.48], [1437934464.0, 34760000.0, 74.52], [1437936192.0, 34780000.0, 74.56], [1437937920.0, 34800000.0, 74.6], [1437939648.0, 34820000.0, 74.64], [1437941376.0, 34840000.0, 74.68], [1437943104.0, 34860000.0, 74.72], [1437944832.0, 34880000.0, 74.76], [1437946560.0, 34900000.0, 74.80000000000001], [1437948288.0, 34920000.0, 74.84], [1437950016.0, 34940000.0, 74.88000000000001], [1437951744.0, 34960000.0, 74.92], [1437953472.0, 34980000.0, 74.96000000000001], [1437955200.0, 35000000.0, 75.0], [1437956928.0, 35020000.0, 75.03999999999999], [1437958656.0, 35040000.0, 75.08], [1437960384.0, 35060000.0, 75.11999999999999], [1437962112.0, 35080000.0, 75.16], [1437963840.0, 35100000.0, 75.19999999999999], [1437965568.0, 35120000.0, 75.24], [1437967296.0, 35140000.0, 75.28], [1437969024.0, 35160000.0, 75.32], [1437970752.0, 35180000.0, 75.36], [1437972480.0, 35200000.0, 75.4], [1437974208.0, 35220000.0, 75.44], [1437975936.0, 35240000.0, 75.48], [1437977664.0, 35260000.0, 75.52], [1437979392.0, 35280000.0, 75.56], [1437981120.0, 35300000.0, 75.6], [1437982848.0, 35320000.0, 75.64], [1437984576.0, 35340000.0, 75.67999999999999], [1437986304.0, 35360000.0, 75.72], [1437988032.0, 35380000.0, 75.75999999999999], [1437989760.0, 35400000.0, 75.8], [1437991488.0, 35420000.0, 75.84], [1437993216.0, 35440000.0, 75.88], [1437994944.0, 35460000.0, 75.92], [1437996672.0, 35480000.0, 75.96000000000001], [1437998400.0, 35500000.0, 76.0], [1438000128.0, 35520000.0, 76.04], [1438001856.0, 35540000.0, 76.08], [1438003584.0, 35560000.0, 76.12], [1438005312.0, 35580000.0, 76.16], [1438007040.0, 35600000.0, 76.2], [1438008768.0, 35620000.0, 76.24], [1438010496.0, 35640000.0, 76.28], [1438012224.0, 35660000.0, 76.32], [1438013952.0, 35680000.0, 76.36], [1438015680.0, 35700000.0, 76.39999999999999], [1438017408.0, 35720000.0, 76.44], [1438019136.0, 35740000.0, 76.47999999999999], [1438020864.0, 35760000.0, 76.52], [1438022592.0, 35780000.0, 76.56], [1438024320.0, 35800000.0, 76.6], [1438026048.0, 35820000.0, 76.64], [1438027776.0, 35840000.0, 76.68], [1438029504.0, 35860000.0, 76.72], [1438031232.0, 35880000.0, 76.76], [1438032960.0, 35900000.0, 76.8], [1438034688.0, 35920000.0, 76.84], [1438036416.0, 35940000.0, 
76.88], [1438038144.0, 35960000.0, 76.92], [1438039872.0, 35980000.0, 76.96], [1438041600.0, 36000000.0, 77.0], [1438043328.0, 36020000.0, 77.03999999999999], [1438045056.0, 36040000.0, 77.08], [1438046784.0, 36060000.0, 77.12], [1438048512.0, 36080000.0, 77.16], [1438050240.0, 36100000.0, 77.2], [1438051968.0, 36120000.0, 77.24000000000001], [1438053696.0, 36140000.0, 77.28], [1438055424.0, 36160000.0, 77.32000000000001], [1438057152.0, 36180000.0, 77.36], [1438058880.0, 36200000.0, 77.4], [1438060608.0, 36220000.0, 77.44], [1438062336.0, 36240000.0, 77.48], [1438064064.0, 36260000.0, 77.52], [1438065792.0, 36280000.0, 77.56], [1438067520.0, 36300000.0, 77.6], [1438069248.0, 36320000.0, 77.64], [1438070976.0, 36340000.0, 77.67999999999999], [1438072704.0, 36360000.0, 77.72], [1438074432.0, 36380000.0, 77.75999999999999], [1438076160.0, 36400000.0, 77.8], [1438077888.0, 36420000.0, 77.84], [1438079616.0, 36440000.0, 77.88], [1438081344.0, 36460000.0, 77.92], [1438083072.0, 36480000.0, 77.96000000000001], [1438084800.0, 36500000.0, 78.0], [1438086528.0, 36520000.0, 78.04], [1438088256.0, 36540000.0, 78.08], [1438089984.0, 36560000.0, 78.12], [1438091712.0, 36580000.0, 78.16], [1438093440.0, 36600000.0, 78.2], [1438095168.0, 36620000.0, 78.24], [1438096896.0, 36640000.0, 78.28], [1438098624.0, 36660000.0, 78.32], [1438100352.0, 36680000.0, 78.36], [1438102080.0, 36700000.0, 78.4], [1438103808.0, 36720000.0, 78.44], [1438105536.0, 36740000.0, 78.48], [1438107264.0, 36760000.0, 78.52000000000001], [1438108992.0, 36780000.0, 78.56], [1438110720.0, 36800000.0, 78.60000000000001], [1438112448.0, 36820000.0, 78.64], [1438114176.0, 36840000.0, 78.68], [1438115904.0, 36860000.0, 78.72], [1438117632.0, 36880000.0, 78.76], [1438119360.0, 36900000.0, 78.8], [1438121088.0, 36920000.0, 78.84], [1438122816.0, 36940000.0, 78.88], [1438124544.0, 36960000.0, 78.92], [1438126272.0, 36980000.0, 78.96], [1438128000.0, 37000000.0, 79.0], [1438129728.0, 37020000.0, 79.03999999999999], [1438131456.0, 37040000.0, 79.08], [1438133184.0, 37060000.0, 79.12], [1438134912.0, 37080000.0, 79.16], [1438136640.0, 37100000.0, 79.2], [1438138368.0, 37120000.0, 79.24000000000001], [1438140096.0, 37140000.0, 79.28], [1438141824.0, 37160000.0, 79.32000000000001], [1438143552.0, 37180000.0, 79.36], [1438145280.0, 37200000.0, 79.4], [1438147008.0, 37220000.0, 79.44], [1438148736.0, 37240000.0, 79.48], [1438150464.0, 37260000.0, 79.52], [1438152192.0, 37280000.0, 79.56], [1438153920.0, 37300000.0, 79.6], [1438155648.0, 37320000.0, 79.64], [1438157376.0, 37340000.0, 79.68], [1438159104.0, 37360000.0, 79.72], [1438160832.0, 37380000.0, 79.76], [1438162560.0, 37400000.0, 79.80000000000001], [1438164288.0, 37420000.0, 79.84], [1438166016.0, 37440000.0, 79.88000000000001], [1438167744.0, 37460000.0, 79.92], [1438169472.0, 37480000.0, 79.96000000000001], [1438171200.0, 37500000.0, 80.0], [1438172928.0, 37520000.0, 80.03999999999999], [1438174656.0, 37540000.0, 80.08], [1438176384.0, 37560000.0, 80.11999999999999], [1438178112.0, 37580000.0, 80.16], [1438179840.0, 37600000.0, 80.19999999999999], [1438181568.0, 37620000.0, 80.24], [1438183296.0, 37640000.0, 80.28], [1438185024.0, 37660000.0, 80.32], [1438186752.0, 37680000.0, 80.36], [1438188480.0, 37700000.0, 80.4], [1438190208.0, 37720000.0, 80.44], [1438191936.0, 37740000.0, 80.48], [1438193664.0, 37760000.0, 80.52], [1438195392.0, 37780000.0, 80.56], [1438197120.0, 37800000.0, 80.6], [1438198848.0, 37820000.0, 80.64], [1438200576.0, 37840000.0, 80.67999999999999], [1438202304.0, 
37860000.0, 80.72], [1438204032.0, 37880000.0, 80.75999999999999], [1438205760.0, 37900000.0, 80.8], [1438207488.0, 37920000.0, 80.84], [1438209216.0, 37940000.0, 80.88], [1438210944.0, 37960000.0, 80.92], [1438212672.0, 37980000.0, 80.96000000000001], [1438214400.0, 38000000.0, 81.0], [1438216128.0, 38020000.0, 81.04], [1438217856.0, 38040000.0, 81.08], [1438219584.0, 38060000.0, 81.12], [1438221312.0, 38080000.0, 81.16], [1438223040.0, 38100000.0, 81.2], [1438224768.0, 38120000.0, 81.24], [1438226496.0, 38140000.0, 81.28], [1438228224.0, 38160000.0, 81.32], [1438229952.0, 38180000.0, 81.36], [1438231680.0, 38200000.0, 81.39999999999999], [1438233408.0, 38220000.0, 81.44], [1438235136.0, 38240000.0, 81.47999999999999], [1438236864.0, 38260000.0, 81.52], [1438238592.0, 38280000.0, 81.56], [1438240320.0, 38300000.0, 81.6], [1438242048.0, 38320000.0, 81.64], [1438243776.0, 38340000.0, 81.68], [1438245504.0, 38360000.0, 81.72], [1438247232.0, 38380000.0, 81.76], [1438248960.0, 38400000.0, 81.8], [1438250688.0, 38420000.0, 81.84], [1438252416.0, 38440000.0, 81.88], [1438254144.0, 38460000.0, 81.92], [1438255872.0, 38480000.0, 81.96], [1438257600.0, 38500000.0, 82.0], [1438259328.0, 38520000.0, 82.03999999999999], [1438261056.0, 38540000.0, 82.08], [1438262784.0, 38560000.0, 82.12], [1438264512.0, 38580000.0, 82.16], [1438266240.0, 38600000.0, 82.2], [1438267968.0, 38620000.0, 82.24000000000001], [1438269696.0, 38640000.0, 82.28], [1438271424.0, 38660000.0, 82.32000000000001], [1438273152.0, 38680000.0, 82.36], [1438274880.0, 38700000.0, 82.4], [1438276608.0, 38720000.0, 82.44], [1438278336.0, 38740000.0, 82.48], [1438280064.0, 38760000.0, 82.52], [1438281792.0, 38780000.0, 82.56], [1438283520.0, 38800000.0, 82.6], [1438285248.0, 38820000.0, 82.64], [1438286976.0, 38840000.0, 82.67999999999999], [1438288704.0, 38860000.0, 82.72], [1438290432.0, 38880000.0, 82.75999999999999], [1438292160.0, 38900000.0, 82.8], [1438293888.0, 38920000.0, 82.84], [1438295616.0, 38940000.0, 82.88], [1438297344.0, 38960000.0, 82.92], [1438299072.0, 38980000.0, 82.96000000000001], [1438300800.0, 39000000.0, 83.0], [1438302528.0, 39020000.0, 83.04], [1438304256.0, 39040000.0, 83.08], [1438305984.0, 39060000.0, 83.12], [1438307712.0, 39080000.0, 83.16], [1438309440.0, 39100000.0, 83.2], [1438311168.0, 39120000.0, 83.24], [1438312896.0, 39140000.0, 83.28], [1438314624.0, 39160000.0, 83.32], [1438316352.0, 39180000.0, 83.36], [1438318080.0, 39200000.0, 83.4], [1438319808.0, 39220000.0, 83.44], [1438321536.0, 39240000.0, 83.48], [1438323264.0, 39260000.0, 83.52000000000001], [1438324992.0, 39280000.0, 83.56], [1438326720.0, 39300000.0, 83.60000000000001], [1438328448.0, 39320000.0, 83.64], [1438330176.0, 39340000.0, 83.68], [1438331904.0, 39360000.0, 83.72], [1438333632.0, 39380000.0, 83.76], [1438335360.0, 39400000.0, 83.8], [1438337088.0, 39420000.0, 83.84], [1438338816.0, 39440000.0, 83.88], [1438340544.0, 39460000.0, 83.92], [1438342272.0, 39480000.0, 83.96], [1438344000.0, 39500000.0, 84.0], [1438345728.0, 39520000.0, 84.03999999999999], [1438347456.0, 39540000.0, 84.08], [1438349184.0, 39560000.0, 84.12], [1438350912.0, 39580000.0, 84.16], [1438352640.0, 39600000.0, 84.2], [1438354368.0, 39620000.0, 84.24000000000001], [1438356096.0, 39640000.0, 84.28], [1438357824.0, 39660000.0, 84.32000000000001], [1438359552.0, 39680000.0, 84.36], [1438361280.0, 39700000.0, 84.4], [1438363008.0, 39720000.0, 84.44], [1438364736.0, 39740000.0, 84.48], [1438366464.0, 39760000.0, 84.52], [1438368192.0, 39780000.0, 84.56], 
[1438369920.0, 39800000.0, 84.6], [1438371648.0, 39820000.0, 84.64], [1438373376.0, 39840000.0, 84.68], [1438375104.0, 39860000.0, 84.72], [1438376832.0, 39880000.0, 84.76], [1438378560.0, 39900000.0, 84.80000000000001], [1438380288.0, 39920000.0, 84.84], [1438382016.0, 39940000.0, 84.88000000000001], [1438383744.0, 39960000.0, 84.92], [1438385472.0, 39980000.0, 84.96000000000001], [1438387200.0, 40000000.0, 85.0], [1438388928.0, 40020000.0, 85.03999999999999], [1438390656.0, 40040000.0, 85.07999999999998], [1438392384.0, 40060000.0, 85.12], [1438394112.0, 40080000.0, 85.16], [1438395840.0, 40100000.0, 85.19999999999999], [1438397568.0, 40120000.0, 85.24], [1438399296.0, 40140000.0, 85.28], [1438401024.0, 40160000.0, 85.32], [1438402752.0, 40180000.0, 85.36], [1438404480.0, 40200000.0, 85.39999999999999], [1438406208.0, 40220000.0, 85.44], [1438407936.0, 40240000.0, 85.48], [1438409664.0, 40260000.0, 85.52], [1438411392.0, 40280000.0, 85.55999999999999], [1438413120.0, 40300000.0, 85.60000000000001], [1438414848.0, 40320000.0, 85.64], [1438416576.0, 40340000.0, 85.67999999999999], [1438418304.0, 40360000.0, 85.72], [1438420032.0, 40380000.0, 85.76], [1438421760.0, 40400000.0, 85.8], [1438423488.0, 40420000.0, 85.84], [1438425216.0, 40440000.0, 85.88], [1438426944.0, 40460000.0, 85.92], [1438428672.0, 40480000.0, 85.96000000000001], [1438430400.0, 40500000.0, 86.0], [1438432128.0, 40520000.0, 86.03999999999999], [1438433856.0, 40540000.0, 86.08000000000001], [1438435584.0, 40560000.0, 86.12], [1438437312.0, 40580000.0, 86.16], [1438439040.0, 40600000.0, 86.19999999999999], [1438440768.0, 40620000.0, 86.24000000000001], [1438442496.0, 40640000.0, 86.28], [1438444224.0, 40660000.0, 86.32], [1438445952.0, 40680000.0, 86.35999999999999], [1438447680.0, 40700000.0, 86.4], [1438449408.0, 40720000.0, 86.44], [1438451136.0, 40740000.0, 86.47999999999999], [1438452864.0, 40760000.0, 86.52], [1438454592.0, 40780000.0, 86.56], [1438456320.0, 40800000.0, 86.6], [1438458048.0, 40820000.0, 86.64], [1438459776.0, 40840000.0, 86.67999999999999], [1438461504.0, 40860000.0, 86.72], [1438463232.0, 40880000.0, 86.76], [1438464960.0, 40900000.0, 86.8], [1438466688.0, 40920000.0, 86.83999999999999], [1438468416.0, 40940000.0, 86.88000000000001], [1438470144.0, 40960000.0, 86.92], [1438471872.0, 40980000.0, 86.96], [1438473600.0, 41000000.0, 87.0], [1438475328.0, 41020000.0, 87.04], [1438477056.0, 41040000.0, 87.08], [1438478784.0, 41060000.0, 87.12], [1438480512.0, 41080000.0, 87.16], [1438482240.0, 41100000.0, 87.2], [1438483968.0, 41120000.0, 87.24000000000001], [1438485696.0, 41140000.0, 87.28], [1438487424.0, 41160000.0, 87.32], [1438489152.0, 41180000.0, 87.36000000000001], [1438490880.0, 41200000.0, 87.4], [1438492608.0, 41220000.0, 87.44], [1438494336.0, 41240000.0, 87.47999999999999], [1438496064.0, 41260000.0, 87.52000000000001], [1438497792.0, 41280000.0, 87.56], [1438499520.0, 41300000.0, 87.6], [1438501248.0, 41320000.0, 87.63999999999999], [1438502976.0, 41340000.0, 87.68], [1438504704.0, 41360000.0, 87.72], [1438506432.0, 41380000.0, 87.75999999999999], [1438508160.0, 41400000.0, 87.8], [1438509888.0, 41420000.0, 87.84], [1438511616.0, 41440000.0, 87.88], [1438513344.0, 41460000.0, 87.92], [1438515072.0, 41480000.0, 87.96], [1438516800.0, 41500000.0, 88.0], [1438518528.0, 41520000.0, 88.04], [1438520256.0, 41540000.0, 88.08], [1438521984.0, 41560000.0, 88.11999999999999], [1438523712.0, 41580000.0, 88.16000000000001], [1438525440.0, 41600000.0, 88.2], [1438527168.0, 41620000.0, 88.24], 
[1438528896.0, 41640000.0, 88.28], [1438530624.0, 41660000.0, 88.32000000000001], [1438532352.0, 41680000.0, 88.36], [1438534080.0, 41700000.0, 88.4], [1438535808.0, 41720000.0, 88.44], [1438537536.0, 41740000.0, 88.48], [1438539264.0, 41760000.0, 88.52000000000001], [1438540992.0, 41780000.0, 88.56], [1438542720.0, 41800000.0, 88.6], [1438544448.0, 41820000.0, 88.64000000000001], [1438546176.0, 41840000.0, 88.68], [1438547904.0, 41860000.0, 88.72], [1438549632.0, 41880000.0, 88.75999999999999], [1438551360.0, 41900000.0, 88.80000000000001], [1438553088.0, 41920000.0, 88.84], [1438554816.0, 41940000.0, 88.88], [1438556544.0, 41960000.0, 88.91999999999999], [1438558272.0, 41980000.0, 88.96000000000001], [1438560000.0, 42000000.0, 89.0], [1438561728.0, 42020000.0, 89.03999999999999], [1438563456.0, 42040000.0, 89.08], [1438565184.0, 42060000.0, 89.12], [1438566912.0, 42080000.0, 89.16], [1438568640.0, 42100000.0, 89.2], [1438570368.0, 42120000.0, 89.24], [1438572096.0, 42140000.0, 89.28], [1438573824.0, 42160000.0, 89.32000000000001], [1438575552.0, 42180000.0, 89.36], [1438577280.0, 42200000.0, 89.39999999999999], [1438579008.0, 42220000.0, 89.44000000000001], [1438580736.0, 42240000.0, 89.48], [1438582464.0, 42260000.0, 89.52], [1438584192.0, 42280000.0, 89.56], [1438585920.0, 42300000.0, 89.60000000000001], [1438587648.0, 42320000.0, 89.64], [1438589376.0, 42340000.0, 89.68], [1438591104.0, 42360000.0, 89.72], [1438592832.0, 42380000.0, 89.76], [1438594560.0, 42400000.0, 89.80000000000001], [1438596288.0, 42420000.0, 89.84], [1438598016.0, 42440000.0, 89.88], [1438599744.0, 42460000.0, 89.92000000000002], [1438601472.0, 42480000.0, 89.96000000000001], [1438603200.0, 42500000.0, 90.0], [1438604928.0, 42520000.0, 90.03999999999999], [1438606656.0, 42540000.0, 90.07999999999998], [1438608384.0, 42560000.0, 90.12], [1438610112.0, 42580000.0, 90.16], [1438611840.0, 42600000.0, 90.19999999999999], [1438613568.0, 42620000.0, 90.24], [1438615296.0, 42640000.0, 90.28], [1438617024.0, 42660000.0, 90.32], [1438618752.0, 42680000.0, 90.36], [1438620480.0, 42700000.0, 90.39999999999999], [1438622208.0, 42720000.0, 90.44], [1438623936.0, 42740000.0, 90.48], [1438625664.0, 42760000.0, 90.52], [1438627392.0, 42780000.0, 90.55999999999999], [1438629120.0, 42800000.0, 90.60000000000001], [1438630848.0, 42820000.0, 90.64], [1438632576.0, 42840000.0, 90.67999999999999], [1438634304.0, 42860000.0, 90.72], [1438636032.0, 42880000.0, 90.76], [1438637760.0, 42900000.0, 90.8], [1438639488.0, 42920000.0, 90.84], [1438641216.0, 42940000.0, 90.88], [1438642944.0, 42960000.0, 90.92], [1438644672.0, 42980000.0, 90.96000000000001], [1438646400.0, 43000000.0, 91.0], [1438648128.0, 43020000.0, 91.03999999999999], [1438649856.0, 43040000.0, 91.08000000000001], [1438651584.0, 43060000.0, 91.12], [1438653312.0, 43080000.0, 91.16], [1438655040.0, 43100000.0, 91.19999999999999], [1438656768.0, 43120000.0, 91.24000000000001], [1438658496.0, 43140000.0, 91.28], [1438660224.0, 43160000.0, 91.32], [1438661952.0, 43180000.0, 91.35999999999999], [1438663680.0, 43200000.0, 91.4], [1438665408.0, 43220000.0, 91.44], [1438667136.0, 43240000.0, 91.47999999999999], [1438668864.0, 43260000.0, 91.52], [1438670592.0, 43280000.0, 91.56], [1438672320.0, 43300000.0, 91.6], [1438674048.0, 43320000.0, 91.64], [1438675776.0, 43340000.0, 91.67999999999999], [1438677504.0, 43360000.0, 91.72], [1438679232.0, 43380000.0, 91.76], [1438680960.0, 43400000.0, 91.8], [1438682688.0, 43420000.0, 91.83999999999999], [1438684416.0, 43440000.0, 
91.88000000000001], [1438686144.0, 43460000.0, 91.92], [1438687872.0, 43480000.0, 91.96], [1438689600.0, 43500000.0, 92.0], [1438691328.0, 43520000.0, 92.04], [1438693056.0, 43540000.0, 92.08], [1438694784.0, 43560000.0, 92.12], [1438696512.0, 43580000.0, 92.16], [1438698240.0, 43600000.0, 92.2], [1438699968.0, 43620000.0, 92.24000000000001], [1438701696.0, 43640000.0, 92.28], [1438703424.0, 43660000.0, 92.32], [1438705152.0, 43680000.0, 92.36000000000001], [1438706880.0, 43700000.0, 92.4], [1438708608.0, 43720000.0, 92.44], [1438710336.0, 43740000.0, 92.47999999999999], [1438712064.0, 43760000.0, 92.52000000000001], [1438713792.0, 43780000.0, 92.56], [1438715520.0, 43800000.0, 92.6], [1438717248.0, 43820000.0, 92.63999999999999], [1438718976.0, 43840000.0, 92.68], [1438720704.0, 43860000.0, 92.72], [1438722432.0, 43880000.0, 92.75999999999999], [1438724160.0, 43900000.0, 92.8], [1438725888.0, 43920000.0, 92.84], [1438727616.0, 43940000.0, 92.88], [1438729344.0, 43960000.0, 92.92], [1438731072.0, 43980000.0, 92.96], [1438732800.0, 44000000.0, 93.0], [1438734528.0, 44020000.0, 93.04], [1438736256.0, 44040000.0, 93.08], [1438737984.0, 44060000.0, 93.11999999999999], [1438739712.0, 44080000.0, 93.16000000000001], [1438741440.0, 44100000.0, 93.2], [1438743168.0, 44120000.0, 93.24], [1438744896.0, 44140000.0, 93.28], [1438746624.0, 44160000.0, 93.32000000000001], [1438748352.0, 44180000.0, 93.36], [1438750080.0, 44200000.0, 93.4], [1438751808.0, 44220000.0, 93.44], [1438753536.0, 44240000.0, 93.48], [1438755264.0, 44260000.0, 93.52000000000001], [1438756992.0, 44280000.0, 93.56], [1438758720.0, 44300000.0, 93.6], [1438760448.0, 44320000.0, 93.64000000000001], [1438762176.0, 44340000.0, 93.68], [1438763904.0, 44360000.0, 93.72], [1438765632.0, 44380000.0, 93.75999999999999], [1438767360.0, 44400000.0, 93.80000000000001], [1438769088.0, 44420000.0, 93.84], [1438770816.0, 44440000.0, 93.88], [1438772544.0, 44460000.0, 93.91999999999999], [1438774272.0, 44480000.0, 93.96000000000001], [1438776000.0, 44500000.0, 94.0], [1438777728.0, 44520000.0, 94.03999999999999], [1438779456.0, 44540000.0, 94.08], [1438781184.0, 44560000.0, 94.12], [1438782912.0, 44580000.0, 94.16], [1438784640.0, 44600000.0, 94.2], [1438786368.0, 44620000.0, 94.24], [1438788096.0, 44640000.0, 94.28], [1438789824.0, 44660000.0, 94.32000000000001], [1438791552.0, 44680000.0, 94.36], [1438793280.0, 44700000.0, 94.39999999999999], [1438795008.0, 44720000.0, 94.44000000000001], [1438796736.0, 44740000.0, 94.48], [1438798464.0, 44760000.0, 94.52], [1438800192.0, 44780000.0, 94.56], [1438801920.0, 44800000.0, 94.60000000000001], [1438803648.0, 44820000.0, 94.64], [1438805376.0, 44840000.0, 94.68], [1438807104.0, 44860000.0, 94.72], [1438808832.0, 44880000.0, 94.76], [1438810560.0, 44900000.0, 94.80000000000001], [1438812288.0, 44920000.0, 94.84], [1438814016.0, 44940000.0, 94.88], [1438815744.0, 44960000.0, 94.92000000000002], [1438817472.0, 44980000.0, 94.96000000000001], [1438819200.0, 45000000.0, 95.0], [1438820928.0, 45020000.0, 95.03999999999999], [1438822656.0, 45040000.0, 95.07999999999998], [1438824384.0, 45060000.0, 95.12], [1438826112.0, 45080000.0, 95.16], [1438827840.0, 45100000.0, 95.19999999999999], [1438829568.0, 45120000.0, 95.24], [1438831296.0, 45140000.0, 95.28], [1438833024.0, 45160000.0, 95.32], [1438834752.0, 45180000.0, 95.36], [1438836480.0, 45200000.0, 95.39999999999999], [1438838208.0, 45220000.0, 95.44], [1438839936.0, 45240000.0, 95.48], [1438841664.0, 45260000.0, 95.52], [1438843392.0, 45280000.0, 
95.55999999999999], [1438845120.0, 45300000.0, 95.60000000000001], [1438846848.0, 45320000.0, 95.64], [1438848576.0, 45340000.0, 95.67999999999999], [1438850304.0, 45360000.0, 95.72], [1438852032.0, 45380000.0, 95.76], [1438853760.0, 45400000.0, 95.8], [1438855488.0, 45420000.0, 95.84], [1438857216.0, 45440000.0, 95.88], [1438858944.0, 45460000.0, 95.92], [1438860672.0, 45480000.0, 95.96000000000001], [1438862400.0, 45500000.0, 96.0], [1438864128.0, 45520000.0, 96.03999999999999], [1438865856.0, 45540000.0, 96.08000000000001], [1438867584.0, 45560000.0, 96.12], [1438869312.0, 45580000.0, 96.16], [1438871040.0, 45600000.0, 96.19999999999999], [1438872768.0, 45620000.0, 96.24000000000001], [1438874496.0, 45640000.0, 96.28], [1438876224.0, 45660000.0, 96.32], [1438877952.0, 45680000.0, 96.35999999999999], [1438879680.0, 45700000.0, 96.4], [1438881408.0, 45720000.0, 96.44], [1438883136.0, 45740000.0, 96.47999999999999], [1438884864.0, 45760000.0, 96.52], [1438886592.0, 45780000.0, 96.56], [1438888320.0, 45800000.0, 96.6], [1438890048.0, 45820000.0, 96.64], [1438891776.0, 45840000.0, 96.67999999999999], [1438893504.0, 45860000.0, 96.72], [1438895232.0, 45880000.0, 96.76], [1438896960.0, 45900000.0, 96.8], [1438898688.0, 45920000.0, 96.83999999999999], [1438900416.0, 45940000.0, 96.88000000000001], [1438902144.0, 45960000.0, 96.92], [1438903872.0, 45980000.0, 96.96], [1438905600.0, 46000000.0, 97.0], [1438907328.0, 46020000.0, 97.04], [1438909056.0, 46040000.0, 97.08], [1438910784.0, 46060000.0, 97.12], [1438912512.0, 46080000.0, 97.16], [1438914240.0, 46100000.0, 97.2], [1438915968.0, 46120000.0, 97.24000000000001], [1438917696.0, 46140000.0, 97.28], [1438919424.0, 46160000.0, 97.32], [1438921152.0, 46180000.0, 97.36000000000001], [1438922880.0, 46200000.0, 97.4], [1438924608.0, 46220000.0, 97.44], [1438926336.0, 46240000.0, 97.47999999999999], [1438928064.0, 46260000.0, 97.52000000000001], [1438929792.0, 46280000.0, 97.56], [1438931520.0, 46300000.0, 97.6], [1438933248.0, 46320000.0, 97.63999999999999], [1438934976.0, 46340000.0, 97.68], [1438936704.0, 46360000.0, 97.72], [1438938432.0, 46380000.0, 97.75999999999999], [1438940160.0, 46400000.0, 97.8], [1438941888.0, 46420000.0, 97.84], [1438943616.0, 46440000.0, 97.88], [1438945344.0, 46460000.0, 97.92], [1438947072.0, 46480000.0, 97.96], [1438948800.0, 46500000.0, 98.0], [1438950528.0, 46520000.0, 98.04], [1438952256.0, 46540000.0, 98.08], [1438953984.0, 46560000.0, 98.11999999999999], [1438955712.0, 46580000.0, 98.16000000000001], [1438957440.0, 46600000.0, 98.2], [1438959168.0, 46620000.0, 98.24], [1438960896.0, 46640000.0, 98.28], [1438962624.0, 46660000.0, 98.32000000000001], [1438964352.0, 46680000.0, 98.36], [1438966080.0, 46700000.0, 98.4], [1438967808.0, 46720000.0, 98.44], [1438969536.0, 46740000.0, 98.48], [1438971264.0, 46760000.0, 98.52000000000001], [1438972992.0, 46780000.0, 98.56], [1438974720.0, 46800000.0, 98.6], [1438976448.0, 46820000.0, 98.64000000000001], [1438978176.0, 46840000.0, 98.68], [1438979904.0, 46860000.0, 98.72], [1438981632.0, 46880000.0, 98.75999999999999], [1438983360.0, 46900000.0, 98.80000000000001], [1438985088.0, 46920000.0, 98.84], [1438986816.0, 46940000.0, 98.88], [1438988544.0, 46960000.0, 98.91999999999999], [1438990272.0, 46980000.0, 98.96000000000001], [1438992000.0, 47000000.0, 99.0], [1438993728.0, 47020000.0, 99.03999999999999], [1438995456.0, 47040000.0, 99.08], [1438997184.0, 47060000.0, 99.12], [1438998912.0, 47080000.0, 99.16], [1439000640.0, 47100000.0, 99.2], [1439002368.0, 47120000.0, 
99.24], [1439004096.0, 47140000.0, 99.28], [1439005824.0, 47160000.0, 99.32000000000001], [1439007552.0, 47180000.0, 99.36], [1439009280.0, 47200000.0, 99.39999999999999], [1439011008.0, 47220000.0, 99.44000000000001], [1439012736.0, 47240000.0, 99.48], [1439014464.0, 47260000.0, 99.52], [1439016192.0, 47280000.0, 99.56], [1439017920.0, 47300000.0, 99.60000000000001], [1439019648.0, 47320000.0, 99.64], [1439021376.0, 47340000.0, 99.68], [1439023104.0, 47360000.0, 99.72], [1439024832.0, 47380000.0, 99.76], [1439026560.0, 47400000.0, 99.80000000000001], [1439028288.0, 47420000.0, 99.84], [1439030016.0, 47440000.0, 99.88], [1439031744.0, 47460000.0, 99.92000000000002], [1439033472.0, 47480000.0, 99.96000000000001], [1439035200.0, 47500000.0, 100.0], [1439036928.0, 47520000.0, 100.03999999999999], [1439038656.0, 47540000.0, 100.07999999999998], [1439040384.0, 47560000.0, 100.12], [1439042112.0, 47580000.0, 100.16], [1439043840.0, 47600000.0, 100.19999999999999], [1439045568.0, 47620000.0, 100.24], [1439047296.0, 47640000.0, 100.28], [1439049024.0, 47660000.0, 100.32], [1439050752.0, 47680000.0, 100.36], [1439052480.0, 47700000.0, 100.39999999999999], [1439054208.0, 47720000.0, 100.44], [1439055936.0, 47740000.0, 100.48], [1439057664.0, 47760000.0, 100.52], [1439059392.0, 47780000.0, 100.55999999999999], [1439061120.0, 47800000.0, 100.60000000000001], [1439062848.0, 47820000.0, 100.64], [1439064576.0, 47840000.0, 100.67999999999999], [1439066304.0, 47860000.0, 100.72], [1439068032.0, 47880000.0, 100.76], [1439069760.0, 47900000.0, 100.8], [1439071488.0, 47920000.0, 100.84], [1439073216.0, 47940000.0, 100.88], [1439074944.0, 47960000.0, 100.92], [1439076672.0, 47980000.0, 100.96000000000001], [1439078400.0, 48000000.0, 101.0], [1439080128.0, 48020000.0, 101.03999999999999], [1439081856.0, 48040000.0, 101.08000000000001], [1439083584.0, 48060000.0, 101.12], [1439085312.0, 48080000.0, 101.16], [1439087040.0, 48100000.0, 101.19999999999999], [1439088768.0, 48120000.0, 101.24000000000001], [1439090496.0, 48140000.0, 101.28], [1439092224.0, 48160000.0, 101.32], [1439093952.0, 48180000.0, 101.35999999999999], [1439095680.0, 48200000.0, 101.4], [1439097408.0, 48220000.0, 101.44], [1439099136.0, 48240000.0, 101.47999999999999], [1439100864.0, 48260000.0, 101.52], [1439102592.0, 48280000.0, 101.56], [1439104320.0, 48300000.0, 101.6], [1439106048.0, 48320000.0, 101.64], [1439107776.0, 48340000.0, 101.67999999999999], [1439109504.0, 48360000.0, 101.72], [1439111232.0, 48380000.0, 101.76], [1439112960.0, 48400000.0, 101.8], [1439114688.0, 48420000.0, 101.83999999999999], [1439116416.0, 48440000.0, 101.88000000000001], [1439118144.0, 48460000.0, 101.92], [1439119872.0, 48480000.0, 101.96], [1439121600.0, 48500000.0, 102.0], [1439123328.0, 48520000.0, 102.04], [1439125056.0, 48540000.0, 102.08], [1439126784.0, 48560000.0, 102.12], [1439128512.0, 48580000.0, 102.16], [1439130240.0, 48600000.0, 102.2], [1439131968.0, 48620000.0, 102.24000000000001], [1439133696.0, 48640000.0, 102.28], [1439135424.0, 48660000.0, 102.32], [1439137152.0, 48680000.0, 102.36000000000001], [1439138880.0, 48700000.0, 102.4], [1439140608.0, 48720000.0, 102.44], [1439142336.0, 48740000.0, 102.47999999999999], [1439144064.0, 48760000.0, 102.52000000000001], [1439145792.0, 48780000.0, 102.56], [1439147520.0, 48800000.0, 102.6], [1439149248.0, 48820000.0, 102.63999999999999], [1439150976.0, 48840000.0, 102.68], [1439152704.0, 48860000.0, 102.72], [1439154432.0, 48880000.0, 102.75999999999999], [1439156160.0, 48900000.0, 102.8], 
[1439157888.0, 48920000.0, 102.84], [1439159616.0, 48940000.0, 102.88], [1439161344.0, 48960000.0, 102.92], [1439163072.0, 48980000.0, 102.96], [1439164800.0, 49000000.0, 103.0], [1439166528.0, 49020000.0, 103.04], [1439168256.0, 49040000.0, 103.08], [1439169984.0, 49060000.0, 103.11999999999999], [1439171712.0, 49080000.0, 103.16000000000001], [1439173440.0, 49100000.0, 103.2], [1439175168.0, 49120000.0, 103.24], [1439176896.0, 49140000.0, 103.28], [1439178624.0, 49160000.0, 103.32000000000001], [1439180352.0, 49180000.0, 103.36], [1439182080.0, 49200000.0, 103.4], [1439183808.0, 49220000.0, 103.44], [1439185536.0, 49240000.0, 103.48], [1439187264.0, 49260000.0, 103.52000000000001], [1439188992.0, 49280000.0, 103.56], [1439190720.0, 49300000.0, 103.6], [1439192448.0, 49320000.0, 103.64000000000001], [1439194176.0, 49340000.0, 103.68], [1439195904.0, 49360000.0, 103.72], [1439197632.0, 49380000.0, 103.75999999999999], [1439199360.0, 49400000.0, 103.80000000000001], [1439201088.0, 49420000.0, 103.84], [1439202816.0, 49440000.0, 103.88], [1439204544.0, 49460000.0, 103.91999999999999], [1439206272.0, 49480000.0, 103.96000000000001], [1439208000.0, 49500000.0, 104.0], [1439209728.0, 49520000.0, 104.03999999999999], [1439211456.0, 49540000.0, 104.08], [1439213184.0, 49560000.0, 104.12], [1439214912.0, 49580000.0, 104.16], [1439216640.0, 49600000.0, 104.2], [1439218368.0, 49620000.0, 104.24], [1439220096.0, 49640000.0, 104.28], [1439221824.0, 49660000.0, 104.32000000000001], [1439223552.0, 49680000.0, 104.36], [1439225280.0, 49700000.0, 104.39999999999999], [1439227008.0, 49720000.0, 104.44000000000001], [1439228736.0, 49740000.0, 104.48], [1439230464.0, 49760000.0, 104.52], [1439232192.0, 49780000.0, 104.56], [1439233920.0, 49800000.0, 104.60000000000001], [1439235648.0, 49820000.0, 104.64], [1439237376.0, 49840000.0, 104.68], [1439239104.0, 49860000.0, 104.72], [1439240832.0, 49880000.0, 104.76], [1439242560.0, 49900000.0, 104.80000000000001], [1439244288.0, 49920000.0, 104.84], [1439246016.0, 49940000.0, 104.88], [1439247744.0, 49960000.0, 104.92000000000002], [1439249472.0, 49980000.0, 104.96000000000001], [1439251200.0, 50000000.0, 105.0], [1439252928.0, 50020000.0, 105.03999999999999], [1439254656.0, 50040000.0, 105.07999999999998], [1439256384.0, 50060000.0, 105.12], [1439258112.0, 50080000.0, 105.16], [1439259840.0, 50100000.0, 105.19999999999999], [1439261568.0, 50120000.0, 105.24], [1439263296.0, 50140000.0, 105.28], [1439265024.0, 50160000.0, 105.32], [1439266752.0, 50180000.0, 105.36], [1439268480.0, 50200000.0, 105.39999999999999], [1439270208.0, 50220000.0, 105.44], [1439271936.0, 50240000.0, 105.48], [1439273664.0, 50260000.0, 105.52], [1439275392.0, 50280000.0, 105.55999999999999], [1439277120.0, 50300000.0, 105.60000000000001], [1439278848.0, 50320000.0, 105.64], [1439280576.0, 50340000.0, 105.67999999999999], [1439282304.0, 50360000.0, 105.72], [1439284032.0, 50380000.0, 105.76], [1439285760.0, 50400000.0, 105.8], [1439287488.0, 50420000.0, 105.84], [1439289216.0, 50440000.0, 105.88], [1439290944.0, 50460000.0, 105.92], [1439292672.0, 50480000.0, 105.96000000000001], [1439294400.0, 50500000.0, 106.0], [1439296128.0, 50520000.0, 106.03999999999999], [1439297856.0, 50540000.0, 106.08000000000001], [1439299584.0, 50560000.0, 106.12], [1439301312.0, 50580000.0, 106.16], [1439303040.0, 50600000.0, 106.19999999999999], [1439304768.0, 50620000.0, 106.24000000000001], [1439306496.0, 50640000.0, 106.28], [1439308224.0, 50660000.0, 106.32], [1439309952.0, 50680000.0, 106.35999999999999], 
[1439311680.0, 50700000.0, 106.4], [1439313408.0, 50720000.0, 106.44], [1439315136.0, 50740000.0, 106.47999999999999], [1439316864.0, 50760000.0, 106.52], [1439318592.0, 50780000.0, 106.56], [1439320320.0, 50800000.0, 106.6], [1439322048.0, 50820000.0, 106.64], [1439323776.0, 50840000.0, 106.67999999999999], [1439325504.0, 50860000.0, 106.72], [1439327232.0, 50880000.0, 106.76], [1439328960.0, 50900000.0, 106.8], [1439330688.0, 50920000.0, 106.83999999999999], [1439332416.0, 50940000.0, 106.88000000000001], [1439334144.0, 50960000.0, 106.92], [1439335872.0, 50980000.0, 106.96], [1439337600.0, 51000000.0, 107.0], [1439339328.0, 51020000.0, 107.04], [1439341056.0, 51040000.0, 107.08], [1439342784.0, 51060000.0, 107.12], [1439344512.0, 51080000.0, 107.16], [1439346240.0, 51100000.0, 107.2], [1439347968.0, 51120000.0, 107.24000000000001], [1439349696.0, 51140000.0, 107.28], [1439351424.0, 51160000.0, 107.32], [1439353152.0, 51180000.0, 107.36000000000001], [1439354880.0, 51200000.0, 107.4], [1439356608.0, 51220000.0, 107.44], [1439358336.0, 51240000.0, 107.47999999999999], [1439360064.0, 51260000.0, 107.52000000000001], [1439361792.0, 51280000.0, 107.56], [1439363520.0, 51300000.0, 107.6], [1439365248.0, 51320000.0, 107.63999999999999], [1439366976.0, 51340000.0, 107.68], [1439368704.0, 51360000.0, 107.72], [1439370432.0, 51380000.0, 107.75999999999999], [1439372160.0, 51400000.0, 107.8], [1439373888.0, 51420000.0, 107.84], [1439375616.0, 51440000.0, 107.88], [1439377344.0, 51460000.0, 107.92], [1439379072.0, 51480000.0, 107.96], [1439380800.0, 51500000.0, 108.0], [1439382528.0, 51520000.0, 108.04], [1439384256.0, 51540000.0, 108.08], [1439385984.0, 51560000.0, 108.11999999999999], [1439387712.0, 51580000.0, 108.16000000000001], [1439389440.0, 51600000.0, 108.2], [1439391168.0, 51620000.0, 108.24], [1439392896.0, 51640000.0, 108.28], [1439394624.0, 51660000.0, 108.32000000000001], [1439396352.0, 51680000.0, 108.36], [1439398080.0, 51700000.0, 108.4], [1439399808.0, 51720000.0, 108.44], [1439401536.0, 51740000.0, 108.48], [1439403264.0, 51760000.0, 108.52000000000001], [1439404992.0, 51780000.0, 108.56], [1439406720.0, 51800000.0, 108.6], [1439408448.0, 51820000.0, 108.64000000000001], [1439410176.0, 51840000.0, 108.68], [1439411904.0, 51860000.0, 108.72], [1439413632.0, 51880000.0, 108.75999999999999], [1439415360.0, 51900000.0, 108.80000000000001], [1439417088.0, 51920000.0, 108.84], [1439418816.0, 51940000.0, 108.88], [1439420544.0, 51960000.0, 108.91999999999999], [1439422272.0, 51980000.0, 108.96000000000001], [1439424000.0, 52000000.0, 109.0], [1439425728.0, 52020000.0, 109.03999999999999], [1439427456.0, 52040000.0, 109.08], [1439429184.0, 52060000.0, 109.12], [1439430912.0, 52080000.0, 109.16], [1439432640.0, 52100000.0, 109.2], [1439434368.0, 52120000.0, 109.24], [1439436096.0, 52140000.0, 109.28], [1439437824.0, 52160000.0, 109.32000000000001], [1439439552.0, 52180000.0, 109.36], [1439441280.0, 52200000.0, 109.39999999999999], [1439443008.0, 52220000.0, 109.44000000000001], [1439444736.0, 52240000.0, 109.48], [1439446464.0, 52260000.0, 109.52], [1439448192.0, 52280000.0, 109.56], [1439449920.0, 52300000.0, 109.60000000000001], [1439451648.0, 52320000.0, 109.64], [1439453376.0, 52340000.0, 109.68], [1439455104.0, 52360000.0, 109.72], [1439456832.0, 52380000.0, 109.76], [1439458560.0, 52400000.0, 109.80000000000001], [1439460288.0, 52420000.0, 109.84], [1439462016.0, 52440000.0, 109.88], [1439463744.0, 52460000.0, 109.92000000000002], [1439465472.0, 52480000.0, 109.96000000000001], 
[1439467200.0, 52500000.0, 110.0], [1439468928.0, 52520000.0, 110.03999999999999], [1439470656.0, 52540000.0, 110.07999999999998], [1439472384.0, 52560000.0, 110.12], [1439474112.0, 52580000.0, 110.16], [1439475840.0, 52600000.0, 110.19999999999999], [1439477568.0, 52620000.0, 110.24], [1439479296.0, 52640000.0, 110.28], [1439481024.0, 52660000.0, 110.32], [1439482752.0, 52680000.0, 110.36], [1439484480.0, 52700000.0, 110.39999999999999], [1439486208.0, 52720000.0, 110.44], [1439487936.0, 52740000.0, 110.48], [1439489664.0, 52760000.0, 110.52], [1439491392.0, 52780000.0, 110.55999999999999], [1439493120.0, 52800000.0, 110.60000000000001], [1439494848.0, 52820000.0, 110.64], [1439496576.0, 52840000.0, 110.67999999999999], [1439498304.0, 52860000.0, 110.72], [1439500032.0, 52880000.0, 110.76], [1439501760.0, 52900000.0, 110.8], [1439503488.0, 52920000.0, 110.84], [1439505216.0, 52940000.0, 110.88], [1439506944.0, 52960000.0, 110.92], [1439508672.0, 52980000.0, 110.96000000000001], [1439510400.0, 53000000.0, 111.0], [1439512128.0, 53020000.0, 111.03999999999999], [1439513856.0, 53040000.0, 111.08000000000001], [1439515584.0, 53060000.0, 111.12], [1439517312.0, 53080000.0, 111.16], [1439519040.0, 53100000.0, 111.19999999999999], [1439520768.0, 53120000.0, 111.24000000000001], [1439522496.0, 53140000.0, 111.28], [1439524224.0, 53160000.0, 111.32], [1439525952.0, 53180000.0, 111.35999999999999], [1439527680.0, 53200000.0, 111.4], [1439529408.0, 53220000.0, 111.44], [1439531136.0, 53240000.0, 111.47999999999999], [1439532864.0, 53260000.0, 111.52], [1439534592.0, 53280000.0, 111.56], [1439536320.0, 53300000.0, 111.6], [1439538048.0, 53320000.0, 111.64], [1439539776.0, 53340000.0, 111.67999999999999], [1439541504.0, 53360000.0, 111.72], [1439543232.0, 53380000.0, 111.76], [1439544960.0, 53400000.0, 111.8], [1439546688.0, 53420000.0, 111.83999999999999], [1439548416.0, 53440000.0, 111.88000000000001], [1439550144.0, 53460000.0, 111.92], [1439551872.0, 53480000.0, 111.96], [1439553600.0, 53500000.0, 112.0], [1439555328.0, 53520000.0, 112.04], [1439557056.0, 53540000.0, 112.08], [1439558784.0, 53560000.0, 112.12], [1439560512.0, 53580000.0, 112.16], [1439562240.0, 53600000.0, 112.2], [1439563968.0, 53620000.0, 112.24000000000001], [1439565696.0, 53640000.0, 112.28], [1439567424.0, 53660000.0, 112.32], [1439569152.0, 53680000.0, 112.36000000000001], [1439570880.0, 53700000.0, 112.4], [1439572608.0, 53720000.0, 112.44], [1439574336.0, 53740000.0, 112.47999999999999], [1439576064.0, 53760000.0, 112.52000000000001], [1439577792.0, 53780000.0, 112.56], [1439579520.0, 53800000.0, 112.6], [1439581248.0, 53820000.0, 112.63999999999999], [1439582976.0, 53840000.0, 112.68], [1439584704.0, 53860000.0, 112.72], [1439586432.0, 53880000.0, 112.75999999999999], [1439588160.0, 53900000.0, 112.8], [1439589888.0, 53920000.0, 112.84], [1439591616.0, 53940000.0, 112.88], [1439593344.0, 53960000.0, 112.92], [1439595072.0, 53980000.0, 112.96], [1439596800.0, 54000000.0, 113.0], [1439598528.0, 54020000.0, 113.04], [1439600256.0, 54040000.0, 113.08], [1439601984.0, 54060000.0, 113.11999999999999], [1439603712.0, 54080000.0, 113.16000000000001], [1439605440.0, 54100000.0, 113.2], [1439607168.0, 54120000.0, 113.24], [1439608896.0, 54140000.0, 113.28], [1439610624.0, 54160000.0, 113.32000000000001], [1439612352.0, 54180000.0, 113.36], [1439614080.0, 54200000.0, 113.4], [1439615808.0, 54220000.0, 113.44], [1439617536.0, 54240000.0, 113.48], [1439619264.0, 54260000.0, 113.52000000000001], [1439620992.0, 54280000.0, 113.56], 
[1439622720.0, 54300000.0, 113.6], [1439624448.0, 54320000.0, 113.64000000000001], [1439626176.0, 54340000.0, 113.68], [1439627904.0, 54360000.0, 113.72], [1439629632.0, 54380000.0, 113.75999999999999], [1439631360.0, 54400000.0, 113.80000000000001], [1439633088.0, 54420000.0, 113.84], [1439634816.0, 54440000.0, 113.88], [1439636544.0, 54460000.0, 113.91999999999999], [1439638272.0, 54480000.0, 113.96000000000001], [1439640000.0, 54500000.0, 114.0], [1439641728.0, 54520000.0, 114.03999999999999], [1439643456.0, 54540000.0, 114.08], [1439645184.0, 54560000.0, 114.12], [1439646912.0, 54580000.0, 114.16], [1439648640.0, 54600000.0, 114.2], [1439650368.0, 54620000.0, 114.24], [1439652096.0, 54640000.0, 114.28], [1439653824.0, 54660000.0, 114.32000000000001], [1439655552.0, 54680000.0, 114.36], [1439657280.0, 54700000.0, 114.39999999999999], [1439659008.0, 54720000.0, 114.44000000000001], [1439660736.0, 54740000.0, 114.48], [1439662464.0, 54760000.0, 114.52], [1439664192.0, 54780000.0, 114.56], [1439665920.0, 54800000.0, 114.60000000000001], [1439667648.0, 54820000.0, 114.64], [1439669376.0, 54840000.0, 114.68], [1439671104.0, 54860000.0, 114.72], [1439672832.0, 54880000.0, 114.76], [1439674560.0, 54900000.0, 114.80000000000001], [1439676288.0, 54920000.0, 114.84], [1439678016.0, 54940000.0, 114.88], [1439679744.0, 54960000.0, 114.92000000000002], [1439681472.0, 54980000.0, 114.96000000000001], [1439683200.0, 55000000.0, 115.0], [1439684928.0, 55020000.0, 115.03999999999999], [1439686656.0, 55040000.0, 115.07999999999998], [1439688384.0, 55060000.0, 115.12], [1439690112.0, 55080000.0, 115.16], [1439691840.0, 55100000.0, 115.19999999999999], [1439693568.0, 55120000.0, 115.24], [1439695296.0, 55140000.0, 115.28], [1439697024.0, 55160000.0, 115.32], [1439698752.0, 55180000.0, 115.36], [1439700480.0, 55200000.0, 115.39999999999999], [1439702208.0, 55220000.0, 115.44], [1439703936.0, 55240000.0, 115.48], [1439705664.0, 55260000.0, 115.52], [1439707392.0, 55280000.0, 115.55999999999999], [1439709120.0, 55300000.0, 115.60000000000001], [1439710848.0, 55320000.0, 115.64], [1439712576.0, 55340000.0, 115.67999999999999], [1439714304.0, 55360000.0, 115.72], [1439716032.0, 55380000.0, 115.76], [1439717760.0, 55400000.0, 115.8], [1439719488.0, 55420000.0, 115.84], [1439721216.0, 55440000.0, 115.88], [1439722944.0, 55460000.0, 115.92], [1439724672.0, 55480000.0, 115.96000000000001], [1439726400.0, 55500000.0, 116.0], [1439728128.0, 55520000.0, 116.03999999999999], [1439729856.0, 55540000.0, 116.08000000000001], [1439731584.0, 55560000.0, 116.12], [1439733312.0, 55580000.0, 116.16], [1439735040.0, 55600000.0, 116.19999999999999], [1439736768.0, 55620000.0, 116.24000000000001], [1439738496.0, 55640000.0, 116.28], [1439740224.0, 55660000.0, 116.32], [1439741952.0, 55680000.0, 116.35999999999999], [1439743680.0, 55700000.0, 116.4], [1439745408.0, 55720000.0, 116.44], [1439747136.0, 55740000.0, 116.47999999999999], [1439748864.0, 55760000.0, 116.52], [1439750592.0, 55780000.0, 116.56], [1439752320.0, 55800000.0, 116.6], [1439754048.0, 55820000.0, 116.64], [1439755776.0, 55840000.0, 116.67999999999999], [1439757504.0, 55860000.0, 116.72], [1439759232.0, 55880000.0, 116.76], [1439760960.0, 55900000.0, 116.8], [1439762688.0, 55920000.0, 116.83999999999999], [1439764416.0, 55940000.0, 116.88000000000001], [1439766144.0, 55960000.0, 116.92], [1439767872.0, 55980000.0, 116.96], [1439769600.0, 56000000.0, 117.0], [1439771328.0, 56020000.0, 117.04], [1439773056.0, 56040000.0, 117.08], [1439774784.0, 56060000.0, 117.12], 
[1439776512.0, 56080000.0, 117.16], [1439778240.0, 56100000.0, 117.2], [1439779968.0, 56120000.0, 117.24000000000001], [1439781696.0, 56140000.0, 117.28], [1439783424.0, 56160000.0, 117.32], [1439785152.0, 56180000.0, 117.36000000000001], [1439786880.0, 56200000.0, 117.4], [1439788608.0, 56220000.0, 117.44], [1439790336.0, 56240000.0, 117.47999999999999], [1439792064.0, 56260000.0, 117.52000000000001], [1439793792.0, 56280000.0, 117.56], [1439795520.0, 56300000.0, 117.6], [1439797248.0, 56320000.0, 117.63999999999999], [1439798976.0, 56340000.0, 117.68], [1439800704.0, 56360000.0, 117.72], [1439802432.0, 56380000.0, 117.75999999999999], [1439804160.0, 56400000.0, 117.8], [1439805888.0, 56420000.0, 117.84], [1439807616.0, 56440000.0, 117.88], [1439809344.0, 56460000.0, 117.92], [1439811072.0, 56480000.0, 117.96], [1439812800.0, 56500000.0, 118.0], [1439814528.0, 56520000.0, 118.04], [1439816256.0, 56540000.0, 118.08], [1439817984.0, 56560000.0, 118.11999999999999], [1439819712.0, 56580000.0, 118.16000000000001], [1439821440.0, 56600000.0, 118.2], [1439823168.0, 56620000.0, 118.24], [1439824896.0, 56640000.0, 118.28], [1439826624.0, 56660000.0, 118.32000000000001], [1439828352.0, 56680000.0, 118.36], [1439830080.0, 56700000.0, 118.4], [1439831808.0, 56720000.0, 118.44], [1439833536.0, 56740000.0, 118.48], [1439835264.0, 56760000.0, 118.52000000000001], [1439836992.0, 56780000.0, 118.56], [1439838720.0, 56800000.0, 118.6], [1439840448.0, 56820000.0, 118.64000000000001], [1439842176.0, 56840000.0, 118.68], [1439843904.0, 56860000.0, 118.72], [1439845632.0, 56880000.0, 118.75999999999999], [1439847360.0, 56900000.0, 118.80000000000001], [1439849088.0, 56920000.0, 118.84], [1439850816.0, 56940000.0, 118.88], [1439852544.0, 56960000.0, 118.91999999999999], [1439854272.0, 56980000.0, 118.96000000000001], [1439856000.0, 57000000.0, 119.0], [1439857728.0, 57020000.0, 119.03999999999999], [1439859456.0, 57040000.0, 119.08], [1439861184.0, 57060000.0, 119.12], [1439862912.0, 57080000.0, 119.16], [1439864640.0, 57100000.0, 119.2], [1439866368.0, 57120000.0, 119.24], [1439868096.0, 57140000.0, 119.28], [1439869824.0, 57160000.0, 119.32000000000001], [1439871552.0, 57180000.0, 119.36], [1439873280.0, 57200000.0, 119.39999999999999], [1439875008.0, 57220000.0, 119.44000000000001], [1439876736.0, 57240000.0, 119.48], [1439878464.0, 57260000.0, 119.52], [1439880192.0, 57280000.0, 119.56], [1439881920.0, 57300000.0, 119.60000000000001], [1439883648.0, 57320000.0, 119.64], [1439885376.0, 57340000.0, 119.68], [1439887104.0, 57360000.0, 119.72], [1439888832.0, 57380000.0, 119.76], [1439890560.0, 57400000.0, 119.80000000000001], [1439892288.0, 57420000.0, 119.84], [1439894016.0, 57440000.0, 119.88], [1439895744.0, 57460000.0, 119.92000000000002], [1439897472.0, 57480000.0, 119.96000000000001], [1439899200.0, 57500000.0, 120.0], [1439900928.0, 57520000.0, 120.03999999999999], [1439902656.0, 57540000.0, 120.07999999999998], [1439904384.0, 57560000.0, 120.12], [1439906112.0, 57580000.0, 120.16], [1439907840.0, 57600000.0, 120.19999999999999], [1439909568.0, 57620000.0, 120.24], [1439911296.0, 57640000.0, 120.28], [1439913024.0, 57660000.0, 120.32], [1439914752.0, 57680000.0, 120.36], [1439916480.0, 57700000.0, 120.39999999999999], [1439918208.0, 57720000.0, 120.44], [1439919936.0, 57740000.0, 120.48], [1439921664.0, 57760000.0, 120.52], [1439923392.0, 57780000.0, 120.55999999999999], [1439925120.0, 57800000.0, 120.60000000000001], [1439926848.0, 57820000.0, 120.64], [1439928576.0, 57840000.0, 120.67999999999999], 
[1439930304.0, 57860000.0, 120.72], [1439932032.0, 57880000.0, 120.76], [1439933760.0, 57900000.0, 120.8], [1439935488.0, 57920000.0, 120.84], [1439937216.0, 57940000.0, 120.88], [1439938944.0, 57960000.0, 120.92], [1439940672.0, 57980000.0, 120.96000000000001], [1439942400.0, 58000000.0, 121.0], [1439944128.0, 58020000.0, 121.03999999999999], [1439945856.0, 58040000.0, 121.08000000000001], [1439947584.0, 58060000.0, 121.12], [1439949312.0, 58080000.0, 121.16], [1439951040.0, 58100000.0, 121.19999999999999], [1439952768.0, 58120000.0, 121.24000000000001], [1439954496.0, 58140000.0, 121.28], [1439956224.0, 58160000.0, 121.32], [1439957952.0, 58180000.0, 121.35999999999999], [1439959680.0, 58200000.0, 121.4], [1439961408.0, 58220000.0, 121.44], [1439963136.0, 58240000.0, 121.47999999999999], [1439964864.0, 58260000.0, 121.52], [1439966592.0, 58280000.0, 121.56], [1439968320.0, 58300000.0, 121.6], [1439970048.0, 58320000.0, 121.64], [1439971776.0, 58340000.0, 121.67999999999999], [1439973504.0, 58360000.0, 121.72], [1439975232.0, 58380000.0, 121.76], [1439976960.0, 58400000.0, 121.8], [1439978688.0, 58420000.0, 121.83999999999999], [1439980416.0, 58440000.0, 121.88000000000001], [1439982144.0, 58460000.0, 121.92], [1439983872.0, 58480000.0, 121.96], [1439985600.0, 58500000.0, 122.0], [1439987328.0, 58520000.0, 122.04], [1439989056.0, 58540000.0, 122.08], [1439990784.0, 58560000.0, 122.12], [1439992512.0, 58580000.0, 122.16], [1439994240.0, 58600000.0, 122.2], [1439995968.0, 58620000.0, 122.24000000000001], [1439997696.0, 58640000.0, 122.28], [1439999424.0, 58660000.0, 122.32], [1440001152.0, 58680000.0, 122.36000000000001], [1440002880.0, 58700000.0, 122.4], [1440004608.0, 58720000.0, 122.44], [1440006336.0, 58740000.0, 122.47999999999999], [1440008064.0, 58760000.0, 122.52000000000001], [1440009792.0, 58780000.0, 122.56], [1440011520.0, 58800000.0, 122.6], [1440013248.0, 58820000.0, 122.63999999999999], [1440014976.0, 58840000.0, 122.68], [1440016704.0, 58860000.0, 122.72], [1440018432.0, 58880000.0, 122.75999999999999], [1440020160.0, 58900000.0, 122.8], [1440021888.0, 58920000.0, 122.84], [1440023616.0, 58940000.0, 122.88], [1440025344.0, 58960000.0, 122.92], [1440027072.0, 58980000.0, 122.96], [1440028800.0, 59000000.0, 123.0], [1440030528.0, 59020000.0, 123.04], [1440032256.0, 59040000.0, 123.08], [1440033984.0, 59060000.0, 123.11999999999999], [1440035712.0, 59080000.0, 123.16000000000001], [1440037440.0, 59100000.0, 123.2], [1440039168.0, 59120000.0, 123.24], [1440040896.0, 59140000.0, 123.28], [1440042624.0, 59160000.0, 123.32000000000001], [1440044352.0, 59180000.0, 123.36], [1440046080.0, 59200000.0, 123.4], [1440047808.0, 59220000.0, 123.44], [1440049536.0, 59240000.0, 123.48], [1440051264.0, 59260000.0, 123.52000000000001], [1440052992.0, 59280000.0, 123.56], [1440054720.0, 59300000.0, 123.6], [1440056448.0, 59320000.0, 123.64000000000001], [1440058176.0, 59340000.0, 123.68], [1440059904.0, 59360000.0, 123.72], [1440061632.0, 59380000.0, 123.75999999999999], [1440063360.0, 59400000.0, 123.80000000000001], [1440065088.0, 59420000.0, 123.84], [1440066816.0, 59440000.0, 123.88], [1440068544.0, 59460000.0, 123.91999999999999], [1440070272.0, 59480000.0, 123.96000000000001], [1440072000.0, 59500000.0, 124.0], [1440073728.0, 59520000.0, 124.03999999999999], [1440075456.0, 59540000.0, 124.08], [1440077184.0, 59560000.0, 124.12], [1440078912.0, 59580000.0, 124.16], [1440080640.0, 59600000.0, 124.2], [1440082368.0, 59620000.0, 124.24], [1440084096.0, 59640000.0, 124.28], [1440085824.0, 
59660000.0, 124.32000000000001], [1440087552.0, 59680000.0, 124.36], [1440089280.0, 59700000.0, 124.39999999999999], [1440091008.0, 59720000.0, 124.44000000000001], [1440092736.0, 59740000.0, 124.48], [1440094464.0, 59760000.0, 124.52], [1440096192.0, 59780000.0, 124.56], [1440097920.0, 59800000.0, 124.60000000000001], [1440099648.0, 59820000.0, 124.64], [1440101376.0, 59840000.0, 124.68], [1440103104.0, 59860000.0, 124.72], [1440104832.0, 59880000.0, 124.76], [1440106560.0, 59900000.0, 124.80000000000001], [1440108288.0, 59920000.0, 124.84], [1440110016.0, 59940000.0, 124.88], [1440111744.0, 59960000.0, 124.92000000000002], [1440113472.0, 59980000.0, 124.96000000000001], [1440115200.0, 60000000.0, 125.0], [1440116928.0, 60020000.0, 125.03999999999999], [1440118656.0, 60040000.0, 125.07999999999998], [1440120384.0, 60060000.0, 125.12], [1440122112.0, 60080000.0, 125.16], [1440123840.0, 60100000.0, 125.19999999999999], [1440125568.0, 60120000.0, 125.24], [1440127296.0, 60140000.0, 125.28], [1440129024.0, 60160000.0, 125.32], [1440130752.0, 60180000.0, 125.36], [1440132480.0, 60200000.0, 125.39999999999999], [1440134208.0, 60220000.0, 125.44], [1440135936.0, 60240000.0, 125.48], [1440137664.0, 60260000.0, 125.52], [1440139392.0, 60280000.0, 125.55999999999999], [1440141120.0, 60300000.0, 125.60000000000001], [1440142848.0, 60320000.0, 125.64], [1440144576.0, 60340000.0, 125.67999999999999], [1440146304.0, 60360000.0, 125.72], [1440148032.0, 60380000.0, 125.76], [1440149760.0, 60400000.0, 125.8], [1440151488.0, 60420000.0, 125.84], [1440153216.0, 60440000.0, 125.88], [1440154944.0, 60460000.0, 125.92], [1440156672.0, 60480000.0, 125.96000000000001], [1440158400.0, 60500000.0, 126.0], [1440160128.0, 60520000.0, 126.03999999999999], [1440161856.0, 60540000.0, 126.08000000000001], [1440163584.0, 60560000.0, 126.12], [1440165312.0, 60580000.0, 126.16], [1440167040.0, 60600000.0, 126.19999999999999], [1440168768.0, 60620000.0, 126.24000000000001], [1440170496.0, 60640000.0, 126.28], [1440172224.0, 60660000.0, 126.32], [1440173952.0, 60680000.0, 126.35999999999999], [1440175680.0, 60700000.0, 126.4], [1440177408.0, 60720000.0, 126.44], [1440179136.0, 60740000.0, 126.47999999999999], [1440180864.0, 60760000.0, 126.52], [1440182592.0, 60780000.0, 126.56], [1440184320.0, 60800000.0, 126.6], [1440186048.0, 60820000.0, 126.64], [1440187776.0, 60840000.0, 126.67999999999999], [1440189504.0, 60860000.0, 126.72], [1440191232.0, 60880000.0, 126.76], [1440192960.0, 60900000.0, 126.8], [1440194688.0, 60920000.0, 126.83999999999999], [1440196416.0, 60940000.0, 126.88000000000001], [1440198144.0, 60960000.0, 126.92], [1440199872.0, 60980000.0, 126.96], [1440201600.0, 61000000.0, 127.0], [1440203328.0, 61020000.0, 127.04], [1440205056.0, 61040000.0, 127.08], [1440206784.0, 61060000.0, 127.12], [1440208512.0, 61080000.0, 127.16], [1440210240.0, 61100000.0, 127.2], [1440211968.0, 61120000.0, 127.24000000000001], [1440213696.0, 61140000.0, 127.28], [1440215424.0, 61160000.0, 127.32], [1440217152.0, 61180000.0, 127.36000000000001], [1440218880.0, 61200000.0, 127.4], [1440220608.0, 61220000.0, 127.44], [1440222336.0, 61240000.0, 127.47999999999999], [1440224064.0, 61260000.0, 127.52000000000001], [1440225792.0, 61280000.0, 127.56], [1440227520.0, 61300000.0, 127.6], [1440229248.0, 61320000.0, 127.63999999999999], [1440230976.0, 61340000.0, 127.68], [1440232704.0, 61360000.0, 127.72], [1440234432.0, 61380000.0, 127.75999999999999], [1440236160.0, 61400000.0, 127.8], [1440237888.0, 61420000.0, 127.84], [1440239616.0, 
61440000.0, 127.88], [1440241344.0, 61460000.0, 127.92], [1440243072.0, 61480000.0, 127.96], [1440244800.0, 61500000.0, 128.0], [1440246528.0, 61520000.0, 128.04000000000002], [1440248256.0, 61540000.0, 128.07999999999998], [1440249984.0, 61560000.0, 128.12], [1440251712.0, 61580000.0, 128.16000000000003], [1440253440.0, 61600000.0, 128.2], [1440255168.0, 61620000.0, 128.24], [1440256896.0, 61640000.0, 128.28], [1440258624.0, 61660000.0, 128.32], [1440260352.0, 61680000.0, 128.36], [1440262080.0, 61700000.0, 128.4], [1440263808.0, 61720000.0, 128.44], [1440265536.0, 61740000.0, 128.48000000000002], [1440267264.0, 61760000.0, 128.52], [1440268992.0, 61780000.0, 128.56], [1440270720.0, 61800000.0, 128.6], [1440272448.0, 61820000.0, 128.64000000000001], [1440274176.0, 61840000.0, 128.68], [1440275904.0, 61860000.0, 128.72], [1440277632.0, 61880000.0, 128.76], [1440279360.0, 61900000.0, 128.8], [1440281088.0, 61920000.0, 128.84], [1440282816.0, 61940000.0, 128.88], [1440284544.0, 61960000.0, 128.92], [1440286272.0, 61980000.0, 128.96], [1440288000.0, 62000000.0, 129.0], [1440289728.0, 62020000.0, 129.04], [1440291456.0, 62040000.0, 129.07999999999998], [1440293184.0, 62060000.0, 129.12], [1440294912.0, 62080000.0, 129.16], [1440296640.0, 62100000.0, 129.2], [1440298368.0, 62120000.0, 129.24], [1440300096.0, 62140000.0, 129.28], [1440301824.0, 62160000.0, 129.32], [1440303552.0, 62180000.0, 129.36], [1440305280.0, 62200000.0, 129.39999999999998], [1440307008.0, 62220000.0, 129.44], [1440308736.0, 62240000.0, 129.48000000000002], [1440310464.0, 62260000.0, 129.51999999999998], [1440312192.0, 62280000.0, 129.56], [1440313920.0, 62300000.0, 129.60000000000002], [1440315648.0, 62320000.0, 129.64], [1440317376.0, 62340000.0, 129.68], [1440319104.0, 62360000.0, 129.72], [1440320832.0, 62380000.0, 129.76], [1440322560.0, 62400000.0, 129.8], [1440324288.0, 62420000.0, 129.84], [1440326016.0, 62440000.0, 129.88], [1440327744.0, 62460000.0, 129.92000000000002], [1440329472.0, 62480000.0, 129.96], [1440331200.0, 62500000.0, 130.0], [1440332928.0, 62520000.0, 130.04], [1440334656.0, 62540000.0, 130.07999999999998], [1440336384.0, 62560000.0, 130.12], [1440338112.0, 62580000.0, 130.16], [1440339840.0, 62600000.0, 130.2], [1440341568.0, 62620000.0, 130.24], [1440343296.0, 62640000.0, 130.28], [1440345024.0, 62660000.0, 130.32], [1440346752.0, 62680000.0, 130.36], [1440348480.0, 62700000.0, 130.39999999999998], [1440350208.0, 62720000.0, 130.44], [1440351936.0, 62740000.0, 130.48000000000002], [1440353664.0, 62760000.0, 130.51999999999998], [1440355392.0, 62780000.0, 130.56], [1440357120.0, 62800000.0, 130.60000000000002], [1440358848.0, 62820000.0, 130.64], [1440360576.0, 62840000.0, 130.68], [1440362304.0, 62860000.0, 130.72], [1440364032.0, 62880000.0, 130.76], [1440365760.0, 62900000.0, 130.8], [1440367488.0, 62920000.0, 130.84], [1440369216.0, 62940000.0, 130.88], [1440370944.0, 62960000.0, 130.92000000000002], [1440372672.0, 62980000.0, 130.96], [1440374400.0, 63000000.0, 131.0], [1440376128.0, 63020000.0, 131.04], [1440377856.0, 63040000.0, 131.08], [1440379584.0, 63060000.0, 131.12], [1440381312.0, 63080000.0, 131.16], [1440383040.0, 63100000.0, 131.2], [1440384768.0, 63120000.0, 131.24], [1440386496.0, 63140000.0, 131.28], [1440388224.0, 63160000.0, 131.32], [1440389952.0, 63180000.0, 131.35999999999999], [1440391680.0, 63200000.0, 131.4], [1440393408.0, 63220000.0, 131.44], [1440395136.0, 63240000.0, 131.48], [1440396864.0, 63260000.0, 131.51999999999998], [1440398592.0, 63280000.0, 131.56], 
[1440400320.0, 63300000.0, 131.6], [1440402048.0, 63320000.0, 131.64], [1440403776.0, 63340000.0, 131.68], [1440405504.0, 63360000.0, 131.72], [1440407232.0, 63380000.0, 131.76], [1440408960.0, 63400000.0, 131.8], [1440410688.0, 63420000.0, 131.83999999999997], [1440412416.0, 63440000.0, 131.88], [1440414144.0, 63460000.0, 131.92000000000002], [1440415872.0, 63480000.0, 131.95999999999998], [1440417600.0, 63500000.0, 132.0], [1440419328.0, 63520000.0, 132.04000000000002], [1440421056.0, 63540000.0, 132.07999999999998], [1440422784.0, 63560000.0, 132.12], [1440424512.0, 63580000.0, 132.16], [1440426240.0, 63600000.0, 132.2], [1440427968.0, 63620000.0, 132.24], [1440429696.0, 63640000.0, 132.28], [1440431424.0, 63660000.0, 132.32], [1440433152.0, 63680000.0, 132.36], [1440434880.0, 63700000.0, 132.4], [1440436608.0, 63720000.0, 132.44], [1440438336.0, 63740000.0, 132.48], [1440440064.0, 63760000.0, 132.52], [1440441792.0, 63780000.0, 132.56], [1440443520.0, 63800000.0, 132.6], [1440445248.0, 63820000.0, 132.64], [1440446976.0, 63840000.0, 132.68], [1440448704.0, 63860000.0, 132.72], [1440450432.0, 63880000.0, 132.76], [1440452160.0, 63900000.0, 132.8], [1440453888.0, 63920000.0, 132.84], [1440455616.0, 63940000.0, 132.88], [1440457344.0, 63960000.0, 132.92000000000002], [1440459072.0, 63980000.0, 132.95999999999998], [1440460800.0, 64000000.0, 133.0], [1440462528.0, 64020000.0, 133.04], [1440464256.0, 64040000.0, 133.07999999999998], [1440465984.0, 64060000.0, 133.12], [1440467712.0, 64080000.0, 133.16], [1440469440.0, 64100000.0, 133.2], [1440471168.0, 64120000.0, 133.24], [1440472896.0, 64140000.0, 133.28], [1440474624.0, 64160000.0, 133.32], [1440476352.0, 64180000.0, 133.36], [1440478080.0, 64200000.0, 133.4], [1440479808.0, 64220000.0, 133.44], [1440481536.0, 64240000.0, 133.48000000000002], [1440483264.0, 64260000.0, 133.52], [1440484992.0, 64280000.0, 133.56], [1440486720.0, 64300000.0, 133.6], [1440488448.0, 64320000.0, 133.64000000000001], [1440490176.0, 64340000.0, 133.68], [1440491904.0, 64360000.0, 133.72], [1440493632.0, 64380000.0, 133.76], [1440495360.0, 64400000.0, 133.8], [1440497088.0, 64420000.0, 133.84], [1440498816.0, 64440000.0, 133.88], [1440500544.0, 64460000.0, 133.92], [1440502272.0, 64480000.0, 133.96], [1440504000.0, 64500000.0, 134.0], [1440505728.0, 64520000.0, 134.04], [1440507456.0, 64540000.0, 134.07999999999998], [1440509184.0, 64560000.0, 134.12], [1440510912.0, 64580000.0, 134.16], [1440512640.0, 64600000.0, 134.2], [1440514368.0, 64620000.0, 134.24], [1440516096.0, 64640000.0, 134.28], [1440517824.0, 64660000.0, 134.32], [1440519552.0, 64680000.0, 134.36], [1440521280.0, 64700000.0, 134.4], [1440523008.0, 64720000.0, 134.44], [1440524736.0, 64740000.0, 134.48000000000002], [1440526464.0, 64760000.0, 134.52], [1440528192.0, 64780000.0, 134.56], [1440529920.0, 64800000.0, 134.60000000000002], [1440531648.0, 64820000.0, 134.64000000000001], [1440533376.0, 64840000.0, 134.68], [1440535104.0, 64860000.0, 134.72], [1440536832.0, 64880000.0, 134.76000000000002], [1440538560.0, 64900000.0, 134.8], [1440540288.0, 64920000.0, 134.84], [1440542016.0, 64940000.0, 134.88], [1440543744.0, 64960000.0, 134.92000000000002], [1440545472.0, 64980000.0, 134.96], [1440547200.0, 65000000.0, 135.0], [1440548928.0, 65020000.0, 135.04], [1440550656.0, 65040000.0, 135.07999999999998], [1440552384.0, 65060000.0, 135.12], [1440554112.0, 65080000.0, 135.16], [1440555840.0, 65100000.0, 135.2], [1440557568.0, 65120000.0, 135.23999999999998], [1440559296.0, 65140000.0, 135.28], 
[1440561024.0, 65160000.0, 135.32], [1440562752.0, 65180000.0, 135.35999999999999], [1440564480.0, 65200000.0, 135.39999999999998], [1440566208.0, 65220000.0, 135.44], [1440567936.0, 65240000.0, 135.48], [1440569664.0, 65260000.0, 135.51999999999998], [1440571392.0, 65280000.0, 135.56], [1440573120.0, 65300000.0, 135.6], [1440574848.0, 65320000.0, 135.64], [1440576576.0, 65340000.0, 135.68], [1440578304.0, 65360000.0, 135.72], [1440580032.0, 65380000.0, 135.76], [1440581760.0, 65400000.0, 135.8], [1440583488.0, 65420000.0, 135.84], [1440585216.0, 65440000.0, 135.88], [1440586944.0, 65460000.0, 135.92000000000002], [1440588672.0, 65480000.0, 135.96], [1440590400.0, 65500000.0, 136.0], [1440592128.0, 65520000.0, 136.04], [1440593856.0, 65540000.0, 136.08], [1440595584.0, 65560000.0, 136.12], [1440597312.0, 65580000.0, 136.16], [1440599040.0, 65600000.0, 136.2], [1440600768.0, 65620000.0, 136.24], [1440602496.0, 65640000.0, 136.28], [1440604224.0, 65660000.0, 136.32], [1440605952.0, 65680000.0, 136.35999999999999], [1440607680.0, 65700000.0, 136.4], [1440609408.0, 65720000.0, 136.44], [1440611136.0, 65740000.0, 136.48], [1440612864.0, 65760000.0, 136.51999999999998], [1440614592.0, 65780000.0, 136.56], [1440616320.0, 65800000.0, 136.6], [1440618048.0, 65820000.0, 136.64], [1440619776.0, 65840000.0, 136.68], [1440621504.0, 65860000.0, 136.72], [1440623232.0, 65880000.0, 136.76], [1440624960.0, 65900000.0, 136.8], [1440626688.0, 65920000.0, 136.84], [1440628416.0, 65940000.0, 136.88], [1440630144.0, 65960000.0, 136.92000000000002], [1440631872.0, 65980000.0, 136.96], [1440633600.0, 66000000.0, 137.0], [1440635328.0, 66020000.0, 137.04000000000002], [1440637056.0, 66040000.0, 137.08], [1440638784.0, 66060000.0, 137.12], [1440640512.0, 66080000.0, 137.16], [1440642240.0, 66100000.0, 137.20000000000002], [1440643968.0, 66120000.0, 137.24], [1440645696.0, 66140000.0, 137.28], [1440647424.0, 66160000.0, 137.32], [1440649152.0, 66180000.0, 137.36], [1440650880.0, 66200000.0, 137.4], [1440652608.0, 66220000.0, 137.44], [1440654336.0, 66240000.0, 137.48], [1440656064.0, 66260000.0, 137.52], [1440657792.0, 66280000.0, 137.56], [1440659520.0, 66300000.0, 137.6], [1440661248.0, 66320000.0, 137.64], [1440662976.0, 66340000.0, 137.68], [1440664704.0, 66360000.0, 137.72], [1440666432.0, 66380000.0, 137.76], [1440668160.0, 66400000.0, 137.79999999999998], [1440669888.0, 66420000.0, 137.84], [1440671616.0, 66440000.0, 137.88], [1440673344.0, 66460000.0, 137.92], [1440675072.0, 66480000.0, 137.95999999999998], [1440676800.0, 66500000.0, 138.0], [1440678528.0, 66520000.0, 138.04], [1440680256.0, 66540000.0, 138.07999999999998], [1440681984.0, 66560000.0, 138.12], [1440683712.0, 66580000.0, 138.16], [1440685440.0, 66600000.0, 138.2], [1440687168.0, 66620000.0, 138.24], [1440688896.0, 66640000.0, 138.28], [1440690624.0, 66660000.0, 138.32], [1440692352.0, 66680000.0, 138.36], [1440694080.0, 66700000.0, 138.4], [1440695808.0, 66720000.0, 138.44], [1440697536.0, 66740000.0, 138.48000000000002], [1440699264.0, 66760000.0, 138.52], [1440700992.0, 66780000.0, 138.56], [1440702720.0, 66800000.0, 138.6], [1440704448.0, 66820000.0, 138.64000000000001], [1440706176.0, 66840000.0, 138.68], [1440707904.0, 66860000.0, 138.72], [1440709632.0, 66880000.0, 138.76], [1440711360.0, 66900000.0, 138.8], [1440713088.0, 66920000.0, 138.84], [1440714816.0, 66940000.0, 138.88], [1440716544.0, 66960000.0, 138.92], [1440718272.0, 66980000.0, 138.96], [1440720000.0, 67000000.0, 139.0], [1440721728.0, 67020000.0, 139.04], [1440723456.0, 
67040000.0, 139.07999999999998], [1440725184.0, 67060000.0, 139.12], [1440726912.0, 67080000.0, 139.16], [1440728640.0, 67100000.0, 139.2], [1440730368.0, 67120000.0, 139.24], [1440732096.0, 67140000.0, 139.28], [1440733824.0, 67160000.0, 139.32], [1440735552.0, 67180000.0, 139.36], [1440737280.0, 67200000.0, 139.4], [1440739008.0, 67220000.0, 139.44], [1440740736.0, 67240000.0, 139.48000000000002], [1440742464.0, 67260000.0, 139.52], [1440744192.0, 67280000.0, 139.56], [1440745920.0, 67300000.0, 139.60000000000002], [1440747648.0, 67320000.0, 139.64000000000001], [1440749376.0, 67340000.0, 139.68], [1440751104.0, 67360000.0, 139.72], [1440752832.0, 67380000.0, 139.76000000000002], [1440754560.0, 67400000.0, 139.8], [1440756288.0, 67420000.0, 139.84], [1440758016.0, 67440000.0, 139.88], [1440759744.0, 67460000.0, 139.92000000000002], [1440761472.0, 67480000.0, 139.96], [1440763200.0, 67500000.0, 140.0], [1440764928.0, 67520000.0, 140.04], [1440766656.0, 67540000.0, 140.07999999999998], [1440768384.0, 67560000.0, 140.12], [1440770112.0, 67580000.0, 140.16], [1440771840.0, 67600000.0, 140.2], [1440773568.0, 67620000.0, 140.23999999999998], [1440775296.0, 67640000.0, 140.28], [1440777024.0, 67660000.0, 140.32], [1440778752.0, 67680000.0, 140.35999999999999], [1440780480.0, 67700000.0, 140.39999999999998], [1440782208.0, 67720000.0, 140.44], [1440783936.0, 67740000.0, 140.48], [1440785664.0, 67760000.0, 140.51999999999998], [1440787392.0, 67780000.0, 140.56], [1440789120.0, 67800000.0, 140.6], [1440790848.0, 67820000.0, 140.64], [1440792576.0, 67840000.0, 140.68], [1440794304.0, 67860000.0, 140.72], [1440796032.0, 67880000.0, 140.76], [1440797760.0, 67900000.0, 140.8], [1440799488.0, 67920000.0, 140.84], [1440801216.0, 67940000.0, 140.88], [1440802944.0, 67960000.0, 140.92000000000002], [1440804672.0, 67980000.0, 140.96], [1440806400.0, 68000000.0, 141.0], [1440808128.0, 68020000.0, 141.04], [1440809856.0, 68040000.0, 141.08], [1440811584.0, 68060000.0, 141.12], [1440813312.0, 68080000.0, 141.16], [1440815040.0, 68100000.0, 141.2], [1440816768.0, 68120000.0, 141.24], [1440818496.0, 68140000.0, 141.28], [1440820224.0, 68160000.0, 141.32], [1440821952.0, 68180000.0, 141.35999999999999], [1440823680.0, 68200000.0, 141.4], [1440825408.0, 68220000.0, 141.44], [1440827136.0, 68240000.0, 141.48], [1440828864.0, 68260000.0, 141.51999999999998], [1440830592.0, 68280000.0, 141.56], [1440832320.0, 68300000.0, 141.6], [1440834048.0, 68320000.0, 141.64], [1440835776.0, 68340000.0, 141.68], [1440837504.0, 68360000.0, 141.72], [1440839232.0, 68380000.0, 141.76], [1440840960.0, 68400000.0, 141.8], [1440842688.0, 68420000.0, 141.84], [1440844416.0, 68440000.0, 141.88], [1440846144.0, 68460000.0, 141.92000000000002], [1440847872.0, 68480000.0, 141.96], [1440849600.0, 68500000.0, 142.0], [1440851328.0, 68520000.0, 142.04000000000002], [1440853056.0, 68540000.0, 142.08], [1440854784.0, 68560000.0, 142.12], [1440856512.0, 68580000.0, 142.16], [1440858240.0, 68600000.0, 142.20000000000002], [1440859968.0, 68620000.0, 142.24], [1440861696.0, 68640000.0, 142.28], [1440863424.0, 68660000.0, 142.32], [1440865152.0, 68680000.0, 142.36], [1440866880.0, 68700000.0, 142.4], [1440868608.0, 68720000.0, 142.44], [1440870336.0, 68740000.0, 142.48], [1440872064.0, 68760000.0, 142.52], [1440873792.0, 68780000.0, 142.56], [1440875520.0, 68800000.0, 142.6], [1440877248.0, 68820000.0, 142.64], [1440878976.0, 68840000.0, 142.68], [1440880704.0, 68860000.0, 142.72], [1440882432.0, 68880000.0, 142.76], [1440884160.0, 68900000.0, 
142.79999999999998], [1440885888.0, 68920000.0, 142.84], [1440887616.0, 68940000.0, 142.88], [1440889344.0, 68960000.0, 142.92], [1440891072.0, 68980000.0, 142.95999999999998], [1440892800.0, 69000000.0, 143.0], [1440894528.0, 69020000.0, 143.04], [1440896256.0, 69040000.0, 143.07999999999998], [1440897984.0, 69060000.0, 143.12], [1440899712.0, 69080000.0, 143.16], [1440901440.0, 69100000.0, 143.2], [1440903168.0, 69120000.0, 143.24], [1440904896.0, 69140000.0, 143.28], [1440906624.0, 69160000.0, 143.32], [1440908352.0, 69180000.0, 143.36], [1440910080.0, 69200000.0, 143.4], [1440911808.0, 69220000.0, 143.44], [1440913536.0, 69240000.0, 143.48000000000002], [1440915264.0, 69260000.0, 143.52], [1440916992.0, 69280000.0, 143.56], [1440918720.0, 69300000.0, 143.6], [1440920448.0, 69320000.0, 143.64000000000001], [1440922176.0, 69340000.0, 143.68], [1440923904.0, 69360000.0, 143.72], [1440925632.0, 69380000.0, 143.76], [1440927360.0, 69400000.0, 143.8], [1440929088.0, 69420000.0, 143.84], [1440930816.0, 69440000.0, 143.88], [1440932544.0, 69460000.0, 143.92], [1440934272.0, 69480000.0, 143.96], [1440936000.0, 69500000.0, 144.0], [1440937728.0, 69520000.0, 144.04], [1440939456.0, 69540000.0, 144.07999999999998], [1440941184.0, 69560000.0, 144.12], [1440942912.0, 69580000.0, 144.16], [1440944640.0, 69600000.0, 144.2], [1440946368.0, 69620000.0, 144.24], [1440948096.0, 69640000.0, 144.28], [1440949824.0, 69660000.0, 144.32], [1440951552.0, 69680000.0, 144.36], [1440953280.0, 69700000.0, 144.4], [1440955008.0, 69720000.0, 144.44], [1440956736.0, 69740000.0, 144.48000000000002], [1440958464.0, 69760000.0, 144.52], [1440960192.0, 69780000.0, 144.56], [1440961920.0, 69800000.0, 144.60000000000002], [1440963648.0, 69820000.0, 144.64000000000001], [1440965376.0, 69840000.0, 144.68], [1440967104.0, 69860000.0, 144.72], [1440968832.0, 69880000.0, 144.76000000000002], [1440970560.0, 69900000.0, 144.8], [1440972288.0, 69920000.0, 144.84], [1440974016.0, 69940000.0, 144.88], [1440975744.0, 69960000.0, 144.92000000000002], [1440977472.0, 69980000.0, 144.96], [1440979200.0, 70000000.0, 145.0], [1440980928.0, 70020000.0, 145.04], [1440982656.0, 70040000.0, 145.07999999999998], [1440984384.0, 70060000.0, 145.12], [1440986112.0, 70080000.0, 145.16], [1440987840.0, 70100000.0, 145.2], [1440989568.0, 70120000.0, 145.23999999999998], [1440991296.0, 70140000.0, 145.28], [1440993024.0, 70160000.0, 145.32], [1440994752.0, 70180000.0, 145.35999999999999], [1440996480.0, 70200000.0, 145.39999999999998], [1440998208.0, 70220000.0, 145.44], [1440999936.0, 70240000.0, 145.48], [1441001664.0, 70260000.0, 145.51999999999998], [1441003392.0, 70280000.0, 145.56], [1441005120.0, 70300000.0, 145.6], [1441006848.0, 70320000.0, 145.64], [1441008576.0, 70340000.0, 145.68], [1441010304.0, 70360000.0, 145.72], [1441012032.0, 70380000.0, 145.76], [1441013760.0, 70400000.0, 145.8], [1441015488.0, 70420000.0, 145.84], [1441017216.0, 70440000.0, 145.88], [1441018944.0, 70460000.0, 145.92000000000002], [1441020672.0, 70480000.0, 145.96], [1441022400.0, 70500000.0, 146.0], [1441024128.0, 70520000.0, 146.04], [1441025856.0, 70540000.0, 146.08], [1441027584.0, 70560000.0, 146.12], [1441029312.0, 70580000.0, 146.16], [1441031040.0, 70600000.0, 146.2], [1441032768.0, 70620000.0, 146.24], [1441034496.0, 70640000.0, 146.28], [1441036224.0, 70660000.0, 146.32], [1441037952.0, 70680000.0, 146.35999999999999], [1441039680.0, 70700000.0, 146.4], [1441041408.0, 70720000.0, 146.44], [1441043136.0, 70740000.0, 146.48], [1441044864.0, 70760000.0, 
146.51999999999998], [1441046592.0, 70780000.0, 146.56], [1441048320.0, 70800000.0, 146.6], [1441050048.0, 70820000.0, 146.64], [1441051776.0, 70840000.0, 146.68], [1441053504.0, 70860000.0, 146.72], [1441055232.0, 70880000.0, 146.76], [1441056960.0, 70900000.0, 146.8], [1441058688.0, 70920000.0, 146.84], [1441060416.0, 70940000.0, 146.88], [1441062144.0, 70960000.0, 146.92000000000002], [1441063872.0, 70980000.0, 146.96], [1441065600.0, 71000000.0, 147.0], [1441067328.0, 71020000.0, 147.04000000000002], [1441069056.0, 71040000.0, 147.08], [1441070784.0, 71060000.0, 147.12], [1441072512.0, 71080000.0, 147.16], [1441074240.0, 71100000.0, 147.20000000000002], [1441075968.0, 71120000.0, 147.24], [1441077696.0, 71140000.0, 147.28], [1441079424.0, 71160000.0, 147.32], [1441081152.0, 71180000.0, 147.36], [1441082880.0, 71200000.0, 147.4], [1441084608.0, 71220000.0, 147.44], [1441086336.0, 71240000.0, 147.48], [1441088064.0, 71260000.0, 147.52], [1441089792.0, 71280000.0, 147.56], [1441091520.0, 71300000.0, 147.6], [1441093248.0, 71320000.0, 147.64], [1441094976.0, 71340000.0, 147.68], [1441096704.0, 71360000.0, 147.72], [1441098432.0, 71380000.0, 147.76], [1441100160.0, 71400000.0, 147.79999999999998], [1441101888.0, 71420000.0, 147.84], [1441103616.0, 71440000.0, 147.88], [1441105344.0, 71460000.0, 147.92], [1441107072.0, 71480000.0, 147.95999999999998], [1441108800.0, 71500000.0, 148.0], [1441110528.0, 71520000.0, 148.04], [1441112256.0, 71540000.0, 148.07999999999998], [1441113984.0, 71560000.0, 148.12], [1441115712.0, 71580000.0, 148.16], [1441117440.0, 71600000.0, 148.2], [1441119168.0, 71620000.0, 148.24], [1441120896.0, 71640000.0, 148.28], [1441122624.0, 71660000.0, 148.32], [1441124352.0, 71680000.0, 148.36], [1441126080.0, 71700000.0, 148.4], [1441127808.0, 71720000.0, 148.44], [1441129536.0, 71740000.0, 148.48000000000002], [1441131264.0, 71760000.0, 148.52], [1441132992.0, 71780000.0, 148.56], [1441134720.0, 71800000.0, 148.6], [1441136448.0, 71820000.0, 148.64000000000001], [1441138176.0, 71840000.0, 148.68], [1441139904.0, 71860000.0, 148.72], [1441141632.0, 71880000.0, 148.76], [1441143360.0, 71900000.0, 148.8], [1441145088.0, 71920000.0, 148.84], [1441146816.0, 71940000.0, 148.88], [1441148544.0, 71960000.0, 148.92], [1441150272.0, 71980000.0, 148.96], [1441152000.0, 72000000.0, 149.0], [1441153728.0, 72020000.0, 149.04], [1441155456.0, 72040000.0, 149.07999999999998], [1441157184.0, 72060000.0, 149.12], [1441158912.0, 72080000.0, 149.16], [1441160640.0, 72100000.0, 149.2], [1441162368.0, 72120000.0, 149.24], [1441164096.0, 72140000.0, 149.28], [1441165824.0, 72160000.0, 149.32], [1441167552.0, 72180000.0, 149.36], [1441169280.0, 72200000.0, 149.4], [1441171008.0, 72220000.0, 149.44], [1441172736.0, 72240000.0, 149.48000000000002], [1441174464.0, 72260000.0, 149.52], [1441176192.0, 72280000.0, 149.56], [1441177920.0, 72300000.0, 149.60000000000002], [1441179648.0, 72320000.0, 149.64000000000001], [1441181376.0, 72340000.0, 149.68], [1441183104.0, 72360000.0, 149.72], [1441184832.0, 72380000.0, 149.76000000000002], [1441186560.0, 72400000.0, 149.8], [1441188288.0, 72420000.0, 149.84], [1441190016.0, 72440000.0, 149.88], [1441191744.0, 72460000.0, 149.92000000000002], [1441193472.0, 72480000.0, 149.96], [1441195200.0, 72500000.0, 150.0], [1441196928.0, 72520000.0, 150.04], [1441198656.0, 72540000.0, 150.07999999999998], [1441200384.0, 72560000.0, 150.12], [1441202112.0, 72580000.0, 150.16], [1441203840.0, 72600000.0, 150.2], [1441205568.0, 72620000.0, 150.23999999999998], 
[1441207296.0, 72640000.0, 150.28], [1441209024.0, 72660000.0, 150.32], [1441210752.0, 72680000.0, 150.35999999999999], [1441212480.0, 72700000.0, 150.39999999999998], [1441214208.0, 72720000.0, 150.44], [1441215936.0, 72740000.0, 150.48], [1441217664.0, 72760000.0, 150.51999999999998], [1441219392.0, 72780000.0, 150.56], [1441221120.0, 72800000.0, 150.6], [1441222848.0, 72820000.0, 150.64], [1441224576.0, 72840000.0, 150.68], [1441226304.0, 72860000.0, 150.72], [1441228032.0, 72880000.0, 150.76], [1441229760.0, 72900000.0, 150.8], [1441231488.0, 72920000.0, 150.84], [1441233216.0, 72940000.0, 150.88], [1441234944.0, 72960000.0, 150.92000000000002], [1441236672.0, 72980000.0, 150.96], [1441238400.0, 73000000.0, 151.0], [1441240128.0, 73020000.0, 151.04], [1441241856.0, 73040000.0, 151.08], [1441243584.0, 73060000.0, 151.12], [1441245312.0, 73080000.0, 151.16], [1441247040.0, 73100000.0, 151.2], [1441248768.0, 73120000.0, 151.24], [1441250496.0, 73140000.0, 151.28], [1441252224.0, 73160000.0, 151.32], [1441253952.0, 73180000.0, 151.35999999999999], [1441255680.0, 73200000.0, 151.4], [1441257408.0, 73220000.0, 151.44], [1441259136.0, 73240000.0, 151.48], [1441260864.0, 73260000.0, 151.51999999999998], [1441262592.0, 73280000.0, 151.56], [1441264320.0, 73300000.0, 151.6], [1441266048.0, 73320000.0, 151.64], [1441267776.0, 73340000.0, 151.68], [1441269504.0, 73360000.0, 151.72], [1441271232.0, 73380000.0, 151.76], [1441272960.0, 73400000.0, 151.8], [1441274688.0, 73420000.0, 151.84], [1441276416.0, 73440000.0, 151.88], [1441278144.0, 73460000.0, 151.92000000000002], [1441279872.0, 73480000.0, 151.96], [1441281600.0, 73500000.0, 152.0], [1441283328.0, 73520000.0, 152.04000000000002], [1441285056.0, 73540000.0, 152.08], [1441286784.0, 73560000.0, 152.12], [1441288512.0, 73580000.0, 152.16], [1441290240.0, 73600000.0, 152.20000000000002], [1441291968.0, 73620000.0, 152.24], [1441293696.0, 73640000.0, 152.28], [1441295424.0, 73660000.0, 152.32], [1441297152.0, 73680000.0, 152.36], [1441298880.0, 73700000.0, 152.4], [1441300608.0, 73720000.0, 152.44], [1441302336.0, 73740000.0, 152.48], [1441304064.0, 73760000.0, 152.52], [1441305792.0, 73780000.0, 152.56], [1441307520.0, 73800000.0, 152.6], [1441309248.0, 73820000.0, 152.64], [1441310976.0, 73840000.0, 152.68], [1441312704.0, 73860000.0, 152.72], [1441314432.0, 73880000.0, 152.76], [1441316160.0, 73900000.0, 152.79999999999998], [1441317888.0, 73920000.0, 152.84], [1441319616.0, 73940000.0, 152.88], [1441321344.0, 73960000.0, 152.92], [1441323072.0, 73980000.0, 152.95999999999998], [1441324800.0, 74000000.0, 153.0], [1441326528.0, 74020000.0, 153.04], [1441328256.0, 74040000.0, 153.07999999999998], [1441329984.0, 74060000.0, 153.12], [1441331712.0, 74080000.0, 153.16], [1441333440.0, 74100000.0, 153.2], [1441335168.0, 74120000.0, 153.24], [1441336896.0, 74140000.0, 153.28], [1441338624.0, 74160000.0, 153.32], [1441340352.0, 74180000.0, 153.36], [1441342080.0, 74200000.0, 153.4], [1441343808.0, 74220000.0, 153.44], [1441345536.0, 74240000.0, 153.48000000000002], [1441347264.0, 74260000.0, 153.52], [1441348992.0, 74280000.0, 153.56], [1441350720.0, 74300000.0, 153.6], [1441352448.0, 74320000.0, 153.64000000000001], [1441354176.0, 74340000.0, 153.68], [1441355904.0, 74360000.0, 153.72], [1441357632.0, 74380000.0, 153.76], [1441359360.0, 74400000.0, 153.8], [1441361088.0, 74420000.0, 153.84], [1441362816.0, 74440000.0, 153.88], [1441364544.0, 74460000.0, 153.92], [1441366272.0, 74480000.0, 153.96], [1441368000.0, 74500000.0, 154.0], [1441369728.0, 
74520000.0, 154.04], [1441371456.0, 74540000.0, 154.07999999999998], [1441373184.0, 74560000.0, 154.12], [1441374912.0, 74580000.0, 154.16], [1441376640.0, 74600000.0, 154.2], [1441378368.0, 74620000.0, 154.24], [1441380096.0, 74640000.0, 154.28], [1441381824.0, 74660000.0, 154.32], [1441383552.0, 74680000.0, 154.36], [1441385280.0, 74700000.0, 154.4], [1441387008.0, 74720000.0, 154.44], [1441388736.0, 74740000.0, 154.48000000000002], [1441390464.0, 74760000.0, 154.52], [1441392192.0, 74780000.0, 154.56], [1441393920.0, 74800000.0, 154.60000000000002], [1441395648.0, 74820000.0, 154.64000000000001], [1441397376.0, 74840000.0, 154.68], [1441399104.0, 74860000.0, 154.72], [1441400832.0, 74880000.0, 154.76000000000002], [1441402560.0, 74900000.0, 154.8], [1441404288.0, 74920000.0, 154.84], [1441406016.0, 74940000.0, 154.88], [1441407744.0, 74960000.0, 154.92000000000002], [1441409472.0, 74980000.0, 154.96], [1441411200.0, 75000000.0, 155.0], [1441412928.0, 75020000.0, 155.04], [1441414656.0, 75040000.0, 155.07999999999998], [1441416384.0, 75060000.0, 155.12], [1441418112.0, 75080000.0, 155.16], [1441419840.0, 75100000.0, 155.2], [1441421568.0, 75120000.0, 155.23999999999998], [1441423296.0, 75140000.0, 155.28], [1441425024.0, 75160000.0, 155.32], [1441426752.0, 75180000.0, 155.35999999999999], [1441428480.0, 75200000.0, 155.39999999999998], [1441430208.0, 75220000.0, 155.44], [1441431936.0, 75240000.0, 155.48], [1441433664.0, 75260000.0, 155.51999999999998], [1441435392.0, 75280000.0, 155.56], [1441437120.0, 75300000.0, 155.6], [1441438848.0, 75320000.0, 155.64], [1441440576.0, 75340000.0, 155.68], [1441442304.0, 75360000.0, 155.72], [1441444032.0, 75380000.0, 155.76], [1441445760.0, 75400000.0, 155.8], [1441447488.0, 75420000.0, 155.84], [1441449216.0, 75440000.0, 155.88], [1441450944.0, 75460000.0, 155.92000000000002], [1441452672.0, 75480000.0, 155.96], [1441454400.0, 75500000.0, 156.0], [1441456128.0, 75520000.0, 156.04], [1441457856.0, 75540000.0, 156.08], [1441459584.0, 75560000.0, 156.12], [1441461312.0, 75580000.0, 156.16], [1441463040.0, 75600000.0, 156.2], [1441464768.0, 75620000.0, 156.24], [1441466496.0, 75640000.0, 156.28], [1441468224.0, 75660000.0, 156.32], [1441469952.0, 75680000.0, 156.35999999999999], [1441471680.0, 75700000.0, 156.4], [1441473408.0, 75720000.0, 156.44], [1441475136.0, 75740000.0, 156.48], [1441476864.0, 75760000.0, 156.51999999999998], [1441478592.0, 75780000.0, 156.56], [1441480320.0, 75800000.0, 156.6], [1441482048.0, 75820000.0, 156.64], [1441483776.0, 75840000.0, 156.68], [1441485504.0, 75860000.0, 156.72], [1441487232.0, 75880000.0, 156.76], [1441488960.0, 75900000.0, 156.8], [1441490688.0, 75920000.0, 156.84], [1441492416.0, 75940000.0, 156.88], [1441494144.0, 75960000.0, 156.92000000000002], [1441495872.0, 75980000.0, 156.96], [1441497600.0, 76000000.0, 157.0], [1441499328.0, 76020000.0, 157.04000000000002], [1441501056.0, 76040000.0, 157.08], [1441502784.0, 76060000.0, 157.12], [1441504512.0, 76080000.0, 157.16], [1441506240.0, 76100000.0, 157.20000000000002], [1441507968.0, 76120000.0, 157.24], [1441509696.0, 76140000.0, 157.28], [1441511424.0, 76160000.0, 157.32], [1441513152.0, 76180000.0, 157.36], [1441514880.0, 76200000.0, 157.4], [1441516608.0, 76220000.0, 157.44], [1441518336.0, 76240000.0, 157.48], [1441520064.0, 76260000.0, 157.52], [1441521792.0, 76280000.0, 157.56], [1441523520.0, 76300000.0, 157.6], [1441525248.0, 76320000.0, 157.64], [1441526976.0, 76340000.0, 157.68], [1441528704.0, 76360000.0, 157.72], [1441530432.0, 76380000.0, 
157.76], [1441532160.0, 76400000.0, 157.79999999999998], [1441533888.0, 76420000.0, 157.84], [1441535616.0, 76440000.0, 157.88], [1441537344.0, 76460000.0, 157.92], [1441539072.0, 76480000.0, 157.95999999999998], [1441540800.0, 76500000.0, 158.0], [1441542528.0, 76520000.0, 158.04], [1441544256.0, 76540000.0, 158.07999999999998], [1441545984.0, 76560000.0, 158.12], [1441547712.0, 76580000.0, 158.16], [1441549440.0, 76600000.0, 158.2], [1441551168.0, 76620000.0, 158.24], [1441552896.0, 76640000.0, 158.28], [1441554624.0, 76660000.0, 158.32], [1441556352.0, 76680000.0, 158.36], [1441558080.0, 76700000.0, 158.4], [1441559808.0, 76720000.0, 158.44], [1441561536.0, 76740000.0, 158.48000000000002], [1441563264.0, 76760000.0, 158.52], [1441564992.0, 76780000.0, 158.56], [1441566720.0, 76800000.0, 158.6], [1441568448.0, 76820000.0, 158.64000000000001], [1441570176.0, 76840000.0, 158.68], [1441571904.0, 76860000.0, 158.72], [1441573632.0, 76880000.0, 158.76], [1441575360.0, 76900000.0, 158.8], [1441577088.0, 76920000.0, 158.84], [1441578816.0, 76940000.0, 158.88], [1441580544.0, 76960000.0, 158.92], [1441582272.0, 76980000.0, 158.96], [1441584000.0, 77000000.0, 159.0], [1441585728.0, 77020000.0, 159.04], [1441587456.0, 77040000.0, 159.07999999999998], [1441589184.0, 77060000.0, 159.12], [1441590912.0, 77080000.0, 159.16], [1441592640.0, 77100000.0, 159.2], [1441594368.0, 77120000.0, 159.24], [1441596096.0, 77140000.0, 159.28], [1441597824.0, 77160000.0, 159.32], [1441599552.0, 77180000.0, 159.36], [1441601280.0, 77200000.0, 159.4], [1441603008.0, 77220000.0, 159.44], [1441604736.0, 77240000.0, 159.48000000000002], [1441606464.0, 77260000.0, 159.52], [1441608192.0, 77280000.0, 159.56], [1441609920.0, 77300000.0, 159.60000000000002], [1441611648.0, 77320000.0, 159.64000000000001], [1441613376.0, 77340000.0, 159.68], [1441615104.0, 77360000.0, 159.72], [1441616832.0, 77380000.0, 159.76000000000002], [1441618560.0, 77400000.0, 159.8], [1441620288.0, 77420000.0, 159.84], [1441622016.0, 77440000.0, 159.88], [1441623744.0, 77460000.0, 159.92000000000002], [1441625472.0, 77480000.0, 159.96], [1441627200.0, 77500000.0, 160.0], [1441628928.0, 77520000.0, 160.04], [1441630656.0, 77540000.0, 160.07999999999998], [1441632384.0, 77560000.0, 160.12], [1441634112.0, 77580000.0, 160.16], [1441635840.0, 77600000.0, 160.2], [1441637568.0, 77620000.0, 160.23999999999998], [1441639296.0, 77640000.0, 160.28], [1441641024.0, 77660000.0, 160.32], [1441642752.0, 77680000.0, 160.35999999999999], [1441644480.0, 77700000.0, 160.39999999999998], [1441646208.0, 77720000.0, 160.44], [1441647936.0, 77740000.0, 160.48], [1441649664.0, 77760000.0, 160.51999999999998], [1441651392.0, 77780000.0, 160.56], [1441653120.0, 77800000.0, 160.6], [1441654848.0, 77820000.0, 160.64], [1441656576.0, 77840000.0, 160.68], [1441658304.0, 77860000.0, 160.72], [1441660032.0, 77880000.0, 160.76], [1441661760.0, 77900000.0, 160.8], [1441663488.0, 77920000.0, 160.84], [1441665216.0, 77940000.0, 160.88], [1441666944.0, 77960000.0, 160.92000000000002], [1441668672.0, 77980000.0, 160.96], [1441670400.0, 78000000.0, 161.0], [1441672128.0, 78020000.0, 161.04], [1441673856.0, 78040000.0, 161.08], [1441675584.0, 78060000.0, 161.12], [1441677312.0, 78080000.0, 161.16], [1441679040.0, 78100000.0, 161.2], [1441680768.0, 78120000.0, 161.24], [1441682496.0, 78140000.0, 161.28], [1441684224.0, 78160000.0, 161.32], [1441685952.0, 78180000.0, 161.35999999999999], [1441687680.0, 78200000.0, 161.4], [1441689408.0, 78220000.0, 161.44], [1441691136.0, 78240000.0, 
161.48], [1441692864.0, 78260000.0, 161.51999999999998], [1441694592.0, 78280000.0, 161.56], [1441696320.0, 78300000.0, 161.6], [1441698048.0, 78320000.0, 161.64], [1441699776.0, 78340000.0, 161.68], [1441701504.0, 78360000.0, 161.72], [1441703232.0, 78380000.0, 161.76], [1441704960.0, 78400000.0, 161.8], [1441706688.0, 78420000.0, 161.84], [1441708416.0, 78440000.0, 161.88], [1441710144.0, 78460000.0, 161.92000000000002], [1441711872.0, 78480000.0, 161.96], [1441713600.0, 78500000.0, 162.0], [1441715328.0, 78520000.0, 162.04000000000002], [1441717056.0, 78540000.0, 162.08], [1441718784.0, 78560000.0, 162.12], [1441720512.0, 78580000.0, 162.16], [1441722240.0, 78600000.0, 162.20000000000002], [1441723968.0, 78620000.0, 162.24], [1441725696.0, 78640000.0, 162.28], [1441727424.0, 78660000.0, 162.32], [1441729152.0, 78680000.0, 162.36], [1441730880.0, 78700000.0, 162.4], [1441732608.0, 78720000.0, 162.44], [1441734336.0, 78740000.0, 162.48], [1441736064.0, 78760000.0, 162.52], [1441737792.0, 78780000.0, 162.56], [1441739520.0, 78800000.0, 162.6], [1441741248.0, 78820000.0, 162.64], [1441742976.0, 78840000.0, 162.68], [1441744704.0, 78860000.0, 162.72], [1441746432.0, 78880000.0, 162.76], [1441748160.0, 78900000.0, 162.79999999999998], [1441749888.0, 78920000.0, 162.84], [1441751616.0, 78940000.0, 162.88], [1441753344.0, 78960000.0, 162.92], [1441755072.0, 78980000.0, 162.95999999999998], [1441756800.0, 79000000.0, 163.0], [1441758528.0, 79020000.0, 163.04], [1441760256.0, 79040000.0, 163.07999999999998], [1441761984.0, 79060000.0, 163.12], [1441763712.0, 79080000.0, 163.16], [1441765440.0, 79100000.0, 163.2], [1441767168.0, 79120000.0, 163.24], [1441768896.0, 79140000.0, 163.28], [1441770624.0, 79160000.0, 163.32], [1441772352.0, 79180000.0, 163.36], [1441774080.0, 79200000.0, 163.4], [1441775808.0, 79220000.0, 163.44], [1441777536.0, 79240000.0, 163.48000000000002], [1441779264.0, 79260000.0, 163.52], [1441780992.0, 79280000.0, 163.56], [1441782720.0, 79300000.0, 163.6], [1441784448.0, 79320000.0, 163.64000000000001], [1441786176.0, 79340000.0, 163.68], [1441787904.0, 79360000.0, 163.72], [1441789632.0, 79380000.0, 163.76], [1441791360.0, 79400000.0, 163.8], [1441793088.0, 79420000.0, 163.84], [1441794816.0, 79440000.0, 163.88], [1441796544.0, 79460000.0, 163.92], [1441798272.0, 79480000.0, 163.96], [1441800000.0, 79500000.0, 164.0], [1441801728.0, 79520000.0, 164.04], [1441803456.0, 79540000.0, 164.07999999999998], [1441805184.0, 79560000.0, 164.12], [1441806912.0, 79580000.0, 164.16], [1441808640.0, 79600000.0, 164.2], [1441810368.0, 79620000.0, 164.24], [1441812096.0, 79640000.0, 164.28], [1441813824.0, 79660000.0, 164.32], [1441815552.0, 79680000.0, 164.36], [1441817280.0, 79700000.0, 164.4], [1441819008.0, 79720000.0, 164.44], [1441820736.0, 79740000.0, 164.48000000000002], [1441822464.0, 79760000.0, 164.52], [1441824192.0, 79780000.0, 164.56], [1441825920.0, 79800000.0, 164.60000000000002], [1441827648.0, 79820000.0, 164.64000000000001], [1441829376.0, 79840000.0, 164.68], [1441831104.0, 79860000.0, 164.72], [1441832832.0, 79880000.0, 164.76000000000002], [1441834560.0, 79900000.0, 164.8], [1441836288.0, 79920000.0, 164.84], [1441838016.0, 79940000.0, 164.88], [1441839744.0, 79960000.0, 164.92000000000002], [1441841472.0, 79980000.0, 164.96], [1441843200.0, 80000000.0, 165.0], [1441844928.0, 80020000.0, 165.04000000000002], [1441846656.0, 80040000.0, 165.07999999999998], [1441848384.0, 80060000.0, 165.12], [1441850112.0, 80080000.0, 165.15999999999997], [1441851840.0, 80100000.0, 
165.2], [1441853568.0, 80120000.0, 165.24], [1441855296.0, 80140000.0, 165.27999999999997], [1441857024.0, 80160000.0, 165.32], [1441858752.0, 80180000.0, 165.36], [1441860480.0, 80200000.0, 165.39999999999998], [1441862208.0, 80220000.0, 165.44], [1441863936.0, 80240000.0, 165.48], [1441865664.0, 80260000.0, 165.51999999999998], [1441867392.0, 80280000.0, 165.56], [1441869120.0, 80300000.0, 165.6], [1441870848.0, 80320000.0, 165.64], [1441872576.0, 80340000.0, 165.68], [1441874304.0, 80360000.0, 165.72], [1441876032.0, 80380000.0, 165.76], [1441877760.0, 80400000.0, 165.79999999999998], [1441879488.0, 80420000.0, 165.84], [1441881216.0, 80440000.0, 165.88], [1441882944.0, 80460000.0, 165.92], [1441884672.0, 80480000.0, 165.96], [1441886400.0, 80500000.0, 166.0], [1441888128.0, 80520000.0, 166.04], [1441889856.0, 80540000.0, 166.08], [1441891584.0, 80560000.0, 166.11999999999998], [1441893312.0, 80580000.0, 166.16], [1441895040.0, 80600000.0, 166.20000000000002], [1441896768.0, 80620000.0, 166.23999999999998], [1441898496.0, 80640000.0, 166.28], [1441900224.0, 80660000.0, 166.32000000000002], [1441901952.0, 80680000.0, 166.35999999999999], [1441903680.0, 80700000.0, 166.4], [1441905408.0, 80720000.0, 166.44], [1441907136.0, 80740000.0, 166.48], [1441908864.0, 80760000.0, 166.52], [1441910592.0, 80780000.0, 166.56], [1441912320.0, 80800000.0, 166.6], [1441914048.0, 80820000.0, 166.64000000000001], [1441915776.0, 80840000.0, 166.68], [1441917504.0, 80860000.0, 166.72], [1441919232.0, 80880000.0, 166.76], [1441920960.0, 80900000.0, 166.8], [1441922688.0, 80920000.0, 166.84], [1441924416.0, 80940000.0, 166.88], [1441926144.0, 80960000.0, 166.92000000000002], [1441927872.0, 80980000.0, 166.96], [1441929600.0, 81000000.0, 167.0], [1441931328.0, 81020000.0, 167.04000000000002], [1441933056.0, 81040000.0, 167.07999999999998], [1441934784.0, 81060000.0, 167.12], [1441936512.0, 81080000.0, 167.16000000000003], [1441938240.0, 81100000.0, 167.2], [1441939968.0, 81120000.0, 167.24], [1441941696.0, 81140000.0, 167.28000000000003], [1441943424.0, 81160000.0, 167.32], [1441945152.0, 81180000.0, 167.36], [1441946880.0, 81200000.0, 167.39999999999998], [1441948608.0, 81220000.0, 167.44], [1441950336.0, 81240000.0, 167.48000000000002], [1441952064.0, 81260000.0, 167.51999999999998], [1441953792.0, 81280000.0, 167.56], [1441955520.0, 81300000.0, 167.60000000000002], [1441957248.0, 81320000.0, 167.64], [1441958976.0, 81340000.0, 167.68], [1441960704.0, 81360000.0, 167.71999999999997], [1441962432.0, 81380000.0, 167.76], [1441964160.0, 81400000.0, 167.8], [1441965888.0, 81420000.0, 167.83999999999997], [1441967616.0, 81440000.0, 167.88], [1441969344.0, 81460000.0, 167.92000000000002], [1441971072.0, 81480000.0, 167.95999999999998], [1441972800.0, 81500000.0, 168.0], [1441974528.0, 81520000.0, 168.04], [1441976256.0, 81540000.0, 168.07999999999998], [1441977984.0, 81560000.0, 168.12], [1441979712.0, 81580000.0, 168.16], [1441981440.0, 81600000.0, 168.2], [1441983168.0, 81620000.0, 168.24], [1441984896.0, 81640000.0, 168.28], [1441986624.0, 81660000.0, 168.32], [1441988352.0, 81680000.0, 168.35999999999999], [1441990080.0, 81700000.0, 168.4], [1441991808.0, 81720000.0, 168.44], [1441993536.0, 81740000.0, 168.48], [1441995264.0, 81760000.0, 168.52], [1441996992.0, 81780000.0, 168.56], [1441998720.0, 81800000.0, 168.6], [1442000448.0, 81820000.0, 168.64000000000001], [1442002176.0, 81840000.0, 168.67999999999998], [1442003904.0, 81860000.0, 168.72], [1442005632.0, 81880000.0, 168.76000000000002], [1442007360.0, 
81900000.0, 168.79999999999998], [1442009088.0, 81920000.0, 168.84], [1442010816.0, 81940000.0, 168.88000000000002], [1442012544.0, 81960000.0, 168.92], [1442014272.0, 81980000.0, 168.96], [1442016000.0, 82000000.0, 169.0], [1442017728.0, 82020000.0, 169.04], [1442019456.0, 82040000.0, 169.08], [1442021184.0, 82060000.0, 169.12], [1442022912.0, 82080000.0, 169.16], [1442024640.0, 82100000.0, 169.20000000000002], [1442026368.0, 82120000.0, 169.24], [1442028096.0, 82140000.0, 169.28], [1442029824.0, 82160000.0, 169.32], [1442031552.0, 82180000.0, 169.36], [1442033280.0, 82200000.0, 169.4], [1442035008.0, 82220000.0, 169.44], [1442036736.0, 82240000.0, 169.48000000000002], [1442038464.0, 82260000.0, 169.52], [1442040192.0, 82280000.0, 169.56], [1442041920.0, 82300000.0, 169.60000000000002], [1442043648.0, 82320000.0, 169.64], [1442045376.0, 82340000.0, 169.68], [1442047104.0, 82360000.0, 169.72000000000003], [1442048832.0, 82380000.0, 169.76], [1442050560.0, 82400000.0, 169.8], [1442052288.0, 82420000.0, 169.84000000000003], [1442054016.0, 82440000.0, 169.88], [1442055744.0, 82460000.0, 169.92000000000002], [1442057472.0, 82480000.0, 169.95999999999998], [1442059200.0, 82500000.0, 170.0], [1442060928.0, 82520000.0, 170.04000000000002], [1442062656.0, 82540000.0, 170.07999999999998], [1442064384.0, 82560000.0, 170.12], [1442066112.0, 82580000.0, 170.15999999999997], [1442067840.0, 82600000.0, 170.2], [1442069568.0, 82620000.0, 170.24], [1442071296.0, 82640000.0, 170.27999999999997], [1442073024.0, 82660000.0, 170.32], [1442074752.0, 82680000.0, 170.36], [1442076480.0, 82700000.0, 170.39999999999998], [1442078208.0, 82720000.0, 170.44], [1442079936.0, 82740000.0, 170.48], [1442081664.0, 82760000.0, 170.51999999999998], [1442083392.0, 82780000.0, 170.56], [1442085120.0, 82800000.0, 170.6], [1442086848.0, 82820000.0, 170.64], [1442088576.0, 82840000.0, 170.68], [1442090304.0, 82860000.0, 170.72], [1442092032.0, 82880000.0, 170.76], [1442093760.0, 82900000.0, 170.79999999999998], [1442095488.0, 82920000.0, 170.84], [1442097216.0, 82940000.0, 170.88], [1442098944.0, 82960000.0, 170.92], [1442100672.0, 82980000.0, 170.96], [1442102400.0, 83000000.0, 171.0], [1442104128.0, 83020000.0, 171.04], [1442105856.0, 83040000.0, 171.08], [1442107584.0, 83060000.0, 171.11999999999998], [1442109312.0, 83080000.0, 171.16], [1442111040.0, 83100000.0, 171.20000000000002], [1442112768.0, 83120000.0, 171.23999999999998], [1442114496.0, 83140000.0, 171.28], [1442116224.0, 83160000.0, 171.32000000000002], [1442117952.0, 83180000.0, 171.35999999999999], [1442119680.0, 83200000.0, 171.4], [1442121408.0, 83220000.0, 171.44], [1442123136.0, 83240000.0, 171.48], [1442124864.0, 83260000.0, 171.52], [1442126592.0, 83280000.0, 171.56], [1442128320.0, 83300000.0, 171.6], [1442130048.0, 83320000.0, 171.64000000000001], [1442131776.0, 83340000.0, 171.68], [1442133504.0, 83360000.0, 171.72], [1442135232.0, 83380000.0, 171.76], [1442136960.0, 83400000.0, 171.8], [1442138688.0, 83420000.0, 171.84], [1442140416.0, 83440000.0, 171.88], [1442142144.0, 83460000.0, 171.92000000000002], [1442143872.0, 83480000.0, 171.96], [1442145600.0, 83500000.0, 172.0], [1442147328.0, 83520000.0, 172.04000000000002], [1442149056.0, 83540000.0, 172.07999999999998], [1442150784.0, 83560000.0, 172.12], [1442152512.0, 83580000.0, 172.16000000000003], [1442154240.0, 83600000.0, 172.2], [1442155968.0, 83620000.0, 172.24], [1442157696.0, 83640000.0, 172.28000000000003], [1442159424.0, 83660000.0, 172.32], [1442161152.0, 83680000.0, 172.36], [1442162880.0, 
83700000.0, 172.39999999999998], [1442164608.0, 83720000.0, 172.44], [1442166336.0, 83740000.0, 172.48000000000002], [1442168064.0, 83760000.0, 172.51999999999998], [1442169792.0, 83780000.0, 172.56], [1442171520.0, 83800000.0, 172.60000000000002], [1442173248.0, 83820000.0, 172.64], [1442174976.0, 83840000.0, 172.68], [1442176704.0, 83860000.0, 172.71999999999997], [1442178432.0, 83880000.0, 172.76], [1442180160.0, 83900000.0, 172.8], [1442181888.0, 83920000.0, 172.83999999999997], [1442183616.0, 83940000.0, 172.88], [1442185344.0, 83960000.0, 172.92000000000002], [1442187072.0, 83980000.0, 172.95999999999998], [1442188800.0, 84000000.0, 173.0], [1442190528.0, 84020000.0, 173.04], [1442192256.0, 84040000.0, 173.07999999999998], [1442193984.0, 84060000.0, 173.12], [1442195712.0, 84080000.0, 173.16], [1442197440.0, 84100000.0, 173.2], [1442199168.0, 84120000.0, 173.24], [1442200896.0, 84140000.0, 173.28], [1442202624.0, 84160000.0, 173.32], [1442204352.0, 84180000.0, 173.35999999999999], [1442206080.0, 84200000.0, 173.4], [1442207808.0, 84220000.0, 173.44], [1442209536.0, 84240000.0, 173.48], [1442211264.0, 84260000.0, 173.52], [1442212992.0, 84280000.0, 173.56], [1442214720.0, 84300000.0, 173.6], [1442216448.0, 84320000.0, 173.64000000000001], [1442218176.0, 84340000.0, 173.67999999999998], [1442219904.0, 84360000.0, 173.72], [1442221632.0, 84380000.0, 173.76000000000002], [1442223360.0, 84400000.0, 173.79999999999998], [1442225088.0, 84420000.0, 173.84], [1442226816.0, 84440000.0, 173.88000000000002], [1442228544.0, 84460000.0, 173.92], [1442230272.0, 84480000.0, 173.96], [1442232000.0, 84500000.0, 174.0], [1442233728.0, 84520000.0, 174.04], [1442235456.0, 84540000.0, 174.08], [1442237184.0, 84560000.0, 174.12], [1442238912.0, 84580000.0, 174.16], [1442240640.0, 84600000.0, 174.20000000000002], [1442242368.0, 84620000.0, 174.24], [1442244096.0, 84640000.0, 174.28], [1442245824.0, 84660000.0, 174.32], [1442247552.0, 84680000.0, 174.36], [1442249280.0, 84700000.0, 174.4], [1442251008.0, 84720000.0, 174.44], [1442252736.0, 84740000.0, 174.48000000000002], [1442254464.0, 84760000.0, 174.52], [1442256192.0, 84780000.0, 174.56], [1442257920.0, 84800000.0, 174.60000000000002], [1442259648.0, 84820000.0, 174.64], [1442261376.0, 84840000.0, 174.68], [1442263104.0, 84860000.0, 174.72000000000003], [1442264832.0, 84880000.0, 174.76], [1442266560.0, 84900000.0, 174.8], [1442268288.0, 84920000.0, 174.84000000000003], [1442270016.0, 84940000.0, 174.88], [1442271744.0, 84960000.0, 174.92000000000002], [1442273472.0, 84980000.0, 174.95999999999998], [1442275200.0, 85000000.0, 175.0], [1442276928.0, 85020000.0, 175.04000000000002], [1442278656.0, 85040000.0, 175.07999999999998], [1442280384.0, 85060000.0, 175.12], [1442282112.0, 85080000.0, 175.15999999999997], [1442283840.0, 85100000.0, 175.2], [1442285568.0, 85120000.0, 175.24], [1442287296.0, 85140000.0, 175.27999999999997], [1442289024.0, 85160000.0, 175.32], [1442290752.0, 85180000.0, 175.36], [1442292480.0, 85200000.0, 175.39999999999998], [1442294208.0, 85220000.0, 175.44], [1442295936.0, 85240000.0, 175.48], [1442297664.0, 85260000.0, 175.51999999999998], [1442299392.0, 85280000.0, 175.56], [1442301120.0, 85300000.0, 175.6], [1442302848.0, 85320000.0, 175.64], [1442304576.0, 85340000.0, 175.68], [1442306304.0, 85360000.0, 175.72], [1442308032.0, 85380000.0, 175.76], [1442309760.0, 85400000.0, 175.79999999999998], [1442311488.0, 85420000.0, 175.84], [1442313216.0, 85440000.0, 175.88], [1442314944.0, 85460000.0, 175.92], [1442316672.0, 85480000.0, 
175.96], [1442318400.0, 85500000.0, 176.0], [1442320128.0, 85520000.0, 176.04], [1442321856.0, 85540000.0, 176.08], [1442323584.0, 85560000.0, 176.11999999999998], [1442325312.0, 85580000.0, 176.16], [1442327040.0, 85600000.0, 176.20000000000002], [1442328768.0, 85620000.0, 176.23999999999998], [1442330496.0, 85640000.0, 176.28], [1442332224.0, 85660000.0, 176.32000000000002], [1442333952.0, 85680000.0, 176.35999999999999], [1442335680.0, 85700000.0, 176.4], [1442337408.0, 85720000.0, 176.44], [1442339136.0, 85740000.0, 176.48], [1442340864.0, 85760000.0, 176.52], [1442342592.0, 85780000.0, 176.56], [1442344320.0, 85800000.0, 176.6], [1442346048.0, 85820000.0, 176.64000000000001], [1442347776.0, 85840000.0, 176.68], [1442349504.0, 85860000.0, 176.72], [1442351232.0, 85880000.0, 176.76], [1442352960.0, 85900000.0, 176.8], [1442354688.0, 85920000.0, 176.84], [1442356416.0, 85940000.0, 176.88], [1442358144.0, 85960000.0, 176.92000000000002], [1442359872.0, 85980000.0, 176.96], [1442361600.0, 86000000.0, 177.0], [1442363328.0, 86020000.0, 177.04000000000002], [1442365056.0, 86040000.0, 177.07999999999998], [1442366784.0, 86060000.0, 177.12], [1442368512.0, 86080000.0, 177.16000000000003], [1442370240.0, 86100000.0, 177.2], [1442371968.0, 86120000.0, 177.24], [1442373696.0, 86140000.0, 177.28000000000003], [1442375424.0, 86160000.0, 177.32], [1442377152.0, 86180000.0, 177.36], [1442378880.0, 86200000.0, 177.39999999999998], [1442380608.0, 86220000.0, 177.44], [1442382336.0, 86240000.0, 177.48000000000002], [1442384064.0, 86260000.0, 177.51999999999998], [1442385792.0, 86280000.0, 177.56], [1442387520.0, 86300000.0, 177.60000000000002], [1442389248.0, 86320000.0, 177.64], [1442390976.0, 86340000.0, 177.68], [1442392704.0, 86360000.0, 177.71999999999997], [1442394432.0, 86380000.0, 177.76], [1442396160.0, 86400000.0, 177.8], [1442397888.0, 86420000.0, 177.83999999999997], [1442399616.0, 86440000.0, 177.88], [1442401344.0, 86460000.0, 177.92000000000002], [1442403072.0, 86480000.0, 177.95999999999998], [1442404800.0, 86500000.0, 178.0], [1442406528.0, 86520000.0, 178.04], [1442408256.0, 86540000.0, 178.07999999999998], [1442409984.0, 86560000.0, 178.12], [1442411712.0, 86580000.0, 178.16], [1442413440.0, 86600000.0, 178.2], [1442415168.0, 86620000.0, 178.24], [1442416896.0, 86640000.0, 178.28], [1442418624.0, 86660000.0, 178.32], [1442420352.0, 86680000.0, 178.35999999999999], [1442422080.0, 86700000.0, 178.4], [1442423808.0, 86720000.0, 178.44], [1442425536.0, 86740000.0, 178.48], [1442427264.0, 86760000.0, 178.52], [1442428992.0, 86780000.0, 178.56], [1442430720.0, 86800000.0, 178.6], [1442432448.0, 86820000.0, 178.64000000000001], [1442434176.0, 86840000.0, 178.67999999999998], [1442435904.0, 86860000.0, 178.72], [1442437632.0, 86880000.0, 178.76000000000002], [1442439360.0, 86900000.0, 178.79999999999998], [1442441088.0, 86920000.0, 178.84], [1442442816.0, 86940000.0, 178.88000000000002], [1442444544.0, 86960000.0, 178.92], [1442446272.0, 86980000.0, 178.96], [1442448000.0, 87000000.0, 179.0], [1442449728.0, 87020000.0, 179.04], [1442451456.0, 87040000.0, 179.08], [1442453184.0, 87060000.0, 179.12], [1442454912.0, 87080000.0, 179.16], [1442456640.0, 87100000.0, 179.20000000000002], [1442458368.0, 87120000.0, 179.24], [1442460096.0, 87140000.0, 179.28], [1442461824.0, 87160000.0, 179.32], [1442463552.0, 87180000.0, 179.36], [1442465280.0, 87200000.0, 179.4], [1442467008.0, 87220000.0, 179.44], [1442468736.0, 87240000.0, 179.48000000000002], [1442470464.0, 87260000.0, 179.52], [1442472192.0, 
87280000.0, 179.56], [1442473920.0, 87300000.0, 179.60000000000002], [1442475648.0, 87320000.0, 179.64], [1442477376.0, 87340000.0, 179.68], [1442479104.0, 87360000.0, 179.72000000000003], [1442480832.0, 87380000.0, 179.76], [1442482560.0, 87400000.0, 179.8], [1442484288.0, 87420000.0, 179.84000000000003], [1442486016.0, 87440000.0, 179.88], [1442487744.0, 87460000.0, 179.92000000000002], [1442489472.0, 87480000.0, 179.95999999999998], [1442491200.0, 87500000.0, 180.0], [1442492928.0, 87520000.0, 180.04000000000002], [1442494656.0, 87540000.0, 180.07999999999998], [1442496384.0, 87560000.0, 180.12], [1442498112.0, 87580000.0, 180.15999999999997], [1442499840.0, 87600000.0, 180.2], [1442501568.0, 87620000.0, 180.24], [1442503296.0, 87640000.0, 180.27999999999997], [1442505024.0, 87660000.0, 180.32], [1442506752.0, 87680000.0, 180.36], [1442508480.0, 87700000.0, 180.39999999999998], [1442510208.0, 87720000.0, 180.44], [1442511936.0, 87740000.0, 180.48], [1442513664.0, 87760000.0, 180.51999999999998], [1442515392.0, 87780000.0, 180.56], [1442517120.0, 87800000.0, 180.6], [1442518848.0, 87820000.0, 180.64], [1442520576.0, 87840000.0, 180.68], [1442522304.0, 87860000.0, 180.72], [1442524032.0, 87880000.0, 180.76], [1442525760.0, 87900000.0, 180.79999999999998], [1442527488.0, 87920000.0, 180.84], [1442529216.0, 87940000.0, 180.88], [1442530944.0, 87960000.0, 180.92], [1442532672.0, 87980000.0, 180.96], [1442534400.0, 88000000.0, 181.0], [1442536128.0, 88020000.0, 181.04], [1442537856.0, 88040000.0, 181.08], [1442539584.0, 88060000.0, 181.11999999999998], [1442541312.0, 88080000.0, 181.16], [1442543040.0, 88100000.0, 181.20000000000002], [1442544768.0, 88120000.0, 181.23999999999998], [1442546496.0, 88140000.0, 181.28], [1442548224.0, 88160000.0, 181.32000000000002], [1442549952.0, 88180000.0, 181.35999999999999], [1442551680.0, 88200000.0, 181.4], [1442553408.0, 88220000.0, 181.44], [1442555136.0, 88240000.0, 181.48], [1442556864.0, 88260000.0, 181.52], [1442558592.0, 88280000.0, 181.56], [1442560320.0, 88300000.0, 181.6], [1442562048.0, 88320000.0, 181.64000000000001], [1442563776.0, 88340000.0, 181.68], [1442565504.0, 88360000.0, 181.72], [1442567232.0, 88380000.0, 181.76], [1442568960.0, 88400000.0, 181.8], [1442570688.0, 88420000.0, 181.84], [1442572416.0, 88440000.0, 181.88], [1442574144.0, 88460000.0, 181.92000000000002], [1442575872.0, 88480000.0, 181.96], [1442577600.0, 88500000.0, 182.0], [1442579328.0, 88520000.0, 182.04000000000002], [1442581056.0, 88540000.0, 182.07999999999998], [1442582784.0, 88560000.0, 182.12], [1442584512.0, 88580000.0, 182.16000000000003], [1442586240.0, 88600000.0, 182.2], [1442587968.0, 88620000.0, 182.24], [1442589696.0, 88640000.0, 182.28000000000003], [1442591424.0, 88660000.0, 182.32], [1442593152.0, 88680000.0, 182.36], [1442594880.0, 88700000.0, 182.39999999999998], [1442596608.0, 88720000.0, 182.44], [1442598336.0, 88740000.0, 182.48000000000002], [1442600064.0, 88760000.0, 182.51999999999998], [1442601792.0, 88780000.0, 182.56], [1442603520.0, 88800000.0, 182.60000000000002], [1442605248.0, 88820000.0, 182.64], [1442606976.0, 88840000.0, 182.68], [1442608704.0, 88860000.0, 182.71999999999997], [1442610432.0, 88880000.0, 182.76], [1442612160.0, 88900000.0, 182.8], [1442613888.0, 88920000.0, 182.83999999999997], [1442615616.0, 88940000.0, 182.88], [1442617344.0, 88960000.0, 182.92000000000002], [1442619072.0, 88980000.0, 182.95999999999998], [1442620800.0, 89000000.0, 183.0], [1442622528.0, 89020000.0, 183.04], [1442624256.0, 89040000.0, 
183.07999999999998], [1442625984.0, 89060000.0, 183.12], [1442627712.0, 89080000.0, 183.16], [1442629440.0, 89100000.0, 183.2], [1442631168.0, 89120000.0, 183.24], [1442632896.0, 89140000.0, 183.28], [1442634624.0, 89160000.0, 183.32], [1442636352.0, 89180000.0, 183.35999999999999], [1442638080.0, 89200000.0, 183.4], [1442639808.0, 89220000.0, 183.44], [1442641536.0, 89240000.0, 183.48], [1442643264.0, 89260000.0, 183.52], [1442644992.0, 89280000.0, 183.56], [1442646720.0, 89300000.0, 183.6], [1442648448.0, 89320000.0, 183.64000000000001], [1442650176.0, 89340000.0, 183.67999999999998], [1442651904.0, 89360000.0, 183.72], [1442653632.0, 89380000.0, 183.76000000000002], [1442655360.0, 89400000.0, 183.79999999999998], [1442657088.0, 89420000.0, 183.84], [1442658816.0, 89440000.0, 183.88000000000002], [1442660544.0, 89460000.0, 183.92], [1442662272.0, 89480000.0, 183.96], [1442664000.0, 89500000.0, 184.0], [1442665728.0, 89520000.0, 184.04], [1442667456.0, 89540000.0, 184.08], [1442669184.0, 89560000.0, 184.12], [1442670912.0, 89580000.0, 184.16], [1442672640.0, 89600000.0, 184.20000000000002], [1442674368.0, 89620000.0, 184.24], [1442676096.0, 89640000.0, 184.28], [1442677824.0, 89660000.0, 184.32], [1442679552.0, 89680000.0, 184.36], [1442681280.0, 89700000.0, 184.4], [1442683008.0, 89720000.0, 184.44], [1442684736.0, 89740000.0, 184.48000000000002], [1442686464.0, 89760000.0, 184.52], [1442688192.0, 89780000.0, 184.56], [1442689920.0, 89800000.0, 184.60000000000002], [1442691648.0, 89820000.0, 184.64], [1442693376.0, 89840000.0, 184.68], [1442695104.0, 89860000.0, 184.72000000000003], [1442696832.0, 89880000.0, 184.76], [1442698560.0, 89900000.0, 184.8], [1442700288.0, 89920000.0, 184.84000000000003], [1442702016.0, 89940000.0, 184.88], [1442703744.0, 89960000.0, 184.92000000000002], [1442705472.0, 89980000.0, 184.95999999999998], [1442707200.0, 90000000.0, 185.0], [1442708928.0, 90020000.0, 185.04000000000002], [1442710656.0, 90040000.0, 185.07999999999998], [1442712384.0, 90060000.0, 185.12], [1442714112.0, 90080000.0, 185.15999999999997], [1442715840.0, 90100000.0, 185.2], [1442717568.0, 90120000.0, 185.24], [1442719296.0, 90140000.0, 185.27999999999997], [1442721024.0, 90160000.0, 185.32], [1442722752.0, 90180000.0, 185.36], [1442724480.0, 90200000.0, 185.39999999999998], [1442726208.0, 90220000.0, 185.44], [1442727936.0, 90240000.0, 185.48], [1442729664.0, 90260000.0, 185.51999999999998], [1442731392.0, 90280000.0, 185.56], [1442733120.0, 90300000.0, 185.6], [1442734848.0, 90320000.0, 185.64], [1442736576.0, 90340000.0, 185.68], [1442738304.0, 90360000.0, 185.72], [1442740032.0, 90380000.0, 185.76], [1442741760.0, 90400000.0, 185.79999999999998], [1442743488.0, 90420000.0, 185.84], [1442745216.0, 90440000.0, 185.88], [1442746944.0, 90460000.0, 185.92], [1442748672.0, 90480000.0, 185.96], [1442750400.0, 90500000.0, 186.0], [1442752128.0, 90520000.0, 186.04], [1442753856.0, 90540000.0, 186.08], [1442755584.0, 90560000.0, 186.11999999999998], [1442757312.0, 90580000.0, 186.16], [1442759040.0, 90600000.0, 186.20000000000002], [1442760768.0, 90620000.0, 186.23999999999998], [1442762496.0, 90640000.0, 186.28], [1442764224.0, 90660000.0, 186.32000000000002], [1442765952.0, 90680000.0, 186.35999999999999], [1442767680.0, 90700000.0, 186.4], [1442769408.0, 90720000.0, 186.44], [1442771136.0, 90740000.0, 186.48], [1442772864.0, 90760000.0, 186.52], [1442774592.0, 90780000.0, 186.56], [1442776320.0, 90800000.0, 186.6], [1442778048.0, 90820000.0, 186.64000000000001], [1442779776.0, 90840000.0, 
186.68], [1442781504.0, 90860000.0, 186.72], [1442783232.0, 90880000.0, 186.76], [1442784960.0, 90900000.0, 186.8], [1442786688.0, 90920000.0, 186.84], [1442788416.0, 90940000.0, 186.88], [1442790144.0, 90960000.0, 186.92000000000002], [1442791872.0, 90980000.0, 186.96], [1442793600.0, 91000000.0, 187.0], [1442795328.0, 91020000.0, 187.04000000000002], [1442797056.0, 91040000.0, 187.07999999999998], [1442798784.0, 91060000.0, 187.12], [1442800512.0, 91080000.0, 187.16000000000003], [1442802240.0, 91100000.0, 187.2], [1442803968.0, 91120000.0, 187.24], [1442805696.0, 91140000.0, 187.28000000000003], [1442807424.0, 91160000.0, 187.32], [1442809152.0, 91180000.0, 187.36], [1442810880.0, 91200000.0, 187.39999999999998], [1442812608.0, 91220000.0, 187.44], [1442814336.0, 91240000.0, 187.48000000000002], [1442816064.0, 91260000.0, 187.51999999999998], [1442817792.0, 91280000.0, 187.56], [1442819520.0, 91300000.0, 187.60000000000002], [1442821248.0, 91320000.0, 187.64], [1442822976.0, 91340000.0, 187.68], [1442824704.0, 91360000.0, 187.71999999999997], [1442826432.0, 91380000.0, 187.76], [1442828160.0, 91400000.0, 187.8], [1442829888.0, 91420000.0, 187.83999999999997], [1442831616.0, 91440000.0, 187.88], [1442833344.0, 91460000.0, 187.92000000000002], [1442835072.0, 91480000.0, 187.95999999999998], [1442836800.0, 91500000.0, 188.0], [1442838528.0, 91520000.0, 188.04], [1442840256.0, 91540000.0, 188.07999999999998], [1442841984.0, 91560000.0, 188.12], [1442843712.0, 91580000.0, 188.16], [1442845440.0, 91600000.0, 188.2], [1442847168.0, 91620000.0, 188.24], [1442848896.0, 91640000.0, 188.28], [1442850624.0, 91660000.0, 188.32], [1442852352.0, 91680000.0, 188.35999999999999], [1442854080.0, 91700000.0, 188.4], [1442855808.0, 91720000.0, 188.44], [1442857536.0, 91740000.0, 188.48], [1442859264.0, 91760000.0, 188.52], [1442860992.0, 91780000.0, 188.56], [1442862720.0, 91800000.0, 188.6], [1442864448.0, 91820000.0, 188.64000000000001], [1442866176.0, 91840000.0, 188.67999999999998], [1442867904.0, 91860000.0, 188.72], [1442869632.0, 91880000.0, 188.76000000000002], [1442871360.0, 91900000.0, 188.79999999999998], [1442873088.0, 91920000.0, 188.84], [1442874816.0, 91940000.0, 188.88000000000002], [1442876544.0, 91960000.0, 188.92], [1442878272.0, 91980000.0, 188.96], [1442880000.0, 92000000.0, 189.0], [1442881728.0, 92020000.0, 189.04], [1442883456.0, 92040000.0, 189.08], [1442885184.0, 92060000.0, 189.12], [1442886912.0, 92080000.0, 189.16], [1442888640.0, 92100000.0, 189.20000000000002], [1442890368.0, 92120000.0, 189.24], [1442892096.0, 92140000.0, 189.28], [1442893824.0, 92160000.0, 189.32], [1442895552.0, 92180000.0, 189.36], [1442897280.0, 92200000.0, 189.4], [1442899008.0, 92220000.0, 189.44], [1442900736.0, 92240000.0, 189.48000000000002], [1442902464.0, 92260000.0, 189.52], [1442904192.0, 92280000.0, 189.56], [1442905920.0, 92300000.0, 189.60000000000002], [1442907648.0, 92320000.0, 189.64], [1442909376.0, 92340000.0, 189.68], [1442911104.0, 92360000.0, 189.72000000000003], [1442912832.0, 92380000.0, 189.76], [1442914560.0, 92400000.0, 189.8], [1442916288.0, 92420000.0, 189.84000000000003], [1442918016.0, 92440000.0, 189.88], [1442919744.0, 92460000.0, 189.92000000000002], [1442921472.0, 92480000.0, 189.95999999999998], [1442923200.0, 92500000.0, 190.0], [1442924928.0, 92520000.0, 190.04000000000002], [1442926656.0, 92540000.0, 190.07999999999998], [1442928384.0, 92560000.0, 190.12], [1442930112.0, 92580000.0, 190.15999999999997], [1442931840.0, 92600000.0, 190.2], [1442933568.0, 92620000.0, 
190.24], [1442935296.0, 92640000.0, 190.27999999999997], [1442937024.0, 92660000.0, 190.32], [1442938752.0, 92680000.0, 190.36], [1442940480.0, 92700000.0, 190.39999999999998], [1442942208.0, 92720000.0, 190.44], [1442943936.0, 92740000.0, 190.48], [1442945664.0, 92760000.0, 190.51999999999998], [1442947392.0, 92780000.0, 190.56], [1442949120.0, 92800000.0, 190.6], [1442950848.0, 92820000.0, 190.64], [1442952576.0, 92840000.0, 190.68], [1442954304.0, 92860000.0, 190.72], [1442956032.0, 92880000.0, 190.76], [1442957760.0, 92900000.0, 190.79999999999998], [1442959488.0, 92920000.0, 190.84], [1442961216.0, 92940000.0, 190.88], [1442962944.0, 92960000.0, 190.92], [1442964672.0, 92980000.0, 190.96], [1442966400.0, 93000000.0, 191.0], [1442968128.0, 93020000.0, 191.04], [1442969856.0, 93040000.0, 191.08], [1442971584.0, 93060000.0, 191.11999999999998], [1442973312.0, 93080000.0, 191.16], [1442975040.0, 93100000.0, 191.20000000000002], [1442976768.0, 93120000.0, 191.23999999999998], [1442978496.0, 93140000.0, 191.28], [1442980224.0, 93160000.0, 191.32000000000002], [1442981952.0, 93180000.0, 191.35999999999999], [1442983680.0, 93200000.0, 191.4], [1442985408.0, 93220000.0, 191.44], [1442987136.0, 93240000.0, 191.48], [1442988864.0, 93260000.0, 191.52], [1442990592.0, 93280000.0, 191.56], [1442992320.0, 93300000.0, 191.6], [1442994048.0, 93320000.0, 191.64000000000001], [1442995776.0, 93340000.0, 191.68], [1442997504.0, 93360000.0, 191.72], [1442999232.0, 93380000.0, 191.76], [1443000960.0, 93400000.0, 191.8], [1443002688.0, 93420000.0, 191.84], [1443004416.0, 93440000.0, 191.88], [1443006144.0, 93460000.0, 191.92000000000002], [1443007872.0, 93480000.0, 191.96], [1443009600.0, 93500000.0, 192.0], [1443011328.0, 93520000.0, 192.04000000000002], [1443013056.0, 93540000.0, 192.07999999999998], [1443014784.0, 93560000.0, 192.12], [1443016512.0, 93580000.0, 192.16000000000003], [1443018240.0, 93600000.0, 192.2], [1443019968.0, 93620000.0, 192.24], [1443021696.0, 93640000.0, 192.28000000000003], [1443023424.0, 93660000.0, 192.32], [1443025152.0, 93680000.0, 192.36], [1443026880.0, 93700000.0, 192.39999999999998], [1443028608.0, 93720000.0, 192.44], [1443030336.0, 93740000.0, 192.48000000000002], [1443032064.0, 93760000.0, 192.51999999999998], [1443033792.0, 93780000.0, 192.56], [1443035520.0, 93800000.0, 192.60000000000002], [1443037248.0, 93820000.0, 192.64], [1443038976.0, 93840000.0, 192.68], [1443040704.0, 93860000.0, 192.71999999999997], [1443042432.0, 93880000.0, 192.76], [1443044160.0, 93900000.0, 192.8], [1443045888.0, 93920000.0, 192.83999999999997], [1443047616.0, 93940000.0, 192.88], [1443049344.0, 93960000.0, 192.92000000000002], [1443051072.0, 93980000.0, 192.95999999999998], [1443052800.0, 94000000.0, 193.0], [1443054528.0, 94020000.0, 193.04], [1443056256.0, 94040000.0, 193.07999999999998], [1443057984.0, 94060000.0, 193.12], [1443059712.0, 94080000.0, 193.16], [1443061440.0, 94100000.0, 193.2], [1443063168.0, 94120000.0, 193.24], [1443064896.0, 94140000.0, 193.28], [1443066624.0, 94160000.0, 193.32], [1443068352.0, 94180000.0, 193.35999999999999], [1443070080.0, 94200000.0, 193.4], [1443071808.0, 94220000.0, 193.44], [1443073536.0, 94240000.0, 193.48], [1443075264.0, 94260000.0, 193.52], [1443076992.0, 94280000.0, 193.56], [1443078720.0, 94300000.0, 193.6], [1443080448.0, 94320000.0, 193.64000000000001], [1443082176.0, 94340000.0, 193.67999999999998], [1443083904.0, 94360000.0, 193.72], [1443085632.0, 94380000.0, 193.76000000000002], [1443087360.0, 94400000.0, 193.79999999999998], 
[1443089088.0, 94420000.0, 193.84], [1443090816.0, 94440000.0, 193.88000000000002], [1443092544.0, 94460000.0, 193.92], [1443094272.0, 94480000.0, 193.96], [1443096000.0, 94500000.0, 194.0], [1443097728.0, 94520000.0, 194.04], [1443099456.0, 94540000.0, 194.08], [1443101184.0, 94560000.0, 194.12], [1443102912.0, 94580000.0, 194.16], [1443104640.0, 94600000.0, 194.20000000000002], [1443106368.0, 94620000.0, 194.24], [1443108096.0, 94640000.0, 194.28], [1443109824.0, 94660000.0, 194.32], [1443111552.0, 94680000.0, 194.36], [1443113280.0, 94700000.0, 194.4], [1443115008.0, 94720000.0, 194.44], [1443116736.0, 94740000.0, 194.48000000000002], [1443118464.0, 94760000.0, 194.52], [1443120192.0, 94780000.0, 194.56], [1443121920.0, 94800000.0, 194.60000000000002], [1443123648.0, 94820000.0, 194.64], [1443125376.0, 94840000.0, 194.68], [1443127104.0, 94860000.0, 194.72000000000003], [1443128832.0, 94880000.0, 194.76], [1443130560.0, 94900000.0, 194.8], [1443132288.0, 94920000.0, 194.84000000000003], [1443134016.0, 94940000.0, 194.88], [1443135744.0, 94960000.0, 194.92000000000002], [1443137472.0, 94980000.0, 194.95999999999998], [1443139200.0, 95000000.0, 195.0], [1443140928.0, 95020000.0, 195.04000000000002], [1443142656.0, 95040000.0, 195.07999999999998], [1443144384.0, 95060000.0, 195.12], [1443146112.0, 95080000.0, 195.15999999999997], [1443147840.0, 95100000.0, 195.2], [1443149568.0, 95120000.0, 195.24], [1443151296.0, 95140000.0, 195.27999999999997], [1443153024.0, 95160000.0, 195.32], [1443154752.0, 95180000.0, 195.36], [1443156480.0, 95200000.0, 195.39999999999998], [1443158208.0, 95220000.0, 195.44], [1443159936.0, 95240000.0, 195.48], [1443161664.0, 95260000.0, 195.51999999999998], [1443163392.0, 95280000.0, 195.56], [1443165120.0, 95300000.0, 195.6], [1443166848.0, 95320000.0, 195.64], [1443168576.0, 95340000.0, 195.68], [1443170304.0, 95360000.0, 195.72], [1443172032.0, 95380000.0, 195.76], [1443173760.0, 95400000.0, 195.79999999999998], [1443175488.0, 95420000.0, 195.84], [1443177216.0, 95440000.0, 195.88], [1443178944.0, 95460000.0, 195.92], [1443180672.0, 95480000.0, 195.96], [1443182400.0, 95500000.0, 196.0], [1443184128.0, 95520000.0, 196.04], [1443185856.0, 95540000.0, 196.08], [1443187584.0, 95560000.0, 196.11999999999998], [1443189312.0, 95580000.0, 196.16], [1443191040.0, 95600000.0, 196.20000000000002], [1443192768.0, 95620000.0, 196.23999999999998], [1443194496.0, 95640000.0, 196.28], [1443196224.0, 95660000.0, 196.32000000000002], [1443197952.0, 95680000.0, 196.35999999999999], [1443199680.0, 95700000.0, 196.4], [1443201408.0, 95720000.0, 196.44], [1443203136.0, 95740000.0, 196.48], [1443204864.0, 95760000.0, 196.52], [1443206592.0, 95780000.0, 196.56], [1443208320.0, 95800000.0, 196.6], [1443210048.0, 95820000.0, 196.64000000000001], [1443211776.0, 95840000.0, 196.68], [1443213504.0, 95860000.0, 196.72], [1443215232.0, 95880000.0, 196.76], [1443216960.0, 95900000.0, 196.8], [1443218688.0, 95920000.0, 196.84], [1443220416.0, 95940000.0, 196.88], [1443222144.0, 95960000.0, 196.92000000000002], [1443223872.0, 95980000.0, 196.96], [1443225600.0, 96000000.0, 197.0], [1443227328.0, 96020000.0, 197.04000000000002], [1443229056.0, 96040000.0, 197.07999999999998], [1443230784.0, 96060000.0, 197.12], [1443232512.0, 96080000.0, 197.16000000000003], [1443234240.0, 96100000.0, 197.2], [1443235968.0, 96120000.0, 197.24], [1443237696.0, 96140000.0, 197.28000000000003], [1443239424.0, 96160000.0, 197.32], [1443241152.0, 96180000.0, 197.36], [1443242880.0, 96200000.0, 197.39999999999998], 
[1443244608.0, 96220000.0, 197.44], [1443246336.0, 96240000.0, 197.48000000000002], [1443248064.0, 96260000.0, 197.51999999999998], [1443249792.0, 96280000.0, 197.56], [1443251520.0, 96300000.0, 197.60000000000002], [1443253248.0, 96320000.0, 197.64], [1443254976.0, 96340000.0, 197.68], [1443256704.0, 96360000.0, 197.71999999999997], [1443258432.0, 96380000.0, 197.76], [1443260160.0, 96400000.0, 197.8], [1443261888.0, 96420000.0, 197.83999999999997], [1443263616.0, 96440000.0, 197.88], [1443265344.0, 96460000.0, 197.92000000000002], [1443267072.0, 96480000.0, 197.95999999999998], [1443268800.0, 96500000.0, 198.0], [1443270528.0, 96520000.0, 198.04], [1443272256.0, 96540000.0, 198.07999999999998], [1443273984.0, 96560000.0, 198.12], [1443275712.0, 96580000.0, 198.16], [1443277440.0, 96600000.0, 198.2], [1443279168.0, 96620000.0, 198.24], [1443280896.0, 96640000.0, 198.28], [1443282624.0, 96660000.0, 198.32], [1443284352.0, 96680000.0, 198.35999999999999], [1443286080.0, 96700000.0, 198.4], [1443287808.0, 96720000.0, 198.44], [1443289536.0, 96740000.0, 198.48], [1443291264.0, 96760000.0, 198.52], [1443292992.0, 96780000.0, 198.56], [1443294720.0, 96800000.0, 198.6], [1443296448.0, 96820000.0, 198.64000000000001], [1443298176.0, 96840000.0, 198.67999999999998], [1443299904.0, 96860000.0, 198.72], [1443301632.0, 96880000.0, 198.76000000000002], [1443303360.0, 96900000.0, 198.79999999999998], [1443305088.0, 96920000.0, 198.84], [1443306816.0, 96940000.0, 198.88000000000002], [1443308544.0, 96960000.0, 198.92], [1443310272.0, 96980000.0, 198.96], [1443312000.0, 97000000.0, 199.0], [1443313728.0, 97020000.0, 199.04], [1443315456.0, 97040000.0, 199.08], [1443317184.0, 97060000.0, 199.12], [1443318912.0, 97080000.0, 199.16], [1443320640.0, 97100000.0, 199.20000000000002], [1443322368.0, 97120000.0, 199.24], [1443324096.0, 97140000.0, 199.28], [1443325824.0, 97160000.0, 199.32], [1443327552.0, 97180000.0, 199.36], [1443329280.0, 97200000.0, 199.4], [1443331008.0, 97220000.0, 199.44], [1443332736.0, 97240000.0, 199.48000000000002], [1443334464.0, 97260000.0, 199.52], [1443336192.0, 97280000.0, 199.56], [1443337920.0, 97300000.0, 199.60000000000002], [1443339648.0, 97320000.0, 199.64], [1443341376.0, 97340000.0, 199.68], [1443343104.0, 97360000.0, 199.72000000000003], [1443344832.0, 97380000.0, 199.76], [1443346560.0, 97400000.0, 199.8], [1443348288.0, 97420000.0, 199.84000000000003], [1443350016.0, 97440000.0, 199.88], [1443351744.0, 97460000.0, 199.92000000000002], [1443353472.0, 97480000.0, 199.95999999999998], [1443355200.0, 97500000.0, 200.0], [1443356928.0, 97520000.0, 200.04000000000002], [1443358656.0, 97540000.0, 200.07999999999998], [1443360384.0, 97560000.0, 200.12], [1443362112.0, 97580000.0, 200.15999999999997], [1443363840.0, 97600000.0, 200.2], [1443365568.0, 97620000.0, 200.24], [1443367296.0, 97640000.0, 200.27999999999997], [1443369024.0, 97660000.0, 200.32], [1443370752.0, 97680000.0, 200.36], [1443372480.0, 97700000.0, 200.39999999999998], [1443374208.0, 97720000.0, 200.44], [1443375936.0, 97740000.0, 200.48], [1443377664.0, 97760000.0, 200.51999999999998], [1443379392.0, 97780000.0, 200.56], [1443381120.0, 97800000.0, 200.6], [1443382848.0, 97820000.0, 200.64], [1443384576.0, 97840000.0, 200.68], [1443386304.0, 97860000.0, 200.72], [1443388032.0, 97880000.0, 200.76], [1443389760.0, 97900000.0, 200.79999999999998], [1443391488.0, 97920000.0, 200.84], [1443393216.0, 97940000.0, 200.88], [1443394944.0, 97960000.0, 200.92], [1443396672.0, 97980000.0, 200.96], [1443398400.0, 
98000000.0, 201.0], [1443400128.0, 98020000.0, 201.04], [1443401856.0, 98040000.0, 201.08], [1443403584.0, 98060000.0, 201.11999999999998], [1443405312.0, 98080000.0, 201.16], [1443407040.0, 98100000.0, 201.20000000000002], [1443408768.0, 98120000.0, 201.23999999999998], [1443410496.0, 98140000.0, 201.28], [1443412224.0, 98160000.0, 201.32000000000002], [1443413952.0, 98180000.0, 201.35999999999999], [1443415680.0, 98200000.0, 201.4], [1443417408.0, 98220000.0, 201.44], [1443419136.0, 98240000.0, 201.48], [1443420864.0, 98260000.0, 201.52], [1443422592.0, 98280000.0, 201.56], [1443424320.0, 98300000.0, 201.6], [1443426048.0, 98320000.0, 201.64000000000001], [1443427776.0, 98340000.0, 201.68], [1443429504.0, 98360000.0, 201.72], [1443431232.0, 98380000.0, 201.76], [1443432960.0, 98400000.0, 201.8], [1443434688.0, 98420000.0, 201.84], [1443436416.0, 98440000.0, 201.88], [1443438144.0, 98460000.0, 201.92000000000002], [1443439872.0, 98480000.0, 201.96], [1443441600.0, 98500000.0, 202.0], [1443443328.0, 98520000.0, 202.04000000000002], [1443445056.0, 98540000.0, 202.07999999999998], [1443446784.0, 98560000.0, 202.12], [1443448512.0, 98580000.0, 202.16000000000003], [1443450240.0, 98600000.0, 202.2], [1443451968.0, 98620000.0, 202.24], [1443453696.0, 98640000.0, 202.28000000000003], [1443455424.0, 98660000.0, 202.32], [1443457152.0, 98680000.0, 202.36], [1443458880.0, 98700000.0, 202.39999999999998], [1443460608.0, 98720000.0, 202.44], [1443462336.0, 98740000.0, 202.48000000000002], [1443464064.0, 98760000.0, 202.51999999999998], [1443465792.0, 98780000.0, 202.56], [1443467520.0, 98800000.0, 202.60000000000002], [1443469248.0, 98820000.0, 202.64], [1443470976.0, 98840000.0, 202.68], [1443472704.0, 98860000.0, 202.71999999999997], [1443474432.0, 98880000.0, 202.76], [1443476160.0, 98900000.0, 202.8], [1443477888.0, 98920000.0, 202.83999999999997], [1443479616.0, 98940000.0, 202.88], [1443481344.0, 98960000.0, 202.92000000000002], [1443483072.0, 98980000.0, 202.95999999999998], [1443484800.0, 99000000.0, 203.0], [1443486528.0, 99020000.0, 203.04], [1443488256.0, 99040000.0, 203.07999999999998], [1443489984.0, 99060000.0, 203.12], [1443491712.0, 99080000.0, 203.16], [1443493440.0, 99100000.0, 203.2], [1443495168.0, 99120000.0, 203.24], [1443496896.0, 99140000.0, 203.28], [1443498624.0, 99160000.0, 203.32], [1443500352.0, 99180000.0, 203.35999999999999], [1443502080.0, 99200000.0, 203.4], [1443503808.0, 99220000.0, 203.44], [1443505536.0, 99240000.0, 203.48], [1443507264.0, 99260000.0, 203.52], [1443508992.0, 99280000.0, 203.56], [1443510720.0, 99300000.0, 203.6], [1443512448.0, 99320000.0, 203.64000000000001], [1443514176.0, 99340000.0, 203.67999999999998], [1443515904.0, 99360000.0, 203.72], [1443517632.0, 99380000.0, 203.76000000000002], [1443519360.0, 99400000.0, 203.79999999999998], [1443521088.0, 99420000.0, 203.84], [1443522816.0, 99440000.0, 203.88000000000002], [1443524544.0, 99460000.0, 203.92], [1443526272.0, 99480000.0, 203.96], [1443528000.0, 99500000.0, 204.0], [1443529728.0, 99520000.0, 204.04], [1443531456.0, 99540000.0, 204.08], [1443533184.0, 99560000.0, 204.12], [1443534912.0, 99580000.0, 204.16], [1443536640.0, 99600000.0, 204.20000000000002], [1443538368.0, 99620000.0, 204.24], [1443540096.0, 99640000.0, 204.28], [1443541824.0, 99660000.0, 204.32], [1443543552.0, 99680000.0, 204.36], [1443545280.0, 99700000.0, 204.4], [1443547008.0, 99720000.0, 204.44], [1443548736.0, 99740000.0, 204.48000000000002], [1443550464.0, 99760000.0, 204.52], [1443552192.0, 99780000.0, 204.56], 
[1443553920.0, 99800000.0, 204.60000000000002], [1443555648.0, 99820000.0, 204.64], [1443557376.0, 99840000.0, 204.68], [1443559104.0, 99860000.0, 204.72000000000003], [1443560832.0, 99880000.0, 204.76], [1443562560.0, 99900000.0, 204.8], [1443564288.0, 99920000.0, 204.84000000000003], [1443566016.0, 99940000.0, 204.88], [1443567744.0, 99960000.0, 204.92000000000002], [1443569472.0, 99980000.0, 204.95999999999998]] \ No newline at end of file
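Note: the demo data files in this commit, including poly5.json added below, are plain JSON arrays of [wall_time, step, value] triples (wall time in Unix seconds, training step, scalar value); in these series each point advances the step by 20000 and the wall time by 1728 seconds. A minimal, hypothetical Python sketch that writes data of the same shape follows; the start time and spacing are read off the files here, while the value function is only an illustrative placeholder, not the generator actually used for these files.

    # Hypothetical sketch: emit demo scalar data shaped like the files in this commit,
    # i.e. a JSON array of [wall_time_seconds, step, value] triples.
    import json

    START_TIME = 1434931200.0   # first wall time appearing in poly5.json below
    STEP_DELTA = 20000          # step spacing observed in the data
    TIME_DELTA = 1728.0         # seconds between consecutive points, as observed

    def make_series(num_points, value_fn):
        """Return [[wall_time, step, value], ...] in the demo-data layout."""
        series = []
        for i in range(num_points):
            step = i * STEP_DELTA
            wall_time = START_TIME + i * TIME_DELTA
            series.append([wall_time, float(step), value_fn(step)])
        return series

    if __name__ == "__main__":
        # Placeholder value function (NOT the polynomial behind poly5.json).
        data = make_series(5000, lambda s: (s * 1e-7) ** 5)
        with open("demo.json", "w") as f:
            json.dump(data, f)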
diff --git a/tensorflow/tensorboard/app/demo/data/poly5-graph.pbtxt b/tensorflow/tensorboard/app/demo/data/poly5-graph.pbtxt
new file mode 100644
index 0000000000..5bf8834752
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/poly5-graph.pbtxt
@@ -0,0 +1,14 @@
+node {
+ name: "A"
+ op: "Input"
+}
+node {
+ name: "B"
+ op: "Input"
+}
+node {
+ name: "C"
+ op: "MatMul"
+ input: "A"
+ input: "B"
+}
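The pbtxt added above is a GraphDef written in protobuf text format: two nodes A and B with op "Input" feed a MatMul node C through its input fields. As a sketch of how such a file could be read back, the standard protobuf text-format API can parse it into a GraphDef message; the graph_pb2 import path below is assumed from the TensorFlow Python package layout rather than taken from this diff.

    # Minimal sketch: parse a text-format GraphDef like poly5-graph.pbtxt.
    from google.protobuf import text_format
    from tensorflow.core.framework import graph_pb2  # assumed import path

    def load_graph_def(path):
        graph_def = graph_pb2.GraphDef()
        with open(path) as f:
            text_format.Merge(f.read(), graph_def)
        return graph_def

    gd = load_graph_def("tensorflow/tensorboard/app/demo/data/poly5-graph.pbtxt")
    for node in gd.node:
        print(node.name, node.op, list(node.input))
    # Expected output for the file above:
    #   A Input []
    #   B Input []
    #   C MatMul ['A', 'B']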
diff --git a/tensorflow/tensorboard/app/demo/data/poly5.json b/tensorflow/tensorboard/app/demo/data/poly5.json
new file mode 100644
index 0000000000..bd885f3be9
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/poly5.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 0.0], [1434932928.0, 20000.0, 1.9960079840320003e-06], [1434934656.0, 40000.0, 3.9840637450240005e-06], [1434936384.0, 60000.0, 5.964214711776e-06], [1434938112.0, 80000.0, 7.936507936768e-06], [1434939840.0, 100000.0, 9.9009901e-06], [1434941568.0, 120000.0, 1.1857707512831999e-05], [1434943296.0, 140000.0, 1.3806706121824001e-05], [1434945024.0, 160000.0, 1.5748031512576e-05], [1434946752.0, 180000.0, 1.7681728913568e-05], [1434948480.0, 200000.0, 1.96078432e-05], [1434950208.0, 220000.0, 2.1526418897632e-05], [1434951936.0, 240000.0, 2.3437500186624e-05], [1434953664.0, 260000.0, 2.5341130905376e-05], [1434955392.0, 280000.0, 2.7237354554368002e-05], [1434957120.0, 300000.0, 2.91262143e-05], [1434958848.0, 320000.0, 3.1007752978432e-05], [1434960576.0, 340000.0, 3.2882013099424006e-05], [1434962304.0, 360000.0, 3.4749036850175996e-05], [1434964032.0, 380000.0, 3.6608866099167997e-05], [1434965760.0, 400000.0, 3.84615424e-05], [1434967488.0, 420000.0, 4.0307106995232e-05], [1434969216.0, 440000.0, 4.2145600820224e-05], [1434970944.0, 460000.0, 4.3977064506975996e-05], [1434972672.0, 480000.0, 4.5801538387968e-05], [1434974400.0, 500000.0, 4.76190625e-05], [1434976128.0, 520000.0, 4.9429676588031996e-05], [1434977856.0, 540000.0, 5.1233420109024e-05], [1434979584.0, 560000.0, 5.3030332235776005e-05], [1434981312.0, 580000.0, 5.482045186076801e-05], [1434983040.0, 600000.0, 5.66038176e-05], [1434984768.0, 620000.0, 5.8380467796832e-05], [1434986496.0, 640000.0, 6.0150440525824e-05], [1434988224.0, 660000.0, 6.1913773596576e-05], [1434989952.0, 680000.0, 6.367050455756801e-05], [1434991680.0, 700000.0, 6.542067070000001e-05], [1434993408.0, 720000.0, 6.7164309061632e-05], [1434995136.0, 740000.0, 6.890145643062399e-05], [1434996864.0, 760000.0, 7.0632149349376e-05], [1434998592.0, 780000.0, 7.2356424118368e-05], [1435000320.0, 800000.0, 7.407431679999999e-05], [1435002048.0, 820000.0, 7.5785863222432e-05], [1435003776.0, 840000.0, 7.7491098983424e-05], [1435005504.0, 860000.0, 7.919005945417599e-05], [1435007232.0, 880000.0, 8.088277978316798e-05], [1435008960.0, 900000.0, 8.25692949e-05], [1435010688.0, 920000.0, 8.4249639519232e-05], [1435012416.0, 940000.0, 8.5923848144224e-05], [1435014144.0, 960000.0, 8.7591955070976e-05], [1435015872.0, 980000.0, 8.9253994391968e-05], [1435017600.0, 1000000.0, 9.091000000000001e-05], [1435019328.0, 1020000.0, 9.2560005592032e-05], [1435021056.0, 1040000.0, 9.4204044673024e-05], [1435022784.0, 1060000.0, 9.584215055977599e-05], [1435024512.0, 1080000.0, 9.7474356384768e-05], [1435026240.0, 1100000.0, 9.91006951e-05], [1435027968.0, 1120000.0, 0.000100721199480832], [1435029696.0, 1140000.0, 0.00010233590212982401], [1435031424.0, 1160000.0, 0.000103944835480576], [1435033152.0, 1180000.0, 0.000105548031801568], [1435034880.0, 1200000.0, 0.0001071455232], [1435036608.0, 1220000.0, 0.000108737341625632], [1435038336.0, 1240000.0, 0.00011032351887462399], [1435040064.0, 1260000.0, 0.000111904086593376], [1435041792.0, 1280000.0, 0.000113479076282368], [1435043520.0, 1300000.0, 0.0001150485193], [1435045248.0, 1320000.0, 0.000116612446866432], [1435046976.0, 1340000.0, 0.00011817089006742401], [1435048704.0, 1360000.0, 0.000119723879858176], [1435050432.0, 1380000.0, 0.000121271447067168], [1435052160.0, 1400000.0, 0.00012281362240000002], [1435053888.0, 1420000.0, 0.000124350436443232], [1435055616.0, 1440000.0, 0.00012588191966822398], [1435057344.0, 1460000.0, 0.000127408102434976], [1435059072.0, 1480000.0, 0.000128929014995968], 
[1435060800.0, 1500000.0, 0.00013044468749999998], [1435062528.0, 1520000.0, 0.000131955149996032], [1435064256.0, 1540000.0, 0.000133460432437024], [1435065984.0, 1560000.0, 0.000134960564683776], [1435067712.0, 1580000.0, 0.000136455576508768], [1435069440.0, 1600000.0, 0.0001379454976], [1435071168.0, 1620000.0, 0.00013943035756483203], [1435072896.0, 1640000.0, 0.000140910185933824], [1435074624.0, 1660000.0, 0.000142385012164576], [1435076352.0, 1680000.0, 0.000143854865645568], [1435078080.0, 1700000.0, 0.0001453197757], [1435079808.0, 1720000.0, 0.000146779771589632], [1435081536.0, 1740000.0, 0.00014823488251862398], [1435083264.0, 1760000.0, 0.00014968513763737598], [1435084992.0, 1780000.0, 0.00015113056604636798], [1435086720.0, 1800000.0, 0.00015257119680000002], [1435088448.0, 1820000.0, 0.00015400705891043198], [1435090176.0, 1840000.0, 0.00015543818135142402], [1435091904.0, 1860000.0, 0.000156864593062176], [1435093632.0, 1880000.0, 0.000158286322951168], [1435095360.0, 1900000.0, 0.0001597033999], [1435097088.0, 1920000.0, 0.000161115852767232], [1435098816.0, 1940000.0, 0.00016252371039222403], [1435100544.0, 1960000.0, 0.000163927001598976], [1435102272.0, 1980000.0, 0.000165325755199968], [1435104000.0, 2000000.0, 0.00016672], [1435105728.0, 2020000.0, 0.000168109764800032], [1435107456.0, 2040000.0, 0.00016949507840102398], [1435109184.0, 2060000.0, 0.00017087596960777598], [1435110912.0, 2080000.0, 0.00017225246723276798], [1435112640.0, 2100000.0, 0.0001736246001], [1435114368.0, 2120000.0, 0.000174992397048832], [1435116096.0, 2140000.0, 0.000176355886937824], [1435117824.0, 2160000.0, 0.000177715098648576], [1435119552.0, 2180000.0, 0.000179070061089568], [1435121280.0, 2200000.0, 0.0001804208032], [1435123008.0, 2220000.0, 0.000181767353953632], [1435124736.0, 2240000.0, 0.000183109742362624], [1435126464.0, 2260000.0, 0.000184447997481376], [1435128192.0, 2280000.0, 0.000185782148410368], [1435129920.0, 2300000.0, 0.0001871122243], [1435131648.0, 2320000.0, 0.00018843825435443202], [1435133376.0, 2340000.0, 0.00018976026783542398], [1435135104.0, 2360000.0, 0.000191078294066176], [1435136832.0, 2380000.0, 0.000192392362435168], [1435138560.0, 2400000.0, 0.0001937025024], [1435140288.0, 2420000.0, 0.00019500874349123197], [1435142016.0, 2440000.0, 0.000196311115316224], [1435143744.0, 2460000.0, 0.000197609647562976], [1435145472.0, 2480000.0, 0.000198904370003968], [1435147200.0, 2500000.0, 0.0002001953125], [1435148928.0, 2520000.0, 0.000201482505004032], [1435150656.0, 2540000.0, 0.000202765977565024], [1435152384.0, 2560000.0, 0.000204045760331776], [1435154112.0, 2580000.0, 0.00020532188355676801], [1435155840.0, 2600000.0, 0.00020659437759999998], [1435157568.0, 2620000.0, 0.000207863272932832], [1435159296.0, 2640000.0, 0.000209128600141824], [1435161024.0, 2660000.0, 0.000210390389932576], [1435162752.0, 2680000.0, 0.000211648673133568], [1435164480.0, 2700000.0, 0.0002129034807], [1435166208.0, 2720000.0, 0.000214154843717632], [1435167936.0, 2740000.0, 0.000215402793406624], [1435169664.0, 2760000.0, 0.00021664736112537604], [1435171392.0, 2780000.0, 0.000217888578374368], [1435173120.0, 2800000.0, 0.00021912647680000002], [1435174848.0, 2820000.0, 0.00022036108819843198], [1435176576.0, 2840000.0, 0.00022159244451942398], [1435178304.0, 2860000.0, 0.000222820577870176], [1435180032.0, 2880000.0, 0.000224045520519168], [1435181760.0, 2900000.0, 0.00022526730489999997], [1435183488.0, 2920000.0, 0.000226485963615232], [1435185216.0, 2940000.0, 
0.00022770152944022397], [1435186944.0, 2960000.0, 0.000228914035326976], [1435188672.0, 2980000.0, 0.00023012351440796798], [1435190400.0, 3000000.0, 0.00023132999999999997], [1435192128.0, 3020000.0, 0.00023253352560803197], [1435193856.0, 3040000.0, 0.000233734124929024], [1435195584.0, 3060000.0, 0.00023493183185577598], [1435197312.0, 3080000.0, 0.00023612668048076797], [1435199040.0, 3100000.0, 0.0002373187051], [1435200768.0, 3120000.0, 0.000238507940216832], [1435202496.0, 3140000.0, 0.000239694420545824], [1435204224.0, 3160000.0, 0.00024087818101657598], [1435205952.0, 3180000.0, 0.00024205925677756802], [1435207680.0, 3200000.0, 0.0002432376832], [1435209408.0, 3220000.0, 0.000244413495881632], [1435211136.0, 3240000.0, 0.000245586730650624], [1435212864.0, 3260000.0, 0.000246757423569376], [1435214592.0, 3280000.0, 0.000247925610938368], [1435216320.0, 3300000.0, 0.0002490913293], [1435218048.0, 3320000.0, 0.000250254615442432], [1435219776.0, 3340000.0, 0.00025141550640342403], [1435221504.0, 3360000.0, 0.00025257403947417603], [1435223232.0, 3380000.0, 0.00025373025220316804], [1435224960.0, 3400000.0, 0.0002548841824], [1435226688.0, 3420000.0, 0.000256035868139232], [1435228416.0, 3440000.0, 0.000257185347764224], [1435230144.0, 3460000.0, 0.00025833265989097596], [1435231872.0, 3480000.0, 0.000259477843411968], [1435233600.0, 3500000.0, 0.0002606209375], [1435235328.0, 3520000.0, 0.000261761981612032], [1435237056.0, 3540000.0, 0.00026290101549302403], [1435238784.0, 3560000.0, 0.000264038079179776], [1435240512.0, 3580000.0, 0.00026517321300476796], [1435242240.0, 3600000.0, 0.0002663064576], [1435243968.0, 3620000.0, 0.00026743785390083197], [1435245696.0, 3640000.0, 0.000268567443149824], [1435247424.0, 3660000.0, 0.000269695266900576], [1435249152.0, 3680000.0, 0.000270821367021568], [1435250880.0, 3700000.0, 0.0002719457857], [1435252608.0, 3720000.0, 0.00027306856544563195], [1435254336.0, 3740000.0, 0.000274189749094624], [1435256064.0, 3760000.0, 0.000275309379813376], [1435257792.0, 3780000.0, 0.00027642750110236803], [1435259520.0, 3800000.0, 0.00027754415679999997], [1435261248.0, 3820000.0, 0.000278659391086432], [1435262976.0, 3840000.0, 0.00027977324848742403], [1435264704.0, 3860000.0, 0.000280885773878176], [1435266432.0, 3880000.0, 0.000281997012487168], [1435268160.0, 3900000.0, 0.0002831070099], [1435269888.0, 3920000.0, 0.00028421581206323197], [1435271616.0, 3940000.0, 0.000285323465288224], [1435273344.0, 3960000.0, 0.000286430016254976], [1435275072.0, 3980000.0, 0.000287535512015968], [1435276800.0, 4000000.0, 0.00028864], [1435278528.0, 4020000.0, 0.00028974352801603196], [1435280256.0, 4040000.0, 0.00029084614425702406], [1435281984.0, 4060000.0, 0.000291947897303776], [1435283712.0, 4080000.0, 0.00029304883612876797], [1435285440.0, 4100000.0, 0.0002941490101], [1435287168.0, 4120000.0, 0.000295248468984832], [1435288896.0, 4140000.0, 0.00029634726295382394], [1435290624.0, 4160000.0, 0.000297445442584576], [1435292352.0, 4180000.0, 0.000298543058865568], [1435294080.0, 4200000.0, 0.00029964016319999995], [1435295808.0, 4220000.0, 0.000300736807409632], [1435297536.0, 4240000.0, 0.00030183304373862394], [1435299264.0, 4260000.0, 0.000302928924857376], [1435300992.0, 4280000.0, 0.000304024503866368], [1435302720.0, 4300000.0, 0.0003051198343], [1435304448.0, 4320000.0, 0.00030621497013043195], [1435306176.0, 4340000.0, 0.000307309965771424], [1435307904.0, 4360000.0, 0.000308404876082176], [1435309632.0, 4380000.0, 0.000309499756371168], 
[1435311360.0, 4400000.0, 0.0003105946624], [1435313088.0, 4420000.0, 0.000311689650387232], [1435314816.0, 4440000.0, 0.000312784777012224], [1435316544.0, 4460000.0, 0.000313880099418976], [1435318272.0, 4480000.0, 0.000314975675219968], [1435320000.0, 4500000.0, 0.0003160715625], [1435321728.0, 4520000.0, 0.000317167819820032], [1435323456.0, 4540000.0, 0.000318264506221024], [1435325184.0, 4560000.0, 0.000319361681227776], [1435326912.0, 4580000.0, 0.00032045940485276805], [1435328640.0, 4600000.0, 0.0003215577376], [1435330368.0, 4620000.0, 0.000322656740468832], [1435332096.0, 4640000.0, 0.000323756474957824], [1435333824.0, 4660000.0, 0.000324857003068576], [1435335552.0, 4680000.0, 0.000325958387309568], [1435337280.0, 4700000.0, 0.00032706069069999996], [1435339008.0, 4720000.0, 0.00032816397677363194], [1435340736.0, 4740000.0, 0.000329268309582624], [1435342464.0, 4760000.0, 0.000330373753701376], [1435344192.0, 4780000.0, 0.000331480374230368], [1435345920.0, 4800000.0, 0.0003325882368], [1435347648.0, 4820000.0, 0.000333697407574432], [1435349376.0, 4840000.0, 0.000334807953255424], [1435351104.0, 4860000.0, 0.000335919941086176], [1435352832.0, 4880000.0, 0.000337033438855168], [1435354560.0, 4900000.0, 0.0003381485149], [1435356288.0, 4920000.0, 0.000339265238111232], [1435358016.0, 4940000.0, 0.00034038367793622396], [1435359744.0, 4960000.0, 0.000341503904382976], [1435361472.0, 4980000.0, 0.00034262598802396797], [1435363200.0, 5000000.0, 0.00034375], [1435364928.0, 5020000.0, 0.00034487601202403203], [1435366656.0, 5040000.0, 0.000346004096385024], [1435368384.0, 5060000.0, 0.000347134325951776], [1435370112.0, 5080000.0, 0.000348266774176768], [1435371840.0, 5100000.0, 0.0003494015151000001], [1435373568.0, 5120000.0, 0.00035053862335283204], [1435375296.0, 5140000.0, 0.00035167817416182403], [1435377024.0, 5160000.0, 0.000352820243352576], [1435378752.0, 5180000.0, 0.000353964907353568], [1435380480.0, 5200000.0, 0.0003551122432], [1435382208.0, 5220000.0, 0.0003562623285376321], [1435383936.0, 5240000.0, 0.00035741524162662396], [1435385664.0, 5260000.0, 0.000358571061345376], [1435387392.0, 5280000.0, 0.000359729867194368], [1435389120.0, 5300000.0, 0.0003608917393], [1435390848.0, 5320000.0, 0.000362056758418432], [1435392576.0, 5340000.0, 0.00036322500593942403], [1435394304.0, 5360000.0, 0.000364396563890176], [1435396032.0, 5380000.0, 0.00036557151493916805], [1435397760.0, 5400000.0, 0.00036674994240000003], [1435399488.0, 5420000.0, 0.00036793193023523205], [1435401216.0, 5440000.0, 0.000369117563060224], [1435402944.0, 5460000.0, 0.00037030692614697605], [1435404672.0, 5480000.0, 0.000371500105427968], [1435406400.0, 5500000.0, 0.0003726971875], [1435408128.0, 5520000.0, 0.0003738982596280321], [1435409856.0, 5540000.0, 0.000375103409749024], [1435411584.0, 5560000.0, 0.00037631272647577603], [1435413312.0, 5580000.0, 0.000377526299100768], [1435415040.0, 5600000.0, 0.00037874421760000006], [1435416768.0, 5620000.0, 0.000379966572636832], [1435418496.0, 5640000.0, 0.00038119345556582394], [1435420224.0, 5660000.0, 0.000382424958436576], [1435421952.0, 5680000.0, 0.00038366117399756797], [1435423680.0, 5700000.0, 0.00038490219569999994], [1435425408.0, 5720000.0, 0.000386148117701632], [1435427136.0, 5740000.0, 0.000387399034870624], [1435428864.0, 5760000.0, 0.00038865504278937595], [1435430592.0, 5780000.0, 0.000389916237758368], [1435432320.0, 5800000.0, 0.00039118271679999995], [1435434048.0, 5820000.0, 0.000392454577662432], [1435435776.0, 5840000.0, 
0.00039373191882342395], [1435437504.0, 5860000.0, 0.00039501483949417595], [1435439232.0, 5880000.0, 0.000396303439623168], [1435440960.0, 5900000.0, 0.00039759781989999997], [1435442688.0, 5920000.0, 0.00039889808175923205], [1435444416.0, 5940000.0, 0.00040020432738422393], [1435446144.0, 5960000.0, 0.000401516659710976], [1435447872.0, 5980000.0, 0.00040283518243196796], [1435449600.0, 6000000.0, 0.00040416], [1435451328.0, 6020000.0, 0.000405491217632032], [1435453056.0, 6040000.0, 0.000406828941313024], [1435454784.0, 6060000.0, 0.000408173277799776], [1435456512.0, 6080000.0, 0.00040952433462476803], [1435458240.0, 6100000.0, 0.00041088222009999997], [1435459968.0, 6120000.0, 0.00041224704332083203], [1435461696.0, 6140000.0, 0.00041361891416982397], [1435463424.0, 6160000.0, 0.00041499794332057596], [1435465152.0, 6180000.0, 0.000416384242241568], [1435466880.0, 6200000.0, 0.00041777792319999994], [1435468608.0, 6220000.0, 0.000419179099265632], [1435470336.0, 6240000.0, 0.000420587884314624], [1435472064.0, 6260000.0, 0.000422004393033376], [1435473792.0, 6280000.0, 0.000423428740922368], [1435475520.0, 6300000.0, 0.0004248610443], [1435477248.0, 6320000.0, 0.00042630142030643204], [1435478976.0, 6340000.0, 0.00042774998690742395], [1435480704.0, 6360000.0, 0.000429206862898176], [1435482432.0, 6380000.0, 0.000430672167907168], [1435484160.0, 6400000.0, 0.00043214602240000005], [1435485888.0, 6420000.0, 0.000433628547683232], [1435487616.0, 6440000.0, 0.00043511986590822395], [1435489344.0, 6460000.0, 0.00043662010007497606], [1435491072.0, 6480000.0, 0.00043812937403596804], [1435492800.0, 6500000.0, 0.00043964781249999994], [1435494528.0, 6520000.0, 0.000441175541036032], [1435496256.0, 6540000.0, 0.000442712686077024], [1435497984.0, 6560000.0, 0.000444259374923776], [1435499712.0, 6580000.0, 0.000445815735748768], [1435501440.0, 6600000.0, 0.00044738189760000003], [1435503168.0, 6620000.0, 0.0004489579904048321], [1435504896.0, 6640000.0, 0.000450544144973824], [1435506624.0, 6660000.0, 0.00045214049300457604], [1435508352.0, 6680000.0, 0.000453747167085568], [1435510080.0, 6700000.0, 0.0004553643007], [1435511808.0, 6720000.0, 0.00045699202822963204], [1435513536.0, 6740000.0, 0.000458630484958624], [1435515264.0, 6760000.0, 0.00046027980707737603], [1435516992.0, 6780000.0, 0.00046194013168636807], [1435518720.0, 6800000.0, 0.0004636115968], [1435520448.0, 6820000.0, 0.000465294341350432], [1435522176.0, 6840000.0, 0.0004669885051914241], [1435523904.0, 6860000.0, 0.000468694229102176], [1435525632.0, 6880000.0, 0.0004704116547911679], [1435527360.0, 6900000.0, 0.0004721409249], [1435529088.0, 6920000.0, 0.00047388218300723194], [1435530816.0, 6940000.0, 0.000475635573632224], [1435532544.0, 6960000.0, 0.00047740124223897597], [1435534272.0, 6980000.0, 0.000479179335239968], [1435536000.0, 7000000.0, 0.00048096999999999994], [1435537728.0, 7020000.0, 0.0004827733848400319], [1435539456.0, 7040000.0, 0.000484589639041024], [1435541184.0, 7060000.0, 0.000486418912847776], [1435542912.0, 7080000.0, 0.000488261357472768], [1435544640.0, 7100000.0, 0.0004901171250999999], [1435546368.0, 7120000.0, 0.0004919863688888319], [1435548096.0, 7140000.0, 0.0004938692429778239], [1435549824.0, 7160000.0, 0.0004957659024885758], [1435551552.0, 7180000.0, 0.0004976765035295679], [1435553280.0, 7200000.0, 0.0004996012032], [1435555008.0, 7220000.0, 0.000501540159593632], [1435556736.0, 7240000.0, 0.0005034935318026239], [1435558464.0, 7260000.0, 0.0005054614799213759], [1435560192.0, 
7280000.0, 0.0005074441650503679], [1435561920.0, 7300000.0, 0.0005094417493000001], [1435563648.0, 7320000.0, 0.000511454395794432], [1435565376.0, 7340000.0, 0.0005134822686754239], [1435567104.0, 7360000.0, 0.000515525533106176], [1435568832.0, 7380000.0, 0.000517584355275168], [1435570560.0, 7400000.0, 0.0005196589024], [1435572288.0, 7420000.0, 0.0005217493427312321], [1435574016.0, 7440000.0, 0.000523855845556224], [1435575744.0, 7460000.0, 0.0005259785812029759], [1435577472.0, 7480000.0, 0.0005281177210439681], [1435579200.0, 7500000.0, 0.0005302734375], [1435580928.0, 7520000.0, 0.000532445904044032], [1435582656.0, 7540000.0, 0.0005346352952050239], [1435584384.0, 7560000.0, 0.000536841786571776], [1435586112.0, 7580000.0, 0.000539065554796768], [1435587840.0, 7600000.0, 0.0005413067776], [1435589568.0, 7620000.0, 0.000543565633772832], [1435591296.0, 7640000.0, 0.000545842303181824], [1435593024.0, 7660000.0, 0.0005481369667725759], [1435594752.0, 7680000.0, 0.000550449806573568], [1435596480.0, 7700000.0, 0.0005527810057], [1435598208.0, 7720000.0, 0.000555130748357632], [1435599936.0, 7740000.0, 0.000557499219846624], [1435601664.0, 7760000.0, 0.000559886606565376], [1435603392.0, 7780000.0, 0.000562293096014368], [1435605120.0, 7800000.0, 0.0005647188768], [1435606848.0, 7820000.0, 0.0005671641386384321], [1435608576.0, 7840000.0, 0.000569629072359424], [1435610304.0, 7860000.0, 0.000572113869910176], [1435612032.0, 7880000.0, 0.0005746187243591681], [1435613760.0, 7900000.0, 0.0005771438299], [1435615488.0, 7920000.0, 0.000579689381855232], [1435617216.0, 7940000.0, 0.000582255576680224], [1435618944.0, 7960000.0, 0.000584842611966976], [1435620672.0, 7980000.0, 0.000587450686447968], [1435622400.0, 8000000.0, 0.0005900800000000001], [1435624128.0, 8020000.0, 0.0005927307536480321], [1435625856.0, 8040000.0, 0.000595403149569024], [1435627584.0, 8060000.0, 0.0005980973910957761], [1435629312.0, 8080000.0, 0.000600813682720768], [1435631040.0, 8100000.0, 0.0006035522301000001], [1435632768.0, 8120000.0, 0.0006063132400568321], [1435634496.0, 8140000.0, 0.0006090969205858239], [1435636224.0, 8160000.0, 0.0006119034808565759], [1435637952.0, 8180000.0, 0.0006147331312175679], [1435639680.0, 8200000.0, 0.0006175860832], [1435641408.0, 8220000.0, 0.0006204625495216319], [1435643136.0, 8240000.0, 0.000623362744090624], [1435644864.0, 8260000.0, 0.0006262868820093761], [1435646592.0, 8280000.0, 0.0006292351795783679], [1435648320.0, 8300000.0, 0.0006322078543], [1435650048.0, 8320000.0, 0.000635205124882432], [1435651776.0, 8340000.0, 0.000638227211243424], [1435653504.0, 8360000.0, 0.000641274334514176], [1435655232.0, 8380000.0, 0.0006443467170431681], [1435656960.0, 8400000.0, 0.0006474445823999999], [1435658688.0, 8420000.0, 0.0006505681553792321], [1435660416.0, 8440000.0, 0.000653717662004224], [1435662144.0, 8460000.0, 0.000656893329530976], [1435663872.0, 8480000.0, 0.0006600953864519678], [1435665600.0, 8500000.0, 0.0006633240624999999], [1435667328.0, 8520000.0, 0.000666579588652032], [1435669056.0, 8540000.0, 0.0006698621971330239], [1435670784.0, 8560000.0, 0.0006731721214197759], [1435672512.0, 8580000.0, 0.000676509596244768], [1435674240.0, 8600000.0, 0.0006798748576], [1435675968.0, 8620000.0, 0.0006832681427408321], [1435677696.0, 8640000.0, 0.000686689690189824], [1435679424.0, 8660000.0, 0.000690139739740576], [1435681152.0, 8680000.0, 0.000693618532461568], [1435682880.0, 8700000.0, 0.0006971263107], [1435684608.0, 8720000.0, 0.0007006633180856321], 
[1435686336.0, 8740000.0, 0.000704229799534624], [1435688064.0, 8760000.0, 0.000707826001253376], [1435689792.0, 8780000.0, 0.000711452170742368], [1435691520.0, 8800000.0, 0.0007151085568], [1435693248.0, 8820000.0, 0.0007187954095264319], [1435694976.0, 8840000.0, 0.000722512980327424], [1435696704.0, 8860000.0, 0.0007262615219181761], [1435698432.0, 8880000.0, 0.000730041288327168], [1435700160.0, 8900000.0, 0.0007338525349000002], [1435701888.0, 8920000.0, 0.000737695518303232], [1435703616.0, 8940000.0, 0.000741570496528224], [1435705344.0, 8960000.0, 0.000745477728894976], [1435707072.0, 8980000.0, 0.0007494174760559681], [1435708800.0, 9000000.0, 0.0007533900000000002], [1435710528.0, 9020000.0, 0.000757395564056032], [1435712256.0, 9040000.0, 0.0007614344328970241], [1435713984.0, 9060000.0, 0.0007655068725437762], [1435715712.0, 9080000.0, 0.0007696131503687679], [1435717440.0, 9100000.0, 0.0007737535351000001], [1435719168.0, 9120000.0, 0.0007779282968248321], [1435720896.0, 9140000.0, 0.000782137706993824], [1435722624.0, 9160000.0, 0.0007863820384245761], [1435724352.0, 9180000.0, 0.0007906615653055681], [1435726080.0, 9200000.0, 0.0007949765632000001], [1435727808.0, 9220000.0, 0.0007993273090496321], [1435729536.0, 9240000.0, 0.0008037140811786242], [1435731264.0, 9260000.0, 0.0008081371592973762], [1435732992.0, 9280000.0, 0.0008125968245063682], [1435734720.0, 9300000.0, 0.0008170933593000001], [1435736448.0, 9320000.0, 0.0008216270475704322], [1435738176.0, 9340000.0, 0.000826198174611424], [1435739904.0, 9360000.0, 0.0008308070271221761], [1435741632.0, 9380000.0, 0.0008354538932111679], [1435743360.0, 9400000.0, 0.0008401390623999998], [1435745088.0, 9420000.0, 0.0008448628256272318], [1435746816.0, 9440000.0, 0.0008496254752522239], [1435748544.0, 9460000.0, 0.0008544273050589759], [1435750272.0, 9480000.0, 0.0008592686102599678], [1435752000.0, 9500000.0, 0.0008641496874999998], [1435753728.0, 9520000.0, 0.000869070834860032], [1435755456.0, 9540000.0, 0.0008740323518610239], [1435757184.0, 9560000.0, 0.0008790345394677759], [1435758912.0, 9580000.0, 0.0008840777000927679], [1435760640.0, 9600000.0, 0.0008891621375999998], [1435762368.0, 9620000.0, 0.0008942881573088319], [1435764096.0, 9640000.0, 0.0008994560659978239], [1435765824.0, 9660000.0, 0.0009046661719085757], [1435767552.0, 9680000.0, 0.0009099187847495679], [1435769280.0, 9700000.0, 0.0009152142157], [1435771008.0, 9720000.0, 0.000920552777413632], [1435772736.0, 9740000.0, 0.0009259347840226238], [1435774464.0, 9760000.0, 0.0009313605511413759], [1435776192.0, 9780000.0, 0.0009368303958703678], [1435777920.0, 9800000.0, 0.0009423446368], [1435779648.0, 9820000.0, 0.000947903594014432], [1435781376.0, 9840000.0, 0.000953507589095424], [1435783104.0, 9860000.0, 0.000959156945126176], [1435784832.0, 9880000.0, 0.0009648519866951679], [1435786560.0, 9900000.0, 0.0009705930399], [1435788288.0, 9920000.0, 0.0009763804323512321], [1435790016.0, 9940000.0, 0.000982214493176224], [1435791744.0, 9960000.0, 0.000988095553022976], [1435793472.0, 9980000.0, 0.000994023944063968], [1435795200.0, 10000000.0, 0.001], [1435796928.0, 10020000.0, 0.0010060240560640318], [1435798656.0, 10040000.0, 0.0010120964490250238], [1435800384.0, 10060000.0, 0.0010182175171917762], [1435802112.0, 10080000.0, 0.001024387600416768], [1435803840.0, 10100000.0, 0.0010306070401000004], [1435805568.0, 10120000.0, 0.0010368761791928322], [1435807296.0, 10140000.0, 0.0010431953622018239], [1435809024.0, 10160000.0, 0.001049564935192576], 
[1435810752.0, 10180000.0, 0.0010559852457935682], [1435812480.0, 10200000.0, 0.0010624566432000002], [1435814208.0, 10220000.0, 0.001068979478177632], [1435815936.0, 10240000.0, 0.001075554103066624], [1435817664.0, 10260000.0, 0.001082180871785376], [1435819392.0, 10280000.0, 0.0010888601398343681], [1435821120.0, 10300000.0, 0.0010955922643], [1435822848.0, 10320000.0, 0.0011023776038584323], [1435824576.0, 10340000.0, 0.0011092165187794243], [1435826304.0, 10360000.0, 0.0011161093709301762], [1435828032.0, 10380000.0, 0.0011230565237791681], [1435829760.0, 10400000.0, 0.0011300583423999998], [1435831488.0, 10420000.0, 0.0011371151934752319], [1435833216.0, 10440000.0, 0.0011442274453002243], [1435834944.0, 10460000.0, 0.0011513954677869762], [1435836672.0, 10480000.0, 0.001158619632467968], [1435838400.0, 10500000.0, 0.0011659003125000004], [1435840128.0, 10520000.0, 0.0011732378826680323], [1435841856.0, 10540000.0, 0.001180632719389024], [1435843584.0, 10560000.0, 0.001188085200715776], [1435845312.0, 10580000.0, 0.0011955957063407682], [1435847040.0, 10600000.0, 0.0012031646175999998], [1435848768.0, 10620000.0, 0.0012107923174768322], [1435850496.0, 10640000.0, 0.001218479190605824], [1435852224.0, 10660000.0, 0.001226225623276576], [1435853952.0, 10680000.0, 0.0012340320034375683], [1435855680.0, 10700000.0, 0.0012418987207000003], [1435857408.0, 10720000.0, 0.0012498261663416322], [1435859136.0, 10740000.0, 0.0012578147333106241], [1435860864.0, 10760000.0, 0.0012658648162293764], [1435862592.0, 10780000.0, 0.0012739768113983682], [1435864320.0, 10800000.0, 0.0012821511168000005], [1435866048.0, 10820000.0, 0.0012903881321024325], [1435867776.0, 10840000.0, 0.0012986882586634245], [1435869504.0, 10860000.0, 0.0013070518995341764], [1435871232.0, 10880000.0, 0.0013154794594631685], [1435872960.0, 10900000.0, 0.0013239713449000002], [1435874688.0, 10920000.0, 0.0013325279639992325], [1435876416.0, 10940000.0, 0.0013411497266242244], [1435878144.0, 10960000.0, 0.0013498370443509762], [1435879872.0, 10980000.0, 0.0013585903304719684], [1435881600.0, 11000000.0, 0.0013674100000000005], [1435883328.0, 11020000.0, 0.0013762964696720323], [1435885056.0, 11040000.0, 0.0013852501579530248], [1435886784.0, 11060000.0, 0.0013942714850397765], [1435888512.0, 11080000.0, 0.0014033608728647685], [1435890240.0, 11100000.0, 0.0014125187451000006], [1435891968.0, 11120000.0, 0.0014217455271608324], [1435893696.0, 11140000.0, 0.0014310416462098244], [1435895424.0, 11160000.0, 0.0014404075311605766], [1435897152.0, 11180000.0, 0.0014498436126815686], [1435898880.0, 11200000.0, 0.0014593503232000005], [1435900608.0, 11220000.0, 0.0014689280969056324], [1435902336.0, 11240000.0, 0.0014785773697546245], [1435904064.0, 11260000.0, 0.0014882985794733754], [1435905792.0, 11280000.0, 0.0014980921655623675], [1435907520.0, 11300000.0, 0.0015079585692999995], [1435909248.0, 11320000.0, 0.0015178982337464314], [1435910976.0, 11340000.0, 0.0015279116037474232], [1435912704.0, 11360000.0, 0.0015379991259381756], [1435914432.0, 11380000.0, 0.0015481612487471676], [1435916160.0, 11400000.0, 0.0015583984223999992], [1435917888.0, 11420000.0, 0.0015687110989232313], [1435919616.0, 11440000.0, 0.0015790997321482236], [1435921344.0, 11460000.0, 0.0015895647777149754], [1435923072.0, 11480000.0, 0.0016001066930759675], [1435924800.0, 11500000.0, 0.0016107259374999999], [1435926528.0, 11520000.0, 0.0016214229720760316], [1435928256.0, 11540000.0, 0.0016321982597170233], [1435929984.0, 11560000.0, 
0.0016430522651637758], [1435931712.0, 11580000.0, 0.0016539854549887673], [1435933440.0, 11600000.0, 0.0016649982975999993], [1435935168.0, 11620000.0, 0.0016760912632448317], [1435936896.0, 11640000.0, 0.0016872648240138235], [1435938624.0, 11660000.0, 0.0016985194538445753], [1435940352.0, 11680000.0, 0.0017098556285255675], [1435942080.0, 11700000.0, 0.0017212738256999996], [1435943808.0, 11720000.0, 0.0017327745248696316], [1435945536.0, 11740000.0, 0.0017443582073986238], [1435947264.0, 11760000.0, 0.0017560253565173758], [1435948992.0, 11780000.0, 0.0017677764573263677], [1435950720.0, 11800000.0, 0.0017796119968], [1435952448.0, 11820000.0, 0.0017915324637904316], [1435954176.0, 11840000.0, 0.0018035383490314235], [1435955904.0, 11860000.0, 0.0018156301451421758], [1435957632.0, 11880000.0, 0.0018278083466311673], [1435959360.0, 11900000.0, 0.001840073449899999], [1435961088.0, 11920000.0, 0.0018524259532472322], [1435962816.0, 11940000.0, 0.0018648663568722234], [1435964544.0, 11960000.0, 0.0018773951628789755], [1435966272.0, 11980000.0, 0.0018900128752799674], [1435968000.0, 12000000.0, 0.0019027199999999992], [1435969728.0, 12020000.0, 0.0019155170448800316], [1435971456.0, 12040000.0, 0.0019284045196810234], [1435973184.0, 12060000.0, 0.0019413829360877756], [1435974912.0, 12080000.0, 0.0019544528077127675], [1435976640.0, 12100000.0, 0.0019676146501], [1435978368.0, 12120000.0, 0.0019808689807288317], [1435980096.0, 12140000.0, 0.0019942163190178235], [1435981824.0, 12160000.0, 0.0020076571863285758], [1435983552.0, 12180000.0, 0.0020211921059695684], [1435985280.0, 12200000.0, 0.0020348216032], [1435987008.0, 12220000.0, 0.002048546205233632], [1435988736.0, 12240000.0, 0.0020623664412426235], [1435990464.0, 12260000.0, 0.002076282842361376], [1435992192.0, 12280000.0, 0.002090295941690368], [1435993920.0, 12300000.0, 0.0021044062742999995], [1435995648.0, 12320000.0, 0.002118614377234431], [1435997376.0, 12340000.0, 0.002132920789515423], [1435999104.0, 12360000.0, 0.002147326052146176], [1436000832.0, 12380000.0, 0.0021618307081151676], [1436002560.0, 12400000.0, 0.0021764353024], [1436004288.0, 12420000.0, 0.002191140381971232], [1436006016.0, 12440000.0, 0.0022059464957962247], [1436007744.0, 12460000.0, 0.0022208541948429755], [1436009472.0, 12480000.0, 0.002235864032083968], [1436011200.0, 12500000.0, 0.0022509765625], [1436012928.0, 12520000.0, 0.0022661923430840314], [1436014656.0, 12540000.0, 0.0022815119328450243], [1436016384.0, 12560000.0, 0.002296935892811776], [1436018112.0, 12580000.0, 0.002312464786036768], [1436019840.0, 12600000.0, 0.0023280991775999995], [1436021568.0, 12620000.0, 0.0023438396346128322], [1436023296.0, 12640000.0, 0.002359686726221824], [1436025024.0, 12660000.0, 0.002375641023612576], [1436026752.0, 12680000.0, 0.002391703100013568], [1436028480.0, 12700000.0, 0.0024078735307], [1436030208.0, 12720000.0, 0.002424152892997632], [1436031936.0, 12740000.0, 0.002440541766286624], [1436033664.0, 12760000.0, 0.002457040732005376], [1436035392.0, 12780000.0, 0.0024736503736543687], [1436037120.0, 12800000.0, 0.002490371276800001], [1436038848.0, 12820000.0, 0.002507204029078432], [1436040576.0, 12840000.0, 0.002524149220199424], [1436042304.0, 12860000.0, 0.002541207441950176], [1436044032.0, 12880000.0, 0.002558379288199168], [1436045760.0, 12900000.0, 0.0025756653549], [1436047488.0, 12920000.0, 0.002593066240095233], [1436049216.0, 12940000.0, 0.002610582543920225], [1436050944.0, 12960000.0, 0.0026282148686069765], [1436052672.0, 12980000.0, 
0.0026459638184879687], [1436054400.0, 13000000.0, 0.00266383], [1436056128.0, 13020000.0, 0.002681814021688032], [1436057856.0, 13040000.0, 0.002699916494209024], [1436059584.0, 13060000.0, 0.002718138030335776], [1436061312.0, 13080000.0, 0.0027364792449607687], [1436063040.0, 13100000.0, 0.0027549407551], [1436064768.0, 13120000.0, 0.0027735231798968326], [1436066496.0, 13140000.0, 0.0027922271406258243], [1436068224.0, 13160000.0, 0.0028110532606965767], [1436069952.0, 13180000.0, 0.0028300021656575683], [1436071680.0, 13200000.0, 0.002849074483200001], [1436073408.0, 13220000.0, 0.002868270843161633], [1436075136.0, 13240000.0, 0.0028875918775306247], [1436076864.0, 13260000.0, 0.002907038220449376], [1436078592.0, 13280000.0, 0.002926610508218369], [1436080320.0, 13300000.0, 0.0029463093793000014], [1436082048.0, 13320000.0, 0.0029661354743224333], [1436083776.0, 13340000.0, 0.002986089436083424], [1436085504.0, 13360000.0, 0.003006171909554177], [1436087232.0, 13380000.0, 0.0030263835418831687], [1436088960.0, 13400000.0, 0.0030467249824000005], [1436090688.0, 13420000.0, 0.003067196882619233], [1436092416.0, 13440000.0, 0.0030877998962442247], [1436094144.0, 13460000.0, 0.003108534679170976], [1436095872.0, 13480000.0, 0.0031294018894919688], [1436097600.0, 13500000.0, 0.003150402187500001], [1436099328.0, 13520000.0, 0.0031715362356920327], [1436101056.0, 13540000.0, 0.003192804698773025], [1436102784.0, 13560000.0, 0.0032142082436597773], [1436104512.0, 13580000.0, 0.0032357475394847687], [1436106240.0, 13600000.0, 0.0032574232576], [1436107968.0, 13620000.0, 0.0032792360715808333], [1436109696.0, 13640000.0, 0.0033011866572298255], [1436111424.0, 13660000.0, 0.0033232756925805776], [1436113152.0, 13680000.0, 0.0033455038579015693], [1436114880.0, 13700000.0, 0.0033678718357000015], [1436116608.0, 13720000.0, 0.0033903803107256333], [1436118336.0, 13740000.0, 0.003413029969974625], [1436120064.0, 13760000.0, 0.0034358215026933743], [1436121792.0, 13780000.0, 0.003458755600382367], [1436123520.0, 13800000.0, 0.003481832956799998], [1436125248.0, 13820000.0, 0.0035050542679664317], [1436126976.0, 13840000.0, 0.0035284202321674224], [1436128704.0, 13860000.0, 0.0035519315499581757], [1436130432.0, 13880000.0, 0.0035755889241671675], [1436132160.0, 13900000.0, 0.003599393059899999], [1436133888.0, 13920000.0, 0.003623344664543231], [1436135616.0, 13940000.0, 0.0036474444477682237], [1436137344.0, 13960000.0, 0.0036716931215349745], [1436139072.0, 13980000.0, 0.003696091400095967], [1436140800.0, 14000000.0, 0.003720639999999998], [1436142528.0, 14020000.0, 0.0037453396400960307], [1436144256.0, 14040000.0, 0.0037701910415370234], [1436145984.0, 14060000.0, 0.0037951949277837756], [1436147712.0, 14080000.0, 0.0038203520246087672], [1436149440.0, 14100000.0, 0.003845663060099999], [1436151168.0, 14120000.0, 0.0038711287646648316], [1436152896.0, 14140000.0, 0.0038967498710338233], [1436154624.0, 14160000.0, 0.003922527114264575], [1436156352.0, 14180000.0, 0.0039484612317455675], [1436158080.0, 14200000.0, 0.003974552963199998], [1436159808.0, 14220000.0, 0.004000803050689632], [1436161536.0, 14240000.0, 0.004027212238618623], [1436163264.0, 14260000.0, 0.004053781273737375], [1436164992.0, 14280000.0, 0.0040805109051463674], [1436166720.0, 14300000.0, 0.004107401884299999], [1436168448.0, 14320000.0, 0.00413445496501043], [1436170176.0, 14340000.0, 0.004161670903451423], [1436171904.0, 14360000.0, 0.004189050458162175], [1436173632.0, 14380000.0, 0.004216594390051167], [1436175360.0, 
14400000.0, 0.0042443034624], [1436177088.0, 14420000.0, 0.004272178440867231], [1436178816.0, 14440000.0, 0.004300220093492223], [1436180544.0, 14460000.0, 0.004328429190698975], [1436182272.0, 14480000.0, 0.004356806505299966], [1436184000.0, 14500000.0, 0.004385352812499999], [1436185728.0, 14520000.0, 0.004414068889900031], [1436187456.0, 14540000.0, 0.004442955517501024], [1436189184.0, 14560000.0, 0.004472013477707776], [1436190912.0, 14580000.0, 0.004501243555332767], [1436192640.0, 14600000.0, 0.0045306465376], [1436194368.0, 14620000.0, 0.004560223214148832], [1436196096.0, 14640000.0, 0.004589974377037824], [1436197824.0, 14660000.0, 0.004619900820748575], [1436199552.0, 14680000.0, 0.004650003342189566], [1436201280.0, 14700000.0, 0.0046802827407000005], [1436203008.0, 14720000.0, 0.004710739818053632], [1436204736.0, 14740000.0, 0.004741375378462625], [1436206464.0, 14760000.0, 0.004772190228581374], [1436208192.0, 14780000.0, 0.004803185177510368], [1436209920.0, 14800000.0, 0.004834361036799999], [1436211648.0, 14820000.0, 0.0048657186204544315], [1436213376.0, 14840000.0, 0.004897258744935424], [1436215104.0, 14860000.0, 0.004928982229166176], [1436216832.0, 14880000.0, 0.004960889894535168], [1436218560.0, 14900000.0, 0.004992982564899999], [1436220288.0, 14920000.0, 0.00502526106659123], [1436222016.0, 14940000.0, 0.0050577262284162245], [1436223744.0, 14960000.0, 0.005090378881662977], [1436225472.0, 14980000.0, 0.005123219860103968], [1436227200.0, 15000000.0, 0.00515625], [1436228928.0, 15020000.0, 0.005189470140104032], [1436230656.0, 15040000.0, 0.005222881121665024], [1436232384.0, 15060000.0, 0.005256483788431776], [1436234112.0, 15080000.0, 0.005290278986656767], [1436235840.0, 15100000.0, 0.0053242675650999995], [1436237568.0, 15120000.0, 0.005358450375032833], [1436239296.0, 15140000.0, 0.005392828270241824], [1436241024.0, 15160000.0, 0.005427402107032576], [1436242752.0, 15180000.0, 0.0054621727442335675], [1436244480.0, 15200000.0, 0.005497141043200001], [1436246208.0, 15220000.0, 0.005532307867817632], [1436247936.0, 15240000.0, 0.005567674084506625], [1436249664.0, 15260000.0, 0.005603240562225377], [1436251392.0, 15280000.0, 0.005639008172474367], [1436253120.0, 15300000.0, 0.005674977789300001], [1436254848.0, 15320000.0, 0.005711150289298432], [1436256576.0, 15340000.0, 0.005747526551619424], [1436258304.0, 15360000.0, 0.005784107457970176], [1436260032.0, 15380000.0, 0.0058208938926191675], [1436261760.0, 15400000.0, 0.0058578867424], [1436263488.0, 15420000.0, 0.005895086896715232], [1436265216.0, 15440000.0, 0.005932495247540224], [1436266944.0, 15460000.0, 0.005970112689426977], [1436268672.0, 15480000.0, 0.006007940119507969], [1436270400.0, 15500000.0, 0.006045978437499999], [1436272128.0, 15520000.0, 0.006084228545708033], [1436273856.0, 15540000.0, 0.006122691349029026], [1436275584.0, 15560000.0, 0.006161367754955776], [1436277312.0, 15580000.0, 0.006200258673580768], [1436279040.0, 15600000.0, 0.006239365017600001], [1436280768.0, 15620000.0, 0.006278687702316833], [1436282496.0, 15640000.0, 0.006318227645645825], [1436284224.0, 15660000.0, 0.006357985768116577], [1436285952.0, 15680000.0, 0.0063979629928775684], [1436287680.0, 15700000.0, 0.006438160245700001], [1436289408.0, 15720000.0, 0.0064785784549816336], [1436291136.0, 15740000.0, 0.0065192185517506255], [1436292864.0, 15760000.0, 0.006560081469669378], [1436294592.0, 15780000.0, 0.00660116814503837], [1436296320.0, 15800000.0, 0.006642479516800001], [1436298048.0, 15820000.0, 
0.006684016526542433], [1436299776.0, 15840000.0, 0.0067257801185034265], [1436301504.0, 15860000.0, 0.006767771239574177], [1436303232.0, 15880000.0, 0.006809990839303169], [1436304960.0, 15900000.0, 0.006852439869900002], [1436306688.0, 15920000.0, 0.0068951192862392345], [1436308416.0, 15940000.0, 0.006938030045864227], [1436310144.0, 15960000.0, 0.006981173108990978], [1436311872.0, 15980000.0, 0.0070245494385119695], [1436313600.0, 16000000.0, 0.007068160000000002], [1436315328.0, 16020000.0, 0.007112005761712036], [1436317056.0, 16040000.0, 0.007156087694593026], [1436318784.0, 16060000.0, 0.007200406772279779], [1436320512.0, 16080000.0, 0.0072449639711047705], [1436322240.0, 16100000.0, 0.007289760270100003], [1436323968.0, 16120000.0, 0.007334796651000834], [1436325696.0, 16140000.0, 0.007380074098249827], [1436327424.0, 16160000.0, 0.007425593599000578], [1436329152.0, 16180000.0, 0.007471356143121571], [1436330880.0, 16200000.0, 0.007517362723200001], [1436332608.0, 16220000.0, 0.007563614334545633], [1436334336.0, 16240000.0, 0.007610111975194629], [1436336064.0, 16260000.0, 0.007656856645913374], [1436337792.0, 16280000.0, 0.007703849350202366], [1436339520.0, 16300000.0, 0.0077510910942999965], [1436341248.0, 16320000.0, 0.007798582887186429], [1436342976.0, 16340000.0, 0.00784632574058742], [1436344704.0, 16360000.0, 0.007894320668978171], [1436346432.0, 16380000.0, 0.007942568689587166], [1436348160.0, 16400000.0, 0.007991070822399996], [1436349888.0, 16420000.0, 0.00803982809016323], [1436351616.0, 16440000.0, 0.008088841518388222], [1436353344.0, 16460000.0, 0.008138112135354973], [1436355072.0, 16480000.0, 0.008187640972115967], [1436356800.0, 16500000.0, 0.008237429062499997], [1436358528.0, 16520000.0, 0.00828747744311603], [1436360256.0, 16540000.0, 0.00833778715335702], [1436361984.0, 16560000.0, 0.008388359235403776], [1436363712.0, 16580000.0, 0.008439194734228767], [1436365440.0, 16600000.0, 0.008490294697599998], [1436367168.0, 16620000.0, 0.008541660176084829], [1436368896.0, 16640000.0, 0.008593292223053823], [1436370624.0, 16660000.0, 0.008645191894684575], [1436372352.0, 16680000.0, 0.008697360249965567], [1436374080.0, 16700000.0, 0.008749798350699996], [1436375808.0, 16720000.0, 0.008802507261509631], [1436377536.0, 16740000.0, 0.008855488049838623], [1436379264.0, 16760000.0, 0.008908741785957375], [1436380992.0, 16780000.0, 0.008962269542966365], [1436382720.0, 16800000.0, 0.009016072396799997], [1436384448.0, 16820000.0, 0.00907015142623043], [1436386176.0, 16840000.0, 0.009124507712871422], [1436387904.0, 16860000.0, 0.009179142341182174], [1436389632.0, 16880000.0, 0.009234056398471168], [1436391360.0, 16900000.0, 0.0092892509749], [1436393088.0, 16920000.0, 0.009344727163487231], [1436394816.0, 16940000.0, 0.009400486060112222], [1436396544.0, 16960000.0, 0.009456528763518973], [1436398272.0, 16980000.0, 0.009512856375319968], [1436400000.0, 17000000.0, 0.00956947], [1436401728.0, 17020000.0, 0.009626370744920029], [1436403456.0, 17040000.0, 0.009683559720321025], [1436405184.0, 17060000.0, 0.009741038039327772], [1436406912.0, 17080000.0, 0.009798806817952768], [1436408640.0, 17100000.0, 0.009856867175099999], [1436410368.0, 17120000.0, 0.00991522023256883], [1436412096.0, 17140000.0, 0.009973867115057825], [1436413824.0, 17160000.0, 0.010032808950168575], [1436415552.0, 17180000.0, 0.010092046868409569], [1436417280.0, 17200000.0, 0.0101515820032], [1436419008.0, 17220000.0, 0.01021141549087363], [1436420736.0, 17240000.0, 0.010271548470682625], 
[1436422464.0, 17260000.0, 0.010331982084801374], [1436424192.0, 17280000.0, 0.010392717478330364], [1436425920.0, 17300000.0, 0.010453755799299998], [1436427648.0, 17320000.0, 0.01051509819867443], [1436429376.0, 17340000.0, 0.010576745830355424], [1436431104.0, 17360000.0, 0.010638699851186177], [1436432832.0, 17380000.0, 0.010700961420955167], [1436434560.0, 17400000.0, 0.010763531702399999], [1436436288.0, 17420000.0, 0.01082641186121123], [1436438016.0, 17440000.0, 0.010889603066036223], [1436439744.0, 17460000.0, 0.010953106488482977], [1436441472.0, 17480000.0, 0.011016923303123968], [1436443200.0, 17500000.0, 0.0110810546875], [1436444928.0, 17520000.0, 0.011145501822124032], [1436446656.0, 17540000.0, 0.011210265890485023], [1436448384.0, 17560000.0, 0.011275348079051777], [1436450112.0, 17580000.0, 0.011340749577276767], [1436451840.0, 17600000.0, 0.0114064715776], [1436453568.0, 17620000.0, 0.011472515275452832], [1436455296.0, 17640000.0, 0.011538881869261823], [1436457024.0, 17660000.0, 0.011605572560452578], [1436458752.0, 17680000.0, 0.01167258855345357], [1436460480.0, 17700000.0, 0.011739931055700002], [1436462208.0, 17720000.0, 0.011807601277637634], [1436463936.0, 17740000.0, 0.011875600432726622], [1436465664.0, 17760000.0, 0.011943929737445376], [1436467392.0, 17780000.0, 0.012012590411294372], [1436469120.0, 17800000.0, 0.012081583676800001], [1436470848.0, 17820000.0, 0.012150910759518432], [1436472576.0, 17840000.0, 0.012220572888039427], [1436474304.0, 17860000.0, 0.012290571293990174], [1436476032.0, 17880000.0, 0.01236090721203917], [1436477760.0, 17900000.0, 0.012431581879899997], [1436479488.0, 17920000.0, 0.012502596538335234], [1436481216.0, 17940000.0, 0.012573952431160225], [1436482944.0, 17960000.0, 0.012645650805246976], [1436484672.0, 17980000.0, 0.012717692910527972], [1436486400.0, 18000000.0, 0.012790080000000004], [1436488128.0, 18020000.0, 0.01286281332972803], [1436489856.0, 18040000.0, 0.012935894158849026], [1436491584.0, 18060000.0, 0.013009323749575778], [1436493312.0, 18080000.0, 0.01308310336720077], [1436495040.0, 18100000.0, 0.013157234280100004], [1436496768.0, 18120000.0, 0.013231717759736836], [1436498496.0, 18140000.0, 0.013306555080665828], [1436500224.0, 18160000.0, 0.01338174752053658], [1436501952.0, 18180000.0, 0.01345729636009757], [1436503680.0, 18200000.0, 0.013533202883200006], [1436505408.0, 18220000.0, 0.013609468376801636], [1436507136.0, 18240000.0, 0.013686094130970628], [1436508864.0, 18260000.0, 0.013763081438889378], [1436510592.0, 18280000.0, 0.01384043159685837], [1436512320.0, 18300000.0, 0.013918145904300003], [1436514048.0, 18320000.0, 0.013996225663762435], [1436515776.0, 18340000.0, 0.014074672180923427], [1436517504.0, 18360000.0, 0.01415348676459418], [1436519232.0, 18380000.0, 0.01423267072672317], [1436520960.0, 18400000.0, 0.014312225382400004], [1436522688.0, 18420000.0, 0.014392152049859237], [1436524416.0, 18440000.0, 0.014472452050484226], [1436526144.0, 18460000.0, 0.014553126708810978], [1436527872.0, 18480000.0, 0.014634177352531972], [1436529600.0, 18500000.0, 0.014715605312500001], [1436531328.0, 18520000.0, 0.014797411922732038], [1436533056.0, 18540000.0, 0.014879598520413026], [1436534784.0, 18560000.0, 0.01496216644589978], [1436536512.0, 18580000.0, 0.015045117042724775], [1436538240.0, 18600000.0, 0.015128451657600005], [1436539968.0, 18620000.0, 0.01521217164042084], [1436541696.0, 18640000.0, 0.015296278344269828], [1436543424.0, 18660000.0, 0.015380773125420579], [1436545152.0, 18680000.0, 
0.015465657343341575], [1436546880.0, 18700000.0, 0.015550932360700006], [1436548608.0, 18720000.0, 0.015636599543365635], [1436550336.0, 18740000.0, 0.01572266026041463], [1436552064.0, 18760000.0, 0.015809115884133373], [1436553792.0, 18780000.0, 0.015895967790022364], [1436555520.0, 18800000.0, 0.015983217356799993], [1436557248.0, 18820000.0, 0.016070865966406427], [1436558976.0, 18840000.0, 0.01615891500400742], [1436560704.0, 18860000.0, 0.016247365857998172], [1436562432.0, 18880000.0, 0.016336219920007163], [1436564160.0, 18900000.0, 0.0164254785849], [1436565888.0, 18920000.0, 0.016515143250783234], [1436567616.0, 18940000.0, 0.01660521531900822], [1436569344.0, 18960000.0, 0.01669569619417497], [1436571072.0, 18980000.0, 0.016786587284135963], [1436572800.0, 19000000.0, 0.016877889999999993], [1436574528.0, 19020000.0, 0.01696960575613603], [1436576256.0, 19040000.0, 0.01706173597017702], [1436577984.0, 19060000.0, 0.017154282063023774], [1436579712.0, 19080000.0, 0.017247245458848766], [1436581440.0, 19100000.0, 0.017340627585099997], [1436583168.0, 19120000.0, 0.01743442987250483], [1436584896.0, 19140000.0, 0.017528653755073825], [1436586624.0, 19160000.0, 0.017623300670104574], [1436588352.0, 19180000.0, 0.01771837205818556], [1436590080.0, 19200000.0, 0.017813869363199997], [1436591808.0, 19220000.0, 0.01790979403232963], [1436593536.0, 19240000.0, 0.018006147516058623], [1436595264.0, 19260000.0, 0.018102931268177372], [1436596992.0, 19280000.0, 0.018200146745786364], [1436598720.0, 19300000.0, 0.018297795409299995], [1436600448.0, 19320000.0, 0.018395878722450427], [1436602176.0, 19340000.0, 0.018494398152291425], [1436603904.0, 19360000.0, 0.018593355169202172], [1436605632.0, 19380000.0, 0.018692751246891164], [1436607360.0, 19400000.0, 0.018792587862399998], [1436609088.0, 19420000.0, 0.018892866496107233], [1436610816.0, 19440000.0, 0.01899358863173222], [1436612544.0, 19460000.0, 0.01909475575633898], [1436614272.0, 19480000.0, 0.019196369360339964], [1436616000.0, 19500000.0, 0.019298430937499996], [1436617728.0, 19520000.0, 0.019400941984940034], [1436619456.0, 19540000.0, 0.019503904003141025], [1436621184.0, 19560000.0, 0.019607318495947774], [1436622912.0, 19580000.0, 0.019711186970572767], [1436624640.0, 19600000.0, 0.0198155109376], [1436626368.0, 19620000.0, 0.019920291910988833], [1436628096.0, 19640000.0, 0.020025531408077824], [1436629824.0, 19660000.0, 0.020131230949588578], [1436631552.0, 19680000.0, 0.02023739205962957], [1436633280.0, 19700000.0, 0.020344016265699996], [1436635008.0, 19720000.0, 0.02045110509869363], [1436636736.0, 19740000.0, 0.020558660092902623], [1436638464.0, 19760000.0, 0.020666682786021375], [1436640192.0, 19780000.0, 0.02077517471915037], [1436641920.0, 19800000.0, 0.0208841374368], [1436643648.0, 19820000.0, 0.020993572486894433], [1436645376.0, 19840000.0, 0.021103481420775425], [1436647104.0, 19860000.0, 0.02121386579320618], [1436648832.0, 19880000.0, 0.021324727162375167], [1436650560.0, 19900000.0, 0.0214360670899], [1436652288.0, 19920000.0, 0.021547887140831234], [1436654016.0, 19940000.0, 0.02166018888365622], [1436655744.0, 19960000.0, 0.021772973890302975], [1436657472.0, 19980000.0, 0.02188624373614397], [1436659200.0, 20000000.0, 0.022], [1436660928.0, 20020000.0, 0.022114244264144017], [1436662656.0, 20040000.0, 0.022228978114305024], [1436664384.0, 20060000.0, 0.022344203139671767], [1436666112.0, 20080000.0, 0.022459920932896768], [1436667840.0, 20100000.0, 0.022576133090099988], [1436669568.0, 20120000.0, 
0.022692841210872835], [1436671296.0, 20140000.0, 0.022810046898281814], [1436673024.0, 20160000.0, 0.022927751758872583], [1436674752.0, 20180000.0, 0.023045957402673553], [1436676480.0, 20200000.0, 0.023164665443200005], [1436678208.0, 20220000.0, 0.023283877497457618], [1436679936.0, 20240000.0, 0.023403595185946632], [1436681664.0, 20260000.0, 0.02352382013266536], [1436683392.0, 20280000.0, 0.02364455396511437], [1436685120.0, 20300000.0, 0.02376579831429999], [1436686848.0, 20320000.0, 0.02388755481473843], [1436688576.0, 20340000.0, 0.02400982510445941], [1436690304.0, 20360000.0, 0.024132610825010184], [1436692032.0, 20380000.0, 0.024255913621459155], [1436693760.0, 20400000.0, 0.0243797351424], [1436695488.0, 20420000.0, 0.02450407703995522], [1436697216.0, 20440000.0, 0.024628940969780223], [1436698944.0, 20460000.0, 0.024754328591066962], [1436700672.0, 20480000.0, 0.024880241566547966], [1436702400.0, 20500000.0, 0.025006681562499987], [1436704128.0, 20520000.0, 0.025133650248748033], [1436705856.0, 20540000.0, 0.02526114929866901], [1436707584.0, 20560000.0, 0.025389180389195777], [1436709312.0, 20580000.0, 0.025517745200820754], [1436711040.0, 20600000.0, 0.025646845417599998], [1436712768.0, 20620000.0, 0.02577648272715682], [1436714496.0, 20640000.0, 0.02590665882068583], [1436716224.0, 20660000.0, 0.026037375392956558], [1436717952.0, 20680000.0, 0.026168634142317574], [1436719680.0, 20700000.0, 0.02630043677069999], [1436721408.0, 20720000.0, 0.026432784983621636], [1436723136.0, 20740000.0, 0.026565680490190615], [1436724864.0, 20760000.0, 0.02669912500310938], [1436726592.0, 20780000.0, 0.02683312023867836], [1436728320.0, 20800000.0, 0.026967667916800005], [1436730048.0, 20820000.0, 0.027102769760982425], [1436731776.0, 20840000.0, 0.027238427498343428], [1436733504.0, 20860000.0, 0.027374642859614162], [1436735232.0, 20880000.0, 0.027511417579143178], [1436736960.0, 20900000.0, 0.027648753394899996], [1436738688.0, 20920000.0, 0.02778665204847924], [1436740416.0, 20940000.0, 0.02792511528510421], [1436742144.0, 20960000.0, 0.02806414485363098], [1436743872.0, 20980000.0, 0.028203742506551957], [1436745600.0, 21000000.0, 0.02834391000000001], [1436747328.0, 21020000.0, 0.028484649093752024], [1436749056.0, 21040000.0, 0.028625961551233033], [1436750784.0, 21060000.0, 0.02876784913951977], [1436752512.0, 21080000.0, 0.028910313629344774], [1436754240.0, 21100000.0, 0.029053356795099995], [1436755968.0, 21120000.0, 0.029196980414840836], [1436757696.0, 21140000.0, 0.029341186270289817], [1436759424.0, 21160000.0, 0.029485976146840582], [1436761152.0, 21180000.0, 0.02963135183356156], [1436762880.0, 21200000.0, 0.029777315123200008], [1436764608.0, 21220000.0, 0.02992386781218562], [1436766336.0, 21240000.0, 0.030071011700634632], [1436768064.0, 21260000.0, 0.03021874859235337], [1436769792.0, 21280000.0, 0.030367080294842374], [1436771520.0, 21300000.0, 0.030516008619299987], [1436773248.0, 21320000.0, 0.030665535380626437], [1436774976.0, 21340000.0, 0.030815662397427418], [1436776704.0, 21360000.0, 0.030966391492018183], [1436778432.0, 21380000.0, 0.031117724490427157], [1436780160.0, 21400000.0, 0.03126966322240001], [1436781888.0, 21420000.0, 0.031422209521403226], [1436783616.0, 21440000.0, 0.03157536522462823], [1436785344.0, 21460000.0, 0.03172913217299497], [1436787072.0, 21480000.0, 0.03188351221115598], [1436788800.0, 21500000.0, 0.032038507187499995], [1436790528.0, 21520000.0, 0.03219411895415605], [1436792256.0, 21540000.0, 0.03235034936699701], 
[1436793984.0, 21560000.0, 0.032507200285643786], [1436795712.0, 21580000.0, 0.032664673573468764], [1436797440.0, 21600000.0, 0.03282277109760001], [1436799168.0, 21620000.0, 0.03298149472892483], [1436800896.0, 21640000.0, 0.03314084634209384], [1436802624.0, 21660000.0, 0.03330082781552456], [1436804352.0, 21680000.0, 0.03346144103140559], [1436806080.0, 21700000.0, 0.0336226878757], [1436807808.0, 21720000.0, 0.03378457023814964], [1436809536.0, 21740000.0, 0.033947090012278615], [1436811264.0, 21760000.0, 0.034110249095397394], [1436812992.0, 21780000.0, 0.034274049388606356], [1436814720.0, 21800000.0, 0.03443849279680001], [1436816448.0, 21820000.0, 0.03460358122867043], [1436818176.0, 21840000.0, 0.03476931659671143], [1436819904.0, 21860000.0, 0.03493570081722217], [1436821632.0, 21880000.0, 0.03510273581031119], [1436823360.0, 21900000.0, 0.03527042349989999], [1436825088.0, 21920000.0, 0.03543876581372724], [1436826816.0, 21940000.0, 0.035607764683352215], [1436828544.0, 21960000.0, 0.035777422044158985], [1436830272.0, 21980000.0, 0.03594773983535996], [1436832000.0, 22000000.0, 0.03611872000000001], [1436833728.0, 22020000.0, 0.03629036448496003], [1436835456.0, 22040000.0, 0.03646267524096104], [1436837184.0, 22060000.0, 0.03663565422256777], [1436838912.0, 22080000.0, 0.036809303388192785], [1436840640.0, 22100000.0, 0.0369836247001], [1436842368.0, 22120000.0, 0.037158620124408855], [1436844096.0, 22140000.0, 0.03733429163109782], [1436845824.0, 22160000.0, 0.03751064119400859], [1436847552.0, 22180000.0, 0.037687670790849566], [1436849280.0, 22200000.0, 0.03786538240320002], [1436851008.0, 22220000.0, 0.03804377801651364], [1436852736.0, 22240000.0, 0.038222859620122646], [1436854464.0, 22260000.0, 0.038402629207241384], [1436856192.0, 22280000.0, 0.03858308877497039], [1436857920.0, 22300000.0, 0.038764240324299995], [1436859648.0, 22320000.0, 0.03894608586011445], [1436861376.0, 22340000.0, 0.03912862739119543], [1436863104.0, 22360000.0, 0.039311866930226196], [1436864832.0, 22380000.0, 0.03949580649379517], [1436866560.0, 22400000.0, 0.03968044810240001], [1436868288.0, 22420000.0, 0.03986579378045123], [1436870016.0, 22440000.0, 0.04005184555627624], [1436871744.0, 22460000.0, 0.04023860546212298], [1436873472.0, 22480000.0, 0.04042607553416398], [1436875200.0, 22500000.0, 0.0406142578125], [1436876928.0, 22520000.0, 0.040803154341164015], [1436878656.0, 22540000.0, 0.04099276716812502], [1436880384.0, 22560000.0, 0.041183098345291755], [1436882112.0, 22580000.0, 0.041374149928516776], [1436883840.0, 22600000.0, 0.04156592397759997], [1436885568.0, 22620000.0, 0.041758422556292835], [1436887296.0, 22640000.0, 0.041951647732301804], [1436889024.0, 22660000.0, 0.04214560157729257], [1436890752.0, 22680000.0, 0.04234028616689354], [1436892480.0, 22700000.0, 0.0425357035807], [1436894208.0, 22720000.0, 0.04273185590227761], [1436895936.0, 22740000.0, 0.04292874521916663], [1436897664.0, 22760000.0, 0.04312637362288535], [1436899392.0, 22780000.0, 0.04332474320893437], [1436901120.0, 22800000.0, 0.04352385607679998], [1436902848.0, 22820000.0, 0.04372371432995844], [1436904576.0, 22840000.0, 0.0439243200758794], [1436906304.0, 22860000.0, 0.04412567542603018], [1436908032.0, 22880000.0, 0.044327782495879146], [1436909760.0, 22900000.0, 0.0445306434049], [1436911488.0, 22920000.0, 0.04473426027657521], [1436913216.0, 22940000.0, 0.04493863523840023], [1436914944.0, 22960000.0, 0.045143770421886956], [1436916672.0, 22980000.0, 0.04534966796256798], [1436918400.0, 23000000.0, 
0.04555632999999998], [1436920128.0, 23020000.0, 0.04576375867776803], [1436921856.0, 23040000.0, 0.04597195614348901], [1436923584.0, 23060000.0, 0.04618092454881578], [1436925312.0, 23080000.0, 0.04639066604944075], [1436927040.0, 23100000.0, 0.046601182805100005], [1436928768.0, 23120000.0, 0.04681247697957682], [1436930496.0, 23140000.0, 0.04702455074070583], [1436932224.0, 23160000.0, 0.047237406260376556], [1436933952.0, 23180000.0, 0.04745104571453756], [1436935680.0, 23200000.0, 0.04766547128319998], [1436937408.0, 23220000.0, 0.04788068515044164], [1436939136.0, 23240000.0, 0.04809668950441061], [1436940864.0, 23260000.0, 0.04831348653732938], [1436942592.0, 23280000.0, 0.04853107844549835], [1436944320.0, 23300000.0, 0.0487494674293], [1436946048.0, 23320000.0, 0.04896865569320241], [1436947776.0, 23340000.0, 0.049188645445763436], [1436949504.0, 23360000.0, 0.04940943889963416], [1436951232.0, 23380000.0, 0.04963103827156318], [1436952960.0, 23400000.0, 0.049853445782399974], [1436954688.0, 23420000.0, 0.050076663657099243], [1436956416.0, 23440000.0, 0.05030069412472422], [1436958144.0, 23460000.0, 0.05052553941845099], [1436959872.0, 23480000.0, 0.05075120177557196], [1436961600.0, 23500000.0, 0.050977683437500004], [1436963328.0, 23520000.0, 0.051204986649772015], [1436965056.0, 23540000.0, 0.05143311366205303], [1436966784.0, 23560000.0, 0.05166206672813977], [1436968512.0, 23580000.0, 0.051891848105964776], [1436970240.0, 23600000.0, 0.05212246005759999], [1436971968.0, 23620000.0, 0.05235390484926084], [1436973696.0, 23640000.0, 0.052586184751309814], [1436975424.0, 23660000.0, 0.05281930203826058], [1436977152.0, 23680000.0, 0.05305325898878155], [1436978880.0, 23700000.0, 0.05328805788570002], [1436980608.0, 23720000.0, 0.05352370101600562], [1436982336.0, 23740000.0, 0.05376019067085463], [1436984064.0, 23760000.0, 0.05399752914557336], [1436985792.0, 23780000.0, 0.054235718739662375], [1436987520.0, 23800000.0, 0.05447476175679998], [1436989248.0, 23820000.0, 0.054714660504846455], [1436990976.0, 23840000.0, 0.054955417295847415], [1436992704.0, 23860000.0, 0.05519703444603819], [1436994432.0, 23880000.0, 0.05543951427584715], [1436996160.0, 23900000.0, 0.05568285910990001], [1436997888.0, 23920000.0, 0.05592707127702323], [1436999616.0, 23940000.0, 0.05617215311024823], [1437001344.0, 23960000.0, 0.056418106946814955], [1437003072.0, 23980000.0, 0.05666493512817598], [1437004800.0, 24000000.0, 0.05691263999999998], [1437006528.0, 24020000.0, 0.05716122391217604], [1437008256.0, 24040000.0, 0.057410689218817014], [1437009984.0, 24060000.0, 0.0576610382782638], [1437011712.0, 24080000.0, 0.05791227345308876], [1437013440.0, 24100000.0, 0.05816439711010001], [1437015168.0, 24120000.0, 0.05841741162034482], [1437016896.0, 24140000.0, 0.05867131935911384], [1437018624.0, 24160000.0, 0.058926122705944564], [1437020352.0, 24180000.0, 0.05918182404462559], [1437022080.0, 24200000.0, 0.05943842576319999], [1437023808.0, 24220000.0, 0.05969593025396966], [1437025536.0, 24240000.0, 0.05995433991349861], [1437027264.0, 24260000.0, 0.06021365714261741], [1437028992.0, 24280000.0, 0.06047388434642636], [1437030720.0, 24300000.0, 0.060735023934300025], [1437032448.0, 24320000.0, 0.06099707831989044], [1437034176.0, 24340000.0, 0.06126004992113144], [1437035904.0, 24360000.0, 0.06152394116024218], [1437037632.0, 24380000.0, 0.06178875446373121], [1437039360.0, 24400000.0, 0.06205449226239999], [1437041088.0, 24420000.0, 0.06232115699134726], [1437042816.0, 24440000.0, 
0.06258875108997222], [1437044544.0, 24460000.0, 0.062857277001979], [1437046272.0, 24480000.0, 0.06312673717537996], [1437048000.0, 24500000.0, 0.06339713406250003], [1437049728.0, 24520000.0, 0.06366847011998003], [1437051456.0, 24540000.0, 0.06394074780878106], [1437053184.0, 24560000.0, 0.06421396959418778], [1437054912.0, 24580000.0, 0.06448813794581279], [1437056640.0, 24600000.0, 0.06476325533759998], [1437058368.0, 24620000.0, 0.06503932424782885], [1437060096.0, 24640000.0, 0.06531634715911781], [1437061824.0, 24660000.0, 0.0655943265584286], [1437063552.0, 24680000.0, 0.06587326493706955], [1437065280.0, 24700000.0, 0.06615316479070003], [1437067008.0, 24720000.0, 0.06643402861933363], [1437068736.0, 24740000.0, 0.06671585892734265], [1437070464.0, 24760000.0, 0.06699865822346136], [1437072192.0, 24780000.0, 0.06728242902079039], [1437073920.0, 24800000.0, 0.0675671738368], [1437075648.0, 24820000.0, 0.06785289519333446], [1437077376.0, 24840000.0, 0.06813959561661542], [1437079104.0, 24860000.0, 0.06842727763724622], [1437080832.0, 24880000.0, 0.06871594379021516], [1437082560.0, 24900000.0, 0.06900559661490001], [1437084288.0, 24920000.0, 0.06929623865507123], [1437086016.0, 24940000.0, 0.06958787245889625], [1437087744.0, 24960000.0, 0.06988050057894296], [1437089472.0, 24980000.0, 0.070174125572184], [1437091200.0, 25000000.0, 0.07046875], [1437092928.0, 25020000.0, 0.07076437642818399], [1437094656.0, 25040000.0, 0.07106100742694503], [1437096384.0, 25060000.0, 0.07135864557091176], [1437098112.0, 25080000.0, 0.07165729343913677], [1437099840.0, 25100000.0, 0.07195695361509998], [1437101568.0, 25120000.0, 0.07225762868671283], [1437103296.0, 25140000.0, 0.07255932124632179], [1437105024.0, 25160000.0, 0.07286203389071258], [1437106752.0, 25180000.0, 0.07316576922111355], [1437108480.0, 25200000.0, 0.0734705298432], [1437110208.0, 25220000.0, 0.07377631836709761], [1437111936.0, 25240000.0, 0.07408313740738663], [1437113664.0, 25260000.0, 0.07439098958310533], [1437115392.0, 25280000.0, 0.07469987751775439], [1437117120.0, 25300000.0, 0.07500980383929998], [1437118848.0, 25320000.0, 0.07532077118017842], [1437120576.0, 25340000.0, 0.0756327821772994], [1437122304.0, 25360000.0, 0.07594583947205018], [1437124032.0, 25380000.0, 0.07625994571029912], [1437125760.0, 25400000.0, 0.0765751035424], [1437127488.0, 25420000.0, 0.07689131562319519], [1437129216.0, 25440000.0, 0.07720858461202024], [1437130944.0, 25460000.0, 0.07752691317270693], [1437132672.0, 25480000.0, 0.07784630397358798], [1437134400.0, 25500000.0, 0.07816675968749996], [1437136128.0, 25520000.0, 0.07848828299178803], [1437137856.0, 25540000.0, 0.078810876568309], [1437139584.0, 25560000.0, 0.07913454310343579], [1437141312.0, 25580000.0, 0.07945928528806073], [1437143040.0, 25600000.0, 0.07978510581760001], [1437144768.0, 25620000.0, 0.0801120073919968], [1437146496.0, 25640000.0, 0.08043999271572583], [1437148224.0, 25660000.0, 0.08076906449779656], [1437149952.0, 25680000.0, 0.08109922545175757], [1437151680.0, 25700000.0, 0.08143047829569997], [1437153408.0, 25720000.0, 0.08176282575226164], [1437155136.0, 25740000.0, 0.08209627054863061], [1437156864.0, 25760000.0, 0.08243081541654938], [1437158592.0, 25780000.0, 0.08276646309231835], [1437160320.0, 25800000.0, 0.0831032163168], [1437162048.0, 25820000.0, 0.0834410778354224], [1437163776.0, 25840000.0, 0.08378005039818345], [1437165504.0, 25860000.0, 0.08412013675965416], [1437167232.0, 25880000.0, 0.08446133967898319], [1437168960.0, 25900000.0, 
0.08480366191989998], [1437170688.0, 25920000.0, 0.08514710625071926], [1437172416.0, 25940000.0, 0.08549167544434419], [1437174144.0, 25960000.0, 0.085837372278271], [1437175872.0, 25980000.0, 0.08618419953459194], [1437177600.0, 26000000.0, 0.08653216], [1437179328.0, 26020000.0, 0.08688125646579199], [1437181056.0, 26040000.0, 0.08723149172787303], [1437182784.0, 26060000.0, 0.08758286858675975], [1437184512.0, 26080000.0, 0.08793538984758478], [1437186240.0, 26100000.0, 0.08828905832009996], [1437187968.0, 26120000.0, 0.08864387681868084], [1437189696.0, 26140000.0, 0.08899984816232982], [1437191424.0, 26160000.0, 0.0893569751746806], [1437193152.0, 26180000.0, 0.08971526068400153], [1437194880.0, 26200000.0, 0.09007470752320001], [1437196608.0, 26220000.0, 0.09043531852982563], [1437198336.0, 26240000.0, 0.09079709654607464], [1437200064.0, 26260000.0, 0.09116004441879337], [1437201792.0, 26280000.0, 0.09152416499948239], [1437203520.0, 26300000.0, 0.09188946114429997], [1437205248.0, 26320000.0, 0.09225593571406646], [1437206976.0, 26340000.0, 0.0926235915742674], [1437208704.0, 26360000.0, 0.0929924315950582], [1437210432.0, 26380000.0, 0.09336245865126716], [1437212160.0, 26400000.0, 0.09373367562240002], [1437213888.0, 26420000.0, 0.09410608539264322], [1437215616.0, 26440000.0, 0.09447969085086827], [1437217344.0, 26460000.0, 0.09485449489063497], [1437219072.0, 26480000.0, 0.09523050041019598], [1437220800.0, 26500000.0, 0.09560771031249997], [1437222528.0, 26520000.0, 0.09598612750519606], [1437224256.0, 26540000.0, 0.09636575490063702], [1437225984.0, 26560000.0, 0.0967465954158838], [1437227712.0, 26580000.0, 0.09712865197270874], [1437229440.0, 26600000.0, 0.09751192749760004], [1437231168.0, 26620000.0, 0.09789642492176481], [1437232896.0, 26640000.0, 0.09828214718113387], [1437234624.0, 26660000.0, 0.09866909721636456], [1437236352.0, 26680000.0, 0.0990572779728456], [1437238080.0, 26700000.0, 0.09944669240069999], [1437239808.0, 26720000.0, 0.09983734345478966], [1437241536.0, 26740000.0, 0.10022923409471861], [1437243264.0, 26760000.0, 0.10062236728483741], [1437244992.0, 26780000.0, 0.10101674599424634], [1437246720.0, 26800000.0, 0.10141237319680003], [1437248448.0, 26820000.0, 0.10180925187111044], [1437250176.0, 26840000.0, 0.10220738500055146], [1437251904.0, 26860000.0, 0.10260677557326219], [1437253632.0, 26880000.0, 0.10300742658215119], [1437255360.0, 26900000.0, 0.1034093410249], [1437257088.0, 26920000.0, 0.10381252190396725], [1437258816.0, 26940000.0, 0.10421697222659221], [1437260544.0, 26960000.0, 0.104622695004799], [1437262272.0, 26980000.0, 0.10502969325539997], [1437264000.0, 27000000.0, 0.10543797000000005], [1437265728.0, 27020000.0, 0.105847528265], [1437267456.0, 27040000.0, 0.10625837108160104], [1437269184.0, 27060000.0, 0.10667050148580776], [1437270912.0, 27080000.0, 0.10708392251843282], [1437272640.0, 27100000.0, 0.10749863722509999], [1437274368.0, 27120000.0, 0.10791464865624889], [1437276096.0, 27140000.0, 0.10833195986713781], [1437277824.0, 27160000.0, 0.1087505739178486], [1437279552.0, 27180000.0, 0.10917049387328959], [1437281280.0, 27200000.0, 0.10959172280320002], [1437283008.0, 27220000.0, 0.11001426378215363], [1437284736.0, 27240000.0, 0.11043811988956269], [1437286464.0, 27260000.0, 0.11086329420968137], [1437288192.0, 27280000.0, 0.1112897898316104], [1437289920.0, 27300000.0, 0.11171760984929999], [1437291648.0, 27320000.0, 0.1121467573615545], [1437293376.0, 27340000.0, 0.11257723547203542], [1437295104.0, 27360000.0, 
0.11300904728926624], [1437296832.0, 27380000.0, 0.11344219592663518], [1437298560.0, 27400000.0, 0.11387668450240006], [1437300288.0, 27420000.0, 0.11431251613969123], [1437302016.0, 27440000.0, 0.11474969396651627], [1437303744.0, 27460000.0, 0.11518822111576299], [1437305472.0, 27480000.0, 0.11562810072520402], [1437307200.0, 27500000.0, 0.1160693359375], [1437308928.0, 27520000.0, 0.11651192990020397], [1437310656.0, 27540000.0, 0.11695588576576504], [1437312384.0, 27560000.0, 0.11740120669153173], [1437314112.0, 27580000.0, 0.11784789583975677], [1437315840.0, 27600000.0, 0.11829595637759996], [1437317568.0, 27620000.0, 0.11874539147713283], [1437319296.0, 27640000.0, 0.1191962043153418], [1437321024.0, 27660000.0, 0.1196483980741326], [1437322752.0, 27680000.0, 0.12010197594033352], [1437324480.0, 27700000.0, 0.12055694110569999], [1437326208.0, 27720000.0, 0.12101329676691762], [1437327936.0, 27740000.0, 0.12147104612560664], [1437329664.0, 27760000.0, 0.12193019238832534], [1437331392.0, 27780000.0, 0.12239073876657439], [1437333120.0, 27800000.0, 0.12285268847679995], [1437334848.0, 27820000.0, 0.12331604474039844], [1437336576.0, 27840000.0, 0.12378081078371939], [1437338304.0, 27860000.0, 0.1242469898380702], [1437340032.0, 27880000.0, 0.12471458513971911], [1437341760.0, 27900000.0, 0.12518359992990002], [1437343488.0, 27920000.0, 0.12565403745481518], [1437345216.0, 27940000.0, 0.12612590096564022], [1437346944.0, 27960000.0, 0.12659919371852693], [1437348672.0, 27980000.0, 0.12707391897460796], [1437350400.0, 28000000.0, 0.12755007999999993], [1437352128.0, 28020000.0, 0.12802768006580803], [1437353856.0, 28040000.0, 0.12850672244812902], [1437355584.0, 28060000.0, 0.12898721042805578], [1437357312.0, 28080000.0, 0.12946914729168074], [1437359040.0, 28100000.0, 0.1299525363301], [1437360768.0, 28120000.0, 0.1304373808394168], [1437362496.0, 28140000.0, 0.13092368412074581], [1437364224.0, 28160000.0, 0.13141144948021652], [1437365952.0, 28180000.0, 0.13190068022897758], [1437367680.0, 28200000.0, 0.13239137968319994], [1437369408.0, 28220000.0, 0.13288355116408163], [1437371136.0, 28240000.0, 0.1333771979978506], [1437372864.0, 28260000.0, 0.1338723235157694], [1437374592.0, 28280000.0, 0.13436893105413833], [1437376320.0, 28300000.0, 0.1348670239543], [1437378048.0, 28320000.0, 0.1353666055626424], [1437379776.0, 28340000.0, 0.13586767923060344], [1437381504.0, 28360000.0, 0.13637024831467415], [1437383232.0, 28380000.0, 0.13687431617640317], [1437384960.0, 28400000.0, 0.1373798861824], [1437386688.0, 28420000.0, 0.13788696170433926], [1437388416.0, 28440000.0, 0.1383955461189642], [1437390144.0, 28460000.0, 0.138905642808091], [1437391872.0, 28480000.0, 0.13941725515861195], [1437393600.0, 28500000.0, 0.1399303865625], [1437395328.0, 28520000.0, 0.14044504041681202], [1437397056.0, 28540000.0, 0.14096122012369305], [1437398784.0, 28560000.0, 0.14147892909037976], [1437400512.0, 28580000.0, 0.14199817072920481], [1437402240.0, 28600000.0, 0.14251894845759996], [1437403968.0, 28620000.0, 0.14304126569810086], [1437405696.0, 28640000.0, 0.1435651258783498], [1437407424.0, 28660000.0, 0.14409053243110062], [1437409152.0, 28680000.0, 0.14461748879422154], [1437410880.0, 28700000.0, 0.14514599841070003], [1437412608.0, 28720000.0, 0.1456760647286456], [1437414336.0, 28740000.0, 0.14620769120129468], [1437416064.0, 28760000.0, 0.14674088128701337], [1437417792.0, 28780000.0, 0.1472756384493024], [1437419520.0, 28800000.0, 0.14781196615679998], [1437421248.0, 28820000.0, 
0.1483498678832865], [1437422976.0, 28840000.0, 0.14888934710768736], [1437424704.0, 28860000.0, 0.14943040731407822], [1437426432.0, 28880000.0, 0.14997305199168712], [1437428160.0, 28900000.0, 0.15051728463490002], [1437429888.0, 28920000.0, 0.1510631087432632], [1437431616.0, 28940000.0, 0.15161052782148826], [1437433344.0, 28960000.0, 0.15215954537945492], [1437435072.0, 28980000.0, 0.15271016493221598], [1437436800.0, 29000000.0, 0.15326238999999997], [1437438528.0, 29020000.0, 0.15381622410821602], [1437440256.0, 29040000.0, 0.154371670787457], [1437441984.0, 29060000.0, 0.1549287335735038], [1437443712.0, 29080000.0, 0.15548741600732874], [1437445440.0, 29100000.0, 0.15604772163510003], [1437447168.0, 29120000.0, 0.1566096540081848], [1437448896.0, 29140000.0, 0.15717321668315384], [1437450624.0, 29160000.0, 0.15773841322178456], [1437452352.0, 29180000.0, 0.15830524719106565], [1437454080.0, 29200000.0, 0.1588737221632], [1437455808.0, 29220000.0, 0.15944384171560966], [1437457536.0, 29240000.0, 0.16001560943093862], [1437459264.0, 29260000.0, 0.16058902889705742], [1437460992.0, 29280000.0, 0.16116410370706635], [1437462720.0, 29300000.0, 0.16174083745930004], [1437464448.0, 29320000.0, 0.1623192337573304], [1437466176.0, 29340000.0, 0.16289929620997148], [1437467904.0, 29360000.0, 0.16348102843128218], [1437469632.0, 29380000.0, 0.16406443404057117], [1437471360.0, 29400000.0, 0.16464951666239996], [1437473088.0, 29420000.0, 0.16523627992658732], [1437474816.0, 29440000.0, 0.1658247274682122], [1437476544.0, 29460000.0, 0.166414862927619], [1437478272.0, 29480000.0, 0.16700668995041998], [1437480000.0, 29500000.0, 0.16760021218750007], [1437481728.0, 29520000.0, 0.16819543329502], [1437483456.0, 29540000.0, 0.1687923569344211], [1437485184.0, 29560000.0, 0.16939098677242775], [1437486912.0, 29580000.0, 0.16999132648105286], [1437488640.0, 29600000.0, 0.1705933797376], [1437490368.0, 29620000.0, 0.1711971502246689], [1437492096.0, 29640000.0, 0.17180264163015782], [1437493824.0, 29660000.0, 0.17240985764726863], [1437495552.0, 29680000.0, 0.17301880197450956], [1437497280.0, 29700000.0, 0.17362947831570005], [1437499008.0, 29720000.0, 0.17424189037997362], [1437500736.0, 29740000.0, 0.1748560418817827], [1437502464.0, 29760000.0, 0.1754719365409014], [1437504192.0, 29780000.0, 0.17608957808243045], [1437505920.0, 29800000.0, 0.17670897023679996], [1437507648.0, 29820000.0, 0.1773301167397745], [1437509376.0, 29840000.0, 0.17795302133245539], [1437511104.0, 29860000.0, 0.17857768776128624], [1437512832.0, 29880000.0, 0.17920411977805517], [1437514560.0, 29900000.0, 0.17983232113990005], [1437516288.0, 29920000.0, 0.1804622956093112], [1437518016.0, 29940000.0, 0.1810940469541363], [1437519744.0, 29960000.0, 0.18172757894758299], [1437521472.0, 29980000.0, 0.18236289536822403], [1437523200.0, 30000000.0, 0.183], [1437524928.0, 30020000.0, 0.183638896632224], [1437526656.0, 30040000.0, 0.18427958905958503], [1437528384.0, 30060000.0, 0.1849220810821517], [1437530112.0, 30080000.0, 0.18556637650537675], [1437531840.0, 30100000.0, 0.18621247914009992], [1437533568.0, 30120000.0, 0.18686039280255284], [1437535296.0, 30140000.0, 0.1875101213143618], [1437537024.0, 30160000.0, 0.18816166850255256], [1437538752.0, 30180000.0, 0.18881503819955348], [1437540480.0, 30200000.0, 0.18947023424320003], [1437542208.0, 30220000.0, 0.19012726047673756], [1437543936.0, 30240000.0, 0.19078612074882662], [1437545664.0, 30260000.0, 0.1914468189135453], [1437547392.0, 30280000.0, 0.1921093588303944], 
[1437549120.0, 30300000.0, 0.19277374436429992], [1437550848.0, 30320000.0, 0.19343997938561844], [1437552576.0, 30340000.0, 0.1941080677701393], [1437554304.0, 30360000.0, 0.19477801339909018], [1437556032.0, 30380000.0, 0.19544982015913914], [1437557760.0, 30400000.0, 0.19612349194240003], [1437559488.0, 30420000.0, 0.1967990326464352], [1437561216.0, 30440000.0, 0.19747644617426022], [1437562944.0, 30460000.0, 0.19815573643434692], [1437564672.0, 30480000.0, 0.19883690734062798], [1437566400.0, 30500000.0, 0.19951996281249995], [1437568128.0, 30520000.0, 0.20020490677482802], [1437569856.0, 30540000.0, 0.20089174315794897], [1437571584.0, 30560000.0, 0.2015804758976758], [1437573312.0, 30580000.0, 0.2022711089353007], [1437575040.0, 30600000.0, 0.20296364621760002], [1437576768.0, 30620000.0, 0.20365809169683677], [1437578496.0, 30640000.0, 0.20435444933076585], [1437580224.0, 30660000.0, 0.20505272308263653], [1437581952.0, 30680000.0, 0.2057529169211976], [1437583680.0, 30700000.0, 0.20645503482069993], [1437585408.0, 30720000.0, 0.20715908076090164], [1437587136.0, 30740000.0, 0.20786505872707062], [1437588864.0, 30760000.0, 0.20857297270998934], [1437590592.0, 30780000.0, 0.20928282670595835], [1437592320.0, 30800000.0, 0.2099946247168], [1437594048.0, 30820000.0, 0.21070837074986237], [1437595776.0, 30840000.0, 0.21142406881802348], [1437597504.0, 30860000.0, 0.21214172293969413], [1437599232.0, 30880000.0, 0.2128613371388232], [1437600960.0, 30900000.0, 0.21358291544489993], [1437602688.0, 30920000.0, 0.21430646189295927], [1437604416.0, 30940000.0, 0.21503198052358416], [1437606144.0, 30960000.0, 0.21575947538291101], [1437607872.0, 30980000.0, 0.21648895052263195], [1437609600.0, 31000000.0, 0.21722040999999997], [1437611328.0, 31020000.0, 0.217953857877832], [1437613056.0, 31040000.0, 0.21868929822451305], [1437614784.0, 31060000.0, 0.2194267351139997], [1437616512.0, 31080000.0, 0.22016617262582483], [1437618240.0, 31100000.0, 0.22090761484509994], [1437619968.0, 31120000.0, 0.22165106586252087], [1437621696.0, 31140000.0, 0.22239652977436977], [1437623424.0, 31160000.0, 0.22314401068252063], [1437625152.0, 31180000.0, 0.2238935126944415], [1437626880.0, 31200000.0, 0.22464503992320004], [1437628608.0, 31220000.0, 0.22539859648746563], [1437630336.0, 31240000.0, 0.22615418651151467], [1437632064.0, 31260000.0, 0.22691181412523334], [1437633792.0, 31280000.0, 0.22767148346412244], [1437635520.0, 31300000.0, 0.2284331986693], [1437637248.0, 31320000.0, 0.2291969638875065], [1437638976.0, 31340000.0, 0.22996278327110733], [1437640704.0, 31360000.0, 0.23073066097809822], [1437642432.0, 31380000.0, 0.23150060117210716], [1437644160.0, 31400000.0, 0.23227260802240005], [1437645888.0, 31420000.0, 0.23304668570388315], [1437647616.0, 31440000.0, 0.23382283839710827], [1437649344.0, 31460000.0, 0.23460107028827498], [1437651072.0, 31480000.0, 0.235381385569236], [1437652800.0, 31500000.0, 0.23616378843749997], [1437654528.0, 31520000.0, 0.23694828309623608], [1437656256.0, 31540000.0, 0.23773487375427704], [1437657984.0, 31560000.0, 0.23852356462612384], [1437659712.0, 31580000.0, 0.2393143599319487], [1437661440.0, 31600000.0, 0.2401072638976], [1437663168.0, 31620000.0, 0.24090228075460482], [1437664896.0, 31640000.0, 0.2416994147401739], [1437666624.0, 31660000.0, 0.24249867009720452], [1437668352.0, 31680000.0, 0.24330005107428565], [1437670080.0, 31700000.0, 0.24410356192569996], [1437671808.0, 31720000.0, 0.24490920691142967], [1437673536.0, 31740000.0, 0.2457169902971586], 
[1437675264.0, 31760000.0, 0.24652691635427743], [1437676992.0, 31780000.0, 0.24733898935988632], [1437678720.0, 31800000.0, 0.24815321359680007], [1437680448.0, 31820000.0, 0.2489695933535504], [1437682176.0, 31840000.0, 0.24978813292439148], [1437683904.0, 31860000.0, 0.25060883660930217], [1437685632.0, 31880000.0, 0.25143170871399123], [1437687360.0, 31900000.0, 0.25225675354989996], [1437689088.0, 31920000.0, 0.2530839754342073], [1437690816.0, 31940000.0, 0.25391337868983216], [1437692544.0, 31960000.0, 0.254744967645439], [1437694272.0, 31980000.0, 0.25557874663544], [1437696000.0, 32000000.0, 0.25641472000000004], [1437697728.0, 32020000.0, 0.25725289208504004], [1437699456.0, 32040000.0, 0.2580932672422411], [1437701184.0, 32060000.0, 0.25893584982904777], [1437702912.0, 32080000.0, 0.25978064420867286], [1437704640.0, 32100000.0, 0.26062765475009997], [1437706368.0, 32120000.0, 0.2614768858280889], [1437708096.0, 32140000.0, 0.2623283418231778], [1437709824.0, 32160000.0, 0.2631820271216887], [1437711552.0, 32180000.0, 0.2640379461157296], [1437713280.0, 32200000.0, 0.26489610320320006], [1437715008.0, 32220000.0, 0.2657565027877936], [1437716736.0, 32240000.0, 0.26661914927900265], [1437718464.0, 32260000.0, 0.2674840470921213], [1437720192.0, 32280000.0, 0.2683512006482504], [1437721920.0, 32300000.0, 0.2692206143743], [1437723648.0, 32320000.0, 0.27009229270299456], [1437725376.0, 32340000.0, 0.2709662400728754], [1437727104.0, 32360000.0, 0.2718424609283063], [1437728832.0, 32380000.0, 0.2727209597194752], [1437730560.0, 32400000.0, 0.2736017409024001], [1437732288.0, 32420000.0, 0.27448480893893124], [1437734016.0, 32440000.0, 0.2753701682967563], [1437735744.0, 32460000.0, 0.276257823449403], [1437737472.0, 32480000.0, 0.27714777887624403], [1437739200.0, 32500000.0, 0.2780400390625], [1437740928.0, 32520000.0, 0.27893460849924395], [1437742656.0, 32540000.0, 0.27983149168340504], [1437744384.0, 32560000.0, 0.28073069311777166], [1437746112.0, 32580000.0, 0.28163221731099675], [1437747840.0, 32600000.0, 0.28253606877759996], [1437749568.0, 32620000.0, 0.2834422520379728], [1437751296.0, 32640000.0, 0.28435077161838174], [1437753024.0, 32660000.0, 0.2852616320509726], [1437754752.0, 32680000.0, 0.2861748378737734], [1437756480.0, 32700000.0, 0.2870903936307], [1437758208.0, 32720000.0, 0.2880083038715575], [1437759936.0, 32740000.0, 0.2889285731520467], [1437761664.0, 32760000.0, 0.2898512060337653], [1437763392.0, 32780000.0, 0.29077620708421437], [1437765120.0, 32800000.0, 0.2917035808767999], [1437766848.0, 32820000.0, 0.29263333199083846], [1437768576.0, 32840000.0, 0.2935654650115594], [1437770304.0, 32860000.0, 0.29449998453011017], [1437772032.0, 32880000.0, 0.2954368951435591], [1437773760.0, 32900000.0, 0.29637620145490007], [1437775488.0, 32920000.0, 0.29731790807305514], [1437777216.0, 32940000.0, 0.29826201961288024], [1437778944.0, 32960000.0, 0.2992085406951669], [1437780672.0, 32980000.0, 0.300157475946648], [1437782400.0, 33000000.0, 0.30110882999999994], [1437784128.0, 33020000.0, 0.30206260749384806], [1437785856.0, 33040000.0, 0.3030188130727689], [1437787584.0, 33060000.0, 0.3039774513872958], [1437789312.0, 33080000.0, 0.30493852709392066], [1437791040.0, 33100000.0, 0.3059020448551], [1437792768.0, 33120000.0, 0.3068680093392568], [1437794496.0, 33140000.0, 0.30783642522078586], [1437796224.0, 33160000.0, 0.3088072971800565], [1437797952.0, 33180000.0, 0.3097806299034176], [1437799680.0, 33200000.0, 0.31075642808319986], [1437801408.0, 33220000.0, 
0.31173469641772167], [1437803136.0, 33240000.0, 0.31271543961129056], [1437804864.0, 33260000.0, 0.3136986623742094], [1437806592.0, 33280000.0, 0.3146843694227783], [1437808320.0, 33300000.0, 0.31567256547930006], [1437810048.0, 33320000.0, 0.3166632552720824], [1437811776.0, 33340000.0, 0.3176564435354434], [1437813504.0, 33360000.0, 0.3186521350097141], [1437815232.0, 33380000.0, 0.3196503344412433], [1437816960.0, 33400000.0, 0.3206510465823999], [1437818688.0, 33420000.0, 0.32165427619157927], [1437820416.0, 33440000.0, 0.3226600280332042], [1437822144.0, 33460000.0, 0.323668306877731], [1437823872.0, 33480000.0, 0.3246791175016519], [1437825600.0, 33500000.0, 0.3256924646875], [1437827328.0, 33520000.0, 0.32670835322385194], [1437829056.0, 33540000.0, 0.3277267879053331], [1437830784.0, 33560000.0, 0.32874777353261964], [1437832512.0, 33580000.0, 0.32977131491244477], [1437834240.0, 33600000.0, 0.3307974168575999], [1437835968.0, 33620000.0, 0.33182608418694093], [1437837696.0, 33640000.0, 0.33285732172538973], [1437839424.0, 33660000.0, 0.33389113430394063], [1437841152.0, 33680000.0, 0.3349275267596615], [1437842880.0, 33700000.0, 0.3359665039357001], [1437844608.0, 33720000.0, 0.3370080706812856], [1437846336.0, 33740000.0, 0.33805223185173466], [1437848064.0, 33760000.0, 0.33909899230845325], [1437849792.0, 33780000.0, 0.3401483569189424], [1437851520.0, 33800000.0, 0.34120033055679994], [1437853248.0, 33820000.0, 0.3422549181017265], [1437854976.0, 33840000.0, 0.3433121244395274], [1437856704.0, 33860000.0, 0.34437195446211827], [1437858432.0, 33880000.0, 0.34543441306752715], [1437860160.0, 33900000.0, 0.3464995051599], [1437861888.0, 33920000.0, 0.3475672356495031], [1437863616.0, 33940000.0, 0.3486376094527283], [1437865344.0, 33960000.0, 0.3497106314920949], [1437867072.0, 33980000.0, 0.35078630669625604], [1437868800.0, 34000000.0, 0.35186463999999995], [1437870528.0, 34020000.0, 0.35294563634425613], [1437872256.0, 34040000.0, 0.3540293006760969], [1437873984.0, 34060000.0, 0.35511563794874385], [1437875712.0, 34080000.0, 0.35620465312156874], [1437877440.0, 34100000.0, 0.3572963511601], [1437879168.0, 34120000.0, 0.35839073703602475], [1437880896.0, 34140000.0, 0.3594878157271939], [1437882624.0, 34160000.0, 0.3605875922176245], [1437884352.0, 34180000.0, 0.36169007149750565], [1437886080.0, 34200000.0, 0.3627952585632], [1437887808.0, 34220000.0, 0.36390315841724974], [1437889536.0, 34240000.0, 0.36501377606837854], [1437891264.0, 34260000.0, 0.36612711653149743], [1437892992.0, 34280000.0, 0.36724318482770635], [1437894720.0, 34300000.0, 0.36836198598430014], [1437896448.0, 34320000.0, 0.3694835250347705], [1437898176.0, 34340000.0, 0.3706078070188116], [1437899904.0, 34360000.0, 0.3717348369823221], [1437901632.0, 34380000.0, 0.3728646199774112], [1437903360.0, 34400000.0, 0.3739971610624], [1437905088.0, 34420000.0, 0.3751324653018273], [1437906816.0, 34440000.0, 0.37627053776645225], [1437908544.0, 34460000.0, 0.3774113835332591], [1437910272.0, 34480000.0, 0.37855500768545997], [1437912000.0, 34500000.0, 0.3797014153125001], [1437913728.0, 34520000.0, 0.38085061151006], [1437915456.0, 34540000.0, 0.38200260138006115], [1437917184.0, 34560000.0, 0.38315739003066773], [1437918912.0, 34580000.0, 0.3843149825762929], [1437920640.0, 34600000.0, 0.3854753841375999], [1437922368.0, 34620000.0, 0.38663859984150895], [1437924096.0, 34640000.0, 0.38780463482119776], [1437925824.0, 34660000.0, 0.3889734942161087], [1437927552.0, 34680000.0, 0.39014518317194957], 
[1437929280.0, 34700000.0, 0.3913197068407001], [1437931008.0, 34720000.0, 0.3924970703806136], [1437932736.0, 34740000.0, 0.3936772789562228], [1437934464.0, 34760000.0, 0.39486033773834134], [1437936192.0, 34780000.0, 0.39604625190407056], [1437937920.0, 34800000.0, 0.39723502663679994], [1437939648.0, 34820000.0, 0.3984266671262146], [1437941376.0, 34840000.0, 0.3996211785682954], [1437943104.0, 34860000.0, 0.40081856616532635], [1437944832.0, 34880000.0, 0.4020188351258952], [1437946560.0, 34900000.0, 0.4032219906649002], [1437948288.0, 34920000.0, 0.4044280380035512], [1437950016.0, 34940000.0, 0.40563698236937634], [1437951744.0, 34960000.0, 0.40684882899622293], [1437953472.0, 34980000.0, 0.4080635831242641], [1437955200.0, 35000000.0, 0.40928125], [1437956928.0, 35020000.0, 0.4105018348762639], [1437958656.0, 35040000.0, 0.411725343012225], [1437960384.0, 35060000.0, 0.41295177967339164], [1437962112.0, 35080000.0, 0.41418115013161666], [1437963840.0, 35100000.0, 0.41541345966509974], [1437965568.0, 35120000.0, 0.41664871355839284], [1437967296.0, 35140000.0, 0.41788691710240183], [1437969024.0, 35160000.0, 0.4191280755943926], [1437970752.0, 35180000.0, 0.4203721943379934], [1437972480.0, 35200000.0, 0.4216192786431999], [1437974208.0, 35220000.0, 0.4228693338263774], [1437975936.0, 35240000.0, 0.4241223652102666], [1437977664.0, 35260000.0, 0.4253783781239852], [1437979392.0, 35280000.0, 0.42663737790303446], [1437981120.0, 35300000.0, 0.4278993698892999], [1437982848.0, 35320000.0, 0.42916435943105846], [1437984576.0, 35340000.0, 0.4304323518829794], [1437986304.0, 35360000.0, 0.43170335260613024], [1437988032.0, 35380000.0, 0.43297736696797906], [1437989760.0, 35400000.0, 0.4342544003424], [1437991488.0, 35420000.0, 0.4355344581096751], [1437993216.0, 35440000.0, 0.4368175456565002], [1437994944.0, 35460000.0, 0.4381036683759868], [1437996672.0, 35480000.0, 0.43939283166766796], [1437998400.0, 35500000.0, 0.44068504093749994], [1438000128.0, 35520000.0, 0.441980301597868], [1438001856.0, 35540000.0, 0.44327861906758886], [1438003584.0, 35560000.0, 0.44457999877191584], [1438005312.0, 35580000.0, 0.44588444614254064], [1438007040.0, 35600000.0, 0.4471919666176001], [1438008768.0, 35620000.0, 0.4485025656416768], [1438010496.0, 35640000.0, 0.4498162486658059], [1438012224.0, 35660000.0, 0.45113302114747644], [1438013952.0, 35680000.0, 0.45245288855063764], [1438015680.0, 35700000.0, 0.45377585634569995], [1438017408.0, 35720000.0, 0.4551019300095417], [1438019136.0, 35740000.0, 0.4564311150255105], [1438020864.0, 35760000.0, 0.4577634168834294], [1438022592.0, 35780000.0, 0.45909884107959836], [1438024320.0, 35800000.0, 0.46043739311680004], [1438026048.0, 35820000.0, 0.4617790785043023], [1438027776.0, 35840000.0, 0.46312390275786347], [1438029504.0, 35860000.0, 0.46447187139973406], [1438031232.0, 35880000.0, 0.4658229899586632], [1438032960.0, 35900000.0, 0.46717726396989984], [1438034688.0, 35920000.0, 0.4685346989751993], [1438036416.0, 35940000.0, 0.4698953005228242], [1438038144.0, 35960000.0, 0.4712590741675511], [1438039872.0, 35980000.0, 0.47262602547067195], [1438041600.0, 36000000.0, 0.47399616000000017], [1438043328.0, 36020000.0, 0.47536948332987194], [1438045056.0, 36040000.0, 0.47674600104115306], [1438046784.0, 36060000.0, 0.4781257187212396], [1438048512.0, 36080000.0, 0.4795086419640648], [1438050240.0, 36100000.0, 0.4808947763700999], [1438051968.0, 36120000.0, 0.482284127546361], [1438053696.0, 36140000.0, 0.48367670110640976], [1438055424.0, 36160000.0, 
0.4850725026703607], [1438057152.0, 36180000.0, 0.48647153786488145], [1438058880.0, 36200000.0, 0.48787381232320015], [1438060608.0, 36220000.0, 0.4892793316851056], [1438062336.0, 36240000.0, 0.49068810159695475], [1438064064.0, 36260000.0, 0.4921001277116733], [1438065792.0, 36280000.0, 0.49351541568876245], [1438067520.0, 36300000.0, 0.49493397119429994], [1438069248.0, 36320000.0, 0.4963557999009466], [1438070976.0, 36340000.0, 0.49778090748794734], [1438072704.0, 36360000.0, 0.4992092996411383], [1438074432.0, 36380000.0, 0.5006409820529469], [1438076160.0, 36400000.0, 0.5020759604224001], [1438077888.0, 36420000.0, 0.5035142404551232], [1438079616.0, 36440000.0, 0.5049558278633482], [1438081344.0, 36460000.0, 0.506400728365915], [1438083072.0, 36480000.0, 0.5078489476882762], [1438084800.0, 36500000.0, 0.5093004915624999], [1438086528.0, 36520000.0, 0.5107553657272762], [1438088256.0, 36540000.0, 0.5122135759279169], [1438089984.0, 36560000.0, 0.5136751279163638], [1438091712.0, 36580000.0, 0.5151400274511888], [1438093440.0, 36600000.0, 0.5166082802976001], [1438095168.0, 36620000.0, 0.5180798922274447], [1438096896.0, 36640000.0, 0.5195548690192139], [1438098624.0, 36660000.0, 0.5210332164580446], [1438100352.0, 36680000.0, 0.5225149403357258], [1438102080.0, 36700000.0, 0.5240000464507], [1438103808.0, 36720000.0, 0.5254885406080698], [1438105536.0, 36740000.0, 0.5269804286195985], [1438107264.0, 36760000.0, 0.5284757163037176], [1438108992.0, 36780000.0, 0.5299744094855264], [1438110720.0, 36800000.0, 0.5314765139968], [1438112448.0, 36820000.0, 0.5329820356759903], [1438114176.0, 36840000.0, 0.5344909803682315], [1438115904.0, 36860000.0, 0.5360033539253423], [1438117632.0, 36880000.0, 0.5375191622058312], [1438119360.0, 36900000.0, 0.5390384110749], [1438121088.0, 36920000.0, 0.5405611064044474], [1438122816.0, 36940000.0, 0.5420872540730721], [1438124544.0, 36960000.0, 0.5436168599660791], [1438126272.0, 36980000.0, 0.54514992997548], [1438128000.0, 37000000.0, 0.5466864700000001], [1438129728.0, 37020000.0, 0.5482264859450799], [1438131456.0, 37040000.0, 0.5497699837228812], [1438133184.0, 37060000.0, 0.5513169692522879], [1438134912.0, 37080000.0, 0.5528674484589128], [1438136640.0, 37100000.0, 0.5544214272751], [1438138368.0, 37120000.0, 0.555978911639929], [1438140096.0, 37140000.0, 0.5575399074992179], [1438141824.0, 37160000.0, 0.5591044208055287], [1438143552.0, 37180000.0, 0.5606724575181696], [1438145280.0, 37200000.0, 0.5622440236032001], [1438147008.0, 37220000.0, 0.5638191250334336], [1438148736.0, 37240000.0, 0.565397767788443], [1438150464.0, 37260000.0, 0.5669799578545613], [1438152192.0, 37280000.0, 0.5685657012248905], [1438153920.0, 37300000.0, 0.5701550038993], [1438155648.0, 37320000.0, 0.5717478718844345], [1438157376.0, 37340000.0, 0.5733443111937155], [1438159104.0, 37360000.0, 0.5749443278473463], [1438160832.0, 37380000.0, 0.5765479278723153], [1438162560.0, 37400000.0, 0.5781551173024001], [1438164288.0, 37420000.0, 0.5797659021781713], [1438166016.0, 37440000.0, 0.5813802885469964], [1438167744.0, 37460000.0, 0.582998282463043], [1438169472.0, 37480000.0, 0.5846198899872843], [1438171200.0, 37500000.0, 0.5862451171875], [1438172928.0, 37520000.0, 0.5878739701382838], [1438174656.0, 37540000.0, 0.589506454921045], [1438176384.0, 37560000.0, 0.5911425776240116], [1438178112.0, 37580000.0, 0.5927823443422368], [1438179840.0, 37600000.0, 0.5944257611775997], [1438181568.0, 37620000.0, 0.5960728342388126], [1438183296.0, 37640000.0, 0.5977235696414217], 
[1438185024.0, 37660000.0, 0.5993779735078124], [1438186752.0, 37680000.0, 0.6010360519672134], [1438188480.0, 37700000.0, 0.6026978111557001], [1438190208.0, 37720000.0, 0.6043632572161975], [1438191936.0, 37740000.0, 0.6060323962984866], [1438193664.0, 37760000.0, 0.6077052345592051], [1438195392.0, 37780000.0, 0.6093817781618545], [1438197120.0, 37800000.0, 0.6110620332767998], [1438198848.0, 37820000.0, 0.6127460060812786], [1438200576.0, 37840000.0, 0.6144337027593992], [1438202304.0, 37860000.0, 0.6161251295021501], [1438204032.0, 37880000.0, 0.617820292507399], [1438205760.0, 37900000.0, 0.6195191979799], [1438207488.0, 37920000.0, 0.621221852131295], [1438209216.0, 37940000.0, 0.6229282611801201], [1438210944.0, 37960000.0, 0.6246384313518067], [1438212672.0, 37980000.0, 0.626352368878688], [1438214400.0, 38000000.0, 0.6280700799999996], [1438216128.0, 38020000.0, 0.6297915709618882], [1438217856.0, 38040000.0, 0.6315168480174089], [1438219584.0, 38060000.0, 0.6332459174265358], [1438221312.0, 38080000.0, 0.6349787854561606], [1438223040.0, 38100000.0, 0.6367154583800999], [1438224768.0, 38120000.0, 0.6384559424790968], [1438226496.0, 38140000.0, 0.6402002440408259], [1438228224.0, 38160000.0, 0.6419483693598964], [1438229952.0, 38180000.0, 0.6437003247378577], [1438231680.0, 38200000.0, 0.6454561164831999], [1438233408.0, 38220000.0, 0.6472157509113617], [1438235136.0, 38240000.0, 0.6489792343447305], [1438236864.0, 38260000.0, 0.6507465731126494], [1438238592.0, 38280000.0, 0.6525177735514183], [1438240320.0, 38300000.0, 0.6542928420043], [1438242048.0, 38320000.0, 0.6560717848215223], [1438243776.0, 38340000.0, 0.6578546083602833], [1438245504.0, 38360000.0, 0.659641318984754], [1438247232.0, 38380000.0, 0.6614319230660832], [1438248960.0, 38400000.0, 0.6632264269824], [1438250688.0, 38420000.0, 0.6650248371188194], [1438252416.0, 38440000.0, 0.6668271598674441], [1438254144.0, 38460000.0, 0.6686334016273712], [1438255872.0, 38480000.0, 0.6704435688046918], [1438257600.0, 38500000.0, 0.6722576678125002], [1438259328.0, 38520000.0, 0.6740757050708919], [1438261056.0, 38540000.0, 0.6758976870069732], [1438262784.0, 38560000.0, 0.6777236200548596], [1438264512.0, 38580000.0, 0.6795535106556848], [1438266240.0, 38600000.0, 0.6813873652576], [1438267968.0, 38620000.0, 0.6832251903157809], [1438269696.0, 38640000.0, 0.6850669922924298], [1438271424.0, 38660000.0, 0.6869127776567806], [1438273152.0, 38680000.0, 0.6887625528851015], [1438274880.0, 38700000.0, 0.6906163244607001], [1438276608.0, 38720000.0, 0.6924740988739255], [1438278336.0, 38740000.0, 0.6943358826221748], [1438280064.0, 38760000.0, 0.6962016822098932], [1438281792.0, 38780000.0, 0.6980715041485823], [1438283520.0, 38800000.0, 0.6999453549568], [1438285248.0, 38820000.0, 0.7018232411601664], [1438286976.0, 38840000.0, 0.7037051692913674], [1438288704.0, 38860000.0, 0.7055911458901583], [1438290432.0, 38880000.0, 0.7074811775033671], [1438292160.0, 38900000.0, 0.7093752706849001], [1438293888.0, 38920000.0, 0.7112734319957433], [1438295616.0, 38940000.0, 0.7131756680039685], [1438297344.0, 38960000.0, 0.7150819852847349], [1438299072.0, 38980000.0, 0.7169923904202962], [1438300800.0, 39000000.0, 0.7189068899999999], [1438302528.0, 39020000.0, 0.7208254906202961], [1438304256.0, 39040000.0, 0.7227481988847368], [1438305984.0, 39060000.0, 0.7246750214039839], [1438307712.0, 39080000.0, 0.7266059647958087], [1438309440.0, 39100000.0, 0.7285410356851002], [1438311168.0, 39120000.0, 0.7304802407038646], [1438312896.0, 
39140000.0, 0.7324235864912338], [1438314624.0, 39160000.0, 0.7343710796934646], [1438316352.0, 39180000.0, 0.7363227269639456], [1438318080.0, 39200000.0, 0.7382785349631998], [1438319808.0, 39220000.0, 0.7402385103588898], [1438321536.0, 39240000.0, 0.7422026598258185], [1438323264.0, 39260000.0, 0.7441709900459378], [1438324992.0, 39280000.0, 0.7461435077083463], [1438326720.0, 39300000.0, 0.7481202195093001], [1438328448.0, 39320000.0, 0.7501011321522103], [1438330176.0, 39340000.0, 0.7520862523476516], [1438331904.0, 39360000.0, 0.7540755868133622], [1438333632.0, 39380000.0, 0.7560691422742515], [1438335360.0, 39400000.0, 0.7580669254624], [1438337088.0, 39420000.0, 0.7600689431170674], [1438338816.0, 39440000.0, 0.762075201984692], [1438340544.0, 39460000.0, 0.7640857088188991], [1438342272.0, 39480000.0, 0.7661004703804998], [1438344000.0, 39500000.0, 0.7681194934375002], [1438345728.0, 39520000.0, 0.7701427847650999], [1438347456.0, 39540000.0, 0.7721703511457011], [1438349184.0, 39560000.0, 0.7742021993689077], [1438350912.0, 39580000.0, 0.7762383362315329], [1438352640.0, 39600000.0, 0.7782787685376], [1438354368.0, 39620000.0, 0.780323503098349], [1438356096.0, 39640000.0, 0.7823725467322379], [1438357824.0, 39660000.0, 0.7844259062649488], [1438359552.0, 39680000.0, 0.7864835885293895], [1438361280.0, 39700000.0, 0.7885456003657003], [1438363008.0, 39720000.0, 0.7906119486212535], [1438364736.0, 39740000.0, 0.7926826401506629], [1438366464.0, 39760000.0, 0.7947576818157813], [1438368192.0, 39780000.0, 0.7968370804857107], [1438369920.0, 39800000.0, 0.7989208430368], [1438371648.0, 39820000.0, 0.8010089763526547], [1438373376.0, 39840000.0, 0.8031014873241354], [1438375104.0, 39860000.0, 0.8051983828493663], [1438376832.0, 39880000.0, 0.8072996698337352], [1438378560.0, 39900000.0, 0.8094053551899002], [1438380288.0, 39920000.0, 0.8115154458377912], [1438382016.0, 39940000.0, 0.8136299487046165], [1438383744.0, 39960000.0, 0.8157488707248629], [1438385472.0, 39980000.0, 0.8178722188403041], [1438387200.0, 40000000.0, 0.82], [1438388928.0, 40020000.0, 0.8221322211603037], [1438390656.0, 40040000.0, 0.8242688892848645], [1438392384.0, 40060000.0, 0.826410011344632], [1438394112.0, 40080000.0, 0.8285555943178566], [1438395840.0, 40100000.0, 0.8307056451900998], [1438397568.0, 40120000.0, 0.8328601709542325], [1438399296.0, 40140000.0, 0.8350191786104422], [1438401024.0, 40160000.0, 0.8371826751662326], [1438402752.0, 40180000.0, 0.8393506676364333], [1438404480.0, 40200000.0, 0.8415231630431995], [1438406208.0, 40220000.0, 0.8437001684160178], [1438407936.0, 40240000.0, 0.8458816907917065], [1438409664.0, 40260000.0, 0.8480677372144253], [1438411392.0, 40280000.0, 0.8502583147356739], [1438413120.0, 40300000.0, 0.8524534304143002], [1438414848.0, 40320000.0, 0.8546530913164985], [1438416576.0, 40340000.0, 0.8568573045158192], [1438418304.0, 40360000.0, 0.8590660770931696], [1438420032.0, 40380000.0, 0.8612794161368195], [1438421760.0, 40400000.0, 0.8634973287424001], [1438423488.0, 40420000.0, 0.865719822012915], [1438425216.0, 40440000.0, 0.8679469030587398], [1438426944.0, 40460000.0, 0.8701785789976273], [1438428672.0, 40480000.0, 0.8724148569547082], [1438430400.0, 40500000.0, 0.8746557440624997], [1438432128.0, 40520000.0, 0.8769012474609075], [1438433856.0, 40540000.0, 0.8791513742972291], [1438435584.0, 40560000.0, 0.881406131726156], [1438437312.0, 40580000.0, 0.8836655269097805], [1438439040.0, 40600000.0, 0.8859295670175994], [1438440768.0, 40620000.0, 
0.8881982592265171], [1438442496.0, 40640000.0, 0.890471610720846], [1438444224.0, 40660000.0, 0.8927496286923162], [1438445952.0, 40680000.0, 0.8950323203400771], [1438447680.0, 40700000.0, 0.8973196928707005], [1438449408.0, 40720000.0, 0.8996117534981818], [1438451136.0, 40740000.0, 0.9019085094439503], [1438452864.0, 40760000.0, 0.9042099679368688], [1438454592.0, 40780000.0, 0.9065161362132388], [1438456320.0, 40800000.0, 0.9088270215168002], [1438458048.0, 40820000.0, 0.9111426310987423], [1438459776.0, 40840000.0, 0.9134629722177029], [1438461504.0, 40860000.0, 0.9157880521397744], [1438463232.0, 40880000.0, 0.9181178781385031], [1438464960.0, 40900000.0, 0.9204524574948998], [1438466688.0, 40920000.0, 0.9227917974974387], [1438468416.0, 40940000.0, 0.9251359054420647], [1438470144.0, 40960000.0, 0.9274847886321911], [1438471872.0, 40980000.0, 0.9298384543787117], [1438473600.0, 41000000.0, 0.9321969099999996], [1438475328.0, 41020000.0, 0.9345601628219126], [1438477056.0, 41040000.0, 0.9369282201777932], [1438478784.0, 41060000.0, 0.9393010894084796], [1438480512.0, 41080000.0, 0.9416787778623043], [1438482240.0, 41100000.0, 0.9440612928951003], [1438483968.0, 41120000.0, 0.946448641870201], [1438485696.0, 41140000.0, 0.9488408321584496], [1438487424.0, 41160000.0, 0.9512378711382], [1438489152.0, 41180000.0, 0.953639766195322], [1438490880.0, 41200000.0, 0.9560465247232001], [1438492608.0, 41220000.0, 0.9584581541227454], [1438494336.0, 41240000.0, 0.9608746618023941], [1438496064.0, 41260000.0, 0.9632960551781138], [1438497792.0, 41280000.0, 0.9657223416734027], [1438499520.0, 41300000.0, 0.9681535287192997], [1438501248.0, 41320000.0, 0.9705896237543858], [1438502976.0, 41340000.0, 0.973030634224788], [1438504704.0, 41360000.0, 0.9754765675841783], [1438506432.0, 41380000.0, 0.9779274312937871], [1438508160.0, 41400000.0, 0.9803832328223996], [1438509888.0, 41420000.0, 0.9828439796463637], [1438511616.0, 41440000.0, 0.9853096792495883], [1438513344.0, 41460000.0, 0.9877803391235548], [1438515072.0, 41480000.0, 0.9902559667673156], [1438516800.0, 41500000.0, 0.9927365696875005], [1438518528.0, 41520000.0, 0.9952221553983162], [1438520256.0, 41540000.0, 0.997712731421557], [1438521984.0, 41560000.0, 1.0002083052866035], [1438523712.0, 41580000.0, 1.002708884530429], [1438525440.0, 41600000.0, 1.0052144766976001], [1438527168.0, 41620000.0, 1.007725089340285], [1438528896.0, 41640000.0, 1.0102407300182537], [1438530624.0, 41660000.0, 1.012761406298885], [1438532352.0, 41680000.0, 1.0152871257571656], [1438534080.0, 41700000.0, 1.0178178959756998], [1438535808.0, 41720000.0, 1.0203537245447092], [1438537536.0, 41740000.0, 1.0228946190620392], [1438539264.0, 41760000.0, 1.0254405871331576], [1438540992.0, 41780000.0, 1.0279916363711665], [1438542720.0, 41800000.0, 1.0305477743967997], [1438544448.0, 41820000.0, 1.033109008838431], [1438546176.0, 41840000.0, 1.0356753473320717], [1438547904.0, 41860000.0, 1.0382467975213823], [1438549632.0, 41880000.0, 1.0408233670576708], [1438551360.0, 41900000.0, 1.0434050635999006], [1438553088.0, 41920000.0, 1.0459918948146874], [1438554816.0, 41940000.0, 1.0485838683763122], [1438556544.0, 41960000.0, 1.0511809919667185], [1438558272.0, 41980000.0, 1.0537832732755208], [1438560000.0, 42000000.0, 1.0563907200000002], [1438561728.0, 42020000.0, 1.0590033398451202], [1438563456.0, 42040000.0, 1.0616211405235205], [1438565184.0, 42060000.0, 1.064244129755528], [1438566912.0, 42080000.0, 1.0668723152691535], [1438568640.0, 42100000.0, 
1.0695057048001], [1438570368.0, 42120000.0, 1.0721443060917686], [1438572096.0, 42140000.0, 1.0747881268952582], [1438573824.0, 42160000.0, 1.0774371749693687], [1438575552.0, 42180000.0, 1.0800914580806098], [1438577280.0, 42200000.0, 1.0827509840032], [1438579008.0, 42220000.0, 1.0854157605190742], [1438580736.0, 42240000.0, 1.0880857954178826], [1438582464.0, 42260000.0, 1.0907610964970016], [1438584192.0, 42280000.0, 1.0934416715615303], [1438585920.0, 42300000.0, 1.0961275284243004], [1438587648.0, 42320000.0, 1.0988186749058746], [1438589376.0, 42340000.0, 1.1015151188345553], [1438591104.0, 42360000.0, 1.1042168680463857], [1438592832.0, 42380000.0, 1.106923930385156], [1438594560.0, 42400000.0, 1.1096363137024001], [1438596288.0, 42420000.0, 1.112354025857411], [1438598016.0, 42440000.0, 1.1150770747172358], [1438599744.0, 42460000.0, 1.1178054681566836], [1438601472.0, 42480000.0, 1.1205392140583244], [1438603200.0, 42500000.0, 1.1232783203125], [1438604928.0, 42520000.0, 1.1260227948173236], [1438606656.0, 42540000.0, 1.1287726454786842], [1438608384.0, 42560000.0, 1.131527880210252], [1438610112.0, 42580000.0, 1.1342885069334767], [1438611840.0, 42600000.0, 1.1370545335775997], [1438613568.0, 42620000.0, 1.1398259680796523], [1438615296.0, 42640000.0, 1.1426028183844619], [1438617024.0, 42660000.0, 1.1453850924446527], [1438618752.0, 42680000.0, 1.1481727982206535], [1438620480.0, 42700000.0, 1.1509659436806996], [1438622208.0, 42720000.0, 1.1537645368008378], [1438623936.0, 42740000.0, 1.1565685855649261], [1438625664.0, 42760000.0, 1.1593780979646453], [1438627392.0, 42780000.0, 1.162193081999494], [1438629120.0, 42800000.0, 1.1650135456768], [1438630848.0, 42820000.0, 1.1678394970117183], [1438632576.0, 42840000.0, 1.1706709440272394], [1438634304.0, 42860000.0, 1.17350789475419], [1438636032.0, 42880000.0, 1.1763503572312393], [1438637760.0, 42900000.0, 1.1791983395049002], [1438639488.0, 42920000.0, 1.182051849629535], [1438641216.0, 42940000.0, 1.18491089566736], [1438642944.0, 42960000.0, 1.1877754856884473], [1438644672.0, 42980000.0, 1.190645627770728], [1438646400.0, 43000000.0, 1.1935213299999996], [1438648128.0, 43020000.0, 1.1964026004699273], [1438649856.0, 43040000.0, 1.1992894472820494], [1438651584.0, 43060000.0, 1.202181878545776], [1438653312.0, 43080000.0, 1.2050799023784005], [1438655040.0, 43100000.0, 1.2079835269050996], [1438656768.0, 43120000.0, 1.210892760258937], [1438658496.0, 43140000.0, 1.213807610580866], [1438660224.0, 43160000.0, 1.2167280860197363], [1438661952.0, 43180000.0, 1.2196541947322972], [1438663680.0, 43200000.0, 1.2225859448832004], [1438665408.0, 43220000.0, 1.2255233446450016], [1438667136.0, 43240000.0, 1.2284664021981706], [1438668864.0, 43260000.0, 1.2314151257310888], [1438670592.0, 43280000.0, 1.234369523440059], [1438672320.0, 43300000.0, 1.2373296035293002], [1438674048.0, 43320000.0, 1.2402953742109624], [1438675776.0, 43340000.0, 1.2432668437051229], [1438677504.0, 43360000.0, 1.2462440202397946], [1438679232.0, 43380000.0, 1.2492269120509232], [1438680960.0, 43400000.0, 1.2522155273823998], [1438682688.0, 43420000.0, 1.255209874486059], [1438684416.0, 43440000.0, 1.2582099616216846], [1438686144.0, 43460000.0, 1.261215797057011], [1438687872.0, 43480000.0, 1.2642273890677318], [1438689600.0, 43500000.0, 1.2672447459374996], [1438691328.0, 43520000.0, 1.2702678759579327], [1438693056.0, 43540000.0, 1.2732967874286132], [1438694784.0, 43560000.0, 1.2763314886570998], [1438696512.0, 43580000.0, 1.2793719879589243], 
[1438698240.0, 43600000.0, 1.2824182936576], [1438699968.0, 43620000.0, 1.2854704140846211], [1438701696.0, 43640000.0, 1.28852835757947], [1438703424.0, 43660000.0, 1.2915921324896198], [1438705152.0, 43680000.0, 1.294661747170542], [1438706880.0, 43700000.0, 1.2977372099857003], [1438708608.0, 43720000.0, 1.3008185293065655], [1438710336.0, 43740000.0, 1.3039057135126142], [1438712064.0, 43760000.0, 1.3069987709913338], [1438713792.0, 43780000.0, 1.3100977101382227], [1438715520.0, 43800000.0, 1.3132025393567999], [1438717248.0, 43820000.0, 1.316313267058606], [1438718976.0, 43840000.0, 1.3194299016632078], [1438720704.0, 43860000.0, 1.3225524515981983], [1438722432.0, 43880000.0, 1.325680925299207], [1438724160.0, 43900000.0, 1.3288153312098996], [1438725888.0, 43920000.0, 1.3319556777819839], [1438727616.0, 43940000.0, 1.3351019734752085], [1438729344.0, 43960000.0, 1.338254226757375], [1438731072.0, 43980000.0, 1.3414124461043357], [1438732800.0, 44000000.0, 1.3445766400000008], [1438734528.0, 44020000.0, 1.3477468169363362], [1438736256.0, 44040000.0, 1.350922985413377], [1438737984.0, 44060000.0, 1.3541051539392233], [1438739712.0, 44080000.0, 1.3572933310300492], [1438741440.0, 44100000.0, 1.3604875252101003], [1438743168.0, 44120000.0, 1.3636877450117046], [1438744896.0, 44140000.0, 1.3668939989752733], [1438746624.0, 44160000.0, 1.370106295649305], [1438748352.0, 44180000.0, 1.3733246435903856], [1438750080.0, 44200000.0, 1.3765490513632], [1438751808.0, 44220000.0, 1.3797795275405291], [1438753536.0, 44240000.0, 1.3830160807032592], [1438755264.0, 44260000.0, 1.3862587194403773], [1438756992.0, 44280000.0, 1.3895074523489863], [1438758720.0, 44300000.0, 1.3927622880342994], [1438760448.0, 44320000.0, 1.396023235109651], [1438762176.0, 44340000.0, 1.3992903021964918], [1438763904.0, 44360000.0, 1.402563497924402], [1438765632.0, 44380000.0, 1.4058428309310906], [1438767360.0, 44400000.0, 1.4091283098624006], [1438769088.0, 44420000.0, 1.4124199433723077], [1438770816.0, 44440000.0, 1.4157177401229322], [1438772544.0, 44460000.0, 1.4190217087845385], [1438774272.0, 44480000.0, 1.4223318580355409], [1438776000.0, 44500000.0, 1.4256481965625003], [1438777728.0, 44520000.0, 1.42897073306014], [1438779456.0, 44540000.0, 1.4322994762313406], [1438781184.0, 44560000.0, 1.4356344347871484], [1438782912.0, 44580000.0, 1.4389756174467734], [1438784640.0, 44600000.0, 1.4423230329376002], [1438786368.0, 44620000.0, 1.4456766899951885], [1438788096.0, 44640000.0, 1.4490365973632786], [1438789824.0, 44660000.0, 1.4524027637937889], [1438791552.0, 44680000.0, 1.4557751980468296], [1438793280.0, 44700000.0, 1.4591539088906995], [1438795008.0, 44720000.0, 1.4625389051018944], [1438796736.0, 44740000.0, 1.4659301954651027], [1438798464.0, 44760000.0, 1.4693277887732215], [1438800192.0, 44780000.0, 1.47273169382735], [1438801920.0, 44800000.0, 1.4761419194368006], [1438803648.0, 44820000.0, 1.4795584744190946], [1438805376.0, 44840000.0, 1.4829813675999752], [1438807104.0, 44860000.0, 1.486410607813406], [1438808832.0, 44880000.0, 1.489846203901576], [1438810560.0, 44900000.0, 1.4932881647149003], [1438812288.0, 44920000.0, 1.496736499112031], [1438814016.0, 44940000.0, 1.5001912159598558], [1438815744.0, 44960000.0, 1.503652324133504], [1438817472.0, 44980000.0, 1.5071198325163446], [1438819200.0, 45000000.0, 1.51059375], [1438820928.0, 45020000.0, 1.5140740854843437], [1438822656.0, 45040000.0, 1.5175608478775042], [1438824384.0, 45060000.0, 1.5210540460958724], [1438826112.0, 45080000.0, 
1.5245536890640967], [1438827840.0, 45100000.0, 1.5280597857150997], [1438829568.0, 45120000.0, 1.531572344990072], [1438831296.0, 45140000.0, 1.5350913758384823], [1438833024.0, 45160000.0, 1.5386168872180728], [1438834752.0, 45180000.0, 1.5421488880948735], [1438836480.0, 45200000.0, 1.5456873874431991], [1438838208.0, 45220000.0, 1.5492323942456578], [1438839936.0, 45240000.0, 1.5527839174931464], [1438841664.0, 45260000.0, 1.556341966184865], [1438843392.0, 45280000.0, 1.5599065493283135], [1438845120.0, 45300000.0, 1.5634776759393005], [1438846848.0, 45320000.0, 1.5670553550419386], [1438848576.0, 45340000.0, 1.5706395956686592], [1438850304.0, 45360000.0, 1.5742304068602095], [1438852032.0, 45380000.0, 1.5778277976656596], [1438853760.0, 45400000.0, 1.5814317771424], [1438855488.0, 45420000.0, 1.5850423543561547], [1438857216.0, 45440000.0, 1.5886595383809794], [1438858944.0, 45460000.0, 1.5922833382992676], [1438860672.0, 45480000.0, 1.5959137632017477], [1438862400.0, 45500000.0, 1.5995508221874997], [1438864128.0, 45520000.0, 1.6031945243639472], [1438865856.0, 45540000.0, 1.6068448788468697], [1438867584.0, 45560000.0, 1.610501894760396], [1438869312.0, 45580000.0, 1.6141655812370204], [1438871040.0, 45600000.0, 1.6178359474175992], [1438872768.0, 45620000.0, 1.621513002451357], [1438874496.0, 45640000.0, 1.6251967554958862], [1438876224.0, 45660000.0, 1.6288872157171563], [1438877952.0, 45680000.0, 1.6325843922895167], [1438879680.0, 45700000.0, 1.6362882943957007], [1438881408.0, 45720000.0, 1.6399989312268217], [1438883136.0, 45740000.0, 1.6437163119823903], [1438884864.0, 45760000.0, 1.6474404458703087], [1438886592.0, 45780000.0, 1.6511713421068788], [1438888320.0, 45800000.0, 1.6549090099168002], [1438890048.0, 45820000.0, 1.6586534585331822], [1438891776.0, 45840000.0, 1.6624046971975428], [1438893504.0, 45860000.0, 1.6661627351598145], [1438895232.0, 45880000.0, 1.6699275816783434], [1438896960.0, 45900000.0, 1.6736992460198996], [1438898688.0, 45920000.0, 1.6774777374596788], [1438900416.0, 45940000.0, 1.681263065281305], [1438902144.0, 45960000.0, 1.6850552387768314], [1438903872.0, 45980000.0, 1.688854267246752], [1438905600.0, 46000000.0, 1.6926601599999995], [1438907328.0, 46020000.0, 1.696472926353953], [1438909056.0, 46040000.0, 1.7002925756344334], [1438910784.0, 46060000.0, 1.7041191171757195], [1438912512.0, 46080000.0, 1.707952560320544], [1438914240.0, 46100000.0, 1.7117929144201007], [1438915968.0, 46120000.0, 1.7156401888340413], [1438917696.0, 46140000.0, 1.7194943929304896], [1438919424.0, 46160000.0, 1.7233555360860398], [1438921152.0, 46180000.0, 1.727223627685762], [1438922880.0, 46200000.0, 1.7310986771232002], [1438924608.0, 46220000.0, 1.7349806938003853], [1438926336.0, 46240000.0, 1.738869687127834], [1438928064.0, 46260000.0, 1.7427656665245541], [1438929792.0, 46280000.0, 1.7466686414180426], [1438931520.0, 46300000.0, 1.7505786212443], [1438933248.0, 46320000.0, 1.7544956154478257], [1438934976.0, 46340000.0, 1.758419633481628], [1438936704.0, 46360000.0, 1.7623506848072181], [1438938432.0, 46380000.0, 1.7662887788946269], [1438940160.0, 46400000.0, 1.770233925222399], [1438941888.0, 46420000.0, 1.7741861332776039], [1438943616.0, 46440000.0, 1.7781454125558285], [1438945344.0, 46460000.0, 1.7821117725611948], [1438947072.0, 46480000.0, 1.7860852228063555], [1438948800.0, 46500000.0, 1.7900657728125007], [1438950528.0, 46520000.0, 1.7940534321093562], [1438952256.0, 46540000.0, 1.798048210235197], [1438953984.0, 46560000.0, 1.8020501167368432], 
[1438955712.0, 46580000.0, 1.8060591611696692], [1438957440.0, 46600000.0, 1.8100753530976001], [1438959168.0, 46620000.0, 1.8140987020931247], [1438960896.0, 46640000.0, 1.8181292177372932], [1438962624.0, 46660000.0, 1.8221669096197253], [1438964352.0, 46680000.0, 1.8262117873386055], [1438966080.0, 46700000.0, 1.8302638605006998], [1438967808.0, 46720000.0, 1.8343231387213492], [1438969536.0, 46740000.0, 1.8383896316244792], [1438971264.0, 46760000.0, 1.8424633488425977], [1438972992.0, 46780000.0, 1.8465443000168063], [1438974720.0, 46800000.0, 1.8506324947967994], [1438976448.0, 46820000.0, 1.8547279428408712], [1438978176.0, 46840000.0, 1.858830653815912], [1438979904.0, 46860000.0, 1.862940637397422], [1438981632.0, 46880000.0, 1.867057903269511], [1438983360.0, 46900000.0, 1.8711824611249008], [1438985088.0, 46920000.0, 1.8753143206649276], [1438986816.0, 46940000.0, 1.8794534915995518], [1438988544.0, 46960000.0, 1.8835999836473585], [1438990272.0, 46980000.0, 1.8877538065355606], [1438992000.0, 47000000.0, 1.8919149700000004], [1438993728.0, 47020000.0, 1.8960834837851601], [1438995456.0, 47040000.0, 1.9002593576441604], [1438997184.0, 47060000.0, 1.9044426013387685], [1438998912.0, 47080000.0, 1.9086332246393931], [1439000640.0, 47100000.0, 1.9128312373251], [1439002368.0, 47120000.0, 1.9170366491836084], [1439004096.0, 47140000.0, 1.9212494700112985], [1439005824.0, 47160000.0, 1.9254697096132087], [1439007552.0, 47180000.0, 1.9296973778030495], [1439009280.0, 47200000.0, 1.9339324844031998], [1439011008.0, 47220000.0, 1.9381750392447146], [1439012736.0, 47240000.0, 1.9424250521673232], [1439014464.0, 47260000.0, 1.9466825330194413], [1439016192.0, 47280000.0, 1.9509474916581702], [1439017920.0, 47300000.0, 1.955219937949301], [1439019648.0, 47320000.0, 1.9594998817673148], [1439021376.0, 47340000.0, 1.9637873329953954], [1439023104.0, 47360000.0, 1.9680823015254256], [1439024832.0, 47380000.0, 1.9723847972579962], [1439026560.0, 47400000.0, 1.9766948301024005], [1439028288.0, 47420000.0, 1.9810124099766508], [1439030016.0, 47440000.0, 1.9853375468074757], [1439031744.0, 47460000.0, 1.9896702505303236], [1439033472.0, 47480000.0, 1.9940105310893643], [1439035200.0, 47500000.0, 1.9983583984375], [1439036928.0, 47520000.0, 2.0027138625363636], [1439038656.0, 47540000.0, 2.0070769333563243], [1439040384.0, 47560000.0, 2.0114476208764924], [1439042112.0, 47580000.0, 2.0158259350847167], [1439043840.0, 47600000.0, 2.020211885977599], [1439045568.0, 47620000.0, 2.024605483560492], [1439047296.0, 47640000.0, 2.0290067378475025], [1439049024.0, 47660000.0, 2.0334156588614927], [1439050752.0, 47680000.0, 2.0378322566340934], [1439052480.0, 47700000.0, 2.0422565412056994], [1439054208.0, 47720000.0, 2.046688522625478], [1439055936.0, 47740000.0, 2.0511282109513664], [1439057664.0, 47760000.0, 2.055575616250085], [1439059392.0, 47780000.0, 2.0600307485971334], [1439061120.0, 47800000.0, 2.064493618076801], [1439062848.0, 47820000.0, 2.0689642347821593], [1439064576.0, 47840000.0, 2.073442608815079], [1439066304.0, 47860000.0, 2.0779287502862296], [1439068032.0, 47880000.0, 2.08242266931508], [1439069760.0, 47900000.0, 2.0869243760298994], [1439071488.0, 47920000.0, 2.091433880567775], [1439073216.0, 47940000.0, 2.0959511930745993], [1439074944.0, 47960000.0, 2.1004763237050876], [1439076672.0, 47980000.0, 2.1050092826227678], [1439078400.0, 48000000.0, 2.1095500799999996], [1439080128.0, 48020000.0, 2.1140987260179673], [1439081856.0, 48040000.0, 2.1186552308666897], [1439083584.0, 
48060000.0, 2.123219604745016], [1439085312.0, 48080000.0, 2.1277918578606405], [1439087040.0, 48100000.0, 2.132372000430099], [1439088768.0, 48120000.0, 2.136960042678778], [1439090496.0, 48140000.0, 2.1415559948409064], [1439092224.0, 48160000.0, 2.1461598671595756], [1439093952.0, 48180000.0, 2.1507716698867365], [1439095680.0, 48200000.0, 2.155391413283201], [1439097408.0, 48220000.0, 2.1600191076186426], [1439099136.0, 48240000.0, 2.1646547631716104], [1439100864.0, 48260000.0, 2.1692983902295286], [1439102592.0, 48280000.0, 2.173949999088699], [1439104320.0, 48300000.0, 2.1786096000543], [1439106048.0, 48320000.0, 2.1832772034404018], [1439107776.0, 48340000.0, 2.1879528195699622], [1439109504.0, 48360000.0, 2.1926364587748344], [1439111232.0, 48380000.0, 2.1973281313957638], [1439112960.0, 48400000.0, 2.2020278477823996], [1439114688.0, 48420000.0, 2.206735618293299], [1439116416.0, 48440000.0, 2.211451453295925], [1439118144.0, 48460000.0, 2.216175363166651], [1439119872.0, 48480000.0, 2.2209073582907717], [1439121600.0, 48500000.0, 2.2256474490624987], [1439123328.0, 48520000.0, 2.2303956458849723], [1439125056.0, 48540000.0, 2.235151959170253], [1439126784.0, 48560000.0, 2.23991639933934], [1439128512.0, 48580000.0, 2.244688976822164], [1439130240.0, 48600000.0, 2.249469702057601], [1439131968.0, 48620000.0, 2.2542585854934614], [1439133696.0, 48640000.0, 2.25905563758651], [1439135424.0, 48660000.0, 2.2638608688024595], [1439137152.0, 48680000.0, 2.268674289615982], [1439138880.0, 48700000.0, 2.2734959105107007], [1439140608.0, 48720000.0, 2.278325741979206], [1439142336.0, 48740000.0, 2.2831637945230536], [1439144064.0, 48760000.0, 2.2880100786527744], [1439145792.0, 48780000.0, 2.2928646048878627], [1439147520.0, 48800000.0, 2.2977273837568], [1439149248.0, 48820000.0, 2.3025984257970458], [1439150976.0, 48840000.0, 2.3074777415550485], [1439152704.0, 48860000.0, 2.3123653415862386], [1439154432.0, 48880000.0, 2.317261236455047], [1439156160.0, 48900000.0, 2.3221654367348994], [1439157888.0, 48920000.0, 2.3270779530082235], [1439159616.0, 48940000.0, 2.3319987958664488], [1439161344.0, 48960000.0, 2.3369279759100148], [1439163072.0, 48980000.0, 2.3418655037483753], [1439164800.0, 49000000.0, 2.346811390000001], [1439166528.0, 49020000.0, 2.3517656452923767], [1439168256.0, 49040000.0, 2.356728280262017], [1439169984.0, 49060000.0, 2.3616993055544633], [1439171712.0, 49080000.0, 2.3666787318242895], [1439173440.0, 49100000.0, 2.3716665697351], [1439175168.0, 49120000.0, 2.376662829959545], [1439176896.0, 49140000.0, 2.381667523179314], [1439178624.0, 49160000.0, 2.386680660085146], [1439180352.0, 49180000.0, 2.391702251376826], [1439182080.0, 49200000.0, 2.3967323077632], [1439183808.0, 49220000.0, 2.4017708399621687], [1439185536.0, 49240000.0, 2.406817858700699], [1439187264.0, 49260000.0, 2.411873374714818], [1439188992.0, 49280000.0, 2.416937398749626], [1439190720.0, 49300000.0, 2.4220099415592986], [1439192448.0, 49320000.0, 2.427091013907091], [1439194176.0, 49340000.0, 2.4321806265653323], [1439195904.0, 49360000.0, 2.437278790315442], [1439197632.0, 49380000.0, 2.442385515947931], [1439199360.0, 49400000.0, 2.447500814262401], [1439201088.0, 49420000.0, 2.4526246960675477], [1439202816.0, 49440000.0, 2.4577571721811724], [1439204544.0, 49460000.0, 2.462898253430178], [1439206272.0, 49480000.0, 2.468047950650581], [1439208000.0, 49500000.0, 2.4732062746875], [1439209728.0, 49520000.0, 2.47837323639518], [1439211456.0, 49540000.0, 2.4835488466369804], [1439213184.0, 
49560000.0, 2.4887331162853887], [1439214912.0, 49580000.0, 2.4939260562220134], [1439216640.0, 49600000.0, 2.4991276773376], [1439218368.0, 49620000.0, 2.5043379905320284], [1439220096.0, 49640000.0, 2.5095570067143194], [1439221824.0, 49660000.0, 2.5147847368026293], [1439223552.0, 49680000.0, 2.520021191724269], [1439225280.0, 49700000.0, 2.5252663824156993], [1439227008.0, 49720000.0, 2.530520319822535], [1439228736.0, 49740000.0, 2.5357830148995437], [1439230464.0, 49760000.0, 2.5410544786106617], [1439232192.0, 49780000.0, 2.54633472192899], [1439233920.0, 49800000.0, 2.5516237558368013], [1439235648.0, 49820000.0, 2.556921591325535], [1439237376.0, 49840000.0, 2.562228239395816], [1439239104.0, 49860000.0, 2.5675437110574455], [1439240832.0, 49880000.0, 2.5728680173294154], [1439242560.0, 49900000.0, 2.5782011692399003], [1439244288.0, 49920000.0, 2.583543177826271], [1439246016.0, 49940000.0, 2.588894054135096], [1439247744.0, 49960000.0, 2.594253809222144], [1439249472.0, 49980000.0, 2.5996224541523842], [1439251200.0, 50000000.0, 2.605], [1439252928.0, 50020000.0, 2.6103864578483833], [1439254656.0, 50040000.0, 2.6157818387901433], [1439256384.0, 50060000.0, 2.621186153927112], [1439258112.0, 50080000.0, 2.626599414370336], [1439259840.0, 50100000.0, 2.6320216312400997], [1439261568.0, 50120000.0, 2.6374528156659123], [1439263296.0, 50140000.0, 2.6428929787865227], [1439265024.0, 50160000.0, 2.6483421317499123], [1439266752.0, 50180000.0, 2.6538002857133134], [1439268480.0, 50200000.0, 2.6592674518431987], [1439270208.0, 50220000.0, 2.6647436413152987], [1439271936.0, 50240000.0, 2.670228865314586], [1439273664.0, 50260000.0, 2.675723135035305], [1439275392.0, 50280000.0, 2.681226461680953], [1439277120.0, 50300000.0, 2.6867388564643004], [1439278848.0, 50320000.0, 2.692260330607379], [1439280576.0, 50340000.0, 2.697790895341499], [1439282304.0, 50360000.0, 2.703330561907249], [1439284032.0, 50380000.0, 2.7088793415545003], [1439285760.0, 50400000.0, 2.7144372455424], [1439287488.0, 50420000.0, 2.7200042851393946], [1439289216.0, 50440000.0, 2.725580471623219], [1439290944.0, 50460000.0, 2.7311658162809076], [1439292672.0, 50480000.0, 2.7367603304087873], [1439294400.0, 50500000.0, 2.7423640253124995], [1439296128.0, 50520000.0, 2.747976912306987], [1439297856.0, 50540000.0, 2.75359900271651], [1439299584.0, 50560000.0, 2.7592303078746356], [1439301312.0, 50580000.0, 2.7648708391242605], [1439303040.0, 50600000.0, 2.770520607817599], [1439304768.0, 50620000.0, 2.776179625316197], [1439306496.0, 50640000.0, 2.781847902990926], [1439308224.0, 50660000.0, 2.787525452221996], [1439309952.0, 50680000.0, 2.793212284398957], [1439311680.0, 50700000.0, 2.7989084109207005], [1439313408.0, 50720000.0, 2.8046138431954617], [1439315136.0, 50740000.0, 2.8103285926408303], [1439316864.0, 50760000.0, 2.816052670683748], [1439318592.0, 50780000.0, 2.8217860887605197], [1439320320.0, 50800000.0, 2.8275288583168], [1439322048.0, 50820000.0, 2.8332809908076215], [1439323776.0, 50840000.0, 2.839042497697382], [1439325504.0, 50860000.0, 2.8448133904598545], [1439327232.0, 50880000.0, 2.8505936805781835], [1439328960.0, 50900000.0, 2.8563833795449], [1439330688.0, 50920000.0, 2.862182498861918], [1439332416.0, 50940000.0, 2.867991050040545], [1439334144.0, 50960000.0, 2.8738090446014715], [1439335872.0, 50980000.0, 2.879636494074792], [1439337600.0, 51000000.0, 2.8854734099999986], [1439339328.0, 51020000.0, 2.8913198039259926], [1439341056.0, 51040000.0, 2.8971756874110732], [1439342784.0, 
51060000.0, 2.9030410720229596], [1439344512.0, 51080000.0, 2.9089159693387843], [1439346240.0, 51100000.0, 2.9148003909451012], [1439347968.0, 51120000.0, 2.9206943484378813], [1439349696.0, 51140000.0, 2.9265978534225296], [1439351424.0, 51160000.0, 2.9325109175138793], [1439353152.0, 51180000.0, 2.938433552336202], [1439354880.0, 51200000.0, 2.944365769523201], [1439356608.0, 51220000.0, 2.9503075807180252], [1439358336.0, 51240000.0, 2.9562589975732734], [1439360064.0, 51260000.0, 2.962220031750994], [1439361792.0, 51280000.0, 2.9681906949226824], [1439363520.0, 51300000.0, 2.9741709987693], [1439365248.0, 51320000.0, 2.980160954981266], [1439366976.0, 51340000.0, 2.9861605752584683], [1439368704.0, 51360000.0, 2.9921698713102587], [1439370432.0, 51380000.0, 2.9981888548554667], [1439372160.0, 51400000.0, 3.0042175376223987], [1439373888.0, 51420000.0, 3.010255931348844], [1439375616.0, 51440000.0, 3.0163040477820684], [1439377344.0, 51460000.0, 3.0223618986788345], [1439379072.0, 51480000.0, 3.028429495805395], [1439380800.0, 51500000.0, 3.034506850937501], [1439382528.0, 51520000.0, 3.040593975860397], [1439384256.0, 51540000.0, 3.046690882368837], [1439385984.0, 51560000.0, 3.052797582267083], [1439387712.0, 51580000.0, 3.05891408736891], [1439389440.0, 51600000.0, 3.0650404094976], [1439391168.0, 51620000.0, 3.071176560485964], [1439392896.0, 51640000.0, 3.0773225521763323], [1439394624.0, 51660000.0, 3.083478396420566], [1439396352.0, 51680000.0, 3.0896441050800463], [1439398080.0, 51700000.0, 3.0958196900256993], [1439399808.0, 51720000.0, 3.102005163137989], [1439401536.0, 51740000.0, 3.10820053630692], [1439403264.0, 51760000.0, 3.114405821432038], [1439404992.0, 51780000.0, 3.1206210304224467], [1439406720.0, 51800000.0, 3.1268461751967993], [1439408448.0, 51820000.0, 3.1330812676833113], [1439410176.0, 51840000.0, 3.1393263198197525], [1439411904.0, 51860000.0, 3.1455813435534625], [1439413632.0, 51880000.0, 3.1518463508413506], [1439415360.0, 51900000.0, 3.1581213536499013], [1439417088.0, 51920000.0, 3.164406363955168], [1439418816.0, 51940000.0, 3.170701393742792], [1439420544.0, 51960000.0, 3.1770064550079984], [1439422272.0, 51980000.0, 3.183321559755601], [1439424000.0, 52000000.0, 3.1896467200000007], [1439425728.0, 52020000.0, 3.1959819477652], [1439427456.0, 52040000.0, 3.2023272550848], [1439429184.0, 52060000.0, 3.2086826540020095], [1439430912.0, 52080000.0, 3.2150481565696336], [1439432640.0, 52100000.0, 3.2214237748501002], [1439434368.0, 52120000.0, 3.2278095209154483], [1439436096.0, 52140000.0, 3.234205406847339], [1439437824.0, 52160000.0, 3.2406114447370493], [1439439552.0, 52180000.0, 3.2470276466854893], [1439441280.0, 52200000.0, 3.253454024803199], [1439443008.0, 52220000.0, 3.259890591210355], [1439444736.0, 52240000.0, 3.2663373580367634], [1439446464.0, 52260000.0, 3.2727943374218817], [1439448192.0, 52280000.0, 3.27926154151481], [1439449920.0, 52300000.0, 3.2857389824743017], [1439451648.0, 52320000.0, 3.292226672468755], [1439453376.0, 52340000.0, 3.298724623676235], [1439455104.0, 52360000.0, 3.305232848284465], [1439456832.0, 52380000.0, 3.3117513584908367], [1439458560.0, 52400000.0, 3.3182801665024004], [1439460288.0, 52420000.0, 3.324819284535892], [1439462016.0, 52440000.0, 3.3313687248177155], [1439463744.0, 52460000.0, 3.337928499583964], [1439465472.0, 52480000.0, 3.344498621080405], [1439467200.0, 52500000.0, 3.3510791015625], [1439468928.0, 52520000.0, 3.357669953295403], [1439470656.0, 52540000.0, 3.3642711885539636], [1439472384.0, 
52560000.0, 3.3708828196227323], [1439474112.0, 52580000.0, 3.377504858795956], [1439475840.0, 52600000.0, 3.384137318377599], [1439477568.0, 52620000.0, 3.390780210681332], [1439479296.0, 52640000.0, 3.3974335480305426], [1439481024.0, 52660000.0, 3.4040973427583325], [1439482752.0, 52680000.0, 3.410771607207533], [1439484480.0, 52700000.0, 3.4174563537306986], [1439486208.0, 52720000.0, 3.4241515946901186], [1439487936.0, 52740000.0, 3.4308573424578066], [1439489664.0, 52760000.0, 3.4375736094155247], [1439491392.0, 52780000.0, 3.4443004079547728], [1439493120.0, 52800000.0, 3.451037750476801], [1439494848.0, 52820000.0, 3.4577856493925987], [1439496576.0, 52840000.0, 3.464544117122919], [1439498304.0, 52860000.0, 3.471313166098269], [1439500032.0, 52880000.0, 3.4780928087589205], [1439501760.0, 52900000.0, 3.4848830575548995], [1439503488.0, 52920000.0, 3.491683924946015], [1439505216.0, 52940000.0, 3.498495423401838], [1439506944.0, 52960000.0, 3.5053175654017275], [1439508672.0, 52980000.0, 3.512150363434808], [1439510400.0, 53000000.0, 3.518993829999999], [1439512128.0, 53020000.0, 3.5258479776060065], [1439513856.0, 53040000.0, 3.5327128187713295], [1439515584.0, 53060000.0, 3.5395883660242564], [1439517312.0, 53080000.0, 3.546474631902881], [1439519040.0, 53100000.0, 3.553371628955099], [1439520768.0, 53120000.0, 3.5602793697386175], [1439522496.0, 53140000.0, 3.5671978668209454], [1439524224.0, 53160000.0, 3.5741271327794157], [1439525952.0, 53180000.0, 3.5810671802011766], [1439527680.0, 53200000.0, 3.5880180216832014], [1439529408.0, 53220000.0, 3.5949796698322816], [1439531136.0, 53240000.0, 3.6019521372650503], [1439532864.0, 53260000.0, 3.6089354366079673], [1439534592.0, 53280000.0, 3.6159295804973395], [1439536320.0, 53300000.0, 3.6229345815793], [1439538048.0, 53320000.0, 3.6299504525098416], [1439539776.0, 53340000.0, 3.6369772059548025], [1439541504.0, 53360000.0, 3.6440148545898747], [1439543232.0, 53380000.0, 3.6510634111006035], [1439544960.0, 53400000.0, 3.658122888182399], [1439546688.0, 53420000.0, 3.6651932985405384], [1439548416.0, 53440000.0, 3.6722746548901655], [1439550144.0, 53460000.0, 3.6793669699562908], [1439551872.0, 53480000.0, 3.6864702564738114], [1439553600.0, 53500000.0, 3.6935845271874985], [1439555328.0, 53520000.0, 3.7007097948520133], [1439557056.0, 53540000.0, 3.7078460722318933], [1439558784.0, 53560000.0, 3.714993372101579], [1439560512.0, 53580000.0, 3.722151707245403], [1439562240.0, 53600000.0, 3.7293210904576006], [1439563968.0, 53620000.0, 3.736501534542301], [1439565696.0, 53640000.0, 3.7436930523135494], [1439567424.0, 53660000.0, 3.750895656595299], [1439569152.0, 53680000.0, 3.758109360221423], [1439570880.0, 53700000.0, 3.7653341760357013], [1439572608.0, 53720000.0, 3.772570116891845], [1439574336.0, 53740000.0, 3.7798171956534934], [1439576064.0, 53760000.0, 3.7870754251942143], [1439577792.0, 53780000.0, 3.7943448183975024], [1439579520.0, 53800000.0, 3.8016253881568], [1439581248.0, 53820000.0, 3.8089171473754853], [1439582976.0, 53840000.0, 3.8162201089668883], [1439584704.0, 53860000.0, 3.823534285854279], [1439586432.0, 53880000.0, 3.830859690970886], [1439588160.0, 53900000.0, 3.838196337259899], [1439589888.0, 53920000.0, 3.845544237674464], [1439591616.0, 53940000.0, 3.852903405177688], [1439593344.0, 53960000.0, 3.8602738527426554], [1439595072.0, 53980000.0, 3.8676555933524153], [1439596800.0, 54000000.0, 3.875048640000002], [1439598528.0, 54020000.0, 3.882453005688417], [1439600256.0, 54040000.0, 3.8898687034306563], 
[1439601984.0, 54060000.0, 3.8972957462497035], [1439603712.0, 54080000.0, 3.9047341471785297], [1439605440.0, 54100000.0, 3.9121839192601007], [1439607168.0, 54120000.0, 3.919645075547384], [1439608896.0, 54140000.0, 3.9271176291033534], [1439610624.0, 54160000.0, 3.9346015930009868], [1439612352.0, 54180000.0, 3.942096980323266], [1439614080.0, 54200000.0, 3.9496038041632], [1439615808.0, 54220000.0, 3.9571220776238087], [1439617536.0, 54240000.0, 3.96465181381814], [1439619264.0, 54260000.0, 3.9721930258692577], [1439620992.0, 54280000.0, 3.979745726910266], [1439622720.0, 54300000.0, 3.9873099300842987], [1439624448.0, 54320000.0, 3.994885648544532], [1439626176.0, 54340000.0, 4.002472895454172], [1439627904.0, 54360000.0, 4.0100716839864825], [1439629632.0, 54380000.0, 4.01768202732477], [1439631360.0, 54400000.0, 4.025303938662401], [1439633088.0, 54420000.0, 4.032937431202788], [1439634816.0, 54440000.0, 4.040582518159413], [1439636544.0, 54460000.0, 4.048239212755818], [1439638272.0, 54480000.0, 4.055907528225622], [1439640000.0, 54500000.0, 4.063587477812501], [1439641728.0, 54520000.0, 4.07127907477022], [1439643456.0, 54540000.0, 4.078982332362621], [1439645184.0, 54560000.0, 4.08669726386363], [1439646912.0, 54580000.0, 4.094423882557253], [1439648640.0, 54600000.0, 4.1021622017376], [1439650368.0, 54620000.0, 4.109912234708869], [1439652096.0, 54640000.0, 4.117673994785361], [1439653824.0, 54660000.0, 4.1254474952914695], [1439655552.0, 54680000.0, 4.133232749561709], [1439657280.0, 54700000.0, 4.1410297709406985], [1439659008.0, 54720000.0, 4.148838572783175], [1439660736.0, 54740000.0, 4.156659168453984], [1439662464.0, 54760000.0, 4.1644915713281], [1439664192.0, 54780000.0, 4.172335794790629], [1439665920.0, 54800000.0, 4.180191852236802], [1439667648.0, 54820000.0, 4.188059757071975], [1439669376.0, 54840000.0, 4.195939522711655], [1439671104.0, 54860000.0, 4.203831162581485], [1439672832.0, 54880000.0, 4.2117346901172565], [1439674560.0, 54900000.0, 4.219650118764901], [1439676288.0, 54920000.0, 4.227577461980512], [1439678016.0, 54940000.0, 4.235516733230336], [1439679744.0, 54960000.0, 4.243467945990784], [1439681472.0, 54980000.0, 4.251431113748425], [1439683200.0, 55000000.0, 4.25940625], [1439684928.0, 55020000.0, 4.2673933682524225], [1439686656.0, 55040000.0, 4.275392482022783], [1439688384.0, 55060000.0, 4.283403604838353], [1439690112.0, 55080000.0, 4.291426750236577], [1439691840.0, 55100000.0, 4.2994619317651], [1439693568.0, 55120000.0, 4.30750916298175], [1439695296.0, 55140000.0, 4.315568457454563], [1439697024.0, 55160000.0, 4.3236398287617535], [1439698752.0, 55180000.0, 4.331723290491753], [1439700480.0, 55200000.0, 4.339818856243199], [1439702208.0, 55220000.0, 4.347926539624939], [1439703936.0, 55240000.0, 4.356046354256027], [1439705664.0, 55260000.0, 4.364178313765745], [1439707392.0, 55280000.0, 4.3723224317935925], [1439709120.0, 55300000.0, 4.380478721989301], [1439710848.0, 55320000.0, 4.388647198012818], [1439712576.0, 55340000.0, 4.396827873534338], [1439714304.0, 55360000.0, 4.405020762234288], [1439716032.0, 55380000.0, 4.413225877803339], [1439717760.0, 55400000.0, 4.4214432339424], [1439719488.0, 55420000.0, 4.429672844362635], [1439721216.0, 55440000.0, 4.437914722785459], [1439722944.0, 55460000.0, 4.4461688829425485], [1439724672.0, 55480000.0, 4.454435338575829], [1439726400.0, 55500000.0, 4.4627141034374995], [1439728128.0, 55520000.0, 4.471005191290027], [1439729856.0, 55540000.0, 4.479308615906151], [1439731584.0, 55560000.0, 
4.487624391068875], [1439733312.0, 55580000.0, 4.4959525305715], [1439735040.0, 55600000.0, 4.5042930482175985], [1439736768.0, 55620000.0, 4.512645957821038], [1439738496.0, 55640000.0, 4.521011273205966], [1439740224.0, 55660000.0, 4.529389008206835], [1439741952.0, 55680000.0, 4.537779176668397], [1439743680.0, 55700000.0, 4.546181792445701], [1439745408.0, 55720000.0, 4.554596869404102], [1439747136.0, 55740000.0, 4.563024421419271], [1439748864.0, 55760000.0, 4.571464462377187], [1439750592.0, 55780000.0, 4.57991700617416], [1439752320.0, 55800000.0, 4.588382066716799], [1439754048.0, 55820000.0, 4.596859657922062], [1439755776.0, 55840000.0, 4.60534979371722], [1439757504.0, 55860000.0, 4.613852488039895], [1439759232.0, 55880000.0, 4.622367754838023], [1439760960.0, 55900000.0, 4.630895608069899], [1439762688.0, 55920000.0, 4.639436061704156], [1439764416.0, 55940000.0, 4.647989129719786], [1439766144.0, 55960000.0, 4.6565548261061105], [1439767872.0, 55980000.0, 4.665133164862832], [1439769600.0, 56000000.0, 4.673724159999999], [1439771328.0, 56020000.0, 4.682327825538033], [1439773056.0, 56040000.0, 4.690944175507713], [1439774784.0, 56060000.0, 4.699573223950199], [1439776512.0, 56080000.0, 4.708214984917023], [1439778240.0, 56100000.0, 4.716869472470102], [1439779968.0, 56120000.0, 4.725536700681721], [1439781696.0, 56140000.0, 4.7342166836345685], [1439783424.0, 56160000.0, 4.742909435421718], [1439785152.0, 56180000.0, 4.751614970146644], [1439786880.0, 56200000.0, 4.760333301923201], [1439788608.0, 56220000.0, 4.769064444875665], [1439790336.0, 56240000.0, 4.777808413138713], [1439792064.0, 56260000.0, 4.786565220857435], [1439793792.0, 56280000.0, 4.795334882187323], [1439795520.0, 56300000.0, 4.8041174112943], [1439797248.0, 56320000.0, 4.812912822354705], [1439798976.0, 56340000.0, 4.821721129555309], [1439800704.0, 56360000.0, 4.8305423470933], [1439802432.0, 56380000.0, 4.839376489176306], [1439804160.0, 56400000.0, 4.848223570022398], [1439805888.0, 56420000.0, 4.857083603860085], [1439807616.0, 56440000.0, 4.865956604928308], [1439809344.0, 56460000.0, 4.874842587476475], [1439811072.0, 56480000.0, 4.883741565764436], [1439812800.0, 56500000.0, 4.892653554062501], [1439814528.0, 56520000.0, 4.901578566651437], [1439816256.0, 56540000.0, 4.910516617822477], [1439817984.0, 56560000.0, 4.919467721877322], [1439819712.0, 56580000.0, 4.928431893128152], [1439821440.0, 56600000.0, 4.937409145897601], [1439823168.0, 56620000.0, 4.946399494518805], [1439824896.0, 56640000.0, 4.955402953335373], [1439826624.0, 56660000.0, 4.964419536701406], [1439828352.0, 56680000.0, 4.973449258981486], [1439830080.0, 56700000.0, 4.982492134550699], [1439831808.0, 56720000.0, 4.991548177794629], [1439833536.0, 56740000.0, 5.000617403109359], [1439835264.0, 56760000.0, 5.009699824901478], [1439836992.0, 56780000.0, 5.0187954575880855], [1439838720.0, 56800000.0, 5.027904315596799], [1439840448.0, 56820000.0, 5.037026413365752], [1439842176.0, 56840000.0, 5.046161765343592], [1439843904.0, 56860000.0, 5.055310385989502], [1439845632.0, 56880000.0, 5.06447228977319], [1439847360.0, 56900000.0, 5.0736474911749], [1439849088.0, 56920000.0, 5.082836004685409], [1439850816.0, 56940000.0, 5.092037844806032], [1439852544.0, 56960000.0, 5.101253026048638], [1439854272.0, 56980000.0, 5.110481562935641], [1439856000.0, 57000000.0, 5.119723470000001], [1439857728.0, 57020000.0, 5.12897876178524], [1439859456.0, 57040000.0, 5.138247452845439], [1439861184.0, 57060000.0, 5.147529557745249], [1439862912.0, 
57080000.0, 5.156825091059874], [1439864640.0, 57100000.0, 5.1661340673751], [1439866368.0, 57120000.0, 5.175456501287288], [1439868096.0, 57140000.0, 5.18479240740338], [1439869824.0, 57160000.0, 5.19414180034089], [1439871552.0, 57180000.0, 5.20350469472793], [1439873280.0, 57200000.0, 5.2128811052031985], [1439875008.0, 57220000.0, 5.222271046415996], [1439876736.0, 57240000.0, 5.231674533026204], [1439878464.0, 57260000.0, 5.241091579704321], [1439880192.0, 57280000.0, 5.250522201131449], [1439881920.0, 57300000.0, 5.2599664119993], [1439883648.0, 57320000.0, 5.2694242270101945], [1439885376.0, 57340000.0, 5.278895660877075], [1439887104.0, 57360000.0, 5.288380728323505], [1439888832.0, 57380000.0, 5.297879444083677], [1439890560.0, 57400000.0, 5.3073918229024], [1439892288.0, 57420000.0, 5.316917879535131], [1439894016.0, 57440000.0, 5.326457628747955], [1439895744.0, 57460000.0, 5.336011085317605], [1439897472.0, 57480000.0, 5.345578264031444], [1439899200.0, 57500000.0, 5.3551591796875], [1439900928.0, 57520000.0, 5.364753847094444], [1439902656.0, 57540000.0, 5.374362281071603], [1439904384.0, 57560000.0, 5.383984496448974], [1439906112.0, 57580000.0, 5.393620508067196], [1439907840.0, 57600000.0, 5.4032703307776], [1439909568.0, 57620000.0, 5.412933979442172], [1439911296.0, 57640000.0, 5.422611468933583], [1439913024.0, 57660000.0, 5.432302814135173], [1439914752.0, 57680000.0, 5.442008029940973], [1439916480.0, 57700000.0, 5.451727131255698], [1439918208.0, 57720000.0, 5.461460132994759], [1439919936.0, 57740000.0, 5.471207050084247], [1439921664.0, 57760000.0, 5.480967897460964], [1439923392.0, 57780000.0, 5.490742690072413], [1439925120.0, 57800000.0, 5.500531442876801], [1439926848.0, 57820000.0, 5.510334170843039], [1439928576.0, 57840000.0, 5.5201508889507584], [1439930304.0, 57860000.0, 5.529981612190308], [1439932032.0, 57880000.0, 5.53982635556276], [1439933760.0, 57900000.0, 5.5496851340799], [1439935488.0, 57920000.0, 5.559557962764255], [1439937216.0, 57940000.0, 5.569444856649079], [1439938944.0, 57960000.0, 5.579345830778369], [1439940672.0, 57980000.0, 5.589260900206848], [1439942400.0, 58000000.0, 5.59919008], [1439944128.0, 58020000.0, 5.6091333852340455], [1439945856.0, 58040000.0, 5.619090830995971], [1439947584.0, 58060000.0, 5.629062432383496], [1439949312.0, 58080000.0, 5.63904820450512], [1439951040.0, 58100000.0, 5.649048162480098], [1439952768.0, 58120000.0, 5.659062321438458], [1439954496.0, 58140000.0, 5.669090696520986], [1439956224.0, 58160000.0, 5.679133302879255], [1439957952.0, 58180000.0, 5.689190155675615], [1439959680.0, 58200000.0, 5.6992612700832], [1439961408.0, 58220000.0, 5.709346661285922], [1439963136.0, 58240000.0, 5.719446344478491], [1439964864.0, 58260000.0, 5.729560334866408], [1439966592.0, 58280000.0, 5.739688647665981], [1439968320.0, 58300000.0, 5.7498312981043], [1439970048.0, 58320000.0, 5.759988301419282], [1439971776.0, 58340000.0, 5.770159672859641], [1439973504.0, 58360000.0, 5.780345427684916], [1439975232.0, 58380000.0, 5.790545581165444], [1439976960.0, 58400000.0, 5.800760148582401], [1439978688.0, 58420000.0, 5.810989145227776], [1439980416.0, 58440000.0, 5.821232586404405], [1439982144.0, 58460000.0, 5.831490487425932], [1439983872.0, 58480000.0, 5.84176286361685], [1439985600.0, 58500000.0, 5.8520497303124985], [1439987328.0, 58520000.0, 5.862351102859053], [1439989056.0, 58540000.0, 5.8726669966135345], [1439990784.0, 58560000.0, 5.882997426943818], [1439992512.0, 58580000.0, 5.8933424092286435], [1439994240.0, 
58600000.0, 5.903701958857601], [1439995968.0, 58620000.0, 5.9140760912311405], [1439997696.0, 58640000.0, 5.924464821760588], [1439999424.0, 58660000.0, 5.934868165868139], [1440001152.0, 58680000.0, 5.945286138986864], [1440002880.0, 58700000.0, 5.9557187565607], [1440004608.0, 58720000.0, 5.966166034044484], [1440006336.0, 58740000.0, 5.976627986903933], [1440008064.0, 58760000.0, 5.987104630615655], [1440009792.0, 58780000.0, 5.997595980667143], [1440011520.0, 58800000.0, 6.008102052556799], [1440013248.0, 58820000.0, 6.018622861793925], [1440014976.0, 58840000.0, 6.02915842389873], [1440016704.0, 58860000.0, 6.0397087544023185], [1440018432.0, 58880000.0, 6.050273868846727], [1440020160.0, 58900000.0, 6.060853782784898], [1440021888.0, 58920000.0, 6.071448511780704], [1440023616.0, 58940000.0, 6.082058071408929], [1440025344.0, 58960000.0, 6.092682477255295], [1440027072.0, 58980000.0, 6.103321744916454], [1440028800.0, 59000000.0, 6.113975890000002], [1440030528.0, 59020000.0, 6.124644928124456], [1440032256.0, 59040000.0, 6.1353288749192965], [1440033984.0, 59060000.0, 6.1460277460249415], [1440035712.0, 59080000.0, 6.15674155709277], [1440037440.0, 59100000.0, 6.1674703237851], [1440039168.0, 59120000.0, 6.178214061775225], [1440040896.0, 59140000.0, 6.188972786747392], [1440042624.0, 59160000.0, 6.199746514396828], [1440044352.0, 59180000.0, 6.2105352604297055], [1440046080.0, 59200000.0, 6.221339040563199], [1440047808.0, 59220000.0, 6.232157870525447], [1440049536.0, 59240000.0, 6.242991766055581], [1440051264.0, 59260000.0, 6.253840742903698], [1440052992.0, 59280000.0, 6.264704816830906], [1440054720.0, 59300000.0, 6.275584003609299], [1440056448.0, 59320000.0, 6.286478319021972], [1440058176.0, 59340000.0, 6.297387778863012], [1440059904.0, 59360000.0, 6.308312398937522], [1440061632.0, 59380000.0, 6.319252195061609], [1440063360.0, 59400000.0, 6.330207183062402], [1440065088.0, 59420000.0, 6.341177378778029], [1440066816.0, 59440000.0, 6.352162798057651], [1440068544.0, 59460000.0, 6.363163456761457], [1440070272.0, 59480000.0, 6.374179370760662], [1440072000.0, 59500000.0, 6.385210555937501], [1440073728.0, 59520000.0, 6.39625702818526], [1440075456.0, 59540000.0, 6.40731880340826], [1440077184.0, 59560000.0, 6.4183958975218705], [1440078912.0, 59580000.0, 6.4294883264524945], [1440080640.0, 59600000.0, 6.440596106137599], [1440082368.0, 59620000.0, 6.451719252525708], [1440084096.0, 59640000.0, 6.462857781576401], [1440085824.0, 59660000.0, 6.474011709260309], [1440087552.0, 59680000.0, 6.485181051559149], [1440089280.0, 59700000.0, 6.496365824465699], [1440091008.0, 59720000.0, 6.507566043983815], [1440092736.0, 59740000.0, 6.518781726128424], [1440094464.0, 59760000.0, 6.530012886925541], [1440096192.0, 59780000.0, 6.54125954241227], [1440097920.0, 59800000.0, 6.552521708636802], [1440099648.0, 59820000.0, 6.563799401658415], [1440101376.0, 59840000.0, 6.575092637547495], [1440103104.0, 59860000.0, 6.586401432385524], [1440104832.0, 59880000.0, 6.5977258022650975], [1440106560.0, 59900000.0, 6.6090657632899], [1440108288.0, 59920000.0, 6.620421331574752], [1440110016.0, 59940000.0, 6.631792523245575], [1440111744.0, 59960000.0, 6.643179354439425], [1440113472.0, 59980000.0, 6.6545818413044655], [1440115200.0, 60000000.0, 6.666], [1440116928.0, 60020000.0, 6.677433846696464], [1440118656.0, 60040000.0, 6.688883397575423], [1440120384.0, 60060000.0, 6.7003486688295935], [1440122112.0, 60080000.0, 6.711829676662816], [1440123840.0, 60100000.0, 6.723326437290099], 
[1440125568.0, 60120000.0, 6.73483896693759], [1440127296.0, 60140000.0, 6.746367281842604], [1440129024.0, 60160000.0, 6.757911398253592], [1440130752.0, 60180000.0, 6.769471332430193], [1440132480.0, 60200000.0, 6.781047100643198], [1440134208.0, 60220000.0, 6.792638719174579], [1440135936.0, 60240000.0, 6.804246204317467], [1440137664.0, 60260000.0, 6.815869572376185], [1440139392.0, 60280000.0, 6.8275088396662325], [1440141120.0, 60300000.0, 6.839164022514301], [1440142848.0, 60320000.0, 6.850835137258258], [1440144576.0, 60340000.0, 6.862522200247178], [1440146304.0, 60360000.0, 6.874225227841327], [1440148032.0, 60380000.0, 6.885944236412181], [1440149760.0, 60400000.0, 6.8976792423424], [1440151488.0, 60420000.0, 6.909430262025874], [1440153216.0, 60440000.0, 6.921197311867698], [1440154944.0, 60460000.0, 6.932980408284188], [1440156672.0, 60480000.0, 6.944779567702868], [1440158400.0, 60500000.0, 6.956594806562499], [1440160128.0, 60520000.0, 6.968426141313065], [1440161856.0, 60540000.0, 6.980273588415791], [1440163584.0, 60560000.0, 6.992137164343116], [1440165312.0, 60580000.0, 7.004016885578739], [1440167040.0, 60600000.0, 7.015912768617598], [1440168768.0, 60620000.0, 7.027824829965878], [1440170496.0, 60640000.0, 7.0397530861410065], [1440172224.0, 60660000.0, 7.0516975536716755], [1440173952.0, 60680000.0, 7.063658249097834], [1440175680.0, 60700000.0, 7.075635188970702], [1440177408.0, 60720000.0, 7.087628389852742], [1440179136.0, 60740000.0, 7.099637868317709], [1440180864.0, 60760000.0, 7.111663640950627], [1440182592.0, 60780000.0, 7.1237057243478], [1440184320.0, 60800000.0, 7.1357641351168], [1440186048.0, 60820000.0, 7.147838889876502], [1440187776.0, 60840000.0, 7.159930005257061], [1440189504.0, 60860000.0, 7.172037497899936], [1440191232.0, 60880000.0, 7.184161384457863], [1440192960.0, 60900000.0, 7.1963016815949], [1440194688.0, 60920000.0, 7.2084584059863985], [1440196416.0, 60940000.0, 7.220631574319027], [1440198144.0, 60960000.0, 7.232821203290752], [1440199872.0, 60980000.0, 7.2450273096108715], [1440201600.0, 61000000.0, 7.257249909999998], [1440203328.0, 61020000.0, 7.269489021190074], [1440205056.0, 61040000.0, 7.281744659924354], [1440206784.0, 61060000.0, 7.294016842957438], [1440208512.0, 61080000.0, 7.306305587055263], [1440210240.0, 61100000.0, 7.318610908995103], [1440211968.0, 61120000.0, 7.330932825565561], [1440213696.0, 61140000.0, 7.343271353566609], [1440215424.0, 61160000.0, 7.355626509809558], [1440217152.0, 61180000.0, 7.367998311117085], [1440218880.0, 61200000.0, 7.380386774323202], [1440220608.0, 61220000.0, 7.392791916273306], [1440222336.0, 61240000.0, 7.405213753824152], [1440224064.0, 61260000.0, 7.417652303843877], [1440225792.0, 61280000.0, 7.430107583211963], [1440227520.0, 61300000.0, 7.442579608819299], [1440229248.0, 61320000.0, 7.455068397568144], [1440230976.0, 61340000.0, 7.467573966372151], [1440232704.0, 61360000.0, 7.4800963321563385], [1440234432.0, 61380000.0, 7.492635511857147], [1440236160.0, 61400000.0, 7.505191522422399], [1440237888.0, 61420000.0, 7.517764380811325], [1440239616.0, 61440000.0, 7.530354103994549], [1440241344.0, 61460000.0, 7.542960708954115], [1440243072.0, 61480000.0, 7.555584212683475], [1440244800.0, 61500000.0, 7.568224632187502], [1440246528.0, 61520000.0, 7.580881984482476], [1440248256.0, 61540000.0, 7.593556286596116], [1440249984.0, 61560000.0, 7.606247555567562], [1440251712.0, 61580000.0, 7.618955808447392], [1440253440.0, 61600000.0, 7.6316810622976], [1440255168.0, 61620000.0, 
7.644423334191643], [1440256896.0, 61640000.0, 7.657182641214412], [1440258624.0, 61660000.0, 7.669959000462247], [1440260352.0, 61680000.0, 7.682752429042926], [1440262080.0, 61700000.0, 7.695562944075699], [1440263808.0, 61720000.0, 7.708390562691267], [1440265536.0, 61740000.0, 7.721235302031801], [1440267264.0, 61760000.0, 7.734097179250919], [1440268992.0, 61780000.0, 7.746976211513726], [1440270720.0, 61800000.0, 7.7598724159967984], [1440272448.0, 61820000.0, 7.772785809888194], [1440274176.0, 61840000.0, 7.785716410387433], [1440275904.0, 61860000.0, 7.798664234705542], [1440277632.0, 61880000.0, 7.81162930006503], [1440279360.0, 61900000.0, 7.8246116236999015], [1440281088.0, 61920000.0, 7.8376112228556485], [1440282816.0, 61940000.0, 7.850628114789272], [1440284544.0, 61960000.0, 7.863662316769277], [1440286272.0, 61980000.0, 7.876713846075683], [1440288000.0, 62000000.0, 7.889782720000001], [1440289728.0, 62020000.0, 7.902868955845279], [1440291456.0, 62040000.0, 7.915972570926079], [1440293184.0, 62060000.0, 7.929093582568491], [1440294912.0, 62080000.0, 7.942232008110112], [1440296640.0, 62100000.0, 7.955387864900101], [1440298368.0, 62120000.0, 7.968561170299126], [1440300096.0, 62140000.0, 7.981751941679421], [1440301824.0, 62160000.0, 7.9949601964247305], [1440303552.0, 62180000.0, 8.00818595193037], [1440305280.0, 62200000.0, 8.021429225603198], [1440307008.0, 62220000.0, 8.034690034861637], [1440308736.0, 62240000.0, 8.047968397135644], [1440310464.0, 62260000.0, 8.061264329866761], [1440312192.0, 62280000.0, 8.074577850508089], [1440313920.0, 62300000.0, 8.087908976524302], [1440315648.0, 62320000.0, 8.101257725391635], [1440317376.0, 62340000.0, 8.114624114597914], [1440319104.0, 62360000.0, 8.128008161642544], [1440320832.0, 62380000.0, 8.141409884036518], [1440322560.0, 62400000.0, 8.154829299302401], [1440324288.0, 62420000.0, 8.168266424974371], [1440326016.0, 62440000.0, 8.181721278598195], [1440327744.0, 62460000.0, 8.195193877731246], [1440329472.0, 62480000.0, 8.208684239942485], [1440331200.0, 62500000.0, 8.2221923828125], [1440332928.0, 62520000.0, 8.235718323933485], [1440334656.0, 62540000.0, 8.249262080909244], [1440336384.0, 62560000.0, 8.262823671355212], [1440338112.0, 62580000.0, 8.276403112898437], [1440339840.0, 62600000.0, 8.2900004231776], [1440341568.0, 62620000.0, 8.303615619843011], [1440343296.0, 62640000.0, 8.317248720556623], [1440345024.0, 62660000.0, 8.330899742992013], [1440346752.0, 62680000.0, 8.344568704834412], [1440348480.0, 62700000.0, 8.358255623780696], [1440350208.0, 62720000.0, 8.371960517539401], [1440351936.0, 62740000.0, 8.385683403830686], [1440353664.0, 62760000.0, 8.399424300386405], [1440355392.0, 62780000.0, 8.413183224950052], [1440357120.0, 62800000.0, 8.4269601952768], [1440358848.0, 62820000.0, 8.440755229133476], [1440360576.0, 62840000.0, 8.454568344298599], [1440362304.0, 62860000.0, 8.468399558562348], [1440364032.0, 62880000.0, 8.482248889726602], [1440365760.0, 62900000.0, 8.4961163556049], [1440367488.0, 62920000.0, 8.510001974022492], [1440369216.0, 62940000.0, 8.523905762816318], [1440370944.0, 62960000.0, 8.53782773983501], [1440372672.0, 62980000.0, 8.55176792293889], [1440374400.0, 63000000.0, 8.565726329999997], [1440376128.0, 63020000.0, 8.579702978902086], [1440377856.0, 63040000.0, 8.593697887540609], [1440379584.0, 63060000.0, 8.607711073822738], [1440381312.0, 63080000.0, 8.621742555667359], [1440383040.0, 63100000.0, 8.635792351005097], [1440384768.0, 63120000.0, 8.649860477778299], [1440386496.0, 
63140000.0, 8.663946953941027], [1440388224.0, 63160000.0, 8.678051797459096], [1440389952.0, 63180000.0, 8.692175026310055], [1440391680.0, 63200000.0, 8.706316658483201], [1440393408.0, 63220000.0, 8.720476711979563], [1440395136.0, 63240000.0, 8.734655204811931], [1440396864.0, 63260000.0, 8.748852155004844], [1440398592.0, 63280000.0, 8.76306758059462], [1440400320.0, 63300000.0, 8.777301499629301], [1440402048.0, 63320000.0, 8.791553930168723], [1440403776.0, 63340000.0, 8.805824890284482], [1440405504.0, 63360000.0, 8.820114398059957], [1440407232.0, 63380000.0, 8.834422471590283], [1440408960.0, 63400000.0, 8.848749128982401], [1440410688.0, 63420000.0, 8.863094388355016], [1440412416.0, 63440000.0, 8.877458267838644], [1440414144.0, 63460000.0, 8.891840785575573], [1440415872.0, 63480000.0, 8.90624195971989], [1440417600.0, 63500000.0, 8.9206618084375], [1440419328.0, 63520000.0, 8.935100349906094], [1440421056.0, 63540000.0, 8.949557602315172], [1440422784.0, 63560000.0, 8.96403358386606], [1440424512.0, 63580000.0, 8.978528312771884], [1440426240.0, 63600000.0, 8.993041807257603], [1440427968.0, 63620000.0, 9.00757408555998], [1440429696.0, 63640000.0, 9.02212516592763], [1440431424.0, 63660000.0, 9.036695066620979], [1440433152.0, 63680000.0, 9.051283805912304], [1440434880.0, 63700000.0, 9.0658914020857], [1440436608.0, 63720000.0, 9.080517873437124], [1440438336.0, 63740000.0, 9.095163238274372], [1440440064.0, 63760000.0, 9.109827514917095], [1440441792.0, 63780000.0, 9.124510721696783], [1440443520.0, 63800000.0, 9.139212876956798], [1440445248.0, 63820000.0, 9.153933999052365], [1440446976.0, 63840000.0, 9.168674106350569], [1440448704.0, 63860000.0, 9.18343321723036], [1440450432.0, 63880000.0, 9.19821135008257], [1440452160.0, 63900000.0, 9.213008523309899], [1440453888.0, 63920000.0, 9.227824755326948], [1440455616.0, 63940000.0, 9.24266006456017], [1440457344.0, 63960000.0, 9.257514469447935], [1440459072.0, 63980000.0, 9.272387988440492], [1440460800.0, 64000000.0, 9.287280640000002], [1440462528.0, 64020000.0, 9.302192442600496], [1440464256.0, 64040000.0, 9.317123414727936], [1440465984.0, 64060000.0, 9.332073574880182], [1440467712.0, 64080000.0, 9.34704294156701], [1440469440.0, 64100000.0, 9.3620315333101], [1440471168.0, 64120000.0, 9.377039368643063], [1440472896.0, 64140000.0, 9.39206646611143], [1440474624.0, 64160000.0, 9.407112844272666], [1440476352.0, 64180000.0, 9.422178521696145], [1440478080.0, 64200000.0, 9.437263516963199], [1440479808.0, 64220000.0, 9.45236784866709], [1440481536.0, 64240000.0, 9.467491535413023], [1440483264.0, 64260000.0, 9.482634595818135], [1440484992.0, 64280000.0, 9.497797048511545], [1440486720.0, 64300000.0, 9.5129789121343], [1440488448.0, 64320000.0, 9.528180205339416], [1440490176.0, 64340000.0, 9.543400946791854], [1440491904.0, 64360000.0, 9.558641155168562], [1440493632.0, 64380000.0, 9.573900849158449], [1440495360.0, 64400000.0, 9.589180047462403], [1440497088.0, 64420000.0, 9.604478768793268], [1440498816.0, 64440000.0, 9.619797031875892], [1440500544.0, 64460000.0, 9.635134855447095], [1440502272.0, 64480000.0, 9.650492258255703], [1440504000.0, 64500000.0, 9.665869259062502], [1440505728.0, 64520000.0, 9.681265876640298], [1440507456.0, 64540000.0, 9.6966821297739], [1440509184.0, 64560000.0, 9.71211803726011], [1440510912.0, 64580000.0, 9.727573617907733], [1440512640.0, 64600000.0, 9.7430488905376], [1440514368.0, 64620000.0, 9.758543873982546], [1440516096.0, 64640000.0, 9.77405858708744], [1440517824.0, 
64660000.0, 9.789593048709152], [1440519552.0, 64680000.0, 9.80514727771659], [1440521280.0, 64700000.0, 9.820721292990699], [1440523008.0, 64720000.0, 9.836315113424456], [1440524736.0, 64740000.0, 9.851928757922865], [1440526464.0, 64760000.0, 9.867562245402981], [1440528192.0, 64780000.0, 9.883215594793908], [1440529920.0, 64800000.0, 9.898888825036801], [1440531648.0, 64820000.0, 9.914581955084858], [1440533376.0, 64840000.0, 9.930295003903336], [1440535104.0, 64860000.0, 9.946027990469563], [1440536832.0, 64880000.0, 9.961780933772937], [1440538560.0, 64900000.0, 9.977553852814902], [1440540288.0, 64920000.0, 9.99334676660899], [1440542016.0, 64940000.0, 10.009159694180815], [1440543744.0, 64960000.0, 10.024992654568065], [1440545472.0, 64980000.0, 10.040845666820505], [1440547200.0, 65000000.0, 10.05671875], [1440548928.0, 65020000.0, 10.072611923180503], [1440550656.0, 65040000.0, 10.088525205448065], [1440552384.0, 65060000.0, 10.10445861590083], [1440554112.0, 65080000.0, 10.120412173649056], [1440555840.0, 65100000.0, 10.136385897815098], [1440557568.0, 65120000.0, 10.15237980753343], [1440559296.0, 65140000.0, 10.168393921950642], [1440561024.0, 65160000.0, 10.184428260225433], [1440562752.0, 65180000.0, 10.200482841528634], [1440564480.0, 65200000.0, 10.216557685043199], [1440566208.0, 65220000.0, 10.23265280996422], [1440567936.0, 65240000.0, 10.248768235498906], [1440569664.0, 65260000.0, 10.264903980866624], [1440571392.0, 65280000.0, 10.281060065298872], [1440573120.0, 65300000.0, 10.297236508039303], [1440574848.0, 65320000.0, 10.313433328343699], [1440576576.0, 65340000.0, 10.32965054548002], [1440578304.0, 65360000.0, 10.345888178728368], [1440580032.0, 65380000.0, 10.362146247381023], [1440581760.0, 65400000.0, 10.378424770742402], [1440583488.0, 65420000.0, 10.394723768129113], [1440585216.0, 65440000.0, 10.411043258869935], [1440586944.0, 65460000.0, 10.42738326230583], [1440588672.0, 65480000.0, 10.44374379778991], [1440590400.0, 65500000.0, 10.460124884687497], [1440592128.0, 65520000.0, 10.476526542376105], [1440593856.0, 65540000.0, 10.492948790245432], [1440595584.0, 65560000.0, 10.509391647697358], [1440597312.0, 65580000.0, 10.525855134145981], [1440599040.0, 65600000.0, 10.542339269017596], [1440600768.0, 65620000.0, 10.55884407175072], [1440602496.0, 65640000.0, 10.575369561796046], [1440604224.0, 65660000.0, 10.591915758616517], [1440605952.0, 65680000.0, 10.608482681687274], [1440607680.0, 65700000.0, 10.625070350495703], [1440609408.0, 65720000.0, 10.641678784541382], [1440611136.0, 65740000.0, 10.658308003336149], [1440612864.0, 65760000.0, 10.674958026404067], [1440614592.0, 65780000.0, 10.691628873281442], [1440616320.0, 65800000.0, 10.708320563516802], [1440618048.0, 65820000.0, 10.725033116670941], [1440619776.0, 65840000.0, 10.741766552316903], [1440621504.0, 65860000.0, 10.758520890039977], [1440623232.0, 65880000.0, 10.775296149437704], [1440624960.0, 65900000.0, 10.7920923501199], [1440626688.0, 65920000.0, 10.808909511708638], [1440628416.0, 65940000.0, 10.825747653838265], [1440630144.0, 65960000.0, 10.84260679615539], [1440631872.0, 65980000.0, 10.85948695831891], [1440633600.0, 66000000.0, 10.876388159999998], [1440635328.0, 66020000.0, 10.893310420882116], [1440637056.0, 66040000.0, 10.910253760660991], [1440638784.0, 66060000.0, 10.927218199044678], [1440640512.0, 66080000.0, 10.944203755753502], [1440642240.0, 66100000.0, 10.961210450520104], [1440643968.0, 66120000.0, 10.978238303089404], [1440645696.0, 66140000.0, 10.995287333218648], 
[1440647424.0, 66160000.0, 11.012357560677398], [1440649152.0, 66180000.0, 11.029449005247525], [1440650880.0, 66200000.0, 11.046561686723201], [1440652608.0, 66220000.0, 11.063695624910945], [1440654336.0, 66240000.0, 11.08085083962959], [1440656064.0, 66260000.0, 11.098027350710314], [1440657792.0, 66280000.0, 11.115225177996605], [1440659520.0, 66300000.0, 11.132444341344298], [1440661248.0, 66320000.0, 11.149684860621585], [1440662976.0, 66340000.0, 11.16694675570899], [1440664704.0, 66360000.0, 11.18423004649938], [1440666432.0, 66380000.0, 11.201534752897988], [1440668160.0, 66400000.0, 11.218860894822397], [1440669888.0, 66420000.0, 11.236208492202566], [1440671616.0, 66440000.0, 11.253577564980791], [1440673344.0, 66460000.0, 11.270968133111754], [1440675072.0, 66480000.0, 11.288380216562514], [1440676800.0, 66500000.0, 11.305813835312502], [1440678528.0, 66520000.0, 11.323269009353517], [1440680256.0, 66540000.0, 11.340745758689758], [1440681984.0, 66560000.0, 11.358244103337803], [1440683712.0, 66580000.0, 11.375764063326631], [1440685440.0, 66600000.0, 11.393305658697601], [1440687168.0, 66620000.0, 11.410868909504483], [1440688896.0, 66640000.0, 11.428453835813452], [1440690624.0, 66660000.0, 11.446060457703087], [1440692352.0, 66680000.0, 11.463688795264366], [1440694080.0, 66700000.0, 11.481338868600698], [1440695808.0, 66720000.0, 11.499010697827908], [1440697536.0, 66740000.0, 11.51670430307424], [1440699264.0, 66760000.0, 11.534419704480358], [1440700992.0, 66780000.0, 11.552156922199366], [1440702720.0, 66800000.0, 11.569915976396798], [1440704448.0, 66820000.0, 11.587696887250635], [1440706176.0, 66840000.0, 11.605499674951272], [1440707904.0, 66860000.0, 11.623324359701583], [1440709632.0, 66880000.0, 11.64117096171687], [1440711360.0, 66900000.0, 11.659039501224905], [1440713088.0, 66920000.0, 11.676929998465887], [1440714816.0, 66940000.0, 11.694842473692512], [1440716544.0, 66960000.0, 11.712776947169917], [1440718272.0, 66980000.0, 11.730733439175722], [1440720000.0, 67000000.0, 11.748711970000004], [1440721728.0, 67020000.0, 11.766712559945319], [1440723456.0, 67040000.0, 11.784735229326719], [1440725184.0, 67060000.0, 11.802779998471733], [1440726912.0, 67080000.0, 11.820846887720355], [1440728640.0, 67100000.0, 11.838935917425099], [1440730368.0, 67120000.0, 11.857047107950965], [1440732096.0, 67140000.0, 11.875180479675462], [1440733824.0, 67160000.0, 11.89333605298857], [1440735552.0, 67180000.0, 11.91151384829281], [1440737280.0, 67200000.0, 11.929713886003197], [1440739008.0, 67220000.0, 11.947936186547278], [1440740736.0, 67240000.0, 11.966180770365083], [1440742464.0, 67260000.0, 11.984447657909202], [1440744192.0, 67280000.0, 12.002736869644728], [1440745920.0, 67300000.0, 12.021048426049305], [1440747648.0, 67320000.0, 12.039382347613076], [1440749376.0, 67340000.0, 12.057738654838753], [1440751104.0, 67360000.0, 12.076117368241585], [1440752832.0, 67380000.0, 12.09451850834936], [1440754560.0, 67400000.0, 12.1129420957024], [1440756288.0, 67420000.0, 12.13138815085361], [1440758016.0, 67440000.0, 12.149856694368435], [1440759744.0, 67460000.0, 12.168347746824887], [1440761472.0, 67480000.0, 12.186861328813526], [1440763200.0, 67500000.0, 12.2053974609375], [1440764928.0, 67520000.0, 12.223956163812524], [1440766656.0, 67540000.0, 12.24253745806688], [1440768384.0, 67560000.0, 12.261141364341455], [1440770112.0, 67580000.0, 12.279767903289676], [1440771840.0, 67600000.0, 12.2984170955776], [1440773568.0, 67620000.0, 12.31708896188385], [1440775296.0, 
67640000.0, 12.335783522899664], [1440777024.0, 67660000.0, 12.354500799328854], [1440778752.0, 67680000.0, 12.373240811887852], [1440780480.0, 67700000.0, 12.392003581305696], [1440782208.0, 67720000.0, 12.41078912832404], [1440783936.0, 67740000.0, 12.429597473697125], [1440785664.0, 67760000.0, 12.448428638191844], [1440787392.0, 67780000.0, 12.467282642587692], [1440789120.0, 67800000.0, 12.486159507676803], [1440790848.0, 67820000.0, 12.505059254263918], [1440792576.0, 67840000.0, 12.523981903166439], [1440794304.0, 67860000.0, 12.542927475214386], [1440796032.0, 67880000.0, 12.561895991250442], [1440797760.0, 67900000.0, 12.5808874721299], [1440799488.0, 67920000.0, 12.599901938720734], [1440801216.0, 67940000.0, 12.618939411903558], [1440802944.0, 67960000.0, 12.63799991257165], [1440804672.0, 67980000.0, 12.657083461630927], [1440806400.0, 68000000.0, 12.676190079999998], [1440808128.0, 68020000.0, 12.695319788610124], [1440809856.0, 68040000.0, 12.71447260840525], [1440811584.0, 68060000.0, 12.733648560341974], [1440813312.0, 68080000.0, 12.752847665389602], [1440815040.0, 68100000.0, 12.772069944530097], [1440816768.0, 68120000.0, 12.791315418758138], [1440818496.0, 68140000.0, 12.810584109081066], [1440820224.0, 68160000.0, 12.829876036518938], [1440821952.0, 68180000.0, 12.849191222104492], [1440823680.0, 68200000.0, 12.868529686883202], [1440825408.0, 68220000.0, 12.887891451913205], [1440827136.0, 68240000.0, 12.907276538265371], [1440828864.0, 68260000.0, 12.926684967023286], [1440830592.0, 68280000.0, 12.946116759283258], [1440832320.0, 68300000.0, 12.965571936154301], [1440834048.0, 68320000.0, 12.985050518758161], [1440835776.0, 68340000.0, 13.00455252822932], [1440837504.0, 68360000.0, 13.024077985714998], [1440839232.0, 68380000.0, 13.043626912375123], [1440840960.0, 68400000.0, 13.0631993293824], [1440842688.0, 68420000.0, 13.082795257922255], [1440844416.0, 68440000.0, 13.102414719192888], [1440846144.0, 68460000.0, 13.12205773440521], [1440847872.0, 68480000.0, 13.141724324782931], [1440849600.0, 68500000.0, 13.161414511562496], [1440851328.0, 68520000.0, 13.181128315993137], [1440853056.0, 68540000.0, 13.200865759336812], [1440854784.0, 68560000.0, 13.220626862868299], [1440856512.0, 68580000.0, 13.240411647875122], [1440858240.0, 68600000.0, 13.260220135657605], [1440859968.0, 68620000.0, 13.280052347528823], [1440861696.0, 68640000.0, 13.29990830481467], [1440863424.0, 68660000.0, 13.319788028853818], [1440865152.0, 68680000.0, 13.339691540997745], [1440866880.0, 68700000.0, 13.359618862610702], [1440868608.0, 68720000.0, 13.379570015069765], [1440870336.0, 68740000.0, 13.39954501976481], [1440872064.0, 68760000.0, 13.419543898098537], [1440873792.0, 68780000.0, 13.439566671486425], [1440875520.0, 68800000.0, 13.459613361356798], [1440877248.0, 68820000.0, 13.4796839891508], [1440878976.0, 68840000.0, 13.499778576322411], [1440880704.0, 68860000.0, 13.519897144338398], [1440882432.0, 68880000.0, 13.540039714678409], [1440884160.0, 68900000.0, 13.560206308834895], [1440885888.0, 68920000.0, 13.580396948313187], [1440887616.0, 68940000.0, 13.60061165463141], [1440889344.0, 68960000.0, 13.620850449320574], [1440891072.0, 68980000.0, 13.641113353924531], [1440892800.0, 69000000.0, 13.661400390000003], [1440894528.0, 69020000.0, 13.681711579116538], [1440896256.0, 69040000.0, 13.702046942856578], [1440897984.0, 69060000.0, 13.722406502815423], [1440899712.0, 69080000.0, 13.742790280601252], [1440901440.0, 69100000.0, 13.763198297835102], [1440903168.0, 69120000.0, 
13.783630576150903], [1440904896.0, 69140000.0, 13.804087137195472], [1440906624.0, 69160000.0, 13.824568002628508], [1440908352.0, 69180000.0, 13.845073194122588], [1440910080.0, 69200000.0, 13.865602733363199], [1440911808.0, 69220000.0, 13.886156642048729], [1440913536.0, 69240000.0, 13.906734941890461], [1440915264.0, 69260000.0, 13.927337654612579], [1440916992.0, 69280000.0, 13.947964801952185], [1440918720.0, 69300000.0, 13.968616405659295], [1440920448.0, 69320000.0, 13.989292487496854], [1440922176.0, 69340000.0, 14.009993069240693], [1440923904.0, 69360000.0, 14.0307181726796], [1440925632.0, 69380000.0, 14.051467819615288], [1440927360.0, 69400000.0, 14.072242031862405], [1440929088.0, 69420000.0, 14.093040831248508], [1440930816.0, 69440000.0, 14.11386423961413], [1440932544.0, 69460000.0, 14.134712278812737], [1440934272.0, 69480000.0, 14.155584970710745], [1440936000.0, 69500000.0, 14.176482337187505], [1440937728.0, 69520000.0, 14.197404400135337], [1440939456.0, 69540000.0, 14.218351181459537], [1440941184.0, 69560000.0, 14.239322703078352], [1440942912.0, 69580000.0, 14.260318986922973], [1440944640.0, 69600000.0, 14.281340054937598], [1440946368.0, 69620000.0, 14.302385929079387], [1440948096.0, 69640000.0, 14.323456631318482], [1440949824.0, 69660000.0, 14.34455218363799], [1440951552.0, 69680000.0, 14.365672608034027], [1440953280.0, 69700000.0, 14.386817926515693], [1440955008.0, 69720000.0, 14.407988161105099], [1440956736.0, 69740000.0, 14.429183333837305], [1440958464.0, 69760000.0, 14.450403466760422], [1440960192.0, 69780000.0, 14.471648581935549], [1440961920.0, 69800000.0, 14.492918701436803], [1440963648.0, 69820000.0, 14.514213847351296], [1440965376.0, 69840000.0, 14.535534041779176], [1440967104.0, 69860000.0, 14.556879306833608], [1440968832.0, 69880000.0, 14.578249664640778], [1440970560.0, 69900000.0, 14.599645137339904], [1440972288.0, 69920000.0, 14.621065747083232], [1440974016.0, 69940000.0, 14.642511516036055], [1440975744.0, 69960000.0, 14.663982466376707], [1440977472.0, 69980000.0, 14.685478620296546], [1440979200.0, 70000000.0, 14.707], [1440980928.0, 70020000.0, 14.72854662770454], [1440982656.0, 70040000.0, 14.750118525640701], [1440984384.0, 70060000.0, 14.771715716052075], [1440986112.0, 70080000.0, 14.793338221195295], [1440987840.0, 70100000.0, 14.814986063340095], [1440989568.0, 70120000.0, 14.836659264769269], [1440991296.0, 70140000.0, 14.85835784777868], [1440993024.0, 70160000.0, 14.88008183467727], [1440994752.0, 70180000.0, 14.90183124778707], [1440996480.0, 70200000.0, 14.923606109443195], [1440998208.0, 70220000.0, 14.94540644199386], [1440999936.0, 70240000.0, 14.967232267800346], [1441001664.0, 70260000.0, 14.989083609237063], [1441003392.0, 70280000.0, 15.010960488691511], [1441005120.0, 70300000.0, 15.032862928564308], [1441006848.0, 70320000.0, 15.054790951269139], [1441008576.0, 70340000.0, 15.076744579232859], [1441010304.0, 70360000.0, 15.098723834895408], [1441012032.0, 70380000.0, 15.120728740709863], [1441013760.0, 70400000.0, 15.142759319142398], [1441015488.0, 70420000.0, 15.164815592672355], [1441017216.0, 70440000.0, 15.186897583792174], [1441018944.0, 70460000.0, 15.209005315007472], [1441020672.0, 70480000.0, 15.231138808836947], [1441022400.0, 70500000.0, 15.253298087812498], [1441024128.0, 70520000.0, 15.275483174479142], [1441025856.0, 70540000.0, 15.297694091395073], [1441027584.0, 70560000.0, 15.319930861131596], [1441029312.0, 70580000.0, 15.342193506273219], [1441031040.0, 70600000.0, 15.364482049417594], 
[1441032768.0, 70620000.0, 15.38679651317556], [1441034496.0, 70640000.0, 15.40913692017109], [1441036224.0, 70660000.0, 15.431503293041352], [1441037952.0, 70680000.0, 15.453895654436714], [1441039680.0, 70700000.0, 15.476314027020706], [1441041408.0, 70720000.0, 15.498758433470025], [1441043136.0, 70740000.0, 15.521228896474588], [1441044864.0, 70760000.0, 15.543725438737503], [1441046592.0, 70780000.0, 15.566248082975081], [1441048320.0, 70800000.0, 15.5887968519168], [1441050048.0, 70820000.0, 15.611371768305382], [1441051776.0, 70840000.0, 15.633972854896744], [1441053504.0, 70860000.0, 15.656600134460016], [1441055232.0, 70880000.0, 15.679253629777543], [1441056960.0, 70900000.0, 15.7019333636449], [1441058688.0, 70920000.0, 15.724639358870874], [1441060416.0, 70940000.0, 15.747371638277507], [1441062144.0, 70960000.0, 15.770130224700031], [1441063872.0, 70980000.0, 15.792915140986953], [1441065600.0, 71000000.0, 15.815726409999998], [1441067328.0, 71020000.0, 15.838564054614157], [1441069056.0, 71040000.0, 15.861428097717633], [1441070784.0, 71060000.0, 15.884318562211918], [1441072512.0, 71080000.0, 15.907235471011742], [1441074240.0, 71100000.0, 15.930178847045104], [1441075968.0, 71120000.0, 15.953148713253242], [1441077696.0, 71140000.0, 15.97614509259069], [1441079424.0, 71160000.0, 15.999168008025235], [1441081152.0, 71180000.0, 16.022217482537965], [1441082880.0, 71200000.0, 16.045293539123204], [1441084608.0, 71220000.0, 16.06839620078858], [1441086336.0, 71240000.0, 16.091525490555032], [1441088064.0, 71260000.0, 16.11468143145676], [1441089792.0, 71280000.0, 16.137864046541246], [1441091520.0, 71300000.0, 16.1610733588693], [1441093248.0, 71320000.0, 16.184309391515026], [1441094976.0, 71340000.0, 16.20757216756583], [1441096704.0, 71360000.0, 16.230861710122422], [1441098432.0, 71380000.0, 16.254178042298825], [1441100160.0, 71400000.0, 16.277521187222398], [1441101888.0, 71420000.0, 16.300891168033807], [1441103616.0, 71440000.0, 16.32428800788703], [1441105344.0, 71460000.0, 16.347711729949395], [1441107072.0, 71480000.0, 16.371162357401552], [1441108800.0, 71500000.0, 16.394639913437505], [1441110528.0, 71520000.0, 16.418144421264554], [1441112256.0, 71540000.0, 16.441675904103395], [1441113984.0, 71560000.0, 16.46523438518804], [1441115712.0, 71580000.0, 16.48881988776587], [1441117440.0, 71600000.0, 16.5124324350976], [1441119168.0, 71620000.0, 16.53607205045732], [1441120896.0, 71640000.0, 16.559738757132486], [1441122624.0, 71660000.0, 16.58343257842393], [1441124352.0, 71680000.0, 16.607153537645807], [1441126080.0, 71700000.0, 16.630901658125698], [1441127808.0, 71720000.0, 16.654676963204547], [1441129536.0, 71740000.0, 16.678479476236685], [1441131264.0, 71760000.0, 16.702309220589797], [1441132992.0, 71780000.0, 16.726166219645005], [1441134720.0, 71800000.0, 16.750050496796792], [1441136448.0, 71820000.0, 16.773962075453074], [1441138176.0, 71840000.0, 16.797900979035116], [1441139904.0, 71860000.0, 16.821867230977624], [1441141632.0, 71880000.0, 16.84586085472871], [1441143360.0, 71900000.0, 16.869881873749904], [1441145088.0, 71920000.0, 16.89393031151613], [1441146816.0, 71940000.0, 16.91800619151575], [1441148544.0, 71960000.0, 16.94210953725056], [1441150272.0, 71980000.0, 16.96624037223576], [1441152000.0, 72000000.0, 16.99039872], [1441153728.0, 72020000.0, 17.014584604085357], [1441155456.0, 72040000.0, 17.038798048047365], [1441157184.0, 72060000.0, 17.06303907545497], [1441158912.0, 72080000.0, 17.08730770989059], [1441160640.0, 72100000.0, 
17.111603974950096], [1441162368.0, 72120000.0, 17.135927894242805], [1441164096.0, 72140000.0, 17.160279491391503], [1441165824.0, 72160000.0, 17.184658790032408], [1441167552.0, 72180000.0, 17.209065813815247], [1441169280.0, 72200000.0, 17.233500586403196], [1441171008.0, 72220000.0, 17.25796313147292], [1441172736.0, 72240000.0, 17.282453472714526], [1441174464.0, 72260000.0, 17.30697163383164], [1441176192.0, 72280000.0, 17.331517638541364], [1441177920.0, 72300000.0, 17.356091510574306], [1441179648.0, 72320000.0, 17.380693273674517], [1441181376.0, 72340000.0, 17.405322951599594], [1441183104.0, 72360000.0, 17.429980568120623], [1441184832.0, 72380000.0, 17.454666147022206], [1441186560.0, 72400000.0, 17.479379712102403], [1441188288.0, 72420000.0, 17.504121287172847], [1441190016.0, 72440000.0, 17.52889089605867], [1441191744.0, 72460000.0, 17.55368856259853], [1441193472.0, 72480000.0, 17.578514310644568], [1441195200.0, 72500000.0, 17.6033681640625], [1441196928.0, 72520000.0, 17.62825014673156], [1441198656.0, 72540000.0, 17.653160282544523], [1441200384.0, 72560000.0, 17.678098595407693], [1441202112.0, 72580000.0, 17.70306510924092], [1441203840.0, 72600000.0, 17.728059847977594], [1441205568.0, 72620000.0, 17.753082835564683], [1441207296.0, 72640000.0, 17.77813409596271], [1441209024.0, 72660000.0, 17.803213653145693], [1441210752.0, 72680000.0, 17.82832153110129], [1441212480.0, 72700000.0, 17.853457753830696], [1441214208.0, 72720000.0, 17.87862234534868], [1441215936.0, 72740000.0, 17.903815329683567], [1441217664.0, 72760000.0, 17.929036730877282], [1441219392.0, 72780000.0, 17.95428657298533], [1441221120.0, 72800000.0, 17.979564880076804], [1441222848.0, 72820000.0, 18.00487167623436], [1441224576.0, 72840000.0, 18.030206985554276], [1441226304.0, 72860000.0, 18.055570832146426], [1441228032.0, 72880000.0, 18.08096324013428], [1441229760.0, 72900000.0, 18.1063842336549], [1441231488.0, 72920000.0, 18.131833836858977], [1441233216.0, 72940000.0, 18.1573120739108], [1441234944.0, 72960000.0, 18.182818968988286], [1441236672.0, 72980000.0, 18.208354546282965], [1441238400.0, 73000000.0, 18.23391883], [1441240128.0, 73020000.0, 18.259511844358162], [1441241856.0, 73040000.0, 18.28513361358989], [1441243584.0, 73060000.0, 18.310784161941214], [1441245312.0, 73080000.0, 18.336463513671838], [1441247040.0, 73100000.0, 18.362171693055096], [1441248768.0, 73120000.0, 18.38790872437798], [1441250496.0, 73140000.0, 18.413674631941102], [1441252224.0, 73160000.0, 18.439469440058776], [1441253952.0, 73180000.0, 18.465293173058935], [1441255680.0, 73200000.0, 18.491145855283204], [1441257408.0, 73220000.0, 18.517027511086837], [1441259136.0, 73240000.0, 18.54293816483881], [1441260864.0, 73260000.0, 18.568877840921722], [1441262592.0, 73280000.0, 18.594846563731902], [1441264320.0, 73300000.0, 18.620844357679307], [1441266048.0, 73320000.0, 18.646871247187597], [1441267776.0, 73340000.0, 18.672927256694155], [1441269504.0, 73360000.0, 18.69901241065004], [1441271232.0, 73380000.0, 18.725126733519964], [1441272960.0, 73400000.0, 18.7512702497824], [1441274688.0, 73420000.0, 18.777442983929493], [1441276416.0, 73440000.0, 18.803644960467132], [1441278144.0, 73460000.0, 18.829876203914857], [1441279872.0, 73480000.0, 18.856136738805972], [1441281600.0, 73500000.0, 18.882426589687494], [1441283328.0, 73520000.0, 18.908745781120174], [1441285056.0, 73540000.0, 18.935094337678454], [1441286784.0, 73560000.0, 18.96147228395054], [1441288512.0, 73580000.0, 18.987879644538356], 
[1441290240.0, 73600000.0, 19.014316444057602], [1441291968.0, 73620000.0, 19.040782707137662], [1441293696.0, 73640000.0, 19.067278458421708], [1441295424.0, 73660000.0, 19.093803722566655], [1441297152.0, 73680000.0, 19.120358524243187], [1441298880.0, 73700000.0, 19.1469428881357], [1441300608.0, 73720000.0, 19.1735568389424], [1441302336.0, 73740000.0, 19.200200401375252], [1441304064.0, 73760000.0, 19.22687360015998], [1441305792.0, 73780000.0, 19.253576460036065], [1441307520.0, 73800000.0, 19.2803090057568], [1441309248.0, 73820000.0, 19.30707126208925], [1441310976.0, 73840000.0, 19.33386325381425], [1441312704.0, 73860000.0, 19.36068500572644], [1441314432.0, 73880000.0, 19.387536542634244], [1441316160.0, 73900000.0, 19.4144178893599], [1441317888.0, 73920000.0, 19.441329070739428], [1441319616.0, 73940000.0, 19.468270111622648], [1441321344.0, 73960000.0, 19.495241036873217], [1441323072.0, 73980000.0, 19.52224187136857], [1441324800.0, 74000000.0, 19.54927264], [1441326528.0, 74020000.0, 19.576333367672575], [1441328256.0, 74040000.0, 19.603424079305213], [1441329984.0, 74060000.0, 19.63054479983066], [1441331712.0, 74080000.0, 19.65769555419549], [1441333440.0, 74100000.0, 19.684876367360097], [1441335168.0, 74120000.0, 19.71208726429875], [1441336896.0, 74140000.0, 19.73932826999951], [1441338624.0, 74160000.0, 19.766599409464348], [1441340352.0, 74180000.0, 19.793900707709028], [1441342080.0, 74200000.0, 19.8212321897632], [1441343808.0, 74220000.0, 19.848593880670368], [1441345536.0, 74240000.0, 19.875985805487907], [1441347264.0, 74260000.0, 19.903407989287015], [1441348992.0, 74280000.0, 19.930860457152825], [1441350720.0, 74300000.0, 19.958343234184294], [1441352448.0, 74320000.0, 19.985856345494295], [1441354176.0, 74340000.0, 20.01339981620954], [1441355904.0, 74360000.0, 20.040973671470645], [1441357632.0, 74380000.0, 20.068577936432124], [1441359360.0, 74400000.0, 20.096212636262408], [1441361088.0, 74420000.0, 20.12387779614375], [1441362816.0, 74440000.0, 20.151573441272372], [1441364544.0, 74460000.0, 20.179299596858375], [1441366272.0, 74480000.0, 20.207056288125788], [1441368000.0, 74500000.0, 20.234843540312504], [1441369728.0, 74520000.0, 20.262661378670384], [1441371456.0, 74540000.0, 20.29050982846518], [1441373184.0, 74560000.0, 20.31838891497659], [1441374912.0, 74580000.0, 20.346298663498214], [1441376640.0, 74600000.0, 20.374239099337597], [1441378368.0, 74620000.0, 20.402210247816225], [1441380096.0, 74640000.0, 20.43021213426952], [1441381824.0, 74660000.0, 20.45824478404683], [1441383552.0, 74680000.0, 20.486308222511468], [1441385280.0, 74700000.0, 20.514402475040697], [1441387008.0, 74720000.0, 20.542527567025736], [1441388736.0, 74740000.0, 20.570683523871747], [1441390464.0, 74760000.0, 20.59887037099786], [1441392192.0, 74780000.0, 20.627088133837187], [1441393920.0, 74800000.0, 20.655336837836803], [1441395648.0, 74820000.0, 20.683616508457735], [1441397376.0, 74840000.0, 20.711927171175017], [1441399104.0, 74860000.0, 20.740268851477637], [1441400832.0, 74880000.0, 20.76864157486862], [1441402560.0, 74900000.0, 20.797045366864907], [1441404288.0, 74920000.0, 20.82548025299747], [1441406016.0, 74940000.0, 20.853946258811288], [1441407744.0, 74960000.0, 20.88244340986535], [1441409472.0, 74980000.0, 20.910971731732587], [1441411200.0, 75000000.0, 20.93953125], [1441412928.0, 75020000.0, 20.968121990268582], [1441414656.0, 75040000.0, 20.996743978153344], [1441416384.0, 75060000.0, 21.02539723928331], [1441418112.0, 75080000.0, 
21.054081799301542], [1441419840.0, 75100000.0, 21.0827976838651], [1441421568.0, 75120000.0, 21.111544918645105], [1441423296.0, 75140000.0, 21.140323529326725], [1441425024.0, 75160000.0, 21.16913354160911], [1441426752.0, 75180000.0, 21.197974981205512], [1441428480.0, 75200000.0, 21.226847873843194], [1441430208.0, 75220000.0, 21.2557522452635], [1441431936.0, 75240000.0, 21.284688121221787], [1441433664.0, 75260000.0, 21.313655527487505], [1441435392.0, 75280000.0, 21.342654489844147], [1441437120.0, 75300000.0, 21.371685034089303], [1441438848.0, 75320000.0, 21.400747186034575], [1441440576.0, 75340000.0, 21.429840971505694], [1441442304.0, 75360000.0, 21.458966416342445], [1441444032.0, 75380000.0, 21.488123546398707], [1441445760.0, 75400000.0, 21.5173123875424], [1441447488.0, 75420000.0, 21.54653296565559], [1441449216.0, 75440000.0, 21.57578530663442], [1441450944.0, 75460000.0, 21.605069436389112], [1441452672.0, 75480000.0, 21.634385380843987], [1441454400.0, 75500000.0, 21.663733165937497], [1441456128.0, 75520000.0, 21.69311281762218], [1441457856.0, 75540000.0, 21.722524361864718], [1441459584.0, 75560000.0, 21.75196782464584], [1441461312.0, 75580000.0, 21.78144323196046], [1441463040.0, 75600000.0, 21.810950609817592], [1441464768.0, 75620000.0, 21.840489984240406], [1441466496.0, 75640000.0, 21.870061381266126], [1441468224.0, 75660000.0, 21.899664826946196], [1441469952.0, 75680000.0, 21.92930034734615], [1441471680.0, 75700000.0, 21.958967968545704], [1441473408.0, 75720000.0, 21.988667716638666], [1441475136.0, 75740000.0, 22.01839961773303], [1441476864.0, 75760000.0, 22.048163697950944], [1441478592.0, 75780000.0, 22.077959983428723], [1441480320.0, 75800000.0, 22.1077885003168], [1441482048.0, 75820000.0, 22.13764927477982], [1441483776.0, 75840000.0, 22.16754233299658], [1441485504.0, 75860000.0, 22.197467701160054], [1441487232.0, 75880000.0, 22.22742540547738], [1441488960.0, 75900000.0, 22.2574154721699], [1441490688.0, 75920000.0, 22.287437927473114], [1441492416.0, 75940000.0, 22.317492797636746], [1441494144.0, 75960000.0, 22.347580108924674], [1441495872.0, 75980000.0, 22.37769988761499], [1441497600.0, 76000000.0, 22.40785215999999], [1441499328.0, 76020000.0, 22.438036952386195], [1441501056.0, 76040000.0, 22.468254291094276], [1441502784.0, 76060000.0, 22.49850420245916], [1441504512.0, 76080000.0, 22.52878671282998], [1441506240.0, 76100000.0, 22.559101848570105], [1441507968.0, 76120000.0, 22.589449636057086], [1441509696.0, 76140000.0, 22.619830101682723], [1441511424.0, 76160000.0, 22.65024327185308], [1441513152.0, 76180000.0, 22.680689172988405], [1441514880.0, 76200000.0, 22.7111678315232], [1441516608.0, 76220000.0, 22.741679273906225], [1441518336.0, 76240000.0, 22.77222352660047], [1441520064.0, 76260000.0, 22.802800616083196], [1441521792.0, 76280000.0, 22.83341056884589], [1441523520.0, 76300000.0, 22.8640534113943], [1441525248.0, 76320000.0, 22.89472917024846], [1441526976.0, 76340000.0, 22.925437871942666], [1441528704.0, 76360000.0, 22.956179543025456], [1441530432.0, 76380000.0, 22.98695421005966], [1441532160.0, 76400000.0, 23.017761899622393], [1441533888.0, 76420000.0, 23.04860263830505], [1441535616.0, 76440000.0, 23.079476452713273], [1441537344.0, 76460000.0, 23.110383369467034], [1441539072.0, 76480000.0, 23.141323415200592], [1441540800.0, 76500000.0, 23.172296616562505], [1441542528.0, 76520000.0, 23.203303000215598], [1441544256.0, 76540000.0, 23.234342592837034], [1441545984.0, 76560000.0, 23.26541542111828], [1441547712.0, 
76580000.0, 23.296521511765118], [1441549440.0, 76600000.0, 23.3276608914976], [1441551168.0, 76620000.0, 23.358833587050164], [1441552896.0, 76640000.0, 23.390039625171532], [1441554624.0, 76660000.0, 23.42127903262477], [1441556352.0, 76680000.0, 23.452551836187254], [1441558080.0, 76700000.0, 23.483858062650697], [1441559808.0, 76720000.0, 23.515197738821183], [1441561536.0, 76740000.0, 23.546570891519124], [1441563264.0, 76760000.0, 23.57797754757924], [1441564992.0, 76780000.0, 23.609417733850645], [1441566720.0, 76800000.0, 23.640891477196792], [1441568448.0, 76820000.0, 23.67239880449552], [1441570176.0, 76840000.0, 23.70393974263896], [1441571904.0, 76860000.0, 23.735514318533667], [1441573632.0, 76880000.0, 23.767122559100542], [1441575360.0, 76900000.0, 23.798764491274905], [1441577088.0, 76920000.0, 23.83044014200637], [1441578816.0, 76940000.0, 23.86214953825899], [1441580544.0, 76960000.0, 23.893892707011194], [1441582272.0, 76980000.0, 23.9256696752558], [1441584000.0, 77000000.0, 23.957480470000004], [1441585728.0, 77020000.0, 23.989325118265402], [1441587456.0, 77040000.0, 24.021203647087997], [1441589184.0, 77060000.0, 24.053116083518212], [1441590912.0, 77080000.0, 24.085062454620836], [1441592640.0, 77100000.0, 24.117042787475096], [1441594368.0, 77120000.0, 24.149057109174638], [1441596096.0, 77140000.0, 24.181105446827544], [1441597824.0, 77160000.0, 24.213187827556254], [1441599552.0, 77180000.0, 24.24530427849769], [1441601280.0, 77200000.0, 24.277454826803197], [1441603008.0, 77220000.0, 24.309639499638564], [1441604736.0, 77240000.0, 24.341858324183963], [1441606464.0, 77260000.0, 24.37411132763408], [1441608192.0, 77280000.0, 24.406398537198005], [1441609920.0, 77300000.0, 24.438719980099307], [1441611648.0, 77320000.0, 24.47107568357596], [1441613376.0, 77340000.0, 24.503465674880438], [1441615104.0, 77360000.0, 24.535889981279663], [1441616832.0, 77380000.0, 24.56834863005504], [1441618560.0, 77400000.0, 24.600841648502406], [1441620288.0, 77420000.0, 24.63336906393209], [1441622016.0, 77440000.0, 24.66593090366891], [1441623744.0, 77460000.0, 24.69852719505217], [1441625472.0, 77480000.0, 24.731157965435607], [1441627200.0, 77500000.0, 24.7638232421875], [1441628928.0, 77520000.0, 24.7965230526906], [1441630656.0, 77540000.0, 24.829257424342156], [1441632384.0, 77560000.0, 24.86202638455394], [1441634112.0, 77580000.0, 24.89482996075216], [1441635840.0, 77600000.0, 24.927668180377594], [1441637568.0, 77620000.0, 24.960541070885522], [1441639296.0, 77640000.0, 24.993448659745745], [1441641024.0, 77660000.0, 25.026390974442535], [1441642752.0, 77680000.0, 25.059368042474734], [1441644480.0, 77700000.0, 25.092379891355694], [1441646208.0, 77720000.0, 25.12542654861332], [1441647936.0, 77740000.0, 25.158508041790004], [1441649664.0, 77760000.0, 25.191624398442723], [1441651392.0, 77780000.0, 25.224775646142966], [1441653120.0, 77800000.0, 25.2579618124768], [1441654848.0, 77820000.0, 25.2911829250448], [1441656576.0, 77840000.0, 25.324439011462115], [1441658304.0, 77860000.0, 25.357730099358466], [1441660032.0, 77880000.0, 25.39105621637812], [1441661760.0, 77900000.0, 25.424417390179904], [1441663488.0, 77920000.0, 25.457813648437217], [1441665216.0, 77940000.0, 25.491245018838036], [1441666944.0, 77960000.0, 25.524711529084932], [1441668672.0, 77980000.0, 25.558213206895005], [1441670400.0, 78000000.0, 25.591750079999994], [1441672128.0, 78020000.0, 25.625322176146202], [1441673856.0, 78040000.0, 25.658929523094532], [1441675584.0, 78060000.0, 
25.692572148620457], [1441677312.0, 78080000.0, 25.726250080514077], [1441679040.0, 78100000.0, 25.75996334658009], [1441680768.0, 78120000.0, 25.793711974637827], [1441682496.0, 78140000.0, 25.827495992521147], [1441684224.0, 78160000.0, 25.861315428078612], [1441685952.0, 78180000.0, 25.89517030917337], [1441687680.0, 78200000.0, 25.92906066368321], [1441689408.0, 78220000.0, 25.962986519500483], [1441691136.0, 78240000.0, 25.99694790453225], [1441692864.0, 78260000.0, 26.03094484670016], [1441694592.0, 78280000.0, 26.064977373940543], [1441696320.0, 78300000.0, 26.099045514204303], [1441698048.0, 78320000.0, 26.133149295457038], [1441699776.0, 78340000.0, 26.167288745679], [1441701504.0, 78360000.0, 26.20146389286508], [1441703232.0, 78380000.0, 26.235674765024804], [1441704960.0, 78400000.0, 26.2699213901824], [1441706688.0, 78420000.0, 26.304203796376733], [1441708416.0, 78440000.0, 26.33852201166137], [1441710144.0, 78460000.0, 26.372876064104492], [1441711872.0, 78480000.0, 26.407265981789013], [1441713600.0, 78500000.0, 26.441691792812488], [1441715328.0, 78520000.0, 26.476153525287216], [1441717056.0, 78540000.0, 26.510651207340096], [1441718784.0, 78560000.0, 26.54518486711278], [1441720512.0, 78580000.0, 26.579754532761594], [1441722240.0, 78600000.0, 26.614360232457603], [1441723968.0, 78620000.0, 26.649001994386506], [1441725696.0, 78640000.0, 26.68367984674875], [1441727424.0, 78660000.0, 26.718393817759498], [1441729152.0, 78680000.0, 26.753143935648627], [1441730880.0, 78700000.0, 26.7879302286607], [1441732608.0, 78720000.0, 26.822752725055043], [1441734336.0, 78740000.0, 26.857611453105687], [1441736064.0, 78760000.0, 26.89250644110142], [1441737792.0, 78780000.0, 26.927437717345708], [1441739520.0, 78800000.0, 26.9624053101568], [1441741248.0, 78820000.0, 26.99740924786768], [1441742976.0, 78840000.0, 27.03244955882609], [1441744704.0, 78860000.0, 27.067526271394478], [1441746432.0, 78880000.0, 27.102639413950083], [1441748160.0, 78900000.0, 27.13778901488489], [1441749888.0, 78920000.0, 27.172975102605673], [1441751616.0, 78940000.0, 27.208197705533895], [1441753344.0, 78960000.0, 27.24345685210585], [1441755072.0, 78980000.0, 27.278752570772614], [1441756800.0, 79000000.0, 27.314084890000007], [1441758528.0, 79020000.0, 27.34945383826862], [1441760256.0, 79040000.0, 27.384859444073857], [1441761984.0, 79060000.0, 27.420301735925896], [1441763712.0, 79080000.0, 27.45578074234973], [1441765440.0, 79100000.0, 27.491296491885098], [1441767168.0, 79120000.0, 27.526849013086583], [1441768896.0, 79140000.0, 27.562438334523552], [1441770624.0, 79160000.0, 27.59806448478019], [1441772352.0, 79180000.0, 27.63372749245547], [1441774080.0, 79200000.0, 27.669427386163196], [1441775808.0, 79220000.0, 27.705164194532], [1441777536.0, 79240000.0, 27.740937946205342], [1441779264.0, 79260000.0, 27.776748669841457], [1441780992.0, 79280000.0, 27.812596394113466], [1441782720.0, 79300000.0, 27.848481147709293], [1441784448.0, 79320000.0, 27.884402959331734], [1441786176.0, 79340000.0, 27.92036185769837], [1441787904.0, 79360000.0, 27.956357871541684], [1441789632.0, 79380000.0, 27.992391029608967], [1441791360.0, 79400000.0, 28.028461360662405], [1441793088.0, 79420000.0, 28.064568893478988], [1441794816.0, 79440000.0, 28.100713656850612], [1441796544.0, 79460000.0, 28.136895679584015], [1441798272.0, 79480000.0, 28.173114990500824], [1441800000.0, 79500000.0, 28.209371618437505], [1441801728.0, 79520000.0, 28.24566559224542], [1441803456.0, 79540000.0, 28.28199694079082], [1441805184.0, 
79560000.0, 28.31836569295484], [1441806912.0, 79580000.0, 28.354771877633457], [1441808640.0, 79600000.0, 28.3912155237376], [1441810368.0, 79620000.0, 28.42769666019306], [1441812096.0, 79640000.0, 28.464215315940564], [1441813824.0, 79660000.0, 28.50077151993567], [1441815552.0, 79680000.0, 28.53736530114891], [1441817280.0, 79700000.0, 28.573996688565696], [1441819008.0, 79720000.0, 28.610665711186382], [1441820736.0, 79740000.0, 28.647372398026185], [1441822464.0, 79760000.0, 28.6841167781153], [1441824192.0, 79780000.0, 28.720898880498826], [1441825920.0, 79800000.0, 28.757718734236803], [1441827648.0, 79820000.0, 28.79457636840418], [1441829376.0, 79840000.0, 28.83147181209085], [1441831104.0, 79860000.0, 28.868405094401684], [1441832832.0, 79880000.0, 28.905376244456463], [1441834560.0, 79900000.0, 28.942385291389908], [1441836288.0, 79920000.0, 28.97943226435171], [1441838016.0, 79940000.0, 29.01651719250653], [1441839744.0, 79960000.0, 29.05364010503399], [1441841472.0, 79980000.0, 29.09080103112863], [1441843200.0, 80000000.0, 29.128], [1441844928.0, 80020000.0, 29.165237040872643], [1441846656.0, 80040000.0, 29.20251218298597], [1441848384.0, 80060000.0, 29.23982545559456], [1441850112.0, 80080000.0, 29.27717688796776], [1441851840.0, 80100000.0, 29.31456650939009], [1441853568.0, 80120000.0, 29.35199434916096], [1441855296.0, 80140000.0, 29.389460436594753], [1441857024.0, 80160000.0, 29.42696480102095], [1441858752.0, 80180000.0, 29.464507471783968], [1441860480.0, 80200000.0, 29.5020884782432], [1441862208.0, 80220000.0, 29.53970784977314], [1441863936.0, 80240000.0, 29.577365615763213], [1441865664.0, 80260000.0, 29.615061805617945], [1441867392.0, 80280000.0, 29.652796448756806], [1441869120.0, 80300000.0, 29.690569574614287], [1441870848.0, 80320000.0, 29.728381212640016], [1441872576.0, 80340000.0, 29.766231392298558], [1441874304.0, 80360000.0, 29.804120143069483], [1441876032.0, 80380000.0, 29.84204749444754], [1441877760.0, 80400000.0, 29.880013475942384], [1441879488.0, 80420000.0, 29.91801811707883], [1441881216.0, 80440000.0, 29.95606144739667], [1441882944.0, 80460000.0, 29.994143496450732], [1441884672.0, 80480000.0, 30.032264293811025], [1441886400.0, 80500000.0, 30.070423869062505], [1441888128.0, 80520000.0, 30.108622251805222], [1441889856.0, 80540000.0, 30.146859471654356], [1441891584.0, 80560000.0, 30.185135558240066], [1441893312.0, 80580000.0, 30.2234505412077], [1441895040.0, 80600000.0, 30.261804450217614], [1441896768.0, 80620000.0, 30.300197314945223], [1441898496.0, 80640000.0, 30.33862916508117], [1441900224.0, 80660000.0, 30.37710003033105], [1441901952.0, 80680000.0, 30.415609940415585], [1441903680.0, 80700000.0, 30.454158925070704], [1441905408.0, 80720000.0, 30.492747014047282], [1441907136.0, 80740000.0, 30.53137423711147], [1441908864.0, 80760000.0, 30.5700406240444], [1441910592.0, 80780000.0, 30.608746204642348], [1441912320.0, 80800000.0, 30.647491008716806], [1441914048.0, 80820000.0, 30.68627506609427], [1441915776.0, 80840000.0, 30.725098406616414], [1441917504.0, 80860000.0, 30.7639610601401], [1441919232.0, 80880000.0, 30.802863056537205], [1441920960.0, 80900000.0, 30.841804425694892], [1441922688.0, 80920000.0, 30.88078519751537], [1441924416.0, 80940000.0, 30.919805401915976], [1441926144.0, 80960000.0, 30.95886506882932], [1441927872.0, 80980000.0, 30.99796422820305], [1441929600.0, 81000000.0, 31.03710290999999], [1441931328.0, 81020000.0, 31.07628114419824], [1441933056.0, 81040000.0, 31.115498960790898], [1441934784.0, 
81060000.0, 31.154756389786403], [1441936512.0, 81080000.0, 31.19405346120823], [1441938240.0, 81100000.0, 31.233390205095088], [1441939968.0, 81120000.0, 31.27276665150093], [1441941696.0, 81140000.0, 31.312182830494784], [1441943424.0, 81160000.0, 31.351638772160914], [1441945152.0, 81180000.0, 31.39113450659884], [1441946880.0, 81200000.0, 31.430670063923184], [1441948608.0, 81220000.0, 31.47024547426386], [1441950336.0, 81240000.0, 31.509860767765925], [1441952064.0, 81260000.0, 31.549515974589625], [1441953792.0, 81280000.0, 31.589211124910527], [1441955520.0, 81300000.0, 31.62894624891931], [1441957248.0, 81320000.0, 31.6687213768219], [1441958976.0, 81340000.0, 31.708536538839514], [1441960704.0, 81360000.0, 31.74839176520848], [1441962432.0, 81380000.0, 31.788287086180503], [1441964160.0, 81400000.0, 31.82822253202241], [1441965888.0, 81420000.0, 31.868198133016275], [1441967616.0, 81440000.0, 31.908213919459516], [1441969344.0, 81460000.0, 31.94826992166469], [1441971072.0, 81480000.0, 31.988366169959626], [1441972800.0, 81500000.0, 32.0285026946875], [1441974528.0, 81520000.0, 32.068679526206616], [1441976256.0, 81540000.0, 32.10889669489068], [1441977984.0, 81560000.0, 32.149154231128534], [1441979712.0, 81580000.0, 32.18945216532434], [1441981440.0, 81600000.0, 32.229790527897606], [1441983168.0, 81620000.0, 32.27016934928302], [1441984896.0, 81640000.0, 32.31058865993057], [1441986624.0, 81660000.0, 32.35104849030561], [1441988352.0, 81680000.0, 32.39154887088867], [1441990080.0, 81700000.0, 32.43208983217569], [1441991808.0, 81720000.0, 32.47267140467783], [1441993536.0, 81740000.0, 32.513293618921544], [1441995264.0, 81760000.0, 32.553956505448674], [1441996992.0, 81780000.0, 32.594660094816305], [1441998720.0, 81800000.0, 32.635404417596796], [1442000448.0, 81820000.0, 32.67618950437796], [1442002176.0, 81840000.0, 32.71701538576277], [1442003904.0, 81860000.0, 32.7578820923697], [1442005632.0, 81880000.0, 32.79878965483241], [1442007360.0, 81900000.0, 32.8397381037999], [1442009088.0, 81920000.0, 32.88072746993661], [1442010816.0, 81940000.0, 32.92175778392225], [1442012544.0, 81960000.0, 32.96282907645183], [1442014272.0, 81980000.0, 33.00394137823584], [1442016000.0, 82000000.0, 33.04509471999999], [1442017728.0, 82020000.0, 33.08628913248543], [1442019456.0, 82040000.0, 33.127524646448656], [1442021184.0, 82060000.0, 33.16880129266144], [1442022912.0, 82080000.0, 33.21011910191107], [1442024640.0, 82100000.0, 33.25147810500012], [1442026368.0, 82120000.0, 33.292878332746476], [1442028096.0, 82140000.0, 33.334319815983584], [1442029824.0, 82160000.0, 33.37580258556008], [1442031552.0, 82180000.0, 33.41732667234013], [1442033280.0, 82200000.0, 33.45889210720322], [1442035008.0, 82220000.0, 33.50049892104418], [1442036736.0, 82240000.0, 33.542147144773416], [1442038464.0, 82260000.0, 33.583836809316544], [1442040192.0, 82280000.0, 33.62556794561465], [1442041920.0, 82300000.0, 33.667340584624306], [1442043648.0, 82320000.0, 33.70915475731738], [1442045376.0, 82340000.0, 33.75101049468128], [1442047104.0, 82360000.0, 33.79290782771872], [1442048832.0, 82380000.0, 33.834846787447866], [1442050560.0, 82400000.0, 33.87682740490241], [1442052288.0, 82420000.0, 33.91884971113135], [1442054016.0, 82440000.0, 33.96091373719914], [1442055744.0, 82460000.0, 34.003019514185816], [1442057472.0, 82480000.0, 34.04516707318663], [1442059200.0, 82500000.0, 34.0873564453125], [1442060928.0, 82520000.0, 34.12958766168966], [1442062656.0, 82540000.0, 34.1718607534598], [1442064384.0, 
82560000.0, 34.21417575178018], [1442066112.0, 82580000.0, 34.25653268782338], [1442067840.0, 82600000.0, 34.2989315927776], [1442069568.0, 82620000.0, 34.34137249784638], [1442071296.0, 82640000.0, 34.38385543424877], [1442073024.0, 82660000.0, 34.42638043321937], [1442074752.0, 82680000.0, 34.468947526008186], [1442076480.0, 82700000.0, 34.51155674388069], [1442078208.0, 82720000.0, 34.55420811811796], [1442079936.0, 82740000.0, 34.59690168001642], [1442081664.0, 82760000.0, 34.63963746088816], [1442083392.0, 82780000.0, 34.68241549206061], [1442085120.0, 82800000.0, 34.72523580487678], [1442086848.0, 82820000.0, 34.76809843069524], [1442088576.0, 82840000.0, 34.81100340088997], [1442090304.0, 82860000.0, 34.8539507468505], [1442092032.0, 82880000.0, 34.896940499981966], [1442093760.0, 82900000.0, 34.93997269170488], [1442095488.0, 82920000.0, 34.98304735345545], [1442097216.0, 82940000.0, 35.026164516685284], [1442098944.0, 82960000.0, 35.06932421286155], [1442100672.0, 82980000.0, 35.11252647346706], [1442102400.0, 83000000.0, 35.155771330000015], [1442104128.0, 83020000.0, 35.19905881397425], [1442105856.0, 83040000.0, 35.24238895691917], [1442107584.0, 83060000.0, 35.28576179037968], [1442109312.0, 83080000.0, 35.329177345916314], [1442111040.0, 83100000.0, 35.3726356551051], [1442112768.0, 83120000.0, 35.41613674953764], [1442114496.0, 83140000.0, 35.459680660821185], [1442116224.0, 83160000.0, 35.503267420578474], [1442117952.0, 83180000.0, 35.54689706044782], [1442119680.0, 83200000.0, 35.59056961208321], [1442121408.0, 83220000.0, 35.6342851071541], [1442123136.0, 83240000.0, 35.67804357734568], [1442124864.0, 83260000.0, 35.721845054358624], [1442126592.0, 83280000.0, 35.76568956990917], [1442128320.0, 83300000.0, 35.80957715572931], [1442130048.0, 83320000.0, 35.8535078435665], [1442131776.0, 83340000.0, 35.89748166518384], [1442133504.0, 83360000.0, 35.941498652360124], [1442135232.0, 83380000.0, 35.98555883688963], [1442136960.0, 83400000.0, 36.02966225058239], [1442138688.0, 83420000.0, 36.07380892526398], [1442140416.0, 83440000.0, 36.11799889277558], [1442142144.0, 83460000.0, 36.16223218497413], [1442143872.0, 83480000.0, 36.20650883373207], [1442145600.0, 83500000.0, 36.25082887093748], [1442147328.0, 83520000.0, 36.29519232849427], [1442149056.0, 83540000.0, 36.33959923832171], [1442150784.0, 83560000.0, 36.38404963235501], [1442152512.0, 83580000.0, 36.42854354254486], [1442154240.0, 83600000.0, 36.47308100085758], [1442155968.0, 83620000.0, 36.517662039275336], [1442157696.0, 83640000.0, 36.56228668979581], [1442159424.0, 83660000.0, 36.60695498443234], [1442161152.0, 83680000.0, 36.651666955214075], [1442162880.0, 83700000.0, 36.69642263418569], [1442164608.0, 83720000.0, 36.74122205340768], [1442166336.0, 83740000.0, 36.786065244956156], [1442168064.0, 83760000.0, 36.83095224092284], [1442169792.0, 83780000.0, 36.87588307341534], [1442171520.0, 83800000.0, 36.920857774556815], [1442173248.0, 83820000.0, 36.96587637648612], [1442174976.0, 83840000.0, 37.01093891135794], [1442176704.0, 83860000.0, 37.056045411342495], [1442178432.0, 83880000.0, 37.101195908625925], [1442180160.0, 83900000.0, 37.14639043540991], [1442181888.0, 83920000.0, 37.19162902391189], [1442183616.0, 83940000.0, 37.23691170636513], [1442185344.0, 83960000.0, 37.28223851501851], [1442187072.0, 83980000.0, 37.32760948213665], [1442188800.0, 84000000.0, 37.37302464000001], [1442190528.0, 84020000.0, 37.418484020904636], [1442192256.0, 84040000.0, 37.46398765716249], [1442193984.0, 84060000.0, 
37.509535581101154], [1442195712.0, 84080000.0, 37.555127825063956], [1442197440.0, 84100000.0, 37.60076442141011], [1442199168.0, 84120000.0, 37.64644540251444], [1442200896.0, 84140000.0, 37.692170800767585], [1442202624.0, 84160000.0, 37.73794064857603], [1442204352.0, 84180000.0, 37.78375497836189], [1442206080.0, 84200000.0, 37.829613822563196], [1442207808.0, 84220000.0, 37.875517213633664], [1442209536.0, 84240000.0, 37.92146518404277], [1442211264.0, 84260000.0, 37.967457766275906], [1442212992.0, 84280000.0, 38.01349499283413], [1442214720.0, 84300000.0, 38.05957689623429], [1442216448.0, 84320000.0, 38.10570350900918], [1442218176.0, 84340000.0, 38.1518748637072], [1442219904.0, 84360000.0, 38.19809099289273], [1442221632.0, 84380000.0, 38.24435192914582], [1442223360.0, 84400000.0, 38.29065770506239], [1442225088.0, 84420000.0, 38.33700835325423], [1442226816.0, 84440000.0, 38.383403906348875], [1442228544.0, 84460000.0, 38.42984439698965], [1442230272.0, 84480000.0, 38.476329857835864], [1442232000.0, 84500000.0, 38.52286032156248], [1442233728.0, 84520000.0, 38.569435820860456], [1442235456.0, 84540000.0, 38.616056388436476], [1442237184.0, 84560000.0, 38.662722057013056], [1442238912.0, 84580000.0, 38.7094328593287], [1442240640.0, 84600000.0, 38.75618882813761], [1442242368.0, 84620000.0, 38.8029899962099], [1442244096.0, 84640000.0, 38.84983639633161], [1442245824.0, 84660000.0, 38.89672806130449], [1442247552.0, 84680000.0, 38.94366502394635], [1442249280.0, 84700000.0, 38.990647317090726], [1442251008.0, 84720000.0, 39.037674973587], [1442252736.0, 84740000.0, 39.08474802630062], [1442254464.0, 84760000.0, 39.131866508112765], [1442256192.0, 84780000.0, 39.179030451920475], [1442257920.0, 84800000.0, 39.22623989063681], [1442259648.0, 84820000.0, 39.273494857190606], [1442261376.0, 84840000.0, 39.32079538452668], [1442263104.0, 84860000.0, 39.36814150560573], [1442264832.0, 84880000.0, 39.415533253404284], [1442266560.0, 84900000.0, 39.462970660914905], [1442268288.0, 84920000.0, 39.510453761145975], [1442270016.0, 84940000.0, 39.55798258712177], [1442271744.0, 84960000.0, 39.60555717188263], [1442273472.0, 84980000.0, 39.65317754848465], [1442275200.0, 85000000.0, 39.70084375], [1442276928.0, 85020000.0, 39.74855580951668], [1442278656.0, 85040000.0, 39.79631376013862], [1442280384.0, 85060000.0, 39.84411763498579], [1442282112.0, 85080000.0, 39.89196746719399], [1442283840.0, 85100000.0, 39.939863289915095], [1442285568.0, 85120000.0, 39.9878051363168], [1442287296.0, 85140000.0, 40.03579303958279], [1442289024.0, 85160000.0, 40.0838270329128], [1442290752.0, 85180000.0, 40.13190714952241], [1442292480.0, 85200000.0, 40.18003342264318], [1442294208.0, 85220000.0, 40.22820588552278], [1442295936.0, 85240000.0, 40.27642457142464], [1442297664.0, 85260000.0, 40.32468951362838], [1442299392.0, 85280000.0, 40.37300074542944], [1442301120.0, 85300000.0, 40.42135830013928], [1442302848.0, 85320000.0, 40.46976221108546], [1442304576.0, 85340000.0, 40.518212511611395], [1442306304.0, 85360000.0, 40.566709235076516], [1442308032.0, 85380000.0, 40.615252414856386], [1442309760.0, 85400000.0, 40.663842084342384], [1442311488.0, 85420000.0, 40.71247827694208], [1442313216.0, 85440000.0, 40.76116102607891], [1442314944.0, 85460000.0, 40.809890365192366], [1442316672.0, 85480000.0, 40.85866632773807], [1442318400.0, 85500000.0, 40.907488947187524], [1442320128.0, 85520000.0, 40.95635825702827], [1442321856.0, 85540000.0, 41.00527429076399], [1442323584.0, 85560000.0, 
41.054237081914295], [1442325312.0, 85580000.0, 41.10324666401494], [1442327040.0, 85600000.0, 41.152303070617606], [1442328768.0, 85620000.0, 41.20140633529006], [1442330496.0, 85640000.0, 41.2505564916162], [1442332224.0, 85660000.0, 41.2997535731959], [1442333952.0, 85680000.0, 41.34899761364503], [1442335680.0, 85700000.0, 41.3982886465957], [1442337408.0, 85720000.0, 41.447626705695924], [1442339136.0, 85740000.0, 41.49701182460991], [1442340864.0, 85760000.0, 41.54644403701784], [1442342592.0, 85780000.0, 41.59592337661598], [1442344320.0, 85800000.0, 41.6454498771168], [1442346048.0, 85820000.0, 41.69502357224873], [1442347776.0, 85840000.0, 41.74464449575625], [1442349504.0, 85860000.0, 41.79431268140015], [1442351232.0, 85880000.0, 41.84402816295704], [1442352960.0, 85900000.0, 41.89379097421989], [1442354688.0, 85920000.0, 41.943601148997615], [1442356416.0, 85940000.0, 41.99345872111521], [1442358144.0, 85960000.0, 42.04336372441395], [1442359872.0, 85980000.0, 42.09331619275109], [1442361600.0, 86000000.0, 42.14331615999999], [1442363328.0, 86020000.0, 42.19336366005029], [1442365056.0, 86040000.0, 42.24345872680753], [1442366784.0, 86060000.0, 42.29360139419362], [1442368512.0, 86080000.0, 42.343791696146475], [1442370240.0, 86100000.0, 42.394029666620085], [1442371968.0, 86120000.0, 42.44431533958476], [1442373696.0, 86140000.0, 42.49464874902684], [1442375424.0, 86160000.0, 42.54502992894874], [1442377152.0, 86180000.0, 42.59545891336929], [1442378880.0, 86200000.0, 42.64593573632318], [1442380608.0, 86220000.0, 42.6964604318615], [1442382336.0, 86240000.0, 42.74703303405137], [1442384064.0, 86260000.0, 42.79765357697606], [1442385792.0, 86280000.0, 42.84832209473516], [1442387520.0, 86300000.0, 42.89903862144431], [1442389248.0, 86320000.0, 42.94980319123534], [1442390976.0, 86340000.0, 43.00061583825635], [1442392704.0, 86360000.0, 43.05147659667151], [1442394432.0, 86380000.0, 43.10238550066134], [1442396160.0, 86400000.0, 43.15334258442242], [1442397888.0, 86420000.0, 43.20434788216751], [1442399616.0, 86440000.0, 43.25540142812575], [1442401344.0, 86460000.0, 43.306503256542335], [1442403072.0, 86480000.0, 43.357653401678675], [1442404800.0, 86500000.0, 43.408851897812504], [1442406528.0, 86520000.0, 43.46009877923766], [1442408256.0, 86540000.0, 43.51139408026432], [1442409984.0, 86560000.0, 43.56273783521879], [1442411712.0, 86580000.0, 43.61413007844357], [1442413440.0, 86600000.0, 43.665570844297605], [1442415168.0, 86620000.0, 43.71706016715587], [1442416896.0, 86640000.0, 43.7685980814096], [1442418624.0, 86660000.0, 43.820184621466446], [1442420352.0, 86680000.0, 43.87181982175011], [1442422080.0, 86700000.0, 43.923503716700694], [1442423808.0, 86720000.0, 43.97523634077449], [1442425536.0, 86740000.0, 44.027017728443994], [1442427264.0, 86760000.0, 44.07884791419812], [1442428992.0, 86780000.0, 44.130726932541954], [1442430720.0, 86800000.0, 44.182654817996784], [1442432448.0, 86820000.0, 44.234631605100404], [1442434176.0, 86840000.0, 44.28665732840662], [1442435904.0, 86860000.0, 44.33873202248574], [1442437632.0, 86880000.0, 44.39085572192425], [1442439360.0, 86900000.0, 44.443028461324886], [1442441088.0, 86920000.0, 44.49525027530685], [1442442816.0, 86940000.0, 44.54752119850549], [1442444544.0, 86960000.0, 44.59984126557247], [1442446272.0, 86980000.0, 44.65221051117589], [1442448000.0, 87000000.0, 44.704628969999966], [1442449728.0, 87020000.0, 44.757096676745476], [1442451456.0, 87040000.0, 44.8096136661293], [1442453184.0, 87060000.0, 
44.862179972884675], [1442454912.0, 87080000.0, 44.914795631761315], [1442456640.0, 87100000.0, 44.967460677525125], [1442458368.0, 87120000.0, 45.020175144958316], [1442460096.0, 87140000.0, 45.07293906885963], [1442461824.0, 87160000.0, 45.125752484043915], [1442463552.0, 87180000.0, 45.17861542534257], [1442465280.0, 87200000.0, 45.23152792760322], [1442467008.0, 87220000.0, 45.28449002568983], [1442468736.0, 87240000.0, 45.337501754482844], [1442470464.0, 87260000.0, 45.39056314887899], [1442472192.0, 87280000.0, 45.44367424379129], [1442473920.0, 87300000.0, 45.49683507414932], [1442475648.0, 87320000.0, 45.55004567489881], [1442477376.0, 87340000.0, 45.60330608100211], [1442479104.0, 87360000.0, 45.65661632743777], [1442480832.0, 87380000.0, 45.7099764492007], [1442482560.0, 87400000.0, 45.763386481302405], [1442484288.0, 87420000.0, 45.81684645877059], [1442486016.0, 87440000.0, 45.870356416649386], [1442487744.0, 87460000.0, 45.92391638999946], [1442489472.0, 87480000.0, 45.977526413897664], [1442491200.0, 87500000.0, 46.0311865234375], [1442492928.0, 87520000.0, 46.08489675372871], [1442494656.0, 87540000.0, 46.13865713989744], [1442496384.0, 87560000.0, 46.192467717086416], [1442498112.0, 87580000.0, 46.24632852045461], [1442499840.0, 87600000.0, 46.30023958517759], [1442501568.0, 87620000.0, 46.354200946447236], [1442503296.0, 87640000.0, 46.4082126394718], [1442505024.0, 87660000.0, 46.46227469947622], [1442506752.0, 87680000.0, 46.51638716170162], [1442508480.0, 87700000.0, 46.57055006140569], [1442510208.0, 87720000.0, 46.624763433862604], [1442511936.0, 87740000.0, 46.679027314362855], [1442513664.0, 87760000.0, 46.733341738213596], [1442515392.0, 87780000.0, 46.787706740738265], [1442517120.0, 87800000.0, 46.842122357276786], [1442518848.0, 87820000.0, 46.89658862318568], [1442520576.0, 87840000.0, 46.95110557383782], [1442522304.0, 87860000.0, 47.00567324462254], [1442524032.0, 87880000.0, 47.0602916709458], [1442525760.0, 87900000.0, 47.11496088822988], [1442527488.0, 87920000.0, 47.16968093191369], [1442529216.0, 87940000.0, 47.22445183745254], [1442530944.0, 87960000.0, 47.279273640318195], [1442532672.0, 87980000.0, 47.3341463759991], [1442534400.0, 88000000.0, 47.38907008000002], [1442536128.0, 88020000.0, 47.44404478784228], [1442537856.0, 88040000.0, 47.49907053506381], [1442539584.0, 88060000.0, 47.554147357218916], [1442541312.0, 88080000.0, 47.609275289878546], [1442543040.0, 88100000.0, 47.66445436863011], [1442544768.0, 88120000.0, 47.71968462907748], [1442546496.0, 88140000.0, 47.77496610684123], [1442548224.0, 88160000.0, 47.830298837558324], [1442549952.0, 88180000.0, 47.88568285688224], [1442551680.0, 88200000.0, 47.9411182004832], [1442553408.0, 88220000.0, 47.996604904047736], [1442555136.0, 88240000.0, 48.052143003279134], [1442556864.0, 88260000.0, 48.107732533897064], [1442558592.0, 88280000.0, 48.16337353163781], [1442560320.0, 88300000.0, 48.2190660322543], [1442562048.0, 88320000.0, 48.27481007151595], [1442563776.0, 88340000.0, 48.33060568520867], [1442565504.0, 88360000.0, 48.38645290913516], [1442567232.0, 88380000.0, 48.44235177911445], [1442568960.0, 88400000.0, 48.49830233098239], [1442570688.0, 88420000.0, 48.55430460059123], [1442572416.0, 88440000.0, 48.61035862380983], [1442574144.0, 88460000.0, 48.66646443652377], [1442575872.0, 88480000.0, 48.722622074635105], [1442577600.0, 88500000.0, 48.77883157406248], [1442579328.0, 88520000.0, 48.8350929707413], [1442581056.0, 88540000.0, 48.891406300623345], [1442582784.0, 88560000.0, 
48.94777159967725], [1442584512.0, 88580000.0, 49.00418890388811], [1442586240.0, 88600000.0, 49.060658249257585], [1442587968.0, 88620000.0, 49.11717967180419], [1442589696.0, 88640000.0, 49.173753207562854], [1442591424.0, 88660000.0, 49.23037889258518], [1442593152.0, 88680000.0, 49.28705676293951], [1442594880.0, 88700000.0, 49.343786854710686], [1442596608.0, 88720000.0, 49.40056920400032], [1442598336.0, 88740000.0, 49.457403846926596], [1442600064.0, 88760000.0, 49.51429081962427], [1442601792.0, 88780000.0, 49.57123015824498], [1442603520.0, 88800000.0, 49.62822189895683], [1442605248.0, 88820000.0, 49.68526607794455], [1442606976.0, 88840000.0, 49.74236273140978], [1442608704.0, 88860000.0, 49.79951189557053], [1442610432.0, 88880000.0, 49.85671360666177], [1442612160.0, 88900000.0, 49.91396790093492], [1442613888.0, 88920000.0, 49.97127481465812], [1442615616.0, 88940000.0, 50.028634384116366], [1442617344.0, 88960000.0, 50.08604664561116], [1442619072.0, 88980000.0, 50.1435116354607], [1442620800.0, 89000000.0, 50.20102939000001], [1442622528.0, 89020000.0, 50.25859994558068], [1442624256.0, 89040000.0, 50.316223338571135], [1442625984.0, 89060000.0, 50.37389960535641], [1442627712.0, 89080000.0, 50.431628782338194], [1442629440.0, 89100000.0, 50.48941090593512], [1442631168.0, 89120000.0, 50.547246012582285], [1442632896.0, 89140000.0, 50.60513413873163], [1442634624.0, 89160000.0, 50.663075320851874], [1442636352.0, 89180000.0, 50.72106959542833], [1442638080.0, 89200000.0, 50.7791169989632], [1442639808.0, 89220000.0, 50.83721756797531], [1442641536.0, 89240000.0, 50.8953713390002], [1442643264.0, 89260000.0, 50.95357834859034], [1442644992.0, 89280000.0, 51.011838633314774], [1442646720.0, 89300000.0, 51.070152229759294], [1442648448.0, 89320000.0, 51.12851917452662], [1442650176.0, 89340000.0, 51.18693950423602], [1442651904.0, 89360000.0, 51.245413255523765], [1442653632.0, 89380000.0, 51.30394046504267], [1442655360.0, 89400000.0, 51.36252116946239], [1442657088.0, 89420000.0, 51.42115540546947], [1442658816.0, 89440000.0, 51.479843209767125], [1442660544.0, 89460000.0, 51.538584619075294], [1442662272.0, 89480000.0, 51.59737967013091], [1442664000.0, 89500000.0, 51.65622839968748], [1442665728.0, 89520000.0, 51.71513084451549], [1442667456.0, 89540000.0, 51.77408704140211], [1442669184.0, 89560000.0, 51.83309702715129], [1442670912.0, 89580000.0, 51.89216083858394], [1442672640.0, 89600000.0, 51.95127851253762], [1442674368.0, 89620000.0, 52.01045008586674], [1442676096.0, 89640000.0, 52.06967559544265], [1442677824.0, 89660000.0, 52.128955078153325], [1442679552.0, 89680000.0, 52.188288570903794], [1442681280.0, 89700000.0, 52.24767611061572], [1442683008.0, 89720000.0, 52.307117734227646], [1442684736.0, 89740000.0, 52.36661347869507], [1442686464.0, 89760000.0, 52.42616338099021], [1442688192.0, 89780000.0, 52.485767478102105], [1442689920.0, 89800000.0, 52.545425807036814], [1442691648.0, 89820000.0, 52.605138404817026], [1442693376.0, 89840000.0, 52.66490530848254], [1442695104.0, 89860000.0, 52.72472655508978], [1442696832.0, 89880000.0, 52.784602181712124], [1442698560.0, 89900000.0, 52.844532225439906], [1442700288.0, 89920000.0, 52.90451672338022], [1442702016.0, 89940000.0, 52.964555712657], [1442703744.0, 89960000.0, 53.02464923041128], [1442705472.0, 89980000.0, 53.084797313800685], [1442707200.0, 90000000.0, 53.145], [1442708928.0, 90020000.0, 53.20525732620073], [1442710656.0, 90040000.0, 53.26556932961126], [1442712384.0, 90060000.0, 53.32593604745704], 
[1442714112.0, 90080000.0, 53.38635751698023], [1442715840.0, 90100000.0, 53.44683377544009], [1442717568.0, 90120000.0, 53.50736486011265], [1442719296.0, 90140000.0, 53.567950808290824], [1442721024.0, 90160000.0, 53.62859165728463], [1442722752.0, 90180000.0, 53.689287444420856], [1442724480.0, 90200000.0, 53.75003820704318], [1442726208.0, 90220000.0, 53.810843982512424], [1442727936.0, 90240000.0, 53.87170480820607], [1442729664.0, 90260000.0, 53.932620721518816], [1442731392.0, 90280000.0, 53.993591759862085], [1442733120.0, 90300000.0, 54.05461796066428], [1442734848.0, 90320000.0, 54.1156993613709], [1442736576.0, 90340000.0, 54.17683599944424], [1442738304.0, 90360000.0, 54.238027912363556], [1442740032.0, 90380000.0, 54.29927513762523], [1442741760.0, 90400000.0, 54.360577712742376], [1442743488.0, 90420000.0, 54.42193567524531], [1442745216.0, 90440000.0, 54.48334906268116], [1442746944.0, 90460000.0, 54.54481791261401], [1442748672.0, 90480000.0, 54.60634226262511], [1442750400.0, 90500000.0, 54.66792215031252], [1442752128.0, 90520000.0, 54.7295576132913], [1442753856.0, 90540000.0, 54.79124868919364], [1442755584.0, 90560000.0, 54.85299541566853], [1442757312.0, 90580000.0, 54.91479783038217], [1442759040.0, 90600000.0, 54.976655971017614], [1442760768.0, 90620000.0, 55.0385698752749], [1442762496.0, 90640000.0, 55.10053958087125], [1442764224.0, 90660000.0, 55.16256512554074], [1442765952.0, 90680000.0, 55.22464654703445], [1442767680.0, 90700000.0, 55.286783883120705], [1442769408.0, 90720000.0, 55.34897717158455], [1442771136.0, 90740000.0, 55.41122645022835], [1442772864.0, 90760000.0, 55.473531756871296], [1442774592.0, 90780000.0, 55.53589312934962], [1442776320.0, 90800000.0, 55.59831060551681], [1442778048.0, 90820000.0, 55.66078422324317], [1442779776.0, 90840000.0, 55.7233140204161], [1442781504.0, 90860000.0, 55.785900034940184], [1442783232.0, 90880000.0, 55.84854230473688], [1442784960.0, 90900000.0, 55.911240867744894], [1442786688.0, 90920000.0, 55.97399576191985], [1442788416.0, 90940000.0, 56.03680702523444], [1442790144.0, 90960000.0, 56.09967469567859], [1442791872.0, 90980000.0, 56.16259881125913], [1442793600.0, 91000000.0, 56.22557941], [1442795328.0, 91020000.0, 56.288616529942324], [1442797056.0, 91040000.0, 56.35171020914418], [1442798784.0, 91060000.0, 56.414860485680876], [1442800512.0, 91080000.0, 56.47806739764472], [1442802240.0, 91100000.0, 56.54133098314507], [1442803968.0, 91120000.0, 56.6046512803086], [1442805696.0, 91140000.0, 56.66802832727888], [1442807424.0, 91160000.0, 56.731462162216594], [1442809152.0, 91180000.0, 56.79495282329973], [1442810880.0, 91200000.0, 56.858500348723176], [1442812608.0, 91220000.0, 56.92210477669916], [1442814336.0, 91240000.0, 56.985766145456815], [1442816064.0, 91260000.0, 57.04948449324248], [1442817792.0, 91280000.0, 57.11325985831981], [1442819520.0, 91300000.0, 57.177092278969326], [1442821248.0, 91320000.0, 57.24098179348877], [1442822976.0, 91340000.0, 57.3049284401932], [1442824704.0, 91360000.0, 57.36893225741455], [1442826432.0, 91380000.0, 57.432993283502185], [1442828160.0, 91400000.0, 57.49711155682242], [1442829888.0, 91420000.0, 57.56128711575875], [1442831616.0, 91440000.0, 57.625519998711994], [1442833344.0, 91460000.0, 57.68981024409998], [1442835072.0, 91480000.0, 57.75415789035771], [1442836800.0, 91500000.0, 57.81856297593752], [1442838528.0, 91520000.0, 57.8830255393087], [1442840256.0, 91540000.0, 57.94754561895795], [1442841984.0, 91560000.0, 58.01212325338902], [1442843712.0, 
91580000.0, 58.076758481122816], [1442845440.0, 91600000.0, 58.14145134069762], [1442847168.0, 91620000.0, 58.20620187066871], [1442848896.0, 91640000.0, 58.27101010960864], [1442850624.0, 91660000.0, 58.33587609610729], [1442852352.0, 91680000.0, 58.40079986877154], [1442854080.0, 91700000.0, 58.46578146622569], [1442855808.0, 91720000.0, 58.53082092711113], [1442857536.0, 91740000.0, 58.595918290086416], [1442859264.0, 91760000.0, 58.66107359382757], [1442860992.0, 91780000.0, 58.72628687702759], [1442862720.0, 91800000.0, 58.791558178396784], [1442864448.0, 91820000.0, 58.85688753666284], [1442866176.0, 91840000.0, 58.92227499057045], [1442867904.0, 91860000.0, 58.98772057888177], [1442869632.0, 91880000.0, 59.0532243403761], [1442871360.0, 91900000.0, 59.11878631384989], [1442873088.0, 91920000.0, 59.1844065381171], [1442874816.0, 91940000.0, 59.25008505200874], [1442876544.0, 91960000.0, 59.315821894373116], [1442878272.0, 91980000.0, 59.38161710407592], [1442880000.0, 92000000.0, 59.447470719999984], [1442881728.0, 92020000.0, 59.51338278104552], [1442883456.0, 92040000.0, 59.57935332612994], [1442885184.0, 92060000.0, 59.64538239418791], [1442886912.0, 92080000.0, 59.71147002417157], [1442888640.0, 92100000.0, 59.77761625505013], [1442890368.0, 92120000.0, 59.84382112581016], [1442892096.0, 92140000.0, 59.91008467545567], [1442893824.0, 92160000.0, 59.97640694300775], [1442895552.0, 92180000.0, 60.04278796750501], [1442897280.0, 92200000.0, 60.109227788003224], [1442899008.0, 92220000.0, 60.17572644357546], [1442900736.0, 92240000.0, 60.24228397331229], [1442902464.0, 92260000.0, 60.308900416321436], [1442904192.0, 92280000.0, 60.37557581172793], [1442905920.0, 92300000.0, 60.44231019867432], [1442907648.0, 92320000.0, 60.50910361632025], [1442909376.0, 92340000.0, 60.57595610384296], [1442911104.0, 92360000.0, 60.6428677004368], [1442912832.0, 92380000.0, 60.70983844531354], [1442914560.0, 92400000.0, 60.7768683777024], [1442916288.0, 92420000.0, 60.84395753684984], [1442918016.0, 92440000.0, 60.911105962019626], [1442919744.0, 92460000.0, 60.9783136924931], [1442921472.0, 92480000.0, 61.04558076756871], [1442923200.0, 92500000.0, 61.1129072265625], [1442924928.0, 92520000.0, 61.18029310880775], [1442926656.0, 92540000.0, 61.247738453655074], [1442928384.0, 92560000.0, 61.315243300472666], [1442930112.0, 92580000.0, 61.38280768864584], [1442931840.0, 92600000.0, 61.45043165757759], [1442933568.0, 92620000.0, 61.51811524668808], [1442935296.0, 92640000.0, 61.58585849541484], [1442937024.0, 92660000.0, 61.65366144321305], [1442938752.0, 92680000.0, 61.72152412955507], [1442940480.0, 92700000.0, 61.789446593930684], [1442942208.0, 92720000.0, 61.85742887584724], [1442943936.0, 92740000.0, 61.92547101482929], [1442945664.0, 92760000.0, 61.99357305041903], [1442947392.0, 92780000.0, 62.06173502217591], [1442949120.0, 92800000.0, 62.12995696967677], [1442950848.0, 92820000.0, 62.19823893251611], [1442952576.0, 92840000.0, 62.26658095030566], [1442954304.0, 92860000.0, 62.33498306267457], [1442956032.0, 92880000.0, 62.40344530926964], [1442957760.0, 92900000.0, 62.47196772975487], [1442959488.0, 92920000.0, 62.540550363811924], [1442961216.0, 92940000.0, 62.609193251139786], [1442962944.0, 92960000.0, 62.67789643145483], [1442964672.0, 92980000.0, 62.746659944491135], [1442966400.0, 93000000.0, 62.815483830000026], [1442968128.0, 93020000.0, 62.88436812775032], [1442969856.0, 93040000.0, 62.95331287752845], [1442971584.0, 93060000.0, 63.02231811913814], [1442973312.0, 93080000.0, 
63.09138389240079], [1442975040.0, 93100000.0, 63.16051023715511], [1442976768.0, 93120000.0, 63.229697193257316], [1442978496.0, 93140000.0, 63.29894480058127], [1442980224.0, 93160000.0, 63.36825309901816], [1442981952.0, 93180000.0, 63.43762212847669], [1442983680.0, 93200000.0, 63.5070519288832], [1442985408.0, 93220000.0, 63.57654254018137], [1442987136.0, 93240000.0, 63.646094002332575], [1442988864.0, 93260000.0, 63.71570635531551], [1442990592.0, 93280000.0, 63.78537963912643], [1442992320.0, 93300000.0, 63.85511389377929], [1442994048.0, 93320000.0, 63.92490915930538], [1442995776.0, 93340000.0, 63.99476547575352], [1442997504.0, 93360000.0, 64.0646828831902], [1442999232.0, 93380000.0, 64.13466142169929], [1443000960.0, 93400000.0, 64.2047011313824], [1443002688.0, 93420000.0, 64.27480205235847], [1443004416.0, 93440000.0, 64.34496422476406], [1443006144.0, 93460000.0, 64.41518768875342], [1443007872.0, 93480000.0, 64.48547248449816], [1443009600.0, 93500000.0, 64.55581865218748], [1443011328.0, 93520000.0, 64.62622623202834], [1443013056.0, 93540000.0, 64.696695264245], [1443014784.0, 93560000.0, 64.76722578907949], [1443016512.0, 93580000.0, 64.83781784679135], [1443018240.0, 93600000.0, 64.90847147765757], [1443019968.0, 93620000.0, 64.97918672197302], [1443021696.0, 93640000.0, 65.0499636200499], [1443023424.0, 93660000.0, 65.12080221221801], [1443025152.0, 93680000.0, 65.19170253882496], [1443026880.0, 93700000.0, 65.26266464023568], [1443028608.0, 93720000.0, 65.33368855683295], [1443030336.0, 93740000.0, 65.40477432901704], [1443032064.0, 93760000.0, 65.4759219972057], [1443033792.0, 93780000.0, 65.54713160183462], [1443035520.0, 93800000.0, 65.61840318335685], [1443037248.0, 93820000.0, 65.68973678224299], [1443038976.0, 93840000.0, 65.76113243898162], [1443040704.0, 93860000.0, 65.83259019407856], [1443042432.0, 93880000.0, 65.9041100880576], [1443044160.0, 93900000.0, 65.97569216145993], [1443045888.0, 93920000.0, 66.04733645484438], [1443047616.0, 93940000.0, 66.11904300878761], [1443049344.0, 93960000.0, 66.1908118638838], [1443051072.0, 93980000.0, 66.26264306074472], [1443052800.0, 94000000.0, 66.33453664000001], [1443054528.0, 94020000.0, 66.40649264229671], [1443056256.0, 94040000.0, 66.47851110829977], [1443057984.0, 94060000.0, 66.55059207869164], [1443059712.0, 94080000.0, 66.62273559417243], [1443061440.0, 94100000.0, 66.6949416954601], [1443063168.0, 94120000.0, 66.76721042329014], [1443064896.0, 94140000.0, 66.83954181841567], [1443066624.0, 94160000.0, 66.91193592160771], [1443068352.0, 94180000.0, 66.98439277365476], [1443070080.0, 94200000.0, 67.0569124153632], [1443071808.0, 94220000.0, 67.12949488755696], [1443073536.0, 94240000.0, 67.20214023107765], [1443075264.0, 94260000.0, 67.27484848678479], [1443076992.0, 94280000.0, 67.34761969555541], [1443078720.0, 94300000.0, 67.4204538982843], [1443080448.0, 94320000.0, 67.49335113588407], [1443082176.0, 94340000.0, 67.56631144928485], [1443083904.0, 94360000.0, 67.63933487943478], [1443085632.0, 94380000.0, 67.71242146729952], [1443087360.0, 94400000.0, 67.78557125386239], [1443089088.0, 94420000.0, 67.85878428012471], [1443090816.0, 94440000.0, 67.93206058710537], [1443092544.0, 94460000.0, 68.00540021584092], [1443094272.0, 94480000.0, 68.07880320738596], [1443096000.0, 94500000.0, 68.15226960281247], [1443097728.0, 94520000.0, 68.22579944321053], [1443099456.0, 94540000.0, 68.29939276968776], [1443101184.0, 94560000.0, 68.37304962336952], [1443102912.0, 94580000.0, 68.44677004539918], [1443104640.0, 
94600000.0, 68.52055407693766], [1443106368.0, 94620000.0, 68.59440175916356], [1443108096.0, 94640000.0, 68.6683131332737], [1443109824.0, 94660000.0, 68.74228824048215], [1443111552.0, 94680000.0, 68.81632712202124], [1443113280.0, 94700000.0, 68.89042981914072], [1443115008.0, 94720000.0, 68.96459637310828], [1443116736.0, 94740000.0, 69.03882682520951], [1443118464.0, 94760000.0, 69.11312121674766], [1443120192.0, 94780000.0, 69.18747958904373], [1443121920.0, 94800000.0, 69.26190198343683], [1443123648.0, 94820000.0, 69.33638844128346], [1443125376.0, 94840000.0, 69.41093900395836], [1443127104.0, 94860000.0, 69.48555371285383], [1443128832.0, 94880000.0, 69.56023260937995], [1443130560.0, 94900000.0, 69.63497573496491], [1443132288.0, 94920000.0, 69.70978313105445], [1443134016.0, 94940000.0, 69.78465483911226], [1443135744.0, 94960000.0, 69.8595909006199], [1443137472.0, 94980000.0, 69.93459135707671], [1443139200.0, 95000000.0, 70.00965625], [1443140928.0, 95020000.0, 70.08478562092476], [1443142656.0, 95040000.0, 70.15997951140389], [1443144384.0, 95060000.0, 70.23523796300827], [1443146112.0, 95080000.0, 70.31056101732648], [1443147840.0, 95100000.0, 70.38594871596507], [1443149568.0, 95120000.0, 70.4614011005485], [1443151296.0, 95140000.0, 70.53691821271886], [1443153024.0, 95160000.0, 70.61250009413648], [1443154752.0, 95180000.0, 70.6881467864793], [1443156480.0, 95200000.0, 70.76385833144319], [1443158208.0, 95220000.0, 70.83963477074207], [1443159936.0, 95240000.0, 70.91547614610752], [1443161664.0, 95260000.0, 70.99138249928924], [1443163392.0, 95280000.0, 71.06735387205475], [1443165120.0, 95300000.0, 71.14339030618926], [1443166848.0, 95320000.0, 71.21949184349634], [1443168576.0, 95340000.0, 71.29565852579707], [1443170304.0, 95360000.0, 71.3718903949306], [1443172032.0, 95380000.0, 71.44818749275406], [1443173760.0, 95400000.0, 71.52454986114236], [1443175488.0, 95420000.0, 71.60097754198856], [1443177216.0, 95440000.0, 71.6774705772034], [1443178944.0, 95460000.0, 71.75402900871565], [1443180672.0, 95480000.0, 71.83065287847214], [1443182400.0, 95500000.0, 71.90734222843754], [1443184128.0, 95520000.0, 71.98409710059433], [1443185856.0, 95540000.0, 72.0609175369433], [1443187584.0, 95560000.0, 72.13780357950277], [1443189312.0, 95580000.0, 72.21475527030941], [1443191040.0, 95600000.0, 72.2917726514176], [1443192768.0, 95620000.0, 72.36885576489973], [1443194496.0, 95640000.0, 72.44600465284628], [1443196224.0, 95660000.0, 72.52321935736559], [1443197952.0, 95680000.0, 72.60049992058391], [1443199680.0, 95700000.0, 72.67784638464572], [1443201408.0, 95720000.0, 72.75525879171319], [1443203136.0, 95740000.0, 72.83273718396677], [1443204864.0, 95760000.0, 72.91028160360473], [1443206592.0, 95780000.0, 72.98789209284324], [1443208320.0, 95800000.0, 73.06556869391679], [1443210048.0, 95820000.0, 73.14331144907761], [1443211776.0, 95840000.0, 73.22112040059592], [1443213504.0, 95860000.0, 73.29899559076021], [1443215232.0, 95880000.0, 73.37693706187672], [1443216960.0, 95900000.0, 73.45494485626989], [1443218688.0, 95920000.0, 73.5330190162821], [1443220416.0, 95940000.0, 73.61115958427368], [1443222144.0, 95960000.0, 73.68936660262324], [1443223872.0, 95980000.0, 73.7676401137272], [1443225600.0, 96000000.0, 73.84598016], [1443227328.0, 96020000.0, 73.92438678387435], [1443229056.0, 96040000.0, 74.0028600278008], [1443230784.0, 96060000.0, 74.08139993424811], [1443232512.0, 96080000.0, 74.16000654570294], [1443234240.0, 96100000.0, 74.23867990467008], [1443235968.0, 
96120000.0, 74.31742005367245], [1443237696.0, 96140000.0, 74.39622703525092], [1443239424.0, 96160000.0, 74.47510089196442], [1443241152.0, 96180000.0, 74.55404166639018], [1443242880.0, 96200000.0, 74.63304940112316], [1443244608.0, 96220000.0, 74.71212413877679], [1443246336.0, 96240000.0, 74.79126592198227], [1443248064.0, 96260000.0, 74.87047479338894], [1443249792.0, 96280000.0, 74.94975079566444], [1443251520.0, 96300000.0, 75.02909397149433], [1443253248.0, 96320000.0, 75.10850436358221], [1443254976.0, 96340000.0, 75.18798201465006], [1443256704.0, 96360000.0, 75.26752696743758], [1443258432.0, 96380000.0, 75.34713926470303], [1443260160.0, 96400000.0, 75.42681894922242], [1443261888.0, 96420000.0, 75.50656606379], [1443263616.0, 96440000.0, 75.58638065121825], [1443265344.0, 96460000.0, 75.6662627543376], [1443267072.0, 96480000.0, 75.74621241599674], [1443268800.0, 96500000.0, 75.82622967906251], [1443270528.0, 96520000.0, 75.90631458641973], [1443272256.0, 96540000.0, 75.98646718097157], [1443273984.0, 96560000.0, 76.06668750563928], [1443275712.0, 96580000.0, 76.14697560336204], [1443277440.0, 96600000.0, 76.22733151709761], [1443279168.0, 96620000.0, 76.30775528982154], [1443280896.0, 96640000.0, 76.38824696452768], [1443282624.0, 96660000.0, 76.46880658422813], [1443284352.0, 96680000.0, 76.54943419195297], [1443286080.0, 96700000.0, 76.63012983075068], [1443287808.0, 96720000.0, 76.71089354368777], [1443289536.0, 96740000.0, 76.79172537384885], [1443291264.0, 96760000.0, 76.872625364337], [1443292992.0, 96780000.0, 76.95359355827324], [1443294720.0, 96800000.0, 77.03462999879677], [1443296448.0, 96820000.0, 77.11573472906528], [1443298176.0, 96840000.0, 77.19690779225428], [1443299904.0, 96860000.0, 77.27814923155782], [1443301632.0, 96880000.0, 77.35945909018793], [1443303360.0, 96900000.0, 77.44083741137487], [1443305088.0, 96920000.0, 77.52228423836735], [1443306816.0, 96940000.0, 77.60379961443199], [1443308544.0, 96960000.0, 77.68538358285376], [1443310272.0, 96980000.0, 77.76703618693598], [1443312000.0, 97000000.0, 77.84875746999995], [1443313728.0, 97020000.0, 77.93054747538557], [1443315456.0, 97040000.0, 78.01240624645058], [1443317184.0, 97060000.0, 78.09433382657114], [1443318912.0, 97080000.0, 78.17633025914179], [1443320640.0, 97100000.0, 78.25839558757514], [1443322368.0, 97120000.0, 78.340529855302], [1443324096.0, 97140000.0, 78.42273310577173], [1443325824.0, 97160000.0, 78.50500538245159], [1443327552.0, 97180000.0, 78.58734672882744], [1443329280.0, 97200000.0, 78.66975718840322], [1443331008.0, 97220000.0, 78.75223680470108], [1443332736.0, 97240000.0, 78.83478562126173], [1443334464.0, 97260000.0, 78.91740368164386], [1443336192.0, 97280000.0, 79.00009102942457], [1443337920.0, 97300000.0, 79.08284770819932], [1443339648.0, 97320000.0, 79.16567376158169], [1443341376.0, 97340000.0, 79.24856923320378], [1443343104.0, 97360000.0, 79.33153416671585], [1443344832.0, 97380000.0, 79.41456860578639], [1443346560.0, 97400000.0, 79.49767259410241], [1443348288.0, 97420000.0, 79.58084617536908], [1443350016.0, 97440000.0, 79.66408939330988], [1443351744.0, 97460000.0, 79.74740229166673], [1443353472.0, 97480000.0, 79.83078491419974], [1443355200.0, 97500000.0, 79.9142373046875], [1443356928.0, 97520000.0, 79.99775950692678], [1443358656.0, 97540000.0, 80.0813515647327], [1443360384.0, 97560000.0, 80.16501352193889], [1443362112.0, 97580000.0, 80.24874542239708], [1443363840.0, 97600000.0, 80.33254730997758], [1443365568.0, 97620000.0, 80.41641922856893], 
[1443367296.0, 97640000.0, 80.50036122207786], [1443369024.0, 97660000.0, 80.58437333442988], [1443370752.0, 97680000.0, 80.66845560956853], [1443372480.0, 97700000.0, 80.75260809145568], [1443374208.0, 97720000.0, 80.8368308240719], [1443375936.0, 97740000.0, 80.92112385141574], [1443377664.0, 97760000.0, 81.00548721750447], [1443379392.0, 97780000.0, 81.08992096637355], [1443381120.0, 97800000.0, 81.17442514207677], [1443382848.0, 97820000.0, 81.25899978868658], [1443384576.0, 97840000.0, 81.34364495029351], [1443386304.0, 97860000.0, 81.4283606710066], [1443388032.0, 97880000.0, 81.5131469949535], [1443389760.0, 97900000.0, 81.59800396627985], [1443391488.0, 97920000.0, 81.68293162915018], [1443393216.0, 97940000.0, 81.76793002774701], [1443394944.0, 97960000.0, 81.85299920627146], [1443396672.0, 97980000.0, 81.93813920894317], [1443398400.0, 98000000.0, 82.02335008000004], [1443400128.0, 98020000.0, 82.10863186369835], [1443401856.0, 98040000.0, 82.19398460431312], [1443403584.0, 98060000.0, 82.27940834613739], [1443405312.0, 98080000.0, 82.36490313348305], [1443407040.0, 98100000.0, 82.45046901068011], [1443408768.0, 98120000.0, 82.53610602207716], [1443410496.0, 98140000.0, 82.62181421204131], [1443412224.0, 98160000.0, 82.70759362495802], [1443413952.0, 98180000.0, 82.79344430523112], [1443415680.0, 98200000.0, 82.87936629728321], [1443417408.0, 98220000.0, 82.965359645555], [1443419136.0, 98240000.0, 83.05142439450599], [1443420864.0, 98260000.0, 83.13756058861397], [1443422592.0, 98280000.0, 83.22376827237508], [1443424320.0, 98300000.0, 83.31004749030431], [1443426048.0, 98320000.0, 83.39639828693484], [1443427776.0, 98340000.0, 83.48282070681836], [1443429504.0, 98360000.0, 83.56931479452525], [1443431232.0, 98380000.0, 83.65588059464415], [1443432960.0, 98400000.0, 83.74251815178239], [1443434688.0, 98420000.0, 83.82922751056573], [1443436416.0, 98440000.0, 83.9160087156383], [1443438144.0, 98460000.0, 84.00286181166307], [1443439872.0, 98480000.0, 84.0897868433212], [1443441600.0, 98500000.0, 84.17678385531248], [1443443328.0, 98520000.0, 84.26385289235539], [1443445056.0, 98540000.0, 84.35099399918663], [1443446784.0, 98560000.0, 84.43820722056174], [1443448512.0, 98580000.0, 84.52549260125457], [1443450240.0, 98600000.0, 84.61285018605757], [1443451968.0, 98620000.0, 84.70028001978186], [1443453696.0, 98640000.0, 84.78778214725693], [1443455424.0, 98660000.0, 84.87535661333084], [1443457152.0, 98680000.0, 84.96300346287039], [1443458880.0, 98700000.0, 85.05072274076068], [1443460608.0, 98720000.0, 85.1385144919056], [1443462336.0, 98740000.0, 85.22637876122747], [1443464064.0, 98760000.0, 85.31431559366716], [1443465792.0, 98780000.0, 85.40232503418424], [1443467520.0, 98800000.0, 85.49040712775682], [1443469248.0, 98820000.0, 85.57856191938143], [1443470976.0, 98840000.0, 85.66678945407348], [1443472704.0, 98860000.0, 85.75508977686658], [1443474432.0, 98880000.0, 85.84346293281345], [1443476160.0, 98900000.0, 85.93190896698493], [1443477888.0, 98920000.0, 86.02042792447061], [1443479616.0, 98940000.0, 86.10901985037886], [1443481344.0, 98960000.0, 86.19768478983644], [1443483072.0, 98980000.0, 86.28642278798876], [1443484800.0, 99000000.0, 86.37523389000002], [1443486528.0, 99020000.0, 86.46411814105275], [1443488256.0, 99040000.0, 86.55307558634841], [1443489984.0, 99060000.0, 86.6421062711069], [1443491712.0, 99080000.0, 86.73121024056665], [1443493440.0, 99100000.0, 86.8203875399851], [1443495168.0, 99120000.0, 86.90963821463798], [1443496896.0, 99140000.0, 
86.9989623098197], [1443498624.0, 99160000.0, 87.08835987084358], [1443500352.0, 99180000.0, 87.17783094304119], [1443502080.0, 99200000.0, 87.26737557176318], [1443503808.0, 99220000.0, 87.3569938023786], [1443505536.0, 99240000.0, 87.44668568027507], [1443507264.0, 99260000.0, 87.53645125085924], [1443508992.0, 99280000.0, 87.62629055955605], [1443510720.0, 99300000.0, 87.71620365180928], [1443512448.0, 99320000.0, 87.80619057308152], [1443514176.0, 99340000.0, 87.89625136885368], [1443515904.0, 99360000.0, 87.98638608462585], [1443517632.0, 99380000.0, 88.07659476591635], [1443519360.0, 99400000.0, 88.16687745826239], [1443521088.0, 99420000.0, 88.25723420721994], [1443522816.0, 99440000.0, 88.3476650583636], [1443524544.0, 99460000.0, 88.43817005728657], [1443526272.0, 99480000.0, 88.52874924960099], [1443528000.0, 99500000.0, 88.61940268093747], [1443529728.0, 99520000.0, 88.71013039694557], [1443531456.0, 99540000.0, 88.80093244329339], [1443533184.0, 99560000.0, 88.89180886566778], [1443534912.0, 99580000.0, 88.98275970977441], [1443536640.0, 99600000.0, 89.07378502133766], [1443538368.0, 99620000.0, 89.16488484610042], [1443540096.0, 99640000.0, 89.25605922982476], [1443541824.0, 99660000.0, 89.34730821829099], [1443543552.0, 99680000.0, 89.43863185729866], [1443545280.0, 99700000.0, 89.53003019266575], [1443547008.0, 99720000.0, 89.6215032702289], [1443548736.0, 99740000.0, 89.71305113584395], [1443550464.0, 99760000.0, 89.8046738353851], [1443552192.0, 99780000.0, 89.89637141474539], [1443553920.0, 99800000.0, 89.98814391983682], [1443555648.0, 99820000.0, 90.07999139658989], [1443557376.0, 99840000.0, 90.17191389095422], [1443559104.0, 99860000.0, 90.2639114488979], [1443560832.0, 99880000.0, 90.3559841164078], [1443562560.0, 99900000.0, 90.44813193948993], [1443564288.0, 99920000.0, 90.54035496416871], [1443566016.0, 99940000.0, 90.6326532364875], [1443567744.0, 99960000.0, 90.72502680250857], [1443569472.0, 99980000.0, 90.81747570831276]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/app/demo/data/runs.json b/tensorflow/tensorboard/app/demo/data/runs.json
new file mode 100644
index 0000000000..3ee46d9ef1
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/runs.json
@@ -0,0 +1,542 @@
+{
+ "cubic_function_with_a_very_long_name_how_will_it_truncate": {
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/raihu",
+ "electric/voltorb",
+ "fire/arcanine",
+ "fire/charmander",
+ "fire/magmar",
+ "grass/bulbasaur",
+ "grass/ivysaur",
+ "grass/venasaur",
+ "ground/diglet",
+ "psychic/abra",
+ "psychic/mew",
+ "psychic/mewtwo",
+ "water/goldeen",
+ "water/lapras",
+ "water/psychic/starmie",
+ "water/squirtle",
+ "water/tentacool"
+ ]
+ },
+ "sin": {
+ "graph": true,
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/pikachu",
+ "electric/voltorb",
+ "fire/arcanine",
+ "fire/charmander",
+ "fire/flying/charizard/except/made/really/long/to/check/truncation/characteristics",
+ "fire/flying/moltres",
+ "fire/vulpix",
+ "grass/ivysaur",
+ "grass/venasaur",
+ "ground/dugtrio",
+ "psychic/abra",
+ "psychic/mewtwo",
+ "water/goldeen",
+ "water/lapras",
+ "water/psychic/starmie",
+ "water/squirtle",
+ "water/tentacool"
+ ]
+ },
+ "poly5": {
+ "graph": true,
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/pikachu",
+ "electric/raihu",
+ "electric/voltorb",
+ "fire/charmander",
+ "fire/flying/charizard",
+ "fire/flying/moltres",
+ "fire/magmar",
+ "fire/vulpix",
+ "grass/bulbasaur",
+ "grass/venasaur",
+ "ground/diglet",
+ "psychic/abra",
+ "psychic/mewtwo",
+ "water/lapras",
+ "water/psychic/starmie",
+ "water/squirtle",
+ "water/tentacool"
+ ]
+ },
+ "cos": {
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/pikachu",
+ "electric/raihu",
+ "electric/voltorb",
+ "fire/arcanine",
+ "fire/charmander",
+ "fire/flying/moltres",
+ "fire/vulpix",
+ "grass/bulbasaur",
+ "grass/venasaur",
+ "ground/diglet",
+ "psychic/abra",
+ "psychic/mew",
+ "psychic/mewtwo",
+ "water/goldeen",
+ "water/psychic/starmie",
+ "water/squirtle",
+ "water/tentacool"
+ ]
+ },
+ "sq": {
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/pikachu",
+ "electric/voltorb",
+ "fire/arcanine",
+ "fire/flying/charizard",
+ "fire/flying/moltres",
+ "fire/vulpix",
+ "grass/bulbasaur",
+ "grass/venasaur",
+ "ground/diglet",
+ "ground/dugtrio",
+ "psychic/abra",
+ "psychic/mew",
+ "psychic/mewtwo",
+ "water/goldeen",
+ "water/lapras",
+ "water/psychic/starmie",
+ "water/squirtle"
+ ]
+ },
+ "linear_function_can_get_a_long_name_too!": {
+ "images": [],
+ "histograms": [],
+ "scalars": [
+ "electric/pikachu",
+ "electric/raihu",
+ "electric/voltorb",
+ "fire/arcanine",
+ "fire/charmander",
+ "fire/flying/moltres",
+ "fire/magmar",
+ "fire/vulpix",
+ "grass/bulbasaur",
+ "grass/ivysaur",
+ "grass/venasaur",
+ "psychic/abra",
+ "psychic/mew",
+ "water/lapras",
+ "water/psychic/starmie",
+ "water/squirtle",
+ "water/tentacool"
+ ]
+ },
+ "z_hidden+0": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+1": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+2": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+3": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+4": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+5": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+6": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+7": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+8": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+9": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+10": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+11": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+12": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+13": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+14": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+15": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+16": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+17": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+18": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+19": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+20": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+21": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+22": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+23": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+24": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+25": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+26": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+27": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+28": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+29": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+30": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+31": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+32": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+33": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+34": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+35": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+36": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+37": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+38": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+39": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+40": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+41": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+42": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+43": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+44": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+45": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+46": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+47": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+48": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+49": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+50": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+51": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+52": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+53": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+54": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+55": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+56": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+57": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+58": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+59": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+60": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+61": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+62": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+63": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+64": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+65": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+66": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+67": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+68": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+69": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+70": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+71": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+72": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+73": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+74": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+75": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+76": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+77": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+78": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ },
+ "z_hidden+79": {
+ "images": [],
+ "histograms": [],
+ "scalars": []
+ }
+}
diff --git a/tensorflow/tensorboard/app/demo/data/sin-graph.pbtxt b/tensorflow/tensorboard/app/demo/data/sin-graph.pbtxt
new file mode 100644
index 0000000000..319ec1c4bd
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/sin-graph.pbtxt
@@ -0,0 +1,14 @@
+node {
+ name: "Q"
+ op: "Input"
+}
+node {
+ name: "W"
+ op: "Input"
+}
+node {
+ name: "X"
+ op: "MatMul"
+ input: "Q"
+ input: "W"
+}
diff --git a/tensorflow/tensorboard/app/demo/data/sin.json b/tensorflow/tensorboard/app/demo/data/sin.json
new file mode 100644
index 0000000000..a91fe63d54
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/sin.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 0.0], [1434932928.0, 20000.0, 0.012566039883352607], [1434934656.0, 40000.0, 0.02513009544333748], [1434936384.0, 60000.0, 0.03769018266993454], [1434938112.0, 80000.0, 0.050244318179769556], [1434939840.0, 100000.0, 0.06279051952931337], [1434941568.0, 120000.0, 0.07532680552793272], [1434943296.0, 140000.0, 0.08785119655074317], [1434945024.0, 160000.0, 0.1003617148512149], [1434946752.0, 180000.0, 0.11285638487348167], [1434948480.0, 200000.0, 0.12533323356430426], [1434950208.0, 220000.0, 0.13779029068463805], [1434951936.0, 240000.0, 0.15022558912075706], [1434953664.0, 260000.0, 0.16263716519488358], [1434955392.0, 280000.0, 0.17502305897527604], [1434957120.0, 300000.0, 0.1873813145857246], [1434958848.0, 320000.0, 0.19970998051440703], [1434960576.0, 340000.0, 0.21200710992205463], [1434962304.0, 360000.0, 0.22427076094938114], [1434964032.0, 380000.0, 0.23649899702372468], [1434965760.0, 400000.0, 0.2486898871648548], [1434967488.0, 420000.0, 0.26084150628989694], [1434969216.0, 440000.0, 0.27295193551732516], [1434970944.0, 460000.0, 0.2850192624699761], [1434972672.0, 480000.0, 0.2970415815770349], [1434974400.0, 500000.0, 0.3090169943749474], [1434976128.0, 520000.0, 0.3209436098072095], [1434977856.0, 540000.0, 0.3328195445229866], [1434979584.0, 560000.0, 0.34464292317451706], [1434981312.0, 580000.0, 0.35641187871325075], [1434983040.0, 600000.0, 0.3681245526846779], [1434984768.0, 620000.0, 0.37977909552180106], [1434986496.0, 640000.0, 0.3913736668372024], [1434988224.0, 660000.0, 0.40290643571366264], [1434989952.0, 680000.0, 0.41437558099328414], [1434991680.0, 700000.0, 0.4257792915650727], [1434993408.0, 720000.0, 0.4371157666509328], [1434995136.0, 740000.0, 0.44838321609003223], [1434996864.0, 760000.0, 0.4595798606214878], [1434998592.0, 780000.0, 0.4707039321653325], [1435000320.0, 800000.0, 0.4817536741017153], [1435002048.0, 820000.0, 0.49272734154829156], [1435003776.0, 840000.0, 0.5036232016357608], [1435005504.0, 860000.0, 0.5144395337815064], [1435007232.0, 880000.0, 0.5251746299612956], [1435008960.0, 900000.0, 0.5358267949789967], [1435010688.0, 920000.0, 0.5463943467342691], [1435012416.0, 940000.0, 0.556875616488188], [1435014144.0, 960000.0, 0.5672689491267565], [1435015872.0, 980000.0, 0.5775727034222676], [1435017600.0, 1000000.0, 0.5877852522924731], [1435019328.0, 1020000.0, 0.5979049830575188], [1435021056.0, 1040000.0, 0.6079302976946054], [1435022784.0, 1060000.0, 0.6178596130903343], [1435024512.0, 1080000.0, 0.6276913612907005], [1435026240.0, 1100000.0, 0.6374239897486896], [1435027968.0, 1120000.0, 0.6470559615694442], [1435029696.0, 1140000.0, 0.6565857557529565], [1435031424.0, 1160000.0, 0.6660118674342517], [1435033152.0, 1180000.0, 0.6753328081210244], [1435034880.0, 1200000.0, 0.6845471059286886], [1435036608.0, 1220000.0, 0.6936533058128049], [1435038336.0, 1240000.0, 0.7026499697988492], [1435040064.0, 1260000.0, 0.7115356772092853], [1435041792.0, 1280000.0, 0.7203090248879069], [1435043520.0, 1300000.0, 0.7289686274214116], [1435045248.0, 1320000.0, 0.7375131173581739], [1435046976.0, 1340000.0, 0.7459411454241821], [1435048704.0, 1360000.0, 0.7542513807361038], [1435050432.0, 1380000.0, 0.7624425110114479], [1435052160.0, 1400000.0, 0.7705132427757893], [1435053888.0, 1420000.0, 0.7784623015670233], [1435055616.0, 1440000.0, 0.7862884321366188], [1435057344.0, 1460000.0, 0.7939903986478353], [1435059072.0, 1480000.0, 0.8015669848708765], [1435060800.0, 1500000.0, 0.8090169943749475], [1435062528.0, 1520000.0, 
0.8163392507171839], [1435064256.0, 1540000.0, 0.8235325976284275], [1435065984.0, 1560000.0, 0.8305958991958126], [1435067712.0, 1580000.0, 0.8375280400421417], [1435069440.0, 1600000.0, 0.8443279255020151], [1435071168.0, 1620000.0, 0.8509944817946918], [1435072896.0, 1640000.0, 0.8575266561936523], [1435074624.0, 1660000.0, 0.8639234171928353], [1435076352.0, 1680000.0, 0.8701837546695257], [1435078080.0, 1700000.0, 0.8763066800438637], [1435079808.0, 1720000.0, 0.8822912264349532], [1435081536.0, 1740000.0, 0.8881364488135445], [1435083264.0, 1760000.0, 0.8938414241512637], [1435084992.0, 1780000.0, 0.899405251566371], [1435086720.0, 1800000.0, 0.9048270524660196], [1435088448.0, 1820000.0, 0.9101059706849957], [1435090176.0, 1840000.0, 0.9152411726209175], [1435091904.0, 1860000.0, 0.9202318473658703], [1435093632.0, 1880000.0, 0.925077206834458], [1435095360.0, 1900000.0, 0.9297764858882513], [1435097088.0, 1920000.0, 0.934328942456612], [1435098816.0, 1940000.0, 0.9387338576538741], [1435100544.0, 1960000.0, 0.9429905358928644], [1435102272.0, 1980000.0, 0.9470983049947443], [1435104000.0, 2000000.0, 0.9510565162951535], [1435105728.0, 2020000.0, 0.954864544746643], [1435107456.0, 2040000.0, 0.9585217890173758], [1435109184.0, 2060000.0, 0.9620276715860858], [1435110912.0, 2080000.0, 0.9653816388332739], [1435112640.0, 2100000.0, 0.9685831611286311], [1435114368.0, 2120000.0, 0.9716317329146739], [1435116096.0, 2140000.0, 0.9745268727865771], [1435117824.0, 2160000.0, 0.9772681235681935], [1435119552.0, 2180000.0, 0.9798550523842469], [1435121280.0, 2200000.0, 0.9822872507286886], [1435123008.0, 2220000.0, 0.9845643345292053], [1435124736.0, 2240000.0, 0.986685944207868], [1435126464.0, 2260000.0, 0.9886517447379141], [1435128192.0, 2280000.0, 0.9904614256966512], [1435129920.0, 2300000.0, 0.9921147013144779], [1435131648.0, 2320000.0, 0.9936113105200084], [1435133376.0, 2340000.0, 0.9949510169813002], [1435135104.0, 2360000.0, 0.9961336091431725], [1435136832.0, 2380000.0, 0.9971589002606139], [1435138560.0, 2400000.0, 0.9980267284282716], [1435140288.0, 2420000.0, 0.9987369566060175], [1435142016.0, 2440000.0, 0.9992894726405892], [1435143744.0, 2460000.0, 0.9996841892832999], [1435145472.0, 2480000.0, 0.9999210442038161], [1435147200.0, 2500000.0, 1.0], [1435148928.0, 2520000.0, 0.9999210442038161], [1435150656.0, 2540000.0, 0.9996841892832999], [1435152384.0, 2560000.0, 0.9992894726405892], [1435154112.0, 2580000.0, 0.9987369566060175], [1435155840.0, 2600000.0, 0.9980267284282716], [1435157568.0, 2620000.0, 0.9971589002606139], [1435159296.0, 2640000.0, 0.9961336091431725], [1435161024.0, 2660000.0, 0.9949510169813002], [1435162752.0, 2680000.0, 0.9936113105200084], [1435164480.0, 2700000.0, 0.9921147013144778], [1435166208.0, 2720000.0, 0.9904614256966512], [1435167936.0, 2740000.0, 0.988651744737914], [1435169664.0, 2760000.0, 0.986685944207868], [1435171392.0, 2780000.0, 0.9845643345292053], [1435173120.0, 2800000.0, 0.9822872507286886], [1435174848.0, 2820000.0, 0.9798550523842469], [1435176576.0, 2840000.0, 0.9772681235681935], [1435178304.0, 2860000.0, 0.9745268727865772], [1435180032.0, 2880000.0, 0.971631732914674], [1435181760.0, 2900000.0, 0.9685831611286312], [1435183488.0, 2920000.0, 0.965381638833274], [1435185216.0, 2940000.0, 0.9620276715860859], [1435186944.0, 2960000.0, 0.958521789017376], [1435188672.0, 2980000.0, 0.9548645447466431], [1435190400.0, 3000000.0, 0.9510565162951536], [1435192128.0, 3020000.0, 0.9470983049947443], [1435193856.0, 3040000.0, 
0.9429905358928645], [1435195584.0, 3060000.0, 0.9387338576538742], [1435197312.0, 3080000.0, 0.9343289424566121], [1435199040.0, 3100000.0, 0.9297764858882513], [1435200768.0, 3120000.0, 0.925077206834458], [1435202496.0, 3140000.0, 0.9202318473658704], [1435204224.0, 3160000.0, 0.9152411726209176], [1435205952.0, 3180000.0, 0.9101059706849957], [1435207680.0, 3200000.0, 0.9048270524660195], [1435209408.0, 3220000.0, 0.8994052515663712], [1435211136.0, 3240000.0, 0.8938414241512639], [1435212864.0, 3260000.0, 0.8881364488135446], [1435214592.0, 3280000.0, 0.8822912264349533], [1435216320.0, 3300000.0, 0.8763066800438635], [1435218048.0, 3320000.0, 0.8701837546695257], [1435219776.0, 3340000.0, 0.8639234171928354], [1435221504.0, 3360000.0, 0.8575266561936522], [1435223232.0, 3380000.0, 0.8509944817946917], [1435224960.0, 3400000.0, 0.844327925502015], [1435226688.0, 3420000.0, 0.8375280400421418], [1435228416.0, 3440000.0, 0.8305958991958129], [1435230144.0, 3460000.0, 0.8235325976284276], [1435231872.0, 3480000.0, 0.816339250717184], [1435233600.0, 3500000.0, 0.8090169943749475], [1435235328.0, 3520000.0, 0.8015669848708769], [1435237056.0, 3540000.0, 0.7939903986478355], [1435238784.0, 3560000.0, 0.786288432136619], [1435240512.0, 3580000.0, 0.7784623015670235], [1435242240.0, 3600000.0, 0.7705132427757893], [1435243968.0, 3620000.0, 0.7624425110114481], [1435245696.0, 3640000.0, 0.754251380736104], [1435247424.0, 3660000.0, 0.7459411454241822], [1435249152.0, 3680000.0, 0.7375131173581739], [1435250880.0, 3700000.0, 0.7289686274214114], [1435252608.0, 3720000.0, 0.720309024887907], [1435254336.0, 3740000.0, 0.7115356772092855], [1435256064.0, 3760000.0, 0.7026499697988492], [1435257792.0, 3780000.0, 0.6936533058128049], [1435259520.0, 3800000.0, 0.6845471059286888], [1435261248.0, 3820000.0, 0.6753328081210246], [1435262976.0, 3840000.0, 0.6660118674342517], [1435264704.0, 3860000.0, 0.6565857557529564], [1435266432.0, 3880000.0, 0.6470559615694442], [1435268160.0, 3900000.0, 0.6374239897486899], [1435269888.0, 3920000.0, 0.6276913612907006], [1435271616.0, 3940000.0, 0.6178596130903343], [1435273344.0, 3960000.0, 0.6079302976946053], [1435275072.0, 3980000.0, 0.5979049830575187], [1435276800.0, 4000000.0, 0.5877852522924732], [1435278528.0, 4020000.0, 0.5775727034222676], [1435280256.0, 4040000.0, 0.5672689491267564], [1435281984.0, 4060000.0, 0.5568756164881878], [1435283712.0, 4080000.0, 0.5463943467342692], [1435285440.0, 4100000.0, 0.535826794978997], [1435287168.0, 4120000.0, 0.525174629961296], [1435288896.0, 4140000.0, 0.5144395337815066], [1435290624.0, 4160000.0, 0.5036232016357609], [1435292352.0, 4180000.0, 0.4927273415482916], [1435294080.0, 4200000.0, 0.4817536741017156], [1435295808.0, 4220000.0, 0.4707039321653328], [1435297536.0, 4240000.0, 0.459579860621488], [1435299264.0, 4260000.0, 0.4483832160900323], [1435300992.0, 4280000.0, 0.4371157666509329], [1435302720.0, 4300000.0, 0.4257792915650729], [1435304448.0, 4320000.0, 0.4143755809932843], [1435306176.0, 4340000.0, 0.40290643571366275], [1435307904.0, 4360000.0, 0.39137366683720237], [1435309632.0, 4380000.0, 0.3797790955218014], [1435311360.0, 4400000.0, 0.36812455268467814], [1435313088.0, 4420000.0, 0.3564118787132508], [1435314816.0, 4440000.0, 0.34464292317451706], [1435316544.0, 4460000.0, 0.3328195445229865], [1435318272.0, 4480000.0, 0.3209436098072097], [1435320000.0, 4500000.0, 0.3090169943749475], [1435321728.0, 4520000.0, 0.2970415815770349], [1435323456.0, 4540000.0, 0.28501926246997605], 
[1435325184.0, 4560000.0, 0.27295193551732505], [1435326912.0, 4580000.0, 0.26084150628989705], [1435328640.0, 4600000.0, 0.24868988716485482], [1435330368.0, 4620000.0, 0.2364989970237246], [1435332096.0, 4640000.0, 0.224270760949381], [1435333824.0, 4660000.0, 0.2120071099220548], [1435335552.0, 4680000.0, 0.19970998051440705], [1435337280.0, 4700000.0, 0.18738131458572502], [1435339008.0, 4720000.0, 0.1750230589752763], [1435340736.0, 4740000.0, 0.16263716519488378], [1435342464.0, 4760000.0, 0.15022558912075712], [1435344192.0, 4780000.0, 0.13779029068463847], [1435345920.0, 4800000.0, 0.12533323356430454], [1435347648.0, 4820000.0, 0.11285638487348187], [1435349376.0, 4840000.0, 0.10036171485121498], [1435351104.0, 4860000.0, 0.08785119655074315], [1435352832.0, 4880000.0, 0.07532680552793304], [1435354560.0, 4900000.0, 0.06279051952931358], [1435356288.0, 4920000.0, 0.05024431817976966], [1435358016.0, 4940000.0, 0.037690182669934534], [1435359744.0, 4960000.0, 0.025130095443337813], [1435361472.0, 4980000.0, 0.012566039883352836], [1435363200.0, 5000000.0, 1.2246467991473532e-16], [1435364928.0, 5020000.0, -0.012566039883352592], [1435366656.0, 5040000.0, -0.02513009544333757], [1435368384.0, 5060000.0, -0.03769018266993429], [1435370112.0, 5080000.0, -0.05024431817976942], [1435371840.0, 5100000.0, -0.06279051952931335], [1435373568.0, 5120000.0, -0.07532680552793279], [1435375296.0, 5140000.0, -0.0878511965507429], [1435377024.0, 5160000.0, -0.10036171485121473], [1435378752.0, 5180000.0, -0.11285638487348164], [1435380480.0, 5200000.0, -0.12533323356430429], [1435382208.0, 5220000.0, -0.13779029068463822], [1435383936.0, 5240000.0, -0.15022558912075687], [1435385664.0, 5260000.0, -0.16263716519488353], [1435387392.0, 5280000.0, -0.1750230589752761], [1435389120.0, 5300000.0, -0.18738131458572477], [1435390848.0, 5320000.0, -0.19970998051440725], [1435392576.0, 5340000.0, -0.21200710992205454], [1435394304.0, 5360000.0, -0.2242707609493812], [1435396032.0, 5380000.0, -0.2364989970237248], [1435397760.0, 5400000.0, -0.24868988716485502], [1435399488.0, 5420000.0, -0.26084150628989683], [1435401216.0, 5440000.0, -0.2729519355173252], [1435402944.0, 5460000.0, -0.2850192624699762], [1435404672.0, 5480000.0, -0.2970415815770351], [1435406400.0, 5500000.0, -0.30901699437494773], [1435408128.0, 5520000.0, -0.3209436098072095], [1435409856.0, 5540000.0, -0.33281954452298673], [1435411584.0, 5560000.0, -0.3446429231745172], [1435413312.0, 5580000.0, -0.356411878713251], [1435415040.0, 5600000.0, -0.3681245526846783], [1435416768.0, 5620000.0, -0.37977909552180117], [1435418496.0, 5640000.0, -0.39137366683720215], [1435420224.0, 5660000.0, -0.4029064357136621], [1435421952.0, 5680000.0, -0.41437558099328364], [1435423680.0, 5700000.0, -0.42577929156507227], [1435425408.0, 5720000.0, -0.43711576665093266], [1435427136.0, 5740000.0, -0.4483832160900317], [1435428864.0, 5760000.0, -0.45957986062148737], [1435430592.0, 5780000.0, -0.4707039321653322], [1435432320.0, 5800000.0, -0.481753674101715], [1435434048.0, 5820000.0, -0.4927273415482914], [1435435776.0, 5840000.0, -0.5036232016357604], [1435437504.0, 5860000.0, -0.5144395337815061], [1435439232.0, 5880000.0, -0.5251746299612954], [1435440960.0, 5900000.0, -0.5358267949789964], [1435442688.0, 5920000.0, -0.546394346734269], [1435444416.0, 5940000.0, -0.5568756164881876], [1435446144.0, 5960000.0, -0.5672689491267562], [1435447872.0, 5980000.0, -0.5775727034222674], [1435449600.0, 6000000.0, -0.587785252292473], [1435451328.0, 6020000.0, 
-0.5979049830575185], [1435453056.0, 6040000.0, -0.607930297694605], [1435454784.0, 6060000.0, -0.6178596130903341], [1435456512.0, 6080000.0, -0.6276913612907004], [1435458240.0, 6100000.0, -0.6374239897486896], [1435459968.0, 6120000.0, -0.647055961569444], [1435461696.0, 6140000.0, -0.6565857557529562], [1435463424.0, 6160000.0, -0.6660118674342514], [1435465152.0, 6180000.0, -0.6753328081210244], [1435466880.0, 6200000.0, -0.6845471059286887], [1435468608.0, 6220000.0, -0.6936533058128047], [1435470336.0, 6240000.0, -0.7026499697988491], [1435472064.0, 6260000.0, -0.7115356772092852], [1435473792.0, 6280000.0, -0.7203090248879069], [1435475520.0, 6300000.0, -0.7289686274214113], [1435477248.0, 6320000.0, -0.7375131173581737], [1435478976.0, 6340000.0, -0.7459411454241821], [1435480704.0, 6360000.0, -0.7542513807361038], [1435482432.0, 6380000.0, -0.7624425110114479], [1435484160.0, 6400000.0, -0.7705132427757894], [1435485888.0, 6420000.0, -0.7784623015670236], [1435487616.0, 6440000.0, -0.7862884321366186], [1435489344.0, 6460000.0, -0.793990398647835], [1435491072.0, 6480000.0, -0.8015669848708764], [1435492800.0, 6500000.0, -0.8090169943749473], [1435494528.0, 6520000.0, -0.8163392507171839], [1435496256.0, 6540000.0, -0.8235325976284275], [1435497984.0, 6560000.0, -0.8305958991958128], [1435499712.0, 6580000.0, -0.8375280400421419], [1435501440.0, 6600000.0, -0.8443279255020153], [1435503168.0, 6620000.0, -0.8509944817946921], [1435504896.0, 6640000.0, -0.8575266561936521], [1435506624.0, 6660000.0, -0.8639234171928352], [1435508352.0, 6680000.0, -0.8701837546695256], [1435510080.0, 6700000.0, -0.8763066800438636], [1435511808.0, 6720000.0, -0.8822912264349534], [1435513536.0, 6740000.0, -0.8881364488135446], [1435515264.0, 6760000.0, -0.8938414241512639], [1435516992.0, 6780000.0, -0.8994052515663712], [1435518720.0, 6800000.0, -0.9048270524660198], [1435520448.0, 6820000.0, -0.9101059706849955], [1435522176.0, 6840000.0, -0.9152411726209175], [1435523904.0, 6860000.0, -0.9202318473658704], [1435525632.0, 6880000.0, -0.9250772068344577], [1435527360.0, 6900000.0, -0.9297764858882511], [1435529088.0, 6920000.0, -0.9343289424566118], [1435530816.0, 6940000.0, -0.938733857653874], [1435532544.0, 6960000.0, -0.9429905358928643], [1435534272.0, 6980000.0, -0.9470983049947442], [1435536000.0, 7000000.0, -0.9510565162951535], [1435537728.0, 7020000.0, -0.954864544746643], [1435539456.0, 7040000.0, -0.9585217890173756], [1435541184.0, 7060000.0, -0.9620276715860857], [1435542912.0, 7080000.0, -0.9653816388332738], [1435544640.0, 7100000.0, -0.968583161128631], [1435546368.0, 7120000.0, -0.9716317329146739], [1435548096.0, 7140000.0, -0.9745268727865771], [1435549824.0, 7160000.0, -0.9772681235681934], [1435551552.0, 7180000.0, -0.9798550523842469], [1435553280.0, 7200000.0, -0.9822872507286887], [1435555008.0, 7220000.0, -0.9845643345292054], [1435556736.0, 7240000.0, -0.9866859442078679], [1435558464.0, 7260000.0, -0.988651744737914], [1435560192.0, 7280000.0, -0.9904614256966512], [1435561920.0, 7300000.0, -0.9921147013144778], [1435563648.0, 7320000.0, -0.9936113105200084], [1435565376.0, 7340000.0, -0.9949510169813002], [1435567104.0, 7360000.0, -0.9961336091431725], [1435568832.0, 7380000.0, -0.9971589002606139], [1435570560.0, 7400000.0, -0.9980267284282716], [1435572288.0, 7420000.0, -0.9987369566060175], [1435574016.0, 7440000.0, -0.9992894726405892], [1435575744.0, 7460000.0, -0.9996841892832999], [1435577472.0, 7480000.0, -0.9999210442038161], [1435579200.0, 7500000.0, -1.0], 
[1435580928.0, 7520000.0, -0.9999210442038161], [1435582656.0, 7540000.0, -0.9996841892832999], [1435584384.0, 7560000.0, -0.9992894726405892], [1435586112.0, 7580000.0, -0.9987369566060175], [1435587840.0, 7600000.0, -0.9980267284282716], [1435589568.0, 7620000.0, -0.9971589002606139], [1435591296.0, 7640000.0, -0.9961336091431725], [1435593024.0, 7660000.0, -0.9949510169813002], [1435594752.0, 7680000.0, -0.9936113105200084], [1435596480.0, 7700000.0, -0.9921147013144779], [1435598208.0, 7720000.0, -0.9904614256966512], [1435599936.0, 7740000.0, -0.988651744737914], [1435601664.0, 7760000.0, -0.986685944207868], [1435603392.0, 7780000.0, -0.9845643345292054], [1435605120.0, 7800000.0, -0.9822872507286887], [1435606848.0, 7820000.0, -0.979855052384247], [1435608576.0, 7840000.0, -0.9772681235681935], [1435610304.0, 7860000.0, -0.9745268727865771], [1435612032.0, 7880000.0, -0.971631732914674], [1435613760.0, 7900000.0, -0.9685831611286311], [1435615488.0, 7920000.0, -0.9653816388332738], [1435617216.0, 7940000.0, -0.9620276715860858], [1435618944.0, 7960000.0, -0.9585217890173757], [1435620672.0, 7980000.0, -0.9548645447466431], [1435622400.0, 8000000.0, -0.9510565162951536], [1435624128.0, 8020000.0, -0.9470983049947443], [1435625856.0, 8040000.0, -0.9429905358928644], [1435627584.0, 8060000.0, -0.9387338576538741], [1435629312.0, 8080000.0, -0.934328942456612], [1435631040.0, 8100000.0, -0.9297764858882512], [1435632768.0, 8120000.0, -0.9250772068344579], [1435634496.0, 8140000.0, -0.9202318473658705], [1435636224.0, 8160000.0, -0.9152411726209176], [1435637952.0, 8180000.0, -0.9101059706849958], [1435639680.0, 8200000.0, -0.9048270524660199], [1435641408.0, 8220000.0, -0.8994052515663714], [1435643136.0, 8240000.0, -0.8938414241512641], [1435644864.0, 8260000.0, -0.8881364488135448], [1435646592.0, 8280000.0, -0.8822912264349535], [1435648320.0, 8300000.0, -0.8763066800438638], [1435650048.0, 8320000.0, -0.8701837546695258], [1435651776.0, 8340000.0, -0.8639234171928354], [1435653504.0, 8360000.0, -0.8575266561936523], [1435655232.0, 8380000.0, -0.8509944817946923], [1435656960.0, 8400000.0, -0.8443279255020155], [1435658688.0, 8420000.0, -0.8375280400421421], [1435660416.0, 8440000.0, -0.8305958991958129], [1435662144.0, 8460000.0, -0.8235325976284277], [1435663872.0, 8480000.0, -0.8163392507171842], [1435665600.0, 8500000.0, -0.8090169943749476], [1435667328.0, 8520000.0, -0.8015669848708766], [1435669056.0, 8540000.0, -0.7939903986478353], [1435670784.0, 8560000.0, -0.7862884321366188], [1435672512.0, 8580000.0, -0.7784623015670239], [1435674240.0, 8600000.0, -0.7705132427757896], [1435675968.0, 8620000.0, -0.7624425110114481], [1435677696.0, 8640000.0, -0.7542513807361041], [1435679424.0, 8660000.0, -0.7459411454241823], [1435681152.0, 8680000.0, -0.737513117358174], [1435682880.0, 8700000.0, -0.7289686274214116], [1435684608.0, 8720000.0, -0.7203090248879068], [1435686336.0, 8740000.0, -0.7115356772092852], [1435688064.0, 8760000.0, -0.7026499697988496], [1435689792.0, 8780000.0, -0.6936533058128054], [1435691520.0, 8800000.0, -0.684547105928689], [1435693248.0, 8820000.0, -0.6753328081210247], [1435694976.0, 8840000.0, -0.6660118674342518], [1435696704.0, 8860000.0, -0.6565857557529565], [1435698432.0, 8880000.0, -0.6470559615694443], [1435700160.0, 8900000.0, -0.6374239897486896], [1435701888.0, 8920000.0, -0.6276913612907002], [1435703616.0, 8940000.0, -0.6178596130903348], [1435705344.0, 8960000.0, -0.6079302976946057], [1435707072.0, 8980000.0, -0.5979049830575192], 
[1435708800.0, 9000000.0, -0.5877852522924734], [1435710528.0, 9020000.0, -0.5775727034222677], [1435712256.0, 9040000.0, -0.5672689491267565], [1435713984.0, 9060000.0, -0.556875616488188], [1435715712.0, 9080000.0, -0.5463943467342689], [1435717440.0, 9100000.0, -0.5358267949789963], [1435719168.0, 9120000.0, -0.5251746299612954], [1435720896.0, 9140000.0, -0.5144395337815068], [1435722624.0, 9160000.0, -0.503623201635761], [1435724352.0, 9180000.0, -0.4927273415482917], [1435726080.0, 9200000.0, -0.4817536741017153], [1435727808.0, 9220000.0, -0.4707039321653325], [1435729536.0, 9240000.0, -0.4595798606214877], [1435731264.0, 9260000.0, -0.448383216090032], [1435732992.0, 9280000.0, -0.43711576665093255], [1435734720.0, 9300000.0, -0.4257792915650722], [1435736448.0, 9320000.0, -0.4143755809932844], [1435738176.0, 9340000.0, -0.40290643571366286], [1435739904.0, 9360000.0, -0.3913736668372025], [1435741632.0, 9380000.0, -0.3797790955218019], [1435743360.0, 9400000.0, -0.3681245526846787], [1435745088.0, 9420000.0, -0.3564118787132513], [1435746816.0, 9440000.0, -0.34464292317451756], [1435748544.0, 9460000.0, -0.33281954452298707], [1435750272.0, 9480000.0, -0.3209436098072098], [1435752000.0, 9500000.0, -0.3090169943749476], [1435753728.0, 9520000.0, -0.29704158157703503], [1435755456.0, 9540000.0, -0.285019262469977], [1435757184.0, 9560000.0, -0.272951935517326], [1435758912.0, 9580000.0, -0.2608415062898976], [1435760640.0, 9600000.0, -0.24868988716485535], [1435762368.0, 9620000.0, -0.23649899702372515], [1435764096.0, 9640000.0, -0.22427076094938156], [1435765824.0, 9660000.0, -0.2120071099220549], [1435767552.0, 9680000.0, -0.1997099805144072], [1435769280.0, 9700000.0, -0.18738131458572468], [1435771008.0, 9720000.0, -0.175023058975276], [1435772736.0, 9740000.0, -0.16263716519488433], [1435774464.0, 9760000.0, -0.15022558912075767], [1435776192.0, 9780000.0, -0.13779029068463858], [1435777920.0, 9800000.0, -0.12533323356430465], [1435779648.0, 9820000.0, -0.112856384873482], [1435781376.0, 9840000.0, -0.10036171485121509], [1435783104.0, 9860000.0, -0.08785119655074328], [1435784832.0, 9880000.0, -0.07532680552793272], [1435786560.0, 9900000.0, -0.06279051952931326], [1435788288.0, 9920000.0, -0.05024431817977022], [1435790016.0, 9940000.0, -0.0376901826699351], [1435791744.0, 9960000.0, -0.025130095443337937], [1435793472.0, 9980000.0, -0.01256603988335296], [1435795200.0, 10000000.0, -2.4492935982947064e-16], [1435796928.0, 10020000.0, 0.012566039883352469], [1435798656.0, 10040000.0, 0.025130095443337445], [1435800384.0, 10060000.0, 0.03769018266993461], [1435802112.0, 10080000.0, 0.05024431817976974], [1435803840.0, 10100000.0, 0.06279051952931278], [1435805568.0, 10120000.0, 0.07532680552793222], [1435807296.0, 10140000.0, 0.08785119655074279], [1435809024.0, 10160000.0, 0.1003617148512146], [1435810752.0, 10180000.0, 0.11285638487348151], [1435812480.0, 10200000.0, 0.12533323356430418], [1435814208.0, 10220000.0, 0.1377902906846381], [1435815936.0, 10240000.0, 0.1502255891207572], [1435817664.0, 10260000.0, 0.16263716519488383], [1435819392.0, 10280000.0, 0.1750230589752755], [1435821120.0, 10300000.0, 0.1873813145857242], [1435822848.0, 10320000.0, 0.1997099805144067], [1435824576.0, 10340000.0, 0.21200710992205443], [1435826304.0, 10360000.0, 0.2242707609493811], [1435828032.0, 10380000.0, 0.23649899702372468], [1435829760.0, 10400000.0, 0.24868988716485488], [1435831488.0, 10420000.0, 0.2608415062898971], [1435833216.0, 10440000.0, 0.27295193551732555], [1435834944.0, 
10460000.0, 0.28501926246997655], [1435836672.0, 10480000.0, 0.2970415815770346], [1435838400.0, 10500000.0, 0.3090169943749472], [1435840128.0, 10520000.0, 0.3209436098072094], [1435841856.0, 10540000.0, 0.3328195445229866], [1435843584.0, 10560000.0, 0.3446429231745171], [1435845312.0, 10580000.0, 0.35641187871325086], [1435847040.0, 10600000.0, 0.3681245526846782], [1435848768.0, 10620000.0, 0.37977909552180145], [1435850496.0, 10640000.0, 0.39137366683720287], [1435852224.0, 10660000.0, 0.4029064357136624], [1435853952.0, 10680000.0, 0.414375580993284], [1435855680.0, 10700000.0, 0.42577929156507255], [1435857408.0, 10720000.0, 0.43711576665093294], [1435859136.0, 10740000.0, 0.44838321609003234], [1435860864.0, 10760000.0, 0.4595798606214881], [1435862592.0, 10780000.0, 0.47070393216533285], [1435864320.0, 10800000.0, 0.48175367410171566], [1435866048.0, 10820000.0, 0.49272734154829206], [1435867776.0, 10840000.0, 0.5036232016357607], [1435869504.0, 10860000.0, 0.5144395337815063], [1435871232.0, 10880000.0, 0.5251746299612957], [1435872960.0, 10900000.0, 0.5358267949789967], [1435874688.0, 10920000.0, 0.5463943467342692], [1435876416.0, 10940000.0, 0.5568756164881883], [1435878144.0, 10960000.0, 0.5672689491267568], [1435879872.0, 10980000.0, 0.577572703422268], [1435881600.0, 11000000.0, 0.5877852522924736], [1435883328.0, 11020000.0, 0.5979049830575195], [1435885056.0, 11040000.0, 0.6079302976946054], [1435886784.0, 11060000.0, 0.6178596130903344], [1435888512.0, 11080000.0, 0.6276913612907006], [1435890240.0, 11100000.0, 0.6374239897486899], [1435891968.0, 11120000.0, 0.6470559615694446], [1435893696.0, 11140000.0, 0.6565857557529567], [1435895424.0, 11160000.0, 0.666011867434252], [1435897152.0, 11180000.0, 0.6753328081210249], [1435898880.0, 11200000.0, 0.6845471059286893], [1435900608.0, 11220000.0, 0.6936533058128049], [1435902336.0, 11240000.0, 0.7026499697988493], [1435904064.0, 11260000.0, 0.7115356772092849], [1435905792.0, 11280000.0, 0.7203090248879065], [1435907520.0, 11300000.0, 0.7289686274214106], [1435909248.0, 11320000.0, 0.737513117358173], [1435910976.0, 11340000.0, 0.7459411454241814], [1435912704.0, 11360000.0, 0.7542513807361031], [1435914432.0, 11380000.0, 0.7624425110114472], [1435916160.0, 11400000.0, 0.7705132427757887], [1435917888.0, 11420000.0, 0.778462301567023], [1435919616.0, 11440000.0, 0.7862884321366185], [1435921344.0, 11460000.0, 0.793990398647835], [1435923072.0, 11480000.0, 0.8015669848708759], [1435924800.0, 11500000.0, 0.8090169943749468], [1435926528.0, 11520000.0, 0.8163392507171834], [1435928256.0, 11540000.0, 0.8235325976284269], [1435929984.0, 11560000.0, 0.8305958991958121], [1435931712.0, 11580000.0, 0.8375280400421413], [1435933440.0, 11600000.0, 0.8443279255020147], [1435935168.0, 11620000.0, 0.8509944817946916], [1435936896.0, 11640000.0, 0.8575266561936521], [1435938624.0, 11660000.0, 0.8639234171928352], [1435940352.0, 11680000.0, 0.8701837546695251], [1435942080.0, 11700000.0, 0.8763066800438631], [1435943808.0, 11720000.0, 0.8822912264349528], [1435945536.0, 11740000.0, 0.8881364488135441], [1435947264.0, 11760000.0, 0.8938414241512634], [1435948992.0, 11780000.0, 0.8994052515663707], [1435950720.0, 11800000.0, 0.9048270524660194], [1435952448.0, 11820000.0, 0.9101059706849955], [1435954176.0, 11840000.0, 0.9152411726209174], [1435955904.0, 11860000.0, 0.9202318473658699], [1435957632.0, 11880000.0, 0.9250772068344577], [1435959360.0, 11900000.0, 0.9297764858882511], [1435961088.0, 11920000.0, 0.9343289424566118], 
[1435962816.0, 11940000.0, 0.9387338576538738], [1435964544.0, 11960000.0, 0.9429905358928643], [1435966272.0, 11980000.0, 0.9470983049947441], [1435968000.0, 12000000.0, 0.9510565162951535], [1435969728.0, 12020000.0, 0.954864544746643], [1435971456.0, 12040000.0, 0.9585217890173756], [1435973184.0, 12060000.0, 0.9620276715860857], [1435974912.0, 12080000.0, 0.9653816388332737], [1435976640.0, 12100000.0, 0.968583161128631], [1435978368.0, 12120000.0, 0.9716317329146738], [1435980096.0, 12140000.0, 0.974526872786577], [1435981824.0, 12160000.0, 0.9772681235681934], [1435983552.0, 12180000.0, 0.9798550523842469], [1435985280.0, 12200000.0, 0.9822872507286886], [1435987008.0, 12220000.0, 0.9845643345292053], [1435988736.0, 12240000.0, 0.9866859442078679], [1435990464.0, 12260000.0, 0.988651744737914], [1435992192.0, 12280000.0, 0.9904614256966512], [1435993920.0, 12300000.0, 0.9921147013144778], [1435995648.0, 12320000.0, 0.9936113105200084], [1435997376.0, 12340000.0, 0.9949510169813002], [1435999104.0, 12360000.0, 0.9961336091431725], [1436000832.0, 12380000.0, 0.9971589002606139], [1436002560.0, 12400000.0, 0.9980267284282716], [1436004288.0, 12420000.0, 0.9987369566060175], [1436006016.0, 12440000.0, 0.9992894726405892], [1436007744.0, 12460000.0, 0.9996841892832999], [1436009472.0, 12480000.0, 0.9999210442038161], [1436011200.0, 12500000.0, 1.0], [1436012928.0, 12520000.0, 0.9999210442038161], [1436014656.0, 12540000.0, 0.9996841892832999], [1436016384.0, 12560000.0, 0.9992894726405892], [1436018112.0, 12580000.0, 0.9987369566060175], [1436019840.0, 12600000.0, 0.9980267284282716], [1436021568.0, 12620000.0, 0.9971589002606139], [1436023296.0, 12640000.0, 0.9961336091431725], [1436025024.0, 12660000.0, 0.9949510169813002], [1436026752.0, 12680000.0, 0.9936113105200084], [1436028480.0, 12700000.0, 0.9921147013144779], [1436030208.0, 12720000.0, 0.9904614256966512], [1436031936.0, 12740000.0, 0.9886517447379142], [1436033664.0, 12760000.0, 0.986685944207868], [1436035392.0, 12780000.0, 0.9845643345292054], [1436037120.0, 12800000.0, 0.9822872507286886], [1436038848.0, 12820000.0, 0.979855052384247], [1436040576.0, 12840000.0, 0.9772681235681934], [1436042304.0, 12860000.0, 0.9745268727865772], [1436044032.0, 12880000.0, 0.9716317329146742], [1436045760.0, 12900000.0, 0.9685831611286311], [1436047488.0, 12920000.0, 0.9653816388332741], [1436049216.0, 12940000.0, 0.9620276715860858], [1436050944.0, 12960000.0, 0.9585217890173761], [1436052672.0, 12980000.0, 0.9548645447466428], [1436054400.0, 13000000.0, 0.9510565162951536], [1436056128.0, 13020000.0, 0.947098304994744], [1436057856.0, 13040000.0, 0.9429905358928645], [1436059584.0, 13060000.0, 0.9387338576538744], [1436061312.0, 13080000.0, 0.934328942456612], [1436063040.0, 13100000.0, 0.9297764858882517], [1436064768.0, 13120000.0, 0.9250772068344579], [1436066496.0, 13140000.0, 0.9202318473658705], [1436068224.0, 13160000.0, 0.9152411726209173], [1436069952.0, 13180000.0, 0.9101059706849958], [1436071680.0, 13200000.0, 0.9048270524660192], [1436073408.0, 13220000.0, 0.8994052515663711], [1436075136.0, 13240000.0, 0.8938414241512633], [1436076864.0, 13260000.0, 0.8881364488135445], [1436078592.0, 13280000.0, 0.8822912264349536], [1436080320.0, 13300000.0, 0.8763066800438634], [1436082048.0, 13320000.0, 0.8701837546695259], [1436083776.0, 13340000.0, 0.863923417192835], [1436085504.0, 13360000.0, 0.8575266561936523], [1436087232.0, 13380000.0, 0.8509944817946914], [1436088960.0, 13400000.0, 0.8443279255020151], [1436090688.0, 13420000.0, 
0.8375280400421411], [1436092416.0, 13440000.0, 0.8305958991958126], [1436094144.0, 13460000.0, 0.8235325976284278], [1436095872.0, 13480000.0, 0.8163392507171837], [1436097600.0, 13500000.0, 0.8090169943749477], [1436099328.0, 13520000.0, 0.8015669848708762], [1436101056.0, 13540000.0, 0.7939903986478354], [1436102784.0, 13560000.0, 0.7862884321366184], [1436104512.0, 13580000.0, 0.7784623015670233], [1436106240.0, 13600000.0, 0.7705132427757886], [1436107968.0, 13620000.0, 0.7624425110114477], [1436109696.0, 13640000.0, 0.7542513807361041], [1436111424.0, 13660000.0, 0.7459411454241818], [1436113152.0, 13680000.0, 0.737513117358174], [1436114880.0, 13700000.0, 0.728968627421411], [1436116608.0, 13720000.0, 0.7203090248879069], [1436118336.0, 13740000.0, 0.7115356772092847], [1436120064.0, 13760000.0, 0.7026499697988503], [1436121792.0, 13780000.0, 0.6936533058128054], [1436123520.0, 13800000.0, 0.6845471059286897], [1436125248.0, 13820000.0, 0.6753328081210247], [1436126976.0, 13840000.0, 0.6660118674342526], [1436128704.0, 13860000.0, 0.6565857557529566], [1436130432.0, 13880000.0, 0.6470559615694451], [1436132160.0, 13900000.0, 0.6374239897486911], [1436133888.0, 13920000.0, 0.627691361290701], [1436135616.0, 13940000.0, 0.6178596130903355], [1436137344.0, 13960000.0, 0.6079302976946058], [1436139072.0, 13980000.0, 0.5979049830575199], [1436140800.0, 14000000.0, 0.5877852522924734], [1436142528.0, 14020000.0, 0.5775727034222685], [1436144256.0, 14040000.0, 0.5672689491267566], [1436145984.0, 14060000.0, 0.5568756164881887], [1436147712.0, 14080000.0, 0.5463943467342705], [1436149440.0, 14100000.0, 0.5358267949789972], [1436151168.0, 14120000.0, 0.5251746299612969], [1436152896.0, 14140000.0, 0.5144395337815069], [1436154624.0, 14160000.0, 0.5036232016357619], [1436156352.0, 14180000.0, 0.4927273415482918], [1436158080.0, 14200000.0, 0.4817536741017162], [1436159808.0, 14220000.0, 0.47070393216533263], [1436161536.0, 14240000.0, 0.4595798606214886], [1436163264.0, 14260000.0, 0.44838321609003373], [1436164992.0, 14280000.0, 0.4371157666509335], [1436166720.0, 14300000.0, 0.42577929156507394], [1436168448.0, 14320000.0, 0.41437558099328453], [1436170176.0, 14340000.0, 0.40290643571366375], [1436171904.0, 14360000.0, 0.3913736668372026], [1436173632.0, 14380000.0, 0.379779095521802], [1436175360.0, 14400000.0, 0.368124552684678], [1436177088.0, 14420000.0, 0.3564118787132514], [1436178816.0, 14440000.0, 0.34464292317451684], [1436180544.0, 14460000.0, 0.3328195445229872], [1436182272.0, 14480000.0, 0.32094360980721076], [1436184000.0, 14500000.0, 0.3090169943749478], [1436185728.0, 14520000.0, 0.297041581577036], [1436187456.0, 14540000.0, 0.2850192624699763], [1436189184.0, 14560000.0, 0.2729519355173261], [1436190912.0, 14580000.0, 0.2608415062898969], [1436192640.0, 14600000.0, 0.2486898871648555], [1436194368.0, 14620000.0, 0.2364989970237244], [1436196096.0, 14640000.0, 0.22427076094938167], [1436197824.0, 14660000.0, 0.2120071099220559], [1436199552.0, 14680000.0, 0.1997099805144073], [1436201280.0, 14700000.0, 0.18738131458572568], [1436203008.0, 14720000.0, 0.17502305897527612], [1436204736.0, 14740000.0, 0.16263716519488444], [1436206464.0, 14760000.0, 0.15022558912075692], [1436208192.0, 14780000.0, 0.13779029068463872], [1436209920.0, 14800000.0, 0.1253332335643039], [1436211648.0, 14820000.0, 0.11285638487348212], [1436213376.0, 14840000.0, 0.1003617148512161], [1436215104.0, 14860000.0, 0.0878511965507434], [1436216832.0, 14880000.0, 0.07532680552793372], [1436218560.0, 
14900000.0, 0.06279051952931339], [1436220288.0, 14920000.0, 0.05024431817977035], [1436222016.0, 14940000.0, 0.03769018266993433], [1436223744.0, 14960000.0, 0.02513009544333806], [1436225472.0, 14980000.0, 0.012566039883352193], [1436227200.0, 15000000.0, 3.6739403974420594e-16], [1436228928.0, 15020000.0, -0.012566039883351459], [1436230656.0, 15040000.0, -0.025130095443337323], [1436232384.0, 15060000.0, -0.037690182669933604], [1436234112.0, 15080000.0, -0.05024431817976961], [1436235840.0, 15100000.0, -0.06279051952931265], [1436237568.0, 15120000.0, -0.07532680552793299], [1436239296.0, 15140000.0, -0.08785119655074267], [1436241024.0, 15160000.0, -0.10036171485121537], [1436242752.0, 15180000.0, -0.11285638487348139], [1436244480.0, 15200000.0, -0.12533323356430318], [1436246208.0, 15220000.0, -0.13779029068463797], [1436247936.0, 15240000.0, -0.1502255891207562], [1436249664.0, 15260000.0, -0.16263716519488372], [1436251392.0, 15280000.0, -0.1750230589752754], [1436253120.0, 15300000.0, -0.18738131458572496], [1436254848.0, 15320000.0, -0.19970998051440658], [1436256576.0, 15340000.0, -0.21200710992205518], [1436258304.0, 15360000.0, -0.22427076094938095], [1436260032.0, 15380000.0, -0.2364989970237237], [1436261760.0, 15400000.0, -0.24868988716485477], [1436263488.0, 15420000.0, -0.26084150628989616], [1436265216.0, 15440000.0, -0.27295193551732544], [1436266944.0, 15460000.0, -0.28501926246997555], [1436268672.0, 15480000.0, -0.2970415815770353], [1436270400.0, 15500000.0, -0.30901699437494706], [1436272128.0, 15520000.0, -0.3209436098072101], [1436273856.0, 15540000.0, -0.3328195445229865], [1436275584.0, 15560000.0, -0.34464292317451617], [1436277312.0, 15580000.0, -0.35641187871325075], [1436279040.0, 15600000.0, -0.36812455268467725], [1436280768.0, 15620000.0, -0.37977909552180134], [1436282496.0, 15640000.0, -0.39137366683720193], [1436284224.0, 15660000.0, -0.4029064357136631], [1436285952.0, 15680000.0, -0.41437558099328387], [1436287680.0, 15700000.0, -0.42577929156507327], [1436289408.0, 15720000.0, -0.4371157666509328], [1436291136.0, 15740000.0, -0.44838321609003307], [1436292864.0, 15760000.0, -0.459579860621488], [1436294592.0, 15780000.0, -0.47070393216533196], [1436296320.0, 15800000.0, -0.48175367410171555], [1436298048.0, 15820000.0, -0.4927273415482912], [1436299776.0, 15840000.0, -0.5036232016357614], [1436301504.0, 15860000.0, -0.5144395337815062], [1436303232.0, 15880000.0, -0.5251746299612964], [1436304960.0, 15900000.0, -0.5358267949789965], [1436306688.0, 15920000.0, -0.5463943467342699], [1436308416.0, 15940000.0, -0.5568756164881882], [1436310144.0, 15960000.0, -0.567268949126756], [1436311872.0, 15980000.0, -0.577572703422268], [1436313600.0, 16000000.0, -0.5877852522924728], [1436315328.0, 16020000.0, -0.5979049830575194], [1436317056.0, 16040000.0, -0.6079302976946053], [1436318784.0, 16060000.0, -0.617859613090335], [1436320512.0, 16080000.0, -0.6276913612907005], [1436322240.0, 16100000.0, -0.6374239897486905], [1436323968.0, 16120000.0, -0.6470559615694444], [1436325696.0, 16140000.0, -0.6565857557529561], [1436327424.0, 16160000.0, -0.666011867434252], [1436329152.0, 16180000.0, -0.6753328081210243], [1436330880.0, 16200000.0, -0.6845471059286892], [1436332608.0, 16220000.0, -0.6936533058128049], [1436334336.0, 16240000.0, -0.7026499697988499], [1436336064.0, 16260000.0, -0.7115356772092841], [1436337792.0, 16280000.0, -0.7203090248879064], [1436339520.0, 16300000.0, -0.7289686274214106], [1436341248.0, 16320000.0, -0.7375131173581736], 
[1436342976.0, 16340000.0, -0.7459411454241813], [1436344704.0, 16360000.0, -0.7542513807361036], [1436346432.0, 16380000.0, -0.7624425110114472], [1436348160.0, 16400000.0, -0.770513242775788], [1436349888.0, 16420000.0, -0.7784623015670229], [1436351616.0, 16440000.0, -0.7862884321366179], [1436353344.0, 16460000.0, -0.7939903986478349], [1436355072.0, 16480000.0, -0.8015669848708757], [1436356800.0, 16500000.0, -0.8090169943749472], [1436358528.0, 16520000.0, -0.8163392507171833], [1436360256.0, 16540000.0, -0.8235325976284273], [1436361984.0, 16560000.0, -0.8305958991958121], [1436363712.0, 16580000.0, -0.8375280400421408], [1436365440.0, 16600000.0, -0.8443279255020146], [1436367168.0, 16620000.0, -0.8509944817946911], [1436368896.0, 16640000.0, -0.857526656193652], [1436370624.0, 16660000.0, -0.8639234171928346], [1436372352.0, 16680000.0, -0.8701837546695255], [1436374080.0, 16700000.0, -0.876306680043863], [1436375808.0, 16720000.0, -0.8822912264349532], [1436377536.0, 16740000.0, -0.8881364488135441], [1436379264.0, 16760000.0, -0.893841424151263], [1436380992.0, 16780000.0, -0.8994052515663707], [1436382720.0, 16800000.0, -0.9048270524660189], [1436384448.0, 16820000.0, -0.9101059706849954], [1436386176.0, 16840000.0, -0.9152411726209171], [1436387904.0, 16860000.0, -0.9202318473658703], [1436389632.0, 16880000.0, -0.9250772068344576], [1436391360.0, 16900000.0, -0.9297764858882513], [1436393088.0, 16920000.0, -0.9343289424566118], [1436394816.0, 16940000.0, -0.9387338576538742], [1436396544.0, 16960000.0, -0.9429905358928643], [1436398272.0, 16980000.0, -0.9470983049947438], [1436400000.0, 17000000.0, -0.9510565162951534], [1436401728.0, 17020000.0, -0.9548645447466426], [1436403456.0, 17040000.0, -0.9585217890173758], [1436405184.0, 17060000.0, -0.9620276715860856], [1436406912.0, 17080000.0, -0.9653816388332739], [1436408640.0, 17100000.0, -0.968583161128631], [1436410368.0, 17120000.0, -0.971631732914674], [1436412096.0, 17140000.0, -0.974526872786577], [1436413824.0, 17160000.0, -0.9772681235681931], [1436415552.0, 17180000.0, -0.9798550523842467], [1436417280.0, 17200000.0, -0.9822872507286885], [1436419008.0, 17220000.0, -0.9845643345292053], [1436420736.0, 17240000.0, -0.9866859442078679], [1436422464.0, 17260000.0, -0.9886517447379141], [1436424192.0, 17280000.0, -0.9904614256966511], [1436425920.0, 17300000.0, -0.9921147013144779], [1436427648.0, 17320000.0, -0.9936113105200084], [1436429376.0, 17340000.0, -0.9949510169813001], [1436431104.0, 17360000.0, -0.9961336091431724], [1436432832.0, 17380000.0, -0.9971589002606138], [1436434560.0, 17400000.0, -0.9980267284282716], [1436436288.0, 17420000.0, -0.9987369566060175], [1436438016.0, 17440000.0, -0.9992894726405892], [1436439744.0, 17460000.0, -0.9996841892832999], [1436441472.0, 17480000.0, -0.9999210442038161], [1436443200.0, 17500000.0, -1.0], [1436444928.0, 17520000.0, -0.9999210442038161], [1436446656.0, 17540000.0, -0.9996841892832999], [1436448384.0, 17560000.0, -0.9992894726405893], [1436450112.0, 17580000.0, -0.9987369566060175], [1436451840.0, 17600000.0, -0.9980267284282716], [1436453568.0, 17620000.0, -0.9971589002606139], [1436455296.0, 17640000.0, -0.9961336091431725], [1436457024.0, 17660000.0, -0.9949510169813001], [1436458752.0, 17680000.0, -0.9936113105200084], [1436460480.0, 17700000.0, -0.992114701314478], [1436462208.0, 17720000.0, -0.9904614256966512], [1436463936.0, 17740000.0, -0.9886517447379142], [1436465664.0, 17760000.0, -0.986685944207868], [1436467392.0, 17780000.0, -0.9845643345292056], 
[1436469120.0, 17800000.0, -0.9822872507286886], [1436470848.0, 17820000.0, -0.979855052384247], [1436472576.0, 17840000.0, -0.9772681235681934], [1436474304.0, 17860000.0, -0.9745268727865772], [1436476032.0, 17880000.0, -0.9716317329146742], [1436477760.0, 17900000.0, -0.9685831611286312], [1436479488.0, 17920000.0, -0.9653816388332741], [1436481216.0, 17940000.0, -0.9620276715860858], [1436482944.0, 17960000.0, -0.9585217890173761], [1436484672.0, 17980000.0, -0.9548645447466428], [1436486400.0, 18000000.0, -0.9510565162951538], [1436488128.0, 18020000.0, -0.9470983049947441], [1436489856.0, 18040000.0, -0.9429905358928645], [1436491584.0, 18060000.0, -0.9387338576538744], [1436493312.0, 18080000.0, -0.934328942456612], [1436495040.0, 18100000.0, -0.9297764858882517], [1436496768.0, 18120000.0, -0.9250772068344579], [1436498496.0, 18140000.0, -0.9202318473658706], [1436500224.0, 18160000.0, -0.9152411726209174], [1436501952.0, 18180000.0, -0.9101059706849959], [1436503680.0, 18200000.0, -0.9048270524660192], [1436505408.0, 18220000.0, -0.8994052515663711], [1436507136.0, 18240000.0, -0.8938414241512634], [1436508864.0, 18260000.0, -0.8881364488135445], [1436510592.0, 18280000.0, -0.8822912264349536], [1436512320.0, 18300000.0, -0.8763066800438635], [1436514048.0, 18320000.0, -0.8701837546695259], [1436515776.0, 18340000.0, -0.863923417192835], [1436517504.0, 18360000.0, -0.8575266561936524], [1436519232.0, 18380000.0, -0.8509944817946915], [1436520960.0, 18400000.0, -0.8443279255020151], [1436522688.0, 18420000.0, -0.8375280400421412], [1436524416.0, 18440000.0, -0.8305958991958126], [1436526144.0, 18460000.0, -0.8235325976284278], [1436527872.0, 18480000.0, -0.8163392507171837], [1436529600.0, 18500000.0, -0.8090169943749477], [1436531328.0, 18520000.0, -0.8015669848708763], [1436533056.0, 18540000.0, -0.7939903986478355], [1436534784.0, 18560000.0, -0.7862884321366185], [1436536512.0, 18580000.0, -0.7784623015670235], [1436538240.0, 18600000.0, -0.7705132427757886], [1436539968.0, 18620000.0, -0.7624425110114478], [1436541696.0, 18640000.0, -0.7542513807361042], [1436543424.0, 18660000.0, -0.7459411454241819], [1436545152.0, 18680000.0, -0.7375131173581742], [1436546880.0, 18700000.0, -0.7289686274214111], [1436548608.0, 18720000.0, -0.720309024887907], [1436550336.0, 18740000.0, -0.7115356772092848], [1436552064.0, 18760000.0, -0.7026499697988504], [1436553792.0, 18780000.0, -0.6936533058128055], [1436555520.0, 18800000.0, -0.6845471059286898], [1436557248.0, 18820000.0, -0.6753328081210248], [1436558976.0, 18840000.0, -0.6660118674342526], [1436560704.0, 18860000.0, -0.6565857557529566], [1436562432.0, 18880000.0, -0.6470559615694451], [1436564160.0, 18900000.0, -0.6374239897486912], [1436565888.0, 18920000.0, -0.6276913612907011], [1436567616.0, 18940000.0, -0.6178596130903357], [1436569344.0, 18960000.0, -0.6079302976946059], [1436571072.0, 18980000.0, -0.59790498305752], [1436572800.0, 19000000.0, -0.5877852522924735], [1436574528.0, 19020000.0, -0.5775727034222686], [1436576256.0, 19040000.0, -0.5672689491267567], [1436577984.0, 19060000.0, -0.5568756164881888], [1436579712.0, 19080000.0, -0.5463943467342706], [1436581440.0, 19100000.0, -0.5358267949789973], [1436583168.0, 19120000.0, -0.525174629961297], [1436584896.0, 19140000.0, -0.514439533781507], [1436586624.0, 19160000.0, -0.503623201635762], [1436588352.0, 19180000.0, -0.4927273415482919], [1436590080.0, 19200000.0, -0.4817536741017163], [1436591808.0, 19220000.0, -0.47070393216533274], [1436593536.0, 19240000.0, 
-0.4595798606214887], [1436595264.0, 19260000.0, -0.44838321609003384], [1436596992.0, 19280000.0, -0.4371157666509336], [1436598720.0, 19300000.0, -0.42577929156507405], [1436600448.0, 19320000.0, -0.41437558099328464], [1436602176.0, 19340000.0, -0.40290643571366386], [1436603904.0, 19360000.0, -0.3913736668372027], [1436605632.0, 19380000.0, -0.3797790955218021], [1436607360.0, 19400000.0, -0.3681245526846781], [1436609088.0, 19420000.0, -0.35641187871325153], [1436610816.0, 19440000.0, -0.34464292317451695], [1436612544.0, 19460000.0, -0.3328195445229873], [1436614272.0, 19480000.0, -0.3209436098072109], [1436616000.0, 19500000.0, -0.3090169943749479], [1436617728.0, 19520000.0, -0.29704158157703614], [1436619456.0, 19540000.0, -0.2850192624699764], [1436621184.0, 19560000.0, -0.2729519355173262], [1436622912.0, 19580000.0, -0.260841506289897], [1436624640.0, 19600000.0, -0.2486898871648556], [1436626368.0, 19620000.0, -0.23649899702372454], [1436628096.0, 19640000.0, -0.22427076094938178], [1436629824.0, 19660000.0, -0.21200710992205601], [1436631552.0, 19680000.0, -0.19970998051440741], [1436633280.0, 19700000.0, -0.1873813145857258], [1436635008.0, 19720000.0, -0.17502305897527623], [1436636736.0, 19740000.0, -0.16263716519488458], [1436638464.0, 19760000.0, -0.15022558912075704], [1436640192.0, 19780000.0, -0.13779029068463883], [1436641920.0, 19800000.0, -0.125333233564304], [1436643648.0, 19820000.0, -0.11285638487348224], [1436645376.0, 19840000.0, -0.10036171485121623], [1436647104.0, 19860000.0, -0.08785119655074351], [1436648832.0, 19880000.0, -0.07532680552793385], [1436650560.0, 19900000.0, -0.06279051952931351], [1436652288.0, 19920000.0, -0.05024431817977047], [1436654016.0, 19940000.0, -0.03769018266993446], [1436655744.0, 19960000.0, -0.02513009544333818], [1436657472.0, 19980000.0, -0.012566039883352316], [1436659200.0, 20000000.0, -4.898587196589413e-16], [1436660928.0, 20020000.0, 0.012566039883351336], [1436662656.0, 20040000.0, 0.025130095443337202], [1436664384.0, 20060000.0, 0.03769018266993348], [1436666112.0, 20080000.0, 0.050244318179769494], [1436667840.0, 20100000.0, 0.06279051952931075], [1436669568.0, 20120000.0, 0.07532680552793287], [1436671296.0, 20140000.0, 0.08785119655074078], [1436673024.0, 20160000.0, 0.10036171485121524], [1436674752.0, 20180000.0, 0.1128563848734795], [1436676480.0, 20200000.0, 0.12533323356430304], [1436678208.0, 20220000.0, 0.1377902906846361], [1436679936.0, 20240000.0, 0.15022558912075606], [1436681664.0, 20260000.0, 0.16263716519488186], [1436683392.0, 20280000.0, 0.1750230589752753], [1436685120.0, 20300000.0, 0.1873813145857231], [1436686848.0, 20320000.0, 0.19970998051440647], [1436688576.0, 20340000.0, 0.21200710992205332], [1436690304.0, 20360000.0, 0.22427076094938084], [1436692032.0, 20380000.0, 0.23649899702372357], [1436693760.0, 20400000.0, 0.24868988716485466], [1436695488.0, 20420000.0, 0.26084150628989605], [1436697216.0, 20440000.0, 0.2729519355173253], [1436698944.0, 20460000.0, 0.2850192624699737], [1436700672.0, 20480000.0, 0.2970415815770352], [1436702400.0, 20500000.0, 0.30901699437494523], [1436704128.0, 20520000.0, 0.32094360980720993], [1436705856.0, 20540000.0, 0.3328195445229847], [1436707584.0, 20560000.0, 0.34464292317451606], [1436709312.0, 20580000.0, 0.356411878713249], [1436711040.0, 20600000.0, 0.36812455268467714], [1436712768.0, 20620000.0, 0.3797790955217996], [1436714496.0, 20640000.0, 0.3913736668372018], [1436716224.0, 20660000.0, 0.40290643571366136], [1436717952.0, 20680000.0, 
0.41437558099328375], [1436719680.0, 20700000.0, 0.42577929156507155], [1436721408.0, 20720000.0, 0.4371157666509327], [1436723136.0, 20740000.0, 0.44838321609003134], [1436724864.0, 20760000.0, 0.45957986062148787], [1436726592.0, 20780000.0, 0.47070393216533185], [1436728320.0, 20800000.0, 0.48175367410171543], [1436730048.0, 20820000.0, 0.49272734154829106], [1436731776.0, 20840000.0, 0.5036232016357612], [1436733504.0, 20860000.0, 0.5144395337815046], [1436735232.0, 20880000.0, 0.5251746299612963], [1436736960.0, 20900000.0, 0.535826794978995], [1436738688.0, 20920000.0, 0.5463943467342698], [1436740416.0, 20940000.0, 0.5568756164881866], [1436742144.0, 20960000.0, 0.5672689491267559], [1436743872.0, 20980000.0, 0.5775727034222664], [1436745600.0, 21000000.0, 0.5877852522924727], [1436747328.0, 21020000.0, 0.5979049830575178], [1436749056.0, 21040000.0, 0.6079302976946052], [1436750784.0, 21060000.0, 0.6178596130903335], [1436752512.0, 21080000.0, 0.6276913612907004], [1436754240.0, 21100000.0, 0.6374239897486891], [1436755968.0, 21120000.0, 0.6470559615694444], [1436757696.0, 21140000.0, 0.656585755752956], [1436759424.0, 21160000.0, 0.6660118674342519], [1436761152.0, 21180000.0, 0.6753328081210241], [1436762880.0, 21200000.0, 0.6845471059286891], [1436764608.0, 21220000.0, 0.6936533058128035], [1436766336.0, 21240000.0, 0.7026499697988497], [1436768064.0, 21260000.0, 0.7115356772092841], [1436769792.0, 21280000.0, 0.7203090248879076], [1436771520.0, 21300000.0, 0.7289686274214104], [1436773248.0, 21320000.0, 0.7375131173581735], [1436774976.0, 21340000.0, 0.7459411454241812], [1436776704.0, 21360000.0, 0.7542513807361035], [1436778432.0, 21380000.0, 0.7624425110114471], [1436780160.0, 21400000.0, 0.7705132427757891], [1436781888.0, 21420000.0, 0.7784623015670228], [1436783616.0, 21440000.0, 0.7862884321366189], [1436785344.0, 21460000.0, 0.7939903986478348], [1436787072.0, 21480000.0, 0.8015669848708767], [1436788800.0, 21500000.0, 0.8090169943749471], [1436790528.0, 21520000.0, 0.8163392507171843], [1436792256.0, 21540000.0, 0.8235325976284272], [1436793984.0, 21560000.0, 0.830595899195813], [1436795712.0, 21580000.0, 0.8375280400421407], [1436797440.0, 21600000.0, 0.8443279255020155], [1436799168.0, 21620000.0, 0.850994481794691], [1436800896.0, 21640000.0, 0.8575266561936529], [1436802624.0, 21660000.0, 0.8639234171928346], [1436804352.0, 21680000.0, 0.8701837546695255], [1436806080.0, 21700000.0, 0.876306680043863], [1436807808.0, 21720000.0, 0.8822912264349532], [1436809536.0, 21740000.0, 0.888136448813544], [1436811264.0, 21760000.0, 0.8938414241512638], [1436812992.0, 21780000.0, 0.8994052515663706], [1436814720.0, 21800000.0, 0.9048270524660196], [1436816448.0, 21820000.0, 0.9101059706849954], [1436818176.0, 21840000.0, 0.9152411726209178], [1436819904.0, 21860000.0, 0.9202318473658702], [1436821632.0, 21880000.0, 0.9250772068344583], [1436823360.0, 21900000.0, 0.9297764858882513], [1436825088.0, 21920000.0, 0.9343289424566124], [1436826816.0, 21940000.0, 0.9387338576538741], [1436828544.0, 21960000.0, 0.9429905358928649], [1436830272.0, 21980000.0, 0.9470983049947438], [1436832000.0, 22000000.0, 0.951056516295154], [1436833728.0, 22020000.0, 0.9548645447466426], [1436835456.0, 22040000.0, 0.9585217890173763], [1436837184.0, 22060000.0, 0.9620276715860856], [1436838912.0, 22080000.0, 0.9653816388332739], [1436840640.0, 22100000.0, 0.9685831611286309], [1436842368.0, 22120000.0, 0.971631732914674], [1436844096.0, 22140000.0, 0.974526872786577], [1436845824.0, 22160000.0, 
0.9772681235681935], [1436847552.0, 22180000.0, 0.9798550523842467], [1436849280.0, 22200000.0, 0.9822872507286888], [1436851008.0, 22220000.0, 0.9845643345292053], [1436852736.0, 22240000.0, 0.9866859442078681], [1436854464.0, 22260000.0, 0.988651744737914], [1436856192.0, 22280000.0, 0.9904614256966513], [1436857920.0, 22300000.0, 0.9921147013144779], [1436859648.0, 22320000.0, 0.9936113105200085], [1436861376.0, 22340000.0, 0.9949510169813001], [1436863104.0, 22360000.0, 0.9961336091431726], [1436864832.0, 22380000.0, 0.9971589002606138], [1436866560.0, 22400000.0, 0.9980267284282717], [1436868288.0, 22420000.0, 0.9987369566060175], [1436870016.0, 22440000.0, 0.9992894726405892], [1436871744.0, 22460000.0, 0.9996841892832999], [1436873472.0, 22480000.0, 0.9999210442038161], [1436875200.0, 22500000.0, 1.0], [1436876928.0, 22520000.0, 0.9999210442038161], [1436878656.0, 22540000.0, 0.9996841892832999], [1436880384.0, 22560000.0, 0.9992894726405893], [1436882112.0, 22580000.0, 0.9987369566060175], [1436883840.0, 22600000.0, 0.9980267284282718], [1436885568.0, 22620000.0, 0.9971589002606139], [1436887296.0, 22640000.0, 0.9961336091431727], [1436889024.0, 22660000.0, 0.9949510169813002], [1436890752.0, 22680000.0, 0.9936113105200086], [1436892480.0, 22700000.0, 0.992114701314478], [1436894208.0, 22720000.0, 0.9904614256966515], [1436895936.0, 22740000.0, 0.9886517447379142], [1436897664.0, 22760000.0, 0.9866859442078684], [1436899392.0, 22780000.0, 0.9845643345292056], [1436901120.0, 22800000.0, 0.9822872507286889], [1436902848.0, 22820000.0, 0.979855052384247], [1436904576.0, 22840000.0, 0.9772681235681938], [1436906304.0, 22860000.0, 0.9745268727865772], [1436908032.0, 22880000.0, 0.9716317329146742], [1436909760.0, 22900000.0, 0.9685831611286312], [1436911488.0, 22920000.0, 0.9653816388332741], [1436913216.0, 22940000.0, 0.9620276715860859], [1436914944.0, 22960000.0, 0.9585217890173766], [1436916672.0, 22980000.0, 0.9548645447466428], [1436918400.0, 23000000.0, 0.9510565162951543], [1436920128.0, 23020000.0, 0.9470983049947441], [1436921856.0, 23040000.0, 0.9429905358928652], [1436923584.0, 23060000.0, 0.9387338576538745], [1436925312.0, 23080000.0, 0.9343289424566127], [1436927040.0, 23100000.0, 0.9297764858882518], [1436928768.0, 23120000.0, 0.9250772068344587], [1436930496.0, 23140000.0, 0.9202318473658706], [1436932224.0, 23160000.0, 0.9152411726209182], [1436933952.0, 23180000.0, 0.9101059706849959], [1436935680.0, 23200000.0, 0.90482705246602], [1436937408.0, 23220000.0, 0.8994052515663712], [1436939136.0, 23240000.0, 0.8938414241512642], [1436940864.0, 23260000.0, 0.8881364488135446], [1436942592.0, 23280000.0, 0.8822912264349537], [1436944320.0, 23300000.0, 0.8763066800438635], [1436946048.0, 23320000.0, 0.870183754669526], [1436947776.0, 23340000.0, 0.8639234171928352], [1436949504.0, 23360000.0, 0.8575266561936534], [1436951232.0, 23380000.0, 0.8509944817946915], [1436952960.0, 23400000.0, 0.8443279255020162], [1436954688.0, 23420000.0, 0.8375280400421413], [1436956416.0, 23440000.0, 0.8305958991958137], [1436958144.0, 23460000.0, 0.8235325976284279], [1436959872.0, 23480000.0, 0.8163392507171848], [1436961600.0, 23500000.0, 0.8090169943749478], [1436963328.0, 23520000.0, 0.8015669848708774], [1436965056.0, 23540000.0, 0.7939903986478356], [1436966784.0, 23560000.0, 0.7862884321366196], [1436968512.0, 23580000.0, 0.7784623015670236], [1436970240.0, 23600000.0, 0.7705132427757898], [1436971968.0, 23620000.0, 0.7624425110114478], [1436973696.0, 23640000.0, 0.7542513807361043], 
[1436975424.0, 23660000.0, 0.745941145424182], [1436977152.0, 23680000.0, 0.7375131173581743], [1436978880.0, 23700000.0, 0.7289686274214112], [1436980608.0, 23720000.0, 0.7203090248879084], [1436982336.0, 23740000.0, 0.7115356772092849], [1436984064.0, 23760000.0, 0.7026499697988505], [1436985792.0, 23780000.0, 0.6936533058128043], [1436987520.0, 23800000.0, 0.6845471059286898], [1436989248.0, 23820000.0, 0.6753328081210249], [1436990976.0, 23840000.0, 0.6660118674342527], [1436992704.0, 23860000.0, 0.6565857557529567], [1436994432.0, 23880000.0, 0.6470559615694452], [1436996160.0, 23900000.0, 0.6374239897486899], [1436997888.0, 23920000.0, 0.6276913612907012], [1436999616.0, 23940000.0, 0.6178596130903343], [1437001344.0, 23960000.0, 0.607930297694606], [1437003072.0, 23980000.0, 0.5979049830575187], [1437004800.0, 24000000.0, 0.5877852522924736], [1437006528.0, 24020000.0, 0.5775727034222673], [1437008256.0, 24040000.0, 0.5672689491267568], [1437009984.0, 24060000.0, 0.5568756164881875], [1437011712.0, 24080000.0, 0.5463943467342707], [1437013440.0, 24100000.0, 0.5358267949789959], [1437015168.0, 24120000.0, 0.5251746299612972], [1437016896.0, 24140000.0, 0.5144395337815055], [1437018624.0, 24160000.0, 0.5036232016357621], [1437020352.0, 24180000.0, 0.492727341548292], [1437022080.0, 24200000.0, 0.48175367410171643], [1437023808.0, 24220000.0, 0.47070393216533285], [1437025536.0, 24240000.0, 0.4595798606214888], [1437027264.0, 24260000.0, 0.44838321609003234], [1437028992.0, 24280000.0, 0.4371157666509337], [1437030720.0, 24300000.0, 0.42577929156507255], [1437032448.0, 24320000.0, 0.41437558099328475], [1437034176.0, 24340000.0, 0.40290643571366236], [1437035904.0, 24360000.0, 0.3913736668372028], [1437037632.0, 24380000.0, 0.3797790955218006], [1437039360.0, 24400000.0, 0.3681245526846782], [1437041088.0, 24420000.0, 0.35641187871325003], [1437042816.0, 24440000.0, 0.3446429231745171], [1437044544.0, 24460000.0, 0.33281954452298573], [1437046272.0, 24480000.0, 0.320943609807211], [1437048000.0, 24500000.0, 0.3090169943749463], [1437049728.0, 24520000.0, 0.29704158157703625], [1437051456.0, 24540000.0, 0.2850192624699748], [1437053184.0, 24560000.0, 0.2729519355173264], [1437054912.0, 24580000.0, 0.2608415062898971], [1437056640.0, 24600000.0, 0.2486898871648557], [1437058368.0, 24620000.0, 0.23649899702372465], [1437060096.0, 24640000.0, 0.22427076094938192], [1437061824.0, 24660000.0, 0.2120071099220544], [1437063552.0, 24680000.0, 0.19970998051440755], [1437065280.0, 24700000.0, 0.18738131458572418], [1437067008.0, 24720000.0, 0.17502305897527637], [1437068736.0, 24740000.0, 0.16263716519488294], [1437070464.0, 24760000.0, 0.15022558912075717], [1437072192.0, 24780000.0, 0.1377902906846372], [1437073920.0, 24800000.0, 0.12533323356430415], [1437075648.0, 24820000.0, 0.1128563848734806], [1437077376.0, 24840000.0, 0.10036171485121634], [1437079104.0, 24860000.0, 0.08785119655074188], [1437080832.0, 24880000.0, 0.07532680552793397], [1437082560.0, 24900000.0, 0.06279051952931186], [1437084288.0, 24920000.0, 0.05024431817977059], [1437086016.0, 24940000.0, 0.03769018266993458], [1437087744.0, 24960000.0, 0.0251300954433383], [1437089472.0, 24980000.0, 0.012566039883352437], [1437091200.0, 25000000.0, 6.123233995736766e-16], [1437092928.0, 25020000.0, -0.012566039883351214], [1437094656.0, 25040000.0, -0.02513009544333708], [1437096384.0, 25060000.0, -0.037690182669933354], [1437098112.0, 25080000.0, -0.05024431817976937], [1437099840.0, 25100000.0, -0.06279051952931064], [1437101568.0, 
25120000.0, -0.07532680552793275], [1437103296.0, 25140000.0, -0.08785119655074065], [1437105024.0, 25160000.0, -0.10036171485121513], [1437106752.0, 25180000.0, -0.11285638487347938], [1437108480.0, 25200000.0, -0.12533323356430293], [1437110208.0, 25220000.0, -0.13779029068463597], [1437111936.0, 25240000.0, -0.15022558912075595], [1437113664.0, 25260000.0, -0.16263716519488172], [1437115392.0, 25280000.0, -0.17502305897527515], [1437117120.0, 25300000.0, -0.18738131458572296], [1437118848.0, 25320000.0, -0.19970998051440633], [1437120576.0, 25340000.0, -0.2120071099220532], [1437122304.0, 25360000.0, -0.22427076094938073], [1437124032.0, 25380000.0, -0.23649899702372346], [1437125760.0, 25400000.0, -0.24868988716485452], [1437127488.0, 25420000.0, -0.26084150628989594], [1437129216.0, 25440000.0, -0.27295193551732516], [1437130944.0, 25460000.0, -0.2850192624699736], [1437132672.0, 25480000.0, -0.29704158157703336], [1437134400.0, 25500000.0, -0.3090169943749451], [1437136128.0, 25520000.0, -0.3209436098072098], [1437137856.0, 25540000.0, -0.33281954452298623], [1437139584.0, 25560000.0, -0.34464292317451595], [1437141312.0, 25580000.0, -0.35641187871324886], [1437143040.0, 25600000.0, -0.3681245526846787], [1437144768.0, 25620000.0, -0.3797790955218011], [1437146496.0, 25640000.0, -0.3913736668372017], [1437148224.0, 25660000.0, -0.40290643571366125], [1437149952.0, 25680000.0, -0.41437558099328525], [1437151680.0, 25700000.0, -0.42577929156506983], [1437153408.0, 25720000.0, -0.4371157666509326], [1437155136.0, 25740000.0, -0.44838321609003123], [1437156864.0, 25760000.0, -0.45957986062148615], [1437158592.0, 25780000.0, -0.4707039321653302], [1437160320.0, 25800000.0, -0.4817536741017153], [1437162048.0, 25820000.0, -0.49272734154829095], [1437163776.0, 25840000.0, -0.5036232016357596], [1437165504.0, 25860000.0, -0.5144395337815045], [1437167232.0, 25880000.0, -0.5251746299612962], [1437168960.0, 25900000.0, -0.5358267949789964], [1437170688.0, 25920000.0, -0.5463943467342682], [1437172416.0, 25940000.0, -0.5568756164881865], [1437174144.0, 25960000.0, -0.5672689491267573], [1437175872.0, 25980000.0, -0.5775727034222677], [1437177600.0, 26000000.0, -0.5877852522924726], [1437179328.0, 26020000.0, -0.5979049830575177], [1437181056.0, 26040000.0, -0.6079302976946065], [1437182784.0, 26060000.0, -0.617859613090332], [1437184512.0, 26080000.0, -0.6276913612907004], [1437186240.0, 26100000.0, -0.637423989748689], [1437187968.0, 26120000.0, -0.647055961569443], [1437189696.0, 26140000.0, -0.6565857557529545], [1437191424.0, 26160000.0, -0.6660118674342518], [1437193152.0, 26180000.0, -0.675332808121024], [1437194880.0, 26200000.0, -0.6845471059286877], [1437196608.0, 26220000.0, -0.6936533058128034], [1437198336.0, 26240000.0, -0.7026499697988496], [1437200064.0, 26260000.0, -0.7115356772092852], [1437201792.0, 26280000.0, -0.7203090248879063], [1437203520.0, 26300000.0, -0.7289686274214103], [1437205248.0, 26320000.0, -0.7375131173581746], [1437206976.0, 26340000.0, -0.7459411454241823], [1437208704.0, 26360000.0, -0.7542513807361034], [1437210432.0, 26380000.0, -0.762442511011447], [1437212160.0, 26400000.0, -0.7705132427757901], [1437213888.0, 26420000.0, -0.7784623015670217], [1437215616.0, 26440000.0, -0.7862884321366188], [1437217344.0, 26460000.0, -0.7939903986478348], [1437219072.0, 26480000.0, -0.8015669848708777], [1437220800.0, 26500000.0, -0.809016994374946], [1437222528.0, 26520000.0, -0.8163392507171842], [1437224256.0, 26540000.0, -0.8235325976284272], [1437225984.0, 
26560000.0, -0.830595899195812], [1437227712.0, 26580000.0, -0.8375280400421407], [1437229440.0, 26600000.0, -0.8443279255020155], [1437231168.0, 26620000.0, -0.8509944817946918], [1437232896.0, 26640000.0, -0.8575266561936519], [1437234624.0, 26660000.0, -0.8639234171928345], [1437236352.0, 26680000.0, -0.8701837546695262], [1437238080.0, 26700000.0, -0.8763066800438638], [1437239808.0, 26720000.0, -0.8822912264349531], [1437241536.0, 26740000.0, -0.888136448813544], [1437243264.0, 26760000.0, -0.8938414241512646], [1437244992.0, 26780000.0, -0.8994052515663699], [1437246720.0, 26800000.0, -0.9048270524660196], [1437248448.0, 26820000.0, -0.9101059706849953], [1437250176.0, 26840000.0, -0.9152411726209184], [1437251904.0, 26860000.0, -0.9202318473658695], [1437253632.0, 26880000.0, -0.9250772068344583], [1437255360.0, 26900000.0, -0.9297764858882513], [1437257088.0, 26920000.0, -0.9343289424566117], [1437258816.0, 26940000.0, -0.9387338576538734], [1437260544.0, 26960000.0, -0.9429905358928647], [1437262272.0, 26980000.0, -0.9470983049947443], [1437264000.0, 27000000.0, -0.9510565162951534], [1437265728.0, 27020000.0, -0.9548645447466425], [1437267456.0, 27040000.0, -0.9585217890173763], [1437269184.0, 27060000.0, -0.962027671586086], [1437270912.0, 27080000.0, -0.9653816388332738], [1437272640.0, 27100000.0, -0.9685831611286309], [1437274368.0, 27120000.0, -0.9716317329146743], [1437276096.0, 27140000.0, -0.9745268727865766], [1437277824.0, 27160000.0, -0.9772681235681935], [1437279552.0, 27180000.0, -0.9798550523842467], [1437281280.0, 27200000.0, -0.982287250728689], [1437283008.0, 27220000.0, -0.984564334529205], [1437284736.0, 27240000.0, -0.9866859442078681], [1437286464.0, 27260000.0, -0.988651744737914], [1437288192.0, 27280000.0, -0.9904614256966511], [1437289920.0, 27300000.0, -0.9921147013144777], [1437291648.0, 27320000.0, -0.9936113105200085], [1437293376.0, 27340000.0, -0.9949510169813002], [1437295104.0, 27360000.0, -0.9961336091431724], [1437296832.0, 27380000.0, -0.9971589002606138], [1437298560.0, 27400000.0, -0.9980267284282717], [1437300288.0, 27420000.0, -0.9987369566060176], [1437302016.0, 27440000.0, -0.9992894726405892], [1437303744.0, 27460000.0, -0.9996841892832999], [1437305472.0, 27480000.0, -0.9999210442038161], [1437307200.0, 27500000.0, -1.0], [1437308928.0, 27520000.0, -0.9999210442038161], [1437310656.0, 27540000.0, -0.9996841892832999], [1437312384.0, 27560000.0, -0.9992894726405893], [1437314112.0, 27580000.0, -0.9987369566060176], [1437315840.0, 27600000.0, -0.9980267284282718], [1437317568.0, 27620000.0, -0.9971589002606139], [1437319296.0, 27640000.0, -0.9961336091431725], [1437321024.0, 27660000.0, -0.9949510169813003], [1437322752.0, 27680000.0, -0.9936113105200087], [1437324480.0, 27700000.0, -0.9921147013144778], [1437326208.0, 27720000.0, -0.9904614256966513], [1437327936.0, 27740000.0, -0.9886517447379142], [1437329664.0, 27760000.0, -0.9866859442078684], [1437331392.0, 27780000.0, -0.9845643345292052], [1437333120.0, 27800000.0, -0.9822872507286894], [1437334848.0, 27820000.0, -0.979855052384247], [1437336576.0, 27840000.0, -0.9772681235681938], [1437338304.0, 27860000.0, -0.9745268727865769], [1437340032.0, 27880000.0, -0.9716317329146746], [1437341760.0, 27900000.0, -0.9685831611286312], [1437343488.0, 27920000.0, -0.9653816388332742], [1437345216.0, 27940000.0, -0.9620276715860864], [1437346944.0, 27960000.0, -0.9585217890173766], [1437348672.0, 27980000.0, -0.954864544746643], [1437350400.0, 28000000.0, -0.9510565162951538], [1437352128.0, 
28020000.0, -0.9470983049947448], [1437353856.0, 28040000.0, -0.9429905358928652], [1437355584.0, 28060000.0, -0.938733857653874], [1437357312.0, 28080000.0, -0.9343289424566121], [1437359040.0, 28100000.0, -0.9297764858882518], [1437360768.0, 28120000.0, -0.9250772068344587], [1437362496.0, 28140000.0, -0.9202318473658699], [1437364224.0, 28160000.0, -0.915241172620919], [1437365952.0, 28180000.0, -0.910105970684996], [1437367680.0, 28200000.0, -0.9048270524660201], [1437369408.0, 28220000.0, -0.8994052515663704], [1437371136.0, 28240000.0, -0.8938414241512651], [1437372864.0, 28260000.0, -0.8881364488135446], [1437374592.0, 28280000.0, -0.8822912264349537], [1437376320.0, 28300000.0, -0.8763066800438645], [1437378048.0, 28320000.0, -0.8701837546695269], [1437379776.0, 28340000.0, -0.8639234171928352], [1437381504.0, 28360000.0, -0.8575266561936525], [1437383232.0, 28380000.0, -0.8509944817946925], [1437384960.0, 28400000.0, -0.8443279255020162], [1437386688.0, 28420000.0, -0.8375280400421413], [1437388416.0, 28440000.0, -0.8305958991958127], [1437390144.0, 28460000.0, -0.8235325976284279], [1437391872.0, 28480000.0, -0.8163392507171849], [1437393600.0, 28500000.0, -0.8090169943749468], [1437395328.0, 28520000.0, -0.8015669848708785], [1437397056.0, 28540000.0, -0.7939903986478356], [1437398784.0, 28560000.0, -0.7862884321366197], [1437400512.0, 28580000.0, -0.7784623015670225], [1437402240.0, 28600000.0, -0.770513242775791], [1437403968.0, 28620000.0, -0.7624425110114479], [1437405696.0, 28640000.0, -0.7542513807361043], [1437407424.0, 28660000.0, -0.7459411454241832], [1437409152.0, 28680000.0, -0.7375131173581755], [1437410880.0, 28700000.0, -0.7289686274214112], [1437412608.0, 28720000.0, -0.7203090248879072], [1437414336.0, 28740000.0, -0.7115356772092862], [1437416064.0, 28760000.0, -0.7026499697988506], [1437417792.0, 28780000.0, -0.6936533058128044], [1437419520.0, 28800000.0, -0.6845471059286886], [1437421248.0, 28820000.0, -0.675332808121025], [1437422976.0, 28840000.0, -0.6660118674342528], [1437424704.0, 28860000.0, -0.6565857557529555], [1437426432.0, 28880000.0, -0.647055961569444], [1437428160.0, 28900000.0, -0.63742398974869], [1437429888.0, 28920000.0, -0.6276913612907014], [1437431616.0, 28940000.0, -0.6178596130903331], [1437433344.0, 28960000.0, -0.6079302976946075], [1437435072.0, 28980000.0, -0.5979049830575188], [1437436800.0, 29000000.0, -0.5877852522924737], [1437438528.0, 29020000.0, -0.5775727034222689], [1437440256.0, 29040000.0, -0.5672689491267584], [1437441984.0, 29060000.0, -0.5568756164881876], [1437443712.0, 29080000.0, -0.5463943467342693], [1437445440.0, 29100000.0, -0.5358267949789975], [1437447168.0, 29120000.0, -0.5251746299612973], [1437448896.0, 29140000.0, -0.5144395337815056], [1437450624.0, 29160000.0, -0.5036232016357607], [1437452352.0, 29180000.0, -0.4927273415482921], [1437454080.0, 29200000.0, -0.48175367410171654], [1437455808.0, 29220000.0, -0.47070393216533135], [1437457536.0, 29240000.0, -0.45957986062148737], [1437459264.0, 29260000.0, -0.44838321609003245], [1437460992.0, 29280000.0, -0.4371157666509338], [1437462720.0, 29300000.0, -0.42577929156507105], [1437464448.0, 29320000.0, -0.4143755809932865], [1437466176.0, 29340000.0, -0.40290643571366247], [1437467904.0, 29360000.0, -0.3913736668372029], [1437469632.0, 29380000.0, -0.3797790955218024], [1437471360.0, 29400000.0, -0.36812455268468], [1437473088.0, 29420000.0, -0.35641187871325014], [1437474816.0, 29440000.0, -0.3446429231745172], [1437476544.0, 29460000.0, 
-0.3328195445229875], [1437478272.0, 29480000.0, -0.3209436098072111], [1437480000.0, 29500000.0, -0.3090169943749464], [1437481728.0, 29520000.0, -0.29704158157703464], [1437483456.0, 29540000.0, -0.2850192624699766], [1437485184.0, 29560000.0, -0.2729519355173265], [1437486912.0, 29580000.0, -0.2608415062898955], [1437488640.0, 29600000.0, -0.2486898871648541], [1437490368.0, 29620000.0, -0.23649899702372476], [1437492096.0, 29640000.0, -0.22427076094938203], [1437493824.0, 29660000.0, -0.2120071099220528], [1437495552.0, 29680000.0, -0.1997099805144094], [1437497280.0, 29700000.0, -0.1873813145857243], [1437499008.0, 29720000.0, -0.17502305897527648], [1437500736.0, 29740000.0, -0.1626371651948848], [1437502464.0, 29760000.0, -0.15022558912075903], [1437504192.0, 29780000.0, -0.1377902906846373], [1437505920.0, 29800000.0, -0.12533323356430426], [1437507648.0, 29820000.0, -0.11285638487348248], [1437509376.0, 29840000.0, -0.10036171485121646], [1437511104.0, 29860000.0, -0.087851196550742], [1437512832.0, 29880000.0, -0.07532680552793232], [1437514560.0, 29900000.0, -0.06279051952931375], [1437516288.0, 29920000.0, -0.050244318179770715], [1437518016.0, 29940000.0, -0.03769018266993293], [1437519744.0, 29960000.0, -0.02513009544333665], [1437521472.0, 29980000.0, -0.01256603988335256], [1437523200.0, 30000000.0, -7.347880794884119e-16], [1437524928.0, 30020000.0, 0.012566039883351091], [1437526656.0, 30040000.0, 0.02513009544333518], [1437528384.0, 30060000.0, 0.03769018266993146], [1437530112.0, 30080000.0, 0.050244318179769244], [1437531840.0, 30100000.0, 0.06279051952931229], [1437533568.0, 30120000.0, 0.07532680552793085], [1437535296.0, 30140000.0, 0.08785119655074053], [1437537024.0, 30160000.0, 0.100361714851215], [1437538752.0, 30180000.0, 0.11285638487348103], [1437540480.0, 30200000.0, 0.12533323356430282], [1437542208.0, 30220000.0, 0.13779029068463586], [1437543936.0, 30240000.0, 0.1502255891207576], [1437545664.0, 30260000.0, 0.16263716519488336], [1437547392.0, 30280000.0, 0.17502305897527504], [1437549120.0, 30300000.0, 0.18738131458572285], [1437550848.0, 30320000.0, 0.19970998051440797], [1437552576.0, 30340000.0, 0.21200710992205135], [1437554304.0, 30360000.0, 0.2242707609493806], [1437556032.0, 30380000.0, 0.23649899702372335], [1437557760.0, 30400000.0, 0.24868988716485269], [1437559488.0, 30420000.0, 0.2608415062898941], [1437561216.0, 30440000.0, 0.27295193551732505], [1437562944.0, 30460000.0, 0.2850192624699752], [1437564672.0, 30480000.0, 0.29704158157703325], [1437566400.0, 30500000.0, 0.309016994374945], [1437568128.0, 30520000.0, 0.3209436098072097], [1437569856.0, 30540000.0, 0.3328195445229861], [1437571584.0, 30560000.0, 0.34464292317451584], [1437573312.0, 30580000.0, 0.35641187871324875], [1437575040.0, 30600000.0, 0.3681245526846786], [1437576768.0, 30620000.0, 0.379779095521801], [1437578496.0, 30640000.0, 0.3913736668372016], [1437580224.0, 30660000.0, 0.40290643571366114], [1437581952.0, 30680000.0, 0.41437558099328514], [1437583680.0, 30700000.0, 0.4257792915650697], [1437585408.0, 30720000.0, 0.4371157666509325], [1437587136.0, 30740000.0, 0.4483832160900311], [1437588864.0, 30760000.0, 0.45957986062148604], [1437590592.0, 30780000.0, 0.4707039321653301], [1437592320.0, 30800000.0, 0.4817536741017152], [1437594048.0, 30820000.0, 0.49272734154829084], [1437595776.0, 30840000.0, 0.5036232016357595], [1437597504.0, 30860000.0, 0.5144395337815044], [1437599232.0, 30880000.0, 0.525174629961296], [1437600960.0, 30900000.0, 0.5358267949789963], 
[1437602688.0, 30920000.0, 0.5463943467342681], [1437604416.0, 30940000.0, 0.5568756164881864], [1437606144.0, 30960000.0, 0.5672689491267572], [1437607872.0, 30980000.0, 0.5775727034222676], [1437609600.0, 31000000.0, 0.5877852522924725], [1437611328.0, 31020000.0, 0.5979049830575176], [1437613056.0, 31040000.0, 0.6079302976946064], [1437614784.0, 31060000.0, 0.6178596130903319], [1437616512.0, 31080000.0, 0.6276913612907002], [1437618240.0, 31100000.0, 0.6374239897486889], [1437619968.0, 31120000.0, 0.6470559615694429], [1437621696.0, 31140000.0, 0.6565857557529544], [1437623424.0, 31160000.0, 0.6660118674342517], [1437625152.0, 31180000.0, 0.6753328081210239], [1437626880.0, 31200000.0, 0.6845471059286876], [1437628608.0, 31220000.0, 0.6936533058128034], [1437630336.0, 31240000.0, 0.7026499697988495], [1437632064.0, 31260000.0, 0.7115356772092851], [1437633792.0, 31280000.0, 0.7203090248879062], [1437635520.0, 31300000.0, 0.7289686274214102], [1437637248.0, 31320000.0, 0.7375131173581745], [1437638976.0, 31340000.0, 0.7459411454241822], [1437640704.0, 31360000.0, 0.7542513807361034], [1437642432.0, 31380000.0, 0.762442511011447], [1437644160.0, 31400000.0, 0.7705132427757901], [1437645888.0, 31420000.0, 0.7784623015670216], [1437647616.0, 31440000.0, 0.7862884321366188], [1437649344.0, 31460000.0, 0.7939903986478347], [1437651072.0, 31480000.0, 0.8015669848708776], [1437652800.0, 31500000.0, 0.8090169943749459], [1437654528.0, 31520000.0, 0.816339250717184], [1437656256.0, 31540000.0, 0.8235325976284271], [1437657984.0, 31560000.0, 0.8305958991958119], [1437659712.0, 31580000.0, 0.8375280400421405], [1437661440.0, 31600000.0, 0.8443279255020154], [1437663168.0, 31620000.0, 0.8509944817946918], [1437664896.0, 31640000.0, 0.8575266561936518], [1437666624.0, 31660000.0, 0.8639234171928345], [1437668352.0, 31680000.0, 0.8701837546695262], [1437670080.0, 31700000.0, 0.8763066800438637], [1437671808.0, 31720000.0, 0.8822912264349531], [1437673536.0, 31740000.0, 0.8881364488135439], [1437675264.0, 31760000.0, 0.8938414241512644], [1437676992.0, 31780000.0, 0.8994052515663697], [1437678720.0, 31800000.0, 0.9048270524660195], [1437680448.0, 31820000.0, 0.9101059706849953], [1437682176.0, 31840000.0, 0.9152411726209183], [1437683904.0, 31860000.0, 0.9202318473658694], [1437685632.0, 31880000.0, 0.9250772068344582], [1437687360.0, 31900000.0, 0.9297764858882512], [1437689088.0, 31920000.0, 0.9343289424566116], [1437690816.0, 31940000.0, 0.9387338576538734], [1437692544.0, 31960000.0, 0.9429905358928647], [1437694272.0, 31980000.0, 0.9470983049947442], [1437696000.0, 32000000.0, 0.9510565162951533], [1437697728.0, 32020000.0, 0.9548645447466425], [1437699456.0, 32040000.0, 0.9585217890173763], [1437701184.0, 32060000.0, 0.962027671586086], [1437702912.0, 32080000.0, 0.9653816388332738], [1437704640.0, 32100000.0, 0.9685831611286309], [1437706368.0, 32120000.0, 0.9716317329146743], [1437708096.0, 32140000.0, 0.9745268727865766], [1437709824.0, 32160000.0, 0.9772681235681935], [1437711552.0, 32180000.0, 0.9798550523842467], [1437713280.0, 32200000.0, 0.982287250728689], [1437715008.0, 32220000.0, 0.984564334529205], [1437716736.0, 32240000.0, 0.9866859442078681], [1437718464.0, 32260000.0, 0.988651744737914], [1437720192.0, 32280000.0, 0.9904614256966511], [1437721920.0, 32300000.0, 0.9921147013144775], [1437723648.0, 32320000.0, 0.9936113105200085], [1437725376.0, 32340000.0, 0.9949510169813002], [1437727104.0, 32360000.0, 0.9961336091431724], [1437728832.0, 32380000.0, 0.9971589002606138], 
[1437730560.0, 32400000.0, 0.9980267284282717], [1437732288.0, 32420000.0, 0.9987369566060176], [1437734016.0, 32440000.0, 0.9992894726405892], [1437735744.0, 32460000.0, 0.9996841892832999], [1437737472.0, 32480000.0, 0.9999210442038161], [1437739200.0, 32500000.0, 1.0], [1437740928.0, 32520000.0, 0.9999210442038161], [1437742656.0, 32540000.0, 0.9996841892833], [1437744384.0, 32560000.0, 0.9992894726405893], [1437746112.0, 32580000.0, 0.9987369566060176], [1437747840.0, 32600000.0, 0.9980267284282718], [1437749568.0, 32620000.0, 0.9971589002606139], [1437751296.0, 32640000.0, 0.9961336091431725], [1437753024.0, 32660000.0, 0.9949510169813003], [1437754752.0, 32680000.0, 0.9936113105200087], [1437756480.0, 32700000.0, 0.9921147013144778], [1437758208.0, 32720000.0, 0.9904614256966513], [1437759936.0, 32740000.0, 0.9886517447379142], [1437761664.0, 32760000.0, 0.9866859442078684], [1437763392.0, 32780000.0, 0.9845643345292052], [1437765120.0, 32800000.0, 0.9822872507286894], [1437766848.0, 32820000.0, 0.9798550523842471], [1437768576.0, 32840000.0, 0.9772681235681938], [1437770304.0, 32860000.0, 0.9745268727865769], [1437772032.0, 32880000.0, 0.9716317329146748], [1437773760.0, 32900000.0, 0.9685831611286312], [1437775488.0, 32920000.0, 0.9653816388332742], [1437777216.0, 32940000.0, 0.9620276715860865], [1437778944.0, 32960000.0, 0.9585217890173767], [1437780672.0, 32980000.0, 0.954864544746643], [1437782400.0, 33000000.0, 0.9510565162951539], [1437784128.0, 33020000.0, 0.9470983049947448], [1437785856.0, 33040000.0, 0.9429905358928653], [1437787584.0, 33060000.0, 0.938733857653874], [1437789312.0, 33080000.0, 0.9343289424566121], [1437791040.0, 33100000.0, 0.9297764858882518], [1437792768.0, 33120000.0, 0.9250772068344588], [1437794496.0, 33140000.0, 0.92023184736587], [1437796224.0, 33160000.0, 0.915241172620919], [1437797952.0, 33180000.0, 0.910105970684996], [1437799680.0, 33200000.0, 0.9048270524660201], [1437801408.0, 33220000.0, 0.8994052515663705], [1437803136.0, 33240000.0, 0.8938414241512652], [1437804864.0, 33260000.0, 0.8881364488135447], [1437806592.0, 33280000.0, 0.8822912264349538], [1437808320.0, 33300000.0, 0.8763066800438645], [1437810048.0, 33320000.0, 0.870183754669527], [1437811776.0, 33340000.0, 0.8639234171928353], [1437813504.0, 33360000.0, 0.8575266561936526], [1437815232.0, 33380000.0, 0.8509944817946926], [1437816960.0, 33400000.0, 0.8443279255020163], [1437818688.0, 33420000.0, 0.8375280400421414], [1437820416.0, 33440000.0, 0.8305958991958128], [1437822144.0, 33460000.0, 0.823532597628428], [1437823872.0, 33480000.0, 0.816339250717185], [1437825600.0, 33500000.0, 0.8090169943749469], [1437827328.0, 33520000.0, 0.8015669848708786], [1437829056.0, 33540000.0, 0.7939903986478357], [1437830784.0, 33560000.0, 0.7862884321366198], [1437832512.0, 33580000.0, 0.7784623015670226], [1437834240.0, 33600000.0, 0.7705132427757911], [1437835968.0, 33620000.0, 0.762442511011448], [1437837696.0, 33640000.0, 0.7542513807361044], [1437839424.0, 33660000.0, 0.7459411454241833], [1437841152.0, 33680000.0, 0.7375131173581756], [1437842880.0, 33700000.0, 0.7289686274214113], [1437844608.0, 33720000.0, 0.7203090248879073], [1437846336.0, 33740000.0, 0.7115356772092862], [1437848064.0, 33760000.0, 0.7026499697988507], [1437849792.0, 33780000.0, 0.6936533058128045], [1437851520.0, 33800000.0, 0.6845471059286887], [1437853248.0, 33820000.0, 0.6753328081210251], [1437854976.0, 33840000.0, 0.6660118674342529], [1437856704.0, 33860000.0, 0.6565857557529556], [1437858432.0, 33880000.0, 
0.6470559615694441], [1437860160.0, 33900000.0, 0.6374239897486901], [1437861888.0, 33920000.0, 0.6276913612907015], [1437863616.0, 33940000.0, 0.6178596130903331], [1437865344.0, 33960000.0, 0.6079302976946076], [1437867072.0, 33980000.0, 0.5979049830575189], [1437868800.0, 34000000.0, 0.5877852522924738], [1437870528.0, 34020000.0, 0.577572703422269], [1437872256.0, 34040000.0, 0.5672689491267585], [1437873984.0, 34060000.0, 0.5568756164881877], [1437875712.0, 34080000.0, 0.5463943467342695], [1437877440.0, 34100000.0, 0.5358267949789977], [1437879168.0, 34120000.0, 0.5251746299612974], [1437880896.0, 34140000.0, 0.5144395337815058], [1437882624.0, 34160000.0, 0.5036232016357608], [1437884352.0, 34180000.0, 0.49272734154829223], [1437886080.0, 34200000.0, 0.48175367410171666], [1437887808.0, 34220000.0, 0.47070393216533146], [1437889536.0, 34240000.0, 0.4595798606214875], [1437891264.0, 34260000.0, 0.44838321609003257], [1437892992.0, 34280000.0, 0.43711576665093393], [1437894720.0, 34300000.0, 0.42577929156507116], [1437896448.0, 34320000.0, 0.4143755809932866], [1437898176.0, 34340000.0, 0.4029064357136626], [1437899904.0, 34360000.0, 0.39137366683720304], [1437901632.0, 34380000.0, 0.3797790955218025], [1437903360.0, 34400000.0, 0.3681245526846801], [1437905088.0, 34420000.0, 0.35641187871325025], [1437906816.0, 34440000.0, 0.34464292317451733], [1437908544.0, 34460000.0, 0.3328195445229876], [1437910272.0, 34480000.0, 0.3209436098072112], [1437912000.0, 34500000.0, 0.30901699437494656], [1437913728.0, 34520000.0, 0.29704158157703475], [1437915456.0, 34540000.0, 0.2850192624699767], [1437917184.0, 34560000.0, 0.2729519355173266], [1437918912.0, 34580000.0, 0.2608415062898956], [1437920640.0, 34600000.0, 0.24868988716485424], [1437922368.0, 34620000.0, 0.2364989970237249], [1437924096.0, 34640000.0, 0.22427076094938214], [1437925824.0, 34660000.0, 0.2120071099220529], [1437927552.0, 34680000.0, 0.19970998051440952], [1437929280.0, 34700000.0, 0.1873813145857244], [1437931008.0, 34720000.0, 0.17502305897527662], [1437932736.0, 34740000.0, 0.16263716519488494], [1437934464.0, 34760000.0, 0.15022558912075917], [1437936192.0, 34780000.0, 0.13779029068463744], [1437937920.0, 34800000.0, 0.1253332335643044], [1437939648.0, 34820000.0, 0.11285638487348261], [1437941376.0, 34840000.0, 0.10036171485121659], [1437943104.0, 34860000.0, 0.08785119655074211], [1437944832.0, 34880000.0, 0.07532680552793244], [1437946560.0, 34900000.0, 0.06279051952931387], [1437948288.0, 34920000.0, 0.05024431817977084], [1437950016.0, 34940000.0, 0.03769018266993305], [1437951744.0, 34960000.0, 0.02513009544333677], [1437953472.0, 34980000.0, 0.012566039883352684], [1437955200.0, 35000000.0, 8.572527594031472e-16], [1437956928.0, 35020000.0, -0.012566039883350968], [1437958656.0, 35040000.0, -0.025130095443335058], [1437960384.0, 35060000.0, -0.037690182669931335], [1437962112.0, 35080000.0, -0.050244318179769126], [1437963840.0, 35100000.0, -0.06279051952931217], [1437965568.0, 35120000.0, -0.07532680552793074], [1437967296.0, 35140000.0, -0.0878511965507404], [1437969024.0, 35160000.0, -0.10036171485121488], [1437970752.0, 35180000.0, -0.1128563848734809], [1437972480.0, 35200000.0, -0.12533323356430268], [1437974208.0, 35220000.0, -0.13779029068463572], [1437975936.0, 35240000.0, -0.15022558912075745], [1437977664.0, 35260000.0, -0.16263716519488325], [1437979392.0, 35280000.0, -0.17502305897527493], [1437981120.0, 35300000.0, -0.18738131458572274], [1437982848.0, 35320000.0, -0.19970998051440783], 
[1437984576.0, 35340000.0, -0.2120071099220512], [1437986304.0, 35360000.0, -0.22427076094938048], [1437988032.0, 35380000.0, -0.23649899702372323], [1437989760.0, 35400000.0, -0.24868988716485257], [1437991488.0, 35420000.0, -0.26084150628989394], [1437993216.0, 35440000.0, -0.27295193551732494], [1437994944.0, 35460000.0, -0.2850192624699751], [1437996672.0, 35480000.0, -0.29704158157703314], [1437998400.0, 35500000.0, -0.3090169943749449], [1438000128.0, 35520000.0, -0.3209436098072096], [1438001856.0, 35540000.0, -0.332819544522986], [1438003584.0, 35560000.0, -0.3446429231745157], [1438005312.0, 35580000.0, -0.35641187871324864], [1438007040.0, 35600000.0, -0.3681245526846785], [1438008768.0, 35620000.0, -0.3797790955218009], [1438010496.0, 35640000.0, -0.3913736668372015], [1438012224.0, 35660000.0, -0.402906435713661], [1438013952.0, 35680000.0, -0.41437558099328503], [1438015680.0, 35700000.0, -0.4257792915650696], [1438017408.0, 35720000.0, -0.4371157666509324], [1438019136.0, 35740000.0, -0.448383216090031], [1438020864.0, 35760000.0, -0.4595798606214859], [1438022592.0, 35780000.0, -0.47070393216532996], [1438024320.0, 35800000.0, -0.48175367410171516], [1438026048.0, 35820000.0, -0.49272734154829073], [1438027776.0, 35840000.0, -0.5036232016357594], [1438029504.0, 35860000.0, -0.5144395337815043], [1438031232.0, 35880000.0, -0.5251746299612959], [1438032960.0, 35900000.0, -0.5358267949789962], [1438034688.0, 35920000.0, -0.546394346734268], [1438036416.0, 35940000.0, -0.5568756164881863], [1438038144.0, 35960000.0, -0.5672689491267571], [1438039872.0, 35980000.0, -0.5775727034222675], [1438041600.0, 36000000.0, -0.5877852522924725], [1438043328.0, 36020000.0, -0.5979049830575175], [1438045056.0, 36040000.0, -0.6079302976946063], [1438046784.0, 36060000.0, -0.6178596130903318], [1438048512.0, 36080000.0, -0.6276913612907001], [1438050240.0, 36100000.0, -0.6374239897486887], [1438051968.0, 36120000.0, -0.6470559615694428], [1438053696.0, 36140000.0, -0.6565857557529543], [1438055424.0, 36160000.0, -0.6660118674342516], [1438057152.0, 36180000.0, -0.6753328081210238], [1438058880.0, 36200000.0, -0.6845471059286875], [1438060608.0, 36220000.0, -0.6936533058128033], [1438062336.0, 36240000.0, -0.7026499697988495], [1438064064.0, 36260000.0, -0.7115356772092851], [1438065792.0, 36280000.0, -0.720309024887906], [1438067520.0, 36300000.0, -0.7289686274214102], [1438069248.0, 36320000.0, -0.7375131173581745], [1438070976.0, 36340000.0, -0.7459411454241821], [1438072704.0, 36360000.0, -0.7542513807361033], [1438074432.0, 36380000.0, -0.7624425110114469], [1438076160.0, 36400000.0, -0.77051324277579], [1438077888.0, 36420000.0, -0.7784623015670215], [1438079616.0, 36440000.0, -0.7862884321366187], [1438081344.0, 36460000.0, -0.7939903986478346], [1438083072.0, 36480000.0, -0.8015669848708775], [1438084800.0, 36500000.0, -0.8090169943749459], [1438086528.0, 36520000.0, -0.816339250717184], [1438088256.0, 36540000.0, -0.823532597628427], [1438089984.0, 36560000.0, -0.8305958991958118], [1438091712.0, 36580000.0, -0.8375280400421404], [1438093440.0, 36600000.0, -0.8443279255020154], [1438095168.0, 36620000.0, -0.8509944817946917], [1438096896.0, 36640000.0, -0.8575266561936518], [1438098624.0, 36660000.0, -0.8639234171928344], [1438100352.0, 36680000.0, -0.8701837546695261], [1438102080.0, 36700000.0, -0.8763066800438637], [1438103808.0, 36720000.0, -0.882291226434953], [1438105536.0, 36740000.0, -0.8881364488135439], [1438107264.0, 36760000.0, -0.8938414241512644], [1438108992.0, 
36780000.0, -0.8994052515663697], [1438110720.0, 36800000.0, -0.9048270524660195], [1438112448.0, 36820000.0, -0.9101059706849953], [1438114176.0, 36840000.0, -0.9152411726209183], [1438115904.0, 36860000.0, -0.9202318473658694], [1438117632.0, 36880000.0, -0.9250772068344582], [1438119360.0, 36900000.0, -0.9297764858882512], [1438121088.0, 36920000.0, -0.9343289424566116], [1438122816.0, 36940000.0, -0.9387338576538734], [1438124544.0, 36960000.0, -0.9429905358928646], [1438126272.0, 36980000.0, -0.9470983049947442], [1438128000.0, 37000000.0, -0.9510565162951533], [1438129728.0, 37020000.0, -0.9548645447466425], [1438131456.0, 37040000.0, -0.9585217890173762], [1438133184.0, 37060000.0, -0.9620276715860859], [1438134912.0, 37080000.0, -0.9653816388332738], [1438136640.0, 37100000.0, -0.9685831611286309], [1438138368.0, 37120000.0, -0.9716317329146743], [1438140096.0, 37140000.0, -0.9745268727865765], [1438141824.0, 37160000.0, -0.9772681235681935], [1438143552.0, 37180000.0, -0.9798550523842467], [1438145280.0, 37200000.0, -0.982287250728689], [1438147008.0, 37220000.0, -0.984564334529205], [1438148736.0, 37240000.0, -0.9866859442078681], [1438150464.0, 37260000.0, -0.988651744737914], [1438152192.0, 37280000.0, -0.9904614256966511], [1438153920.0, 37300000.0, -0.9921147013144775], [1438155648.0, 37320000.0, -0.9936113105200085], [1438157376.0, 37340000.0, -0.9949510169813002], [1438159104.0, 37360000.0, -0.9961336091431724], [1438160832.0, 37380000.0, -0.9971589002606138], [1438162560.0, 37400000.0, -0.9980267284282717], [1438164288.0, 37420000.0, -0.9987369566060175], [1438166016.0, 37440000.0, -0.9992894726405892], [1438167744.0, 37460000.0, -0.9996841892832999], [1438169472.0, 37480000.0, -0.9999210442038161], [1438171200.0, 37500000.0, -1.0], [1438172928.0, 37520000.0, -0.9999210442038161], [1438174656.0, 37540000.0, -0.9996841892833], [1438176384.0, 37560000.0, -0.9992894726405893], [1438178112.0, 37580000.0, -0.9987369566060176], [1438179840.0, 37600000.0, -0.9980267284282718], [1438181568.0, 37620000.0, -0.9971589002606139], [1438183296.0, 37640000.0, -0.9961336091431725], [1438185024.0, 37660000.0, -0.9949510169813004], [1438186752.0, 37680000.0, -0.9936113105200087], [1438188480.0, 37700000.0, -0.9921147013144778], [1438190208.0, 37720000.0, -0.9904614256966513], [1438191936.0, 37740000.0, -0.9886517447379142], [1438193664.0, 37760000.0, -0.9866859442078685], [1438195392.0, 37780000.0, -0.9845643345292053], [1438197120.0, 37800000.0, -0.9822872507286894], [1438198848.0, 37820000.0, -0.9798550523842471], [1438200576.0, 37840000.0, -0.9772681235681938], [1438202304.0, 37860000.0, -0.9745268727865769], [1438204032.0, 37880000.0, -0.9716317329146748], [1438205760.0, 37900000.0, -0.9685831611286313], [1438207488.0, 37920000.0, -0.9653816388332742], [1438209216.0, 37940000.0, -0.9620276715860865], [1438210944.0, 37960000.0, -0.9585217890173767], [1438212672.0, 37980000.0, -0.954864544746643], [1438214400.0, 38000000.0, -0.9510565162951539], [1438216128.0, 38020000.0, -0.9470983049947448], [1438217856.0, 38040000.0, -0.9429905358928653], [1438219584.0, 38060000.0, -0.938733857653874], [1438221312.0, 38080000.0, -0.9343289424566122], [1438223040.0, 38100000.0, -0.9297764858882519], [1438224768.0, 38120000.0, -0.9250772068344588], [1438226496.0, 38140000.0, -0.92023184736587], [1438228224.0, 38160000.0, -0.915241172620919], [1438229952.0, 38180000.0, -0.910105970684996], [1438231680.0, 38200000.0, -0.9048270524660202], [1438233408.0, 38220000.0, -0.8994052515663705], [1438235136.0, 
38240000.0, -0.8938414241512652], [1438236864.0, 38260000.0, -0.8881364488135447], [1438238592.0, 38280000.0, -0.8822912264349538], [1438240320.0, 38300000.0, -0.8763066800438646], [1438242048.0, 38320000.0, -0.870183754669527], [1438243776.0, 38340000.0, -0.8639234171928353], [1438245504.0, 38360000.0, -0.8575266561936526], [1438247232.0, 38380000.0, -0.8509944817946927], [1438248960.0, 38400000.0, -0.8443279255020163], [1438250688.0, 38420000.0, -0.8375280400421415], [1438252416.0, 38440000.0, -0.8305958991958129], [1438254144.0, 38460000.0, -0.8235325976284281], [1438255872.0, 38480000.0, -0.816339250717185], [1438257600.0, 38500000.0, -0.8090169943749469], [1438259328.0, 38520000.0, -0.8015669848708786], [1438261056.0, 38540000.0, -0.7939903986478358], [1438262784.0, 38560000.0, -0.7862884321366198], [1438264512.0, 38580000.0, -0.7784623015670227], [1438266240.0, 38600000.0, -0.7705132427757911], [1438267968.0, 38620000.0, -0.7624425110114481], [1438269696.0, 38640000.0, -0.7542513807361045], [1438271424.0, 38660000.0, -0.7459411454241834], [1438273152.0, 38680000.0, -0.7375131173581757], [1438274880.0, 38700000.0, -0.7289686274214114], [1438276608.0, 38720000.0, -0.7203090248879074], [1438278336.0, 38740000.0, -0.7115356772092863], [1438280064.0, 38760000.0, -0.7026499697988507], [1438281792.0, 38780000.0, -0.6936533058128046], [1438283520.0, 38800000.0, -0.6845471059286888], [1438285248.0, 38820000.0, -0.6753328081210251], [1438286976.0, 38840000.0, -0.666011867434253], [1438288704.0, 38860000.0, -0.6565857557529557], [1438290432.0, 38880000.0, -0.6470559615694442], [1438292160.0, 38900000.0, -0.6374239897486902], [1438293888.0, 38920000.0, -0.6276913612907016], [1438295616.0, 38940000.0, -0.6178596130903332], [1438297344.0, 38960000.0, -0.6079302976946077], [1438299072.0, 38980000.0, -0.597904983057519], [1438300800.0, 39000000.0, -0.5877852522924739], [1438302528.0, 39020000.0, -0.577572703422269], [1438304256.0, 39040000.0, -0.5672689491267585], [1438305984.0, 39060000.0, -0.5568756164881878], [1438307712.0, 39080000.0, -0.5463943467342696], [1438309440.0, 39100000.0, -0.5358267949789978], [1438311168.0, 39120000.0, -0.5251746299612975], [1438312896.0, 39140000.0, -0.5144395337815059], [1438314624.0, 39160000.0, -0.5036232016357609], [1438316352.0, 39180000.0, -0.49272734154829234], [1438318080.0, 39200000.0, -0.48175367410171677], [1438319808.0, 39220000.0, -0.4707039321653316], [1438321536.0, 39240000.0, -0.4595798606214876], [1438323264.0, 39260000.0, -0.4483832160900327], [1438324992.0, 39280000.0, -0.43711576665093405], [1438326720.0, 39300000.0, -0.42577929156507127], [1438328448.0, 39320000.0, -0.4143755809932867], [1438330176.0, 39340000.0, -0.4029064357136627], [1438331904.0, 39360000.0, -0.39137366683720315], [1438333632.0, 39380000.0, -0.3797790955218026], [1438335360.0, 39400000.0, -0.3681245526846802], [1438337088.0, 39420000.0, -0.35641187871325036], [1438338816.0, 39440000.0, -0.34464292317451745], [1438340544.0, 39460000.0, -0.33281954452298773], [1438342272.0, 39480000.0, -0.32094360980721137], [1438344000.0, 39500000.0, -0.3090169943749467], [1438345728.0, 39520000.0, -0.29704158157703486], [1438347456.0, 39540000.0, -0.28501926246997683], [1438349184.0, 39560000.0, -0.2729519355173267], [1438350912.0, 39580000.0, -0.2608415062898957], [1438352640.0, 39600000.0, -0.24868988716485435], [1438354368.0, 39620000.0, -0.236498997023725], [1438356096.0, 39640000.0, -0.22427076094938228], [1438357824.0, 39660000.0, -0.21200710992205302], [1438359552.0, 39680000.0, 
-0.19970998051440964], [1438361280.0, 39700000.0, -0.18738131458572455], [1438363008.0, 39720000.0, -0.17502305897527673], [1438364736.0, 39740000.0, -0.16263716519488505], [1438366464.0, 39760000.0, -0.15022558912075928], [1438368192.0, 39780000.0, -0.13779029068463755], [1438369920.0, 39800000.0, -0.1253332335643045], [1438371648.0, 39820000.0, -0.11285638487348272], [1438373376.0, 39840000.0, -0.10036171485121671], [1438375104.0, 39860000.0, -0.08785119655074224], [1438376832.0, 39880000.0, -0.07532680552793257], [1438378560.0, 39900000.0, -0.062790519529314], [1438380288.0, 39920000.0, -0.05024431817977096], [1438382016.0, 39940000.0, -0.037690182669933174], [1438383744.0, 39960000.0, -0.025130095443336893], [1438385472.0, 39980000.0, -0.012566039883352805], [1438387200.0, 40000000.0, -9.797174393178826e-16], [1438388928.0, 40020000.0, 0.012566039883350847], [1438390656.0, 40040000.0, 0.025130095443334936], [1438392384.0, 40060000.0, 0.03769018266993476], [1438394112.0, 40080000.0, 0.050244318179769], [1438395840.0, 40100000.0, 0.06279051952931204], [1438397568.0, 40120000.0, 0.07532680552793061], [1438399296.0, 40140000.0, 0.08785119655074382], [1438401024.0, 40160000.0, 0.10036171485121476], [1438402752.0, 40180000.0, 0.11285638487348078], [1438404480.0, 40200000.0, 0.12533323356429904], [1438406208.0, 40220000.0, 0.13779029068463913], [1438407936.0, 40240000.0, 0.15022558912075734], [1438409664.0, 40260000.0, 0.1626371651948831], [1438411392.0, 40280000.0, 0.1750230589752713], [1438413120.0, 40300000.0, 0.1873813145857261], [1438414848.0, 40320000.0, 0.19970998051440772], [1438416576.0, 40340000.0, 0.2120071099220511], [1438418304.0, 40360000.0, 0.2242707609493769], [1438420032.0, 40380000.0, 0.23649899702372656], [1438421760.0, 40400000.0, 0.24868988716485246], [1438423488.0, 40420000.0, 0.26084150628989383], [1438425216.0, 40440000.0, 0.2729519355173214], [1438426944.0, 40460000.0, 0.2850192624699784], [1438428672.0, 40480000.0, 0.29704158157703303], [1438430400.0, 40500000.0, 0.3090169943749448], [1438432128.0, 40520000.0, 0.32094360980720615], [1438433856.0, 40540000.0, 0.3328195445229859], [1438435584.0, 40560000.0, 0.3446429231745156], [1438437312.0, 40580000.0, 0.35641187871324853], [1438439040.0, 40600000.0, 0.36812455268467503], [1438440768.0, 40620000.0, 0.3797790955218008], [1438442496.0, 40640000.0, 0.3913736668372014], [1438444224.0, 40660000.0, 0.4029064357136609], [1438445952.0, 40680000.0, 0.4143755809932817], [1438447680.0, 40700000.0, 0.4257792915650727], [1438449408.0, 40720000.0, 0.43711576665093227], [1438451136.0, 40740000.0, 0.4483832160900309], [1438452864.0, 40760000.0, 0.4595798606214858], [1438454592.0, 40780000.0, 0.470703932165333], [1438456320.0, 40800000.0, 0.48175367410171505], [1438458048.0, 40820000.0, 0.4927273415482906], [1438459776.0, 40840000.0, 0.5036232016357592], [1438461504.0, 40860000.0, 0.5144395337815072], [1438463232.0, 40880000.0, 0.5251746299612958], [1438464960.0, 40900000.0, 0.5358267949789961], [1438466688.0, 40920000.0, 0.5463943467342649], [1438468416.0, 40940000.0, 0.5568756164881892], [1438470144.0, 40960000.0, 0.567268949126757], [1438471872.0, 40980000.0, 0.5775727034222674], [1438473600.0, 41000000.0, 0.5877852522924695], [1438475328.0, 41020000.0, 0.5979049830575203], [1438477056.0, 41040000.0, 0.6079302976946062], [1438478784.0, 41060000.0, 0.6178596130903317], [1438480512.0, 41080000.0, 0.6276913612906972], [1438482240.0, 41100000.0, 0.6374239897486914], [1438483968.0, 41120000.0, 0.6470559615694427], [1438485696.0, 
41140000.0, 0.6565857557529542], [1438487424.0, 41160000.0, 0.6660118674342489], [1438489152.0, 41180000.0, 0.6753328081210264], [1438490880.0, 41200000.0, 0.6845471059286874], [1438492608.0, 41220000.0, 0.6936533058128032], [1438494336.0, 41240000.0, 0.7026499697988469], [1438496064.0, 41260000.0, 0.711535677209285], [1438497792.0, 41280000.0, 0.720309024887906], [1438499520.0, 41300000.0, 0.7289686274214101], [1438501248.0, 41320000.0, 0.7375131173581719], [1438502976.0, 41340000.0, 0.7459411454241821], [1438504704.0, 41360000.0, 0.7542513807361032], [1438506432.0, 41380000.0, 0.7624425110114468], [1438508160.0, 41400000.0, 0.7705132427757877], [1438509888.0, 41420000.0, 0.7784623015670237], [1438511616.0, 41440000.0, 0.7862884321366186], [1438513344.0, 41460000.0, 0.7939903986478346], [1438515072.0, 41480000.0, 0.8015669848708754], [1438516800.0, 41500000.0, 0.8090169943749479], [1438518528.0, 41520000.0, 0.8163392507171839], [1438520256.0, 41540000.0, 0.823532597628427], [1438521984.0, 41560000.0, 0.8305958991958118], [1438523712.0, 41580000.0, 0.8375280400421423], [1438525440.0, 41600000.0, 0.8443279255020153], [1438527168.0, 41620000.0, 0.8509944817946916], [1438528896.0, 41640000.0, 0.8575266561936516], [1438530624.0, 41660000.0, 0.8639234171928362], [1438532352.0, 41680000.0, 0.870183754669526], [1438534080.0, 41700000.0, 0.8763066800438636], [1438535808.0, 41720000.0, 0.8822912264349513], [1438537536.0, 41740000.0, 0.8881364488135455], [1438539264.0, 41760000.0, 0.8938414241512643], [1438540992.0, 41780000.0, 0.8994052515663697], [1438542720.0, 41800000.0, 0.9048270524660179], [1438544448.0, 41820000.0, 0.9101059706849967], [1438546176.0, 41840000.0, 0.9152411726209182], [1438547904.0, 41860000.0, 0.9202318473658693], [1438549632.0, 41880000.0, 0.9250772068344567], [1438551360.0, 41900000.0, 0.9297764858882525], [1438553088.0, 41920000.0, 0.9343289424566116], [1438554816.0, 41940000.0, 0.9387338576538733], [1438556544.0, 41960000.0, 0.9429905358928634], [1438558272.0, 41980000.0, 0.9470983049947442], [1438560000.0, 42000000.0, 0.9510565162951533], [1438561728.0, 42020000.0, 0.9548645447466424], [1438563456.0, 42040000.0, 0.9585217890173752], [1438565184.0, 42060000.0, 0.9620276715860859], [1438566912.0, 42080000.0, 0.9653816388332738], [1438568640.0, 42100000.0, 0.9685831611286307], [1438570368.0, 42120000.0, 0.9716317329146734], [1438572096.0, 42140000.0, 0.9745268727865772], [1438573824.0, 42160000.0, 0.9772681235681934], [1438575552.0, 42180000.0, 0.9798550523842466], [1438577280.0, 42200000.0, 0.9822872507286884], [1438579008.0, 42220000.0, 0.9845643345292056], [1438580736.0, 42240000.0, 0.9866859442078681], [1438582464.0, 42260000.0, 0.988651744737914], [1438584192.0, 42280000.0, 0.9904614256966511], [1438585920.0, 42300000.0, 0.992114701314478], [1438587648.0, 42320000.0, 0.9936113105200085], [1438589376.0, 42340000.0, 0.9949510169813002], [1438591104.0, 42360000.0, 0.9961336091431724], [1438592832.0, 42380000.0, 0.997158900260614], [1438594560.0, 42400000.0, 0.9980267284282717], [1438596288.0, 42420000.0, 0.9987369566060175], [1438598016.0, 42440000.0, 0.9992894726405891], [1438599744.0, 42460000.0, 0.9996841892833], [1438601472.0, 42480000.0, 0.9999210442038161], [1438603200.0, 42500000.0, 1.0], [1438604928.0, 42520000.0, 0.9999210442038161], [1438606656.0, 42540000.0, 0.9996841892833], [1438608384.0, 42560000.0, 0.9992894726405892], [1438610112.0, 42580000.0, 0.9987369566060176], [1438611840.0, 42600000.0, 0.9980267284282718], [1438613568.0, 42620000.0, 
0.9971589002606143], [1438615296.0, 42640000.0, 0.9961336091431726], [1438617024.0, 42660000.0, 0.9949510169813004], [1438618752.0, 42680000.0, 0.9936113105200087], [1438620480.0, 42700000.0, 0.9921147013144783], [1438622208.0, 42720000.0, 0.9904614256966513], [1438623936.0, 42740000.0, 0.9886517447379143], [1438625664.0, 42760000.0, 0.9866859442078685], [1438627392.0, 42780000.0, 0.9845643345292059], [1438629120.0, 42800000.0, 0.9822872507286887], [1438630848.0, 42820000.0, 0.9798550523842471], [1438632576.0, 42840000.0, 0.9772681235681938], [1438634304.0, 42860000.0, 0.9745268727865778], [1438636032.0, 42880000.0, 0.9716317329146739], [1438637760.0, 42900000.0, 0.9685831611286313], [1438639488.0, 42920000.0, 0.9653816388332743], [1438641216.0, 42940000.0, 0.9620276715860865], [1438642944.0, 42960000.0, 0.9585217890173757], [1438644672.0, 42980000.0, 0.9548645447466431], [1438646400.0, 43000000.0, 0.9510565162951539], [1438648128.0, 43020000.0, 0.9470983049947449], [1438649856.0, 43040000.0, 0.9429905358928642], [1438651584.0, 43060000.0, 0.9387338576538741], [1438653312.0, 43080000.0, 0.9343289424566122], [1438655040.0, 43100000.0, 0.9297764858882532], [1438656768.0, 43120000.0, 0.9250772068344575], [1438658496.0, 43140000.0, 0.9202318473658702], [1438660224.0, 43160000.0, 0.9152411726209191], [1438661952.0, 43180000.0, 0.9101059706849975], [1438663680.0, 43200000.0, 0.9048270524660188], [1438665408.0, 43220000.0, 0.8994052515663706], [1438667136.0, 43240000.0, 0.8938414241512652], [1438668864.0, 43260000.0, 0.8881364488135464], [1438670592.0, 43280000.0, 0.8822912264349523], [1438672320.0, 43300000.0, 0.8763066800438646], [1438674048.0, 43320000.0, 0.8701837546695271], [1438675776.0, 43340000.0, 0.8639234171928372], [1438677504.0, 43360000.0, 0.8575266561936528], [1438679232.0, 43380000.0, 0.8509944817946927], [1438680960.0, 43400000.0, 0.8443279255020164], [1438682688.0, 43420000.0, 0.8375280400421435], [1438684416.0, 43440000.0, 0.8305958991958129], [1438686144.0, 43460000.0, 0.8235325976284281], [1438687872.0, 43480000.0, 0.8163392507171852], [1438689600.0, 43500000.0, 0.8090169943749491], [1438691328.0, 43520000.0, 0.8015669848708766], [1438693056.0, 43540000.0, 0.7939903986478358], [1438694784.0, 43560000.0, 0.7862884321366199], [1438696512.0, 43580000.0, 0.7784623015670249], [1438698240.0, 43600000.0, 0.770513242775789], [1438699968.0, 43620000.0, 0.7624425110114481], [1438701696.0, 43640000.0, 0.7542513807361046], [1438703424.0, 43660000.0, 0.7459411454241834], [1438705152.0, 43680000.0, 0.7375131173581734], [1438706880.0, 43700000.0, 0.7289686274214116], [1438708608.0, 43720000.0, 0.7203090248879075], [1438710336.0, 43740000.0, 0.7115356772092865], [1438712064.0, 43760000.0, 0.7026499697988483], [1438713792.0, 43780000.0, 0.6936533058128047], [1438715520.0, 43800000.0, 0.684547105928689], [1438717248.0, 43820000.0, 0.6753328081210279], [1438718976.0, 43840000.0, 0.6660118674342504], [1438720704.0, 43860000.0, 0.6565857557529557], [1438722432.0, 43880000.0, 0.6470559615694442], [1438724160.0, 43900000.0, 0.637423989748693], [1438725888.0, 43920000.0, 0.6276913612906989], [1438727616.0, 43940000.0, 0.6178596130903333], [1438729344.0, 43960000.0, 0.6079302976946078], [1438731072.0, 43980000.0, 0.5979049830575219], [1438732800.0, 44000000.0, 0.5877852522924711], [1438734528.0, 44020000.0, 0.5775727034222691], [1438736256.0, 44040000.0, 0.5672689491267586], [1438737984.0, 44060000.0, 0.5568756164881908], [1438739712.0, 44080000.0, 0.5463943467342667], [1438741440.0, 44100000.0, 
0.5358267949789979], [1438743168.0, 44120000.0, 0.5251746299612976], [1438744896.0, 44140000.0, 0.514439533781509], [1438746624.0, 44160000.0, 0.503623201635761], [1438748352.0, 44180000.0, 0.49272734154829245], [1438750080.0, 44200000.0, 0.4817536741017168], [1438751808.0, 44220000.0, 0.47070393216533485], [1438753536.0, 44240000.0, 0.4595798606214877], [1438755264.0, 44260000.0, 0.4483832160900328], [1438756992.0, 44280000.0, 0.43711576665093416], [1438758720.0, 44300000.0, 0.4257792915650746], [1438760448.0, 44320000.0, 0.4143755809932836], [1438762176.0, 44340000.0, 0.4029064357136628], [1438763904.0, 44360000.0, 0.39137366683720326], [1438765632.0, 44380000.0, 0.3797790955218027], [1438767360.0, 44400000.0, 0.368124552684677], [1438769088.0, 44420000.0, 0.3564118787132505], [1438770816.0, 44440000.0, 0.34464292317451756], [1438772544.0, 44460000.0, 0.3328195445229879], [1438774272.0, 44480000.0, 0.3209436098072081], [1438776000.0, 44500000.0, 0.3090169943749468], [1438777728.0, 44520000.0, 0.29704158157703503], [1438779456.0, 44540000.0, 0.2850192624699804], [1438781184.0, 44560000.0, 0.2729519355173234], [1438782912.0, 44580000.0, 0.2608415062898959], [1438784640.0, 44600000.0, 0.24868988716485446], [1438786368.0, 44620000.0, 0.23649899702372856], [1438788096.0, 44640000.0, 0.22427076094937892], [1438789824.0, 44660000.0, 0.21200710992205313], [1438791552.0, 44680000.0, 0.19970998051440977], [1438793280.0, 44700000.0, 0.18738131458572815], [1438795008.0, 44720000.0, 0.17502305897527334], [1438796736.0, 44740000.0, 0.16263716519488516], [1438798464.0, 44760000.0, 0.1502255891207594], [1438800192.0, 44780000.0, 0.1377902906846412], [1438801920.0, 44800000.0, 0.1253332335643011], [1438803648.0, 44820000.0, 0.11285638487348285], [1438805376.0, 44840000.0, 0.10036171485121682], [1438807104.0, 44860000.0, 0.0878511965507459], [1438808832.0, 44880000.0, 0.07532680552793268], [1438810560.0, 44900000.0, 0.06279051952931412], [1438812288.0, 44920000.0, 0.05024431817977108], [1438814016.0, 44940000.0, 0.037690182669936845], [1438815744.0, 44960000.0, 0.025130095443337018], [1438817472.0, 44980000.0, 0.012566039883352928], [1438819200.0, 45000000.0, 1.102182119232618e-15], [1438820928.0, 45020000.0, -0.012566039883350723], [1438822656.0, 45040000.0, -0.025130095443334815], [1438824384.0, 45060000.0, -0.037690182669934645], [1438826112.0, 45080000.0, -0.05024431817976888], [1438827840.0, 45100000.0, -0.06279051952931192], [1438829568.0, 45120000.0, -0.07532680552793049], [1438831296.0, 45140000.0, -0.08785119655074371], [1438833024.0, 45160000.0, -0.10036171485121463], [1438834752.0, 45180000.0, -0.11285638487348065], [1438836480.0, 45200000.0, -0.12533323356429893], [1438838208.0, 45220000.0, -0.137790290684639], [1438839936.0, 45240000.0, -0.15022558912075723], [1438841664.0, 45260000.0, -0.162637165194883], [1438843392.0, 45280000.0, -0.17502305897527118], [1438845120.0, 45300000.0, -0.187381314585726], [1438846848.0, 45320000.0, -0.1997099805144076], [1438848576.0, 45340000.0, -0.212007109922051], [1438850304.0, 45360000.0, -0.22427076094937679], [1438852032.0, 45380000.0, -0.23649899702372643], [1438853760.0, 45400000.0, -0.24868988716485232], [1438855488.0, 45420000.0, -0.2608415062898937], [1438857216.0, 45440000.0, -0.2729519355173213], [1438858944.0, 45460000.0, -0.2850192624699783], [1438860672.0, 45480000.0, -0.2970415815770329], [1438862400.0, 45500000.0, -0.3090169943749447], [1438864128.0, 45520000.0, -0.320943609807206], [1438865856.0, 45540000.0, -0.3328195445229858], 
[1438867584.0, 45560000.0, -0.3446429231745155], [1438869312.0, 45580000.0, -0.3564118787132484], [1438871040.0, 45600000.0, -0.3681245526846749], [1438872768.0, 45620000.0, -0.37977909552180067], [1438874496.0, 45640000.0, -0.39137366683720126], [1438876224.0, 45660000.0, -0.4029064357136608], [1438877952.0, 45680000.0, -0.4143755809932816], [1438879680.0, 45700000.0, -0.4257792915650726], [1438881408.0, 45720000.0, -0.43711576665093216], [1438883136.0, 45740000.0, -0.4483832160900308], [1438884864.0, 45760000.0, -0.4595798606214857], [1438886592.0, 45780000.0, -0.4707039321653329], [1438888320.0, 45800000.0, -0.48175367410171493], [1438890048.0, 45820000.0, -0.4927273415482905], [1438891776.0, 45840000.0, -0.5036232016357591], [1438893504.0, 45860000.0, -0.5144395337815071], [1438895232.0, 45880000.0, -0.5251746299612957], [1438896960.0, 45900000.0, -0.535826794978996], [1438898688.0, 45920000.0, -0.5463943467342648], [1438900416.0, 45940000.0, -0.5568756164881891], [1438902144.0, 45960000.0, -0.5672689491267568], [1438903872.0, 45980000.0, -0.5775727034222673], [1438905600.0, 46000000.0, -0.5877852522924694], [1438907328.0, 46020000.0, -0.5979049830575202], [1438909056.0, 46040000.0, -0.607930297694606], [1438910784.0, 46060000.0, -0.6178596130903317], [1438912512.0, 46080000.0, -0.6276913612906971], [1438914240.0, 46100000.0, -0.6374239897486913], [1438915968.0, 46120000.0, -0.6470559615694426], [1438917696.0, 46140000.0, -0.6565857557529542], [1438919424.0, 46160000.0, -0.6660118674342488], [1438921152.0, 46180000.0, -0.6753328081210263], [1438922880.0, 46200000.0, -0.6845471059286873], [1438924608.0, 46220000.0, -0.693653305812803], [1438926336.0, 46240000.0, -0.7026499697988468], [1438928064.0, 46260000.0, -0.7115356772092849], [1438929792.0, 46280000.0, -0.7203090248879059], [1438931520.0, 46300000.0, -0.72896862742141], [1438933248.0, 46320000.0, -0.7375131173581719], [1438934976.0, 46340000.0, -0.745941145424182], [1438936704.0, 46360000.0, -0.7542513807361032], [1438938432.0, 46380000.0, -0.7624425110114467], [1438940160.0, 46400000.0, -0.7705132427757876], [1438941888.0, 46420000.0, -0.7784623015670236], [1438943616.0, 46440000.0, -0.7862884321366186], [1438945344.0, 46460000.0, -0.7939903986478345], [1438947072.0, 46480000.0, -0.8015669848708753], [1438948800.0, 46500000.0, -0.8090169943749478], [1438950528.0, 46520000.0, -0.8163392507171838], [1438952256.0, 46540000.0, -0.8235325976284269], [1438953984.0, 46560000.0, -0.8305958991958117], [1438955712.0, 46580000.0, -0.8375280400421423], [1438957440.0, 46600000.0, -0.8443279255020152], [1438959168.0, 46620000.0, -0.8509944817946916], [1438960896.0, 46640000.0, -0.8575266561936516], [1438962624.0, 46660000.0, -0.863923417192836], [1438964352.0, 46680000.0, -0.870183754669526], [1438966080.0, 46700000.0, -0.8763066800438636], [1438967808.0, 46720000.0, -0.8822912264349512], [1438969536.0, 46740000.0, -0.8881364488135454], [1438971264.0, 46760000.0, -0.8938414241512643], [1438972992.0, 46780000.0, -0.8994052515663696], [1438974720.0, 46800000.0, -0.9048270524660178], [1438976448.0, 46820000.0, -0.9101059706849967], [1438978176.0, 46840000.0, -0.9152411726209182], [1438979904.0, 46860000.0, -0.9202318473658693], [1438981632.0, 46880000.0, -0.9250772068344567], [1438983360.0, 46900000.0, -0.9297764858882525], [1438985088.0, 46920000.0, -0.9343289424566115], [1438986816.0, 46940000.0, -0.9387338576538733], [1438988544.0, 46960000.0, -0.9429905358928634], [1438990272.0, 46980000.0, -0.9470983049947441], [1438992000.0, 47000000.0, 
-0.9510565162951532], [1438993728.0, 47020000.0, -0.9548645447466424], [1438995456.0, 47040000.0, -0.9585217890173751], [1438997184.0, 47060000.0, -0.9620276715860859], [1438998912.0, 47080000.0, -0.9653816388332737], [1439000640.0, 47100000.0, -0.9685831611286307], [1439002368.0, 47120000.0, -0.9716317329146734], [1439004096.0, 47140000.0, -0.9745268727865772], [1439005824.0, 47160000.0, -0.9772681235681934], [1439007552.0, 47180000.0, -0.9798550523842466], [1439009280.0, 47200000.0, -0.9822872507286883], [1439011008.0, 47220000.0, -0.9845643345292056], [1439012736.0, 47240000.0, -0.986685944207868], [1439014464.0, 47260000.0, -0.988651744737914], [1439016192.0, 47280000.0, -0.990461425696651], [1439017920.0, 47300000.0, -0.992114701314478], [1439019648.0, 47320000.0, -0.9936113105200085], [1439021376.0, 47340000.0, -0.9949510169813002], [1439023104.0, 47360000.0, -0.9961336091431724], [1439024832.0, 47380000.0, -0.997158900260614], [1439026560.0, 47400000.0, -0.9980267284282717], [1439028288.0, 47420000.0, -0.9987369566060175], [1439030016.0, 47440000.0, -0.9992894726405891], [1439031744.0, 47460000.0, -0.9996841892833], [1439033472.0, 47480000.0, -0.9999210442038161], [1439035200.0, 47500000.0, -1.0], [1439036928.0, 47520000.0, -0.9999210442038161], [1439038656.0, 47540000.0, -0.9996841892833], [1439040384.0, 47560000.0, -0.9992894726405892], [1439042112.0, 47580000.0, -0.9987369566060176], [1439043840.0, 47600000.0, -0.9980267284282718], [1439045568.0, 47620000.0, -0.9971589002606143], [1439047296.0, 47640000.0, -0.9961336091431726], [1439049024.0, 47660000.0, -0.9949510169813004], [1439050752.0, 47680000.0, -0.9936113105200087], [1439052480.0, 47700000.0, -0.9921147013144783], [1439054208.0, 47720000.0, -0.9904614256966513], [1439055936.0, 47740000.0, -0.9886517447379143], [1439057664.0, 47760000.0, -0.9866859442078685], [1439059392.0, 47780000.0, -0.984564334529206], [1439061120.0, 47800000.0, -0.9822872507286887], [1439062848.0, 47820000.0, -0.9798550523842471], [1439064576.0, 47840000.0, -0.9772681235681939], [1439066304.0, 47860000.0, -0.9745268727865778], [1439068032.0, 47880000.0, -0.971631732914674], [1439069760.0, 47900000.0, -0.9685831611286313], [1439071488.0, 47920000.0, -0.9653816388332743], [1439073216.0, 47940000.0, -0.9620276715860865], [1439074944.0, 47960000.0, -0.9585217890173757], [1439076672.0, 47980000.0, -0.9548645447466431], [1439078400.0, 48000000.0, -0.951056516295154], [1439080128.0, 48020000.0, -0.9470983049947449], [1439081856.0, 48040000.0, -0.9429905358928642], [1439083584.0, 48060000.0, -0.9387338576538741], [1439085312.0, 48080000.0, -0.9343289424566124], [1439087040.0, 48100000.0, -0.9297764858882532], [1439088768.0, 48120000.0, -0.9250772068344576], [1439090496.0, 48140000.0, -0.9202318473658702], [1439092224.0, 48160000.0, -0.9152411726209191], [1439093952.0, 48180000.0, -0.9101059706849977], [1439095680.0, 48200000.0, -0.9048270524660188], [1439097408.0, 48220000.0, -0.8994052515663706], [1439099136.0, 48240000.0, -0.8938414241512653], [1439100864.0, 48260000.0, -0.8881364488135465], [1439102592.0, 48280000.0, -0.8822912264349523], [1439104320.0, 48300000.0, -0.8763066800438647], [1439106048.0, 48320000.0, -0.8701837546695271], [1439107776.0, 48340000.0, -0.8639234171928373], [1439109504.0, 48360000.0, -0.8575266561936528], [1439111232.0, 48380000.0, -0.8509944817946928], [1439112960.0, 48400000.0, -0.8443279255020165], [1439114688.0, 48420000.0, -0.8375280400421435], [1439116416.0, 48440000.0, -0.830595899195813], [1439118144.0, 48460000.0, 
-0.8235325976284282], [1439119872.0, 48480000.0, -0.8163392507171852], [1439121600.0, 48500000.0, -0.8090169943749491], [1439123328.0, 48520000.0, -0.8015669848708766], [1439125056.0, 48540000.0, -0.7939903986478359], [1439126784.0, 48560000.0, -0.78628843213662], [1439128512.0, 48580000.0, -0.778462301567025], [1439130240.0, 48600000.0, -0.770513242775789], [1439131968.0, 48620000.0, -0.7624425110114482], [1439133696.0, 48640000.0, -0.7542513807361046], [1439135424.0, 48660000.0, -0.7459411454241835], [1439137152.0, 48680000.0, -0.7375131173581735], [1439138880.0, 48700000.0, -0.7289686274214117], [1439140608.0, 48720000.0, -0.7203090248879075], [1439142336.0, 48740000.0, -0.7115356772092866], [1439144064.0, 48760000.0, -0.7026499697988484], [1439145792.0, 48780000.0, -0.6936533058128047], [1439147520.0, 48800000.0, -0.6845471059286891], [1439149248.0, 48820000.0, -0.675332808121028], [1439150976.0, 48840000.0, -0.6660118674342504], [1439152704.0, 48860000.0, -0.6565857557529559], [1439154432.0, 48880000.0, -0.6470559615694443], [1439156160.0, 48900000.0, -0.6374239897486931], [1439157888.0, 48920000.0, -0.627691361290699], [1439159616.0, 48940000.0, -0.6178596130903334], [1439161344.0, 48960000.0, -0.6079302976946079], [1439163072.0, 48980000.0, -0.597904983057522], [1439164800.0, 49000000.0, -0.5877852522924712], [1439166528.0, 49020000.0, -0.5775727034222692], [1439168256.0, 49040000.0, -0.5672689491267587], [1439169984.0, 49060000.0, -0.556875616488191], [1439171712.0, 49080000.0, -0.5463943467342668], [1439173440.0, 49100000.0, -0.535826794978998], [1439175168.0, 49120000.0, -0.5251746299612977], [1439176896.0, 49140000.0, -0.5144395337815091], [1439178624.0, 49160000.0, -0.5036232016357611], [1439180352.0, 49180000.0, -0.49272734154829256], [1439182080.0, 49200000.0, -0.48175367410171693], [1439183808.0, 49220000.0, -0.47070393216533496], [1439185536.0, 49240000.0, -0.4595798606214878], [1439187264.0, 49260000.0, -0.4483832160900329], [1439188992.0, 49280000.0, -0.43711576665093427], [1439190720.0, 49300000.0, -0.4257792915650747], [1439192448.0, 49320000.0, -0.4143755809932837], [1439194176.0, 49340000.0, -0.4029064357136629], [1439195904.0, 49360000.0, -0.3913736668372034], [1439197632.0, 49380000.0, -0.37977909552180283], [1439199360.0, 49400000.0, -0.3681245526846771], [1439201088.0, 49420000.0, -0.3564118787132506], [1439202816.0, 49440000.0, -0.34464292317451767], [1439204544.0, 49460000.0, -0.332819544522988], [1439206272.0, 49480000.0, -0.3209436098072082], [1439208000.0, 49500000.0, -0.3090169943749469], [1439209728.0, 49520000.0, -0.29704158157703514], [1439211456.0, 49540000.0, -0.2850192624699805], [1439213184.0, 49560000.0, -0.27295193551732355], [1439214912.0, 49580000.0, -0.260841506289896], [1439216640.0, 49600000.0, -0.2486898871648546], [1439218368.0, 49620000.0, -0.2364989970237287], [1439220096.0, 49640000.0, -0.22427076094937903], [1439221824.0, 49660000.0, -0.21200710992205327], [1439223552.0, 49680000.0, -0.19970998051440988], [1439225280.0, 49700000.0, -0.18738131458572826], [1439227008.0, 49720000.0, -0.17502305897527348], [1439228736.0, 49740000.0, -0.1626371651948853], [1439230464.0, 49760000.0, -0.15022558912075953], [1439232192.0, 49780000.0, -0.13779029068464133], [1439233920.0, 49800000.0, -0.12533323356430123], [1439235648.0, 49820000.0, -0.11285638487348297], [1439237376.0, 49840000.0, -0.10036171485121695], [1439239104.0, 49860000.0, -0.08785119655074602], [1439240832.0, 49880000.0, -0.0753268055279328], [1439242560.0, 49900000.0, 
[… the sampled sine series continues unchanged through this span: one [wall_time, step, value] triplet every 20,000 steps, with wall_time advancing 1,728.0 s per sample, from [1439244288.0, 49920000.0, -0.0502443181797712] through [1441891584.0, 80560000.0, 0.344642923174508]; each value equals sin(2π · step / 10,000,000), so the series crosses 0 at steps 50,000,000, 55,000,000, … and reaches ±1.0 at steps 52,500,000, 57,500,000, … …]
[1441893312.0, 80580000.0, 0.3564118787132476], [1441895040.0, 80600000.0, 0.36812455268468075], [1441896768.0, 80620000.0, 0.37977909552179656], [1441898496.0, 80640000.0, 0.39137366683720376], [1441900224.0, 80660000.0, 0.40290643571366325], [1441901952.0, 80680000.0, 0.41437558099327754], [1441903680.0, 80700000.0, 0.4257792915650718], [1441905408.0, 80720000.0, 0.437115766650925], [1441907136.0, 80740000.0, 0.44838321609003007], [1441908864.0, 80760000.0, 0.45957986062149125], [1441910592.0, 80780000.0, 0.470703932165329], [1441912320.0, 80800000.0, 0.48175367410171105], [1441914048.0, 80820000.0, 0.4927273415482929], [1441915776.0, 80840000.0, 0.5036232016357554], [1441917504.0, 80860000.0, 0.5144395337815064], [1441919232.0, 80880000.0, 0.5251746299612889], [1441920960.0, 80900000.0, 0.5358267949789952], [1441922688.0, 80920000.0, 0.546394346734273], [1441924416.0, 80940000.0, 0.5568756164881854], [1441926144.0, 80960000.0, 0.5672689491267532], [1441927872.0, 80980000.0, 0.5775727034222695], [1441929600.0, 81000000.0, 0.5877852522924687], [1441931328.0, 81020000.0, 0.5979049830575195], [1441933056.0, 81040000.0, 0.6079302976945997], [1441934784.0, 81060000.0, 0.6178596130903338], [1441936512.0, 81080000.0, 0.6276913612906992], [1441938240.0, 81100000.0, 0.6374239897486879], [1441939968.0, 81120000.0, 0.6470559615694419], [1441941696.0, 81140000.0, 0.6565857557529589], [1441943424.0, 81160000.0, 0.6660118674342481], [1441945152.0, 81180000.0, 0.6753328081210257], [1441946880.0, 81200000.0, 0.6845471059286841], [1441948608.0, 81220000.0, 0.693653305812805], [1441950336.0, 81240000.0, 0.7026499697988488], [1441952064.0, 81260000.0, 0.7115356772092793], [1441953792.0, 81280000.0, 0.7203090248879053], [1441955520.0, 81300000.0, 0.7289686274214143], [1441957248.0, 81320000.0, 0.7375131173581713], [1441958976.0, 81340000.0, 0.7459411454241838], [1441960704.0, 81360000.0, 0.7542513807361003], [1441962432.0, 81380000.0, 0.7624425110114484], [1441964160.0, 81400000.0, 0.7705132427757894], [1441965888.0, 81420000.0, 0.7784623015670186], [1441967616.0, 81440000.0, 0.786288432136618], [1441969344.0, 81460000.0, 0.7939903986478383], [1441971072.0, 81480000.0, 0.8015669848708747], [1441972800.0, 81500000.0, 0.8090169943749493], [1441974528.0, 81520000.0, 0.8163392507171813], [1441976256.0, 81540000.0, 0.8235325976284243], [1441977984.0, 81560000.0, 0.8305958991958132], [1441979712.0, 81580000.0, 0.837528040042138], [1441981440.0, 81600000.0, 0.8443279255020147], [1441983168.0, 81620000.0, 0.8509944817946948], [1441984896.0, 81640000.0, 0.8575266561936512], [1441986624.0, 81660000.0, 0.8639234171928338], [1441988352.0, 81680000.0, 0.8701837546695238], [1441990080.0, 81700000.0, 0.8763066800438615], [1441991808.0, 81720000.0, 0.8822912264349542], [1441993536.0, 81740000.0, 0.8881364488135417], [1441995264.0, 81760000.0, 0.8938414241512639], [1441996992.0, 81780000.0, 0.899405251566374], [1441998720.0, 81800000.0, 0.904827052466019], [1442000448.0, 81820000.0, 0.9101059706849948], [1442002176.0, 81840000.0, 0.9152411726209135], [1442003904.0, 81860000.0, 0.9202318473658689], [1442005632.0, 81880000.0, 0.925077206834459], [1442007360.0, 81900000.0, 0.9297764858882495], [1442009088.0, 81920000.0, 0.9343289424566125], [1442010816.0, 81940000.0, 0.9387338576538742], [1442012544.0, 81960000.0, 0.9429905358928643], [1442014272.0, 81980000.0, 0.9470983049947439], [1442016000.0, 82000000.0, 0.9510565162951508], [1442017728.0, 82020000.0, 0.9548645447466422], [1442019456.0, 82040000.0, 0.958521789017377], 
[1442021184.0, 82060000.0, 0.9620276715860847], [1442022912.0, 82080000.0, 0.9653816388332744], [1442024640.0, 82100000.0, 0.9685831611286314], [1442026368.0, 82120000.0, 0.9716317329146723], [1442028096.0, 82140000.0, 0.974526872786577], [1442029824.0, 82160000.0, 0.9772681235681917], [1442031552.0, 82180000.0, 0.9798550523842465], [1442033280.0, 82200000.0, 0.9822872507286895], [1442035008.0, 82220000.0, 0.9845643345292048], [1442036736.0, 82240000.0, 0.9866859442078674], [1442038464.0, 82260000.0, 0.9886517447379143], [1442040192.0, 82280000.0, 0.9904614256966504], [1442041920.0, 82300000.0, 0.9921147013144779], [1442043648.0, 82320000.0, 0.9936113105200076], [1442045376.0, 82340000.0, 0.9949510169813001], [1442047104.0, 82360000.0, 0.996133609143173], [1442048832.0, 82380000.0, 0.9971589002606137], [1442050560.0, 82400000.0, 0.9980267284282713], [1442052288.0, 82420000.0, 0.9987369566060177], [1442054016.0, 82440000.0, 0.9992894726405891], [1442055744.0, 82460000.0, 0.9996841892833], [1442057472.0, 82480000.0, 0.999921044203816], [1442059200.0, 82500000.0, 1.0], [1442060928.0, 82520000.0, 0.9999210442038161], [1442062656.0, 82540000.0, 0.9996841892833], [1442064384.0, 82560000.0, 0.9992894726405893], [1442066112.0, 82580000.0, 0.998736956606018], [1442067840.0, 82600000.0, 0.9980267284282718], [1442069568.0, 82620000.0, 0.9971589002606138], [1442071296.0, 82640000.0, 0.996133609143173], [1442073024.0, 82660000.0, 0.9949510169813001], [1442074752.0, 82680000.0, 0.9936113105200084], [1442076480.0, 82700000.0, 0.9921147013144789], [1442078208.0, 82720000.0, 0.9904614256966514], [1442079936.0, 82740000.0, 0.9886517447379155], [1442081664.0, 82760000.0, 0.9866859442078686], [1442083392.0, 82780000.0, 0.9845643345292049], [1442085120.0, 82800000.0, 0.9822872507286896], [1442086848.0, 82820000.0, 0.9798550523842465], [1442088576.0, 82840000.0, 0.9772681235681934], [1442090304.0, 82860000.0, 0.9745268727865788], [1442092032.0, 82880000.0, 0.9716317329146742], [1442093760.0, 82900000.0, 0.9685831611286333], [1442095488.0, 82920000.0, 0.9653816388332745], [1442097216.0, 82940000.0, 0.9620276715860848], [1442098944.0, 82960000.0, 0.9585217890173771], [1442100672.0, 82980000.0, 0.9548645447466444], [1442102400.0, 83000000.0, 0.9510565162951531], [1442104128.0, 83020000.0, 0.9470983049947463], [1442105856.0, 83040000.0, 0.9429905358928645], [1442107584.0, 83060000.0, 0.9387338576538768], [1442109312.0, 83080000.0, 0.9343289424566126], [1442111040.0, 83100000.0, 0.9297764858882522], [1442112768.0, 83120000.0, 0.9250772068344593], [1442114496.0, 83140000.0, 0.9202318473658719], [1442116224.0, 83160000.0, 0.9152411726209166], [1442117952.0, 83180000.0, 0.910105970684998], [1442119680.0, 83200000.0, 0.9048270524660191], [1442121408.0, 83220000.0, 0.8994052515663742], [1442123136.0, 83240000.0, 0.8938414241512641], [1442124864.0, 83260000.0, 0.8881364488135453], [1442126592.0, 83280000.0, 0.8822912264349544], [1442128320.0, 83300000.0, 0.8763066800438651], [1442130048.0, 83320000.0, 0.870183754669524], [1442131776.0, 83340000.0, 0.8639234171928377], [1442133504.0, 83360000.0, 0.8575266561936514], [1442135232.0, 83380000.0, 0.8509944817946952], [1442136960.0, 83400000.0, 0.8443279255020151], [1442138688.0, 83420000.0, 0.8375280400421421], [1442140416.0, 83440000.0, 0.8305958991958174], [1442142144.0, 83460000.0, 0.8235325976284287], [1442143872.0, 83480000.0, 0.8163392507171816], [1442145600.0, 83500000.0, 0.8090169943749497], [1442147328.0, 83520000.0, 0.8015669848708751], [1442149056.0, 83540000.0, 
0.7939903986478386], [1442150784.0, 83560000.0, 0.7862884321366227], [1442152512.0, 83580000.0, 0.7784623015670233], [1442154240.0, 83600000.0, 0.7705132427757941], [1442155968.0, 83620000.0, 0.7624425110114488], [1442157696.0, 83640000.0, 0.7542513807361005], [1442159424.0, 83660000.0, 0.7459411454241841], [1442161152.0, 83680000.0, 0.7375131173581716], [1442162880.0, 83700000.0, 0.7289686274214147], [1442164608.0, 83720000.0, 0.7203090248879106], [1442166336.0, 83740000.0, 0.7115356772092847], [1442168064.0, 83760000.0, 0.7026499697988541], [1442169792.0, 83780000.0, 0.6936533058128054], [1442171520.0, 83800000.0, 0.6845471059286845], [1442173248.0, 83820000.0, 0.675332808121026], [1442174976.0, 83840000.0, 0.6660118674342538], [1442176704.0, 83860000.0, 0.6565857557529592], [1442178432.0, 83880000.0, 0.6470559615694477], [1442180160.0, 83900000.0, 0.6374239897486883], [1442181888.0, 83920000.0, 0.6276913612907051], [1442183616.0, 83940000.0, 0.6178596130903341], [1442185344.0, 83960000.0, 0.6079302976946058], [1442187072.0, 83980000.0, 0.5979049830575199], [1442188800.0, 84000000.0, 0.5877852522924748], [1442190528.0, 84020000.0, 0.5775727034222757], [1442192256.0, 84040000.0, 0.5672689491267595], [1442193984.0, 84060000.0, 0.5568756164881857], [1442195712.0, 84080000.0, 0.5463943467342735], [1442197440.0, 84100000.0, 0.5358267949789957], [1442199168.0, 84120000.0, 0.5251746299612954], [1442200896.0, 84140000.0, 0.5144395337815069], [1442202624.0, 84160000.0, 0.5036232016357619], [1442204352.0, 84180000.0, 0.4927273415482995], [1442206080.0, 84200000.0, 0.4817536741017177], [1442207808.0, 84220000.0, 0.4707039321653294], [1442209536.0, 84240000.0, 0.4595798606214917], [1442211264.0, 84260000.0, 0.44838321609003046], [1442212992.0, 84280000.0, 0.4371157666509318], [1442214720.0, 84300000.0, 0.4257792915650787], [1442216448.0, 84320000.0, 0.4143755809932845], [1442218176.0, 84340000.0, 0.4029064357136702], [1442219904.0, 84360000.0, 0.3913736668372042], [1442221632.0, 84380000.0, 0.37977909552179706], [1442223360.0, 84400000.0, 0.3681245526846812], [1442225088.0, 84420000.0, 0.3564118787132547], [1442226816.0, 84440000.0, 0.3446429231745151], [1442228544.0, 84460000.0, 0.3328195445229921], [1442230272.0, 84480000.0, 0.32094360980720904], [1442232000.0, 84500000.0, 0.30901699437495445], [1442233728.0, 84520000.0, 0.2970415815770359], [1442235456.0, 84540000.0, 0.2850192624699711], [1442237184.0, 84560000.0, 0.27295193551732777], [1442238912.0, 84580000.0, 0.2608415062899002], [1442240640.0, 84600000.0, 0.248689887164852], [1442242368.0, 84620000.0, 0.23649899702372953], [1442244096.0, 84640000.0, 0.2242707609493799], [1442245824.0, 84660000.0, 0.21200710992206104], [1442247552.0, 84680000.0, 0.19970998051440725], [1442249280.0, 84700000.0, 0.18738131458572563], [1442251008.0, 84720000.0, 0.1750230589752778], [1442252736.0, 84740000.0, 0.16263716519488614], [1442254464.0, 84760000.0, 0.15022558912075334], [1442256192.0, 84780000.0, 0.13779029068464216], [1442257920.0, 84800000.0, 0.12533323356430207], [1442259648.0, 84820000.0, 0.11285638487348736], [1442261376.0, 84840000.0, 0.10036171485121427], [1442263104.0, 84860000.0, 0.08785119655074333], [1442264832.0, 84880000.0, 0.07532680552794074], [1442266560.0, 84900000.0, 0.0627905195293151], [1442268288.0, 84920000.0, 0.05024431817976496], [1442270016.0, 84940000.0, 0.03769018266993782], [1442271744.0, 84960000.0, 0.025130095443334444], [1442273472.0, 84980000.0, 0.01256603988335746], [1442275200.0, 85000000.0, -1.4708141202500005e-15], 
[1442276928.0, 85020000.0, -0.012566039883353296], [1442278656.0, 85040000.0, -0.025130095443330284], [1442280384.0, 85060000.0, -0.03769018266993367], [1442282112.0, 85080000.0, -0.05024431817976081], [1442283840.0, 85100000.0, -0.06279051952931095], [1442285568.0, 85120000.0, -0.0753268055279366], [1442287296.0, 85140000.0, -0.08785119655073918], [1442289024.0, 85160000.0, -0.10036171485121012], [1442290752.0, 85180000.0, -0.11285638487348322], [1442292480.0, 85200000.0, -0.12533323356429796], [1442294208.0, 85220000.0, -0.13779029068463805], [1442295936.0, 85240000.0, -0.15022558912074924], [1442297664.0, 85260000.0, -0.16263716519488203], [1442299392.0, 85280000.0, -0.1750230589752737], [1442301120.0, 85300000.0, -0.18738131458572152], [1442302848.0, 85320000.0, -0.19970998051440317], [1442304576.0, 85340000.0, -0.21200710992205699], [1442306304.0, 85360000.0, -0.2242707609493758], [1442308032.0, 85380000.0, -0.23649899702372548], [1442309760.0, 85400000.0, -0.24868988716484794], [1442311488.0, 85420000.0, -0.2608415062898962], [1442313216.0, 85440000.0, -0.2729519355173238], [1442314944.0, 85460000.0, -0.2850192624699671], [1442316672.0, 85480000.0, -0.297041581577032], [1442318400.0, 85500000.0, -0.3090169943749505], [1442320128.0, 85520000.0, -0.3209436098072051], [1442321856.0, 85540000.0, -0.33281954452298823], [1442323584.0, 85560000.0, -0.34464292317451123], [1442325312.0, 85580000.0, -0.3564118787132508], [1442327040.0, 85600000.0, -0.3681245526846773], [1442328768.0, 85620000.0, -0.3797790955217932], [1442330496.0, 85640000.0, -0.3913736668372004], [1442332224.0, 85660000.0, -0.4029064357136664], [1442333952.0, 85680000.0, -0.4143755809932807], [1442335680.0, 85700000.0, -0.42577929156507494], [1442337408.0, 85720000.0, -0.43711576665092805], [1442339136.0, 85740000.0, -0.44838321609002674], [1442340864.0, 85760000.0, -0.45957986062148803], [1442342592.0, 85780000.0, -0.47070393216532574], [1442344320.0, 85800000.0, -0.48175367410171405], [1442346048.0, 85820000.0, -0.49272734154829584], [1442347776.0, 85840000.0, -0.5036232016357582], [1442349504.0, 85860000.0, -0.5144395337815032], [1442351232.0, 85880000.0, -0.5251746299612918], [1442352960.0, 85900000.0, -0.5358267949789921], [1442354688.0, 85920000.0, -0.5463943467342699], [1442356416.0, 85940000.0, -0.5568756164881823], [1442358144.0, 85960000.0, -0.5672689491267561], [1442359872.0, 85980000.0, -0.5775727034222723], [1442361600.0, 86000000.0, -0.5877852522924715], [1442363328.0, 86020000.0, -0.5979049830575166], [1442365056.0, 86040000.0, -0.6079302976946025], [1442366784.0, 86060000.0, -0.6178596130903309], [1442368512.0, 86080000.0, -0.6276913612907019], [1442370240.0, 86100000.0, -0.6374239897486851], [1442371968.0, 86120000.0, -0.6470559615694446], [1442373696.0, 86140000.0, -0.6565857557529561], [1442375424.0, 86160000.0, -0.6660118674342507], [1442377152.0, 86180000.0, -0.6753328081210229], [1442378880.0, 86200000.0, -0.6845471059286814], [1442380608.0, 86220000.0, -0.6936533058128024], [1442382336.0, 86240000.0, -0.7026499697988511], [1442384064.0, 86260000.0, -0.7115356772092817], [1442385792.0, 86280000.0, -0.7203090248879077], [1442387520.0, 86300000.0, -0.7289686274214118], [1442389248.0, 86320000.0, -0.7375131173581688], [1442390976.0, 86340000.0, -0.7459411454241813], [1442392704.0, 86360000.0, -0.7542513807360979], [1442394432.0, 86380000.0, -0.7624425110114461], [1442396160.0, 86400000.0, -0.7705132427757915], [1442397888.0, 86420000.0, -0.7784623015670207], [1442399616.0, 86440000.0, -0.7862884321366201], 
[1442401344.0, 86460000.0, -0.793990398647836], [1442403072.0, 86480000.0, -0.8015669848708726], [1442404800.0, 86500000.0, -0.8090169943749472], [1442406528.0, 86520000.0, -0.8163392507171792], [1442408256.0, 86540000.0, -0.8235325976284263], [1442409984.0, 86560000.0, -0.8305958991958151], [1442411712.0, 86580000.0, -0.8375280400421398], [1442413440.0, 86600000.0, -0.8443279255020129], [1442415168.0, 86620000.0, -0.850994481794693], [1442416896.0, 86640000.0, -0.8575266561936493], [1442418624.0, 86660000.0, -0.8639234171928356], [1442420352.0, 86680000.0, -0.870183754669522], [1442422080.0, 86700000.0, -0.876306680043863], [1442423808.0, 86720000.0, -0.8822912264349524], [1442425536.0, 86740000.0, -0.8881364488135434], [1442427264.0, 86760000.0, -0.8938414241512622], [1442428992.0, 86780000.0, -0.8994052515663723], [1442430720.0, 86800000.0, -0.9048270524660175], [1442432448.0, 86820000.0, -0.9101059706849962], [1442434176.0, 86840000.0, -0.915241172620915], [1442435904.0, 86860000.0, -0.9202318473658703], [1442437632.0, 86880000.0, -0.9250772068344577], [1442439360.0, 86900000.0, -0.9297764858882508], [1442441088.0, 86920000.0, -0.9343289424566111], [1442442816.0, 86940000.0, -0.9387338576538754], [1442444544.0, 86960000.0, -0.9429905358928631], [1442446272.0, 86980000.0, -0.947098304994745], [1442448000.0, 87000000.0, -0.9510565162951519], [1442449728.0, 87020000.0, -0.9548645447466432], [1442451456.0, 87040000.0, -0.9585217890173758], [1442453184.0, 87060000.0, -0.9620276715860837], [1442454912.0, 87080000.0, -0.9653816388332734], [1442456640.0, 87100000.0, -0.9685831611286323], [1442458368.0, 87120000.0, -0.9716317329146732], [1442460096.0, 87140000.0, -0.9745268727865778], [1442461824.0, 87160000.0, -0.9772681235681925], [1442463552.0, 87180000.0, -0.9798550523842458], [1442465280.0, 87200000.0, -0.9822872507286888], [1442467008.0, 87220000.0, -0.9845643345292041], [1442468736.0, 87240000.0, -0.9866859442078679], [1442470464.0, 87260000.0, -0.9886517447379148], [1442472192.0, 87280000.0, -0.9904614256966509], [1442473920.0, 87300000.0, -0.9921147013144783], [1442475648.0, 87320000.0, -0.993611310520008], [1442477376.0, 87340000.0, -0.9949510169812997], [1442479104.0, 87360000.0, -0.9961336091431726], [1442480832.0, 87380000.0, -0.9971589002606135], [1442482560.0, 87400000.0, -0.9980267284282716], [1442484288.0, 87420000.0, -0.9987369566060178], [1442486016.0, 87440000.0, -0.9992894726405892], [1442487744.0, 87460000.0, -0.9996841892832999], [1442489472.0, 87480000.0, -0.9999210442038161], [1442491200.0, 87500000.0, -1.0], [1442492928.0, 87520000.0, -0.9999210442038161], [1442494656.0, 87540000.0, -0.9996841892833002], [1442496384.0, 87560000.0, -0.9992894726405892], [1442498112.0, 87580000.0, -0.9987369566060178], [1442499840.0, 87600000.0, -0.9980267284282716], [1442501568.0, 87620000.0, -0.997158900260614], [1442503296.0, 87640000.0, -0.9961336091431733], [1442505024.0, 87660000.0, -0.9949510169813005], [1442506752.0, 87680000.0, -0.9936113105200081], [1442508480.0, 87700000.0, -0.9921147013144784], [1442510208.0, 87720000.0, -0.990461425696651], [1442511936.0, 87740000.0, -0.988651744737915], [1442513664.0, 87760000.0, -0.986685944207868], [1442515392.0, 87780000.0, -0.9845643345292054], [1442517120.0, 87800000.0, -0.9822872507286903], [1442518848.0, 87820000.0, -0.9798550523842473], [1442520576.0, 87840000.0, -0.9772681235681926], [1442522304.0, 87860000.0, -0.974526872786578], [1442524032.0, 87880000.0, -0.9716317329146733], [1442525760.0, 87900000.0, -0.9685831611286324], 
[1442527488.0, 87920000.0, -0.9653816388332754], [1442529216.0, 87940000.0, -0.9620276715860858], [1442530944.0, 87960000.0, -0.9585217890173781], [1442532672.0, 87980000.0, -0.9548645447466434], [1442534400.0, 88000000.0, -0.9510565162951521], [1442536128.0, 88020000.0, -0.9470983049947452], [1442537856.0, 88040000.0, -0.9429905358928657], [1442539584.0, 88060000.0, -0.9387338576538756], [1442541312.0, 88080000.0, -0.9343289424566139], [1442543040.0, 88100000.0, -0.929776485888251], [1442544768.0, 88120000.0, -0.9250772068344606], [1442546496.0, 88140000.0, -0.9202318473658705], [1442548224.0, 88160000.0, -0.9152411726209152], [1442549952.0, 88180000.0, -0.9101059706849965], [1442551680.0, 88200000.0, -0.9048270524660208], [1442553408.0, 88220000.0, -0.8994052515663757], [1442555136.0, 88240000.0, -0.8938414241512658], [1442556864.0, 88260000.0, -0.8881364488135437], [1442558592.0, 88280000.0, -0.882291226434956], [1442560320.0, 88300000.0, -0.8763066800438635], [1442562048.0, 88320000.0, -0.8701837546695259], [1442563776.0, 88340000.0, -0.8639234171928359], [1442565504.0, 88360000.0, -0.8575266561936533], [1442567232.0, 88380000.0, -0.850994481794697], [1442568960.0, 88400000.0, -0.844327925502017], [1442570688.0, 88420000.0, -0.8375280400421402], [1442572416.0, 88440000.0, -0.8305958991958156], [1442574144.0, 88460000.0, -0.8235325976284268], [1442575872.0, 88480000.0, -0.8163392507171837], [1442577600.0, 88500000.0, -0.8090169943749518], [1442579328.0, 88520000.0, -0.8015669848708773], [1442581056.0, 88540000.0, -0.7939903986478408], [1442582784.0, 88560000.0, -0.7862884321366206], [1442584512.0, 88580000.0, -0.7784623015670211], [1442586240.0, 88600000.0, -0.7705132427757919], [1442587968.0, 88620000.0, -0.7624425110114511], [1442589696.0, 88640000.0, -0.754251380736103], [1442591424.0, 88660000.0, -0.7459411454241865], [1442593152.0, 88680000.0, -0.7375131173581742], [1442594880.0, 88700000.0, -0.7289686274214171], [1442596608.0, 88720000.0, -0.7203090248879082], [1442598336.0, 88740000.0, -0.7115356772092822], [1442600064.0, 88760000.0, -0.7026499697988516], [1442601792.0, 88780000.0, -0.693653305812808], [1442603520.0, 88800000.0, -0.6845471059286872], [1442605248.0, 88820000.0, -0.6753328081210287], [1442606976.0, 88840000.0, -0.6660118674342512], [1442608704.0, 88860000.0, -0.656585755752962], [1442610432.0, 88880000.0, -0.6470559615694451], [1442612160.0, 88900000.0, -0.6374239897486911], [1442613888.0, 88920000.0, -0.6276913612907025], [1442615616.0, 88940000.0, -0.617859613090337], [1442617344.0, 88960000.0, -0.607930297694603], [1442619072.0, 88980000.0, -0.5979049830575228], [1442620800.0, 89000000.0, -0.587785252292472], [1442622528.0, 89020000.0, -0.577572703422273], [1442624256.0, 89040000.0, -0.5672689491267566], [1442625984.0, 89060000.0, -0.5568756164881888], [1442627712.0, 89080000.0, -0.5463943467342766], [1442629440.0, 89100000.0, -0.5358267949789988], [1442631168.0, 89120000.0, -0.5251746299612925], [1442632896.0, 89140000.0, -0.51443953378151], [1442634624.0, 89160000.0, -0.5036232016357589], [1442636352.0, 89180000.0, -0.4927273415482965], [1442638080.0, 89200000.0, -0.4817536741017147], [1442639808.0, 89220000.0, -0.4707039321653327], [1442641536.0, 89240000.0, -0.459579860621495], [1442643264.0, 89260000.0, -0.4483832160900338], [1442644992.0, 89280000.0, -0.4371157666509287], [1442646720.0, 89300000.0, -0.4257792915650756], [1442648448.0, 89320000.0, -0.41437558099328137], [1442650176.0, 89340000.0, -0.4029064357136671], [1442651904.0, 89360000.0, 
-0.3913736668372076], [1442653632.0, 89380000.0, -0.37977909552180045], [1442655360.0, 89400000.0, -0.36812455268468464], [1442657088.0, 89420000.0, -0.3564118787132515], [1442658816.0, 89440000.0, -0.3446429231745119], [1442660544.0, 89460000.0, -0.3328195445229889], [1442662272.0, 89480000.0, -0.3209436098072125], [1442664000.0, 89500000.0, -0.3090169943749512], [1442665728.0, 89520000.0, -0.29704158157703947], [1442667456.0, 89540000.0, -0.2850192624699746], [1442669184.0, 89560000.0, -0.2729519355173313], [1442670912.0, 89580000.0, -0.26084150628989694], [1442672640.0, 89600000.0, -0.24868988716484866], [1442674368.0, 89620000.0, -0.2364989970237262], [1442676096.0, 89640000.0, -0.22427076094938347], [1442677824.0, 89660000.0, -0.21200710992205768], [1442679552.0, 89680000.0, -0.19970998051441083], [1442681280.0, 89700000.0, -0.18738131458572224], [1442683008.0, 89720000.0, -0.17502305897528142], [1442684736.0, 89740000.0, -0.16263716519488275], [1442686464.0, 89760000.0, -0.15022558912075698], [1442688192.0, 89780000.0, -0.13779029068463877], [1442689920.0, 89800000.0, -0.12533323356430573], [1442691648.0, 89820000.0, -0.112856384873491], [1442693376.0, 89840000.0, -0.10036171485121792], [1442695104.0, 89860000.0, -0.08785119655073992], [1442696832.0, 89880000.0, -0.07532680552793733], [1442698560.0, 89900000.0, -0.06279051952931167], [1442700288.0, 89920000.0, -0.050244318179768634], [1442702016.0, 89940000.0, -0.037690182669941494], [1442703744.0, 89960000.0, -0.025130095443338118], [1442705472.0, 89980000.0, -0.012566039883361135], [1442707200.0, 90000000.0, -2.204364238465236e-15], [1442708928.0, 90020000.0, 0.012566039883356727], [1442710656.0, 90040000.0, 0.02513009544333371], [1442712384.0, 90060000.0, 0.037690182669937095], [1442714112.0, 90080000.0, 0.050244318179764234], [1442715840.0, 90100000.0, 0.06279051952930727], [1442717568.0, 90120000.0, 0.07532680552793293], [1442719296.0, 90140000.0, 0.08785119655073553], [1442721024.0, 90160000.0, 0.10036171485121353], [1442722752.0, 90180000.0, 0.11285638487348662], [1442724480.0, 90200000.0, 0.12533323356430134], [1442726208.0, 90220000.0, 0.1377902906846344], [1442727936.0, 90240000.0, 0.15022558912075262], [1442729664.0, 90260000.0, 0.16263716519487842], [1442731392.0, 90280000.0, 0.1750230589752771], [1442733120.0, 90300000.0, 0.1873813145857179], [1442734848.0, 90320000.0, 0.19970998051440653], [1442736576.0, 90340000.0, 0.21200710992205338], [1442738304.0, 90360000.0, 0.22427076094937917], [1442740032.0, 90380000.0, 0.2364989970237219], [1442741760.0, 90400000.0, 0.2486898871648444], [1442743488.0, 90420000.0, 0.26084150628989267], [1442745216.0, 90440000.0, 0.27295193551732705], [1442746944.0, 90460000.0, 0.2850192624699704], [1442748672.0, 90480000.0, 0.29704158157703525], [1442750400.0, 90500000.0, 0.309016994374947], [1442752128.0, 90520000.0, 0.3209436098072083], [1442753856.0, 90540000.0, 0.33281954452298473], [1442755584.0, 90560000.0, 0.3446429231745078], [1442757312.0, 90580000.0, 0.35641187871324737], [1442759040.0, 90600000.0, 0.36812455268468053], [1442760768.0, 90620000.0, 0.37977909552179634], [1442762496.0, 90640000.0, 0.39137366683720354], [1442764224.0, 90660000.0, 0.402906435713663], [1442765952.0, 90680000.0, 0.4143755809932773], [1442767680.0, 90700000.0, 0.4257792915650716], [1442769408.0, 90720000.0, 0.4371157666509248], [1442771136.0, 90740000.0, 0.44838321609002985], [1442772864.0, 90760000.0, 0.45957986062149103], [1442774592.0, 90780000.0, 0.4707039321653288], [1442776320.0, 90800000.0, 
0.4817536741017108], [1442778048.0, 90820000.0, 0.4927273415482927], [1442779776.0, 90840000.0, 0.5036232016357551], [1442781504.0, 90860000.0, 0.5144395337815062], [1442783232.0, 90880000.0, 0.5251746299612887], [1442784960.0, 90900000.0, 0.535826794978995], [1442786688.0, 90920000.0, 0.5463943467342728], [1442788416.0, 90940000.0, 0.5568756164881852], [1442790144.0, 90960000.0, 0.5672689491267531], [1442791872.0, 90980000.0, 0.5775727034222693], [1442793600.0, 91000000.0, 0.5877852522924685], [1442795328.0, 91020000.0, 0.5979049830575193], [1442797056.0, 91040000.0, 0.6079302976945996], [1442798784.0, 91060000.0, 0.6178596130903335], [1442800512.0, 91080000.0, 0.627691361290699], [1442802240.0, 91100000.0, 0.6374239897486877], [1442803968.0, 91120000.0, 0.6470559615694418], [1442805696.0, 91140000.0, 0.6565857557529586], [1442807424.0, 91160000.0, 0.6660118674342479], [1442809152.0, 91180000.0, 0.6753328081210255], [1442810880.0, 91200000.0, 0.684547105928684], [1442812608.0, 91220000.0, 0.6936533058128048], [1442814336.0, 91240000.0, 0.7026499697988485], [1442816064.0, 91260000.0, 0.7115356772092791], [1442817792.0, 91280000.0, 0.7203090248879052], [1442819520.0, 91300000.0, 0.7289686274214141], [1442821248.0, 91320000.0, 0.7375131173581712], [1442822976.0, 91340000.0, 0.7459411454241837], [1442824704.0, 91360000.0, 0.7542513807361001], [1442826432.0, 91380000.0, 0.7624425110114483], [1442828160.0, 91400000.0, 0.7705132427757891], [1442829888.0, 91420000.0, 0.7784623015670185], [1442831616.0, 91440000.0, 0.7862884321366179], [1442833344.0, 91460000.0, 0.7939903986478382], [1442835072.0, 91480000.0, 0.8015669848708746], [1442836800.0, 91500000.0, 0.8090169943749492], [1442838528.0, 91520000.0, 0.8163392507171812], [1442840256.0, 91540000.0, 0.8235325976284242], [1442841984.0, 91560000.0, 0.830595899195813], [1442843712.0, 91580000.0, 0.8375280400421378], [1442845440.0, 91600000.0, 0.8443279255020146], [1442847168.0, 91620000.0, 0.8509944817946947], [1442848896.0, 91640000.0, 0.857526656193651], [1442850624.0, 91660000.0, 0.8639234171928337], [1442852352.0, 91680000.0, 0.8701837546695237], [1442854080.0, 91700000.0, 0.8763066800438613], [1442855808.0, 91720000.0, 0.8822912264349541], [1442857536.0, 91740000.0, 0.8881364488135416], [1442859264.0, 91760000.0, 0.8938414241512638], [1442860992.0, 91780000.0, 0.8994052515663739], [1442862720.0, 91800000.0, 0.9048270524660189], [1442864448.0, 91820000.0, 0.9101059706849948], [1442866176.0, 91840000.0, 0.9152411726209134], [1442867904.0, 91860000.0, 0.9202318473658688], [1442869632.0, 91880000.0, 0.9250772068344589], [1442871360.0, 91900000.0, 0.9297764858882493], [1442873088.0, 91920000.0, 0.9343289424566124], [1442874816.0, 91940000.0, 0.9387338576538741], [1442876544.0, 91960000.0, 0.9429905358928642], [1442878272.0, 91980000.0, 0.9470983049947438], [1442880000.0, 92000000.0, 0.9510565162951506], [1442881728.0, 92020000.0, 0.9548645447466421], [1442883456.0, 92040000.0, 0.9585217890173768], [1442885184.0, 92060000.0, 0.9620276715860846], [1442886912.0, 92080000.0, 0.9653816388332743], [1442888640.0, 92100000.0, 0.9685831611286314], [1442890368.0, 92120000.0, 0.9716317329146723], [1442892096.0, 92140000.0, 0.974526872786577], [1442893824.0, 92160000.0, 0.9772681235681917], [1442895552.0, 92180000.0, 0.9798550523842464], [1442897280.0, 92200000.0, 0.9822872507286895], [1442899008.0, 92220000.0, 0.9845643345292048], [1442900736.0, 92240000.0, 0.9866859442078674], [1442902464.0, 92260000.0, 0.9886517447379143], [1442904192.0, 92280000.0, 
0.9904614256966504], [1442905920.0, 92300000.0, 0.9921147013144779], [1442907648.0, 92320000.0, 0.9936113105200075], [1442909376.0, 92340000.0, 0.9949510169813001], [1442911104.0, 92360000.0, 0.996133609143173], [1442912832.0, 92380000.0, 0.9971589002606137], [1442914560.0, 92400000.0, 0.9980267284282713], [1442916288.0, 92420000.0, 0.9987369566060176], [1442918016.0, 92440000.0, 0.9992894726405891], [1442919744.0, 92460000.0, 0.9996841892833], [1442921472.0, 92480000.0, 0.999921044203816], [1442923200.0, 92500000.0, 1.0], [1442924928.0, 92520000.0, 0.9999210442038161], [1442926656.0, 92540000.0, 0.9996841892833], [1442928384.0, 92560000.0, 0.9992894726405893], [1442930112.0, 92580000.0, 0.998736956606018], [1442931840.0, 92600000.0, 0.9980267284282719], [1442933568.0, 92620000.0, 0.9971589002606138], [1442935296.0, 92640000.0, 0.996133609143173], [1442937024.0, 92660000.0, 0.9949510169813002], [1442938752.0, 92680000.0, 0.9936113105200085], [1442940480.0, 92700000.0, 0.9921147013144789], [1442942208.0, 92720000.0, 0.9904614256966515], [1442943936.0, 92740000.0, 0.9886517447379155], [1442945664.0, 92760000.0, 0.9866859442078686], [1442947392.0, 92780000.0, 0.9845643345292049], [1442949120.0, 92800000.0, 0.9822872507286896], [1442950848.0, 92820000.0, 0.9798550523842466], [1442952576.0, 92840000.0, 0.9772681235681934], [1442954304.0, 92860000.0, 0.9745268727865788], [1442956032.0, 92880000.0, 0.9716317329146742], [1442957760.0, 92900000.0, 0.9685831611286334], [1442959488.0, 92920000.0, 0.9653816388332745], [1442961216.0, 92940000.0, 0.9620276715860849], [1442962944.0, 92960000.0, 0.9585217890173771], [1442964672.0, 92980000.0, 0.9548645447466445], [1442966400.0, 93000000.0, 0.9510565162951532], [1442968128.0, 93020000.0, 0.9470983049947465], [1442969856.0, 93040000.0, 0.9429905358928645], [1442971584.0, 93060000.0, 0.938733857653877], [1442973312.0, 93080000.0, 0.9343289424566127], [1442975040.0, 93100000.0, 0.9297764858882523], [1442976768.0, 93120000.0, 0.9250772068344594], [1442978496.0, 93140000.0, 0.9202318473658719], [1442980224.0, 93160000.0, 0.9152411726209168], [1442981952.0, 93180000.0, 0.9101059706849981], [1442983680.0, 93200000.0, 0.9048270524660192], [1442985408.0, 93220000.0, 0.8994052515663742], [1442987136.0, 93240000.0, 0.8938414241512642], [1442988864.0, 93260000.0, 0.8881364488135454], [1442990592.0, 93280000.0, 0.8822912264349545], [1442992320.0, 93300000.0, 0.8763066800438652], [1442994048.0, 93320000.0, 0.8701837546695242], [1442995776.0, 93340000.0, 0.8639234171928378], [1442997504.0, 93360000.0, 0.8575266561936515], [1442999232.0, 93380000.0, 0.8509944817946953], [1443000960.0, 93400000.0, 0.8443279255020152], [1443002688.0, 93420000.0, 0.8375280400421422], [1443004416.0, 93440000.0, 0.8305958991958176], [1443006144.0, 93460000.0, 0.8235325976284289], [1443007872.0, 93480000.0, 0.8163392507171817], [1443009600.0, 93500000.0, 0.8090169943749498], [1443011328.0, 93520000.0, 0.8015669848708752], [1443013056.0, 93540000.0, 0.7939903986478387], [1443014784.0, 93560000.0, 0.7862884321366228], [1443016512.0, 93580000.0, 0.7784623015670235], [1443018240.0, 93600000.0, 0.7705132427757944], [1443019968.0, 93620000.0, 0.7624425110114489], [1443021696.0, 93640000.0, 0.7542513807361008], [1443023424.0, 93660000.0, 0.7459411454241843], [1443025152.0, 93680000.0, 0.7375131173581718], [1443026880.0, 93700000.0, 0.7289686274214148], [1443028608.0, 93720000.0, 0.7203090248879107], [1443030336.0, 93740000.0, 0.7115356772092848], [1443032064.0, 93760000.0, 0.7026499697988543], 
[1443033792.0, 93780000.0, 0.6936533058128056], [1443035520.0, 93800000.0, 0.6845471059286846], [1443037248.0, 93820000.0, 0.6753328081210261], [1443038976.0, 93840000.0, 0.666011867434254], [1443040704.0, 93860000.0, 0.6565857557529594], [1443042432.0, 93880000.0, 0.6470559615694479], [1443044160.0, 93900000.0, 0.6374239897486885], [1443045888.0, 93920000.0, 0.6276913612907054], [1443047616.0, 93940000.0, 0.6178596130903343], [1443049344.0, 93960000.0, 0.6079302976946059], [1443051072.0, 93980000.0, 0.59790498305752], [1443052800.0, 94000000.0, 0.587785252292475], [1443054528.0, 94020000.0, 0.577572703422276], [1443056256.0, 94040000.0, 0.5672689491267596], [1443057984.0, 94060000.0, 0.556875616488186], [1443059712.0, 94080000.0, 0.5463943467342737], [1443061440.0, 94100000.0, 0.5358267949789959], [1443063168.0, 94120000.0, 0.5251746299612956], [1443064896.0, 94140000.0, 0.514439533781507], [1443066624.0, 94160000.0, 0.5036232016357621], [1443068352.0, 94180000.0, 0.49272734154829967], [1443070080.0, 94200000.0, 0.48175367410171793], [1443071808.0, 94220000.0, 0.47070393216532963], [1443073536.0, 94240000.0, 0.4595798606214919], [1443075264.0, 94260000.0, 0.4483832160900307], [1443076992.0, 94280000.0, 0.43711576665093205], [1443078720.0, 94300000.0, 0.42577929156507893], [1443080448.0, 94320000.0, 0.4143755809932847], [1443082176.0, 94340000.0, 0.4029064357136704], [1443083904.0, 94360000.0, 0.3913736668372044], [1443085632.0, 94380000.0, 0.3797790955217973], [1443087360.0, 94400000.0, 0.3681245526846814], [1443089088.0, 94420000.0, 0.3564118787132549], [1443090816.0, 94440000.0, 0.34464292317451534], [1443092544.0, 94460000.0, 0.3328195445229924], [1443094272.0, 94480000.0, 0.32094360980720926], [1443096000.0, 94500000.0, 0.30901699437495467], [1443097728.0, 94520000.0, 0.2970415815770362], [1443099456.0, 94540000.0, 0.28501926246997134], [1443101184.0, 94560000.0, 0.272951935517328], [1443102912.0, 94580000.0, 0.2608415062899005], [1443104640.0, 94600000.0, 0.2486898871648522], [1443106368.0, 94620000.0, 0.23649899702372976], [1443108096.0, 94640000.0, 0.22427076094938012], [1443109824.0, 94660000.0, 0.2120071099220613], [1443111552.0, 94680000.0, 0.19970998051440747], [1443113280.0, 94700000.0, 0.18738131458572585], [1443115008.0, 94720000.0, 0.17502305897527806], [1443116736.0, 94740000.0, 0.16263716519488639], [1443118464.0, 94760000.0, 0.1502255891207536], [1443120192.0, 94780000.0, 0.1377902906846424], [1443121920.0, 94800000.0, 0.12533323356430232], [1443123648.0, 94820000.0, 0.11285638487348759], [1443125376.0, 94840000.0, 0.10036171485121452], [1443127104.0, 94860000.0, 0.08785119655074358], [1443128832.0, 94880000.0, 0.075326805527941], [1443130560.0, 94900000.0, 0.06279051952931534], [1443132288.0, 94920000.0, 0.050244318179765206], [1443134016.0, 94940000.0, 0.037690182669938066], [1443135744.0, 94960000.0, 0.02513009544333469], [1443137472.0, 94980000.0, 0.012566039883357706], [1443139200.0, 95000000.0, 5.879542597180472e-15], [1443140928.0, 95020000.0, -0.012566039883353051], [1443142656.0, 95040000.0, -0.025130095443330037], [1443144384.0, 95060000.0, -0.03769018266993342], [1443146112.0, 95080000.0, -0.050244318179760564], [1443147840.0, 95100000.0, -0.0627905195293107], [1443149568.0, 95120000.0, -0.07532680552793634], [1443151296.0, 95140000.0, -0.08785119655073895], [1443153024.0, 95160000.0, -0.10036171485120989], [1443154752.0, 95180000.0, -0.11285638487348297], [1443156480.0, 95200000.0, -0.1253332335642977], [1443158208.0, 95220000.0, -0.1377902906846378], 
[1443159936.0, 95240000.0, -0.150225589120749], [1443161664.0, 95260000.0, -0.16263716519488178], [1443163392.0, 95280000.0, -0.17502305897527348], [1443165120.0, 95300000.0, -0.1873813145857213], [1443166848.0, 95320000.0, -0.19970998051440292], [1443168576.0, 95340000.0, -0.21200710992205674], [1443170304.0, 95360000.0, -0.2242707609493756], [1443172032.0, 95380000.0, -0.23649899702372523], [1443173760.0, 95400000.0, -0.24868988716484772], [1443175488.0, 95420000.0, -0.260841506289896], [1443177216.0, 95440000.0, -0.27295193551732355], [1443178944.0, 95460000.0, -0.2850192624699669], [1443180672.0, 95480000.0, -0.29704158157703175], [1443182400.0, 95500000.0, -0.3090169943749503], [1443184128.0, 95520000.0, -0.3209436098072049], [1443185856.0, 95540000.0, -0.332819544522988], [1443187584.0, 95560000.0, -0.344642923174511], [1443189312.0, 95580000.0, -0.3564118787132506], [1443191040.0, 95600000.0, -0.3681245526846771], [1443192768.0, 95620000.0, -0.37977909552179295], [1443194496.0, 95640000.0, -0.39137366683720015], [1443196224.0, 95660000.0, -0.4029064357136662], [1443197952.0, 95680000.0, -0.4143755809932805], [1443199680.0, 95700000.0, -0.4257792915650747], [1443201408.0, 95720000.0, -0.43711576665092783], [1443203136.0, 95740000.0, -0.4483832160900265], [1443204864.0, 95760000.0, -0.4595798606214878], [1443206592.0, 95780000.0, -0.4707039321653255], [1443208320.0, 95800000.0, -0.4817536741017138], [1443210048.0, 95820000.0, -0.49272734154829567], [1443211776.0, 95840000.0, -0.503623201635758], [1443213504.0, 95860000.0, -0.5144395337815031], [1443215232.0, 95880000.0, -0.5251746299612916], [1443216960.0, 95900000.0, -0.5358267949789919], [1443218688.0, 95920000.0, -0.5463943467342698], [1443220416.0, 95940000.0, -0.5568756164881821], [1443222144.0, 95960000.0, -0.5672689491267559], [1443223872.0, 95980000.0, -0.5775727034222721], [1443225600.0, 96000000.0, -0.5877852522924712], [1443227328.0, 96020000.0, -0.5979049830575164], [1443229056.0, 96040000.0, -0.6079302976946023], [1443230784.0, 96060000.0, -0.6178596130903307], [1443232512.0, 96080000.0, -0.6276913612907018], [1443234240.0, 96100000.0, -0.6374239897486849], [1443235968.0, 96120000.0, -0.6470559615694443], [1443237696.0, 96140000.0, -0.6565857557529559], [1443239424.0, 96160000.0, -0.6660118674342504], [1443241152.0, 96180000.0, -0.6753328081210227], [1443242880.0, 96200000.0, -0.6845471059286813], [1443244608.0, 96220000.0, -0.6936533058128022], [1443246336.0, 96240000.0, -0.702649969798851], [1443248064.0, 96260000.0, -0.7115356772092816], [1443249792.0, 96280000.0, -0.7203090248879075], [1443251520.0, 96300000.0, -0.7289686274214117], [1443253248.0, 96320000.0, -0.7375131173581687], [1443254976.0, 96340000.0, -0.7459411454241812], [1443256704.0, 96360000.0, -0.7542513807360977], [1443258432.0, 96380000.0, -0.7624425110114459], [1443260160.0, 96400000.0, -0.7705132427757914], [1443261888.0, 96420000.0, -0.7784623015670206], [1443263616.0, 96440000.0, -0.78628843213662], [1443265344.0, 96460000.0, -0.7939903986478359], [1443267072.0, 96480000.0, -0.8015669848708724], [1443268800.0, 96500000.0, -0.8090169943749471], [1443270528.0, 96520000.0, -0.816339250717179], [1443272256.0, 96540000.0, -0.8235325976284262], [1443273984.0, 96560000.0, -0.830595899195815], [1443275712.0, 96580000.0, -0.8375280400421397], [1443277440.0, 96600000.0, -0.8443279255020126], [1443279168.0, 96620000.0, -0.8509944817946928], [1443280896.0, 96640000.0, -0.8575266561936491], [1443282624.0, 96660000.0, -0.8639234171928355], [1443284352.0, 
96680000.0, -0.8701837546695219], [1443286080.0, 96700000.0, -0.8763066800438629], [1443287808.0, 96720000.0, -0.8822912264349523], [1443289536.0, 96740000.0, -0.8881364488135433], [1443291264.0, 96760000.0, -0.8938414241512621], [1443292992.0, 96780000.0, -0.8994052515663722], [1443294720.0, 96800000.0, -0.9048270524660174], [1443296448.0, 96820000.0, -0.9101059706849961], [1443298176.0, 96840000.0, -0.9152411726209149], [1443299904.0, 96860000.0, -0.9202318473658702], [1443301632.0, 96880000.0, -0.9250772068344576], [1443303360.0, 96900000.0, -0.9297764858882507], [1443305088.0, 96920000.0, -0.934328942456611], [1443306816.0, 96940000.0, -0.9387338576538753], [1443308544.0, 96960000.0, -0.942990535892863], [1443310272.0, 96980000.0, -0.9470983049947449], [1443312000.0, 97000000.0, -0.9510565162951518], [1443313728.0, 97020000.0, -0.9548645447466431], [1443315456.0, 97040000.0, -0.9585217890173757], [1443317184.0, 97060000.0, -0.9620276715860836], [1443318912.0, 97080000.0, -0.9653816388332733], [1443320640.0, 97100000.0, -0.9685831611286322], [1443322368.0, 97120000.0, -0.9716317329146731], [1443324096.0, 97140000.0, -0.9745268727865778], [1443325824.0, 97160000.0, -0.9772681235681924], [1443327552.0, 97180000.0, -0.9798550523842458], [1443329280.0, 97200000.0, -0.9822872507286887], [1443331008.0, 97220000.0, -0.9845643345292041], [1443332736.0, 97240000.0, -0.9866859442078679], [1443334464.0, 97260000.0, -0.9886517447379148], [1443336192.0, 97280000.0, -0.9904614256966509], [1443337920.0, 97300000.0, -0.9921147013144783], [1443339648.0, 97320000.0, -0.993611310520008], [1443341376.0, 97340000.0, -0.9949510169812996], [1443343104.0, 97360000.0, -0.9961336091431726], [1443344832.0, 97380000.0, -0.9971589002606135], [1443346560.0, 97400000.0, -0.9980267284282716], [1443348288.0, 97420000.0, -0.9987369566060178], [1443350016.0, 97440000.0, -0.9992894726405892], [1443351744.0, 97460000.0, -0.9996841892832999], [1443353472.0, 97480000.0, -0.9999210442038161], [1443355200.0, 97500000.0, -1.0], [1443356928.0, 97520000.0, -0.9999210442038161], [1443358656.0, 97540000.0, -0.9996841892833002], [1443360384.0, 97560000.0, -0.9992894726405892], [1443362112.0, 97580000.0, -0.9987369566060179], [1443363840.0, 97600000.0, -0.9980267284282717], [1443365568.0, 97620000.0, -0.997158900260614], [1443367296.0, 97640000.0, -0.9961336091431733], [1443369024.0, 97660000.0, -0.9949510169813005], [1443370752.0, 97680000.0, -0.9936113105200081], [1443372480.0, 97700000.0, -0.9921147013144784], [1443374208.0, 97720000.0, -0.990461425696651], [1443375936.0, 97740000.0, -0.988651744737915], [1443377664.0, 97760000.0, -0.986685944207868], [1443379392.0, 97780000.0, -0.9845643345292056], [1443381120.0, 97800000.0, -0.9822872507286903], [1443382848.0, 97820000.0, -0.9798550523842473], [1443384576.0, 97840000.0, -0.9772681235681926], [1443386304.0, 97860000.0, -0.974526872786578], [1443388032.0, 97880000.0, -0.9716317329146734], [1443389760.0, 97900000.0, -0.9685831611286325], [1443391488.0, 97920000.0, -0.9653816388332755], [1443393216.0, 97940000.0, -0.9620276715860859], [1443394944.0, 97960000.0, -0.9585217890173782], [1443396672.0, 97980000.0, -0.9548645447466434], [1443398400.0, 98000000.0, -0.9510565162951521], [1443400128.0, 98020000.0, -0.9470983049947453], [1443401856.0, 98040000.0, -0.9429905358928657], [1443403584.0, 98060000.0, -0.9387338576538757], [1443405312.0, 98080000.0, -0.934328942456614], [1443407040.0, 98100000.0, -0.9297764858882511], [1443408768.0, 98120000.0, -0.9250772068344607], [1443410496.0, 
98140000.0, -0.9202318473658706], [1443412224.0, 98160000.0, -0.9152411726209153], [1443413952.0, 98180000.0, -0.9101059706849967], [1443415680.0, 98200000.0, -0.9048270524660209], [1443417408.0, 98220000.0, -0.8994052515663759], [1443419136.0, 98240000.0, -0.8938414241512659], [1443420864.0, 98260000.0, -0.8881364488135438], [1443422592.0, 98280000.0, -0.8822912264349562], [1443424320.0, 98300000.0, -0.8763066800438636], [1443426048.0, 98320000.0, -0.870183754669526], [1443427776.0, 98340000.0, -0.863923417192836], [1443429504.0, 98360000.0, -0.8575266561936534], [1443431232.0, 98380000.0, -0.8509944817946972], [1443432960.0, 98400000.0, -0.8443279255020172], [1443434688.0, 98420000.0, -0.8375280400421403], [1443436416.0, 98440000.0, -0.8305958991958157], [1443438144.0, 98460000.0, -0.8235325976284269], [1443439872.0, 98480000.0, -0.8163392507171838], [1443441600.0, 98500000.0, -0.809016994374952], [1443443328.0, 98520000.0, -0.8015669848708774], [1443445056.0, 98540000.0, -0.7939903986478409], [1443446784.0, 98560000.0, -0.7862884321366207], [1443448512.0, 98580000.0, -0.7784623015670213], [1443450240.0, 98600000.0, -0.7705132427757921], [1443451968.0, 98620000.0, -0.7624425110114513], [1443453696.0, 98640000.0, -0.7542513807361032], [1443455424.0, 98660000.0, -0.7459411454241868], [1443457152.0, 98680000.0, -0.7375131173581743], [1443458880.0, 98700000.0, -0.7289686274214173], [1443460608.0, 98720000.0, -0.7203090248879084], [1443462336.0, 98740000.0, -0.7115356772092823], [1443464064.0, 98760000.0, -0.7026499697988519], [1443465792.0, 98780000.0, -0.6936533058128082], [1443467520.0, 98800000.0, -0.6845471059286873], [1443469248.0, 98820000.0, -0.6753328081210289], [1443470976.0, 98840000.0, -0.6660118674342514], [1443472704.0, 98860000.0, -0.6565857557529622], [1443474432.0, 98880000.0, -0.6470559615694453], [1443476160.0, 98900000.0, -0.6374239897486913], [1443477888.0, 98920000.0, -0.6276913612907027], [1443479616.0, 98940000.0, -0.6178596130903372], [1443481344.0, 98960000.0, -0.6079302976946033], [1443483072.0, 98980000.0, -0.597904983057523], [1443484800.0, 99000000.0, -0.5877852522924722], [1443486528.0, 99020000.0, -0.5775727034222731], [1443488256.0, 99040000.0, -0.5672689491267568], [1443489984.0, 99060000.0, -0.5568756164881891], [1443491712.0, 99080000.0, -0.5463943467342767], [1443493440.0, 99100000.0, -0.535826794978999], [1443495168.0, 99120000.0, -0.5251746299612927], [1443496896.0, 99140000.0, -0.5144395337815102], [1443498624.0, 99160000.0, -0.5036232016357591], [1443500352.0, 99180000.0, -0.4927273415482967], [1443502080.0, 99200000.0, -0.48175367410171493], [1443503808.0, 99220000.0, -0.4707039321653329], [1443505536.0, 99240000.0, -0.4595798606214952], [1443507264.0, 99260000.0, -0.448383216090034], [1443508992.0, 99280000.0, -0.43711576665092894], [1443510720.0, 99300000.0, -0.4257792915650758], [1443512448.0, 99320000.0, -0.4143755809932816], [1443514176.0, 99340000.0, -0.4029064357136673], [1443515904.0, 99360000.0, -0.3913736668372078], [1443517632.0, 99380000.0, -0.37977909552180067], [1443519360.0, 99400000.0, -0.36812455268468486], [1443521088.0, 99420000.0, -0.3564118787132517], [1443522816.0, 99440000.0, -0.3446429231745121], [1443524544.0, 99460000.0, -0.3328195445229891], [1443526272.0, 99480000.0, -0.32094360980721276], [1443528000.0, 99500000.0, -0.30901699437495145], [1443529728.0, 99520000.0, -0.2970415815770397], [1443531456.0, 99540000.0, -0.28501926246997483], [1443533184.0, 99560000.0, -0.27295193551733155], [1443534912.0, 99580000.0, 
-0.26084150628989716], [1443536640.0, 99600000.0, -0.24868988716484888], [1443538368.0, 99620000.0, -0.23649899702372643], [1443540096.0, 99640000.0, -0.2242707609493837], [1443541824.0, 99660000.0, -0.21200710992205793], [1443543552.0, 99680000.0, -0.19970998051441108], [1443545280.0, 99700000.0, -0.1873813145857225], [1443547008.0, 99720000.0, -0.17502305897528167], [1443548736.0, 99740000.0, -0.162637165194883], [1443550464.0, 99760000.0, -0.15022558912075723], [1443552192.0, 99780000.0, -0.137790290684639], [1443553920.0, 99800000.0, -0.12533323356430598], [1443555648.0, 99820000.0, -0.11285638487349124], [1443557376.0, 99840000.0, -0.10036171485121817], [1443559104.0, 99860000.0, -0.08785119655074017], [1443560832.0, 99880000.0, -0.07532680552793757], [1443562560.0, 99900000.0, -0.06279051952931192], [1443564288.0, 99920000.0, -0.050244318179768876], [1443566016.0, 99940000.0, -0.037690182669941744], [1443567744.0, 99960000.0, -0.025130095443338364], [1443569472.0, 99980000.0, -0.01256603988336138]] \ No newline at end of file
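The demo series added in this change are stored as [wall_time, step, value] triples: wall time in seconds (about 1728 s between samples), a step counter advancing by 20,000 per sample, and a scalar value. A minimal sketch of how such data could be regenerated, assuming (inferred from the values above, not stated anywhere in this diff) a sine series with a 10,000,000-step period and a quadratic series 100 - step^2 * 1e-14; the actual generator script is not part of this commit:

    # Illustrative sketch only: emits [wall_time, step, value] triples in the
    # same shape as the demo JSON files in this diff. All constants below are
    # inferred from the data, not taken from any script in the repository.
    import json
    import math

    START_TIME = 1434931200.0   # first wall_time seen in the demo data
    STEP_SIZE = 20000           # step increment between consecutive samples
    SECONDS_PER_SAMPLE = 1728.0 # wall-clock spacing between samples
    NUM_SAMPLES = 5000          # covers steps 0 .. 99,980,000

    def make_series(value_fn):
        # Build one demo series as a list of [wall_time, step, value] triples.
        return [[START_TIME + i * SECONDS_PER_SAMPLE,
                 float(i * STEP_SIZE),
                 value_fn(i * STEP_SIZE)]
                for i in range(NUM_SAMPLES)]

    # sin.json-like series: one full period every 10,000,000 steps (assumed).
    sin_series = make_series(lambda s: math.sin(2.0 * math.pi * s / 1e7))
    # sq.json-like series: 100 minus a slowly growing quadratic term (assumed).
    sq_series = make_series(lambda s: 100.0 - (s ** 2) * 1e-14)

    with open("sin.json", "w") as f:
        json.dump(sin_series, f)
    with open("sq.json", "w") as f:
        json.dump(sq_series, f)

Spot-checking the assumed formulas against the data above: at step 1,000,000 the quadratic series gives 100 - 0.01 = 99.99, and at step 82,500,000 the sine series gives sin(2π · 8.25) = 1.0, both matching the stored values.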
diff --git a/tensorflow/tensorboard/app/demo/data/sq.json b/tensorflow/tensorboard/app/demo/data/sq.json
new file mode 100644
index 0000000000..03c99221b9
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/data/sq.json
@@ -0,0 +1 @@
+[[1434931200.0, 0.0, 100.0], [1434932928.0, 20000.0, 99.999996], [1434934656.0, 40000.0, 99.999984], [1434936384.0, 60000.0, 99.999964], [1434938112.0, 80000.0, 99.999936], [1434939840.0, 100000.0, 99.9999], [1434941568.0, 120000.0, 99.999856], [1434943296.0, 140000.0, 99.999804], [1434945024.0, 160000.0, 99.999744], [1434946752.0, 180000.0, 99.999676], [1434948480.0, 200000.0, 99.9996], [1434950208.0, 220000.0, 99.999516], [1434951936.0, 240000.0, 99.999424], [1434953664.0, 260000.0, 99.999324], [1434955392.0, 280000.0, 99.999216], [1434957120.0, 300000.0, 99.9991], [1434958848.0, 320000.0, 99.998976], [1434960576.0, 340000.0, 99.998844], [1434962304.0, 360000.0, 99.998704], [1434964032.0, 380000.0, 99.998556], [1434965760.0, 400000.0, 99.9984], [1434967488.0, 420000.0, 99.998236], [1434969216.0, 440000.0, 99.998064], [1434970944.0, 460000.0, 99.997884], [1434972672.0, 480000.0, 99.997696], [1434974400.0, 500000.0, 99.9975], [1434976128.0, 520000.0, 99.997296], [1434977856.0, 540000.0, 99.997084], [1434979584.0, 560000.0, 99.996864], [1434981312.0, 580000.0, 99.996636], [1434983040.0, 600000.0, 99.9964], [1434984768.0, 620000.0, 99.996156], [1434986496.0, 640000.0, 99.995904], [1434988224.0, 660000.0, 99.995644], [1434989952.0, 680000.0, 99.995376], [1434991680.0, 700000.0, 99.9951], [1434993408.0, 720000.0, 99.994816], [1434995136.0, 740000.0, 99.994524], [1434996864.0, 760000.0, 99.994224], [1434998592.0, 780000.0, 99.993916], [1435000320.0, 800000.0, 99.9936], [1435002048.0, 820000.0, 99.993276], [1435003776.0, 840000.0, 99.992944], [1435005504.0, 860000.0, 99.992604], [1435007232.0, 880000.0, 99.992256], [1435008960.0, 900000.0, 99.9919], [1435010688.0, 920000.0, 99.991536], [1435012416.0, 940000.0, 99.991164], [1435014144.0, 960000.0, 99.990784], [1435015872.0, 980000.0, 99.990396], [1435017600.0, 1000000.0, 99.99], [1435019328.0, 1020000.0, 99.989596], [1435021056.0, 1040000.0, 99.989184], [1435022784.0, 1060000.0, 99.988764], [1435024512.0, 1080000.0, 99.988336], [1435026240.0, 1100000.0, 99.9879], [1435027968.0, 1120000.0, 99.987456], [1435029696.0, 1140000.0, 99.987004], [1435031424.0, 1160000.0, 99.986544], [1435033152.0, 1180000.0, 99.986076], [1435034880.0, 1200000.0, 99.9856], [1435036608.0, 1220000.0, 99.985116], [1435038336.0, 1240000.0, 99.984624], [1435040064.0, 1260000.0, 99.984124], [1435041792.0, 1280000.0, 99.983616], [1435043520.0, 1300000.0, 99.9831], [1435045248.0, 1320000.0, 99.982576], [1435046976.0, 1340000.0, 99.982044], [1435048704.0, 1360000.0, 99.981504], [1435050432.0, 1380000.0, 99.980956], [1435052160.0, 1400000.0, 99.9804], [1435053888.0, 1420000.0, 99.979836], [1435055616.0, 1440000.0, 99.979264], [1435057344.0, 1460000.0, 99.978684], [1435059072.0, 1480000.0, 99.978096], [1435060800.0, 1500000.0, 99.9775], [1435062528.0, 1520000.0, 99.976896], [1435064256.0, 1540000.0, 99.976284], [1435065984.0, 1560000.0, 99.975664], [1435067712.0, 1580000.0, 99.975036], [1435069440.0, 1600000.0, 99.9744], [1435071168.0, 1620000.0, 99.973756], [1435072896.0, 1640000.0, 99.973104], [1435074624.0, 1660000.0, 99.972444], [1435076352.0, 1680000.0, 99.971776], [1435078080.0, 1700000.0, 99.9711], [1435079808.0, 1720000.0, 99.970416], [1435081536.0, 1740000.0, 99.969724], [1435083264.0, 1760000.0, 99.969024], [1435084992.0, 1780000.0, 99.968316], [1435086720.0, 1800000.0, 99.9676], [1435088448.0, 1820000.0, 99.966876], [1435090176.0, 1840000.0, 99.966144], [1435091904.0, 1860000.0, 99.965404], [1435093632.0, 1880000.0, 99.964656], [1435095360.0, 1900000.0, 99.9639], 
[1435097088.0, 1920000.0, 99.963136], [1435098816.0, 1940000.0, 99.962364], [1435100544.0, 1960000.0, 99.961584], [1435102272.0, 1980000.0, 99.960796], [1435104000.0, 2000000.0, 99.96], [1435105728.0, 2020000.0, 99.959196], [1435107456.0, 2040000.0, 99.958384], [1435109184.0, 2060000.0, 99.957564], [1435110912.0, 2080000.0, 99.956736], [1435112640.0, 2100000.0, 99.9559], [1435114368.0, 2120000.0, 99.955056], [1435116096.0, 2140000.0, 99.954204], [1435117824.0, 2160000.0, 99.953344], [1435119552.0, 2180000.0, 99.952476], [1435121280.0, 2200000.0, 99.9516], [1435123008.0, 2220000.0, 99.950716], [1435124736.0, 2240000.0, 99.949824], [1435126464.0, 2260000.0, 99.948924], [1435128192.0, 2280000.0, 99.948016], [1435129920.0, 2300000.0, 99.9471], [1435131648.0, 2320000.0, 99.946176], [1435133376.0, 2340000.0, 99.945244], [1435135104.0, 2360000.0, 99.944304], [1435136832.0, 2380000.0, 99.943356], [1435138560.0, 2400000.0, 99.9424], [1435140288.0, 2420000.0, 99.941436], [1435142016.0, 2440000.0, 99.940464], [1435143744.0, 2460000.0, 99.939484], [1435145472.0, 2480000.0, 99.938496], [1435147200.0, 2500000.0, 99.9375], [1435148928.0, 2520000.0, 99.936496], [1435150656.0, 2540000.0, 99.935484], [1435152384.0, 2560000.0, 99.934464], [1435154112.0, 2580000.0, 99.933436], [1435155840.0, 2600000.0, 99.9324], [1435157568.0, 2620000.0, 99.931356], [1435159296.0, 2640000.0, 99.930304], [1435161024.0, 2660000.0, 99.929244], [1435162752.0, 2680000.0, 99.928176], [1435164480.0, 2700000.0, 99.9271], [1435166208.0, 2720000.0, 99.926016], [1435167936.0, 2740000.0, 99.924924], [1435169664.0, 2760000.0, 99.923824], [1435171392.0, 2780000.0, 99.922716], [1435173120.0, 2800000.0, 99.9216], [1435174848.0, 2820000.0, 99.920476], [1435176576.0, 2840000.0, 99.919344], [1435178304.0, 2860000.0, 99.918204], [1435180032.0, 2880000.0, 99.917056], [1435181760.0, 2900000.0, 99.9159], [1435183488.0, 2920000.0, 99.914736], [1435185216.0, 2940000.0, 99.913564], [1435186944.0, 2960000.0, 99.912384], [1435188672.0, 2980000.0, 99.911196], [1435190400.0, 3000000.0, 99.91], [1435192128.0, 3020000.0, 99.908796], [1435193856.0, 3040000.0, 99.907584], [1435195584.0, 3060000.0, 99.906364], [1435197312.0, 3080000.0, 99.905136], [1435199040.0, 3100000.0, 99.9039], [1435200768.0, 3120000.0, 99.902656], [1435202496.0, 3140000.0, 99.901404], [1435204224.0, 3160000.0, 99.900144], [1435205952.0, 3180000.0, 99.898876], [1435207680.0, 3200000.0, 99.8976], [1435209408.0, 3220000.0, 99.896316], [1435211136.0, 3240000.0, 99.895024], [1435212864.0, 3260000.0, 99.893724], [1435214592.0, 3280000.0, 99.892416], [1435216320.0, 3300000.0, 99.8911], [1435218048.0, 3320000.0, 99.889776], [1435219776.0, 3340000.0, 99.888444], [1435221504.0, 3360000.0, 99.887104], [1435223232.0, 3380000.0, 99.885756], [1435224960.0, 3400000.0, 99.8844], [1435226688.0, 3420000.0, 99.883036], [1435228416.0, 3440000.0, 99.881664], [1435230144.0, 3460000.0, 99.880284], [1435231872.0, 3480000.0, 99.878896], [1435233600.0, 3500000.0, 99.8775], [1435235328.0, 3520000.0, 99.876096], [1435237056.0, 3540000.0, 99.874684], [1435238784.0, 3560000.0, 99.873264], [1435240512.0, 3580000.0, 99.871836], [1435242240.0, 3600000.0, 99.8704], [1435243968.0, 3620000.0, 99.868956], [1435245696.0, 3640000.0, 99.867504], [1435247424.0, 3660000.0, 99.866044], [1435249152.0, 3680000.0, 99.864576], [1435250880.0, 3700000.0, 99.8631], [1435252608.0, 3720000.0, 99.861616], [1435254336.0, 3740000.0, 99.860124], [1435256064.0, 3760000.0, 99.858624], [1435257792.0, 3780000.0, 99.857116], [1435259520.0, 
3800000.0, 99.8556], [1435261248.0, 3820000.0, 99.854076], [1435262976.0, 3840000.0, 99.852544], [1435264704.0, 3860000.0, 99.851004], [1435266432.0, 3880000.0, 99.849456], [1435268160.0, 3900000.0, 99.8479], [1435269888.0, 3920000.0, 99.846336], [1435271616.0, 3940000.0, 99.844764], [1435273344.0, 3960000.0, 99.843184], [1435275072.0, 3980000.0, 99.841596], [1435276800.0, 4000000.0, 99.84], [1435278528.0, 4020000.0, 99.838396], [1435280256.0, 4040000.0, 99.836784], [1435281984.0, 4060000.0, 99.835164], [1435283712.0, 4080000.0, 99.833536], [1435285440.0, 4100000.0, 99.8319], [1435287168.0, 4120000.0, 99.830256], [1435288896.0, 4140000.0, 99.828604], [1435290624.0, 4160000.0, 99.826944], [1435292352.0, 4180000.0, 99.825276], [1435294080.0, 4200000.0, 99.8236], [1435295808.0, 4220000.0, 99.821916], [1435297536.0, 4240000.0, 99.820224], [1435299264.0, 4260000.0, 99.818524], [1435300992.0, 4280000.0, 99.816816], [1435302720.0, 4300000.0, 99.8151], [1435304448.0, 4320000.0, 99.813376], [1435306176.0, 4340000.0, 99.811644], [1435307904.0, 4360000.0, 99.809904], [1435309632.0, 4380000.0, 99.808156], [1435311360.0, 4400000.0, 99.8064], [1435313088.0, 4420000.0, 99.804636], [1435314816.0, 4440000.0, 99.802864], [1435316544.0, 4460000.0, 99.801084], [1435318272.0, 4480000.0, 99.799296], [1435320000.0, 4500000.0, 99.7975], [1435321728.0, 4520000.0, 99.795696], [1435323456.0, 4540000.0, 99.793884], [1435325184.0, 4560000.0, 99.792064], [1435326912.0, 4580000.0, 99.790236], [1435328640.0, 4600000.0, 99.7884], [1435330368.0, 4620000.0, 99.786556], [1435332096.0, 4640000.0, 99.784704], [1435333824.0, 4660000.0, 99.782844], [1435335552.0, 4680000.0, 99.780976], [1435337280.0, 4700000.0, 99.7791], [1435339008.0, 4720000.0, 99.777216], [1435340736.0, 4740000.0, 99.775324], [1435342464.0, 4760000.0, 99.773424], [1435344192.0, 4780000.0, 99.771516], [1435345920.0, 4800000.0, 99.7696], [1435347648.0, 4820000.0, 99.767676], [1435349376.0, 4840000.0, 99.765744], [1435351104.0, 4860000.0, 99.763804], [1435352832.0, 4880000.0, 99.761856], [1435354560.0, 4900000.0, 99.7599], [1435356288.0, 4920000.0, 99.757936], [1435358016.0, 4940000.0, 99.755964], [1435359744.0, 4960000.0, 99.753984], [1435361472.0, 4980000.0, 99.751996], [1435363200.0, 5000000.0, 99.75], [1435364928.0, 5020000.0, 99.747996], [1435366656.0, 5040000.0, 99.74598399999999], [1435368384.0, 5060000.0, 99.743964], [1435370112.0, 5080000.0, 99.741936], [1435371840.0, 5100000.0, 99.7399], [1435373568.0, 5120000.0, 99.737856], [1435375296.0, 5140000.0, 99.735804], [1435377024.0, 5160000.0, 99.733744], [1435378752.0, 5180000.0, 99.731676], [1435380480.0, 5200000.0, 99.7296], [1435382208.0, 5220000.0, 99.727516], [1435383936.0, 5240000.0, 99.725424], [1435385664.0, 5260000.0, 99.723324], [1435387392.0, 5280000.0, 99.721216], [1435389120.0, 5300000.0, 99.7191], [1435390848.0, 5320000.0, 99.716976], [1435392576.0, 5340000.0, 99.714844], [1435394304.0, 5360000.0, 99.712704], [1435396032.0, 5380000.0, 99.710556], [1435397760.0, 5400000.0, 99.7084], [1435399488.0, 5420000.0, 99.706236], [1435401216.0, 5440000.0, 99.704064], [1435402944.0, 5460000.0, 99.701884], [1435404672.0, 5480000.0, 99.699696], [1435406400.0, 5500000.0, 99.6975], [1435408128.0, 5520000.0, 99.695296], [1435409856.0, 5540000.0, 99.693084], [1435411584.0, 5560000.0, 99.690864], [1435413312.0, 5580000.0, 99.688636], [1435415040.0, 5600000.0, 99.6864], [1435416768.0, 5620000.0, 99.684156], [1435418496.0, 5640000.0, 99.681904], [1435420224.0, 5660000.0, 99.679644], [1435421952.0, 5680000.0, 
99.677376], [1435423680.0, 5700000.0, 99.6751], [1435425408.0, 5720000.0, 99.672816], [1435427136.0, 5740000.0, 99.670524], [1435428864.0, 5760000.0, 99.668224], [1435430592.0, 5780000.0, 99.665916], [1435432320.0, 5800000.0, 99.6636], [1435434048.0, 5820000.0, 99.661276], [1435435776.0, 5840000.0, 99.658944], [1435437504.0, 5860000.0, 99.656604], [1435439232.0, 5880000.0, 99.654256], [1435440960.0, 5900000.0, 99.6519], [1435442688.0, 5920000.0, 99.649536], [1435444416.0, 5940000.0, 99.647164], [1435446144.0, 5960000.0, 99.644784], [1435447872.0, 5980000.0, 99.642396], [1435449600.0, 6000000.0, 99.64], [1435451328.0, 6020000.0, 99.637596], [1435453056.0, 6040000.0, 99.635184], [1435454784.0, 6060000.0, 99.632764], [1435456512.0, 6080000.0, 99.630336], [1435458240.0, 6100000.0, 99.6279], [1435459968.0, 6120000.0, 99.625456], [1435461696.0, 6140000.0, 99.623004], [1435463424.0, 6160000.0, 99.620544], [1435465152.0, 6180000.0, 99.618076], [1435466880.0, 6200000.0, 99.6156], [1435468608.0, 6220000.0, 99.613116], [1435470336.0, 6240000.0, 99.610624], [1435472064.0, 6260000.0, 99.608124], [1435473792.0, 6280000.0, 99.605616], [1435475520.0, 6300000.0, 99.6031], [1435477248.0, 6320000.0, 99.600576], [1435478976.0, 6340000.0, 99.598044], [1435480704.0, 6360000.0, 99.595504], [1435482432.0, 6380000.0, 99.592956], [1435484160.0, 6400000.0, 99.5904], [1435485888.0, 6420000.0, 99.587836], [1435487616.0, 6440000.0, 99.585264], [1435489344.0, 6460000.0, 99.582684], [1435491072.0, 6480000.0, 99.580096], [1435492800.0, 6500000.0, 99.5775], [1435494528.0, 6520000.0, 99.574896], [1435496256.0, 6540000.0, 99.572284], [1435497984.0, 6560000.0, 99.569664], [1435499712.0, 6580000.0, 99.567036], [1435501440.0, 6600000.0, 99.5644], [1435503168.0, 6620000.0, 99.561756], [1435504896.0, 6640000.0, 99.559104], [1435506624.0, 6660000.0, 99.556444], [1435508352.0, 6680000.0, 99.553776], [1435510080.0, 6700000.0, 99.5511], [1435511808.0, 6720000.0, 99.548416], [1435513536.0, 6740000.0, 99.545724], [1435515264.0, 6760000.0, 99.543024], [1435516992.0, 6780000.0, 99.540316], [1435518720.0, 6800000.0, 99.5376], [1435520448.0, 6820000.0, 99.534876], [1435522176.0, 6840000.0, 99.532144], [1435523904.0, 6860000.0, 99.529404], [1435525632.0, 6880000.0, 99.526656], [1435527360.0, 6900000.0, 99.5239], [1435529088.0, 6920000.0, 99.521136], [1435530816.0, 6940000.0, 99.518364], [1435532544.0, 6960000.0, 99.515584], [1435534272.0, 6980000.0, 99.512796], [1435536000.0, 7000000.0, 99.51], [1435537728.0, 7020000.0, 99.507196], [1435539456.0, 7040000.0, 99.504384], [1435541184.0, 7060000.0, 99.501564], [1435542912.0, 7080000.0, 99.498736], [1435544640.0, 7100000.0, 99.4959], [1435546368.0, 7120000.0, 99.493056], [1435548096.0, 7140000.0, 99.490204], [1435549824.0, 7160000.0, 99.487344], [1435551552.0, 7180000.0, 99.484476], [1435553280.0, 7200000.0, 99.4816], [1435555008.0, 7220000.0, 99.478716], [1435556736.0, 7240000.0, 99.475824], [1435558464.0, 7260000.0, 99.472924], [1435560192.0, 7280000.0, 99.470016], [1435561920.0, 7300000.0, 99.4671], [1435563648.0, 7320000.0, 99.464176], [1435565376.0, 7340000.0, 99.461244], [1435567104.0, 7360000.0, 99.458304], [1435568832.0, 7380000.0, 99.455356], [1435570560.0, 7400000.0, 99.4524], [1435572288.0, 7420000.0, 99.449436], [1435574016.0, 7440000.0, 99.446464], [1435575744.0, 7460000.0, 99.443484], [1435577472.0, 7480000.0, 99.440496], [1435579200.0, 7500000.0, 99.4375], [1435580928.0, 7520000.0, 99.434496], [1435582656.0, 7540000.0, 99.431484], [1435584384.0, 7560000.0, 99.428464], 
[1435586112.0, 7580000.0, 99.425436], [1435587840.0, 7600000.0, 99.4224], [1435589568.0, 7620000.0, 99.419356], [1435591296.0, 7640000.0, 99.416304], [1435593024.0, 7660000.0, 99.413244], [1435594752.0, 7680000.0, 99.410176], [1435596480.0, 7700000.0, 99.4071], [1435598208.0, 7720000.0, 99.404016], [1435599936.0, 7740000.0, 99.400924], [1435601664.0, 7760000.0, 99.397824], [1435603392.0, 7780000.0, 99.394716], [1435605120.0, 7800000.0, 99.3916], [1435606848.0, 7820000.0, 99.388476], [1435608576.0, 7840000.0, 99.385344], [1435610304.0, 7860000.0, 99.382204], [1435612032.0, 7880000.0, 99.379056], [1435613760.0, 7900000.0, 99.3759], [1435615488.0, 7920000.0, 99.372736], [1435617216.0, 7940000.0, 99.369564], [1435618944.0, 7960000.0, 99.366384], [1435620672.0, 7980000.0, 99.363196], [1435622400.0, 8000000.0, 99.36], [1435624128.0, 8020000.0, 99.356796], [1435625856.0, 8040000.0, 99.353584], [1435627584.0, 8060000.0, 99.350364], [1435629312.0, 8080000.0, 99.347136], [1435631040.0, 8100000.0, 99.3439], [1435632768.0, 8120000.0, 99.340656], [1435634496.0, 8140000.0, 99.337404], [1435636224.0, 8160000.0, 99.334144], [1435637952.0, 8180000.0, 99.330876], [1435639680.0, 8200000.0, 99.3276], [1435641408.0, 8220000.0, 99.324316], [1435643136.0, 8240000.0, 99.321024], [1435644864.0, 8260000.0, 99.317724], [1435646592.0, 8280000.0, 99.314416], [1435648320.0, 8300000.0, 99.3111], [1435650048.0, 8320000.0, 99.307776], [1435651776.0, 8340000.0, 99.304444], [1435653504.0, 8360000.0, 99.301104], [1435655232.0, 8380000.0, 99.297756], [1435656960.0, 8400000.0, 99.2944], [1435658688.0, 8420000.0, 99.291036], [1435660416.0, 8440000.0, 99.287664], [1435662144.0, 8460000.0, 99.284284], [1435663872.0, 8480000.0, 99.280896], [1435665600.0, 8500000.0, 99.2775], [1435667328.0, 8520000.0, 99.274096], [1435669056.0, 8540000.0, 99.270684], [1435670784.0, 8560000.0, 99.267264], [1435672512.0, 8580000.0, 99.263836], [1435674240.0, 8600000.0, 99.2604], [1435675968.0, 8620000.0, 99.256956], [1435677696.0, 8640000.0, 99.253504], [1435679424.0, 8660000.0, 99.250044], [1435681152.0, 8680000.0, 99.246576], [1435682880.0, 8700000.0, 99.2431], [1435684608.0, 8720000.0, 99.239616], [1435686336.0, 8740000.0, 99.236124], [1435688064.0, 8760000.0, 99.232624], [1435689792.0, 8780000.0, 99.229116], [1435691520.0, 8800000.0, 99.2256], [1435693248.0, 8820000.0, 99.222076], [1435694976.0, 8840000.0, 99.218544], [1435696704.0, 8860000.0, 99.215004], [1435698432.0, 8880000.0, 99.211456], [1435700160.0, 8900000.0, 99.2079], [1435701888.0, 8920000.0, 99.204336], [1435703616.0, 8940000.0, 99.200764], [1435705344.0, 8960000.0, 99.197184], [1435707072.0, 8980000.0, 99.193596], [1435708800.0, 9000000.0, 99.19], [1435710528.0, 9020000.0, 99.186396], [1435712256.0, 9040000.0, 99.182784], [1435713984.0, 9060000.0, 99.179164], [1435715712.0, 9080000.0, 99.175536], [1435717440.0, 9100000.0, 99.1719], [1435719168.0, 9120000.0, 99.168256], [1435720896.0, 9140000.0, 99.164604], [1435722624.0, 9160000.0, 99.160944], [1435724352.0, 9180000.0, 99.157276], [1435726080.0, 9200000.0, 99.1536], [1435727808.0, 9220000.0, 99.149916], [1435729536.0, 9240000.0, 99.146224], [1435731264.0, 9260000.0, 99.142524], [1435732992.0, 9280000.0, 99.138816], [1435734720.0, 9300000.0, 99.1351], [1435736448.0, 9320000.0, 99.131376], [1435738176.0, 9340000.0, 99.127644], [1435739904.0, 9360000.0, 99.123904], [1435741632.0, 9380000.0, 99.120156], [1435743360.0, 9400000.0, 99.1164], [1435745088.0, 9420000.0, 99.112636], [1435746816.0, 9440000.0, 99.108864], [1435748544.0, 
9460000.0, 99.105084], [1435750272.0, 9480000.0, 99.101296], [1435752000.0, 9500000.0, 99.0975], [1435753728.0, 9520000.0, 99.093696], [1435755456.0, 9540000.0, 99.089884], [1435757184.0, 9560000.0, 99.086064], [1435758912.0, 9580000.0, 99.082236], [1435760640.0, 9600000.0, 99.0784], [1435762368.0, 9620000.0, 99.074556], [1435764096.0, 9640000.0, 99.070704], [1435765824.0, 9660000.0, 99.066844], [1435767552.0, 9680000.0, 99.062976], [1435769280.0, 9700000.0, 99.0591], [1435771008.0, 9720000.0, 99.055216], [1435772736.0, 9740000.0, 99.051324], [1435774464.0, 9760000.0, 99.047424], [1435776192.0, 9780000.0, 99.043516], [1435777920.0, 9800000.0, 99.03960000000001], [1435779648.0, 9820000.0, 99.035676], [1435781376.0, 9840000.0, 99.031744], [1435783104.0, 9860000.0, 99.027804], [1435784832.0, 9880000.0, 99.023856], [1435786560.0, 9900000.0, 99.0199], [1435788288.0, 9920000.0, 99.015936], [1435790016.0, 9940000.0, 99.011964], [1435791744.0, 9960000.0, 99.007984], [1435793472.0, 9980000.0, 99.003996], [1435795200.0, 10000000.0, 99.0], [1435796928.0, 10020000.0, 98.995996], [1435798656.0, 10040000.0, 98.991984], [1435800384.0, 10060000.0, 98.987964], [1435802112.0, 10080000.0, 98.983936], [1435803840.0, 10100000.0, 98.9799], [1435805568.0, 10120000.0, 98.975856], [1435807296.0, 10140000.0, 98.971804], [1435809024.0, 10160000.0, 98.967744], [1435810752.0, 10180000.0, 98.963676], [1435812480.0, 10200000.0, 98.9596], [1435814208.0, 10220000.0, 98.955516], [1435815936.0, 10240000.0, 98.951424], [1435817664.0, 10260000.0, 98.947324], [1435819392.0, 10280000.0, 98.943216], [1435821120.0, 10300000.0, 98.9391], [1435822848.0, 10320000.0, 98.934976], [1435824576.0, 10340000.0, 98.930844], [1435826304.0, 10360000.0, 98.926704], [1435828032.0, 10380000.0, 98.922556], [1435829760.0, 10400000.0, 98.9184], [1435831488.0, 10420000.0, 98.914236], [1435833216.0, 10440000.0, 98.910064], [1435834944.0, 10460000.0, 98.905884], [1435836672.0, 10480000.0, 98.901696], [1435838400.0, 10500000.0, 98.8975], [1435840128.0, 10520000.0, 98.893296], [1435841856.0, 10540000.0, 98.889084], [1435843584.0, 10560000.0, 98.884864], [1435845312.0, 10580000.0, 98.880636], [1435847040.0, 10600000.0, 98.8764], [1435848768.0, 10620000.0, 98.872156], [1435850496.0, 10640000.0, 98.867904], [1435852224.0, 10660000.0, 98.863644], [1435853952.0, 10680000.0, 98.859376], [1435855680.0, 10700000.0, 98.8551], [1435857408.0, 10720000.0, 98.850816], [1435859136.0, 10740000.0, 98.846524], [1435860864.0, 10760000.0, 98.842224], [1435862592.0, 10780000.0, 98.837916], [1435864320.0, 10800000.0, 98.8336], [1435866048.0, 10820000.0, 98.829276], [1435867776.0, 10840000.0, 98.824944], [1435869504.0, 10860000.0, 98.820604], [1435871232.0, 10880000.0, 98.816256], [1435872960.0, 10900000.0, 98.8119], [1435874688.0, 10920000.0, 98.807536], [1435876416.0, 10940000.0, 98.803164], [1435878144.0, 10960000.0, 98.798784], [1435879872.0, 10980000.0, 98.794396], [1435881600.0, 11000000.0, 98.79], [1435883328.0, 11020000.0, 98.785596], [1435885056.0, 11040000.0, 98.781184], [1435886784.0, 11060000.0, 98.776764], [1435888512.0, 11080000.0, 98.772336], [1435890240.0, 11100000.0, 98.7679], [1435891968.0, 11120000.0, 98.763456], [1435893696.0, 11140000.0, 98.759004], [1435895424.0, 11160000.0, 98.754544], [1435897152.0, 11180000.0, 98.75007599999999], [1435898880.0, 11200000.0, 98.7456], [1435900608.0, 11220000.0, 98.741116], [1435902336.0, 11240000.0, 98.736624], [1435904064.0, 11260000.0, 98.732124], [1435905792.0, 11280000.0, 98.727616], [1435907520.0, 11300000.0, 
98.7231], [1435909248.0, 11320000.0, 98.718576], [1435910976.0, 11340000.0, 98.714044], [1435912704.0, 11360000.0, 98.709504], [1435914432.0, 11380000.0, 98.704956], [1435916160.0, 11400000.0, 98.7004], [1435917888.0, 11420000.0, 98.695836], [1435919616.0, 11440000.0, 98.691264], [1435921344.0, 11460000.0, 98.686684], [1435923072.0, 11480000.0, 98.682096], [1435924800.0, 11500000.0, 98.6775], [1435926528.0, 11520000.0, 98.672896], [1435928256.0, 11540000.0, 98.668284], [1435929984.0, 11560000.0, 98.663664], [1435931712.0, 11580000.0, 98.659036], [1435933440.0, 11600000.0, 98.6544], [1435935168.0, 11620000.0, 98.649756], [1435936896.0, 11640000.0, 98.645104], [1435938624.0, 11660000.0, 98.640444], [1435940352.0, 11680000.0, 98.635776], [1435942080.0, 11700000.0, 98.6311], [1435943808.0, 11720000.0, 98.626416], [1435945536.0, 11740000.0, 98.621724], [1435947264.0, 11760000.0, 98.617024], [1435948992.0, 11780000.0, 98.612316], [1435950720.0, 11800000.0, 98.6076], [1435952448.0, 11820000.0, 98.602876], [1435954176.0, 11840000.0, 98.598144], [1435955904.0, 11860000.0, 98.593404], [1435957632.0, 11880000.0, 98.588656], [1435959360.0, 11900000.0, 98.5839], [1435961088.0, 11920000.0, 98.579136], [1435962816.0, 11940000.0, 98.574364], [1435964544.0, 11960000.0, 98.569584], [1435966272.0, 11980000.0, 98.564796], [1435968000.0, 12000000.0, 98.56], [1435969728.0, 12020000.0, 98.555196], [1435971456.0, 12040000.0, 98.550384], [1435973184.0, 12060000.0, 98.545564], [1435974912.0, 12080000.0, 98.540736], [1435976640.0, 12100000.0, 98.5359], [1435978368.0, 12120000.0, 98.531056], [1435980096.0, 12140000.0, 98.526204], [1435981824.0, 12160000.0, 98.521344], [1435983552.0, 12180000.0, 98.516476], [1435985280.0, 12200000.0, 98.5116], [1435987008.0, 12220000.0, 98.506716], [1435988736.0, 12240000.0, 98.501824], [1435990464.0, 12260000.0, 98.496924], [1435992192.0, 12280000.0, 98.492016], [1435993920.0, 12300000.0, 98.4871], [1435995648.0, 12320000.0, 98.482176], [1435997376.0, 12340000.0, 98.477244], [1435999104.0, 12360000.0, 98.472304], [1436000832.0, 12380000.0, 98.467356], [1436002560.0, 12400000.0, 98.4624], [1436004288.0, 12420000.0, 98.457436], [1436006016.0, 12440000.0, 98.452464], [1436007744.0, 12460000.0, 98.447484], [1436009472.0, 12480000.0, 98.442496], [1436011200.0, 12500000.0, 98.4375], [1436012928.0, 12520000.0, 98.432496], [1436014656.0, 12540000.0, 98.427484], [1436016384.0, 12560000.0, 98.422464], [1436018112.0, 12580000.0, 98.417436], [1436019840.0, 12600000.0, 98.4124], [1436021568.0, 12620000.0, 98.407356], [1436023296.0, 12640000.0, 98.402304], [1436025024.0, 12660000.0, 98.397244], [1436026752.0, 12680000.0, 98.392176], [1436028480.0, 12700000.0, 98.3871], [1436030208.0, 12720000.0, 98.382016], [1436031936.0, 12740000.0, 98.376924], [1436033664.0, 12760000.0, 98.371824], [1436035392.0, 12780000.0, 98.366716], [1436037120.0, 12800000.0, 98.3616], [1436038848.0, 12820000.0, 98.356476], [1436040576.0, 12840000.0, 98.351344], [1436042304.0, 12860000.0, 98.346204], [1436044032.0, 12880000.0, 98.341056], [1436045760.0, 12900000.0, 98.3359], [1436047488.0, 12920000.0, 98.330736], [1436049216.0, 12940000.0, 98.325564], [1436050944.0, 12960000.0, 98.320384], [1436052672.0, 12980000.0, 98.315196], [1436054400.0, 13000000.0, 98.31], [1436056128.0, 13020000.0, 98.304796], [1436057856.0, 13040000.0, 98.299584], [1436059584.0, 13060000.0, 98.294364], [1436061312.0, 13080000.0, 98.289136], [1436063040.0, 13100000.0, 98.2839], [1436064768.0, 13120000.0, 98.278656], [1436066496.0, 13140000.0, 
98.273404], [1436068224.0, 13160000.0, 98.268144], [1436069952.0, 13180000.0, 98.262876], [1436071680.0, 13200000.0, 98.2576], [1436073408.0, 13220000.0, 98.252316], [1436075136.0, 13240000.0, 98.247024], [1436076864.0, 13260000.0, 98.241724], [1436078592.0, 13280000.0, 98.236416], [1436080320.0, 13300000.0, 98.2311], [1436082048.0, 13320000.0, 98.225776], [1436083776.0, 13340000.0, 98.220444], [1436085504.0, 13360000.0, 98.215104], [1436087232.0, 13380000.0, 98.209756], [1436088960.0, 13400000.0, 98.20439999999999], [1436090688.0, 13420000.0, 98.199036], [1436092416.0, 13440000.0, 98.193664], [1436094144.0, 13460000.0, 98.188284], [1436095872.0, 13480000.0, 98.182896], [1436097600.0, 13500000.0, 98.1775], [1436099328.0, 13520000.0, 98.172096], [1436101056.0, 13540000.0, 98.166684], [1436102784.0, 13560000.0, 98.161264], [1436104512.0, 13580000.0, 98.155836], [1436106240.0, 13600000.0, 98.1504], [1436107968.0, 13620000.0, 98.144956], [1436109696.0, 13640000.0, 98.139504], [1436111424.0, 13660000.0, 98.134044], [1436113152.0, 13680000.0, 98.128576], [1436114880.0, 13700000.0, 98.1231], [1436116608.0, 13720000.0, 98.117616], [1436118336.0, 13740000.0, 98.112124], [1436120064.0, 13760000.0, 98.106624], [1436121792.0, 13780000.0, 98.101116], [1436123520.0, 13800000.0, 98.0956], [1436125248.0, 13820000.0, 98.090076], [1436126976.0, 13840000.0, 98.084544], [1436128704.0, 13860000.0, 98.079004], [1436130432.0, 13880000.0, 98.073456], [1436132160.0, 13900000.0, 98.0679], [1436133888.0, 13920000.0, 98.062336], [1436135616.0, 13940000.0, 98.056764], [1436137344.0, 13960000.0, 98.051184], [1436139072.0, 13980000.0, 98.045596], [1436140800.0, 14000000.0, 98.04], [1436142528.0, 14020000.0, 98.034396], [1436144256.0, 14040000.0, 98.028784], [1436145984.0, 14060000.0, 98.023164], [1436147712.0, 14080000.0, 98.017536], [1436149440.0, 14100000.0, 98.0119], [1436151168.0, 14120000.0, 98.006256], [1436152896.0, 14140000.0, 98.000604], [1436154624.0, 14160000.0, 97.994944], [1436156352.0, 14180000.0, 97.989276], [1436158080.0, 14200000.0, 97.9836], [1436159808.0, 14220000.0, 97.977916], [1436161536.0, 14240000.0, 97.972224], [1436163264.0, 14260000.0, 97.966524], [1436164992.0, 14280000.0, 97.960816], [1436166720.0, 14300000.0, 97.9551], [1436168448.0, 14320000.0, 97.949376], [1436170176.0, 14340000.0, 97.943644], [1436171904.0, 14360000.0, 97.937904], [1436173632.0, 14380000.0, 97.932156], [1436175360.0, 14400000.0, 97.9264], [1436177088.0, 14420000.0, 97.920636], [1436178816.0, 14440000.0, 97.914864], [1436180544.0, 14460000.0, 97.90908400000001], [1436182272.0, 14480000.0, 97.903296], [1436184000.0, 14500000.0, 97.8975], [1436185728.0, 14520000.0, 97.891696], [1436187456.0, 14540000.0, 97.885884], [1436189184.0, 14560000.0, 97.880064], [1436190912.0, 14580000.0, 97.874236], [1436192640.0, 14600000.0, 97.8684], [1436194368.0, 14620000.0, 97.862556], [1436196096.0, 14640000.0, 97.856704], [1436197824.0, 14660000.0, 97.850844], [1436199552.0, 14680000.0, 97.844976], [1436201280.0, 14700000.0, 97.8391], [1436203008.0, 14720000.0, 97.833216], [1436204736.0, 14740000.0, 97.827324], [1436206464.0, 14760000.0, 97.821424], [1436208192.0, 14780000.0, 97.815516], [1436209920.0, 14800000.0, 97.8096], [1436211648.0, 14820000.0, 97.803676], [1436213376.0, 14840000.0, 97.797744], [1436215104.0, 14860000.0, 97.791804], [1436216832.0, 14880000.0, 97.785856], [1436218560.0, 14900000.0, 97.7799], [1436220288.0, 14920000.0, 97.773936], [1436222016.0, 14940000.0, 97.767964], [1436223744.0, 14960000.0, 97.761984], 
[1436225472.0, 14980000.0, 97.755996], [1436227200.0, 15000000.0, 97.75], [1436228928.0, 15020000.0, 97.743996], [1436230656.0, 15040000.0, 97.737984], [1436232384.0, 15060000.0, 97.731964], [1436234112.0, 15080000.0, 97.725936], [1436235840.0, 15100000.0, 97.7199], [1436237568.0, 15120000.0, 97.71385599999999], [1436239296.0, 15140000.0, 97.707804], [1436241024.0, 15160000.0, 97.701744], [1436242752.0, 15180000.0, 97.695676], [1436244480.0, 15200000.0, 97.6896], [1436246208.0, 15220000.0, 97.683516], [1436247936.0, 15240000.0, 97.677424], [1436249664.0, 15260000.0, 97.671324], [1436251392.0, 15280000.0, 97.665216], [1436253120.0, 15300000.0, 97.6591], [1436254848.0, 15320000.0, 97.652976], [1436256576.0, 15340000.0, 97.646844], [1436258304.0, 15360000.0, 97.640704], [1436260032.0, 15380000.0, 97.634556], [1436261760.0, 15400000.0, 97.6284], [1436263488.0, 15420000.0, 97.622236], [1436265216.0, 15440000.0, 97.616064], [1436266944.0, 15460000.0, 97.609884], [1436268672.0, 15480000.0, 97.603696], [1436270400.0, 15500000.0, 97.5975], [1436272128.0, 15520000.0, 97.591296], [1436273856.0, 15540000.0, 97.585084], [1436275584.0, 15560000.0, 97.578864], [1436277312.0, 15580000.0, 97.572636], [1436279040.0, 15600000.0, 97.5664], [1436280768.0, 15620000.0, 97.560156], [1436282496.0, 15640000.0, 97.553904], [1436284224.0, 15660000.0, 97.547644], [1436285952.0, 15680000.0, 97.541376], [1436287680.0, 15700000.0, 97.5351], [1436289408.0, 15720000.0, 97.528816], [1436291136.0, 15740000.0, 97.522524], [1436292864.0, 15760000.0, 97.516224], [1436294592.0, 15780000.0, 97.509916], [1436296320.0, 15800000.0, 97.5036], [1436298048.0, 15820000.0, 97.497276], [1436299776.0, 15840000.0, 97.490944], [1436301504.0, 15860000.0, 97.484604], [1436303232.0, 15880000.0, 97.478256], [1436304960.0, 15900000.0, 97.4719], [1436306688.0, 15920000.0, 97.465536], [1436308416.0, 15940000.0, 97.459164], [1436310144.0, 15960000.0, 97.452784], [1436311872.0, 15980000.0, 97.446396], [1436313600.0, 16000000.0, 97.44], [1436315328.0, 16020000.0, 97.433596], [1436317056.0, 16040000.0, 97.427184], [1436318784.0, 16060000.0, 97.420764], [1436320512.0, 16080000.0, 97.414336], [1436322240.0, 16100000.0, 97.4079], [1436323968.0, 16120000.0, 97.401456], [1436325696.0, 16140000.0, 97.395004], [1436327424.0, 16160000.0, 97.388544], [1436329152.0, 16180000.0, 97.382076], [1436330880.0, 16200000.0, 97.3756], [1436332608.0, 16220000.0, 97.369116], [1436334336.0, 16240000.0, 97.362624], [1436336064.0, 16260000.0, 97.356124], [1436337792.0, 16280000.0, 97.349616], [1436339520.0, 16300000.0, 97.3431], [1436341248.0, 16320000.0, 97.336576], [1436342976.0, 16340000.0, 97.330044], [1436344704.0, 16360000.0, 97.323504], [1436346432.0, 16380000.0, 97.316956], [1436348160.0, 16400000.0, 97.3104], [1436349888.0, 16420000.0, 97.303836], [1436351616.0, 16440000.0, 97.297264], [1436353344.0, 16460000.0, 97.290684], [1436355072.0, 16480000.0, 97.284096], [1436356800.0, 16500000.0, 97.2775], [1436358528.0, 16520000.0, 97.270896], [1436360256.0, 16540000.0, 97.264284], [1436361984.0, 16560000.0, 97.257664], [1436363712.0, 16580000.0, 97.251036], [1436365440.0, 16600000.0, 97.2444], [1436367168.0, 16620000.0, 97.237756], [1436368896.0, 16640000.0, 97.231104], [1436370624.0, 16660000.0, 97.224444], [1436372352.0, 16680000.0, 97.217776], [1436374080.0, 16700000.0, 97.2111], [1436375808.0, 16720000.0, 97.204416], [1436377536.0, 16740000.0, 97.197724], [1436379264.0, 16760000.0, 97.191024], [1436380992.0, 16780000.0, 97.184316], [1436382720.0, 16800000.0, 97.1776], 
[1436384448.0, 16820000.0, 97.170876], [1436386176.0, 16840000.0, 97.164144], [1436387904.0, 16860000.0, 97.157404], [1436389632.0, 16880000.0, 97.150656], [1436391360.0, 16900000.0, 97.1439], [1436393088.0, 16920000.0, 97.137136], [1436394816.0, 16940000.0, 97.130364], [1436396544.0, 16960000.0, 97.123584], [1436398272.0, 16980000.0, 97.116796], [1436400000.0, 17000000.0, 97.11], [1436401728.0, 17020000.0, 97.103196], [1436403456.0, 17040000.0, 97.096384], [1436405184.0, 17060000.0, 97.089564], [1436406912.0, 17080000.0, 97.082736], [1436408640.0, 17100000.0, 97.0759], [1436410368.0, 17120000.0, 97.069056], [1436412096.0, 17140000.0, 97.062204], [1436413824.0, 17160000.0, 97.055344], [1436415552.0, 17180000.0, 97.048476], [1436417280.0, 17200000.0, 97.0416], [1436419008.0, 17220000.0, 97.034716], [1436420736.0, 17240000.0, 97.027824], [1436422464.0, 17260000.0, 97.020924], [1436424192.0, 17280000.0, 97.014016], [1436425920.0, 17300000.0, 97.0071], [1436427648.0, 17320000.0, 97.000176], [1436429376.0, 17340000.0, 96.993244], [1436431104.0, 17360000.0, 96.986304], [1436432832.0, 17380000.0, 96.979356], [1436434560.0, 17400000.0, 96.9724], [1436436288.0, 17420000.0, 96.965436], [1436438016.0, 17440000.0, 96.958464], [1436439744.0, 17460000.0, 96.951484], [1436441472.0, 17480000.0, 96.944496], [1436443200.0, 17500000.0, 96.9375], [1436444928.0, 17520000.0, 96.930496], [1436446656.0, 17540000.0, 96.923484], [1436448384.0, 17560000.0, 96.916464], [1436450112.0, 17580000.0, 96.909436], [1436451840.0, 17600000.0, 96.9024], [1436453568.0, 17620000.0, 96.895356], [1436455296.0, 17640000.0, 96.888304], [1436457024.0, 17660000.0, 96.881244], [1436458752.0, 17680000.0, 96.874176], [1436460480.0, 17700000.0, 96.8671], [1436462208.0, 17720000.0, 96.860016], [1436463936.0, 17740000.0, 96.852924], [1436465664.0, 17760000.0, 96.845824], [1436467392.0, 17780000.0, 96.838716], [1436469120.0, 17800000.0, 96.8316], [1436470848.0, 17820000.0, 96.824476], [1436472576.0, 17840000.0, 96.817344], [1436474304.0, 17860000.0, 96.810204], [1436476032.0, 17880000.0, 96.803056], [1436477760.0, 17900000.0, 96.7959], [1436479488.0, 17920000.0, 96.788736], [1436481216.0, 17940000.0, 96.781564], [1436482944.0, 17960000.0, 96.774384], [1436484672.0, 17980000.0, 96.767196], [1436486400.0, 18000000.0, 96.76], [1436488128.0, 18020000.0, 96.752796], [1436489856.0, 18040000.0, 96.745584], [1436491584.0, 18060000.0, 96.738364], [1436493312.0, 18080000.0, 96.731136], [1436495040.0, 18100000.0, 96.7239], [1436496768.0, 18120000.0, 96.716656], [1436498496.0, 18140000.0, 96.709404], [1436500224.0, 18160000.0, 96.702144], [1436501952.0, 18180000.0, 96.694876], [1436503680.0, 18200000.0, 96.6876], [1436505408.0, 18220000.0, 96.680316], [1436507136.0, 18240000.0, 96.673024], [1436508864.0, 18260000.0, 96.665724], [1436510592.0, 18280000.0, 96.658416], [1436512320.0, 18300000.0, 96.6511], [1436514048.0, 18320000.0, 96.643776], [1436515776.0, 18340000.0, 96.636444], [1436517504.0, 18360000.0, 96.629104], [1436519232.0, 18380000.0, 96.621756], [1436520960.0, 18400000.0, 96.6144], [1436522688.0, 18420000.0, 96.607036], [1436524416.0, 18440000.0, 96.599664], [1436526144.0, 18460000.0, 96.592284], [1436527872.0, 18480000.0, 96.584896], [1436529600.0, 18500000.0, 96.5775], [1436531328.0, 18520000.0, 96.570096], [1436533056.0, 18540000.0, 96.562684], [1436534784.0, 18560000.0, 96.555264], [1436536512.0, 18580000.0, 96.547836], [1436538240.0, 18600000.0, 96.5404], [1436539968.0, 18620000.0, 96.532956], [1436541696.0, 18640000.0, 96.525504], 
[1436543424.0, 18660000.0, 96.518044], [1436545152.0, 18680000.0, 96.510576], [1436546880.0, 18700000.0, 96.5031], [1436548608.0, 18720000.0, 96.495616], [1436550336.0, 18740000.0, 96.488124], [1436552064.0, 18760000.0, 96.480624], [1436553792.0, 18780000.0, 96.473116], [1436555520.0, 18800000.0, 96.4656], [1436557248.0, 18820000.0, 96.458076], [1436558976.0, 18840000.0, 96.450544], [1436560704.0, 18860000.0, 96.443004], [1436562432.0, 18880000.0, 96.435456], [1436564160.0, 18900000.0, 96.4279], [1436565888.0, 18920000.0, 96.420336], [1436567616.0, 18940000.0, 96.412764], [1436569344.0, 18960000.0, 96.405184], [1436571072.0, 18980000.0, 96.39759600000001], [1436572800.0, 19000000.0, 96.39], [1436574528.0, 19020000.0, 96.382396], [1436576256.0, 19040000.0, 96.374784], [1436577984.0, 19060000.0, 96.367164], [1436579712.0, 19080000.0, 96.359536], [1436581440.0, 19100000.0, 96.3519], [1436583168.0, 19120000.0, 96.344256], [1436584896.0, 19140000.0, 96.336604], [1436586624.0, 19160000.0, 96.328944], [1436588352.0, 19180000.0, 96.321276], [1436590080.0, 19200000.0, 96.3136], [1436591808.0, 19220000.0, 96.305916], [1436593536.0, 19240000.0, 96.298224], [1436595264.0, 19260000.0, 96.290524], [1436596992.0, 19280000.0, 96.282816], [1436598720.0, 19300000.0, 96.2751], [1436600448.0, 19320000.0, 96.267376], [1436602176.0, 19340000.0, 96.259644], [1436603904.0, 19360000.0, 96.251904], [1436605632.0, 19380000.0, 96.244156], [1436607360.0, 19400000.0, 96.2364], [1436609088.0, 19420000.0, 96.228636], [1436610816.0, 19440000.0, 96.220864], [1436612544.0, 19460000.0, 96.213084], [1436614272.0, 19480000.0, 96.205296], [1436616000.0, 19500000.0, 96.1975], [1436617728.0, 19520000.0, 96.189696], [1436619456.0, 19540000.0, 96.181884], [1436621184.0, 19560000.0, 96.174064], [1436622912.0, 19580000.0, 96.166236], [1436624640.0, 19600000.0, 96.1584], [1436626368.0, 19620000.0, 96.150556], [1436628096.0, 19640000.0, 96.142704], [1436629824.0, 19660000.0, 96.134844], [1436631552.0, 19680000.0, 96.126976], [1436633280.0, 19700000.0, 96.1191], [1436635008.0, 19720000.0, 96.111216], [1436636736.0, 19740000.0, 96.103324], [1436638464.0, 19760000.0, 96.095424], [1436640192.0, 19780000.0, 96.087516], [1436641920.0, 19800000.0, 96.0796], [1436643648.0, 19820000.0, 96.071676], [1436645376.0, 19840000.0, 96.063744], [1436647104.0, 19860000.0, 96.055804], [1436648832.0, 19880000.0, 96.047856], [1436650560.0, 19900000.0, 96.0399], [1436652288.0, 19920000.0, 96.031936], [1436654016.0, 19940000.0, 96.023964], [1436655744.0, 19960000.0, 96.015984], [1436657472.0, 19980000.0, 96.007996], [1436659200.0, 20000000.0, 96.0], [1436660928.0, 20020000.0, 95.991996], [1436662656.0, 20040000.0, 95.983984], [1436664384.0, 20060000.0, 95.975964], [1436666112.0, 20080000.0, 95.967936], [1436667840.0, 20100000.0, 95.9599], [1436669568.0, 20120000.0, 95.951856], [1436671296.0, 20140000.0, 95.943804], [1436673024.0, 20160000.0, 95.935744], [1436674752.0, 20180000.0, 95.927676], [1436676480.0, 20200000.0, 95.9196], [1436678208.0, 20220000.0, 95.911516], [1436679936.0, 20240000.0, 95.903424], [1436681664.0, 20260000.0, 95.895324], [1436683392.0, 20280000.0, 95.887216], [1436685120.0, 20300000.0, 95.8791], [1436686848.0, 20320000.0, 95.870976], [1436688576.0, 20340000.0, 95.862844], [1436690304.0, 20360000.0, 95.854704], [1436692032.0, 20380000.0, 95.846556], [1436693760.0, 20400000.0, 95.83840000000001], [1436695488.0, 20420000.0, 95.830236], [1436697216.0, 20440000.0, 95.822064], [1436698944.0, 20460000.0, 95.813884], [1436700672.0, 20480000.0, 
95.805696], [1436702400.0, 20500000.0, 95.7975], [1436704128.0, 20520000.0, 95.78929600000001], [1436705856.0, 20540000.0, 95.781084], [1436707584.0, 20560000.0, 95.772864], [1436709312.0, 20580000.0, 95.764636], [1436711040.0, 20600000.0, 95.7564], [1436712768.0, 20620000.0, 95.748156], [1436714496.0, 20640000.0, 95.739904], [1436716224.0, 20660000.0, 95.731644], [1436717952.0, 20680000.0, 95.723376], [1436719680.0, 20700000.0, 95.7151], [1436721408.0, 20720000.0, 95.706816], [1436723136.0, 20740000.0, 95.698524], [1436724864.0, 20760000.0, 95.690224], [1436726592.0, 20780000.0, 95.681916], [1436728320.0, 20800000.0, 95.6736], [1436730048.0, 20820000.0, 95.665276], [1436731776.0, 20840000.0, 95.656944], [1436733504.0, 20860000.0, 95.648604], [1436735232.0, 20880000.0, 95.640256], [1436736960.0, 20900000.0, 95.6319], [1436738688.0, 20920000.0, 95.623536], [1436740416.0, 20940000.0, 95.61516400000001], [1436742144.0, 20960000.0, 95.606784], [1436743872.0, 20980000.0, 95.598396], [1436745600.0, 21000000.0, 95.59], [1436747328.0, 21020000.0, 95.581596], [1436749056.0, 21040000.0, 95.573184], [1436750784.0, 21060000.0, 95.564764], [1436752512.0, 21080000.0, 95.556336], [1436754240.0, 21100000.0, 95.5479], [1436755968.0, 21120000.0, 95.539456], [1436757696.0, 21140000.0, 95.531004], [1436759424.0, 21160000.0, 95.522544], [1436761152.0, 21180000.0, 95.514076], [1436762880.0, 21200000.0, 95.5056], [1436764608.0, 21220000.0, 95.497116], [1436766336.0, 21240000.0, 95.488624], [1436768064.0, 21260000.0, 95.480124], [1436769792.0, 21280000.0, 95.471616], [1436771520.0, 21300000.0, 95.4631], [1436773248.0, 21320000.0, 95.454576], [1436774976.0, 21340000.0, 95.446044], [1436776704.0, 21360000.0, 95.437504], [1436778432.0, 21380000.0, 95.428956], [1436780160.0, 21400000.0, 95.4204], [1436781888.0, 21420000.0, 95.411836], [1436783616.0, 21440000.0, 95.403264], [1436785344.0, 21460000.0, 95.394684], [1436787072.0, 21480000.0, 95.386096], [1436788800.0, 21500000.0, 95.3775], [1436790528.0, 21520000.0, 95.368896], [1436792256.0, 21540000.0, 95.36028400000001], [1436793984.0, 21560000.0, 95.351664], [1436795712.0, 21580000.0, 95.343036], [1436797440.0, 21600000.0, 95.3344], [1436799168.0, 21620000.0, 95.325756], [1436800896.0, 21640000.0, 95.317104], [1436802624.0, 21660000.0, 95.308444], [1436804352.0, 21680000.0, 95.299776], [1436806080.0, 21700000.0, 95.2911], [1436807808.0, 21720000.0, 95.282416], [1436809536.0, 21740000.0, 95.273724], [1436811264.0, 21760000.0, 95.265024], [1436812992.0, 21780000.0, 95.256316], [1436814720.0, 21800000.0, 95.2476], [1436816448.0, 21820000.0, 95.238876], [1436818176.0, 21840000.0, 95.230144], [1436819904.0, 21860000.0, 95.221404], [1436821632.0, 21880000.0, 95.212656], [1436823360.0, 21900000.0, 95.2039], [1436825088.0, 21920000.0, 95.195136], [1436826816.0, 21940000.0, 95.186364], [1436828544.0, 21960000.0, 95.177584], [1436830272.0, 21980000.0, 95.168796], [1436832000.0, 22000000.0, 95.16], [1436833728.0, 22020000.0, 95.151196], [1436835456.0, 22040000.0, 95.14238399999999], [1436837184.0, 22060000.0, 95.133564], [1436838912.0, 22080000.0, 95.124736], [1436840640.0, 22100000.0, 95.1159], [1436842368.0, 22120000.0, 95.107056], [1436844096.0, 22140000.0, 95.098204], [1436845824.0, 22160000.0, 95.089344], [1436847552.0, 22180000.0, 95.080476], [1436849280.0, 22200000.0, 95.0716], [1436851008.0, 22220000.0, 95.062716], [1436852736.0, 22240000.0, 95.05382399999999], [1436854464.0, 22260000.0, 95.044924], [1436856192.0, 22280000.0, 95.036016], [1436857920.0, 22300000.0, 
95.0271], [1436859648.0, 22320000.0, 95.018176], [1436861376.0, 22340000.0, 95.009244], [1436863104.0, 22360000.0, 95.000304], [1436864832.0, 22380000.0, 94.991356], [1436866560.0, 22400000.0, 94.9824], [1436868288.0, 22420000.0, 94.973436], [1436870016.0, 22440000.0, 94.96446399999999], [1436871744.0, 22460000.0, 94.955484], [1436873472.0, 22480000.0, 94.946496], [1436875200.0, 22500000.0, 94.9375], [1436876928.0, 22520000.0, 94.928496], [1436878656.0, 22540000.0, 94.919484], [1436880384.0, 22560000.0, 94.910464], [1436882112.0, 22580000.0, 94.901436], [1436883840.0, 22600000.0, 94.8924], [1436885568.0, 22620000.0, 94.883356], [1436887296.0, 22640000.0, 94.874304], [1436889024.0, 22660000.0, 94.865244], [1436890752.0, 22680000.0, 94.856176], [1436892480.0, 22700000.0, 94.8471], [1436894208.0, 22720000.0, 94.838016], [1436895936.0, 22740000.0, 94.828924], [1436897664.0, 22760000.0, 94.819824], [1436899392.0, 22780000.0, 94.810716], [1436901120.0, 22800000.0, 94.80160000000001], [1436902848.0, 22820000.0, 94.792476], [1436904576.0, 22840000.0, 94.783344], [1436906304.0, 22860000.0, 94.774204], [1436908032.0, 22880000.0, 94.765056], [1436909760.0, 22900000.0, 94.7559], [1436911488.0, 22920000.0, 94.746736], [1436913216.0, 22940000.0, 94.737564], [1436914944.0, 22960000.0, 94.728384], [1436916672.0, 22980000.0, 94.719196], [1436918400.0, 23000000.0, 94.71000000000001], [1436920128.0, 23020000.0, 94.700796], [1436921856.0, 23040000.0, 94.691584], [1436923584.0, 23060000.0, 94.682364], [1436925312.0, 23080000.0, 94.673136], [1436927040.0, 23100000.0, 94.6639], [1436928768.0, 23120000.0, 94.654656], [1436930496.0, 23140000.0, 94.645404], [1436932224.0, 23160000.0, 94.636144], [1436933952.0, 23180000.0, 94.626876], [1436935680.0, 23200000.0, 94.6176], [1436937408.0, 23220000.0, 94.608316], [1436939136.0, 23240000.0, 94.599024], [1436940864.0, 23260000.0, 94.589724], [1436942592.0, 23280000.0, 94.580416], [1436944320.0, 23300000.0, 94.5711], [1436946048.0, 23320000.0, 94.561776], [1436947776.0, 23340000.0, 94.552444], [1436949504.0, 23360000.0, 94.543104], [1436951232.0, 23380000.0, 94.533756], [1436952960.0, 23400000.0, 94.5244], [1436954688.0, 23420000.0, 94.515036], [1436956416.0, 23440000.0, 94.505664], [1436958144.0, 23460000.0, 94.496284], [1436959872.0, 23480000.0, 94.486896], [1436961600.0, 23500000.0, 94.47749999999999], [1436963328.0, 23520000.0, 94.468096], [1436965056.0, 23540000.0, 94.458684], [1436966784.0, 23560000.0, 94.449264], [1436968512.0, 23580000.0, 94.439836], [1436970240.0, 23600000.0, 94.4304], [1436971968.0, 23620000.0, 94.420956], [1436973696.0, 23640000.0, 94.41150400000001], [1436975424.0, 23660000.0, 94.402044], [1436977152.0, 23680000.0, 94.392576], [1436978880.0, 23700000.0, 94.3831], [1436980608.0, 23720000.0, 94.373616], [1436982336.0, 23740000.0, 94.364124], [1436984064.0, 23760000.0, 94.354624], [1436985792.0, 23780000.0, 94.345116], [1436987520.0, 23800000.0, 94.3356], [1436989248.0, 23820000.0, 94.326076], [1436990976.0, 23840000.0, 94.316544], [1436992704.0, 23860000.0, 94.307004], [1436994432.0, 23880000.0, 94.297456], [1436996160.0, 23900000.0, 94.2879], [1436997888.0, 23920000.0, 94.278336], [1436999616.0, 23940000.0, 94.268764], [1437001344.0, 23960000.0, 94.259184], [1437003072.0, 23980000.0, 94.249596], [1437004800.0, 24000000.0, 94.24], [1437006528.0, 24020000.0, 94.230396], [1437008256.0, 24040000.0, 94.220784], [1437009984.0, 24060000.0, 94.211164], [1437011712.0, 24080000.0, 94.201536], [1437013440.0, 24100000.0, 94.1919], [1437015168.0, 24120000.0, 
94.182256], [1437016896.0, 24140000.0, 94.17260399999999], [1437018624.0, 24160000.0, 94.162944], [1437020352.0, 24180000.0, 94.153276], [1437022080.0, 24200000.0, 94.1436], [1437023808.0, 24220000.0, 94.133916], [1437025536.0, 24240000.0, 94.124224], [1437027264.0, 24260000.0, 94.114524], [1437028992.0, 24280000.0, 94.104816], [1437030720.0, 24300000.0, 94.0951], [1437032448.0, 24320000.0, 94.085376], [1437034176.0, 24340000.0, 94.075644], [1437035904.0, 24360000.0, 94.065904], [1437037632.0, 24380000.0, 94.056156], [1437039360.0, 24400000.0, 94.0464], [1437041088.0, 24420000.0, 94.036636], [1437042816.0, 24440000.0, 94.026864], [1437044544.0, 24460000.0, 94.017084], [1437046272.0, 24480000.0, 94.007296], [1437048000.0, 24500000.0, 93.9975], [1437049728.0, 24520000.0, 93.987696], [1437051456.0, 24540000.0, 93.977884], [1437053184.0, 24560000.0, 93.968064], [1437054912.0, 24580000.0, 93.958236], [1437056640.0, 24600000.0, 93.9484], [1437058368.0, 24620000.0, 93.938556], [1437060096.0, 24640000.0, 93.928704], [1437061824.0, 24660000.0, 93.91884399999999], [1437063552.0, 24680000.0, 93.908976], [1437065280.0, 24700000.0, 93.8991], [1437067008.0, 24720000.0, 93.889216], [1437068736.0, 24740000.0, 93.879324], [1437070464.0, 24760000.0, 93.869424], [1437072192.0, 24780000.0, 93.859516], [1437073920.0, 24800000.0, 93.8496], [1437075648.0, 24820000.0, 93.839676], [1437077376.0, 24840000.0, 93.829744], [1437079104.0, 24860000.0, 93.819804], [1437080832.0, 24880000.0, 93.809856], [1437082560.0, 24900000.0, 93.7999], [1437084288.0, 24920000.0, 93.789936], [1437086016.0, 24940000.0, 93.77996399999999], [1437087744.0, 24960000.0, 93.769984], [1437089472.0, 24980000.0, 93.759996], [1437091200.0, 25000000.0, 93.75], [1437092928.0, 25020000.0, 93.739996], [1437094656.0, 25040000.0, 93.729984], [1437096384.0, 25060000.0, 93.719964], [1437098112.0, 25080000.0, 93.709936], [1437099840.0, 25100000.0, 93.6999], [1437101568.0, 25120000.0, 93.689856], [1437103296.0, 25140000.0, 93.679804], [1437105024.0, 25160000.0, 93.669744], [1437106752.0, 25180000.0, 93.659676], [1437108480.0, 25200000.0, 93.64959999999999], [1437110208.0, 25220000.0, 93.639516], [1437111936.0, 25240000.0, 93.629424], [1437113664.0, 25260000.0, 93.619324], [1437115392.0, 25280000.0, 93.609216], [1437117120.0, 25300000.0, 93.5991], [1437118848.0, 25320000.0, 93.588976], [1437120576.0, 25340000.0, 93.578844], [1437122304.0, 25360000.0, 93.568704], [1437124032.0, 25380000.0, 93.558556], [1437125760.0, 25400000.0, 93.5484], [1437127488.0, 25420000.0, 93.538236], [1437129216.0, 25440000.0, 93.528064], [1437130944.0, 25460000.0, 93.517884], [1437132672.0, 25480000.0, 93.507696], [1437134400.0, 25500000.0, 93.4975], [1437136128.0, 25520000.0, 93.487296], [1437137856.0, 25540000.0, 93.477084], [1437139584.0, 25560000.0, 93.466864], [1437141312.0, 25580000.0, 93.456636], [1437143040.0, 25600000.0, 93.4464], [1437144768.0, 25620000.0, 93.436156], [1437146496.0, 25640000.0, 93.425904], [1437148224.0, 25660000.0, 93.415644], [1437149952.0, 25680000.0, 93.405376], [1437151680.0, 25700000.0, 93.3951], [1437153408.0, 25720000.0, 93.384816], [1437155136.0, 25740000.0, 93.37452400000001], [1437156864.0, 25760000.0, 93.364224], [1437158592.0, 25780000.0, 93.353916], [1437160320.0, 25800000.0, 93.3436], [1437162048.0, 25820000.0, 93.333276], [1437163776.0, 25840000.0, 93.322944], [1437165504.0, 25860000.0, 93.312604], [1437167232.0, 25880000.0, 93.302256], [1437168960.0, 25900000.0, 93.2919], [1437170688.0, 25920000.0, 93.281536], [1437172416.0, 25940000.0, 
93.271164], [1437174144.0, 25960000.0, 93.260784], [1437175872.0, 25980000.0, 93.250396], [1437177600.0, 26000000.0, 93.24], [1437179328.0, 26020000.0, 93.229596], [1437181056.0, 26040000.0, 93.219184], [1437182784.0, 26060000.0, 93.208764], [1437184512.0, 26080000.0, 93.198336], [1437186240.0, 26100000.0, 93.1879], [1437187968.0, 26120000.0, 93.177456], [1437189696.0, 26140000.0, 93.167004], [1437191424.0, 26160000.0, 93.156544], [1437193152.0, 26180000.0, 93.146076], [1437194880.0, 26200000.0, 93.1356], [1437196608.0, 26220000.0, 93.125116], [1437198336.0, 26240000.0, 93.11462399999999], [1437200064.0, 26260000.0, 93.104124], [1437201792.0, 26280000.0, 93.093616], [1437203520.0, 26300000.0, 93.0831], [1437205248.0, 26320000.0, 93.072576], [1437206976.0, 26340000.0, 93.062044], [1437208704.0, 26360000.0, 93.051504], [1437210432.0, 26380000.0, 93.040956], [1437212160.0, 26400000.0, 93.0304], [1437213888.0, 26420000.0, 93.019836], [1437215616.0, 26440000.0, 93.009264], [1437217344.0, 26460000.0, 92.998684], [1437219072.0, 26480000.0, 92.988096], [1437220800.0, 26500000.0, 92.9775], [1437222528.0, 26520000.0, 92.966896], [1437224256.0, 26540000.0, 92.956284], [1437225984.0, 26560000.0, 92.945664], [1437227712.0, 26580000.0, 92.935036], [1437229440.0, 26600000.0, 92.9244], [1437231168.0, 26620000.0, 92.913756], [1437232896.0, 26640000.0, 92.903104], [1437234624.0, 26660000.0, 92.892444], [1437236352.0, 26680000.0, 92.881776], [1437238080.0, 26700000.0, 92.8711], [1437239808.0, 26720000.0, 92.860416], [1437241536.0, 26740000.0, 92.849724], [1437243264.0, 26760000.0, 92.839024], [1437244992.0, 26780000.0, 92.828316], [1437246720.0, 26800000.0, 92.8176], [1437248448.0, 26820000.0, 92.806876], [1437250176.0, 26840000.0, 92.796144], [1437251904.0, 26860000.0, 92.785404], [1437253632.0, 26880000.0, 92.774656], [1437255360.0, 26900000.0, 92.7639], [1437257088.0, 26920000.0, 92.753136], [1437258816.0, 26940000.0, 92.742364], [1437260544.0, 26960000.0, 92.731584], [1437262272.0, 26980000.0, 92.720796], [1437264000.0, 27000000.0, 92.71], [1437265728.0, 27020000.0, 92.699196], [1437267456.0, 27040000.0, 92.688384], [1437269184.0, 27060000.0, 92.677564], [1437270912.0, 27080000.0, 92.666736], [1437272640.0, 27100000.0, 92.6559], [1437274368.0, 27120000.0, 92.645056], [1437276096.0, 27140000.0, 92.634204], [1437277824.0, 27160000.0, 92.623344], [1437279552.0, 27180000.0, 92.612476], [1437281280.0, 27200000.0, 92.6016], [1437283008.0, 27220000.0, 92.590716], [1437284736.0, 27240000.0, 92.579824], [1437286464.0, 27260000.0, 92.568924], [1437288192.0, 27280000.0, 92.558016], [1437289920.0, 27300000.0, 92.5471], [1437291648.0, 27320000.0, 92.536176], [1437293376.0, 27340000.0, 92.525244], [1437295104.0, 27360000.0, 92.514304], [1437296832.0, 27380000.0, 92.503356], [1437298560.0, 27400000.0, 92.4924], [1437300288.0, 27420000.0, 92.481436], [1437302016.0, 27440000.0, 92.47046399999999], [1437303744.0, 27460000.0, 92.459484], [1437305472.0, 27480000.0, 92.448496], [1437307200.0, 27500000.0, 92.4375], [1437308928.0, 27520000.0, 92.426496], [1437310656.0, 27540000.0, 92.415484], [1437312384.0, 27560000.0, 92.404464], [1437314112.0, 27580000.0, 92.393436], [1437315840.0, 27600000.0, 92.3824], [1437317568.0, 27620000.0, 92.371356], [1437319296.0, 27640000.0, 92.360304], [1437321024.0, 27660000.0, 92.349244], [1437322752.0, 27680000.0, 92.338176], [1437324480.0, 27700000.0, 92.3271], [1437326208.0, 27720000.0, 92.316016], [1437327936.0, 27740000.0, 92.304924], [1437329664.0, 27760000.0, 92.293824], [1437331392.0, 
27780000.0, 92.282716], [1437333120.0, 27800000.0, 92.2716], [1437334848.0, 27820000.0, 92.260476], [1437336576.0, 27840000.0, 92.24934400000001], [1437338304.0, 27860000.0, 92.238204], [1437340032.0, 27880000.0, 92.227056], [1437341760.0, 27900000.0, 92.2159], [1437343488.0, 27920000.0, 92.204736], [1437345216.0, 27940000.0, 92.193564], [1437346944.0, 27960000.0, 92.182384], [1437348672.0, 27980000.0, 92.171196], [1437350400.0, 28000000.0, 92.16], [1437352128.0, 28020000.0, 92.148796], [1437353856.0, 28040000.0, 92.137584], [1437355584.0, 28060000.0, 92.126364], [1437357312.0, 28080000.0, 92.115136], [1437359040.0, 28100000.0, 92.1039], [1437360768.0, 28120000.0, 92.092656], [1437362496.0, 28140000.0, 92.081404], [1437364224.0, 28160000.0, 92.070144], [1437365952.0, 28180000.0, 92.058876], [1437367680.0, 28200000.0, 92.0476], [1437369408.0, 28220000.0, 92.036316], [1437371136.0, 28240000.0, 92.025024], [1437372864.0, 28260000.0, 92.013724], [1437374592.0, 28280000.0, 92.002416], [1437376320.0, 28300000.0, 91.9911], [1437378048.0, 28320000.0, 91.979776], [1437379776.0, 28340000.0, 91.968444], [1437381504.0, 28360000.0, 91.957104], [1437383232.0, 28380000.0, 91.945756], [1437384960.0, 28400000.0, 91.9344], [1437386688.0, 28420000.0, 91.923036], [1437388416.0, 28440000.0, 91.911664], [1437390144.0, 28460000.0, 91.900284], [1437391872.0, 28480000.0, 91.888896], [1437393600.0, 28500000.0, 91.8775], [1437395328.0, 28520000.0, 91.866096], [1437397056.0, 28540000.0, 91.85468399999999], [1437398784.0, 28560000.0, 91.843264], [1437400512.0, 28580000.0, 91.831836], [1437402240.0, 28600000.0, 91.8204], [1437403968.0, 28620000.0, 91.808956], [1437405696.0, 28640000.0, 91.797504], [1437407424.0, 28660000.0, 91.786044], [1437409152.0, 28680000.0, 91.774576], [1437410880.0, 28700000.0, 91.7631], [1437412608.0, 28720000.0, 91.751616], [1437414336.0, 28740000.0, 91.740124], [1437416064.0, 28760000.0, 91.728624], [1437417792.0, 28780000.0, 91.717116], [1437419520.0, 28800000.0, 91.7056], [1437421248.0, 28820000.0, 91.694076], [1437422976.0, 28840000.0, 91.68254400000001], [1437424704.0, 28860000.0, 91.671004], [1437426432.0, 28880000.0, 91.659456], [1437428160.0, 28900000.0, 91.64789999999999], [1437429888.0, 28920000.0, 91.636336], [1437431616.0, 28940000.0, 91.624764], [1437433344.0, 28960000.0, 91.613184], [1437435072.0, 28980000.0, 91.601596], [1437436800.0, 29000000.0, 91.59], [1437438528.0, 29020000.0, 91.578396], [1437440256.0, 29040000.0, 91.566784], [1437441984.0, 29060000.0, 91.555164], [1437443712.0, 29080000.0, 91.543536], [1437445440.0, 29100000.0, 91.5319], [1437447168.0, 29120000.0, 91.520256], [1437448896.0, 29140000.0, 91.50860399999999], [1437450624.0, 29160000.0, 91.496944], [1437452352.0, 29180000.0, 91.485276], [1437454080.0, 29200000.0, 91.4736], [1437455808.0, 29220000.0, 91.461916], [1437457536.0, 29240000.0, 91.450224], [1437459264.0, 29260000.0, 91.438524], [1437460992.0, 29280000.0, 91.426816], [1437462720.0, 29300000.0, 91.4151], [1437464448.0, 29320000.0, 91.403376], [1437466176.0, 29340000.0, 91.391644], [1437467904.0, 29360000.0, 91.379904], [1437469632.0, 29380000.0, 91.368156], [1437471360.0, 29400000.0, 91.35640000000001], [1437473088.0, 29420000.0, 91.344636], [1437474816.0, 29440000.0, 91.332864], [1437476544.0, 29460000.0, 91.321084], [1437478272.0, 29480000.0, 91.309296], [1437480000.0, 29500000.0, 91.2975], [1437481728.0, 29520000.0, 91.285696], [1437483456.0, 29540000.0, 91.273884], [1437485184.0, 29560000.0, 91.262064], [1437486912.0, 29580000.0, 91.250236], 
[1437488640.0, 29600000.0, 91.2384], [1437490368.0, 29620000.0, 91.226556], [1437492096.0, 29640000.0, 91.214704], [1437493824.0, 29660000.0, 91.202844], [1437495552.0, 29680000.0, 91.190976], [1437497280.0, 29700000.0, 91.1791], [1437499008.0, 29720000.0, 91.167216], [1437500736.0, 29740000.0, 91.155324], [1437502464.0, 29760000.0, 91.143424], [1437504192.0, 29780000.0, 91.131516], [1437505920.0, 29800000.0, 91.1196], [1437507648.0, 29820000.0, 91.107676], [1437509376.0, 29840000.0, 91.095744], [1437511104.0, 29860000.0, 91.083804], [1437512832.0, 29880000.0, 91.071856], [1437514560.0, 29900000.0, 91.0599], [1437516288.0, 29920000.0, 91.04793599999999], [1437518016.0, 29940000.0, 91.03596399999999], [1437519744.0, 29960000.0, 91.023984], [1437521472.0, 29980000.0, 91.011996], [1437523200.0, 30000000.0, 91.0], [1437524928.0, 30020000.0, 90.987996], [1437526656.0, 30040000.0, 90.975984], [1437528384.0, 30060000.0, 90.963964], [1437530112.0, 30080000.0, 90.951936], [1437531840.0, 30100000.0, 90.9399], [1437533568.0, 30120000.0, 90.927856], [1437535296.0, 30140000.0, 90.91580400000001], [1437537024.0, 30160000.0, 90.903744], [1437538752.0, 30180000.0, 90.891676], [1437540480.0, 30200000.0, 90.8796], [1437542208.0, 30220000.0, 90.867516], [1437543936.0, 30240000.0, 90.855424], [1437545664.0, 30260000.0, 90.843324], [1437547392.0, 30280000.0, 90.831216], [1437549120.0, 30300000.0, 90.8191], [1437550848.0, 30320000.0, 90.806976], [1437552576.0, 30340000.0, 90.794844], [1437554304.0, 30360000.0, 90.782704], [1437556032.0, 30380000.0, 90.770556], [1437557760.0, 30400000.0, 90.7584], [1437559488.0, 30420000.0, 90.746236], [1437561216.0, 30440000.0, 90.734064], [1437562944.0, 30460000.0, 90.721884], [1437564672.0, 30480000.0, 90.709696], [1437566400.0, 30500000.0, 90.6975], [1437568128.0, 30520000.0, 90.685296], [1437569856.0, 30540000.0, 90.673084], [1437571584.0, 30560000.0, 90.660864], [1437573312.0, 30580000.0, 90.648636], [1437575040.0, 30600000.0, 90.6364], [1437576768.0, 30620000.0, 90.624156], [1437578496.0, 30640000.0, 90.611904], [1437580224.0, 30660000.0, 90.599644], [1437581952.0, 30680000.0, 90.587376], [1437583680.0, 30700000.0, 90.5751], [1437585408.0, 30720000.0, 90.562816], [1437587136.0, 30740000.0, 90.550524], [1437588864.0, 30760000.0, 90.538224], [1437590592.0, 30780000.0, 90.525916], [1437592320.0, 30800000.0, 90.5136], [1437594048.0, 30820000.0, 90.501276], [1437595776.0, 30840000.0, 90.488944], [1437597504.0, 30860000.0, 90.476604], [1437599232.0, 30880000.0, 90.464256], [1437600960.0, 30900000.0, 90.4519], [1437602688.0, 30920000.0, 90.439536], [1437604416.0, 30940000.0, 90.427164], [1437606144.0, 30960000.0, 90.414784], [1437607872.0, 30980000.0, 90.402396], [1437609600.0, 31000000.0, 90.39], [1437611328.0, 31020000.0, 90.377596], [1437613056.0, 31040000.0, 90.365184], [1437614784.0, 31060000.0, 90.35276400000001], [1437616512.0, 31080000.0, 90.340336], [1437618240.0, 31100000.0, 90.3279], [1437619968.0, 31120000.0, 90.315456], [1437621696.0, 31140000.0, 90.303004], [1437623424.0, 31160000.0, 90.290544], [1437625152.0, 31180000.0, 90.278076], [1437626880.0, 31200000.0, 90.2656], [1437628608.0, 31220000.0, 90.253116], [1437630336.0, 31240000.0, 90.240624], [1437632064.0, 31260000.0, 90.22812400000001], [1437633792.0, 31280000.0, 90.215616], [1437635520.0, 31300000.0, 90.2031], [1437637248.0, 31320000.0, 90.190576], [1437638976.0, 31340000.0, 90.178044], [1437640704.0, 31360000.0, 90.165504], [1437642432.0, 31380000.0, 90.152956], [1437644160.0, 31400000.0, 90.1404], 
[1437645888.0, 31420000.0, 90.127836], [1437647616.0, 31440000.0, 90.115264], [1437649344.0, 31460000.0, 90.102684], [1437651072.0, 31480000.0, 90.090096], [1437652800.0, 31500000.0, 90.0775], [1437654528.0, 31520000.0, 90.064896], [1437656256.0, 31540000.0, 90.052284], [1437657984.0, 31560000.0, 90.039664], [1437659712.0, 31580000.0, 90.027036], [1437661440.0, 31600000.0, 90.0144], [1437663168.0, 31620000.0, 90.001756], [1437664896.0, 31640000.0, 89.989104], [1437666624.0, 31660000.0, 89.976444], [1437668352.0, 31680000.0, 89.963776], [1437670080.0, 31700000.0, 89.9511], [1437671808.0, 31720000.0, 89.938416], [1437673536.0, 31740000.0, 89.925724], [1437675264.0, 31760000.0, 89.913024], [1437676992.0, 31780000.0, 89.900316], [1437678720.0, 31800000.0, 89.88759999999999], [1437680448.0, 31820000.0, 89.874876], [1437682176.0, 31840000.0, 89.862144], [1437683904.0, 31860000.0, 89.84940399999999], [1437685632.0, 31880000.0, 89.836656], [1437687360.0, 31900000.0, 89.8239], [1437689088.0, 31920000.0, 89.811136], [1437690816.0, 31940000.0, 89.798364], [1437692544.0, 31960000.0, 89.785584], [1437694272.0, 31980000.0, 89.772796], [1437696000.0, 32000000.0, 89.75999999999999], [1437697728.0, 32020000.0, 89.747196], [1437699456.0, 32040000.0, 89.734384], [1437701184.0, 32060000.0, 89.721564], [1437702912.0, 32080000.0, 89.708736], [1437704640.0, 32100000.0, 89.6959], [1437706368.0, 32120000.0, 89.683056], [1437708096.0, 32140000.0, 89.670204], [1437709824.0, 32160000.0, 89.657344], [1437711552.0, 32180000.0, 89.644476], [1437713280.0, 32200000.0, 89.63159999999999], [1437715008.0, 32220000.0, 89.618716], [1437716736.0, 32240000.0, 89.605824], [1437718464.0, 32260000.0, 89.592924], [1437720192.0, 32280000.0, 89.580016], [1437721920.0, 32300000.0, 89.5671], [1437723648.0, 32320000.0, 89.554176], [1437725376.0, 32340000.0, 89.541244], [1437727104.0, 32360000.0, 89.52830399999999], [1437728832.0, 32380000.0, 89.515356], [1437730560.0, 32400000.0, 89.5024], [1437732288.0, 32420000.0, 89.489436], [1437734016.0, 32440000.0, 89.47646399999999], [1437735744.0, 32460000.0, 89.463484], [1437737472.0, 32480000.0, 89.450496], [1437739200.0, 32500000.0, 89.4375], [1437740928.0, 32520000.0, 89.424496], [1437742656.0, 32540000.0, 89.411484], [1437744384.0, 32560000.0, 89.398464], [1437746112.0, 32580000.0, 89.385436], [1437747840.0, 32600000.0, 89.3724], [1437749568.0, 32620000.0, 89.359356], [1437751296.0, 32640000.0, 89.346304], [1437753024.0, 32660000.0, 89.33324400000001], [1437754752.0, 32680000.0, 89.320176], [1437756480.0, 32700000.0, 89.3071], [1437758208.0, 32720000.0, 89.294016], [1437759936.0, 32740000.0, 89.280924], [1437761664.0, 32760000.0, 89.267824], [1437763392.0, 32780000.0, 89.254716], [1437765120.0, 32800000.0, 89.2416], [1437766848.0, 32820000.0, 89.228476], [1437768576.0, 32840000.0, 89.215344], [1437770304.0, 32860000.0, 89.202204], [1437772032.0, 32880000.0, 89.189056], [1437773760.0, 32900000.0, 89.1759], [1437775488.0, 32920000.0, 89.162736], [1437777216.0, 32940000.0, 89.149564], [1437778944.0, 32960000.0, 89.136384], [1437780672.0, 32980000.0, 89.12319600000001], [1437782400.0, 33000000.0, 89.11], [1437784128.0, 33020000.0, 89.096796], [1437785856.0, 33040000.0, 89.083584], [1437787584.0, 33060000.0, 89.070364], [1437789312.0, 33080000.0, 89.057136], [1437791040.0, 33100000.0, 89.0439], [1437792768.0, 33120000.0, 89.030656], [1437794496.0, 33140000.0, 89.017404], [1437796224.0, 33160000.0, 89.004144], [1437797952.0, 33180000.0, 88.990876], [1437799680.0, 33200000.0, 88.9776], 
[1437801408.0, 33220000.0, 88.964316], [1437803136.0, 33240000.0, 88.951024], [1437804864.0, 33260000.0, 88.937724], [1437806592.0, 33280000.0, 88.92441600000001], [1437808320.0, 33300000.0, 88.9111], [1437810048.0, 33320000.0, 88.897776], [1437811776.0, 33340000.0, 88.884444], [1437813504.0, 33360000.0, 88.871104], [1437815232.0, 33380000.0, 88.857756], [1437816960.0, 33400000.0, 88.84440000000001], [1437818688.0, 33420000.0, 88.831036], [1437820416.0, 33440000.0, 88.81766400000001], [1437822144.0, 33460000.0, 88.804284], [1437823872.0, 33480000.0, 88.790896], [1437825600.0, 33500000.0, 88.7775], [1437827328.0, 33520000.0, 88.764096], [1437829056.0, 33540000.0, 88.750684], [1437830784.0, 33560000.0, 88.737264], [1437832512.0, 33580000.0, 88.723836], [1437834240.0, 33600000.0, 88.7104], [1437835968.0, 33620000.0, 88.696956], [1437837696.0, 33640000.0, 88.683504], [1437839424.0, 33660000.0, 88.670044], [1437841152.0, 33680000.0, 88.656576], [1437842880.0, 33700000.0, 88.6431], [1437844608.0, 33720000.0, 88.629616], [1437846336.0, 33740000.0, 88.616124], [1437848064.0, 33760000.0, 88.602624], [1437849792.0, 33780000.0, 88.589116], [1437851520.0, 33800000.0, 88.57560000000001], [1437853248.0, 33820000.0, 88.562076], [1437854976.0, 33840000.0, 88.548544], [1437856704.0, 33860000.0, 88.535004], [1437858432.0, 33880000.0, 88.521456], [1437860160.0, 33900000.0, 88.5079], [1437861888.0, 33920000.0, 88.494336], [1437863616.0, 33940000.0, 88.480764], [1437865344.0, 33960000.0, 88.467184], [1437867072.0, 33980000.0, 88.453596], [1437868800.0, 34000000.0, 88.44], [1437870528.0, 34020000.0, 88.426396], [1437872256.0, 34040000.0, 88.412784], [1437873984.0, 34060000.0, 88.399164], [1437875712.0, 34080000.0, 88.385536], [1437877440.0, 34100000.0, 88.3719], [1437879168.0, 34120000.0, 88.358256], [1437880896.0, 34140000.0, 88.344604], [1437882624.0, 34160000.0, 88.330944], [1437884352.0, 34180000.0, 88.31727599999999], [1437886080.0, 34200000.0, 88.3036], [1437887808.0, 34220000.0, 88.289916], [1437889536.0, 34240000.0, 88.276224], [1437891264.0, 34260000.0, 88.262524], [1437892992.0, 34280000.0, 88.248816], [1437894720.0, 34300000.0, 88.2351], [1437896448.0, 34320000.0, 88.221376], [1437898176.0, 34340000.0, 88.207644], [1437899904.0, 34360000.0, 88.193904], [1437901632.0, 34380000.0, 88.180156], [1437903360.0, 34400000.0, 88.1664], [1437905088.0, 34420000.0, 88.152636], [1437906816.0, 34440000.0, 88.138864], [1437908544.0, 34460000.0, 88.125084], [1437910272.0, 34480000.0, 88.111296], [1437912000.0, 34500000.0, 88.0975], [1437913728.0, 34520000.0, 88.083696], [1437915456.0, 34540000.0, 88.069884], [1437917184.0, 34560000.0, 88.056064], [1437918912.0, 34580000.0, 88.042236], [1437920640.0, 34600000.0, 88.0284], [1437922368.0, 34620000.0, 88.014556], [1437924096.0, 34640000.0, 88.000704], [1437925824.0, 34660000.0, 87.98684399999999], [1437927552.0, 34680000.0, 87.972976], [1437929280.0, 34700000.0, 87.9591], [1437931008.0, 34720000.0, 87.945216], [1437932736.0, 34740000.0, 87.931324], [1437934464.0, 34760000.0, 87.917424], [1437936192.0, 34780000.0, 87.903516], [1437937920.0, 34800000.0, 87.8896], [1437939648.0, 34820000.0, 87.875676], [1437941376.0, 34840000.0, 87.861744], [1437943104.0, 34860000.0, 87.847804], [1437944832.0, 34880000.0, 87.833856], [1437946560.0, 34900000.0, 87.8199], [1437948288.0, 34920000.0, 87.805936], [1437950016.0, 34940000.0, 87.791964], [1437951744.0, 34960000.0, 87.777984], [1437953472.0, 34980000.0, 87.76399599999999], [1437955200.0, 35000000.0, 87.75], [1437956928.0, 
35020000.0, 87.735996], [1437958656.0, 35040000.0, 87.721984], [1437960384.0, 35060000.0, 87.707964], [1437962112.0, 35080000.0, 87.69393600000001], [1437963840.0, 35100000.0, 87.6799], [1437965568.0, 35120000.0, 87.665856], [1437967296.0, 35140000.0, 87.651804], [1437969024.0, 35160000.0, 87.637744], [1437970752.0, 35180000.0, 87.623676], [1437972480.0, 35200000.0, 87.6096], [1437974208.0, 35220000.0, 87.595516], [1437975936.0, 35240000.0, 87.581424], [1437977664.0, 35260000.0, 87.567324], [1437979392.0, 35280000.0, 87.55321599999999], [1437981120.0, 35300000.0, 87.5391], [1437982848.0, 35320000.0, 87.524976], [1437984576.0, 35340000.0, 87.510844], [1437986304.0, 35360000.0, 87.496704], [1437988032.0, 35380000.0, 87.482556], [1437989760.0, 35400000.0, 87.4684], [1437991488.0, 35420000.0, 87.45423600000001], [1437993216.0, 35440000.0, 87.440064], [1437994944.0, 35460000.0, 87.425884], [1437996672.0, 35480000.0, 87.411696], [1437998400.0, 35500000.0, 87.39750000000001], [1438000128.0, 35520000.0, 87.383296], [1438001856.0, 35540000.0, 87.369084], [1438003584.0, 35560000.0, 87.35486399999999], [1438005312.0, 35580000.0, 87.340636], [1438007040.0, 35600000.0, 87.3264], [1438008768.0, 35620000.0, 87.312156], [1438010496.0, 35640000.0, 87.297904], [1438012224.0, 35660000.0, 87.283644], [1438013952.0, 35680000.0, 87.269376], [1438015680.0, 35700000.0, 87.2551], [1438017408.0, 35720000.0, 87.240816], [1438019136.0, 35740000.0, 87.226524], [1438020864.0, 35760000.0, 87.21222399999999], [1438022592.0, 35780000.0, 87.197916], [1438024320.0, 35800000.0, 87.1836], [1438026048.0, 35820000.0, 87.169276], [1438027776.0, 35840000.0, 87.154944], [1438029504.0, 35860000.0, 87.140604], [1438031232.0, 35880000.0, 87.126256], [1438032960.0, 35900000.0, 87.1119], [1438034688.0, 35920000.0, 87.097536], [1438036416.0, 35940000.0, 87.083164], [1438038144.0, 35960000.0, 87.068784], [1438039872.0, 35980000.0, 87.054396], [1438041600.0, 36000000.0, 87.03999999999999], [1438043328.0, 36020000.0, 87.02559600000001], [1438045056.0, 36040000.0, 87.011184], [1438046784.0, 36060000.0, 86.996764], [1438048512.0, 36080000.0, 86.982336], [1438050240.0, 36100000.0, 86.9679], [1438051968.0, 36120000.0, 86.953456], [1438053696.0, 36140000.0, 86.939004], [1438055424.0, 36160000.0, 86.924544], [1438057152.0, 36180000.0, 86.910076], [1438058880.0, 36200000.0, 86.8956], [1438060608.0, 36220000.0, 86.881116], [1438062336.0, 36240000.0, 86.866624], [1438064064.0, 36260000.0, 86.852124], [1438065792.0, 36280000.0, 86.837616], [1438067520.0, 36300000.0, 86.8231], [1438069248.0, 36320000.0, 86.808576], [1438070976.0, 36340000.0, 86.794044], [1438072704.0, 36360000.0, 86.779504], [1438074432.0, 36380000.0, 86.764956], [1438076160.0, 36400000.0, 86.7504], [1438077888.0, 36420000.0, 86.735836], [1438079616.0, 36440000.0, 86.721264], [1438081344.0, 36460000.0, 86.706684], [1438083072.0, 36480000.0, 86.69209599999999], [1438084800.0, 36500000.0, 86.6775], [1438086528.0, 36520000.0, 86.662896], [1438088256.0, 36540000.0, 86.648284], [1438089984.0, 36560000.0, 86.633664], [1438091712.0, 36580000.0, 86.619036], [1438093440.0, 36600000.0, 86.6044], [1438095168.0, 36620000.0, 86.589756], [1438096896.0, 36640000.0, 86.575104], [1438098624.0, 36660000.0, 86.560444], [1438100352.0, 36680000.0, 86.545776], [1438102080.0, 36700000.0, 86.5311], [1438103808.0, 36720000.0, 86.51641599999999], [1438105536.0, 36740000.0, 86.501724], [1438107264.0, 36760000.0, 86.48702399999999], [1438108992.0, 36780000.0, 86.472316], [1438110720.0, 36800000.0, 86.4576], 
[1438112448.0, 36820000.0, 86.442876], [1438114176.0, 36840000.0, 86.428144], [1438115904.0, 36860000.0, 86.413404], [1438117632.0, 36880000.0, 86.398656], [1438119360.0, 36900000.0, 86.3839], [1438121088.0, 36920000.0, 86.369136], [1438122816.0, 36940000.0, 86.354364], [1438124544.0, 36960000.0, 86.339584], [1438126272.0, 36980000.0, 86.324796], [1438128000.0, 37000000.0, 86.31], [1438129728.0, 37020000.0, 86.295196], [1438131456.0, 37040000.0, 86.280384], [1438133184.0, 37060000.0, 86.265564], [1438134912.0, 37080000.0, 86.250736], [1438136640.0, 37100000.0, 86.2359], [1438138368.0, 37120000.0, 86.221056], [1438140096.0, 37140000.0, 86.206204], [1438141824.0, 37160000.0, 86.191344], [1438143552.0, 37180000.0, 86.17647600000001], [1438145280.0, 37200000.0, 86.16159999999999], [1438147008.0, 37220000.0, 86.146716], [1438148736.0, 37240000.0, 86.131824], [1438150464.0, 37260000.0, 86.116924], [1438152192.0, 37280000.0, 86.10201599999999], [1438153920.0, 37300000.0, 86.08709999999999], [1438155648.0, 37320000.0, 86.072176], [1438157376.0, 37340000.0, 86.057244], [1438159104.0, 37360000.0, 86.042304], [1438160832.0, 37380000.0, 86.027356], [1438162560.0, 37400000.0, 86.0124], [1438164288.0, 37420000.0, 85.997436], [1438166016.0, 37440000.0, 85.982464], [1438167744.0, 37460000.0, 85.967484], [1438169472.0, 37480000.0, 85.952496], [1438171200.0, 37500000.0, 85.9375], [1438172928.0, 37520000.0, 85.922496], [1438174656.0, 37540000.0, 85.907484], [1438176384.0, 37560000.0, 85.892464], [1438178112.0, 37580000.0, 85.877436], [1438179840.0, 37600000.0, 85.86240000000001], [1438181568.0, 37620000.0, 85.847356], [1438183296.0, 37640000.0, 85.83230400000001], [1438185024.0, 37660000.0, 85.817244], [1438186752.0, 37680000.0, 85.802176], [1438188480.0, 37700000.0, 85.7871], [1438190208.0, 37720000.0, 85.77201600000001], [1438191936.0, 37740000.0, 85.756924], [1438193664.0, 37760000.0, 85.74182400000001], [1438195392.0, 37780000.0, 85.726716], [1438197120.0, 37800000.0, 85.7116], [1438198848.0, 37820000.0, 85.696476], [1438200576.0, 37840000.0, 85.681344], [1438202304.0, 37860000.0, 85.666204], [1438204032.0, 37880000.0, 85.651056], [1438205760.0, 37900000.0, 85.63589999999999], [1438207488.0, 37920000.0, 85.620736], [1438209216.0, 37940000.0, 85.605564], [1438210944.0, 37960000.0, 85.590384], [1438212672.0, 37980000.0, 85.575196], [1438214400.0, 38000000.0, 85.56], [1438216128.0, 38020000.0, 85.544796], [1438217856.0, 38040000.0, 85.529584], [1438219584.0, 38060000.0, 85.514364], [1438221312.0, 38080000.0, 85.49913600000001], [1438223040.0, 38100000.0, 85.4839], [1438224768.0, 38120000.0, 85.468656], [1438226496.0, 38140000.0, 85.453404], [1438228224.0, 38160000.0, 85.438144], [1438229952.0, 38180000.0, 85.422876], [1438231680.0, 38200000.0, 85.4076], [1438233408.0, 38220000.0, 85.392316], [1438235136.0, 38240000.0, 85.377024], [1438236864.0, 38260000.0, 85.361724], [1438238592.0, 38280000.0, 85.346416], [1438240320.0, 38300000.0, 85.33109999999999], [1438242048.0, 38320000.0, 85.315776], [1438243776.0, 38340000.0, 85.300444], [1438245504.0, 38360000.0, 85.285104], [1438247232.0, 38380000.0, 85.269756], [1438248960.0, 38400000.0, 85.2544], [1438250688.0, 38420000.0, 85.239036], [1438252416.0, 38440000.0, 85.223664], [1438254144.0, 38460000.0, 85.20828399999999], [1438255872.0, 38480000.0, 85.192896], [1438257600.0, 38500000.0, 85.1775], [1438259328.0, 38520000.0, 85.162096], [1438261056.0, 38540000.0, 85.146684], [1438262784.0, 38560000.0, 85.131264], [1438264512.0, 38580000.0, 85.115836], [1438266240.0, 
38600000.0, 85.10040000000001], [1438267968.0, 38620000.0, 85.084956], [1438269696.0, 38640000.0, 85.069504], [1438271424.0, 38660000.0, 85.054044], [1438273152.0, 38680000.0, 85.038576], [1438274880.0, 38700000.0, 85.0231], [1438276608.0, 38720000.0, 85.007616], [1438278336.0, 38740000.0, 84.992124], [1438280064.0, 38760000.0, 84.976624], [1438281792.0, 38780000.0, 84.961116], [1438283520.0, 38800000.0, 84.9456], [1438285248.0, 38820000.0, 84.930076], [1438286976.0, 38840000.0, 84.914544], [1438288704.0, 38860000.0, 84.899004], [1438290432.0, 38880000.0, 84.883456], [1438292160.0, 38900000.0, 84.86789999999999], [1438293888.0, 38920000.0, 84.85233600000001], [1438295616.0, 38940000.0, 84.836764], [1438297344.0, 38960000.0, 84.821184], [1438299072.0, 38980000.0, 84.805596], [1438300800.0, 39000000.0, 84.79], [1438302528.0, 39020000.0, 84.774396], [1438304256.0, 39040000.0, 84.758784], [1438305984.0, 39060000.0, 84.743164], [1438307712.0, 39080000.0, 84.727536], [1438309440.0, 39100000.0, 84.7119], [1438311168.0, 39120000.0, 84.696256], [1438312896.0, 39140000.0, 84.680604], [1438314624.0, 39160000.0, 84.664944], [1438316352.0, 39180000.0, 84.649276], [1438318080.0, 39200000.0, 84.6336], [1438319808.0, 39220000.0, 84.617916], [1438321536.0, 39240000.0, 84.602224], [1438323264.0, 39260000.0, 84.586524], [1438324992.0, 39280000.0, 84.57081600000001], [1438326720.0, 39300000.0, 84.5551], [1438328448.0, 39320000.0, 84.539376], [1438330176.0, 39340000.0, 84.523644], [1438331904.0, 39360000.0, 84.507904], [1438333632.0, 39380000.0, 84.492156], [1438335360.0, 39400000.0, 84.4764], [1438337088.0, 39420000.0, 84.460636], [1438338816.0, 39440000.0, 84.444864], [1438340544.0, 39460000.0, 84.429084], [1438342272.0, 39480000.0, 84.413296], [1438344000.0, 39500000.0, 84.3975], [1438345728.0, 39520000.0, 84.381696], [1438347456.0, 39540000.0, 84.365884], [1438349184.0, 39560000.0, 84.350064], [1438350912.0, 39580000.0, 84.334236], [1438352640.0, 39600000.0, 84.3184], [1438354368.0, 39620000.0, 84.302556], [1438356096.0, 39640000.0, 84.286704], [1438357824.0, 39660000.0, 84.270844], [1438359552.0, 39680000.0, 84.254976], [1438361280.0, 39700000.0, 84.2391], [1438363008.0, 39720000.0, 84.22321600000001], [1438364736.0, 39740000.0, 84.207324], [1438366464.0, 39760000.0, 84.191424], [1438368192.0, 39780000.0, 84.175516], [1438369920.0, 39800000.0, 84.1596], [1438371648.0, 39820000.0, 84.143676], [1438373376.0, 39840000.0, 84.127744], [1438375104.0, 39860000.0, 84.11180399999999], [1438376832.0, 39880000.0, 84.095856], [1438378560.0, 39900000.0, 84.0799], [1438380288.0, 39920000.0, 84.063936], [1438382016.0, 39940000.0, 84.047964], [1438383744.0, 39960000.0, 84.031984], [1438385472.0, 39980000.0, 84.015996], [1438387200.0, 40000000.0, 84.0], [1438388928.0, 40020000.0, 83.983996], [1438390656.0, 40040000.0, 83.967984], [1438392384.0, 40060000.0, 83.951964], [1438394112.0, 40080000.0, 83.935936], [1438395840.0, 40100000.0, 83.9199], [1438397568.0, 40120000.0, 83.903856], [1438399296.0, 40140000.0, 83.887804], [1438401024.0, 40160000.0, 83.871744], [1438402752.0, 40180000.0, 83.855676], [1438404480.0, 40200000.0, 83.8396], [1438406208.0, 40220000.0, 83.823516], [1438407936.0, 40240000.0, 83.807424], [1438409664.0, 40260000.0, 83.791324], [1438411392.0, 40280000.0, 83.775216], [1438413120.0, 40300000.0, 83.75909999999999], [1438414848.0, 40320000.0, 83.742976], [1438416576.0, 40340000.0, 83.726844], [1438418304.0, 40360000.0, 83.710704], [1438420032.0, 40380000.0, 83.694556], [1438421760.0, 40400000.0, 83.6784], 
[1438423488.0, 40420000.0, 83.66223600000001], [1438425216.0, 40440000.0, 83.646064], [1438426944.0, 40460000.0, 83.629884], [1438428672.0, 40480000.0, 83.613696], [1438430400.0, 40500000.0, 83.5975], [1438432128.0, 40520000.0, 83.58129600000001], [1438433856.0, 40540000.0, 83.565084], [1438435584.0, 40560000.0, 83.548864], [1438437312.0, 40580000.0, 83.532636], [1438439040.0, 40600000.0, 83.5164], [1438440768.0, 40620000.0, 83.500156], [1438442496.0, 40640000.0, 83.483904], [1438444224.0, 40660000.0, 83.467644], [1438445952.0, 40680000.0, 83.45137600000001], [1438447680.0, 40700000.0, 83.4351], [1438449408.0, 40720000.0, 83.41881599999999], [1438451136.0, 40740000.0, 83.402524], [1438452864.0, 40760000.0, 83.386224], [1438454592.0, 40780000.0, 83.36991599999999], [1438456320.0, 40800000.0, 83.3536], [1438458048.0, 40820000.0, 83.337276], [1438459776.0, 40840000.0, 83.320944], [1438461504.0, 40860000.0, 83.304604], [1438463232.0, 40880000.0, 83.288256], [1438464960.0, 40900000.0, 83.2719], [1438466688.0, 40920000.0, 83.255536], [1438468416.0, 40940000.0, 83.239164], [1438470144.0, 40960000.0, 83.222784], [1438471872.0, 40980000.0, 83.206396], [1438473600.0, 41000000.0, 83.19], [1438475328.0, 41020000.0, 83.173596], [1438477056.0, 41040000.0, 83.157184], [1438478784.0, 41060000.0, 83.140764], [1438480512.0, 41080000.0, 83.124336], [1438482240.0, 41100000.0, 83.1079], [1438483968.0, 41120000.0, 83.091456], [1438485696.0, 41140000.0, 83.075004], [1438487424.0, 41160000.0, 83.058544], [1438489152.0, 41180000.0, 83.042076], [1438490880.0, 41200000.0, 83.0256], [1438492608.0, 41220000.0, 83.009116], [1438494336.0, 41240000.0, 82.992624], [1438496064.0, 41260000.0, 82.976124], [1438497792.0, 41280000.0, 82.959616], [1438499520.0, 41300000.0, 82.9431], [1438501248.0, 41320000.0, 82.92657600000001], [1438502976.0, 41340000.0, 82.910044], [1438504704.0, 41360000.0, 82.89350400000001], [1438506432.0, 41380000.0, 82.876956], [1438508160.0, 41400000.0, 82.8604], [1438509888.0, 41420000.0, 82.843836], [1438511616.0, 41440000.0, 82.827264], [1438513344.0, 41460000.0, 82.81068400000001], [1438515072.0, 41480000.0, 82.794096], [1438516800.0, 41500000.0, 82.7775], [1438518528.0, 41520000.0, 82.760896], [1438520256.0, 41540000.0, 82.744284], [1438521984.0, 41560000.0, 82.727664], [1438523712.0, 41580000.0, 82.711036], [1438525440.0, 41600000.0, 82.6944], [1438527168.0, 41620000.0, 82.677756], [1438528896.0, 41640000.0, 82.661104], [1438530624.0, 41660000.0, 82.644444], [1438532352.0, 41680000.0, 82.627776], [1438534080.0, 41700000.0, 82.6111], [1438535808.0, 41720000.0, 82.594416], [1438537536.0, 41740000.0, 82.57772399999999], [1438539264.0, 41760000.0, 82.561024], [1438540992.0, 41780000.0, 82.54431600000001], [1438542720.0, 41800000.0, 82.5276], [1438544448.0, 41820000.0, 82.510876], [1438546176.0, 41840000.0, 82.494144], [1438547904.0, 41860000.0, 82.477404], [1438549632.0, 41880000.0, 82.460656], [1438551360.0, 41900000.0, 82.4439], [1438553088.0, 41920000.0, 82.42713599999999], [1438554816.0, 41940000.0, 82.410364], [1438556544.0, 41960000.0, 82.393584], [1438558272.0, 41980000.0, 82.376796], [1438560000.0, 42000000.0, 82.36], [1438561728.0, 42020000.0, 82.343196], [1438563456.0, 42040000.0, 82.326384], [1438565184.0, 42060000.0, 82.309564], [1438566912.0, 42080000.0, 82.29273599999999], [1438568640.0, 42100000.0, 82.27590000000001], [1438570368.0, 42120000.0, 82.259056], [1438572096.0, 42140000.0, 82.242204], [1438573824.0, 42160000.0, 82.225344], [1438575552.0, 42180000.0, 82.208476], [1438577280.0, 
42200000.0, 82.1916], [1438579008.0, 42220000.0, 82.17471599999999], [1438580736.0, 42240000.0, 82.157824], [1438582464.0, 42260000.0, 82.140924], [1438584192.0, 42280000.0, 82.124016], [1438585920.0, 42300000.0, 82.1071], [1438587648.0, 42320000.0, 82.090176], [1438589376.0, 42340000.0, 82.073244], [1438591104.0, 42360000.0, 82.056304], [1438592832.0, 42380000.0, 82.039356], [1438594560.0, 42400000.0, 82.0224], [1438596288.0, 42420000.0, 82.005436], [1438598016.0, 42440000.0, 81.988464], [1438599744.0, 42460000.0, 81.971484], [1438601472.0, 42480000.0, 81.954496], [1438603200.0, 42500000.0, 81.9375], [1438604928.0, 42520000.0, 81.920496], [1438606656.0, 42540000.0, 81.903484], [1438608384.0, 42560000.0, 81.88646399999999], [1438610112.0, 42580000.0, 81.86943600000001], [1438611840.0, 42600000.0, 81.8524], [1438613568.0, 42620000.0, 81.835356], [1438615296.0, 42640000.0, 81.818304], [1438617024.0, 42660000.0, 81.801244], [1438618752.0, 42680000.0, 81.784176], [1438620480.0, 42700000.0, 81.7671], [1438622208.0, 42720000.0, 81.750016], [1438623936.0, 42740000.0, 81.732924], [1438625664.0, 42760000.0, 81.715824], [1438627392.0, 42780000.0, 81.698716], [1438629120.0, 42800000.0, 81.6816], [1438630848.0, 42820000.0, 81.66447600000001], [1438632576.0, 42840000.0, 81.647344], [1438634304.0, 42860000.0, 81.630204], [1438636032.0, 42880000.0, 81.613056], [1438637760.0, 42900000.0, 81.5959], [1438639488.0, 42920000.0, 81.578736], [1438641216.0, 42940000.0, 81.561564], [1438642944.0, 42960000.0, 81.544384], [1438644672.0, 42980000.0, 81.527196], [1438646400.0, 43000000.0, 81.51], [1438648128.0, 43020000.0, 81.492796], [1438649856.0, 43040000.0, 81.475584], [1438651584.0, 43060000.0, 81.458364], [1438653312.0, 43080000.0, 81.441136], [1438655040.0, 43100000.0, 81.4239], [1438656768.0, 43120000.0, 81.406656], [1438658496.0, 43140000.0, 81.389404], [1438660224.0, 43160000.0, 81.372144], [1438661952.0, 43180000.0, 81.354876], [1438663680.0, 43200000.0, 81.3376], [1438665408.0, 43220000.0, 81.32031599999999], [1438667136.0, 43240000.0, 81.303024], [1438668864.0, 43260000.0, 81.285724], [1438670592.0, 43280000.0, 81.268416], [1438672320.0, 43300000.0, 81.25110000000001], [1438674048.0, 43320000.0, 81.233776], [1438675776.0, 43340000.0, 81.216444], [1438677504.0, 43360000.0, 81.199104], [1438679232.0, 43380000.0, 81.18175600000001], [1438680960.0, 43400000.0, 81.1644], [1438682688.0, 43420000.0, 81.147036], [1438684416.0, 43440000.0, 81.12966399999999], [1438686144.0, 43460000.0, 81.112284], [1438687872.0, 43480000.0, 81.094896], [1438689600.0, 43500000.0, 81.0775], [1438691328.0, 43520000.0, 81.060096], [1438693056.0, 43540000.0, 81.042684], [1438694784.0, 43560000.0, 81.02526399999999], [1438696512.0, 43580000.0, 81.007836], [1438698240.0, 43600000.0, 80.9904], [1438699968.0, 43620000.0, 80.972956], [1438701696.0, 43640000.0, 80.955504], [1438703424.0, 43660000.0, 80.938044], [1438705152.0, 43680000.0, 80.920576], [1438706880.0, 43700000.0, 80.9031], [1438708608.0, 43720000.0, 80.885616], [1438710336.0, 43740000.0, 80.868124], [1438712064.0, 43760000.0, 80.850624], [1438713792.0, 43780000.0, 80.833116], [1438715520.0, 43800000.0, 80.8156], [1438717248.0, 43820000.0, 80.79807600000001], [1438718976.0, 43840000.0, 80.78054399999999], [1438720704.0, 43860000.0, 80.763004], [1438722432.0, 43880000.0, 80.745456], [1438724160.0, 43900000.0, 80.7279], [1438725888.0, 43920000.0, 80.710336], [1438727616.0, 43940000.0, 80.692764], [1438729344.0, 43960000.0, 80.675184], [1438731072.0, 43980000.0, 80.657596], 
[1438732800.0, 44000000.0, 80.64], [1438734528.0, 44020000.0, 80.622396], [1438736256.0, 44040000.0, 80.604784], [1438737984.0, 44060000.0, 80.587164], [1438739712.0, 44080000.0, 80.569536], [1438741440.0, 44100000.0, 80.5519], [1438743168.0, 44120000.0, 80.534256], [1438744896.0, 44140000.0, 80.516604], [1438746624.0, 44160000.0, 80.498944], [1438748352.0, 44180000.0, 80.481276], [1438750080.0, 44200000.0, 80.4636], [1438751808.0, 44220000.0, 80.44591600000001], [1438753536.0, 44240000.0, 80.428224], [1438755264.0, 44260000.0, 80.410524], [1438756992.0, 44280000.0, 80.392816], [1438758720.0, 44300000.0, 80.3751], [1438760448.0, 44320000.0, 80.35737599999999], [1438762176.0, 44340000.0, 80.33964399999999], [1438763904.0, 44360000.0, 80.321904], [1438765632.0, 44380000.0, 80.304156], [1438767360.0, 44400000.0, 80.2864], [1438769088.0, 44420000.0, 80.268636], [1438770816.0, 44440000.0, 80.250864], [1438772544.0, 44460000.0, 80.233084], [1438774272.0, 44480000.0, 80.215296], [1438776000.0, 44500000.0, 80.19749999999999], [1438777728.0, 44520000.0, 80.179696], [1438779456.0, 44540000.0, 80.161884], [1438781184.0, 44560000.0, 80.144064], [1438782912.0, 44580000.0, 80.126236], [1438784640.0, 44600000.0, 80.1084], [1438786368.0, 44620000.0, 80.090556], [1438788096.0, 44640000.0, 80.07270399999999], [1438789824.0, 44660000.0, 80.054844], [1438791552.0, 44680000.0, 80.036976], [1438793280.0, 44700000.0, 80.01910000000001], [1438795008.0, 44720000.0, 80.001216], [1438796736.0, 44740000.0, 79.983324], [1438798464.0, 44760000.0, 79.965424], [1438800192.0, 44780000.0, 79.94751600000001], [1438801920.0, 44800000.0, 79.9296], [1438803648.0, 44820000.0, 79.911676], [1438805376.0, 44840000.0, 79.893744], [1438807104.0, 44860000.0, 79.875804], [1438808832.0, 44880000.0, 79.857856], [1438810560.0, 44900000.0, 79.8399], [1438812288.0, 44920000.0, 79.821936], [1438814016.0, 44940000.0, 79.80396400000001], [1438815744.0, 44960000.0, 79.785984], [1438817472.0, 44980000.0, 79.767996], [1438819200.0, 45000000.0, 79.75], [1438820928.0, 45020000.0, 79.73199600000001], [1438822656.0, 45040000.0, 79.71398400000001], [1438824384.0, 45060000.0, 79.695964], [1438826112.0, 45080000.0, 79.677936], [1438827840.0, 45100000.0, 79.6599], [1438829568.0, 45120000.0, 79.641856], [1438831296.0, 45140000.0, 79.62380399999999], [1438833024.0, 45160000.0, 79.605744], [1438834752.0, 45180000.0, 79.587676], [1438836480.0, 45200000.0, 79.56960000000001], [1438838208.0, 45220000.0, 79.55151599999999], [1438839936.0, 45240000.0, 79.533424], [1438841664.0, 45260000.0, 79.515324], [1438843392.0, 45280000.0, 79.49721600000001], [1438845120.0, 45300000.0, 79.4791], [1438846848.0, 45320000.0, 79.460976], [1438848576.0, 45340000.0, 79.44284400000001], [1438850304.0, 45360000.0, 79.424704], [1438852032.0, 45380000.0, 79.406556], [1438853760.0, 45400000.0, 79.3884], [1438855488.0, 45420000.0, 79.370236], [1438857216.0, 45440000.0, 79.352064], [1438858944.0, 45460000.0, 79.333884], [1438860672.0, 45480000.0, 79.315696], [1438862400.0, 45500000.0, 79.2975], [1438864128.0, 45520000.0, 79.279296], [1438865856.0, 45540000.0, 79.261084], [1438867584.0, 45560000.0, 79.242864], [1438869312.0, 45580000.0, 79.224636], [1438871040.0, 45600000.0, 79.2064], [1438872768.0, 45620000.0, 79.18815599999999], [1438874496.0, 45640000.0, 79.169904], [1438876224.0, 45660000.0, 79.151644], [1438877952.0, 45680000.0, 79.133376], [1438879680.0, 45700000.0, 79.1151], [1438881408.0, 45720000.0, 79.096816], [1438883136.0, 45740000.0, 79.078524], [1438884864.0, 45760000.0, 
79.060224], [1438886592.0, 45780000.0, 79.041916], [1438888320.0, 45800000.0, 79.0236], [1438890048.0, 45820000.0, 79.00527600000001], [1438891776.0, 45840000.0, 78.98694400000001], [1438893504.0, 45860000.0, 78.968604], [1438895232.0, 45880000.0, 78.950256], [1438896960.0, 45900000.0, 78.9319], [1438898688.0, 45920000.0, 78.91353600000001], [1438900416.0, 45940000.0, 78.895164], [1438902144.0, 45960000.0, 78.876784], [1438903872.0, 45980000.0, 78.858396], [1438905600.0, 46000000.0, 78.84], [1438907328.0, 46020000.0, 78.821596], [1438909056.0, 46040000.0, 78.803184], [1438910784.0, 46060000.0, 78.784764], [1438912512.0, 46080000.0, 78.766336], [1438914240.0, 46100000.0, 78.7479], [1438915968.0, 46120000.0, 78.729456], [1438917696.0, 46140000.0, 78.711004], [1438919424.0, 46160000.0, 78.692544], [1438921152.0, 46180000.0, 78.674076], [1438922880.0, 46200000.0, 78.65559999999999], [1438924608.0, 46220000.0, 78.637116], [1438926336.0, 46240000.0, 78.61862400000001], [1438928064.0, 46260000.0, 78.600124], [1438929792.0, 46280000.0, 78.581616], [1438931520.0, 46300000.0, 78.5631], [1438933248.0, 46320000.0, 78.544576], [1438934976.0, 46340000.0, 78.526044], [1438936704.0, 46360000.0, 78.507504], [1438938432.0, 46380000.0, 78.488956], [1438940160.0, 46400000.0, 78.4704], [1438941888.0, 46420000.0, 78.451836], [1438943616.0, 46440000.0, 78.433264], [1438945344.0, 46460000.0, 78.414684], [1438947072.0, 46480000.0, 78.396096], [1438948800.0, 46500000.0, 78.3775], [1438950528.0, 46520000.0, 78.358896], [1438952256.0, 46540000.0, 78.340284], [1438953984.0, 46560000.0, 78.321664], [1438955712.0, 46580000.0, 78.30303599999999], [1438957440.0, 46600000.0, 78.2844], [1438959168.0, 46620000.0, 78.265756], [1438960896.0, 46640000.0, 78.24710400000001], [1438962624.0, 46660000.0, 78.228444], [1438964352.0, 46680000.0, 78.209776], [1438966080.0, 46700000.0, 78.1911], [1438967808.0, 46720000.0, 78.172416], [1438969536.0, 46740000.0, 78.153724], [1438971264.0, 46760000.0, 78.135024], [1438972992.0, 46780000.0, 78.116316], [1438974720.0, 46800000.0, 78.0976], [1438976448.0, 46820000.0, 78.078876], [1438978176.0, 46840000.0, 78.060144], [1438979904.0, 46860000.0, 78.041404], [1438981632.0, 46880000.0, 78.022656], [1438983360.0, 46900000.0, 78.0039], [1438985088.0, 46920000.0, 77.985136], [1438986816.0, 46940000.0, 77.966364], [1438988544.0, 46960000.0, 77.947584], [1438990272.0, 46980000.0, 77.92879599999999], [1438992000.0, 47000000.0, 77.91], [1438993728.0, 47020000.0, 77.89119600000001], [1438995456.0, 47040000.0, 77.87238400000001], [1438997184.0, 47060000.0, 77.85356399999999], [1438998912.0, 47080000.0, 77.83473599999999], [1439000640.0, 47100000.0, 77.8159], [1439002368.0, 47120000.0, 77.797056], [1439004096.0, 47140000.0, 77.77820399999999], [1439005824.0, 47160000.0, 77.759344], [1439007552.0, 47180000.0, 77.740476], [1439009280.0, 47200000.0, 77.7216], [1439011008.0, 47220000.0, 77.702716], [1439012736.0, 47240000.0, 77.683824], [1439014464.0, 47260000.0, 77.664924], [1439016192.0, 47280000.0, 77.646016], [1439017920.0, 47300000.0, 77.6271], [1439019648.0, 47320000.0, 77.608176], [1439021376.0, 47340000.0, 77.58924400000001], [1439023104.0, 47360000.0, 77.57030400000001], [1439024832.0, 47380000.0, 77.551356], [1439026560.0, 47400000.0, 77.5324], [1439028288.0, 47420000.0, 77.513436], [1439030016.0, 47440000.0, 77.494464], [1439031744.0, 47460000.0, 77.475484], [1439033472.0, 47480000.0, 77.456496], [1439035200.0, 47500000.0, 77.4375], [1439036928.0, 47520000.0, 77.418496], [1439038656.0, 47540000.0, 
77.399484], [1439040384.0, 47560000.0, 77.38046399999999], [1439042112.0, 47580000.0, 77.361436], [1439043840.0, 47600000.0, 77.3424], [1439045568.0, 47620000.0, 77.323356], [1439047296.0, 47640000.0, 77.304304], [1439049024.0, 47660000.0, 77.285244], [1439050752.0, 47680000.0, 77.266176], [1439052480.0, 47700000.0, 77.2471], [1439054208.0, 47720000.0, 77.228016], [1439055936.0, 47740000.0, 77.208924], [1439057664.0, 47760000.0, 77.189824], [1439059392.0, 47780000.0, 77.170716], [1439061120.0, 47800000.0, 77.1516], [1439062848.0, 47820000.0, 77.132476], [1439064576.0, 47840000.0, 77.113344], [1439066304.0, 47860000.0, 77.094204], [1439068032.0, 47880000.0, 77.07505599999999], [1439069760.0, 47900000.0, 77.05590000000001], [1439071488.0, 47920000.0, 77.036736], [1439073216.0, 47940000.0, 77.01756400000001], [1439074944.0, 47960000.0, 76.998384], [1439076672.0, 47980000.0, 76.979196], [1439078400.0, 48000000.0, 76.96000000000001], [1439080128.0, 48020000.0, 76.940796], [1439081856.0, 48040000.0, 76.921584], [1439083584.0, 48060000.0, 76.902364], [1439085312.0, 48080000.0, 76.88313600000001], [1439087040.0, 48100000.0, 76.8639], [1439088768.0, 48120000.0, 76.844656], [1439090496.0, 48140000.0, 76.82540399999999], [1439092224.0, 48160000.0, 76.806144], [1439093952.0, 48180000.0, 76.786876], [1439095680.0, 48200000.0, 76.7676], [1439097408.0, 48220000.0, 76.748316], [1439099136.0, 48240000.0, 76.72902400000001], [1439100864.0, 48260000.0, 76.70972400000001], [1439102592.0, 48280000.0, 76.690416], [1439104320.0, 48300000.0, 76.6711], [1439106048.0, 48320000.0, 76.651776], [1439107776.0, 48340000.0, 76.632444], [1439109504.0, 48360000.0, 76.61310399999999], [1439111232.0, 48380000.0, 76.593756], [1439112960.0, 48400000.0, 76.5744], [1439114688.0, 48420000.0, 76.555036], [1439116416.0, 48440000.0, 76.535664], [1439118144.0, 48460000.0, 76.516284], [1439119872.0, 48480000.0, 76.49689599999999], [1439121600.0, 48500000.0, 76.4775], [1439123328.0, 48520000.0, 76.458096], [1439125056.0, 48540000.0, 76.438684], [1439126784.0, 48560000.0, 76.419264], [1439128512.0, 48580000.0, 76.39983600000001], [1439130240.0, 48600000.0, 76.3804], [1439131968.0, 48620000.0, 76.360956], [1439133696.0, 48640000.0, 76.341504], [1439135424.0, 48660000.0, 76.322044], [1439137152.0, 48680000.0, 76.302576], [1439138880.0, 48700000.0, 76.28309999999999], [1439140608.0, 48720000.0, 76.263616], [1439142336.0, 48740000.0, 76.244124], [1439144064.0, 48760000.0, 76.22462399999999], [1439145792.0, 48780000.0, 76.205116], [1439147520.0, 48800000.0, 76.1856], [1439149248.0, 48820000.0, 76.166076], [1439150976.0, 48840000.0, 76.14654399999999], [1439152704.0, 48860000.0, 76.127004], [1439154432.0, 48880000.0, 76.107456], [1439156160.0, 48900000.0, 76.0879], [1439157888.0, 48920000.0, 76.06833599999999], [1439159616.0, 48940000.0, 76.048764], [1439161344.0, 48960000.0, 76.029184], [1439163072.0, 48980000.0, 76.009596], [1439164800.0, 49000000.0, 75.99], [1439166528.0, 49020000.0, 75.970396], [1439168256.0, 49040000.0, 75.950784], [1439169984.0, 49060000.0, 75.931164], [1439171712.0, 49080000.0, 75.911536], [1439173440.0, 49100000.0, 75.89189999999999], [1439175168.0, 49120000.0, 75.872256], [1439176896.0, 49140000.0, 75.852604], [1439178624.0, 49160000.0, 75.832944], [1439180352.0, 49180000.0, 75.813276], [1439182080.0, 49200000.0, 75.7936], [1439183808.0, 49220000.0, 75.773916], [1439185536.0, 49240000.0, 75.754224], [1439187264.0, 49260000.0, 75.734524], [1439188992.0, 49280000.0, 75.714816], [1439190720.0, 49300000.0, 75.6951], 
[1439192448.0, 49320000.0, 75.675376], [1439194176.0, 49340000.0, 75.655644], [1439195904.0, 49360000.0, 75.635904], [1439197632.0, 49380000.0, 75.616156], [1439199360.0, 49400000.0, 75.59639999999999], [1439201088.0, 49420000.0, 75.576636], [1439202816.0, 49440000.0, 75.556864], [1439204544.0, 49460000.0, 75.53708400000001], [1439206272.0, 49480000.0, 75.51729599999999], [1439208000.0, 49500000.0, 75.4975], [1439209728.0, 49520000.0, 75.47769600000001], [1439211456.0, 49540000.0, 75.457884], [1439213184.0, 49560000.0, 75.438064], [1439214912.0, 49580000.0, 75.418236], [1439216640.0, 49600000.0, 75.3984], [1439218368.0, 49620000.0, 75.378556], [1439220096.0, 49640000.0, 75.35870399999999], [1439221824.0, 49660000.0, 75.338844], [1439223552.0, 49680000.0, 75.31897599999999], [1439225280.0, 49700000.0, 75.29910000000001], [1439227008.0, 49720000.0, 75.27921599999999], [1439228736.0, 49740000.0, 75.25932399999999], [1439230464.0, 49760000.0, 75.239424], [1439232192.0, 49780000.0, 75.219516], [1439233920.0, 49800000.0, 75.1996], [1439235648.0, 49820000.0, 75.179676], [1439237376.0, 49840000.0, 75.159744], [1439239104.0, 49860000.0, 75.139804], [1439240832.0, 49880000.0, 75.119856], [1439242560.0, 49900000.0, 75.09989999999999], [1439244288.0, 49920000.0, 75.079936], [1439246016.0, 49940000.0, 75.05996400000001], [1439247744.0, 49960000.0, 75.039984], [1439249472.0, 49980000.0, 75.01999599999999], [1439251200.0, 50000000.0, 75.0], [1439252928.0, 50020000.0, 74.979996], [1439254656.0, 50040000.0, 74.959984], [1439256384.0, 50060000.0, 74.939964], [1439258112.0, 50080000.0, 74.919936], [1439259840.0, 50100000.0, 74.8999], [1439261568.0, 50120000.0, 74.879856], [1439263296.0, 50140000.0, 74.859804], [1439265024.0, 50160000.0, 74.839744], [1439266752.0, 50180000.0, 74.819676], [1439268480.0, 50200000.0, 74.7996], [1439270208.0, 50220000.0, 74.779516], [1439271936.0, 50240000.0, 74.759424], [1439273664.0, 50260000.0, 74.73932400000001], [1439275392.0, 50280000.0, 74.719216], [1439277120.0, 50300000.0, 74.6991], [1439278848.0, 50320000.0, 74.678976], [1439280576.0, 50340000.0, 74.658844], [1439282304.0, 50360000.0, 74.638704], [1439284032.0, 50380000.0, 74.618556], [1439285760.0, 50400000.0, 74.5984], [1439287488.0, 50420000.0, 74.578236], [1439289216.0, 50440000.0, 74.558064], [1439290944.0, 50460000.0, 74.53788399999999], [1439292672.0, 50480000.0, 74.517696], [1439294400.0, 50500000.0, 74.4975], [1439296128.0, 50520000.0, 74.477296], [1439297856.0, 50540000.0, 74.457084], [1439299584.0, 50560000.0, 74.436864], [1439301312.0, 50580000.0, 74.416636], [1439303040.0, 50600000.0, 74.3964], [1439304768.0, 50620000.0, 74.376156], [1439306496.0, 50640000.0, 74.355904], [1439308224.0, 50660000.0, 74.335644], [1439309952.0, 50680000.0, 74.315376], [1439311680.0, 50700000.0, 74.29509999999999], [1439313408.0, 50720000.0, 74.274816], [1439315136.0, 50740000.0, 74.254524], [1439316864.0, 50760000.0, 74.23422400000001], [1439318592.0, 50780000.0, 74.213916], [1439320320.0, 50800000.0, 74.1936], [1439322048.0, 50820000.0, 74.173276], [1439323776.0, 50840000.0, 74.152944], [1439325504.0, 50860000.0, 74.132604], [1439327232.0, 50880000.0, 74.112256], [1439328960.0, 50900000.0, 74.09190000000001], [1439330688.0, 50920000.0, 74.07153600000001], [1439332416.0, 50940000.0, 74.051164], [1439334144.0, 50960000.0, 74.030784], [1439335872.0, 50980000.0, 74.010396], [1439337600.0, 51000000.0, 73.99000000000001], [1439339328.0, 51020000.0, 73.969596], [1439341056.0, 51040000.0, 73.949184], [1439342784.0, 51060000.0, 
73.928764], [1439344512.0, 51080000.0, 73.908336], [1439346240.0, 51100000.0, 73.8879], [1439347968.0, 51120000.0, 73.867456], [1439349696.0, 51140000.0, 73.847004], [1439351424.0, 51160000.0, 73.826544], [1439353152.0, 51180000.0, 73.80607599999999], [1439354880.0, 51200000.0, 73.7856], [1439356608.0, 51220000.0, 73.765116], [1439358336.0, 51240000.0, 73.744624], [1439360064.0, 51260000.0, 73.72412399999999], [1439361792.0, 51280000.0, 73.703616], [1439363520.0, 51300000.0, 73.6831], [1439365248.0, 51320000.0, 73.662576], [1439366976.0, 51340000.0, 73.642044], [1439368704.0, 51360000.0, 73.621504], [1439370432.0, 51380000.0, 73.600956], [1439372160.0, 51400000.0, 73.5804], [1439373888.0, 51420000.0, 73.55983599999999], [1439375616.0, 51440000.0, 73.539264], [1439377344.0, 51460000.0, 73.51868400000001], [1439379072.0, 51480000.0, 73.498096], [1439380800.0, 51500000.0, 73.47749999999999], [1439382528.0, 51520000.0, 73.456896], [1439384256.0, 51540000.0, 73.436284], [1439385984.0, 51560000.0, 73.415664], [1439387712.0, 51580000.0, 73.395036], [1439389440.0, 51600000.0, 73.3744], [1439391168.0, 51620000.0, 73.353756], [1439392896.0, 51640000.0, 73.333104], [1439394624.0, 51660000.0, 73.312444], [1439396352.0, 51680000.0, 73.291776], [1439398080.0, 51700000.0, 73.2711], [1439399808.0, 51720000.0, 73.250416], [1439401536.0, 51740000.0, 73.229724], [1439403264.0, 51760000.0, 73.209024], [1439404992.0, 51780000.0, 73.188316], [1439406720.0, 51800000.0, 73.16760000000001], [1439408448.0, 51820000.0, 73.14687599999999], [1439410176.0, 51840000.0, 73.126144], [1439411904.0, 51860000.0, 73.105404], [1439413632.0, 51880000.0, 73.084656], [1439415360.0, 51900000.0, 73.06389999999999], [1439417088.0, 51920000.0, 73.043136], [1439418816.0, 51940000.0, 73.022364], [1439420544.0, 51960000.0, 73.00158400000001], [1439422272.0, 51980000.0, 72.980796], [1439424000.0, 52000000.0, 72.96], [1439425728.0, 52020000.0, 72.939196], [1439427456.0, 52040000.0, 72.918384], [1439429184.0, 52060000.0, 72.89756399999999], [1439430912.0, 52080000.0, 72.876736], [1439432640.0, 52100000.0, 72.8559], [1439434368.0, 52120000.0, 72.83505600000001], [1439436096.0, 52140000.0, 72.81420399999999], [1439437824.0, 52160000.0, 72.79334399999999], [1439439552.0, 52180000.0, 72.772476], [1439441280.0, 52200000.0, 72.7516], [1439443008.0, 52220000.0, 72.730716], [1439444736.0, 52240000.0, 72.709824], [1439446464.0, 52260000.0, 72.688924], [1439448192.0, 52280000.0, 72.668016], [1439449920.0, 52300000.0, 72.6471], [1439451648.0, 52320000.0, 72.626176], [1439453376.0, 52340000.0, 72.605244], [1439455104.0, 52360000.0, 72.584304], [1439456832.0, 52380000.0, 72.563356], [1439458560.0, 52400000.0, 72.5424], [1439460288.0, 52420000.0, 72.521436], [1439462016.0, 52440000.0, 72.500464], [1439463744.0, 52460000.0, 72.479484], [1439465472.0, 52480000.0, 72.458496], [1439467200.0, 52500000.0, 72.4375], [1439468928.0, 52520000.0, 72.416496], [1439470656.0, 52540000.0, 72.39548400000001], [1439472384.0, 52560000.0, 72.37446399999999], [1439474112.0, 52580000.0, 72.353436], [1439475840.0, 52600000.0, 72.3324], [1439477568.0, 52620000.0, 72.311356], [1439479296.0, 52640000.0, 72.29030399999999], [1439481024.0, 52660000.0, 72.269244], [1439482752.0, 52680000.0, 72.248176], [1439484480.0, 52700000.0, 72.22710000000001], [1439486208.0, 52720000.0, 72.206016], [1439487936.0, 52740000.0, 72.184924], [1439489664.0, 52760000.0, 72.163824], [1439491392.0, 52780000.0, 72.14271600000001], [1439493120.0, 52800000.0, 72.1216], [1439494848.0, 52820000.0, 
72.100476], [1439496576.0, 52840000.0, 72.079344], [1439498304.0, 52860000.0, 72.058204], [1439500032.0, 52880000.0, 72.03705599999999], [1439501760.0, 52900000.0, 72.0159], [1439503488.0, 52920000.0, 71.994736], [1439505216.0, 52940000.0, 71.97356400000001], [1439506944.0, 52960000.0, 71.952384], [1439508672.0, 52980000.0, 71.931196], [1439510400.0, 53000000.0, 71.91], [1439512128.0, 53020000.0, 71.888796], [1439513856.0, 53040000.0, 71.867584], [1439515584.0, 53060000.0, 71.846364], [1439517312.0, 53080000.0, 71.825136], [1439519040.0, 53100000.0, 71.8039], [1439520768.0, 53120000.0, 71.782656], [1439522496.0, 53140000.0, 71.761404], [1439524224.0, 53160000.0, 71.740144], [1439525952.0, 53180000.0, 71.71887600000001], [1439527680.0, 53200000.0, 71.6976], [1439529408.0, 53220000.0, 71.676316], [1439531136.0, 53240000.0, 71.655024], [1439532864.0, 53260000.0, 71.633724], [1439534592.0, 53280000.0, 71.612416], [1439536320.0, 53300000.0, 71.5911], [1439538048.0, 53320000.0, 71.569776], [1439539776.0, 53340000.0, 71.548444], [1439541504.0, 53360000.0, 71.527104], [1439543232.0, 53380000.0, 71.50575599999999], [1439544960.0, 53400000.0, 71.4844], [1439546688.0, 53420000.0, 71.463036], [1439548416.0, 53440000.0, 71.441664], [1439550144.0, 53460000.0, 71.420284], [1439551872.0, 53480000.0, 71.39889600000001], [1439553600.0, 53500000.0, 71.3775], [1439555328.0, 53520000.0, 71.356096], [1439557056.0, 53540000.0, 71.334684], [1439558784.0, 53560000.0, 71.313264], [1439560512.0, 53580000.0, 71.291836], [1439562240.0, 53600000.0, 71.2704], [1439563968.0, 53620000.0, 71.24895599999999], [1439565696.0, 53640000.0, 71.227504], [1439567424.0, 53660000.0, 71.206044], [1439569152.0, 53680000.0, 71.18457599999999], [1439570880.0, 53700000.0, 71.1631], [1439572608.0, 53720000.0, 71.141616], [1439574336.0, 53740000.0, 71.120124], [1439576064.0, 53760000.0, 71.098624], [1439577792.0, 53780000.0, 71.077116], [1439579520.0, 53800000.0, 71.0556], [1439581248.0, 53820000.0, 71.034076], [1439582976.0, 53840000.0, 71.01254399999999], [1439584704.0, 53860000.0, 70.991004], [1439586432.0, 53880000.0, 70.96945600000001], [1439588160.0, 53900000.0, 70.9479], [1439589888.0, 53920000.0, 70.92633599999999], [1439591616.0, 53940000.0, 70.904764], [1439593344.0, 53960000.0, 70.883184], [1439595072.0, 53980000.0, 70.861596], [1439596800.0, 54000000.0, 70.84], [1439598528.0, 54020000.0, 70.818396], [1439600256.0, 54040000.0, 70.796784], [1439601984.0, 54060000.0, 70.775164], [1439603712.0, 54080000.0, 70.753536], [1439605440.0, 54100000.0, 70.7319], [1439607168.0, 54120000.0, 70.710256], [1439608896.0, 54140000.0, 70.688604], [1439610624.0, 54160000.0, 70.666944], [1439612352.0, 54180000.0, 70.645276], [1439614080.0, 54200000.0, 70.6236], [1439615808.0, 54220000.0, 70.601916], [1439617536.0, 54240000.0, 70.580224], [1439619264.0, 54260000.0, 70.558524], [1439620992.0, 54280000.0, 70.536816], [1439622720.0, 54300000.0, 70.5151], [1439624448.0, 54320000.0, 70.493376], [1439626176.0, 54340000.0, 70.471644], [1439627904.0, 54360000.0, 70.449904], [1439629632.0, 54380000.0, 70.428156], [1439631360.0, 54400000.0, 70.40639999999999], [1439633088.0, 54420000.0, 70.384636], [1439634816.0, 54440000.0, 70.362864], [1439636544.0, 54460000.0, 70.341084], [1439638272.0, 54480000.0, 70.319296], [1439640000.0, 54500000.0, 70.2975], [1439641728.0, 54520000.0, 70.275696], [1439643456.0, 54540000.0, 70.253884], [1439645184.0, 54560000.0, 70.232064], [1439646912.0, 54580000.0, 70.210236], [1439648640.0, 54600000.0, 70.1884], [1439650368.0, 
54620000.0, 70.166556], [1439652096.0, 54640000.0, 70.14470399999999], [1439653824.0, 54660000.0, 70.122844], [1439655552.0, 54680000.0, 70.100976], [1439657280.0, 54700000.0, 70.07910000000001], [1439659008.0, 54720000.0, 70.057216], [1439660736.0, 54740000.0, 70.035324], [1439662464.0, 54760000.0, 70.013424], [1439664192.0, 54780000.0, 69.991516], [1439665920.0, 54800000.0, 69.9696], [1439667648.0, 54820000.0, 69.947676], [1439669376.0, 54840000.0, 69.92574400000001], [1439671104.0, 54860000.0, 69.90380400000001], [1439672832.0, 54880000.0, 69.881856], [1439674560.0, 54900000.0, 69.8599], [1439676288.0, 54920000.0, 69.837936], [1439678016.0, 54940000.0, 69.81596400000001], [1439679744.0, 54960000.0, 69.793984], [1439681472.0, 54980000.0, 69.771996], [1439683200.0, 55000000.0, 69.75], [1439684928.0, 55020000.0, 69.727996], [1439686656.0, 55040000.0, 69.705984], [1439688384.0, 55060000.0, 69.683964], [1439690112.0, 55080000.0, 69.661936], [1439691840.0, 55100000.0, 69.6399], [1439693568.0, 55120000.0, 69.617856], [1439695296.0, 55140000.0, 69.595804], [1439697024.0, 55160000.0, 69.573744], [1439698752.0, 55180000.0, 69.551676], [1439700480.0, 55200000.0, 69.5296], [1439702208.0, 55220000.0, 69.507516], [1439703936.0, 55240000.0, 69.485424], [1439705664.0, 55260000.0, 69.463324], [1439707392.0, 55280000.0, 69.441216], [1439709120.0, 55300000.0, 69.4191], [1439710848.0, 55320000.0, 69.396976], [1439712576.0, 55340000.0, 69.374844], [1439714304.0, 55360000.0, 69.352704], [1439716032.0, 55380000.0, 69.330556], [1439717760.0, 55400000.0, 69.3084], [1439719488.0, 55420000.0, 69.286236], [1439721216.0, 55440000.0, 69.264064], [1439722944.0, 55460000.0, 69.241884], [1439724672.0, 55480000.0, 69.219696], [1439726400.0, 55500000.0, 69.1975], [1439728128.0, 55520000.0, 69.175296], [1439729856.0, 55540000.0, 69.15308399999999], [1439731584.0, 55560000.0, 69.130864], [1439733312.0, 55580000.0, 69.108636], [1439735040.0, 55600000.0, 69.0864], [1439736768.0, 55620000.0, 69.064156], [1439738496.0, 55640000.0, 69.041904], [1439740224.0, 55660000.0, 69.019644], [1439741952.0, 55680000.0, 68.997376], [1439743680.0, 55700000.0, 68.9751], [1439745408.0, 55720000.0, 68.952816], [1439747136.0, 55740000.0, 68.930524], [1439748864.0, 55760000.0, 68.908224], [1439750592.0, 55780000.0, 68.885916], [1439752320.0, 55800000.0, 68.86359999999999], [1439754048.0, 55820000.0, 68.841276], [1439755776.0, 55840000.0, 68.818944], [1439757504.0, 55860000.0, 68.796604], [1439759232.0, 55880000.0, 68.774256], [1439760960.0, 55900000.0, 68.7519], [1439762688.0, 55920000.0, 68.729536], [1439764416.0, 55940000.0, 68.70716399999999], [1439766144.0, 55960000.0, 68.68478400000001], [1439767872.0, 55980000.0, 68.662396], [1439769600.0, 56000000.0, 68.64], [1439771328.0, 56020000.0, 68.61759599999999], [1439773056.0, 56040000.0, 68.595184], [1439774784.0, 56060000.0, 68.572764], [1439776512.0, 56080000.0, 68.550336], [1439778240.0, 56100000.0, 68.52789999999999], [1439779968.0, 56120000.0, 68.505456], [1439781696.0, 56140000.0, 68.483004], [1439783424.0, 56160000.0, 68.460544], [1439785152.0, 56180000.0, 68.438076], [1439786880.0, 56200000.0, 68.4156], [1439788608.0, 56220000.0, 68.393116], [1439790336.0, 56240000.0, 68.370624], [1439792064.0, 56260000.0, 68.348124], [1439793792.0, 56280000.0, 68.325616], [1439795520.0, 56300000.0, 68.3031], [1439797248.0, 56320000.0, 68.280576], [1439798976.0, 56340000.0, 68.258044], [1439800704.0, 56360000.0, 68.23550399999999], [1439802432.0, 56380000.0, 68.212956], [1439804160.0, 56400000.0, 
68.19040000000001], [1439805888.0, 56420000.0, 68.167836], [1439807616.0, 56440000.0, 68.145264], [1439809344.0, 56460000.0, 68.12268399999999], [1439811072.0, 56480000.0, 68.10009600000001], [1439812800.0, 56500000.0, 68.0775], [1439814528.0, 56520000.0, 68.054896], [1439816256.0, 56540000.0, 68.032284], [1439817984.0, 56560000.0, 68.009664], [1439819712.0, 56580000.0, 67.98703599999999], [1439821440.0, 56600000.0, 67.9644], [1439823168.0, 56620000.0, 67.941756], [1439824896.0, 56640000.0, 67.919104], [1439826624.0, 56660000.0, 67.896444], [1439828352.0, 56680000.0, 67.87377599999999], [1439830080.0, 56700000.0, 67.8511], [1439831808.0, 56720000.0, 67.828416], [1439833536.0, 56740000.0, 67.805724], [1439835264.0, 56760000.0, 67.783024], [1439836992.0, 56780000.0, 67.760316], [1439838720.0, 56800000.0, 67.7376], [1439840448.0, 56820000.0, 67.714876], [1439842176.0, 56840000.0, 67.692144], [1439843904.0, 56860000.0, 67.669404], [1439845632.0, 56880000.0, 67.64665600000001], [1439847360.0, 56900000.0, 67.62389999999999], [1439849088.0, 56920000.0, 67.601136], [1439850816.0, 56940000.0, 67.578364], [1439852544.0, 56960000.0, 67.55558400000001], [1439854272.0, 56980000.0, 67.53279599999999], [1439856000.0, 57000000.0, 67.50999999999999], [1439857728.0, 57020000.0, 67.487196], [1439859456.0, 57040000.0, 67.464384], [1439861184.0, 57060000.0, 67.441564], [1439862912.0, 57080000.0, 67.418736], [1439864640.0, 57100000.0, 67.3959], [1439866368.0, 57120000.0, 67.373056], [1439868096.0, 57140000.0, 67.35020399999999], [1439869824.0, 57160000.0, 67.327344], [1439871552.0, 57180000.0, 67.304476], [1439873280.0, 57200000.0, 67.2816], [1439875008.0, 57220000.0, 67.25871599999999], [1439876736.0, 57240000.0, 67.23582400000001], [1439878464.0, 57260000.0, 67.212924], [1439880192.0, 57280000.0, 67.190016], [1439881920.0, 57300000.0, 67.1671], [1439883648.0, 57320000.0, 67.14417599999999], [1439885376.0, 57340000.0, 67.12124399999999], [1439887104.0, 57360000.0, 67.09830400000001], [1439888832.0, 57380000.0, 67.075356], [1439890560.0, 57400000.0, 67.0524], [1439892288.0, 57420000.0, 67.029436], [1439894016.0, 57440000.0, 67.006464], [1439895744.0, 57460000.0, 66.983484], [1439897472.0, 57480000.0, 66.960496], [1439899200.0, 57500000.0, 66.9375], [1439900928.0, 57520000.0, 66.914496], [1439902656.0, 57540000.0, 66.891484], [1439904384.0, 57560000.0, 66.86846399999999], [1439906112.0, 57580000.0, 66.845436], [1439907840.0, 57600000.0, 66.8224], [1439909568.0, 57620000.0, 66.799356], [1439911296.0, 57640000.0, 66.776304], [1439913024.0, 57660000.0, 66.753244], [1439914752.0, 57680000.0, 66.730176], [1439916480.0, 57700000.0, 66.7071], [1439918208.0, 57720000.0, 66.684016], [1439919936.0, 57740000.0, 66.660924], [1439921664.0, 57760000.0, 66.637824], [1439923392.0, 57780000.0, 66.614716], [1439925120.0, 57800000.0, 66.5916], [1439926848.0, 57820000.0, 66.568476], [1439928576.0, 57840000.0, 66.545344], [1439930304.0, 57860000.0, 66.522204], [1439932032.0, 57880000.0, 66.499056], [1439933760.0, 57900000.0, 66.4759], [1439935488.0, 57920000.0, 66.452736], [1439937216.0, 57940000.0, 66.429564], [1439938944.0, 57960000.0, 66.406384], [1439940672.0, 57980000.0, 66.383196], [1439942400.0, 58000000.0, 66.36], [1439944128.0, 58020000.0, 66.336796], [1439945856.0, 58040000.0, 66.31358399999999], [1439947584.0, 58060000.0, 66.290364], [1439949312.0, 58080000.0, 66.267136], [1439951040.0, 58100000.0, 66.2439], [1439952768.0, 58120000.0, 66.22065599999999], [1439954496.0, 58140000.0, 66.197404], [1439956224.0, 58160000.0, 
66.17414400000001], [1439957952.0, 58180000.0, 66.15087600000001], [1439959680.0, 58200000.0, 66.1276], [1439961408.0, 58220000.0, 66.104316], [1439963136.0, 58240000.0, 66.081024], [1439964864.0, 58260000.0, 66.05772400000001], [1439966592.0, 58280000.0, 66.034416], [1439968320.0, 58300000.0, 66.0111], [1439970048.0, 58320000.0, 65.987776], [1439971776.0, 58340000.0, 65.96444400000001], [1439973504.0, 58360000.0, 65.941104], [1439975232.0, 58380000.0, 65.917756], [1439976960.0, 58400000.0, 65.8944], [1439978688.0, 58420000.0, 65.871036], [1439980416.0, 58440000.0, 65.847664], [1439982144.0, 58460000.0, 65.824284], [1439983872.0, 58480000.0, 65.800896], [1439985600.0, 58500000.0, 65.7775], [1439987328.0, 58520000.0, 65.754096], [1439989056.0, 58540000.0, 65.730684], [1439990784.0, 58560000.0, 65.70726400000001], [1439992512.0, 58580000.0, 65.68383600000001], [1439994240.0, 58600000.0, 65.6604], [1439995968.0, 58620000.0, 65.636956], [1439997696.0, 58640000.0, 65.613504], [1439999424.0, 58660000.0, 65.590044], [1440001152.0, 58680000.0, 65.566576], [1440002880.0, 58700000.0, 65.5431], [1440004608.0, 58720000.0, 65.519616], [1440006336.0, 58740000.0, 65.49612400000001], [1440008064.0, 58760000.0, 65.472624], [1440009792.0, 58780000.0, 65.449116], [1440011520.0, 58800000.0, 65.4256], [1440013248.0, 58820000.0, 65.402076], [1440014976.0, 58840000.0, 65.378544], [1440016704.0, 58860000.0, 65.35500400000001], [1440018432.0, 58880000.0, 65.331456], [1440020160.0, 58900000.0, 65.3079], [1440021888.0, 58920000.0, 65.284336], [1440023616.0, 58940000.0, 65.260764], [1440025344.0, 58960000.0, 65.237184], [1440027072.0, 58980000.0, 65.213596], [1440028800.0, 59000000.0, 65.19], [1440030528.0, 59020000.0, 65.16639599999999], [1440032256.0, 59040000.0, 65.142784], [1440033984.0, 59060000.0, 65.11916400000001], [1440035712.0, 59080000.0, 65.095536], [1440037440.0, 59100000.0, 65.0719], [1440039168.0, 59120000.0, 65.04825600000001], [1440040896.0, 59140000.0, 65.02460400000001], [1440042624.0, 59160000.0, 65.000944], [1440044352.0, 59180000.0, 64.97727599999999], [1440046080.0, 59200000.0, 64.9536], [1440047808.0, 59220000.0, 64.929916], [1440049536.0, 59240000.0, 64.906224], [1440051264.0, 59260000.0, 64.88252399999999], [1440052992.0, 59280000.0, 64.85881599999999], [1440054720.0, 59300000.0, 64.83510000000001], [1440056448.0, 59320000.0, 64.811376], [1440058176.0, 59340000.0, 64.787644], [1440059904.0, 59360000.0, 64.763904], [1440061632.0, 59380000.0, 64.74015600000001], [1440063360.0, 59400000.0, 64.7164], [1440065088.0, 59420000.0, 64.692636], [1440066816.0, 59440000.0, 64.668864], [1440068544.0, 59460000.0, 64.645084], [1440070272.0, 59480000.0, 64.621296], [1440072000.0, 59500000.0, 64.5975], [1440073728.0, 59520000.0, 64.573696], [1440075456.0, 59540000.0, 64.549884], [1440077184.0, 59560000.0, 64.52606399999999], [1440078912.0, 59580000.0, 64.502236], [1440080640.0, 59600000.0, 64.4784], [1440082368.0, 59620000.0, 64.454556], [1440084096.0, 59640000.0, 64.43070399999999], [1440085824.0, 59660000.0, 64.406844], [1440087552.0, 59680000.0, 64.382976], [1440089280.0, 59700000.0, 64.35910000000001], [1440091008.0, 59720000.0, 64.335216], [1440092736.0, 59740000.0, 64.311324], [1440094464.0, 59760000.0, 64.287424], [1440096192.0, 59780000.0, 64.26351600000001], [1440097920.0, 59800000.0, 64.2396], [1440099648.0, 59820000.0, 64.215676], [1440101376.0, 59840000.0, 64.191744], [1440103104.0, 59860000.0, 64.167804], [1440104832.0, 59880000.0, 64.143856], [1440106560.0, 59900000.0, 64.1199], [1440108288.0, 
59920000.0, 64.095936], [1440110016.0, 59940000.0, 64.07196400000001], [1440111744.0, 59960000.0, 64.04798399999999], [1440113472.0, 59980000.0, 64.023996], [1440115200.0, 60000000.0, 64.0], [1440116928.0, 60020000.0, 63.975996], [1440118656.0, 60040000.0, 63.951984], [1440120384.0, 60060000.0, 63.927963999999996], [1440122112.0, 60080000.0, 63.903936], [1440123840.0, 60100000.0, 63.8799], [1440125568.0, 60120000.0, 63.855856], [1440127296.0, 60140000.0, 63.831804], [1440129024.0, 60160000.0, 63.807744], [1440130752.0, 60180000.0, 63.783676], [1440132480.0, 60200000.0, 63.759600000000006], [1440134208.0, 60220000.0, 63.735516], [1440135936.0, 60240000.0, 63.711424], [1440137664.0, 60260000.0, 63.687324000000004], [1440139392.0, 60280000.0, 63.663216000000006], [1440141120.0, 60300000.0, 63.6391], [1440142848.0, 60320000.0, 63.614976], [1440144576.0, 60340000.0, 63.590844000000004], [1440146304.0, 60360000.0, 63.566704], [1440148032.0, 60380000.0, 63.542556], [1440149760.0, 60400000.0, 63.5184], [1440151488.0, 60420000.0, 63.494236], [1440153216.0, 60440000.0, 63.47006400000001], [1440154944.0, 60460000.0, 63.445884], [1440156672.0, 60480000.0, 63.421696], [1440158400.0, 60500000.0, 63.3975], [1440160128.0, 60520000.0, 63.373296], [1440161856.0, 60540000.0, 63.349084], [1440163584.0, 60560000.0, 63.324864], [1440165312.0, 60580000.0, 63.300636000000004], [1440167040.0, 60600000.0, 63.2764], [1440168768.0, 60620000.0, 63.252156], [1440170496.0, 60640000.0, 63.227904], [1440172224.0, 60660000.0, 63.203644000000004], [1440173952.0, 60680000.0, 63.179376000000005], [1440175680.0, 60700000.0, 63.1551], [1440177408.0, 60720000.0, 63.130815999999996], [1440179136.0, 60740000.0, 63.106524], [1440180864.0, 60760000.0, 63.082224000000004], [1440182592.0, 60780000.0, 63.057916], [1440184320.0, 60800000.0, 63.0336], [1440186048.0, 60820000.0, 63.009276], [1440187776.0, 60840000.0, 62.984944000000006], [1440189504.0, 60860000.0, 62.960604], [1440191232.0, 60880000.0, 62.936256], [1440192960.0, 60900000.0, 62.9119], [1440194688.0, 60920000.0, 62.887536000000004], [1440196416.0, 60940000.0, 62.863164], [1440198144.0, 60960000.0, 62.838784], [1440199872.0, 60980000.0, 62.814396], [1440201600.0, 61000000.0, 62.790000000000006], [1440203328.0, 61020000.0, 62.765595999999995], [1440205056.0, 61040000.0, 62.741184], [1440206784.0, 61060000.0, 62.716764000000005], [1440208512.0, 61080000.0, 62.692336000000005], [1440210240.0, 61100000.0, 62.667899999999996], [1440211968.0, 61120000.0, 62.643456], [1440213696.0, 61140000.0, 62.619004000000004], [1440215424.0, 61160000.0, 62.594544000000006], [1440217152.0, 61180000.0, 62.57007599999999], [1440218880.0, 61200000.0, 62.5456], [1440220608.0, 61220000.0, 62.521116], [1440222336.0, 61240000.0, 62.496624000000004], [1440224064.0, 61260000.0, 62.472123999999994], [1440225792.0, 61280000.0, 62.447616], [1440227520.0, 61300000.0, 62.4231], [1440229248.0, 61320000.0, 62.398576000000006], [1440230976.0, 61340000.0, 62.374044], [1440232704.0, 61360000.0, 62.349503999999996], [1440234432.0, 61380000.0, 62.324956], [1440236160.0, 61400000.0, 62.3004], [1440237888.0, 61420000.0, 62.275836], [1440239616.0, 61440000.0, 62.251264], [1440241344.0, 61460000.0, 62.226684], [1440243072.0, 61480000.0, 62.202096000000004], [1440244800.0, 61500000.0, 62.177499999999995], [1440246528.0, 61520000.0, 62.152896], [1440248256.0, 61540000.0, 62.128284], [1440249984.0, 61560000.0, 62.103664], [1440251712.0, 61580000.0, 62.079035999999995], [1440253440.0, 61600000.0, 62.0544], [1440255168.0, 
61620000.0, 62.029756], [1440256896.0, 61640000.0, 62.005104], [1440258624.0, 61660000.0, 61.980444], [1440260352.0, 61680000.0, 61.955776], [1440262080.0, 61700000.0, 61.9311], [1440263808.0, 61720000.0, 61.906416], [1440265536.0, 61740000.0, 61.881724], [1440267264.0, 61760000.0, 61.857023999999996], [1440268992.0, 61780000.0, 61.832316], [1440270720.0, 61800000.0, 61.8076], [1440272448.0, 61820000.0, 61.782875999999995], [1440274176.0, 61840000.0, 61.758144], [1440275904.0, 61860000.0, 61.733404], [1440277632.0, 61880000.0, 61.708656000000005], [1440279360.0, 61900000.0, 61.683899999999994], [1440281088.0, 61920000.0, 61.659136], [1440282816.0, 61940000.0, 61.634364], [1440284544.0, 61960000.0, 61.609584000000005], [1440286272.0, 61980000.0, 61.584796], [1440288000.0, 62000000.0, 61.559999999999995], [1440289728.0, 62020000.0, 61.535196], [1440291456.0, 62040000.0, 61.510384], [1440293184.0, 62060000.0, 61.485564], [1440294912.0, 62080000.0, 61.460736], [1440296640.0, 62100000.0, 61.435900000000004], [1440298368.0, 62120000.0, 61.411056], [1440300096.0, 62140000.0, 61.38620399999999], [1440301824.0, 62160000.0, 61.361343999999995], [1440303552.0, 62180000.0, 61.336476], [1440305280.0, 62200000.0, 61.311600000000006], [1440307008.0, 62220000.0, 61.28671599999999], [1440308736.0, 62240000.0, 61.261824], [1440310464.0, 62260000.0, 61.236924], [1440312192.0, 62280000.0, 61.212016000000006], [1440313920.0, 62300000.0, 61.187099999999994], [1440315648.0, 62320000.0, 61.162175999999995], [1440317376.0, 62340000.0, 61.137244], [1440319104.0, 62360000.0, 61.112304], [1440320832.0, 62380000.0, 61.08735599999999], [1440322560.0, 62400000.0, 61.0624], [1440324288.0, 62420000.0, 61.037436], [1440326016.0, 62440000.0, 61.012464], [1440327744.0, 62460000.0, 60.987483999999995], [1440329472.0, 62480000.0, 60.962495999999994], [1440331200.0, 62500000.0, 60.9375], [1440332928.0, 62520000.0, 60.912496000000004], [1440334656.0, 62540000.0, 60.88748400000001], [1440336384.0, 62560000.0, 60.862463999999996], [1440338112.0, 62580000.0, 60.837436], [1440339840.0, 62600000.0, 60.812400000000004], [1440341568.0, 62620000.0, 60.787356], [1440343296.0, 62640000.0, 60.762304], [1440345024.0, 62660000.0, 60.737244], [1440346752.0, 62680000.0, 60.712176], [1440348480.0, 62700000.0, 60.68710000000001], [1440350208.0, 62720000.0, 60.662015999999994], [1440351936.0, 62740000.0, 60.636924], [1440353664.0, 62760000.0, 60.611824000000006], [1440355392.0, 62780000.0, 60.586716], [1440357120.0, 62800000.0, 60.5616], [1440358848.0, 62820000.0, 60.536476], [1440360576.0, 62840000.0, 60.511344], [1440362304.0, 62860000.0, 60.48620400000001], [1440364032.0, 62880000.0, 60.461056], [1440365760.0, 62900000.0, 60.4359], [1440367488.0, 62920000.0, 60.410736], [1440369216.0, 62940000.0, 60.385564], [1440370944.0, 62960000.0, 60.360383999999996], [1440372672.0, 62980000.0, 60.335195999999996], [1440374400.0, 63000000.0, 60.31], [1440376128.0, 63020000.0, 60.28479600000001], [1440377856.0, 63040000.0, 60.259584], [1440379584.0, 63060000.0, 60.234364], [1440381312.0, 63080000.0, 60.209136], [1440383040.0, 63100000.0, 60.18390000000001], [1440384768.0, 63120000.0, 60.15865599999999], [1440386496.0, 63140000.0, 60.133404], [1440388224.0, 63160000.0, 60.108144], [1440389952.0, 63180000.0, 60.082876000000006], [1440391680.0, 63200000.0, 60.057599999999994], [1440393408.0, 63220000.0, 60.032316], [1440395136.0, 63240000.0, 60.007024], [1440396864.0, 63260000.0, 59.98172400000001], [1440398592.0, 63280000.0, 59.956416], [1440400320.0, 
63300000.0, 59.9311], [1440402048.0, 63320000.0, 59.905776], [1440403776.0, 63340000.0, 59.880444000000004], [1440405504.0, 63360000.0, 59.855104], [1440407232.0, 63380000.0, 59.829755999999996], [1440408960.0, 63400000.0, 59.8044], [1440410688.0, 63420000.0, 59.779036000000005], [1440412416.0, 63440000.0, 59.75366399999999], [1440414144.0, 63460000.0, 59.728284], [1440415872.0, 63480000.0, 59.702896], [1440417600.0, 63500000.0, 59.6775], [1440419328.0, 63520000.0, 59.65209599999999], [1440421056.0, 63540000.0, 59.626684], [1440422784.0, 63560000.0, 59.601264], [1440424512.0, 63580000.0, 59.575836], [1440426240.0, 63600000.0, 59.550399999999996], [1440427968.0, 63620000.0, 59.524955999999996], [1440429696.0, 63640000.0, 59.499504], [1440431424.0, 63660000.0, 59.474044000000006], [1440433152.0, 63680000.0, 59.448575999999996], [1440434880.0, 63700000.0, 59.4231], [1440436608.0, 63720000.0, 59.397616], [1440438336.0, 63740000.0, 59.37212400000001], [1440440064.0, 63760000.0, 59.346624], [1440441792.0, 63780000.0, 59.321115999999996], [1440443520.0, 63800000.0, 59.2956], [1440445248.0, 63820000.0, 59.270076], [1440446976.0, 63840000.0, 59.244544], [1440448704.0, 63860000.0, 59.219004], [1440450432.0, 63880000.0, 59.193456000000005], [1440452160.0, 63900000.0, 59.1679], [1440453888.0, 63920000.0, 59.14233599999999], [1440455616.0, 63940000.0, 59.116763999999996], [1440457344.0, 63960000.0, 59.091184], [1440459072.0, 63980000.0, 59.065596000000006], [1440460800.0, 64000000.0, 59.03999999999999], [1440462528.0, 64020000.0, 59.014396], [1440464256.0, 64040000.0, 58.988784], [1440465984.0, 64060000.0, 58.963164000000006], [1440467712.0, 64080000.0, 58.937535999999994], [1440469440.0, 64100000.0, 58.911899999999996], [1440471168.0, 64120000.0, 58.886256], [1440472896.0, 64140000.0, 58.860604], [1440474624.0, 64160000.0, 58.83494399999999], [1440476352.0, 64180000.0, 58.809276], [1440478080.0, 64200000.0, 58.7836], [1440479808.0, 64220000.0, 58.757916], [1440481536.0, 64240000.0, 58.732223999999995], [1440483264.0, 64260000.0, 58.706523999999995], [1440484992.0, 64280000.0, 58.680816], [1440486720.0, 64300000.0, 58.655100000000004], [1440488448.0, 64320000.0, 58.62937599999999], [1440490176.0, 64340000.0, 58.603643999999996], [1440491904.0, 64360000.0, 58.577904000000004], [1440493632.0, 64380000.0, 58.552156000000004], [1440495360.0, 64400000.0, 58.526399999999995], [1440497088.0, 64420000.0, 58.500636], [1440498816.0, 64440000.0, 58.474864000000004], [1440500544.0, 64460000.0, 58.449084000000006], [1440502272.0, 64480000.0, 58.42329599999999], [1440504000.0, 64500000.0, 58.3975], [1440505728.0, 64520000.0, 58.371696], [1440507456.0, 64540000.0, 58.345884000000005], [1440509184.0, 64560000.0, 58.320063999999995], [1440510912.0, 64580000.0, 58.294236], [1440512640.0, 64600000.0, 58.2684], [1440514368.0, 64620000.0, 58.242556], [1440516096.0, 64640000.0, 58.21670399999999], [1440517824.0, 64660000.0, 58.190844], [1440519552.0, 64680000.0, 58.164976], [1440521280.0, 64700000.0, 58.139100000000006], [1440523008.0, 64720000.0, 58.113215999999994], [1440524736.0, 64740000.0, 58.087323999999995], [1440526464.0, 64760000.0, 58.061424], [1440528192.0, 64780000.0, 58.035516], [1440529920.0, 64800000.0, 58.00959999999999], [1440531648.0, 64820000.0, 57.983675999999996], [1440533376.0, 64840000.0, 57.957744], [1440535104.0, 64860000.0, 57.931804], [1440536832.0, 64880000.0, 57.90585599999999], [1440538560.0, 64900000.0, 57.8799], [1440540288.0, 64920000.0, 57.853936], [1440542016.0, 64940000.0, 57.827964], 
[1440543744.0, 64960000.0, 57.801984], [1440545472.0, 64980000.0, 57.775996], [1440547200.0, 65000000.0, 57.75], [1440548928.0, 65020000.0, 57.723996], [1440550656.0, 65040000.0, 57.697984000000005], [1440552384.0, 65060000.0, 57.671963999999996], [1440554112.0, 65080000.0, 57.645936], [1440555840.0, 65100000.0, 57.6199], [1440557568.0, 65120000.0, 57.593856], [1440559296.0, 65140000.0, 57.567803999999995], [1440561024.0, 65160000.0, 57.541744], [1440562752.0, 65180000.0, 57.515676], [1440564480.0, 65200000.0, 57.4896], [1440566208.0, 65220000.0, 57.463516], [1440567936.0, 65240000.0, 57.437424], [1440569664.0, 65260000.0, 57.411324], [1440571392.0, 65280000.0, 57.38521600000001], [1440573120.0, 65300000.0, 57.3591], [1440574848.0, 65320000.0, 57.332976], [1440576576.0, 65340000.0, 57.306844000000005], [1440578304.0, 65360000.0, 57.28070400000001], [1440580032.0, 65380000.0, 57.254555999999994], [1440581760.0, 65400000.0, 57.2284], [1440583488.0, 65420000.0, 57.202236], [1440585216.0, 65440000.0, 57.176064000000004], [1440586944.0, 65460000.0, 57.14988399999999], [1440588672.0, 65480000.0, 57.123696], [1440590400.0, 65500000.0, 57.097500000000004], [1440592128.0, 65520000.0, 57.071296000000004], [1440593856.0, 65540000.0, 57.045083999999996], [1440595584.0, 65560000.0, 57.018864], [1440597312.0, 65580000.0, 56.992636000000005], [1440599040.0, 65600000.0, 56.96640000000001], [1440600768.0, 65620000.0, 56.940155999999995], [1440602496.0, 65640000.0, 56.913904], [1440604224.0, 65660000.0, 56.887644], [1440605952.0, 65680000.0, 56.86137600000001], [1440607680.0, 65700000.0, 56.8351], [1440609408.0, 65720000.0, 56.808816], [1440611136.0, 65740000.0, 56.782524], [1440612864.0, 65760000.0, 56.756224], [1440614592.0, 65780000.0, 56.729915999999996], [1440616320.0, 65800000.0, 56.7036], [1440618048.0, 65820000.0, 56.677276], [1440619776.0, 65840000.0, 56.650944], [1440621504.0, 65860000.0, 56.624604], [1440623232.0, 65880000.0, 56.598256], [1440624960.0, 65900000.0, 56.5719], [1440626688.0, 65920000.0, 56.545536000000006], [1440628416.0, 65940000.0, 56.519163999999996], [1440630144.0, 65960000.0, 56.492784], [1440631872.0, 65980000.0, 56.466396], [1440633600.0, 66000000.0, 56.440000000000005], [1440635328.0, 66020000.0, 56.413596], [1440637056.0, 66040000.0, 56.387184], [1440638784.0, 66060000.0, 56.360764], [1440640512.0, 66080000.0, 56.33433600000001], [1440642240.0, 66100000.0, 56.3079], [1440643968.0, 66120000.0, 56.281456], [1440645696.0, 66140000.0, 56.255004], [1440647424.0, 66160000.0, 56.22854400000001], [1440649152.0, 66180000.0, 56.202076], [1440650880.0, 66200000.0, 56.175599999999996], [1440652608.0, 66220000.0, 56.149116], [1440654336.0, 66240000.0, 56.122624], [1440656064.0, 66260000.0, 56.096123999999996], [1440657792.0, 66280000.0, 56.069615999999996], [1440659520.0, 66300000.0, 56.0431], [1440661248.0, 66320000.0, 56.01657600000001], [1440662976.0, 66340000.0, 55.990044], [1440664704.0, 66360000.0, 55.963504], [1440666432.0, 66380000.0, 55.936956], [1440668160.0, 66400000.0, 55.9104], [1440669888.0, 66420000.0, 55.883835999999995], [1440671616.0, 66440000.0, 55.857264], [1440673344.0, 66460000.0, 55.830684], [1440675072.0, 66480000.0, 55.804096], [1440676800.0, 66500000.0, 55.777499999999996], [1440678528.0, 66520000.0, 55.750896], [1440680256.0, 66540000.0, 55.724284000000004], [1440681984.0, 66560000.0, 55.697664], [1440683712.0, 66580000.0, 55.671035999999994], [1440685440.0, 66600000.0, 55.6444], [1440687168.0, 66620000.0, 55.617756], [1440688896.0, 66640000.0, 55.591104], 
[1440690624.0, 66660000.0, 55.564443999999995], [1440692352.0, 66680000.0, 55.537776], [1440694080.0, 66700000.0, 55.5111], [1440695808.0, 66720000.0, 55.484416], [1440697536.0, 66740000.0, 55.45772399999999], [1440699264.0, 66760000.0, 55.431024], [1440700992.0, 66780000.0, 55.404316], [1440702720.0, 66800000.0, 55.3776], [1440704448.0, 66820000.0, 55.35087599999999], [1440706176.0, 66840000.0, 55.324144], [1440707904.0, 66860000.0, 55.297404], [1440709632.0, 66880000.0, 55.270656], [1440711360.0, 66900000.0, 55.2439], [1440713088.0, 66920000.0, 55.217135999999996], [1440714816.0, 66940000.0, 55.190364], [1440716544.0, 66960000.0, 55.163584], [1440718272.0, 66980000.0, 55.136796], [1440720000.0, 67000000.0, 55.11], [1440721728.0, 67020000.0, 55.083196], [1440723456.0, 67040000.0, 55.056384], [1440725184.0, 67060000.0, 55.02956399999999], [1440726912.0, 67080000.0, 55.002736], [1440728640.0, 67100000.0, 54.9759], [1440730368.0, 67120000.0, 54.949056000000006], [1440732096.0, 67140000.0, 54.922203999999994], [1440733824.0, 67160000.0, 54.895343999999994], [1440735552.0, 67180000.0, 54.868476], [1440737280.0, 67200000.0, 54.84160000000001], [1440739008.0, 67220000.0, 54.814716], [1440740736.0, 67240000.0, 54.787824], [1440742464.0, 67260000.0, 54.760924], [1440744192.0, 67280000.0, 54.734016000000004], [1440745920.0, 67300000.0, 54.7071], [1440747648.0, 67320000.0, 54.680175999999996], [1440749376.0, 67340000.0, 54.653244], [1440751104.0, 67360000.0, 54.626304000000005], [1440752832.0, 67380000.0, 54.59935599999999], [1440754560.0, 67400000.0, 54.572399999999995], [1440756288.0, 67420000.0, 54.545436], [1440758016.0, 67440000.0, 54.518464], [1440759744.0, 67460000.0, 54.49148399999999], [1440761472.0, 67480000.0, 54.464496], [1440763200.0, 67500000.0, 54.4375], [1440764928.0, 67520000.0, 54.410496], [1440766656.0, 67540000.0, 54.383484], [1440768384.0, 67560000.0, 54.356463999999995], [1440770112.0, 67580000.0, 54.329436], [1440771840.0, 67600000.0, 54.302400000000006], [1440773568.0, 67620000.0, 54.27535600000001], [1440775296.0, 67640000.0, 54.248304], [1440777024.0, 67660000.0, 54.221244], [1440778752.0, 67680000.0, 54.194176000000006], [1440780480.0, 67700000.0, 54.167100000000005], [1440782208.0, 67720000.0, 54.140015999999996], [1440783936.0, 67740000.0, 54.112924], [1440785664.0, 67760000.0, 54.085824], [1440787392.0, 67780000.0, 54.058716000000004], [1440789120.0, 67800000.0, 54.0316], [1440790848.0, 67820000.0, 54.004476], [1440792576.0, 67840000.0, 53.977344], [1440794304.0, 67860000.0, 53.95020400000001], [1440796032.0, 67880000.0, 53.923055999999995], [1440797760.0, 67900000.0, 53.8959], [1440799488.0, 67920000.0, 53.868736000000006], [1440801216.0, 67940000.0, 53.841564000000005], [1440802944.0, 67960000.0, 53.814384], [1440804672.0, 67980000.0, 53.787196], [1440806400.0, 68000000.0, 53.760000000000005], [1440808128.0, 68020000.0, 53.73279600000001], [1440809856.0, 68040000.0, 53.705583999999995], [1440811584.0, 68060000.0, 53.678364], [1440813312.0, 68080000.0, 53.651136], [1440815040.0, 68100000.0, 53.623900000000006], [1440816768.0, 68120000.0, 53.596655999999996], [1440818496.0, 68140000.0, 53.569404], [1440820224.0, 68160000.0, 53.542144], [1440821952.0, 68180000.0, 53.51487600000001], [1440823680.0, 68200000.0, 53.48759999999999], [1440825408.0, 68220000.0, 53.460316], [1440827136.0, 68240000.0, 53.433024], [1440828864.0, 68260000.0, 53.405724000000006], [1440830592.0, 68280000.0, 53.378415999999994], [1440832320.0, 68300000.0, 53.3511], [1440834048.0, 68320000.0, 
53.323776], [1440835776.0, 68340000.0, 53.29644400000001], [1440837504.0, 68360000.0, 53.269104], [1440839232.0, 68380000.0, 53.241756], [1440840960.0, 68400000.0, 53.214400000000005], [1440842688.0, 68420000.0, 53.187036000000006], [1440844416.0, 68440000.0, 53.159664], [1440846144.0, 68460000.0, 53.132284], [1440847872.0, 68480000.0, 53.104896000000004], [1440849600.0, 68500000.0, 53.07750000000001], [1440851328.0, 68520000.0, 53.050095999999996], [1440853056.0, 68540000.0, 53.022684], [1440854784.0, 68560000.0, 52.995264], [1440856512.0, 68580000.0, 52.967836000000005], [1440858240.0, 68600000.0, 52.9404], [1440859968.0, 68620000.0, 52.912956], [1440861696.0, 68640000.0, 52.885504000000005], [1440863424.0, 68660000.0, 52.85804400000001], [1440865152.0, 68680000.0, 52.83057599999999], [1440866880.0, 68700000.0, 52.8031], [1440868608.0, 68720000.0, 52.775616], [1440870336.0, 68740000.0, 52.748124000000004], [1440872064.0, 68760000.0, 52.720623999999994], [1440873792.0, 68780000.0, 52.693115999999996], [1440875520.0, 68800000.0, 52.665600000000005], [1440877248.0, 68820000.0, 52.638076000000005], [1440878976.0, 68840000.0, 52.610544], [1440880704.0, 68860000.0, 52.583003999999995], [1440882432.0, 68880000.0, 52.555456], [1440884160.0, 68900000.0, 52.5279], [1440885888.0, 68920000.0, 52.500336], [1440887616.0, 68940000.0, 52.472764], [1440889344.0, 68960000.0, 52.445184000000005], [1440891072.0, 68980000.0, 52.417596], [1440892800.0, 69000000.0, 52.38999999999999], [1440894528.0, 69020000.0, 52.362396], [1440896256.0, 69040000.0, 52.334784], [1440897984.0, 69060000.0, 52.30716400000001], [1440899712.0, 69080000.0, 52.27953599999999], [1440901440.0, 69100000.0, 52.2519], [1440903168.0, 69120000.0, 52.224256000000004], [1440904896.0, 69140000.0, 52.196604], [1440906624.0, 69160000.0, 52.168943999999996], [1440908352.0, 69180000.0, 52.141276], [1440910080.0, 69200000.0, 52.1136], [1440911808.0, 69220000.0, 52.085916000000005], [1440913536.0, 69240000.0, 52.058223999999996], [1440915264.0, 69260000.0, 52.030524], [1440916992.0, 69280000.0, 52.002816], [1440918720.0, 69300000.0, 51.975100000000005], [1440920448.0, 69320000.0, 51.94737599999999], [1440922176.0, 69340000.0, 51.919644], [1440923904.0, 69360000.0, 51.891904000000004], [1440925632.0, 69380000.0, 51.864156], [1440927360.0, 69400000.0, 51.8364], [1440929088.0, 69420000.0, 51.808636], [1440930816.0, 69440000.0, 51.780864], [1440932544.0, 69460000.0, 51.753084], [1440934272.0, 69480000.0, 51.72529599999999], [1440936000.0, 69500000.0, 51.6975], [1440937728.0, 69520000.0, 51.669696], [1440939456.0, 69540000.0, 51.641884000000005], [1440941184.0, 69560000.0, 51.61406399999999], [1440942912.0, 69580000.0, 51.586236], [1440944640.0, 69600000.0, 51.5584], [1440946368.0, 69620000.0, 51.530556000000004], [1440948096.0, 69640000.0, 51.502703999999994], [1440949824.0, 69660000.0, 51.474844], [1440951552.0, 69680000.0, 51.446976], [1440953280.0, 69700000.0, 51.4191], [1440955008.0, 69720000.0, 51.39121599999999], [1440956736.0, 69740000.0, 51.363324], [1440958464.0, 69760000.0, 51.335424], [1440960192.0, 69780000.0, 51.30751600000001], [1440961920.0, 69800000.0, 51.279599999999995], [1440963648.0, 69820000.0, 51.251675999999996], [1440965376.0, 69840000.0, 51.223744], [1440967104.0, 69860000.0, 51.195804], [1440968832.0, 69880000.0, 51.16785599999999], [1440970560.0, 69900000.0, 51.1399], [1440972288.0, 69920000.0, 51.111936], [1440974016.0, 69940000.0, 51.083964], [1440975744.0, 69960000.0, 51.055983999999995], [1440977472.0, 69980000.0, 
51.027995999999995], [1440979200.0, 70000000.0, 51.0], [1440980928.0, 70020000.0, 50.971996000000004], [1440982656.0, 70040000.0, 50.94398400000001], [1440984384.0, 70060000.0, 50.915963999999995], [1440986112.0, 70080000.0, 50.887936], [1440987840.0, 70100000.0, 50.8599], [1440989568.0, 70120000.0, 50.83185600000001], [1440991296.0, 70140000.0, 50.803804], [1440993024.0, 70160000.0, 50.775744], [1440994752.0, 70180000.0, 50.747676000000006], [1440996480.0, 70200000.0, 50.71960000000001], [1440998208.0, 70220000.0, 50.691516], [1440999936.0, 70240000.0, 50.663424], [1441001664.0, 70260000.0, 50.635324000000004], [1441003392.0, 70280000.0, 50.60721600000001], [1441005120.0, 70300000.0, 50.5791], [1441006848.0, 70320000.0, 50.550976], [1441008576.0, 70340000.0, 50.522844], [1441010304.0, 70360000.0, 50.494704000000006], [1441012032.0, 70380000.0, 50.466556], [1441013760.0, 70400000.0, 50.4384], [1441015488.0, 70420000.0, 50.410236000000005], [1441017216.0, 70440000.0, 50.38206400000001], [1441018944.0, 70460000.0, 50.353883999999994], [1441020672.0, 70480000.0, 50.325696], [1441022400.0, 70500000.0, 50.2975], [1441024128.0, 70520000.0, 50.269296000000004], [1441025856.0, 70540000.0, 50.241083999999994], [1441027584.0, 70560000.0, 50.212863999999996], [1441029312.0, 70580000.0, 50.184636000000005], [1441031040.0, 70600000.0, 50.156400000000005], [1441032768.0, 70620000.0, 50.128156], [1441034496.0, 70640000.0, 50.099904], [1441036224.0, 70660000.0, 50.071644], [1441037952.0, 70680000.0, 50.043376], [1441039680.0, 70700000.0, 50.0151], [1441041408.0, 70720000.0, 49.986816], [1441043136.0, 70740000.0, 49.958524000000004], [1441044864.0, 70760000.0, 49.930224], [1441046592.0, 70780000.0, 49.90191599999999], [1441048320.0, 70800000.0, 49.873599999999996], [1441050048.0, 70820000.0, 49.845276000000005], [1441051776.0, 70840000.0, 49.81694400000001], [1441053504.0, 70860000.0, 49.78860399999999], [1441055232.0, 70880000.0, 49.760256], [1441056960.0, 70900000.0, 49.7319], [1441058688.0, 70920000.0, 49.70353600000001], [1441060416.0, 70940000.0, 49.675163999999995], [1441062144.0, 70960000.0, 49.646784], [1441063872.0, 70980000.0, 49.618396000000004], [1441065600.0, 71000000.0, 49.59], [1441067328.0, 71020000.0, 49.561595999999994], [1441069056.0, 71040000.0, 49.533184], [1441070784.0, 71060000.0, 49.504764], [1441072512.0, 71080000.0, 49.476336], [1441074240.0, 71100000.0, 49.4479], [1441075968.0, 71120000.0, 49.419456], [1441077696.0, 71140000.0, 49.391004], [1441079424.0, 71160000.0, 49.36254400000001], [1441081152.0, 71180000.0, 49.334075999999996], [1441082880.0, 71200000.0, 49.3056], [1441084608.0, 71220000.0, 49.277116], [1441086336.0, 71240000.0, 49.24862400000001], [1441088064.0, 71260000.0, 49.220124], [1441089792.0, 71280000.0, 49.191615999999996], [1441091520.0, 71300000.0, 49.1631], [1441093248.0, 71320000.0, 49.134576], [1441094976.0, 71340000.0, 49.106044], [1441096704.0, 71360000.0, 49.077504], [1441098432.0, 71380000.0, 49.048956000000004], [1441100160.0, 71400000.0, 49.0204], [1441101888.0, 71420000.0, 48.99183599999999], [1441103616.0, 71440000.0, 48.963263999999995], [1441105344.0, 71460000.0, 48.934684000000004], [1441107072.0, 71480000.0, 48.906096000000005], [1441108800.0, 71500000.0, 48.8775], [1441110528.0, 71520000.0, 48.848895999999996], [1441112256.0, 71540000.0, 48.820284], [1441113984.0, 71560000.0, 48.791664000000004], [1441115712.0, 71580000.0, 48.76303599999999], [1441117440.0, 71600000.0, 48.7344], [1441119168.0, 71620000.0, 48.705756], [1441120896.0, 71640000.0, 
48.67710400000001], [1441122624.0, 71660000.0, 48.648444], [1441124352.0, 71680000.0, 48.619775999999995], [1441126080.0, 71700000.0, 48.591100000000004], [1441127808.0, 71720000.0, 48.562416000000006], [1441129536.0, 71740000.0, 48.53372399999999], [1441131264.0, 71760000.0, 48.505024], [1441132992.0, 71780000.0, 48.476316000000004], [1441134720.0, 71800000.0, 48.4476], [1441136448.0, 71820000.0, 48.418876], [1441138176.0, 71840000.0, 48.390144], [1441139904.0, 71860000.0, 48.361404], [1441141632.0, 71880000.0, 48.33265600000001], [1441143360.0, 71900000.0, 48.30389999999999], [1441145088.0, 71920000.0, 48.275135999999996], [1441146816.0, 71940000.0, 48.246364], [1441148544.0, 71960000.0, 48.217584], [1441150272.0, 71980000.0, 48.188795999999996], [1441152000.0, 72000000.0, 48.16], [1441153728.0, 72020000.0, 48.131196], [1441155456.0, 72040000.0, 48.102384], [1441157184.0, 72060000.0, 48.073564], [1441158912.0, 72080000.0, 48.044736], [1441160640.0, 72100000.0, 48.0159], [1441162368.0, 72120000.0, 47.987056], [1441164096.0, 72140000.0, 47.958203999999995], [1441165824.0, 72160000.0, 47.929344], [1441167552.0, 72180000.0, 47.900476], [1441169280.0, 72200000.0, 47.8716], [1441171008.0, 72220000.0, 47.842715999999996], [1441172736.0, 72240000.0, 47.813824], [1441174464.0, 72260000.0, 47.784924000000004], [1441176192.0, 72280000.0, 47.756016], [1441177920.0, 72300000.0, 47.72709999999999], [1441179648.0, 72320000.0, 47.698176], [1441181376.0, 72340000.0, 47.669244], [1441183104.0, 72360000.0, 47.640304], [1441184832.0, 72380000.0, 47.611355999999994], [1441186560.0, 72400000.0, 47.5824], [1441188288.0, 72420000.0, 47.553436], [1441190016.0, 72440000.0, 47.524464], [1441191744.0, 72460000.0, 47.49548399999999], [1441193472.0, 72480000.0, 47.466496], [1441195200.0, 72500000.0, 47.4375], [1441196928.0, 72520000.0, 47.40849600000001], [1441198656.0, 72540000.0, 47.379484000000005], [1441200384.0, 72560000.0, 47.350463999999995], [1441202112.0, 72580000.0, 47.321436], [1441203840.0, 72600000.0, 47.2924], [1441205568.0, 72620000.0, 47.26335600000001], [1441207296.0, 72640000.0, 47.234303999999995], [1441209024.0, 72660000.0, 47.205244], [1441210752.0, 72680000.0, 47.176176000000005], [1441212480.0, 72700000.0, 47.14710000000001], [1441214208.0, 72720000.0, 47.118016], [1441215936.0, 72740000.0, 47.088924], [1441217664.0, 72760000.0, 47.059824000000006], [1441219392.0, 72780000.0, 47.030716000000005], [1441221120.0, 72800000.0, 47.001599999999996], [1441222848.0, 72820000.0, 46.972476], [1441224576.0, 72840000.0, 46.943344], [1441226304.0, 72860000.0, 46.914204000000005], [1441228032.0, 72880000.0, 46.885056], [1441229760.0, 72900000.0, 46.8559], [1441231488.0, 72920000.0, 46.826736000000004], [1441233216.0, 72940000.0, 46.79756400000001], [1441234944.0, 72960000.0, 46.768384], [1441236672.0, 72980000.0, 46.739196], [1441238400.0, 73000000.0, 46.71], [1441240128.0, 73020000.0, 46.68079600000001], [1441241856.0, 73040000.0, 46.65158399999999], [1441243584.0, 73060000.0, 46.622364], [1441245312.0, 73080000.0, 46.593136], [1441247040.0, 73100000.0, 46.563900000000004], [1441248768.0, 73120000.0, 46.534656], [1441250496.0, 73140000.0, 46.505404], [1441252224.0, 73160000.0, 46.476144000000005], [1441253952.0, 73180000.0, 46.446876], [1441255680.0, 73200000.0, 46.41759999999999], [1441257408.0, 73220000.0, 46.388315999999996], [1441259136.0, 73240000.0, 46.359024000000005], [1441260864.0, 73260000.0, 46.329724000000006], [1441262592.0, 73280000.0, 46.300416], [1441264320.0, 73300000.0, 46.2711], 
[1441266048.0, 73320000.0, 46.241776], [1441267776.0, 73340000.0, 46.212444000000005], [1441269504.0, 73360000.0, 46.18310399999999], [1441271232.0, 73380000.0, 46.153756], [1441272960.0, 73400000.0, 46.1244], [1441274688.0, 73420000.0, 46.09503600000001], [1441276416.0, 73440000.0, 46.065664], [1441278144.0, 73460000.0, 46.036284], [1441279872.0, 73480000.0, 46.006896000000005], [1441281600.0, 73500000.0, 45.977500000000006], [1441283328.0, 73520000.0, 45.94809599999999], [1441285056.0, 73540000.0, 45.918684], [1441286784.0, 73560000.0, 45.889264000000004], [1441288512.0, 73580000.0, 45.85983600000001], [1441290240.0, 73600000.0, 45.8304], [1441291968.0, 73620000.0, 45.800956], [1441293696.0, 73640000.0, 45.771504], [1441295424.0, 73660000.0, 45.74204400000001], [1441297152.0, 73680000.0, 45.712576], [1441298880.0, 73700000.0, 45.683099999999996], [1441300608.0, 73720000.0, 45.653616], [1441302336.0, 73740000.0, 45.624124], [1441304064.0, 73760000.0, 45.594623999999996], [1441305792.0, 73780000.0, 45.565115999999996], [1441307520.0, 73800000.0, 45.5356], [1441309248.0, 73820000.0, 45.50607600000001], [1441310976.0, 73840000.0, 45.476544], [1441312704.0, 73860000.0, 45.447004], [1441314432.0, 73880000.0, 45.417456], [1441316160.0, 73900000.0, 45.3879], [1441317888.0, 73920000.0, 45.358335999999994], [1441319616.0, 73940000.0, 45.328764], [1441321344.0, 73960000.0, 45.299184000000004], [1441323072.0, 73980000.0, 45.26959600000001], [1441324800.0, 74000000.0, 45.239999999999995], [1441326528.0, 74020000.0, 45.210395999999996], [1441328256.0, 74040000.0, 45.180784], [1441329984.0, 74060000.0, 45.151164], [1441331712.0, 74080000.0, 45.12153599999999], [1441333440.0, 74100000.0, 45.091899999999995], [1441335168.0, 74120000.0, 45.062256], [1441336896.0, 74140000.0, 45.032604000000006], [1441338624.0, 74160000.0, 45.00294399999999], [1441340352.0, 74180000.0, 44.973276], [1441342080.0, 74200000.0, 44.9436], [1441343808.0, 74220000.0, 44.91391600000001], [1441345536.0, 74240000.0, 44.884223999999996], [1441347264.0, 74260000.0, 44.854524], [1441348992.0, 74280000.0, 44.824816], [1441350720.0, 74300000.0, 44.795100000000005], [1441352448.0, 74320000.0, 44.765375999999996], [1441354176.0, 74340000.0, 44.735644], [1441355904.0, 74360000.0, 44.705904000000004], [1441357632.0, 74380000.0, 44.676156000000006], [1441359360.0, 74400000.0, 44.64639999999999], [1441361088.0, 74420000.0, 44.616636], [1441362816.0, 74440000.0, 44.586864], [1441364544.0, 74460000.0, 44.557084], [1441366272.0, 74480000.0, 44.52729599999999], [1441368000.0, 74500000.0, 44.497499999999995], [1441369728.0, 74520000.0, 44.467696000000004], [1441371456.0, 74540000.0, 44.437884000000004], [1441373184.0, 74560000.0, 44.408063999999996], [1441374912.0, 74580000.0, 44.378235999999994], [1441376640.0, 74600000.0, 44.3484], [1441378368.0, 74620000.0, 44.318556], [1441380096.0, 74640000.0, 44.288703999999996], [1441381824.0, 74660000.0, 44.258843999999996], [1441383552.0, 74680000.0, 44.228976], [1441385280.0, 74700000.0, 44.1991], [1441387008.0, 74720000.0, 44.16921599999999], [1441388736.0, 74740000.0, 44.139323999999995], [1441390464.0, 74760000.0, 44.109424], [1441392192.0, 74780000.0, 44.079516000000005], [1441393920.0, 74800000.0, 44.04959999999999], [1441395648.0, 74820000.0, 44.019676], [1441397376.0, 74840000.0, 43.989744], [1441399104.0, 74860000.0, 43.959804000000005], [1441400832.0, 74880000.0, 43.929855999999994], [1441402560.0, 74900000.0, 43.899899999999995], [1441404288.0, 74920000.0, 43.869936], [1441406016.0, 74940000.0, 
43.839964], [1441407744.0, 74960000.0, 43.80998399999999], [1441409472.0, 74980000.0, 43.779996], [1441411200.0, 75000000.0, 43.75], [1441412928.0, 75020000.0, 43.719996], [1441414656.0, 75040000.0, 43.68998400000001], [1441416384.0, 75060000.0, 43.659963999999995], [1441418112.0, 75080000.0, 43.629936], [1441419840.0, 75100000.0, 43.599900000000005], [1441421568.0, 75120000.0, 43.56985600000001], [1441423296.0, 75140000.0, 43.539804], [1441425024.0, 75160000.0, 43.509744], [1441426752.0, 75180000.0, 43.479676000000005], [1441428480.0, 75200000.0, 43.449600000000004], [1441430208.0, 75220000.0, 43.419515999999994], [1441431936.0, 75240000.0, 43.389424], [1441433664.0, 75260000.0, 43.359324], [1441435392.0, 75280000.0, 43.32921600000001], [1441437120.0, 75300000.0, 43.299099999999996], [1441438848.0, 75320000.0, 43.268976], [1441440576.0, 75340000.0, 43.238844], [1441442304.0, 75360000.0, 43.208704000000004], [1441444032.0, 75380000.0, 43.17855599999999], [1441445760.0, 75400000.0, 43.1484], [1441447488.0, 75420000.0, 43.118236], [1441449216.0, 75440000.0, 43.088064], [1441450944.0, 75460000.0, 43.057883999999994], [1441452672.0, 75480000.0, 43.027696], [1441454400.0, 75500000.0, 42.9975], [1441456128.0, 75520000.0, 42.967296000000005], [1441457856.0, 75540000.0, 42.937084], [1441459584.0, 75560000.0, 42.906864], [1441461312.0, 75580000.0, 42.876636000000005], [1441463040.0, 75600000.0, 42.8464], [1441464768.0, 75620000.0, 42.81615599999999], [1441466496.0, 75640000.0, 42.785904], [1441468224.0, 75660000.0, 42.755644000000004], [1441469952.0, 75680000.0, 42.725376000000004], [1441471680.0, 75700000.0, 42.6951], [1441473408.0, 75720000.0, 42.664816], [1441475136.0, 75740000.0, 42.634524], [1441476864.0, 75760000.0, 42.60422400000001], [1441478592.0, 75780000.0, 42.573916], [1441480320.0, 75800000.0, 42.5436], [1441482048.0, 75820000.0, 42.513276000000005], [1441483776.0, 75840000.0, 42.482944], [1441485504.0, 75860000.0, 42.452603999999994], [1441487232.0, 75880000.0, 42.422256], [1441488960.0, 75900000.0, 42.3919], [1441490688.0, 75920000.0, 42.36153600000001], [1441492416.0, 75940000.0, 42.331163999999994], [1441494144.0, 75960000.0, 42.300784], [1441495872.0, 75980000.0, 42.270396000000005], [1441497600.0, 76000000.0, 42.24], [1441499328.0, 76020000.0, 42.209596], [1441501056.0, 76040000.0, 42.179184], [1441502784.0, 76060000.0, 42.148764], [1441504512.0, 76080000.0, 42.118336000000006], [1441506240.0, 76100000.0, 42.0879], [1441507968.0, 76120000.0, 42.057456], [1441509696.0, 76140000.0, 42.027004000000005], [1441511424.0, 76160000.0, 41.99654400000001], [1441513152.0, 76180000.0, 41.966075999999994], [1441514880.0, 76200000.0, 41.9356], [1441516608.0, 76220000.0, 41.905116], [1441518336.0, 76240000.0, 41.874624000000004], [1441520064.0, 76260000.0, 41.844123999999994], [1441521792.0, 76280000.0, 41.813615999999996], [1441523520.0, 76300000.0, 41.783100000000005], [1441525248.0, 76320000.0, 41.752576000000005], [1441526976.0, 76340000.0, 41.722044], [1441528704.0, 76360000.0, 41.691503999999995], [1441530432.0, 76380000.0, 41.660956], [1441532160.0, 76400000.0, 41.6304], [1441533888.0, 76420000.0, 41.599835999999996], [1441535616.0, 76440000.0, 41.569264], [1441537344.0, 76460000.0, 41.538684], [1441539072.0, 76480000.0, 41.508096], [1441540800.0, 76500000.0, 41.47749999999999], [1441542528.0, 76520000.0, 41.446895999999995], [1441544256.0, 76540000.0, 41.416284000000005], [1441545984.0, 76560000.0, 41.385664000000006], [1441547712.0, 76580000.0, 41.35503599999999], [1441549440.0, 
76600000.0, 41.3244], [1441551168.0, 76620000.0, 41.293756], [1441552896.0, 76640000.0, 41.263104000000006], [1441554624.0, 76660000.0, 41.232443999999994], [1441556352.0, 76680000.0, 41.201775999999995], [1441558080.0, 76700000.0, 41.1711], [1441559808.0, 76720000.0, 41.140416], [1441561536.0, 76740000.0, 41.10972399999999], [1441563264.0, 76760000.0, 41.079024], [1441564992.0, 76780000.0, 41.048316], [1441566720.0, 76800000.0, 41.0176], [1441568448.0, 76820000.0, 40.986875999999995], [1441570176.0, 76840000.0, 40.956143999999995], [1441571904.0, 76860000.0, 40.925404], [1441573632.0, 76880000.0, 40.894656000000005], [1441575360.0, 76900000.0, 40.863899999999994], [1441577088.0, 76920000.0, 40.833135999999996], [1441578816.0, 76940000.0, 40.802364000000004], [1441580544.0, 76960000.0, 40.771584000000004], [1441582272.0, 76980000.0, 40.740795999999996], [1441584000.0, 77000000.0, 40.709999999999994], [1441585728.0, 77020000.0, 40.679196], [1441587456.0, 77040000.0, 40.64838400000001], [1441589184.0, 77060000.0, 40.617563999999994], [1441590912.0, 77080000.0, 40.586735999999995], [1441592640.0, 77100000.0, 40.5559], [1441594368.0, 77120000.0, 40.525056000000006], [1441596096.0, 77140000.0, 40.494203999999996], [1441597824.0, 77160000.0, 40.463344], [1441599552.0, 77180000.0, 40.432476], [1441601280.0, 77200000.0, 40.4016], [1441603008.0, 77220000.0, 40.370715999999994], [1441604736.0, 77240000.0, 40.339824], [1441606464.0, 77260000.0, 40.308924], [1441608192.0, 77280000.0, 40.278016], [1441609920.0, 77300000.0, 40.247099999999996], [1441611648.0, 77320000.0, 40.216176], [1441613376.0, 77340000.0, 40.185244], [1441615104.0, 77360000.0, 40.154304], [1441616832.0, 77380000.0, 40.123355999999994], [1441618560.0, 77400000.0, 40.0924], [1441620288.0, 77420000.0, 40.061436], [1441622016.0, 77440000.0, 40.030464], [1441623744.0, 77460000.0, 39.999483999999995], [1441625472.0, 77480000.0, 39.968495999999995], [1441627200.0, 77500000.0, 39.9375], [1441628928.0, 77520000.0, 39.906496000000004], [1441630656.0, 77540000.0, 39.87548400000001], [1441632384.0, 77560000.0, 39.844463999999995], [1441634112.0, 77580000.0, 39.813436], [1441635840.0, 77600000.0, 39.7824], [1441637568.0, 77620000.0, 39.75135600000001], [1441639296.0, 77640000.0, 39.720304], [1441641024.0, 77660000.0, 39.689244], [1441642752.0, 77680000.0, 39.658176000000005], [1441644480.0, 77700000.0, 39.627100000000006], [1441646208.0, 77720000.0, 39.596016], [1441647936.0, 77740000.0, 39.564924], [1441649664.0, 77760000.0, 39.533824], [1441651392.0, 77780000.0, 39.50271600000001], [1441653120.0, 77800000.0, 39.471599999999995], [1441654848.0, 77820000.0, 39.440476], [1441656576.0, 77840000.0, 39.409344000000004], [1441658304.0, 77860000.0, 39.378204000000004], [1441660032.0, 77880000.0, 39.347055999999995], [1441661760.0, 77900000.0, 39.3159], [1441663488.0, 77920000.0, 39.284736], [1441665216.0, 77940000.0, 39.253564000000004], [1441666944.0, 77960000.0, 39.222384], [1441668672.0, 77980000.0, 39.191196], [1441670400.0, 78000000.0, 39.160000000000004], [1441672128.0, 78020000.0, 39.12879600000001], [1441673856.0, 78040000.0, 39.097584], [1441675584.0, 78060000.0, 39.066364], [1441677312.0, 78080000.0, 39.035136], [1441679040.0, 78100000.0, 39.00390000000001], [1441680768.0, 78120000.0, 38.97265599999999], [1441682496.0, 78140000.0, 38.941404], [1441684224.0, 78160000.0, 38.910144], [1441685952.0, 78180000.0, 38.878876000000005], [1441687680.0, 78200000.0, 38.84759999999999], [1441689408.0, 78220000.0, 38.816316], [1441691136.0, 78240000.0, 
38.785024], [1441692864.0, 78260000.0, 38.753724000000005], [1441694592.0, 78280000.0, 38.722415999999996], [1441696320.0, 78300000.0, 38.6911], [1441698048.0, 78320000.0, 38.659776], [1441699776.0, 78340000.0, 38.62844400000001], [1441701504.0, 78360000.0, 38.597103999999995], [1441703232.0, 78380000.0, 38.565756], [1441704960.0, 78400000.0, 38.534400000000005], [1441706688.0, 78420000.0, 38.50303600000001], [1441708416.0, 78440000.0, 38.471664], [1441710144.0, 78460000.0, 38.440284], [1441711872.0, 78480000.0, 38.408896], [1441713600.0, 78500000.0, 38.377500000000005], [1441715328.0, 78520000.0, 38.346095999999996], [1441717056.0, 78540000.0, 38.314684], [1441718784.0, 78560000.0, 38.283264], [1441720512.0, 78580000.0, 38.251836000000004], [1441722240.0, 78600000.0, 38.2204], [1441723968.0, 78620000.0, 38.188956], [1441725696.0, 78640000.0, 38.157504], [1441727424.0, 78660000.0, 38.12604400000001], [1441729152.0, 78680000.0, 38.094575999999996], [1441730880.0, 78700000.0, 38.0631], [1441732608.0, 78720000.0, 38.031616], [1441734336.0, 78740000.0, 38.00012400000001], [1441736064.0, 78760000.0, 37.96862399999999], [1441737792.0, 78780000.0, 37.937115999999996], [1441739520.0, 78800000.0, 37.9056], [1441741248.0, 78820000.0, 37.874076], [1441742976.0, 78840000.0, 37.842544], [1441744704.0, 78860000.0, 37.811004], [1441746432.0, 78880000.0, 37.779456], [1441748160.0, 78900000.0, 37.74790000000001], [1441749888.0, 78920000.0, 37.71633599999999], [1441751616.0, 78940000.0, 37.684764], [1441753344.0, 78960000.0, 37.653184], [1441755072.0, 78980000.0, 37.621596000000004], [1441756800.0, 79000000.0, 37.589999999999996], [1441758528.0, 79020000.0, 37.558395999999995], [1441760256.0, 79040000.0, 37.526784], [1441761984.0, 79060000.0, 37.495164], [1441763712.0, 79080000.0, 37.463536], [1441765440.0, 79100000.0, 37.4319], [1441767168.0, 79120000.0, 37.400256], [1441768896.0, 79140000.0, 37.368604000000005], [1441770624.0, 79160000.0, 37.336943999999995], [1441772352.0, 79180000.0, 37.305276], [1441774080.0, 79200000.0, 37.2736], [1441775808.0, 79220000.0, 37.241916], [1441777536.0, 79240000.0, 37.210224], [1441779264.0, 79260000.0, 37.178523999999996], [1441780992.0, 79280000.0, 37.146816], [1441782720.0, 79300000.0, 37.115100000000005], [1441784448.0, 79320000.0, 37.083375999999994], [1441786176.0, 79340000.0, 37.051643999999996], [1441787904.0, 79360000.0, 37.019904000000004], [1441789632.0, 79380000.0, 36.988156000000004], [1441791360.0, 79400000.0, 36.956399999999995], [1441793088.0, 79420000.0, 36.924636], [1441794816.0, 79440000.0, 36.892864], [1441796544.0, 79460000.0, 36.861084000000005], [1441798272.0, 79480000.0, 36.82929599999999], [1441800000.0, 79500000.0, 36.7975], [1441801728.0, 79520000.0, 36.765696], [1441803456.0, 79540000.0, 36.733884], [1441805184.0, 79560000.0, 36.70206399999999], [1441806912.0, 79580000.0, 36.670235999999996], [1441808640.0, 79600000.0, 36.6384], [1441810368.0, 79620000.0, 36.606556000000005], [1441812096.0, 79640000.0, 36.57470399999999], [1441813824.0, 79660000.0, 36.542843999999995], [1441815552.0, 79680000.0, 36.510976], [1441817280.0, 79700000.0, 36.4791], [1441819008.0, 79720000.0, 36.44721599999999], [1441820736.0, 79740000.0, 36.415324], [1441822464.0, 79760000.0, 36.383424], [1441824192.0, 79780000.0, 36.351516000000004], [1441825920.0, 79800000.0, 36.319599999999994], [1441827648.0, 79820000.0, 36.287676], [1441829376.0, 79840000.0, 36.255744], [1441831104.0, 79860000.0, 36.223804], [1441832832.0, 79880000.0, 36.191855999999994], [1441834560.0, 
79900000.0, 36.15989999999999], [1441836288.0, 79920000.0, 36.127936], [1441838016.0, 79940000.0, 36.095964], [1441839744.0, 79960000.0, 36.06398399999999], [1441841472.0, 79980000.0, 36.031996], [1441843200.0, 80000000.0, 36.0], [1441844928.0, 80020000.0, 35.967995999999985], [1441846656.0, 80040000.0, 35.935984000000005], [1441848384.0, 80060000.0, 35.903964], [1441850112.0, 80080000.0, 35.87193600000002], [1441851840.0, 80100000.0, 35.8399], [1441853568.0, 80120000.0, 35.80785599999999], [1441855296.0, 80140000.0, 35.77580400000001], [1441857024.0, 80160000.0, 35.74374400000001], [1441858752.0, 80180000.0, 35.71167599999998], [1441860480.0, 80200000.0, 35.67960000000001], [1441862208.0, 80220000.0, 35.647515999999996], [1441863936.0, 80240000.0, 35.61542400000002], [1441865664.0, 80260000.0, 35.583324000000005], [1441867392.0, 80280000.0, 35.551216], [1441869120.0, 80300000.0, 35.51910000000001], [1441870848.0, 80320000.0, 35.486976], [1441872576.0, 80340000.0, 35.454843999999994], [1441874304.0, 80360000.0, 35.42270400000001], [1441876032.0, 80380000.0, 35.39055599999999], [1441877760.0, 80400000.0, 35.35840000000002], [1441879488.0, 80420000.0, 35.32623600000001], [1441881216.0, 80440000.0, 35.29406399999999], [1441882944.0, 80460000.0, 35.26188400000001], [1441884672.0, 80480000.0, 35.229696000000004], [1441886400.0, 80500000.0, 35.19749999999999], [1441888128.0, 80520000.0, 35.16529600000001], [1441889856.0, 80540000.0, 35.133084], [1441891584.0, 80560000.0, 35.100864000000016], [1441893312.0, 80580000.0, 35.068636], [1441895040.0, 80600000.0, 35.036399999999986], [1441896768.0, 80620000.0, 35.00415600000001], [1441898496.0, 80640000.0, 34.971903999999995], [1441900224.0, 80660000.0, 34.93964399999999], [1441901952.0, 80680000.0, 34.907376], [1441903680.0, 80700000.0, 34.87509999999999], [1441905408.0, 80720000.0, 34.84281600000001], [1441907136.0, 80740000.0, 34.810524], [1441908864.0, 80760000.0, 34.778223999999994], [1441910592.0, 80780000.0, 34.74591600000001], [1441912320.0, 80800000.0, 34.7136], [1441914048.0, 80820000.0, 34.68127599999998], [1441915776.0, 80840000.0, 34.648944], [1441917504.0, 80860000.0, 34.616603999999995], [1441919232.0, 80880000.0, 34.58425600000001], [1441920960.0, 80900000.0, 34.5519], [1441922688.0, 80920000.0, 34.51953599999999], [1441924416.0, 80940000.0, 34.48716400000001], [1441926144.0, 80960000.0, 34.454784000000004], [1441927872.0, 80980000.0, 34.42239599999999], [1441929600.0, 81000000.0, 34.39], [1441931328.0, 81020000.0, 34.357596], [1441933056.0, 81040000.0, 34.32518400000001], [1441934784.0, 81060000.0, 34.292764000000005], [1441936512.0, 81080000.0, 34.260335999999995], [1441938240.0, 81100000.0, 34.227900000000005], [1441939968.0, 81120000.0, 34.19545599999999], [1441941696.0, 81140000.0, 34.16300399999999], [1441943424.0, 81160000.0, 34.130544], [1441945152.0, 81180000.0, 34.09807599999999], [1441946880.0, 81200000.0, 34.06560000000002], [1441948608.0, 81220000.0, 34.03311600000001], [1441950336.0, 81240000.0, 34.00062399999999], [1441952064.0, 81260000.0, 33.968124], [1441953792.0, 81280000.0, 33.935615999999996], [1441955520.0, 81300000.0, 33.90309999999998], [1441957248.0, 81320000.0, 33.870576], [1441958976.0, 81340000.0, 33.838044], [1441960704.0, 81360000.0, 33.80550400000001], [1441962432.0, 81380000.0, 33.77295600000001], [1441964160.0, 81400000.0, 33.740399999999994], [1441965888.0, 81420000.0, 33.707836000000015], [1441967616.0, 81440000.0, 33.675264], [1441969344.0, 81460000.0, 33.64268399999999], [1441971072.0, 81480000.0, 
33.610096], [1441972800.0, 81500000.0, 33.5775], [1441974528.0, 81520000.0, 33.54489600000001], [1441976256.0, 81540000.0, 33.51228400000001], [1441977984.0, 81560000.0, 33.479663999999985], [1441979712.0, 81580000.0, 33.44703600000001], [1441981440.0, 81600000.0, 33.4144], [1441983168.0, 81620000.0, 33.38175599999998], [1441984896.0, 81640000.0, 33.34910400000001], [1441986624.0, 81660000.0, 33.31644399999999], [1441988352.0, 81680000.0, 33.28377600000002], [1441990080.0, 81700000.0, 33.25110000000001], [1441991808.0, 81720000.0, 33.21841599999999], [1441993536.0, 81740000.0, 33.18572400000001], [1441995264.0, 81760000.0, 33.153024], [1441996992.0, 81780000.0, 33.12031599999999], [1441998720.0, 81800000.0, 33.08760000000001], [1442000448.0, 81820000.0, 33.05487599999999], [1442002176.0, 81840000.0, 33.02214400000001], [1442003904.0, 81860000.0, 32.98940400000001], [1442005632.0, 81880000.0, 32.956655999999995], [1442007360.0, 81900000.0, 32.9239], [1442009088.0, 81920000.0, 32.891136], [1442010816.0, 81940000.0, 32.85836399999998], [1442012544.0, 81960000.0, 32.825584000000006], [1442014272.0, 81980000.0, 32.792795999999996], [1442016000.0, 82000000.0, 32.760000000000005], [1442017728.0, 82020000.0, 32.727196000000006], [1442019456.0, 82040000.0, 32.694383999999985], [1442021184.0, 82060000.0, 32.66156400000001], [1442022912.0, 82080000.0, 32.628736], [1442024640.0, 82100000.0, 32.595899999999986], [1442026368.0, 82120000.0, 32.563056], [1442028096.0, 82140000.0, 32.530204], [1442029824.0, 82160000.0, 32.49734400000001], [1442031552.0, 82180000.0, 32.464476000000005], [1442033280.0, 82200000.0, 32.43159999999999], [1442035008.0, 82220000.0, 32.39871600000001], [1442036736.0, 82240000.0, 32.365824], [1442038464.0, 82260000.0, 32.33292399999999], [1442040192.0, 82280000.0, 32.300016], [1442041920.0, 82300000.0, 32.2671], [1442043648.0, 82320000.0, 32.234176000000005], [1442045376.0, 82340000.0, 32.201244], [1442047104.0, 82360000.0, 32.16830399999999], [1442048832.0, 82380000.0, 32.135356], [1442050560.0, 82400000.0, 32.1024], [1442052288.0, 82420000.0, 32.06943599999998], [1442054016.0, 82440000.0, 32.03646400000001], [1442055744.0, 82460000.0, 32.003483999999986], [1442057472.0, 82480000.0, 31.97049600000001], [1442059200.0, 82500000.0, 31.9375], [1442060928.0, 82520000.0, 31.904495999999995], [1442062656.0, 82540000.0, 31.87148400000001], [1442064384.0, 82560000.0, 31.838464000000002], [1442066112.0, 82580000.0, 31.805436000000014], [1442067840.0, 82600000.0, 31.772400000000005], [1442069568.0, 82620000.0, 31.739355999999987], [1442071296.0, 82640000.0, 31.706304000000017], [1442073024.0, 82660000.0, 31.673243999999997], [1442074752.0, 82680000.0, 31.640175999999983], [1442076480.0, 82700000.0, 31.607100000000003], [1442078208.0, 82720000.0, 31.574016], [1442079936.0, 82740000.0, 31.540924000000018], [1442081664.0, 82760000.0, 31.507824], [1442083392.0, 82780000.0, 31.474715999999987], [1442085120.0, 82800000.0, 31.441600000000008], [1442086848.0, 82820000.0, 31.408475999999993], [1442088576.0, 82840000.0, 31.375343999999984], [1442090304.0, 82860000.0, 31.34220400000001], [1442092032.0, 82880000.0, 31.309055999999998], [1442093760.0, 82900000.0, 31.275900000000007], [1442095488.0, 82920000.0, 31.242736000000008], [1442097216.0, 82940000.0, 31.209563999999986], [1442098944.0, 82960000.0, 31.176384000000013], [1442100672.0, 82980000.0, 31.143196000000003], [1442102400.0, 83000000.0, 31.109999999999985], [1442104128.0, 83020000.0, 31.076796], [1442105856.0, 83040000.0, 31.043583999999996], 
[1442107584.0, 83060000.0, 31.01036400000001], [1442109312.0, 83080000.0, 30.977136], [1442111040.0, 83100000.0, 30.943899999999985], [1442112768.0, 83120000.0, 30.910656000000017], [1442114496.0, 83140000.0, 30.877404], [1442116224.0, 83160000.0, 30.844143999999986], [1442117952.0, 83180000.0, 30.810876000000007], [1442119680.0, 83200000.0, 30.777599999999993], [1442121408.0, 83220000.0, 30.744316000000012], [1442123136.0, 83240000.0, 30.71102400000001], [1442124864.0, 83260000.0, 30.677723999999998], [1442126592.0, 83280000.0, 30.644416000000007], [1442128320.0, 83300000.0, 30.611099999999993], [1442130048.0, 83320000.0, 30.577775999999986], [1442131776.0, 83340000.0, 30.544444000000013], [1442133504.0, 83360000.0, 30.51110399999999], [1442135232.0, 83380000.0, 30.477756000000014], [1442136960.0, 83400000.0, 30.4444], [1442138688.0, 83420000.0, 30.411035999999996], [1442140416.0, 83440000.0, 30.37766400000001], [1442142144.0, 83460000.0, 30.344284000000002], [1442143872.0, 83480000.0, 30.310895999999985], [1442145600.0, 83500000.0, 30.277500000000003], [1442147328.0, 83520000.0, 30.244096], [1442149056.0, 83540000.0, 30.210684000000015], [1442150784.0, 83560000.0, 30.177264000000008], [1442152512.0, 83580000.0, 30.143835999999993], [1442154240.0, 83600000.0, 30.110400000000013], [1442155968.0, 83620000.0, 30.076955999999996], [1442157696.0, 83640000.0, 30.043503999999984], [1442159424.0, 83660000.0, 30.010044000000008], [1442161152.0, 83680000.0, 29.976575999999994], [1442162880.0, 83700000.0, 29.943100000000015], [1442164608.0, 83720000.0, 29.909616], [1442166336.0, 83740000.0, 29.87612399999999], [1442168064.0, 83760000.0, 29.842624000000015], [1442169792.0, 83780000.0, 29.809116000000003], [1442171520.0, 83800000.0, 29.775599999999983], [1442173248.0, 83820000.0, 29.74207600000001], [1442174976.0, 83840000.0, 29.70854399999999], [1442176704.0, 83860000.0, 29.675004000000015], [1442178432.0, 83880000.0, 29.641456000000005], [1442180160.0, 83900000.0, 29.607899999999987], [1442181888.0, 83920000.0, 29.574336000000002], [1442183616.0, 83940000.0, 29.540763999999996], [1442185344.0, 83960000.0, 29.50718399999998], [1442187072.0, 83980000.0, 29.473596], [1442188800.0, 84000000.0, 29.439999999999998], [1442190528.0, 84020000.0, 29.406396000000015], [1442192256.0, 84040000.0, 29.372783999999996], [1442193984.0, 84060000.0, 29.339163999999997], [1442195712.0, 84080000.0, 29.305536000000004], [1442197440.0, 84100000.0, 29.271900000000002], [1442199168.0, 84120000.0, 29.238255999999993], [1442200896.0, 84140000.0, 29.204604000000003], [1442202624.0, 84160000.0, 29.17094399999999], [1442204352.0, 84180000.0, 29.137276000000014], [1442206080.0, 84200000.0, 29.1036], [1442207808.0, 84220000.0, 29.069915999999992], [1442209536.0, 84240000.0, 29.036224000000004], [1442211264.0, 84260000.0, 29.002523999999994], [1442212992.0, 84280000.0, 28.96881599999999], [1442214720.0, 84300000.0, 28.935100000000006], [1442216448.0, 84320000.0, 28.901376], [1442218176.0, 84340000.0, 28.867644000000013], [1442219904.0, 84360000.0, 28.833904000000004], [1442221632.0, 84380000.0, 28.800155999999987], [1442223360.0, 84400000.0, 28.766400000000004], [1442225088.0, 84420000.0, 28.732636], [1442226816.0, 84440000.0, 28.698863999999986], [1442228544.0, 84460000.0, 28.665084000000007], [1442230272.0, 84480000.0, 28.631295999999992], [1442232000.0, 84500000.0, 28.59750000000001], [1442233728.0, 84520000.0, 28.563696000000007], [1442235456.0, 84540000.0, 28.529883999999996], [1442237184.0, 84560000.0, 28.496064000000004], 
[1442238912.0, 84580000.0, 28.46223599999999], [1442240640.0, 84600000.0, 28.428399999999982], [1442242368.0, 84620000.0, 28.39455600000001], [1442244096.0, 84640000.0, 28.360704], [1442245824.0, 84660000.0, 28.32684400000001], [1442247552.0, 84680000.0, 28.292975999999996], [1442249280.0, 84700000.0, 28.25909999999999], [1442251008.0, 84720000.0, 28.225216000000003], [1442252736.0, 84740000.0, 28.191323999999994], [1442254464.0, 84760000.0, 28.157423999999992], [1442256192.0, 84780000.0, 28.12351600000001], [1442257920.0, 84800000.0, 28.08959999999999], [1442259648.0, 84820000.0, 28.055676000000005], [1442261376.0, 84840000.0, 28.021743999999998], [1442263104.0, 84860000.0, 27.987803999999983], [1442264832.0, 84880000.0, 27.953856000000002], [1442266560.0, 84900000.0, 27.9199], [1442268288.0, 84920000.0, 27.885935999999987], [1442270016.0, 84940000.0, 27.85196400000001], [1442271744.0, 84960000.0, 27.817983999999996], [1442273472.0, 84980000.0, 27.783996000000016], [1442275200.0, 85000000.0, 27.75], [1442276928.0, 85020000.0, 27.71599599999999], [1442278656.0, 85040000.0, 27.681984000000014], [1442280384.0, 85060000.0, 27.647964], [1442282112.0, 85080000.0, 27.61393600000001], [1442283840.0, 85100000.0, 27.57990000000001], [1442285568.0, 85120000.0, 27.545855999999986], [1442287296.0, 85140000.0, 27.511804000000012], [1442289024.0, 85160000.0, 27.477744], [1442290752.0, 85180000.0, 27.443675999999982], [1442292480.0, 85200000.0, 27.40960000000001], [1442294208.0, 85220000.0, 27.37551599999999], [1442295936.0, 85240000.0, 27.341424000000018], [1442297664.0, 85260000.0, 27.30732400000001], [1442299392.0, 85280000.0, 27.27321599999999], [1442301120.0, 85300000.0, 27.239100000000008], [1442302848.0, 85320000.0, 27.204976000000002], [1442304576.0, 85340000.0, 27.17084399999999], [1442306304.0, 85360000.0, 27.13670400000001], [1442308032.0, 85380000.0, 27.102555999999993], [1442309760.0, 85400000.0, 27.06840000000001], [1442311488.0, 85420000.0, 27.034236000000007], [1442313216.0, 85440000.0, 27.000063999999995], [1442314944.0, 85460000.0, 26.965884000000017], [1442316672.0, 85480000.0, 26.931696000000002], [1442318400.0, 85500000.0, 26.897499999999994], [1442320128.0, 85520000.0, 26.863296000000005], [1442321856.0, 85540000.0, 26.829083999999995], [1442323584.0, 85560000.0, 26.794864000000018], [1442325312.0, 85580000.0, 26.760636000000005], [1442327040.0, 85600000.0, 26.726399999999998], [1442328768.0, 85620000.0, 26.69215600000001], [1442330496.0, 85640000.0, 26.657904000000002], [1442332224.0, 85660000.0, 26.623643999999985], [1442333952.0, 85680000.0, 26.589376], [1442335680.0, 85700000.0, 26.555099999999996], [1442337408.0, 85720000.0, 26.52081600000001], [1442339136.0, 85740000.0, 26.486524000000003], [1442340864.0, 85760000.0, 26.452223999999987], [1442342592.0, 85780000.0, 26.417916000000005], [1442344320.0, 85800000.0, 26.3836], [1442346048.0, 85820000.0, 26.34927599999999], [1442347776.0, 85840000.0, 26.31494400000001], [1442349504.0, 85860000.0, 26.280603999999997], [1442351232.0, 85880000.0, 26.246256000000017], [1442352960.0, 85900000.0, 26.2119], [1442354688.0, 85920000.0, 26.17753599999999], [1442356416.0, 85940000.0, 26.143164000000013], [1442358144.0, 85960000.0, 26.108784], [1442359872.0, 85980000.0, 26.074395999999993], [1442361600.0, 86000000.0, 26.040000000000006], [1442363328.0, 86020000.0, 26.005595999999997], [1442365056.0, 86040000.0, 25.971184000000008], [1442366784.0, 86060000.0, 25.936763999999997], [1442368512.0, 86080000.0, 25.90233599999999], [1442370240.0, 
86100000.0, 25.867900000000006], [1442371968.0, 86120000.0, 25.833455999999998], [1442373696.0, 86140000.0, 25.799003999999982], [1442375424.0, 86160000.0, 25.764544], [1442377152.0, 86180000.0, 25.730075999999997], [1442378880.0, 86200000.0, 25.695600000000013], [1442380608.0, 86220000.0, 25.661116000000007], [1442382336.0, 86240000.0, 25.626623999999993], [1442384064.0, 86260000.0, 25.592124000000013], [1442385792.0, 86280000.0, 25.557615999999996], [1442387520.0, 86300000.0, 25.523099999999985], [1442389248.0, 86320000.0, 25.48857600000001], [1442390976.0, 86340000.0, 25.454043999999996], [1442392704.0, 86360000.0, 25.419504000000018], [1442394432.0, 86380000.0, 25.384956000000003], [1442396160.0, 86400000.0, 25.350399999999993], [1442397888.0, 86420000.0, 25.315836000000004], [1442399616.0, 86440000.0, 25.281263999999993], [1442401344.0, 86460000.0, 25.246683999999988], [1442403072.0, 86480000.0, 25.212096000000003], [1442404800.0, 86500000.0, 25.177499999999995], [1442406528.0, 86520000.0, 25.142896000000007], [1442408256.0, 86540000.0, 25.108283999999998], [1442409984.0, 86560000.0, 25.073663999999994], [1442411712.0, 86580000.0, 25.03903600000001], [1442413440.0, 86600000.0, 25.004400000000004], [1442415168.0, 86620000.0, 24.96975599999999], [1442416896.0, 86640000.0, 24.93510400000001], [1442418624.0, 86660000.0, 24.900443999999993], [1442420352.0, 86680000.0, 24.86577600000001], [1442422080.0, 86700000.0, 24.831100000000006], [1442423808.0, 86720000.0, 24.796415999999994], [1442425536.0, 86740000.0, 24.761724000000015], [1442427264.0, 86760000.0, 24.727024], [1442428992.0, 86780000.0, 24.69231599999999], [1442430720.0, 86800000.0, 24.657600000000002], [1442432448.0, 86820000.0, 24.62287599999999], [1442434176.0, 86840000.0, 24.588144000000014], [1442435904.0, 86860000.0, 24.553404], [1442437632.0, 86880000.0, 24.518655999999993], [1442439360.0, 86900000.0, 24.483900000000006], [1442441088.0, 86920000.0, 24.449135999999996], [1442442816.0, 86940000.0, 24.414363999999992], [1442444544.0, 86960000.0, 24.37958400000001], [1442446272.0, 86980000.0, 24.344795999999988], [1442448000.0, 87000000.0, 24.310000000000016], [1442449728.0, 87020000.0, 24.275195999999994], [1442451456.0, 87040000.0, 24.24038399999999], [1442453184.0, 87060000.0, 24.20556400000001], [1442454912.0, 87080000.0, 24.17073599999999], [1442456640.0, 87100000.0, 24.135899999999978], [1442458368.0, 87120000.0, 24.101056], [1442460096.0, 87140000.0, 24.066204], [1442461824.0, 87160000.0, 24.03134400000002], [1442463552.0, 87180000.0, 23.996476], [1442465280.0, 87200000.0, 23.96159999999999], [1442467008.0, 87220000.0, 23.926716000000013], [1442468736.0, 87240000.0, 23.891824], [1442470464.0, 87260000.0, 23.856923999999978], [1442472192.0, 87280000.0, 23.822016000000005], [1442473920.0, 87300000.0, 23.787099999999995], [1442475648.0, 87320000.0, 23.752176000000006], [1442477376.0, 87340000.0, 23.717243999999994], [1442479104.0, 87360000.0, 23.682303999999988], [1442480832.0, 87380000.0, 23.647356000000002], [1442482560.0, 87400000.0, 23.612399999999994], [1442484288.0, 87420000.0, 23.57743599999999], [1442486016.0, 87440000.0, 23.54246400000001], [1442487744.0, 87460000.0, 23.50748399999999], [1442489472.0, 87480000.0, 23.472496000000007], [1442491200.0, 87500000.0, 23.4375], [1442492928.0, 87520000.0, 23.402495999999985], [1442494656.0, 87540000.0, 23.367484000000005], [1442496384.0, 87560000.0, 23.332464], [1442498112.0, 87580000.0, 23.29743600000002], [1442499840.0, 87600000.0, 23.2624], [1442501568.0, 87620000.0, 
23.227355999999986], [1442503296.0, 87640000.0, 23.192304000000007], [1442505024.0, 87660000.0, 23.157244000000006], [1442506752.0, 87680000.0, 23.122175999999982], [1442508480.0, 87700000.0, 23.087100000000007], [1442510208.0, 87720000.0, 23.052015999999995], [1442511936.0, 87740000.0, 23.016924000000017], [1442513664.0, 87760000.0, 22.981824000000003], [1442515392.0, 87780000.0, 22.946715999999995], [1442517120.0, 87800000.0, 22.911600000000007], [1442518848.0, 87820000.0, 22.876475999999997], [1442520576.0, 87840000.0, 22.841343999999992], [1442522304.0, 87860000.0, 22.806204000000008], [1442524032.0, 87880000.0, 22.771056], [1442525760.0, 87900000.0, 22.735900000000015], [1442527488.0, 87920000.0, 22.700736000000006], [1442529216.0, 87940000.0, 22.66556399999999], [1442530944.0, 87960000.0, 22.630384000000006], [1442532672.0, 87980000.0, 22.595196], [1442534400.0, 88000000.0, 22.559999999999988], [1442536128.0, 88020000.0, 22.52479600000001], [1442537856.0, 88040000.0, 22.489583999999994], [1442539584.0, 88060000.0, 22.454364000000012], [1442541312.0, 88080000.0, 22.41913600000001], [1442543040.0, 88100000.0, 22.383899999999997], [1442544768.0, 88120000.0, 22.348656000000005], [1442546496.0, 88140000.0, 22.313404000000006], [1442548224.0, 88160000.0, 22.278143999999983], [1442549952.0, 88180000.0, 22.24287600000001], [1442551680.0, 88200000.0, 22.2076], [1442553408.0, 88220000.0, 22.17231600000001], [1442555136.0, 88240000.0, 22.137023999999997], [1442556864.0, 88260000.0, 22.10172399999999], [1442558592.0, 88280000.0, 22.066416000000004], [1442560320.0, 88300000.0, 22.031099999999995], [1442562048.0, 88320000.0, 21.995775999999992], [1442563776.0, 88340000.0, 21.96044400000001], [1442565504.0, 88360000.0, 21.92510399999999], [1442567232.0, 88380000.0, 21.88975600000002], [1442568960.0, 88400000.0, 21.8544], [1442570688.0, 88420000.0, 21.819035999999997], [1442572416.0, 88440000.0, 21.783664000000016], [1442574144.0, 88460000.0, 21.748283999999998], [1442575872.0, 88480000.0, 21.712895999999986], [1442577600.0, 88500000.0, 21.67750000000001], [1442579328.0, 88520000.0, 21.642095999999995], [1442581056.0, 88540000.0, 21.606684000000016], [1442582784.0, 88560000.0, 21.571264], [1442584512.0, 88580000.0, 21.53583599999999], [1442586240.0, 88600000.0, 21.500400000000013], [1442587968.0, 88620000.0, 21.464956], [1442589696.0, 88640000.0, 21.42950399999998], [1442591424.0, 88660000.0, 21.394044000000008], [1442593152.0, 88680000.0, 21.358576], [1442594880.0, 88700000.0, 21.32310000000001], [1442596608.0, 88720000.0, 21.287616], [1442598336.0, 88740000.0, 21.252123999999995], [1442600064.0, 88760000.0, 21.21662400000001], [1442601792.0, 88780000.0, 21.181116000000003], [1442603520.0, 88800000.0, 21.145599999999988], [1442605248.0, 88820000.0, 21.110076000000007], [1442606976.0, 88840000.0, 21.07454399999999], [1442608704.0, 88860000.0, 21.03900400000002], [1442610432.0, 88880000.0, 21.003456], [1442612160.0, 88900000.0, 20.967899999999986], [1442613888.0, 88920000.0, 20.932336000000006], [1442615616.0, 88940000.0, 20.896764000000005], [1442617344.0, 88960000.0, 20.86118399999998], [1442619072.0, 88980000.0, 20.825596000000004], [1442620800.0, 89000000.0, 20.789999999999992], [1442622528.0, 89020000.0, 20.754396000000014], [1442624256.0, 89040000.0, 20.718784], [1442625984.0, 89060000.0, 20.68316399999999], [1442627712.0, 89080000.0, 20.647536000000002], [1442629440.0, 89100000.0, 20.61189999999999], [1442631168.0, 89120000.0, 20.576255999999987], [1442632896.0, 89140000.0, 
20.540604000000002], [1442634624.0, 89160000.0, 20.504943999999995], [1442636352.0, 89180000.0, 20.469276000000008], [1442638080.0, 89200000.0, 20.4336], [1442639808.0, 89220000.0, 20.397915999999995], [1442641536.0, 89240000.0, 20.362224000000012], [1442643264.0, 89260000.0, 20.326523999999992], [1442644992.0, 89280000.0, 20.290815999999978], [1442646720.0, 89300000.0, 20.2551], [1442648448.0, 89320000.0, 20.219375999999997], [1442650176.0, 89340000.0, 20.183644000000015], [1442651904.0, 89360000.0, 20.147903999999997], [1442653632.0, 89380000.0, 20.112155999999985], [1442655360.0, 89400000.0, 20.076400000000007], [1442657088.0, 89420000.0, 20.040635999999992], [1442658816.0, 89440000.0, 20.004863999999984], [1442660544.0, 89460000.0, 19.96908400000001], [1442662272.0, 89480000.0, 19.933296], [1442664000.0, 89500000.0, 19.897500000000008], [1442665728.0, 89520000.0, 19.861695999999995], [1442667456.0, 89540000.0, 19.825883999999988], [1442669184.0, 89560000.0, 19.790064000000015], [1442670912.0, 89580000.0, 19.75423599999999], [1442672640.0, 89600000.0, 19.71839999999999], [1442674368.0, 89620000.0, 19.682556000000005], [1442676096.0, 89640000.0, 19.646703999999986], [1442677824.0, 89660000.0, 19.610844000000014], [1442679552.0, 89680000.0, 19.574976000000007], [1442681280.0, 89700000.0, 19.53909999999999], [1442683008.0, 89720000.0, 19.50321600000001], [1442684736.0, 89740000.0, 19.46732399999999], [1442686464.0, 89760000.0, 19.43142399999998], [1442688192.0, 89780000.0, 19.395516], [1442689920.0, 89800000.0, 19.359599999999986], [1442691648.0, 89820000.0, 19.323676000000006], [1442693376.0, 89840000.0, 19.287744000000004], [1442695104.0, 89860000.0, 19.251803999999993], [1442696832.0, 89880000.0, 19.215856000000002], [1442698560.0, 89900000.0, 19.17989999999999], [1442700288.0, 89920000.0, 19.143935999999982], [1442702016.0, 89940000.0, 19.10796400000001], [1442703744.0, 89960000.0, 19.071983999999986], [1442705472.0, 89980000.0, 19.03599600000001], [1442707200.0, 90000000.0, 19.0], [1442708928.0, 90020000.0, 18.963995999999995], [1442710656.0, 90040000.0, 18.92798400000001], [1442712384.0, 90060000.0, 18.891964], [1442714112.0, 90080000.0, 18.855936000000014], [1442715840.0, 90100000.0, 18.819900000000004], [1442717568.0, 90120000.0, 18.783855999999986], [1442719296.0, 90140000.0, 18.747804000000016], [1442721024.0, 90160000.0, 18.711743999999996], [1442722752.0, 90180000.0, 18.67567599999998], [1442724480.0, 90200000.0, 18.6396], [1442726208.0, 90220000.0, 18.603516], [1442727936.0, 90240000.0, 18.567424000000017], [1442729664.0, 90260000.0, 18.531323999999998], [1442731392.0, 90280000.0, 18.495215999999985], [1442733120.0, 90300000.0, 18.459100000000007], [1442734848.0, 90320000.0, 18.422976000000006], [1442736576.0, 90340000.0, 18.386843999999982], [1442738304.0, 90360000.0, 18.350704000000007], [1442740032.0, 90380000.0, 18.314555999999996], [1442741760.0, 90400000.0, 18.27840000000002], [1442743488.0, 90420000.0, 18.242236000000005], [1442745216.0, 90440000.0, 18.206063999999998], [1442746944.0, 90460000.0, 18.16988400000001], [1442748672.0, 90480000.0, 18.133696], [1442750400.0, 90500000.0, 18.097499999999982], [1442752128.0, 90520000.0, 18.061296000000013], [1442753856.0, 90540000.0, 18.025083999999993], [1442755584.0, 90560000.0, 17.98886400000002], [1442757312.0, 90580000.0, 17.952636], [1442759040.0, 90600000.0, 17.916399999999996], [1442760768.0, 90620000.0, 17.880156000000014], [1442762496.0, 90640000.0, 17.843903999999995], [1442764224.0, 90660000.0, 17.807643999999982], 
[1442765952.0, 90680000.0, 17.771376000000004], [1442767680.0, 90700000.0, 17.73509999999999], [1442769408.0, 90720000.0, 17.698816000000008], [1442771136.0, 90740000.0, 17.662524000000005], [1442772864.0, 90760000.0, 17.626223999999993], [1442774592.0, 90780000.0, 17.589916000000017], [1442776320.0, 90800000.0, 17.553600000000003], [1442778048.0, 90820000.0, 17.51727599999998], [1442779776.0, 90840000.0, 17.480944000000008], [1442781504.0, 90860000.0, 17.444603999999998], [1442783232.0, 90880000.0, 17.40825600000001], [1442784960.0, 90900000.0, 17.371899999999997], [1442786688.0, 90920000.0, 17.33553599999999], [1442788416.0, 90940000.0, 17.299164000000005], [1442790144.0, 90960000.0, 17.262783999999996], [1442791872.0, 90980000.0, 17.22639599999998], [1442793600.0, 91000000.0, 17.190000000000012], [1442795328.0, 91020000.0, 17.153595999999993], [1442797056.0, 91040000.0, 17.11718400000001], [1442798784.0, 91060000.0, 17.080764000000002], [1442800512.0, 91080000.0, 17.044335999999987], [1442802240.0, 91100000.0, 17.007900000000006], [1442803968.0, 91120000.0, 16.971456000000003], [1442805696.0, 91140000.0, 16.935003999999992], [1442807424.0, 91160000.0, 16.898544], [1442809152.0, 91180000.0, 16.862075999999988], [1442810880.0, 91200000.0, 16.82560000000001], [1442812608.0, 91220000.0, 16.789116000000007], [1442814336.0, 91240000.0, 16.752623999999983], [1442816064.0, 91260000.0, 16.716124000000008], [1442817792.0, 91280000.0, 16.679615999999996], [1442819520.0, 91300000.0, 16.64309999999999], [1442821248.0, 91320000.0, 16.606576000000004], [1442822976.0, 91340000.0, 16.570043999999996], [1442824704.0, 91360000.0, 16.533504000000008], [1442826432.0, 91380000.0, 16.496955999999997], [1442828160.0, 91400000.0, 16.460399999999993], [1442829888.0, 91420000.0, 16.42383600000001], [1442831616.0, 91440000.0, 16.387264000000002], [1442833344.0, 91460000.0, 16.350683999999987], [1442835072.0, 91480000.0, 16.314096000000006], [1442836800.0, 91500000.0, 16.27749999999999], [1442838528.0, 91520000.0, 16.24089600000002], [1442840256.0, 91540000.0, 16.204284], [1442841984.0, 91560000.0, 16.167663999999988], [1442843712.0, 91580000.0, 16.13103600000001], [1442845440.0, 91600000.0, 16.094399999999993], [1442847168.0, 91620000.0, 16.057755999999983], [1442848896.0, 91640000.0, 16.02110400000001], [1442850624.0, 91660000.0, 15.984443999999996], [1442852352.0, 91680000.0, 15.947776000000019], [1442854080.0, 91700000.0, 15.911100000000005], [1442855808.0, 91720000.0, 15.874415999999982], [1442857536.0, 91740000.0, 15.837724000000009], [1442859264.0, 91760000.0, 15.801023999999998], [1442860992.0, 91780000.0, 15.76431599999998], [1442862720.0, 91800000.0, 15.72760000000001], [1442864448.0, 91820000.0, 15.690875999999989], [1442866176.0, 91840000.0, 15.654144000000016], [1442867904.0, 91860000.0, 15.617404000000008], [1442869632.0, 91880000.0, 15.58065599999999], [1442871360.0, 91900000.0, 15.543900000000008], [1442873088.0, 91920000.0, 15.507136000000003], [1442874816.0, 91940000.0, 15.47036399999999], [1442876544.0, 91960000.0, 15.43358400000001], [1442878272.0, 91980000.0, 15.396795999999995], [1442880000.0, 92000000.0, 15.360000000000014], [1442881728.0, 92020000.0, 15.323195999999996], [1442883456.0, 92040000.0, 15.286383999999984], [1442885184.0, 92060000.0, 15.249564000000007], [1442886912.0, 92080000.0, 15.212735999999992], [1442888640.0, 92100000.0, 15.175899999999984], [1442890368.0, 92120000.0, 15.13905600000001], [1442892096.0, 92140000.0, 15.102203999999986], [1442893824.0, 92160000.0, 
15.06534400000001], [1442895552.0, 92180000.0, 15.028475999999998], [1442897280.0, 92200000.0, 14.991599999999991], [1442899008.0, 92220000.0, 14.954716000000005], [1442900736.0, 92240000.0, 14.917823999999996], [1442902464.0, 92260000.0, 14.880923999999979], [1442904192.0, 92280000.0, 14.84401600000001], [1442905920.0, 92300000.0, 14.807099999999991], [1442907648.0, 92320000.0, 14.770176000000006], [1442909376.0, 92340000.0, 14.733244], [1442911104.0, 92360000.0, 14.696303999999984], [1442912832.0, 92380000.0, 14.659356000000002], [1442914560.0, 92400000.0, 14.622399999999999], [1442916288.0, 92420000.0, 14.585435999999987], [1442918016.0, 92440000.0, 14.54846400000001], [1442919744.0, 92460000.0, 14.511483999999996], [1442921472.0, 92480000.0, 14.474496000000016], [1442923200.0, 92500000.0, 14.4375], [1442924928.0, 92520000.0, 14.40049599999999], [1442926656.0, 92540000.0, 14.363484000000014], [1442928384.0, 92560000.0, 14.326464000000001], [1442930112.0, 92580000.0, 14.289436000000023], [1442931840.0, 92600000.0, 14.252400000000009], [1442933568.0, 92620000.0, 14.215355999999986], [1442935296.0, 92640000.0, 14.178304000000011], [1442937024.0, 92660000.0, 14.141244], [1442938752.0, 92680000.0, 14.104175999999981], [1442940480.0, 92700000.0, 14.06710000000001], [1442942208.0, 92720000.0, 14.03001599999999], [1442943936.0, 92740000.0, 13.992924000000016], [1442945664.0, 92760000.0, 13.955824000000007], [1442947392.0, 92780000.0, 13.91871599999999], [1442949120.0, 92800000.0, 13.881600000000006], [1442950848.0, 92820000.0, 13.844476], [1442952576.0, 92840000.0, 13.807343999999986], [1442954304.0, 92860000.0, 13.770204000000007], [1442956032.0, 92880000.0, 13.73305599999999], [1442957760.0, 92900000.0, 13.695900000000009], [1442959488.0, 92920000.0, 13.658736000000005], [1442961216.0, 92940000.0, 13.621563999999992], [1442962944.0, 92960000.0, 13.584384000000014], [1442964672.0, 92980000.0, 13.547196], [1442966400.0, 93000000.0, 13.509999999999991], [1442968128.0, 93020000.0, 13.472796000000002], [1442969856.0, 93040000.0, 13.435583999999992], [1442971584.0, 93060000.0, 13.398364000000015], [1442973312.0, 93080000.0, 13.361136000000002], [1442975040.0, 93100000.0, 13.323899999999995], [1442976768.0, 93120000.0, 13.286656000000008], [1442978496.0, 93140000.0, 13.249403999999998], [1442980224.0, 93160000.0, 13.21214399999998], [1442981952.0, 93180000.0, 13.174876000000012], [1442983680.0, 93200000.0, 13.137599999999992], [1442985408.0, 93220000.0, 13.10031600000002], [1442987136.0, 93240000.0, 13.063023999999999], [1442988864.0, 93260000.0, 13.025723999999997], [1442990592.0, 93280000.0, 12.988416000000015], [1442992320.0, 93300000.0, 12.951099999999997], [1442994048.0, 93320000.0, 12.913775999999984], [1442995776.0, 93340000.0, 12.876444000000006], [1442997504.0, 93360000.0, 12.839103999999992], [1442999232.0, 93380000.0, 12.801756000000012], [1443000960.0, 93400000.0, 12.764400000000009], [1443002688.0, 93420000.0, 12.727035999999984], [1443004416.0, 93440000.0, 12.689664000000008], [1443006144.0, 93460000.0, 12.652283999999995], [1443007872.0, 93480000.0, 12.614895999999987], [1443009600.0, 93500000.0, 12.5775], [1443011328.0, 93520000.0, 12.540095999999991], [1443013056.0, 93540000.0, 12.502684000000016], [1443014784.0, 93560000.0, 12.465264000000005], [1443016512.0, 93580000.0, 12.427835999999985], [1443018240.0, 93600000.0, 12.390400000000014], [1443019968.0, 93620000.0, 12.352955999999992], [1443021696.0, 93640000.0, 12.31550399999999], [1443023424.0, 93660000.0, 12.278044000000008], 
[1443025152.0, 93680000.0, 12.24057599999999], [1443026880.0, 93700000.0, 12.20310000000002], [1443028608.0, 93720000.0, 12.165616], [1443030336.0, 93740000.0, 12.128123999999985], [1443032064.0, 93760000.0, 12.090624000000005], [1443033792.0, 93780000.0, 12.053116000000003], [1443035520.0, 93800000.0, 12.015599999999992], [1443037248.0, 93820000.0, 11.978076000000001], [1443038976.0, 93840000.0, 11.940543999999989], [1443040704.0, 93860000.0, 11.90300400000001], [1443042432.0, 93880000.0, 11.865456000000009], [1443044160.0, 93900000.0, 11.827899999999985], [1443045888.0, 93920000.0, 11.79033600000001], [1443047616.0, 93940000.0, 11.752763999999999], [1443049344.0, 93960000.0, 11.71518399999998], [1443051072.0, 93980000.0, 11.677596000000008], [1443052800.0, 94000000.0, 11.639999999999986], [1443054528.0, 94020000.0, 11.602396000000013], [1443056256.0, 94040000.0, 11.564784000000003], [1443057984.0, 94060000.0, 11.527163999999985], [1443059712.0, 94080000.0, 11.489536000000015], [1443061440.0, 94100000.0, 11.451899999999995], [1443063168.0, 94120000.0, 11.41425599999998], [1443064896.0, 94140000.0, 11.376604], [1443066624.0, 94160000.0, 11.338943999999998], [1443068352.0, 94180000.0, 11.301276000000016], [1443070080.0, 94200000.0, 11.263599999999997], [1443071808.0, 94220000.0, 11.225915999999984], [1443073536.0, 94240000.0, 11.188224000000005], [1443075264.0, 94260000.0, 11.15052399999999], [1443076992.0, 94280000.0, 11.112815999999981], [1443078720.0, 94300000.0, 11.075100000000006], [1443080448.0, 94320000.0, 11.037375999999995], [1443082176.0, 94340000.0, 10.999644000000018], [1443083904.0, 94360000.0, 10.961904000000004], [1443085632.0, 94380000.0, 10.924155999999982], [1443087360.0, 94400000.0, 10.886400000000009], [1443089088.0, 94420000.0, 10.848635999999999], [1443090816.0, 94440000.0, 10.810863999999981], [1443092544.0, 94460000.0, 10.773084000000011], [1443094272.0, 94480000.0, 10.735295999999991], [1443096000.0, 94500000.0, 10.69750000000002], [1443097728.0, 94520000.0, 10.659695999999997], [1443099456.0, 94540000.0, 10.621883999999994], [1443101184.0, 94560000.0, 10.584064000000012], [1443102912.0, 94580000.0, 10.546235999999993], [1443104640.0, 94600000.0, 10.50839999999998], [1443106368.0, 94620000.0, 10.470556000000002], [1443108096.0, 94640000.0, 10.432703999999987], [1443109824.0, 94660000.0, 10.394844000000006], [1443111552.0, 94680000.0, 10.356976000000003], [1443113280.0, 94700000.0, 10.319099999999992], [1443115008.0, 94720000.0, 10.281216000000015], [1443116736.0, 94740000.0, 10.243324000000001], [1443118464.0, 94760000.0, 10.20542399999998], [1443120192.0, 94780000.0, 10.167516000000006], [1443121920.0, 94800000.0, 10.129599999999996], [1443123648.0, 94820000.0, 10.091676000000007], [1443125376.0, 94840000.0, 10.053743999999995], [1443127104.0, 94860000.0, 10.015803999999989], [1443128832.0, 94880000.0, 9.977856000000003], [1443130560.0, 94900000.0, 9.939899999999994], [1443132288.0, 94920000.0, 9.901935999999978], [1443134016.0, 94940000.0, 9.86396400000001], [1443135744.0, 94960000.0, 9.825983999999991], [1443137472.0, 94980000.0, 9.787996000000007], [1443139200.0, 95000000.0, 9.75], [1443140928.0, 95020000.0, 9.711995999999985], [1443142656.0, 95040000.0, 9.673984000000004], [1443144384.0, 95060000.0, 9.635964000000001], [1443146112.0, 95080000.0, 9.597936000000018], [1443147840.0, 95100000.0, 9.559899999999999], [1443149568.0, 95120000.0, 9.521855999999985], [1443151296.0, 95140000.0, 9.483804000000006], [1443153024.0, 95160000.0, 9.445744000000005], 
[1443154752.0, 95180000.0, 9.40767599999998], [1443156480.0, 95200000.0, 9.369600000000005], [1443158208.0, 95220000.0, 9.331515999999993], [1443159936.0, 95240000.0, 9.293424000000016], [1443161664.0, 95260000.0, 9.255324000000002], [1443163392.0, 95280000.0, 9.217215999999993], [1443165120.0, 95300000.0, 9.179100000000005], [1443166848.0, 95320000.0, 9.140975999999995], [1443168576.0, 95340000.0, 9.10284399999999], [1443170304.0, 95360000.0, 9.064704000000006], [1443172032.0, 95380000.0, 9.026556], [1443173760.0, 95400000.0, 8.988400000000013], [1443175488.0, 95420000.0, 8.950236000000004], [1443177216.0, 95440000.0, 8.912063999999987], [1443178944.0, 95460000.0, 8.873884000000018], [1443180672.0, 95480000.0, 8.835695999999999], [1443182400.0, 95500000.0, 8.797499999999985], [1443184128.0, 95520000.0, 8.759296000000006], [1443185856.0, 95540000.0, 8.72108399999999], [1443187584.0, 95560000.0, 8.68286400000001], [1443189312.0, 95580000.0, 8.644636000000006], [1443191040.0, 95600000.0, 8.606399999999994], [1443192768.0, 95620000.0, 8.568156000000016], [1443194496.0, 95640000.0, 8.529904000000002], [1443196224.0, 95660000.0, 8.49164399999998], [1443197952.0, 95680000.0, 8.453376000000006], [1443199680.0, 95700000.0, 8.415099999999995], [1443201408.0, 95720000.0, 8.37681600000002], [1443203136.0, 95740000.0, 8.338524000000007], [1443204864.0, 95760000.0, 8.300223999999986], [1443206592.0, 95780000.0, 8.261916000000014], [1443208320.0, 95800000.0, 8.223600000000005], [1443210048.0, 95820000.0, 8.185275999999988], [1443211776.0, 95840000.0, 8.146944000000005], [1443213504.0, 95860000.0, 8.108604], [1443215232.0, 95880000.0, 8.070256000000015], [1443216960.0, 95900000.0, 8.031900000000007], [1443218688.0, 95920000.0, 7.993535999999992], [1443220416.0, 95940000.0, 7.955164000000011], [1443222144.0, 95960000.0, 7.916783999999993], [1443223872.0, 95980000.0, 7.878395999999981], [1443225600.0, 96000000.0, 7.840000000000003], [1443227328.0, 96020000.0, 7.801595999999989], [1443229056.0, 96040000.0, 7.76318400000001], [1443230784.0, 96060000.0, 7.7247640000000075], [1443232512.0, 96080000.0, 7.686335999999983], [1443234240.0, 96100000.0, 7.647900000000007], [1443235968.0, 96120000.0, 7.6094559999999944], [1443237696.0, 96140000.0, 7.571003999999988], [1443239424.0, 96160000.0, 7.5325440000000015], [1443241152.0, 96180000.0, 7.494075999999993], [1443242880.0, 96200000.0, 7.455600000000018], [1443244608.0, 96220000.0, 7.417116000000007], [1443246336.0, 96240000.0, 7.378623999999988], [1443248064.0, 96260000.0, 7.340124000000017], [1443249792.0, 96280000.0, 7.301615999999996], [1443251520.0, 96300000.0, 7.26309999999998], [1443253248.0, 96320000.0, 7.224576000000013], [1443254976.0, 96340000.0, 7.186043999999995], [1443256704.0, 96360000.0, 7.147504000000012], [1443258432.0, 96380000.0, 7.108956000000006], [1443260160.0, 96400000.0, 7.070399999999992], [1443261888.0, 96420000.0, 7.031836000000013], [1443263616.0, 96440000.0, 6.993263999999996], [1443265344.0, 96460000.0, 6.954683999999986], [1443267072.0, 96480000.0, 6.91609600000001], [1443268800.0, 96500000.0, 6.877499999999998], [1443270528.0, 96520000.0, 6.83889600000002], [1443272256.0, 96540000.0, 6.800284000000005], [1443273984.0, 96560000.0, 6.761663999999982], [1443275712.0, 96580000.0, 6.723036000000008], [1443277440.0, 96600000.0, 6.684399999999997], [1443279168.0, 96620000.0, 6.645755999999977], [1443280896.0, 96640000.0, 6.607104000000007], [1443282624.0, 96660000.0, 6.5684439999999995], [1443284352.0, 96680000.0, 6.5297760000000125], 
[1443286080.0, 96700000.0, 6.491100000000003], [1443287808.0, 96720000.0, 6.452415999999985], [1443289536.0, 96740000.0, 6.413724000000016], [1443291264.0, 96760000.0, 6.375023999999996], [1443292992.0, 96780000.0, 6.336315999999982], [1443294720.0, 96800000.0, 6.297600000000003], [1443296448.0, 96820000.0, 6.258875999999987], [1443298176.0, 96840000.0, 6.220144000000019], [1443299904.0, 96860000.0, 6.181404000000001], [1443301632.0, 96880000.0, 6.142655999999988], [1443303360.0, 96900000.0, 6.10390000000001], [1443305088.0, 96920000.0, 6.065135999999995], [1443306816.0, 96940000.0, 6.026363999999987], [1443308544.0, 96960000.0, 5.987583999999998], [1443310272.0, 96980000.0, 5.948795999999987], [1443312000.0, 97000000.0, 5.910000000000011], [1443313728.0, 97020000.0, 5.871195999999998], [1443315456.0, 97040000.0, 5.8323839999999905], [1443317184.0, 97060000.0, 5.7935640000000035], [1443318912.0, 97080000.0, 5.754735999999994], [1443320640.0, 97100000.0, 5.715899999999976], [1443322368.0, 97120000.0, 5.677056000000007], [1443324096.0, 97140000.0, 5.638203999999988], [1443325824.0, 97160000.0, 5.599344000000016], [1443327552.0, 97180000.0, 5.560475999999994], [1443329280.0, 97200000.0, 5.521599999999992], [1443331008.0, 97220000.0, 5.482716000000011], [1443332736.0, 97240000.0, 5.443823999999992], [1443334464.0, 97260000.0, 5.40492399999998], [1443336192.0, 97280000.0, 5.366016000000002], [1443337920.0, 97300000.0, 5.327099999999987], [1443339648.0, 97320000.0, 5.288176000000007], [1443341376.0, 97340000.0, 5.2492440000000045], [1443343104.0, 97360000.0, 5.210303999999994], [1443344832.0, 97380000.0, 5.171356000000003], [1443346560.0, 97400000.0, 5.13239999999999], [1443348288.0, 97420000.0, 5.093435999999983], [1443350016.0, 97440000.0, 5.05446400000001], [1443351744.0, 97460000.0, 5.0154839999999865], [1443353472.0, 97480000.0, 4.976496000000012], [1443355200.0, 97500000.0, 4.9375], [1443356928.0, 97520000.0, 4.89849599999998], [1443358656.0, 97540000.0, 4.859484000000009], [1443360384.0, 97560000.0, 4.820464000000001], [1443362112.0, 97580000.0, 4.781436000000014], [1443363840.0, 97600000.0, 4.7424000000000035], [1443365568.0, 97620000.0, 4.703355999999985], [1443367296.0, 97640000.0, 4.6643040000000155], [1443369024.0, 97660000.0, 4.625243999999995], [1443370752.0, 97680000.0, 4.5861759999999805], [1443372480.0, 97700000.0, 4.547100000000015], [1443374208.0, 97720000.0, 4.508015999999998], [1443375936.0, 97740000.0, 4.468924000000015], [1443377664.0, 97760000.0, 4.429824000000011], [1443379392.0, 97780000.0, 4.390715999999998], [1443381120.0, 97800000.0, 4.351600000000019], [1443382848.0, 97820000.0, 4.312476000000004], [1443384576.0, 97840000.0, 4.27334399999998], [1443386304.0, 97860000.0, 4.234204000000005], [1443388032.0, 97880000.0, 4.195055999999994], [1443389760.0, 97900000.0, 4.155900000000017], [1443391488.0, 97920000.0, 4.116736000000003], [1443393216.0, 97940000.0, 4.077563999999995], [1443394944.0, 97960000.0, 4.038384000000008], [1443396672.0, 97980000.0, 3.9991959999999978], [1443398400.0, 98000000.0, 3.9599999999999795], [1443400128.0, 98020000.0, 3.92079600000001], [1443401856.0, 98040000.0, 3.8815839999999895], [1443403584.0, 98060000.0, 3.8423640000000177], [1443405312.0, 98080000.0, 3.803136000000009], [1443407040.0, 98100000.0, 3.7638999999999925], [1443408768.0, 98120000.0, 3.72465600000001], [1443410496.0, 98140000.0, 3.6854040000000055], [1443412224.0, 98160000.0, 3.6461439999999925], [1443413952.0, 98180000.0, 3.606876000000014], [1443415680.0, 98200000.0, 
3.5675999999999988], [1443417408.0, 98220000.0, 3.528316000000018], [1443419136.0, 98240000.0, 3.4890240000000006], [1443420864.0, 98260000.0, 3.449723999999989], [1443422592.0, 98280000.0, 3.410416000000012], [1443424320.0, 98300000.0, 3.3710999999999984], [1443426048.0, 98320000.0, 3.3317759999999907], [1443427776.0, 98340000.0, 3.2924440000000033], [1443429504.0, 98360000.0, 3.2531039999999933], [1443431232.0, 98380000.0, 3.213756000000018], [1443432960.0, 98400000.0, 3.1744000000000057], [1443434688.0, 98420000.0, 3.1350359999999853], [1443436416.0, 98440000.0, 3.0956640000000135], [1443438144.0, 98460000.0, 3.056284000000005], [1443439872.0, 98480000.0, 3.0168959999999885], [1443441600.0, 98500000.0, 2.9775000000000063], [1443443328.0, 98520000.0, 2.9380959999999874], [1443445056.0, 98540000.0, 2.898684000000017], [1443446784.0, 98560000.0, 2.859263999999996], [1443448512.0, 98580000.0, 2.819835999999995], [1443450240.0, 98600000.0, 2.7804000000000144], [1443451968.0, 98620000.0, 2.740955999999997], [1443453696.0, 98640000.0, 2.7015039999999857], [1443455424.0, 98660000.0, 2.6620440000000087], [1443457152.0, 98680000.0, 2.622575999999995], [1443458880.0, 98700000.0, 2.583100000000016], [1443460608.0, 98720000.0, 2.543616], [1443462336.0, 98740000.0, 2.5041239999999902], [1443464064.0, 98760000.0, 2.464624000000015], [1443465792.0, 98780000.0, 2.4251160000000027], [1443467520.0, 98800000.0, 2.3855999999999824], [1443469248.0, 98820000.0, 2.3460760000000107], [1443470976.0, 98840000.0, 2.306543999999988], [1443472704.0, 98860000.0, 2.2670040000000142], [1443474432.0, 98880000.0, 2.2274560000000037], [1443476160.0, 98900000.0, 2.187899999999985], [1443477888.0, 98920000.0, 2.1483360000000147], [1443479616.0, 98940000.0, 2.1087639999999936], [1443481344.0, 98960000.0, 2.0691839999999786], [1443483072.0, 98980000.0, 2.029596000000012], [1443484800.0, 99000000.0, 1.9899999999999949], [1443486528.0, 99020000.0, 1.950396000000012], [1443488256.0, 99040000.0, 1.9107840000000067], [1443489984.0, 99060000.0, 1.8711639999999932], [1443491712.0, 99080000.0, 1.831536000000014], [1443493440.0, 99100000.0, 1.7918999999999983], [1443495168.0, 99120000.0, 1.7522559999999885], [1443496896.0, 99140000.0, 1.712603999999999], [1443498624.0, 99160000.0, 1.6729439999999869], [1443500352.0, 99180000.0, 1.6332760000000093], [1443502080.0, 99200000.0, 1.593599999999995], [1443503808.0, 99220000.0, 1.5539159999999868], [1443505536.0, 99240000.0, 1.514224000000013], [1443507264.0, 99260000.0, 1.4745240000000024], [1443508992.0, 99280000.0, 1.4348159999999837], [1443510720.0, 99300000.0, 1.3950999999999993], [1443512448.0, 99320000.0, 1.3553759999999926], [1443514176.0, 99340000.0, 1.3156440000000202], [1443515904.0, 99360000.0, 1.275903999999997], [1443517632.0, 99380000.0, 1.236155999999994], [1443519360.0, 99400000.0, 1.1964000000000112], [1443521088.0, 99420000.0, 1.1566359999999918], [1443522816.0, 99440000.0, 1.1168639999999783], [1443524544.0, 99460000.0, 1.0770839999999993], [1443526272.0, 99480000.0, 1.0372959999999978], [1443528000.0, 99500000.0, 0.9975000000000165], [1443529728.0, 99520000.0, 0.9576959999999985], [1443531456.0, 99540000.0, 0.9178839999999866], [1443533184.0, 99560000.0, 0.8780640000000091], [1443534912.0, 99580000.0, 0.8382359999999949], [1443536640.0, 99600000.0, 0.7983999999999867], [1443538368.0, 99620000.0, 0.7585559999999987], [1443540096.0, 99640000.0, 0.7187039999999882], [1443541824.0, 99660000.0, 0.6788440000000122], [1443543552.0, 99680000.0, 0.6389759999999995], 
[1443545280.0, 99700000.0, 0.5990999999999929], [1443547008.0, 99720000.0, 0.5592160000000064], [1443548736.0, 99740000.0, 0.5193239999999975], [1443550464.0, 99760000.0, 0.4794239999999803], [1443552192.0, 99780000.0, 0.4395160000000118], [1443553920.0, 99800000.0, 0.3995999999999924], [1443555648.0, 99820000.0, 0.35967600000000743], [1443557376.0, 99840000.0, 0.31974400000000003], [1443559104.0, 99860000.0, 0.2798039999999844], [1443560832.0, 99880000.0, 0.23985600000000318], [1443562560.0, 99900000.0, 0.19989999999999952], [1443564288.0, 99920000.0, 0.15993599999998764], [1443566016.0, 99940000.0, 0.11996400000001017], [1443567744.0, 99960000.0, 0.07998399999999606], [1443569472.0, 99980000.0, 0.03999600000001635]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/app/demo/index.html b/tensorflow/tensorboard/app/demo/index.html
new file mode 100644
index 0000000000..a12b5abc26
--- /dev/null
+++ b/tensorflow/tensorboard/app/demo/index.html
@@ -0,0 +1,25 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <script>
+ function handleLoad() {
+ TF.Urls.runsUrl = function() {return "data/runs.json"};
+ TF.Urls.graphUrl = function(run) {
+ return "data/" + run + "-graph.pbtxt";
+ };
+ TF.Urls.scalarsUrl = function(tag, run) {
+ var path = "data/" + run.split("_")[0] + ".json";
+ return path;
+ };
+ }
+ </script>
+ <link rel="import" href="../tf-tensorboard.html" onload="handleLoad(event)">
+ <link rel="stylesheet" type="text/css" href="../../lib/css/global.css">
+ <title>TensorBoard Demo</title>
+ </head>
+ <body>
+ <script>handleLoad()</script>
+ <tf-tensorboard id="demo"></tf-tensorboard>
+ </body>
+</html>
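The demo page above overrides TF.Urls so the dashboards fetch static JSON from the checked-in data/ folder instead of a live backend. A minimal sketch of the resulting routing, assuming hypothetical run and tag names ("run1", "loss"):

    // Assuming the overrides installed by handleLoad() above have run:
    TF.Urls.runsUrl();                       // -> "data/runs.json"
    TF.Urls.graphUrl("run1");                // -> "data/run1-graph.pbtxt"
    TF.Urls.scalarsUrl("loss", "run1_tag");  // -> "data/run1.json" (text before the first "_")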
diff --git a/tensorflow/tensorboard/app/index.html b/tensorflow/tensorboard/app/index.html
new file mode 100644
index 0000000000..7adc063452
--- /dev/null
+++ b/tensorflow/tensorboard/app/index.html
@@ -0,0 +1,13 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <script src="../lib/js/analytics.js"></script>
+ <link rel="import" href="tf-tensorboard.html">
+ <link rel="stylesheet" type="text/css" href="../lib/css/global.css">
+ <title>TensorBoard</title>
+ </head>
+ <body>
+ <tf-tensorboard></tf-tensorboard>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/app/tf-tensorboard-demo.html b/tensorflow/tensorboard/app/tf-tensorboard-demo.html
new file mode 100644
index 0000000000..5f0ef5b00c
--- /dev/null
+++ b/tensorflow/tensorboard/app/tf-tensorboard-demo.html
@@ -0,0 +1,72 @@
+<link rel="import" href="../bower_components/polymer/polymer.html">
+<link rel="import" href="tf-tensorboard.html">
+<!--
+tf-tensorboard-demo creates a demo instance of TensorBoard.
+
+It expects to load data from the folder named by its dataDir property (e.g. tensorboard/).
+
+The way it ensures the urls are correct is quite hacky.
+TODO(danmane): Fix the url handling during cleanup.
+-->
+<dom-module id="tf-tensorboard-demo">
+ <template>
+ <template is="dom-if" if="[[_urlsReady]]">
+ <tf-tensorboard mode="[[mode]]"></tf-tensorboard>
+ </template>
+ <template is="dom-if" if="[[!_urlsReady]]">
+ <p>
+ urls not ready - probably because a dataDir was not provided
+ </p>
+ </template>
+ <style>
+ :host {
+ display: block;
+ width: 100%;
+ height: 100%;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-tensorboard-demo",
+ properties: {
+ mode: {
+ type: String,
+ value: "events",
+ },
+ dataDir: {
+ type: String,
+ },
+ _urlsReady: {
+ type: Boolean,
+ value: false,
+ },
+ },
+ observers: ['_setupUrls(dataDir)'],
+ _setupUrls: function(dataDir) {
+ function router(route) {
+ return function(tag, run) {
+ run = run.replace(/[ \)\(]/g, "_");
+ tag = tag.replace(/[ \)\(]/g, "_");
+ return dataDir + "/" + route + "/" + run + "/" + tag + ".json";
+ };
+ }
+ TF.Urls.runsUrl = function() {
+ return dataDir + "/runs.json"
+ };
+ TF.Urls.graphUrl = function(run) {
+ run = run.replace(/ /g, "_");
+ return dataDir + "/graph/" + run + ".pbtxt";
+ };
+ TF.Urls.scalarsUrl = router("scalars");
+ TF.Urls.histogramsUrl = router("histograms");
+ TF.Urls.compressedHistogramsUrl = router("compressedHistograms");
+ TF.Urls.imagesUrl = router("images");
+ TF.Urls.individualImageUrl = function(query) {
+ return dataDir + "/individualImage/" + query + ".png";
+ }
+ this._urlsReady = true;
+ },
+ });
+ </script>
+</dom-module>
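For reference, a sketch of the URL shapes the router above produces once _setupUrls fires; the dataDir, run, and tag values here are hypothetical. The router rewrites spaces and parentheses in run and tag names to underscores before using them as path segments:

    // Assuming dataDir = "demo_data" and hypothetical run/tag names:
    TF.Urls.scalarsUrl("xent (raw)", "train run");
    //   -> "demo_data/scalars/train_run/xent__raw_.json"
    TF.Urls.graphUrl("train run");
    //   -> "demo_data/graph/train_run.pbtxt"
    TF.Urls.individualImageUrl("index=0&tag=im");
    //   -> "demo_data/individualImage/index=0&tag=im.png"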
diff --git a/tensorflow/tensorboard/app/tf-tensorboard.html b/tensorflow/tensorboard/app/tf-tensorboard.html
new file mode 100644
index 0000000000..0f5114143e
--- /dev/null
+++ b/tensorflow/tensorboard/app/tf-tensorboard.html
@@ -0,0 +1,135 @@
+<link rel="import" href="../bower_components/polymer/polymer.html">
+<link rel="import" href="../bower_components/paper-toolbar/paper-toolbar.html">
+<link rel="import" href="../bower_components/paper-button/paper-button.html">
+<link rel="import" href="../bower_components/paper-header-panel/paper-header-panel.html">
+<link rel="import" href="../components/tf-event-dashboard/tf-event-dashboard.html">
+<link rel="import" href="../components/tf-histogram-dashboard/tf-histogram-dashboard.html">
+<link rel="import" href="../components/tf-image-dashboard/tf-image-dashboard.html">
+<link rel="import" href="../components/tf-graph-dashboard/tf-graph-dashboard.html">
+<link rel="import" href="../components/tf-dashboard-common/tensorboard-color.html">
+<!--
+tf-tensorboard is the frontend entry point for TensorBoard.
+
+It implements a toolbar (via paper-header-panel and paper-toolbar) that
+allows the user to toggle between various dashboards.
+-->
+<dom-module id="tf-tensorboard">
+ <template>
+ <paper-header-panel>
+ <paper-toolbar id="toolbar">
+ <div id="toolbar-content">
+ <div class="toolbar-title">
+ TensorBoard
+ </div>
+ <div class="right-buttons">
+ <paper-button
+ class="link-button"
+ on-click="chooseEvents"
+ active$="[[eventDashboard(mode)]]"
+ noink
+ >Events</paper-button>
+ <paper-button
+ class="link-button"
+ on-click="chooseImages"
+ active$="[[imageDashboard(mode)]]"
+ noink
+ >Images</paper-button>
+ <paper-button
+ class="link-button"
+ on-click="chooseGraphs"
+ active$="[[graphDashboard(mode)]]"
+ noink
+ >Graph</paper-button>
+ <paper-button
+ class="link-button"
+ on-click="chooseHistograms"
+ active$="[[histogramDashboard(mode)]]"
+ noink
+ >Histograms</paper-button>
+ </div>
+ </div>
+ </paper-toolbar>
+ <div id="content" class="fit">
+ <template is="dom-if" if="[[eventDashboard(mode)]]">
+ <tf-event-dashboard id="eventDash"></tf-event-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[imageDashboard(mode)]]">
+ <tf-image-dashboard id="imageDash"></tf-image-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[graphDashboard(mode)]]">
+ <tf-graph-dashboard id="graphDash"></tf-graph-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[histogramDashboard(mode)]]">
+ <tf-histogram-dashboard id="histogramDash"></tf-histogram-dashboard>
+ </template>
+ </div>
+ </paper-header-panel>
+ <style>
+ #toolbar {
+ background-color: var(--tb-orange-strong);
+ background-image: radial-gradient(ellipse, var(--tb-orange-weak), var(--tb-orange-strong));
+ }
+ #toolbar-content {
+ width: 100%;
+ display: flex;
+ flex-direction: row;
+ justify-content: space-between;
+ align-items: center;
+ }
+ .toolbar-title {
+ font-size: 30px;
+ }
+ #content {
+ height: 100%;
+ }
+ .link-button {
+ height: 30px;
+ }
+ [active] {
+ font-weight: bold;
+ }
+ :host {
+ height: 100%;
+ display: block;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-tensorboard",
+ properties: {
+ mode: {
+ type: String,
+ value: "events",
+ },
+ },
+ chooseEvents: function() {
+ this.mode = "events";
+ },
+ chooseImages: function() {
+ this.mode = "images";
+ },
+ chooseGraphs: function() {
+ this.mode = "graphs";
+ },
+ chooseHistograms: function() {
+ this.mode = "histograms";
+ },
+ eventDashboard: function(mode) {
+ return mode === "events";
+ },
+ imageDashboard: function(mode) {
+ return mode === "images";
+ },
+ graphDashboard: function(mode) {
+ return mode === "graphs";
+ },
+ histogramDashboard: function(mode) {
+ return mode === "histograms";
+ }
+ });
+ </script>
+</dom-module>
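The mode property is the single piece of state behind the toolbar: each button writes it and each dom-if template reads it, so only one dashboard is stamped at a time. A small sketch of driving it programmatically (the element handle is hypothetical):

    var tb = document.querySelector("tf-tensorboard");
    tb.mode = "graphs";      // stamps <tf-graph-dashboard>, same as clicking "Graph"
    tb.mode = "histograms";  // stamps <tf-histogram-dashboard>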
diff --git a/tensorflow/tensorboard/bower.json b/tensorflow/tensorboard/bower.json
new file mode 100644
index 0000000000..bdd16d662a
--- /dev/null
+++ b/tensorflow/tensorboard/bower.json
@@ -0,0 +1,50 @@
+{
+ "name": "tf-vis",
+ "version": "0.0.0",
+ "description": "Visualizations for TensorFlow",
+ "authors": [
+ "Google"
+ ],
+ "license": "Apache-2.0",
+ "ignore": [
+ "**/.*",
+ "node_modules",
+ "bower_components",
+ "test",
+ "tests"
+ ],
+ "private": true,
+ "dependencies": {
+ "d3": "3.5.6",
+ "dagre": "~0.7.4",
+ "es6-promise": "~3.0.2",
+ "graphlib": "~1.0.7",
+ "iron-ajax": "PolymerElements/iron-ajax#~1.0.8",
+ "iron-collapse": "PolymerElements/iron-collapse#~1.0.4",
+ "iron-list": "PolymerElements/iron-list#~1.1.5",
+ "paper-button": "PolymerElements/paper-button#~1.0.7",
+ "paper-checkbox": "PolymerElements/paper-checkbox#~1.0.6",
+ "paper-dropdown-menu": "PolymerElements/paper-dropdown-menu#~1.0.4",
+ "paper-header-panel": "PolymerElements/paper-header-panel#~1.0.5",
+ "paper-icon-button": "PolymerElements/paper-icon-button#~1.0.3",
+ "paper-input": "PolymerElements/paper-input#~1.0.15",
+ "paper-item": "PolymerElements/paper-item#~1.0.3",
+ "paper-menu": "PolymerElements/paper-menu#~1.1.1",
+ "paper-progress": "PolymerElements/paper-progress#~1.0.7",
+ "paper-radio-button": "PolymerElements/paper-radio-button#~1.0.8",
+ "paper-radio-group": "PolymerElements/paper-radio-group#~1.0.4",
+ "paper-slider": "PolymerElements/paper-slider#~1.0.4",
+ "paper-styles": "PolymerElements/paper-styles#~1.0.11",
+ "paper-toggle-button": "PolymerElements/paper-toggle-button#~1.0.6",
+ "paper-toolbar": "PolymerElements/paper-toolbar#~1.0.4",
+ "plottable": "~1.16.1",
+ "polymer": "~1.2.0"
+ },
+ "devDependencies": {
+ "iron-component-page": "PolymerElements/iron-component-page#^1.0.0",
+ "web-component-tester": "*"
+ },
+ "resolutions": {
+ "d3": "3.5.6"
+ }
+}
diff --git a/tensorflow/tensorboard/bower/BUILD b/tensorflow/tensorboard/bower/BUILD
new file mode 100644
index 0000000000..709760b312
--- /dev/null
+++ b/tensorflow/tensorboard/bower/BUILD
@@ -0,0 +1,76 @@
+package(default_visibility = ["//tensorflow:internal"])
+
+filegroup(
+ name = "bower",
+ srcs = [
+ "@accessibility-developer-tools//:accessibility-developer-tools",
+ "@async//:async",
+ "@chai//:chai",
+ "@d3//:d3",
+ "@dagre//:dagre",
+ "@es6-promise//:es6-promise",
+ "@font-roboto//:font-roboto",
+ "@graphlib//:graphlib",
+ "@hydrolysis//:hydrolysis",
+ "@iron-a11y-keys-behavior//:iron-a11y-keys-behavior",
+ "@iron-ajax//:iron-ajax",
+ "@iron-autogrow-textarea//:iron-autogrow-textarea",
+ "@iron-behaviors//:iron-behaviors",
+ "@iron-checked-element-behavior//:iron-checked-element-behavior",
+ "@iron-collapse//:iron-collapse",
+ "@iron-component-page//:iron-component-page",
+ "@iron-doc-viewer//:iron-doc-viewer",
+ "@iron-dropdown//:iron-dropdown",
+ "@iron-fit-behavior//:iron-fit-behavior",
+ "@iron-flex-layout//:iron-flex-layout",
+ "@iron-form-element-behavior//:iron-form-element-behavior",
+ "@iron-icon//:iron-icon",
+ "@iron-icons//:iron-icons",
+ "@iron-iconset-svg//:iron-iconset-svg",
+ "@iron-input//:iron-input",
+ "@iron-list//:iron-list",
+ "@iron-menu-behavior//:iron-menu-behavior",
+ "@iron-meta//:iron-meta",
+ "@iron-overlay-behavior//:iron-overlay-behavior",
+ "@iron-range-behavior//:iron-range-behavior",
+ "@iron-resizable-behavior//:iron-resizable-behavior",
+ "@iron-selector//:iron-selector",
+ "@iron-validatable-behavior//:iron-validatable-behavior",
+ "@lodash//:lodash",
+ "@marked-element//:marked-element",
+ "@marked//:marked",
+ "@mocha//:mocha",
+ "@neon-animation//:neon-animation",
+ "@paper-behaviors//:paper-behaviors",
+ "@paper-button//:paper-button",
+ "@paper-checkbox//:paper-checkbox",
+ "@paper-dropdown-menu//:paper-dropdown-menu",
+ "@paper-header-panel//:paper-header-panel",
+ "@paper-icon-button//:paper-icon-button",
+ "@paper-input//:paper-input",
+ "@paper-item//:paper-item",
+ "@paper-material//:paper-material",
+ "@paper-menu-button//:paper-menu-button",
+ "@paper-menu//:paper-menu",
+ "@paper-progress//:paper-progress",
+ "@paper-radio-button//:paper-radio-button",
+ "@paper-radio-group//:paper-radio-group",
+ "@paper-ripple//:paper-ripple",
+ "@paper-slider//:paper-slider",
+ "@paper-styles//:paper-styles",
+ "@paper-toggle-button//:paper-toggle-button",
+ "@paper-toolbar//:paper-toolbar",
+ "@plottable//:plottable",
+ "@polymer//:polymer",
+ "@prism-element//:prism-element",
+ "@prism//:prism",
+ "@promise-polyfill//:promise-polyfill",
+ "@sinon-chai//:sinon-chai",
+ "@sinonjs//:sinonjs",
+ "@stacky//:stacky",
+ "@svg-typewriter//:svg-typewriter",
+ "@web-animations-js//:web-animations-js",
+ "@web-component-tester//:web-component-tester",
+ "@webcomponentsjs//:webcomponentsjs",
+ ],
+)
diff --git a/tensorflow/tensorboard/components/hydrogen-join/demo/index.html b/tensorflow/tensorboard/components/hydrogen-join/demo/index.html
new file mode 100644
index 0000000000..238cff294d
--- /dev/null
+++ b/tensorflow/tensorboard/components/hydrogen-join/demo/index.html
@@ -0,0 +1,118 @@
+<!doctype html>
+<!--
+@license
+Copyright (c) 2015 The Polymer Project Authors. All rights reserved.
+This code may only be used under the BSD style license found at http://polymer.github.io/LICENSE.txt
+The complete set of authors may be found at http://polymer.github.io/AUTHORS.txt
+The complete set of contributors may be found at http://polymer.github.io/CONTRIBUTORS.txt
+Code distributed by Google as part of the polymer project is also
+subject to an additional IP rights grant found at http://polymer.github.io/PATENTS.txt
+-->
+<html>
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, minimum-scale=1.0, initial-scale=1.0, user-scalable=yes">
+ <title>tf-graph Demo</title>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../hydrogen-join.html">
+ <link rel="import" href="../../../bower_components/paper-button/paper-button.html">
+ <link rel="import" href="../../../bower_components/paper-slider/paper-slider.html">
+ </head>
+
+ <style>
+ body {
+ font-family: "RobotoDraft","Roboto",sans-serif;
+ font-size: 14px;
+ }
+ </style>
+
+ <body unresolved>
+ <dom-module id="button-press-counter">
+ <style>
+ paper-button {
+ background-color: #4DB6AC;
+ color: white;
+ }
+ </style>
+ <template>
+ <paper-button raised on-click="increment"><span>[[count]]</span></paper-button>
+ </template>
+ <script>
+ Polymer({
+ is: "button-press-counter",
+ properties: {count: {notify: true, value: 0}},
+ increment: function() {this.count++;}
+ });
+ </script>
+ </dom-module>
+
+ <dom-module id="args-demo">
+ <template>
+ <h1>args-demo</h1>
+ <button-press-counter count="{{in1}}"></button-press-counter>
+ <button-press-counter count="{{in2}}"></button-press-counter>
+ <button-press-counter count="{{in3}}"></button-press-counter>
+ <hydrogen-join
+ in1="[[in1]]"
+ in2="[[in2]]"
+ in3="[[in3]]"
+ out="{{out}}"
+ id="argsjoin"
+ ></hydrogen-join>
+ <p>Output from the hydrogen-join: <span>[[out]]</span></p>
+ </template>
+ <script>
+ Polymer({
+ is: "args-demo",
+ properties: {
+ in1: Number,
+ in2: Number,
+ in3: Number,
+ out: Array,
+ },
+ });
+ </script>
+ </dom-module>
+
+ <dom-module id="repeat-demo">
+ <style>
+ .button {
+ padding: 3px;
+ margin-bottom: 4px;
+ display: inline-block;
+ }
+ </style>
+ <template>
+ <h1>repeat-demo</h1>
+ <hydrogen-join id="repeatjoin" in-join-property="count" out="{{out}}">
+ <template is="dom-repeat" items="[[counters]]">
+ <button-press-counter class="button" count="[[item]]"></button-press-counter>
+ </template>
+ </hydrogen-join>
+ <br>
+ <p> Move this slider to add/remove buttons. It stays synced! What magic! </p>
+ <paper-slider min="0" max="20" value="{{nCounters}}"></paper-slider>
+ <p>Output from the hydrogen-join: <span>[[out]]</span></p>
+ </template>
+ <script>
+ Polymer({
+ is: "repeat-demo",
+ properties: {
+ nCounters: {type: Number, value: 10},
+ counters: {type: Array, computed: "_makeCounters(nCounters)"},
+ },
+ _makeCounters: function(nCounters) {
+ var c = [];
+ for (var i=0; i<nCounters; i++) {
+ c.push(i);
+ }
+ return c;
+ }
+ });
+ </script>
+ </dom-module>
+
+ <args-demo id="argsdemo"></args-demo>
+ <repeat-demo id="repeatdemo"></repeat-demo>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/components/hydrogen-join/hydrogen-join.html b/tensorflow/tensorboard/components/hydrogen-join/hydrogen-join.html
new file mode 100644
index 0000000000..9f8dcfe2af
--- /dev/null
+++ b/tensorflow/tensorboard/components/hydrogen-join/hydrogen-join.html
@@ -0,0 +1,118 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<!--
+A plumber component which joins a number of input bindings. It outputs a list
+consisting of all of the inputs that are not null or undefined.
+
+It can take explicit arguments [in0...in98] or it can pull its arguments from
+a dom-repeat template, provided that it is given the property name it is looking
+for on the repeated children, and that property has {notify: true}. The repeat
+binding involves some magic and may not be totally reliable; ping danmane@ if
+something goes wrong.
+
+Example:
+ <hydrogen-join
+ in1="[[foo1]]"
+ in2="[[foo2]]"
+ in3="[[foo3]]"
+ out="{{foos}}" // foos becomes [foo1, foo2, foo3]
+ ></hydrogen-join>
+
+ <hydrogen-join out="{{foo}}" in-join-property="out">
+ <template is="dom-repeat" items="[[foos]]">
+ <foo item="[[item]]"></foo> //foo has a property out: {type: Array, notify: true}
+ </template>
+ </hydrogen-join>
+
+Swapping the inJoinProperty is not currently supported; it will warn if you try.
+
+There's a possible bug in repeat mode: if an element is removed from the dom-repeat
+but continues to exist somewhere else and keeps firing property-changed events,
+the hydrogen-join will still record its value, even though the element is no
+longer part of the hydrogen-join's DOM.
+
+@demo
+-->
+<dom-module id="hydrogen-join">
+ <script>
+ var declaration = {
+ is: 'hydrogen-join',
+ properties: {
+ out: {type: Array, readOnly: true, notify: true},
+ _items: {type: Array, value: function() {return [];}},
+ /* Property to pull from dom-repeated child nodes and join */
+ inJoinProperty: {type: String, observer: "_modifyJoinProperty"},
+ },
+ listeners: {
+ "dom-change": "_syncListenersAndState",
+ },
+ /* If we are in repeat-mode, ensure all event listeners are set up, and pull
+ * items out of whatever children currently exist
+ */
+ _syncListenersAndState: function() {
+ if (this.inJoinProperty == null) {
+ // no inJoinProperty set, so we are not in repeat mode; nothing to sync here
+ return;
+ }
+ function repeatUpdateFunction(i) {
+ return function(e) {
+ this._items[i] = e.detail.value;
+ // Debounce just in case something else thrashes
+ this.debounce("updateOut", this._updateOutput)
+ }
+ }
+ // create a new items array to replace the old one, otherwise old items
+ // might never get removed when their corresponding element leaves
+ this._items = [];
+ // Sadly, we need to bind an update function onto every child node,
+ // because the property-changed event does not bubble.
+ for (var i=0; i<this.childNodes.length; i++) {
+ var child = this.childNodes[i];
+ if (child.properties != null && child.properties[this.inJoinProperty] != null) {
+ child.addEventListener(this.inJoinProperty + "-changed", repeatUpdateFunction(i).bind(this));
+ this._items[i] = child[this.inJoinProperty];
+ }
+ }
+ this._updateOutput();
+ },
+ _modifyJoinProperty: function(newJoinProperty, oldJoinProperty) {
+ if (oldJoinProperty != null) {
+ console.warn("Changing the join property may be unsafe. Have fun!");
+ }
+ this._syncListenersAndState();
+ },
+ _updateOutput: function() {
+ var out = [];
+ for (var i=0; i<99; i++) {
+ if (this._items[i] != null) {
+ out.push(this._items[i]);
+ }
+ }
+ this._setOut(out);
+ }
+ };
+
+ /* Programmatically add properties in0-in98, with associated observers */
+ function argsUpdateFunction(i) {
+ return function(newval) {
+ this._items[i] = newval;
+ if (i === 98) {
+ console.warn("You're updating the last arg (98). Possibly some values are being lost");
+ }
+ if (this.inJoinProperty != null) {
+ console.warn("It looks like you're providing a join property and also arguments. This is not supported.")
+ }
+ this.debounce("updateOut", this._updateOutput)
+ }
+ }
+ // I got 99 arguments and ain't off by 1!
+ for (var i = 0; i < 99; i++) {
+ var propName = "in" + i;
+ var updateName = "_update" + i;
+ var property = {type: Object, observer: updateName};
+ declaration.properties[propName] = property;
+ declaration[updateName] = argsUpdateFunction(i);
+ }
+ Polymer(declaration);
+ </script>
+</dom-module>
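In args mode, each inN observer writes into _items and debounces a rebuild, so out is simply the non-null inputs in index order. A minimal sketch, assuming a programmatically created element:

    var join = document.createElement("hydrogen-join");
    join.in0 = "a";
    join.in2 = "c";   // in1 is left undefined and is skipped
    // after the "updateOut" debounce fires:
    // join.out === ["a", "c"]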
diff --git a/tensorflow/tensorboard/components/hydrogen-set/demo/index.html b/tensorflow/tensorboard/components/hydrogen-set/demo/index.html
new file mode 100644
index 0000000000..6a1735a699
--- /dev/null
+++ b/tensorflow/tensorboard/components/hydrogen-set/demo/index.html
@@ -0,0 +1,106 @@
+<!doctype html>
+<!--
+@license
+Copyright (c) 2016 The Polymer Project Authors. All rights reserved.
+This code may only be used under the BSD style license found at http://polymer.github.io/LICENSE.txt
+The complete set of authors may be found at http://polymer.github.io/AUTHORS.txt
+The complete set of contributors may be found at http://polymer.github.io/CONTRIBUTORS.txt
+Code distributed by Google as part of the polymer project is also
+subject to an additional IP rights grant found at http://polymer.github.io/PATENTS.txt
+-->
+<html>
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, minimum-scale=1.0, initial-scale=1.0, user-scalable=yes">
+ <title>hydrogen-set demo</title>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../hydrogen-set.html">
+ <link rel="import" href="../../../bower_components/paper-checkbox/paper-checkbox.html">
+ </head>
+
+ <style>
+ body {
+ margin: 0;
+ font-family: "RobotoDraft","Roboto",sans-serif;
+ font-size: 14px;
+ }
+ </style>
+
+ <body unresolved>
+ <dom-module id="x-a">
+ <template>
+ <template is="dom-repeat" items="[[items]]">
+ <span>[[item]]</span>
+ <paper-checkbox
+ name="[[item]]"
+ on-iron-change="_onChange"
+ checked="[[_isSelected(item, selected.*)]]"
+ ></paper-checkbox>
+ </template>
+ </template>
+ </dom-module>
+ <script>
+ Polymer({
+ is: 'x-a',
+ properties: {
+ selected: {
+ type: Array
+ },
+ items: {
+ type: Array,
+ value: function() {
+ return ["a","b","c","d","e"];
+ }
+ }
+ },
+ _onChange: function(e) {
+ var name = e.target.name;
+ if (name) {
+ if (e.target.checked) {
+ this.fire('select', name);
+ } else {
+ this.fire('unselect', name);
+ }
+ }
+ },
+ _isSelected: function(item, selected) {
+ return selected.base.indexOf(item) >= 0;
+ }
+ });
+ </script>
+ <template is="dom-bind">
+ <hydrogen-set
+ id="set"
+ event-add="{{add}}"
+ event-delete="{{del}}"
+ out-value="{{selected}}"
+ ></hydrogen-set>
+ <div>
+ Mutate the two sets below.
+ </div>
+ <br/>
+ <div>First Set</div>
+ <x-a
+ id="a"
+ on-select="add"
+ on-unselect="del"
+ selected="[[selected]]"
+ ></x-a>
+ <br/>
+ <br/>
+ <div>Second Set</div>
+ <x-a
+ id="b"
+ on-select="add"
+ on-unselect="del"
+ selected="[[selected]]"
+ ></x-a>
+ <br/>
+ <br/>
+ <div>List Selected:</div>
+ <template is="dom-repeat" items="[[selected]]">
+ <div>[[item]]</div>
+ </template>
+ </template>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/components/hydrogen-set/hydrogen-set.html b/tensorflow/tensorboard/components/hydrogen-set/hydrogen-set.html
new file mode 100644
index 0000000000..28fb8bd2a1
--- /dev/null
+++ b/tensorflow/tensorboard/components/hydrogen-set/hydrogen-set.html
@@ -0,0 +1,174 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<!--
+hydrogen-set is a plumber component that is driven by events
+and produces data in the form of an upward-bindable set.
+It provides event handler functions such as event-add and
+event-delete, which may be attached elsewhere as event listeners.
+The hydrogen-set then captures those events and adds their
+detail values to, or removes them from, an internal set, which it
+publishes as out-value.
+
+As an example, you may have a list of input widgets that
+generate strings by firing an event. If you attach
+the hydrogen-set `event-add` function as the handler
+for the input widgets' input event, then the hydrogen-set
+will collect and deduplicate all of the strings the
+user generated.
+
+Thus, hydrogen-set is useful for capturing semantic
+events from across an application, and organizing
+the data generated into a single store.
+
+Example:
+
+ <hydrogen-set
+ event-add="{{add}}"
+ event-delete="{{del}}"
+ out-value="{{selected}}"
+ ></hydrogen-set>
+ <element-one
+ selected="{{selected}}"
+ on-select="add"
+ on-deselect="del"
+ ></element-one>
+ <element-two
+ selected="{{selected}}"
+ on-select="add"
+ on-deselect="del"
+ ></element-two>
+
+@demo demo/index.html
+-->
+<script>
+Polymer({
+ is: 'hydrogen-set',
+ properties: {
+ /**
+ * A function to bind to an event callback where
+ * the detail value is the item to add.
+ *
+ * @property eventAdd
+ * @type Function
+ */
+ eventAdd: {
+ readOnly: true,
+ notify: true,
+ type: Function,
+ value: function() {
+ return function(e) {
+ this.add(e.detail);
+ }.bind(this);
+ }
+ },
+ /**
+ * A function to bind to an event callback where
+ * the detail value is the item to remove.
+ *
+ * @property eventDelete
+ * @type Function
+ */
+ eventDelete: {
+ readOnly: true,
+ notify: true,
+ type: Function,
+ value: function() {
+ return function(e) {
+ this.delete(e.detail);
+ }.bind(this);
+ }
+ },
+ /**
+ * A function to bind to an event callback where
+ * the detail value is the list of items that should
+ * replace the current set.
+ *
+ * @property eventUpdate
+ * @type Function
+ */
+ eventUpdate: {
+ readOnly: true,
+ notify: true,
+ type: Function,
+ value: function() {
+ return function(e) {
+ this.update(e.detail);
+ }.bind(this);
+ }
+ },
+ /**
+ * A function to bind to an event callback
+ * which, when called, resets the set to the
+ * empty set ([]).
+ *
+ * @property eventClear
+ * @type Function
+ */
+ eventClear: {
+ readOnly: true,
+ notify: true,
+ type: Function,
+ value: function() {
+ return function(e) {
+ this.clear();
+ }.bind(this);
+ }
+ },
+ /**
+ * The read-only array representing the set of
+ * items in this set.
+ *
+ * @property outValue
+ * @type Array
+ * @default []
+ */
+ outValue: {
+ type: Array,
+ readOnly: true,
+ notify: true,
+ value: function() {
+ return [];
+ }
+ }
+ },
+ /**
+ * Adds an item to the set.
+ */
+ add: function(value) {
+ if (this.outValue.indexOf(value) >= 0) { return; }
+ this.push('outValue', value);
+ },
+ /**
+ * Removes an item from the set.
+ */
+ delete: function(value) {
+ var i = this.outValue.indexOf(value);
+ if (i < 0) { return; }
+ this.splice('outValue', i, 1);
+ },
+ /**
+ * Sets the set to a specific array of items.
+ */
+ update: function(value) {
+ if (value.constructor === Array) {
+ var uniq = {};
+ var list = [];
+ for (var i=0, l = value.length; i < l; ++i) {
+ var item = value[i];
+ if (uniq.hasOwnProperty(item)) {
+ continue;
+ }
+ list.push(item);
+ uniq[item] = true;
+ }
+ this._setOutValue(list)
+ }
+ },
+ /**
+ * Resets the set to the empty set.
+ */
+ clear: function() {
+ this.update([]);
+ }
+});
+</script>
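The add/delete/update methods above keep outValue deduplicated, whether they are reached through the bound event handlers or called directly. A small sketch of the direct calls (the element handle is hypothetical):

    var s = document.createElement("hydrogen-set");
    s.add("a");
    s.add("a");                   // ignored: already present
    s.add("b");                   // s.outValue === ["a", "b"]
    s.delete("a");                // s.outValue === ["b"]
    s.update(["x", "y", "x"]);    // deduplicated: s.outValue === ["x", "y"]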
diff --git a/tensorflow/tensorboard/components/imports/d3.html b/tensorflow/tensorboard/components/imports/d3.html
new file mode 100644
index 0000000000..d63c480fd9
--- /dev/null
+++ b/tensorflow/tensorboard/components/imports/d3.html
@@ -0,0 +1 @@
+<script src="../../bower_components/d3/d3.min.js"></script>
diff --git a/tensorflow/tensorboard/components/imports/lodash.html b/tensorflow/tensorboard/components/imports/lodash.html
new file mode 100644
index 0000000000..1e94d2c1c4
--- /dev/null
+++ b/tensorflow/tensorboard/components/imports/lodash.html
@@ -0,0 +1 @@
+<script src="../../bower_components/lodash/lodash.min.js"></script>
diff --git a/tensorflow/tensorboard/components/imports/plottable.html b/tensorflow/tensorboard/components/imports/plottable.html
new file mode 100644
index 0000000000..08e636886a
--- /dev/null
+++ b/tensorflow/tensorboard/components/imports/plottable.html
@@ -0,0 +1,3 @@
+<link rel="import" href="d3.html">
+<script src="../../bower_components/plottable/plottable.min.js"></script>
+<link rel="stylesheet" type="text/css" href="../../bower_components/plottable/plottable.css">
diff --git a/tensorflow/tensorboard/components/tf-categorizer/categorizer.ts b/tensorflow/tensorboard/components/tf-categorizer/categorizer.ts
new file mode 100644
index 0000000000..e05078279e
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-categorizer/categorizer.ts
@@ -0,0 +1,133 @@
+/// <reference path="../../typings/tsd.d.ts" />
+
+module Categorizer {
+ /**
+ * This module contains methods that allow sorting tags into "categories".
+ * A category contains a name and a list of tags.
+ * The sorting strategy is defined by a "CustomCategorization", which contains
+ * "categoryDefinitions" which are regex rules used to construct a category.
+ * E.g. the regex rule "xent" will create a category called "xent" that
+ * contains values whose tags match the regex.
+ *
+ * After custom categories are evaluated, the tags are sorted by a hardcoded
+ * fallback categorizer, which may, for example, group tags into categories
+ * based on their top namespace.
+ */
+
+ export interface Category {
+ // Categories that data is sorted into
+ name: string;
+ tags: string[];
+ }
+
+ export interface CustomCategorization {
+ // Defines a categorization strategy
+ categoryDefinitions: string[];
+ fallbackCategorizer: string;
+ /* {"TopLevelNamespaceCategorizer",
+ "LegacyUnderscoreCategorizer"} */
+ }
+
+ export interface Categorizer {
+ // Function that generates categories
+ (tags: string[]): Category[];
+ }
+
+ /* Canonical TensorFlow ops are namespaced using forward slashes.
+ * This fallback categorizer categorizes by the top-level namespace.
+ */
+ export var topLevelNamespaceCategorizer: Categorizer = splitCategorizer(/\//);
+
+ // Try to produce good categorizations on legacy graphs, which often
+ // are namespaced like l1_foo/bar or l2_baz/bam.
+ // If there is no leading underscore before the first forward slash,
+ // then it behaves the same as topLevelNamespaceCategorizer.
+ export var legacyUnderscoreCategorizer: Categorizer = splitCategorizer(/[\/_]/);
+
+ export function fallbackCategorizer(s: string): Categorizer {
+ switch (s) {
+ case "TopLevelNamespaceCategorizer":
+ return topLevelNamespaceCategorizer;
+ case "LegacyUnderscoreCategorizer":
+ return legacyUnderscoreCategorizer;
+ default:
+ throw new Error("Unrecognized categorization strategy: " + s);
+ }
+ }
+
+ /* An "extractor" is a function that takes a tag name, and "extracts" a category name.
+ * This function takes an extractor, and produces a categorizer.
+ * Currently, it is just used for the fallbackCategorizer, but we may want to
+ * refactor the general categorization logic to use the concept of extractors.
+ */
+ function extractorToCategorizer(extractor: (s: string) => string): Categorizer {
+ return (tags: string[]): Category[] => {
+ if (tags.length === 0) {
+ return [];
+ }
+ var sortedTags = tags.slice().sort();
+ var categories: Category[] = [];
+ var currentCategory = {
+ name: extractor(sortedTags[0]),
+ tags: [],
+ };
+ sortedTags.forEach((t: string) => {
+ var topLevel = extractor(t);
+ if (currentCategory.name !== topLevel) {
+ categories.push(currentCategory);
+ currentCategory = {
+ name: topLevel,
+ tags: [],
+ };
+ }
+ currentCategory.tags.push(t);
+ });
+ categories.push(currentCategory);
+ return categories;
+ };
+ }
+
+ function splitCategorizer(r: RegExp): Categorizer {
+ var extractor = (t: string) => {
+ return t.split(r)[0];
+ };
+ return extractorToCategorizer(extractor);
+ }
+
+ export interface CategoryDefinition {
+ name: string;
+ matches: (t: string) => boolean;
+ }
+
+ export function defineCategory(ruledef: string): CategoryDefinition {
+ var r = new RegExp(ruledef);
+ var f = function(tag: string): boolean {
+ return r.test(tag);
+ };
+ return { name: ruledef, matches: f };
+ }
+
+ export function _categorizer(rules: CategoryDefinition[], fallback: Categorizer) {
+ return function(tags: string[]): Category[] {
+ var remaining: d3.Set = d3.set(tags);
+ var userSpecified = rules.map((def: CategoryDefinition) => {
+ var tags: string[] = [];
+ remaining.forEach((t: string) => {
+ if (def.matches(t)) {
+ tags.push(t);
+ }
+ });
+ var cat = { name: def.name, tags: tags.sort() };
+ return cat;
+ });
+ var defaultCategories = fallback(remaining.values());
+ return userSpecified.concat(defaultCategories);
+ };
+ }
+
+ export function categorizer(s: CustomCategorization): Categorizer {
+ var rules = s.categoryDefinitions.map(defineCategory);
+ var fallback = fallbackCategorizer(s.fallbackCategorizer);
+ return _categorizer(rules, fallback);
+ };
+}
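A short usage sketch of the categorizer above, with hypothetical tag names. Note that, as written, a tag matched by a custom rule is not removed from the remaining set, so it also appears under its fallback category:

    var c = Categorizer.categorizer({
      categoryDefinitions: ["xent"],
      fallbackCategorizer: "TopLevelNamespaceCategorizer",
    });
    c(["train/xent", "train/accuracy", "eval/accuracy"]);
    // -> [{name: "xent",  tags: ["train/xent"]},
    //     {name: "eval",  tags: ["eval/accuracy"]},
    //     {name: "train", tags: ["train/accuracy", "train/xent"]}]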
diff --git a/tensorflow/tensorboard/components/tf-categorizer/demo/index.html b/tensorflow/tensorboard/components/tf-categorizer/demo/index.html
new file mode 100644
index 0000000000..ea3f162aa5
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-categorizer/demo/index.html
@@ -0,0 +1,97 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <script src="../../../bower_components/d3/d3.js"></script>
+ <link rel="import" href="../tf-categorizer.html">
+ <link rel="import" href="../../../bower_components/iron-flex-layout/classes/iron-flex-layout.html">
+
+ </head>
+ <body>
+ <style>
+ </style>
+ <dom-module id="x-demo">
+ <style>
+ .container {
+ width: 255px;
+ padding: 10px;
+ border: 1px solid var(--paper-indigo-900);
+ border-radius: 5px;
+ position: fixed;
+ }
+ :host {
+ margin: 0px;
+ }
+
+ .categories {
+ font-family: "RobotoDraft",Helvetica;
+ margin-left: 300px;
+ width: 500px;
+ border: 1px solid var(--paper-indigo-500);
+ border-radius: 5px;
+ }
+
+ .category {
+ background-color: var(--paper-indigo-50);
+ margin: 20px;
+ padding: 20px;
+ border-radius: 5px;
+ }
+
+ .cat-name {
+ font-size: 20px;
+ }
+
+ .tag {
+ border-radius: 5px;
+ padding: 5px;
+ margin: 5px;
+ background-color: var(--paper-indigo-900);
+ color: white;
+ }
+ </style>
+ <template>
+ <div class="container">
+ <tf-categorizer categories="{{categories}}" tags="[[tags]]" id="demo"></tf-categorizer>
+ </div>
+ <div class="categories">
+ <template is="dom-repeat" items="[[categories]]">
+ <div class="category">
+ <p class="cat-name">Category: <span>[[item.name]]</span></p>
+ <div class="tags-container layout horizontal wrap">
+ <template is="dom-repeat" items="[[item.tags]]">
+ <span class="tag layout vertical center-center">[[item]]</span>
+ </template>
+ </div>
+ </div>
+ </template>
+ </div>
+ </template>
+ <script>
+
+ function tagsGenerator() {
+ var tags = ["special1", "special2", "special3", "special4", "special5"];
+ ["l1", "l2", "l3", "l4", "l5"].forEach(function(l) {
+ ["foo", "bar", "baz", "boink", "zod", "specialx"].forEach(function(x) {
+ tags.push(l + "/" + x);
+ });
+ });
+ return tags;
+ }
+
+ Polymer({
+ is: "x-demo",
+ properties: {
+ tags: { type: Array, value: tagsGenerator },
+ },
+ });
+ </script>
+ </dom-module>
+
+ <x-demo id="demo"></x-demo>
+ </body>
+ <script>
+ HTMLImports.whenReady(function() {
+ window.demo = document.getElementById("demo");
+ })
+ </script>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-categorizer/index.html b/tensorflow/tensorboard/components/tf-categorizer/index.html
new file mode 100644
index 0000000000..f08a125f7c
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-categorizer/index.html
@@ -0,0 +1,18 @@
+<!doctype html>
+<html>
+<head>
+
+ <title>tf-categorizer</title>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+ <script src="../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../../bower_components/iron-component-page/iron-component-page.html">
+
+</head>
+<body>
+
+ <iron-component-page></iron-component-page>
+
+</body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-categorizer/test/categorizerTest.ts b/tensorflow/tensorboard/components/tf-categorizer/test/categorizerTest.ts
new file mode 100644
index 0000000000..be09c56c41
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-categorizer/test/categorizerTest.ts
@@ -0,0 +1,139 @@
+/// <reference path="../../../typings/tsd.d.ts" />
+/// <reference path="../categorizer.ts" />
+var assert = chai.assert;
+
+module Categorizer {
+ describe("categorizer", () => {
+ describe("topLevelNamespaceCategorizer", () => {
+ it("returns empty array on empty tags", () => {
+ assert.lengthOf(topLevelNamespaceCategorizer([]), 0);
+ });
+
+ it("handles a simple case", () => {
+ var simple = ["foo1/bar", "foo1/zod", "foo2/bar", "foo2/zod",
+ "gosh/lod/mar", "gosh/lod/ned"];
+ var expected = [
+ { name: "foo1", tags: ["foo1/bar", "foo1/zod"] },
+ { name: "foo2", tags: ["foo2/bar", "foo2/zod"] },
+ { name: "gosh", tags: ["gosh/lod/mar", "gosh/lod/ned"] },
+ ];
+ assert.deepEqual(topLevelNamespaceCategorizer(simple), expected);
+ });
+
+ it("orders the categories", () => {
+ var test = ["e", "f", "g", "a", "b", "c"];
+ var expected = [
+ { name: "a", tags: ["a"] },
+ { name: "b", tags: ["b"] },
+ { name: "c", tags: ["c"] },
+ { name: "e", tags: ["e"] },
+ { name: "f", tags: ["f"] },
+ { name: "g", tags: ["g"] },
+ ];
+ assert.deepEqual(topLevelNamespaceCategorizer(test), expected);
+ });
+
+ it("handles cases where category names overlap node names", () => {
+ var test = ["a", "a/a", "a/b", "a/c", "b", "b/a"];
+ var actual = topLevelNamespaceCategorizer(test);
+ var expected = [
+ { name: "a", tags: ["a", "a/a", "a/b", "a/c"] },
+ { name: "b", tags: ["b", "b/a"] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+
+ it("handles singleton case", () => {
+ assert.deepEqual(topLevelNamespaceCategorizer(["a"]), [{ name: "a", tags: ["a"] }]);
+ });
+ });
+
+ describe("legacyUnderscoreCategorizer", () => {
+ it("splits by shorter of first _ or /", () => {
+ var tags = ["l0_bar/foo", "l0_bar/baz", "l0_foo/wob", "l1_zoink/bla",
+ "l1_wibble/woz", "l1/foo_woink", "l2/wozzle_wizzle"];
+ var actual = legacyUnderscoreCategorizer(tags);
+ var expected = [
+ { name: "l0", tags: ["l0_bar/baz", "l0_bar/foo", "l0_foo/wob"] },
+ { name: "l1", tags: ["l1/foo_woink", "l1_wibble/woz", "l1_zoink/bla"] },
+ { name: "l2", tags: ["l2/wozzle_wizzle"] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+ });
+
+ describe("customCategorizer", () => {
+ function noFallbackCategorizer(tags: string[]): Category[] {
+ return [];
+ }
+
+ function testCategorizer(defs: string[],
+ fallback: Categorizer, tags: string[]): Category[] {
+ var catDefs = defs.map(defineCategory);
+ return _categorizer(catDefs, fallback)(tags);
+ }
+
+ it("categorizes by regular expression", () => {
+ var defs = ["foo..", "bar.."];
+ var tags = ["fooab", "fooxa", "barts", "barms"];
+ var actual = testCategorizer(defs, noFallbackCategorizer, tags);
+ var expected = [
+ { name: "foo..", tags: ["fooab", "fooxa"] },
+ { name: "bar..", tags: ["barms", "barts"] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+
+ it("matches non-exclusively", () => {
+ var tags = ["abc", "bar", "zod"];
+ var actual = testCategorizer(["...", "bar"], noFallbackCategorizer, tags);
+ var expected = [
+ { name: "...", tags: ["abc", "bar", "zod"] },
+ { name: "bar", tags: ["bar"] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+
+ it("creates categories for unmatched rules", () => {
+ var actual = testCategorizer(["a", "b", "c"], noFallbackCategorizer, []);
+ var expected = [
+ { name: "a", tags: [] },
+ { name: "b", tags: [] },
+ { name: "c", tags: [] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+
+ it("category regexs work with special characters", () => {
+ var defs = ["^\\w+$", "^\\d+$", "^\\/..$"];
+ var tags = ["foo", "3243", "/xa"];
+ var actual = testCategorizer(defs, noFallbackCategorizer, tags);
+ var expected = [
+ { name: "^\\w+$", tags: ["3243", "foo"] },
+ { name: "^\\d+$", tags: ["3243"] },
+ { name: "^\\/..$", tags: ["/xa"] },
+ ];
+ assert.deepEqual(actual, expected);
+ });
+
+ it("category tags are sorted", () => {
+ var tags = ["a", "z", "c", "d", "e", "x", "f", "y", "g"];
+ var sorted = tags.slice().sort();
+ var expected = [{ name: ".*", tags: sorted}];
+ var actual = testCategorizer([".*"], noFallbackCategorizer, tags);
+ assert.deepEqual(actual, expected);
+ });
+
+ it("if nonexclusive: all tags passed to fallback", () => {
+ var passedToDefault = null;
+ function defaultCategorizer(tags: string[]): Category[] {
+ passedToDefault = tags;
+ return [];
+ }
+ var tags = ["foo", "bar", "foo123"];
+ testCategorizer(["foo"], defaultCategorizer, tags);
+ assert.deepEqual(passedToDefault, tags);
+ });
+ });
+ });
+}
diff --git a/tensorflow/tensorboard/components/tf-categorizer/tf-categorizer.html b/tensorflow/tensorboard/components/tf-categorizer/tf-categorizer.html
new file mode 100644
index 0000000000..3672db38a2
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-categorizer/tf-categorizer.html
@@ -0,0 +1,103 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-toggle-button/paper-toggle-button.html">
+
+<link rel="import" href="../tf-regex-group/tf-regex-group.html">
+<link rel="import" href="../tf-dashboard-common/tensorboard-color.html">
+
+<!--
+`tf-categorizer` turns an array of tags into an array of categories
+
+The transformation from tags to categories is controlled by the user, through
+interacting with the categorizer widget.
+
+(See type signatures in categorizer.ts)
+
+Example:
+ <tf-categorizer tags="[[tags]]" categories="{{categories}}"></tf-categorizer>
+
+Public Properties:
+`tags` - Array of strings that are the tags to categorize. Should be one-way bound downward.
+`categories` - Array of Categorizer.Category objects generated by the Categorizer.
+   It is readOnly and notify: true, and is expected to be one-way bound upward.
+
+The categorizer provides inputs for adding regular expression rules and toggling whether
+categories are exclusive.
+-->
+<dom-module id="tf-categorizer">
+ <template>
+ <div class="inputs">
+ <tf-regex-group id="regex-group" regexes="{{regexes}}"></tf-regex-group>
+ </div>
+ <div id="underscore-categorization">
+ <span>Split On Underscores:</span>
+ <paper-toggle-button checked="{{splitOnUnderscore}}"></paper-toggle-button>
+ </div>
+ <style>
+ :host {
+ display: block;
+ padding-bottom: 5px;
+ padding-top: 5px;
+ }
+
+ .inputs {
+ padding-left: 5px;
+ }
+
+ paper-toggle-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+ }
+ #underscore-categorization {
+ padding-left: 94px;
+ color: var(--paper-grey-700);
+ font-size: 14px;
+ }
+ </style>
+ </template>
+ <script src="categorizer.js"></script>
+ <script>
+ Polymer({
+ is: "tf-categorizer",
+ properties: {
+ regexes: {type: Array},
+ tags: {type: Array},
+ categoriesAreExclusive: {type: Boolean, value: true},
+ fallbackCategorizer: {
+ type: String,
+ computed: "chooseFallbackCategorizer(splitOnUnderscore)"
+ },
+ splitOnUnderscore: {
+ type: Boolean,
+ value: false,
+ },
+ categorizer: {
+ type: Object,
+ computed: "computeCategorization(regexes.*, categoriesAreExclusive, fallbackCategorizer)",
+ },
+ categories: {type: Array, value: function() {return [];}, notify: true, readOnly: true},
+ },
+ observers: ['recategorize(tags.*, categorizer)'],
+ computeCategorization: function(regexes, categoriesAreExclusive, fallbackCategorizer) {
+ var categorizationStrategy = {
+ categoryDefinitions: regexes.base,
+ categoriesAreExclusive: categoriesAreExclusive,
+ fallbackCategorizer: fallbackCategorizer,
+ };
+ return Categorizer.categorizer(categorizationStrategy);
+ },
+ recategorize: function() {
+ this.debounce("tf-categorizer-recategorize", function (){
+ var categories = this.categorizer(this.tags);
+ this._setCategories(categories);
+ })
+ },
+ chooseFallbackCategorizer: function(splitOnUnderscore) {
+ if (splitOnUnderscore) {
+ return "LegacyUnderscoreCategorizer";
+ } else {
+ return "TopLevelNamespaceCategorizer";
+ }
+ },
+ });
+ </script>
+</dom-module>
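For reference, a hedged sketch of the strategy object computeCategorization assembles and
hands to Categorizer.categorizer (the regex strings are invented; the field names follow the
CustomCategorization shape used by categorizer.ts):

    // Illustrative values only; categoryDefinitions normally come from tf-regex-group.
    var strategy = {
      categoryDefinitions: ["train/.*", "eval/.*"],
      categoriesAreExclusive: true,                         // default of the property above
      fallbackCategorizer: "TopLevelNamespaceCategorizer",  // or "LegacyUnderscoreCategorizer"
    };
    var categorize = Categorizer.categorizer(strategy);
    // categorize(tags) yields the array exposed as the read-only `categories` property.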
diff --git a/tensorflow/tensorboard/components/tf-collapsable-pane/demo/index.html b/tensorflow/tensorboard/components/tf-collapsable-pane/demo/index.html
new file mode 100644
index 0000000000..8906b0f3da
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-collapsable-pane/demo/index.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../tf-collapsable-pane.html">
+ </head>
+ <body>
+ <style>
+ </style>
+ <tf-collapsable-pane name="foo">
+ <h1>This is content inside the pane.</h1>
+ </tf-collapsable-pane>
+ </body>
+ <script>
+
+ </script>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-collapsable-pane/index.html b/tensorflow/tensorboard/components/tf-collapsable-pane/index.html
new file mode 100644
index 0000000000..032e5be8c8
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-collapsable-pane/index.html
@@ -0,0 +1,18 @@
+<!doctype html>
+<html>
+<head>
+
+ <title>tf-collapsable-pane</title>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+ <script src="../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../../bower_components/iron-component-page/iron-component-page.html">
+
+</head>
+<body>
+
+ <iron-component-page></iron-component-page>
+
+</body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-collapsable-pane/tf-collapsable-pane.html b/tensorflow/tensorboard/components/tf-collapsable-pane/tf-collapsable-pane.html
new file mode 100644
index 0000000000..c06d40a2ec
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-collapsable-pane/tf-collapsable-pane.html
@@ -0,0 +1,90 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/iron-collapse/iron-collapse.html">
+
+<dom-module id="tf-collapsable-pane">
+ <template>
+ <button
+ class="heading"
+ on-tap="togglePane"
+ open-button$="[[opened]]"
+ >
+ <span class="name">[[name]]</span>
+ <span class="hackpadding"></span>
+ <span class="count">
+ (<span>[[count]]</span>)
+ </span>
+ </button>
+ <iron-collapse opened="[[opened]]">
+ <div class="content">
+ <template is="dom-if" if="[[opened]]" restamp="[[restamp]]">
+ <content></content>
+ </template>
+ </div>
+ </iron-collapse>
+ <style>
+ .heading {
+ margin-top: 10px;
+ padding-left: 15px;
+ background-color: #f3f3f3;
+ border: 1px solid #dedede;
+ border-radius: 5px;
+ font-size: 18px;
+ cursor: pointer;
+ -webkit-tap-highlight-color: rgba(0,0,0,0);
+ width: 100%;
+ height: 30px;
+ box-sizing: border-box;
+ font-size: 16px;
+ display: inline-flex;
+ flex-direction: row;
+ align-items: center;
+ justify-content: space-between;
+ line-height: 1;
+ padding-top: 2px;
+ padding-bottom: 2px;
+ }
+
+ .content {
+ padding: 15px;
+ border: 1px solid #dedede;
+ border-top: none;
+ border-bottom-left-radius: 5px;
+ border-bottom-right-radius: 5px;
+ }
+ [open-button] {
+ border-bottom-left-radius: 0px !important;
+ border-bottom-right-radius: 0px !important;
+ }
+ .name {
+ flex-grow: 0;
+ }
+ .count {
+ flex-grow: 0;
+ float: right;
+ font-size: 12px;
+ }
+ .hackpadding {
+ /* An obnoxious hack, but I can't get justify-content: space-between to work */
+ flex-grow: 1;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-collapsable-pane",
+ properties: {
+ opened: {type: Boolean, value: false},
+ restamp: {type: Boolean, value: true},
+ name: {type: String, observer: "hide"},
+ count: {type: Number},
+ },
+ hide: function() {
+ this.opened = false;
+ },
+ togglePane: function() {
+ this.opened = !this.opened;
+ }
+ });
+ </script>
+
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/dashboard-style.html b/tensorflow/tensorboard/components/tf-dashboard-common/dashboard-style.html
new file mode 100644
index 0000000000..795cbbcac3
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/dashboard-style.html
@@ -0,0 +1,97 @@
+<link rel="import" href="../../bower_components/paper-styles/paper-styles.html">
+<link rel="import" href="../tf-dashboard-common/tensorboard-color.html">
+
+<dom-module id="dashboard-style">
+ <template>
+ <style>
+ .card {
+ height: 200px;
+ width: 300px;
+ display: flex;
+ flex-direction: column;
+ margin: 5px 5px;
+ padding: 5px;
+ border: 1px solid var(--paper-grey-500);
+ border-radius: 3px;
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ position: relative;
+ }
+
+ .card .card-title {
+ flex-grow: 0;
+ flex-shrink: 0;
+ margin-bottom: 2px;
+ font-size: 14px;
+ font-weight: bold;
+ text-overflow: ellipsis;
+ overflow: hidden;
+ }
+
+ .card .card-content {
+ flex-grow: 1;
+ flex-shrink: 1;
+ display: flex;
+ }
+ .card .card-bottom-row {
+ flex-grow: 0;
+ flex-shrink: 0;
+ padding-left: 10px;
+ padding-right: 10px;
+ }
+
+ .card.selected {
+ height: 400px;
+ width: 100%;
+ }
+
+ [shift] {
+ bottom: 20px !important;
+ }
+
+ .expand-button {
+ position: absolute;
+ left: 0px;
+ bottom: 0px;
+ color: #2196F3;
+ display: block;
+ }
+
+ #content-container{
+ display: block;
+ }
+
+ .sidebar {
+ display: flex;
+ flex-direction: column;
+ height: 100%;
+ }
+
+ #categorizer {
+ flex-shrink: 0;
+ }
+
+ #xTypeSelector {
+ flex-shrink: 0;
+ margin: 20px 0;
+ }
+
+ #runSelector {
+ flex-shrink: 1;
+ flex-grow: 1;
+ }
+
+ #download-option {
+ padding-left: 55px;
+ color: var(--paper-grey-700);
+ font-size: 14px;
+ }
+
+ #download-option paper-toggle-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+
+ }
+ </style>
+ </template>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/run-color-style.html b/tensorflow/tensorboard/components/tf-dashboard-common/run-color-style.html
new file mode 100644
index 0000000000..d9b12f366a
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/run-color-style.html
@@ -0,0 +1,62 @@
+<link rel="import" href="../../bower_components/paper-styles/paper-styles.html">
+
+<dom-module id="run-color-style">
+ <template>
+ <style>
+ [color-class="light-blue"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-light-blue-500);
+ --paper-checkbox-checked-ink-color: var(--paper-light-blue-500);
+ --paper-checkbox-unchecked-color: var(--paper-light-blue-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-light-blue-900);
+ }
+ [color-class="red"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-red-500);
+ --paper-checkbox-checked-ink-color: var(--paper-red-500);
+ --paper-checkbox-unchecked-color: var(--paper-red-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-red-900);
+ }
+ [color-class="green"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-green-500);
+ --paper-checkbox-checked-ink-color: var(--paper-green-500);
+ --paper-checkbox-unchecked-color: var(--paper-green-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-green-900);
+ }
+ [color-class="purple"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-purple-500);
+ --paper-checkbox-checked-ink-color: var(--paper-purple-500);
+ --paper-checkbox-unchecked-color: var(--paper-purple-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-purple-900);
+ }
+ [color-class="teal"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-teal-500);
+ --paper-checkbox-checked-ink-color: var(--paper-teal-500);
+ --paper-checkbox-unchecked-color: var(--paper-teal-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-teal-900);
+ }
+ [color-class="pink"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-pink-500);
+ --paper-checkbox-checked-ink-color: var(--paper-pink-500);
+ --paper-checkbox-unchecked-color: var(--paper-pink-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-pink-900);
+ }
+ [color-class="orange"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-orange-500);
+ --paper-checkbox-checked-ink-color: var(--paper-orange-500);
+ --paper-checkbox-unchecked-color: var(--paper-orange-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-orange-900);
+ }
+ [color-class="brown"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-brown-500);
+ --paper-checkbox-checked-ink-color: var(--paper-brown-500);
+ --paper-checkbox-unchecked-color: var(--paper-brown-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-brown-900);
+ }
+ [color-class="indigo"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-indigo-500);
+ --paper-checkbox-checked-ink-color: var(--paper-indigo-500);
+ --paper-checkbox-unchecked-color: var(--paper-indigo-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-indigo-900);
+ }
+ </style>
+ </template>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/scrollbar-style.html b/tensorflow/tensorboard/components/tf-dashboard-common/scrollbar-style.html
new file mode 100644
index 0000000000..90fc184e8d
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/scrollbar-style.html
@@ -0,0 +1,28 @@
+<link rel="import" href="../../bower_components/paper-styles/paper-styles.html">
+
+<dom-module id="scrollbar-style">
+ <template>
+ <style>
+ .scrollbar::-webkit-scrollbar-track
+ {
+ visibility: hidden;
+ }
+
+ .scrollbar::-webkit-scrollbar
+ {
+ width: 10px;
+ }
+
+ .scrollbar::-webkit-scrollbar-thumb
+ {
+ border-radius: 10px;
+ -webkit-box-shadow: inset 0 0 2px rgba(0,0,0,.3);
+ background-color: var(--paper-grey-500);
+ color: var(--paper-grey-900);
+ }
+ .scrollbar {
+ box-sizing: border-box;
+ }
+ </style>
+ </template>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/tensorboard-color.html b/tensorflow/tensorboard/components/tf-dashboard-common/tensorboard-color.html
new file mode 100644
index 0000000000..c3a59b7a31
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/tensorboard-color.html
@@ -0,0 +1,11 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<style is="custom-style">
+
+ :root {
+ --tb-orange-weak: #fcb938;
+ --tb-orange-strong: #f3913e;
+ --tb-grey-darker: #e2e2e2;
+ --tb-grey-lighter: #f3f3f3;
+ }
+
+</style>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/tf-dashboard-layout.html b/tensorflow/tensorboard/components/tf-dashboard-common/tf-dashboard-layout.html
new file mode 100644
index 0000000000..89c51342fe
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/tf-dashboard-layout.html
@@ -0,0 +1,50 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="scrollbar-style.html">
+<link rel="import" href="tensorboard-color.html">
+<!--
+Generic layout for a dashboard.
+-->
+<dom-module id="tf-dashboard-layout">
+ <template>
+ <div id="sidebar">
+ <content select=".sidebar"></content>
+ </div>
+
+ <div id="center" class="scrollbar">
+ <content select=".center"></content>
+ </div>
+ <style include="scrollbar-style"></style>
+ <style>
+ #sidebar {
+ width: inherit;
+ height: 100%;
+ background-color: var(--tb-grey-darker);
+ background-image: linear-gradient(to right, var(--tb-grey-lighter), var(--tb-grey-lighter));
+        overflow: hidden;
+ padding-left: 10px;
+ padding-right: 10px;
+ flex-grow: 0;
+ flex-shrink: 0;
+ }
+
+ #center {
+ margin: 0 10px;
+ height: 100%;
+ overflow-y: scroll;
+ padding-right: 12px;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ :host {
+ display: flex;
+ flex-direction: row;
+ height: 100%;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-dashboard-layout",
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/tf-downloader.html b/tensorflow/tensorboard/components/tf-dashboard-common/tf-downloader.html
new file mode 100644
index 0000000000..c7251ec578
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/tf-downloader.html
@@ -0,0 +1,85 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-dropdown-menu/paper-dropdown-menu.html">
+<link rel="import" href="../../bower_components/paper-menu/paper-menu.html">
+<link rel="import" href="../../bower_components/paper-item/paper-item.html">
+
+<dom-module id="tf-downloader">
+ <template>
+ <paper-dropdown-menu
+ no-label-float="true"
+ label="run to download"
+ selected-item-label="{{_run}}"
+ >
+ <paper-menu class="dropdown-content">
+ <template is="dom-repeat" items="[[_runs]]">
+ <paper-item no-label-float=true>[[item]]</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ <a
+ download="[[_csvName(_run)]]"
+ href="[[_csvUrl(_run, urlFn)]]"
+ >CSV</a>
+ <a
+ download="[[_jsonName(_run)]]"
+ href="[[_jsonUrl(_run, urlFn)]]"
+ >JSON</a>
+ <style>
+ :host {
+ display: block;
+ }
+ paper-dropdown-menu {
+ width: 220px;
+ --paper-input-container-label: {
+ font-size: 10px;
+ }
+ --paper-input-container-input: {
+ font-size: 10px;
+ }
+ }
+ a {
+ font-size: 10px;
+ border-radius: 3px;
+ border: 1px solid #EEE;
+ }
+ paper-input {
+ font-size: 22px;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-downloader",
+ properties: {
+ _run: String,
+ _runs: {
+ type: Array,
+ computed: "_computeRuns(runToTag.*, selectedRuns.*)",
+ },
+ selectedRuns: Array,
+ runToTag: Object,
+ tag: String,
+ urlFn: Function,
+ },
+ _computeRuns: function(runToTagChange, selectedRunsChange) {
+ var runToTag = this.runToTag;
+ var tag = this.tag;
+ return this.selectedRuns.filter(function(x) {
+ return runToTag[x].indexOf(tag) !== -1;
+ })
+ },
+ _csvUrl: function(_run, urlFn) {
+ return urlFn(this.tag, _run) + "&format=csv";
+ },
+ _jsonUrl: function(_run, urlFn) {
+ return urlFn(this.tag, _run);
+ },
+ _csvName: function(_run) {
+ return "run_" + _run + ",tag_" + this.tag + ".csv";
+ },
+ _jsonName: function(_run) {
+ return "run-" + _run + "-tag-" + this.tag + ".json";
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/tf-run-generator.html b/tensorflow/tensorboard/components/tf-dashboard-common/tf-run-generator.html
new file mode 100644
index 0000000000..4d72552049
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/tf-run-generator.html
@@ -0,0 +1,97 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/iron-ajax/iron-ajax.html">
+<link rel="import" href="../imports/lodash.html">
+
+<!--
+tf-run-generator is a plumbing component that takes in a URL to load runs from, and
+  produces the following upward-bindable properties:
+
+  outRunToScalars: Maps from run name (string) to an array of scalar tags (strings).
+  outRunToHistograms: Maps from run name (string) to an array of histogram tags (strings).
+  outRunToCompressedHistograms: Maps from run name (string) to an array of compressed histogram tags (strings).
+  outRunToImages: Maps from run name (string) to an array of image tags (strings).
+  outRunsWithGraph: Array of run names (strings) that have an associated graph definition.
+-->
+<dom-module id="tf-run-generator">
+ <template>
+ <iron-ajax
+ id="ajax"
+ auto
+ url="[[url]]"
+ handle-as="json"
+ debounce="300"
+ on-response="_setResponse"
+ verbose=true
+ >
+ </iron-ajax>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-run-generator",
+ properties: {
+ url: String,
+ _runToTag: {
+ type: Object,
+ readOnly: true,
+ },
+ outRunToScalars: {
+ // {[runName: string]: string[]}
+ // the names of scalar tags.
+ type: Object,
+ computed: "_scalars(_runToTag.*)",
+ notify: true,
+ },
+ outRunToHistograms: {
+ // {[runName: string]: string[]}
+ // the names of histogram tags.
+ type: Object,
+ computed: "_histograms(_runToTag.*)",
+ notify: true,
+ },
+ outRunToCompressedHistograms: {
+ // {[runName: string]: string[]}
+ // the names of histogram tags.
+ type: Object,
+ computed: "_compressedHistograms(_runToTag.*)",
+ notify: true,
+ },
+ outRunToImages: {
+ // {[runName: string]: string[]}
+ // the names of image tags.
+ type: Object,
+ computed: "_images(_runToTag.*)",
+ notify: true,
+ },
+ outRunsWithGraph: {
+ // ["run1", "run2", ...]
+ // array of run names that have an associated graph definition.
+ type: Array,
+ computed: "_graphs(_runToTag.*)",
+ notify: true
+ }
+ },
+ _scalars: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "scalars");
+ },
+ _histograms: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "histograms");
+ },
+ _compressedHistograms: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "compressedHistograms");
+ },
+ _images: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "images");
+ },
+ _graphs: function(_runToTag) {
+ var runsWithGraph = [];
+ _.each(_runToTag.base, function(runInfo, runName) {
+ if (runInfo.graph === true) {
+ runsWithGraph.push(runName);
+ }
+ });
+ return runsWithGraph;
+ },
+ _setResponse: function(event) {
+ this._set_runToTag(event.detail.response);
+ }
+ });
+ </script>
+</dom-module>
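A hedged sketch of the per-run JSON shape this component stores into _runToTag, and of what
the computed out* properties derive from it (run and tag names are invented):

    // One entry per run in the response the iron-ajax request loads:
    var runToTag = {
      run1: {
        scalars: ["loss", "learning_rate"],
        histograms: ["weights"],
        compressedHistograms: ["weights"],
        images: ["input_image"],
        graph: true,
      },
    };
    // _scalars then yields {run1: ["loss", "learning_rate"]} via _.mapValues,
    // and _graphs yields ["run1"] because runToTag.run1.graph === true.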
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/tf-url-generator.html b/tensorflow/tensorboard/components/tf-dashboard-common/tf-url-generator.html
new file mode 100644
index 0000000000..803998daeb
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/tf-url-generator.html
@@ -0,0 +1,50 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<!-- tf-url-generator is a plumbing component that provides upward-bindable URL properties:
+  outRunsUrl, plus one out<Route>UrlGenerator property per data route (e.g. outScalarsUrlGenerator).
+  The rest of the application may use these to communicate with the backend, and by overriding the
+  TF.Urls code that backs them, they can be modified to load data from a demo data source instead.
+ -->
+<dom-module id="tf-url-generator">
+ <script src="urlGenerator.js"></script>
+ <script>
+ var polymerObject = {
+ is: "tf-url-generator",
+ properties: {
+ outRunsUrl: {
+ type: String,
+ value: function() {
+ return TF.Urls.runsUrl();
+ },
+ readOnly: true,
+ notify: true,
+ },
+ },
+ };
+ TF.Urls.routes.forEach(function(route) {
+    /* For each route (other than "runs", which is handled separately), add a property:
+     * out`RouteName`UrlGenerator: {
+     *   type: Function,
+     *   readOnly: true,
+     *   notify: true,
+     *   value: function() {
+     *     return TF.Urls.`routeName`Url;
+     *   },
+     * }
+     */
+ if (route === "runs") {
+ return;
+ }
+ var urlName = route + "Url";
+ var propertyName = Polymer.CaseMap.dashToCamelCase("out-" + urlName + "Generator");
+ polymerObject.properties[propertyName] = {
+ type: Function,
+ value: function() {
+ return TF.Urls[urlName];
+ },
+ notify: true,
+ readOnly: true,
+ }
+ });
+ Polymer(polymerObject);
+ </script>
+</dom-module>
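A short sketch of the property-name construction in the loop above, for the "scalars" route
(the other routes follow the same pattern):

    var urlName = "scalars" + "Url";  // "scalarsUrl"
    var propertyName = Polymer.CaseMap.dashToCamelCase("out-" + urlName + "Generator");
    // propertyName === "outScalarsUrlGenerator"; analogous read-only, notifying Function
    // properties are generated for histograms, compressedHistograms, images,
    // individualImage, and graph.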
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/urlGenerator.ts b/tensorflow/tensorboard/components/tf-dashboard-common/urlGenerator.ts
new file mode 100644
index 0000000000..c7bbcbf434
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/urlGenerator.ts
@@ -0,0 +1,33 @@
+/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+
+module TF {
+ export module Urls {
+
+ export var routes = ["runs", "scalars", "histograms",
+ "compressedHistograms", "images",
+ "individualImage", "graph"];
+
+ function router(route: string): ((tag: string, run: string) => string) {
+ return function(tag: string, run: string): string {
+ return "/" + route + "?tag=" + encodeURIComponent(tag)
+ + "&run=" + encodeURIComponent(run);
+ };
+ }
+
+ export function runsUrl() {
+ return "/runs";
+ }
+ export var scalarsUrl = router("scalars");
+ export var histogramsUrl = router("histograms");
+ export var compressedHistogramsUrl = router("compressedHistograms");
+ export var imagesUrl = router("images");
+ export function individualImageUrl(query: string) {
+ return "/individualImage?" + query;
+ }
+ export function graphUrl(run: string) {
+ return "/graph?run=" + encodeURIComponent(run);
+ }
+
+ }
+}
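A quick sketch of the URLs these helpers produce (tag and run values are invented):

    // Each router(...) result builds "/<route>?tag=...&run=..." with URI-encoded values.
    TF.Urls.scalarsUrl("xent/loss", "train run");  // "/scalars?tag=xent%2Floss&run=train%20run"
    TF.Urls.runsUrl();                             // "/runs"
    TF.Urls.graphUrl("train run");                 // "/graph?run=train%20run"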
diff --git a/tensorflow/tensorboard/components/tf-dashboard-common/warning-style.html b/tensorflow/tensorboard/components/tf-dashboard-common/warning-style.html
new file mode 100644
index 0000000000..c4103a7248
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-dashboard-common/warning-style.html
@@ -0,0 +1,10 @@
+<dom-module id="warning-style">
+ <template>
+ <style>
+ .warning {
+ max-width: 540px;
+ margin: 80px auto 0 auto;
+ }
+ </style>
+ </template>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/dataCoordinator.ts b/tensorflow/tensorboard/components/tf-event-dashboard/dataCoordinator.ts
new file mode 100644
index 0000000000..c489eca17c
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/dataCoordinator.ts
@@ -0,0 +1,57 @@
+/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+
+module TF {
+
+ /* The DataCoordinator generates TF.Datasets for each run/tag combination,
+ * and is responsible for communicating with the backend to load data into them.
+ * A key fact about this design is that when Datasets modify their data, they
+ * automatically notify all dependent Plottable charts.
+ */
+ export class DataCoordinator {
+ private urlGenerator: (tag: string, run: string) => string;
+ private datasets: {[key: string]: TF.Dataset};
+ private runToTag: {[run: string]: string[]};
+
+ constructor(urlGenerator: (tag: string, run: string) => string,
+ runToTag: {[run: string]: string[]}) {
+ this.datasets = {};
+ this.urlGenerator = urlGenerator;
+ this.runToTag = runToTag;
+ }
+
+ /* Create or return an array of Datasets for the given
+ * tag and runs. It filters which runs it uses by checking
+ * that data exists for each tag-run combination.
+ * Calling this triggers a load on the dataset.
+ */
+ public getDatasets(tag: string, runs: string[]) {
+ var usableRuns = runs.filter((r) => {
+ var tags = this.runToTag[r];
+ return tags.indexOf(tag) !== -1;
+ });
+ return usableRuns.map((r) => this.getDataset(tag, r));
+ }
+
+ /* Create or return a Dataset for given tag and run.
+ * Calling this triggers a load on the dataset.
+ */
+ public getDataset(tag: string, run: string): TF.Dataset {
+ var dataset = this._getDataset(tag, run);
+ dataset.load();
+ return dataset;
+ }
+
+ private _getDataset(tag: string, run: string): TF.Dataset {
+ var key = [tag, run].toString();
+ var dataset: TF.Dataset;
+ if (this.datasets[key] != null) {
+ dataset = this.datasets[key];
+ } else {
+ dataset = new TF.Dataset(tag, run, this.urlGenerator);
+ this.datasets[key] = dataset;
+ }
+ return dataset;
+ }
+ }
+}
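A minimal usage sketch, assuming the scalars URL generator from urlGenerator.ts and an
invented run-to-tag map:

    // runToTag would normally come from the runs endpoint (see tf-run-generator).
    var runToTag = {run1: ["loss", "accuracy"], run2: ["loss"]};
    var coordinator = new TF.DataCoordinator(TF.Urls.scalarsUrl, runToTag);

    // Only run1 lists the "accuracy" tag, so one Dataset is created and load()ed:
    var datasets = coordinator.getDatasets("accuracy", ["run1", "run2"]);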
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/dataset.ts b/tensorflow/tensorboard/components/tf-event-dashboard/dataset.ts
new file mode 100644
index 0000000000..de814583d3
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/dataset.ts
@@ -0,0 +1,41 @@
+/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+
+module TF {
+ /* An extension of Plottable.Dataset that knows how to load data from a backend.
+ */
+ export class Dataset extends Plottable.Dataset {
+ public tag: string;
+ public run: string;
+ private lastLoadTime: number;
+ private lastRequest;
+ private urlGenerator: Function;
+
+    constructor(tag: string, run: string, urlGenerator: (tag: string, run: string) => string) {
+ super([], {tag: tag, run: run});
+ this.tag = tag;
+ this.run = run;
+ this.urlGenerator = urlGenerator;
+ }
+
+ public load = _.debounce(this._load, 10);
+
+ private _load() {
+ var url = this.urlGenerator(this.tag, this.run);
+ if (this.lastRequest != null) {
+ this.lastRequest.abort();
+ }
+ this.lastRequest = d3.json(url, (error, json) => {
+ this.lastRequest = null;
+ if (error) {
+ /* tslint:disable */
+ console.log(error);
+ /* tslint:enable */
+ throw new Error("Failure loading JSON at url: \"" + url + "\"");
+ } else {
+ this.data(json);
+ }
+ });
+ }
+ }
+}
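A brief sketch of the class above in isolation (the tag, run, and URL generator are
placeholders): load() is debounced, and once the JSON arrives the inherited data() setter
updates anything bound to the dataset.

    var dataset = new TF.Dataset("loss", "run1", TF.Urls.scalarsUrl);
    dataset.load();  // debounced; issues d3.json("/scalars?tag=loss&run=run1", ...)
    // When the response arrives, dataset.data() returns the loaded array and any
    // Plottable chart using this dataset re-renders automatically.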
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d1.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d1.json
new file mode 100644
index 0000000000..af17f5c328
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d1.json
@@ -0,0 +1 @@
+[[1436926051.074826, 84, 0.6990088224411011], [1436926530.99861, 2289, 6.9384379386901855], [1436927011.134076, 7611, 13.698328971862793], [1436927490.984256, 16147, 20.168190002441406], [1436927970.957234, 26087, 20.877344131469727], [1436928450.977514, 36241, 21.269058227539062], [1436928930.989548, 46432, 21.329505920410156], [1436929410.976308, 56629, 21.220420837402344], [1436929890.966395, 66791, 21.190065383911133], [1436930370.958199, 76936, 21.108604431152344], [1436930850.985301, 87083, 21.157001495361328], [1436931331.009261, 97161, 21.02127456665039], [1436931810.966042, 107210, 20.891658782958984], [1436932290.955417, 117262, 20.930112838745117], [1436932770.964496, 127333, 20.986324310302734], [1436933250.962592, 137430, 20.981359481811523], [1436933730.992022, 147528, 21.083036422729492], [1436934210.959831, 157635, 21.092649459838867], [1436934690.97072, 167749, 21.11568832397461], [1436935170.957944, 177869, 21.145965576171875], [1436935650.959987, 188025, 21.215585708618164], [1436936130.997541, 198206, 21.227184295654297], [1436936610.965526, 208395, 21.226459503173828], [1436937090.965581, 218592, 21.264968872070312], [1436937570.964874, 228818, 21.335866928100586], [1436938050.965706, 239021, 21.286521911621094], [1436938531.013159, 249210, 21.20963478088379], [1436939010.957926, 259415, 21.28431510925293], [1436939490.96341, 269637, 21.326831817626953], [1436939970.959372, 279876, 21.38308334350586], [1436940450.963802, 290127, 21.355499267578125], [1436940931.004537, 300349, 21.31337547302246], [1436941410.979614, 310601, 21.405778884887695], [1436941890.979674, 320872, 21.368688583374023], [1436942370.975153, 331131, 21.39077377319336], [1436942850.980459, 341399, 21.41745948791504], [1436943331.000808, 351651, 21.384023666381836], [1436943810.968736, 361904, 21.326438903808594], [1436944290.95947, 372158, 21.367351531982422], [1436944770.955783, 382430, 21.476247787475586], [1436945250.966321, 392684, 21.36678695678711], [1436945731.008667, 402950, 21.349145889282227], [1436946210.977922, 413210, 21.373897552490234], [1436946690.975303, 423463, 21.322399139404297], [1436947170.964596, 433723, 21.341150283813477], [1436947650.955017, 443991, 21.366348266601562], [1436948130.992501, 454271, 21.43844223022461], [1436948610.960555, 464519, 21.36829948425293], [1436949090.961079, 474758, 21.266357421875], [1436949570.971528, 484987, 21.316511154174805], [1436950050.977787, 495228, 21.356050491333008], [1436950531.020035, 505458, 21.31462860107422], [1436951010.959775, 515682, 21.277490615844727], [1436951490.967418, 525910, 21.289737701416016], [1436951970.969778, 536112, 21.2515811920166], [1436952450.956291, 546320, 21.254491806030273], [1436952931.005547, 556541, 21.297870635986328], [1436953410.955758, 566755, 21.320045471191406], [1436953890.959151, 576957, 21.23529624938965], [1436954370.959553, 587165, 21.25132179260254], [1436954850.960546, 597371, 21.23470115661621], [1436955330.989932, 607582, 21.19434356689453], [1436955810.957128, 617790, 21.258535385131836], [1436956290.9763, 627991, 21.221921920776367], [1436956770.957785, 638208, 21.309843063354492], [1436957250.974143, 648404, 21.252185821533203], [1436957731.012441, 658613, 21.265626907348633], [1436958210.980787, 668824, 21.239660263061523], [1436958690.973474, 679034, 21.2642765045166], [1436959170.95825, 689249, 21.303138732910156], [1436959650.959345, 699454, 21.24073600769043], [1436960131.008682, 709664, 21.217615127563477], [1436960610.958074, 719876, 21.251184463500977], [1436961090.963638, 
730100, 21.290971755981445], [1436961570.979029, 740316, 21.305265426635742], [1436962050.974645, 750534, 21.27857208251953], [1436962531.055479, 760757, 21.329837799072266], [1436963010.975299, 770964, 21.248849868774414], [1436963490.963107, 781164, 21.19978904724121], [1436963970.965936, 791382, 21.30535888671875], [1436964450.959947, 801590, 21.226255416870117], [1436964931.00587, 811785, 21.242237091064453], [1436965410.977997, 821977, 21.226497650146484], [1436965890.988465, 832189, 21.31219482421875], [1436966370.965612, 842399, 21.283390045166016], [1436966850.965794, 852612, 21.273908615112305], [1436967331.009476, 862825, 21.260452270507812], [1436967810.96767, 873037, 21.315444946289062], [1436968290.959107, 883248, 21.28677749633789], [1436968770.9681, 893452, 21.265335083007812], [1436969250.959332, 903655, 21.252891540527344], [1436969731.055609, 913856, 21.233684539794922], [1436970210.961426, 924047, 21.191429138183594], [1436970690.962999, 934250, 21.23288345336914], [1436971170.989107, 944430, 21.17190170288086], [1436971650.956015, 954634, 21.275972366333008], [1436972131.006841, 964844, 21.278474807739258], [1436972610.981754, 975045, 21.25553321838379], [1436973090.961548, 985239, 21.21686553955078], [1436973570.960013, 995439, 21.26004981994629], [1436974050.975653, 1005642, 21.25356101989746], [1436974530.988571, 1015842, 21.23944664001465], [1436975010.95851, 1026048, 21.293363571166992], [1436975490.97355, 1036253, 21.277101516723633], [1436975970.960916, 1046451, 21.242155075073242], [1436976450.990263, 1056636, 21.182037353515625], [1436976930.999578, 1066834, 21.21113395690918], [1436977410.962637, 1077031, 21.230762481689453], [1436977890.970389, 1087222, 21.232444763183594], [1436978370.959059, 1097405, 21.202342987060547], [1436978850.956562, 1107601, 21.23992156982422], [1436979331.021134, 1117786, 21.197628021240234], [1436979810.958593, 1127973, 21.2270565032959], [1436980290.958763, 1138163, 21.250303268432617], [1436980770.967171, 1148348, 21.215538024902344], [1436981250.960473, 1158540, 21.277185440063477], [1436981731.009465, 1168733, 21.268449783325195], [1436982210.960797, 1178930, 21.268077850341797], [1436982690.959709, 1189129, 21.243141174316406], [1436983170.961963, 1199327, 21.21793556213379], [1436983650.958504, 1209524, 21.2817440032959], [1436984130.998057, 1219726, 21.261478424072266], [1436984610.958945, 1229936, 21.300107955932617], [1436985090.978825, 1240145, 21.326183319091797], [1436985570.993741, 1250311, 21.115875244140625], [1436986050.965608, 1260436, 21.19010353088379], [1436986531.026713, 1270611, 21.183719635009766], [1436987010.969056, 1280784, 21.273176193237305], [1436987490.975071, 1290959, 21.182931900024414], [1436987970.96007, 1301147, 21.260244369506836], [1436988450.966092, 1311328, 21.225025177001953], [1436988931.004917, 1321514, 21.242164611816406], [1436989410.980351, 1331709, 21.19801139831543], [1436989890.975192, 1341910, 21.273555755615234], [1436990370.964941, 1352090, 21.175983428955078], [1436990850.973647, 1362240, 21.13412094116211], [1436991330.999346, 1372396, 21.153064727783203], [1436991811.003573, 1382550, 21.155475616455078], [1436992290.962706, 1392710, 21.17011833190918], [1436992770.999149, 1402862, 21.128713607788086], [1436993250.965124, 1413020, 21.1361026763916], [1436993731.020464, 1423164, 21.157777786254883], [1436994210.966935, 1433312, 21.119478225708008], [1436994690.962803, 1443468, 21.161104202270508], [1436995170.972952, 1453657, 21.11492919921875], [1436995650.976233, 1463820, 
21.194231033325195], [1436996130.990524, 1473980, 21.169816970825195], [1436996610.97302, 1484152, 21.18223762512207], [1436997090.958457, 1494308, 21.1954402923584], [1436997570.980333, 1504463, 21.140769958496094], [1436998050.969869, 1514618, 21.162744522094727], [1436998530.99688, 1524770, 21.139591217041016], [1436999010.970375, 1534905, 21.107114791870117], [1436999490.960775, 1545070, 21.233396530151367], [1436999970.965087, 1555223, 21.201074600219727], [1437000450.969008, 1565370, 21.147083282470703], [1437000931.007425, 1575517, 21.108510971069336], [1437001410.962798, 1585666, 21.11674690246582], [1437001890.966192, 1595826, 21.17819595336914], [1437002370.961814, 1605980, 21.157669067382812], [1437002850.962206, 1616145, 21.212690353393555], [1437003330.994816, 1626291, 21.177446365356445], [1437003810.966017, 1636448, 21.17884063720703], [1437004290.959479, 1646599, 21.150310516357422], [1437004770.965083, 1656754, 21.21011734008789], [1437005250.958234, 1666902, 21.14912986755371], [1437005731.003528, 1677043, 21.125459671020508], [1437006210.961371, 1687192, 21.124374389648438], [1437006690.962663, 1697338, 21.150362014770508], [1437007170.961639, 1707484, 21.16637420654297], [1437007650.972242, 1717625, 21.163259506225586], [1437008131.003191, 1727767, 21.167280197143555], [1437008610.962644, 1737913, 21.174945831298828], [1437009090.964129, 1748068, 21.17894172668457], [1437009570.962582, 1758219, 21.116622924804688], [1437010050.984863, 1768384, 21.23469352722168], [1437010531.002295, 1778534, 21.143510818481445], [1437011010.961803, 1788677, 21.159791946411133], [1437011490.974074, 1798822, 21.119792938232422], [1437011970.959982, 1808958, 21.10943603515625], [1437012450.95932, 1819091, 21.123899459838867], [1437012931.004909, 1829227, 21.094532012939453], [1437013410.957751, 1839374, 21.200057983398438], [1437013890.960506, 1849509, 21.10895538330078], [1437014370.96113, 1859653, 21.108680725097656], [1437014850.962876, 1869791, 21.141136169433594], [1437015331.009875, 1879944, 21.160165786743164], [1437015810.960671, 1890090, 21.158742904663086], [1437016290.970743, 1900242, 21.16562271118164], [1437016770.961673, 1910391, 21.141860961914062], [1437017250.96735, 1920551, 21.19420051574707], [1437017731.000324, 1930702, 21.16814422607422], [1437018210.967878, 1940856, 21.125978469848633], [1437018690.962742, 1951005, 21.15043067932129], [1437019170.975774, 1961158, 21.157419204711914], [1437019650.964573, 1971309, 21.150177001953125], [1437020130.999343, 1981461, 21.124492645263672], [1437020610.960696, 1991611, 21.109933853149414], [1437021090.958597, 2001766, 21.169754028320312], [1437021570.964477, 2011919, 21.13479995727539], [1437022050.966522, 2022063, 21.131561279296875], [1437022531.005607, 2032219, 21.135629653930664], [1437023010.970667, 2042380, 21.207313537597656], [1437023490.964885, 2052534, 21.108623504638672], [1437023970.965596, 2062691, 21.14097023010254], [1437024450.962296, 2072837, 21.129037857055664], [1437024931.00395, 2082982, 21.077030181884766], [1437025410.96602, 2093128, 21.13152503967285], [1437025890.961753, 2103274, 21.117740631103516], [1437026370.962022, 2113424, 21.141584396362305], [1437026850.975475, 2123570, 21.143577575683594], [1437027331.009277, 2133721, 21.175586700439453], [1437027810.97206, 2143857, 21.099014282226562], [1437028290.961523, 2154015, 21.141523361206055], [1437028770.964366, 2164168, 21.141345977783203], [1437029250.962109, 2174320, 21.14827537536621], [1437029731.003068, 2184453, 21.086946487426758], 
[1437030210.960946, 2194602, 21.1590576171875], [1437030690.966681, 2204754, 21.17353057861328], [1437031170.961207, 2214899, 21.133989334106445], [1437031650.962809, 2225062, 21.14800453186035], [1437032130.997644, 2235215, 21.15397834777832], [1437032610.962999, 2245366, 21.15763282775879], [1437033090.962192, 2255521, 21.133577346801758], [1437033570.963341, 2265657, 21.058490753173828], [1437034050.979501, 2275787, 21.079614639282227], [1437034531.003514, 2285923, 21.12677574157715], [1437035010.960984, 2296058, 21.100793838500977], [1437035490.97325, 2306176, 21.10753059387207], [1437035970.969759, 2316297, 21.100393295288086], [1437036450.962305, 2326428, 21.041208267211914], [1437036931.001785, 2336571, 21.15167999267578], [1437037410.967681, 2346709, 21.09291648864746], [1437037890.963194, 2356854, 21.18524932861328], [1437038370.96445, 2366985, 21.116247177124023], [1437038850.960718, 2377124, 21.125469207763672], [1437039331.003148, 2387259, 21.132274627685547], [1437039810.974007, 2397400, 21.119945526123047], [1437040290.983415, 2407539, 21.154672622680664], [1437040770.961836, 2417667, 21.066741943359375], [1437041250.964281, 2427791, 21.126564025878906], [1437041731.0196, 2437923, 21.1062068939209], [1437042210.962927, 2448056, 21.124549865722656], [1437042690.964392, 2458193, 21.13232421875], [1437043170.972024, 2468318, 21.066423416137695], [1437043650.966111, 2478449, 21.123788833618164], [1437044131.030028, 2488576, 21.138349533081055], [1437044610.962532, 2498717, 21.11895179748535], [1437045090.965094, 2508839, 21.019609451293945], [1437045570.963352, 2518972, 21.079254150390625], [1437046050.96194, 2529106, 21.15033531188965], [1437046530.995016, 2539243, 21.11912727355957], [1437047010.963313, 2549369, 21.08464813232422], [1437047490.963943, 2559509, 21.133895874023438], [1437047970.958612, 2569646, 21.108659744262695], [1437048450.962392, 2579776, 21.084848403930664], [1437048931.005408, 2589906, 21.092708587646484], [1437049410.984115, 2600033, 21.130634307861328], [1437049890.964103, 2610162, 21.074010848999023], [1437050370.960886, 2620282, 21.086149215698242], [1437050850.959795, 2630402, 21.08969497680664], [1437051331.008292, 2640533, 21.134498596191406], [1437051810.96622, 2650643, 21.065444946289062], [1437052290.98584, 2660774, 21.120830535888672], [1437052770.967707, 2670900, 21.085134506225586], [1437053250.978851, 2681021, 21.037155151367188], [1437053731.021686, 2691151, 21.09203338623047], [1437054210.971744, 2701273, 21.048450469970703], [1437054690.966686, 2711425, 21.048809051513672], [1437055170.964463, 2721564, 21.13330078125], [1437055650.97301, 2731694, 21.097095489501953], [1437056130.997053, 2741810, 21.031536102294922], [1437056610.968681, 2751927, 21.04400634765625], [1437057090.976676, 2762049, 21.114444732666016], [1437057570.962334, 2772169, 21.06243896484375], [1437058050.969524, 2782292, 21.12563133239746], [1437058531.012918, 2792420, 21.12433433532715], [1437059010.972868, 2802545, 21.067407608032227], [1437059490.96188, 2812684, 21.099285125732422], [1437059970.965083, 2822806, 21.08357810974121], [1437060450.964845, 2832940, 21.142192840576172], [1437060931.011947, 2843080, 21.109895706176758], [1437061410.963414, 2853223, 21.13360023498535], [1437061890.969303, 2863361, 21.152849197387695], [1437062370.963703, 2873490, 21.08356285095215], [1437062850.964392, 2883627, 21.115087509155273], [1437063331.025516, 2893758, 21.13198471069336], [1437063810.962087, 2903877, 21.084623336791992], [1437064290.973818, 2914013, 21.14010238647461], 
[1437064770.967792, 2924145, 21.108346939086914], [1437065250.95886, 2934291, 21.1142635345459], [1437065731.01002, 2944434, 21.17418670654297], [1437066210.959306, 2954576, 21.084075927734375], [1437066690.960644, 2964724, 21.125164031982422], [1437067170.969539, 2974890, 21.200775146484375], [1437067650.960018, 2985036, 21.14740562438965], [1437068130.990731, 2995179, 21.11964225769043], [1437068610.960429, 3005322, 21.141313552856445], [1437069090.95752, 3015461, 21.082963943481445], [1437069570.974879, 3025595, 21.12288475036621], [1437070050.95761, 3035734, 21.107513427734375], [1437070531.0013, 3045868, 21.171630859375], [1437071010.961705, 3056004, 21.066505432128906], [1437071490.961495, 3066137, 21.10834312438965], [1437071970.978122, 3076267, 21.08027458190918], [1437072450.963299, 3086399, 21.089733123779297], [1437072931.018382, 3096524, 21.133176803588867], [1437073050.962102, 3099048, 21.041847229003906], [1437073170.96983, 3101584, 21.131967544555664], [1437073290.957895, 3104118, 21.118793487548828]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d2.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d2.json
new file mode 100644
index 0000000000..92bb414348
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d2.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 0.04500000178813934], [1436926413.945391, 1476, 0.04500000178813934], [1436926893.945037, 6006, 0.04500000178813934], [1436927373.995472, 13786, 0.04500000178813934], [1436927853.989794, 23650, 0.04500000178813934], [1436928334.132361, 33755, 0.04500000178813934], [1436928813.973288, 43941, 0.04500000178813934], [1436929293.975949, 54146, 0.04500000178813934], [1436929773.992781, 64316, 0.04500000178813934], [1436930253.997415, 74465, 0.04500000178813934], [1436930734.203004, 84611, 0.04230000078678131], [1436931214.03644, 94700, 0.04230000078678131], [1436931694.094564, 104766, 0.04230000078678131], [1436932174.114955, 114817, 0.04230000078678131], [1436932654.161382, 124880, 0.04230000078678131], [1436933133.960214, 134977, 0.04230000078678131], [1436933614.044337, 145062, 0.04230000078678131], [1436934094.166206, 155169, 0.04230000078678131], [1436934574.106036, 165284, 0.03976200148463249], [1436935054.150647, 175402, 0.03976200148463249], [1436935533.819562, 185538, 0.03976200148463249], [1436936013.710422, 195712, 0.03976200148463249], [1436936493.609025, 205906, 0.03976200148463249], [1436936973.683892, 216099, 0.03976200148463249], [1436937454.138383, 226331, 0.03976200148463249], [1436937933.838475, 236532, 0.03976200148463249], [1436938413.89688, 246724, 0.0373762808740139], [1436938894.018652, 256925, 0.0373762808740139], [1436939373.69067, 267137, 0.0373762808740139], [1436939853.673692, 277369, 0.0373762808740139], [1436940333.651346, 287620, 0.0373762808740139], [1436940813.599579, 297848, 0.0373762808740139], [1436941293.596313, 308088, 0.0373762808740139], [1436941773.659172, 318362, 0.0373762808740139], [1436942253.648479, 328621, 0.03513370454311371], [1436942733.752284, 338892, 0.03513370454311371], [1436943213.621881, 349144, 0.03513370454311371], [1436943693.698743, 359399, 0.03513370454311371], [1436944173.578463, 369649, 0.03513370454311371], [1436944653.692217, 379912, 0.03513370454311371], [1436945133.677298, 390180, 0.03513370454311371], [1436945613.572411, 400445, 0.03302568197250366], [1436946093.56123, 410703, 0.03302568197250366], [1436946573.542364, 420958, 0.03302568197250366], [1436947053.616578, 431216, 0.03302568197250366], [1436947533.636973, 441483, 0.03302568197250366], [1436948013.541574, 451751, 0.03302568197250366], [1436948493.560223, 462015, 0.03302568197250366], [1436948973.512541, 472260, 0.03302568197250366], [1436949453.550055, 482483, 0.031044140458106995], [1436949933.828011, 492731, 0.031044140458106995], [1436950413.603177, 502957, 0.031044140458106995], [1436950893.563009, 513185, 0.031044140458106995], [1436951373.620887, 523410, 0.031044140458106995], [1436951853.61941, 533618, 0.031044140458106995], [1436952333.694447, 543828, 0.031044140458106995], [1436952813.621004, 554042, 0.031044140458106995], [1436953293.588156, 564251, 0.02918149158358574], [1436953773.599734, 574464, 0.02918149158358574], [1436954253.621309, 584672, 0.02918149158358574], [1436954733.738119, 594882, 0.02918149158358574], [1436955213.56617, 605091, 0.02918149158358574], [1436955693.585366, 615296, 0.02918149158358574], [1436956173.626395, 625501, 0.02918149158358574], [1436956653.601937, 635705, 0.02918149158358574], [1436957133.665878, 645915, 0.02743060328066349], [1436957613.584762, 656116, 0.02743060328066349], [1436958093.549783, 666331, 0.02743060328066349], [1436958573.646778, 676543, 0.02743060328066349], [1436959053.585655, 686750, 0.02743060328066349], [1436959533.679696, 696961, 0.02743060328066349], [1436960013.633292, 
707173, 0.02743060328066349], [1436960493.578778, 717383, 0.02743060328066349], [1436960973.596715, 727598, 0.025784766301512718], [1436961453.625644, 737818, 0.025784766301512718], [1436961933.740339, 748040, 0.025784766301512718], [1436962413.573845, 758252, 0.025784766301512718], [1436962893.610678, 768470, 0.025784766301512718], [1436963373.642878, 778674, 0.025784766301512718], [1436963853.558388, 788877, 0.025784766301512718], [1436964333.658419, 799099, 0.025784766301512718], [1436964813.573319, 809289, 0.024237681180238724], [1436965293.542098, 819484, 0.024237681180238724], [1436965773.545453, 829687, 0.024237681180238724], [1436966253.586517, 839901, 0.024237681180238724], [1436966733.639348, 850120, 0.024237681180238724], [1436967213.697288, 860330, 0.024237681180238724], [1436967693.617172, 870539, 0.024237681180238724], [1436968173.593885, 880748, 0.024237681180238724], [1436968653.560836, 890955, 0.022783419117331505], [1436969133.676337, 901164, 0.022783419117331505], [1436969613.506638, 911358, 0.022783419117331505], [1436970093.595964, 921560, 0.022783419117331505], [1436970573.541227, 931756, 0.022783419117331505], [1436971053.624316, 941945, 0.022783419117331505], [1436971533.655543, 952138, 0.022783419117331505], [1436972013.604738, 962349, 0.02141641452908516], [1436972493.613199, 972551, 0.02141641452908516], [1436972973.501155, 982746, 0.02141641452908516], [1436973453.64842, 992945, 0.02141641452908516], [1436973933.689516, 1003147, 0.02141641452908516], [1436974413.577769, 1013350, 0.02141641452908516], [1436974893.542281, 1023545, 0.02141641452908516], [1436975373.638453, 1033759, 0.02141641452908516], [1436975853.524388, 1043955, 0.02013142965734005], [1436976333.625792, 1054148, 0.02013142965734005], [1436976813.610661, 1064342, 0.02013142965734005], [1436977293.601581, 1074539, 0.02013142965734005], [1436977773.575627, 1084733, 0.02013142965734005], [1436978253.564972, 1094914, 0.02013142965734005], [1436978733.673144, 1105109, 0.02013142965734005], [1436979213.540585, 1115293, 0.02013142965734005], [1436979693.699591, 1125483, 0.018923543393611908], [1436980173.613012, 1135670, 0.018923543393611908], [1436980653.575769, 1145862, 0.018923543393611908], [1436981133.719264, 1156045, 0.018923543393611908], [1436981613.563551, 1166236, 0.018923543393611908], [1436982093.553233, 1176436, 0.018923543393611908], [1436982573.577846, 1186636, 0.018923543393611908], [1436983053.605749, 1196837, 0.018923543393611908], [1436983533.684994, 1207025, 0.017788130789995193], [1436984013.561492, 1217233, 0.017788130789995193], [1436984493.629873, 1227437, 0.017788130789995193], [1436984973.606714, 1237643, 0.017788130789995193], [1436985453.690084, 1247835, 0.017788130789995193], [1436985933.711388, 1257951, 0.017788130789995193], [1436986413.598807, 1268125, 0.017788130789995193], [1436986893.631797, 1278290, 0.017788130789995193], [1436987373.596962, 1288473, 0.016720842570066452], [1436987853.555549, 1298650, 0.016720842570066452], [1436988333.722032, 1308841, 0.016720842570066452], [1436988813.55697, 1319018, 0.016720842570066452], [1436989293.756905, 1329221, 0.016720842570066452], [1436989773.665141, 1339417, 0.016720842570066452], [1436990253.768302, 1349610, 0.016720842570066452], [1436990733.708919, 1359759, 0.016720842570066452], [1436991213.663033, 1369914, 0.01571759209036827], [1436991693.730925, 1380074, 0.01571759209036827], [1436992173.751791, 1390224, 0.01571759209036827], [1436992653.758682, 1400383, 0.01571759209036827], [1436993133.835604, 1410542, 
0.01571759209036827], [1436993613.674655, 1420684, 0.01571759209036827], [1436994093.747454, 1430832, 0.01571759209036827], [1436994573.768973, 1440986, 0.01571759209036827], [1436995053.666661, 1451174, 0.014774537645280361], [1436995533.83439, 1461345, 0.014774537645280361], [1436996013.556996, 1471495, 0.014774537645280361], [1436996493.635477, 1481663, 0.014774537645280361], [1436996973.668684, 1491822, 0.014774537645280361], [1436997453.59326, 1501979, 0.014774537645280361], [1436997933.774019, 1512139, 0.014774537645280361], [1436998413.575162, 1522290, 0.01388806477189064], [1436998893.640468, 1532431, 0.01388806477189064], [1436999373.551661, 1542579, 0.01388806477189064], [1436999853.57906, 1552734, 0.01388806477189064], [1437000333.680409, 1562888, 0.01388806477189064], [1437000813.602383, 1573037, 0.01388806477189064], [1437001293.610337, 1583190, 0.01388806477189064], [1437001773.618199, 1593341, 0.01388806477189064], [1437002253.572966, 1603497, 0.013054781593382359], [1437002733.67994, 1613657, 0.013054781593382359], [1437003213.583266, 1623809, 0.013054781593382359], [1437003693.639943, 1633966, 0.013054781593382359], [1437004173.568287, 1644113, 0.013054781593382359], [1437004653.610772, 1654268, 0.013054781593382359], [1437005133.663045, 1664424, 0.013054781593382359], [1437005613.580984, 1674567, 0.013054781593382359], [1437006093.601019, 1684715, 0.01227149460464716], [1437006573.625314, 1694857, 0.01227149460464716], [1437007053.584514, 1704999, 0.01227149460464716], [1437007533.719303, 1715150, 0.01227149460464716], [1437008013.604962, 1725282, 0.01227149460464716], [1437008493.655091, 1735432, 0.01227149460464716], [1437008973.640165, 1745584, 0.01227149460464716], [1437009453.715067, 1755742, 0.01227149460464716], [1437009933.765712, 1765896, 0.011535204015672207], [1437010413.632128, 1776052, 0.011535204015672207], [1437010893.66766, 1786195, 0.011535204015672207], [1437011373.636164, 1796346, 0.011535204015672207], [1437011853.631224, 1806481, 0.011535204015672207], [1437012333.706205, 1816617, 0.011535204015672207], [1437012813.61987, 1826754, 0.011535204015672207], [1437013293.479904, 1836883, 0.011535204015672207], [1437013773.604574, 1847029, 0.010843091644346714], [1437014253.618884, 1857175, 0.010843091644346714], [1437014733.756419, 1867312, 0.010843091644346714], [1437015213.638607, 1877459, 0.010843091644346714], [1437015693.625763, 1887608, 0.010843091644346714], [1437016173.63194, 1897759, 0.010843091644346714], [1437016653.609074, 1907909, 0.010843091644346714], [1437017133.717601, 1918074, 0.010843091644346714], [1437017613.716011, 1928220, 0.010192506946623325], [1437018093.626005, 1938377, 0.010192506946623325], [1437018573.626522, 1948523, 0.010192506946623325], [1437019053.648174, 1958678, 0.010192506946623325], [1437019533.803011, 1968831, 0.010192506946623325], [1437020013.667751, 1978978, 0.010192506946623325], [1437020493.659028, 1989133, 0.010192506946623325], [1437020973.657346, 1999287, 0.010192506946623325], [1437021453.650634, 2009437, 0.00958095584064722], [1437021933.848661, 2019588, 0.00958095584064722], [1437022413.674963, 2029736, 0.00958095584064722], [1437022893.69086, 2039894, 0.00958095584064722], [1437023373.68883, 2050054, 0.00958095584064722], [1437023853.686116, 2060205, 0.00958095584064722], [1437024333.763876, 2070362, 0.00958095584064722], [1437024813.707845, 2080507, 0.00958095584064722], [1437025293.483294, 2090645, 0.009006098844110966], [1437025773.695712, 2100793, 0.009006098844110966], [1437026253.672994, 2110943, 
0.009006098844110966], [1437026733.780775, 2121094, 0.009006098844110966], [1437027213.617849, 2131235, 0.009006098844110966], [1437027693.694451, 2141382, 0.009006098844110966], [1437028173.68596, 2151537, 0.009006098844110966], [1437028653.584833, 2161685, 0.009006098844110966], [1437029133.792483, 2171839, 0.00846573244780302], [1437029613.661672, 2181977, 0.00846573244780302], [1437030093.641009, 2192118, 0.00846573244780302], [1437030573.656274, 2202268, 0.00846573244780302], [1437031053.643631, 2212416, 0.00846573244780302], [1437031533.777478, 2222583, 0.00846573244780302], [1437032013.704008, 2232736, 0.00846573244780302], [1437032493.638393, 2242882, 0.007957788184285164], [1437032973.684986, 2253041, 0.007957788184285164], [1437033453.699562, 2263183, 0.007957788184285164], [1437033933.918074, 2273320, 0.007957788184285164], [1437034413.596351, 2283443, 0.007957788184285164], [1437034893.640496, 2293579, 0.007957788184285164], [1437035373.637761, 2303701, 0.007957788184285164], [1437035853.669947, 2313823, 0.007957788184285164], [1437036333.78905, 2323961, 0.0074803209863603115], [1437036813.699727, 2334089, 0.0074803209863603115], [1437037293.662592, 2344235, 0.0074803209863603115], [1437037773.66716, 2354364, 0.0074803209863603115], [1437038253.603687, 2364507, 0.0074803209863603115], [1437038733.78864, 2374644, 0.0074803209863603115], [1437039213.641799, 2384782, 0.0074803209863603115], [1437039693.687078, 2394923, 0.0074803209863603115], [1437040173.635717, 2405058, 0.0070315017364919186], [1437040653.673331, 2415194, 0.0070315017364919186], [1437041133.764768, 2425322, 0.0070315017364919186], [1437041613.629279, 2435449, 0.0070315017364919186], [1437042093.703985, 2445575, 0.0070315017364919186], [1437042573.496029, 2455712, 0.0070315017364919186], [1437043053.686022, 2465844, 0.0070315017364919186], [1437043533.731929, 2475974, 0.0070315017364919186], [1437044013.636245, 2486095, 0.006609611678868532], [1437044493.69923, 2496238, 0.006609611678868532], [1437044973.652155, 2506373, 0.006609611678868532], [1437045453.691467, 2516497, 0.006609611678868532], [1437045933.935804, 2526637, 0.006609611678868532], [1437046413.635583, 2536770, 0.006609611678868532], [1437046893.626337, 2546896, 0.006609611678868532], [1437047373.67437, 2557029, 0.006609611678868532], [1437047853.652939, 2567169, 0.0062130349688231945], [1437048333.778436, 2577306, 0.0062130349688231945], [1437048813.654248, 2587433, 0.0062130349688231945], [1437049293.610609, 2597552, 0.0062130349688231945], [1437049773.646573, 2607690, 0.0062130349688231945], [1437050253.667925, 2617808, 0.0062130349688231945], [1437050733.735291, 2627933, 0.0062130349688231945], [1437051213.620222, 2638053, 0.0062130349688231945], [1437051693.601978, 2648171, 0.005840253084897995], [1437052173.634985, 2658299, 0.005840253084897995], [1437052653.687176, 2668425, 0.005840253084897995], [1437053133.762819, 2678556, 0.005840253084897995], [1437053613.643698, 2688671, 0.005840253084897995], [1437054093.673047, 2698804, 0.005840253084897995], [1437054573.667371, 2708956, 0.005840253084897995], [1437055053.650441, 2719087, 0.005840253084897995], [1437055533.778469, 2729219, 0.005489837843924761], [1437056013.694082, 2739343, 0.005489837843924761], [1437056493.674871, 2749458, 0.005489837843924761], [1437056973.700234, 2759575, 0.005489837843924761], [1437057453.666129, 2769697, 0.005489837843924761], [1437057933.848506, 2779821, 0.005489837843924761], [1437058413.643799, 2789941, 0.005489837843924761], [1437058893.715386, 2800076, 
0.005489837843924761], [1437059373.62596, 2810207, 0.005160447675734758], [1437059853.650848, 2820334, 0.005160447675734758], [1437060333.792248, 2830465, 0.005160447675734758], [1437060813.682955, 2840600, 0.005160447675734758], [1437061293.681795, 2850745, 0.005160447675734758], [1437061773.691182, 2860880, 0.005160447675734758], [1437062253.662987, 2871013, 0.005160447675734758], [1437062733.760419, 2881153, 0.005160447675734758], [1437063213.651969, 2891278, 0.004850820638239384], [1437063693.723523, 2901406, 0.004850820638239384], [1437064173.68663, 2911533, 0.004850820638239384], [1437064653.547643, 2921667, 0.004850820638239384], [1437065133.62645, 2931813, 0.004850820638239384], [1437065613.566569, 2941947, 0.004850820638239384], [1437066093.537804, 2952102, 0.004850820638239384], [1437066573.529332, 2962243, 0.004850820638239384], [1437067053.520098, 2972400, 0.004559771623462439], [1437067533.605733, 2982561, 0.004559771623462439], [1437068013.535467, 2992698, 0.004559771623462439], [1437068493.559976, 3002839, 0.004559771623462439], [1437068973.558743, 3012983, 0.004559771623462439], [1437069453.562661, 3023116, 0.004559771623462439], [1437069933.627071, 3033256, 0.004559771623462439], [1437070413.574131, 3043386, 0.004286185372620821], [1437070893.658803, 3053528, 0.004286185372620821], [1437071373.638711, 3063659, 0.004286185372620821], [1437071853.621384, 3073794, 0.004286185372620821], [1437072333.665269, 3083926, 0.004286185372620821], [1437072813.584388, 3094040, 0.004286185372620821], [1437073293.569178, 3104172, 0.004286185372620821]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d3.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d3.json
new file mode 100644
index 0000000000..69191b9154
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d3.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 0.0], [1436927853.989794, 23650, 7360.0], [1436929773.992781, 64316, 7360.0], [1436931694.094564, 104766, 7360.0], [1436933614.044337, 145062, 7360.0], [1436935533.819562, 185538, 7360.0], [1436937454.138383, 226331, 7360.0], [1436939373.69067, 267137, 7360.0], [1436941293.596313, 308088, 7360.0], [1436943213.621881, 349144, 7360.0], [1436945133.677298, 390180, 7360.0], [1436947053.616578, 431216, 7360.0], [1436948973.512541, 472260, 7360.0], [1436950893.563009, 513185, 7360.0], [1436952813.621004, 554042, 7360.0], [1436954733.738119, 594882, 7360.0], [1436956653.601937, 635705, 7360.0], [1436958573.646778, 676543, 7360.0], [1436960493.578778, 717383, 7360.0], [1436962413.573845, 758252, 7360.0], [1436964333.658419, 799099, 7360.0], [1436966253.586517, 839901, 7360.0], [1436968173.593885, 880748, 7360.0], [1436970093.595964, 921560, 7360.0], [1436972013.604738, 962349, 7360.0], [1436973933.689516, 1003147, 7360.0], [1436975853.524388, 1043955, 7360.0], [1436977773.575627, 1084733, 7360.0], [1436979693.699591, 1125483, 7360.0], [1436981613.563551, 1166236, 7360.0], [1436983533.684994, 1207025, 7360.0], [1436985453.690084, 1247835, 7360.0], [1436987373.596962, 1288473, 7360.0], [1436989293.756905, 1329221, 7360.0], [1436991213.663033, 1369914, 7360.0], [1436993133.835604, 1410542, 7360.0], [1436995053.666661, 1451174, 7360.0], [1436996973.668684, 1491822, 7360.0], [1436998893.640468, 1532431, 7360.0], [1437000813.602383, 1573037, 7360.0], [1437002733.67994, 1613657, 7360.0], [1437004653.610772, 1654268, 7360.0], [1437006573.625314, 1694857, 7360.0], [1437008493.655091, 1735432, 7360.0], [1437010413.632128, 1776052, 7360.0], [1437012333.706205, 1816617, 7360.0], [1437014253.618884, 1857175, 7360.0], [1437016173.63194, 1897759, 7360.0], [1437018093.626005, 1938377, 7360.0], [1437020013.667751, 1978978, 7360.0], [1437021933.848661, 2019588, 7360.0], [1437023853.686116, 2060205, 7360.0], [1437025773.695712, 2100793, 7360.0], [1437027693.694451, 2141382, 7360.0], [1437029613.661672, 2181977, 7360.0], [1437031533.777478, 2222583, 7360.0], [1437033453.699562, 2263183, 7360.0], [1437035373.637761, 2303701, 7360.0], [1437037293.662592, 2344235, 7360.0], [1437039213.641799, 2384782, 7360.0], [1437041133.764768, 2425322, 7360.0], [1437043053.686022, 2465844, 7360.0], [1437044973.652155, 2506373, 7360.0], [1437046893.626337, 2546896, 7862.0], [1437048813.654248, 2587433, 7862.0], [1437050733.735291, 2627933, 7862.0], [1437052653.687176, 2668425, 7862.0], [1437054573.667371, 2708956, 7862.0], [1437056493.674871, 2749458, 7862.0], [1437058413.643799, 2789941, 7862.0], [1437060333.792248, 2830465, 7862.0], [1437062253.662987, 2871013, 7862.0], [1437064173.68663, 2911533, 7862.0], [1437066093.537804, 2952102, 7862.0], [1437068013.535467, 2992698, 7862.0], [1437069933.627071, 3033256, 7862.0], [1437071853.621384, 3073794, 7862.0], [1437072333.665269, 3083926, 7862.0], [1437072813.584388, 3094040, 7862.0], [1437073293.569178, 3104172, 7862.0]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d4.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d4.json
new file mode 100644
index 0000000000..caf1ae6e7f
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/alpha/d4.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 2.461352825164795], [1436926413.945391, 1476, 12.772720336914062], [1436926893.945037, 6006, 12.195232391357422], [1436927373.995472, 13786, 11.528279304504395], [1436927853.989794, 23650, 10.722719192504883], [1436928334.132361, 33755, 10.215253829956055], [1436928813.973288, 43941, 9.730447769165039], [1436929293.975949, 54146, 9.399007797241211], [1436929773.992781, 64316, 9.1018648147583], [1436930253.997415, 74465, 8.961446762084961], [1436930734.203004, 84611, 8.757476806640625], [1436931214.03644, 94700, 8.4615478515625], [1436931694.094564, 104766, 8.506814956665039], [1436932174.114955, 114817, 8.246719360351562], [1436932654.161382, 124880, 8.329349517822266], [1436933133.960214, 134977, 7.90853214263916], [1436933614.044337, 145062, 8.192558288574219], [1436934094.166206, 155169, 7.865443229675293], [1436934574.106036, 165284, 7.910976886749268], [1436935054.150647, 175402, 7.925509929656982], [1436935533.819562, 185538, 7.866455078125], [1436936013.710422, 195712, 7.9123406410217285], [1436936493.609025, 205906, 7.748654842376709], [1436936973.683892, 216099, 7.849164009094238], [1436937454.138383, 226331, 7.784902572631836], [1436937933.838475, 236532, 7.749933242797852], [1436938413.89688, 246724, 7.777050971984863], [1436938894.018652, 256925, 7.663984775543213], [1436939373.69067, 267137, 7.602056980133057], [1436939853.673692, 277369, 7.539070129394531], [1436940333.651346, 287620, 7.575552463531494], [1436940813.599579, 297848, 7.47900390625], [1436941293.596313, 308088, 7.403858184814453], [1436941773.659172, 318362, 7.589539527893066], [1436942253.648479, 328621, 7.511919975280762], [1436942733.752284, 338892, 7.31054162979126], [1436943213.621881, 349144, 7.261094570159912], [1436943693.698743, 359399, 7.552957534790039], [1436944173.578463, 369649, 7.449452877044678], [1436944653.692217, 379912, 7.177209854125977], [1436945133.677298, 390180, 7.308793067932129], [1436945613.572411, 400445, 7.229344844818115], [1436946093.56123, 410703, 7.129981994628906], [1436946573.542364, 420958, 7.127549171447754], [1436947053.616578, 431216, 7.538583755493164], [1436947533.636973, 441483, 7.030594825744629], [1436948013.541574, 451751, 6.98097038269043], [1436948493.560223, 462015, 7.213271141052246], [1436948973.512541, 472260, 7.1727519035339355], [1436949453.550055, 482483, 6.985068321228027], [1436949933.828011, 492731, 7.051283836364746], [1436950413.603177, 502957, 7.082402229309082], [1436950893.563009, 513185, 7.1637864112854], [1436951373.620887, 523410, 7.193849086761475], [1436951853.61941, 533618, 7.1212921142578125], [1436952333.694447, 543828, 7.208009719848633], [1436952813.621004, 554042, 7.28671932220459], [1436953293.588156, 564251, 6.941026210784912], [1436953773.599734, 574464, 7.230144500732422], [1436954253.621309, 584672, 6.815900802612305], [1436954733.738119, 594882, 7.060589790344238], [1436955213.56617, 605091, 7.079995155334473], [1436955693.585366, 615296, 7.300849437713623], [1436956173.626395, 625501, 6.927395343780518], [1436956653.601937, 635705, 6.893837928771973], [1436957133.665878, 645915, 6.965301990509033], [1436957613.584762, 656116, 6.902514457702637], [1436958093.549783, 666331, 7.2444868087768555], [1436958573.646778, 676543, 6.784783840179443], [1436959053.585655, 686750, 6.800273418426514], [1436959533.679696, 696961, 6.743415355682373], [1436960013.633292, 707173, 7.012747764587402], [1436960493.578778, 717383, 6.548677921295166], [1436960973.596715, 727598, 6.638228416442871], [1436961453.625644, 737818, 
6.884350776672363], [1436961933.740339, 748040, 6.797428607940674], [1436962413.573845, 758252, 6.815422058105469], [1436962893.610678, 768470, 6.7392377853393555], [1436963373.642878, 778674, 6.8375959396362305], [1436963853.558388, 788877, 6.7254252433776855], [1436964333.658419, 799099, 6.765130996704102], [1436964813.573319, 809289, 6.7060980796813965], [1436965293.542098, 819484, 6.63279390335083], [1436965773.545453, 829687, 6.587352752685547], [1436966253.586517, 839901, 6.4957275390625], [1436966733.639348, 850120, 6.765798091888428], [1436967213.697288, 860330, 6.681786060333252], [1436967693.617172, 870539, 6.696804523468018], [1436968173.593885, 880748, 6.571035385131836], [1436968653.560836, 890955, 6.29492712020874], [1436969133.676337, 901164, 6.679598331451416], [1436969613.506638, 911358, 6.548522472381592], [1436970093.595964, 921560, 6.585646629333496], [1436970573.541227, 931756, 6.589619159698486], [1436971053.624316, 941945, 6.333208084106445], [1436971533.655543, 952138, 6.582470417022705], [1436972013.604738, 962349, 6.289045810699463], [1436972493.613199, 972551, 6.360206127166748], [1436972973.501155, 982746, 6.567287921905518], [1436973453.64842, 992945, 6.246123313903809], [1436973933.689516, 1003147, 6.44004487991333], [1436974413.577769, 1013350, 6.315634727478027], [1436974893.542281, 1023545, 6.289544105529785], [1436975373.638453, 1033759, 6.412042140960693], [1436975853.524388, 1043955, 6.165371894836426], [1436976333.625792, 1054148, 6.403027534484863], [1436976813.610661, 1064342, 6.37597131729126], [1436977293.601581, 1074539, 6.336863994598389], [1436977773.575627, 1084733, 6.377552032470703], [1436978253.564972, 1094914, 6.28995943069458], [1436978733.673144, 1105109, 6.28420352935791], [1436979213.540585, 1115293, 6.277828216552734], [1436979693.699591, 1125483, 6.185207843780518], [1436980173.613012, 1135670, 6.186310768127441], [1436980653.575769, 1145862, 5.922095775604248], [1436981133.719264, 1156045, 6.141305923461914], [1436981613.563551, 1166236, 6.10508394241333], [1436982093.553233, 1176436, 5.967081069946289], [1436982573.577846, 1186636, 5.960882186889648], [1436983053.605749, 1196837, 6.2222185134887695], [1436983533.684994, 1207025, 6.051136493682861], [1436984013.561492, 1217233, 6.087917804718018], [1436984493.629873, 1227437, 5.95945405960083], [1436984973.606714, 1237643, 5.971570014953613], [1436985453.690084, 1247835, 5.969781398773193], [1436985933.711388, 1257951, 6.040994644165039], [1436986413.598807, 1268125, 6.142050743103027], [1436986893.631797, 1278290, 6.03120231628418], [1436987373.596962, 1288473, 5.921470642089844], [1436987853.555549, 1298650, 5.921937942504883], [1436988333.722032, 1308841, 6.050085067749023], [1436988813.55697, 1319018, 5.837893486022949], [1436989293.756905, 1329221, 5.927487850189209], [1436989773.665141, 1339417, 6.117348670959473], [1436990253.768302, 1349610, 6.052918434143066], [1436990733.708919, 1359759, 5.8977789878845215], [1436991213.663033, 1369914, 5.903198719024658], [1436991693.730925, 1380074, 5.85245418548584], [1436992173.751791, 1390224, 5.902153968811035], [1436992653.758682, 1400383, 5.822136878967285], [1436993133.835604, 1410542, 5.88037633895874], [1436993613.674655, 1420684, 5.778636932373047], [1436994093.747454, 1430832, 5.876591682434082], [1436994573.768973, 1440986, 6.196285724639893], [1436995053.666661, 1451174, 5.7718634605407715], [1436995533.83439, 1461345, 5.931266784667969], [1436996013.556996, 1471495, 5.9706597328186035], [1436996493.635477, 1481663, 
5.589694023132324], [1436996973.668684, 1491822, 5.787637233734131], [1436997453.59326, 1501979, 5.634321689605713], [1436997933.774019, 1512139, 5.699962615966797], [1436998413.575162, 1522290, 5.807012557983398], [1436998893.640468, 1532431, 5.559602737426758], [1436999373.551661, 1542579, 5.918235778808594], [1436999853.57906, 1552734, 5.745569229125977], [1437000333.680409, 1562888, 5.59443473815918], [1437000813.602383, 1573037, 5.703190326690674], [1437001293.610337, 1583190, 5.468636512756348], [1437001773.618199, 1593341, 5.610755920410156], [1437002253.572966, 1603497, 5.4396867752075195], [1437002733.67994, 1613657, 5.7537946701049805], [1437003213.583266, 1623809, 5.7613725662231445], [1437003693.639943, 1633966, 5.439754009246826], [1437004173.568287, 1644113, 5.4889116287231445], [1437004653.610772, 1654268, 5.39843225479126], [1437005133.663045, 1664424, 5.576738357543945], [1437005613.580984, 1674567, 5.662004470825195], [1437006093.601019, 1684715, 5.3926777839660645], [1437006573.625314, 1694857, 5.464866638183594], [1437007053.584514, 1704999, 5.40261173248291], [1437007533.719303, 1715150, 5.23733377456665], [1437008013.604962, 1725282, 5.448479652404785], [1437008493.655091, 1735432, 5.684703826904297], [1437008973.640165, 1745584, 5.400024890899658], [1437009453.715067, 1755742, 5.378822326660156], [1437009933.765712, 1765896, 5.45297384262085], [1437010413.632128, 1776052, 5.248030185699463], [1437010893.66766, 1786195, 5.3377580642700195], [1437011373.636164, 1796346, 5.292956352233887], [1437011853.631224, 1806481, 5.438100814819336], [1437012333.706205, 1816617, 5.148743629455566], [1437012813.61987, 1826754, 5.319127559661865], [1437013293.479904, 1836883, 5.1646199226379395], [1437013773.604574, 1847029, 5.494720458984375], [1437014253.618884, 1857175, 5.17764949798584], [1437014733.756419, 1867312, 5.14331579208374], [1437015213.638607, 1877459, 5.309914588928223], [1437015693.625763, 1887608, 5.542352676391602], [1437016173.63194, 1897759, 5.075393199920654], [1437016653.609074, 1907909, 5.249225616455078], [1437017133.717601, 1918074, 5.392384052276611], [1437017613.716011, 1928220, 5.38590669631958], [1437018093.626005, 1938377, 5.229607105255127], [1437018573.626522, 1948523, 5.287610054016113], [1437019053.648174, 1958678, 5.2798333168029785], [1437019533.803011, 1968831, 5.151246070861816], [1437020013.667751, 1978978, 5.118294715881348], [1437020493.659028, 1989133, 5.327050685882568], [1437020973.657346, 1999287, 5.174264430999756], [1437021453.650634, 2009437, 5.1660661697387695], [1437021933.848661, 2019588, 5.089689254760742], [1437022413.674963, 2029736, 5.06661319732666], [1437022893.69086, 2039894, 5.031608581542969], [1437023373.68883, 2050054, 4.874476432800293], [1437023853.686116, 2060205, 5.107512474060059], [1437024333.763876, 2070362, 5.135380268096924], [1437024813.707845, 2080507, 5.087984561920166], [1437025293.483294, 2090645, 5.240448474884033], [1437025773.695712, 2100793, 4.930302619934082], [1437026253.672994, 2110943, 4.914392471313477], [1437026733.780775, 2121094, 5.182378768920898], [1437027213.617849, 2131235, 4.93843412399292], [1437027693.694451, 2141382, 4.924433708190918], [1437028173.68596, 2151537, 4.957921028137207], [1437028653.584833, 2161685, 5.040386199951172], [1437029133.792483, 2171839, 5.01956033706665], [1437029613.661672, 2181977, 4.987490177154541], [1437030093.641009, 2192118, 4.960195064544678], [1437030573.656274, 2202268, 5.0094523429870605], [1437031053.643631, 2212416, 4.83445930480957], 
[1437031533.777478, 2222583, 4.922268390655518], [1437032013.704008, 2232736, 5.113382339477539], [1437032493.638393, 2242882, 4.881488800048828], [1437032973.684986, 2253041, 4.953296661376953], [1437033453.699562, 2263183, 4.865671157836914], [1437033933.918074, 2273320, 4.829331874847412], [1437034413.596351, 2283443, 4.777036190032959], [1437034893.640496, 2293579, 4.864566326141357], [1437035373.637761, 2303701, 4.988693714141846], [1437035853.669947, 2313823, 5.016432285308838], [1437036333.78905, 2323961, 4.651939868927002], [1437036813.699727, 2334089, 4.767807960510254], [1437037293.662592, 2344235, 4.628738880157471], [1437037773.66716, 2354364, 4.929834842681885], [1437038253.603687, 2364507, 4.739555835723877], [1437038733.78864, 2374644, 4.821824073791504], [1437039213.641799, 2384782, 4.853730201721191], [1437039693.687078, 2394923, 4.581423759460449], [1437040173.635717, 2405058, 4.452754497528076], [1437040653.673331, 2415194, 4.837629318237305], [1437041133.764768, 2425322, 4.752482891082764], [1437041613.629279, 2435449, 4.730231761932373], [1437042093.703985, 2445575, 4.5618896484375], [1437042573.496029, 2455712, 4.673112869262695], [1437043053.686022, 2465844, 4.565918922424316], [1437043533.731929, 2475974, 4.7191481590271], [1437044013.636245, 2486095, 4.589008331298828], [1437044493.69923, 2496238, 4.599475383758545], [1437044973.652155, 2506373, 4.544175624847412], [1437045453.691467, 2516497, 4.4221673011779785], [1437045933.935804, 2526637, 4.44448709487915], [1437046413.635583, 2536770, 4.647110939025879], [1437046893.626337, 2546896, 4.768988609313965], [1437047373.67437, 2557029, 4.5318827629089355], [1437047853.652939, 2567169, 4.501277923583984], [1437048333.778436, 2577306, 4.6167216300964355], [1437048813.654248, 2587433, 4.66096305847168], [1437049293.610609, 2597552, 4.529193878173828], [1437049773.646573, 2607690, 4.455351829528809], [1437050253.667925, 2617808, 4.51211404800415], [1437050733.735291, 2627933, 4.803231716156006], [1437051213.620222, 2638053, 4.645476341247559], [1437051693.601978, 2648171, 4.419768810272217], [1437052173.634985, 2658299, 4.48175048828125], [1437052653.687176, 2668425, 4.397725582122803], [1437053133.762819, 2678556, 4.188413619995117], [1437053613.643698, 2688671, 4.291479110717773], [1437054093.673047, 2698804, 4.321218013763428], [1437054573.667371, 2708956, 4.311710834503174], [1437055053.650441, 2719087, 4.481810092926025], [1437055533.778469, 2729219, 4.452049255371094], [1437056013.694082, 2739343, 4.455989360809326], [1437056493.674871, 2749458, 4.415104866027832], [1437056973.700234, 2759575, 4.259828567504883], [1437057453.666129, 2769697, 4.510563373565674], [1437057933.848506, 2779821, 4.221935272216797], [1437058413.643799, 2789941, 4.437899112701416], [1437058893.715386, 2800076, 4.302872657775879], [1437059373.62596, 2810207, 4.228428363800049], [1437059853.650848, 2820334, 4.220061779022217], [1437060333.792248, 2830465, 4.138088703155518], [1437060813.682955, 2840600, 4.2196125984191895], [1437061293.681795, 2850745, 4.1594085693359375], [1437061773.691182, 2860880, 4.179514408111572], [1437062253.662987, 2871013, 4.202476978302002], [1437062733.760419, 2881153, 4.282044887542725], [1437063213.651969, 2891278, 4.200533866882324], [1437063693.723523, 2901406, 4.263350486755371], [1437064173.68663, 2911533, 4.378939628601074], [1437064653.547643, 2921667, 4.202810287475586], [1437065133.62645, 2931813, 4.193121910095215], [1437065613.566569, 2941947, 4.132870197296143], [1437066093.537804, 2952102, 
4.35767936706543], [1437066573.529332, 2962243, 4.211732864379883], [1437067053.520098, 2972400, 4.020431041717529], [1437067533.605733, 2982561, 4.342063903808594], [1437068013.535467, 2992698, 4.197565078735352], [1437068493.559976, 3002839, 3.8806259632110596], [1437068973.558743, 3012983, 3.871702194213867], [1437069453.562661, 3023116, 4.064865589141846], [1437069933.627071, 3033256, 3.817744731903076], [1437070413.574131, 3043386, 4.106888294219971], [1437070893.658803, 3053528, 4.235474586486816], [1437071373.638711, 3063659, 4.127055644989014], [1437071853.621384, 3073794, 4.176018238067627], [1437072333.665269, 3083926, 4.048959732055664], [1437072813.584388, 3094040, 4.178991794586182], [1437073293.569178, 3104172, 3.8385396003723145]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d1.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d1.json
new file mode 100644
index 0000000000..27ff64e5dd
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d1.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 1.009283951483666897], [1436926413.945391, 1476, 0.932567862421274185], [1436926893.945037, 6006, 0.02773338556289673], [1436927373.995472, 13786, 0.021291319280862808], [1436927853.989794, 23650, 0.515582754276692867], [1436928334.132361, 33755, 0.011689444072544575], [1436928813.973288, 43941, 0.009183925576508045], [1436929293.975949, 54146, 0.007850822061300278], [1436929773.992781, 64316, 0.007189035415649414], [1436930253.997415, 74465, 0.007230754010379314], [1436930734.203004, 84611, 0.007685001939535141], [1436931214.03644, 94700, 0.008264732547104359], [1436931694.094564, 104766, 0.008946491405367851], [1436932174.114955, 114817, 0.00966302677989006], [1436932654.161382, 124880, 0.010994276031851768], [1436933133.960214, 134977, 0.01196141354739666], [1436933614.044337, 145062, 0.012673594057559967], [1436934094.166206, 155169, 0.013639944605529308], [1436934574.106036, 165284, 0.014305333606898785], [1436935054.150647, 175402, 0.014946178533136845], [1436935533.819562, 185538, 0.015736915171146393], [1436936013.710422, 195712, 0.01633097417652607], [1436936493.609025, 205906, 0.01669587567448616], [1436936973.683892, 216099, 0.017459288239479065], [1436937454.138383, 226331, 0.018532060086727142], [1436937933.838475, 236532, 0.01949254982173443], [1436938413.89688, 246724, 0.01951725408434868], [1436938894.018652, 256925, 0.019763393327593803], [1436939373.69067, 267137, 0.02008610963821411], [1436939853.673692, 277369, 0.021090799942612648], [1436940333.651346, 287620, 0.021408839151263237], [1436940813.599579, 297848, 0.021988894790410995], [1436941293.596313, 308088, 0.02236073836684227], [1436941773.659172, 318362, 0.022547174245119095], [1436942253.648479, 328621, 0.02303086407482624], [1436942733.752284, 338892, 0.023787079378962517], [1436943213.621881, 349144, 0.024007514119148254], [1436943693.698743, 359399, 0.02414763905107975], [1436944173.578463, 369649, 0.024576496332883835], [1436944653.692217, 379912, 0.02469169721007347], [1436945133.677298, 390180, 0.024951916188001633], [1436945613.572411, 400445, 0.025548970326781273], [1436946093.56123, 410703, 0.025769377127289772], [1436946573.542364, 420958, 0.02602097950875759], [1436947053.616578, 431216, 0.026028109714388847], [1436947533.636973, 441483, 0.026348495855927467], [1436948013.541574, 451751, 0.02621930092573166], [1436948493.560223, 462015, 0.02671053633093834], [1436948973.512541, 472260, 0.0272178016602993], [1436949453.550055, 482483, 0.02734796144068241], [1436949933.828011, 492731, 0.027217809110879898], [1436950413.603177, 502957, 0.027318621054291725], [1436950893.563009, 513185, 0.027304155752062798], [1436951373.620887, 523410, 0.027759933844208717], [1436951853.61941, 533618, 0.028056234121322632], [1436952333.694447, 543828, 0.028620803728699684], [1436952813.621004, 554042, 0.028957637026906013], [1436953293.588156, 564251, 0.029187509790062904], [1436953773.599734, 574464, 0.028960268944501877], [1436954253.621309, 584672, 0.02891424670815468], [1436954733.738119, 594882, 0.029211293905973434], [1436955213.56617, 605091, 0.029444213956594467], [1436955693.585366, 615296, 0.02974688820540905], [1436956173.626395, 625501, 0.03026159666478634], [1436956653.601937, 635705, 0.03039497137069702], [1436957133.665878, 645915, 0.03041839227080345], [1436957613.584762, 656116, 0.030588043853640556], [1436958093.549783, 666331, 0.030284974724054337], [1436958573.646778, 676543, 0.030354496091604233], [1436959053.585655, 686750, 0.030551007017493248], [1436959533.679696, 696961, 
0.03068561479449272], [1436960013.633292, 707173, 0.030921893194317818], [1436960493.578778, 717383, 0.031080031767487526], [1436960973.596715, 727598, 0.030773505568504333], [1436961453.625644, 737818, 0.03084484674036503], [1436961933.740339, 748040, 0.03110458515584469], [1436962413.573845, 758252, 0.03114113211631775], [1436962893.610678, 768470, 0.03101053647696972], [1436963373.642878, 778674, 0.03110116347670555], [1436963853.558388, 788877, 0.031342316418886185], [1436964333.658419, 799099, 0.03130127117037773], [1436964813.573319, 809289, 0.031288161873817444], [1436965293.542098, 819484, 0.031435444951057434], [1436965773.545453, 829687, 0.03166936710476875], [1436966253.586517, 839901, 0.03169429674744606], [1436966733.639348, 850120, 0.03191458433866501], [1436967213.697288, 860330, 0.03205746412277222], [1436967693.617172, 870539, 0.03206293657422066], [1436968173.593885, 880748, 0.031957853585481644], [1436968653.560836, 890955, 0.0316658616065979], [1436969133.676337, 901164, 0.031929533928632736], [1436969613.506638, 911358, 0.03174331784248352], [1436970093.595964, 921560, 0.03157960623502731], [1436970573.541227, 931756, 0.03176721930503845], [1436971053.624316, 941945, 0.031810544431209564], [1436971533.655543, 952138, 0.031946416944265366], [1436972013.604738, 962349, 0.03205405920743942], [1436972493.613199, 972551, 0.031924981623888016], [1436972973.501155, 982746, 0.03199697285890579], [1436973453.64842, 992945, 0.03204970061779022], [1436973933.689516, 1003147, 0.032020214945077896], [1436974413.577769, 1013350, 0.03207497298717499], [1436974893.542281, 1023545, 0.03221454098820686], [1436975373.638453, 1033759, 0.032191887497901917], [1436975853.524388, 1043955, 0.03240729123353958], [1436976333.625792, 1054148, 0.032219529151916504], [1436976813.610661, 1064342, 0.03200426697731018], [1436977293.601581, 1074539, 0.03198647499084473], [1436977773.575627, 1084733, 0.0320645235478878], [1436978253.564972, 1094914, 0.0322980061173439], [1436978733.673144, 1105109, 0.032482605427503586], [1436979213.540585, 1115293, 0.032628435641527176], [1436979693.699591, 1125483, 0.032744552940130234], [1436980173.613012, 1135670, 0.03268158435821533], [1436980653.575769, 1145862, 0.0324023962020874], [1436981133.719264, 1156045, 0.03237328305840492], [1436981613.563551, 1166236, 0.03202575817704201], [1436982093.553233, 1176436, 0.03216284513473511], [1436982573.577846, 1186636, 0.03232415020465851], [1436983053.605749, 1196837, 0.0324099175632], [1436983533.684994, 1207025, 0.03245137259364128], [1436984013.561492, 1217233, 0.032246463000774384], [1436984493.629873, 1227437, 0.032042667269706726], [1436984973.606714, 1237643, 0.0318642184138298], [1436985453.690084, 1247835, 0.03191140666604042], [1436985933.711388, 1257951, 0.032287366688251495], [1436986413.598807, 1268125, 0.03226638585329056], [1436986893.631797, 1278290, 0.03252791240811348], [1436987373.596962, 1288473, 0.03241675719618797], [1436987853.555549, 1298650, 0.032103829085826874], [1436988333.722032, 1308841, 0.031904906034469604], [1436988813.55697, 1319018, 0.03179024159908295], [1436989293.756905, 1329221, 0.03168707340955734], [1436989773.665141, 1339417, 0.03160175681114197], [1436990253.768302, 1349610, 0.03161788731813431], [1436990733.708919, 1359759, 0.031772397458553314], [1436991213.663033, 1369914, 0.031758904457092285], [1436991693.730925, 1380074, 0.031629469245672226], [1436992173.751791, 1390224, 0.03154703974723816], [1436992653.758682, 1400383, 0.031527940183877945], [1436993133.835604, 
1410542, 0.03169580549001694], [1436993613.674655, 1420684, 0.03182605654001236], [1436994093.747454, 1430832, 0.03185024857521057], [1436994573.768973, 1440986, 0.03199737146496773], [1436995053.666661, 1451174, 0.03156095743179321], [1436995533.83439, 1461345, 0.03150693327188492], [1436996013.556996, 1471495, 0.031496383249759674], [1436996493.635477, 1481663, 0.0313432440161705], [1436996973.668684, 1491822, 0.031145794317126274], [1436997453.59326, 1501979, 0.03106667660176754], [1436997933.774019, 1512139, 0.03143244981765747], [1436998413.575162, 1522290, 0.03142988309264183], [1436998893.640468, 1532431, 0.03132546320557594], [1436999373.551661, 1542579, 0.03125471621751785], [1436999853.57906, 1552734, 0.03098788857460022], [1437000333.680409, 1562888, 0.0308846328407526], [1437000813.602383, 1573037, 0.03082612156867981], [1437001293.610337, 1583190, 0.030793681740760803], [1437001773.618199, 1593341, 0.03087364137172699], [1437002253.572966, 1603497, 0.030839646235108376], [1437002733.67994, 1613657, 0.030705047771334648], [1437003213.583266, 1623809, 0.03071814589202404], [1437003693.639943, 1633966, 0.0304812490940094], [1437004173.568287, 1644113, 0.03030412085354328], [1437004653.610772, 1654268, 0.03032425045967102], [1437005133.663045, 1664424, 0.030430471524596214], [1437005613.580984, 1674567, 0.03036225587129593], [1437006093.601019, 1684715, 0.03056645393371582], [1437006573.625314, 1694857, 0.03043070062994957], [1437007053.584514, 1704999, 0.030224520713090897], [1437007533.719303, 1715150, 0.03024231642484665], [1437008013.604962, 1725282, 0.03009769506752491], [1437008493.655091, 1735432, 0.030214866623282433], [1437008973.640165, 1745584, 0.030181538313627243], [1437009453.715067, 1755742, 0.03017231822013855], [1437009933.765712, 1765896, 0.030141284689307213], [1437010413.632128, 1776052, 0.030052203685045242], [1437010893.66766, 1786195, 0.030078601092100143], [1437011373.636164, 1796346, 0.029969291761517525], [1437011853.631224, 1806481, 0.02999536693096161], [1437012333.706205, 1816617, 0.030100464820861816], [1437012813.61987, 1826754, 0.03008824959397316], [1437013293.479904, 1836883, 0.029995709657669067], [1437013773.604574, 1847029, 0.02995096519589424], [1437014253.618884, 1857175, 0.02980179339647293], [1437014733.756419, 1867312, 0.029607007279992104], [1437015213.638607, 1877459, 0.02952035330235958], [1437015693.625763, 1887608, 0.02937002293765545], [1437016173.63194, 1897759, 0.029285306110978127], [1437016653.609074, 1907909, 0.029194746166467667], [1437017133.717601, 1918074, 0.029153630137443542], [1437017613.716011, 1928220, 0.029063496738672256], [1437018093.626005, 1938377, 0.028990253806114197], [1437018573.626522, 1948523, 0.0290801040828228], [1437019053.648174, 1958678, 0.029026925563812256], [1437019533.803011, 1968831, 0.029071522876620293], [1437020013.667751, 1978978, 0.02911040186882019], [1437020493.659028, 1989133, 0.02908971533179283], [1437020973.657346, 1999287, 0.028982823714613914], [1437021453.650634, 2009437, 0.028793631121516228], [1437021933.848661, 2019588, 0.02868799678981304], [1437022413.674963, 2029736, 0.028585929423570633], [1437022893.69086, 2039894, 0.028488371521234512], [1437023373.68883, 2050054, 0.028293771669268608], [1437023853.686116, 2060205, 0.028227869421243668], [1437024333.763876, 2070362, 0.0280953086912632], [1437024813.707845, 2080507, 0.02794187143445015], [1437025293.483294, 2090645, 0.0278786551207304], [1437025773.695712, 2100793, 0.02786232903599739], [1437026253.672994, 2110943, 
0.02783624827861786], [1437026733.780775, 2121094, 0.027756746858358383], [1437027213.617849, 2131235, 0.027644069865345955], [1437027693.694451, 2141382, 0.02752004750072956], [1437028173.68596, 2151537, 0.0274327602237463], [1437028653.584833, 2161685, 0.027434347197413445], [1437029133.792483, 2171839, 0.02731819450855255], [1437029613.661672, 2181977, 0.027138520032167435], [1437030093.641009, 2192118, 0.027088932693004608], [1437030573.656274, 2202268, 0.02713087759912014], [1437031053.643631, 2212416, 0.027159670367836952], [1437031533.777478, 2222583, 0.027089878916740417], [1437032013.704008, 2232736, 0.026989545673131943], [1437032493.638393, 2242882, 0.02692277729511261], [1437032973.684986, 2253041, 0.026783647015690804], [1437033453.699562, 2263183, 0.026735099032521248], [1437033933.918074, 2273320, 0.02665248140692711], [1437034413.596351, 2283443, 0.02659791149199009], [1437034893.640496, 2293579, 0.026540575549006462], [1437035373.637761, 2303701, 0.02647154964506626], [1437035853.669947, 2313823, 0.02645135670900345], [1437036333.78905, 2323961, 0.026429900899529457], [1437036813.699727, 2334089, 0.026324935257434845], [1437037293.662592, 2344235, 0.026287639513611794], [1437037773.66716, 2354364, 0.02626391313970089], [1437038253.603687, 2364507, 0.026225272566080093], [1437038733.78864, 2374644, 0.026248561218380928], [1437039213.641799, 2384782, 0.026243599131703377], [1437039693.687078, 2394923, 0.026255469769239426], [1437040173.635717, 2405058, 0.026186810806393623], [1437040653.673331, 2415194, 0.02606010064482689], [1437041133.764768, 2425322, 0.026031550019979477], [1437041613.629279, 2435449, 0.02595149166882038], [1437042093.703985, 2445575, 0.025885630398988724], [1437042573.496029, 2455712, 0.025858554989099503], [1437043053.686022, 2465844, 0.0257696695625782], [1437043533.731929, 2475974, 0.02574242651462555], [1437044013.636245, 2486095, 0.025741754099726677], [1437044493.69923, 2496238, 0.02561314031481743], [1437044973.652155, 2506373, 0.02550213597714901], [1437045453.691467, 2516497, 0.025422468781471252], [1437045933.935804, 2526637, 0.025300107896327972], [1437046413.635583, 2536770, 0.02533198893070221], [1437046893.626337, 2546896, 0.025261884555220604], [1437047373.67437, 2557029, 0.025176096707582474], [1437047853.652939, 2567169, 0.025054505094885826], [1437048333.778436, 2577306, 0.024978378787636757], [1437048813.654248, 2587433, 0.024952610954642296], [1437049293.610609, 2597552, 0.02484666183590889], [1437049773.646573, 2607690, 0.024764036759734154], [1437050253.667925, 2617808, 0.024689028039574623], [1437050733.735291, 2627933, 0.024599267169833183], [1437051213.620222, 2638053, 0.024585112929344177], [1437051693.601978, 2648171, 0.024474989622831345], [1437052173.634985, 2658299, 0.024343013763427734], [1437052653.687176, 2668425, 0.024294432252645493], [1437053133.762819, 2678556, 0.024164099246263504], [1437053613.643698, 2688671, 0.024035055190324783], [1437054093.673047, 2698804, 0.024000361561775208], [1437054573.667371, 2708956, 0.023914529010653496], [1437055053.650441, 2719087, 0.023955287411808968], [1437055533.778469, 2729219, 0.023859601467847824], [1437056013.694082, 2739343, 0.023759596049785614], [1437056493.674871, 2749458, 0.02367720566689968], [1437056973.700234, 2759575, 0.023645451292395592], [1437057453.666129, 2769697, 0.023565715178847313], [1437057933.848506, 2779821, 0.023514313623309135], [1437058413.643799, 2789941, 0.023489659652113914], [1437058893.715386, 2800076, 0.023429812863469124], [1437059373.62596, 
2810207, 0.023344023153185844], [1437059853.650848, 2820334, 0.023226741701364517], [1437060333.792248, 2830465, 0.023134270682930946], [1437060813.682955, 2840600, 0.02305578999221325], [1437061293.681795, 2850745, 0.02298513427376747], [1437061773.691182, 2860880, 0.022913720458745956], [1437062253.662987, 2871013, 0.022864067927002907], [1437062733.760419, 2881153, 0.02278953418135643], [1437063213.651969, 2891278, 0.02276339940726757], [1437063693.723523, 2901406, 0.022675812244415283], [1437064173.68663, 2911533, 0.022622767835855484], [1437064653.547643, 2921667, 0.02255198359489441], [1437065133.62645, 2931813, 0.022431762889027596], [1437065613.566569, 2941947, 0.022368362173438072], [1437066093.537804, 2952102, 0.022323831915855408], [1437066573.529332, 2962243, 0.02226843684911728], [1437067053.520098, 2972400, 0.022210361436009407], [1437067533.605733, 2982561, 0.022118505090475082], [1437068013.535467, 2992698, 0.022013112902641296], [1437068493.559976, 3002839, 0.02197197824716568], [1437068973.558743, 3012983, 0.02191166952252388], [1437069453.562661, 3023116, 0.021851476281881332], [1437069933.627071, 3033256, 0.021762533113360405], [1437070413.574131, 3043386, 0.021733969449996948], [1437070893.658803, 3053528, 0.021669406443834305], [1437071373.638711, 3063659, 0.02159426547586918], [1437071853.621384, 3073794, 0.02153114229440689], [1437072333.665269, 3083926, 0.021499117836356163], [1437072813.584388, 3094040, 0.021457014605402946], [1437073293.569178, 3104172, 0.021365314722061157]]
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d2.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d2.json
new file mode 100644
index 0000000000..fb5a18d53a
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d2.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 0.01034154836088419], [1436926413.945391, 1476, 0.03646053001284599], [1436926893.945037, 6006, 0.031110260635614395], [1436927373.995472, 13786, 0.024214591830968857], [1436927853.989794, 23650, 0.01820789836347103], [1436928334.132361, 33755, 0.01442798599600792], [1436928813.973288, 43941, 0.012150184251368046], [1436929293.975949, 54146, 0.011141776107251644], [1436929773.992781, 64316, 0.010859030298888683], [1436930253.997415, 74465, 0.011160558089613914], [1436930734.203004, 84611, 0.011997541412711143], [1436931214.03644, 94700, 0.01278648804873228], [1436931694.094564, 104766, 0.014073861762881279], [1436932174.114955, 114817, 0.01523376815021038], [1436932654.161382, 124880, 0.016527879983186722], [1436933133.960214, 134977, 0.01782997138798237], [1436933614.044337, 145062, 0.019055265933275223], [1436934094.166206, 155169, 0.02028629370033741], [1436934574.106036, 165284, 0.02116803079843521], [1436935054.150647, 175402, 0.022192901000380516], [1436935533.819562, 185538, 0.022869590669870377], [1436936013.710422, 195712, 0.023398980498313904], [1436936493.609025, 205906, 0.02443159930408001], [1436936973.683892, 216099, 0.025154944509267807], [1436937454.138383, 226331, 0.025802481919527054], [1436937933.838475, 236532, 0.027000702917575836], [1436938413.89688, 246724, 0.02752412110567093], [1436938894.018652, 256925, 0.0278119258582592], [1436939373.69067, 267137, 0.027698883786797523], [1436939853.673692, 277369, 0.028744956478476524], [1436940333.651346, 287620, 0.029281964525580406], [1436940813.599579, 297848, 0.03002205118536949], [1436941293.596313, 308088, 0.030467400327324867], [1436941773.659172, 318362, 0.03132195770740509], [1436942253.648479, 328621, 0.031431782990694046], [1436942733.752284, 338892, 0.03147844970226288], [1436943213.621881, 349144, 0.032013144344091415], [1436943693.698743, 359399, 0.03241390734910965], [1436944173.578463, 369649, 0.03261363133788109], [1436944653.692217, 379912, 0.033306822180747986], [1436945133.677298, 390180, 0.03390969708561897], [1436945613.572411, 400445, 0.03396527096629143], [1436946093.56123, 410703, 0.03388286381959915], [1436946573.542364, 420958, 0.03399669751524925], [1436947053.616578, 431216, 0.03394070267677307], [1436947533.636973, 441483, 0.03419327735900879], [1436948013.541574, 451751, 0.0342416949570179], [1436948493.560223, 462015, 0.034808479249477386], [1436948973.512541, 472260, 0.03552314639091492], [1436949453.550055, 482483, 0.036012329161167145], [1436949933.828011, 492731, 0.035826291888952255], [1436950413.603177, 502957, 0.03600003197789192], [1436950893.563009, 513185, 0.03563224524259567], [1436951373.620887, 523410, 0.03584449738264084], [1436951853.61941, 533618, 0.03587675839662552], [1436952333.694447, 543828, 0.036698292940855026], [1436952813.621004, 554042, 0.03698749095201492], [1436953293.588156, 564251, 0.03712376952171326], [1436953773.599734, 574464, 0.03729996830224991], [1436954253.621309, 584672, 0.03730553761124611], [1436954733.738119, 594882, 0.037479378283023834], [1436955213.56617, 605091, 0.03754287213087082], [1436955693.585366, 615296, 0.0377657376229763], [1436956173.626395, 625501, 0.038117796182632446], [1436956653.601937, 635705, 0.03822959586977959], [1436957133.665878, 645915, 0.03776161000132561], [1436957613.584762, 656116, 0.03816362842917442], [1436958093.549783, 666331, 0.03853853791952133], [1436958573.646778, 676543, 0.03826189786195755], [1436959053.585655, 686750, 0.0381099209189415], [1436959533.679696, 696961, 0.03844142332673073], 
[1436960013.633292, 707173, 0.03868117928504944], [1436960493.578778, 717383, 0.0390009842813015], [1436960973.596715, 727598, 0.0383562371134758], [1436961453.625644, 737818, 0.0382055900990963], [1436961933.740339, 748040, 0.03806299716234207], [1436962413.573845, 758252, 0.03807120397686958], [1436962893.610678, 768470, 0.03795558586716652], [1436963373.642878, 778674, 0.038018494844436646], [1436963853.558388, 788877, 0.038447774946689606], [1436964333.658419, 799099, 0.03842216357588768], [1436964813.573319, 809289, 0.03840547427535057], [1436965293.542098, 819484, 0.038492728024721146], [1436965773.545453, 829687, 0.0387515053153038], [1436966253.586517, 839901, 0.03869732841849327], [1436966733.639348, 850120, 0.03907460719347], [1436967213.697288, 860330, 0.0395859070122242], [1436967693.617172, 870539, 0.039280518889427185], [1436968173.593885, 880748, 0.0392826572060585], [1436968653.560836, 890955, 0.03899630531668663], [1436969133.676337, 901164, 0.03888440132141113], [1436969613.506638, 911358, 0.038790252059698105], [1436970093.595964, 921560, 0.03851785138249397], [1436970573.541227, 931756, 0.03913348540663719], [1436971053.624316, 941945, 0.038978900760412216], [1436971533.655543, 952138, 0.03925086557865143], [1436972013.604738, 962349, 0.039124101400375366], [1436972493.613199, 972551, 0.0390220545232296], [1436972973.501155, 982746, 0.039025235921144485], [1436973453.64842, 992945, 0.03877083212137222], [1436973933.689516, 1003147, 0.03902769833803177], [1436974413.577769, 1013350, 0.038719139993190765], [1436974893.542281, 1023545, 0.03872331231832504], [1436975373.638453, 1033759, 0.03927341103553772], [1436975853.524388, 1043955, 0.03930830955505371], [1436976333.625792, 1054148, 0.039153918623924255], [1436976813.610661, 1064342, 0.03932590410113335], [1436977293.601581, 1074539, 0.03922765702009201], [1436977773.575627, 1084733, 0.039390794932842255], [1436978253.564972, 1094914, 0.03935663774609566], [1436978733.673144, 1105109, 0.03939087316393852], [1436979213.540585, 1115293, 0.039371199905872345], [1436979693.699591, 1125483, 0.03982992097735405], [1436980173.613012, 1135670, 0.03941287472844124], [1436980653.575769, 1145862, 0.03933672979474068], [1436981133.719264, 1156045, 0.03919614478945732], [1436981613.563551, 1166236, 0.03906407952308655], [1436982093.553233, 1176436, 0.038837045431137085], [1436982573.577846, 1186636, 0.039009105414152145], [1436983053.605749, 1196837, 0.039010051637887955], [1436983533.684994, 1207025, 0.03891472890973091], [1436984013.561492, 1217233, 0.038610219955444336], [1436984493.629873, 1227437, 0.03866511583328247], [1436984973.606714, 1237643, 0.03865685313940048], [1436985453.690084, 1247835, 0.038945719599723816], [1436985933.711388, 1257951, 0.03925580158829689], [1436986413.598807, 1268125, 0.039332933723926544], [1436986893.631797, 1278290, 0.03918297216296196], [1436987373.596962, 1288473, 0.03883613646030426], [1436987853.555549, 1298650, 0.038776978850364685], [1436988333.722032, 1308841, 0.03888171166181564], [1436988813.55697, 1319018, 0.038825325667858124], [1436989293.756905, 1329221, 0.03864298388361931], [1436989773.665141, 1339417, 0.03865634649991989], [1436990253.768302, 1349610, 0.03898858651518822], [1436990733.708919, 1359759, 0.03906260430812836], [1436991213.663033, 1369914, 0.03911694139242172], [1436991693.730925, 1380074, 0.03875250369310379], [1436992173.751791, 1390224, 0.03882621228694916], [1436992653.758682, 1400383, 0.03877855837345123], [1436993133.835604, 1410542, 0.03870398923754692], 
[1436993613.674655, 1420684, 0.03887751325964928], [1436994093.747454, 1430832, 0.03915301710367203], [1436994573.768973, 1440986, 0.03938450664281845], [1436995053.666661, 1451174, 0.03919720649719238], [1436995533.83439, 1461345, 0.038862887769937515], [1436996013.556996, 1471495, 0.03901274502277374], [1436996493.635477, 1481663, 0.0388539656996727], [1436996973.668684, 1491822, 0.038732752203941345], [1436997453.59326, 1501979, 0.03879735246300697], [1436997933.774019, 1512139, 0.038524042814970016], [1436998413.575162, 1522290, 0.03869651257991791], [1436998893.640468, 1532431, 0.0383637398481369], [1436999373.551661, 1542579, 0.038300249725580215], [1436999853.57906, 1552734, 0.03799160569906235], [1437000333.680409, 1562888, 0.03759683296084404], [1437000813.602383, 1573037, 0.037678662687540054], [1437001293.610337, 1583190, 0.037575822323560715], [1437001773.618199, 1593341, 0.0376887246966362], [1437002253.572966, 1603497, 0.037922415882349014], [1437002733.67994, 1613657, 0.03766244649887085], [1437003213.583266, 1623809, 0.03754705190658569], [1437003693.639943, 1633966, 0.03738937899470329], [1437004173.568287, 1644113, 0.037347543984651566], [1437004653.610772, 1654268, 0.037374842911958694], [1437005133.663045, 1664424, 0.037443988025188446], [1437005613.580984, 1674567, 0.037457264959812164], [1437006093.601019, 1684715, 0.037874478846788406], [1437006573.625314, 1694857, 0.037644676864147186], [1437007053.584514, 1704999, 0.03743988648056984], [1437007533.719303, 1715150, 0.03739031031727791], [1437008013.604962, 1725282, 0.037301771342754364], [1437008493.655091, 1735432, 0.03735104575753212], [1437008973.640165, 1745584, 0.037282250821590424], [1437009453.715067, 1755742, 0.03729768097400665], [1437009933.765712, 1765896, 0.03717759624123573], [1437010413.632128, 1776052, 0.03691410645842552], [1437010893.66766, 1786195, 0.036807890981435776], [1437011373.636164, 1796346, 0.036659423261880875], [1437011853.631224, 1806481, 0.03682238608598709], [1437012333.706205, 1816617, 0.036776404827833176], [1437012813.61987, 1826754, 0.036672260612249374], [1437013293.479904, 1836883, 0.03666841238737106], [1437013773.604574, 1847029, 0.036642514169216156], [1437014253.618884, 1857175, 0.03654393553733826], [1437014733.756419, 1867312, 0.03638240322470665], [1437015213.638607, 1877459, 0.03610989451408386], [1437015693.625763, 1887608, 0.036011870950460434], [1437016173.63194, 1897759, 0.03607400134205818], [1437016653.609074, 1907909, 0.03581620752811432], [1437017133.717601, 1918074, 0.035680998116731644], [1437017613.716011, 1928220, 0.03547567501664162], [1437018093.626005, 1938377, 0.035375215113162994], [1437018573.626522, 1948523, 0.03534447029232979], [1437019053.648174, 1958678, 0.03535373508930206], [1437019533.803011, 1968831, 0.03541970252990723], [1437020013.667751, 1978978, 0.03534942492842674], [1437020493.659028, 1989133, 0.035337116569280624], [1437020973.657346, 1999287, 0.03519223630428314], [1437021453.650634, 2009437, 0.0350094810128212], [1437021933.848661, 2019588, 0.03481736779212952], [1437022413.674963, 2029736, 0.03482922539114952], [1437022893.69086, 2039894, 0.03482965752482414], [1437023373.68883, 2050054, 0.034710027277469635], [1437023853.686116, 2060205, 0.03447446599602699], [1437024333.763876, 2070362, 0.034356746822595596], [1437024813.707845, 2080507, 0.03430519998073578], [1437025293.483294, 2090645, 0.03412580490112305], [1437025773.695712, 2100793, 0.03409077599644661], [1437026253.672994, 2110943, 0.0340830534696579], [1437026733.780775, 
2121094, 0.03400549292564392], [1437027213.617849, 2131235, 0.033846043050289154], [1437027693.694451, 2141382, 0.03379584103822708], [1437028173.68596, 2151537, 0.033618565648794174], [1437028653.584833, 2161685, 0.03352222591638565], [1437029133.792483, 2171839, 0.03338197246193886], [1437029613.661672, 2181977, 0.03323192894458771], [1437030093.641009, 2192118, 0.03313163295388222], [1437030573.656274, 2202268, 0.0331595316529274], [1437031053.643631, 2212416, 0.03310840204358101], [1437031533.777478, 2222583, 0.03298124670982361], [1437032013.704008, 2232736, 0.03288085386157036], [1437032493.638393, 2242882, 0.03281677886843681], [1437032973.684986, 2253041, 0.03261971473693848], [1437033453.699562, 2263183, 0.03251069411635399], [1437033933.918074, 2273320, 0.03243493288755417], [1437034413.596351, 2283443, 0.03251812607049942], [1437034893.640496, 2293579, 0.03244208171963692], [1437035373.637761, 2303701, 0.03246922418475151], [1437035853.669947, 2313823, 0.032652080059051514], [1437036333.78905, 2323961, 0.032621122896671295], [1437036813.699727, 2334089, 0.03248974680900574], [1437037293.662592, 2344235, 0.032404426485300064], [1437037773.66716, 2354364, 0.03240393102169037], [1437038253.603687, 2364507, 0.03238365799188614], [1437038733.78864, 2374644, 0.03244389593601227], [1437039213.641799, 2384782, 0.03239350765943527], [1437039693.687078, 2394923, 0.032426562160253525], [1437040173.635717, 2405058, 0.032403264194726944], [1437040653.673331, 2415194, 0.03231978043913841], [1437041133.764768, 2425322, 0.03223187103867531], [1437041613.629279, 2435449, 0.03213196247816086], [1437042093.703985, 2445575, 0.032153598964214325], [1437042573.496029, 2455712, 0.03199320286512375], [1437043053.686022, 2465844, 0.03188605234026909], [1437043533.731929, 2475974, 0.03178738057613373], [1437044013.636245, 2486095, 0.03171614184975624], [1437044493.69923, 2496238, 0.031645938754081726], [1437044973.652155, 2506373, 0.03155189007520676], [1437045453.691467, 2516497, 0.03144536912441254], [1437045933.935804, 2526637, 0.031432293355464935], [1437046413.635583, 2536770, 0.03129834309220314], [1437046893.626337, 2546896, 0.031195342540740967], [1437047373.67437, 2557029, 0.031033318489789963], [1437047853.652939, 2567169, 0.030938012525439262], [1437048333.778436, 2577306, 0.030827201902866364], [1437048813.654248, 2587433, 0.03068169392645359], [1437049293.610609, 2597552, 0.030520914122462273], [1437049773.646573, 2607690, 0.030437452718615532], [1437050253.667925, 2617808, 0.03041636385023594], [1437050733.735291, 2627933, 0.030291059985756874], [1437051213.620222, 2638053, 0.030283397063612938], [1437051693.601978, 2648171, 0.030193043872714043], [1437052173.634985, 2658299, 0.03004123829305172], [1437052653.687176, 2668425, 0.0299222432076931], [1437053133.762819, 2678556, 0.029762346297502518], [1437053613.643698, 2688671, 0.02970775216817856], [1437054093.673047, 2698804, 0.029604140669107437], [1437054573.667371, 2708956, 0.02949359640479088], [1437055053.650441, 2719087, 0.02943229116499424], [1437055533.778469, 2729219, 0.029304414987564087], [1437056013.694082, 2739343, 0.029147598892450333], [1437056493.674871, 2749458, 0.029033908620476723], [1437056973.700234, 2759575, 0.028886595740914345], [1437057453.666129, 2769697, 0.028734514489769936], [1437057933.848506, 2779821, 0.02874554693698883], [1437058413.643799, 2789941, 0.028716085478663445], [1437058893.715386, 2800076, 0.028669510036706924], [1437059373.62596, 2810207, 0.028530430048704147], [1437059853.650848, 2820334, 
0.02839958481490612], [1437060333.792248, 2830465, 0.028364405035972595], [1437060813.682955, 2840600, 0.0282796248793602], [1437061293.681795, 2850745, 0.02820495329797268], [1437061773.691182, 2860880, 0.028159918263554573], [1437062253.662987, 2871013, 0.028104742988944054], [1437062733.760419, 2881153, 0.028099438175559044], [1437063213.651969, 2891278, 0.02802356891334057], [1437063693.723523, 2901406, 0.027945902198553085], [1437064173.68663, 2911533, 0.027897505089640617], [1437064653.547643, 2921667, 0.027821676805615425], [1437065133.62645, 2931813, 0.02770490199327469], [1437065613.566569, 2941947, 0.02761264331638813], [1437066093.537804, 2952102, 0.027557073161005974], [1437066573.529332, 2962243, 0.027522796764969826], [1437067053.520098, 2972400, 0.027469975873827934], [1437067533.605733, 2982561, 0.027299631386995316], [1437068013.535467, 2992698, 0.027225365862250328], [1437068493.559976, 3002839, 0.027095869183540344], [1437068973.558743, 3012983, 0.027036350220441818], [1437069453.562661, 3023116, 0.02693818509578705], [1437069933.627071, 3033256, 0.02687198854982853], [1437070413.574131, 3043386, 0.02687297947704792], [1437070893.658803, 3053528, 0.026770537719130516], [1437071373.638711, 3063659, 0.026667704805731773], [1437071853.621384, 3073794, 0.026571234688162804], [1437072333.665269, 3083926, 0.026447603479027748], [1437072813.584388, 3094040, 0.026389220729470253], [1437073293.569178, 3104172, 0.026299258694052696]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d3.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d3.json
new file mode 100644
index 0000000000..e489130ea7
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d3.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 0.03425809368491173], [1436926413.945391, 1476, 0.032557398080825806], [1436926893.945037, 6006, 0.0277252234518528], [1436927373.995472, 13786, 0.021282576024532318], [1436927853.989794, 23650, 0.015578101389110088], [1436928334.132361, 33755, 0.011687012389302254], [1436928813.973288, 43941, 0.00918175745755434], [1436929293.975949, 54146, 0.00784988235682249], [1436929773.992781, 64316, 0.007188988849520683], [1436930253.997415, 74465, 0.0072308750823140144], [1436930734.203004, 84611, 0.007685060612857342], [1436931214.03644, 94700, 0.008267422206699848], [1436931694.094564, 104766, 0.008946981281042099], [1436932174.114955, 114817, 0.009664506651461124], [1436932654.161382, 124880, 0.010994983837008476], [1436933133.960214, 134977, 0.011961394920945168], [1436933614.044337, 145062, 0.012674711644649506], [1436934094.166206, 155169, 0.013640021905303001], [1436934574.106036, 165284, 0.014305224642157555], [1436935054.150647, 175402, 0.014946703799068928], [1436935533.819562, 185538, 0.015737954527139664], [1436936013.710422, 195712, 0.016330912709236145], [1436936493.609025, 205906, 0.016695979982614517], [1436936973.683892, 216099, 0.017458846792578697], [1436937454.138383, 226331, 0.018533164635300636], [1436937933.838475, 236532, 0.01949200965464115], [1436938413.89688, 246724, 0.019517479464411736], [1436938894.018652, 256925, 0.019764307886362076], [1436939373.69067, 267137, 0.02008572220802307], [1436939853.673692, 277369, 0.021091068163514137], [1436940333.651346, 287620, 0.02140945754945278], [1436940813.599579, 297848, 0.021988170221447945], [1436941293.596313, 308088, 0.0223606675863266], [1436941773.659172, 318362, 0.022547796368598938], [1436942253.648479, 328621, 0.023031413555145264], [1436942733.752284, 338892, 0.023786410689353943], [1436943213.621881, 349144, 0.024008480831980705], [1436943693.698743, 359399, 0.024148935452103615], [1436944173.578463, 369649, 0.02457556128501892], [1436944653.692217, 379912, 0.02469060942530632], [1436945133.677298, 390180, 0.024952523410320282], [1436945613.572411, 400445, 0.02554873190820217], [1436946093.56123, 410703, 0.025771528482437134], [1436946573.542364, 420958, 0.02602078951895237], [1436947053.616578, 431216, 0.02602880820631981], [1436947533.636973, 441483, 0.026351822540163994], [1436948013.541574, 451751, 0.0262188371270895], [1436948493.560223, 462015, 0.026711203157901764], [1436948973.512541, 472260, 0.027218565344810486], [1436949453.550055, 482483, 0.02734719216823578], [1436949933.828011, 492731, 0.027217986062169075], [1436950413.603177, 502957, 0.027318857610225677], [1436950893.563009, 513185, 0.027305351570248604], [1436951373.620887, 523410, 0.027760380879044533], [1436951853.61941, 533618, 0.0280567966401577], [1436952333.694447, 543828, 0.028621215373277664], [1436952813.621004, 554042, 0.028958816081285477], [1436953293.588156, 564251, 0.029186993837356567], [1436953773.599734, 574464, 0.028960207477211952], [1436954253.621309, 584672, 0.028913332149386406], [1436954733.738119, 594882, 0.02921229600906372], [1436955213.56617, 605091, 0.029444556683301926], [1436955693.585366, 615296, 0.029747728258371353], [1436956173.626395, 625501, 0.030260732397437096], [1436956653.601937, 635705, 0.030394721776247025], [1436957133.665878, 645915, 0.03041674755513668], [1436957613.584762, 656116, 0.03058660589158535], [1436958093.549783, 666331, 0.030284838750958443], [1436958573.646778, 676543, 0.030354052782058716], [1436959053.585655, 686750, 0.030551131814718246], [1436959533.679696, 696961, 
0.030686482787132263], [1436960013.633292, 707173, 0.030921922996640205], [1436960493.578778, 717383, 0.031079748645424843], [1436960973.596715, 727598, 0.03077232837677002], [1436961453.625644, 737818, 0.03084420971572399], [1436961933.740339, 748040, 0.03110562451183796], [1436962413.573845, 758252, 0.031141508370637894], [1436962893.610678, 768470, 0.031010067090392113], [1436963373.642878, 778674, 0.031100917607545853], [1436963853.558388, 788877, 0.03134296461939812], [1436964333.658419, 799099, 0.031301673501729965], [1436964813.573319, 809289, 0.031290579587221146], [1436965293.542098, 819484, 0.031435515731573105], [1436965773.545453, 829687, 0.031667787581682205], [1436966253.586517, 839901, 0.03169453889131546], [1436966733.639348, 850120, 0.03191617131233215], [1436967213.697288, 860330, 0.03205711767077446], [1436967693.617172, 870539, 0.03206227719783783], [1436968173.593885, 880748, 0.03195691108703613], [1436968653.560836, 890955, 0.03166574612259865], [1436969133.676337, 901164, 0.031929291784763336], [1436969613.506638, 911358, 0.031744007021188736], [1436970093.595964, 921560, 0.0315803587436676], [1436970573.541227, 931756, 0.031766779720783234], [1436971053.624316, 941945, 0.03181062266230583], [1436971533.655543, 952138, 0.0319465771317482], [1436972013.604738, 962349, 0.032054755836725235], [1436972493.613199, 972551, 0.03192495182156563], [1436972973.501155, 982746, 0.0319976881146431], [1436973453.64842, 992945, 0.03205036744475365], [1436973933.689516, 1003147, 0.032020118087530136], [1436974413.577769, 1013350, 0.03207429125905037], [1436974893.542281, 1023545, 0.032214779406785965], [1436975373.638453, 1033759, 0.03219134360551834], [1436975853.524388, 1043955, 0.0324082113802433], [1436976333.625792, 1054148, 0.03221917897462845], [1436976813.610661, 1064342, 0.03200480341911316], [1436977293.601581, 1074539, 0.03198748826980591], [1436977773.575627, 1084733, 0.032064300030469894], [1436978253.564972, 1094914, 0.032298240810632706], [1436978733.673144, 1105109, 0.03248215466737747], [1436979213.540585, 1115293, 0.03262820467352867], [1436979693.699591, 1125483, 0.032745134085416794], [1436980173.613012, 1135670, 0.032681502401828766], [1436980653.575769, 1145862, 0.03240214288234711], [1436981133.719264, 1156045, 0.03237201273441315], [1436981613.563551, 1166236, 0.03202598914504051], [1436982093.553233, 1176436, 0.032163310796022415], [1436982573.577846, 1186636, 0.03232435882091522], [1436983053.605749, 1196837, 0.032410554587841034], [1436983533.684994, 1207025, 0.03245232254266739], [1436984013.561492, 1217233, 0.03224659338593483], [1436984493.629873, 1227437, 0.03204221650958061], [1436984973.606714, 1237643, 0.03186390548944473], [1436985453.690084, 1247835, 0.031911786645650864], [1436985933.711388, 1257951, 0.032286882400512695], [1436986413.598807, 1268125, 0.032266560941934586], [1436986893.631797, 1278290, 0.03252791985869408], [1436987373.596962, 1288473, 0.03241678699851036], [1436987853.555549, 1298650, 0.03210347890853882], [1436988333.722032, 1308841, 0.031904902309179306], [1436988813.55697, 1319018, 0.03179018944501877], [1436989293.756905, 1329221, 0.0316874124109745], [1436989773.665141, 1339417, 0.03160090371966362], [1436990253.768302, 1349610, 0.03161816671490669], [1436990733.708919, 1359759, 0.0317724235355854], [1436991213.663033, 1369914, 0.03175821527838707], [1436991693.730925, 1380074, 0.031629402190446854], [1436992173.751791, 1390224, 0.031547073274850845], [1436992653.758682, 1400383, 0.031528495252132416], [1436993133.835604, 
1410542, 0.03169562667608261], [1436993613.674655, 1420684, 0.031826674938201904], [1436994093.747454, 1430832, 0.03185039013624191], [1436994573.768973, 1440986, 0.03199826925992966], [1436995053.666661, 1451174, 0.03156091645359993], [1436995533.83439, 1461345, 0.031506411731243134], [1436996013.556996, 1471495, 0.031495608389377594], [1436996493.635477, 1481663, 0.03134337440133095], [1436996973.668684, 1491822, 0.031145554035902023], [1436997453.59326, 1501979, 0.031068041920661926], [1436997933.774019, 1512139, 0.031432390213012695], [1436998413.575162, 1522290, 0.03142932057380676], [1436998893.640468, 1532431, 0.03132513165473938], [1436999373.551661, 1542579, 0.03125539794564247], [1436999853.57906, 1552734, 0.0309873279184103], [1437000333.680409, 1562888, 0.03088490664958954], [1437000813.602383, 1573037, 0.0308260228484869], [1437001293.610337, 1583190, 0.030793415382504463], [1437001773.618199, 1593341, 0.03087344579398632], [1437002253.572966, 1603497, 0.0308389812707901], [1437002733.67994, 1613657, 0.03070608340203762], [1437003213.583266, 1623809, 0.0307186096906662], [1437003693.639943, 1633966, 0.03048117645084858], [1437004173.568287, 1644113, 0.03030446544289589], [1437004653.610772, 1654268, 0.030324051156640053], [1437005133.663045, 1664424, 0.03043009154498577], [1437005613.580984, 1674567, 0.030361991375684738], [1437006093.601019, 1684715, 0.030566193163394928], [1437006573.625314, 1694857, 0.030430208891630173], [1437007053.584514, 1704999, 0.030224468559026718], [1437007533.719303, 1715150, 0.030241932719945908], [1437008013.604962, 1725282, 0.030097855255007744], [1437008493.655091, 1735432, 0.030217904597520828], [1437008973.640165, 1745584, 0.030181601643562317], [1437009453.715067, 1755742, 0.030172593891620636], [1437009933.765712, 1765896, 0.030141659080982208], [1437010413.632128, 1776052, 0.030052196234464645], [1437010893.66766, 1786195, 0.03007938154041767], [1437011373.636164, 1796346, 0.02996920794248581], [1437011853.631224, 1806481, 0.029995175078511238], [1437012333.706205, 1816617, 0.03010040894150734], [1437012813.61987, 1826754, 0.030088385567069054], [1437013293.479904, 1836883, 0.029996229335665703], [1437013773.604574, 1847029, 0.029950618743896484], [1437014253.618884, 1857175, 0.029801754280924797], [1437014733.756419, 1867312, 0.029606210067868233], [1437015213.638607, 1877459, 0.029520301148295403], [1437015693.625763, 1887608, 0.02937021106481552], [1437016173.63194, 1897759, 0.02928493171930313], [1437016653.609074, 1907909, 0.029194936156272888], [1437017133.717601, 1918074, 0.029153617098927498], [1437017613.716011, 1928220, 0.029063349589705467], [1437018093.626005, 1938377, 0.02899051643908024], [1437018573.626522, 1948523, 0.02908063493669033], [1437019053.648174, 1958678, 0.029026903212070465], [1437019533.803011, 1968831, 0.029071694239974022], [1437020013.667751, 1978978, 0.029110101982951164], [1437020493.659028, 1989133, 0.02908976934850216], [1437020973.657346, 1999287, 0.028982611373066902], [1437021453.650634, 2009437, 0.028793690726161003], [1437021933.848661, 2019588, 0.02868787571787834], [1437022413.674963, 2029736, 0.028585631400346756], [1437022893.69086, 2039894, 0.02848806604743004], [1437023373.68883, 2050054, 0.028294002637267113], [1437023853.686116, 2060205, 0.02822807803750038], [1437024333.763876, 2070362, 0.02809525839984417], [1437024813.707845, 2080507, 0.027941878885030746], [1437025293.483294, 2090645, 0.02787884697318077], [1437025773.695712, 2100793, 0.027862509712576866], [1437026253.672994, 2110943, 
0.027835993096232414], [1437026733.780775, 2121094, 0.027756690979003906], [1437027213.617849, 2131235, 0.027644263580441475], [1437027693.694451, 2141382, 0.02752007730305195], [1437028173.68596, 2151537, 0.027432529255747795], [1437028653.584833, 2161685, 0.027434471994638443], [1437029133.792483, 2171839, 0.027317894622683525], [1437029613.661672, 2181977, 0.027138294652104378], [1437030093.641009, 2192118, 0.027088705450296402], [1437030573.656274, 2202268, 0.027131302282214165], [1437031053.643631, 2212416, 0.02715957537293434], [1437031533.777478, 2222583, 0.027089620009064674], [1437032013.704008, 2232736, 0.026989320293068886], [1437032493.638393, 2242882, 0.026922713965177536], [1437032973.684986, 2253041, 0.02678370475769043], [1437033453.699562, 2263183, 0.0267350971698761], [1437033933.918074, 2273320, 0.026652036234736443], [1437034413.596351, 2283443, 0.0265977680683136], [1437034893.640496, 2293579, 0.02654072269797325], [1437035373.637761, 2303701, 0.026471523568034172], [1437035853.669947, 2313823, 0.026451298967003822], [1437036333.78905, 2323961, 0.026429779827594757], [1437036813.699727, 2334089, 0.026324886828660965], [1437037293.662592, 2344235, 0.026287589222192764], [1437037773.66716, 2354364, 0.026264755055308342], [1437038253.603687, 2364507, 0.026225194334983826], [1437038733.78864, 2374644, 0.02624845691025257], [1437039213.641799, 2384782, 0.02624380588531494], [1437039693.687078, 2394923, 0.026255516335368156], [1437040173.635717, 2405058, 0.026186630129814148], [1437040653.673331, 2415194, 0.026059549301862717], [1437041133.764768, 2425322, 0.02603207901120186], [1437041613.629279, 2435449, 0.025951188057661057], [1437042093.703985, 2445575, 0.025885486975312233], [1437042573.496029, 2455712, 0.0258584376424551], [1437043053.686022, 2465844, 0.02576967515051365], [1437043533.731929, 2475974, 0.02574247308075428], [1437044013.636245, 2486095, 0.025741368532180786], [1437044493.69923, 2496238, 0.025613142177462578], [1437044973.652155, 2506373, 0.025502001866698265], [1437045453.691467, 2516497, 0.025422129780054092], [1437045933.935804, 2526637, 0.02530006691813469], [1437046413.635583, 2536770, 0.02533203549683094], [1437046893.626337, 2546896, 0.025261884555220604], [1437047373.67437, 2557029, 0.02517615258693695], [1437047853.652939, 2567169, 0.025054262951016426], [1437048333.778436, 2577306, 0.024978358298540115], [1437048813.654248, 2587433, 0.024952327832579613], [1437049293.610609, 2597552, 0.024846646934747696], [1437049773.646573, 2607690, 0.024763893336057663], [1437050253.667925, 2617808, 0.024688972160220146], [1437050733.735291, 2627933, 0.024599123746156693], [1437051213.620222, 2638053, 0.024585271254181862], [1437051693.601978, 2648171, 0.024474715813994408], [1437052173.634985, 2658299, 0.0243435837328434], [1437052653.687176, 2668425, 0.024294523522257805], [1437053133.762819, 2678556, 0.024163981899619102], [1437053613.643698, 2688671, 0.024034887552261353], [1437054093.673047, 2698804, 0.024000374600291252], [1437054573.667371, 2708956, 0.023914175108075142], [1437055053.650441, 2719087, 0.02395522966980934], [1437055533.778469, 2729219, 0.023859599605202675], [1437056013.694082, 2739343, 0.02375946193933487], [1437056493.674871, 2749458, 0.023677179589867592], [1437056973.700234, 2759575, 0.023645443841814995], [1437057453.666129, 2769697, 0.02356558106839657], [1437057933.848506, 2779821, 0.023514214903116226], [1437058413.643799, 2789941, 0.023489613085985184], [1437058893.715386, 2800076, 0.023429814726114273], [1437059373.62596, 
2810207, 0.023343827575445175], [1437059853.650848, 2820334, 0.02322673238813877], [1437060333.792248, 2830465, 0.023134106770157814], [1437060813.682955, 2840600, 0.023055672645568848], [1437061293.681795, 2850745, 0.022985080257058144], [1437061773.691182, 2860880, 0.02291373908519745], [1437062253.662987, 2871013, 0.022864071652293205], [1437062733.760419, 2881153, 0.0227896086871624], [1437063213.651969, 2891278, 0.02276325598359108], [1437063693.723523, 2901406, 0.022676151245832443], [1437064173.68663, 2911533, 0.022622840479016304], [1437064653.547643, 2921667, 0.022551873698830605], [1437065133.62645, 2931813, 0.022431621327996254], [1437065613.566569, 2941947, 0.022368427366018295], [1437066093.537804, 2952102, 0.022323856130242348], [1437066573.529332, 2962243, 0.022268367931246758], [1437067053.520098, 2972400, 0.022210223600268364], [1437067533.605733, 2982561, 0.022118542343378067], [1437068013.535467, 2992698, 0.022013003006577492], [1437068493.559976, 3002839, 0.021971898153424263], [1437068973.558743, 3012983, 0.021911533549427986], [1437069453.562661, 3023116, 0.021851375699043274], [1437069933.627071, 3033256, 0.021762363612651825], [1437070413.574131, 3043386, 0.021733952686190605], [1437070893.658803, 3053528, 0.021669508889317513], [1437071373.638711, 3063659, 0.021594204008579254], [1437071853.621384, 3073794, 0.021531015634536743], [1437072333.665269, 3083926, 0.021499203518033028], [1437072813.584388, 3094040, 0.021456807851791382], [1437073293.569178, 3104172, 0.02136526256799698]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d4.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d4.json
new file mode 100644
index 0000000000..434b78cd0f
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/beta/d4.json
@@ -0,0 +1 @@
+[[1436925978.257845, 7, 0.5028539896011353], [1436926413.945391, 1476, 0.4976981580257416], [1436926893.945037, 6006, 0.5092837810516357], [1436927373.995472, 13786, 0.5118998885154724], [1436927853.989794, 23650, 0.5314905643463135], [1436928334.132361, 33755, 0.550969123840332], [1436928813.973288, 43941, 0.5487659573554993], [1436929293.975949, 54146, 0.5263530015945435], [1436929773.992781, 64316, 0.5077286958694458], [1436930253.997415, 74465, 0.5120566487312317], [1436930734.203004, 84611, 0.5140185952186584], [1436931214.03644, 94700, 0.5133042335510254], [1436931694.094564, 104766, 0.5233010053634644], [1436932174.114955, 114817, 0.5230671763420105], [1436932654.161382, 124880, 0.5250263810157776], [1436933133.960214, 134977, 0.5088120698928833], [1436933614.044337, 145062, 0.5097426176071167], [1436934094.166206, 155169, 0.5103482007980347], [1436934574.106036, 165284, 0.5021579265594482], [1436935054.150647, 175402, 0.49785494804382324], [1436935533.819562, 185538, 0.4970649182796478], [1436936013.710422, 195712, 0.5023221373558044], [1436936493.609025, 205906, 0.5063169002532959], [1436936973.683892, 216099, 0.50455641746521], [1436937454.138383, 226331, 0.5104150772094727], [1436937933.838475, 236532, 0.5066487193107605], [1436938413.89688, 246724, 0.5183079838752747], [1436938894.018652, 256925, 0.5163102746009827], [1436939373.69067, 267137, 0.5216323733329773], [1436939853.673692, 277369, 0.5153006315231323], [1436940333.651346, 287620, 0.5240126252174377], [1436940813.599579, 297848, 0.5263218879699707], [1436941293.596313, 308088, 0.5236956477165222], [1436941773.659172, 318362, 0.534295916557312], [1436942253.648479, 328621, 0.540306031703949], [1436942733.752284, 338892, 0.5359382033348083], [1436943213.621881, 349144, 0.540198564529419], [1436943693.698743, 359399, 0.5404431819915771], [1436944173.578463, 369649, 0.5429667234420776], [1436944653.692217, 379912, 0.5415231585502625], [1436945133.677298, 390180, 0.54068922996521], [1436945613.572411, 400445, 0.5396349430084229], [1436946093.56123, 410703, 0.5486253499984741], [1436946573.542364, 420958, 0.5451043248176575], [1436947053.616578, 431216, 0.5478819608688354], [1436947533.636973, 441483, 0.5503379106521606], [1436948013.541574, 451751, 0.5534676313400269], [1436948493.560223, 462015, 0.5574610829353333], [1436948973.512541, 472260, 0.5558810234069824], [1436949453.550055, 482483, 0.5529404878616333], [1436949933.828011, 492731, 0.5618430972099304], [1436950413.603177, 502957, 0.5641138553619385], [1436950893.563009, 513185, 0.5707159638404846], [1436951373.620887, 523410, 0.5676558613777161], [1436951853.61941, 533618, 0.5637813806533813], [1436952333.694447, 543828, 0.5682924389839172], [1436952813.621004, 554042, 0.5690237283706665], [1436953293.588156, 564251, 0.5655006766319275], [1436953773.599734, 574464, 0.553955614566803], [1436954253.621309, 584672, 0.5558924674987793], [1436954733.738119, 594882, 0.5603042840957642], [1436955213.56617, 605091, 0.5625290870666504], [1436955693.585366, 615296, 0.5668522715568542], [1436956173.626395, 625501, 0.5736584663391113], [1436956653.601937, 635705, 0.5693879723548889], [1436957133.665878, 645915, 0.576599657535553], [1436957613.584762, 656116, 0.5648065805435181], [1436958093.549783, 666331, 0.5632508397102356], [1436958573.646778, 676543, 0.5660487413406372], [1436959053.585655, 686750, 0.568809449672699], [1436959533.679696, 696961, 0.5667826533317566], [1436960013.633292, 707173, 0.5637232661247253], [1436960493.578778, 717383, 0.5675314664840698], 
[1436960973.596715, 727598, 0.5714674592018127], [1436961453.625644, 737818, 0.564845085144043], [1436961933.740339, 748040, 0.5700833797454834], [1436962413.573845, 758252, 0.5702976584434509], [1436962893.610678, 768470, 0.5745863914489746], [1436963373.642878, 778674, 0.5763651728630066], [1436963853.558388, 788877, 0.5721960067749023], [1436964333.658419, 799099, 0.5714120864868164], [1436964813.573319, 809289, 0.5687000155448914], [1436965293.542098, 819484, 0.5728974938392639], [1436965773.545453, 829687, 0.5738612413406372], [1436966253.586517, 839901, 0.5702064037322998], [1436966733.639348, 850120, 0.5715107321739197], [1436967213.697288, 860330, 0.5695001482963562], [1436967693.617172, 870539, 0.5783872008323669], [1436968173.593885, 880748, 0.5758792161941528], [1436968653.560836, 890955, 0.572809636592865], [1436969133.676337, 901164, 0.5752230286598206], [1436969613.506638, 911358, 0.5861247181892395], [1436970093.595964, 921560, 0.5834078788757324], [1436970573.541227, 931756, 0.5814791321754456], [1436971053.624316, 941945, 0.5803619623184204], [1436971533.655543, 952138, 0.5765199065208435], [1436972013.604738, 962349, 0.5693190693855286], [1436972493.613199, 972551, 0.5720453262329102], [1436972973.501155, 982746, 0.5741620063781738], [1436973453.64842, 992945, 0.5705713629722595], [1436973933.689516, 1003147, 0.5657351613044739], [1436974413.577769, 1013350, 0.5685256123542786], [1436974893.542281, 1023545, 0.5698860287666321], [1436975373.638453, 1033759, 0.5801734328269958], [1436975853.524388, 1043955, 0.577880322933197], [1436976333.625792, 1054148, 0.5780594348907471], [1436976813.610661, 1064342, 0.5804633498191833], [1436977293.601581, 1074539, 0.5842364430427551], [1436977773.575627, 1084733, 0.5745837092399597], [1436978253.564972, 1094914, 0.5848771333694458], [1436978733.673144, 1105109, 0.5795935392379761], [1436979213.540585, 1115293, 0.583346426486969], [1436979693.699591, 1125483, 0.5840965509414673], [1436980173.613012, 1135670, 0.5807850360870361], [1436980653.575769, 1145862, 0.5843925476074219], [1436981133.719264, 1156045, 0.5828814506530762], [1436981613.563551, 1166236, 0.5873864889144897], [1436982093.553233, 1176436, 0.5896572470664978], [1436982573.577846, 1186636, 0.5887367725372314], [1436983053.605749, 1196837, 0.5841871500015259], [1436983533.684994, 1207025, 0.5867579579353333], [1436984013.561492, 1217233, 0.5940297842025757], [1436984493.629873, 1227437, 0.5925037860870361], [1436984973.606714, 1237643, 0.5981529951095581], [1436985453.690084, 1247835, 0.5954598188400269], [1436985933.711388, 1257951, 0.5903756022453308], [1436986413.598807, 1268125, 0.5837404131889343], [1436986893.631797, 1278290, 0.583182156085968], [1436987373.596962, 1288473, 0.5860618352890015], [1436987853.555549, 1298650, 0.5829544067382812], [1436988333.722032, 1308841, 0.5798720121383667], [1436988813.55697, 1319018, 0.589148998260498], [1436989293.756905, 1329221, 0.5905702710151672], [1436989773.665141, 1339417, 0.5900465250015259], [1436990253.768302, 1349610, 0.5893078446388245], [1436990733.708919, 1359759, 0.589722752571106], [1436991213.663033, 1369914, 0.5907371640205383], [1436991693.730925, 1380074, 0.5939858555793762], [1436992173.751791, 1390224, 0.5906378626823425], [1436992653.758682, 1400383, 0.5876493453979492], [1436993133.835604, 1410542, 0.5912420153617859], [1436993613.674655, 1420684, 0.5887293219566345], [1436994093.747454, 1430832, 0.589107096195221], [1436994573.768973, 1440986, 0.5928497910499573], [1436995053.666661, 1451174, 
0.5916265845298767], [1436995533.83439, 1461345, 0.5911784768104553], [1436996013.556996, 1471495, 0.5890726447105408], [1436996493.635477, 1481663, 0.5914839506149292], [1436996973.668684, 1491822, 0.5915400385856628], [1436997453.59326, 1501979, 0.591564416885376], [1436997933.774019, 1512139, 0.5926578640937805], [1436998413.575162, 1522290, 0.5942149758338928], [1436998893.640468, 1532431, 0.5931802988052368], [1436999373.551661, 1542579, 0.587592601776123], [1436999853.57906, 1552734, 0.5877953171730042], [1437000333.680409, 1562888, 0.590681791305542], [1437000813.602383, 1573037, 0.5924896001815796], [1437001293.610337, 1583190, 0.5913501381874084], [1437001773.618199, 1593341, 0.5952408909797668], [1437002253.572966, 1603497, 0.5953922271728516], [1437002733.67994, 1613657, 0.6002237200737], [1437003213.583266, 1623809, 0.6042569875717163], [1437003693.639943, 1633966, 0.6017740368843079], [1437004173.568287, 1644113, 0.6037994623184204], [1437004653.610772, 1654268, 0.6037947535514832], [1437005133.663045, 1664424, 0.6028310060501099], [1437005613.580984, 1674567, 0.603211522102356], [1437006093.601019, 1684715, 0.6052727699279785], [1437006573.625314, 1694857, 0.6032628417015076], [1437007053.584514, 1704999, 0.5978461503982544], [1437007533.719303, 1715150, 0.602828323841095], [1437008013.604962, 1725282, 0.6063790917396545], [1437008493.655091, 1735432, 0.6047347784042358], [1437008973.640165, 1745584, 0.6031648516654968], [1437009453.715067, 1755742, 0.6067507863044739], [1437009933.765712, 1765896, 0.6062817573547363], [1437010413.632128, 1776052, 0.609245240688324], [1437010893.66766, 1786195, 0.6066284775733948], [1437011373.636164, 1796346, 0.6102170944213867], [1437011853.631224, 1806481, 0.609173595905304], [1437012333.706205, 1816617, 0.6035751104354858], [1437012813.61987, 1826754, 0.604059636592865], [1437013293.479904, 1836883, 0.6039224863052368], [1437013773.604574, 1847029, 0.5974730849266052], [1437014253.618884, 1857175, 0.6040806174278259], [1437014733.756419, 1867312, 0.6017186045646667], [1437015213.638607, 1877459, 0.5987159609794617], [1437015693.625763, 1887608, 0.6047909259796143], [1437016173.63194, 1897759, 0.6033824682235718], [1437016653.609074, 1907909, 0.6038352847099304], [1437017133.717601, 1918074, 0.6083348989486694], [1437017613.716011, 1928220, 0.6044996380805969], [1437018093.626005, 1938377, 0.6009799242019653], [1437018573.626522, 1948523, 0.60047847032547], [1437019053.648174, 1958678, 0.6019382476806641], [1437019533.803011, 1968831, 0.6007305383682251], [1437020013.667751, 1978978, 0.6025127172470093], [1437020493.659028, 1989133, 0.6051828861236572], [1437020973.657346, 1999287, 0.6085876822471619], [1437021453.650634, 2009437, 0.6065122485160828], [1437021933.848661, 2019588, 0.6084572076797485], [1437022413.674963, 2029736, 0.6065473556518555], [1437022893.69086, 2039894, 0.6075063347816467], [1437023373.68883, 2050054, 0.6095973253250122], [1437023853.686116, 2060205, 0.6047213077545166], [1437024333.763876, 2070362, 0.6034210324287415], [1437024813.707845, 2080507, 0.6008927822113037], [1437025293.483294, 2090645, 0.604469895362854], [1437025773.695712, 2100793, 0.6068717837333679], [1437026253.672994, 2110943, 0.6099737882614136], [1437026733.780775, 2121094, 0.6105009317398071], [1437027213.617849, 2131235, 0.611957311630249], [1437027693.694451, 2141382, 0.6141949892044067], [1437028173.68596, 2151537, 0.6135279536247253], [1437028653.584833, 2161685, 0.6111017465591431], [1437029133.792483, 2171839, 0.6135671138763428], 
[1437029613.661672, 2181977, 0.6112024188041687], [1437030093.641009, 2192118, 0.6097264289855957], [1437030573.656274, 2202268, 0.6097284555435181], [1437031053.643631, 2212416, 0.6121350526809692], [1437031533.777478, 2222583, 0.6147991418838501], [1437032013.704008, 2232736, 0.6118316054344177], [1437032493.638393, 2242882, 0.6191433072090149], [1437032973.684986, 2253041, 0.6188027262687683], [1437033453.699562, 2263183, 0.6163974404335022], [1437033933.918074, 2273320, 0.6144159436225891], [1437034413.596351, 2283443, 0.6123769879341125], [1437034893.640496, 2293579, 0.6139131188392639], [1437035373.637761, 2303701, 0.6150627136230469], [1437035853.669947, 2313823, 0.6149951219558716], [1437036333.78905, 2323961, 0.6155945658683777], [1437036813.699727, 2334089, 0.613308310508728], [1437037293.662592, 2344235, 0.6153736114501953], [1437037773.66716, 2354364, 0.6160987615585327], [1437038253.603687, 2364507, 0.611574113368988], [1437038733.78864, 2374644, 0.6145234107971191], [1437039213.641799, 2384782, 0.6117951273918152], [1437039693.687078, 2394923, 0.6129845380783081], [1437040173.635717, 2405058, 0.6095831394195557], [1437040653.673331, 2415194, 0.6110679507255554], [1437041133.764768, 2425322, 0.6099690198898315], [1437041613.629279, 2435449, 0.6105908155441284], [1437042093.703985, 2445575, 0.6124749779701233], [1437042573.496029, 2455712, 0.6118302345275879], [1437043053.686022, 2465844, 0.6094756722450256], [1437043533.731929, 2475974, 0.6094986796379089], [1437044013.636245, 2486095, 0.6114639639854431], [1437044493.69923, 2496238, 0.6101082563400269], [1437044973.652155, 2506373, 0.6105718612670898], [1437045453.691467, 2516497, 0.6115666627883911], [1437045933.935804, 2526637, 0.6128115653991699], [1437046413.635583, 2536770, 0.6122986078262329], [1437046893.626337, 2546896, 0.6142017245292664], [1437047373.67437, 2557029, 0.6111341714859009], [1437047853.652939, 2567169, 0.611350417137146], [1437048333.778436, 2577306, 0.6126709580421448], [1437048813.654248, 2587433, 0.6111524105072021], [1437049293.610609, 2597552, 0.6135894060134888], [1437049773.646573, 2607690, 0.6136029362678528], [1437050253.667925, 2617808, 0.6141685843467712], [1437050733.735291, 2627933, 0.6170881390571594], [1437051213.620222, 2638053, 0.6189730167388916], [1437051693.601978, 2648171, 0.6157540678977966], [1437052173.634985, 2658299, 0.6178646683692932], [1437052653.687176, 2668425, 0.6164441108703613], [1437053133.762819, 2678556, 0.6175132393836975], [1437053613.643698, 2688671, 0.6158696413040161], [1437054093.673047, 2698804, 0.6162974238395691], [1437054573.667371, 2708956, 0.6160892844200134], [1437055053.650441, 2719087, 0.6176281571388245], [1437055533.778469, 2729219, 0.6165231466293335], [1437056013.694082, 2739343, 0.6171510219573975], [1437056493.674871, 2749458, 0.6124134659767151], [1437056973.700234, 2759575, 0.6120688319206238], [1437057453.666129, 2769697, 0.6126770377159119], [1437057933.848506, 2779821, 0.6126595139503479], [1437058413.643799, 2789941, 0.616513729095459], [1437058893.715386, 2800076, 0.6130264401435852], [1437059373.62596, 2810207, 0.6114044785499573], [1437059853.650848, 2820334, 0.6077002882957458], [1437060333.792248, 2830465, 0.6086235046386719], [1437060813.682955, 2840600, 0.6084680557250977], [1437061293.681795, 2850745, 0.6094310879707336], [1437061773.691182, 2860880, 0.6066345572471619], [1437062253.662987, 2871013, 0.6094250082969666], [1437062733.760419, 2881153, 0.609106719493866], [1437063213.651969, 2891278, 0.6080747246742249], 
[1437063693.723523, 2901406, 0.6081057786941528], [1437064173.68663, 2911533, 0.6066460609436035], [1437064653.547643, 2921667, 0.6057829856872559], [1437065133.62645, 2931813, 0.6092885136604309], [1437065613.566569, 2941947, 0.6089289784431458], [1437066093.537804, 2952102, 0.6070758700370789], [1437066573.529332, 2962243, 0.6096142530441284], [1437067053.520098, 2972400, 0.609714925289154], [1437067533.605733, 2982561, 0.6116167306900024], [1437068013.535467, 2992698, 0.6119107007980347], [1437068493.559976, 3002839, 0.6119140386581421], [1437068973.558743, 3012983, 0.6115538477897644], [1437069453.562661, 3023116, 0.6126777529716492], [1437069933.627071, 3033256, 0.6146017909049988], [1437070413.574131, 3043386, 0.6119789481163025], [1437070893.658803, 3053528, 0.6139205694198608], [1437071373.638711, 3063659, 0.612362802028656], [1437071853.621384, 3073794, 0.6109192371368408], [1437072333.665269, 3083926, 0.6141091585159302], [1437072813.584388, 3094040, 0.6132751703262329], [1437073293.569178, 3104172, 0.6132386922836304]] \ No newline at end of file
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/runs.json b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/runs.json
new file mode 100644
index 0000000000..90fb0cbb09
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/data/runs.json
@@ -0,0 +1,22 @@
+{
+ "alpha": {
+ "scalars": [
+ "d1",
+ "d2",
+ "d3",
+ "d4"
+ ],
+ "histograms": [],
+ "images": []
+ },
+ "beta": {
+ "scalars": [
+ "d1",
+ "d2",
+ "d3",
+ "d4"
+ ],
+ "histograms": [],
+ "images": []
+ }
+}
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/demo/index.html b/tensorflow/tensorboard/components/tf-event-dashboard/demo/index.html
new file mode 100644
index 0000000000..34ad6a7263
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/demo/index.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../tf-event-dashboard.html">
+ <link rel="stylesheet" type="text/css" href="../../../lib/css/global.css">
+    <title>Event Dashboard Demo</title>
+ </head>
+ <body>
+ <script>
+ TF.Urls.runsUrl = function() {return "data/runs.json"};
+ TF.Urls.scalarsUrl = function(tag, run) {return "data/" + run + "/" + tag + ".json";};
+ </script>
+
+ <tf-event-dashboard id="demo"></tf-event-dashboard>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/dragZoomInteraction.ts b/tensorflow/tensorboard/components/tf-event-dashboard/dragZoomInteraction.ts
new file mode 100644
index 0000000000..bf9f7b70e2
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/dragZoomInteraction.ts
@@ -0,0 +1,150 @@
+module Plottable {
+export class DragZoomLayer extends Components.SelectionBoxLayer {
+ private _dragInteraction: Interactions.Drag;
+ private _doubleClickInteraction: Interactions.DoubleClick;
+ private xDomainToRestore: any[];
+ private yDomainToRestore: any[];
+ private isZoomed = false;
+ private easeFn: (t: number) => number = d3.ease("cubic-in-out");
+ private _animationTime = 750;
+
+ /* Constructs a SelectionBoxLayer with an attached DragInteraction and ClickInteraction.
+ * On drag, it triggers an animated zoom into the box that was dragged.
+ * On double click, it zooms back out to the original view, before any zooming.
+ * The zoom animation uses an easing function (default d3.ease("cubic-in-out")) and is customizable.
+ * Usage: Construct the selection box layer and attach x and y scales, and then add the layer
+ * over the plot you are zooming on using a Component Group.
+ * TODO(danmane) - merge this into Plottable
+ */
+ constructor(xScale: QuantitativeScale<number | { valueOf(): number }>,
+ yScale: QuantitativeScale<number | { valueOf(): number }>) {
+ super();
+ this.xScale(xScale);
+ this.yScale(yScale);
+ this._dragInteraction = new Interactions.Drag();
+ this._dragInteraction.attachTo(this);
+ this._doubleClickInteraction = new Interactions.DoubleClick();
+ this._doubleClickInteraction.attachTo(this);
+ this.setupCallbacks();
+ }
+
+ private setupCallbacks() {
+ let dragging = false;
+ this._dragInteraction.onDragStart((startPoint: Point) => {
+ this.bounds({
+ topLeft: startPoint,
+ bottomRight: startPoint,
+ });
+ });
+ this._dragInteraction.onDrag((startPoint, endPoint) => {
+ this.bounds({topLeft: startPoint, bottomRight: endPoint});
+ this.boxVisible(true);
+ dragging = true;
+ });
+ this._dragInteraction.onDragEnd((startPoint, endPoint) => {
+ this.boxVisible(false);
+ this.bounds({topLeft: startPoint, bottomRight: endPoint});
+ if (dragging) {
+ this.zoom();
+ }
+ dragging = false;
+ });
+
+ this._doubleClickInteraction.onDoubleClick(this.unzoom.bind(this));
+ }
+
+ /* Set the time (in ms) over which the zoom will interpolate.
+   * 0 implies no interpolation (i.e. the zoom is instant).
+ */
+ public animationTime(): number;
+ public animationTime(animationTime: number): DragZoomLayer;
+ public animationTime(animationTime?: number): any {
+ if (animationTime == null) {
+ return this._animationTime;
+ }
+ if (animationTime < 0) {
+ throw new Error("animationTime cannot be negative");
+ }
+ this._animationTime = animationTime;
+ return this;
+ }
+
+ /* Set the easing function, which determines how the zoom interpolates over time. */
+ public ease(fn: (t: number) => number): DragZoomLayer {
+ if (typeof(fn) !== "function") {
+ throw new Error("ease function must be a function");
+ }
+ if (fn(0) !== 0 || fn(1) !== 1) {
+ Utils.Window.warn("Easing function does not maintain invariant f(0)==0 && f(1)==1. Bad behavior may result.");
+ }
+ this.easeFn = fn;
+ return this;
+ }
+
+ // Zoom into extent of the selection box bounds
+ private zoom() {
+ let x0: number = this.xExtent()[0].valueOf();
+ let x1: number = this.xExtent()[1].valueOf();
+ let y0: number = this.yExtent()[1].valueOf();
+ let y1: number = this.yExtent()[0].valueOf();
+
+ if (x0 === x1 || y0 === y1) {
+ return;
+ }
+
+ if (!this.isZoomed) {
+ this.isZoomed = true;
+ this.xDomainToRestore = this.xScale().domain();
+ this.yDomainToRestore = this.yScale().domain();
+ }
+ this.interpolateZoom(x0, x1, y0, y1);
+ }
+
+ // Restore the scales to their state before any zoom
+ private unzoom() {
+ if (!this.isZoomed) {
+ return;
+ }
+ this.isZoomed = false;
+ this.interpolateZoom(this.xDomainToRestore[0], this.xDomainToRestore[1],
+ this.yDomainToRestore[0], this.yDomainToRestore[1]);
+ }
+
+ // If we are zooming, disable interactions, to avoid contention
+ private isZooming(isZooming: boolean) {
+ this._dragInteraction.enabled(!isZooming);
+ this._doubleClickInteraction.enabled(!isZooming);
+ }
+
+ private interpolateZoom(x0f: number, x1f: number, y0f: number, y1f: number) {
+ let x0s: number = this.xScale().domain()[0].valueOf();
+ let x1s: number = this.xScale().domain()[1].valueOf();
+ let y0s: number = this.yScale().domain()[0].valueOf();
+ let y1s: number = this.yScale().domain()[1].valueOf();
+
+    // Copy a reference to the ease function, so that changing it won't affect zooms already in progress.
+ let ease = this.easeFn;
+ let interpolator = (a: number, b: number, p: number) => d3.interpolateNumber(a, b)(ease(p));
+
+ this.isZooming(true);
+ let start = Date.now();
+ let draw = () => {
+ let now = Date.now();
+ let passed = now - start;
+ let p = this._animationTime === 0 ? 1 : Math.min(1, passed / this._animationTime);
+ let x0 = interpolator(x0s, x0f, p);
+ let x1 = interpolator(x1s, x1f, p);
+ let y0 = interpolator(y0s, y0f, p);
+ let y1 = interpolator(y1s, y1f, p);
+ this.xScale().domain([x0, x1]);
+ this.yScale().domain([y0, y1]);
+ if (p < 1) {
+ Utils.DOM.requestAnimationFramePolyfill(draw);
+ } else {
+ this.isZooming(false);
+ }
+ };
+ draw();
+ }
+}
+}
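
The constructor comment above describes how DragZoomLayer is meant to be used but stops short of showing it. Below is a minimal usage sketch, assuming Plottable and d3 are available as globals (as in the surrounding files); the scales, data, and the "svg#chart" selector are illustrative assumptions, and tf-chart.ts below shows the real composition inside BaseChart.buildChart.

    // Minimal usage sketch; the data points and selector are made up.
    let xScale = new Plottable.Scales.Linear();
    let yScale = new Plottable.Scales.Linear();

    let data = [[0, 0, 0.5], [1, 100, 0.4], [2, 200, 0.3]];   // [wall_time, step, value]
    let plot = new Plottable.Plots.Line<number>()
        .addDataset(new Plottable.Dataset(data))
        .x((d) => d[1], xScale)       // step on the x axis
        .y((d) => d[2], yScale);      // value on the y axis

    // Layer the DragZoomLayer over the plot with a Component Group, as the
    // constructor comment suggests; drag zooms in, double click zooms back out.
    let dragZoom = new Plottable.DragZoomLayer(xScale, yScale);
    dragZoom.animationTime(500);      // interpolate the zoom over 500ms
    let group = new Plottable.Components.Group([plot, dragZoom]);
    group.renderTo(d3.select("svg#chart"));
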
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.html
new file mode 100644
index 0000000000..39ca9704c3
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.html
@@ -0,0 +1,101 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../imports/plottable.html">
+<link rel="import" href="../imports/lodash.html">
+
+<!--
+tf-chart (TFChart) creates an element that draws a line chart for displaying event values.
+It has the following settable properties:
+ tag: (required, string) - the name of the tag to load for this chart
+ selectedRuns: (required, string[]) - the runs the chart should display
+ xType: (required, string) - the way to display x values - allows "step" or "wall_time"
+ dataCoordinator: (required, TF.DataCoordinator) - the data coordinator for talking to backend
+ colorScale: (required, Plottable.Scales.Color) - maps from runs to colors
+ tooltipUpdater: (required, function) - allows the chart to modify tooltips
+
+It exposes the following methods:
+ redraw() - cause the chart to re-render (useful if e.g. container size changed)
+
+The data is expected to be an array of [wall_time, step, value] arrays.
+The wall_time is serialized as seconds since epoch.
+-->
+<dom-module id="tf-chart">
+ <template>
+ <svg id="chartsvg"></svg>
+ <style>
+ :host {
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ display: flex;
+ flex-direction: column;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ svg {
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ .plottable .crosshairs line.guide-line {
+ stroke: #777;
+ }
+ </style>
+ </template>
+ <script src="dragZoomInteraction.js"></script>
+ <script src="tf-chart.js"></script>
+ <script>
+ Polymer({
+ is: "tf-chart",
+ properties: {
+ type: String, // "scalar" or "compressedHistogram"
+ _chart: Object,
+ colorScale: Object,
+ tag: String,
+ selectedRuns: Array,
+ xType: String,
+ dataCoordinator: Object,
+ tooltipUpdater: Function,
+ _initialized: Boolean,
+ },
+ observers: [
+ "_makeChart(type, tag, dataCoordinator, tooltipUpdater, xType, colorScale, _initialized)",
+ "_changeRuns(_chart, selectedRuns.*)"
+ ],
+ _changeRuns: function(chart, change) {
+ this._chart.changeRuns(this.selectedRuns);
+ this.redraw();
+ },
+ redraw: function() {
+ this._chart.redraw();
+ },
+ _constructor: function(type) {
+ if (type === "scalar") {
+ return TF.LineChart;
+ } else if (type === "compressedHistogram") {
+ return TF.HistogramChart;
+ } else {
+ throw new Error("Unrecognized chart type");
+ }
+ },
+ _makeChart: function(type, tag, dataCoordinator, tooltipUpdater, xType, colorScale, _initialized) {
+ if (!_initialized) {
+ return;
+ }
+ if (this._chart) this._chart.destroy();
+ var cns = this._constructor(type);
+ var chart = new cns(tag, dataCoordinator, tooltipUpdater, xType, colorScale);
+ var svg = d3.select(this.$.chartsvg);
+ this.async(function() {
+ chart.renderTo(svg);
+ this._chart = chart;
+ }, 350);
+ },
+ attached: function() {
+ this._initialized = true;
+ },
+ detached: function() {
+ this._initialized = false;
+ }
+ });
+ </script>
+</dom-module>
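
To make the datum format described in the comment above concrete: each point is a [wall_time, step, value] triple, and the accessors in tf-chart.ts (below) index into it by position. A small sketch, using the first entry of the demo data above:

    // One scalar datum as served by the demo JSON: wall_time in seconds
    // since the epoch, then the global step, then the scalar value.
    let datum: [number, number, number] = [1436925978.257845, 7, 0.03425809368491173];

    let wallTimeMs = datum[0] * 1000;   // the wall_time accessor multiplies by 1000 for the time axis
    let step = datum[1];                // used when xType === "step"
    let value = datum[2];               // the y value plotted by LineChart
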
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.ts b/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.ts
new file mode 100644
index 0000000000..0b0103d697
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-chart.ts
@@ -0,0 +1,327 @@
+/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+
+module TF {
+ type TFDatum = [number, number, number];
+ type tooltipMap = {[run: string]: string};
+ type TooltipUpdater = (tooltipMap, xValue, closestRun) => void;
+
+ let Y_TOOLTIP_FORMATTER_PRECISION = 4;
+ let STEP_AXIS_FORMATTER_PRECISION = 4;
+ let Y_AXIS_FORMATTER_PRECISION = 3;
+
+ export class BaseChart {
+ protected dataCoordinator: TF.DataCoordinator;
+ protected tag: string;
+ protected tooltipUpdater: TooltipUpdater;
+
+ protected xAccessor: Plottable.Accessor<number | Date>;
+ protected xScale: Plottable.QuantitativeScale<number | Date>;
+ protected yScale: Plottable.QuantitativeScale<number>;
+ protected gridlines: Plottable.Components.Gridlines;
+ protected center: Plottable.Components.Group;
+ protected xAxis: Plottable.Axes.Numeric | Plottable.Axes.Time;
+ protected yAxis: Plottable.Axes.Numeric;
+ protected xLabel: Plottable.Components.AxisLabel;
+ protected yLabel: Plottable.Components.AxisLabel;
+ protected outer: Plottable.Components.Table;
+ protected colorScale: Plottable.Scales.Color;
+ protected xTooltipFormatter: (d: number) => string;
+ constructor(
+ tag: string,
+ dataCoordinator: TF.DataCoordinator,
+ tooltipUpdater: TooltipUpdater,
+ xType: string,
+ colorScale: Plottable.Scales.Color
+ ) {
+ this.dataCoordinator = dataCoordinator;
+ this.tag = tag;
+ this.colorScale = colorScale;
+ this.tooltipUpdater = tooltipUpdater;
+ this.buildChart(xType);
+ }
+
+ public changeRuns(runs: string[]) {
+ throw new Error("Abstract method not implemented");
+ }
+
+ protected addCrosshairs(plot: Plottable.XYPlot<number | Date, number>, yAccessor): Plottable.Components.Group {
+ var pi = new Plottable.Interactions.Pointer();
+ pi.attachTo(plot);
+ let xGuideLine = new Plottable.Components.GuideLineLayer<void>("vertical");
+ let yGuideLine = new Plottable.Components.GuideLineLayer<void>("horizontal");
+ xGuideLine.addClass("crosshairs");
+ yGuideLine.addClass("crosshairs");
+ var group = new Plottable.Components.Group([plot, xGuideLine, yGuideLine]);
+ let yfmt = multiscaleFormatter(Y_TOOLTIP_FORMATTER_PRECISION);
+
+ pi.onPointerMove((p: Plottable.Point) => {
+ let run2val: {[run: string]: string} = {};
+ let x: number = this.xScale.invert(p.x).valueOf();
+ let yMin: number = this.yScale.domain()[0];
+ let yMax: number = this.yScale.domain()[1];
+ let closestRun: string = null;
+ let minYDistToRun: number = Infinity;
+ let yValueForCrosshairs: number = p.y;
+ plot.datasets().forEach((dataset) => {
+ let run: string = dataset.metadata().run;
+ let data: TFDatum[] = dataset.data();
+ let xs: number[] = data.map((d, i) => this.xAccessor(d, i, dataset).valueOf());
+ let idx: number = _.sortedIndex(xs, x);
+ if (idx === 0 || idx === data.length) {
+          // Only find a point when the cursor is inside the range of the data;
+          // if the cursor is to the left or right of all the data, don't attach.
+ return;
+ }
+ let previous = data[idx - 1];
+ let next = data[idx];
+ let x0: number = this.xAccessor(previous, idx - 1, dataset).valueOf();
+ let x1: number = this.xAccessor(next, idx, dataset).valueOf();
+ let y0: number = yAccessor(previous, idx - 1, dataset).valueOf();
+ let y1: number = yAccessor(next, idx, dataset).valueOf();
+ let slope: number = (y1 - y0) / (x1 - x0);
+ let y: number = y0 + slope * (x - x0);
+
+ if (y < yMin || y > yMax || y !== y) {
+          // Don't find data that is off the top or bottom of the plot,
+          // and don't find data if it is NaN.
+ return;
+ }
+ let dist = Math.abs(this.yScale.scale(y) - p.y);
+ if (dist < minYDistToRun) {
+ minYDistToRun = dist;
+ closestRun = run;
+ yValueForCrosshairs = this.yScale.scale(y);
+ }
+        // Note: this tooltip will display linearly interpolated values,
+        // e.g. it will display y=0 halfway between [y=-1, y=1], even
+        // though there is no actual y=0 datapoint. This could be misleading.
+ run2val[run] = yfmt(y);
+ });
+ xGuideLine.pixelPosition(p.x);
+ yGuideLine.pixelPosition(yValueForCrosshairs);
+ this.tooltipUpdater(run2val, this.xTooltipFormatter(x), closestRun);
+
+ });
+
+ pi.onPointerExit(() => {
+ this.tooltipUpdater(null, null, null);
+ xGuideLine.pixelPosition(-1);
+ yGuideLine.pixelPosition(-1);
+ });
+
+ return group;
+
+ }
+
+ protected buildChart(xType: string) {
+ if (this.outer) {
+ this.outer.destroy();
+ }
+ var xComponents = getXComponents(xType);
+ this.xAccessor = xComponents.accessor;
+ this.xScale = xComponents.scale;
+ this.xAxis = xComponents.axis;
+ this.xAxis.margin(0).tickLabelPadding(3);
+ this.xTooltipFormatter = xComponents.tooltipFormatter;
+ this.yScale = new Plottable.Scales.Linear();
+ this.yAxis = new Plottable.Axes.Numeric(this.yScale, "left");
+ let yFormatter = multiscaleFormatter(Y_AXIS_FORMATTER_PRECISION);
+ this.yAxis.margin(0).tickLabelPadding(5).formatter(yFormatter);
+ this.yAxis.usesTextWidthApproximation(true);
+
+ var center = this.buildPlot(this.xAccessor, this.xScale, this.yScale);
+
+ this.gridlines = new Plottable.Components.Gridlines(this.xScale, this.yScale);
+
+ var dzl = new Plottable.DragZoomLayer(this.xScale, this.yScale);
+
+ this.center = new Plottable.Components.Group([center, this.gridlines, dzl]);
+ this.outer = new Plottable.Components.Table([
+ [this.yAxis, this.center],
+ [null, this.xAxis]
+ ]);
+ }
+
+ protected buildPlot(xAccessor, xScale, yScale): Plottable.Component {
+ throw new Error("Abstract method not implemented.");
+ }
+
+ public renderTo(target: d3.Selection<any>) {
+ this.outer.renderTo(target);
+ }
+
+ public redraw() {
+ this.outer.redraw();
+ }
+
+ protected destroy() {
+ this.outer.destroy();
+ }
+ }
+
+ export class LineChart extends BaseChart {
+ private plot: Plottable.Plots.Line<number | Date>;
+ protected buildPlot(xAccessor, xScale, yScale): Plottable.Component {
+ var yAccessor = accessorize("2");
+ var plot = new Plottable.Plots.Line<number | Date>();
+ plot.x(xAccessor, xScale);
+ plot.y(yAccessor, yScale);
+ plot.attr("stroke", (d: any, i: number, m: any) => m.run, this.colorScale);
+ this.plot = plot;
+ var group = this.addCrosshairs(plot, yAccessor);
+ return group;
+ }
+
+ public changeRuns(runs: string[]) {
+ var datasets = this.dataCoordinator.getDatasets(this.tag, runs);
+ this.plot.datasets(datasets);
+ }
+
+ }
+
+ export class HistogramChart extends BaseChart {
+ private plots: Plottable.XYPlot<number | Date, number>[];
+
+ public changeRuns(runs: string[]) {
+ var datasets = this.dataCoordinator.getDatasets(this.tag, runs);
+ this.plots.forEach((p) => p.datasets(datasets));
+ }
+
+ protected buildPlot(xAccessor, xScale, yScale): Plottable.Component {
+ var percents = [0, 228, 1587, 3085, 5000, 6915, 8413, 9772, 10000];
+ var opacities = _.range(percents.length - 1).map((i) => (percents[i + 1] - percents[i]) / 2500);
+ var accessors = percents.map((p, i) => (datum) => datum[2][i][1]);
+ var median = 4;
+ var medianAccessor = accessors[median];
+
+ var plots = _.range(accessors.length - 1).map((i) => {
+ var p = new Plottable.Plots.Area<number | Date>();
+ p.x(xAccessor, xScale);
+
+ var y0 = i > median ? accessors[i] : accessors[i + 1];
+ var y = i > median ? accessors[i + 1] : accessors[i];
+ p.y(y, yScale);
+ p.y0(y0);
+ p.attr("fill", (d: any, i: number, m: any) => m.run, this.colorScale);
+ p.attr("stroke", (d: any, i: number, m: any) => m.run, this.colorScale);
+        p.attr("stroke-width", (d: any, i: number, m: any) => "0.5px");
+ p.attr("stroke-opacity", () => opacities[i]);
+ p.attr("fill-opacity", () => opacities[i]);
+ return p;
+ });
+
+ var medianPlot = new Plottable.Plots.Line<number | Date>();
+ medianPlot.x(xAccessor, xScale);
+ medianPlot.y(medianAccessor, yScale);
+ medianPlot.attr("stroke", (d: any, i: number, m: any) => m.run, this.colorScale);
+
+ this.plots = plots;
+ var group = this.addCrosshairs(medianPlot, medianAccessor);
+ return new Plottable.Components.Group([new Plottable.Components.Group(plots), group]);
+ }
+ }
+
+ /* Create a formatter function that will switch between exponential and
+ * regular display depending on the scale of the number being formatted,
+ * and show `digits` significant digits.
+ */
+ function multiscaleFormatter(digits: number): ((v: number) => string) {
+ return (v: number) => {
+ var absv = Math.abs(v);
+ if (absv < 1E-15) {
+ // Sometimes zero-like values get an annoying representation
+ absv = 0;
+ }
+ var f: (x: number) => string;
+ if (absv >= 1E4) {
+ f = d3.format("." + digits + "e");
+ } else if (absv > 0 && absv < 0.01) {
+ f = d3.format("." + digits + "e");
+ } else {
+ f = d3.format("." + digits + "g");
+ }
+ return f(v);
+ };
+ }
+
+ function accessorize(key: string): Plottable.Accessor<number> {
+ return (d: any, index: number, dataset: Plottable.Dataset) => d[key];
+ }
+
+ interface XComponents {
+ /* tslint:disable */
+ scale: Plottable.Scales.Linear | Plottable.Scales.Time,
+ axis: Plottable.Axes.Numeric | Plottable.Axes.Time,
+ accessor: Plottable.Accessor<number | Date>,
+ tooltipFormatter: (d: number) => string;
+ /* tslint:enable */
+ }
+
+ function stepX(): XComponents {
+ var scale = new Plottable.Scales.Linear();
+ var axis = new Plottable.Axes.Numeric(scale, "bottom");
+ var formatter = Plottable.Formatters.siSuffix(STEP_AXIS_FORMATTER_PRECISION);
+ axis.formatter(formatter);
+ return {
+ scale: scale,
+ axis: axis,
+ accessor: accessorize("1"),
+ tooltipFormatter: formatter,
+ };
+ }
+
+ function wallX(): XComponents {
+ var scale = new Plottable.Scales.Time();
+ var formatter = Plottable.Formatters.time("%a %b %e, %H:%M:%S");
+ return {
+ scale: scale,
+ axis: new Plottable.Axes.Time(scale, "bottom"),
+ accessor: (d: any, index: number, dataset: Plottable.Dataset) => {
+ return d[0] * 1000; // convert seconds to ms
+ },
+ tooltipFormatter: (d: number) => formatter(new Date(d)),
+ };
+ }
+
+ function relativeX(): XComponents {
+ var scale = new Plottable.Scales.Linear();
+ var formatter = (n: number) => {
+ var days = Math.floor(n / 24);
+ n -= (days * 24);
+ var hours = Math.floor(n);
+ n -= hours;
+ n *= 60;
+ var minutes = Math.floor(n);
+ n -= minutes;
+ n *= 60;
+ var seconds = Math.floor(n);
+ return days + "d " + hours + "h " + minutes + "m " + seconds + "s";
+ };
+ return {
+ scale: scale,
+ axis: new Plottable.Axes.Numeric(scale, "bottom"),
+ accessor: (d: any, index: number, dataset: Plottable.Dataset) => {
+ var data = dataset && dataset.data();
+ // I can't imagine how this function would be called when the data is empty
+      // (after all, it iterates over the data), but let's guard just to be safe.
+ var first = data.length > 0 ? data[0][0] : 0;
+ return (d[0] - first) / (60 * 60); // convert seconds to hours
+ },
+ tooltipFormatter: formatter,
+ };
+ }
+
+ function getXComponents(xType: string): XComponents {
+ switch (xType) {
+ case "step":
+ return stepX();
+ case "wall_time":
+ return wallX();
+ case "relative":
+ return relativeX();
+ default:
+ throw new Error("invalid xType: " + xType);
+ }
+ }
+}
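
The x-axis helpers above are easiest to see with numbers. Here is a small sketch of the arithmetic in relativeX's formatter, which turns an elapsed-hours value into a days/hours/minutes/seconds string; the 26.5-hour input is a made-up example:

    // relativeX() receives (d[0] - first) / (60 * 60), i.e. elapsed hours since
    // the first event. For example, 26.5 elapsed hours formats as "1d 2h 30m 0s".
    let n = 26.5;
    let days = Math.floor(n / 24);                  // 1
    n -= days * 24;                                 // 2.5
    let hours = Math.floor(n);                      // 2
    n = (n - hours) * 60;                           // 30
    let minutes = Math.floor(n);                    // 30
    let seconds = Math.floor((n - minutes) * 60);   // 0
    console.log(days + "d " + hours + "h " + minutes + "m " + seconds + "s");
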
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-color-scale.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-color-scale.html
new file mode 100644
index 0000000000..b559cab9cd
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-color-scale.html
@@ -0,0 +1,69 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../imports/lodash.html">
+<link rel="import" href="../imports/plottable.html">
+
+<!--
+tf-color-scale is a plumbing component that takes in an array of runs, and produces
+an upward-bindable outColorScale, which is a color scale mapping from those runs to
+a set of colors.
+
+Right now, the colors are hard-coded and must be manually synchronized with the colors expected in
+tf-run-selector. TODO(danmane): we should enshrine the mapping elsewhere.
+-->
+<dom-module id="tf-color-scale">
+ <script>
+ (function() {
+ // TODO(danmane) - get Plottable team to make an API point for this
+ Plottable.Scales.Color._LOOP_LIGHTEN_FACTOR = 0;
+ var classColorPairs = [
+ ["light-blue", "#03A9F4"],
+ ["red" , "#f44366"],
+ ["green" , "#4CAF50"],
+ ["purple" , "#9c27b0"],
+ ["teal" , "#009688"],
+ ["pink" , "#e91e63"],
+ ["orange" , "#ff9800"],
+ ["brown" , "#795548"],
+ ["indigo" , "#3f51b5"],
+ ];
+ var classes = _.pluck(classColorPairs, 0);
+ var colors = _.pluck(classColorPairs, 1);
+ Polymer({
+ is: "tf-color-scale",
+ properties: {
+ runs: Array,
+ outClassScale: {
+ type: Object,
+ notify: true,
+ readOnly: true,
+ value: function() {
+ return new d3.scale.ordinal().range(classes);
+ },
+ // TODO(danmane): the class scale will not update if the domain changes.
+        // This behavior is inconsistent with the ColorScale.
+        // In practice we don't change runs after the initial load, so it's not currently an issue.
+ },
+ outColorScale: {
+ type: Object,
+ notify: true,
+ readOnly: true,
+ value: function() {
+ var scale = new Plottable.Scales.Color().range(colors);
+ scale.onUpdate(this._notifyColorScaleDomainChange.bind(this));
+ return scale;
+ },
+ },
+ },
+ observers: ["_changeRuns(runs.*)"],
+ _changeRuns: function(runs) {
+ this.outClassScale.domain(this.runs);
+ this.outColorScale.domain(this.runs);
+ },
+ _notifyColorScaleDomainChange: function() {
+ this.notifyPath("outColorScale.domain_path", this.outColorScale.domain());
+ this.outColorScale.domain_path = null;
+ },
+ });
+ })();
+ </script>
+</dom-module>
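
What the plumbing above amounts to, sketched directly against Plottable: a color scale whose domain is the list of runs and whose range is the hard-coded palette, so each run maps to a stable color. The run names below are illustrative assumptions:

    // Assumes Plottable is loaded as a global, as in the element above.
    let colors = ["#03A9F4", "#f44366", "#4CAF50"];   // subset of the palette above
    let colorScale = new Plottable.Scales.Color().range(colors);
    colorScale.domain(["train", "eval"]);             // hypothetical run names

    colorScale.scale("train");   // -> the first palette color
    colorScale.scale("eval");    // -> the second palette color
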
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-data-coordinator.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-data-coordinator.html
new file mode 100644
index 0000000000..454dff4a9e
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-data-coordinator.html
@@ -0,0 +1,29 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../imports/plottable.html">
+<link rel="import" href="../imports/lodash.html">
+
+<!--
+tf-data-coordinator is a simple plumbing component that takes in a value url generator
+(a function that takes a tag and a run and returns a url), and produces an upward-bindable
+TF.DataCoordinator for consumption elsewhere.
+-->
+<dom-module id="tf-data-coordinator">
+ <script src="dataCoordinator.js"></script>
+ <script src="dataset.js"></script>
+ <script>
+ Polymer({
+ is: "tf-data-coordinator",
+ properties: {
+ urlGenerator: Object,
+ outDataCoordinator: {
+ type: Object,
+ computed: "getCoordinator(urlGenerator, runToTag)",
+ notify: true,
+ },
+ },
+ getCoordinator: function(generator, runToTag) {
+ return new TF.DataCoordinator(generator, runToTag);
+ }
+ });
+ </script>
+</dom-module>
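
A minimal sketch of the two inputs the element above wires together: a url generator like the one in the demo page, and a run-to-tag map derived from runs.json. The TF.DataCoordinator implementation lives in dataCoordinator.js (not shown here), and the exact shape of runToTag is an assumption based on how the dashboard binds runToScalars:

    // A url generator: takes a tag and a run, returns the JSON url for that series.
    let urlGenerator = (tag: string, run: string) => "data/" + run + "/" + tag + ".json";

    // Assumed shape: run name -> list of scalar tags, as produced from runs.json.
    let runToTag = {
      alpha: ["d1", "d2", "d3", "d4"],
      beta:  ["d1", "d2", "d3", "d4"],
    };

    let coordinator = new TF.DataCoordinator(urlGenerator, runToTag);
    // Charts then ask it for datasets, e.g. coordinator.getDatasets("d1", ["alpha", "beta"]).
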
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-event-dashboard.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-event-dashboard.html
new file mode 100644
index 0000000000..534f62072f
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-event-dashboard.html
@@ -0,0 +1,208 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="tf-data-coordinator.html">
+<link rel="import" href="tf-tooltip-coordinator.html">
+<link rel="import" href="tf-run-selector.html">
+<link rel="import" href="tf-x-type-selector.html">
+<link rel="import" href="../tf-dashboard-common/tf-run-generator.html">
+<link rel="import" href="tf-color-scale.html">
+<link rel="import" href="../tf-dashboard-common/tf-url-generator.html">
+<link rel="import" href="../tf-dashboard-common/tf-dashboard-layout.html">
+<link rel="import" href="../tf-dashboard-common/tensorboard-color.html">
+<link rel="import" href="../tf-dashboard-common/dashboard-style.html">
+<link rel="import" href="../tf-dashboard-common/tf-downloader.html">
+<link rel="import" href="../tf-categorizer/tf-categorizer.html">
+<link rel="import" href="tf-chart.html">
+<link rel="import" href="../tf-collapsable-pane/tf-collapsable-pane.html">
+<link rel="import" href="../../bower_components/iron-collapse/iron-collapse.html">
+<link rel="import" href="../../bower_components/paper-icon-button/paper-icon-button.html">
+<link rel="import" href="../imports/lodash.html">
+<link rel="import" href="../tf-dashboard-common/warning-style.html">
+
+<!--
+tf-event-dashboard is a complete frontend that loads runs from a backend,
+and creates chart panes that display data for those runs.
+
+It provides a categorizer, run selector, and x type selector, by which the user
+can customize how data is organized and displayed.
+
+Each chart has a button that can toggle whether it is "selected"; selected
+charts are larger.
+
+Organizationally, the #plumbing div contains components that have no concrete
+manifestation and just effect data bindings or data loading. The #sidebar contains
+shared controls like the tf-categorizer, tf-run-selector, and tf-x-type-selector.
+The #center div contains tf-charts embedded inside tf-collapsable-panes.
+-->
+<dom-module id="tf-event-dashboard">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator
+ out-runs-url="{{runsUrl}}"
+ out-scalars-url-generator="{{scalarsUrlGen}}"
+ id="urlGenerator"
+ ></tf-url-generator>
+
+ <tf-data-coordinator
+ id="dataCoordinator"
+ url-generator="[[scalarsUrlGen]]"
+ run-to-tag="[[runToScalars]]"
+ color-scale="[[colorScale]]"
+ out-data-coordinator="{{dataCoordinator}}"
+      ></tf-data-coordinator>
+
+ <tf-run-generator
+ id="runGenerator"
+ url="[[runsUrl]]"
+ out-run-to-scalars="{{runToScalars}}"
+      ></tf-run-generator>
+
+ <tf-color-scale
+ id="colorScale"
+ runs="[[_runs]]"
+ out-color-scale="{{colorScale}}"
+ out-class-scale="{{classScale}}"
+ ></tf-color-scale>
+
+ <tf-tooltip-coordinator
+ id="tooltipCoordinator"
+ out-tooltip-updater="{{tooltipUpdater}}"
+ out-tooltip-map="{{tooltipMap}}"
+ out-x-value="{{tooltipXValue}}"
+ out-closest-run="{{closestRun}}"
+ ></tf-tooltip-coordinator>
+
+ </div>
+
+ <tf-dashboard-layout>
+ <div class="sidebar">
+ <tf-categorizer
+ id="categorizer"
+ tags="[[_visibleTags]]"
+ categories="{{categories}}"
+ ></tf-categorizer>
+ <span id="download-option">
+ Show Data Download Links:
+ <paper-toggle-button checked="{{_show_download_links}}"></paper-toggle-button>
+ </span>
+
+ <tf-x-type-selector
+ id="xTypeSelector"
+ out-x-type="{{xType}}"
+ ></tf-x-type-selector>
+
+ <tf-run-selector
+ id="runSelector"
+ runs="[[_runs]]"
+ class-scale="[[classScale]]"
+ out-selected="{{selectedRuns}}"
+ tooltips="[[tooltipMap]]"
+ closest-run="[[closestRun]]"
+ x-value="[[tooltipXValue]]"
+ x-type="[[xType]]"
+ ></tf-run-selector>
+
+ </div>
+ <div class="center">
+ <template is="dom-if" if="[[!categories.length]]">
+ <div class="warning">
+ <p>
+ No scalar summary tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.scalar_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <template is="dom-repeat" items="[[categories]]">
+ <tf-collapsable-pane name="[[item.name]]" count="[[item.tags.length]]">
+ <div class="layout horizontal wrap">
+ <template is="dom-repeat" items="[[item.tags]]" as="tag">
+ <div class="card">
+ <span class="card-title">[[tag]]</span>
+ <div class="card-content">
+ <tf-chart
+ tag="[[tag]]"
+ type="scalar"
+ id="chart"
+ selected-runs="[[selectedRuns]]"
+ x-type="[[xType]]"
+ data-coordinator="[[dataCoordinator]]"
+ color-scale="[[colorScale]]"
+ on-keyup="toggleSelected"
+ tabindex="2"
+ tooltip-updater="[[tooltipUpdater]]"
+ ></tf-chart>
+ <paper-icon-button
+ class="expand-button"
+ shift$="[[_show_download_links]]"
+ icon="fullscreen"
+ on-tap="toggleSelected"
+ ></paper-icon-button>
+ </div>
+ <template is="dom-if" if="[[_show_download_links]]">
+ <div class="card-bottom-row">
+ <tf-downloader
+ selected-runs="[[selectedRuns]]"
+ tag="[[tag]]"
+ url-fn="[[scalarsUrlGen]]"
+ run-to-tag="[[runToScalars]]"
+ >
+ </tf-downloader>
+ </div>
+ </template>
+ </div>
+ </template>
+ </div>
+ </tf-collapsable-pane>
+ </template>
+ </div>
+ </tf-dashboard-layout>
+
+ <style include="dashboard-style"></style>
+ <style include="warning-style"></style>
+
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-event-dashboard",
+ properties: {
+ _runs: {
+ type: Array,
+ computed: "_getRuns(runToScalars)",
+ },
+ _visibleTags: {
+ type: Array,
+ computed: "_getVisibleTags(selectedRuns.*, runToScalars.*)"
+ },
+ _show_download_links: Boolean,
+ },
+ observers: ['redraw(_show_download_links)'],
+ redraw: function(_show_download_links) {
+ var els = this.getElementsByTagName("tf-chart");
+ for (var i=0; i<els.length; i++) {
+ els[i].redraw();
+ }
+ },
+ _getRuns: function(runToScalars) {
+ return _.keys(runToScalars);
+ },
+ _getVisibleTags: function(selectedRunsChange, runsToScalarsChange) {
+ var keys = selectedRunsChange.base;
+ var dict = runsToScalarsChange.base;
+ return _.union.apply(null, keys.map(function(k) {return dict[k]}));
+ },
+ toggleSelected: function(e) {
+ var currentTarget = Polymer.dom(e.currentTarget);
+ var parentDiv = currentTarget.parentNode.parentNode;
+ parentDiv.classList.toggle("selected");
+ var chart = currentTarget.previousElementSibling;
+ if (chart) {
+ chart.redraw();
+ }
+ },
+ });
+ </script>
+</dom-module>
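
For clarity, a minimal standalone TypeScript sketch of the tag-union computation that _getVisibleTags performs above; the run names and tag lists are illustrative.

// Compute the union of scalar tags across the selected runs, preserving
// first-seen order, mirroring the lodash-based _getVisibleTags above.
function visibleTags(selectedRuns: string[],
                     runToTag: {[run: string]: string[]}): string[] {
  const seen: {[tag: string]: boolean} = {};
  const result: string[] = [];
  selectedRuns.forEach(run => {
    (runToTag[run] || []).forEach(tag => {
      if (!seen[tag]) {
        seen[tag] = true;
        result.push(tag);
      }
    });
  });
  return result;
}
// visibleTags(["run1", "run2"],
//     {run1: ["loss", "accuracy"], run2: ["loss", "lr"]})
//   === ["loss", "accuracy", "lr"]
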
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-run-selector.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-run-selector.html
new file mode 100644
index 0000000000..92cecb2e5a
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-run-selector.html
@@ -0,0 +1,104 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-checkbox/paper-checkbox.html">
+<link rel="import" href="../imports/lodash.html">
+<link rel="import" href="../tf-dashboard-common/scrollbar-style.html">
+<link rel="import" href="../tf-multi-checkbox/tf-multi-checkbox.html">
+
+<!--
+tf-run-selector creates a set of checkboxes to display which runs are selected.
+It also displays tooltips.
+
+Properties in:
+- runs: Array of strings representing the runs that may be selected
+- tooltips: An object that maps from a run to the associated tooltip string.
+When tooltips are available, runs without an associated tooltip are hidden and
+the remaining runs are sorted by their tooltip.
+- closestRun: The name of the run that is closest to the cursor (present when
+tooltips are active). It will be highlighted.
+- classScale: An object (generated by tf-dashboard-common/tf-color-scale) that
+maps from a run name to a class name, which will be used to color the run.
+- xValue: The string that represents the x-value associated with the tooltips.
+- xType: The string that describes what kind of data is displayed on the x axis.
+
+Properties out:
+- outSelected: The array of run names that are currently checked by the user.
+
+-->
+<dom-module id="tf-run-selector">
+ <template>
+ <div id="top-text">
+ <template is="dom-if" if="[[xValue]]">
+ <div class="x-tooltip tooltip-container">
+ <div class="x-tooltip-label">[[xType]]</div>
+ <div class="x-tooltip-value">[[xValue]]</div>
+ </div>
+ </template>
+ <template is="dom-if" if="[[!xValue]]">
+ <div id="tooltip-help" class="tooltip-container">
+ Selected Runs:
+ </div>
+ </template>
+ </div>
+ <tf-multi-checkbox
+ names="[[runs]]"
+ tooltips="[[tooltips]]"
+ highlights="[[_arrayify(closestRun)]]"
+ out-selected="{{outSelected}}"
+ class-scale="[[classScale]]"
+ hide-missing-tooltips
+ ></tf-multi-checkbox>
+ <style>
+ :host {
+ display: flex;
+ flex-direction: column;
+ padding-bottom: 10px;
+ box-sizing: border-box;
+ }
+ #top-text {
+ width: 100%;
+ flex-grow: 0;
+ flex-shrink: 0;
+ padding-left: 35px;
+ padding-right: 16px;
+ padding-bottom: 6px;
+ box-sizing: border-box;
+ color: var(--paper-grey-800);
+ }
+ tf-multi-checkbox {
+ display: flex;
+ flex-grow: 1;
+ flex-shrink: 1;
+ height: 0px; /* hack: so the flex-grow takes over and gives it space */
+ }
+ .x-tooltip {
+ display: flex;
+ flex-direction: row;
+ }
+ .x-tooltip-label {
+ flex-grow: 1;
+ align-self: flex-start;
+ }
+ .x-tooltip-value {
+ align-self: flex-end;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-run-selector",
+ properties: {
+ outSelected: {type: Array, notify: true},
+ // runs: an array of strings, representing the run names that may be chosen
+ runs: Array,
+ tooltips: {type: Object, value: null}, // {[run: string]: string}
+ xValue: {type: String, value: null}, // the string representing run's x val
+ xType: String, // string: "relative", "step", or "wall_time"
+ classScale: Object, // map from run name to color class (css)
+ closestRun: {type: String, value: null}, // which run has a value closest to mouse coordinate
+ },
+ _arrayify: function(item) {
+ return [item];
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-tooltip-coordinator.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-tooltip-coordinator.html
new file mode 100644
index 0000000000..23b0cba87a
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-tooltip-coordinator.html
@@ -0,0 +1,48 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<!-- tf-tooltip-coordinator is a plumbing component that provides a TooltipUpdater,
+which is a function that lets JavaScript logic elsewhere modify the values held by the
+tooltip coordinator. It then propagates the values to other Polymer components.
+
+Thus, the tooltip-coordinator allows many JS pieces of the application to modify a single
+piece of shared state.
+ -->
+<dom-module id="tf-tooltip-coordinator">
+ <script>
+ Polymer({
+ is: "tf-tooltip-coordinator",
+ properties: {
+ outTooltipUpdater: {
+ type: Function,
+ value: function() {
+ return (function(tooltipMap, xValue, closestRun) {
+ this._setOutTooltipMap(tooltipMap);
+ this._setOutXValue(xValue);
+ this._setOutClosestRun(closestRun);
+ }).bind(this);
+ },
+ notify: true,
+ readOnly: true,
+ },
+ outTooltipMap: {
+ // a {runName: tooltipValue} map, where runName and tooltipValue are strings.
+ type: Object,
+ notify: true,
+ readOnly: true,
+ },
+ outXValue: {
+ // the closest x value for the tooltips
+ type: Number,
+ notify: true,
+ readOnly: true,
+ },
+ outClosestRun: {
+ // the name of the run that is closest to the user cursor (if any)
+ type: String,
+ notify: true,
+ readOnly: true,
+ },
+ },
+ });
+ </script>
+</dom-module>
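
As a hedged illustration of the updater pattern described above (a read-only property exposing a bound function that external code calls to push new values into the element), written as a standalone TypeScript sketch rather than a Polymer element; the names are illustrative.

interface TooltipState {
  tooltipMap: {[run: string]: string};
  xValue: number | null;
  closestRun: string | null;
}
class TooltipCoordinatorSketch {
  state: TooltipState = {tooltipMap: {}, xValue: null, closestRun: null};
  // The "updater" that other components receive via data binding; calling it
  // replaces the shared tooltip state in one place.
  updater = (tooltipMap: {[run: string]: string}, xValue: number,
             closestRun: string): void => {
    this.state = {tooltipMap: tooltipMap, xValue: xValue, closestRun: closestRun};
  };
}
// A chart would call something like:
//   coordinator.updater({train: "0.42"}, 1200, "train");
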
diff --git a/tensorflow/tensorboard/components/tf-event-dashboard/tf-x-type-selector.html b/tensorflow/tensorboard/components/tf-event-dashboard/tf-x-type-selector.html
new file mode 100644
index 0000000000..078e456a0d
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-event-dashboard/tf-x-type-selector.html
@@ -0,0 +1,75 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-button/paper-button.html">
+<link rel="import" href="../tf-dashboard-common/tensorboard-color.html">
+
+<!--
+tf-x-type-selector is a simple component that creates buttons labeled "step", "relative", and "wall",
+ and provides (as upward bindable) an outXType property that is "step", "relative", or "wall_time".
+-->
+<dom-module id="tf-x-type-selector">
+ <template>
+ <div id="buttons">
+ <p>X Type: </p>
+ <paper-button
+ class="x-button selected"
+ id="step"
+ on-tap="_select"
+ raised
+ >
+ step
+ </paper-button>
+ <paper-button
+ class="x-button"
+ id="relative"
+ on-tap="_select"
+ >
+ relative
+ </paper-button>
+ <paper-button
+ class="x-button"
+ id="wall_time"
+ on-tap="_select"
+ >
+ wall
+ </paper-button>
+ </div>
+ <style>
+ .x-button {
+ width: 29%;
+ font-size: 14px;
+ background-color: var(--paper-grey-500);
+ margin-top: 5px;
+ color: white;
+ }
+
+ .x-button.selected {
+ font-weight: bold;
+ background-color: var(--tb-orange-strong) !important;
+ }
+
+ #buttons p {
+ text-align: center;
+ font-size: 12px;
+ margin: 0;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-x-type-selector",
+ properties: {
+ outXType: {type: String, notify: true, readOnly: true, value: "step"},
+ },
+ _select: function(e) {
+ var _this = this;
+ ["step", "wall_time", "relative"].forEach(function(id) {
+ _this.$[id].raised = false;
+ _this.$[id].classList.remove("selected");
+ });
+ e.currentTarget.raised = true;
+ this._setOutXType(e.currentTarget.id);
+ e.currentTarget.classList.add("selected");
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph-board/tf-graph-board.html b/tensorflow/tensorboard/components/tf-graph-board/tf-graph-board.html
new file mode 100644
index 0000000000..c053d6f7a7
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-board/tf-graph-board.html
@@ -0,0 +1,152 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../tf-graph/tf-graph.html">
+<link rel="import" href="../tf-graph-info/tf-graph-info.html">
+<link rel="import" href="../../bower_components/paper-progress/paper-progress.html">
+<!-- Element for putting tf-graph and tf-graph-info side by side.
+
+Example
+
+<tf-graph-board graph="[[graph]]"></tf-graph-board>
+
+-->
+
+<dom-module id="tf-graph-board">
+<template>
+<style>
+:host {
+ display: block;
+}
+
+/deep/ .close {
+ position: absolute;
+ cursor: pointer;
+ left: 15px;
+ bottom: 15px;
+}
+
+.container {
+ width: 100%;
+ height: 100%;
+ opacity: 1;
+}
+
+.container.loading {
+ cursor: progress;
+ opacity: 0.1;
+}
+
+.container.loading.error {
+ cursor: auto;
+}
+
+#info {
+ position: absolute;
+ right: 5px;
+ top: 5px;
+ padding: 0px;
+ max-width: 380px;
+ min-width: 320px;
+ background-color: rgba(255,255,255,0.9);
+ @apply(--shadow-elevation-2dp);
+}
+
+#main {
+ width: 100%;
+ height: 100%;
+}
+
+#progress-bar {
+ display: flex;
+ flex-direction: column;
+ align-items: center;
+ justify-content: center;
+ width: 100%;
+ position: absolute;
+ top: 40px;
+ left: 0;
+ font-size: 13px;
+}
+
+#progress-msg {
+ width: 400px;
+ margin-bottom: 5px;
+}
+
+paper-progress {
+ width: 400px;
+ --paper-progress-height: 6px;
+ --paper-progress-active-color: #f3913e;
+}
+</style>
+<template is="dom-if" if="[[_isNotComplete(progress)]]">
+ <div id="progress-bar">
+ <div id="progress-msg">[[progress.msg]]</div>
+ <paper-progress value="[[progress.value]]"></paper-progress>
+ </div>
+</template>
+<div class$="[[_getContainerClass(progress)]]">
+ <div id="main">
+ <tf-graph id="graph"
+ graph-hierarchy="[[graphHierarchy]]"
+ selected-node="{{_selectedNode}}"
+ highlighted-node="{{_highlightedNode}}"
+ color-by="[[colorBy]]"
+ color-by-params="{{colorByParams}}"
+ graph-name="[[graphName]]"
+ progress="[[progress]]"
+ ></tf-graph>
+ </div>
+ <div id="info">
+ <tf-graph-info id="graph-info"
+ title="selected"
+ graph-hierarchy="[[graphHierarchy]]"
+ graph="[[graph]]"
+ selected-node="{{_selectedNode}}"
+ highlighted-node="{{_highlightedNode}}"
+ ></tf-graph-info>
+ </div>
+</div>
+</template>
+</dom-module>
+
+<script>
+Polymer({
+ is: 'tf-graph-board',
+ properties: {
+ // Public API.
+ graphHierarchy: Object,
+ graph: Object,
+ graphName: String,
+ // True if the graph data also includes run-time stats.
+ hasStats: Boolean,
+ /**
+ * @type {{value: number, msg: string}}
+ *
+ * An object whose "value" is a number between 0 and 100 denoting the % of
+ * progress for the progress bar, and whose "msg" is the message displayed
+ * next to it.
+ */
+ progress: Object,
+ colorByParams: {
+ type: Object,
+ notify: true,
+ },
+ // Private API: Data routing between child components.
+ _selectedNode: String,
+ _highlightedNode: String,
+ },
+ /** True if the progress is not complete yet (< 100 %). */
+ _isNotComplete: function(progress) {
+ return progress.value < 100;
+ },
+ _getContainerClass: function(progress) {
+ var result = 'container';
+ if (progress.error) {
+ result += ' error';
+ }
+ if (this._isNotComplete(progress)) {
+ result += ' loading';
+ }
+ return result;
+ }
+});
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/colors.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/colors.ts
new file mode 100644
index 0000000000..8912483c09
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/colors.ts
@@ -0,0 +1,133 @@
+module tf {
+
+/**
+ * Mapping from color palette name to color palette, which contains
+ * exact colors for multiple states of a single color palette.
+ */
+export let COLORS = [
+ {
+ "name": "Google Blue",
+ "color": "#4184f3",
+ "active": "#3a53c5",
+ "disabled": "#cad8fc"
+ },
+ {
+ "name": "Google Red",
+ "color": "#db4437",
+ "active": "#8f2a0c",
+ "disabled": "#e8c6c1"
+ },
+ {
+ "name": "Google Yellow",
+ "color": "#f4b400",
+ "active": "#db9200",
+ "disabled": "#f7e8b0"
+ },
+ {
+ "name": "Google Green",
+ "color": "#0f9d58",
+ "active": "#488046",
+ "disabled": "#c2e1cc"
+ },
+ {
+ "name": "Purple",
+ "color": "#aa46bb",
+ "active": "#5c1398",
+ "disabled": "#d7bce6"
+ },
+ {
+ "name": "Teal",
+ "color": "#00abc0",
+ "active": "#47828e",
+ "disabled": "#c2eaf2"
+ },
+ {
+ "name": "Deep Orange",
+ "color": "#ff6f42",
+ "active": "#ca4a06",
+ "disabled": "#f2cbba"
+ },
+ {
+ "name": "Lime",
+ "color": "#9d9c23",
+ "active": "#7f771d",
+ "disabled": "#f1f4c2"
+ },
+ {
+ "name": "Indigo",
+ "color": "#5b6abf",
+ "active": "#3e47a9",
+ "disabled": "#c5c8e8"
+ },
+ {
+ "name": "Pink",
+ "color": "#ef6191",
+ "active": "#ca1c60",
+ "disabled": "#e9b9ce"
+ },
+ {
+ "name": "Deep Teal",
+ "color": "#00786a",
+ "active": "#2b4f43",
+ "disabled": "#bededa"
+ },
+ {
+ "name": "Deep Pink",
+ "color": "#c1175a",
+ "active": "#75084f",
+ "disabled": "#de8cae"
+ },
+ {
+ "name": "Gray",
+ "color": "#9E9E9E", // 500
+ "active": "#424242", // 800
+ "disabled": "F5F5F5" // 100
+ }
+].reduce((m, c) => {
+ m[c.name] = c;
+ return m;
+}, {});
+
+/**
+ * Mapping from op category to color palette name
+ * e.g., OP_GROUP_COLORS["state_ops"] === "Indigo".
+ */
+export let OP_GROUP_COLORS = [
+ {
+ color: "Google Red",
+ groups: ["gen_legacy_ops", "legacy_ops", "legacy_flogs_input",
+ "legacy_image_input", "legacy_input_example_input",
+ "legacy_sequence_input", "legacy_seti_input_input"]
+ }, {
+ color: "Deep Orange",
+ groups: ["constant_ops"]
+ }, {
+ color: "Indigo",
+ groups: ["state_ops"]
+ }, {
+ color: "Purple",
+ groups: ["nn_ops", "nn"]
+ }, {
+ color: "Google Green",
+ groups: ["math_ops"]
+ }, {
+ color: "Lime",
+ groups: ["array_ops"]
+ }, {
+ color: "Teal",
+ groups: ["control_flow_ops", "data_flow_ops"]
+ }, {
+ color: "Pink",
+ groups: ["summary_ops"]
+ }, {
+ color: "Deep Pink",
+ groups: ["io_ops"]
+ }
+].reduce((m, c) => {
+ c.groups.forEach(function(group) {
+ m[group] = c.color;
+ });
+ return m;
+}, {});
+
+}
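
A brief usage sketch of the two maps once the module is loaded; the lookup follows the group list above (state_ops belongs to the Indigo group), and the chosen group name is only an example.

const paletteName = tf.OP_GROUP_COLORS["state_ops"];  // "Indigo"
const fillColor = tf.COLORS[paletteName].color;       // "#5b6abf"
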
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/common.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/common.ts
new file mode 100644
index 0000000000..ed148bf719
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/common.ts
@@ -0,0 +1,236 @@
+/// <reference path="../../../typings/tsd.d.ts" />
+
+declare module graphlib {
+
+ interface GraphOptions {
+ name: string;
+ /**
+ * Direction for rank nodes. Can be TB, BT, LR, or RL, where T = top,
+ * B = bottom, L = left, and R = right.
+ */
+ rankdir: string;
+ type: string|number;
+ /** Number of pixels between each rank in the layout. */
+ ranksep?: number;
+ /** Number of pixels that separate nodes horizontally in the layout. */
+ nodesep?: number;
+ }
+
+ export interface EdgeObject {
+ v: string;
+ w: string;
+ name?: string;
+ }
+
+ export class Graph<N, E> {
+ constructor(opt?: Object);
+ setNode(name: string, value?: N): void;
+ hasNode(name: string): boolean;
+ setEdge(fromName: string, toName: string, value?: E): void;
+ hasEdge(fromName: string, toName: string): boolean;
+ edge(fromName: string, toName: string): E;
+ edge(edgeObject: EdgeObject): E;
+ removeEdge(v: string, w: string): void;
+ nodes(): string[];
+ node(name: string): N;
+ removeNode(name: string): void;
+ setGraph(graphOptions: GraphOptions): void;
+ graph(): GraphOptions;
+ nodeCount(): number;
+ neighbors(name: string): string[];
+ successors(name: string): string[];
+ predecessors(name: string): string[];
+ edges(): EdgeObject[];
+ outEdges(name: string): E[];
+ inEdges(name: string): E[];
+ /** Returns those nodes in the graph that have no in-edges. Takes O(|V|) time. */
+ sources(): string[];
+ /**
+ * Remove the node with the id v in the graph or do nothing if
+ * the node is not in the graph. If the node was removed this
+ * function also removes any incident edges. Returns the graph,
+ * allowing this to be chained with other functions. Takes O(|E|) time.
+ */
+ removeNode(name: string): Graph<N, E>;
+ setParent(name: string, parentName: string): void;
+ }
+}
+
+module tf {
+/**
+ * Recommended delay (ms) when running an expensive task asynchronously
+ * that gives enough time for the progress bar to update its UI.
+ */
+const ASYNC_TASK_DELAY = 20;
+
+export function time<T>(msg: string, task: () => T) {
+ let start = Date.now();
+ let result = task();
+ /* tslint:disable */
+ console.log(msg, ":", Date.now() - start, "ms");
+ /* tslint:enable */
+ return result;
+}
+
+/**
+ * Tracks task progress. Each task being passed a progress tracker needs
+ * to call the below-defined methods to notify the caller about the gradual
+ * progress of the task.
+ */
+export interface ProgressTracker {
+ updateProgress(incrementValue: number): void;
+ setMessage(msg: string): void;
+ reportError(msg: string): void;
+}
+
+/**
+ * Creates a tracker for a subtask given the parent tracker, the subtask's
+ * share of the total progress, and the subtask message. The parent task should
+ * pass a subtracker to its subtasks. The subtask reports its own progress,
+ * which becomes relative to the main task.
+ */
+export function getSubtaskTracker(parentTracker: ProgressTracker,
+ impactOnTotalProgress: number, subtaskMsg: string): ProgressTracker {
+ return {
+ setMessage: function(progressMsg) {
+ // The parent should show a concatenation of its message along with
+ // its subtask tracker message.
+ parentTracker.setMessage(subtaskMsg + " : " + progressMsg);
+ },
+ updateProgress: function(incrementValue) {
+ // Update the parent progress relative to the child progress.
+ // For example, if the sub-task progresses by 30%, and the impact on the
+ // total progress is 50%, then the task progresses by 30% * 50% = 15%.
+ parentTracker
+ .updateProgress(incrementValue * impactOnTotalProgress / 100);
+ },
+ reportError: function(errorMsg) {
+ // The parent should show a concatenation of its message along with
+ // its subtask error message.
+ parentTracker.reportError(subtaskMsg + " : " + errorMsg);
+ }
+ };
+}
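
A minimal sketch of the percentage arithmetic described above, using an illustrative console-backed parent tracker (not part of the library).

let total = 0;
const parent: ProgressTracker = {
  updateProgress: (inc) => { total += inc; },
  setMessage: (msg) => { console.log(msg); },
  reportError: (msg) => { console.error(msg); },
};
const sub = getSubtaskTracker(parent, 50, "Parsing");
sub.setMessage("reading file");  // parent logs "Parsing : reading file"
sub.updateProgress(30);          // parent advances by 30 * 50 / 100, so total === 15
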
+
+/**
+ * Runs an expensive task asynchronously and returns a promise of the result.
+ */
+export function runAsyncTask<T>(msg: string, incProgressValue: number,
+ task: () => T, tracker: ProgressTracker): Promise<T> {
+ return new Promise((resolve, reject) => {
+ // Update the progress message to show the currently running task.
+ tracker.setMessage(msg);
+ // Run the expensive task with a delay that gives enough time for the
+ // UI to update.
+ setTimeout(function() {
+ try {
+ var result = tf.time(msg, task);
+ // Update the progress value.
+ tracker.updateProgress(incProgressValue);
+ // Return the result to be used by other tasks.
+ resolve(result);
+ } catch (e) {
+ reject(e);
+ }
+ }, ASYNC_TASK_DELAY);
+ });
+}
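
A hedged sketch of how runAsyncTask is meant to be chained (build() in graph.ts later in this patch does exactly this); the task bodies, messages, and console-backed tracker are illustrative.

const tracker: ProgressTracker = {
  updateProgress: (inc) => console.log("progress +" + inc),
  setMessage: (msg) => console.log(msg),
  reportError: (msg) => console.error(msg),
};
runAsyncTask("Parsing pbtxt", 40, () => ["node1", "node2"], tracker)
  .then((names) => runAsyncTask("Building graph", 60,
      () => (<string[]>names).length, tracker))
  .then((count) => console.log("built", count, "nodes"));
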
+
+/**
+ * Returns a query selector with escaped special characters that are not
+ * allowed in a query selector.
+ */
+export function escapeQuerySelector(querySelector: string): string {
+ return querySelector.replace( /([:.\[\],/\\\(\)])/g, "\\$1" );
+}
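
A brief example; the node name and the id scheme are illustrative.

const escaped = escapeQuerySelector("conv1/weights:0");
// escaped contains a literal backslash before "/" and ":", so a selector such
// as "#" + escaped can be handed to document.querySelector without the slash
// or colon being parsed as selector syntax.
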
+
+/**
+ * TensorFlow node definition as defined in the graph proto file.
+ */
+export interface TFNode {
+ /** Name of the node */
+ name: string;
+ /** List of nodes that are inputs for this node. */
+ input: string[];
+ /** The name of the device where the computation will run. */
+ device: string;
+ /** The name of the operation associated with this node. */
+ op: string;
+ /** List of attributes that describe/modify the operation. */
+ attr: {key: string, value: Object}[];
+}
+
+/**
+ * TensorFlow stats file definition as defined in the stats proto file.
+ */
+export interface TFStats {
+ devStats: {device: string, nodeStats: TFNodeStats[]}[];
+}
+
+/**
+ * TensorFlow stats for a node as defined in the stats proto file.
+ */
+export interface TFNodeStats {
+ nodeName: string;
+ // The next 4 properties are currently stored as string in json
+ // and must be parsed.
+ allStartMicros: number;
+ opStartRelMicros: number;
+ opEndRelMicros: number;
+ allEndRelMicros: number;
+ memory: {
+ allocatorName: string;
+ totalBytes: number; // Stored as string in json and should be parsed.
+ peakBytes: number; // Stored as string in json and should be parsed.
+ }[];
+ /** Output sizes recorded for a single execution of a graph node */
+ output: TFNodeOutput[];
+ timelineLabel: string;
+ scheduledMicros: string;
+ threadId: string;
+}
+
+/**
+ * Description for the output tensor(s) of an operation in the graph.
+ */
+export interface TFNodeOutput {
+ slot: number; // Stored as string in json and should be parsed.
+ /** Was the tensor allocated by this Op or a previous computation */
+ allocationType: string;
+ tensorDescription: {
+ /** Data type of tensor elements */
+ dtype: string;
+ /** Shape of the tensor */
+ shape: {
+ /**
+ * Dimensions of the tensor, such as [{name: "input", size: 30},
+ * {name: "output", size: 40}] for a 30 x 40 2D tensor. The names
+ * are optional. The order of entries in "dim" matters: It indicates
+ * the layout of the values in the tensor in-memory representation.
+ */
+ dim: {
+ /** Size of the tensor in that dimension */
+ size: number, // Stored as string in json and should be parsed.
+ /** Optional name of the tensor dimension */
+ name?: string
+ }[];
+ };
+ /** Information about the size and allocator used for the data */
+ allocationDescription: {
+ // The next 2 properties are stored as string in json and
+ // should be parsed.
+ /** Total number of bytes requested */
+ requestedBytes: number;
+ /** Total number of bytes allocated, if known */
+ allocatedBytes?: number;
+ /** Name of the allocator used */
+ allocatorName: string;
+ };
+ };
+}
+} // close module tf
+
+/**
+ * Declaring dagre var used for dagre layout.
+ */
+declare var dagre: { layout(graph: graphlib.Graph<any, any>): void; };
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/graph.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/graph.ts
new file mode 100644
index 0000000000..64e537154b
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/graph.ts
@@ -0,0 +1,889 @@
+/// <reference path="../../../typings/tsd.d.ts" />
+/// <reference path="common.ts" />
+module tf.graph {
+
+/** Delimiter used in node names to denote namespaces. */
+export const NAMESPACE_DELIM = "/";
+const FULL_GRAPH_NAME = "fullGraph";
+export const ROOT_NAME = "__root__";
+
+// Separator between the source and the destination name of the edge.
+export const EDGE_KEY_DELIM = "--";
+
+export enum GraphType {FULL, EMBEDDED, META, SERIES, CORE, SHADOW, BRIDGE,
+ EDGE};
+export enum NodeType {META, OP, SERIES, BRIDGE, ELLIPSIS};
+
+/**
+ * A BaseEdge is the label object (in the graphlib sense) for an edge in the
+ * original, full graph produced after parsing. Subsequent graphs, like those
+ * which belong to Metanodes, should not use BaseEdge objects, but instead
+ * contain Metaedges (which in turn may contain any number of BaseEdges).
+ */
+export interface BaseEdge extends graphlib.EdgeObject {
+ isControlDependency: boolean;
+ isReferenceEdge: boolean;
+}
+
+/**
+ * A SlimGraph is inspired by graphlib.Graph, but has only the functionality
+ * that we need.
+ */
+export class SlimGraph {
+ nodes: { [nodeName: string]: OpNode };
+ edges: BaseEdge[];
+
+ constructor() {
+ this.nodes = {};
+ this.edges = [];
+ }
+}
+
+interface NormalizedInput {
+ name: string;
+ hasNumberPart: boolean;
+ isControlDependency: boolean;
+}
+
+interface BuildParams {
+ enableEmbedding: boolean;
+ inEmbeddingTypes: string[];
+ outEmbeddingTypes: string[];
+ refEdges: { [inputEdge: string]: boolean };
+}
+
+/**
+ * The most basic information about a node in the hierarchical graph.
+ */
+export interface Node {
+ /** The name of the node, used frequently to look up nodes by name. */
+ name: string;
+ /** Which type of node this is. */
+ type: NodeType;
+ /**
+ * Whether this node is a type that may contain other nodes. Those types
+ * should extend from GroupNode.
+ *
+ * For an OpNode, isGroupNode will be false, even though it may have
+ * embeddings. These embedding Nodes will have their parentNode set to the
+ * OpNode. However, embeddings are later rendered as annotations, not as
+ * children to be made visible on expansion (like a Metanode or SeriesNode).
+ */
+ isGroupNode: boolean;
+ /**
+ * The number of nodes this node represents. For OpNodes, this will be 1, and
+ * for GroupNodes it will be a count of the total number of descendants it
+ * contains.
+ */
+ cardinality: number;
+ /**
+ * The Node which is this Node's parent. This is of type Node and not
+ * GroupNode because of embeddings, which will have a parent OpNode.
+ */
+ parentNode: Node;
+ /** Runtime execution stats for this node, if available */
+ stats: NodeStats;
+}
+
+export interface OpNode extends Node {
+ op: string;
+ device: string;
+ attr: {key: string, value: Object}[];
+ inputs: NormalizedInput[];
+ inEmbeddings: OpNode[];
+ outEmbeddings: OpNode[];
+}
+
+export interface BridgeNode extends Node {
+ /**
+ * Whether this bridge node represents edges coming into its parent node.
+ */
+ inbound: boolean;
+}
+
+/**
+ * A node that is used when there are more than the maximum number of allowed
+ * annotations hanging off of a node. This node represents an ellipsis
+ * annotation, indicating a number of additional annotations.
+ */
+export interface EllipsisNode extends Node {
+ /**
+ * The number of nodes this ellipsis represents.
+ */
+ numMoreNodes: number;
+
+ /**
+ * Sets the number of nodes this ellipsis represents and changes the node
+ * name accordingly.
+ */
+ setNumMoreNodes(numNodes: number);
+}
+
+export interface GroupNode extends Node {
+ /**
+ * The metagraph contains nodes and metaedges between the immediate children
+ * of this group. The node label objects may be other GroupNodes (like
+ * SeriesNodes and Metanodes) or individual OpNodes. All edge label objects
+ * are Metaedges, each of which contains references to the original
+ * BaseEdge(s) from which it was created.
+ */
+ metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+
+ /**
+ * The bridgegraph contains only edges which link immediate children of this
+ * group with nodes outside of the metagraph. As in the metagraph, all edge
+ * label objects are Metaedges which contain references to the original
+ * BaseEdge(s) that contribute to it.
+ *
+ * For a Metaedge in the bridgegraph, its external endpoint will be the same
+ * as the metagraph edge from which it came. This is most easily explained
+ * by example.
+ *
+ * Consider an original graph that contains a BaseEdge A/B/C->Z/Y/X.
+ *
+ * +-------+ (BaseEdge) +-------+
+ * | A/B/C |>----------------->| Z/Y/X |
+ * +-------+ +-------+
+ *
+ * When we construct the Root's metagraph, it will contain nodes for A and Z,
+ * and a Metaedge A->Z. The A->Z Metaedge will contain the original BaseEdge
+ * A/B/C->Z/Y/X in its baseEdgeGraph. The Root's bridgegraph will always be
+ * empty.
+ *
+ * +---+ (Root.metagraph edge) +---+
+ * | A |>--------------------------->| Z |
+ * +---+ +---+
+ *
+ * Now consider the Metanode A. Its metagraph will contain a Metanode for A/B
+ * and no edges. A's bridgegraph will have one Metaedge from A/B->Z, which
+ * was derived from the Root's Metaedge A->Z. That Metaedge will contain the
+ * original BaseEdge in its baseEdgeGraph.
+ *
+ * +---------+
+ * | A |
+ * | +---+ | (A.bridgegraph edge) +---+
+ * | | B |>---------------------------->| Z |
+ * | +---+ | +---+
+ * +---------+
+ *
+ * Finally, consider the Metanode A/B. Its metagraph will contain a Metanode
+ * for A/B/C and again no edges. A/B's bridgegraph will have one Metaedge
+ * from A/B/C->Z, which was derived from A's bridgegraph Metaedge A/B->Z.
+ * As before, the A/B/C->Z Metaedge will contain the original BaseEdge in its
+ * baseEdgeGraph.
+ *
+ * +---------------+
+ * | A |
+ * | +---------+ |
+ * | | B | |
+ * | | +---+ | | (A/B.bridgegraph edge) +---+
+ * | | | C |>----------------------------------->| Z |
+ * | | +---+ | | +---+
+ * | +---------+ |
+ * +---------------+
+ *
+ * Likewise, under the Metanode Z and Z/Y, to compute the bridgegraph, we'll
+ * end up with Metaedges A->Z/Y and A->Z/Y/X respectively. So the original
+ * BaseEdge A/B/C->Z/Y/X becomes four different Metaedges in four different
+ * bridgegraphs:
+ *
+ * + A/B->Z in GroupNode A's bridgegraph,
+ * + A/B/C->Z in GroupNode A/B's bridgegraph,
+ * + A->Z/Y in GroupNode Z's bridgegraph, and
+ * + A->Z/Y/X in GroupNode Z/Y's bridgegraph.
+ *
+ * Considering any BaseEdge then, if N is the number of path segments in the
+ * source and M is the number of path segments in the destination, then the
+ * total number of bridgegraph edges you could create would be (N-1)(M-1).
+ *
+ * For this reason, it is computationally expensive to generate all the
+ * bridgegraphs for all the Metanodes, and instead they should be computed
+ * on demand as needed.
+ */
+ bridgegraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+
+ /**
+ * Stores how many times each device name appears in its children
+ * op nodes. Used to color group nodes by devices.
+ */
+ deviceHistogram: {[device: string]: number};
+
+ /**
+ * Flag indicating whether this GroupNode's metagraph contains any edges that
+ * are not control edges. Used to quickly determine how to draw a collapsed
+ * series (vertically or horizontally).
+ */
+ hasNonControlEdges: boolean;
+}
+
+export interface Metanode extends GroupNode {
+ depth: number;
+ templateId: string;
+ opHistogram: {[op: string]: number};
+ getFirstChild(): GroupNode|OpNode;
+ getRootOp(): OpNode;
+ /** Returns the names of all leaves inside a metanode. */
+ leaves(): string[];
+}
+
+export interface SeriesNode extends GroupNode {
+ hasLoop: boolean;
+ prefix: string;
+ suffix: string;
+ clusterId: number;
+ ids: number[];
+ parent: string;
+}
+
+export class EllipsisNodeImpl implements EllipsisNode {
+ name: string;
+ numMoreNodes: number;
+ stats: NodeStats;
+ type: NodeType;
+ isGroupNode: boolean;
+ cardinality: number;
+ parentNode: Node;
+
+ /**
+ * Constructs a new ellipsis annotation node.
+ *
+ * @param numNodes The number of additional annotations this node represents.
+ */
+ constructor(numNodes: number) {
+ this.type = NodeType.ELLIPSIS;
+ this.isGroupNode = false;
+ this.cardinality = 1;
+ this.parentNode = null;
+ this.stats = null;
+ this.setNumMoreNodes(numNodes);
+ }
+
+ setNumMoreNodes(numNodes: number) {
+ this.numMoreNodes = numNodes;
+ this.name = "... " + numNodes + " more";
+ }
+};
+
+/**
+ * A label object for nodes in the full graph and leaf nodes in the render
+ * graph.
+ */
+class OpNodeImpl implements OpNode {
+ name: string;
+ op: string;
+ device: string;
+ stats: NodeStats;
+ attr: {key: string, value: Object}[];
+ inputs: NormalizedInput[];
+ type: NodeType;
+ isGroupNode: boolean;
+ cardinality: number;
+ inEmbeddings: OpNode[];
+ outEmbeddings: OpNode[];
+ parentNode: Node;
+
+ /**
+ * Constructs a new Op node.
+ *
+ * @param rawNode The raw node.
+ * @param normalizedInputs An array of normalized
+ * inputs that denote the incoming edges to the current node. Each input
+ * contains the normalized name of the source node, whether it has a number
+ * part and whether it is a control dependency.
+ */
+ constructor(rawNode: tf.TFNode, normalizedInputs: NormalizedInput[]) {
+ this.op = rawNode.op;
+ this.name = rawNode.name;
+ this.device = rawNode.device;
+ this.attr = rawNode.attr;
+ this.inputs = normalizedInputs;
+ // additional properties
+ this.type = NodeType.OP;
+ this.isGroupNode = false;
+ this.cardinality = 1;
+ this.inEmbeddings = [];
+ this.outEmbeddings = [];
+ this.parentNode = null;
+ }
+};
+
+export function createMetanode(name: string, opt = {}): Metanode {
+ return new MetanodeImpl(name, opt);
+}
+
+/**
+ * Joins the information from the stats file (memory, compute time) with the graph
+ * information.
+ */
+export function joinStatsInfoWithGraph(graph: SlimGraph,
+ statsJson: TFStats): void {
+ _.each(statsJson.devStats, stats => {
+ _.each(stats.nodeStats, nodeStats => {
+ // Lookup the node in the graph by its original name, e.g. A. If not
+ // found, lookup by the rewritten name A/(A) in case the name is both
+ // a namespace and a node name.
+ let nodeName = nodeStats.nodeName in graph.nodes ?
+ nodeStats.nodeName :
+ nodeStats.nodeName + NAMESPACE_DELIM + "(" + nodeStats.nodeName + ")";
+ if (nodeName in graph.nodes) {
+ // Compute the total bytes used.
+ let totalBytes = 0;
+ if (nodeStats.memory) {
+ _.each(nodeStats.memory, alloc => {
+ if (alloc.totalBytes) {
+ totalBytes += Number(alloc.totalBytes);
+ }
+ });
+ }
+ let outputSize: number[][] = null;
+ if (nodeStats.output) {
+ outputSize = _.map(nodeStats.output, output => {
+ return _.map(output.tensorDescription.shape.dim,
+ dim => Number(dim.size));
+ });
+ }
+ graph.nodes[nodeName].stats = new NodeStats(totalBytes,
+ Number(nodeStats.allEndRelMicros), outputSize);
+ }
+ });
+ });
+}
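
For reference, a hedged sketch of a minimal statsJson object this function consumes, following the TFStats/TFNodeStats interfaces declared in common.ts; every concrete value below is made up.

const statsJson: TFStats = {
  devStats: [{
    device: "/cpu:0",
    nodeStats: [{
      nodeName: "MatMul",
      allStartMicros: 0,
      opStartRelMicros: 0,
      opEndRelMicros: 90,
      allEndRelMicros: 100,
      memory: [{allocatorName: "cpu", totalBytes: 4096, peakBytes: 4096}],
      output: [{
        slot: 0,
        allocationType: "",
        tensorDescription: {
          dtype: "DT_FLOAT",
          shape: {dim: [{size: 32}, {size: 32}]},
          allocationDescription: {requestedBytes: 4096, allocatorName: "cpu"},
        },
      }],
      timelineLabel: "",
      scheduledMicros: "0",
      threadId: "0",
    }],
  }],
};
// After joinStatsInfoWithGraph(graph, statsJson), the node "MatMul" (if present
// in graph.nodes) gets stats with totalBytes = 4096, totalMicros = 100 and
// outputSize = [[32, 32]].
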
+
+/**
+ * Execution stats for the node.
+ */
+class NodeStats {
+ constructor(totalBytes: number, totalMicros: number, outputSize: number[][]) {
+ this.totalBytes = totalBytes;
+ this.totalMicros = totalMicros;
+ this.outputSize = outputSize;
+ }
+
+ /**
+ * Total number of bytes used for the node. Sum of all children
+ * if it is a Group node.
+ */
+ totalBytes: number;
+ /**
+ * Total amount of compute time in microseconds used for the node.
+ * Sum of all children if it is a Group node.
+ */
+ totalMicros: number;
+ /**
+ * The shape of each output tensors, if there are any.
+ * Empty if it is a Group node.
+ */
+ outputSize: number[][];
+
+ /**
+ * Combines the specified stats with the current stats.
+ * Modifies the current object. This method is used to
+ * compute aggregate stats for group nodes.
+ */
+ combine(stats: NodeStats): void {
+ if (stats.totalBytes != null) {
+ this.totalBytes += stats.totalBytes;
+ }
+ if (stats.totalMicros != null) {
+ this.totalMicros += stats.totalMicros;
+ }
+ }
+}
+
+class MetanodeImpl implements Metanode {
+ name: string;
+ stats: NodeStats;
+ type: NodeType;
+ depth: number;
+ isGroupNode: boolean;
+ cardinality: number;
+ metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+ bridgegraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+ templateId: string;
+ opHistogram: {[op: string]: number};
+ deviceHistogram: {[op: string]: number};
+ parentNode: Node;
+ hasNonControlEdges: boolean;
+
+ /** A label object for meta-nodes in the graph hierarchy */
+ constructor(name: string, opt = {}) {
+ this.name = name;
+ this.type = NodeType.META;
+ /** number of levels under this group */
+ this.depth = 1;
+ this.isGroupNode = true;
+ /** # of leaf nodes (including embedded ones) */
+ this.cardinality = 0;
+ /** graph contains metanodes, nodes, edges
+ * and metaedges for main items within this metanode
+ */
+ this.metagraph =
+ createGraph<GroupNode|OpNode, Metaedge>(name, GraphType.META, opt);
+ /** bridgegraph must be constructed lazily; see hierarchy.getBridgegraph() */
+ this.bridgegraph = null;
+ /**
+ * A dictionary that counts the op types of nodes in this metanode
+ * (op type => count).
+ */
+ this.opHistogram = {};
+ this.deviceHistogram = {};
+ /** unique id for a metanode of similar subgraph */
+ this.templateId = null;
+ /** Metanode which contains this node, if any */
+ this.parentNode = null;
+ this.stats = new NodeStats(0, 0, null);
+ this.hasNonControlEdges = false;
+ }
+
+ getFirstChild(): GroupNode|OpNode {
+ return this.metagraph.node(this.metagraph.nodes()[0]);
+ }
+
+ /**
+ * Returns the op node associated with the metanode.
+ * For example, if the metanode is "sgd", the associated
+ * op node is sgd/(sgd).
+ */
+ getRootOp(): OpNode {
+ let nameSplit = this.name.split("/");
+ let rootOpName = this.name + "/(" + nameSplit[nameSplit.length - 1] + ")";
+ return <OpNode>this.metagraph.node(rootOpName);
+ }
+
+ /**
+ * Return an array of the names of all the leaves (non-GroupNodes) inside
+ * this metanode. This performs a breadth-first search of the tree, so
+ * immediate child leaves will appear earlier in the output array than
+ * descendant leaves.
+ */
+ leaves(): string[] {
+ let leaves = [];
+ let queue = [<Node> this];
+ let metagraph; // Defined here due to a limitation of ES6->5 compilation.
+ while (queue.length) {
+ let node = queue.shift();
+ if (node.isGroupNode) {
+ metagraph = (<GroupNode> node).metagraph;
+ _.each(metagraph.nodes(), name => queue.push(metagraph.node(name)));
+ } else {
+ leaves.push(node.name);
+ }
+ }
+ return leaves;
+ }
+};
+
+export interface Metaedge extends graphlib.EdgeObject {
+
+ /**
+ * Stores the original BaseEdges represented by this Metaedge.
+ */
+ baseEdgeList: BaseEdge[];
+
+ /**
+ * Whether this edge represents a relationship that is inbound (or outbound)
+ * to the object which contains this information. For example, in a Metanode's
+ * bridgegraph, each edge connects an immediate child to something outside
+ * the Metanode. If the destination of the edge is inside the Metanode, then
+ * its inbound property should be true. If the destination is outside the
+ * Metanode, then its inbound property should be false.
+ *
+ * The property is optional because not all edges can be described as
+ * inbound/outbound. For example, in a Metanode's metagraph, all of the edges
+ * connect immediate children of the Metanode. None should have an inbound
+ * property, or they should be null/undefined.
+ */
+ inbound?: boolean;
+
+ /**
+ * Number of regular edges (not control dependency edges).
+ */
+ numRegularEdges: number;
+
+ /**
+ * Number of control dependency edges.
+ */
+ numControlEdges: number;
+
+ /**
+ * Number of reference edges, which is an edge to an operation
+ * that takes a reference to its input and changes its value.
+ */
+ numRefEdges: number;
+
+ addBaseEdge(edge: BaseEdge): void;
+}
+
+export function createMetaedge(v: string, w: string): Metaedge {
+ return new MetaedgeImpl(v, w);
+}
+
+/**
+ * A label object for edges between metanodes of subgraphs in the render graph.
+ */
+class MetaedgeImpl implements Metaedge {
+ v: string;
+ w: string;
+ baseEdgeList: BaseEdge[];
+ inbound: boolean;
+ numRegularEdges: number;
+ numControlEdges: number;
+ numRefEdges: number;
+
+ constructor(v: string, w: string) {
+ this.v = v;
+ this.w = w;
+ this.baseEdgeList = [];
+ this.inbound = null;
+ this.numRegularEdges = 0;
+ this.numControlEdges = 0;
+ this.numRefEdges = 0;
+ }
+
+ addBaseEdge(edge: BaseEdge): void {
+ this.baseEdgeList.push(edge);
+ if (edge.isControlDependency) {
+ this.numControlEdges += 1;
+ } else {
+ this.numRegularEdges += 1;
+ }
+ if (edge.isReferenceEdge) {
+ this.numRefEdges += 1;
+ }
+ }
+}
+
+export function createSeriesNode(prefix: string, suffix: string,
+ parent: string, clusterId: number, name: string): SeriesNode {
+ return new SeriesNodeImpl(prefix, suffix, parent, clusterId, name);
+}
+
+export function getSeriesNodeName(prefix: string, suffix: string,
+ parent: string, startId?: number, endId?: number): string {
+ let numRepresentation =
+ (typeof startId !== "undefined" && typeof endId !== "undefined") ?
+ "[" + startId + "-" + endId + "]" : "#";
+ let pattern = prefix + numRepresentation + suffix;
+ return (parent ? parent + "/" : "") + pattern;
+}
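
Two illustrative calls, assuming a series of nodes layer1/conv0 through layer1/conv4:

getSeriesNodeName("conv", "", "layer1", 0, 4);  // === "layer1/conv[0-4]"
getSeriesNodeName("conv", "", "layer1");        // === "layer1/conv#"
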
+
+class SeriesNodeImpl implements SeriesNode {
+ name: string;
+ type: NodeType;
+ stats: NodeStats;
+ hasLoop: boolean;
+ prefix: string;
+ suffix: string;
+ clusterId: number;
+ ids: number[];
+ parent: string;
+ isGroupNode: boolean;
+ cardinality: number;
+ metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+ bridgegraph: graphlib.Graph<GroupNode|OpNode, Metaedge>;
+ parentNode: Node;
+ deviceHistogram: {[op: string]: number};
+ hasNonControlEdges: boolean;
+
+ constructor(prefix: string, suffix: string, parent: string,
+ clusterId: number, name: string) {
+ this.name = name || getSeriesNodeName(prefix, suffix, parent);
+ this.type = NodeType.SERIES;
+ this.hasLoop = false;
+ this.prefix = prefix;
+ this.suffix = suffix;
+ this.clusterId = clusterId;
+ this.ids = [];
+ this.parent = parent;
+ this.isGroupNode = true;
+ this.cardinality = 0;
+ this.metagraph = createGraph<Metanode, Metaedge>(name, GraphType.SERIES);
+ // bridgegraph must be constructed lazily; see hierarchy.getBridgegraph()
+ this.bridgegraph = null;
+ this.parentNode = null;
+ this.deviceHistogram = {};
+ this.hasNonControlEdges = false;
+ this.stats = new NodeStats(0, 0, null);
+ }
+}
+
+/**
+ * Normalizes the inputs and extracts associated metadata:
+ * 1) Inputs can contain a colon followed by a number at the end
+ * (e.g. inputName:1) and we remove this from the input name, and take note
+ * that the input was numbered.
+ * 2) Control dependency inputs contain caret at the beginning and we
+ * remove this and annotate the edge as a control dependency.
+ * @param inputs Array of unnormalized names of input nodes.
+ */
+function normalizeInputs(inputs: string[]): NormalizedInput[] {
+ return _.reduce(inputs, function(normalizedInputs, inputName) {
+ let start = inputName[0] === "^";
+ let colon = inputName.lastIndexOf(":");
+ let end = colon !== -1 &&
+ inputName.length - colon > 1 &&
+ !(/\D/).test(inputName.substring(colon + 1)) ?
+ colon : inputName.length;
+ let name = inputName.substring(start ? 1 : 0, end);
+ if (normalizedInputs.length === 0 ||
+ name !== normalizedInputs[normalizedInputs.length - 1].name) {
+ normalizedInputs.push({
+ name: name,
+ hasNumberPart: end !== inputName.length,
+ isControlDependency: start
+ });
+ }
+ return normalizedInputs;
+ }, []);
+}
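
A brief hedged example of what this normalization yields; the function is module-private, so this only illustrates its behaviour with made-up input names.

normalizeInputs(["^global_step", "weights:1", "bias"]);
// === [
//   {name: "global_step", hasNumberPart: false, isControlDependency: true},
//   {name: "weights",     hasNumberPart: true,  isControlDependency: false},
//   {name: "bias",        hasNumberPart: false, isControlDependency: false},
// ]
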
+
+export function build(rawNodes: tf.TFNode[], params: BuildParams,
+ tracker: ProgressTracker): Promise<SlimGraph|void> {
+ /**
+ * A dictionary that maps each in-embedding node name to its host node label
+ * object.
+ */
+ let inEmbedding: {[nodeName: string]: OpNode} = {};
+ /**
+ * A dictionary that maps each node name to an array of the node's
+ * out-embedding node label objects.
+ */
+ let outEmbeddings: {[inputName: string]: OpNode[]} = {};
+ let isInEmbeddedPred = getEmbedPredicate(params.inEmbeddingTypes);
+ let isOutEmbeddedPred = getEmbedPredicate(params.outEmbeddingTypes);
+ let embeddingNodeNames: string[] = [];
+ /**
+ * A list of all the non-embedding node names which appear in the processed
+ * list of raw nodes. Here we pre-allocate enough room for all the rawNodes,
+ * even though there will be some number of embeddings. The excess array length
+ * is spliced off later.
+ *
+ * Experimentation shows that around 30% of the array will go unused, and
+ * even for very large networks that amounts to less than 10k spaces.
+ */
+ let nodeNames = new Array<string>(rawNodes.length);
+
+ return runAsyncTask("Normalizing names", 30, () => {
+ let opNodes = new Array<OpNode>(rawNodes.length);
+ let index = 0;
+ _.each(rawNodes, rawNode => {
+ let normalizedInputs = normalizeInputs(rawNode.input);
+ let opNode = new OpNodeImpl(rawNode, normalizedInputs);
+ if (isInEmbeddedPred(opNode)) {
+ embeddingNodeNames.push(opNode.name);
+ inEmbedding[opNode.name] = opNode;
+ return;
+ }
+
+ if (isOutEmbeddedPred(opNode)) {
+ embeddingNodeNames.push(opNode.name);
+ _.each(opNode.inputs, input => {
+ let inputName = input.name;
+ outEmbeddings[inputName] = outEmbeddings[inputName] || [];
+ outEmbeddings[inputName].push(opNode);
+ });
+ return;
+ }
+ // The node is not an embedding, so add it to the names and nodes lists.
+ opNodes[index] = opNode;
+ nodeNames[index] = opNode.name;
+ index++;
+ });
+ opNodes.splice(index);
+ nodeNames.splice(index);
+ return opNodes;
+ }, tracker)
+ .then((opNodes) => {
+ // Create the graph data structure from the graphlib library.
+ return runAsyncTask("Building the data structure", 70, () => {
+ let normalizedNameDict = mapStrictHierarchy(nodeNames,
+ embeddingNodeNames);
+ let graph = new SlimGraph;
+
+ // Add the nodes to the graph.
+ _.each(opNodes, opNode => {
+ let normalizedName = normalizedNameDict[opNode.name] || opNode.name;
+ graph.nodes[normalizedName] = opNode;
+ // Check if the node has out-embeddings. If yes, add them to the
+ // node.
+ if (opNode.name in outEmbeddings) {
+ opNode.outEmbeddings = outEmbeddings[opNode.name];
+ // Normalize the names of the out-embeddings.
+ _.each(opNode.outEmbeddings, node => {
+ node.name = normalizedNameDict[node.name] || node.name;
+ });
+ }
+ // Update the name of the node.
+ opNode.name = normalizedName;
+ });
+
+ // Visit each node's inputs to add the edges to the graph. If the input
+ // is an in-embedding, then add it to the node's in-embeddings instead.
+ _.each(opNodes, opNode => {
+ _.each(opNode.inputs, (input, i) => {
+ let inputName = input.name;
+ if (inputName in inEmbedding) {
+ opNode.inEmbeddings.push(inEmbedding[inputName]);
+ } else {
+ graph.edges.push({
+ v: normalizedNameDict[inputName] || inputName,
+ w: opNode.name,
+ isControlDependency: input.isControlDependency,
+ // Check if this op type and input number corresponds to a
+ // reference edge using the refEdges dictionary in the params.
+ isReferenceEdge: (params.refEdges[opNode.op + " " + i] === true)
+ });
+ }
+ });
+ });
+
+ // Normalize the names of in-embeddings.
+ _.each(inEmbedding, (node, name) => {
+ node.name = normalizedNameDict[node.name] || node.name;
+ });
+
+ return graph;
+ }, tracker);
+ })
+ .catch(function(reason) {
+ throw new Error("Failure creating graph");
+ });
+};
+
+/**
+ * Create a new graphlib.Graph() instance with default parameters
+ */
+export function createGraph<N, E>(name: string, type, opt = {}):
+ graphlib.Graph<N, E> {
+ let graph = new graphlib.Graph<N, E>(opt);
+ graph.setGraph({
+ name: name,
+ rankdir: "BT", // BT,TB,LR,RL
+ type: type
+ });
+ return graph;
+};
+
+/**
+ * Create a predicate for checking whether a node should be embedded based on
+ * the specified types.
+ */
+function getEmbedPredicate(types: string[]) {
+ return function(node) {
+ // check types
+ for (let i = 0; i < types.length; i++) {
+ let regExp = new RegExp(types[i]);
+ if (node.op.match(regExp)) { return true; }
+ }
+ return false;
+ };
+};
+
+/**
+ * Returns a strict node name (name => name/(name)) to avoid conflicts
+ * where the node name is also a namespace.
+ */
+function getStrictName(name: string): string {
+ let parts = name.split(NAMESPACE_DELIM);
+ return name + NAMESPACE_DELIM + "(" + parts[parts.length - 1] + ")";
+}
+
+/**
+ * For each op node (embedding or non-embedding), rename it if there is a
+ * non-embedding node under its namespace. For example, assume node name "A".
+ * If there is a non-embedding node under its namespace (e.g. "A/B"), "A" will
+ * be renamed to "A/(A)". Then the namespace "A" will contain 2 nodes: "(A)"
+ * and "B". If all the nodes under "A" are embedding nodes (e.g. constant and
+ * summary), keep "A" as an Op node and don't create a namespace.
+ *
+ * @param nodeNames An array of regular (non-embedding) node names.
+ * @param embeddingNodeNames An array of embedding node names.
+ * @return Dictionary object mapping names that need to be renamed to
+ * new names.
+ */
+function mapStrictHierarchy(nodeNames: string[],
+ embeddingNodeNames: string[]): {[oldName: string]: string} {
+ /** Dictionary that maps the old name to the new name. */
+ let newNameDictionary: {[oldName: string]: string} = {};
+ /** Set used to store all namespaces. */
+ let namespaceSet: {[namespace: string]: boolean} = {};
+ // sort the nodes to make prefix check faster
+ nodeNames.sort();
+ // look for nodes with a prefix a,a/b -> a/(a),a/b
+ for (let i = 0; i < nodeNames.length - 1; ++i) {
+ let a = nodeNames[i];
+ // Get all the parent namespaces of the current node
+ // and add them in the namespace set.
+ _.each(getHierarchicalPath(a).slice(0, -1), ns => {
+ namespaceSet[ns] = true;
+ });
+ let b = nodeNames[i + 1];
+ if (_.startsWith(b, a + NAMESPACE_DELIM)) {
+ newNameDictionary[a] = getStrictName(a);
+ }
+ }
+ // Go through all the embedding node names and rename them in case they
+ // collide with namespaces.
+ _.each(embeddingNodeNames, embeddingName => {
+ if (embeddingName in namespaceSet) {
+ // Rename to follow strict hierarchy.
+ newNameDictionary[embeddingName] = getStrictName(embeddingName);
+ }
+ });
+ return newNameDictionary;
+};
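
An illustrative call, assuming regular nodes "train" and "train/step" plus an embedding node "train/summary":

mapStrictHierarchy(["train", "train/step"], ["train/summary"]);
// === {"train": "train/(train)"}
// "train" is both a node and the namespace containing "train/step", so it is
// renamed; "train/summary" is not itself a namespace, so it keeps its name.
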
+
+/**
+ * Returns a list of the degrees of each node in the graph.
+ */
+function degreeSequence(graph: graphlib.Graph<any, any>): number[] {
+ let degrees = graph.nodes().map(function(name) {
+ return graph.neighbors(name).length;
+ });
+ degrees.sort();
+ return degrees;
+};
+
+/**
+ * Returns whether the degree sequences of the two graphs are the same.
+ */
+export function hasSimilarDegreeSequence(graph1: graphlib.Graph<any, any>,
+ graph2: graphlib.Graph<any, any>): boolean {
+ let dg1 = degreeSequence(graph1);
+ let dg2 = degreeSequence(graph2);
+
+ for (let i = 0; i < dg1.length; i++) {
+ if (dg1[i] !== dg2[i]) {
+ return false;
+ }
+ }
+ return true;
+};
+
+/**
+ * Returns the hierarchical path of the current node, based on the node's name.
+ * For example, if the name is 'a/b/c', the returned path is ['a', 'a/b', 'a/b/c'].
+ */
+export function getHierarchicalPath(name: string,
+ seriesNames?: { [name: string]: string }): string[] {
+ let path: string[] = [];
+ let i = name.indexOf(NAMESPACE_DELIM);
+ // Push all parent portions of the path.
+ while (i >= 0) {
+ path.push(name.substring(0, i));
+ i = name.indexOf(NAMESPACE_DELIM, i + 1);
+ }
+ // If the node's path is under a series, then add the series node name to the
+ // hierarchical path as the parent of the leaf.
+ if (seriesNames) {
+ let seriesName = seriesNames[name];
+ if (seriesName) {
+ path.push(seriesName);
+ }
+ }
+ // Push the leaf of the path.
+ path.push(name);
+ return path;
+};
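
An illustrative call showing the optional seriesNames argument; the node and series names are made up.

getHierarchicalPath("tower/conv_3/weights");
// === ["tower", "tower/conv_3", "tower/conv_3/weights"]
getHierarchicalPath("tower/conv_3/weights",
    {"tower/conv_3/weights": "tower/conv_3/weight_series"});
// === ["tower", "tower/conv_3", "tower/conv_3/weight_series",
//      "tower/conv_3/weights"]
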
+
+} // close module tf.graph
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/hierarchy.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/hierarchy.ts
new file mode 100644
index 0000000000..6c9333de4c
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/hierarchy.ts
@@ -0,0 +1,715 @@
+/// <reference path="graph.ts" />
+/// <reference path="template.ts" />
+
+/**
+ * Package for the Graph Hierarchy for TensorFlow graph.
+ */
+module tf.graph.hierarchy {
+
+const LOG_PREFIX_MSG = "Graph hierarchy: ";
+
+/**
+ * Class used as output for getPredecessors and getSuccessors methods
+ */
+interface Edges {
+ control: string[];
+ regular: string[];
+}
+
+export interface Hierarchy {
+ root: Metanode;
+ templates: {[templateId: string]: string[]};
+ /** List of all device names */
+ devices: string[];
+ getNodeMap(): {[nodeName: string]: GroupNode|OpNode};
+ node(name: string): GroupNode|OpNode;
+ setNode(name: string, node: GroupNode|OpNode): void;
+ getBridgegraph(nodeName: string): graphlib.Graph<GroupNode|OpNode, Metaedge>;
+ getPredecessors(nodeName: string): Edges;
+ getSuccessors(nodeName: string): Edges;
+ getTopologicalOrdering(nodeName: string): { [childName: string]: number };
+}
+
+/**
+ * Class for the Graph Hierarchy for TensorFlow graph.
+ */
+class HierarchyImpl implements Hierarchy {
+ root: Metanode;
+ templates: {[templateId: string]: string[]};
+ private index: {[nodeName: string]: GroupNode|OpNode};
+ devices: string[];
+ orderings: { [nodeName: string]: { [childName: string]: number } };
+
+ constructor() {
+ this.root = createMetanode(ROOT_NAME, {compound: true});
+ this.templates = null;
+ this.devices = null;
+ /**
+ * @type {Object} Dictionary object that maps node name to the node
+ * (could be op-node, metanode, or series-node)
+ */
+ this.index = {};
+ this.index[ROOT_NAME] = this.root;
+ this.orderings = {};
+ }
+
+ getNodeMap(): {[nodeName: string]: GroupNode|OpNode} {
+ return this.index;
+ }
+
+ node(name: string): GroupNode|OpNode {
+ return this.index[name];
+ }
+
+ setNode(name: string, node: GroupNode|OpNode): void {
+ this.index[name] = node;
+ }
+
+ /**
+ * Given the name of a node in this hierarchy, get its bridgegraph, creating
+ * it on the fly if necessary. If the node is not a GroupNode, then this
+ * method returns null. If the provided name does not map to a node in the
+ * hierarchy, an error will be thrown.
+ */
+ getBridgegraph(nodeName: string): graphlib.Graph<GroupNode|OpNode, Metaedge> {
+ let node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node in hierarchy: " + nodeName);
+ }
+ if (!("metagraph" in node)) {
+ return null;
+ }
+ let groupNode = <GroupNode> node;
+ if (groupNode.bridgegraph) {
+ return groupNode.bridgegraph;
+ }
+ let bridgegraph = groupNode.bridgegraph =
+ createGraph<GroupNode|OpNode, Metaedge>(
+ "BRIDGEGRAPH", GraphType.BRIDGE);
+ if (!node.parentNode || !("metagraph" in node.parentNode)) {
+ return bridgegraph;
+ }
+
+ let parentNode = <GroupNode>node.parentNode;
+ let parentMetagraph = parentNode.metagraph;
+ let parentBridgegraph = this.getBridgegraph(parentNode.name);
+
+ // For each of the parent node's two Metaedge-containing graphs, process
+ // each Metaedge involving this node.
+ _.each([parentMetagraph, parentBridgegraph], parentGraph => {
+ _(parentGraph.edges())
+ .filter(e => e.v === nodeName || e.w === nodeName)
+ .each(parentEdgeObj => {
+
+ let inbound = parentEdgeObj.w === nodeName;
+ let parentMetaedge = parentGraph.edge(parentEdgeObj);
+
+ // The parent's Metaedge represents some number of underlying
+ // BaseEdges from the original full graph. For each of those, we need
+ // to determine which immediate child is involved and make sure
+ // there's a Metaedge in the bridgegraph that covers it.
+ _.each(parentMetaedge.baseEdgeList, baseEdge => {
+
+ // Based on the direction, figure out which is the descendant node
+ // and which is the "other" node (sibling of parent or ancestor).
+ let [descendantName, otherName] =
+ inbound ?
+ [baseEdge.w, parentEdgeObj.v] :
+ [baseEdge.v, parentEdgeObj.w];
+
+ // Determine the immediate child containing this descendant node.
+ let childName = this.getChildName(nodeName, descendantName);
+
+ // Look for an existing Metaedge in the bridgegraph (or create a
+ // new one) that covers the relationship between child and other.
+ let bridgeEdgeObj = <graphlib.EdgeObject> {
+ v: inbound ? otherName : childName,
+ w: inbound ? childName : otherName,
+ };
+ let bridgeMetaedge = bridgegraph.edge(bridgeEdgeObj);
+ if (!bridgeMetaedge) {
+ bridgeMetaedge = createMetaedge(bridgeEdgeObj.v, bridgeEdgeObj.w);
+ bridgeMetaedge.inbound = inbound;
+ bridgegraph.setEdge(bridgeEdgeObj.v, bridgeEdgeObj.w,
+ bridgeMetaedge);
+ }
+
+ // Copy the BaseEdge from the parent's Metaedge into this
+ // bridgegraph Metaedge.
+ bridgeMetaedge.addBaseEdge(baseEdge);
+ });
+ })
+ .value(); // force lodash chain execution.
+ });
+
+ return bridgegraph;
+ }
+
+ /**
+ * Utility function for determining the name of the immediate child under a
+ * node for a given descendant path. If the descendant corresponds to no
+ * immediate child, an error is thrown.
+ */
+ getChildName(nodeName: string, descendantName: string): string {
+ // Walk up the hierarchy from the descendant to find the child.
+ let currentNode: Node = this.index[descendantName];
+ while (currentNode) {
+ if (currentNode.parentNode && currentNode.parentNode.name === nodeName) {
+ return currentNode.name;
+ }
+ currentNode = currentNode.parentNode;
+ }
+ throw Error("Could not find immediate child for descendant: " +
+ descendantName);
+ };
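+
+ // For example (hypothetical names), getChildName("A", "A/B/C") returns
+ // "A/B", while getChildName("X", "A/B/C") throws because "A/B/C" is not a
+ // descendant of "X".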
+
+ /**
+ * Given the name of a node, return the names of its predecessors.
+ * For an OpNode, this will contain the targets from the underlying BaseEdges.
+ * For a GroupNode, this will contain the targets truncated to siblings of
+ * the shared ancestor.
+ *
+ * For example, consider an original non-control BaseEdge A/B/C->Z/Y/X. Their
+ * shared ancestor is the ROOT node. A and Z are the highest siblings. Here
+ * are the results of calling getPredecessors():
+ *
+ * - getPredecessors("Z/Y/X") === {regular: ["A/B/C"], control: []};
+ * - getPredecessors("Z/Y") === {regular: ["A"], control: []};
+ * - getPredecessors("Z") === {regular: ["A"], control: []};
+ *
+ * The reason getPredecessors("Z/Y") returns ["A"] (and not ["A/B"] as you
+ * might intuitively expect) is because it's not clear how far down the
+ * other end of the hierarchy to traverse in the general case.
+ *
+ * Continuing this example, say there was another BaseEdge A/K->Z/Y/W. When
+ * we look at Z/Y's predecessors, the best we can say is ["A"] without getting
+ * into the details of which of Z/Y's descendant nodes have predecessors among
+ * A's descendants.
+ *
+ * On the other hand, for an OpNode it's clear what the final predecessors
+ * ought to be. There is no ambiguity.
+ */
+ getPredecessors(nodeName: string): Edges {
+ let node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+
+ let predecessors = this.getOneWayEdges(node, true);
+
+ // Add embedded predecessors, such as constants.
+ if (!node.isGroupNode) {
+ _.each((<OpNode>node).inEmbeddings, embeddedNode => {
+ predecessors.regular.push(embeddedNode.name);
+ });
+ }
+ return predecessors;
+ }
+
+ /**
+ * Given the name of a node, return an array of the names of its successors.
+ * For an OpNode, this will contain the targets from the underlying BaseEdges.
+ * For a GroupNode, this will contain the targets truncated to siblings of
+ * the shared ancestor.
+ *
+ * This is the inverse of getPredecessors(). See that method's documentation
+ * for an in-depth example.
+ */
+ getSuccessors(nodeName: string): Edges {
+ let node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+
+ let successors = this.getOneWayEdges(node, false);
+
+ // Add embedded successors, such as summaries.
+ if (!node.isGroupNode) {
+ _.each((<OpNode>node).outEmbeddings, embeddedNode => {
+ successors.regular.push(embeddedNode.name);
+ });
+ }
+ return successors;
+ }
+
+ /** Helper method for getPredecessors and getSuccessors. */
+ getOneWayEdges(node: GroupNode|OpNode, inEdges: boolean) {
+ let edges = { control: [], regular: [] };
+ // A node with no parent cannot have any edges.
+ if (!node.parentNode) {
+ return edges;
+ }
+ if (node.parentNode.isGroupNode) {
+ let parentNode = <GroupNode>node.parentNode;
+ let metagraph = parentNode.metagraph;
+ let bridgegraph = this.getBridgegraph(parentNode.name);
+ findEdgeTargetsInGraph(metagraph, node, inEdges, edges);
+ findEdgeTargetsInGraph(bridgegraph, node, inEdges, edges);
+ }
+ return edges;
+ }
+
+ /**
+ * For a given GroupNode, get or calculate an object which describes a
+ * topological ordering of child nodes within that GroupNode's metagraph.
+ *
+ * This ordering is used when rendering bridge control edges which are
+ * sometimes backwards relative to the dataflow.
+ *
+ * For example, say we have a graph with two edges A->B and A->C, and we're
+ * interested in the ordering under ROOT. In this case, any of the following
+ * would be legitimate return values:
+ *
+ * - { "A": 0, "B": 1, "C": 2 } -- most likely
+ * - { "A": 0, "B": 2, "C": 1 } -- less likely
+ * - { "A": 12, "B": 100, "C": 99 } -- unlikely, but still OK
+ *
+ * The algorithm does not guarantee that all numbers from 0-N (where N is
+ * the number of nodes) appear exactly once. Rather it guarantees that if
+ * there is a path between two nodes, the earlier one will have a lower
+ * number in the ordering hash.
+ *
+ * When generating the ordering, we ignore control Metaedges (those which
+ * represent only BaseEdges that have isControlDependency set to true).
+ *
+ * If there is no node with the specified name, an error is thrown. If the
+ * node with the specified name is not a group node, null is returned.
+ */
+ getTopologicalOrdering(nodeName: string): { [childName: string]: number } {
+ let node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+ if (!node.isGroupNode) {
+ return null;
+ }
+ if (nodeName in this.orderings) {
+ return this.orderings[nodeName];
+ }
+
+ // Mapping of child node names to lists of their successors.
+ let successors: { [childName: string]: string[] } = {};
+
+ // Set of node names which have appeared as a destination.
+ let destinations: { [childName: string]: boolean } = {};
+
+ let metagraph = (<GroupNode> node).metagraph;
+ _.each(metagraph.edges(), (e: graphlib.EdgeObject) => {
+ if (!metagraph.edge(e).numRegularEdges) {
+ return; // Skip control edges.
+ }
+
+ // Keep track of successors and destinations.
+ if (!(e.v in successors)) {
+ successors[e.v] = [];
+ }
+ successors[e.v].push(e.w);
+ destinations[e.w] = true;
+ });
+
+ // Seed the queue with true sources (those that are not destinations).
+ let queue: string[] =
+ _.difference(_.keys(successors), _.keys(destinations));
+
+ // Produce an ordering by traversing the graph breadth first.
+ let ordering = this.orderings[nodeName] = {};
+ let index = 0;
+ while (queue.length) {
+ let childName = queue.shift();
+ ordering[childName] = index++;
+ _.each(successors[childName], succName => queue.push(succName));
+ delete successors[childName]; // Prevent cycles from infinite looping.
+ }
+ return ordering;
+ }
+
+}
+
+/**
+ * Internal utility function - given a graph (should be either a metagraph or a
+ * bridgegraph) and a node which is known to be in that graph, determine
+ * the other ends of edges that involve that node in the direction specified
+ * by whether it's inbound.
+ *
+ * For example if you wanted to find the predecessors of a node, you'd call
+ * this method for the parent's metagraph and bridgegraph, specifying inbound
+ * as true (look at the source of inbound edges to the specified node).
+ *
+ * Discovered target names are appended to the targets array.
+ */
+function findEdgeTargetsInGraph(
+ graph: graphlib.Graph<GroupNode|OpNode, Metaedge>,
+ node: Node, inbound: boolean, targets: Edges): void {
+ _.each(<Metaedge[]> graph.edges(), e => {
+ let [selfName, otherName] = inbound ? [e.w, e.v] : [e.v, e.w];
+ if (selfName === node.name) {
+ if (node.isGroupNode) {
+ let targetList = graph.edge(e).numRegularEdges
+ ? targets.regular : targets.control;
+ targetList.push(otherName);
+ } else {
+ _.each(graph.edge(e).baseEdgeList, baseEdge => {
+ let targetList = baseEdge.isControlDependency
+ ? targets.control : targets.regular;
+ targetList.push(inbound ? baseEdge.v : baseEdge.w);
+ });
+ }
+ }
+ });
+}
+
+interface HierarchyParams {
+ verifyTemplate: boolean;
+ groupSeries: boolean;
+}
+
+/**
+ * @param graph The raw graph.
+ * @param params Parameters used when building a hierarchy.
+ * @param tracker Progress tracker used to report intermediate progress.
+ */
+export function build(graph: tf.graph.SlimGraph, params: HierarchyParams,
+ tracker: ProgressTracker): Promise<Hierarchy|void> {
+ let h = new HierarchyImpl();
+ let seriesNames: { [name: string]: string } = {};
+ return runAsyncTask("Adding nodes", 20, () => {
+ // Get all the possible device names.
+ let deviceNames = {};
+ _.each(graph.nodes, (node, nodeName) => {
+ if (node.device != null) {
+ deviceNames[node.device] = true;
+ }
+ });
+ h.devices = _.keys(deviceNames);
+ addNodes(h, graph);
+ }, tracker)
+ .then(() => {
+ return runAsyncTask("Detect series", 20, () => {
+ if (params.groupSeries) {
+ groupSeries(h.root, h, seriesNames);
+ }
+ }, tracker);
+ })
+ .then(() => {
+ return runAsyncTask("Adding edges", 30, () => {
+ addEdges(h, graph, seriesNames);
+ }, tracker);
+ })
+ .then(() => {
+ return runAsyncTask("Finding similar subgraphs", 30, () => {
+ h.templates = template.detect(h, params.verifyTemplate);
+ }, tracker);
+ })
+ .then(() => {
+ return h;
+ }).catch(function(reason) {
+ throw new Error("Failure creating graph hierarchy");
+ });
+};
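+
+// Rough usage sketch (hypothetical caller; slimGraph and tracker are assumed
+// to be supplied by the surrounding application):
+//   tf.graph.hierarchy.build(slimGraph,
+//       {verifyTemplate: true, groupSeries: true}, tracker)
+//     .then(h => { /* e.g. h.root, h.devices, h.node("some/node") */ });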
+
+/**
+ * Creates the metanodes in the hierarchical graph and assigns parent-child
+ * relationship between them.
+ */
+function addNodes(h: Hierarchy, graph: SlimGraph) {
+ _.each(graph.nodes, (node, nodeName) => {
+ let path = getHierarchicalPath(node.name);
+ let parent: Metanode = h.root;
+
+ parent.depth = Math.max(path.length, parent.depth);
+
+ // Create parent metanodes for each depth. For example if the node name
+ // is 'a/b/c', then create metanodes 'a' and 'a/b', where 'a/b' is a child
+ // of 'a'.
+ for (let i = 0; i < path.length; i++) {
+ parent.depth = Math.max(parent.depth, path.length - i);
+ parent.cardinality += node.cardinality;
+ parent.opHistogram[node.op] = (parent.opHistogram[node.op] || 0) + 1;
+ if (node.stats) {
+ parent.stats.combine(node.stats);
+ }
+ if (node.device != null) {
+ parent.deviceHistogram[node.device] =
+ (parent.deviceHistogram[node.device] || 0) + 1;
+ }
+ if (i === path.length - 1) { break; }
+ let name = path[i];
+ let child = <Metanode>h.node(name);
+ if (!child) {
+ child = createMetanode(name);
+ child.parentNode = parent;
+ h.setNode(name, child);
+ parent.metagraph.setNode(name, child);
+ }
+ parent = child;
+ }
+ // Assuming node name is 'a/b/c', assign the OpNode as a child of the
+ // metanode 'a/b'.
+ h.setNode(node.name, node);
+ node.parentNode = parent;
+ parent.metagraph.setNode(node.name, node);
+
+ // Add each of the in-embeddings and out-embeddings in the hierarchy.
+ _.each(node.inEmbeddings, function(embedding) {
+ h.setNode(embedding.name, embedding);
+ embedding.parentNode = node;
+ });
+ _.each(node.outEmbeddings, function(embedding) {
+ h.setNode(embedding.name, embedding);
+ embedding.parentNode = node;
+ });
+ });
+};
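+
+// For a SlimGraph containing a single op node named "a/b/c", this leaves the
+// hierarchy index with Metanodes "a" and "a/b" plus the OpNode "a/b/c" itself
+// (and entries for any of its in/out embeddings).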
+
+/**
+ * For each metanode in the hierarchical graph, this method adds the edges of
+ * its metagraph, i.e. edges between nodes that share the same parent.
+ */
+function addEdges(h: Hierarchy, graph: SlimGraph,
+ seriesNames: { [name: string]: string }) {
+
+ let nodeIndex = h.getNodeMap();
+
+ // Ancestor paths for the source and destination nodes of an edge. These are
+ // reused for each edge rather than allocating new ones. It's about 10% faster
+ // than allocating new ones on each pass through the loop.
+ let sourcePath: string[] = [];
+ let destPath: string[] = [];
+
+ // Insert the ancestor path for a node into the provided array, including the
+ // node itself. Return the index of the last node inserted (always ROOT).
+ let getPath = (node: Node, path: string[]): number => {
+ let i = 0;
+ while (node) {
+ path[i++] = node.name;
+ node = node.parentNode;
+ }
+ return i - 1;
+ };
+
+ _.each(graph.edges, baseEdge => {
+
+ // Get the hierarchical paths for the source and destination of the edge.
+ let sourceAncestorIndex = getPath(graph.nodes[baseEdge.v], sourcePath);
+ let destAncestorIndex = getPath(graph.nodes[baseEdge.w], destPath);
+
+ // Find the lowest shared ancestor between source and dest by looking for
+ // the highest nodes that differ between their ancestor paths.
+ while (sourcePath[sourceAncestorIndex] === destPath[destAncestorIndex]) {
+ sourceAncestorIndex--;
+ destAncestorIndex--;
+ if (sourceAncestorIndex < 0 || destAncestorIndex < 0) {
+ // This would only occur if the two nodes were the same (a cycle in the
+ // graph), or if one endpoint was a strict ancestor of the other. The
+ // latter shouldn't happen because we rename nodes which are both
+ // metanodes and op nodes. E.g. "A/B" becomes "A/B/(B)".
+ throw Error("No difference found between ancestor paths.");
+ }
+ }
+
+ let sharedAncestorNode =
+ <GroupNode>nodeIndex[sourcePath[sourceAncestorIndex + 1]];
+ let sourceAncestorName = sourcePath[sourceAncestorIndex];
+ let destAncestorName = destPath[destAncestorIndex];
+
+ // Find or create the Metaedge which should contain this BaseEdge inside
+ // the shared ancestor.
+ let metaedge =
+ sharedAncestorNode.metagraph.edge(sourceAncestorName, destAncestorName);
+ if (!metaedge) {
+ metaedge = createMetaedge(sourceAncestorName, destAncestorName);
+ sharedAncestorNode.metagraph
+ .setEdge(sourceAncestorName, destAncestorName, metaedge);
+ }
+ if (!sharedAncestorNode.hasNonControlEdges &&
+ !baseEdge.isControlDependency) {
+ sharedAncestorNode.hasNonControlEdges = true;
+ }
+ metaedge.addBaseEdge(baseEdge);
+
+ });
+
+};
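+
+// Continuing the hypothetical BaseEdge A/B/C->Z/Y/X: the ancestor paths first
+// differ at "A" vs "Z", so the shared ancestor is ROOT and the BaseEdge is
+// added to a Metaedge A->Z in ROOT's metagraph.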
+
+/**
+ * Using the hierarchy template information, detect series in the provided
+ * metanode. For each detected series, create a new SeriesNode
+ * and remove series members from the metanode's metagraph and move them to
+ * the new series node's metagraph.
+ *
+ * @param metanode
+ * @param hierarchy
+ * @param seriesNames Dictionary that maps the name of each grouped node to
+ *     the name of the series node that contains it; populated by this
+ *     function.
+ */
+function groupSeries(metanode: Metanode, hierarchy: Hierarchy,
+ seriesNames: { [name: string]: string }) {
+ let metagraph = metanode.metagraph;
+ _.each(metagraph.nodes(), n => {
+ let child = metagraph.node(n);
+ if (child.type === tf.graph.NodeType.META) {
+ groupSeries(<Metanode>child, hierarchy, seriesNames);
+ }
+ });
+
+ let clusters = clusterNodes(metagraph);
+ let seriesDict = detectSeries(clusters, metagraph);
+
+ // Add each series node to the graph and add its grouped children to its own
+ // metagraph.
+ _.each(seriesDict, function(seriesNode: SeriesNode, seriesName: string) {
+ let nodeMemberNames = seriesNode.metagraph.nodes();
+ let firstMember = seriesNode.metagraph.node(nodeMemberNames[0]);
+ let seriesType = firstMember.type;
+
+ hierarchy.setNode(seriesName, seriesNode); // add to the index
+ metagraph.setNode(seriesName, seriesNode);
+ _.each(nodeMemberNames, n => {
+ let child = <OpNode> metagraph.node(n);
+ seriesNode.metagraph.setNode(n, child);
+ seriesNode.parentNode = child.parentNode;
+ seriesNode.cardinality++;
+ if (child.device != null) {
+ seriesNode.deviceHistogram[child.device] =
+ (seriesNode.deviceHistogram[child.device] || 0) + 1;
+ }
+ child.parentNode = seriesNode;
+ seriesNames[n] = seriesName;
+
+ if (child.stats) {
+ seriesNode.stats.combine(child.stats);
+ }
+
+ // Remove now-grouped node from its original parent's metagraph.
+ metagraph.removeNode(n);
+ });
+ });
+};
+
+/** Cluster op-nodes that share the same op type. */
+function clusterNodes(metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>):
+ {[clusterId: string]: string[]} {
+ let result: {[clusterId: string]: string[]} = {};
+ return _.reduce(metagraph.nodes(), function(clusters: {[clusterId: string]: string[]}, n: string) {
+ let child = metagraph.node(n);
+ if (child.type === NodeType.META) {
+ // skip metanodes
+ return clusters;
+ }
+ let template = (<OpNode>child).op;
+ if (template) {
+ clusters[template] = clusters[template] || [];
+ clusters[template].push(child.name);
+ }
+ return clusters;
+ }, result);
+}
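+
+// For example (hypothetical nodes), op nodes "layer1/conv_1" and
+// "layer1/conv_2" that both have op "Conv2D" end up together in
+// clusters["Conv2D"].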
+
+/**
+ * For each cluster of op-nodes grouped by op type, try to detect series.
+ * A series name is inferred by looking for the pattern "_<number>" at the end
+ * of the node name.
+ *
+ * @param clusters Dictionary output from clusterNodes().
+ * @param metagraph
+ * @return A dictionary from series name => seriesNode
+ */
+function detectSeries(clusters: {[clusterId: string]: string[]},
+ metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>):
+ {[seriesName: string]: SeriesNode} {
+ let seriesDict: {[seriesName: string]: SeriesNode} = {};
+ _.each(clusters, function(members, clusterId: string) {
+ if (members.length <= 1) { return; } // isolated clusters can't make series
+
+ /** @type {Object} A dictionary mapping seriesName to seriesInfoArray,
+ * which is an array that contains objects with name, id, prefix, suffix,
+ * and parent properties.
+ */
+ let candidatesDict = {};
+
+ // Group all nodes that have the same name, with the exception of a
+ // number at the end of the name after an underscore, which is allowed to
+ // vary.
+ _.each(members, function(name: string) {
+ let isGroup = name.charAt(name.length - 1) === "*";
+ let namepath = name.split("/");
+ let leaf = namepath[namepath.length - 1];
+ let parent = namepath.slice(0, namepath.length - 1).join("/");
+ let matches = leaf.match(/^(\D*)_(\d+)$/);
+
+ let prefix;
+ let id;
+ let suffix = "";
+ if (matches) { // If "_<number>" was found at the end of the name, assign id.
+ prefix = matches[1]; // the front non-numeric characters
+ id = matches[2]; // the digits
+ } else { // For nodes without "_<number>", make them zero-th items.
+ prefix = isGroup ? leaf.substr(0, leaf.length - 1) : leaf;
+ if (prefix.charAt(prefix.length - 1) !== "_") {
+ prefix += "_";
+ }
+ id = 0;
+ suffix = isGroup ? "*" : "";
+ }
+ let seriesName = getSeriesNodeName(prefix, suffix, parent);
+ candidatesDict[seriesName] = candidatesDict[seriesName] || [];
+ let seriesNode = createSeriesNode(prefix, suffix, parent, +id, name);
+ candidatesDict[seriesName].push(seriesNode);
+ });
+
+ // In each group of nodes, group nodes in bunches that have monotonically
+ // increasing numbers in their names. Each of these bunches is a series.
+ _.each(candidatesDict, function(seriesInfoArray: SeriesNode[], seriesName) {
+ if (seriesInfoArray.length < 2) {
+ return;
+ }
+ seriesInfoArray.sort(function(a, b) {
+ return (+a.clusterId) - (+b.clusterId);
+ });
+
+ // Loop through the nodes sorted by their detected series numbers, grouping
+ // all nodes with monotonically-increasing series numbers.
+ let seriesNodes = [seriesInfoArray[0]];
+ for (let index = 1; index < seriesInfoArray.length; index++) {
+ let nextNode = seriesInfoArray[index];
+ if (nextNode.clusterId === seriesNodes[seriesNodes.length - 1].clusterId + 1) {
+ seriesNodes.push(nextNode);
+ continue;
+ }
+ addSeriesToDict(seriesNodes, seriesDict, +clusterId, metagraph);
+ seriesNodes = [nextNode];
+ }
+ addSeriesToDict(seriesNodes, seriesDict, +clusterId, metagraph);
+ });
+ });
+ return seriesDict;
+}
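+
+// For example (hypothetical nodes), members "layer1/conv_1", "layer1/conv_2"
+// and "layer1/conv_3" share the prefix "conv", the parent "layer1" and
+// consecutive ids, so they are bunched into one SeriesNode; a gap in the
+// numbering (say a jump to "layer1/conv_7") starts a new bunch, and bunches
+// of size one are not added to the series dictionary.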
+
+/**
+ * Add a series to the provided dictionary mapping series names to series.
+ *
+ * @param seriesNodes the nodes in the series. Contains
+ * name, id, prefix, suffix and parent properties of the node.
+ * @param seriesDict the dictionary of series
+ * @param clusterId ID of the template of the nodes of the series
+ * @param metagraph
+ */
+function addSeriesToDict(seriesNodes: SeriesNode[],
+ seriesDict: {[seriesName: string] : SeriesNode},
+ clusterId: number,
+ metagraph: graphlib.Graph<GroupNode|OpNode, Metaedge>) {
+ if (seriesNodes.length > 1) {
+ let curSeriesName = getSeriesNodeName(
+ seriesNodes[0].prefix, seriesNodes[0].suffix,
+ seriesNodes[0].parent, seriesNodes[0].clusterId,
+ seriesNodes[seriesNodes.length - 1].clusterId);
+ let curSeriesNode = createSeriesNode(seriesNodes[0].prefix,
+ seriesNodes[0].suffix, seriesNodes[0].parent, clusterId,
+ curSeriesName);
+ _.each(seriesNodes, function(node) {
+ curSeriesNode.ids.push(node.clusterId);
+ curSeriesNode.metagraph.setNode(node.name, metagraph.node(node.name));
+ });
+ seriesDict[curSeriesName] = curSeriesNode;
+ }
+}
+
+} // close module tf.graph.hierarchy
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/layout.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/layout.ts
new file mode 100644
index 0000000000..4eb3cab011
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/layout.ts
@@ -0,0 +1,628 @@
+/// <reference path="graph.ts" />
+/// <reference path="render.ts" />
+
+module tf.graph.layout {
+
+/** Set of parameters that define the look and feel of the graph. */
+export const PARAMS = {
+ animation: {
+ /** Default duration for graph animations in ms. */
+ duration: 250
+ },
+ graph: {
+ /** Graph parameter for metanode. */
+ meta: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 110,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25
+ },
+ /** Graph parameter for series node. */
+ series: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 90,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25,
+ },
+ /**
+ * Padding is used to correctly position the graph SVG inside of its parent
+ * element. The padding amounts are applied using an SVG transform of X and
+ * Y coordinates.
+ */
+ padding: {
+ paddingTop: 40,
+ paddingLeft: 20
+ }
+ },
+ subscene: {
+ meta: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ /**
+ * Used to leave room for the label on top of the highest node in
+ * the core graph.
+ */
+ labelHeight: 20,
+ /** X-space between each extracted node and the core graph. */
+ extractXOffset: 50,
+ /** Y-space between each extracted node. */
+ extractYOffset: 20
+ },
+ series: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ labelHeight: 10
+ }
+ },
+ nodeSize: {
+ /** Size of meta nodes. */
+ meta: {
+ radius: 5,
+ width: 60,
+ /** A scale for the node's height based on the number of nodes inside. */
+ height: d3.scale.linear().domain([1, 200]).range([15, 60]).clamp(true),
+ /** The radius of the circle denoting the expand button. */
+ expandButtonRadius: 3
+ },
+ /** Size of op nodes. */
+ op: {
+ width: 15,
+ height: 6,
+ radius: 3, // So annotation edges touch the ellipse properly.
+ labelOffset: -8
+ },
+ /** Size of series nodes. */
+ series: {
+ expanded: {
+ // For expanded series nodes, width and height will be
+ // computed to account for the subscene.
+ radius: 10,
+ labelOffset: 0,
+ },
+ vertical: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // one or more non-control edges will show as a vertical stack
+ // of ellipses.
+ width: 16,
+ height: 13,
+ labelOffset: -13,
+ },
+ horizontal: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // no non-control edges will show as a horizontal stack of
+ // ellipses.
+ width: 24,
+ height: 8,
+ radius: 10, // Forces annotations to center line.
+ labelOffset: -10,
+ },
+ },
+ /** Size of bridge nodes. */
+ bridge: {
+ // NOTE: bridge nodes will normally be invisible, but they must
+ // take up some space so that the layout step leaves room for
+ // their edges.
+ width: 20,
+ height: 20,
+ radius: 2,
+ labelOffset: 0
+ }
+ },
+ shortcutSize: {
+ /** Size of shortcuts for op nodes */
+ op: {
+ width: 10,
+ height: 4
+ },
+ /** Size of shortcuts for meta nodes */
+ meta: {
+ width: 12,
+ height: 4,
+ radius: 1
+ },
+ /** Size of shortcuts for series nodes */
+ series: {
+ width: 14,
+ height: 4,
+ }
+ },
+ annotations: {
+ /** X-space between the shape and each annotation-node. */
+ xOffset: 10,
+ /** Y-space between each annotation-node. */
+ yOffset: 3,
+ /** X-space between each annotation-node and its label. */
+ labelOffset: 2,
+ /** Estimate max width for annotation label */
+ labelWidth: 35
+ },
+ constant: {
+ size: {
+ width: 4,
+ height: 4
+ }
+ },
+ series: {
+ /** Maximum number of repeated items shown for an unexpanded series node. */
+ maxStackCount: 3,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of parallel series (series without edges between its members).
+ */
+ parallelStackOffsetRatio: 0.2,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of tower series (series with edges between its members).
+ */
+ towerStackOffsetRatio: 0.5
+ },
+ minimap : {
+ /** The maximum width/height the minimap can have. */
+ size: 150
+ }
+};
+
+/** Calculate layout for a scene of a group node. */
+export function scene(renderNodeInfo: render.RenderGroupNodeInformation)
+ : void {
+ // Update layout, size, and annotations of its children nodes and edges.
+ if (renderNodeInfo.node.isGroupNode) {
+ layoutChildren(renderNodeInfo);
+ }
+
+ // Update position of its children nodes and edges
+ if (renderNodeInfo.node.type === NodeType.META) {
+ layoutMetanode(renderNodeInfo);
+ } else if (renderNodeInfo.node.type === NodeType.SERIES) {
+ layoutSeriesNode(renderNodeInfo);
+ }
+};
+
+/**
+ * Update layout, size, and annotations of the given render node's child
+ * nodes and edges.
+ */
+function layoutChildren(renderNodeInfo: render.RenderGroupNodeInformation)
+ : void {
+ let children = renderNodeInfo.coreGraph.nodes().map(n => {
+ return renderNodeInfo.coreGraph.node(n);
+ }).concat(renderNodeInfo.isolatedInExtract,
+ renderNodeInfo.isolatedOutExtract);
+
+ _.each(children, childNodeInfo => {
+ // Set size of each child
+ switch (childNodeInfo.node.type) {
+ case NodeType.OP:
+ _.extend(childNodeInfo, PARAMS.nodeSize.op);
+ break;
+ case NodeType.BRIDGE:
+ _.extend(childNodeInfo, PARAMS.nodeSize.bridge);
+ break;
+ case NodeType.META:
+ if (!childNodeInfo.expanded) {
+ // set fixed width and scalable height based on cardinality
+ _.extend(childNodeInfo, PARAMS.nodeSize.meta);
+ childNodeInfo.height =
+ PARAMS.nodeSize.meta.height(childNodeInfo.node.cardinality);
+ } else {
+ let childGroupNodeInfo =
+ <render.RenderGroupNodeInformation>childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ }
+ break;
+ case NodeType.SERIES:
+ if (childNodeInfo.expanded) {
+ _.extend(childNodeInfo, PARAMS.nodeSize.series.expanded);
+ let childGroupNodeInfo =
+ <render.RenderGroupNodeInformation>childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ } else {
+ let childGroupNodeInfo =
+ <render.RenderGroupNodeInformation>childNodeInfo;
+ let seriesParams =
+ childGroupNodeInfo.node.hasNonControlEdges ?
+ PARAMS.nodeSize.series.vertical :
+ PARAMS.nodeSize.series.horizontal;
+ _.extend(childNodeInfo, seriesParams);
+ }
+ break;
+ default:
+ throw Error("Unrecognized node type: " + childNodeInfo.node.type);
+ }
+
+ // Layout each child's annotations
+ layoutAnnotation(childNodeInfo);
+ });
+}
+
+/**
+ * Calculate layout for a graph using dagre
+ * @param graph the graph to be laid out
+ * @param params layout parameters
+ * @return width and height of the core graph
+ */
+function dagreLayout(graph: graphlib.Graph<any, any>, params)
+ : {height: number, width: number} {
+ _.extend(graph.graph(), {
+ nodeSep: params.nodeSep,
+ rankSep: params.rankSep
+ });
+
+ let bridgeNodeNames = [];
+ let nonBridgeNodeNames = [];
+
+ // Split out nodes into bridge and non-bridge nodes, and calculate the total
+ // width we should use for bridge nodes.
+ _.each(graph.nodes(), nodeName => {
+ let nodeInfo = graph.node(nodeName);
+ if (nodeInfo.node.type === NodeType.BRIDGE) {
+ bridgeNodeNames.push(nodeName);
+ } else {
+ nonBridgeNodeNames.push(nodeName);
+ }
+ });
+
+ // If there are no non-bridge nodes, then the graph has zero size.
+ if (!nonBridgeNodeNames.length) {
+ return {
+ width: 0,
+ height: 0,
+ };
+ }
+
+ dagre.layout(graph);
+
+ let graphLabel = graph.graph();
+
+ // Calculate the true bounding box of the graph by iterating over nodes and
+ // edges rather than accepting dagre's word for it. In particular, we should
+ // ignore the extra-wide bridge nodes and bridge edges, and allow for
+ // annotation boxes and labels.
+ let minX = Infinity;
+ let minY = Infinity;
+ let maxX = -Infinity;
+ let maxY = -Infinity;
+ _.each(nonBridgeNodeNames, nodeName => {
+ let nodeInfo = graph.node(nodeName);
+ let w = 0.5 * nodeInfo.width;
+ let x1 = nodeInfo.x - w - nodeInfo.inboxWidth;
+ let x2 = nodeInfo.x + w + nodeInfo.outboxWidth;
+ minX = x1 < minX ? x1 : minX;
+ maxX = x2 > maxX ? x2 : maxX;
+ let labelLength =
+ nodeName.length - nodeName.lastIndexOf(NAMESPACE_DELIM);
+ // TODO(jimbo): Account for font width rather than using a magic number.
+ let charWidth = 3; // 3 pixels per character.
+ let lw = 0.5 * labelLength * charWidth;
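+ // Worked example, assuming NAMESPACE_DELIM is "/": for nodeName "foo/bar",
+ // labelLength = 7 - 3 = 4 and lw = 0.5 * 4 * 3 = 6, i.e. the label is
+ // assumed to extend roughly 6px to each side of the node's center.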
+ let lx1 = nodeInfo.x - lw;
+ let lx2 = nodeInfo.x + lw;
+ minX = lx1 < minX ? lx1 : minX;
+ maxX = lx2 > maxX ? lx2 : maxX;
+ // TODO(jimbo): Account for the height of labels above op nodes here.
+ let h = 0.5 * nodeInfo.outerHeight;
+ let y1 = nodeInfo.y - h;
+ let y2 = nodeInfo.y + h;
+ minY = y1 < minY ? y1 : minY;
+ maxY = y2 > maxY ? y2 : maxY;
+ });
+ _.each(graph.edges(), edgeObj => {
+ let renderMetaedgeInfo = graph.edge(edgeObj);
+ if (renderMetaedgeInfo.structural) {
+ return; // Skip structural edges from min/max calculations.
+ }
+ _.each(renderMetaedgeInfo.points,
+ (point: { x: number, y: number }) => {
+ minX = point.x < minX ? point.x : minX;
+ maxX = point.x > maxX ? point.x : maxX;
+ minY = point.y < minY ? point.y : minY;
+ maxY = point.y > maxY ? point.y : maxY;
+ });
+ });
+
+ // Shift all nodes and edge points to account for the left-padding amount,
+ // and the invisible bridge nodes.
+ _.each(graph.nodes(), nodeName => {
+ let nodeInfo = graph.node(nodeName);
+ nodeInfo.x -= minX;
+ nodeInfo.y -= minY;
+ });
+ _.each(graph.edges(), edgeObj => {
+ _.each(graph.edge(edgeObj).points,
+ (point: { x: number, y: number }) => {
+ point.x -= minX;
+ point.y -= minY;
+ });
+ });
+
+ return {
+ width: maxX - minX,
+ height: maxY - minY,
+ };
+}
+
+/** Layout a metanode. */
+function layoutMetanode(renderNodeInfo): void {
+ // First, copy params specific to meta nodes onto this render info object.
+ let params = PARAMS.subscene.meta;
+ renderNodeInfo = _.extend(renderNodeInfo, params);
+
+ // Invoke dagre.layout() on the core graph and record the bounding box
+ // dimensions.
+ _.extend(renderNodeInfo.coreBox,
+ dagreLayout(renderNodeInfo.coreGraph, PARAMS.graph.meta));
+
+ // Calculate the position of nodes in isolatedInExtract relative to the
+ // top-left corner of inExtractBox (the bounding box for all inExtract nodes)
+ // and calculate the size of the inExtractBox.
+ let hasInExtract = renderNodeInfo.isolatedInExtract.length > 0;
+
+ renderNodeInfo.inExtractBox.width = hasInExtract ?
+ _(renderNodeInfo.isolatedInExtract).pluck("outerWidth").max() : 0;
+
+ renderNodeInfo.inExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedInExtract, (height, child: any, i) => {
+ let yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.inExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+
+ // Calculate the position of nodes in isolatedOutExtract relative to the
+ // top-left corner of outExtractBox (the bounding box for all outExtract
+ // nodes) and calculate the size of the outExtractBox.
+ let hasOutExtract = renderNodeInfo.isolatedOutExtract.length > 0;
+ renderNodeInfo.outExtractBox.width = hasOutExtract ?
+ _(renderNodeInfo.isolatedOutExtract).pluck("outerWidth").max() : 0;
+
+ renderNodeInfo.outExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedOutExtract, (height, child: any, i) => {
+ let yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.outExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+
+ // Determine the whole metanode's width (from left to right).
+ renderNodeInfo.width =
+ params.paddingLeft + renderNodeInfo.coreBox.width + params.paddingRight +
+ (hasInExtract ?
+ renderNodeInfo.inExtractBox.width + params.extractXOffset : 0) +
+ (hasOutExtract ?
+ params.extractXOffset + renderNodeInfo.outExtractBox.width : 0);
+
+ // TODO(jimbo): Remove labelHeight and instead incorporate into box sizes.
+ // Determine the whole metanode's height (from top to bottom).
+ renderNodeInfo.height =
+ renderNodeInfo.labelHeight +
+ params.paddingTop +
+ Math.max(
+ renderNodeInfo.inExtractBox.height,
+ renderNodeInfo.coreBox.height,
+ renderNodeInfo.outExtractBox.height
+ ) +
+ params.paddingBottom;
+}
+
+/**
+ * Calculate layout for series node's core graph. Only called for an expanded
+ * series.
+ */
+function layoutSeriesNode(node: render.RenderGroupNodeInformation): void {
+ let graph = node.coreGraph;
+
+ let params = PARAMS.subscene.series;
+ _.extend(node, params);
+
+ // Layout the core.
+ _.extend(node.coreBox,
+ dagreLayout(node.coreGraph, PARAMS.graph.series));
+
+ _.each(graph.nodes(), nodeName => {
+ graph.node(nodeName).excluded = false;
+ });
+
+ // Series do not have in/outExtractBox so no need to include them here.
+ node.width = node.coreBox.width + params.paddingLeft + params.paddingRight;
+ node.height = node.coreBox.height + params.paddingTop + params.paddingBottom;
+}
+
+/**
+ * Calculate layout for annotations of a given node.
+ * This will modify positions of the given node and its annotations.
+ *
+ * @see tf.graph.render.Node and tf.graph.render.Annotation
+ * for description of each property of each render node.
+ *
+ */
+ function layoutAnnotation(renderNodeInfo: render.RenderNodeInformation): void {
+ // If the render node is an expanded metanode, then its annotations will not
+ // be visible and we should skip the annotation calculations.
+ if (renderNodeInfo.expanded) {
+ _.extend(renderNodeInfo, {
+ inboxWidth: 0,
+ inboxHeight: 0,
+ outboxWidth: 0,
+ outboxHeight: 0,
+ outerWidth: renderNodeInfo.width,
+ outerHeight: renderNodeInfo.height
+ });
+ return;
+ }
+
+ let inAnnotations = renderNodeInfo.inAnnotations.list;
+ let outAnnotations = renderNodeInfo.outAnnotations.list;
+
+ // Calculate size for in-annotations
+ _.each(inAnnotations, a => sizeAnnotation(a));
+
+ // Calculate size for out-annotations
+ _.each(outAnnotations, a => sizeAnnotation(a));
+
+ let params = PARAMS.annotations;
+ renderNodeInfo.inboxWidth =
+ inAnnotations.length > 0 ?
+ (<any>_(inAnnotations).pluck("width").max()) +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+
+ renderNodeInfo.outboxWidth =
+ outAnnotations.length > 0 ?
+ (<any>_(outAnnotations).pluck("width").max()) +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for in-annotations
+ // After this chunk of code:
+ // inboxHeight = sum of annotation heights +
+ // (inAnnotations.length - 1) * yOffset
+ let inboxHeight = _.reduce(inAnnotations,
+ (height, a: any, i) => {
+ let yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = -(renderNodeInfo.width + a.width) / 2 - params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
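+ // E.g. three in-annotations each 4px tall with yOffset 3 give
+ // inboxHeight = 4 + 3 + 4 + 3 + 4 = 18.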
+
+ _.each(inAnnotations, (a: any) => {
+ a.dy -= inboxHeight / 2;
+
+ a.labelOffset = params.labelOffset;
+ });
+
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for out-annotations
+ // After this chunk of code:
+ // outboxHeight = sum of annotation heights +
+ // (outAnnotations.length - 1) * yOffset
+ let outboxHeight = _.reduce(outAnnotations,
+ (height, a: any, i) => {
+ let yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = (renderNodeInfo.width + a.width) / 2 + params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
+
+ _.each(outAnnotations, (a: any) => {
+ // Adjust by half of the total height so dy is relative to the host
+ // node's center.
+ a.dy -= outboxHeight / 2;
+
+ a.labelOffset = params.labelOffset;
+ });
+
+ // Creating scales for touch point between the in-annotation edges
+ // and their hosts.
+
+ let inTouchHeight =
+ Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius,
+ inboxHeight / 2);
+ inTouchHeight = inTouchHeight < 0 ? 0 : inTouchHeight;
+
+ let inY = d3.scale.linear()
+ .domain([0, inAnnotations.length - 1])
+ .range([-inTouchHeight, inTouchHeight]);
+
+ // Calculate annotation edge position
+ _.each(inAnnotations, (a: any, i) => {
+ a.points = [
+ // The annotation node end
+ {
+ dx: a.dx + a.width / 2,
+ dy: a.dy
+ },
+
+ // The host node end
+ {
+ dx: - renderNodeInfo.width / 2,
+ // Only use the scale if there is more than one annotation;
+ // otherwise center it vertically.
+ dy: inAnnotations.length > 1 ? inY(i) : 0
+ }
+ ];
+ });
+
+ // Creating scales for touch point between the out-annotation edges
+ // and their hosts.
+ let outTouchHeight =
+ Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius,
+ outboxHeight / 2);
+ outTouchHeight = outTouchHeight < 0 ? 0 : outTouchHeight;
+ let outY = d3.scale.linear()
+ .domain([0, outAnnotations.length - 1])
+ .range([-outTouchHeight, outTouchHeight]);
+
+ _.each(outAnnotations, (a: any, i) => {
+ // Add point from the border of the annotation node
+ a.points = [
+ // The host node end
+ {
+ dx: renderNodeInfo.width / 2,
+ // Only use the scale if there is more than one annotation;
+ // otherwise center it vertically.
+ dy: outAnnotations.length > 1 ? outY(i) : 0
+ },
+ // The annotation node end
+ {
+ dx: a.dx - a.width / 2,
+ dy: a.dy
+ }
+ ];
+ });
+
+ renderNodeInfo.outerWidth = renderNodeInfo.width + renderNodeInfo.inboxWidth +
+ renderNodeInfo.outboxWidth;
+ renderNodeInfo.outerHeight =
+ Math.max(renderNodeInfo.height, inboxHeight, outboxHeight);
+}
+
+/**
+ * Set size of an annotation node.
+ */
+function sizeAnnotation(a: render.Annotation): void {
+ switch (a.annotationType) {
+ case render.AnnotationType.CONSTANT:
+ _.extend(a, PARAMS.constant.size);
+ break;
+ case render.AnnotationType.SHORTCUT:
+ if (a.node.type === NodeType.OP) {
+ _.extend(a, PARAMS.shortcutSize.op);
+ } else if (a.node.type === NodeType.META) {
+ _.extend(a, PARAMS.shortcutSize.meta);
+ } else if (a.node.type === NodeType.SERIES) {
+ _.extend(a, PARAMS.shortcutSize.series);
+ } else {
+ throw Error("Invalid node type: " + a.node.type);
+ }
+ break;
+ case render.AnnotationType.SUMMARY:
+ _.extend(a, PARAMS.constant.size);
+ break;
+ }
+}
+
+} // close module
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/parser.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/parser.ts
new file mode 100644
index 0000000000..b4864738a9
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/parser.ts
@@ -0,0 +1,189 @@
+/// <reference path="../../../typings/tsd.d.ts" />
+/// <reference path="common.ts" />
+module tf.graph.parser {
+
+/**
+ * Parses a native js value, which can be either a string, boolean or number.
+ *
+ * @param value The value to be parsed.
+ */
+function parseValue(value: string): string|number|boolean {
+ if (value === "true") {
+ return true;
+ }
+ if (value === "false") {
+ return false;
+ }
+ let firstChar = value[0];
+ if (firstChar === "\"") {
+ return value.substring(1, value.length - 1);
+ }
+ let num = parseFloat(value);
+ return isNaN(num) ? value : num;
+}
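+
+// For example: parseValue("true") === true, parseValue("\"foo\"") === "foo",
+// parseValue("1.5") === 1.5, and parseValue("DT_FLOAT") === "DT_FLOAT".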
+
+/**
+ * Fetches a text file and returns a promise of the result.
+ */
+export function readPbTxt(filepath: string): Promise<string> {
+ return new Promise<string>(function(resolve, reject) {
+ d3.text(filepath, function(error, text) {
+ if (error) {
+ reject(error);
+ return;
+ }
+ resolve(text);
+ });
+ });
+}
+
+/**
+ * Fetches and parses a json file and returns a promise of the result.
+ */
+export function readJson(filepath: string): Promise<Object> {
+ return new Promise<Object>(function(resolve, reject) {
+ d3.json(filepath, function(error, text) {
+ if (error) {
+ reject(error);
+ return;
+ }
+ resolve(text);
+ });
+ });
+}
+
+/**
+ * Reads the graph and stats file (if available), parses them and returns a
+ * promise of the result.
+ */
+export function readAndParseData(dataset: {path: string, statsPath: string},
+ pbTxtContent: string, tracker: ProgressTracker):
+ Promise<{ nodes: TFNode[], statsJson: Object }|void> {
+ let graphPbTxt;
+ let statsJson;
+ return runAsyncTask("Reading graph.pbtxt", 20, () => {
+ return pbTxtContent || readPbTxt(dataset.path);
+ }, tracker)
+ .then(function(text) {
+ graphPbTxt = text;
+ return runAsyncTask("Reading stats.pbtxt", 20, () => {
+ return (dataset != null && dataset.statsPath != null) ?
+ readJson(dataset.statsPath) : null;
+ }, tracker);
+ })
+ .then(function(json) {
+ statsJson = json;
+ return runAsyncTask("Parsing graph.pbtxt", 60, () => {
+ return parsePbtxt(graphPbTxt);
+ }, tracker);
+ })
+ .then(function(nodes) {
+ return {
+ nodes: nodes,
+ statsJson: statsJson
+ };
+ })
+ .catch(function(reason) {
+ throw new Error("Failure parsing graph definition");
+ });
+}
+
+/**
+ * Parses a proto txt file into a javascript object.
+ *
+ * @param input The string contents of the proto txt file.
+ * @return The parsed object.
+ */
+export function parsePbtxt(input: string): TFNode[] {
+ let output: { [name: string]: any; } = { node: [] };
+ let stack = [];
+ let path: string[] = [];
+ let current: { [name: string]: any; } = output;
+
+ function splitNameAndValueInAttribute(line: string) {
+ let colonIndex = line.indexOf(":");
+ let name = line.substring(0, colonIndex).trim();
+ let value = parseValue(line.substring(colonIndex + 2).trim());
+ return {
+ name: name,
+ value: value
+ };
+ }
+
+ /**
+ * Since proto-txt doesn't explicitly say whether an attribute is repeated
+ * (an array) or not, we keep a hard-coded list of attributes that are known
+ * to be repeated. This list is used in parsing time to convert repeated
+ * attributes into arrays even when the attribute only shows up once in the
+ * object.
+ */
+ let ARRAY_ATTRIBUTES: {[attrPath: string] : boolean} = {
+ "node": true,
+ "node.input": true,
+ "node.attr": true,
+ "node.attr.value.list.type": true,
+ "node.attr.value.shape.dim": true,
+ "node.attr.value.tensor.string_val": true,
+ "node.attr.value.tensor.tensor_shape.dim": true
+ };
+
+ /**
+ * Adds a value, given the attribute name and the host object. If the
+ * attribute already exists, but is not an array, it will convert it to an
+ * array of values.
+ *
+ * @param obj The host object that holds the attribute.
+ * @param name The attribute name (key).
+ * @param value The attribute value.
+ * @param path A path that identifies the attribute. Used to check if
+ * an attribute is an array or not.
+ */
+ function addAttribute(obj: Object, name: string,
+ value: Object|string|number|boolean, path: string[]): void {
+ // We treat "node" specially since it is done so often.
+ let existingValue = obj[name];
+ if (existingValue == null) {
+ obj[name] = path.join(".") in ARRAY_ATTRIBUTES ? [value] : value;
+ } else if (Array.isArray(existingValue)) {
+ existingValue.push(value);
+ } else {
+ obj[name] = [existingValue, value];
+ }
+ }
+
+ // Run through the file a line at a time.
+ let startPos = 0;
+ while (startPos < input.length) {
+ let endPos = input.indexOf("\n", startPos);
+ if (endPos === -1) {
+ endPos = input.length;
+ }
+ let line = input.substring(startPos, endPos);
+ startPos = endPos + 1;
+ if (!line) {
+ continue;
+ }
+ switch (line[line.length - 1]) {
+ case "{": // create new object
+ let name = line.substring(0, line.length - 2).trim();
+ let newValue: { [name: string]: any; } = {};
+ stack.push(current);
+ path.push(name);
+ addAttribute(current, name, newValue, path);
+ current = newValue;
+ break;
+ case "}":
+ current = stack.pop();
+ path.pop();
+ break;
+ default:
+ let x = splitNameAndValueInAttribute(line);
+ addAttribute(current, x.name, x.value, path.concat(x.name));
+ break;
+ }
+ }
+
+ return output["node"];
+}
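+
+// Illustrative sketch: for the (hypothetical) pbtxt fragment
+//   node {
+//     name: "a"
+//     input: "b"
+//   }
+// parsePbtxt returns [{name: "a", input: ["b"]}]; "input" becomes an array
+// because "node.input" is listed in ARRAY_ATTRIBUTES, while "name" stays a
+// plain string.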
+
+} // Close module tf.graph.parser.
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/render.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/render.ts
new file mode 100644
index 0000000000..363f006fd5
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/render.ts
@@ -0,0 +1,1360 @@
+/// <reference path="graph.ts" />
+/// <reference path="hierarchy.ts" />
+
+/**
+ * Package for the Render Hierarchy for TensorFlow graph.
+ */
+
+module tf.graph.render {
+
+/**
+ * Color parameters for node encoding.
+ * @type {Object}
+ */
+export let MetanodeColors = {
+ SATURATION: 0.6,
+ LIGHTNESS: 0.85,
+ /**
+ * Neutral color to use when the node is expanded (used when coloring by
+ * compute time, memory and device).
+ */
+ EXPANDED_COLOR: "#f0f0f0",
+ /**
+ * Standard hue values for node color palette.
+ */
+ HUES: [220, 100, 180, 40, 20, 340, 260, 300, 140, 60],
+ STRUCTURE_PALETTE: function(id: number, lightened? : boolean) {
+ // The code below is a flexible way to computationally create a set
+ // of colors that go well together.
+ let hues = MetanodeColors.HUES;
+ let n = hues.length;
+ let hue = hues[id % n];
+ let m = Math.sin(hue * Math.PI / 360);
+ let sat = lightened ? 30 : 90 - 60 * m;
+ let light = lightened ? 95 : 80;
+ return d3.hsl(hue, .01 * sat, .01 * light).toString();
+ },
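+ // Worked example: for id 0 (not lightened) the hue is 220,
+ // m = sin(220 * PI / 360) is roughly 0.94, so sat is roughly 33.6 and
+ // light is 80, giving approximately d3.hsl(220, 0.34, 0.80).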
+ DEVICE_PALETTE: function (index: number): string {
+ return MetanodeColors.STRUCTURE_PALETTE(index);
+ },
+ UNKNOWN: "#eee",
+ GRADIENT_OUTLINE: "#888"
+};
+
+/**
+ * Parameters that affect how the graph is rendered on the screen.
+ */
+interface RenderGraphParams {
+ /**
+ * Whether to extract high degree nodes from the core part of the graph.
+ */
+ enableExtraction: boolean;
+ /**
+ * Maximum in-degree that a node can have without being considered a
+ * high in-degree node.
+ */
+ maxInDegree: number;
+ /**
+ * Maximum out-degree that a node can have without being considered a
+ * high out-degree node.
+ */
+ maxOutDegree: number;
+ /**
+ * Maximum number of control edges a node can have before they aren't
+ * displayed.
+ */
+ maxControlDegree: number;
+ /**
+ * Type patterns for predefined out-extract nodes, which are
+ * sink-like nodes that will be extracted from the main graph.
+ */
+ outExtractTypes: string[];
+
+ /**
+ * Type patterns for predefined in-extract nodes, which are
+ * source-like nodes that will be extracted from the main graph.
+ */
+ inExtractTypes: string[];
+
+ /**
+ * When removing edges from a high degree node, remove all of its edges if
+ * detachAllEdgesForHighDegree is true. Otherwise remove all in-edges if
+ * the node has high in-degree, or all out-edges if the node has high
+ * out-degree.
+ */
+ detachAllEdgesForHighDegree: boolean;
+
+ /**
+ * After extracting high in/out degree nodes and predefined
+ * source-like/sink-like nodes, extract isolated nodes to the side
+ * if extractIsolatedNodesWithAnnotationsOnOneSide is true.
+ */
+ extractIsolatedNodesWithAnnotationsOnOneSide: boolean;
+
+ /**
+ * Whether to add bridge nodes and edges to the core when building the
+ * subhierarchy of an expanded metanode. See buildSubhierarchy().
+ */
+ enableBridgegraph: boolean;
+
+ /**
+ * 2 colors, for the minimum and maximum value respectively, whenever we
+ * have a gradient scale.
+ */
+ minMaxColors: string[];
+
+ /**
+ * Maximum number of annotations to be displayed on a node.
+ */
+ maxAnnotations: number;
+}
+
+/**
+ * Stores the rendering information, such as x and y coordinates,
+ * for each node in the graph.
+ */
+export class RenderGraphInformation {
+ private hierarchy: hierarchy.Hierarchy;
+ private index: {[nodeName: string]: RenderNodeInformation};
+ private params: RenderGraphParams;
+ private deviceColorMap: d3.scale.Ordinal<string, string>;
+ private memoryUsageScale: d3.scale.Linear<string, string>;
+ private computeTimeScale: d3.scale.Linear<string, string>;
+ // Since the rendering information for each node is constructed lazily,
+ // upon a node's expansion by the user, we keep a map from the node's name
+ // to whether its rendering information has already been constructed.
+ private hasSubhierarchy: {[nodeName: string]: boolean};
+ root: RenderGroupNodeInformation;
+
+ constructor(hierarchy: hierarchy.Hierarchy, params: RenderGraphParams) {
+ this.hierarchy = hierarchy;
+ this.index = {};
+ this.deviceColorMap = d3.scale.ordinal<string>()
+ .domain(hierarchy.devices)
+ .range(_.map(d3.range(hierarchy.devices.length),
+ MetanodeColors.DEVICE_PALETTE));
+
+ let topLevelGraph = hierarchy.root.metagraph;
+ // Find the maximum and minimum memory usage.
+ let memoryExtent = d3.extent(topLevelGraph.nodes(),
+ (nodeName, index) => {
+ let node = topLevelGraph.node(nodeName);
+ // Some ops don't have stats at all.
+ if (node.stats != null) {
+ return node.stats.totalBytes;
+ }
+ });
+ this.memoryUsageScale = d3.scale.linear<string, string>()
+ .domain(memoryExtent)
+ .range(params.minMaxColors);
+
+ // Find also the minimum and maximum compute time.
+ let computeTimeExtent = d3.extent(topLevelGraph.nodes(), (nodeName, index) => {
+ let node = topLevelGraph.node(nodeName);
+ // Some ops don't have stats at all.
+ if (node.stats != null) {
+ return node.stats.totalMicros;
+ }
+ });
+ this.computeTimeScale = d3.scale.linear<string, string>()
+ .domain(computeTimeExtent)
+ .range(params.minMaxColors);
+
+ // Maps node name to whether the rendering hierarchy was already constructed.
+ this.hasSubhierarchy = {};
+ this.params = params;
+ this.root = new RenderGroupNodeInformation(hierarchy.root);
+ this.index[hierarchy.root.name] = this.root;
+ this.buildSubhierarchy(hierarchy.root.name);
+ this.root.expanded = true;
+ }
+
+ getRenderNodeByName(nodeName: string): RenderNodeInformation {
+ return this.index[nodeName];
+ }
+
+ /**
+ * Return the nearest ancestor node, including itself, that is visible
+ * in the visualization. This method is used so that we can select
+ * (highlight) a node that isn't drawn yet, by selecting (highlighting)
+ * its nearest ancestor that has been drawn.
+ */
+ getNearestVisibleAncestor(name: string): string {
+ let path = getHierarchicalPath(name);
+ for (let i = 0; i < path.length; i++) {
+ let nodeName = path[i];
+ // Op nodes have expanded set to false by default.
+ if (!this.getRenderNodeByName(nodeName).expanded) {
+ return nodeName;
+ }
+ }
+ // Fallthrough. If everything was expanded return the node.
+ return name;
+ }
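+
+ // For example (hypothetical names): querying "a/b/c" when only "a" is
+ // expanded returns "a/b", the deepest ancestor that is currently drawn.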
+
+ // TODO(jimbo): Delete this and any code it touches (all deprecated).
+ setDepth(depth: number): void {
+ setGroupNodeDepth(this.root, +depth);
+ }
+
+ buildSubhierarchy(nodeName: string): void {
+ // Terminate if the rendering hierarchy was already constructed
+ // for this node.
+ if (nodeName in this.hasSubhierarchy) {
+ return;
+ }
+
+ let renderNodeInfo = this.index[nodeName];
+
+ // If it is not a meta node or a series node, don't do anything.
+ if (renderNodeInfo.node.type !== NodeType.META &&
+ renderNodeInfo.node.type !== NodeType.SERIES) {
+ return;
+ }
+
+ // At this point we know the rendering information is about a group node.
+ let renderGroupNodeInfo = <RenderGroupNodeInformation> renderNodeInfo;
+ let metagraph = renderGroupNodeInfo.node.metagraph;
+ let coreGraph = renderGroupNodeInfo.coreGraph;
+
+ // Create render nodes to represent each child from the metagraph. Although
+ // these will initially be added to the coreGraph, they may later be
+ // extracted. Also, due to extraction, the coreGraph may contain disjoint
+ // groups between which there is no visible path (other than annotations).
+ _.each(metagraph.nodes(), childName => {
+
+ let childNode = metagraph.node(childName);
+ let childRenderInfo = childNode.isGroupNode ?
+ new RenderGroupNodeInformation(<GroupNode>childNode) :
+ new RenderNodeInformation(childNode);
+ this.index[childName] = childRenderInfo;
+ coreGraph.setNode(childName, childRenderInfo);
+
+ if (childRenderInfo.node.stats != null) {
+ childRenderInfo.memoryColor =
+ this.memoryUsageScale(childRenderInfo.node.stats.totalBytes);
+ childRenderInfo.computeTimeColor =
+ this.computeTimeScale(childRenderInfo.node.stats.totalMicros);
+ }
+
+ if (!childNode.isGroupNode) {
+ _.each((<OpNode>childNode).inEmbeddings, embedding => {
+ let renderMetaedgeInfo = new RenderMetaedgeInformation(null);
+ addInAnnotation(childRenderInfo, embedding, null, renderMetaedgeInfo,
+ AnnotationType.CONSTANT, this.params);
+ this.index[embedding.name] = new RenderNodeInformation(embedding);
+ });
+ _.each((<OpNode>childNode).outEmbeddings, embedding => {
+ let renderMetaedgeInfo = new RenderMetaedgeInformation(null);
+ addOutAnnotation(childRenderInfo, embedding, null, renderMetaedgeInfo,
+ AnnotationType.SUMMARY, this.params);
+ this.index[embedding.name] = new RenderNodeInformation(embedding);
+ });
+ let device = (<OpNode>childRenderInfo.node).device;
+ if (device != null) {
+ childRenderInfo.deviceColors = [{
+ color: this.deviceColorMap(device),
+ proportion: 1.0
+ }];
+ }
+ } else {
+ // Make a list of tuples (device, proportion), where proportion
+ // is the fraction of op nodes that have that device.
+ let pairs = _.pairs((<GroupNode> childNode).deviceHistogram);
+ if (pairs.length > 0) {
+ // Compute the total # of devices.
+ let numDevices = _.sum(pairs, _.last);
+ childRenderInfo.deviceColors = _.map(pairs, pair => {
+ return {
+ color: this.deviceColorMap(pair[0]),
+ // Normalize to a proportion of total # of devices.
+ proportion: pair[1] / numDevices
+ };
+ });
+ }
+ }
+ });
+
+ // Add render metaedge info for edges in the metagraph.
+ _.each(metagraph.edges(), edgeObj => {
+ let metaedge = metagraph.edge(edgeObj);
+ let renderMetaedgeInfo = new RenderMetaedgeInformation(metaedge);
+ coreGraph.setEdge(edgeObj.v, edgeObj.w, renderMetaedgeInfo);
+ });
+
+ if (this.params.enableExtraction &&
+ renderGroupNodeInfo.node.type === NodeType.META) {
+ extractHighDegrees(renderGroupNodeInfo, this.params);
+ }
+
+ // Record that we constructed the rendering hierarchy for this node, so we
+ // don't construct it another time.
+ this.hasSubhierarchy[nodeName] = true;
+
+ // Look up the parent node's render information and short circuit if none.
+ let parentNode = renderGroupNodeInfo.node.parentNode;
+ if (!parentNode) {
+ return;
+ }
+ let parentNodeInfo =
+ <RenderGroupNodeInformation> this.index[parentNode.name];
+
+ // Utility function for computing the name of a bridge node.
+ let getBridgeNodeName = (inbound, ...rest) =>
+ rest.concat([inbound ? "IN" : "OUT"]).join("~~");
+
+ // Build out the bridgegraph.
+ let bridgegraph = this.hierarchy.getBridgegraph(nodeName);
+
+ // Look for popular nodes so we can make annotations instead of paths.
+ let otherCounts = {
+ // Counts of edges coming INTO other nodes by name (outgoing from self).
+ in: <{[nodeName: string]: number}> {},
+ // Counts of edges going OUT from other nodes by name (coming into self).
+ out: <{[nodeName: string]: number}> {},
+ // Counts of all control edges involving other nodes by name.
+ control: <{[nodeName: string]: number}> {},
+ };
+ _.each(bridgegraph.edges(), e => {
+ // An edge is inbound if its destination node is in the metagraph.
+ let inbound = !!metagraph.node(e.w);
+ let otherName = inbound ? e.v : e.w;
+ let metaedge = bridgegraph.edge(e);
+ if (!metaedge.numRegularEdges) {
+ otherCounts.control[otherName] =
+ (otherCounts.control[otherName] || 0) + 1;
+ } else if (inbound) {
+ otherCounts.out[otherName] = (otherCounts.out[otherName] || 0) + 1;
+ } else {
+ otherCounts.in[otherName] = (otherCounts.in[otherName] || 0) + 1;
+ }
+ });
+
+ // Add annotations and edges for bridgegraph relationships.
+ let hierarchyNodeMap = this.hierarchy.getNodeMap();
+ _.each(bridgegraph.edges(), bridgeEdgeObj => {
+ let bridgeMetaedge = bridgegraph.edge(bridgeEdgeObj);
+
+ // Determine whether this bridge edge is incoming by checking the
+ // metagraph for a node that matches the destination end.
+ let inbound = !!metagraph.node(bridgeEdgeObj.w);
+
+ // Based on the direction of the edge, one endpoint will be an immediate
+ // child of this renderNodeInfo, and the other endpoint will be a sibling
+ // of the parent (or an ancestor further up).
+ let [childName, otherName] =
+ inbound ?
+ [bridgeEdgeObj.w, bridgeEdgeObj.v] :
+ [bridgeEdgeObj.v, bridgeEdgeObj.w];
+
+ let childRenderInfo = this.index[childName];
+ let otherRenderInfo = this.index[otherName];
+ let otherNode =
+ otherRenderInfo ?
+ otherRenderInfo.node :
+ hierarchyNodeMap[otherName];
+
+ // Determine whether this edge is a control edge between nodes where
+ // either node is high-degree with respect to control edges. This will
+ // be a signal to show it as an annotation instead of a bridge edge.
+ let isHighDegreeControlEdge = !bridgeMetaedge.numRegularEdges &&
+ otherCounts.control[otherName] > this.params.maxControlDegree;
+
+ let [annotations, childAnnotations] =
+ inbound ?
+ [renderNodeInfo.inAnnotations, childRenderInfo.inAnnotations] :
+ [renderNodeInfo.outAnnotations, childRenderInfo.outAnnotations];
+
+ let isOtherHighDegree =
+ inbound ?
+ otherCounts.out[otherName] > this.params.maxOutDegree :
+ otherCounts.in[otherName] > this.params.maxInDegree;
+
+ // The adjoining render metaedge info from the parent's coreGraph, if any.
+ // It will either be a Metaedge involving this node directly, if it
+ // previously came from a metagraph, or it'll be a Metaedge involving
+ // a previously created bridge node standing in for the other node.
+ let adjoiningMetaedge = null;
+
+ // We can only hope to render a bridge path if:
+ // - bridgegraph paths are enabled,
+ // - the other node is not too high-degree,
+ // - the child is in the core (not extracted for being high-degree), and
+ // - there's a path (in the traversal sense) between child and other.
+ let canDrawBridgePath = false;
+ if (this.params.enableBridgegraph &&
+ !isOtherHighDegree &&
+ !isHighDegreeControlEdge &&
+ childRenderInfo.isInCore()) {
+
+ // Utility function for finding an adjoining metaedge.
+ let findAdjoiningMetaedge = targetName => {
+ let adjoiningEdgeObj: graphlib.EdgeObject =
+ inbound ?
+ { v: targetName, w: nodeName } :
+ { v: nodeName, w: targetName };
+ return <RenderMetaedgeInformation>
+ parentNodeInfo.coreGraph.edge(adjoiningEdgeObj);
+ };
+
+ adjoiningMetaedge = findAdjoiningMetaedge(otherName);
+ if (!adjoiningMetaedge) {
+ adjoiningMetaedge = findAdjoiningMetaedge(
+ getBridgeNodeName(inbound, otherName, parentNode.name));
+ }
+
+ canDrawBridgePath = !!adjoiningMetaedge;
+ }
+
+ // Although dataflow edges are acyclic, control dependency edges may
+ // actually point "backwards" in the graph. If this bridgeMetaedge is
+ // a control dependency, we need to determine whether it's backwards
+ // pointing so that we render it appropriately.
+ //
+ // For instance, say we're rendering a graph with nodes named A/B and Z/Y,
+ // and we're currently rendering the bridgegraph for A. Further, let's say
+ // that there was an original BaseEdge from A/B->Z/Y and a CONTROL EDGE
+ // from Z/Y=>A/B.
+ //
+ // +----------------+
+ // | A |
+ // | +-----+ | +------+
+ // | | B |>----->|>------->| Z |
+ // | | | | | |
+ // | | | * | | |
+ // | | |<=====<|<=======<| |
+ // | +-----+ | +------+
+ // +----------------+
+ //
+ // When we render the subhierarchy for Metanode A, we'll come across a
+ // control-only Metaedge in the bridgegraph from Z=>A/B (*). The question
+ // is whether this edge is backwards.
+ //
+ // To answer that question, we follow the chain of adjoining metaedges
+ // until we reach the topmost one. In this case, that's the control-only
+ // Metaedge Z=>A in the ROOT's metagraph. We determine that this edge
+ // is backwards by looking at the topological ordering of ROOT's metagraph
+ // (which ignores control edges) and seeing that Z comes AFTER A.
+ //
+ // The property of being backwards is independent of whether the edge
+ // is inbound or outbound. In the preceding example, if we were building
+ // the subhierarchy for Z, we'd find bridge edge Z/Y=>A, walk to its
+ // topmost adjoining metaedge Z=>A and discover that it's backwards.
+ let backwards = false;
+ if (adjoiningMetaedge && !bridgeMetaedge.numRegularEdges) {
+ // Find the top-most adjoining render metaedge information, and the
+ // GroupNode whose metagraph must contain the associated metaedge.
+ let topAdjoiningMetaedge = adjoiningMetaedge;
+ let topGroupNode = parentNodeInfo.node;
+ while (topAdjoiningMetaedge.adjoiningMetaedge) {
+ topAdjoiningMetaedge = topAdjoiningMetaedge.adjoiningMetaedge;
+ topGroupNode = <GroupNode>topGroupNode.parentNode;
+ }
+
+ // Check against the topological ordering for the top node. The current
+ // bridge metaedge we're evaluating is backwards if its source comes
+ // after its destination.
+ let ordering = this.hierarchy.getTopologicalOrdering(topGroupNode.name);
+ let e = topAdjoiningMetaedge.metaedge;
+ backwards = ordering[e.v] > ordering[e.w];
+ }
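+ // In the A/B and Z/Y example above, the topmost adjoining metaedge is the
+ // control-only Z=>A edge in ROOT's metagraph; since Z comes after A in that
+ // ordering, `backwards` ends up true.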
+
+ // Render backwards control edges as annotations.
+ canDrawBridgePath = canDrawBridgePath && !backwards;
+
+ // If we can't make a bridge path for any reason, then we add an
+ // annotation instead.
+ if (!canDrawBridgePath) {
+ childAnnotations.push(new Annotation(
+ otherNode,
+ otherRenderInfo,
+ new RenderMetaedgeInformation(bridgeMetaedge),
+ AnnotationType.SHORTCUT,
+ inbound), this.params);
+ return;
+ }
+
+ // At this point, all conditions have been met for drawing a bridge path.
+
+ // Find or create the IN/OUT node representing otherNode.
+ let bridgeContainerName = getBridgeNodeName(inbound, nodeName);
+ let bridgeNodeName = getBridgeNodeName(inbound, otherName, nodeName);
+ let bridgeNodeRenderInfo = coreGraph.node(bridgeNodeName);
+ if (!bridgeNodeRenderInfo) {
+
+ // Find or create the directional container for the bridge node.
+ let bridgeContainerInfo = coreGraph.node(bridgeContainerName);
+ if (!bridgeContainerInfo) {
+ let bridgeContainerNode: BridgeNode = {
+ // Important node properties.
+ name: bridgeContainerName,
+ type: NodeType.BRIDGE,
+ // Unused node properties.
+ isGroupNode: false,
+ cardinality: 0,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ bridgeContainerInfo =
+ new RenderNodeInformation(bridgeContainerNode);
+ this.index[bridgeContainerName] = bridgeContainerInfo;
+ coreGraph.setNode(bridgeContainerName, bridgeContainerInfo);
+ }
+
+ let bridgeNode: BridgeNode = {
+ // Important node properties.
+ name: bridgeNodeName,
+ type: NodeType.BRIDGE,
+ // Unimportant node properties.
+ isGroupNode: false,
+ cardinality: 1,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ bridgeNodeRenderInfo = new RenderNodeInformation(bridgeNode);
+ this.index[bridgeNodeName] = bridgeNodeRenderInfo;
+ coreGraph.setNode(bridgeNodeName, bridgeNodeRenderInfo);
+
+ // Set bridgeNode to be a graphlib child of the container node.
+ coreGraph.setParent(bridgeNodeName, bridgeContainerName);
+ bridgeContainerInfo.node.cardinality++;
+ }
+
+ // Create and add a bridge render metaedge.
+ let bridgeRenderMetaedge =
+ new RenderMetaedgeInformation(bridgeMetaedge);
+ bridgeRenderMetaedge.adjoiningMetaedge = adjoiningMetaedge;
+ inbound ?
+ coreGraph.setEdge(bridgeNodeName, childName, bridgeRenderMetaedge) :
+ coreGraph.setEdge(childName, bridgeNodeName, bridgeRenderMetaedge);
+
+ }); // End _.each(bridgegraph.edges).
+
+ // For each bridge container (IN and/or OUT), add structural edges between
+ // terminal nodes and that container. A terminal node is one which has no
+ // non-bridge edges in the direction of the container.
+ //
+ // For example, consider a Metanode A which contains two child nodes A/B
+ // and A/C. Let's say it has one edge in the metagraph from A/B->A/C, and
+ // one edge in the bridgegraph from Z->A/C.
+ //
+ // At this point, we've added a container bridge node IN to house all
+ // incoming bridge nodes. We've also added a bridge node Z' (with parent IN)
+ // to A, and a bridge edge from Z'->C.
+ //
+ // +----------------------+
+ // | A +---+ |
+ // | +------>| C | |
+ // | | +---+ |
+ // | | ^ |
+ // | | | |
+ // | | +----|----+ |
+ // | | | IN | | |
+ // | +---+ | +---+ | |
+ // | | B | | | Z'| | |
+ // | +---+ | +---+ | |
+ // | +---------+ |
+ // +----------------------+
+ //
+ // With no other help, dagre would lay out B and Z' on the same level,
+ // because both of them have no incoming edges. In other words, B is a
+ // terminal node in the INCOMING direction.
+ //
+ // But we want to force dagre to lay out Z' (and everything in IN) lower
+ // than all non-bridge nodes, so that there's enough room for the bridge
+ // edges after they've been adjusted to meet up with paths coming in from
+ // outside.
+ //
+ // To force Z' (and all other bridge nodes) to be lowest in the graph, we
+ // identify terminal nodes like B and give them structural edges to
+ // a new structural bridge node S which we add to IN.
+ //
+ // +----------------------+
+ // | A +---+ |
+ // | +--->| C | |
+ // | | +---+ |
+ // | +---+ ^ |
+ // | | B | | |
+ // | +---+ | |
+ // | ^ | |
+ // | | | |
+ // | +----|------|----+ |
+ // | |IN | | | |
+ // | | +---+ +---+ | |
+ // | | | S | | Z'| | |
+ // | | +---+ +---+ | |
+ // | +----------------+ |
+ // +----------------------+
+ //
+ // This ensures that dagre will lay out the bridge containers strictly at
+ // the ends of the graph. The structural edges will never be seen in the
+ // visualization except as a debugging aid.
+ _.each([true, false], inbound => {
+ let bridgeContainerName = getBridgeNodeName(inbound, nodeName);
+ let bridgeContainerInfo = coreGraph.node(bridgeContainerName);
+ if (!bridgeContainerInfo) {
+ return;
+ }
+ _.each(coreGraph.nodes(), childName => {
+ // Short-circuit if this child is a bridge node or it's not a terminal
+ // node in the direction we're interested in.
+ let childNodeInfo = coreGraph.node(childName);
+ if (childNodeInfo.node.type === NodeType.BRIDGE) {
+ return;
+ }
+ let isTerminal = inbound ?
+ !coreGraph.predecessors(childName).length :
+ !coreGraph.successors(childName).length;
+ if (!isTerminal) {
+ return;
+ }
+
+ // Find or create a bridge node in the container for all structural
+ // metaedges. It would have been nice to skip this step and simply
+ // set a metaedge between the terminal node and the container node, but
+ // in that case, something about the graph upsets dagre.layout()'s
+ // longestPath algorithm (it was producing errors due to an undefined value).
+ let structuralNodeName =
+ getBridgeNodeName(inbound, nodeName, "STRUCTURAL_TARGET");
+ let structuralRenderInfo = coreGraph.node(structuralNodeName);
+ if (!structuralRenderInfo) {
+ let bridgeNode: BridgeNode = {
+ // Important Node properties.
+ name: structuralNodeName,
+ type: NodeType.BRIDGE,
+ // Unimportant Node properties.
+ isGroupNode: false,
+ cardinality: 1,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ structuralRenderInfo = new RenderNodeInformation(bridgeNode);
+ structuralRenderInfo.structural = true;
+ this.index[structuralNodeName] = structuralRenderInfo;
+ coreGraph.setNode(structuralNodeName, structuralRenderInfo);
+ bridgeContainerInfo.node.cardinality++;
+ coreGraph.setParent(structuralNodeName, bridgeContainerName);
+ }
+
+ // Create the structural Metaedge and insert it.
+ let structuralMetaedgeInfo = new RenderMetaedgeInformation(null);
+ structuralMetaedgeInfo.structural = true;
+ structuralMetaedgeInfo.weight--; // Reduce weight for dagre layout.
+ inbound ?
+ coreGraph.setEdge(
+ structuralNodeName, childName, structuralMetaedgeInfo) :
+ coreGraph.setEdge(
+ childName, structuralNodeName, structuralMetaedgeInfo);
+ });
+ });
+ }
+}
+
+/**
+ * A class that holds rendering information for an annotation: the label of
+ * the node embedded as the annotation, the type of the annotation, and the
+ * location of both the annotation's node and its edge.
+ *
+ * Annotation objects include embedded constants, embedded summary, and
+ * edge shortcuts.
+ */
+export class Annotation {
+ node: Node;
+ renderNodeInfo: RenderNodeInformation;
+ renderMetaedgeInfo: RenderMetaedgeInformation;
+ annotationType: AnnotationType;
+ /**
+ * Center position of annotation relative to the host
+ * node's center x.
+ */
+ dx: number;
+ /**
+ * Center position of annotation relative to the host
+ * node's center y.
+ */
+ dy: number;
+ width: number;
+ height: number;
+ /**
+ * True if this is an in-annotation; false if it is an out-annotation.
+ */
+ isIn: boolean;
+ /** Label horizontal offset from the end of the node shape */
+ labelOffset: number;
+ /**
+ * Array of points for edges from the annotation to its host
+ * node. Each point contains the point location, relative to
+ * the host node's center.
+ */
+ points: {dx: number, dy: number}[];
+
+ /**
+ * Creates a new Annotation.
+ *
+ * @param node The underlying node this annotation points to.
+ * @param renderNodeInfo The render information for the underlying node
+ * this annotation points to. This can be null if the annotation
+ * denotes an embedding (constant, summary), in which case we
+ * use the node property.
+ * @param renderMetaedgeInfo The render information for the edge associated
+ * with the annotation.
+ * @param type The type of the annotation.
+ * @param isIn True if it is an in-annotation. False if it is an
+ * out-annotation.
+ */
+ constructor(node: Node, renderNodeInfo: RenderNodeInformation,
+ renderMetaedgeInfo: RenderMetaedgeInformation, type: AnnotationType,
+ isIn: boolean) {
+ this.node = node;
+ this.renderNodeInfo = renderNodeInfo;
+ this.renderMetaedgeInfo = renderMetaedgeInfo;
+ this.annotationType = type;
+ // Properties specified by layout
+ this.dx = 0;
+ this.dy = 0;
+ this.width = 0;
+ this.height = 0;
+
+ this.isIn = isIn;
+ this.points = [];
+ }
+};
+
+export enum AnnotationType {SHORTCUT, CONSTANT, SUMMARY, ELLIPSIS};
+
+/**
+ * Manages a list of annotations. Two will be used for each
+ * RenderNodeInformation, one for in annotations and one for out annotations.
+ */
+export class AnnotationList {
+ /**
+ * List of visually drawable annotations; may include an ellipsis annotation
+ * if the number added exceeds the number specified by maxAnnotations.
+ */
+ list: Annotation[];
+
+ /**
+ * Set of nodes which have been added as annotations to this list, so we can
+ * prevent duplicates.
+ */
+ nodeNames: { [nodeName: string]: boolean };
+
+ constructor() {
+ this.list = [];
+ this.nodeNames = {};
+ }
+
+ /**
+ * Append an annotation to the list, or a stand-in ellipsis annotation if
+ * adding it would exceed maxAnnotations.
+ */
+ push(annotation: Annotation, params: RenderGraphParams): void {
+ if (annotation.node.name in this.nodeNames) {
+ return; // Skip duplicate annotation.
+ }
+ this.nodeNames[annotation.node.name] = true;
+
+ if (this.list.length < params.maxAnnotations) {
+ this.list.push(annotation);
+ return;
+ }
+
+ let lastAnnotation = this.list[this.list.length - 1];
+ if (lastAnnotation.annotationType === AnnotationType.ELLIPSIS) {
+ let ellipsisNode = <EllipsisNode>lastAnnotation.node;
+ ellipsisNode.setNumMoreNodes(++ellipsisNode.numMoreNodes);
+ return;
+ }
+
+ let ellipsisNode = new tf.graph.EllipsisNodeImpl(1);
+ this.list.push(new Annotation(ellipsisNode,
+ new RenderNodeInformation(ellipsisNode), null,
+ AnnotationType.ELLIPSIS, annotation.isIn));
+ }
+}
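+
+// A sketch of the overflow behavior, assuming params.maxAnnotations is 2 and
+// each pushed annotation references a distinct node: the first two calls to
+// push() append normally, the third appends an ELLIPSIS annotation (rendered
+// as "... and 1 more"), and every later call just increments that count.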
+
+/**
+ * Contains rendering information about a node in the hierarchical graph.
+ */
+export class RenderNodeInformation {
+ /** Reference to the original underlying Node from the hierarchical graph. */
+ node: Node;
+ /** Whether the node is expanded or not. */
+ expanded: boolean;
+ /**
+ * List of rendering information about in-annotations like constants and
+ * shortcuts to high-degree nodes.
+ */
+ inAnnotations: AnnotationList;
+ /** List of rendering information about out-annotations (e.g. summary nodes) */
+ outAnnotations: AnnotationList;
+
+ // --- Params specified by layout --- //
+
+ /** Center x position */
+ x: number;
+ /** Center y position */
+ y: number;
+ /** Width of the node's shape */
+ width: number;
+ /** Height of the node's shape. */
+ height: number;
+ /** Width of the bounding box for all in-annotations. */
+ inboxWidth: number;
+ /** Width of the bounding box for all out-annotations. */
+ outboxWidth: number;
+ /**
+ * Whether the node should be excluded from the scene.
+ * This is only used when there are too many items in a series so we only
+ * want to include top N ones.
+ */
+ // TODO(jimbo): Now that series rendering is non-recursive, remove this and
+ // all its uses from the code base.
+ excluded: boolean;
+
+ // --- Params used in drawing the bridge paths --- //
+
+ /**
+ * All bridge nodes are meant to be invisible, but whereas most represent a
+ * relationship from the underlying graph hierarchy, some exist solely for
+ * layout reasons. Specifically, those bridge nodes which have only structural
+ * rendering metaedges.
+ */
+ structural: boolean;
+
+ // --- Params for the size of the node box --- //
+
+ /** Label vertical offset from the center of node shape */
+ labelOffset: number;
+ /** X-space between each extracted node and the core graph. */
+ extractXOffset: number;
+ /** Rectangle radius (for making rounded rectangle) */
+ radius: number;
+
+ // --- Params for expanded node --- //
+
+ /** Label height for expanded node. */
+ labelHeight: number;
+ // Paddings between inner subscene and the border of the expanded node.
+ paddingTop: number;
+ paddingLeft: number;
+ paddingRight: number;
+ paddingBottom: number;
+
+ /** Width of the whole node including its shape and its annotations */
+ outerWidth: number;
+ /** Height of the whole node including its shape and its annotations */
+ outerHeight: number;
+ /**
+ * Whether a node is extracted as source-like (having high out-degree or matching
+ * predefined in-extract pattern.)
+ */
+ isInExtract: boolean;
+ /**
+ * Whether a node is extracted as sink-like (having high in-degree or matching
+ * predefined out-extract pattern.)
+ */
+ isOutExtract: boolean;
+
+ /**
+ * List of (color, proportion) tuples based on the proportion of devices of
+ * its children. If this node is an op node, this list will have only one
+ * color with proportion 1.0.
+ */
+ deviceColors: {color: string, proportion: number}[];
+
+ /**
+ * Color according to the memory usage of this node.
+ */
+ memoryColor: string;
+
+ /**
+ * Color according to the compute time of this node.
+ */
+ computeTimeColor: string;
+
+ constructor(node: Node) {
+ this.node = node;
+ this.expanded = false;
+ this.inAnnotations = new AnnotationList();
+ this.outAnnotations = new AnnotationList();
+ // Params specified by layout
+ this.x = 0;
+ this.y = 0;
+ this.width = 0;
+ this.height = 0;
+ this.inboxWidth = 0;
+ this.outboxWidth = 0;
+
+ this.excluded = false;
+
+ // Params for bridge paths.
+ this.structural = false;
+
+ // Params for node box.
+ this.labelOffset = 0;
+ this.extractXOffset = 0;
+ this.radius = 0;
+
+ // Params for expanded node
+ this.labelHeight = 0;
+ this.paddingTop = 0;
+ this.paddingLeft = 0;
+ this.paddingRight = 0;
+ this.paddingBottom = 0;
+
+ this.outerWidth = 0;
+ this.outerHeight = 0;
+ this.isInExtract = false;
+ this.isOutExtract = false;
+ }
+
+ isInCore(): boolean {
+ return !this.isInExtract && !this.isOutExtract;
+ }
+}
+
+/**
+ * Contains rendering information about a Metaedge from the underlying
+ * hierarchical graph. It may be from either a metagraph or a bridgegraph.
+ */
+export class RenderMetaedgeInformation {
+ /**
+ * Reference to the original underlying Metaedge from the hierarchical graph,
+ * if any. This will be null for the edges which connect OpNodes to their
+ * embeddings, for example.
+ */
+ metaedge: Metaedge;
+
+ /**
+ * Reference to the adjoining RenderMetaedgeInformation from the parent's
+ * coreGraph. This is used during layout to determine the point at which this
+ * edge should touch the node's bounding box. This property will be null for
+ * edges which terminate at a node on both ends (all non-bridge edges).
+ */
+ adjoiningMetaedge: RenderMetaedgeInformation;
+
+ /**
+ * Most of the time, a RenderMetaedgeInformation object represents a real
+ * edge between nodes in the underlying graph structure. But sometimes, an
+ * edge only exists for layout purposes. These structural edges are added
+ * during buildSubhierarchy() to force dagre.layout() to put bridge nodes
+ * at the ends of the flow.
+ * @see buildSubhierarchy()
+ */
+ structural: boolean;
+
+ /**
+ * Weight of the edge, used by dagre when deciding how important an edge is.
+ * Edges with higher weight are made shorter and straighter. The default
+ * dagre uses is 1.
+ */
+ weight: number;
+
+ /**
+ * X and Y coordinate pairs of the points in the path of the edge.
+ * @see tf.graph.node.subsceneAdjustPaths
+ */
+ points: any[];
+
+ /**
+ * D3 selection of the group containing the path that displays this edge.
+ */
+ edgeGroup: d3.Selection<RenderMetaedgeInformation>;
+
+ constructor(metaedge: Metaedge) {
+ this.metaedge = metaedge;
+ this.adjoiningMetaedge = null;
+ this.structural = false;
+ this.weight = 1;
+ }
+}
+
+function addInAnnotation(node: RenderNodeInformation, predecessor: Node,
+ predecessorRenderInfo: RenderNodeInformation, edge: any,
+ type: AnnotationType, params: RenderGraphParams): void {
+ let annotation = new Annotation(predecessor, predecessorRenderInfo, edge,
+ type, true);
+ node.inAnnotations.push(annotation, params);
+}
+
+function addOutAnnotation(node: RenderNodeInformation, successor: Node,
+ successorRenderInfo: RenderNodeInformation, edge: any,
+ type: AnnotationType, params: RenderGraphParams): void {
+ let annotation = new Annotation(successor, successorRenderInfo, edge,
+ type, false);
+ node.outAnnotations.push(annotation, params);
+}
+
+function setGraphDepth(graph: graphlib.Graph<RenderNodeInformation, any>,
+ depth: number) {
+ _.each(graph.nodes(), nodeName => {
+ let child = graph.node(nodeName);
+ child.expanded = depth > 1; // Collapse all children at depth 1.
+ if (depth > 0) {
+ switch (child.node.type) {
+ case NodeType.META:
+ case NodeType.SERIES:
+ setGroupNodeDepth(<RenderGroupNodeInformation>child, depth - 1);
+ break;
+ // Do nothing for leaf
+ }
+ }
+ });
+};
+
+export class RenderGroupNodeInformation extends RenderNodeInformation {
+ node: GroupNode;
+ /**
+ * The core graph is derived from the underlying node's metagraph, minus
+ * the extracted source-like and sink-like nodes.
+ */
+ coreGraph: graphlib.Graph<RenderNodeInformation, RenderMetaedgeInformation>;
+ /** Size of the bounding box for a metanode's core graph. */
+ coreBox: {
+ width: number,
+ height: number,
+ };
+ /** Size of the bounding box for a metanode's isolated in-extract children. */
+ inExtractBox: {width: number, height: number};
+ /** Size of the bounding box for a metanode's isolated out-extract children. */
+ outExtractBox: {width: number, height: number};
+ /** Array of isolated in-extract nodes. */
+ isolatedInExtract: RenderNodeInformation[];
+ /** Array of isolated out-extract nodes. */
+ isolatedOutExtract: RenderNodeInformation[];
+
+ constructor(groupNode: GroupNode) {
+ super(groupNode);
+ let metagraph = groupNode.metagraph;
+ let gl = metagraph.graph();
+ this.coreGraph =
+ createGraph<RenderNodeInformation, RenderMetaedgeInformation>(
+ gl.name, GraphType.CORE, { compound: true });
+ this.coreBox = {width: 0, height: 0};
+ this.inExtractBox = {width: 0, height: 0};
+ this.outExtractBox = {width: 0, height: 0};
+ this.isolatedInExtract = [];
+ this.isolatedOutExtract = [];
+ }
+}
+
+function setGroupNodeDepth(renderInfo: RenderGroupNodeInformation,
+ depth: number): void {
+ if (renderInfo.coreGraph) {
+ setGraphDepth(renderInfo.coreGraph, depth);
+ }
+}
+
+/**
+ * Remove an edge from the graph and add annotations to both ends of the edge.
+ *
+ * @param graph The core graph.
+ * @param v Source name.
+ * @param w Sink name.
+ */
+function createShortcut(graph: graphlib.Graph<RenderNodeInformation, {}>,
+ v: string, w: string, params: RenderGraphParams) {
+ let src = graph.node(v);
+ let sink = graph.node(w);
+ let edge = graph.edge(v, w);
+
+ // Add each annotation.
+ addOutAnnotation(src, sink.node, sink, edge, AnnotationType.SHORTCUT, params);
+ addInAnnotation(sink, src.node, src, edge, AnnotationType.SHORTCUT, params);
+
+ // Remove the edge from the core graph.
+ graph.removeEdge(v, w);
+}
+
+/**
+ * Remove edges from a node, set its isOutExtract property to true, and once
+ * it has no remaining edges, remove it from the core graph and move it to
+ * isolatedOutExtract.
+ *
+ * If detachAllEdgesForHighDegree is true, extract all of its edges.
+ * Otherwise, only extract its in-edges.
+ */
+function makeOutExtract(renderNode: RenderGroupNodeInformation, n: string,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+
+ graph.node(n).isOutExtract = true;
+
+ _.each(graph.predecessors(n), (p, index) => {
+ createShortcut(graph, p, n, params);
+ });
+
+ if (params.detachAllEdgesForHighDegree) {
+ _.each(graph.successors(n), (s, index) => {
+ createShortcut(graph, n, s, params);
+ });
+ }
+
+ if (params.detachAllEdgesForHighDegree || graph.neighbors(n).length === 0) {
+ renderNode.isolatedOutExtract.push(graph.node(n));
+ graph.removeNode(n);
+ }
+}
+
+/**
+ * Remove edges from a node, set its isInExtract property to true, and once
+ * it has no remaining edges, remove it from the core graph and move it to
+ * isolatedInExtract.
+ * If detachAllEdgesForHighDegree is true, extract all of its edges.
+ * Otherwise, only extract its out-edges.
+ */
+function makeInExtract(renderNode: RenderGroupNodeInformation, n: string,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+ graph.node(n).isInExtract = true;
+
+ _.each(graph.successors(n), (s, index) => {
+ createShortcut(graph, n, s, params);
+ });
+
+ if (params.detachAllEdgesForHighDegree) {
+ _.each(graph.predecessors(n), (p, index) => {
+ createShortcut(graph, p, n, params);
+ });
+ }
+
+ // Remove the node from the core graph if conditions are met.
+ if (params.detachAllEdgesForHighDegree || graph.neighbors(n).length === 0) {
+ renderNode.isolatedInExtract.push(graph.node(n));
+ graph.removeNode(n);
+ }
+}
+
+/**
+ * Check whether the node's op (or, for a metanode, its root op) matches one
+ * of the given op types.
+ *
+ * @param node Node.
+ * @param types List of op types to match.
+ */
+function hasTypeIn(node: Node, types: string[]): boolean {
+ if (node.type === NodeType.OP) {
+ for (let i = 0; i < types.length; i++) {
+ if ((<OpNode>node).op === types[i]) { return true; }
+ }
+ } else if (node.type === NodeType.META) {
+ let rootOpNode = (<Metanode>node).getRootOp();
+ if (rootOpNode) {
+ for (let i = 0; i < types.length; i++) {
+ if (rootOpNode.op === types[i]) { return true; }
+ }
+ }
+ }
+ return false;
+}
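+
+// For example, hasTypeIn(node, ["Variable", "Const"]) returns true for an
+// OpNode whose op is "Variable" or "Const", and for a Metanode whose root op
+// has one of those op types ("Variable" and "Const" are just illustrative
+// entries here; the actual lists come from params.inExtractTypes and
+// params.outExtractTypes).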
+
+/** Extract nodes whose op type matches one of the predefined out-extract types. */
+function extractPredefinedSink(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+ _.each(graph.nodes(), n => {
+ let renderInfo = graph.node(n);
+ if (hasTypeIn(renderInfo.node, params.outExtractTypes)) {
+ makeOutExtract(renderNode, n, params);
+ }
+ });
+}
+
+/** Extract nodes whose op type matches one of the predefined in-extract types. */
+function extractPredefinedSource(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+
+ _.each(graph.nodes(), n => {
+ let renderInfo = graph.node(n);
+ if (hasTypeIn(renderInfo.node, params.inExtractTypes)) {
+ makeInExtract(renderNode, n, params);
+ }
+ });
+}
+
+/** Extract from nodes with in-degree > maxInDegree */
+function extractHighInDegree(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+ let maxInDegree = params.maxInDegree;
+
+ // detect first so degrees don't get affected by other removal
+ let highInDegreeNames = _.filter(graph.nodes(), n => {
+ // Count the in-degree based on only regular edges, unless there are
+ // no regular edges, in which case use the number of control edges.
+ // This is done so that control edges don't affect whether nodes are extracted
+ // from the core graph, unless the node is only used for control.
+ let numEdgesToCount = _.reduce(graph.predecessors(n), (numEdgesToCount, pred) => {
+ let metaedge = graph.edge(pred, n).metaedge;
+ return numEdgesToCount + (metaedge.numRegularEdges ? 1 : 0);
+ }, 0);
+ if (numEdgesToCount === 0 && graph.predecessors(n).length > 0) {
+ numEdgesToCount = graph.predecessors(n).length;
+ }
+ return numEdgesToCount > maxInDegree;
+ });
+
+ _.each(highInDegreeNames, n => {
+ makeOutExtract(renderNode, n, params);
+ });
+}
+
+/** Extract nodes with out-degree > maxOutDegree */
+function extractHighOutDegree(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+ let maxOutDegree = params.maxOutDegree;
+
+ // detect first so degrees don't get affected by other removal
+ let highOutDegreeNames = _.filter(graph.nodes(), n => {
+ // Count the out-degree based on only regular edges, unless there are
+ // no regular edges, in which case use the number of control edges.
+ // This is done so that control edges don't affect whether nodes are extracted
+ // from the core graph, unless the node is only used for control.
+ let numEdgesToCount = _.reduce(graph.successors(n), (numEdgesToCount, succ) => {
+ let metaedge = graph.edge(n, succ).metaedge;
+ return numEdgesToCount + (metaedge.numRegularEdges ? 1 : 0);
+ }, 0);
+ if (numEdgesToCount === 0 && graph.successors(n).length > 0) {
+ numEdgesToCount = graph.successors(n).length;
+ }
+ return numEdgesToCount > maxOutDegree;
+ });
+
+ _.each(highOutDegreeNames, n => {
+ makeInExtract(renderNode, n, params);
+ });
+}
+
+/** Remove control edges from nodes that have too many control edges */
+function removeControlEdges(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ let graph = renderNode.coreGraph;
+
+ // Collect control edges into a map by node name.
+ let map = <{[nodeName: string]: graphlib.EdgeObject[]}>{};
+ _.each(graph.edges(), e => {
+ if (!graph.edge(e).metaedge.numRegularEdges) {
+ (map[e.v] = map[e.v] || []).push(e);
+ (map[e.w] = map[e.w] || []).push(e);
+ }
+ });
+
+ // For each node with too many control edges, turn them into annotations.
+ _.each(map, (edges, nodeName) => {
+ if (edges.length > params.maxControlDegree) {
+ _.each(edges, e => createShortcut(graph, e.v, e.w, params));
+ }
+ });
+}
+
+/**
+ * Given an integer, picks a hue that is far apart from other colors.
+ * The formula for picking a hue that avoids collisions is:
+ * hue = (color range * golden ratio * index) % color range
+ */
+export function mapIndexToHue(id: number): number {
+ let GOLDEN_RATIO = 1.61803398875;
+ // Hue of 0 is reserved for the gray nodes.
+ let MIN_HUE = 1;
+ let MAX_HUE = 359;
+ let COLOR_RANGE = MAX_HUE - MIN_HUE;
+ return MIN_HUE + ((COLOR_RANGE * GOLDEN_RATIO * id) % COLOR_RANGE);
+};
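+
+// For example, mapIndexToHue(1) = 1 + (358 * 1.61803398875) % 358, which is
+// roughly 222.3, while mapIndexToHue(2) is roughly 85.5, so consecutive ids
+// land far apart on the hue circle.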
+
+/**
+ * Remove edges and add annotations in their place.
+ *
+ * For root node, consider predefined types for source and sink.
+ * We do not extract predefined types from non-root nodes so that Variables and
+ * the sgd node (op type = "NoOp") do not get extracted from inside their own
+ * group.
+ *
+ * The order of extraction is important here as swapping the order can totally
+ * screw up the graph layout.
+ *
+ * @param {Render.Node} renderNode Node to manipulate.
+ * @param {Object} params Render graph construction parameters. See
+ *     <tf-graph-params>'s output.
+ */
+function extractHighDegrees(renderNode: RenderGroupNodeInformation,
+ params: RenderGraphParams) {
+ if (params.outExtractTypes) {
+ extractPredefinedSink(renderNode, params);
+ }
+
+ // This has to come before extracting high in-degree nodes to protect the
+ // core part that takes in many variables.
+ if (params.inExtractTypes) {
+ extractPredefinedSource(renderNode, params);
+ }
+
+ // This has to come before extracting high out-degree nodes to protect the
+ // core part that outputs to many places, as there are more high-degree sinks
+ // than sources.
+
+ if (params.maxInDegree) {
+ extractHighInDegree(renderNode, params);
+ }
+
+ if (params.maxOutDegree) {
+ extractHighOutDegree(renderNode, params);
+ }
+
+ if (params.maxControlDegree) {
+ removeControlEdges(renderNode, params);
+ }
+
+ // Extract isolated nodes, which can be
+ // (1) source-like and sink-like nodes that are not originally isolated but
+ // become isolated after further removal.
+ // (2) isolated nodes with annotations on one side. These might be either
+ // - nodes that originally have high out-degree but, because we remove
+ // high in-degree nodes first, no longer have high out-degree when
+ // we check. (Detecting all high-degree nodes before removing also leads to
+ // another problem.)
+ // - nodes that do not have high degree, but their neighbors are all
+ // extracted, so it might make sense to extract them too.
+
+ let graph = renderNode.coreGraph;
+ _.each(graph.nodes(), n => {
+ let child = graph.node(n);
+ let degree = graph.neighbors(n).length;
+
+ if (degree === 0) {
+ let hasOutAnnotations = child.outAnnotations.list.length > 0;
+ let hasInAnnotations = child.inAnnotations.list.length > 0;
+
+ if (child.isInExtract) { // Is source-like.
+ // This case only happens if detachAllEdgesForHighDegree is false.
+ // (Otherwise all source-like nodes are already isolated.)
+ renderNode.isolatedInExtract.push(child);
+ graph.removeNode(n);
+ } else if (child.isOutExtract) { // Is sink-like.
+ // This case only happens if detachAllEdgesForHighDegree is false.
+ // (Otherwise all sink-like nodes are already isolated.)
+ renderNode.isolatedOutExtract.push(child);
+ graph.removeNode(n);
+ } else if (params.extractIsolatedNodesWithAnnotationsOnOneSide) {
+ if (hasOutAnnotations && !hasInAnnotations) {
+ child.isInExtract = true; // for ones with high out-annotations
+ renderNode.isolatedInExtract.push(child);
+ graph.removeNode(n);
+ } else if (hasInAnnotations && !hasOutAnnotations) {
+ child.isOutExtract = true; // for ones with high in-annotations
+ renderNode.isolatedOutExtract.push(child);
+ graph.removeNode(n);
+ } else {
+ // if a low degree node has both in- & out- annotations, do nothing
+ // because it is unclear which side it should go to.
+ }
+ }
+ }
+ });
+}
+} // close module tf.graph.render
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/scene/annotation.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/annotation.ts
new file mode 100644
index 0000000000..82609e8652
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/annotation.ts
@@ -0,0 +1,223 @@
+/// <reference path="../graph.ts" />
+/// <reference path="../render.ts" />
+/// <reference path="scene.ts" />
+/// <reference path="edge.ts" />
+
+module tf.graph.scene.annotation {
+
+/**
+ * Populate a given annotation container group
+ *
+ * <g class="{in|out}-annotations"></g>
+ *
+ * with annotation group of the following structure:
+ *
+ * <g class="annotation">
+ * <g class="annotation-node">
+ * <!--
+ * Content here determined by Scene.node.buildGroup.
+ * -->
+ * </g>
+ * </g>
+ *
+ * @param container selection of the container.
+ * @param annotationData node.{in|out}Annotations
+ * @param d node to build group for.
+ * @param sceneBehavior polymer scene element.
+ * @return selection of appended objects
+ */
+export function buildGroup(container, annotationData: render.AnnotationList,
+ d: render.RenderNodeInformation, sceneBehavior) {
+ // Select all children and join with data.
+ let annotationGroups = container.selectAll(function() {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes;
+ })
+ .data(annotationData.list, d => { return d.node.name; });
+
+ annotationGroups.enter()
+ .append("g")
+ .attr("data-name", a => { return a.node.name; })
+ .each(function(a) {
+ let aGroup = d3.select(this);
+
+ // Add annotation to the index in the scene
+ sceneBehavior.addAnnotationGroup(a, d, aGroup);
+ // Append annotation edge
+ let edgeType = Class.Annotation.EDGE;
+ let metaedge = a.renderMetaedgeInfo && a.renderMetaedgeInfo.metaedge;
+ if (metaedge && !metaedge.numRegularEdges) {
+ edgeType += " " + Class.Annotation.CONTROL_EDGE;
+ }
+ // If any edges are reference edges, add the reference edge class.
+ if (metaedge && metaedge.numRefEdges) {
+ edgeType += " " + Class.Edge.REF_LINE;
+ }
+ edge.appendEdge(aGroup, a, sceneBehavior, edgeType);
+
+ if (a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ addAnnotationLabelFromNode(aGroup, a);
+ buildShape(aGroup, a, sceneBehavior);
+ } else {
+ addAnnotationLabel(aGroup, a.node.name, a, Class.Annotation.ELLIPSIS);
+ }
+ });
+
+ annotationGroups
+ .attr("class", a => {
+ return Class.Annotation.GROUP + " " +
+ annotationToClassName(a.annotationType) +
+ " " + node.nodeClass(a);
+ })
+ .each(function(a) {
+ let aGroup = d3.select(this);
+ update(aGroup, d, a, sceneBehavior);
+ if (a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ addInteraction(aGroup, d, sceneBehavior);
+ }
+ });
+
+ annotationGroups.exit()
+ .each(function(a) {
+ let aGroup = d3.select(this);
+
+ // Remove annotation from the index in the scene
+ sceneBehavior.removeAnnotationGroup(a, d, aGroup);
+ })
+ .remove();
+ return annotationGroups;
+};
+
+/**
+ * Maps an annotation enum to a class name used in css rules.
+ */
+function annotationToClassName(annotationType: render.AnnotationType) {
+ return (tf.graph.render.AnnotationType[annotationType] || "")
+ .toLowerCase() || null;
+}
+
+function buildShape(aGroup, a: render.Annotation, sceneBehavior) {
+ if (a.annotationType === tf.graph.render.AnnotationType.SUMMARY) {
+ let image = scene.selectOrCreateChild(aGroup, "image");
+ image.attr({
+ "xlink:href": sceneBehavior.resolveUrl("../../lib/svg/summary-icon.svg"),
+ "height": "12px",
+ "width": "12px",
+ "cursor": "pointer"
+ });
+ } else {
+ let shape = node.buildShape(aGroup, a, Class.Annotation.NODE);
+ // add title tag to get native tooltips
+ scene.selectOrCreateChild(shape, "title").text(a.node.name);
+ }
+}
+
+function addAnnotationLabelFromNode(aGroup, a: render.Annotation) {
+ let namePath = a.node.name.split("/");
+ let text = namePath[namePath.length - 1];
+ let shortenedText = text.length > 8 ? text.substring(0, 8) + "..." : text;
+ return addAnnotationLabel(aGroup, shortenedText, a, null, text);
+}
+
+function addAnnotationLabel(aGroup, label, a, additionalClassNames,
+ fullLabel?) {
+ let classNames = Class.Annotation.LABEL;
+ if (additionalClassNames) {
+ classNames += " " + additionalClassNames;
+ }
+ let titleText = fullLabel ? fullLabel : label;
+ return aGroup.append("text")
+ .attr("class", classNames)
+ .attr("dy", ".35em")
+ .attr("text-anchor", a.isIn ? "end" : "start")
+ .text(label)
+ .append("title").text(titleText);
+}
+
+function addInteraction(selection, d: render.RenderNodeInformation,
+ sceneBehavior) {
+ selection
+ .on("mouseover", a => {
+ sceneBehavior.fire("annotation-highlight", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ })
+ .on("mouseout", a => {
+ sceneBehavior.fire("annotation-unhighlight", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ })
+ .on("click", a => {
+ // Stop this event"s propagation so that it isn't also considered a
+ // graph-select.
+ (<Event>d3.event).stopPropagation();
+ sceneBehavior.fire("annotation-select", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ });
+};
+
+/**
+ * Adjust annotation's position.
+ *
+ * @param aGroup selection of a "g.annotation" element.
+ * @param d Host node data.
+ * @param a annotation node data.
+ * @param scene Polymer scene element.
+ */
+function update(aGroup, d: render.RenderNodeInformation, a: render.Annotation,
+ sceneBehavior) {
+ // Annotations that point to embedded nodes (constants, summaries)
+ // don't have render information attached, so we don't stylize these.
+ // Also we don't stylize ellipsis annotations (the string "... and X more").
+ if (a.renderNodeInfo &&
+ a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ node.stylize(aGroup, a.renderNodeInfo, sceneBehavior,
+ Class.Annotation.NODE);
+ }
+
+ if (a.annotationType === tf.graph.render.AnnotationType.SUMMARY) {
+ // Update the width of the annotation to give space for the image.
+ a.width += 10;
+ }
+
+ // label position
+ aGroup.select("text." + Class.Annotation.LABEL).transition().attr({
+ x: d.x + a.dx + (a.isIn ? -1 : 1) * (a.width / 2 + a.labelOffset),
+ y: d.y + a.dy
+ });
+
+ // Some annotations (such as summary) are represented using a 12x12 image tag.
+ // Units (e.g. pixels) are purposely omitted since the images are vector graphics.
+ // If there is an image, we adjust the location of the image to be vertically
+ // centered with the node and horizontally centered between the arrow and the
+ // text label.
+ aGroup.select("image").transition().attr({
+ x: d.x + a.dx - 3,
+ y: d.y + a.dy - 6
+ });
+
+ // Node position (only one of the shape selection will be non-empty.)
+ scene.positionEllipse(aGroup.select("." + Class.Annotation.NODE + " ellipse"),
+ d.x + a.dx, d.y + a.dy, a.width, a.height);
+ scene.positionRect(aGroup.select("." + Class.Annotation.NODE + " rect"),
+ d.x + a.dx, d.y + a.dy, a.width, a.height);
+ scene.positionRect(aGroup.select("." + Class.Annotation.NODE + " use"),
+ d.x + a.dx, d.y + a.dy, a.width, a.height);
+
+ // Edge position
+ aGroup.select("path." + Class.Annotation.EDGE).transition().attr("d", a => {
+ // map relative position to absolute position
+ let points = a.points.map(p => {
+ return {x: p.dx + d.x, y: p.dy + d.y};
+ });
+ return edge.interpolate(points);
+ });
+};
+
+} // close module
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/scene/edge.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/edge.ts
new file mode 100644
index 0000000000..e11ec97f80
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/edge.ts
@@ -0,0 +1,177 @@
+/// <reference path="../graph.ts" />
+/// <reference path="../render.ts" />
+/// <reference path="scene.ts" />
+
+module tf.graph.scene.edge {
+
+let Scene = tf.graph.scene; // Aliased
+
+export function getEdgeKey(edgeObj) {
+ return edgeObj.v + tf.graph.EDGE_KEY_DELIM + edgeObj.w;
+}
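+
+// For example, for the edge object {v: "A/B", w: "Z/Y"} this returns
+// "A/B" + EDGE_KEY_DELIM + "Z/Y". The key is used both for the d3 data join
+// below and as the value of each edge group's data-edge attribute.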
+
+/**
+ * Select or Create a "g.edges" group to a given sceneGroup
+ * and builds a number of "g.edge" groups inside the group.
+ *
+ * Structure Pattern:
+ *
+ * <g class="edges">
+ * <g class="edge">
+ * <path class="edgeline"/>
+ * </g>
+ * ...
+ * </g>
+ *
+ *
+ * @param sceneGroup container
+ * @param graph
+ * @param sceneBehavior Parent scene module.
+ * @return selection of the created nodeGroups
+ */
+export function buildGroup(sceneGroup,
+ graph: graphlib.Graph<tf.graph.render.RenderNodeInformation,
+ tf.graph.render.RenderMetaedgeInformation>, sceneBehavior) {
+ let edgeData = _.reduce(graph.edges(), (edges, edgeObj) => {
+ let edgeLabel = graph.edge(edgeObj);
+ edges.push({
+ v: edgeObj.v,
+ w: edgeObj.w,
+ label: edgeLabel
+ });
+ return edges;
+ }, []);
+
+ let container = scene.selectOrCreateChild(sceneGroup, "g",
+ Class.Edge.CONTAINER);
+ let containerNode = container.node();
+
+ // Select all children and join with data.
+ // (Note that all children of g.edges are g.edge)
+ let edgeGroups = container.selectAll(function() {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes;
+ })
+ .data(edgeData, getEdgeKey);
+
+ // Make edges a group to support rendering multiple lines for metaedge
+ edgeGroups.enter()
+ .append("g")
+ .attr("class", Class.Edge.GROUP)
+ .attr("data-edge", getEdgeKey)
+ .each(function(d) {
+ let edgeGroup = d3.select(this);
+ d.label.edgeGroup = edgeGroup;
+ // index node group for quick highlighting
+ sceneBehavior._edgeGroupIndex[getEdgeKey(d)] = edgeGroup;
+
+ // If any edges are reference edges, add the reference edge class.
+ let extraEdgeClass = d.label.metaedge && d.label.metaedge.numRefEdges
+ ? Class.Edge.REF_LINE + " " + Class.Edge.LINE
+ : undefined;
+ // Add line during enter because we're assuming that type of line
+ // normally does not change.
+ appendEdge(edgeGroup, d, scene, extraEdgeClass);
+ });
+
+ edgeGroups.each(position);
+ edgeGroups.each(function(d) {
+ stylize(d3.select(this), d, sceneBehavior);
+ });
+
+ edgeGroups.exit()
+ .each(d => {
+ delete sceneBehavior._edgeGroupIndex[getEdgeKey(d)];
+ })
+ .remove();
+ return edgeGroups;
+};
+
+/**
+ * For a given d3 selection and data object, create a path to represent the
+ * edge described in d.label.
+ *
+ * If d.label is defined, it will be a RenderMetaedgeInformation instance. It
+ * will sometimes be undefined, for example for some Annotation edges for which
+ * there is no underlying Metaedge in the hierarchical graph.
+ */
+export function appendEdge(edgeGroup, d, sceneBehavior, edgeClass?) {
+ edgeClass = edgeClass || Class.Edge.LINE; // set default type
+
+ if (d.label && d.label.structural) {
+ edgeClass += " " + Class.Edge.STRUCTURAL;
+ }
+
+ edgeGroup.append("path")
+ .attr("class", edgeClass);
+};
+
+/**
+ * Returns a tween interpolator for the endpoint of an edge path.
+ */
+function getEdgePathInterpolator(d, i, a) {
+ let renderMetaedgeInfo = d.label;
+ let adjoiningMetaedge = renderMetaedgeInfo.adjoiningMetaedge;
+ if (!adjoiningMetaedge) {
+ return d3.interpolate(a, interpolate(renderMetaedgeInfo.points));
+ }
+
+ let renderPath = this;
+
+ // Get the adjoining path that matches the adjoining metaedge.
+ let adjoiningPath =
+ <SVGPathElement>((<HTMLElement>adjoiningMetaedge.edgeGroup.node())
+ .firstChild);
+
+ // Find the desired SVGPoint along the adjoining path, then convert those
+ // coordinates into the space of the renderPath using its Current
+ // Transformation Matrix (CTM).
+ let inbound = renderMetaedgeInfo.metaedge.inbound;
+
+ return function(t) {
+ let adjoiningPoint = adjoiningPath
+ .getPointAtLength(inbound ? adjoiningPath.getTotalLength() : 0)
+ .matrixTransform(adjoiningPath.getCTM())
+ .matrixTransform(renderPath.getCTM().inverse());
+
+ // Update the relevant point in the renderMetaedgeInfo's points list, then
+ // re-interpolate the path.
+ let points = renderMetaedgeInfo.points;
+ let index = inbound ? 0 : points.length - 1;
+ points[index].x = adjoiningPoint.x;
+ points[index].y = adjoiningPoint.y;
+ let dPath = interpolate(points);
+ return dPath;
+ };
+}
+
+export let interpolate = d3.svg.line()
+ .interpolate("basis")
+ .x((d: any) => { return d.x; })
+ .y((d: any) => { return d.y; });
+
+function position(d) {
+ d3.select(this).select("path." + Class.Edge.LINE)
+ .each(function(d) {
+ let path = d3.select(this);
+ path.transition().attrTween("d", getEdgePathInterpolator);
+ });
+};
+
+/**
+ * For a given d3 selection and data object, mark the edge as a control
+ * dependency if it contains only control edges.
+ *
+ * d's label property will be a RenderMetaedgeInformation object.
+ */
+function stylize(edgeGroup, d, sceneBehavior) {
+ let metaedge = d.label.metaedge;
+ edgeGroup
+ .select("path." + Class.Edge.LINE)
+ .classed("control-dep", metaedge && !metaedge.numRegularEdges);
+};
+
+} // close module
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/scene/minimap.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/minimap.ts
new file mode 100644
index 0000000000..1a34132765
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/minimap.ts
@@ -0,0 +1,269 @@
+/// <reference path="../../../../typings/tsd.d.ts" />
+/// <reference path="../common.ts" />
+
+module tf.scene {
+
+/** Show minimap when the viewpoint area is less than X% of the whole area. */
+const FRAC_VIEWPOINT_AREA: number = 0.8;
+
+export class Minimap {
+ /** The minimap container. */
+ private minimap: HTMLElement;
+ /** The canvas used for drawing the mini version of the svg. */
+ private canvas: HTMLCanvasElement;
+ /** A buffer canvas used for temporary drawing to avoid flickering. */
+ private canvasBuffer: HTMLCanvasElement;
+
+ /** The minimap svg used for holding the viewpoint rectangle. */
+ private minimapSvg: SVGSVGElement;
+ /** The rectangle showing the current viewpoint. */
+ private viewpoint: SVGRectElement;
+ /**
+ * The scale factor for the minimap. The factor is determined automatically
+ * so that the minimap doesn't violate the maximum width/height specified
+ * in the constructor. The minimap maintains the same aspect ratio as the
+ * original svg.
+ */
+ private scaleMinimap: number;
+ /** The main svg element. */
+ private svg: SVGSVGElement;
+ /** The svg group used for panning and zooming the main svg. */
+ private zoomG: SVGGElement;
+ /** The zoom behavior of the main svg. */
+ private mainZoom: d3.behavior.Zoom<any>;
+ /** The maximum width and height for the minimap. */
+ private maxWandH: number;
+ /** The last translation vector used in the main svg. */
+ private translate: [number, number];
+ /** The last scaling factor used in the main svg. */
+ private scaleMain: number;
+ /** The coordinates of the viewpoint rectangle. */
+ private viewpointCoord: {x: number, y: number};
+ /** The current size of the minimap */
+ private minimapSize: {width: number, height: number};
+ /** Padding (px) due to the main labels of the graph. */
+ private labelPadding: number;
+ /**
+ * Constructs a new minimap.
+ *
+ * @param svg The main svg element.
+ * @param zoomG The svg group used for panning and zooming the main svg.
+ * @param mainZoom The main zoom behavior.
+ * @param minimap The minimap container.
+ * @param maxWandH The maximum width/height for the minimap.
+ * @param labelPadding Padding in pixels due to the main graph labels.
+ */
+ constructor(svg: SVGSVGElement, zoomG: SVGGElement,
+ mainZoom: d3.behavior.Zoom<any>, minimap: HTMLElement,
+ maxWandH: number, labelPadding: number) {
+ this.svg = svg;
+ this.labelPadding = labelPadding;
+ this.zoomG = zoomG;
+ this.mainZoom = mainZoom;
+ this.maxWandH = maxWandH;
+ let $minimap = d3.select(minimap);
+ // The minimap will have 2 main components: the canvas showing the content
+ // and an svg showing a rectangle of the currently zoomed/panned viewpoint.
+ let $minimapSvg = $minimap.select("svg");
+
+ // Make the viewpoint rectangle draggable.
+ let $viewpoint = $minimapSvg.select("rect");
+ let dragmove = (d) => {
+ this.viewpointCoord.x = (<DragEvent>d3.event).x;
+ this.viewpointCoord.y = (<DragEvent>d3.event).y;
+ this.updateViewpoint();
+ };
+ this.viewpointCoord = {x: 0, y: 0};
+ let drag = d3.behavior.drag().origin(Object).on("drag", dragmove);
+ $viewpoint.datum(this.viewpointCoord).call(drag);
+
+ // Make the minimap clickable.
+ $minimapSvg.on("click", () => {
+ if ((<Event>d3.event).defaultPrevented) {
+ // This click was part of a drag event, so suppress it.
+ return;
+ }
+ // Update the coordinates of the viewpoint.
+ let width = Number($viewpoint.attr("width"));
+ let height = Number($viewpoint.attr("height"));
+ let clickCoords = d3.mouse($minimapSvg.node());
+ this.viewpointCoord.x = clickCoords[0] - width / 2;
+ this.viewpointCoord.y = clickCoords[1] - height / 2;
+ this.updateViewpoint();
+ });
+ this.viewpoint = <SVGRectElement> $viewpoint.node();
+ this.minimapSvg = <SVGSVGElement> $minimapSvg.node();
+ this.minimap = minimap;
+ this.canvas = <HTMLCanvasElement> $minimap.select("canvas.first").node();
+ this.canvasBuffer =
+ <HTMLCanvasElement> $minimap.select("canvas.second").node();
+ }
+
+ /**
+ * Updates the position and the size of the viewpoint rectangle.
+ * It also notifies the main svg about the new panned position.
+ */
+ private updateViewpoint(): void {
+ // Update the coordinates of the viewpoint rectangle.
+ d3.select(this.viewpoint)
+ .attr("x", this.viewpointCoord.x)
+ .attr("y", this.viewpointCoord.y);
+ // Update the translation vector of the main svg to reflect the
+ // new viewpoint.
+ let mainX = - this.viewpointCoord.x * this.scaleMain / this.scaleMinimap;
+ let mainY = - this.viewpointCoord.y * this.scaleMain / this.scaleMinimap;
+ let zoomEvent = this.mainZoom.translate([mainX, mainY]).event;
+ d3.select(this.zoomG).call(zoomEvent);
+ }
+
+ /**
+ * Redraws the minimap. Should be called whenever the main svg
+ * was updated (e.g. when a node was expanded).
+ */
+ update(): void {
+ let $svg = d3.select(this.svg);
+ // Read all the style rules in the document and embed them into the svg.
+ // The svg needs to be self-contained, i.e. all the style rules need to be
+ // embedded so the canvas output matches the original.
+ let stylesText = "";
+ for (let k = 0; k < document.styleSheets.length; k++) {
+ try {
+ let cssRules = (<any>document.styleSheets[k]).cssRules ||
+ (<any>document.styleSheets[k]).rules;
+ if (cssRules == null) {
+ continue;
+ }
+ for (let i = 0; i < cssRules.length; i++) {
+ stylesText += cssRules[i].cssText + "\n";
+ }
+ } catch (e) {
+ if (e.name !== "SecurityError") {
+ throw e;
+ }
+ }
+ }
+
+ // Temporarily add the css rules to the main svg.
+ let svgStyle = $svg.append("style");
+ svgStyle.text(stylesText);
+
+ // Temporarily remove the zoom/pan transform from the main svg since we
+ // want the minimap to show a zoomed-out and centered view.
+ let $zoomG = d3.select(this.zoomG);
+ let zoomTransform = $zoomG.attr("transform");
+ $zoomG.attr("transform", null);
+
+ // Get the size of the entire scene.
+ let sceneSize = this.zoomG.getBBox();
+ // Since we add padding, account for that here.
+ sceneSize.height += this.labelPadding;
+
+ // Temporarily assign an explicit width/height to the main svg, since
+ // it doesn't have one (uses flex-box), but we need it for the canvas
+ // to work.
+ $svg.attr({
+ width: sceneSize.width,
+ height: sceneSize.height,
+ });
+
+ // Since the content inside the svg changed (e.g. a node was expanded),
+ // the aspect ratio has also changed. Thus, we need to update the scale
+ // factor of the minimap. The scale factor is determined such that both
+ // the width and height of the minimap are <= maximum specified w/h.
+ this.scaleMinimap =
+ this.maxWandH / Math.max(sceneSize.width, sceneSize.height);
+
+ this.minimapSize = {
+ width: sceneSize.width * this.scaleMinimap,
+ height: sceneSize.height * this.scaleMinimap
+ };
+
+ // Update the size of the minimap's svg, the buffer canvas and the
+ // viewpoint rect.
+ d3.select(this.minimapSvg).attr(<any>this.minimapSize);
+ d3.select(this.canvasBuffer).attr(<any>this.minimapSize);
+ if (this.translate != null && this.zoom != null) {
+ // Update the viewpoint rectangle shape since the aspect ratio of the
+ // map has changed.
+ requestAnimationFrame(() => this.zoom());
+ }
+
+ // Serialize the main svg to a string which will be used as the rendering
+ // content for the canvas.
+ let svgXml = (new XMLSerializer()).serializeToString(this.svg);
+
+ // Now that the svg is serialized for rendering, remove the temporarily
+ // assigned styles, explicit width and height and bring back the pan/zoom
+ // transform.
+ svgStyle.remove();
+ $svg.attr({
+ width: null,
+ height: null
+ });
+ $zoomG.attr("transform", zoomTransform);
+ let image = new Image();
+ image.onload = () => {
+ // Draw the svg content onto the buffer canvas.
+ let context = this.canvasBuffer.getContext("2d");
+ context.clearRect(0, 0, this.canvasBuffer.width,
+ this.canvasBuffer.height);
+ context.drawImage(image, 0, 0,
+ this.minimapSize.width, this.minimapSize.height);
+ requestAnimationFrame(() => {
+ // Hide the old canvas and show the new buffer canvas.
+ d3.select(this.canvasBuffer).style("display", null);
+ d3.select(this.canvas).style("display", "none");
+ // Swap the two canvases.
+ [this.canvas, this.canvasBuffer] = [this.canvasBuffer, this.canvas];
+ });
+ };
+ image.src = "data:image/svg+xml;base64," + btoa(svgXml);
+ }
+
+ /**
+ * Handles changes in zooming/panning. Should be called from the main svg
+ * to notify that a zoom/pan was performed and this minimap will update its
+ * viewpoint rectangle.
+ *
+ * @param translate The translate vector, or none to use the last used one.
+ * @param scale The scaling factor, or none to use the last used one.
+ */
+ zoom(translate?: [number, number], scale?: number): void {
+ // Update the new translate and scale params, only if specified.
+ this.translate = translate || this.translate;
+ this.scaleMain = scale || this.scaleMain;
+ // Update the location of the viewpoint rectangle.
+ let svgRect = this.svg.getBoundingClientRect();
+ let $viewpoint = d3.select(this.viewpoint);
+ this.viewpointCoord.x = -this.translate[0] * this.scaleMinimap /
+ this.scaleMain;
+ this.viewpointCoord.y = -this.translate[1] * this.scaleMinimap /
+ this.scaleMain;
+ let viewpointWidth = svgRect.width * this.scaleMinimap / this.scaleMain;
+ let viewpointHeight = svgRect.height * this.scaleMinimap / this.scaleMain;
+ $viewpoint.attr({
+ x: this.viewpointCoord.x,
+ y: this.viewpointCoord.y,
+ width: viewpointWidth,
+ height: viewpointHeight
+ });
+ // Show/hide the minimap depending on the viewpoint area as fraction of the
+ // whole minimap.
+ let mapWidth = this.minimapSize.width;
+ let mapHeight = this.minimapSize.height;
+ let x = this.viewpointCoord.x;
+ let y = this.viewpointCoord.y;
+ let w = Math.min(Math.max(0, x + viewpointWidth), mapWidth) -
+ Math.min(Math.max(0, x), mapWidth);
+ let h = Math.min(Math.max(0, y + viewpointHeight), mapHeight) -
+ Math.min(Math.max(0, y), mapHeight);
+ let fracIntersect = (w * h) / (mapWidth * mapHeight);
+ if (fracIntersect < FRAC_VIEWPOINT_AREA) {
+ this.minimap.classList.remove("hidden");
+ } else {
+ this.minimap.classList.add("hidden");
+ }
+ }
+}
+
+} // close module tf.scene
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/scene/node.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/node.ts
new file mode 100644
index 0000000000..8c74b37e07
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/node.ts
@@ -0,0 +1,525 @@
+/// <reference path="../graph.ts" />
+/// <reference path="scene.ts" />
+/// <reference path="annotation.ts" />
+
+module tf.graph.scene.node {
+
+/**
+ * Selects or creates a "g.nodes" group under a given sceneGroup
+ * and builds a number of "g.node" groups inside it.
+ *
+ * Structure Pattern:
+ *
+ * <g class="nodes">
+ * <g class="node">
+ * <g class="in-annotations">
+ * ...
+ * </g>
+ * <g class="out-annotations">
+ * ...
+ * </g>
+ * <g class="nodeshape">
+ * <!--
+ * Content of the node shape should be for the node itself. For example, a
+ * Metanode would have a <rect> with rounded edges, an op would have an
+ * <ellipse>. More complex nodes like series may contain multiple elements
+ * which are conditionally visible based on whether the node is expanded.
+ * -->
+ * </g>
+ * <text class="label">node name</text>
+ * <g class="subscene">
+ * <!--
+ * Content of the subscene (only for metanode and series node).
+ *
+ * A subscene is an svg group that contains the content of the
+ * metanode's metagraph, which is recursively generated by Scene.build().
+ *
+ * When the graph is expanded multiple times, a subscene can contain
+ * nested subscenes inside.
+ * -->
+ * </g>
+ * </g>
+ * ...
+ * </g>
+ *
+ *
+ * @param sceneGroup selection of the container
+ * @param nodeData array of render node information to map
+ * @param sceneBehavior parent scene module
+ * @return selection of the created nodeGroups
+ */
+export function buildGroup(sceneGroup,
+ nodeData: render.RenderNodeInformation[], sceneBehavior) {
+ let container = scene.selectOrCreateChild(sceneGroup, "g",
+ Class.Node.CONTAINER);
+ // Select all children and join with data.
+ // (Note that all children of g.nodes are g.node)
+ let nodeGroups = container.selectAll(function() {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes; // this here refers to container.node()
+ })
+ .data(nodeData, (d: any) => {
+ // make sure that we don't have to swap shape type
+ return d.node.name + ":" + d.node.type;
+ });
+
+ // ENTER
+ nodeGroups.enter()
+ .append("g")
+ .attr("data-name", d => { return d.node.name; })
+ .each(function(d) {
+ let nodeGroup = d3.select(this);
+ // index node group for quick stylizing
+ sceneBehavior.addNodeGroup(d.node.name, nodeGroup);
+ });
+
+ // UPDATE
+ nodeGroups
+ .attr("class", d => {
+ return Class.Node.GROUP + " " + nodeClass(d);
+ })
+ .each(function(d) {
+ let nodeGroup = d3.select(this);
+ // add g.in-annotations (always add -- to keep layer order consistent.)
+ let inAnnotationBox = scene.selectOrCreateChild(nodeGroup, "g",
+ Class.Annotation.INBOX);
+ annotation.buildGroup(inAnnotationBox, d.inAnnotations, d,
+ sceneBehavior);
+
+ // add g.out-annotations (always add -- to keep layer order consistent.)
+ let outAnnotationBox = scene.selectOrCreateChild(nodeGroup, "g",
+ Class.Annotation.OUTBOX);
+ annotation.buildGroup(outAnnotationBox, d.outAnnotations, d,
+ sceneBehavior);
+
+ // label
+ let label = labelBuild(nodeGroup, d, sceneBehavior);
+ // Do not add interaction to metanode labels as they live inside the
+ // metanode shape which already has the same interactions.
+ addInteraction(label, d, sceneBehavior, d.node.type === NodeType.META);
+
+ // build .shape below label
+ let shape = buildShape(nodeGroup, d, Class.Node.SHAPE, label.node());
+ if (d.node.isGroupNode) {
+ addButton(shape, d, sceneBehavior);
+ }
+ addInteraction(shape, d, sceneBehavior);
+
+ // build subscene on the top
+ subsceneBuild(nodeGroup, d, sceneBehavior);
+
+ stylize(nodeGroup, d, sceneBehavior);
+ position(nodeGroup, d, sceneBehavior);
+ });
+
+ // EXIT
+ nodeGroups.exit()
+ .each(function(d) {
+ // remove all indices on remove
+ sceneBehavior.removeNodeGroup(d.node.name);
+
+ let nodeGroup = d3.select(this);
+ if (d.inAnnotations.list.length > 0) {
+ nodeGroup.select("." + Class.Annotation.INBOX)
+ .selectAll("." + Class.Annotation.GROUP)
+ .each(a => {
+ sceneBehavior.removeAnnotationGroup(a, d);
+ });
+ }
+ if (d.outAnnotations.list.length > 0) {
+ nodeGroup.select("." + Class.Annotation.OUTBOX)
+ .selectAll("." + Class.Annotation.GROUP)
+ .each(a => {
+ sceneBehavior.removeAnnotationGroup(a, d);
+ });
+ }
+ })
+ .remove();
+ return nodeGroups;
+};
+
+/**
+ * Update or remove the subscene of a render group node depending on whether
+ * it is expanded. If the node is not a group node, this method has no effect.
+ *
+ * @param nodeGroup selection of the container
+ * @param renderNodeInfo the render information for the node.
+ * @param sceneBehavior parent scene module
+ * @return Selection of the subscene group, or null if node group does not have
+ * a subscene. Op nodes, bridge nodes and unexpanded group nodes will
+ * not have a subscene.
+ */
+function subsceneBuild(nodeGroup,
+ renderNodeInfo: render.RenderGroupNodeInformation, sceneBehavior) {
+ if (renderNodeInfo.node.isGroupNode) {
+ if (renderNodeInfo.expanded) {
+ // Recursively build the subscene.
+ return scene.buildGroup(nodeGroup, renderNodeInfo, sceneBehavior,
+ Class.Subscene.GROUP);
+ }
+ // Clean out existing subscene if the node is not expanded.
+ scene.selectChild(nodeGroup, "g", Class.Subscene.GROUP).remove();
+ }
+ return null;
+};
+
+/**
+ * Translate the subscene of the given node group
+ */
+function subscenePosition(nodeGroup, d: render.RenderNodeInformation) {
+ let x0 = d.x - d.width / 2.0 + d.paddingLeft;
+ let y0 = d.y - d.height / 2.0 + d.paddingTop;
+
+ let subscene = scene.selectChild(nodeGroup, "g", Class.Subscene.GROUP);
+ scene.translate(subscene, x0, y0);
+};
+
+/**
+ * Add an expand/collapse button to a group node
+ *
+ * @param selection The group node selection.
+ * @param d Info about the node being rendered.
+ * @param sceneBehavior parent scene module.
+ */
+function addButton(selection, d: render.RenderNodeInformation, sceneBehavior) {
+ let group = scene.selectOrCreateChild(
+ selection, "g", Class.Node.BUTTON_CONTAINER);
+ scene.selectOrCreateChild(group, "circle", Class.Node.BUTTON_CIRCLE);
+ scene.selectOrCreateChild(group, "path", Class.Node.EXPAND_BUTTON).attr(
+ "d", "M0,-2.2 V2.2 M-2.2,0 H2.2");
+ scene.selectOrCreateChild(group, "path", Class.Node.COLLAPSE_BUTTON).attr(
+ "d", "M-2.2,0 H2.2");
+ group.on("click", d => {
+ // Stop this event's propagation so that it isn't also considered a
+ // node-select.
+ (<Event>d3.event).stopPropagation();
+ sceneBehavior.fire("node-toggle-expand", { name: d.node.name });
+ });
+ scene.positionButton(group, d);
+};
+
+/**
+ * Fire node-* events when the selection is interacted.
+ *
+ * @param disableInteraction When true, have the provided selection
+ * ignore all pointer events. Used for text labels inside metanodes, which
+ * don't need interaction since their surrounding shape already has it, and
+ * giving them interaction would conflict with the expand/collapse button.
+ */
+function addInteraction(selection, d: render.RenderNodeInformation,
+ sceneBehavior, disableInteraction?: boolean) {
+ if (disableInteraction) {
+ selection.attr("pointer-events", "none");
+ return;
+ }
+ selection.on("dblclick", d => {
+ sceneBehavior.fire("node-toggle-expand", { name: d.node.name });
+ })
+ .on("mouseover", d => {
+ // don't send mouseover events for an expanded group,
+ // otherwise it causes too many glitches
+ if (sceneBehavior.isNodeExpanded(d)) { return; }
+
+ sceneBehavior.fire("node-highlight", { name: d.node.name });
+ })
+ .on("mouseout", d => {
+ // don't send mouseout events for an expanded group,
+ // otherwise it causes too many glitches
+ if (sceneBehavior.isNodeExpanded(d)) { return; }
+
+ sceneBehavior.fire("node-unhighlight", { name: d.node.name });
+ })
+ .on("click", d => {
+ // Stop this event's propagation so that it isn't also considered
+ // a graph-select.
+ (<Event>d3.event).stopPropagation();
+ sceneBehavior.fire("node-select", { name: d.node.name });
+ });
+};
+
+/**
+ * Append svg text for label and assign data.
+ * @param nodeGroup
+ * @param renderNodeInfo The render node information for the label.
+ * @param sceneBehavior parent scene module.
+ */
+function labelBuild(nodeGroup, renderNodeInfo: render.RenderNodeInformation,
+ sceneBehavior) {
+ let namePath = renderNodeInfo.node.name.split("/");
+ let text = namePath[namePath.length - 1];
+
+ // Truncate long labels for unexpanded Metanodes.
+ let useFontScale = renderNodeInfo.node.type === NodeType.META &&
+ !renderNodeInfo.expanded;
+
+ let label = scene.selectOrCreateChild(nodeGroup, "text", Class.Node.LABEL);
+ label.attr("dy", ".35em")
+ .attr("text-anchor", "middle");
+ if (useFontScale) {
+ if (text.length > sceneBehavior.maxMetanodeLabelLength) {
+ text = text.substr(0, sceneBehavior.maxMetanodeLabelLength - 2) + "...";
+ }
+ let scale = getLabelFontScale(sceneBehavior);
+ label.attr("font-size", scale(text.length) + "px");
+ }
+ label.text(text);
+ return label;
+};
+
+/**
+ * d3 scale used for sizing font of labels, used by labelBuild,
+ * initialized once by getLabelFontScale.
+ */
+let fontScale = null;
+function getLabelFontScale(sceneBehavior) {
+ if (!fontScale) {
+ fontScale = d3.scale.linear()
+ .domain([sceneBehavior.maxMetanodeLabelLengthLargeFont,
+ sceneBehavior.maxMetanodeLabelLength])
+ .range([sceneBehavior.maxMetanodeLabelLengthFontSize,
+ sceneBehavior.minMetanodeLabelLengthFontSize]).clamp(true);
+ }
+ return fontScale;
+}
+/**
+ * Set label position of a given node group
+ */
+function labelPosition(nodeGroup, d: render.RenderNodeInformation,
+ yOffset: number) {
+ scene.selectChild(nodeGroup, "text", Class.Node.LABEL).transition()
+ .attr("x", d.x)
+ .attr("y", d.y + yOffset);
+};
+
+/**
+ * Select or append/insert shape for a node and assign renderNode
+ * as the shape's data.
+ *
+ * @param nodeGroup
+ * @param d RenderNodeInformation
+ * @param nodeClass class for the element.
+ * @param before Reference DOM node for insertion.
+ * @return Selection of the shape.
+ */
+export function buildShape(nodeGroup, d, nodeClass: string, before?) {
+ // Create a group to house the underlying visual elements.
+ let shapeGroup = scene.selectOrCreateChild(nodeGroup, "g", nodeClass,
+ before);
+ // TODO(jimbo): DOM structure should be templated in HTML somewhere, not JS.
+ switch (d.node.type) {
+ case NodeType.OP:
+ scene.selectOrCreateChild(shapeGroup, "ellipse",
+ Class.Node.COLOR_TARGET);
+ break;
+ case NodeType.SERIES:
+ // Choose the correct stamp to use to represent this series.
+ let stampType = "annotation";
+ let groupNodeInfo = <render.RenderGroupNodeInformation>d;
+ if (groupNodeInfo.coreGraph) {
+ stampType = groupNodeInfo.node.hasNonControlEdges
+ ? "vertical" : "horizontal";
+ }
+ scene.selectOrCreateChild(shapeGroup, "use", Class.Node.COLOR_TARGET)
+ .attr("xlink:href", "#op-series-" + stampType + "-stamp");
+ scene.selectOrCreateChild(shapeGroup, "rect", Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ case NodeType.BRIDGE:
+ scene.selectOrCreateChild(shapeGroup, "rect", Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ case NodeType.META:
+ scene.selectOrCreateChild(shapeGroup, "rect", Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ default:
+ throw Error("Unrecognized node type: " + d.node.type);
+ }
+ return shapeGroup;
+};
+
+export function nodeClass(d: render.RenderNodeInformation) {
+ switch (d.node.type) {
+ case NodeType.OP:
+ return Class.OPNODE;
+ case NodeType.META:
+ return Class.METANODE;
+ case NodeType.SERIES:
+ return Class.SERIESNODE;
+ case NodeType.BRIDGE:
+ return Class.BRIDGENODE;
+ case NodeType.ELLIPSIS:
+ return Class.ELLIPSISNODE;
+ };
+ throw Error("Unrecognized node type: " + d.node.type);
+};
+
+/** Modify the positional attributes of a node, its subscene, and its label. */
+function position(nodeGroup, d: render.RenderNodeInformation, sceneBehavior) {
+ let shapeGroup = scene.selectChild(nodeGroup, "g", Class.Node.SHAPE);
+ switch (d.node.type) {
+ case NodeType.OP: {
+ // position shape
+ let shape = scene.selectChild(shapeGroup, "ellipse");
+ scene.positionEllipse(shape, d.x, d.y, d.width, d.height);
+ labelPosition(nodeGroup, d, d.labelOffset);
+ break;
+ }
+ case NodeType.META: {
+ // position shape
+ let shape = scene.selectChild(shapeGroup, "rect");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+
+ if (d.expanded) {
+ subscenePosition(nodeGroup, d);
+
+ // put label on top
+ labelPosition(nodeGroup, d,
+ - d.height / 2 + d.labelHeight / 2);
+ } else {
+ labelPosition(nodeGroup, d, 0);
+ }
+ break;
+ }
+ case NodeType.SERIES: {
+ let shape = scene.selectChild(shapeGroup, "use");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+ if (d.expanded) {
+ subscenePosition(nodeGroup, d);
+
+ // put label on top
+ labelPosition(nodeGroup, d,
+ - d.height / 2 + d.labelHeight / 2);
+ } else {
+ labelPosition(nodeGroup, d, d.labelOffset);
+ }
+ }
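+ // Note: there is no break here, so control falls through to the BRIDGE
+ // case below, which also positions the series node's background <rect>.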
+ case NodeType.BRIDGE: {
+ // position shape
+ // NOTE: In reality, these will not be visible, but it helps to put them
+ // in the correct position for debugging purposes.
+ let shape = scene.selectChild(shapeGroup, "rect");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+ break;
+ }
+ default: {
+ throw Error("Unrecognized node type: " + d.node.type);
+ }
+ }
+};
+
+/** Enum specifying the options to color nodes by */
+let ColorBy = {
+ STRUCTURE: 0,
+ DEVICE: 1,
+ COMPUTE_TIME: 2,
+ MEMORY: 3
+};
+
+/**
+ * Returns the fill color for the node given its state and the "color by"
+ * option.
+ */
+function getFillForNode(sceneBehavior, colorBy,
+ renderInfo: render.RenderNodeInformation, isExpanded: boolean): string {
+ let colorParams = tf.graph.render.MetanodeColors;
+ switch (colorBy) {
+ case ColorBy.STRUCTURE:
+ if (renderInfo.node.type === tf.graph.NodeType.META) {
+ let tid = (<Metanode>renderInfo.node).templateId;
+ return tid === null ? colorParams.UNKNOWN : colorParams.STRUCTURE_PALETTE(
+ sceneBehavior.templateIndex(tid), renderInfo.expanded);
+ } else if (renderInfo.node.type === tf.graph.NodeType.SERIES) {
+ // If expanded, we're showing the background rect, which we want to
+ // appear gray. Otherwise we're showing a stack of ellipses which we
+ // want to show white.
+ return renderInfo.expanded ? colorParams.EXPANDED_COLOR : "white";
+ } else if (renderInfo.node.type === NodeType.BRIDGE) {
+ return renderInfo.structural ? "#f0e" :
+ (<BridgeNode>renderInfo.node).inbound ? "#0ef" : "#fe0";
+ } else {
+ // Op nodes are white.
+ return "white";
+ }
+ case ColorBy.DEVICE:
+ if (renderInfo.deviceColors == null) {
+ // Return the hue for unknown device.
+ return colorParams.UNKNOWN;
+ }
+ let id = renderInfo.node.name;
+ let escapedId = tf.escapeQuerySelector(id);
+ let gradientDefs = d3.select("svg#svg defs #linearGradients");
+ let linearGradient =
+ gradientDefs.select("linearGradient#" + escapedId);
+ // If the linear gradient is not there yet, create it.
+ if (linearGradient.size() === 0) {
+ linearGradient = gradientDefs.append("linearGradient").attr("id", id);
+ // Re-create the stops of the linear gradient.
+ linearGradient.selectAll("*").remove();
+ let cumulativeProportion = 0;
+ // For each device, create a stop using the proportion of that device.
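+ // Each entry in deviceColors is assumed to be of the form
+ // {color, proportion}; the proportions are used as cumulative gradient
+ // stop offsets, so they should sum to 1.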
+ _.each(renderInfo.deviceColors, (d: any) => {
+ let color = d.color;
+ linearGradient.append("stop")
+ .attr("offset", cumulativeProportion)
+ .attr("stop-color", color);
+ linearGradient.append("stop")
+ .attr("offset", cumulativeProportion + d.proportion)
+ .attr("stop-color", color);
+ cumulativeProportion += d.proportion;
+ });
+ }
+ return isExpanded ? colorParams.EXPANDED_COLOR : `url(#${escapedId})`;
+ case ColorBy.COMPUTE_TIME:
+ return isExpanded ?
+ colorParams.EXPANDED_COLOR : renderInfo.computeTimeColor ||
+ colorParams.UNKNOWN;
+ case ColorBy.MEMORY:
+ return isExpanded ?
+ colorParams.EXPANDED_COLOR : renderInfo.memoryColor ||
+ colorParams.UNKNOWN;
+ default:
+ throw new Error("Unknown case to color nodes by");
+ }
+}
+
+/**
+ * Modify node style by toggling classes and assigning attributes (only for
+ * things that can't be done in css).
+ */
+export function stylize(nodeGroup, renderInfo: render.RenderNodeInformation,
+ sceneBehavior, nodeClass?) {
+ nodeClass = nodeClass || Class.Node.SHAPE;
+ let isHighlighted = sceneBehavior.isNodeHighlighted(renderInfo.node.name);
+ let isSelected = sceneBehavior.isNodeSelected(renderInfo.node.name);
+ let isExtract = renderInfo.isInExtract || renderInfo.isOutExtract;
+ let isExpanded = renderInfo.expanded;
+ nodeGroup.classed("highlighted", isHighlighted);
+ nodeGroup.classed("selected", isSelected);
+ nodeGroup.classed("extract", isExtract);
+ nodeGroup.classed("expanded", isExpanded);
+
+ // Main node always exists here and it will be reached before subscene,
+ // so d3 selection is fine here.
+ let node = nodeGroup.select("." + nodeClass + " ." + Class.Node.COLOR_TARGET);
+ let fillColor = getFillForNode(sceneBehavior,
+ ColorBy[sceneBehavior.colorBy.toUpperCase()],
+ renderInfo, isExpanded);
+ node.style("fill", fillColor);
+
+ // Choose outline to be darker version of node color if the node is a single
+ // color and is not selected.
+ if (isSelected) {
+ node.style("stroke", null);
+ } else {
+ // If node is colored by a gradient, then use a dark gray outline.
+ let outlineColor = fillColor.substring(0, 3) === "url" ?
+ tf.graph.render.MetanodeColors.GRADIENT_OUTLINE :
+ d3.rgb(fillColor).darker().toString();
+ node.style("stroke", outlineColor);
+ }
+};
+
+} // close module
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/scene/scene.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/scene.ts
new file mode 100644
index 0000000000..2e2467f039
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/scene/scene.ts
@@ -0,0 +1,409 @@
+/// <reference path="../graph.ts" />
+/// <reference path="edge.ts" />
+/// <reference path="node.ts" />
+/// <reference path="../layout.ts" />
+
+module tf.graph.scene {
+
+/** Element class names for objects in the scene. */
+export let Class = {
+ Node: {
+ // <g> element that contains nodes.
+ CONTAINER: "nodes",
+ // <g> element that contains detail about a node.
+ GROUP: "node",
+ // <g> element that contains visual elements (like rect, ellipse).
+ SHAPE: "nodeshape",
+ // <*> element(s) under SHAPE that should receive color updates.
+ COLOR_TARGET: "nodecolortarget",
+ // <text> element showing the node's label.
+ LABEL: "nodelabel",
+ // <g> element that contains all visuals for the expand/collapse
+ // button for expandable group nodes.
+ BUTTON_CONTAINER: "buttoncontainer",
+ // <circle> element that surrounds expand/collapse buttons.
+ BUTTON_CIRCLE: "buttoncircle",
+ // <path> element of the expand button.
+ EXPAND_BUTTON: "expandbutton",
+ // <path> element of the collapse button.
+ COLLAPSE_BUTTON: "collapsebutton"
+ },
+ Edge: {
+ CONTAINER: "edges",
+ GROUP: "edge",
+ LINE: "edgeline",
+ REF_LINE: "refline",
+ STRUCTURAL: "structural"
+ },
+ Annotation: {
+ OUTBOX: "out-annotations",
+ INBOX: "in-annotations",
+ GROUP: "annotation",
+ NODE: "annotation-node",
+ EDGE: "annotation-edge",
+ CONTROL_EDGE: "annotation-control-edge",
+ LABEL: "annotation-label",
+ ELLIPSIS: "annotation-ellipsis"
+ },
+ Scene: {
+ GROUP: "scene",
+ CORE: "core",
+ INEXTRACT: "in-extract",
+ OUTEXTRACT: "out-extract"
+ },
+ Subscene: {
+ GROUP: "subscene"
+ },
+ OPNODE: "op",
+ METANODE: "meta",
+ SERIESNODE: "series",
+ BRIDGENODE: "bridge",
+ ELLIPSISNODE: "ellipsis"
+};
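+
+// For example, scene.selectOrCreateChild(nodeGroup, "g", Class.Node.SHAPE)
+// selects (or creates) the <g class="nodeshape"> child of a node group.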
+
+/**
+ * Helper method for fitting the graph in the svg view.
+ *
+ * @param svg The main svg.
+ * @param zoomG The svg group used for panning and zooming.
+ * @param d3zoom The zoom behavior.
+ * @param callback Called when the fitting is done.
+ */
+export function fit(svg, zoomG, d3zoom, callback) {
+ let svgRect = svg.getBoundingClientRect();
+ let sceneSize = zoomG.getBBox();
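+ // The 0.9 factor leaves a 10% margin around the scene, and the cap of 2
+ // keeps small graphs from being scaled up more than 2x.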
+ let scale = 0.9 * Math.min(
+ svgRect.width / sceneSize.width,
+ svgRect.height / sceneSize.height,
+ 2
+ );
+ let params = layout.PARAMS.graph;
+ let zoomEvent = d3zoom.scale(scale)
+ .on("zoomend.fitted", () => {
+ // Remove the listener for the zoomend event,
+ // so we don't get called at the end of regular zoom events,
+ // just those that fit the graph to screen.
+ d3zoom.on("zoomend.fitted", null);
+ callback();
+ })
+ .translate([params.padding.paddingLeft, params.padding.paddingTop])
+ .event;
+ d3.select(zoomG).transition().duration(500).call(zoomEvent);
+};
+
+/**
+ * Helper method for panning the graph to center on the provided node,
+ * if the node is currently off-screen.
+ *
+ * @param nodeName The node to center the graph on
+ * @param svg The root SVG element for the graph
+ * @param zoomG The svg group used for panning and zooming.
+ * @param d3zoom The zoom behavior.
+ * @return True if the graph had to be panned to display the
+ * provided node.
+ */
+export function panToNode(nodeName: string, svg, zoomG, d3zoom): boolean {
+ let node: any = d3.selectAll("[data-name='" + nodeName + "']."
+ + Class.Node.GROUP)[0][0];
+ if (!node) {
+ return false;
+ }
+ let translate = d3zoom.translate();
+ // Check if the selected node is off-screen in either
+ // X or Y dimension in either direction.
+ let nodeBox = node.getBBox();
+ let nodeCtm = node.getScreenCTM();
+ let pointTL = svg.createSVGPoint();
+ let pointBR = svg.createSVGPoint();
+ pointTL.x = nodeBox.x;
+ pointTL.y = nodeBox.y;
+ pointBR.x = nodeBox.x + nodeBox.width;
+ pointBR.y = nodeBox.y + nodeBox.height;
+ pointTL = pointTL.matrixTransform(nodeCtm);
+ pointBR = pointBR.matrixTransform(nodeCtm);
+ let isOutsideOfBounds = (start, end, bound) => {
+ return end < 0 || start > bound;
+ };
+ let svgRect = svg.getBoundingClientRect();
+ if (isOutsideOfBounds(pointTL.x, pointBR.x, svgRect.width) ||
+ isOutsideOfBounds(pointTL.y, pointBR.y, svgRect.height)) {
+ // Determine the amount to transform the graph in both X and Y
+ // dimensions in order to center the selected node. This takes into
+ // account the position of the node, the size of the svg scene, the
+ // amount the scene has been scaled by through zooming, and any previous
+ // transform already performed by this logic.
+ let centerX = (pointTL.x + pointBR.x) / 2;
+ let centerY = (pointTL.y + pointBR.y) / 2;
+ let dx = ((svgRect.width / 2) - centerX);
+ let dy = ((svgRect.height / 2) - centerY);
+ let zoomEvent = d3zoom.translate([translate[0] + dx, translate[1] + dy])
+ .event;
+ d3.select(zoomG).transition().duration(500).call(zoomEvent);
+ return true;
+ }
+ return false;
+};
+
+/**
+ * Given a container d3 selection, select a child svg element of a given tag
+ * and class if one exists, or append/insert one otherwise. If multiple
+ * children match the tag and class name, returns only the first one.
+ *
+ * @param container
+ * @param tagName tag name.
+ * @param className (optional) Class name.
+ * @param before (optional) reference DOM node for insertion.
+ * @return selection of the element
+ */
+export function selectOrCreateChild(container, tagName: string,
+ className?: string, before?) {
+ let child = selectChild(container, tagName, className);
+ if (!child.empty()) {
+ return child;
+ }
+ let newElement = document.createElementNS("http://www.w3.org/2000/svg",
+ tagName);
+ if (className) {
+ newElement.classList.add(className);
+ }
+
+ if (before) { // if before exists, insert
+ container.node().insertBefore(newElement, before);
+ } else { // otherwise, append
+ container.node().appendChild(newElement);
+ }
+ return d3.select(newElement)
+ // need to bind data to emulate d3_selection.append
+ .datum(container.datum());
+};
+
+/**
+ * Given a container d3 selection, select a child element of a given tag and
+ * class. If multiple children match the tag and class name, returns only
+ * the first one.
+ *
+ * @param container
+ * @param tagName tag name.
+ * @param className (optional) Class name.
+ * @return selection of the element, or an empty selection
+ */
+export function selectChild(container, tagName: string, className?: string) {
+ let children = container.node().childNodes;
+ for (let i = 0; i < children.length; i++) {
+ let child = children[i];
+ if (child.tagName === tagName &&
+ (!className || child.classList.contains(className))
+ ) {
+ return d3.select(child);
+ }
+ }
+ return d3.select(null);
+};
+
+/**
+ * Select or create a sceneGroup and build/update its nodes and edges.
+ *
+ * Structure Pattern:
+ *
+ * <g class="scene">
+ * <g class="core">
+ * <g class="edges">
+ * ... stuff from tf.graph.scene.edges.build ...
+ * </g>
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * <g class="in-extract">
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * <g class="out-extract">
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * </g>
+ *
+ * @param container D3 selection of the parent.
+ * @param renderNode render node of a metanode or series node.
+ * @param sceneBehavior Parent scene module.
+ * @param sceneClass class attribute of the scene (default="scene").
+ */
+export function buildGroup(container,
+ renderNode: render.RenderGroupNodeInformation,
+ sceneBehavior,
+ sceneClass?: string) {
+ sceneClass = sceneClass || Class.Scene.GROUP;
+ let isNewSceneGroup = selectChild(container, "g", sceneClass).empty();
+ let sceneGroup = selectOrCreateChild(container, "g", sceneClass);
+
+ // core
+ let coreGroup = selectOrCreateChild(sceneGroup, "g", Class.Scene.CORE);
+ let coreNodes = _.reduce(renderNode.coreGraph.nodes(), (nodes, name) => {
+ let node = renderNode.coreGraph.node(name);
+ if (!node.excluded) {
+ nodes.push(node);
+ }
+ return nodes;
+ }, []);
+
+ if (renderNode.node.type === NodeType.SERIES) {
+ // For series, we want the first item on top, so reverse the array so
+ // that the first item in the series is the last one appended, and thus
+ // is rendered on top.
+ coreNodes.reverse();
+ }
+
+ // Create the layer of edges for this scene (paths).
+ edge.buildGroup(coreGroup, renderNode.coreGraph, sceneBehavior);
+
+ // Create the layer of nodes for this scene (ellipses, rects etc).
+ node.buildGroup(coreGroup, coreNodes, sceneBehavior);
+
+ // In-extract
+ if (renderNode.isolatedInExtract.length > 0) {
+ let inExtractGroup = selectOrCreateChild(sceneGroup, "g",
+ Class.Scene.INEXTRACT);
+ node.buildGroup(inExtractGroup, renderNode.isolatedInExtract,
+ sceneBehavior);
+ } else {
+ selectChild(sceneGroup, "g", Class.Scene.INEXTRACT).remove();
+ }
+
+ // Out-extract
+ if (renderNode.isolatedOutExtract.length > 0) {
+ let outExtractGroup = selectOrCreateChild(sceneGroup, "g",
+ Class.Scene.OUTEXTRACT);
+ node.buildGroup(outExtractGroup, renderNode.isolatedOutExtract,
+ sceneBehavior);
+ } else {
+ selectChild(sceneGroup, "g", Class.Scene.OUTEXTRACT).remove();
+ }
+
+ position(sceneGroup, renderNode);
+
+ // Fade in the scene group if it didn't already exist.
+ if (isNewSceneGroup) {
+ sceneGroup.attr("opacity", 0)
+ .transition().attr("opacity", 1);
+ }
+
+ return sceneGroup;
+};
+
+/**
+ * Given a scene's svg group, set g.in-extract, g.core, g.out-extract svg
+ * groups' position relative to the scene.
+ *
+ * @param sceneGroup
+ * @param renderNode render node of a metanode or series node.
+ */
+function position(sceneGroup, renderNode: render.RenderGroupNodeInformation) {
+ // Translate scenes down by the label height so that when showing graphs in
+ // expanded metanodes, the graphs are below the labels. Do not shift them
+ // down for series nodes as series nodes don't have labels inside of their
+ // bounding boxes.
+ let yTranslate = renderNode.node.type === NodeType.SERIES ?
+ 0 : layout.PARAMS.subscene.meta.labelHeight;
+
+ // core
+ translate(selectChild(sceneGroup, "g", Class.Scene.CORE),
+ 0, yTranslate);
+
+ // in-extract
+ let inExtractX = renderNode.coreBox.width === 0 ?
+ 0 : renderNode.coreBox.width;
+ let hasInExtract = renderNode.isolatedInExtract.length > 0;
+ if (hasInExtract) {
+ translate(selectChild(sceneGroup, "g", Class.Scene.INEXTRACT),
+ inExtractX, yTranslate);
+ }
+
+ // out-extract
+ let hasOutExtract = renderNode.isolatedOutExtract.length > 0;
+ if (hasOutExtract) {
+ let outExtractX = inExtractX + renderNode.inExtractBox.width
+ + renderNode.extractXOffset;
+ translate(selectChild(sceneGroup, "g", Class.Scene.OUTEXTRACT),
+ outExtractX, yTranslate);
+ }
+};
+
+/** Adds a click listener to a group that fires a graph-select event */
+export function addGraphClickListener(graphGroup, sceneBehavior) {
+ d3.select(graphGroup).on("click", () => {
+ sceneBehavior.fire("graph-select");
+ });
+};
+
+/** Helper for adding transform: translate(x0, y0) */
+export function translate(selection, x0: number, y0: number) {
+ selection.attr("transform", "translate(" + x0 + "," + y0 + ")");
+};
+
+/**
+ * Helper for setting position of a svg rect
+ * @param rect rect to set position of.
+ * @param cx Center x.
+ * @param cy Center y.
+ * @param width Width to set.
+ * @param height Height to set.
+ */
+export function positionRect(rect, cx: number, cy: number, width: number,
+ height: number) {
+ rect.transition().attr({
+ x: cx - width / 2,
+ y: cy - height / 2,
+ width: width,
+ height: height
+ });
+};
+
+/**
+ * Helper for setting position of a svg expand/collapse button
+ * @param button container group
+ * @param renderNode the render node of the group node to position
+ * the button on.
+ */
+export function positionButton(button,
+ renderNode: render.RenderNodeInformation) {
+ // Position the button in the top-right corner of the group node,
+ // with space given to draw the button inside the corner.
+ let x = renderNode.x + renderNode.width / 2 - 6;
+ let y = renderNode.y - renderNode.height / 2 + 6;
+ // For unexpanded series nodes, the button has special placement due
+ // to the unique visuals of this group node.
+ if (renderNode.node.type === NodeType.SERIES && !renderNode.expanded) {
+ x += 10;
+ y -= 2;
+ }
+ let translateStr = "translate(" + x + "," + y + ")";
+ button.selectAll("path").transition().attr("transform", translateStr);
+ button.select("circle").transition().attr({
+ cx: x,
+ cy: y,
+ r: layout.PARAMS.nodeSize.meta.expandButtonRadius
+ });
+};
+
+/**
+ * Helper for setting position of a svg ellipse
+ * @param ellipse ellipse to set position of.
+ * @param cx Center x.
+ * @param cy Center y.
+ * @param width Width to set.
+ * @param height Height to set.
+ */
+export function positionEllipse(ellipse, cx: number, cy: number,
+ width: number, height: number) {
+ ellipse.transition().attr({
+ cx: cx,
+ cy: cy,
+ rx: width / 2,
+ ry: height / 2
+ });
+};
+
+} // close module
diff --git a/tensorflow/tensorboard/components/tf-graph-common/lib/template.ts b/tensorflow/tensorboard/components/tf-graph-common/lib/template.ts
new file mode 100644
index 0000000000..b5aafc55e5
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/lib/template.ts
@@ -0,0 +1,282 @@
+/// <reference path="graph.ts" />
+/// <reference path="hierarchy.ts" />
+
+module tf.graph.template {
+
+/**
+ * Detect repeating patterns of subgraphs.
+ * Assign a templateId to each subgraph if it belongs to a template.
+ * Returns clusters of similar subgraphs.
+ *
+ * @param h The graph hierarchy.
+ * @param verifyTemplate Whether to run the template verification algorithm.
+ * @return A dict (template id => Array of node names).
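+ *
+ * For example (node names are illustrative), the result might look like:
+ * {"depth=2 |V|=3 |E|=2 [ops] Add=1,MatMul=2[0]": ["layer1", "layer2"]}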
+ */
+export function detect(h, verifyTemplate): {[templateId: string]: string[]} {
+ // In any particular subgraph, there are either
+ // - leaf nodes (which do not have subgraph)
+ // - metanode nodes - some of them have only one member (singular metanode)
+ // and some have multiple members (non-singular metanode)
+
+ // First, generate a nearest neighbor hash of metanode nodes.
+ let nnGroups = clusterSimilarSubgraphs(h);
+
+ // For each metanode, compare its subgraph (starting from shallower groups)
+ // and assign template id.
+ let templates = groupTemplateAndAssignId(nnGroups, verifyTemplate);
+
+ // Sort the templates by minimum level in the graph at which they appear,
+ // as this leads to optimal setting of the colors of each template for
+ // maximum differentiation.
+ return _(templates).pairs()
+ .sortBy(function(pair) {
+ return pair[1].level;
+ })
+ .map(function(pair) {
+ return [pair[0], pair[1].nodes];
+ })
+ .object().value();
+};
+
+/**
+ * @return Unique string for a metanode based on depth, |V|, |E| and
+ * op type histogram.
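+ *
+ * For example (values are illustrative), a metanode of depth 3 with five
+ * member nodes, four edges and op histogram {Add: 2, MatMul: 1, Relu: 2}
+ * yields "depth=3 |V|=5 |E|=4 [ops] Add=2,MatMul=1,Relu=2".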
+ */
+ function getSignature(metanode) {
+ // depth=<number> |V|=<number> |E|=<number>
+ let props = _.map({
+ "depth": metanode.depth,
+ "|V|": metanode.metagraph.nodes().length,
+ "|E|": metanode.metagraph.edges().length
+ }, function(v, k) { return k + "=" + v; }).join(" ");
+
+ // optype1=count1,optype2=count2
+ let ops = _.map(metanode.opHistogram, function(count, op) {
+ return op + "=" + count;
+ }).join(",");
+
+ return props + " [ops] " + ops;
+}
+
+/**
+ * Generate a nearest neighbor hash of metanodes
+ * based on depth, |V|, |E|, and opHistogram of their subgraph
+ * (excluding leaf nodes and singular metanodes).
+ * @param h The graph hierarchy.
+ * @return Array of pairs of [signature,
+ * Object with min level of the template and an Array of tf.graph.Group],
+ * sorted in ascending order of the minimum depth at which the metanode appears.
+ */
+function clusterSimilarSubgraphs(h: hierarchy.Hierarchy) {
+ /** a dict from metanode.signature() => Array of tf.graph.Groups */
+ let hashDict = _(h.getNodeMap()).reduce(function(hash, node: OpNode|Metanode, name) {
+ if (node.type !== NodeType.META) {
+ return hash;
+ }
+ let levelOfMetaNode = name.split("/").length - 1;
+ let signature = getSignature(node);
+ let templateInfo = hash[signature] ||
+ {nodes: [], level: levelOfMetaNode};
+ hash[signature] = templateInfo;
+ templateInfo.nodes.push(node);
+ if (templateInfo.level > levelOfMetaNode) {
+ templateInfo.level = levelOfMetaNode;
+ }
+ return hash;
+ }, {});
+
+ return _(hashDict).pairs()
+ // filter nn metanode with only one member
+ .filter(function(pair) {
+ return pair[1].nodes.length > 1;
+ })
+ .sortBy(function(pair) {
+ // sort by depth
+ // (all members in the same nnGroup have equal depth)
+ return pair[1].nodes[0].depth;
+ })
+ .value();
+}
+
+function groupTemplateAndAssignId(nnGroups, verifyTemplate) {
+ // For each metanode, compare its subgraph (starting from shallower groups)
+ // and assign template id.
+ return _.reduce(nnGroups, function(templates, nnGroupPair) {
+ let signature = nnGroupPair[0],
+ nnGroup = nnGroupPair[1].nodes,
+ clusters = [];
+
+ nnGroup.forEach(function(metanode) {
+ // check with each existing cluster
+ for (let i = 0; i < clusters.length; i++) {
+ let similar = !verifyTemplate ||
+ isSimilarSubgraph(
+ clusters[i].metanode.metagraph,
+ metanode.metagraph
+ );
+ // if similar, just add this metanode to the cluster
+ if (similar) {
+ // get template from the first one
+ metanode.templateId = clusters[i].metanode.templateId;
+ clusters[i].members.push(metanode.name);
+ return;
+ }
+ }
+ // otherwise create a new cluster with id "signature[count]"
+ metanode.templateId = signature + "[" + clusters.length + "]";
+ clusters.push({
+ metanode: metanode,
+ members: [metanode.name]
+ });
+ });
+
+ clusters.forEach(function(c) {
+ templates[c.metanode.templateId] = {
+ level: nnGroupPair[1].level,
+ nodes: c.members
+ };
+ });
+ return templates;
+ }, {});
+}
+
+function sortNodes(names: string[], graph: graphlib.Graph<Metanode|OpNode, Metaedge>,
+ prefix: string) {
+ return _.sortByAll(names,
+ function(name) {
+ let node = graph.node(name);
+ return (<OpNode>node).op;
+ },
+ function(name) {
+ let node = graph.node(name);
+ return (<Metanode>node).templateId;
+ },
+ function(name) {
+ return graph.neighbors(name).length;
+ },
+ function(name) {
+ return graph.predecessors(name).length;
+ },
+ function(name) {
+ return graph.successors(name).length;
+ },
+ function(name) {
+ return name.substr(prefix.length);
+ });
+}
+
+function isSimilarSubgraph(g1: graphlib.Graph<any, any>, g2: graphlib.Graph<any, any>) {
+ if (!tf.graph.hasSimilarDegreeSequence(g1, g2)) {
+ return false;
+ }
+
+ // if we want to skip, just return true here.
+ // return true;
+
+ // Verify sequence by running DFS
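+ // Walk both graphs in lockstep, starting from their sorted sources; the
+ // graphs are considered similar only if every paired node matches
+ // (isSimilarNode) and the visit pattern never diverges.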
+ let g1prefix = g1.graph().name;
+ let g2prefix = g2.graph().name;
+
+ let visited1 = {};
+ let visited2 = {};
+ let stack = [];
+
+ /**
+ * push sources or successors into the stack
+ * if the visiting pattern has been similar.
+ */
+ function stackPushIfNotDifferent(n1, n2) {
+ let sub1 = n1.substr(g1prefix.length),
+ sub2 = n2.substr(g2prefix.length);
+
+ /* tslint:disable */
+ if (visited1[sub1] ^ visited2[sub2]) {
+ console.warn("different visit pattern", "[" + g1prefix + "]", sub1,
+ "[" + g2prefix + "]", sub2);
+ return true;
+ }
+ /* tslint:enable */
+ if (!visited1[sub1]) { // implied && !visited2[sub2]
+ visited1[sub1] = visited2[sub2] = true;
+ stack.push({n1: n1, n2: n2});
+ }
+
+ return false;
+ }
+
+ // check if they have the same # of sources, then sort and push
+ let sources1 = g1.sources();
+ let sources2 = g2.sources();
+ if (sources1.length !== sources2.length) {
+ /* tslint:disable */
+ console.log("different source length");
+ /* tslint:enable */
+ return false;
+ }
+ sources1 = sortNodes(sources1, g1, g1prefix);
+ sources2 = sortNodes(sources2, g2, g2prefix);
+
+ for (let i = 0; i < sources1.length; i++) {
+ let different = stackPushIfNotDifferent(sources1[i], sources2[i]);
+ if (different) {
+ return false;
+ }
+ }
+
+ while (stack.length > 0) {
+ let cur = stack.pop();
+
+ // check node
+ let similar = isSimilarNode(g1.node(cur.n1), g2.node(cur.n2));
+ if (!similar) {
+ return false;
+ }
+
+ // check if they have the same # of successors, then sort and push
+ let succ1 = g1.successors(cur.n1), succ2 = g2.successors(cur.n2);
+ if (succ1.length !== succ2.length) {
+ /* tslint:disable */
+ console.log("# of successors mismatch", succ1, succ2);
+ /* tslint:enable */
+ return false;
+ }
+ succ1 = sortNodes(succ1, g1, g1prefix);
+ succ2 = sortNodes(succ2, g2, g2prefix);
+
+ for (let j = 0; j < succ1.length; j++) {
+ let different = stackPushIfNotDifferent(succ1[j], succ2[j]);
+ if (different) {
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
+/**
+ * Returns whether two nodes have identical structure.
+ */
+ function isSimilarNode(n1: OpNode|Metanode|SeriesNode, n2: OpNode|Metanode|SeriesNode): boolean {
+ if (n1.type === NodeType.META) {
+ // compare metanode
+ let metanode1 = <Metanode> n1;
+ let metanode2 = <Metanode> n2;
+ return !!(metanode1.templateId && metanode2.templateId &&
+ metanode1.templateId === metanode2.templateId);
+ } else if (n1.type === NodeType.OP && n2.type === NodeType.OP) {
+ // compare leaf node
+ return (<OpNode>n1).op === (<OpNode>n2).op;
+ } else if (n1.type === NodeType.SERIES && n2.type === NodeType.SERIES) {
+ // compare series node sizes and operations
+ // (only need to check one op as all op nodes are identical in series)
+ let seriesnode1 = <SeriesNode> n1;
+ let seriesnode2 = <SeriesNode> n2;
+ let seriesnode1Count = seriesnode1.metagraph.nodeCount();
+ return (seriesnode1Count === seriesnode2.metagraph.nodeCount() &&
+ (seriesnode1Count === 0 ||
+ ((<OpNode>seriesnode1.metagraph.node(seriesnode1.metagraph.nodes()[0])).op ===
+ (<OpNode>seriesnode2.metagraph.node(seriesnode2.metagraph.nodes()[0])).op)));
+ }
+ return false;
+}
+}
diff --git a/tensorflow/tensorboard/components/tf-graph-common/tf-graph-common.html b/tensorflow/tensorboard/components/tf-graph-common/tf-graph-common.html
new file mode 100644
index 0000000000..e4cd153113
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-common/tf-graph-common.html
@@ -0,0 +1,16 @@
+<script src="../../bower_components/d3/d3.js"></script>
+<script src="../../bower_components/lodash/lodash.js"></script>
+<script src="../../bower_components/graphlib/dist/graphlib.core.js"></script>
+
+<script src="lib/common.js"></script>
+<script src="lib/graph.js"></script>
+<script src="lib/parser.js"></script>
+<script src="lib/hierarchy.js"></script>
+<script src="lib/render.js"></script>
+<script src="lib/template.js"></script>
+<script src="lib/scene/scene.js"></script>
+<script src="lib/scene/annotation.js"></script>
+<script src="lib/scene/edge.js"></script>
+<script src="lib/scene/node.js"></script>
+<script src="lib/layout.js"></script>
+<script src="lib/colors.js"></script>
diff --git a/tensorflow/tensorboard/components/tf-graph-dashboard/tf-graph-dashboard.html b/tensorflow/tensorboard/components/tf-graph-dashboard/tf-graph-dashboard.html
new file mode 100644
index 0000000000..779d2d3fe9
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-dashboard/tf-graph-dashboard.html
@@ -0,0 +1,118 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../tf-graph-loader/tf-graph-loader.html">
+<link rel="import" href="../tf-graph-board/tf-graph-board.html">
+<link rel="import" href="../tf-graph/tf-graph-controls.html">
+<link rel="import" href="../tf-dashboard-common/warning-style.html">
+
+<!--
+tf-graph-dashboard displays a graph from a TensorFlow run.
+
+It has simple behavior: it creates a url-generator and a run-generator
+to talk to the backend, and then passes the runsWithGraph (list of runs with
+associated graphs) along with the url generator into tf-graph-board for display.
+
+If there are multiple runs with graphs, the first run's graph is shown
+by default. The user can select a different run from a dropdown menu.
+-->
+
+<dom-module id="tf-graph-dashboard">
+<template>
+<div id="plumbing">
+ <tf-url-generator
+ out-runs-url="{{_runsUrl}}"
+ out-graph-url-generator="{{_graphUrlGen}}"
+ id="urlGenerator"
+ ></tf-url-generator>
+ <tf-run-generator
+ id="runGenerator"
+ url="[[_runsUrl]]"
+ out-runs-with-graph="{{_runsWithGraph}}"
+ ></tf-run-generator>
+</div>
+<template is="dom-if" if="[[_datasetsEmpty(_datasets)]]">
+<div class="warning">
+ <p>
+ No graph definition files were found.
+ </p>
+ <p>
+ To store a graph, create a
+ <code>tf.python.training.summary_io.SummaryWriter</code>
+ and pass the graph either via the constructor, or by calling its
+ <code>add_graph()</code> method.
+ </p>
+</div>
+</template>
+<template is="dom-if" if="[[!_datasetsEmpty(_datasets)]]">
+<tf-dashboard-layout>
+<div class="sidebar">
+ <tf-graph-controls id="controls"
+ color-by-params="[[_colorByParams]]"
+ has-stats="[[_hasStats]]"
+ color-by="{{_colorBy}}"
+ datasets="[[_datasets]]"
+ selected-dataset="{{_selectedDataset}}"
+ selected-file="{{_selectedFile}}"
+ ></tf-graph-controls>
+ <tf-graph-loader id="loader"
+ datasets="[[_datasets]]"
+ selected-dataset="[[_selectedDataset]]"
+ selected-file="[[_selectedFile]]"
+ out-graph-hierarchy="{{_graphHierarchy}}"
+ out-graph="{{_graph}}"
+ out-graph-name="{{_graphName}}"
+ has-stats="{{_hasStats}}"
+ progress="{{_progress}}"
+ ></tf-graph-loader>
+</div>
+<div class="center">
+ <tf-graph-board id="graphboard"
+ graph-hierarchy="[[_graphHierarchy]]"
+ graph="[[_graph]]"
+ has-stats="[[_hasStats]]"
+ graph-name="[[_graphName]]"
+ progress="[[_progress]]"
+ color-by="[[_colorBy]]"
+ color-by-params="{{_colorByParams}}">
+ </tf-graph-board>
+</div>
+</template>
+<style>
+
+:host /deep/ {
+ font-family: 'Roboto', sans-serif;
+}
+
+.center {
+ height: 100%;
+}
+
+</style>
+<style include="warning-style"></style>
+</template>
+</dom-module>
+
+<script>
+(function() {
+Polymer({
+ is: 'tf-graph-dashboard',
+ properties: {
+ _runsWithGraph: Array,
+ _datasets: {
+ type: Object,
+ computed: '_getDatasets(_runsWithGraph, _graphUrlGen)'
+ }
+ },
+ _getDatasets: function(runsWithGraph, graphUrlGen) {
+ return _.map(runsWithGraph, function(runName) {
+ return {
+ name: runName,
+ path: graphUrlGen(runName)
+ };
+ });
+ },
+ _datasetsEmpty: function(datasets) {
+ return !datasets || !datasets.length;
+ }
+});
+})();
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph-info/tf-graph-info.html b/tensorflow/tensorboard/components/tf-graph-info/tf-graph-info.html
new file mode 100644
index 0000000000..b7900f86de
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-info/tf-graph-info.html
@@ -0,0 +1,65 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="tf-node-info.html">
+<dom-module id="tf-graph-info">
+<template>
+<style>
+:host {
+ font-size: 12px;
+ margin: 0;
+ padding: 0;
+ display: block;
+}
+
+h2 {
+ padding: 0;
+ text-align: center;
+ margin: 0;
+}
+</style>
+<template is="dom-if" if="{{selectedNode}}">
+ <paper-material elevation="1" class="card">
+ <tf-node-info graph-hierarchy='[[graphHierarchy]]'
+ flat-graph="[[graph]]"
+ node-name='[[selectedNode]]'
+ highlighted-node='{{highlightedNode}}'>
+ </tf-node-info>
+ </paper-material>
+</template>
+</template>
+<script>
+(function() {
+ Polymer({
+ is: 'tf-graph-info',
+
+ properties: {
+ title: String,
+ graphHierarchy: Object,
+ graph: Object,
+ // Two-way bound properties.
+ selectedNode: {
+ type: String,
+ notify: true
+ },
+ highlightedNode: {
+ type: String,
+ notify: true
+ }
+ },
+ listeners: {
+ 'node-list-item-click': '_nodeListItemClicked',
+ 'node-list-item-mouseover': '_nodeListItemMouseover',
+ 'node-list-item-mouseout': '_nodeListItemMouseout'
+ },
+ _nodeListItemClicked: function(event) {
+ this.selectedNode = event.detail.nodeName;
+ },
+ _nodeListItemMouseover: function(event) {
+ this.highlightedNode = event.detail.nodeName;
+ },
+ _nodeListItemMouseout: function() {
+ this.highlightedNode = null;
+ }
+ });
+})();
+</script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph-info/tf-node-info.html b/tensorflow/tensorboard/components/tf-graph-info/tf-node-info.html
new file mode 100644
index 0000000000..5044bf2bb1
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-info/tf-node-info.html
@@ -0,0 +1,345 @@
+<link rel="import" href="../../bower_components/iron-collapse/iron-collapse.html">
+<link rel="import" href="../../bower_components/iron-list/iron-list.html">
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-icon-button/paper-icon-button.html">
+<link rel="import" href="../../bower_components/paper-item/all-imports.html">
+<link rel="import" href="../tf-graph-common/tf-graph-common.html">
+<link rel="import" href="../tf-graph/tf-graph-icon.html">
+<link rel="import" href="tf-node-list-item.html">
+
+<dom-module id="tf-node-info">
+ <style>
+ .sub-list-group {
+ padding: 8px 12px 0px;
+ font-weight: 500;
+ font-size: 12pt;
+ }
+
+ .sub-list {
+ max-height: 300px;
+ overflow-y: scroll;
+ }
+
+ .attr-left {
+ float: left;
+ width: 30%;
+ word-wrap: break-word;
+ color: #565656;
+ font-size: 11pt;
+ font-weight: 400;
+ }
+
+ .attr-right {
+ margin-left: 30%;
+ word-wrap: break-word;
+ color: #565656;
+ font-weight: 400;
+ }
+
+ paper-item {
+ padding: 0;
+ background: #e9e9e9;
+ }
+
+ paper-item-body[two-line] {
+ min-height: 0;
+ padding: 8px 12px 4px;
+ }
+
+ .expandedInfo {
+ padding: 0 0 8px;
+ }
+
+ .controlDeps {
+ padding: 0 0 0 8px;
+ }
+
+ .node-name {
+ white-space: normal;
+ word-wrap: break-word;
+ font-size: 14pt;
+ font-weight: 500;
+ }
+
+ .node-icon {
+ float: right;
+ }
+
+ .subtitle {
+ font-size: 12pt;
+ color: #5e5e5e;
+ }
+
+ .controlLine {
+ font-size: 11pt;
+ font-weight: 400;
+ }
+
+ .toggle-button {
+ float: right;
+ max-height: 20px;
+ max-width: 20px;
+ padding: 0;
+ }
+
+ .control-toggle-button {
+ float: left;
+ max-height: 20px;
+ max-width: 20px;
+ padding: 0;
+ }
+ </style>
+ <template>
+ <paper-item>
+ <paper-item-body two-line>
+ <div>
+ <paper-icon-button
+ icon="{{_getToggleIcon(_expanded)}}"
+ on-click="_toggleExpanded"
+ class="toggle-button">
+ </paper-icon-button>
+ <div class="node-name">[[_getNodeName(nodeName)]]</div>
+ </div>
+ <div secondary>
+ <tf-graph-icon class="node-icon" node="[[_node]]"></tf-graph-icon>
+ <template is="dom-if" if="{{_node.op}}">
+ <div class="subtitle">
+ Operation:
+ <span>[[_node.op]]</span>
+ </div>
+ </template>
+ <template is="dom-if" if="{{_node.metagraph}}">
+ <div class="subtitle">
+ Subgraph:
+ <span>[[_node.cardinality]]</span> nodes
+ </div>
+ </template>
+ </div>
+ </paper-item-body>
+ </paper-item>
+ <iron-collapse opened="{{_expanded}}">
+ <template is="dom-if" if="{{_expanded}}" restamp="true">
+ <div class="expandedInfo">
+ <div class="sub-list-group attributes">
+ Attributes
+ (<span>[[_attributes.length]]</span>)
+ <iron-list class="sub-list" id ="attributesList"
+ items="[[_attributes]]">
+ <template>
+ <div>
+ <div class="attr-left">[[item.key]]</div>
+ <div class="attr-right">[[item.value]]</div>
+ </div>
+ </template>
+ </iron-list>
+ </div>
+
+ <template is="dom-if" if="{{_device}}">
+ <div class="sub-list-group device">
+ <div class="attr-left">Device</div>
+ <div class="attr-right">[[_device]]</div>
+ </div>
+ </template>
+
+ <div class="sub-list-group predecessors">
+ Inputs
+ (<span>[[_totalPredecessors]]</span>)
+ <iron-list class="sub-list" id ="inputsList"
+ items="[[_predecessors.regular]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]"
+ item-node="[[_getNode(item, graphHierarchy)]]"
+ name="[[item]]"
+ item-type="predecessors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ <template is="dom-if" if="[[_predecessors.control.length]]">
+ <div class="controlDeps">
+ <div class="controlLine">
+ <paper-icon-button
+ icon="{{_getToggleIcon(_openedControlPred)}}"
+ on-click="_toggleControlPred"
+ class="control-toggle-button">
+ </paper-icon-button>
+ Control dependencies
+ </div>
+ <iron-collapse opened="{{_openedControlPred}}">
+ <template is="dom-if" if="{{_openedControlPred}}" restamp="true">
+ <iron-list class="sub-list" items="[[_predecessors.control]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]"
+ item-node="[[_getNode(item, graphHierarchy)]]"
+ name="[[item]]"
+ item-type="predecessors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ </template>
+ </iron-collapse>
+ </div>
+ </template>
+ </div>
+
+ <div class="sub-list-group successors">
+ Outputs
+ (<span>[[_totalSuccessors]]</span>)
+ <iron-list class="sub-list" id ="outputsList"
+ items="[[_successors.regular]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]"
+ item-node="[[_getNode(item, graphHierarchy)]]"
+ name="[[item]]"
+ item-type="successor">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ <template is="dom-if" if="[[_successors.control.length]]">
+ <div class="controlDeps">
+ <div class="controlLine">
+ <paper-icon-button
+ icon="{{_getToggleIcon(_openedControlSucc)}}"
+ on-click="_toggleControlSucc"
+ class="control-toggle-button">
+ </paper-icon-button>
+ Control dependencies
+ </div>
+ <iron-collapse opened="{{_openedControlSucc}}">
+ <template is="dom-if" if="{{_openedControlSucc}}" restamp="true">
+ <iron-list class="sub-list" items="[[_successors.control]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]"
+ item-node="[[_getNode(item, graphHierarchy)]]"
+ name="[[item]]"
+ item-type="successors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ </template>
+ </iron-collapse>
+ </div>
+ </template>
+ </div>
+ </div>
+ </template>
+ </iron-collapse>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-node-info',
+
+ properties: {
+ nodeName: String,
+ graphHierarchy: Object,
+ _node: {
+ type: Object,
+ computed: '_getNode(nodeName, graphHierarchy)',
+ observer: '_resetState'
+ },
+ _attributes: {
+ type: Array,
+ computed: '_getAttributes(_node)'
+ },
+ _device: {
+ type: String,
+ computed: '_getDevice(_node)'
+ },
+ _successors: {
+ type: Object,
+ computed: '_getSuccessors(_node, graphHierarchy)'
+ },
+ _predecessors: {
+ type: Object,
+ computed: '_getPredecessors(_node, graphHierarchy)'
+ },
+ _subnodes: {
+ type: Array,
+ computed: '_getSubnodes(_node)'
+ },
+ _expanded: {
+ type: Boolean,
+ value: true
+ },
+ _totalPredecessors: {
+ type: Number,
+ computed: '_getTotalPred(_predecessors)'
+ },
+ _totalSuccessors: {
+ type: Number,
+ computed: '_getTotalSucc(_successors)'
+ },
+ _openedControlPred: {
+ type: Boolean,
+ value: false
+ },
+ _openedControlSucc: {
+ type: Boolean,
+ value: false
+ },
+ },
+ expandNode: function() {
+ this.fire('_node.expand', this.node);
+ },
+ _getNode: function(n, graphHierarchy) {
+ return graphHierarchy.node(n);
+ },
+ _getNodeName: function(nodeName) {
+ // Insert a zero-width whitespace character before each slash so that
+ // long node names wrap cleanly at path boundaries.
+ return (nodeName || '').replace(/\//g, '\u200B/');
+ },
+ _getAttributes: function(node) {
+ this.async(this._resizeList.bind(this, "#attributesList"));
+ return node && node.attr ? node.attr.map(function(entry) {
+ return {key: entry.key, value: JSON.stringify(entry.value)};
+ }) : [];
+ },
+ _getDevice: function(node) {
+ return node ? node.device : null;
+ },
+ _getSuccessors: function(node, hierarchy) {
+ this.async(this._resizeList.bind(this, "#outputsList"));
+ return node ? hierarchy.getSuccessors(node.name) : {regular: [], control: []};
+ },
+ _getPredecessors: function(node, hierarchy) {
+ this.async(this._resizeList.bind(this, "#inputsList"));
+ return node ? hierarchy.getPredecessors(node.name) : {regular: [], control: []};
+ },
+ _getSubnodes: function(node) {
+ return node && node.metagraph ? node.metagraph.nodes() : null;
+ },
+ _getTotalPred: function(predecessors) {
+ return predecessors.regular.length + predecessors.control.length;
+ },
+ _getTotalSucc: function(successors) {
+ return successors.regular.length + successors.control.length;
+ },
+ _toggleControlPred: function() {
+ this._openedControlPred = !this._openedControlPred;
+ },
+ _toggleControlSucc: function() {
+ this._openedControlSucc = !this._openedControlSucc;
+ },
+ _toggleExpanded: function() {
+ this._expanded = !this._expanded;
+ },
+ _getToggleIcon: function(expanded) {
+ return expanded ? "expand-less" : "expand-more";
+ },
+ _resetState: function() {
+ this._openedControlPred = false;
+ this._openedControlSucc = false;
+ },
+ _resizeList: function(selector) {
+ var list = document.querySelector(selector);
+ if (list) {
+ list.fire('iron-resize');
+ }
+ }
+ });
+ })();
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph-info/tf-node-list-item.html b/tensorflow/tensorboard/components/tf-graph-info/tf-node-list-item.html
new file mode 100644
index 0000000000..f16e9e4511
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-info/tf-node-list-item.html
@@ -0,0 +1,91 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../tf-graph/tf-graph-icon.html">
+
+<dom-module id="tf-node-list-item">
+ <style>
+ #list-item {
+ width: 100%;
+ color: #565656;
+ font-size: 11pt;
+ font-weight: 400;
+ position: relative;
+ }
+
+ #list-item:hover {
+ background-color: var(--google-yellow-100);
+ }
+
+ .clickable {
+ cursor: pointer;
+ }
+
+ #list-item span {
+ display: block;
+ margin-left: 40px;
+ }
+
+ #list-item.excluded span {
+ color: #999;
+ }
+
+ .node-icon {
+ position: absolute;
+ top: 1px;
+ left: 2px;
+ }
+ </style>
+ <template>
+ <div id="list-item"
+ on-mouseover="_nodeListener"
+ on-mouseout="_nodeListener"
+ on-click="_nodeListener">
+ <tf-graph-icon class="node-icon"
+ node="[[itemNode]]" height="12"></tf-graph-icon>
+ <span title$="[[name]]">[[name]]</span>
+ </div>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-node-list-item',
+
+ properties: {
+ /**
+ * The Node for the card itself, on which this item is being drawn.
+ * @type {tf.graph.Node}
+ */
+ cardNode: Object,
+ /**
+ * The Node for the item within the card, somehow related to cardNode.
+ * @type {tf.graph.Node}
+ */
+ itemNode: Object,
+ name: String,
+ itemType: {
+ type: String,
+ observer: '_itemTypeChanged'
+ }
+ },
+
+ _itemTypeChanged: function() {
+ if (this.itemType !== 'subnode') {
+ this.$['list-item'].classList.add('clickable');
+ } else {
+ this.$['list-item'].classList.remove('clickable');
+ }
+ },
+
+ _nodeListener: function(event) {
+ // fire node-list-item-click / -mouseover / -mouseout
+ this.fire('node-list-item-' + event.type, {
+ cardNode: this.cardNode.name,
+ nodeName: this.name,
+ type: this.itemType
+ });
+ }
+
+ });
+ })();
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph-loader/tf-graph-loader.html b/tensorflow/tensorboard/components/tf-graph-loader/tf-graph-loader.html
new file mode 100644
index 0000000000..3f290f2152
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph-loader/tf-graph-loader.html
@@ -0,0 +1,172 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<!--
+An element which loads and parses a pbtxt graph definition and exposes the
+resulting graph and hierarchy as outputs.
+-->
+<dom-module id="tf-graph-loader">
+</dom-module>
+
+<script>
+Polymer({
+
+ is: 'tf-graph-loader',
+
+ properties: {
+ /**
+ * @type {{value: number, msg: string}}
+ *
+ * An object whose value is a number between 0 and 100 denoting the % of
+ * progress shown by the progress bar, and whose msg is the displayed message.
+ */
+ progress: {
+ type: Object,
+ notify: true,
+ readOnly: true // Produces, does not consume.
+ },
+ datasets: Array,
+ hasStats: {
+ type: Boolean,
+ readOnly: true, // This property produces data.
+ notify: true
+ },
+ selectedDataset: Number,
+ selectedFile: {
+ type: Object,
+ observer: '_selectedFileChanged'
+ },
+ outGraphHierarchy: {
+ type: Object,
+ readOnly: true, //readonly so outsider can't change this via binding
+ notify: true
+ },
+ outGraph: {
+ type: Object,
+ readOnly: true, //readonly so outsider can't change this via binding
+ notify: true
+ },
+ outGraphName: {
+ type: String,
+ readOnly: true,
+ notify: true
+ }
+ },
+ observers: [
+ '_selectedDatasetChanged(selectedDataset, datasets)'
+ ],
+ _parseAndConstructHierarchicalGraph: function(dataset, pbTxtContent) {
+ var self = this;
+ // Reset the progress bar to 0.
+ self._setProgress({
+ value: 0,
+ msg: ''
+ });
+ var tracker = {
+ setMessage: function(msg) {
+ self._setProgress({
+ value: self.progress.value,
+ msg: msg
+ });
+ },
+ updateProgress: function(value) {
+ self._setProgress({
+ value: self.progress.value + value,
+ msg: self.progress.msg
+ });
+ },
+ reportError: function(msg) {
+ self._setProgress({
+ value: self.progress.value,
+ msg: msg,
+ error: true
+ });
+ },
+ };
+ var statsJson;
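+ // The subtask trackers below split the overall progress bar among the
+ // pipeline stages: data parsing 30%, graph building 20%, and namespace
+ // hierarchy building 50%.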
+ var dataTracker = tf.getSubtaskTracker(tracker, 30, 'Data');
+ tf.graph.parser.readAndParseData(dataset, pbTxtContent, dataTracker)
+ .then(function(result) {
+ // Build the flat graph (consists only of Op nodes).
+ var nodes = result.nodes;
+ statsJson = result.statsJson;
+
+ // This is the whitelist of inputs on op types that are considered
+ // reference edges. "Assign 0" indicates that the first input to
+ // an OpNode with operation type "Assign" is a reference edge.
+ var refEdges = {};
+ refEdges["Assign 0"] = true;
+ refEdges["AssignAdd 0"] = true;
+ refEdges["AssignSub 0"] = true;
+ refEdges["assign 0"] = true;
+ refEdges["assign_add 0"] = true;
+ refEdges["assign_sub 0"] = true;
+ refEdges["count_up_to 0"] = true;
+ refEdges["ScatterAdd 0"] = true;
+ refEdges["ScatterSub 0"] = true;
+ refEdges["ScatterUpdate 0"] = true;
+ refEdges["scatter_add 0"] = true;
+ refEdges["scatter_sub 0"] = true;
+ refEdges["scatter_update 0"] = true;
+ var buildParams = {
+ enableEmbedding: true,
+ inEmbeddingTypes: ['Const'],
+ outEmbeddingTypes: ['^[a-zA-Z]+Summary$'],
+ refEdges: refEdges
+ };
+ var graphTracker = tf.getSubtaskTracker(tracker, 20,
+ 'Graph');
+ return tf.graph.build(nodes, buildParams, graphTracker);
+ })
+ .then(function(graph) {
+ this._setOutGraph(graph);
+ if (statsJson) {
+ // If there are associated stats, join them with the graph.
+ tf.time('Joining stats info with graph...', function() {
+ tf.graph.joinStatsInfoWithGraph(graph, statsJson);
+ });
+ }
+ var hierarchyParams = {
+ verifyTemplate: true,
+ groupSeries: true,
+ };
+ var hierarchyTracker = tf.getSubtaskTracker(tracker, 50,
+ 'Namespace hierarchy');
+ return tf.graph.hierarchy.build(graph, hierarchyParams, hierarchyTracker);
+ }.bind(this))
+ .then(function(graphHierarchy) {
+ // Update the properties which notify the parent with the
+ // graph hierarchy and whether the data has live stats or not.
+ this._setHasStats(statsJson != null);
+ this._setOutGraphHierarchy(graphHierarchy);
+ }.bind(this))
+ .catch(function(reason) {
+ tracker.reportError("Graph visualization failed: " + reason);
+ });
+ },
+ _selectedDatasetChanged: function(datasetIndex, datasets) {
+ var dataset = datasets[datasetIndex];
+ this._parseAndConstructHierarchicalGraph(dataset);
+ this._setOutGraphName(dataset.name);
+ },
+ _selectedFileChanged: function(e) {
+ if (!e) {
+ return;
+ }
+ var file = e.target.files[0];
+ if (!file) {
+ return;
+ }
+
+ // Clear out the value of the file chooser. This ensures that if the user
+ // selects the same file, we'll re-read it.
+ e.target.value = '';
+
+ var reader = new FileReader();
+
+ reader.onload = function(e) {
+ this._parseAndConstructHierarchicalGraph(null, e.target.result);
+ }.bind(this);
+
+ reader.readAsText(file);
+ }
+});
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph/demo/tf-graph-demo.html b/tensorflow/tensorboard/components/tf-graph/demo/tf-graph-demo.html
new file mode 100644
index 0000000000..d6e736d185
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/demo/tf-graph-demo.html
@@ -0,0 +1,185 @@
+<link rel="import" href="../../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../tf-graph-board/tf-graph-board.html">
+<link rel="import" href="../../tf-graph-loader/tf-graph-loader.html">
+<link rel="import" href="../../tf-graph/tf-graph-controls.html">
+<!-- Element for tf-graph demo page
+
+Example
+
+<tf-graph-demo></tf-graph-demo>
+
+-->
+
+<dom-module id="tf-graph-demo">
+<template>
+<style>
+
+:host /deep/ {
+ font-family: 'Roboto', sans-serif;
+}
+
+.main {
+ position: absolute;
+ right: 0;
+ left: 250px;
+ height: 100%;
+}
+
+.side {
+ position: absolute;
+ left: 0;
+ width: 250px;
+ height: 100%;
+ border: 1px solid black;
+ box-sizing: border-box;
+}
+
+.all {
+ position: relative;
+ width: 100%;
+ height: 100%;
+}
+
+</style>
+<div class="all">
+ <div class="side">
+ <tf-graph-controls
+ color-by-params="[[colorByParams]]"
+ has-stats="[[hasStats]]"
+ color-by="{{colorBy}}"
+ datasets="[[datasets]]"
+ selected-dataset="{{selectedDataset}}"
+ selected-file="{{selectedFile}}"
+ ></tf-graph-controls>
+ <tf-graph-loader id="loader"
+ datasets="[[datasets]]"
+ selected-dataset="[[selectedDataset]]"
+ selected-file="[[selectedFile]]"
+ out-graph-hierarchy="{{graphHierarchy}}"
+ out-graph="{{graph}}"
+ out-graph-name="{{graphName}}"
+ has-stats="{{hasStats}}"
+ progress="{{_progress}}"
+ ></tf-graph-loader>
+ </div>
+ <div class="main">
+ <tf-graph-board id="graphboard"
+ graph-hierarchy="[[graphHierarchy]]"
+ graph="[[graph]]"
+ has-stats="[[hasStats]]"
+ graph-name="[[graphName]]"
+ progress="[[_progress]]"
+ color-by="[[colorBy]]"
+ color-by-params="{{colorByParams}}"
+ ></tf-graph-board>
+ </div>
+</div>
+</template>
+</dom-module>
+
+<script>
+(function(){
+
+var datasets = [
+ {
+ name: "Mnist Eval",
+ path: "mnist_eval.pbtxt",
+ },
+ {
+ name: "Mnist Train (with stats)",
+ path: "mnist_train.pbtxt",
+ statsPath: "mnist_train_stats.json"
+ },
+ {
+ name: "Inception Train (huge)",
+ path: "inception_train.pbtxt",
+ },
+ {
+ name: "Inception Train Eval",
+ path: "inception_train_eval.pbtxt",
+ },
+ {
+ name: "Inception Test",
+ path: "inception_test_eval.pbtxt",
+ },
+ {
+ name: "PTB Word LSTM Train",
+ path: "ptb_word_lstm_train.pbtxt",
+ },
+ {
+ name: "PTB Word LSTM Train Eval",
+ path: "ptb_word_lstm_train_eval.pbtxt",
+ },
+ {
+ name: "PTB Word LSTM Test",
+ path: "ptb_word_lstm_test_eval.pbtxt",
+ },
+ {
+ name: "Cifar10 Train",
+ path: "cifar10_train.pbtxt",
+ },
+ {
+ name: "Cifar10 Multi-GPU Train",
+ path: "cifar10_multi_gpu_train.pbtxt",
+ },
+ {
+ name: "Cifar10 Eval",
+ path: "cifar10_eval.pbtxt",
+ },
+ {
+ name: "Fatcat LSTM",
+ path: "fatcat_lstm.pbtxt",
+ },
+ {
+ name: "Legacy Inception Renamed",
+ path: "legacy_inception_renamed.pbtxt",
+ },
+ {
+ name: "Wolfe (Broken)",
+ path: "wolfe1.pbtxt",
+ },
+ {
+ name: "Wolfe (Fixed)",
+ path: "wolfe2.pbtxt",
+ },
+ {
+ name: "AlexNet",
+ path: "alexnet.pbtxt",
+ },
+ {
+ name: "TestError404",
+ path: "nofile"
+ }
+];
+
+Polymer({
+ is: 'tf-graph-demo',
+ properties: {
+ hasStats: Boolean,
+ datasets: {
+ type: Object,
+ value: function() {
+ return this._getDatasets();
+ }
+ },
+ selectedDataset: {
+ type: Number,
+ value: 1
+ },
+ _progress: Object
+ },
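+ /**
+ * Resolves each demo dataset path relative to this component. This assumes
+ * the pbtxt files (and any optional stats json) are served from a sibling
+ * tf_model_zoo/ directory.
+ */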
+ _getDatasets: function() {
+ return _.map(datasets, function(dataset) {
+ var result = {
+ name: dataset.name,
+ path: this.resolveUrl('tf_model_zoo/' + dataset.path)
+ };
+ if (dataset.statsPath != null) {
+ result.statsPath = this.resolveUrl('tf_model_zoo/' + dataset.statsPath);
+ }
+ return result;
+ }, this);
+ }
+});
+})();
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-controls.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-controls.html
new file mode 100644
index 0000000000..4a6901e911
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-controls.html
@@ -0,0 +1,487 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-menu/paper-menu.html">
+<link rel="import" href="../../bower_components/paper-dropdown-menu/paper-dropdown-menu.html">
+
+<dom-module id="tf-graph-controls">
+<template>
+<style>
+:host {
+ font-size: 12px;
+ color: gray;
+ --paper-font-subhead: {
+ font-size: 14px;
+ color: gray;
+ };
+ --paper-dropdown-menu-icon: {
+ width: 15px;
+ height: 15px;
+ };
+ --paper-dropdown-menu-button: {
+ padding: 0;
+ };
+ --paper-dropdown-menu-input: {
+ padding: 0;
+ };
+ --paper-item-min-height: 30px;
+}
+
+paper-button[raised].keyboard-focus {
+ font-weight: normal;
+}
+
+.run-dropdown {
+ --paper-input-container: {
+ padding: 9px 0 0 25px;
+ };
+}
+
+.color-dropdown {
+ --paper-input-container: {
+ padding: 9px 0 0 13px;
+ };
+}
+
+table {
+ border-collapse: collapse;
+ border-spacing: 0;
+}
+
+table td {
+ padding: 0;
+ margin: 0;
+}
+
+.allcontrols {
+ padding: 10px;
+}
+
+.legend-holder {
+ position: absolute;
+ bottom: 0;
+ padding-bottom: 10px;
+}
+
+#fit {
+ color: var(--paper-orange-500);
+}
+
+paper-radio-button {
+ padding: 5px;
+}
+svg.icon {
+ width: 60px;
+ height: 18px;
+}
+.icon ellipse {
+ rx: 10px;
+ ry: 5px;
+ stroke: #CCC;
+ stroke-width: 1px;
+ fill: #FFFFFF;
+ cy: 10px;
+}
+.icon rect {
+ height: 14px;
+ width: 35px;
+ rx: 5px;
+ ry: 5px;
+ stroke: #CCC;
+ stroke-width: 2px;
+ fill: #D9D9D9;
+}
+.domainValues {
+ width: 165px;
+}
+.domainStart {
+ float: left;
+}
+.domainEnd {
+ float: right;
+}
+.colorBox {
+ width: 20px;
+}
+
+.image-icon {
+ width: 24px;
+ height: 24px;
+}
+
+.gray {
+ color: #666;
+}
+
+.title {
+ font-size: 16px;
+ margin: 8px 5px 8px 0;
+ color: black;
+}
+.title small {
+ font-weight: normal;
+}
+.deviceList {
+ max-height: 100px;
+ overflow-y: auto;
+}
+
+#file {
+ padding: 8px 0;
+}
+
+.color-text {
+ padding: 0 0 0 55px;
+}
+
+.fit-button-text {
+ text-transform: none;
+ padding: 8px 18px 0 18px;
+ font-size: 14px;
+}
+
+.upload-button {
+ width: 165px;
+ height: 25px;
+ text-transform: none;
+ margin-top: 4px;
+}
+
+.fit-button {
+ padding: 2px;
+ width: 30px;
+ height: 30px;
+}
+
+.hidden-input {
+ height: 0px;
+ width: 0px;
+ overflow:hidden;
+}
+
+.allcontrols .control-holder {
+ display: flex;
+ clear: both;
+}
+</style>
+<div class="allcontrols">
+ <div class="control-holder">
+ <paper-icon-button id="fit" icon="aspect-ratio" class="fit-button" on-click="fit" alt="Fit to screen">
+ </paper-icon-button>
+ <paper-button class="fit-button-text" on-click="fit">Fit to screen
+ </paper-button>
+ </div>
+ <div class="control-holder">
+ <div class="title">Run</div>
+ <paper-dropdown-menu no-label-float no-animations noink class="run-dropdown">
+ <paper-menu id="select" class="dropdown-content" selected="{{selectedDataset}}">
+ <template is="dom-repeat" items="[[datasets]]">
+ <paper-item>[[item.name]]</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ </div>
+ <div class="control-holder">
+ <div class="title">Upload</div>
+ <paper-button raised class="text-button upload-button"
+ on-click="_getFile">Choose File</paper-button>
+ <div class="hidden-input">
+ <input type="file" id="file" name="file" on-change="_updateFileInput" />
+ </div>
+ </div>
+ <div class="control-holder">
+ <div class="title">Color</div>
+ <paper-dropdown-menu no-label-float no-animations noink class="color-dropdown">
+ <paper-menu class="dropdown-content" selected="{{_colorByIndex}}">
+ <paper-item>Structure</paper-item>
+ <paper-item>Device</paper-item>
+ <template is="dom-if" if="[[hasStats]]">
+ <paper-item>Compute time</paper-item>
+ <paper-item>Memory</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ </div>
+ <div>
+ <template is="dom-if" if="[[_isGradientColoring(colorBy)]]">
+ <svg width="160" height="20" style="margin: 0 5px" class="color-text">
+ <defs>
+ <linearGradient id="linearGradient" x1="0%" y1="0%" x2="100%" y2="0%">
+ <stop class="start" offset="0%"
+ stop-color$="[[_currentGradientParams.startColor]]"/>
+ <stop class="end" offset="100%"
+ stop-color$="[[_currentGradientParams.endColor]]"/>
+ </linearGradient>
+ </defs>
+ <rect x="0" y="0" width="160" height="20" fill="url(#linearGradient)"
+ stroke="black" />
+ </svg>
+ <div class="domainValues color-text">
+ <div class="domainStart">[[_currentGradientParams.minValue]]</div>
+ <div class="domainEnd">[[_currentGradientParams.maxValue]]</div>
+ </div>
+ </template>
+ <template is="dom-if" if="[[_equals(colorBy, 'structure')]]">
+ <div class="color-text">
+ color: same substructure<br/>
+ gray: unique substructure
+ </div>
+ </template>
+ <template is="dom-if" if="[[_equals(colorBy, 'device')]]">
+ <div class="color-text">
+ <div class="deviceList">
+ <table>
+ <template is="dom-repeat" items="[[colorByParams.device]]">
+ <tr>
+ <td style$="[[_getBackgroundColor(item.color)]]">
+ <div class="colorBox"></div>
+ </td>
+ <td>
+ <div>[[item.device]]</div>
+ </td>
+ </tr>
+ </template>
+ </table>
+ </div>
+ <br/>
+ gray: unknown device
+ </div>
+ </template>
+ </div>
+ <div class="legend-holder">
+ <table>
+ <tr>
+ <td><div class="title">Graph</div></td>
+ <td>(* = expandable)</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon">
+ <rect transform="translate(3, 1)" height="14" width="35"
+ rx="5" ry="5"/>
+ </svg>
+ </td>
+ <td>Namespace<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" preserveAspectRatio="xMinYMid meet"
+ viewBox="0 0 10 10">
+ <use xlink:href="#op-node-stamp" fill="white" stroke="#ccc" x="9.5"
+ y="6" />
+ </svg>
+ </td>
+ <td>OpNode</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet"
+ viewBox="0 0 12 12">
+ <use xlink:href="#op-series-horizontal-stamp" fill="white"
+ stroke="#ccc" x="2" y="2"/>
+ </svg>
+ </td>
+ <td>Unconnected series<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <use xlink:href="#op-series-vertical-stamp"
+ fill="white" stroke="#ccc" x="2" y="2"/>
+ </svg>
+ </td>
+ <td>Connected series<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon">
+ <circle fill="white" stroke="#848484" cx="10" cy="10" r="5"/>
+ </svg>
+ </td>
+ <td>Constant</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="image-icon">
+ <image id="summary-icon" width="24" height="24" x="0" y="0"
+ class="image-icon"/>
+ </svg>
+ </td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <defs>
+ <marker id="arrowhead-legend" fill="#bbb" markerWidth="10"
+ markerHeight="10" refX="9" refY="5" orient="auto">
+ <path d="M 0,0 L 10,5 L 0,10 C 3,7 3,3 0,0"/>
+ </marker>
+ <marker id="ref-arrowhead-legend" fill="#bbb" markerWidth="10"
+ markerHeight="10" refX="1" refY="5" orient="auto">
+ <path d="M 10,0 L 0,5 L 10,10 C 7,7 7,3 10,0"/>
+ </marker>
+ </defs>
+ <path marker-end="url(#arrowhead-legend)" stroke="#bbb"
+ d="M2 9 l 23 0" stroke-linecap="round" />
+ </svg>
+ </td>
+ <td>Dataflow edge</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <path marker-end="url(#arrowhead-legend)" stroke="#bbb"
+ d="M2 9 l 23 0" stroke-linecap="round" stroke-dasharray="2, 2" />
+ </svg>
+ </td>
+ <td>Control dependency edge</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <path marker-start="url(#ref-arrowhead-legend)"
+ marker-end="url(#arrowhead-legend)" stroke="#bbb" d="M2 9 l 23 0"
+ stroke-linecap="round" />
+ </svg>
+ </td>
+ <td>Reference edge</td>
+ </tr>
+ </table>
+ </div>
+ </div>
+</template>
+<script>
+(function() { // Private scope.
+Polymer({
+ is: 'tf-graph-controls',
+ ready: function() {
+ // Set the url from which the summary icon svg is loaded.
+ d3.select(this.$['summary-icon'])
+ .attr('xlink:href', this.resolveUrl('../../lib/svg/summary-icon.svg'));
+ },
+ properties: {
+ // Public API.
+ hasStats: {
+ type: Boolean
+ },
+ colorBy: {
+ type: String,
+ notify: true,
+ computed: '_getColorBy(_colorByIndex)'
+ },
+ colorByParams: Object,
+ datasets: {
+ type: Array,
+ observer: '_datasetsChanged'
+ },
+ selectedDataset: {
+ type: Number,
+ notify: true,
+ value: 0,
+ },
+ selectedFile: {
+ type: Object,
+ notify: true
+ },
+ // Private API.
+ _colorByIndex: {
+ type: Number,
+ value: 0 // Defaults to 'structure'.
+ },
+ _currentGradientParams: {
+ type: Object,
+ computed: '_getCurrentGradientParams(colorByParams, colorBy)'
+ }
+ },
+ _getColorBy: function(colorByIndex) {
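+ // The index order must match the order of the paper-item entries in the
+ // color dropdown above (Structure, Device, Compute time, Memory).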
+ return ["structure", "device", "compute_time", "memory"][colorByIndex];
+ },
+ _getBackgroundColor: function(color) {
+ return 'background-color:' + color;
+ },
+ fit: function() {
+ document.querySelector('#scene').fit();
+ },
+ _isGradientColoring: function(colorBy) {
+ return ["compute_time", "memory"].indexOf(colorBy) !== -1;
+ },
+ _equals: function(a, b) {
+ return a === b;
+ },
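+ /**
+ * Computes the min/max values and start/end colors shown alongside the
+ * gradient legend, converting memory and compute time values into human
+ * readable units.
+ */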
+ _getCurrentGradientParams: function(colorByParams, colorBy) {
+ if (!this._isGradientColoring(colorBy)) {
+ return;
+ }
+ var params = colorByParams[colorBy];
+ var minValue = params.minValue;
+ var maxValue = params.maxValue;
+ if (colorBy === 'memory') {
+ minValue = convertToHumanReadable(minValue, MEMORY_UNITS);
+ maxValue = convertToHumanReadable(maxValue, MEMORY_UNITS);
+ } else if (colorBy === 'compute_time') {
+ minValue = convertToHumanReadable(minValue, TIME_UNITS);
+ maxValue = convertToHumanReadable(maxValue, TIME_UNITS);
+ }
+ return {
+ minValue: minValue,
+ maxValue: maxValue,
+ startColor: params.startColor,
+ endColor: params.endColor
+ };
+ },
+ _updateFileInput: function(e) {
+ this.set('selectedFile', e);
+ },
+ _datasetsChanged: function(newDatasets, oldDatasets) {
+ if (oldDatasets != null || this.selected == null) {
+ // Select the first dataset by default.
+ this.set('selectedDataset', 0);
+ }
+ },
+ _getFile: function() {
+ this.$.file.click();
+ }
+});
+
+// Private methods.
+var MEMORY_UNITS = [
+ // Atomic unit.
+ {symbol: 'B'},
+ // numUnits specifies how many previous units this unit contains.
+ {symbol: 'KB', numUnits: 1024},
+ {symbol: 'MB', numUnits: 1024},
+ {symbol: 'GB', numUnits: 1024},
+ {symbol: 'TB', numUnits: 1024},
+ {symbol: 'PB', numUnits: 1024}
+];
+var TIME_UNITS = [
+ // Atomic unit. Finest granularity in TensorFlow stat collection.
+ {symbol: 'µs'},
+ // numUnits specifies how many previous units this unit contains.
+ {symbol: 'ms', numUnits: 1000},
+ {symbol: 's', numUnits: 1000},
+ {symbol: 'min', numUnits: 60},
+ {symbol: 'hr', numUnits: 60},
+ {symbol: 'days', numUnits: 24}
+];
+
+/**
+ * Returns the human readable version of the unit.
+ * (e.g. 1.35 GB, 23 MB, 34 ms, 6.53 min etc).
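+ * For example, convertToHumanReadable(1500000, TIME_UNITS) returns "1.5 s"
+ * (1,500,000 µs -> 1,500 ms -> 1.5 s).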
+ */
+function convertToHumanReadable(value, units, unitIndex) {
+ unitIndex = unitIndex == null ? 0 : unitIndex;
+ if (unitIndex + 1 < units.length && value >= units[unitIndex + 1].numUnits) {
+ return convertToHumanReadable(value / units[unitIndex + 1].numUnits,
+ units, unitIndex + 1);
+ }
+ // toPrecision() has the tendency to return a number in scientific
+ // notation and (number - 0) brings it back to normal notation.
+ return (value.toPrecision(3) - 0) + ' ' + units[unitIndex].symbol;
+}
+})(); // Closing private scope.
+</script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-icon.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-icon.html
new file mode 100644
index 0000000000..fafaa3b954
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-icon.html
@@ -0,0 +1,164 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+
+<dom-module id="tf-graph-icon">
+ <template>
+ <template is="dom-if" if="[[_isType(node, type, 'OP')]]">
+ <template is="dom-if" if="[[_isConst(node, const)]]">
+ <svg height$="[[height]]"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 10 10">
+ <circle fill="white" stroke="#848484" cx="5" cy="5" r="3" />
+ </svg>
+ </template>
+ <template is="dom-if" if="[[_isSummary(node, summary)]]">
+ <img height$="[[height]]"
+ src="[[resolveUrl('../../lib/svg/summary-icon.svg')]]" />
+ </template>
+ <template is="dom-if" if="[[_isRegularOp(node, const, summary)]]">
+ <svg height$="[[height]]"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 16 8">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink"
+ xlink:href="#op-node-stamp"
+ fill="white"
+ stroke="#ccc"
+ x="8" y="4" />
+ </svg>
+ </template>
+ </template>
+ <template is="dom-if" if="[[_isType(node, type, 'META')]]">
+ <svg height$="[[height]]"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 37 16">
+ <rect x="1" y="1"
+ fill="#d9d9d9" stroke="#ccc" stroke-width="2px"
+ height="14" width="35"
+ rx="5" ry="5" />
+ </svg>
+ </template>
+ <template is="dom-if" if="[[_isType(node, type, 'SERIES')]]">
+ <template is="dom-if" if="[[_isVertical(node, vertical)]]">
+ <svg height$="[[height]]"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 16 15">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink"
+ xlink:href="#op-series-vertical-stamp"
+ fill="white"
+ stroke="#ccc"
+ x="0" y="2" />
+ </svg>
+ </template>
+ <template is="dom-if" if="[[!_isVertical(node, vertical)]]">
+ <svg height$="[[height]]"
+ preserveAspectRatio="xMinYMid meet" viewBox="0 0 24 10">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink"
+ xlink:href="#op-series-horizontal-stamp"
+ fill="white"
+ stroke="#ccc"
+ x="0" y="1" />
+ </svg>
+ </template>
+ </template>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-graph-icon',
+
+ properties: {
+ /**
+ * Node to represent with an icon. Optional, but if specified, its
+ * properties override those defined in the type, vertical, const and
+ * summary properties.
+ * @type {tf.graph.Node}
+ */
+ node: {
+ type: Object,
+ value: null
+ },
+
+ /** Type of node to draw. */
+ type: {
+ type: String,
+ value: null
+ },
+
+ /** Direction for series (ignored for other types). */
+ vertical: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Whether the op is Const (ignored for non-ops). */
+ const: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Whether the op is a Summary (ignored for non-ops). */
+ summary: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Height of the SVG element in pixels, used for scaling. */
+ height: {
+ type: Number,
+ value: 20
+ }
+ },
+
+ /**
+ * Test whether the specified node's type, or the literal type string,
+ * matches a particular target type.
+ */
+ _isType: function(inputNode, inputType, targetType) {
+ if (inputNode) {
+ return tf.graph.NodeType[inputNode.type] === targetType;
+ }
+ return inputType === targetType;
+ },
+
+ /**
+ * Test whether the specified node should be represented as a vertical
+ * series. Defaults to the value of the vertical property if node is
+ * not specified.
+ */
+ _isVertical: function(inputNode, inputVertical) {
+ if (inputNode) {
+ return inputNode.hasNonControlEdges;
+ }
+ return !!inputVertical;
+ },
+
+ /**
+ * Test whether the specified node is a constant. Defaults to the value
+ * of the const property if node is not specified.
+ */
+ _isConst: function(inputNode, inputConst) {
+ if (inputNode) {
+ return inputNode.op === 'Const';
+ }
+ return !!inputConst;
+ },
+
+ /**
+ * Test whether the specified node is a summary. Defaults to the value
+ * of the summary property if node is not specified.
+ */
+ _isSummary: function(inputNode, inputSummary) {
+ if (inputNode) {
+ return this._isType(inputNode, null, 'OP') &&
+ inputNode.op.substr(-7) === 'Summary';
+ }
+ return !!inputSummary;
+ },
+
+ /**
+ * Test whether the op node is a regular non-summary non-const node.
+ */
+ _isRegularOp: function(inputNode, inputConst, inputSummary) {
+ return !this._isConst(inputNode, inputConst) &&
+ !this._isSummary(inputNode, inputSummary);
+ }
+ });
+ })();
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-minimap.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-minimap.html
new file mode 100644
index 0000000000..2b6beeaded
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-minimap.html
@@ -0,0 +1,69 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<script src="../tf-graph-common/lib/scene/minimap.js"></script>
+<dom-module id="tf-graph-minimap">
+<template>
+<style>
+:host {
+ background-color:white;
+ transition: opacity .3s linear;
+ pointer-events: auto;
+}
+
+:host.hidden {
+ opacity: 0;
+ pointer-events: none;
+}
+
+canvas {
+ border: 1px solid #999;
+}
+
+rect {
+ fill: white;
+ stroke: #111111;
+ stroke-width: 1px;
+ fill-opacity: 0;
+ filter: url(#minimapDropShadow);
+ cursor: move;
+}
+
+svg {
+ position: absolute;
+}
+</style>
+<svg>
+ <defs>
+ <filter id="minimapDropShadow" x="-20%" y="-20%" width="150%" height="150%">
+ <feOffset result="offOut" in="SourceGraphic" dx="1" dy="1"></feOffset>
+ <feColorMatrix result="matrixOut" in="offOut" type="matrix" values="0.1 0 0 0 0 0 0.1 0 0 0 0 0 0.1 0 0 0 0 0 0.5 0"></feColorMatrix>
+ <feGaussianBlur result="blurOut" in="matrixOut" stdDeviation="2"></feGaussianBlur>
+ <feBlend in="SourceGraphic" in2="blurOut" mode="normal"></feBlend>
+ </filter>
+ </defs>
+ <rect></rect>
+</svg>
+<canvas class="first"></canvas>
+<!-- Additional canvas to use as buffer to avoid flickering between updates -->
+<canvas class="second"></canvas>
+</template>
+<script>
+Polymer({
+ is: 'tf-graph-minimap',
+
+ /**
+ * Initializes the minimap and returns a minimap object to notify when
+ * things update.
+ *
+ * @param svg The main svg element.
+ * @param zoomG The svg group used for panning and zooming the main svg.
+ * @param mainZoom The main zoom behavior.
+ * @param maxWAndH The maximum width/height for the minimap.
+ * @param labelPadding Padding in pixels due to the main graph labels.
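+ *
+ * Example (as used by tf-graph-scene):
+ * this.minimap = this.$.minimap.init(this.$.svg, this.$.root, this._zoom,
+ *     tf.graph.layout.PARAMS.minimap.size,
+ *     tf.graph.layout.PARAMS.subscene.meta.labelHeight);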
+ */
+ init: function(svg, zoomG, mainZoom, maxWAndH, labelPadding) {
+ return new tf.scene.Minimap(svg, zoomG, mainZoom, this, maxWAndH,
+ labelPadding);
+ }
+});
+</script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-params.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-params.html
new file mode 100644
index 0000000000..576816ddd0
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-params.html
@@ -0,0 +1,113 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<!--
+Module for adjusting render graph building parameters.
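+
+Used declaratively, as in tf-graph.html:
+
+  <tf-graph-params id="graphParams"></tf-graph-params>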
+-->
+<dom-module id="tf-graph-params">
+</dom-module>
+<script>
+ Polymer({
+
+ is: 'tf-graph-params',
+
+ properties: {
+ // PARAMETERS
+
+ enableExtraction: {
+ type: Boolean,
+ value: true
+ },
+
+ /** Maximum in-degree that a node can have without being considered a
+ * high in-degree node. */
+ maxInDegree: {
+ type: Number,
+ value: 4
+ },
+ /** Maximum out-degree that a node can have without being considered a
+ * high out-degree node. */
+ maxOutDegree: {
+ type: Number,
+ value: 4
+ },
+ /** Maximum number of control edges a node can have before its control
+ * edges are no longer displayed. */
+ maxControlDegree: {
+ type: Number,
+ value: 4
+ },
+
+ /**
+ * Type patterns for predefined out-extract nodes, which are
+ * sink-like nodes that will be extracted from the main graph.
+ */
+ outExtractTypes: {
+ type: Array,
+ value: function() {
+ return [
+ 'NoOp' // for "sgd", "momentum" group
+ ];
+ }
+ },
+
+ /**
+ * Type patterns for predefined in-extract nodes, which are
+ * source-like nodes that will be extracted from the main graph.
+ */
+ inExtractTypes: {
+ type: Array,
+ value: function() {
+ return ['Variable'];
+ }
+ },
+
+ /**
+ * When removing edges from a high degree node, remove all of its edges if
+ * detachAllEdgesForHighDegree is true. Otherwise remove all in-edges if
+ * the node has high in-degree, or all out-edges if the node has high
+ * out-degree.
+ */
+ detachAllEdgesForHighDegree: {
+ type: Boolean,
+ value: false
+ },
+
+ /**
+ * After extracting high in/out degree nodes and predefined
+ * source-like/sink-like nodes, extract isolated nodes to the side
+ * if extractIsolatedNodesWithAnnotationsOnOneSide is true.
+ */
+ extractIsolatedNodesWithAnnotationsOnOneSide: {
+ type: Boolean,
+ value: true
+ },
+
+ /**
+ * Whether to draw bridge paths inside of expanded group nodes.
+ */
+ enableBridgegraph: {
+ type: Boolean,
+ value: true
+ },
+
+ /**
+ * Colors for the minimum and maximum values whenever we have a gradient
+ * scale.
+ */
+ minMaxColors: {
+ type: Array,
+ value: function() {
+ return ["#fff5f0", "#fb6a4a"];
+ }
+ },
+
+ /**
+ * Maximum number of annotations to be displayed on a node before an
+ * ellipsis is used.
+ */
+ maxAnnotations: {
+ type: Number,
+ value: 5
+ }
+ }
+ });
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-scene.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-scene.html
new file mode 100644
index 0000000000..34c2d3dc3d
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-scene.html
@@ -0,0 +1,475 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="tf-graph-style.html">
+<link rel="import" href="tf-graph-minimap.html">
+<script src="../tf-graph-common/lib/layout.js"></script>
+<script src="../../bower_components/dagre/dist/dagre.core.js"></script>
+<!--
+  A module that takes a graph hierarchy as input and produces
+  an SVG DOM using dagre and d3.
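+
+  Example usage (mirroring the bindings in tf-graph.html):
+
+    <tf-graph-scene id="scene"
+      graph-hierarchy="[[_renderHierarchy]]"
+      selected-node="[[selectedNode]]"
+      color-by="[[colorBy]]"
+      progress="[[progress]]"
+    ></tf-graph-scene>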
+-->
+<dom-module id="tf-graph-scene">
+<template>
+<style include="tf-graph-style">
+ :host {
+ font-size: 20px;
+ }
+ .titleContainer {
+ position: relative;
+ }
+ .title {
+ position: absolute;
+ }
+ .auxTitle {
+ position: absolute;
+ }
+ #minimap {
+ position: absolute;
+ right: 20px;
+ bottom: 20px;
+ }
+</style>
+<div class="titleContainer">
+ <div id="title" class="title">Main Graph</div>
+ <div id="auxTitle" class="auxTitle">Auxiliary nodes</div>
+</div>
+<svg id="svg">
+ <defs>
+ <!-- Arrow head for edge paths. -->
+ <marker id="arrowhead" markerWidth="10" markerHeight="10"
+ refX="9" refY="5" orient="auto">
+ <path d="M 0,0 L 10,5 L 0,10 C 3,7 3,3 0,0"/>
+ </marker>
+ <marker id="ref-arrowhead" markerWidth="10" markerHeight="10"
+ refX="1" refY="5" orient="auto">
+ <path d="M 10,0 L 0,5 L 10,10 C 7,7 7,3 10,0"/>
+ </marker>
+ <!-- Arrow head for annotation edge paths. -->
+ <marker id="annotation-arrowhead" markerWidth="5" markerHeight="5"
+ refX="5" refY="2.5" orient="auto">
+ <path d="M 0,0 L 5,2.5 L 0,5 L 0,0"/>
+ </marker>
+ <marker id="ref-annotation-arrowhead" markerWidth="5" markerHeight="5"
+ refX="0" refY="2.5" orient="auto">
+ <path d="M 5,0 L 0,2.5 L 5,5 L 5,0"/>
+ </marker>
+ <!-- Template for an Op node ellipse. -->
+ <ellipse id="op-node-stamp"
+ rx="7.5" ry="3" stroke="inherit" fill="inherit" />
+ <!-- Template for an Op node annotation ellipse (smaller). -->
+ <ellipse id="op-node-annotation-stamp"
+ rx="5" ry="2" stroke="inherit" fill="inherit" />
+ <!-- Vertically stacked series of Op nodes when unexpanded. -->
+ <g id="op-series-vertical-stamp">
+ <use xlink:href="#op-node-stamp" x="8" y="9" />
+ <use xlink:href="#op-node-stamp" x="8" y="6" />
+ <use xlink:href="#op-node-stamp" x="8" y="3" />
+ </g>
+ <!-- Horizontally stacked series of Op nodes when unexpanded. -->
+ <g id="op-series-horizontal-stamp">
+ <use xlink:href="#op-node-stamp" x="16" y="4" />
+ <use xlink:href="#op-node-stamp" x="12" y="4" />
+ <use xlink:href="#op-node-stamp" x="8" y="4" />
+ </g>
+ <!-- Horizontally stacked series of Op nodes for annotation. -->
+ <g id="op-series-annotation-stamp">
+ <use xlink:href="#op-node-annotation-stamp" x="9" y="2" />
+ <use xlink:href="#op-node-annotation-stamp" x="7" y="2" />
+ <use xlink:href="#op-node-annotation-stamp" x="5" y="2" />
+ </g>
+ <!--
+ Where the linearGradient for each node is stored. Used when coloring
+ by proportions of devices.
+ -->
+ <g id="linearGradients"></g>
+ </defs>
+ <!-- Make a large rectangle that fills the svg space so that
+ zoom events get captured on safari -->
+ <rect fill="white" width="10000" height="10000"></rect>
+ <g id="root"></g>
+</svg>
+<tf-graph-minimap id="minimap"></tf-graph-minimap>
+</template>
+</dom-module>
+<script>
+Polymer({
+ is: 'tf-graph-scene',
+ properties: {
+ graphHierarchy: Object,
+ name: String,
+ colorBy: {
+ type: String,
+ observer: '_colorByChanged'
+ },
+ /** @type {d3_zoom} d3 zoom object */
+ _zoom: Object,
+ highlightedNode: {
+ type: String,
+ observer: '_highlightedNodeChanged'
+ },
+ selectedNode: {
+ type: String,
+ observer: '_selectedNodeChanged'
+ },
+ /** Keeps track of whether the graph has been zoomed/panned since loading */
+ _zoomed: {
+ type: Boolean,
+ observer: '_onZoomChanged',
+ value: false
+ },
+ /** Keeps track of the starting coordinates of a graph zoom/pan */
+ _zoomStartCoords: {
+ type: Array,
+ value: null
+ },
+ /** Keeps track of the current coordinates of a graph zoom/pan */
+ _zoomCoords: {
+ type: Array,
+ value: null
+ },
+ /** Maximum distance of a zoom event for it to be interpreted as a click */
+ _maxZoomDistanceForClick: {
+ type: Number,
+ value: 20
+ },
+ /**
+ * @type {d3.scale.ordinal}
+ * Scale mapping from template name to a number between 0 and N-1
+ * where N is the number of different template names.
+ */
+ templateIndex: Object,
+ /**
+ * @type {tf.scene.Minimap}
+ * A minimap object to notify for zoom events.
+ */
+ minimap: Object,
+ /*
+ * Dictionary for easily stylizing nodes when state changes.
+ * _nodeGroupIndex[nodeName] = d3_selection of the nodeGroup
+ */
+ _nodeGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /*
+ * Dictionary for easily stylizing annotation nodes when state changes.
+ * _annotationGroupIndex[nodeName][hostNodeName] =
+ * d3_selection of the annotationGroup
+ */
+ _annotationGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /*
+ * Dictionary for easily stylizing edges when state changes.
+ * _edgeGroupIndex[edgeName] = d3_selection of the edgeGroup
+ */
+ _edgeGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /**
+ * Max font size for metanode label strings.
+ */
+ maxMetanodeLabelLengthFontSize: {
+ type: Number,
+ value: 9
+ },
+ /**
+ * Min font size for metanode label strings.
+ */
+ minMetanodeLabelLengthFontSize: {
+ type: Number,
+ value: 6
+ },
+ /**
+ * Metanode label strings longer than this are given smaller fonts.
+ */
+ maxMetanodeLabelLengthLargeFont: {
+ type: Number,
+ value: 11
+ },
+ /**
+ * Metanode label strings longer than this are truncated with ellipses.
+ */
+ maxMetanodeLabelLength: {
+ type: Number,
+ value: 18
+ },
+ progress: Object
+ },
+ observers: [
+ '_buildAndFit(graphHierarchy)'
+ ],
+ getNode: function(nodeName) {
+ return this.graphHierarchy.getRenderNodeByName(nodeName);
+ },
+ isNodeExpanded: function(node) {
+ return node.expanded;
+ },
+ setNodeExpanded: function(renderNode) {
+ this._build(this.graphHierarchy);
+ },
+ /**
+ * Resets the state of the component. Called whenever the whole graph
+ * (dataset) changes.
+ */
+ _resetState: function() {
+ // Reset the state of the component.
+ this._nodeGroupIndex = {};
+ this._annotationGroupIndex = {};
+ this._edgeGroupIndex = {};
+ this._updateLabels(false);
+ // Remove all svg elements under the 'root' svg group.
+ d3.select(this.$.svg).select('#root').selectAll('*').remove();
+ // And the defs.
+ d3.select(this.$.svg).select('defs #linearGradients')
+ .selectAll('*').remove();
+ },
+ /** Main method for building the scene */
+ _build: function(graphHierarchy) {
+ if (!graphHierarchy) { return; } // Handle a not-yet-set graph hierarchy.
+ var templateNames = d3.keys(graphHierarchy.hierarchy.templates);
+
+ this.templateIndex = d3.scale.ordinal()
+ .domain(templateNames)
+ .range(d3.range(0, templateNames.length));
+ tf.time('tf-graph-scene (layout):', function() {
+ // layout the scene for this meta / series node
+ tf.graph.layout.scene(graphHierarchy.root, this);
+ }.bind(this));
+
+ tf.time('tf-graph-scene (build scene):', function() {
+ tf.graph.scene.buildGroup(d3.select(this.$.root), graphHierarchy.root, this);
+ tf.graph.scene.addGraphClickListener(this.$.svg, this);
+ }.bind(this));
+ // Update the minimap again when the graph is done animating.
+ setTimeout(function() {
+ this.minimap.update();
+ }.bind(this), tf.graph.layout.PARAMS.animation.duration);
+ },
+ ready: function() {
+ this._zoom = d3.behavior.zoom()
+ .on('zoomend', function() {
+ if (this._zoomStartCoords) {
+ // Calculate the total distance dragged during the zoom event.
+ // If it is sufficiently small, then fire an event indicating
+ // that zooming has ended. Otherwise wait to fire the zoom end
+ // event, so that a mouse click registered as part of this zooming
+ // is ignored (as this mouse click was part of a zooming, and should
+ // not be used to indicate an actual click on the graph).
+ var dragDistance = Math.sqrt(
+ Math.pow(this._zoomStartCoords[0] - this._zoomCoords[0], 2) +
+ Math.pow(this._zoomStartCoords[1] - this._zoomCoords[1], 2));
+ if (dragDistance < this._maxZoomDistanceForClick) {
+ this._fireEnableClick();
+ } else {
+ setTimeout(this._fireEnableClick.bind(this), 50);
+ }
+ }
+ this._zoomStartCoords = null;
+ }.bind(this))
+ .on('zoom', function() {
+ // Store the coordinates of the zoom event
+ this._zoomCoords = d3.event.translate;
+
+ // If this is the first zoom event after a zoom-end, then
+ // store the coordinates as the start coordinates as well,
+ // and fire an event to indicate that zooming has started.
+ // This doesn't use the zoomstart event, as d3 sends this
+ // event on mouse-down, even if there has been no dragging
+ // done to translate the graph around.
+ if (!this._zoomStartCoords) {
+ this._zoomStartCoords = this._zoomCoords.slice();
+ this.fire('disable-click');
+ }
+ this._zoomed = true;
+ d3.select(this.$.root).attr('transform',
+ 'translate(' + d3.event.translate + ')' +
+ 'scale(' + d3.event.scale + ')');
+ // Notify the minimap.
+ this.minimap.zoom(d3.event.translate, d3.event.scale);
+ }.bind(this));
+ d3.select(this.$.svg).call(this._zoom)
+ .on('dblclick.zoom', null);
+ d3.select(window).on('resize', function() {
+ // Notify the minimap that the user's window was resized.
+ // The minimap will figure out the new dimensions of the main svg
+ // and will use the existing translate and scale params.
+ this.minimap.zoom();
+ }.bind(this));
+ // Initialize the minimap.
+ this.minimap = this.$.minimap.init(this.$.svg, this.$.root, this._zoom,
+ tf.graph.layout.PARAMS.minimap.size,
+ tf.graph.layout.PARAMS.subscene.meta.labelHeight);
+ },
+ _buildAndFit: function(graphHierarchy) {
+ this._resetState();
+ this._build(graphHierarchy);
+ // Fit to screen after the graph is done animating.
+ setTimeout(this.fit.bind(this), tf.graph.layout.PARAMS.animation.duration);
+ },
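+ /**
+ * Shows or hides the "Main Graph" and "Auxiliary nodes" titles, positioning
+ * them above the core graph and the extracted (auxiliary) node area
+ * respectively. Labels are only shown once loading has completed.
+ */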
+ _updateLabels: function(showLabels) {
+ var titleStyle = this.getElementsByClassName('title')[0].style;
+ var auxTitleStyle = this.getElementsByClassName('auxTitle')[0].style;
+ var core = this.getElementsByClassName(tf.graph.scene.Class.Scene.CORE)[0];
+ // Only show labels if the graph is fully loaded.
+ if (showLabels && core && this.progress && this.progress.value === 100) {
+ var aux =
+ this.getElementsByClassName(tf.graph.scene.Class.Scene.INEXTRACT)[0] ||
+ this.getElementsByClassName(tf.graph.scene.Class.Scene.OUTEXTRACT)[0];
+ var coreX = core.getCTM().e;
+ var auxX = aux ? aux.getCTM().e : null;
+ titleStyle.display = 'inline';
+ titleStyle.left = coreX + 'px';
+ if (auxX !== null && auxX !== coreX) {
+ auxTitleStyle.display = 'inline';
+ auxTitleStyle.left = auxX + 'px';
+ } else {
+ auxTitleStyle.display = 'none';
+ }
+ } else {
+ titleStyle.display='none';
+ auxTitleStyle.display = 'none';
+ }
+ },
+
+ /**
+ * Called whenever the user changed the 'color by' option in the
+ * UI controls.
+ */
+ _colorByChanged: function() {
+ // We iterate through each svg node and update its state.
+ _.each(this._nodeGroupIndex, function(nodeGroup, nodeName) {
+ this._updateNodeState(nodeName);
+ }, this);
+ // Notify also the minimap.
+ this.minimap.update();
+ },
+ fit: function() {
+ tf.graph.scene.fit(this.$.svg, this.$.root, this._zoom, function() {
+ this._zoomed = false;
+ }.bind(this));
+ },
+ isNodeSelected: function(n) {
+ return n === this.selectedNode;
+ },
+ isNodeHighlighted: function(n) {
+ return n === this.highlightedNode;
+ },
+ addAnnotationGroup: function(a, d, selection) {
+ var an = a.node.name;
+ this._annotationGroupIndex[an] = this._annotationGroupIndex[an] || {};
+ this._annotationGroupIndex[an][d.node.name] = selection;
+ },
+ getAnnotationGroupsIndex: function(a) {
+ return this._annotationGroupIndex[a];
+ },
+ removeAnnotationGroup: function(a, d) {
+ delete this._annotationGroupIndex[a.node.name][d.node.name];
+ },
+ addNodeGroup: function(n, selection) {
+ this._nodeGroupIndex[n] = selection;
+ },
+ getNodeGroup: function(n) {
+ return this._nodeGroupIndex[n];
+ },
+ removeNodeGroup: function(n) {
+ delete this._nodeGroupIndex[n];
+ },
+ addEdgeGroup: function(e, selection) {
+ this._edgeGroupIndex[e] = selection;
+ },
+ getEdgeGroup: function(e) {
+ return this._edgeGroupIndex[e];
+ },
+ /**
+ * Update node and annotation node of the given name.
+ * @param {String} n node name
+ */
+ _updateNodeState: function(n) {
+ var node = this.getNode(n);
+ var nodeGroup = this.getNodeGroup(n);
+
+ if (nodeGroup) {
+ tf.graph.scene.node.stylize(nodeGroup, node, this);
+ }
+
+ var annotationGroupIndex = this.getAnnotationGroupsIndex(n);
+ _.each(annotationGroupIndex, function(aGroup, hostName) {
+ tf.graph.scene.node.stylize(aGroup, node, this,
+ tf.graph.scene.Class.Annotation.NODE);
+ }, this);
+ },
+
+ _selectedNodeChanged: function(selectedNode, oldSelectedNode) {
+ if (selectedNode === oldSelectedNode) {
+ return;
+ }
+
+ if (selectedNode) {
+ this._updateNodeState(selectedNode);
+ }
+ if (oldSelectedNode) {
+ this._updateNodeState(oldSelectedNode);
+ }
+
+ if (!selectedNode) {
+ return;
+ }
+ // Update the minimap to reflect the highlighted (selected) node.
+ this.minimap.update();
+ var node = this.graphHierarchy.hierarchy.node(selectedNode);
+ var nodeParents = [];
+ // Create list of all metanode parents of the selected node.
+ while (node.parentNode != null
+ && node.parentNode.name != tf.graph.ROOT_NAME) {
+ node = node.parentNode;
+ nodeParents.push(node.name);
+ }
+ // Ensure each parent metanode is built and expanded.
+ var topParentNodeToBeExpanded;
+ _.forEachRight(nodeParents, function(parentName) {
+ this.graphHierarchy.buildSubhierarchy(parentName);
+ var renderNode = this.graphHierarchy.getRenderNodeByName(parentName);
+ if (renderNode.node.isGroupNode && !renderNode.expanded) {
+ renderNode.expanded = true;
+ if (!topParentNodeToBeExpanded) {
+ topParentNodeToBeExpanded = renderNode;
+ }
+ }
+ }, this);
+ // If any expansion was needed to display this selected node, then
+ // inform the scene of the top-most expansion.
+ if (topParentNodeToBeExpanded) {
+ this.setNodeExpanded(topParentNodeToBeExpanded);
+ this._zoomed = true;
+ }
+
+ if (tf.graph.scene.panToNode(selectedNode, this.$.svg, this.$.root,
+ this._zoom)) {
+ this._zoomed = true;
+ }
+ },
+ _highlightedNodeChanged: function(highlightedNode, oldHighlightedNode) {
+ if (highlightedNode === oldHighlightedNode) {
+ return;
+ }
+
+ if (highlightedNode) {
+ this._updateNodeState(highlightedNode);
+ }
+ if (oldHighlightedNode) {
+ this._updateNodeState(oldHighlightedNode);
+ }
+ },
+ _onZoomChanged: function() {
+ this._updateLabels(!this._zoomed);
+ },
+ _fireEnableClick: function() {
+ this.fire('enable-click');
+ },
+});
+</script>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph-style.html b/tensorflow/tensorboard/components/tf-graph/tf-graph-style.html
new file mode 100644
index 0000000000..3e6f7f2112
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph-style.html
@@ -0,0 +1,339 @@
+<dom-module id="tf-graph-style">
+<template>
+<style>
+:host {
+ display: flex;
+ width: 100%;
+}
+
+::content #svg {
+ overflow: hidden;
+ flex: 1;
+}
+
+::content #hidden {
+ position: fixed;
+ top: 0px;
+ visibility: hidden;
+}
+
+
+/* --- Node and annotation-node for Metanode --- */
+
+::content .meta > .nodeshape > rect,
+::content .meta > .annotation-node > rect {
+ cursor: pointer;
+ fill: hsl(0, 0%, 70%);
+}
+
+
+::content .node.meta.highlighted > .nodeshape > rect,
+::content .node.meta.highlighted > .annotation-node > rect {
+ stroke-width: 2;
+}
+
+::content .annotation.meta.highlighted > .nodeshape > rect,
+::content .annotation.meta.highlighted > .annotation-node > rect {
+ stroke-width: 1;
+}
+
+::content .meta.selected > .nodeshape > rect,
+::content .meta.selected > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .node.meta.selected.expanded > .nodeshape > rect,
+::content .node.meta.selected.expanded > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 3;
+}
+
+::content .annotation.meta.selected > .nodeshape > rect,
+::content .annotation.meta.selected > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .node.meta.selected.expanded.highlighted > .nodeshape > rect,
+::content .node.meta.selected.expanded.highlighted > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 4;
+}
+
+
+/* --- Op Node --- */
+
+::content .op > .nodeshape > ellipse,
+::content .op > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: #fff;
+ stroke: #ccc;
+}
+
+::content .op.selected > .nodeshape > ellipse,
+::content .op.selected > .annotation-node > ellipse {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .op.highlighted > .nodeshape > ellipse,
+::content .op.highlighted > .annotation-node > ellipse {
+ stroke-width: 2;
+}
+
+/* --- Series Node --- */
+
+/* By default, don't show the series background <rect>. */
+::content .series > .nodeshape > rect {
+ fill: hsl(0, 0%, 70%);
+ fill-opacity: 0;
+ stroke-dasharray: 5, 5;
+ stroke-opacity: 0;
+ cursor: pointer;
+}
+
+/* Once expanded, show the series background <rect> and hide the <use>. */
+::content .series.expanded > .nodeshape > rect {
+ fill-opacity: 0.15;
+ stroke: hsl(0, 0%, 70%);
+ stroke-opacity: 1;
+}
+::content .series.expanded > .nodeshape > use {
+ visibility: hidden;
+}
+
+/**
+ * TODO(jimbo): Simplify this by applying a stable class name to all <g>
+ * elements that currently have either the nodeshape or annotation-node classes.
+ */
+::content .series > .nodeshape > use ,
+::content .series > .annotation-node > use {
+ stroke: #ccc;
+}
+::content .series.highlighted > .nodeshape > use ,
+::content .series.highlighted > .annotation-node > use {
+ stroke-width: 2;
+}
+::content .series.selected > .nodeshape > use ,
+::content .series.selected > .annotation-node > use {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .series.selected > .nodeshape > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .annotation.series.selected > .annotation-node > use {
+ stroke: red;
+ stroke-width: 2;
+}
+
+/* --- Bridge Node --- */
+::content .bridge > .nodeshape > rect {
+ stroke: #f0f;
+ opacity: 0.2;
+ display: none;
+}
+
+/* --- Structural Elements --- */
+::content .edge > path.edgeline.structural {
+ stroke: #f0f;
+ opacity: 0.2;
+ display: none;
+}
+
+/* --- Series Nodes --- */
+
+/* Hide the rect for a series' annotation. */
+::content .series > .annotation-node > rect {
+ display: none;
+}
+
+/* --- Node label --- */
+
+
+::content .node > text.nodelabel {
+ cursor: pointer;
+ fill: #444;
+}
+
+::content .meta.expanded > text.nodelabel {
+ font-size: 9px;
+}
+
+::content .series > text.nodelabel {
+ font-size: 8px;
+}
+
+::content .op > text.nodelabel {
+ font-size: 6px;
+}
+
+::content .bridge > text.nodelabel {
+ display: none;
+}
+
+::content .node.meta.expanded > text.nodelabel {
+ cursor: default;
+}
+
+::content .annotation.meta.highlighted > text.annotation-label {
+ fill: #50A3F7;
+}
+
+::content .annotation.meta.selected > text.annotation-label {
+ fill: #4285F4;
+}
+
+/* --- Annotation --- */
+
+/* Only applied to annotations that are not summary or constant
+(.summary and .constant get overridden below). */
+::content .annotation > .annotation-node > * {
+ stroke-width: 0.5;
+ stroke-dasharray: 1, 1;
+}
+
+::content .annotation.summary > .annotation-node > *,
+::content .annotation.constant > .annotation-node > * {
+ stroke-width: 1;
+ stroke-dasharray: none;
+}
+
+::content .annotation > .annotation-edge {
+ fill: none;
+ stroke: #aaa;
+ stroke-width: 0.5;
+ marker-end: url(#annotation-arrowhead);
+}
+
+::content .annotation > .annotation-edge.refline {
+ marker-start: url(#ref-annotation-arrowhead);
+}
+
+::content .annotation > .annotation-control-edge {
+ stroke-dasharray: 1, 1;
+}
+
+::content #annotation-arrowhead {
+ fill: #aaa;
+}
+
+::content #ref-annotation-arrowhead {
+ fill: #aaa;
+}
+
+::content .annotation > .annotation-label {
+ font-size: 5px;
+ cursor: pointer;
+}
+::content .annotation > .annotation-label.annotation-ellipsis {
+ cursor: default;
+}
+
+/* Hide annotations on expanded meta nodes since they're redundant. */
+::content .expanded > .in-annotations,
+::content .expanded > .out-annotations {
+ display: none;
+}
+
+/* --- Annotation: Constant --- */
+
+::content .constant > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: white;
+ stroke: #848484;
+}
+
+::content .constant.selected > .annotation-node > ellipse {
+ fill: white;
+ stroke: red;
+}
+
+::content .constant.highlighted > .annotation-node > ellipse {
+ stroke-width: 1.5;
+}
+
+/* --- Annotation: Summary --- */
+
+::content .summary > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: #DB4437;
+ stroke: #DB4437;
+}
+
+::content .summary.selected > .annotation-node > ellipse {
+ fill: #A52714;
+ stroke: #A52714;
+}
+
+::content .summary.highlighted > .annotation-node > ellipse {
+ stroke-width: 1.5;
+}
+
+/* --- Edge --- */
+
+::content .edge > path.edgeline {
+ fill: none;
+ marker-end: url(#arrowhead);
+ stroke: #bbb;
+ stroke-linecap: round;
+ stroke-width: 0.75;
+}
+
+::content .edge > path.edgeline.refline {
+ marker-start: url(#ref-arrowhead);
+}
+
+::content #arrowhead {
+ fill: #bbb;
+}
+
+::content #ref-arrowhead {
+ fill: #bbb;
+}
+
+::content .edge .control-dep {
+ stroke-dasharray: 2, 2;
+}
+
+/* --- Group node expand/collapse button --- */
+
+/* Hides expand/collapse buttons when a node isn't expanded or highlighted. Using
+ incredibly small opacity so that the bounding box of the <g> parent still takes
+ this container into account even when it isn't visible */
+::content .node:not(.highlighted):not(.expanded) > .nodeshape > .buttoncontainer {
+ opacity: 0.01;
+}
+::content .node.highlighted > .nodeshape > .buttoncontainer {
+ cursor: pointer;
+}
+::content .buttoncircle {
+ fill: #E7811D;
+}
+::content .buttoncircle:hover {
+ fill: #B96717;
+}
+::content .expandbutton,
+::content .collapsebutton {
+ stroke: white;
+}
+/* Do not let the path elements in the button take pointer focus */
+::content .node > .nodeshape > .buttoncontainer > .expandbutton,
+::content .node > .nodeshape > .buttoncontainer > .collapsebutton {
+ pointer-events: none;
+}
+/* Only show the expand button when a node is collapsed and only show the
+ collapse button when a node is expanded. */
+::content .node.expanded > .nodeshape > .buttoncontainer > .expandbutton {
+ display: none;
+}
+::content .node:not(.expanded) > .nodeshape > .buttoncontainer > .collapsebutton {
+ display: none;
+}
+</style>
+</template>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-graph/tf-graph.html b/tensorflow/tensorboard/components/tf-graph/tf-graph.html
new file mode 100644
index 0000000000..905d96e237
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-graph/tf-graph.html
@@ -0,0 +1,221 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/iron-flex-layout/iron-flex-layout.html">
+<link rel="import" href="../../bower_components/iron-icons/iron-icons.html">
+<link rel="import" href="../../bower_components/paper-button/paper-button.html">
+<link rel="import" href="../../bower_components/paper-input/paper-input.html">
+<link rel="import" href="../../bower_components/paper-toggle-button/paper-toggle-button.html">
+<link rel="import" href="../tf-graph-common/tf-graph-common.html">
+<link rel="import" href="tf-graph-scene.html">
+<link rel="import" href="tf-graph-params.html">
+<dom-module id="tf-graph">
+<template>
+<style>
+.container {
+ width: 100%;
+ height: 100%;
+}
+
+.vertical {
+ width:100%;
+ height:100%;
+ @apply(--layout-vertical);
+}
+
+.auto {
+ @apply(--layout-flex-auto);
+ @apply(--layout-vertical);
+}
+
+h2 {
+ text-align: center;
+}
+
+paper-button {
+ text-transform: none;
+}
+</style>
+<div class="container">
+ <tf-graph-params id="graphParams"></tf-graph-params>
+ <div class="vertical">
+ <h2>[[title]]</h2>
+ <tf-graph-scene id="scene" class="auto"
+ graph-hierarchy="[[_renderHierarchy]]"
+ highlighted-node="[[_getVisible(highlightedNode)]]"
+ selected-node="[[selectedNode]]"
+ color-by="[[colorBy]]"
+ name="[[graphName]]"
+ progress="[[progress]]"
+ ></tf-graph-scene>
+ </div>
+</div>
+</template>
+</dom-module>
+
+<script>
+Polymer({
+
+ is: 'tf-graph',
+
+ properties: {
+ graphHierarchy: {
+ type: Object,
+ notify: true,
+ observer: '_graphChanged'
+ },
+ title: String,
+ selectedNode: {
+ type: String,
+ notify: true,
+ },
+ highlightedNode: {
+ type: String,
+ notify: true
+ },
+ /** What to color the nodes by (compute time, memory, device etc.) */
+ colorBy: String,
+ colorByParams: {
+ type: Object,
+ notify: true,
+ readOnly: true, // Produces and doesn't consume.
+ },
+ // internal properties
+ _graphParams: {
+ type: Object,
+ value: function() {
+ return this.$.graphParams;
+ }
+ },
+ _renderDepth: {
+ type: Number,
+ value: 1
+ },
+ _renderHierarchy: {
+ type: Object,
+ readOnly: true,
+ notify: true,
+ computed: '_buildRenderHierarchy(graphHierarchy, _graphParams)'
+ },
+ _allowGraphSelect: {
+ type: Boolean,
+ value: true
+ }
+ },
+ _buildRenderHierarchy: function(graphHierarchy, params) {
+ return tf.time('new tf.graph.render.Hierarchy', function() {
+ if (graphHierarchy.root.type !== tf.graph.NodeType.META) {
+ // The root must be a metanode, but sometimes Polymer's dom-if has not
+ // yet removed the tf-graph element inside <tf-node-info>, and thus
+ // mistakenly passes a non-metanode to this module.
+ return;
+ }
+ var renderGraph = new tf.graph.render.RenderGraphInformation(
+ graphHierarchy, params);
+ // Producing the 'color by' parameters to be consumed
+ // by the tf-graph-controls panel. It contains information about the
+ // min and max values and their respective colors, as well as a list
+ // of devices with their respective colors.
+
+ function getColorParamsFromScale(scale) {
+ return {
+ minValue: scale.domain()[0],
+ maxValue: scale.domain()[1],
+ startColor: scale.range()[0],
+ endColor: scale.range()[1]
+ };
+ }
+
+ this._setColorByParams({
+ compute_time: getColorParamsFromScale(renderGraph.computeTimeScale),
+ memory: getColorParamsFromScale(renderGraph.memoryUsageScale),
+ device: _.map(renderGraph.deviceColorMap.domain(),
+ function(deviceName) {
+ return {
+ device: deviceName,
+ color: renderGraph.deviceColorMap(deviceName)
+ };
+ })
+ });
+ return renderGraph;
+ }.bind(this));
+ },
+ _getVisible: function(name) {
+ if (!name) {
+ return name;
+ }
+ return this._renderHierarchy.getNearestVisibleAncestor(name);
+ },
+ listeners: {
+ 'graph-select': '_graphSelected',
+ 'disable-click': '_disableClick',
+ 'enable-click': '_enableClick',
+ // Nodes
+ 'node-toggle-expand': '_nodeToggleExpand',
+ 'node-select': '_nodeSelected',
+ 'node-highlight': '_nodeHighlighted',
+ 'node-unhighlight': '_nodeUnhighlighted',
+
+ // Annotations
+
+ /* Note: currently highlighting/selecting an annotation node behaves the
+ * same as highlighting/selecting the actual node, so we point to the same
+ * set of event listeners. However, we might redesign this to be a bit
+ * different.
+ */
+ 'annotation-select': '_nodeSelected',
+ 'annotation-highlight': '_nodeHighlighted',
+ 'annotation-unhighlight': '_nodeUnhighlighted',
+ },
+ _graphChanged: function() {
+ // When a new graph is loaded, fire this event so that there is no
+ // info-card being displayed for the previously-loaded graph.
+ this.fire('graph-select');
+ },
+ _graphSelected: function(event) {
+ // Graph selection is not allowed during an active zoom event, as the
+ // click seen during a zoom/pan is part of the zooming and does not
+ // indicate a user desire to click on a specific section of the graph.
+ if (this._allowGraphSelect) {
+ this.set('selectedNode', null);
+ }
+ // Reset this variable as a bug in d3 zoom behavior can cause zoomend
+ // callback not to be called if a right-click happens during a zoom event.
+ this._allowGraphSelect = true;
+ },
+ _disableClick: function(event) {
+ this._allowGraphSelect = false;
+ },
+ _enableClick: function(event) {
+ this._allowGraphSelect = true;
+ },
+ _nodeSelected: function(event) {
+ if (this._allowGraphSelect) {
+ this.set('selectedNode', event.detail.name);
+ }
+ // Reset this variable as a bug in d3 zoom behavior can cause zoomend
+ // callback not to be called if a right-click happens during a zoom event.
+ this._allowGraphSelect = true;
+ },
+ _nodeHighlighted: function(event) {
+ this.set('highlightedNode', event.detail.name);
+ },
+ _nodeUnhighlighted: function(event) {
+ this.set('highlightedNode', null);
+ },
+ _nodeToggleExpand: function(event) {
+ var nodeName = event.detail.name;
+ var renderNode = this._renderHierarchy.getRenderNodeByName(nodeName);
+ // Op nodes are not expandable.
+ if (renderNode.node.type === tf.graph.NodeType.OP) {
+ return;
+ }
+ this._renderHierarchy.buildSubhierarchy(nodeName);
+ renderNode.expanded = !renderNode.expanded;
+ this.querySelector('#scene').setNodeExpanded(renderNode);
+ // Also select the expanded node.
+ this._nodeSelected(event);
+ },
+ not: function(x) {
+ return !x;
+ }
+});
+</script>
diff --git a/tensorflow/tensorboard/components/tf-histogram-dashboard/tf-histogram-dashboard.html b/tensorflow/tensorboard/components/tf-histogram-dashboard/tf-histogram-dashboard.html
new file mode 100644
index 0000000000..8f8f159964
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-histogram-dashboard/tf-histogram-dashboard.html
@@ -0,0 +1,210 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../tf-event-dashboard/tf-data-coordinator.html">
+<link rel="import" href="../tf-event-dashboard/tf-tooltip-coordinator.html">
+<link rel="import" href="../tf-event-dashboard/tf-run-selector.html">
+<link rel="import" href="../tf-event-dashboard/tf-x-type-selector.html">
+<link rel="import" href="../tf-dashboard-common/tf-run-generator.html">
+<link rel="import" href="../tf-event-dashboard/tf-color-scale.html">
+<link rel="import" href="../tf-dashboard-common/tf-url-generator.html">
+<link rel="import" href="../tf-dashboard-common/tf-dashboard-layout.html">
+<link rel="import" href="../tf-dashboard-common/dashboard-style.html">
+<link rel="import" href="../tf-dashboard-common/warning-style.html">
+<link rel="import" href="../tf-categorizer/tf-categorizer.html">
+<link rel="import" href="../tf-event-dashboard/tf-chart.html">
+<link rel="import" href="../tf-collapsable-pane/tf-collapsable-pane.html">
+<link rel="import" href="../../bower_components/iron-collapse/iron-collapse.html">
+<link rel="import" href="../../bower_components/paper-icon-button/paper-icon-button.html">
+<link rel="import" href="../imports/lodash.html">
+
+<!--
+tf-histogram-dashboard is a complete frontend that loads runs from a backend,
+and creates chart panes that display data for those runs.
+
+It provides a categorizer, run selector, and x type selector, by which the user
+can customize how data is organized and displayed.
+
+Each chart has a button that can toggle whether it is "selected"; selected
+charts are larger.
+
+Organizationally, the #plumbing div contains components that have no concrete
+manifestation and just effect data bindings or data loading. The #sidebar contains
+shared controls like the tf-categorizer, tf-run-selector, and tf-x-type-selector.
+The #center div contains tf-charts embedded inside tf-collapsable-panes.
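+
+A minimal usage sketch (assuming this file has been imported and a TensorBoard
+backend is serving run and compressed-histogram data):
+
+  <tf-histogram-dashboard></tf-histogram-dashboard>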
+-->
+<dom-module id="tf-histogram-dashboard">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator
+ out-runs-url="{{runsUrl}}"
+ out-compressed-histograms-url-generator="{{compressedHistogramsUrlGen}}"
+ id="urlGenerator"
+ ></tf-url-generator>
+
+ <tf-data-coordinator
+ id="dataCoordinator"
+ url-generator="[[compressedHistogramsUrlGen]]"
+ run-to-tag="[[runToCompressedHistograms]]"
+ color-scale="[[colorScale]]"
+ out-data-coordinator="{{dataCoordinator}}"
+ ></tf-data-coordinator>
+
+ <tf-run-generator
+ id="runGenerator"
+ url="[[runsUrl]]"
+ out-run-to-compressed-histograms="{{runToCompressedHistograms}}"
+ ></tf-run-generator>
+
+ <tf-color-scale
+ id="colorScale"
+ runs="[[_runs]]"
+ out-color-scale="{{colorScale}}"
+ out-class-scale="{{classScale}}"
+ ></tf-color-scale>
+
+ <tf-tooltip-coordinator
+ id="tooltipCoordinator"
+ out-tooltip-updater="{{tooltipUpdater}}"
+ out-tooltip-map="{{tooltipMap}}"
+ out-x-value="{{tooltipXValue}}"
+ out-closest-run="{{closestRun}}"
+ ></tf-tooltip-coordinator>
+ </div>
+
+ <tf-dashboard-layout>
+ <div class="sidebar">
+
+ <tf-categorizer
+ id="categorizer"
+ tags="[[_visibleTags]]"
+ categories="{{categories}}"
+ ></tf-categorizer>
+
+ <tf-x-type-selector
+ id="xTypeSelector"
+ out-x-type="{{xType}}"
+ ></tf-x-type-selector>
+
+ <tf-run-selector
+ id="runSelector"
+ runs="[[_runs]]"
+ class-scale="[[classScale]]"
+ out-selected="{{selectedRuns}}"
+ tooltips="[[tooltipMap]]"
+ closest-run="[[closestRun]]"
+ x-value="[[tooltipXValue]]"
+ x-type="[[xType]]"
+ ></tf-run-selector>
+
+ </div>
+
+ <div class="center">
+ <template is="dom-if" if="[[!categories.length]]">
+ <div class="warning">
+ <p>
+ No histogram tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.histogram_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <template is="dom-repeat" items="[[categories]]">
+ <tf-collapsable-pane name="[[item.name]]" count="[[_count(item.tags, selectedRuns.*, runToCompressedHistograms.*)]]">
+ <div class="layout horizontal wrap">
+ <template is="dom-repeat" items="[[item.tags]]" as="tag">
+ <template is="dom-repeat" items="[[selectedRuns]]" as="run">
+ <template is="dom-if" if="[[_exists(run, tag, runToCompressedHistograms.*)]]">
+ <div class="card">
+ <span class="card-title">[[tag]]</span>
+ <div class="card-content">
+ <tf-chart
+ tag="[[tag]]"
+ type="compressedHistogram"
+ id="chart"
+ selected-runs="[[_array(run)]]"
+ x-type="[[xType]]"
+ data-coordinator="[[dataCoordinator]]"
+ color-scale="[[colorScale]]"
+ on-keyup="toggleSelected"
+ tabindex="2"
+ tooltip-updater="[[tooltipUpdater]]"
+ ></tf-chart>
+ <paper-icon-button
+ class="expand-button"
+ icon="fullscreen"
+ on-tap="toggleSelected"
+ ></paper-icon-button>
+ </div>
+ </div>
+ </template>
+ </template>
+ </template>
+ </div>
+ </tf-collapsable-pane>
+ </template>
+ </div>
+ </tf-dashboard-layout>
+
+ <style include="dashboard-style"></style>
+ <style include="warning-style"></style>
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-histogram-dashboard",
+ properties: {
+ _runs: {
+ type: Array,
+ computed: "_getRuns(runToCompressedHistograms)",
+ },
+ _visibleTags: {
+ type: Array,
+ computed: "_getVisibleTags(selectedRuns.*, runToCompressedHistograms.*)"
+ }
+ },
+ _exists: function(run, tag, runToCompressedHistogramsChange) {
+ var runToCompressedHistograms = runToCompressedHistogramsChange.base;
+ return runToCompressedHistograms[run].indexOf(tag) !== -1;
+ },
+ _array: function(x) {
+ return [x];
+ },
+ _count: function(tags, selectedRunsChange, runToCompressedHistogramsChange) {
+ var selectedRuns = selectedRunsChange.base;
+ var runToCompressedHistograms = runToCompressedHistogramsChange.base;
+ var targetTags = {};
+ tags.forEach(function(t) {
+ targetTags[t] = true;
+ });
+ var count = 0;
+ selectedRuns.forEach(function(r) {
+ runToCompressedHistograms[r].forEach(function(t) {
+ if (targetTags[t]) {
+ count++;
+ }
+ });
+ });
+ return count;
+ },
+ _getRuns: function(runToCompressedHistograms) {
+ return _.keys(runToCompressedHistograms);
+ },
+ _getVisibleTags: function(selectedRunsChange, runToCompressedHistogramsChange) {
+ var keys = selectedRunsChange.base;
+ var dict = runToCompressedHistogramsChange.base;
+ return _.union.apply(null, keys.map(function(k) {return dict[k]}));
+ },
+ toggleSelected: function(e) {
+ var currentTarget = Polymer.dom(e.currentTarget);
+ var parentDiv = currentTarget.parentNode.parentNode;
+ parentDiv.classList.toggle("selected");
+ var chart = currentTarget.previousElementSibling;
+ if (chart) {
+ chart.redraw();
+ }
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/demo/image-loader-demo.html b/tensorflow/tensorboard/components/tf-image-dashboard/demo/image-loader-demo.html
new file mode 100644
index 0000000000..7aafd247f3
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/demo/image-loader-demo.html
@@ -0,0 +1,73 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <script src="../../../bower_components/d3/d3.js"></script>
+ <script src="../../../bower_components/plottable/plottable.js"></script>
+ <link rel="stylesheet" type="text/css" href="../../../bower_components/plottable/plottable.css">
+ <link rel="import" href="../tf-image-dashboard.html">
+ <title>Image Loader Demo</title>
+ </head>
+ <body>
+ <script>
+ TF.Urls.runsUrl = function() {
+ return "data/runs.json"
+ };
+ TF.Urls.scalarsUrl = function(tag, run) {
+ return "data/" + run + "/" + tag + ".json";
+ };
+ </script>
+
+ <dom-module id="x-demo">
+ <style>
+ #loader {
+ width: 300px;
+ height: 300px;
+ }
+ </style>
+ <template>
+ <tf-image-loader
+ id="loader"
+ run="[[run]]"
+ tag="[[tag]]"
+ images-generator="[[imagesGenerator]]"
+ individual-image-generator="[[individualImageGenerator]]"
+ ></tf-image-loader>
+ </template>
+ <script>
+ var imagesUrl = function(tag, run) {
+ return "data/images/" + run + "/" + tag + ".json";
+ };
+ var individualImageUrl = function(query) {
+ return "data/individualImage/" + query + ".png";
+ };
+ Polymer({
+ is: "x-demo",
+ properties: {
+ run: {
+ type: String,
+ value: "train",
+ },
+ tag: {
+ type: String,
+ value: "reconstruction_07%2Fimage%2F2"
+ },
+ imagesGenerator: {
+ type: Function,
+ value: function() {
+ return imagesUrl;
+ },
+ },
+ individualImageGenerator: {
+ type: Function,
+ value: function() {
+ return individualImageUrl;
+ },
+ },
+ },
+ });
+ </script>
+ </dom-module>
+ <x-demo id="demo"></x-demo>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/demo/index.html b/tensorflow/tensorboard/components/tf-image-dashboard/demo/index.html
new file mode 100644
index 0000000000..4645b4b783
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/demo/index.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="stylesheet" type="text/css" href="../../../lib/css/global.css">
+ <link rel="import" href="../tf-image-dashboard.html">
+ <title>Image Dashboard Demo</title>
+ <style>
+ #container{
+ width: 500px;
+ height: 800px;
+ border: 2px solid grey;
+ }
+ html,body {
+ height: 100%;
+ }
+ </style>
+ </head>
+ <body>
+ <script>
+ TF.Urls.runsUrl = function() {
+ return "data/runs.json"
+ };
+ TF.Urls.imagesUrl = function(tag, run) {
+ return "data/images/" + run + "/" + tag + ".json";
+ };
+ TF.Urls.individualImageUrl = function(query) {
+ return "data/individualImage/" + query + ".png";
+ }
+ </script>
+
+ <p>The image dashboard is deliberately inside a small container
+ so that it's easy to test that the scroll bars display properly.</p>
+ <p>Looks goofy though.</p>
+ <div id="container">
+ <tf-image-dashboard id="demo"></tf-image-dashboard>
+ </div>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-dashboard.html b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-dashboard.html
new file mode 100644
index 0000000000..726a420e9f
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-dashboard.html
@@ -0,0 +1,90 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../tf-dashboard-common/tf-run-generator.html">
+<link rel="import" href="../tf-dashboard-common/tf-url-generator.html">
+<link rel="import" href="../../bower_components/paper-icon-button/paper-icon-button.html">
+<link rel="import" href="tf-image-grid.html">
+<link rel="import" href="../tf-dashboard-common/warning-style.html">
+
+<!--
+tf-image-dashboard displays a dashboard that loads images from a TensorFlow run.
+
+Right now it has very simple behavior: it creates a url-generator and a
+run-generator to talk to the backend, and then passes the runToImages map and
+the url generators into a tf-image-grid for display.
+
+Likely we will add more in the future, e.g. a sidebar like in the event
+dashboard that allows filtering and organizing the tags and runs, and a
+mechanism for loading older images rather than always getting the most recent one.
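+
+Minimal usage (as in demo/index.html):
+
+  <tf-image-dashboard></tf-image-dashboard>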
+-->
+<dom-module id="tf-image-dashboard">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator
+ out-runs-url="{{runsUrl}}"
+ out-images-url-generator="{{imagesUrlGen}}"
+ out-individual-image-url-generator="{{individualImageUrlGen}}"
+ id="urlGenerator"
+ ></tf-url-generator>
+
+ <tf-run-generator
+ id="runGenerator"
+ url="[[runsUrl]]"
+ out-run-to-images="{{runToImages}}"
+ ></tf-run-generator>
+ </div>
+
+ <div class="center">
+ <template is="dom-if" if="[[!_hasImages(runToImages.*)]]">
+ <div class="warning">
+ <p>
+ No image tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.image_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <tf-image-grid
+ id="imageGrid"
+ run-to-images="[[runToImages]]"
+ images-generator="[[imagesUrlGen]]"
+ individual-image-generator="[[individualImageUrlGen]]"
+ ></tf-image-grid>
+ </div>
+
+ <style>
+ .center {
+ padding-left: 10px;
+ padding-right: 10px;
+ height: 100%;
+ width: 100%;
+ -webkit-box-sizing: border-box;
+ -moz-box-sizing: border-box;
+ box-sizing: border-box;
+ }
+ :host {
+ height: 100%;
+ display: block;
+ }
+
+ </style>
+ <style include="warning-style"></style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-dashboard",
+ properties: {
+ runToImages: Object,
+ imagesUrlGen: Function,
+ individualImageUrlGen: Function,
+ },
+ _hasImages: function(runToImagesChange) {
+ return _.values(runToImagesChange.base).some(function(arr) {
+ return arr.length > 0;
+ });
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html
new file mode 100644
index 0000000000..b7787b98c4
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-grid.html
@@ -0,0 +1,166 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-styles/paper-styles.html">
+<link rel="import" href="tf-image-loader.html">
+<link rel="import" href="../imports/lodash.html">
+<link rel="import" href="../tf-dashboard-common/scrollbar-style.html">
+
+<!--
+tf-image-grid creates a grid for examining image data. The columns correspond
+to runs and the rows correspond to tags. Each cell is an image.
+
+Structurally, it makes extensive use of flexbox for layout: it has a top-level
+columnar flexbox that contains the topRow (run names) and then a
+bottomContainer. The bottomContainer is another columnar flexbox which contains
+repeated image-rows. Each image-row is a row flexbox which contains a tag name
+cell, and then image cells.
+
+In the future, we should improve on the layout by making the tag names and run names have fixed positions
+within the image-grid, so that when you scroll you always have context (e.g. row and column names in a spreadsheet).
+For now, it just scrolls.
+
+The image grid provides internal scroll bars (with styling) so that it can be dropped into
+a dashboard in a predictable fashion, even though the internal image grid may be enormous.
+
+Room for future improvement:
+
+- Make it obvious when an image didn't load due to the image not existing.
+- Find some way to collapse sparse image grids into denser ones (when sparsity
+is high)
+- Fix column/row names
+- Include hook for loading past images (by step/timestamp? or index?)
+
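+Example usage (a sketch mirroring how tf-image-dashboard instantiates this
+element; the attribute names correspond to the properties declared below):
+
+  <tf-image-grid
+    run-to-images="[[runToImages]]"
+    images-generator="[[imagesUrlGen]]"
+    individual-image-generator="[[individualImageUrlGen]]"
+  ></tf-image-grid>
+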
+-->
+<dom-module id="tf-image-grid">
+ <template>
+ <style include="scrollbar-style"></style>
+ <div id="fullContainer" class="container scrollbar">
+ <div id="topRow" class="container">
+ <div class="noshrink" id="paddingCell"></div>
+ <template
+ is="dom-repeat"
+ items="[[_runs]]"
+ as="run"
+ >
+ <div class="run-name-cell noshrink">
+ <span>[[run]]</span>
+ </div>
+ </template>
+ </div>
+ <div id="bottomContainer" class="container">
+ <template
+ is="dom-repeat"
+ items="[[_tags]]"
+ sort
+ as="tag"
+ >
+ <div class="image-row container noshrink">
+ <div class="tag-name-cell noshrink">
+ <span class="tag-name">[[tag]]</span>
+ </div>
+ <template
+ is="dom-repeat"
+ items="[[_runs]]"
+ as="run"
+ >
+ <div class="image-cell noshrink">
+ <template is="dom-if" if="[[_exists(run, tag, runToImages.*)]]">
+ <tf-image-loader
+ id="loader"
+ run="[[run]]"
+ tag="[[tag]]"
+ images-generator="[[imagesGenerator]]"
+ individual-image-generator="[[individualImageGenerator]]"
+ >
+ </tf-image-loader>
+ </template>
+ </div>
+ </template>
+ </div>
+ </template>
+ </div>
+ </div>
+ <style>
+ :host {
+ display: block;
+ height: 100%;
+ }
+ .container {
+ display: flex;
+ flex-wrap: nowrap;
+ }
+ #fullContainer {
+ width: 100%;
+ height: 100%;
+ flex-direction: column;
+ padding-top: 20px;
+ overflow: scroll;
+ -webkit-box-sizing: border-box;
+ -moz-box-sizing: border-box;
+ box-sizing: border-box;
+ }
+ #topRow {
+ flex-direction: row;
+ }
+ #bottomContainer {
+ flex-direction: column;
+ height: 100%;
+ width: 100%;
+ }
+ .image-row {
+ flex-direction: row;
+ }
+ .image-cell {
+ width: 300px;
+ height: 300px;
+ border: 1px solid black;
+ }
+ .tag-name-cell {
+ height: 300px;
+ width: 300px;
+ display:flex;
+ flex-direction: column;
+ justify-content: center;
+ }
+ .tag-name {
+ word-wrap: break-word;
+ text-align: center;
+ white-space: nowrap;
+ }
+ .run-name-cell {
+ width: 300px;
+ height: 30px;
+ text-align: center;
+ }
+ .noshrink {
+ flex-shrink: 0;
+ }
+ #paddingCell {
+ width: 300px;
+ height: 30px;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-grid",
+ properties: {
+ runToImages: Object,
+ _tags: {type: Array, computed: "_getTags(runToImages.*)"},
+ _runs: {type: Array, computed: "_getRuns(runToImages.*)"},
+ imagesGenerator: Function,
+ individualImageGenerator: Function,
+ },
+ _getTags: function(runToImages) {
+ return _.chain(runToImages.base).values().flatten().union().value();
+ },
+ _getRuns: function(runToImages) {
+ var r2i = runToImages.base;
+ return _.keys(r2i).filter(function(x) {return r2i[x].length > 0;});
+ },
+ _exists: function (run, tag, runToImages) {
+ runToImages = runToImages.base;
+ return runToImages[run].indexOf(tag) !== -1;
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-loader.html b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-loader.html
new file mode 100644
index 0000000000..e70f189c73
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-image-dashboard/tf-image-loader.html
@@ -0,0 +1,64 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/iron-ajax/iron-ajax.html">
+<link rel="import" href="../imports/lodash.html">
+
+<!--
+tf-image-loader loads an individual image from the TensorBoard backend.
+
+Right now it always loads the most recent image. We should add support in the
+future for loading older images.
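+
+Usage (as in demo/image-loader-demo.html and inside tf-image-grid):
+
+  <tf-image-loader
+    run="[[run]]"
+    tag="[[tag]]"
+    images-generator="[[imagesGenerator]]"
+    individual-image-generator="[[individualImageGenerator]]"
+  ></tf-image-loader>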
+-->
+<dom-module id="tf-image-loader">
+ <style>
+ :host {
+ display: block;
+ }
+ img {
+ width: 100%;
+ height: 100%;
+ }
+ </style>
+ <template>
+ <iron-ajax
+ id="ajax"
+ auto
+ url="[[metadataUrl]]"
+ handle-as="json"
+ debounce="50"
+ last-response="{{imageMetadata}}"
+ verbose="true"
+ ></iron-ajax>
+ <template is="dom-if" if="[[imageUrl]]">
+ <img src="[[imageUrl]]">
+ </template>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-loader",
+ properties: {
+ run: String,
+ tag: String,
+ imagesGenerator: Function,
+ individualImageGenerator: Function,
+ imageMetadata: Array,
+ metadataUrl: {
+ type: String,
+ computed: "apply(imagesGenerator, tag, run)",
+ },
+ imageUrl: {
+ type: String,
+ computed: "getLastImage(imageMetadata, individualImageGenerator)",
+ },
+ },
+ // Parameter order matches the computed binding above; the generator
+ // itself takes (tag, run).
+ apply: function(imagesGenerator, tag, run) {
+ return imagesGenerator(tag, run);
+ },
+ getLastImage: function(imageMetadata, individualImageGenerator) {
+ if (imageMetadata == null) {
+ return null;
+ }
+ var query = _.last(imageMetadata).query;
+ return individualImageGenerator(query);
+ },
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-multi-checkbox/demo/index.html b/tensorflow/tensorboard/components/tf-multi-checkbox/demo/index.html
new file mode 100644
index 0000000000..e5661b98bc
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-multi-checkbox/demo/index.html
@@ -0,0 +1,162 @@
+<!DOCTYPE html>
+<html>
+<head>
+<script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+<script src="../../../bower_components/d3/d3.js"></script>
+<link rel="import" href="../tf-multi-checkbox.html">
+<link rel="import" href="../../tf-event-dashboard/tf-color-scale.html">
+<link rel="stylesheet" type="text/css" href="../../../lib/css/global.css">
+
+</head>
+<body>
+<script>
+var seed = 1;
+function random() {
+ var x = Math.sin(seed++) * 10000;
+ return x - Math.floor(x);
+}
+</script>
+<style>
+</style>
+
+<dom-module id="mc-demo">
+ <template>
+ <tf-multi-checkbox
+ id="multiCheckbox"
+ names="[[names]]"
+ tooltips="[[_tooltips]]"
+ class-scale="[[classScale]]"
+ highlights="[[highlights]]"
+ ></tf-multi-checkbox>
+ <tf-color-scale
+ id="colorScale"
+ runs="[[names]]"
+ out-class-scale="{{classScale}}"
+ ></tf-color-scale>
+ <style>
+ </style>
+ </template>
+ <script>
+
+ function randomTooltip() {
+ var s = "";
+ while (random() < 0.8) {
+ s += String(10*random())[0];
+ }
+ return s;
+ }
+ Polymer({
+ is: "mc-demo",
+ properties: {
+ names: Array,
+ tooltips: Object,
+ autoGenerateTooltips: {value: true},
+ _tooltips: Object,
+ classScale: Function,
+ highlights: Array,
+ },
+ observers: [
+ 'autogenerate(names, autoGenerateTooltips)',
+ 'randomHighlights(names)'
+ ],
+ autogenerate: function(names, autoGenerateTooltips) {
+ if (autoGenerateTooltips) {
+ var tooltips = {};
+ names.forEach(function(n) {
+ if (random() > 0.5) {
+ tooltips[n] = randomTooltip();
+ }
+ });
+ this._tooltips = tooltips;
+ }
+ },
+ randomHighlights: function(names) {
+ var h = [];
+ names.forEach(function(n) {
+ if (random() > 0.6) {
+ h.push(n);
+ }
+ });
+ this.highlights = h;
+ }
+ });
+ </script>
+</dom-module>
+
+<dom-module id="x-demo">
+<style>
+.small {
+ width: 200px;
+ height: 500px;
+}
+.large {
+ width: 500px;
+ height: 900px;
+}
+html,body {
+ height: 100%;
+}
+mc-demo {
+ padding: 5px;
+ border: 1px solid var(--paper-red-500);
+ display: inline-block;
+}
+</style>
+<template>
+ <div class="demo-block">
+ <mc-demo id="demo1" class="small" names="[[long_names]]"></mc-demo>
+ <mc-demo class="small" names="[[many_names]]"></mc-demo>
+ <mc-demo class="small" names="[[many_long_names]]"></mc-demo>
+ </div>
+
+ <div class="demo-block">
+ <mc-demo class="large" names="[[long_names]]"></mc-demo>
+ <mc-demo class="large" names="[[many_names]]"></mc-demo>
+ <mc-demo class="large" names="[[many_long_names]]"></mc-demo>
+ </div>
+
+</template>
+<script>
+
+function long_names() {
+ return [
+ "foo_bar very long name with spaces",
+ "the quick brown fox jumped over the lazy dog",
+ "supercalifragilisticexpialodcious/bar/foo/zod/longer/longer",
+ ];
+}
+
+function many_names() {
+ var out = [];
+ for (var i=0; i<20; i++) {
+ out.push("foo_bar-" + i);
+ out.push("bar_zod_bing-" + i);
+ out.push("lol-" + i);
+ }
+ return out;
+}
+
+function many_long_names() {
+ var out = [];
+ for (var i=0; i<20; i++) {
+ out.push("foo_bar very very very long some spaces though-" + i);
+ out.push("bar_zod_bing_bas_womp_wub_wub_dub_wub_wub-" + i);
+ out.push("rightly_to_be_great_is_not_to_stir_without_great_argument_but_greatly_to_find_quarrel_in_a_straw_when_honors_at_the_stake-" + i);
+ }
+ return out;
+}
+
+Polymer({
+ is: "x-demo",
+ properties: {
+ long_names: {type: Array, value: long_names},
+ many_names: {type: Array, value: many_names},
+ many_long_names: {type: Array, value: many_long_names},
+},
+});
+</script>
+</dom-module>
+
+<x-demo id="demo"></x-demo>
+</body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-multi-checkbox/tf-multi-checkbox.html b/tensorflow/tensorboard/components/tf-multi-checkbox/tf-multi-checkbox.html
new file mode 100644
index 0000000000..a5447e8f5e
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-multi-checkbox/tf-multi-checkbox.html
@@ -0,0 +1,228 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-checkbox/paper-checkbox.html">
+<link rel="import" href="../imports/lodash.html">
+<link rel="import" href="../tf-dashboard-common/scrollbar-style.html">
+<link rel="import" href="../tf-dashboard-common/run-color-style.html">
+<!--
+tf-multi-checkbox creates a list of checkboxes that can be used to toggle on or off
+a large number of values. Each checkbox displays a name, and may also have an
+associated tooltip value. Checkboxes can be highlighted, hidden, and re-ordered.
+
+tf-multi-checkbox assumes that the names may be very long compared to the width
+of the checkbox, and the number of names may also be very large, and works to
+handle these situations gracefully.
+
+API:
+
+Properties in:
+names: The string names to associate with checkboxes.
+tooltips: An object mapping from name to tooltip value.
+tooltipOrderer: A function that is used to compute how to order the names based
+on tooltip values (when available). If tooltip values and a tooltip orderer are
+present, the tooltipOrderer computes a numeric value for each tooltip; tooltips
+with higher values are ordered first, tooltips with equal values are ordered
+lexicographically, and tooltips without a value are placed last. If the
+tooltipOrderer is set to a falsey value (e.g. null), then the names are not
+re-ordered based on tooltip value.
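+For example (illustrative values), with the default orderer and tooltips
+{a: "3", b: "10", c: undefined}, the names are ordered b, a, c.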
+classScale: A function that maps from name to class name, which is applied as
+the special property color-class. This is intended to be used to color the names.
+hideMissingTooltips: If set, then when tooltips are present, any names that do
+not have an associated non-empty tooltip value will be hidden.
+
+Properties out:
+outSelected: An array of names that the user has checked.
+If the user does not interact, everything will be checked.
+
+-->
+
+<dom-module id="tf-multi-checkbox">
+ <style include="scrollbar-style"></style>
+ <style include="run-color-style"></style>
+
+ <template>
+ <div id="outer-container" class="scrollbar">
+ <template
+ is="dom-repeat"
+ items="[[names]]"
+ sort="[[_tooltipComparator(tooltips, tooltipOrderer)]]"
+ >
+ <div
+ class="run-row"
+ color-class$="[[_applyColorClass(item, classScale)]]"
+ null-tooltip$="[[_isNullTooltip(item, tooltips)]]"
+ highlight$="[[_isHighlighted(item, highlights.*)]]"
+ >
+ <div class="checkbox-container vertical-align-container">
+ <paper-checkbox
+ class="checkbox vertical-align-center"
+ name="[[item]]"
+ checked$="[[_isChecked(item,outSelected.*)]]"
+ on-change="_checkboxChange"
+ ></paper-checkbox>
+ </div>
+ <div class="item-label-container">
+ <span>[[item]]</span>
+ </div>
+ <div class="tooltip-value-container vertical-align-container">
+ <span class="vertical-align-top">[[_lookupTooltip(item,tooltips)]]</span>
+ </div>
+ </div>
+ </template>
+ </div>
+ <style>
+ :host {
+ display: flex;
+ flex-direction: column;
+ height: 100%;
+ }
+ #outer-container {
+ overflow-y: scroll;
+ overflow-x: hidden;
+ width: 100%;
+ flex-grow: 1;
+ flex-shrink: 1;
+ word-wrap: break-word;
+ }
+ .run-row {
+ padding-top: 5px;
+ padding-bottom: 5px;
+ display: flex;
+ flex-direction: row;
+ font-size: 13px;
+ }
+ .checkbox-container {
+ flex-grow: 0;
+ flex-shrink: 0;
+ }
+ .checkbox {
+ padding-left: 2px;
+ width: 32px;
+ }
+ .item-label-container {
+ flex-grow: 1;
+ flex-shrink: 1;
+ width: 0px; /* hack to get the flex-grow to work properly */
+ }
+ .tooltip-value-container {
+ display: flex;
+ justify-content: center;
+ flex-grow: 0;
+ flex-shrink: 0;
+ text-align:right;
+ padding-left: 2px;
+ }
+ .vertical-align-container {
+ display: flex;
+ justify-content: center;
+ }
+ .vertical-align-container .vertical-align-center {
+ align-self: center;
+ }
+ .vertical-align-container .vertical-align-top {
+ align-self: start;
+ }
+ [null-tooltip] {
+ display: none;
+ }
+ [highlight] {
+ font-weight: bold;
+ }
+ </style>
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-multi-checkbox",
+ properties: {
+ names: Array,
+ tooltipOrderer: {
+ /* Used to compute how to order the tooltips based on the tooltip value.
+ * By default, it parses the tooltip strings as numbers.
+ * If set to a falsey value, tooltips are always ordered lexicographically.
+ */
+ type: Function,
+ value: function() {
+ return function(x) {return +x;}
+ },
+ },
+ tooltips: Object,
+ highlights: Array,
+ outSelected: {
+ type: Array,
+ notify: true,
+ value: function() {
+ return [];
+ },
+ },
+ hideMissingTooltips: {
+ // If we have tooltips, but some names are missing, do we hide them?
+ type: Boolean,
+ value: true,
+ },
+ classScale: Function, // map from run name to css class
+ },
+ observers: [
+ "_initializeOutSelected(names.*)",
+ ],
+ _lookupTooltip: function(item, tooltips) {
+ return tooltips != null ? tooltips[item] : null;
+ },
+ _isNullTooltip: function(item, tooltips) {
+ if (!this.hideMissingTooltips) {
+ // Never hide rows when hiding of missing tooltips is disabled.
+ return false;
+ }
+ if (tooltips == null) {
+ return false;
+ }
+ return tooltips[item] == null;
+ },
+ _initializeOutSelected: function(change) {
+ this.outSelected = change.base.slice();
+ },
+ _tooltipComparator: function(tooltips, tooltipOrderer) {
+ return function(a, b) {
+ if (!tooltips || !tooltipOrderer) {
+ // if we're missing tooltips or an orderer, do a lexicographic sort
+ return a.localeCompare(b);
+ }
+ function getValue(x) {
+ var value = tooltipOrderer(tooltips[x]);
+ return value == null || _.isNaN(value) ? -Infinity : value;
+ }
+ var aValue = getValue(a);
+ var bValue = getValue(b);
+ return aValue === bValue ? a.localeCompare(b) : bValue - aValue;
+ }
+ },
+ _checkboxChange: function(e) {
+ var name = e.srcElement.name;
+ var idx = this.outSelected.indexOf(name);
+ var checked = e.srcElement.checked;
+ if (checked && idx === -1) {
+ this.push("outSelected", name);
+ } else if (!checked && idx !== -1) {
+ this.splice("outSelected", idx, 1);
+ }
+ },
+ _isChecked: function(item, outSelectedChange) {
+ var outSelected = outSelectedChange.base;
+ return outSelected.indexOf(item) !== -1;
+ },
+ _initializeRuns: function(change) {
+ this.outSelected = change.base.slice();
+ },
+ _applyColorClass: function(item, classScale) {
+ // TODO: Update style just on the element that changes
+ // and apply at microtask timing
+ this.debounce("restyle", function (){
+ this.updateStyles();
+ }, 16);
+ return classScale(item);
+ },
+ _isHighlighted: function(item, highlights) {
+ return highlights.base.indexOf(item) !== -1;
+ },
+ });
+ </script>
+
+</dom-module>
diff --git a/tensorflow/tensorboard/components/tf-regex-group/demo/index.html b/tensorflow/tensorboard/components/tf-regex-group/demo/index.html
new file mode 100644
index 0000000000..efef84e0fc
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-regex-group/demo/index.html
@@ -0,0 +1,32 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <script src="../../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../tf-regex-group.html">
+ </head>
+ <body>
+ <style>
+ .container {
+ width: 255px;
+ padding: 10px;
+ border: 1px solid #3f51b5;
+ border-radius: 5px;
+ }
+ :host {
+ margin: 0px;
+ }
+ </style>
+ <template id="page-template" is="dom-bind">
+ <div class="container">
+ <tf-regex-group regexes="{{regexes}}" id="demo"></tf-regex-group>
+ </div>
+ <p> Regexes:</p>
+ <template is="dom-repeat" items="[[regexes]]">
+ <p>"<span>[[item]]</span>"</p>
+ </template>
+ </template>
+ </body>
+ <script>
+
+ </script>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-regex-group/index.html b/tensorflow/tensorboard/components/tf-regex-group/index.html
new file mode 100644
index 0000000000..0238a8d326
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-regex-group/index.html
@@ -0,0 +1,18 @@
+<!doctype html>
+<html>
+<head>
+
+ <title>tf-regex-group</title>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+ <script src="../../bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <link rel="import" href="../../bower_components/iron-component-page/iron-component-page.html">
+
+</head>
+<body>
+
+ <iron-component-page></iron-component-page>
+
+</body>
+</html>
diff --git a/tensorflow/tensorboard/components/tf-regex-group/tf-regex-group.html b/tensorflow/tensorboard/components/tf-regex-group/tf-regex-group.html
new file mode 100644
index 0000000000..e9673e85d9
--- /dev/null
+++ b/tensorflow/tensorboard/components/tf-regex-group/tf-regex-group.html
@@ -0,0 +1,151 @@
+<link rel="import" href="../../bower_components/polymer/polymer.html">
+<link rel="import" href="../../bower_components/paper-icon-button/paper-icon-button.html">
+<link rel="import" href="../../bower_components/iron-icons/iron-icons.html">
+<link rel="import" href="../../bower_components/paper-toggle-button/paper-toggle-button.html">
+<link rel="import" href="../../bower_components/paper-input/paper-input.html">
+
+<!--
+`tf-regex-group` provides an input component for a group of regular expressions.
+
+Example:
+ <tf-regex-group regexes="{{regexes}}"></tf-regex-group>
+
+It contains a series of regular expression input fields. From this, it computes
+`regexes`, an array in which every element is a string representing an
+active, valid, nonempty regular expression.
+
+Public Properties:
+`regexes`: a read-only, notifying array of strings, where each string is an
+active, valid, nonempty regex.
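+
+For instance (illustrative input), if the raw fields hold "foo.*" (active),
+"(" (invalid), and the trailing empty field, the computed `regexes` is
+["foo.*"].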
+
+It maintains an invariant that the final regex should always be an empty string,
+so the user can easily add more regular expressions. It does this by adding
+a new empty regex when the final one is nonempty.
+
+Pressing "enter" moves focus to the next regex (or just blurs if there are no
+more regexes).
+-->
+<dom-module id="tf-regex-group">
+ <template>
+ <div class="regex-list">
+ <template is="dom-repeat" items="{{rawRegexes}}">
+ <div class="regex-line">
+ <paper-input
+ id="text-input"
+ class="regex-input"
+ label="input new regex"
+ no-label-float
+ bind-value="{{item.regex}}"
+ invalid="[[!item.valid]]"
+ on-keyup="moveFocus"
+ ></paper-input>
+ <paper-toggle-button
+ class="active-button"
+ checked="{{item.active}}"
+ disabled="[[!item.valid]]"
+ ></paper-toggle-button>
+
+ <paper-icon-button
+ icon="delete"
+ class="delete-button"
+ aria-label="Delete Regex"
+ tabindex="0"
+ on-tap="deleteRegex"
+ ></paper-icon-button>
+ </div>
+ <style>
+ .regex-input {
+ width: 210px;
+ display: inline-block;
+ padding-left: 8px;
+ padding-right: 5px;
+ }
+
+ .active-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+ border: none;
+ }
+
+ .delete-button {
+ color: var(--paper-pink-900);
+ width: 24px;
+ height: 24px;
+ }
+ .regex-list {
+ margin-bottom: 10px;
+ }
+ paper-input {
+ --paper-input-container-focus-color: var(--tb-orange-strong);
+ }
+ </style>
+ </template>
+ </div>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-regex-group",
+ properties: {
+ rawRegexes: {
+ type: Array,
+ value: function() {
+ return [{regex: "", active: true, valid: true}];
+ }
+ },
+ regexes: {type: Array, computed: "usableRegexes(rawRegexes.*)", notify: true},
+ },
+ observers: [
+ "addNewRegexIfNeeded(rawRegexes.*)",
+ "checkValidity(rawRegexes.*)",
+ ],
+ checkValidity: function(x) {
+ var match = x.path.match(/rawRegexes\.(\d+)\.regex/);
+ if (match) {
+ var idx = match[1];
+ this.set("rawRegexes." + idx + ".valid", this.isValid(x.value));
+ }
+ },
+ isValid: function(s) {
+ try {
+ new RegExp(s);
+ return true;
+ } catch (e) {
+ return false;
+ }
+ },
+ usableRegexes: function(regexes) {
+ var isValid = this.isValid;
+ return regexes.base.filter(function (r) {
+ // Checking validity here (rather than using the data property)
+ // is necessary because otherwise we might send invalid regexes, since
+ // this computed function can run before the validity observer does.
+ return r.regex !== "" && r.active && isValid(r.regex);
+ }).map(function(r) {
+ return r.regex;
+ });
+ },
+ addNewRegexIfNeeded: function() {
+ var last = this.rawRegexes[this.rawRegexes.length - 1];
+ if (last.regex !== "") {
+ this.push("rawRegexes", {regex: "", active: true, valid: true});
+ }
+ },
+ deleteRegex: function(e) {
+ if (this.rawRegexes.length > 1) {
+ this.splice("rawRegexes", e.model.index, 1);
+ }
+ },
+ moveFocus: function(e) {
+ if (e.keyCode === 13) {
+ var idx = e.model.index;
+ var inputs = Polymer.dom(this.root).querySelectorAll(".regex-input");
+ if (idx < this.rawRegexes.length - 1) {
+ inputs[idx+1].$.input.focus();
+ } else {
+ document.activeElement.blur();
+ }
+ }
+ }
+ });
+ </script>
+</dom-module>
diff --git a/tensorflow/tensorboard/dist/index.html b/tensorflow/tensorboard/dist/index.html
new file mode 100644
index 0000000000..e75a87a4f3
--- /dev/null
+++ b/tensorflow/tensorboard/dist/index.html
@@ -0,0 +1,43 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <title>TensorBoard</title>
+ <link rel="stylesheet" type="text/css" href="/lib/css/global.css">
+
+ <script src="external/lodash/lodash.min.js"></script>
+ <script src="external/d3/d3.min.js"></script>
+ <script src="/external/plottable/plottable.min.js"></script>
+ <link rel="stylesheet" type="text/css" href="/external/plottable/plottable.css">
+
+ <script src="external/graphlib/dist/graphlib.core.min.js"></script>
+ <script src="external/dagre/dist/dagre.core.min.js"></script>
+
+ <link rel="import" href="external/polymer/polymer.html">
+ <script src="external/webcomponentsjs/webcomponents-lite.min.js"></script>
+
+ <link rel="import" href="external/iron-ajax/iron-ajax.html">
+ <link rel="import" href="external/iron-collapse/iron-collapse.html">
+ <link rel="import" href="external/iron-list/iron-list.html">
+ <link rel="import" href="external/paper-button/paper-button.html">
+ <link rel="import" href="external/paper-checkbox/paper-checkbox.html">
+ <link rel="import" href="external/paper-dropdown-menu/paper-dropdown-menu.html">
+ <link rel="import" href="external/paper-header-panel/paper-header-panel.html">
+ <link rel="import" href="external/paper-icon-button/paper-icon-button.html">
+ <link rel="import" href="external/paper-input/paper-input.html">
+ <link rel="import" href="external/paper-item/paper-item.html">
+ <link rel="import" href="external/paper-menu/paper-menu.html">
+ <link rel="import" href="external/paper-progress/paper-progress.html">
+ <link rel="import" href="external/paper-radio-button/paper-radio-button.html">
+ <link rel="import" href="external/paper-radio-group/paper-radio-group.html">
+ <link rel="import" href="external/paper-slider/paper-slider.html">
+ <link rel="import" href="external/paper-styles/paper-styles.html">
+ <link rel="import" href="external/paper-toggle-button/paper-toggle-button.html">
+ <link rel="import" href="external/paper-toolbar/paper-toolbar.html">
+
+ <link rel="import" href="dist/tf-tensorboard.html">
+
+ </head>
+ <body>
+ <tf-tensorboard></tf-tensorboard>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/dist/tf-tensorboard.html b/tensorflow/tensorboard/dist/tf-tensorboard.html
new file mode 100644
index 0000000000..2aa1e46ca3
--- /dev/null
+++ b/tensorflow/tensorboard/dist/tf-tensorboard.html
@@ -0,0 +1,10484 @@
+<html><head><meta charset="UTF-8">
+
+
+
+
+
+<style is="custom-style">
+
+ :root {
+ --tb-orange-weak: #fcb938;
+ --tb-orange-strong: #f3913e;
+ --tb-grey-darker: #e2e2e2;
+ --tb-grey-lighter: #f3f3f3;
+ }
+
+</style>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<script>/// <reference path="../../../typings/tsd.d.ts" />
+var tf;
+(function (tf) {
+ /**
+ * Recommended delay (ms) when running an expensive task asynchronously
+ * that gives enough time for the progress bar to update its UI.
+ */
+ var ASYNC_TASK_DELAY = 20;
+ function time(msg, task) {
+ var start = Date.now();
+ var result = task();
+ /* tslint:disable */
+ console.log(msg, ":", Date.now() - start, "ms");
+ /* tslint:enable */
+ return result;
+ }
+ tf.time = time;
+ /**
+ * Creates a tracker for a subtask given the parent tracker, the total progress
+ * of the subtask and the subtask message. The parent task should pass a
+ * subtracker to its subtasks. The subtask reports its own progress which
+ * becomes relative to the main task.
+ */
+ function getSubtaskTracker(parentTracker, impactOnTotalProgress, subtaskMsg) {
+ return {
+ setMessage: function (progressMsg) {
+ // The parent should show a concatenation of its message along with
+ // its subtask tracker message.
+ parentTracker.setMessage(subtaskMsg + " : " + progressMsg);
+ },
+ updateProgress: function (incrementValue) {
+ // Update the parent progress relative to the child progress.
+ // For example, if the sub-task progresses by 30%, and the impact on the
+ // total progress is 50%, then the task progresses by 30% * 50% = 15%.
+ parentTracker
+ .updateProgress(incrementValue * impactOnTotalProgress / 100);
+ },
+ reportError: function (errorMsg) {
+ // The parent should show a concatenation of its message along with
+ // its subtask error message.
+ parentTracker.reportError(subtaskMsg + " : " + errorMsg);
+ }
+ };
+ }
+ tf.getSubtaskTracker = getSubtaskTracker;
+ /**
+ * Runs an expensive task asynchronously and returns a promise of the result.
+ */
+ function runAsyncTask(msg, incProgressValue, task, tracker) {
+ return new Promise(function (resolve, reject) {
+ // Update the progress message to say the current running task.
+ tracker.setMessage(msg);
+ // Run the expensive task with a delay that gives enough time for the
+ // UI to update.
+ setTimeout(function () {
+ try {
+ var result = tf.time(msg, task);
+ // Update the progress value.
+ tracker.updateProgress(incProgressValue);
+ // Return the result to be used by other tasks.
+ resolve(result);
+ }
+ catch (e) {
+ // Reject with the thrown error; `result` is never assigned here.
+ reject(e);
+ }
+ }, ASYNC_TASK_DELAY);
+ });
+ }
+ tf.runAsyncTask = runAsyncTask;
+ /**
+ * Returns a query selector with escaped special characters that are not
+ * allowed in a query selector.
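+ * For example, an (illustrative) input of "foo/bar:0" becomes foo\/bar\:0.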
+ */
+ function escapeQuerySelector(querySelector) {
+ return querySelector.replace(/([:.\[\],/\\\(\)])/g, "\\$1");
+ }
+ tf.escapeQuerySelector = escapeQuerySelector;
+})(tf || (tf = {})); // close module tf
+</script>
+<script>/// <reference path="../../../typings/tsd.d.ts" />
+/// <reference path="common.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ /** Delimiter used in node names to denote namespaces. */
+ graph_1.NAMESPACE_DELIM = "/";
+ var FULL_GRAPH_NAME = "fullGraph";
+ graph_1.ROOT_NAME = "__root__";
+ // Separator between the source and the destination name of the edge.
+ graph_1.EDGE_KEY_DELIM = "--";
+ (function (GraphType) {
+ GraphType[GraphType["FULL"] = 0] = "FULL";
+ GraphType[GraphType["EMBEDDED"] = 1] = "EMBEDDED";
+ GraphType[GraphType["META"] = 2] = "META";
+ GraphType[GraphType["SERIES"] = 3] = "SERIES";
+ GraphType[GraphType["CORE"] = 4] = "CORE";
+ GraphType[GraphType["SHADOW"] = 5] = "SHADOW";
+ GraphType[GraphType["BRIDGE"] = 6] = "BRIDGE";
+ GraphType[GraphType["EDGE"] = 7] = "EDGE";
+ })(graph_1.GraphType || (graph_1.GraphType = {}));
+ var GraphType = graph_1.GraphType;
+ ;
+ (function (NodeType) {
+ NodeType[NodeType["META"] = 0] = "META";
+ NodeType[NodeType["OP"] = 1] = "OP";
+ NodeType[NodeType["SERIES"] = 2] = "SERIES";
+ NodeType[NodeType["BRIDGE"] = 3] = "BRIDGE";
+ NodeType[NodeType["ELLIPSIS"] = 4] = "ELLIPSIS";
+ })(graph_1.NodeType || (graph_1.NodeType = {}));
+ var NodeType = graph_1.NodeType;
+ ;
+ /**
+ * A SlimGraph is inspired by graphlib.Graph, but has only the functionality
+ * that we need.
+ */
+ var SlimGraph = (function () {
+ function SlimGraph() {
+ this.nodes = {};
+ this.edges = [];
+ }
+ return SlimGraph;
+ })();
+ graph_1.SlimGraph = SlimGraph;
+ var EllipsisNodeImpl = (function () {
+ /**
+ * Constructs a new ellipsis annotation node.
+ *
+ * @param numNodes The number of additional annotations this node represents.
+ */
+ function EllipsisNodeImpl(numNodes) {
+ this.type = NodeType.ELLIPSIS;
+ this.isGroupNode = false;
+ this.cardinality = 1;
+ this.parentNode = null;
+ this.stats = null;
+ this.setNumMoreNodes(numNodes);
+ }
+ EllipsisNodeImpl.prototype.setNumMoreNodes = function (numNodes) {
+ this.numMoreNodes = numNodes;
+ this.name = "... " + numNodes + " more";
+ };
+ return EllipsisNodeImpl;
+ })();
+ graph_1.EllipsisNodeImpl = EllipsisNodeImpl;
+ ;
+ /**
+ * A label object for nodes in the full graph and leaf nodes in the render
+ * graph.
+ */
+ var OpNodeImpl = (function () {
+ /**
+ * Constructs a new Op node.
+ *
+ * @param rawNode The raw node.
+ * @param normalizedInputs An array of normalized
+ * inputs that denote the incoming edges to the current node. Each input
+ * contains the normalized name of the source node, whether it has a number
+ * part and whether it is a control dependency.
+ */
+ function OpNodeImpl(rawNode, normalizedInputs) {
+ this.op = rawNode.op;
+ this.name = rawNode.name;
+ this.device = rawNode.device;
+ this.attr = rawNode.attr;
+ this.inputs = normalizedInputs;
+ // additional properties
+ this.type = NodeType.OP;
+ this.isGroupNode = false;
+ this.cardinality = 1;
+ this.inEmbeddings = [];
+ this.outEmbeddings = [];
+ this.parentNode = null;
+ }
+ return OpNodeImpl;
+ })();
+ ;
+ function createMetanode(name, opt) {
+ if (opt === void 0) { opt = {}; }
+ return new MetanodeImpl(name, opt);
+ }
+ graph_1.createMetanode = createMetanode;
+ /**
+ * Joins the information from the stats file (memory, compute time) with the graph
+ * information.
+ */
+ function joinStatsInfoWithGraph(graph, statsJson) {
+ _.each(statsJson.devStats, function (stats) {
+ _.each(stats.nodeStats, function (nodeStats) {
+ // Lookup the node in the graph by its original name, e.g. A. If not
+ // found, lookup by the rewritten name A/(A) in case the name is both
+ // a namespace and a node name.
+ var nodeName = nodeStats.nodeName in graph.nodes ?
+ nodeStats.nodeName :
+ nodeStats.nodeName + graph_1.NAMESPACE_DELIM + "(" + nodeStats.nodeName + ")";
+ if (nodeName in graph.nodes) {
+ // Compute the total bytes used.
+ var totalBytes = 0;
+ if (nodeStats.memory) {
+ _.each(nodeStats.memory, function (alloc) {
+ if (alloc.totalBytes) {
+ totalBytes += Number(alloc.totalBytes);
+ }
+ });
+ }
+ var outputSize = null;
+ if (nodeStats.output) {
+ outputSize = _.map(nodeStats.output, function (output) {
+ return _.map(output.tensorDescription.shape.dim, function (dim) { return Number(dim.size); });
+ });
+ }
+ graph.nodes[nodeName].stats = new NodeStats(totalBytes, Number(nodeStats.allEndRelMicros), outputSize);
+ }
+ });
+ });
+ }
+ graph_1.joinStatsInfoWithGraph = joinStatsInfoWithGraph;
+ /**
+ * Execution stats for the node.
+ */
+ var NodeStats = (function () {
+ function NodeStats(totalBytes, totalMicros, outputSize) {
+ this.totalBytes = totalBytes;
+ this.totalMicros = totalMicros;
+ this.outputSize = outputSize;
+ }
+ /**
+ * Combines the specified stats with the current stats.
+ * Modifies the current object. This method is used to
+ * compute aggregate stats for group nodes.
+ */
+ NodeStats.prototype.combine = function (stats) {
+ if (stats.totalBytes != null) {
+ this.totalBytes += stats.totalBytes;
+ }
+ if (stats.totalMicros != null) {
+ this.totalMicros += stats.totalMicros;
+ }
+ };
+ return NodeStats;
+ })();
+ var MetanodeImpl = (function () {
+ /** A label object for meta-nodes in the graph hierarchy */
+ function MetanodeImpl(name, opt) {
+ if (opt === void 0) { opt = {}; }
+ this.name = name;
+ this.type = NodeType.META;
+ /** number of levels under this group */
+ this.depth = 1;
+ this.isGroupNode = true;
+ /** # of leaf nodes (including embedded ones) */
+ this.cardinality = 0;
+ /** graph contains metanodes, nodes, edges
+ * and metaedges for main items within this metanode
+ */
+ this.metagraph =
+ createGraph(name, GraphType.META, opt);
+ /** bridgegraph must be constructed lazily; see hierarchy.getBridgegraph() */
+ this.bridgegraph = null;
+ /**
+ * A dictionary that counts the op types of the nodes in this metanode
+ * (op type => count).
+ */
+ this.opHistogram = {};
+ this.deviceHistogram = {};
+ /** unique id for a metanode of similar subgraph */
+ this.templateId = null;
+ /** Metanode which contains this node, if any */
+ this.parentNode = null;
+ this.stats = new NodeStats(0, 0, null);
+ this.hasNonControlEdges = false;
+ }
+ MetanodeImpl.prototype.getFirstChild = function () {
+ return this.metagraph.node(this.metagraph.nodes()[0]);
+ };
+ /**
+ * Returns the op node associated with the metanode.
+ * For example, if the metanode is "sgd", the associated
+ * op node is sgd/(sgd).
+ */
+ MetanodeImpl.prototype.getRootOp = function () {
+ var nameSplit = this.name.split("/");
+ var rootOpName = this.name + "/(" + nameSplit[nameSplit.length - 1] + ")";
+ return this.metagraph.node(rootOpName);
+ };
+ /**
+ * Return an array of the names of all the leaves (non-GroupNodes) inside
+ * this metanode. This performs a breadth-first search of the tree, so
+ * immediate child leaves will appear earlier in the output array than
+ * descendant leaves.
+ */
+ MetanodeImpl.prototype.leaves = function () {
+ var leaves = [];
+ var queue = [this];
+ var metagraph; // Defined here due to a limitation of ES6->5 compilation.
+ while (queue.length) {
+ var node = queue.shift();
+ if (node.isGroupNode) {
+ metagraph = node.metagraph;
+ _.each(metagraph.nodes(), function (name) { return queue.push(metagraph.node(name)); });
+ }
+ else {
+ leaves.push(node.name);
+ }
+ }
+ return leaves;
+ };
+ return MetanodeImpl;
+ })();
+ ;
+ function createMetaedge(v, w) {
+ return new MetaedgeImpl(v, w);
+ }
+ graph_1.createMetaedge = createMetaedge;
+ /**
+ * A label object for edges between metanodes of subgraphs in the render graph.
+ */
+ var MetaedgeImpl = (function () {
+ function MetaedgeImpl(v, w) {
+ this.v = v;
+ this.w = w;
+ this.baseEdgeList = [];
+ this.inbound = null;
+ this.numRegularEdges = 0;
+ this.numControlEdges = 0;
+ this.numRefEdges = 0;
+ }
+ MetaedgeImpl.prototype.addBaseEdge = function (edge) {
+ this.baseEdgeList.push(edge);
+ if (edge.isControlDependency) {
+ this.numControlEdges += 1;
+ }
+ else {
+ this.numRegularEdges += 1;
+ }
+ if (edge.isReferenceEdge) {
+ this.numRefEdges += 1;
+ }
+ };
+ return MetaedgeImpl;
+ })();
+ function createSeriesNode(prefix, suffix, parent, clusterId, name) {
+ return new SeriesNodeImpl(prefix, suffix, parent, clusterId, name);
+ }
+ graph_1.createSeriesNode = createSeriesNode;
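+ /**
+ * Builds the name for a series node. For example (illustrative arguments),
+ * getSeriesNodeName("conv", "/weights", "layer1", 0, 9) returns
+ * "layer1/conv[0-9]/weights"; when startId/endId are omitted, the numeric
+ * part is rendered as "#".
+ */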
+ function getSeriesNodeName(prefix, suffix, parent, startId, endId) {
+ var numRepresentation = (typeof startId !== "undefined" && typeof endId !== "undefined") ?
+ "[" + startId + "-" + endId + "]" : "#";
+ var pattern = prefix + numRepresentation + suffix;
+ return (parent ? parent + "/" : "") + pattern;
+ }
+ graph_1.getSeriesNodeName = getSeriesNodeName;
+ var SeriesNodeImpl = (function () {
+ function SeriesNodeImpl(prefix, suffix, parent, clusterId, name) {
+ this.name = name || getSeriesNodeName(prefix, suffix, parent);
+ this.type = NodeType.SERIES;
+ this.hasLoop = false;
+ this.prefix = prefix;
+ this.suffix = suffix;
+ this.clusterId = clusterId;
+ this.ids = [];
+ this.parent = parent;
+ this.isGroupNode = true;
+ this.cardinality = 0;
+ this.metagraph = createGraph(name, GraphType.SERIES);
+ // bridgegraph must be constructed lazily; see hierarchy.getBridgegraph()
+ this.bridgegraph = null;
+ this.parentNode = null;
+ this.deviceHistogram = {};
+ this.hasNonControlEdges = false;
+ this.stats = new NodeStats(0, 0, null);
+ }
+ return SeriesNodeImpl;
+ })();
+ /**
+ * Normalizes the inputs and extracts associated metadata:
+ * 1) Inputs can contain a colon followed by a number at the end
+ * (e.g. inputName:1) and we remove this from the input name, and take note
+ * that the input was numbered.
+ * 2) Control dependency inputs contain a caret at the beginning; we
+ * remove it and annotate the edge as a control dependency.
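+ * For example, the (illustrative) input "^foo/bar:1" normalizes to
+ * {name: "foo/bar", hasNumberPart: true, isControlDependency: true}.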
+ * @param inputs Array of unnormalized names of input nodes.
+ */
+ function normalizeInputs(inputs) {
+ return _.reduce(inputs, function (normalizedInputs, inputName) {
+ var start = inputName[0] === "^";
+ var colon = inputName.lastIndexOf(":");
+ var end = colon !== -1 &&
+ inputName.length - colon > 1 &&
+ !(/\D/).test(inputName.substring(colon + 1)) ?
+ colon : inputName.length;
+ var name = inputName.substring(start ? 1 : 0, end);
+ if (normalizedInputs.length === 0 ||
+ name !== normalizedInputs[normalizedInputs.length - 1].name) {
+ normalizedInputs.push({
+ name: name,
+ hasNumberPart: end !== inputName.length,
+ isControlDependency: start
+ });
+ }
+ return normalizedInputs;
+ }, []);
+ }
+ function build(rawNodes, params, tracker) {
+ /**
+ * A dictionary that maps each in-embedding node name to its host node label
+ * object.
+ */
+ var inEmbedding = {};
+ /**
+ * A dictionary that maps each node name to an array of the node's
+ * out-embedding node label objects.
+ */
+ var outEmbeddings = {};
+ var isInEmbeddedPred = getEmbedPredicate(params.inEmbeddingTypes);
+ var isOutEmbeddedPred = getEmbedPredicate(params.outEmbeddingTypes);
+ var embeddingNodeNames = [];
+ /**
+ * A list of all the non-embedding node names which appear in the processed
+ * list of raw nodes. Here we pre-allocate enough room for all the rawNodes,
+ * even though there will be some number of embeddings. The excess array length
+ * is spliced off later.
+ *
+ * Experimentation shows that around 30% of the array will go unused, and
+ * even for very large networks that amounts to less than 10k spaces.
+ */
+ var nodeNames = new Array(rawNodes.length);
+ return tf.runAsyncTask("Normalizing names", 30, function () {
+ var opNodes = new Array(rawNodes.length);
+ var index = 0;
+ _.each(rawNodes, function (rawNode) {
+ var normalizedInputs = normalizeInputs(rawNode.input);
+ var opNode = new OpNodeImpl(rawNode, normalizedInputs);
+ if (isInEmbeddedPred(opNode)) {
+ embeddingNodeNames.push(opNode.name);
+ inEmbedding[opNode.name] = opNode;
+ return;
+ }
+ if (isOutEmbeddedPred(opNode)) {
+ embeddingNodeNames.push(opNode.name);
+ _.each(opNode.inputs, function (input) {
+ var inputName = input.name;
+ outEmbeddings[inputName] = outEmbeddings[inputName] || [];
+ outEmbeddings[inputName].push(opNode);
+ });
+ return;
+ }
+ // The node is not an embedding, so add it to the names and nodes lists.
+ opNodes[index] = opNode;
+ nodeNames[index] = opNode.name;
+ index++;
+ });
+ opNodes.splice(index);
+ nodeNames.splice(index);
+ return opNodes;
+ }, tracker)
+ .then(function (opNodes) {
+ // Create the graph data structure from the graphlib library.
+ return tf.runAsyncTask("Building the data structure", 70, function () {
+ var normalizedNameDict = mapStrictHierarchy(nodeNames, embeddingNodeNames);
+ var graph = new SlimGraph;
+ // Add the nodes to the graph.
+ _.each(opNodes, function (opNode) {
+ var normalizedName = normalizedNameDict[opNode.name] || opNode.name;
+ graph.nodes[normalizedName] = opNode;
+ // Check if the node has out-embeddings. If yes, add them to the
+ // node.
+ if (opNode.name in outEmbeddings) {
+ opNode.outEmbeddings = outEmbeddings[opNode.name];
+ // Normalize the names of the out-embeddings.
+ _.each(opNode.outEmbeddings, function (node) {
+ node.name = normalizedNameDict[node.name] || node.name;
+ });
+ }
+ // Update the name of the node.
+ opNode.name = normalizedName;
+ });
+ // Visit each node's inputs to add the edges to the graph. If the input
+ // is an in-embedding, then add it to the node's in-embeddings instead.
+ _.each(opNodes, function (opNode) {
+ _.each(opNode.inputs, function (input, i) {
+ var inputName = input.name;
+ if (inputName in inEmbedding) {
+ opNode.inEmbeddings.push(inEmbedding[inputName]);
+ }
+ else {
+ graph.edges.push({
+ v: normalizedNameDict[inputName] || inputName,
+ w: opNode.name,
+ isControlDependency: input.isControlDependency,
+ // Check if this op type and input number corresponds to a
+ // reference edge using the refEdges dictionary in the params.
+ isReferenceEdge: (params.refEdges[opNode.op + " " + i] === true)
+ });
+ }
+ });
+ });
+ // Normalize the names of in-embeddings.
+ _.each(inEmbedding, function (node, name) {
+ node.name = normalizedNameDict[node.name] || node.name;
+ });
+ return graph;
+ }, tracker);
+ })
+ .catch(function (reason) {
+ throw new Error("Failure creating graph");
+ });
+ }
+ graph_1.build = build;
+ ;
+ /**
+ * Create a new graphlib.Graph() instance with default parameters
+ */
+ function createGraph(name, type, opt) {
+ if (opt === void 0) { opt = {}; }
+ var graph = new graphlib.Graph(opt);
+ graph.setGraph({
+ name: name,
+ rankdir: "BT",
+ type: type
+ });
+ return graph;
+ }
+ graph_1.createGraph = createGraph;
+ ;
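+ // Illustrative only: a graph created this way carries its name, type and the
+ // bottom-to-top rank direction in its graph label, e.g.
+ //
+ //   var g = createGraph("BRIDGEGRAPH", graph_1.GraphType.BRIDGE);
+ //   g.graph().rankdir;  // "BT"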
+ /**
+ * Create a predicate for checking whether a node should be embedded based on
+ * the specified types.
+ */
+ function getEmbedPredicate(types) {
+ return function (node) {
+ // check types
+ for (var i = 0; i < types.length; i++) {
+ var regExp = new RegExp(types[i]);
+ if (node.op.match(regExp)) {
+ return true;
+ }
+ }
+ return false;
+ };
+ }
+ ;
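+ // Illustrative only: the returned predicate matches a node's op against each
+ // of the configured type patterns, treating them as regular expressions.
+ //
+ //   var isConst = getEmbedPredicate(["Const"]);
+ //   isConst({op: "Const"});   // true
+ //   isConst({op: "MatMul"});  // false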
+ /**
+ * Returns a strict node name (name => name/(name)) to avoid conflicts
+ * where the node name is also a namespace.
+ */
+ function getStrictName(name) {
+ var parts = name.split(graph_1.NAMESPACE_DELIM);
+ return name + graph_1.NAMESPACE_DELIM + "(" + parts[parts.length - 1] + ")";
+ }
+ /**
+ * For each op node (embedding or non-embedding), rename it if there is a
+ * non-embedding node under its namespace. For example, assume node name "A".
+ * If there is a non-embedding node under its namespace (e.g. "A/B"), "A" will
+ * be renamed to "A/(A)". Then the namespace "A" will contain 2 nodes: "(A)"
+ * and "B". If all the nodes under "A" are embedding nodes (e.g. constant and
+ * summary), keep "A" as an Op node and don't create a namespace.
+ *
+ * @param nodeNames An array of regular (non-embedding) node names.
+ * @param embeddingNodeNames An array of embedding node names.
+ * @return Dictionary object mapping names that need to be renamed to
+ * new names.
+ */
+ function mapStrictHierarchy(nodeNames, embeddingNodeNames) {
+ /** Dictionary that maps the old name to the new name. */
+ var newNameDictionary = {};
+ /** Set used to store all namespaces. */
+ var namespaceSet = {};
+ // Sort the nodes to make the prefix check faster.
+ nodeNames.sort();
+ // Look for nodes that are also prefixes of other nodes, e.g. a, a/b -> a/(a), a/b.
+ for (var i = 0; i < nodeNames.length - 1; ++i) {
+ var a = nodeNames[i];
+ // Get all the parent namespaces of the current node
+ // and add them in the namespace set.
+ _.each(getHierarchicalPath(a).slice(0, -1), function (ns) {
+ namespaceSet[ns] = true;
+ });
+ var b = nodeNames[i + 1];
+ if (_.startsWith(b, a + graph_1.NAMESPACE_DELIM)) {
+ newNameDictionary[a] = getStrictName(a);
+ }
+ }
+ // Go through all the embedding node names and rename them in case they
+ // collide with namespaces.
+ _.each(embeddingNodeNames, function (embeddingName) {
+ if (embeddingName in namespaceSet) {
+ // Rename to follow strict hierarchy.
+ newNameDictionary[embeddingName] = getStrictName(embeddingName);
+ }
+ });
+ return newNameDictionary;
+ }
+ ;
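+ // Illustrative only: a node whose name is also a namespace gets the strict
+ // rename described above, e.g.
+ //
+ //   mapStrictHierarchy(["A", "A/B"], []);
+ //   // => { "A": "A/(A)" }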
+ /**
+ * Returns a list of the degrees of each node in the graph.
+ */
+ function degreeSequence(graph) {
+ var degrees = graph.nodes().map(function (name) {
+ return graph.neighbors(name).length;
+ });
+ // Sort numerically; the default sort compares elements as strings.
+ degrees.sort(function (a, b) { return a - b; });
+ return degrees;
+ }
+ ;
+ /**
+ * Returns whether the two graphs have the same degree sequence.
+ */
+ function hasSimilarDegreeSequence(graph1, graph2) {
+ var dg1 = degreeSequence(graph1);
+ var dg2 = degreeSequence(graph2);
+ if (dg1.length !== dg2.length) {
+ return false;
+ }
+ for (var i = 0; i < dg1.length; i++) {
+ if (dg1[i] !== dg2[i]) {
+ return false;
+ }
+ }
+ return true;
+ }
+ graph_1.hasSimilarDegreeSequence = hasSimilarDegreeSequence;
+ ;
+ /**
+ * Returns the hierarchical path of the current node, based on the node's name.
+ * For example, if the name is 'a/b/c', the returned path is ['a', 'a/b', 'a/b/c'].
+ *
+ * @param name The name of the node.
+ * @param seriesNames Optional map from node name to the name of the series
+ * node that contains it; when present, the series name is inserted into the
+ * path as the parent of the leaf.
+ */
+ function getHierarchicalPath(name, seriesNames) {
+ var path = [];
+ var i = name.indexOf(graph_1.NAMESPACE_DELIM);
+ // Push all parent portions of the path.
+ while (i >= 0) {
+ path.push(name.substring(0, i));
+ i = name.indexOf(graph_1.NAMESPACE_DELIM, i + 1);
+ }
+ // If the node's path is under a series, then add the series node name to the
+ // hierarchical path as the parent of the leaf.
+ if (seriesNames) {
+ var seriesName = seriesNames[name];
+ if (seriesName) {
+ path.push(seriesName);
+ }
+ }
+ // Push the leaf of the path.
+ path.push(name);
+ return path;
+ }
+ graph_1.getHierarchicalPath = getHierarchicalPath;
+ ;
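+ // Illustrative only: with a seriesNames map, the series node is inserted as
+ // the parent of the leaf ("a/b/series" is just a stand-in series node name
+ // for this sketch), e.g.
+ //
+ //   getHierarchicalPath("a/b/c", {"a/b/c": "a/b/series"});
+ //   // => ["a", "a/b", "a/b/series", "a/b/c"]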
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module tf.graph
+</script>
+<script>/// <reference path="../../../typings/tsd.d.ts" />
+/// <reference path="common.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph) {
+ var parser;
+ (function (parser) {
+ /**
+ * Parses a native js value, which can be either a string, boolean or number.
+ *
+ * @param value The value to be parsed.
+ */
+ function parseValue(value) {
+ if (value === "true") {
+ return true;
+ }
+ if (value === "false") {
+ return false;
+ }
+ var firstChar = value[0];
+ if (firstChar === "\"") {
+ return value.substring(1, value.length - 1);
+ }
+ var num = parseFloat(value);
+ return isNaN(num) ? value : num;
+ }
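+ // Illustrative only: parseValue maps text-proto scalars onto native JS
+ // values, e.g.
+ //
+ //   parseValue("true");         // true
+ //   parseValue("\"DT_FLOAT\""); // "DT_FLOAT" (surrounding quotes stripped)
+ //   parseValue("3.5");          // 3.5
+ //   parseValue("foo_bar");      // "foo_bar" (not a number, kept as a string)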
+ /**
+ * Fetches a text file and returns a promise of the result.
+ */
+ function readPbTxt(filepath) {
+ return new Promise(function (resolve, reject) {
+ d3.text(filepath, function (error, text) {
+ if (error) {
+ reject(error);
+ return;
+ }
+ resolve(text);
+ });
+ });
+ }
+ parser.readPbTxt = readPbTxt;
+ /**
+ * Fetches and parses a json file and returns a promise of the result.
+ */
+ function readJson(filepath) {
+ return new Promise(function (resolve, reject) {
+ d3.json(filepath, function (error, text) {
+ if (error) {
+ reject(error);
+ return;
+ }
+ resolve(text);
+ });
+ });
+ }
+ parser.readJson = readJson;
+ /**
+ * Reads the graph and stats file (if available), parses them and returns a
+ * promise of the result.
+ */
+ function readAndParseData(dataset, pbTxtContent, tracker) {
+ var graphPbTxt;
+ var statsJson;
+ return tf.runAsyncTask("Reading graph.pbtxt", 20, function () {
+ return pbTxtContent || readPbTxt(dataset.path);
+ }, tracker)
+ .then(function (text) {
+ graphPbTxt = text;
+ return tf.runAsyncTask("Reading stats.pbtxt", 20, function () {
+ return (dataset != null && dataset.statsPath != null) ?
+ readJson(dataset.statsPath) : null;
+ }, tracker);
+ })
+ .then(function (json) {
+ statsJson = json;
+ return tf.runAsyncTask("Parsing graph.pbtxt", 60, function () {
+ return parsePbtxt(graphPbTxt);
+ }, tracker);
+ })
+ .then(function (nodes) {
+ return {
+ nodes: nodes,
+ statsJson: statsJson
+ };
+ })
+ .catch(function (reason) {
+ throw new Error("Failure parsing graph definition");
+ });
+ }
+ parser.readAndParseData = readAndParseData;
+ /**
+ * Parses a proto txt file into a javascript object.
+ *
+ * @param input The string contents of the proto txt file.
+ * @return The parsed object.
+ */
+ function parsePbtxt(input) {
+ var output = { node: [] };
+ var stack = [];
+ var path = [];
+ var current = output;
+ function splitNameAndValueInAttribute(line) {
+ var colonIndex = line.indexOf(":");
+ var name = line.substring(0, colonIndex).trim();
+ var value = parseValue(line.substring(colonIndex + 2).trim());
+ return {
+ name: name,
+ value: value
+ };
+ }
+ /**
+ * Since proto-txt doesn't explicitly say whether an attribute is repeated
+ * (an array) or not, we keep a hard-coded list of attributes that are known
+ * to be repeated. This list is used at parsing time to convert repeated
+ * attributes into arrays even when the attribute only shows up once in the
+ * object.
+ */
+ var ARRAY_ATTRIBUTES = {
+ "node": true,
+ "node.input": true,
+ "node.attr": true,
+ "node.attr.value.list.type": true,
+ "node.attr.value.shape.dim": true,
+ "node.attr.value.tensor.string_val": true,
+ "node.attr.value.tensor.tensor_shape.dim": true
+ };
+ /**
+ * Adds a value, given the attribute name and the host object. If the
+ * attribute already exists, but is not an array, it will convert it to an
+ * array of values.
+ *
+ * @param obj The host object that holds the attribute.
+ * @param name The attribute name (key).
+ * @param value The attribute value.
+ * @param path A path that identifies the attribute. Used to check if
+ * an attribute is an array or not.
+ */
+ function addAttribute(obj, name, value, path) {
+ // We treat "node" specially since it is done so often.
+ var existingValue = obj[name];
+ if (existingValue == null) {
+ obj[name] = path.join(".") in ARRAY_ATTRIBUTES ? [value] : value;
+ }
+ else if (Array.isArray(existingValue)) {
+ existingValue.push(value);
+ }
+ else {
+ obj[name] = [existingValue, value];
+ }
+ }
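+ // Illustrative only: attributes outside ARRAY_ATTRIBUTES are promoted to an
+ // array on their second occurrence, e.g.
+ //
+ //   var o = {};
+ //   addAttribute(o, "value", 1, ["value"]);  // o.value === 1
+ //   addAttribute(o, "value", 2, ["value"]);  // o.value is now [1, 2]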
+ // Run through the file a line at a time.
+ var startPos = 0;
+ while (startPos < input.length) {
+ var endPos = input.indexOf("\n", startPos);
+ if (endPos === -1) {
+ endPos = input.length;
+ }
+ var line = input.substring(startPos, endPos);
+ startPos = endPos + 1;
+ if (!line) {
+ continue;
+ }
+ switch (line[line.length - 1]) {
+ case "{":
+ var name_1 = line.substring(0, line.length - 2).trim();
+ var newValue = {};
+ stack.push(current);
+ path.push(name_1);
+ addAttribute(current, name_1, newValue, path);
+ current = newValue;
+ break;
+ case "}":
+ current = stack.pop();
+ path.pop();
+ break;
+ default:
+ var x = splitNameAndValueInAttribute(line);
+ addAttribute(current, x.name, x.value, path.concat(x.name));
+ break;
+ }
+ }
+ return output["node"];
+ }
+ parser.parsePbtxt = parsePbtxt;
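+ // Illustrative only: a minimal GraphDef text proto and (roughly) the value
+ // parsePbtxt returns for it. Note that "input" is a known repeated field and
+ // is therefore always an array.
+ //
+ //   parsePbtxt('node {\n  name: "a"\n  op: "Const"\n}\n' +
+ //              'node {\n  name: "b"\n  op: "Relu"\n  input: "a"\n}\n');
+ //   // => [{name: "a", op: "Const"},
+ //   //     {name: "b", op: "Relu", input: ["a"]}]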
+ })(parser = graph.parser || (graph.parser = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // Close module tf.graph.parser.
+</script>
+<script>/// <reference path="graph.ts" />
+/// <reference path="template.ts" />
+/**
+ * Package for the Graph Hierarchy for TensorFlow graph.
+ */
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var hierarchy;
+ (function (hierarchy_1) {
+ var LOG_PREFIX_MSG = "Graph hierarchy: ";
+ /**
+ * Class for the Graph Hierarchy for TensorFlow graph.
+ */
+ var HierarchyImpl = (function () {
+ function HierarchyImpl() {
+ this.root = graph_1.createMetanode(graph_1.ROOT_NAME, { compound: true });
+ this.templates = null;
+ this.devices = null;
+ /**
+ * @type {Object} Dictionary object that maps node name to the node
+ * (could be op-node, metanode, or series-node)
+ */
+ this.index = {};
+ this.index[graph_1.ROOT_NAME] = this.root;
+ this.orderings = {};
+ }
+ HierarchyImpl.prototype.getNodeMap = function () {
+ return this.index;
+ };
+ HierarchyImpl.prototype.node = function (name) {
+ return this.index[name];
+ };
+ HierarchyImpl.prototype.setNode = function (name, node) {
+ this.index[name] = node;
+ };
+ /**
+ * Given the name of a node in this hierarchy, get its bridgegraph, creating
+ * it on the fly if necessary. If the node is not a GroupNode, then this
+ * method returns null. If the provided name does not map to a node in the
+ * hierarchy, an error will be thrown.
+ */
+ HierarchyImpl.prototype.getBridgegraph = function (nodeName) {
+ var _this = this;
+ var node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node in hierarchy: " + nodeName);
+ }
+ if (!("metagraph" in node)) {
+ return null;
+ }
+ var groupNode = node;
+ if (groupNode.bridgegraph) {
+ return groupNode.bridgegraph;
+ }
+ var bridgegraph = groupNode.bridgegraph =
+ graph_1.createGraph("BRIDGEGRAPH", graph_1.GraphType.BRIDGE);
+ if (!node.parentNode || !("metagraph" in node.parentNode)) {
+ return bridgegraph;
+ }
+ var parentNode = node.parentNode;
+ var parentMetagraph = parentNode.metagraph;
+ var parentBridgegraph = this.getBridgegraph(parentNode.name);
+ // For each of the parent node's two Metaedge containing graphs, process
+ // each Metaedge involving this node.
+ _.each([parentMetagraph, parentBridgegraph], function (parentGraph) {
+ _(parentGraph.edges())
+ .filter(function (e) { return e.v === nodeName || e.w === nodeName; })
+ .each(function (parentEdgeObj) {
+ var inbound = parentEdgeObj.w === nodeName;
+ var parentMetaedge = parentGraph.edge(parentEdgeObj);
+ // The parent's Metaedge represents some number of underlying
+ // BaseEdges from the original full graph. For each of those, we need
+ // to determine which immediate child is involved and make sure
+ // there's a Metaedge in the bridgegraph that covers it.
+ _.each(parentMetaedge.baseEdgeList, function (baseEdge) {
+ // Based on the direction, figure out which is the descendant node
+ // and which is the "other" node (sibling of parent or ancestor).
+ var _a = inbound ?
+ [baseEdge.w, parentEdgeObj.v] :
+ [baseEdge.v, parentEdgeObj.w], descendantName = _a[0], otherName = _a[1];
+ // Determine the immediate child containing this descendant node.
+ var childName = _this.getChildName(nodeName, descendantName);
+ // Look for an existing Metaedge in the bridgegraph (or create a
+ // new one) that covers the relationship between child and other.
+ var bridgeEdgeObj = {
+ v: inbound ? otherName : childName,
+ w: inbound ? childName : otherName,
+ };
+ var bridgeMetaedge = bridgegraph.edge(bridgeEdgeObj);
+ if (!bridgeMetaedge) {
+ bridgeMetaedge = graph_1.createMetaedge(bridgeEdgeObj.v, bridgeEdgeObj.w);
+ bridgeMetaedge.inbound = inbound;
+ bridgegraph.setEdge(bridgeEdgeObj.v, bridgeEdgeObj.w, bridgeMetaedge);
+ }
+ // Copy the BaseEdge from the parent's Metaedge into this
+ // bridgegraph Metaedge.
+ bridgeMetaedge.addBaseEdge(baseEdge);
+ });
+ })
+ .value(); // force lodash chain execution.
+ });
+ return bridgegraph;
+ };
+ /**
+ * Utility function for determining the name of the immediate child under a
+ * node for a given descendant path. If the descendant corresponds to no
+ * immediate child, an error is thrown.
+ */
+ HierarchyImpl.prototype.getChildName = function (nodeName, descendantName) {
+ // Walk up the hierarchy from the descendant to find the child.
+ var currentNode = this.index[descendantName];
+ while (currentNode) {
+ if (currentNode.parentNode && currentNode.parentNode.name === nodeName) {
+ return currentNode.name;
+ }
+ currentNode = currentNode.parentNode;
+ }
+ throw Error("Could not find immediate child for descendant: " +
+ descendantName);
+ };
+ ;
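+ // Illustrative only: walking up from the descendant until its parent is the
+ // given node yields the immediate child. Assuming the hierarchy's index
+ // contains nodes "Z", "Z/Y" and "Z/Y/X":
+ //
+ //   hierarchy.getChildName("Z", "Z/Y/X");  // "Z/Y"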
+ /**
+ * Given the name of a node, return the names of its predecessors.
+ * For an OpNode, this will contain the targets from the underlying BaseEdges.
+ * For a GroupNode, this will contain the targets truncated to siblings of
+ * the shared ancestor.
+ *
+ * For example, consider an original non-control BaseEdge A/B/C->Z/Y/X. Their
+ * shared ancestor is the ROOT node. A and Z are the highest siblings. Here
+ * are the results of calling getPredecessors():
+ *
+ * - getPredecessors("Z/Y/X") === {regular: ["A/B/C"], control: []};
+ * - getPredecessors("Z/Y") === {regular: ["A"], control: []};
+ * - getPredecessors("Z") === {regular: ["A"], control: []};
+ *
+ * The reason getPredecessors("Z/Y") returns ["A"] (and not ["A/B"] as you
+ * might intuitively expect) is that it's not clear how far down the
+ * other end of the hierarchy to traverse in the general case.
+ *
+ * Continuing this example, say there was another BaseEdge A/K->Z/Y/W. When
+ * we look at Z/Y's predecessors, the best we can say is ["A"] without getting
+ * into the details of which of Z/Y's descendant nodes have predecessors to
+ * which of A's descendants.
+ *
+ * On the other hand, for an OpNode it's clear what the final predecessors
+ * ought to be. There is no ambiguity.
+ */
+ HierarchyImpl.prototype.getPredecessors = function (nodeName) {
+ var node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+ var predecessors = this.getOneWayEdges(node, true);
+ // Add embedded predecessors, such as constants.
+ if (!node.isGroupNode) {
+ _.each(node.inEmbeddings, function (embeddedNode) {
+ predecessors.regular.push(embeddedNode.name);
+ });
+ }
+ return predecessors;
+ };
+ /**
+ * Given the name of a node, return an array of the names of its successors.
+ * For an OpNode, this will contain the targets from the underlying BaseEdges.
+ * For a GroupNode, this will contain the targets truncated to siblings of
+ * the shared ancestor.
+ *
+ * This is the inverse of getPredecessors(). See that method's documentation
+ * for an in-depth example.
+ */
+ HierarchyImpl.prototype.getSuccessors = function (nodeName) {
+ var node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+ var successors = this.getOneWayEdges(node, false);
+ // Add embedded successors, such as summaries.
+ if (!node.isGroupNode) {
+ _.each(node.outEmbeddings, function (embeddedNode) {
+ successors.regular.push(embeddedNode.name);
+ });
+ }
+ return successors;
+ };
+ /** Helper method for getPredecessors and getSuccessors. */
+ HierarchyImpl.prototype.getOneWayEdges = function (node, inEdges) {
+ var edges = { control: [], regular: [] };
+ // A node with no parent cannot have any edges.
+ if (!node.parentNode) {
+ return edges;
+ }
+ if (node.parentNode.isGroupNode) {
+ var parentNode = node.parentNode;
+ var metagraph = parentNode.metagraph;
+ var bridgegraph = this.getBridgegraph(parentNode.name);
+ findEdgeTargetsInGraph(metagraph, node, inEdges, edges);
+ findEdgeTargetsInGraph(bridgegraph, node, inEdges, edges);
+ }
+ return edges;
+ };
+ /**
+ * For a given GroupNode, get or calculate an object which describes a
+ * topological ordering of child nodes within that GroupNode's metagraph.
+ *
+ * This ordering is used when rendering bridge control edges which are
+ * sometimes backwards relative to the dataflow.
+ *
+ * For example, say we have a graph with two edges A->B and A->C, and we're
+ * interested in the ordering under ROOT. In this case, any of the following
+ * would be legitimate return values:
+ *
+ * - { "A": 0, "B": 1, "C": 2 } -- most likely
+ * - { "A": 0, "B": 2, "C": 1 } -- less likely
+ * - { "A": 12, "B": 100, "C": 99 } -- unlikely, but still OK
+ *
+ * The algorithm does not guarantee that all numbers from 0-N (where N is
+ * the number of nodes) appear exactly once. Rather it guarantees that if
+ * there is a path between two nodes, the earlier one will have a lower
+ * number in the ordering hash.
+ *
+ * When generating the ordering, we ignore control Metaedges (those which
+ * represent only BaseEdges that have isControlDependency set to true).
+ *
+ * If there is no node with the specified name, an error is thrown. If the
+ * node with the specified name is not a group node, null is returned.
+ */
+ HierarchyImpl.prototype.getTopologicalOrdering = function (nodeName) {
+ var node = this.index[nodeName];
+ if (!node) {
+ throw Error("Could not find node with name: " + nodeName);
+ }
+ if (!node.isGroupNode) {
+ return null;
+ }
+ if (nodeName in this.orderings) {
+ return this.orderings[nodeName];
+ }
+ // Mapping of child node names to lists of their successors.
+ var successors = {};
+ // Set of node names which have appeared as a destination.
+ var destinations = {};
+ var metagraph = node.metagraph;
+ _.each(metagraph.edges(), function (e) {
+ if (!metagraph.edge(e).numRegularEdges) {
+ return; // Skip control edges.
+ }
+ // Keep track of successors and destinations.
+ if (!(e.v in successors)) {
+ successors[e.v] = [];
+ }
+ successors[e.v].push(e.w);
+ destinations[e.w] = true;
+ });
+ // Seed the queue with true sources (those that are not destinations).
+ var queue = _.difference(_.keys(successors), _.keys(destinations));
+ // Produce an ordering by traversing the graph breadth first.
+ var ordering = this.orderings[nodeName] = {};
+ var index = 0;
+ while (queue.length) {
+ var childName = queue.shift();
+ ordering[childName] = index++;
+ _.each(successors[childName], function (succName) { return queue.push(succName); });
+ delete successors[childName]; // Prevent cycles from infinite looping.
+ }
+ return ordering;
+ };
+ return HierarchyImpl;
+ })();
+ /**
+ * Internal utility function - given a graph (should be either a metagraph or a
+ * bridgegraph) and a node which is known to be in that graph, determine
+ * the nodes at the other ends of edges that involve that node, in the
+ * direction specified by the inbound flag.
+ *
+ * For example if you wanted to find the predecessors of a node, you'd call
+ * this method for the parent's metagraph and bridgegraph, specifying inbound
+ * as true (look at the source of inbound edges to the specified node).
+ *
+ * Discovered target names are appended to the targets array.
+ */
+ function findEdgeTargetsInGraph(graph, node, inbound, targets) {
+ _.each(graph.edges(), function (e) {
+ var _a = inbound ? [e.w, e.v] : [e.v, e.w], selfName = _a[0], otherName = _a[1];
+ if (selfName === node.name) {
+ if (node.isGroupNode) {
+ var targetList = graph.edge(e).numRegularEdges
+ ? targets.regular : targets.control;
+ targetList.push(otherName);
+ }
+ else {
+ _.each(graph.edge(e).baseEdgeList, function (baseEdge) {
+ var targetList = baseEdge.isControlDependency
+ ? targets.control : targets.regular;
+ targetList.push(inbound ? baseEdge.v : baseEdge.w);
+ });
+ }
+ }
+ });
+ }
+ /**
+ * @param graph The raw graph.
+ * @param params Parameters used when building a hierarchy.
+ */
+ function build(graph, params, tracker) {
+ var h = new HierarchyImpl();
+ var seriesNames = {};
+ return tf.runAsyncTask("Adding nodes", 20, function () {
+ // Get all the possible device names.
+ var deviceNames = {};
+ _.each(graph.nodes, function (node, nodeName) {
+ if (node.device != null) {
+ deviceNames[node.device] = true;
+ }
+ });
+ h.devices = _.keys(deviceNames);
+ addNodes(h, graph);
+ }, tracker)
+ .then(function () {
+ return tf.runAsyncTask("Detect series", 20, function () {
+ if (params.groupSeries) {
+ groupSeries(h.root, h, seriesNames);
+ }
+ }, tracker);
+ })
+ .then(function () {
+ return tf.runAsyncTask("Adding edges", 30, function () {
+ addEdges(h, graph, seriesNames);
+ }, tracker);
+ })
+ .then(function () {
+ return tf.runAsyncTask("Finding similar subgraphs", 30, function () {
+ h.templates = graph_1.template.detect(h, params.verifyTemplate);
+ }, tracker);
+ })
+ .then(function () {
+ return h;
+ }).catch(function (reason) {
+ throw new Error("Failure creating graph hierarchy");
+ });
+ }
+ hierarchy_1.build = build;
+ ;
+ /**
+ * Creates the metanodes in the hierarchical graph and assigns parent-child
+ * relationship between them.
+ */
+ function addNodes(h, graph) {
+ _.each(graph.nodes, function (node, nodeName) {
+ var path = graph_1.getHierarchicalPath(node.name);
+ var parent = h.root;
+ parent.depth = Math.max(path.length, parent.depth);
+ // Create parent metanodes for each depth. For example if the node name
+ // is 'a/b/c', then create metanodes 'a' and 'a/b', where 'a/b' is a child
+ // of a.
+ for (var i = 0; i < path.length; i++) {
+ parent.depth = Math.max(parent.depth, path.length - i);
+ parent.cardinality += node.cardinality;
+ parent.opHistogram[node.op] = (parent.opHistogram[node.op] || 0) + 1;
+ if (node.stats) {
+ parent.stats.combine(node.stats);
+ }
+ if (node.device != null) {
+ parent.deviceHistogram[node.device] =
+ (parent.deviceHistogram[node.device] || 0) + 1;
+ }
+ if (i === path.length - 1) {
+ break;
+ }
+ var name_1 = path[i];
+ var child = h.node(name_1);
+ if (!child) {
+ child = graph_1.createMetanode(name_1);
+ child.parentNode = parent;
+ h.setNode(name_1, child);
+ parent.metagraph.setNode(name_1, child);
+ }
+ parent = child;
+ }
+ // Assuming node name is 'a/b/c', assign the OpNode as a child of the metanode 'a/b'.
+ h.setNode(node.name, node);
+ node.parentNode = parent;
+ parent.metagraph.setNode(node.name, node);
+ // Add each of the in-embeddings and out-embeddings in the hierarchy.
+ _.each(node.inEmbeddings, function (embedding) {
+ h.setNode(embedding.name, embedding);
+ embedding.parentNode = node;
+ });
+ _.each(node.outEmbeddings, function (embedding) {
+ h.setNode(embedding.name, embedding);
+ embedding.parentNode = node;
+ });
+ });
+ }
+ ;
+ /**
+ * For each base edge in the graph, adds a Metaedge to the metagraph of the
+ * lowest shared ancestor metanode. The resulting metaedges connect nodes
+ * that share the same parent.
+ */
+ function addEdges(h, graph, seriesNames) {
+ var nodeIndex = h.getNodeMap();
+ // Ancestor paths for the source and destination nodes of an edge. These are
+ // reused for each edge rather than allocating new ones. It's about 10% faster
+ // than allocating new ones on each pass through the loop.
+ var sourcePath = [];
+ var destPath = [];
+ // Insert the ancestor path for a node into the provided array, including the
+ // node itself. Return the index of the last node inserted (always ROOT).
+ var getPath = function (node, path) {
+ var i = 0;
+ while (node) {
+ path[i++] = node.name;
+ node = node.parentNode;
+ }
+ return i - 1;
+ };
+ _.each(graph.edges, function (baseEdge) {
+ // Get the hierarchical paths for the source and destination of the edge.
+ var sourceAncestorIndex = getPath(graph.nodes[baseEdge.v], sourcePath);
+ var destAncestorIndex = getPath(graph.nodes[baseEdge.w], destPath);
+ // Find the lowest shared ancestor between source and dest by looking for
+ // the highest nodes that differ between their ancestor paths.
+ while (sourcePath[sourceAncestorIndex] === destPath[destAncestorIndex]) {
+ sourceAncestorIndex--;
+ destAncestorIndex--;
+ if (sourceAncestorIndex < 0 || destAncestorIndex < 0) {
+ // This would only occur if the two nodes were the same (a cycle in the
+ // graph), or if one endpoint was a strict ancestor of the other. The
+ // latter shouldn't happen because we rename nodes which are both
+ // metanodes and op nodes. E.g. "A/B" becomes "A/B/(B)".
+ throw Error("No difference found between ancestor paths.");
+ }
+ }
+ var sharedAncestorNode = nodeIndex[sourcePath[sourceAncestorIndex + 1]];
+ var sourceAncestorName = sourcePath[sourceAncestorIndex];
+ var destAncestorName = destPath[destAncestorIndex];
+ // Find or create the Metaedge which should contain this BaseEdge inside
+ // the shared ancestor.
+ var metaedge = sharedAncestorNode.metagraph.edge(sourceAncestorName, destAncestorName);
+ if (!metaedge) {
+ metaedge = graph_1.createMetaedge(sourceAncestorName, destAncestorName);
+ sharedAncestorNode.metagraph
+ .setEdge(sourceAncestorName, destAncestorName, metaedge);
+ }
+ if (!sharedAncestorNode.hasNonControlEdges &&
+ !baseEdge.isControlDependency) {
+ sharedAncestorNode.hasNonControlEdges = true;
+ }
+ metaedge.addBaseEdge(baseEdge);
+ });
+ }
+ ;
+ /**
+ * Using the hierarchy template information, detect series in the provided
+ * metanode. For each detected series, create a new SeriesNode
+ * and remove series members from the metanode's metagraph and move them to
+ * the new series node's metagraph.
+ *
+ * @param metanode
+ * @param hierarchy
+ * @param seriesNames Dictionary (populated by this method) that maps each
+ * grouped node name to the name of the series node that contains it.
+ */
+ function groupSeries(metanode, hierarchy, seriesNames) {
+ var metagraph = metanode.metagraph;
+ _.each(metagraph.nodes(), function (n) {
+ var child = metagraph.node(n);
+ if (child.type === tf.graph.NodeType.META) {
+ groupSeries(child, hierarchy, seriesNames);
+ }
+ });
+ var clusters = clusterNodes(metagraph);
+ var seriesDict = detectSeries(clusters, metagraph);
+ // Add each series node to the graph and add its grouped children to its own
+ // metagraph.
+ _.each(seriesDict, function (seriesNode, seriesName) {
+ var nodeMemberNames = seriesNode.metagraph.nodes();
+ var firstMember = seriesNode.metagraph.node(nodeMemberNames[0]);
+ var seriesType = firstMember.type;
+ hierarchy.setNode(seriesName, seriesNode); // add to the index
+ metagraph.setNode(seriesName, seriesNode);
+ _.each(nodeMemberNames, function (n) {
+ var child = metagraph.node(n);
+ seriesNode.metagraph.setNode(n, child);
+ seriesNode.parentNode = child.parentNode;
+ seriesNode.cardinality++;
+ if (child.device != null) {
+ seriesNode.deviceHistogram[child.device] =
+ (seriesNode.deviceHistogram[child.device] || 0) + 1;
+ }
+ child.parentNode = seriesNode;
+ seriesNames[n] = seriesName;
+ if (child.stats) {
+ seriesNode.stats.combine(child.stats);
+ }
+ // Remove now-grouped node from its original parent's metagraph.
+ metagraph.removeNode(n);
+ });
+ });
+ }
+ ;
+ /** Cluster op-nodes that share the same op type. */
+ function clusterNodes(metagraph) {
+ var result = {};
+ return _.reduce(metagraph.nodes(), function (clusters, n) {
+ var child = metagraph.node(n);
+ if (child.type === graph_1.NodeType.META) {
+ // skip metanodes
+ return clusters;
+ }
+ var template = child.op;
+ if (template) {
+ clusters[template] = clusters[template] || [];
+ clusters[template].push(child.name);
+ }
+ return clusters;
+ }, result);
+ }
+ /**
+ * For each cluster of op-nodes (grouped by op type), try to detect series.
+ * The series name is inferred by looking for the pattern "<prefix>_<number>"
+ * at the end of the node name.
+ *
+ * @param clusters Dictionary output from clusterNodes().
+ * @param metagraph
+ * @return A dictionary from series name => seriesNode
+ */
+ function detectSeries(clusters, metagraph) {
+ var seriesDict = {};
+ _.each(clusters, function (members, clusterId) {
+ if (members.length <= 1) {
+ return;
+ } // isolated clusters can't make series
+ /** @type {Object} A dictionary mapping seriesName to seriesInfoArray,
+ * which is an array of objects with name, clusterId, prefix, suffix,
+ * and parent properties.
+ */
+ var candidatesDict = {};
+ // Group all nodes that have the same name, with the exception of a
+ // number at the end of the name after an underscore, which is allowed to
+ // vary.
+ _.each(members, function (name) {
+ var isGroup = name.charAt(name.length - 1) === "*";
+ var namepath = name.split("/");
+ var leaf = namepath[namepath.length - 1];
+ var parent = namepath.slice(0, namepath.length - 1).join("/");
+ var matches = leaf.match(/^(\D*)_(\d+)$/);
+ var prefix;
+ var id;
+ var suffix = "";
+ if (matches) {
+ prefix = matches[1]; // the front non-numeric characters
+ id = matches[2]; // the digits
+ }
+ else {
+ prefix = isGroup ? leaf.substr(0, leaf.length - 1) : leaf;
+ if (prefix.charAt(prefix.length - 1) !== "_") {
+ prefix += "_";
+ }
+ id = 0;
+ suffix = isGroup ? "*" : "";
+ }
+ var seriesName = graph_1.getSeriesNodeName(prefix, suffix, parent);
+ candidatesDict[seriesName] = candidatesDict[seriesName] || [];
+ var seriesNode = graph_1.createSeriesNode(prefix, suffix, parent, +id, name);
+ candidatesDict[seriesName].push(seriesNode);
+ });
+ // In each group of nodes, group nodes in bunches that have monotonically
+ // increasing numbers in their names. Each of these bunches is a series.
+ _.each(candidatesDict, function (seriesInfoArray, seriesName) {
+ if (seriesInfoArray.length < 2) {
+ return;
+ }
+ seriesInfoArray.sort(function (a, b) {
+ return (+a.clusterId) - (+b.clusterId);
+ });
+ // Loop through the nodes, sorted by their detected series numbers, grouping
+ // all nodes with monotonically-increasing series numbers.
+ var seriesNodes = [seriesInfoArray[0]];
+ for (var index = 1; index < seriesInfoArray.length; index++) {
+ var nextNode = seriesInfoArray[index];
+ if (nextNode.clusterId === seriesNodes[seriesNodes.length - 1].clusterId + 1) {
+ seriesNodes.push(nextNode);
+ continue;
+ }
+ addSeriesToDict(seriesNodes, seriesDict, +clusterId, metagraph);
+ seriesNodes = [nextNode];
+ }
+ addSeriesToDict(seriesNodes, seriesDict, +clusterId, metagraph);
+ });
+ });
+ return seriesDict;
+ }
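+ // Illustrative sketch only: three members "train/conv_1", "train/conv_2" and
+ // "train/conv_3" share the "conv" prefix with consecutive numbers 1..3, so
+ // they would be grouped under a single series node. The exact series name
+ // comes from getSeriesNodeName (defined in graph.ts, not shown here).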
+ /**
+ * Add a series to the provided dictionary mapping series names to series.
+ *
+ * @param seriesNodes the nodes in the series. Contains
+ * name, id, prefix, suffix and parent properties of the node.
+ * @param seriesDict the dictionary of series
+ * @param clusterId ID of the template of the nodes of the series
+ * @param metagraph
+ */
+ function addSeriesToDict(seriesNodes, seriesDict, clusterId, metagraph) {
+ if (seriesNodes.length > 1) {
+ var curSeriesName = graph_1.getSeriesNodeName(seriesNodes[0].prefix, seriesNodes[0].suffix, seriesNodes[0].parent, seriesNodes[0].clusterId, seriesNodes[seriesNodes.length - 1].clusterId);
+ var curSeriesNode = graph_1.createSeriesNode(seriesNodes[0].prefix, seriesNodes[0].suffix, seriesNodes[0].parent, clusterId, curSeriesName);
+ _.each(seriesNodes, function (node) {
+ curSeriesNode.ids.push(node.clusterId);
+ curSeriesNode.metagraph.setNode(node.name, metagraph.node(node.name));
+ });
+ seriesDict[curSeriesName] = curSeriesNode;
+ }
+ }
+ })(hierarchy = graph_1.hierarchy || (graph_1.hierarchy = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module tf.graph.hierarchy
+</script>
+<script>/// <reference path="graph.ts" />
+/// <reference path="hierarchy.ts" />
+var __extends = (this && this.__extends) || function (d, b) {
+ for (var p in b) if (b.hasOwnProperty(p)) d[p] = b[p];
+ function __() { this.constructor = d; }
+ __.prototype = b.prototype;
+ d.prototype = new __();
+};
+/**
+ * Package for the Render Hierarchy for TensorFlow graph.
+ */
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var render;
+ (function (render) {
+ /**
+ * Color parameters for node encoding.
+ * @type {Object}
+ */
+ render.MetanodeColors = {
+ SATURATION: 0.6,
+ LIGHTNESS: 0.85,
+ /**
+ * Neutral color to use when the node is expanded (used when coloring by
+ * compute time, memory and device).
+ */
+ EXPANDED_COLOR: "#f0f0f0",
+ /**
+ * Standard hue values for node color palette.
+ */
+ HUES: [220, 100, 180, 40, 20, 340, 260, 300, 140, 60],
+ STRUCTURE_PALETTE: function (id, lightened) {
+ // The code below is a flexible way to computationally create a set
+ // of colors that go well together.
+ var hues = render.MetanodeColors.HUES;
+ var n = hues.length;
+ var hue = hues[id % n];
+ var m = Math.sin(hue * Math.PI / 360);
+ var sat = lightened ? 30 : 90 - 60 * m;
+ var light = lightened ? 95 : 80;
+ return d3.hsl(hue, .01 * sat, .01 * light).toString();
+ },
+ DEVICE_PALETTE: function (index) {
+ return render.MetanodeColors.STRUCTURE_PALETTE(index);
+ },
+ UNKNOWN: "#eee",
+ GRADIENT_OUTLINE: "#888"
+ };
+ /**
+ * Stores the rendering information, such as x and y coordinates,
+ * for each node in the graph.
+ */
+ var RenderGraphInformation = (function () {
+ function RenderGraphInformation(hierarchy, params) {
+ this.hierarchy = hierarchy;
+ this.index = {};
+ this.deviceColorMap = d3.scale.ordinal()
+ .domain(hierarchy.devices)
+ .range(_.map(d3.range(hierarchy.devices.length), render.MetanodeColors.DEVICE_PALETTE));
+ var topLevelGraph = hierarchy.root.metagraph;
+ // Find the maximum and minimum memory usage.
+ var memoryExtent = d3.extent(topLevelGraph.nodes(), function (nodeName, index) {
+ var node = topLevelGraph.node(nodeName);
+ // Some ops don't have stats at all.
+ if (node.stats != null) {
+ return node.stats.totalBytes;
+ }
+ });
+ this.memoryUsageScale = d3.scale.linear()
+ .domain(memoryExtent)
+ .range(params.minMaxColors);
+ // Find also the minimum and maximum compute time.
+ var computeTimeExtent = d3.extent(topLevelGraph.nodes(), function (nodeName, index) {
+ var node = topLevelGraph.node(nodeName);
+ // Some ops don't have stats at all.
+ if (node.stats != null) {
+ return node.stats.totalMicros;
+ }
+ });
+ this.computeTimeScale = d3.scale.linear()
+ .domain(computeTimeExtent)
+ .range(params.minMaxColors);
+ // Maps node name to whether the rendering hierarchy was already constructed.
+ this.hasSubhierarchy = {};
+ this.params = params;
+ this.root = new RenderGroupNodeInformation(hierarchy.root);
+ this.index[hierarchy.root.name] = this.root;
+ this.buildSubhierarchy(hierarchy.root.name);
+ this.root.expanded = true;
+ }
+ RenderGraphInformation.prototype.getRenderNodeByName = function (nodeName) {
+ return this.index[nodeName];
+ };
+ /**
+ * Return the nearest ancestor node, including itself, that is visible
+ * in the visualization. This method is used so that we can select
+ * (highlight) a node that isn't drawn yet, by selecting (highlighting)
+ * its nearest ancestor that has been drawn.
+ */
+ RenderGraphInformation.prototype.getNearestVisibleAncestor = function (name) {
+ var path = graph_1.getHierarchicalPath(name);
+ for (var i = 0; i < path.length; i++) {
+ var nodeName = path[i];
+ // Op nodes have expanded set to false by default.
+ if (!this.getRenderNodeByName(nodeName).expanded) {
+ return nodeName;
+ }
+ }
+ // Fallthrough. If everything was expanded, return the name itself.
+ return name;
+ };
+ // TODO(jimbo): Delete this and any code it touches (all deprecated).
+ RenderGraphInformation.prototype.setDepth = function (depth) {
+ setGroupNodeDepth(this.root, +depth);
+ };
+ RenderGraphInformation.prototype.buildSubhierarchy = function (nodeName) {
+ var _this = this;
+ // Terminate if the rendering hierarchy was already constructed
+ // for this node.
+ if (nodeName in this.hasSubhierarchy) {
+ return;
+ }
+ var renderNodeInfo = this.index[nodeName];
+ // If it is not a meta node or a series node, don't do anything.
+ if (renderNodeInfo.node.type !== graph_1.NodeType.META &&
+ renderNodeInfo.node.type !== graph_1.NodeType.SERIES) {
+ return;
+ }
+ // At this point we know the rendering information is about a group node.
+ var renderGroupNodeInfo = renderNodeInfo;
+ var metagraph = renderGroupNodeInfo.node.metagraph;
+ var coreGraph = renderGroupNodeInfo.coreGraph;
+ // Create render nodes to represent each child from the metagraph. Although
+ // these will initially be added to the coreGraph, they may later be
+ // extracted. Also, due to extraction, the coreGraph may contain disjoint
+ // groups between which there is no visible path (other than annotations).
+ _.each(metagraph.nodes(), function (childName) {
+ var childNode = metagraph.node(childName);
+ var childRenderInfo = childNode.isGroupNode ?
+ new RenderGroupNodeInformation(childNode) :
+ new RenderNodeInformation(childNode);
+ _this.index[childName] = childRenderInfo;
+ coreGraph.setNode(childName, childRenderInfo);
+ if (childRenderInfo.node.stats != null) {
+ childRenderInfo.memoryColor =
+ _this.memoryUsageScale(childRenderInfo.node.stats.totalBytes);
+ childRenderInfo.computeTimeColor =
+ _this.computeTimeScale(childRenderInfo.node.stats.totalMicros);
+ }
+ if (!childNode.isGroupNode) {
+ _.each(childNode.inEmbeddings, function (embedding) {
+ var renderMetaedgeInfo = new RenderMetaedgeInformation(null);
+ addInAnnotation(childRenderInfo, embedding, null, renderMetaedgeInfo, AnnotationType.CONSTANT, _this.params);
+ _this.index[embedding.name] = new RenderNodeInformation(embedding);
+ });
+ _.each(childNode.outEmbeddings, function (embedding) {
+ var renderMetaedgeInfo = new RenderMetaedgeInformation(null);
+ addOutAnnotation(childRenderInfo, embedding, null, renderMetaedgeInfo, AnnotationType.SUMMARY, _this.params);
+ _this.index[embedding.name] = new RenderNodeInformation(embedding);
+ });
+ var device = childRenderInfo.node.device;
+ if (device != null) {
+ childRenderInfo.deviceColors = [{
+ color: _this.deviceColorMap(device),
+ proportion: 1.0
+ }];
+ }
+ }
+ else {
+ // Make a list of tuples (device, proportion), where proportion
+ // is the fraction of op nodes that have that device.
+ var pairs = _.pairs(childNode.deviceHistogram);
+ if (pairs.length > 0) {
+ // Compute the total number of op nodes that have a device assigned.
+ var numDevices = _.sum(pairs, _.last);
+ childRenderInfo.deviceColors = _.map(pairs, function (pair) {
+ return {
+ color: _this.deviceColorMap(pair[0]),
+ // Normalize to a fraction of that total.
+ proportion: pair[1] / numDevices
+ };
+ });
+ }
+ }
+ });
+ // Add render metaedge info for edges in the metagraph.
+ _.each(metagraph.edges(), function (edgeObj) {
+ var metaedge = metagraph.edge(edgeObj);
+ var renderMetaedgeInfo = new RenderMetaedgeInformation(metaedge);
+ coreGraph.setEdge(edgeObj.v, edgeObj.w, renderMetaedgeInfo);
+ });
+ if (this.params.enableExtraction &&
+ renderGroupNodeInfo.node.type === graph_1.NodeType.META) {
+ extractHighDegrees(renderGroupNodeInfo, this.params);
+ }
+ // Record that we constructed the rendering hierarchy for this node, so we
+ // don't construct it another time.
+ this.hasSubhierarchy[nodeName] = true;
+ // Look up the parent node's render information and short circuit if none.
+ var parentNode = renderGroupNodeInfo.node.parentNode;
+ if (!parentNode) {
+ return;
+ }
+ var parentNodeInfo = this.index[parentNode.name];
+ // Utility function for computing the name of a bridge node.
+ var getBridgeNodeName = function (inbound) {
+ var rest = [];
+ for (var _i = 1; _i < arguments.length; _i++) {
+ rest[_i - 1] = arguments[_i];
+ }
+ return rest.concat([inbound ? "IN" : "OUT"]).join("~~");
+ };
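+ // Illustrative only: bridge node names are built by joining the parts with
+ // "~~" and appending the direction, e.g.
+ //
+ //   getBridgeNodeName(true, "Z", "A");  // "Z~~A~~IN"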
+ // Build out the bridgegraph.
+ var bridgegraph = this.hierarchy.getBridgegraph(nodeName);
+ // Look for popular nodes so we can make annotations instead of paths.
+ var otherCounts = {
+ // Counts of edges coming INTO other nodes by name (outgoing from self).
+ in: {},
+ // Counts of edges going OUT from other nodes by name (coming into self).
+ out: {},
+ // Counts of all control edges involving other nodes by name.
+ control: {},
+ };
+ _.each(bridgegraph.edges(), function (e) {
+ // An edge is inbound if its destination node is in the metagraph.
+ var inbound = !!metagraph.node(e.w);
+ var otherName = inbound ? e.v : e.w;
+ var metaedge = bridgegraph.edge(e);
+ if (!metaedge.numRegularEdges) {
+ otherCounts.control[otherName] =
+ (otherCounts.control[otherName] || 0) + 1;
+ }
+ else if (inbound) {
+ otherCounts.out[otherName] = (otherCounts.out[otherName] || 0) + 1;
+ }
+ else {
+ otherCounts.in[otherName] = (otherCounts.in[otherName] || 0) + 1;
+ }
+ });
+ // Add annotations and edges for bridgegraph relationships.
+ var hierarchyNodeMap = this.hierarchy.getNodeMap();
+ _.each(bridgegraph.edges(), function (bridgeEdgeObj) {
+ var bridgeMetaedge = bridgegraph.edge(bridgeEdgeObj);
+ // Determine whether this bridge edge is incoming by checking the
+ // metagraph for a node that matches the destination end.
+ var inbound = !!metagraph.node(bridgeEdgeObj.w);
+ // Based on the direction of the edge, one endpoint will be an immediate
+ // child of this renderNodeInfo, and the other endpoint will be a sibling
+ // of the parent (or an ancestor further up).
+ var _a = inbound ?
+ [bridgeEdgeObj.w, bridgeEdgeObj.v] :
+ [bridgeEdgeObj.v, bridgeEdgeObj.w], childName = _a[0], otherName = _a[1];
+ var childRenderInfo = _this.index[childName];
+ var otherRenderInfo = _this.index[otherName];
+ var otherNode = otherRenderInfo ?
+ otherRenderInfo.node :
+ hierarchyNodeMap[otherName];
+ // Determine whether this edge is a control edge between nodes where
+ // either node is high-degree with respect to control edges. This will
+ // be a signal to show it as an annotation instead of a bridge edge.
+ var isHighDegreeControlEdge = !bridgeMetaedge.numRegularEdges &&
+ otherCounts.control[otherName] > _this.params.maxControlDegree;
+ var _b = inbound ?
+ [renderNodeInfo.inAnnotations, childRenderInfo.inAnnotations] :
+ [renderNodeInfo.outAnnotations, childRenderInfo.outAnnotations], annotations = _b[0], childAnnotations = _b[1];
+ var isOtherHighDegree = inbound ?
+ otherCounts.out[otherName] > _this.params.maxOutDegree :
+ otherCounts.in[otherName] > _this.params.maxInDegree;
+ // The adjoining render metaedge info from the parent's coreGraph, if any.
+ // It will either be a Metaedge involving this node directly, if it
+ // previously came from a metagraph, or it'll be a Metaedge involving
+ // a previously created bridge node standing in for the other node.
+ var adjoiningMetaedge = null;
+ // We can only hope to render a bridge path if:
+ // - bridgegraph paths are enabled,
+ // - the other node is not too high-degree,
+ // - the child is in the core (not extracted for being high-degree), and
+ // - there's a path (in the traversal sense) between child and other.
+ var canDrawBridgePath = false;
+ if (_this.params.enableBridgegraph &&
+ !isOtherHighDegree &&
+ !isHighDegreeControlEdge &&
+ childRenderInfo.isInCore()) {
+ // Utility function for finding an adjoining metaedge.
+ var findAdjoiningMetaedge = function (targetName) {
+ var adjoiningEdgeObj = inbound ?
+ { v: targetName, w: nodeName } :
+ { v: nodeName, w: targetName };
+ return parentNodeInfo.coreGraph.edge(adjoiningEdgeObj);
+ };
+ adjoiningMetaedge = findAdjoiningMetaedge(otherName);
+ if (!adjoiningMetaedge) {
+ adjoiningMetaedge = findAdjoiningMetaedge(getBridgeNodeName(inbound, otherName, parentNode.name));
+ }
+ canDrawBridgePath = !!adjoiningMetaedge;
+ }
+ // Although dataflow edges are acyclic, control dependency edges may
+ // actually point "backwards" in the graph. If this bridgeMetaedge is
+ // a control dependency, we need to determine whether it's backwards
+ // pointing so that we render it appropriately.
+ //
+ // For instance, say we're rendering a graph with nodes named A/B and Z/Y,
+ // and we're currently rendering the bridgegraph for A. Further, let's say
+ // that there was an original BaseEdge from A/B->Z/Y and a CONTROL EDGE
+ // from Z/Y=>A/B.
+ //
+ // +----------------+
+ // | A |
+ // | +-----+ | +------+
+ // | | B |>----->|>------->| Z |
+ // | | | | | |
+ // | | | * | | |
+ // | | |<=====<|<=======<| |
+ // | +-----+ | +------+
+ // +----------------+
+ //
+ // When we render the subhierarchy for Metanode A, we'll come across a
+ // control-only Metaedge in the bridgegraph from Z=>A/B (*). The question
+ // is whether this edge is backwards.
+ //
+ // To answer that question, we follow the chain of adjoining metaedges
+ // until we reach the topmost one. In this case, that's the control-only
+ // Metaedge Z=>A in the ROOT's metagraph. We determine that this edge
+ // is backwards by looking at the topological ordering of ROOT's metagraph
+ // (which ignores control edges) and seeing that Z comes AFTER A.
+ //
+ // The property of being backwards is independent of whether the edge
+ // is inbound or outbound. In the preceding example, if we were building
+ // the subhierarchy for Z, we'd find bridge edge Z/Y=>A, walk to its
+ // topmost adjoining metaedge Z=>A and discover that it's backwards.
+ var backwards = false;
+ if (adjoiningMetaedge && !bridgeMetaedge.numRegularEdges) {
+ // Find the top-most adjoining render metaedge information, and the
+ // GroupNode whose metagraph must contain the associated metaedge.
+ var topAdjoiningMetaedge = adjoiningMetaedge;
+ var topGroupNode = parentNodeInfo.node;
+ while (topAdjoiningMetaedge.adjoiningMetaedge) {
+ topAdjoiningMetaedge = topAdjoiningMetaedge.adjoiningMetaedge;
+ topGroupNode = topGroupNode.parentNode;
+ }
+ // Check against the topological ordering for the top node. The current
+ // bridge metaedge we're evaluating is backwards if its source comes
+ // after its destination.
+ var ordering = _this.hierarchy.getTopologicalOrdering(topGroupNode.name);
+ var e = topAdjoiningMetaedge.metaedge;
+ backwards = ordering[e.v] > ordering[e.w];
+ }
+ // Render backwards control edges as annotations.
+ canDrawBridgePath = canDrawBridgePath && !backwards;
+ // If we can't make a bridge path for any reason, then we add an
+ // annotation instead.
+ if (!canDrawBridgePath) {
+ childAnnotations.push(new Annotation(otherNode, otherRenderInfo, new RenderMetaedgeInformation(bridgeMetaedge), AnnotationType.SHORTCUT, inbound), _this.params);
+ return;
+ }
+ // At this point, all conditions have been met for drawing a bridge path.
+ // Find or create the IN/OUT node representing otherNode.
+ var bridgeContainerName = getBridgeNodeName(inbound, nodeName);
+ var bridgeNodeName = getBridgeNodeName(inbound, otherName, nodeName);
+ var bridgeNodeRenderInfo = coreGraph.node(bridgeNodeName);
+ if (!bridgeNodeRenderInfo) {
+ // Find or create the directional container for the bridge node.
+ var bridgeContainerInfo = coreGraph.node(bridgeContainerName);
+ if (!bridgeContainerInfo) {
+ var bridgeContainerNode = {
+ // Important node properties.
+ name: bridgeContainerName,
+ type: graph_1.NodeType.BRIDGE,
+ // Unused node properties.
+ isGroupNode: false,
+ cardinality: 0,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ bridgeContainerInfo =
+ new RenderNodeInformation(bridgeContainerNode);
+ _this.index[bridgeContainerName] = bridgeContainerInfo;
+ coreGraph.setNode(bridgeContainerName, bridgeContainerInfo);
+ }
+ var bridgeNode = {
+ // Important node properties.
+ name: bridgeNodeName,
+ type: graph_1.NodeType.BRIDGE,
+ // Unimportant node properties.
+ isGroupNode: false,
+ cardinality: 1,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ bridgeNodeRenderInfo = new RenderNodeInformation(bridgeNode);
+ _this.index[bridgeNodeName] = bridgeNodeRenderInfo;
+ coreGraph.setNode(bridgeNodeName, bridgeNodeRenderInfo);
+ // Set bridgeNode to be a graphlib child of the container node.
+ coreGraph.setParent(bridgeNodeName, bridgeContainerName);
+ bridgeContainerInfo.node.cardinality++;
+ }
+ // Create and add a bridge render metaedge.
+ var bridgeRenderMetaedge = new RenderMetaedgeInformation(bridgeMetaedge);
+ bridgeRenderMetaedge.adjoiningMetaedge = adjoiningMetaedge;
+ inbound ?
+ coreGraph.setEdge(bridgeNodeName, childName, bridgeRenderMetaedge) :
+ coreGraph.setEdge(childName, bridgeNodeName, bridgeRenderMetaedge);
+ }); // End _.each(bridgegraph.edges).
+ // For each bridge container (IN and/or OUT), add structural edges between
+ // terminal nodes and that container. A terminal node is one which has no
+ // non-bridge edges in the direction of the container.
+ //
+ // For example, consider a Metanode A which contains two child nodes A/B
+ // and A/C. Let's say it has one edge in the metagraph from A/B->A/C, and
+ // one edge in the bridgegraph from Z->A/C.
+ //
+ // At this point, we've added a container bridge node IN to house all
+ // incoming bridge nodes. We've also added a bridge node Z' (with parent IN)
+ // to A, and a bridge edge from Z'->C.
+ //
+ // +----------------------+
+ // | A +---+ |
+ // | +------>| C | |
+ // | | +---+ |
+ // | | ^ |
+ // | | | |
+ // | | +----|----+ |
+ // | | | IN | | |
+ // | +---+ | +---+ | |
+ // | | B | | | Z'| | |
+ // | +---+ | +---+ | |
+ // | +---------+ |
+ // +----------------------+
+ //
+ // With no other help, dagre would lay out B and Z' on the same level,
+ // because both of them have no incoming edges. In other words, B is a
+ // terminal node in the INCOMING direction.
+ //
+ // But we want to force dagre to lay out Z' (and everything in IN) lower
+ // than all non-bridge nodes, so that there's enough room for the bridge
+ // edges after they've been adjusted to meet up with paths coming in from
+ // outside.
+ //
+ // To force Z' (and all other bridge nodes) to be lowest in the graph, we
+ // identify terminal nodes like B and give them structural edges to
+ // a new structural bridge node S which we add to IN.
+ //
+ // +----------------------+
+ // | A +---+ |
+ // | +--->| C | |
+ // | | +---+ |
+ // | +---+ ^ |
+ // | | B | | |
+ // | +---+ | |
+ // | ^ | |
+ // | | | |
+ // | +----|------|----+ |
+ // | |IN | | | |
+ // | | +---+ +---+ | |
+ // | | | S | | Z'| | |
+ // | | +---+ +---+ | |
+ // | +----------------+ |
+ // +----------------------+
+ //
+ // This ensures that dagre will lay out the bridge containers strictly at
+ // the ends of the graph. The structural edges will never be seen in the
+ // visualization except as a debugging aid.
+ _.each([true, false], function (inbound) {
+ var bridgeContainerName = getBridgeNodeName(inbound, nodeName);
+ var bridgeContainerInfo = coreGraph.node(bridgeContainerName);
+ if (!bridgeContainerInfo) {
+ return;
+ }
+ _.each(coreGraph.nodes(), function (childName) {
+ // Short-circuit if this child is a bridge node or it's not a terminal
+ // node in the direction we're interested in.
+ var childNodeInfo = coreGraph.node(childName);
+ if (childNodeInfo.node.type === graph_1.NodeType.BRIDGE) {
+ return;
+ }
+ var isTerminal = inbound ?
+ !coreGraph.predecessors(childName).length :
+ !coreGraph.successors(childName).length;
+ if (!isTerminal) {
+ return;
+ }
+ // Find or create a bridge node in the container for all structural
+ // metaedges. It would have been nice to skip this step and simply
+ // set a metaedge between the terminal node and the container node, but
+ // in that case, something about the graph upsets dagre.layout()'s
+ // longestPath algorithm (we were getting errors due to an undefined value).
+ var structuralNodeName = getBridgeNodeName(inbound, nodeName, "STRUCTURAL_TARGET");
+ var structuralRenderInfo = coreGraph.node(structuralNodeName);
+ if (!structuralRenderInfo) {
+ var bridgeNode = {
+ // Important Node properties.
+ name: structuralNodeName,
+ type: graph_1.NodeType.BRIDGE,
+ // Unimportant Node properties.
+ isGroupNode: false,
+ cardinality: 1,
+ parentNode: null,
+ stats: null,
+ // BridgeNode properties.
+ inbound: inbound,
+ };
+ structuralRenderInfo = new RenderNodeInformation(bridgeNode);
+ structuralRenderInfo.structural = true;
+ _this.index[structuralNodeName] = structuralRenderInfo;
+ coreGraph.setNode(structuralNodeName, structuralRenderInfo);
+ bridgeContainerInfo.node.cardinality++;
+ coreGraph.setParent(structuralNodeName, bridgeContainerName);
+ }
+ // Create the structural Metaedge and insert it.
+ var structuralMetaedgeInfo = new RenderMetaedgeInformation(null);
+ structuralMetaedgeInfo.structural = true;
+ structuralMetaedgeInfo.weight--; // Reduce weight for dagre layout.
+ inbound ?
+ coreGraph.setEdge(structuralNodeName, childName, structuralMetaedgeInfo) :
+ coreGraph.setEdge(childName, structuralNodeName, structuralMetaedgeInfo);
+ });
+ });
+ };
+ return RenderGraphInformation;
+ })();
+ render.RenderGraphInformation = RenderGraphInformation;
+ /**
+ * A class that holds rendering information for an annotation: the annotated
+ * node, the type of annotation, and the placement of the annotation's node
+ * and edge.
+ *
+ * Annotation objects include embedded constants, embedded summaries, and
+ * edge shortcuts.
+ */
+ var Annotation = (function () {
+ /**
+ * Creates a new Annotation.
+ *
+ * @param node The underlying node this annotation points to.
+ * @param renderNodeInfo The render information for the underlying node
+ * this annotation points to. This can be null if the annotation
+ * denotes an embedding (constant, summary), in which case we
+ * use the node property.
+ * @param renderMetaedgeInfo The render information for the edge associated
+ * with the annotation.
+ * @param type The type of the annotation.
+ * @param isIn True if it is an in-annotation. False if it is an
+ * out-annotation.
+ */
+ function Annotation(node, renderNodeInfo, renderMetaedgeInfo, type, isIn) {
+ this.node = node;
+ this.renderNodeInfo = renderNodeInfo;
+ this.renderMetaedgeInfo = renderMetaedgeInfo;
+ this.annotationType = type;
+ // Properties specified by layout
+ this.dx = 0;
+ this.dy = 0;
+ this.width = 0;
+ this.height = 0;
+ this.isIn = isIn;
+ this.points = [];
+ }
+ return Annotation;
+ })();
+ render.Annotation = Annotation;
+ ;
+ (function (AnnotationType) {
+ AnnotationType[AnnotationType["SHORTCUT"] = 0] = "SHORTCUT";
+ AnnotationType[AnnotationType["CONSTANT"] = 1] = "CONSTANT";
+ AnnotationType[AnnotationType["SUMMARY"] = 2] = "SUMMARY";
+ AnnotationType[AnnotationType["ELLIPSIS"] = 3] = "ELLIPSIS";
+ })(render.AnnotationType || (render.AnnotationType = {}));
+ var AnnotationType = render.AnnotationType;
+ ;
+ /**
+ * Manages a list of annotations. Two will be used for each
+ * RenderNodeInformation, one for in annotations and one for out annotations.
+ */
+ var AnnotationList = (function () {
+ function AnnotationList() {
+ this.list = [];
+ this.nodeNames = {};
+ }
+ /**
+ * Append an annotation to the list, or a stand-in ellipsis annotation
+ * instead if adding it would exceed the maximum number of annotations.
+ */
+ AnnotationList.prototype.push = function (annotation, params) {
+ if (annotation.node.name in this.nodeNames) {
+ return; // Skip duplicate annotation.
+ }
+ this.nodeNames[annotation.node.name] = true;
+ if (this.list.length < params.maxAnnotations) {
+ this.list.push(annotation);
+ return;
+ }
+ var lastAnnotation = this.list[this.list.length - 1];
+ if (lastAnnotation.annotationType === AnnotationType.ELLIPSIS) {
+ var ellipsisNode_1 = lastAnnotation.node;
+ ellipsisNode_1.setNumMoreNodes(++ellipsisNode_1.numMoreNodes);
+ return;
+ }
+ var ellipsisNode = new tf.graph.EllipsisNodeImpl(1);
+ this.list.push(new Annotation(ellipsisNode, new RenderNodeInformation(ellipsisNode), null, AnnotationType.ELLIPSIS, annotation.isIn));
+ };
+ return AnnotationList;
+ })();
+ render.AnnotationList = AnnotationList;
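+ // Illustrative note (not part of the original source): with
+ // params.maxAnnotations === 3, the first three distinct nodes pushed are
+ // stored as regular annotations; the fourth push appends a single ELLIPSIS
+ // annotation ("... and 1 more"), and every later push only increments that
+ // ellipsis node's numMoreNodes count instead of growing the list.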
+ /**
+ * Contains rendering information about a node in the hierarchical graph.
+ */
+ var RenderNodeInformation = (function () {
+ function RenderNodeInformation(node) {
+ this.node = node;
+ this.expanded = false;
+ this.inAnnotations = new AnnotationList();
+ this.outAnnotations = new AnnotationList();
+ // Params specified by layout
+ this.x = 0;
+ this.y = 0;
+ this.width = 0;
+ this.height = 0;
+ this.inboxWidth = 0;
+ this.outboxWidth = 0;
+ this.excluded = false;
+ // Params for bridge paths.
+ this.structural = false;
+ // Params for node box.
+ this.labelOffset = 0;
+ this.extractXOffset = 0;
+ this.radius = 0;
+ // Params for expanded node
+ this.labelHeight = 0;
+ this.paddingTop = 0;
+ this.paddingLeft = 0;
+ this.paddingRight = 0;
+ this.paddingBottom = 0;
+ this.outerWidth = 0;
+ this.outerHeight = 0;
+ this.isInExtract = false;
+ this.isOutExtract = false;
+ }
+ RenderNodeInformation.prototype.isInCore = function () {
+ return !this.isInExtract && !this.isOutExtract;
+ };
+ return RenderNodeInformation;
+ })();
+ render.RenderNodeInformation = RenderNodeInformation;
+ /**
+ * Contains rendering information about a Metaedge from the underlying
+ * hierarchical graph. It may be from either a metagraph or a bridgegraph.
+ */
+ var RenderMetaedgeInformation = (function () {
+ function RenderMetaedgeInformation(metaedge) {
+ this.metaedge = metaedge;
+ this.adjoiningMetaedge = null;
+ this.structural = false;
+ this.weight = 1;
+ }
+ return RenderMetaedgeInformation;
+ })();
+ render.RenderMetaedgeInformation = RenderMetaedgeInformation;
+ function addInAnnotation(node, predecessor, predecessorRenderInfo, edge, type, params) {
+ var annotation = new Annotation(predecessor, predecessorRenderInfo, edge, type, true);
+ node.inAnnotations.push(annotation, params);
+ }
+ function addOutAnnotation(node, successor, successorRenderInfo, edge, type, params) {
+ var annotation = new Annotation(successor, successorRenderInfo, edge, type, false);
+ node.outAnnotations.push(annotation, params);
+ }
+ function setGraphDepth(graph, depth) {
+ _.each(graph.nodes(), function (nodeName) {
+ var child = graph.node(nodeName);
+ child.expanded = depth > 1; // set all children of depth 1 to collapsed
+ if (depth > 0) {
+ switch (child.node.type) {
+ case graph_1.NodeType.META:
+ case graph_1.NodeType.SERIES:
+ setGroupNodeDepth(child, depth - 1);
+ break;
+ }
+ }
+ });
+ }
+ ;
+ var RenderGroupNodeInformation = (function (_super) {
+ __extends(RenderGroupNodeInformation, _super);
+ function RenderGroupNodeInformation(groupNode) {
+ _super.call(this, groupNode);
+ var metagraph = groupNode.metagraph;
+ var gl = metagraph.graph();
+ this.coreGraph =
+ graph_1.createGraph(gl.name, graph_1.GraphType.CORE, { compound: true });
+ this.coreBox = { width: 0, height: 0 };
+ this.inExtractBox = { width: 0, height: 0 };
+ this.outExtractBox = { width: 0, height: 0 };
+ this.isolatedInExtract = [];
+ this.isolatedOutExtract = [];
+ }
+ return RenderGroupNodeInformation;
+ })(RenderNodeInformation);
+ render.RenderGroupNodeInformation = RenderGroupNodeInformation;
+ function setGroupNodeDepth(renderInfo, depth) {
+ if (renderInfo.coreGraph) {
+ setGraphDepth(renderInfo.coreGraph, depth);
+ }
+ }
+ /**
+ * Remove an edge from the graph and add annotations to both ends of the edge.
+ *
+ * @param graph The core graph.
+ * @param v Source name.
+ * @param w Sink name.
+ */
+ function createShortcut(graph, v, w, params) {
+ var src = graph.node(v);
+ var sink = graph.node(w);
+ var edge = graph.edge(v, w);
+ // Add each annotation.
+ addOutAnnotation(src, sink.node, sink, edge, AnnotationType.SHORTCUT, params);
+ addInAnnotation(sink, src.node, src, edge, AnnotationType.SHORTCUT, params);
+ // Remove the edge from the core graph.
+ graph.removeEdge(v, w);
+ }
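+ // Illustrative effect (not part of the original source): after
+ // createShortcut(graph, "A", "B", params), the metaedge A->B is removed from
+ // the core graph, B appears as an out-annotation (SHORTCUT) of A, and A
+ // appears as an in-annotation (SHORTCUT) of B.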
+ /**
+ * Remove edges from a node, set its isOutExtract property to true,
+ * and move the node to isolatedOutExtract.
+ *
+ * If detachAllEdgesForHighDegree is true, extract all of its edges.
+ * Otherwise, only extract its in-edges.
+ */
+ function makeOutExtract(renderNode, n, params) {
+ var graph = renderNode.coreGraph;
+ graph.node(n).isOutExtract = true;
+ _.each(graph.predecessors(n), function (p, index) {
+ createShortcut(graph, p, n, params);
+ });
+ if (params.detachAllEdgesForHighDegree) {
+ _.each(graph.successors(n), function (s, index) {
+ createShortcut(graph, n, s, params);
+ });
+ }
+ if (params.detachAllEdgesForHighDegree || graph.neighbors(n).length === 0) {
+ renderNode.isolatedOutExtract.push(graph.node(n));
+ graph.removeNode(n);
+ }
+ }
+ /**
+ * Remove edges from a node, set its isInExtract property to true,
+ * and move the node to isolatedInExtract.
+ * If detachAllEdgesForHighDegree is true, extract all of its edges.
+ * Otherwise, only remove its out-edges.
+ */
+ function makeInExtract(renderNode, n, params) {
+ var graph = renderNode.coreGraph;
+ graph.node(n).isInExtract = true;
+ _.each(graph.successors(n), function (s, index) {
+ createShortcut(graph, n, s, params);
+ });
+ if (params.detachAllEdgesForHighDegree) {
+ _.each(graph.predecessors(n), function (p, index) {
+ createShortcut(graph, p, n, params);
+ });
+ }
+ // Remove the node from the core graph if conditions are met.
+ if (params.detachAllEdgesForHighDegree || graph.neighbors(n).length === 0) {
+ renderNode.isolatedInExtract.push(graph.node(n));
+ graph.removeNode(n);
+ }
+ }
+ /**
+ * Check whether the node's type is a member of the given list of types.
+ *
+ * @param node Node.
+ * @param types List of types to match.
+ */
+ function hasTypeIn(node, types) {
+ if (node.type === graph_1.NodeType.OP) {
+ for (var i = 0; i < types.length; i++) {
+ if (node.op === types[i]) {
+ return true;
+ }
+ }
+ }
+ else if (node.type === graph_1.NodeType.META) {
+ var rootOpNode = node.getRootOp();
+ if (rootOpNode) {
+ for (var i = 0; i < types.length; i++) {
+ if (rootOpNode.op === types[i]) {
+ return true;
+ }
+ }
+ }
+ }
+ return false;
+ }
+ /** Remove edges from pre-defined out-extract patterns */
+ function extractPredefinedSink(renderNode, params) {
+ var graph = renderNode.coreGraph;
+ _.each(graph.nodes(), function (n) {
+ var renderInfo = graph.node(n);
+ if (hasTypeIn(renderInfo.node, params.outExtractTypes)) {
+ makeOutExtract(renderNode, n, params);
+ }
+ });
+ }
+ /** Remove edges from pre-defined in-extract patterns */
+ function extractPredefinedSource(renderNode, params) {
+ var graph = renderNode.coreGraph;
+ _.each(graph.nodes(), function (n) {
+ var renderInfo = graph.node(n);
+ if (hasTypeIn(renderInfo.node, params.inExtractTypes)) {
+ makeInExtract(renderNode, n, params);
+ }
+ });
+ }
+ /** Extract from nodes with in-degree > maxInDegree */
+ function extractHighInDegree(renderNode, params) {
+ var graph = renderNode.coreGraph;
+ var maxInDegree = params.maxInDegree;
+ // Detect first so that degrees aren't affected by other removals.
+ var highInDegreeNames = _.filter(graph.nodes(), function (n) {
+ // Count the in-degree based on only regular edges, unless there are
+ // no regular edges, in which case use the number of control edges.
+ // This is done so that control edges don't affect whether nodes are extracted
+ // from the core graph, unless the node is only used for control.
+ var numEdgesToCount = _.reduce(graph.predecessors(n), function (numEdgesToCount, pred) {
+ var metaedge = graph.edge(pred, n).metaedge;
+ return numEdgesToCount + (metaedge.numRegularEdges ? 1 : 0);
+ }, 0);
+ if (numEdgesToCount === 0 && graph.predecessors(n).length > 0) {
+ numEdgesToCount = graph.predecessors(n).length;
+ }
+ return numEdgesToCount > maxInDegree;
+ });
+ _.each(highInDegreeNames, function (n) {
+ makeOutExtract(renderNode, n, params);
+ });
+ }
+ /** Extract nodes with out-degree > maxOutDegree */
+ function extractHighOutDegree(renderNode, params) {
+ var graph = renderNode.coreGraph;
+ var maxOutDegree = params.maxOutDegree;
+ // Detect first so that degrees aren't affected by other removals.
+ var highOutDegreeNames = _.filter(graph.nodes(), function (n) {
+ // Count the out-degree based on only regular edges, unless there are
+ // no regular edges, in which case use the number of control edges.
+ // This is done so that control edges don't affect whether nodes are extracted
+ // from the core graph, unless the node is only used for control.
+ var numEdgesToCount = _.reduce(graph.successors(n), function (numEdgesToCount, succ) {
+ var metaedge = graph.edge(n, succ).metaedge;
+ return numEdgesToCount + (metaedge.numRegularEdges ? 1 : 0);
+ }, 0);
+ if (numEdgesToCount === 0 && graph.successors(n).length > 0) {
+ numEdgesToCount = graph.successors(n).length;
+ }
+ return numEdgesToCount > maxOutDegree;
+ });
+ _.each(highOutDegreeNames, function (n) {
+ makeInExtract(renderNode, n, params);
+ });
+ }
+ /** Remove control edges from nodes that have too many control edges */
+ function removeControlEdges(renderNode, params) {
+ var graph = renderNode.coreGraph;
+ // Collect control edges into a map by node name.
+ var map = {};
+ _.each(graph.edges(), function (e) {
+ if (!graph.edge(e).metaedge.numRegularEdges) {
+ (map[e.v] = map[e.v] || []).push(e);
+ (map[e.w] = map[e.w] || []).push(e);
+ }
+ });
+ // For each node with too many control edges, turn them into annotations.
+ _.each(map, function (edges, nodeName) {
+ if (edges.length > params.maxControlDegree) {
+ _.each(edges, function (e) { return createShortcut(graph, e.v, e.w, params); });
+ }
+ });
+ }
+ /**
+ * Given an integer, picks a hue that is far apart from other colors.
+ * The formula for picking a color that avoids collisions is:
+ * hue = (color range * golden ratio * index) % color range
+ */
+ function mapIndexToHue(id) {
+ var GOLDEN_RATIO = 1.61803398875;
+ // Hue of 0 is reserved for the gray nodes.
+ var MIN_HUE = 1;
+ var MAX_HUE = 359;
+ var COLOR_RANGE = MAX_HUE - MIN_HUE;
+ return MIN_HUE + ((COLOR_RANGE * GOLDEN_RATIO * id) % COLOR_RANGE);
+ }
+ render.mapIndexToHue = mapIndexToHue;
+ ;
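+ // Illustrative values (not part of the original source): with the constants
+ // above (MIN_HUE = 1, COLOR_RANGE = 358, GOLDEN_RATIO ~= 1.618), consecutive
+ // indices land on well-separated hues, approximately:
+ // mapIndexToHue(0) === 1, mapIndexToHue(1) ~= 222.3,
+ // mapIndexToHue(2) ~= 85.5, mapIndexToHue(3) ~= 306.8.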
+ /**
+ * Remove edges and add to annotation instead.
+ *
+ * For the root node, consider predefined types for source and sink.
+ * We do not extract predefined types from non-root nodes so that Variables
+ * and the sgd node (op type = "NoOp") do not get extracted from inside their
+ * own group.
+ *
+ * The order of extraction is important here as swapping the order can totally
+ * screw up the graph layout.
+ *
+ * @param {Render.Node} renderNode Node to manipulate.
+ * @param {Object} params render Graph construction parameters. See
+ * <tf-graph-params>'s output
+ */
+ function extractHighDegrees(renderNode, params) {
+ if (params.outExtractTypes) {
+ extractPredefinedSink(renderNode, params);
+ }
+ // This has to come before extracting high in-degree nodes to protect the
+ // core part that takes many variables.
+ if (params.inExtractTypes) {
+ extractPredefinedSource(renderNode, params);
+ }
+ // This has to come before extracting high out-degree nodes to protect the
+ // core part that outputs to many places, as there are more high-degree
+ // sinks than sources.
+ if (params.maxInDegree) {
+ extractHighInDegree(renderNode, params);
+ }
+ if (params.maxOutDegree) {
+ extractHighOutDegree(renderNode, params);
+ }
+ if (params.maxControlDegree) {
+ removeControlEdges(renderNode, params);
+ }
+ // Extract isolated nodes, which can be
+ // (1) source-like and sink-like nodes that are not originally isolated but
+ // become isolated after further removal.
+ // (2) isolated nodes with annotations on one-side. These might be either
+ // - nodes that originally have high out-degree but because we remove
+ // high in-degree nodes first, they no longer have high in-degree when
+ // we check. (Detecting all high-degree before removing also leads to
+ // another problem.)
+ // - nodes that do not have high degree, but their neighbors are all
+ // extracted, so it might make sense to extract them too.
+ var graph = renderNode.coreGraph;
+ _.each(graph.nodes(), function (n) {
+ var child = graph.node(n);
+ var degree = graph.neighbors(n).length;
+ if (degree === 0) {
+ var hasOutAnnotations = child.outAnnotations.list.length > 0;
+ var hasInAnnotations = child.inAnnotations.list.length > 0;
+ if (child.isInExtract) {
+ // This case only happens if detachAllEdgesForHighDegree is false.
+ // (Otherwise all source-like nodes are already isolated.)
+ renderNode.isolatedInExtract.push(child);
+ graph.removeNode(n);
+ }
+ else if (child.isOutExtract) {
+ // This case only happens if detachAllEdgesForHighDegree is false.
+ // (Otherwise all sink-like nodes are already isolated.)
+ renderNode.isolatedOutExtract.push(child);
+ graph.removeNode(n);
+ }
+ else if (params.extractIsolatedNodesWithAnnotationsOnOneSide) {
+ if (hasOutAnnotations && !hasInAnnotations) {
+ child.isInExtract = true; // for ones with high out-annotations
+ renderNode.isolatedInExtract.push(child);
+ graph.removeNode(n);
+ }
+ else if (hasInAnnotations && !hasOutAnnotations) {
+ child.isOutExtract = true; // for ones with high in-annotations
+ renderNode.isolatedOutExtract.push(child);
+ graph.removeNode(n);
+ }
+ else {
+ }
+ }
+ }
+ });
+ }
+ })(render = graph_1.render || (graph_1.render = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module tf.graph.render
+</script>
+<script>/// <reference path="graph.ts" />
+/// <reference path="hierarchy.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var template;
+ (function (template) {
+ /**
+ * Detect repeating patterns of subgraphs.
+ * Assign a templateId to each subgraph if it belongs to a template.
+ * Returns clusters of similar subgraphs.
+ *
+ * @param graph
+ * @param verifyTemplate whether to run the template verification algorithm
+ * @return a dict (template id => Array of node names)
+ */
+ function detect(h, verifyTemplate) {
+ // In any particular subgraph, there are either
+ // - leaf nodes (which do not have a subgraph), or
+ // - metanodes - some of them have only one member (singular metanode)
+ // and some have multiple members (non-singular metanode).
+ // First, generate a nearest-neighbor hash of the metanodes.
+ var nnGroups = clusterSimilarSubgraphs(h);
+ // For each metanode, compare its subgraph (starting from shallower groups)
+ // and assign template id.
+ var templates = groupTemplateAndAssignId(nnGroups, verifyTemplate);
+ // Sort the templates by minimum level in the graph at which they appear,
+ // as this leads to optimal setting of the colors of each template for
+ // maximum differentiation.
+ return _(templates).pairs()
+ .sortBy(function (pair) {
+ return pair[1].level;
+ })
+ .map(function (pair) {
+ return [pair[0], pair[1].nodes];
+ })
+ .object().value();
+ }
+ template.detect = detect;
+ ;
+ /**
+ * @return Unique string for a metanode based on depth, |V|, |E| and
+ * op type histogram.
+ */
+ function getSignature(metanode) {
+ // depth=<number> |V|=<number> |E|=<number>
+ var props = _.map({
+ "depth": metanode.depth,
+ "|V|": metanode.metagraph.nodes().length,
+ "|E|": metanode.metagraph.edges().length
+ }, function (v, k) { return k + "=" + v; }).join(" ");
+ // optype1=count1,optype2=count2
+ var ops = _.map(metanode.opHistogram, function (count, op) {
+ return op + "=" + count;
+ }).join(",");
+ return props + " [ops] " + ops;
+ }
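+ // Illustrative example (not part of the original source): a metanode of
+ // depth 2 whose metagraph has 3 nodes, 4 edges and an op histogram of
+ // {Add: 1, MatMul: 2} would produce a signature like
+ // "depth=2 |V|=3 |E|=4 [ops] Add=1,MatMul=2".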
+ /**
+ * Generate a nearest neighbor hash of metanodes
+ * based on depth, |V|, |E|, and opHistogram of their subgraph
+ * (excluding leaf nodes and singular metanodes).
+ * @param graph The graph
+ * @return Array of pairs of [signature,
+ * Object with min level of the template and an Array of tf.graph.Group]
+ * sorted in ascending order of the minimum depth at which the metanode appears.
+ */
+ function clusterSimilarSubgraphs(h) {
+ /** a dict from metanode.signature() => Array of tf.graph.Groups */
+ var hashDict = _(h.getNodeMap()).reduce(function (hash, node, name) {
+ if (node.type !== graph_1.NodeType.META) {
+ return hash;
+ }
+ var levelOfMetaNode = name.split("/").length - 1;
+ var signature = getSignature(node);
+ var templateInfo = hash[signature] ||
+ { nodes: [], level: levelOfMetaNode };
+ hash[signature] = templateInfo;
+ templateInfo.nodes.push(node);
+ if (templateInfo.level > levelOfMetaNode) {
+ templateInfo.level = levelOfMetaNode;
+ }
+ return hash;
+ }, {});
+ return _(hashDict).pairs()
+ .filter(function (pair) {
+ return pair[1].nodes.length > 1;
+ })
+ .sortBy(function (pair) {
+ // sort by depth
+ // (all members in the same nnGroup have equal depth)
+ return pair[1].nodes[0].depth;
+ })
+ .value();
+ }
+ function groupTemplateAndAssignId(nnGroups, verifyTemplate) {
+ // For each metanode, compare its subgraph (starting from shallower groups)
+ // and assign template id.
+ return _.reduce(nnGroups, function (templates, nnGroupPair) {
+ var signature = nnGroupPair[0], nnGroup = nnGroupPair[1].nodes, clusters = [];
+ nnGroup.forEach(function (metanode) {
+ // check with each existing cluster
+ for (var i = 0; i < clusters.length; i++) {
+ var similar = !verifyTemplate ||
+ isSimilarSubgraph(clusters[i].metanode.metagraph, metanode.metagraph);
+ // if similar, just add this metanode to the cluster
+ if (similar) {
+ // get template from the first one
+ metanode.templateId = clusters[i].metanode.templateId;
+ clusters[i].members.push(metanode.name);
+ return;
+ }
+ }
+ // otherwise create a new cluster with id "signature [count] "
+ metanode.templateId = signature + "[" + clusters.length + "]";
+ clusters.push({
+ metanode: metanode,
+ members: [metanode.name]
+ });
+ });
+ clusters.forEach(function (c) {
+ templates[c.metanode.templateId] = {
+ level: nnGroupPair[1].level,
+ nodes: c.members
+ };
+ });
+ return templates;
+ }, {});
+ }
+ function sortNodes(names, graph, prefix) {
+ return _.sortByAll(names, function (name) {
+ var node = graph.node(name);
+ return node.op;
+ }, function (name) {
+ var node = graph.node(name);
+ return node.templateId;
+ }, function (name) {
+ return graph.neighbors(name).length;
+ }, function (name) {
+ return graph.predecessors(name).length;
+ }, function (name) {
+ return graph.successors(name).length;
+ }, function (name) {
+ return name.substr(prefix.length);
+ });
+ }
+ function isSimilarSubgraph(g1, g2) {
+ if (!tf.graph.hasSimilarDegreeSequence(g1, g2)) {
+ return false;
+ }
+ // if we want to skip, just return true here.
+ // return true;
+ // Verify sequence by running DFS
+ var g1prefix = g1.graph().name;
+ var g2prefix = g2.graph().name;
+ var visited1 = {};
+ var visited2 = {};
+ var stack = [];
+ /**
+ * push sources or successors into the stack
+ * if the visiting pattern has been similar.
+ */
+ function stackPushIfNotDifferent(n1, n2) {
+ var sub1 = n1.substr(g1prefix.length), sub2 = n2.substr(g2prefix.length);
+ /* tslint:disable */
+ if (visited1[sub1] ^ visited2[sub2]) {
+ console.warn("different visit pattern", "[" + g1prefix + "]", sub1, "[" + g2prefix + "]", sub2);
+ return true;
+ }
+ /* tslint:enable */
+ if (!visited1[sub1]) {
+ visited1[sub1] = visited2[sub2] = true;
+ stack.push({ n1: n1, n2: n2 });
+ }
+ return false;
+ }
+ // check if have same # of sources then sort and push
+ var sources1 = g1.sources();
+ var sources2 = g2.sources();
+ if (sources1.length !== sources2.length) {
+ /* tslint:disable */
+ console.log("different source length");
+ /* tslint:enable */
+ return false;
+ }
+ sources1 = sortNodes(sources1, g1, g1prefix);
+ sources2 = sortNodes(sources2, g2, g2prefix);
+ for (var i = 0; i < sources1.length; i++) {
+ var different = stackPushIfNotDifferent(sources1[i], sources2[i]);
+ if (different) {
+ return false;
+ }
+ }
+ while (stack.length > 0) {
+ var cur = stack.pop();
+ // check node
+ var similar = isSimilarNode(g1.node(cur.n1), g2.node(cur.n2));
+ if (!similar) {
+ return false;
+ }
+ // check if have same # of successors then sort and push
+ var succ1 = g1.successors(cur.n1), succ2 = g2.successors(cur.n2);
+ if (succ1.length !== succ2.length) {
+ /* tslint:disable */
+ console.log("# of successors mismatch", succ1, succ2);
+ /* tslint:enable */
+ return false;
+ }
+ succ1 = sortNodes(succ1, g1, g1prefix);
+ succ2 = sortNodes(succ2, g2, g2prefix);
+ for (var j = 0; j < succ1.length; j++) {
+ var different = stackPushIfNotDifferent(succ1[j], succ2[j]);
+ if (different) {
+ return false;
+ }
+ }
+ }
+ return true;
+ }
+ /**
+ * Returns whether two nodes have identical structure.
+ */
+ function isSimilarNode(n1, n2) {
+ if (n1.type === graph_1.NodeType.META) {
+ // compare metanode
+ var metanode1 = n1;
+ var metanode2 = n2;
+ return metanode1.templateId && metanode2.templateId && metanode1.templateId === metanode2.templateId;
+ }
+ else if (n1.type === graph_1.NodeType.OP && n2.type === graph_1.NodeType.OP) {
+ // compare leaf node
+ return n1.op === n2.op;
+ }
+ else if (n1.type === graph_1.NodeType.SERIES && n2.type === graph_1.NodeType.SERIES) {
+ // compare series node sizes and operations
+ // (only need to check one op as all op nodes are identical in series)
+ var seriesnode1 = n1;
+ var seriesnode2 = n2;
+ var seriesnode1Count = seriesnode1.metagraph.nodeCount();
+ return (seriesnode1Count === seriesnode2.metagraph.nodeCount() &&
+ (seriesnode1Count === 0 ||
+ (seriesnode1.metagraph.node(seriesnode1.metagraph.nodes()[0]).op ===
+ seriesnode2.metagraph.node(seriesnode2.metagraph.nodes()[0]).op)));
+ }
+ return false;
+ }
+ })(template = graph_1.template || (graph_1.template = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {}));
+</script>
+<script>/// <reference path="../graph.ts" />
+/// <reference path="edge.ts" />
+/// <reference path="node.ts" />
+/// <reference path="../layout.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph) {
+ var scene;
+ (function (scene) {
+ /** Enumerates the element classes of objects in the scene. */
+ scene.Class = {
+ Node: {
+ // <g> element that contains nodes.
+ CONTAINER: "nodes",
+ // <g> element that contains detail about a node.
+ GROUP: "node",
+ // <g> element that contains visual elements (like rect, ellipse).
+ SHAPE: "nodeshape",
+ // <*> element(s) under SHAPE that should receive color updates.
+ COLOR_TARGET: "nodecolortarget",
+ // <text> element showing the node's label.
+ LABEL: "nodelabel",
+ // <g> element that contains all visuals for the expand/collapse
+ // button for expandable group nodes.
+ BUTTON_CONTAINER: "buttoncontainer",
+ // <circle> element that surrounds expand/collapse buttons.
+ BUTTON_CIRCLE: "buttoncircle",
+ // <path> element of the expand button.
+ EXPAND_BUTTON: "expandbutton",
+ // <path> element of the collapse button.
+ COLLAPSE_BUTTON: "collapsebutton"
+ },
+ Edge: {
+ CONTAINER: "edges",
+ GROUP: "edge",
+ LINE: "edgeline",
+ REF_LINE: "refline",
+ STRUCTURAL: "structural"
+ },
+ Annotation: {
+ OUTBOX: "out-annotations",
+ INBOX: "in-annotations",
+ GROUP: "annotation",
+ NODE: "annotation-node",
+ EDGE: "annotation-edge",
+ CONTROL_EDGE: "annotation-control-edge",
+ LABEL: "annotation-label",
+ ELLIPSIS: "annotation-ellipsis"
+ },
+ Scene: {
+ GROUP: "scene",
+ CORE: "core",
+ INEXTRACT: "in-extract",
+ OUTEXTRACT: "out-extract"
+ },
+ Subscene: {
+ GROUP: "subscene"
+ },
+ OPNODE: "op",
+ METANODE: "meta",
+ SERIESNODE: "series",
+ BRIDGENODE: "bridge",
+ ELLIPSISNODE: "ellipsis"
+ };
+ /**
+ * Helper method for fitting the graph in the svg view.
+ *
+ * @param svg The main svg.
+ * @param zoomG The svg group used for panning and zooming.
+ * @param d3zoom The zoom behavior.
+ * @param callback Called when the fitting is done.
+ */
+ function fit(svg, zoomG, d3zoom, callback) {
+ var svgRect = svg.getBoundingClientRect();
+ var sceneSize = zoomG.getBBox();
+ var scale = 0.9 * Math.min(svgRect.width / sceneSize.width, svgRect.height / sceneSize.height, 2);
+ var params = graph.layout.PARAMS.graph;
+ var zoomEvent = d3zoom.scale(scale)
+ .on("zoomend.fitted", function () {
+ // Remove the listener for the zoomend event,
+ // so we don't get called at the end of regular zoom events,
+ // just those that fit the graph to screen.
+ d3zoom.on("zoomend.fitted", null);
+ callback();
+ })
+ .translate([params.padding.paddingLeft, params.padding.paddingTop])
+ .event;
+ d3.select(zoomG).transition().duration(500).call(zoomEvent);
+ }
+ scene.fit = fit;
+ ;
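+ // Illustrative note (not part of the original source): the scale above is
+ // 0.9 * min(viewport/scene width ratio, viewport/scene height ratio, 2), so
+ // the scene fills at most 90% of the viewport and small graphs are never
+ // scaled above 1.8x.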
+ /**
+ * Helper method for panning the graph to center on the provided node,
+ * if the node is currently off-screen.
+ *
+ * @param nodeName The node to center the graph on
+ * @param svg The root SVG element for the graph
+ * @param zoomG The svg group used for panning and zooming.
+ * @param d3zoom The zoom behavior.
+ * @return True if the graph had to be panned to display the
+ * provided node.
+ */
+ function panToNode(nodeName, svg, zoomG, d3zoom) {
+ var node = d3.selectAll("[data-name='" + nodeName + "']."
+ + scene.Class.Node.GROUP)[0][0];
+ if (!node) {
+ return false;
+ }
+ var translate = d3zoom.translate();
+ // Check if the selected node is off-screen in either
+ // X or Y dimension in either direction.
+ var nodeBox = node.getBBox();
+ var nodeCtm = node.getScreenCTM();
+ var pointTL = svg.createSVGPoint();
+ var pointBR = svg.createSVGPoint();
+ pointTL.x = nodeBox.x;
+ pointTL.y = nodeBox.y;
+ pointBR.x = nodeBox.x + nodeBox.width;
+ pointBR.y = nodeBox.y + nodeBox.height;
+ pointTL = pointTL.matrixTransform(nodeCtm);
+ pointBR = pointBR.matrixTransform(nodeCtm);
+ var isOutsideOfBounds = function (start, end, bound) {
+ return end < 0 || start > bound;
+ };
+ var svgRect = svg.getBoundingClientRect();
+ if (isOutsideOfBounds(pointTL.x, pointBR.x, svgRect.width) ||
+ isOutsideOfBounds(pointTL.y, pointBR.y, svgRect.height)) {
+ // Determine the amount to transform the graph in both X and Y
+ // dimensions in order to center the selected node. This takes into
+ // account the position of the node, the size of the svg scene, the
+ // amount the scene has been scaled by through zooming, and any previous
+ // transform already performed by this logic.
+ var centerX = (pointTL.x + pointBR.x) / 2;
+ var centerY = (pointTL.y + pointBR.y) / 2;
+ var dx = ((svgRect.width / 2) - centerX);
+ var dy = ((svgRect.height / 2) - centerY);
+ var zoomEvent = d3zoom.translate([translate[0] + dx, translate[1] + dy])
+ .event;
+ d3.select(zoomG).transition().duration(500).call(zoomEvent);
+ return true;
+ }
+ return false;
+ }
+ scene.panToNode = panToNode;
+ ;
+ /**
+ * Given a container d3 selection, select a child svg element of a given tag
+ * and class if it exists, or append/insert one otherwise. If multiple children
+ * match the tag and class name, returns only the first one.
+ *
+ * @param container
+ * @param tagName tag name.
+ * @param className (optional) Class name.
+ * @param before (optional) reference DOM node for insertion.
+ * @return selection of the element
+ */
+ function selectOrCreateChild(container, tagName, className, before) {
+ var child = selectChild(container, tagName, className);
+ if (!child.empty()) {
+ return child;
+ }
+ var newElement = document.createElementNS("http://www.w3.org/2000/svg", tagName);
+ if (className) {
+ newElement.classList.add(className);
+ }
+ if (before) {
+ container.node().insertBefore(newElement, before);
+ }
+ else {
+ container.node().appendChild(newElement);
+ }
+ return d3.select(newElement)
+ .datum(container.datum());
+ }
+ scene.selectOrCreateChild = selectOrCreateChild;
+ ;
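+ // Illustrative usage (not part of the original source):
+ // var coreGroup = selectOrCreateChild(sceneGroup, "g", "core");
+ // returns the existing <g class="core"> child of sceneGroup if one exists,
+ // otherwise appends a new one and binds the container's datum to it.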
+ /**
+ * Given a container d3 selection, select a child element of a given tag and
+ * class. If multiple children match the tag and class name, returns only
+ * the first one.
+ *
+ * @param container
+ * @param tagName tag name.
+ * @param className (optional) Class name.
+ * @return selection of the element, or an empty selection
+ */
+ function selectChild(container, tagName, className) {
+ var children = container.node().childNodes;
+ for (var i = 0; i < children.length; i++) {
+ var child = children[i];
+ if (child.tagName === tagName &&
+ (!className || child.classList.contains(className))) {
+ return d3.select(child);
+ }
+ }
+ return d3.select(null);
+ }
+ scene.selectChild = selectChild;
+ ;
+ /**
+ * Select or create a sceneGroup and build/update its nodes and edges.
+ *
+ * Structure Pattern:
+ *
+ * <g class="scene">
+ * <g class="core">
+ * <g class="edges">
+ * ... stuff from tf.graph.scene.edges.build ...
+ * </g>
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * <g class="in-extract">
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * <g class="out-extract">
+ * <g class="nodes">
+ * ... stuff from tf.graph.scene.nodes.build ...
+ * </g>
+ * </g>
+ * </g>
+ *
+ * @param container D3 selection of the parent.
+ * @param renderNode render node of a metanode or series node.
+ * @param sceneBehavior Parent scene module.
+ * @param sceneClass class attribute of the scene (default="scene").
+ */
+ function buildGroup(container, renderNode, sceneBehavior, sceneClass) {
+ sceneClass = sceneClass || scene.Class.Scene.GROUP;
+ var isNewSceneGroup = selectChild(container, "g", sceneClass).empty();
+ var sceneGroup = selectOrCreateChild(container, "g", sceneClass);
+ // core
+ var coreGroup = selectOrCreateChild(sceneGroup, "g", scene.Class.Scene.CORE);
+ var coreNodes = _.reduce(renderNode.coreGraph.nodes(), function (nodes, name) {
+ var node = renderNode.coreGraph.node(name);
+ if (!node.excluded) {
+ nodes.push(node);
+ }
+ return nodes;
+ }, []);
+ if (renderNode.node.type === graph.NodeType.SERIES) {
+ // For series, we want the first item on top, so reverse the array so
+ // that the first item in the series is appended last and is thus
+ // rendered on top.
+ coreNodes.reverse();
+ }
+ // Create the layer of edges for this scene (paths).
+ scene.edge.buildGroup(coreGroup, renderNode.coreGraph, sceneBehavior);
+ // Create the layer of nodes for this scene (ellipses, rects etc).
+ scene.node.buildGroup(coreGroup, coreNodes, sceneBehavior);
+ // In-extract
+ if (renderNode.isolatedInExtract.length > 0) {
+ var inExtractGroup = selectOrCreateChild(sceneGroup, "g", scene.Class.Scene.INEXTRACT);
+ scene.node.buildGroup(inExtractGroup, renderNode.isolatedInExtract, sceneBehavior);
+ }
+ else {
+ selectChild(sceneGroup, "g", scene.Class.Scene.INEXTRACT).remove();
+ }
+ // Out-extract
+ if (renderNode.isolatedOutExtract.length > 0) {
+ var outExtractGroup = selectOrCreateChild(sceneGroup, "g", scene.Class.Scene.OUTEXTRACT);
+ scene.node.buildGroup(outExtractGroup, renderNode.isolatedOutExtract, sceneBehavior);
+ }
+ else {
+ selectChild(sceneGroup, "g", scene.Class.Scene.OUTEXTRACT).remove();
+ }
+ position(sceneGroup, renderNode);
+ // Fade in the scene group if it didn't already exist.
+ if (isNewSceneGroup) {
+ sceneGroup.attr("opacity", 0)
+ .transition().attr("opacity", 1);
+ }
+ return sceneGroup;
+ }
+ scene.buildGroup = buildGroup;
+ ;
+ /**
+ * Given a scene's svg group, set the positions of the g.in-extract, g.core,
+ * and g.out-extract svg groups relative to the scene.
+ *
+ * @param sceneGroup
+ * @param renderNode render node of a metanode or series node.
+ */
+ function position(sceneGroup, renderNode) {
+ // Translate scenes down by the label height so that when showing graphs in
+ // expanded metanodes, the graphs are below the labels. Do not shift them
+ // down for series nodes as series nodes don't have labels inside of their
+ // bounding boxes.
+ var yTranslate = renderNode.node.type === graph.NodeType.SERIES ?
+ 0 : graph.layout.PARAMS.subscene.meta.labelHeight;
+ // core
+ translate(selectChild(sceneGroup, "g", scene.Class.Scene.CORE), 0, yTranslate);
+ // in-extract
+ var inExtractX = renderNode.coreBox.width === 0 ?
+ 0 : renderNode.coreBox.width;
+ var hasInExtract = renderNode.isolatedInExtract.length > 0;
+ if (hasInExtract) {
+ translate(selectChild(sceneGroup, "g", scene.Class.Scene.INEXTRACT), inExtractX, yTranslate);
+ }
+ // out-extract
+ var hasOutExtract = renderNode.isolatedOutExtract.length > 0;
+ if (hasOutExtract) {
+ var outExtractX = inExtractX + renderNode.inExtractBox.width
+ + renderNode.extractXOffset;
+ translate(selectChild(sceneGroup, "g", scene.Class.Scene.OUTEXTRACT), outExtractX, yTranslate);
+ }
+ }
+ ;
+ /** Adds a click listener to a group that fires a graph-select event */
+ function addGraphClickListener(graphGroup, sceneBehavior) {
+ d3.select(graphGroup).on("click", function () {
+ sceneBehavior.fire("graph-select");
+ });
+ }
+ scene.addGraphClickListener = addGraphClickListener;
+ ;
+ /** Helper for adding transform: translate(x0, y0) */
+ function translate(selection, x0, y0) {
+ selection.attr("transform", "translate(" + x0 + "," + y0 + ")");
+ }
+ scene.translate = translate;
+ ;
+ /**
+ * Helper for setting position of a svg rect
+ * @param rect rect to set position of.
+ * @param cx Center x.
+ * @param cy Center y.
+ * @param width Width to set.
+ * @param height Height to set.
+ */
+ function positionRect(rect, cx, cy, width, height) {
+ rect.transition().attr({
+ x: cx - width / 2,
+ y: cy - height / 2,
+ width: width,
+ height: height
+ });
+ }
+ scene.positionRect = positionRect;
+ ;
+ /**
+ * Helper for setting position of a svg expand/collapse button
+ * @param button container group
+ * @param renderNode the render node of the group node to position
+ * the button on.
+ */
+ function positionButton(button, renderNode) {
+ // Position the button in the top-right corner of the group node,
+ // leaving enough space to draw the button inside the corner.
+ var x = renderNode.x + renderNode.width / 2 - 6;
+ var y = renderNode.y - renderNode.height / 2 + 6;
+ // For unexpanded series nodes, the button has special placement due
+ // to the unique visuals of this group node.
+ if (renderNode.node.type === graph.NodeType.SERIES && !renderNode.expanded) {
+ x += 10;
+ y -= 2;
+ }
+ var translateStr = "translate(" + x + "," + y + ")";
+ button.selectAll("path").transition().attr("transform", translateStr);
+ button.select("circle").transition().attr({
+ cx: x,
+ cy: y,
+ r: graph.layout.PARAMS.nodeSize.meta.expandButtonRadius
+ });
+ }
+ scene.positionButton = positionButton;
+ ;
+ /**
+ * Helper for setting position of a svg ellipse
+ * @param ellipse ellipse to set position of.
+ * @param cx Center x.
+ * @param cy Center y.
+ * @param width Width to set.
+ * @param height Height to set.
+ */
+ function positionEllipse(ellipse, cx, cy, width, height) {
+ ellipse.transition().attr({
+ cx: cx,
+ cy: cy,
+ rx: width / 2,
+ ry: height / 2
+ });
+ }
+ scene.positionEllipse = positionEllipse;
+ ;
+ })(scene = graph.scene || (graph.scene = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+<script>/// <reference path="../graph.ts" />
+/// <reference path="../render.ts" />
+/// <reference path="scene.ts" />
+/// <reference path="edge.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph) {
+ var scene;
+ (function (scene) {
+ var annotation;
+ (function (annotation) {
+ /**
+ * Populate a given annotation container group
+ *
+ * <g class="{in|out}-annotations"></g>
+ *
+ * with annotation group of the following structure:
+ *
+ * <g class="annotation">
+ * <g class="annotation-node">
+ * <!--
+ * Content here determined by Scene.node.buildGroup.
+ * -->
+ * </g>
+ * </g>
+ *
+ * @param container selection of the container.
+ * @param annotationData node.{in|out}Annotations
+ * @param d node to build group for.
+ * @param sceneBehavior polymer scene element.
+ * @return selection of appended objects
+ */
+ function buildGroup(container, annotationData, d, sceneBehavior) {
+ // Select all children and join with data.
+ var annotationGroups = container.selectAll(function () {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes;
+ })
+ .data(annotationData.list, function (d) { return d.node.name; });
+ annotationGroups.enter()
+ .append("g")
+ .attr("data-name", function (a) { return a.node.name; })
+ .each(function (a) {
+ var aGroup = d3.select(this);
+ // Add annotation to the index in the scene
+ sceneBehavior.addAnnotationGroup(a, d, aGroup);
+ // Append annotation edge
+ var edgeType = scene.Class.Annotation.EDGE;
+ var metaedge = a.renderMetaedgeInfo && a.renderMetaedgeInfo.metaedge;
+ if (metaedge && !metaedge.numRegularEdges) {
+ edgeType += " " + scene.Class.Annotation.CONTROL_EDGE;
+ }
+ // If any edges are reference edges, add the reference edge class.
+ if (metaedge && metaedge.numRefEdges) {
+ edgeType += " " + scene.Class.Edge.REF_LINE;
+ }
+ scene.edge.appendEdge(aGroup, a, sceneBehavior, edgeType);
+ if (a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ addAnnotationLabelFromNode(aGroup, a);
+ buildShape(aGroup, a, sceneBehavior);
+ }
+ else {
+ addAnnotationLabel(aGroup, a.node.name, a, scene.Class.Annotation.ELLIPSIS);
+ }
+ });
+ annotationGroups
+ .attr("class", function (a) {
+ return scene.Class.Annotation.GROUP + " " +
+ annotationToClassName(a.annotationType) +
+ " " + scene.node.nodeClass(a);
+ })
+ .each(function (a) {
+ var aGroup = d3.select(this);
+ update(aGroup, d, a, sceneBehavior);
+ if (a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ addInteraction(aGroup, d, sceneBehavior);
+ }
+ });
+ annotationGroups.exit()
+ .each(function (a) {
+ var aGroup = d3.select(this);
+ // Remove annotation from the index in the scene
+ sceneBehavior.removeAnnotationGroup(a, d, aGroup);
+ })
+ .remove();
+ return annotationGroups;
+ }
+ annotation.buildGroup = buildGroup;
+ ;
+ /**
+ * Maps an annotation enum to a class name used in css rules.
+ */
+ function annotationToClassName(annotationType) {
+ return (tf.graph.render.AnnotationType[annotationType] || "")
+ .toLowerCase() || null;
+ }
+ function buildShape(aGroup, a, sceneBehavior) {
+ if (a.annotationType === tf.graph.render.AnnotationType.SUMMARY) {
+ var image = scene.selectOrCreateChild(aGroup, "image");
+ image.attr({
+ "xlink:href": sceneBehavior.resolveUrl("../../lib/svg/summary-icon.svg"),
+ "height": "12px",
+ "width": "12px",
+ "cursor": "pointer"
+ });
+ }
+ else {
+ var shape = scene.node.buildShape(aGroup, a, scene.Class.Annotation.NODE);
+ // add title tag to get native tooltips
+ scene.selectOrCreateChild(shape, "title").text(a.node.name);
+ }
+ }
+ function addAnnotationLabelFromNode(aGroup, a) {
+ var namePath = a.node.name.split("/");
+ var text = namePath[namePath.length - 1];
+ var shortenedText = text.length > 8 ? text.substring(0, 8) + "..." : text;
+ return addAnnotationLabel(aGroup, shortenedText, a, null, text);
+ }
+ function addAnnotationLabel(aGroup, label, a, additionalClassNames, fullLabel) {
+ var classNames = scene.Class.Annotation.LABEL;
+ if (additionalClassNames) {
+ classNames += " " + additionalClassNames;
+ }
+ var titleText = fullLabel ? fullLabel : label;
+ return aGroup.append("text")
+ .attr("class", classNames)
+ .attr("dy", ".35em")
+ .attr("text-anchor", a.isIn ? "end" : "start")
+ .text(label)
+ .append("title").text(titleText);
+ }
+ function addInteraction(selection, d, sceneBehavior) {
+ selection
+ .on("mouseover", function (a) {
+ sceneBehavior.fire("annotation-highlight", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ })
+ .on("mouseout", function (a) {
+ sceneBehavior.fire("annotation-unhighlight", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ })
+ .on("click", function (a) {
+ // Stop this event's propagation so that it isn't also considered a
+ // graph-select.
+ d3.event.stopPropagation();
+ sceneBehavior.fire("annotation-select", {
+ name: a.node.name,
+ hostName: d.node.name
+ });
+ });
+ }
+ ;
+ /**
+ * Adjust annotation's position.
+ *
+ * @param aGroup selection of a "g.annotation" element.
+ * @param d Host node data.
+ * @param a annotation node data.
+ * @param scene Polymer scene element.
+ */
+ function update(aGroup, d, a, sceneBehavior) {
+ // Annotations that point to embedded nodes (constants,summary)
+ // don't have a render information attached so we don't stylize these.
+ // Also we don't stylize ellipsis annotations (the string "... and X more").
+ if (a.renderNodeInfo &&
+ a.annotationType !== tf.graph.render.AnnotationType.ELLIPSIS) {
+ scene.node.stylize(aGroup, a.renderNodeInfo, sceneBehavior, scene.Class.Annotation.NODE);
+ }
+ if (a.annotationType === tf.graph.render.AnnotationType.SUMMARY) {
+ // Update the width of the annotation to give space for the image.
+ a.width += 10;
+ }
+ // label position
+ aGroup.select("text." + scene.Class.Annotation.LABEL).transition().attr({
+ x: d.x + a.dx + (a.isIn ? -1 : 1) * (a.width / 2 + a.labelOffset),
+ y: d.y + a.dy
+ });
+ // Some annotations (such as summary) are represented using a 12x12 image tag.
+ // Units (e.g. pixels) are purposely omitted since the images are vector graphics.
+ // If there is an image, we adjust the location of the image to be vertically
+ // centered with the node and horizontally centered between the arrow and the
+ // text label.
+ aGroup.select("image").transition().attr({
+ x: d.x + a.dx - 3,
+ y: d.y + a.dy - 6
+ });
+ // Node position (only one of the shape selection will be non-empty.)
+ scene.positionEllipse(aGroup.select("." + scene.Class.Annotation.NODE + " ellipse"), d.x + a.dx, d.y + a.dy, a.width, a.height);
+ scene.positionRect(aGroup.select("." + scene.Class.Annotation.NODE + " rect"), d.x + a.dx, d.y + a.dy, a.width, a.height);
+ scene.positionRect(aGroup.select("." + scene.Class.Annotation.NODE + " use"), d.x + a.dx, d.y + a.dy, a.width, a.height);
+ // Edge position
+ aGroup.select("path." + scene.Class.Annotation.EDGE).transition().attr("d", function (a) {
+ // map relative position to absolute position
+ var points = a.points.map(function (p) {
+ return { x: p.dx + d.x, y: p.dy + d.y };
+ });
+ return scene.edge.interpolate(points);
+ });
+ }
+ ;
+ })(annotation = scene.annotation || (scene.annotation = {}));
+ })(scene = graph.scene || (graph.scene = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+<script>/// <reference path="../graph.ts" />
+/// <reference path="../render.ts" />
+/// <reference path="scene.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var scene;
+ (function (scene) {
+ var edge;
+ (function (edge) {
+ var Scene = tf.graph.scene; // Aliased
+ function getEdgeKey(edgeObj) {
+ return edgeObj.v + tf.graph.EDGE_KEY_DELIM + edgeObj.w;
+ }
+ edge.getEdgeKey = getEdgeKey;
+ /**
+ * Select or Create a "g.edges" group to a given sceneGroup
+ * and builds a number of "g.edge" groups inside the group.
+ *
+ * Structure Pattern:
+ *
+ * <g class="edges">
+ * <g class="edge">
+ * <path class="edgeline"/>
+ * </g>
+ * ...
+ * </g>
+ *
+ *
+ * @param sceneGroup container
+ * @param graph
+ * @param sceneBehavior Parent scene module.
+ * @return selection of the created nodeGroups
+ */
+ function buildGroup(sceneGroup, graph, sceneBehavior) {
+ var edgeData = _.reduce(graph.edges(), function (edges, edgeObj) {
+ var edgeLabel = graph.edge(edgeObj);
+ edges.push({
+ v: edgeObj.v,
+ w: edgeObj.w,
+ label: edgeLabel
+ });
+ return edges;
+ }, []);
+ var container = scene.selectOrCreateChild(sceneGroup, "g", scene.Class.Edge.CONTAINER);
+ var containerNode = container.node();
+ // Select all children and join with data.
+ // (Note that all children of g.edges are g.edge)
+ var edgeGroups = container.selectAll(function () {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes;
+ })
+ .data(edgeData, getEdgeKey);
+ // Make edges a group to support rendering multiple lines for metaedge
+ edgeGroups.enter()
+ .append("g")
+ .attr("class", scene.Class.Edge.GROUP)
+ .attr("data-edge", getEdgeKey)
+ .each(function (d) {
+ var edgeGroup = d3.select(this);
+ d.label.edgeGroup = edgeGroup;
+ // index node group for quick highlighting
+ sceneBehavior._edgeGroupIndex[getEdgeKey(d)] = edgeGroup;
+ // If any edges are reference edges, add the reference edge class.
+ var extraEdgeClass = d.label.metaedge && d.label.metaedge.numRefEdges
+ ? scene.Class.Edge.REF_LINE + " " + scene.Class.Edge.LINE
+ : undefined;
+ // Add line during enter because we're assuming that type of line
+ // normally does not change.
+ appendEdge(edgeGroup, d, scene, extraEdgeClass);
+ });
+ edgeGroups.each(position);
+ edgeGroups.each(function (d) {
+ stylize(d3.select(this), d, sceneBehavior);
+ });
+ edgeGroups.exit()
+ .each(function (d) {
+ delete sceneBehavior._edgeGroupIndex[getEdgeKey(d)];
+ })
+ .remove();
+ return edgeGroups;
+ }
+ edge.buildGroup = buildGroup;
+ ;
+ /**
+ * For a given d3 selection and data object, create a path to represent the
+ * edge described in d.label.
+ *
+ * If d.label is defined, it will be a RenderMetaedgeInformation instance. It
+ * will sometimes be undefined, for example for some Annotation edges for which
+ * there is no underlying Metaedge in the hierarchical graph.
+ */
+ function appendEdge(edgeGroup, d, sceneBehavior, edgeClass) {
+ edgeClass = edgeClass || scene.Class.Edge.LINE; // set default type
+ if (d.label && d.label.structural) {
+ edgeClass += " " + scene.Class.Edge.STRUCTURAL;
+ }
+ edgeGroup.append("path")
+ .attr("class", edgeClass);
+ }
+ edge.appendEdge = appendEdge;
+ ;
+ /**
+ * Returns a tween interpolator for the endpoint of an edge path.
+ */
+ function getEdgePathInterpolator(d, i, a) {
+ var renderMetaedgeInfo = d.label;
+ var adjoiningMetaedge = renderMetaedgeInfo.adjoiningMetaedge;
+ if (!adjoiningMetaedge) {
+ return d3.interpolate(a, edge.interpolate(renderMetaedgeInfo.points));
+ }
+ var renderPath = this;
+ // Get the adjoining path that matches the adjoining metaedge.
+ var adjoiningPath = (adjoiningMetaedge.edgeGroup.node()
+ .firstChild);
+ // Find the desired SVGPoint along the adjoining path, then convert those
+ // coordinates into the space of the renderPath using its Current
+ // Transformation Matrix (CTM).
+ var inbound = renderMetaedgeInfo.metaedge.inbound;
+ return function (t) {
+ var adjoiningPoint = adjoiningPath
+ .getPointAtLength(inbound ? adjoiningPath.getTotalLength() : 0)
+ .matrixTransform(adjoiningPath.getCTM())
+ .matrixTransform(renderPath.getCTM().inverse());
+ // Update the relevant point in the renderMetaedgeInfo's points list, then
+ // re-interpolate the path.
+ var points = renderMetaedgeInfo.points;
+ var index = inbound ? 0 : points.length - 1;
+ points[index].x = adjoiningPoint.x;
+ points[index].y = adjoiningPoint.y;
+ var dPath = edge.interpolate(points);
+ return dPath;
+ };
+ }
+ edge.interpolate = d3.svg.line()
+ .interpolate("basis")
+ .x(function (d) { return d.x; })
+ .y(function (d) { return d.y; });
+ function position(d) {
+ d3.select(this).select("path." + scene.Class.Edge.LINE)
+ .each(function (d) {
+ var path = d3.select(this);
+ path.transition().attrTween("d", getEdgePathInterpolator);
+ });
+ }
+ ;
+ /**
+ * For a given d3 selection and data object, mark the edge as a control
+ * dependency if it contains only control edges.
+ *
+ * d's label property will be a RenderMetaedgeInformation object.
+ */
+ function stylize(edgeGroup, d, sceneBehavior) {
+ var metaedge = d.label.metaedge;
+ edgeGroup
+ .select("path." + scene.Class.Edge.LINE)
+ .classed("control-dep", metaedge && !metaedge.numRegularEdges);
+ }
+ ;
+ })(edge = scene.edge || (scene.edge = {}));
+ })(scene = graph_1.scene || (graph_1.scene = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+<script>/// <reference path="../graph.ts" />
+/// <reference path="scene.ts" />
+/// <reference path="annotation.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph) {
+ var scene;
+ (function (scene) {
+ var node;
+ (function (node_1) {
+ /**
+ * Select or Create a "g.nodes" group to a given sceneGroup
+ * and builds a number of "g.node" groups inside the group.
+ *
+ * Structure Pattern:
+ *
+ * <g class="nodes">
+ * <g class="node">
+ * <g class="in-annotations">
+ * ...
+ * </g>
+ * <g class="out-annotations">
+ * ...
+ * </g>
+ * <g class="nodeshape">
+ * <!--
+ * Content of the node shape should be for the node itself. For example a
+ * Metanode would have a <rect> with rounded edges, an op would have an
+ * <ellipse>. More complex nodes like series may contain multiple elements
+ * which are conditionally visible based on whether the node is expanded.
+ * -->
+ * </g>
+ * <text class="label">node name</text>
+ * <g class="subscene">
+ * <!--
+ * Content of the subscene (only for metanode and series node).
+ *
+ * Subscene is a svg group that contains content of the
+ * metanode's metagraph that is recursively generated by Scene.build().
+ *
+ * When the graph is expanded multiple times, a subscene can contain
+ * nested subscenes inside.
+ * -->
+ * </g>
+ * </g>
+ * ...
+ * </g>
+ *
+ *
+ * @param sceneGroup selection of the container
+ * @param nodeData array of render node information to map
+ * @param sceneBehavior parent scene module
+ * @return selection of the created nodeGroups
+ */
+ function buildGroup(sceneGroup, nodeData, sceneBehavior) {
+ var container = scene.selectOrCreateChild(sceneGroup, "g", scene.Class.Node.CONTAINER);
+ // Select all children and join with data.
+ // (Note that all children of g.nodes are g.node)
+ var nodeGroups = container.selectAll(function () {
+ // using d3's selector function
+ // See https://github.com/mbostock/d3/releases/tag/v2.0.0
+ // (It's not listed in the d3 wiki.)
+ return this.childNodes; // this here refers to container.node()
+ })
+ .data(nodeData, function (d) {
+ // make sure that we don't have to swap shape type
+ return d.node.name + ":" + d.node.type;
+ });
+ // ENTER
+ nodeGroups.enter()
+ .append("g")
+ .attr("data-name", function (d) { return d.node.name; })
+ .each(function (d) {
+ var nodeGroup = d3.select(this);
+ // index node group for quick stylizing
+ sceneBehavior.addNodeGroup(d.node.name, nodeGroup);
+ });
+ // UPDATE
+ nodeGroups
+ .attr("class", function (d) {
+ return scene.Class.Node.GROUP + " " + nodeClass(d);
+ })
+ .each(function (d) {
+ var nodeGroup = d3.select(this);
+ // add g.in-annotations (always add -- to keep layer order consistent.)
+ var inAnnotationBox = scene.selectOrCreateChild(nodeGroup, "g", scene.Class.Annotation.INBOX);
+ scene.annotation.buildGroup(inAnnotationBox, d.inAnnotations, d, sceneBehavior);
+ // add g.out-annotations (always add -- to keep layer order consistent.)
+ var outAnnotationBox = scene.selectOrCreateChild(nodeGroup, "g", scene.Class.Annotation.OUTBOX);
+ scene.annotation.buildGroup(outAnnotationBox, d.outAnnotations, d, sceneBehavior);
+ // label
+ var label = labelBuild(nodeGroup, d, sceneBehavior);
+ // Do not add interaction to metanode labels as they live inside the
+ // metanode shape which already has the same interactions.
+ addInteraction(label, d, sceneBehavior, d.node.type === graph.NodeType.META);
+ // build .shape below label
+ var shape = buildShape(nodeGroup, d, scene.Class.Node.SHAPE, label.node());
+ if (d.node.isGroupNode) {
+ addButton(shape, d, sceneBehavior);
+ }
+ addInteraction(shape, d, sceneBehavior);
+ // build subscene on the top
+ subsceneBuild(nodeGroup, d, sceneBehavior);
+ stylize(nodeGroup, d, sceneBehavior);
+ position(nodeGroup, d, sceneBehavior);
+ });
+ // EXIT
+ nodeGroups.exit()
+ .each(function (d) {
+ // remove all indices on remove
+ sceneBehavior.removeNodeGroup(d.node.name);
+ var nodeGroup = d3.select(this);
+ if (d.inAnnotations.list.length > 0) {
+ nodeGroup.select("." + scene.Class.Annotation.INBOX)
+ .selectAll("." + scene.Class.Annotation.GROUP)
+ .each(function (a) {
+ sceneBehavior.removeAnnotationGroup(a, d);
+ });
+ }
+ if (d.outAnnotations.list.length > 0) {
+ nodeGroup.select("." + scene.Class.Annotation.OUTBOX)
+ .selectAll("." + scene.Class.Annotation.GROUP)
+ .each(function (a) {
+ sceneBehavior.removeAnnotationGroup(a, d);
+ });
+ }
+ })
+ .remove();
+ return nodeGroups;
+ }
+ node_1.buildGroup = buildGroup;
+ ;
+ /**
+ * Update or remove the subscene of a render group node depending on whether it
+ * is expanded. If the node is not a group node, this method has no effect.
+ *
+ * @param nodeGroup selection of the container
+ * @param renderNodeInfo the render information for the node.
+ * @param sceneBehavior parent scene module
+ * @return Selection of the subscene group, or null if node group does not have
+ * a subscene. Op nodes, bridge nodes and unexpanded group nodes will
+ * not have a subscene.
+ */
+ function subsceneBuild(nodeGroup, renderNodeInfo, sceneBehavior) {
+ if (renderNodeInfo.node.isGroupNode) {
+ if (renderNodeInfo.expanded) {
+ // Recursively build the subscene.
+ return scene.buildGroup(nodeGroup, renderNodeInfo, sceneBehavior, scene.Class.Subscene.GROUP);
+ }
+ // Clean out existing subscene if the node is not expanded.
+ scene.selectChild(nodeGroup, "g", scene.Class.Subscene.GROUP).remove();
+ }
+ return null;
+ }
+ ;
+ /**
+ * Translate the subscene of the given node group
+ */
+ function subscenePosition(nodeGroup, d) {
+ var x0 = d.x - d.width / 2.0 + d.paddingLeft;
+ var y0 = d.y - d.height / 2.0 + d.paddingTop;
+ var subscene = scene.selectChild(nodeGroup, "g", scene.Class.Subscene.GROUP);
+ scene.translate(subscene, x0, y0);
+ }
+ ;
+ /**
+ * Add an expand/collapse button to a group node
+ *
+ * @param selection The group node selection.
+ * @param d Info about the node being rendered.
+ * @param sceneBehavior parent scene module.
+ */
+ function addButton(selection, d, sceneBehavior) {
+ var group = scene.selectOrCreateChild(selection, "g", scene.Class.Node.BUTTON_CONTAINER);
+ scene.selectOrCreateChild(group, "circle", scene.Class.Node.BUTTON_CIRCLE);
+ scene.selectOrCreateChild(group, "path", scene.Class.Node.EXPAND_BUTTON).attr("d", "M0,-2.2 V2.2 M-2.2,0 H2.2");
+ scene.selectOrCreateChild(group, "path", scene.Class.Node.COLLAPSE_BUTTON).attr("d", "M-2.2,0 H2.2");
+ group.on("click", function (d) {
+ // Stop this event's propagation so that it isn't also considered a
+ // node-select.
+ d3.event.stopPropagation();
+ sceneBehavior.fire("node-toggle-expand", { name: d.node.name });
+ });
+ scene.positionButton(group, d);
+ }
+ ;
+ /**
+ * Fire node-* events when the selection is interacted.
+ *
+ * @param disableInteraction When true, have the provided selection
+ * ignore all pointer events. Used for text labels inside metanodes, which
+ * don't need interaction since their surrounding shape already has it, and
+ * adding interaction to them would conflict with the expand/collapse button.
+ */
+ function addInteraction(selection, d, sceneBehavior, disableInteraction) {
+ if (disableInteraction) {
+ selection.attr("pointer-events", "none");
+ return;
+ }
+ selection.on("dblclick", function (d) {
+ sceneBehavior.fire("node-toggle-expand", { name: d.node.name });
+ })
+ .on("mouseover", function (d) {
+ // Don't send mouseover events for an expanded group,
+ // otherwise it causes too many glitches.
+ if (sceneBehavior.isNodeExpanded(d)) {
+ return;
+ }
+ sceneBehavior.fire("node-highlight", { name: d.node.name });
+ })
+ .on("mouseout", function (d) {
+ // Don't send mouseover events for an expanded group,
+ // otherwise it causes too many glitches.
+ if (sceneBehavior.isNodeExpanded(d)) {
+ return;
+ }
+ sceneBehavior.fire("node-unhighlight", { name: d.node.name });
+ })
+ .on("click", function (d) {
+ // Stop this event's propagation so that it isn't also considered
+ // a graph-select.
+ d3.event.stopPropagation();
+ sceneBehavior.fire("node-select", { name: d.node.name });
+ });
+ }
+ ;
+ /**
+ * Append svg text for label and assign data.
+ * @param nodeGroup
+ * @param renderNodeInfo The render node information for the label.
+ * @param sceneBehavior parent scene module.
+ */
+ function labelBuild(nodeGroup, renderNodeInfo, sceneBehavior) {
+ var namePath = renderNodeInfo.node.name.split("/");
+ var text = namePath[namePath.length - 1];
+ // Truncate long labels for unexpanded Metanodes.
+ var useFontScale = renderNodeInfo.node.type === graph.NodeType.META &&
+ !renderNodeInfo.expanded;
+ var label = scene.selectOrCreateChild(nodeGroup, "text", scene.Class.Node.LABEL);
+ label.attr("dy", ".35em")
+ .attr("text-anchor", "middle");
+ if (useFontScale) {
+ if (text.length > sceneBehavior.maxMetanodeLabelLength) {
+ text = text.substr(0, sceneBehavior.maxMetanodeLabelLength - 2) + "...";
+ }
+ var scale = getLabelFontScale(sceneBehavior);
+ label.attr("font-size", scale(text.length) + "px");
+ }
+ label.text(text);
+ return label;
+ }
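+ // Illustrative note (not part of the original source): labelBuild truncates
+ // collapsed-metanode labels to maxMetanodeLabelLength - 2 characters plus an
+ // ellipsis. Assuming, hypothetically, maxMetanodeLabelLength were 18, the
+ // label "gradient_descent_optimizer" would render as "gradient_descent...",
+ // with the font size chosen by getLabelFontScale below.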
+ ;
+ /**
+ * d3 scale used for sizing font of labels, used by labelBuild,
+ * initialized once by getLabelFontScale.
+ */
+ var fontScale = null;
+ function getLabelFontScale(sceneBehavior) {
+ if (!fontScale) {
+ fontScale = d3.scale.linear()
+ .domain([sceneBehavior.maxMetanodeLabelLengthLargeFont,
+ sceneBehavior.maxMetanodeLabelLength])
+ .range([sceneBehavior.maxMetanodeLabelLengthFontSize,
+ sceneBehavior.minMetanodeLabelLengthFontSize]).clamp(true);
+ }
+ return fontScale;
+ }
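+ // Illustrative note (not part of the original source): fontScale is a clamped
+ // linear scale, so labels no longer than maxMetanodeLabelLengthLargeFont
+ // characters get the larger configured font size, labels at or beyond
+ // maxMetanodeLabelLength characters get the smaller one, and lengths in
+ // between are interpolated linearly, e.g.
+ //   getLabelFontScale(sceneBehavior)(text.length)  // px size for this label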
+ /**
+ * Set label position of a given node group
+ */
+ function labelPosition(nodeGroup, d, yOffset) {
+ scene.selectChild(nodeGroup, "text", scene.Class.Node.LABEL).transition()
+ .attr("x", d.x)
+ .attr("y", d.y + yOffset);
+ }
+ ;
+ /**
+ * Select or append/insert shape for a node and assign renderNode
+ * as the shape's data.
+ *
+ * @param nodeGroup
+ * @param d RenderNodeInformation
+ * @param nodeClass class for the element.
+ * @param before Reference DOM node for insertion.
+ * @return Selection of the shape.
+ */
+ function buildShape(nodeGroup, d, nodeClass, before) {
+ // Create a group to house the underlying visual elements.
+ var shapeGroup = scene.selectOrCreateChild(nodeGroup, "g", nodeClass, before);
+ // TODO(jimbo): DOM structure should be templated in HTML somewhere, not JS.
+ switch (d.node.type) {
+ case graph.NodeType.OP:
+ scene.selectOrCreateChild(shapeGroup, "ellipse", scene.Class.Node.COLOR_TARGET);
+ break;
+ case graph.NodeType.SERIES:
+ // Choose the correct stamp to use to represent this series.
+ var stampType = "annotation";
+ var groupNodeInfo = d;
+ if (groupNodeInfo.coreGraph) {
+ stampType = groupNodeInfo.node.hasNonControlEdges
+ ? "vertical" : "horizontal";
+ }
+ scene.selectOrCreateChild(shapeGroup, "use", scene.Class.Node.COLOR_TARGET)
+ .attr("xlink:href", "#op-series-" + stampType + "-stamp");
+ scene.selectOrCreateChild(shapeGroup, "rect", scene.Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ case graph.NodeType.BRIDGE:
+ scene.selectOrCreateChild(shapeGroup, "rect", scene.Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ case graph.NodeType.META:
+ scene.selectOrCreateChild(shapeGroup, "rect", scene.Class.Node.COLOR_TARGET)
+ .attr({ rx: d.radius, ry: d.radius });
+ break;
+ default:
+ throw Error("Unrecognized node type: " + d.node.type);
+ }
+ return shapeGroup;
+ }
+ node_1.buildShape = buildShape;
+ ;
+ function nodeClass(d) {
+ switch (d.node.type) {
+ case graph.NodeType.OP:
+ return scene.Class.OPNODE;
+ case graph.NodeType.META:
+ return scene.Class.METANODE;
+ case graph.NodeType.SERIES:
+ return scene.Class.SERIESNODE;
+ case graph.NodeType.BRIDGE:
+ return scene.Class.BRIDGENODE;
+ case graph.NodeType.ELLIPSIS:
+ return scene.Class.ELLIPSISNODE;
+ }
+ ;
+ throw Error("Unrecognized node type: " + d.node.type);
+ }
+ node_1.nodeClass = nodeClass;
+ ;
+ /** Modify the positional attributes of a node, its subscene, and its label. */
+ function position(nodeGroup, d, sceneBehavior) {
+ var shapeGroup = scene.selectChild(nodeGroup, "g", scene.Class.Node.SHAPE);
+ switch (d.node.type) {
+ case graph.NodeType.OP: {
+ // position shape
+ var shape = scene.selectChild(shapeGroup, "ellipse");
+ scene.positionEllipse(shape, d.x, d.y, d.width, d.height);
+ labelPosition(nodeGroup, d, d.labelOffset);
+ break;
+ }
+ case graph.NodeType.META: {
+ // position shape
+ var shape = scene.selectChild(shapeGroup, "rect");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+ if (d.expanded) {
+ subscenePosition(nodeGroup, d);
+ // put label on top
+ labelPosition(nodeGroup, d, -d.height / 2 + d.labelHeight / 2);
+ }
+ else {
+ labelPosition(nodeGroup, d, 0);
+ }
+ break;
+ }
+ case graph.NodeType.SERIES: {
+ var shape = scene.selectChild(shapeGroup, "use");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+ if (d.expanded) {
+ subscenePosition(nodeGroup, d);
+ // put label on top
+ labelPosition(nodeGroup, d, -d.height / 2 + d.labelHeight / 2);
+ }
+ else {
+ labelPosition(nodeGroup, d, d.labelOffset);
+ }
+ }
+ case graph.NodeType.BRIDGE: {
+ // position shape
+ // NOTE: In reality, these will not be visible, but it helps to put them
+ // in the correct position for debugging purposes.
+ var shape = scene.selectChild(shapeGroup, "rect");
+ scene.positionRect(shape, d.x, d.y, d.width, d.height);
+ break;
+ }
+ default: {
+ throw Error("Unrecognized node type: " + d.node.type);
+ }
+ }
+ }
+ ;
+ /** Enum specifying the options to color nodes by */
+ var ColorBy = {
+ STRUCTURE: 0,
+ DEVICE: 1,
+ COMPUTE_TIME: 2,
+ MEMORY: 3
+ };
+ /**
+ * Returns the fill color for the node given its state and the "color by"
+ * option.
+ */
+ function getFillForNode(sceneBehavior, colorBy, renderInfo, isExpanded) {
+ var colorParams = tf.graph.render.MetanodeColors;
+ switch (colorBy) {
+ case ColorBy.STRUCTURE:
+ if (renderInfo.node.type === tf.graph.NodeType.META) {
+ var tid = renderInfo.node.templateId;
+ return tid === null ? colorParams.UNKNOWN : colorParams.STRUCTURE_PALETTE(sceneBehavior.templateIndex(tid), renderInfo.expanded);
+ }
+ else if (renderInfo.node.type === tf.graph.NodeType.SERIES) {
+ // If expanded, we're showing the background rect, which we want to
+ // appear gray. Otherwise we're showing a stack of ellipses which we
+ // want to show white.
+ return renderInfo.expanded ? colorParams.EXPANDED_COLOR : "white";
+ }
+ else if (renderInfo.node.type === graph.NodeType.BRIDGE) {
+ return renderInfo.structural ? "#f0e" :
+ renderInfo.node.inbound ? "#0ef" : "#fe0";
+ }
+ else {
+ // Op nodes are white.
+ return "white";
+ }
+ case ColorBy.DEVICE:
+ if (renderInfo.deviceColors == null) {
+ // Return the hue for unknown device.
+ return colorParams.UNKNOWN;
+ }
+ var id = renderInfo.node.name;
+ var escapedId = tf.escapeQuerySelector(id);
+ var gradientDefs = d3.select("svg#svg defs #linearGradients");
+ var linearGradient = gradientDefs.select("linearGradient#" + escapedId);
+ // If the linear gradient is not there yet, create it.
+ if (linearGradient.size() === 0) {
+ linearGradient = gradientDefs.append("linearGradient").attr("id", id);
+ // Re-create the stops of the linear gradient.
+ linearGradient.selectAll("*").remove();
+ var cumulativeProportion = 0;
+ // For each device, create a stop using the proportion of that device.
+ _.each(renderInfo.deviceColors, function (d) {
+ var color = d.color;
+ linearGradient.append("stop")
+ .attr("offset", cumulativeProportion)
+ .attr("stop-color", color);
+ linearGradient.append("stop")
+ .attr("offset", cumulativeProportion + d.proportion)
+ .attr("stop-color", color);
+ cumulativeProportion += d.proportion;
+ });
+ }
+ return isExpanded ? colorParams.EXPANDED_COLOR : "url(#" + escapedId + ")";
+ case ColorBy.COMPUTE_TIME:
+ return isExpanded ?
+ colorParams.EXPANDED_COLOR : renderInfo.computeTimeColor ||
+ colorParams.UNKNOWN;
+ case ColorBy.MEMORY:
+ return isExpanded ?
+ colorParams.EXPANDED_COLOR : renderInfo.memoryColor ||
+ colorParams.UNKNOWN;
+ default:
+ throw new Error("Unknown case to color nodes by");
+ }
+ }
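+ // Illustrative note (not part of the original source): for ColorBy.DEVICE,
+ // each device contributes two gradient stops of the same color, so the fill
+ // becomes a series of hard color bands sized by device proportion. E.g. a
+ // node placed 70% on one device (color A) and 30% on another (color B) gets
+ // stops at offsets 0 and 0.7 with color A, then 0.7 and 1.0 with color B.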
+ /**
+ * Modify node style by toggling classes and assigning attributes (only for
+ * things that can't be done in CSS).
+ */
+ function stylize(nodeGroup, renderInfo, sceneBehavior, nodeClass) {
+ nodeClass = nodeClass || scene.Class.Node.SHAPE;
+ var isHighlighted = sceneBehavior.isNodeHighlighted(renderInfo.node.name);
+ var isSelected = sceneBehavior.isNodeSelected(renderInfo.node.name);
+ var isExtract = renderInfo.isInExtract || renderInfo.isOutExtract;
+ var isExpanded = renderInfo.expanded;
+ nodeGroup.classed("highlighted", isHighlighted);
+ nodeGroup.classed("selected", isSelected);
+ nodeGroup.classed("extract", isExtract);
+ nodeGroup.classed("expanded", isExpanded);
+ // Main node always exists here and it will be reached before subscene,
+ // so d3 selection is fine here.
+ var node = nodeGroup.select("." + nodeClass + " ." + scene.Class.Node.COLOR_TARGET);
+ var fillColor = getFillForNode(sceneBehavior, ColorBy[sceneBehavior.colorBy.toUpperCase()], renderInfo, isExpanded);
+ node.style("fill", fillColor);
+ // Choose outline to be darker version of node color if the node is a single
+ // color and is not selected.
+ if (isSelected) {
+ node.style("stroke", null);
+ }
+ else {
+ // If node is colored by a gradient, then use a dark gray outline.
+ var outlineColor = fillColor.substring(0, 3) === "url" ?
+ tf.graph.render.MetanodeColors.GRADIENT_OUTLINE :
+ d3.rgb(fillColor).darker().toString();
+ node.style("stroke", outlineColor);
+ }
+ }
+ node_1.stylize = stylize;
+ ;
+ })(node = scene.node || (scene.node = {}));
+ })(scene = graph.scene || (graph.scene = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+<script>/// <reference path="graph.ts" />
+/// <reference path="render.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var layout;
+ (function (layout) {
+ /** Set of parameters that define the look and feel of the graph. */
+ layout.PARAMS = {
+ animation: {
+ /** Default duration for graph animations in ms. */
+ duration: 250
+ },
+ graph: {
+ /** Graph parameter for metanode. */
+ meta: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 110,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25
+ },
+ /** Graph parameter for series node. */
+ series: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 90,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25,
+ },
+ /**
+ * Padding is used to correctly position the graph SVG inside of its parent
+ * element. The padding amounts are applied using an SVG transform of X and
+ * Y coordinates.
+ */
+ padding: {
+ paddingTop: 40,
+ paddingLeft: 20
+ }
+ },
+ subscene: {
+ meta: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ /**
+ * Used to leave room for the label on top of the highest node in
+ * the core graph.
+ */
+ labelHeight: 20,
+ /** X-space between each extracted node and the core graph. */
+ extractXOffset: 50,
+ /** Y-space between each extracted node. */
+ extractYOffset: 20
+ },
+ series: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ labelHeight: 10
+ }
+ },
+ nodeSize: {
+ /** Size of meta nodes. */
+ meta: {
+ radius: 5,
+ width: 60,
+ /** A scale for the node's height based on number of nodes inside */
+ height: d3.scale.linear().domain([1, 200]).range([15, 60]).clamp(true),
+ /** The radius of the circle denoting the expand button. */
+ expandButtonRadius: 3
+ },
+ /** Size of op nodes. */
+ op: {
+ width: 15,
+ height: 6,
+ radius: 3,
+ labelOffset: -8
+ },
+ /** Size of series nodes. */
+ series: {
+ expanded: {
+ // For expanded series nodes, width and height will be
+ // computed to account for the subscene.
+ radius: 10,
+ labelOffset: 0,
+ },
+ vertical: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // one or more non-control edges will show as a vertical stack
+ // of ellipses.
+ width: 16,
+ height: 13,
+ labelOffset: -13,
+ },
+ horizontal: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // no non-control edges will show as a horizontal stack of
+ // ellipses.
+ width: 24,
+ height: 8,
+ radius: 10,
+ labelOffset: -10,
+ },
+ },
+ /** Size of bridge nodes. */
+ bridge: {
+ // NOTE: bridge nodes will normally be invisible, but they must
+ // take up some space so that the layout step leaves room for
+ // their edges.
+ width: 20,
+ height: 20,
+ radius: 2,
+ labelOffset: 0
+ }
+ },
+ shortcutSize: {
+ /** Size of shortcuts for op nodes */
+ op: {
+ width: 10,
+ height: 4
+ },
+ /** Size of shortcuts for meta nodes */
+ meta: {
+ width: 12,
+ height: 4,
+ radius: 1
+ },
+ /** Size of shortcuts for series nodes */
+ series: {
+ width: 14,
+ height: 4,
+ }
+ },
+ annotations: {
+ /** X-space between the shape and each annotation-node. */
+ xOffset: 10,
+ /** Y-space between each annotation-node. */
+ yOffset: 3,
+ /** X-space between each annotation-node and its label. */
+ labelOffset: 2,
+ /** Estimate max width for annotation label */
+ labelWidth: 35
+ },
+ constant: {
+ size: {
+ width: 4,
+ height: 4
+ }
+ },
+ series: {
+ /** Maximum number of repeated item for unexpanded series node. */
+ maxStackCount: 3,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of parallel series (series without edges between its members).
+ */
+ parallelStackOffsetRatio: 0.2,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of tower series (series with edges between its members).
+ */
+ towerStackOffsetRatio: 0.5
+ },
+ minimap: {
+ /** The maximum width/height the minimap can have. */
+ size: 150
+ }
+ };
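+ // Illustrative note (not part of the original source): nodeSize.meta.height
+ // above is a clamped linear scale from cardinality to pixels: a collapsed
+ // metanode with 1 op is 15px tall, one with 200 or more ops is 60px tall,
+ // and e.g. 100 ops gives roughly
+ //   15 + (100 - 1) / (200 - 1) * (60 - 15)  // about 37px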
+ /** Calculate layout for a scene of a group node. */
+ function scene(renderNodeInfo) {
+ // Update layout, size, and annotations of its children nodes and edges.
+ if (renderNodeInfo.node.isGroupNode) {
+ layoutChildren(renderNodeInfo);
+ }
+ // Update position of its children nodes and edges
+ if (renderNodeInfo.node.type === graph_1.NodeType.META) {
+ layoutMetanode(renderNodeInfo);
+ }
+ else if (renderNodeInfo.node.type === graph_1.NodeType.SERIES) {
+ layoutSeriesNode(renderNodeInfo);
+ }
+ }
+ layout.scene = scene;
+ ;
+ /**
+ * Update layout, size, and annotations of its children nodes and edges.
+ */
+ function layoutChildren(renderNodeInfo) {
+ var children = renderNodeInfo.coreGraph.nodes().map(function (n) {
+ return renderNodeInfo.coreGraph.node(n);
+ }).concat(renderNodeInfo.isolatedInExtract, renderNodeInfo.isolatedOutExtract);
+ _.each(children, function (childNodeInfo) {
+ // Set size of each child
+ switch (childNodeInfo.node.type) {
+ case graph_1.NodeType.OP:
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.op);
+ break;
+ case graph_1.NodeType.BRIDGE:
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.bridge);
+ break;
+ case graph_1.NodeType.META:
+ if (!childNodeInfo.expanded) {
+ // set fixed width and scalable height based on cardinality
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.meta);
+ childNodeInfo.height =
+ layout.PARAMS.nodeSize.meta.height(childNodeInfo.node.cardinality);
+ }
+ else {
+ var childGroupNodeInfo = childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ }
+ break;
+ case graph_1.NodeType.SERIES:
+ if (childNodeInfo.expanded) {
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.series.expanded);
+ var childGroupNodeInfo = childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ }
+ else {
+ var childGroupNodeInfo = childNodeInfo;
+ var seriesParams = childGroupNodeInfo.node.hasNonControlEdges ?
+ layout.PARAMS.nodeSize.series.vertical :
+ layout.PARAMS.nodeSize.series.horizontal;
+ _.extend(childNodeInfo, seriesParams);
+ }
+ break;
+ default:
+ throw Error("Unrecognized node type: " + childNodeInfo.node.type);
+ }
+ // Layout each child's annotations
+ layoutAnnotation(childNodeInfo);
+ });
+ }
+ /**
+ * Calculate layout for a graph using dagre
+ * @param graph the graph to be laid out
+ * @param params layout parameters
+ * @return width and height of the core graph
+ */
+ function dagreLayout(graph, params) {
+ _.extend(graph.graph(), {
+ nodeSep: params.nodeSep,
+ rankSep: params.rankSep
+ });
+ var bridgeNodeNames = [];
+ var nonBridgeNodeNames = [];
+ // Split out nodes into bridge and non-bridge nodes, and calculate the total
+ // width we should use for bridge nodes.
+ _.each(graph.nodes(), function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ if (nodeInfo.node.type === graph_1.NodeType.BRIDGE) {
+ bridgeNodeNames.push(nodeName);
+ }
+ else {
+ nonBridgeNodeNames.push(nodeName);
+ }
+ });
+ // If there are no non-bridge nodes, then the graph has zero size.
+ if (!nonBridgeNodeNames.length) {
+ return {
+ width: 0,
+ height: 0,
+ };
+ }
+ dagre.layout(graph);
+ var graphLabel = graph.graph();
+ // Calculate the true bounding box of the graph by iterating over nodes and
+ // edges rather than accepting dagre's word for it. In particular, we should
+ // ignore the extra-wide bridge nodes and bridge edges, and allow for
+ // annotation boxes and labels.
+ var minX = Infinity;
+ var minY = Infinity;
+ var maxX = -Infinity;
+ var maxY = -Infinity;
+ _.each(nonBridgeNodeNames, function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ var w = 0.5 * nodeInfo.width;
+ var x1 = nodeInfo.x - w - nodeInfo.inboxWidth;
+ var x2 = nodeInfo.x + w + nodeInfo.outboxWidth;
+ minX = x1 < minX ? x1 : minX;
+ maxX = x2 > maxX ? x2 : maxX;
+ var labelLength = nodeName.length - nodeName.lastIndexOf(graph_1.NAMESPACE_DELIM);
+ // TODO(jimbo): Account for font width rather than using a magic number.
+ var charWidth = 3; // 3 pixels per character.
+ var lw = 0.5 * labelLength * charWidth;
+ var lx1 = nodeInfo.x - lw;
+ var lx2 = nodeInfo.x + lw;
+ minX = lx1 < minX ? lx1 : minX;
+ maxX = lx2 > maxX ? lx2 : maxX;
+ // TODO(jimbo): Account for the height of labels above op nodes here.
+ var h = 0.5 * nodeInfo.outerHeight;
+ var y1 = nodeInfo.y - h;
+ var y2 = nodeInfo.y + h;
+ minY = y1 < minY ? y1 : minY;
+ maxY = y2 > maxY ? y2 : maxY;
+ });
+ _.each(graph.edges(), function (edgeObj) {
+ var renderMetaedgeInfo = graph.edge(edgeObj);
+ if (renderMetaedgeInfo.structural) {
+ return; // Skip structural edges from min/max calculations.
+ }
+ _.each(renderMetaedgeInfo.points, function (point) {
+ minX = point.x < minX ? point.x : minX;
+ maxX = point.x > maxX ? point.x : maxX;
+ minY = point.y < minY ? point.y : minY;
+ maxY = point.y > maxY ? point.y : maxY;
+ });
+ });
+ // Shift all nodes and edge points to account for the left-padding amount,
+ // and the invisible bridge nodes.
+ _.each(graph.nodes(), function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ nodeInfo.x -= minX;
+ nodeInfo.y -= minY;
+ });
+ _.each(graph.edges(), function (edgeObj) {
+ _.each(graph.edge(edgeObj).points, function (point) {
+ point.x -= minX;
+ point.y -= minY;
+ });
+ });
+ return {
+ width: maxX - minX,
+ height: maxY - minY,
+ };
+ }
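+ // Illustrative note (not part of the original source): the bounding box above
+ // widens each node by an estimated label width of 0.5 * labelLength * 3px on
+ // each side. Assuming NAMESPACE_DELIM is "/", a hypothetical node named
+ // "train/conv1" has labelLength = 11 - 5 = 6, so its label is assumed to
+ // extend about 9px to the left and right of the node center.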
+ /** Layout a metanode. */
+ function layoutMetanode(renderNodeInfo) {
+ // First, copy params specific to meta nodes onto this render info object.
+ var params = layout.PARAMS.subscene.meta;
+ renderNodeInfo = _.extend(renderNodeInfo, params);
+ // Invoke dagre.layout() on the core graph and record the bounding box
+ // dimensions.
+ _.extend(renderNodeInfo.coreBox, dagreLayout(renderNodeInfo.coreGraph, layout.PARAMS.graph.meta));
+ // Calculate the position of nodes in isolatedInExtract relative to the
+ // top-left corner of inExtractBox (the bounding box for all inExtract nodes)
+ // and calculate the size of the inExtractBox.
+ var hasInExtract = renderNodeInfo.isolatedInExtract.length > 0;
+ renderNodeInfo.inExtractBox.width = hasInExtract ?
+ _(renderNodeInfo.isolatedInExtract).pluck("outerWidth").max() : 0;
+ renderNodeInfo.inExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedInExtract, function (height, child, i) {
+ var yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.inExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+ // Calculate the position of nodes in isolatedOutExtract relative to the
+ // top-left corner of outExtractBox (the bounding box for all outExtract
+ // nodes) and calculate the size of the outExtractBox.
+ var hasOutExtract = renderNodeInfo.isolatedOutExtract.length > 0;
+ renderNodeInfo.outExtractBox.width = hasOutExtract ?
+ _(renderNodeInfo.isolatedOutExtract).pluck("outerWidth").max() : 0;
+ renderNodeInfo.outExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedOutExtract, function (height, child, i) {
+ var yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.outExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+ // Determine the whole metanode's width (from left to right).
+ renderNodeInfo.width =
+ params.paddingLeft + renderNodeInfo.coreBox.width + params.paddingRight +
+ (hasInExtract ?
+ renderNodeInfo.inExtractBox.width + params.extractXOffset : 0) +
+ (hasOutExtract ?
+ params.extractXOffset + renderNodeInfo.outExtractBox.width : 0);
+ // TODO(jimbo): Remove labelHeight and instead incorporate into box sizes.
+ // Determine the whole metanode's height (from top to bottom).
+ renderNodeInfo.height =
+ renderNodeInfo.labelHeight +
+ params.paddingTop +
+ Math.max(renderNodeInfo.inExtractBox.height, renderNodeInfo.coreBox.height, renderNodeInfo.outExtractBox.height) +
+ params.paddingBottom;
+ }
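+ // Illustrative note (not part of the original source): with the subscene.meta
+ // params above (paddingLeft/Right 10, extractXOffset 50), a metanode whose
+ // core graph is 300px wide with a 60px-wide in-extract column and no
+ // out-extract gets
+ //   width = 10 + 300 + 10 + (60 + 50)  // = 430px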
+ /**
+ * Calculate layout for series node's core graph. Only called for an expanded
+ * series.
+ */
+ function layoutSeriesNode(node) {
+ var graph = node.coreGraph;
+ var params = layout.PARAMS.subscene.series;
+ _.extend(node, params);
+ // Layout the core.
+ _.extend(node.coreBox, dagreLayout(node.coreGraph, layout.PARAMS.graph.series));
+ _.each(graph.nodes(), function (nodeName) {
+ graph.node(nodeName).excluded = false;
+ });
+ // Series do not have in/outExtractBox so no need to include them here.
+ node.width = node.coreBox.width + params.paddingLeft + params.paddingRight;
+ node.height = node.coreBox.height + params.paddingTop + params.paddingBottom;
+ }
+ /**
+ * Calculate layout for annotations of a given node.
+ * This will modify positions of the given node and its annotations.
+ *
+ * @see tf.graph.render.Node and tf.graph.render.Annotation
+ * for description of each property of each render node.
+ *
+ */
+ function layoutAnnotation(renderNodeInfo) {
+ // If the render node is an expanded metanode, then its annotations will not
+ // be visible and we should skip the annotation calculations.
+ if (renderNodeInfo.expanded) {
+ _.extend(renderNodeInfo, {
+ inboxWidth: 0,
+ inboxHeight: 0,
+ outboxWidth: 0,
+ outboxHeight: 0,
+ outerWidth: renderNodeInfo.width,
+ outerHeight: renderNodeInfo.height
+ });
+ return;
+ }
+ var inAnnotations = renderNodeInfo.inAnnotations.list;
+ var outAnnotations = renderNodeInfo.outAnnotations.list;
+ // Calculate size for in-annotations
+ _.each(inAnnotations, function (a) { return sizeAnnotation(a); });
+ // Calculate size for out-annotations
+ _.each(outAnnotations, function (a) { return sizeAnnotation(a); });
+ var params = layout.PARAMS.annotations;
+ renderNodeInfo.inboxWidth =
+ inAnnotations.length > 0 ?
+ _(inAnnotations).pluck("width").max() +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+ renderNodeInfo.outboxWidth =
+ outAnnotations.length > 0 ?
+ _(outAnnotations).pluck("width").max() +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for in-annotations
+ // After this chunk of code:
+ // inboxHeight = sum of annotation heights + (annotations.length - 1) * yOffset
+ var inboxHeight = _.reduce(inAnnotations, function (height, a, i) {
+ var yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = -(renderNodeInfo.width + a.width) / 2 - params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
+ _.each(inAnnotations, function (a) {
+ a.dy -= inboxHeight / 2;
+ a.labelOffset = params.labelOffset;
+ });
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for out-annotations
+ // After this chunk of code:
+ // outboxHeight = sum of annotation heights +
+ // (annotations.length - 1) * yOffset
+ var outboxHeight = _.reduce(outAnnotations, function (height, a, i) {
+ var yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = (renderNodeInfo.width + a.width) / 2 + params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
+ _.each(outAnnotations, function (a) {
+ // Adjust by half of the total height
+ // so dy is relative to the host node's center.
+ a.dy -= outboxHeight / 2;
+ a.labelOffset = params.labelOffset;
+ });
+ // Creating scales for touch point between the in-annotation edges
+ // and their hosts.
+ var inTouchHeight = Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius, inboxHeight / 2);
+ inTouchHeight = inTouchHeight < 0 ? 0 : inTouchHeight;
+ var inY = d3.scale.linear()
+ .domain([0, inAnnotations.length - 1])
+ .range([-inTouchHeight, inTouchHeight]);
+ // Calculate annotation edge position
+ _.each(inAnnotations, function (a, i) {
+ a.points = [
+ // The annotation node end
+ {
+ dx: a.dx + a.width / 2,
+ dy: a.dy
+ },
+ // The host node end
+ {
+ dx: -renderNodeInfo.width / 2,
+ // only use scale if there are more than one,
+ // otherwise center it vertically
+ dy: inAnnotations.length > 1 ? inY(i) : 0
+ }
+ ];
+ });
+ // Creating scales for touch point between the out-annotation edges
+ // and their hosts.
+ var outTouchHeight = Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius, outboxHeight / 2);
+ outTouchHeight = outTouchHeight < 0 ? 0 : outTouchHeight;
+ var outY = d3.scale.linear()
+ .domain([0, outAnnotations.length - 1])
+ .range([-outTouchHeight, outTouchHeight]);
+ _.each(outAnnotations, function (a, i) {
+ // Add point from the border of the annotation node
+ a.points = [
+ // The host node end
+ {
+ dx: renderNodeInfo.width / 2,
+ // only use scale if there are more than one,
+ // otherwise center it vertically
+ dy: outAnnotations.length > 1 ? outY(i) : 0
+ },
+ // The annotation node end
+ {
+ dx: a.dx - a.width / 2,
+ dy: a.dy
+ }
+ ];
+ });
+ renderNodeInfo.outerWidth = renderNodeInfo.width + renderNodeInfo.inboxWidth +
+ renderNodeInfo.outboxWidth;
+ renderNodeInfo.outerHeight =
+ Math.max(renderNodeInfo.height, inboxHeight, outboxHeight);
+ }
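+ // Illustrative note (not part of the original source): the reduce above
+ // stacks annotations vertically with params.yOffset (3px) between them and
+ // then re-centers the stack. E.g. three in-annotations of height 4px give
+ //   inboxHeight = 4 + (3 + 4) + (3 + 4)  // = 18px
+ // after which each dy is shifted by -9px so the stack is centered on the
+ // host node.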
+ /**
+ * Set size of an annotation node.
+ */
+ function sizeAnnotation(a) {
+ switch (a.annotationType) {
+ case graph_1.render.AnnotationType.CONSTANT:
+ _.extend(a, layout.PARAMS.constant.size);
+ break;
+ case graph_1.render.AnnotationType.SHORTCUT:
+ if (a.node.type === graph_1.NodeType.OP) {
+ _.extend(a, layout.PARAMS.shortcutSize.op);
+ }
+ else if (a.node.type === graph_1.NodeType.META) {
+ _.extend(a, layout.PARAMS.shortcutSize.meta);
+ }
+ else if (a.node.type === graph_1.NodeType.SERIES) {
+ _.extend(a, layout.PARAMS.shortcutSize.series);
+ }
+ else {
+ throw Error("Invalid node type: " + a.node.type);
+ }
+ break;
+ case graph_1.render.AnnotationType.SUMMARY:
+ _.extend(a, layout.PARAMS.constant.size);
+ break;
+ }
+ }
+ })(layout = graph_1.layout || (graph_1.layout = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+<script>var tf;
+(function (tf) {
+ /**
+ * Mapping from color palette name to color palette, which contains
+ * exact colors for multiple states of a single color palette.
+ */
+ tf.COLORS = [
+ {
+ "name": "Google Blue",
+ "color": "#4184f3",
+ "active": "#3a53c5",
+ "disabled": "#cad8fc"
+ },
+ {
+ "name": "Google Red",
+ "color": "#db4437",
+ "active": "#8f2a0c",
+ "disabled": "#e8c6c1"
+ },
+ {
+ "name": "Google Yellow",
+ "color": "#f4b400",
+ "active": "#db9200",
+ "disabled": "#f7e8b0"
+ },
+ {
+ "name": "Google Green",
+ "color": "#0f9d58",
+ "active": "#488046",
+ "disabled": "#c2e1cc"
+ },
+ {
+ "name": "Purple",
+ "color": "#aa46bb",
+ "active": "#5c1398",
+ "disabled": "#d7bce6"
+ },
+ {
+ "name": "Teal",
+ "color": "#00abc0",
+ "active": "#47828e",
+ "disabled": "#c2eaf2"
+ },
+ {
+ "name": "Deep Orange",
+ "color": "#ff6f42",
+ "active": "#ca4a06",
+ "disabled": "#f2cbba"
+ },
+ {
+ "name": "Lime",
+ "color": "#9d9c23",
+ "active": "#7f771d",
+ "disabled": "#f1f4c2"
+ },
+ {
+ "name": "Indigo",
+ "color": "#5b6abf",
+ "active": "#3e47a9",
+ "disabled": "#c5c8e8"
+ },
+ {
+ "name": "Pink",
+ "color": "#ef6191",
+ "active": "#ca1c60",
+ "disabled": "#e9b9ce"
+ },
+ {
+ "name": "Deep Teal",
+ "color": "#00786a",
+ "active": "#2b4f43",
+ "disabled": "#bededa"
+ },
+ {
+ "name": "Deep Pink",
+ "color": "#c1175a",
+ "active": "#75084f",
+ "disabled": "#de8cae"
+ },
+ {
+ "name": "Gray",
+ "color": "#9E9E9E",
+ "active": "#424242",
+ "disabled": "#F5F5F5" // 100
+ }
+ ].reduce(function (m, c) {
+ m[c.name] = c;
+ return m;
+ }, {});
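+ // Illustrative note (not part of the original source): the reduce above keys
+ // the palette by display name, so lookups read e.g.
+ //   tf.COLORS["Google Blue"].color     // "#4184f3"
+ //   tf.COLORS["Google Blue"].disabled  // "#cad8fc"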
+ /**
+ * Mapping from op category to color palette name
+ * e.g., OP_GROUP_COLORS["state_ops"] = "Google Blue";
+ */
+ tf.OP_GROUP_COLORS = [
+ {
+ color: "Google Red",
+ groups: ["gen_legacy_ops", "legacy_ops", "legacy_flogs_input",
+ "legacy_image_input", "legacy_input_example_input",
+ "legacy_sequence_input", "legacy_seti_input_input"]
+ }, {
+ color: "Deep Orange",
+ groups: ["constant_ops"]
+ }, {
+ color: "Indigo",
+ groups: ["state_ops"]
+ }, {
+ color: "Purple",
+ groups: ["nn_ops", "nn"]
+ }, {
+ color: "Google Green",
+ groups: ["math_ops"]
+ }, {
+ color: "Lime",
+ groups: ["array_ops"]
+ }, {
+ color: "Teal",
+ groups: ["control_flow_ops", "data_flow_ops"]
+ }, {
+ color: "Pink",
+ groups: ["summary_ops"]
+ }, {
+ color: "Deep Pink",
+ groups: ["io_ops"]
+ }
+ ].reduce(function (m, c) {
+ c.groups.forEach(function (group) {
+ m[group] = c.color;
+ });
+ return m;
+ }, {});
+})(tf || (tf = {}));
+</script>
+<script>/// <reference path="../../../../typings/tsd.d.ts" />
+/// <reference path="../common.ts" />
+var tf;
+(function (tf) {
+ var scene;
+ (function (scene) {
+ /** Show the minimap when the viewpoint area is less than FRAC_VIEWPOINT_AREA (80%) of the whole minimap area. */
+ var FRAC_VIEWPOINT_AREA = 0.8;
+ var Minimap = (function () {
+ /**
+ * Constructs a new minimap.
+ *
+ * @param svg The main svg element.
+ * @param zoomG The svg group used for panning and zooming the main svg.
+ * @param mainZoom The main zoom behavior.
+ * @param minimap The minimap container.
+ * @param maxWandH The maximum width/height for the minimap.
+ * @param labelPadding Padding in pixels due to the main graph labels.
+ */
+ function Minimap(svg, zoomG, mainZoom, minimap, maxWandH, labelPadding) {
+ var _this = this;
+ this.svg = svg;
+ this.labelPadding = labelPadding;
+ this.zoomG = zoomG;
+ this.mainZoom = mainZoom;
+ this.maxWandH = maxWandH;
+ var $minimap = d3.select(minimap);
+ // The minimap will have 2 main components: the canvas showing the content
+ // and an svg showing a rectangle of the currently zoomed/panned viewpoint.
+ var $minimapSvg = $minimap.select("svg");
+ // Make the viewpoint rectangle draggable.
+ var $viewpoint = $minimapSvg.select("rect");
+ var dragmove = function (d) {
+ _this.viewpointCoord.x = d3.event.x;
+ _this.viewpointCoord.y = d3.event.y;
+ _this.updateViewpoint();
+ };
+ this.viewpointCoord = { x: 0, y: 0 };
+ var drag = d3.behavior.drag().origin(Object).on("drag", dragmove);
+ $viewpoint.datum(this.viewpointCoord).call(drag);
+ // Make the minimap clickable.
+ $minimapSvg.on("click", function () {
+ if (d3.event.defaultPrevented) {
+ // This click was part of a drag event, so suppress it.
+ return;
+ }
+ // Update the coordinates of the viewpoint.
+ var width = Number($viewpoint.attr("width"));
+ var height = Number($viewpoint.attr("height"));
+ var clickCoords = d3.mouse($minimapSvg.node());
+ _this.viewpointCoord.x = clickCoords[0] - width / 2;
+ _this.viewpointCoord.y = clickCoords[1] - height / 2;
+ _this.updateViewpoint();
+ });
+ this.viewpoint = $viewpoint.node();
+ this.minimapSvg = $minimapSvg.node();
+ this.minimap = minimap;
+ this.canvas = $minimap.select("canvas.first").node();
+ this.canvasBuffer =
+ $minimap.select("canvas.second").node();
+ }
+ /**
+ * Updates the position and the size of the viewpoint rectangle.
+ * It also notifies the main svg about the new panned position.
+ */
+ Minimap.prototype.updateViewpoint = function () {
+ // Update the coordinates of the viewpoint rectangle.
+ d3.select(this.viewpoint)
+ .attr("x", this.viewpointCoord.x)
+ .attr("y", this.viewpointCoord.y);
+ // Update the translation vector of the main svg to reflect the
+ // new viewpoint.
+ var mainX = -this.viewpointCoord.x * this.scaleMain / this.scaleMinimap;
+ var mainY = -this.viewpointCoord.y * this.scaleMain / this.scaleMinimap;
+ var zoomEvent = this.mainZoom.translate([mainX, mainY]).event;
+ d3.select(this.zoomG).call(zoomEvent);
+ };
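+ // Illustrative note (not part of the original source): the translation above
+ // inverts the minimap scaling. E.g. if scaleMinimap is 0.1 and scaleMain is
+ // 1, dragging the viewpoint rectangle to x = 20 in minimap coordinates pans
+ // the main svg to
+ //   mainX = -20 * 1 / 0.1  // = -200px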
+ /**
+ * Redraws the minimap. Should be called whenever the main svg
+ * was updated (e.g. when a node was expanded).
+ */
+ Minimap.prototype.update = function () {
+ var _this = this;
+ var $svg = d3.select(this.svg);
+ // Read all the style rules in the document and embed them into the svg.
+ // The svg needs to be self-contained, i.e. all the style rules need to be
+ // embedded so the canvas output matches the original.
+ var stylesText = "";
+ for (var k = 0; k < document.styleSheets.length; k++) {
+ try {
+ var cssRules = document.styleSheets[k].cssRules ||
+ document.styleSheets[k].rules;
+ if (cssRules == null) {
+ continue;
+ }
+ for (var i = 0; i < cssRules.length; i++) {
+ stylesText += cssRules[i].cssText + "\n";
+ }
+ }
+ catch (e) {
+ if (e.name !== "SecurityError") {
+ throw e;
+ }
+ }
+ }
+ // Temporarily add the css rules to the main svg.
+ var svgStyle = $svg.append("style");
+ svgStyle.text(stylesText);
+ // Temporarily remove the zoom/pan transform from the main svg since we
+ // want the minimap to show a zoomed-out and centered view.
+ var $zoomG = d3.select(this.zoomG);
+ var zoomTransform = $zoomG.attr("transform");
+ $zoomG.attr("transform", null);
+ // Get the size of the entire scene.
+ var sceneSize = this.zoomG.getBBox();
+ // Since we add padding, account for that here.
+ sceneSize.height += this.labelPadding;
+ // Temporarily assign an explicit width/height to the main svg, since
+ // it doesn't have one (uses flex-box), but we need it for the canvas
+ // to work.
+ $svg.attr({
+ width: sceneSize.width,
+ height: sceneSize.height,
+ });
+ // Since the content inside the svg changed (e.g. a node was expanded),
+ // the aspect ratio has also changed. Thus, we need to update the scale
+ // factor of the minimap. The scale factor is determined such that both
+ // the width and height of the minimap are <= maximum specified w/h.
+ this.scaleMinimap =
+ this.maxWandH / Math.max(sceneSize.width, sceneSize.height);
+ this.minimapSize = {
+ width: sceneSize.width * this.scaleMinimap,
+ height: sceneSize.height * this.scaleMinimap
+ };
+ // Update the size of the minimap's svg, the buffer canvas and the
+ // viewpoint rect.
+ d3.select(this.minimapSvg).attr(this.minimapSize);
+ d3.select(this.canvasBuffer).attr(this.minimapSize);
+ if (this.translate != null && this.zoom != null) {
+ // Update the viewpoint rectangle shape since the aspect ratio of the
+ // map has changed.
+ requestAnimationFrame(function () { return _this.zoom(); });
+ }
+ // Serialize the main svg to a string which will be used as the rendering
+ // content for the canvas.
+ var svgXml = (new XMLSerializer()).serializeToString(this.svg);
+ // Now that the svg is serialized for rendering, remove the temporarily
+ // assigned styles, explicit width and height and bring back the pan/zoom
+ // transform.
+ svgStyle.remove();
+ $svg.attr({
+ width: null,
+ height: null
+ });
+ $zoomG.attr("transform", zoomTransform);
+ var image = new Image();
+ image.onload = function () {
+ // Draw the svg content onto the buffer canvas.
+ var context = _this.canvasBuffer.getContext("2d");
+ context.clearRect(0, 0, _this.canvasBuffer.width, _this.canvasBuffer.height);
+ context.drawImage(image, 0, 0, _this.minimapSize.width, _this.minimapSize.height);
+ requestAnimationFrame(function () {
+ // Hide the old canvas and show the new buffer canvas.
+ d3.select(_this.canvasBuffer).style("display", null);
+ d3.select(_this.canvas).style("display", "none");
+ // Swap the two canvases.
+ _a = [_this.canvasBuffer, _this.canvas], _this.canvas = _a[0], _this.canvasBuffer = _a[1];
+ var _a;
+ });
+ };
+ image.src = "data:image/svg+xml;base64," + btoa(svgXml);
+ };
+ /**
+ * Handles changes in zooming/panning. Should be called from the main svg
+ * to notify that a zoom/pan was performed and this minimap will update its
+ * viewpoint rectangle.
+ *
+ * @param translate The translate vector, or none to use the last used one.
+ * @param scale The scaling factor, or none to use the last used one.
+ */
+ Minimap.prototype.zoom = function (translate, scale) {
+ // Update the new translate and scale params, only if specified.
+ this.translate = translate || this.translate;
+ this.scaleMain = scale || this.scaleMain;
+ // Update the location of the viewpoint rectangle.
+ var svgRect = this.svg.getBoundingClientRect();
+ var $viewpoint = d3.select(this.viewpoint);
+ this.viewpointCoord.x = -this.translate[0] * this.scaleMinimap /
+ this.scaleMain;
+ this.viewpointCoord.y = -this.translate[1] * this.scaleMinimap /
+ this.scaleMain;
+ var viewpointWidth = svgRect.width * this.scaleMinimap / this.scaleMain;
+ var viewpointHeight = svgRect.height * this.scaleMinimap / this.scaleMain;
+ $viewpoint.attr({
+ x: this.viewpointCoord.x,
+ y: this.viewpointCoord.y,
+ width: viewpointWidth,
+ height: viewpointHeight
+ });
+ // Show/hide the minimap depending on the viewpoint area as fraction of the
+ // whole minimap.
+ var mapWidth = this.minimapSize.width;
+ var mapHeight = this.minimapSize.height;
+ var x = this.viewpointCoord.x;
+ var y = this.viewpointCoord.y;
+ var w = Math.min(Math.max(0, x + viewpointWidth), mapWidth) -
+ Math.min(Math.max(0, x), mapWidth);
+ var h = Math.min(Math.max(0, y + viewpointHeight), mapHeight) -
+ Math.min(Math.max(0, y), mapHeight);
+ var fracIntersect = (w * h) / (mapWidth * mapHeight);
+ if (fracIntersect < FRAC_VIEWPOINT_AREA) {
+ this.minimap.classList.remove("hidden");
+ }
+ else {
+ this.minimap.classList.add("hidden");
+ }
+ };
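+ // Illustrative note (not part of the original source): the minimap hides
+ // itself once the viewpoint rectangle covers at least FRAC_VIEWPOINT_AREA
+ // (80%) of it. E.g. a 120x100 viewpoint fully inside a 150x120 minimap gives
+ //   fracIntersect = (120 * 100) / (150 * 120)  // about 0.67, so still shown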
+ return Minimap;
+ })();
+ scene.Minimap = Minimap;
+ })(scene = tf.scene || (tf.scene = {}));
+})(tf || (tf = {})); // close module tf.scene
+</script>
+<script>/// <reference path="graph.ts" />
+/// <reference path="render.ts" />
+var tf;
+(function (tf) {
+ var graph;
+ (function (graph_1) {
+ var layout;
+ (function (layout) {
+ /** Set of parameters that define the look and feel of the graph. */
+ layout.PARAMS = {
+ animation: {
+ /** Default duration for graph animations in ms. */
+ duration: 250
+ },
+ graph: {
+ /** Graph parameter for metanode. */
+ meta: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 110,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25
+ },
+ /** Graph parameter for series node. */
+ series: {
+ /**
+ * Dagre's nodesep param - number of pixels that
+ * separate nodes horizontally in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ nodeSep: 90,
+ /**
+ * Dagre's ranksep param - number of pixels
+ * between each rank in the layout.
+ *
+ * See https://github.com/cpettitt/dagre/wiki#configuring-the-layout
+ */
+ rankSep: 25,
+ },
+ /**
+ * Padding is used to correctly position the graph SVG inside of its parent
+ * element. The padding amounts are applied using an SVG transform of X and
+ * Y coordinates.
+ */
+ padding: {
+ paddingTop: 40,
+ paddingLeft: 20
+ }
+ },
+ subscene: {
+ meta: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ /**
+ * Used to leave room for the label on top of the highest node in
+ * the core graph.
+ */
+ labelHeight: 20,
+ /** X-space between each extracted node and the core graph. */
+ extractXOffset: 50,
+ /** Y-space between each extracted node. */
+ extractYOffset: 20
+ },
+ series: {
+ paddingTop: 10,
+ paddingBottom: 10,
+ paddingLeft: 10,
+ paddingRight: 10,
+ labelHeight: 10
+ }
+ },
+ nodeSize: {
+ /** Size of meta nodes. */
+ meta: {
+ radius: 5,
+ width: 60,
+ /** A scale for the node's height based on number of nodes inside */
+ height: d3.scale.linear().domain([1, 200]).range([15, 60]).clamp(true),
+ /** The radius of the circle denoting the expand button. */
+ expandButtonRadius: 3
+ },
+ /** Size of op nodes. */
+ op: {
+ width: 15,
+ height: 6,
+ radius: 3,
+ labelOffset: -8
+ },
+ /** Size of series nodes. */
+ series: {
+ expanded: {
+ // For expanded series nodes, width and height will be
+ // computed to account for the subscene.
+ radius: 10,
+ labelOffset: 0,
+ },
+ vertical: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // one or more non-control edges will show as a vertical stack
+ // of ellipses.
+ width: 16,
+ height: 13,
+ labelOffset: -13,
+ },
+ horizontal: {
+ // When unexpanded, series whose underlying metagraphs contain
+ // no non-control edges will show as a horizontal stack of
+ // ellipses.
+ width: 24,
+ height: 8,
+ radius: 10,
+ labelOffset: -10,
+ },
+ },
+ /** Size of bridge nodes. */
+ bridge: {
+ // NOTE: bridge nodes will normally be invisible, but they must
+ // take up some space so that the layout step leaves room for
+ // their edges.
+ width: 20,
+ height: 20,
+ radius: 2,
+ labelOffset: 0
+ }
+ },
+ shortcutSize: {
+ /** Size of shortcuts for op nodes */
+ op: {
+ width: 10,
+ height: 4
+ },
+ /** Size of shortcuts for meta nodes */
+ meta: {
+ width: 12,
+ height: 4,
+ radius: 1
+ },
+ /** Size of shortcuts for series nodes */
+ series: {
+ width: 14,
+ height: 4,
+ }
+ },
+ annotations: {
+ /** X-space between the shape and each annotation-node. */
+ xOffset: 10,
+ /** Y-space between each annotation-node. */
+ yOffset: 3,
+ /** X-space between each annotation-node and its label. */
+ labelOffset: 2,
+ /** Estimate max width for annotation label */
+ labelWidth: 35
+ },
+ constant: {
+ size: {
+ width: 4,
+ height: 4
+ }
+ },
+ series: {
+ /** Maximum number of repeated item for unexpanded series node. */
+ maxStackCount: 3,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of parallel series (series without edges between its members).
+ */
+ parallelStackOffsetRatio: 0.2,
+ /**
+ * Positioning offset ratio for collapsed stack
+ * of tower series (series with edges between its members).
+ */
+ towerStackOffsetRatio: 0.5
+ },
+ minimap: {
+ /** The maximum width/height the minimap can have. */
+ size: 150
+ }
+ };
+ /** Calculate layout for a scene of a group node. */
+ function scene(renderNodeInfo) {
+ // Update layout, size, and annotations of its children nodes and edges.
+ if (renderNodeInfo.node.isGroupNode) {
+ layoutChildren(renderNodeInfo);
+ }
+ // Update position of its children nodes and edges
+ if (renderNodeInfo.node.type === graph_1.NodeType.META) {
+ layoutMetanode(renderNodeInfo);
+ }
+ else if (renderNodeInfo.node.type === graph_1.NodeType.SERIES) {
+ layoutSeriesNode(renderNodeInfo);
+ }
+ }
+ layout.scene = scene;
+ ;
+ /**
+ * Update layout, size, and annotations of its children nodes and edges.
+ */
+ function layoutChildren(renderNodeInfo) {
+ var children = renderNodeInfo.coreGraph.nodes().map(function (n) {
+ return renderNodeInfo.coreGraph.node(n);
+ }).concat(renderNodeInfo.isolatedInExtract, renderNodeInfo.isolatedOutExtract);
+ _.each(children, function (childNodeInfo) {
+ // Set size of each child
+ switch (childNodeInfo.node.type) {
+ case graph_1.NodeType.OP:
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.op);
+ break;
+ case graph_1.NodeType.BRIDGE:
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.bridge);
+ break;
+ case graph_1.NodeType.META:
+ if (!childNodeInfo.expanded) {
+ // set fixed width and scalable height based on cardinality
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.meta);
+ childNodeInfo.height =
+ layout.PARAMS.nodeSize.meta.height(childNodeInfo.node.cardinality);
+ }
+ else {
+ var childGroupNodeInfo = childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ }
+ break;
+ case graph_1.NodeType.SERIES:
+ if (childNodeInfo.expanded) {
+ _.extend(childNodeInfo, layout.PARAMS.nodeSize.series.expanded);
+ var childGroupNodeInfo = childNodeInfo;
+ scene(childGroupNodeInfo); // Recursively layout its subscene.
+ }
+ else {
+ var childGroupNodeInfo = childNodeInfo;
+ var seriesParams = childGroupNodeInfo.node.hasNonControlEdges ?
+ layout.PARAMS.nodeSize.series.vertical :
+ layout.PARAMS.nodeSize.series.horizontal;
+ _.extend(childNodeInfo, seriesParams);
+ }
+ break;
+ default:
+ throw Error("Unrecognized node type: " + childNodeInfo.node.type);
+ }
+ // Layout each child's annotations
+ layoutAnnotation(childNodeInfo);
+ });
+ }
+ /**
+ * Calculate layout for a graph using dagre
+ * @param graph the graph to be laid out
+ * @param params layout parameters
+ * @return width and height of the core graph
+ */
+ function dagreLayout(graph, params) {
+ _.extend(graph.graph(), {
+ nodeSep: params.nodeSep,
+ rankSep: params.rankSep
+ });
+ var bridgeNodeNames = [];
+ var nonBridgeNodeNames = [];
+ // Split out nodes into bridge and non-bridge nodes, and calculate the total
+ // width we should use for bridge nodes.
+ _.each(graph.nodes(), function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ if (nodeInfo.node.type === graph_1.NodeType.BRIDGE) {
+ bridgeNodeNames.push(nodeName);
+ }
+ else {
+ nonBridgeNodeNames.push(nodeName);
+ }
+ });
+ // If there are no non-bridge nodes, then the graph has zero size.
+ if (!nonBridgeNodeNames.length) {
+ return {
+ width: 0,
+ height: 0,
+ };
+ }
+ dagre.layout(graph);
+ var graphLabel = graph.graph();
+ // Calculate the true bounding box of the graph by iterating over nodes and
+ // edges rather than accepting dagre's word for it. In particular, we should
+ // ignore the extra-wide bridge nodes and bridge edges, and allow for
+ // annotation boxes and labels.
+ var minX = Infinity;
+ var minY = Infinity;
+ var maxX = -Infinity;
+ var maxY = -Infinity;
+ _.each(nonBridgeNodeNames, function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ var w = 0.5 * nodeInfo.width;
+ var x1 = nodeInfo.x - w - nodeInfo.inboxWidth;
+ var x2 = nodeInfo.x + w + nodeInfo.outboxWidth;
+ minX = x1 < minX ? x1 : minX;
+ maxX = x2 > maxX ? x2 : maxX;
+ var labelLength = nodeName.length - nodeName.lastIndexOf(graph_1.NAMESPACE_DELIM);
+ // TODO(jimbo): Account for font width rather than using a magic number.
+ var charWidth = 3; // 3 pixels per character.
+ var lw = 0.5 * labelLength * charWidth;
+ var lx1 = nodeInfo.x - lw;
+ var lx2 = nodeInfo.x + lw;
+ minX = lx1 < minX ? lx1 : minX;
+ maxX = lx2 > maxX ? lx2 : maxX;
+ // TODO(jimbo): Account for the height of labels above op nodes here.
+ var h = 0.5 * nodeInfo.outerHeight;
+ var y1 = nodeInfo.y - h;
+ var y2 = nodeInfo.y + h;
+ minY = y1 < minY ? y1 : minY;
+ maxY = y2 > maxY ? y2 : maxY;
+ });
+ _.each(graph.edges(), function (edgeObj) {
+ var renderMetaedgeInfo = graph.edge(edgeObj);
+ if (renderMetaedgeInfo.structural) {
+ return; // Skip structural edges from min/max calculations.
+ }
+ _.each(renderMetaedgeInfo.points, function (point) {
+ minX = point.x < minX ? point.x : minX;
+ maxX = point.x > maxX ? point.x : maxX;
+ minY = point.y < minY ? point.y : minY;
+ maxY = point.y > maxY ? point.y : maxY;
+ });
+ });
+ // Shift all nodes and edge points to account for the left-padding amount,
+ // and the invisible bridge nodes.
+ _.each(graph.nodes(), function (nodeName) {
+ var nodeInfo = graph.node(nodeName);
+ nodeInfo.x -= minX;
+ nodeInfo.y -= minY;
+ });
+ _.each(graph.edges(), function (edgeObj) {
+ _.each(graph.edge(edgeObj).points, function (point) {
+ point.x -= minX;
+ point.y -= minY;
+ });
+ });
+ return {
+ width: maxX - minX,
+ height: maxY - minY,
+ };
+ }
+ /** Layout a metanode. */
+ function layoutMetanode(renderNodeInfo) {
+ // First, copy params specific to meta nodes onto this render info object.
+ var params = layout.PARAMS.subscene.meta;
+ renderNodeInfo = _.extend(renderNodeInfo, params);
+ // Invoke dagre.layout() on the core graph and record the bounding box
+ // dimensions.
+ _.extend(renderNodeInfo.coreBox, dagreLayout(renderNodeInfo.coreGraph, layout.PARAMS.graph.meta));
+ // Calculate the position of nodes in isolatedInExtract relative to the
+ // top-left corner of inExtractBox (the bounding box for all inExtract nodes)
+ // and calculate the size of the inExtractBox.
+ var hasInExtract = renderNodeInfo.isolatedInExtract.length > 0;
+ renderNodeInfo.inExtractBox.width = hasInExtract ?
+ _(renderNodeInfo.isolatedInExtract).pluck("outerWidth").max() : 0;
+ renderNodeInfo.inExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedInExtract, function (height, child, i) {
+ var yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.inExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+ // Calculate the position of nodes in isolatedOutExtract relative to the
+ // top-left corner of outExtractBox (the bounding box for all outExtract
+ // nodes) and calculate the size of the outExtractBox.
+ var hasOutExtract = renderNodeInfo.isolatedOutExtract.length > 0;
+ renderNodeInfo.outExtractBox.width = hasOutExtract ?
+ _(renderNodeInfo.isolatedOutExtract).pluck("outerWidth").max() : 0;
+ renderNodeInfo.outExtractBox.height =
+ _.reduce(renderNodeInfo.isolatedOutExtract, function (height, child, i) {
+ var yOffset = i > 0 ? params.extractYOffset : 0;
+ // use outerWidth/Height here to avoid overlaps between extracts
+ child.x = renderNodeInfo.outExtractBox.width / 2;
+ child.y = height + yOffset + child.outerHeight / 2;
+ return height + yOffset + child.outerHeight;
+ }, 0);
+ // Determine the whole metanode's width (from left to right).
+ renderNodeInfo.width =
+ params.paddingLeft + renderNodeInfo.coreBox.width + params.paddingRight +
+ (hasInExtract ?
+ renderNodeInfo.inExtractBox.width + params.extractXOffset : 0) +
+ (hasOutExtract ?
+ params.extractXOffset + renderNodeInfo.outExtractBox.width : 0);
+ // TODO(jimbo): Remove labelHeight and instead incorporate into box sizes.
+ // Determine the whole metanode's height (from top to bottom).
+ renderNodeInfo.height =
+ renderNodeInfo.labelHeight +
+ params.paddingTop +
+ Math.max(renderNodeInfo.inExtractBox.height, renderNodeInfo.coreBox.height, renderNodeInfo.outExtractBox.height) +
+ params.paddingBottom;
+ }
+ /**
+ * Calculate layout for series node's core graph. Only called for an expanded
+ * series.
+ */
+ function layoutSeriesNode(node) {
+ var graph = node.coreGraph;
+ var params = layout.PARAMS.subscene.series;
+ _.extend(node, params);
+ // Layout the core.
+ _.extend(node.coreBox, dagreLayout(node.coreGraph, layout.PARAMS.graph.series));
+ _.each(graph.nodes(), function (nodeName) {
+ graph.node(nodeName).excluded = false;
+ });
+ // Series do not have in/outExtractBox so no need to include them here.
+ node.width = node.coreBox.width + params.paddingLeft + params.paddingRight;
+ node.height = node.coreBox.height + params.paddingTop + params.paddingBottom;
+ }
+ /**
+ * Calculate layout for annotations of a given node.
+ * This will modify positions of the given node and its annotations.
+ *
+ * @see tf.graph.render.Node and tf.graph.render.Annotation
+ * for description of each property of each render node.
+ *
+ */
+ function layoutAnnotation(renderNodeInfo) {
+ // If the render node is an expanded metanode, then its annotations will not
+ // be visible and we should skip the annotation calculations.
+ if (renderNodeInfo.expanded) {
+ _.extend(renderNodeInfo, {
+ inboxWidth: 0,
+ inboxHeight: 0,
+ outboxWidth: 0,
+ outboxHeight: 0,
+ outerWidth: renderNodeInfo.width,
+ outerHeight: renderNodeInfo.height
+ });
+ return;
+ }
+ var inAnnotations = renderNodeInfo.inAnnotations.list;
+ var outAnnotations = renderNodeInfo.outAnnotations.list;
+ // Calculate size for in-annotations
+ _.each(inAnnotations, function (a) { return sizeAnnotation(a); });
+ // Calculate size for out-annotations
+ _.each(outAnnotations, function (a) { return sizeAnnotation(a); });
+ var params = layout.PARAMS.annotations;
+ renderNodeInfo.inboxWidth =
+ inAnnotations.length > 0 ?
+ _(inAnnotations).pluck("width").max() +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+ renderNodeInfo.outboxWidth =
+ outAnnotations.length > 0 ?
+ _(outAnnotations).pluck("width").max() +
+ params.xOffset + params.labelWidth + params.labelOffset :
+ 0;
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for in-annotations
+ // After this chunk of code:
+ // inboxHeight = sum of annotation heights + (annotations.length - 1) * yOffset
+ var inboxHeight = _.reduce(inAnnotations, function (height, a, i) {
+ var yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = -(renderNodeInfo.width + a.width) / 2 - params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
+ _.each(inAnnotations, function (a) {
+ a.dy -= inboxHeight / 2;
+ a.labelOffset = params.labelOffset;
+ });
+ // Calculate annotation node position (a.dx, a.dy)
+ // and total height for out-annotations
+ // After this chunk of code:
+ // outboxHeight = sum of annotation heights +
+ // (annotations.length - 1) * yOffset
+ var outboxHeight = _.reduce(outAnnotations, function (height, a, i) {
+ var yOffset = i > 0 ? params.yOffset : 0;
+ a.dx = (renderNodeInfo.width + a.width) / 2 + params.xOffset;
+ a.dy = height + yOffset + a.height / 2;
+ return height + yOffset + a.height;
+ }, 0);
+ _.each(outAnnotations, function (a) {
+ // Adjust by half of the total height
+ // so dy is relative to the host node's center.
+ a.dy -= outboxHeight / 2;
+ a.labelOffset = params.labelOffset;
+ });
+ // Creating scales for touch point between the in-annotation edges
+ // and their hosts.
+ var inTouchHeight = Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius, inboxHeight / 2);
+ inTouchHeight = inTouchHeight < 0 ? 0 : inTouchHeight;
+ var inY = d3.scale.linear()
+ .domain([0, inAnnotations.length - 1])
+ .range([-inTouchHeight, inTouchHeight]);
+ // Calculate annotation edge position
+ _.each(inAnnotations, function (a, i) {
+ a.points = [
+ // The annotation node end
+ {
+ dx: a.dx + a.width / 2,
+ dy: a.dy
+ },
+ // The host node end
+ {
+ dx: -renderNodeInfo.width / 2,
+ // only use scale if there are more than one,
+ // otherwise center it vertically
+ dy: inAnnotations.length > 1 ? inY(i) : 0
+ }
+ ];
+ });
+ // Creating scales for touch point between the out-annotation edges
+ // and their hosts.
+ var outTouchHeight = Math.min(renderNodeInfo.height / 2 - renderNodeInfo.radius, outboxHeight / 2);
+ outTouchHeight = outTouchHeight < 0 ? 0 : outTouchHeight;
+ var outY = d3.scale.linear()
+ .domain([0, outAnnotations.length - 1])
+ .range([-outTouchHeight, outTouchHeight]);
+ _.each(outAnnotations, function (a, i) {
+ // Add point from the border of the annotation node
+ a.points = [
+ // The host node end
+ {
+ dx: renderNodeInfo.width / 2,
+ // only use scale if there are more than one,
+ // otherwise center it vertically
+ dy: outAnnotations.length > 1 ? outY(i) : 0
+ },
+ // The annotation node end
+ {
+ dx: a.dx - a.width / 2,
+ dy: a.dy
+ }
+ ];
+ });
+ renderNodeInfo.outerWidth = renderNodeInfo.width + renderNodeInfo.inboxWidth +
+ renderNodeInfo.outboxWidth;
+ renderNodeInfo.outerHeight =
+ Math.max(renderNodeInfo.height, inboxHeight, outboxHeight);
+ }
+ /**
+ * Set size of an annotation node.
+ */
+ function sizeAnnotation(a) {
+ switch (a.annotationType) {
+ case graph_1.render.AnnotationType.CONSTANT:
+ _.extend(a, layout.PARAMS.constant.size);
+ break;
+ case graph_1.render.AnnotationType.SHORTCUT:
+ if (a.node.type === graph_1.NodeType.OP) {
+ _.extend(a, layout.PARAMS.shortcutSize.op);
+ }
+ else if (a.node.type === graph_1.NodeType.META) {
+ _.extend(a, layout.PARAMS.shortcutSize.meta);
+ }
+ else if (a.node.type === graph_1.NodeType.SERIES) {
+ _.extend(a, layout.PARAMS.shortcutSize.series);
+ }
+ else {
+ throw Error("Invalid node type: " + a.node.type);
+ }
+ break;
+ case graph_1.render.AnnotationType.SUMMARY:
+ _.extend(a, layout.PARAMS.constant.size);
+ break;
+ }
+ }
+ })(layout = graph_1.layout || (graph_1.layout = {}));
+ })(graph = tf.graph || (tf.graph = {}));
+})(tf || (tf = {})); // close module
+</script>
+
+
+
+
+
+
+
+</head><body><div hidden="" by-vulcanize=""><dom-module id="tf-data-coordinator" assetpath="../components/tf-event-dashboard/">
+ <script>/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+var TF;
+(function (TF) {
+ /* The DataCoordinator generates TF.Datasets for each run/tag combination,
+ * and is responsible for communicating with the backend to load data into them.
+ * A key fact about this design is that when Datasets modify their data, they
+ * automatically notify all dependent Plottable charts.
+ */
+ var DataCoordinator = (function () {
+ function DataCoordinator(urlGenerator, runToTag) {
+ this.datasets = {};
+ this.urlGenerator = urlGenerator;
+ this.runToTag = runToTag;
+ }
+ /* Create or return an array of Datasets for the given
+ * tag and runs. It filters which runs it uses by checking
+ * that data exists for each tag-run combination.
+ * Calling this triggers a load on each returned dataset.
+ */
+ DataCoordinator.prototype.getDatasets = function (tag, runs) {
+ var _this = this;
+ var usableRuns = runs.filter(function (r) {
+ var tags = _this.runToTag[r];
+ return tags.indexOf(tag) !== -1;
+ });
+ return usableRuns.map(function (r) { return _this.getDataset(tag, r); });
+ };
+ /* Create or return a Dataset for given tag and run.
+ * Calling this triggers a load on the dataset.
+ */
+ DataCoordinator.prototype.getDataset = function (tag, run) {
+ var dataset = this._getDataset(tag, run);
+ dataset.load();
+ return dataset;
+ };
+ DataCoordinator.prototype._getDataset = function (tag, run) {
+ var key = [tag, run].toString();
+ var dataset;
+ if (this.datasets[key] != null) {
+ dataset = this.datasets[key];
+ }
+ else {
+ dataset = new TF.Dataset(tag, run, this.urlGenerator);
+ this.datasets[key] = dataset;
+ }
+ return dataset;
+ };
+ return DataCoordinator;
+ })();
+ TF.DataCoordinator = DataCoordinator;
+})(TF || (TF = {}));
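+// A minimal usage sketch (not part of the original component; the runToTag map
+// and run names are made-up examples, and TF.Urls.scalarsUrl is defined in the
+// tf-url-generator module later in this file). It shows how a coordinator is
+// wired to a (tag, run) => url generator and asked for datasets.
+function exampleDataCoordinatorUsage() {
+  var runToTag = { train: ["loss"], eval: ["loss", "accuracy"] };
+  var coordinator = new TF.DataCoordinator(TF.Urls.scalarsUrl, runToTag);
+  // Returns one TF.Dataset per run that actually has the "loss" tag,
+  // and triggers a (debounced) load on each of them.
+  return coordinator.getDatasets("loss", ["train", "eval"]);
+}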
+</script>
+ <script>/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+var __extends = (this && this.__extends) || function (d, b) {
+ for (var p in b) if (b.hasOwnProperty(p)) d[p] = b[p];
+ function __() { this.constructor = d; }
+ __.prototype = b.prototype;
+ d.prototype = new __();
+};
+var TF;
+(function (TF) {
+ /* An extension of Plottable.Dataset that knows how to load data from a backend.
+ */
+ var Dataset = (function (_super) {
+ __extends(Dataset, _super);
+ function Dataset(tag, run, urlGenerator) {
+ _super.call(this, [], { tag: tag, run: run });
+ this.load = _.debounce(this._load, 10);
+ this.tag = tag;
+ this.run = run;
+ this.urlGenerator = urlGenerator;
+ }
+ Dataset.prototype._load = function () {
+ var _this = this;
+ var url = this.urlGenerator(this.tag, this.run);
+ if (this.lastRequest != null) {
+ this.lastRequest.abort();
+ }
+ this.lastRequest = d3.json(url, function (error, json) {
+ _this.lastRequest = null;
+ if (error) {
+ /* tslint:disable */
+ console.log(error);
+ /* tslint:enable */
+ throw new Error("Failure loading JSON at url: \"" + url + "\"");
+ }
+ else {
+ _this.data(json);
+ }
+ });
+ };
+ return Dataset;
+ })(Plottable.Dataset);
+ TF.Dataset = Dataset;
+})(TF || (TF = {}));
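+// A minimal sketch (hypothetical names; nothing here is called by TensorBoard)
+// of the Dataset contract: construct it with a tag, a run, and a (tag, run) => url
+// generator, then call load(). load() is debounced, and a new request aborts any
+// request still in flight.
+function exampleDatasetUsage() {
+  var urlGenerator = function(tag, run) {
+    return "/scalars?tag=" + encodeURIComponent(tag) + "&run=" + encodeURIComponent(run);
+  };
+  var dataset = new TF.Dataset("loss", "train", urlGenerator);
+  dataset.load(); // on success, the JSON response becomes the dataset's data()
+  return dataset;
+}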
+</script>
+ <script>
+ Polymer({
+ is: "tf-data-coordinator",
+ properties: {
+ urlGenerator: Object,
+ outDataCoordinator: {
+ type: Object,
+ computed: "getCoordinator(urlGenerator, runToTag)",
+ notify: true,
+ },
+ },
+ getCoordinator: function(generator, runToTag) {
+ return new TF.DataCoordinator(generator, runToTag);
+ }
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-tooltip-coordinator" assetpath="../components/tf-event-dashboard/">
+ <script>
+ Polymer({
+ is: "tf-tooltip-coordinator",
+ properties: {
+ outTooltipUpdater: {
+ type: Function,
+ value: function() {
+ return (function(tooltipMap, xValue, closestRun) {
+ this._setOutTooltipMap(tooltipMap);
+ this._setOutXValue(xValue);
+ this._setOutClosestRun(closestRun);
+ }).bind(this);
+ },
+ notify: true,
+ readOnly: true,
+ },
+ outTooltipMap: {
+ // a {runName: tooltipValue} map, where runName and tooltipValue are strings.
+ type: Object,
+ notify: true,
+ readOnly: true,
+ },
+ outXValue: {
+ // a string representation of the closest x value for the tooltips
+ type: Number,
+ notify: true,
+ readOnly: true,
+ },
+ outClosestRun: {
+ // the name of the run that is closest to the user cursor (if any)
+ type: String,
+ notify: true,
+ readOnly: true,
+ },
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="scrollbar-style" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <style>
+ .scrollbar::-webkit-scrollbar-track
+ {
+ visibility: hidden;
+ }
+
+ .scrollbar::-webkit-scrollbar
+ {
+ width: 10px;
+ }
+
+ .scrollbar::-webkit-scrollbar-thumb
+ {
+ border-radius: 10px;
+ -webkit-box-shadow: inset 0 0 2px rgba(0,0,0,.3);
+ background-color: var(--paper-grey-500);
+ color: var(--paper-grey-900);
+ }
+ .scrollbar {
+ box-sizing: border-box;
+ }
+ </style>
+ </template>
+</dom-module>
+<dom-module id="run-color-style" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <style>
+ [color-class="light-blue"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-light-blue-500);
+ --paper-checkbox-checked-ink-color: var(--paper-light-blue-500);
+ --paper-checkbox-unchecked-color: var(--paper-light-blue-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-light-blue-900);
+ }
+ [color-class="red"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-red-500);
+ --paper-checkbox-checked-ink-color: var(--paper-red-500);
+ --paper-checkbox-unchecked-color: var(--paper-red-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-red-900);
+ }
+ [color-class="green"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-green-500);
+ --paper-checkbox-checked-ink-color: var(--paper-green-500);
+ --paper-checkbox-unchecked-color: var(--paper-green-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-green-900);
+ }
+ [color-class="purple"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-purple-500);
+ --paper-checkbox-checked-ink-color: var(--paper-purple-500);
+ --paper-checkbox-unchecked-color: var(--paper-purple-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-purple-900);
+ }
+ [color-class="teal"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-teal-500);
+ --paper-checkbox-checked-ink-color: var(--paper-teal-500);
+ --paper-checkbox-unchecked-color: var(--paper-teal-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-teal-900);
+ }
+ [color-class="pink"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-pink-500);
+ --paper-checkbox-checked-ink-color: var(--paper-pink-500);
+ --paper-checkbox-unchecked-color: var(--paper-pink-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-pink-900);
+ }
+ [color-class="orange"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-orange-500);
+ --paper-checkbox-checked-ink-color: var(--paper-orange-500);
+ --paper-checkbox-unchecked-color: var(--paper-orange-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-orange-900);
+ }
+ [color-class="brown"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-brown-500);
+ --paper-checkbox-checked-ink-color: var(--paper-brown-500);
+ --paper-checkbox-unchecked-color: var(--paper-brown-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-brown-900);
+ }
+ [color-class="indigo"] paper-checkbox {
+ --paper-checkbox-checked-color: var(--paper-indigo-500);
+ --paper-checkbox-checked-ink-color: var(--paper-indigo-500);
+ --paper-checkbox-unchecked-color: var(--paper-indigo-900);
+ --paper-checkbox-unchecked-ink-color: var(--paper-indigo-900);
+ }
+ </style>
+ </template>
+</dom-module>
+<dom-module id="tf-multi-checkbox" assetpath="../components/tf-multi-checkbox/">
+ <style include="scrollbar-style"></style>
+ <style include="run-color-style"></style>
+
+ <template>
+ <div id="outer-container" class="scrollbar">
+ <template is="dom-repeat" items="[[names]]" sort="[[_tooltipComparator(tooltips, tooltipOrderer)]]">
+ <div class="run-row" color-class$="[[_applyColorClass(item, classScale)]]" null-tooltip$="[[_isNullTooltip(item, tooltips)]]" highlight$="[[_isHighlighted(item, highlights.*)]]">
+ <div class="checkbox-container vertical-align-container">
+ <paper-checkbox class="checkbox vertical-align-center" name="[[item]]" checked$="[[_isChecked(item,outSelected.*)]]" on-change="_checkboxChange"></paper-checkbox>
+ </div>
+ <div class="item-label-container">
+ <span>[[item]]</span>
+ </div>
+ <div class="tooltip-value-container vertical-align-container">
+ <span class="vertical-align-top">[[_lookupTooltip(item,tooltips)]]</span>
+ </div>
+ </div>
+ </template>
+ </div>
+ <style>
+ :host {
+ display: flex;
+ flex-direction: column;
+ height: 100%;
+ }
+ #outer-container {
+ overflow-y: scroll;
+ overflow-x: hidden;
+ width: 100%;
+ flex-grow: 1;
+ flex-shrink: 1;
+ word-wrap: break-word;
+ }
+ .run-row {
+ padding-top: 5px;
+ padding-bottom: 5px;
+ display: flex;
+ flex-direction: row;
+ font-size: 13px;
+ }
+ .checkbox-container {
+ flex-grow: 0;
+ flex-shrink: 0;
+ }
+ .checkbox {
+ padding-left: 2px;
+ width: 32px;
+ }
+ .item-label-container {
+ flex-grow: 1;
+ flex-shrink: 1;
+ width: 0px; /* hack to get the flex-grow to work properly */
+ }
+ .tooltip-value-container {
+ display: flex;
+ justify-content: center;
+ flex-grow: 0;
+ flex-shrink: 0;
+ text-align:right;
+ padding-left: 2px;
+ }
+ .vertical-align-container {
+ display: flex;
+ justify-content: center;
+ }
+ .vertical-align-container .vertical-align-center {
+ align-self: center;
+ }
+ .vertical-align-container .vertical-align-top {
+ align-self: start;
+ }
+ [null-tooltip] {
+ display: none;
+ }
+ [highlight] {
+ font-weight: bold;
+ }
+ </style>
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-multi-checkbox",
+ properties: {
+ names: Array,
+ tooltipOrderer: {
+ /* Used to compute how to order the tooltips based on the tooltip value.
+ * By default, it parses the tooltip strings as numbers.
+ * If set to a falsy value, tooltips are always ordered lexicographically.
+ */
+ type: Function,
+ value: function() {
+ return function(x) {return +x;}
+ },
+ },
+ tooltips: Object,
+ highlights: Array,
+ outSelected: {
+ type: Array,
+ notify: true,
+ value: function() {
+ return [];
+ },
+ },
+ hideMissingTooltips: {
+ // If we have tooltips, but some names are missing, do we hide them?
+ type: Boolean,
+ value: true,
+ },
+ classScale: Function, // map from run name to css class
+ },
+ observers: [
+ "_initializeOutSelected(names.*)",
+ ],
+ _lookupTooltip: function(item, tooltips) {
+ return tooltips != null ? tooltips[item] : null;
+ },
+ _isNullTooltip: function(item, tooltips) {
+ if (!this.hideMissingTooltips) {
+ return true;
+ }
+ if (tooltips == null) {
+ return false;
+ }
+ return tooltips[item] == null;
+ },
+ _initializeOutSelected: function(change) {
+ this.outSelected = change.base.slice();
+ },
+ _tooltipComparator: function(tooltips, tooltipOrderer) {
+ return function(a, b) {
+ if (!tooltips || !tooltipOrderer) {
+ // if we're missing tooltips or an orderer, do a lexicographic sort
+ return a.localeCompare(b);
+ }
+ function getValue(x) {
+ var value = tooltipOrderer(tooltips[x]);
+ return value == null || _.isNaN(value) ? -Infinity : value;
+ }
+ var aValue = getValue(a);
+ var bValue = getValue(b);
+ return aValue === bValue ? a.localeCompare(b) : bValue - aValue;
+ }
+ },
+ _checkboxChange: function(e) {
+ var name = e.srcElement.name;
+ var idx = this.outSelected.indexOf(name);
+ var checked = e.srcElement.checked;
+ if (checked && idx === -1) {
+ this.push("outSelected", name);
+ } else if (!checked && idx !== -1) {
+ this.splice("outSelected", idx, 1);
+ }
+ },
+ _isChecked: function(item, outSelectedChange) {
+ var outSelected = outSelectedChange.base;
+ return outSelected.indexOf(item) !== -1;
+ },
+ _initializeRuns: function(change) {
+ this.outSelected = change.base.slice();
+ },
+ _applyColorClass: function(item, classScale) {
+ // TODO: Update style just on the element that changes
+ // and apply at microtask timing
+ this.debounce("restyle", function (){
+ this.updateStyles();
+ }, 16);
+ return classScale(item);
+ },
+ _isHighlighted: function(item, highlights) {
+ return highlights.base.indexOf(item) !== -1;
+ },
+ });
+ </script>
+
+</dom-module>
+<dom-module id="tf-run-selector" assetpath="../components/tf-event-dashboard/">
+ <template>
+ <div id="top-text">
+ <template is="dom-if" if="[[xValue]]">
+ <div class="x-tooltip tooltip-container">
+ <div class="x-tooltip-label">[[xType]]</div>
+ <div class="x-tooltip-value">[[xValue]]</div>
+ </div>
+ </template>
+ <template is="dom-if" if="[[!xValue]]">
+ <div id="tooltip-help" class="tooltip-container">
+ Selected Runs:
+ </div>
+ </template>
+ </div>
+ <tf-multi-checkbox names="[[runs]]" tooltips="[[tooltips]]" highlights="[[_arrayify(closestRun)]]" out-selected="{{outSelected}}" class-scale="[[classScale]]" hide-missing-tooltips=""></tf-multi-checkbox>
+ <style>
+ :host {
+ display: flex;
+ flex-direction: column;
+ padding-bottom: 10px;
+ box-sizing: border-box;
+ }
+ #top-text {
+ width: 100%;
+ flex-grow: 0;
+ flex-shrink: 0;
+ padding-left: 35px;
+ padding-right: 16px;
+ padding-bottom: 6px;
+ box-sizing: border-box;
+ color: var(--paper-grey-800);
+ }
+ tf-multi-checkbox {
+ display: flex;
+ flex-grow: 1;
+ flex-shrink: 1;
+ height: 0px; /* hack so the flex-grow takes over and gives it space */
+ }
+ .x-tooltip {
+ display: flex;
+ flex-direction: row;
+ }
+ .x-tooltip-label {
+ flex-grow: 1;
+ align-self: flex-start;
+ }
+ .x-tooltip-value {
+ align-self: flex-end;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-run-selector",
+ properties: {
+ outSelected: {type: Array, notify: true},
+ // runs: an array of strings, representing the run names that may be chosen
+ runs: Array,
+ tooltips: {type: Object, value: null}, // {[run: string]: string}
+ xValue: {type: String, value: null}, // the string representing the run's x value
+ xType: String, // string: relative, step, wall_time
+ classScale: Object, // map from run name to color class (css)
+ closestRun: {type: String, value: null}, // which run has a value closest to mouse coordinate
+ },
+ _arrayify: function(item) {
+ return [item];
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-x-type-selector" assetpath="../components/tf-event-dashboard/">
+ <template>
+ <div id="buttons">
+ <p>X Type: </p>
+ <paper-button class="x-button selected" id="step" on-tap="_select" raised="">
+ step
+ </paper-button>
+ <paper-button class="x-button" id="relative" on-tap="_select">
+ relative
+ </paper-button>
+ <paper-button class="x-button" id="wall_time" on-tap="_select">
+ wall
+ </paper-button>
+ </div>
+ <style>
+ .x-button {
+ width: 29%;
+ font-size: 14px;
+ background-color: var(--paper-grey-500);
+ margin-top: 5px;
+ color: white;
+ }
+
+ .x-button.selected {
+ font-weight: bold;
+ background-color: var(--tb-orange-strong) !important;
+ }
+
+ #buttons p {
+ text-align: center;
+ font-size: 12px;
+ margin: 0;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-x-type-selector",
+ properties: {
+ outXType: {type: String, notify: true, readOnly: true, value: "step"},
+ },
+ _select: function(e) {
+ var _this = this;
+ ["step", "wall_time", "relative"].forEach(function(id) {
+ _this.$[id].raised = false;
+ _this.$[id].classList.remove("selected");
+ });
+ e.currentTarget.raised = true;
+ this._setOutXType(e.currentTarget.id);
+ e.currentTarget.classList.add("selected");
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-run-generator" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <iron-ajax id="ajax" auto="" url="[[url]]" handle-as="json" debounce="300" on-response="_setResponse" verbose="true">
+ </iron-ajax>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-run-generator",
+ properties: {
+ url: String,
+ _runToTag: {
+ type: Object,
+ readOnly: true,
+ },
+ outRunToScalars: {
+ // {[runName: string]: string[]}
+ // the names of scalar tags.
+ type: Object,
+ computed: "_scalars(_runToTag.*)",
+ notify: true,
+ },
+ outRunToHistograms: {
+ // {[runName: string]: string[]}
+ // the names of histogram tags.
+ type: Object,
+ computed: "_histograms(_runToTag.*)",
+ notify: true,
+ },
+ outRunToCompressedHistograms: {
+ // {[runName: string]: string[]}
+ // the names of compressedHistogram tags.
+ type: Object,
+ computed: "_compressedHistograms(_runToTag.*)",
+ notify: true,
+ },
+ outRunToImages: {
+ // {[runName: string]: string[]}
+ // the names of image tags.
+ type: Object,
+ computed: "_images(_runToTag.*)",
+ notify: true,
+ },
+ outRunsWithGraph: {
+ // ["run1", "run2", ...]
+ // array of run names that have an associated graph definition.
+ type: Array,
+ computed: "_graphs(_runToTag.*)",
+ notify: true
+ }
+ },
+ _scalars: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "scalars");
+ },
+ _histograms: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "histograms");
+ },
+ _compressedHistograms: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "compressedHistograms");
+ },
+ _images: function(_runToTag) {
+ return _.mapValues(_runToTag.base, "images");
+ },
+ _graphs: function(_runToTag) {
+ var runsWithGraph = [];
+ _.each(_runToTag.base, function(runInfo, runName) {
+ if (runInfo.graph === true) {
+ runsWithGraph.push(runName);
+ }
+ });
+ return runsWithGraph;
+ },
+ _setResponse: function(event) {
+ this._set_runToTag(event.detail.response);
+ }
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-color-scale" assetpath="../components/tf-event-dashboard/">
+ <script>
+ (function() {
+ // TODO(danmane) - get Plottable team to make an API point for this
+ Plottable.Scales.Color._LOOP_LIGHTEN_FACTOR = 0;
+ var classColorPairs = [
+ ["light-blue", "#03A9F4"],
+ ["red" , "#f44366"],
+ ["green" , "#4CAF50"],
+ ["purple" , "#9c27b0"],
+ ["teal" , "#009688"],
+ ["pink" , "#e91e63"],
+ ["orange" , "#ff9800"],
+ ["brown" , "#795548"],
+ ["indigo" , "#3f51b5"],
+ ];
+ var classes = _.pluck(classColorPairs, 0);
+ var colors = _.pluck(classColorPairs, 1);
+ Polymer({
+ is: "tf-color-scale",
+ properties: {
+ runs: Array,
+ outClassScale: {
+ type: Object,
+ notify: true,
+ readOnly: true,
+ value: function() {
+ return new d3.scale.ordinal().range(classes);
+ },
+ // TODO(danmane): the class scale will not update if the domain changes.
+ // This behavior is inconsistent with the ColorScale.
+ // In practice we don't change runs after the initial load, so it's not currently an issue.
+ },
+ outColorScale: {
+ type: Object,
+ notify: true,
+ readOnly: true,
+ value: function() {
+ var scale = new Plottable.Scales.Color().range(colors);
+ scale.onUpdate(this._notifyColorScaleDomainChange.bind(this));
+ return scale;
+ },
+ },
+ },
+ observers: ["_changeRuns(runs.*)"],
+ _changeRuns: function(runs) {
+ this.outClassScale.domain(this.runs);
+ this.outColorScale.domain(this.runs);
+ },
+ _notifyColorScaleDomainChange: function() {
+ this.notifyPath("outColorScale.domain_path", this.outColorScale.domain());
+ this.outColorScale.domain_path = null;
+ },
+ });
+ })();
+ </script>
+</dom-module>
+<dom-module id="tf-url-generator" assetpath="../components/tf-dashboard-common/">
+ <script>/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+var TF;
+(function (TF) {
+ var Urls;
+ (function (Urls) {
+ Urls.routes = ["runs", "scalars", "histograms",
+ "compressedHistograms", "images",
+ "individualImage", "graph"];
+ function router(route) {
+ return function (tag, run) {
+ return "/" + route + "?tag=" + encodeURIComponent(tag)
+ + "&run=" + encodeURIComponent(run);
+ };
+ }
+ function runsUrl() {
+ return "/runs";
+ }
+ Urls.runsUrl = runsUrl;
+ Urls.scalarsUrl = router("scalars");
+ Urls.histogramsUrl = router("histograms");
+ Urls.compressedHistogramsUrl = router("compressedHistograms");
+ Urls.imagesUrl = router("images");
+ function individualImageUrl(query) {
+ return "/individualImage?" + query;
+ }
+ Urls.individualImageUrl = individualImageUrl;
+ function graphUrl(run) {
+ return "/graph?run=" + encodeURIComponent(run);
+ }
+ Urls.graphUrl = graphUrl;
+ })(Urls = TF.Urls || (TF.Urls = {}));
+})(TF || (TF = {}));
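+// Illustrative sketch of the generated URL shapes (the tag/run values are made up):
+function exampleUrls() {
+  return {
+    runs: TF.Urls.runsUrl(),                           // "/runs"
+    scalars: TF.Urls.scalarsUrl("xent/loss", "train"), // "/scalars?tag=xent%2Floss&run=train"
+    graph: TF.Urls.graphUrl("train")                   // "/graph?run=train"
+  };
+}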
+</script>
+ <script>
+ var polymerObject = {
+ is: "tf-url-generator",
+ properties: {
+ outRunsUrl: {
+ type: String,
+ value: function() {
+ return TF.Urls.runsUrl();
+ },
+ readOnly: true,
+ notify: true,
+ },
+ },
+ };
+ TF.Urls.routes.forEach(function(route) {
+ /* For each route (other than runs, which is handled separately), define:
+ * out`RouteName`UrlGenerator: {
+ *   type: Function,
+ *   readOnly: true,
+ *   notify: true,
+ *   value: function() {
+ *     return TF.Urls.`routeName`Url;
+ *   }
+ * }
+ */
+ if (route === "runs") {
+ return;
+ }
+ var urlName = route + "Url";
+ var propertyName = Polymer.CaseMap.dashToCamelCase("out-" + urlName + "Generator");
+ polymerObject.properties[propertyName] = {
+ type: Function,
+ value: function() {
+ return TF.Urls[urlName];
+ },
+ notify: true,
+ readOnly: true,
+ }
+ });
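+ // Illustrative note (assumes the route list above): for the route "scalars",
+ // this loop defines a property named "outScalarsUrlGenerator", which templates
+ // bind as out-scalars-url-generator="{{...}}"; its value is TF.Urls.scalarsUrl.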
+ Polymer(polymerObject);
+ </script>
+</dom-module>
+<dom-module id="tf-dashboard-layout" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <div id="sidebar">
+ <content select=".sidebar"></content>
+ </div>
+
+ <div id="center" class="scrollbar">
+ <content select=".center"></content>
+ </div>
+ <style include="scrollbar-style"></style>
+ <style>
+ #sidebar {
+ width: inherit;
+ height: 100%;
+ background-color: var(--tb-grey-darker);
+ background-image: linear-gradient(to right, var(--tb-grey-lighter), var(--tb-grey-lighter));
+ overflow: ellipsis;
+ padding-left: 10px;
+ padding-right: 10px;
+ flex-grow: 0;
+ flex-shrink: 0;
+ }
+
+ #center {
+ margin: 0 10px;
+ height: 100%;
+ overflow-y: scroll;
+ padding-right: 12px;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ :host {
+ display: flex;
+ flex-direction: row;
+ height: 100%;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-dashboard-layout",
+ });
+ </script>
+</dom-module>
+<dom-module id="dashboard-style" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <style>
+ .card {
+ height: 200px;
+ width: 300px;
+ display: flex;
+ flex-direction: column;
+ margin: 5px 5px;
+ padding: 5px;
+ border: 1px solid var(--paper-grey-500);
+ border-radius: 3px;
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ position: relative;
+ }
+
+ .card .card-title {
+ flex-grow: 0;
+ flex-shrink: 0;
+ margin-bottom: 2px;
+ font-size: 14px;
+ font-weight: bold;
+ text-overflow: ellipsis;
+ overflow: hidden;
+ }
+
+ .card .card-content {
+ flex-grow: 1;
+ flex-shrink: 1;
+ display: flex;
+ }
+ .card .card-bottom-row {
+ flex-grow: 0;
+ flex-shrink: 0;
+ padding-left: 10px;
+ padding-right: 10px;
+ }
+
+ .card.selected {
+ height: 400px;
+ width: 100%;
+ }
+
+ [shift] {
+ bottom: 20px !important;
+ }
+
+ .expand-button {
+ position: absolute;
+ left: 0px;
+ bottom: 0px;
+ color: #2196F3;
+ display: block;
+ }
+
+ #content-container{
+ display: block;
+ }
+
+ .sidebar {
+ display: flex;
+ flex-direction: column;
+ height: 100%;
+ }
+
+ #categorizer {
+ flex-shrink: 0;
+ }
+
+ #xTypeSelector {
+ flex-shrink: 0;
+ margin: 20px 0;
+ }
+
+ #runSelector {
+ flex-shrink: 1;
+ flex-grow: 1;
+ }
+
+ #download-option {
+ padding-left: 55px;
+ color: var(--paper-grey-700);
+ font-size: 14px;
+ }
+
+ #download-option paper-toggle-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+
+ }
+ </style>
+ </template>
+</dom-module>
+<dom-module id="tf-downloader" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <paper-dropdown-menu no-label-float="true" label="run to download" selected-item-label="{{_run}}">
+ <paper-menu class="dropdown-content">
+ <template is="dom-repeat" items="[[_runs]]">
+ <paper-item no-label-float="true">[[item]]</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ <a download="[[_csvName(_run)]]" href="[[_csvUrl(_run, urlFn)]]">CSV</a>
+ <a download="[[_jsonName(_run)]]" href="[[_jsonUrl(_run, urlFn)]]">JSON</a>
+ <style>
+ :host {
+ display: block;
+ }
+ paper-dropdown-menu {
+ width: 220px;
+ --paper-input-container-label: {
+ font-size: 10px;
+ }
+ --paper-input-container-input: {
+ font-size: 10px;
+ }
+ }
+ a {
+ font-size: 10px;
+ border-radius: 3px;
+ border: 1px solid #EEE;
+ }
+ paper-input {
+ font-size: 22px;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-downloader",
+ properties: {
+ _run: String,
+ _runs: {
+ type: Array,
+ computed: "_computeRuns(runToTag.*, selectedRuns.*)",
+ },
+ selectedRuns: Array,
+ runToTag: Object,
+ tag: String,
+ urlFn: Function,
+ },
+ _computeRuns: function(runToTagChange, selectedRunsChange) {
+ var runToTag = this.runToTag;
+ var tag = this.tag;
+ return this.selectedRuns.filter(function(x) {
+ return runToTag[x].indexOf(tag) !== -1;
+ })
+ },
+ _csvUrl: function(_run, urlFn) {
+ return urlFn(this.tag, _run) + "&format=csv";
+ },
+ _jsonUrl: function(_run, urlFn) {
+ return urlFn(this.tag, _run);
+ },
+ _csvName: function(_run) {
+ return "run_" + _run + ",tag_" + this.tag + ".csv";
+ },
+ _jsonName: function(_run) {
+ return "run-" + _run + "-tag-" + this.tag + ".json";
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-regex-group" assetpath="../components/tf-regex-group/">
+ <template>
+ <div class="regex-list">
+ <template is="dom-repeat" items="{{rawRegexes}}">
+ <div class="regex-line">
+ <paper-input id="text-input" class="regex-input" label="input new regex" no-label-float="" bind-value="{{item.regex}}" invalid="[[!item.valid]]" on-keyup="moveFocus"></paper-input>
+ <paper-toggle-button class="active-button" checked="{{item.active}}" disabled="[[!item.valid]]"></paper-toggle-button>
+
+ <paper-icon-button icon="delete" class="delete-button" aria-label="Delete Regex" tabindex="0" on-tap="deleteRegex"></paper-icon-button>
+ </div>
+ <style>
+ .regex-input {
+ width: 210px;
+ display: inline-block;
+ padding-left: 8px;
+ padding-right: 5px;
+ }
+
+ .active-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+ border: none;
+ }
+
+ .delete-button {
+ color: var(--paper-pink-900);
+ width: 24px;
+ height: 24px;
+ }
+ .regex-list {
+ margin-bottom: 10px;
+ }
+ paper-input {
+ --paper-input-container-focus-color: var(--tb-orange-strong);
+ }
+ </style>
+ </template>
+ </div>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-regex-group",
+ properties: {
+ rawRegexes: {
+ type: Array,
+ value: function() {
+ return [{regex: "", active: true, valid: true}];
+ }
+ },
+ regexes: {type: Array, computed: "usableRegexes(rawRegexes.*)", notify: true},
+ },
+ observers: [
+ "addNewRegexIfNeeded(rawRegexes.*)",
+ "checkValidity(rawRegexes.*)",
+ ],
+ checkValidity: function(x) {
+ var match = x.path.match(/rawRegexes\.(\d+)\.regex/);
+ if (match) {
+ var idx = match[1];
+ this.set("rawRegexes." + idx + ".valid", this.isValid(x.value));
+ }
+ },
+ isValid: function(s) {
+ try {
+ new RegExp(s);
+ return true;
+ } catch (e) {
+ return false;
+ }
+ },
+ usableRegexes: function(regexes) {
+ var isValid = this.isValid;
+ return regexes.base.filter(function (r) {
+ // Checking validity here (rather than using the data property)
+ // is necessary because otherwise we might send invalid regexes;
+ // this function can be called before the observer runs.
+ return r.regex !== "" && r.active && isValid(r.regex);
+ }).map(function(r) {
+ return r.regex;
+ });
+ },
+ addNewRegexIfNeeded: function() {
+ var last = this.rawRegexes[this.rawRegexes.length - 1];
+ if (last.regex !== "") {
+ this.push("rawRegexes", {regex: "", active: true, valid: true});
+ }
+ },
+ deleteRegex: function(e) {
+ if (this.rawRegexes.length > 1) {
+ this.splice("rawRegexes", e.model.index, 1);
+ }
+ },
+ moveFocus: function(e) {
+ if (e.keyCode === 13) {
+ var idx = e.model.index;
+ var inputs = Polymer.dom(this.root).querySelectorAll(".regex-input");
+ if (idx < this.rawRegexes.length - 1) {
+ inputs[idx+1].$.input.focus();
+ } else {
+ document.activeElement.blur();
+ }
+ }
+ }
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-categorizer" assetpath="../components/tf-categorizer/">
+ <template>
+ <div class="inputs">
+ <tf-regex-group id="regex-group" regexes="{{regexes}}"></tf-regex-group>
+ </div>
+ <div id="underscore-categorization">
+ <span>Split On Underscores:</span>
+ <paper-toggle-button checked="{{splitOnUnderscore}}"></paper-toggle-button>
+ </div>
+ <style>
+ :host {
+ display: block;
+ padding-bottom: 5px;
+ padding-top: 5px;
+ }
+
+ .inputs {
+ padding-left: 5px;
+ }
+
+ paper-toggle-button {
+ --paper-toggle-button-checked-button-color: var(--tb-orange-strong);
+ --paper-toggle-button-checked-bar-color: var(--tb-orange-weak);
+ }
+ #underscore-categorization {
+ padding-left: 94px;
+ color: var(--paper-grey-700);
+ font-size: 14px;
+ }
+ </style>
+ </template>
+ <script>/// <reference path="../../typings/tsd.d.ts" />
+var Categorizer;
+(function (Categorizer) {
+ /* Canonical TensorFlow ops are namespaced using forward slashes.
+ * This fallback categorizer categorizes by the top-level namespace.
+ */
+ Categorizer.topLevelNamespaceCategorizer = splitCategorizer(/\//);
+ // Try to produce good categorizations on legacy graphs, which are often
+ // namespaced like l1_foo/bar or l2_baz/bam.
+ // If there is no underscore before the first forward slash,
+ // this behaves the same as topLevelNamespaceCategorizer.
+ Categorizer.legacyUnderscoreCategorizer = splitCategorizer(/[\/_]/);
+ function fallbackCategorizer(s) {
+ switch (s) {
+ case "TopLevelNamespaceCategorizer":
+ return Categorizer.topLevelNamespaceCategorizer;
+ case "LegacyUnderscoreCategorizer":
+ return Categorizer.legacyUnderscoreCategorizer;
+ default:
+ throw new Error("Unrecognized categorization strategy: " + s);
+ }
+ }
+ Categorizer.fallbackCategorizer = fallbackCategorizer;
+ /* An "extractor" is a function that takes a tag name, and "extracts" a category name.
+ * This function takes an extractor, and produces a categorizer.
+ * Currently, it is just used for the fallbackCategorizer, but we may want to
+ * refactor the general categorization logic to use the concept of extractors.
+ */
+ function extractorToCategorizer(extractor) {
+ return function (tags) {
+ if (tags.length === 0) {
+ return [];
+ }
+ var sortedTags = tags.slice().sort();
+ var categories = [];
+ var currentCategory = {
+ name: extractor(sortedTags[0]),
+ tags: [],
+ };
+ sortedTags.forEach(function (t) {
+ var topLevel = extractor(t);
+ if (currentCategory.name !== topLevel) {
+ categories.push(currentCategory);
+ currentCategory = {
+ name: topLevel,
+ tags: [],
+ };
+ }
+ currentCategory.tags.push(t);
+ });
+ categories.push(currentCategory);
+ return categories;
+ };
+ }
+ function splitCategorizer(r) {
+ var extractor = function (t) {
+ return t.split(r)[0];
+ };
+ return extractorToCategorizer(extractor);
+ }
+ function defineCategory(ruledef) {
+ var r = new RegExp(ruledef);
+ var f = function (tag) {
+ return r.test(tag);
+ };
+ return { name: ruledef, matches: f };
+ }
+ Categorizer.defineCategory = defineCategory;
+ function _categorizer(rules, fallback) {
+ return function (tags) {
+ var remaining = d3.set(tags);
+ var userSpecified = rules.map(function (def) {
+ var tags = [];
+ remaining.forEach(function (t) {
+ if (def.matches(t)) {
+ tags.push(t);
+ }
+ });
+ var cat = { name: def.name, tags: tags.sort() };
+ return cat;
+ });
+ var defaultCategories = fallback(remaining.values());
+ return userSpecified.concat(defaultCategories);
+ };
+ }
+ Categorizer._categorizer = _categorizer;
+ function categorizer(s) {
+ var rules = s.categoryDefinitions.map(defineCategory);
+ var fallback = fallbackCategorizer(s.fallbackCategorizer);
+ return _categorizer(rules, fallback);
+ }
+ Categorizer.categorizer = categorizer;
+ ;
+})(Categorizer || (Categorizer = {}));
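+// A small illustrative sketch (the tag names are made up): the fallback
+// categorizer groups tags by their top-level namespace.
+function exampleCategorization() {
+  var tags = ["gradients/bias", "xent/loss", "xent/accuracy"];
+  // Evaluates to:
+  // [{name: "gradients", tags: ["gradients/bias"]},
+  //  {name: "xent", tags: ["xent/accuracy", "xent/loss"]}]
+  return Categorizer.topLevelNamespaceCategorizer(tags);
+}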
+</script>
+ <script>
+ Polymer({
+ is: "tf-categorizer",
+ properties: {
+ regexes: {type: Array},
+ tags: {type: Array},
+ categoriesAreExclusive: {type: Boolean, value: true},
+ fallbackCategorizer: {
+ type: String,
+ computed: "chooseFallbackCategorizer(splitOnUnderscore)"
+ },
+ splitOnUnderscore: {
+ type: Boolean,
+ value: false,
+ },
+ categorizer: {
+ type: Object,
+ computed: "computeCategorization(regexes.*, categoriesAreExclusive, fallbackCategorizer)",
+ },
+ categories: {type: Array, value: function() {return [];}, notify: true, readOnly: true},
+ },
+ observers: ['recategorize(tags.*, categorizer)'],
+ computeCategorization: function(regexes, categoriesAreExclusive, fallbackCategorizer) {
+ var categorizationStrategy = {
+ categoryDefinitions: regexes.base,
+ categoriesAreExclusive: categoriesAreExclusive,
+ fallbackCategorizer: fallbackCategorizer,
+ };
+ return Categorizer.categorizer(categorizationStrategy);
+ },
+ recategorize: function() {
+ this.debounce("tf-categorizer-recategorize", function (){
+ var categories = this.categorizer(this.tags);
+ this._setCategories(categories);
+ })
+ },
+ chooseFallbackCategorizer: function(splitOnUnderscore) {
+ if (splitOnUnderscore) {
+ return "LegacyUnderscoreCategorizer";
+ } else {
+ return "TopLevelNamespaceCategorizer";
+ }
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-chart" assetpath="../components/tf-event-dashboard/">
+ <template>
+ <svg id="chartsvg"></svg>
+ <style>
+ :host {
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ display: flex;
+ flex-direction: column;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ svg {
+ -webkit-user-select: none;
+ -moz-user-select: none;
+ flex-grow: 1;
+ flex-shrink: 1;
+ }
+ .plottable .crosshairs line.guide-line {
+ stroke: #777;
+ }
+ </style>
+ </template>
+ <script>var __extends = (this && this.__extends) || function (d, b) {
+ for (var p in b) if (b.hasOwnProperty(p)) d[p] = b[p];
+ function __() { this.constructor = d; }
+ __.prototype = b.prototype;
+ d.prototype = new __();
+};
+var Plottable;
+(function (Plottable) {
+ var DragZoomLayer = (function (_super) {
+ __extends(DragZoomLayer, _super);
+ /* Constructs a SelectionBoxLayer with an attached DragInteraction and DoubleClickInteraction.
+ * On drag, it triggers an animated zoom into the box that was dragged.
+ * On double click, it zooms back out to the original view, before any zooming.
+ * The zoom animation uses an easing function (default d3.ease("cubic-in-out")) and is customizable.
+ * Usage: Construct the selection box layer and attach x and y scales, and then add the layer
+ * over the plot you are zooming on using a Component Group.
+ * TODO(danmane) - merge this into Plottable
+ */
+ function DragZoomLayer(xScale, yScale) {
+ _super.call(this);
+ this.isZoomed = false;
+ this.easeFn = d3.ease("cubic-in-out");
+ this._animationTime = 750;
+ this.xScale(xScale);
+ this.yScale(yScale);
+ this._dragInteraction = new Plottable.Interactions.Drag();
+ this._dragInteraction.attachTo(this);
+ this._doubleClickInteraction = new Plottable.Interactions.DoubleClick();
+ this._doubleClickInteraction.attachTo(this);
+ this.setupCallbacks();
+ }
+ DragZoomLayer.prototype.setupCallbacks = function () {
+ var _this = this;
+ var dragging = false;
+ this._dragInteraction.onDragStart(function (startPoint) {
+ _this.bounds({
+ topLeft: startPoint,
+ bottomRight: startPoint,
+ });
+ });
+ this._dragInteraction.onDrag(function (startPoint, endPoint) {
+ _this.bounds({ topLeft: startPoint, bottomRight: endPoint });
+ _this.boxVisible(true);
+ dragging = true;
+ });
+ this._dragInteraction.onDragEnd(function (startPoint, endPoint) {
+ _this.boxVisible(false);
+ _this.bounds({ topLeft: startPoint, bottomRight: endPoint });
+ if (dragging) {
+ _this.zoom();
+ }
+ dragging = false;
+ });
+ this._doubleClickInteraction.onDoubleClick(this.unzoom.bind(this));
+ };
+ DragZoomLayer.prototype.animationTime = function (animationTime) {
+ if (animationTime == null) {
+ return this._animationTime;
+ }
+ if (animationTime < 0) {
+ throw new Error("animationTime cannot be negative");
+ }
+ this._animationTime = animationTime;
+ return this;
+ };
+ /* Set the easing function, which determines how the zoom interpolates over time. */
+ DragZoomLayer.prototype.ease = function (fn) {
+ if (typeof (fn) !== "function") {
+ throw new Error("ease function must be a function");
+ }
+ if (fn(0) !== 0 || fn(1) !== 1) {
+ Plottable.Utils.Window.warn("Easing function does not maintain invariant f(0)==0 && f(1)==1. Bad behavior may result.");
+ }
+ this.easeFn = fn;
+ return this;
+ };
+ // Zoom into extent of the selection box bounds
+ DragZoomLayer.prototype.zoom = function () {
+ var x0 = this.xExtent()[0].valueOf();
+ var x1 = this.xExtent()[1].valueOf();
+ var y0 = this.yExtent()[1].valueOf();
+ var y1 = this.yExtent()[0].valueOf();
+ if (x0 === x1 || y0 === y1) {
+ return;
+ }
+ if (!this.isZoomed) {
+ this.isZoomed = true;
+ this.xDomainToRestore = this.xScale().domain();
+ this.yDomainToRestore = this.yScale().domain();
+ }
+ this.interpolateZoom(x0, x1, y0, y1);
+ };
+ // Restore the scales to their state before any zoom
+ DragZoomLayer.prototype.unzoom = function () {
+ if (!this.isZoomed) {
+ return;
+ }
+ this.isZoomed = false;
+ this.interpolateZoom(this.xDomainToRestore[0], this.xDomainToRestore[1], this.yDomainToRestore[0], this.yDomainToRestore[1]);
+ };
+ // If we are zooming, disable interactions, to avoid contention
+ DragZoomLayer.prototype.isZooming = function (isZooming) {
+ this._dragInteraction.enabled(!isZooming);
+ this._doubleClickInteraction.enabled(!isZooming);
+ };
+ DragZoomLayer.prototype.interpolateZoom = function (x0f, x1f, y0f, y1f) {
+ var _this = this;
+ var x0s = this.xScale().domain()[0].valueOf();
+ var x1s = this.xScale().domain()[1].valueOf();
+ var y0s = this.yScale().domain()[0].valueOf();
+ var y1s = this.yScale().domain()[1].valueOf();
+ // Copy a ref to the ease fn, so that changing ease won't affect zooms in progress
+ var ease = this.easeFn;
+ var interpolator = function (a, b, p) { return d3.interpolateNumber(a, b)(ease(p)); };
+ this.isZooming(true);
+ var start = Date.now();
+ var draw = function () {
+ var now = Date.now();
+ var passed = now - start;
+ var p = _this._animationTime === 0 ? 1 : Math.min(1, passed / _this._animationTime);
+ var x0 = interpolator(x0s, x0f, p);
+ var x1 = interpolator(x1s, x1f, p);
+ var y0 = interpolator(y0s, y0f, p);
+ var y1 = interpolator(y1s, y1f, p);
+ _this.xScale().domain([x0, x1]);
+ _this.yScale().domain([y0, y1]);
+ if (p < 1) {
+ Plottable.Utils.DOM.requestAnimationFramePolyfill(draw);
+ }
+ else {
+ _this.isZooming(false);
+ }
+ };
+ draw();
+ };
+ return DragZoomLayer;
+ })(Plottable.Components.SelectionBoxLayer);
+ Plottable.DragZoomLayer = DragZoomLayer;
+})(Plottable || (Plottable = {}));
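+// A minimal usage sketch following the comment above (the plot and scales are
+// assumed to exist; this function is not called anywhere in this file):
+// overlay a DragZoomLayer on a plot via a Component Group.
+function exampleDragZoom(plot, xScale, yScale) {
+  var dragZoom = new Plottable.DragZoomLayer(xScale, yScale);
+  dragZoom.animationTime(500);            // zoom animation duration in ms
+  dragZoom.ease(d3.ease("cubic-in-out")); // the default easing, set explicitly
+  return new Plottable.Components.Group([plot, dragZoom]);
+}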
+</script>
+ <script>/// <reference path="../../typings/tsd.d.ts" />
+/// <reference path="../../bower_components/plottable/plottable.d.ts" />
+var __extends = (this && this.__extends) || function (d, b) {
+ for (var p in b) if (b.hasOwnProperty(p)) d[p] = b[p];
+ function __() { this.constructor = d; }
+ __.prototype = b.prototype;
+ d.prototype = new __();
+};
+var TF;
+(function (TF) {
+ var Y_TOOLTIP_FORMATTER_PRECISION = 4;
+ var STEP_AXIS_FORMATTER_PRECISION = 4;
+ var Y_AXIS_FORMATTER_PRECISION = 3;
+ var BaseChart = (function () {
+ function BaseChart(tag, dataCoordinator, tooltipUpdater, xType, colorScale) {
+ this.dataCoordinator = dataCoordinator;
+ this.tag = tag;
+ this.colorScale = colorScale;
+ this.tooltipUpdater = tooltipUpdater;
+ this.buildChart(xType);
+ }
+ BaseChart.prototype.changeRuns = function (runs) {
+ throw new Error("Abstract method not implemented");
+ };
+ BaseChart.prototype.addCrosshairs = function (plot, yAccessor) {
+ var _this = this;
+ var pi = new Plottable.Interactions.Pointer();
+ pi.attachTo(plot);
+ var xGuideLine = new Plottable.Components.GuideLineLayer("vertical");
+ var yGuideLine = new Plottable.Components.GuideLineLayer("horizontal");
+ xGuideLine.addClass("crosshairs");
+ yGuideLine.addClass("crosshairs");
+ var group = new Plottable.Components.Group([plot, xGuideLine, yGuideLine]);
+ var yfmt = multiscaleFormatter(Y_TOOLTIP_FORMATTER_PRECISION);
+ pi.onPointerMove(function (p) {
+ var run2val = {};
+ var x = _this.xScale.invert(p.x).valueOf();
+ var yMin = _this.yScale.domain()[0];
+ var yMax = _this.yScale.domain()[1];
+ var closestRun = null;
+ var minYDistToRun = Infinity;
+ var yValueForCrosshairs = p.y;
+ plot.datasets().forEach(function (dataset) {
+ var run = dataset.metadata().run;
+ var data = dataset.data();
+ var xs = data.map(function (d, i) { return _this.xAccessor(d, i, dataset).valueOf(); });
+ var idx = _.sortedIndex(xs, x);
+ if (idx === 0 || idx === data.length) {
+ // Only find a point when the cursor is inside the range of the data;
+ // if the cursor is to the left or right of all the data, don't attach.
+ return;
+ }
+ var previous = data[idx - 1];
+ var next = data[idx];
+ var x0 = _this.xAccessor(previous, idx - 1, dataset).valueOf();
+ var x1 = _this.xAccessor(next, idx, dataset).valueOf();
+ var y0 = yAccessor(previous, idx - 1, dataset).valueOf();
+ var y1 = yAccessor(next, idx, dataset).valueOf();
+ var slope = (y1 - y0) / (x1 - x0);
+ var y = y0 + slope * (x - x0);
+ if (y < yMin || y > yMax || y !== y) {
+ // Don't find data that is off the top or bottom of the plot;
+ // also don't find data if it is NaN.
+ return;
+ }
+ var dist = Math.abs(_this.yScale.scale(y) - p.y);
+ if (dist < minYDistToRun) {
+ minYDistToRun = dist;
+ closestRun = run;
+ yValueForCrosshairs = _this.yScale.scale(y);
+ }
+ // Note: this tooltip will display linearly interpolated values,
+ // e.g. it will display y=0 halfway between [y=-1, y=1], even
+ // though there is not actually any 0 datapoint. This could be misleading.
+ run2val[run] = yfmt(y);
+ });
+ xGuideLine.pixelPosition(p.x);
+ yGuideLine.pixelPosition(yValueForCrosshairs);
+ _this.tooltipUpdater(run2val, _this.xTooltipFormatter(x), closestRun);
+ });
+ pi.onPointerExit(function () {
+ _this.tooltipUpdater(null, null, null);
+ xGuideLine.pixelPosition(-1);
+ yGuideLine.pixelPosition(-1);
+ });
+ return group;
+ };
+ BaseChart.prototype.buildChart = function (xType) {
+ if (this.outer) {
+ this.outer.destroy();
+ }
+ var xComponents = getXComponents(xType);
+ this.xAccessor = xComponents.accessor;
+ this.xScale = xComponents.scale;
+ this.xAxis = xComponents.axis;
+ this.xAxis.margin(0).tickLabelPadding(3);
+ this.xTooltipFormatter = xComponents.tooltipFormatter;
+ this.yScale = new Plottable.Scales.Linear();
+ this.yAxis = new Plottable.Axes.Numeric(this.yScale, "left");
+ var yFormatter = multiscaleFormatter(Y_AXIS_FORMATTER_PRECISION);
+ this.yAxis.margin(0).tickLabelPadding(5).formatter(yFormatter);
+ this.yAxis.usesTextWidthApproximation(true);
+ var center = this.buildPlot(this.xAccessor, this.xScale, this.yScale);
+ this.gridlines = new Plottable.Components.Gridlines(this.xScale, this.yScale);
+ var dzl = new Plottable.DragZoomLayer(this.xScale, this.yScale);
+ this.center = new Plottable.Components.Group([center, this.gridlines, dzl]);
+ this.outer = new Plottable.Components.Table([
+ [this.yAxis, this.center],
+ [null, this.xAxis]
+ ]);
+ };
+ BaseChart.prototype.buildPlot = function (xAccessor, xScale, yScale) {
+ throw new Error("Abstract method not implemented.");
+ };
+ BaseChart.prototype.renderTo = function (target) {
+ this.outer.renderTo(target);
+ };
+ BaseChart.prototype.redraw = function () {
+ this.outer.redraw();
+ };
+ BaseChart.prototype.destroy = function () {
+ this.outer.destroy();
+ };
+ return BaseChart;
+ })();
+ TF.BaseChart = BaseChart;
+ var LineChart = (function (_super) {
+ __extends(LineChart, _super);
+ function LineChart() {
+ _super.apply(this, arguments);
+ }
+ LineChart.prototype.buildPlot = function (xAccessor, xScale, yScale) {
+ var yAccessor = accessorize("2");
+ var plot = new Plottable.Plots.Line();
+ plot.x(xAccessor, xScale);
+ plot.y(yAccessor, yScale);
+ plot.attr("stroke", function (d, i, m) { return m.run; }, this.colorScale);
+ this.plot = plot;
+ var group = this.addCrosshairs(plot, yAccessor);
+ return group;
+ };
+ LineChart.prototype.changeRuns = function (runs) {
+ var datasets = this.dataCoordinator.getDatasets(this.tag, runs);
+ this.plot.datasets(datasets);
+ };
+ return LineChart;
+ })(BaseChart);
+ TF.LineChart = LineChart;
+ var HistogramChart = (function (_super) {
+ __extends(HistogramChart, _super);
+ function HistogramChart() {
+ _super.apply(this, arguments);
+ }
+ HistogramChart.prototype.changeRuns = function (runs) {
+ var datasets = this.dataCoordinator.getDatasets(this.tag, runs);
+ this.plots.forEach(function (p) { return p.datasets(datasets); });
+ };
+ HistogramChart.prototype.buildPlot = function (xAccessor, xScale, yScale) {
+ var _this = this;
+ var percents = [0, 228, 1587, 3085, 5000, 6915, 8413, 9772, 10000];
+ var opacities = _.range(percents.length - 1).map(function (i) { return (percents[i + 1] - percents[i]) / 2500; });
+ var accessors = percents.map(function (p, i) { return function (datum) { return datum[2][i][1]; }; });
+ var median = 4;
+ var medianAccessor = accessors[median];
+ var plots = _.range(accessors.length - 1).map(function (i) {
+ var p = new Plottable.Plots.Area();
+ p.x(xAccessor, xScale);
+ var y0 = i > median ? accessors[i] : accessors[i + 1];
+ var y = i > median ? accessors[i + 1] : accessors[i];
+ p.y(y, yScale);
+ p.y0(y0);
+ p.attr("fill", function (d, i, m) { return m.run; }, _this.colorScale);
+ p.attr("stroke", function (d, i, m) { return m.run; }, _this.colorScale);
+ p.attr("stroke-weight", function (d, i, m) { return "0.5px"; });
+ p.attr("stroke-opacity", function () { return opacities[i]; });
+ p.attr("fill-opacity", function () { return opacities[i]; });
+ return p;
+ });
+ var medianPlot = new Plottable.Plots.Line();
+ medianPlot.x(xAccessor, xScale);
+ medianPlot.y(medianAccessor, yScale);
+ medianPlot.attr("stroke", function (d, i, m) { return m.run; }, this.colorScale);
+ this.plots = plots;
+ var group = this.addCrosshairs(medianPlot, medianAccessor);
+ return new Plottable.Components.Group([new Plottable.Components.Group(plots), group]);
+ };
+ return HistogramChart;
+ })(BaseChart);
+ TF.HistogramChart = HistogramChart;
+ /* Create a formatter function that will switch between exponential and
+ * regular display depending on the scale of the number being formatted,
+ * and show `digits` significant digits.
+ */
+ function multiscaleFormatter(digits) {
+ return function (v) {
+ var absv = Math.abs(v);
+ if (absv < 1E-15) {
+ // Sometimes zero-like values get an annoying representation
+ absv = 0;
+ }
+ var f;
+ if (absv >= 1E4) {
+ f = d3.format("." + digits + "e");
+ }
+ else if (absv > 0 && absv < 0.01) {
+ f = d3.format("." + digits + "e");
+ }
+ else {
+ f = d3.format("." + digits + "g");
+ }
+ return f(v);
+ };
+ }
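+ // Illustrative values (assumes d3's "e"/"g" format behavior):
+ //   multiscaleFormatter(3)(42000)   -> "4.200e+4"  (large magnitudes: exponential)
+ //   multiscaleFormatter(3)(0.0042)  -> "4.200e-3"  (tiny nonzero values: exponential)
+ //   multiscaleFormatter(3)(3.14159) -> "3.14"      (everything else: general format)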
+ function accessorize(key) {
+ return function (d, index, dataset) { return d[key]; };
+ }
+ function stepX() {
+ var scale = new Plottable.Scales.Linear();
+ var axis = new Plottable.Axes.Numeric(scale, "bottom");
+ var formatter = Plottable.Formatters.siSuffix(STEP_AXIS_FORMATTER_PRECISION);
+ axis.formatter(formatter);
+ return {
+ scale: scale,
+ axis: axis,
+ accessor: accessorize("1"),
+ tooltipFormatter: formatter,
+ };
+ }
+ function wallX() {
+ var scale = new Plottable.Scales.Time();
+ var formatter = Plottable.Formatters.time("%a %b %e, %H:%M:%S");
+ return {
+ scale: scale,
+ axis: new Plottable.Axes.Time(scale, "bottom"),
+ accessor: function (d, index, dataset) {
+ return d[0] * 1000; // convert seconds to ms
+ },
+ tooltipFormatter: function (d) { return formatter(new Date(d)); },
+ };
+ }
+ function relativeX() {
+ var scale = new Plottable.Scales.Linear();
+ var formatter = function (n) {
+ var days = Math.floor(n / 24);
+ n -= (days * 24);
+ var hours = Math.floor(n);
+ n -= hours;
+ n *= 60;
+ var minutes = Math.floor(n);
+ n -= minutes;
+ n *= 60;
+ var seconds = Math.floor(n);
+ return days + "d " + hours + "h " + minutes + "m " + seconds + "s";
+ };
+ return {
+ scale: scale,
+ axis: new Plottable.Axes.Numeric(scale, "bottom"),
+ accessor: function (d, index, dataset) {
+ var data = dataset && dataset.data();
+ // I can't imagine how this function would be called when the data is empty
+ // (after all, it iterates over the data), but let's guard just to be safe.
+ var first = data.length > 0 ? data[0][0] : 0;
+ return (d[0] - first) / (60 * 60); // convert seconds to hours
+ },
+ tooltipFormatter: formatter,
+ };
+ }
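+ // Illustrative value for the relative-time formatter above: an elapsed time
+ // of 25.5 hours formats as "1d 1h 30m 0s".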
+ function getXComponents(xType) {
+ switch (xType) {
+ case "step":
+ return stepX();
+ case "wall_time":
+ return wallX();
+ case "relative":
+ return relativeX();
+ default:
+ throw new Error("invalid xType: " + xType);
+ }
+ }
+})(TF || (TF = {}));
+</script>
+ <script>
+ Polymer({
+ is: "tf-chart",
+ properties: {
+ type: String, // "scalar" or "compressedHistogram"
+ _chart: Object,
+ colorScale: Object,
+ tag: String,
+ selectedRuns: Array,
+ xType: String,
+ dataCoordinator: Object,
+ tooltipUpdater: Function,
+ _initialized: Boolean,
+ },
+ observers: [
+ "_makeChart(type, tag, dataCoordinator, tooltipUpdater, xType, colorScale, _initialized)",
+ "_changeRuns(_chart, selectedRuns.*)"
+ ],
+ _changeRuns: function(chart, change) {
+ this._chart.changeRuns(this.selectedRuns);
+ this.redraw();
+ },
+ redraw: function() {
+ this._chart.redraw();
+ },
+ _constructor: function(type) {
+ if (type === "scalar") {
+ return TF.LineChart;
+ } else if (type === "compressedHistogram") {
+ return TF.HistogramChart;
+ } else {
+ throw new Error("Unrecognized chart type");
+ }
+ },
+ _makeChart: function(type, tag, dataCoordinator, tooltipUpdater, xType, colorScale, _initialized) {
+ if (!_initialized) {
+ return;
+ }
+ if (this._chart) this._chart.destroy();
+ var cns = this._constructor(type);
+ var chart = new cns(tag, dataCoordinator, tooltipUpdater, xType, colorScale);
+ var svg = d3.select(this.$.chartsvg);
+ this.async(function() {
+ chart.renderTo(svg);
+ this._chart = chart;
+ }, 350);
+ },
+ attached: function() {
+ this._initialized = true;
+ },
+ detached: function() {
+ this._initialized = false;
+ }
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-collapsable-pane" assetpath="../components/tf-collapsable-pane/">
+ <template>
+ <button class="heading" on-tap="togglePane" open-button$="[[opened]]">
+ <span class="name">[[name]]</span>
+ <span class="hackpadding"></span>
+ <span class="count">
+ (<span>[[count]]</span>)
+ </span>
+ </button>
+ <iron-collapse opened="[[opened]]">
+ <div class="content">
+ <template is="dom-if" if="[[opened]]" restamp="[[restamp]]">
+ <content></content>
+ </template>
+ </div>
+ </iron-collapse>
+ <style>
+ .heading {
+ margin-top: 10px;
+ padding-left: 15px;
+ background-color: #f3f3f3;
+ border: 1px solid #dedede;
+ border-radius: 5px;
+ font-size: 18px;
+ cursor: pointer;
+ -webkit-tap-highlight-color: rgba(0,0,0,0);
+ width: 100%;
+ height: 30px;
+ box-sizing: border-box;
+ font-size: 16px;
+ display: inline-flex;
+ flex-direction: row;
+ align-items: center;
+ justify-content: space-between;
+ line-height: 1;
+ padding-top: 2px;
+ padding-bottom: 2px;
+ }
+
+ .content {
+ padding: 15px;
+ border: 1px solid #dedede;
+ border-top: none;
+ border-bottom-left-radius: 5px;
+ border-bottom-right-radius: 5px;
+ }
+ [open-button] {
+ border-bottom-left-radius: 0px !important;
+ border-bottom-right-radius: 0px !important;
+ }
+ .name {
+ flex-grow: 0;
+ }
+ .count {
+ flex-grow: 0;
+ float: right;
+ font-size: 12px;
+ }
+ .hackpadding {
+ /* An obnoxious hack, but I can't get justify-content: space-between to work */
+ flex-grow: 1;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-collapsable-pane",
+ properties: {
+ opened: {type: Boolean, value: false},
+ restamp: {type: Boolean, value: true},
+ name: {type: String, observer: "hide"},
+ count: {type: Number},
+ },
+ hide: function() {
+ this.opened = false;
+ },
+ togglePane: function() {
+ this.opened = !this.opened;
+ }
+ });
+ </script>
+
+</dom-module>
+<dom-module id="warning-style" assetpath="../components/tf-dashboard-common/">
+ <template>
+ <style>
+ .warning {
+ max-width: 540px;
+ margin: 80px auto 0 auto;
+ }
+ </style>
+ </template>
+</dom-module>
+<dom-module id="tf-event-dashboard" assetpath="../components/tf-event-dashboard/">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator out-runs-url="{{runsUrl}}" out-scalars-url-generator="{{scalarsUrlGen}}" id="urlGenerator"></tf-url-generator>
+
+ <tf-data-coordinator id="dataCoordinator" url-generator="[[scalarsUrlGen]]" run-to-tag="[[runToScalars]]" color-scale="[[colorScale]]" out-data-coordinator="{{dataCoordinator}}"></tf-data-coordinator>
+
+ <tf-run-generator id="runGenerator" url="[[runsUrl]]" out-run-to-scalars="{{runToScalars}}"></tf-run-generator>
+
+ <tf-color-scale id="colorScale" runs="[[_runs]]" out-color-scale="{{colorScale}}" out-class-scale="{{classScale}}"></tf-color-scale>
+
+ <tf-tooltip-coordinator id="tooltipCoordinator" out-tooltip-updater="{{tooltipUpdater}}" out-tooltip-map="{{tooltipMap}}" out-x-value="{{tooltipXValue}}" out-closest-run="{{closestRun}}"></tf-tooltip-coordinator>
+
+ </div>
+
+ <tf-dashboard-layout>
+ <div class="sidebar">
+ <tf-categorizer id="categorizer" tags="[[_visibleTags]]" categories="{{categories}}"></tf-categorizer>
+ <span id="download-option">
+ Show Data Download Links:
+ <paper-toggle-button checked="{{_show_download_links}}"></paper-toggle-button>
+ </span>
+
+ <tf-x-type-selector id="xTypeSelector" out-x-type="{{xType}}"></tf-x-type-selector>
+
+ <tf-run-selector id="runSelector" runs="[[_runs]]" class-scale="[[classScale]]" out-selected="{{selectedRuns}}" tooltips="[[tooltipMap]]" closest-run="[[closestRun]]" x-value="[[tooltipXValue]]" x-type="[[xType]]"></tf-run-selector>
+
+ </div>
+ <div class="center">
+ <template is="dom-if" if="[[!categories.length]]">
+ <div class="warning">
+ <p>
+ No scalar summary tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.scalar_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <template is="dom-repeat" items="[[categories]]">
+ <tf-collapsable-pane name="[[item.name]]" count="[[item.tags.length]]">
+ <div class="layout horizontal wrap">
+ <template is="dom-repeat" items="[[item.tags]]" as="tag">
+ <div class="card">
+ <span class="card-title">[[tag]]</span>
+ <div class="card-content">
+ <tf-chart tag="[[tag]]" type="scalar" id="chart" selected-runs="[[selectedRuns]]" x-type="[[xType]]" data-coordinator="[[dataCoordinator]]" color-scale="[[colorScale]]" on-keyup="toggleSelected" tabindex="2" tooltip-updater="[[tooltipUpdater]]"></tf-chart>
+ <paper-icon-button class="expand-button" shift$="[[_show_download_links]]" icon="fullscreen" on-tap="toggleSelected"></paper-icon-button>
+ </div>
+ <template is="dom-if" if="[[_show_download_links]]">
+ <div class="card-bottom-row">
+ <tf-downloader selected-runs="[[selectedRuns]]" tag="[[tag]]" url-fn="[[scalarsUrlGen]]" run-to-tag="[[runToScalars]]">
+ </tf-downloader>
+ </div>
+ </template>
+ </div>
+ </template>
+ </div>
+ </tf-collapsable-pane>
+ </template>
+ </div>
+ </tf-dashboard-layout>
+
+ <style include="dashboard-style"></style>
+ <style include="warning-style"></style>
+
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-event-dashboard",
+ properties: {
+ _runs: {
+ type: Array,
+ computed: "_getRuns(runToScalars)",
+ },
+ _visibleTags: {
+ type: Array,
+ computed: "_getVisibleTags(selectedRuns.*, runToScalars.*)"
+ },
+ _show_download_links: Boolean,
+ },
+ observers: ['redraw(_show_download_links)'],
+ redraw: function(_show_download_links) {
+ var els = this.getElementsByTagName("tf-chart");
+ for (var i=0; i<els.length; i++) {
+ els[i].redraw();
+ }
+ },
+ _getRuns: function(runToScalars) {
+ return _.keys(runToScalars);
+ },
+ _getVisibleTags: function(selectedRunsChange, runsToScalarsChange) {
+ var keys = selectedRunsChange.base;
+ var dict = runsToScalarsChange.base;
+ return _.union.apply(null, keys.map(function(k) {return dict[k]}));
+ },
+ toggleSelected: function(e) {
+ var currentTarget = Polymer.dom(e.currentTarget);
+ var parentDiv = currentTarget.parentNode.parentNode;
+ parentDiv.classList.toggle("selected");
+ var chart = currentTarget.previousElementSibling;
+ if (chart) {
+ chart.redraw();
+ }
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-histogram-dashboard" assetpath="../components/tf-histogram-dashboard/">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator out-runs-url="{{runsUrl}}" out-compressed-histograms-url-generator="{{compressedHistogramsUrlGen}}" id="urlGenerator"></tf-url-generator>
+
+ <tf-data-coordinator id="dataCoordinator" url-generator="[[compressedHistogramsUrlGen]]" run-to-tag="[[runToCompressedHistograms]]" color-scale="[[colorScale]]" out-data-coordinator="{{dataCoordinator}}"></tf-data-coordinator>
+
+ <tf-run-generator id="runGenerator" url="[[runsUrl]]" out-run-to-compressed-histograms="{{runToCompressedHistograms}}"></tf-run-generator>
+
+ <tf-color-scale id="colorScale" runs="[[_runs]]" out-color-scale="{{colorScale}}" out-class-scale="{{classScale}}"></tf-color-scale>
+
+ <tf-tooltip-coordinator id="tooltipCoordinator" out-tooltip-updater="{{tooltipUpdater}}" out-tooltip-map="{{tooltipMap}}" out-x-value="{{tooltipXValue}}" out-closest-run="{{closestRun}}"></tf-tooltip-coordinator>
+ </div>
+
+ <tf-dashboard-layout>
+ <div class="sidebar">
+
+ <tf-categorizer id="categorizer" tags="[[_visibleTags]]" categories="{{categories}}"></tf-categorizer>
+
+ <tf-x-type-selector id="xTypeSelector" out-x-type="{{xType}}"></tf-x-type-selector>
+
+ <tf-run-selector id="runSelector" runs="[[_runs]]" class-scale="[[classScale]]" out-selected="{{selectedRuns}}" tooltips="[[tooltipMap]]" closest-run="[[closestRun]]" x-value="[[tooltipXValue]]" x-type="[[xType]]"></tf-run-selector>
+
+ </div>
+
+ <div class="center">
+ <template is="dom-if" if="[[!categories.length]]">
+ <div class="warning">
+ <p>
+ No histogram tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.histogram_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <template is="dom-repeat" items="[[categories]]">
+ <tf-collapsable-pane name="[[item.name]]" count="[[_count(item.tags, selectedRuns.*, runToCompressedHistograms.*)]]">
+ <div class="layout horizontal wrap">
+ <template is="dom-repeat" items="[[item.tags]]" as="tag">
+ <template is="dom-repeat" items="[[selectedRuns]]" as="run">
+ <template is="dom-if" if="[[_exists(run, tag, runToCompressedHistograms.*)]]">
+ <div class="card">
+ <span class="card-title">[[tag]]</span>
+ <div class="card-content">
+ <tf-chart tag="[[tag]]" type="compressedHistogram" id="chart" selected-runs="[[_array(run)]]" x-type="[[xType]]" data-coordinator="[[dataCoordinator]]" color-scale="[[colorScale]]" on-keyup="toggleSelected" tabindex="2" tooltip-updater="[[tooltipUpdater]]"></tf-chart>
+ <paper-icon-button class="expand-button" icon="fullscreen" on-tap="toggleSelected"></paper-icon-button>
+ </div>
+ </div>
+ </template>
+ </template>
+ </template>
+ </div>
+ </tf-collapsable-pane>
+ </template>
+ </div>
+ </tf-dashboard-layout>
+
+ <style include="dashboard-style"></style>
+ <style include="warning-style"></style>
+ </template>
+
+ <script>
+ Polymer({
+ is: "tf-histogram-dashboard",
+ properties: {
+ _runs: {
+ type: Array,
+ computed: "_getRuns(runToCompressedHistograms)",
+ },
+ _visibleTags: {
+ type: Array,
+ computed: "_getVisibleTags(selectedRuns.*, runToCompressedHistograms.*)"
+ }
+ },
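+ // Whether the given run has data for the given compressed-histogram tag.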
+ _exists: function(run, tag, runToCompressedHistogramsChange) {
+ var runToCompressedHistograms = runToCompressedHistogramsChange.base;
+ return runToCompressedHistograms[run].indexOf(tag) !== -1;
+ },
+ _array: function(x) {
+ return [x];
+ },
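+ // Number of (run, tag) cards shown in a category: one per selected run
+ // that actually contains one of the category's tags.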
+ _count: function(tags, selectedRunsChange, runToCompressedHistogramsChange) {
+ var selectedRuns = selectedRunsChange.base;
+ var runToCompressedHistograms = runToCompressedHistogramsChange.base;
+ var targetTags = {};
+ tags.forEach(function(t) {
+ targetTags[t] = true;
+ });
+ var count = 0;
+ selectedRuns.forEach(function(r) {
+ runToCompressedHistograms[r].forEach(function(t) {
+ if (targetTags[t]) {
+ count++;
+ }
+ });
+ });
+ return count;
+ },
+ _getRuns: function(runToCompressedHistograms) {
+ return _.keys(runToCompressedHistograms);
+ },
+ _getVisibleTags: function(selectedRunsChange, runToCompressedHistogramsChange) {
+ var keys = selectedRunsChange.base;
+ var dict = runToCompressedHistogramsChange.base;
+ return _.union.apply(null, keys.map(function(k) {return dict[k]}));
+ },
+ toggleSelected: function(e) {
+ var currentTarget = Polymer.dom(e.currentTarget);
+ var parentDiv = currentTarget.parentNode.parentNode;
+ parentDiv.classList.toggle("selected");
+ var chart = currentTarget.previousElementSibling;
+ if (chart) {
+ chart.redraw();
+ }
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-image-loader" assetpath="../components/tf-image-dashboard/">
+ <style>
+ :host {
+ display: block;
+ }
+ img {
+ width: 100%;
+ height: 100%;
+ }
+ </style>
+ <template>
+ <iron-ajax id="ajax" auto="" url="[[metadataUrl]]" handle-as="json" debounce="50" last-response="{{imageMetadata}}" verbose="true"></iron-ajax>
+ <template is="dom-if" if="[[imageUrl]]">
+ <img src="[[imageUrl]]">
+ </template>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-loader",
+ properties: {
+ run: String,
+ tag: String,
+ imagesGenerator: Function,
+ individualImageGenerator: Function,
+ imageMetadata: Array,
+ metadataUrl: {
+ type: String,
+ computed: "apply(imagesGenerator, tag, run)",
+ },
+ imageUrl: {
+ type: String,
+ computed: "getLastImage(imageMetadata, individualImageGenerator)",
+ },
+ },
+ // Parameter order matches the computed binding above: (tag, run).
+ apply: function(imagesGenerator, tag, run) {
+ return imagesGenerator(tag, run);
+ },
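+ // Resolves the most recent image (the last metadata entry) to a URL via
+ // the individual-image URL generator.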
+ getLastImage: function(imageMetadata, individualImageGenerator) {
+ if (imageMetadata == null) {
+ return null;
+ }
+ var query = _.last(imageMetadata).query;
+ return individualImageGenerator(query);
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-image-grid" assetpath="../components/tf-image-dashboard/">
+ <template>
+ <style include="scrollbar-style"></style>
+ <div id="fullContainer" class="container scrollbar">
+ <div id="topRow" class="container">
+ <div class="noshrink" id="paddingCell"></div>
+ <template is="dom-repeat" items="[[_runs]]" as="run">
+ <div class="run-name-cell noshrink">
+ <span>[[run]]</span>
+ </div>
+ </template>
+ </div>
+ <div id="bottomContainer" class="container">
+ <template is="dom-repeat" items="[[_tags]]" sort="" as="tag">
+ <div class="image-row container noshrink">
+ <div class="tag-name-cell noshrink">
+ <span class="tag-name">[[tag]]</span>
+ </div>
+ <template is="dom-repeat" items="[[_runs]]" as="run">
+ <div class="image-cell noshrink">
+ <template is="dom-if" if="[[_exists(run, tag, runToImages.*)]]">
+ <tf-image-loader id="loader" run="[[run]]" tag="[[tag]]" images-generator="[[imagesGenerator]]" individual-image-generator="[[individualImageGenerator]]">
+ </tf-image-loader>
+ </template>
+ </div>
+ </template>
+ </div>
+ </template>
+ </div>
+ </div>
+ <style>
+ :host {
+ display: block;
+ height: 100%;
+ }
+ .container {
+ display: flex;
+ flex-wrap: nowrap;
+ }
+ #fullContainer {
+ width: 100%;
+ height: 100%;
+ flex-direction: column;
+ padding-top: 20px;
+ overflow: scroll;
+ -webkit-box-sizing: border-box;
+ -moz-box-sizing: border-box;
+ box-sizing: border-box;
+ }
+ #topRow {
+ flex-direction: row;
+ }
+ #bottomContainer {
+ flex-direction: column;
+ height: 100%;
+ width: 100%;
+ }
+ .image-row {
+ flex-direction: row;
+ }
+ .image-cell {
+ width: 300px;
+ height: 300px;
+ border: 1px solid black;
+ }
+ .tag-name-cell {
+ height: 300px;
+ width: 300px;
+ display:flex;
+ flex-direction: column;
+ justify-content: center;
+ }
+ .tag-name {
+ word-wrap: break-word;
+ text-align: center;
+ white-space: nowrap;
+ }
+ .run-name-cell {
+ width: 300px;
+ height: 30px;
+ text-align: center;
+ }
+ .noshrink {
+ flex-shrink: 0;
+ }
+ #paddingCell {
+ width: 300px;
+ height: 30px;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-grid",
+ properties: {
+ runToImages: Object,
+ _tags: {type: Array, computed: "_getTags(runToImages.*)"},
+ _runs: {type: Array, computed: "_getRuns(runToImages.*)"},
+ imagesGenerator: Function,
+ individualImageGenerator: Function,
+ },
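+ // All image tags present in any run, deduplicated.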
+ _getTags: function(runToImages) {
+ return _.chain(runToImages.base).values().flatten().union().value();
+ },
+ _getRuns: function(runToImages) {
+ var r2i = runToImages.base;
+ return _.keys(r2i).filter(function(x) {return r2i[x].length > 0;});
+ },
+ _exists: function (run, tag, runToImages) {
+ runToImages = runToImages.base;
+ return runToImages[run].indexOf(tag) !== -1;
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-image-dashboard" assetpath="../components/tf-image-dashboard/">
+ <template>
+ <div id="plumbing">
+ <tf-url-generator out-runs-url="{{runsUrl}}" out-images-url-generator="{{imagesUrlGen}}" out-individual-image-url-generator="{{individualImageUrlGen}}" id="urlGenerator"></tf-url-generator>
+
+ <tf-run-generator id="runGenerator" url="[[runsUrl]]" out-run-to-images="{{runToImages}}"></tf-run-generator>
+ </div>
+
+ <div class="center">
+ <template is="dom-if" if="[[!_hasImages(runToImages.*)]]">
+ <div class="warning">
+ <p>
+ No image tags were found.
+ </p>
+ <p>
+ Maybe data hasn't loaded yet, or maybe you need
+ to add some <code>tf.image_summary</code> ops to your graph, and
+ serialize them using the <code>tf.training.summary_io.SummaryWriter</code>.
+ </p>
+ </div>
+ </template>
+ <tf-image-grid id="imageGrid" run-to-images="[[runToImages]]" images-generator="[[imagesUrlGen]]" individual-image-generator="[[individualImageUrlGen]]"></tf-image-grid>
+ </div>
+
+ <style>
+ .center {
+ padding-left: 10px;
+ padding-right: 10px;
+ height: 100%;
+ width: 100%;
+ -webkit-box-sizing: border-box;
+ -moz-box-sizing: border-box;
+ box-sizing: border-box;
+ }
+ :host {
+ height: 100%;
+ display: block;
+ }
+
+ </style>
+ <style include="warning-style"></style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-image-dashboard",
+ properties: {
+ runToImages: Object,
+ imagesUrlGen: Function,
+ individualImageUrlGen: Function,
+ },
+ _hasImages: function(runToImagesChange) {
+ return _.values(runToImagesChange.base).some(function(arr) {
+ return arr.length > 0;
+ });
+ },
+ });
+ </script>
+</dom-module>
+<dom-module id="tf-graph-loader" assetpath="../components/tf-graph-loader/">
+</dom-module>
+
+<script>
+Polymer({
+
+ is: 'tf-graph-loader',
+
+ properties: {
+ /**
+ * @type {{value: number, msg: string}}
+ *
+ * The progress state: a number between 0 and 100 denoting the % of
+ * progress for the progress bar, plus the message displayed alongside it.
+ */
+ progress: {
+ type: Object,
+ notify: true,
+ readOnly: true // Produces, does not consume.
+ },
+ datasets: Array,
+ hasStats: {
+ type: Boolean,
+ readOnly: true, // This property produces data.
+ notify: true
+ },
+ selectedDataset: Number,
+ selectedFile: {
+ type: Object,
+ observer: '_selectedFileChanged'
+ },
+ outGraphHierarchy: {
+ type: Object,
+ readOnly: true, //readonly so outsider can't change this via binding
+ notify: true
+ },
+ outGraph: {
+ type: Object,
+ readOnly: true, //readonly so outsider can't change this via binding
+ notify: true
+ },
+ outGraphName: {
+ type: String,
+ readOnly: true,
+ notify: true
+ }
+ },
+ observers: [
+ '_selectedDatasetChanged(selectedDataset, datasets)'
+ ],
+ _parseAndConstructHierarchicalGraph: function(dataset, pbTxtContent) {
+ var self = this;
+ // Reset the progress bar to 0.
+ self._setProgress({
+ value: 0,
+ msg: ''
+ });
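+ // The tracker below is shared by the pipeline stages; each stage gets a
+ // subtask tracker with its slice of the overall progress (30% parsing,
+ // 20% graph building, 50% hierarchy building).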
+ var tracker = {
+ setMessage: function(msg) {
+ self._setProgress({
+ value: self.progress.value,
+ msg: msg
+ });
+ },
+ updateProgress: function(value) {
+ self._setProgress({
+ value: self.progress.value + value,
+ msg: self.progress.msg
+ });
+ },
+ reportError: function(msg) {
+ self._setProgress({
+ value: self.progress.value,
+ msg: msg,
+ error: true
+ });
+ },
+ };
+ var statsJson;
+ var dataTracker = tf.getSubtaskTracker(tracker, 30, 'Data');
+ tf.graph.parser.readAndParseData(dataset, pbTxtContent, dataTracker)
+ .then(function(result) {
+ // Build the flat graph (consists only of Op nodes).
+ var nodes = result.nodes;
+ statsJson = result.statsJson;
+
+ // This is the whitelist of inputs on op types that are considered
+ // reference edges. "Assign 0" indicates that the first input to
+ // an OpNode with operation type "Assign" is a reference edge.
+ var refEdges = {};
+ refEdges["Assign 0"] = true;
+ refEdges["AssignAdd 0"] = true;
+ refEdges["AssignSub 0"] = true;
+ refEdges["assign 0"] = true;
+ refEdges["assign_add 0"] = true;
+ refEdges["assign_sub 0"] = true;
+ refEdges["count_up_to 0"] = true;
+ refEdges["ScatterAdd 0"] = true;
+ refEdges["ScatterSub 0"] = true;
+ refEdges["ScatterUpdate 0"] = true;
+ refEdges["scatter_add 0"] = true;
+ refEdges["scatter_sub 0"] = true;
+ refEdges["scatter_update 0"] = true;
+ var buildParams = {
+ enableEmbedding: true,
+ inEmbeddingTypes: ['Const'],
+ outEmbeddingTypes: ['^[a-zA-Z]+Summary$'],
+ refEdges: refEdges
+ };
+ var graphTracker = tf.getSubtaskTracker(tracker, 20,
+ 'Graph');
+ return tf.graph.build(nodes, buildParams, graphTracker);
+ })
+ .then(function(graph) {
+ this._setOutGraph(graph);
+ if (statsJson) {
+ // If there are associated stats, join them with the graph.
+ tf.time('Joining stats info with graph...', function() {
+ tf.graph.joinStatsInfoWithGraph(graph, statsJson);
+ });
+ }
+ var hierarchyParams = {
+ verifyTemplate: true,
+ groupSeries: true,
+ };
+ var hierarchyTracker = tf.getSubtaskTracker(tracker, 50,
+ 'Namespace hierarchy');
+ return tf.graph.hierarchy.build(graph, hierarchyParams, hierarchyTracker);
+ }.bind(this))
+ .then(function(graphHierarchy) {
+ // Update the properties which notify the parent with the
+ // graph hierarchy and whether the data has live stats or not.
+ this._setHasStats(statsJson != null);
+ this._setOutGraphHierarchy(graphHierarchy);
+ }.bind(this))
+ .catch(function(reason) {
+ tracker.reportError("Graph visualization failed: " + reason);
+ });
+ },
+ _selectedDatasetChanged: function(datasetIndex, datasets) {
+ var dataset = datasets[datasetIndex];
+ this._parseAndConstructHierarchicalGraph(dataset);
+ this._setOutGraphName(dataset.name);
+ },
+ _selectedFileChanged: function(e) {
+ if (!e) {
+ return;
+ }
+ var file = e.target.files[0];
+ if (!file) {
+ return;
+ }
+
+ // Clear out the value of the file chooser. This ensures that if the user
+ // selects the same file, we'll re-read it.
+ e.target.value = '';
+
+ var reader = new FileReader();
+
+ reader.onload = function(e) {
+ this._parseAndConstructHierarchicalGraph(null, e.target.result);
+ }.bind(this);
+
+ reader.readAsText(file);
+ }
+});
+</script>
+<dom-module id="tf-graph-style" assetpath="../components/tf-graph/">
+<template>
+<style>
+:host {
+ display: flex;
+ width: 100%;
+}
+
+::content #svg {
+ overflow: hidden;
+ flex: 1;
+}
+
+::content #hidden {
+ position: fixed;
+ top: 0px;
+ visibility: hidden;
+}
+
+
+/* --- Node and annotation-node for Metanode --- */
+
+::content .meta > .nodeshape > rect,
+::content .meta > .annotation-node > rect {
+ cursor: pointer;
+ fill: hsl(0, 0%, 70%);
+}
+
+
+::content .node.meta.highlighted > .nodeshape > rect,
+::content .node.meta.highlighted > .annotation-node > rect {
+ stroke-width: 2;
+}
+
+::content .annotation.meta.highlighted > .nodeshape > rect,
+::content .annotation.meta.highlighted > .annotation-node > rect {
+ stroke-width: 1;
+}
+
+::content .meta.selected > .nodeshape > rect,
+::content .meta.selected > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .node.meta.selected.expanded > .nodeshape > rect,
+::content .node.meta.selected.expanded > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 3;
+}
+
+::content .annotation.meta.selected > .nodeshape > rect,
+::content .annotation.meta.selected > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .node.meta.selected.expanded.highlighted > .nodeshape > rect,
+::content .node.meta.selected.expanded.highlighted > .annotation-node > rect {
+ stroke: red;
+ stroke-width: 4;
+}
+
+
+/* --- Op Node --- */
+
+::content .op > .nodeshape > ellipse,
+::content .op > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: #fff;
+ stroke: #ccc;
+}
+
+::content .op.selected > .nodeshape > ellipse,
+::content .op.selected > .annotation-node > ellipse {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .op.highlighted > .nodeshape > ellipse,
+::content .op.highlighted > .annotation-node > ellipse {
+ stroke-width: 2;
+}
+
+/* --- Series Node --- */
+
+/* By default, don't show the series background <rect>. */
+::content .series > .nodeshape > rect {
+ fill: hsl(0, 0%, 70%);
+ fill-opacity: 0;
+ stroke-dasharray: 5, 5;
+ stroke-opacity: 0;
+ cursor: pointer;
+}
+
+/* Once expanded, show the series background <rect> and hide the <use>. */
+::content .series.expanded > .nodeshape > rect {
+ fill-opacity: 0.15;
+ stroke: hsl(0, 0%, 70%);
+ stroke-opacity: 1;
+}
+::content .series.expanded > .nodeshape > use {
+ visibility: hidden;
+}
+
+/**
+ * TODO(jimbo): Simplify this by applying a stable class name to all <g>
+ * elements that currently have either the nodeshape or annotation-node classes.
+ */
+::content .series > .nodeshape > use ,
+::content .series > .annotation-node > use {
+ stroke: #ccc;
+}
+::content .series.highlighted > .nodeshape > use ,
+::content .series.highlighted > .annotation-node > use {
+ stroke-width: 2;
+}
+::content .series.selected > .nodeshape > use ,
+::content .series.selected > .annotation-node > use {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .series.selected > .nodeshape > rect {
+ stroke: red;
+ stroke-width: 2;
+}
+
+::content .annotation.series.selected > .annotation-node > use {
+ stroke: red;
+ stroke-width: 2;
+}
+
+/* --- Bridge Node --- */
+::content .bridge > .nodeshape > rect {
+ stroke: #f0f;
+ opacity: 0.2;
+ display: none;
+}
+
+/* --- Structural Elements --- */
+::content .edge > path.edgeline.structural {
+ stroke: #f0f;
+ opacity: 0.2;
+ display: none;
+}
+
+/* --- Series Nodes --- */
+
+/* Hide the rect for a series' annotation. */
+::content .series > .annotation-node > rect {
+ display: none;
+}
+
+/* --- Node label --- */
+
+
+::content .node > text.nodelabel {
+ cursor: pointer;
+ fill: #444;
+}
+
+::content .meta.expanded > text.nodelabel {
+ font-size: 9px;
+}
+
+::content .series > text.nodelabel {
+ font-size: 8px;
+}
+
+::content .op > text.nodelabel {
+ font-size: 6px;
+}
+
+::content .bridge > text.nodelabel {
+ display: none;
+}
+
+::content .node.meta.expanded > text.nodelabel {
+  cursor: default;
+}
+
+::content .annotation.meta.highlighted > text.annotation-label {
+ fill: #50A3F7;
+}
+
+::content .annotation.meta.selected > text.annotation-label {
+ fill: #4285F4;
+}
+
+/* --- Annotation --- */
+
+/* Only applied to annotations that are not summary or constant.
+(.summary, .constant get overridden below) */
+::content .annotation > .annotation-node > * {
+ stroke-width: 0.5;
+ stroke-dasharray: 1, 1;
+}
+
+::content .annotation.summary > .annotation-node > *,
+::content .annotation.constant > .annotation-node > * {
+ stroke-width: 1;
+ stroke-dasharray: none;
+}
+
+::content .annotation > .annotation-edge {
+ fill: none;
+ stroke: #aaa;
+ stroke-width: 0.5;
+ marker-end: url("#annotation-arrowhead");
+}
+
+::content .annotation > .annotation-edge.refline {
+ marker-start: url("#ref-annotation-arrowhead");
+}
+
+::content .annotation > .annotation-control-edge {
+ stroke-dasharray: 1, 1;
+}
+
+::content #annotation-arrowhead {
+ fill: #aaa;
+}
+
+::content #ref-annotation-arrowhead {
+ fill: #aaa;
+}
+
+::content .annotation > .annotation-label {
+ font-size: 5px;
+ cursor: pointer;
+}
+::content .annotation > .annotation-label.annotation-ellipsis {
+ cursor: default;
+}
+
+/* Hide annotations on expanded meta nodes since they're redundant. */
+::content .expanded > .in-annotations,
+::content .expanded > .out-annotations {
+ display: none;
+}
+
+/* --- Annotation: Constant --- */
+
+::content .constant > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: white;
+ stroke: #848484;
+}
+
+::content .constant.selected > .annotation-node > ellipse {
+ fill: white;
+ stroke: red;
+}
+
+::content .constant.highlighted > .annotation-node > ellipse {
+ stroke-width: 1.5;
+}
+
+/* --- Annotation: Summary --- */
+
+::content .summary > .annotation-node > ellipse {
+ cursor: pointer;
+ fill: #DB4437;
+ stroke: #DB4437;
+}
+
+::content .summary.selected > .annotation-node > ellipse {
+ fill: #A52714;
+ stroke: #A52714;
+}
+
+::content .summary.highlighted > .annotation-node > ellipse {
+ stroke-width: 1.5;
+}
+
+/* --- Edge --- */
+
+::content .edge > path.edgeline {
+ fill: none;
+ marker-end: url("#arrowhead");
+ stroke: #bbb;
+ stroke-linecap: round;
+ stroke-width: 0.75;
+}
+
+::content .edge > path.edgeline.refline {
+ marker-start: url("#ref-arrowhead");
+}
+
+::content #arrowhead {
+ fill: #bbb;
+}
+
+::content #ref-arrowhead {
+ fill: #bbb;
+}
+
+::content .edge .control-dep {
+ stroke-dasharray: 2, 2;
+}
+
+/* --- Group node expand/collapse button --- */
+
+/* Hides expand/collapse buttons when a node isn't expanded or highlighted. Using
+ incredibly small opacity so that the bounding box of the <g> parent still takes
+ this container into account even when it isn't visible */
+::content .node:not(.highlighted):not(.expanded) > .nodeshape > .buttoncontainer {
+ opacity: 0.01;
+}
+::content .node.highlighted > .nodeshape > .buttoncontainer {
+ cursor: pointer;
+}
+::content .buttoncircle {
+ fill: #E7811D;
+}
+::content .buttoncircle:hover {
+ fill: #B96717;
+}
+::content .expandbutton,
+::content .collapsebutton {
+ stroke: white;
+}
+/* Do not let the path elements in the button take pointer focus */
+::content .node > .nodeshape > .buttoncontainer > .expandbutton,
+::content .node > .nodeshape > .buttoncontainer > .collapsebutton {
+ pointer-events: none;
+}
+/* Only show the expand button when a node is collapsed and only show the
+ collapse button when a node is expanded. */
+::content .node.expanded > .nodeshape > .buttoncontainer > .expandbutton {
+ display: none;
+}
+::content .node:not(.expanded) > .nodeshape > .buttoncontainer > .collapsebutton {
+ display: none;
+}
+</style>
+</template>
+</dom-module>
+<dom-module id="tf-graph-minimap" assetpath="../components/tf-graph/">
+<template>
+<style>
+:host {
+ background-color:white;
+ transition: opacity .3s linear;
+ pointer-events: auto;
+}
+
+:host.hidden {
+ opacity: 0;
+ pointer-events: none;
+}
+
+canvas {
+ border: 1px solid #999;
+}
+
+rect {
+ fill: white;
+ stroke: #111111;
+ stroke-width: 1px;
+ fill-opacity: 0;
+ filter: url("#minimapDropShadow");
+ cursor: move;
+}
+
+svg {
+ position: absolute;
+}
+</style>
+<svg>
+ <defs>
+ <filter id="minimapDropShadow" x="-20%" y="-20%" width="150%" height="150%">
+ <feOffset result="offOut" in="SourceGraphic" dx="1" dy="1"></feOffset>
+ <feColorMatrix result="matrixOut" in="offOut" type="matrix" values="0.1 0 0 0 0 0 0.1 0 0 0 0 0 0.1 0 0 0 0 0 0.5 0"></feColorMatrix>
+ <feGaussianBlur result="blurOut" in="matrixOut" stdDeviation="2"></feGaussianBlur>
+ <feBlend in="SourceGraphic" in2="blurOut" mode="normal"></feBlend>
+ </filter>
+ </defs>
+ <rect></rect>
+</svg>
+<canvas class="first"></canvas>
+
+<canvas class="second"></canvas>
+</template>
+<script>
+Polymer({
+ is: 'tf-graph-minimap',
+
+ /**
+ * Initializes the minimap and returns a minimap object to notify when
+ * things update.
+ *
+ * @param svg The main svg element.
+ * @param zoomG The svg group used for panning and zooming the main svg.
+ * @param mainZoom The main zoom behavior.
+ * @param maxWAndH The maximum width/height for the minimap.
+ * @param labelPadding Padding in pixels due to the main graph labels.
+ */
+ init: function(svg, zoomG, mainZoom, maxWAndH, labelPadding) {
+ return new tf.scene.Minimap(svg, zoomG, mainZoom, this, maxWAndH,
+ labelPadding);
+ }
+});
+</script>
+</dom-module>
+<dom-module id="tf-graph-scene" assetpath="../components/tf-graph/">
+<template>
+<style include="tf-graph-style">
+ :host {
+ font-size: 20px;
+ }
+ .titleContainer {
+ position: relative;
+ }
+ .title {
+ position: absolute;
+ }
+ .auxTitle {
+ position: absolute;
+ }
+ #minimap {
+ position: absolute;
+ right: 20px;
+ bottom: 20px;
+ }
+</style>
+<div class="titleContainer">
+ <div id="title" class="title">Main Graph</div>
+ <div id="auxTitle" class="auxTitle">Auxiliary nodes</div>
+</div>
+<svg id="svg">
+ <defs>
+
+ <marker id="arrowhead" markerWidth="10" markerHeight="10" refX="9" refY="5" orient="auto">
+ <path d="M 0,0 L 10,5 L 0,10 C 3,7 3,3 0,0"></path>
+ </marker>
+ <marker id="ref-arrowhead" markerWidth="10" markerHeight="10" refX="1" refY="5" orient="auto">
+ <path d="M 10,0 L 0,5 L 10,10 C 7,7 7,3 10,0"></path>
+ </marker>
+
+ <marker id="annotation-arrowhead" markerWidth="5" markerHeight="5" refX="5" refY="2.5" orient="auto">
+ <path d="M 0,0 L 5,2.5 L 0,5 L 0,0"></path>
+ </marker>
+ <marker id="ref-annotation-arrowhead" markerWidth="5" markerHeight="5" refX="0" refY="2.5" orient="auto">
+ <path d="M 5,0 L 0,2.5 L 5,5 L 5,0"></path>
+ </marker>
+
+ <ellipse id="op-node-stamp" rx="7.5" ry="3" stroke="inherit" fill="inherit"></ellipse>
+
+ <ellipse id="op-node-annotation-stamp" rx="5" ry="2" stroke="inherit" fill="inherit"></ellipse>
+
+ <g id="op-series-vertical-stamp">
+ <use xlink:href="#op-node-stamp" x="8" y="9"></use>
+ <use xlink:href="#op-node-stamp" x="8" y="6"></use>
+ <use xlink:href="#op-node-stamp" x="8" y="3"></use>
+ </g>
+
+ <g id="op-series-horizontal-stamp">
+ <use xlink:href="#op-node-stamp" x="16" y="4"></use>
+ <use xlink:href="#op-node-stamp" x="12" y="4"></use>
+ <use xlink:href="#op-node-stamp" x="8" y="4"></use>
+ </g>
+
+ <g id="op-series-annotation-stamp">
+ <use xlink:href="#op-node-annotation-stamp" x="9" y="2"></use>
+ <use xlink:href="#op-node-annotation-stamp" x="7" y="2"></use>
+ <use xlink:href="#op-node-annotation-stamp" x="5" y="2"></use>
+ </g>
+
+ <g id="linearGradients"></g>
+ </defs>
+
+ <rect fill="white" width="10000" height="10000"></rect>
+ <g id="root"></g>
+</svg>
+<tf-graph-minimap id="minimap"></tf-graph-minimap>
+</template>
+</dom-module>
+<script>
+Polymer({
+ is: 'tf-graph-scene',
+ properties: {
+ graphHierarchy: Object,
+ name: String,
+ colorBy: {
+ type: String,
+ observer: '_colorByChanged'
+ },
+ /** @type {d3_zoom} d3 zoom object */
+ _zoom: Object,
+ highlightedNode: {
+ type: String,
+ observer: '_highlightedNodeChanged'
+ },
+ selectedNode: {
+ type: String,
+ observer: '_selectedNodeChanged'
+ },
+ /** Keeps track of whether the graph has been zoomed/panned since loading. */
+ _zoomed: {
+ type: Boolean,
+ observer: '_onZoomChanged',
+ value: false
+ },
+ /** Keeps track of the starting coordinates of a graph zoom/pan */
+ _zoomStartCoords: {
+ type: Array,
+ value: null
+ },
+ /** Keeps track of the current coordinates of a graph zoom/pan */
+ _zoomCoords: {
+ type: Array,
+ value: null
+ },
+ /** Maximum distance of a zoom event for it to be interpreted as a click */
+ _maxZoomDistanceForClick: {
+ type: Number,
+ value: 20
+ },
+ /**
+ * @type {d3.scale.ordinal}
+ * Scale mapping from template name to a number between 0 and N-1
+ * where N is the number of different template names.
+ */
+ templateIndex: Object,
+ /**
+ * @type {tf.scene.Minimap}
+ * A minimap object to notify for zoom events.
+ */
+ minimap: Object,
+ /*
+ * Dictionary for easily stylizing nodes when state changes.
+ * _nodeGroupIndex[nodeName] = d3_selection of the nodeGroup
+ */
+ _nodeGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /*
+ * Dictionary for easily stylizing annotation nodes when state changes.
+ * _annotationGroupIndex[nodeName][hostNodeName] =
+ * d3_selection of the annotationGroup
+ */
+ _annotationGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /*
+ * Dictionary for easily stylizing edges when state changes.
+ * _edgeGroupIndex[edgeName] = d3_selection of the edgeGroup
+ */
+ _edgeGroupIndex: {
+ type: Object,
+ value: function() { return {}; }
+ },
+ /**
+ * Max font size for metanode label strings.
+ */
+ maxMetanodeLabelLengthFontSize: {
+ type: Number,
+ value: 9
+ },
+ /**
+ * Min font size for metanode label strings.
+ */
+ minMetanodeLabelLengthFontSize: {
+ type: Number,
+ value: 6
+ },
+ /**
+ * Metanode label strings longer than this are given smaller fonts.
+ */
+ maxMetanodeLabelLengthLargeFont: {
+ type: Number,
+ value: 11
+ },
+ /**
+ * Metanode label strings longer than this are truncated with ellipses.
+ */
+ maxMetanodeLabelLength: {
+ type: Number,
+ value: 18
+ },
+ progress: Object
+ },
+ observers: [
+ '_buildAndFit(graphHierarchy)'
+ ],
+ getNode: function(nodeName) {
+ return this.graphHierarchy.getRenderNodeByName(nodeName);
+ },
+ isNodeExpanded: function(node) {
+ return node.expanded;
+ },
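+ // Re-lays out and re-renders the scene after a node has been expanded or
+ // collapsed.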
+ setNodeExpanded: function(renderNode) {
+ this._build(this.graphHierarchy);
+ },
+ /**
+ * Resets the state of the component. Called whenever the whole graph
+ * (dataset) changes.
+ */
+ _resetState: function() {
+ // Reset the state of the component.
+ this._nodeGroupIndex = {};
+ this._annotationGroupIndex = {};
+ this._edgeGroupIndex = {};
+ this._updateLabels(false);
+ // Remove all svg elements under the 'root' svg group.
+ d3.select(this.$.svg).select('#root').selectAll('*').remove();
+ // And the defs.
+ d3.select(this.$.svg).select('defs #linearGradients')
+ .selectAll('*').remove();
+ },
+ /** Main method for building the scene */
+ _build: function(graphHierarchy) {
+ if (!graphHierarchy) { return; } // Handle falsy input.
+ var templateNames = d3.keys(graphHierarchy.hierarchy.templates);
+
+ this.templateIndex = d3.scale.ordinal()
+ .domain(templateNames)
+ .range(d3.range(0, templateNames.length));
+ tf.time('tf-graph-scene (layout):', function() {
+ // layout the scene for this meta / series node
+ tf.graph.layout.scene(graphHierarchy.root, this);
+ }.bind(this));
+
+ tf.time('tf-graph-scene (build scene):', function() {
+ tf.graph.scene.buildGroup(d3.select(this.$.root), graphHierarchy.root, this);
+ tf.graph.scene.addGraphClickListener(this.$.svg, this);
+ }.bind(this));
+ // Update the minimap again when the graph is done animating.
+ setTimeout(function() {
+ this.minimap.update();
+ }.bind(this), tf.graph.layout.PARAMS.animation.duration);
+ },
+ ready: function() {
+ this._zoom = d3.behavior.zoom()
+ .on('zoomend', function() {
+ if (this._zoomStartCoords) {
+ // Calculate the total distance dragged during the zoom event.
+ // If it is sufficiently small, fire the zoom-end event immediately.
+ // Otherwise, delay firing it so that any mouse click registered as
+ // part of this zoom/pan is ignored rather than treated as an actual
+ // click on the graph.
+ var dragDistance = Math.sqrt(
+ Math.pow(this._zoomStartCoords[0] - this._zoomCoords[0], 2) +
+ Math.pow(this._zoomStartCoords[1] - this._zoomCoords[1], 2));
+ if (dragDistance < this._maxZoomDistanceForClick) {
+ this._fireEnableClick();
+ } else {
+ setTimeout(this._fireEnableClick.bind(this), 50);
+ }
+ }
+ this._zoomStartCoords = null;
+ }.bind(this))
+ .on('zoom', function() {
+ // Store the coordinates of the zoom event
+ this._zoomCoords = d3.event.translate;
+
+ // If this is the first zoom event after a zoom-end, then
+ // store the coordinates as the start coordinates as well,
+ // and fire an event to indicate that zooming has started.
+ // This doesn't use the zoomstart event, as d3 sends this
+ // event on mouse-down, even if there has been no dragging
+ // done to translate the graph around.
+ if (!this._zoomStartCoords) {
+ this._zoomStartCoords = this._zoomCoords.slice();
+ this.fire('disable-click');
+ }
+ this._zoomed = true;
+ d3.select(this.$.root).attr('transform',
+ 'translate(' + d3.event.translate + ')' +
+ 'scale(' + d3.event.scale + ')');
+ // Notify the minimap.
+ this.minimap.zoom(d3.event.translate, d3.event.scale);
+ }.bind(this));
+ d3.select(this.$.svg).call(this._zoom)
+ .on('dblclick.zoom', null);
+ d3.select(window).on('resize', function() {
+ // Notify the minimap that the user's window was resized.
+ // The minimap will figure out the new dimensions of the main svg
+ // and will use the existing translate and scale params.
+ this.minimap.zoom();
+ }.bind(this));
+ // Initialize the minimap.
+ this.minimap = this.$.minimap.init(this.$.svg, this.$.root, this._zoom,
+ tf.graph.layout.PARAMS.minimap.size,
+ tf.graph.layout.PARAMS.subscene.meta.labelHeight);
+ },
+ _buildAndFit: function(graphHierarchy) {
+ this._resetState();
+ this._build(graphHierarchy);
+ // Fit to screen after the graph is done animating.
+ setTimeout(this.fit.bind(this), tf.graph.layout.PARAMS.animation.duration);
+ },
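+ // Shows or hides the "Main Graph" / "Auxiliary nodes" titles and aligns
+ // them with the core and auxiliary sub-scenes when labels are visible.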
+ _updateLabels: function(showLabels) {
+ var titleStyle = this.getElementsByClassName('title')[0].style;
+ var auxTitleStyle = this.getElementsByClassName('auxTitle')[0].style;
+ var core = this.getElementsByClassName(tf.graph.scene.Class.Scene.CORE)[0];
+ // Only show labels if the graph is fully loaded.
+ if (showLabels && core && this.progress && this.progress.value === 100) {
+ var aux =
+ this.getElementsByClassName(tf.graph.scene.Class.Scene.INEXTRACT)[0] ||
+ this.getElementsByClassName(tf.graph.scene.Class.Scene.OUTEXTRACT)[0];
+ var coreX = core.getCTM().e;
+ var auxX = aux ? aux.getCTM().e : null;
+ titleStyle.display = 'inline';
+ titleStyle.left = coreX + 'px';
+ if (auxX !== null && auxX !== coreX) {
+ auxTitleStyle.display = 'inline';
+ auxTitleStyle.left = auxX + 'px';
+ } else {
+ auxTitleStyle.display = 'none';
+ }
+ } else {
+ titleStyle.display='none';
+ auxTitleStyle.display = 'none';
+ }
+ },
+
+ /**
+ * Called whenever the user changed the 'color by' option in the
+ * UI controls.
+ */
+ _colorByChanged: function() {
+ // We iterate through each svg node and update its state.
+ _.each(this._nodeGroupIndex, function(nodeGroup, nodeName) {
+ this._updateNodeState(nodeName);
+ }, this);
+ // Notify also the minimap.
+ this.minimap.update();
+ },
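+ // Zooms and translates the scene so that the whole graph fits in the
+ // viewport.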
+ fit: function() {
+ tf.graph.scene.fit(this.$.svg, this.$.root, this._zoom, function() {
+ this._zoomed = false;
+ }.bind(this));
+ },
+ isNodeSelected: function(n) {
+ return n === this.selectedNode;
+ },
+ isNodeHighlighted: function(n) {
+ return n === this.highlightedNode;
+ },
+ addAnnotationGroup: function(a, d, selection) {
+ var an = a.node.name;
+ this._annotationGroupIndex[an] = this._annotationGroupIndex[an] || {};
+ this._annotationGroupIndex[an][d.node.name] = selection;
+ },
+ getAnnotationGroupsIndex: function(a) {
+ return this._annotationGroupIndex[a];
+ },
+ removeAnnotationGroup: function(a, d) {
+ delete this._annotationGroupIndex[a.node.name][d.node.name];
+ },
+ addNodeGroup: function(n, selection) {
+ this._nodeGroupIndex[n] = selection;
+ },
+ getNodeGroup: function(n) {
+ return this._nodeGroupIndex[n];
+ },
+ removeNodeGroup: function(n) {
+ delete this._nodeGroupIndex[n];
+ },
+ addEdgeGroup: function(e, selection) {
+ this._edgeGroupIndex[e] = selection;
+ },
+ getEdgeGroup: function(e) {
+ return this._edgeGroupIndex[e];
+ },
+ /**
+ * Update node and annotation node of the given name.
+ * @param {String} n node name
+ */
+ _updateNodeState: function(n) {
+ var node = this.getNode(n);
+ var nodeGroup = this.getNodeGroup(n);
+
+ if (nodeGroup) {
+ tf.graph.scene.node.stylize(nodeGroup, node, this);
+ }
+
+ var annotationGroupIndex = this.getAnnotationGroupsIndex(n);
+ _.each(annotationGroupIndex, function(aGroup, hostName) {
+ tf.graph.scene.node.stylize(aGroup, node, this,
+ tf.graph.scene.Class.Annotation.NODE);
+ }, this);
+ },
+
+ _selectedNodeChanged: function(selectedNode, oldSelectedNode) {
+ if (selectedNode === oldSelectedNode) {
+ return;
+ }
+
+ if (selectedNode) {
+ this._updateNodeState(selectedNode);
+ }
+ if (oldSelectedNode) {
+ this._updateNodeState(oldSelectedNode);
+ }
+
+ if (!selectedNode) {
+ return;
+ }
+ // Update the minimap to reflect the highlighted (selected) node.
+ this.minimap.update();
+ var node = this.graphHierarchy.hierarchy.node(selectedNode);
+ var nodeParents = [];
+ // Create list of all metanode parents of the selected node.
+ while (node.parentNode != null
+ && node.parentNode.name != tf.graph.ROOT_NAME) {
+ node = node.parentNode;
+ nodeParents.push(node.name);
+ }
+ // Ensure each parent metanode is built and expanded.
+ var topParentNodeToBeExpanded;
+ _.forEachRight(nodeParents, function(parentName) {
+ this.graphHierarchy.buildSubhierarchy(parentName);
+ var renderNode = this.graphHierarchy.getRenderNodeByName(parentName);
+ if (renderNode.node.isGroupNode && !renderNode.expanded) {
+ renderNode.expanded = true;
+ if (!topParentNodeToBeExpanded) {
+ topParentNodeToBeExpanded = renderNode;
+ }
+ }
+ }, this);
+ // If any expansion was needed to display this selected node, then
+ // inform the scene of the top-most expansion.
+ if (topParentNodeToBeExpanded) {
+ this.setNodeExpanded(topParentNodeToBeExpanded);
+ this._zoomed = true;
+ }
+
+ if (tf.graph.scene.panToNode(selectedNode, this.$.svg, this.$.root,
+ this._zoom)) {
+ this._zoomed = true;
+ }
+ },
+ _highlightedNodeChanged: function(highlightedNode, oldHighlightedNode) {
+ if (highlightedNode === oldHighlightedNode) {
+ return;
+ }
+
+ if (highlightedNode) {
+ this._updateNodeState(highlightedNode);
+ }
+ if (oldHighlightedNode) {
+ this._updateNodeState(oldHighlightedNode);
+ }
+ },
+ _onZoomChanged: function() {
+ this._updateLabels(!this._zoomed);
+ },
+ _fireEnableClick: function() {
+ this.fire('enable-click');
+ },
+});
+</script>
+<dom-module id="tf-graph-params" assetpath="../components/tf-graph/">
+</dom-module>
+<script>
+ Polymer({
+
+ is: 'tf-graph-params',
+
+ properties: {
+ // PARAMETERS
+
+ enableExtraction: {
+ type: Boolean,
+ value: true
+ },
+
+ /** Maximum in-degree that a node can have without being considered a
+ * high in-degree node. */
+ maxInDegree: {
+ type: Number,
+ value: 4
+ },
+ /** Maximum out-degree that a node can have without being considered a
+ * high out-degree node. */
+ maxOutDegree: {
+ type: Number,
+ value: 4
+ },
+ /** Maximum number of control edges a node can have before they aren't
+ * displayed. */
+ maxControlDegree: {
+ type: Number,
+ value: 4
+ },
+
+ /**
+ * Type patterns for predefined out-extract nodes, which are
+ * sink-like nodes that will be extracted from the main graph.
+ */
+ outExtractTypes: {
+ type: Array,
+ value: function() {
+ return [
+ 'NoOp' // for "sgd", "momentum" group
+ ];
+ }
+ },
+
+ /**
+ * Type patterns for predefined in-extract nodes, which are
+ * source-like nodes that will be extracted from the main graph.
+ */
+ inExtractTypes: {
+ type: Array,
+ value: function() {
+ return ['Variable'];
+ }
+ },
+
+ /**
+ * When removing edges from a high degree node, remove all of its edges if
+ * detachAllEdgesForHighDegree is true. Otherwise remove all in-edges if
+ * the node has high in-degree, or all out-edges if the node has high
+ * out-degree.
+ */
+ detachAllEdgesForHighDegree: {
+ type: Boolean,
+ value: false
+ },
+
+ /**
+ * After extracting high in/out-degree nodes and predefined
+ * source-like/sink-like nodes, extract isolated nodes to the side
+ * if extractIsolatedNodesWithAnnotationsOnOneSide is true.
+ */
+ extractIsolatedNodesWithAnnotationsOnOneSide: {
+ type: Boolean,
+ value: true
+ },
+
+ /**
+ * Whether to draw bridge paths inside of expanded group nodes.
+ */
+ enableBridgegraph: {
+ type: Boolean,
+ value: true
+ },
+
+ /**
+ * Colors for the minimum and maximum values whenever we have a gradient
+ * scale.
+ */
+ minMaxColors: {
+ type: Array,
+ value: function() {
+ return ["#fff5f0", "#fb6a4a"];
+ }
+ },
+
+ /**
+ * Maximum number of annotations to be displayed on a node before an
+ * ellipsis is used.
+ */
+ maxAnnotations: {
+ type: Number,
+ value: 5
+ }
+ }
+ });
+</script>
+<dom-module id="tf-graph" assetpath="../components/tf-graph/">
+<template>
+<style>
+.container {
+ width: 100%;
+ height: 100%;
+}
+
+.vertical {
+ width:100%;
+ height:100%;
+ @apply(--layout-vertical);
+}
+
+.auto {
+ @apply(--layout-flex-auto);
+ @apply(--layout-vertical);
+}
+
+h2 {
+ text-align: center;
+}
+
+paper-button {
+ text-transform: none;
+}
+</style>
+<div class="container">
+ <tf-graph-params id="graphParams"></tf-graph-params>
+ <div class="vertical">
+ <h2>[[title]]</h2>
+ <tf-graph-scene id="scene" class="auto" graph-hierarchy="[[_renderHierarchy]]" highlighted-node="[[_getVisible(highlightedNode)]]" selected-node="[[selectedNode]]" color-by="[[colorBy]]" name="[[graphName]]" progress="[[progress]]"></tf-graph-scene>
+ </div>
+</div>
+</template>
+</dom-module>
+
+<script>
+Polymer({
+
+ is: 'tf-graph',
+
+ properties: {
+ graphHierarchy: {
+ type: Object,
+ notify: true,
+ observer: '_graphChanged'
+ },
+ title: String,
+ selectedNode: {
+ type: String,
+ notify: true,
+ },
+ highlightedNode: {
+ type: String,
+ notify: true
+ },
+ /** What to color the nodes by (compute time, memory, device etc.) */
+ colorBy: String,
+ colorByParams: {
+ type: Object,
+ notify: true,
+ readOnly: true, // Produces and doesn't consume.
+ },
+ // internal properties
+ _graphParams: {
+ type: Object,
+ value: function() {
+ return this.$.graphParams;
+ }
+ },
+ _renderDepth: {
+ type: Number,
+ value: 1
+ },
+ _renderHierarchy: {
+ type: Object,
+ readOnly: true,
+ notify: true,
+ computed: '_buildRenderHierarchy(graphHierarchy, _graphParams)'
+ },
+ _allowGraphSelect: {
+ type: Boolean,
+ value: true
+ }
+ },
+ _buildRenderHierarchy: function(graphHierarchy, params) {
+ return tf.time('new tf.graph.render.Hierarchy', function() {
+ if (graphHierarchy.root.type !== tf.graph.NodeType.META) {
+ // The root must be a metanode, but sometimes Polymer's dom-if has not
+ // removed the tf-graph element from <tf-node-info> yet
+ // and thus mistakenly passes a non-metanode to this module.
+ return;
+ }
+ var renderGraph = new tf.graph.render.RenderGraphInformation(
+ graphHierarchy, params);
+ // Produce the 'color by' parameters to be consumed
+ // by the tf-graph-controls panel. They contain information about the
+ // min and max values and their respective colors, as well as a list
+ // of devices with their respective colors.
+
+ function getColorParamsFromScale(scale) {
+ return {
+ minValue: scale.domain()[0],
+ maxValue: scale.domain()[1],
+ startColor: scale.range()[0],
+ endColor: scale.range()[1]
+ };
+ }
+
+ this._setColorByParams({
+ compute_time: getColorParamsFromScale(renderGraph.computeTimeScale),
+ memory: getColorParamsFromScale(renderGraph.memoryUsageScale),
+ device: _.map(renderGraph.deviceColorMap.domain(),
+ function(deviceName) {
+ return {
+ device: deviceName,
+ color: renderGraph.deviceColorMap(deviceName)
+ };
+ })
+ });
+ return renderGraph;
+ }.bind(this));
+ },
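+ // Maps a node name to its nearest visible ancestor in the render
+ // hierarchy, since the node itself may be hidden inside a collapsed
+ // metanode.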
+ _getVisible: function(name) {
+ if (!name) {
+ return name;
+ }
+ return this._renderHierarchy.getNearestVisibleAncestor(name);
+ },
+ listeners: {
+ 'graph-select': '_graphSelected',
+ 'disable-click': '_disableClick',
+ 'enable-click': '_enableClick',
+ // Nodes
+ 'node-toggle-expand': '_nodeToggleExpand',
+ 'node-select': '_nodeSelected',
+ 'node-highlight': '_nodeHighlighted',
+ 'node-unhighlight': '_nodeUnhighlighted',
+
+ // Annotations
+
+ /* Note: currently highlighting/selecting annotation node has the same
+ * behavior as highlighting/selecting actual node so we point to the same
+ * set of event listeners. However, we might redesign this to be a bit
+ * different.
+ */
+ 'annotation-select': '_nodeSelected',
+ 'annotation-highlight': '_nodeHighlighted',
+ 'annotation-unhighlight': '_nodeUnhighlighted',
+ },
+ _graphChanged: function() {
+ // When a new graph is loaded, fire this event so that there is no
+ // info-card being displayed for the previously-loaded graph.
+ this.fire('graph-select');
+ },
+ _graphSelected: function(event) {
+ // Graph selection is not allowed during an active zoom event, as the
+ // click seen during a zoom/pan is part of the zooming and does not
+ // indicate a user desire to click on a specific section of the graph.
+ if (this._allowGraphSelect) {
+ this.set('selectedNode', null);
+ }
+ // Reset this variable as a bug in d3 zoom behavior can cause zoomend
+ // callback not to be called if a right-click happens during a zoom event.
+ this._allowGraphSelect = true;
+ },
+ _disableClick: function(event) {
+ this._allowGraphSelect = false;
+ },
+ _enableClick: function(event) {
+ this._allowGraphSelect = true;
+ },
+ _nodeSelected: function(event) {
+ if (this._allowGraphSelect) {
+ this.set('selectedNode', event.detail.name);
+ }
+ // Reset this variable as a bug in d3 zoom behavior can cause zoomend
+ // callback not to be called if a right-click happens during a zoom event.
+ this._allowGraphSelect = true;
+ },
+ _nodeHighlighted: function(event) {
+ this.set('highlightedNode', event.detail.name);
+ },
+ _nodeUnhighlighted: function(event) {
+ this.set('highlightedNode', null);
+ },
+ _nodeToggleExpand: function(event) {
+ var nodeName = event.detail.name;
+ var renderNode = this._renderHierarchy.getRenderNodeByName(nodeName);
+ // Op nodes are not expandable.
+ if (renderNode.node.type === tf.graph.NodeType.OP) {
+ return;
+ }
+ this._renderHierarchy.buildSubhierarchy(nodeName);
+ renderNode.expanded = !renderNode.expanded;
+ this.querySelector('#scene').setNodeExpanded(renderNode);
+ // Also select the expanded node.
+ this._nodeSelected(event);
+ },
+ not: function(x) {
+ return !x;
+ }
+});
+</script>
+<dom-module id="tf-graph-icon" assetpath="../components/tf-graph/">
+ <template>
+ <template is="dom-if" if="[[_isType(node, type, 'OP')]]">
+ <template is="dom-if" if="[[_isConst(node, const)]]">
+ <svg height$="[[height]]" preserveAspectRatio="xMinYMid meet" viewBox="0 0 10 10">
+ <circle fill="white" stroke="#848484" cx="5" cy="5" r="3"></circle>
+ </svg>
+ </template>
+ <template is="dom-if" if="[[_isSummary(node, summary)]]">
+ <img height$="[[height]]" src="[[resolveUrl('../../lib/svg/summary-icon.svg')]]">
+ </template>
+ <template is="dom-if" if="[[_isRegularOp(node, const, summary)]]">
+ <svg height$="[[height]]" preserveAspectRatio="xMinYMid meet" viewBox="0 0 16 8">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#op-node-stamp" fill="white" stroke="#ccc" x="8" y="4"></use>
+ </svg>
+ </template>
+ </template>
+ <template is="dom-if" if="[[_isType(node, type, 'META')]]">
+ <svg height$="[[height]]" preserveAspectRatio="xMinYMid meet" viewBox="0 0 37 16">
+ <rect x="1" y="1" fill="#d9d9d9" stroke="#ccc" stroke-width="2px" height="14" width="35" rx="5" ry="5"></rect>
+ </svg>
+ </template>
+ <template is="dom-if" if="[[_isType(node, type, 'SERIES')]]">
+ <template is="dom-if" if="[[_isVertical(node, vertical)]]">
+ <svg height$="[[height]]" preserveAspectRatio="xMinYMid meet" viewBox="0 0 16 15">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#op-series-vertical-stamp" fill="white" stroke="#ccc" x="0" y="2"></use>
+ </svg>
+ </template>
+ <template is="dom-if" if="[[!_isVertical(node, vertical)]]">
+ <svg height$="[[height]]" preserveAspectRatio="xMinYMid meet" viewBox="0 0 24 10">
+ <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#op-series-horizontal-stamp" fill="white" stroke="#ccc" x="0" y="1"></use>
+ </svg>
+ </template>
+ </template>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-graph-icon',
+
+ properties: {
+ /**
+ * Node to represent with an icon. Optional, but if specified, its
+ * properties override those defined in the type, vertical, const and
+ * summary properties.
+ * @type {tf.graph.Node}
+ */
+ node: {
+ type: Object,
+ value: null
+ },
+
+ /** Type of node to draw. */
+ type: {
+ type: String,
+ value: null
+ },
+
+ /** Direction for series (ignored for other types). */
+ vertical: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Whether the op is Const (ignored for non-ops). */
+ const: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Whether the op is a Summary (ignored for non-ops). */
+ summary: {
+ type: Boolean,
+ value: false
+ },
+
+ /** Height of the SVG element in pixels, used for scaling. */
+ height: {
+ type: Number,
+ value: 20
+ }
+ },
+
+ /**
+ * Test whether the specified node's type, or the literal type string,
+ * match a particular other type.
+ */
+ _isType: function(inputNode, inputType, targetType) {
+ if (inputNode) {
+ return tf.graph.NodeType[inputNode.type] === targetType;
+ }
+ return inputType === targetType;
+ },
+
+ /**
+ * Test whether the specified node should be represented as a vertical
+ * series. Defaults to the value of the vertical property if node is
+ * not specified.
+ */
+ _isVertical: function(inputNode, inputVertical) {
+ if (inputNode) {
+ return inputNode.hasNonControlEdges;
+ }
+ return !!inputVertical;
+ },
+
+ /**
+ * Test whether the specified node is a constant. Defaults to the value
+ * of the const property if node is not specified.
+ */
+ _isConst: function(inputNode, inputConst) {
+ if (inputNode) {
+ return inputNode.op === 'Const';
+ }
+ return !!inputConst;
+ },
+
+ /**
+ * Test whether the specified node is a summary. Defaults to the value
+ * of the summary property if node is not specified.
+ */
+ _isSummary: function(inputNode, inputSummary) {
+ if (inputNode) {
+ return this._isType(inputNode, null, 'OP') &&
+ inputNode.op.substr(-7) === 'Summary';
+ }
+ return !!inputSummary;
+ },
+
+ /**
+ * Test whether the op node is a regular non-summary non-const node.
+ */
+ _isRegularOp: function(inputNode, inputConst, inputSummary) {
+ return !this._isConst(inputNode, inputConst) &&
+ !this._isSummary(inputNode, inputSummary);
+ }
+ });
+ })();
+ </script>
+</dom-module>
+<dom-module id="tf-node-list-item" assetpath="../components/tf-graph-info/">
+ <style>
+ #list-item {
+ width: 100%;
+ color: #565656;
+ font-size: 11pt;
+ font-weight: 400;
+ position: relative;
+ }
+
+ #list-item:hover {
+ background-color: var(--google-yellow-100);
+ }
+
+ .clickable {
+ cursor: pointer;
+ }
+
+ #list-item span {
+ display: block;
+ margin-left: 40px;
+ }
+
+ #list-item.excluded span {
+ color: #999;
+ }
+
+ .node-icon {
+ position: absolute;
+ top: 1px;
+ left: 2px;
+ }
+ </style>
+ <template>
+ <div id="list-item" on-mouseover="_nodeListener" on-mouseout="_nodeListener" on-click="_nodeListener">
+ <tf-graph-icon class="node-icon" node="[[itemNode]]" height="12"></tf-graph-icon>
+ <span title$="[[name]]">[[name]]</span>
+ </div>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-node-list-item',
+
+ properties: {
+ /**
+ * The Node for the card itself, on which this item is being drawn.
+ * @type {tf.graph.Node}
+ */
+ cardNode: Object,
+ /**
+ * The Node for the item within the card, somehow related to cardNode.
+ * @type {tf.graph.Node}
+ */
+ itemNode: Object,
+ name: String,
+ itemType: {
+ type: String,
+ observer: '_itemTypeChanged'
+ }
+ },
+
+ _itemTypeChanged: function() {
+ if (this.itemType !== 'subnode') {
+ this.$['list-item'].classList.add('clickable');
+ } else {
+ this.$['list-item'].classList.remove('clickable');
+ }
+ },
+
+ _nodeListener: function(event) {
+ // fire node.click/mouseover/mouseout
+ this.fire('node-list-item-' + event.type, {
+ cardNode: this.cardNode.name,
+ nodeName: this.name,
+ type: this.itemType
+ });
+ }
+
+ });
+ })();
+ </script>
+</dom-module>
+<dom-module id="tf-node-info" assetpath="../components/tf-graph-info/">
+ <style>
+ .sub-list-group {
+ padding: 8px 12px 0px;
+ font-weight: 500;
+ font-size: 12pt;
+ }
+
+ .sub-list {
+ max-height: 300px;
+ overflow-y: scroll;
+ }
+
+ .attr-left {
+ float: left;
+ width: 30%;
+ word-wrap: break-word;
+ color: #565656;
+ font-size: 11pt;
+ font-weight: 400;
+ }
+
+ .attr-right {
+ margin-left: 30%;
+ word-wrap: break-word;
+ color: #565656;
+ font-weight: 400;
+ }
+
+ paper-item {
+ padding: 0;
+ background: #e9e9e9;
+ }
+
+ paper-item-body[two-line] {
+ min-height: 0;
+ padding: 8px 12px 4px;
+ }
+
+ .expandedInfo {
+ padding: 0 0 8px;
+ }
+
+ .controlDeps {
+ padding: 0 0 0 8px;
+ }
+
+ .node-name {
+ white-space: normal;
+ word-wrap: break-word;
+ font-size: 14pt;
+ font-weight: 500;
+ }
+
+ .node-icon {
+ float: right;
+ }
+
+ .subtitle {
+ font-size: 12pt;
+ color: #5e5e5e;
+ }
+
+ .controlLine {
+ font-size: 11pt;
+ font-weight: 400;
+ }
+
+ .toggle-button {
+ float: right;
+ max-height: 20px;
+ max-width: 20px;
+ padding: 0;
+ }
+
+ .control-toggle-button {
+ float: left;
+ max-height: 20px;
+ max-width: 20px;
+ padding: 0;
+ }
+ </style>
+ <template>
+ <paper-item>
+ <paper-item-body two-line="">
+ <div>
+ <paper-icon-button icon="{{_getToggleIcon(_expanded)}}" on-click="_toggleExpanded" class="toggle-button">
+ </paper-icon-button>
+ <div class="node-name">[[_getNodeName(nodeName)]]</div>
+ </div>
+ <div secondary="">
+ <tf-graph-icon class="node-icon" node="[[_node]]"></tf-graph-icon>
+ <template is="dom-if" if="{{_node.op}}">
+ <div class="subtitle">
+ Operation:
+ <span>[[_node.op]]</span>
+ </div>
+ </template>
+ <template is="dom-if" if="{{_node.metagraph}}">
+ <div class="subtitle">
+ Subgraph:
+ <span>[[_node.cardinality]]</span> nodes
+ </div>
+ </template>
+ </div>
+ </paper-item-body>
+ </paper-item>
+ <iron-collapse opened="{{_expanded}}">
+ <template is="dom-if" if="{{_expanded}}" restamp="true">
+ <div class="expandedInfo">
+ <div class="sub-list-group attributes">
+ Attributes
+ (<span>[[_attributes.length]]</span>)
+ <iron-list class="sub-list" id="attributesList" items="[[_attributes]]">
+ <template>
+ <div>
+ <div class="attr-left">[[item.key]]</div>
+ <div class="attr-right">[[item.value]]</div>
+ </div>
+ </template>
+ </iron-list>
+ </div>
+
+ <template is="dom-if" if="{{_device}}">
+ <div class="sub-list-group device">
+ <div class="attr-left">Device</div>
+ <div class="attr-right">[[_device]]</div>
+ </div>
+ </template>
+
+ <div class="sub-list-group predecessors">
+ Inputs
+ (<span>[[_totalPredecessors]]</span>)
+ <iron-list class="sub-list" id="inputsList" items="[[_predecessors.regular]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]" item-node="[[_getNode(item, graphHierarchy)]]" name="[[item]]" item-type="predecessors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ <template is="dom-if" if="[[_predecessors.control.length]]">
+ <div class="controlDeps">
+ <div class="controlLine">
+ <paper-icon-button icon="{{_getToggleIcon(_openedControlPred)}}" on-click="_toggleControlPred" class="control-toggle-button">
+ </paper-icon-button>
+ Control dependencies
+ </div>
+ <iron-collapse opened="{{_openedControlPred}}">
+ <template is="dom-if" if="{{_openedControlPred}}" restamp="true">
+ <iron-list class="sub-list" items="[[_predecessors.control]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]" item-node="[[_getNode(item, graphHierarchy)]]" name="[[item]]" item-type="predecessors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ </template>
+ </iron-collapse>
+ </div>
+ </template>
+ </div>
+
+ <div class="sub-list-group successors">
+ Outputs
+ (<span>[[_totalSuccessors]]</span>)
+ <iron-list class="sub-list" id="outputsList" items="[[_successors.regular]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]" item-node="[[_getNode(item, graphHierarchy)]]" name="[[item]]" item-type="successors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ <template is="dom-if" if="[[_successors.control.length]]">
+ <div class="controlDeps">
+ <div class="controlLine">
+ <paper-icon-button icon="{{_getToggleIcon(_openedControlSucc)}}" on-click="_toggleControlSucc" class="control-toggle-button">
+ </paper-icon-button>
+ Control dependencies
+ </div>
+ <iron-collapse opened="{{_openedControlSucc}}">
+ <template is="dom-if" if="{{_openedControlSucc}}" restamp="true">
+ <iron-list class="sub-list" items="[[_successors.control]]">
+ <template>
+ <tf-node-list-item card-node="[[_node]]" item-node="[[_getNode(item, graphHierarchy)]]" name="[[item]]" item-type="successors">
+ </tf-node-list-item>
+ </template>
+ </iron-list>
+ </template>
+ </iron-collapse>
+ </div>
+ </template>
+ </div>
+ </div>
+ </template>
+ </iron-collapse>
+ </template>
+
+ <script>
+ (function() {
+ Polymer({
+ is: 'tf-node-info',
+
+ properties: {
+ nodeName: String,
+ graphHierarchy: Object,
+ _node: {
+ type: Object,
+ computed: '_getNode(nodeName, graphHierarchy)',
+ observer: '_resetState'
+ },
+ _attributes: {
+ type: Array,
+ computed: '_getAttributes(_node)'
+ },
+ _device: {
+ type: String,
+ computed: '_getDevice(_node)'
+ },
+ _successors: {
+ type: Object,
+ computed: '_getSuccessors(_node, graphHierarchy)'
+ },
+ _predecessors: {
+ type: Object,
+ computed: '_getPredecessors(_node, graphHierarchy)'
+ },
+ _subnodes: {
+ type: Array,
+ computed: '_getSubnodes(_node)'
+ },
+ _expanded: {
+ type: Boolean,
+ value: true
+ },
+ _totalPredecessors: {
+ type: Number,
+ computed: '_getTotalPred(_predecessors)'
+ },
+ _totalSuccessors: {
+ type: Number,
+ computed: '_getTotalSucc(_successors)'
+ },
+ _openedControlPred: {
+ type: Boolean,
+ value: false
+ },
+ _openedControlSucc: {
+ type: Boolean,
+ value: false
+ },
+ },
+      expandNode: function() {
+        this.fire('_node.expand', this._node);
+      },
+ _getNode: function(n, graphHierarchy) {
+ return graphHierarchy.node(n);
+ },
+ _getNodeName: function(nodeName) {
+ // Insert a zero-width whitespace character before each slash so that
+ // long node names wrap cleanly at path boundaries.
+ return (nodeName || '').replace(/\//g, '\u200B/');
+ },
+ _getAttributes: function(node) {
+ this.async(this._resizeList.bind(this, "#attributesList"));
+ return node && node.attr ? node.attr.map(function(entry) {
+ return {key: entry.key, value: JSON.stringify(entry.value)};
+ }) : [];
+      },
+ _getDevice: function(node) {
+ return node ? node.device : null;
+ },
+      _getSuccessors: function(node, hierarchy) {
+        this.async(this._resizeList.bind(this, "#outputsList"));
+        return node ? hierarchy.getSuccessors(node.name) : {regular: [], control: []};
+      },
+      _getPredecessors: function(node, hierarchy) {
+        this.async(this._resizeList.bind(this, "#inputsList"));
+        return node ? hierarchy.getPredecessors(node.name) : {regular: [], control: []};
+      },
+ _getSubnodes: function(node) {
+ return node && node.metagraph ? node.metagraph.nodes() : null;
+ },
+ _getTotalPred: function(predecessors) {
+ return predecessors.regular.length + predecessors.control.length;
+ },
+ _getTotalSucc: function(successors) {
+ return successors.regular.length + successors.control.length;
+ },
+ _toggleControlPred: function() {
+ this._openedControlPred = !this._openedControlPred;
+ },
+ _toggleControlSucc: function() {
+ this._openedControlSucc = !this._openedControlSucc;
+ },
+ _toggleExpanded: function() {
+ this._expanded = !this._expanded;
+ },
+ _getToggleIcon: function(expanded) {
+ return expanded ? "expand-less" : "expand-more";
+ },
+ _resetState: function() {
+ this._openedControlPred = false;
+ this._openedControlSucc = false;
+ },
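+      // Nudge the iron-list to re-measure itself: iron-list listens for
+      // 'iron-resize' events and may otherwise keep a stale size after its
+      // items change.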
+ _resizeList: function(selector) {
+ var list = document.querySelector(selector);
+ if (list) {
+ list.fire('iron-resize');
+ }
+ }
+ });
+ })();
+ </script>
+</dom-module>
+<dom-module id="tf-graph-info" assetpath="../components/tf-graph-info/">
+<template>
+<style>
+:host {
+ font-size: 12px;
+ margin: 0;
+ padding: 0;
+ display: block;
+}
+
+h2 {
+ padding: 0;
+ text-align: center;
+ margin: 0;
+}
+</style>
+<template is="dom-if" if="{{selectedNode}}">
+ <paper-material elevation="1" class="card">
+ <tf-node-info graph-hierarchy="[[graphHierarchy]]" flat-graph="[[graph]]" node-name="[[selectedNode]]" highlighted-node="{{highlightedNode}}">
+ </tf-node-info>
+ </paper-material>
+</template>
+</template>
+<script>
+(function() {
+ Polymer({
+ is: 'tf-graph-info',
+
+ properties: {
+ title: String,
+ graphHierarchy: Object,
+ graph: Object,
+ // Two-ways
+ selectedNode: {
+ type: String,
+ notify: true
+ },
+ highlightedNode: {
+ type: String,
+ notify: true
+ }
+ },
+ listeners: {
+ 'node-list-item-click': '_nodeListItemClicked',
+ 'node-list-item-mouseover': '_nodeListItemMouseover',
+ 'node-list-item-mouseout': '_nodeListItemMouseout'
+ },
+ _nodeListItemClicked: function(event) {
+ this.selectedNode = event.detail.nodeName;
+ },
+ _nodeListItemMouseover: function(event) {
+ this.highlightedNode = event.detail.nodeName;
+ },
+ _nodeListItemMouseout: function() {
+ this.highlightedNode = null;
+ }
+ });
+})();
+</script>
+</dom-module>
+<dom-module id="tf-graph-board" assetpath="../components/tf-graph-board/">
+<template>
+<style>
+:host {
+ display: block;
+}
+
+/deep/ .close {
+ position: absolute;
+ cursor: pointer;
+ left: 15px;
+ bottom: 15px;
+}
+
+.container {
+ width: 100%;
+ height: 100%;
+ opacity: 1;
+}
+
+.container.loading {
+ cursor: progress;
+ opacity: 0.1;
+}
+
+.container.loading.error {
+ cursor: auto;
+}
+
+#info {
+ position: absolute;
+ right: 5px;
+ top: 5px;
+ padding: 0px;
+ max-width: 380px;
+ min-width: 320px;
+ background-color: rgba(255,255,255,0.9);
+ @apply(--shadow-elevation-2dp);
+}
+
+#main {
+ width: 100%;
+ height: 100%;
+}
+
+#progress-bar {
+ display: flex;
+ flex-direction: column;
+ align-items: center;
+ justify-content: center;
+ width: 100%;
+ position: absolute;
+ top: 40px;
+ left: 0;
+ font-size: 13px;
+}
+
+#progress-msg {
+ width: 400px;
+ margin-bottom: 5px;
+}
+
+paper-progress {
+ width: 400px;
+ --paper-progress-height: 6px;
+ --paper-progress-active-color: #f3913e;
+}
+</style>
+<template is="dom-if" if="[[_isNotComplete(progress)]]">
+ <div id="progress-bar">
+ <div id="progress-msg">[[progress.msg]]</div>
+ <paper-progress value="[[progress.value]]"></paper-progress>
+ </div>
+</template>
+<div class$="[[_getContainerClass(progress)]]">
+ <div id="main">
+ <tf-graph id="graph" graph-hierarchy="[[graphHierarchy]]" selected-node="{{_selectedNode}}" highlighted-node="{{_highlightedNode}}" color-by="[[colorBy]]" color-by-params="{{colorByParams}}" graph-name="[[graphName]]" progress="[[progress]]"></tf-graph>
+ </div>
+ <div id="info">
+ <tf-graph-info id="graph-info" title="selected" graph-hierarchy="[[graphHierarchy]]" graph="[[graph]]" selected-node="{{_selectedNode}}" highlighted-node="{{_highlightedNode}}"></tf-graph-info>
+ </div>
+</div>
+</template>
+</dom-module>
+
+<script>
+Polymer({
+ is: 'tf-graph-board',
+ properties: {
+ // Public API.
+ graphHierarchy: Object,
+ graph: Object,
+ graphName: String,
+ // True if the graph data has also run-time stats.
+ hasStats: Boolean,
+    /**
+     * @type {{value: number, msg: string}}
+     *
+     * The progress state: a number between 0 and 100 denoting the % of
+     * progress for the progress bar, and the message to display with it.
+     */
+ progress: Object,
+ colorByParams: {
+ type: Object,
+ notify: true,
+ },
+ // Private API: Data routing between child components.
+ _selectedNode: String,
+ _highlightedNode: String,
+ },
+ /** True if the progress is not complete yet (< 100 %). */
+ _isNotComplete: function(progress) {
+ return progress.value < 100;
+ },
+ _getContainerClass: function(progress) {
+ var result = 'container';
+ if (progress.error) {
+ result += ' error';
+ }
+ if (this._isNotComplete(progress)) {
+ result += ' loading';
+ }
+ return result;
+ }
+});
+</script>
+<dom-module id="tf-graph-controls" assetpath="../components/tf-graph/">
+<template>
+<style>
+:host {
+ font-size: 12px;
+ color: gray;
+ --paper-font-subhead: {
+ font-size: 14px;
+ color: gray;
+ };
+ --paper-dropdown-menu-icon: {
+ width: 15px;
+ height: 15px;
+ };
+ --paper-dropdown-menu-button: {
+ padding: 0;
+ };
+ --paper-dropdown-menu-input: {
+ padding: 0;
+ };
+ --paper-item-min-height: 30px;
+}
+
+paper-button[raised].keyboard-focus {
+ font-weight: normal;
+}
+
+.run-dropdown {
+ --paper-input-container: {
+ padding: 9px 0 0 25px;
+ };
+}
+
+.color-dropdown {
+ --paper-input-container: {
+ padding: 9px 0 0 13px;
+ };
+}
+
+table {
+ border-collapse: collapse;
+ border-spacing: 0;
+}
+
+table td {
+ padding: 0;
+ margin: 0;
+}
+
+.allcontrols {
+ padding: 10px;
+}
+
+.legend-holder {
+ position: absolute;
+ bottom: 0;
+ padding-bottom: 10px;
+}
+
+#fit {
+ color: var(--paper-orange-500);
+}
+
+paper-radio-button {
+ padding: 5px;
+}
+svg.icon {
+ width: 60px;
+ height: 18px;
+}
+.icon ellipse {
+ rx: 10px;
+ ry: 5px;
+ stroke: #CCC;
+ stroke-width: 1px;
+ fill: #FFFFFF;
+ cy: 10px;
+}
+.icon rect {
+ height: 14px;
+ width: 35px;
+ rx: 5px;
+ ry: 5px;
+ stroke: #CCC;
+ stroke-width: 2px;
+ fill: #D9D9D9;
+}
+.domainValues {
+ width: 165px;
+}
+.domainStart {
+ float: left;
+}
+.domainEnd {
+ float: right;
+}
+.colorBox {
+ width: 20px;
+}
+
+.image-icon {
+ width: 24px;
+ height: 24px;
+}
+
+.gray {
+ color: #666;
+}
+
+.title {
+ font-size: 16px;
+ margin: 8px 5px 8px 0;
+ color: black;
+}
+.title small {
+ font-weight: normal;
+}
+.deviceList {
+ max-height: 100px;
+ overflow-y: auto;
+}
+
+#file {
+ padding: 8px 0;
+}
+
+.color-text {
+ padding: 0 0 0 55px;
+}
+
+.fit-button-text {
+ text-transform: none;
+ padding: 8px 18px 0 18px;
+ font-size: 14px
+}
+
+.upload-button {
+ width: 165px;
+ height: 25px;
+ text-transform: none;
+ margin-top: 4px;
+}
+
+.fit-button {
+ padding: 2px;
+ width: 30px;
+ height: 30px;
+}
+
+.hidden-input {
+ height: 0px;
+ width: 0px;
+ overflow:hidden;
+}
+
+.allcontrols .control-holder {
+ display: flex;
+ clear: both;
+}
+</style>
+<div class="allcontrols">
+ <div class="control-holder">
+ <paper-icon-button id="fit" icon="aspect-ratio" class="fit-button" on-click="fit" alt="Fit to screen">
+ </paper-icon-button>
+ <paper-button class="fit-button-text" on-click="fit">Fit to screen
+ </paper-button>
+ </div>
+ <div class="control-holder">
+ <div class="title">Run</div>
+ <paper-dropdown-menu no-label-float="" no-animations="" noink="" class="run-dropdown">
+ <paper-menu id="select" class="dropdown-content" selected="{{selectedDataset}}">
+ <template is="dom-repeat" items="[[datasets]]">
+ <paper-item>[[item.name]]</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ </div>
+ <div class="control-holder">
+ <div class="title">Upload</div>
+ <paper-button raised="" class="text-button upload-button" on-click="_getFile">Choose File</paper-button>
+ <div class="hidden-input">
+ <input type="file" id="file" name="file" on-change="_updateFileInput">
+ </div>
+ </div>
+ <div class="control-holder">
+ <div class="title">Color</div>
+ <paper-dropdown-menu no-label-float="" no-animations="" noink="" class="color-dropdown">
+ <paper-menu class="dropdown-content" selected="{{_colorByIndex}}">
+ <paper-item>Structure</paper-item>
+ <paper-item>Device</paper-item>
+ <template is="dom-if" if="[[hasStats]]">
+ <paper-item>Compute time</paper-item>
+ <paper-item>Memory</paper-item>
+ </template>
+ </paper-menu>
+ </paper-dropdown-menu>
+ </div>
+ <div>
+ <template is="dom-if" if="[[_isGradientColoring(colorBy)]]">
+ <svg width="160" height="20" style="margin: 0 5px" class="color-text">
+ <defs>
+ <linearGradient id="linearGradient" x1="0%" y1="0%" x2="100%" y2="0%">
+ <stop class="start" offset="0%" stop-color$="[[_currentGradientParams.startColor]]"></stop>
+ <stop class="end" offset="100%" stop-color$="[[_currentGradientParams.endColor]]"></stop>
+ </linearGradient>
+ </defs>
+ <rect x="0" y="0" width="160" height="20" fill="url(#linearGradient)" stroke="black"></rect>
+ </svg>
+ <div class="domainValues color-text">
+ <div class="domainStart">[[_currentGradientParams.minValue]]</div>
+ <div class="domainEnd">[[_currentGradientParams.maxValue]]</div>
+ </div>
+ </template>
+ <template is="dom-if" if="[[_equals(colorBy, 'structure')]]">
+ <div class="color-text">
+ color: same substructure<br>
+ gray: unique substructure
+ </div>
+ </template>
+ <template is="dom-if" if="[[_equals(colorBy, 'device')]]">
+ <div class="color-text">
+ <div class="deviceList">
+ <table>
+ <template is="dom-repeat" items="[[colorByParams.device]]">
+ <tr>
+ <td style$="[[_getBackgroundColor(item.color)]]">
+ <div class="colorBox"></div>
+ </td>
+ <td>
+ <div>[[item.device]]</div>
+ </td>
+ </tr>
+ </template>
+ </table>
+ </div>
+ <br>
+ gray: unknown device
+ </div>
+ </template>
+ </div>
+ <div class="legend-holder">
+ <table>
+ <tbody><tr>
+ <td><div class="title">Graph</div></td>
+ <td>(* = expandable)</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon">
+ <rect transform="translate(3, 1)" height="14" width="35" rx="5" ry="5"></rect>
+ </svg>
+ </td>
+ <td>Namespace<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" preserveAspectRatio="xMinYMid meet" viewBox="0 0 10 10">
+ <use xlink:href="#op-node-stamp" fill="white" stroke="#ccc" x="9.5" y="6"></use>
+ </svg>
+ </td>
+ <td>OpNode</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet" viewBox="0 0 12 12">
+ <use xlink:href="#op-series-horizontal-stamp" fill="white" stroke="#ccc" x="2" y="2"></use>
+ </svg>
+ </td>
+ <td>Unconnected series<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <use xlink:href="#op-series-vertical-stamp" fill="white" stroke="#ccc" x="2" y="2"></use>
+ </svg>
+ </td>
+ <td>Connected series<span class="gray">*</span></td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon">
+ <circle fill="white" stroke="#848484" cx="10" cy="10" r="5"></circle>
+ </svg>
+ </td>
+ <td>Constant</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="image-icon">
+ <image id="summary-icon" width="24" height="24" x="0" y="0" class="image-icon"></image>
+ </svg>
+ </td>
+ <td>Summary</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <defs>
+ <marker id="arrowhead-legend" fill="#bbb" markerWidth="10" markerHeight="10" refX="9" refY="5" orient="auto">
+ <path d="M 0,0 L 10,5 L 0,10 C 3,7 3,3 0,0"></path>
+ </marker>
+ <marker id="ref-arrowhead-legend" fill="#bbb" markerWidth="10" markerHeight="10" refX="1" refY="5" orient="auto">
+ <path d="M 10,0 L 0,5 L 10,10 C 7,7 7,3 10,0"></path>
+ </marker>
+ </defs>
+ <path marker-end="url(#arrowhead-legend)" stroke="#bbb" d="M2 9 l 23 0" stroke-linecap="round"></path>
+ </svg>
+ </td>
+ <td>Dataflow edge</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <path marker-end="url(#arrowhead-legend)" stroke="#bbb" d="M2 9 l 23 0" stroke-linecap="round" stroke-dasharray="2, 2"></path>
+ </svg>
+ </td>
+ <td>Control dependency edge</td>
+ </tr>
+ <tr>
+ <td>
+ <svg class="icon" height="15px" preserveAspectRatio="xMinYMid meet" viewBox="0 0 15 15">
+ <path marker-start="url(#ref-arrowhead-legend)" marker-end="url(#arrowhead-legend)" stroke="#bbb" d="M2 9 l 23 0" stroke-linecap="round"></path>
+ </svg>
+ </td>
+ <td>Reference edge</td>
+ </tr>
+ </tbody></table>
+ </div>
+ </div>
+</template>
+<script>
+(function() { // Private scope.
+Polymer({
+ is: 'tf-graph-controls',
+ ready: function() {
+ // Set the url to download the summary icon.
+ d3.select(this.$['summary-icon'])
+ .attr('xlink:href', this.resolveUrl('../../lib/svg/summary-icon.svg'));
+ },
+ properties: {
+ // Public API.
+ hasStats: {
+ type: Boolean
+ },
+ colorBy: {
+ type: String,
+ notify: true,
+ computed: '_getColorBy(_colorByIndex)'
+ },
+ colorByParams: Object,
+ datasets: {
+ type: Array,
+ observer: '_datasetsChanged'
+ },
+ selectedDataset: {
+ type: Number,
+ notify: true,
+ value: 0,
+ },
+ selectedFile: {
+ type: Object,
+ notify: true
+ },
+ // Private API.
+ _colorByIndex: {
+ type: Number,
+ value: 0 // Defaults to 'structure'.
+ },
+ _currentGradientParams: {
+ type: Object,
+ computed: '_getCurrentGradientParams(colorByParams, colorBy)'
+ }
+ },
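+  // Maps the selected index of the color dropdown to a colorBy name; the order
+  // here must match the paper-item entries in the dropdown above.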
+ _getColorBy: function(colorByIndex) {
+ return ["structure", "device", "compute_time", "memory"][colorByIndex];
+ },
+ _getBackgroundColor: function(color) {
+ return 'background-color:' + color;
+ },
+ fit: function() {
+ document.querySelector('#scene').fit();
+ },
+ _isGradientColoring: function(colorBy) {
+ return ["compute_time", "memory"].indexOf(colorBy) !== -1;
+ },
+ _equals: function(a, b) {
+ return a === b;
+ },
+ _getCurrentGradientParams: function(colorByParams, colorBy) {
+ if (!this._isGradientColoring(colorBy)) {
+ return;
+ }
+ var params = colorByParams[colorBy];
+ var minValue = params.minValue;
+ var maxValue = params.maxValue;
+ if (colorBy === 'memory') {
+ minValue = convertToHumanReadable(minValue, MEMORY_UNITS);
+ maxValue = convertToHumanReadable(maxValue, MEMORY_UNITS);
+ } else if (colorBy === 'compute_time') {
+ minValue = convertToHumanReadable(minValue, TIME_UNITS);
+ maxValue = convertToHumanReadable(maxValue, TIME_UNITS);
+ }
+ return {
+ minValue: minValue,
+ maxValue: maxValue,
+ startColor: params.startColor,
+ endColor: params.endColor
+ };
+ },
+ _updateFileInput: function(e) {
+ this.set('selectedFile', e);
+ },
+ _datasetsChanged: function(newDatasets, oldDatasets) {
+    if (oldDatasets != null || this.selectedDataset == null) {
+ // Select the first dataset by default.
+ this.set('selectedDataset', 0);
+ }
+ },
+ _getFile: function() {
+ this.$.file.click();
+ }
+});
+
+// Private methods.
+var MEMORY_UNITS = [
+ // Atomic unit.
+ {symbol: 'B'},
+ // numUnits specifies how many previous units this unit contains.
+ {symbol: 'KB', numUnits: 1024},
+ {symbol: 'MB', numUnits: 1024},
+ {symbol: 'GB', numUnits: 1024},
+ {symbol: 'TB', numUnits: 1024},
+ {symbol: 'PB', numUnits: 1024}
+];
+var TIME_UNITS = [
+ // Atomic unit. Finest granularity in TensorFlow stat collection.
+ {symbol: 'µs'},
+ // numUnits specifies how many previous units this unit contains.
+ {symbol: 'ms', numUnits: 1000},
+ {symbol: 's', numUnits: 1000},
+ {symbol: 'min', numUnits: 60},
+ {symbol: 'hr', numUnits: 60},
+ {symbol: 'days', numUnits: 24}
+];
+
+/**
+ * Returns a human-readable string for the given value and units
+ * (e.g. 1.35 GB, 23 MB, 34 ms, 6.53 min).
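+ * For example, convertToHumanReadable(1500, TIME_UNITS) === '1.5 ms'.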
+ */
+function convertToHumanReadable(value, units, unitIndex) {
+ unitIndex = unitIndex == null ? 0 : unitIndex;
+ if (unitIndex + 1 < units.length && value >= units[unitIndex + 1].numUnits) {
+ return convertToHumanReadable(value / units[unitIndex + 1].numUnits,
+ units, unitIndex + 1);
+ }
+ // toPrecision() has the tendency to return a number in scientific
+ // notation and (number - 0) brings it back to normal notation.
+ return (value.toPrecision(3) - 0) + ' ' + units[unitIndex].symbol;
+}
+})(); // Closing private scope.
+</script>
+</dom-module>
+<dom-module id="tf-graph-dashboard" assetpath="../components/tf-graph-dashboard/">
+<template>
+<div id="plumbing">
+ <tf-url-generator out-runs-url="{{_runsUrl}}" out-graph-url-generator="{{_graphUrlGen}}" id="urlGenerator"></tf-url-generator>
+ <tf-run-generator id="runGenerator" url="[[_runsUrl]]" out-runs-with-graph="{{_runsWithGraph}}"></tf-run-generator>
+</div>
+<template is="dom-if" if="[[_datasetsEmpty(_datasets)]]">
+<div class="warning">
+ <p>
+ No graph definition files were found.
+ </p>
+ <p>
+ To store a graph, create a
+ <code>tf.python.training.summary_io.SummaryWriter</code>
+ and pass the graph either via the constructor, or by calling its
+ <code>add_graph()</code> method.
+ </p>
+</div>
+</template>
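+<!-- A minimal sketch of producing such a graph file (argument names and the
+     use of sess.graph_def are assumptions, not part of this change):
+       writer = tf.python.training.summary_io.SummaryWriter('/tmp/logdir',
+                                                            sess.graph_def)
+       # or, after constructing the writer: writer.add_graph(sess.graph_def)
+-->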
+<template is="dom-if" if="[[!_datasetsEmpty(_datasets)]]">
+<tf-dashboard-layout>
+<div class="sidebar">
+  <tf-graph-controls id="controls" color-by-params="[[_colorByParams]]" has-stats="[[_hasStats]]" color-by="{{_colorBy}}" datasets="[[_datasets]]" selected-dataset="{{_selectedDataset}}" selected-file="{{_selectedFile}}"></tf-graph-controls>
+  <tf-graph-loader id="loader" datasets="[[_datasets]]" selected-dataset="[[_selectedDataset]]" selected-file="[[_selectedFile]]" out-graph-hierarchy="{{_graphHierarchy}}" out-graph="{{_graph}}" out-graph-name="{{_graphName}}" has-stats="{{_hasStats}}" progress="{{_progress}}"></tf-graph-loader>
+</div>
+<div class="center">
+ <tf-graph-board id="graphboard" graph-hierarchy="[[_graphHierarchy]]" graph="[[_graph]]" has-stats="[[_hasStats]]" graph-name="[[_graphName]]" progress="[[_progress]]" color-by="[[_colorBy]]" color-by-params="{{_colorByParams}}">
+ </tf-graph-board>
+</div>
+</tf-dashboard-layout></template>
+<style>
+
+:host /deep/ {
+ font-family: 'Roboto', sans-serif;
+}
+
+.center {
+ height: 100%;
+}
+
+</style>
+<style include="warning-style"></style>
+</template>
+</dom-module>
+
+<script>
+(function() {
+Polymer({
+ is: 'tf-graph-dashboard',
+ properties: {
+ _runsWithGraph: Array,
+ _datasets: {
+ type: Object,
+ computed: '_getDatasets(_runsWithGraph, _graphUrlGen)'
+ }
+ },
+ _getDatasets: function(runsWithGraph, graphUrlGen) {
+ return _.map(runsWithGraph, function(runName) {
+ return {
+ name: runName,
+ path: graphUrlGen(runName)
+ };
+ });
+ },
+ _datasetsEmpty: function(datasets) {
+ return !datasets || !datasets.length;
+ }
+});
+})();
+</script>
+</div><dom-module id="tf-tensorboard">
+ <template>
+ <paper-header-panel>
+ <paper-toolbar id="toolbar">
+ <div id="toolbar-content">
+ <div class="toolbar-title">
+ TensorBoard
+ </div>
+ <div class="right-buttons">
+ <paper-button class="link-button" on-click="chooseEvents" active$="[[eventDashboard(mode)]]" noink="">Events</paper-button>
+ <paper-button class="link-button" on-click="chooseImages" active$="[[imageDashboard(mode)]]" noink="">Images</paper-button>
+ <paper-button class="link-button" on-click="chooseGraphs" active$="[[graphDashboard(mode)]]" noink="">Graph</paper-button>
+ <paper-button class="link-button" on-click="chooseHistograms" active$="[[histogramDashboard(mode)]]" noink="">Histograms</paper-button>
+ </div>
+ </div>
+ </paper-toolbar>
+ <div id="content" class="fit">
+ <template is="dom-if" if="[[eventDashboard(mode)]]">
+ <tf-event-dashboard id="eventDash"></tf-event-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[imageDashboard(mode)]]">
+ <tf-image-dashboard id="imageDash"></tf-image-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[graphDashboard(mode)]]">
+ <tf-graph-dashboard id="graphDash"></tf-graph-dashboard>
+ </template>
+
+ <template is="dom-if" if="[[histogramDashboard(mode)]]">
+ <tf-histogram-dashboard id="histogramDash"></tf-histogram-dashboard>
+ </template>
+ </div>
+ </paper-header-panel>
+ <style>
+ #toolbar {
+ background-color: var(--tb-orange-strong);
+ background-image: radial-gradient(ellipse, var(--tb-orange-weak), var(--tb-orange-strong));
+ }
+ #toolbar-content {
+ width: 100%;
+ display: flex;
+ flex-direction: row;
+ justify-content: space-between;
+ align-items: center;
+ }
+ .toolbar-title {
+ font-size: 30px;
+ }
+ #content {
+ height: 100%;
+ }
+ .link-button {
+ height: 30px;
+ }
+ [active] {
+ font-weight: bold;
+ }
+ :host {
+ height: 100%;
+ display: block;
+ }
+ </style>
+ </template>
+ <script>
+ Polymer({
+ is: "tf-tensorboard",
+ properties: {
+ mode: {
+ type: String,
+ value: "events",
+ },
+ },
+ chooseEvents: function() {
+ this.mode = "events";
+ },
+ chooseImages: function() {
+ this.mode = "images";
+ },
+ chooseGraphs: function() {
+ this.mode = "graphs";
+ },
+ chooseHistograms: function() {
+ this.mode = "histograms";
+ },
+ eventDashboard: function(mode) {
+ return mode === "events";
+ },
+ imageDashboard: function(mode) {
+ return mode === "images";
+ },
+ graphDashboard: function(mode) {
+ return mode === "graphs";
+ },
+ histogramDashboard: function(mode) {
+ return mode === "histograms";
+ }
+ });
+ </script>
+</dom-module>
+</body></html> \ No newline at end of file
diff --git a/tensorflow/tensorboard/float_wrapper.py b/tensorflow/tensorboard/float_wrapper.py
new file mode 100644
index 0000000000..9fe45d9070
--- /dev/null
+++ b/tensorflow/tensorboard/float_wrapper.py
@@ -0,0 +1,30 @@
+"""A module providing a function for serializing JSON values with Infinity.
+
+Python provides no way to override how json.dumps serializes
+Infinity/-Infinity/NaN; if allow_nan is true, it encodes them as
+Infinity/-Infinity/NaN, in violation of the JSON spec and in violation of what
+JSON.parse accepts. If it's false, it throws a ValueError. Neither subclassing
+JSONEncoder nor passing a function in the |default| keyword argument overrides
+this.
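+
+A typical use (mirroring tensorboard_handler.py in this change) is to run
+WrapSpecialFloats over a value just before json.dumps:
+
+  json.dumps(WrapSpecialFloats({'loss': float('inf')}))
+  # -> '{"loss": "Infinity"}'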
+"""
+
+import math
+
+
+def WrapSpecialFloats(obj):
+ """Replaces all instances of Infinity/-Infinity/NaN with strings."""
+ if obj == float('inf'):
+ return 'Infinity'
+ elif obj == float('-inf'):
+ return '-Infinity'
+ elif isinstance(obj, float) and math.isnan(obj):
+ return 'NaN'
+ elif isinstance(obj, list) or isinstance(obj, tuple):
+ return map(WrapSpecialFloats, obj)
+ elif isinstance(obj, dict):
+ return {
+ WrapSpecialFloats(k): WrapSpecialFloats(v)
+ for k, v in obj.items()
+ }
+ else:
+ return obj
diff --git a/tensorflow/tensorboard/float_wrapper_test.py b/tensorflow/tensorboard/float_wrapper_test.py
new file mode 100644
index 0000000000..5f6594733c
--- /dev/null
+++ b/tensorflow/tensorboard/float_wrapper_test.py
@@ -0,0 +1,38 @@
+import tensorflow.python.platform
+
+from tensorflow.python.platform import googletest
+from tensorflow.tensorboard import float_wrapper
+
+_INFINITY = float('inf')
+
+
+class FloatWrapperTest(googletest.TestCase):
+
+ def _assertWrapsAs(self, to_wrap, expected):
+ """Asserts that |to_wrap| becomes |expected| when wrapped."""
+ actual = float_wrapper.WrapSpecialFloats(to_wrap)
+ for a, e in zip(actual, expected):
+ self.assertEqual(e, a)
+
+ def testWrapsPrimitives(self):
+ self._assertWrapsAs(_INFINITY, 'Infinity')
+ self._assertWrapsAs(-_INFINITY, '-Infinity')
+ self._assertWrapsAs(float('nan'), 'NaN')
+
+ def testWrapsObjectValues(self):
+ self._assertWrapsAs({'x': _INFINITY}, {'x': 'Infinity'})
+
+ def testWrapsObjectKeys(self):
+ self._assertWrapsAs({_INFINITY: 'foo'}, {'Infinity': 'foo'})
+
+ def testWrapsInListsAndTuples(self):
+ self._assertWrapsAs([_INFINITY], ['Infinity'])
+ # map() returns a list even if the argument is a tuple.
+ self._assertWrapsAs((_INFINITY,), ['Infinity',])
+
+ def testWrapsRecursively(self):
+ self._assertWrapsAs({'x': [_INFINITY]}, {'x': ['Infinity']})
+
+
+if __name__ == '__main__':
+ googletest.main()
diff --git a/tensorflow/tensorboard/gulpfile.js b/tensorflow/tensorboard/gulpfile.js
new file mode 100644
index 0000000000..34b567d62a
--- /dev/null
+++ b/tensorflow/tensorboard/gulpfile.js
@@ -0,0 +1,170 @@
+// Based on the gulpfile provided by angular team
+// (https://github.com/angular/ts2dart/blob/master/gulpfile.js)
+var gulp = require('gulp');
+var ts = require('gulp-typescript');
+var typescript = require('typescript');
+var gutil = require('gulp-util');
+var mochaPhantomJS = require('gulp-mocha-phantomjs');
+var tslint = require('gulp-tslint');
+var server = require('gulp-server-livereload');
+var concat = require('gulp-concat');
+var merge = require('merge2');
+var gulpFilter = require('gulp-filter');
+var vulcanize = require('gulp-vulcanize');
+var rename = require('gulp-rename');
+var minimist = require('minimist');
+var replace = require('gulp-replace');
+
+var options = minimist(process.argv.slice(2), {
+ default: {
+ p: 8000 // port for gulp server
+ }
+});
+
+var tsProject = ts.createProject('tsconfig.json', {
+ typescript: typescript,
+ noExternalResolve: true, // opt-in for faster compilation!
+});
+
+var hasError;
+var failOnError = true; // Is set to false when watching.
+
+var onError = function(err) {
+ hasError = true;
+ gutil.log(err.message);
+ if (failOnError) {
+ process.exit(1);
+ }
+};
+
+gulp.task('compile.all', function() {
+ hasError = false;
+ var isComponent = gulpFilter(['components/**/*.js', '!components/**/test/*']);
+ var isApp = gulpFilter(['app/**/*.js']);
+ var isTest = gulpFilter(['test/**/*', 'components/**/test/*']);
+
+ var srcs = ['components/**/*.ts', 'test/**/*.ts', 'app/**/*.ts',
+ 'typings/**/*.d.ts', 'bower_components/**/*.d.ts'];
+
+ var tsResult = gulp.src(srcs, {base: '.'})
+ .pipe(ts(tsProject))
+ .on('error', onError);
+ return merge([
+ // Send concatenated component code to build/component
+ tsResult.js
+ .pipe(isComponent)
+ .pipe(concat('components.js'))
+ .pipe(gulp.dest('build')),
+
+ // Duplicate all component code to live next to the ts file
+ // (makes polymer imports very clean)
+ tsResult.js
+ .pipe(isComponent)
+ .pipe(gulp.dest('.')),
+
+ tsResult.js
+ .pipe(isApp)
+ .pipe(gulp.dest('.')),
+
+ // Send all test code to build/test.js
+ tsResult.js
+ .pipe(isTest)
+ .pipe(concat('test.js'))
+ .pipe(gulp.dest('build')),
+
+    // Create a unified definitions file at build/all.d.ts
+ tsResult.dts
+ .pipe(concat('all.d.ts'))
+ .pipe(gulp.dest('build')),
+ ]);
+});
+
+var tslintTask = function(strict) {
+ return function(done) {
+ if (hasError) {
+ done();
+ return;
+ }
+ return gulp.src(['components/**/*.ts', 'test/**/*.ts'])
+ .pipe(tslint())
+ .pipe(tslint.report('verbose', {
+ emitError: strict,
+ }));
+ };
+};
+
+// Since constructs like console.log are disabled by tslint
+// but very useful while developing, create a "permissive"
+// version of tslint that warns without erroring, for the
+// watch task.
+gulp.task('tslint-permissive', [], tslintTask(false));
+gulp.task('tslint-strict', [], tslintTask(true));
+
+
+gulp.task('run-tests', ['compile.all'], function(done) {
+ if (hasError) {
+ done();
+ return;
+ }
+ return gulp.src('tests.html')
+ .pipe(mochaPhantomJS({reporter: 'dot'}));
+});
+
+gulp.task('test', ['run-tests', 'tslint-strict']);
+gulp.task('watch', ['run-tests', 'tslint-permissive'], function() {
+ failOnError = false;
+ // Avoid watching generated .d.ts in the build (aka output) directory.
+ return gulp.watch(['test/**/*.ts', 'components/**/*.ts'],
+ {ignoreInitial: true},
+ ['run-tests', 'tslint-permissive']);
+});
+
+gulp.task('server', function() {
+ gulp.src('.')
+ .pipe(server({
+ host: '0.0.0.0',
+ port: options.p,
+ livereload: {
+ enable: true,
+ port: 27729 + options.p
+ },
+ directoryListing: true,
+ }));
+});
+
+
+var linkRegex = /<link rel="[^"]*" (type="[^"]*" )?href=".*bower_components[^"]*">\n/g;
+var scriptRegex = /<script src=".*bower_components[^"]*"><\/script>\n/g;
+gulp.task('vulcanize', ['compile.all', 'tslint-strict'], function() {
+ gulp.src('app/tf-tensorboard.html')
+ .pipe(vulcanize({
+ inlineScripts: true,
+ inlineCss: true,
+ stripComments: true,
+ excludes: ['/bower_components/'],
+ }))
+    // TODO(danmane): Remove this worrisome brittleness when vulcanize
+ // fixes https://github.com/Polymer/vulcanize/issues/273
+ .pipe(replace(linkRegex, ''))
+ .pipe(replace(scriptRegex, ''))
+ .pipe(gulp.dest('dist'));
+
+ gulp.src('app/index.html')
+ .pipe(vulcanize({
+ inlineScripts: true,
+ inlineCss: true,
+ stripComments: true,
+ }))
+ .pipe(gulp.dest('dist'));
+
+ gulp.src('app/tf-tensorboard-demo.html')
+ .pipe(vulcanize({
+ inlineScripts: true,
+ inlineCss: true,
+ stripComments: true,
+ }))
+ .pipe(gulp.dest('dist'));
+});
+
+gulp.task('serve', ['server']); // alias
+gulp.task('default', ['watch']);
diff --git a/tensorflow/tensorboard/http_api.md b/tensorflow/tensorboard/http_api.md
new file mode 100644
index 0000000000..953c020692
--- /dev/null
+++ b/tensorflow/tensorboard/http_api.md
@@ -0,0 +1,210 @@
+# TensorBoard client-server HTTP API
+
+## Runs, Tags, and Tag Types
+
+TensorBoard data is organized around the concept of a `run`, which represents
+all the related data thrown off by a single execution of TensorFlow, a `tag`,
+which groups values of data that come from the same source within a TensorFlow
+run, and `tag types`, which are our way of distinguishing different types of
+data that have fundamentally different representations and should be processed
+on different code paths. For example, a "train" run may have a `scalars`
+tag that represents the learning rate, another `scalars` tag that
+represents the value of the objective function, a `histograms` tag that reveals
+information on weights in a particular layer over time, and an `images` tag that
+shows input images flowing into the system. The "eval" run might have an
+entirely different set of tag names, or some duplicated tag names.
+
+The currently supported tag types are `scalars`, `images`, `histograms` and
+`graph`. Each tag type corresponds to a route (documented below) for
+retrieving tag data of that type.
+
+All of the data provided comes from TensorFlow events files ('\*.tfevents\*'),
+which are written using the SummaryWriter class
+(tensorflow/python/training/summary_writer.py), and the data is generated by
+summary ops (tensorflow/python/ops/summary_ops.py). The `scalars` come from
+the `ScalarSummary` op, the `histograms` from the `HistogramSummary`, and the
+`images` from `ImageSummary`. The tag type `graph` is special in that it is not
+a collection of tags of that type, but a boolean denoting if there is a graph
+definition associated with the run. The tag is provided to the summary
+op (usually as a constant).
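+
+For example (a sketch; `tf.scalar_summary` is the Python wrapper around the
+`ScalarSummary` op): a graph that evaluates `tf.scalar_summary('xent',
+cross_entropy)` and writes the resulting summary protos with a `SummaryWriter`
+yields a run with a `scalars` tag named "xent".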
+
+## `/runs`
+
+Returns a dictionary mapping from `run name` (quoted string) to dictionaries
+mapping from all available tagTypes to a list of tags of that type available for
+the run. Think of this as a comprehensive index of all of the data available
+from the TensorBoard server. Here is an example:
+
+{
+ "train_run": {
+ "histograms": ["foo_histogram", "bar_histogram"],
+ "scalars": ["xent", "loss", "learning_rate"],
+ "images": ["input"],
+ "graph": true
+ },
+ "eval": {
+ "histograms": ["foo_histogram", "bar_histogram"],
+ "scalars": ["precision", "recall"],
+ "images": ["input"],
+ "graph": false
+ }
+}
+
+Note that the same tag may be present for many runs. It is not guaranteed that
+they will have the same meaning across runs. It is also not guaranteed that they
+will have the same tag type across different runs.
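+
+A minimal client sketch (assuming a TensorBoard server on the default port 6006
+and the run and tag names from the example above; Python 2, to match the server
+code in this change):
+
+    import json
+    import urllib2
+
+    runs = json.load(urllib2.urlopen('http://localhost:6006/runs'))
+    print runs['train_run']['scalars']  # ['xent', 'loss', 'learning_rate']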
+
+## `/scalars?run=foo&tag=bar`
+
+Returns an array of event_accumulator.SimpleValueEvents ([wall_time, step,
+value]) for the given run and tag. wall_time is seconds since epoch.
+
+Example:
+[
+ [1443856985.705543, 1448, 0.7461960315704346], # wall_time, step, value
+ [1443857105.704628, 3438, 0.5427092909812927],
+ [1443857225.705133, 5417, 0.5457325577735901],
+ ...
+]
+
+If the format parameter is set to 'csv', the response will instead be in CSV
+format:
+
+    Wall time,Step,Value
+ 1443856985.705543,1448,0.7461960315704346
+ 1443857105.704628,3438,0.5427092909812927
+ 1443857225.705133,5417,0.5457325577735901
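+
+For example (using the run and tag names from the `/runs` sample above):
+`/scalars?run=train_run&tag=xent&format=csv`.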
+
+
+## `/histograms?run=foo&tag=bar`
+
+Returns an array of event_accumulator.HistogramEvents ([wall_time, step,
+HistogramValue]) for the given run and tag. A HistogramValue is [min, max, num,
+sum, sum_squares, bucket_limit, bucket]. wall_time is seconds since epoch.
+
+Annotated Example: (note - real data is higher precision)
+
+[
+ [
+ 1443871386.185149, # wall_time
+ 235166, # step
+ [
+ -0.66, # minimum value
+ 0.44, # maximum value
+ 8.0, # number of items in the histogram
+ -0.80, # sum of items in the histogram
+ 0.73, # sum of squares of items in the histogram
+ [-0.68, -0.62, -0.292, -0.26, -0.11, -0.10, -0.08, -0.07, -0.05,
+ -0.0525, -0.0434, -0.039, -0.029, -0.026, 0.42, 0.47, 1.8e+308],
+ # the right edge of each bucket
+ [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0,
+ 1.0, 0.0] # the number of elements within each bucket
+ ]
+ ]
+ ]
+
+## `/compressedHistograms?run=foo&tag=bar`
+
+Returns an array of event_accumulator.CompressedHistogramEvents ([wall_time,
+step, CompressedHistogramValues]) for the given run and tag.
+
+CompressedHistogramValues is a list of namedtuples with each tuple specifying
+a basis point (bps) as well as an interpolated value of the histogram value
+at that basis point. A basis point is 1/100 of a percent.
+
+The current compression strategy is to choose basis points that correspond to
+the median and bands of 1SD, 2SD, and 3SDs around the median. Note that the
+current compression strategy does not work well for representing multimodal
+data -- this is something that will be improved in a later iteration.
+
+Annotated Example: (note - real data is higher precision)
+
+[
+ [
+ 1441154832.580509, # wall_time
+ 5, # step
+ [ [0, -3.67], # CompressedHistogramValue for 0th percentile
+ [2500, -4.19], # CompressedHistogramValue for 25th percentile
+ [5000, 6.29],
+ [7500, 1.64],
+ [10000, 3.67]
+ ]
+ ],
+ ...
+]
+
+## `/images?run=foo&tag=bar`
+
+Gets a sample of ImageMetadatas for the given run and tag.
+
+Returns an array of objects containing information about available images,
+crucially including the query parameter that may be used to retrieve that image.
+(See /individualImage for details.)
+
+For example:
+ {
+ "width": 28, # width in pixels
+ "height": 28, # height in pixels
+ "wall_time": 1440210599.246, # time in seconds since epoch
+ "step": 63702821, # number of steps that have passed
+ "query": "index=0&tagname=input%2Fimage%2F2&run=train"
+ # param for /individualImage
+ }
+
+## `/individualImage?{{query}}`
+
+Retrieves an individual image. The image query should not be generated by the
+frontend, but instead acquired from calling the /images route (the image
+metadata objects contain the query to use). The response is the image itself,
+with a mime-type inferred from the image data (e.g. 'image/png').
+
+Note that the query is not guaranteed to always refer to the same image even
+within a single run, as images may be removed from the sampling reservoir and
+replaced with other images. (See Notes for details on the reservoir sampling.)
+
+An example call to this route would look like this:
+/individualImage?index=0&tagname=input%2Fimage%2F2&run=train
+
+## `/graph?run=foo`
+
+Returns the graph definition for the given run in gzipped pbtxt format. The
+graph is composed of a list of nodes, where each node is a specific TensorFlow
+operation which takes as inputs other nodes (operations).
+
+An example pbtxt response for a graph with 3 nodes:
+node {
+ op: "Input"
+ name: "A"
+}
+node {
+ op: "Input"
+ name: "B"
+}
+node {
+ op: "MatMul"
+ name: "C"
+ input: "A"
+ input: "B"
+}
+
+## Notes
+
+All returned values, histograms, and images are returned in the order they were
+written by TensorFlow (which should correspond to increasing `wall_time` order,
+but may not necessarily correspond to increasing step count if the process had
+to restart from a previous checkpoint).
+
+The returned values may be downsampled using reservoir sampling, which is
+configurable by the TensorBoard server. When downsampling occurs, the server
+guarantees that different tags will all sample at the same sequence of indices,
+so that if there are two tags `A` and `B` which are related so that `A[i] ~
+B[i]` for all `i`, then `D(A)[i] ~ D(B)[i]` for all `i`, where `D` represents
+the downsampling operation.
+
+The reservoir sampling puts an upper bound on the number of items that will be
+returned for a given run-tag combination, and guarantees that all items are
+equally likely to be in the final sample (i.e., it is a uniform distribution over
+the values), with the proviso that the most recent individual item is always
+included in the sample.
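+
+For example, with a scalars reservoir size of 10,000 (the default set in
+tensorboard.py in this change), a run that has written 50,000 scalar events for
+a tag is downsampled to at most 10,000 of them, and the most recently written
+event is always among those returned.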
+
+The reservoir sizes are configurable on a per-tag type basis.
diff --git a/tensorflow/tensorboard/lib/css/global.css b/tensorflow/tensorboard/lib/css/global.css
new file mode 100644
index 0000000000..b3681d766f
--- /dev/null
+++ b/tensorflow/tensorboard/lib/css/global.css
@@ -0,0 +1,6 @@
+html,body {
+ margin: 0;
+ padding: 0;
+ height: 100%;
+ font-family: "RobotoDraft","Roboto",sans-serif;
+}
diff --git a/tensorflow/tensorboard/lib/svg/summary-icon.svg b/tensorflow/tensorboard/lib/svg/summary-icon.svg
new file mode 100644
index 0000000000..f66c99580c
--- /dev/null
+++ b/tensorflow/tensorboard/lib/svg/summary-icon.svg
@@ -0,0 +1,3 @@
+<svg fill="#848484" height="24" viewBox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg">
+ <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
+</svg>
diff --git a/tensorflow/tensorboard/package.json b/tensorflow/tensorboard/package.json
new file mode 100644
index 0000000000..3c78a5831c
--- /dev/null
+++ b/tensorflow/tensorboard/package.json
@@ -0,0 +1,34 @@
+{
+ "name": "tensorflow-vis",
+ "version": "0.0.0",
+ "description": "Visualizers for TensorFlow",
+ "scripts": {
+ "test": "gulp test"
+ },
+ "keywords": [
+ "tensorflow"
+ ],
+ "author": "Google",
+ "license": "Apache-2.0",
+ "devDependencies": {
+ "gulp": "~3.9.0",
+ "gulp-typescript": "~2.8.0",
+ "tsd": "~0.6.3",
+ "chai": "~3.2.0",
+ "typescript": "~1.5.3",
+ "gulp-cli": "~0.3.0",
+ "gulp-util": "~3.0.6",
+ "gulp-mocha-phantomjs": "~0.8.0",
+ "mocha": "~2.2.5",
+ "gulp-tslint": "~3.1.1-beta",
+ "gulp-server-livereload": "~1.4.0",
+ "gulp-concat": "~2.6.0",
+ "merge2": "~0.3.6",
+ "gulp-filter": "~3.0.0",
+ "gulp-rename": "~1.2.2",
+ "vulcanize": "~1.14.0",
+ "gulp-vulcanize": "~6.0.1",
+ "minimist": "~1.2.0",
+ "gulp-replace": "~0.5.4"
+ }
+}
diff --git a/tensorflow/tensorboard/tensorboard.py b/tensorflow/tensorboard/tensorboard.py
new file mode 100644
index 0000000000..dcbc50401c
--- /dev/null
+++ b/tensorflow/tensorboard/tensorboard.py
@@ -0,0 +1,139 @@
+"""Serve TensorFlow summary data to a web frontend.
+
+This is a simple web server to proxy data from the event_loader to the web, and
+serve static web files.
+"""
+
+import BaseHTTPServer
+import functools
+import os
+import socket
+import SocketServer
+
+import tensorflow.python.platform
+
+from tensorflow.python.platform import app
+from tensorflow.python.platform import flags
+from tensorflow.python.platform import logging
+from tensorflow.python.platform import status_bar
+from tensorflow.python.summary import event_accumulator
+from tensorflow.python.summary import event_multiplexer
+from tensorflow.tensorboard import tensorboard_handler
+
+flags.DEFINE_string('logdir', None, """
+logdir specifies where TensorBoard will look to find TensorFlow event files
+that it can display. In the simplest case, logdir is a directory containing
+tfevents files. TensorBoard also supports comparing multiple TensorFlow
+executions: to do this, you can use a directory whose subdirectories contain
+tfevents files, as in the following example:
+
+foo/bar/logdir/
+foo/bar/logdir/mnist_1/events.out.tfevents.1444088766
+foo/bar/logdir/mnist_2/events.out.tfevents.1444090064
+
+You may also pass a comma-separated list of log directories, and you can
+assign names to individual log directories by putting a colon between the name
+and the path, as in
+
+tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2
+""")
+flags.DEFINE_boolean('debug', False, 'Whether to run the app in debug mode. '
+ 'This increases log verbosity to DEBUG.')
+flags.DEFINE_string('host', '0.0.0.0', 'What host to listen to. Defaults to '
+ 'allowing remote access, set to 127.0.0.1 to serve only on '
+ 'localhost.')
+flags.DEFINE_integer('port', 6006, 'What port to serve TensorBoard on.')
+
+FLAGS = flags.FLAGS
+
+# How many elements to store per tag, by tag type
+TENSORBOARD_SIZE_GUIDANCE = {
+ event_accumulator.COMPRESSED_HISTOGRAMS: 500,
+ event_accumulator.IMAGES: 4,
+ event_accumulator.SCALARS: 10000,
+ event_accumulator.HISTOGRAMS: 1,
+}
+
+
+def ParseEventFilesFlag(flag_value):
+ """Parses the logdir flag into a map from paths to run group names.
+
+  The flag value is a comma-separated list of path specifications.
+ A path specification either looks like 'group_name:/path/to/directory' or
+ '/path/to/directory'; in the latter case, the group is unnamed. Group names
+ cannot start with a forward slash: /foo:bar/baz will be interpreted as a
+ spec with no name and path '/foo:bar/baz'.
+
+ Globs are not supported.
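+
+  For example, 'mnist:/tmp/mnist' parses to {'/tmp/mnist': 'mnist'}, and a bare
+  '/tmp/cifar' parses to {'/tmp/cifar': None}.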
+
+ Args:
+ flag_value: A comma-separated list of run specifications.
+ Returns:
+ A dict mapping directory paths to names like {'/path/to/directory': 'name'}.
+ Groups without an explicit name are named after their path. If flag_value
+ is None, returns an empty dict, which is helpful for testing things that
+ don't require any valid runs.
+ """
+ files = {}
+ if flag_value is None:
+ return files
+ for specification in flag_value.split(','):
+ # If the spec looks like /foo:bar/baz, then we assume it's a path with a
+ # colon.
+ if ':' in specification and specification[0] != '/':
+ # We split at most once so run_name:/path:with/a/colon will work.
+ run_name, path = specification.split(':', 1)
+ else:
+ run_name = None
+ path = specification
+ files[path] = run_name
+ return files
+
+
+class ThreadedHTTPServer(SocketServer.ThreadingMixIn,
+ BaseHTTPServer.HTTPServer):
+ """A threaded HTTP server."""
+  daemon_threads = True
+
+
+def main(unused_argv=None):
+ # Change current working directory to tensorflow/'s parent directory.
+ server_root = os.path.join(os.path.dirname(__file__),
+ os.pardir, os.pardir)
+ os.chdir(server_root)
+
+ if FLAGS.debug:
+ logging.set_verbosity(logging.DEBUG)
+
+ if not FLAGS.logdir:
+ logging.error('A logdir must be specified. Run `tensorboard --help` for '
+ 'details and examples.')
+ return -1
+
+ if FLAGS.debug:
+ logging.info('Starting TensorBoard in directory %s' % os.getcwd())
+
+ path_to_run = ParseEventFilesFlag(FLAGS.logdir)
+ multiplexer = event_multiplexer.AutoloadingMultiplexer(
+ path_to_run=path_to_run, interval_secs=60,
+ size_guidance=TENSORBOARD_SIZE_GUIDANCE)
+
+ multiplexer.AutoUpdate(interval=30)
+
+ factory = functools.partial(tensorboard_handler.TensorboardHandler,
+ multiplexer)
+ try:
+ server = ThreadedHTTPServer((FLAGS.host, FLAGS.port), factory)
+ except socket.error:
+ logging.error('Tried to connect to port %d, but that address is in use.' %
+ FLAGS.port)
+ return -2
+
+ status_bar.SetupStatusBarInsideGoogle('TensorBoard', FLAGS.port)
+ print 'Starting TensorBoard on port %d' % FLAGS.port
+ print '(You can navigate to http://localhost:%d)' % FLAGS.port
+ server.serve_forever()
+
+
+if __name__ == '__main__':
+ app.run()
diff --git a/tensorflow/tensorboard/tensorboard_handler.py b/tensorflow/tensorboard/tensorboard_handler.py
new file mode 100644
index 0000000000..cd50f43069
--- /dev/null
+++ b/tensorflow/tensorboard/tensorboard_handler.py
@@ -0,0 +1,379 @@
+"""TensorBoard server handler logic.
+
+TensorboardHandler contains all the logic for serving static files off of disk
+and for handling the API calls to endpoints like /tags that require information
+about loaded events.
+"""
+
+import BaseHTTPServer
+import csv
+import gzip
+import imghdr
+import json
+import mimetypes
+import os
+import StringIO
+import urllib
+import urlparse
+
+from google.protobuf import text_format
+import tensorflow.python.platform
+
+from tensorflow.python.platform import logging
+from tensorflow.python.platform import resource_loader
+from tensorflow.python.summary import event_accumulator
+from tensorflow.tensorboard import float_wrapper
+
+RUNS_ROUTE = '/runs'
+SCALARS_ROUTE = '/' + event_accumulator.SCALARS
+IMAGES_ROUTE = '/' + event_accumulator.IMAGES
+HISTOGRAMS_ROUTE = '/' + event_accumulator.HISTOGRAMS
+COMPRESSED_HISTOGRAMS_ROUTE = '/' + event_accumulator.COMPRESSED_HISTOGRAMS
+INDIVIDUAL_IMAGE_ROUTE = '/individualImage'
+GRAPH_ROUTE = '/' + event_accumulator.GRAPH
+
+_IMGHDR_TO_MIMETYPE = {
+ 'bmp': 'image/bmp',
+ 'gif': 'image/gif',
+ 'jpeg': 'image/jpeg',
+ 'png': 'image/png'
+}
+_DEFAULT_IMAGE_MIMETYPE = 'application/octet-stream'
+
+
+def _content_type_for_image(encoded_image_string):
+ image_type = imghdr.what(None, encoded_image_string)
+ return _IMGHDR_TO_MIMETYPE.get(image_type, _DEFAULT_IMAGE_MIMETYPE)
+
+
+class _OutputFormat(object):
+ """An enum used to list the valid output formats for API calls.
+
+ Not all API calls support all formats (for example, only scalars and
+ compressed histograms support CSV).
+ """
+ JSON = 'json'
+ CSV = 'csv'
+
+
+class TensorboardHandler(BaseHTTPServer.BaseHTTPRequestHandler):
+ """Handler class for use with BaseHTTPServer.HTTPServer.
+
+ This is essentially a thin wrapper around calls to an EventMultiplexer object
+ as well as serving files off disk.
+ """
+
+ def __init__(self, multiplexer, *args):
+ self._multiplexer = multiplexer
+ BaseHTTPServer.BaseHTTPRequestHandler.__init__(self, *args)
+
+ # We use underscore_names for consistency with inherited methods.
+
+ def _image_response_for_run(self, run_images, run, tag):
+ """Builds a JSON-serializable object with information about run_images.
+
+ Args:
+ run_images: A list of event_accumulator.ImageValueEvent objects.
+ run: The name of the run.
+ tag: The name of the tag the images all belong to.
+
+ Returns:
+ A list of dictionaries containing the wall time, step, URL, width, and
+ height for each image.
+ """
+ response = []
+ for index, run_image in enumerate(run_images):
+ response.append({
+ 'wall_time': run_image.wall_time,
+ 'step': run_image.step,
+ # We include the size so that the frontend can add that to the <img>
+ # tag so that the page layout doesn't change when the image loads.
+ 'width': run_image.width,
+ 'height': run_image.height,
+ 'query': self._query_for_individual_image(run, tag, index)
+ })
+ return response
+
+ def _path_is_safe(self, path):
+ """Check path is safe (stays within current directory).
+
+ This is for preventing directory-traversal attacks.
+
+ Args:
+ path: The path to check for safety.
+
+ Returns:
+ True if the given path stays within the current directory, and false
+ if it would escape to a higher directory. E.g. _path_is_safe('index.html')
+ returns true, but _path_is_safe('../../../etc/password') returns false.
+ """
+ base = os.path.abspath(os.curdir)
+ absolute_path = os.path.abspath(path)
+ prefix = os.path.commonprefix([base, absolute_path])
+ return prefix == base
+
+ def _send_gzip_response(self, content, content_type, code=200):
+ """Writes the given content as gzip response using the given content type.
+
+ Args:
+ content: The content to respond with.
+ content_type: The mime type of the content.
+ code: The numeric HTTP status code to use.
+ """
+ out = StringIO.StringIO()
+ f = gzip.GzipFile(fileobj=out, mode='w')
+ f.write(content)
+ f.close()
+ gzip_content = out.getvalue()
+ self.send_response(code)
+ self.send_header('Content-Type', content_type)
+ self.send_header('Content-Length', len(gzip_content))
+ self.send_header('Content-Encoding', 'gzip')
+ self.end_headers()
+ self.wfile.write(gzip_content)
+
+ def _send_json_response(self, obj, code=200):
+ """Writes out the given object as JSON using the given HTTP status code.
+
+ This also replaces special float values with stringified versions.
+
+ Args:
+ obj: The object to respond with.
+ code: The numeric HTTP status code to use.
+ """
+
+ output = json.dumps(float_wrapper.WrapSpecialFloats(obj))
+
+ self.send_response(code)
+ self.send_header('Content-Type', 'application/json')
+ self.send_header('Content-Length', len(output))
+ self.end_headers()
+ self.wfile.write(output)
+
+ def _send_csv_response(self, serialized_csv, code=200):
+ """Writes out the given string, which represents CSV data.
+
+ Unlike _send_json_response, this does *not* perform the CSV serialization
+ for you. It only sets the proper headers.
+
+ Args:
+ serialized_csv: A string containing some CSV data.
+ code: The numeric HTTP status code to use.
+ """
+
+ self.send_response(code)
+ self.send_header('Content-Type', 'text/csv')
+ self.send_header('Content-Length', len(serialized_csv))
+ self.end_headers()
+ self.wfile.write(serialized_csv)
+
+ def _serve_scalars(self, query_params):
+ """Given a tag and single run, return array of ScalarEvents."""
+ # TODO(cassandrax): return HTTP status code for malformed requests
+ tag = query_params.get('tag')
+ run = query_params.get('run')
+ values = self._multiplexer.Scalars(run, tag)
+
+ if query_params.get('format') == _OutputFormat.CSV:
+ string_io = StringIO.StringIO()
+ writer = csv.writer(string_io)
+ writer.writerow(['Wall time', 'Step', 'Value'])
+ writer.writerows(values)
+ self._send_csv_response(string_io.getvalue())
+ else:
+ self._send_json_response(values)
+
+ def _serve_graph(self, query_params):
+    """Given a single run, return the graph definition in gzipped pbtxt format."""
+ run = query_params.get('run', None)
+ if run is None:
+ self.send_error(400, 'query parameter "run" is required')
+ return
+
+ try:
+ graph = self._multiplexer.Graph(run)
+ except ValueError:
+ self.send_response(404)
+ return
+
+ # Serialize the graph to pbtxt format.
+ graph_pbtxt = text_format.MessageToString(graph)
+ # Gzip it and send it to the user.
+ self._send_gzip_response(graph_pbtxt, 'text/plain')
+
+ def _serve_histograms(self, query_params):
+ """Given a tag and single run, return an array of histogram values."""
+ tag = query_params.get('tag')
+ run = query_params.get('run')
+ values = self._multiplexer.Histograms(run, tag)
+ self._send_json_response(values)
+
+ def _serve_compressed_histograms(self, query_params):
+ """Given a tag and single run, return an array of compressed histograms."""
+ tag = query_params.get('tag')
+ run = query_params.get('run')
+ compressed_histograms = self._multiplexer.CompressedHistograms(run, tag)
+ if query_params.get('format') == _OutputFormat.CSV:
+ string_io = StringIO.StringIO()
+ writer = csv.writer(string_io)
+
+ # Build the headers; we have two columns for timing and two columns for
+ # each compressed histogram bucket.
+ headers = ['Wall time', 'Step']
+ if compressed_histograms:
+ bucket_count = len(compressed_histograms[0].compressed_histogram_values)
+ for i in xrange(bucket_count):
+ headers += ['Edge %d basis points' % i, 'Edge %d value' % i]
+ writer.writerow(headers)
+
+ for compressed_histogram in compressed_histograms:
+ row = [compressed_histogram.wall_time, compressed_histogram.step]
+ for value in compressed_histogram.compressed_histogram_values:
+ row += [value.rank_in_bps, value.value]
+ writer.writerow(row)
+ self._send_csv_response(string_io.getvalue())
+ else:
+ self._send_json_response(compressed_histograms)
+
+ def _serve_images(self, query_params):
+ """Given a tag and list of runs, serve a list of images.
+
+ Note that the images themselves are not sent; instead, we respond with URLs
+ to the images. The frontend should treat these URLs as opaque and should not
+ try to parse information about them or generate them itself, as the format
+ may change.
+
+ Args:
+ query_params: The query parameters as a dict.
+ """
+ tag = query_params.get('tag')
+ run = query_params.get('run')
+
+ images = self._multiplexer.Images(run, tag)
+ response = self._image_response_for_run(images, run, tag)
+ self._send_json_response(response)
+
+ def _serve_image(self, query_params):
+ """Serves an individual image."""
+ tag = query_params.get('tag')
+ run = query_params.get('run')
+ index = int(query_params.get('index'))
+ image = self._multiplexer.Images(run, tag)[index]
+ encoded_image_string = image.encoded_image_string
+ content_type = _content_type_for_image(encoded_image_string)
+
+ self.send_response(200)
+ self.send_header('Content-Type', content_type)
+ self.send_header('Content-Length', len(encoded_image_string))
+ self.end_headers()
+ self.wfile.write(encoded_image_string)
+
+ def _query_for_individual_image(self, run, tag, index):
+ """Builds a URL for accessing the specified image.
+
+ This should be kept in sync with _serve_image. Note that the URL is *not*
+ guaranteed to always return the same image, since images may be unloaded
+ from the reservoir as new images come in.
+
+ Args:
+ run: The name of the run.
+ tag: The tag.
+ index: The index of the image. Negative values are OK.
+
+ Returns:
+ A string representation of a URL that will load the index-th
+ sampled image in the given run with the given tag.
+ """
+ query_string = urllib.urlencode({
+ 'run': run,
+ 'tag': tag,
+ 'index': index
+ })
+ return query_string
+
+ def _serve_runs(self, unused_query_params):
+ """Return a JSON object about runs and tags.
+
+ Returns a mapping from runs to tagType to list of tags for that run.
+
+ Returns:
+ {runName: {images: [tag1, tag2, tag3],
+ scalars: [tagA, tagB, tagC],
+ histograms: [tagX, tagY, tagZ]}}
+ """
+ self._send_json_response(self._multiplexer.Runs())
+
+ def _serve_index(self, unused_query_params):
+ """Serves the index page (i.e., the tensorboard app itself)."""
+ self._serve_static_file('/dist/index.html')
+
+ def _serve_static_file(self, path):
+ """Serves the static file located at the given path.
+
+ Args:
+ path: The path of the static file, relative to the tensorboard/ directory.
+ """
+ # Strip off the leading forward slash.
+ path = path.lstrip('/')
+ if not self._path_is_safe(path):
+ logging.info('path %s not safe, sending 404' % path)
+ # Traversal attack, so 404.
+ self.send_error(404)
+ return
+
+ if path.startswith('external'):
+ path = os.path.join('../', path)
+ else:
+ path = os.path.join('tensorboard', path)
+ # Open the file and read it.
+ try:
+ contents = resource_loader.load_resource(path)
+ except IOError:
+ logging.info('path %s not found, sending 404' % path)
+ self.send_error(404)
+ return
+
+ self.send_response(200)
+
+ mimetype = mimetypes.guess_type(path)[0] or 'application/octet-stream'
+ self.send_header('Content-Type', mimetype)
+ self.end_headers()
+ self.wfile.write(contents)
+
+ def do_GET(self): # pylint: disable=invalid-name
+ """Handler for all get requests."""
+ parsed_url = urlparse.urlparse(self.path)
+
+ # Remove a trailing slash, if present.
+ clean_path = parsed_url.path
+ if clean_path.endswith('/'):
+ clean_path = clean_path[:-1]
+
+ handlers = {
+ SCALARS_ROUTE: self._serve_scalars,
+ GRAPH_ROUTE: self._serve_graph,
+ HISTOGRAMS_ROUTE: self._serve_histograms,
+ COMPRESSED_HISTOGRAMS_ROUTE: self._serve_compressed_histograms,
+ IMAGES_ROUTE: self._serve_images,
+ INDIVIDUAL_IMAGE_ROUTE: self._serve_image,
+ RUNS_ROUTE: self._serve_runs,
+ '': self._serve_index
+ }
+
+ if clean_path in handlers:
+ query_params = urlparse.parse_qs(parsed_url.query)
+      # parse_qs returns a list of values for each key; we require each key to
+      # have exactly one value and then unwrap it.
+ for key in query_params:
+ value_count = len(query_params[key])
+ if value_count != 1:
+ self.send_error(
+ 400,
+ 'query parameter %s should have exactly one value, had %d' %
+ (key, value_count))
+ return
+
+ query_params[key] = query_params[key][0]
+ handlers[clean_path](query_params)
+ else:
+ self._serve_static_file(clean_path)
diff --git a/tensorflow/tensorboard/tests.html b/tensorflow/tensorboard/tests.html
new file mode 100644
index 0000000000..31773f705c
--- /dev/null
+++ b/tensorflow/tensorboard/tests.html
@@ -0,0 +1,31 @@
+<!DOCTYPE html>
+<html>
+ <head>
+ <title>Mocha</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <link rel="stylesheet" href="node_modules/mocha/mocha.css" />
+ </head>
+ <body>
+ <div id="mocha"></div>
+ <script src="node_modules/chai/chai.js"></script>
+ <script src="node_modules/mocha/mocha.js"></script>
+ <script>mocha.setup('bdd')</script>
+ <script>Polymer = function() {}
+ // Hack: we can't get Polymer to run in PhantomJS, so mock it out.
+ </script>
+ <script src="bower_components/d3/d3.js"></script>
+ <script src="bower_components/svg-typewriter/svgtypewriter.js"></script>
+ <script src="bower_components/plottable/plottable.js"></script>
+ <script src="build/components.js"></script>
+ <script src="build/test.js"></script>
+ <script>
+ if (window.mochaPhantomJS) {
+ mochaPhantomJS.run();
+ } else {
+ mocha.run();
+ }
+ </script>
+ </body>
+</html>
+
diff --git a/tensorflow/tensorboard/tfgraph-demo-index.html b/tensorflow/tensorboard/tfgraph-demo-index.html
new file mode 100644
index 0000000000..84b7aaa94c
--- /dev/null
+++ b/tensorflow/tensorboard/tfgraph-demo-index.html
@@ -0,0 +1,38 @@
+<!doctype html>
+<!--
+@license
+Copyright (c) 2015 The Polymer Project Authors. All rights reserved.
+This code may only be used under the BSD style license found at http://polymer.github.io/LICENSE.txt
+The complete set of authors may be found at http://polymer.github.io/AUTHORS.txt
+The complete set of contributors may be found at http://polymer.github.io/CONTRIBUTORS.txt
+Code distributed by Google as part of the polymer project is also
+subject to an additional IP rights grant found at http://polymer.github.io/PATENTS.txt
+-->
+<html>
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, minimum-scale=1.0, initial-scale=1.0, user-scalable=yes">
+ <title>tf-graph Demo</title>
+ <!-- Libraries that should be imported in TensorBoard when the Graph visualizer ports to TensorBoard -->
+ <script src="bower_components/webcomponentsjs/webcomponents-lite.min.js"></script>
+ <script src="bower_components/es6-promise/promise.min.js"></script>
+ <link rel="import" href="components/tf-graph/demo/tf-graph-demo.html">
+ <style>
+ html {
+ width: 100%;
+ height: 100%;
+ }
+
+ body {
+ margin: 0;
+ padding: 0;
+ width: 100%;
+ height: 100%;
+ }
+ </style>
+ </head>
+
+ <body unresolved>
+ <tf-graph-demo></tf-graph-demo>
+ </body>
+</html>
diff --git a/tensorflow/tensorboard/tsconfig.json b/tensorflow/tensorboard/tsconfig.json
new file mode 100644
index 0000000000..ea9d69237c
--- /dev/null
+++ b/tensorflow/tensorboard/tsconfig.json
@@ -0,0 +1,8 @@
+{
+ "compilerOptions": {
+ "noImplicitAny": false,
+ "noEmitOnError": true,
+ "compileOnSave": false,
+ "target": "ES5"
+ }
+}
diff --git a/tensorflow/tensorboard/tsd.json b/tensorflow/tensorboard/tsd.json
new file mode 100644
index 0000000000..de25bb111c
--- /dev/null
+++ b/tensorflow/tensorboard/tsd.json
@@ -0,0 +1,30 @@
+{
+ "version": "v4",
+ "repo": "borisyankov/DefinitelyTyped",
+ "ref": "master",
+ "path": "typings",
+ "bundle": "typings/tsd.d.ts",
+ "installed": {
+ "chai/chai-assert.d.ts": {
+ "commit": "16dd1ab76fb4c65e532ca820dd45c875636521b6"
+ },
+ "d3/d3.d.ts": {
+ "commit": "eb59a40d3c2f3257e34ec2ede181046230814a41"
+ },
+ "mocha/mocha.d.ts": {
+ "commit": "484544d14d400190b20f270341c97b16adc0f1ef"
+ },
+ "webcomponents.js/webcomponents.js.d.ts": {
+ "commit": "e9be3cecf8a326d3e220c52b42d218169a7bb9f2"
+ },
+ "lodash/lodash.d.ts": {
+ "commit": "eb835aa72f45eee4246b1de8beeffe2010c689e6"
+ },
+ "polymer/polymer.d.ts": {
+ "commit": "e9be3cecf8a326d3e220c52b42d218169a7bb9f2"
+ },
+ "es6-promise/es6-promise.d.ts": {
+ "commit": "e9be3cecf8a326d3e220c52b42d218169a7bb9f2"
+ }
+ }
+}
diff --git a/tensorflow/tensorboard/tslint.json b/tensorflow/tensorboard/tslint.json
new file mode 100644
index 0000000000..bd1efdbd6f
--- /dev/null
+++ b/tensorflow/tensorboard/tslint.json
@@ -0,0 +1,66 @@
+{
+ "rules": {
+ "class-name": true,
+ "comment-format": [true, "check-space"],
+ "curly": true,
+ "eofline": true,
+ "forin": true,
+ "jsdoc-format": true,
+ "label-position": true,
+ "label-undefined": true,
+ "max-line-length": [true, 140],
+ "member-ordering": [false, "variables-before-functions"],
+ "no-arg": true,
+ "no-bitwise": true,
+ "no-consecutive-blank-lines": true,
+ "no-console": [true,
+ "log",
+ "debug",
+ "info",
+ "time",
+ "timeEnd",
+ "trace",
+ "warn"
+ ],
+ "no-construct": true,
+ "no-constructor-vars": true,
+ "no-debugger": true,
+ "no-duplicate-key": true,
+ "no-duplicate-variable": true,
+ "no-empty": true,
+ "no-eval": true,
+ "no-trailing-whitespace": true,
+ "no-unreachable": true,
+ "no-unused-expression": true,
+ "no-unused-variable": false,
+ "no-use-before-declare": true,
+ "one-line": [true,
+ "check-catch",
+ "check-else",
+ "check-open-brace",
+ "check-whitespace"
+ ],
+ "quotemark": [true,
+ "double"
+ ],
+ "radix": true,
+ "semicolon": true,
+ "triple-equals": [true,
+ "allow-null-check"
+ ],
+ "typedef-whitespace": [true, {
+ "call-signature": "nospace",
+ "index-signature": "nospace",
+ "parameter": "nospace",
+ "property-declaration": "nospace",
+ "variable-declaration": "nospace"
+ }],
+ "whitespace": [true,
+ "check-branch",
+ "check-decl",
+ "check-operator",
+ "check-separator",
+ "check-type"
+ ]
+ }
+}
diff --git a/tensorflow/tensorflow.bzl b/tensorflow/tensorflow.bzl
new file mode 100644
index 0000000000..4bcfd6234c
--- /dev/null
+++ b/tensorflow/tensorflow.bzl
@@ -0,0 +1,340 @@
+# -*- Python -*-
+
+# Return the options to use for a C++ library or binary build.
+# Uses the ":optmode" config_setting to pick the options.
+
+load("/tensorflow/core/platform/default/build_config_root",
+ "tf_cuda_tests_tags")
+
+# List of proto files for android builds
+def tf_android_core_proto_sources():
+ return [
+ "//tensorflow/core:framework/allocation_description.proto",
+ "//tensorflow/core:framework/attr_value.proto",
+ "//tensorflow/core:framework/config.proto",
+ "//tensorflow/core:framework/device_attributes.proto",
+ "//tensorflow/core:framework/function.proto",
+ "//tensorflow/core:framework/graph.proto",
+ "//tensorflow/core:framework/kernel_def.proto",
+ "//tensorflow/core:framework/op_def.proto",
+ "//tensorflow/core:framework/step_stats.proto",
+ "//tensorflow/core:framework/summary.proto",
+ "//tensorflow/core:framework/tensor.proto",
+ "//tensorflow/core:framework/tensor_description.proto",
+ "//tensorflow/core:framework/tensor_shape.proto",
+ "//tensorflow/core:framework/tensor_slice.proto",
+ "//tensorflow/core:framework/types.proto",
+ "//tensorflow/core:lib/core/error_codes.proto",
+ "//tensorflow/core:util/saved_tensor_slice.proto"
+ ]
+
+
+def if_cuda(a, b=[]):
+ return select({
+ "//third_party/gpus/cuda:cuda_crosstool_condition": a,
+ "//conditions:default": b,
+ })
+
+
+def tf_copts():
+ return ["-pthread", "-fno-exceptions",] + if_cuda(["-DGOOGLE_CUDA=1"])
+
+
+# Given a list of "op_lib_names" (a list of files in the ops directory
+# without their .cc extensions), generate a library for that file.
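+#
+# For example (op names here are purely illustrative):
+#
+# tf_gen_op_libs(op_lib_names = [ "array_ops", "math_ops" ])
+#
+# defines cc_library targets "array_ops_op_lib" and "math_ops_op_lib", each
+# compiled from the matching ops/<name>.cc file.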
+def tf_gen_op_libs(op_lib_names):
+ # Make library out of each op so it can also be used to generate wrappers
+ # for various languages.
+ for n in op_lib_names:
+ native.cc_library(name=n + "_op_lib",
+ copts=tf_copts(),
+ srcs=["ops/" + n + ".cc"],
+ deps=(["//tensorflow/core:framework"]),
+ visibility=["//visibility:public"],
+ alwayslink=1,
+ linkstatic=1,)
+
+
+def tf_gen_op_wrapper_cc(name, out_ops_file, pkg=""):
+ # Construct an op generator binary for these ops.
+ tool = out_ops_file + "_gen_cc"
+ native.cc_binary(
+ name = tool,
+ copts = tf_copts(),
+ linkopts = ["-lm"],
+ linkstatic = 1, # Faster to link this one-time-use binary dynamically
+ deps = (["//tensorflow/cc:cc_op_gen_main",
+ pkg + ":" + name + "_op_lib"])
+ )
+
+ # Run the op generator.
+ if name == "sendrecv_ops":
+ include_internal = "1"
+ else:
+ include_internal = "0"
+ native.genrule(
+ name=name + "_genrule",
+ outs=[out_ops_file + ".h", out_ops_file + ".cc"],
+ tools=[":" + tool],
+ cmd=("$(location :" + tool + ") $(location :" + out_ops_file + ".h) " +
+ "$(location :" + out_ops_file + ".cc) " + include_internal))
+
+
+# Given a list of "op_lib_names" (a list of files in the ops directory
+# without their .cc extensions), generate individual C++ .cc and .h
+# files for each of the ops files mentioned, and then generate a
+# single cc_library called "name" that combines all the
+# generated C++ code.
+#
+# For example, for:
+# tf_gen_op_wrappers_cc("tf_ops_lib", [ "array_ops", "math_ops" ])
+#
+# This will ultimately generate ops/* files and a library like:
+#
+# cc_library(name = "tf_ops_lib",
+# srcs = [ "ops/array_ops.cc",
+# "ops/math_ops.cc" ],
+# hdrs = [ "ops/array_ops.h",
+# "ops/math_ops.h" ],
+# deps = [ ... ])
+def tf_gen_op_wrappers_cc(name,
+ op_lib_names=[],
+ other_srcs=[],
+ other_hdrs=[],
+ pkg=""):
+ subsrcs = other_srcs
+ subhdrs = other_hdrs
+ for n in op_lib_names:
+ tf_gen_op_wrapper_cc(n, "ops/" + n, pkg=pkg)
+ subsrcs += ["ops/" + n + ".cc"]
+ subhdrs += ["ops/" + n + ".h"]
+
+ native.cc_library(name=name,
+ srcs=subsrcs,
+ hdrs=subhdrs,
+ deps=["//tensorflow/core:core_cpu"],
+ copts=tf_copts(),
+ alwayslink=1,)
+
+
+# Invoke this rule in .../tensorflow/python to build the wrapper library.
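+#
+# A rough usage sketch (the op library name is illustrative):
+#
+# tf_gen_op_wrapper_py("array_ops")
+#
+# builds a generator binary from //tensorflow/core:array_ops_op_lib, writes
+# the generated wrappers to ops/gen_array_ops.py, and exposes them as a
+# py_library named "array_ops".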
+def tf_gen_op_wrapper_py(name, out=None, hidden=[], visibility=None, deps=[],
+ require_shape_functions=False):
+ # Construct a cc_binary containing the specified ops.
+ tool_name = "gen_" + name + "_py_wrappers_cc"
+ if not deps:
+ deps = ["//tensorflow/core:" + name + "_op_lib"]
+ native.cc_binary(
+ name = tool_name,
+ linkopts = ["-lm"],
+ copts = tf_copts(),
+ linkstatic = 1, # Faster to link this one-time-use binary dynamically
+ deps = (["//tensorflow/core:framework",
+ "//tensorflow/python:python_op_gen_main"] + deps),
+ visibility = ["//tensorflow:internal"],
+ )
+
+ # Invoke the previous cc_binary to generate a python file.
+ if not out:
+ out = "ops/gen_" + name + ".py"
+
+ native.genrule(
+ name=name + "_pygenrule",
+ outs=[out],
+ tools=[tool_name],
+ cmd=("$(location " + tool_name + ") " + ",".join(hidden)
+ + " " + ("1" if require_shape_functions else "0") + " > $@"))
+
+ # Make a py_library out of the generated python file.
+ native.py_library(name=name,
+ srcs=[out],
+ visibility=visibility,
+ deps=[
+ "//tensorflow/core:protos_all_py",
+ "//tensorflow/python:framework",
+ ],)
+
+
+# Define a bazel macro that creates cc_test for tensorflow.
+# TODO(opensource): we need to enable this to work around the hidden symbol
+# __cudaRegisterFatBinary error. Needs more investigation.
+def tf_cc_test(name, deps, linkstatic=0, tags=[]):
+ name = name.replace(".cc", "")
+ native.cc_test(name="%s" % (name.replace("/", "_")),
+ srcs=["%s.cc" % (name)],
+ copts=tf_copts(),
+ deps=deps,
+ linkopts=["-lpthread", "-lm"],
+ linkstatic=linkstatic,
+ tags=tags,)
+
+
+# Create a cc_test for each of the tensorflow tests listed in "tests"
+def tf_cc_tests(tests, deps, linkstatic=0, tags=[]):
+ for t in tests:
+ tf_cc_test(t, deps, linkstatic, tags=tags)
+
+# Build defs for TensorFlow kernels
+
+
+# When this target is built using --config=cuda, a cc_library is built
+# that passes -DGOOGLE_CUDA=1 and '-x cuda', linking in additional
+# libraries needed by GPU kernels.
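+#
+# Example (target and file names are illustrative):
+#
+# tf_gpu_kernel_library(name = "relu_op_gpu",
+#                       srcs = ["relu_op_gpu.cu.cc"],
+#                       deps = ["//tensorflow/core:framework"])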
+def tf_gpu_kernel_library(srcs, copts=[], cuda_copts=[], deps=[], hdrs=[],
+ **kwargs):
+ # We have to disable variadic templates in Eigen for NVCC even though
+ # std=c++11 is enabled.
+ cuda_copts = ["-x", "cuda", "-DGOOGLE_CUDA=1",
+ "-nvcc_options=relaxed-constexpr"] + cuda_copts
+ native.cc_library(
+ srcs = srcs,
+ hdrs = hdrs,
+ copts = copts + if_cuda(cuda_copts),
+ deps = deps + if_cuda([
+ "//tensorflow/core:stream_executor",
+ ]) + ["//tensorflow/core/platform/default/build_config:cuda_runtime_extra"],
+ alwayslink=1,
+ **kwargs)
+
+
+def tf_cuda_library(deps=None, cuda_deps=None, copts=None, **kwargs):
+ """Generate a cc_library with a conditional set of CUDA dependencies.
+
+ When the library is built with --config=cuda:
+
+ - both deps and cuda_deps are used as dependencies
+ - the gcudacc runtime is added as a dependency (if necessary)
+ - The library additionally passes -DGOOGLE_CUDA=1 to the list of copts
+
+ Args:
+ - cuda_deps: BUILD dependencies which will be linked if and only if:
+ '--config=cuda' is passed to the bazel command line.
+ - deps: dependencies which will always be linked.
+ - copts: copts always passed to the cc_library.
+ - kwargs: Any other argument to cc_library.
+ """
+ if not deps:
+ deps = []
+ if not cuda_deps:
+ cuda_deps = []
+ if not copts:
+ copts = []
+
+ native.cc_library(
+ deps = deps + if_cuda(cuda_deps) +
+ ["//tensorflow/core/platform/default/build_config:cuda_runtime_extra"],
+ copts = copts + if_cuda(["-DGOOGLE_CUDA=1"]),
+ **kwargs)
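+
+# Example use of tf_cuda_library (names are illustrative; ":gpu_only_kernels"
+# stands in for whatever CUDA-only dependency a real target would have):
+#
+# tf_cuda_library(name = "my_ops",
+#                 srcs = ["my_ops.cc"],
+#                 deps = ["//tensorflow/core:framework"],
+#                 cuda_deps = [":gpu_only_kernels"])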
+
+
+# Bazel rules for building swig files.
+def _py_wrap_cc_impl(ctx):
+ srcs = ctx.files.srcs
+ if len(srcs) != 1:
+ fail("Exactly one SWIG source file label must be specified.", "srcs")
+ module_name = ctx.attr.module_name
+ cc_out = ctx.outputs.cc_out
+ py_out = ctx.outputs.py_out
+ src = ctx.files.srcs[0]
+ args = ["-c++", "-python"]
+ args += ["-module", module_name]
+ args += ["-l" + f.path for f in ctx.files.swig_includes]
+ cc_include_dirs = set()
+ cc_includes = set()
+ for dep in ctx.attr.deps:
+ cc_include_dirs += [h.dirname for h in dep.cc.transitive_headers]
+ cc_includes += dep.cc.transitive_headers
+ args += ["-I" + x for x in cc_include_dirs]
+ args += ["-o", cc_out.path]
+ args += ["-outdir", py_out.dirname]
+ args += [src.path]
+ outputs = [cc_out, py_out]
+ ctx.action(executable=ctx.executable.swig_binary,
+ arguments=args,
+ mnemonic="PythonSwig",
+ inputs=list(set([src]) + cc_includes + ctx.files.swig_includes +
+ ctx.attr.swig_deps.files),
+ outputs=outputs,
+ progress_message="SWIGing {input}".format(input=src.path))
+ return struct(files=set(outputs))
+
+
+_py_wrap_cc = rule(attrs={
+ "srcs": attr.label_list(mandatory=True,
+ allow_files=True,),
+ "swig_includes": attr.label_list(cfg=DATA_CFG,
+ allow_files=True,),
+ "deps": attr.label_list(allow_files=True,
+ providers=["cc"],),
+ "swig_deps": attr.label(default=Label(
+ "//tensorflow:swig")), # swig_templates
+ "module_name": attr.string(mandatory=True),
+ "py_module_name": attr.string(mandatory=True),
+ "swig_binary": attr.label(default=Label("//tensorflow:swig"),
+ cfg=HOST_CFG,
+ executable=True,
+ allow_files=True,),
+},
+ outputs={
+ "cc_out": "%{module_name}.cc",
+ "py_out": "%{py_module_name}.py",
+ },
+ implementation=_py_wrap_cc_impl,)
+
+
+def tf_extension_linkopts():
+ return [] # No extension link opts
+
+def tf_py_wrap_cc(name, srcs, swig_includes=[], deps=[], copts=[], **kwargs):
+ module_name = name.split("/")[-1]
+ # Convert a rule name such as foo/bar/baz to foo/bar/_baz.so
+ # and use that as the name for the rule producing the .so file.
+ cc_library_name = "/".join(name.split("/")[:-1] + ["_" + module_name + ".so"])
+ _py_wrap_cc(name=name + "_py_wrap",
+ srcs=srcs,
+ swig_includes=swig_includes,
+ deps=deps,
+ module_name=module_name,
+ py_module_name=name)
+ native.cc_binary(
+ name=cc_library_name,
+ srcs=[module_name + ".cc"],
+ copts=copts + ["-Wno-self-assign", "-Wno-write-strings", "-I/usr/include/python2.7"],
+ linkopts=tf_extension_linkopts(),
+ linkstatic=1,
+ linkshared=1,
+ deps=deps)
+ native.py_library(name=name,
+ srcs=[":" + name + ".py"],
+ data=[":" + cc_library_name])
+
+
+def py_tests(name,
+ srcs,
+ additional_deps=[],
+ data=[],
+ tags=[],
+ shard_count=1,
+ prefix=""):
+ for src in srcs:
+ test_name = src.split("/")[-1].split(".")[0]
+ if prefix:
+ test_name = "%s_%s" % (prefix, test_name)
+ native.py_test(name=test_name,
+ srcs=[src],
+ main=src,
+ tags=tags,
+ visibility=["//tensorflow:internal"],
+ shard_count=shard_count,
+ data=data,
+ deps=[
+ "//tensorflow/python:extra_py_tests_deps",
+ "//tensorflow/python:kernel_tests/gradient_checker",
+ ] + additional_deps)
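+
+# Example (file names are illustrative):
+#
+# py_tests(name = "kernel_tests",
+#          srcs = ["kernel_tests/foo_test.py", "kernel_tests/bar_test.py"])
+#
+# creates one py_test per source file, named after the file ("foo_test" and
+# "bar_test"), each run with the standard extra test dependencies.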
+
+
+def cuda_py_tests(name, srcs, additional_deps=[], data=[], shard_count=1):
+ test_tags = tf_cuda_tests_tags()
+ py_tests(name, srcs, additional_deps, data, test_tags, shard_count)
diff --git a/tensorflow/tools/__init__.py b/tensorflow/tools/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/tools/__init__.py
diff --git a/tensorflow/tools/docker/BUILD b/tensorflow/tools/docker/BUILD
new file mode 100644
index 0000000000..2cc540ed3b
--- /dev/null
+++ b/tensorflow/tools/docker/BUILD
@@ -0,0 +1,26 @@
+# Description:
+# Various tools and rules related to the TensorFlow docker container.
+
+package(default_visibility = ["//visibility:private"])
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+py_binary(
+ name = "simple_console",
+ srcs = ["simple_console.py"],
+ deps = ["//tensorflow:tensorflow_py"],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
diff --git a/tensorflow/tools/docker/Dockerfile b/tensorflow/tools/docker/Dockerfile
new file mode 100644
index 0000000000..02d8837858
--- /dev/null
+++ b/tensorflow/tools/docker/Dockerfile
@@ -0,0 +1,100 @@
+FROM ipython/notebook:latest
+
+MAINTAINER Craig Citro <craigcitro@google.com>
+
+# Set up Bazel.
+# Install dependencies for bazel.
+RUN apt-get update && apt-get install -y \
+ pkg-config \
+ zip \
+ g++ \
+ zlib1g-dev \
+ unzip \
+ swig \
+ software-properties-common \
+ wget
+
+# We need to add a custom PPA to pick up JDK8, since trusty doesn't
+# have an openjdk8 backport. openjdk-r is maintained by a reliable contributor:
+# Matthias Klose (https://launchpad.net/~doko). It will do until
+# we either update the base image beyond 14.04 or openjdk-8 is
+# finally backported to trusty; see e.g.
+# https://bugs.launchpad.net/trusty-backports/+bug/1368094
+RUN add-apt-repository -y ppa:openjdk-r/ppa && \
+ apt-get update && \
+ apt-get install -y openjdk-8-jdk openjdk-8-jre-headless
+
+# Set up CUDA variables and symlinks
+COPY cuda /usr/local/cuda
+ENV CUDA_PATH /usr/local/cuda
+ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
+RUN ln -s libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so
+
+# Running bazel inside a `docker build` command causes trouble, cf:
+# https://github.com/bazelbuild/bazel/issues/134
+# The easiest solution is to set up a bazelrc file forcing --batch.
+RUN echo "startup --batch" >>/root/.bazelrc
+# Similarly, we need to workaround sandboxing issues:
+# https://github.com/bazelbuild/bazel/issues/418
+RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
+ >>/root/.bazelrc
+ENV BAZELRC /root/.bazelrc
+# Install the most recent bazel release.
+ENV BAZEL_VERSION 0.1.1
+WORKDIR /
+RUN mkdir /bazel && \
+ cd /bazel && \
+ wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
+ wget -O /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE.txt && \
+ chmod +x bazel-*.sh && \
+ ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
+ cd / && \
+ rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
+
+# Download and build TensorFlow.
+WORKDIR /tensorflow
+# Pick up some TF dependencies
+RUN apt-get update && \
+ apt-get install -y python-numpy && \
+ apt-get install -y libfreetype6-dev
+
+# We can't clone the TF git repo yet, because of permissions issues.
+# RUN git clone https://tensorflow.googlesource.com/
+# Instead, we manually copy it in:
+COPY tensorflow /tensorflow
+
+# Set up the CUDA tensorflow directories
+RUN rm -rf /tensorflow/third_party/gpus/cuda/lib64
+RUN rm -rf /tensorflow/third_party/gpus/cuda/bin
+RUN rm -rf /tensorflow/third_party/gpus/cuda/include
+RUN rm -rf /tensorflow/third_party/gpus/cuda/nvvm
+RUN ln -s /usr/local/cuda/lib64 /tensorflow/third_party/gpus/cuda/
+RUN ln -s /usr/local/cuda/bin /tensorflow/third_party/gpus/cuda/
+RUN ln -s /usr/local/cuda/include /tensorflow/third_party/gpus/cuda/
+RUN ln -s /usr/local/cuda/nvvm /tensorflow/third_party/gpus/cuda/
+
+# Now we build
+RUN bazel clean && \
+ bazel build -c opt --config=cuda tensorflow/tools/docker:simple_console
+
+ENV PYTHONPATH=/tensorflow/bazel-bin/tensorflow/tools/docker/simple_console.runfiles/:$PYTHONPATH
+
+# Add any notebooks in this directory.
+COPY notebooks /notebooks
+
+# Add variables for the local IPython. This sets a fixed password and
+# switches to HTTP (to avoid self-signed certificate warnings in
+# Chrome).
+ENV PASSWORD=JustForNow
+ENV USE_HTTP=1
+
+RUN if [ -f /notebooks/requirements.txt ];\
+ then pip install -r /notebooks/requirements.txt;\
+ fi
+
+# Set the workdir so we see notebooks on the IPython landing page.
+WORKDIR /notebooks
+
+# Remove CUDA libraries, headers, nvcc. The user will have to
+# provide this directly when running docker.
+RUN rm -rf /usr/local/cuda
diff --git a/tensorflow/tools/docker/Dockerfile.cpu b/tensorflow/tools/docker/Dockerfile.cpu
new file mode 100644
index 0000000000..c93a6e8bd2
--- /dev/null
+++ b/tensorflow/tools/docker/Dockerfile.cpu
@@ -0,0 +1,68 @@
+FROM b.gcr.io/tensorflow-testing/tensorflow
+
+MAINTAINER Craig Citro <craigcitro@google.com>
+
+# Set up Bazel.
+# Install dependencies for bazel.
+RUN apt-get update && apt-get install -y \
+ g++ \
+ pkg-config \
+ python-dev \
+ python-numpy \
+ python-pip \
+ software-properties-common \
+ swig \
+ unzip \
+ wget \
+ zip \
+ zlib1g-dev \
+ && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
+
+# We need to add a custom PPA to pick up JDK8, since trusty doesn't
+# have an openjdk8 backport. openjdk-r is maintained by a reliable contributor:
+# Matthias Klose (https://launchpad.net/~doko). It will do until
+# we either update the base image beyond 14.04 or openjdk-8 is
+# finally backported to trusty; see e.g.
+# https://bugs.launchpad.net/trusty-backports/+bug/1368094
+RUN add-apt-repository -y ppa:openjdk-r/ppa && \
+ apt-get update && \
+ apt-get install -y openjdk-8-jdk openjdk-8-jre-headless && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
+
+# Running bazel inside a `docker build` command causes trouble, cf:
+# https://github.com/bazelbuild/bazel/issues/134
+# The easiest solution is to set up a bazelrc file forcing --batch.
+RUN echo "startup --batch" >>/root/.bazelrc
+# Similarly, we need to workaround sandboxing issues:
+# https://github.com/bazelbuild/bazel/issues/418
+RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
+ >>/root/.bazelrc
+ENV BAZELRC /root/.bazelrc
+# Install the most recent bazel release.
+ENV BAZEL_VERSION 0.1.1
+WORKDIR /
+RUN mkdir /bazel && \
+ cd /bazel && \
+ wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
+ wget -O /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE.txt && \
+ chmod +x bazel-*.sh && \
+ ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
+ cd / && \
+ rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
+
+# Download and build TensorFlow.
+WORKDIR /tensorflow
+
+# We can't clone the TF git repo yet, because of permissions issues.
+# RUN git clone https://tensorflow.googlesource.com/
+# Instead, we manually copy it in:
+COPY tensorflow /tensorflow
+
+# Now we build
+RUN bazel clean && \
+ bazel build -c opt tensorflow/tools/docker:simple_console
+
+ENV PYTHONPATH=/tensorflow/bazel-bin/tensorflow/tools/docker/simple_console.runfiles/:$PYTHONPATH
diff --git a/tensorflow/tools/docker/Dockerfile.lite b/tensorflow/tools/docker/Dockerfile.lite
new file mode 100644
index 0000000000..8ba5f2d778
--- /dev/null
+++ b/tensorflow/tools/docker/Dockerfile.lite
@@ -0,0 +1,56 @@
+FROM ubuntu:14.04
+
+MAINTAINER Craig Citro <craigcitro@google.com>
+
+# Pick up some TF dependencies
+RUN apt-get update && apt-get install -y \
+ curl \
+ libfreetype6-dev \
+ libpng12-dev \
+ libzmq3-dev \
+ pkg-config \
+ python-numpy \
+ python-pip \
+ python-scipy \
+ && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
+
+RUN pip install \
+ jupyter \
+ matplotlib
+
+# Install TensorFlow CPU version.
+RUN pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
+
+RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
+ python get-pip.py && \
+ rm get-pip.py && \
+ pip --no-cache-dir install requests[security]
+
+RUN pip --no-cache-dir install ipykernel && \
+ python -m ipykernel.kernelspec
+
+# Add any notebooks in this directory.
+COPY notebooks/*.ipynb /notebooks/
+
+# Set up our notebook config.
+COPY jupyter_notebook_config.py /root/.jupyter/
+
+# Jupyter has issues with being run directly:
+# https://github.com/ipython/ipython/issues/7062
+# We just add a little wrapper script.
+COPY run_jupyter.sh /
+
+# Set the workdir so we see notebooks on the IPython landing page.
+WORKDIR /notebooks
+
+# These are temporary while we sort out the GPU dependency.
+ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
+
+# TensorBoard
+EXPOSE 6006
+# IPython
+EXPOSE 8888
+
+CMD ["/run_jupyter.sh"]
diff --git a/tensorflow/tools/docker/LICENSE b/tensorflow/tools/docker/LICENSE
new file mode 100644
index 0000000000..28711d7885
--- /dev/null
+++ b/tensorflow/tools/docker/LICENSE
@@ -0,0 +1,13 @@
+Copyright 2015 The TensorFlow Authors. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/tensorflow/tools/docker/README.md b/tensorflow/tools/docker/README.md
new file mode 100644
index 0000000000..1d64b7ea1b
--- /dev/null
+++ b/tensorflow/tools/docker/README.md
@@ -0,0 +1,63 @@
+# Using TensorFlow via Docker
+
+This directory contains `Dockerfile`s to make it easy to get up and running with
+TensorFlow via [Docker](http://www.docker.com/).
+
+## Installing Docker
+
+General installation instructions are
+[on the Docker site](https://docs.docker.com/installation/), but we give some
+quick links here:
+
+* [OSX](https://docs.docker.com/installation/mac/): [docker toolbox](https://www.docker.com/toolbox)
+* [Ubuntu](https://docs.docker.com/installation/ubuntulinux/)
+
+## Running the container
+
+Before you build your container, you can add any notebooks you need to the
+`notebooks/` subdirectory of your working directory, and list any Python
+libraries they require in `notebooks/requirements.txt` so they are installed
+with `pip`; see the example below.
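+
+For example, a minimal `notebooks/requirements.txt` might list just the extra
+packages your notebooks import (package names here are only illustrative):
+
+    matplotlib
+    pandas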
+
+To build a container image from this `Dockerfile`, just run
+
+ $ docker build -t $USER/tensorflow_docker .
+
+This will create a new image from the description, and print out an
+identifying hash. You can then run a container from this image locally:
+
+ $ docker run -p 8888:8888 -it $USER/tensorflow_docker
+
+This will start the container (inside a VM locally), and expose the running
+IPython endpoint locally on port 8888. (The `-it` flags keep stdin connected to
+a tty in the container, which is helpful when you want to stop the server;
+`docker help run` explains all the possibilities.)
+
+**NOTE**: If you want to be able to add data to your IPython Notebook while it's
+running, you can do this in a subdirectory of the `/notebooks` volume as follows:
+
+    $ docker run -p 8888:8888 -it -v ./notebooks/data:/notebooks/data \
+        $USER/tensorflow_docker
+
+**Caveat**: Note that `docker build` uses the first positional argument as the
+*context* for the build; in particular, it starts by collecting all files in
+that directory and shipping them to the docker daemon to build the image itself.
+This means you shouldn't use the `-f` flag to use this Dockerfile from a
+different directory, or you'll end up copying around more files than you'd like.
+So:
+
+ # ok
+ $ docker build . # inside tools/docker
+ $ docker build path/to/tools/docker # further up the tree
+ # bad
+ $ docker build -f tools/docker/Dockerfile . # will pick up all files in .
+
+## Experimenting in the container
+
+When the container starts up, it launches an IPython notebook server, populated
+with several "Getting Started with TensorFlow" notebooks.
+
+# TODO
+
+* Decide how much of this is handled by the native
+ [docker support in bazel](http://bazel.io/blog/2015/07/28/docker_build.html).
diff --git a/tensorflow/tools/docker/__init__.py b/tensorflow/tools/docker/__init__.py
new file mode 100755
index 0000000000..e69de29bb2
--- /dev/null
+++ b/tensorflow/tools/docker/__init__.py
diff --git a/tensorflow/tools/docker/jupyter_notebook_config.py b/tensorflow/tools/docker/jupyter_notebook_config.py
new file mode 100644
index 0000000000..8031c5f269
--- /dev/null
+++ b/tensorflow/tools/docker/jupyter_notebook_config.py
@@ -0,0 +1,4 @@
+c.NotebookApp.ip = '*'
+c.NotebookApp.port = 8888
+c.NotebookApp.open_browser = False
+c.MultiKernelManager.default_kernel_name = 'python2'
diff --git a/tensorflow/tools/docker/notebooks/1_hello_tensorflow.ipynb b/tensorflow/tools/docker/notebooks/1_hello_tensorflow.ipynb
new file mode 100644
index 0000000000..201711d333
--- /dev/null
+++ b/tensorflow/tools/docker/notebooks/1_hello_tensorflow.ipynb
@@ -0,0 +1,742 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "a3bskVXPvchm"
+ },
+ "source": [
+ "# Hello, TensorFlow\n",
+ "## A beginner-level, getting started, basic introduction to TensorFlow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Rb5rSpcZvYbX"
+ },
+ "source": [
+ "TensorFlow is a general-purpose system for graph-based computation. A typical use is machine learning. In this notebook, we'll introduce the basic concepts of TensorFlow using some simple examples.\n",
+ "\n",
+ "TensorFlow gets its name from [tensors](https://en.wikipedia.org/wiki/Tensor), which are arrays of arbitrary dimensionality. A vector is a 1-d array and is known as a 1st-order tensor. A matrix is a 2-d array and a 2nd-order tensor. The \"flow\" part of the name refers to computation flowing through a graph. Training and inference in a neural network, for example, involves the propagation of matrix computations through many nodes in a computational graph.\n",
+ "\n",
+ "When you think of doing things in TensorFlow, you might want to think of creating tensors (like matrices), adding operations (that output other tensors), and then executing the computation (running the computational graph). In particular, it's important to realize that when you add an operation on tensors, it doesn't execute immediately. Rather, TensorFlow waits for you to define all the operations you want to perform. Then, TensorFlow optimizes the computation graph, deciding how to execute the computation, before generating the data. Because of this, a tensor in TensorFlow isn't so much holding the data as a placeholder for holding the data, waiting for the data to arrive when a computation is executed."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "E8FhiMivhcYB"
+ },
+ "source": [
+ "## Adding two vectors in TensorFlow\n",
+ "\n",
+ "Let's start with something that should be simple. Let's add two length four vectors (two 1st-order tensors):\n",
+ "\n",
+ "$\\begin{bmatrix} 1. & 1. & 1. & 1.\\end{bmatrix} + \\begin{bmatrix} 2. & 2. & 2. & 2.\\end{bmatrix} = \\begin{bmatrix} 3. & 3. & 3. & 3.\\end{bmatrix}$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 131,
+ "status": "ok",
+ "timestamp": 1446243605678,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "2iv3XQ6k3eF1",
+ "outputId": "e21e1144-736a-4b1f-df78-a9ceab9d4c61"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[ 3. 3. 3. 3.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import tensorflow as tf\n",
+ "\n",
+ "with tf.Session():\n",
+ " input1 = tf.constant([1.0, 1.0, 1.0, 1.0])\n",
+ " input2 = tf.constant([2.0, 2.0, 2.0, 2.0])\n",
+ " output = tf.add(input1, input2)\n",
+ " result = output.eval()\n",
+ " print result"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "dqLV5GXT3wLy"
+ },
+ "source": [
+ "What we're doing is creating two vectors, [1.0, 1.0, 1.0, 1.0] and [2.0, 2.0, 2.0, 2.0], and then adding them. Here's equivalent code in raw Python and using numpy:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 152,
+ "status": "ok",
+ "timestamp": 1446242020458,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "7DzDJ7sW79ao",
+ "outputId": "cf89e613-06e5-4435-bea3-9f48a4eff943"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[3.0, 3.0, 3.0, 3.0]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print [x + y for x, y in zip([1.0] * 4, [2.0] * 4)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 97,
+ "status": "ok",
+ "timestamp": 1446242021921,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "MDWJf0lHAF4E",
+ "outputId": "66d8c4a2-92b7-4048-b365-39dc42dff2bc"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[ 1. 1. 1. 1.] + [ 2. 2. 2. 2.] = [ 3. 3. 3. 3.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import numpy as np\n",
+ "x, y = np.full(4, 1.0), np.full(4, 2.0)\n",
+ "print \"{} + {} = {}\".format(x, y, x + y)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "I52jQOyO8vAn"
+ },
+ "source": [
+ "## Details of adding two vectors in TensorFlow\n",
+ "\n",
+ "The example above of adding two vectors involves a lot more than it seems, so let's look at it in more depth.\n",
+ "\n",
+ ">`import tensorflow as tf`\n",
+ "\n",
+ "This import brings TensorFlow's public API into our IPython runtime environment.\n",
+ "\n",
+ ">`with tf.Session():`\n",
+ "\n",
+ "When you run an operation in TensorFlow, you need to do it in the context of a `Session`. A session holds the computation graph, which contains the tensors and the operations. When you create tensors and operations, they are not executed immediately, but wait for other operations and tensors to be added to the graph, only executing when finally requested to produce the results of the session. Deferring the execution like this provides additional opportunities for parallelism and optimization, as TensorFlow can decide how to combine operations and where to run them after TensorFlow knows about all the operations. \n",
+ "\n",
+ ">>`input1 = tf.constant([1.0, 1.0, 1.0, 1.0])`\n",
+ "\n",
+ ">>`input2 = tf.constant([2.0, 2.0, 2.0, 2.0])`\n",
+ "\n",
+ "The next two lines create tensors using a convenience function called `constant`, which is similar to numpy's `array` and numpy's `full`. If you look at the code for `constant`, you can see the details of what it is doing to create the tensor. In summary, it creates a tensor of the necessary shape and applies the constant operator to it to fill it with the provided values. The values to `constant` can be Python or numpy arrays. `constant` can take an optional shape paramter, which works similarly to numpy's `fill` if provided, and an optional name parameter, which can be used to put a more human-readable label on the operation in the TensorFlow operation graph.\n",
+ "\n",
+ ">>`output = tf.add(input1, input2)`\n",
+ "\n",
+ "You might think `add` just adds the two vectors now, but it doesn't quite do that. What it does is put the `add` operation into the computational graph. The results of the addition aren't available yet. They've been put in the computation graph, but the computation graph hasn't been executed yet.\n",
+ "\n",
+ ">>`result = output.eval()`\n",
+ "\n",
+ ">>`print result`\n",
+ "\n",
+ "`eval()` is also slightly more complicated than it looks. Yes, it does get the value of the vector (tensor) that results from the addition. It returns this as a numpy array, which can then be printed. But, it's important to realize it also runs the computation graph at this point, because we demanded the output from the operation node of the graph; to produce that, it had to run the computation graph. So, this is the point where the addition is actually performed, not when `add` was called, as `add` just put the addition operation into the TensorFlow computation graph."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "H_5_2YY3ySr2"
+ },
+ "source": [
+ "## Multiple operations\n",
+ "\n",
+ "To use TensorFlow, you add operations on tensors that produce tensors to the computation graph, then execute that graph to run all those operations and calculate the values of all the tensors in the graph.\n",
+ "\n",
+ "Here's a simple example with two operations:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 101,
+ "status": "ok",
+ "timestamp": 1446242580297,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "-kQmn3U_yXX8",
+ "outputId": "e96a6e27-665e-47d3-822e-47aeb66fc7f8"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[ 6. 6. 6. 6.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import tensorflow as tf\n",
+ "\n",
+ "with tf.Session():\n",
+ " input1 = tf.constant(1.0, shape=[4])\n",
+ " input2 = tf.constant(2.0, shape=[4])\n",
+ " input3 = tf.constant(3.0, shape=[4])\n",
+ " output = tf.add(tf.add(input1, input2), input3)\n",
+ " result = output.eval()\n",
+ " print result"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Hod0zvsly8YT"
+ },
+ "source": [
+ "This version uses `constant` in a way similar to numpy's `fill`, specifying the optional shape and having the values copied out across it.\n",
+ "\n",
+ "The `add` operator supports operator overloading, so you could try writing it inline as `input1 + input2` instead as well as experimenting with other operators."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 156,
+ "status": "ok",
+ "timestamp": 1446242664353,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "yS2WElRfxz53",
+ "outputId": "9818bf3c-5659-4a87-8b5d-40a28f1a2677"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[ 3. 3. 3. 3.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "with tf.Session():\n",
+ " input1 = tf.constant(1.0, shape=[4])\n",
+ " input2 = tf.constant(2.0, shape=[4])\n",
+ " output = input1 + input2\n",
+ " print output.eval()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "zszjoYUjkUNU"
+ },
+ "source": [
+ "## Adding two matrices"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "EWNYBCB6kbri"
+ },
+ "source": [
+ "Next, let's do something very similar, adding two matrices:\n",
+ "\n",
+ "$\\begin{bmatrix}\n",
+ " 1. & 1. & 1. \\\\\n",
+ " 1. & 1. & 1. \\\\\n",
+ "\\end{bmatrix} + \n",
+ "\\begin{bmatrix}\n",
+ " 1. & 2. & 3. \\\\\n",
+ " 4. & 5. & 6. \\\\\n",
+ "\\end{bmatrix} = \n",
+ "\\begin{bmatrix}\n",
+ " 2. & 3. & 4. \\\\\n",
+ " 5. & 6. & 7. \\\\\n",
+ "\\end{bmatrix}$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 1540,
+ "status": "ok",
+ "timestamp": 1446242690334,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "tmWcCxSilYkg",
+ "outputId": "f3a2e904-790b-42e1-9ca4-2f3c54d7f4a8"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[ 2. 3. 4.]\n",
+ " [ 5. 6. 7.]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "with tf.Session():\n",
+ " input1 = tf.constant(1.0, shape=[2, 3])\n",
+ " input2 = tf.constant(np.reshape(np.arange(1.0, 7.0, dtype=np.float32), (2, 3)))\n",
+ " output = tf.add(input1, input2)\n",
+ " print output.eval()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "JuU3Bmglq1vd"
+ },
+ "source": [
+ "Recall that you can pass numpy or Python arrays into `constant`.\n",
+ "\n",
+ "In this example, the matrix with values from 1 to 6 is created in numpy and passed into `constant`, but TensorFlow also has `range`, `reshape`, and `tofloat` operators. Doing this entirely within TensorFlow could be more efficient if this was a very large matrix.\n",
+ "\n",
+ "Try experimenting with this code a bit -- maybe modifying some of the values, using the numpy version, doing this using, adding another operation, or doing this using TensorFlow's `range` function."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "gnXnpnuLrflb"
+ },
+ "source": [
+ "## Multiplying matrices"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Ho-QNSOorj0y"
+ },
+ "source": [
+ "Let's move on to matrix multiplication. This time, let's use a bit vector and some random values, which is a good step toward some of what we'll need to do for regression and neural networks."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 132,
+ "status": "ok",
+ "timestamp": 1446242872027,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "uNqMaFR8sIY5",
+ "outputId": "fc0e29a0-306c-4709-c181-1108d5a21d88"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Input:\n",
+ "[[ 1. 0. 0. 1.]]\n",
+ "Weights:\n",
+ "[[-0.8187139 -0.81037313]\n",
+ " [-0.31439888 -2.36761999]\n",
+ " [-1.3127892 -0.33629459]\n",
+ " [-1.23475349 -1.19031894]]\n",
+ "Output:\n",
+ "[[-2.05346727 -2.00069213]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "with tf.Session():\n",
+ " input_features = tf.constant(np.reshape([1, 0, 0, 1], (1, 4)).astype(np.float32))\n",
+ " weights = tf.constant(np.random.randn(4, 2).astype(np.float32))\n",
+ " output = tf.matmul(input_features, weights)\n",
+ " print \"Input:\"\n",
+ " print input_features.eval()\n",
+ " print \"Weights:\"\n",
+ " print weights.eval()\n",
+ " print \"Output:\"\n",
+ " print output.eval()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "JDAVTPhb22AP"
+ },
+ "source": [
+ "Above, we're taking a 1 x 4 vector [1 0 0 1] and multiplying it by a 4 by 2 matrix full of random values from a normal distribution (mean 0, stdev 1). The output is a 1 x 2 matrix.\n",
+ "\n",
+ "You might try modifying this example. Running the cell multiple times will generate new random weights and a new output. Or, change the input, e.g., to \\[0 0 0 1]), and run the cell again. Or, try initializing the weights using the TensorFlow op, e.g., `random_normal`, instead of using numpy to generate the random weights.\n",
+ "\n",
+ "What we have here is the basics of a simple neural network already. If we are reading in the input features, along with some expected output, and change the weights based on the error with the output each time, that's a neural network."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "XhnBjAUILuy8"
+ },
+ "source": [
+ "## Use of variables\n",
+ "\n",
+ "Let's look at adding two small matrices in a loop, not by creating new tensors every time, but by updating the existing values and then re-running the computation graph on the new data. This happens a lot with machine learning models, where we change some parameters each time such as gradient descent on some weights and then perform the same computations over and over again."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 180,
+ "status": "ok",
+ "timestamp": 1446244201894,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "7391995727249e65",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 420
+ },
+ "id": "vJ_AgZ8lLtRv",
+ "outputId": "8d3aadaa-2b34-4642-889b-e3daaf5ee693"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[-0.41494703 0.47648168]] [[-0.41494703 0.47648168]]\n",
+ "[[ 0.35746408 0.99504066]] [[-0.05748296 1.47152233]]\n",
+ "[[-0.46462393 -0.80201006]] [[-0.52210689 0.66951227]]\n",
+ "[[-0.99513483 -0.42322445]] [[-1.51724172 0.24628782]]\n",
+ "[[ 0.13371086 -0.85545826]] [[-1.38353086 -0.60917044]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "\n",
+ "with tf.Session() as sess:\n",
+ " # Set up two variables, total and weights, that we'll change repeatedly.\n",
+ " total = tf.Variable(tf.zeros([1, 2]))\n",
+ " weights = tf.Variable(tf.random_uniform([1,2]))\n",
+ " \n",
+ " # Initialize the variables we defined above.\n",
+ " tf.initialize_all_variables().run()\n",
+ " \n",
+ " # This only adds the operators to the graph right now. The assignment\n",
+ " # and addition operations are not performed yet.\n",
+ " update_weights = tf.assign(weights, tf.random_uniform([1, 2], -1.0, 1.0))\n",
+ " update_total = tf.assign(total, tf.add(total, weights))\n",
+ " \n",
+ " for _ in range(5):\n",
+ " # Actually run the operation graph, so randomly generate weights and then\n",
+ " # add them into the total. Order does matter here. We need to update\n",
+ " # the weights before updating the total.\n",
+ " sess.run(update_weights)\n",
+ " sess.run(update_total)\n",
+ " \n",
+ " print weights.eval(), total.eval()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "kSYJr89aM_n0"
+ },
+ "source": [
+ "This is more complicated. At a high level, we create two variables and add operations over them, then, in a loop, repeatedly execute those operations. Let's walk through it step by step.\n",
+ "\n",
+ "Starting off, the code creates two variables, `total` and `weights`. `total` is initialized to \\[0, 0\\] and `weights` is initialized to random values between -1 and 1.\n",
+ "\n",
+ "Next, two assignment operators are added to the graph, one that updates weights with random values from [-1, 1], the other that updates the total with the new weights. Again, the operators are not executed here. In fact, this isn't even inside the loop. We won't execute these operations until the `eval` call inside the loop.\n",
+ "\n",
+ "Finally, in the for loop, we run each of the operators. In each iteration of the loop, this executes the operators we added earlier, first putting random values into the weights, then updating the totals with the new weights. This call uses `eval` on the session; the code also could have called `eval` on the operators (e.g. `update_weights.eval`).\n",
+ "\n",
+ "It can be a little hard to wrap your head around exactly what computation is done when. The important thing to remember is that computation is only performed on demand.\n",
+ "\n",
+ "Variables can be useful in cases where you have a large amount of computation and data that you want to use over and over again with just a minor change to the input each time. That happens quite a bit with neural networks, for example, where you just want to update the weights each time you go through the batches of input data, then run the same operations over again."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "fL3WfAbKzqr5"
+ },
+ "source": [
+ "## What's next?\n",
+ "\n",
+ "This has been a gentle introduction to TensorFlow, focused on what TensorFlow is and the very basics of doing anything in TensorFlow. If you'd like more, the next tutorial in the series is Getting Started with TensorFlow, also available in the [notebooks directory](http://127.0.0.1:8888/tree)."
+ ]
+ }
+ ],
+ "metadata": {
+ "colabVersion": "0.3.2",
+ "colab_default_view": {},
+ "colab_views": {},
+ "kernelspec": {
+ "display_name": "Python 2",
+ "language": "python",
+ "name": "python2"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/tensorflow/tools/docker/notebooks/2_getting_started.ipynb b/tensorflow/tools/docker/notebooks/2_getting_started.ipynb
new file mode 100644
index 0000000000..01d1c521fe
--- /dev/null
+++ b/tensorflow/tools/docker/notebooks/2_getting_started.ipynb
@@ -0,0 +1,844 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "6TuWv0Y0sY8n"
+ },
+ "source": [
+ "# Getting Started in TensorFlow\n",
+ "## A look at a very simple neural network in TensorFlow"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "u9J5e2mQsYsQ"
+ },
+ "source": [
+ "This is an introduction to working with TensorFlow. It works through an example of a very simple neural network, walking through the steps of setting up the input, adding operators, setting up gradient descent, and running the computation graph. \n",
+ "\n",
+ "This tutorial presumes some familiarity with the TensorFlow computational model, which is introduced in the [Hello, TensorFlow](http://127.0.0.1:8888/tree) notebook, also available in this bundle."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Dr2Sv0vD8rT-"
+ },
+ "source": [
+ "## A simple neural network\n",
+ "\n",
+ "Let's start with code. We're going to construct a very simple neural network computing a linear regression between two variables, y and x. The function it tries to compute is the best $w_1$ and $w_2$ it can find for the function $y = w_2 x + w_1$ for the data. The data we're going to give it is toy data, linear perturbed with random noise.\n",
+ "\n",
+ "This is what the network looks like:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJYAAABkCAYAAABkW8nwAAAO90lEQVR4Xu2dT5Dc1J3Hv+YQT8VJ\nZUhVdprLWs4FTSrGGv4ql9CuHBCH4GaTFCLZwnIcjOAy8l6Q/1SlU4XHcg6xJgtY2OOik2KxSGoT\nGWrXzYFC2T2MDAtWitRavmQ0e9k2SYGowom4hNRPtqA9TE+rW3/cPfPepcfup6f3fu/Tv9/T+/PV\npo8//vhjsMQsULAFNjGwCrYoKy6xAAOLgVCKBRhYpZiVFcrAYgyUYgEGVilmZYUysBgDpViAgVWK\nWVmhDCzGQCkWGEuwrly5gtf++zW887/vYOn/lnD5T5cT40x9ZQrb/nEbxDtFiHeI2LJlSylGY4X2\nt8BYgUVAvfzqy3i5/TI+vPLhmq37wpYv4AHpATxw3wMMsP4cFJ5jbMAiqA4eOYg/Lv8xMcL26e34\n+vTXk8+vbv1q8n/03TsX38EfLv4h+aRE380dmmNwFY7O2gWOBVgE1Y/2/yjxUls+vwXaY1oS7tZK\n3v94MJ8zceUvV0Dea+H4AoOrQrhGHqxuT0Xjp0P7D2HqH6Yymejyu5dx5PiRZBxGnmt+bj7TdSxT\nfgv0ASuAzglwmyE8pfbZu3VaEDkDdT+AweevzGolvPjvL+LMb84knmr+yHxmqNKyCK7ZQ7OJ5yIo\n+3m6clqx8UrNB1bso2W64FQN9cnijdcdAvNAQWGRPBcLicX3Ua8S84FVcj3PnjuLhRcWkgH63OG5\nXHc7+NTBZEBP47NvffNbucpiF/e3QCaw2g0NfNvES5c+wtQ9u2G0LCj8BLAiFEaeBU0zYJ9fxkfY\njKl7FZgtCzIHIA7QUmXov/g9LmMztt6rwLBMyFROj3TkZ0fgveXh4X96GN//zvf7t2aNHGlI7VlW\n0pYmRC+AKUwAsQu5thOuvIjQEjGBGJ7CQYptdOw6etc6VzXXzcUZwJrGseWt2P28DV2I4OgyDgQK\nFgMTYtQ1xqq10eDuR6j8Fi1NxGTkwpAfRos7h05bQscQIFgibEeHMBHCVhs4EBtY8lQQd6ulvbN7\n8e6f302mC7Z/bXsuo9NkKk1X9PZ+IUyeR0sN4GscYl8DPzOP5VuPYynQwMU+dL4O3wzRbpQQ93O1\nbvQuzgRWS0p/tQA6Nuqcilq7A5u3Px28T7qw7BB1VUHqhEKTB2+pCAIVHZVD3dPgujpE6peOBzes\nQRS5nr/+b//g24nF7JN27qkCGq/J++RknHXm5JlVeiKGr/MQPQMdV0ZkCRBbNUwEMYzQhRyZEHgH\nOv29ynPM6HXtja1Rf7B4AZ7RgZv+SuMAOj+NtrYEX3avfyqMfDi2DdcLEAQBvPOX8MGtR3Ex0MEF\nJiRxP373wWZsvaeBhixDVRrg1/jxlwEWPV3ap+xVrR57Cjgpht2xEDV4mLIFvqkiaoUwwzp4U4Hv\n9/awN7YrR+vuGcAS4ZsdtKV0VNEFVqMLrIkWJGEPPP4hKA0RgiCAc1XsdJQErGQ2Ig7hOQ5sx4Hz\n0u+wvHX2akjtMWCpNhQCiCicq+AcCx1Fh9B2IegcNN6B4Teg1z0EeknzKqPFRe7a9AeLm4ajXvzU\noJEDqUahMESrKxSqbQHbDBGLoXUNlBiuUsNOT8fFQEVsNdHmdOjStTgSGOCnLTQuBDBosLxKqnTw\nntw/glPnoHMS4E6iFVjgbBGcwUGMPAjtawP73GZf/wVkAutYtAvPezYUPoKjipBdGZ5vQOgavGte\nHbfsiXD09TZUIUbg6JD3vITlrU/iYthErPOYaQk44ZhocDF8U0HDqsEOHfQaC7/2X68lyzJVTjd0\nWiJu2XMem++7+tAxSd52+hguTe3GYtjq6V3XPyqDtbA/WLyAtqRg0rHhLceo3avCsk0kjqd7uoEL\n0FJkaC/9Hh/gS9ixS0dTCaDKHVidNhoTNN2gQP/FedAmly/t2IWm2YK2xswqDbj3antzz5oToD/9\n15/i5smbcdo8vfaDQGiC37YfEyeW4KtcMu2g1HbCrp9Dx5Fw3ZCw04ZSb0Jse6CsLH1qgZFfK0zn\nn+hpznzKHGpJRzus4YJ/AX/78G94ofUC7r777pwMxAhdE6pyAK8u78CJJZ+BtcKiIw8Wea0DTx34\nZCH5oHYwM1y0TjhnziXbaWgB+4cP/RCPPfYYtm/fjpMnT+Kmm24aDrDYhdpoQdAbaMtNSB4Da6Uh\nRx4sqnB3SCTPNbtvtu9iMoU/Wg5Kt9p0h8DTp09j3759ePrpp/H4448PB1fylOtC5jTUGVifseFY\ngJXClXou+jcN6Gk2nj7JG1Gi7TG0Hkiz7OlGP/ru6OGjq46rnnjiCSwuLibe66677hocMAZWT5uN\nDVgpXGfbZ5OtybQNZq1EE6G0NXmXtGvNwbrv+4n3uu222wYPjwys9QFW2goKjbQ4Tdth6CAFeSpK\n5J3oQMUwhynS8PjMM89AVdVs3ouBtb7Aytbrw+WiMZfnednCIwOLgTUIZml43LFjB5577rnhnx4H\nuek6yztWY6yqbb+wsJBMTwwUHquu5Ijej4GVoWMoPJ4/fz7xXkM9PWa4x3rLwsDK2KMXLlxIvBeF\nR5qe2LRpU8YrN2Y2BtaA/U7hkaYnnn322exPjwPeYz1kZ2AN2YtpeCTvdeeddw5Zyvq9jIGVo28p\nPJL3ok2NLDxeb0gGVg6w0kvT8HjixIlkHJY1lauaE8GRangwsvD/noKqt+kzsLJSkCEfzdi/8cYb\nifdaKzxWoppDmxJ5FT54NH06YZShAQVmYWAVaEwqKg2PMzMzyfTEyqfHqlRzAoOH6OqwJnXoNQeB\nSWcjq0sMrJJsferUqSQsdofHylRzYg8aLyG0QtiTOvhGhFZglyKD0Mt8DKySwEqLpfD45ptvYn5+\nHr/+z19/sukwj2pOP72vyJXBy4BNME340Pg6AiNAu8IDkQysksGi4t9++2189wffxee++DkIO4Tc\nqjlrSw504Eg81FobYetq+KOwKDgagjVOnRdtBgZW0RZdpbw0BL73/nv4yZM/6bv7tVeVxkk1h4FV\nAVgbUTWHgVUBWGUcvCVV6EP/cuiztQ9NCNsMiIshrPSIeaK3oUNIlXQqaDMDqwIjlyEV0Fv6MoQl\nbENT/FTIhWSXOF2AF5jocei8cCswsAo36WcLLEPchO7yyr+9smrt6TQ3geQmcgcd2CQbIHoIDKGy\nuSwG1joEi06oU+jj3RAWR2HQgFiiTuxqJmRgVQBWGaGQDo78/OjPe9T+qpfSeBeeqIM3JPip4k8F\n7aVbMLAqMHSlg/dr7YkcCZxWg1Jz0G5UL7/EwKoArBuhmoNEbupBvPrRDhxf8qFVLFrCwKoArFQi\n4P3o/VwTpCmgdBi3r2oOIrQbNdwfGljytZ46r2U1n4FVlmW7yn3rrbfwvX/+XrKkMyPM5FLNIS2K\nbCrSNI8loKX48G6AxhIDq2SwaIcDg
WWaJn71H78qRDWnlxbF1aaQxJILj6TRjRhm0L4hYrwMrJLA\nos1+BBXtyaLty5SKVs1Zverx1RB4dhIPPe/CVioeXF2rFAOrYLDIOxFQd9xxRwLVytSt90XfFaGa\nU3ATCimOgVWIGa8WkoY9AorA6pUIrqJVcwpsRiFFMbAKMONqYS9LsWWo5mS5bxV5GFg5rExhj8ZP\ndHBitbCXo+ixv5SBNWQXpmGPvNXtt98+ZCnr9zIG1oB9O2zYG/A2Y5+dgZWxC1nYy2goNt2Q3VA0\njqIDESzsZbcZ81hr2CoNe/T56KOPZrcqy8m2zazGAAt7+X8ZzGOtsCELe/mhohLGEqwyVFpY2CsG\nqLSUsQKrDJUWFvaKBWrswCpDpYWFvXKgKiYUxh5U/huwhd8idBqYRARX4bHTldd8Le8gTSpapYWW\nX0is47qnveTdi02I6aFOejlAbSdcOT2fF8NTOEixDTqnV6Uk0CC2GpW8hYTCyFXA72yj8XoAAzoE\n+nsxgNnrZc8DtL7bU9HJlDwqLY9855FkbY8ktS3LWlGLECbPo6UG8DUOsa+Bn5nH8q3HsRRo4GIS\nL6vDN0O0e70SdoB2rfeshYBF71Juyzzu90TcF59FIC8WJvSVvgiT9nnPH5nP/K7CtOPonYWzh2aT\nF2Fu+usmvPjLF3us7cXwdR6iZ6DjyogsAWKrhokghhG6kCMTAu9Ap7+r1l0cQwoLAote4+ugwT+I\nsxO78XrQKkTkqzsEkqeily8Nk0il5cfHfowv3/xlLBxf6Pk2sNhTwEkx7I6FqMHDlC3wTRVRK4QZ\n1sGbCnxfrfxgwjBtvtHXFAZW7OsQZo7hEm7Fkxf8nm+mH6TBlau0RG00OBWcY6Gj6BDaLgSdDn46\nMPwG9Hr15/MGsdco5S0GrDiAIU7D5M/AgIo9gY6Lng4+5wi3jIOea59wieCQzgEnAe4kWoEFzhbB\nGRzEyIPQDmBWpaoxSpQMUZdCwCLh1OlmDWcCBzJsSNzDiIyL8LR8Ur1lHE2nPeZzh+d6mooENW7Z\ncx6b7zuHTlvCJB1Nnz6GS1O7sUhKxDl/LEP00Vhekh8sUjThNUyYAdxr59dCSwSvAWbg5Xq7exkq\nLfRO6TMnz/TurNAEv20/Jk4swaf2xC6U2k7Y9XPoOBIm6crYh6UoaLodABOoSU3YlpLbQ48lQT0q\nnR+sEq1RBlj0dGmfsnPVOtB51IMmfEdGLQ7RkkSYkps8VbJ01QIjDdaNCIVZwOi4DnxOgsRRXIzh\nazwakY3gmphsljLWe56RBqv6wfvg3R0HFqS6CcHxC5kQHrwGo3nFSIN1Q1RaBuinyDchSyYmDRct\nhWPLPF22G2mwuo+k55kgHUylJRtZoa1A0kI0bAdGPRnSszQuYFE90yUdepoznzKHWtLRDmsglZY8\ncHZTE7UVCGqEpmtDScZZLK20wEh7LKpst9YBKQUf1A5mhovWCefMuU9eM9JbWnEQMAIY/DQOXLr+\nmqmHXkfIdj18YpSRByuFa6+2F1f+cgXkuWb3zfZdN6Twt/DCQuKpsgmVDQIXy9vPAmMB1krPRf9e\nryot/TpsXL4fG7BSuNa7Ssu4gNOvnmMFVtqY9azS0q/DxuX7sQRrXIy7kevJwNrIvV9i2xlYJRp3\nIxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3Ixf9d0NIelzdt4X5\nAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "<IPython.core.display.Image object>"
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from IPython.display import Image\n",
+ "import base64\n",
+ "Image(data=base64.decodestring(\"iVBORw0KGgoAAAANSUhEUgAAAJYAAABkCAYAAABkW8nwAAAO90lEQVR4Xu2dT5Dc1J3Hv+YQT8VJZUhVdprLWs4FTSrGGv4ql9CuHBCH4GaTFCLZwnIcjOAy8l6Q/1SlU4XHcg6xJgtY2OOik2KxSGoTGWrXzYFC2T2MDAtWitRavmQ0e9k2SYGowom4hNRPtqA9TE+rW3/cPfPepcfup6f3fu/Tv9/T+/PVpo8//vhjsMQsULAFNjGwCrYoKy6xAAOLgVCKBRhYpZiVFcrAYgyUYgEGVilmZYUysBgDpViAgVWKWVmhDCzGQCkWGEuwrly5gtf++zW887/vYOn/lnD5T5cT40x9ZQrb/nEbxDtFiHeI2LJlSylGY4X2t8BYgUVAvfzqy3i5/TI+vPLhmq37wpYv4AHpATxw3wMMsP4cFJ5jbMAiqA4eOYg/Lv8xMcL26e34+vTXk8+vbv1q8n/03TsX38EfLv4h+aRE380dmmNwFY7O2gWOBVgE1Y/2/yjxUls+vwXaY1oS7tZK3v94MJ8zceUvV0Dea+H4AoOrQrhGHqxuT0Xjp0P7D2HqH6Yymejyu5dx5PiRZBxGnmt+bj7TdSxTfgv0ASuAzglwmyE8pfbZu3VaEDkDdT+AweevzGolvPjvL+LMb84knmr+yHxmqNKyCK7ZQ7OJ5yIo+3m6clqx8UrNB1bso2W64FQN9cnijdcdAvNAQWGRPBcLicX3Ua8S84FVcj3PnjuLhRcWkgH63OG5XHc7+NTBZEBP47NvffNbucpiF/e3QCaw2g0NfNvES5c+wtQ9u2G0LCj8BLAiFEaeBU0zYJ9fxkfYjKl7FZgtCzIHIA7QUmXov/g9LmMztt6rwLBMyFROj3TkZ0fgveXh4X96GN//zvf7t2aNHGlI7VlW0pYmRC+AKUwAsQu5thOuvIjQEjGBGJ7CQYptdOw6etc6VzXXzcUZwJrGseWt2P28DV2I4OgyDgQKFgMTYtQ1xqq10eDuR6j8Fi1NxGTkwpAfRos7h05bQscQIFgibEeHMBHCVhs4EBtY8lQQd6ulvbN78e6f302mC7Z/bXsuo9NkKk1X9PZ+IUyeR0sN4GscYl8DPzOP5VuPYynQwMU+dL4O3wzRbpQQ93O1bvQuzgRWS0p/tQA6Nuqcilq7A5u3Px28T7qw7BB1VUHqhEKTB2+pCAIVHZVD3dPgujpE6peOBzesQRS5nr/+b//g24nF7JN27qkCGq/J++RknHXm5JlVeiKGr/MQPQMdV0ZkCRBbNUwEMYzQhRyZEHgHOv29ynPM6HXtja1Rf7B4AZ7RgZv+SuMAOj+NtrYEX3avfyqMfDi2DdcLEAQBvPOX8MGtR3Ex0MEFJiRxP373wWZsvaeBhixDVRrg1/jxlwEWPV3ap+xVrR57Cjgpht2xEDV4mLIFvqkiaoUwwzp4U4Hv9/awN7YrR+vuGcAS4ZsdtKV0VNEFVqMLrIkWJGEPPP4hKA0RgiCAc1XsdJQErGQ2Ig7hOQ5sx4Hz0u+wvHX2akjtMWCpNhQCiCicq+AcCx1Fh9B2IegcNN6B4Teg1z0EeknzKqPFRe7a9AeLm4ajXvzUoJEDqUahMESrKxSqbQHbDBGLoXUNlBiuUsNOT8fFQEVsNdHmdOjStTgSGOCnLTQuBDBosLxKqnTwntw/glPnoHMS4E6iFVjgbBGcwUGMPAjtawP73GZf/wVkAutYtAvPezYUPoKjipBdGZ5vQOgavGteHbfsiXD09TZUIUbg6JD3vITlrU/iYthErPOYaQk44ZhocDF8U0HDqsEOHfQaC7/2X68lyzJVTjd0WiJu2XMem++7+tAxSd52+hguTe3GYtjq6V3XPyqDtbA/WLyAtqRg0rHhLceo3avCsk0kjqd7uoEL0FJkaC/9Hh/gS9ixS0dTCaDKHVidNhoTNN2gQP/FedAmly/t2IWm2YK2xswqDbj3antzz5oToD/915/i5smbcdo8vfaDQGiC37YfEyeW4KtcMu2g1HbCrp9Dx5Fw3ZCw04ZSb0Jse6CsLH1qgZFfK0znn+hpznzKHGpJRzus4YJ/AX/78G94ofUC7r777pwMxAhdE6pyAK8u78CJJZ+BtcKiIw8Wea0DTx34ZCH5oHYwM1y0TjhnziXbaWgB+4cP/RCPPfYYtm/fjpMnT+Kmm24aDrDYhdpoQdAbaMtNSB4Da6UhRx4sqnB3SCTPNbtvtu9iMoU/Wg5Kt9p0h8DTp09j3759ePrpp/H4448PB1fylOtC5jTUGVifseFYgJXClXou+jcN6Gk2nj7JG1Gi7TG0Hkiz7OlGP/ru6OGjq46rnnjiCSwuLibe66677hocMAZWT5uNDVgpXGfbZ5OtybQNZq1EE6G0NXmXtGvNwbrv+4n3uu222wYPjwys9QFW2goKjbQ4Tdth6CAFeSpK5J3oQMUwhynS8PjMM89AVdVs3ouBtb7Aytbrw+WiMZfnednCIwOLgTUIZml43LFjB5577rnhnx4Huek6yztWY6yqbb+wsJBMTwwUHquu5Ijej4GVoWMoPJ4/fz7xXkM9PWa4x3rLwsDK2KMXLlxIvBeFR5qe2LRpU8YrN2Y2BtaA/U7hkaYnnn322exPjwPeYz1kZ2AN2YtpeCTvdeeddw5Zyvq9jIGVo28pPJL3ok2NLDxeb0gGVg6w0kvT8HjixIlkHJY1lauaE8GRangwsvD/noKqt+kzsLJSkCEfzdi/8cYbifdaKzxWoppDmxJ5FT54NH06YZShAQVmYWAVaEwqKg2PMzMzyfTEyqfHqlRzAoOH6OqwJnXoNQeBSWcjq0sMrJJsferUqSQsdofHylRzYg8aLyG0QtiTOvhGhFZglyKD0Mt8DKySwEqLpfD45ptvYn5+Hr/+z19/sukwj2pOP72vyJXBy4BNME340Pg6AiNAu8IDkQysksGi4t9++2189wffxee++DkIO4TcqjlrSw504Eg81FobYetq+KOwKDgagjVOnRdtBgZW0RZdpbw0BL73/nv4yZM/6bv7tVeVxkk1h4FVAVgbUTWHgVUBWGUcvCVV6EP/cuiztQ9NCNsMiIshrPSIeaK3oUNIlXQqaDMDqwIjlyEV0Fv6MoQlbENT/FTIhWSXOF2AF5jocei8cCswsAo36WcLLEPchO7yyr+9smrt6TQ3geQmcgcd2CQbIHoIDKGyuSwG1joEi06oU+jj3RAWR2HQgFiiTuxqJmRgVQBWGaGQDo78/OjPe9T+qpfSeBeeqIM3JPip4k8F7aVbMLAqMHSlg/dr7YkcCZxWg1Jz0G5UL7/EwKoArBuhmoNEbupBvPrRDhxf8qFVLFrCwKoArFQi4P3o/VwTpCmgdBi3r2oOIrQbNdwfGljytZ46r2U1n4FVlmW7yn3rrbfwvX/+XrKkMyPM5FLNIS2KbCrSNI8loKX48G6AxhIDq2SwaIcDgWWaJn71H78qRDWnlxbF1aaQxJILj6TRjRhm0L4hYrwMrJLAos1+BBXtyaLty5SKVs1Zver
x1RB4dhIPPe/CVioeXF2rFAOrYLDIOxFQd9xxRwLVytSt90XfFaGaU3ATCimOgVWIGa8WkoY9AorA6pUIrqJVcwpsRiFFMbAKMONqYS9LsWWo5mS5bxV5GFg5rExhj8ZPdHBitbCXo+ixv5SBNWQXpmGPvNXtt98+ZCnr9zIG1oB9O2zYG/A2Y5+dgZWxC1nYy2goNt2Q3VA0jqIDESzsZbcZ81hr2CoNe/T56KOPZrcqy8m2zazGAAt7+X8ZzGOtsCELe/mhohLGEqwyVFpY2CsGqLSUsQKrDJUWFvaKBWrswCpDpYWFvXKgKiYUxh5U/huwhd8idBqYRARX4bHTldd8Le8gTSpapYWWX0is47qnveTdi02I6aFOejlAbSdcOT2fF8NTOEixDTqnV6Uk0CC2GpW8hYTCyFXA72yj8XoAAzoE+nsxgNnrZc8DtL7bU9HJlDwqLY9855FkbY8ktS3LWlGLECbPo6UG8DUOsa+Bn5nH8q3HsRRo4GISL6vDN0O0e70SdoB2rfeshYBF71Juyzzu90TcF59FIC8WJvSVvgiT9nnPH5nP/K7CtOPonYWzh2aTF2Fu+usmvPjLF3us7cXwdR6iZ6DjyogsAWKrhokghhG6kCMTAu9Ap7+r1l0cQwoLAote4+ugwT+IsxO78XrQKkTkqzsEkqeily8Nk0il5cfHfowv3/xlLBxf6Pk2sNhTwEkx7I6FqMHDlC3wTRVRK4QZ1sGbCnxfrfxgwjBtvtHXFAZW7OsQZo7hEm7Fkxf8nm+mH6TBlau0RG00OBWcY6Gj6BDaLgSdDn46MPwG9Hr15/MGsdco5S0GrDiAIU7D5M/AgIo9gY6Lng4+5wi3jIOea59wieCQzgEnAe4kWoEFzhbBGRzEyIPQDmBWpaoxSpQMUZdCwCLh1OlmDWcCBzJsSNzDiIyL8LR8Ur1lHE2nPeZzh+d6mooENW7Zcx6b7zuHTlvCJB1Nnz6GS1O7sUhKxDl/LEP00Vhekh8sUjThNUyYAdxr59dCSwSvAWbg5Xq7exkqLfRO6TMnz/TurNAEv20/Jk4swaf2xC6U2k7Y9XPoOBIm6crYh6UoaLodABOoSU3YlpLbQ48lQT0qnR+sEq1RBlj0dGmfsnPVOtB51IMmfEdGLQ7RkkSYkps8VbJ01QIjDdaNCIVZwOi4DnxOgsRRXIzhazwakY3gmphsljLWe56RBqv6wfvg3R0HFqS6CcHxC5kQHrwGo3nFSIN1Q1RaBuinyDchSyYmDRcthWPLPF22G2mwuo+k55kgHUylJRtZoa1A0kI0bAdGPRnSszQuYFE90yUdepoznzKHWtLRDmsglZY8cHZTE7UVCGqEpmtDScZZLK20wEh7LKpst9YBKQUf1A5mhovWCefMuU9eM9JbWnEQMAIY/DQOXLr+mqmHXkfIdj18YpSRByuFa6+2F1f+cgXkuWb3zfZdN6Twt/DCQuKpsgmVDQIXy9vPAmMB1krPRf9eryot/TpsXL4fG7BSuNa7Ssu4gNOvnmMFVtqY9azS0q/DxuX7sQRrXIy7kevJwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3Ixf9d0NIelzdt4X5AAAAAElFTkSuQmCC\"), embed=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "fBQq_R8B8rRf"
+ },
+ "source": [
+ "Here is the TensorFlow code for this simple neural network and the results of running this code:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 665,
+ "status": "ok",
+ "timestamp": 1446658971218,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "Dy8pFefa_Ho_",
+ "outputId": "5a95f8c8-0c32-411d-956d-bb81aeed8e50"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlsAAAEPCAYAAAB1MgENAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYVNW1///36oFuZpFmUFBQQGVwQkVQgx0cUENEo5HM\nIRqvSuKAibZmuOD3l1wFjWg0XjVxilfjnGCIUVDpoAmIiiIWCCjz1NDMMz2s3x+nWsumm56q6lRV\nf17Pc56qOnXqnFVAb1bvvc/a5u6IiIiISGJkhR2AiIiISCZTsiUiIiKSQEq2RERERBJIyZaIiIhI\nAinZEhEREUkgJVsiIiIiCRSXZMvMHjGzEjP7KGZfBzObamYLzew1M2sfj2uJiCSSmXU3szfNLGJm\n88zs2uj+iWa2wMw+NLMXzaxdzGduNbPF0ffPDS96EUlF8erZegwYXm3fLcDr7n408CZwa5yuJSKS\nSOXAje7eHxgC/NTMjgGmAv3d/QRgMdE2zcz6AZcBfYHzgQfMzEKJXERSUlySLXd/G9hcbfdI4Ino\n8yeAi+JxLRGRRHL3de7+YfT5DmAB0M3dX3f3yuhhs4Du0ecXAs+4e7m7LyNIxAYlOWwRSWGJnLPV\n2d1LIGi8gM4JvJaISNyZWU/gBOCdam9dDrwSfd4NWBnz3uroPhERILkT5LUukIikDTNrA7wAXB/t\n4ara/0ugzN3/ElpwIpJWchJ47hIz6+LuJWbWFVhf00FmpiRMJEO4e0bMVTKzHIJE60l3nxyzfzRw\nATAs5vDVwGExr7tH99V0XrV3IhmgoW1dPHu2LLpVeRkYHX3+Q2By9Q9UcfeU2MaNGxd6DKkWS6rE\noVhSOw73jMshHgXmu/u9VTvM7DzgJuBCd98bc+zLwLfMrIWZHQH0BmbXduKw/54y5d9bc/0O6R5/\nJnyHxohLz5aZPQ0UAh3NbAUwDrgDeN7MLgeWE9ytIyKS0szsdOC7wDwz+4BgCsQvgd8DLYBp0ZsN\nZ7n7GHefb2bPAfOBMmCMN7ZFFpGMFJdky92/U8tbZ8fj/CIiyeLu/waya3irzwE+cztwe8KCEpG0\npgryMQoLC8MO4XOpEkuqxAGKpSapEoc0D5nw7y3dv0O6xw+Z8R0aysLu7TYz9biLZAAzwzNkgnyi\nqL0TSX+NaevUsyUiIiKSQEq2RERERBJIyZaIiIhIAinZEhEREUkgJVsiIiIiCaRkS0RERCSBlGyJ\niIiIJJCSLREREZEEUrIlIiIikkBKtkREREQSSMmWiIiISAIp2RIRERFJICVbIiJJpHWoRZofJVsi\nIkm0c2fYEYhIsinZEhFJopKSsCMQkWRTsiUikkTr14cdgYgkm5ItEZEkUrIl0vwo2RIRSSINI4o0\nPzlhByAijROJRJgybQoAI84ZQf/+/UOOSOpDPVsizY96tkTSUCQS4eY7b2Zm2Uxmls3k5jtvJhKJ\nhB2W1IOSLZHmR8mWSJqJRCL8fNzPWdpyKbmdc+lyTBdy++d+3sslqU3DiCLNj5ItkTRS1aO1onIF\n67PW88aCN9i8eXPYYWUUM+tuZm+aWcTM5pnZddH9HcxsqpktNLPXzKx9zGduNbPFZrbAzM490PnV\nsyXS/CjZEkkjU6ZNIbd/Lt3O7Ma+8n3kbMvh45kfUxYpY8Q5I8IOL1OUAze6e39gCPATMzsGuAV4\n3d2PBt4EbgUws37AZUBf4HzgATOz2k6uni2R5kfJlkia2V65nZIWJZze/3Q6retEjw09mHjTRE2Q\njxN3X+fuH0af7wAWAN2BkcAT0cOeAC6KPr8QeMbdy919GbAYGFTb+dWzJdL8KNkSSSPHDj6WpSVL\n6bStE3l78+jZrid33XaXEq0EMbOewAnALKCLu5dAkJABnaOHdQNWxnxsdXRfjbZuhfLyREQrIqlK\npR9E0sSSzUt4c9ub3PPNe1jwzgIARtykkg+JYmZtgBeA6919h5lVX0K6UUtKt2gxnqIiaNsWCgsL\nKSwsbGqoIpJAxcXFFBcXN+kc5iEvQW9mHnYMIqlu+Zbl3Df7PkafMJoBnQeEHU6NzAx3r3WuUjox\nsxxgCvBPd783um8BUOjuJWbWFZju7n3N7BbA3X1C9LhXgXHu/k4N5/Vjj3WefBKOPz5530dE4qcx\nbZ2GEUVS3Kptq7h/9v18/7jvp2yilYEeBeZXJVpRLwOjo89/CEyO2f8tM2thZkcAvYHZtZ24c2fN\n2xJpbjSMKJLC1m5fy72z7uVbA77F8V3VFZIMZnY68F1gnpl9QDBc+AtgAvCcmV0OLCe4AxF3n29m\nzwHzgTJgzIG66zt31h2JIs2Nki2RFFWyo4R7Zt3DN/t/k5MOPSnscJoNd/83kF3L22fX8pnbgdvr\nc/4uXdSzJdLcaBhRJAWV7ipl0qxJjDxmJIO61VpFQNKQhhFFmh/1bImkmI27NnL3zLu5oM8FnHbY\nafu9P2fOHJ599jXWrVuNWTZdunRl1KjhDBw4MIRopaE6d4ZFi8KOQkSSScmWSArZsmcLk2ZN4uwj\nz2Zoj6H7vT9nzhyuvnoSe/aczpIls4DzOeKIQ5k+fRIPPjhWCVca0DCiSPOjYUSRFLFt7zbunnk3\nQ3sMZdgRw2o85tlnXyM7ezS7dm0mN/c6cnMvYffuvmRnj+bZZ19LarzSOBpGFGl+1LMlEqJIJMKU\naVPY63tZ120dI04cwbm9DriOsaS5Ll10N6JIc6OeLZGQRCIRbr7zZt7a9xZ/2f0XZhfPpse+Hgf8\nzKhRw6moeJxWrTpQVvZ7yspepGXLBVRUPM6oUcOTFLk0RadOQc+WajmLNB+qIC/NQtWkciBlJpNP\nuGcCb+97m5IOJbTPa0+rda04rcVpFN1QdMDPpeoE+UyqIJ8oVe1d27awahW0bx92RCLSUI1p6zSM\nKBmvalJ5dvZogJSZTF7mZSwpX0LXFl054qAjWF9Sv4k8AwcODD12aZqqSfJKtkSaBw0jSsarmlRe\nUHAWBQVnpcRk8r3le9nSYwu5q3NpU9KG9QvXUxYpY8Q5I0KNS5JDk+RFmhf1bIkkWVlFGQ+8+wDH\nHnkso48azT9e/wcAI24aQf/+/UOOTpJBS/aINC9KtiTjjRo1nOnTJ1FaGrwOJpOPDSWW8spy/ve9\n/6VdXju+f/z3ybIsBgzQ4tLNjWptiTQvSrYk4w0cOJAHHxwbM0E+nPlaFZUVPPz+w+Rl5/GjE39E\nlmkUv7nSMKJI86JkS5qFsCeVV3olf5rzJ9ydK066QolWM9elCyxYEHYUIpIsavFFEqzSK3nsg8fY\nW7GXq06+ipws/Y7T3KlnS6R5UbIlkkDuzpNzn2Tb3m1cc/I1SrQEULIl0twkvOU3s2XAVqASKHP3\nQYm+pkgqcHeenvc0G3Zt4NpB15KbnRt2SJIitGSPSPOSjF+zK4FCd9+chGuJ1CqZVeTdneciz7Fq\n2yquH3w9eTl5CbuWpB/1bIk0Lwlfr
sfMlgInu/vGWt7Xcj2ScNWryFdUPB7XKvJVC0oDfO3sr7Ew\nayELNy7khsE30Cq3VVyukeq0XE/dqtq7ykrIz4cdO6BFi7CjEpGGSNXlehyYZmYVwMPu/sckXFPk\nS2KryAOUlgb74pFsVS0onds/GCZ8/oHnGfLVIdwx8o5mk2hJw2RlQUEBbNgA3bqFHY2IJFoykq3T\n3X2tmXUiSLoWuPvbsQeMHz/+8+eFhYUUFhYmISyR+JgybQq5/XPpckwXVmxdwY59O+i0qhOtW7QO\nO7SEKi4upri4OOww0lbVUKKSLZHMl/Bky93XRh83mNlfgUFArcmWSCIko4r8qm2rWL9zPb1ye5Fn\nmT9Hq/ovRrfddlt4waQhLdkj0nwkNNkys1ZAlrvvMLPWwLmAWmSJq/pMfE9EFfmqeVola0tYsm4J\ne/ftpXdub5gfrHMociBaskek+Uh0z1YX4K9m5tFrPeXuUxN8TWlGqk98nz59Uq0T3+NZRT4SiXDV\nr66ipP0mdufvojxnDxfvuJieXXpqQWmpF92RKNJ8JDTZcvelwAmJvIY0b4mc+H4gDz3+EJGyZVir\nw9nbch85nxm5B+dSdENRg86TzHIUUn9m9ggwAihx9+Oi+04B/gDkAmXAGHd/L/rercDlQDlwfX1+\nqVStLZHmQxXkRRph7txFVLQ5mPK2u+iwbwjZFUcyd+6iBp2jqlduxoxBzJgxiKuvnsScOXMSFLE0\n0GPA8Gr7JgK/cvcTgXHAnQBm1g+4DOgLnA88YGZ13hauni2R5kPJlqS1UaOGU1HxOKWlb1Ba+kZ0\n4nv1/yPj76AjD2afl5K3vAvla7fjn1TQ+7BjGnSO2F65goKzyM4e/Xkvl4Qresd09ULMa4H20ecH\nAaujzy8EnnH3cndfBiwmuBHogJRsiTQfWqhN0lpTJr43dghv7rq5tDs9m5739WPXlsMB6JRfwU9+\n8oNGfANJI7cA/zaz3wEGnBbd3w2YGXPc6ui+A9IwokjzoWRL0l5jJr43ZGJ9bHX4YwYdw1s73uI3\nX/8NG/tvjEnWrmlwDMkoRyFx9Qhwrbv/zcwuBR4FzmnoSapK3WzdCitXFgKF8YtQROIuHjUFE75c\nT50BaLmeZqshPUvxnkheVHQ7M2YMiplY/wZDh85mwoRbv3RcbHX47ZXbWVqylAdGPcB5p54Xl/gy\naYJ8pi3XY2Y9gL/HTJDf5u7tYt7f4u4HmdktgLv7hOj+V4Fx7v5ODef8vL3bswfatw8e657hJSKp\nIlWX6xHZT0N6lhpybLxVVYfP75nP0tKlHMmRzJ0590vJVlPii2c5Cok7i25VFpvZme7+LzM7i2Bu\nFsDLwFNmNolg+LA3MLuuk+fnQ+vWwZI9nTvHO3QRSSVKtiQUDSnZkIjyDg0ZwttZuZOlpUs5puAY\n9u7Ym5T4JFxm9jTB+F5HM1tBcPfhfxHcadgC2BN9jbvPN7PngPl8URKiXt31AwfCe+/BBRck4EuI\nSMpQsiXNUn0n1g88bSAPPfMQPenJ3h17KYuUqTp8M+Du36nlrVNrOf524PaGXmfwYJg1S8mWSKZT\nsiWhqG/P0pw5cygpWceyZb9m587FtG7dJ24Tyesawlu5dSVTt0zlrkvuYvG7wYhRTdXhNdFdGmvw\nYPj978OOQkQSTRPkJTR1TQ6PnQu1Y8dOSkvv5dxz+3L99ZcnfIhuzfY1TJo5iW8f+20GHlL3tTJp\nontjZdoE+USo3t5t2AB9+sCmTZClqociaUET5CWt1NWz9OWin9CmTWu6dp2d8ESmZEcJ9866l2/2\n/2a9Ei3QRHdpnE6doKAAFiwALacpkrmUbEmzF1tH67Shp/GPTf9g5DEjGdStziLgIk02ZEgwb0vJ\nlkjmUse1pKy6luKZM2cORUW3U1R0e6PXFKyqozWzbCb/2vcvfvTkj+iX04/TDjut7g+LxEHVJHkR\nyVzq2ZKUdaA7BuNVe2vKtCns7LaT0q3b2JC1jq6tO7H2w7Uq6i1JM3gwPPhg2FGISCIp2ZKUVttc\nqHjUtopEIrz8ysu8v2keNqgDeXYYK5avZnXf1XV/WCROjjsOli6FbdugXbu6jxeR9KNkS1JWIu/w\nqxo+XNFpDfsOLyP70620phcVS7uzs4tG1yV5cnPhxBPh3XfhrLPCjkZEEkH/q0hKqhomnDFjEDNm\nDOLqqyd9aV5WXfO56jJl2hSsn1Heq4zWLXqRn3MoWYvzOLzVlRQUdEnEVxKpleZtiWQ2JVuSkr5c\n9uEssrNHf97LBV/M5xo6dDZDh85u8Hytfb6PJeVL6NGlO/l79pCb3YH83O7k5RU3KGkTiQclWyKZ\nTcOIknYaM7wYiUR45P8eYcHiBfTp1YetvbaS/2k+XXK6kH9IPivnr6PwK635yU+uqbG4anMvWCqJ\nNWQIXH01uIOpLKxIxlEFeYmreCUm1e82rKh4nAcfDJbAqWn/ga4TiUQYM24Mn+R+grUzduzcQfec\n7txx0R0sXLYQgBHn7L8Mz4HiUMK1P1WQr9uB2rvDD4fp06FXryQHJSINogryEqp4lWOA2ss+FBXd\n3uC7EKdMm8KGThvI75bPnvw95Jfmk7Uoi4XLFlJ0Q9EB44jHXY8i9TF4MMycqWRLJBMp2ZK4iXdi\nEs8lcBxnu28nz/LIL8vHUAeMpJaqeVvf+17YkYhIvCnZkrQyatRwpk+fRGlp8Dq4C3HsfsfFLsHT\np2cfyueXU766nBbegj0L9tCjQw9GnDMibtcTaarBg+Hpp8OOQkQSQXO2JG6SNb+prnlhVTW0cvvn\n4u4sWbmE0044jZaftmTh4oX07dWXK35wRY1ztBpzPQlozlbdDtTe7d4dLEq9YQO0apXkwESk3hrT\n1inZkrhKhcRkwj0TmFk2k85Hd2bhxoVsXruZb7X8Fr8Y+4ukx9KcKNmqW13t3aBBcPfdcMYZSQxK\nRBpEE+QldPGcZ9UU7s7iTYspqyyjZ05Psi077JBE6jRkSDBvS8mWSGZRsiUZo2qe1rq161i4cSE5\n5TkcmXMkFfMrGHFT3fOzRMI2eDC8+GLYUYhIvGkYUTJC1TytnH45rK5YzcalG/l6x6/TrUu3Wmto\nSXxpGLFudbV3S5fCaafBmjUqbiqSqjSMKM1SJBLh5+N+zvLK5bTPaU9WhywGZA+gW4tuddbREkkl\nRxwBBx0E770Hp5wSdjQiEi9aG1HSWlWP1vJOy1l72Frmz59Pt73dNEdL0tbFF8NLL4UdhYjEk5It\nSVtVPVpLWy4lv08+la0qyc/JZ9GbiyiLlNWrjpZIqvnGN4J5W5pdIZI5lGxJWqrq0VpRuYJ1uev4\nrPQzTj7kZDpaRw7POpyJN03UPC1pNDN7xMxKzOyjavuvNbMFZjbPzO6I2X+rmS2OvnduU6590kmw\nZw/Mn9+Us4hIKlGyJWkntker5cCWVOyuIH9TPmsXrOWI3Udw1213KdGSpnoMGB67w8wKga8Dx7
r7\nscBd0f19gcuAvsD5wANmjZ/ebhb0bmkoUSRzKNmStBLbo1WSU8KiXYs4qedJFKwvoMeGHurRkrhw\n97eBzdV2XwPc4e7l0WOiizgxEnjG3cvdfRmwGBjUlOt/4xvw17825Qwikkp0N6KEpjHV5qdMm0Ju\n/1w653Zm5fyVtCxvybrN6zii3RFKtCTRjgKGmtn/ALuBn7v7+0A3YGbMcauj+xrt9NNh1aqgFMQR\nRzTlTCKSCpRsSSiqr6M4ffqkA66jWFWwtPjtYtYfuZ6KIys489gzWTJ9STBH6zYlWpJwOUAHdx9s\nZqcAzwNHNvQk48eP//x5YWEhhYWF+x2TnQ0jRwa9Wzfe2Oh4RSQOiouLKS4ubtI5VNRUQlFUdDsz\nZgyioOAsAEpL32Do0NlMmHDrfsfefffd/OZPt2NHGwd1a8vaLWs5seuJdOrSibJImXq0UkSmFTU1\nsx7A3939uOjrV4AJ7v6v6OvFwGDgSgB3vyO6/1VgnLu/U8M5693e/fOf8Nvfwttvx+PbiEi8NKat\n05wtSWl33303N99+K1ta7WBr+30sy1nBYe0Oo92ydgzJHaJESxLJoluVvwHDAMzsKKCFu28EXgZG\nmVkLMzsC6A3MburFhw2DSATWrm3qmUQkbEq2JBSjRg2nouJxSkvfoLT0DSoqHmfUqC/d/MV9991H\n0T23UHFcOX5MOZW+E9vYhtK1myk8o5CiG4qUaElCmNnTwH+Ao8xshZn9CHgUONLM5gFPAz8AcPf5\nwHPAfOAVYEw8uuvz8uCCC2Dy5KaeSUTCpmFECc2BJshPnjyZ79zwHXafvBvv4pANbMjFFmVz0K42\nvPV8sRKtFJNpw4iJ0ND27oUX4OGHYerUBAYlIg3SmLZOyZaknEgkwsU/vJhlOcvgBCjrVBbchL8g\ni+yPcph46+3cqFnDKUfJVt0a2t7t2AGHHgrLlsHBBycuLhGpP83ZkrRXVUdri2+h8tBKyveUk7sh\nFysxchZnK9GSZqVNm2Du1pQpYUciIk2h0g+SMiKRCFeNvYpFGxeR0z4Hb+Fk7ciCT6BlSUvuuOkO\nrr322lo/35i6XSKprmqtxB/8IOxIRKSxNIwoKSESiTBm3BjeL3uffQfto5JKWi5sSe6OXAryC7jz\n13cycuTIWj9fvW7Xtm2/45RT+tClS1clXkmSisOIZtYLWOXue6PL7RwH/Nndt4QUT4Pbu61boUcP\nWLQIOndOUGAiUm8aRpS0NWXaFFa2WYn3goruFWTlZ0EL6Ny+M3994q8HTLQAnn32NbKzR1NQcBa5\nuR1YvjyXqVMHMmPGIK6+ehJz5sxJzheRVPMiUGFmvYGHgcMI7iRMG+3bw0UXwZNPhh2JiDSWki1J\nCatXr2b1hg3syXWy93WgcgvkVuRywdALGnzX4Zo1rwE/oFWr0ygoOIvs7NGfDy9Ks1MZXcvwYuA+\nd78JOCTkmBrsiivgT38CDQKIpCclWxKKOXPmcPbZX6dDjwJ6DOjBv995D1q1JGtJHrYsF/u4Nbkb\n89m6vpKiotvr7JmKrdu1a9cyYAOHHqoxF6HMzL4N/BCommaeG2I8jXLGGVBZCTNn1n2siKSehM/Z\nMrPzgHsIErtH3H1Ctfc1Z6uZmTNnDl/72g9YlzsfBkVraJVBx2XDaLf7eHbsWUDl3n2wbwd9+vwP\nABUVjx9w7cSq8z777GusW7ea999fQ9u2P6n3Z6XpUnTOVj/gamCmu/8lWuH9surtUBLjaXR7N3Ei\nfPIJPPponIMSkQZJuTpbZpYFLALOAtYA7wLfcvdPYo5RstXMXHHFDTz61wfgrDI4OgvMYYWT+4/W\nnNQnKJe9bNmvKSi4hZ49LwQOvHZiTXRnYvKlYrIVy8w6AIe5+0chxtDo9m7dOjjmGFixAtq1i3Ng\nIlJvjWnrEl36YRCw2N2XA5jZM8BI4JMDfkrSQmMTmk9XfgK50R6tbKDSYCfk5+UwdGiwpNzRR5/C\nwoWtGx3bwIEDlWAJZlYMXEjQ1r0PrDezf7t72hVr69oVvvpVePZZuPLKsKMRkYZI9JytbsDKmNer\novskzVWVWpgxY1CD7/g7/vijsINzYA+w0uETh9kwqO9XmDDhViZMuJXrrvthnWsnitRDe3ffBnyD\noOTDqcDZIcfUaFdcAY88EnYUItJQKVHUdPz48Z8/LywspLCwMLRYpH5iSy0AlJYG++rTm/T9732f\n5zY+x/ollfhMoMzomtebiRNv+/yYgQMH8uCDY2N6zjTnKtUUFxdTXFwcdhh1yTGzQ4DLgF+GHUxT\nnXceXHUVfPwxDBgQdjQiUl+JTrZWA4fHvO4e3fclscmWZLZdZbuYunUqt3zrFj6b9hkftVhM78OO\n4Sc/+cF+yZSGAlNb9V+MbrvtttoPDs//A14D/u3u75rZkcDikGNqtJwcGD066N2aNCnsaESkvhI9\nQT4bWEgwQX4tMBv4trsviDlGE+TTUPWK7fW5429P+R4mzZxEr4N78c1+38QsZedSSyOk+gT5VBCP\n9u6zz2DwYFi1CvLy4hSYiNRbyt2NCJ+XfriXL0o/3FHtfSVbaaohE+T3lu/l3nfupXu77nx7wLeV\naGWgVEy2zKw7cB9wenTXW8D17r4qpHji0t4NGwZXXw2XXRaHoESkQVIy2aozACVbGW9fxT7un30/\nBa0K+P5x31eilaFSNNmaRrA8T9ViN98Dvuvu54QUT1zau2eegQcfhNSfMieSeZRsZbh0rB1VVlHG\nA+8+QNu8tow+YTRZpkULMlWKJlsfuvsJde1LYjxxae/KyqBXL3jpJTj55DgEJiL1poWoM1hTSi2E\npbyynIfff5iWuS2VaElYNprZ98wsO7p9D9gYdlBNlZsLN9wAd90VdiQiUh8pUfpB6taUUgvJFIlE\nmDJtCpVeye5euznkkEO44sQrlGhJWC4nmLM1CXDgP8DoMAOKlx//GH77W1i2DHr2DDsaETkQ/Q8o\ncROJRLj5zpv5z77/8Nye53jx9Rc5Lf80srOyww5Nmil3X+7uF7p7J3fv7O4XAZeEHVc8tGsXJFwq\nASGS+pRspYlRo4andEX1yZMnc/EPL2b2ztl8lvUZLTu3pFf3Xrz6xqthhyZSXdot1VOb666DJ5+E\nTZvCjkREDkTDiGkizIrqdU3Mnzx5MleOv5JdbXexp+MetqzYQmFeoYYOJVWl1CT+pujWDS68EB56\nCG6t3xrtIhIC3Y0oB1Sf4qXnX3o+H3T8gMrOlWxZvgUq4aC9B3FK61OYeNNE+vfvH07wklSpeDdi\nTcxshbsfXveRCbl23Nu7efNg+HBYulRFTkWSQXcjStzFTswvKDiL7OzRn/dyRSIRJtwzgU+XfcrO\n8p3QDg49/FDyVuRx0NKDlGhJaMxsu5ltq2HbDhxaj88/YmYlZvZRDe/9zMwqzezgmH23mtliM1tg\nZufG+esc0LHHwnHHwVNPJfOqItIQSrakUWInw9tZx
q6Nu6h4t4K9G/bScntL7vz1nUq0JDTu3tbd\n29WwtXX3+kyfeAzYb1JktCL9OcDymH19CRa67gucDzxgSa7ce9NNQRmIyspkXlVE6kvJlhxQ9Yn5\nO3bcw+7Ktfx83M/Zc/ge9h66l3a92nFirxPpuKQjJ2w8gT+O/yMjR45s1PXmzJlDUdHtFBXdnvJ1\nxCRzufvbwOYa3poE3FRt30jgGXcvd/dlBAtdD0pshF82bFgwhPjKK8m8qojUlybIywHFTswvLS1h\n4abNrOiczYrPVrB6zWq6dezGoB6D2Lx1M0O+N4SiG4oafa3q88OmT59U5+LWIsliZhcCK919XrWO\nq27AzJjXq6P7khgb/OIXMH48fO1rwWsRSR1KtqROAwcOZODAgUy4ZwIbywrockwXNuVsYtncZZQv\nLGfzvs2URcoYcdOIJl0nXQq3SvNjZi2BXxAMITbJ+PHjP39eWFhIYWFhU08JwCWXwB13wIsvwqWX\nxuWUIgIUFxdT3MSFSJVsSYOt2b6G7S23M7DjQNosb8OQo4Yw4qYRmqMlmawX0BOYG52P1R2YY2aD\nCHqyYu9u7B7dV6PYZCuesrLgf/4Hrr8eLroIctS6i8RF9V+KbrvttgafQz+OUqeqJXhKSkqY/8l8\nNvTcRIceGWcDAAAgAElEQVStB9N6c2vu+s1d9O/f//O5VtD4RbJHjRrO9OmTKC0NXgeFW8fG86uI\nNIRFN9z9Y6Dr52+YLQUGuvtmM3sZeMrM7iYYPuwNzA4hXs49Fw45BJ54Aq64IowIRKQmqrPVzNVV\nsLTqrsPc/rms2LqCjz/5hIPmDSPfD6F93jqeeCLI8OuqxRWveCR1pUudrfows6eBQqAjUAKMc/fH\nYt5fApzs7puir28FrgDKgOvdfWot5014ezdrFnzzm7B4MeTnJ/RSIs1SY9o6JVvNWF0FSyORCD8f\n93OWd1rOoQMP5aOVEfZ93J1Dll5K7+5FlJa+wdChwS/wM2YMiplrFeyfMEElrZuTTEq2EiVZ7d3I\nkXDmmXBjxixMJJI6VNRUGuRABUsnT57MJWMu4f0N77O2ci2zl86m7e72ZJW1CDVmEanbb38LEybA\ntm1hRyIioGRLajB58mR+fNOPWd15NeX9y9m+Zzs5G3LwDfuo/GQZLSoLvrQYdqovki3S3AwYECzh\n87vfhR2JiICGEZu1moYRb7rpQn59/69Z7aspO6qMioMraLe9Hdnzszmp00lcPupy3nvvU+DLc6o0\n10o0jFi3ZLZ3S5fCySfDggXQuXNSLinSLGjOljRYbJJ08sm9efTZR3l/w/uUH1HO9uztZJNN7rZc\nuq3vxosPvKjyDlIrJVt1S3Z7d8MNsGMH/OlPSbukSMZTsiWNVnXX4dKWS1lXsY5te7fR1tpSvqSc\n/HX5/OnOPzV6CR5pHpRs1S3Z7d3WrdCvHzz3HJx+etIuK5LRNEFeGqXqrsOlLZdScEwB5QXltM1v\nS+7SXLpZNyVaImmqfftg3tY110BZWdjRiDRf6tlq5iZPnkzRxCI27dlE2dFlVPSsYEDnAWxZuIUe\nG3pw12131WvoUHO2RD1bdQujvXMPip2edx787GdJvbRIRtIwYjMRr8QmEolwyZhL2HrMVipzKykt\nLSWnPJc2Fe3oy1E89JuH6p1oxauoqaQvJVt1C6u9W7wYhgyBDz6Aww5L+uVFMoqGEZuBqsRmxoxB\nzJgxiKuvnsScOXMada4p06aQ1S+L7MOy2XPoHiw3h8r3W1IxuydbP+3A3r1763WeA9XrEpHw9ekD\nP/1pMGFeRJJPyVaaiXdi07FTR7bt2kb5lkpsd2taVHThhD6P0abNDUqYRDLILbfARx/BK6+EHYlI\n86Nkqxk7fejprNm+hj45fchf3JKc99rQr8sE2rZtWHkHFTUVSX35+fCHPwQ9XLt2hR2NSPOiOVtp\npinzoyKRCFOmTQHgK2d+hX9s+ge9s3qzfu56Vq9ezfRXl9KmzQ0NPm9VXJog37xpzlbdUqG9+973\ngrsU//CHUMMQSVuaIN9MNCaxqaqjlds/lzIvY9GaRRRdWMSPh/24SecVqaJkq26p0N5t2QInnAD3\n3w8jRoQaikhaUrIlNaqqo7W803KOPvVoVuxdQYvNLbgo/yKKbiiq9XNKvqQhlGzVLVXau7ffhm9+\nM7g7sWvXsKMRSS+6G1H2U9WjtaJyBRt9I28ufJOW3pIu2V0O+Ll43vUoIqnljDPgyith9GiorAw7\nGpHMp2Qrw02ZNoXc/rkcNewo9pTvIWtXFhvnbqQsUsaIc2ofQ1A5B5HM9utfB0OK998fdiQimS8n\n7AAkcSKRCMVvF7PMl5EzOId+/fqx9e2t9MjqwcTbJmpRaZFmLDcXnnoKBg+Gr34Vjj027IhEMpeS\nrQwViUS46ldXsaLVGtbkrySr2OiS050eWYfWawmeUaOGM336JEpLg9dBOYexSYhcRJKlVy+46y74\n9rfhnXegdeuwIxLJTBpGzFAPPf4QH5cvpbRjCyq9G2U7WrD5nZb1rgw/cOBAHnxwLEOHzmbo0Nla\nfkckQ/3gB3DyyZq/JZJIuhsxQ31l+Lm8330x3qITFRuOxjcuod2ithx1+E0MHTqbCRNuDTtEyTC6\nG7Fuqdre7dkDw4bB8OEwblzY0Yiktsa0dRpGzEDlleWUH1MGiwyraIuXlcLCClrnHBN2aCKSgvLz\n4aWX4NRToV+/oCyEiMSPkq00VlMdrIrKCv74/h/56plD2P5ma0q3dWTT5m3klLfkoN7HaO6ViNSo\na1f429/g3HOhd2848cSwIxLJHBpGTFM1LdvzwP9ezwd8wN6KvVx98tV89OFHPPvsa5SUrMO9gq5d\nu6k4qSSMhhHrlg7t3QsvwI03wuzZKngqUhNVkG9GiopuZ8aMQRQUnAXAhtJpHPzVhxh+4VcYc8oY\ncrNzG31uVY6XxsikZMvMHgFGACXuflx030Tg68Be4DPgR+6+LfrercDlQDlwvbtPreW8adHe3XYb\n/POf8Prr0KZN2NGIpBZVkG9mdu1axqerJrB41R2sKXiG3baLa065psmJlirHi/AYMLzavqlAf3c/\nAVgM3ApgZv2Ay4C+wPnAA2aW1knnf/839O8PI0fC7t1hRyOS/pRspaFIJMKqDR+zsHQsq9v/hVXH\nPcqGrOe58rjv0SK7RZPOrcrxIuDubwObq+173d2riiPMArpHn18IPOPu5e6+jCARG5SsWBPBDB5+\nGLp0gUsugXpUixGRA1CylWaq1jr8IO8DWg/PpazDIloWbOOUASewcu3KsMMTaS4uB16JPu8GxP7w\nrY7uS2vZ2fDEE8Gdit/5DpSXhx2RSPrS3YgprPrcqby8PH4+7uesqFxB1kFZZB+aTV6XFvTY3J2D\nux4cl2uqcrzIgZnZL4Eyd/9LYz4/fvz4z58XFhZSWFgYn8ASIDcX/vIXuPhi+OEP4c9/DpIwkeak\nuLiY4uLi
Jp1DE+RTVPW7DXfsuIf2vTezqesmNlduZs/mPZQfUk6rva0oWF9Az3Y9mXjTgdc7rO/E\nd02Ql8bIpAnyAGbWA/h71QT56L7RwJXAMHffG913C+DuPiH6+lVgnLu/U8M507K9270bvvY16NED\n/vhHyNGv6dKM6W7EDFL9bsO5C68k77R/c+KwAUyPTGd3xW46LexE3rY8Lhh8AVf84Io6E63qpSK0\nBI/EUwYmWz0Jkq1jo6/PA34HDHX3jTHH9QOeAk4lGD6cBvSpqWFL5/Zux45g/lZeHjzzDLRqFXZE\nIuFQBfkMtX17hG0755K9tZSS3SV0OrQTFQsrODLvSO56oO5FpeHLE98BSkuDfUq2RPZnZk8DhUBH\nM1sBjAN+AbQApkVvNpzl7mPcfb6ZPQfMB8qAMWmbUR1Amzbw97/DFVfA2WcHzzt2DDsqkfSQsAny\nZjbOzFaZ2Zzodl6irpWJRo0aTkXF46xY8QhzV19BZe/llO3bw3sfv0fBxgJ67+rNXbfVL9ESkYZx\n9++4+6Hunufuh7v7Y+7ex917uPvA6DYm5vjb3b23u/etrcZWJmjRIpg0/5WvwBlnwPLlYUckkh4S\nfTfi3TEN06sJvlZGGThwIDfddCHbKiaQ1XkJR519GIeeeSjd13Wn9fzWdc7Pqq4qeSstfYPS0jei\nE9+rlxESETmwrCyYMAGuuipIuObODTsikdSXsDlbZjYO2OHuv6vjuEzscW+yyZMnUzSxiE05m9jb\nYy/e0jlzwJlk78lmSO4Qim4oavA5NfFdEinT5mwlQqa1d88+Cz/9KdxzD3z3u2FHI5IcKTVBPpps\njQa2Au8BP3P3rTUcl1GNTzxEIhEuGXMJW4/ZSnmrcjZv3Ey7snYcsveQet11KBIGJVt1y8T2bu5c\nuPRSGD4cfve7YAK9SCZL+gR5M5sGdIndBTjwS+AB4P+5u5vZb4C7gStqOk861Z1JhinTppDVLwvr\nZlS2qKTAC6j8oJLDOx3eLBIt9cClh3jUnpH0d/zx8O67MHo0nHkmPP88HHZY2FGJpJaklH6oqV5N\nzHsZ95teU024ZwIvbXqJBeULaNeqHRUrK2j/cXtefPjFZpFoqURFelLPVt0yub1zhzvvhLvvhkcf\nhQsuCDsikcRIqYWozaxrzMtvAB8n6lqZpu+pfdmwYwMDcgbQbnU78j7Mo1+XU/jzn1/O+EWhtTaj\nSHoyg5tvDuZx/eQn8KMfwZYtYUclkhoSeTfiRDP7yMw+BM4EtOZLPSwsXciM7TP4w6g/MPLgkZzd\n5mza7jqWtWtHM2PGIK6+elLGJ1wikr7OPBPmzQuKnh57LLzySt2fEcl0qiCfQj7d9CkPvvcg/3XS\nf3FUx6OA/SvJl5a+wdChs5kw4dYwQ00YDSOmLw0j1q25tXdvvhkUQS0sDCbPHxyfJVxFQpVSw4jS\nMEs3L+XB9x7kihOv+DzRCtOcOXMoKrqdoqLbk9qTNnDgQB58cCxDh85m6NDZSrRE0tiwYUEvV9u2\n0Lcv3H8/lJeHHZVI8qlnKwWs2LqC37/ze0afMJoBnQd86b0wenrUuySNoZ6tujXn9m7ePBg7Ftau\nhUmT4Nxzw45IpHFSqs5WvQNoxo0PwKptq7h31r1897jvckLXE2o8JtmlEJrb0KXEh5KtujX39s4d\nXn4ZfvazoKfrt7+F4/a7R10ktWkh6jSzdvta7p11L6MGjKo10YJgaE29SiKS7sxg5Eg47zz4wx+C\nQqhDhsB//zecUHsTKJL2NGcrJOt3rueeWfdwab9LOfnQk8MO50u0jqKIJFJeHtx4I3z2WbCo9QUX\nBEnY+++HHZlIYmgYMUkikQhTpk0B4PShpzNl0xS+ftTXOf3w00OOrGaq4i4NpWHEujWX9q6hdu+G\nP/4RJk6EXr3g2mvhoosgR2MvkoI0ZytFRSIRbr7zZnL757LP97Fo7SJ+eeEv+dFXfxR2aCJxo2Sr\nbs2hvWuKsjL461/hvvtg2TK45hq48kro1CnsyES+oNIPKWrKtCnk9s/loN4Hsa79Orp06cL6uevD\nDktEJKXk5sJll8FbbwUT6T/7DPr0gW98AyZPhn37wo5QpHGUbCVJmZcxb/08urbpSqds/ZomInIg\nJ54IjzwCy5fD174WrLnYrRv89KcwcyZUVoYdoUj9KdlKgmFfHcbiNYvJ3ZxLizUtKIuUMeKcEWGH\nJSKS8tq3D6rQ/+tfMHs2dOkCP/4xHHYYjBkDr78eDD+KpDLN2UqA2Mnww746jNe2vka7Pe3YFdmF\nmTHinBH0798/5ChF4ktztuqWie1dWBYtCuZ3vfQSfPppUEbi3HPhnHOCHjCRRNEE+RQQOxm+witY\ntHoRV11wFWPPHYuZ/h+SzKVkq26Z1t6litWr4Z//hGnTgp6uQw4Jkq5hw+C006Bjx7AjlEyiZCsF\nTLhnAjPLZtLxqI58vP5jKkoruDT/Us4989wGlVJQ6QVJN0q26pZp7V0qqqiAOXNg6tRg6HHWrGDI\n8Ywzgu3UU6F3b8jSJBppJCVbKWDCPRP4975/s+HgDbTMaUnb9W3psaEHs/61ud5rDWptQklHSrbq\nlmntXTooL4ePPoK33w7ucnz3XdiyBQYOhJNOCrbjjw/uelRdL6kPLdeTAoafNZwn/vgEeRV5dM7u\nTPn8cnZmZ5GdPTpmrUF49tnXak2enn32tQYdLyIiNcvJCRKrgQPhuuuCfaWlQbX6996DZ5+FX/0K\n1qyBo4+GAQPg2GOD50cdFRRZbdEi3O8g6U/JVhyVV5bz1q63GHXuKFosbkGWZTHiphH8+c8vhx3a\nAWnIUuTLzOwRYARQ4u7HRfd1AJ4FegDLgMvcfWv0vVuBy4Fy4Hp3nxpG3FI/BQXBhPrhMauQ7dwJ\n8+fDxx8H24wZwST8FSuge/cg8TriiC9vPXtChw7Bmo8iB6JhxDipqKzgofcfIsuyuHLglWRnZX/+\nXkOHBZM5jKghS4mXTBpGNLMzgB3An2OSrQnARnefaGZFQAd3v8XM+gFPAacA3YHXgT41NWyZ0t41\nJ/v2wdKlsHhx8LhkSfC4dGlQA6ysLEjGDjss2Lp1g65dg0n6VY+dO0Pr1krKMoXmbIWk0iv505w/\nUVZRxlUnX0VO1v4dhg3tPUpWb1NR0e3MmDEoZsjyDYYOnc2ECbcm5HqSuTIp2QIwsx7A32OSrU+A\nM929xMy6AsXufoyZ3QK4u0+IHvdPYLy7v1PDOdO+vZMv274dVq2ClSuDbc0aWLsW1q0LHteuhfXr\nwT1YdqhTpyD56tgRDj74i8eDD4aDDvpia98+eFSSlno0ZysElV7J4x8+zu6y3Yw5ZQw5WTk1JkpV\nW3019HgRSbjO7l4C4O7rzKxzdH83YGbMcauj+6QZaNsW+vYNtgPZuTNIujZsCLZNm4Jt40ZYuDB4\nvmVLsG3d+sXzPXugTRto1y64Vrt2QQLWps0XW+vW0KrVF1vLll/e8
vO/eKza8vKCxxYtgue5ubpD\nM5GUbDWBu/N/H/0fW/Zs4dpB15KbnbvfsNz06ZNSelhu1KjhTJ8+idLS4HVFxeOMGjU23KBE0kOj\nuqjGjx//+fPCwkIKCwvjFI6kstatv5jr1RDl5bBjB2zbFvSibdsWJG47dnz5cdeu4Pn69cHrPXtg\n9+5gq3q+Zw/s3Rs8Vm379n2x5eYGyVeLFl88z82tecvJ+eIxdsvO3v+xti0ra//nWVkH3sy+/Fh9\nX/X3q79X0wa1v5edDXl5xRQXFzfp71/DiI3k7vzl47+wettqrjv1OvJy8oD0HJbTBHmJh2YwjLgA\nKIwZRpzu7n1rGEZ8FRinYURJJ+7B/LO9e4PHffu+eNy3L0j6ysqCrep5efmXt7KyoM5ZRcUX+6pe\nV98qK/d/HbtVVAQx1bSv+v7YfbGPscdVva6+VX332racHPj737/8Z6VhxCRxd56f/zwrtq7ghsE3\nfJ5opSsNWYrUyKJblZeB0cAE4IfA5Jj9T5nZJILhw97A7OSFKdJ0Zl/0akn8KdlqIHfnr5/8lcUb\nFzN2yFjyc/K/9L6G5UTSn5k9DRQCHc1sBTAOuAN43swuB5YDlwG4+3wzew6YD5QBY9R9JSKxNIzY\nQH9f+Hc+WPcBPxvyM1q3aF3jMRqWk+Yo04YREyHd2jsR2Z9KPyTYPxf/k3dWv8PPhvyMtnltww5H\nJKUo2apbOrV3IlKzxrR1utGznqZ9No3/rPwPYwePVaIlIiIi9aZkqx6mL51O8bJibhxyI+3z24cd\njoiIiKQRJVt1eGv5W0z9bCo3DrmRDi07hB2OiIiIpBklWwcwc+VM/rH4H4wdMpaOrTqGHY6IiIik\nISVbtXh39bv87ZO/ccPgG+jcunPdHxARERGpgZKtGsxZO4fnIs9x/eDr6dqma9jhiIiISBpTslXN\nRyUf8fS8p7nu1Os4tO2hYYcjIiIiaU7JVozI+gh/nvtnfjropxzW/rCwwxEREZEMoGQr6pPST3js\nw8e45uRr6HlQz7DDERERkQyhCvJRCzYsIDsrm6M6HhV2KCJpSRXk65Yq7Z2INJ6W6xGR0CjZqpva\nO5H0p+V6RERERFKMki0RERGRBFKyJSIiIpJASrZEREREEkjJloiIiEgCKdkSERERSSAlWyIiIiIJ\npGRLREREJIGUbImIiIgkkJItERERkQRqUrJlZpea2cdmVmFmA6u9d6uZLTazBWZ2btPCFBFJDdG2\nLWJmH5nZU2bWwsw6mNlUM1toZq+ZWfuw4xSR1NHUnq15wMXAv2J3mllf4DKgL3A+8ICZpfyaacXF\nxWGH8LlUiSVV4gDFUpNUiaO5MLMewJXAie5+HJADfBu4BXjd3Y8G3gRuDS/KxMmEf2/p/h3SPX7I\njO/QUE1Kttx9obsvBqonUiOBZ9y93N2XAYuBQU25VjKk0j+AVIklVeIAxVKTVImjGdkG7ANam1kO\n0BJYTdDmPRE95gngonDCS6xM+PeW7t8h3eOHzPgODZWoOVvdgJUxr1dH94mIpC133wz8DlhB0K5t\ndffXgS7uXhI9Zh3QObwoRSTV5NR1gJlNA7rE7gIc+KW7/z1RgYmIpBozOxIYC/QAtgLPm9l3CdrE\nWNVfi0gzZu5NbxPMbDrwM3efE319C+DuPiH6+lVgnLu/U8Nn1SiJZAh3T/m5mU1hZpcB57j7ldHX\n3wcGA8OAQncvMbOuwHR371vD59XeiWSAhrZ1dfZsNUDshV8GnjKzSQTDh72B2TV9KNMbZxHJKAuB\nX5tZPrAXOAt4F9gBjAYmAD8EJtf0YbV3Is1Tk5ItM7sIuA8oAKaY2Yfufr67zzez54D5QBkwxuPR\nhSYiEiJ3n2tmfwbeByqAD4CHgbbAc2Z2ObCc4G5sEREgTsOIIiIiIlKzlKkgb2bXRgugzjOzO1Ig\nnp+ZWaWZHRzS9SdG/zw+NLMXzaxdCDGcZ2afmNkiMytK9vVj4uhuZm9GC0nOM7PrwoolGk+Wmc0x\ns5dDjqO9mT0f/XcSMbNTQ4pjvyKfYcSRylLlZ6khzOwRMysxs49i9qVN8dba2o00+w55ZvaOmX0Q\n/R7/E92fNt8B9m8z0zD+ZWY2N/r3MDu6r0HfISWSLTMrBL4OHOvuxwJ3hRxPd+AcguGAsEwF+rv7\nCQR1ypJaJNHMsoD7geFAf+DbZnZMMmOIUQ7c6O79gSHAT0KMBeB6giHysN0LvBKdiH08sCDZAdRS\n5PNbyY4jlaXYz1JDPEYQc6x0Kt5aW7uRNt/B3fcCX3X3E4HjgGFmdjpp9B2iqreZ6RZ/JcENMCe6\ne1XN0AZ9h5RItoBrgDvcvRzA3UtDjmcScFOYAbj76+5eGX05C+ie5BAGAYvdfbm7lwHPEBRuTDp3\nX+fuH0af7yBIKkKp2xZNxC8A/hTG9WPiaAd8xd0fA4gWEN4WQijVi3y2AtaEEEcqS5mfpYZw97eB\nzdV2p03x1lraje6k0XcAcPdd0ad5BP9nbyaNvkMtbWbaxB9l7J8vNeg7pEqydRQw1Mxmmdl0Mzs5\nrEDM7EJgpbvPCyuGGlwO/DPJ16xemHYVKVCY1sx6AicA+5URSZKqRDzsyY5HAKVm9li0e/5hM2uZ\n7CBqKPK5JVrkU76Qkj9LjdQ5HYu3xrQbs0izArTRIbgPgHVAsbvPJ72+Q01tZjrFD0Hs08zsXTP7\ncXRfg75DPEs/HJDVXhz1V9E4Orj7YDM7BXgOODKkWH5BMIQY+16y4/i8YKyZ/RIoc/enExVHujCz\nNsALwPXR31STff2vASXu/mF06DvM2/hzgIHAT9z9PTO7h6Bbe1wyg7D9i3y+YGbf0b/XZiPsXzrq\nVL3dsP1rnaX0d4iOcJwY7c1+Ldr2pMV3qKHNrE1Kxh/jdHdfa2adgKlmtpAG/h0kLdly93Nqe8/M\nrgZeih73bnRiekd335jMWMxsANATmGtmRtDl/L6ZDXL39cmKIyae0QTdr8Pife16WA0cHvO6e3Rf\nKKJDVC8AT7p7jTWMkuB04EIzu4BgTby2ZvZnd/9BCLGsIuiBfS/6+gUgjInXJwP/dvdNAGb2EnAa\noGTrCyn1s9REJWbWJaZ4a9zbxXiqpd1Iq+9Qxd23mdkrBD9z6fIdamoznwTWpUn8ALj72ujjBjP7\nG8HUgAb9HaTKMOLfiCYUZnYUkJuoROtA3P1jd+/q7ke6+xEE/6GdmIhEqy5mdh5B1+uF0UmSyfYu\n0NvMekTvLvsWQbHasDwKzHf3e8MKwN1/4e6Hu/uRBH8eb4aUaBHtvl4Z/XmBoLhmGJP2FwKDzSw/\n+gvKWYQwUT/FpdrPUkMY+xesHh19Xmvx1hRSU7uRNt/BzAqq7nKLThM4h6C2W1p8h1razO8DfycN\n4gcws1bR3lHMrDVwLjCPBv4dJK1nqw6PAY+a2TyCqsyh/AdWAye8oaL7gBYE48QAs9x9TLIu7u4V\nZvZTgrsis4BH3D2U/0Sj
d998F5gXnbvgwC/c/dUw4kkh1xGs1JALLAF+lOwADlDkU6JS6WepIczs\naaAQ6GhmKwiGqO8gWA8y5Yu31tZuEFT5T5cCtIcAT0R/kcki6KF7I/p90uU71OQO0if+LsBfo8PP\nOcBT7j7VzN6jAd9BRU1FREREEihVhhFFREREMpKSLREREZEEUrIlIiIikkBKtkREREQSSMmWiIiI\nSAIp2RIRERFJICVbIiKStszsYDP7ILpG6FozWxXzul61JM3sETPrU8cxY8zs2/GJusbzXxxTpFgy\njOpsiYhIRjCz/wZ2uPvdNbxnnsL/4UWXsXkhxOXIJIHUsyUiIpni8xU/zKyXmUXM7P/M7GOgq5k9\nZGazzWyemf0q5ti3zOw4M8s2s81mdruZfWhm/zazgugx/5+ZXRdz/O1m9o6ZLTCzwdH9rczsBTP7\n2MyeN7N3zey4/YI0uzMa24fR85xBsA7u3dEeucPNrLeZvRo9R7GZ9Y5+9kkze8DM3jOzT6JLu2Fm\nA6LfbU70vD0T9qcsDZYqy/WIiIjE29HA99z9AwAzK3L3LWaWDUw3sxfc/ZNqn2kPTHf3W83sd8Dl\nwMSaTu7up5rZ1wmWMjofuBZY6+6XRpOs96t/xsw6A+e7e//o63Yxi0w/7+4vR/e/CVzh7kvN7DTg\nD8Dw6Gm6u/vJ0WHH182sFzAGuNPdn48u4RXWUnNSAyVbIiKSqT6rSrSivhtdyy6HYN3BfkD1ZGuX\nu0+NPn8fOKOWc78Uc0yP6PMzCNb9w90/MrNIDZ/bBFSY2cPAK8CU6gdEF58eDLwYXRcRvjwS9Vz0\nGoui61b2Af4D/Drao/WSu39WS9wSAg0jiohIptpZ9SQ6DHcdUOjuxwOvAfk1fGZfzPMKau+U2FuP\nY/brXXL3cuBk4G/ARcA/avncBncf6O4nRrfjY09T7Vh39/+Lnm8v8Gp0aFJShJItERHJVLHJTjtg\nG7DDzA7hiyG5A32mof4NjAIws2OBvvud3KwN0N7dXwFuBE6IvrU9GiPuvgVYa2YXRT9j1eZ+fTO6\n/yigO7DYzI5w9yXu/nuC3rL95opJeDSMKCIimerzHiB3n2NmC4AFwHLg7ZqOq/a8zvNWcx/wRHRC\n/vzotrXaMe2Bl8wsjyCxGxvd/xfgITO7kaCH6lvAg2Y2HsgF/g/4KHrsajN7D2gNXOnu5Wb2nWhp\nig7DR7AAAAB7SURBVDJgNcE8MkkRKv0gIiISB9GJ9znuvjc6bPka0MfdK+N4jSeJmUgv6UE9WyIi\nIvHRBngjppjqf8Uz0YpSD0kaUs+WiIiISAJpgryIiIhIAinZEhEREUkgJVsiIiIiCaRkS0RERCSB\nlGyJiIiIJJCSLREREZEE+v8BDWQFu2iG7q0AAAAASUVORK5CYII=\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f804039d090>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "%matplotlib inline\n",
+ "\n",
+ "# Set up the data with a noisy linear relationship between X and Y.\n",
+ "num_examples = 50\n",
+ "X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])\n",
+ "X += np.random.randn(2, num_examples)\n",
+ "x, y = X\n",
+ "x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)\n",
+ "\n",
+ "losses = []\n",
+ "training_steps = 50\n",
+ "learning_rate = 0.002\n",
+ "\n",
+ "with tf.Session() as sess:\n",
+ " # Set up all the tensors, variables, and operations.\n",
+ " input = tf.constant(x_with_bias)\n",
+ " target = tf.constant(np.transpose([y]).astype(np.float32))\n",
+ " weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))\n",
+ " \n",
+ " tf.initialize_all_variables().run()\n",
+ " \n",
+ " yhat = tf.matmul(input, weights)\n",
+ " yerror = tf.sub(yhat, target)\n",
+ " loss = tf.reduce_mean(tf.nn.l2_loss(yerror))\n",
+ " \n",
+ " update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)\n",
+ " \n",
+ " for _ in range(training_steps):\n",
+ " # Repeatedly run the operations, updating the TensorFlow variable.\n",
+ " update_weights.run()\n",
+ " losses.append(loss.eval())\n",
+ "\n",
+ " # Training is done, get the final values for the graphs\n",
+ " betas = weights.eval()\n",
+ " yhat = yhat.eval()\n",
+ "\n",
+ "# Show the fit and the loss over time.\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "plt.subplots_adjust(wspace=.3)\n",
+ "fig.set_size_inches(10, 4)\n",
+ "ax1.scatter(x, y, alpha=.7)\n",
+ "ax1.scatter(x, np.transpose(yhat)[0], c=\"g\", alpha=.6)\n",
+ "line_x_range = (-4, 6)\n",
+ "ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], \"g\", alpha=0.6)\n",
+ "ax2.plot(range(0, training_steps), losses)\n",
+ "ax2.set_ylabel(\"Loss\")\n",
+ "ax2.set_xlabel(\"Training steps\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "vNtkU8h18rOv"
+ },
+ "source": [
+ "In the remainder of this notebook, we'll go through this example in more detail."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "r6rsv-q5gnn-"
+ },
+ "source": [
+ "## From the beginning\n",
+ "\n",
+ "Let's walk through exactly what this is doing from the beginning. We'll start with what the data looks like, then we'll look at this neural network, what is executed when, what gradient descent is doing, and how it all works together."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "UgtkJKqAjuDj"
+ },
+ "source": [
+ "## The data\n",
+ "\n",
+ "This is a toy data set here. We have 50 (x,y) data points. At first, the data is perfectly linear."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "cellView": "form",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 398,
+ "status": "ok",
+ "timestamp": 1446659128547,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "-uoBWol3klhA",
+ "outputId": "efef4adf-42de-4e6f-e0c3-07ddd3083d85"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQMAAAEACAYAAAC3RRNlAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEGpJREFUeJzt3XuMHeV5x/Hv4xinBgIN2Yq0WCFFJK1EUzC3otAWC4JB\nRHESkUq0qRKIBG2jAE1cZBKoQL0kBGSRtEr/IBdE2iDUEkhMLrJNqS1BBeF+dYA2KbcALYUWIVyu\nT/8447Ase9a7O+/MOTPn+5Esn4Nn552F9Y9553nP80ZmIklLRn0BksaDYSAJMAwkVQwDSYBhIKli\nGEgCCoVBRHw2Iu6NiLsi4lsRsazEeSW1p3YYRMS+wKnAysz8TWApcFLd80pq19IC53gWeBHYLSJe\nBXYFflbgvJJaVPvOIDOfAdYDDwOPAf+TmdfWPa+kdpWYJuwHfBrYF/gVYPeI+IO655XUrhLThEOB\nGzLzaYCIuAp4L3D59IMiwg9BSCOSmbGzY0pUE+4HjoiIX4iIAI4Btg25oNZ+nXfeeY7X0fH6/L2N\nYrz5KvHM4E7gm8CtwJ1AAJfUPa+kdpWYJpCZFwEXlTiXpNHo7QrEVatWOV5Hx+vz9zaK8eYrFjKn\nqDVQRLY1lqTXRATZ0gNEST1gGEgCDANJFcNAEmAYSKoYBpIAw0BSxTCQBBgGkiqGgSTAMJBUMQwk\nAYaBpIphIAkwDCRVDANJgGEgqVJqr8U9I+KfImJbtefib5U4r6T2FGmICnwZ+EFm/l5ELGWwxZqk\nDimxo9IewO9k5qUAmflyZj5b+8qkntm4cSOrV5/I6tUnsnHjxlFfzhvUbogaEQcy2CfhPuBA4Bbg\nzMzcPuM4G6JqYm3cuJEPf/jjbN/+RQCWL1/H1VdfxnHHHdf42G02RF0KHAx8JTMPBp4Hzi5wXqk3\n1q+/pAqCjwODUFi/frz2GirxzOBR4JHMvKV6fyWwbrYDzz///J+/XrVq1dj2j5dK2bhxI+vXX8Kt\nt94JrGllzC1btrBly5YFf12RfRMiYitwamY+EBHnAbtm5roZxzhN0ER5/dTgbuCrwN8A4zlNKFVN\nOAP4VkTsAvwEOKXQeaXOev3UYGCvvf6SQw45kLVr2wmChSi11+KdwGElziV12Y5pAcBTT/33jD99\nD4cc8lM2bfp2+xc2D6XuDKSJN7NisGzZn7Js2Vm8+OLgz5cvX8fatZeN8ArnZhhIhcycFrz4Iqxc\n+VWmpjYAjOXUYDrDQGrQ1NTeYzstmMkwkGra8Zzgqaee7NS0YCa3ZJdqmO05wQEHHMjU1NtYu/a0\nsZgWtF1alCbSbM8JpqY2dGZqMJ39DKRF2PGho8HKwn7wzkBaoNdPDX6VwZq7ga49J5jOMJAWqGsr\nC+fLMJBqG++VhfNlGEjzMH2Z8VFHHcz1169je9Wxo8tTg+ksLUo7MVtjknPOOZ2tW28DGJsS4jCW\nFqVCZj4j2L4dtm7tZvlwLpYWpSH6WD6ci3cG0iz6Wj6ci2EgzaKv5cO5GAZSpcuNSUowDCS635ik\nBMNAovuNSUooFgYRsYTBBiqPZmY7PaGlBnWpMUkJJe8MzmSwq9IeBc8pNaovjUlKKBIGEbECOAH4\na+AzJc4pNW225wQrV15aNSbp/7RgplJ3BhcDZwF7Fjqf1Lg+NSYpoXYYRMT7gScz846IWAUMXQPt\n9moaB6PY8qxNI9teLSI+D/wh8DKwHHgLcFVmfmzGcX5QSSM3LluetWm+H1Qq+qnFiDgKWDtbNcEw\n0DhYvfpENm9ew2srC/+Mvfb6TrWycLw/fbhYfmpRmpf+ryycr6JhkJlbga0lzynVNQmNSUqwuYl6\nreuNSUpwmiAxOY1JSrC5iXpp0hqTlOCdgXpnEhuTlGAYqHcmsTFJCYaBemHSG5OUYBio82xMUoZh\noM6zMUkZhoF6adIak5RgGKizbExSlisQ1UmzPSc44IADq8Yk/V9VuBCuQFSv2ZikPMNAnTF3+VB1\nGQbqBMuHzTMM1AmWD5tnGKizLB+WZRhorFk+bI+lRY0ty4dlWFpU51k+bFft5iYRsSIirouIeyPi\n7og4Y+dfJQ1nY5LRKHFn8DLwmWoTld2BWyNiU2b+uMC5NWFsTDI6tcMgM58AnqhePxcR24B9AMNA\nC2ZjktEp+swgIt4JHATcVPK8mmQ2JmlLsTCopghXAmdm5nOzHeNei5qN+xqUNbK9FgEiYinwPeCH\nmfnlIcdYWtQbuK9B89ouLX4DuG9YEEjDuK/B+ChRWjwS+ChwdETcHhG3RcTx9S9NfWb5cPyUqCbc\nALypwLVoQlg+HE+uQFTrLB+OJ8NArXBfg/FnGKhxNibpBsNAjbMxSTcYBhoJG5OMH8NAjbExSbfY\n3ESNsDHJ+LC5iUbKxiTdU3sFojSdKwu7yzsDFePKwm4zDFSMKwu7zTBQg1xZ2CWGgWqxMUl/WFrU\notmYpBssLapxNibpF0uLWjDLh/3knYEWxPJhfxkGWhDLh/1lGGinbEwyGYqEQdUA9UsMnkF8PTO/\nWOK8Gj0bk0yO2qXFiFgCPAAcA/wMuBk4aeZei5YWu2n16hPZvHkNr00LLqsak+wNWD7sgjZLi4cD\nD2bmQ9XAVwAfxL0We8vGJP1UIgz2AR6Z9v5RBgGhDrMxyeRp9QGiey12w2zPCVauvLRqTGLFYNyN\nbK/FiDgCOD8zj6/enw3kzIeIPjPojtmeExx7rCsLu6rNZwY3A/tHxL7A48BJwO8XOK9aNHf5UJOg\nxPZqr0TEp4BNvFZa3Fb7ytQay4cCP7UoLB/2nZ9aVC2WDyePYTDBLB9qOqcJE8p9DSaH0wTNyX0N\nNJPNTSaMjUk0jHcGE8TGJJqLYTBBbEyiuRgGE83GJHqNYdBz7mug+bK02GPuayCwtCjc10ALY2mx\nhywfajG8M+gZy4daLMOgZywfarEMgx5wXwOVYBh0nI1JVIph0HGzfeBo0JhkA4BTA82bYdBDNibR\nYhgGHWVjEpVWawViRFwIfAB4Afh34JTMfHbIsa5ALMTGJFqI+a5ArBsG7wOuy8xXI+ICBvslfHbI\nsYZBIe5roIWYbxjUWoGYmddm5qvV2xuBFXXOp7m5slBNKvnM4BPAFQXPp2lcWaim7TQMImIzsPf0\nfwQkcE5mXlMdcw7wUmZePte53Gtx8VxZqPka5V6LJwOnAkdn5gtzHOczgxp8TqDFauUjzBFxPHAW\n8LtzBYEWx8YkalPdasKDwDJgx4L4GzPzk0OO9c5gAWxMolJauTPIzHfV+XoNZ2MStc3mJmPG8qFG\nxeXIY8TyoUbJMBgjlg81SobBiNmYROPCMBghG5NonBgGI2RjEo0Tw2DM2JhEo2IYjICNSTSO3F6t\nZTYmUdvcXm1MzfacYGrKlYUaPcOgBXOXD6XxYBg0zPKhusIwaJjlQ3WFYTAClg81jgyDhlg+VNdY\nWmyA5UONE0uLI2T5UF1kc5OCbEyiL
vPOoBAbk6jrioRBRKwFLgKmMvPpEufsGhuTqOtqh0FErACO\nBR6qfzl9YmMSdUuJO4OLGeydsKHAuTrFfQ3UJ3U3UVkDPJKZd0fstHLRKzPLh9dfv2NfA1cWqpvq\n7LV4LvA5BlOE6X82VJ/2WnRfA42r1vdajIjfAK4FnmcQAiuAx4DDM/M/Zzm+F4uOdkwNbr31Tp5+\n+s9x70ONu8YXHWXmPcDbpw34U+DgzHxmseccd5YP1Wcl1xkkO5kmdJ3lQ/VZsTDIzP1KnWucuK+B\nJoUrEOdgYxJNEsNgDjYm0SQxDBbIxiTqK8NgFjYm0SSyuckMNiZR39jcZJFsTKJJZXOTio1JNOm8\nM8CVhRIYBoArCyUwDIZwZaEmz8SGgY1JpNebyNLizPLh8uU7GpPcBmAJUb1iaXEONiaR3miiSouW\nD6XhJubOwPKhNLeJCQPLh9Lceh0GNiaR5q+3YWBjEmlhSuyodDrwSeBl4PuZeXbtqyrAxiTSwtTd\nRGUV8AHgPZn5ckRMFbmqhtiYRBqu7p3BnwAXZObLAJn5VP1LqsfGJNLi1FqBGBG3A98Fjge2A2dl\n5i1Djm18BaKNSaQ3KrYCcSfbqy0F3pqZR0TEYcA/AiNrmW5jEmnxdhoGmXnssD+LiD8GrqqOuzki\nXo2It2XmzDoe0K+9FqVx1fpeiwARcRqwT2aeFxHvBjZn5r5Djm19mrB8+TquvtqqgSbbfKcJdcNg\nF+AbwEHAC8DazNw65NhWPrU4faGRzwmklsJgIcbpI8zSJJlvGEzUpxYlDWcYSAIMA0kVw0ASYBhI\nqhgGkgDDQFLFMJAEGAaSKoaBJMAwkFQxDCQBhoGkimEgCTAMJFUMA0mAYSCpYhhIAgwDSZVaYRAR\nh0XEjyLi9ur3Q0tdmKR21b0zuBA4NzNXAucBF9W/pDIW0zfe8cZjvD5/b6MYb77qhsHjwJ7V618E\nHqt5vmL6/h+4z+P1+XsbxXjzVXfj1bOBGyJiPYNt195b/5IkjULdvRZPB07PzO9ExEcYbKgydDs2\nSeOr7o5Kz2bmHtPe/29m7jnkWHdQkUakyC7MO/FgRByVmVsj4hjggToXI2l06obBHwFfiYhlwP8B\np9W/JEmj0Npei5LGW6srECPiLyLizoi4IyKujYgVDY93YURsq8b7dkTssfOvWvRYH4mIeyLilYg4\nuMFxjo+IH0fEAxGxrqlxqrG+HhFPRsRdTY4zbbwVEXFdRNwbEXdHxBkNj/fmiLipWjR3b0R8vsnx\nqjGXRMRtEbGh6bGq8f6j+jt3e0T8aM6DM7O1X8Du016fDnyt4fHeByypXl8AfKHBsX4NeBdwHXBw\nQ2MsAf4N2BfYBbgD+PUGv6ffBg4C7mrp5+PtwEE7flaA+5v8/qpxdq1+fxNwI3Bkw+N9GvgHYENL\n/05/Arx1Pse2emeQmc9Ne7sb8FTD412bma9Wb28EGrsTycz7M/NBBqXXphwOPJiZD2XmS8AVwAeb\nGiwzrweeaer8s4z3RGbeUb1+DtgG7NPwmM9XL9/MIGwb+36rO+ETgK81NcZswzLPGUDrH1SKiL+K\niIeBk4EvtDj0J4AftjheE/YBHpn2/lEa/ssyKhHxTgZ3JTc1PM6SiLgdeALYkpn3NTjcxcBZDNbp\ntCWBzRFxc0ScOteBdasJbzDHIqVzMvOazDwXOLea734JOKXJ8apjzgFeyszLmx5L9UXE7sCVwJkz\n7iaLq+4cV1bPkzbtKJWXHici3g88mZl3RMQqmr2DnO7IzHw8In6JQShsq+743qB4GGTmfFcgXg78\noOnxIuJkBrdmRzc9VgseA94x7f0KxujzICVExFIGQfD3mfndtsbNzGcj4vvAoUDxMACOBNZExAnA\ncuAtEfHNzPxYA2P9XGY+Xv3+XxFxNYOp5qxh0HY1Yf9pbz/E4AFYk+Mdz+C2bE1mvtDkWDOHbui8\nNwP7R8S+1dqOk4Cmn0oH7f1fDAZL2u/LzC83PVBETEXEntXr5QyW0jfyM5mZn8vMd2Tmfgz+u13X\ndBBExK7VXRYRsRuwGrhn2PFtPzO4ICLuquZoq4C1DY/3twyeSm+uyjl/19RAEfGhiHgEOAL4XkQU\nfz6Rma8AnwI2AfcCV2TmttLj7BARlwP/Crw7Ih6OiFpTunmMdyTwUeDoqhR2WxXoTfll4F+qn8cb\nGTzh/+cGx2vb3sD1076/azJz07CDXXQkCbDtmaSKYSAJMAwkVQwDSYBhIKliGEgCDANJFcNAEgD/\nDzcvNav5fpAxAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f8030785b10>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "num_examples = 50\n",
+ "X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])\n",
+ "plt.figure(figsize=(4,4))\n",
+ "plt.scatter(X[0], X[1])\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "AId3xHBNlcnk"
+ },
+ "source": [
+ "Then we perturb it with noise:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "cellView": "form",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 327,
+ "status": "ok",
+ "timestamp": 1446659134929,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "fXcGNNtjlX63",
+ "outputId": "231c945e-e4a4-409e-b75b-8a8fe1fdfc30"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQkAAAEACAYAAACgZ4OsAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEptJREFUeJzt3X+MpVV9x/H3B5dJx6rYFbNGNqjUX2S1wFbRipWNZHYo\nhtUV02B/iJq4bbBI64QssCZLSmyKdKOGYixRYW3cUH/Rbi3MMgZ3G0ypRUC3gLCt1gKKBrES063L\nj2//uHfWu8OdZ+7Mvec553nu55Xc7Nx7n5nnTLL3O+d8z/eco4jAzGwxR+VugJmVzUHCzCo5SJhZ\nJQcJM6vkIGFmlRwkzKxS8iAh6RJJd0v6tqTPSZpIfU8zG52kQULSi4D3AadExG8Aq4BzU97TzEZr\nVeKf/xhwCPhVSU8BzwR+kPieZjZCSXsSEfFTYAfw38BDwP9ExFdT3tPMRiv1cOME4M+AFwEvBJ4l\n6fdS3tPMRiv1cOM1wNcj4lEASV8G3gDsmr9AkhePmGUUEap6P/Xsxn3A6yX9iiQBZwD3LrwoIhr1\n2L59e/Y2tL3NTWtvU9s8iNQ5iW8BnwW+CXwLEHBNynua2WilHm4QEVcCV6a+j5ml4YrLFdiwYUPu\nJixb09rctPZCM9s8CA06LknWAClyt8FsXEkiMicuzazhHCTMrJKDhJlVcpAws0oOEmZWyUHCzCo5\nSJhZJQcJM6vkIGFmlRwkzKySg4SZVXKQMLNKDhJmVslBwswq1XE4zzGSviDp3u4hPa9LfU8zG506\nehIfB26MiBOBk+izx6VZ2+3Zs4eNG89h48Zz2LNnT+7mLEvSTWckPQe4MyJ+veIabzpjrbZnzx42\nbz6PgwevAGBycis33LCT6enpzC0rY9OZlwCPSLpW0h2SrpE0mfieZkXZseOaboA4D+gEix07mrMf\ndOqNcFcB64H3R8Ttkj4GXAxs773osssuO/z1hg0bWrtXoFlue/fuZe/evcv6ntTDjTXAv0TECd3n\nbwS2RsTZPdd4uGGt1vThRvKNcCXtA94XEfdL2g48MyK29rzvIGGtt2fPnsNDjJmZLUUECCgnSJwE\nfAo4Gvgu8J6I+FnP+w4SVqxSP9yjUkSQWIqDhJUq5TChlODjIGE2hI0bz2FubhOdWQmAnUxN7ebm\nm7801M8tKUcxSJBIfsyfmR3pyClROHiw81qpQxkHCbMevcOA009fz623buXgwc57k5NbmZnZmbF1\neThImHUtHAbceutWtm27gH37dgMwMzOaIcHMzBZuvfW8xgQf5yRs7CyWNEyVg1hOG+rmnIS10jAf\nsKf3Fs7LkjScnp4uNgfxNBGR9dFpgtlgZmdnY3JyTcB1AdfF5OSamJ2dHfj7p6be3v3e6D6ui6mp\nt4/kZzdR9/NX+Rl1T8IaJeXMwPT0NDfcsLOnl1JG6XRuDhI2VpZKGjZqGFATJy6tUUZRiFRK0rAE\nrri0Vqr6kLcxAKT8nQYJEk5cWmu0MfGY+ndigMSlexLWGnXWOdQl9e9UwvZ1ZtZwnt2w1mhaufMg\nSvidPNywxnHist7EZR07Ux0F3A48GBGb+rzvIGEDK2kvhjYoJSdxIXBPDfexMdD07embKGmQkLQW\nOIvOHpdm1kCpexIfBS4CPJ6wkZiZ2cLk5FZgJ7Czm8jbsuT3NfmYvdySzW5Iegvwo4i4S9IGYNFx\njw/nsUGtZBFWKcvDS1DU4TyS/gL4A+AJYBJ4NvDliHjXguucuLSk2lhkNSpZE5cRcWlEHB+d07vO\nBW5ZGCDMrHwuprLWK6EgqclcTGUr1qTCpSa1tU5FFFMtxUGimVzU1A4OEpaMk4HtUErFpZk1mBOX\ntiJOBo4P9yRsxV75ypeyevXlnHLKtc5HtJh7ErZsC5OWBw9uzdwiS8k9CVu23CsxvQ6jXg4S1ijz\nvZi5uU3MzW1i8+bzDgcKB49EltopN/UD75bdODl3pV7smL7UbZqdnY2pqbcfvldb4GP+LIUSj8NL\nefzfuK8idZCwFcl1HN5iU68pcyIpA1ATOEhYo1T1Yly3kYbLsq01Ui3iavM6Fa/dMBuRtq4idZAw\ns0pe4GVmQ0u+pb6kWyTdLWm/pA+kvJ+ZjV7S4YakFwAviM6O2c8Cvgm8NSK+03ONhxtmmWQfbkTE\nwxFxV/frnwP3AselvKdZPy7ZXrnaEpeSXgzsBV7VDRjzr7snYUm1eQpzWIP0JGoppuoONb4IXNgb\nIOb5cB5LadwrJnsVdTjP4RtIq4CvADdFxMf7vO+ehCXl/TgXV0pP4jPAPf0ChFkdvNXecFLPbpwG\n/DOwn86hwQFcGhGzPde4JzHm6qhmbGvF5LBccWnFc1Ixr+xToDbeBpl2zL0V3ii1dZrVS8UtiXHb\nqKXVv+9SW1elfuDt61ppsW3mFsq5Fd4oDfr7loYBtq/zcMOymt9EZmpqN1NTu5f917etXfyiLBVF\nUj9wT6KV6ughlNQLKakty8EAPQkHCRvYcneMTr3DdGld/CbuqD1IkHDi0gaynMTcuNYk5NocOLml\nokjqB+5JNEKqROQwf32b2sUvCe5JWN2Ws5hq2GnDEs//aCMHCRtIivUPo1id2doufkEcJGwgS/3V\nns9DPPLIj5iYuIhDhzqvezFV83nthg1t4bBhYuJPWbfuJI499nmViUuv28jPC7ysFsPs1zCuMyGl\nKGU/CbNFOadQPpdl29BmZrYwObkV2Ans7OYhtuRu1ki47Lue7evOBD5GJyB9OiKuWPC+hxst0MZh\nwzjkTLLnJCQdBdwPnAH8APg34NzwuRvWAOOwN2YJm86cChyIiO9HxOPA9cBbE9/TzEYodeLyOOCB\nnucP0gkcZsXzBrodnt0wW4TLvjtSB4mHgON7nq/tvnYEH87TPm1JZLZtira4w3kkPQO4j07i8ofA\nN4B3RsS9Pdc4cdkyy50VaEtAaaJBEpd1LAU/k06gOABc3Of9Uax4tYIsZzMYL/fOixL2uIyI2Yh4\nRUS8LCL+MvX9rFnq2lLfRVEr58SljVxpswKt3u6+Dkt1NVI/8HCjlQbdcaqO4UZpe2GWBO9MZbkM\nOivgacbyeam4td44rMFYqexrNwbhIGF18DRrfw4SZlaphAVelpin9iw19yQazGNtG5Z7Eg2ykh7B\nKAuR3COxxThIFGC+RzA3t4m5uU1s3nze4Q9qHR/eqvubuZiqAIsV+yxVaDSqQiQXG40vXEzVbEud\ncOVCJKuDg0QBFlvrMEh+YRT7HczMbGHfvnM5dOiTAExMfIeZmeuH+pnWHg4SBajqEdS3UOpo4I+7\nX1+U6B7WRJ4CLVwdlYLjsCu09ecTvFqgbdunWfMkCxKSPgKcDfwC+E/gPRHxWKr72cqVtv+DlSVl\nncTNwLqIOJnO1nWXJLxXI5RasDSfE5ma2s3U1G5XbdoRaslJSHobcE5E/GGf98YiJ+ESaitRSWXZ\n7wVuquleRaprL0ezURsqSEiak/Ttn
sf+7r9n91yzDXg8InYN3dqMSh0qmKU2VOIyIqaq3pf0buAs\n4M1V15V+OM8oNlJ1ctBKUNThPJLOBHYAb4qIn1RcV3xOYlR1BN4dyUqTu07iKmACmJMEcFtEnJ/w\nfsVbTs2DA4qVIlmQiIiXpfrZdat7qOBzIqwkLsseUJ1/2V0mbXXJPdxoFZdH27jyzlQjMOrp0ZmZ\nLUxObgV2Aju7w5stQ/9cs5XwcGNIqSopnbi0OvjcjRr0yx+sXn05u3Zd7Q+2Fa+ksuyx8uijz/dm\nstYa7kkMaeFwA+ZzCQ97RsKK555EDeaXWa9efTnwSToBwsMMaw8HiRGYnp5m166rmZz8HvAwnpGw\nNvFwY4Q8I2FN49kNM6vknISZDc1BoiG86Y3l4uFGA3h/TEvFOYmW8KpQS8U5CTMbWvIgIWlG0lOS\nVqe+10o0YazvVaGWU9LhhqS1wKeAVwC/GRGP9rkm23CjSWN912BYCtlzEpK+APw5sJsCg0Tusb4/\n+JZb1p2pJG0CHoiI/d2NcK2H97G0phgqSEiaA9b0vgQE8CHgUmBqwXt95Tp3I+dZGEee6AUHD3Ze\nc5CwlIo5d0PSq4CvAv9LJzisBR4CTo2IHy+4NusUaK4u//r1b+TOO58EXghsoY6l5R7e2EKDDDeI\niOQP4HvAry3yXoyb2dnZmJh4fsB13cexMTHx3JidnU16z8nJNYfvOTm5Jun9rBm6n7/Kz29du2UH\nFcONcbNjxzUcOnQlv0yYwrp11yb9y+7hja1ULUEiIk6o4z5Nduyxz8vdBLO+fO5GBjkSpj6w2FbK\nazcyyZFEdOLSFspeTDWIcQ0STeLg0l4OEja0JpWu2/J5FegimrCoqxRHzop0gsV8r8LGw9glLl0O\nbbY8Y9eTKOkvYxN6NF6mbmPXkyhFU3o084cP/TJxWV4bLa2xS1yWkojLvUzdDDIvFS+V/zKaLc/Y\n9SRK0a9Hs23bBezbdwfgegSrh+skCtdbpHT66ev58Ievyj4MsvHiINEgzlFYDi6mMrOhjV3islRe\npWml8nCjIF5IZXXLnpOQdAFwPvAE8E8RcXGfaxwkzDLJmpOQtAE4G3h1RLwa+KtU9ypVE8quzZaS\nrCch6e+Av4mIW5a4rpU9iVIqO82q5J7deDnwJkm3SfqapNckvFdxSlpIZjaMlIfzrKKzjf7rJb0W\n+DzQd0PcXIfzmI2bYg7nAZB0I3BFROzrPv8P4HUR8ZMF13m4YZZJ1tkNSVuA4yJiu6SXA3MR8aI+\n17UySICnNK18uYPE0cBngJOBXwAz872KBddlDxL+MNu4yl4nMYjcQcLDAhtnDhID8MIqG2e5p0DN\nrAXGfoGXF1aZVRv74QY4cWnjyzkJM6vknISZDc1BwswqOUiYWSUHCTOr5CBRGG9UY6Xx7EZBXCJu\ndfMUaMO4RNzq5ilQMxva2Jdll8Ql4lYiDzcK4xJxq5NzEmZWKfe5G6+V9A1Jd3b/Havdss3aImXi\n8iPAhyLiFGA7cGXCe5lZIimDxA+BY7pfPxd4KOG9zCyRlBvhHg98nc45HALeEBEP9LnOOQmzTAbJ\nSaQ8nOcC4IKI+HtJ76Czc/ZUv5/jw3nM6lHa4TyPRcRzep7/LCKO6XOdexJmmeSuuDwg6fRuQ84A\n7k94LzNLJGXF5R8BV0uaAP4P2JLwXmaWiIupzMZY7uGGmbWAg4SZVXKQMLNKDhJmVslBwswqOUiY\nWSUHCTOr5CBhZpUcJMyskoOEmVVykDCzSg4SZlbJQcLMKjlImFklBwkzqzRUkJD0Dkn/LulJSesX\nvHeJpAOS7pW0cbhmmlkuw/Yk9gObgX29L0o6Efhd4ETgd4BPSKrc2KJJlruRaAma1uamtRea2eZB\nDBUkIuK+iDhAZ5fsXm8Fro+IJyLiv4ADwKnD3KskTfzP0LQ2N6290Mw2DyJVTuI4oPeMjYe6r5lZ\nwyy5EW7F2RrbIuIfUzXMzMowko1wJX0NmImIO7rPLwYiIq7oPp8FtkfEv/b5Xu+Ca5ZR0hO8Fui9\n0W7gc5I+SmeY8VLgG/2+aakGmllew06Bvk3SA8Drga9IugkgIu4BPg/cA9wInO99882aKfu5G2ZW\ntqIqLiXNSHpK0urcbVmKpI90C8XukvQlSc9Z+rvqJ+lMSd+RdL+krbnbsxRJayXdIuluSfslfSB3\nmwYh6ShJd0janbstg5B0jKQvdP8P3y3pdYtdW0yQkLSWzqnj38/dlgHdDKyLiJPp1IFckrk9TyPp\nKOCvgWlgHfBOSa/M26olPQF8MCLWAb8FvL8BbQa4kM7wuik+DtwYEScCJwH3LnZhMUEC+ChwUe5G\nDCoivhoRT3Wf3gaszdmeRZwKHIiI70fE48D1dArdihURD0fEXd2vf07nP2/RNTbdP3BnAZ/K3ZZB\ndHu9vx0R1wJ0ix4fW+z6IoKEpE3AAxGxP3dbVui9wE25G9HHwqK2Byn8A9dL0ouBk4GnTZ0XZv4P\nXFMSfC8BHpF0bXeIdI2kycUuTnmq+BEqirI+BFxKZ6jR+152gxSSSdoGPB4RuzI0sbUkPQv4InBh\nt0dRJElvAX4UEXdJ2kAh/3eXsApYD7w/Im6X9DHgYmD7YhfXIiKm+r0u6VXAi4FvdReBrQW+KenU\niPhxXe3rZ7E2z5P0bjrdzDfX0qDlewg4vuf52u5rRZO0ik6A+NuI+Ifc7VnCacAmSWcBk8CzJX02\nIt6VuV1VHqTTc7+9+/yLwKJJ7eKmQCV9D1gfET/N3ZYqks4EdgBvioif5G5PP5KeAdwHnAH8kE5B\n2zsjYtEkVQkkfRZ4JCI+mLstyyHpdDqVx5tyt2UpkvYB74uI+yVtB54ZEX0DRW09iWUImtFluwqY\nAOa6q+Bvi4jz8zbpSBHxpKQ/oTMTcxTw6QYEiNOA3wf2S7qTzv+HSyNiNm/LWucDdKqijwa+C7xn\nsQuL60mYWVmKmN0ws3I5SJhZJQcJM6vkIGFmlRwkzKySg4SZVXKQMLNKDhJmVun/AUYHTBJb9HcU\nAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f8040158790>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "X += np.random.randn(2, num_examples)\n",
+ "plt.figure(figsize=(4,4))\n",
+ "plt.scatter(X[0], X[1])\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "3dc1cl5imNLM"
+ },
+ "source": [
+ "## What we want to do\n",
+ "\n",
+ "What we're trying to do is calculate the green line below:"
+ ]
+ },
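+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Concretely (restating the code above as math): with the bias column prepended to the inputs, the model is the matrix product $\\hat{y} = X w$, where each row of $X$ is $(1, x)$ and $w = (\\beta_0, \\beta_1)^T$, so $\\hat{y} = \\beta_0 + \\beta_1 x$. The green line is this line drawn with the fitted values of $\\beta_0$ and $\\beta_1$."
+ ]
+ },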
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "cellView": "form",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 414,
+ "status": "ok",
+ "timestamp": 1446659137254,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "P0m-3Mf8sQaA",
+ "outputId": "74e74f19-6ff8-4a8c-81c7-9021a08b78b5",
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQMAAAEACAYAAAC3RRNlAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHvdJREFUeJzt3Xl4VPX1+PH3CRCNRVCgUpWK4iOlRWWxipVa0BDCIsHt\nZ22lLLbybRECGBYXFBBbQeQBRCoWGkWptrUUxQoJQYQWLVKURREBFUGggKDsS0jm/P6YgULMJDOZ\nO3eZnNfzzNNMcrn3tM2cfO7nc+75iKpijDFpXgdgjPEHSwbGGMCSgTEmwpKBMQawZGCMibBkYIwB\n4kgGIvJHEdkpImtO+d65IrJARNaLSKGI1E1OmMaYZItnZPAckF3me/cDC1X1e8Ai4AGnAjPGuEvi\nKToSkcbA66p6ZeT9x0A7Vd0pIt8BFqtqs+SEaoxJpkTnDM5T1Z0AqroDOC/xkIwxXnB6AtFqm40J\nqJoJ/vudItLwlNuEXdEOFBFLFMZ4RFWlsmPiHRlI5HXCXKB35OtewGuVBBSo18iRIz2PIZXjtZjd\necUqnqXFl4B3gKYiskVE+gBjgSwRWQ9kRt4bYwIo5tsEVf15lB91cCgWY4yHrAKxAu3bt/c6hLgE\nLV6wmP0krjqDhC4kom5dyxjzPyKCJmEC0RiToiwZGGMASwbGmAhLBsYYwJKBMSbCkoExBrBkYIyJ\nsGRgjAEsGRhjIiwZGGMASwbGmAhLBsYYwJKBMSbCkoExBnAoGYjIAyKyVkTWiMifRCTdifMaY9yT\ncDKI7KVwD9BKw/sp1ATuTPS8xhh3JdodGWA/UAx8S0RCwFnAdgfOa4xxUcIjA1X9GpgAbAG2AXtV\ndWGi5zXGuCvhkYGINAEGA42BfcDfROTnqvpS2WNHjRp18uv27dunbC85Y7y0ePFiFi9eHPe/S7gH\noojcAWSp6j2R978A2qhq/zLHWQ9EYzzgZg/E9cC1InKmiAjh/RPWOXBeY4yLnJgzWA28ALwHrCa8\n49IfEj2vMcZd1ird+FZhYSETJoT/ruTl9SU7O9vjiIIp1tsESwbGlwoLC7nlll4cOTIOgIyM4cyZ\nM9MSQhVYMjCB1rHjbRQV5RDezxdgJllZc1mwYLaXYQWSbaJijA8UFhbSseNtdOx4G4WFhV6HUyEn\nKhCNcVxeXl+WLu3FkSPh9xkZw8nLm+ltUHEqe6uzdGkvX9/q2G2C8a2gTyD65VYn1tsEGxkY38rO\nzg5cAggySwbGJEnQbnXsNsGYJPLDrY4tLRpjAFtaNMbEyZKBMQawZGCMibBkYIwPFH1axPYD3nYL\ntGRgjIdUlRdXv8ikdydxtOSop7FYMjCBFKSa/2hCGmLisom8vuF18nPyaXJuE0/jsaVFEzip8Hhz\ncWkxoxaP4stDXzIhewJ1zqiTtGu5urQoInVF5BURWRfZTKWNE+c1pjwTJvwhkgh6AeGkcKKwJwgO\nFh8kd34uJaESpnadmtREEA+nypEnA/NU9f+JSE3CeycYY8r48tCX5Bbk0rJhS4a2HUqa+OdO3YlW\n6XWA61W1N4CqlhDeWMWYpAhazf8Jn+/9nNz5udzc7Gb6tOxDuH+wfzjRKr0F4QaoHwEtgBXAQFU9\nUuY4mzMwjvFDzX881uxcw5AFQxhwzQC6fa+bq9d27dkEEbkKWAb8SFVXiMgkYJ+qjixznI4c+b9v\n2SYqprr45+Z/8uiSR3n0hke57rvXJf16ZTdRGT16tGvJoCHwb1VtEnn/Y2C4qnYrc5yNDEy1M2fd\nHKa9N42J2RP5wbd/4EkMrjU3UdWdIvKFiDRV1Q2EN1H5KNHzGhNkqsr096fzxsY3mN5tOhfVvcjr\nkCrl1GpCLvAnEakFfAb0cei8xgROaaiUsUvHsm73Op7r/hz1Mup5HVJMrOjIGAcdLTnKg28+yLGS\nY4zvOJ6zanm/ym79DIxx2b6j+/jNG7+hdnptJnWa5ItEEA/rgWiMA7Yf2M6A+QNo17gd/a/p76ti\nolhZMjAmQRv2bGBQwSB6tujJnZff6XU4VWbJwJgE/Gfbf3hw0YMMu24YWZdmeR1OQiwZGFNFCz5d\nwPh3xjM2cyxXXXCV1+EkzJKBMVXw0gcvMWvNLH7f5fdcVv8yr8NxRPBmOYyJUTIaoIQ0xKRlk5jz\n8Rzyu+enTCIAqzMwKSoZDVCOlx5n9JLRbD+wnUmdJvmmD0FlrM7A+Fqy25Y53QDlUPEhBhYM5GjJ\nUZ7p+kxgEkE8bM7AuC5oW5XvObyHAfMHcPl5l3P/j+8PZA1BTFTVlVf4UsaoZmXdqvC8gkZez2tW\n1q2OXqOgoEAzMhpGrvO8ZmQ01IKCgrjPs3nvZu32Ujed/t50DYVCjsbolshnr9LPqI0MTErKzs5m\nzpyZpzRAiX/k8eGuD7mv8D76Xd2Pm5vdnIwwfcUmEI3rgtDdeOmWpYxaPIqR7UZyfePrvQ4nIbYL\ns/E1P7ctm7t+Lk8vf5oJHSdwRcMrvA4nYZYMjImTqpK/Mp/X1r/GlM5TaHxOY69DcoQtLRoTh5CG\nGLt0LG9uepP87vm+SgRu7R7lWDIQkTQReV9E5jp1TpM6/Lwd2rGSYwwrGsYX+79gerfpNDirgdch\nnXRifqWoKIeiohxuuaVX8v73i2XJIZYXMBiYBcyN8vNkrJqYAHBqmS8Z9h3dp3e/erc+9OZDWlxS\n7HU43+DEMiwxLi06tb1aI6ALMMOJ85nU4pft0MqOTnYc3MEv5/6SKxpewaM3PEqtGrVcj8lPnKoz\nmAgMBeo6dD5jKhXPikTZ5cx/fngXVw5vwqAbB/HzK37uSrxV4ebuUU5sr9YV2Kmqq0SkPRB11nLU\nqFEnv7ZNVKqPsr/QaWmDadcuL6FzxlvSfNro5Pz3ONbhO3w17yg/H+jfRABVK54qu4lKzGK5l6jo\nBfwO2EK4Rfp/gYPAC+UcV9XbJpMCHnvsMU1Lq69wrUJewvMG8d5Lnzy+SZHyiw7KBY9oWlp938xd\nJBNuzRmo6oOqepGGd1S6E1ikqj0TPa9JLUuWvE8oNAH4N/Ckg/MGhcBtwDR2794Z9ai8vL7I5ffC\nj/JgXgfY/iyhUO9AbeWebFZnYHyrouXIvLy+pKcPAnoAOcCvWbt2Q7nLbqrKhnM2UOe6M2FuTdiz\nHJgJBL+60FGxDB+ceGG3CdVavMuLsRzfqlW7Sm8Vjpce14cXPay9X+2tf/vH33y7xJlM2FOLxk/i\nnQg7fTkSjhwJf+/Uf9OgQf0Kr3n4+GGGFQ2jVlotnun6DGfWPJPac2on9CRjSoslYzjxwkYGJg6x\nTBBWNHrYc3iP9vh7Dx2zZIyWlJZ84/wFBQWalXWrZmXdmvKjA2IcGVgyML4U621FQUGBtmrVTuvV\nu1RbtWqrBQUFumXvFu3+cnd9dsWz5TYk8XNFZDJYMjCBF8tf77If7DMa1dfWE1vr7I9mRz2vG52W\n/CTWZGBzBsa3srOzK72nP21u4bvvc
OyGhqQtrc2tg251J8gUYsnApIamr0ObKVD4M869cmWFh7pZ\n4hsk1tzEBFpBQQE5I3/K8Usvgfk9yDj2ZEwt1Pzcaclp1unIpLyQhhj/9njmr5oP82qTXnJGyn+w\nq8KSgUlpxaXFjFg0gv3H9vNkxyepnV7b65B8y9qemZS1/9h++s/rTw2pwVOdn7JE4BBLBiZQdh3a\nxa/m/opmDZrx28zfkl4j3euQUoatJpjA+OzrzxgwfwB3Nr+THlf2QKTSka+JgyUDEwgr/7uS4QuH\nM/jawXS+rLPX4aQku00wvlLeY8uLNi1i2MJhjLlhjCWCJLLVBOMb5W27NnjGL3lf32di9kSaNWiW\ntOumcs1BrKsJTjxz0AhYBKwFPgByoxyXhKprk0pOf2YgpFz9Cz1vwAW6dd/WpF2zOjy0hIvPJpQA\n92m4IWpt4D0RWaCqHztwblMdpZXA9b+Fcz/jBxtbc2GdC5N2qVj6JlQXCScDVd0B7Ih8fVBE1gEX\nApYMTFzy8vryr2U9OfrjWaDCmW9u5P5XXvA6rGrD0dUEEbkYaAm86+R5TfVw9U+upu3jV7Dp/W00\n+aIZQ155Iel/oe2hpf9xbAIxcouwGBijqq+V83N16lom9Wzbv43+8/vTsUlHfv3DX7taQ2ATiJHj\nnPiAikhN4B/AfFWdHOUYHTly5Mn3tomKOWHdl+sYXDiYX7X+Fbf/4Havwwm8spuojB492tVk8AKw\nW1Xvq+AYGxmYb1i2dRkjFo3goesf4oZLbvA6nJTk2oNKItIWuAu4UURWRrZl75ToeU3s/LzdeUXm\nbZzHI289wpMdn7RE4AexrD868cLqDBJWXk/AIK6Th0Ihnblqpnb9U1f99KtPvQ4n5WENUVNLtA99\n0Jp7loZKdfzb4/WOV+7QnQd3eh1OtRBrMrAHlQIiWnFMkBSXFvPIW4/w1ZGvmNFtBmefcXbSrpXq\nKwTJYA8qBVxeXl8yMoYT3jtwZmSdvK/XYX3DgWMH6D+vP6rK012eTloiKCwspHXr9nTpchdFRZdQ\nVJTDLbf0CtRcimdiGT448cJuExJS0dyA33cH2nlwp/70lZ/qE0uf0NJQadKuU/Z/I2ioUOD7W6dk\nI8bbBHtqMUCCOPTd9PUmcgtyuf37t9OzRc+kFhN17HgbRUU5nLiVCo+W5gI5ZGXNZcGC2Um7tp/F\nurRocwYBEsumIn6yesdqhhYNZdC1g+hyWRePotherUuM42FzBqZcidYuLP58MUOKhjC6/WjXEkHZ\n+ZO0tMG0alUjpn0UDDZnYL4p0dqF2R/N1uwXs3XtrrVJjLJ8fp8/8QI2Z2Cqqrx771juuVWVZ997\nloJPCni6y9M0qtMo6bGaytmcgXFVaaiU3/3rd2z8aiP53fOpl1HP65BMnCwZmG+I9xn/I8eP8MCb\nD1AaKmXaTdM4q9ZZLkVqnGS3CaZcsS5j7j26l0EFg2hctzEPt3uYmmn298VvbK9Fk3TbD2yn/7z+\nZF6SSb+r+9mmJj5lcwYmqdbvXs+gwkH0admHO5rf4XU4xgGWDEzclm9bzkOLHuL+tveT2STT63CM\nQ6zoKAUls9lJwScFjFg0gnEdxlkiSDFOtT3rBEwinFz+qKrjyjnG5gxcUN6uRE5V4M1aM4uXP3yZ\npzo9xaX1Lk34fMYdrk0gikgasAHIBLYD/wHu1DKbqFgycEdVC4YqEtIQk5ZNYtnWZUzpPIWGtRs6\nEqtxh5sTiNcAG1V1c+TCfwa6Y5uopITi0mJGLR7Fl4e+ZEbODOqcUcfrkEySODFncCHwxSnvt0a+\nZxwSzxyAk81ODhUfYuD8gZSESpjadaolghRnqwk+V3YOYOnSXhXOAWRnZzNnzsxTCoaqNl+w+/Bu\ncufn0qJhC4a2HUqa2FxzqnMiGWwDLjrlfaPI975h1KhRJ7+2TVRiU5WNQU/te3BiVAGxN0TZvHcz\nA+YP4OZmN9OnZR8rJgqYspuoxCyWRxsregE1gE+AxkA6sAr4fjnHOflUZrWRSPfjqjyKvHrHas16\nIUtf+/g1J8I3PoCbrdKBTsB6YCNwf5Rjkv/fOgUl0lsg3kSy5PMlmjkzU5duXupU+MYHYk0GjswZ\nqGoB8D0nzmVO59QcQGVe/fhVnlnxDJM7Tab5ec0dP7/xP3tQKYWUfdIQqLQASVWZ/v503tj4BlM6\nT+GiuhdVeE5rHxY8sdYZWNuzFBHtdqKiNmAlpSX62JLH9K7Zd+mew3tiPqcJFmx7tdQU7cMd7/zA\nkeNHdHDBYO33j356qPhQuccEbes2U75Yk4HVGQRI2ZqDJUt+QfPmTWnQoCG7d++J+Tz7ju5jcOFg\nGtVpxLgO46hVo1ayQjZBEkvGcOKFjQwSVt5farhW4XlNTz9H09O/XemQfvv+7XrrX27VycsmV7q7\nkd0mpAZsZFBdXAD0orgYWrWaToMGc4HyVx027NnAoIJB9GzRkzsvv7PSM7u1kmH8wVYTAqTsbQIM\nAWYB2VT2dOKK7St44M0HGHbdMLIuzXIpYuMH1gMxRZ1Y6tu9ew9r166muHgSUHHfggWfLmD8O+MZ\nmzmWqy64yu2QjccsGVQDsdQAvPTBS8xaM4vJnSZzWf3Lkn494z+WDKq5kIaY8u4U/rXlX0zpPIXz\nzz4/ofM51UHJEor7rOioGisuKdYRb47QPq/20b1H9jpyTidqDmx1whvEuJpgD6kHXNnGJ4ePH2Zg\nwUAOHz/MM12foe6Zdb0O8aTTH8cOjzJOjBKM92xpMcDKDt3/teIX/OjRy+nQogPD2w6nRloNx64V\n75ZrJoBiGT448cJuExx32tC97mblZy31+z1aaSgUSsr1Et3u3G4TvIEVHVUj530IHfNgxfU0+u62\npHUmOrWDUlX/vRUx+ZetJgRYYWEhOf1/RvF158Pi7mR8me/YHgkmdcS6mpDQBKKIPCEi60RklYjM\nFhFrn+uiYxcf4/J7L+HafeeT9b31lghMQhIaGYhIB2CRqoZEZCzhe5MHohxrIwOHqCr5K/N5bf1r\nTOk8hcbnNPY6JONjrmyioqoLT3m7DLgtkfOZyoU0xBNvP8GanWvI755Pg7MaeB2SSRFOTiDeDfzZ\nwfOZMo6VHGPEohEcOn6I6d2m8630b3kdkkkhlSYDESkCTt1cTwAFHlLV1yPHPAQcV9WXKjqX7ZtQ\ndfuP7WdwwWC+U/s7/C7zd9aQxERV1X0TnNh4tTdwD3Cjqh6r4LjAzxl4VVe/4+AOBswfQNvvtiW3\nTa7tbmTi4sqzCYT3S1gL1I/hWEcLKdzmVcHMxj0btfOszjpr9aykX8ttiRYxmdjgRkNUwpumbAbe\nj7x+X8GxLvzXdtapv6ytWrVzvTnoe9vf0w4vdNCCjan3QbFqRPfEmgwSXU1I7AF5Hytb95+Wlufq\n9Rd+tpBxb4/jtzf+lmsuvMbVa7uhKntImuSycuQoyv6yhkIfkJY2mFAo/PNkPqjzlw//wszVM5
na\nZSpN6zdNyjWMKcuSQcyuoEWLH1TYcDRRqsrU/0zlrc/fYkbODC44+4Jyj0uFBiH2FKQPxXIv4cSL\ngM0ZuH1Pe7z0uD6y6BHt/Wpv/frI176JK5lsAtEdxDhnYA8qVcCtv8CHjx9meNFwaqbV5PEOj3Nm\nzTOjHtu69Y9ZubKUcIv0vsCOCrsiR5MKowsTG2t7FhB7Du/RHn/voWOWjNGS0pIKjy0oKNC0tHNP\njgqgoUKetR8zFcL2WvS/LXu3aPeXu+uzK56NqSFJeX0I09Lqx/1Btj0Uq5dYk4GVsnnkoy8/4p7X\n76Fni570vaovIvKNfoaxaNHichviG2fEkjGceGEjg5Pe3vK2Zs7M1CWfLzn5vViG7k4N7+02oXrB\nJhD96R8b/sFT7z7Fkx2f5MqGV578fseOt1FUlMOJuoZTt0s7dbKvXbvWLFnyPpDYxJ9NIFYfrvQz\nMLFTVZ5f9Tx///jvPHvTs1xy7iUx/buylZBLl1Zt85KyEu1naFKPJQMXhDTEk+88ycodK8nPyefb\n3/r2N46JVoRjZbvGLTaBmGTFpcXcv/B+Pv3qU6Z3m15uIoD/dQ7OyppLVtZc62doXGdzBkl04NgB\n8hbkUT+jPqNvGE16jfS4z+HUHoem+rKNVz2269AuBswfQJsL2zDo2kEVNiSpbDLPJvtMIiwZeOiz\nrz8jd34uP23+U3pc2aPCTU3sL79JNksGHlm1YxXDioYx+NrBdL6sc6XHV7SkaIwTXNlE5ZSL5YlI\nSETqOXG+oFq0aRFDi4Yy5oYxMSUCY/wk4WQgIo2ALMLtz1JGvKXBr6x9hfHvjGdK5ym0adQm5uvk\n5fUlI2M4MBOYGVlS7Fv1wI2pqljKFCt6Aa8AVwCbgHoVHOdoiWUyxVOuGwqFdOryqXrLn2/Rrfu2\nVvl69ly/SRbcKEcWkRygvareJyKbgKtU9asox2oi13JTrKXBg+77JSsyVvDJV58wudNkzs0417OY\njYnGsXLkCjZRGQE8SPgW4dSfRRX0TVROm/mvWcyiM27npps68vLdL5NRK8Pr8IwBPNhERUQuBxYC\nhwkngUbANuAaVd1VzvGBGRlEW+6bMOEP4RHDmTnQeSB8dZDMdGXhgjkeR2xMdElfTVDVD1X1O6ra\nRFUvAbYCrcpLBEFTYWnw2V9D97th67WwJIc0q+g2KcKxOgMR+Qz4YSrMGUQz49UZ/N+cewmtuAc+\nupr09EE0b96CBg3qW2Wg8S1X6wwAIiOEchNBKnh367v85cBfeLzrGLIu/C+tWk0HarFyZR+KinK4\n5ZZeMXcnckJVuiIZU6FYlhyceBGgpcWy3tjwhma9kKUr/7vy5Pe87CNonYpMPHBje7VUp6q8uOZF\n/rr2r0y7aRpNzm3idUiAbU1mksOSQRQhDTHx3xNZvn05+d3zOe9b5532c9sRyKQae1CpHMWlxYx8\nayR7juxhQscJnH3G2eUe59Wjxfako4mHPbVYRQeLD5JXmMc5Z57DmBvHVKkhiRusx4GJlSWDKth1\naBe583O56vyryLsur8KGJMYEhXVHjtOmrzeRW5DL7d+/nZ4telbYkMSYVGTJAFizcw1DFgxhYJuB\ndG3a1etwjPFEtR8HL/l8CXkL8hjVfhRdm3a1Yh5TfcVSjODECx8WHc3+aLZmv5ita3etVVX/FvNY\nvwOTCGwX5uhCoZBO+8807f5yd92yd8vJ7/txd2K/JigTHLEmg2o3Z1AaKuXxpY+zYc8G8rvnUy/D\n320brdrQuKVaJYOjJUd5YOEDlIRKmHbTNM6qddZpP7eqQlOdVZs6g71H9zK4cDAX1bmIh9s9TM20\n8vOg34p5rNrQJMqKjk6x/cB2+s/rT+YlmfS7ul/gagj8lqBMsFgyiNiwZwODCgbRu2Vv7mh+h+vX\nN8ZrrjU3EZEBIrJORD4QkbGJns9Jy7ct595595L3ozxLBMZUIqEJRBFpD3QDrlDVEhFp4EhUDij8\npJAJ/57AuA7jaH1+a6/DMcb3El1N+A0wVlVLAFR1d+IhJW7Wmlm8/OHLPNP1GS6td6nX4RgTCIne\nJjQFfiIiy0TkLRH5oRNBVdWJhiRz188lPyffEoExcUh0E5WawLmqeq2IXA38FYjaGyyZm6gUlxYz\nevFodh7ayYycGdQ5o45j5zYmSFzfRAVAROYB41R1SeT9J0AbVd1TzrFJW004VHyIIQuGUDu9No/d\n+Bhn1DwjKdcxJojcWk14FbgxcsGmQK3yEkEy7T68m3tev4eLz7mYcVnjLBEYU0WJTiA+B+SLyAfA\nMaBn4iHFbuv+rfR7ox83N7uZPi37BK6YyBg/CXTR0YFjB1i+bTmZTTIdPa+TrHrQeM0qEH2g7HMF\n6elDad68KQ0aNLTEYFzj+vZqbglSJ6LTHz/uRXHxeFauLPVkOzZjKhOoZHDiL21RUU6AP1AXAOHR\nwonbB2P8IFD9DILW6KNsfwQYAszyMCJjogvUyCBosrOzmTNnJllZc2nV6jnS00uAHcDMSOOUvl6H\naMxJgZpADHqjD1tZMF5I2dUE+0AZE5+UTQbGmPik7NKiMSY5LBkYYwBLBsaYCEsGxhjAkoExJsKS\ngTEGsGRgjIlIKBmIyNUislxEVkb+09OGqMaYqkt0ZPAEMEJVWwEjgfGJh+QfVWkq6aWgxQsWs58k\nmgz+C9SNfH0OsC3B8/lK0P5PD1q8YDH7SaKPMN8PvC0iEwi3UL8u8ZCMMV5IdN+EAcAAVX1VRG4H\n8oGsZARqjEmuRPdN2K+qdU55v09V60Y51p5SMsYjsTyolOhtwkYRaaeqS0QkE9iQSDDGGO8kmgz+\nD5gqIunAUcBa9xgTUK71MzDG+JsnFYgikiciIRGp58X1YyUiT4jIOhFZJSKzRcS3u7mKSCcR+VhE\nNojIcK/jqYiINBKRRSKyVkQ+EJFcr2OKlYikicj7IjLX61hiISJ1ReSVyO/xWhFpE+1Y15OBiDQi\nvOKw2e1rV8ECoLmqtgQ2Ag94HE+5RCQNeBrIBpoDPxORZt5GVaES4D5VbQ78CLjX5/GeaiDwkddB\nxGEyME9Vvw+0ANZFO9CLkcFEYKgH142bqi5U1VDk7TKgkZfxVOAaYKOqblbV48Cfge4exxSVqu5Q\n1VWRrw8S/gW90NuoKhf5Q9YFmOF1LLGIjGSvV9XnAFS1RFX3Rzve1WQgIjnAF6r6gZvXdcjdwHyv\ng4jiQuCLU95vJQAfLgARuRhoCbzrbSQxOfGHLCgTbZcAu0XkucitzR9EJCPawY5volJJkdKDnF6U\n5PlyYwXxPqSqr0eOeQg4rqoveRBiyhKR2sDfgIGREYJviUhXYKeqrhKR9vjgdzcGNYHWwL2qukJE\nJhGuGh4Z7WBHqWq5FYgicjlwMbBawnunNwLeE5FrVHWX03HEKlq8J4hIb8JDwxtdCahqtgEXnfK+\nET5/TkREahJOBC+q6mtexxODtkCOiHQBMoCzReQFV
e3pcVwV2Up4JL4i8v5vQNTJZc+WFkVkE9Ba\nVb/2JIAYiEgnYALwE1Xd43U80YhIDWA9kEn44bHlwM9UNepkkddE5AVgt6re53Us8RKRdkCequZ4\nHUtlRGQJcI+qbhCRkcBZqlpuQvByr0XF/0OtKUA6UBQezLBMVft5G9I3qWqpiPQnvPqRBvzR54mg\nLXAX8IGIrCT8u/CgqhZ4G1lKygX+JCK1gM+APtEOtKIjYwxgbc+MMRGWDIwxgCUDY0yEJQNjDGDJ\nwBgTYcnAGANYMjDGRFgyMMYA8P8BB4YEUxpBpuwAAAAASUVORK5CYII=\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f80640172d0>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "weights = np.polyfit(X[0], X[1], 1)\n",
+ "plt.figure(figsize=(4,4))\n",
+ "plt.scatter(X[0], X[1])\n",
+ "line_x_range = (-3, 5)\n",
+ "plt.plot(line_x_range, [weights[1] + a * weights[0] for a in line_x_range], \"g\", alpha=0.8)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "VYUr2uPA9ah8"
+ },
+ "source": [
+ "Remember that our simple network looks like this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "cellView": "form",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 170,
+ "status": "ok",
+ "timestamp": 1446659140755,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "gt8UuSQA9frA",
+ "outputId": "080025d5-d110-4975-e105-7635afaa3ce9"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAJYAAABkCAYAAABkW8nwAAAO90lEQVR4Xu2dT5Dc1J3Hv+YQT8VJ\nZUhVdprLWs4FTSrGGv4ql9CuHBCH4GaTFCLZwnIcjOAy8l6Q/1SlU4XHcg6xJgtY2OOik2KxSGoT\nGWrXzYFC2T2MDAtWitRavmQ0e9k2SYGowom4hNRPtqA9TE+rW3/cPfPepcfup6f3fu/Tv9/T+/PV\npo8//vhjsMQsULAFNjGwCrYoKy6xAAOLgVCKBRhYpZiVFcrAYgyUYgEGVilmZYUysBgDpViAgVWK\nWVmhDCzGQCkWGEuwrly5gtf++zW887/vYOn/lnD5T5cT40x9ZQrb/nEbxDtFiHeI2LJlSylGY4X2\nt8BYgUVAvfzqy3i5/TI+vPLhmq37wpYv4AHpATxw3wMMsP4cFJ5jbMAiqA4eOYg/Lv8xMcL26e34\n+vTXk8+vbv1q8n/03TsX38EfLv4h+aRE380dmmNwFY7O2gWOBVgE1Y/2/yjxUls+vwXaY1oS7tZK\n3v94MJ8zceUvV0Dea+H4AoOrQrhGHqxuT0Xjp0P7D2HqH6Yymejyu5dx5PiRZBxGnmt+bj7TdSxT\nfgv0ASuAzglwmyE8pfbZu3VaEDkDdT+AweevzGolvPjvL+LMb84knmr+yHxmqNKyCK7ZQ7OJ5yIo\n+3m6clqx8UrNB1bso2W64FQN9cnijdcdAvNAQWGRPBcLicX3Ua8S84FVcj3PnjuLhRcWkgH63OG5\nXHc7+NTBZEBP47NvffNbucpiF/e3QCaw2g0NfNvES5c+wtQ9u2G0LCj8BLAiFEaeBU0zYJ9fxkfY\njKl7FZgtCzIHIA7QUmXov/g9LmMztt6rwLBMyFROj3TkZ0fgveXh4X96GN//zvf7t2aNHGlI7VlW\n0pYmRC+AKUwAsQu5thOuvIjQEjGBGJ7CQYptdOw6etc6VzXXzcUZwJrGseWt2P28DV2I4OgyDgQK\nFgMTYtQ1xqq10eDuR6j8Fi1NxGTkwpAfRos7h05bQscQIFgibEeHMBHCVhs4EBtY8lQQd6ulvbN7\n8e6f302mC7Z/bXsuo9NkKk1X9PZ+IUyeR0sN4GscYl8DPzOP5VuPYynQwMU+dL4O3wzRbpQQ93O1\nbvQuzgRWS0p/tQA6Nuqcilq7A5u3Px28T7qw7BB1VUHqhEKTB2+pCAIVHZVD3dPgujpE6peOBzes\nQRS5nr/+b//g24nF7JN27qkCGq/J++RknHXm5JlVeiKGr/MQPQMdV0ZkCRBbNUwEMYzQhRyZEHgH\nOv29ynPM6HXtja1Rf7B4AZ7RgZv+SuMAOj+NtrYEX3avfyqMfDi2DdcLEAQBvPOX8MGtR3Ex0MEF\nJiRxP373wWZsvaeBhixDVRrg1/jxlwEWPV3ap+xVrR57Cjgpht2xEDV4mLIFvqkiaoUwwzp4U4Hv\n9/awN7YrR+vuGcAS4ZsdtKV0VNEFVqMLrIkWJGEPPP4hKA0RgiCAc1XsdJQErGQ2Ig7hOQ5sx4Hz\n0u+wvHX2akjtMWCpNhQCiCicq+AcCx1Fh9B2IegcNN6B4Teg1z0EeknzKqPFRe7a9AeLm4ajXvzU\noJEDqUahMESrKxSqbQHbDBGLoXUNlBiuUsNOT8fFQEVsNdHmdOjStTgSGOCnLTQuBDBosLxKqnTw\nntw/glPnoHMS4E6iFVjgbBGcwUGMPAjtawP73GZf/wVkAutYtAvPezYUPoKjipBdGZ5vQOgavGte\nHbfsiXD09TZUIUbg6JD3vITlrU/iYthErPOYaQk44ZhocDF8U0HDqsEOHfQaC7/2X68lyzJVTjd0\nWiJu2XMem++7+tAxSd52+hguTe3GYtjq6V3XPyqDtbA/WLyAtqRg0rHhLceo3avCsk0kjqd7uoEL\n0FJkaC/9Hh/gS9ixS0dTCaDKHVidNhoTNN2gQP/FedAmly/t2IWm2YK2xswqDbj3antzz5oToD/9\n15/i5smbcdo8vfaDQGiC37YfEyeW4KtcMu2g1HbCrp9Dx5Fw3ZCw04ZSb0Jse6CsLH1qgZFfK0zn\nn+hpznzKHGpJRzus4YJ/AX/78G94ofUC7r777pwMxAhdE6pyAK8u78CJJZ+BtcKiIw8Wea0DTx34\nZCH5oHYwM1y0TjhnziXbaWgB+4cP/RCPPfYYtm/fjpMnT+Kmm24aDrDYhdpoQdAbaMtNSB4Da6Uh\nRx4sqnB3SCTPNbtvtu9iMoU/Wg5Kt9p0h8DTp09j3759ePrpp/H4448PB1fylOtC5jTUGVifseFY\ngJXClXou+jcN6Gk2nj7JG1Gi7TG0Hkiz7OlGP/ru6OGjq46rnnjiCSwuLibe66677hocMAZWT5uN\nDVgpXGfbZ5OtybQNZq1EE6G0NXmXtGvNwbrv+4n3uu222wYPjwys9QFW2goKjbQ4Tdth6CAFeSpK\n5J3oQMUwhynS8PjMM89AVdVs3ouBtb7Aytbrw+WiMZfnednCIwOLgTUIZml43LFjB5577rnhnx4H\nuek6yztWY6yqbb+wsJBMTwwUHquu5Ijej4GVoWMoPJ4/fz7xXkM9PWa4x3rLwsDK2KMXLlxIvBeF\nR5qe2LRpU8YrN2Y2BtaA/U7hkaYnnn322exPjwPeYz1kZ2AN2YtpeCTvdeeddw5Zyvq9jIGVo28p\nPJL3ok2NLDxeb0gGVg6w0kvT8HjixIlkHJY1lauaE8GRangwsvD/noKqt+kzsLJSkCEfzdi/8cYb\nifdaKzxWoppDmxJ5FT54NH06YZShAQVmYWAVaEwqKg2PMzMzyfTEyqfHqlRzAoOH6OqwJnXoNQeB\nSWcjq0sMrJJsferUqSQsdofHylRzYg8aLyG0QtiTOvhGhFZglyKD0Mt8DKySwEqLpfD45ptvYn5+\nHr/+z19/sukwj2pOP72vyJXBy4BNME340Pg6AiNAu8IDkQysksGi4t9++2189wffxee++DkIO4Tc\nqjlrSw504Eg81FobYetq+KOwKDgagjVOnRdtBgZW0RZdpbw0BL73/nv4yZM/6bv7tVeVxkk1h4FV\nAVgbUTWHgVUBWGUcvCVV6EP/cuiztQ9NCNsMiIshrPSIeaK3oUNIlXQqaDMDqwIjlyEV0Fv6MoQl\nbENT/FTIhWSXOF2AF5jocei8cCswsAo36WcLLEPchO7yyr+9smrt6TQ3geQmcgcd2CQbIHoIDKGy\nuSwG1joEi06oU+jj3RAWR2HQgFiiTuxqJmRgVQBWGaGQDo78/OjPe9T+qpfSeBeeqIM3JPip4k8F\n7aVbMLAqMHSlg/dr7YkcCZxWg1Jz0G5UL7/EwKoArBuhmoNEbupBvPrRDhxf8qFVLFrCwKoArFQi\n4P3o/VwTpCmgdBi3r2oOIrQbNdwfGljytZ46r2U1n4FVlmW7yn3rrbfwvX/+XrKkMyPM5FLNIS2K\nbCrSNI8loKX48G6AxhIDq2SwaIcDg
WWaJn71H78qRDWnlxbF1aaQxJILj6TRjRhm0L4hYrwMrJLA\nos1+BBXtyaLty5SKVs1Zverx1RB4dhIPPe/CVioeXF2rFAOrYLDIOxFQd9xxRwLVytSt90XfFaGa\nU3ATCimOgVWIGa8WkoY9AorA6pUIrqJVcwpsRiFFMbAKMONqYS9LsWWo5mS5bxV5GFg5rExhj8ZP\ndHBitbCXo+ixv5SBNWQXpmGPvNXtt98+ZCnr9zIG1oB9O2zYG/A2Y5+dgZWxC1nYy2goNt2Q3VA0\njqIDESzsZbcZ81hr2CoNe/T56KOPZrcqy8m2zazGAAt7+X8ZzGOtsCELe/mhohLGEqwyVFpY2CsG\nqLSUsQKrDJUWFvaKBWrswCpDpYWFvXKgKiYUxh5U/huwhd8idBqYRARX4bHTldd8Le8gTSpapYWW\nX0is47qnveTdi02I6aFOejlAbSdcOT2fF8NTOEixDTqnV6Uk0CC2GpW8hYTCyFXA72yj8XoAAzoE\n+nsxgNnrZc8DtL7bU9HJlDwqLY9855FkbY8ktS3LWlGLECbPo6UG8DUOsa+Bn5nH8q3HsRRo4GIS\nL6vDN0O0e70SdoB2rfeshYBF71Juyzzu90TcF59FIC8WJvSVvgiT9nnPH5nP/K7CtOPonYWzh2aT\nF2Fu+usmvPjLF3us7cXwdR6iZ6DjyogsAWKrhokghhG6kCMTAu9Ap7+r1l0cQwoLAote4+ugwT+I\nsxO78XrQKkTkqzsEkqeily8Nk0il5cfHfowv3/xlLBxf6Pk2sNhTwEkx7I6FqMHDlC3wTRVRK4QZ\n1sGbCnxfrfxgwjBtvtHXFAZW7OsQZo7hEm7Fkxf8nm+mH6TBlau0RG00OBWcY6Gj6BDaLgSdDn46\nMPwG9Hr15/MGsdco5S0GrDiAIU7D5M/AgIo9gY6Lng4+5wi3jIOea59wieCQzgEnAe4kWoEFzhbB\nGRzEyIPQDmBWpaoxSpQMUZdCwCLh1OlmDWcCBzJsSNzDiIyL8LR8Ur1lHE2nPeZzh+d6mooENW7Z\ncx6b7zuHTlvCJB1Nnz6GS1O7sUhKxDl/LEP00Vhekh8sUjThNUyYAdxr59dCSwSvAWbg5Xq7exkq\nLfRO6TMnz/TurNAEv20/Jk4swaf2xC6U2k7Y9XPoOBIm6crYh6UoaLodABOoSU3YlpLbQ48lQT0q\nnR+sEq1RBlj0dGmfsnPVOtB51IMmfEdGLQ7RkkSYkps8VbJ01QIjDdaNCIVZwOi4DnxOgsRRXIzh\nazwakY3gmphsljLWe56RBqv6wfvg3R0HFqS6CcHxC5kQHrwGo3nFSIN1Q1RaBuinyDchSyYmDRct\nhWPLPF22G2mwuo+k55kgHUylJRtZoa1A0kI0bAdGPRnSszQuYFE90yUdepoznzKHWtLRDmsglZY8\ncHZTE7UVCGqEpmtDScZZLK20wEh7LKpst9YBKQUf1A5mhovWCefMuU9eM9JbWnEQMAIY/DQOXLr+\nmqmHXkfIdj18YpSRByuFa6+2F1f+cgXkuWb3zfZdN6Twt/DCQuKpsgmVDQIXy9vPAmMB1krPRf9e\nryot/TpsXL4fG7BSuNa7Ssu4gNOvnmMFVtqY9azS0q/DxuX7sQRrXIy7kevJwNrIvV9i2xlYJRp3\nIxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3Ixf9d0NIelzdt4X5\nAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "<IPython.core.display.Image object>"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from IPython.display import Image\n",
+ "import base64\n",
+ "Image(data=base64.decodestring(\"iVBORw0KGgoAAAANSUhEUgAAAJYAAABkCAYAAABkW8nwAAAO90lEQVR4Xu2dT5Dc1J3Hv+YQT8VJZUhVdprLWs4FTSrGGv4ql9CuHBCH4GaTFCLZwnIcjOAy8l6Q/1SlU4XHcg6xJgtY2OOik2KxSGoTGWrXzYFC2T2MDAtWitRavmQ0e9k2SYGowom4hNRPtqA9TE+rW3/cPfPepcfup6f3fu/Tv9/T+/PVpo8//vhjsMQsULAFNjGwCrYoKy6xAAOLgVCKBRhYpZiVFcrAYgyUYgEGVilmZYUysBgDpViAgVWKWVmhDCzGQCkWGEuwrly5gtf++zW887/vYOn/lnD5T5cT40x9ZQrb/nEbxDtFiHeI2LJlSylGY4X2t8BYgUVAvfzqy3i5/TI+vPLhmq37wpYv4AHpATxw3wMMsP4cFJ5jbMAiqA4eOYg/Lv8xMcL26e34+vTXk8+vbv1q8n/03TsX38EfLv4h+aRE380dmmNwFY7O2gWOBVgE1Y/2/yjxUls+vwXaY1oS7tZK3v94MJ8zceUvV0Dea+H4AoOrQrhGHqxuT0Xjp0P7D2HqH6Yymejyu5dx5PiRZBxGnmt+bj7TdSxTfgv0ASuAzglwmyE8pfbZu3VaEDkDdT+AweevzGolvPjvL+LMb84knmr+yHxmqNKyCK7ZQ7OJ5yIo+3m6clqx8UrNB1bso2W64FQN9cnijdcdAvNAQWGRPBcLicX3Ua8S84FVcj3PnjuLhRcWkgH63OG5XHc7+NTBZEBP47NvffNbucpiF/e3QCaw2g0NfNvES5c+wtQ9u2G0LCj8BLAiFEaeBU0zYJ9fxkfYjKl7FZgtCzIHIA7QUmXov/g9LmMztt6rwLBMyFROj3TkZ0fgveXh4X96GN//zvf7t2aNHGlI7VlW0pYmRC+AKUwAsQu5thOuvIjQEjGBGJ7CQYptdOw6etc6VzXXzcUZwJrGseWt2P28DV2I4OgyDgQKFgMTYtQ1xqq10eDuR6j8Fi1NxGTkwpAfRos7h05bQscQIFgibEeHMBHCVhs4EBtY8lQQd6ulvbN78e6f302mC7Z/bXsuo9NkKk1X9PZ+IUyeR0sN4GscYl8DPzOP5VuPYynQwMU+dL4O3wzRbpQQ93O1bvQuzgRWS0p/tQA6Nuqcilq7A5u3Px28T7qw7BB1VUHqhEKTB2+pCAIVHZVD3dPgujpE6peOBzesQRS5nr/+b//g24nF7JN27qkCGq/J++RknHXm5JlVeiKGr/MQPQMdV0ZkCRBbNUwEMYzQhRyZEHgHOv29ynPM6HXtja1Rf7B4AZ7RgZv+SuMAOj+NtrYEX3avfyqMfDi2DdcLEAQBvPOX8MGtR3Ex0MEFJiRxP373wWZsvaeBhixDVRrg1/jxlwEWPV3ap+xVrR57Cjgpht2xEDV4mLIFvqkiaoUwwzp4U4Hv9/awN7YrR+vuGcAS4ZsdtKV0VNEFVqMLrIkWJGEPPP4hKA0RgiCAc1XsdJQErGQ2Ig7hOQ5sx4Hz0u+wvHX2akjtMWCpNhQCiCicq+AcCx1Fh9B2IegcNN6B4Teg1z0EeknzKqPFRe7a9AeLm4ajXvzUoJEDqUahMESrKxSqbQHbDBGLoXUNlBiuUsNOT8fFQEVsNdHmdOjStTgSGOCnLTQuBDBosLxKqnTwntw/glPnoHMS4E6iFVjgbBGcwUGMPAjtawP73GZf/wVkAutYtAvPezYUPoKjipBdGZ5vQOgavGteHbfsiXD09TZUIUbg6JD3vITlrU/iYthErPOYaQk44ZhocDF8U0HDqsEOHfQaC7/2X68lyzJVTjd0WiJu2XMem++7+tAxSd52+hguTe3GYtjq6V3XPyqDtbA/WLyAtqRg0rHhLceo3avCsk0kjqd7uoEL0FJkaC/9Hh/gS9ixS0dTCaDKHVidNhoTNN2gQP/FedAmly/t2IWm2YK2xswqDbj3antzz5oToD/915/i5smbcdo8vfaDQGiC37YfEyeW4KtcMu2g1HbCrp9Dx5Fw3ZCw04ZSb0Jse6CsLH1qgZFfK0znn+hpznzKHGpJRzus4YJ/AX/78G94ofUC7r777pwMxAhdE6pyAK8u78CJJZ+BtcKiIw8Wea0DTx34ZCH5oHYwM1y0TjhnziXbaWgB+4cP/RCPPfYYtm/fjpMnT+Kmm24aDrDYhdpoQdAbaMtNSB4Da6UhRx4sqnB3SCTPNbtvtu9iMoU/Wg5Kt9p0h8DTp09j3759ePrpp/H4448PB1fylOtC5jTUGVifseFYgJXClXou+jcN6Gk2nj7JG1Gi7TG0Hkiz7OlGP/ru6OGjq46rnnjiCSwuLibe66677hocMAZWT5uNDVgpXGfbZ5OtybQNZq1EE6G0NXmXtGvNwbrv+4n3uu222wYPjwys9QFW2goKjbQ4Tdth6CAFeSpK5J3oQMUwhynS8PjMM89AVdVs3ouBtb7Aytbrw+WiMZfnednCIwOLgTUIZml43LFjB5577rnhnx4Huek6yztWY6yqbb+wsJBMTwwUHquu5Ijej4GVoWMoPJ4/fz7xXkM9PWa4x3rLwsDK2KMXLlxIvBeFR5qe2LRpU8YrN2Y2BtaA/U7hkaYnnn322exPjwPeYz1kZ2AN2YtpeCTvdeeddw5Zyvq9jIGVo28pPJL3ok2NLDxeb0gGVg6w0kvT8HjixIlkHJY1lauaE8GRangwsvD/noKqt+kzsLJSkCEfzdi/8cYbifdaKzxWoppDmxJ5FT54NH06YZShAQVmYWAVaEwqKg2PMzMzyfTEyqfHqlRzAoOH6OqwJnXoNQeBSWcjq0sMrJJsferUqSQsdofHylRzYg8aLyG0QtiTOvhGhFZglyKD0Mt8DKySwEqLpfD45ptvYn5+Hr/+z19/sukwj2pOP72vyJXBy4BNME340Pg6AiNAu8IDkQysksGi4t9++2189wffxee++DkIO4TcqjlrSw504Eg81FobYetq+KOwKDgagjVOnRdtBgZW0RZdpbw0BL73/nv4yZM/6bv7tVeVxkk1h4FVAVgbUTWHgVUBWGUcvCVV6EP/cuiztQ9NCNsMiIshrPSIeaK3oUNIlXQqaDMDqwIjlyEV0Fv6MoQlbENT/FTIhWSXOF2AF5jocei8cCswsAo36WcLLEPchO7yyr+9smrt6TQ3geQmcgcd2CQbIHoIDKGyuSwG1joEi06oU+jj3RAWR2HQgFiiTuxqJmRgVQBWGaGQDo78/OjPe9T+qpfSeBeeqIM3JPip4k8F7aVbMLAqMHSlg/dr7YkcCZxWg1Jz0G5UL7/EwKoArBuhmoNEbupBvPrRDhxf8qFVLFrCwKoArFQi4P3o/VwTpCmgdBi3r2oOIrQbNdwfGljytZ46r2U1n4FVlmW7yn3rrbfwvX/+XrKkMyPM5FLNIS2KbCrSNI8loKX48G6AxhIDq2SwaIcDgWWaJn71H78qRDWnlxbF1aaQxJILj6TRjRhm0L4hYrwMrJLAos1+BBXtyaLty5SKVs1Zver
x1RB4dhIPPe/CVioeXF2rFAOrYLDIOxFQd9xxRwLVytSt90XfFaGaU3ATCimOgVWIGa8WkoY9AorA6pUIrqJVcwpsRiFFMbAKMONqYS9LsWWo5mS5bxV5GFg5rExhj8ZPdHBitbCXo+ixv5SBNWQXpmGPvNXtt98+ZCnr9zIG1oB9O2zYG/A2Y5+dgZWxC1nYy2goNt2Q3VA0jqIDESzsZbcZ81hr2CoNe/T56KOPZrcqy8m2zazGAAt7+X8ZzGOtsCELe/mhohLGEqwyVFpY2CsGqLSUsQKrDJUWFvaKBWrswCpDpYWFvXKgKiYUxh5U/huwhd8idBqYRARX4bHTldd8Le8gTSpapYWWX0is47qnveTdi02I6aFOejlAbSdcOT2fF8NTOEixDTqnV6Uk0CC2GpW8hYTCyFXA72yj8XoAAzoE+nsxgNnrZc8DtL7bU9HJlDwqLY9855FkbY8ktS3LWlGLECbPo6UG8DUOsa+Bn5nH8q3HsRRo4GISL6vDN0O0e70SdoB2rfeshYBF71Juyzzu90TcF59FIC8WJvSVvgiT9nnPH5nP/K7CtOPonYWzh2aTF2Fu+usmvPjLF3us7cXwdR6iZ6DjyogsAWKrhokghhG6kCMTAu9Ap7+r1l0cQwoLAote4+ugwT+IsxO78XrQKkTkqzsEkqeily8Nk0il5cfHfowv3/xlLBxf6Pk2sNhTwEkx7I6FqMHDlC3wTRVRK4QZ1sGbCnxfrfxgwjBtvtHXFAZW7OsQZo7hEm7Fkxf8nm+mH6TBlau0RG00OBWcY6Gj6BDaLgSdDn46MPwG9Hr15/MGsdco5S0GrDiAIU7D5M/AgIo9gY6Lng4+5wi3jIOea59wieCQzgEnAe4kWoEFzhbBGRzEyIPQDmBWpaoxSpQMUZdCwCLh1OlmDWcCBzJsSNzDiIyL8LR8Ur1lHE2nPeZzh+d6mooENW7Zcx6b7zuHTlvCJB1Nnz6GS1O7sUhKxDl/LEP00Vhekh8sUjThNUyYAdxr59dCSwSvAWbg5Xq7exkqLfRO6TMnz/TurNAEv20/Jk4swaf2xC6U2k7Y9XPoOBIm6crYh6UoaLodABOoSU3YlpLbQ48lQT0qnR+sEq1RBlj0dGmfsnPVOtB51IMmfEdGLQ7RkkSYkps8VbJ01QIjDdaNCIVZwOi4DnxOgsRRXIzhazwakY3gmphsljLWe56RBqv6wfvg3R0HFqS6CcHxC5kQHrwGo3nFSIN1Q1RaBuinyDchSyYmDRcthWPLPF22G2mwuo+k55kgHUylJRtZoa1A0kI0bAdGPRnSszQuYFE90yUdepoznzKHWtLRDmsglZY8cHZTE7UVCGqEpmtDScZZLK20wEh7LKpst9YBKQUf1A5mhovWCefMuU9eM9JbWnEQMAIY/DQOXLr+mqmHXkfIdj18YpSRByuFa6+2F1f+cgXkuWb3zfZdN6Twt/DCQuKpsgmVDQIXy9vPAmMB1krPRf9eryot/TpsXL4fG7BSuNa7Ssu4gNOvnmMFVtqY9azS0q/DxuX7sQRrXIy7kevJwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3IxfNwNrIvV9i2xlYJRp3Ixf9d0NIelzdt4X5AAAAAElFTkSuQmCC\"), embed=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Ft95NDUZy4Rr"
+ },
+ "source": [
+ "That's equivalent to the function $\\hat{y} = w_2 x + w_1$. What we're trying to do is find the \"best\" weights $w_1$ and $w_2$. That will give us that green regression line above.\n",
+ "\n",
+ "What are the best weights? They're the weights that minimize the difference between our estimate $\\hat{y}$ and the actual y. Specifically, we want to minimize the sum of the squared errors, so minimize $\\sum{(\\hat{y} - y)^2}$, which is known as the *L2 loss*. So, the best weights are the weights that minimize the L2 loss."
+ ]
+ },
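+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the loss concrete, here is a minimal NumPy sketch (an added illustration, not part of the original walkthrough) that evaluates the L2 loss for one hypothetical pair of weights, reusing the `X` array from the plot above. The names `w1`, `w2`, `y_hat`, and `l2` are just illustrative:\n",
+ "\n",
+ "```python\n",
+ "# Hypothetical guess at the weights; any values work, the loss just measures how bad they are.\n",
+ "w1, w2 = 0.0, 1.0\n",
+ "y_hat = w2 * X[0] + w1               # our estimate of y for every x\n",
+ "l2 = np.sum((y_hat - X[1]) ** 2)     # the L2 loss: sum of squared errors\n",
+ "print(l2)\n",
+ "```"
+ ]
+ },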
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "RHDGz_14vGNg"
+ },
+ "source": [
+ "## Gradient descent\n",
+ "\n",
+ "What gradient descent does is start with random weights for $\\hat{y} = w_2 x + w_1$ and gradually moves those weights toward better values.\n",
+ "\n",
+ "It does that by following the downward slope of the error curves. Imagine that the possible errors we could get with different weights as a landscape. From whatever weights we have, moving in some directions will increase the error, like going uphill, and some directions will decrease the error, like going downhill. We want to roll downhill, always moving the weights toward lower error.\n",
+ "\n",
+ "How does gradient descent know which way is downhill? It follows the partial derivatives of the L2 loss. The partial derivative is like a velocity, saying which way the error will change if we change the weight. We want to move in the direction of lower error. The partial derivative points the way.\n",
+ "\n",
+ "So, what gradient descent does is start with random weights and gradually walk those weights toward lower error, using the partial derivatives to know which direction to go."
+ ]
+ },
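+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Written as an update rule (a compact summary of the idea above, with $\\mu$ standing for the learning rate, which the code below calls `mu`), one gradient descent step changes each weight by a small multiple of its partial derivative:\n",
+ "\n",
+ "$$w_i \\leftarrow w_i - \\mu \\frac{\\partial L}{\\partial w_i}, \\qquad L = \\sum{(\\hat{y} - y)^2}$$\n",
+ "\n",
+ "The minus sign is what makes it *descent*: the partial derivative points uphill, so we step the opposite way. A small $\\mu$ means cautious steps downhill; a large $\\mu$ means bigger jumps that can overshoot the bottom."
+ ]
+ },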
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "W7SgnPAWBX2M"
+ },
+ "source": [
+ "## The code again\n",
+ "\n",
+ "Let's go back to the code now, walking through it with many more comments in the code this time:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 718,
+ "status": "ok",
+ "timestamp": 1446659172854,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "4896c353dcc58d9f",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "4qtXAPGmBWUW",
+ "outputId": "0664707f-ea8a-453b-fc3f-48d5ca0f76dc"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlUAAAEPCAYAAABr+zG+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8VPW9//HXJwtLIKwBQVAEEdlETBUXLM2VKlZR22uv\n2NpatNdb6lqtGm17L1j7uwgu1KU+qPe617Vqi3LdEIm4VFHjggNaXNjXsIU1ZPn8/pgJjiEJSWY5\nk5n38/GYR2bOnDnnM0K+fviez/l8zd0RERERkdhkBR2AiIiISDpQUiUiIiISB0qqREREROJASZWI\niIhIHCipEhEREYkDJVUiIiIicRCXpMrMrjezkJl9bGaPmFmbeBxXRCRRzKyvmb0aGbsWmtnlke1d\nzexlM/vMzF4ys85Rn7nezJaY2WIzOyW46EUkFcWcVJlZP+Ai4Ch3HwHkAOfGelwRkQSrAq5y92HA\n8cAlZjYYuA54xd0PB14Frgcws6HAOcAQ4HvA3WZmgUQuIikpHjNV5cAeoIOZ5QB5wOo4HFdEJGHc\nfa27fxh5vh1YDPQFzgIejOz2IPD9yPMzgcfdvcrdlwJLgFFJDVpEUlrMSZW7bwZuBZYDq4At7v5K\nrMcVEUkWMzsEGAm8DRzg7usgnHgBPSO79QFWRH1sVWSbiAgQn8t/A4ArgX7AgUBHM/txrMcVEUkG\nM+sIPAVcEZmxqrt2l9byEpEmyYnDMY4G3nT3TQBm9gxwAvBo9E5mpoFJJE24e1rUEkVKFp4CHnb3\nWZHN68zsAHdfZ2a9gPWR7auAg6I+3jeyre4xNdaJpInmjnXxqKn6DDjOzNpFijbHEq5NqC+4lHhM\nnjw58BhSKQ7FktpxpFosaeY+YJG73x617VlgYuT5z4BZUdvPNbM2ZtYfGAgsqO+gQf8ZpdPft0yM\nPx2+Q2uP371lY13MM1Xu/pGZPQS8D1QDHwD3xHpcEZFEMrPRwHnAQjP7gPBlvt8A04AnzexCYBnh\nO/5w90Vm9iSwCKgELvaWjrwikpbicfkPd78ZuDkexxIRSQZ3fxPIbuDt7zbwmanA1IQFJSKtWkZ2\nVC8qKgo6BCB14gDFUp9UiQNSKxZJf63971trjx9a/3do7fG3lCVr9trMNFMukgbMDE+TQvVE0Fgn\nkh5aMtZl5EyViIiISLwpqRIRERGJAyVVIimuoqqixbf3iohI8iipEklhVTVV3LXgLhasqrcdkoiI\npBAlVSIpyt154MMH6NCmA8f0OSbocEREZD+UVImkqGcWP8OmXZv4+VE/J8v0qyoikuo0UoukoHlf\nzeOjdR9xyTGXkJudG3Q4IiLSBEqqRFLMB2s+4MXPX+TyYy+nQ5sOQYcjIiJNpKRKJIV8sekL/vLx\nX7j4mIspyCsIOhwREWkGJVUiKWLd9nXMfG8mFxx1Af269As6HBERaSYlVSIpoLyinDveuYPvD/4+\nw3sODzocERFpASVVIgGrqKrgrgV3cVzf4xh98OigwxERkRZSUiUSoBqv4Z7376Fvp76MHzQ+6HBE\nRCQGSqpEAuLuPPLxIzjOeUech1mzFkMXEZEUo6RKJCDPL3me5VuX84tv/YLsrOygwxERkRgpqRIJ\nwFsr3uLNFW9y2bGX0TanbdDhiIhIHCipEkmy0PoQzyx+hsuPvZxObTsFHY6IiMRJXJIqM+tsZn81\ns8VmFjKzY+NxXJF0s3zrcu774D4mHT2JXh17BR1ORjOze81snZl9HLVthJm9ZWYfmdksM+sY9d71\nZrYkMs6d0tix3RMZuYikKvM4/Pab2QPAa+5+v5nlAHnuXl5nH4/HuURKS0t54omXAJgwYRyFhYUB\nR9Q0G3duZPqb05kwfAKFvVtHzPUxM9y91VfVm9mJwHbgIXcfEdm2ALjK3d8ws4nAAHf/LzMbCjwC\nHAP0BV4BDqtvUDMz377d6aAVhkRatZaMdTHPVJlZJ+Db7n4/gLtX1U2oROKltLSUSZNmMH/+KObP\nH8WkSTMoLS0NOqz92rFnB3e8cwfjBo5r1QlVOnH3N4DNdTYfFtkO4cTp7MjzM4HHI+PbUmAJMKqh\nY2/bFudgRaRViMflv/5AmZndb2alZnaPmbWPw3FF9vHEEy+RnT2RgoKxFBSMJTt74t5Zq1RVWV3J\n3e/ezREHHMFJ/U8KOhxpXMjMzow8P4fwrBRAH2BF1H6rItvqpaRKJDPlxOkYhcAl7v6emf0RuA6Y\nHIdji7Rq7s59H9xHl3ZdOHvI2fv/gATtQuBOM/tP4FlgT0sOcsstU+jdO/y8qKiIoqKieMUnIglS\nUlJCSUlJTMeIR1K1Eljh7u9FXj8FFNe345QpU/Y+10AjLTFhwjjmzZtBWVn4dXX1A0yYcGWwQdUR\nCoWYPWc2ALlDcqnJr+GKY69otc094zHQtBbu/k9gHICZHQacHnlrFXBQ1K59I9vq9eMfT+E730lU\nlCKSCHXzkhtuuKHZx4hXofprwEXu/k8zm0y4UL24zj4qVJe4eOSRR7jjjicAuPzyCZx33nlxPX4s\nhfChUIhrb76W3GG5rK9ez4a1G3jsgsc4+sij4xpjkNKlUB3AzA4BnnP3IyKve7j7BjPLAu4H5rn7\nA1GF6scSvuw3h0YK1Z97zhmvVYdEWrWWjHXxmKkCuBx4xMxygS+BC+J0XJFvKC0t5fbbXyQ7+woA\nbr/9AYYMGbI38Yn1zsDaQvjs7IkAzJs3g5kzr2zycWbPmU3usFyyDspi55adHJZ1GHPnzU2rpCpd\nmNmjQBHQ3cyWEy5ZyDezSwAHnnH3BwDcfZGZPQksAiqBixv7V6JqqkQyU1ySKnf/iPCtxiIJFV2o\nDlBWFt5WWFgYc0K0v+M31faa7WzYvIHhPYezY9uOJn9Oksvdf9zAW3c0sP9UYGpTjq2kSiQzqaO6\npI1UuDPw2BOP5ct1X1JQXsCOL3dQGapk/Mm6DpRplFSJZKZ4Xf4TSYpEF6rHcvwtu7fw8taX+cP3\n/8CK0vDd9+OvGc+wYcPiFp+0DkqqRDJTXArVm3QiFapLnDRUN1X38l919QPNvvzX2PEbs7tqN7e8\ndQuFvQs57bDTmnW+1iadCtUTwcz8qqucW28NOhIRiUVLxjolVZJWgljCprqmmrsW3EX3vO6cd8R5\nrbZ1QlMpqWqcmflFFzn33BN0JCISiyDv/hNJCYWFhQlLpOpL2Nydhz9+mOysbH58xI8xs1a7NqHE\njy7/iWQmJVUiTdDQnYUrO6xkzbY1XHX8VWRZVlzuQJTWr1yrn4pkJCVVIk1QX6uFW/46kwEn96R4\ndDFtc9o2uF9zWzJI66eZKpHMpJYKIi2ws+OXfJ79KZcfezn5bfODDkdSjJIqkcykmSqRJpgwYRzP\nPz+ZVRsfp6rDNvYMfZf7xs2gZ4ee++yX6msTSuIpqRLJTEqqROoRXWx+9NED+cd7/2Bb3kJqRi5k\nV8E2Dll7MId2O3SfzxUWFjJz5pVRh
eqqp8pESqpEMpNaKojUEV1svnPnUpbt/C+6DWzLjn472N1t\nN0f1O4ou5V04Pvd4in9VvN/jpRu1VGicmXn79s7OnUFHIiKxaMlYp5oqkTqii833ZJWRNfgQdloF\ne7ruoW37tuzatCvoECXFVVRAVVXQUYhIsimpEtkPx/HDHVtlZK3MYtvybVrTTxrVsSNs3x50FCKS\nbEqqROqYMGEcZWU38nrpKFasepid2z8mv2MeI3qMoMsnXTi1y6lMv2a61vSTBuXnq65KJBOpUF2a\nJRO6hb/55pus3PMO1X2q4VDDKmDUulEcOfRIxl+oBZJl/5RUiWQmJVXSZJnQLTwUCvH7//k9NSfW\nkNM9h6rtVXSo6cjmTZszsihdWkZJlUhmUlIlTZYJ3cJnz5mNHW5YX6Mmv4bcmlwqP6+Envv/rEgt\nJVUimUk1VSJ19OnXh6rKKtgGVZurqFpWRfuaAoqLp1JaWhp0eNIKKKkSyUxKqqTJJkwYR3X1A5SV\nzaWsbG6kW/i4oMOKq29/59uUV5dzePbhdPqyE3kf5NHDh7NmzUTmzx/FpEkzlFjJfimpEslMcbv8\nZ2ZZwHvASnc/M17HldSRrt3CQ6EQs+fMptIrWdt3LcXji9m8cDP0gX92WsOiRWek9SXPTGVm9wLj\ngXXuPiKy7RjgT0AuUAlc7O7vRd67HrgQqAKucPeXGzq2kiqRzBTPmqorgEVApzgeU1JMYWFhWiUU\noVCIa2++luyh2XxV9RU5JTlM+uUkho8dDkBx8dSAI5QEuh+4E3goatt04Hfu/rKZfQ+4GfgXMxsK\nnAMMAfoCr5jZYQ0tE5GfD+XliQ1eRFJPXJIqM+sLnAb8P+CqeBxTJFGi20LsqllDztActvTYQmfv\nTPec7vzfK//H8OHhpEoLJKcvd3/DzPrV2bwG6Bx53gVYFXl+JvC4u1cBS81sCTAKeKe+Y2umSiQz\nxWumagZwDV8PRiIpqW5biBUbrqbLuZV06NqBI3oeQdnGsm/sn66XPKVB1wFvmtmtgAEnRLb3Af4R\ntd+qyLZ65efDunUJi1FEUlTMSZWZnU64JuFDMysiPBDVa8qUKXufFxUVUVRUFOvpRZrliSdeoqKi\niD1Z7wFgvY9i1bJnGd3pOMq2lIWXn7nmm8vPpNslz+YqKSmhpKQk6DCS5V7gMnf/u5n9ELgPOLm5\nB3n11SksWwZTpmisE2kt4jHWWQMlAU0/gNl/Az8hXLzZHsgHnnH38+vs11D5gUjS/Pznv+KZ+W/T\n9sheVLbbyo49nzHWRzNm9NEAjD9ZHdP3pyUrt6eqyOW/56IK1cvdvVPU+1vcvYuZXQe4u0+LbH8R\nmOzu+1z+MzN/8knn8cfh6aeT9EVEJO5aMtbFPFPl7r8BfhMJ4DvAr+smVCK1YlnmJh5L5HToVgOH\nr6Sqex678pfR5ssaDj20t7qlZy7jm7PrS8zsO+7+mpmNBZZEtj8LPGJmMwhf9hsILGjooJ06qaZK\nJBOpo7okTSzL3MRjiZxQKMSSr5bQqWc22woWcpAdQO8hB8Oer+/yS9f1DGVfZvYoUAR0N7PlwGTg\nP4C7zawNsDvyGndfZGZPEr7DubbVQoNT7ypUF8lMcU2q3P014LV4HlPSRyzL3MS6RE5t64Qd/XZQ\ntrGMNmva0LfvAVQsrGDe51/RseNZQHquZyj1c/cfN/DWsQ3sPxVoUo8NJVUimUkzVZIRZs+ZTdbQ\nLHZ33c3IbSPZ8sYW2pS34ZBuR7Coo5p7SnwpqRLJTEqqJGla2vOptLSUtWtXsXTpf7N9+w46duzQ\n7H5RNV7DsqpldMvtxqCBg1hfvZ7jc49n05qaln4dkQYpqRLJTEqqJGla0vMpupaqR4/hbNhwE9/6\n1jFcfnnTL9G5O1mDs6h6oYpOOZ1Yv2H93tYJFRUVau4pcVebVLmDpcV9kiLSFDG3VGjyidRSQVqg\nuHgq8+ePiro8N5cxYxYwbdr1TT7GC0te4P0173Nal9OY8+oc4JutE+JxV2EmSaeWColQO9a1bQtb\nt0K7dkFHJCItEUhLBZFU9vbKt3l9+etcO/paurTrQuGIfROmTG/uKYlRO1ulpEokc2QFHYBIYyZM\nGEd19QOUlc2lrGxu5PLcuCZ9dvGGxTy16CkuG3UZXdp1SXCkIt+kuiqRzKOZKklpza3DCoVCzJ4z\nm601W1ndazW/Hfdbeuf3Tla4InspqRLJPEqqJOU19fJcbS8qH+osqVxCz5Ke7DlyD3RPQpAideTn\nQ3l50FGISDLp8p+kjdlzZmNDjXWd1zHokEH0OKwHs+fMDjosyVCaqRLJPEqqJG1UezVLq5bStX1X\n+uT3CTocyXBKqkQyj5IqSQvuTuXASmyN0WFtB9Z/FulFdfL4oEOTDKWkSiTzqKZK0sIzi58hr1se\nD//Hw7w0N1zUPv6ar3tRiSRbp05KqkQyjZKqNJVJDS1f/epVPl73MdeOvpYObTow8oiRQYckopkq\nkQykpCoNRS/tAjBv3gxmzmz6si6pbtasWcx8eCYAp5x9CmsL1u5NqERSRX4+bNgQdBQikkxKqtLQ\nE0+8RHb2xKilXcLbmpJUpfoM16xZs7hoykVkj8qmKruK155/jTvOvIPueeqbIKklPx++/DLoKEQk\nmVSoLnvVznDNnz+K+fNHMWnSDEpLS4MOa69QKMQ1N17DrvxdtOnRhppDasjrlMfTTzwddGgi+9Dl\nP5HMo5mqNDRhwjjmzZtBWVn4dXhplyv3+7lYZrgSrbax55b+W6hoV8GaVWsosAK8Sot0S2pSUiWS\neZRUpaHmLu2Saupegmzbti1XT76aZT2Wcejhh/Le2vdgF2x7bRvtt7Vn0pRJAUcssi8lVSKZR0lV\nmmrq0i7RWjrDFU91i+yff34ynQduZrNtZlPNJtaUr+GQnoew5cMtdN3ZlZun3MxZZ52V1BhFmkJJ\nlUjmibmmysz6mtmrZhYys4Vmdnk8ApPkq53hGjNmAWPGLAjkjsHoS5AFBWPZWtGLdZ03MeSkIVTu\nrKS6opqclTmM6jCKvz34NyVU0mJmdq+ZrTOzj6O2PW5mpZHHV2ZWGvXe9Wa2xMwWm9kp+zu+kiqR\nzBOPmaoq4Cp3/9DMOgLvm9nL7v5pHI4tSdaSGa5k2N5lO71H9Kb67WoOsUOYfsN0NfaUWN0P3Ak8\nVLvB3c+tfW5mtwBbIs+HAOcAQ4C+wCtmdpi7N1jUp6RKJPPEnFS5+1pgbeT5djNbDPQBlFRJs9Ve\ngvznP99k3fbZVFV+RcGmDny57EsOyz0M8mH6NUqoJHbu/oaZ9Wtkl3OAosjzs4DH3b0KWGpmS4BR\nwDsNfVhJlUjmiWtNlZkdAoykkYFGWqdk9a8qLCykqKg3M578A1ZotC1ow9pd2zhjxRkcPfhoLT0j\nSWFm3wbWunttp6k+wD+idlkV2dagjh1h506oqYEsNa8RyQhxS6oil/6eAq5w9+317TNlypS9z4
uK\niigqKorX6SWBktmhPRQKcf9z95M12sg5JIfdlbvJ35jP9rLtFP+qOO7nk/0rKSmhpKQk6DCS7UfA\nYy39cO1Yl50NL7xQxOmnF8UnKhFJmHiMddZISUDTD2KWA8wGXnD32xvYp7HyA0lhxcVTmT9/VFT/\nqrmMGbOAadOub9HxGpv1mvbHadz6t1vZNngbfpCTtTuL7M+yOdFP5IWnXoj9y0jMzAx3t6DjiIfI\n5b/n3H1E1LZswjNRhe6+OrLtOsDdfVrk9YvAZHffZ1Y+eqzr3Rveew/6NDqnJSKpqCVjXbwmpe8D\nFjWUUInUakrX9oOHHkzljkp8rVP9VTX+kTPpp+pFJQlhkUe0k4HFtQlVxLPAuWbWxsz6AwOBBfs7\neKdOqqsSySTxaKkwGjgPOMnMPojcinxq7KFJqpgwYRzV1Q9QVjaXsrK5kf5V41p0rLotE7KzJ+6d\ntQIYN3Yc29tsp39Bfzov7Eyn0k5MvWyqWidI3JnZo8BbwCAzW25mF0TemkCdS3/uvgh4ElgEPA9c\n3JSpdxWri2SWeNz99yaQHYdYJEUlq0O7u/NOxTv86Ls/ou3nbbFDjPEnB1OYnuoLS0vs3P3HDWy/\noIHtU4GpzTmHkiqRzKKO6tIk8epf1VDXdnfnr4v+yvY92/nt6b8lJyu4v5rJLMyX9KakSiSzKKmS\npGpo1uuVL19h8YbFXDP6mkATKkjthaWldVFSJZJZlFRJ0tWd9Xpv9Xu88uUrXDv6WvJy8wKMTCS+\nlFSJZBYlVZJ0oVCI2XNmA3DEcUfwavmr/Oq4X9GtfbeAIwtLhYWlJT0oqRLJLEqqJKlmzZpF8W3F\nZA3NoluPbvz5yT/zx3/7I3079Q06tL2SVZgv6U9JlUhmUVIlSRMKhSieXszW4VvJ7ZPLyp0rGdRx\nEIvfWcyZx58ZdHjfkKoLS0vrkp8Py5YFHYWIJItWpJKkmT1nNtkHZNOmYxt2td1F+/bt2blhZ9Bh\niSSMZqpEMouSKkmqHv17UL6pHC9zbKVRs6iG8SePDzoskYRQUiWSWZRUSdKc/t3T2VSxiR4H9KDn\nkp50+aQL066aFkhzT5FkUFIlkllUUyVx1Vgn8i9yvuCkfzmJHst7kNMrJ7Bu6SLJoqRKJLMoqZK4\naawT+evLXufd1e9y4/gbyW+bH2icIsmSnw/l5UFHISLJoqRK4qahTuQ5fXJ49rNnuWb0NUqoJKN0\n6qSZKpFMoqRK4m7bthDrts5mx47P+WpzNQ9++CCXjrqUnh16Bh2aSFLp8p9IZlFSJd/QWE3U/kyY\nMI5nnrmKlTu/hJ5G9sE7eSunLbd3vJ3+XfsnKmSRlKWkSiSzKKlKA7EkQnWP01BNVFO0bduWym5L\nyTp6M1l52ThV9OjYl8/f/RxGtygkkVatbVuoqYE9e6BNm6CjEZFEU0uFVq42EZo/fxTz549i0qQZ\nlJaWtuhY0TVRBQVjyc6euDdZa4rZc2aTV5hHwbButB/chryCPPZ8vqdFsYg0lZkdamZtI8+LzOxy\nM+sSdFwAZpqtEskkSqpauVgToXjr1aMX5dvLqdldQ9aWLKrXVau5pyTa00C1mQ0E7gEOAh4NNqSv\nKakSyRxKqgQIz3itW7eWpUv/k2XLZlJWNpfq6geYMGFck49x+ndPZ/PGzRS0LaDH6h50+bQL065V\nc09JuBp3rwJ+ANzp7tcAvQOOaS8lVSKZQzVVrdyECeOYN28GZWXh1+FE6Momfba2Fmvt2lW8//5q\n8vMvoaBgLBs23M5RRw3hiiuaXk8FsKrdKkYXjabXyl7kds1l/KVq7ilJUWlmPwJ+BpwR2ZYbYDzf\noKRKJHPEJakys1OBPxKe+brX3afF47iyf4WFhcyceWVUoXrTEqHoovRly56kvPwUCguP5pBDOtOx\nYwd69VrQrIRqwaoFzFs6jxvH30jX9l1b+nVEWuICYBLw/9z9KzPrDzy8vw+Z2b3AeGCdu4+I2n4Z\ncDFQBfyfu18X2X49cGFk+xXu/nJTglNSJZI5Yk6qzCwLuAsYC6wG3jWzWe7+aazHlqYpLCxs9h1/\n0bVYGzYsoLy8B6tXr6dz585N+nwoFGLabdN4+6O36di3IwefcjB3//RuJVSSdO6+CLgcwMy6AvlN\n/Ifd/cCdwEO1G8ysiPBs1xHuXmVmBZHtQ4BzgCFAX+AVMzvM3X1/J1FSJZI54lFTNQpY4u7L3L0S\neBw4Kw7HlSQ58MBxwEPs3PlWk2qpQqEQ5196Pk+88QRLhy3lo4M/Yt5f5vHuq+8mL2iRCDMrMbNO\nZtYNKAX+x8xu29/n3P0NYHOdzb8EborUaOHukQvrnAU87u5V7r4UWEJ47NsvJVUimSMeSVUfYEXU\n65WRbZLCJkwYR3X1A5SVzaWycjP9+lVyyimljBmzoNHeVKFQiKsnX82nyz6FYZB1WBZtu7elZkAN\nMx+eCYQvLRYXT6W4eGqL2zuINENndy8H/hV4yN2PBb7bwmMNAsaY2dtmNs/MvhXZXnecW0UTxzkl\nVSKZI6mF6lOmTNn7vKioiKKiomSeXqLsW4v1h/1eQgyFQlx787Us67GMyqGVVLapJHtDDlkdHas2\nIPYGopJ6SkpKKCkpCTqMxuSYWW/Cl+d+G+uxgK7ufpyZHQP8FRjQ3INEj3WbNhWxbVtRjGGJSKLF\nY6yzJpQENH4As+OAKe5+auT1dYDXrWkws6aUH0iKqp2hWtZjGX2G9uGNL95g9/rd4YsgfXPILjWm\nX3ET69ZVMH/+qKhFlecyZswCpk27PtgvIHFjZri7BR1HLTP7N+A/gTfd/ZdmNgC42d3PbsJn+wHP\n1Raqm9nzwDR3fy3yeglwHHARgLvfFNn+IjDZ3d+p55jfGOumToWtW+Gmm2L8oiKSVC0Z6+IxU/Uu\nMDAyOK0BzgV+FIfjSoqonaFaXrOcTTWbWLtqLXm5+VSt64AthY5rB9Mj7xTWrasIOlTJQO7+V8Iz\nSrWvvwT2m1BFWORR6+/AScBrZjYIaOPuG83sWeCRSK1WH2AgsKApJ+jUCVas2P9+ItL6xVxT5e7V\nwKXAy0CIcDHn4liPK6lj9pzZ5A7LZejYoVRtq6KyshL/3Om4fCAnDHuN0SPfoFu38OJ+0bVaLWkg\nKtJcZtbXzP5mZusjj6fNrG8TPvco8BYwyMyWm9kFwH3AADNbSLgr+/mw9w7DJ4FFwPPAxU2deldN\nlUjmiEtNlbu/CBwej2NJ6trddTc9R/bEFzjddnSjvF03KirWUlGxdm/T0Zb2zRKJwf2EE6B/i7z+\nSWTbyY19yN1/3MBbP21g/6nA1OYG16lT+PKfiKS/mGuqmnwi1VS1WqFQiEm3T2Jtv7UMzB1I1qIs\npl8znYqKiqjkaVxck6fabu+JOLbEJgVrqj5095H725bEeL4x1n30EZx3HnzySRDRiEhLtWSsU1Il\n9QqFQsyeMxuAI48/kse/eJxea3vRNasr409O7PIzd
e8grK5+QHcQppAUTKrmEp6Zeiyy6UfABe4+\nNqB4vjHW7dwJ3buHLwHmaGEwkVYjqEJ1STO1hem5w3LZ7bv58+N/5rZ/u43v//j7STl/dLd3gLKy\n8DYlVdKACwl3Rp8BOOE6qYlBBhQtLw969YKlS2HgwKCjEZFEUlIl31DbOmF5zXIOa3cY63LX0aem\nD58t+AxOCDq62OiSYnpy92XAmdHbzOxXhNcjTQmDB8OnnyqpEkl38eioLmkiFApx8eSLed/eZ1nO\nMua+O5e8nXl0z+6e1DgScQdh7SXF+fNHMX/+KCZNmqFu7+ntqqADiFabVIlIetNMVZqJZTbm3r/c\ny6e5n5IzKIeK3RX4emfNC2vo0qcL468Zn6iQ95GIOwh1STHjpEzNF4STqne1NKZI2lNSlUZiWSKm\ntjB9R/8d5LXNo3P7zuxetZuczTlM/+P0hBam16ewsFAJj8Qipe6KGTwYHn446ChEJNGUVKWRls7G\n1BamVx9aTUV5BRXLKyjIKiB7eTbjxzZ8p19rqlGaMGEc8+bNoKws/Lq2r5a0Xma2jfqTJwPaJzmc\nRunyn0hmUFKV4aLX9Cs4soB1y9bBR1BdXs3gvoP5+fk/r/dzrW3hZDUlTT/unh90DE3VsydUV4f/\noVNQEHTQ56HpAAAgAElEQVQ0IpIoSqrSSHNnY2bNmkXx9GI27d5EZZdKVm5YydEDjmZ9xXr6bejH\nLTfc0uAsVWusUdIlRQmK2dezVSeeGHQ0IpIoSqrSSHNmY2bNmsW//+e/s/vw3eS2zWXbjm10WteJ\n9ZvX039Xf6bfkPw6KpF0pqRKJP0pqUpBsdQqNWU2JhQKUTy9mN0jdlN1YBUVVJC/Pp+cD3Po178f\n028IL0FTXDy1wRhUoyTSPKqrEkl/SqpSTDJqlWbPmU32Adl07NaRje02YnuMqh1VHNDuAG654RYq\nKir2G4NqlESaZ/BgeP31oKMQkURSUpViklWr1KN/D1ZtXkXHNh2pXFtJu8/aMe3GaQwbNozi4qlN\nikE1SiJNp5kqkfSnpCqD1PaiWrtmLavLV9OrZy+y/plFzboapt04jbPOOivoEEXS1oABsHIl7N4N\n7doFHY2IJIKSqhSTqFql6EWSV3dZTdWmKs6wM/BDnR1ds3jrrUUcdNBBFBYWql5KJAFyc6F/f/j8\ncxg+POhoRCQRzD05jYfNzJN1rtauvkL1WBttTvvjNP5R+Q+qDqxizfY19N7amwEbBvD2a5v31k5V\nVz+wt3aqNTX2lOQyM9w9pZaBSSWNjXU/+AGcdx788IdJDkpEmq0lY52SqhTRWBJTt3g9OvnZn9pL\nfiVvlFDWv4zKQys58oAj2fr5VjbP2c2ebddE1U7NZcyYBUybdn3cv5+kDyVVjWtsrLv+eujQAX73\nuyQHJSLN1pKxTpf/UsD+7vhrafH6rFmzKL6tmKyhWeQeksuStUsY6SPZum0rlaFKBh40mEWLEvjF\nROQbBg+GOXOCjkJEEiUrlg+b2XQzW2xmH5rZ02bWKV6BZZLopKmgYCzZ2RP3zlq1VG0vqq2Dt7Kt\nzza+avMVA3sNpPPSzhyfezzTr5nOJZecT3X1A5SVzaWsbG6kdmpcXL6TSKozs3vNbJ2ZfRy1bbKZ\nrTSz0sjj1Kj3rjezJZEx75SWnFN3AIqkt1hnql4GrnP3GjO7Cbg+8pA4aknh+L0P3cum3ZvYs2wP\nHAB5eXlUraqi6MQiin9VvHc/9ZqSDHY/cCfwUJ3tt7n7bdEbzGwIcA4wBOgLvGJmhzW3puHww+Gz\nz8A9vHSNiKSXmJIqd38l6uXbwNmxhZOZ9pc0NbfRZigU4vm3n6fmiBp2tNuBlRo57XOoWV/D+EvH\nf2Nf9ZqSTOXub5hZv3reqi/dOQt43N2rgKVmtgQYBbzTnHN26QIdO8KqVdC3b/NjFpHUFs+aqguB\nx+N4vIzRlKSpqclPKBTi6slXs2fAHvYU7KFju45UbqzEPjKm3TxN6/mJ7N+lZvZT4D3g1+6+FegD\n/CNqn1WRbc1WewlQSZVI+tlvUmVmc4ADojcBDvzW3Z+L7PNboNLdH23sWFOmTNn7vKioiKKiouZH\nnKbiMWNUW5i+kY3s7r0bM6Ov96W6oJpTzz61VTX3VEuH1FFSUkJJSUnQYSTL3cDv3d3N7A/ArcC/\nN/cgjY11tUnVd78bc6wiEkfxGOtibqlgZhOBi4CT3L2ikf3UUiGBQqEQZ//H2WwdvpWKjhVsW72N\nzu0707NtT/rv6s/0a6a3mlmqWFpISOKlU0uFyOW/59x9RGPvmdl1gLv7tMh7LwKT3X2fy3/7G+tu\nvx2WLIG77orb1xCRBGjJWBfr3X+nAtcAZzaWUEli1V7y27R7E5VeSdYBWXTv2Z2sL7Lot6Ffq0qo\nIDF3Q4o0wIiqoTKzXlHv/SvwSeT5s8C5ZtbGzPoDA4EFLTmh7gAUSV+x1lTdCbQB5lj4Vpa33f3i\nmKOSJqu95Lep5yZ2j9jNzm076f5pdyzX6EY3brnhlnoTKl1ek0xnZo8CRUB3M1sOTAb+xcxGAjXA\nUuAXAO6+yMyeBBYBlcDFLZ16V1Ilkr7UUb0Vi77kV1NQw8ZdG8nblEe7he3o1q4b066tf5HkVL+8\nlurxZbp0uvyXCPsb62pqID8f1q4N/xSR1JT0y38SnFmzZvGDn/2AFWtWsKd6D1Wdq+iW1412O9rx\nrf7f4ul7nm6wMD3VL6/V3g05ZswCxoxZoIRK0kpWFgwaFO5XJSLpRcvUtEKzZs3ioikXsWvYLiq2\nVrBz4046fdWJNjVt6Ly+M7fcXf8lv9ZE/bMkndVeAjz66KAjEZF40kxVKzTz4Zlkj8qmx1E9yBqZ\nRVaHLKpeqqLzJ52ZdtX+e1FNmDBOy9OIBGjIEFi4MOgoRCTeVFPVCn3vh9/jg+4fwGFQXVXNrk92\nceCnB/K3B//W5BkqFapLS6mmqnFNGeveew/OPTfcWkHL1YikppaMdUqqWqG///3vnP+/58MgyNuR\nR82CGv5nyv+0quae0nopqWpcU8Y69/AlwIcegmOPTVJgItIsSqrSWCgUYvac2QB0G9GNV5e8ysZX\nNpLt2Uz66SQlVJI0Sqoa19Sx7sYbYf16uPPOJAQlIs2mpCpN1faiyhqaRYceHdi0bRMPn/8wJxx1\nQtChSQZSUtW4po51X34Jxx0XXlw5NzcJgYlIs6ilQhoKhUIUTy9m6+CtbO2zlc+qPqNXfi9ef+31\noEMTkRgMGAADB8KcOUFHIiLxoqQqxc2eM5vsA7LJzs9md5vd5Ofls3nD5qDDEpE4+MlP4C9/CToK\nEYkXJVWtQLf+3SgvL6dNWRtqVtRQs6iG8SePDzosEYnROefA88/Dtm1BRyIi8aCkKsWdfNLJrKte\nR/+u/en2ebcm96ISkdRXUADf/jb8/e9BRyIi8aBC9RRUe6dftVdT3r+cbm27UfNpDWbG+JPHM2zY\nMPWZksCo
UL1xzR3rnngC7rsPXkqdlaJEBN39lxZCoRDX3nwtOUNzWF69nMrVlTz2i8c4YvgRe/cp\nLS3l/PP/wKZN4TYK3brN4qGHfqfESpJCSVXjmjvW7dwJffrA4sXQq1cCAxORZtHdf61cKBTi6slX\n81X7ryjLL6NNQRsGHDSA5195/hv73XHHgyxdegq7dp3Grl2nsXTpKdxxx4NJjbW0tJTi4qkUF0+l\ntLQ0qecWSSd5eXDWWfD440FHIiKxUlKVImbNmsXZ/3E273/1PqtrVvPJik84MPdAsmzfP6LFi78C\netCmTfgBPSLbkqO0tJRJk2Ywf/4o5s8fxaRJM5RYicRAdwGKpAclVSkgFApRfFsxW4dvxQud7eXb\nyVmTw2fvfEZlqHKfO/0GDz4YeIg9e+ayZ89c4KHItuR44omXyM6eSEHBWAoKxpKdPXFvfZeINN+/\n/AusXg2ffhp0JCISCyVVKWD2nNlkDc0i+6BsqgdW0y2vG9kfZ9NvQz+mXzN9nzv9rrjiQvr1q6R9\n+ydp3/5J+vWr5IorLgwoehGJVXY2XHghTJsWdCQiEoucoAOQsO49urNy50o65nXEc5zO7Tpzyw23\n1Ns6obCwkIcf/kPU3X+/SGqR+oQJ45g3bwZlZeHX1dUPMGHClUk7v0g6uvZaGDIE3n47vHyNiLQ+\nuvsvBSz4cAHnPXAe+Z3y2b1hNzWLaph21bSUXiRZLR0yl+7+a1wsY91f/gJ//CO880549kpEghNY\nSwUz+zVwM1Dg7psa2EdJVT32VO/htn/cRt7OPHYt2gWwtxeVSCpKl6TKzO4FxgPr3H1Enff2GdPM\n7HrgQqAKuMLdX27guC0e69zDzUB/9jO46KIWHUJE4iSQpMrM+gL/CxwOfEtJVdPVeA0z35tJ+5z2\nTBw5EbNW//8pyQBplFSdCGwHHopOquob08xsCPAocAzQF3gFOKy+QS3Wse6DD+B73wv3reratcWH\nEZEYBdWnagZwTRyOk1Hcncc/eZyKqgp+euRPlVCJJJm7vwHUtzp5fWPaWcDj7l7l7kuBJcCoRMR1\n1FHwr/8K//VfiTi6iCRSTIXqZnYmsMLdFyopaJ6XvniJLzZ9wdUnXE1OVviPQXVKIsFqZEzrA/wj\n6vWqyLaEuPFGGDoU/v3f4cgjE3UWEYm3/SZVZjYHOCB6E+DA74DfACfXea9BU6ZM2fu8qKiIoqKi\npkeaRt5Z+Q6vLX2N4hOLaZ/bHvi6oWZ29kQA5s2bwcyZV8acWClRk1iVlJRQUlISdBgJZ2bt2XdM\na5FYx7ru3eGGG+Cyy+C110D/ZhVJvHiMdS2uqTKz4YTrCnYSTqb6Ev7X2yh3X1/P/hlbU1W7QDLA\n0GOH8tq217jq+Ks4MP/AvfsUF09l/vxRFBSMBaCsbC5jxixg2rTrW3zeuoladfUDcUnUJLOlS00V\ngJn1A55z9xGNjWmEC9Rx95sin3sRmOzu79RzzLiMddXVcOyxcMEFcMklMR9ORJqpJWNdiy//ufsn\nwN7lP83sK6DQ3eurUchYtQsk5w7LZVfNLv781z9z5zl3fiOhSpTozucAZWXhbUqqRPayyKPRMc3M\nngUeMbPbCF/2GwgsSGRg2dnh9QBPOCFcZ3XCCYk8m4jEQzw7qjv7ufyXie596F6Wli9lxRcrWGbL\nOKjnQXzy9if77Ddhwjiqqx+grGwuZWVzIw01xwFavFgkEczsUeAtYJCZLTezC+rssndMc/dFwJPA\nIuB54OJkTL0PHAj33w/nnANr1iT6bCISKzX/TKBQKMTZF5/NliFb2N11N22+asPwPsMZd9A4in9V\nvM/+9dU/xXIJT5f/JBHS6fJfIiRirPv97+Hll+HVV6FNm7geWkQaEFjzzyadKAOTqml/nMaLm17k\ng8oPyM3LJXtlNl0+6cLT9zzd5OaesdZaqVBd4k1JVeMSMdbV1MD3vw8HHwx33RXXQ4tIA5JaUyX7\n5+5sab+FQ3seSnZ5NtuztnPqmFOT2i29sLBQiZRIK5eVBQ8/DMccAw8+GO64LiKpR0lVArUb1o5d\n/7eLQdmDyMrNonJXJT+/9OfNOoYWLxYRgM6d4e9/h6Ii6N8fxowJOiIRqUuX/xJk3lfzmLd0Hmd0\nO4NX570KtHxNP13Ck1Siy3+NS/RY9+qrcO65MGsWHH98wk4jkvFUU5UiPljzAY998hjXjr6WgryC\noMMRiSslVY1Lxlj34otw/vnwf/8XviQoIvGnpCoFfLn5S/604E9cfuzl9OvSD0jeTJNmtCQZlFQ1\nLllj3XPPhZexeeklGDky4acTyThKqgK2bvs6bnnrFn428mcM7zkcSF5bA7VPkGRRUtW4ZI51Tz8N\nl14Kc+bA8OFJOaVIxmjJWBfP5p8ZrbyinDveuYOzBp+1N6ECuP32+1i5Mo8NGxaQm9uV7OyJe2eT\n4im6e3pBwdiEnUdEUsfZZ8Ntt8Epp8C77wYdjYjo7r84qKiq4K4Fd3Fc3+M48eAT924vLS3l5ZcX\ns23bxeza1YWyshkcfPDoACMVkXTzox9BXh6cdhr87//CWWcFHZFI5lJSFaMar+Ge9++hT34fxg8a\n/433nnjiJQoKrmDXroOAvlRVfZ8NG25lwoT4d+9T6wWRzHXWWdCnT/jn0qVwxRVBRySSmZRUxcDd\neXThozjOT0b8BLN9L7127NiBESMGsHr1enbu3M4ppxyTkDqnwsJCZs68MqpQXfVUIpnk6KPhrbfg\n9NPhiy9gxozwoswikjwqVI/B80uep3RNKVefcDXtctrt837QxeN17wYEdHegxEyF6o0LeqzbuhV+\n+MPwGoEPPQTduwcWikirprv/kuitFW8x+5+zKR5dTOd2nRvcL6g2B3UTuvLyWzFrR37+JYDuDpSW\nU1LVuFQY6yor4Te/gSeegL/8Rd3XRVpCa/8lyaINi3hm8TP8+vhfN5pQQXBr70XfDQiwbNmTwAn0\n71+7MHN4HyVVIuknNxduvhlOOgkmTIBJk+B3v9PlQJFEU0uFZlqxdQX3lt7LpKMn0Tu/d9DhiIg0\n6Hvfg9JSmD8fxo6FlSuDjkgkvSmpaoaNOzdy14K7OG/EeQzsNjDocBo1YcI4qqsfoKxsLmVlc+na\ndQXdus3a+zp8d+C4oMMUkQTr3Rtefjncy+qoo+BPf4Lq6qCjEklPqqlqoh17dnDzWzfz7YO/zdgB\nY4MOp0lUqC6JoJqqxqXyWLdoEfziF7BnD/z5z1reRqQxKlRPkMrqSm5/53b6de7Hvw37t6DDEQmU\nkqrGpfpYV1MD998fLmT/yU/ghhugY8egoxJJPYEsU2Nml5nZYjNbaGY3xXq8VOPu3P/h/XRu25kf\nDv1h0OGISJyY2b1mts7MPo7a9nsz+8jMPjSzV8ysb9R715vZksh4d0owUccuKwt+/nP45JPwDSuH\nHRa+JLhnT9CRibR+MSVVZlYEnAEc4e5HALfEI6hU8tSipyivKGfiy
In1NvcUkVbrfqBuYeF0dz/S\n3UcCs4DJAGY2FDgHGAJ8D7jbWvmA0KMHPPggPP88zJ4NgweH2y+o3kqk5WKdqfolcJO7VwG4e1ns\nIaWOuV/OJbQhxC+P/iW52blBhyMiceTubwCb62zbHvWyA7Ax8vxM4HF3r3L3pcASYFQy4ky0o46C\nF14IXxK8++5wndVf/wpVVUFHJtL6xJpUDQLGmNnbZjbPzI6OR1Cp4P3V7/PyFy9z2ajL6NCmQ9Dh\niEiSmNkfzGw5MBGYGtncB1gRtduqyLa08Z3vwJtvwn//N9x+e/iy4B13wPbt+/+siITtN6kyszlm\n9nHUY2Hk55mEm4d2dffjgGuBJxMdcDIs2biExz55jEtHXUr3PK3xIJJJ3P137n4w4cuDfww6nmQy\ngzPOgDfegMceg9dfh0MOgeuuCy/ULCKN229HdXc/uaH3zGwS8Exkv3fNrMbMurv7xvr2nzJlyt7n\nRUVFFBUVNTfehFuzbQ1/fv/PXHjUhRzU+aCgwxEJXElJCSUlJUGHEYRHgecjz1cB0QNC38i2erWG\nsW5/jjsufBnwyy/hzjvhmGPgyCPhwgvhBz+A9u2DjlAkvuIx1sXUUsHM/gPo4+6TzWwQMMfd+zWw\nb0rfZgywdfdWpr05jTMGncHxBx0fdDgiKSmdWiqY2SHAc5EbbTCzge7+eeT5ZcAod/9ppFD9EeBY\nwpf95gCH1TeotYaxriV274Znn4X77oN334VzzoFzz4UTT9TyN5Kekt6nysxygfuAkUAF8Gt3f62B\nfVN6oNldtZtb3rqFwt6FnHbYaUGHI5Ky0iWpMrNHgSKgO7CO8J1+pwOHA1XAl8Av3X19ZP/rgZ8D\nlcAV7v5yA8dN6bEuHpYvh4cfhqeegtWrwzNXP/xhuC4rV/f0SJpQ888Wqq6p5k/v/olu7btx3hHn\nqXWCSCPSJalKlFQe6xLhiy/g6afDj88/h+9+F8aNCz/6pFUpv2QaJVUt4O489NFDbNuzjYuPuZgs\n03KIIo1RUtW4VB3rkmHVqvA6gy++CK+8El538JRT4NvfhtGjoWfPoCMUaTolVS3w3GfPsXD9Qn59\n/K9pm9M26HBEUp6Sqsal6liXbNXV8N57MHdu+G7Ct96CXr3CNVgnnABHHw1Dh0LOfm+XEgmGkqpm\nemP5G7yw5AWKTyymU9tOQYfTInUXTdYiyZJoSqoal4pjXSqorg4vjfP66/D22/D+++HarBEj4Fvf\nCjcdHT4chg2D/PygoxVRUtUsn6z/hAc/fJCrT7iaAzoeEHQ4LVJaWsqkSTPIzp4IQHX1A8yceaUS\nK0koJVWNS7WxLpWVl8MHH4QTrI8+glAIFi8OL6EzfDgcfjgMGhRuRDpoEBx4YHjtQpFkaMlYl5ET\nr8u2LOP+D+7nklGXtNqECuCJJ14iO3siBQVjgfDiqE888ZKSKhFpFTp1Ct8x+J3vfL2tuhq++io8\nq/XZZ+H2DY8+Cv/8ZzgJ69cv3JA0+mffvuGi+AMPhLaq4pAAZVxSVbazjD+9+yd+euRPGdB1QNDh\niIhIlOxsGDgw/Khr+/ZwZ/fax7Jl4bqtlSvDRfJr10LnzuEE64AD9n107w4FBV//7NQp3EVeJF4y\nKqmq8Rr+tOBPnHbYaYzsNTLocGI2YcI45s2bQVlkGevq6geYMOHKYIMSEUmQjh3DlwWHD6///Zoa\n2LAhnGCtW/f1Y/Vq+PBD2LgxPKNf+3PXrnAS1qULdO0a/tmlSzjZ6tQp/F6nTuEar44dv3506BB+\n5OV9/bN9ezVBlQysqdqwYwM9OvQIOoy4UaG6JJtqqhqXKmOd7F9lJWzZAps3f/1z69bwZcby8q+f\nb9/+zce2bbBz59ePHTvCCVpuLrRrF06w2rcPP699tG379c82bb7+WfvIzQ0/op/n5obvjox+XvvI\nzm74Z+0jK2vf59E/G3qY7fvc7JvP63uvsUdrpEJ1EUk4JVWN01iXmdyhoiK8nM+uXeHH7t1fb4v+\nuWdP+FH7vKIinODVPvbsCf+sqvrm9urq8LboR3X119ujf9bUfP1e7evabbU/3b/eXneb+9evo7fV\n/mxoW91HXfUlW409r7ut7vaGttX3Xt04on/+7Gdw661191FSJSIJpqSqcRrrRPbVUMLV2PO62+pu\nb2hbfe/VjaXue23b7tvKQ3f/iYiISMppzZcBm0MdP0RERETiQEmViIiISBwoqRIRERGJAyVVIiIi\nInGgpEpEREQkDpRUiYiIiMSBkioRERGROFBSJSIiIhIHMSVVZnaMmS0wsw8iP4+OV2AiIolkZvea\n2Toz+zhq23QzW2xmH5rZ02bWKeq9681sSeT9U4KJWkRSWawzVdOB37n7UcBk4ObYQ0q8kpKSoEMA\nUicOUCz1SZU4ILViSSP3A+PqbHsZGObuI4ElwPUAZjYUOAcYAnwPuNssfftDt/a/b609fmj936G1\nx99SsSZVa4DOkeddgFUxHi8pUuUPO1XiAMVSn1SJA1IrlnTh7m8Am+tse8XdayIv3wb6Rp6fCTzu\n7lXuvpRwwjUqWbEmW2v/+9ba44fW/x1ae/wtFevaf9cBb5rZrYABJ8QekohISrgQeCzyvA/wj6j3\nVkW2iYjstd+kyszmAAdEbwIc+B1wGXCZu//dzH4I3AecnIhARUSSxcx+C1S6+2P73VlEJMLcveUf\nNit39+hCzq3u3rmBfVt+IhFJKe6eFvVEZtYPeM7dR0RtmwhcBJzk7hWRbdcB7u7TIq9fBCa7+zv1\nHFNjnUiaaO5YF+vlvyVm9h13f83MxgL/jFdgIiJJYJFH+IXZqcA1wJjahCriWeARM5tB+LLfQGBB\nfQfUWCeSuWJNqn4B/MnM2gC7gf+IPSQRkcQzs0eBIqC7mS0nfAfzb4A2wJzIzX1vu/vF7r7IzJ4E\nFgGVwMUeyzS/iKSlmC7/iYiIiEhY0juqm9llkeZ5C83spmSfv04svzazGjPrFmAMDTYbTNL5TzWz\nT83sn2ZWnMxz14mjr5m9amahyN+Ny4OKJRJPlpmVmtmzAcfR2cz+Gvk7EjKzYwOM5fpIDB+b2SOR\nGWqJkiq/T83RQBPUrmb2spl9ZmYvmVm9tbKpoKGxo7V8BzNra2bvRJpoh8zsvyPbW0X8teqOma0w\n/qVm9lFtM/PItmZ/h6QmVWZWBJwBHOHuRwC3JPP8dWLpS/hOxWVBxRBRb7PBZDCzLOAuwg0QhwE/\nMrPByTp/HVXAVe4+DDgeuCTAWACuIHypJ2i3A8+7+xDgSGBxEEFECrovAo6KFHXnAOcGEUuqSrHf\np+aorwnqdcAr7n448CpJHJdaoKGxo1V8h0jt3r9EmmiPAE4ys9G0kvij1B0zW1v8NUCRux/l7rU9\n6Jr9HZI9U/VL4CZ3rwJw97Iknz/aDMIFqYFqpNlgMowClrj7MnevBB4Hzkri+fdy97Xu/mHk+XbC\nyUMgfYAiCfdpwP8Gcf6o
ODoB33b3+wEijSfLAwqnHNgDdDCzHCAPWB1QLKkqZX6fmqO+JqiE434w\n8vxB4PtJDaoZGhg7+tK6vsPOyNO2hP+/vJlWFH8DY2ariT/C2DcnavZ3SHZSNQgYY2Zvm9k8C2it\nQDM7E1jh7guDOH8jLgReSOL5+gArol6vJAUaGprZIcBIYJ/b1ZOkNuEOuuCwP1BmZvdHptXvMbP2\nQQTi7puBW4HlhBtfbnH3V4KIJYWl5O9TC/V093UQTlqAngHH0yRRY8fbwAGt5TtELp19AKwFStx9\nEa0ofuofM1tT/BCOfY6ZvWtm/x7Z1uzvEOvdf/uwxpuF5gBd3f04MzsGeBIYEO8YmhDHb/hmk9KE\n3gLdSCy/dffnIvvUNht8NJGxpDoz6wg8BVwR+Vdnss9/OrDO3T+MXK4O8vb4HKAQuMTd3zOzPxKe\njp6c7EDMbABwJdAP2Ao8ZWY/zvS/rxkk6H9g7FfdscP27ReWst8hcrXiqMjs9EuRsadVxF/PmNmQ\nlIw/ymh3X2NmPYCXzewzWvBnEPekyt0b7KhuZpOAZyL7vRspEu/u7huTFYeZDQcOAT4yMyM8Tfy+\nmY1y9/XxjqOxWKJimkh46vSkRJy/EauAg6Ne9yXA9Rsjl5WeAh5291kBhTEaONPMTgPaA/lm9pC7\nnx9ALCsJz6i+F3n9FBBU8fPRwJvuvgnAzJ4hvCyVkqqvpdTvU4zWmdkB7r7OzHoBCRkb46WBsaNV\nfQcAdy83s+cJ/761lvjrGzMfBta2kvgBcPc1kZ8bzOzvhC/nN/vPINmX//5OJHEws0FAbiISqsa4\n+yfu3svdB7h7f8L/4zoqUQnV/tjXzQbPrNNsMBneBQaaWb/InVznEm5yGJT7gEXufntQAbj7b9z9\nYHcfQPi/x6sBJVREpp1XRH5XAMYSXPH8Z8BxZtYu8o+RsQRUNJ/CUu33qTm+0QSVcNwTI89/BgT1\nj5ymqm/saBXfwcwKau8qi1zePxn4gFYSfwNj5k+B52gF8QOYWV5kphMz6wCcAiykBX8GcZ+p2o/7\ngfvMbCFQAQTyP6s6nGAv8dxJPc0Gk3Fid682s0sJ34GYBdzr7kHdXTYaOA9YGKktcOA37v5iEPGk\nkMsJd/LOBb4ELggiCHf/yMweAt4HqgkP+vcEEUuqSqXfp+aw+pug3gT81cwuJHyH9DnBRdi4hsYO\nYMjnsCMAAAPrSURBVBrwZCv4Dr2BByP/WMkiPNs2N/JdWkP8DbmJ1hP/AcDfIpeMc4BH3P1lM3uP\nZn4HNf8UERERiYOkN/8UERERSUdKqkRERETiQEmViIiISBwoqRIRERGJAyVVIiIiInGgpEpEREQk\nDpRUiYhIyjOzbmb2QWQdzDVmtjLqdZN6LprZvWZ22H72udjMfhSfqOs9/g+iGvpKmlGfKhERaVXM\n7L+A7e5+Wz3vmafw/9giS7g8FeBSXJJAmqkSEZHWZu8qGGZ2qJmFzOwvZvYJ0MvM/mxmC8xsoZn9\nLmrf181shJllm9lmM5tqZh+a2ZtmVhDZ50Yzuzxq/6lm9o6ZLTaz4yLb88zsKTP7xMz+ambvmtmI\nfYI0uzkS24eR45xIeJ3X2yIzbAeb2UAzezFyjBIzGxj57MNmdreZvWdmn0aWNMPMhke+W2nkuIck\n7L+yNFuyl6kRERGJt8OBn7j7BwBmVuzuW8wsG5hnZk+5+6d1PtP5/7d3PyE2hWEcx79PjSjTzE6p\nKUmjkAZNkSzsNAs1CyJ2xEJRZmehrMmGFHYypUwmicnInw1WxmKKKRqymGah5H8NM34W9xmuM/di\n6tTMrd+nTr333Pd9z1k+Pc/beYAHko5FxGlgH3Cy1uaSNkbEdiotfLqAw8C4pB0ZTA0V10TEEqBL\n0pr83VLVMLlP0o28fx/YL+l1RGwGzgHbcps2SZ1ZLrwbESuAQ8ApSX3Zvmou26xZgYMqMzNrdKPT\nAVXam/3amqj01lsNFIOqr5Lu5HgI2FJn7/6qOctyvIVKbzskDUfEsxrr3gFTEXERGABuFidkI+VN\nwLXs/Qd/VpCu5jNeZF/GduAxcDwzVP2SRuu8t80Bl//MzKzRfZkeZPnsCLBVUgcwCCyqseZb1XiK\n+kmGif+YMyNbJGkS6ASuA93ArTrr3kraIGl9Xh3V2xTmSlJv7jcB3M6Sos0TDqrMzKzRVQc1LcBH\n4HNELOV3Ke1va2brEbALICLWAqtmbB7RDLRKGgB6gHX516d8RyS9B8YjojvXROFs1s68vxJoA15G\nxHJJrySdoZL9mnGWy+aOy39mZtbofmV0JD2NiBFgBHgDPKw1rzD+574FZ4FLeTD+eV4fCnNagf6I\nWEglgDua968AFyKih0rGaTdwPiJOAAuAXmA4545FxBNgMXBA0mRE7MlPPnwHxqic87J5wp9UMDMz\nm4U8AN8kaSLLjYNAu6QfJT7jMlUH2q0xOFNlZmY2O83AvaqPjh4sM6BKzng0IGeqzMzMzErgg+pm\nZmZmJXBQZWZmZlYCB1VmZmZmJXBQZWZmZlYCB1VmZmZmJXBQZWZmZlaCn3k/n05X32zbAAAAAElF\nTkSuQmCC\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f8047207c10>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "import tensorflow as tf\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "# Set up the data with a noisy linear relationship between X and Y.\n",
+ "num_examples = 50\n",
+ "X = np.array([np.linspace(-2, 4, num_examples), np.linspace(-6, 6, num_examples)])\n",
+ "# Add random noise (gaussian, mean 0, stdev 1)\n",
+ "X += np.random.randn(2, num_examples)\n",
+ "# Split into x and y\n",
+ "x, y = X\n",
+ "# Add the bias node which always has a value of 1\n",
+ "x_with_bias = np.array([(1., a) for a in x]).astype(np.float32)\n",
+ "\n",
+ "# Keep track of the loss at each iteration so we can chart it later\n",
+ "losses = []\n",
+ "# How many iterations to run our training\n",
+ "training_steps = 50\n",
+ "# The learning rate. Also known has the step size. This changes how far\n",
+ "# we move down the gradient toward lower error at each step. Too large\n",
+ "# jumps risk inaccuracy, too small slow the learning.\n",
+ "mu = 0.002\n",
+ "\n",
+ "# In TensorFlow, we need to run everything in the context of a session.\n",
+ "with tf.Session() as sess:\n",
+ " # Set up all the tensors.\n",
+ " # Our input layer is the x value and the bias node.\n",
+ " input = tf.constant(x_with_bias)\n",
+ " # Our target is the y values. They need to be massaged to the right shape.\n",
+ " target = tf.constant(np.transpose([y]).astype(np.float32))\n",
+ " # Weights are a variable. They change every time through the loop.\n",
+ " # Weights are initialized to random values (gaussian, mean 0, stdev 1)\n",
+ " weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))\n",
+ "\n",
+ " # Initialize all the variables defined above.\n",
+ " tf.initialize_all_variables().run()\n",
+ " \n",
+ " # Set up all operations that will run in the loop.\n",
+ " # For all x values, generate our estimate on all y given our current\n",
+ " # weights. So, this is computing y = w2 * x + w1 * bias\n",
+ " yhat = tf.matmul(input, weights)\n",
+ " # Compute the error, which is just the difference between our \n",
+ " # estimate of y and what y actually is.\n",
+ " yerror = tf.sub(yhat, target)\n",
+ " # We are going to minimize the L2 loss. The L2 loss is the sum of the\n",
+ " # squared error for all our estimates of y. This penalizes large errors\n",
+ " # a lot, but small errors only a little.\n",
+ " loss = tf.reduce_mean(tf.nn.l2_loss(yerror))\n",
+ "\n",
+ " # Perform gradient descent. \n",
+ " # This essentially just updates weights, like weights += grads * mu\n",
+ " # using the partial derivative of the loss with respect to the\n",
+ " # weights. It's the direction we want to go to move toward lower error.\n",
+ " update_weights = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)\n",
+ " \n",
+ " # At this point, we've defined all our tensors and run our initialization\n",
+ " # operations. We've also set up the operations that will repeatedly be run\n",
+ " # inside the training loop. All the training loop is going to do is \n",
+ " # repeatedly call run, inducing the gradient descent operation, which has the effect of\n",
+ " # repeatedly changing weights by a small amount in the direction (the\n",
+ " # partial derivative or gradient) that will reduce the error (the L2 loss).\n",
+ " for _ in range(training_steps):\n",
+ " # Repeatedly run the operations, updating the TensorFlow variable.\n",
+ " sess.run(update_weights)\n",
+ " \n",
+ " # Here, we're keeping a history of the losses to plot later\n",
+ " # so we can see the change in loss as training progresses.\n",
+ " losses.append(loss.eval())\n",
+ "\n",
+ " # Training is done, get the final values for the charts\n",
+ " betas = weights.eval()\n",
+ " yhat = yhat.eval()\n",
+ "\n",
+ "# Show the results.\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "plt.subplots_adjust(wspace=.3)\n",
+ "fig.set_size_inches(10, 4)\n",
+ "ax1.scatter(x, y, alpha=.7)\n",
+ "ax1.scatter(x, np.transpose(yhat)[0], c=\"g\", alpha=.6)\n",
+ "line_x_range = (-4, 6)\n",
+ "ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], \"g\", alpha=0.6)\n",
+ "ax2.plot(range(0, training_steps), losses)\n",
+ "ax2.set_ylabel(\"Loss\")\n",
+ "ax2.set_xlabel(\"Training steps\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "lSWT9YsLP1de"
+ },
+ "source": [
+ "This version of the code has a lot more comments at each step. Read through the code and the comments.\n",
+ "\n",
+ "The core piece is the loop, which contains a single `run` call. `run` executes the operations necessary for the `GradientDescentOptimizer` operation. That includes several other operations, all of which are also executed each time through the loop. The `GradientDescentOptimizer` execution has a side effect of assigning to weights, so the variable weights changes each time in the loop.\n",
+ "\n",
+ "The result is that, in each iteration of the loop, the code processes the entire input data set, generates all the estimates $\\hat{y}$ for each $x$ given the current weights $w_i$, finds all the errors and L2 losses $(\\hat{y} - y)^2$, and then changes the weights $w_i$ by a small amount in the direction of that will reduce the L2 loss.\n",
+ "\n",
+ "After many iterations of the loop, the amount we are changing the weights gets smaller and smaller, and the loss gets smaller and smaller, as we narrow in on near optimal values for the weights. By the end of the loop, we should be near the lowest possible values for the L2 loss, and near the best possible weights we could have."
+ ]
+ },
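+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One way to write down what each iteration does (a compact summary, with $X$ standing for the input matrix that includes the bias column, $w$ for the weights vector, and $\\mu$ for the learning rate):\n",
+ "\n",
+ "$$\\hat{y} = X w, \\qquad w \\leftarrow w - \\mu \\, X^\\top (\\hat{y} - y)$$\n",
+ "\n",
+ "The $X^\\top(\\hat{y} - y)$ term is the partial derivative of the loss with respect to the weights (assuming the half-sum-of-squares form that `l2_loss` computes), so each step is exactly the small move toward lower error described above."
+ ]
+ },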
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "dFOk7ERATLk2"
+ },
+ "source": [
+ "## The details\n",
+ "\n",
+ "This code works, but there are still a few black boxes that are worth diving into here. `l2_loss`? `GradientDescentOptimizer`? What exactly are those doing?\n",
+ "\n",
+ "One way to understand exactly what those are doing is to do the same thing without using those functions. Here is equivalent code that calculates the gradients (derivatives), L2 loss (sum squared error), and `GradientDescentOptimizer` from scratch without using those functions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 657,
+ "status": "ok",
+ "timestamp": 1446499870301,
+ "user": {
+ "color": "",
+ "displayName": "",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "",
+ "photoUrl": "",
+ "sessionId": "0",
+ "userId": ""
+ },
+ "user_tz": 480
+ },
+ "id": "_geHN4sPTeRk",
+ "outputId": "85c49bf6-8d07-401a-ae08-79c6933adff5"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlUAAAEPCAYAAABr+zG+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8VPXZ///XlYUlLEEIIIsCCqKAqCmgFkvTUkUrLnft\nXay2va12obZK9RZTWr8/oO19K2pLXWqxLWqxWnGrWEpVVCK3K2pcBxSs7MgSdgiELNfvj5ngGJKQ\nZCZzZnk/H4/zyJkzZ865RuDjlc/5fK6PuTsiIiIiEpusoAMQERERSQdKqkRERETiQEmViIiISBwo\nqRIRERGJAyVVIiIiInGgpEpEREQkDuKSVJnZFDMLmdm7ZvaAmbWJx3VFRFqLmc02s01m9m7UseFm\n9rKZvWNm88ysY9R7U8xshZktM7OzgolaRJJZzEmVmfUDvg+c4u7DgRzg4livKyLSyu4FxtU59mfg\nenc/Cfg7cD2AmQ0BvgGcAJwD3GVmlsBYRSQFxKOnahdwAOhgZjlAHrAhDtcVEWk17v4isL3O4UGR\n4wDPAhdF9s8HHnL3KndfBawARiUkUBFJGTEnVe6+HfgNsAZYD+xw92djva6ISABCZnZ+ZP8bQN/I\nfh9gbdR56yPHREQOisfjv2OAa4B+QG+go5ldEut1RUQCcDnwYzN7HehAuBdeRKRJcuJwjRHAS+6+\nDcDMHgc+DzwYfZKZaZFBkTTh7mk5nsjdlxMZZ2Vmg4BzI2+tB46KOrVv5Ngh1NaJpI/mtnXxGFP1\nIXCambWLDNwcCyxrILik2KZOnRp4DMkUh2JJ7jiSLZY0Y5Et/MKse+RnFnADMCvy1pPAxWbWxswG\nAAOBJQ1dNOg/o3T6+5aJ8afDd0j1+N1b1tbF3FPl7u+Y2RzgTaAaeAv4Y6zXFRFpTWb2IFAEdDOz\nNcBUoJOZ/Rhw4HF3vw/A3Zea2cPAUqASuNJb2uqKSNqKx+M/3P0W4JZ4XEtEJBHcvaGxn7c3cP6N\nwI2tF5GIpLqMrKheVFQUdAhA8sQBiqU+yRIHJFcskv5S/e9bqscPqf8dUj3+lrJE9WCbmXrLRdKA\nmeFpOlA9HtTWiaSHlrR1GdlTJSIiIhJvSqpERERE4kBJlUiSq6yubPH0XhERSRwlVSJJrKqmijuX\n3MmS9Q2WRBIRkSShpEokSbk7c96ZQ7ucdozsMzLocERE5DCUVIkkqSc+eIIte7fwvcLvkWX6pyoi\nkuzUUoskoZJVJZR+UsqPR/2Y3OzcoMMREZEmUFIlkmTe2fgOC1YsYNJpk+jYpmPQ4YiISBMpqRJJ\nIh9v/5j7372fH4/8MQV5BUGHIyIizaCkSiRJbN67mT+8/gcuO/ky+nXpF3Q4IiLSTEqqRJLA7ord\n3PbqbZw/+HyG9RgWdDgiItICSqpEAlZRVcGdS+7k1L6n8oV+Xwg6HBERaSElVSIBqvEa/lz6Z3p3\n6s15x50XdDgSJzU1QUcgIkFQUiUSEHfnwfcepKqmim8N/xZmzVoMXZLY7t1BRyAiQVBSJRKQf330\nL1btWMUPR/yQ7KzsoMORONqxI+gIRCQISqpEAvDK2ld4cc2LXDXqKtrltAs6HIkzJVUimUlJlUiC\nLduyjMeWPcZVo64iv11+0OFIK1BSJZKZ4pJUmVm+mT1iZsvMLGRmp8bjuiLpZu3Otcx+azYTR0yk\nV6deQYcjrURJlUhmyonTdW4DFrj7f5pZDpAXp+uKHKK0tJS5c58GYMKEcRQWFgYcUdNsLd/KnUvu\n5JITL2Fg14FBhyOtaOfOoCMQkSDEnFSZWWfgC+5+GYC7VwG7Yr2uSH1KS0uZOHEm2dmXAbBo0Uxm\nzbom6ROr8spy7lhyB2cdexaFvZI7VomdeqpEMlM8Hv8NAMrM7F4zKzWzP5pZ+zhcV+QQc+c+TXb2\nZRQUjKWgYCzZ2Zcd7LVKVpXVldz1+l0M7T6UsceMDTocSQAlVSKZKR5JVQ5QCPze3QuBcuBncbiu\nSMpzd+57+z7y2+bz9SFfDzociWJms81sk5m9G3VspJktMbO3Ij9HRL03xcxWRMaOntXYtZVUiWSm\neIypWgesdfc3Iq8fBYrrO3HatGkH94uKiigqKorD7SWTTJgwjkWLZlJWFn5dXX0fEyZcE2xQdYRC\nIeYvnA9A7gm51HSqYdKpk1K2uGdJSQklJSVBh9Ea7gXuAOZEHbsZuMHdnzGzc4BbgC+Z2RDgG8AJ\nQF/gWTMb5O5e34WVVIlkppiTKnffZGZrzew4d18OjAWW1ndudFIl0hKFhYVMmnQ2t99+GwCTJk2I\n+3iqWAbCh0Ihrr/lenKH5rKlegub/7WZB7/7ILnZuXGNMZHq/gI0ffr04IKJI3d/0cz61Tn8CVBb\n56ILsD6yfz7wUGTM6CozWwGMAl6r79pKqkQyU7xm/10NPGBmucDHwHfjdF2RzygtLeW2254iO3sS\nALfddh8nnHDCwcQn1pmBsQ6En79wPrlDc8k+Opu92/cyKGsQzy96npEnjWxWHBKYnwEvmdlvAAM+\nHzneB3gl6rz1kWP10uw/kcwUl6TK3d8B9H8NaXXRA9UBysrCxwoLC+MyM7Cx6zfVnpo9bNm2hWE9\nhrF3994mf06SwmzgKnd/wsy+DtwDnNnciyxdOo3ajnkNdRBJDfEY6hCvniqRwMUjIYrVqWecyt0P\n3k1/+rN3z14qQ5WMnzw+YfeXmJ3q7mcCuPujZvbnyPH1wFFR5/Xl00eDh8jL+zSpEpHUEI+hDkqq\nJKW09kD1WK6/c/9Ontn5DL+68FesK10HwPjJ4xk6dGjc4pO4s8hWa4WZfdHdXzCzscCKyPEnCQ9x\nmEn4sd9AYElDF9WYKpHMZA1MXon/jcwamigj0iwNjZuq+/ivuvq+FhUGbcm4rP1V+7n15Vsp7FXI\nVwd9tVn3SzVmhrun5lTGKGb2IFAEdAM2AVOBd4G7gDbAfuBKd38rcv4U4AqgEpjk7s80cF3PznYq\nKyFFJ3yKCC1r65RUSVoJYgmb6ppqfv/67+naviuXnnhpypZOaKp0Sapai5l5hw7Oxo3QsWPQ0YhI\nSympEmlF9SVs7s6cd+aw+8Burhx5JVmWlbJrEzaVkqrGmZn36eO8+ir07Rt0NCLSUi1p6zSmSqQJ\nGppZuL7Dejbs3sC1p197MKFKxbUJJb66dAmPq1JSJZJZlFSJNEF9Mwt/88jdDDizO8Wji2mb07bB\n8xI9A1GCl5+vweoimSgea/+JZJzyjitZkb2Mq0+9mk5tOwUdjiSZ2p4qEcks6qkSaYIJE8axYMFU\n1m99iKoOuzkw5HXuGTeTHh16HHJesq9NKK1PSZVIZlJSJVKP6MHmI0YM5JU3XmF33nvUnPwe+wp2\n03/j0Rzb9dhDPldYWMisWddEDVTXeKpM1KWLlqoRyURKqkTqiB5sXl6+irsf/yldB7Zl3yn72N91\nP4X9Csnvnc/8hfPrLexZWFioRCrDq
<remainder of base64-encoded PNG omitted: scatter plot of the data with the fitted line, and the loss versus training steps curve>\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f8040158550>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#@test {\"output\": \"ignore\"}\n",
+ "\n",
+ "# Use the same input data and parameters as the examples above.\n",
+ "# We're going to build up a list of the errors over time as we train to display later.\n",
+ "losses = []\n",
+ "\n",
+ "with tf.Session() as sess:\n",
+ " # Set up all the tensors.\n",
+ " # The input is the x values with the bias appended on to each x.\n",
+ " input = tf.constant(x_with_bias)\n",
+ " # We're trying to find the best fit for the target y values.\n",
+ " target = tf.constant(np.transpose([y]).astype(np.float32))\n",
+ " # Let's set up the weights randomly\n",
+ " weights = tf.Variable(tf.random_normal([2, 1], 0, 0.1))\n",
+ "\n",
+ " tf.initialize_all_variables().run()\n",
+ " \n",
+ " # mu is the learning rate (step size), so how much we jump from the current spot\n",
+ " mu = 0.002\n",
+ " \n",
+ " # The operations in the operation graph.\n",
+ " # Compute the predicted y values given our current weights\n",
+ " yhat = tf.matmul(input, weights)\n",
+ " # How much does this differ from the actual y?\n",
+ " yerror = tf.sub(yhat, target)\n",
+ " # Change the weights by subtracting derivative with respect to that weight\n",
+ " loss = 0.5 * tf.reduce_sum(tf.mul(yerror, yerror))\n",
+ " gradient = tf.reduce_sum(tf.transpose(tf.mul(input, yerror)), 1, keep_dims=True)\n",
+ " update_weights = tf.assign_sub(weights, mu * gradient)\n",
+ " \n",
+ " # Repeatedly run the operation graph over the training data and weights.\n",
+ " for _ in range(training_steps):\n",
+ " sess.run(update_weights)\n",
+ "\n",
+ " # Here, we're keeping a history of the losses to plot later\n",
+ " # so we can see the change in loss as training progresses.\n",
+ " losses.append(loss.eval())\n",
+ "\n",
+ " # Training is done, compute final values for the graph.\n",
+ " betas = weights.eval()\n",
+ " yhat = yhat.eval()\n",
+ "\n",
+ "# Show the results.\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "plt.subplots_adjust(wspace=.3)\n",
+ "fig.set_size_inches(10, 4)\n",
+ "ax1.scatter(x, y, alpha=.7)\n",
+ "ax1.scatter(x, np.transpose(yhat)[0], c=\"g\", alpha=.6)\n",
+ "line_x_range = (-4, 6)\n",
+ "ax1.plot(line_x_range, [betas[0] + a * betas[1] for a in line_x_range], \"g\", alpha=0.6)\n",
+ "ax2.plot(range(0, training_steps), losses)\n",
+ "ax2.set_ylabel(\"Loss\")\n",
+ "ax2.set_xlabel(\"Training steps\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "TzIETgHwTexL"
+ },
+ "source": [
+ "This code looks very similar to the code above, but without using `l2_loss` or `GradientDescentOptimizer`. Let's look at exactly what it is doing instead.\n",
+ "\n",
+ "This code is the key difference:\n",
+ "\n",
+ ">`loss = 0.5 * tf.reduce_sum(tf.mul(yerror, yerror))`\n",
+ "\n",
+ ">`gradient = tf.reduce_sum(tf.transpose(tf.mul(input, yerror)), 1, keep_dims=True)`\n",
+ "\n",
+ ">`update_weights = tf.assign_sub(weights, mu * gradient)`\n",
+ "\n",
+ "The first line calculates the L2 loss manually. It's the same as `l2_loss(yerror)`, which is half of the sum of the squared error, so $\\frac{1}{2} \\sum (\\hat{y} - y)^2$. With this code, you can see exactly what the `l2_loss` operation does. It's the total of all the squared differences between the target and our estimates. And minimizing the L2 loss will minimize how much our estimates of $y$ differ from the true values of $y$.\n",
+ "\n",
+ "The second line calculates $\\sum{x_i (\\hat{y} - y)}$. What is that? It's the partial derivative of the L2 loss, the same thing as what `gradients(loss, weights)` does in the earlier code. Not sure about that? Let's look at it in more detail. The gradient calculation is going to get the partial derivatives of loss with respect to each of the weights so we can change those weights in the direction that will reduce the loss. L2 loss is $\\frac{1}{2} \\sum (\\hat{y} - y)^2$, where $\\hat{y} = w_2 x + w_1$. So, using the chain rule and substituting in for $\\hat{y}$ in the derivative, $\\frac{\\partial}{\\partial w_i} = \\sum{(\\hat{y} - y)\\, x_i}$. `GradientDescentOptimizer` does these calculations automatically for you based on the graph structure.\n",
+ "\n",
+ "The third line is equivalent to `weights -= mu * gradient`, so it subtracts a constant the gradient after scaling by the learning rate (to avoid jumping too far each time, which risks moving in the wrong direction). It's also the same thing that `GradientDescentOptimizer(learning_rate).minimize(loss)` does in the earlier code. Gradient descient updates its first parameter based on the values in the second after scaling by the third, so it's equivalent to the `assign_sub(weights, mu * gradient)`.\n",
+ "\n",
+ "Hopefully, this other code gives you a better understanding of what the operations we used previously are actually doing. In practice, you'll want to use those high level operators most of the time rather than calculating things yourself. For this toy example and simple network, it's not too bad to compute and apply the gradients yourself from scratch, but things get more complicated with larger networks."
+ ]
+ }
+ ],
+ "metadata": {
+ "colabVersion": "0.3.2",
+ "colab_default_view": {},
+ "colab_views": {},
+ "kernelspec": {
+ "display_name": "Python 2",
+ "language": "python",
+ "name": "python2"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
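As a quick sanity check on the manual gradient derivation explained in the notebook above, the same computation can be reproduced outside TensorFlow. The following sketch is not part of the original notebook: the toy data is made up, and the names (x_with_bias, target, weights, mu) merely mirror the notebook's variables. It computes the L2 loss 0.5 * sum((yhat - y)^2), its gradient x_with_bias^T (yhat - y), and the update weights -= mu * gradient, which is what the assign_sub line does.

import numpy as np

# Toy data shaped like the notebook's: each row of x_with_bias is [x, 1].
x = np.linspace(-3, 3, 20)
x_with_bias = np.column_stack([x, np.ones_like(x)]).astype(np.float32)
target = (2.0 * x + 1.0).reshape(-1, 1).astype(np.float32)  # true line: slope 2, bias 1

weights = np.zeros((2, 1), dtype=np.float32)
mu = 0.002

for _ in range(2000):
    yhat = x_with_bias.dot(weights)   # predictions, like tf.matmul(input, weights)
    yerror = yhat - target            # like tf.sub(yhat, target)
    loss = 0.5 * np.sum(yerror ** 2)  # the manual L2 loss
    # Gradient of the loss with respect to the weights: x_with_bias^T (yhat - target),
    # the same quantity the notebook builds with mul, transpose and reduce_sum.
    gradient = x_with_bias.T.dot(yerror)
    weights -= mu * gradient          # like tf.assign_sub(weights, mu * gradient)

print loss, weights.ravel()  # the weights should end up close to [2, 1]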
diff --git a/tensorflow/tools/docker/notebooks/3_mnist_from_scratch.ipynb b/tensorflow/tools/docker/notebooks/3_mnist_from_scratch.ipynb
new file mode 100644
index 0000000000..b51983b71c
--- /dev/null
+++ b/tensorflow/tools/docker/notebooks/3_mnist_from_scratch.ipynb
@@ -0,0 +1,1985 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "9yupXUk1DKOe"
+ },
+ "source": [
+ "# MNIST from scratch\n",
+ "\n",
+ "This notebook walks through an example of training a TensorFlow model to do digit classification using the [MNIST data set](http://yann.lecun.com/exdb/mnist/). MNIST is a labeled set of images of handwritten digits.\n",
+ "\n",
+ "An example follows."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAMYAAABFCAYAAAARv5krAAAYl0lEQVR4Ae3dV4wc1bYG4D3YYJuc\nc8455yCSSIYrBAi4EjriAZHECyAk3rAID1gCIXGRgIvASIQr8UTmgDA5imByPpicTcYGY+yrbx+t\nOUWpu2e6u7qnZ7qXVFPVVbv2Xutfce+q7hlasmTJktSAXrnn8vR/3/xXmnnadg1aTfxL3/7rwfSP\nmT+kf/7vf098YRtK+FnaZaf/SS++OjNNathufF9caiT2v/xxqbTGki/SXyM1nODXv/r8+7Tb+r+l\nnxZNcEFHEG/e3LnpoINXSh/PWzxCy/F9eWjOnDlLrr/++jR16tQakgylqdOWTZOGFqX5C/5IjXNL\njdt7/NTvv/+eTjnllLT//vunr776Kl100UVpueWWq8n10lOmpSmTU5o/f0Fa3DDH1ry9p0/++eef\naZ999slYYPS0005LK664Yk2eJ02ekqZNnZx+XzA/LfprYgGxePHitOqqq6YZM2akyfPmzUvXXXdd\nHceoic2EOckxDj300CzPggUL0g033NC3OKy00krDer3pppv6FgcBIjvGUkv9u5paZZVVhoHpl4Mv\nv/wyhfxDQ0NZ7H7EQbacPHny39Tejzj88ccfacqUKRmHEecYf0Nr8GGAQJ8gMHCMPlH0QMzmEBg4\nRnN4DVr3CQIDx+gTRQ/EbA6BgWM0h9egdZ8g8PeliD4RutfF/Ouvfz9OtZy8aNGiNH/+/GGWl112\n2XzseYuVNKtqsaI23Ghw0DYCA8doG8JqO+AUG2+8cVq4cGHaY4890vLLL5/WXXfdfI6jvPDCC3lJ\n8amnnkoezP3000/pl19+GThHtWpIPekYomTxFS7HnkqKjMsss0yGgFE4r62tSBFVJ02aNPyconi9\nV4/JwzHwT9ZNNtkkeZ6w5ZZbph133DH99ttv6ccff8zXX3nllcRRnHNfv2cNGMQWGRaOrWbUrjsG\nBRLAA6U4Lhoqw9h2223ztRBq6aWXzsbgvueffz4Lu9NOO2UnYTgrr7xy7tO9nOH111/Pbb744ov0\nww8/jAvngAdFMvQDDjggG/0GG2yQX1GZNm1aziCCwzrrrJPl3muvvXKwePnll9M333wzHDCKWPbL\nMbuAkfISjnvvvXcW/emnn85lqCBqa4a65hiYR/Gk2RNGRlwm3n7ggQfmdrKD9sqJtdZaKxvCnDlz\n8n3Tp09PXmPYeuutc0SVNQjvnmuvvTa3efzxx9N33303PGZ5rF75DBvvqq233nrp22+/TWeddVby\nikpgxCE4vQDhlQUBRfDw2esbs2fPTquvvnqviNN1PuIdJ4GErVx44YUZowsuuCB9+umn6eeff84B\nspmsWqljhPFDxjGGYx/lDkN33udajCoVlAjRzl4U8LjefRwnPjsXG8OJqKBd8NB1LTU5IHyCd7LJ\nGOYXNoGjFqaGIKtrERDIDKtukfGMH/zRZa1A101+YBF44KfMYzO8VOYYjDWiukiGqc022yyXOUqd\nzTffPJ/z1ialeqNVxA9gi0wzlOJ5juJlR8JeddVV+ZrIKTq4ZvJp/8EHH+SU+txzz+W2SqmxVFZR\nplrH5DTRXmGFFdKuu+6azjjjjOzosl5g6D54CQCI4mGjhNQO5occckh2LvLTA6fqJOEnyhU6kNlk\nZmUuvrtNcFx77bUzhsZWXgoSsm6t4Dsa/tp2DErCmA04HAI4FLjaaqtlBhmnSKiNY4rDtHZFB6jF\nMMH0RVDH+nCPYxtDCFJnKkniRbDitWjTK3sykQUuMLPn3DZGX8SFnCG/fVyz5zCCBtIHTLshdzif\n8fERn8cKXxjCNOwCTu3Qf6yqhV4AQokiP489//zzM0DxnQYKwqAtIkko1kQzFFxvaNcJ6u3Pe+65\nJ/cRRvDee+9lA2BInIyRff/997nNO++8k7t0vl2A6vHWynmyiPJ43WKLLbIijz/++LTddtvlTCdz\nwIWSg9yjxBJ0GN/DDz+c7zv77LOzbEceeWSekwVGgsOsWbNyNo0+qt7DfPvtt8/dmtvIGnPnzk3P\nPPPMsJ6rHrNef/BBeJA90RprrJEDcNhctMkXR/mnbccwuCjNGTbaaKMc8TBZprITxOdgOvbuKxqG\nz6LSJ598kseJ9Gi1CYmSv/76a3YyJZWMZJ6Ceskp8EMusihFEAyUmVaa8G2rxTNHIrd733///eH7\nYeaLNe5xrEzlWNF/HqQDf0Tm+GIbvYdD43MsKAIo/JDgE0G5aFfN8NaWYxiUshikqGYTTUSt0TCk\njXsYNqJQQso+rgGa0vX58ccf56hQTtk+48F92rmvlnE1A0on2uKP0Yrw+Nxzzz0zn+ZhjKwRXq6v\nueaa2TmUiRQfS7SyNeMks9IV9vrvJOl/q622yo4Mfw5Pvm6TMclLdit6shh+YAMnq1E29tEsteUY\nBgMSgxa5MOAzJZcVXQs4bUR8XxhCHIwzMALCBuCcx5q0tF3u133l8XrRMchFiRYNyMxBKM/5IjZl\nWVzjULKwACISytIWFsi56aab5mvOKyEikmdAO/iHY+BDCRUZuoPD1e1akECyLseA7d13352DhdKa\nk8Cmlt3U7TSl9p58FwejYK8ncAwKpDTnGDcARbWiAUjHiNEHsITSPlagpEZChcfrZzwSOfBOiQwX\nLuR3PjAhtwAD08iAMCO/a+5xPTIm3ALjwERf0V+c69QeT7ZujVdLDhgKBrANXAMreMESRkU7rdVP\nrXNtZ4xIpSLH1VdfnR3j4IMPzkbw2Wefpa+//jovo5188slZsZjArAcvFP3YY4+lSy+9NEdTdTTy\n0I5xHHfccfm1CH2LtuORKEqmkwVlVU+sBY+IdJRmE0zeeOONnEXuu+++7AhnnnlmWn/99XMJ5brt\nzTffzHMJx/o555xzkgdb0U8rRtAKrnTYqtG1Ml6teyxInHDCCdlGYByBmG2Z97ChVvFo2zEwbHCR\nTbqP7EDxPjN2pUBEe86AXAcsg+f10TYMSTvnRM1ulQe1wG/nHEXZZEJZUIYQ5cgWMsEgMgqclFdk\ndh+MbFFyuddnWMLNfTYkcuuXHlBkpFYNI3dS+mMMfCHHsZWadfUjmQVn8iLywscG21apMscQwR55\n5JEM3KuvvpoZ5LHOmzgjAvBwzFt2/Oijj3Lm4Ayin/MU/eGHH+b2N998c/5MGSaZ44nw7OEd5Rx7\n7LE5+1EehYXxkpes5li2K6+8Mhv8Lrvsko381ltvzcEBfvHQKh5auk9GPvHEE3NJAx+/eKL/HXbY\nIQcbK3nwN067xAk4s5VHdbvsx0nxrYQeKxJMZAfBA7GlRx99NC9EtCN7JY4RoPBeAHIAyrB3jpHY\nwqu1d02d7HpZcfqINo5dL7eJMXtxTzk2sgWFM/gcsnCakI2cFOk+523O+Qw7WaeYHYpYRp9xn4Bk\nbPdWSfgJXYYM+ne+2xRj2sdx8EDu8
rm4Ntp9pY4RSmb0CIPOAVNGoLA47yU4S2xen37ppZdy9CkL\nE/3lm8bJHzJbbiavt2Q9p7AkK7oyXAZOLk7gs9c4PJC0AOE8DDyrgJkaWgYQkSPYuAdpWySfteU8\nHhqKouYq+io6ZfGeZo7xpbT1+jt+jGULfprpq922ePHMBibwjWVq523KVrzBsIzTaMeu1DFi0HI0\nYyyYtAekY5MltbRyihFJiROBKIYTwMCTWJNubwdQFCXFapK9z96mtbjgs3thFKWnUgjBzNZIya5F\nOyUcPG36q4LwRgZ6Ix8HtBk3tirGGU0feAkslHfk5PzBh2cXSkvtWqWOOEaRGcoSHdXDMoYn1tK8\nyaON0ahbCWgFS/vxSnjn5F4ItLeiFAGAzCKc7MDA1OlIjc4pLFKE7FEyxb5ZPNTbtuiv2fvrtddf\nOFsYXcwj8d8qv/XGq3femLvvvnvOvrIYPPEjG+PDseDbDnXcMXiyiGiyyACOPvrovN95552zV3/+\n+ef5zVveznlEo6CICvG5l/d4JSvHP+qoo7JjKDs4PkVSGPm9HSz9W5rlPEoCQYHjVFXyRGnBOcKA\n28VOP/qTBWX6YnS2IKB8qYL/enyGHPbKziOOOCLj6sGeslGW8L6Y4ANr2MY99fpsdL7jjmFwkSTS\nr6gDVCk+tmDQedcJ5LgdwaLPbu7xjJRRNlErSsiQhVHJlOEQoh182o1wRTnharwYs3itnWP9Rd/R\nD5mLW5yveh/YRhYMjItyBh/wjPat8tEVx6B00RKo5513XpIl7rzzzuwEourMmTOz95uIcyBfTSXY\niy++mCOrSFS1klsFrNZ9eGPoJtmeyRx00EE5cpGbIi21XnbZZbkMee2117KMHIKMIVcotVb/vXoO\nz6I0+URoMlVFcBFE7L1+IjNYIo6v/fo+D3tC+FCR+FHuwNUCgfOtUlccI5hnJMoIBhN1sBICqMoN\nNaLP3pkiFGciIIBC4HaEbRWk0dyHb3Mp/EY0I6+NsytvyKxsKhpQr8ozGpm1IZ8IbV+PyllGuyh1\nYBXXOQEcy6R8M5eAHzuxxX3GRvbaCKJ4aRfXrjkG5jEbk00Prxi8SZTJKmc5/PDDc5v99tsvC+hB\njWtqStmD0F4Ma1foMvDtfqZMUc3/lYjMSFFW3NS7JtyyoKzSiTocHoFJHMc+MlK7Mta7n9NbATJe\nrbEYvQWIWCVitIyaXrV3nsG7H2Y2GVcbxyj6NX+waKEPmOvbfShwtjhQDDz5Ygt/uuoY+OPtnICD\nEMBTWsAQUu0NBBsDEgFEWOADAiDaVRERWsCq5i34IRN+TbTJgn8KwzOFuR4KDUXW7Kyik53Ep8w/\n+RkxWeO5S1EM5wVABguXMGp69dk1x87D0ObdL32GHI5tsDQGHtwbm/Hw4TpnKvNY5Ge0x113DEwT\n3tIsIdSnDIfxcxJAevCHfE9cXcmotHXfAw88kIFUdgFjLMn4HuZRuh9FExmjRCCnZxRqcPxz8ioU\nVk9eRhJkPAYHV8ZVFRkjjFSfAtw222yTy2OZ0iv15fHcQ4dKaMcwsBdEEL26RzaIh5+yK7LSBGPn\no8yOZX+vzRhfXzZ8cRrtyzzkzpr803XHwB8wTJYIRol+VY8zqMMBbP0f+cExE1qTdbU7x3jwwQdz\nVBYdesExKNiEWx2MfwoOAyCbJ9uRHZvUTcPmsENhGNE4HBKOHKNqZzQu3KNfX9H1nRABQZlbNkpt\n4SNo4DWIIesDj9qYnwki2giWqol3330348kZLPm7xvi1Pffcc7MzhA3gy/0oeIuxWtmPiWNgNCIF\nYwcCAa2FA1ikJZz1aeUVsBmge9TyoqGoIqKUFdEKCFXcU0/pHJizVMUnXBiBh6IicdTTzsEOnuZk\nDE/2rcJI4KMf/TF+0TucwDhkZ+DGL4/nGkPGV/AIC+2RvfP6ZPTI4gu5XNM/Um7RPzuIFyn1zW7w\npQ9UHj+fbOHPmDlGCOGBGIeQQfwuq0jnISBQfOHft7JEHN94Q5xF6XLFFVfkyKIEGyuiGAo3r6BI\nx0imcM6k+6GHHspOEQbcDq+UTl4BwRu7PstUiPEJFsa9/PLL83nXg6d2xnUvoxS5L7744uGyh/wy\nRpRF9YwSHsHjE088kWWADQeRFThZkTgBstensZG5h4m56oEdcAp9CwTOVUlj6hgECcGBpA6XDaze\niLKhVABQAhKB3cNxbEAL4KoEppm+gjf3OMafDf+UW7zeTL/ltqIiAxBMOIIxnLOHgbFsMGQ4InhE\n0nJfrXw2hnIRD3SFBKmYWDfqE49woFvOzZno3NxM0HDciMjBDsjEBgLTsJHYN+qjmWtj7hjBLKFF\nQgL7qRz14jHHHJPBcC2M3wRPVDT5ohzZRv0Z16O/sdozAKmdopUH5kftTrzJpl+lk29CcgpLw3Bg\npMbwwqF/S80pGJ6xO0WM+8Ybbxw2TuOEoTYakwyovB/JKdzDMVQOHvCRzXju890fL11aGhcMqqIx\ndwwCRkYQDZAaE7lWBhyosQEmQM439MgffDHm0Si8EcuBC0ezcQSZVKYktzFEW+3sfQ4natRvu9eM\nTS9F7IvHo+m/2fb6LNuCc0WsW+mzHq9j6hgE9YCHp5tkez2EAVjlMOmyUlU2Lis8ygVR0rykyolt\nPZCaOY9fr32Qp50X6xi7pWCGbsHBvwLgGIcddljGxvcsjOU1GseyiKjJQWydpiqNsBlei85BfhNx\neJunVCl31x0jBOMAjJ9jRC3OEERDS7QMI0qQohIYgLSq7FJuMZbi9WZA7kRbvFAWx5Dyy449mjED\nG/dyDPW4VSiy2iNvBcCSUdxyyy35OYHrqJUx843j8I/qQpA074BVVdR1x+AIHCIiIGewsqIuds41\ntSSlOxeOFHuOQ/E+2zPEuFYVKM32U3RMvGy44YbZMTg2B2+GOIXXJcjpR9lkUy/QyZ7GUU8zAD9R\nCiuR0oQYVv1IMAk7qFL+rjkGg7GZQPLufffdN69QKJtkCAKKjNGu1p7gMgWDYEDRpkpAmu0rnMLe\nhie/RavcI49Sr1ZW0w6V91ac/IsxmdHPB0U5pQ+4+TExDudNUhPufnaKIn7N6m2k9h11jKLRqP+U\nQJb2eHh4uYjK0LW1D0MpCq0NR4g24RTR/0hCdvM6/m14FtljeTL4D/liedFeO7LYcyh7eMGDY8X1\n6IM8Vp9kWjj2GwWG5IZb2FKVOHTMMTCvDKBgD2Z22223bNynnnpqVrZXBFxjQDZUFJiwIqKHN8qH\nO+64IxvN/fffn9vG/VWC0UpfeC5uZMEbg/ctM/8SzYOxZ599Nhs4ebSx0ECpcDFvMCdRggkesoQ+\nzaHU0N4EgAEnue2227JTON+LgaEVDFu5h+w2Wdl33GFkEUIQqYIqdYwwbJGO8q2xOydqUiTFWpJV\nPzsuUwhlzzFETxlGdFSCqaMB4XwvUzgKWU3AyW4uwFns4QMbilUyxbq8p/4cw3UEB8FDGQUDx/
ac\nqB8zRS2dw5qthe3VatPKucocg6JiYu3lP2nfawvekKVITzgJQLH24QTBtPZeE2D89957b27jwZ1I\nwIm8R2OMWHmJ+3pxTzaK8l+HyMrgTzrppMxqOIEsGoZvz0nsyWiliRMUl2G9aOk6POyLZVUvYtBp\nniL4wA1m9lVSW46BOQqKpTLK9FnUsxftvW4swssa4dkhCGFCMNfcp08lhM9KKc4h0obgsa8ShHb6\nCv5DJnu8IwHB9TB852DkOlzIRV6kXbSVMfQj48BWdhE0TLr1Fe3zQR/+gRMK5yjuq4KjZccQ2SlY\njexHmCnSkiLjtsesmlnpQ5naFo1A5GMAHoJxBI709ttv54ygntZWmWEcQMS9VQleRT9kNmfAG0P3\nHRPGbHnVudg4gEyJOAYiE0wikHAAcxHyxndO4KI/WHEK/Qzo7wjAXfaFNdurikaNtIERRTqmYIYd\nE2tGEs8hfJ8iFB/3xV67MCjG8NZbb6Unn3wyC+XfDxfnDxFp496qhK6qn5CDA5twK/fIRH5Gb0MM\nOhxCFgkKjOBoHqKEkmWvueaanG04iTHcP3CKQO0/e3ZhgceP2smqcKyKRuUYlEKhPDL+d5z1c4qV\nFTDnmBIZMwZ9DiKAzTmvCetPNFR7W7fXXt/KLddqTcyjr17bRybkEF5XiQhPHnMuDlF07MCB3I49\nl4EDxTrnfsFBJBxQbQSKeGoROqjdurWzIzoGJqRxS2KUf/rpp2flcRDRjRKVCdpFhCwz7rOVKE5z\n++235/7uuuuuXDq5P5yKEY0np8B3TKb9K1/vLTF0/7MiJtyRPYrq4fx+7R2e7vFDDzDyfx1goPwc\nUGMEYG/rFI3oGAYW0UUyimQIcRwGzbgpVsZAUTYE065xCtc5GUeSHTyg4kzKs/FKoSBljyhvTz6y\n2gseZAwlwgI+cNBGtpV9ZRj4BobjFY9O8g0bQcXWaRpxBE5hHuFnJ0XB6dOn56ge2QGDlK2dFSSG\n4b8kxVzEdSWGVxgYQLzrxJkIGgbTaUE73b9MZ/KNfIMOJpdcckndYZWmFAwv+wgydW/o8wsCK3xn\nz56dFzx8oxPGtk7QiI5h0FBaeGzRKYIpjDN2ig6lB9OiprmI60qNieIMIXvsQy7yotjH9eI+2hbP\nDY4bI8D+2JdnWTYY+iwDs78qaUTHEM0sI1pClAVMnqX9ImGQszB6DHoNOLzZNZlGRlEq9JNB9JOs\nRXvoxDGnsDTudwFUHTNmzMjDqEaU9xYvGgWiZnka0TEo16CeNyCM1SLtwmt5cNEoCOUa5xjQAIFW\nEGBP5rbKdTRr1qwcfGUMthXVTCt917pnRMdwE6ZiQm0JckADBMYCgWLwtXjTSeq/d5Y7ieag7wmD\nwMAxJowqB4JUicDAMapEc9DXhEFgcjxcM7vvR4on7bHS1q84WNkpUr/iEL+aOLRw4cIlQCmuIhUB\nmsjHlpQ9c7EmzjEsN1vd6DeCg8UVT+qRd7b6EQey8wMT+6El8RSu36xhIO8AgQYI9F94bADG4NIA\ngUDg/wHX+3lgThDIegAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ "<IPython.core.display.Image object>"
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from IPython.display import Image\n",
+ "import base64\n",
+ "Image(data=base64.decodestring(\"iVBORw0KGgoAAAANSUhEUgAAAMYAAABFCAYAAAARv5krAAAYl0lEQVR4Ae3dV4wc1bYG4D3YYJucc8455yCSSIYrBAi4EjriAZHECyAk3rAID1gCIXGRgIvASIQr8UTmgDA5imByPpicTcYGY+yrbx+tOUWpu2e6u7qnZ7qXVFPVVbv2Xutfce+q7hlasmTJktSAXrnn8vR/3/xXmnnadg1aTfxL3/7rwfSPmT+kf/7vf098YRtK+FnaZaf/SS++OjNNathufF9caiT2v/xxqbTGki/SXyM1nODXv/r8+7Tb+r+lnxZNcEFHEG/e3LnpoINXSh/PWzxCy/F9eWjOnDlLrr/++jR16tQakgylqdOWTZOGFqX5C/5IjXNLjdt7/NTvv/+eTjnllLT//vunr776Kl100UVpueWWq8n10lOmpSmTU5o/f0Fa3DDH1ry9p0/++eefaZ999slYYPS0005LK664Yk2eJ02ekqZNnZx+XzA/LfprYgGxePHitOqqq6YZM2akyfPmzUvXXXddHceoic2EOckxDj300CzPggUL0g033NC3OKy00krDer3pppv6FgcBIjvGUkv9u5paZZVVhoHpl4Mvv/wyhfxDQ0NZ7H7EQbacPHny39Tejzj88ccfacqUKRmHEecYf0Nr8GGAQJ8gMHCMPlH0QMzmEBg4RnN4DVr3CQIDx+gTRQ/EbA6BgWM0h9egdZ8g8PeliD4RutfF/Ouvfz9OtZy8aNGiNH/+/GGWl1122XzseYuVNKtqsaI23Ghw0DYCA8doG8JqO+AUG2+8cVq4cGHaY4890vLLL5/WXXfdfI6jvPDCC3lJ8amnnkoezP3000/pl19+GThHtWpIPekYomTxFS7HnkqKjMsss0yGgFE4r62tSBFVJ02aNPyconi9V4/JwzHwT9ZNNtkkeZ6w5ZZbph133DH99ttv6ccff8zXX3nllcRRnHNfv2cNGMQWGRaOrWbUrjsGBRLAA6U4Lhoqw9h2223ztRBq6aWXzsbgvueffz4Lu9NOO2UnYTgrr7xy7tO9nOH111/Pbb744ov0ww8/jAvngAdFMvQDDjggG/0GG2yQX1GZNm1aziCCwzrrrJPl3muvvXKwePnll9M333wzHDCKWPbLMbuAkfISjnvvvXcW/emnn85lqCBqa4a65hiYR/Gk2RNGRlwm3n7ggQfmdrKD9sqJtdZaKxvCnDlz8n3Tp09PXmPYeuutc0SVNQjvnmuvvTa3efzxx9N33303PGZ5rF75DBvvqq233nrp22+/TWeddVbyikpgxCE4vQDhlQUBRfDw2esbs2fPTquvvnqviNN1PuIdJ4GErVx44YUZowsuuCB9+umn6eeff84BspmsWqljhPFDxjGGYx/lDkN33udajCoVlAjRzl4U8LjefRwnPjsXG8OJqKBd8NB1LTU5IHyCd7LJGOYXNoGjFqaGIKtrERDIDKtukfGMH/zRZa1A101+YBF44KfMYzO8VOYYjDWiukiGqc022yyXOUqdzTffPJ/z1ialeqNVxA9gi0wzlOJ5juJlR8JeddVV+ZrIKTq4ZvJp/8EHH+SU+txzz+W2SqmxVFZRplrH5DTRXmGFFdKuu+6azjjjjOzosl5g6D54CQCI4mGjhNQO5occckh2LvLTA6fqJOEnyhU6kNlkZmUuvrtNcFx77bUzhsZWXgoSsm6t4Dsa/tp2DErCmA04HAI4FLjaaqtlBhmnSKiNY4rDtHZFB6jFMMH0RVDH+nCPYxtDCFJnKkniRbDitWjTK3sykQUuMLPn3DZGX8SFnCG/fVyz5zCCBtIHTLshdzif8fERn8cKXxjCNOwCTu3Qf6yqhV4AQokiP489//zzM0DxnQYKwqAtIkko1kQzFFxvaNcJ6u3Pe+65J/cRRvDee+9lA2BInIyRff/997nNO++8k7t0vl2A6vHWynmyiPJ43WKLLbIijz/++LTddtvlTCdzwIWSg9yjxBJ0GN/DDz+c7zv77LOzbEceeWSekwVGgsOsWbNyNo0+qt7DfPvtt8/dmtvIGnPnzk3PPPPMsJ6rHrNef/BBeJA90RprrJEDcNhctMkXR/mnbccwuCjNGTbaaKMc8TBZprITxOdgOvbuKxqGz6LSJ598kseJ9Gi1CYmSv/76a3YyJZWMZJ6Ceskp8EMusihFEAyUmVaa8G2rxTNHIrd733///eH7YeaLNe5xrEzlWNF/HqQDf0Tm+GIbvYdD43MsKAIo/JDgE0G5aFfN8NaWYxiUshikqGYTTUSt0TCkjXsYNqJQQso+rgGa0vX58ccf56hQTtk+48F92rmvlnE1A0on2uKP0Yrw+Nxzzz0zn+ZhjKwRXq6vueaa2TmUiRQfS7SyNeMks9IV9vrvJOl/q622yo4Mfw5Pvm6TMclLdit6shh+YAMnq1E29tEsteUYBgMSgxa5MOAzJZcVXQs4bUR8XxhCHIwzMALCBuCcx5q0tF3u133l8XrRMchFiRYNyMxBKM/5IjZlWVzjULKwACISytIWFsi56aab5mvOKyEikmdAO/iHY+BDCRUZuoPD1e1akECyLseA7d13352DhdKak8Cmlt3U7TSl9p58FwejYK8ncAwKpDTnGDcARbWiAUjHiNEHsITSPlagpEZChcfrZzwSOfBOiQwXLuR3PjAhtwAD08iAMCO/a+5xPTIm3ALjwERf0V+c69QeT7ZujVdLDhgKBrANXAMreMESRkU7rdVPrXNtZ4xIpSLH1VdfnR3j4IMPzkbw2Wefpa+//jovo5188slZsZjArAcvFP3YY4+lSy+9NEdTdTTy0I5xHHfccfm1CH2LtuORKEqmkwVlVU+sBY+IdJRmE0zeeOONnEXuu+++7AhnnnlmWn/99XMJ5brtzTffzHMJx/o555xzkgdb0U8rRtAKrnTYqtG1Ml6teyxInHDCCdlGYByBmG2Z97ChVvFo2zEwbHCRTbqP7EDxPjN2pUBEe86AXAcsg+f10TYMSTvnRM1ulQe1wG/nHEXZZEJZUIYQ5cgWMsEgMgqclFdkdh+MbFFyuddnWMLNfTYkcuuXHlBkpFYNI3dS+mMMfCHHsZWadfUjmQVn8iLywscG21apMscQwR555JEM3KuvvpoZ5LHOmzgjAvBwzFt2/Oijj3Lm4Ayin/MU/eGHH+b2N998c/5MGSaZ44nw7OEd5Rx77LE5+1EehYXxkpes5li2K6+8Mhv8Lrvsko381ltvzcEBfvHQKh5auk9GPvHEE3NJAx+/eKL/HXbYIQcbK3nwN067xAk4s5VHdbvsx0nxrYQeKxJMZAfBA7GlRx99NC9EtCN7JY4RoPBeAHIAyrB3jpHYwqu1d02d7HpZcfqINo5dL7eJMXtxTzk2sgWFM/gcsnCakI2cFOk+523O+Qw7WaeYHYpYRp9xn4BkbPdWSfgJXYYM+ne+2xRj2sdx8EDu8rm4Ntp9pY4RSmb0CIPOAVNGoLA47yU4S2xen37ppZdy9CkLE/3lm8bJHzJbbiavt2Q9p7A
kK7oyXAZOLk7gs9c4PJC0AOE8DDyrgJkaWgYQkSPYuAdpWySfteU8HhqKouYq+io6ZfGeZo7xpbT1+jt+jGULfprpq922ePHMBibwjWVq523KVrzBsIzTaMeu1DFi0HI0YyyYtAekY5MltbRyihFJiROBKIYTwMCTWJNubwdQFCXFapK9z96mtbjgs3thFKWnUgjBzNZIya5FOyUcPG36q4LwRgZ6Ix8HtBk3tirGGU0feAkslHfk5PzBh2cXSkvtWqWOOEaRGcoSHdXDMoYn1tK8yaON0ahbCWgFS/vxSnjn5F4ItLeiFAGAzCKc7MDA1OlIjc4pLFKE7FEyxb5ZPNTbtuiv2fvrtddfOFsYXcwj8d8qv/XGq3femLvvvnvOvrIYPPEjG+PDseDbDnXcMXiyiGiyyACOPvrovN95552zV3/++ef5zVveznlEo6CICvG5l/d4JSvHP+qoo7JjKDs4PkVSGPm9HSz9W5rlPEoCQYHjVFXyRGnBOcKA28VOP/qTBWX6YnS2IKB8qYL/enyGHPbKziOOOCLj6sGeslGW8L6Y4ANr2MY99fpsdL7jjmFwkSTSr6gDVCk+tmDQedcJ5LgdwaLPbu7xjJRRNlErSsiQhVHJlOEQoh182o1wRTnharwYs3itnWP9Rd/RD5mLW5yveh/YRhYMjItyBh/wjPat8tEVx6B00RKo5513XpIl7rzzzuwEourMmTOz95uIcyBfTSXYiy++mCOrSFS1klsFrNZ9eGPoJtmeyRx00EE5cpGbIi21XnbZZbkMee2117KMHIKMIVcotVb/vXoOz6I0+URoMlVFcBFE7L1+IjNYIo6v/fo+D3tC+FCR+FHuwNUCgfOtUlccI5hnJMoIBhN1sBICqMoNNaLP3pkiFGciIIBC4HaEbRWk0dyHb3Mp/EY0I6+NsytvyKxsKhpQr8ozGpm1IZ8IbV+PyllGuyh1YBXXOQEcy6R8M5eAHzuxxX3GRvbaCKJ4aRfXrjkG5jEbk00Prxi8SZTJKmc5/PDDc5v99tsvC+hBjWtqStmD0F4Ma1foMvDtfqZMUc3/lYjMSFFW3NS7JtyyoKzSiTocHoFJHMc+MlK7Mta7n9NbATJerbEYvQWIWCVitIyaXrV3nsG7H2Y2GVcbxyj6NX+waKEPmOvbfShwtjhQDDz5Ygt/uuoY+OPtnICDEMBTWsAQUu0NBBsDEgFEWOADAiDaVRERWsCq5i34IRN+TbTJgn8KwzOFuR4KDUXW7Kyik53Ep8w/+RkxWeO5S1EM5wVABguXMGp69dk1x87D0ObdL32GHI5tsDQGHtwbm/Hw4TpnKvNY5Ge0x113DEwT3tIsIdSnDIfxcxJAevCHfE9cXcmotHXfAw88kIFUdgFjLMn4HuZRuh9FExmjRCCnZxRqcPxz8ioUVk9eRhJkPAYHV8ZVFRkjjFSfAtw222yTy2OZ0iv15fHcQ4dKaMcwsBdEEL26RzaIh5+yK7LSBGPno8yOZX+vzRhfXzZ8cRrtyzzkzpr803XHwB8wTJYIRol+VY8zqMMBbP0f+cExE1qTdbU7x3jwwQdzVBYdesExKNiEWx2MfwoOAyCbJ9uRHZvUTcPmsENhGNE4HBKOHKNqZzQu3KNfX9H1nRABQZlbNkpt4SNo4DWIIesDj9qYnwki2giWqol3330348kZLPm7xvi1Pffcc7MzhA3gy/0oeIuxWtmPiWNgNCIFYwcCAa2FA1ikJZz1aeUVsBmge9TyoqGoIqKUFdEKCFXcU0/pHJizVMUnXBiBh6IicdTTzsEOnuZkDE/2rcJI4KMf/TF+0TucwDhkZ+DGL4/nGkPGV/AIC+2RvfP6ZPTI4gu5XNM/Um7RPzuIFyn1zW7wpQ9UHj+fbOHPmDlGCOGBGIeQQfwuq0jnISBQfOHft7JEHN94Q5xF6XLFFVfkyKIEGyuiGAo3r6BIx0imcM6k+6GHHspOEQbcDq+UTl4BwRu7PstUiPEJFsa9/PLL83nXg6d2xnUvoxS5L7744uGyh/wyRpRF9YwSHsHjE088kWWADQeRFThZkTgBstensZG5h4m56oEdcAp9CwTOVUlj6hgECcGBpA6XDazeiLKhVABQAhKB3cNxbEAL4KoEppm+gjf3OMafDf+UW7zeTL/ltqIiAxBMOIIxnLOHgbFsMGQ4InhE0nJfrXw2hnIRD3SFBKmYWDfqE49woFvOzZno3NxM0HDciMjBDsjEBgLTsJHYN+qjmWtj7hjBLKFFQgL7qRz14jHHHJPBcC2M3wRPVDT5ohzZRv0Z16O/sdozAKmdopUH5kftTrzJpl+lk29CcgpLw3BgpMbwwqF/S80pGJ6xO0WM+8Ybbxw2TuOEoTYakwyovB/JKdzDMVQOHvCRzXju890fL11aGhcMqqIxdwwCRkYQDZAaE7lWBhyosQEmQM439MgffDHm0Si8EcuBC0ezcQSZVKYktzFEW+3sfQ4natRvu9eMTS9F7IvHo+m/2fb6LNuCc0WsW+mzHq9j6hgE9YCHp5tkez2EAVjlMOmyUlU2Lis8ygVR0rykyoltPZCaOY9fr32Qp50X6xi7pWCGbsHBvwLgGIcddljGxvcsjOU1GseyiKjJQWydpiqNsBlei85BfhNxeJunVCl31x0jBOMAjJ9jRC3OEERDS7QMI0qQohIYgLSq7FJuMZbi9WZA7kRbvFAWx5Dyy449mjEDG/dyDPW4VSiy2iNvBcCSUdxyyy35OYHrqJUx843j8I/qQpA074BVVdR1x+AIHCIiIGewsqIuds41tSSlOxeOFHuOQ/E+2zPEuFYVKM32U3RMvGy44YbZMTg2B2+GOIXXJcjpR9lkUy/QyZ7GUU8zAD9RCiuR0oQYVv1IMAk7qFL+rjkGg7GZQPLufffdN69QKJtkCAKKjNGu1p7gMgWDYEDRpkpAmu0rnMLehie/RavcI49Sr1ZW0w6V91ac/IsxmdHPB0U5pQ+4+TExDudNUhPufnaKIn7N6m2k9h11jKLRqP+UQJb2eHh4uYjK0LW1D0MpCq0NR4g24RTR/0hCdvM6/m14FtljeTL4D/liedFeO7LYcyh7eMGDY8X16IM8Vp9kWjj2GwWG5IZb2FKVOHTMMTCvDKBgD2Z22223bNynnnpqVrZXBFxjQDZUFJiwIqKHN8qHO+64IxvN/fffn9vG/VWC0UpfeC5uZMEbg/ctM/8SzYOxZ599Nhs4ebSx0ECpcDFvMCdRggkesoQ+zaHU0N4EgAEnue2227JTON+LgaEVDFu5h+w2Wdl33GFkEUIQqYIqdYwwbJGO8q2xOydqUiTFWpJVPzsuUwhlzzFETxlGdFSCqaMB4XwvUzgKWU3AyW4uwFns4QMbilUyxbq8p/4cw3UEB8FDGQUDx/acqB8zRS2dw5qthe3VatPKucocg6JiYu3lP2nfawvekKVITzgJQLH24QTBtPZeE2D89957b27jwZ1IwIm8R2OMWHmJ+3pxTzaK8l+HyMrgTzrppMxqOIEsGoZvz0nsyWiliRMUl2G9aOk6POyLZVUvYtBpniL4wA
1m9lVSW46BOQqKpTLK9FnUsxftvW4swssa4dkhCGFCMNfcp08lhM9KKc4h0obgsa8ShHb6Cv5DJnu8IwHB9TB852DkOlzIRV6kXbSVMfQj48BWdhE0TLr1Fe3zQR/+gRMK5yjuq4KjZccQ2SlYjexHmCnSkiLjtsesmlnpQ5naFo1A5GMAHoJxBI709ttv54ygntZWmWEcQMS9VQleRT9kNmfAG0P3HRPGbHnVudg4gEyJOAYiE0wikHAAcxHyxndO4KI/WHEK/Qzo7wjAXfaFNdurikaNtIERRTqmYIYdE2tGEs8hfJ8iFB/3xV67MCjG8NZbb6Unn3wyC+XfDxfnDxFp496qhK6qn5CDA5twK/fIRH5Gb0MMOhxCFgkKjOBoHqKEkmWvueaanG04iTHcP3CKQO0/e3ZhgceP2smqcKyKRuUYlEKhPDL+d5z1c4qVFTDnmBIZMwZ9DiKAzTmvCetPNFR7W7fXXt/KLddqTcyjr17bRybkEF5XiQhPHnMuDlF07MCB3I49l4EDxTrnfsFBJBxQbQSKeGoROqjdurWzIzoGJqRxS2KUf/rpp2flcRDRjRKVCdpFhCwz7rOVKE5z++235/7uuuuuXDq5P5yKEY0np8B3TKb9K1/vLTF0/7MiJtyRPYrq4fx+7R2e7vFDDzDyfx1goPwcUGMEYG/rFI3oGAYW0UUyimQIcRwGzbgpVsZAUTYE065xCtc5GUeSHTyg4kzKs/FKoSBljyhvTz6y2gseZAwlwgI+cNBGtpV9ZRj4BobjFY9O8g0bQcXWaRpxBE5hHuFnJ0XB6dOn56ge2QGDlK2dFSSG4b8kxVzEdSWGVxgYQLzrxJkIGgbTaUE73b9MZ/KNfIMOJpdcckndYZWmFAwv+wgydW/o8wsCK3xnz56dFzx8oxPGtk7QiI5h0FBaeGzRKYIpjDN2ig6lB9OiprmI60qNieIMIXvsQy7yotjH9eI+2hbPDY4bI8D+2JdnWTYY+iwDs78qaUTHEM0sI1pClAVMnqX9ImGQszB6DHoNOLzZNZlGRlEq9JNB9JOsRXvoxDGnsDTudwFUHTNmzMjDqEaU9xYvGgWiZnka0TEo16CeNyCM1SLtwmt5cNEoCOUa5xjQAIFWEGBP5rbKdTRr1qwcfGUMthXVTCt917pnRMdwE6ZiQm0JckADBMYCgWLwtXjTSeq/d5Y7ieag7wmDwMAxJowqB4JUicDAMapEc9DXhEFgcjxcM7vvR4on7bHS1q84WNkpUr/iEL+aOLRw4cIlQCmuIhUBmsjHlpQ9c7EmzjEsN1vd6DeCg8UVT+qRd7b6EQey8wMT+6El8RSu36xhIO8AgQYI9F94bADG4NIAgUDg/wHX+3lgThDIegAAAABJRU5ErkJggg==\"), embed=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We're going to be building a model that recognizes these digits as 5, 0, and 4.\n",
+ "\n",
+ "# Imports and input data\n",
+ "\n",
+ "We'll proceed in steps, beginning with importing and inspecting the MNIST data. This doesn't have anything to do with TensorFlow in particular -- we're just downloading the data archive."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 110,
+ "status": "ok",
+ "timestamp": 1446749124399,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "w5vKZqr6CDz9",
+ "outputId": "794eac6d-a918-4888-e8cf-a8628474d7f1"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Already downloaded train-images-idx3-ubyte.gz\n",
+ "Already downloaded train-labels-idx1-ubyte.gz\n",
+ "Already downloaded t10k-images-idx3-ubyte.gz\n",
+ "Already downloaded t10k-labels-idx1-ubyte.gz\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import urllib\n",
+ "\n",
+ "SOURCE_URL = 'http://yann.lecun.com/exdb/mnist/'\n",
+ "WORK_DIRECTORY = \"/tmp/mnist-data\"\n",
+ "\n",
+ "def maybe_download(filename):\n",
+ " \"\"\"A helper to download the data files if not present.\"\"\"\n",
+ " if not os.path.exists(WORK_DIRECTORY):\n",
+ " os.mkdir(WORK_DIRECTORY)\n",
+ " filepath = os.path.join(WORK_DIRECTORY, filename)\n",
+ " if not os.path.exists(filepath):\n",
+ " filepath, _ = urllib.urlretrieve(SOURCE_URL + filename, filepath)\n",
+ " statinfo = os.stat(filepath)\n",
+ " print 'Succesfully downloaded', filename, statinfo.st_size, 'bytes.'\n",
+ " else:\n",
+ " print 'Already downloaded', filename\n",
+ " return filepath\n",
+ "\n",
+ "train_data_filename = maybe_download('train-images-idx3-ubyte.gz')\n",
+ "train_labels_filename = maybe_download('train-labels-idx1-ubyte.gz')\n",
+ "test_data_filename = maybe_download('t10k-images-idx3-ubyte.gz')\n",
+ "test_labels_filename = maybe_download('t10k-labels-idx1-ubyte.gz')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "gCtMhpIoC84F"
+ },
+ "source": [
+ "## Working with the images\n",
+ "\n",
+ "Now we have the files, but the format requires a bit of pre-processing before we can work with it. The data is gzipped, requiring us to decompress it. And, each of the images are grayscale-encoded with values from [0, 255]; we'll normalize these to [-0.5, 0.5].\n",
+ "\n",
+ "Let's try to unpack the data using the documented format:\n",
+ "\n",
+ " [offset] [type] [value] [description] \n",
+ " 0000 32 bit integer 0x00000803(2051) magic number \n",
+ " 0004 32 bit integer 60000 number of images \n",
+ " 0008 32 bit integer 28 number of rows \n",
+ " 0012 32 bit integer 28 number of columns \n",
+ " 0016 unsigned byte ?? pixel \n",
+ " 0017 unsigned byte ?? pixel \n",
+ " ........ \n",
+ " xxxx unsigned byte ?? pixel\n",
+ " \n",
+ "Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).\n",
+ "\n",
+ "We'll start by reading the first image from the test data as a sanity check."
+ ]
+ },
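The rescaling mentioned in the cell above (mapping [0, 255] grayscale values to [-0.5, 0.5]) is not performed in the next cell, which only reads the raw bytes. As a standalone illustration (made-up data, not the notebook's own preprocessing code, which presumably appears later), the normalization is just a shift and a division:

import numpy

# A made-up 28x28 grayscale image standing in for one MNIST digit.
pixels = numpy.random.randint(0, 256, size=(28, 28)).astype(numpy.uint8)

# Rescale from [0, 255] to [-0.5, 0.5]: (pixel - 255/2) / 255, the same as pixel/255 - 0.5.
scaled = (pixels.astype(numpy.float32) - 255.0 / 2.0) / 255.0

print scaled.min(), scaled.max()  # roughly -0.5 and 0.5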
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 57,
+ "status": "ok",
+ "timestamp": 1446749125010,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "P_3Fm5BpFMDF",
+ "outputId": "c8e777e0-d891-4eb1-a178-9809f293cc28"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "magic number 2051\n",
+ "image count 10000\n",
+ "rows 28\n",
+ "columns 28\n",
+ "First 10 pixels: [0 0 0 0 0 0 0 0 0 0]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import gzip, binascii, struct, numpy\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "with gzip.open(test_data_filename) as f:\n",
+ " # Print the header fields.\n",
+ " for field in ['magic number', 'image count', 'rows', 'columns']:\n",
+ " # struct.unpack reads the binary data provided by f.read.\n",
+ " # The format string '>i' decodes a big-endian integer, which\n",
+ " # is the encoding of the data.\n",
+ " print field, struct.unpack('>i', f.read(4))[0]\n",
+ " \n",
+ " # Read the first 28x28 set of pixel values. \n",
+ " # Each pixel is one byte, [0, 255], a uint8.\n",
+ " buf = f.read(28 * 28)\n",
+ " image = numpy.frombuffer(buf, dtype=numpy.uint8)\n",
+ " \n",
+ " # Print the first few values of image.\n",
+ " print 'First 10 pixels:', image[:10]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "7NXKCQENNRQT"
+ },
+ "source": [
+ "The first 10 pixels are all 0 values. Not very interesting, but also unsurprising. We'd expect most of the pixel values to be the background color, 0.\n",
+ "\n",
+ "We could print all 28 * 28 values, but what we really need to do to make sure we're reading our data properly is look at an image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 887,
+ "status": "ok",
+ "timestamp": 1446749126640,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "F_5w-cOoNLaG",
+ "outputId": "77dabc81-e3ee-4fcf-ac72-88038494fb6c"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnWuQbFd1339rHt3T3dM974eu5iKJCINExRGULYeSXRli\nLAtCJIoPisDlAuQkrgIZqpQHuny5oziJEAk42C7ZQWBKUBBZOIUlV1FGKNQ4RQISGInXFfIF50r3\nzp3Hnff0PLqnu1c+dO+j3We6586jex5n1q9qV59z5pyz9+mZ+ffqtddeS1QVwzAMI1q0HPYADMMw\njMZj4m4YhhFBTNwNwzAiiIm7YRhGBDFxNwzDiCAm7oZhGBHExN2INCLyCyLyvIh8v/K6JCIfFpEe\nEXlaRF4Ska+LSJd3zRkROS8iL4rI7Yc5fsPYK2Jx7sZJQURagEvArwD3AXOq+gkR+SjQo6oPiMjN\nwJeAXwZGgGeA16n9oxjHDLPcjZPE24Cfq+pF4C7gscrxx4B3VbbvBB5X1YKqXgDOA7ce9EANY7+Y\nuBsniX8BfLmyPaSq0wCqOgUMVo5fC1z0rpmoHDOMY4WJu3EiEJF2ylb5VyqHwm4Wc7sYkaLtsAdg\nGAfE24G/VdXZyv60iAyp6rSIDAMzleMTwGnvupHKsS2IiH0gGE1FVWWv15rlbpwU3gP8D2//KeD9\nle33AU96x+8RkZiI3ADcCDxX76aq2tR29uzZSPQRpWc5qPdrv5jlbkQeEUlSnkz9197hh4EnRORe\n4GXgbgBVPSciTwDngE3gg9qI/zTDOGBM3I3Io6prwEDo2Dxlwa91/kPAQwcwNMNoGuaWMYwjzOjo\naCT6OKh+otJHI7BFTIaxR0TEPDZG0xAR1CZUDcMwDB8Td8MwjAhi4m4Y+6CtLb6l/cmffOawh2UY\nFi1jGPuhWFwOHfmv/OxnPz+UsRiGj4m7YeyLeGjf/qWMo4G5ZQzDMCKIibthGEYEMXE3DMOIICbu\nhmEYEcTE3TAMI4KYuBuGYUQQE3fDMIwIYuJuGIYRQUzcDcMwIoiJu2EYRgQxcTcMw4ggJu6GYRgR\nxMTdMAwjgpi4G4ZhRBATd8MwjAhi4m4YhhFBTNyNyCMiXSLyFRF5UUR+IiK/IiI9IvK0iLwkIl8X\nkS7v/DMicr5y/u2HOXbD2Csm7sZJ4NPA11T1JuAfAT8FHgCeUdXXA98EzgCIyM3A3cBNwNuBR0RE\nDmXUhrEPTNyNSCMiGeDXVPXzAKpaUNUl4C7gscppjwHvqmzfCTxeOe8CcB649WBHbRj7x8TdiDo3\nALMi8nkR+b6IfEZEksCQqk4DqOoUMFg5/1rgonf9ROWYYRwrrJqvEXXagDcDH1LV74nIH1B2yWjo\nvPD+Dhnztkf3dgvDAMbHxxkfH2/Y/UzcjahzCbioqt+r7P9PyuI+LSJDqjotIsPATOXnE8Bp7/qR\nyrE6jIX2n23AkI2TyOjoKKOjo8H+gw8+uK/77cstIyJ3iMhPReTvROSj+xqJYTSBiuvlooj8QuXQ\nrwM/AZ4C3l859j7gycr2U8A9IhITkRuAG4HnDm7EhtEY9my5i0gL8MeU/1kuA98VkSdV9aeNGpxh\nNIgPA18SkXbg74EPAK3AEyJyL/Ay5QgZVPWciDwBnAM2gQ+q6h5dNoZxeOzHLXMrcF5VXwYQkccp\nRyBUibuI2D+G0VRUddtQRVX9AfDLNX70tjrnPwQ81IChGcahsR+3TDiq4BJ1ogpUFVXl7NmzwXaz\nm/V1vPraa3+GYdTGQiENwzAiyH7cMhPAa7z9ulEFY2NjwKuhPv6MsGHshkaHixlGVNmPuH8XuFFE\nrgMmgXuA99Q60Rf3gxL2g/wAsb4Orr9Gh4sZRlSR/fgtReQOynk7WoDPqerHa5yj5hs1moWIoFeZ\nUG1i37p17dPD3H//PJ/85MOHMSQjQuz3b3tfi5hU9a+B1+/nHoZhGEbjsQlVwzCMCGLibhiGEUFM\n3A3DMCKIibthGEYEMXE3DMOIICbuhmEYEcTE3TAMI4KYuBuGYUQQE3fDMIwIYuJuGIYRQUzcDcMw\nIoiJu2EYRgQxcTcMw4ggJu6GYRgRxMTdMAwjgpi4G4ZhRBATd8MwjAhi4m5EHhG5ICI/EJHnReS5\nyrEeEXlaRF4Ska+LSJd3/hkROS8iL4rI7Yc3csPYOybuxkmgBIyq6ptU9dbKsQeAZ1T19cA3gTMA\nInIzcDdwE/B24BEROZQarYaxH/Yl7rUsIsM4gghb/9bvAh6rbD8GvKuyfSfwuKoWVPUCcB64FcM4\nZuyrQDavWkQLjRiMYTQJBb4hIkXgv6vqZ4EhVZ0GUNUpERmsnHst8G3v2onKMcM4VuxX3GtZRIZx\n1LhNVSdFZAB4WkReoiz4PuH9HTLmbY/u7RaGAYyPjzM+Pt6w++1X3H2L6DOq+mgDxmQYDUVVJyuv\nV0TkLym7WaZFZEhVp0VkGJipnD4BnPYuH6kcq8NYaP/ZBo3aOGmMjo4yOjoa7D/44IP7ut9+xd23\niL4hIi+q6rfCJ42NjQXb4QcwjN2wW+tGRJJAi6pmRSQF3A48CDwFvB94GHgf8GTlkqeAL4nIH1B2\nx9wI2HyScewQ1T1+Gw3fSOQssKKqnwod10b1YRhhRARVrRvNIiI3AF+l/C2zDfiSqn5cRHqBJyhb\n6S8Dd6vqYuWaM8DvAJvAR1T16Tr31q3enIe5//55PvnJh/f7aMYJ52p/21djz5b7NhaRYRwZVPX/\nAbfUOD4PvK3ONQ8BDzV5aIbRVPbjlhkCvlq2XgKLqKaFYxiGYRwsexb3ehaRYRiGcfjsd0L1yKKq\nlEqlqqaqFAoFisUihUIh2HZzAiJS1Rz+dktLC62trUFra2ujpaUlOMe/ttarLXY0DOMgiLS4OxH3\nxTyXy5HL5djY2Ai2i8UiLS0tVS0s0m67ra2NWCxGPB4nFosFLfzB4N+j1geGYRhGMzkR4p7P59nc\n3CSfz7O6urqlFQqFwBL3LXPYas3HYjESiQTJZDJ4VdUtHwzhDws3JhN4wzAOgkiLe6lUCkTdWerL\ny8ssLS2xtLQUbG9ubtLW1ha4Wdx2LUu8o6ODdDpNPp+nWCxuEXL/A0JVA5ePE3jDMIyDINLi7ix3\n54ZZX19neXmZhYUF5ubmmJ+fZ35+nnw+T1tbG+3t7VWvvqi711QqVSXs7gPBWfulUikQ9rD1b/H+\nhmEcFJEVd2e1r6+vk81mWVlZYWVlhbm5uS0tl8ttEXZ/otR/TSaTVf5696HhC7wv+O5e/jeCo0q9\nCeCwi8l9C6k3cWwYxuETaXHf2NhgeXk5sNDn5+dZWFgI2uLiIouLi+Tz+S2C7NwrYZ97IpEgm82y\ntLTE3Nwc6XSaVCq1RdxbW1tpb2/f0traju5bHp4zEJFgAtm1eDxOe3t7zQ8+wzCODkdXafZJsVhk\nY2ODlZUV5ufnmZqaYmpqiuXl5cDXv
rKyEvjcwxOq9aJdak2odnR0bBF29y3Aj6xxwnhUCYd5trS0\nEI/HSSaTQSsWiySTyap5BfdqlrthHB0iK+6+5T43N8fU1BQXL14km80GbXV1lWw2S6FQqBvhEnY5\nOMEOW7JhcW9tbSUWi9HR0UFHRwfxeJyOjg5isdihvSdXw3cdue1EIkEmkyGTyQTvk/uZqlbNTRiG\ncXSIrLj7lvvc3ByTk5NcvHiRtbU11tbWWF9fD5oTre1i0t1+2Lqt11paWujo6CCRSJBIJOjo6CCZ\nTBKPxw/j7dgR7tuG31KpFBsbG8EEcnt7Ox0dHcE1/gehYRhHh8iKu7Pcs9ksCwsLzMzMMDExUTUZ\nms/ng0VMe8W5I3zRd9tO0P3mC+NRIxaLbflmkslkAmF3lnw+nw+uceJukUCGcbSIrLi3trYGLoX+\n/n5OnTq1ZWVqeIVqeGWpi5UPv4a3S6USQCBwvtC5qJ2NjQ1Ulc3NzZr+fHeui433Y+R3Q61vHLVW\n2ob7UdWqSV+3XSgUqvbdB4D7NuI+3I7yXIJhnEQiK+4tLS0kEgm6uroYGBggl8uhqoG4+68uNj3c\nisXillYrN029bREJxN3F3efz+ZqLntzPw/lwdkM9YQ9HwQBb+imVSjUXcrkFXr64x2IxCoUCpVIp\nsOjNcjeMo0Vkxd25RTKZDLlcjlKpRHt7O+vr62xsbGwRdz++3W1vbm5SKBSCV7edz+erUhpsbm5u\n+TaQz+cD0dzc3KwS9vDCp7a2Nkql0pYPj/2Iu9sORwC5OPtaH1y1zt3c3KwS9ng8TjwerxL2jo6O\nXY/VMIzmEmlxd5a7i+pIpVJV4u62VbXKz+xcD07EfTEPu3bcvdxErRNPJ+phNw5Q84OkVCpVfZC4\nD4TdUGtRUa1FVUDwYeW3WrHrvrg7YY/H44Gwx+Px4JuJYRhHh8iLu/Mlp1Ipenp6ggiZsLi7cEU/\nZLGekK+vr1dF3KytrbGyshKEBZZKJfL5fGCF+5Z/sVismrgsFou0t7cH5/nfCAqFwq6euZZ/PbxC\n1o1xc3NzS3PX+a/5fH6LsMfj8UDYU6kUhULBxN0wjhiRFXc/bM/fdgLtu2aALeLlxD3c/BBKv6VS\nqarm0hSE3TiFQqEqRt5tF4vFwJ3j2l7F3RfocM6c9vZ2SqVSMG5gS157eHVSOBaLVY3HuYvCefIN\nwzhaRFrcnd/Yj+Twk3054YLqMEAngv5K03g8Tj6fD8IbfcHf2NhgdXWVtbW1qlTCvlg7kXeWe7g5\nn3zYDbSb53WvfquVEK1QKLC0tMTi4iJLS0uBJe9H68CrAu/PE/gfSu5+FuduGEePq4q7iHwOeCcw\nraq/WDnWA/w5cB1wgXLl+KUmjnPXOHF3PmZf7J1I+e4I3w/u+8OdsPsTqr77xG3X8uXncrkqy73Z\n4l5L4P3ncNu5XI6ZmRk6OjoCv/rq6mpNi9x/H/1JVXc/P1WDYRhHh51Y7p8H/gj4gnfsAeAZVf2E\niHwUOFM5dqRwYl5L2P0oEaBmbhkXweKam/QMR7XUEuZazXfLhCs51XLL7EXca1nuYYFfX18P3FWb\nm5tks1na2tqqJnCdsIe/AYUtd/deGYZxtLiquKvqt0TkutDhu4B/Utl+DBjniIm7H9PtJ7kKL0qq\nVT/VWaLh8+otaKoVxuiHTfrW+9Usd1/g9+Nz385yX1lZCYR9dXWV+fn5LYuQ/NDG8Aejm3D2xd0s\nd8M4WuzV5z6oqtMAqjolIoMNHFND8IW62YTF3227En9hn3t4iX8j3TLbWe5ue2lpKShcMjs7SyKR\nCAqNuPG7by61rHYXWXScfO4i0gJ8D7ikqndu51oUkTPAvUAB+IiqPn04ozaMvdOoCdVtwyXGxsaC\n7dHRUUZHRxvU7dEivIio1oSuWwka9lm71Ln+as/dFvaoJ/BOqN03i3B+HecucguT3Jja29uDnDip\nVIrOzs6guTw5sVgsCK88CMbHxxkfH9/LpR8BzgGZyn5N16KI3AzcDdwEjADPiMjr1EKCjGPGXsV9\nWkSGVHVaRIaBme1O9sU9yvgC5wTbL87hRLZWrdZak7+7WcRUK3+Mcy255uLRnbj7wu5cQLXKCrrW\n2dlJOp0mnU4H4u5cMwcl7mHj4MEHH7zqNSIyArwD+E/A/ZXD9VyLdwKPq2oBuCAi54FbgWcb8gCG\ncUDsVNyl0hxPAe8HHgbeBzzZ2GEdP5yQhgXe33ax5OEMkmG3hrOe97KkP7wIyU9l4EQ8bLW71bB+\n9SXXfKvdt96dayYWix2ouO+RPwD+HdDlHRuq41q8Fvi2d95E5ZhhHCt2Egr5ZWAU6BORV4CzwMeB\nr4jIvcDLlL/Gnnh8ga/lFnEul/ASfyfm7nzf/73b/sP7LseNs9D9VbfhxUnt7e2BuDv/uhN3X9jT\n6fSW0oFHVdxF5J9RDuN9QURGtzl1j26XMW97u9sbxvbsw+VYk51Ey7y3zo/e1rBRRIiw393f98U6\nbGG7bRfV0yj8rJRugjfslnHC7z5g/CpS9dwy4QRjR1XcgduAO0XkHUACSIvIF4GpOq7FCeC0d/1I\n5VgdxkL75r0x9sZeXI7bcfTDHI4B9SYynWXup/cNu2NqnbuTSk+1Kj/5zd3TTaC6oiVXrlxhZmaG\n+fl5lpeXWVtbC8S/tbU1yBfT1dVFX18ffX19dHd3k8lkSKVSVWGQ/pzBURV3Vf2Yqr5GVV8L3AN8\nU1V/G/gryq5FqHYtPgXcIyIxEbkBuBF47oCHbRj7JrLpB04q4fQBuVyObDbL/Pw8c3NzzM3NMT09\nzfT0NPPz82Sz2SDXvV8ztbe3l97eXoaGhujv76erq4tUKkUsFqv5IXYM+TjwRNi1qKrnROQJypE1\nm8AHLVLGOI6YuEeIWtWVcrkcKysrzM/PMz09zeTkJNPT01y5ciUQ93w+v0Xc+/r6GBoaYmhoiL6+\nPrq6ukgmk8RisapvBscJVf0b4G8q2/PUcS2q6kPAQwc4NMNoOCbuEcMX9lKpVGW5T01NcfHiRaam\nplhaWmJpaSkQdyCIa3fiPjw8zNDQEL29vVWWu4vuqTVvYBjG0cDEPUKEhT0s7tPT01y6dInJycmq\nnPS13DK+uKfTaTKZTGC5m6gbxtHHxD1C+KLuEp1tbGywsrISTKZOTk4yOTm5pXwgEJTMS6fT9PT0\n0N/fz8DAAIlEgmQySSKRoL29/VikGzCMk46Je0QI57JxbXFxkZWVFdbW1tjY2KiqEOXna6+V6jgc\nFWMJwgzj+GDiHhFUlc3NzSCfvGtLS0usrKywurpaJe7OfeMnV6tVtckvXGLibhjHBxP3iOAWKeVy\nOVZXV8lms2Sz2UDcfcvdz1njx9ib5W4Y0cHEPSL4lvva2hrLy8tBKT1nua+vrweWe60UCGFx99ML\nuNS+Ju6GcTywmbGI4Cx3V891eXmZhYWFLZa7n97Xd8v4mSr9knpmuRvG8cQs92NIeKESENRxzW
az\nLC4uMjc3x8zMDLOzsywuLgYrUZ3V7nLH+G1wcDBYsJRKpYKCHOF0CYZhHH1M3I8h4RJ/LuTR+did\nsF++fJnZ2VkWFhbIZrNsbGxQLBZpbW0lkUjQ1dUVtEwmw8jICNdccw19fX2k02ni8XhgtZvlbhjH\nCxP3Y4gTd79uq5tIXVpaYn5+PhD3xcVFlpeXWVlZIZfLBcVCkskk3d3dDA4OMjAwwMDAQFW6AV/c\nzWo3jOOHifsxxK/R6hYjOV+7E3eXRyabzQYrUZ3lHovFSCaT9PT0MDQ0xMjICCMjI/T09ATNF3c/\nosYE3jCOBybuxxBVpVgsUiwWg3zs9dwy/iSqs/KdW6a7u5uhoSFe85rX8NrXvjaotuRyt3d0dFSV\n/vNfDcM42pi4H0N8y91fuLS6usrKygpLS0ssLCwwOzsbpBZwtLS0EIvFqnK2Dw8PMzIyUlU6z59M\nNQzj+GHifgxx1ZWcqK+trQXhji6WvV691tbWVrq6ukin06RSqaDQtRPz41CAwzCMq3PVOHcR+ZyI\nTIvID71jZ0Xkkoh8v9LuaO4wDR/njsnlcqytrZHNZuumGGhpaaG9vT2osJTJZAJx7+zsDMS9Vky7\nYRjHl538B38e+M0axz+lqm+utL9u8LiMbXAuGSfuKysrLC8v1xR3t/LU1UP1xT2VSpFIJAJxj8Vi\ntmDJMCLCTgpkf0tErqvxI/vPPyScW6aW5V7LLdPe3h6Ie2dnJ5lMZovlHo/Ht9RiNXE3jOPLfr57\n3yciL4jIZ0Wkq2EjMq6K73N3lvvV3DL1LPd6bhkTdsM43ux1QvUR4D+oqorIfwQ+BfxOvZPHxsaC\n7dHRUUZHR/fY7cmiXl1m53P3xd13y2xublIqlQACYe/s7KS7u5u+vj4GBgaCWHa/AIc/iXpUxX18\nfJzx8fHDHoZhHHn2JO6qesXbfRT4q+3O98Xd2DtO7H1xd+GP4eRgpVIJEamqi9rb28vg4GCwCtWV\nzmtvbz8Wwg5bjYMHH3zw8AZjGEeYnbplBM/HLiLD3s/eDfy4kYMyqvEThLkFTPl8/qqWuxN3VxfV\nF/fe3l4ymUxguTuOsrAbhrFzrmq5i8iXgVGgT0ReAc4CbxWRW4AScAH43SaO0YC64u4s9+3E3bfc\nXf6YWkWvTdgNIzrsJFrmvTUOf74JYzFq4PvdnQVfyy1zNcs9nU5XWe6JRCJozi0TRUQkDvxvIFZp\nT6rqx0SkB/hz4DrKBsrdqrpUueYMcC9QAD6iqk8fxtgNYz/YSpUjjhNzl2rAWexuZaoTd5fS1wm7\nHyWTTCZJp9N0dXUFicEymQypVIqOjo4gxYCz3v123FHVHPBWVX0T8IvAPxWR24AHgGdU9fXAN4Ez\nACJyM3A3cBPwduARicIbYZw4LP3AEcZVV8rn80GCsHw+z8LCAouLi0GVJZf50dVGdZZ4LBYLYtqT\nyWTVgqVwEY4oo6prlc04ZYNmAbgL+CeV448B45QF/07gcVUtABdE5DxwK/DsQY7ZMPaLifsRxtVF\nzeVyrK+vB2l7nbgvLy+zvLxMNptlfX09SMvrSuS1tLQEi5Wcle4Sg7myeidhsZKItAB/C/wD4E9V\n9ZyIDKnqNICqTonIYOX0a4Fve5dPVI4ZxrHCxP0IE66Lurq6SjabDWqj+pb7+vp6kELAibuz3N1i\npUQisaV03klYsKSqJeBNIpIBvi4io0B4EUHtRQVXZczbHt3bLQyDxq/hMHE/wjhx96ssuXS+foUl\n55ZxrhgXIeNWpPp5ZJzl7hffiLq4O1R1WUS+BvwSMO2s90po70zltAngtHfZSOVYHcZC++a9MfZG\no9dwRNvZeszxLfe1tTWWl5eZn5+v6XNfX18Pcrc7cQ9nf/TF3bfeoyzwItLv0mOISAL4DeB54Cng\n/ZXT3gc8Wdl+CrhHRGIicgNwI/DcgQ7aMBqAWe5HhFKpFIQ6uu1cLrelutLMzAxXrlypKnrtomNc\nEQ4X097f309fXx9dXV2kUqkgOVhUhbwO1wCPVSJeWoAvqur/EpHngSdE5F7gZcoRMlT88U8A54BN\n4INaLw+EYRxhTNyPCH7pPNfW19dZWVlhcXGR2dnZoC7q3NwcCwsLrK6uks/nUdUgra/LIdPf38/Q\n0BD9/f1bxP0koao/At5c4/g88LY61zwEPNTkoRlGUzFxPyKE49kLhUKQWmBhYYG5uTmmp6eZmJgI\nImSy2Wwg7rUShA0PD9Pb20t3dzednZ3E4/HIhz0ahlHGxP2I4NwxLpY9n8/XtNwvX74cWOyuAVWW\ne1dXV2C5++l9T6LlbhgnFRP3I4JvuYeTgi0uLgaW++XLl8nlcoF/3rWwW6avr4+hoSE6OzuDNAMu\n9t0wjOhj4n7AbJej3VnrLqZ9YWGhZisWi0FRjba2tiB/jF9pyaUacKLuImRM3A3jZGDifoj4mR7z\n+Tyrq6ssLi6yuLgY+NmnpqaYn59nZWUlsNj9otcutLG7u5tMJlMV9lir6PUJi5QxjBOLifsh4btU\nADY3NwNxd+GOMzMzTE9PV4l7qVQKrHWXFCyZTG4R93CqASfuhmGcDEzcD4Gwv9y33BcWFpiZmeHy\n5ctMTU0xNzfH/Pw82Ww2sNxbW1uromPS6XRNcfddMVb02jBOFibuh0gtcV9cXOTKlStMTExw+fLl\nIMWAb7m7BUvOz57JZKrE3U810NbWFqxANXE3jJODifsh4Qu7C4EMi/ulS5fY2Nggl8sFr77PvaOj\ng1QqRVdXV13L3UIfDeNkYuLeRGpFxqhqINR+m5iYYHp6mtnZWRYXF6uKbxSLRUSEtrY2VJVEIhHE\ns/f19TEwMMDAwMCWuqhRzhljGMb27KSG6gjwBWCIcs3UR1X1D7crU2bUp1QqkcvlgtJ4zuUyNTXF\n5OQks7OzLC0tsba2FrhhVDUQ95aWlqCyUnd3d1XpvJ6eHtLpdORL5xmGcXV2Ej5RAO5X1TcCbwE+\nJCJvoE6ZMmN7SqUSGxsbLC8vMzc3x+XLl3n55Zd55ZVXAnFfXFwMVqEWi8Ut4Y++uDvLfXBwkN7e\n3ipxNwzj5LKTAtlTwFRlOysiL1LOcV2vTJlRB+djd+I+OzvL5OQkk5OTXLlyhbm5Oebm5qosdxej\n7rdkMlm1EnVwcJDBwcEgLDKZTNLWZh43wzjJ7EoBROR64BbgO0C9MmVGDZz/3bllnLg7y31+fp6l\npaWgdJ6z3IEgyqWtrY1YLLbFcnfi7ldhMreMYZxsdizuItIJ/AXwkYoF36AyZdHGX4Xq8sfkcrmg\n+IbL0+5S+K6trbG2tha4ZEQkKL7hWk9PDz09PXR3d9PV1RUkBwtb+IZhnFx2JO4i0kZZ2L+oqq5i\nTb0yZVsYGxsLtsOlpKJOeLFSsVgMil5vbGwEuWTW1tbY2Nggn89TKBQCP3s8HiedTgfhjl1dXZw6\ndYqBgYEgT3t4FWqUo2QaXWfSMKLKTi33PwPOqeqnv
WOuTNnDVJcp24Iv7ieJcCx7qVQK8rW7JGFr\na2uBuOdyucBiL5VKiAjxeJzOzk56e3uDkMfh4WEGBgbo6emhs7MzEHd/JWpUaXSdScOIKjsJhbwN\n+C3gR5XSZAp8jLKobylTZlTjC7ursORb7mtra0GBa1ekw1nura2tgeXe29vL8PAwp06dCsrndXd3\nV+Vp9632qFruhmHsjJ1Ey/wfoJ4Dt2aZMqOMb7U7YXeWuy/uznJ3FrtrYct9aGiI06dP09PTQyaT\nIZPJVLllfFE3cTeMk43FyzUAt8io3opU33J3xThyuVzglnGWe5iWlhY6OjpIp9NB2bzTp08Hq1AT\niQTJZDLIIWMYhuEwRWgQfvpe9+qiYlxbX19ncXGRS5cuceXKFZaWltjY2AgqKflhjLFYjHQ6XeVf\nT6fTNTM+mpVuGEYYE/cGUSuNr0sz4ApwuHJ5Fy9eZGZmJhB3l6M9mUySSqWC1t3dHUTGOFeMy/YY\ni8WCBU4m7oZhhDFxbyC+j90XdxfL7opvTE9PMzMzw/LyciDu8XicRCIRhDz29PQEk6i1LPe2tjYT\nd8Mw6hLdmLkDJhzy6BYrraysBCtRL1y4wM9//vMqt8z6+nrglkkmk3R1dTEwMMCpU6c4ffo011xz\nTVXGR+cXRHVQAAAPJ0lEQVRjN7fMzhCRERH5poj8RER+JCIfrhzvEZGnReQlEfm6iHR515wRkfMi\n8qKI3H54ozeMvWOWewMJC7xLMzA3N8fk5CQXLlzg4sWLgf99fX19i1vGF/fh4WH6+vro7e2tyvjo\nF+CwsMer4hLfvVBZZf23IvI08AHKie8+ISIfpZz47gERuZlyWO9NlHMoPSMir9N6lc0N44hi4t4g\n/DBH95rNZgOf+/z8PHNzc8zOzpLP54OFTJubm0FMu583pqenh/7+/iC1gHPHtLe3W2qBXbCHxHd3\nAo+ragG4ICLngVuBZw946IaxL0zcG4CfM2ZjYyMoxrG4uMjS0lIQ6ujSC7gPAOebh3JcuhN4v/C1\nK3RtLpj9s8PEd9cC3/Yum6gcM4xjhYl7gygUCkHo4+rqalDsenl5eYu4Oyvf5Wp3eWRcOKTL2e5q\nofqRMcbeaF7iuzFve3RvtzAMGp83ycS9Aahqlbi7KkuLi4uBuDv/ej6fr5p0LZVKwFbL3S1Qspj2\n/bPLxHcTwGnv8pHKsTqMhfbNe2PsjUbnTbJomQbhi/vy8jILCwtV4r66urqtW6aW5Z5MJoO4drPc\n98V2ie+gOvHdU8A9IhITkRuAG4HnDmqghtEozHLfJfVSDDhxX11dDcTdd8usr6+Ty+XY3NyseV9n\nube3t1dZ7s5q9y13s953zm4T36nqORF5AjgHbAIftEgZ4zhi4r4HnMXtwh4LhULgjnGrUKenp5md\nnWVhYYFsNsvGxgaFQuGwh37i2EviO1V9CHioaYMyjAPAxH2XhBcquXh23x0zOzvLzMwMV65cYXFx\nMRD3YrF42MM3DOOEYOK+S/wUvi73unPHhC33+fl5VlZWTNwNwzhwTNx3ie+K2dzcDHKzh90yri6q\nK6WXy+XMLWMYxoFh4r4HfMt9O3FfWlqqWrVqlrthGAeFifsu8S33fD4fVFNyzS1gctWV/MpKLqYd\n2JIbxs/yGI6KsegYwzB2y1Xj3Gtk1fu9yvGzInJJRL5faXc0f7iHTy1hX11drVqk5Mex+5E1jpaW\nliDsMRaLEY/HgxQDLneMX+zaEoQZhrFbdmK518qq943Kzz6lqp9q3vCOHi6PjHPHuFJ5fnoB54Jx\nwu4vVoJXxd231v0CHH6edhN1wzD2wk4KZNfKqucSKZ041fHFvZbl7hYq+eLuNxGpEndnrcdisWDB\nkm+5m2vGMIy9sKv0A15WPZdA4z4ReUFEPusXO4gybjLVL3Jdyy1TLBa3CLzD+dhruWXqWe5mwRuG\nsRt2LO7hrHrAI8BrVfUWypb9iXDPOMs97HevlzfGt9id1e6E3aX27ezsDDJAOpEP+90NwzB2w46i\nZWpl1VPVK94pjwJ/Ve/6sbGxYDuc+SzKODF3bhgn7Ol0mnQ6TSaTIZ1O09nZyTXXXMOpU6fo7+8n\nk8kEdVJruWhOMo1Oi2oYUWWnoZBbsuqJyHDFHw/wbuDH9S72xf0k4ScDc+6WeDweFL/228DAAIOD\ngwwMDATi3traWjM08iTT6LSohhFVriru22TVe6+I3AKUgAvA7zZxnMcS3wXjJkyTySQ9PT0MDg4y\nODjI0NAQQ0ND9Pb20t3dTVdXV5Xl7oTdLHfDMHbDTqJl6mXV++vGDyda+JZ7PB6no6ODVCpFd3c3\nAwMDXHvttYyMjHDttdeSyWSCHO4uj7tfCNswDGM32ArVJhIW90QiQWdnZ2C5nzp1iuuuu47rr7+e\nVCoV+NidC8esdcMw9oqJ+y5xk6N+Iet0Ok13dzfr6+tBGb2WlhZUlUQiETR37vDwMAMDA/T399PT\n00MmkyGRSFS5X9y2YRjGXjBx3yUtLS2BsHd2dgZhjqVSiba2NhKJBJlMhr6+PlQ1cMe4WPZUKsXw\n8DCDg4N0dXWRSCSqYtoBc8UYhrFvTNx3SUtLC+3t7SQSCVQ12G9vbyeZTNLV1UV/fz9LS0uoalWZ\nPPeh0NvbS09PD93d3YG4OyvdRN0wjEZg4r5LnJg7YY/FYoHLJZPJsL6+HuSbAQIfugtrjMVipFKp\noCWTyWDiFMxqNwyjMRyouI+Pjx/YAqZm9eXE3Ql1qVRifHyct7zlLVW5211hDt+P7hYjOUvezwK5\nU1GPwnt4VPozjChj4r5LaqUD+N73vsc73/nOhvdViyi8h0elP8OIMhaOYRiGEUFM3A3DMCKI+Klo\nm9KBSHM7ME48qrrtZIWIfA54JzCtqr9YOdYD/DlwHeX0GXer6lLlZ2eAeykXqvmIqj5d575azsbh\n8zD33z/PJz/58D6eyDDK83VX+9vejqb73PczOMNoEJ8H/gj4gnfsAeAZVf2EiHwUOAM8ICI3A3cD\nNwEjwDMi8jptthVkGA3G3DJG5FHVbwELocN3AY9Vth8D3lXZvhN4XFULqnoBOA/cehDjNIxGYuJu\nnFQGVXUaglKSg5Xj1wIXvfMmeLWspGEcG2wRk2GU2aPbZczbHm3AMIyTSqML0RyIuIvIHcB/o/xN\n4XOq2rTZJhG5ACxRzjO/qaoN/Uq928m5JvR1FvhXwEzltI+p6r7TL4vICGWf9BDl9+5RVf3DZjxb\njb4+o6p/1Kxnq8O0iAyp6rSIDHt9TgCnvfNGKsfqMBbaf7bWSYZxVRpdiKbpbhkRaQH+GPhN4I3A\ne0TkDU3ssgSMquqbGi3sFT5P+Vl83OTc64FvUp6ca1ZfAJ9S1TdXWqPErwDcr6pvBN4CfKjye2rG\ns4X7us/7m2jGswFIpTmeAt5f2X4f8KR3/B4RiYnIDcCNwHMNHIdhHAgH4XO/FTivqi+r6ibwOOXJ\nrGYhNPG5
djk514y+oFqkGoKqTqnqC5XtLPAiZau14c9Wpy/n1274s4nIl4H/C/yCiLwiIh8APg78\nhoi8BPx6ZR9VPQc8AZwDvgZ80CJljOPIQbhlwhNUl2hu9IEC3xCRIuWv+482sS9H1eSciAxe7YJ9\ncp+I/DbwPeDfNMIF5CMi1wO3AN8Bhpr5bF5fzwK/ShOeTVXfW+dHb6tz/kPAQ/vt1zAOkyhGy9ym\nqm8G3kHZtfCrhzCGZlp6jwCvVdVbgCngU428uYh0An9BefFOlq3P0rBnq9FXU5/NME4SByHuE8Br\nvP2rTFDtD1WdrLxeAb7KwcQoT4vIEEBocq7hqOoVz03wKPDLjbq3iLRRFtsvqqrzQTfl2Wr11cxn\nM4yTxkGI+3eBG0XkOhGJAfdQnrRqOCKSrFiDiEgKuB34cTO6YmeTcw3vqyKwjnfT2Of7M+Ccqn7a\nO9asZ9vSV5OfzTBOFAeRfqAoIvcBT/NqKOSLTepuCPhqJZ9NG/ClenlB9kplcm4U6BORV4CzlCfj\nviIi9wIvU16+3qy+3ioit1COCroA/G6D+roN+C3gRyLyPGX3y8eAh4EnGvls2/T13mY8m2GcRJqe\nOMwwooolDjOayX4Th0VxQtUwDOPEY+JuGIYRQUzcDcMwIoiJu2EYRgQxcTcMw4ggJu6GYRgRxMTd\nMAwjgpi4G4ZhRBATd8MwjAhi4m4YhhFBTNwNwzAiiIm7YRjGNgwPX4+I1GzDw9cf9vDqciAFsg3D\nMI4r09MvU69GzfR0w6tCNgyz3A3DMCKIibthGEYEMXE3DMOIICbuhmEYEcTE3TBqICJ3iMhPReTv\nROSjhz0ew9gtJu6GEUJEWoA/Bn4TeCPwHhF5w2GMZXx8PBJ9HFQ/B/MsB9HH/jFxN4yt3AqcV9WX\nVXUTeBy46zAG8s53vqvpMdbHWdzDMehvfetbr/re1Itb3/n7Od6AkTcfE3fD2Mq1wEVv/1Ll2I74\n0z/9bMMEeXV1iXKM9dY2PT117BbWNJpXY9BdO0v5vXl5F9ds/34eV2wRk2Hsg0zmn1ft53I/Y21t\nnvqLXjpqCkZLS5JSaW2Xvedq9lOvD4ChoeuYmrqw5fjw8PV1BbHe2LYbc72f/f7v/5ddX7O39ya+\nB2Gu/X7C8RR4Ua39R2gYJxUR+cfAmKreUdl/AFBVfTh0nv3zGE1FVff8yWLibhghRKQVeAn4dWAS\neA54j6q+eKgDM4xdYG4ZwwihqkURuQ94mvK81OdM2I3jhlnuhmEYEcSiZQxjlzRrgZOIXBCRH4jI\n8yLyXOVYj4g8LSIvicjXRaRrD/f9nIhMi8gPvWN17ysiZ0TkvIi8KCK376OPsyJySUS+X2l37LOP\nERH5poj8RER+JCIfbtKzhPv5vUY/j4jEReTZyu/6JyLynxv+LKpqzZq1HTbKBtHPgOuAduAF4A0N\nuvffAz2hYw8D/76y/VHg43u4768CtwA/vNp9gZuB5ym7bK+vPKvssY+zwP01zr1pj30MA7dUtjsp\nz4u8oQnPUq+fRj9PsvLaCnwHuK2Rz2KWu2HsjmYucBK2fpu+C3issv0Y8K7d3lRVvwUs7PC+dwKP\nq2pBVS8A5yk/8176gNpxhHftsY8pVX2hsp0FXgRGmvAstfpx6xwa+TwuvjNO+fe+0MhnMXE3jN2x\nrwVOV0GBb4jId0XkX1aODanqNJRFBxhsUF+Dde4bfr4J9vd894nICyLyWc/FsO8+ROR6yt8UvkP9\n96iR/TxbOdSw5xGRFhF5HpgCxlX1XCOfxcTdMI4Ot6nqm4F3AB8SkV9j66qaZkVANOO+jwCvVdVb\nKAvYJxtxUxHpBP4C+EjFsm7Ke1Sjn4Y+j6qWVPVNlL99/JqIjNLAZzFxN4zdMQG8xtsfqRzbN6o6\nWXm9Avwl5a/d0yIyBCAiw8BMI/ra5r4TwGnvvD0/n6pe0YrDGHiUV90Ie+5DRNooC+4XVfXJyuGG\nP0utfprxPJX7LgNfA36pkc9i4m4Yu+O7wI0icp2IxIB7gKf2e1MRSVYsRUQkBdwO/Khy7/dXTnsf\n8GTNG+ygC6r9xfXu+xRwj4jEROQG4EbKi7h23UdFnBzvBn7cgD7+DDinqp9u8rNs6aeRzyMi/c6t\nIyIJ4DcoT5g27ll2O/NuzdpJb8AdlCMozgMPNOieN1COvHmesqg/UDneCzxT6e9poHsP9/4ycJly\n8pRXgA8APfXuC5yhHI3xInD7Pvr4AvDDynP9JWV/8n76uA0oeu/T9yu/i7rvUYP7adjzAP+wct/n\ngR8A//Zqv+/d9mGLmAzDMCKIuWUMwzAiiIm7YRhGBDFxNwzDiCAm7oZhGBHExN0wDCOCmLgbhmFE\nEBN3wzCMCGLibhiGEUH+P+TZ+wgUbGx9AAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f22841b4d50>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "%matplotlib inline\n",
+ "\n",
+ "# We'll show the image and its pixel value histogram side-by-side.\n",
+ "_, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "\n",
+ "# To interpret the values as a 28x28 image, we need to reshape\n",
+ "# the numpy array, which is one dimensional.\n",
+ "ax1.imshow(image.reshape(28, 28), cmap=plt.cm.Greys);\n",
+ "\n",
+ "ax2.hist(image, bins=20, range=[0,255]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "weVoVR-nN0cN"
+ },
+ "source": [
+    "The large number of 0 values corresponds to the background of the image, another large mass of values at 255 corresponds to the black strokes of the digit, and a mix of grayscale transition values falls in between.\n",
+ "\n",
+    "Both the image and the histogram look sensible. But it's good practice when training image models to normalize values so that they're centered around 0.\n",
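+    "\n",
+    "As a quick worked example, the rescaling we'll use below maps each pixel value v to (v - 255/2) / 255:\n",
+    "\n",
+    "    0   -> (0   - 127.5) / 255 = -0.5\n",
+    "    128 -> (128 - 127.5) / 255 =  0.002 (approximately)\n",
+    "    255 -> (255 - 127.5) / 255 =  0.5\n",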
+ "\n",
+ "We'll do that next. The normalization code is fairly short, and it may be tempting to assume we haven't made mistakes, but we'll double-check by looking at the rendered input and histogram again. Malformed inputs are a surprisingly common source of errors when developing new models."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 531,
+ "status": "ok",
+ "timestamp": 1446749126656,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "jc1xCZXHNKVp",
+ "outputId": "bd45b3dd-438b-41db-ea8f-d202d4a09e63"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXUAAAEACAYAAABMEua6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXmQbHd13z9nlu7p7unu2Wf0NE9bxF4GQRlhClwMBmOB\niaTiD0XgwoCcMlWsFScOEpUq6aWSElAu8EKRhE0RBCILHCLZcYyQxdhFAggZie0J+ZnwxFtmebNP\nz9I9M33yR9/f5dfbm56Z7lnunE/Vr/r27Xvv73f7zfve0+d3fueIqmIYhmFEg7aDHoBhGIbRPEzU\nDcMwIoSJumEYRoQwUTcMw4gQJuqGYRgRwkTdMAwjQpioG5FGRJ4rIk+KyPeD10UR+YCI9IrIIyLy\njIh8XUSy3jl3icgZEXlaRN5wkOM3jJ0iFqduHBdEpA04D7wCeB8wq6ofE5EPAb2qeqeIvBD4EvBy\nYBR4FHiO2n8U44hglrpxnHg98DNVPQfcAtwf7L8fuDXYvhl4QFU3VfUscAa4cb8Hahi7xUTdOE78\nC+DLwfawqk4BqOokMBTsvxI4551zIdhnGEcCE3XjWCAinZSs8K8EuyrdKeZeMSJBx0EPwDD2iTcC\n/6CqM8H7KREZVtUpERkBpoP9F4CT3nmjwb4qRMQeBEZLUVXZ6TlmqRvHhbcC/917/zDwzmD7HcBD\n3v7bRSQmItcC1wOP17uoqra03X333ZHoI0r3sl/f124xS92IPCKSpDRJ+vve7o8CD4rIHcCzwG0A\nqnpaRB4ETgMbwHt0L//DDGOfMVE3Io+qrgKDFfvmKAl9rePvBe7dh6EZRtMx94thHGLGxsYi0cd+\n9ROVPvaCLT4yjF0iIuaZMVqGiKA2UWoYhnG8MVE3DMOIECbqhrEPPPPMMySTWTo64lVtcHCUXC53\n0EM0IoKJumHsAxMTE8RiL2Fra6mq5XIFVlZWDnqIRkSwkEbD2DfagHjVXpEdz4UZRl3MUjcMw4gQ\nJuqGYRgRwkTdMAwjQpioG4ZhRAgTdcMwjAhhom4YhhEhTNQNwzAihIm6YRhGhDBRNwzDiBAm6oZh\nGBHCRN0wDCNCmKgbhmFECBN1wzCMCGGibhiGESFM1A3DMCKEibphGEaEMFE3Io+IZEXkKyLytIj8\nREReISK9IvKIiDwjIl8Xkax3/F0iciY4/g0HOXbD2Ckm6sZx4E+Av1bVFwAvAX4K3Ak8qqrPAx4D\n7gIQkRcCtwEvAN4IfEqsNJFxhDBRNyKNiGSAX1fV+wBUdVNVF4FbgPuDw+4Hbg22bwYeCI47C5wB\nbtzfURvG7jFRN6LOtcCMiNwnIt8XkU+LSBIYVtUpAFWdBIaC468EznnnXwj2GcaRwApPG1GnA3gZ\n8F5VfUJEPkHJ9aIVx1W+b4h77rkn3B4bG2NsbGx3ozSOPePj44yPj+/5OibqRtQ5D5xT1SeC939B\nSdSnRGRYVadEZASYDj6/AJz0zh8N9tXEF3XD2AuVRsGpU6d2dZ09uV9E5CYR+amI/KOIfGgv1zKM\nVhC4WM6JyHODXa8DfgI8DLwz2PcO4KFg+2HgdhGJici1wPXA4/s3YsPYG7u21EWkDfgkpf8kF4Hv\nichDqvrTZg3OMJrEB4AviUgn8P+AdwHtwIMicgfwLKWIF1T1tIg8CJwGNoD3qOquXDOGcRDsxf1y\nI3BGVZ8FEJEHKEUUlIm6iNh/CKOlqOplQw5V9QfAy2t89Po6x98L3NuEoRnGvrMX90tllMB56kQJ\nqCqqyt133x1ut7pZX0err932ZxhGORbSaBiGESH24n65AFzlva8bJeAiBFzIjoV9GbulWWFfhhFV\n9iLq3wOuF5GrgQngduCttQ70RX2/BH0/HxzW1/7116ywL8OIKrIXv6SI3EQpr0Yb8DlV/UiNY9R8\nn0arEBF0m4nSFvbd8N/2+Pg4t956D4uL41WfJRLD/PznP2R4eLjJIzSOMrv9297T4iNV/RvgeXu5\nhmEYhtE8bKLUMAwjQpioG4ZhRAgTdcMwjAhhom4YhhEhTNQNwzAihIm6YRhGhDBRNwzDiBAm6oZh\nGBHCRN0wDCNCmKgbhmFECBN1wzCMCGGibhiGESFM1A3DMCKEibphGEaEMFE3DMOIECbqhmEYEcJE\n3TAMI0KYqBuRR0TOisgPRORJEXk82NcrIo+IyDMi8nURyXrH3yUiZ0TkaRF5w8GN3DB2jom6cRwo\nAmOq+lJVvTHYdyfwqKo+D3gMuAtARF4I3Aa8AHgj8CkROZAaqIaxG/Yk6rUsIMM4hAjVf+u3APcH\n2/cDtwbbNwMPqOqmqp4FzgA3YhhHhD0VnuaXFtB8MwZjGC1CgW+IyBbwX1T1s8Cwqk4BqOqkiAwF\nx14JfNs790KwzzCOBHsV9VoWkGEcNl6lqhMiMgg8IiLPUBJ6n8r3DXHPPfeE22NjY4yNje12jMYx\nZ3x8nPHx8T1fZ6+i7ltAn1bVz+x5RIbRZFR1Ini9JCL/k5I7ZUpEhlV1SkRGgOng8AvASe/00WBf\nTXxRN4y9UGkUnDp1alfX2auo+xbQN0TkaVX9VuVBZs0YzWKn1oyIJIE2Vc2JSAp4A3AKeBh4J/BR\n4B3AQ8EpDwNfEpFPUHK7XA/YfJFxZBDVXf3qrL6QyN3Asqp+vGK/NqsPw6hERFDVutEpInIt8DVK\nvyo7gC+p6kdEpA94kJJV/ixwm6ouBOfcBfwesAF8UFUfqXPthv+2x8fHufXWe1hcHK/6LJEY5uc/\n/yHDw8MNXcs4Hmz3t12PXVvql7GADOPQoKo/B26osX8OeH2dc+4F7m3x0AyjJezF/TIMfE1EfAuo\npkVjGIZh7A+7FvV6FpBhGIZxcOx1ovTQoqoUi8Wypqpsbm6ytbXF5uZmuO38oiJS1hz+dltbG+3t\n7WHr6Oigra0tPMY/t9arLU40DKOVRFrUnXj7Ip7P58nn86yvr4fbW1tbtLW1lbVKcXbbHR0dxGIx\n4vE4sVgsbJUPBP8atR4UhmEYreBYiHqhUGBjY4NCocDKykpV29zcDC1v3xKHaus9FouRSCRIJpPh\nq6pWPRAqHxJuTCbshmG0kkiLerFYDMXcWeZLS0ssLi6yuLgYbm9sbNDR0RG6U9x2Lcu7q6uLdDpN\noVBga2urSsD9B4Oqhq4dJ+yGYRitJNKi7ix1525ZW1tjaWmJ+fl5ZmdnmZubY25ujkKhQEdHB52d\nnWWvvpi711QqVSbo7kHgrPtisRgKeqW1b/H6hmG0msiKurPS19bWyOVyLC8vs7y8zOzsbFXL5/NV\ngu5PgPqvyWSyzB/vHha+sPtC767l/wI4rNSb2K10JblfHfUmhA3DODgiLerr6+ssLS2FFvnc3Bzz\n8/NhW1hYYGFhgUKhUCXEzo1S6VNPJBLkcjkWFxeZnZ0lnU6TSqWqRL29vZ3Ozs6q1tFxeL/yyjkB\nEQknhl2Lx+N0dnbWfOAZhnHwHF6F2
SNbW1usr6+zvLzM3Nwck5OTTE5OsrS0FPrSl5eXQ5965URp\nveiVWhOlXV1dVYLurH4/UsYJ4mGlMlyzra2NeDxOMpkM29bWFslksmzewL2apW4YB09kRd231Gdn\nZ5mcnOTcuXPkcrmwrayskMvl2NzcrBuxUulacEJdablWinp7ezuxWIyuri66urqIx+N0dXURi8UO\n7DvZDt9F5LYTiQSZTIZMJhN+T+4zVS2bezAM4+CJrKj7lvrs7CwTExOcO3eO1dVVVldXWVtbC5sT\nq8vFlLv3ldZsvdbW1kZXVxeJRIJEIkFXVxfJZJJ4PH4QX0dDuF8XfkulUqyvr4cTw52dnXR1dYXn\n+A9AwzAOnsiKurPUc7kc8/PzTE9Pc+HChbJJzkKhEC4+2i3O7eCLvdt2Qu43XxAPG7FYrOqXSCaT\nCQXdWe6FQiE8x4m6RfYYxuEgsqLe3t4eug4GBgY4ceJE1UrSyhWllStBXax75WvldrFYBAiFzRc4\nF4Wzvr6OqrKxsVHTX++OdbHtfoz7Tqj1C6PWytjKflS1bDLXbW9ubpa9d8Lvfn24h9phniswjONE\nZEW9ra2NRCJBNptlcHCQfD6Pqoai7r+62PLKtrW1VdVq5Y6pty0ioai7uPlCoVBzsZL7vDJfzU6o\nJ+iVUS1AVT/FYrHmAiy3MMsX9VgsxubmJsViMbTgzVI3jMNBZEXduT8ymQz5fJ5isUhnZydra2us\nr69Xibofn+62NzY22NzcDF/ddqFQKEs9sLGxUWX9FwqFUCw3NjbKBL1ywVJHRwfFYrHqobEXUXfb\nlRE9Lk6+1gOr1rEbGxtlgh6Px4nH42WC3tXVteOxGobRGiIt6s5Sd1EaqVSqTNTdtqqW+ZGdi8GJ\nty/ilS4cdy03AetE04l5pbsGqPkAKRaLZQ8Q9yDYCbUWA9VaDAWEDym/1Yo990XdCXo8Hg8FPR6P\nh79EDMM4eCIv6s5XnEql6O3tDSNeKkXdhR36oYf1BHxtba0sgmZ1dZXl5eUwvK9YLFIoFEKr27f0\nt7a2yiYkt7a26OzsDI/zfwFsbm7u6J5r+c8rV7S6MW5sbFQ1d57/WigUqgQ9Ho+Hgp5Kpdjc3DRR\nN4xDQmRF3Q+/87edMPsuGKBKtJyoVzY/FNJvqVSqrLl0ApXums3NzbIYd7e9tbUVum1c262o+8Jc\nmdOms7OTYrEYjhuoyisPv5zsjcViZeNxbqHKPPWGYRwOIi3qzi/sR2b4SbicYEF5OJ8TP39laDwe\np1AohGGKvtCvr6+zsrLC6upqWUpfX6SduDtLvbI5n3ulu2cn9+te/VYrUdnm5iaLi4ssLCywuLgY\nWu5+9A38Utj9eQD/YeSuZ3HqhnF42FbUReRzwJuBKVV9cbCvF/hz4GrgLKVK7IstHOeOcaLufMi+\nyDtx8t0Ovp/b93c7QfcnSn03iduu5avP5/NllnqrRb2WsPv34bbz+TzT09N0dXWFfvOVlZWaFrj/\nPfqTpe56fkoFwzAOnkYs9fuAPwO+4O27E3hUVT8mIh8C7gr2HSqciNcSdD/qA6iZ+8VFpLjmJjMr\no1RqCXKt5rtfKisn1XK/7EbUa1nqlcK+trYWuqU2NjbI5XJ0dHSUTcw6Qa/8xVNpqbvvyjCMw8G2\noq6q3xKRqyt23wK8Jti+HxjnkIm6H5PtJ5+qXExUqz6pszwrj6u3EKlWOKIf/uhb69tZ6r6w78Wn\nfjlLfXl5ORT0lZUV5ubmqhYP+SGKlQ9EN5Hsi7pZ6oZxONitT31IVacAVHVSRIaaOKam4At0q6kU\nfbftSulV+tQrl+I30/1yOUvdbS8uLoYFQ2ZmZkgkEmGBDzd+90ullpXuIoWOkk9dRNqAJ4Dzqnrz\n5VyIInIXcAewCXxQVR85mFEbxs5p1kTpZcMf7rnnnnB7bGyMsbGxJnV7uKhc/FNrotat3Kz0SbsU\ntv7qzJ0W1Kgn7E6g3S+Jyvw3zi3kFhS5MXV2doY5a1KpFN3d3WFzeWxisVgYJrkfjI+PMz4+vptT\nPwicBjLB+5ouRBF5IXAb8AJgFHhURJ6jFuJjHBF2K+pTIjKsqlMiMgJMX+5gX9SjjC9sTqj9ohhO\nXGvVQq01qbuTxUe18rs4F5JrLp7cibov6M7VU6t8n2vd3d2k02nS6XQo6s4Fs1+iXmkUnDp1attz\nRGQUeBPwH4E/CHbXcyHeDDygqpvAWRE5A9wIfLcpN2AYLaZRUZegOR4G3gl8FHgH8FBzh3X0cAJa\nKez+tosFr8zoWOm+cNbybpbeVy4e8lMOOPGutNLd6lW/2pFrvpXuW+vOBROLxfZV1HfJJ4A/BLLe\nvuE6LsQrgW97x10I9hnGkaCRkMYvA2NAv4j8Argb+AjwFRG5A3iW0s/VY48v7LXcH861UrkU34m4\nO973b++0/8r3LgeNs8j9VbKVi4o6OztDUXf+cyfqvqCn0+mqEn2HVdRF5LcpheM+JSJjlzl0V+6V\n4+JaNFrPHlyLZTQS/fK2Oh+9fs+9R5BKv7r/3hfpSovabbsonWbhZ4l0E7eV7hcn+O7B4ldtqud+\nqUz8dVhFHXgVcLOIvAlIAGkR+SIwWceFeAE46Z0/GuyryXFxLRqtZzeuxVoc/rCFI0C9CUpniftp\ndivdLrWObaSyUq1KS35z13QTo65YyKVLl5ienmZubo6lpSVWV1dD0W9vbw/zuWSzWfr7++nv76en\np4dMJkMqlSoLZ/TnBA6rqKvqh1X1KlW9DrgdeExV3w78JSUXIpS7EB8GbheRmIhcC1wPPL7PwzaM\nXRPZNAHHlcpl/vl8nlwux9zcHLOzs8zOzjI1NcXU1BRzc3Pkcrkw17xfk7Svr4++vj6Gh4cZGBgg\nm82SSqWIxWI1H15HkI8AD1a6EFX1tIg8SClSZgN4j0W+GEcJE/UIUauaUT6fZ3l5mbm5OaamppiY\nmGBqaopLly6Fol4oFKpEvb+/n+HhYYaHh+nv7yebzZJMJonFYmW/BI4Sqvp3wN8F23PUcSGq6r3A\nvfs4NMNoGibqEcMX9GKxWGapT05Ocu7cOSYnJ1lcXGRxcTEUdSCMS3eiPjIywvDwMH19fWWWuovW\nqTUvYBjGwWKiHiEqBb1S1Kempjh//jwTExNlOeFruV98UU+n02QymdBSNzE3jMOLiXqE8MXcJSBb\nX19neXk5nCSdmJhgYmKiqkwfEJamS6fT9Pb2MjAwwODgIIlEgmQySSKRoLOz80ikBTCM44qJekSo\nzDXj2sLCAsvLy6yurrK+vl5WkcnPl14r5XBllIsl7jKMw4+JekRQVTY2NsJ87q4tLi6yvLzMyspK\nmag7N42f9KxWlSS/YIiJumEcfkzUI4JbXJTP51lZWSGXy5HL5UJR9y11P6eMHyNvlrphHH1M1COC\nb6mvrq6ytLQUlqxzlvra2lpoqddKVVAp6n4aAJdi10TdMA43NuMVEZyl7uqlLi0tMT8/X2Wp+2
l2\nffeLnznSL11nlrphHC3MUj+CVC4wAsI6qblcjoWFBWZnZ5menmZmZoaFhYVw5aiz0l1uF78NDQ2F\nC41SqVRYCKMyrYFhGIcXE/UjSGUpPRe66HzoTtAvXrzIzMwM8/Pz5HI51tfX2draor29nUQiQTab\nDVsmk2F0dJQrrriC/v5+0uk08Xg8tNLNUjeMo4GJ+hHEibpfF9VNkC4uLjI3NxeK+sLCAktLSywv\nL5PP58MiHclkkp6eHoaGhhgcHGRwcLAsLYAv6malG8bRwUT9COLXQHWLiJwv3Ym6y/OSy+XClaPO\nUo/FYiSTSXp7exkeHmZ0dJTR0VF6e3vD5ou6HyFjwm4YhxsT9SOIqrK1tcXW1laYD72e+8WfHHVW\nvXO/9PT0MDw8zFVXXcV1110XVjdyudO7urrKSuz5r4ZhHE5M1I8gvqXuLzhaWVlheXmZxcVF5ufn\nmZmZCVMAONra2ojFYmU500dGRhgdHS0rUedPkhqGcXQwUT+CuGpGTsxXV1fDsEUXi16vHmp7ezvZ\nbJZ0Ok0qlQoLSDsRPwqFLwzDqM+2ceoi8jkRmRKRH3r77haR8yLy/aDd1NphGj7O7ZLP51ldXSWX\ny9VNBdDW1kZnZ2dY0SiTyYSi3t3dHYp6rZh0wzCOHo38z70P+K0a+z+uqi8L2t80eVzGZXCuFyfq\ny8vLLC0t1RR1t1LU1Rv1RT2VSpFIJEJRj8VittDIMI44jRSe/paIXF3jI/sff0A490stS72W+6Wz\nszMU9e7ubjKZTJWlHo/Hq2qdmqgbxtFjL7+x3yciT4nIZ0Uk27QRGdvi+9Sdpb6d+6WepV7P/WKC\nbhhHk91OlH4K+PeqqiLyH4CPA79X7+B77rkn3B4bG2NsbGyX3R4v6tU7dj51X9R998vGxgbFYhEg\nFPTu7m56enro7+9ncHAwjEX3C1/4k6OHVdTHx8cZHx8/6GEYxqFlV6Kuqpe8t58B/vJyx/uibuwe\nJ/K+qLswxsqkXcViEREpqzva19fH0NBQuGrUlajr7Ow8EoIO1UbBqVOnDm4whnEIadT9Ing+dBEZ\n8T57C/DjZg7KKMdP3OUWHhUKhW0tdSfqru6oL+p9fX1kMpnQUnccZkE3DGN7trXUReTLwBjQLyK/\nAO4GXisiNwBF4Czw7haO0YC6ou4s9cuJum+pu/wutYpJm6AbxtGnkeiXt9XYfV8LxmLUwPerO4u9\nlvtlO0s9nU6XWeqJRCJszv0SRUQkDvw9EAvaQ6r6YRHpBf4cuJqSYXKbqi4G59wF3AFsAh9U1UcO\nYuyGsRtshckhx4m4SwngLHS3ktSJukut6wTdj3pJJpOk02my2WyYsCuTyZBKpejq6gpTAThr3W9H\nHVXNA69V1ZcCLwZ+Q0ReBdwJPKqqzwMeA+4CEJEXArcBLwDeCHxKovBFGMcGSxNwiHHVjAqFQpi4\nq1AoMD8/z8LCQljVyGVidLVHneUdi8XCmPRkMlm20Kiy+EWUUdXVYDNOyZCZB24BXhPsvx8YpyT0\nNwMPqOomcFZEzgA3At/dzzEbxm4xUT/EuLqj+XyetbW1MH2uE/WlpSWWlpbI5XKsra2F6XFdKbq2\ntrZwkZGzyl3CLle+7jgsMhKRNuAfgH8G/GdVPS0iw6o6BaCqkyIyFBx+JfBt7/QLwT7DOBKYqB9i\nKuuOrqyskMvlwtqjvqW+trYWLvV3ou4sdbfIKJFIVJWoOw4LjVS1CLxURDLA10VkDKhcBFB7UcA2\n2BoMo1k0aw2Gifohxom6X9XIpdX1Kxo594tzubiIF7eC1M/z4ix1v+hF1EXdoapLIvLXwK8CU85a\nD0J0p4PDLgAnvdNGg301sTUYRrNo1hqMaDtTjzi+pb66usrS0hJzc3M1fepra2th7nQn6pXZGH1R\n9631KAu7iAy4NBYikgB+E3gSeBh4Z3DYO4CHgu2HgdtFJCYi1wLXA4/v66ANYw+YpX5IKBaLYcii\n287n81XVjKanp7l06VJZMWkX7eKKX7iY9IGBAfr7+8lms6RSqTBpV1QFvA5XAPcHESxtwBdV9W9F\n5EngQRG5A3iWUsQLgb/9QeA0sAG8R+vlazCMQ4iJ+iHBL1Hn2traGsvLyywsLDAzMxPWHZ2dnWV+\nfp6VlRUKhQKqGqbXdTleBgYGGB4eZmBgoErUjxOq+iPgZTX2zwGvr3POvcC9LR6aYbQEE/VDQmU8\n+ubmZpgCYH5+ntnZWaamprhw4UIY8ZLL5UJRr5W4a2RkhL6+Pnp6euju7iYej0c+fNEwjjsm6ocE\n53ZxseiFQqGmpX7x4sXQQncNKLPUs9lsaKn7aXaPo6VuGMcNE/VDgm+pVybrWlhYCC31ixcvks/n\nQ/+7a5Xul/7+foaHh+nu7g7TAbjYdcMwoouJ+j5zuRzpzjp3Menz8/M129bWVljMoqOjI8zv4lc2\ncikBnJi7iBcTdcOINibqB4ifebFQKLCyssLCwgILCwuhH31ycpK5uTmWl5dDC90vJu1CFHt6eshk\nMmXhi7WKSR+zyBfDOHaYqB8QvusEYGNjIxR1F7Y4PT3N1NRUmagXi8XQOnfJupLJZJWoV6YEcKJu\nGEa0MVE/ACr94b6lPj8/z/T0NBcvXmRycpLZ2Vnm5ubI5XKhpd7e3l4W7ZJOp2uKuu9ysWLShnE8\nMFE/QGqJ+sLCApcuXeLChQtcvHgxTAXgW+puoZHzo2cymTJR91MCdHR0hCtGTdQNI/qYqB8QvqC7\nUMZKUT9//jzr6+vk8/nw1fepd3V1kUqlyGazdS11C2E0jOOFiXoLqRXpoqqhQPvtwoULTE1NMTMz\nw8LCQlnRi62tLUSEjo4OVJVEIhHGo/f39zM4OMjg4GBV3dEo53QxDKM2jdQoHQW+AAxTqkn6GVX9\n08uVAzPqUywWyefzYQk651qZnJxkYmKCmZkZFhcXWV1dDd0tqhqKeltbW1jJqKenp6xEXW9vL+l0\nOvIl6gzDqE8j4RCbwB+o6ouAVwLvFZHnU6ccmHF5isUi6+vrLC0tMTs7y8WLF3n22Wf5xS9+EYr6\nwsJCuGp0a2urKozRF3VnqQ8NDdHX11cm6oZhHD8aKTw9CUwG2zkReZpSjul65cCMOjgfuhP1mZkZ\nJiYmmJiY4NKlS8zOzjI7O1tmqbsYc78lk8mylaNDQ0MMDQ2F4Y3JZJKODvOsGcZxZEf/80XkGuAG\n4DtAvXJgRg2cf925X5yoO0t9bm6OxcXFsESds9SBMGqlo6ODWCxWZak7UferHpn7xTCOJw2Luoh0\nA18FPhhY7E0pBxZ1/FWjLr9LPp8Pi164POkule7q6iqrq6uh60VEwqIXrvX29tLb20tPTw/ZbDZM\n2lVp0RuGcfxoSNRFpIOSoH9RVV2FmHrlwKo4znUcKxcZbW1thcWk19fXw1wvq6urrK+vUygU2Nzc\nDP3o8XicdDodhi1ms1lOnDjB4OBgmCe9ctVolKNem
lXH0TCiSqOW+ueB06r6J94+Vw7so5SXA6vi\nuNZxrIxFLxaLYb50l7xrdXU1FPV8Ph9a6MViEREhHo/T3d1NX19fGLo4MjLC4OAgvb29dHd3h6Lu\nrxyNKs2q42gYUaWRkMZXAb8D/CgoAabAhymJeVU5MKMcX9BdRSPfUl9dXQ0LR7viGM5Sb29vDy31\nvr4+RkZGOHHiRFimrqenpyxPum+lR9VSNwzj8jQS/fJ/gHoO2prlwIwSvpXuBN1Z6r6oO0vdWeiu\nVVrqw8PDnDx5kt7eXjKZDJlMpsz94ou5ibphHE8s7q0JuMVB9VaQ+pa6K4KRz+dD94uz1Ctpa2uj\nq6uLdDodlqc7efJkuGo0kUiQTCbDHC+GYRimBE3CT6PrXl2Ui2tra2ssLCxw/vx5Ll26xOLiIuvr\n62HlIj8cMRaLkU6ny/zn6XS6ZgZGs8oNw3CYqDeJWul0XToAV/jClaU7d+4c09PToai7HOnJZJJU\nKhW2np6eMNLFuVxc9sVYLBYuTDJRNwzDYaLeRHwfui/qLhbdFb2YmppienqapaWlUNTj8TiJRCIM\nXezt7Q0nR2tZ6h0dHSbqhmFUEd3Yt32mMnTRLTJaXl4OV46ePXuWn/3sZ2Xul7W1tdD9kkwmyWaz\nDA4OcuKtFEXOAAAQFklEQVTECU6ePMkVV1xRloHR+dDN/dIYIjIqIo+JyE9E5Eci8oFgf6+IPCIi\nz4jI10Uk651zl4icEZGnReQNBzd6w9g5Zqk3kUphd+kAZmdnmZiY4OzZs5w7dy70r6+trVW5X3xR\nHxkZob+/n76+vrIMjH7hCwtf3BaXkO6pYFX0P4jII8C7KCWk+5iIfIhSQro7ReSFlMJzX0Apx9Gj\nIvIcrVcx3DAOGSbqTcIPV3SvuVwu9KnPzc0xOzvLzMwMhUIhXIC0sbERxqT7eV16e3sZGBgIUwA4\nt0tnZ6elANgBu0hIdzPwgKpuAmdF5AxwI/DdfR66YewKE/Um4Od0WV9fD4tgLCwssLi4GIYsujQA\nTvid7x1KceVO2P2C0q6AtLla9k6DCemuBL7tnXYh2GcYRwIT9SaxubkZhjCurKyERaSXlpaqRN1Z\n9S5Xusvz4sIaXc50V2vUj3QxdkerEtId57xGRnNpVl4jE/UmoKplou6qGi0sLISi7vznhUKhbDK1\nWCwC1Za6W1hkMel7Z4cJ6S4AJ73TR4N9NTmueY2M5tOsvEYW/dIkfFFfWlpifn6+TNRXVlYu636p\nZaknk8kwLt0s9T1xuYR0UJ6Q7mHgdhGJici1wPXA4/s1UMPYK2ap75B6qQCcqK+srISi7rtf1tbW\nyOfzbGxs1Lyus9Q7OzvLLHVnpfuWulnrjbPThHSqelpEHgROAxvAeyzyxThKmKjvAmdhu/DFzc3N\n0O3iVo1OTU0xMzPD/Pw8uVyO9fV1Njc3D3rox47dJKRT1XuBe1s2KMNoISbqO6RygZGLR/fdLjMz\nM0xPT3Pp0iUWFhZCUd/a2jro4RuGEXFM1HeIn0rX5T53bpdKS31ubo7l5WUTdcMw9g0T9R3iu1w2\nNjbC3OiV7hdXd9SVrMvn8+Z+MQyj5Zio7wLfUr+cqC8uLpatMjVL3TCMVmOivkN8S71QKITVi1xz\nC49cNSO/kpGLSQeqcrf4WRcro1ws2sUwjEbZNk69Rpa79wf77xaR8yLy/aDd1PrhHjy1BH1lZaVs\ncZEfh+5Hyjja2trC8MVYLEY8Hg9TAbjcLn4RaUvcZRhGozRiqdfKcveN4LOPq+rHWze8w4fL8+Lc\nLq4knZ8GwLlanKD7i4zgl6LuW+d+4Qs/T7qJuWEYO6GRwtO1sty5BEfHTm18Ua9lqbsFRr6o+01E\nykTdWeexWCxcaORb6uaCMQxjJ+woTYCX5c6lIX2fiDwlIp/1iwxEGTdJ6hePruV+2draqhJ2h/Oh\n13K/1LPUzWI3DKMRGhb1yix3wKeA61T1BkqW/LFwwzhLvdKvXi+vi2+hOyvdCbpLsdvd3R1mZHTi\nXulXNwzDaISGol9qZblT1UveIZ8B/rLe+cc1PakTceducYKeTqdJp9NkMhnS6TTd3d1cccUVnDhx\ngoGBATKZTFiHtJYr5jjTrPSkhhFVGg1prMpyJyIjgb8d4C3Aj+udfFzTk/pJupxbJR6Ph0Wl/TY4\nOMjQ0BCDg4OhqLe3t9cMcTzONCs9qWFElW1F/TJZ7t4mIjcAReAs8O4WjvNI4rta3ERoMpmkt7eX\noaEhhoaGGB4eZnh4mL6+Pnp6eshms2WWuhN0s9QNw2iERqJf6mW5+5vmDyda+JZ6PB6nq6uLVCpF\nT08Pg4ODXHnllYyOjnLllVeSyWTCHOouj7pfYNowDKMRbEVpC6kU9UQiQXd3d2ipnzhxgquvvppr\nrrmGVCoV+tCdq8asc8MwdoqJ+g5xk55+geh0Ok1PTw9ra2thubq2tjZUlUQiETZ37MjICIODgwwM\nDNDb20smkyGRSJS5Wdy2YRjGTjBR3yFtbW2hoHd3d4fhisVikY6ODhKJBJlMhv7+flQ1dLu4WPRU\nKsXIyAhDQ0Nks1kSiURZTDpgLhfDMHaNifoOaWtro7Ozk0QigaqG7zs7O0kmk2SzWQYGBlhcXERV\ny8rRuYdBX18fvb299PT0hKLurHITc8Mw9oKJ+g5xIu4EPRaLha6VTCbD2tpamA8GCH3kLjwxFouR\nSqXClkwmwwlRMCvdMIy9sa+iPj4+vm8Lj1rVlxN1J9DFYpHx8XFe+cpXluVOdwUxfD+5W0TkLHc/\nK2OjYh6F7/Cw9GcYUcREfYfUWrb/xBNP8OY3v7npfdUiCt/hYenPMKKIhVcYhmFECBN1wzCMCCF+\nStiWdCDS2g6MY4+qXnYyQkQ+B7wZmFLVFwf7eoE/B66mlObiNlVdDD67C7iDUoGYD6rqI3Wuq43+\n/xkfH+fWW+9hcXG86rNEYpif//yHDA8PN3Qt43ggItv+bdei5T713QzKMJrMfcCfAV/w9t0JPKqq\nHxORDwF3AXeKyAuB24AXAKPAoyLynIbV2zAOGHO/GJFHVb8FzFfsvgW4P9i+H7g12L4ZeEBVN1X1\nLHAGuHE/xmkYzcBE3TiuDKnqFIQlG4eC/VcC57zjLvDL8o2GceixxUeGUWJX7pXjWgDGaD5NKwBT\nWRi5FQ24Cfgp8I/Ah1rc11ngB8CTwOMtuP7ngCngh96+XuAR4Bng60C2hX3dDZwHvh+0m5rU1yjw\nGPAT4EfAB1p1bzX6en8r7y249tUV3+PTwHCwPQI8HWzf6f+NUkox/Yo619RG+eY3v6nZ7GsUtKol\nEkM6OTnZ8LWM40Hw97Xjv/WWu19EpA34JPBbwIuAt4rI81vYZREYU9WXqmorfKH3UboXHzfp9jxK\nYnVXC/sC+Liqvixozcprvwn8gaq+CHgl8N7g36kV91bZ1/u8v4lW3BuABM3xMPDOYPsdwEPe/ttF\nJCYi1wLX
A483cRyG0VL2w6d+I3BGVZ9V1Q3gAUqTVK1CaOF96c4m3VrRF5SLU1NQ1UlVfSrYzlGy\nZEdpwb3V6cv5rZt+byLyZeD/As8VkV+IyLuAjwC/KSLPAK8L3qOqp4EHgdPAXwPvCawmwzgS7IdP\nvXLi6TytjSZQ4BsisgV8WlU/08K+HGWTbiIytN0Je+R9IvJ24AngX2sQX90sROQa4AbgO5RcFC27\nN6+v7wKvpgX3pqpvq/PR6+scfy9w7177NYyDIIrRL69S1ZcBb6LkQnj1AYyhlZbdp4DrVPUGYBL4\neDMvLiLdwFcpLbrJUX0vTbu3Gn219N4M4ziwH6J+AbjKez8a7GsJqjoRvF4Cvsb+xBhPicgwgIiM\nANOt6khVL3nugM8AL2/WtUWkg5LIflFVnY+5JfdWq69W3pthHBf2Q9S/B1wvIleLSAy4ndJkVNMR\nkWRg/SEiKeANwI9b0RWNTbo1va9AWB1vobn393ngtKr+ibevVfdW1VeL780wjgX7kSZgS0TeRyks\nrg34nKo+3aLuhoGvBflmOoAvaZ28HbslmHQbA/pF5BeUwvA+AnxFRO4AnqW0zLxVfb1WRG6gFOVz\nFnh3k/p6FfA7wI9E5ElKbpYPAx8FHmzmvV2mr7e14t4M4zjR8oRehhFVLKGX0Up2m9ArihOlhmEY\nxxYTdcMwjAhhom4YhhEhTNQNwzAihIm6YRhGhDBRNwzDiBAm6oZhGBHCRN0wDCNCmKgbhmFECBN1\nwzCMCGGibhiGESFM1A3DMJrEyMg1iEhVGxm5Zt/GsB+VjwzDMI4FU1PPUquOzNRU06s01sUsdcMw\njAhhom4YhhEhTNQNwzAihIm6YRhGhDBRN4waiMhNIvJTEflHEfnQQY/HMBrFRN0wKhCRNuCTwG8B\nLwLeKiLPP4ixjI+PR6KP/eonKn3sBRN1w6jmRuCMqj6rqhvAA8AtBzGQ3QjITmOlj4Oo1/tOLve9\n1DvnzW++tXU30ARM1A2jmiuBc97788G+lvErv/LymgLyR3/0xzu+1i9jpcvb1NRk0/o4atT7Tkrf\ny7M7OmdlJVf3AXEYsMVHhrEPdHZ2sr7+IzKZf1712erqIpcu5am1aGVlpaOmWLS1JSkWV3c4ip31\nATA8fDWTk2er9o+MXFNTDC8/rg5OnTrV8DmXu1b9z2r3cXniOxTkLWp9jyUOXthN1A2jmgvAVd77\n0WBfFTu1zvL5v7rMp41fa3tBr3etnY13aurZHd3j5ce1uaNzLnet+p/V7qPEbgR3N99j7c/2y5IX\n1XpPHMM4nohIO/AM8DpgAngceKuqPn2gAzOMBjBL3TAqUNUtEXkf8AileafPmaAbRwWz1A3DMCKE\nRb8YRgOISK+IPCIiz4jI10UkW+e4rIh8RUSeFpGfiMgrWtFPcGybiHxfRB5udh8iMioijwX38CMR\n+UCD19520ZaI/KmInBGRp0Tkhp2MvdF+RORtIvKDoH1LRH6l2X14x71cRDZE5C2t6ENExkTkSRH5\nsYh8c9uLqqo1a9a2acBHgX8bbH8I+Eid4/4r8K5guwPItKKf4PN/Bfw34OFm9wGMADcE292U5hie\nv81124B/Aq4GOoGnKs8B3gj8r2D7FcB3dvFv0Ug/vwZkg+2bdtpPI314x/0t8FfAW1pwH1ngJ8CV\nwfuB7a5rlrphNMYtwP3B9v1A1QoUEckAv66q9wGo6qaqLjW7n6CvUeBNwGd3eP2G+lDVSVV9KtjO\nAU+zfax+I4u2bgG+EFz3u0BWRIZ3OP5t+1HV76jqYvD2Ow2Mfcd9BLwf+CowvcPrN9rH24C/UNUL\nAKo6s91FTdQNozGGVHUKSoIHDNU45lpgRkTuC9winxaRRAv6AfgE8IfUD5huRh8AiMg1wA3Ad7e5\nbiOLtiqPuVDjmO3Y6eKwfwn872b3ISIngFtV9T+xu3jJRu7juUCfiHxTRL4nIm/f7qIW/WIYASLy\nDcC3GoWSaP67GofXEtMO4GXAe1X1CRH5Y+BO4O5m9iMivw1MqepTIjJGDUFpwr2463RTskQ/GFjs\nRwoReS3wLuDVLbj8H1NyX4XdtaAP9zf1G0AK+LaIfFtV/+lyJxiGAajqb9b7TESmRGRYVadEZITa\nP7fPA+dU9Yng/Vcp/0/frH5eBdwsIm8CEkBaRL6gqr/bxD4QkY7gHr6oqg/Vu55HI4u2LgAntzmm\nGf0gIi8GPg3cpKrzLejjV4EHpLSqaAB4o4hsqGqjE9eN9HEemFHVdWBdRP4eeAklX3xNzP1iGI3x\nMPDOYPsdQJXIBS6NcyLy3GDX64DTLejnw6p6lapeB9wOPOYLejP6CPg8cFpV/6TB634PuF5ErhaR\nWDC2SoF7GPhdABH5NWDBuYJ2wLb9iMhVwF8Ab1fVn+3w+g31oarXBe1aSg+/9+xA0Bvqg9K/zatF\npF1EkpQmly+/ZmKnM8/WrB3HBvQBj1KKAnkE6An2XwH8lXfcS4L/rE8B/4MgAqPZ/XjHv4adR79s\n2welXwNbwX08CXyfksW73bVvCq57Brgz2Pdu4Pe9Yz5JydL8AfCyXf57XLYf4DPAbDDuJ4HHm91H\nxbGfZ4fRLzv4vv4NpQiYHwLv3+6atvjIMAwjQpj7xTAMI0KYqBuGYUQIE3XDMIwIYaJuGIYRIUzU\nDcMwIoSJumEYRoQwUTcMw4gQJuqGYRgR4v8DDPSR5usfYD4AAAAASUVORK5CYII=\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f22841b4550>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Let's convert the uint8 image to 32 bit floats and rescale \n",
+ "# the values to be centered around 0, between [-0.5, 0.5]. \n",
+ "# \n",
+ "# We again plot the image and histogram to check that we \n",
+ "# haven't mangled the data.\n",
+ "scaled = image.astype(numpy.float32)\n",
+ "scaled = (scaled - (255 / 2.0)) / 255\n",
+ "_, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "ax1.imshow(scaled.reshape(28, 28), cmap=plt.cm.Greys);\n",
+ "ax2.hist(scaled, bins=20, range=[-0.5, 0.5]);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "PlqlwkX-O0Hd"
+ },
+ "source": [
+ "Great -- we've retained the correct image data while properly rescaling to the range [-0.5, 0.5].\n",
+ "\n",
+ "## Reading the labels\n",
+ "\n",
+    "Let's next unpack the test label data. The format here is similar: a magic number, followed by a count, followed by the labels as `uint8` values. In more detail:\n",
+ "\n",
+ " [offset] [type] [value] [description] \n",
+ " 0000 32 bit integer 0x00000801(2049) magic number (MSB first) \n",
+ " 0004 32 bit integer 10000 number of items \n",
+ " 0008 unsigned byte ?? label \n",
+ " 0009 unsigned byte ?? label \n",
+ " ........ \n",
+ " xxxx unsigned byte ?? label\n",
+ "\n",
+    "As with the image data, let's read the first test set value to sanity-check our input path. We'll expect a 7."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 90,
+ "status": "ok",
+ "timestamp": 1446749126903,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "d8zv9yZzQOnV",
+ "outputId": "ad203b2c-f095-4035-e0cd-7869c078da3d"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "magic number 2049\n",
+ "label count 10000\n",
+ "First label: 7\n"
+ ]
+ }
+ ],
+ "source": [
+ "with gzip.open(test_labels_filename) as f:\n",
+ " # Print the header fields.\n",
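+    "  # The '>i' format string reads a big-endian (MSB first) 32-bit integer.\n",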
+ " for field in ['magic number', 'label count']:\n",
+ " print field, struct.unpack('>i', f.read(4))[0]\n",
+ "\n",
+ " print 'First label:', struct.unpack('B', f.read(1))[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "zAGrQSXCQtIm"
+ },
+ "source": [
+ "Indeed, the first label of the test set is 7.\n",
+ "\n",
+ "## Forming the training, testing, and validation data sets\n",
+ "\n",
+ "Now that we understand how to read a single element, we can read a much larger set that we'll use for training, testing, and validation.\n",
+ "\n",
+ "### Image data\n",
+ "\n",
+    "The code below is a generalization of our prototyping above that reads the entire training and test data sets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 2
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 734,
+ "status": "ok",
+ "timestamp": 1446749128718,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "ofFZ5oJeRMDA",
+ "outputId": "ff2de90b-aed9-4ce5-db8c-9123496186b1"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting /tmp/mnist-data/train-images-idx3-ubyte.gz\n",
+ "Extracting /tmp/mnist-data/t10k-images-idx3-ubyte.gz\n"
+ ]
+ }
+ ],
+ "source": [
+ "IMAGE_SIZE = 28\n",
+ "PIXEL_DEPTH = 255\n",
+ "\n",
+ "def extract_data(filename, num_images):\n",
+ " \"\"\"Extract the images into a 4D tensor [image index, y, x, channels].\n",
+ " \n",
+ " For MNIST data, the number of channels is always 1.\n",
+ "\n",
+ " Values are rescaled from [0, 255] down to [-0.5, 0.5].\n",
+ " \"\"\"\n",
+ " print 'Extracting', filename\n",
+ " with gzip.open(filename) as bytestream:\n",
+ " # Skip the magic number and dimensions; we know these values.\n",
+ " bytestream.read(16)\n",
+ " \n",
+ " buf = bytestream.read(IMAGE_SIZE * IMAGE_SIZE * num_images)\n",
+ " data = numpy.frombuffer(buf, dtype=numpy.uint8).astype(numpy.float32)\n",
+ " data = (data - (PIXEL_DEPTH / 2.0)) / PIXEL_DEPTH\n",
+ " data = data.reshape(num_images, IMAGE_SIZE, IMAGE_SIZE, 1)\n",
+ " return data\n",
+ "\n",
+ "train_data = extract_data(train_data_filename, 60000)\n",
+ "test_data = extract_data(test_data_filename, 10000)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "0x4rwXxUR96O"
+ },
+ "source": [
+    "A crucial difference here is how we `reshape` the array of pixel values. Instead of one image that's 28x28, we now have a set of 60,000 images, each one being 28x28. We also include a channels dimension, which is 1 for the grayscale images we have here.\n",
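+    "\n",
+    "As a minimal sketch (using the `train_data` array we just built), indexing into the 4D tensor looks like this:\n",
+    "\n",
+    "    train_data.shape          # (60000, 28, 28, 1)\n",
+    "    train_data[0, :, :, 0]    # the first image as a 28x28 array\n",
+    "    train_data[0, 14, 14, 0]  # a single pixel near the center of that image\n",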
+ "\n",
+ "Let's make sure we've got the reshaping parameters right by inspecting the dimensions and the first two images. (Again, mangled input is a very common source of errors.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ },
+ {
+ "item_id": 2
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 400,
+ "status": "ok",
+ "timestamp": 1446749129657,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "0AwSo8mlSja_",
+ "outputId": "11490c39-7c67-4fe5-982c-ca8278294d96"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Training data shape (60000, 28, 28, 1)\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAC2CAYAAAASj9x6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztfVtsa9t13VikSPEhviVKR/f63rjIh31RBEaB+scBoqJB\nYRQBXOTDCFIUdhIE+WjaAAlQJ/655wb5SPJhwA2Qj7hOYBcNkjZAaidAUidohcIF0joPN07jPK7t\n+zpHD0p8k+JTqx9HY525Fxd1KImkuKk1gIW9RUncpDT34FzzMabSWsPDw8PDIxyI3PcL8PDw8PCY\nHZ60PTw8PEIET9oeHh4eIYInbQ8PD48QwZO2h4eHR4jgSdvDw8MjRLgTaSulPqqU+hul1N8ppT41\nrxfl4XHf8LbtsapQt63TVkpFAPwdgH8K4CmArwH4Ia3138zv5Xl4LB/etj1WGRt3+N0PA/h7rfXb\nAKCU+i0AHwMQMGyllO/e8VgotNZqzk/pbdtjJeCy7buER14C8K74+r2rx1wXhtYar7/+ujlf9PLX\nCte1bnu9BeFGtu3/J/5ai7jWNNzF054Zjx8/BgAcHh7i8PAQBwcHy7isxxqCNrQqePz4MQ4PD/H4\n8WMcHBx42/a4NWa17buQ9hMAr4ivX756bAIkbRq2h8dtYRPjG2+8sYjL3Mi2uTw87oJZbfsu4ZGv\nAfhupdSrSqk4gB8C8OUXvahlwV8rXNe6j+tdgxvZtv+f+Gst81q3rh4BnpVFAfgsnpH/57XWv+j4\nGX2Xa3h4XAelFPT8E5Hetj3uHdNs+06kPeOFvWF7LAyLIu0Zr+1t22NhmGbbviPSw8PDI0TwpO3h\n4eERInjS9vDw8AgRPGl7eHh4hAietD08PDxCBE/aHh4eHiGCJ20PDw+PEMGTtoeHh0eI4Enbw8PD\nI0TwpO3h4eERInjS9vDw8AgRlqKn7eHh4SEhNVvk+eXlpRkC8KJzpRSUUohEIhPn9mOEUsp5HiZ4\n0vbw8LgXuAh5OBzOvDY2NhCLxWZaLmIPKzxpe3h43Au01hiPx7i8vMTl5SXG4zH6/T4uLi5wcXGB\nXq9nzl2PbW5uIplMIplMIpFImHP7MQCIRCKIRqPG6w4zcXvS9vDwuBdIsh6PxxiNRri4uEC73Ua7\n3Uar1TJH13kqlcLW1hYymQwymYw5l0cAiEaj2NjYgNYa0Wg0tGRNeNL28PBYOmRoZDweYzgcYjQa\nodfrod1uo16vo16vo1arBY7yPJPJIJ/Po1AoIJ/PB85HoxEAYGNjA5ubmyZurpTC5eVlIM4dNnjS\n9vDwuBeQtEejEUajEYbDofG06/U6zs7OAqtSqQS+zufz2N7eNmtnZwf9fn+CsFOpFACYeHbYB1d4\n0l4gZjGO2xiQ3FJyXV5eOp/TlXmPRCLY2NgwiRyeS4+HS25d5fUkuN3kNlSuWCyGaDQ6scLs6Xjc\nDLQ9LgAYjUbodDrodrvodDpm0ZOuVquBY61WQ7PZRKfTQa/Xw3A4RK/XQ7fbRavVwsbGhvGih8Mh\n+v0+ut0u2u02Go0Gtra2kE6nA8doNHrPf5nb4U6krZR6C0ADwCWAodb6w/N4UesOGq59nBXD4RCD\nwSCwRqNR4Pl4Lkmd59FoNJCwSaVSiEQiJgnEZE+v10Ov10O/3w+swWAAACYbz/NYLDaRDEomk4jH\n49jc3DQrDITtbXt+kA4D12AwQLvdNoQsibnZbKLRaJhzLhI2bX08HqPX66HT6RgPmiGWTqeDZrOJ\nWq2GbDaLUqmEYrGIUqkEpRQSiQTi8fh9/2luhbt62pcADrTWtXm8mIcAF7HelLRHo1Egy35xcYF+\nvz/h0dCI7bWxsYFsNotsNovxeIxIJIJ4PI5+v49Op2OSPa1Wy3hA9Ii63S663a4hbEncyWQSmUwG\n2Ww2cEylUkin09BaIxKJIBaLzfePuhh4254j6DTwOBgM0Gq1cH5+jtPTU5ycnODk5CRgc3LR7mT4\ng8/T6XQChE3vu16vI51OI51Oo91uYzAYIBKJIJFIIJfL3fNf5Pa4K2kr+K7KmWETtr1lnBUk7U6n\nYzLqvV4vEArhkV45a1sHgwFisZjxWJRS2NzcDNwAjUbDbEvr9fqEx9NsNicaGJRSSKfTxqPh4rXZ\nDBGLxcISU/S2PSdIL5thtn6/j3a7jWq1iqOjI7z77rt49913jTctF3d3clcJwDwP7Zzx8GazaUr+\nuAaDgfGwc7ncRIgvTLgraWsAf6SUGgP4Na315+bwmtYS1xH2TYlbehTNZhP1eh3dbjew/eSNQoOX\n4Y14PI7xeGwIe2try/ws44pnZ2c4PT3F2dmZiS1y1Wq1ic6zSCSCbDaL3d1d7O7uotvtGrKWHnYy\nmQzE31cY3rbnBNoAw3N0HuhpHx0d4e2338a3vvUtdLtdk5TkkiE+LgAmTDIcDhGJREwttqu5BoAh\n7HK5bIg/jLgraX9Ea32klNrBMwP/ptb6q/N4YcuEizCvI1E7tHEdCbvOZXyZ5zchbW795Op0OhOE\nzdihjEX3+31sbm4iFoshHo8jkUgglUohlUqh0WigVqvh/PwclUoFJycnqFQqAcKWpM2EIs+HwyHi\n8TiSySS2trZMwojJTb62kGAtbHuZcLWmy7AF7bDX66HRaASqQhgi6fV6EzkYl80wJGcnxpVSxi5l\n0rtYLGJ3dxftdtvsMum42M/pwirVdt+JtLXWR1fHilLqdwF8GMCEYT9+/NicHxwc4ODg4C6XXQhm\njTXLT3s7TmdXW8iqDUnYDBnILd9NyEzGnRkeubi4cOozyHIqng8GA9TrdUPeTOKQqE9PT015Va1W\nM89Pz9lVJbKxsTHR6MBMfSqVQiKRMJUkd7kBDg8PcXh4eOvfnxXrZNvLhCtExxizXLVaDe+99x5O\nTk7MTpH3j3Q+JOwciotw7Z/h8/CelGFCxrjtHeN9YVbbVrf1fJRSKQARrXVbKZUG8BUAb2itv2L9\nnF5178omuuuIW5KzTO7RCKRB0EuQRjwejydac0m4s4KhkYuLC5OgIfG73oftgScSiUB9KxdDIiTt\n09NTc0PJa3W7XcTj8YlVKBSwv7+P/f19PHr0CPv7+9jb2zNJT7nmlYxUSkFrPVc3aJ1se5kgOdrO\nTLPZRLVaxfn5uTmen5/j+PjYrJOTExwfH2M4HE7sFoHnZGznUfg9eXThAx/4AF577TV88IMfxGuv\nvYbXXnsN5XI5sGOUpaj2c92Hpz3Ntu/iae8C+F2llL56nv9kG3WYYBMejcXe8smYnDzayRNuwWzP\nfDQaBVp0uW6SGGEdKsMdMtkn3wuTf7aBD4dD42EDMO+JYRHZxNBqtQIfSPS0ZXglkUiY2LhrpVIp\n49WHpDZ2rWx7mbDDf6PRCN1u1+RJSNKnp6cm3NZoNAKetivPY3vDUjtkmnft2nHS26ZNs72ddd68\nJ3jvrCJuTdpa6+8A+NAcX8u9wQ5hy
GYVm7TpVcvEHj1fuyyOSRTpeQyHQ9TrdTQaDTQaDXN+k8SI\nbHKRW0pXTFE2tGxsbJjYM0MirBrp9XrGCzo7OzNHO8F5eXlpnmtzc9OI8lAHwrWYyZ9HeGQZWCfb\nXjbshCP1RBqNBiqVCp48eYJ3330Xx8fHAadFkrZ8LkJ62nLZpaf8HVcYUyY3ZViSBM17gva5qsTt\nOyKvYIcSJAnKT33+w+lNM7zhErRhiESuwWAwsU2sVqsYDocLeV/0iHmMx+PmWjRkNiPIKhG+rl6v\nF0g4ygy9TGSm0+lALFsuNtXE4/FQkLbH7SDvIRk27Ha7hrSfPn2Kt956C0+ePJnI68gGMQnpUUsb\ntElbtqjLowxlymvyPuDvche4qmRNPDjSdlV5sMbTloCc9rN2hyBJT3ranU7HxOektz0cDtFoNALJ\nvZtWj9wEGxsbSCQSpsmAixUj8pxkzJ/P5XImWSMz8dFoFKlUynjYPC8UCiiXyyiVSsjlckin04as\n6WWHoRvS4zmm2aXr3hiPxxOt6d1uF0dHRzg6OsLZ2ZkJhZCk7ZCICyRVO/FNu5KOCROfMgfDvoZm\ns4lKpYKtrS3EYjFTzy1tmPcBr8HzVSLxB0na9taJ2zeGKthQ4sqE03uw49qy7XuWmLbUUFgkYrEY\n0um0UUHL5/PIZDImFi11h3mkelq73cZoNHJqh8jGBa5sNotCoYBisRggbeqb0MtepRvAY3ZIUrXt\nmqG/Vqtl7iHeTycnJxOkzdCJK7Rng2V8cofH0lKbcC8vL831I5GICc+QtE9PT7GxsYHRaIRarWY6\nd7kymUxAcmFzc3Pl8jAPlrRlnTTrRlkxweoJV92zq3JEVo/IJX9fXpdhFbbkLsrLZgdiKpVCLpcz\nSmj5fN7ogchjKpVCJpMJaI8whm3HxeXv8TyVSgWMnwlI+Xve0w4nXGEHu9SVeiLVatUktCuVSiBP\n0mw2AzvMaeV9Nuhp09YSiUSAcHmutcbp6SkikQjG4zG63S4AoN/vo9FoIBqNmh1BtVo1TkaxWDSO\nFj8ItH7eFLZKxP3gSZufxPV6Haenp3jvvffw7rvv4unTpxNTNeTRfswuBbQ9CEncMiGyKNKmN0vS\nzufz2NnZwf7+PorFYkDhj0e7E43VInKbyKP8XZ7LRh16PhSIspNHHuHDNMLm6vV6aLVaqFarODk5\nwZMnT3B0dBQQgpKk7Srtc0GGR2KxWIC0SbqFQgGlUsmoWLK0tl6vQyllPG2qC56fn6NQKGBnZ8c0\n3DAJynuShL1qZZ0PlrTpNTMRx3jXe++9h29/+9t46623JqRPJRG7yuum1XrLawPBreV1sbzbQpKi\n7Wnv7++jVCo5u8ZciVil1ER8j+f2kltXrlgsNlGa5RFe2LtG+UHvIu23337bELVM3LtKVK+DTdrs\nuqWmdrlcRrlcNiFMlhlSya/f7xsPu1arIRqNIpfLod1um+/R1nm9VZVdeHCkTdgJFMalKcB+fn7u\n9J4X+anrKmECJgv7p31IuJoNZCIyl8uhWCxie3vbObl62mtykbbtPXO5vHKPcMOlqe6SR2i32ya8\nyBBjpVJBp9MJiJYxNHJT2CTP8B/tO5vNAoCpYuKOj4JoLASQFS78AJCjy2S4b5Hhy9viQd5RssSH\n239buJ8dezYhLgouwpOJO7lccquyc4xHkjZL85LJpBGAdz2v6z3KJJBcrmYHVymWR7ihtTakLJPs\nrM6g58xW9ePjYxwdHZn4db/fv1HScRpIuP1+39RjZzIZkxviB4EsSaU3nsvlAuFILkne3CnI55P3\n1SrhwZG2XaRvk5Ekb+nNLou0ZU01uxZtgmSik56O/GBxkbbcUrLkbxbC5vPY3rTrw8RF4J601wPD\n4dCQst3NS/1r9ifIGv9Wq+Uk7ZuCu+HhcGgIm0nGXq9nvHiGOKgPT3vP5XIBwSoAZtfMHQTvJ4Zx\nWJK4iiJnD5K0Gb9lRlgStSRuxu0kKS3qHyhJm15xPB6f8GA5YYaDCBjD4+u0l6z0kNtA/i1m0W2Y\nJqozzVv3hL0+YG8CG2Q4YYZlsezsZZLRlmdgWaurae0mr8EObfA1kYxJ2ryHGTbZ2tpCNps1IT3g\n+fAE6WlL0mbIZxkh0dvgwZE2MDngc5pqHUdzLYOE6BVLjziRSEzMVYxGo7i4uAh4HP1+3xkPl562\nDI/Q05bXftFrm5ZMdMXRZ/kg8AgHGB6hdjsnzZyfn0/McGw0GhNlr7J5bFqCfpbXID1jyjBIT1sS\nrPS0GR6xCVsm3mWXsw+PrCCkN8ivXfHsWCxmPG2buKeR0V0+kekdMAHCRIqd1OPPATAGJ6fQuEh7\nmqft4WHDVfEkPW2q85G4pU5NvV53Pue0XMlN4CoLlKEMmdzk7pKkzZAIZZF7vV7A6WFMm3F7etp3\nicEvEg+OtAEjeRio/0wmk8jlciiVSkbZzm5rv7i4MMkOO3braq6Z9s92VV2w5lSuTCbjTALKMWOt\nVsuUVNl14+PxGPl83og2SWU/Dw8b02QbKIjGYbkMjbTbbVxcXBiCmwZXMtsljRCNRp0dx9NCf7K0\nlN2LrCSRr4nXIGlfXFxMvQ9WjaBdeJCkDQQ/6Una2WwW29vbJrnnUu67vLyciIFHIhHTms5Ynq1Y\nJsEaUPk82WwWxWIROzs7ZuVyuQmtX4ZHXINPXWVYbFuXMXIPDxdcEg/0TLvdrpm/yPZ0xqzZ/TsN\ntHdbvMx1pI4Pl51Xkou/Jwk7mUwGiJ47U+5KLy4uzIxTSdAusl5VAn9wpG3HY7XWgQYUFtpvbGwE\nEipcWuuJBpJIJGJieyzJ63a7TtKmITFswUXSLpfLePToEfb29lAsFieSkExEyl0AS6/syemdTsd7\n2h43gt0xzLCBHJpL0qbdvajumrtZW6tGDt/lebvdRq1WM4UAdEBkmS7XNE9betjM55CwW62WuWft\nGmy7VX9V8eBIG5iMp0lPm+GPZDJpsuEyDAHAGAdXNBo17dok7OvIkUYsn8cm7fe9733Y2dlxltHZ\n8TdWkzCLL49SIMp72h7XwdWefp2nTRuc1dNmvsZWnOTa2toy3YrUBGq1WgCCvRUy/yQJm3kbWTXF\ncAkJu1qtIh6PBzhg1UnaxoMkbWCya5AKYVSwy2QygXImzjkEMCFLSulG1o5SYWzadV161DI8sr+/\nj1deeQW7u7uBEjrZXGMLune73YmRThsbG97T9pgZrhb1aaTdbDZNhcWsnjZJm3NEXWPokskktNbo\n9/totVqmLNcmbZdkAj126WGnUikz+b1WqyGVSoXeeXlwpO3KWst/Mr+Ox+MTxsHvk7Cl/i5j3vTI\nKVRjT88AYAyYrbeZTMaI3hQKBeTzeeRyOWSzWWcCxhbq4YeF3TC0sbExIZO6SmplHvcDl2IfgEBX\nIFe73UalUkG1WkW9Xker1TKldlKXh88hd4Q8Z4MLpYELhQJyuVxgEDQX9UGYI+p0OlBKTShKxu
Nx\n7OzsGPvmsA0SMsOP/GCh47K5uWmcLFk9IidQseuTiVCZn5pW+rpMPDjSdsFOWLCyRAomsYQIQCA0\nQiLM5/OBqTXdbhebm5uBahImOBlDp3dNtTF6xYlEwhiWq2tReh0EpVXZgMBa1Ww2i3w+b0g7zB6G\nx/xga9dIiWLZNMOp6cfHx6jVama4h0s8TZbPytJZat6USiWzcrkcksnkhJ47c0tMrI/HYxPes2Pg\ne3t7eOmll7C9vW28dO4mpeMyHo9NeFDq5rCxhmTNey6bzZrXwOoY1obb5cL3gReStlLq8wB+AMCJ\n1vp7rh4rAPhtAK8CeAvAx7XWjQW+zoVCJgdJ1GzVtms+AUxsy5RSZmgAE4G9Xg+xWMx44AxrjMdj\nQ7Ayjr29vW3K/Fykbb9eaTzAs5uQimR8zfYYMO9pB/EQbHsaZBhEDgOh2qVrVatVtNvtQIejLA/k\n/WLnfDjRSFZGFQqFgMQvz7n7ZJw8Eong4uJiYspSOp1GqVTC9vY2tre3kcvlDDHzPZGwLy8vzffk\nnFKWNPb7fXQ6HQDParnpgHFQCUlb6sHfp/Mzi6f9GwB+BcAXxWM/C+CPtda/rJT6FICfu3oslCAB\nyk9pGhMJm94ygAljAxAgbHZV8XkY0uj3+7i8vAx42qVSCbu7u8aQpactDcMmblcrud0Nxikc9FA8\naU9g7W17GmyJYupPywG81MOm191oNAKetqvDUcaumWAslUrY2dnB3t4ednd3sbe3h0KhYH5H2jZr\nrEnY8Xgcg8HAhFI4d9QVF6enzUSm3EnQ/uUEJUpA9Ho9AM9FqYrFotkty45LisvdZ2gEmIG0tdZf\nVUq9aj38MQDfd3X+BQCHCLFh2541PW7GsaUsJYCJxgCtNTqdjmnCkWVKkrB5DSqQSdLmlpE11dfN\npePj8lOfKx6PByQ07RZ9Hx55jodg29Ngl/aRvCRpv/XWW3jnnXcCzWVSS8SOiUspBiYcaeM7OzvY\n3d3FSy+9hP39fRQKBadaJT1j6XyMx2OT4+GiZ00ypncvhd742kja0zxt4Hl7O7X16WnLtnYpynaf\nzs9tY9plrfUJAGitj5VS5Tm+pqVDdmwRdmeYNFIZniApZ7NZ42Xz01lOO2+32wHSppgN4330spnd\nto1CNgPc9yf9mmOtbHtaOZs9xMDWFzk+PsY777yD73znO85hIK7n5W5V6lvn83lTFUUvm9OTmPCT\n5avyeWRIkklMmdC0+xfskKEEk5QuT5uNN0xgsttTkjY7nHmN+9QkmVci8tpCx8ePH5vzg4MDHBwc\nzOmyy4PdlOOKM9vVGy5tadkaLAcusNKEBmuXUIWtlnRRODw8xOHh4TIvGXrbtr1OltTJSpFer4ez\nszPUajVTIcJGM6nVPk3hkXZLVb1cLhcYBWbPDJXJf1ta2C6JBZ4NNqCIGsOY9r11U0eG15R/G3aB\nssSxXq/j7OwMAEw8nffpvDGrbd+WtE+UUrta6xOl1B6A0+t+WBp2mOAiZnnu+trVuSWNS8bRaBTp\ndNr8zKqOOFoV2MT4xhtvzPsSa2fb9o6RnYZSdqHdbhvRp1ar5dQUkfYtF208Ho8b0mZpX6lUQqFQ\nQDabNbtIhulcyX+bsJlHImFK0nZVVt32bwPA6OZTn4S13WdnZ6apTYZu5o1ZbXtW0lZXi/gygE8C\n+CUAnwDwpVu8xtDAZRB2veY0g7a9AQABT7vRaJhYHCs+6OF4LAVrbds2WTO8wS5aWd53fn4eqMWe\nJtZEu7bVJzc3N02SUHra7Mp1edo2YXPAdCKRMHFuACZuLYeDSOXNu/6NCKlR0mq1jKfN3BUJO5VK\n3fm6t8UsJX+/CeAAQEkp9Q6A1wH8IoD/opT6UQBvA/j4Il/kfeI6o7A97WnhEWlcLk9bVqlsbW29\nUDXNYz54KLbt6nSkp01vkvKqDI8wlstktgxdkLBZQcUj7dflaUvSZpJdkjerquwyRN4HvAY9cdkp\nKY+3+dvY5/a8WEpVSMLmh8l9YJbqkR+e8q3vn/NrWWnMYhQu4ra3cYxpU+CJJYH0UvL5vPe0l4SH\nYtuuShHpaTMEYHvadB5k0l0Stq3QJytG6GmzKopJdnra0nOd1qFpN+7Y99I8/z6E7WlzF0zHin0P\n93l/+o7IKbiJUdBbkAI1rCah7jUbA3q9nonjDQYDU9SfTqedBf321tRXjXhMgytZTRKScr39fh/n\n5+fGu+aqVqtoNpuGsBnjZUWIXHZLOXeKbBRjezk9bLsjcdVsWf7tbHVDqmaywuW+J9p40p4TZBkf\nP4Wl4A4z9aPRyJT0jcdj9Ho9XF5eBsqMZOmgLRK/SobusdqQoTipUU09kUqlgtPTU1QqlYkkJOuS\nKaZGD5pH1kfLzuBkMmmqRYrFoml42dzcNOGNecWhFwm5M5HDGO46UX5e8KQ9B7CpgKQNIEDKLKvq\ndDomG876bhL61taW8bRlfajsuryuDtXDg7C9Rpk/4ZIeNhcJ2yZtOdVpe3sbpVLJhDpsPWtbACrM\npC3laamo6Ul7jUDSlufUc+AWq91um+4qe8lJ1tLTjsfjgaJ+D49ZQRU7ypwyds34tb1YMcIFPHM+\nSNo7Ozt49OgRHj16FFDVk0c7jMISPbuxZZVhe9r0tr2nvUZg3A+AKX1Kp9NGMpVbUsYLOdWGXlCn\n08HW1lagC0t2iMmSKNkQ4OFhw07iSU+7VquhUqng+PgY1Wp1YlGFUv6+PYpvf38fr776KrLZ7ARh\nM+xnL3s036rbr5209eGRNYUsCaLBj0Yjk4hkA8PFxQUAmAnXsvvKtWwjkeVXhKvsadZSRY/1g2ym\nsctLSdq1Wg31et0c6/U6RqPRxGQYDgShfPDe3h5efvll5HK5ifCI7G6Uy27CWXX7kx94DJF40l5D\nuAxRxgMZJwRgdK25de10OgGVtdPTUySTSYxGo8CwBTl0wZ4daXs2tpaKx/rDbqKR08dZk02NbOZP\nWA0hk46sdGJ52/ve9z7s7+9je3sb+Xw+0Jlox6pZ1npduZ7H3eBJewGgYUoJVpnYkYRNISkpQn96\neopoNIperxe4gXjO+KDd4OAaOOxvkocDux6bNkaRf0naDNPJEjaG9ThBqVgsolgsGknVnZ0d5HK5\nQJOMHfpg+E6GWTxpzxeetOcI2yClpw3ADDeVhC0nT9PTZuVJu902+sFbW1tm+Kmsk5WZe8pYAs+V\n0jweDuyuRzbR0NOmbEK9Xg8MhZYORTqdNkRNsi6VSmbyDEk7kUg4u39dHYYAFtYY8xDhSXvOoJfB\n5CQ1CnieTCYDhE3tbEnanOher9cnSqhYRiVHL3FMEwmbrb4eDws2abPqwRUesXWsZXikUChgb28P\nr776Kl5++eWJGu1UKhUQbZpGyDI8Io8ed4Mn7TlAbgf5NfDc0+bNkM1mTZVItVoNjBZjEw4rThqN\nBjY3N01LsBR/l2OXuKiFIIcU+0qThwUXaTM8QtJm2
zp/Xi6O+yJpv/LKK3j/+98fmM/Io1265+1s\nefCkPSe4jJY6DSzZY/s6s/GlUgnlchnNZhOdTicwbIETQoDnnW28Ce3kJIV6KKdJz0kmK13iVf6m\nWy/IIbXsDTg/Pw8kHmlXdkVHJBJBKpUyKn10EHK53ESuJKwTkOyKj+ucmlXWr/ekvWBIOUsAxpvJ\n5/Mol8umtb3ZbAba3Xu9nolJ2pOj7RFLFJuS+gicLkIxH7nk5A6Z8ffEHW6wjFROVD8+Psb5+Tka\njUaggknOQaVdcCeYTqcDc0Vtuwk7XHH3697XqhG4J+0FQxI2291TqZTpMBuNRohEIqhWq0bXGEBA\n3IeeNssDXQprtqcNPPP02UYswyWu+OM63IwPHZK0KQZ1fHyMs7MzNBoN0/EIBMNodAAYt2b4TQ6D\nlhUi64CbeN2rBk/aCwSNgCEJxg1TqRTy+bwhbJZabW5uAgD6/T6azaYJc3BogqsWm0dOsaaHzdAM\nJTFtsXn5QeKxHpCkfXZ2hqOjI5ycnODs7Mx048rcB1UpmReZ5mm7hnqEFbaXbb+XadUvqwRP2guG\njB/LDH0+n0ckEjEdZ9QtoU5EJBIxcWxCbuVsjWO2zdMQqf/LDi4S9ubmppGVlHW1HuHHNNKu1WoT\nnrY9hFeGjIl6AAAfkElEQVRWiNDTTiaTiMfjaycNPI24w3IveNJeIKaVOjFUEY1GA8qAzPI3m03T\nRCNrbmV5luw401ojlUoF4pPc8srBrLxRZWKUXZPUNZn2HjxWB/awAJ7LwQbVatXIrlJKgXmSafrv\n0tOWvQDriBd53PJDyh4beN/3hCftJUNO/4jH44ZQs9ksSqWSKfuLRCLIZrMmKSkXyZuETmKml16t\nVo0OBDvf+DyDwQCZTMYp9iMN8r4N0+N62GO5WOfP/zeFyrrdrqkYkeL9JO1kMmlIW048p/jTOuJF\n+RxW1fDesJOy9x3bn2VG5OcB/ACAE63191w99jqAH8fzSdWf1lr/4cJe5ZpBkja/zmQy2N7exuXl\npfHA8/m8EY6i6FSr1QpMIWHYhaTdbrdNYw3FqHjj8uaV7fGpVMqEU2xPYt2JO6y2LcWM5E5MkjZ1\n2fm/Z4LaRdqc65jNZg1ps3lm3WGTt3SquGNlfbok7fu8N2bxtH8DwK8A+KL1+Ge01p+Z/0tab3Br\nyu0ZDYQ3FOPc9LypxEY1to2NDSPfSsIeDocBT5ujzKjNzTJA3riDwSAwPHhjYyOQcJKvdc0RStuW\nIv1S/9ombJK2/H/bOQ+SNhu4JGmvo6c9zcOW5/S0Sdr0tFmpdd916rMM9v2qUupVx7fW/o5eFOym\nG6nFLQm72WyiUqkYvRE250idBzZK0IOWsyfj8biZgCNF3LmA4MAG+foeAGGH1ralp80dF9vV5UxD\nLumRXxcekaRNwbF1xYvCI/xQI2nbnvZKk/Y1+Eml1L8C8KcAfkZr3ZjTa1p7yG40AKYUkIRNz6jT\n6RitEY4cY/afA1t7vZ5pvuHNK7d5Mp4ptYDpbTERJadLS3nNh0DeDqy0bdueNlvVp8W0+T9n7BsI\nlvzR05Yx7XUMj9hhv2m2LcMjsnU/TOERF34VwM9rrbVS6hcAfAbAj0374cePH5vzg4MDHBwc3PKy\n4ceLGlrk1iwSiRj5TIZAAJh2Y7k6nc7ECDN6X+1220hp0kO3dZfH47GpOJHJyVW7cQ8PD3F4eLjI\nS4TCtvmhKv9/dpURP6j589zZKaXM/1i2rsvJ6WEIj7iqnbgLsRelIqSqIR0b2cq/sbGBTCaDXC6H\nfD5v5CY4qJiytIv428xq27ciba11RXz5OQC/d93PS8P2mA5pRAAQj8dN0wNLtdhu3Gw2zVxJHim/\nKbfFspOS3ro9SoliVVICljXlqwabGN944425Pn+YbNsmbtfUGEKGvOzGGtZpb21tmTDAfcdtbwMS\ntgwZ8chJ86xVZ++C1KLnkXriHGS8s7ODUqmEfD4fCFXOG7Pa9qxXVhBxPqXUntb6+OrLHwTwV7d6\nlR4GJGzZQHN5eWlCF/SOUqkUWq1WoJpESm5Wq1VEo1Ej78oZgUx+0gOnNy5F8ovFIvr9Pi4vL41C\n4QNAKG3bRdDTlmzCojfJWO000g5TeER+ONnTehgqqlaraDabRuqBSX/+LShxnEwmzQAIEna5XEax\nWDR/o0WR9qyYpeTvNwEcACgppd4B8DqAf6KU+hCASwBvAfiJBb7GBwO5feUNw5K8eDyOdDqNXC5n\nvGk5S7JeryOZTJpOym63C6WUEZkiYTNZRS9ExkJpzCRsmZxcR6yDbdvetr1klRIlDOhRkqQYHpET\n1sNE2gT/Dix15S6UcrT0tGnn3LkyTMQhI/l83gx9IGnn83lD7IsKj8yKWapHftjx8G8s4LU8aMjk\niNQs4bBgamZPK+2qVqsBwq7X61BKGU+bhB2JRNButwOEzQ8ASdi5XC6QnFxHhNm2Z/WygWDYza4/\nJlmxuUYO9g0DacvuUBkeYWcoy2VJ2tLTlsn4VCplWvnpaTM8Ui6Xkc1mA+PVVtrT9lgOXO2xMikp\nPSpbM7nb7SKRSBjdiXQ6bRp3GL+WGiYMgfBxxrQpXMVBxExkzZp191g+bkPc0tuWJW0cbiDFoVYB\n04SbXDsM5m9arRbq9TrOzs5wdnZmwiNSNMtVq854Nlc+nzex7FVpZfekHQJMK/yX5Xv9fn8iHimN\ny9ZakPMDudVjnJzNO71ez4RcbA0Gj/DBJTTmUvALy/9XjlRjjqbdbqNSqaBSqeD09NTor5C0KRPB\nHQXj+TLxKGdhMhSySn8TT9ohgG0w0vvm1zZpu8ZB8efpZTNhA8CURbGLkqTd7/cDcrCrZLwes8FW\n6ZtG2GEgbul8MF8jB4c0Gg1D2CcnJzg5OcHp6alxSCRpy+qsfD6P7e1t7O7umvI+yiWviodNeNJe\ncZBspbGQtGVHJb1ilmuRtCVZE9LTBmDCJCwblJ72YDAw8Ts+36rX73o8h50rsXdM9tgxm9xXCdN2\ni7RZDss+OzszpH18fIyTkxMzUIShQcoUM5Ytx//l83kzi5XCWaukyeNJOwSQxC0Nh4Q9Ho+dnrb9\n+wQNHnheItXr9Yw3Qk+bNa7y2p6wwwfbbq7ztleFmK6DTDpKOVrqiEtP+/j4GMfHxxN17CRthkfo\naZfLZVNJY4dHVuVv4kk7JLA9bVv/ejAYOMMjkrBlezqTjIPBwNysDI9IdTh64/TGpiWFPFYfLwqL\nrEri8TrI5Kr0tKkh7iLtp0+fBrTmZYs6wyOFQsGER1gKyfvJFdP2iUgP52ADehOyVVnW38rzer0e\nyJDTQ7Y9bMK+WRnjY1WBHAgsR5StirfhMRtY1iarRNi6zsk0Upb3vuG6D6izYq9qtWpmYZKw5RBj\nlrByfJpcqVQKxWIRL730Evb29lAqlQJt6i7nZ1XgSXtF4GpJZqzZXq4hCI1GAycnJ6hWq2i324a0\n+dwS9Ljs
ulOOmJLDXknenrTDCXusGBtppKLfKiWYeQ/Ie4EetT0MRJL2+fk5zs/PUa/X0el0jHAa\n56NSM1yOVSsUCqZ5Znt7G9ls1uiurIKa3zR40l4h2B61zI7LJUuceE7SrtVqpnnG9lgIGQ+XnrWs\n1yVx0xOjt70qN7fHbLBJm5USFIdatZZ1e4fJEJ5MNnK5SLvVapnehEgkYsb2sWGGx2KxaBKOPJK0\npSOzio6KJ+0VgTRWetCUXpWC9p1Ox8SaeeT0dulpM0vuikFLT5setYzjSdK2p3GvmgF7XA+2akvS\nLhQKK+tpAwh42LLDsdFooNFooF6vo9FoTBD2+fk5ut2ucUZisRiSySRisZjxqMvlcqA1XYZL7AYj\n72l7XAuXzCZbz9nhRTU/VnewI/Li4gKtVgvn5+fG05bhERdYdy01KGzCphj+KpeBeVwPfjjbpC09\n7VUibdt54W6SWiIs6ZOELYmbM1AzmYyJ42cyGezs7ODRo0fY3983S7amy0HXrrr2VYIn7QXCVR8N\nIFB+xHMZ7mDIg80CLGei+A3rqOXQ3na7bX5GJiJdkJ621KGQIRGGRVZRntXjZpAlfjLUdZ95Cle7\nPQd72LKq7XYb5+fnqFQqpi29UqkYTZFGo2GcGd5PsVgMqVTKhEN2d3exu7uLvb097O3t4dGjR8hk\nMkt/3/OAvyMXjGmGaS9XooUetJRhbbfbpoba/ll+T85/dMElICTn3/mkY/hgVwoRcsJRp9NBo9Ew\nSedsNoter4fRaLT0Uk6WndrLnrrDXaYUfuKiNrZSysgIR6NRbG9vmyU1sSmvmkgkVjLsMSs8aS8Q\nrooQJhfZccijbaj2UZ7TA5GeOXVEJGlPi2fbpM0widwqruK20CMI1//XfkySdrvdNqGQTCZj8iOc\n5LJssCpELu4mubuUO0wZImw2myZZGYlETPliIpEIxK13dnaws7NjxJ9I2mFuEvOkvWDYI71I2jIL\nLr1oetZSK4FkzCXL/uSSW0p2PErIbrdZPG2PcMAVhpNVQ8yNkLC11igUCsYBuC9PW3Y0cnGQB2PU\n1WoV9XrdOftSNsywVC+TyaBcLptwCFc6nTaLyfWwwpP2AiE9bZlYYXJRZsJ5lOetVmvCE+n3+4Em\nG7lc09YJu43ZJdPpG2nWCzIc1+v1oJQytsjcCD3tZQ+84K5TJhlbrVZAN4THWq1mbJ+OSb/fN3Xn\nUl61WCyiXC6buDVj2CxtlHYeVnjSXjBcVSH0tFm2RI+CXgaPzWZzoh6bW9nrNIZtuPSwbU9btvj6\n8r71AUmbhE1bkqR9nzFthvXk/XBycoInT57gyZMnePr0KarV6kQzGR0XDp9OJBIB/RAmHVkpIj3r\nsNu1J+0bwtUWLr1pSdBMMHIx1GGTc61WQ71eN3E8ZsI5OVp2QF6XYLQnS7OxwtZd4GQaNhTwSFlK\nlkt54g4/pG1yOHQkEjFEKTU7RqNRoLKEC5j84LeT6/aOUvYb2OPPKFImq6K4y2R1CLWv5cACSi3w\n+hwLxikzXBxawDj3ujWFzTIj8mUAXwSwi2dz8z6ntf73SqkCgN8G8CqezdL7uNa6scDXujJwaYS4\nKkLkcFEe6VHYoRDGtpl0lIQtPwymQXY5yo4uW3MhmUwGxkvJI42enWFh3kLOgnW3bUmwJFSbtBuN\nBqrVKk5PTzEcDs00Gx4pAWwrBcpaatnBa4cwuEO0pRf6/X4gh8Nz3hcMDQIwHZt20wvtVVaLFItF\n5HK5lRjAuyjM8o5GAH5aa/11pdQWgD9TSn0FwI8A+GOt9S8rpT4F4OcA/OwCX+tKwfY05KBcHunJ\n2EsmG6Ucql3yNxwOJ0SipkGStrzxbGJm4wHlJ+VQ12w2a1YymVx70saa2LZLgU4+JkmbIGnX63Wc\nn59ja2sLo9HIdAZSswPAhM52JBIx5MtFR8VOFnLHyConWe1kzzmVfQcsa9Vam+omKbsQi8UmCJsl\nfnKAwYMkba31MYDjq/O2UuqbAF4G8DEA33f1Y18AcIgVNux5wiZshkJYusdVr9cnQiEUtLFL+mTM\nWgpDye3nrKTNRplkMhnYQnJR4H1rayuQVefv8PfDnGGfBWG3bdeW336M9sNzLplXqVarSCaTGI/H\n5gNdaz0RbiNhSzEzWXoqS/ZkeR5J2NWHYFdH2cl0krZsAKONkqRlLXahUAjY84MkbQml1HcB+BCA\nPwGwq7U+AZ4Zv1KqPPdXt4KwCZuL8WuZCee2kzPrOKvOZcCSoG3Peppanw2SNidypNNp5PN5U7O6\nu7uLcrmMQqGAVCo1sabFMx8Cwm7b19XVu5TzZHiE+hy0Y9Y+06uVxE3bl/MZGQqhPojdBGM7KXLI\nhl0VQudDtpXbNk1nw+Vp53K5QHfvOtrwzKR9tX38HQA/deWV2AwylVEeP35szg8ODnBwcHCzV7lg\nvKgSQx6lLog0XOld8JyttzZpy3pqOR3GBZcOgku8Ph6Pm3AHDXtraytQs8rzQqEQ8Fh4DEOy5vDw\nEIeHh3N9znW2bSBov/wfS5IlMVOcScagh8OhM57sUqCUddaStBn+k8TN8J8dYrElFORIMBney2az\nKBaLJo6dz+fN98Na2jerbatZSn2UUhsAfh/AH2itP3v12DcBHGitT5RSewD+h9b6g47f1ffRbXUT\nuMh5Wnacw3ClxyC9a5lcYXiE1SHUSXAlLaf9jWwxG9kMI3VCmGCUnsjW1paZfSfDI9ls1ij4yaRT\nGEjbxlVS7NYvPIy2PRwOnd7rm2++ada3vvUtvPnmm6jX63yt5vcjkQjy+TwKhYLpFCwUCoFqIp5n\nMplA7f51pH1xceHM35Ck5Q7THtLBcxkG4Uqn087XZkurMuwnK6ZI/utk27N62r8O4K9p1Ff4MoBP\nAvglAJ8A8KW7vsj7hE3Udn01j3IenSxVsrV+qZkgiZyxa5lJvy7ByK2iPVGG4Yx0Om2O0xa9Ei6W\nQq260PsSsfa2DQS9bNklyRFyTCJyt1ir1cyHv03a0WjU5HDs5fowkSEQLlleKJ+XOz+ZEN3a2kI+\nnw+QNMv65O5Slvitc4PYLCV/HwHwLwF8Qyn1F3i2Vfw0nhn0f1ZK/SiAtwF8fJEvdBlwJRhlQT8F\nbWQ9KWUhbZ0QKvFJ74JxPPkhcNPkYiKRCHgcXJlMJkDiPMobQM69k577QyXth2TbQDAMSNKWBN5u\nt429cLGKyCbYaQM6XB28LtkFJjoZ2uNuTzoktONMJmMGGMjlkhKWjsiDJW2t9f8CMC0w9P3zfTn3\nh2ketoxfj0YjdDod1Ot1VCoVHB0d4ejoCCcnJ4FQCZcdH5QGK8Mu10GSNo2ZIQ+ZOc/n887kIuOD\nsrlGTuQIy0DXReCh2LYNEjXwnLxbrZYhTltPXYbnrvO0qWNiOzouyQUOZyBp8wNCes9c7CGQjTTF\nYnFC5GyaHva6Yf3qYQRIiGwGuO7nriNsWY4nSfvp06d455138OTJk
wlRJw4WvQvkdBnG9uRsO6kP\nXCwWA94Rjy5CXkdDfqi47n953ffoQEi4ZA2kHo0k7tFoNFEFNRgMZn7dGxsb0Fqb8B89antxV2lL\nrZZKpVAlGeeJtSVtSdKSPO06aJ7LZgGe2/Knw+EQ1WoVx8fHOD4+xvn5OZrNpgl7yGaY62AndeRA\nAtnNuLm56fQ86G3Qw2acmh1sqzrbzmN+kNox8Xgcl5eXJh4scxqU6bU9XRfs0CDtR4YKabOs06Yn\nLe8328u1E+nRaBSbm5sTIT52MrrCI3bDzEO27bUlbcLOUFNVTIYzZLut3TBgr2azGZhJ12q1jJfB\nBMssyUVbpMluHJAt57IahJl0O57NUAi3jA815PEQYE8eor3JeDQ/4GWITpKsCy7CZpmrXWIqSVvm\nZvja5JFhF7k4VUZWsbB/QOZvZDJynRtmboK1fvd2NQjwrD6VsqhyfJedPHHVUrO21a7JZicX1yye\nthzztbm5aUIfbCnn0fY67OoQmTlf9SnSHvOBVGmMxWLGtuWQWtqGtGkAL7RNEjftWBK2JGIZQpS7\nS3rYchdpVzzxtTE2LY+Mo8vFWY90aLyn/QAgiZsz56i5wBCHbKedlgkfDAbOjHmv15soD5zF02Zy\nMZVKIZfLTWTHs9msM7loey2stV71KdIe8wNDDSzlozypTYy9Xs/YAjsep8FueWfC0LXs/I/tacsy\nPnrWsvSUMqo7OzsmXr2zsxNIfMqQitQe8Z72GsNVd01Pm8nE4+NjM29uWn2pXb5k125LbZBZmi2Y\nfGFykU0wHI3EVSwWA6V6TC7Sk3YNaH1RxvwheyjrAulpA8/j2zI8IkkbeE7YL/owZziE15HXtB+z\nQ4/ytcmYuyRtlu6VSiWUy+WJZasKyud76NVORKhIe1rnoqukSEpGyvPT01PnZAy7XI8le3a45CYV\nIdJ4pb611LDmKpVKE6TNdnN70WjXvbTJYzq4Y5PEJrsHi8Uiut0ulFLGbuRIOfv+cCUor3NAZALd\nnoIk28jj8bghay5OSKeXXSqVTHz7oXvRsyB0fyFb+IYhD5f8o10hMhqNjAaIbI5heMSuOZWVJbOQ\nte2NMG4tEzGcsGEnE+3QSCaTMR1eMrkor+GJ+mHC5YkCQCKRMI0oHHiQTqedreWUD5ZSwjcBwx4u\n+5ZNL3xN9tAN2jztnB62x4sRKtKmZ217CXZog2OUXElFKWRDPRC2l7sIf5Z2cwl5M8VisYkSrEwm\nY9pwmTmn2A0TkExCynZz1svanrU39IcLSdj0qLPZrCFs6qnLqeY8SnkFAKY0cFaQtG1532lJc1fp\nKu2ddu4xG0JF2kCwZpSkyooOaZRykrlMMtoDCFiyZ3ct2lM5XuRpu7wfTo6R3gWJWm4TqQEsM+Q8\n2lKpnrA9gEkBKOC5pw0A8XjcTCjihKR6vT5ROsed6k3jxNQJofyvLTIlFfmkjALP7RJX72nPjtCR\ntixJkmO9OOuOQ3Ip4mR74CRvOSGDyUV7uZT+roMkbOlpc8sqRdtlqVOxWDSGa9dvu+pevXF72B/c\nWmsjr0vCzufzplJK7uCohAc887A7nc6tSJueNitBOJ/RXjLMJ8eYyRLVh17GdxOEirRl8b9sMac6\nWbVaRaVSMclFbgG5DWy32xMdjoPBYCYvWh5dsAfqRiIRU3/NxBD1rKVmCM+pw+CTix4vwjR7JBmm\nUinjeHS7XSMmxrBFKpUCENQdicVizjZ0qQ4owdrrbDYbkFWw9UGKxeKEQNlDG7Axb4SOtDkUVIY8\npG51tVo1Iuyy85Gi67IrTJK1TbhKqQmRnGn1z/SqZdY8Foshl8sFyJlHdjEylucbYTzmAVeIjh5x\nOp02MetoNIrLy0sT4uBOsNfrzbSbBIBMJjMx7ouj7KhrzRCMrbrnbf1uCCVp07umbjXjdXImY71e\nn8iQs81czp+jJ2GL4rCgX27p4vG4k2BZbmXH6Zh0lPFs6isw0Sif0xu0x10hCVuL+Yoc1MsORZuw\ny+XyhKcta69tsCFMLibT5YxGKTjlnZP5IHSkPR6P0e/3jf5vs9kMELasDLEV+hgKsZOLNHI7puzS\nomYdqdyiRiIRZ2u561wmYmxP2ycYPe4C2o2c5UjSlop60qkoFovG+bmuW9IGvXdbw106LzIs4r3s\n+SF0pM3wCD1tljNJL5uJSFkBwji4TCpy2YpprDl1lScxgWOTtmyW4fm0lnP7w8Guv/bwuAukLdHJ\noIedTCYxHA6RyWQCzWMczjHr89s7US456stWnPS7yfkglKTNOY0kbdvTrlaraDabU2c+yucDnneX\nSdKmILu9Bdzc3DS/Q+OLRqMTJXzsZnQJyNtVJg+9LddjfrA//KXWjUs33q6Wusl1XO3ldpWTrCP3\nmA9CRdpAkGCZ+GMDSzabNTKpJNdZYI/zYuhCes08xuPxiVBGNBoNyEzy6IpXe4L2WBRcxCgdC4/1\nwCwzIl8G8EUAuwAuAfya1vpXlFKvA/hxAKdXP/pprfUfLuyVIrjFy2QyuLy8NKGNRCIREF66uLiY\n+XmZYZcJR8bsbC1r2QQgbwjWwcrhon5buNpYJdv28JgVs3jaIwA/rbX+ulJqC8CfKaX+6Op7n9Fa\nf2ZxLy8I2bBCwqZanhzDRW2FWTFtzJJrKAE9FlsDxJ7L6BMvocDK2LaHx6yYZbDvMYDjq/O2Uuqb\nAF66+vZSGYkkLQmbUy0KhUKgZd2ef3cdXCV/jAPKBAvJWP4ej9JLp6ftK0JWG6tk2x4es0LdMPnw\nXQAOAfxDAD8D4JMAGgD+FMDPaK0bjt/RN7nGdbB1R2wNkptMj7FeozOBIiUneT4tbmj/nE3a8mc9\n5gelFLTWd/6j3rdte3jYmGbbMycir7aPvwPgp668kl8F8PNaa62U+gUAnwHwY67fffz4sTk/ODjA\nwcHBzV79FaQmtcfDxOHhIQ4PD+f6nKtg2x4es9r2TJ62UmoDwO8D+AOt9Wcd338VwO9prb/H8T3v\njXgsDHf1tL1te6wqptn2rPVnvw7gr6VRK6X2xPd/EMBf3e0lenjcC7xte4QKL/S0lVIfAfA/AXwD\ngL5anwbwwwA+hGelUm8B+Amt9Ynj97034rEw3MXT9rbtscqYZts3SkTe8sLesD0WhnklIm95bW/b\nHgvDXcMjHh4eHh4rAE/aHh4eHiGCJ20PDw+PEMGTtoeHh0eI4Enbw8PDI0RYKmnPu5PNX2t9rnUf\n15sX/P/EX2uZ1/Kk7a+1Ete6j+vNC/5/4q+1zGv58IiHh4dHiOBJ28PDwyNEWEpH5EIv4PHgcZ8d\nkfdxXY+Hg3tpY/fw8PDwmB98eMTDw8MjRPCk7eHh4REieNL28PDwCBGWQtpKqY8qpf5GKfV3SqlP\nLfhabyml/q9S6i+UUv9nAc//eaXUiVLqL8VjBaXUV5RSf6uU+m9KqdwCr/W6Uuo9pdSfX62Pzula\nLyul/rtS6v8ppb6hlPq3
V4/P/b05rvVvrh5fyHtbJNbFtpdp19dcb+7//2Xa9ZTrzd+2tdYLXXj2\nwfAmgFcBxAB8HcAHFni9bwMoLPD5vxfPBPL/Ujz2SwD+3dX5pwD84gKv9TqAn17A+9oD8KGr8y0A\nfwvgA4t4b9dcayHvbYG2sDa2vUy7vuZ6c///L9OuX3C9ub23ZXjaHwbw91rrt7XWQwC/BeBjC7ye\nwgJ3EFrrrwKoWQ9/DMAXrs6/AOBfLPBawLP3OFdorY+11l+/Om8D+CaAl7GA9zblWi9dfTtM4+rX\nxraXadfXXA+Y8/9/mXZ9zfXmatvLIO2XALwrvn4Pz9/EIqAB/JFS6mtKqR9f4HUkyvpqHJXW+hhA\necHX+0ml1NeVUv9hnltWQin1XXjmBf0JgN1Fvjdxrf999dBC39ucse62vWy7Bhb4/1+mXVvXm6tt\nr2Mi8iNa638E4J8D+NdKqe+9h9ewyOL3XwXwD7TWHwJwDOAz83xypdQWgN8B8FNXnoL9Xub23hzX\nWuh7WwPct20vuqljYf//Zdr1lOvN7b0tg7SfAHhFfP3y1WMLgdb66OpYAfC7eLaFXTROlFK7gJnk\nfbqoC2mtK/oqYAbgcwD+8byeWym1gWeG9h+11l+6engh7811rUW+twVh3W17aXYNLO7/v0y7nna9\neb63ZZD21wB8t1LqVaVUHMAPAfjyIi6klEpdfcJBKZUG8M8A/NUiLoVgfOrLAD55df4JAF+yf2Fe\n17oyMOIHMd/39+sA/lpr/Vnx2KLe28S1FvzeFoF1s+1l2vXE9Rb4/1+mXTuvN9f3Ns9M7TUZ1Y/i\nWRb17wH87AKv8348y+D/BYBvLOJaAH4TwFMAfQDvAPgRAAUAf3z1Hr8CIL/Aa30RwF9evc//imex\nuXlc6yMAxuLv9+dX/7fivN/bNddayHtb5FoX216mXV9zvbn//5dp1y+43tzem9ce8fDw8AgR1jER\n6eHh4bG28KTt4eHhESJ40vbw8PAIETxpe3h4eIQInrQ9PDw8QgRP2h4eHh4hgidtDw8PjxDh/wPY\nbj/C7XFdaAAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f22684f2890>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print 'Training data shape', train_data.shape\n",
+ "_, (ax1, ax2) = plt.subplots(1, 2)\n",
+ "ax1.imshow(train_data[0].reshape(28, 28), cmap=plt.cm.Greys);\n",
+ "ax2.imshow(train_data[1].reshape(28, 28), cmap=plt.cm.Greys);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "cwBhQ3ouTQcW"
+ },
+ "source": [
+ "Looks good. Now we know how to index our full set of training and test images."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "PBCB9aYxRvBi"
+ },
+ "source": [
+ "### Label data\n",
+ "\n",
+    "Let's move on to loading the full set of labels. As is typical in classification problems, we'll convert our input labels into a [1-hot](https://en.wikipedia.org/wiki/One-hot) encoding over a length-10 vector, one position per digit. The vector [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], for example, would correspond to the digit 1.\n",
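+    "\n",
+    "As a small sketch of the conversion we'll use below (the label values here are just an illustration):\n",
+    "\n",
+    "    labels = numpy.array([1, 7])\n",
+    "    numpy.arange(10) == labels[:, None]\n",
+    "    # array([[False,  True, False, False, False, False, False, False, False, False],\n",
+    "    #        [False, False, False, False, False, False, False,  True, False, False]])\n",
+    "\n",
+    "Casting that boolean matrix to `float32` gives the dense 1-hot encoding."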
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 191,
+ "status": "ok",
+ "timestamp": 1446749131421,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "9pK1j2WlRwY9",
+ "outputId": "1ca31655-e14f-405a-b266-6a6c78827af5"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Extracting /tmp/mnist-data/train-labels-idx1-ubyte.gz\n",
+ "Extracting /tmp/mnist-data/t10k-labels-idx1-ubyte.gz\n"
+ ]
+ }
+ ],
+ "source": [
+ "NUM_LABELS = 10\n",
+ "\n",
+ "def extract_labels(filename, num_images):\n",
+ " \"\"\"Extract the labels into a 1-hot matrix [image index, label index].\"\"\"\n",
+ " print 'Extracting', filename\n",
+ " with gzip.open(filename) as bytestream:\n",
+ " # Skip the magic number and count; we know these values.\n",
+ " bytestream.read(8)\n",
+ " \n",
+ " buf = bytestream.read(1 * num_images)\n",
+ " labels = numpy.frombuffer(buf, dtype=numpy.uint8)\n",
+ " # Convert to dense 1-hot representation.\n",
+ " return (numpy.arange(NUM_LABELS) == labels[:, None]).astype(numpy.float32)\n",
+ "\n",
+ "train_labels = extract_labels(train_labels_filename, 60000)\n",
+ "test_labels = extract_labels(test_labels_filename, 10000)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "hb3Vaq72UUxW"
+ },
+ "source": [
+ "As with our image data, we'll double-check that our 1-hot encoding of the first few values matches our expectations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 127,
+ "status": "ok",
+ "timestamp": 1446749132853,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "uEBID71nUVj1",
+ "outputId": "3f318310-18dd-49ed-9943-47b4aae7ee69"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Training labels shape (60000, 10)\n",
+ "First label vector [ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]\n",
+ "Second label vector [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print 'Training labels shape', train_labels.shape\n",
+ "print 'First label vector', train_labels[0]\n",
+ "print 'Second label vector', train_labels[1]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "5EwtEhxRUneF"
+ },
+ "source": [
+ "The 1-hot encoding looks reasonable.\n",
+ "\n",
+ "### Segmenting data into training, test, and validation\n",
+ "\n",
+ "The final step in preparing our data is to split it into three sets: training, test, and validation. This isn't the format of the original data set, so we'll take a small slice of the training data and treat that as our validation set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 176,
+ "status": "ok",
+ "timestamp": 1446749134110,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "e7aBYBtIVxHE",
+ "outputId": "bdeae1a8-daff-4743-e594-f1d2229c0f4e"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Validation shape (5000, 28, 28, 1)\n",
+ "Train size 55000\n"
+ ]
+ }
+ ],
+ "source": [
+ "VALIDATION_SIZE = 5000\n",
+ "\n",
+ "validation_data = train_data[:VALIDATION_SIZE, :, :, :]\n",
+ "validation_labels = train_labels[:VALIDATION_SIZE]\n",
+ "train_data = train_data[VALIDATION_SIZE:, :, :, :]\n",
+ "train_labels = train_labels[VALIDATION_SIZE:]\n",
+ "\n",
+ "train_size = train_labels.shape[0]\n",
+ "\n",
+ "print 'Validation shape', validation_data.shape\n",
+ "print 'Train size', train_size"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "1JFhEH8EVj4O"
+ },
+ "source": [
+ "# Defining the model\n",
+ "\n",
+ "Now that we've prepared our data, we're ready to define our model.\n",
+ "\n",
+ "The comments describe the architecture, which fairly typical of models that process image data. The raw input passes through several [convolution](https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional_layer) and [max pooling](https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer) layers with [rectified linear](https://en.wikipedia.org/wiki/Convolutional_neural_network#ReLU_layer) activations before several fully connected layers and a [softmax](https://en.wikipedia.org/wiki/Convolutional_neural_network#Loss_layer) loss for predicting the output class. During training, we use [dropout](https://en.wikipedia.org/wiki/Convolutional_neural_network#Dropout_method).\n",
+ "\n",
+ "We'll separate our model definition into three steps:\n",
+ "\n",
+ "1. Defining the variables that will hold the trainable weights.\n",
+ "1. Defining the basic model graph structure described above. And,\n",
+ "1. Stamping out several copies of the model graph for training, testing, and validation.\n",
+ "\n",
+ "We'll start with the variables."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 2081,
+ "status": "ok",
+ "timestamp": 1446749138298,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "Q1VfiAzjzuK8",
+ "outputId": "f53a39c9-3a52-47ca-d7a3-9f9d84eccf63"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Done\n"
+ ]
+ }
+ ],
+ "source": [
+ "import tensorflow as tf\n",
+ "\n",
+ "# We'll bundle groups of examples during training for efficiency.\n",
+ "# This defines the size of the batch.\n",
+ "BATCH_SIZE = 60\n",
+ "# We have only one channel in our grayscale images.\n",
+ "NUM_CHANNELS = 1\n",
+ "# The random seed that defines initialization.\n",
+ "SEED = 42\n",
+ "\n",
+ "# This is where training samples and labels are fed to the graph.\n",
+ "# These placeholder nodes will be fed a batch of training data at each\n",
+ "# training step, which we'll write once we define the graph structure.\n",
+ "train_data_node = tf.placeholder(\n",
+ " tf.float32,\n",
+ " shape=(BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))\n",
+ "train_labels_node = tf.placeholder(tf.float32,\n",
+ " shape=(BATCH_SIZE, NUM_LABELS))\n",
+ "\n",
+ "# For the validation and test data, we'll just hold the entire dataset in\n",
+ "# one constant node.\n",
+ "validation_data_node = tf.constant(validation_data)\n",
+ "test_data_node = tf.constant(test_data)\n",
+ "\n",
+ "# The variables below hold all the trainable weights. For each, the\n",
+ "# parameter defines how the variables will be initialized.\n",
+ "conv1_weights = tf.Variable(\n",
+ " tf.truncated_normal([5, 5, NUM_CHANNELS, 32], # 5x5 filter, depth 32.\n",
+ " stddev=0.1,\n",
+ " seed=SEED))\n",
+ "conv1_biases = tf.Variable(tf.zeros([32]))\n",
+ "conv2_weights = tf.Variable(\n",
+ " tf.truncated_normal([5, 5, 32, 64],\n",
+ " stddev=0.1,\n",
+ " seed=SEED))\n",
+ "conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]))\n",
+ "fc1_weights = tf.Variable( # fully connected, depth 512.\n",
+ " tf.truncated_normal([IMAGE_SIZE / 4 * IMAGE_SIZE / 4 * 64, 512],\n",
+ " stddev=0.1,\n",
+ " seed=SEED))\n",
+ "fc1_biases = tf.Variable(tf.constant(0.1, shape=[512]))\n",
+ "fc2_weights = tf.Variable(\n",
+ " tf.truncated_normal([512, NUM_LABELS],\n",
+ " stddev=0.1,\n",
+ " seed=SEED))\n",
+ "fc2_biases = tf.Variable(tf.constant(0.1, shape=[NUM_LABELS]))\n",
+ "\n",
+ "print 'Done'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "QHB_u04Z4HO6"
+ },
+ "source": [
+ "Now that we've defined the variables to be trained, we're ready to wire them together into a TensorFlow graph.\n",
+ "\n",
+ "We'll define a helper to do this, `model`, which will return copies of the graph suitable for training and testing. Note the `train` argument, which controls whether or not dropout is used in the hidden layer. (We want to use dropout only during training.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 772,
+ "status": "ok",
+ "timestamp": 1446749138306,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "V85_B9QF3uBp",
+ "outputId": "457d3e49-73ad-4451-c196-421dd4681efc"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Done\n"
+ ]
+ }
+ ],
+ "source": [
+ "def model(data, train=False):\n",
+ " \"\"\"The Model definition.\"\"\"\n",
+ " # 2D convolution, with 'SAME' padding (i.e. the output feature map has\n",
+ " # the same size as the input). Note that {strides} is a 4D array whose\n",
+ " # shape matches the data layout: [image index, y, x, depth].\n",
+ " conv = tf.nn.conv2d(data,\n",
+ " conv1_weights,\n",
+ " strides=[1, 1, 1, 1],\n",
+ " padding='SAME')\n",
+ "\n",
+ " # Bias and rectified linear non-linearity.\n",
+ " relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))\n",
+ " \n",
+ " # Max pooling. The kernel size spec ksize also follows the layout of\n",
+ " # the data. Here we have a pooling window of 2, and a stride of 2.\n",
+ " pool = tf.nn.max_pool(relu,\n",
+ " ksize=[1, 2, 2, 1],\n",
+ " strides=[1, 2, 2, 1],\n",
+ " padding='SAME')\n",
+ " conv = tf.nn.conv2d(pool,\n",
+ " conv2_weights,\n",
+ " strides=[1, 1, 1, 1],\n",
+ " padding='SAME')\n",
+ " relu = tf.nn.relu(tf.nn.bias_add(conv, conv2_biases))\n",
+ " pool = tf.nn.max_pool(relu,\n",
+ " ksize=[1, 2, 2, 1],\n",
+ " strides=[1, 2, 2, 1],\n",
+ " padding='SAME')\n",
+ " \n",
+ " # Reshape the feature map cuboid into a 2D matrix to feed it to the\n",
+ " # fully connected layers.\n",
+ " pool_shape = pool.get_shape().as_list()\n",
+ " reshape = tf.reshape(\n",
+ " pool,\n",
+ " [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])\n",
+ " \n",
+ " # Fully connected layer. Note that the '+' operation automatically\n",
+ " # broadcasts the biases.\n",
+ " hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)\n",
+ " \n",
+ " # Add a 50% dropout during training only. Dropout also scales\n",
+ " # activations such that no rescaling is needed at evaluation time.\n",
+ " if train:\n",
+ " hidden = tf.nn.dropout(hidden, 0.5, seed=SEED)\n",
+ " return tf.matmul(hidden, fc2_weights) + fc2_biases\n",
+ "\n",
+ "print 'Done'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "7bvEtt8C4fLC"
+ },
+ "source": [
+ "Having defined the basic structure of the graph, we're ready to stamp out multiple copies for training, testing, and validation.\n",
+ "\n",
+ "Here, we'll do some customizations depending on which graph we're constructing. `train_prediction` holds the training graph, for which we use cross-entropy loss and weight regularization. We'll adjust the learning rate during training -- that's handled by the `exponential_decay` operation, which is itself an argument to the `MomentumOptimizer` that performs the actual training.\n",
+ "\n",
+ "The vaildation and prediction graphs are much simpler the generate -- we need only create copies of the model with the validation and test inputs and a softmax classifier as the output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 269,
+ "status": "ok",
+ "timestamp": 1446749139596,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "9pR1EBNT3sCv",
+ "outputId": "570681b1-f33e-4618-b742-48e12aa58132"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Done\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Training computation: logits + cross-entropy loss.\n",
+ "logits = model(train_data_node, True)\n",
+ "loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(\n",
+ " logits, train_labels_node))\n",
+ "\n",
+ "# L2 regularization for the fully connected parameters.\n",
+ "regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) +\n",
+ " tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))\n",
+ "# Add the regularization term to the loss.\n",
+ "loss += 5e-4 * regularizers\n",
+ "\n",
+ "# Optimizer: set up a variable that's incremented once per batch and\n",
+ "# controls the learning rate decay.\n",
+ "batch = tf.Variable(0)\n",
+ "# Decay once per epoch, using an exponential schedule starting at 0.01.\n",
+ "learning_rate = tf.train.exponential_decay(\n",
+ " 0.01, # Base learning rate.\n",
+ " batch * BATCH_SIZE, # Current index into the dataset.\n",
+ " train_size, # Decay step.\n",
+ " 0.95, # Decay rate.\n",
+ " staircase=True)\n",
+ "# Use simple momentum for the optimization.\n",
+ "optimizer = tf.train.MomentumOptimizer(learning_rate,\n",
+ " 0.9).minimize(loss,\n",
+ " global_step=batch)\n",
+ "\n",
+ "# Predictions for the minibatch, validation set and test set.\n",
+ "train_prediction = tf.nn.softmax(logits)\n",
+ "# We'll compute them only once in a while by calling their {eval()} method.\n",
+ "validation_prediction = tf.nn.softmax(model(validation_data_node))\n",
+ "test_prediction = tf.nn.softmax(model(test_data_node))\n",
+ "\n",
+ "print 'Done'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "4T21uZJq5UfH"
+ },
+ "source": [
+ "# Training and visualizing results\n",
+ "\n",
+ "Now that we have the training, test, and validation graphs, we're ready to actually go through the training loop and periodically evaluate loss and error.\n",
+ "\n",
+ "All of these operations take place in the context of a session. In Python, we'd write something like:\n",
+ "\n",
+ " with tf.Session() as s:\n",
+ " ...training / test / evaluation loop...\n",
+ " \n",
+ "But, here, we'll want to keep the session open so we can poke at values as we work out the details of training. The TensorFlow API includes a function for this, `InteractiveSession`.\n",
+ "\n",
+ "We'll start by creating a session and initializing the varibles we defined above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": []
+ },
+ "colab_type": "code",
+ "collapsed": true,
+ "executionInfo": {
+ "elapsed": 219,
+ "status": "ok",
+ "timestamp": 1446749385874,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "z6Kc5iql6qxV",
+ "outputId": "51512ffa-8eda-4579-bbf8-d684638df2aa"
+ },
+ "outputs": [],
+ "source": [
+ "# Create a new interactive session that we'll use in\n",
+ "# subsequent code cells.\n",
+ "s = tf.InteractiveSession()\n",
+ "\n",
+ "# Use our newly created session as the default for \n",
+ "# subsequent operations.\n",
+ "s.as_default()\n",
+ "\n",
+ "# Initialize all the variables we defined above.\n",
+ "tf.initialize_all_variables().run()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "hcG8H-Ka6_mw"
+ },
+ "source": [
+ "Now we're ready to perform operations on the graph. Let's start with one round of training. We're going to organize our training steps into batches for efficiency; i.e., training using a small set of examples at each step rather than a single example."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 386,
+ "status": "ok",
+ "timestamp": 1446749389138,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "LYVxeEox71Pg",
+ "outputId": "9184b5df-009a-4b1b-e312-5be94351351f"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Done\n"
+ ]
+ }
+ ],
+ "source": [
+ "BATCH_SIZE = 60\n",
+ "\n",
+ "# Grab the first BATCH_SIZE examples and labels.\n",
+ "batch_data = train_data[:BATCH_SIZE, :, :, :]\n",
+ "batch_labels = train_labels[:BATCH_SIZE]\n",
+ "\n",
+ "# This dictionary maps the batch data (as a numpy array) to the\n",
+ "# node in the graph is should be fed to.\n",
+ "feed_dict = {train_data_node: batch_data,\n",
+ " train_labels_node: batch_labels}\n",
+ "\n",
+ "# Run the graph and fetch some of the nodes.\n",
+ "_, l, lr, predictions = s.run(\n",
+ " [optimizer, loss, learning_rate, train_prediction],\n",
+ " feed_dict=feed_dict)\n",
+ "\n",
+ "print 'Done'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "7bL4-RNm_K-B"
+ },
+ "source": [
+ "Let's take a look at the predictions. How did we do? Recall that the output will be probabilities over the possible classes, so let's look at those probabilities."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 160,
+ "status": "ok",
+ "timestamp": 1446749519023,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "2eNitV_4_ZUL",
+ "outputId": "f1340dd1-255b-4523-bf62-7e3ebb361333"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[ 2.25393465e-04 4.76219648e-05 1.66868104e-03 5.67830284e-05\n",
+ " 6.03432834e-01 4.34970111e-02 2.19316153e-05 1.41285171e-04\n",
+ " 1.54902827e-05 3.50893021e-01]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print predictions[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "X5MgraJb_eQZ"
+ },
+ "source": [
+ "As expected without training, the predictions are all noise. Let's write a scoring function that picks the class with the maximum probability and compares with the example's label. We'll start by converting the probability vectors returned by the softmax into predictions we can match against the labels."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 220,
+ "status": "ok",
+ "timestamp": 1446750411574,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "wMMlUf5rCKgT",
+ "outputId": "2c10e96d-52b6-47b0-b6eb-969ad462d46b"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "First prediction 4\n",
+ "(60, 10)\n",
+ "All predictions [4 4 2 7 7 7 7 7 7 7 7 7 0 8 9 0 7 7 0 7 4 0 5 0 9 9 7 0 7 4 7 7 7 0 7 7 9\n",
+ " 7 9 9 0 7 7 7 2 7 0 7 2 9 9 9 9 9 0 7 9 4 8 7]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# The highest probability in the first entry.\n",
+ "print 'First prediction', numpy.argmax(predictions[0])\n",
+ "\n",
+ "# But, predictions is actually a list of BATCH_SIZE probability vectors.\n",
+ "print predictions.shape\n",
+ "\n",
+ "# So, we'll take the highest probability for each vector.\n",
+ "print 'All predictions', numpy.argmax(predictions, 1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "8pMCIZ3_C2ni"
+ },
+ "source": [
+ "Next, we can do the same thing for our labels -- using `argmax` to convert our 1-hot encoding into a digit class."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 232,
+ "status": "ok",
+ "timestamp": 1446750498351,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "kZWp4T0JDDUe",
+ "outputId": "47b588cd-bc82-45c3-a5d0-8d84dc27a3be"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Batch labels [7 3 4 6 1 8 1 0 9 8 0 3 1 2 7 0 2 9 6 0 1 6 7 1 9 7 6 5 5 8 8 3 4 4 8 7 3\n",
+ " 6 4 6 6 3 8 8 9 9 4 4 0 7 8 1 0 0 1 8 5 7 1 7]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print 'Batch labels', numpy.argmax(batch_labels, 1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "bi5Z6whtDiht"
+ },
+ "source": [
+ "Now we can compare the predicted and label classes to compute the error rate and confusion matrix for this batch."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ },
+ {
+ "item_id": 2
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 330,
+ "status": "ok",
+ "timestamp": 1446751307304,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "U4hrLW4CDtQB",
+ "outputId": "720494a3-cbf9-4687-9d94-e64a33fdd78f"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0.0666666666667\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPcAAAD7CAYAAAC2TgIoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADpNJREFUeJzt3X+MXWWdx/H3p7TUli6UXYnrUtuxMV2RhECDSKzQ64Ju\nxQ1m12wEJGTJhv1DmxJNDMo/vf1jN9kYY0jWf4jYFVp0Q6GBRJcUJVMDG1nsT6QtdWUHCkKDsdOG\nlGBLv/vHPe1O6S3z3DnPM515+nklN3Nmcvqdb+6dT89zz7n3exURmFl9ZpzpBsysDIfbrFIOt1ml\nHG6zSjncZpVyuM0qNTNXIUm+pmZ2hkSE3v2zbOEGYH5ivt/qwpxu0q7LDzye/OtHuusY6t6avP9m\n/TJxz2Ggk1x3MAPWXtdN2+/hLnwxcV+AlYn7DfDYQe2PX6m6g9Ze0/enXpabVcrhNqvUmQn3zE6R\nsvM7lxWpC0OF6hasfUmnTN1Cjx1Mx8evVN08tc9MuGd1ipSdfn8cBWt/rFOmbqHHDqbj41eqbp7a\nSeGWtELSHkl7Jd3V+reaWXHjhlvSDODfgL8GLgVulvTR0o2ZWTspR+6rgN9ExEsRcQT4MfCFsm2Z\nWVsp4b4Y2Dfm+1ean5nZFOZLYWaVSnmF2qvAwjHfL2h+dqq3umMqd4qeWTU7e400t/eWEu5ngY9I\nWgS8BtwE3Nx3zwFelmhmEzXEyZfKNvfda9xwR8Q7klYCm+gt4++LiN3tGzSzkpLeOBIRjwN/WbgX\nM8vIJ9TMKuVwm1XK4TarlMNtVimH26xSDrdZpZTrs8J6AxJXZ6llZoNY03dAoo/cZpVyuM0q5XCb\nVcrhNquUw21WKYfbrFIOt1mlUqaf3idpv6Sdk9GQmeWRcuReS2+ssZlNI+OGOyKeAg5MQi9mlpGf\nc5tVKu/nczM8ZnuIsp+lZHa2GiHX9NMBdPKWM7M+hkiZfpq6LFdzM7NpIuVS2IPAfwFLJL0s6fby\nbZlZWylzy2+ZjEbMLC+fLTerlMNtVimH26xSDrdZpRxus0o53GaVyjvaeH6eWicZ7eavOZ3N75ap\n6/t5cpR4/Ebl0cZmZxOH26xSDrdZpRxus0o53GaVcrjNKpXyls8Fkp6U9Lyk5yStmozGzKydlEks\nR4GvR8R2SfOALZI2RcSewr2ZWQsp009fj4jtzfabwG7g4tKNmVk7Az3nljQEXA48U6IZM8sneUBi\nsyTfANzZHMFP9VZ3TOUOzOq06c3M+jkyDEeHx90tKdySZtIL9gMR8ehpd5zTTerNzFqY1Tn5wPn2\nmr67pS7LfwDsioh7WrZlZpMk5VLYMuDLwF9J2iZpq6QV5VszszZSpp8+DZwzCb2YWUZ+hZpZpRxu\ns0o53GaVcrjNKuVwm1XK4TarVN7pp6zOUsvMBrHG00/NziYOt1mlHG6zSjncZpVyuM0q5XCbVWrc\nd4VJmg38Aji3uT0aEXeXbszM2kl5y+fbkj4dEYclnQM8LWlZ81ZQM5uikpblEXG42Zzd/JsDxToy\nsyySwi1phqRtwOvAcETsKtuWmbWVeuQ+FhFXAAuAayUtL9uWmbWVPNoYICIOSfoJcCWw+dQ9hsds\nDzU3M8trpLm9t5Sz5e8HjkTEQUlzgM8A/Wep0knvz8wmaIiTD5x9jrOkHbk/CPxQkugt4x+IiJ+3\n7M7MCku5FPYcsHQSejGzjPwKNbNKOdxmlXK4zSrlcJtVyuE2q5TDbVYph9usUgO9/PSMmN8tV3u0\nUO3p2PN0tK5bpu7KMmWBSX38fOQ2q5TDbVYph9usUg63WaUcbrNKOdxmlUoOdzNHbaukx0o2ZGZ5\nDHLkvhPwYESzaSJ1+ukC4Abg+2XbMbNcUo/c3wW+AUTBXswso5QBiZ8H9kfEdkkdQKffe3jM9hCe\nfmpWwghZpp8Cy4AbJd0AzAH+RNL9EXHbqbt2BmjQzCZmiJTpp+MuyyPi7ohYGBGLgZuAJ/sH28ym\nEl/nNqvUoJ84spnTrQHMbErxkdusUg63WaUcbrNKOdxmlXK4zSrlcJtVaupPP52O0z6nY8/TUakp\npZU8fj5ym1XK4TarlMNtVimH26xSDrdZpRxus0olXQqTNAIcBI4BRyLiqpJNmVl7qde5jwGdiDhQ\nshkzyyd1Wa4B9jWzKSA1sAE8IelZSXeUbMjM8khdli+LiNckXUQv5Lsj4qlTdxsesz2Ep5+alTBC\nrumnRMRrzdc3JG0ErgL6hLuT3J6ZTdQQWaafSporaV6zfR7wWeDXrfszs6JSjtwfADZKimb/9RGx\nqWxbZtbWuOGOiP8FLp+EXswsI1/eMquUw21WKYfbrFIOt1mlHG6zSjncZpWa+tNPzU6nkimlpfjI\nbVYph9usUg63WaUcbrNKOdxmlXK4zSqVFG5JF0h6SNJuSc9L+kTpxsysndTr3PcAP42Iv5c0E5hb\nsCczy2DccEs6H7gmIv4BICKOAocK92VmLaUsyz8M/F7SWklbJd0raU7pxsysnZRwzwSWAt+LiKXA\nYeCbRbsys9ZSnnO/AuyLiF81328A7uq/6/CY7SE82tishBGyjDaOiP2S9klaEhF7geuAXf337gzQ\noJlNzBApo41Tz5avAtZLmgW8CNzeojMzmwSpH0qwA/h44V7MLCO/Qs2sUg63WaUcbrNKOdxmlXK4\nzSrlcJtVyuE2q1TW0cbL4+qc5QDYvH5F9ponrCxTdvmBx8sUpuD94fvihNW3qkhdgOH4z+w1N5+m\nXR+5zSrlcJtVyuE2q5TDbVYph9usUg63WaXGDbekJZK2NfPTtkk6KGnVZDRnZhOXMollL3AFgKQZ\n9MYubSzcl5m1NOiy/HrgtxGxr0QzZpbPoOH+EvCjEo2YWV7JLz9t5qfdyHuMNR7prjuxPb9zGfM7\nl7VqzsxONTq8k9HhnePuN8hryz8HbImIN063w1D31gHKmdlEvPvA+dKa9X33G2RZfjNekptNG6mf\n8jmX3sm0R8q2Y2a5pI42PgxcVLgXM8vIr1Azq5TDbVYph9usUg63WaUcbrNKOdxmlVJE5CkkBazO\nUusk87v5a5Y22j3THdhZZQ0RccoMVB+5zSrlcJtVyuE2q5TDbVYph9usUg63WaVS3/L5LUnPS9op\nab2kc0s3ZmbtpIw2XgTcAVwREZfRe5voTaUbM7N2Ut7PfQj4I3CepGPAXOB3Rbsys9bGPXJHxAHg\nO8DLwKvAaET8rHRjZtbOuEduSYuBrwGLgIPABkm3RMSDp+49PGZ7qLmZWV4jze29pSzLrwSejog/\nAEh6BPgk0CfcneT2zGyihjj5wLm5714pZ8tfAK6W9D5JAq4DdrfszswKS3nOvQO4H9gC7AAE3Fu4\nLzNrKXX66beBbxfuxcwy8ivUzCrlcJtVyuE2q5TDbVYph9usUg63WaUcbrNK5R1tvC5PrZOszF/y\nhFIjiKfjOOZpaPmBx4vU3XzhiiJ1i
xmVRxubnU0cbrNKOdxmlXK4zSrlcJtVKnX66Z2Snmtuq0o3\nZWbtpUw/vRT4R3oTWS4H/qYZvWRmU1jKkfsS4JmIeDsi3gF+Afxd2bbMrK2UcP8auEbShZLmAjcA\nHyrblpm1Ne4klojYI+lfgSeAN4FtwDulGzOzdlLHLK0F1gJI+mdgX98dH+7+//YlHfhYp113Znaq\nI8NwdHjc3ZLCLemiiHhD0kLgb4Gr++74xW5yf2Y2QbM6vdtxb6/pu1tSuIGHJf0pcAT4SkQcated\nmZWWuiy/tnQjZpaXX6FmVimH26xSDrdZpc5MuHcNl6l7pFDdhE9UnLBSPU+3ugVrjw7vLFJ3qt8X\nZybcu4fL1E249jcxI4XqUq7n6Va3YO1i4Z7i94WX5WaVcrjNKpV3+qmZnRH9pp9mC7eZTS1elptV\nyuE2q9SkhlvSCkl7JO2VdFfGuvdJ2i8p6zUPSQskPSnp+Zzz4yTNlvSMpG1N7X/JUXdM/RmStkp6\nLHPdEUk7mr7/O2PdCyQ9JGl3c398IlPdJU2vW5uvBzM+ht9qet0pab2kczPVzTevMCIm5UbvP5L/\nARYBs4DtwEcz1f4UvfluOzP3/OfA5c32POCFjD3Pbb6eA/wSWJax768B64DHMt8fLwIXFvjb+Hfg\n9mZ7JnB+gd8xA/gd8KEMtRY198W5zff/AdyWoe6lwE5gdvN3sQlYPNF6k3nkvgr4TUS8FBFHgB8D\nX8hROCKeAg7kqPWuuq9HxPZm+01gN3BxptqHm83Z9P7wsvQvaQG9UVjfz1Hv3eXJvNqTdD5wTfQG\nghARR6PMW4qvB34bEf0HjQzmEPBH4DxJM4G59P7jaCvrvMLJDPfFnDzB5RUyBWUySBqitzp4JlO9\nGZK2Aa8DwxGxK0dd4LvAN4ASl0ECeELSs5LuyFTzw8DvJa1tls/3SpqTqfZYXwJ+lKNQRBwAvgO8\nDLwKjEbEzzKUzjqv0CfUEkiaB2wA7myO4K1FxLGIuAJYAFwraXnbmpI+D+xvVhtqbjkti4il9P7o\nvirpUxlqzgSWAt9rah8Gvpmh7gmSZgE3Ag9lqreY3lOfRcBfAPMk3dK2bkTsAY7PK/wpLecVTma4\nXwUWjvl+QfOzKa1Zdm0AHoiIR3PXb5agP6E3F76tZcCNkl6kd5T6tKT7M9QFICJea76+AWyk91Sr\nrVeAfRHxq+b7DfTCntPngC1N3zlcCTwdEX9ols+PAJ/MUTgi1kbElRHRAUaBvROtNZnhfhb4iKRF\nzZnFm4CcZ3NLHKkAfgDsioh7chWU9H5JFzTbc4DP0DvB2EpE3B0RCyNiMb3798mIuK1tXQBJc5sV\nDJLOAz5LbxnZSkTsB/ZJWtL86Dog11OU424m05K88QJwtaT3SRK9nnfnKCzpoubr8XmFD060VuoM\ntdYi4h1JK+mdAZwB3BcRue6QB4EO8GeSXgZWHz9B07LuMuDLwHPN8+MA7o6Itp/6/kHgh80fxgx6\nq4Kft6xZ2geAjc3LjGcC6yNiU6baq4D1zfL5ReD2THVpnrteD/xTrpoRsaNZEW2ht2zeBtybqXy2\neYV++alZpXxCzaxSDrdZpRxus0o53GaVcrjNKuVwm1XK4TarlMNtVqn/A4ohGJAOEeBVAAAAAElF\nTkSuQmCC\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f226836c350>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "correct = numpy.sum(numpy.argmax(predictions, 1) == numpy.argmax(batch_labels, 1))\n",
+ "total = predictions.shape[0]\n",
+ "\n",
+ "print float(correct) / float(total)\n",
+ "\n",
+ "confusions = numpy.zeros([10, 10], numpy.float32)\n",
+ "bundled = zip(numpy.argmax(predictions, 1), numpy.argmax(batch_labels, 1))\n",
+ "for predicted, actual in bundled:\n",
+ " confusions[predicted, actual] += 1\n",
+ "\n",
+ "plt.grid(False)\n",
+ "plt.xticks(numpy.arange(NUM_LABELS))\n",
+ "plt.yticks(numpy.arange(NUM_LABELS))\n",
+ "plt.imshow(confusions, cmap=plt.cm.jet, interpolation='nearest');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "iZmx_9DiDXQ3"
+ },
+ "source": [
+ "Now let's wrap this up into our scoring function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 178,
+ "status": "ok",
+ "timestamp": 1446751995007,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "DPJie7bPDaLa",
+ "outputId": "a06c64ed-f95f-416f-a621-44cccdaba0f8"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Done\n"
+ ]
+ }
+ ],
+ "source": [
+ "def error_rate(predictions, labels):\n",
+ " \"\"\"Return the error rate and confusions.\"\"\"\n",
+ " correct = numpy.sum(numpy.argmax(predictions, 1) == numpy.argmax(labels, 1))\n",
+ " total = predictions.shape[0]\n",
+ "\n",
+ " error = 100.0 - (100 * float(correct) / float(total))\n",
+ "\n",
+ " confusions = numpy.zeros([10, 10], numpy.float32)\n",
+ " bundled = zip(numpy.argmax(predictions, 1), numpy.argmax(labels, 1))\n",
+ " for predicted, actual in bundled:\n",
+ " confusions[predicted, actual] += 1\n",
+ " \n",
+ " return error, confusions\n",
+ "\n",
+ "print 'Done'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "sLv22cjeB5Rd"
+ },
+ "source": [
+ "We'll need to train for some time to actually see useful predicted values. Let's define a loop that will go through our data. We'll print the loss and error periodically.\n",
+ "\n",
+ "Here, we want to iterate over the entire data set rather than just the first batch, so we'll need to slice the data to that end.\n",
+ "\n",
+ "(One pass through our training set will take some time on a CPU, so be patient if you are executing this notebook.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 2
+ },
+ {
+ "item_id": 3
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 12621,
+ "status": "error",
+ "timestamp": 1446752161966,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "4cgKJrS1_vej",
+ "outputId": "df1be05c-a41b-4eff-e91a-5cd24012d140"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Step 0 of 916\n",
+ "Mini-batch loss: 3.01724 Error: 6.66667 Learning rate: 0.00950\n",
+ "Validation error: 2.1%\n",
+ "Step 100 of 916\n",
+ "Mini-batch loss: 2.91099 Error: 3.33333 Learning rate: 0.00950\n",
+ "Validation error: 1.8%\n",
+ "Step 200 of 916\n",
+ "Mini-batch loss: 2.85178 Error: 1.66667 Learning rate: 0.00950\n",
+ "Validation error: 1.6%\n",
+ "Step 300 of 916\n",
+ "Mini-batch loss: 2.81923 Error: 3.33333 Learning rate: 0.00950\n",
+ "Validation error: 1.6%\n",
+ "Step 400 of 916\n",
+ "Mini-batch loss: 2.82241 Error: 3.33333 Learning rate: 0.00950\n",
+ "Validation error: 1.4%\n",
+ "Step 500 of 916\n",
+ "Mini-batch loss: 2.73972 Error: 0.00000 Learning rate: 0.00950\n",
+ "Validation error: 1.4%\n",
+ "Step 600 of 916\n",
+ "Mini-batch loss: 2.74712 Error: 1.66667 Learning rate: 0.00950\n",
+ "Validation error: 1.5%\n",
+ "Step 700 of 916\n",
+ "Mini-batch loss: 2.82140 Error: 5.00000 Learning rate: 0.00950\n",
+ "Validation error: 1.3%\n",
+ "Step 800 of 916\n",
+ "Mini-batch loss: 2.68445 Error: 1.66667 Learning rate: 0.00950\n",
+ "Validation error: 1.3%\n",
+ "Step 900 of 916\n",
+ "Mini-batch loss: 2.61466 Error: 0.00000 Learning rate: 0.00950\n",
+ "Validation error: 1.2%\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Train over the first 1/4th of our training set.\n",
+ "steps = int(train_size / BATCH_SIZE)\n",
+ "for step in xrange(steps):\n",
+ " # Compute the offset of the current minibatch in the data.\n",
+ " # Note that we could use better randomization across epochs.\n",
+ " offset = (step * BATCH_SIZE) % (train_size - BATCH_SIZE)\n",
+ " batch_data = train_data[offset:(offset + BATCH_SIZE), :, :, :]\n",
+ " batch_labels = train_labels[offset:(offset + BATCH_SIZE)]\n",
+ " # This dictionary maps the batch data (as a numpy array) to the\n",
+ " # node in the graph is should be fed to.\n",
+ " feed_dict = {train_data_node: batch_data,\n",
+ " train_labels_node: batch_labels}\n",
+ " # Run the graph and fetch some of the nodes.\n",
+ " _, l, lr, predictions = s.run(\n",
+ " [optimizer, loss, learning_rate, train_prediction],\n",
+ " feed_dict=feed_dict)\n",
+ " \n",
+ " # Print out the loss periodically.\n",
+ " if step % 100 == 0:\n",
+ " error, _ = error_rate(predictions, batch_labels)\n",
+ " print 'Step %d of %d' % (step, steps)\n",
+ " print 'Mini-batch loss: %.5f Error: %.5f Learning rate: %.5f' % (l, error, lr)\n",
+ " print 'Validation error: %.1f%%' % error_rate(\n",
+ " validation_prediction.eval(), validation_labels)[0]\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "J4LskgGXIDAm"
+ },
+ "source": [
+ "The error seems to have gone down. Let's evaluate the results using the test set.\n",
+ "\n",
+ "To help identify rare mispredictions, we'll include the raw count of each (prediction, label) pair in the confusion matrix."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ },
+ {
+ "item_id": 2
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 436,
+ "status": "ok",
+ "timestamp": 1446752934104,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "6Yh1jGFuIKc_",
+ "outputId": "4e411de4-0fe2-451b-e4ca-8a4854f0db89"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Test error: 1.4%\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAQYAAAEKCAYAAADw9/tHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl4VOXZx/HvTSYkhCUsokBQAgooQkBZFBFEYhRcK1CV\nRcGoVUsRK68o1L4oVdGqUCv1bVRKhYIoVlwQEAyLLKKCIASNWJDKEiNhCxAJCTzvH+ckBCchk8x5\nMjNwf65rrpycnLnPk8lwc7b5HTHGoJRSJVUL9QCUUuFHG4NSyo82BqWUH20MSik/2hiUUn60MSil\n/PhCPQAAEdFzpkqFiDFGfjkvLBoDgGkb2HKPZ8PjZwW2rGSMrcAIlgA9K7B8qOvarB0udaMrsGw6\nkFyB5QsCXG4J4fFa2Kr9RKlzdVdCKeVHG4NSyk/ENYaeNW1VToywujZrR1pdgOaW6iZGWF1vaks4\nfFZCREygxxgqVLdCxxhU+KnIMYaKCvQYw6nuiVIPPlrfYhCR3iKSKSKbROQR2+tTSgXPamMQkWrA\nJOAa4EJggIicb3OdSqng2d5i6AJ8Z4z5rzGmAJgJ3GR5nUqpINluDAnAthLfb3fnle/+sTB1ObyW\nDi3bQp/bYPIi5/t3N8ILbznL1akLL70PU5bAI3/xePiVM2/eYLKzH2b06O6hHkrI2XwtDh0aRXr6\nINLTBzF0aJInNW2O11btWrWqs3x5KunpQ1i+PJXLLjs76Jphc4HTCVolQdvOcMflcFYCPDUV7k6G\neTOdn/9hEnyxxJm+cxTMnwkfzoAnXoOuKfDpwpANHSA19T2uuqoFTZvWCek4woHN12L79lySk6d7\nWtPmeG3VPnjwCN27T8EYQ2JiXaZP70e3bpODqml7i2EHcE6J75u68/w8nn38sbFRK/h6jfOD7B2Q\n0Bx8bg+LioLL+8Di953vO10BS+c400s/cL4PsaysA4jfcd7Tk83XolGjWixaNIhZs/pyzjne/GOz\nOV6btYvOLsbHx7J7d95JltyKc2Vk0aN0trcYvgDOE5FmQBZwGzCgtAVPuMx5WwakDneaQYs2zlZD\nnXqwZ5fTFFYvhYIjzrJ16sHBXGc6dx/E17f466hwkpg4ib17D5OS0pzJk68nJWVGqIcUMo0b1+bN\nN/vTuvUZ3HTTGydZMpETr3NYWupSVrcYjDFHgd8BC4CNwExjzDflPvH7TGfXIG0BDBwO/9noNAWA\n6wfDnH8dXzZ3L9Ss7UzXjof9ezz+LVS42rv3MAALF35Ps2bxIR5NaGVlHaBHjyl06vQKaWk3BF3P\n+nUMxpj5xpjWxpiWxphnAn7irDS4qxdMmwjfbXDm1awNF1wMny06vtzqpdDjOme6+7XO92FCdyeO\n8/q1iIuLLq7Zrt2Z7Np1ss3nirP5t/O6dnR0VPH0wYNHqF496iRLByY8Dz4C/H0+RPlgXw48NcyZ\nl9IPFr174nJTnoOnp8It98Gm9SE/8AiQlnYDXbs2JSbGR8eOTejX781QDylkbL0WbdqcQVrateTm\n5mOM4d5753pS1+bfzlbttm3PZOLEaygsPEZMjI8RI+YFXVMviVZhTC+Jti9El0QrpSKPNgallB9t\nDEopP9oYlFJ+wuashI0DhWPLyLPzwhPogU379ABhqOgWg1LKjzYGpZQfbQxKKT/aGJRSfrQxKKX8\naGNQSvnRxqCU8mM7JXqyiGSLyHqb61FKecv2FsMUnOh4pVQEsZ3gtBzYa3MdgRo8bx4PZ2fTffRo\nAOo1b85vvviC0fv3c3bXrsXLXTNhAnetXMldK1bQbdSo4vltb7uNu1etInX5clKefbbM9bRv34hl\ny1JZvHgoCxfeQbNmde39Uqe5li0bkJ//R7p2DT4VuYiNxGXbbLznrOcxuHmPHxhjysz3FhGDhUuM\nS14SXbtxY1pcdRV1mjZl2fjx+GJi8MXGcs2ECXz52mts+/RTAOq1aMHeLVsAuGvFCv49aBD7tm7l\nge++4+V27Sg8fJghixbR+74MNm3a7bfOhg1rcujQEfLyCujd+zwGDGjHkCGzPf/dFLz++s00alSL\nxx9fwqefbiv/CQESEU8Tl20L7j1Xeh5D2HxW4sTE2kS8vunngaysEzK1CvPzKczP98vZKmoKAMcK\nCzFHjwLw8549xMbH8/PRo0RFR7Nv3+FS17Nr16Hi6fz8oxQUHPXy11Cuzp0TyMo6QGHhMc9rB564\nHB4q9p7b6j5OLowaQ89QD+AE7QYOZM/mzezf5vxPtGz8eO5bt46CvDw2vvUWP/1UemMoEhcXzZNP\n9iI19b2qGO5pZ8yY7tx557tMmOD9IazAE5fDS2DvuURCnhLtEvcRMVokJ9NhyBDm3HsvAL6YGK4c\nN46/nnceL557Lg0vvJCOHZuU+fyoqGrMnNmf8eOX8e23OVU17NNGnz4tWb16Z5lbbcHyOnG5Knj9\nnrN9unIGsBJoJSI/iMidNtcXkNIiekvMS+jShSvHjeOt/v05WuB87Leaz0e1qCiOHHI22Q7v3Uu9\nerFlrmL69L7Mnp3JnDmbvB27AqBDh0b07JnI3LmDSEk5l+efv9qzuzvZSFyuCl6/58ImDNb2wccb\n0tJo2rUrvpgYfsrIYPaQIdz6zjs0vOACcnfs4Lu5c1k6bhz3r18PxpC3ezcYw0cjR/LjunVc8sAD\nJA0aRGF+Pnu++46L7/qh1HX27XsBU6b8itWrdyIC69dn8+CD8z3/3ZRj8uSbePXVNaxatd2Tehdd\n1PiExOU//WkpCxZs9qS2LcG950o/+HjaNAavaVCLOjVoSrRSKkDaGJRSfrQxKKX8aGNQSvnRxqCU\n8hNGVz56z+aZg5woO2c8zjiqZzuqRg1LdX+2VLdq6RaDUsqPNgallB9tDEopP9oYlFJ+tDEopfxo\nY1BK+bH9seumIrJIRDaKyAYRecDm+pRS3rB9HUMh8JAxZp2I1ALWiMgCY0ym5fUqpYJgOyX6R2PM\nOnf6IPANkFCRGvPmDSY7+2FGj+7u6di8qlv7w3nU25lNjUdHF8+Lm/gidRYvpfbs95D4eGfeCxOo\ns3wldZatIPZhN306Joba8z6izuKl1Fm2guhrNGk/HIwd243lyweRnn4bF154hmd1bb2XbaiyKx9F\nJBHoAHxWkeelpr7HVVe18Cyhx+u6B+9OJTr5KqKaNgUg+uqrkRo1yL3yCqoPGkyNUY+Q94cxHP7b\nJI6NfAiAOstWcGTWWxzbto1Dv7mbY9u2IfXrE//JCpg7M+jfTVVeUlJDOnduzOWXTychoTZTp15H\ncrI3fxNb72UbqqQxuLsRbwMj3C2HUiwpMZ1IUWBlVtaBUtPYguVVXZOVhZQo5OtxBUc+nANAwZwP\niL3vfgCOlUifprAQjh6Fo0c55obNmsOHixOpVei0alWfNWt+BGDHjgM0bx6Pz1fNkzRqW+/litlK\nWKREi4gPpylMM8acJL62p+2hVIlq9
Rtg9jn32DH79yPxJ978o/qAgRzdvLm4IRSp+fwEDj//Z6B5\nVQ1VlSIjI4fhwzvi81WjTZsGJCTUpl69WHbtCv8Y+cAkEkhKdFVsMfwD+NoY82IVrCvkzN49xc1A\n6tQpbhIA0cnJxNwxhAM3Xn/Cc2qM+QMmdz/506ZiI+JOBS4zczczZnzNggW3sHnzPjZuzDmFmkLg\nbJ+u7AYMAnqJyFoR+VJEeleulrdj87yuW6jgk6VU73MtANHXXkfBJ05H9nXpQo3Hx3Hwlv7gpk8D\nxP52GNXOPY+8Rx/xaCAqWGlp6+jVayYTJ65mw4ZdntcP/e5E+cI+DDYt7Qa6dm1KTIyPjIyf6Nfv\nTU/WGWzdoo9d1/y/NHyXdkViYijcmMHBX/ej5osvEZWUhNm/n4ND78Ds20f8Wid92uxx0qcPPTyS\nY9u3U297FoWfrnSOORhD9V6fePL7qfKU/bHr+fNvwecTcnJ+ZtiwhezeXZGPUpe9rK33cnBOw5Ro\nmzSPIdJpHoNDU6KVUgHSxqCU8qONQSnlRxuDUsqPNgallJ9TOiXaJltnD8yZ9u63KT/pGY/jIu3s\nQdXSLQallB9tDEopP9oYlFJ+tDEopfxoY1BK+dHGoJTyY/V0pYjEAJ8A1d3He8aYMTbXqZQKntXG\nYIzJF5ErjTF5IhIFrBCRbsaYFTbXq5QKjvVdCWNMUfxNjLu+vSdZXCkVBqw3BhGpJiJrgR+BJcaY\nr22vMxC2orzbt2/EsmWpLF48lIUL76BZs7rlP6k8/zMWPlgO/06H8y+E2Fh49S14ZxFMfhtq1XaW\na36eM+/f6TD2ueDXG6RataqzfHkq6elDWL48lcsuOzvUQyqXzTGH+60QSqqyoBYRqQMsAB4xxiz9\nxc+qPKilcePaxVHe48cv86xuw4Y1OXToCHl5BfTufR4DBrRjyJDZAT/f75LoNkkw5mkYfD00ToBJ\nU2H+e1A9Bv72HNz4a2eZZ/4IU96BF5+Gdath/CSY+w4sW1RcKhSXRIsIxhgSE+syfXo/unWbXOVj\nqChbY7b1nguuboiDWowxucCHQKfSl1hS4rHV+nhsRXnv2nWIvDwn0zE//ygFBUFGwp/bCtavcaaz\ndsA5zaFFK/hqtTNv7edwWU9nukUr+Mpddt0X0O3K4NbtgaL/eOLjY9m9OzJCVW2NOTxuhbCVE/+t\nlc72WYkzgAJjzH4RqQGkAGV8SqinzaFUubi4aJ58shepqSdJzA9EZgbcNRx8PmjVBho3hR0/QK8+\nsHwxXHUd1K3vLPvNeujVG9LnQXIf2LM7+F8kSI0b1+bNN/vTuvUZ3HTTG6EeTkAiccyBSySQ+Hjb\nWwyNgcXuMYZVwPvGmHTL6wy5qKhqzJzZn/Hjl/HttznBFfsuE96ZAW8tgHsecBrF3yc4xxne/hga\nNYHsnc6yj/8PDLob3vzIaQpF80MoK+sAPXpMoVOnV0hLuyHUwwlIJI7Za7ZPV24ALra5jmDZ2LSb\nPr0vs2dnMmfOJm8KTk1zHq3bwO8ece5kNca9cfjgu2GHe/OaH3dCaj9n+qXX4cN3vFl/JUVHRxXv\nSh08eITq1aNCOp5AVMWYw/5WCJzGeQwlo7w7dmziWZR3374X0KdPSxo2rMnttyexfn02Dz44P7ii\nM+c7uxJ7cuDRYdDyfHj2ZadBfL0ennjYWe7m22DwPXDsGMyaBpu+Cf4XCkLbtmcyceI1FBYeIybG\nx4gR80I6nkDYHLOt95yNuhofH2Y0qEVVLY2PV0oFSBuDUsqPNgallB9tDEopP9oYlFJ+TtvTleHK\n5pkD09bOGQ/J0LMdVSO6ytZ00sYgIg+d7OfGmAneDkcpFQ7K22JwP89La6Az8L77/Q3A57YGpZQK\nrZM2BmPMEwAi8glwsTHmgPv94ziflFRKnYICPfh4FnCkxPdH3HlKqVNQoAcfpwKfi0hR4sivgNft\nDEkpFWoBNQZjzFMiMg8oyo660xiz1t6wlFKhVJHrGOKAXGPMi8B2EWke6BPd3McvReT98pdWSoVa\nQI1BRMYCjwCj3VnRwL8qsJ4RQFiEwCqlyhfoFsPNwI3AIQBjzE6On8o8KRFpClwLvFaZASqL7h8L\nU5fDa+nQsi30uQ0mL3K+f3cjvPCWs9yoCfCvlTBtBdz5cGjH7IqkxOWqqH3RRY2YP/82Pv54IOPH\nB5/1GejBxyPGGOPkJoCI1KzAOiYCDwPxFR2csqhVErTtDHdcDmclwFNT4e5kmDfT+fkfJsFqNw9w\nxiT4s3ut27QVsGAW7NgakmEXSU19rzgZORLq2qzt81XjmWeu5Oab3y4OIg5WoFsMb4lIGlBXRO4B\nPiaALQARuQ7INsasA8R9lGEJVZkSfdpLbAVfu4nS2TsgobmTEgUQFQWX94FFbpDt9i3Hn3e0EI4G\nmXztgfBIXA6P2l27JnDw4BHeeONXLFw4kG7dmp5k6S1AeolH6QI9K/G8iKQAuThXQf6vMWZhAE/t\nBtwoItcCNYDaIjLVGHOH/6I9AxmK8sp3GTDQTZ9u0cbZaqhTD/bscprC6qVQcOTE51w3ELZthh+3\nhWbMqlRNmtQmKelM2rd/jfj4GNLTB9GmTVoZS7dwH0UWl7pUQI1BRJ41xjwCLCxlXpncG9iOcZe/\nAhhZelNQVe77TPhwBqQtcP6x/2ej0xQArh8Mb79y4vKXJsONQ2DY9VU/VnVSe/b8zMqV28nLKyAv\nr4CcnDwaNKjB7t0/V7pmoLsSKaXM61PptarwMCsN7uoF0ybCdxuceTVrwwUXw2fH72BFuy4wbBw8\n1B8KvdmH9UokJC7brv3ZZztp1aoBIs4t9ho2jAuqKUD5n668H/gtcK6IrC/xo9rAyoqsyL0tXel3\nt1Ch8ff5EOWDfTnw1DBnXko/WPTuics98RoYA399z/n6/EjIXFf14y0hkhKXbdfOzc3npZe+YOnS\n2/H5qjFq1KLyn1SOk6ZEi0g8UA8YDzxa4kcHjDF7gl778fVoSnQV0DyGSGcjj+GxiqdEG2P2G2O2\nAi8Ce4wx/zXG/BcoFJFLLIxSKRUGAj3G8H/AwRLfH3TnKaVOQYE2BjEl9jmMMcfQWDilTlmBNoYt\nIvKAiES7jxE4V0oopU5BgTaG+4DLgB3AduAS4De2BqWUCi29d2XYsZkEbOcaBDPS0tmOF560UtcR\nXtdjhE7p964s7zqGUcaYP4vIS4BfBzHGPODhCJVSYaK8A4hF91FfbXsgSqnwUV5K9AfuV813VOo0\nUt6uxAeUsgtRxBhzo+cjUkqFXHm7Es+7X/sCjTge5zYAyLY1KKVUaJW3K7EUQEReMMZ0KvGjD0RE\njzsodYoK9OrFmiLSwhizBcBNiA4o3k1EtgL7gWNAgTGmS2UGqpSqOoE2ht8DS0RkC048WzPg3gCf\newzoaYzZW4nxKaVCINBot/ki0hI4352VaYzJD3AdQsXuX3GC9u0bMWnStRQWHqOw8Bh33/0+//
3v\nvsqWs17XtkOHRrFq1Q4Apk3bwD//ub6cZ4TA9c9B4mVQLQqWToDMuTDoDaheE6r54J374ceNcN0z\ncHZnQKBha0h/Cla+HNAqzj+/AS+/3BtjIDbWR8uW9TjzzL8EPfR58wZz8cWN+ctfVjF+/LKg6xWp\nVas68+cPJj//KDExUYwatZCVK72LyGvZsgEZGb+lZ89/8umnwdcNNNotDngIaGaMuUdEWopIa2PM\nnACeboCFInIUeMUY82pFBrhz5wGuuWYaeXkF9O59HuPGXcmQIbPLf2KI6tq2fXsuycnTQz2MsjVs\nDQkXwaRuTiN4aB3E1YMfPoOPn4QWPSD5MZg+AD4sEfHx0DrY8O+AV5OZuZtevZzXoX//87nyykRP\nhm8ryfngwSN07z4FYwyJiXWZPr0f3bpN9qz+Y4/1YMmSrZ7VC3RXYgqwBujqfr8DmAUE0hi6GWOy\nRKQhToP4xhiz3H+xJSWmE90H7Np1qHhufv5RCgq8SSi2Vde2Ro1qsWjRIHbv/pmRIz/mhx9yQz2k\nEx3eD1HVna2F2DqQtxsO5cAZrZyfx9WHg784odWkAxzIdh6VMHhwO559tkKBYmWymRJd9PGD+PhY\ndu/O86xu584JZGUdoLDwWABLbyWQFPZAG8O5xphbRWQAgDEmTySwl88Yk+V+3eXeFLcLUEpj6HnS\nOnFx0Tz5ZC9SU98LcMiBsVXXlsTESezde5iUlOZMnnw9KSkzQj2kEx340dk6eGQTRMfB2/dA5nzo\n8RCMXA+x8fC3y098TsfBsLZyW0H16sXSunV9Pv10hweDt6tx49q8+WZ/Wrc+g5tuesOzumPGdOfO\nO99lwoRrAlg6kaL/dB2lpy0Guu9/RERq4F7sJCLnAuUeYxCROBGp5U7XBK4GMgJcZ7GoqGrMnNmf\n8eOX8e23ORV9epXXtWnv3sMALFz4Pc2aheE9fJp1hfrNYfy58Ofz4drxkDwavnoLXkiCab+GviWO\nI4jAhTfB+sB3I0q69dY2zJqV6dHg7crKOkCPHlPo1OkV0tJu8KRmnz4tWb16J/v2HfakXpFAtxjG\nAvOBs0VkOs79IoYG8LyzgNnuHax8wHRjzIKKDnL69L7Mnp3JnDmbKvrUkNS1JS4ump9/LsAYaNfu\nTHbt8m5z1DM16sLP7gmoI4ec3YrqNWH39868QzlQo97x5VteBdu+cJathEGD2nLXXYHs0VaM17sT\n0dFRxburBw8eoXr1KE/qdujQiJ49E+nW7WzatTuL1q3P4NZbZ7F9e3C7mOU2BneXIRPn6sdLcc4y\njDDGlPtfrDHme6BDMAPs2/cC+vRpScOGNbn99iTWr8/mwQfnB1PSal2b2rQ5g7S0a8nNzccYw733\nzg31kPx9+xF0uA1++wn4qsOyFyHjXRg4Dbqkgi8W5pa4HcnFg+DLitwf+bjExHiqV49i0ybPcomt\nJTm3bXsmEydeQ2HhMWJifIwYMc+TuuPHLys+ezJ58k28+uqaoJsCBJjHICIbjDHtgl5b2fU1j6GY\n5jEU0TyGqlB6HkOgxxi+FJHOHo9IKRWmAj3GcAkw2L28+RDO7oQxxiTZGphSKnQCbQyBnAdRSp0i\nystjiMUJgj0P2ABMNsYUVsXAlFKhU94xhteBTjhNoQ/wgvURKaVCrrx7VxafjRARH/C5MeZizweh\nZyVUKcwNds52AMgH+n5zVO6sRPE5Hd2FUOr0Ud7Bx/YiUnS1hAA13O+Lzkp4+xE0pVRYKC/azZvr\nNpVSEaXSASpKqVOXNgallB9tDEopP9Ybg4jEi8gsEflGRDaKyCW216mUCk6gl0QH40VgrjHm1+61\nEHFVsE6lVBCsNgYRqQN0N8YMheJrIcIspFAp9Uu2txiaAzkiMgVoj3PX7BHGmJ8DLRCpMe+RxOZr\n7Gkc+4Cx0CEFCvLh1RFQqx7c/jQUFkDhEZh4O+zfBXdNgNaXgjHw2bvwznOe/C7hykbkfUBBLZUu\nLtIRWAV0NcasFpG/APuNMWN/sVyZl0Q3bFiTQ4eOFMe8DxjQLiJi3iOJzde4cePaxXHsFX3TnnBJ\ndGKS0wT+dD00SIDfT4X/vRqOueneyUOh8Xnwr8egUQv4cYsz/9kVMGEQZG89ofapdEl0MK9xWZdE\n295i2A5sM8YU3efybeCR0hddUmI6Edvx8eo4m6+xZ3HsCa1g8xpnevcOOKv5icGMcXXgoBvxVtQU\nAI4VwtFT+z1Tsdd4K17Gx1eKMSZbRLaJSCtjzCYgGfi69KV7nrRWpMW8R6Kwfo3/mwHXD4coH5zd\nBuonOLsS53WCgU9Azbow8hchY1cMhKzNkOPdHZ8iXyKBxMdXxVmJB4DpIhINbAHurGiBSIx5jzRh\n/xpvz4SlM+CJBfDjZti20TmesGae87isH9z5PEy6x1m+fTL0GuLseqgKs94YjDFfAUHlRUZazHsk\nsv0ae7I7MT/NeZzdBvo9Ar5o58AjQN5+qB7rTLfqAgPHweO9j//8NOBl5H1VbDEEJRJj3iONzdfY\n0zj2x+c7uxK5OfD3YXDl7dDzdjDH4GghvHyfs9zvXgMM/OE95+s/RsKWdV78OmHJRuS91bMSAQ9C\ng1pUKTSopSoEFx+vlDqNaGNQSvnRxqCU8qONQSnlRxuDUspP2J+uVKcvm2cOzJmWbsT7k82zHTZv\neHwi3WJQSvnRxqCU8qONQSnlRxuDUsqPNgallB9tDEopP1Ybg4i0EpG1IvKl+3W/iDxgc51KqeDZ\nTnDaBFwEICLVcKLeNLBRqTBXlbsSVwGbjTEVytmaN28w2dkPM3p0d08HY6uuzdo2x3zaemMebMyG\nEaOPz3vqRXh3KUx9D+rEO/Pi68K092H2EnjyL8eXbdsB3l/mPG6546Srsvn3O3RoFOnpg0hPH8TQ\noUlB16vKKx9vBd6o6JNSU98rTsD1kq26NmvbHPNp68FU6HEVNGnqfN/zaoitAb+6AvoPhuGPwFNj\n4Hej4N2Z8O8ZMPE1uCIFli6Ep1+C+wdCdhbMXUWdqQvIzc0vdVU2/37bt+eSnDzds3pVssXg5j3e\nCMwqe6klJR5bi+d6ljL8C7bq2qxtc8ynreysEzPRLrsCFs5xphd8AJf2cKa7XgELSsy/7AqIjoYa\ncbBjGxQWwqpP6NIlocxV2fz7NWpUi0WLBjFrVl/OOedkjWcLkF7iUbqq2mLoA6wxxuwqe5GeVTQU\npU6iXgPYt9eZzt0P8fWc6br14YB7E7X9+5zv6zWA3BI35sndT/36Tat2vK7ExEns3XuYlJTmTJ58\nPSkpM8pYsoX7KLK41KWq6hjDACqxG6FUldu3xzmeAFC7Dux3m8T+vVCrtjNdJ95Zbt+e443Dnb9n\nT8A3WfPU3r2HAVi48HuaNYsPul5V3O06DufA4zvB1fFmPFVV12Zt3Z2woOhFXbkUkq91plOug0/d\n+y6sXOJ8D87PVy6FI0fg0EFonAA+H3Tpxuef7wh4V
V6Ji4surtmu3Zns2pUXdM2qiI/PAxpW9vk2\nEnBt1rVZ2+aYT1vPp0GnrlA9BpI6Qmo/SLneOStxYD/8zj3T8LfnYNJUuOM++Ga9c+AR4I8PQtpM\nZ/offyM3t3mZq7L192vT5gzS0q4lNzcfYwz33js36JqaEq1OS5rHUOQxTYlWSgVGG4NSyo82BqWU\nH20MSik/2hiUUn40JVqFsRrWKts6e2A6Wbzf5uonrdX+Jd1iUEr50caglPKjjUEp5Ucbg1LKjzYG\npZQfbQxKKT9V8bHr0SKyUUTWi8h0Ealue51KqeDYjo9vBtwDXGSMScK5buI2m+tUSgXP9hZDLnAE\nqCkiPiAO2FnRIi1bNiA//4907Xq2ZwNr374Ry5alsnjxUBYuvINmzep6VhvsjNlWXduvhS1jx3Zj\n+fJBpKffxoUXnuFJTc+TnO8ZC68th5fT4dwLnXl9BsPfFsLLH0PKrScuP/af8NJHFV5NRKVEG2P2\nisgLwA9AHrDAGPNxRes89lgPlizZ6unYdu48wDXXTCMvr4Devc9j3LgrGTLEu1te2Bizrbq2Xwsb\nkpIa0rlzYy6/fDoJCbWZOvU6kpNnBl3X0yTnlknQpjPcfTmcmQCPT4XnfgddroJhKf7Ln9sWalUu\nli2iUqIC38bxAAALk0lEQVRFpAXwe6AZ0ASoJSIDS196CaWlRHfunEBW1gG2b8/1dGy7dh0iL68A\ngPz8oxQUHPWstq0xR+JrYUurVvVZs+ZHAHbsOEDz5vH4fMG/nT1Ncj6nFWSucaZ/2gFNmkOvfnA4\nz9kqePZtaNjk+PJ3/RGmPFWpVXmdEm17V6ITsMIYs8cYcxQn9/Gy0hftWeKRWDx3zJjuPPPMcms5\nh3Fx0Tz5ZC+ee26lZzVtjTkSXwtbMjJy6NnzHHy+aiQlNSQhoTb16sWGelgn2pwBF/eEKJ+z9XBW\nUzijCcTXh+HXwPv/gBHPO8te3AN++Bb2/FSpVSUmTqJXr+m88spaJk++/iRLtgCSSzxKZ7sxfAtc\nKiKxIiLuSL4J9Ml9+rRk9eqd7Nt32MrgoqKqMXNmf8aPX8a33+Z4UtPWmCPxtbApM3M3M2Z8zYIF\ntzB8eEc2bszxJATVU1sz4aMZMGkB3DLcaRS5e+BT9xjCqo+c3QeAIY/CtOecpNhKdP6ISok2xnwF\nTAXWAF8BArwS6PM7dGhEz56JzJ07iJSUc3n++as9vYvP9Ol9mT07kzlzNnlW09aYI/G1sC0tbR29\nes1k4sTVbNhwkluWVIJnW2XvpMH9veCNifCfDbBmiXPcAeCCTrB9M9SoCfXPgqdmOgcfW3WAoY8G\nvAobKdEREwY7efJNvPrqGlat2u7JOvv2vYApU37F6tU7EYH167N58MH5ntQu4vWYbdWtiteick7+\nsev582/B5xNycn5m2LCF7N5dkXs6lL5sySTnjIyfKpzk7Pex67/Od6Ll9+XAs8Ng/2548AVofZHT\nfZ7+Dfzw3fHlG50Df3jV2dX4hbI+dt2pU+MTUqIfeGABGRmBNsrSw2AjpjGo05G9PIayGkOwIi+P\nQVOilVIB0saglPKjjUEp5Ucbg1LKjzYGpZQfbQxKKT96uvK0YuOmqAAFluraZOtUqJ3ToACmpfen\nQuU79HSlUiow2hiUUn60MSil/GhjUEr50caglPJTFSnRI0Rkg/t4wPb6lFLBsx3tdiFwF06SUwfg\nejfuTSkVxmxvMVwAfGaMyXej3T4B+lpep1IqSLYbQwbQXUTqiUgccC3gbZ76acTzaPMSLrqoEfPn\n38bHHw9k/PgrPa8fSWzE0nse0T98LLyxHF5Ph5YXQs1akPYBTF0EMz5x5gGMngBvroSZK+DuhwMu\nbzs+PlNEngUWAgeBtUD4RxCHKU+jzUvw+arxzDNXcvPNbxenRZ+ubMXSexrRf34StOsMAy6HsxLg\nz1Nh3lvw1Wfw8pPQuQf89jH4/QD41yQY/5DzvJkrYP4s2L613FVYbQwAxpgpwBQAEXkK2Fb6kktK\nTCdSMilaOTyNNi+ha9cEDh48whtv/Iq4uGgef/wTVqzwNo4uUpQVS19YeCyourt2HSqeDjqiP7EV\nZLix9Nk7oGlz2Lcbmrdy5tWtDznZzvS2Lcefd7SQTw8e5aPd5a/CemMQkYbGmF0icg5wM3Bp6Uv2\ntD0UVYYmTWqTlHQm7du/Rnx8DOnpg2jTJi3UwwqJjIwchg/viM9XjTZtGhTH0nuVQF0U0Z+a+l7l\ni3yXAbcPd7Ikz2vjbDWs/RSG/h4+WA+1452tiZJuGAg/bKbrvm10bXB89hN7Sl+F9cYA/FtE6uN8\n0ua3xhhv75aigrZnz8+sXLmdvLwC8vIKyMnJo0GDGhUMVz01lIyl37x5n6ex9J5F9G/OhA9mwD8W\nwLbN8J+NcPMdzu7E6y9CUhd4/GW49wZn+cuS4eYhcO/J7jdxIuvXMRhjehhj2hpjLjLGLLG9vtOB\n17sTn322k1atGiACtWpVp2HDuNOyKRSxFUvvaUT/zDS4oxdMmQibNji3ttvn7iPszYE69ZzppC7w\nwDgY3h8KAj9+VBVbDMojJaPNO3ZsUuFo87Lk5ubz0ktfsHTp7fh81Rg1apEndSPVL2PpvdC37wX0\n6dOShg1rcvvtScFH9E92Y+n35sATw6B6rHMQsl8qxMTCc6Oc5Z56DYyB/3vP+frMSPhmXbnlNY/h\ntKJ5DMdpHgNoHoNSqgIisDFs1brWa28pf5FK2Wqprs3amy3V3WqpLizx4FipNoaIrWuz9veW6m61\nVNdm7chrkks82JuJwMaglLJNG4NSyk8YnZVQSoVC2N7tWikVXnRXQinlRxuDUspPxDQGEektIpki\nsklEHvGw7mQRyRaR9V7VdOs2FZFFIrLRy7xLEYkRkc9EZK1b+2kv6paoX01EvhSR9z2uu1VEvnLH\n/bmHdeNFZJaIfOO+Hpd4VLeVO9Yv3a/7PfwbjnbHul5EpotIdY/qepevaowJ+wdOA/sP0Aznut51\nwPke1b4cJ49yvcdjbgR0cKdrAd96OOY492sUsAro5uG4fw/8C3jf49djC1DPwnvjn8Cd7rQPqGNh\nHdWAncDZHtRq5r4W1d3v3wTu8KDuhcB6IMZ9XywAWlS2XqRsMXQBvjPG/NcYUwDMBG7yorAxZjmw\n14tav6j7ozFmnTt9EPgGSPCodtG1bTE4b1pPxi8iTXHi917zot4vy+PxFqqI1AG6GycMCGNMobHz\nsf6rgM3GmDJChiokFzgC1BQRHxCH03SC5Wm+aqQ0hgROTH7ajkf/yKqCiCTibJV85lG9aiKyFvgR\nWGKM+dqLusBE4GHAxqkqAywUkS9E5B6PajYHckRkirvJ/4qI2Ph01K3AG14UMsbsBV4AfgB2APuM\nMR97UNrTfNVIaQwRS0RqAW8DI9wth6AZY44ZYy4CmgI9ROSKYGuKyHVAtruVI+7DS92MMRfjvGGH\nicjl5T0h
AD7gYuBvbu084FEP6hYTkWjgRmCWR/Va4OyuNQOaALVEZGCwdY0xmUBRvupcgsxXjZTG\nsAM4p8T3Td15Yc3dVHwbmGaMCSLLq3TuZvOHOPftCFY34EYR2YLzv+OVIjLVg7oAGGOy3K+7gNk4\nu4fB2g5sM8asdr9/G6dReKkPsMYdtxc6ASuMMXvcTf53gMu8KGyMmWKM6WSM6QnsAyqdCBMpjeEL\n4DwRaeYewb0N8PKouY3/IQH+AXxtjHnRq4IicoaIxLvTNYAUnIOxQTHGjDHGnGOMaYHz+i4yxtwR\nbF0AEYlzt5wQkZrA1TibvkExxmQD20TETUElGfBqt6rIADzajXB9C1wqIrEiIjhj/saLwiLS0P1a\nlK86o7K1IiLByRhzVER+h3OktRow2Rjj1Ys5AyeJtoGI/ACMLTqYFWTdbsAgYIN7PMAAY4wxQcT2\nANAYeN19U1XD2RpJD7KmbWcBs91L333AdGPMAo9qPwBMdzf5twB3elQXd1/9KuA3XtU0xnzlbomt\nwdnUXwu84lF5z/JV9ZJopZSfSNmVUEpVIW0MSik/2hiUUn60MSil/GhjUEr50caglPKjjUEBICK/\nEpFjJS4WKmu5ISLSKIj1XCEiH1T2+apqaGNQRW4D5uBc6XcyQwn+A2x68UyY08agii5TvgQYhtMg\niuY/4oaJrBWRp0WkH861/v9yP80YKyLfu1fbISIdRWSxO91ZRFaKyBoRWS4iLUPwq6lKiohLopV1\nNwEfGWO2ichPInIRzmXMNwCdjTH5IlLXGLNPRIYBI40xa6HUhO+i778BLjfGHBORZGA80L9qfh0V\nLG0MCpzdh4nu9CxgIM6HyqYYY/IBjDH73J//8gNnZX34rC4w1d1SKPqMhIoQ+sc6zYlIPaAX0Nb9\n3z8K5x/yLAL7xGkhx3dJY0vM/xPOJzT7ikgzYLF3o1a26TEG9WtgqjGmuTGmhTGmGc7NK3OBoUWJ\nSG4DwZ1fp8Tzvwc6utP9SsyP53hmhmefeFRVQxuDuhUnOKWkf+OE2b4PrBaRL4GR7s9eB/7uHnyM\nAcYBf3WTnwtL1Pgz8IyIrEHfZxFHP3atlPKjnVwp5Ucbg1LKjzYGpZQfbQxKKT/aGJRSfrQxKKX8\naGNQSvnRxqCU8vP/Jbv/zp0OVTQAAAAASUVORK5CYII=\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f22684b17d0>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "test_error, confusions = error_rate(test_prediction.eval(), test_labels)\n",
+ "print 'Test error: %.1f%%' % test_error\n",
+ "\n",
+ "plt.xlabel('Actual')\n",
+ "plt.ylabel('Predicted')\n",
+ "plt.grid(False)\n",
+ "plt.xticks(numpy.arange(NUM_LABELS))\n",
+ "plt.yticks(numpy.arange(NUM_LABELS))\n",
+ "plt.imshow(confusions, cmap=plt.cm.jet, interpolation='nearest');\n",
+ "\n",
+ "for i, cas in enumerate(confusions):\n",
+ " for j, count in enumerate(cas):\n",
+ " if count > 0:\n",
+ " xoff = .07 * len(str(count))\n",
+ " plt.text(j-xoff, i+.2, int(count), fontsize=9, color='white')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "yLnS4dGiMwI1"
+ },
+ "source": [
+ "We can see here that we're mostly accurate, with some errors you might expect, e.g., '9' is often confused as '4'.\n",
+ "\n",
+ "Let's do another sanity check to make sure this matches roughly the distribution of our test set, e.g., it seems like we have fewer '5' values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "cellView": "both",
+ "colab": {
+ "autoexec": {
+ "startup": false,
+ "wait_interval": 0
+ },
+ "output_extras": [
+ {
+ "item_id": 1
+ }
+ ]
+ },
+ "colab_type": "code",
+ "collapsed": false,
+ "executionInfo": {
+ "elapsed": 352,
+ "status": "ok",
+ "timestamp": 1446753006584,
+ "user": {
+ "color": "#1FA15D",
+ "displayName": "Michael Piatek",
+ "isAnonymous": false,
+ "isMe": true,
+ "permissionId": "00327059602783983041",
+ "photoUrl": "//lh6.googleusercontent.com/-wKJwK_OPl34/AAAAAAAAAAI/AAAAAAAAAlk/Rh3u6O2Z7ns/s50-c-k-no/photo.jpg",
+ "sessionId": "716a6ad5e180d821",
+ "userId": "106975671469698476657"
+ },
+ "user_tz": 480
+ },
+ "id": "x5KOv1AJMgzV",
+ "outputId": "2acdf737-bab6-408f-8b3c-05fa66d04fe6"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAEACAYAAABfxaZOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEqxJREFUeJzt3W+MZXV9x/H3B1dUVMjWlt2WRf4EQTC2SgzaUttpbVE0\nXWhNEDX1D8Y+gAZjE+MufcD6pEqTRm1aTIwWF4vSRTGskchCN5NGUwQVXMuuuK0Blq071EgxjYnu\n6rcP7tnudTqzM3Nnds5Zfu9XcrPn/u7vzv3M3ZnP/d1z7r2TqkKS1IYT+g4gSVo9lr4kNcTSl6SG\nWPqS1BBLX5IaYulLUkMWLP0kn0wyk2TX2NhfJ9mT5MEkn09y8thlm5Ps7S6/ZGz8wiS7knw3yUdW\n/luRJC1kMSv9m4DXzhrbAbykql4G7AU2AyS5ALgCOB+4FLgxSbrrfAx4V1WdC5ybZPbXlCQdYwuW\nflV9BXhy1tg9VfXz7uy9wIZueyNwa1UdqqpHGD0gXJRkPfD8qrq/m3czcPkK5JckLcFK7NO/Criz\n2z4N2Dd22f5u7DTg8bHxx7sxSdIqWlbpJ/lL4GBVfXaF8kiSjqE1k14xyTuA1wO/Pza8Hzh97PyG\nbmy+8fm+th8IJEkTqKoc7fLFrvTTnUZnktcB7wM2VtVPxuZtB65McmKSs4BzgPuq6gDwVJKLugO7\nbwPuWCD4oE7XX3997xnM9PTKZSYzrXSmxVhwpZ/kM8AU8IIkjwHXA9cBJwJ3dy/Oubeqrq6q3Um2\nAbuBg8DVVXV41X4N8Cng2cCdVfXlRSWUJK2YBUu/qt4yx/BNR5n/QeCDc4x/A3jpktJJklaU78hd\npKmpqb4j/D9mWrwh5jLT4phpcRabKUf2vgxHkhpiLkkasiTUCh3IlSQ9DVj6ktQQS1+SGmLpS1JD\nLH1JaoilL0kNsfTnsX79mSTp/bR+/Zl93xWSnkZ8nf78GYAh3Deh7/tC0vHB1+lLkn6BpS9JDbH0\nJakhlr4kNcTSl6SGWPqS1BBLX5IaYulLUkMsfUlqiKUvSQ2x9CWpIZa+JDXE0pekhlj6ktQQS1+S\nGmLpS1JDLH1JasiavgNoIc/q/opXf9atO4MDBx7pNYOklbHgSj/JJ5PMJNk1NrY2yY4kDye5K8kp\nY5dtTrI3yZ4kl4yNX5hkV5LvJvnIyn8rT1c/YfRnG/s7zcw8euy/TR13/DvSx6fF7N65CXjtrLFN\nwD1VdR6wE9gMkOQC4ArgfOBS4MYcWaZ+DHhXVZ0LnJtk9teUdBwZLQb6XZC4KFm6BUu/qr4CPDlr\n+DJga7e9Fbi8294I3FpVh6rqEWAvcFGS9cDzq+r+bt7NY9eRJK2SSffpn1pVMwBVdSDJqd34acC/\njs3b340dAh4fG3+8G9dxof/jCuCxBQ3X+vVnHjfPOFbqQG6t0NfRIB0+rtCvmZn+H3ikuRzZ1dW3\nhX9HJi39mSTrqmqm23XzRDe+Hzh9bN6Gbmy+8Xlt2bLl/7anpqaYmpqaMKokPV1Nd6fFS9XCj05J\nzgS+WFUv7c7fAPywqm5I8n5gbVVt6g7k3gK8ktHum7uBF1VVJbkXuBa4H/gS8LdV9eV5bq8Wk+tY\nGu3OGMojd985hpABIPT9c6EjhvQ70vfPxcDui6Mu9xdc6Sf5DDAFvCDJY8D1wIeA25JcBTzK6BU7\nVNXuJNuA3cBB4Oqx9r4G+BTwbODO+QpfknTsLGqlv9qS1Akn9Pu+sZ///BBDeeTuP8cQMsBovfCT\nXhN4MPmIga1u+00wrPviqCv9wZY+/LTHBF8A3sRQ/hP7zzGEDDCMHP0XzFAMrOj6TTCs+2J5u3f6\n88web3vAd4ukWYbxkuLjhe0m6Tg3hJcUHz8POn7KpiQ1xJW+dBw6nt4BqmGx9KXj0DDeAXr87NLQ\nEe7ekaSGWPqS1BBLX5IaYulLUkMsfUlqiKUvSQ2x9CWpIZa+JDXE0pekhlj6ktQQP4ZBWhI/xlfH\nN0tfWpIhfIwv+Lk3mpS7dySpIZa+JDXE0pekhlj6ktQQS1+SGmLpS1JDLH1JaoilL0kNsfQlqSGW\nviQ1xNKXpIYsq/STbE7yUJJdSW5JcmKStUl2JHk4yV1JTpk1f2+SPUkuWX58SdJSTFz6Sc4A3g28\nvKp+ndGHt70Z2ATcU1XnATuBzd38C4ArgPOBS4Eb48cVStKqWs5K/0fAT4HnJlkDPAfYD1wGbO3m\nbAUu77Y3ArdW1aGqegTYC1y0jNuXJC3RxKVfVU8CfwM8xqjsn6qqe4B1VTXTzTkAnNpd5TRg39iX\n2N+NSZJWycSfp5/kbOC9wBnAU8BtSd7K//+w8Qk/fHzL2PZUd5IkHTHdnRZvOX9E5RXAV6vqhwBJ\nvgD8FjCTZF1VzSRZDzzRzd8PnD52/Q3d2Dy2LCOaJLVgil9cEH9gwWssZ5/+w8Crkjy7OyD7GmA3\nsB14Rzfn7cAd3fZ24MruFT5nAecA9y3j9iVJSzTxSr+qvpXkZuAbwM+AB4CPA88HtiW5CniU0St2\nqKrdSbYxemA4CFxdVUP4u3OS1IwMsXeTVL9/h/R24I0M52+h9p1jCBlgGDmGkAGGkWMIGWAYOYaQ\nASBU1VFfCu87ciWpIZa+JDXE0pekhlj6ktQQS1+SGmLpS1JDLH1JaoilL0kNsfQlqSGWviQ1xNKX\npIZY+pLUEEtfkhpi6UtSQyx9SWqIpS9JDbH0Jakhlr4kNcTSl6SGWPqS1BBLX5IaYulLUkMsfUlq\niKUvSQ2x9CWpIZa+JDXE0pekhlj6ktSQZZV+klOS3JZkT5KHkrwyydokO5I8nOSuJKeMzd+cZG83\n/5Llx5ckLcVyV/ofBe6sqvOB3wC+A2wC7qmq84CdwGaAJBcAVwDnA5cCNybJMm9fkrQEE5d+kpOB\nV1fVTQBVdaiqngIuA7Z207YCl3fbG4Fbu3mPAHuBiya9fUnS0i1npX8W8IMkNyX5ZpKPJzkJWFdV\nMwBVdQA4tZt/GrBv7Pr7uzFJ0ipZs8zrXghcU1VfT/JhRrt2ata82ecXacvY9lR3kiQdMd2dFm85\npf84sK+qvt6d/zyj0p9Jsq6qZpKsB57oLt8PnD52/Q3d2Dy2LCOaJLVgil9cEH9gwWtMvHun24Wz\nL8m53dBrgIeA7cA7urG3A3d029uBK5OcmOQs4BzgvklvX5K0dMtZ6QNcC9yS5JnA94B3As8AtiW5\nCniU0St2qKrdSbYBu4GDwNVVNeGuH0nSJDLE3k1SEx8KWBG3A2+k3wyHhf5zDCEDDCPHEDLAMHIM\nIQMMI8cQMgCEqjrqS+F9R64kNcTSl6SGWPqS1BBLX5IaYulLUkMsfUlqiKUvSQ2x9CWpIZa+JDXE\n0pekhlj6ktQQS1+SGmLpS1JDLH1JaoilL0kNsfQlqSGWviQ1xNKXpIZY+pLUEEtfkhpi6UtSQyx9\nSWqIpS9JDbH0Jakhlr4kNcTSl6SGW
PqS1BBLX5IasuzST3JCkm8m2d6dX5tkR5KHk9yV5JSxuZuT\n7E2yJ8kly71tSdLSrMRK/z3A7rHzm4B7quo8YCewGSDJBcAVwPnApcCNSbICty9JWqRllX6SDcDr\ngU+MDV8GbO22twKXd9sbgVur6lBVPQLsBS5azu1LkpZmuSv9DwPvA2psbF1VzQBU1QHg1G78NGDf\n2Lz93ZgkaZWsmfSKSd4AzFTVg0mmjjK1jnLZUWwZ257qTpKkI6a70+JNXPrAxcDGJK8HngM8P8mn\ngQNJ1lXVTJL1wBPd/P3A6WPX39CNzWPLMqJJUgum+MUF8QcWvMbEu3eq6rqqemFVnQ1cCeysqj8F\nvgi8o5v2duCObns7cGWSE5OcBZwD3Dfp7UuSlm45K/35fAjYluQq4FFGr9ihqnYn2cbolT4Hgaur\nasJdP5KkSWSIvZukJj4UsCJuB95IvxkOC/3nGEIGGEaOIWSAYeQYQgYYRo4hZAAIVXXUl8L7jlxJ\naoilL0kNsfQlqSGWviQ1xNKXpIZY+pLUEEtfkhpi6UtSQyx9SWqIpS9JDbH0Jakhlr4kNcTSl6SG\nWPqS1BBLX5IaYulLUkMsfUlqiKUvSQ2x9CWpIZa+JDXE0pekhlj6ktQQS1+SGmLpS1JDLH1Jaoil\nL0kNsfQlqSGWviQ1ZOLST7Ihyc4kDyX5dpJru/G1SXYkeTjJXUlOGbvO5iR7k+xJcslKfAOSpMVb\nzkr/EPAXVfUS4DeBa5K8GNgE3FNV5wE7gc0ASS4ArgDOBy4FbkyS5YSXJC3NxKVfVQeq6sFu+3+A\nPcAG4DJgazdtK3B5t70RuLWqDlXVI8Be4KJJb1+StHQrsk8/yZnAy4B7gXVVNQOjBwbg1G7aacC+\nsavt78YkSatk2aWf5HnA54D3dCv+mjVl9nlJUk/WLOfKSdYwKvxPV9Ud3fBMknVVNZNkPfBEN74f\nOH3s6hu6sXlsGdue6k6SpCOmu9PipWryhXiSm4EfVNVfjI3dAPywqm5I8n5gbVVt6g7k3gK8ktFu\nnbuBF9UcAZJUv08QbgfeyDCepIT+cwwhAwwjxxAywDByDCEDDCPHEDIAhKo66gtkJl7pJ7kYeCvw\n7SQPMPqOrwNuALYluQp4lNErdqiq3Um2AbuBg8DVcxW+JOnYWdZK/1hxpT9uCCuIIWSAYeQYQgYY\nRo4hZIBh5BhCBljMSt935EpSQyx9SWqIpS9JDbH0Jakhlr4kNcTSl6SGWPqS1BBLX5IaYulLUkMs\nfUlqiKUvSQ2x9CWpIZa+JDXE0pekhlj6ktQQS1+SGmLpS1JDLH1JaoilL0kNsfQlqSGWviQ1xNKX\npIZY+pLUEEtfkhpi6UtSQyx9SWqIpS9JDbH0Jakhq176SV6X5DtJvpvk/at9+5LUslUt/SQnAH8H\nvBZ4CfDmJC9ezQyTm+47wBym+w4wh+m+A8xjuu8Ac5juO8AcpvsOMIfpvgPMYbrvAHOYXtSs1V7p\nXwTsrapHq+ogcCtw2SpnmNB03wHmMN13gDlM9x1gHtN9B5jDdN8B5jDdd4A5TPcdYA7TfQeYw/Si\nZq126Z8G7Bs7/3g3JklaBWv6DjCfk0/+o95u+9Ch7/PjH/d285J0zKSqVu/GklcBW6rqdd35TUBV\n1Q2z5q1eKEl6GqmqHO3y1S79ZwAPA68Bvg/cB7y5qvasWghJatiq7t6pqp8l+XNgB6PjCZ+08CVp\n9azqSl+S1K9BvSN3iG/cSvLJJDNJdvWd5bAkG5LsTPJQkm8nuXYAmZ6V5GtJHuhy/VXfmQ5LckKS\nbybZ3ncWgCSPJPlWd1/d13cegCSnJLktyZ7u/++VA8h0bncffbP796mB/Kxv7u6jXUluSXLiADK9\np+uChfugqgZxYvQA9O/AGcAzgQeBFw8g128DLwN29Z1lLNN64GXd9vMYHScZwn11UvfvM4B7gYv7\nztTleS/wj8D2vrN0eb4HrO07x6xMnwLe2W2vAU7uO9OsfCcA/wmc3nOOM7r/vxO78/8EvK3nTC8B\ndgHP6n73dgBnzzd/SCv9Qb5xq6q+AjzZd45xVXWgqh7stv8H2MMA3u9QVYdf6PosRr+kvd9vSTYA\nrwc+0XeWMWFAz7KTnAy8uqpuAqiqQ1X1o55jzfYHwH9U1b4FZx5bPwJ+Cjw3yRrgJEYPRn06H/ha\nVf2kqn4G/AvwJ/NNHswPHr5xayJJzmT0TORr/Sb5v90oDwAHgOmq2t13JuDDwPuAIR28KuDuJPcn\neXffYYCzgB8kuanblfLxJM/pO9QsbwI+23eIqnoS+BvgMWA/8N9VdU+/qfg34NVJ1iY5idEi5/T5\nJg+p9LVESZ4HfA54T7fi71VV/byqXg5sAH4nye/2mSfJG4CZ7llRutMQXFxVFzL65bwmyW/3nGcN\ncCHw912uHwOb+o10RJJnAhuB2waQ5WxGuwvPAH4NeF6St/SZqaq+A9wA3A3cCTwA/Gy++UMq/f3A\nC8fOb+jGNIfuqeXngE9X1R195xnX7Rr4EvCKnqNcDGxM8j1Gq8TfS3Jzz5moqu93//4X8AVGuzb7\n9Diwr6q+3p3/HKMHgaG4FPhGd3/17RXAV6vqh92ulNuB3+o5E1V1U1W9oqqmgP8Gvjvf3CGV/v3A\nOUnO6I6GXwkM4tUWDGuVeNg/ALur6qN9BwFI8stJTum2nwP8IaOD8b2pquuq6oVVdTajn6edVfW2\nPjMlOal7hkaS5wKXMHp63puqmgH2JTm3G3oNMIRdc4e9mQHs2uk8DLwqybOThNF91ft7jZL8Svfv\nC4E/Bj4z39zBfPZODfSNW0k+A0wBL0jyGHD94QNePWa6GHgr8O1uH3oB11XVl3uM9avA1u4X4QRG\nz0D+ucc8Q7UO+EL3USNrgFuqakfPmQCuBW7pdqV8D3hnz3mA0YMko4O4f9Z3FoCq+lb3bPEbjHah\nPAB8vN9UAHw+yS8BB4Grj3Yg3jdnSVJDhrR7R5J0jFn6ktQQS1+SGmLpS1JDLH1JaoilL0kNsfQl\nqSGWviQ15H8Bc9sQZEMpbW0AAAAASUVORK5CYII=\n",
+ "text/plain": [
+ "<matplotlib.figure.Figure at 0x7f2265a72d50>"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.xticks(numpy.arange(NUM_LABELS))\n",
+ "plt.hist(numpy.argmax(test_labels, 1));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "E6DzLSK5M1ju"
+ },
+ "source": [
+ "Indeed, we appear to have fewer 5 labels in the test set. So, on the whole, it seems like our model is learning and our early results are sensible.\n",
+ "\n",
+ "But, we've only done one round of training. We can greatly improve accuracy by training for longer. To try this out, just re-execute the training cell above."
+ ]
+ }
+ ],
+ "metadata": {
+ "colabVersion": "0.3.2",
+ "colab_default_view": {},
+ "colab_views": {},
+ "kernelspec": {
+ "display_name": "Python 2",
+ "language": "python",
+ "name": "python2"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/tensorflow/tools/docker/notebooks/LICENSE b/tensorflow/tools/docker/notebooks/LICENSE
new file mode 100644
index 0000000000..28711d7885
--- /dev/null
+++ b/tensorflow/tools/docker/notebooks/LICENSE
@@ -0,0 +1,13 @@
+Copyright 2015 The TensorFlow Authors. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
diff --git a/tensorflow/tools/docker/run_jupyter.sh b/tensorflow/tools/docker/run_jupyter.sh
new file mode 100755
index 0000000000..8b41a7589f
--- /dev/null
+++ b/tensorflow/tools/docker/run_jupyter.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+jupyter notebook "$@"
diff --git a/tensorflow/tools/docker/simple_console.py b/tensorflow/tools/docker/simple_console.py
new file mode 100644
index 0000000000..b2fdc739e7
--- /dev/null
+++ b/tensorflow/tools/docker/simple_console.py
@@ -0,0 +1,14 @@
+"""Start a simple interactive console with TensorFlow available."""
+
+import code
+import sys
+
+
+def main(_):
+ """Run an interactive console."""
+ code.interact()
+ return 0
+
+
+if __name__ == '__main__':
+ sys.exit(main(sys.argv))
diff --git a/tensorflow/tools/pip_package/BUILD b/tensorflow/tools/pip_package/BUILD
new file mode 100644
index 0000000000..1302553e8d
--- /dev/null
+++ b/tensorflow/tools/pip_package/BUILD
@@ -0,0 +1,27 @@
+# Description:
+# Tools for building the TensorFlow pip package.
+
+package(default_visibility = ["//visibility:private"])
+
+py_binary(
+ name = "simple_console",
+ srcs = ["simple_console.py"],
+ deps = ["//tensorflow:tensorflow_py"],
+)
+
+sh_binary(
+ name = "build_pip_package",
+ srcs = ["build_pip_package.sh"],
+ data = [
+ "MANIFEST.in",
+ "README",
+ "setup.py",
+ ":simple_console",
+ "//tensorflow:tensorflow_py",
+ "//tensorflow/models/image/cifar10:cifar10_eval",
+ "//tensorflow/models/image/cifar10:cifar10_train",
+ "//tensorflow/models/image/cifar10:cifar10_multi_gpu_train",
+ "//tensorflow/models/image/mnist:convolutional",
+ "//tensorflow/tensorboard",
+ ],
+)
diff --git a/tensorflow/tools/pip_package/MANIFEST.in b/tensorflow/tools/pip_package/MANIFEST.in
new file mode 100644
index 0000000000..a267dd1c76
--- /dev/null
+++ b/tensorflow/tools/pip_package/MANIFEST.in
@@ -0,0 +1,3 @@
+include README
+recursive-include * *.py
+recursive-include * *.so
diff --git a/tensorflow/tools/pip_package/README b/tensorflow/tools/pip_package/README
new file mode 100644
index 0000000000..72e19becbb
--- /dev/null
+++ b/tensorflow/tools/pip_package/README
@@ -0,0 +1 @@
+TensorFlow
diff --git a/tensorflow/tools/pip_package/build_pip_package.sh b/tensorflow/tools/pip_package/build_pip_package.sh
new file mode 100755
index 0000000000..71317d34a5
--- /dev/null
+++ b/tensorflow/tools/pip_package/build_pip_package.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+
+set -e
+
+function main() {
+ if [ $# -lt 1 ] ; then
+ echo "No destination dir provided"
+ exit 1
+ fi
+
+ DEST=$1
+ TMPDIR=$(mktemp -d -t tmp.XXXXXXXXXX)
+
+ echo `date` : "=== Using tmpdir: ${TMPDIR}"
+
+ if [ ! -d bazel-bin/tensorflow ]; then
+ echo "Could not find bazel-bin. Did you run from the root of the build tree?"
+ exit 1
+ fi
+ cp -R \
+ bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/* \
+ ${TMPDIR}
+
+ cp tensorflow/tools/pip_package/MANIFEST.in ${TMPDIR}
+ cp tensorflow/tools/pip_package/README ${TMPDIR}
+ cp tensorflow/tools/pip_package/setup.py ${TMPDIR}
+ pushd ${TMPDIR}
+ rm -f MANIFEST
+ echo `date` : "=== Building wheel"
+ python setup.py sdist bdist_wheel >/dev/null
+ mkdir -p ${DEST}
+ cp dist/* ${DEST}
+ popd
+ rm -rf ${TMPDIR}
+ echo `date` : "=== Output wheel file is in: ${DEST}"
+}
+
+main "$@"
diff --git a/tensorflow/tools/pip_package/setup.py b/tensorflow/tools/pip_package/setup.py
new file mode 100644
index 0000000000..e7f9ecdc71
--- /dev/null
+++ b/tensorflow/tools/pip_package/setup.py
@@ -0,0 +1,79 @@
+import fnmatch
+import os
+from setuptools import find_packages, setup, Extension
+from setuptools.dist import Distribution
+
+_VERSION = '0.5.0'
+
+REQUIRED_PACKAGES = [
+ 'numpy',
+ 'six >= 1.10.0',
+ 'virtualenvwrapper',
+]
+
+# pylint: disable=line-too-long
+CONSOLE_SCRIPTS = [
+ 'tensorboard = tensorflow.tensorboard.tensorboard:main',
+ 'tensorflow_model_cifar10_train = tensorflow.models.image.cifar10.cifar10_train:main',
+ 'tensorflow_model_cifar10_multi_gpu_train = tensorflow.models.image.cifar10.cifar10_multi_gpu_train:main',
+ 'tensorflow_model_cifar10_eval = tensorflow.models.image.cifar10.cifar10_eval:main',
+ 'tensorflow_model_mnist_convolutional = tensorflow.models.image.mnist.convolutional:main',
+]
+# pylint: enable=line-too-long
+
+TEST_PACKAGES = [
+ 'scipy >= 0.15.1',
+]
+
+class BinaryDistribution(Distribution):
+ def is_pure(self):
+ return False
+
+matches = []
+for root, dirnames, filenames in os.walk('external'):
+ for filename in fnmatch.filter(filenames, '*'):
+ matches.append(os.path.join(root, filename))
+
+matches = ['../' + x for x in matches if '.py' not in x]
+
+setup(
+ name='tensorflow',
+ version=_VERSION,
+ description='TensorFlow helps the tensors flow',
+ long_description='',
+ url='http://tensorflow.com/',
+ author='Google Inc.',
+ author_email='opensource@google.com',
+ # Contained modules and scripts.
+ packages=find_packages(),
+ entry_points={
+ 'console_scripts': CONSOLE_SCRIPTS,
+ },
+ install_requires=REQUIRED_PACKAGES,
+ tests_require=REQUIRED_PACKAGES + TEST_PACKAGES,
+ # Add in any packaged data.
+ include_package_data=True,
+ package_data={
+ 'tensorflow': ['python/_pywrap_tensorflow.so',
+ 'tensorboard/dist/index.html',
+ 'tensorboard/dist/tf-tensorboard.html',
+ 'tensorboard/lib/css/global.css',
+ ] + matches,
+ },
+ zip_safe=False,
+ distclass=BinaryDistribution,
+ # PyPI package information.
+ classifiers=[
+ 'Development Status :: 4 - Beta',
+ 'Intended Audience :: Developers',
+ 'Intended Audience :: Education',
+ 'Intended Audience :: Science/Research',
+ 'License :: OSI Approved :: Apache Software License',
+ 'Programming Language :: Python :: 2.7',
+ 'Topic :: Scientific/Engineering :: Mathematics',
+ 'Topic :: Software Development :: Libraries :: Python Modules',
+ 'Topic :: Software Development :: Libraries',
+ ],
+ license='Apache 2.0',
+ keywords='tensorflow tensor machine learning',
+ )
diff --git a/tensorflow/tools/pip_package/simple_console.py b/tensorflow/tools/pip_package/simple_console.py
new file mode 100644
index 0000000000..b2fdc739e7
--- /dev/null
+++ b/tensorflow/tools/pip_package/simple_console.py
@@ -0,0 +1,14 @@
+"""Start a simple interactive console with TensorFlow available."""
+
+import code
+import sys
+
+
+def main(_):
+ """Run an interactive console."""
+ code.interact()
+ return 0
+
+
+if __name__ == '__main__':
+ sys.exit(main(sys.argv))
diff --git a/tensorflow/tools/swig/swig.sh b/tensorflow/tools/swig/swig.sh
new file mode 100755
index 0000000000..96f0186098
--- /dev/null
+++ b/tensorflow/tools/swig/swig.sh
@@ -0,0 +1,2 @@
+#!/bin/bash
+swig "$@"
diff --git a/third_party/eigen3/BUILD b/third_party/eigen3/BUILD
new file mode 100644
index 0000000000..ac7eede6d9
--- /dev/null
+++ b/third_party/eigen3/BUILD
@@ -0,0 +1,10 @@
+licenses(["restricted"]) # MPL2, portions GPL v3, LGPL v3, BSD-like
+
+cc_library(
+ name = "eigen3",
+ hdrs = glob([
+ "**/*.h",
+ ]),
+ includes = [ "." ],
+ visibility = ["//visibility:public"],
+)
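A minimal C++ sketch of what a consumer of the cc_library above looks like; because the target sets includes = ["."], dependent code can include the Eigen headers without the third_party/eigen3 prefix (the dependent target itself is assumed, not part of this diff):

// hypothetical consumer of //third_party/eigen3:eigen3
#include "Eigen/Core"

int main() {
  Eigen::MatrixXf m = Eigen::MatrixXf::Identity(2, 2);
  return static_cast<int>(m.sum()) - 2;  // exits 0 when Eigen is wired up correctly
}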
diff --git a/third_party/eigen3/Eigen/Array b/third_party/eigen3/Eigen/Array
new file mode 100644
index 0000000000..3d004fb69e
--- /dev/null
+++ b/third_party/eigen3/Eigen/Array
@@ -0,0 +1,11 @@
+#ifndef EIGEN_ARRAY_MODULE_H
+#define EIGEN_ARRAY_MODULE_H
+
+// include Core first to handle Eigen2 support macros
+#include "Core"
+
+#ifndef EIGEN2_SUPPORT
+  #error The Eigen/Array header no longer exists in Eigen3. All that functionality has moved to Eigen/Core.
+#endif
+
+#endif // EIGEN_ARRAY_MODULE_H
diff --git a/third_party/eigen3/Eigen/Cholesky b/third_party/eigen3/Eigen/Cholesky
new file mode 100644
index 0000000000..7314d326c1
--- /dev/null
+++ b/third_party/eigen3/Eigen/Cholesky
@@ -0,0 +1,34 @@
+#ifndef EIGEN_CHOLESKY_MODULE_H
+#define EIGEN_CHOLESKY_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup Cholesky_Module Cholesky module
+ *
+ *
+ *
+ * This module provides two variants of the Cholesky decomposition for selfadjoint (hermitian) matrices.
+ * Those decompositions are also accessible via the following methods:
+ * - MatrixBase::llt()
+ * - MatrixBase::ldlt()
+ * - SelfAdjointView::llt()
+ * - SelfAdjointView::ldlt()
+ *
+ * \code
+ * #include <Eigen/Cholesky>
+ * \endcode
+ */
+
+#include "src/misc/Solve.h"
+#include "src/Cholesky/LLT.h"
+#include "src/Cholesky/LDLT.h"
+#ifdef EIGEN_USE_LAPACKE
+#include "src/Cholesky/LLT_MKL.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_CHOLESKY_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
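A minimal usage sketch of the llt() and ldlt() entry points documented above, assuming the headers are reachable as <Eigen/Dense>:

#include <iostream>
#include <Eigen/Dense>

int main() {
  // Build a symmetric positive definite matrix and a right-hand side.
  Eigen::Matrix3d M = Eigen::Matrix3d::Random();
  Eigen::Matrix3d A = M * M.transpose() + 3.0 * Eigen::Matrix3d::Identity();
  Eigen::Vector3d b(1.0, 2.0, 3.0);

  // LLT is the classic Cholesky; LDLT is more robust for semidefinite problems.
  Eigen::Vector3d x1 = A.llt().solve(b);
  Eigen::Vector3d x2 = A.ldlt().solve(b);
  std::cout << (A * x1 - b).norm() << " " << (A * x2 - b).norm() << "\n";
}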
diff --git a/third_party/eigen3/Eigen/CholmodSupport b/third_party/eigen3/Eigen/CholmodSupport
new file mode 100644
index 0000000000..745b884e74
--- /dev/null
+++ b/third_party/eigen3/Eigen/CholmodSupport
@@ -0,0 +1,45 @@
+#ifndef EIGEN_CHOLMODSUPPORT_MODULE_H
+#define EIGEN_CHOLMODSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+extern "C" {
+ #include <cholmod.h>
+}
+
+/** \ingroup Support_modules
+ * \defgroup CholmodSupport_Module CholmodSupport module
+ *
+ * This module provides an interface to the Cholmod library which is part of the <a href="http://www.cise.ufl.edu/research/sparse/SuiteSparse/">suitesparse</a> package.
+ * It provides the two following main factorization classes:
+ * - class CholmodSupernodalLLT: a supernodal LLT Cholesky factorization.
+ * - class CholmodDecomposition: a general L(D)LT Cholesky factorization with automatic or explicit runtime selection of the underlying factorization method (supernodal or simplicial).
+ *
+ * For the sake of completeness, this module also provides the following two classes:
+ * - class CholmodSimplicialLLT
+ * - class CholmodSimplicialLDLT
+ * Note that these classes do not bring any particular advantage compared to the built-in
+ * SimplicialLLT and SimplicialLDLT factorization classes.
+ *
+ * \code
+ * #include <Eigen/CholmodSupport>
+ * \endcode
+ *
+ * In order to use this module, the cholmod headers must be accessible from the include paths, and your binary must be linked to the cholmod library and its dependencies.
+ * The dependencies depend on how cholmod has been compiled.
+ * For a cmake based project, you can use our FindCholmod.cmake module to help you in this task.
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+
+#include "src/CholmodSupport/CholmodSupport.h"
+
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_CHOLMODSUPPORT_MODULE_H
+
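A hedged sketch of the CholmodSupernodalLLT class named above; it assumes CHOLMOD is installed and the binary is linked against it, as the module comment requires:

#include <Eigen/SparseCore>
#include <Eigen/CholmodSupport>

// Solve A * x = b for a sparse symmetric positive definite A.
int SolveSpd(const Eigen::SparseMatrix<double>& A,
             const Eigen::VectorXd& b, Eigen::VectorXd* x) {
  Eigen::CholmodSupernodalLLT<Eigen::SparseMatrix<double> > solver;
  solver.compute(A);  // analyze the pattern, then factorize
  if (solver.info() != Eigen::Success) return -1;
  *x = solver.solve(b);
  return solver.info() == Eigen::Success ? 0 : -1;
}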
diff --git a/third_party/eigen3/Eigen/Core b/third_party/eigen3/Eigen/Core
new file mode 100644
index 0000000000..68f29bc693
--- /dev/null
+++ b/third_party/eigen3/Eigen/Core
@@ -0,0 +1,481 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2007-2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CORE_H
+#define EIGEN_CORE_H
+
+// first thing Eigen does: stop the compiler from committing suicide
+#include "src/Core/util/DisableStupidWarnings.h"
+
+// Begin Google-only: this allows us to link MPL2-only and non-MPL2-only
+// versions of Eigen without conflict. Do not use outside of :eigen3_restricted.
+#ifdef GOOGLE3_EIGEN_MPL2_ONLY_OVERRIDE
+#undef EIGEN_MPL2_ONLY
+#endif
+// End Google-only.
+
+#ifdef __CUDACC__
+ // Do not try asserts on CUDA!
+ #ifndef EIGEN_NO_DEBUG
+ #define EIGEN_NO_DEBUG
+ #endif
+
+ #ifdef EIGEN_INTERNAL_DEBUGGING
+ #undef EIGEN_INTERNAL_DEBUGGING
+ #endif
+
+ // Do not try to vectorize on CUDA!
+ #define EIGEN_DONT_VECTORIZE
+
+ // All functions callable from CUDA code must be qualified with __device__
+ #define EIGEN_DEVICE_FUNC __host__ __device__
+
+#else
+ #define EIGEN_DEVICE_FUNC
+
+#endif
+
+// CUDA before C++11 support does not have std::max or std::min
+#if defined(__CUDA_ARCH__) && (__cplusplus < 201103L)
+ #define EIGEN_USING_STD_MATH(FUNC) using ::FUNC;
+#else
+ #define EIGEN_USING_STD_MATH(FUNC) using std::FUNC;
+#endif
+
+// then include this file where all our macros are defined. It's really important to do it first because
+// it's where we do all the alignment settings (platform detection and honoring the user's will if he
+// defined e.g. EIGEN_DONT_ALIGN) so it needs to be done before we do anything with vectorization.
+#include "src/Core/util/Macros.h"
+
+// Disable the ipa-cp-clone optimization flag with MinGW 6.x or newer (enabled by default with -O3)
+// See http://eigen.tuxfamily.org/bz/show_bug.cgi?id=556 for details.
+#if EIGEN_COMP_MINGW && EIGEN_GNUC_AT_LEAST(4,6)
+ #pragma GCC optimize ("-fno-ipa-cp-clone")
+#endif
+
+#include <complex>
+
+// this include file manages BLAS and MKL related macros
+// and inclusion of their respective header files
+#include "src/Core/util/MKL_support.h"
+
+// if alignment is disabled, then disable vectorization. Note: EIGEN_ALIGN is the proper check, it takes into
+// account both the user's will (EIGEN_DONT_ALIGN) and our own platform checks
+#if !EIGEN_ALIGN
+ #ifndef EIGEN_DONT_VECTORIZE
+ #define EIGEN_DONT_VECTORIZE
+ #endif
+#endif
+
+#if EIGEN_COMP_MSVC
+ #include <malloc.h> // for _aligned_malloc -- need it regardless of whether vectorization is enabled
+ #if (EIGEN_COMP_MSVC >= 1500) // 2008 or later
+ // Remember that usage of defined() in a #define is undefined by the standard.
+ // a user reported that in 64-bit mode, MSVC doesn't care to define _M_IX86_FP.
+ #if (defined(_M_IX86_FP) && (_M_IX86_FP >= 2)) || EIGEN_ARCH_x86_64
+ #define EIGEN_SSE2_ON_MSVC_2008_OR_LATER
+ #endif
+ #endif
+#else
+ // Remember that usage of defined() in a #define is undefined by the standard
+ #if (defined __SSE2__) && ( (!EIGEN_COMP_GNUC) || EIGEN_COMP_ICC || EIGEN_GNUC_AT_LEAST(4,2) )
+ #define EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC
+ #endif
+#endif
+
+#ifndef EIGEN_DONT_VECTORIZE
+
+ #if defined (EIGEN_SSE2_ON_NON_MSVC_BUT_NOT_OLD_GCC) || defined(EIGEN_SSE2_ON_MSVC_2008_OR_LATER)
+
+ // Defines symbols for compile-time detection of which instructions are
+ // used.
+ // EIGEN_VECTORIZE_YY is defined if and only if the instruction set YY is used
+ #define EIGEN_VECTORIZE
+ #define EIGEN_VECTORIZE_SSE
+ #define EIGEN_VECTORIZE_SSE2
+
+ // Detect sse3/ssse3/sse4:
+ // gcc and icc defines __SSE3__, ...
+ // there is no way to know about this on msvc. You can define EIGEN_VECTORIZE_SSE* if you
+ // want to force the use of those instructions with msvc.
+ #ifdef __SSE3__
+ #define EIGEN_VECTORIZE_SSE3
+ #endif
+ #ifdef __SSSE3__
+ #define EIGEN_VECTORIZE_SSSE3
+ #endif
+ #ifdef __SSE4_1__
+ #define EIGEN_VECTORIZE_SSE4_1
+ #endif
+ #ifdef __SSE4_2__
+ #define EIGEN_VECTORIZE_SSE4_2
+ #endif
+ #ifdef __AVX__
+ #define EIGEN_VECTORIZE_AVX
+ #define EIGEN_VECTORIZE_SSE3
+ #define EIGEN_VECTORIZE_SSSE3
+ #define EIGEN_VECTORIZE_SSE4_1
+ #define EIGEN_VECTORIZE_SSE4_2
+ #endif
+ #ifdef __AVX2__
+ #define EIGEN_VECTORIZE_AVX2
+ #endif
+ #ifdef __FMA__
+ #define EIGEN_VECTORIZE_FMA
+ #endif
+ // include files
+
+ // This extern "C" works around a MINGW-w64 compilation issue
+ // https://sourceforge.net/tracker/index.php?func=detail&aid=3018394&group_id=202880&atid=983354
+ // In essence, intrin.h is included by windows.h and also declares intrinsics (just as emmintrin.h etc. below do).
+ // However, intrin.h uses an extern "C" declaration, and g++ thus complains of duplicate declarations
+ // with conflicting linkage. The linkage for intrinsics doesn't matter, but at that stage the compiler doesn't know;
+ // so, to avoid compile errors when windows.h is included after Eigen/Core, ensure intrinsics are extern "C" here too.
+ // notice that since these are C headers, the extern "C" is theoretically needed anyways.
+ extern "C" {
+ // In theory we should only include immintrin.h and not the other *mmintrin.h header files directly.
+    // Doing so triggers some issues with ICC. However, old gcc versions seem to not have this file, thus:
+ #if EIGEN_COMP_ICC >= 1110
+ #include <immintrin.h>
+ #else
+ #include <emmintrin.h>
+ #include <xmmintrin.h>
+ #ifdef EIGEN_VECTORIZE_SSE3
+ #include <pmmintrin.h>
+ #endif
+ #ifdef EIGEN_VECTORIZE_SSSE3
+ #include <tmmintrin.h>
+ #endif
+ #ifdef EIGEN_VECTORIZE_SSE4_1
+ #include <smmintrin.h>
+ #endif
+ #ifdef EIGEN_VECTORIZE_SSE4_2
+ #include <nmmintrin.h>
+ #endif
+ #ifdef EIGEN_VECTORIZE_AVX
+ #include <immintrin.h>
+ #endif
+ #endif
+ } // end extern "C"
+ #elif defined __VSX__
+ #define EIGEN_VECTORIZE
+ #define EIGEN_VECTORIZE_VSX
+ #include <altivec.h>
+ // We need to #undef all these ugly tokens defined in <altivec.h>
+ // => use __vector instead of vector
+ #undef bool
+ #undef vector
+ #undef pixel
+ #elif defined __ALTIVEC__
+ #define EIGEN_VECTORIZE
+ #define EIGEN_VECTORIZE_ALTIVEC
+ #include <altivec.h>
+ // We need to #undef all these ugly tokens defined in <altivec.h>
+ // => use __vector instead of vector
+ #undef bool
+ #undef vector
+ #undef pixel
+ #elif defined __ARM_NEON__
+ #define EIGEN_VECTORIZE
+ #define EIGEN_VECTORIZE_NEON
+ #include <arm_neon.h>
+ #endif
+#endif
+
+#include <float.h>
+#include <limits.h>
+#include <math.h>
+
+#if defined(__NVCC__)
+ #define EIGEN_VECTORIZE_CUDA
+ #include <vector_types.h>
+#elif defined(__GCUDACC__)
+ #define EIGEN_VECTORIZE_CUDA
+ #define __VECTOR_TYPES_H__
+ #include "third_party/gpus/cuda/include/vector_functions.h"
+#endif
+
+#if (defined _OPENMP) && (!defined EIGEN_DONT_PARALLELIZE)
+ #define EIGEN_HAS_OPENMP
+#endif
+
+#ifdef EIGEN_HAS_OPENMP
+#include <omp.h>
+#endif
+
+// MSVC for windows mobile does not have the errno.h file
+#if !(EIGEN_COMP_MSVC && EIGEN_OS_WINCE) && !EIGEN_COMP_ARM
+#define EIGEN_HAS_ERRNO
+#endif
+
+#ifdef EIGEN_HAS_ERRNO
+#include <cerrno>
+#endif
+#include <cstddef>
+#include <cstdlib>
+#include <cmath>
+#include <cassert>
+#include <functional>
+#include <iosfwd>
+#include <cstring>
+#include <string>
+#include <limits>
+#include <climits> // for CHAR_BIT
+// for min/max:
+#include <algorithm>
+
+// for outputting debug info
+#ifdef EIGEN_DEBUG_ASSIGN
+#include <iostream>
+#endif
+
+// required for __cpuid, needs to be included after cmath
+#if defined(_MSC_VER) && (defined(_M_IX86)||defined(_M_X64))
+ #include <intrin.h>
+#endif
+
+#if (defined(_CPPUNWIND) || defined(__EXCEPTIONS)) && !defined(__CUDA_ARCH__)
+ #define EIGEN_EXCEPTIONS
+#endif
+
+#ifdef EIGEN_EXCEPTIONS
+ #include <new>
+#endif
+
+/** \brief Namespace containing all symbols from the %Eigen library. */
+namespace Eigen {
+
+inline static const char *SimdInstructionSetsInUse(void) {
+#if defined(EIGEN_VECTORIZE_AVX)
+ return "AVX SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
+#elif defined(EIGEN_VECTORIZE_SSE4_2)
+ return "SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2";
+#elif defined(EIGEN_VECTORIZE_SSE4_1)
+ return "SSE, SSE2, SSE3, SSSE3, SSE4.1";
+#elif defined(EIGEN_VECTORIZE_SSSE3)
+ return "SSE, SSE2, SSE3, SSSE3";
+#elif defined(EIGEN_VECTORIZE_SSE3)
+ return "SSE, SSE2, SSE3";
+#elif defined(EIGEN_VECTORIZE_SSE2)
+ return "SSE, SSE2";
+#elif defined(EIGEN_VECTORIZE_ALTIVEC)
+ return "AltiVec";
+#elif defined(EIGEN_VECTORIZE_VSX)
+ return "VSX";
+#elif defined(EIGEN_VECTORIZE_NEON)
+ return "ARM NEON";
+#else
+ return "None";
+#endif
+}
+
+} // end namespace Eigen
+
+#define STAGE10_FULL_EIGEN2_API 10
+#define STAGE20_RESOLVE_API_CONFLICTS 20
+#define STAGE30_FULL_EIGEN3_API 30
+#define STAGE40_FULL_EIGEN3_STRICTNESS 40
+#define STAGE99_NO_EIGEN2_SUPPORT 99
+
+#if defined EIGEN2_SUPPORT_STAGE40_FULL_EIGEN3_STRICTNESS
+ #define EIGEN2_SUPPORT
+ #define EIGEN2_SUPPORT_STAGE STAGE40_FULL_EIGEN3_STRICTNESS
+#elif defined EIGEN2_SUPPORT_STAGE30_FULL_EIGEN3_API
+ #define EIGEN2_SUPPORT
+ #define EIGEN2_SUPPORT_STAGE STAGE30_FULL_EIGEN3_API
+#elif defined EIGEN2_SUPPORT_STAGE20_RESOLVE_API_CONFLICTS
+ #define EIGEN2_SUPPORT
+ #define EIGEN2_SUPPORT_STAGE STAGE20_RESOLVE_API_CONFLICTS
+#elif defined EIGEN2_SUPPORT_STAGE10_FULL_EIGEN2_API
+ #define EIGEN2_SUPPORT
+ #define EIGEN2_SUPPORT_STAGE STAGE10_FULL_EIGEN2_API
+#elif defined EIGEN2_SUPPORT
+ // default to stage 3, that's what it's always meant
+ #define EIGEN2_SUPPORT_STAGE30_FULL_EIGEN3_API
+ #define EIGEN2_SUPPORT_STAGE STAGE30_FULL_EIGEN3_API
+#else
+ #define EIGEN2_SUPPORT_STAGE STAGE99_NO_EIGEN2_SUPPORT
+#endif
+
+#ifdef EIGEN2_SUPPORT
+#undef minor
+#endif
+
+// we use size_t frequently and we'll never remember to prepend it with std:: every time just to
+// ensure QNX/QCC support
+using std::size_t;
+// gcc 4.6.0 wants std:: for ptrdiff_t
+using std::ptrdiff_t;
+
+/** \defgroup Core_Module Core module
+ * This is the main module of Eigen providing dense matrix and vector support
+ * (both fixed and dynamic size) with all the features corresponding to a BLAS library
+ * and much more...
+ *
+ * \code
+ * #include <Eigen/Core>
+ * \endcode
+ */
+
+#include "src/Core/util/Constants.h"
+#include "src/Core/util/ForwardDeclarations.h"
+#include "src/Core/util/Meta.h"
+#include "src/Core/util/StaticAssert.h"
+#include "src/Core/util/XprHelper.h"
+#include "src/Core/util/Memory.h"
+
+#include "src/Core/NumTraits.h"
+#include "src/Core/MathFunctions.h"
+#include "src/Core/GenericPacketMath.h"
+
+#if defined EIGEN_VECTORIZE_AVX
+ // Use AVX for floats and doubles, SSE for integers
+ #include "src/Core/arch/SSE/PacketMath.h"
+ #include "src/Core/arch/SSE/Complex.h"
+ #include "src/Core/arch/AVX/PacketMath.h"
+ #include "src/Core/arch/AVX/Complex.h"
+ #include "src/Core/arch/AVX/MathFunctions.h"
+ #include "src/Core/arch/AVX/TypeCasting.h"
+#elif defined EIGEN_VECTORIZE_SSE
+ #include "src/Core/arch/SSE/PacketMath.h"
+ #include "src/Core/arch/SSE/MathFunctions.h"
+ #include "src/Core/arch/SSE/Complex.h"
+ #include "src/Core/arch/SSE/TypeCasting.h"
+#elif defined(EIGEN_VECTORIZE_ALTIVEC) || defined(EIGEN_VECTORIZE_VSX)
+ #include "src/Core/arch/AltiVec/PacketMath.h"
+ #include "src/Core/arch/AltiVec/MathFunctions.h"
+ #include "src/Core/arch/AltiVec/Complex.h"
+#elif defined EIGEN_VECTORIZE_NEON
+ #include "src/Core/arch/NEON/PacketMath.h"
+ #include "src/Core/arch/NEON/MathFunctions.h"
+ #include "src/Core/arch/NEON/Complex.h"
+#endif
+
+#if defined EIGEN_VECTORIZE_CUDA
+ #include "src/Core/arch/CUDA/PacketMath.h"
+ #include "src/Core/arch/CUDA/MathFunctions.h"
+#endif
+
+#include "src/Core/arch/Default/Settings.h"
+
+#include "src/Core/functors/BinaryFunctors.h"
+#include "src/Core/functors/UnaryFunctors.h"
+#include "src/Core/functors/NullaryFunctors.h"
+#include "src/Core/functors/StlFunctors.h"
+
+#include "src/Core/DenseCoeffsBase.h"
+#include "src/Core/DenseBase.h"
+#include "src/Core/MatrixBase.h"
+#include "src/Core/EigenBase.h"
+
+#ifdef EIGEN_ENABLE_EVALUATORS
+#include "src/Core/functors/AssignmentFunctors.h"
+#include "src/Core/Product.h"
+#include "src/Core/CoreEvaluators.h"
+#include "src/Core/AssignEvaluator.h"
+#include "src/Core/ProductEvaluators.h"
+#endif
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN // work around Doxygen bug triggered by Assign.h r814874
+ // at least confirmed with Doxygen 1.5.5 and 1.5.6
+ #include "src/Core/Assign.h"
+#endif
+
+#include "src/Core/ArrayBase.h"
+#include "src/Core/util/BlasUtil.h"
+#include "src/Core/util/MatrixMapper.h"
+#include "src/Core/DenseStorage.h"
+#include "src/Core/NestByValue.h"
+#include "src/Core/ForceAlignedAccess.h"
+#include "src/Core/ReturnByValue.h"
+#include "src/Core/NoAlias.h"
+#include "src/Core/PlainObjectBase.h"
+#include "src/Core/Matrix.h"
+#include "src/Core/Array.h"
+#include "src/Core/CwiseBinaryOp.h"
+#include "src/Core/CwiseUnaryOp.h"
+#include "src/Core/CwiseNullaryOp.h"
+#include "src/Core/CwiseUnaryView.h"
+#include "src/Core/SelfCwiseBinaryOp.h"
+#include "src/Core/Dot.h"
+#include "src/Core/StableNorm.h"
+#include "src/Core/MapBase.h"
+#include "src/Core/Stride.h"
+#include "src/Core/Map.h"
+#include "src/Core/Block.h"
+#include "src/Core/VectorBlock.h"
+#include "src/Core/Ref.h"
+#include "src/Core/Transpose.h"
+#include "src/Core/DiagonalMatrix.h"
+#include "src/Core/Diagonal.h"
+#include "src/Core/DiagonalProduct.h"
+#include "src/Core/PermutationMatrix.h"
+#include "src/Core/Transpositions.h"
+#include "src/Core/Redux.h"
+#include "src/Core/Visitor.h"
+#include "src/Core/Fuzzy.h"
+#include "src/Core/IO.h"
+#include "src/Core/Swap.h"
+#include "src/Core/CommaInitializer.h"
+#include "src/Core/Flagged.h"
+#include "src/Core/ProductBase.h"
+#include "src/Core/GeneralProduct.h"
+#include "src/Core/TriangularMatrix.h"
+#include "src/Core/SelfAdjointView.h"
+#include "src/Core/products/GeneralBlockPanelKernel.h"
+#include "src/Core/products/Parallelizer.h"
+#include "src/Core/products/CoeffBasedProduct.h"
+#include "src/Core/products/GeneralMatrixVector.h"
+#include "src/Core/products/GeneralMatrixMatrix.h"
+#include "src/Core/SolveTriangular.h"
+#include "src/Core/products/GeneralMatrixMatrixTriangular.h"
+#include "src/Core/products/SelfadjointMatrixVector.h"
+#include "src/Core/products/SelfadjointMatrixMatrix.h"
+#include "src/Core/products/SelfadjointProduct.h"
+#include "src/Core/products/SelfadjointRank2Update.h"
+#include "src/Core/products/TriangularMatrixVector.h"
+#include "src/Core/products/TriangularMatrixMatrix.h"
+#include "src/Core/products/TriangularSolverMatrix.h"
+#include "src/Core/products/TriangularSolverVector.h"
+#include "src/Core/BandMatrix.h"
+#include "src/Core/CoreIterators.h"
+
+#include "src/Core/BooleanRedux.h"
+#include "src/Core/Select.h"
+#include "src/Core/VectorwiseOp.h"
+#include "src/Core/Random.h"
+#include "src/Core/Replicate.h"
+#include "src/Core/Reverse.h"
+#include "src/Core/ArrayWrapper.h"
+
+#ifdef EIGEN_USE_BLAS
+#include "src/Core/products/GeneralMatrixMatrix_MKL.h"
+#include "src/Core/products/GeneralMatrixVector_MKL.h"
+#include "src/Core/products/GeneralMatrixMatrixTriangular_MKL.h"
+#include "src/Core/products/SelfadjointMatrixMatrix_MKL.h"
+#include "src/Core/products/SelfadjointMatrixVector_MKL.h"
+#include "src/Core/products/TriangularMatrixMatrix_MKL.h"
+#include "src/Core/products/TriangularMatrixVector_MKL.h"
+#include "src/Core/products/TriangularSolverMatrix_MKL.h"
+#endif // EIGEN_USE_BLAS
+
+#ifdef EIGEN_USE_MKL_VML
+#include "src/Core/Assign_MKL.h"
+#endif
+
+#include "src/Core/GlobalFunctions.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#ifdef EIGEN2_SUPPORT
+#include "Eigen2Support"
+#endif
+
+#endif // EIGEN_CORE_H
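The EIGEN_VECTORIZE_* detection above can be checked at run time through SimdInstructionSetsInUse(); a one-line sketch:

#include <iostream>
#include <Eigen/Core>

int main() {
  // Prints which of the SSE/AVX/AltiVec/VSX/NEON paths were compiled in.
  std::cout << "Eigen SIMD: " << Eigen::SimdInstructionSetsInUse() << "\n";
}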
diff --git a/third_party/eigen3/Eigen/Dense b/third_party/eigen3/Eigen/Dense
new file mode 100644
index 0000000000..5768910bd8
--- /dev/null
+++ b/third_party/eigen3/Eigen/Dense
@@ -0,0 +1,7 @@
+#include "Core"
+#include "LU"
+#include "Cholesky"
+#include "QR"
+#include "SVD"
+#include "Geometry"
+#include "Eigenvalues"
diff --git a/third_party/eigen3/Eigen/Eigen2Support b/third_party/eigen3/Eigen/Eigen2Support
new file mode 100644
index 0000000000..36156d29a9
--- /dev/null
+++ b/third_party/eigen3/Eigen/Eigen2Support
@@ -0,0 +1,82 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2SUPPORT_H
+#define EIGEN2SUPPORT_H
+
+#if (!defined(EIGEN2_SUPPORT)) || (!defined(EIGEN_CORE_H))
+#error Eigen2 support must be enabled by defining EIGEN2_SUPPORT before including any Eigen header
+#endif
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \ingroup Support_modules
+ * \defgroup Eigen2Support_Module Eigen2 support module
+ * This module provides a couple of deprecated functions improving the compatibility with Eigen2.
+ *
+ * To use it, define EIGEN2_SUPPORT before including any Eigen header
+ * \code
+ * #define EIGEN2_SUPPORT
+ * \endcode
+ *
+ */
+
+#include "src/Eigen2Support/Macros.h"
+#include "src/Eigen2Support/Memory.h"
+#include "src/Eigen2Support/Meta.h"
+#include "src/Eigen2Support/Lazy.h"
+#include "src/Eigen2Support/Cwise.h"
+#include "src/Eigen2Support/CwiseOperators.h"
+#include "src/Eigen2Support/TriangularSolver.h"
+#include "src/Eigen2Support/Block.h"
+#include "src/Eigen2Support/VectorBlock.h"
+#include "src/Eigen2Support/Minor.h"
+#include "src/Eigen2Support/MathFunctions.h"
+
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+// Eigen2 used to include iostream
+#include<iostream>
+
+#define EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
+using Eigen::Matrix##SizeSuffix##TypeSuffix; \
+using Eigen::Vector##SizeSuffix##TypeSuffix; \
+using Eigen::RowVector##SizeSuffix##TypeSuffix;
+
+#define EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(TypeSuffix) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 2) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 3) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 4) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, X) \
+
+#define EIGEN_USING_MATRIX_TYPEDEFS \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(i) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(f) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(d) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(cf) \
+EIGEN_USING_MATRIX_TYPEDEFS_FOR_TYPE(cd)
+
+#define USING_PART_OF_NAMESPACE_EIGEN \
+EIGEN_USING_MATRIX_TYPEDEFS \
+using Eigen::Matrix; \
+using Eigen::MatrixBase; \
+using Eigen::ei_random; \
+using Eigen::ei_real; \
+using Eigen::ei_imag; \
+using Eigen::ei_conj; \
+using Eigen::ei_abs; \
+using Eigen::ei_abs2; \
+using Eigen::ei_sqrt; \
+using Eigen::ei_exp; \
+using Eigen::ei_log; \
+using Eigen::ei_sin; \
+using Eigen::ei_cos;
+
+#endif // EIGEN2SUPPORT_H
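A sketch of how the compatibility switch is meant to be used: EIGEN2_SUPPORT has to be defined before any Eigen header, after which the Eigen2-era unqualified typedefs become available through the macro defined above.

#define EIGEN2_SUPPORT         // must come before any Eigen include
#include <Eigen/Core>

USING_PART_OF_NAMESPACE_EIGEN  // pulls Matrix, MatrixBase and the ei_* helpers in

int main() {
  Matrix3f m = Matrix3f::Identity();  // Eigen2-style unqualified typedef
  return m.isApprox(m.transpose()) ? 0 : 1;
}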
diff --git a/third_party/eigen3/Eigen/Eigenvalues b/third_party/eigen3/Eigen/Eigenvalues
new file mode 100644
index 0000000000..53c5a73a27
--- /dev/null
+++ b/third_party/eigen3/Eigen/Eigenvalues
@@ -0,0 +1,48 @@
+#ifndef EIGEN_EIGENVALUES_MODULE_H
+#define EIGEN_EIGENVALUES_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include "Cholesky"
+#include "Jacobi"
+#include "Householder"
+#include "LU"
+#include "Geometry"
+
+/** \defgroup Eigenvalues_Module Eigenvalues module
+ *
+ *
+ *
+ * This module mainly provides various eigenvalue solvers.
+ * This module also provides some MatrixBase methods, including:
+ * - MatrixBase::eigenvalues(),
+ * - MatrixBase::operatorNorm()
+ *
+ * \code
+ * #include <Eigen/Eigenvalues>
+ * \endcode
+ */
+
+#include "src/Eigenvalues/Tridiagonalization.h"
+#include "src/Eigenvalues/RealSchur.h"
+#include "src/Eigenvalues/EigenSolver.h"
+#include "src/Eigenvalues/SelfAdjointEigenSolver.h"
+#include "src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h"
+#include "src/Eigenvalues/HessenbergDecomposition.h"
+#include "src/Eigenvalues/ComplexSchur.h"
+#include "src/Eigenvalues/ComplexEigenSolver.h"
+#include "src/Eigenvalues/RealQZ.h"
+#include "src/Eigenvalues/GeneralizedEigenSolver.h"
+#include "src/Eigenvalues/MatrixBaseEigenvalues.h"
+#ifdef EIGEN_USE_LAPACKE
+#include "src/Eigenvalues/RealSchur_MKL.h"
+#include "src/Eigenvalues/ComplexSchur_MKL.h"
+#include "src/Eigenvalues/SelfAdjointEigenSolver_MKL.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_EIGENVALUES_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
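A small sketch of the self-adjoint solver listed above, assuming <Eigen/Dense>:

#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::Matrix2d A;
  A << 2, 1,
       1, 2;
  // Self-adjoint problems get the dedicated, more accurate solver.
  Eigen::SelfAdjointEigenSolver<Eigen::Matrix2d> es(A);
  std::cout << "eigenvalues: " << es.eigenvalues().transpose() << "\n";
  std::cout << "first eigenvector: " << es.eigenvectors().col(0).transpose() << "\n";
}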
diff --git a/third_party/eigen3/Eigen/Geometry b/third_party/eigen3/Eigen/Geometry
new file mode 100644
index 0000000000..f9bc6fc578
--- /dev/null
+++ b/third_party/eigen3/Eigen/Geometry
@@ -0,0 +1,65 @@
+#ifndef EIGEN_GEOMETRY_MODULE_H
+#define EIGEN_GEOMETRY_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include "SVD"
+#include "LU"
+#include <limits>
+
+#ifndef M_PI
+#define M_PI 3.14159265358979323846
+#endif
+
+/** \defgroup Geometry_Module Geometry module
+ *
+ *
+ *
+ * This module provides support for:
+ * - fixed-size homogeneous transformations
+ * - translation, scaling, 2D and 3D rotations
+ * - quaternions
+ * - \ref MatrixBase::cross() "cross product"
+ * - \ref MatrixBase::unitOrthogonal() "orthogonal vector generation"
+ * - some linear components: parametrized-lines and hyperplanes
+ *
+ * \code
+ * #include <Eigen/Geometry>
+ * \endcode
+ */
+
+#include "src/Geometry/OrthoMethods.h"
+#include "src/Geometry/EulerAngles.h"
+
+#if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+ #include "src/Geometry/Homogeneous.h"
+ #include "src/Geometry/RotationBase.h"
+ #include "src/Geometry/Rotation2D.h"
+ #include "src/Geometry/Quaternion.h"
+ #include "src/Geometry/AngleAxis.h"
+ #include "src/Geometry/Transform.h"
+ #include "src/Geometry/Translation.h"
+ #include "src/Geometry/Scaling.h"
+ #include "src/Geometry/Hyperplane.h"
+ #include "src/Geometry/ParametrizedLine.h"
+ #include "src/Geometry/AlignedBox.h"
+ #include "src/Geometry/Umeyama.h"
+
+ // Use the SSE optimized version whenever possible. At the moment the
+ // SSE version doesn't compile when AVX is enabled
+ #if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
+ #include "src/Geometry/arch/Geometry_SSE.h"
+ #endif
+#endif
+
+#ifdef EIGEN2_SUPPORT
+#include "src/Eigen2Support/Geometry/All.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_GEOMETRY_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
+
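A short sketch combining the pieces this module documents (AngleAxis, Quaternion, Translation, Transform); M_PI is provided by the header itself when missing:

#include <iostream>
#include <Eigen/Geometry>

int main() {
  // Rotate 90 degrees about Z, then translate by (1, 0, 0).
  Eigen::Quaterniond q(Eigen::AngleAxisd(M_PI / 2, Eigen::Vector3d::UnitZ()));
  Eigen::Isometry3d T = Eigen::Translation3d(1, 0, 0) * q;
  std::cout << (T * Eigen::Vector3d(1, 0, 0)).transpose() << "\n";  // ~ (1, 1, 0)
}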
diff --git a/third_party/eigen3/Eigen/Householder b/third_party/eigen3/Eigen/Householder
new file mode 100644
index 0000000000..6e348db5c4
--- /dev/null
+++ b/third_party/eigen3/Eigen/Householder
@@ -0,0 +1,23 @@
+#ifndef EIGEN_HOUSEHOLDER_MODULE_H
+#define EIGEN_HOUSEHOLDER_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup Householder_Module Householder module
+ * This module provides Householder transformations.
+ *
+ * \code
+ * #include <Eigen/Householder>
+ * \endcode
+ */
+
+#include "src/Householder/Householder.h"
+#include "src/Householder/HouseholderSequence.h"
+#include "src/Householder/BlockHouseholder.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_HOUSEHOLDER_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
diff --git a/third_party/eigen3/Eigen/Jacobi b/third_party/eigen3/Eigen/Jacobi
new file mode 100644
index 0000000000..ba8a4dc36a
--- /dev/null
+++ b/third_party/eigen3/Eigen/Jacobi
@@ -0,0 +1,26 @@
+#ifndef EIGEN_JACOBI_MODULE_H
+#define EIGEN_JACOBI_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup Jacobi_Module Jacobi module
+ * This module provides Jacobi and Givens rotations.
+ *
+ * \code
+ * #include <Eigen/Jacobi>
+ * \endcode
+ *
+ * In addition to listed classes, it defines the two following MatrixBase methods to apply a Jacobi or Givens rotation:
+ * - MatrixBase::applyOnTheLeft()
+ * - MatrixBase::applyOnTheRight().
+ */
+
+#include "src/Jacobi/Jacobi.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_JACOBI_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
+
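A sketch of a Givens rotation applied through MatrixBase::applyOnTheLeft(), mirroring what the module comment describes:

#include <cmath>
#include <Eigen/Dense>

int main() {
  Eigen::Vector2f v(3.f, 4.f);
  Eigen::JacobiRotation<float> G;
  G.makeGivens(v.x(), v.y());           // rotation that zeroes the second entry
  v.applyOnTheLeft(0, 1, G.adjoint());  // MatrixBase::applyOnTheLeft()
  return std::abs(v.y()) < 1e-5f ? 0 : 1;
}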
diff --git a/third_party/eigen3/Eigen/LU b/third_party/eigen3/Eigen/LU
new file mode 100644
index 0000000000..e5c3f32f78
--- /dev/null
+++ b/third_party/eigen3/Eigen/LU
@@ -0,0 +1,43 @@
+#ifndef EIGEN_LU_MODULE_H
+#define EIGEN_LU_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup LU_Module LU module
+ * This module includes %LU decomposition and related notions such as matrix inversion and determinant.
+ * This module defines the following MatrixBase methods:
+ * - MatrixBase::inverse()
+ * - MatrixBase::determinant()
+ *
+ * \code
+ * #include <Eigen/LU>
+ * \endcode
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/Kernel.h"
+#include "src/misc/Image.h"
+#include "src/LU/FullPivLU.h"
+#include "src/LU/PartialPivLU.h"
+#ifdef EIGEN_USE_LAPACKE
+#include "src/LU/PartialPivLU_MKL.h"
+#endif
+#include "src/LU/Determinant.h"
+#include "src/LU/Inverse.h"
+
+// Use the SSE optimized version whenever possible. At the moment the
+// SSE version doesn't compile when AVX is enabled
+#if defined EIGEN_VECTORIZE_SSE && !defined EIGEN_VECTORIZE_AVX
+ #include "src/LU/arch/Inverse_SSE.h"
+#endif
+
+#ifdef EIGEN2_SUPPORT
+ #include "src/Eigen2Support/LU.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_LU_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
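A sketch of the MatrixBase methods this module adds, plus the full-pivoting decomposition for rank queries (assuming <Eigen/Dense>):

#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::Matrix3d A;
  A << 1, 2, 3,
       4, 5, 6,
       7, 8, 10;
  std::cout << "det = " << A.determinant() << "\n";  // MatrixBase::determinant()
  std::cout << "inv =\n" << A.inverse() << "\n";     // MatrixBase::inverse()

  Eigen::FullPivLU<Eigen::Matrix3d> lu(A);           // also exposes rank/kernel
  std::cout << "rank = " << lu.rank() << "\n";
}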
diff --git a/third_party/eigen3/Eigen/LeastSquares b/third_party/eigen3/Eigen/LeastSquares
new file mode 100644
index 0000000000..35137c25db
--- /dev/null
+++ b/third_party/eigen3/Eigen/LeastSquares
@@ -0,0 +1,32 @@
+#ifndef EIGEN_REGRESSION_MODULE_H
+#define EIGEN_REGRESSION_MODULE_H
+
+#ifndef EIGEN2_SUPPORT
+#error LeastSquares is only available in Eigen2 support mode (define EIGEN2_SUPPORT)
+#endif
+
+// exclude from normal eigen3-only documentation
+#ifdef EIGEN2_SUPPORT
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include "Eigenvalues"
+#include "Geometry"
+
+/** \defgroup LeastSquares_Module LeastSquares module
+ * This module provides linear regression and related features.
+ *
+ * \code
+ * #include <Eigen/LeastSquares>
+ * \endcode
+ */
+
+#include "src/Eigen2Support/LeastSquares.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN2_SUPPORT
+
+#endif // EIGEN_REGRESSION_MODULE_H
diff --git a/third_party/eigen3/Eigen/OrderingMethods b/third_party/eigen3/Eigen/OrderingMethods
new file mode 100644
index 0000000000..7c0f1fffff
--- /dev/null
+++ b/third_party/eigen3/Eigen/OrderingMethods
@@ -0,0 +1,66 @@
+#ifndef EIGEN_ORDERINGMETHODS_MODULE_H
+#define EIGEN_ORDERINGMETHODS_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/**
+ * \defgroup OrderingMethods_Module OrderingMethods module
+ *
+ * This module is currently for internal use only
+ *
+ * It defines various built-in and external ordering methods for sparse matrices.
+ * They are typically used to reduce the number of elements during
+ * the sparse matrix decomposition (LLT, LU, QR).
+ * Precisely, in a preprocessing step, a permutation matrix P is computed using
+ * those ordering methods and applied to the columns of the matrix.
+ * Using for instance the sparse Cholesky decomposition, it is expected that
+ * the number of nonzero elements in LLT(A*P) will be much smaller than in LLT(A).
+ *
+ *
+ * Usage :
+ * \code
+ * #include <Eigen/OrderingMethods>
+ * \endcode
+ *
+ * A simple usage is as a template parameter in the sparse decomposition classes :
+ *
+ * \code
+ * SparseLU<MatrixType, COLAMDOrdering<int> > solver;
+ * \endcode
+ *
+ * \code
+ * SparseQR<MatrixType, COLAMDOrdering<int> > solver;
+ * \endcode
+ *
+ * It is possible as well to call directly a particular ordering method for your own purpose,
+ * \code
+ * AMDOrdering<int> ordering;
+ * PermutationMatrix<Dynamic, Dynamic, int> perm;
+ * SparseMatrix<double> A;
+ * //Fill the matrix ...
+ *
+ * ordering(A, perm); // Call AMD
+ * \endcode
+ *
+ * \note Some of these methods (like AMD or METIS), need the sparsity pattern
+ * of the input matrix to be symmetric. When the matrix is structurally unsymmetric,
+ * Eigen computes internally the pattern of \f$A^T*A\f$ before calling the method.
+ * If your matrix is already symmetric (at least in structure), you can avoid that
+ * by calling the method with a SelfAdjointView type.
+ *
+ * \code
+ * // Call the ordering on the pattern of the lower triangular matrix A
+ * ordering(A.selfadjointView<Lower>(), perm);
+ * \endcode
+ */
+
+#ifndef EIGEN_MPL2_ONLY
+#include "src/OrderingMethods/Amd.h"
+#endif
+
+#include "src/OrderingMethods/Ordering.h"
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_ORDERINGMETHODS_MODULE_H
diff --git a/third_party/eigen3/Eigen/PaStiXSupport b/third_party/eigen3/Eigen/PaStiXSupport
new file mode 100644
index 0000000000..7c616ee5ea
--- /dev/null
+++ b/third_party/eigen3/Eigen/PaStiXSupport
@@ -0,0 +1,46 @@
+#ifndef EIGEN_PASTIXSUPPORT_MODULE_H
+#define EIGEN_PASTIXSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include <complex.h>
+extern "C" {
+#include <pastix_nompi.h>
+#include <pastix.h>
+}
+
+#ifdef complex
+#undef complex
+#endif
+
+/** \ingroup Support_modules
+ * \defgroup PaStiXSupport_Module PaStiXSupport module
+ *
+ * This module provides an interface to the <a href="http://pastix.gforge.inria.fr/">PaSTiX</a> library.
+ * PaSTiX is a general \b supernodal, \b parallel and \b opensource sparse solver.
+ * It provides the following main factorization classes:
+ * - class PastixLLT : a supernodal, parallel LLt Cholesky factorization.
+ * - class PastixLDLT: a supernodal, parallel LDLt Cholesky factorization.
+ * - class PastixLU : a supernodal, parallel LU factorization (optimized for a symmetric pattern).
+ *
+ * \code
+ * #include <Eigen/PaStiXSupport>
+ * \endcode
+ *
+ * In order to use this module, the PaSTiX headers must be accessible from the include paths, and your binary must be linked to the PaSTiX library and its dependencies.
+ * The dependencies depend on how PaSTiX has been compiled.
+ * For a cmake based project, you can use our FindPaSTiX.cmake module to help you in this task.
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+
+#include "src/PaStiXSupport/PaStiXSupport.h"
+
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_PASTIXSUPPORT_MODULE_H
diff --git a/third_party/eigen3/Eigen/PardisoSupport b/third_party/eigen3/Eigen/PardisoSupport
new file mode 100644
index 0000000000..99330ce7a7
--- /dev/null
+++ b/third_party/eigen3/Eigen/PardisoSupport
@@ -0,0 +1,30 @@
+#ifndef EIGEN_PARDISOSUPPORT_MODULE_H
+#define EIGEN_PARDISOSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include <mkl_pardiso.h>
+
+#include <unsupported/Eigen/SparseExtra>
+
+/** \ingroup Support_modules
+ * \defgroup PardisoSupport_Module PardisoSupport module
+ *
+ * This module brings support for the Intel(R) MKL PARDISO direct sparse solvers.
+ *
+ * \code
+ * #include <Eigen/PardisoSupport>
+ * \endcode
+ *
+ * In order to use this module, the MKL headers must be accessible from the include paths, and your binary must be linked to the MKL library and its dependencies.
+ * See this \ref TopicUsingIntelMKL "page" for more information on MKL-Eigen integration.
+ *
+ */
+
+#include "src/PardisoSupport/PardisoSupport.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_PARDISOSUPPORT_MODULE_H
diff --git a/third_party/eigen3/Eigen/QR b/third_party/eigen3/Eigen/QR
new file mode 100644
index 0000000000..8c7c6162e7
--- /dev/null
+++ b/third_party/eigen3/Eigen/QR
@@ -0,0 +1,47 @@
+#ifndef EIGEN_QR_MODULE_H
+#define EIGEN_QR_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include "Cholesky"
+#include "Jacobi"
+#include "Householder"
+
+/** \defgroup QR_Module QR module
+ *
+ *
+ *
+ * This module provides various QR decompositions.
+ * This module also provides some MatrixBase methods, including:
+ * - MatrixBase::householderQr()
+ * - MatrixBase::colPivHouseholderQr()
+ * - MatrixBase::fullPivHouseholderQr()
+ *
+ * \code
+ * #include <Eigen/QR>
+ * \endcode
+ */
+
+#include "src/misc/Solve.h"
+#include "src/QR/HouseholderQR.h"
+#include "src/QR/FullPivHouseholderQR.h"
+#include "src/QR/ColPivHouseholderQR.h"
+#ifdef EIGEN_USE_LAPACKE
+#include "src/QR/HouseholderQR_MKL.h"
+#include "src/QR/ColPivHouseholderQR_MKL.h"
+#endif
+
+#ifdef EIGEN2_SUPPORT
+#include "src/Eigen2Support/QR.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#ifdef EIGEN2_SUPPORT
+#include "Eigenvalues"
+#endif
+
+#endif // EIGEN_QR_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
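A sketch of a least-squares solve via the colPivHouseholderQr() method listed above:

#include <iostream>
#include <Eigen/Dense>

int main() {
  // Overdetermined system: 6 equations, 3 unknowns.
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 3);
  Eigen::VectorXd b = Eigen::VectorXd::Random(6);
  Eigen::VectorXd x = A.colPivHouseholderQr().solve(b);
  std::cout << "residual: " << (A * x - b).norm() << "\n";
}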
diff --git a/third_party/eigen3/Eigen/QtAlignedMalloc b/third_party/eigen3/Eigen/QtAlignedMalloc
new file mode 100644
index 0000000000..6717e9bd01
--- /dev/null
+++ b/third_party/eigen3/Eigen/QtAlignedMalloc
@@ -0,0 +1,29 @@
+
+#ifndef EIGEN_QTMALLOC_MODULE_H
+#define EIGEN_QTMALLOC_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+void *qMalloc(size_t size)
+{
+ return Eigen::internal::aligned_malloc(size);
+}
+
+void qFree(void *ptr)
+{
+ Eigen::internal::aligned_free(ptr);
+}
+
+void *qRealloc(void *ptr, size_t size)
+{
+ void* newPtr = Eigen::internal::aligned_malloc(size);
+ memcpy(newPtr, ptr, size);
+ Eigen::internal::aligned_free(ptr);
+ return newPtr;
+}
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_QTMALLOC_MODULE_H
diff --git a/third_party/eigen3/Eigen/SPQRSupport b/third_party/eigen3/Eigen/SPQRSupport
new file mode 100644
index 0000000000..77016442ee
--- /dev/null
+++ b/third_party/eigen3/Eigen/SPQRSupport
@@ -0,0 +1,29 @@
+#ifndef EIGEN_SPQRSUPPORT_MODULE_H
+#define EIGEN_SPQRSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include "SuiteSparseQR.hpp"
+
+/** \ingroup Support_modules
+ * \defgroup SPQRSupport_Module SuiteSparseQR module
+ *
+ * This module provides an interface to the SPQR library, which is part of the <a href="http://www.cise.ufl.edu/research/sparse/SuiteSparse/">suitesparse</a> package.
+ *
+ * \code
+ * #include <Eigen/SPQRSupport>
+ * \endcode
+ *
+ * In order to use this module, the SPQR headers must be accessible from the include paths, and your binary must be linked to the SPQR library and its dependencies (Cholmod, AMD, COLAMD,...).
+ * For a cmake based project, you can use our FindSPQR.cmake and FindCholmod.cmake modules
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+#include "src/CholmodSupport/CholmodSupport.h"
+#include "src/SPQRSupport/SuiteSparseQRSupport.h"
+
+#endif
diff --git a/third_party/eigen3/Eigen/SVD b/third_party/eigen3/Eigen/SVD
new file mode 100644
index 0000000000..fd310017ad
--- /dev/null
+++ b/third_party/eigen3/Eigen/SVD
@@ -0,0 +1,37 @@
+#ifndef EIGEN_SVD_MODULE_H
+#define EIGEN_SVD_MODULE_H
+
+#include "QR"
+#include "Householder"
+#include "Jacobi"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup SVD_Module SVD module
+ *
+ *
+ *
+ * This module provides SVD decomposition for matrices (both real and complex).
+ * This decomposition is accessible via the following MatrixBase method:
+ * - MatrixBase::jacobiSvd()
+ *
+ * \code
+ * #include <Eigen/SVD>
+ * \endcode
+ */
+
+#include "src/misc/Solve.h"
+#include "src/SVD/JacobiSVD.h"
+#if defined(EIGEN_USE_LAPACKE) && !defined(EIGEN_USE_LAPACKE_STRICT)
+#include "src/SVD/JacobiSVD_MKL.h"
+#endif
+#include "src/SVD/UpperBidiagonalization.h"
+
+#ifdef EIGEN2_SUPPORT
+#include "src/Eigen2Support/SVD.h"
+#endif
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_SVD_MODULE_H
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
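A sketch of MatrixBase::jacobiSvd() / JacobiSVD for singular values and least-squares solving:

#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 3);
  Eigen::VectorXd b = Eigen::VectorXd::Random(4);
  // Thin U/V factors are enough for solving.
  Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
  std::cout << "singular values: " << svd.singularValues().transpose() << "\n";
  std::cout << "residual: " << (A * svd.solve(b) - b).norm() << "\n";
}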
diff --git a/third_party/eigen3/Eigen/SparseCore b/third_party/eigen3/Eigen/SparseCore
new file mode 100644
index 0000000000..9b5be5e15a
--- /dev/null
+++ b/third_party/eigen3/Eigen/SparseCore
@@ -0,0 +1,64 @@
+#ifndef EIGEN_SPARSECORE_MODULE_H
+#define EIGEN_SPARSECORE_MODULE_H
+
+#include "Core"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#include <vector>
+#include <map>
+#include <cstdlib>
+#include <cstring>
+#include <algorithm>
+
+/**
+ * \defgroup SparseCore_Module SparseCore module
+ *
+ * This module provides a sparse matrix representation, and basic associated matrix manipulations
+ * and operations.
+ *
+ * See the \ref TutorialSparse "Sparse tutorial"
+ *
+ * \code
+ * #include <Eigen/SparseCore>
+ * \endcode
+ *
+ * This module depends on: Core.
+ */
+
+namespace Eigen {
+
+/** The type used to identify a general sparse storage. */
+struct Sparse {};
+
+}
+
+#include "src/SparseCore/SparseUtil.h"
+#include "src/SparseCore/SparseMatrixBase.h"
+#include "src/SparseCore/CompressedStorage.h"
+#include "src/SparseCore/AmbiVector.h"
+#include "src/SparseCore/SparseMatrix.h"
+#include "src/SparseCore/MappedSparseMatrix.h"
+#include "src/SparseCore/SparseVector.h"
+#include "src/SparseCore/SparseBlock.h"
+#include "src/SparseCore/SparseTranspose.h"
+#include "src/SparseCore/SparseCwiseUnaryOp.h"
+#include "src/SparseCore/SparseCwiseBinaryOp.h"
+#include "src/SparseCore/SparseDot.h"
+#include "src/SparseCore/SparsePermutation.h"
+#include "src/SparseCore/SparseRedux.h"
+#include "src/SparseCore/SparseFuzzy.h"
+#include "src/SparseCore/ConservativeSparseSparseProduct.h"
+#include "src/SparseCore/SparseSparseProductWithPruning.h"
+#include "src/SparseCore/SparseProduct.h"
+#include "src/SparseCore/SparseDenseProduct.h"
+#include "src/SparseCore/SparseDiagonalProduct.h"
+#include "src/SparseCore/SparseTriangularView.h"
+#include "src/SparseCore/SparseSelfAdjointView.h"
+#include "src/SparseCore/TriangularSolver.h"
+#include "src/SparseCore/SparseView.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_SPARSECORE_MODULE_H
+
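A sketch of the usual assembly path for the SparseMatrix class provided here: collect (row, column, value) triplets, then build the compressed matrix in one go:

#include <iostream>
#include <vector>
#include <Eigen/SparseCore>

int main() {
  std::vector<Eigen::Triplet<double> > entries;
  entries.push_back(Eigen::Triplet<double>(0, 0, 4.0));
  entries.push_back(Eigen::Triplet<double>(0, 1, 1.0));
  entries.push_back(Eigen::Triplet<double>(1, 1, 2.0));

  Eigen::SparseMatrix<double> A(2, 2);
  A.setFromTriplets(entries.begin(), entries.end());
  std::cout << "nonzeros: " << A.nonZeros() << "\n";  // 3
}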
diff --git a/third_party/eigen3/Eigen/SparseQR b/third_party/eigen3/Eigen/SparseQR
new file mode 100644
index 0000000000..4ee42065ee
--- /dev/null
+++ b/third_party/eigen3/Eigen/SparseQR
@@ -0,0 +1,33 @@
+#ifndef EIGEN_SPARSEQR_MODULE_H
+#define EIGEN_SPARSEQR_MODULE_H
+
+#include "SparseCore"
+#include "OrderingMethods"
+#include "src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup SparseQR_Module SparseQR module
+ * \brief Provides QR decomposition for sparse matrices
+ *
+ * This module provides a simplicial version of the left-looking Sparse QR decomposition.
+ * The columns of the input matrix should be reordered to limit the fill-in during the
+ * decomposition. Built-in methods (COLAMD, AMD) or external methods (METIS) can be used to this end.
+ * See the \link OrderingMethods_Module OrderingMethods\endlink module for the list
+ * of built-in and external ordering methods.
+ *
+ * \code
+ * #include <Eigen/SparseQR>
+ * \endcode
+ *
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+
+#include "OrderingMethods"
+#include "src/SparseCore/SparseColEtree.h"
+#include "src/SparseQR/SparseQR.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif
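A sketch of the solver described above, using the built-in COLAMD ordering as a template parameter; the input matrix is assumed to be compressed (e.g. via makeCompressed()):

#include <Eigen/SparseCore>
#include <Eigen/SparseQR>

// Sparse least-squares solve of A * x = b.
int SparseLeastSquares(const Eigen::SparseMatrix<double>& A,
                       const Eigen::VectorXd& b, Eigen::VectorXd* x) {
  Eigen::SparseQR<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > qr;
  qr.compute(A);  // column reordering, then factorization
  if (qr.info() != Eigen::Success) return -1;
  *x = qr.solve(b);
  return qr.info() == Eigen::Success ? 0 : -1;
}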
diff --git a/third_party/eigen3/Eigen/StdDeque b/third_party/eigen3/Eigen/StdDeque
new file mode 100644
index 0000000000..be3a7f82be
--- /dev/null
+++ b/third_party/eigen3/Eigen/StdDeque
@@ -0,0 +1,27 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDDEQUE_MODULE_H
+#define EIGEN_STDDEQUE_MODULE_H
+
+#include "Core"
+#include <deque>
+
+#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
+
+#define EIGEN_DEFINE_STL_DEQUE_SPECIALIZATION(...)
+
+#else
+
+#include "src/StlSupport/StdDeque.h"
+
+#endif
+
+#endif // EIGEN_STDDEQUE_MODULE_H
diff --git a/third_party/eigen3/Eigen/StdList b/third_party/eigen3/Eigen/StdList
new file mode 100644
index 0000000000..07ba1297be
--- /dev/null
+++ b/third_party/eigen3/Eigen/StdList
@@ -0,0 +1,26 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDLIST_MODULE_H
+#define EIGEN_STDLIST_MODULE_H
+
+#include "Core"
+#include <list>
+
+#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
+
+#define EIGEN_DEFINE_STL_LIST_SPECIALIZATION(...)
+
+#else
+
+#include "src/StlSupport/StdList.h"
+
+#endif
+
+#endif // EIGEN_STDLIST_MODULE_H
diff --git a/third_party/eigen3/Eigen/StdVector b/third_party/eigen3/Eigen/StdVector
new file mode 100644
index 0000000000..fdfc377662
--- /dev/null
+++ b/third_party/eigen3/Eigen/StdVector
@@ -0,0 +1,27 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDVECTOR_MODULE_H
+#define EIGEN_STDVECTOR_MODULE_H
+
+#include "Core"
+#include <vector>
+
+#if EIGEN_COMP_MSVC && EIGEN_OS_WIN64 /* MSVC auto aligns in 64 bit builds */
+
+#define EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION(...)
+
+#else
+
+#include "src/StlSupport/StdVector.h"
+
+#endif
+
+#endif // EIGEN_STDVECTOR_MODULE_H
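For context, a short sketch of why the StdDeque/StdList/StdVector headers above exist: fixed-size vectorizable Eigen types stored in STL containers need 16-byte alignment, which either these specializations or Eigen::aligned_allocator provide. The loop and values below are illustrative only.

```cpp
#include <Eigen/Core>
#include <Eigen/StdVector>
#include <vector>

int main() {
  // Vector4f is fixed-size vectorizable; the aligned allocator (or the
  // std::vector specialization pulled in by Eigen/StdVector) keeps every
  // element 16-byte aligned so SSE loads remain valid.
  std::vector<Eigen::Vector4f, Eigen::aligned_allocator<Eigen::Vector4f> > pts;
  for (int i = 0; i < 8; ++i)
    pts.push_back(Eigen::Vector4f::Constant(static_cast<float>(i)));

  Eigen::Vector4f sum = Eigen::Vector4f::Zero();
  for (std::size_t i = 0; i < pts.size(); ++i)
    sum += pts[i];
  return sum(0) >= 0.0f ? 0 : 1;
}
```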
diff --git a/third_party/eigen3/Eigen/SuperLUSupport b/third_party/eigen3/Eigen/SuperLUSupport
new file mode 100644
index 0000000000..575e14fbc2
--- /dev/null
+++ b/third_party/eigen3/Eigen/SuperLUSupport
@@ -0,0 +1,59 @@
+#ifndef EIGEN_SUPERLUSUPPORT_MODULE_H
+#define EIGEN_SUPERLUSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+#ifdef EMPTY
+#define EIGEN_EMPTY_WAS_ALREADY_DEFINED
+#endif
+
+typedef int int_t;
+#include <slu_Cnames.h>
+#include <supermatrix.h>
+#include <slu_util.h>
+
+// slu_util.h defines a preprocessor token named EMPTY which is really polluting,
+// so we remove it in favor of a SUPERLU_EMPTY token.
+// If EMPTY was already defined then we don't undef it.
+
+#if defined(EIGEN_EMPTY_WAS_ALREADY_DEFINED)
+# undef EIGEN_EMPTY_WAS_ALREADY_DEFINED
+#elif defined(EMPTY)
+# undef EMPTY
+#endif
+
+#define SUPERLU_EMPTY (-1)
+
+namespace Eigen { struct SluMatrix; }
+
+/** \ingroup Support_modules
+ * \defgroup SuperLUSupport_Module SuperLUSupport module
+ *
+ * This module provides an interface to the <a href="http://crd-legacy.lbl.gov/~xiaoye/SuperLU/">SuperLU</a> library.
+ * It provides the following factorization class:
+ * - class SuperLU: a supernodal sequential LU factorization.
+ * - class SuperILU: a supernodal sequential incomplete LU factorization (to be used as a preconditioner for iterative methods).
+ *
+ * \warning When including this module, you have to use SUPERLU_EMPTY instead of EMPTY which is no longer defined because it is too polluting.
+ *
+ * \code
+ * #include <Eigen/SuperLUSupport>
+ * \endcode
+ *
+ * In order to use this module, the superlu headers must be accessible from the include paths, and your binary must be linked to the superlu library and its dependencies.
+ * The dependencies depend on how superlu has been compiled.
+ * For a cmake based project, you can use our FindSuperLU.cmake module to help you in this task.
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+
+#include "src/SuperLUSupport/SuperLUSupport.h"
+
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_SUPERLUSUPPORT_MODULE_H
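A minimal sketch of the SuperLU interface described in the module docs above. It assumes the SuperLU headers and library are available at compile and link time; the matrix values are made up for illustration.

```cpp
#include <Eigen/SparseCore>
#include <Eigen/SuperLUSupport>

int main() {
  typedef Eigen::SparseMatrix<double> SpMat;

  SpMat A(2, 2);
  A.insert(0, 0) = 4.0;
  A.insert(0, 1) = 1.0;
  A.insert(1, 1) = 2.0;
  A.makeCompressed();

  Eigen::VectorXd b(2);
  b << 1.0, 2.0;

  // Supernodal sequential LU factorization through SuperLU.
  Eigen::SuperLU<SpMat> lu(A);
  if (lu.info() != Eigen::Success) return 1;

  Eigen::VectorXd x = lu.solve(b);
  return (A * x - b).norm() < 1e-10 ? 0 : 1;
}
```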
diff --git a/third_party/eigen3/Eigen/UmfPackSupport b/third_party/eigen3/Eigen/UmfPackSupport
new file mode 100644
index 0000000000..984f64a841
--- /dev/null
+++ b/third_party/eigen3/Eigen/UmfPackSupport
@@ -0,0 +1,36 @@
+#ifndef EIGEN_UMFPACKSUPPORT_MODULE_H
+#define EIGEN_UMFPACKSUPPORT_MODULE_H
+
+#include "SparseCore"
+
+#include "src/Core/util/DisableStupidWarnings.h"
+
+extern "C" {
+#include <umfpack.h>
+}
+
+/** \ingroup Support_modules
+ * \defgroup UmfPackSupport_Module UmfPackSupport module
+ *
+ * This module provides an interface to the UmfPack library which is part of the <a href="http://www.cise.ufl.edu/research/sparse/SuiteSparse/">suitesparse</a> package.
+ * It provides the following factorization class:
+ * - class UmfPackLU: a multifrontal sequential LU factorization.
+ *
+ * \code
+ * #include <Eigen/UmfPackSupport>
+ * \endcode
+ *
+ * In order to use this module, the umfpack headers must be accessible from the include paths, and your binary must be linked to the umfpack library and its dependencies.
+ * The dependencies depend on how umfpack has been compiled.
+ * For a cmake based project, you can use our FindUmfPack.cmake module to help you in this task.
+ *
+ */
+
+#include "src/misc/Solve.h"
+#include "src/misc/SparseSolve.h"
+
+#include "src/UmfPackSupport/UmfPackSupport.h"
+
+#include "src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_UMFPACKSUPPORT_MODULE_H
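An analogous sketch for the UmfPackLU factorization mentioned above, again assuming the UMFPACK headers and library are installed; the values are illustrative only.

```cpp
#include <Eigen/SparseCore>
#include <Eigen/UmfPackSupport>

int main() {
  typedef Eigen::SparseMatrix<double> SpMat;

  SpMat A(2, 2);
  A.insert(0, 0) = 4.0;
  A.insert(1, 0) = 1.0;
  A.insert(1, 1) = 3.0;
  A.makeCompressed();

  Eigen::VectorXd b(2);
  b << 1.0, 2.0;

  // Multifrontal sequential LU factorization through UMFPACK.
  Eigen::UmfPackLU<SpMat> lu;
  lu.compute(A);
  if (lu.info() != Eigen::Success) return 1;

  Eigen::VectorXd x = lu.solve(b);
  return (A * x - b).norm() < 1e-10 ? 0 : 1;
}
```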
diff --git a/third_party/eigen3/Eigen/src/Cholesky/LDLT.h b/third_party/eigen3/Eigen/src/Cholesky/LDLT.h
new file mode 100644
index 0000000000..6c5632d024
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Cholesky/LDLT.h
@@ -0,0 +1,607 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Keir Mierle <mierle@gmail.com>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2011 Timothy E. Holy <tim.holy@gmail.com >
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_LDLT_H
+#define EIGEN_LDLT_H
+
+namespace Eigen {
+
+namespace internal {
+ template<typename MatrixType, int UpLo> struct LDLT_Traits;
+
+ // PositiveSemiDef means positive semi-definite and non-zero; same for NegativeSemiDef
+ enum SignMatrix { PositiveSemiDef, NegativeSemiDef, ZeroSign, Indefinite };
+}
+
+/** \ingroup Cholesky_Module
+ *
+ * \class LDLT
+ *
+ * \brief Robust Cholesky decomposition of a matrix with pivoting
+ *
+ * \param MatrixType the type of the matrix of which to compute the LDL^T Cholesky decomposition
+  * \param UpLo the triangular part that will be used for the decomposition: Lower (default) or Upper.
+ * The other triangular part won't be read.
+ *
+ * Perform a robust Cholesky decomposition of a positive semidefinite or negative semidefinite
+ * matrix \f$ A \f$ such that \f$ A = P^TLDL^*P \f$, where P is a permutation matrix, L
+ * is lower triangular with a unit diagonal and D is a diagonal matrix.
+ *
+ * The decomposition uses pivoting to ensure stability, so that L will have
+  * zeros in its bottom-right (n - rank(A)) submatrix. Avoiding the square root
+ * on D also stabilizes the computation.
+ *
+ * Remember that Cholesky decompositions are not rank-revealing. Also, do not use a Cholesky
+ * decomposition to determine whether a system of equations has a solution.
+ *
+ * \sa MatrixBase::ldlt(), SelfAdjointView::ldlt(), class LLT
+ */
+template<typename _MatrixType, int _UpLo> class LDLT
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options & ~RowMajorBit, // these are the options for the TmpMatrixType, we need a ColMajor matrix here!
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ UpLo = _UpLo
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar, RowsAtCompileTime, 1, Options, MaxRowsAtCompileTime, 1> TmpMatrixType;
+
+ typedef Transpositions<RowsAtCompileTime, MaxRowsAtCompileTime, Index> TranspositionType;
+ typedef PermutationMatrix<RowsAtCompileTime, MaxRowsAtCompileTime, Index> PermutationType;
+
+ typedef internal::LDLT_Traits<MatrixType,UpLo> Traits;
+
+ /** \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via LDLT::compute(const MatrixType&).
+ */
+ LDLT()
+ : m_matrix(),
+ m_transpositions(),
+ m_sign(internal::ZeroSign),
+ m_isInitialized(false)
+ {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa LDLT()
+ */
+ LDLT(Index size)
+ : m_matrix(size, size),
+ m_transpositions(size),
+ m_temporary(size),
+ m_sign(internal::ZeroSign),
+ m_isInitialized(false)
+ {}
+
+ /** \brief Constructor with decomposition
+ *
+ * This calculates the decomposition for the input \a matrix.
+ * \sa LDLT(Index size)
+ */
+ LDLT(const MatrixType& matrix)
+ : m_matrix(matrix.rows(), matrix.cols()),
+ m_transpositions(matrix.rows()),
+ m_temporary(matrix.rows()),
+ m_sign(internal::ZeroSign),
+ m_isInitialized(false)
+ {
+ compute(matrix);
+ }
+
+ /** Clear any existing decomposition
+ * \sa rankUpdate(w,sigma)
+ */
+ void setZero()
+ {
+ m_isInitialized = false;
+ }
+
+ /** \returns a view of the upper triangular matrix U */
+ inline typename Traits::MatrixU matrixU() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return Traits::getU(m_matrix);
+ }
+
+ /** \returns a view of the lower triangular matrix L */
+ inline typename Traits::MatrixL matrixL() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return Traits::getL(m_matrix);
+ }
+
+ /** \returns the permutation matrix P as a transposition sequence.
+ */
+ inline const TranspositionType& transpositionsP() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return m_transpositions;
+ }
+
+ /** \returns the coefficients of the diagonal matrix D */
+ inline Diagonal<const MatrixType> vectorD() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return m_matrix.diagonal();
+ }
+
+ /** \returns true if the matrix is positive (semidefinite) */
+ inline bool isPositive() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return m_sign == internal::PositiveSemiDef || m_sign == internal::ZeroSign;
+ }
+
+ #ifdef EIGEN2_SUPPORT
+ inline bool isPositiveDefinite() const
+ {
+ return isPositive();
+ }
+ #endif
+
+ /** \returns true if the matrix is negative (semidefinite) */
+ inline bool isNegative(void) const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return m_sign == internal::NegativeSemiDef || m_sign == internal::ZeroSign;
+ }
+
+ /** \returns a solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * This function also supports in-place solves using the syntax <tt>x = decompositionObject.solve(x)</tt> .
+ *
+ * \note_about_checking_solutions
+ *
+ * More precisely, this method solves \f$ A x = b \f$ using the decomposition \f$ A = P^T L D L^* P \f$
+ * by solving the systems \f$ P^T y_1 = b \f$, \f$ L y_2 = y_1 \f$, \f$ D y_3 = y_2 \f$,
+ * \f$ L^* y_4 = y_3 \f$ and \f$ P x = y_4 \f$ in succession. If the matrix \f$ A \f$ is singular, then
+ * \f$ D \f$ will also be singular (all the other matrices are invertible). In that case, the
+      * least-squares solution of \f$ D y_3 = y_2 \f$ is computed. This does not mean that this function
+      * computes the least-squares solution of \f$ A x = b \f$ if \f$ A \f$ is singular.
+ *
+ * \sa MatrixBase::ldlt(), SelfAdjointView::ldlt()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<LDLT, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ eigen_assert(m_matrix.rows()==b.rows()
+ && "LDLT::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<LDLT, Rhs>(*this, b.derived());
+ }
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived, typename ResultType>
+ bool solve(const MatrixBase<OtherDerived>& b, ResultType *result) const
+ {
+ *result = this->solve(b);
+ return true;
+ }
+ #endif
+
+ template<typename Derived>
+ bool solveInPlace(MatrixBase<Derived> &bAndX) const;
+
+ LDLT& compute(const MatrixType& matrix);
+
+ template <typename Derived>
+ LDLT& rankUpdate(const MatrixBase<Derived>& w, const RealScalar& alpha=1);
+
+ /** \returns the internal LDLT decomposition matrix
+ *
+ * TODO: document the storage layout
+ */
+ inline const MatrixType& matrixLDLT() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return m_matrix;
+ }
+
+ MatrixType reconstructedMatrix() const;
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful,
+      *          \c NumericalIssue if the matrix appears to be negative.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ return Success;
+ }
+
+ protected:
+
+ /** \internal
+ * Used to compute and store the Cholesky decomposition A = L D L^* = U^* D U.
+ * The strict upper part is used during the decomposition, the strict lower
+ * part correspond to the coefficients of L (its diagonal is equal to 1 and
+ * is not stored), and the diagonal entries correspond to D.
+ */
+ MatrixType m_matrix;
+ TranspositionType m_transpositions;
+ TmpMatrixType m_temporary;
+ internal::SignMatrix m_sign;
+ bool m_isInitialized;
+};
+
+namespace internal {
+
+template<int UpLo> struct ldlt_inplace;
+
+template<> struct ldlt_inplace<Lower>
+{
+ template<typename MatrixType, typename TranspositionType, typename Workspace>
+ static bool unblocked(MatrixType& mat, TranspositionType& transpositions, Workspace& temp, SignMatrix& sign)
+ {
+ using std::abs;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ eigen_assert(mat.rows()==mat.cols());
+ const Index size = mat.rows();
+
+ if (size <= 1)
+ {
+ transpositions.setIdentity();
+ if (numext::real(mat.coeff(0,0)) > 0) sign = PositiveSemiDef;
+ else if (numext::real(mat.coeff(0,0)) < 0) sign = NegativeSemiDef;
+ else sign = ZeroSign;
+ return true;
+ }
+
+ RealScalar cutoff(0), biggest_in_corner;
+
+ for (Index k = 0; k < size; ++k)
+ {
+ // Find largest diagonal element
+ Index index_of_biggest_in_corner;
+ biggest_in_corner = mat.diagonal().tail(size-k).cwiseAbs().maxCoeff(&index_of_biggest_in_corner);
+ index_of_biggest_in_corner += k;
+
+ if(k == 0)
+ {
+ // The biggest overall is the point of reference to which further diagonals
+ // are compared; if any diagonal is negligible compared
+ // to the largest overall, the algorithm bails.
+ cutoff = abs(NumTraits<Scalar>::epsilon() * biggest_in_corner);
+ }
+
+ transpositions.coeffRef(k) = index_of_biggest_in_corner;
+ if(k != index_of_biggest_in_corner)
+ {
+ // apply the transposition while taking care to consider only
+ // the lower triangular part
+ Index s = size-index_of_biggest_in_corner-1; // trailing size after the biggest element
+ mat.row(k).head(k).swap(mat.row(index_of_biggest_in_corner).head(k));
+ mat.col(k).tail(s).swap(mat.col(index_of_biggest_in_corner).tail(s));
+ std::swap(mat.coeffRef(k,k),mat.coeffRef(index_of_biggest_in_corner,index_of_biggest_in_corner));
+ for(Index i=k+1;i<index_of_biggest_in_corner;++i)
+ {
+ Scalar tmp = mat.coeffRef(i,k);
+ mat.coeffRef(i,k) = numext::conj(mat.coeffRef(index_of_biggest_in_corner,i));
+ mat.coeffRef(index_of_biggest_in_corner,i) = numext::conj(tmp);
+ }
+ if(NumTraits<Scalar>::IsComplex)
+ mat.coeffRef(index_of_biggest_in_corner,k) = numext::conj(mat.coeff(index_of_biggest_in_corner,k));
+ }
+
+ // partition the matrix:
+ // A00 | - | -
+ // lu = A10 | A11 | -
+ // A20 | A21 | A22
+ Index rs = size - k - 1;
+ Block<MatrixType,Dynamic,1> A21(mat,k+1,k,rs,1);
+ Block<MatrixType,1,Dynamic> A10(mat,k,0,1,k);
+ Block<MatrixType,Dynamic,Dynamic> A20(mat,k+1,0,rs,k);
+
+ if(k>0)
+ {
+ temp.head(k) = mat.diagonal().head(k).asDiagonal() * A10.adjoint();
+ mat.coeffRef(k,k) -= (A10 * temp.head(k)).value();
+ if(rs>0)
+ A21.noalias() -= A20 * temp.head(k);
+ }
+
+ if((rs>0) && (abs(mat.coeffRef(k,k)) > cutoff))
+ A21 /= mat.coeffRef(k,k);
+
+ RealScalar realAkk = numext::real(mat.coeffRef(k,k));
+ if (sign == PositiveSemiDef) {
+ if (realAkk < 0) sign = Indefinite;
+ } else if (sign == NegativeSemiDef) {
+ if (realAkk > 0) sign = Indefinite;
+ } else if (sign == ZeroSign) {
+ if (realAkk > 0) sign = PositiveSemiDef;
+ else if (realAkk < 0) sign = NegativeSemiDef;
+ }
+ }
+
+ return true;
+ }
+
+ // Reference for the algorithm: Davis and Hager, "Multiple Rank
+ // Modifications of a Sparse Cholesky Factorization" (Algorithm 1)
+ // Trivial rearrangements of their computations (Timothy E. Holy)
+ // allow their algorithm to work for rank-1 updates even if the
+ // original matrix is not of full rank.
+ // Here only rank-1 updates are implemented, to reduce the
+ // requirement for intermediate storage and improve accuracy
+ template<typename MatrixType, typename WDerived>
+ static bool updateInPlace(MatrixType& mat, MatrixBase<WDerived>& w, const typename MatrixType::RealScalar& sigma=1)
+ {
+ using numext::isfinite;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ const Index size = mat.rows();
+ eigen_assert(mat.cols() == size && w.size()==size);
+
+ RealScalar alpha = 1;
+
+ // Apply the update
+ for (Index j = 0; j < size; j++)
+ {
+ // Check for termination due to an original decomposition of low-rank
+ if (!(isfinite)(alpha))
+ break;
+
+ // Update the diagonal terms
+ RealScalar dj = numext::real(mat.coeff(j,j));
+ Scalar wj = w.coeff(j);
+ RealScalar swj2 = sigma*numext::abs2(wj);
+ RealScalar gamma = dj*alpha + swj2;
+
+ mat.coeffRef(j,j) += swj2/alpha;
+ alpha += swj2/dj;
+
+
+ // Update the terms of L
+ Index rs = size-j-1;
+ w.tail(rs) -= wj * mat.col(j).tail(rs);
+ if(gamma != 0)
+ mat.col(j).tail(rs) += (sigma*numext::conj(wj)/gamma)*w.tail(rs);
+ }
+ return true;
+ }
+
+ template<typename MatrixType, typename TranspositionType, typename Workspace, typename WType>
+ static bool update(MatrixType& mat, const TranspositionType& transpositions, Workspace& tmp, const WType& w, const typename MatrixType::RealScalar& sigma=1)
+ {
+ // Apply the permutation to the input w
+ tmp = transpositions * w;
+
+ return ldlt_inplace<Lower>::updateInPlace(mat,tmp,sigma);
+ }
+};
+
+template<> struct ldlt_inplace<Upper>
+{
+ template<typename MatrixType, typename TranspositionType, typename Workspace>
+ static EIGEN_STRONG_INLINE bool unblocked(MatrixType& mat, TranspositionType& transpositions, Workspace& temp, SignMatrix& sign)
+ {
+ Transpose<MatrixType> matt(mat);
+ return ldlt_inplace<Lower>::unblocked(matt, transpositions, temp, sign);
+ }
+
+ template<typename MatrixType, typename TranspositionType, typename Workspace, typename WType>
+ static EIGEN_STRONG_INLINE bool update(MatrixType& mat, TranspositionType& transpositions, Workspace& tmp, WType& w, const typename MatrixType::RealScalar& sigma=1)
+ {
+ Transpose<MatrixType> matt(mat);
+ return ldlt_inplace<Lower>::update(matt, transpositions, tmp, w.conjugate(), sigma);
+ }
+};
+
+template<typename MatrixType> struct LDLT_Traits<MatrixType,Lower>
+{
+ typedef const TriangularView<const MatrixType, UnitLower> MatrixL;
+ typedef const TriangularView<const typename MatrixType::AdjointReturnType, UnitUpper> MatrixU;
+ static inline MatrixL getL(const MatrixType& m) { return m; }
+ static inline MatrixU getU(const MatrixType& m) { return m.adjoint(); }
+};
+
+template<typename MatrixType> struct LDLT_Traits<MatrixType,Upper>
+{
+ typedef const TriangularView<const typename MatrixType::AdjointReturnType, UnitLower> MatrixL;
+ typedef const TriangularView<const MatrixType, UnitUpper> MatrixU;
+ static inline MatrixL getL(const MatrixType& m) { return m.adjoint(); }
+ static inline MatrixU getU(const MatrixType& m) { return m; }
+};
+
+} // end namespace internal
+
+/** Compute / recompute the LDLT decomposition A = L D L^* = U^* D U of \a matrix
+ */
+template<typename MatrixType, int _UpLo>
+LDLT<MatrixType,_UpLo>& LDLT<MatrixType,_UpLo>::compute(const MatrixType& a)
+{
+ eigen_assert(a.rows()==a.cols());
+ const Index size = a.rows();
+
+ m_matrix = a;
+
+ m_transpositions.resize(size);
+ m_isInitialized = false;
+ m_temporary.resize(size);
+
+ internal::ldlt_inplace<UpLo>::unblocked(m_matrix, m_transpositions, m_temporary, m_sign);
+
+ m_isInitialized = true;
+ return *this;
+}
+
+/** Update the LDLT decomposition: given A = L D L^T, efficiently compute the decomposition of A + sigma w w^T.
+ * \param w a vector to be incorporated into the decomposition.
+ * \param sigma a scalar, +1 for updates and -1 for "downdates," which correspond to removing previously-added column vectors. Optional; default value is +1.
+ * \sa setZero()
+ */
+template<typename MatrixType, int _UpLo>
+template<typename Derived>
+LDLT<MatrixType,_UpLo>& LDLT<MatrixType,_UpLo>::rankUpdate(const MatrixBase<Derived>& w, const typename NumTraits<typename MatrixType::Scalar>::Real& sigma)
+{
+ const Index size = w.rows();
+ if (m_isInitialized)
+ {
+ eigen_assert(m_matrix.rows()==size);
+ }
+ else
+ {
+ m_matrix.resize(size,size);
+ m_matrix.setZero();
+ m_transpositions.resize(size);
+ for (Index i = 0; i < size; i++)
+ m_transpositions.coeffRef(i) = i;
+ m_temporary.resize(size);
+ m_sign = sigma>=0 ? internal::PositiveSemiDef : internal::NegativeSemiDef;
+ m_isInitialized = true;
+ }
+
+ internal::ldlt_inplace<UpLo>::update(m_matrix, m_transpositions, m_temporary, w, sigma);
+
+ return *this;
+}
+
+namespace internal {
+template<typename _MatrixType, int _UpLo, typename Rhs>
+struct solve_retval<LDLT<_MatrixType,_UpLo>, Rhs>
+ : solve_retval_base<LDLT<_MatrixType,_UpLo>, Rhs>
+{
+ typedef LDLT<_MatrixType,_UpLo> LDLTType;
+ EIGEN_MAKE_SOLVE_HELPERS(LDLTType,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ eigen_assert(rhs().rows() == dec().matrixLDLT().rows());
+ // dst = P b
+ dst = dec().transpositionsP() * rhs();
+
+ // dst = L^-1 (P b)
+ dec().matrixL().solveInPlace(dst);
+
+ // dst = D^-1 (L^-1 P b)
+ // more precisely, use pseudo-inverse of D (see bug 241)
+ using std::abs;
+ typedef typename LDLTType::MatrixType MatrixType;
+ typedef typename LDLTType::Scalar Scalar;
+ typedef typename LDLTType::RealScalar RealScalar;
+ const Diagonal<const MatrixType> vectorD = dec().vectorD();
+ RealScalar tolerance = numext::maxi(vectorD.array().abs().maxCoeff() * NumTraits<Scalar>::epsilon(),
+ RealScalar(1) / NumTraits<RealScalar>::highest()); // motivated by LAPACK's xGELSS
+ for (Index i = 0; i < vectorD.size(); ++i) {
+ if(abs(vectorD(i)) > tolerance)
+ dst.row(i) /= vectorD(i);
+ else
+ dst.row(i).setZero();
+ }
+
+ // dst = L^-T (D^-1 L^-1 P b)
+ dec().matrixU().solveInPlace(dst);
+
+ // dst = P^-1 (L^-T D^-1 L^-1 P b) = A^-1 b
+ dst = dec().transpositionsP().transpose() * dst;
+ }
+};
+}
+
+/** \internal use x = ldlt_object.solve(x);
+ *
+ * This is the \em in-place version of solve().
+ *
+ * \param bAndX represents both the right-hand side matrix b and result x.
+ *
+ * \returns true always! If you need to check for existence of solutions, use another decomposition like LU, QR, or SVD.
+ *
+ * This version avoids a copy when the right hand side matrix b is not
+ * needed anymore.
+ *
+ * \sa LDLT::solve(), MatrixBase::ldlt()
+ */
+template<typename MatrixType,int _UpLo>
+template<typename Derived>
+bool LDLT<MatrixType,_UpLo>::solveInPlace(MatrixBase<Derived> &bAndX) const
+{
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ eigen_assert(m_matrix.rows() == bAndX.rows());
+
+ bAndX = this->solve(bAndX);
+
+ return true;
+}
+
+/** \returns the matrix represented by the decomposition,
+ * i.e., it returns the product: P^T L D L^* P.
+ * This function is provided for debug purpose. */
+template<typename MatrixType, int _UpLo>
+MatrixType LDLT<MatrixType,_UpLo>::reconstructedMatrix() const
+{
+ eigen_assert(m_isInitialized && "LDLT is not initialized.");
+ const Index size = m_matrix.rows();
+ MatrixType res(size,size);
+
+ // P
+ res.setIdentity();
+ res = transpositionsP() * res;
+ // L^* P
+ res = matrixU() * res;
+ // D(L^*P)
+ res = vectorD().asDiagonal() * res;
+ // L(DL^*P)
+ res = matrixL() * res;
+ // P^T (LDL^*P)
+ res = transpositionsP().transpose() * res;
+
+ return res;
+}
+
+#ifndef __CUDACC__
+/** \cholesky_module
+ * \returns the Cholesky decomposition with full pivoting without square root of \c *this
+ * \sa MatrixBase::ldlt()
+ */
+template<typename MatrixType, unsigned int UpLo>
+inline const LDLT<typename SelfAdjointView<MatrixType, UpLo>::PlainObject, UpLo>
+SelfAdjointView<MatrixType, UpLo>::ldlt() const
+{
+ return LDLT<PlainObject,UpLo>(m_matrix);
+}
+
+/** \cholesky_module
+ * \returns the Cholesky decomposition with full pivoting without square root of \c *this
+ * \sa SelfAdjointView::ldlt()
+ */
+template<typename Derived>
+inline const LDLT<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::ldlt() const
+{
+ return LDLT<PlainObject>(derived());
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_LDLT_H
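A short sketch exercising the LDLT API defined in the header above: a pivoted factorization A = P^T L D L^* P, a solve, and a rank-one update. The numbers are illustrative only.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::Matrix3d A;
  A << 4, 1, 0,
       1, 3, 1,
       0, 1, 2;

  // Pivoted Cholesky without square roots: A = P^T L D L^* P.
  Eigen::LDLT<Eigen::Matrix3d> ldlt(A);
  Eigen::Vector3d b(1.0, 2.0, 3.0);
  Eigen::Vector3d x = ldlt.solve(b);
  std::cout << "residual = " << (A * x - b).norm() << std::endl;

  // Rank-one update: refactor A + w w^T without starting from scratch.
  Eigen::Vector3d w(0.5, -1.0, 0.25);
  ldlt.rankUpdate(w, 1.0);
  Eigen::Vector3d y = ldlt.solve(b);
  std::cout << "updated residual = "
            << ((A + w * w.transpose()) * y - b).norm() << std::endl;
  return 0;
}
```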
diff --git a/third_party/eigen3/Eigen/src/Cholesky/LLT.h b/third_party/eigen3/Eigen/src/Cholesky/LLT.h
new file mode 100644
index 0000000000..45ed8438f7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Cholesky/LLT.h
@@ -0,0 +1,494 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_LLT_H
+#define EIGEN_LLT_H
+
+namespace Eigen {
+
+namespace internal{
+template<typename MatrixType, int UpLo> struct LLT_Traits;
+}
+
+/** \ingroup Cholesky_Module
+ *
+ * \class LLT
+ *
+ * \brief Standard Cholesky decomposition (LL^T) of a matrix and associated features
+ *
+ * \param MatrixType the type of the matrix of which we are computing the LL^T Cholesky decomposition
+  * \param UpLo the triangular part that will be used for the decomposition: Lower (default) or Upper.
+ * The other triangular part won't be read.
+ *
+ * This class performs a LL^T Cholesky decomposition of a symmetric, positive definite
+ * matrix A such that A = LL^* = U^*U, where L is lower triangular.
+ *
+  * While the Cholesky decomposition is particularly useful for solving selfadjoint problems like D^*D x = b,
+  * for that purpose we recommend the Cholesky decomposition without square root (LDLT), which is more stable
+  * and even faster. Nevertheless, this standard Cholesky decomposition remains useful in many other
+  * situations like generalised eigenproblems with Hermitian matrices.
+ *
+ * Remember that Cholesky decompositions are not rank-revealing. This LLT decomposition is only stable on positive definite matrices,
+ * use LDLT instead for the semidefinite case. Also, do not use a Cholesky decomposition to determine whether a system of equations
+ * has a solution.
+ *
+ * Example: \include LLT_example.cpp
+ * Output: \verbinclude LLT_example.out
+ *
+ * \sa MatrixBase::llt(), SelfAdjointView::llt(), class LDLT
+ */
+ /* HEY THIS DOX IS DISABLED BECAUSE THERE's A BUG EITHER HERE OR IN LDLT ABOUT THAT (OR BOTH)
+ * Note that during the decomposition, only the upper triangular part of A is considered. Therefore,
+ * the strict lower part does not have to store correct values.
+ */
+template<typename _MatrixType, int _UpLo> class LLT
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ enum {
+ PacketSize = internal::packet_traits<Scalar>::size,
+ AlignmentMask = int(PacketSize)-1,
+ UpLo = _UpLo
+ };
+
+ typedef internal::LLT_Traits<MatrixType,UpLo> Traits;
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via LLT::compute(const MatrixType&).
+ */
+ LLT() : m_matrix(), m_isInitialized(false) {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa LLT()
+ */
+ LLT(Index size) : m_matrix(size, size),
+ m_isInitialized(false) {}
+
+ LLT(const MatrixType& matrix)
+ : m_matrix(matrix.rows(), matrix.cols()),
+ m_isInitialized(false)
+ {
+ compute(matrix);
+ }
+
+ /** \returns a view of the upper triangular matrix U */
+ inline typename Traits::MatrixU matrixU() const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ return Traits::getU(m_matrix);
+ }
+
+ /** \returns a view of the lower triangular matrix L */
+ inline typename Traits::MatrixL matrixL() const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ return Traits::getL(m_matrix);
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * Since this LLT class assumes anyway that the matrix A is invertible, the solution
+ * theoretically exists and is unique regardless of b.
+ *
+ * Example: \include LLT_solve.cpp
+ * Output: \verbinclude LLT_solve.out
+ *
+ * \sa solveInPlace(), MatrixBase::llt(), SelfAdjointView::llt()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<LLT, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ eigen_assert(m_matrix.rows()==b.rows()
+ && "LLT::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<LLT, Rhs>(*this, b.derived());
+ }
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived, typename ResultType>
+ bool solve(const MatrixBase<OtherDerived>& b, ResultType *result) const
+ {
+ *result = this->solve(b);
+ return true;
+ }
+
+ bool isPositiveDefinite() const { return true; }
+ #endif
+
+ template<typename Derived>
+ void solveInPlace(MatrixBase<Derived> &bAndX) const;
+
+ LLT& compute(const MatrixType& matrix);
+
+ /** \returns the LLT decomposition matrix
+ *
+ * TODO: document the storage layout
+ */
+ inline const MatrixType& matrixLLT() const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ return m_matrix;
+ }
+
+ MatrixType reconstructedMatrix() const;
+
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful,
+      *          \c NumericalIssue if the matrix appears to be negative.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ return m_info;
+ }
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ template<typename VectorType>
+ LLT rankUpdate(const VectorType& vec, const RealScalar& sigma = 1);
+
+ protected:
+ /** \internal
+ * Used to compute and store L
+ * The strict upper part is not used and even not initialized.
+ */
+ MatrixType m_matrix;
+ bool m_isInitialized;
+ ComputationInfo m_info;
+};
+
+namespace internal {
+
+template<typename Scalar, int UpLo> struct llt_inplace;
+
+template<typename MatrixType, typename VectorType>
+static typename MatrixType::Index llt_rank_update_lower(MatrixType& mat, const VectorType& vec, const typename MatrixType::RealScalar& sigma)
+{
+ using std::sqrt;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::ColXpr ColXpr;
+ typedef typename internal::remove_all<ColXpr>::type ColXprCleaned;
+ typedef typename ColXprCleaned::SegmentReturnType ColXprSegment;
+ typedef Matrix<Scalar,Dynamic,1> TempVectorType;
+ typedef typename TempVectorType::SegmentReturnType TempVecSegment;
+
+ Index n = mat.cols();
+ eigen_assert(mat.rows()==n && vec.size()==n);
+
+ TempVectorType temp;
+
+ if(sigma>0)
+ {
+ // This version is based on Givens rotations.
+ // It is faster than the other one below, but only works for updates,
+ // i.e., for sigma > 0
+ temp = sqrt(sigma) * vec;
+
+ for(Index i=0; i<n; ++i)
+ {
+ JacobiRotation<Scalar> g;
+ g.makeGivens(mat(i,i), -temp(i), &mat(i,i));
+
+ Index rs = n-i-1;
+ if(rs>0)
+ {
+ ColXprSegment x(mat.col(i).tail(rs));
+ TempVecSegment y(temp.tail(rs));
+ apply_rotation_in_the_plane(x, y, g);
+ }
+ }
+ }
+ else
+ {
+ temp = vec;
+ RealScalar beta = 1;
+ for(Index j=0; j<n; ++j)
+ {
+ RealScalar Ljj = numext::real(mat.coeff(j,j));
+ RealScalar dj = numext::abs2(Ljj);
+ Scalar wj = temp.coeff(j);
+ RealScalar swj2 = sigma*numext::abs2(wj);
+ RealScalar gamma = dj*beta + swj2;
+
+ RealScalar x = dj + swj2/beta;
+ if (x<=RealScalar(0))
+ return j;
+ RealScalar nLjj = sqrt(x);
+ mat.coeffRef(j,j) = nLjj;
+ beta += swj2/dj;
+
+ // Update the terms of L
+ Index rs = n-j-1;
+ if(rs)
+ {
+ temp.tail(rs) -= (wj/Ljj) * mat.col(j).tail(rs);
+ if(gamma != 0)
+ mat.col(j).tail(rs) = (nLjj/Ljj) * mat.col(j).tail(rs) + (nLjj * sigma*numext::conj(wj)/gamma)*temp.tail(rs);
+ }
+ }
+ }
+ return -1;
+}
+
+template<typename Scalar> struct llt_inplace<Scalar, Lower>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ template<typename MatrixType>
+ static typename MatrixType::Index unblocked(MatrixType& mat)
+ {
+ using std::sqrt;
+ typedef typename MatrixType::Index Index;
+
+ eigen_assert(mat.rows()==mat.cols());
+ const Index size = mat.rows();
+ for(Index k = 0; k < size; ++k)
+ {
+ Index rs = size-k-1; // remaining size
+
+ Block<MatrixType,Dynamic,1> A21(mat,k+1,k,rs,1);
+ Block<MatrixType,1,Dynamic> A10(mat,k,0,1,k);
+ Block<MatrixType,Dynamic,Dynamic> A20(mat,k+1,0,rs,k);
+
+ RealScalar x = numext::real(mat.coeff(k,k));
+ if (k>0) x -= A10.squaredNorm();
+ if (x<=RealScalar(0))
+ return k;
+ mat.coeffRef(k,k) = x = sqrt(x);
+ if (k>0 && rs>0) A21.noalias() -= A20 * A10.adjoint();
+ if (rs>0) A21 *= RealScalar(1)/x;
+ }
+ return -1;
+ }
+
+ template<typename MatrixType>
+ static typename MatrixType::Index blocked(MatrixType& m)
+ {
+ typedef typename MatrixType::Index Index;
+ eigen_assert(m.rows()==m.cols());
+ Index size = m.rows();
+ if(size<32)
+ return unblocked(m);
+
+ Index blockSize = size/8;
+ blockSize = (blockSize/16)*16;
+ blockSize = (std::min)((std::max)(blockSize,Index(8)), Index(128));
+
+ for (Index k=0; k<size; k+=blockSize)
+ {
+ // partition the matrix:
+ // A00 | - | -
+ // lu = A10 | A11 | -
+ // A20 | A21 | A22
+ Index bs = (std::min)(blockSize, size-k);
+ Index rs = size - k - bs;
+ Block<MatrixType,Dynamic,Dynamic> A11(m,k, k, bs,bs);
+ Block<MatrixType,Dynamic,Dynamic> A21(m,k+bs,k, rs,bs);
+ Block<MatrixType,Dynamic,Dynamic> A22(m,k+bs,k+bs,rs,rs);
+
+ Index ret;
+ if((ret=unblocked(A11))>=0) return k+ret;
+ if(rs>0) A11.adjoint().template triangularView<Upper>().template solveInPlace<OnTheRight>(A21);
+ if(rs>0) A22.template selfadjointView<Lower>().rankUpdate(A21,-1); // bottleneck
+ }
+ return -1;
+ }
+
+ template<typename MatrixType, typename VectorType>
+ static typename MatrixType::Index rankUpdate(MatrixType& mat, const VectorType& vec, const RealScalar& sigma)
+ {
+ return Eigen::internal::llt_rank_update_lower(mat, vec, sigma);
+ }
+};
+
+template<typename Scalar> struct llt_inplace<Scalar, Upper>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ template<typename MatrixType>
+ static EIGEN_STRONG_INLINE typename MatrixType::Index unblocked(MatrixType& mat)
+ {
+ Transpose<MatrixType> matt(mat);
+ return llt_inplace<Scalar, Lower>::unblocked(matt);
+ }
+ template<typename MatrixType>
+ static EIGEN_STRONG_INLINE typename MatrixType::Index blocked(MatrixType& mat)
+ {
+ Transpose<MatrixType> matt(mat);
+ return llt_inplace<Scalar, Lower>::blocked(matt);
+ }
+ template<typename MatrixType, typename VectorType>
+ static typename MatrixType::Index rankUpdate(MatrixType& mat, const VectorType& vec, const RealScalar& sigma)
+ {
+ Transpose<MatrixType> matt(mat);
+ return llt_inplace<Scalar, Lower>::rankUpdate(matt, vec.conjugate(), sigma);
+ }
+};
+
+template<typename MatrixType> struct LLT_Traits<MatrixType,Lower>
+{
+ typedef const TriangularView<const MatrixType, Lower> MatrixL;
+ typedef const TriangularView<const typename MatrixType::AdjointReturnType, Upper> MatrixU;
+ static inline MatrixL getL(const MatrixType& m) { return m; }
+ static inline MatrixU getU(const MatrixType& m) { return m.adjoint(); }
+ static bool inplace_decomposition(MatrixType& m)
+ { return llt_inplace<typename MatrixType::Scalar, Lower>::blocked(m)==-1; }
+};
+
+template<typename MatrixType> struct LLT_Traits<MatrixType,Upper>
+{
+ typedef const TriangularView<const typename MatrixType::AdjointReturnType, Lower> MatrixL;
+ typedef const TriangularView<const MatrixType, Upper> MatrixU;
+ static inline MatrixL getL(const MatrixType& m) { return m.adjoint(); }
+ static inline MatrixU getU(const MatrixType& m) { return m; }
+ static bool inplace_decomposition(MatrixType& m)
+ { return llt_inplace<typename MatrixType::Scalar, Upper>::blocked(m)==-1; }
+};
+
+} // end namespace internal
+
+/** Computes / recomputes the Cholesky decomposition A = LL^* = U^*U of \a matrix
+ *
+ * \returns a reference to *this
+ *
+ * Example: \include TutorialLinAlgComputeTwice.cpp
+ * Output: \verbinclude TutorialLinAlgComputeTwice.out
+ */
+template<typename MatrixType, int _UpLo>
+LLT<MatrixType,_UpLo>& LLT<MatrixType,_UpLo>::compute(const MatrixType& a)
+{
+ eigen_assert(a.rows()==a.cols());
+ const Index size = a.rows();
+ m_matrix.resize(size, size);
+ m_matrix = a;
+
+ m_isInitialized = true;
+ bool ok = Traits::inplace_decomposition(m_matrix);
+ m_info = ok ? Success : NumericalIssue;
+
+ return *this;
+}
+
+/** Performs a rank one update (or downdate) of the current decomposition.
+  * If A = LL^* before the rank one update,
+  * then after it we have LL^* = A + sigma * v v^* where \a v must be a vector
+  * of the same dimension.
+ */
+template<typename _MatrixType, int _UpLo>
+template<typename VectorType>
+LLT<_MatrixType,_UpLo> LLT<_MatrixType,_UpLo>::rankUpdate(const VectorType& v, const RealScalar& sigma)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorType);
+ eigen_assert(v.size()==m_matrix.cols());
+ eigen_assert(m_isInitialized);
+ if(internal::llt_inplace<typename MatrixType::Scalar, UpLo>::rankUpdate(m_matrix,v,sigma)>=0)
+ m_info = NumericalIssue;
+ else
+ m_info = Success;
+
+ return *this;
+}
+
+namespace internal {
+template<typename _MatrixType, int UpLo, typename Rhs>
+struct solve_retval<LLT<_MatrixType, UpLo>, Rhs>
+ : solve_retval_base<LLT<_MatrixType, UpLo>, Rhs>
+{
+ typedef LLT<_MatrixType,UpLo> LLTType;
+ EIGEN_MAKE_SOLVE_HELPERS(LLTType,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dst = rhs();
+ dec().solveInPlace(dst);
+ }
+};
+}
+
+/** \internal use x = llt_object.solve(x);
+ *
+ * This is the \em in-place version of solve().
+ *
+ * \param bAndX represents both the right-hand side matrix b and result x.
+ *
+ * \returns true always! If you need to check for existence of solutions, use another decomposition like LU, QR, or SVD.
+ *
+ * This version avoids a copy when the right hand side matrix b is not
+ * needed anymore.
+ *
+ * \sa LLT::solve(), MatrixBase::llt()
+ */
+template<typename MatrixType, int _UpLo>
+template<typename Derived>
+void LLT<MatrixType,_UpLo>::solveInPlace(MatrixBase<Derived> &bAndX) const
+{
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ eigen_assert(m_matrix.rows()==bAndX.rows());
+ matrixL().solveInPlace(bAndX);
+ matrixU().solveInPlace(bAndX);
+}
+
+/** \returns the matrix represented by the decomposition,
+ * i.e., it returns the product: L L^*.
+ * This function is provided for debug purpose. */
+template<typename MatrixType, int _UpLo>
+MatrixType LLT<MatrixType,_UpLo>::reconstructedMatrix() const
+{
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ return matrixL() * matrixL().adjoint().toDenseMatrix();
+}
+
+#ifndef __CUDACC__
+/** \cholesky_module
+ * \returns the LLT decomposition of \c *this
+ * \sa SelfAdjointView::llt()
+ */
+template<typename Derived>
+inline const LLT<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::llt() const
+{
+ return LLT<PlainObject>(derived());
+}
+
+/** \cholesky_module
+ * \returns the LLT decomposition of \c *this
+ * \sa SelfAdjointView::llt()
+ */
+template<typename MatrixType, unsigned int UpLo>
+inline const LLT<typename SelfAdjointView<MatrixType, UpLo>::PlainObject, UpLo>
+SelfAdjointView<MatrixType, UpLo>::llt() const
+{
+ return LLT<PlainObject,UpLo>(m_matrix);
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_LLT_H
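And the corresponding sketch for the standard LLT factorization above, including the info() check the docs recommend for matrices that may not be positive definite. Again, the values are illustrative.

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::Matrix2d A;
  A << 2, -1,
      -1, 3;

  Eigen::LLT<Eigen::Matrix2d> llt(A);  // A = L L^*
  if (llt.info() != Eigen::Success) {
    std::cout << "matrix does not appear to be positive definite" << std::endl;
    return 1;
  }

  Eigen::Vector2d b(1.0, 0.0);
  Eigen::Vector2d x = llt.solve(b);
  std::cout << "x = " << x.transpose()
            << ", reconstruction error = "
            << (llt.reconstructedMatrix() - A).norm() << std::endl;
  return 0;
}
```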
diff --git a/third_party/eigen3/Eigen/src/Cholesky/LLT_MKL.h b/third_party/eigen3/Eigen/src/Cholesky/LLT_MKL.h
new file mode 100644
index 0000000000..64daa445cf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Cholesky/LLT_MKL.h
@@ -0,0 +1,102 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * LLt decomposition based on LAPACKE_?potrf function.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_LLT_MKL_H
+#define EIGEN_LLT_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+#include <iostream>
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Scalar> struct mkl_llt;
+
+#define EIGEN_MKL_LLT(EIGTYPE, MKLTYPE, MKLPREFIX) \
+template<> struct mkl_llt<EIGTYPE> \
+{ \
+ template<typename MatrixType> \
+ static inline typename MatrixType::Index potrf(MatrixType& m, char uplo) \
+ { \
+ lapack_int matrix_order; \
+ lapack_int size, lda, info, StorageOrder; \
+ EIGTYPE* a; \
+ eigen_assert(m.rows()==m.cols()); \
+ /* Set up parameters for ?potrf */ \
+ size = m.rows(); \
+ StorageOrder = MatrixType::Flags&RowMajorBit?RowMajor:ColMajor; \
+ matrix_order = StorageOrder==RowMajor ? LAPACK_ROW_MAJOR : LAPACK_COL_MAJOR; \
+ a = &(m.coeffRef(0,0)); \
+ lda = m.outerStride(); \
+\
+ info = LAPACKE_##MKLPREFIX##potrf( matrix_order, uplo, size, (MKLTYPE*)a, lda ); \
+ info = (info==0) ? Success : NumericalIssue; \
+ return info; \
+ } \
+}; \
+template<> struct llt_inplace<EIGTYPE, Lower> \
+{ \
+ template<typename MatrixType> \
+ static typename MatrixType::Index blocked(MatrixType& m) \
+ { \
+ return mkl_llt<EIGTYPE>::potrf(m, 'L'); \
+ } \
+ template<typename MatrixType, typename VectorType> \
+ static typename MatrixType::Index rankUpdate(MatrixType& mat, const VectorType& vec, const typename MatrixType::RealScalar& sigma) \
+ { return Eigen::internal::llt_rank_update_lower(mat, vec, sigma); } \
+}; \
+template<> struct llt_inplace<EIGTYPE, Upper> \
+{ \
+ template<typename MatrixType> \
+ static typename MatrixType::Index blocked(MatrixType& m) \
+ { \
+ return mkl_llt<EIGTYPE>::potrf(m, 'U'); \
+ } \
+ template<typename MatrixType, typename VectorType> \
+ static typename MatrixType::Index rankUpdate(MatrixType& mat, const VectorType& vec, const typename MatrixType::RealScalar& sigma) \
+ { \
+ Transpose<MatrixType> matt(mat); \
+ return llt_inplace<EIGTYPE, Lower>::rankUpdate(matt, vec.conjugate(), sigma); \
+ } \
+};
+
+EIGEN_MKL_LLT(double, double, d)
+EIGEN_MKL_LLT(float, float, s)
+EIGEN_MKL_LLT(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_LLT(scomplex, MKL_Complex8, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_LLT_MKL_H
diff --git a/third_party/eigen3/Eigen/src/CholmodSupport/CholmodSupport.h b/third_party/eigen3/Eigen/src/CholmodSupport/CholmodSupport.h
new file mode 100644
index 0000000000..c449960de4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/CholmodSupport/CholmodSupport.h
@@ -0,0 +1,607 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CHOLMODSUPPORT_H
+#define EIGEN_CHOLMODSUPPORT_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Scalar, typename CholmodType>
+void cholmod_configure_matrix(CholmodType& mat)
+{
+ if (internal::is_same<Scalar,float>::value)
+ {
+ mat.xtype = CHOLMOD_REAL;
+ mat.dtype = CHOLMOD_SINGLE;
+ }
+ else if (internal::is_same<Scalar,double>::value)
+ {
+ mat.xtype = CHOLMOD_REAL;
+ mat.dtype = CHOLMOD_DOUBLE;
+ }
+ else if (internal::is_same<Scalar,std::complex<float> >::value)
+ {
+ mat.xtype = CHOLMOD_COMPLEX;
+ mat.dtype = CHOLMOD_SINGLE;
+ }
+ else if (internal::is_same<Scalar,std::complex<double> >::value)
+ {
+ mat.xtype = CHOLMOD_COMPLEX;
+ mat.dtype = CHOLMOD_DOUBLE;
+ }
+ else
+ {
+ eigen_assert(false && "Scalar type not supported by CHOLMOD");
+ }
+}
+
+} // namespace internal
+
+/** Wraps the Eigen sparse matrix \a mat into a Cholmod sparse matrix object.
+ * Note that the data are shared.
+ */
+template<typename _Scalar, int _Options, typename _Index>
+cholmod_sparse viewAsCholmod(SparseMatrix<_Scalar,_Options,_Index>& mat)
+{
+ cholmod_sparse res;
+ res.nzmax = mat.nonZeros();
+  res.nrow   = mat.rows();
+ res.ncol = mat.cols();
+ res.p = mat.outerIndexPtr();
+ res.i = mat.innerIndexPtr();
+ res.x = mat.valuePtr();
+ res.z = 0;
+ res.sorted = 1;
+ if(mat.isCompressed())
+ {
+ res.packed = 1;
+ res.nz = 0;
+ }
+ else
+ {
+ res.packed = 0;
+ res.nz = mat.innerNonZeroPtr();
+ }
+
+ res.dtype = 0;
+ res.stype = -1;
+
+ if (internal::is_same<_Index,int>::value)
+ {
+ res.itype = CHOLMOD_INT;
+ }
+ else if (internal::is_same<_Index,UF_long>::value)
+ {
+ res.itype = CHOLMOD_LONG;
+ }
+ else
+ {
+ eigen_assert(false && "Index type not supported yet");
+ }
+
+ // setup res.xtype
+ internal::cholmod_configure_matrix<_Scalar>(res);
+
+ res.stype = 0;
+
+ return res;
+}
+
+template<typename _Scalar, int _Options, typename _Index>
+const cholmod_sparse viewAsCholmod(const SparseMatrix<_Scalar,_Options,_Index>& mat)
+{
+ cholmod_sparse res = viewAsCholmod(mat.const_cast_derived());
+ return res;
+}
+
+/** Returns a view of the Eigen sparse matrix \a mat as Cholmod sparse matrix.
+ * The data are not copied but shared. */
+template<typename _Scalar, int _Options, typename _Index, unsigned int UpLo>
+cholmod_sparse viewAsCholmod(const SparseSelfAdjointView<SparseMatrix<_Scalar,_Options,_Index>, UpLo>& mat)
+{
+ cholmod_sparse res = viewAsCholmod(mat.matrix().const_cast_derived());
+
+ if(UpLo==Upper) res.stype = 1;
+ if(UpLo==Lower) res.stype = -1;
+
+ return res;
+}
+
+/** Returns a view of the Eigen \b dense matrix \a mat as Cholmod dense matrix.
+ * The data are not copied but shared. */
+template<typename Derived>
+cholmod_dense viewAsCholmod(MatrixBase<Derived>& mat)
+{
+ EIGEN_STATIC_ASSERT((internal::traits<Derived>::Flags&RowMajorBit)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
+ typedef typename Derived::Scalar Scalar;
+
+ cholmod_dense res;
+ res.nrow = mat.rows();
+ res.ncol = mat.cols();
+ res.nzmax = res.nrow * res.ncol;
+ res.d = Derived::IsVectorAtCompileTime ? mat.derived().size() : mat.derived().outerStride();
+ res.x = (void*)(mat.derived().data());
+ res.z = 0;
+
+ internal::cholmod_configure_matrix<Scalar>(res);
+
+ return res;
+}
+
+/** Returns a view of the Cholmod sparse matrix \a cm as an Eigen sparse matrix.
+ * The data are not copied but shared. */
+template<typename Scalar, int Flags, typename Index>
+MappedSparseMatrix<Scalar,Flags,Index> viewAsEigen(cholmod_sparse& cm)
+{
+ return MappedSparseMatrix<Scalar,Flags,Index>
+ (cm.nrow, cm.ncol, static_cast<Index*>(cm.p)[cm.ncol],
+ static_cast<Index*>(cm.p), static_cast<Index*>(cm.i),static_cast<Scalar*>(cm.x) );
+}
+
+enum CholmodMode {
+ CholmodAuto, CholmodSimplicialLLt, CholmodSupernodalLLt, CholmodLDLt
+};
+
+
+/** \ingroup CholmodSupport_Module
+ * \class CholmodBase
+ * \brief The base class for the direct Cholesky factorization of Cholmod
+ * \sa class CholmodSupernodalLLT, class CholmodSimplicialLDLT, class CholmodSimplicialLLT
+ */
+template<typename _MatrixType, int _UpLo, typename Derived>
+class CholmodBase : internal::noncopyable
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum { UpLo = _UpLo };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef MatrixType CholMatrixType;
+ typedef typename MatrixType::Index Index;
+
+ public:
+
+ CholmodBase()
+ : m_cholmodFactor(0), m_info(Success), m_isInitialized(false)
+ {
+ m_shiftOffset[0] = m_shiftOffset[1] = RealScalar(0.0);
+ cholmod_start(&m_cholmod);
+ }
+
+ CholmodBase(const MatrixType& matrix)
+ : m_cholmodFactor(0), m_info(Success), m_isInitialized(false)
+ {
+ m_shiftOffset[0] = m_shiftOffset[1] = RealScalar(0.0);
+ cholmod_start(&m_cholmod);
+ compute(matrix);
+ }
+
+ ~CholmodBase()
+ {
+ if(m_cholmodFactor)
+ cholmod_free_factor(&m_cholmodFactor, &m_cholmod);
+ cholmod_finish(&m_cholmod);
+ }
+
+ inline Index cols() const { return m_cholmodFactor->n; }
+ inline Index rows() const { return m_cholmodFactor->n; }
+
+ Derived& derived() { return *static_cast<Derived*>(this); }
+ const Derived& derived() const { return *static_cast<const Derived*>(this); }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful,
+      *          \c NumericalIssue if the matrix appears to be negative.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ /** Computes the sparse Cholesky decomposition of \a matrix */
+ Derived& compute(const MatrixType& matrix)
+ {
+ analyzePattern(matrix);
+ factorize(matrix);
+ return derived();
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<CholmodBase, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "CholmodDecomposition::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<CholmodBase, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<CholmodBase, Rhs>
+ solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "LLT is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "CholmodDecomposition::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<CholmodBase, Rhs>(*this, b.derived());
+ }
+
+ /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+      * This function is particularly useful when solving several problems that share the same sparsity structure.
+ *
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ if(m_cholmodFactor)
+ {
+ cholmod_free_factor(&m_cholmodFactor, &m_cholmod);
+ m_cholmodFactor = 0;
+ }
+ cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView<UpLo>());
+ m_cholmodFactor = cholmod_analyze(&A, &m_cholmod);
+
+ this->m_isInitialized = true;
+ this->m_info = Success;
+ m_analysisIsOk = true;
+ m_factorizationIsOk = false;
+ }
+
+ /** Performs a numeric decomposition of \a matrix
+ *
+ * The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed.
+ *
+ * \sa analyzePattern()
+ */
+ void factorize(const MatrixType& matrix)
+ {
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ cholmod_sparse A = viewAsCholmod(matrix.template selfadjointView<UpLo>());
+ cholmod_factorize_p(&A, m_shiftOffset, 0, 0, m_cholmodFactor, &m_cholmod);
+
+ // If the factorization failed, minor is the column at which it did. On success minor == n.
+ this->m_info = (m_cholmodFactor->minor == m_cholmodFactor->n ? Success : NumericalIssue);
+ m_factorizationIsOk = true;
+ }
+
+    /** Returns a reference to Cholmod's configuration structure, providing full control over the performed operations.
+     *  See the Cholmod user guide for details. */
+ cholmod_common& cholmod() { return m_cholmod; }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solve(const MatrixBase<Rhs> &b, MatrixBase<Dest> &dest) const
+ {
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()");
+ const Index size = m_cholmodFactor->n;
+ EIGEN_UNUSED_VARIABLE(size);
+ eigen_assert(size==b.rows());
+
+ // note: cd stands for Cholmod Dense
+ Rhs& b_ref(b.const_cast_derived());
+ cholmod_dense b_cd = viewAsCholmod(b_ref);
+ cholmod_dense* x_cd = cholmod_solve(CHOLMOD_A, m_cholmodFactor, &b_cd, &m_cholmod);
+ if(!x_cd)
+ {
+ this->m_info = NumericalIssue;
+ }
+ // TODO optimize this copy by swapping when possible (be careful with alignment, etc.)
+ dest = Matrix<Scalar,Dest::RowsAtCompileTime,Dest::ColsAtCompileTime>::Map(reinterpret_cast<Scalar*>(x_cd->x),b.rows(),b.cols());
+ cholmod_free_dense(&x_cd, &m_cholmod);
+ }
+
+ /** \internal */
+ template<typename RhsScalar, int RhsOptions, typename RhsIndex, typename DestScalar, int DestOptions, typename DestIndex>
+ void _solve(const SparseMatrix<RhsScalar,RhsOptions,RhsIndex> &b, SparseMatrix<DestScalar,DestOptions,DestIndex> &dest) const
+ {
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or symbolic()/numeric()");
+ const Index size = m_cholmodFactor->n;
+ EIGEN_UNUSED_VARIABLE(size);
+ eigen_assert(size==b.rows());
+
+ // note: cs stands for Cholmod Sparse
+ cholmod_sparse b_cs = viewAsCholmod(b);
+ cholmod_sparse* x_cs = cholmod_spsolve(CHOLMOD_A, m_cholmodFactor, &b_cs, &m_cholmod);
+ if(!x_cs)
+ {
+ this->m_info = NumericalIssue;
+ }
+ // TODO optimize this copy by swapping when possible (be careful with alignment, etc.)
+ dest = viewAsEigen<DestScalar,DestOptions,DestIndex>(*x_cs);
+ cholmod_free_sparse(&x_cs, &m_cholmod);
+ }
+ #endif // EIGEN_PARSED_BY_DOXYGEN
+
+
+ /** Sets the shift parameter that will be used to adjust the diagonal coefficients during the numerical factorization.
+ *
+ * During the numerical factorization, an offset term is added to the diagonal coefficients:\n
+ * \c d_ii = \a offset + \c d_ii
+ *
+ * The default is \a offset=0.
+ *
+ * \returns a reference to \c *this.
+ */
+ Derived& setShift(const RealScalar& offset)
+ {
+ m_shiftOffset[0] = offset;
+ return derived();
+ }
+
+ template<typename Stream>
+ void dumpMemory(Stream& /*s*/)
+ {}
+
+ protected:
+ mutable cholmod_common m_cholmod;
+ cholmod_factor* m_cholmodFactor;
+ RealScalar m_shiftOffset[2];
+ mutable ComputationInfo m_info;
+ bool m_isInitialized;
+ int m_factorizationIsOk;
+ int m_analysisIsOk;
+};
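+
+// Usage sketch (not part of the upstream header): a minimal workflow for the CholmodBase-derived
+// solvers defined below, assuming SuiteSparse/CHOLMOD is installed and a sparse SPD system is at hand.
+// The identifiers A and b are illustrative placeholders, not names from this file.
+//
+//   #include <Eigen/Sparse>
+//   #include <Eigen/CholmodSupport>
+//
+//   void solve_once(const Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
+//   {
+//     Eigen::CholmodSupernodalLLT<Eigen::SparseMatrix<double> > solver;
+//     solver.setShift(1e-10);     // optional: small diagonal offset, see setShift() above
+//     solver.analyzePattern(A);   // symbolic step, reusable while the sparsity pattern is fixed
+//     solver.factorize(A);        // numeric step; info() reports NumericalIssue on failure
+//     if (solver.info() == Eigen::Success)
+//       Eigen::VectorXd x = solver.solve(b);
+//   }
+//
+// Calling factorize() again with a matrix that has the same sparsity pattern reuses the symbolic
+// analysis, which is the main reason the two steps are exposed separately.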
+
+/** \ingroup CholmodSupport_Module
+ * \class CholmodSimplicialLLT
+ * \brief A simplicial direct Cholesky (LLT) factorization and solver based on Cholmod
+ *
+ * This class allows solving A.X = B sparse linear problems via a simplicial LL^T Cholesky factorization
+ * using the Cholmod library.
+ * This simplicial variant is equivalent to Eigen's built-in SimplicialLLT class. Therefore, it has little practical interest.
+ * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices
+ * X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower
+ * or Upper. Default is Lower.
+ *
+ * This class supports all kinds of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed.
+ *
+ * \sa \ref TutorialSparseDirectSolvers, class CholmodSupernodalLLT, class SimplicialLLT
+ */
+template<typename _MatrixType, int _UpLo = Lower>
+class CholmodSimplicialLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLLT<_MatrixType, _UpLo> >
+{
+ typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLLT> Base;
+ using Base::m_cholmod;
+
+ public:
+
+ typedef _MatrixType MatrixType;
+
+ CholmodSimplicialLLT() : Base() { init(); }
+
+ CholmodSimplicialLLT(const MatrixType& matrix) : Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ ~CholmodSimplicialLLT() {}
+ protected:
+ void init()
+ {
+ m_cholmod.final_asis = 0;
+ m_cholmod.supernodal = CHOLMOD_SIMPLICIAL;
+ m_cholmod.final_ll = 1;
+ }
+};
+
+
+/** \ingroup CholmodSupport_Module
+ * \class CholmodSimplicialLDLT
+ * \brief A simplicial direct Cholesky (LDLT) factorization and solver based on Cholmod
+ *
+ * This class allows solving A.X = B sparse linear problems via a simplicial LDL^T Cholesky factorization
+ * using the Cholmod library.
+ * This simplicial variant is equivalent to Eigen's built-in SimplicialLDLT class. Therefore, it has little practical interest.
+ * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices
+ * X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower
+ * or Upper. Default is Lower.
+ *
+ * This class supports all kinds of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed.
+ *
+ * \sa \ref TutorialSparseDirectSolvers, class CholmodSupernodalLLT, class SimplicialLDLT
+ */
+template<typename _MatrixType, int _UpLo = Lower>
+class CholmodSimplicialLDLT : public CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLDLT<_MatrixType, _UpLo> >
+{
+ typedef CholmodBase<_MatrixType, _UpLo, CholmodSimplicialLDLT> Base;
+ using Base::m_cholmod;
+
+ public:
+
+ typedef _MatrixType MatrixType;
+
+ CholmodSimplicialLDLT() : Base() { init(); }
+
+ CholmodSimplicialLDLT(const MatrixType& matrix) : Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ ~CholmodSimplicialLDLT() {}
+ protected:
+ void init()
+ {
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_SIMPLICIAL;
+ }
+};
+
+/** \ingroup CholmodSupport_Module
+ * \class CholmodSupernodalLLT
+ * \brief A supernodal Cholesky (LLT) factorization and solver based on Cholmod
+ *
+ * This class allows solving A.X = B sparse linear problems via a supernodal LL^T Cholesky factorization
+ * using the Cholmod library.
+ * This supernodal variant performs best on dense enough problems, e.g., 3D FEM, or very high order 2D FEM.
+ * The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices
+ * X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower
+ * or Upper. Default is Lower.
+ *
+ * This class supports all kinds of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed.
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ */
+template<typename _MatrixType, int _UpLo = Lower>
+class CholmodSupernodalLLT : public CholmodBase<_MatrixType, _UpLo, CholmodSupernodalLLT<_MatrixType, _UpLo> >
+{
+ typedef CholmodBase<_MatrixType, _UpLo, CholmodSupernodalLLT> Base;
+ using Base::m_cholmod;
+
+ public:
+
+ typedef _MatrixType MatrixType;
+
+ CholmodSupernodalLLT() : Base() { init(); }
+
+ CholmodSupernodalLLT(const MatrixType& matrix) : Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ ~CholmodSupernodalLLT() {}
+ protected:
+ void init()
+ {
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_SUPERNODAL;
+ }
+};
+
+/** \ingroup CholmodSupport_Module
+ * \class CholmodDecomposition
+ * \brief A general Cholesky factorization and solver based on Cholmod
+ *
+ * This class allows solving A.X = B sparse linear problems via an LL^T or LDL^T Cholesky factorization
+ * using the Cholmod library. The sparse matrix A must be selfadjoint and positive definite. The vectors or matrices
+ * X and B can be either dense or sparse.
+ *
+ * This variant permits changing the underlying Cholesky method at runtime.
+ * On the other hand, it does not provide access to the result of the factorization.
+ * The default is to let Cholmod automatically choose between a simplicial and supernodal factorization.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower
+ * or Upper. Default is Lower.
+ *
+ * This class supports all kinds of SparseMatrix<>: row or column major; upper, lower, or both; compressed or non compressed.
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ */
+template<typename _MatrixType, int _UpLo = Lower>
+class CholmodDecomposition : public CholmodBase<_MatrixType, _UpLo, CholmodDecomposition<_MatrixType, _UpLo> >
+{
+ typedef CholmodBase<_MatrixType, _UpLo, CholmodDecomposition> Base;
+ using Base::m_cholmod;
+
+ public:
+
+ typedef _MatrixType MatrixType;
+
+ CholmodDecomposition() : Base() { init(); }
+
+ CholmodDecomposition(const MatrixType& matrix) : Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ ~CholmodDecomposition() {}
+
+ void setMode(CholmodMode mode)
+ {
+ switch(mode)
+ {
+ case CholmodAuto:
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_AUTO;
+ break;
+ case CholmodSimplicialLLt:
+ m_cholmod.final_asis = 0;
+ m_cholmod.supernodal = CHOLMOD_SIMPLICIAL;
+ m_cholmod.final_ll = 1;
+ break;
+ case CholmodSupernodalLLt:
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_SUPERNODAL;
+ break;
+ case CholmodLDLt:
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_SIMPLICIAL;
+ break;
+ default:
+ break;
+ }
+ }
+ protected:
+ void init()
+ {
+ m_cholmod.final_asis = 1;
+ m_cholmod.supernodal = CHOLMOD_AUTO;
+ }
+};
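+
+// Usage sketch (not part of the upstream header): CholmodDecomposition defers the choice of
+// factorization to runtime via setMode(). A and b are illustrative placeholders.
+//
+//   Eigen::CholmodDecomposition<Eigen::SparseMatrix<double> > chol;
+//   chol.setMode(Eigen::CholmodSupernodalLLt);  // or CholmodAuto, CholmodSimplicialLLt, CholmodLDLt
+//   chol.compute(A);                            // analyzePattern() + factorize() in one call
+//   Eigen::VectorXd x = chol.solve(b);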
+
+namespace internal {
+
+template<typename _MatrixType, int _UpLo, typename Derived, typename Rhs>
+struct solve_retval<CholmodBase<_MatrixType,_UpLo,Derived>, Rhs>
+ : solve_retval_base<CholmodBase<_MatrixType,_UpLo,Derived>, Rhs>
+{
+ typedef CholmodBase<_MatrixType,_UpLo,Derived> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+template<typename _MatrixType, int _UpLo, typename Derived, typename Rhs>
+struct sparse_solve_retval<CholmodBase<_MatrixType,_UpLo,Derived>, Rhs>
+ : sparse_solve_retval_base<CholmodBase<_MatrixType,_UpLo,Derived>, Rhs>
+{
+ typedef CholmodBase<_MatrixType,_UpLo,Derived> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CHOLMODSUPPORT_H
diff --git a/third_party/eigen3/Eigen/src/Core/Array.h b/third_party/eigen3/Eigen/src/Core/Array.h
new file mode 100644
index 0000000000..28d6f14434
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Array.h
@@ -0,0 +1,338 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ARRAY_H
+#define EIGEN_ARRAY_H
+
+namespace Eigen {
+
+/** \class Array
+ * \ingroup Core_Module
+ *
+ * \brief General-purpose arrays with easy API for coefficient-wise operations
+ *
+ * The %Array class is very similar to the Matrix class. It provides
+ * general-purpose one- and two-dimensional arrays. The difference between the
+ * %Array and the %Matrix class is primarily in the API: the API for the
+ * %Array class provides easy access to coefficient-wise operations, while the
+ * API for the %Matrix class provides easy access to linear-algebra
+ * operations.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_ARRAY_PLUGIN.
+ *
+ * \sa \ref TutorialArrayClass, \ref TopicClassHierarchy
+ */
+namespace internal {
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct traits<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> > : traits<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+{
+ typedef ArrayXpr XprKind;
+ typedef ArrayBase<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> > XprBase;
+};
+}
+
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+class Array
+ : public PlainObjectBase<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+{
+ public:
+
+ typedef PlainObjectBase<Array> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Array)
+
+ enum { Options = _Options };
+ typedef typename Base::PlainObject PlainObject;
+
+ protected:
+ template <typename Derived, typename OtherDerived, bool IsVector>
+ friend struct internal::conservative_resize_like_impl;
+
+ using Base::m_storage;
+
+ public:
+
+ using Base::base;
+ using Base::coeff;
+ using Base::coeffRef;
+
+ /**
+ * The usage of
+ * using Base::operator=;
+ * fails on MSVC. Since the code below works with both GCC and MSVC, we skip
+ * the 'using' declaration. This should be done only for operator=.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array& operator=(const EigenBase<OtherDerived> &other)
+ {
+ return Base::operator=(other);
+ }
+
+ /** Copies the value of the expression \a other into \c *this with automatic resizing.
+ *
+ * *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized),
+ * it will be initialized.
+ *
+ * Note that copying a row-vector into a vector (and conversely) is allowed.
+ * The resizing, if any, is then done in the appropriate way so that row-vectors
+ * remain row-vectors and vectors remain vectors.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array& operator=(const ArrayBase<OtherDerived>& other)
+ {
+ return Base::_set(other);
+ }
+
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array& operator=(const Array& other)
+ {
+ return Base::_set(other);
+ }
+
+ /** Default constructor.
+ *
+ * For fixed-size arrays, does nothing.
+ *
+ * For dynamic-size arrays, creates an empty array of size 0 without allocating any storage. Such an array
+ * is called a null array. This constructor is the only way to create null arrays: resizing
+ * an array to 0 is not supported.
+ *
+ * \sa resize(Index,Index)
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array() : Base()
+ {
+ Base::_check_template_params();
+ EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ // FIXME is it still needed ??
+ /** \internal */
+ EIGEN_DEVICE_FUNC
+ Array(internal::constructor_without_unaligned_array_assert)
+ : Base(internal::constructor_without_unaligned_array_assert())
+ {
+ Base::_check_template_params();
+ EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+#endif
+
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ Array(Array&& other)
+ : Base(std::move(other))
+ {
+ Base::_check_template_params();
+ if (RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic)
+ Base::_set_noalias(other);
+ }
+ Array& operator=(Array&& other)
+ {
+ other.swap(*this);
+ return *this;
+ }
+#endif
+
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename T>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE explicit Array(const T& x)
+ {
+ Base::_check_template_params();
+ Base::template _init1<T>(x);
+ }
+
+ template<typename T0, typename T1>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const T0& val0, const T1& val1)
+ {
+ Base::_check_template_params();
+ this->template _init2<T0,T1>(val0, val1);
+ }
+ #else
+ /** \brief Constructs a fixed-sized array initialized with coefficients starting at \a data */
+ EIGEN_DEVICE_FUNC explicit Array(const Scalar *data);
+ /** Constructs a vector or row-vector with given dimension. \only_for_vectors
+ *
+ * Note that this is only useful for dynamic-size vectors. For fixed-size vectors,
+ * it is redundant to pass the dimension here, so it makes more sense to use the default
+ * constructor Array() instead.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE explicit Array(Index dim);
+ /** constructs an initialized 1x1 Array with the given coefficient */
+ Array(const Scalar& value);
+ /** constructs an uninitialized array with \a rows rows and \a cols columns.
+ *
+ * This is useful for dynamic-size arrays. For fixed-size arrays,
+ * it is redundant to pass these parameters, so one should use the default constructor
+ * Array() instead. */
+ Array(Index rows, Index cols);
+ /** constructs an initialized 2D vector with given coefficients */
+ Array(const Scalar& val0, const Scalar& val1);
+ #endif
+
+ /** constructs an initialized 3D vector with given coefficients */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2)
+ {
+ Base::_check_template_params();
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Array, 3)
+ m_storage.data()[0] = val0;
+ m_storage.data()[1] = val1;
+ m_storage.data()[2] = val2;
+ }
+ /** constructs an initialized 4D vector with given coefficients */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const Scalar& val0, const Scalar& val1, const Scalar& val2, const Scalar& val3)
+ {
+ Base::_check_template_params();
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Array, 4)
+ m_storage.data()[0] = val0;
+ m_storage.data()[1] = val1;
+ m_storage.data()[2] = val2;
+ m_storage.data()[3] = val3;
+ }
+
+ /** Constructor copying the value of the expression \a other */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const ArrayBase<OtherDerived>& other)
+ : Base(other.rows() * other.cols(), other.rows(), other.cols())
+ {
+ Base::_check_template_params();
+ Base::_set_noalias(other);
+ }
+ /** Copy constructor */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const Array& other)
+ : Base(other.rows() * other.cols(), other.rows(), other.cols())
+ {
+ Base::_check_template_params();
+ Base::_set_noalias(other);
+ }
+ /** Copy constructor with in-place evaluation */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const ReturnByValue<OtherDerived>& other)
+ {
+ Base::_check_template_params();
+ Base::resize(other.rows(), other.cols());
+ other.evalTo(*this);
+ }
+
+ /** \sa MatrixBase::operator=(const EigenBase<OtherDerived>&) */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Array(const EigenBase<OtherDerived> &other)
+ : Base(other.derived().rows() * other.derived().cols(), other.derived().rows(), other.derived().cols())
+ {
+ Base::_check_template_params();
+ Base::_resize_to_match(other);
+ *this = other;
+ }
+
+ /** Override MatrixBase::swap() since for dynamic-sized arrays of the same type it is enough to swap the
+ * data pointers.
+ */
+ template<typename OtherDerived>
+ void swap(ArrayBase<OtherDerived> const & other)
+ { this->_swap(other.derived()); }
+
+ EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
+ EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
+
+ #ifdef EIGEN_ARRAY_PLUGIN
+ #include EIGEN_ARRAY_PLUGIN
+ #endif
+
+ private:
+
+ template<typename MatrixType, typename OtherDerived, bool SwapPointers>
+ friend struct internal::matrix_swap_impl;
+};
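+
+// Usage sketch (not part of the upstream header): the Array API mirrors Matrix construction, but all
+// operators act coefficient-wise.
+//
+//   Eigen::Array33d a = Eigen::Array33d::Constant(2.0);
+//   Eigen::Array33d b = Eigen::Array33d::Random();
+//   Eigen::Array33d c = a * b + b.sin();   // coefficient-wise product, addition and sine
+//   Eigen::Matrix3d m = c.matrix();        // reinterpret as a matrix when linear algebra is needed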
+
+/** \defgroup arraytypedefs Global array typedefs
+ * \ingroup Core_Module
+ *
+ * Eigen defines several typedef shortcuts for the most common 1D and 2D array types.
+ *
+ * The general patterns are the following:
+ *
+ * \c ArrayRowsColsType where \c Rows and \c Cols can be \c 2, \c 3, \c 4 for fixed-size square arrays or \c X for dynamic size,
+ * and where \c Type can be \c i for integer, \c f for float, \c d for double, \c cf for complex float, \c cd
+ * for complex double.
+ *
+ * For example, \c Array33d is a fixed-size 3x3 array of doubles, and \c ArrayXXf is a dynamic-size array of floats.
+ *
+ * There are also \c ArraySizeType which are self-explanatory. For example, \c Array4cf is
+ * a fixed-size 1D array of 4 complex floats.
+ *
+ * \sa class Array
+ */
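+
+// For illustration (not part of the upstream header), the macros below expand to typedefs such as:
+//
+//   Eigen::Array22f   // Array<float, 2, 2>
+//   Eigen::Array3i    // Array<int, 3, 1>
+//   Eigen::ArrayXXd   // Array<double, Dynamic, Dynamic>
+//   Eigen::Array4Xcf  // Array<std::complex<float>, 4, Dynamic>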
+
+#define EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \
+/** \ingroup arraytypedefs */ \
+typedef Array<Type, Size, Size> Array##SizeSuffix##SizeSuffix##TypeSuffix; \
+/** \ingroup arraytypedefs */ \
+typedef Array<Type, Size, 1> Array##SizeSuffix##TypeSuffix;
+
+#define EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, Size) \
+/** \ingroup arraytypedefs */ \
+typedef Array<Type, Size, Dynamic> Array##Size##X##TypeSuffix; \
+/** \ingroup arraytypedefs */ \
+typedef Array<Type, Dynamic, Size> Array##X##Size##TypeSuffix;
+
+#define EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \
+EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 2, 2) \
+EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 3, 3) \
+EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, 4, 4) \
+EIGEN_MAKE_ARRAY_TYPEDEFS(Type, TypeSuffix, Dynamic, X) \
+EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 2) \
+EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 3) \
+EIGEN_MAKE_ARRAY_FIXED_TYPEDEFS(Type, TypeSuffix, 4)
+
+EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(int, i)
+EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(float, f)
+EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(double, d)
+EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(std::complex<float>, cf)
+EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES(std::complex<double>, cd)
+
+#undef EIGEN_MAKE_ARRAY_TYPEDEFS_ALL_SIZES
+#undef EIGEN_MAKE_ARRAY_TYPEDEFS
+
+#undef EIGEN_MAKE_ARRAY_TYPEDEFS_LARGE
+
+#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, SizeSuffix) \
+using Eigen::Matrix##SizeSuffix##TypeSuffix; \
+using Eigen::Vector##SizeSuffix##TypeSuffix; \
+using Eigen::RowVector##SizeSuffix##TypeSuffix;
+
+#define EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(TypeSuffix) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 2) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 3) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, 4) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE_AND_SIZE(TypeSuffix, X) \
+
+#define EIGEN_USING_ARRAY_TYPEDEFS \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(i) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(f) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(d) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(cf) \
+EIGEN_USING_ARRAY_TYPEDEFS_FOR_TYPE(cd)
+
+} // end namespace Eigen
+
+#endif // EIGEN_ARRAY_H
diff --git a/third_party/eigen3/Eigen/src/Core/ArrayBase.h b/third_party/eigen3/Eigen/src/Core/ArrayBase.h
new file mode 100644
index 0000000000..2c9ace4a77
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ArrayBase.h
@@ -0,0 +1,238 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ARRAYBASE_H
+#define EIGEN_ARRAYBASE_H
+
+namespace Eigen {
+
+template<typename ExpressionType> class MatrixWrapper;
+
+/** \class ArrayBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for all 1D and 2D arrays, and related expressions
+ *
+ * An array is similar to a dense vector or matrix. While matrices are mathematical
+ * objects with well-defined linear algebra operators, an array is just a collection
+ * of scalar values arranged in a one- or two-dimensional fashion. As a consequence,
+ * all operations applied to an array are performed coefficient-wise. Furthermore,
+ * arrays support the scalar math functions of the C++ standard library (e.g., std::sin(x)), and convenient
+ * constructors that make it easy to write generic code working for both scalar values
+ * and arrays.
+ *
+ * This class is the base that is inherited by all array expression types.
+ *
+ * \tparam Derived is the derived type, e.g., an array or an expression type.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_ARRAYBASE_PLUGIN.
+ *
+ * \sa class MatrixBase, \ref TopicClassHierarchy
+ */
+template<typename Derived> class ArrayBase
+ : public DenseBase<Derived>
+{
+ public:
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** The base class for a given storage type. */
+ typedef ArrayBase StorageBaseType;
+
+ typedef ArrayBase Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl;
+
+ using internal::special_scalar_op_base<Derived,typename internal::traits<Derived>::Scalar,
+ typename NumTraits<typename internal::traits<Derived>::Scalar>::Real>::operator*;
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ typedef DenseBase<Derived> Base;
+ using Base::RowsAtCompileTime;
+ using Base::ColsAtCompileTime;
+ using Base::SizeAtCompileTime;
+ using Base::MaxRowsAtCompileTime;
+ using Base::MaxColsAtCompileTime;
+ using Base::MaxSizeAtCompileTime;
+ using Base::IsVectorAtCompileTime;
+ using Base::Flags;
+ using Base::CoeffReadCost;
+
+ using Base::derived;
+ using Base::const_cast_derived;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::coeff;
+ using Base::coeffRef;
+ using Base::lazyAssign;
+ using Base::operator=;
+ using Base::operator+=;
+ using Base::operator-=;
+ using Base::operator*=;
+ using Base::operator/=;
+
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal the plain array type corresponding to this expression. Note that this is not necessarily
+ * exactly the return type of eval(): in the case of plain arrays, the return type of eval() is a const
+ * reference to an array, not an array! It is however guaranteed that the return type of eval() is either
+ * PlainObject or const PlainObject&.
+ */
+ typedef Array<typename internal::traits<Derived>::Scalar,
+ internal::traits<Derived>::RowsAtCompileTime,
+ internal::traits<Derived>::ColsAtCompileTime,
+ AutoAlign | (internal::traits<Derived>::Flags&RowMajorBit ? RowMajor : ColMajor),
+ internal::traits<Derived>::MaxRowsAtCompileTime,
+ internal::traits<Derived>::MaxColsAtCompileTime
+ > PlainObject;
+
+
+ /** \internal Represents an array with all coefficients equal to one another. */
+ typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>,Derived> ConstantReturnType;
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::ArrayBase
+# include "../plugins/CommonCwiseUnaryOps.h"
+# include "../plugins/MatrixCwiseUnaryOps.h"
+# include "../plugins/ArrayCwiseUnaryOps.h"
+# include "../plugins/CommonCwiseBinaryOps.h"
+# include "../plugins/MatrixCwiseBinaryOps.h"
+# include "../plugins/ArrayCwiseBinaryOps.h"
+# ifdef EIGEN_ARRAYBASE_PLUGIN
+# include EIGEN_ARRAYBASE_PLUGIN
+# endif
+#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
+
+ /** Special case of the template operator=, in order to prevent the compiler
+ * from generating a default operator= (issue hit with g++ 4.1)
+ */
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const ArrayBase& other)
+ {
+ return internal::assign_selector<Derived,Derived>::run(derived(), other.derived());
+ }
+
+ EIGEN_DEVICE_FUNC
+ Derived& operator+=(const Scalar& scalar);
+ EIGEN_DEVICE_FUNC
+ Derived& operator-=(const Scalar& scalar);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator+=(const ArrayBase<OtherDerived>& other);
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator-=(const ArrayBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator*=(const ArrayBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator/=(const ArrayBase<OtherDerived>& other);
+
+ public:
+ EIGEN_DEVICE_FUNC
+ ArrayBase<Derived>& array() { return *this; }
+ EIGEN_DEVICE_FUNC
+ const ArrayBase<Derived>& array() const { return *this; }
+
+ /** \returns an \link Eigen::MatrixBase Matrix \endlink expression of this array
+ * \sa MatrixBase::array() */
+ EIGEN_DEVICE_FUNC
+ MatrixWrapper<Derived> matrix() { return derived(); }
+ EIGEN_DEVICE_FUNC
+ const MatrixWrapper<const Derived> matrix() const { return derived(); }
+
+// template<typename Dest>
+// inline void evalTo(Dest& dst) const { dst = matrix(); }
+
+ protected:
+ EIGEN_DEVICE_FUNC
+ ArrayBase() : Base() {}
+
+ private:
+ explicit ArrayBase(Index);
+ ArrayBase(Index,Index);
+ template<typename OtherDerived> explicit ArrayBase(const ArrayBase<OtherDerived>&);
+ protected:
+ // mixing arrays and matrices is not legal
+ template<typename OtherDerived> Derived& operator+=(const MatrixBase<OtherDerived>& )
+ {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;}
+ // mixing arrays and matrices is not legal
+ template<typename OtherDerived> Derived& operator-=(const MatrixBase<OtherDerived>& )
+ {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;}
+};
+
+/** replaces \c *this by \c *this - \a other.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+ArrayBase<Derived>::operator-=(const ArrayBase<OtherDerived> &other)
+{
+ SelfCwiseBinaryOp<internal::scalar_difference_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
+
+/** replaces \c *this by \c *this + \a other.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+ArrayBase<Derived>::operator+=(const ArrayBase<OtherDerived>& other)
+{
+ SelfCwiseBinaryOp<internal::scalar_sum_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
+
+/** replaces \c *this by \c *this * \a other coefficient-wise.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+ArrayBase<Derived>::operator*=(const ArrayBase<OtherDerived>& other)
+{
+ SelfCwiseBinaryOp<internal::scalar_product_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
+
+/** replaces \c *this by \c *this / \a other coefficient-wise.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+ArrayBase<Derived>::operator/=(const ArrayBase<OtherDerived>& other)
+{
+ SelfCwiseBinaryOp<internal::scalar_quotient_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
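+
+// Usage sketch (not part of the upstream header): the compound operators above update the left-hand
+// side in place, one coefficient at a time, via SelfCwiseBinaryOp.
+//
+//   Eigen::ArrayXd a = Eigen::ArrayXd::LinSpaced(4, 1.0, 4.0);   // 1 2 3 4
+//   Eigen::ArrayXd b = Eigen::ArrayXd::Constant(4, 2.0);
+//   a *= b;   // coefficient-wise: a becomes 2 4 6 8
+//   a /= b;   // back to 1 2 3 4
+//
+// Adding a MatrixBase expression to an array does not compile: it triggers the
+// YOU_CANNOT_MIX_ARRAYS_AND_MATRICES static assertion defined above.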
+
+} // end namespace Eigen
+
+#endif // EIGEN_ARRAYBASE_H
diff --git a/third_party/eigen3/Eigen/src/Core/ArrayWrapper.h b/third_party/eigen3/Eigen/src/Core/ArrayWrapper.h
new file mode 100644
index 0000000000..4bb6480243
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ArrayWrapper.h
@@ -0,0 +1,287 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ARRAYWRAPPER_H
+#define EIGEN_ARRAYWRAPPER_H
+
+namespace Eigen {
+
+/** \class ArrayWrapper
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a mathematical vector or matrix as an array object
+ *
+ * This class is the return type of MatrixBase::array(), and most of the time
+ * this is the only way it is used.
+ *
+ * \sa MatrixBase::array(), class MatrixWrapper
+ */
+
+namespace internal {
+template<typename ExpressionType>
+struct traits<ArrayWrapper<ExpressionType> >
+ : public traits<typename remove_all<typename ExpressionType::Nested>::type >
+{
+ typedef ArrayXpr XprKind;
+};
+}
+
+template<typename ExpressionType>
+class ArrayWrapper : public ArrayBase<ArrayWrapper<ExpressionType> >
+{
+ public:
+ typedef ArrayBase<ArrayWrapper> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(ArrayWrapper)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(ArrayWrapper)
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<ExpressionType>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ typedef typename internal::nested<ExpressionType>::type NestedExpressionType;
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix) : m_expression(matrix) {}
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return m_expression.rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return m_expression.cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return m_expression.outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return m_expression.innerStride(); }
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue* data() { return m_expression.const_cast_derived().data(); }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar* data() const { return m_expression.data(); }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index rowId, Index colId) const
+ {
+ return m_expression.coeff(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index rowId, Index colId)
+ {
+ return m_expression.const_cast_derived().coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return m_expression.const_cast_derived().coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index index) const
+ {
+ return m_expression.coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index rowId, Index colId) const
+ {
+ return m_expression.template packet<LoadMode>(rowId, colId);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(rowId, colId, val);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return m_expression.template packet<LoadMode>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& val)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(index, val);
+ }
+
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void evalTo(Dest& dst) const { dst = m_expression; }
+
+ const typename internal::remove_all<NestedExpressionType>::type&
+ EIGEN_DEVICE_FUNC
+ nestedExpression() const
+ {
+ return m_expression;
+ }
+
+ /** Forwards the resizing request to the nested expression
+ * \sa DenseBase::resize(Index) */
+ EIGEN_DEVICE_FUNC
+ void resize(Index newSize) { m_expression.const_cast_derived().resize(newSize); }
+ /** Forwards the resizing request to the nested expression
+ * \sa DenseBase::resize(Index,Index)*/
+ EIGEN_DEVICE_FUNC
+ void resize(Index nbRows, Index nbCols) { m_expression.const_cast_derived().resize(nbRows,nbCols); }
+
+ protected:
+ NestedExpressionType m_expression;
+};
+
+/** \class MatrixWrapper
+ * \ingroup Core_Module
+ *
+ * \brief Expression of an array as a mathematical vector or matrix
+ *
+ * This class is the return type of ArrayBase::matrix(), and most of the time
+ * this is the only way it is used.
+ *
+ * \sa ArrayBase::matrix(), class ArrayWrapper
+ */
+
+namespace internal {
+template<typename ExpressionType>
+struct traits<MatrixWrapper<ExpressionType> >
+ : public traits<typename remove_all<typename ExpressionType::Nested>::type >
+{
+ typedef MatrixXpr XprKind;
+};
+}
+
+template<typename ExpressionType>
+class MatrixWrapper : public MatrixBase<MatrixWrapper<ExpressionType> >
+{
+ public:
+ typedef MatrixBase<MatrixWrapper<ExpressionType> > Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(MatrixWrapper)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(MatrixWrapper)
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<ExpressionType>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ typedef typename internal::nested<ExpressionType>::type NestedExpressionType;
+
+ EIGEN_DEVICE_FUNC
+ inline MatrixWrapper(ExpressionType& a_matrix) : m_expression(a_matrix) {}
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return m_expression.rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return m_expression.cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return m_expression.outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return m_expression.innerStride(); }
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue* data() { return m_expression.const_cast_derived().data(); }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar* data() const { return m_expression.data(); }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index rowId, Index colId) const
+ {
+ return m_expression.coeff(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index rowId, Index colId)
+ {
+ return m_expression.const_cast_derived().coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return m_expression.derived().coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index index) const
+ {
+ return m_expression.coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index rowId, Index colId) const
+ {
+ return m_expression.template packet<LoadMode>(rowId, colId);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(rowId, colId, val);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return m_expression.template packet<LoadMode>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& val)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(index, val);
+ }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<NestedExpressionType>::type&
+ nestedExpression() const
+ {
+ return m_expression;
+ }
+
+ /** Forwards the resizing request to the nested expression
+ * \sa DenseBase::resize(Index) */
+ EIGEN_DEVICE_FUNC
+ void resize(Index newSize) { m_expression.const_cast_derived().resize(newSize); }
+ /** Forwards the resizing request to the nested expression
+ * \sa DenseBase::resize(Index,Index)*/
+ EIGEN_DEVICE_FUNC
+ void resize(Index nbRows, Index nbCols) { m_expression.const_cast_derived().resize(nbRows,nbCols); }
+
+ protected:
+ NestedExpressionType m_expression;
+};
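+
+// Usage sketch (not part of the upstream header): ArrayWrapper and MatrixWrapper are what .array()
+// and .matrix() return, so the two worlds can be switched without copying any data.
+//
+//   Eigen::MatrixXd m = Eigen::MatrixXd::Random(3, 3);
+//   Eigen::MatrixXd r = (m.array() * m.array()).matrix() * m;   // coefficient-wise square, then a matrix product
+//   m.array() += 1.0;                                           // the wrapper is writable, so this updates m itself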
+
+} // end namespace Eigen
+
+#endif // EIGEN_ARRAYWRAPPER_H
diff --git a/third_party/eigen3/Eigen/src/Core/Assign.h b/third_party/eigen3/Eigen/src/Core/Assign.h
new file mode 100644
index 0000000000..07da2fe31d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Assign.h
@@ -0,0 +1,622 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007 Michael Olbrich <michael.olbrich@gmx.net>
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ASSIGN_H
+#define EIGEN_ASSIGN_H
+
+namespace Eigen {
+
+namespace internal {
+
+/***************************************************************************
+* Part 1 : the logic deciding a strategy for traversal and unrolling *
+***************************************************************************/
+
+template <typename Derived, typename OtherDerived>
+struct assign_traits
+{
+public:
+ enum {
+ DstIsAligned = Derived::Flags & AlignedBit,
+ DstHasDirectAccess = Derived::Flags & DirectAccessBit,
+ SrcIsAligned = OtherDerived::Flags & AlignedBit,
+ JointAlignment = bool(DstIsAligned) && bool(SrcIsAligned) ? Aligned : Unaligned
+ };
+
+private:
+ enum {
+ InnerSize = int(Derived::IsVectorAtCompileTime) ? int(Derived::SizeAtCompileTime)
+ : int(Derived::Flags)&RowMajorBit ? int(Derived::ColsAtCompileTime)
+ : int(Derived::RowsAtCompileTime),
+ InnerMaxSize = int(Derived::IsVectorAtCompileTime) ? int(Derived::MaxSizeAtCompileTime)
+ : int(Derived::Flags)&RowMajorBit ? int(Derived::MaxColsAtCompileTime)
+ : int(Derived::MaxRowsAtCompileTime),
+ MaxSizeAtCompileTime = Derived::SizeAtCompileTime,
+ PacketSize = packet_traits<typename Derived::Scalar>::size
+ };
+
+ enum {
+ StorageOrdersAgree = (int(Derived::IsRowMajor) == int(OtherDerived::IsRowMajor)),
+ MightVectorize = StorageOrdersAgree
+ && (int(Derived::Flags) & int(OtherDerived::Flags) & ActualPacketAccessBit),
+ MayInnerVectorize = MightVectorize && int(InnerSize)!=Dynamic && int(InnerSize)%int(PacketSize)==0
+ && int(DstIsAligned) && int(SrcIsAligned),
+ MayLinearize = StorageOrdersAgree && (int(Derived::Flags) & int(OtherDerived::Flags) & LinearAccessBit),
+ MayLinearVectorize = MightVectorize && MayLinearize && DstHasDirectAccess
+ && (DstIsAligned || MaxSizeAtCompileTime == Dynamic),
+ /* If the destination isn't aligned, we have to do runtime checks and we don't unroll,
+ so it's only good for large enough sizes. */
+ MaySliceVectorize = MightVectorize && DstHasDirectAccess
+ && (int(InnerMaxSize)==Dynamic || int(InnerMaxSize)>=3*PacketSize)
+ /* slice vectorization can be slow, so we only want it if the slices are big, which is
+ indicated by InnerMaxSize rather than InnerSize; think of the case of a dynamic block
+ in a fixed-size matrix */
+ };
+
+public:
+ enum {
+ Traversal = int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
+ : int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
+ : int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
+ : int(MayLinearize) ? int(LinearTraversal)
+ : int(DefaultTraversal),
+ Vectorized = int(Traversal) == InnerVectorizedTraversal
+ || int(Traversal) == LinearVectorizedTraversal
+ || int(Traversal) == SliceVectorizedTraversal
+ };
+
+private:
+ enum {
+ UnrollingLimit = EIGEN_UNROLLING_LIMIT * (Vectorized ? int(PacketSize) : 1),
+ MayUnrollCompletely = int(Derived::SizeAtCompileTime) != Dynamic
+ && int(OtherDerived::CoeffReadCost) != Dynamic
+ && int(Derived::SizeAtCompileTime) * int(OtherDerived::CoeffReadCost) <= int(UnrollingLimit),
+ MayUnrollInner = int(InnerSize) != Dynamic
+ && int(OtherDerived::CoeffReadCost) != Dynamic
+ && int(InnerSize) * int(OtherDerived::CoeffReadCost) <= int(UnrollingLimit)
+ };
+
+public:
+ enum {
+ Unrolling = (int(Traversal) == int(InnerVectorizedTraversal) || int(Traversal) == int(DefaultTraversal))
+ ? (
+ int(MayUnrollCompletely) ? int(CompleteUnrolling)
+ : int(MayUnrollInner) ? int(InnerUnrolling)
+ : int(NoUnrolling)
+ )
+ : int(Traversal) == int(LinearVectorizedTraversal)
+ ? ( bool(MayUnrollCompletely) && bool(DstIsAligned) ? int(CompleteUnrolling) : int(NoUnrolling) )
+ : int(Traversal) == int(LinearTraversal)
+ ? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling) : int(NoUnrolling) )
+ : int(NoUnrolling)
+ };
+
+#ifdef EIGEN_DEBUG_ASSIGN
+ static void debug()
+ {
+ EIGEN_DEBUG_VAR(DstIsAligned)
+ EIGEN_DEBUG_VAR(SrcIsAligned)
+ EIGEN_DEBUG_VAR(JointAlignment)
+ EIGEN_DEBUG_VAR(Derived::SizeAtCompileTime)
+ EIGEN_DEBUG_VAR(OtherDerived::CoeffReadCost)
+ EIGEN_DEBUG_VAR(InnerSize)
+ EIGEN_DEBUG_VAR(InnerMaxSize)
+ EIGEN_DEBUG_VAR(PacketSize)
+ EIGEN_DEBUG_VAR(StorageOrdersAgree)
+ EIGEN_DEBUG_VAR(MightVectorize)
+ EIGEN_DEBUG_VAR(MayLinearize)
+ EIGEN_DEBUG_VAR(MayInnerVectorize)
+ EIGEN_DEBUG_VAR(MayLinearVectorize)
+ EIGEN_DEBUG_VAR(MaySliceVectorize)
+ EIGEN_DEBUG_VAR(Traversal)
+ EIGEN_DEBUG_VAR(UnrollingLimit)
+ EIGEN_DEBUG_VAR(MayUnrollCompletely)
+ EIGEN_DEBUG_VAR(MayUnrollInner)
+ EIGEN_DEBUG_VAR(Unrolling)
+ }
+#endif
+};
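+
+// For illustration (not part of the upstream header): the traits above are compile-time only, so the
+// chosen strategy can be inspected directly. With two aligned, same-major, fixed-size objects whose
+// inner size is a multiple of the packet size, inner vectorization is expected:
+//
+//   typedef Eigen::Matrix4f Dst;
+//   typedef Eigen::Matrix4f Src;
+//   const int t = Eigen::internal::assign_traits<Dst, Src>::Traversal;   // typically InnerVectorizedTraversal on SSE
+//
+// Defining EIGEN_DEBUG_ASSIGN before including Eigen makes each assignment print every term of this
+// decision via the debug() member below.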
+
+/***************************************************************************
+* Part 2 : meta-unrollers
+***************************************************************************/
+
+/************************
+*** Default traversal ***
+************************/
+
+template<typename Derived1, typename Derived2, int Index, int Stop>
+struct assign_DefaultTraversal_CompleteUnrolling
+{
+ enum {
+ outer = Index / Derived1::InnerSizeAtCompileTime,
+ inner = Index % Derived1::InnerSizeAtCompileTime
+ };
+
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ dst.copyCoeffByOuterInner(outer, inner, src);
+ assign_DefaultTraversal_CompleteUnrolling<Derived1, Derived2, Index+1, Stop>::run(dst, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Stop>
+struct assign_DefaultTraversal_CompleteUnrolling<Derived1, Derived2, Stop, Stop>
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &, const Derived2 &) {}
+};
+
+template<typename Derived1, typename Derived2, int Index, int Stop>
+struct assign_DefaultTraversal_InnerUnrolling
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src, typename Derived1::Index outer)
+ {
+ dst.copyCoeffByOuterInner(outer, Index, src);
+ assign_DefaultTraversal_InnerUnrolling<Derived1, Derived2, Index+1, Stop>::run(dst, src, outer);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Stop>
+struct assign_DefaultTraversal_InnerUnrolling<Derived1, Derived2, Stop, Stop>
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &, const Derived2 &, typename Derived1::Index) {}
+};
+
+/***********************
+*** Linear traversal ***
+***********************/
+
+template<typename Derived1, typename Derived2, int Index, int Stop>
+struct assign_LinearTraversal_CompleteUnrolling
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ dst.copyCoeff(Index, src);
+ assign_LinearTraversal_CompleteUnrolling<Derived1, Derived2, Index+1, Stop>::run(dst, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Stop>
+struct assign_LinearTraversal_CompleteUnrolling<Derived1, Derived2, Stop, Stop>
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &, const Derived2 &) {}
+};
+
+/**************************
+*** Inner vectorization ***
+**************************/
+
+template<typename Derived1, typename Derived2, int Index, int Stop>
+struct assign_innervec_CompleteUnrolling
+{
+ enum {
+ outer = Index / Derived1::InnerSizeAtCompileTime,
+ inner = Index % Derived1::InnerSizeAtCompileTime,
+ JointAlignment = assign_traits<Derived1,Derived2>::JointAlignment
+ };
+
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ dst.template copyPacketByOuterInner<Derived2, Aligned, JointAlignment>(outer, inner, src);
+ assign_innervec_CompleteUnrolling<Derived1, Derived2,
+ Index+packet_traits<typename Derived1::Scalar>::size, Stop>::run(dst, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Stop>
+struct assign_innervec_CompleteUnrolling<Derived1, Derived2, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Derived1 &, const Derived2 &) {}
+};
+
+template<typename Derived1, typename Derived2, int Index, int Stop>
+struct assign_innervec_InnerUnrolling
+{
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src, typename Derived1::Index outer)
+ {
+ dst.template copyPacketByOuterInner<Derived2, Aligned, Aligned>(outer, Index, src);
+ assign_innervec_InnerUnrolling<Derived1, Derived2,
+ Index+packet_traits<typename Derived1::Scalar>::size, Stop>::run(dst, src, outer);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Stop>
+struct assign_innervec_InnerUnrolling<Derived1, Derived2, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Derived1 &, const Derived2 &, typename Derived1::Index) {}
+};
+
+/***************************************************************************
+* Part 3 : implementation of all cases
+***************************************************************************/
+
+template<typename Derived1, typename Derived2,
+ int Traversal = assign_traits<Derived1, Derived2>::Traversal,
+ int Unrolling = assign_traits<Derived1, Derived2>::Unrolling,
+ int Version = Specialized>
+struct assign_impl;
+
+/************************
+*** Default traversal ***
+************************/
+
+template<typename Derived1, typename Derived2, int Unrolling, int Version>
+struct assign_impl<Derived1, Derived2, InvalidTraversal, Unrolling, Version>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &, const Derived2 &) { }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, DefaultTraversal, NoUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index innerSize = dst.innerSize();
+ const Index outerSize = dst.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer)
+ for(Index inner = 0; inner < innerSize; ++inner)
+ dst.copyCoeffByOuterInner(outer, inner, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, DefaultTraversal, CompleteUnrolling, Version>
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ assign_DefaultTraversal_CompleteUnrolling<Derived1, Derived2, 0, Derived1::SizeAtCompileTime>
+ ::run(dst, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, DefaultTraversal, InnerUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index outerSize = dst.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer)
+ assign_DefaultTraversal_InnerUnrolling<Derived1, Derived2, 0, Derived1::InnerSizeAtCompileTime>
+ ::run(dst, src, outer);
+ }
+};
+
+/***********************
+*** Linear traversal ***
+***********************/
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, LinearTraversal, NoUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index size = dst.size();
+ for(Index i = 0; i < size; ++i)
+ dst.copyCoeff(i, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, LinearTraversal, CompleteUnrolling, Version>
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ assign_LinearTraversal_CompleteUnrolling<Derived1, Derived2, 0, Derived1::SizeAtCompileTime>
+ ::run(dst, src);
+ }
+};
+
+/**************************
+*** Inner vectorization ***
+**************************/
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, InnerVectorizedTraversal, NoUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index innerSize = dst.innerSize();
+ const Index outerSize = dst.outerSize();
+ const Index packetSize = packet_traits<typename Derived1::Scalar>::size;
+ for(Index outer = 0; outer < outerSize; ++outer)
+ for(Index inner = 0; inner < innerSize; inner+=packetSize)
+ dst.template copyPacketByOuterInner<Derived2, Aligned, Aligned>(outer, inner, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, InnerVectorizedTraversal, CompleteUnrolling, Version>
+{
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ assign_innervec_CompleteUnrolling<Derived1, Derived2, 0, Derived1::SizeAtCompileTime>
+ ::run(dst, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, InnerVectorizedTraversal, InnerUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index outerSize = dst.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer)
+ assign_innervec_InnerUnrolling<Derived1, Derived2, 0, Derived1::InnerSizeAtCompileTime>
+ ::run(dst, src, outer);
+ }
+};
+
+/***************************
+*** Linear vectorization ***
+***************************/
+
+template <bool IsAligned = false>
+struct unaligned_assign_impl
+{
+ template <typename Derived, typename OtherDerived>
+ static EIGEN_STRONG_INLINE void run(const Derived&, OtherDerived&, typename Derived::Index, typename Derived::Index) {}
+};
+
+template <>
+struct unaligned_assign_impl<false>
+{
+ // MSVC must not inline these functions. If it does, it fails to optimize the
+ // packet access path.
+#ifdef _MSC_VER
+ template <typename Derived, typename OtherDerived>
+ static EIGEN_DONT_INLINE void run(const Derived& src, OtherDerived& dst, typename Derived::Index start, typename Derived::Index end)
+#else
+ template <typename Derived, typename OtherDerived>
+ static EIGEN_STRONG_INLINE void run(const Derived& src, OtherDerived& dst, typename Derived::Index start, typename Derived::Index end)
+#endif
+ {
+ for (typename Derived::Index index = start; index < end; ++index)
+ dst.copyCoeff(index, src);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, LinearVectorizedTraversal, NoUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ const Index size = dst.size();
+ typedef packet_traits<typename Derived1::Scalar> PacketTraits;
+ enum {
+ packetSize = PacketTraits::size,
+ dstAlignment = PacketTraits::AlignedOnScalar ? Aligned : int(assign_traits<Derived1,Derived2>::DstIsAligned) ,
+ srcAlignment = assign_traits<Derived1,Derived2>::JointAlignment
+ };
+ const Index alignedStart = assign_traits<Derived1,Derived2>::DstIsAligned ? 0
+ : internal::first_aligned(&dst.coeffRef(0), size);
+ const Index alignedEnd = alignedStart + ((size-alignedStart)/packetSize)*packetSize;
+
+ unaligned_assign_impl<assign_traits<Derived1,Derived2>::DstIsAligned!=0>::run(src,dst,0,alignedStart);
+
+ for(Index index = alignedStart; index < alignedEnd; index += packetSize)
+ {
+ dst.template copyPacket<Derived2, dstAlignment, srcAlignment>(index, src);
+ }
+
+ unaligned_assign_impl<>::run(src,dst,alignedEnd,size);
+ }
+};
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, LinearVectorizedTraversal, CompleteUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ static EIGEN_STRONG_INLINE void run(Derived1 &dst, const Derived2 &src)
+ {
+ enum { size = Derived1::SizeAtCompileTime,
+ packetSize = packet_traits<typename Derived1::Scalar>::size,
+ alignedSize = (size/packetSize)*packetSize };
+
+ assign_innervec_CompleteUnrolling<Derived1, Derived2, 0, alignedSize>::run(dst, src);
+ assign_DefaultTraversal_CompleteUnrolling<Derived1, Derived2, alignedSize, size>::run(dst, src);
+ }
+};
+
+/**************************
+*** Slice vectorization ***
+***************************/
+
+template<typename Derived1, typename Derived2, int Version>
+struct assign_impl<Derived1, Derived2, SliceVectorizedTraversal, NoUnrolling, Version>
+{
+ typedef typename Derived1::Index Index;
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ typedef packet_traits<typename Derived1::Scalar> PacketTraits;
+ enum {
+ packetSize = PacketTraits::size,
+ alignable = PacketTraits::AlignedOnScalar,
+ dstAlignment = alignable ? Aligned : int(assign_traits<Derived1,Derived2>::DstIsAligned) ,
+ srcAlignment = assign_traits<Derived1,Derived2>::JointAlignment
+ };
+ const Index packetAlignedMask = packetSize - 1;
+ const Index innerSize = dst.innerSize();
+ const Index outerSize = dst.outerSize();
+ const Index alignedStep = alignable ? (packetSize - dst.outerStride() % packetSize) & packetAlignedMask : 0;
+ Index alignedStart = ((!alignable) || assign_traits<Derived1,Derived2>::DstIsAligned) ? 0
+ : internal::first_aligned(&dst.coeffRef(0,0), innerSize);
+
+ for(Index outer = 0; outer < outerSize; ++outer)
+ {
+ const Index alignedEnd = alignedStart + ((innerSize-alignedStart) & ~packetAlignedMask);
+ // do the non-vectorizable part of the assignment
+ for(Index inner = 0; inner<alignedStart ; ++inner)
+ dst.copyCoeffByOuterInner(outer, inner, src);
+
+ // do the vectorizable part of the assignment
+ for(Index inner = alignedStart; inner<alignedEnd; inner+=packetSize)
+ dst.template copyPacketByOuterInner<Derived2, dstAlignment, Unaligned>(outer, inner, src);
+
+ // do the non-vectorizable part of the assignment
+ for(Index inner = alignedEnd; inner<innerSize ; ++inner)
+ dst.copyCoeffByOuterInner(outer, inner, src);
+
+ alignedStart = std::min<Index>((alignedStart+alignedStep)%packetSize, innerSize);
+ }
+ }
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* Part 4 : implementation of DenseBase methods
+***************************************************************************/
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>
+ ::lazyAssign(const DenseBase<OtherDerived>& other)
+{
+ enum{
+ SameType = internal::is_same<typename Derived::Scalar,typename OtherDerived::Scalar>::value
+ };
+
+ EIGEN_STATIC_ASSERT_LVALUE(Derived)
+ EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Derived,OtherDerived)
+ EIGEN_STATIC_ASSERT(SameType,YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+#ifdef EIGEN_TEST_EVALUATORS
+
+#ifdef EIGEN_DEBUG_ASSIGN
+ internal::copy_using_evaluator_traits<Derived, OtherDerived>::debug();
+#endif
+ eigen_assert(rows() == other.rows() && cols() == other.cols());
+ internal::call_dense_assignment_loop(derived(),other.derived());
+
+#else // EIGEN_TEST_EVALUATORS
+
+#ifdef EIGEN_DEBUG_ASSIGN
+ internal::assign_traits<Derived, OtherDerived>::debug();
+#endif
+ eigen_assert(rows() == other.rows() && cols() == other.cols());
+ internal::assign_impl<Derived, OtherDerived, int(SameType) ? int(internal::assign_traits<Derived, OtherDerived>::Traversal)
+ : int(InvalidTraversal)>::run(derived(),other.derived());
+
+#endif // EIGEN_TEST_EVALUATORS
+
+#ifndef EIGEN_NO_DEBUG
+ checkTransposeAliasing(other.derived());
+#endif
+ return derived();
+}
+
+namespace internal {
+
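+// assign_selector dispatches on two compile-time flags: whether the source expression
+// must be evaluated into a temporary before the assignment (EvalBeforeAssigningBit),
+// and whether a vector has to be implicitly transposed (assigning a row vector to a
+// column vector, or vice versa).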
+template<typename Derived, typename OtherDerived,
+ bool EvalBeforeAssigning = (int(internal::traits<OtherDerived>::Flags) & EvalBeforeAssigningBit) != 0,
+ bool NeedToTranspose = ((int(Derived::RowsAtCompileTime) == 1 && int(OtherDerived::ColsAtCompileTime) == 1)
+ | // FIXME | instead of || to please GCC 4.4.0 stupid warning "suggest parentheses around &&".
+ // revert to || as soon as not needed anymore.
+ (int(Derived::ColsAtCompileTime) == 1 && int(OtherDerived::RowsAtCompileTime) == 1))
+ && int(Derived::SizeAtCompileTime) != 1>
+struct assign_selector;
+
+template<typename Derived, typename OtherDerived>
+struct assign_selector<Derived,OtherDerived,false,false> {
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& dst, const OtherDerived& other) { return dst.lazyAssign(other.derived()); }
+ template<typename ActualDerived, typename ActualOtherDerived>
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& evalTo(ActualDerived& dst, const ActualOtherDerived& other) { other.evalTo(dst); return dst; }
+};
+template<typename Derived, typename OtherDerived>
+struct assign_selector<Derived,OtherDerived,true,false> {
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& dst, const OtherDerived& other) { return dst.lazyAssign(other.eval()); }
+};
+template<typename Derived, typename OtherDerived>
+struct assign_selector<Derived,OtherDerived,false,true> {
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& dst, const OtherDerived& other) { return dst.lazyAssign(other.transpose()); }
+ template<typename ActualDerived, typename ActualOtherDerived>
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& evalTo(ActualDerived& dst, const ActualOtherDerived& other) { Transpose<ActualDerived> dstTrans(dst); other.evalTo(dstTrans); return dst; }
+};
+template<typename Derived, typename OtherDerived>
+struct assign_selector<Derived,OtherDerived,true,true> {
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& dst, const OtherDerived& other) { return dst.lazyAssign(other.transpose().eval()); }
+};
+
+} // end namespace internal
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::operator=(const DenseBase<OtherDerived>& other)
+{
+ return internal::assign_selector<Derived,OtherDerived>::run(derived(), other.derived());
+}
+
+template<typename Derived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::operator=(const DenseBase& other)
+{
+ return internal::assign_selector<Derived,Derived>::run(derived(), other.derived());
+}
+
+template<typename Derived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const MatrixBase& other)
+{
+ return internal::assign_selector<Derived,Derived>::run(derived(), other.derived());
+}
+
+template<typename Derived>
+template <typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const DenseBase<OtherDerived>& other)
+{
+ return internal::assign_selector<Derived,OtherDerived>::run(derived(), other.derived());
+}
+
+template<typename Derived>
+template <typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const EigenBase<OtherDerived>& other)
+{
+ return internal::assign_selector<Derived,OtherDerived,false>::evalTo(derived(), other.derived());
+}
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::operator=(const ReturnByValue<OtherDerived>& other)
+{
+ return internal::assign_selector<Derived,OtherDerived,false>::evalTo(derived(), other.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ASSIGN_H
diff --git a/third_party/eigen3/Eigen/src/Core/AssignEvaluator.h b/third_party/eigen3/Eigen/src/Core/AssignEvaluator.h
new file mode 100644
index 0000000000..b1e304e2b1
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/AssignEvaluator.h
@@ -0,0 +1,842 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2011-2013 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2011-2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ASSIGN_EVALUATOR_H
+#define EIGEN_ASSIGN_EVALUATOR_H
+
+namespace Eigen {
+
+// This implementation is based on Assign.h
+
+namespace internal {
+
+/***************************************************************************
+* Part 1 : the logic deciding a strategy for traversal and unrolling *
+***************************************************************************/
+
+// copy_using_evaluator_traits is based on assign_traits
+
+template <typename Derived, typename OtherDerived>
+struct copy_using_evaluator_traits
+{
+public:
+ enum {
+ DstIsAligned = Derived::Flags & AlignedBit,
+ DstHasDirectAccess = Derived::Flags & DirectAccessBit,
+ SrcIsAligned = OtherDerived::Flags & AlignedBit,
+ JointAlignment = bool(DstIsAligned) && bool(SrcIsAligned) ? Aligned : Unaligned,
+ SrcEvalBeforeAssign = (evaluator_traits<OtherDerived>::HasEvalTo == 1)
+ };
+
+private:
+ enum {
+ InnerSize = int(Derived::IsVectorAtCompileTime) ? int(Derived::SizeAtCompileTime)
+ : int(Derived::Flags)&RowMajorBit ? int(Derived::ColsAtCompileTime)
+ : int(Derived::RowsAtCompileTime),
+ InnerMaxSize = int(Derived::IsVectorAtCompileTime) ? int(Derived::MaxSizeAtCompileTime)
+ : int(Derived::Flags)&RowMajorBit ? int(Derived::MaxColsAtCompileTime)
+ : int(Derived::MaxRowsAtCompileTime),
+ MaxSizeAtCompileTime = Derived::SizeAtCompileTime,
+ PacketSize = packet_traits<typename Derived::Scalar>::size
+ };
+
+ enum {
+ StorageOrdersAgree = (int(Derived::IsRowMajor) == int(OtherDerived::IsRowMajor)),
+ MightVectorize = StorageOrdersAgree
+ && (int(Derived::Flags) & int(OtherDerived::Flags) & ActualPacketAccessBit),
+ MayInnerVectorize = MightVectorize && int(InnerSize)!=Dynamic && int(InnerSize)%int(PacketSize)==0
+ && int(DstIsAligned) && int(SrcIsAligned),
+ MayLinearize = StorageOrdersAgree && (int(Derived::Flags) & int(OtherDerived::Flags) & LinearAccessBit),
+ MayLinearVectorize = MightVectorize && MayLinearize && DstHasDirectAccess
+ && (DstIsAligned || MaxSizeAtCompileTime == Dynamic),
+ /* If the destination isn't aligned, we have to do runtime checks and we don't unroll,
+ so it's only good for large enough sizes. */
+ MaySliceVectorize = MightVectorize && DstHasDirectAccess
+ && (int(InnerMaxSize)==Dynamic || int(InnerMaxSize)>=3*PacketSize)
+ /* slice vectorization can be slow, so we only want it if the slices are big, which is
+ indicated by InnerMaxSize rather than InnerSize, think of the case of a dynamic block
+ in a fixed-size matrix */
+ };
+
+public:
+ enum {
+ Traversal = int(SrcEvalBeforeAssign) ? int(AllAtOnceTraversal)
+ : int(MayInnerVectorize) ? int(InnerVectorizedTraversal)
+ : int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
+ : int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
+ : int(MayLinearize) ? int(LinearTraversal)
+ : int(DefaultTraversal),
+ Vectorized = int(Traversal) == InnerVectorizedTraversal
+ || int(Traversal) == LinearVectorizedTraversal
+ || int(Traversal) == SliceVectorizedTraversal
+ };
+
+private:
+ enum {
+ UnrollingLimit = EIGEN_UNROLLING_LIMIT * (Vectorized ? int(PacketSize) : 1),
+ MayUnrollCompletely = int(Derived::SizeAtCompileTime) != Dynamic
+ && int(OtherDerived::CoeffReadCost) != Dynamic
+ && int(Derived::SizeAtCompileTime) * int(OtherDerived::CoeffReadCost) <= int(UnrollingLimit),
+ MayUnrollInner = int(InnerSize) != Dynamic
+ && int(OtherDerived::CoeffReadCost) != Dynamic
+ && int(InnerSize) * int(OtherDerived::CoeffReadCost) <= int(UnrollingLimit)
+ };
+
+public:
+ enum {
+ Unrolling = (int(Traversal) == int(InnerVectorizedTraversal) || int(Traversal) == int(DefaultTraversal))
+ ? (
+ int(MayUnrollCompletely) ? int(CompleteUnrolling)
+ : int(MayUnrollInner) ? int(InnerUnrolling)
+ : int(NoUnrolling)
+ )
+ : int(Traversal) == int(LinearVectorizedTraversal)
+ ? ( bool(MayUnrollCompletely) && bool(DstIsAligned) ? int(CompleteUnrolling)
+ : int(NoUnrolling) )
+ : int(Traversal) == int(LinearTraversal)
+ ? ( bool(MayUnrollCompletely) ? int(CompleteUnrolling)
+ : int(NoUnrolling) )
+ : int(NoUnrolling)
+ };
+
+#ifdef EIGEN_DEBUG_ASSIGN
+ static void debug()
+ {
+ EIGEN_DEBUG_VAR(DstIsAligned)
+ EIGEN_DEBUG_VAR(SrcIsAligned)
+ EIGEN_DEBUG_VAR(JointAlignment)
+ EIGEN_DEBUG_VAR(InnerSize)
+ EIGEN_DEBUG_VAR(InnerMaxSize)
+ EIGEN_DEBUG_VAR(PacketSize)
+ EIGEN_DEBUG_VAR(StorageOrdersAgree)
+ EIGEN_DEBUG_VAR(MightVectorize)
+ EIGEN_DEBUG_VAR(MayLinearize)
+ EIGEN_DEBUG_VAR(MayInnerVectorize)
+ EIGEN_DEBUG_VAR(MayLinearVectorize)
+ EIGEN_DEBUG_VAR(MaySliceVectorize)
+ EIGEN_DEBUG_VAR(Traversal)
+ EIGEN_DEBUG_VAR(UnrollingLimit)
+ EIGEN_DEBUG_VAR(MayUnrollCompletely)
+ EIGEN_DEBUG_VAR(MayUnrollInner)
+ EIGEN_DEBUG_VAR(Unrolling)
+ }
+#endif
+};
+
+/***************************************************************************
+* Part 2 : meta-unrollers
+***************************************************************************/
+
+/************************
+*** Default traversal ***
+************************/
+
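+// Each unroller assigns the coefficient (or packet) at the compile-time Index and then
+// recurses with the next index; the partial specialization with Index == Stop terminates
+// the recursion with an empty run().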
+template<typename Kernel, int Index, int Stop>
+struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling
+{
+ typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
+ typedef typename DstEvaluatorType::XprType DstXprType;
+
+ enum {
+ outer = Index / DstXprType::InnerSizeAtCompileTime,
+ inner = Index % DstXprType::InnerSizeAtCompileTime
+ };
+
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ kernel.assignCoeffByOuterInner(outer, inner);
+ copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, Index+1, Stop>::run(kernel);
+ }
+};
+
+template<typename Kernel, int Stop>
+struct copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel&) { }
+};
+
+template<typename Kernel, int Index, int Stop>
+struct copy_using_evaluator_DefaultTraversal_InnerUnrolling
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel, int outer)
+ {
+ kernel.assignCoeffByOuterInner(outer, Index);
+ copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, Index+1, Stop>::run(kernel, outer);
+ }
+};
+
+template<typename Kernel, int Stop>
+struct copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel&, int) { }
+};
+
+/***********************
+*** Linear traversal ***
+***********************/
+
+template<typename Kernel, int Index, int Stop>
+struct copy_using_evaluator_LinearTraversal_CompleteUnrolling
+{
+ static EIGEN_STRONG_INLINE void run(Kernel& kernel)
+ {
+ kernel.assignCoeff(Index);
+ copy_using_evaluator_LinearTraversal_CompleteUnrolling<Kernel, Index+1, Stop>::run(kernel);
+ }
+};
+
+template<typename Kernel, int Stop>
+struct copy_using_evaluator_LinearTraversal_CompleteUnrolling<Kernel, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel&) { }
+};
+
+/**************************
+*** Inner vectorization ***
+**************************/
+
+template<typename Kernel, int Index, int Stop>
+struct copy_using_evaluator_innervec_CompleteUnrolling
+{
+ typedef typename Kernel::DstEvaluatorType DstEvaluatorType;
+ typedef typename DstEvaluatorType::XprType DstXprType;
+
+ enum {
+ outer = Index / DstXprType::InnerSizeAtCompileTime,
+ inner = Index % DstXprType::InnerSizeAtCompileTime,
+ JointAlignment = Kernel::AssignmentTraits::JointAlignment
+ };
+
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ kernel.template assignPacketByOuterInner<Aligned, JointAlignment>(outer, inner);
+ enum { NextIndex = Index + packet_traits<typename DstXprType::Scalar>::size };
+ copy_using_evaluator_innervec_CompleteUnrolling<Kernel, NextIndex, Stop>::run(kernel);
+ }
+};
+
+template<typename Kernel, int Stop>
+struct copy_using_evaluator_innervec_CompleteUnrolling<Kernel, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel&) { }
+};
+
+template<typename Kernel, int Index, int Stop>
+struct copy_using_evaluator_innervec_InnerUnrolling
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel, int outer)
+ {
+ kernel.template assignPacketByOuterInner<Aligned, Aligned>(outer, Index);
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+ enum { NextIndex = Index + packet_traits<typename DstXprType::Scalar>::size };
+ copy_using_evaluator_innervec_InnerUnrolling<Kernel, NextIndex, Stop>::run(kernel, outer);
+ }
+};
+
+template<typename Kernel, int Stop>
+struct copy_using_evaluator_innervec_InnerUnrolling<Kernel, Stop, Stop>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &, int) { }
+};
+
+/***************************************************************************
+* Part 3 : implementation of all cases
+***************************************************************************/
+
+// dense_assignment_loop is based on assign_impl
+
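+// The primary template is only declared; the specializations below cover every
+// (Traversal, Unrolling) pair that copy_using_evaluator_traits can select.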
+template<typename Kernel,
+ int Traversal = Kernel::AssignmentTraits::Traversal,
+ int Unrolling = Kernel::AssignmentTraits::Unrolling>
+struct dense_assignment_loop;
+
+/************************
+*** Default traversal ***
+************************/
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, DefaultTraversal, NoUnrolling>
+{
+ static void run(Kernel &kernel)
+ {
+ typedef typename Kernel::Index Index;
+
+ for(Index outer = 0; outer < kernel.outerSize(); ++outer) {
+ for(Index inner = 0; inner < kernel.innerSize(); ++inner) {
+ kernel.assignCoeffByOuterInner(outer, inner);
+ }
+ }
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, DefaultTraversal, CompleteUnrolling>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+ copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, 0, DstXprType::SizeAtCompileTime>::run(kernel);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, DefaultTraversal, InnerUnrolling>
+{
+ typedef typename Kernel::Index Index;
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+
+ const Index outerSize = kernel.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer)
+ copy_using_evaluator_DefaultTraversal_InnerUnrolling<Kernel, 0, DstXprType::InnerSizeAtCompileTime>::run(kernel, outer);
+ }
+};
+
+/***************************
+*** Linear vectorization ***
+***************************/
+
+
+// The goal of unaligned_dense_assignment_loop is simply to factor out the handling
+// of the non-vectorizable beginning and ending parts
+
+template <bool IsAligned = false>
+struct unaligned_dense_assignment_loop
+{
+ // if IsAligned = true, then do nothing
+ template <typename Kernel>
+ static EIGEN_STRONG_INLINE void run(Kernel&, typename Kernel::Index, typename Kernel::Index) {}
+};
+
+template <>
+struct unaligned_dense_assignment_loop<false>
+{
+  // MSVC must not inline this function. If it does, it fails to optimize the
+ // packet access path.
+ // FIXME check which version exhibits this issue
+#if EIGEN_COMP_MSVC
+ template <typename Kernel>
+ static EIGEN_DONT_INLINE void run(Kernel &kernel,
+ typename Kernel::Index start,
+ typename Kernel::Index end)
+#else
+ template <typename Kernel>
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel,
+ typename Kernel::Index start,
+ typename Kernel::Index end)
+#endif
+ {
+ for (typename Kernel::Index index = start; index < end; ++index)
+ kernel.assignCoeff(index);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, LinearVectorizedTraversal, NoUnrolling>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::Index Index;
+
+ const Index size = kernel.size();
+ typedef packet_traits<typename Kernel::Scalar> PacketTraits;
+ enum {
+ packetSize = PacketTraits::size,
+ dstIsAligned = int(Kernel::AssignmentTraits::DstIsAligned),
+ dstAlignment = PacketTraits::AlignedOnScalar ? Aligned : dstIsAligned,
+ srcAlignment = Kernel::AssignmentTraits::JointAlignment
+ };
+ const Index alignedStart = dstIsAligned ? 0 : internal::first_aligned(&kernel.dstEvaluator().coeffRef(0), size);
+ const Index alignedEnd = alignedStart + ((size-alignedStart)/packetSize)*packetSize;
+
+ unaligned_dense_assignment_loop<dstIsAligned!=0>::run(kernel, 0, alignedStart);
+
+ for(Index index = alignedStart; index < alignedEnd; index += packetSize)
+ kernel.template assignPacket<dstAlignment, srcAlignment>(index);
+
+ unaligned_dense_assignment_loop<>::run(kernel, alignedEnd, size);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, LinearVectorizedTraversal, CompleteUnrolling>
+{
+ typedef typename Kernel::Index Index;
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+
+ enum { size = DstXprType::SizeAtCompileTime,
+ packetSize = packet_traits<typename Kernel::Scalar>::size,
+ alignedSize = (size/packetSize)*packetSize };
+
+ copy_using_evaluator_innervec_CompleteUnrolling<Kernel, 0, alignedSize>::run(kernel);
+ copy_using_evaluator_DefaultTraversal_CompleteUnrolling<Kernel, alignedSize, size>::run(kernel);
+ }
+};
+
+/**************************
+*** Inner vectorization ***
+**************************/
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, InnerVectorizedTraversal, NoUnrolling>
+{
+ static inline void run(Kernel &kernel)
+ {
+ typedef typename Kernel::Index Index;
+
+ const Index innerSize = kernel.innerSize();
+ const Index outerSize = kernel.outerSize();
+ const Index packetSize = packet_traits<typename Kernel::Scalar>::size;
+ for(Index outer = 0; outer < outerSize; ++outer)
+ for(Index inner = 0; inner < innerSize; inner+=packetSize)
+ kernel.template assignPacketByOuterInner<Aligned, Aligned>(outer, inner);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, InnerVectorizedTraversal, CompleteUnrolling>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+ copy_using_evaluator_innervec_CompleteUnrolling<Kernel, 0, DstXprType::SizeAtCompileTime>::run(kernel);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, InnerVectorizedTraversal, InnerUnrolling>
+{
+ typedef typename Kernel::Index Index;
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+ const Index outerSize = kernel.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer)
+ copy_using_evaluator_innervec_InnerUnrolling<Kernel, 0, DstXprType::InnerSizeAtCompileTime>::run(kernel, outer);
+ }
+};
+
+/***********************
+*** Linear traversal ***
+***********************/
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, LinearTraversal, NoUnrolling>
+{
+ static inline void run(Kernel &kernel)
+ {
+ typedef typename Kernel::Index Index;
+ const Index size = kernel.size();
+ for(Index i = 0; i < size; ++i)
+ kernel.assignCoeff(i);
+ }
+};
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, LinearTraversal, CompleteUnrolling>
+{
+ static EIGEN_STRONG_INLINE void run(Kernel &kernel)
+ {
+ typedef typename Kernel::DstEvaluatorType::XprType DstXprType;
+ copy_using_evaluator_LinearTraversal_CompleteUnrolling<Kernel, 0, DstXprType::SizeAtCompileTime>::run(kernel);
+ }
+};
+
+/**************************
+*** Slice vectorization ***
+***************************/
+
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, SliceVectorizedTraversal, NoUnrolling>
+{
+ static inline void run(Kernel &kernel)
+ {
+ typedef typename Kernel::Index Index;
+ typedef packet_traits<typename Kernel::Scalar> PacketTraits;
+ enum {
+ packetSize = PacketTraits::size,
+ alignable = PacketTraits::AlignedOnScalar,
+ dstAlignment = alignable ? Aligned : int(Kernel::AssignmentTraits::DstIsAligned)
+ };
+ const Index packetAlignedMask = packetSize - 1;
+ const Index innerSize = kernel.innerSize();
+ const Index outerSize = kernel.outerSize();
+ const Index alignedStep = alignable ? (packetSize - kernel.outerStride() % packetSize) & packetAlignedMask : 0;
+ Index alignedStart = ((!alignable) || Kernel::AssignmentTraits::DstIsAligned) ? 0
+ : internal::first_aligned(&kernel.dstEvaluator().coeffRef(0,0), innerSize);
+
+ for(Index outer = 0; outer < outerSize; ++outer)
+ {
+ const Index alignedEnd = alignedStart + ((innerSize-alignedStart) & ~packetAlignedMask);
+ // do the non-vectorizable part of the assignment
+ for(Index inner = 0; inner<alignedStart ; ++inner)
+ kernel.assignCoeffByOuterInner(outer, inner);
+
+ // do the vectorizable part of the assignment
+ for(Index inner = alignedStart; inner<alignedEnd; inner+=packetSize)
+ kernel.template assignPacketByOuterInner<dstAlignment, Unaligned>(outer, inner);
+
+ // do the non-vectorizable part of the assignment
+ for(Index inner = alignedEnd; inner<innerSize ; ++inner)
+ kernel.assignCoeffByOuterInner(outer, inner);
+
+ alignedStart = std::min<Index>((alignedStart+alignedStep)%packetSize, innerSize);
+ }
+ }
+};
+
+/****************************
+*** All-at-once traversal ***
+****************************/
+
+// TODO: this 'AllAtOnceTraversal' should be dropped or caught earlier (Gael)
+// Indeed, what to do with the kernel's functor??
+template<typename Kernel>
+struct dense_assignment_loop<Kernel, AllAtOnceTraversal, NoUnrolling>
+{
+ static inline void run(Kernel & kernel)
+ {
+ // Evaluate rhs in temporary to prevent aliasing problems in a = a * a;
+ // TODO: Do not pass the xpr object to evalTo() (Jitse)
+ kernel.srcEvaluator().evalTo(kernel.dstEvaluator(), kernel.dstExpression());
+ }
+};
+
+/***************************************************************************
+* Part 4 : Generic Assignment routine
+***************************************************************************/
+
+// This class generalizes the assignment of a coefficient (or packet) from one dense evaluator
+// to another dense writable evaluator.
+// It is parametrized by the two evaluators and by the actual assignment functor.
+// This abstraction level keeps the evaluation loops as simple and as generic as possible.
+// The assignment can be customized by using this generic dense_assignment_kernel with different
+// functors, or by overloading it completely, bypassing the functor.
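+//
+// A minimal usage sketch (this is essentially what call_dense_assignment_loop below does;
+// Dst, Src and Scalar are placeholder names):
+//
+//   typename evaluator<Dst>::type dstEval(dst);
+//   typename evaluator<Src>::type srcEval(src);
+//   typedef generic_dense_assignment_kernel<typename evaluator<Dst>::type,
+//                                           typename evaluator<Src>::type,
+//                                           assign_op<Scalar> > Kernel;
+//   Kernel kernel(dstEval, srcEval, assign_op<Scalar>(), dst);
+//   dense_assignment_loop<Kernel>::run(kernel);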
+template<typename DstEvaluatorTypeT, typename SrcEvaluatorTypeT, typename Functor>
+class generic_dense_assignment_kernel
+{
+protected:
+ typedef typename DstEvaluatorTypeT::XprType DstXprType;
+ typedef typename SrcEvaluatorTypeT::XprType SrcXprType;
+public:
+
+ typedef DstEvaluatorTypeT DstEvaluatorType;
+ typedef SrcEvaluatorTypeT SrcEvaluatorType;
+ typedef typename DstEvaluatorType::Scalar Scalar;
+ typedef typename DstEvaluatorType::Index Index;
+ typedef copy_using_evaluator_traits<DstXprType, SrcXprType> AssignmentTraits;
+
+
+ generic_dense_assignment_kernel(DstEvaluatorType &dst, const SrcEvaluatorType &src, const Functor &func, DstXprType& dstExpr)
+ : m_dst(dst), m_src(src), m_functor(func), m_dstExpr(dstExpr)
+ {}
+
+ Index size() const { return m_dstExpr.size(); }
+ Index innerSize() const { return m_dstExpr.innerSize(); }
+ Index outerSize() const { return m_dstExpr.outerSize(); }
+ Index outerStride() const { return m_dstExpr.outerStride(); }
+
+ // TODO get rid of this one:
+ DstXprType& dstExpression() const { return m_dstExpr; }
+
+ DstEvaluatorType& dstEvaluator() { return m_dst; }
+ const SrcEvaluatorType& srcEvaluator() const { return m_src; }
+
+ void assignCoeff(Index row, Index col)
+ {
+ m_functor.assignCoeff(m_dst.coeffRef(row,col), m_src.coeff(row,col));
+ }
+
+ void assignCoeff(Index index)
+ {
+ m_functor.assignCoeff(m_dst.coeffRef(index), m_src.coeff(index));
+ }
+
+ void assignCoeffByOuterInner(Index outer, Index inner)
+ {
+ Index row = rowIndexByOuterInner(outer, inner);
+ Index col = colIndexByOuterInner(outer, inner);
+ assignCoeff(row, col);
+ }
+
+
+ template<int StoreMode, int LoadMode>
+ void assignPacket(Index row, Index col)
+ {
+ m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(row,col), m_src.template packet<LoadMode>(row,col));
+ }
+
+ template<int StoreMode, int LoadMode>
+ void assignPacket(Index index)
+ {
+ m_functor.template assignPacket<StoreMode>(&m_dst.coeffRef(index), m_src.template packet<LoadMode>(index));
+ }
+
+ template<int StoreMode, int LoadMode>
+ void assignPacketByOuterInner(Index outer, Index inner)
+ {
+ Index row = rowIndexByOuterInner(outer, inner);
+ Index col = colIndexByOuterInner(outer, inner);
+ assignPacket<StoreMode,LoadMode>(row, col);
+ }
+
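+  // Map an (outer, inner) pair to the corresponding (row, col) pair, taking the
+  // destination's storage order and vector shape into account.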
+ static Index rowIndexByOuterInner(Index outer, Index inner)
+ {
+ typedef typename DstEvaluatorType::ExpressionTraits Traits;
+ return int(Traits::RowsAtCompileTime) == 1 ? 0
+ : int(Traits::ColsAtCompileTime) == 1 ? inner
+ : int(Traits::Flags)&RowMajorBit ? outer
+ : inner;
+ }
+
+ static Index colIndexByOuterInner(Index outer, Index inner)
+ {
+ typedef typename DstEvaluatorType::ExpressionTraits Traits;
+ return int(Traits::ColsAtCompileTime) == 1 ? 0
+ : int(Traits::RowsAtCompileTime) == 1 ? inner
+ : int(Traits::Flags)&RowMajorBit ? inner
+ : outer;
+ }
+
+protected:
+ DstEvaluatorType& m_dst;
+ const SrcEvaluatorType& m_src;
+ const Functor &m_functor;
+  // TODO find a way to avoid the need for the original expression
+ DstXprType& m_dstExpr;
+};
+
+template<typename DstXprType, typename SrcXprType, typename Functor>
+void call_dense_assignment_loop(const DstXprType& dst, const SrcXprType& src, const Functor &func)
+{
+#ifdef EIGEN_DEBUG_ASSIGN
+ // TODO these traits should be computed from information provided by the evaluators
+ internal::copy_using_evaluator_traits<DstXprType, SrcXprType>::debug();
+#endif
+ eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols());
+
+ typedef typename evaluator<DstXprType>::type DstEvaluatorType;
+ typedef typename evaluator<SrcXprType>::type SrcEvaluatorType;
+
+ DstEvaluatorType dstEvaluator(dst);
+ SrcEvaluatorType srcEvaluator(src);
+
+ typedef generic_dense_assignment_kernel<DstEvaluatorType,SrcEvaluatorType,Functor> Kernel;
+ Kernel kernel(dstEvaluator, srcEvaluator, func, dst.const_cast_derived());
+
+ dense_assignment_loop<Kernel>::run(kernel);
+}
+
+template<typename DstXprType, typename SrcXprType>
+void call_dense_assignment_loop(const DstXprType& dst, const SrcXprType& src)
+{
+ call_dense_assignment_loop(dst, src, internal::assign_op<typename DstXprType::Scalar>());
+}
+
+/***************************************************************************
+* Part 5 : Entry points
+***************************************************************************/
+
+// Based on DenseBase::lazyAssign()
+// The following functions are just for testing; they are meant to be moved to operator= and the like.
+
+template<typename DstXprType, template <typename> class StorageBase, typename SrcXprType>
+EIGEN_STRONG_INLINE
+const DstXprType& copy_using_evaluator(const NoAlias<DstXprType, StorageBase>& dst,
+ const EigenBase<SrcXprType>& src)
+{
+ return noalias_copy_using_evaluator(dst.expression(), src.derived(), internal::assign_op<typename DstXprType::Scalar>());
+}
+
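+// If the source expression type is flagged as potentially aliasing the destination
+// (AssumeAliasing == 1), wrap it in an EvalToTemp node so that it is evaluated into a
+// temporary before the actual copy; otherwise pass it through unchanged.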
+template<typename XprType, int AssumeAliasing = evaluator_traits<XprType>::AssumeAliasing>
+struct AddEvalIfAssumingAliasing;
+
+template<typename XprType>
+struct AddEvalIfAssumingAliasing<XprType, 0>
+{
+ static const XprType& run(const XprType& xpr)
+ {
+ return xpr;
+ }
+};
+
+template<typename XprType>
+struct AddEvalIfAssumingAliasing<XprType, 1>
+{
+ static const EvalToTemp<XprType> run(const XprType& xpr)
+ {
+ return EvalToTemp<XprType>(xpr);
+ }
+};
+
+template<typename DstXprType, typename SrcXprType, typename Functor>
+EIGEN_STRONG_INLINE
+const DstXprType& copy_using_evaluator(const EigenBase<DstXprType>& dst, const EigenBase<SrcXprType>& src, const Functor &func)
+{
+ return noalias_copy_using_evaluator(dst.const_cast_derived(),
+ AddEvalIfAssumingAliasing<SrcXprType>::run(src.derived()),
+ func
+ );
+}
+
+// this mimics operator=
+template<typename DstXprType, typename SrcXprType>
+EIGEN_STRONG_INLINE
+const DstXprType& copy_using_evaluator(const EigenBase<DstXprType>& dst, const EigenBase<SrcXprType>& src)
+{
+ return copy_using_evaluator(dst.const_cast_derived(), src.derived(), internal::assign_op<typename DstXprType::Scalar>());
+}
+
+template<typename DstXprType, typename SrcXprType, typename Functor>
+EIGEN_STRONG_INLINE
+const DstXprType& noalias_copy_using_evaluator(const PlainObjectBase<DstXprType>& dst, const EigenBase<SrcXprType>& src, const Functor &func)
+{
+#ifdef EIGEN_DEBUG_ASSIGN
+ internal::copy_using_evaluator_traits<DstXprType, SrcXprType>::debug();
+#endif
+#ifdef EIGEN_NO_AUTOMATIC_RESIZING
+ eigen_assert((dst.size()==0 || (IsVectorAtCompileTime ? (dst.size() == src.size())
+ : (dst.rows() == src.rows() && dst.cols() == src.cols())))
+ && "Size mismatch. Automatic resizing is disabled because EIGEN_NO_AUTOMATIC_RESIZING is defined");
+#else
+ dst.const_cast_derived().resizeLike(src.derived());
+#endif
+ call_dense_assignment_loop(dst.const_cast_derived(), src.derived(), func);
+ return dst.derived();
+}
+
+template<typename DstXprType, typename SrcXprType, typename Functor>
+EIGEN_STRONG_INLINE
+const DstXprType& noalias_copy_using_evaluator(const EigenBase<DstXprType>& dst, const EigenBase<SrcXprType>& src, const Functor &func)
+{
+ call_dense_assignment_loop(dst.const_cast_derived(), src.derived(), func);
+ return dst.derived();
+}
+
+// Based on DenseBase::swap()
+// TODO: Check whether we need to do something special for swapping two
+// Arrays or Matrices. (Jitse)
+
+// Overload the default assignPacket behavior so that packets are swapped instead of copied
+template<typename DstEvaluatorTypeT, typename SrcEvaluatorTypeT>
+class swap_kernel : public generic_dense_assignment_kernel<DstEvaluatorTypeT, SrcEvaluatorTypeT, swap_assign_op<typename DstEvaluatorTypeT::Scalar> >
+{
+ typedef generic_dense_assignment_kernel<DstEvaluatorTypeT, SrcEvaluatorTypeT, swap_assign_op<typename DstEvaluatorTypeT::Scalar> > Base;
+ typedef typename DstEvaluatorTypeT::PacketScalar PacketScalar;
+ using Base::m_dst;
+ using Base::m_src;
+ using Base::m_functor;
+
+public:
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::Index Index;
+ typedef typename Base::DstXprType DstXprType;
+
+ swap_kernel(DstEvaluatorTypeT &dst, const SrcEvaluatorTypeT &src, DstXprType& dstExpr)
+ : Base(dst, src, swap_assign_op<Scalar>(), dstExpr)
+ {}
+
+ template<int StoreMode, int LoadMode>
+ void assignPacket(Index row, Index col)
+ {
+ m_functor.template swapPacket<StoreMode,LoadMode,PacketScalar>(&m_dst.coeffRef(row,col), &const_cast<SrcEvaluatorTypeT&>(m_src).coeffRef(row,col));
+ }
+
+ template<int StoreMode, int LoadMode>
+ void assignPacket(Index index)
+ {
+ m_functor.template swapPacket<StoreMode,LoadMode,PacketScalar>(&m_dst.coeffRef(index), &const_cast<SrcEvaluatorTypeT&>(m_src).coeffRef(index));
+ }
+
+ // TODO find a simple way not to have to copy/paste this function from generic_dense_assignment_kernel, by simple I mean no CRTP (Gael)
+ template<int StoreMode, int LoadMode>
+ void assignPacketByOuterInner(Index outer, Index inner)
+ {
+ Index row = Base::rowIndexByOuterInner(outer, inner);
+ Index col = Base::colIndexByOuterInner(outer, inner);
+ assignPacket<StoreMode,LoadMode>(row, col);
+ }
+};
+
+template<typename DstXprType, typename SrcXprType>
+void swap_using_evaluator(const DstXprType& dst, const SrcXprType& src)
+{
+ // TODO there is too much redundancy with call_dense_assignment_loop
+
+ eigen_assert(dst.rows() == src.rows() && dst.cols() == src.cols());
+
+ typedef typename evaluator<DstXprType>::type DstEvaluatorType;
+ typedef typename evaluator<SrcXprType>::type SrcEvaluatorType;
+
+ DstEvaluatorType dstEvaluator(dst);
+ SrcEvaluatorType srcEvaluator(src);
+
+ typedef swap_kernel<DstEvaluatorType,SrcEvaluatorType> Kernel;
+ Kernel kernel(dstEvaluator, srcEvaluator, dst.const_cast_derived());
+
+ dense_assignment_loop<Kernel>::run(kernel);
+}
+
+// Based on MatrixBase::operator+= (in CwiseBinaryOp.h)
+template<typename DstXprType, typename SrcXprType>
+void add_assign_using_evaluator(const MatrixBase<DstXprType>& dst, const MatrixBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), add_assign_op<Scalar>());
+}
+
+// Based on ArrayBase::operator+=
+template<typename DstXprType, typename SrcXprType>
+void add_assign_using_evaluator(const ArrayBase<DstXprType>& dst, const ArrayBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), add_assign_op<Scalar>());
+}
+
+// TODO: Add add_assign_using_evaluator for EigenBase ? (Jitse)
+
+template<typename DstXprType, typename SrcXprType>
+void subtract_assign_using_evaluator(const MatrixBase<DstXprType>& dst, const MatrixBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), sub_assign_op<Scalar>());
+}
+
+template<typename DstXprType, typename SrcXprType>
+void subtract_assign_using_evaluator(const ArrayBase<DstXprType>& dst, const ArrayBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), sub_assign_op<Scalar>());
+}
+
+template<typename DstXprType, typename SrcXprType>
+void multiply_assign_using_evaluator(const ArrayBase<DstXprType>& dst, const ArrayBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), mul_assign_op<Scalar>());
+}
+
+template<typename DstXprType, typename SrcXprType>
+void divide_assign_using_evaluator(const ArrayBase<DstXprType>& dst, const ArrayBase<SrcXprType>& src)
+{
+ typedef typename DstXprType::Scalar Scalar;
+ copy_using_evaluator(dst.derived(), src.derived(), div_assign_op<Scalar>());
+}
+
+
+} // namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_ASSIGN_EVALUATOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/Assign_MKL.h b/third_party/eigen3/Eigen/src/Core/Assign_MKL.h
new file mode 100644
index 0000000000..97134ffd72
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Assign_MKL.h
@@ -0,0 +1,225 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * MKL VML support for coefficient-wise unary Eigen expressions like a=b.sin()
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_ASSIGN_VML_H
+#define EIGEN_ASSIGN_VML_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Op> struct vml_call
+{ enum { IsSupported = 0 }; };
+
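+// Decides whether the assignment of a coefficient-wise unary expression should be
+// dispatched to a VML call: the expression needs direct memory access on both sides,
+// matching storage orders, unit inner strides, and a size above EIGEN_MKL_VML_THRESHOLD.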
+template<typename Dst, typename Src, typename UnaryOp>
+class vml_assign_traits
+{
+ private:
+ enum {
+ DstHasDirectAccess = Dst::Flags & DirectAccessBit,
+ SrcHasDirectAccess = Src::Flags & DirectAccessBit,
+
+ StorageOrdersAgree = (int(Dst::IsRowMajor) == int(Src::IsRowMajor)),
+ InnerSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::SizeAtCompileTime)
+ : int(Dst::Flags)&RowMajorBit ? int(Dst::ColsAtCompileTime)
+ : int(Dst::RowsAtCompileTime),
+ InnerMaxSize = int(Dst::IsVectorAtCompileTime) ? int(Dst::MaxSizeAtCompileTime)
+ : int(Dst::Flags)&RowMajorBit ? int(Dst::MaxColsAtCompileTime)
+ : int(Dst::MaxRowsAtCompileTime),
+ MaxSizeAtCompileTime = Dst::SizeAtCompileTime,
+
+ MightEnableVml = vml_call<UnaryOp>::IsSupported && StorageOrdersAgree && DstHasDirectAccess && SrcHasDirectAccess
+ && Src::InnerStrideAtCompileTime==1 && Dst::InnerStrideAtCompileTime==1,
+ MightLinearize = MightEnableVml && (int(Dst::Flags) & int(Src::Flags) & LinearAccessBit),
+ VmlSize = MightLinearize ? MaxSizeAtCompileTime : InnerMaxSize,
+ LargeEnough = VmlSize==Dynamic || VmlSize>=EIGEN_MKL_VML_THRESHOLD,
+ MayEnableVml = MightEnableVml && LargeEnough,
+ MayLinearize = MayEnableVml && MightLinearize
+ };
+ public:
+ enum {
+ Traversal = MayLinearize ? LinearVectorizedTraversal
+ : MayEnableVml ? InnerVectorizedTraversal
+ : DefaultTraversal
+ };
+};
+
+template<typename Derived1, typename Derived2, typename UnaryOp, int Traversal, int Unrolling,
+ int VmlTraversal = vml_assign_traits<Derived1, Derived2, UnaryOp>::Traversal >
+struct vml_assign_impl
+ : assign_impl<Derived1, Eigen::CwiseUnaryOp<UnaryOp, Derived2>,Traversal,Unrolling,BuiltIn>
+{
+};
+
+template<typename Derived1, typename Derived2, typename UnaryOp, int Traversal, int Unrolling>
+struct vml_assign_impl<Derived1, Derived2, UnaryOp, Traversal, Unrolling, InnerVectorizedTraversal>
+{
+ typedef typename Derived1::Scalar Scalar;
+ typedef typename Derived1::Index Index;
+ static inline void run(Derived1& dst, const CwiseUnaryOp<UnaryOp, Derived2>& src)
+ {
+ // in case we want to (or have to) skip VML at runtime we can call:
+ // assign_impl<Derived1,Eigen::CwiseUnaryOp<UnaryOp, Derived2>,Traversal,Unrolling,BuiltIn>::run(dst,src);
+ const Index innerSize = dst.innerSize();
+ const Index outerSize = dst.outerSize();
+ for(Index outer = 0; outer < outerSize; ++outer) {
+ const Scalar *src_ptr = src.IsRowMajor ? &(src.nestedExpression().coeffRef(outer,0)) :
+ &(src.nestedExpression().coeffRef(0, outer));
+ Scalar *dst_ptr = dst.IsRowMajor ? &(dst.coeffRef(outer,0)) : &(dst.coeffRef(0, outer));
+ vml_call<UnaryOp>::run(src.functor(), innerSize, src_ptr, dst_ptr );
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, typename UnaryOp, int Traversal, int Unrolling>
+struct vml_assign_impl<Derived1, Derived2, UnaryOp, Traversal, Unrolling, LinearVectorizedTraversal>
+{
+ static inline void run(Derived1& dst, const CwiseUnaryOp<UnaryOp, Derived2>& src)
+ {
+ // in case we want to (or have to) skip VML at runtime we can call:
+ // assign_impl<Derived1,Eigen::CwiseUnaryOp<UnaryOp, Derived2>,Traversal,Unrolling,BuiltIn>::run(dst,src);
+ vml_call<UnaryOp>::run(src.functor(), dst.size(), src.nestedExpression().data(), dst.data() );
+ }
+};
+
+// Macros
+
+#define EIGEN_MKL_VML_SPECIALIZE_ASSIGN(TRAVERSAL,UNROLLING) \
+ template<typename Derived1, typename Derived2, typename UnaryOp> \
+ struct assign_impl<Derived1, Eigen::CwiseUnaryOp<UnaryOp, Derived2>, TRAVERSAL, UNROLLING, Specialized> { \
+ static inline void run(Derived1 &dst, const Eigen::CwiseUnaryOp<UnaryOp, Derived2> &src) { \
+ vml_assign_impl<Derived1,Derived2,UnaryOp,TRAVERSAL,UNROLLING>::run(dst, src); \
+ } \
+ };
+
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(DefaultTraversal,NoUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(DefaultTraversal,CompleteUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(DefaultTraversal,InnerUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(LinearTraversal,NoUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(LinearTraversal,CompleteUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(InnerVectorizedTraversal,NoUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(InnerVectorizedTraversal,CompleteUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(InnerVectorizedTraversal,InnerUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(LinearVectorizedTraversal,CompleteUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(LinearVectorizedTraversal,NoUnrolling)
+EIGEN_MKL_VML_SPECIALIZE_ASSIGN(SliceVectorizedTraversal,NoUnrolling)
+
+
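+// Select the VML accuracy mode: high accuracy (VML_HA) by default, lower accuracy but
+// faster (VML_LA) when EIGEN_FAST_MATH is enabled.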
+#if !defined (EIGEN_FAST_MATH) || (EIGEN_FAST_MATH != 1)
+#define EIGEN_MKL_VML_MODE VML_HA
+#else
+#define EIGEN_MKL_VML_MODE VML_LA
+#endif
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE) \
+ template<> struct vml_call< scalar_##EIGENOP##_op<EIGENTYPE> > { \
+ enum { IsSupported = 1 }; \
+ static inline void run( const scalar_##EIGENOP##_op<EIGENTYPE>& /*func*/, \
+ int size, const EIGENTYPE* src, EIGENTYPE* dst) { \
+ VMLOP(size, (const VMLTYPE*)src, (VMLTYPE*)dst); \
+ } \
+ };
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALL_LA(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE) \
+ template<> struct vml_call< scalar_##EIGENOP##_op<EIGENTYPE> > { \
+ enum { IsSupported = 1 }; \
+ static inline void run( const scalar_##EIGENOP##_op<EIGENTYPE>& /*func*/, \
+ int size, const EIGENTYPE* src, EIGENTYPE* dst) { \
+ MKL_INT64 vmlMode = EIGEN_MKL_VML_MODE; \
+ VMLOP(size, (const VMLTYPE*)src, (VMLTYPE*)dst, vmlMode); \
+ } \
+ };
+
+#define EIGEN_MKL_VML_DECLARE_POW_CALL(EIGENOP, VMLOP, EIGENTYPE, VMLTYPE) \
+ template<> struct vml_call< scalar_##EIGENOP##_op<EIGENTYPE> > { \
+ enum { IsSupported = 1 }; \
+ static inline void run( const scalar_##EIGENOP##_op<EIGENTYPE>& func, \
+ int size, const EIGENTYPE* src, EIGENTYPE* dst) { \
+ EIGENTYPE exponent = func.m_exponent; \
+ MKL_INT64 vmlMode = EIGEN_MKL_VML_MODE; \
+ VMLOP(&size, (const VMLTYPE*)src, (const VMLTYPE*)&exponent, \
+ (VMLTYPE*)dst, &vmlMode); \
+ } \
+ };
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, vs##VMLOP, float, float) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, vd##VMLOP, double, double)
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_COMPLEX(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, vc##VMLOP, scomplex, MKL_Complex8) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL(EIGENOP, vz##VMLOP, dcomplex, MKL_Complex16)
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_COMPLEX(EIGENOP, VMLOP)
+
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL_LA(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL_LA(EIGENOP, vms##VMLOP, float, float) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL_LA(EIGENOP, vmd##VMLOP, double, double)
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_COMPLEX_LA(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL_LA(EIGENOP, vmc##VMLOP, scomplex, MKL_Complex8) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALL_LA(EIGENOP, vmz##VMLOP, dcomplex, MKL_Complex16)
+
+#define EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL_LA(EIGENOP, VMLOP) \
+ EIGEN_MKL_VML_DECLARE_UNARY_CALLS_COMPLEX_LA(EIGENOP, VMLOP)
+
+
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(sin, Sin)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(asin, Asin)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(cos, Cos)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(acos, Acos)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(tan, Tan)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(atan, Atan)
+//EIGEN_MKL_VML_DECLARE_UNARY_CALLS(abs, Abs)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(exp, Exp)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(log, Ln)
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_LA(sqrt, Sqrt)
+
+EIGEN_MKL_VML_DECLARE_UNARY_CALLS_REAL(square, Sqr)
+
+// The vm*powx functions are not available in the Windows version of MKL.
+#ifndef _WIN32
+EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmspowx_, float, float)
+EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmdpowx_, double, double)
+EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmcpowx_, scomplex, MKL_Complex8)
+EIGEN_MKL_VML_DECLARE_POW_CALL(pow, vmzpowx_, dcomplex, MKL_Complex16)
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_ASSIGN_VML_H
diff --git a/third_party/eigen3/Eigen/src/Core/BandMatrix.h b/third_party/eigen3/Eigen/src/Core/BandMatrix.h
new file mode 100644
index 0000000000..ffd7fe8b30
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/BandMatrix.h
@@ -0,0 +1,334 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BANDMATRIX_H
+#define EIGEN_BANDMATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Derived>
+class BandMatrixBase : public EigenBase<Derived>
+{
+ public:
+
+ enum {
+ Flags = internal::traits<Derived>::Flags,
+ CoeffReadCost = internal::traits<Derived>::CoeffReadCost,
+ RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
+ ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
+ MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime,
+ Supers = internal::traits<Derived>::Supers,
+ Subs = internal::traits<Derived>::Subs,
+ Options = internal::traits<Derived>::Options
+ };
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef Matrix<Scalar,RowsAtCompileTime,ColsAtCompileTime> DenseMatrixType;
+ typedef typename DenseMatrixType::Index Index;
+ typedef typename internal::traits<Derived>::CoefficientsType CoefficientsType;
+ typedef EigenBase<Derived> Base;
+
+ protected:
+ enum {
+ DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic))
+ ? 1 + Supers + Subs
+ : Dynamic,
+ SizeAtCompileTime = EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime,ColsAtCompileTime)
+ };
+
+ public:
+
+ using Base::derived;
+ using Base::rows;
+ using Base::cols;
+
+ /** \returns the number of super diagonals */
+ inline Index supers() const { return derived().supers(); }
+
+ /** \returns the number of sub diagonals */
+ inline Index subs() const { return derived().subs(); }
+
+ /** \returns an expression of the underlying coefficient matrix */
+ inline const CoefficientsType& coeffs() const { return derived().coeffs(); }
+
+ /** \returns an expression of the underlying coefficient matrix */
+ inline CoefficientsType& coeffs() { return derived().coeffs(); }
+
+ /** \returns a vector expression of the \a i -th column,
+ * only the meaningful part is returned.
+ * \warning the internal storage must be column major. */
+ inline Block<CoefficientsType,Dynamic,1> col(Index i)
+ {
+ EIGEN_STATIC_ASSERT((Options&RowMajor)==0,THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
+ Index start = 0;
+ Index len = coeffs().rows();
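+    // columns in the top-left corner (i <= supers()) only use the lower part of the
+    // stored column, and columns in the bottom-right corner (i >= rows()-subs()) are
+    // truncated at the bottom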
+ if (i<=supers())
+ {
+ start = supers()-i;
+ len = (std::min)(rows(),std::max<Index>(0,coeffs().rows() - (supers()-i)));
+ }
+ else if (i>=rows()-subs())
+ len = std::max<Index>(0,coeffs().rows() - (i + 1 - rows() + subs()));
+ return Block<CoefficientsType,Dynamic,1>(coeffs(), start, i, len, 1);
+ }
+
+ /** \returns a vector expression of the main diagonal */
+ inline Block<CoefficientsType,1,SizeAtCompileTime> diagonal()
+ { return Block<CoefficientsType,1,SizeAtCompileTime>(coeffs(),supers(),0,1,(std::min)(rows(),cols())); }
+
+ /** \returns a vector expression of the main diagonal (const version) */
+ inline const Block<const CoefficientsType,1,SizeAtCompileTime> diagonal() const
+ { return Block<const CoefficientsType,1,SizeAtCompileTime>(coeffs(),supers(),0,1,(std::min)(rows(),cols())); }
+
+ template<int Index> struct DiagonalIntReturnType {
+ enum {
+ ReturnOpposite = (Options&SelfAdjoint) && (((Index)>0 && Supers==0) || ((Index)<0 && Subs==0)),
+ Conjugate = ReturnOpposite && NumTraits<Scalar>::IsComplex,
+ ActualIndex = ReturnOpposite ? -Index : Index,
+ DiagonalSize = (RowsAtCompileTime==Dynamic || ColsAtCompileTime==Dynamic)
+ ? Dynamic
+ : (ActualIndex<0
+ ? EIGEN_SIZE_MIN_PREFER_DYNAMIC(ColsAtCompileTime, RowsAtCompileTime + ActualIndex)
+ : EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime, ColsAtCompileTime - ActualIndex))
+ };
+ typedef Block<CoefficientsType,1, DiagonalSize> BuildType;
+ typedef typename internal::conditional<Conjugate,
+ CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>,BuildType >,
+ BuildType>::type Type;
+ };
+
+ /** \returns a vector expression of the \a N -th sub or super diagonal */
+ template<int N> inline typename DiagonalIntReturnType<N>::Type diagonal()
+ {
+ return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N));
+ }
+
+ /** \returns a vector expression of the \a N -th sub or super diagonal */
+ template<int N> inline const typename DiagonalIntReturnType<N>::Type diagonal() const
+ {
+ return typename DiagonalIntReturnType<N>::BuildType(coeffs(), supers()-N, (std::max)(0,N), 1, diagonalLength(N));
+ }
+
+ /** \returns a vector expression of the \a i -th sub or super diagonal */
+ inline Block<CoefficientsType,1,Dynamic> diagonal(Index i)
+ {
+ eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
+ return Block<CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
+ }
+
+ /** \returns a vector expression of the \a i -th sub or super diagonal */
+ inline const Block<const CoefficientsType,1,Dynamic> diagonal(Index i) const
+ {
+ eigen_assert((i<0 && -i<=subs()) || (i>=0 && i<=supers()));
+ return Block<const CoefficientsType,1,Dynamic>(coeffs(), supers()-i, std::max<Index>(0,i), 1, diagonalLength(i));
+ }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ dst.resize(rows(),cols());
+ dst.setZero();
+ dst.diagonal() = diagonal();
+ for (Index i=1; i<=supers();++i)
+ dst.diagonal(i) = diagonal(i);
+ for (Index i=1; i<=subs();++i)
+ dst.diagonal(-i) = diagonal(-i);
+ }
+
+ DenseMatrixType toDenseMatrix() const
+ {
+ DenseMatrixType res(rows(),cols());
+ evalTo(res);
+ return res;
+ }
+
+ protected:
+
+ inline Index diagonalLength(Index i) const
+ { return i<0 ? (std::min)(cols(),rows()+i) : (std::min)(rows(),cols()-i); }
+};
+
+/**
+ * \class BandMatrix
+ * \ingroup Core_Module
+ *
+ * \brief Represents a rectangular matrix with a banded storage
+ *
+  * \param _Scalar Numeric type, e.g. float, double, int
+ * \param Rows Number of rows, or \b Dynamic
+ * \param Cols Number of columns, or \b Dynamic
+  * \param Supers  Number of super diagonals
+  * \param Subs    Number of sub diagonals
+ * \param _Options A combination of either \b #RowMajor or \b #ColMajor, and of \b #SelfAdjoint
+ * The former controls \ref TopicStorageOrders "storage order", and defaults to
+ * column-major. The latter controls whether the matrix represents a selfadjoint
+  *                matrix, in which case either Supers or Subs has to be zero.
+ *
+ * \sa class TridiagonalMatrix
+ */
+
+template<typename _Scalar, int _Rows, int _Cols, int _Supers, int _Subs, int _Options>
+struct traits<BandMatrix<_Scalar,_Rows,_Cols,_Supers,_Subs,_Options> >
+{
+ typedef _Scalar Scalar;
+ typedef Dense StorageKind;
+ typedef DenseIndex Index;
+ enum {
+ CoeffReadCost = NumTraits<Scalar>::ReadCost,
+ RowsAtCompileTime = _Rows,
+ ColsAtCompileTime = _Cols,
+ MaxRowsAtCompileTime = _Rows,
+ MaxColsAtCompileTime = _Cols,
+ Flags = LvalueBit,
+ Supers = _Supers,
+ Subs = _Subs,
+ Options = _Options,
+ DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic
+ };
+ typedef Matrix<Scalar,DataRowsAtCompileTime,ColsAtCompileTime,Options&RowMajor?RowMajor:ColMajor> CoefficientsType;
+};
+
+template<typename _Scalar, int Rows, int Cols, int Supers, int Subs, int Options>
+class BandMatrix : public BandMatrixBase<BandMatrix<_Scalar,Rows,Cols,Supers,Subs,Options> >
+{
+ public:
+
+ typedef typename internal::traits<BandMatrix>::Scalar Scalar;
+ typedef typename internal::traits<BandMatrix>::Index Index;
+ typedef typename internal::traits<BandMatrix>::CoefficientsType CoefficientsType;
+
+ inline BandMatrix(Index rows=Rows, Index cols=Cols, Index supers=Supers, Index subs=Subs)
+ : m_coeffs(1+supers+subs,cols),
+ m_rows(rows), m_supers(supers), m_subs(subs)
+ {
+ }
+
+    /** \returns the number of rows */
+    inline Index rows() const { return m_rows.value(); }
+
+    /** \returns the number of columns */
+    inline Index cols() const { return m_coeffs.cols(); }
+
+ /** \returns the number of super diagonals */
+ inline Index supers() const { return m_supers.value(); }
+
+ /** \returns the number of sub diagonals */
+ inline Index subs() const { return m_subs.value(); }
+
+ inline const CoefficientsType& coeffs() const { return m_coeffs; }
+ inline CoefficientsType& coeffs() { return m_coeffs; }
+
+ protected:
+
+ CoefficientsType m_coeffs;
+ internal::variable_if_dynamic<Index, Rows> m_rows;
+ internal::variable_if_dynamic<Index, Supers> m_supers;
+ internal::variable_if_dynamic<Index, Subs> m_subs;
+};
+
+template<typename _CoefficientsType,int _Rows, int _Cols, int _Supers, int _Subs,int _Options>
+class BandMatrixWrapper;
+
+template<typename _CoefficientsType,int _Rows, int _Cols, int _Supers, int _Subs,int _Options>
+struct traits<BandMatrixWrapper<_CoefficientsType,_Rows,_Cols,_Supers,_Subs,_Options> >
+{
+ typedef typename _CoefficientsType::Scalar Scalar;
+ typedef typename _CoefficientsType::StorageKind StorageKind;
+ typedef typename _CoefficientsType::Index Index;
+ enum {
+ CoeffReadCost = internal::traits<_CoefficientsType>::CoeffReadCost,
+ RowsAtCompileTime = _Rows,
+ ColsAtCompileTime = _Cols,
+ MaxRowsAtCompileTime = _Rows,
+ MaxColsAtCompileTime = _Cols,
+ Flags = LvalueBit,
+ Supers = _Supers,
+ Subs = _Subs,
+ Options = _Options,
+ DataRowsAtCompileTime = ((Supers!=Dynamic) && (Subs!=Dynamic)) ? 1 + Supers + Subs : Dynamic
+ };
+ typedef _CoefficientsType CoefficientsType;
+};
+
+template<typename _CoefficientsType,int _Rows, int _Cols, int _Supers, int _Subs,int _Options>
+class BandMatrixWrapper : public BandMatrixBase<BandMatrixWrapper<_CoefficientsType,_Rows,_Cols,_Supers,_Subs,_Options> >
+{
+ public:
+
+ typedef typename internal::traits<BandMatrixWrapper>::Scalar Scalar;
+ typedef typename internal::traits<BandMatrixWrapper>::CoefficientsType CoefficientsType;
+ typedef typename internal::traits<BandMatrixWrapper>::Index Index;
+
+ inline BandMatrixWrapper(const CoefficientsType& coeffs, Index rows=_Rows, Index cols=_Cols, Index supers=_Supers, Index subs=_Subs)
+ : m_coeffs(coeffs),
+ m_rows(rows), m_supers(supers), m_subs(subs)
+ {
+ EIGEN_UNUSED_VARIABLE(cols);
+ //internal::assert(coeffs.cols()==cols() && (supers()+subs()+1)==coeffs.rows());
+ }
+
+    /** \returns the number of rows */
+    inline Index rows() const { return m_rows.value(); }
+
+    /** \returns the number of columns */
+    inline Index cols() const { return m_coeffs.cols(); }
+
+ /** \returns the number of super diagonals */
+ inline Index supers() const { return m_supers.value(); }
+
+ /** \returns the number of sub diagonals */
+ inline Index subs() const { return m_subs.value(); }
+
+ inline const CoefficientsType& coeffs() const { return m_coeffs; }
+
+ protected:
+
+ const CoefficientsType& m_coeffs;
+ internal::variable_if_dynamic<Index, _Rows> m_rows;
+ internal::variable_if_dynamic<Index, _Supers> m_supers;
+ internal::variable_if_dynamic<Index, _Subs> m_subs;
+};
+
+/**
+ * \class TridiagonalMatrix
+ * \ingroup Core_Module
+ *
+ * \brief Represents a tridiagonal matrix with a compact banded storage
+ *
+  * \param _Scalar Numeric type, e.g. float, double, int
+ * \param Size Number of rows and cols, or \b Dynamic
+ * \param _Options Can be 0 or \b SelfAdjoint
+ *
+ * \sa class BandMatrix
+ */
+template<typename Scalar, int Size, int Options>
+class TridiagonalMatrix : public BandMatrix<Scalar,Size,Size,Options&SelfAdjoint?0:1,1,Options|RowMajor>
+{
+ typedef BandMatrix<Scalar,Size,Size,Options&SelfAdjoint?0:1,1,Options|RowMajor> Base;
+ typedef typename Base::Index Index;
+ public:
+ TridiagonalMatrix(Index size = Size) : Base(size,size,Options&SelfAdjoint?0:1,1) {}
+
+ inline typename Base::template DiagonalIntReturnType<1>::Type super()
+ { return Base::template diagonal<1>(); }
+ inline const typename Base::template DiagonalIntReturnType<1>::Type super() const
+ { return Base::template diagonal<1>(); }
+ inline typename Base::template DiagonalIntReturnType<-1>::Type sub()
+ { return Base::template diagonal<-1>(); }
+ inline const typename Base::template DiagonalIntReturnType<-1>::Type sub() const
+ { return Base::template diagonal<-1>(); }
+ protected:
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BANDMATRIX_H
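A minimal usage sketch, not part of this patch: the BandMatrix family lives in
namespace Eigen::internal and is mostly used by Eigen itself, so treat the exact
calls below as an assumption based on the declarations above rather than a
documented public API.

#include <Eigen/Core>
#include <iostream>

int main() {
  // 4x4 general (non-selfadjoint) tridiagonal matrix in compact banded storage.
  Eigen::internal::TridiagonalMatrix<double, Eigen::Dynamic, 0> T(4);
  T.diagonal().setConstant(2.0);  // main diagonal
  T.super().setConstant(-1.0);    // first super-diagonal
  T.sub().setConstant(-1.0);      // first sub-diagonal
  std::cout << T.toDenseMatrix() << "\n";
}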
diff --git a/third_party/eigen3/Eigen/src/Core/Block.h b/third_party/eigen3/Eigen/src/Core/Block.h
new file mode 100644
index 0000000000..da193d1a22
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Block.h
@@ -0,0 +1,432 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BLOCK_H
+#define EIGEN_BLOCK_H
+
+namespace Eigen {
+
+/** \class Block
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a fixed-size or dynamic-size block
+ *
+ * \param XprType the type of the expression in which we are taking a block
+ * \param BlockRows the number of rows of the block we are taking at compile time (optional)
+ * \param BlockCols the number of columns of the block we are taking at compile time (optional)
+ * \param InnerPanel is true if the block maps to a set of rows of a row-major matrix or
+ * to a set of columns of a column-major matrix (optional). This parameter makes it possible to
+ * determine at compile time whether aligned access is possible on the block expression.
+ *
+ * This class represents an expression of either a fixed-size or dynamic-size block. It is the return
+ * type of DenseBase::block(Index,Index,Index,Index) and DenseBase::block<int,int>(Index,Index) and
+ * most of the time this is the only way it is used.
+ *
+ * However, if you want to directly manipulate block expressions,
+ * for instance if you want to write a function returning such an expression, you
+ * will need to use this class.
+ *
+ * Here is an example illustrating the dynamic case:
+ * \include class_Block.cpp
+ * Output: \verbinclude class_Block.out
+ *
+ * \note Even though this expression has dynamic size, in the case where \a XprType
+ * has fixed size, this expression inherits a fixed maximal size which means that evaluating
+ * it does not cause a dynamic memory allocation.
+ *
+ * Here is an example illustrating the fixed-size case:
+ * \include class_FixedBlock.cpp
+ * Output: \verbinclude class_FixedBlock.out
+ *
+ * \sa DenseBase::block(Index,Index,Index,Index), DenseBase::block(Index,Index), class VectorBlock
+ */
+
+namespace internal {
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+struct traits<Block<XprType, BlockRows, BlockCols, InnerPanel> > : traits<XprType>
+{
+ typedef typename traits<XprType>::Scalar Scalar;
+ typedef typename traits<XprType>::StorageKind StorageKind;
+ typedef typename traits<XprType>::XprKind XprKind;
+ typedef typename nested<XprType>::type XprTypeNested;
+ typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
+ enum{
+ MatrixRows = traits<XprType>::RowsAtCompileTime,
+ MatrixCols = traits<XprType>::ColsAtCompileTime,
+ RowsAtCompileTime = MatrixRows == 0 ? 0 : BlockRows,
+ ColsAtCompileTime = MatrixCols == 0 ? 0 : BlockCols,
+ MaxRowsAtCompileTime = BlockRows==0 ? 0
+ : RowsAtCompileTime != Dynamic ? int(RowsAtCompileTime)
+ : int(traits<XprType>::MaxRowsAtCompileTime),
+ MaxColsAtCompileTime = BlockCols==0 ? 0
+ : ColsAtCompileTime != Dynamic ? int(ColsAtCompileTime)
+ : int(traits<XprType>::MaxColsAtCompileTime),
+ XprTypeIsRowMajor = (int(traits<XprType>::Flags)&RowMajorBit) != 0,
+ IsRowMajor = (MaxRowsAtCompileTime==1&&MaxColsAtCompileTime!=1) ? 1
+ : (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0
+ : XprTypeIsRowMajor,
+ HasSameStorageOrderAsXprType = (IsRowMajor == XprTypeIsRowMajor),
+ InnerSize = IsRowMajor ? int(ColsAtCompileTime) : int(RowsAtCompileTime),
+ InnerStrideAtCompileTime = HasSameStorageOrderAsXprType
+ ? int(inner_stride_at_compile_time<XprType>::ret)
+ : int(outer_stride_at_compile_time<XprType>::ret),
+ OuterStrideAtCompileTime = HasSameStorageOrderAsXprType
+ ? int(outer_stride_at_compile_time<XprType>::ret)
+ : int(inner_stride_at_compile_time<XprType>::ret),
+ MaskPacketAccessBit = (InnerSize == Dynamic || (InnerSize % packet_traits<Scalar>::size) == 0)
+ && (InnerStrideAtCompileTime == 1)
+ ? PacketAccessBit : 0,
+ MaskAlignedBit = (InnerPanel && (OuterStrideAtCompileTime!=Dynamic) && (((OuterStrideAtCompileTime * int(sizeof(Scalar))) % EIGEN_ALIGN_BYTES) == 0)) ? AlignedBit : 0,
+ FlagsLinearAccessBit = (RowsAtCompileTime == 1 || ColsAtCompileTime == 1 || (InnerPanel && (traits<XprType>::Flags&LinearAccessBit))) ? LinearAccessBit : 0,
+ FlagsLvalueBit = is_lvalue<XprType>::value ? LvalueBit : 0,
+ FlagsRowMajorBit = IsRowMajor ? RowMajorBit : 0,
+ Flags0 = traits<XprType>::Flags & ( (HereditaryBits & ~RowMajorBit) |
+ DirectAccessBit |
+ MaskPacketAccessBit |
+ MaskAlignedBit),
+ Flags = Flags0 | FlagsLinearAccessBit | FlagsLvalueBit | FlagsRowMajorBit
+ };
+};
+
+template<typename XprType, int BlockRows=Dynamic, int BlockCols=Dynamic, bool InnerPanel = false,
+ bool HasDirectAccess = internal::has_direct_access<XprType>::ret> class BlockImpl_dense;
+
+} // end namespace internal
+
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, typename StorageKind> class BlockImpl;
+
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel> class Block
+ : public BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, typename internal::traits<XprType>::StorageKind>
+{
+ typedef BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, typename internal::traits<XprType>::StorageKind> Impl;
+ public:
+ //typedef typename Impl::Base Base;
+ typedef Impl Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(Block)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Block)
+
+ /** Column or Row constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline Block(XprType& xpr, Index i) : Impl(xpr,i)
+ {
+ eigen_assert( (i>=0) && (
+ ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && i<xpr.rows())
+ ||((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && i<xpr.cols())));
+ }
+
+ /** Fixed-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline Block(XprType& xpr, Index a_startRow, Index a_startCol)
+ : Impl(xpr, a_startRow, a_startCol)
+ {
+ EIGEN_STATIC_ASSERT(RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic,THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE)
+ eigen_assert(a_startRow >= 0 && BlockRows >= 1 && a_startRow + BlockRows <= xpr.rows()
+ && a_startCol >= 0 && BlockCols >= 1 && a_startCol + BlockCols <= xpr.cols());
+ }
+
+ /** Dynamic-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline Block(XprType& xpr,
+ Index a_startRow, Index a_startCol,
+ Index blockRows, Index blockCols)
+ : Impl(xpr, a_startRow, a_startCol, blockRows, blockCols)
+ {
+ eigen_assert((RowsAtCompileTime==Dynamic || RowsAtCompileTime==blockRows)
+ && (ColsAtCompileTime==Dynamic || ColsAtCompileTime==blockCols));
+ eigen_assert(a_startRow >= 0 && blockRows >= 0 && a_startRow <= xpr.rows() - blockRows
+ && a_startCol >= 0 && blockCols >= 0 && a_startCol <= xpr.cols() - blockCols);
+ }
+};
+
+// The generic default implementation for dense blocks simply forwards to the internal::BlockImpl_dense
+// that must be specialized for direct and non-direct access...
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+class BlockImpl<XprType, BlockRows, BlockCols, InnerPanel, Dense>
+ : public internal::BlockImpl_dense<XprType, BlockRows, BlockCols, InnerPanel>
+{
+ typedef internal::BlockImpl_dense<XprType, BlockRows, BlockCols, InnerPanel> Impl;
+ typedef typename XprType::Index Index;
+ public:
+ typedef Impl Base;
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl)
+ EIGEN_DEVICE_FUNC inline BlockImpl(XprType& xpr, Index i) : Impl(xpr,i) {}
+ EIGEN_DEVICE_FUNC inline BlockImpl(XprType& xpr, Index a_startRow, Index a_startCol) : Impl(xpr, a_startRow, a_startCol) {}
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl(XprType& xpr, Index a_startRow, Index a_startCol, Index blockRows, Index blockCols)
+ : Impl(xpr, a_startRow, a_startCol, blockRows, blockCols) {}
+};
+
+namespace internal {
+
+/** \internal Internal implementation of dense Blocks in the general case. */
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel, bool HasDirectAccess> class BlockImpl_dense
+ : public internal::dense_xpr_base<Block<XprType, BlockRows, BlockCols, InnerPanel> >::type
+{
+ typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
+ public:
+
+ typedef typename internal::dense_xpr_base<BlockType>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(BlockType)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense)
+
+ class InnerIterator;
+
+ /** Column or Row constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr, Index i)
+ : m_xpr(xpr),
+ // It is a row if and only if BlockRows==1 and BlockCols==XprType::ColsAtCompileTime,
+ // and it is a column if and only if BlockRows==XprType::RowsAtCompileTime and BlockCols==1,
+ // all other cases are invalid.
+ // The case of a 1x1 matrix is ambiguous, but the result is the same anyway.
+ m_startRow( (BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) ? i : 0),
+ m_startCol( (BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) ? i : 0),
+ m_blockRows(BlockRows==1 ? 1 : xpr.rows()),
+ m_blockCols(BlockCols==1 ? 1 : xpr.cols())
+ {}
+
+ /** Fixed-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr, Index a_startRow, Index a_startCol)
+ : m_xpr(xpr), m_startRow(a_startRow), m_startCol(a_startCol),
+ m_blockRows(BlockRows), m_blockCols(BlockCols)
+ {}
+
+ /** Dynamic-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr,
+ Index a_startRow, Index a_startCol,
+ Index blockRows, Index blockCols)
+ : m_xpr(xpr), m_startRow(a_startRow), m_startCol(a_startCol),
+ m_blockRows(blockRows), m_blockCols(blockCols)
+ {}
+
+ EIGEN_DEVICE_FUNC inline Index rows() const { return m_blockRows.value(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return m_blockCols.value(); }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index rowId, Index colId)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(XprType)
+ return m_xpr.const_cast_derived()
+ .coeffRef(rowId + m_startRow.value(), colId + m_startCol.value());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return m_xpr.derived()
+ .coeffRef(rowId + m_startRow.value(), colId + m_startCol.value());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const CoeffReturnType coeff(Index rowId, Index colId) const
+ {
+ return m_xpr.coeff(rowId + m_startRow.value(), colId + m_startCol.value());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(XprType)
+ return m_xpr.const_cast_derived()
+ .coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return m_xpr.const_cast_derived()
+ .coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const CoeffReturnType coeff(Index index) const
+ {
+ return m_xpr
+ .coeff(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ template<int LoadMode>
+ inline PacketScalar packet(Index rowId, Index colId) const
+ {
+ return m_xpr.template packet<Unaligned>
+ (rowId + m_startRow.value(), colId + m_startCol.value());
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index rowId, Index colId, const PacketScalar& val)
+ {
+ m_xpr.const_cast_derived().template writePacket<Unaligned>
+ (rowId + m_startRow.value(), colId + m_startCol.value(), val);
+ }
+
+ template<int LoadMode>
+ inline PacketScalar packet(Index index) const
+ {
+ return m_xpr.template packet<Unaligned>
+ (m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& val)
+ {
+ m_xpr.const_cast_derived().template writePacket<Unaligned>
+ (m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0), val);
+ }
+
+ #ifdef EIGEN_PARSED_BY_DOXYGEN
+ /** \sa MapBase::data() */
+ EIGEN_DEVICE_FUNC inline const Scalar* data() const;
+ EIGEN_DEVICE_FUNC inline Index innerStride() const;
+ EIGEN_DEVICE_FUNC inline Index outerStride() const;
+ #endif
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type& nestedExpression() const
+ {
+ return m_xpr;
+ }
+
+ EIGEN_DEVICE_FUNC
+ Index startRow() const
+ {
+ return m_startRow.value();
+ }
+
+ EIGEN_DEVICE_FUNC
+ Index startCol() const
+ {
+ return m_startCol.value();
+ }
+
+ protected:
+
+ const typename XprType::Nested m_xpr;
+ const internal::variable_if_dynamic<Index, XprType::RowsAtCompileTime == 1 ? 0 : Dynamic> m_startRow;
+ const internal::variable_if_dynamic<Index, XprType::ColsAtCompileTime == 1 ? 0 : Dynamic> m_startCol;
+ const internal::variable_if_dynamic<Index, RowsAtCompileTime> m_blockRows;
+ const internal::variable_if_dynamic<Index, ColsAtCompileTime> m_blockCols;
+};
+
+/** \internal Internal implementation of dense Blocks in the direct access case.*/
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+class BlockImpl_dense<XprType,BlockRows,BlockCols, InnerPanel,true>
+ : public MapBase<Block<XprType, BlockRows, BlockCols, InnerPanel> >
+{
+ typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
+ public:
+
+ typedef MapBase<BlockType> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(BlockType)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl_dense)
+
+ /** Column or Row constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr, Index i)
+ : Base(internal::const_cast_ptr(&xpr.coeffRef(
+ (BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) ? i : 0,
+ (BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) ? i : 0)),
+ BlockRows==1 ? 1 : xpr.rows(),
+ BlockCols==1 ? 1 : xpr.cols()),
+ m_xpr(xpr)
+ {
+ init();
+ }
+
+ /** Fixed-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr, Index startRow, Index startCol)
+ : Base(internal::const_cast_ptr(&xpr.coeffRef(startRow,startCol))), m_xpr(xpr)
+ {
+ init();
+ }
+
+ /** Dynamic-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr,
+ Index startRow, Index startCol,
+ Index blockRows, Index blockCols)
+ : Base(internal::const_cast_ptr(&xpr.coeffRef(startRow,startCol)), blockRows, blockCols),
+ m_xpr(xpr)
+ {
+ init();
+ }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type& nestedExpression() const
+ {
+ return m_xpr;
+ }
+
+ /** \sa MapBase::innerStride() */
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const
+ {
+ return internal::traits<BlockType>::HasSameStorageOrderAsXprType
+ ? m_xpr.innerStride()
+ : m_xpr.outerStride();
+ }
+
+ /** \sa MapBase::outerStride() */
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const
+ {
+ return m_outerStride;
+ }
+
+ #ifndef __SUNPRO_CC
+ // FIXME sunstudio is not friendly with the above friend...
+ // META-FIXME there is no 'friend' keyword around here. Is this obsolete?
+ protected:
+ #endif
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal used by allowAligned() */
+ EIGEN_DEVICE_FUNC
+ inline BlockImpl_dense(XprType& xpr, const Scalar* data, Index blockRows, Index blockCols)
+ : Base(data, blockRows, blockCols), m_xpr(xpr)
+ {
+ init();
+ }
+ #endif
+
+ protected:
+ EIGEN_DEVICE_FUNC
+ void init()
+ {
+ m_outerStride = internal::traits<BlockType>::HasSameStorageOrderAsXprType
+ ? m_xpr.outerStride()
+ : m_xpr.innerStride();
+ }
+
+ typename XprType::Nested m_xpr;
+ Index m_outerStride;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BLOCK_H
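A short sketch, not from the patch, of the pattern the Block documentation above
describes: a function that returns a block expression (a view) instead of a copy.
The function name and the 2x2 size are illustrative assumptions.

#include <Eigen/Core>
#include <iostream>

// Returns a writable view of the top-left 2x2 corner of any dense matrix.
template <typename Derived>
Eigen::Block<Derived, 2, 2> topLeft2x2(Eigen::MatrixBase<Derived>& m) {
  return Eigen::Block<Derived, 2, 2>(m.derived(), 0, 0);  // fixed-size constructor
}

int main() {
  Eigen::Matrix4d m = Eigen::Matrix4d::Identity();
  topLeft2x2(m).setConstant(7.0);  // writes through the view into m
  std::cout << m << "\n";
}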
diff --git a/third_party/eigen3/Eigen/src/Core/BooleanRedux.h b/third_party/eigen3/Eigen/src/Core/BooleanRedux.h
new file mode 100644
index 0000000000..be9f48a8c7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/BooleanRedux.h
@@ -0,0 +1,154 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ALLANDANY_H
+#define EIGEN_ALLANDANY_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Derived, int UnrollCount>
+struct all_unroller
+{
+ enum {
+ col = (UnrollCount-1) / Derived::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived::RowsAtCompileTime
+ };
+
+ static inline bool run(const Derived &mat)
+ {
+ return all_unroller<Derived, UnrollCount-1>::run(mat) && mat.coeff(row, col);
+ }
+};
+
+template<typename Derived>
+struct all_unroller<Derived, 0>
+{
+ static inline bool run(const Derived &/*mat*/) { return true; }
+};
+
+template<typename Derived>
+struct all_unroller<Derived, Dynamic>
+{
+ static inline bool run(const Derived &) { return false; }
+};
+
+template<typename Derived, int UnrollCount>
+struct any_unroller
+{
+ enum {
+ col = (UnrollCount-1) / Derived::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived::RowsAtCompileTime
+ };
+
+ static inline bool run(const Derived &mat)
+ {
+ return any_unroller<Derived, UnrollCount-1>::run(mat) || mat.coeff(row, col);
+ }
+};
+
+template<typename Derived>
+struct any_unroller<Derived, 0>
+{
+ static inline bool run(const Derived & /*mat*/) { return false; }
+};
+
+template<typename Derived>
+struct any_unroller<Derived, Dynamic>
+{
+ static inline bool run(const Derived &) { return false; }
+};
+
+} // end namespace internal
+
+/** \returns true if all coefficients are true
+ *
+ * Example: \include MatrixBase_all.cpp
+ * Output: \verbinclude MatrixBase_all.out
+ *
+ * \sa any(), Cwise::operator<()
+ */
+template<typename Derived>
+inline bool DenseBase<Derived>::all() const
+{
+ enum {
+ unroll = SizeAtCompileTime != Dynamic
+ && CoeffReadCost != Dynamic
+ && NumTraits<Scalar>::AddCost != Dynamic
+ && SizeAtCompileTime * (CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
+ };
+ if(unroll)
+ return internal::all_unroller<Derived, unroll ? int(SizeAtCompileTime) : Dynamic>::run(derived());
+ else
+ {
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = 0; i < rows(); ++i)
+ if (!coeff(i, j)) return false;
+ return true;
+ }
+}
+
+/** \returns true if at least one coefficient is true
+ *
+ * \sa all()
+ */
+template<typename Derived>
+inline bool DenseBase<Derived>::any() const
+{
+ enum {
+ unroll = SizeAtCompileTime != Dynamic
+ && CoeffReadCost != Dynamic
+ && NumTraits<Scalar>::AddCost != Dynamic
+ && SizeAtCompileTime * (CoeffReadCost + NumTraits<Scalar>::AddCost) <= EIGEN_UNROLLING_LIMIT
+ };
+ if(unroll)
+ return internal::any_unroller<Derived, unroll ? int(SizeAtCompileTime) : Dynamic>::run(derived());
+ else
+ {
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = 0; i < rows(); ++i)
+ if (coeff(i, j)) return true;
+ return false;
+ }
+}
+
+/** \returns the number of coefficients which evaluate to true
+ *
+ * \sa all(), any()
+ */
+template<typename Derived>
+inline typename DenseBase<Derived>::Index DenseBase<Derived>::count() const
+{
+ return derived().template cast<bool>().template cast<Index>().sum();
+}
+
+/** \returns true if \c *this contains at least one Not A Number (NaN).
+ *
+ * \sa allFinite()
+ */
+template<typename Derived>
+inline bool DenseBase<Derived>::hasNaN() const
+{
+ return !((derived().array()==derived().array()).all());
+}
+
+/** \returns true if \c *this contains only finite numbers, i.e., no NaN and no +/-INF values.
+ *
+ * \sa hasNaN()
+ */
+template<typename Derived>
+inline bool DenseBase<Derived>::allFinite() const
+{
+ return !((derived()-derived()).hasNaN());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ALLANDANY_H
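An illustrative sketch (not part of the diff) exercising the reductions defined
above. Note that hasNaN() relies on NaN != NaN, and allFinite() on x - x being
NaN exactly when x is NaN or +/-infinity.

#include <Eigen/Core>
#include <iostream>
#include <limits>

int main() {
  Eigen::Array3d a(1.0, 2.0, 3.0);
  std::cout << (a > 0.0).all() << "\n";    // 1: every coefficient is positive
  std::cout << (a > 2.5).any() << "\n";    // 1: at least one coefficient exceeds 2.5
  std::cout << (a > 1.5).count() << "\n";  // 2: number of coefficients above 1.5

  a(1) = std::numeric_limits<double>::quiet_NaN();
  std::cout << a.hasNaN() << " " << a.allFinite() << "\n";  // 1 0
}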
diff --git a/third_party/eigen3/Eigen/src/Core/CommaInitializer.h b/third_party/eigen3/Eigen/src/Core/CommaInitializer.h
new file mode 100644
index 0000000000..70cbfeff55
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CommaInitializer.h
@@ -0,0 +1,161 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMMAINITIALIZER_H
+#define EIGEN_COMMAINITIALIZER_H
+
+namespace Eigen {
+
+/** \class CommaInitializer
+ * \ingroup Core_Module
+ *
+ * \brief Helper class used by the comma initializer operator
+ *
+ * This class is internally used to implement the comma initializer feature. It is
+ * the return type of MatrixBase::operator<<, and most of the time this is the only
+ * way it is used.
+ *
+ * \sa \ref MatrixBaseCommaInitRef "MatrixBase::operator<<", CommaInitializer::finished()
+ */
+template<typename XprType>
+struct CommaInitializer
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::Index Index;
+
+ EIGEN_DEVICE_FUNC
+ inline CommaInitializer(XprType& xpr, const Scalar& s)
+ : m_xpr(xpr), m_row(0), m_col(1), m_currentBlockRows(1)
+ {
+ m_xpr.coeffRef(0,0) = s;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ inline CommaInitializer(XprType& xpr, const DenseBase<OtherDerived>& other)
+ : m_xpr(xpr), m_row(0), m_col(other.cols()), m_currentBlockRows(other.rows())
+ {
+ m_xpr.block(0, 0, other.rows(), other.cols()) = other;
+ }
+
+ /* Copy/Move constructor which transfers ownership. This is crucial in the
+ * absence of return value optimization to avoid assertions during destruction. */
+ // FIXME in C++11 mode this could be replaced by a proper RValue constructor
+ EIGEN_DEVICE_FUNC
+ inline CommaInitializer(const CommaInitializer& o)
+ : m_xpr(o.m_xpr), m_row(o.m_row), m_col(o.m_col), m_currentBlockRows(o.m_currentBlockRows) {
+ // Mark original object as finished. In absence of R-value references we need to const_cast:
+ const_cast<CommaInitializer&>(o).m_row = m_xpr.rows();
+ const_cast<CommaInitializer&>(o).m_col = m_xpr.cols();
+ const_cast<CommaInitializer&>(o).m_currentBlockRows = 0;
+ }
+
+ /* inserts a scalar value in the target matrix */
+ EIGEN_DEVICE_FUNC
+ CommaInitializer& operator,(const Scalar& s)
+ {
+ if (m_col==m_xpr.cols())
+ {
+ m_row+=m_currentBlockRows;
+ m_col = 0;
+ m_currentBlockRows = 1;
+ eigen_assert(m_row<m_xpr.rows()
+ && "Too many rows passed to comma initializer (operator<<)");
+ }
+ eigen_assert(m_col<m_xpr.cols()
+ && "Too many coefficients passed to comma initializer (operator<<)");
+ eigen_assert(m_currentBlockRows==1);
+ m_xpr.coeffRef(m_row, m_col++) = s;
+ return *this;
+ }
+
+ /* inserts a matrix expression in the target matrix */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ CommaInitializer& operator,(const DenseBase<OtherDerived>& other)
+ {
+ if(other.cols()==0 || other.rows()==0)
+ return *this;
+ if (m_col==m_xpr.cols())
+ {
+ m_row+=m_currentBlockRows;
+ m_col = 0;
+ m_currentBlockRows = other.rows();
+ eigen_assert(m_row+m_currentBlockRows<=m_xpr.rows()
+ && "Too many rows passed to comma initializer (operator<<)");
+ }
+ eigen_assert(m_col<m_xpr.cols()
+ && "Too many coefficients passed to comma initializer (operator<<)");
+ eigen_assert(m_currentBlockRows==other.rows());
+ if (OtherDerived::SizeAtCompileTime != Dynamic)
+ m_xpr.template block<OtherDerived::RowsAtCompileTime != Dynamic ? OtherDerived::RowsAtCompileTime : 1,
+ OtherDerived::ColsAtCompileTime != Dynamic ? OtherDerived::ColsAtCompileTime : 1>
+ (m_row, m_col) = other;
+ else
+ m_xpr.block(m_row, m_col, other.rows(), other.cols()) = other;
+ m_col += other.cols();
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline ~CommaInitializer()
+ {
+ eigen_assert((m_row+m_currentBlockRows) == m_xpr.rows()
+ && m_col == m_xpr.cols()
+ && "Too few coefficients passed to comma initializer (operator<<)");
+ }
+
+ /** \returns the built matrix once all its coefficients have been set.
+ * Calling finished is 100% optional. Its purpose is to write expressions
+ * like this:
+ * \code
+ * quaternion.fromRotationMatrix((Matrix3f() << axis0, axis1, axis2).finished());
+ * \endcode
+ */
+ EIGEN_DEVICE_FUNC
+ inline XprType& finished() { return m_xpr; }
+
+ XprType& m_xpr; // target expression
+ Index m_row; // current row id
+ Index m_col; // current col id
+ Index m_currentBlockRows; // current block height
+};
+
+/** \anchor MatrixBaseCommaInitRef
+ * Convenient operator to set the coefficients of a matrix.
+ *
+ * The coefficients must be provided in row-major order and must exactly match
+ * the size of the matrix. Otherwise an assertion is raised.
+ *
+ * Example: \include MatrixBase_set.cpp
+ * Output: \verbinclude MatrixBase_set.out
+ *
+ * \note According to the C++ standard, the argument expressions of this comma initializer are evaluated in arbitrary order.
+ *
+ * \sa CommaInitializer::finished(), class CommaInitializer
+ */
+template<typename Derived>
+inline CommaInitializer<Derived> DenseBase<Derived>::operator<< (const Scalar& s)
+{
+ return CommaInitializer<Derived>(*static_cast<Derived*>(this), s);
+}
+
+/** \sa operator<<(const Scalar&) */
+template<typename Derived>
+template<typename OtherDerived>
+inline CommaInitializer<Derived>
+DenseBase<Derived>::operator<<(const DenseBase<OtherDerived>& other)
+{
+ return CommaInitializer<Derived>(*static_cast<Derived *>(this), other);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMMAINITIALIZER_H
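A small usage sketch (not part of the patch): coefficients passed through
operator<< and operator, are consumed row by row, and finished() returns the
built matrix so the whole expression can be used in-line.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Matrix3d m;
  m << 1, 2, 3,
       4, 5, 6,
       7, 8, 9;  // fills m in row-major order

  Eigen::Vector3d v = (Eigen::Vector3d() << 1, 0, 0).finished();
  std::cout << m * v << "\n";  // first column of m: 1 4 7
}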
diff --git a/third_party/eigen3/Eigen/src/Core/CoreEvaluators.h b/third_party/eigen3/Eigen/src/Core/CoreEvaluators.h
new file mode 100644
index 0000000000..3568cb85f9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CoreEvaluators.h
@@ -0,0 +1,1121 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2011-2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+#ifndef EIGEN_COREEVALUATORS_H
+#define EIGEN_COREEVALUATORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+// evaluator_traits<T> contains traits for evaluator_impl<T>
+
+template<typename T>
+struct evaluator_traits
+{
+ // 1 if evaluator_impl<T>::evalTo() exists
+ // 0 if evaluator_impl<T> allows coefficient-based access
+ static const int HasEvalTo = 0;
+
+ // 1 if assignment A = B assumes aliasing when B is of type T and thus B needs to be evaluated into a
+ // temporary; 0 if not.
+ static const int AssumeAliasing = 0;
+};
+
+// expression class for evaluating nested expression to a temporary
+
+template<typename ArgType>
+class EvalToTemp;
+
+// evaluator<T>::type is type of evaluator for T
+// evaluator<T>::nestedType is type of evaluator if T is nested inside another evaluator
+
+template<typename T>
+struct evaluator_impl
+{ };
+
+template<typename T, int Nested = evaluator_traits<T>::HasEvalTo>
+struct evaluator_nested_type;
+
+template<typename T>
+struct evaluator_nested_type<T, 0>
+{
+ typedef evaluator_impl<T> type;
+};
+
+template<typename T>
+struct evaluator_nested_type<T, 1>
+{
+ typedef evaluator_impl<EvalToTemp<T> > type;
+};
+
+template<typename T>
+struct evaluator
+{
+ typedef evaluator_impl<T> type;
+ typedef typename evaluator_nested_type<T>::type nestedType;
+};
+
+// TODO: Think about const-correctness
+
+template<typename T>
+struct evaluator<const T>
+ : evaluator<T>
+{ };
+
+// ---------- base class for all writable evaluators ----------
+
+// TODO this class does not seem to be necessary anymore
+template<typename ExpressionType>
+struct evaluator_impl_base
+{
+ typedef typename ExpressionType::Index Index;
+ // TODO it is not very nice to have to propagate all these traits. They are currently only needed to handle outer/inner indices.
+ typedef traits<ExpressionType> ExpressionTraits;
+
+ evaluator_impl<ExpressionType>& derived()
+ {
+ return *static_cast<evaluator_impl<ExpressionType>*>(this);
+ }
+};
+
+// -------------------- Matrix and Array --------------------
+//
+// evaluator_impl<PlainObjectBase> is a common base class for the
+// Matrix and Array evaluators.
+
+template<typename Derived>
+struct evaluator_impl<PlainObjectBase<Derived> >
+ : evaluator_impl_base<Derived>
+{
+ typedef PlainObjectBase<Derived> PlainObjectType;
+
+ enum {
+ IsRowMajor = PlainObjectType::IsRowMajor,
+ IsVectorAtCompileTime = PlainObjectType::IsVectorAtCompileTime,
+ RowsAtCompileTime = PlainObjectType::RowsAtCompileTime,
+ ColsAtCompileTime = PlainObjectType::ColsAtCompileTime
+ };
+
+ evaluator_impl(const PlainObjectType& m)
+ : m_data(m.data()), m_outerStride(IsVectorAtCompileTime ? 0 : m.outerStride())
+ { }
+
+ typedef typename PlainObjectType::Index Index;
+ typedef typename PlainObjectType::Scalar Scalar;
+ typedef typename PlainObjectType::CoeffReturnType CoeffReturnType;
+ typedef typename PlainObjectType::PacketScalar PacketScalar;
+ typedef typename PlainObjectType::PacketReturnType PacketReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ if (IsRowMajor)
+ return m_data[row * m_outerStride.value() + col];
+ else
+ return m_data[row + col * m_outerStride.value()];
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_data[index];
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ if (IsRowMajor)
+ return const_cast<Scalar*>(m_data)[row * m_outerStride.value() + col];
+ else
+ return const_cast<Scalar*>(m_data)[row + col * m_outerStride.value()];
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return const_cast<Scalar*>(m_data)[index];
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ if (IsRowMajor)
+ return ploadt<PacketScalar, LoadMode>(m_data + row * m_outerStride.value() + col);
+ else
+ return ploadt<PacketScalar, LoadMode>(m_data + row + col * m_outerStride.value());
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return ploadt<PacketScalar, LoadMode>(m_data + index);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ if (IsRowMajor)
+ return pstoret<Scalar, PacketScalar, StoreMode>
+ (const_cast<Scalar*>(m_data) + row * m_outerStride.value() + col, x);
+ else
+ return pstoret<Scalar, PacketScalar, StoreMode>
+ (const_cast<Scalar*>(m_data) + row + col * m_outerStride.value(), x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ return pstoret<Scalar, PacketScalar, StoreMode>(const_cast<Scalar*>(m_data) + index, x);
+ }
+
+protected:
+ const Scalar *m_data;
+
+ // We do not need to know the outer stride for vectors
+ variable_if_dynamic<Index, IsVectorAtCompileTime ? 0
+ : int(IsRowMajor) ? ColsAtCompileTime
+ : RowsAtCompileTime> m_outerStride;
+};
+
+template<typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
+struct evaluator_impl<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
+ : evaluator_impl<PlainObjectBase<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> > >
+{
+ typedef Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
+
+ evaluator_impl(const XprType& m)
+ : evaluator_impl<PlainObjectBase<XprType> >(m)
+ { }
+};
+
+template<typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
+struct evaluator_impl<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
+ : evaluator_impl<PlainObjectBase<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> > >
+{
+ typedef Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> XprType;
+
+ evaluator_impl(const XprType& m)
+ : evaluator_impl<PlainObjectBase<XprType> >(m)
+ { }
+};
+
+// -------------------- EvalToTemp --------------------
+
+template<typename ArgType>
+struct traits<EvalToTemp<ArgType> >
+ : public traits<ArgType>
+{ };
+
+template<typename ArgType>
+class EvalToTemp
+ : public dense_xpr_base<EvalToTemp<ArgType> >::type
+{
+ public:
+
+ typedef typename dense_xpr_base<EvalToTemp>::type Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(EvalToTemp)
+
+ EvalToTemp(const ArgType& arg)
+ : m_arg(arg)
+ { }
+
+ const ArgType& arg() const
+ {
+ return m_arg;
+ }
+
+ Index rows() const
+ {
+ return m_arg.rows();
+ }
+
+ Index cols() const
+ {
+ return m_arg.cols();
+ }
+
+ private:
+ const ArgType& m_arg;
+};
+
+template<typename ArgType>
+struct evaluator_impl<EvalToTemp<ArgType> >
+{
+ typedef EvalToTemp<ArgType> XprType;
+ typedef typename ArgType::PlainObject PlainObject;
+
+ evaluator_impl(const XprType& xpr)
+ : m_result(xpr.rows(), xpr.cols()), m_resultImpl(m_result)
+ {
+ // TODO we should simply do m_result(xpr.arg());
+ call_dense_assignment_loop(m_result, xpr.arg());
+ }
+
+ // This constructor is used when nesting an EvalTo evaluator in another evaluator
+ evaluator_impl(const ArgType& arg)
+ : m_result(arg.rows(), arg.cols()), m_resultImpl(m_result)
+ {
+ // TODO we should simply do m_result(xpr.arg());
+ call_dense_assignment_loop(m_result, arg);
+ }
+
+ typedef typename PlainObject::Index Index;
+ typedef typename PlainObject::Scalar Scalar;
+ typedef typename PlainObject::CoeffReturnType CoeffReturnType;
+ typedef typename PlainObject::PacketScalar PacketScalar;
+ typedef typename PlainObject::PacketReturnType PacketReturnType;
+
+ // All other functions are forwarded to m_resultImpl
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_resultImpl.coeff(row, col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_resultImpl.coeff(index);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_resultImpl.coeffRef(row, col);
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return m_resultImpl.coeffRef(index);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ return m_resultImpl.template packet<LoadMode>(row, col);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return m_resultImpl.packet<LoadMode>(index);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_resultImpl.template writePacket<StoreMode>(row, col, x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ m_resultImpl.template writePacket<StoreMode>(index, x);
+ }
+
+protected:
+ PlainObject m_result;
+ typename evaluator<PlainObject>::nestedType m_resultImpl;
+};
+
+// -------------------- Transpose --------------------
+
+template<typename ArgType>
+struct evaluator_impl<Transpose<ArgType> >
+ : evaluator_impl_base<Transpose<ArgType> >
+{
+ typedef Transpose<ArgType> XprType;
+
+ evaluator_impl(const XprType& t) : m_argImpl(t.nestedExpression()) {}
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_argImpl.coeff(col, row);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_argImpl.coeff(index);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_argImpl.coeffRef(col, row);
+ }
+
+ typename XprType::Scalar& coeffRef(Index index)
+ {
+ return m_argImpl.coeffRef(index);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ return m_argImpl.template packet<LoadMode>(col, row);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return m_argImpl.template packet<LoadMode>(index);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<StoreMode>(col, row, x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<StoreMode>(index, x);
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+};
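Illustrative only (not part of the patch): the evaluator above simply swaps
(row, col) when forwarding to the nested expression, which is what makes the
following view free of any copy.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Matrix<double, 2, 3> m;
  m << 1, 2, 3,
       4, 5, 6;
  std::cout << m.transpose() << "\n";  // 3x2 view, no temporary is created
}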
+
+// -------------------- CwiseNullaryOp --------------------
+
+template<typename NullaryOp, typename PlainObjectType>
+struct evaluator_impl<CwiseNullaryOp<NullaryOp,PlainObjectType> >
+{
+ typedef CwiseNullaryOp<NullaryOp,PlainObjectType> XprType;
+
+ evaluator_impl(const XprType& n)
+ : m_functor(n.functor())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_functor(row, col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(index);
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index row, Index col) const
+ {
+ return m_functor.packetOp(row, col);
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index index) const
+ {
+ return m_functor.packetOp(index);
+ }
+
+protected:
+ const NullaryOp m_functor;
+};
+
+// -------------------- CwiseUnaryOp --------------------
+
+template<typename UnaryOp, typename ArgType>
+struct evaluator_impl<CwiseUnaryOp<UnaryOp, ArgType> >
+{
+ typedef CwiseUnaryOp<UnaryOp, ArgType> XprType;
+
+ evaluator_impl(const XprType& op)
+ : m_functor(op.functor()),
+ m_argImpl(op.nestedExpression())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_functor(m_argImpl.coeff(row, col));
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(m_argImpl.coeff(index));
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index row, Index col) const
+ {
+ return m_functor.packetOp(m_argImpl.template packet<LoadMode>(row, col));
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index index) const
+ {
+ return m_functor.packetOp(m_argImpl.template packet<LoadMode>(index));
+ }
+
+protected:
+ const UnaryOp m_functor;
+ typename evaluator<ArgType>::nestedType m_argImpl;
+};
+
+// -------------------- CwiseBinaryOp --------------------
+
+template<typename BinaryOp, typename Lhs, typename Rhs>
+struct evaluator_impl<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
+{
+ typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> XprType;
+
+ evaluator_impl(const XprType& xpr)
+ : m_functor(xpr.functor()),
+ m_lhsImpl(xpr.lhs()),
+ m_rhsImpl(xpr.rhs())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_functor(m_lhsImpl.coeff(row, col), m_rhsImpl.coeff(row, col));
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(m_lhsImpl.coeff(index), m_rhsImpl.coeff(index));
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index row, Index col) const
+ {
+ return m_functor.packetOp(m_lhsImpl.template packet<LoadMode>(row, col),
+ m_rhsImpl.template packet<LoadMode>(row, col));
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index index) const
+ {
+ return m_functor.packetOp(m_lhsImpl.template packet<LoadMode>(index),
+ m_rhsImpl.template packet<LoadMode>(index));
+ }
+
+protected:
+ const BinaryOp m_functor;
+ typename evaluator<Lhs>::nestedType m_lhsImpl;
+ typename evaluator<Rhs>::nestedType m_rhsImpl;
+};
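Illustrative only (not part of the patch): the nullary, unary and binary
evaluators above are what run behind ordinary coefficient-wise expressions
such as the one below.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Array3d a = Eigen::Array3d::Constant(2.0);  // CwiseNullaryOp
  Eigen::Array3d b(1.0, 2.0, 3.0);
  std::cout << (a.sqrt() + b).transpose() << "\n";   // CwiseUnaryOp fed into CwiseBinaryOp
}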
+
+// -------------------- CwiseUnaryView --------------------
+
+template<typename UnaryOp, typename ArgType>
+struct evaluator_impl<CwiseUnaryView<UnaryOp, ArgType> >
+ : evaluator_impl_base<CwiseUnaryView<UnaryOp, ArgType> >
+{
+ typedef CwiseUnaryView<UnaryOp, ArgType> XprType;
+
+ evaluator_impl(const XprType& op)
+ : m_unaryOp(op.functor()),
+ m_argImpl(op.nestedExpression())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_unaryOp(m_argImpl.coeff(row, col));
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_unaryOp(m_argImpl.coeff(index));
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_unaryOp(m_argImpl.coeffRef(row, col));
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return m_unaryOp(m_argImpl.coeffRef(index));
+ }
+
+protected:
+ const UnaryOp m_unaryOp;
+ typename evaluator<ArgType>::nestedType m_argImpl;
+};
+
+// -------------------- Map --------------------
+
+template<typename Derived, int AccessorsType>
+struct evaluator_impl<MapBase<Derived, AccessorsType> >
+ : evaluator_impl_base<Derived>
+{
+ typedef MapBase<Derived, AccessorsType> MapType;
+ typedef Derived XprType;
+
+ typedef typename XprType::PointerType PointerType;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ evaluator_impl(const XprType& map)
+ : m_data(const_cast<PointerType>(map.data())),
+ m_rowStride(map.rowStride()),
+ m_colStride(map.colStride())
+ { }
+
+ enum {
+ RowsAtCompileTime = XprType::RowsAtCompileTime
+ };
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_data[col * m_colStride + row * m_rowStride];
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return coeff(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_data[col * m_colStride + row * m_rowStride];
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return coeffRef(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ PointerType ptr = m_data + row * m_rowStride + col * m_colStride;
+ return internal::ploadt<PacketScalar, LoadMode>(ptr);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return packet<LoadMode>(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ PointerType ptr = m_data + row * m_rowStride + col * m_colStride;
+ return internal::pstoret<Scalar, PacketScalar, StoreMode>(ptr, x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ return writePacket<StoreMode>(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0,
+ x);
+ }
+
+protected:
+ PointerType m_data;
+ int m_rowStride;
+ int m_colStride;
+};
+
+template<typename PlainObjectType, int MapOptions, typename StrideType>
+struct evaluator_impl<Map<PlainObjectType, MapOptions, StrideType> >
+ : public evaluator_impl<MapBase<Map<PlainObjectType, MapOptions, StrideType> > >
+{
+ typedef Map<PlainObjectType, MapOptions, StrideType> XprType;
+
+ evaluator_impl(const XprType& map)
+ : evaluator_impl<MapBase<XprType> >(map)
+ { }
+};
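Illustrative only (not part of the patch): the Map evaluator above just reads
through a raw pointer with row/column strides, which is what the user-level
Map class exposes.

#include <Eigen/Core>
#include <iostream>

int main() {
  double raw[6] = {1, 2, 3, 4, 5, 6};
  Eigen::Map<Eigen::Matrix<double, 2, 3, Eigen::RowMajor> > m(raw);
  std::cout << m(1, 2) << "\n";  // 6: raw[] is viewed in place, no copy
}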
+
+// -------------------- Block --------------------
+
+template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel,
+ bool HasDirectAccess = internal::has_direct_access<ArgType>::ret> struct block_evaluator;
+
+template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
+struct evaluator_impl<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
+ : block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel>
+{
+ typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
+ typedef block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel> block_evaluator_type;
+ evaluator_impl(const XprType& block) : block_evaluator_type(block) {}
+};
+
+template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
+struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /*HasDirectAccess*/ false>
+ : evaluator_impl_base<Block<ArgType, BlockRows, BlockCols, InnerPanel> >
+{
+ typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
+
+ block_evaluator(const XprType& block)
+ : m_argImpl(block.nestedExpression()),
+ m_startRow(block.startRow()),
+ m_startCol(block.startCol())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ enum {
+ RowsAtCompileTime = XprType::RowsAtCompileTime
+ };
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_argImpl.coeff(m_startRow.value() + row, m_startCol.value() + col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return coeff(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_argImpl.coeffRef(m_startRow.value() + row, m_startCol.value() + col);
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return coeffRef(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ return m_argImpl.template packet<LoadMode>(m_startRow.value() + row, m_startCol.value() + col);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return packet<LoadMode>(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ return m_argImpl.template writePacket<StoreMode>(m_startRow.value() + row, m_startCol.value() + col, x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ return writePacket<StoreMode>(RowsAtCompileTime == 1 ? 0 : index,
+ RowsAtCompileTime == 1 ? index : 0,
+ x);
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+ const variable_if_dynamic<Index, ArgType::RowsAtCompileTime == 1 ? 0 : Dynamic> m_startRow;
+ const variable_if_dynamic<Index, ArgType::ColsAtCompileTime == 1 ? 0 : Dynamic> m_startCol;
+};
+
+// TODO: This evaluator does not actually use the child evaluator;
+// all action is via the data() as returned by the Block expression.
+
+template<typename ArgType, int BlockRows, int BlockCols, bool InnerPanel>
+struct block_evaluator<ArgType, BlockRows, BlockCols, InnerPanel, /* HasDirectAccess */ true>
+ : evaluator_impl<MapBase<Block<ArgType, BlockRows, BlockCols, InnerPanel> > >
+{
+ typedef Block<ArgType, BlockRows, BlockCols, InnerPanel> XprType;
+
+ block_evaluator(const XprType& block)
+ : evaluator_impl<MapBase<XprType> >(block)
+ { }
+};
+
+
+// -------------------- Select --------------------
+
+template<typename ConditionMatrixType, typename ThenMatrixType, typename ElseMatrixType>
+struct evaluator_impl<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
+{
+ typedef Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> XprType;
+
+ evaluator_impl(const XprType& select)
+ : m_conditionImpl(select.conditionMatrix()),
+ m_thenImpl(select.thenMatrix()),
+ m_elseImpl(select.elseMatrix())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ if (m_conditionImpl.coeff(row, col))
+ return m_thenImpl.coeff(row, col);
+ else
+ return m_elseImpl.coeff(row, col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ if (m_conditionImpl.coeff(index))
+ return m_thenImpl.coeff(index);
+ else
+ return m_elseImpl.coeff(index);
+ }
+
+protected:
+ typename evaluator<ConditionMatrixType>::nestedType m_conditionImpl;
+ typename evaluator<ThenMatrixType>::nestedType m_thenImpl;
+ typename evaluator<ElseMatrixType>::nestedType m_elseImpl;
+};
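Illustrative only (not part of the patch): the Select evaluator above backs
DenseBase::select(), which picks each coefficient from the "then" or "else"
expression depending on the condition.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Array3d a(-1.0, 2.0, -3.0);
  Eigen::Array3d clamped = (a > 0.0).select(a, 0.0);  // negative entries become 0
  std::cout << clamped.transpose() << "\n";           // 0 2 0
}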
+
+
+// -------------------- Replicate --------------------
+
+template<typename ArgType, int RowFactor, int ColFactor>
+struct evaluator_impl<Replicate<ArgType, RowFactor, ColFactor> >
+{
+ typedef Replicate<ArgType, RowFactor, ColFactor> XprType;
+
+ evaluator_impl(const XprType& replicate)
+ : m_argImpl(replicate.nestedExpression()),
+ m_rows(replicate.nestedExpression().rows()),
+ m_cols(replicate.nestedExpression().cols())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ // try to avoid using modulo; this is a pure optimization strategy
+ const Index actual_row = internal::traits<XprType>::RowsAtCompileTime==1 ? 0
+ : RowFactor==1 ? row
+ : row % m_rows.value();
+ const Index actual_col = internal::traits<XprType>::ColsAtCompileTime==1 ? 0
+ : ColFactor==1 ? col
+ : col % m_cols.value();
+
+ return m_argImpl.coeff(actual_row, actual_col);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ const Index actual_row = internal::traits<XprType>::RowsAtCompileTime==1 ? 0
+ : RowFactor==1 ? row
+ : row % m_rows.value();
+ const Index actual_col = internal::traits<XprType>::ColsAtCompileTime==1 ? 0
+ : ColFactor==1 ? col
+ : col % m_cols.value();
+
+ return m_argImpl.template packet<LoadMode>(actual_row, actual_col);
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+ const variable_if_dynamic<Index, XprType::RowsAtCompileTime> m_rows;
+ const variable_if_dynamic<Index, XprType::ColsAtCompileTime> m_cols;
+};
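Illustrative only (not part of the patch): the modulo-avoiding index logic above
serves expressions like replicate(), which tiles its argument.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::RowVector2i v(1, 2);
  std::cout << v.replicate(2, 3) << "\n";  // a 2x6 tiling of v
}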
+
+
+// -------------------- PartialReduxExpr --------------------
+//
+// This is a wrapper around the expression object.
+// TODO: Find out how to write a proper evaluator without duplicating
+// the row() and col() member functions.
+
+template< typename ArgType, typename MemberOp, int Direction>
+struct evaluator_impl<PartialReduxExpr<ArgType, MemberOp, Direction> >
+{
+ typedef PartialReduxExpr<ArgType, MemberOp, Direction> XprType;
+
+ evaluator_impl(const XprType expr)
+ : m_expr(expr)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_expr.coeff(row, col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_expr.coeff(index);
+ }
+
+protected:
+ const XprType m_expr;
+};
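Illustrative only (not part of the patch): PartialReduxExpr is the expression
returned by partial reductions such as the colwise() sum below.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Matrix2d m;
  m << 1, 2,
       3, 4;
  std::cout << m.colwise().sum() << "\n";  // 4 6
}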
+
+
+// -------------------- MatrixWrapper and ArrayWrapper --------------------
+//
+// evaluator_impl_wrapper_base<T> is a common base class for the
+// MatrixWrapper and ArrayWrapper evaluators.
+
+template<typename XprType>
+struct evaluator_impl_wrapper_base
+ : evaluator_impl_base<XprType>
+{
+ typedef typename remove_all<typename XprType::NestedExpressionType>::type ArgType;
+
+ evaluator_impl_wrapper_base(const ArgType& arg) : m_argImpl(arg) {}
+
+ typedef typename ArgType::Index Index;
+ typedef typename ArgType::Scalar Scalar;
+ typedef typename ArgType::CoeffReturnType CoeffReturnType;
+ typedef typename ArgType::PacketScalar PacketScalar;
+ typedef typename ArgType::PacketReturnType PacketReturnType;
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_argImpl.coeff(row, col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_argImpl.coeff(index);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_argImpl.coeffRef(row, col);
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return m_argImpl.coeffRef(index);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index row, Index col) const
+ {
+ return m_argImpl.template packet<LoadMode>(row, col);
+ }
+
+ template<int LoadMode>
+ PacketReturnType packet(Index index) const
+ {
+ return m_argImpl.template packet<LoadMode>(index);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<StoreMode>(row, col, x);
+ }
+
+ template<int StoreMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<StoreMode>(index, x);
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+};
+
+template<typename TArgType>
+struct evaluator_impl<MatrixWrapper<TArgType> >
+ : evaluator_impl_wrapper_base<MatrixWrapper<TArgType> >
+{
+ typedef MatrixWrapper<TArgType> XprType;
+
+ evaluator_impl(const XprType& wrapper)
+ : evaluator_impl_wrapper_base<MatrixWrapper<TArgType> >(wrapper.nestedExpression())
+ { }
+};
+
+template<typename TArgType>
+struct evaluator_impl<ArrayWrapper<TArgType> >
+ : evaluator_impl_wrapper_base<ArrayWrapper<TArgType> >
+{
+ typedef ArrayWrapper<TArgType> XprType;
+
+ evaluator_impl(const XprType& wrapper)
+ : evaluator_impl_wrapper_base<ArrayWrapper<TArgType> >(wrapper.nestedExpression())
+ { }
+};
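Illustrative only (not part of the patch): ArrayWrapper and MatrixWrapper are
the expressions behind .array() and .matrix(), which reinterpret an object
without copying it.

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Vector3d v(1, 2, 3);
  // ArrayWrapper enables coefficient-wise products; MatrixWrapper switches back.
  std::cout << (v.array() * v.array()).matrix().transpose() << "\n";  // 1 4 9
}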
+
+
+// -------------------- Reverse --------------------
+
+// defined in Reverse.h:
+template<typename PacketScalar, bool ReversePacket> struct reverse_packet_cond;
+
+template<typename ArgType, int Direction>
+struct evaluator_impl<Reverse<ArgType, Direction> >
+ : evaluator_impl_base<Reverse<ArgType, Direction> >
+{
+ typedef Reverse<ArgType, Direction> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ enum {
+ PacketSize = internal::packet_traits<Scalar>::size,
+ IsRowMajor = XprType::IsRowMajor,
+ IsColMajor = !IsRowMajor,
+ ReverseRow = (Direction == Vertical) || (Direction == BothDirections),
+ ReverseCol = (Direction == Horizontal) || (Direction == BothDirections),
+ OffsetRow = ReverseRow && IsColMajor ? PacketSize : 1,
+ OffsetCol = ReverseCol && IsRowMajor ? PacketSize : 1,
+ ReversePacket = (Direction == BothDirections)
+ || ((Direction == Vertical) && IsColMajor)
+ || ((Direction == Horizontal) && IsRowMajor)
+ };
+ typedef internal::reverse_packet_cond<PacketScalar,ReversePacket> reverse_packet;
+
+ evaluator_impl(const XprType& reverse)
+ : m_argImpl(reverse.nestedExpression()),
+ m_rows(ReverseRow ? reverse.nestedExpression().rows() : 0),
+ m_cols(ReverseCol ? reverse.nestedExpression().cols() : 0)
+ { }
+
+ CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_argImpl.coeff(ReverseRow ? m_rows.value() - row - 1 : row,
+ ReverseCol ? m_cols.value() - col - 1 : col);
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_argImpl.coeff(m_rows.value() * m_cols.value() - index - 1);
+ }
+
+ Scalar& coeffRef(Index row, Index col)
+ {
+ return m_argImpl.coeffRef(ReverseRow ? m_rows.value() - row - 1 : row,
+ ReverseCol ? m_cols.value() - col - 1 : col);
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return m_argImpl.coeffRef(m_rows.value() * m_cols.value() - index - 1);
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index row, Index col) const
+ {
+ return reverse_packet::run(m_argImpl.template packet<LoadMode>(
+ ReverseRow ? m_rows.value() - row - OffsetRow : row,
+ ReverseCol ? m_cols.value() - col - OffsetCol : col));
+ }
+
+ template<int LoadMode>
+ PacketScalar packet(Index index) const
+ {
+ return preverse(m_argImpl.template packet<LoadMode>(m_rows.value() * m_cols.value() - index - PacketSize));
+ }
+
+ template<int LoadMode>
+ void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<LoadMode>(
+ ReverseRow ? m_rows.value() - row - OffsetRow : row,
+ ReverseCol ? m_cols.value() - col - OffsetCol : col,
+ reverse_packet::run(x));
+ }
+
+ template<int LoadMode>
+ void writePacket(Index index, const PacketScalar& x)
+ {
+ m_argImpl.template writePacket<LoadMode>
+ (m_rows.value() * m_cols.value() - index - PacketSize, preverse(x));
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+
+ // If we do not reverse rows, then we do not need to know the number of rows; same for columns
+ const variable_if_dynamic<Index, ReverseRow ? ArgType::RowsAtCompileTime : 0> m_rows;
+ const variable_if_dynamic<Index, ReverseCol ? ArgType::ColsAtCompileTime : 0> m_cols;
+};
+
+
+// -------------------- Diagonal --------------------
+
+template<typename ArgType, int DiagIndex>
+struct evaluator_impl<Diagonal<ArgType, DiagIndex> >
+ : evaluator_impl_base<Diagonal<ArgType, DiagIndex> >
+{
+ typedef Diagonal<ArgType, DiagIndex> XprType;
+
+ evaluator_impl(const XprType& diagonal)
+ : m_argImpl(diagonal.nestedExpression()),
+ m_index(diagonal.index())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+
+ CoeffReturnType coeff(Index row, Index) const
+ {
+ return m_argImpl.coeff(row + rowOffset(), row + colOffset());
+ }
+
+ CoeffReturnType coeff(Index index) const
+ {
+ return m_argImpl.coeff(index + rowOffset(), index + colOffset());
+ }
+
+ Scalar& coeffRef(Index row, Index)
+ {
+ return m_argImpl.coeffRef(row + rowOffset(), row + colOffset());
+ }
+
+ Scalar& coeffRef(Index index)
+ {
+ return m_argImpl.coeffRef(index + rowOffset(), index + colOffset());
+ }
+
+protected:
+ typename evaluator<ArgType>::nestedType m_argImpl;
+ const internal::variable_if_dynamicindex<Index, XprType::DiagIndex> m_index;
+
+private:
+ EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value() > 0 ? 0 : -m_index.value(); }
+ EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value() > 0 ? m_index.value() : 0; }
+};
+
+} // namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COREEVALUATORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/CoreIterators.h b/third_party/eigen3/Eigen/src/Core/CoreIterators.h
new file mode 100644
index 0000000000..6da4683d2c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CoreIterators.h
@@ -0,0 +1,61 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COREITERATORS_H
+#define EIGEN_COREITERATORS_H
+
+namespace Eigen {
+
+/* This file contains the respective InnerIterator definition of the expressions defined in Eigen/Core
+ */
+
+/** \ingroup SparseCore_Module
+ * \class InnerIterator
+ * \brief An InnerIterator allows looping over the elements of a sparse (or dense) matrix or expression
+ *
+ * todo
+ */
+
+// generic version for dense matrix and expressions
+template<typename Derived> class DenseBase<Derived>::InnerIterator
+{
+ protected:
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Index Index;
+
+ enum { IsRowMajor = (Derived::Flags&RowMajorBit)==RowMajorBit };
+ public:
+ EIGEN_STRONG_INLINE InnerIterator(const Derived& expr, Index outer)
+ : m_expression(expr), m_inner(0), m_outer(outer), m_end(expr.innerSize())
+ {}
+
+ EIGEN_STRONG_INLINE Scalar value() const
+ {
+ return (IsRowMajor) ? m_expression.coeff(m_outer, m_inner)
+ : m_expression.coeff(m_inner, m_outer);
+ }
+
+ EIGEN_STRONG_INLINE InnerIterator& operator++() { m_inner++; return *this; }
+
+ EIGEN_STRONG_INLINE Index index() const { return m_inner; }
+ inline Index row() const { return IsRowMajor ? m_outer : index(); }
+ inline Index col() const { return IsRowMajor ? index() : m_outer; }
+
+ EIGEN_STRONG_INLINE operator bool() const { return m_inner < m_end && m_inner>=0; }
+
+ protected:
+ const Derived& m_expression;
+ Index m_inner;
+ const Index m_outer;
+ const Index m_end;
+};
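+
+// Editorial sketch (not part of the upstream header): minimal use of the generic
+// dense InnerIterator defined above, assuming <Eigen/Dense> and <iostream> are
+// included. For a column-major matrix the "outer" index is the column, so the
+// iterator walks the coefficients of one column.
+// \code
+// Eigen::Matrix3d m = Eigen::Matrix3d::Random();
+// for (Eigen::Matrix3d::InnerIterator it(m, 0); it; ++it)   // column 0
+//   std::cout << it.row() << "," << it.col() << " -> " << it.value() << "\n";
+// \endcode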
+
+} // end namespace Eigen
+
+#endif // EIGEN_COREITERATORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/CwiseBinaryOp.h b/third_party/eigen3/Eigen/src/Core/CwiseBinaryOp.h
new file mode 100644
index 0000000000..e20daacc8c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CwiseBinaryOp.h
@@ -0,0 +1,238 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CWISE_BINARY_OP_H
+#define EIGEN_CWISE_BINARY_OP_H
+
+namespace Eigen {
+
+/** \class CwiseBinaryOp
+ * \ingroup Core_Module
+ *
+ * \brief Generic expression where a coefficient-wise binary operator is applied to two expressions
+ *
+ * \param BinaryOp template functor implementing the operator
+ * \param Lhs the type of the left-hand side
+ * \param Rhs the type of the right-hand side
+ *
+ * This class represents an expression where a coefficient-wise binary operator is applied to two expressions.
+ * It is the return type of binary operators, by which we mean only those binary operators where
+ * both the left-hand side and the right-hand side are Eigen expressions.
+ * For example, the return type of matrix1+matrix2 is a CwiseBinaryOp.
+ *
+ * Most of the time, this is the only way that it is used, so you typically don't have to name
+ * CwiseBinaryOp types explicitly.
+ *
+ * \sa MatrixBase::binaryExpr(const MatrixBase<OtherDerived> &,const CustomBinaryOp &) const, class CwiseUnaryOp, class CwiseNullaryOp
+ */
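+
+// Editorial sketch (not part of the upstream header): CwiseBinaryOp is rarely
+// named explicitly; it is the type of coefficient-wise binary expressions such
+// as a + b, and of binaryExpr() calls. Assuming <Eigen/Dense> and C++11 for the
+// lambda:
+// \code
+// Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3), b = Eigen::MatrixXd::Random(3, 3);
+// Eigen::MatrixXd sum = a + b;  // the right-hand side is a CwiseBinaryOp, evaluated on assignment
+// Eigen::MatrixXd mx  = a.binaryExpr(b, [](double x, double y) { return x < y ? y : x; });
+// \endcode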
+
+namespace internal {
+template<typename BinaryOp, typename Lhs, typename Rhs>
+struct traits<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
+{
+ // we must not inherit from traits<Lhs> since it has
+ // the potential to cause problems with MSVC
+ typedef typename remove_all<Lhs>::type Ancestor;
+ typedef typename traits<Ancestor>::XprKind XprKind;
+ enum {
+ RowsAtCompileTime = traits<Ancestor>::RowsAtCompileTime,
+ ColsAtCompileTime = traits<Ancestor>::ColsAtCompileTime,
+ MaxRowsAtCompileTime = traits<Ancestor>::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = traits<Ancestor>::MaxColsAtCompileTime
+ };
+
+ // even though we require Lhs and Rhs to have the same scalar type (see CwiseBinaryOp constructor),
+ // we still want to handle the case when the result type is different.
+ typedef typename result_of<
+ BinaryOp(
+ typename Lhs::Scalar,
+ typename Rhs::Scalar
+ )
+ >::type Scalar;
+ typedef typename promote_storage_type<typename traits<Lhs>::StorageKind,
+ typename traits<Rhs>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<Lhs>::Index,
+ typename traits<Rhs>::Index>::type Index;
+ typedef typename Lhs::Nested LhsNested;
+ typedef typename Rhs::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ enum {
+ LhsCoeffReadCost = _LhsNested::CoeffReadCost,
+ RhsCoeffReadCost = _RhsNested::CoeffReadCost,
+ LhsFlags = _LhsNested::Flags,
+ RhsFlags = _RhsNested::Flags,
+ SameType = is_same<typename _LhsNested::Scalar,typename _RhsNested::Scalar>::value,
+ StorageOrdersAgree = (int(Lhs::Flags)&RowMajorBit)==(int(Rhs::Flags)&RowMajorBit),
+ Flags0 = (int(LhsFlags) | int(RhsFlags)) & (
+ HereditaryBits
+ | (int(LhsFlags) & int(RhsFlags) &
+ ( AlignedBit
+ | (StorageOrdersAgree ? LinearAccessBit : 0)
+ | (functor_traits<BinaryOp>::PacketAccess && StorageOrdersAgree && SameType ? PacketAccessBit : 0)
+ )
+ )
+ ),
+ Flags = (Flags0 & ~RowMajorBit) | (LhsFlags & RowMajorBit),
+ CoeffReadCost = LhsCoeffReadCost + RhsCoeffReadCost + functor_traits<BinaryOp>::Cost
+ };
+};
+} // end namespace internal
+
+// we require Lhs and Rhs to have the same scalar type. Currently there is no example of a binary functor
+// that would take two operands of different types. If there were such an example, then this check should be
+// moved to the BinaryOp functors, on a per-case basis. This would however require a change in the BinaryOp functors, as
+// currently they take only one typename Scalar template parameter.
+// It is tempting to always allow mixing different types but remember that this is often impossible in the vectorized paths.
+// So allowing mixing different types gives very unexpected errors when enabling vectorization, when the user tries to
+// add together a float matrix and a double matrix.
+#define EIGEN_CHECK_BINARY_COMPATIBILIY(BINOP,LHS,RHS) \
+ EIGEN_STATIC_ASSERT((internal::functor_is_product_like<BINOP>::ret \
+ ? int(internal::scalar_product_traits<LHS, RHS>::Defined) \
+ : int(internal::is_same<LHS, RHS>::value)), \
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
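+
+// Editorial sketch (not part of the upstream header): the static assertion above
+// rejects expressions that mix scalar types; the usual fix is an explicit cast.
+// Assuming <Eigen/Dense>:
+// \code
+// Eigen::MatrixXf f = Eigen::MatrixXf::Random(2, 2);
+// Eigen::MatrixXd d = Eigen::MatrixXd::Random(2, 2);
+// // Eigen::MatrixXd bad = f + d;               // fails the static assertion above
+// Eigen::MatrixXd ok  = f.cast<double>() + d;   // cast one operand explicitly
+// \endcode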
+
+template<typename BinaryOp, typename Lhs, typename Rhs, typename StorageKind>
+class CwiseBinaryOpImpl;
+
+template<typename BinaryOp, typename Lhs, typename Rhs>
+class CwiseBinaryOp : internal::no_assignment_operator,
+ public CwiseBinaryOpImpl<
+ BinaryOp, Lhs, Rhs,
+ typename internal::promote_storage_type<typename internal::traits<Lhs>::StorageKind,
+ typename internal::traits<Rhs>::StorageKind>::ret>
+{
+ public:
+
+ typedef typename CwiseBinaryOpImpl<
+ BinaryOp, Lhs, Rhs,
+ typename internal::promote_storage_type<typename internal::traits<Lhs>::StorageKind,
+ typename internal::traits<Rhs>::StorageKind>::ret>::Base Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseBinaryOp)
+
+ typedef typename internal::nested<Lhs>::type LhsNested;
+ typedef typename internal::nested<Rhs>::type RhsNested;
+ typedef typename internal::remove_reference<LhsNested>::type _LhsNested;
+ typedef typename internal::remove_reference<RhsNested>::type _RhsNested;
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CwiseBinaryOp(const Lhs& aLhs, const Rhs& aRhs, const BinaryOp& func = BinaryOp())
+ : m_lhs(aLhs), m_rhs(aRhs), m_functor(func)
+ {
+ EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp,typename Lhs::Scalar,typename Rhs::Scalar);
+ // require the sizes to match
+ EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Lhs, Rhs)
+ eigen_assert(aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rows() const {
+ // return the fixed size type if available to enable compile time optimizations
+ if (internal::traits<typename internal::remove_all<LhsNested>::type>::RowsAtCompileTime==Dynamic)
+ return m_rhs.rows();
+ else
+ return m_lhs.rows();
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index cols() const {
+ // return the fixed size type if available to enable compile time optimizations
+ if (internal::traits<typename internal::remove_all<LhsNested>::type>::ColsAtCompileTime==Dynamic)
+ return m_rhs.cols();
+ else
+ return m_lhs.cols();
+ }
+
+ /** \returns the left hand side nested expression */
+ EIGEN_DEVICE_FUNC
+ const _LhsNested& lhs() const { return m_lhs; }
+ /** \returns the right hand side nested expression */
+ EIGEN_DEVICE_FUNC
+ const _RhsNested& rhs() const { return m_rhs; }
+ /** \returns the functor representing the binary operation */
+ EIGEN_DEVICE_FUNC
+ const BinaryOp& functor() const { return m_functor; }
+
+ protected:
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+ const BinaryOp m_functor;
+};
+
+template<typename BinaryOp, typename Lhs, typename Rhs>
+class CwiseBinaryOpImpl<BinaryOp, Lhs, Rhs, Dense>
+ : public internal::dense_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >::type
+{
+ typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> Derived;
+ public:
+
+ typedef typename internal::dense_xpr_base<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE( Derived )
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index rowId, Index colId) const
+ {
+ return derived().functor()(derived().lhs().coeff(rowId, colId),
+ derived().rhs().coeff(rowId, colId));
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index rowId, Index colId) const
+ {
+ return derived().functor().packetOp(derived().lhs().template packet<LoadMode>(rowId, colId),
+ derived().rhs().template packet<LoadMode>(rowId, colId));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index index) const
+ {
+ return derived().functor()(derived().lhs().coeff(index),
+ derived().rhs().coeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index index) const
+ {
+ return derived().functor().packetOp(derived().lhs().template packet<LoadMode>(index),
+ derived().rhs().template packet<LoadMode>(index));
+ }
+};
+
+/** replaces \c *this by \c *this - \a other.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+MatrixBase<Derived>::operator-=(const MatrixBase<OtherDerived> &other)
+{
+ SelfCwiseBinaryOp<internal::scalar_difference_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
+
+/** replaces \c *this by \c *this + \a other.
+ *
+ * \returns a reference to \c *this
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+MatrixBase<Derived>::operator+=(const MatrixBase<OtherDerived>& other)
+{
+ SelfCwiseBinaryOp<internal::scalar_sum_op<Scalar>, Derived, OtherDerived> tmp(derived());
+ tmp = other.derived();
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CWISE_BINARY_OP_H
+
diff --git a/third_party/eigen3/Eigen/src/Core/CwiseNullaryOp.h b/third_party/eigen3/Eigen/src/Core/CwiseNullaryOp.h
new file mode 100644
index 0000000000..1243831142
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CwiseNullaryOp.h
@@ -0,0 +1,875 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CWISE_NULLARY_OP_H
+#define EIGEN_CWISE_NULLARY_OP_H
+
+namespace Eigen {
+
+/** \class CwiseNullaryOp
+ * \ingroup Core_Module
+ *
+ * \brief Generic expression of a matrix where all coefficients are defined by a functor
+ *
+ * \param NullaryOp template functor implementing the operator
+ * \param PlainObjectType the underlying plain matrix/array type
+ *
+ * This class represents an expression of a generic nullary operator.
+ * It is the return type of the Ones(), Zero(), Constant(), Identity() and Random() methods,
+ * and most of the time this is the only way it is used.
+ *
+ * However, if you want to write a function returning such an expression, you
+ * will need to use this class.
+ *
+ * \sa class CwiseUnaryOp, class CwiseBinaryOp, DenseBase::NullaryExpr()
+ */
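+
+// Editorial sketch (not part of the upstream header): the named constructors that
+// return CwiseNullaryOp expressions, assuming <Eigen/Dense>.
+// \code
+// Eigen::MatrixXd z = Eigen::MatrixXd::Zero(3, 4);
+// Eigen::MatrixXd c = Eigen::MatrixXd::Constant(3, 4, 7.0);
+// Eigen::Matrix3d i = Eigen::Matrix3d::Identity();   // fixed size: no size arguments
+// Eigen::VectorXd r = Eigen::VectorXd::Random(5);
+// \endcode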
+
+namespace internal {
+template<typename NullaryOp, typename PlainObjectType>
+struct traits<CwiseNullaryOp<NullaryOp, PlainObjectType> > : traits<PlainObjectType>
+{
+ enum {
+ Flags = (traits<PlainObjectType>::Flags
+ & ( HereditaryBits
+ | (functor_has_linear_access<NullaryOp>::ret ? LinearAccessBit : 0)
+ | (functor_traits<NullaryOp>::PacketAccess ? PacketAccessBit : 0)))
+ | (functor_traits<NullaryOp>::IsRepeatable ? 0 : EvalBeforeNestingBit),
+ CoeffReadCost = functor_traits<NullaryOp>::Cost
+ };
+};
+}
+
+template<typename NullaryOp, typename PlainObjectType>
+class CwiseNullaryOp : internal::no_assignment_operator,
+ public internal::dense_xpr_base< CwiseNullaryOp<NullaryOp, PlainObjectType> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<CwiseNullaryOp>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(CwiseNullaryOp)
+
+ EIGEN_DEVICE_FUNC
+ CwiseNullaryOp(Index nbRows, Index nbCols, const NullaryOp& func = NullaryOp())
+ : m_rows(nbRows), m_cols(nbCols), m_functor(func)
+ {
+ eigen_assert(nbRows >= 0
+ && (RowsAtCompileTime == Dynamic || RowsAtCompileTime == nbRows)
+ && nbCols >= 0
+ && (ColsAtCompileTime == Dynamic || ColsAtCompileTime == nbCols));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rows() const { return m_rows.value(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index cols() const { return m_cols.value(); }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index rowId, Index colId) const
+ {
+ return m_functor(rowId, colId);
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index rowId, Index colId) const
+ {
+ return m_functor.packetOp(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index index) const
+ {
+ return m_functor(index);
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index index) const
+ {
+ return m_functor.packetOp(index);
+ }
+
+ /** \returns the functor representing the nullary operation */
+ EIGEN_DEVICE_FUNC
+ const NullaryOp& functor() const { return m_functor; }
+
+ protected:
+ const internal::variable_if_dynamic<Index, RowsAtCompileTime> m_rows;
+ const internal::variable_if_dynamic<Index, ColsAtCompileTime> m_cols;
+ const NullaryOp m_functor;
+};
+
+
+/** \returns an expression of a matrix defined by a custom functor \a func
+ *
+ * The parameters \a rows and \a cols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this MatrixBase type.
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a rows and \a cols as arguments, so the variant taking
+ * only the functor should be used instead.
+ *
+ * The template parameter \a CustomNullaryOp is the type of the functor.
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+template<typename CustomNullaryOp>
+EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, Derived>
+DenseBase<Derived>::NullaryExpr(Index rows, Index cols, const CustomNullaryOp& func)
+{
+ return CwiseNullaryOp<CustomNullaryOp, Derived>(rows, cols, func);
+}
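+
+// Editorial sketch (not part of the upstream function): NullaryExpr with a
+// user-defined functor. The functor name is illustrative; it only needs to be
+// callable with the coefficient indices. Both overloads are provided here so the
+// expression works whichever traversal the assignment picks.
+// \code
+// struct IotaOp {   // hypothetical functor: coefficient i gets the value i
+//   double operator()(Eigen::DenseIndex i) const { return static_cast<double>(i); }
+//   double operator()(Eigen::DenseIndex r, Eigen::DenseIndex c) const
+//   { return static_cast<double>(r + c); }   // equivalent for a column vector (c == 0)
+// };
+// Eigen::VectorXd ramp = Eigen::VectorXd::NullaryExpr(5, IotaOp());   // 0, 1, 2, 3, 4
+// \endcode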
+
+/** \returns an expression of a matrix defined by a custom functor \a func
+ *
+ * The parameter \a size is the size of the returned vector.
+ * Must be compatible with this MatrixBase type.
+ *
+ * \only_for_vectors
+ *
+ * This variant is meant to be used for dynamic-size vector types. For fixed-size types,
+ * it is redundant to pass \a size as argument, so the variant taking
+ * only the functor should be used instead.
+ *
+ * The template parameter \a CustomNullaryOp is the type of the functor.
+ *
+ * Here is an example with C++11 random generators: \include random_cpp11.cpp
+ * Output: \verbinclude random_cpp11.out
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+template<typename CustomNullaryOp>
+EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, Derived>
+DenseBase<Derived>::NullaryExpr(Index size, const CustomNullaryOp& func)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ if(RowsAtCompileTime == 1) return CwiseNullaryOp<CustomNullaryOp, Derived>(1, size, func);
+ else return CwiseNullaryOp<CustomNullaryOp, Derived>(size, 1, func);
+}
+
+/** \returns an expression of a matrix defined by a custom functor \a func
+ *
+ * This variant is only for fixed-size DenseBase types. For dynamic-size types, you
+ * need to use the variants taking size arguments.
+ *
+ * The template parameter \a CustomNullaryOp is the type of the functor.
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+template<typename CustomNullaryOp>
+EIGEN_STRONG_INLINE const CwiseNullaryOp<CustomNullaryOp, Derived>
+DenseBase<Derived>::NullaryExpr(const CustomNullaryOp& func)
+{
+ return CwiseNullaryOp<CustomNullaryOp, Derived>(RowsAtCompileTime, ColsAtCompileTime, func);
+}
+
+/** \returns an expression of a constant matrix of value \a value
+ *
+ * The parameters \a nbRows and \a nbCols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this DenseBase type.
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a nbRows and \a nbCols as arguments, so Constant(const Scalar&)
+ * should be used instead.
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Constant(Index nbRows, Index nbCols, const Scalar& value)
+{
+ return DenseBase<Derived>::NullaryExpr(nbRows, nbCols, internal::scalar_constant_op<Scalar>(value));
+}
+
+/** \returns an expression of a constant matrix of value \a value
+ *
+ * The parameter \a size is the size of the returned vector.
+ * Must be compatible with this DenseBase type.
+ *
+ * \only_for_vectors
+ *
+ * This variant is meant to be used for dynamic-size vector types. For fixed-size types,
+ * it is redundant to pass \a size as argument, so Constant(const Scalar&)
+ * should be used instead.
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Constant(Index size, const Scalar& value)
+{
+ return DenseBase<Derived>::NullaryExpr(size, internal::scalar_constant_op<Scalar>(value));
+}
+
+/** \returns an expression of a constant matrix of value \a value
+ *
+ * This variant is only for fixed-size DenseBase types. For dynamic-size types, you
+ * need to use the variants taking size arguments.
+ *
+ * \sa class CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Constant(const Scalar& value)
+{
+ EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
+ return DenseBase<Derived>::NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_constant_op<Scalar>(value));
+}
+
+/**
+ * \brief Returns an expression of a linearly spaced vector.
+ *
+ * The function generates 'size' equally spaced values in the closed interval [low,high].
+ * This particular version of LinSpaced() uses sequential access, i.e. vector access is
+ * assumed to be a(0), a(1), ..., a(size-1). This assumption allows for better vectorization
+ * and yields faster code than the random access version.
+ *
+ * When size is set to 1, a vector of length 1 containing 'high' is returned.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include DenseBase_LinSpaced_seq.cpp
+ * Output: \verbinclude DenseBase_LinSpaced_seq.out
+ *
+ * \sa setLinSpaced(Index,const Scalar&,const Scalar&), LinSpaced(Index,Scalar,Scalar), CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::SequentialLinSpacedReturnType
+DenseBase<Derived>::LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar,false>(low,high,size));
+}
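+
+// Editorial sketch (not part of the upstream function): both LinSpaced flavours,
+// assuming <Eigen/Dense>.
+// \code
+// Eigen::VectorXd a = Eigen::VectorXd::LinSpaced(5, 0.0, 1.0);                     // 0, 0.25, 0.5, 0.75, 1
+// Eigen::VectorXd b = Eigen::VectorXd::LinSpaced(Eigen::Sequential, 5, 0.0, 1.0);  // same values, sequential access
+// \endcode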
+
+/**
+ * \copydoc DenseBase::LinSpaced(Sequential_t, Index, const Scalar&, const Scalar&)
+ * Special version for fixed size types which does not require the size parameter.
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::SequentialLinSpacedReturnType
+DenseBase<Derived>::LinSpaced(Sequential_t, const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
+ return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar,false>(low,high,Derived::SizeAtCompileTime));
+}
+
+/**
+ * \brief Returns an expression of a linearly spaced vector.
+ *
+ * The function generates 'size' equally spaced values in the closed interval [low,high].
+ * When size is set to 1, a vector of length 1 containing 'high' is returned.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include DenseBase_LinSpaced.cpp
+ * Output: \verbinclude DenseBase_LinSpaced.out
+ *
+ * \sa setLinSpaced(Index,const Scalar&,const Scalar&), LinSpaced(Sequential_t,Index,const Scalar&,const Scalar&,Index), CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
+DenseBase<Derived>::LinSpaced(Index size, const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return DenseBase<Derived>::NullaryExpr(size, internal::linspaced_op<Scalar,true>(low,high,size));
+}
+
+/**
+ * \copydoc DenseBase::LinSpaced(Index, const Scalar&, const Scalar&)
+ * Special version for fixed size types which does not require the size parameter.
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::RandomAccessLinSpacedReturnType
+DenseBase<Derived>::LinSpaced(const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
+ return DenseBase<Derived>::NullaryExpr(Derived::SizeAtCompileTime, internal::linspaced_op<Scalar,true>(low,high,Derived::SizeAtCompileTime));
+}
+
+/** \returns true if all coefficients in this matrix are approximately equal to \a val, to within precision \a prec */
+template<typename Derived>
+bool DenseBase<Derived>::isApproxToConstant
+(const Scalar& val, const RealScalar& prec) const
+{
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = 0; i < rows(); ++i)
+ if(!internal::isApprox(this->coeff(i, j), val, prec))
+ return false;
+ return true;
+}
+
+/** This is just an alias for isApproxToConstant().
+ *
+ * \returns true if all coefficients in this matrix are approximately equal to \a value, to within precision \a prec */
+template<typename Derived>
+bool DenseBase<Derived>::isConstant
+(const Scalar& val, const RealScalar& prec) const
+{
+ return isApproxToConstant(val, prec);
+}
+
+/** Alias for setConstant(): sets all coefficients in this expression to \a val.
+ *
+ * \sa setConstant(), Constant(), class CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE void DenseBase<Derived>::fill(const Scalar& val)
+{
+ setConstant(val);
+}
+
+/** Sets all coefficients in this expression to \a value.
+ *
+ * \sa fill(), setConstant(Index,const Scalar&), setConstant(Index,Index,const Scalar&), setZero(), setOnes(), Constant(), class CwiseNullaryOp, setZero(), setOnes()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setConstant(const Scalar& val)
+{
+ return derived() = Constant(rows(), cols(), val);
+}
+
+/** Resizes to the given \a size, and sets all coefficients in this expression to the given \a value.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include Matrix_setConstant_int.cpp
+ * Output: \verbinclude Matrix_setConstant_int.out
+ *
+ * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setConstant(Index size, const Scalar& val)
+{
+ resize(size);
+ return setConstant(val);
+}
+
+/** Resizes to the given size, and sets all coefficients in this expression to the given \a value.
+ *
+ * \param nbRows the new number of rows
+ * \param nbCols the new number of columns
+ * \param val the value to which all coefficients are set
+ *
+ * Example: \include Matrix_setConstant_int_int.cpp
+ * Output: \verbinclude Matrix_setConstant_int_int.out
+ *
+ * \sa MatrixBase::setConstant(const Scalar&), setConstant(Index,const Scalar&), class CwiseNullaryOp, MatrixBase::Constant(const Scalar&)
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setConstant(Index nbRows, Index nbCols, const Scalar& val)
+{
+ resize(nbRows, nbCols);
+ return setConstant(val);
+}
+
+/**
+ * \brief Sets a linearly spaced vector.
+ *
+ * The function generates 'size' equally spaced values in the closed interval [low,high].
+ * When size is set to 1, a vector of length 1 containing 'high' is returned.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include DenseBase_setLinSpaced.cpp
+ * Output: \verbinclude DenseBase_setLinSpaced.out
+ *
+ * \sa CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setLinSpaced(Index newSize, const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return derived() = Derived::NullaryExpr(newSize, internal::linspaced_op<Scalar,false>(low,high,newSize));
+}
+
+/**
+ * \brief Sets a linearly spaced vector.
+ *
+ * The function fills *this with equally spaced values in the closed interval [low,high].
+ * When size is set to 1, a vector of length 1 containing 'high' is returned.
+ *
+ * \only_for_vectors
+ *
+ * \sa setLinSpaced(Index, const Scalar&, const Scalar&), CwiseNullaryOp
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setLinSpaced(const Scalar& low, const Scalar& high)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return setLinSpaced(size(), low, high);
+}
+
+// zero:
+
+/** \returns an expression of a zero matrix.
+ *
+ * The parameters \a rows and \a cols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this MatrixBase type.
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a rows and \a cols as arguments, so Zero() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_zero_int_int.cpp
+ * Output: \verbinclude MatrixBase_zero_int_int.out
+ *
+ * \sa Zero(), Zero(Index)
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Zero(Index nbRows, Index nbCols)
+{
+ return Constant(nbRows, nbCols, Scalar(0));
+}
+
+/** \returns an expression of a zero vector.
+ *
+ * The parameter \a size is the size of the returned vector.
+ * Must be compatible with this MatrixBase type.
+ *
+ * \only_for_vectors
+ *
+ * This variant is meant to be used for dynamic-size vector types. For fixed-size types,
+ * it is redundant to pass \a size as argument, so Zero() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_zero_int.cpp
+ * Output: \verbinclude MatrixBase_zero_int.out
+ *
+ * \sa Zero(), Zero(Index,Index)
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Zero(Index size)
+{
+ return Constant(size, Scalar(0));
+}
+
+/** \returns an expression of a fixed-size zero matrix or vector.
+ *
+ * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you
+ * need to use the variants taking size arguments.
+ *
+ * Example: \include MatrixBase_zero.cpp
+ * Output: \verbinclude MatrixBase_zero.out
+ *
+ * \sa Zero(Index), Zero(Index,Index)
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Zero()
+{
+ return Constant(Scalar(0));
+}
+
+/** \returns true if *this is approximately equal to the zero matrix,
+ * within the precision given by \a prec.
+ *
+ * Example: \include MatrixBase_isZero.cpp
+ * Output: \verbinclude MatrixBase_isZero.out
+ *
+ * \sa class CwiseNullaryOp, Zero()
+ */
+template<typename Derived>
+bool DenseBase<Derived>::isZero(const RealScalar& prec) const
+{
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = 0; i < rows(); ++i)
+ if(!internal::isMuchSmallerThan(this->coeff(i, j), static_cast<Scalar>(1), prec))
+ return false;
+ return true;
+}
+
+/** Sets all coefficients in this expression to zero.
+ *
+ * Example: \include MatrixBase_setZero.cpp
+ * Output: \verbinclude MatrixBase_setZero.out
+ *
+ * \sa class CwiseNullaryOp, Zero()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setZero()
+{
+ return setConstant(Scalar(0));
+}
+
+/** Resizes to the given \a size, and sets all coefficients in this expression to zero.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include Matrix_setZero_int.cpp
+ * Output: \verbinclude Matrix_setZero_int.out
+ *
+ * \sa DenseBase::setZero(), setZero(Index,Index), class CwiseNullaryOp, DenseBase::Zero()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setZero(Index newSize)
+{
+ resize(newSize);
+ return setConstant(Scalar(0));
+}
+
+/** Resizes to the given size, and sets all coefficients in this expression to zero.
+ *
+ * \param nbRows the new number of rows
+ * \param nbCols the new number of columns
+ *
+ * Example: \include Matrix_setZero_int_int.cpp
+ * Output: \verbinclude Matrix_setZero_int_int.out
+ *
+ * \sa DenseBase::setZero(), setZero(Index), class CwiseNullaryOp, DenseBase::Zero()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setZero(Index nbRows, Index nbCols)
+{
+ resize(nbRows, nbCols);
+ return setConstant(Scalar(0));
+}
+
+// ones:
+
+/** \returns an expression of a matrix where all coefficients equal one.
+ *
+ * The parameters \a nbRows and \a nbCols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this MatrixBase type.
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a rows and \a cols as arguments, so Ones() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_ones_int_int.cpp
+ * Output: \verbinclude MatrixBase_ones_int_int.out
+ *
+ * \sa Ones(), Ones(Index), isOnes(), class Ones
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Ones(Index nbRows, Index nbCols)
+{
+ return Constant(nbRows, nbCols, Scalar(1));
+}
+
+/** \returns an expression of a vector where all coefficients equal one.
+ *
+ * The parameter \a newSize is the size of the returned vector.
+ * Must be compatible with this MatrixBase type.
+ *
+ * \only_for_vectors
+ *
+ * This variant is meant to be used for dynamic-size vector types. For fixed-size types,
+ * it is redundant to pass \a size as argument, so Ones() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_ones_int.cpp
+ * Output: \verbinclude MatrixBase_ones_int.out
+ *
+ * \sa Ones(), Ones(Index,Index), isOnes(), class Ones
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Ones(Index newSize)
+{
+ return Constant(newSize, Scalar(1));
+}
+
+/** \returns an expression of a fixed-size matrix or vector where all coefficients equal one.
+ *
+ * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you
+ * need to use the variants taking size arguments.
+ *
+ * Example: \include MatrixBase_ones.cpp
+ * Output: \verbinclude MatrixBase_ones.out
+ *
+ * \sa Ones(Index), Ones(Index,Index), isOnes(), class Ones
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename DenseBase<Derived>::ConstantReturnType
+DenseBase<Derived>::Ones()
+{
+ return Constant(Scalar(1));
+}
+
+/** \returns true if *this is approximately equal to the matrix where all coefficients
+ * are equal to 1, within the precision given by \a prec.
+ *
+ * Example: \include MatrixBase_isOnes.cpp
+ * Output: \verbinclude MatrixBase_isOnes.out
+ *
+ * \sa class CwiseNullaryOp, Ones()
+ */
+template<typename Derived>
+bool DenseBase<Derived>::isOnes
+(const RealScalar& prec) const
+{
+ return isApproxToConstant(Scalar(1), prec);
+}
+
+/** Sets all coefficients in this expression to one.
+ *
+ * Example: \include MatrixBase_setOnes.cpp
+ * Output: \verbinclude MatrixBase_setOnes.out
+ *
+ * \sa class CwiseNullaryOp, Ones()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setOnes()
+{
+ return setConstant(Scalar(1));
+}
+
+/** Resizes to the given \a newSize, and sets all coefficients in this expression to one.
+ *
+ * \only_for_vectors
+ *
+ * Example: \include Matrix_setOnes_int.cpp
+ * Output: \verbinclude Matrix_setOnes_int.out
+ *
+ * \sa MatrixBase::setOnes(), setOnes(Index,Index), class CwiseNullaryOp, MatrixBase::Ones()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setOnes(Index newSize)
+{
+ resize(newSize);
+ return setConstant(Scalar(1));
+}
+
+/** Resizes to the given size, and sets all coefficients in this expression to one.
+ *
+ * \param nbRows the new number of rows
+ * \param nbCols the new number of columns
+ *
+ * Example: \include Matrix_setOnes_int_int.cpp
+ * Output: \verbinclude Matrix_setOnes_int_int.out
+ *
+ * \sa MatrixBase::setOnes(), setOnes(Index), class CwiseNullaryOp, MatrixBase::Ones()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setOnes(Index nbRows, Index nbCols)
+{
+ resize(nbRows, nbCols);
+ return setConstant(Scalar(1));
+}
+
+// Identity:
+
+/** \returns an expression of the identity matrix (not necessarily square).
+ *
+ * The parameters \a nbRows and \a nbCols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this MatrixBase type.
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a rows and \a cols as arguments, so Identity() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_identity_int_int.cpp
+ * Output: \verbinclude MatrixBase_identity_int_int.out
+ *
+ * \sa Identity(), setIdentity(), isIdentity()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::IdentityReturnType
+MatrixBase<Derived>::Identity(Index nbRows, Index nbCols)
+{
+ return DenseBase<Derived>::NullaryExpr(nbRows, nbCols, internal::scalar_identity_op<Scalar>());
+}
+
+/** \returns an expression of the identity matrix (not necessarily square).
+ *
+ * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you
+ * need to use the variant taking size arguments.
+ *
+ * Example: \include MatrixBase_identity.cpp
+ * Output: \verbinclude MatrixBase_identity.out
+ *
+ * \sa Identity(Index,Index), setIdentity(), isIdentity()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::IdentityReturnType
+MatrixBase<Derived>::Identity()
+{
+ EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
+ return MatrixBase<Derived>::NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_identity_op<Scalar>());
+}
+
+/** \returns true if *this is approximately equal to the identity matrix
+ * (not necessarily square),
+ * within the precision given by \a prec.
+ *
+ * Example: \include MatrixBase_isIdentity.cpp
+ * Output: \verbinclude MatrixBase_isIdentity.out
+ *
+ * \sa class CwiseNullaryOp, Identity(), Identity(Index,Index), setIdentity()
+ */
+template<typename Derived>
+bool MatrixBase<Derived>::isIdentity
+(const RealScalar& prec) const
+{
+ for(Index j = 0; j < cols(); ++j)
+ {
+ for(Index i = 0; i < rows(); ++i)
+ {
+ if(i == j)
+ {
+ if(!internal::isApprox(this->coeff(i, j), static_cast<Scalar>(1), prec))
+ return false;
+ }
+ else
+ {
+ if(!internal::isMuchSmallerThan(this->coeff(i, j), static_cast<RealScalar>(1), prec))
+ return false;
+ }
+ }
+ }
+ return true;
+}
+
+namespace internal {
+
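+// Editorial note (not part of the upstream header): the generic setIdentity_impl
+// below assigns an Identity() expression, while the specialization for fixed sizes
+// of at least 16 coefficients first zeroes the matrix and then writes only the
+// min(rows, cols) diagonal entries, presumably because that is cheaper than
+// evaluating the coefficient-wise identity expression over every coefficient.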
+template<typename Derived, bool Big = (Derived::SizeAtCompileTime>=16)>
+struct setIdentity_impl
+{
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& m)
+ {
+ return m = Derived::Identity(m.rows(), m.cols());
+ }
+};
+
+template<typename Derived>
+struct setIdentity_impl<Derived, true>
+{
+ typedef typename Derived::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Derived& run(Derived& m)
+ {
+ m.setZero();
+ const Index size = (std::min)(m.rows(), m.cols());
+ for(Index i = 0; i < size; ++i) m.coeffRef(i,i) = typename Derived::Scalar(1);
+ return m;
+ }
+};
+
+} // end namespace internal
+
+/** Writes the identity expression (not necessarily square) into *this.
+ *
+ * Example: \include MatrixBase_setIdentity.cpp
+ * Output: \verbinclude MatrixBase_setIdentity.out
+ *
+ * \sa class CwiseNullaryOp, Identity(), Identity(Index,Index), isIdentity()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::setIdentity()
+{
+ return internal::setIdentity_impl<Derived>::run(derived());
+}
+
+/** \brief Resizes to the given size, and writes the identity expression (not necessarily square) into *this.
+ *
+ * \param nbRows the new number of rows
+ * \param nbCols the new number of columns
+ *
+ * Example: \include Matrix_setIdentity_int_int.cpp
+ * Output: \verbinclude Matrix_setIdentity_int_int.out
+ *
+ * \sa MatrixBase::setIdentity(), class CwiseNullaryOp, MatrixBase::Identity()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived& MatrixBase<Derived>::setIdentity(Index nbRows, Index nbCols)
+{
+ derived().resize(nbRows, nbCols);
+ return setIdentity();
+}
+
+/** \returns an expression of the i-th unit (basis) vector.
+ *
+ * \only_for_vectors
+ *
+ * \sa MatrixBase::Unit(Index), MatrixBase::UnitX(), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::Unit(Index newSize, Index i)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return BasisReturnType(SquareMatrixType::Identity(newSize,newSize), i);
+}
+
+/** \returns an expression of the i-th unit (basis) vector.
+ *
+ * \only_for_vectors
+ *
+ * This variant is for fixed-size vector only.
+ *
+ * \sa MatrixBase::Unit(Index,Index), MatrixBase::UnitX(), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::Unit(Index i)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return BasisReturnType(SquareMatrixType::Identity(),i);
+}
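+
+// Editorial sketch (not part of the upstream function): unit (basis) vectors,
+// assuming <Eigen/Dense>.
+// \code
+// Eigen::Vector4d e2 = Eigen::Vector4d::Unit(2);     // (0, 0, 1, 0), same as Vector4d::UnitZ()
+// Eigen::VectorXd e0 = Eigen::VectorXd::Unit(6, 0);  // dynamic size: length 6, coefficient 0 is 1
+// \endcode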
+
+/** \returns an expression of the X axis unit vector (1{,0}^*)
+ *
+ * \only_for_vectors
+ *
+ * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::UnitX()
+{ return Derived::Unit(0); }
+
+/** \returns an expression of the Y axis unit vector (0,1{,0}^*)
+ *
+ * \only_for_vectors
+ *
+ * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::UnitY()
+{ return Derived::Unit(1); }
+
+/** \returns an expression of the Z axis unit vector (0,0,1{,0}^*)
+ *
+ * \only_for_vectors
+ *
+ * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::UnitZ()
+{ return Derived::Unit(2); }
+
+/** \returns an expression of the W axis unit vector (0,0,0,1)
+ *
+ * \only_for_vectors
+ *
+ * \sa MatrixBase::Unit(Index,Index), MatrixBase::Unit(Index), MatrixBase::UnitY(), MatrixBase::UnitZ(), MatrixBase::UnitW()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::BasisReturnType MatrixBase<Derived>::UnitW()
+{ return Derived::Unit(3); }
+
+} // end namespace Eigen
+
+#endif // EIGEN_CWISE_NULLARY_OP_H
diff --git a/third_party/eigen3/Eigen/src/Core/CwiseUnaryOp.h b/third_party/eigen3/Eigen/src/Core/CwiseUnaryOp.h
new file mode 100644
index 0000000000..aa7df197f9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CwiseUnaryOp.h
@@ -0,0 +1,135 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CWISE_UNARY_OP_H
+#define EIGEN_CWISE_UNARY_OP_H
+
+namespace Eigen {
+
+/** \class CwiseUnaryOp
+ * \ingroup Core_Module
+ *
+ * \brief Generic expression where a coefficient-wise unary operator is applied to an expression
+ *
+ * \param UnaryOp template functor implementing the operator
+ * \param XprType the type of the expression to which we are applying the unary operator
+ *
+ * This class represents an expression where a unary operator is applied to an expression.
+ * It is the return type of all operations taking exactly 1 input expression, regardless of the
+ * presence of other inputs such as scalars. For example, the operator* in the expression 3*matrix
+ * is considered unary, because only the right-hand side is an expression, and its
+ * return type is a specialization of CwiseUnaryOp.
+ *
+ * Most of the time, this is the only way that it is used, so you typically don't have to name
+ * CwiseUnaryOp types explicitly.
+ *
+ * \sa MatrixBase::unaryExpr(const CustomUnaryOp &) const, class CwiseBinaryOp, class CwiseNullaryOp
+ */
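+
+// Editorial sketch (not part of the upstream header): coefficient-wise unary
+// expressions, assuming <Eigen/Dense> and C++11 for the lambda.
+// \code
+// Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
+// Eigen::MatrixXd scaled  = 3.0 * a;                               // scalar multiple: a CwiseUnaryOp
+// Eigen::MatrixXd squared = a.unaryExpr([](double x) { return x * x; });
+// \endcode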
+
+namespace internal {
+template<typename UnaryOp, typename XprType>
+struct traits<CwiseUnaryOp<UnaryOp, XprType> >
+ : traits<XprType>
+{
+ typedef typename result_of<
+ UnaryOp(typename XprType::Scalar)
+ >::type Scalar;
+ typedef typename XprType::Nested XprTypeNested;
+ typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
+ enum {
+ Flags = _XprTypeNested::Flags & (
+ HereditaryBits | LinearAccessBit | AlignedBit
+ | (functor_traits<UnaryOp>::PacketAccess ? PacketAccessBit : 0)),
+ CoeffReadCost = _XprTypeNested::CoeffReadCost + functor_traits<UnaryOp>::Cost
+ };
+};
+}
+
+template<typename UnaryOp, typename XprType, typename StorageKind>
+class CwiseUnaryOpImpl;
+
+template<typename UnaryOp, typename XprType>
+class CwiseUnaryOp : internal::no_assignment_operator,
+ public CwiseUnaryOpImpl<UnaryOp, XprType, typename internal::traits<XprType>::StorageKind>
+{
+ public:
+
+ typedef typename CwiseUnaryOpImpl<UnaryOp, XprType,typename internal::traits<XprType>::StorageKind>::Base Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseUnaryOp)
+
+ EIGEN_DEVICE_FUNC
+ inline CwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp())
+ : m_xpr(xpr), m_functor(func) {}
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rows() const { return m_xpr.rows(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index cols() const { return m_xpr.cols(); }
+
+ /** \returns the functor representing the unary operation */
+ EIGEN_DEVICE_FUNC
+ const UnaryOp& functor() const { return m_functor; }
+
+ /** \returns the nested expression */
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ nestedExpression() const { return m_xpr; }
+
+ /** \returns the nested expression */
+ EIGEN_DEVICE_FUNC
+ typename internal::remove_all<typename XprType::Nested>::type&
+ nestedExpression() { return m_xpr.const_cast_derived(); }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const UnaryOp m_functor;
+};
+
+// This is the generic implementation for dense storage.
+// It can be used for any expression types implementing the dense concept.
+template<typename UnaryOp, typename XprType>
+class CwiseUnaryOpImpl<UnaryOp,XprType,Dense>
+ : public internal::dense_xpr_base<CwiseUnaryOp<UnaryOp, XprType> >::type
+{
+ public:
+
+ typedef CwiseUnaryOp<UnaryOp, XprType> Derived;
+ typedef typename internal::dense_xpr_base<CwiseUnaryOp<UnaryOp, XprType> >::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index rowId, Index colId) const
+ {
+ return derived().functor()(derived().nestedExpression().coeff(rowId, colId));
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index rowId, Index colId) const
+ {
+ return derived().functor().packetOp(derived().nestedExpression().template packet<LoadMode>(rowId, colId));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index index) const
+ {
+ return derived().functor()(derived().nestedExpression().coeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE PacketScalar packet(Index index) const
+ {
+ return derived().functor().packetOp(derived().nestedExpression().template packet<LoadMode>(index));
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CWISE_UNARY_OP_H
diff --git a/third_party/eigen3/Eigen/src/Core/CwiseUnaryView.h b/third_party/eigen3/Eigen/src/Core/CwiseUnaryView.h
new file mode 100644
index 0000000000..b2638d3265
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/CwiseUnaryView.h
@@ -0,0 +1,139 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CWISE_UNARY_VIEW_H
+#define EIGEN_CWISE_UNARY_VIEW_H
+
+namespace Eigen {
+
+/** \class CwiseUnaryView
+ * \ingroup Core_Module
+ *
+ * \brief Generic lvalue expression of a coefficient-wise unary operator of a matrix or a vector
+ *
+ * \param ViewOp template functor implementing the view
+ * \param MatrixType the type of the matrix to which we are applying the unary operator
+ *
+ * This class represents an lvalue expression of a generic unary view operator of a matrix or a vector.
+ * It is the return type of real() and imag(), and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::unaryViewExpr(const CustomUnaryOp &) const, class CwiseUnaryOp
+ */
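+
+// Editorial sketch (not part of the upstream header): CwiseUnaryView as an lvalue,
+// assuming <Eigen/Dense>.
+// \code
+// Eigen::MatrixXcd m = Eigen::MatrixXcd::Random(2, 2);
+// m.real().setZero();                          // real() is a writable CwiseUnaryView
+// m.imag() = Eigen::MatrixXd::Identity(2, 2);
+// \endcode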
+
+namespace internal {
+template<typename ViewOp, typename MatrixType>
+struct traits<CwiseUnaryView<ViewOp, MatrixType> >
+ : traits<MatrixType>
+{
+ typedef typename result_of<
+ ViewOp(typename traits<MatrixType>::Scalar)
+ >::type Scalar;
+ typedef typename MatrixType::Nested MatrixTypeNested;
+ typedef typename remove_all<MatrixTypeNested>::type _MatrixTypeNested;
+ enum {
+ Flags = (traits<_MatrixTypeNested>::Flags & (HereditaryBits | LvalueBit | LinearAccessBit | DirectAccessBit)),
+ CoeffReadCost = traits<_MatrixTypeNested>::CoeffReadCost + functor_traits<ViewOp>::Cost,
+ MatrixTypeInnerStride = inner_stride_at_compile_time<MatrixType>::ret,
+ // need to cast the sizeof's from size_t to int explicitly, otherwise:
+ // "error: no integral type can represent all of the enumerator values
+ InnerStrideAtCompileTime = MatrixTypeInnerStride == Dynamic
+ ? int(Dynamic)
+ : int(MatrixTypeInnerStride) * int(sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar)),
+ OuterStrideAtCompileTime = outer_stride_at_compile_time<MatrixType>::ret == Dynamic
+ ? int(Dynamic)
+ : outer_stride_at_compile_time<MatrixType>::ret * int(sizeof(typename traits<MatrixType>::Scalar) / sizeof(Scalar))
+ };
+};
+}
+
+template<typename ViewOp, typename MatrixType, typename StorageKind>
+class CwiseUnaryViewImpl;
+
+template<typename ViewOp, typename MatrixType>
+class CwiseUnaryView : public CwiseUnaryViewImpl<ViewOp, MatrixType, typename internal::traits<MatrixType>::StorageKind>
+{
+ public:
+
+ typedef typename CwiseUnaryViewImpl<ViewOp, MatrixType,typename internal::traits<MatrixType>::StorageKind>::Base Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(CwiseUnaryView)
+
+ inline CwiseUnaryView(const MatrixType& mat, const ViewOp& func = ViewOp())
+ : m_matrix(mat), m_functor(func) {}
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryView)
+
+ EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
+
+ /** \returns the functor representing the unary operation */
+ const ViewOp& functor() const { return m_functor; }
+
+ /** \returns the nested expression */
+ const typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() const { return m_matrix; }
+
+ /** \returns the nested expression */
+ typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() { return m_matrix.const_cast_derived(); }
+
+ protected:
+ // FIXME changed from MatrixType::Nested because of a weird compilation error with sun CC
+ typename internal::nested<MatrixType>::type m_matrix;
+ ViewOp m_functor;
+};
+
+template<typename ViewOp, typename MatrixType>
+class CwiseUnaryViewImpl<ViewOp,MatrixType,Dense>
+ : public internal::dense_xpr_base< CwiseUnaryView<ViewOp, MatrixType> >::type
+{
+ public:
+
+ typedef CwiseUnaryView<ViewOp, MatrixType> Derived;
+ typedef typename internal::dense_xpr_base< CwiseUnaryView<ViewOp, MatrixType> >::type Base;
+
+ EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(CwiseUnaryViewImpl)
+
+ inline Scalar* data() { return &coeffRef(0); }
+ inline const Scalar* data() const { return &coeff(0); }
+
+ inline Index innerStride() const
+ {
+ return derived().nestedExpression().innerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
+ }
+
+ inline Index outerStride() const
+ {
+ return derived().nestedExpression().outerStride() * sizeof(typename internal::traits<MatrixType>::Scalar) / sizeof(Scalar);
+ }
+
+ EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const
+ {
+ return derived().functor()(derived().nestedExpression().coeff(row, col));
+ }
+
+ EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return derived().functor()(derived().nestedExpression().coeff(index));
+ }
+
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col)
+ {
+ return derived().functor()(const_cast_derived().nestedExpression().coeffRef(row, col));
+ }
+
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ return derived().functor()(const_cast_derived().nestedExpression().coeffRef(index));
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CWISE_UNARY_VIEW_H
diff --git a/third_party/eigen3/Eigen/src/Core/DenseBase.h b/third_party/eigen3/Eigen/src/Core/DenseBase.h
new file mode 100644
index 0000000000..55cec0bc26
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/DenseBase.h
@@ -0,0 +1,561 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DENSEBASE_H
+#define EIGEN_DENSEBASE_H
+
+namespace Eigen {
+
+namespace internal {
+
+// The index type defined by EIGEN_DEFAULT_DENSE_INDEX_TYPE must be a signed type.
+// This dummy function simply aims at checking that at compile time.
+static inline void check_DenseIndex_is_signed() {
+ EIGEN_STATIC_ASSERT(NumTraits<DenseIndex>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
+}
+
+} // end namespace internal
+
+/** \class DenseBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for all dense matrices, vectors, and arrays
+ *
+ * This class is the base that is inherited by all dense objects (matrices, vectors, arrays,
+ * and related expression types). The common Eigen API for dense objects is contained in this class.
+ *
+ * \tparam Derived is the derived type, e.g., a matrix type or an expression.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_DENSEBASE_PLUGIN.
+ *
+ * \sa \ref TopicClassHierarchy
+ */
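+
+// Editorial sketch (not part of the upstream header): the plugin mechanism
+// mentioned above. The file name and member are hypothetical; any declarations
+// valid inside the DenseBase class body can be injected this way.
+// \code
+// // my_densebase_plugin.h  (hypothetical)
+// inline Scalar topLeftCoeff() const { return this->coeff(0, 0); }
+//
+// // in client code, before including any Eigen header:
+// #define EIGEN_DENSEBASE_PLUGIN "my_densebase_plugin.h"
+// #include <Eigen/Dense>
+// \endcode
+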
+template<typename Derived> class DenseBase
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ : public internal::special_scalar_op_base<Derived,typename internal::traits<Derived>::Scalar,
+ typename NumTraits<typename internal::traits<Derived>::Scalar>::Real>
+#else
+ : public DenseCoeffsBase<Derived>
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+{
+ public:
+ using internal::special_scalar_op_base<Derived,typename internal::traits<Derived>::Scalar,
+ typename NumTraits<typename internal::traits<Derived>::Scalar>::Real>::operator*;
+
+ class InnerIterator;
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+
+ /** \brief The type of indices
+ * \details To change this, \c \#define the preprocessor symbol \c EIGEN_DEFAULT_DENSE_INDEX_TYPE.
+ * \sa \ref TopicPreprocessorDirectives.
+ */
+ typedef typename internal::traits<Derived>::Index Index;
+
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ typedef DenseCoeffsBase<Derived> Base;
+ using Base::derived;
+ using Base::const_cast_derived;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::rowIndexByOuterInner;
+ using Base::colIndexByOuterInner;
+ using Base::coeff;
+ using Base::coeffByOuterInner;
+ using Base::packet;
+ using Base::packetByOuterInner;
+ using Base::writePacket;
+ using Base::writePacketByOuterInner;
+ using Base::coeffRef;
+ using Base::coeffRefByOuterInner;
+ using Base::copyCoeff;
+ using Base::copyCoeffByOuterInner;
+ using Base::copyPacket;
+ using Base::copyPacketByOuterInner;
+ using Base::operator();
+ using Base::operator[];
+ using Base::x;
+ using Base::y;
+ using Base::z;
+ using Base::w;
+ using Base::stride;
+ using Base::innerStride;
+ using Base::outerStride;
+ using Base::rowStride;
+ using Base::colStride;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+
+ enum {
+
+ RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
+ /**< The number of rows at compile-time. This is just a copy of the value provided
+ * by the \a Derived type. If a value is not known at compile-time,
+ * it is set to the \a Dynamic constant.
+ * \sa MatrixBase::rows(), MatrixBase::cols(), ColsAtCompileTime, SizeAtCompileTime */
+
+ ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
+ /**< The number of columns at compile-time. This is just a copy of the value provided
+ * by the \a Derived type. If a value is not known at compile-time,
+ * it is set to the \a Dynamic constant.
+ * \sa MatrixBase::rows(), MatrixBase::cols(), RowsAtCompileTime, SizeAtCompileTime */
+
+
+ SizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::RowsAtCompileTime,
+ internal::traits<Derived>::ColsAtCompileTime>::ret),
+ /**< This is equal to the number of coefficients, i.e. the number of
+ * rows times the number of columns, or to \a Dynamic if this is not
+ * known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */
+
+ MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
+ /**< This value is equal to the maximum possible number of rows that this expression
+ * might have. If this expression might have an arbitrarily high number of rows,
+ * this value is set to \a Dynamic.
+ *
+ * This value is useful to know when evaluating an expression, in order to determine
+ * whether it is possible to avoid doing a dynamic memory allocation.
+ *
+ * \sa RowsAtCompileTime, MaxColsAtCompileTime, MaxSizeAtCompileTime
+ */
+
+ MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime,
+ /**< This value is equal to the maximum possible number of columns that this expression
+ * might have. If this expression might have an arbitrarily high number of columns,
+ * this value is set to \a Dynamic.
+ *
+ * This value is useful to know when evaluating an expression, in order to determine
+ * whether it is possible to avoid doing a dynamic memory allocation.
+ *
+ * \sa ColsAtCompileTime, MaxRowsAtCompileTime, MaxSizeAtCompileTime
+ */
+
+ MaxSizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::MaxRowsAtCompileTime,
+ internal::traits<Derived>::MaxColsAtCompileTime>::ret),
+ /**< This value is equal to the maximum possible number of coefficients that this expression
+ * might have. If this expression might have an arbitrarily high number of coefficients,
+ * this value is set to \a Dynamic.
+ *
+ * This value is useful to know when evaluating an expression, in order to determine
+ * whether it is possible to avoid doing a dynamic memory allocation.
+ *
+ * \sa SizeAtCompileTime, MaxRowsAtCompileTime, MaxColsAtCompileTime
+ */
+
+ IsVectorAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime == 1
+ || internal::traits<Derived>::MaxColsAtCompileTime == 1,
+ /**< This is set to true if either the number of rows or the number of
+ * columns is known at compile-time to be equal to 1. Indeed, in that case,
+ * we are dealing with a column-vector (if there is only one column) or with
+ * a row-vector (if there is only one row). */
+
+ Flags = internal::traits<Derived>::Flags,
+ /**< This stores expression \ref flags flags which may or may not be inherited by new expressions
+ * constructed from this one. See the \ref flags "list of flags".
+ */
+
+ IsRowMajor = int(Flags) & RowMajorBit, /**< True if this expression has row-major storage order. */
+
+ InnerSizeAtCompileTime = int(IsVectorAtCompileTime) ? int(SizeAtCompileTime)
+ : int(IsRowMajor) ? int(ColsAtCompileTime) : int(RowsAtCompileTime),
+
+ CoeffReadCost = internal::traits<Derived>::CoeffReadCost,
+ /**< This is a rough measure of how expensive it is to read one coefficient from
+ * this expression.
+ */
+
+ InnerStrideAtCompileTime = internal::inner_stride_at_compile_time<Derived>::ret,
+ OuterStrideAtCompileTime = internal::outer_stride_at_compile_time<Derived>::ret
+ };
+
+ enum { ThisConstantIsPrivateInPlainObjectBase };
+
+ /** \returns the number of nonzero coefficients which is in practice the number
+ * of stored coefficients. */
+ EIGEN_DEVICE_FUNC
+ inline Index nonZeros() const { return size(); }
+ /** \returns true if either the number of rows or the number of columns is equal to 1.
+ * In other words, this function returns
+ * \code rows()==1 || cols()==1 \endcode
+ * \sa rows(), cols(), IsVectorAtCompileTime. */
+
+ /** \returns the outer size.
+ *
+ * \note For a vector, this returns just 1. For a matrix (non-vector), this is the major dimension
+ * with respect to the \ref TopicStorageOrders "storage order", i.e., the number of columns for a
+ * column-major matrix, and the number of rows for a row-major matrix. */
+ EIGEN_DEVICE_FUNC
+ Index outerSize() const
+ {
+ return IsVectorAtCompileTime ? 1
+ : int(IsRowMajor) ? this->rows() : this->cols();
+ }
+
+ /** \returns the inner size.
+ *
+ * \note For a vector, this is just the size. For a matrix (non-vector), this is the minor dimension
+ * with respect to the \ref TopicStorageOrders "storage order", i.e., the number of rows for a
+ * column-major matrix, and the number of columns for a row-major matrix. */
+ EIGEN_DEVICE_FUNC
+ Index innerSize() const
+ {
+ return IsVectorAtCompileTime ? this->size()
+ : int(IsRowMajor) ? this->cols() : this->rows();
+ }
+
+ /** Only plain matrices/arrays, not expressions, may be resized; therefore the only useful resize methods are
+ * Matrix::resize() and Array::resize(). The present method only asserts that the new size equals the old size, and does
+ * nothing else.
+ */
+ EIGEN_DEVICE_FUNC
+ void resize(Index newSize)
+ {
+ EIGEN_ONLY_USED_FOR_DEBUG(newSize);
+ eigen_assert(newSize == this->size()
+ && "DenseBase::resize() does not actually allow to resize.");
+ }
+ /** Only plain matrices/arrays, not expressions, may be resized; therefore the only useful resize methods are
+ * Matrix::resize() and Array::resize(). The present method only asserts that the new size equals the old size, and does
+ * nothing else.
+ */
+ EIGEN_DEVICE_FUNC
+ void resize(Index nbRows, Index nbCols)
+ {
+ EIGEN_ONLY_USED_FOR_DEBUG(nbRows);
+ EIGEN_ONLY_USED_FOR_DEBUG(nbCols);
+ eigen_assert(nbRows == this->rows() && nbCols == this->cols()
+ && "DenseBase::resize() does not actually allow to resize.");
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+
+ /** \internal Represents a matrix with all coefficients equal to one another*/
+ typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>,Derived> ConstantReturnType;
+ /** \internal Represents a vector with linearly spaced coefficients that allows sequential access only. */
+ typedef CwiseNullaryOp<internal::linspaced_op<Scalar,false>,Derived> SequentialLinSpacedReturnType;
+ /** \internal Represents a vector with linearly spaced coefficients that allows random access. */
+ typedef CwiseNullaryOp<internal::linspaced_op<Scalar,true>,Derived> RandomAccessLinSpacedReturnType;
+ /** \internal the return type of MatrixBase::eigenvalues() */
+ typedef Matrix<typename NumTraits<typename internal::traits<Derived>::Scalar>::Real, internal::traits<Derived>::ColsAtCompileTime, 1> EigenvaluesReturnType;
+
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+ /** Copies \a other into *this. \returns a reference to *this. */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const DenseBase<OtherDerived>& other);
+
+ /** Special case of the template operator=, in order to prevent the compiler
+ * from generating a default operator= (issue hit with g++ 4.1)
+ */
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const DenseBase& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const EigenBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator+=(const EigenBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator-=(const EigenBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const ReturnByValue<OtherDerived>& func);
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** Copies \a other into *this without evaluating other. \returns a reference to *this. */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& lazyAssign(const DenseBase<OtherDerived>& other);
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+ EIGEN_DEVICE_FUNC
+ CommaInitializer<Derived> operator<< (const Scalar& s);
+
+ template<unsigned int Added,unsigned int Removed>
+ const Flagged<Derived, Added, Removed> flagged() const;
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ CommaInitializer<Derived> operator<< (const DenseBase<OtherDerived>& other);
+
+ EIGEN_DEVICE_FUNC
+ Eigen::Transpose<Derived> transpose();
+ typedef typename internal::add_const<Transpose<const Derived> >::type ConstTransposeReturnType;
+ EIGEN_DEVICE_FUNC
+ ConstTransposeReturnType transpose() const;
+ EIGEN_DEVICE_FUNC
+ void transposeInPlace();
+#ifndef EIGEN_NO_DEBUG
+ protected:
+ template<typename OtherDerived>
+ void checkTransposeAliasing(const OtherDerived& other) const;
+ public:
+#endif
+
+
+ EIGEN_DEVICE_FUNC static const ConstantReturnType
+ Constant(Index rows, Index cols, const Scalar& value);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType
+ Constant(Index size, const Scalar& value);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType
+ Constant(const Scalar& value);
+
+ EIGEN_DEVICE_FUNC static const SequentialLinSpacedReturnType
+ LinSpaced(Sequential_t, Index size, const Scalar& low, const Scalar& high);
+ EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
+ LinSpaced(Index size, const Scalar& low, const Scalar& high);
+ EIGEN_DEVICE_FUNC static const SequentialLinSpacedReturnType
+ LinSpaced(Sequential_t, const Scalar& low, const Scalar& high);
+ EIGEN_DEVICE_FUNC static const RandomAccessLinSpacedReturnType
+ LinSpaced(const Scalar& low, const Scalar& high);
+
+ template<typename CustomNullaryOp> EIGEN_DEVICE_FUNC
+ static const CwiseNullaryOp<CustomNullaryOp, Derived>
+ NullaryExpr(Index rows, Index cols, const CustomNullaryOp& func);
+ template<typename CustomNullaryOp> EIGEN_DEVICE_FUNC
+ static const CwiseNullaryOp<CustomNullaryOp, Derived>
+ NullaryExpr(Index size, const CustomNullaryOp& func);
+ template<typename CustomNullaryOp> EIGEN_DEVICE_FUNC
+ static const CwiseNullaryOp<CustomNullaryOp, Derived>
+ NullaryExpr(const CustomNullaryOp& func);
+
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Zero(Index rows, Index cols);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Zero(Index size);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Zero();
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Ones(Index rows, Index cols);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Ones(Index size);
+ EIGEN_DEVICE_FUNC static const ConstantReturnType Ones();
+
+ EIGEN_DEVICE_FUNC void fill(const Scalar& value);
+ EIGEN_DEVICE_FUNC Derived& setConstant(const Scalar& value);
+ EIGEN_DEVICE_FUNC Derived& setLinSpaced(Index size, const Scalar& low, const Scalar& high);
+ EIGEN_DEVICE_FUNC Derived& setLinSpaced(const Scalar& low, const Scalar& high);
+ EIGEN_DEVICE_FUNC Derived& setZero();
+ EIGEN_DEVICE_FUNC Derived& setOnes();
+ EIGEN_DEVICE_FUNC Derived& setRandom();
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC
+ bool isApprox(const DenseBase<OtherDerived>& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ EIGEN_DEVICE_FUNC
+ bool isMuchSmallerThan(const RealScalar& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC
+ bool isMuchSmallerThan(const DenseBase<OtherDerived>& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+
+ EIGEN_DEVICE_FUNC bool isApproxToConstant(const Scalar& value, const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ EIGEN_DEVICE_FUNC bool isConstant(const Scalar& value, const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ EIGEN_DEVICE_FUNC bool isZero(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ EIGEN_DEVICE_FUNC bool isOnes(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+
+ inline bool hasNaN() const;
+ inline bool allFinite() const;
+
+ EIGEN_DEVICE_FUNC
+ inline Derived& operator*=(const Scalar& other);
+ EIGEN_DEVICE_FUNC
+ inline Derived& operator/=(const Scalar& other);
+
+ typedef typename internal::add_const_on_value_type<typename internal::eval<Derived>::type>::type EvalReturnType;
+ /** \returns the matrix or vector obtained by evaluating this expression.
+ *
+ * Notice that in the case of a plain matrix or vector (not an expression) this function just returns
+ * a const reference, in order to avoid a useless copy.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE EvalReturnType eval() const
+ {
+ // Even though MSVC does not honor strong inlining when the return type
+ // is a dynamic matrix, we desperately need strong inlining for fixed
+ // size types on MSVC.
+ return typename internal::eval<Derived>::type(derived());
+ }
+
+ /** swaps *this with the expression \a other.
+ *
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void swap(const DenseBase<OtherDerived>& other,
+ int = OtherDerived::ThisConstantIsPrivateInPlainObjectBase)
+ {
+ SwapWrapper<Derived>(derived()).lazyAssign(other.derived());
+ }
+
+ /** swaps *this with the matrix or array \a other.
+ *
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void swap(PlainObjectBase<OtherDerived>& other)
+ {
+ SwapWrapper<Derived>(derived()).lazyAssign(other.derived());
+ }
+
+
+ EIGEN_DEVICE_FUNC inline const NestByValue<Derived> nestByValue() const;
+ EIGEN_DEVICE_FUNC inline const ForceAlignedAccess<Derived> forceAlignedAccess() const;
+ EIGEN_DEVICE_FUNC inline ForceAlignedAccess<Derived> forceAlignedAccess();
+ template<bool Enable> EIGEN_DEVICE_FUNC
+ inline const typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type forceAlignedAccessIf() const;
+ template<bool Enable> EIGEN_DEVICE_FUNC
+ inline typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type forceAlignedAccessIf();
+
+ EIGEN_DEVICE_FUNC Scalar sum() const;
+ EIGEN_DEVICE_FUNC Scalar mean() const;
+ EIGEN_DEVICE_FUNC Scalar trace() const;
+
+ EIGEN_DEVICE_FUNC Scalar prod() const;
+
+ EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar minCoeff() const;
+ EIGEN_DEVICE_FUNC typename internal::traits<Derived>::Scalar maxCoeff() const;
+
+ template<typename IndexType> EIGEN_DEVICE_FUNC
+ typename internal::traits<Derived>::Scalar minCoeff(IndexType* row, IndexType* col) const;
+ template<typename IndexType> EIGEN_DEVICE_FUNC
+ typename internal::traits<Derived>::Scalar maxCoeff(IndexType* row, IndexType* col) const;
+ template<typename IndexType> EIGEN_DEVICE_FUNC
+ typename internal::traits<Derived>::Scalar minCoeff(IndexType* index) const;
+ template<typename IndexType> EIGEN_DEVICE_FUNC
+ typename internal::traits<Derived>::Scalar maxCoeff(IndexType* index) const;
+
+ template<typename BinaryOp>
+ EIGEN_DEVICE_FUNC
+ typename internal::result_of<BinaryOp(typename internal::traits<Derived>::Scalar)>::type
+ redux(const BinaryOp& func) const;
+
+ template<typename Visitor>
+ EIGEN_DEVICE_FUNC
+ void visit(Visitor& func) const;
+
+ inline const WithFormat<Derived> format(const IOFormat& fmt) const;
+
+ /** \returns the unique coefficient of a 1x1 expression */
+ EIGEN_DEVICE_FUNC
+ CoeffReturnType value() const
+ {
+ EIGEN_STATIC_ASSERT_SIZE_1x1(Derived)
+ eigen_assert(this->rows() == 1 && this->cols() == 1);
+ return derived().coeff(0,0);
+ }
+
+ bool all() const;
+ bool any() const;
+ Index count() const;
+
+ typedef VectorwiseOp<Derived, Horizontal> RowwiseReturnType;
+ typedef const VectorwiseOp<const Derived, Horizontal> ConstRowwiseReturnType;
+ typedef VectorwiseOp<Derived, Vertical> ColwiseReturnType;
+ typedef const VectorwiseOp<const Derived, Vertical> ConstColwiseReturnType;
+
+ ConstRowwiseReturnType rowwise() const;
+ RowwiseReturnType rowwise();
+ ConstColwiseReturnType colwise() const;
+ ColwiseReturnType colwise();
+
+ static const CwiseNullaryOp<internal::scalar_random_op<Scalar>,Derived> Random(Index rows, Index cols);
+ static const CwiseNullaryOp<internal::scalar_random_op<Scalar>,Derived> Random(Index size);
+ static const CwiseNullaryOp<internal::scalar_random_op<Scalar>,Derived> Random();
+
+ template<typename ThenDerived,typename ElseDerived>
+ const Select<Derived,ThenDerived,ElseDerived>
+ select(const DenseBase<ThenDerived>& thenMatrix,
+ const DenseBase<ElseDerived>& elseMatrix) const;
+
+ template<typename ThenDerived>
+ inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
+ select(const DenseBase<ThenDerived>& thenMatrix, const typename ThenDerived::Scalar& elseScalar) const;
+
+ template<typename ElseDerived>
+ inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
+ select(const typename ElseDerived::Scalar& thenScalar, const DenseBase<ElseDerived>& elseMatrix) const;
+
+ template<int p> RealScalar lpNorm() const;
+
+ template<int RowFactor, int ColFactor>
+ const Replicate<Derived,RowFactor,ColFactor> replicate() const;
+ const Replicate<Derived,Dynamic,Dynamic> replicate(Index rowFactor, Index colFactor) const;
+
+ typedef Reverse<Derived, BothDirections> ReverseReturnType;
+ typedef const Reverse<const Derived, BothDirections> ConstReverseReturnType;
+ ReverseReturnType reverse();
+ ConstReverseReturnType reverse() const;
+ void reverseInPlace();
+
+#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::DenseBase
+# include "../plugins/BlockMethods.h"
+# ifdef EIGEN_DENSEBASE_PLUGIN
+# include EIGEN_DENSEBASE_PLUGIN
+# endif
+// Because of an intra-Google include scanner limitation,
+// third_party/stan cannot define the EIGEN_DENSEBASE_PLUGIN
+// macro
+// as "stan/math/matrix/EigenDenseBaseAddons.hpp". According to
+// ambrose@google.com, this is a known limitation: the include
+// scanner doesn't maintain any preprocessor state about macros,
+// previously visited files, etc. See also //base/stacktrace.cc.
+# ifdef STAN_MATH_MATRIX_EIGEN_DENSEBASE_PLUGIN
+# include "stan/math/matrix/EigenDenseBaseAddons.hpp"
+# endif
+#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
+
+#ifdef EIGEN2_SUPPORT
+
+ Block<Derived> corner(CornerType type, Index cRows, Index cCols);
+ const Block<Derived> corner(CornerType type, Index cRows, Index cCols) const;
+ template<int CRows, int CCols>
+ Block<Derived, CRows, CCols> corner(CornerType type);
+ template<int CRows, int CCols>
+ const Block<Derived, CRows, CCols> corner(CornerType type) const;
+
+#endif // EIGEN2_SUPPORT
+
+
+ // disable the use of evalTo for dense objects with a nice compilation error
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void evalTo(Dest& ) const
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<Dest,void>::value),THE_EVAL_EVALTO_FUNCTION_SHOULD_NEVER_BE_CALLED_FOR_DENSE_OBJECTS);
+ }
+
+ protected:
+ /** Default constructor. Do nothing. */
+ EIGEN_DEVICE_FUNC DenseBase()
+ {
+ /* Just checks for self-consistency of the flags.
+ * Only do it when debugging Eigen, as this borders on paranoia and could slow compilation down
+ */
+#ifdef EIGEN_INTERNAL_DEBUGGING
+ EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, int(IsRowMajor))
+ && EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, int(!IsRowMajor))),
+ INVALID_STORAGE_ORDER_FOR_THIS_VECTOR_EXPRESSION)
+#endif
+ }
+
+ private:
+ EIGEN_DEVICE_FUNC explicit DenseBase(int);
+ EIGEN_DEVICE_FUNC DenseBase(int,int);
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC explicit DenseBase(const DenseBase<OtherDerived>&);
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_DENSEBASE_H
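As an illustrative aside (not part of the patch), the declarations above make up the bulk of the public dense API. A minimal sketch exercising a few of them (the Constant() factory, setLinSpaced(), sum()/mean(), colwise() partial reductions, and minCoeff() with indices), assuming only the standard public Eigen API:

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::MatrixXd a = Eigen::MatrixXd::Constant(2, 3, 1.5);  // Constant() factory
      Eigen::VectorXd v(5);
      v.setLinSpaced(5, 0.0, 1.0);                               // evenly spaced coefficients
      std::cout << "sum = " << a.sum() << ", mean = " << a.mean() << "\n";
      std::cout << "column sums: " << a.colwise().sum() << "\n"; // partial reduction
      int r = 0, c = 0;
      a.minCoeff(&r, &c);                                        // reduction that also reports the position
      std::cout << "min at (" << r << ", " << c << ")\n";
      return 0;
    }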
diff --git a/third_party/eigen3/Eigen/src/Core/DenseCoeffsBase.h b/third_party/eigen3/Eigen/src/Core/DenseCoeffsBase.h
new file mode 100644
index 0000000000..efabb5e675
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/DenseCoeffsBase.h
@@ -0,0 +1,787 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DENSECOEFFSBASE_H
+#define EIGEN_DENSECOEFFSBASE_H
+
+namespace Eigen {
+
+namespace internal {
+template<typename T> struct add_const_on_value_type_if_arithmetic
+{
+ typedef typename conditional<is_arithmetic<T>::value, T, typename add_const_on_value_type<T>::type>::type type;
+};
+}
+
+/** \brief Base class providing read-only coefficient access to matrices and arrays.
+ * \ingroup Core_Module
+ * \tparam Derived Type of the derived class
+ * \tparam #ReadOnlyAccessors Constant indicating read-only access
+ *
+ * This class defines the \c operator() \c const function and friends, which can be used to read specific
+ * entries of a matrix or array.
+ *
+ * \sa DenseCoeffsBase<Derived, WriteAccessors>, DenseCoeffsBase<Derived, DirectAccessors>,
+ * \ref TopicClassHierarchy
+ */
+template<typename Derived>
+class DenseCoeffsBase<Derived,ReadOnlyAccessors> : public EigenBase<Derived>
+{
+ public:
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+
+ // Explanation for this CoeffReturnType typedef.
+ // - This is the return type of the coeff() method.
+ // - The LvalueBit means exactly that we can offer a coeffRef() method, which means exactly that we can get references
+ // to coeffs, which means exactly that we can have coeff() return a const reference (as opposed to returning a value).
+ // - The is_arithmetic check is required since "const int", "const double", etc. will cause warnings on some systems
+ // while the declaration of "const T", where T is a non-arithmetic type, does not. Always returning "const Scalar&" is
+ // not possible, since the underlying expressions might not offer a valid address that the reference could refer to.
+ typedef typename internal::conditional<bool(internal::traits<Derived>::Flags&LvalueBit),
+ const Scalar&,
+ typename internal::conditional<internal::is_arithmetic<Scalar>::value, Scalar, const Scalar>::type
+ >::type CoeffReturnType;
+
+ typedef typename internal::add_const_on_value_type_if_arithmetic<
+ typename internal::packet_traits<Scalar>::type
+ >::type PacketReturnType;
+
+ typedef EigenBase<Derived> Base;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::derived;
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rowIndexByOuterInner(Index outer, Index inner) const
+ {
+ return int(Derived::RowsAtCompileTime) == 1 ? 0
+ : int(Derived::ColsAtCompileTime) == 1 ? inner
+ : int(Derived::Flags)&RowMajorBit ? outer
+ : inner;
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index colIndexByOuterInner(Index outer, Index inner) const
+ {
+ return int(Derived::ColsAtCompileTime) == 1 ? 0
+ : int(Derived::RowsAtCompileTime) == 1 ? inner
+ : int(Derived::Flags)&RowMajorBit ? inner
+ : outer;
+ }
+
+ /** Short version: don't use this function, use
+ * \link operator()(Index,Index) const \endlink instead.
+ *
+ * Long version: this function is similar to
+ * \link operator()(Index,Index) const \endlink, but without the assertion.
+ * Use this for limiting the performance cost of debugging code when doing
+ * repeated coefficient access. Only use this when it is guaranteed that the
+ * parameters \a row and \a col are in range.
+ *
+ * If EIGEN_INTERNAL_DEBUGGING is defined, an assertion will be made, making this
+ * function equivalent to \link operator()(Index,Index) const \endlink.
+ *
+ * \sa operator()(Index,Index) const, coeffRef(Index,Index), coeff(Index) const
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType coeff(Index row, Index col) const
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ return derived().coeff(row, col);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType coeffByOuterInner(Index outer, Index inner) const
+ {
+ return coeff(rowIndexByOuterInner(outer, inner),
+ colIndexByOuterInner(outer, inner));
+ }
+
+ /** \returns the coefficient at the given row and column.
+ *
+ * \sa operator()(Index,Index), operator[](Index)
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType operator()(Index row, Index col) const
+ {
+ eigen_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ return derived().coeff(row, col);
+ }
+
+ /** Short version: don't use this function, use
+ * \link operator[](Index) const \endlink instead.
+ *
+ * Long version: this function is similar to
+ * \link operator[](Index) const \endlink, but without the assertion.
+ * Use this for limiting the performance cost of debugging code when doing
+ * repeated coefficient access. Only use this when it is guaranteed that the
+ * parameter \a index is in range.
+ *
+ * If EIGEN_INTERNAL_DEBUGGING is defined, an assertion will be made, making this
+ * function equivalent to \link operator[](Index) const \endlink.
+ *
+ * \sa operator[](Index) const, coeffRef(Index), coeff(Index,Index) const
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ coeff(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return derived().coeff(index);
+ }
+
+
+ /** \returns the coefficient at given index.
+ *
+ * This method is allowed only for vector expressions, and for matrix expressions having the LinearAccessBit.
+ *
+ * \sa operator[](Index), operator()(Index,Index) const, x() const, y() const,
+ * z() const, w() const
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ operator[](Index index) const
+ {
+ #ifndef EIGEN2_SUPPORT
+ EIGEN_STATIC_ASSERT(Derived::IsVectorAtCompileTime,
+ THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD)
+ #endif
+ eigen_assert(index >= 0 && index < size());
+ return derived().coeff(index);
+ }
+
+ /** \returns the coefficient at given index.
+ *
+ * This is synonymous to operator[](Index) const.
+ *
+ * This method is allowed only for vector expressions, and for matrix expressions having the LinearAccessBit.
+ *
+ * \sa operator[](Index), operator()(Index,Index) const, x() const, y() const,
+ * z() const, w() const
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ operator()(Index index) const
+ {
+ eigen_assert(index >= 0 && index < size());
+ return derived().coeff(index);
+ }
+
+ /** equivalent to operator[](0). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ x() const { return (*this)[0]; }
+
+ /** equivalent to operator[](1). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ y() const { return (*this)[1]; }
+
+ /** equivalent to operator[](2). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ z() const { return (*this)[2]; }
+
+ /** equivalent to operator[](3). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE CoeffReturnType
+ w() const { return (*this)[3]; }
+
+ /** \internal
+ * \returns the packet of coefficients starting at the given row and column. It is your responsibility
+ * to ensure that a packet really starts there. This method is only available on expressions having the
+ * PacketAccessBit.
+ *
+ * The \a LoadMode parameter may have the value \a #Aligned or \a #Unaligned. Its effect is to select
+ * the appropriate vectorization instruction. Aligned access is faster, but is only possible for packets
+ * starting at an address which is a multiple of the packet size.
+ */
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketReturnType packet(Index row, Index col) const
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ return derived().template packet<LoadMode>(row,col);
+ }
+
+
+ /** \internal */
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketReturnType packetByOuterInner(Index outer, Index inner) const
+ {
+ return packet<LoadMode>(rowIndexByOuterInner(outer, inner),
+ colIndexByOuterInner(outer, inner));
+ }
+
+ /** \internal
+ * \returns the packet of coefficients starting at the given index. It is your responsibility
+ * to ensure that a packet really starts there. This method is only available on expressions having the
+ * PacketAccessBit and the LinearAccessBit.
+ *
+ * The \a LoadMode parameter may have the value \a #Aligned or \a #Unaligned. Its effect is to select
+ * the appropriate vectorization instruction. Aligned access is faster, but is only possible for packets
+ * starting at an address which is a multiple of the packet size.
+ */
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return derived().template packet<LoadMode>(index);
+ }
+
+ protected:
+ // explanation: DenseBase is doing "using ..." on the methods from DenseCoeffsBase.
+ // But some methods are only available in the DirectAccess case.
+ // So we add dummy methods here with these names, so that "using... " doesn't fail.
+ // They are not private so that the child class DenseBase can access them, and not public
+ // either since they are an implementation detail, so they have to be protected.
+ void coeffRef();
+ void coeffRefByOuterInner();
+ void writePacket();
+ void writePacketByOuterInner();
+ void copyCoeff();
+ void copyCoeffByOuterInner();
+ void copyPacket();
+ void copyPacketByOuterInner();
+ void stride();
+ void innerStride();
+ void outerStride();
+ void rowStride();
+ void colStride();
+};
+
+/** \brief Base class providing read/write coefficient access to matrices and arrays.
+ * \ingroup Core_Module
+ * \tparam Derived Type of the derived class
+ * \tparam #WriteAccessors Constant indicating read/write access
+ *
+ * This class defines the non-const \c operator() function and friends, which can be used to write specific
+ * entries of a matrix or array. This class inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which
+ * defines the const variant for reading specific entries.
+ *
+ * \sa DenseCoeffsBase<Derived, DirectAccessors>, \ref TopicClassHierarchy
+ */
+template<typename Derived>
+class DenseCoeffsBase<Derived, WriteAccessors> : public DenseCoeffsBase<Derived, ReadOnlyAccessors>
+{
+ public:
+
+ typedef DenseCoeffsBase<Derived, ReadOnlyAccessors> Base;
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ using Base::coeff;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::derived;
+ using Base::rowIndexByOuterInner;
+ using Base::colIndexByOuterInner;
+ using Base::operator[];
+ using Base::operator();
+ using Base::x;
+ using Base::y;
+ using Base::z;
+ using Base::w;
+
+ /** Short version: don't use this function, use
+ * \link operator()(Index,Index) \endlink instead.
+ *
+ * Long version: this function is similar to
+ * \link operator()(Index,Index) \endlink, but without the assertion.
+ * Use this for limiting the performance cost of debugging code when doing
+ * repeated coefficient access. Only use this when it is guaranteed that the
+ * parameters \a row and \a col are in range.
+ *
+ * If EIGEN_INTERNAL_DEBUGGING is defined, an assertion will be made, making this
+ * function equivalent to \link operator()(Index,Index) \endlink.
+ *
+ * \sa operator()(Index,Index), coeff(Index, Index) const, coeffRef(Index)
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index row, Index col)
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ return derived().coeffRef(row, col);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ coeffRefByOuterInner(Index outer, Index inner)
+ {
+ return coeffRef(rowIndexByOuterInner(outer, inner),
+ colIndexByOuterInner(outer, inner));
+ }
+
+ /** \returns a reference to the coefficient at the given row and column.
+ *
+ * \sa operator[](Index)
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ operator()(Index row, Index col)
+ {
+ eigen_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ return derived().coeffRef(row, col);
+ }
+
+
+ /** Short version: don't use this function, use
+ * \link operator[](Index) \endlink instead.
+ *
+ * Long version: this function is similar to
+ * \link operator[](Index) \endlink, but without the assertion.
+ * Use this for limiting the performance cost of debugging code when doing
+ * repeated coefficient access. Only use this when it is guaranteed that the
+ * parameters \a row and \a col are in range.
+ *
+ * If EIGEN_INTERNAL_DEBUGGING is defined, an assertion will be made, making this
+ * function equivalent to \link operator[](Index) \endlink.
+ *
+ * \sa operator[](Index), coeff(Index) const, coeffRef(Index,Index)
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ coeffRef(Index index)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return derived().coeffRef(index);
+ }
+
+ /** \returns a reference to the coefficient at given index.
+ *
+ * This method is allowed only for vector expressions, and for matrix expressions having the LinearAccessBit.
+ *
+ * \sa operator[](Index) const, operator()(Index,Index), x(), y(), z(), w()
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ operator[](Index index)
+ {
+ #ifndef EIGEN2_SUPPORT
+ EIGEN_STATIC_ASSERT(Derived::IsVectorAtCompileTime,
+ THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD)
+ #endif
+ eigen_assert(index >= 0 && index < size());
+ return derived().coeffRef(index);
+ }
+
+ /** \returns a reference to the coefficient at given index.
+ *
+ * This is synonymous to operator[](Index).
+ *
+ * This method is allowed only for vector expressions, and for matrix expressions having the LinearAccessBit.
+ *
+ * \sa operator[](Index) const, operator()(Index,Index), x(), y(), z(), w()
+ */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ operator()(Index index)
+ {
+ eigen_assert(index >= 0 && index < size());
+ return derived().coeffRef(index);
+ }
+
+ /** equivalent to operator[](0). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ x() { return (*this)[0]; }
+
+ /** equivalent to operator[](1). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ y() { return (*this)[1]; }
+
+ /** equivalent to operator[](2). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ z() { return (*this)[2]; }
+
+ /** equivalent to operator[](3). */
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar&
+ w() { return (*this)[3]; }
+
+ /** \internal
+ * Stores the given packet of coefficients, at the given row and column of this expression. It is your responsibility
+ * to ensure that a packet really starts there. This method is only available on expressions having the
+ * PacketAccessBit.
+ *
+ * The \a StoreMode parameter may have the value \a #Aligned or \a #Unaligned. Its effect is to select
+ * the appropriate vectorization instruction. Aligned access is faster, but is only possible for packets
+ * starting at an address which is a multiple of the packet size.
+ */
+
+ template<int StoreMode>
+ EIGEN_STRONG_INLINE void writePacket
+ (Index row, Index col, const typename internal::packet_traits<Scalar>::type& val)
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ derived().template writePacket<StoreMode>(row,col,val);
+ }
+
+
+ /** \internal */
+ template<int StoreMode>
+ EIGEN_STRONG_INLINE void writePacketByOuterInner
+ (Index outer, Index inner, const typename internal::packet_traits<Scalar>::type& val)
+ {
+ writePacket<StoreMode>(rowIndexByOuterInner(outer, inner),
+ colIndexByOuterInner(outer, inner),
+ val);
+ }
+
+ /** \internal
+ * Stores the given packet of coefficients, at the given index in this expression. It is your responsibility
+ * to ensure that a packet really starts there. This method is only available on expressions having the
+ * PacketAccessBit and the LinearAccessBit.
+ *
+ * The \a StoreMode parameter may have the value \a Aligned or \a Unaligned. Its effect is to select
+ * the appropriate vectorization instruction. Aligned access is faster, but is only possible for packets
+ * starting at an address which is a multiple of the packet size.
+ */
+ template<int StoreMode>
+ EIGEN_STRONG_INLINE void writePacket
+ (Index index, const typename internal::packet_traits<Scalar>::type& val)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ derived().template writePacket<StoreMode>(index,val);
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+
+ /** \internal Copies the coefficient at position (row,col) of other into *this.
+ *
+ * This method is overridden in SwapWrapper, allowing swap() assignments to share 99% of their code
+ * with usual assignments.
+ *
+ * Outside of this internal usage, this method is probably not useful. It is hidden in the public API dox.
+ */
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void copyCoeff(Index row, Index col, const DenseBase<OtherDerived>& other)
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ derived().coeffRef(row, col) = other.derived().coeff(row, col);
+ }
+
+ /** \internal Copies the coefficient at the given index of other into *this.
+ *
+ * This method is overridden in SwapWrapper, allowing swap() assignments to share 99% of their code
+ * with usual assignments.
+ *
+ * Outside of this internal usage, this method is probably not useful. It is hidden in the public API dox.
+ */
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void copyCoeff(Index index, const DenseBase<OtherDerived>& other)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ derived().coeffRef(index) = other.derived().coeff(index);
+ }
+
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void copyCoeffByOuterInner(Index outer, Index inner, const DenseBase<OtherDerived>& other)
+ {
+ const Index row = rowIndexByOuterInner(outer,inner);
+ const Index col = colIndexByOuterInner(outer,inner);
+ // derived() is important here: copyCoeff() may be reimplemented in Derived!
+ derived().copyCoeff(row, col, other);
+ }
+
+ /** \internal Copies the packet at position (row,col) of other into *this.
+ *
+ * This method is overridden in SwapWrapper, allowing swap() assignments to share 99% of their code
+ * with usual assignments.
+ *
+ * Outside of this internal usage, this method is probably not useful. It is hidden in the public API dox.
+ */
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ EIGEN_STRONG_INLINE void copyPacket(Index row, Index col, const DenseBase<OtherDerived>& other)
+ {
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ derived().template writePacket<StoreMode>(row, col,
+ other.derived().template packet<LoadMode>(row, col));
+ }
+
+ /** \internal Copies the packet at the given index of other into *this.
+ *
+ * This method is overridden in SwapWrapper, allowing swap() assignments to share 99% of their code
+ * with usual assignments.
+ *
+ * Outside of this internal usage, this method is probably not useful. It is hidden in the public API dox.
+ */
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ EIGEN_STRONG_INLINE void copyPacket(Index index, const DenseBase<OtherDerived>& other)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ derived().template writePacket<StoreMode>(index,
+ other.derived().template packet<LoadMode>(index));
+ }
+
+ /** \internal */
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ EIGEN_STRONG_INLINE void copyPacketByOuterInner(Index outer, Index inner, const DenseBase<OtherDerived>& other)
+ {
+ const Index row = rowIndexByOuterInner(outer,inner);
+ const Index col = colIndexByOuterInner(outer,inner);
+ // derived() is important here: copyCoeff() may be reimplemented in Derived!
+ derived().template copyPacket< OtherDerived, StoreMode, LoadMode>(row, col, other);
+ }
+#endif
+
+};
+
+/** \brief Base class providing direct read-only coefficient access to matrices and arrays.
+ * \ingroup Core_Module
+ * \tparam Derived Type of the derived class
+ * \tparam #DirectAccessors Constant indicating direct access
+ *
+ * This class defines functions to work with strides which can be used to access entries directly. This class
+ * inherits DenseCoeffsBase<Derived, ReadOnlyAccessors> which defines functions to access entries read-only using
+ * \c operator() .
+ *
+ * \sa \ref TopicClassHierarchy
+ */
+template<typename Derived>
+class DenseCoeffsBase<Derived, DirectAccessors> : public DenseCoeffsBase<Derived, ReadOnlyAccessors>
+{
+ public:
+
+ typedef DenseCoeffsBase<Derived, ReadOnlyAccessors> Base;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::derived;
+
+ /** \returns the pointer increment between two consecutive elements within a slice in the inner direction.
+ *
+ * \sa outerStride(), rowStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const
+ {
+ return derived().innerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive inner slices (for example, between two consecutive columns
+ * in a column-major matrix).
+ *
+ * \sa innerStride(), rowStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const
+ {
+ return derived().outerStride();
+ }
+
+ // FIXME shall we remove it ?
+ inline Index stride() const
+ {
+ return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive rows.
+ *
+ * \sa innerStride(), outerStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index rowStride() const
+ {
+ return Derived::IsRowMajor ? outerStride() : innerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive columns.
+ *
+ * \sa innerStride(), outerStride(), rowStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index colStride() const
+ {
+ return Derived::IsRowMajor ? innerStride() : outerStride();
+ }
+};
+
+/** \brief Base class providing direct read/write coefficient access to matrices and arrays.
+ * \ingroup Core_Module
+ * \tparam Derived Type of the derived class
+ * \tparam #DirectWriteAccessors Constant indicating direct access
+ *
+ * This class defines functions to work with strides which can be used to access entries directly. This class
+ * inherits DenseCoeffsBase<Derived, WriteAccessors> which defines functions to access entries read/write using
+ * \c operator().
+ *
+ * \sa \ref TopicClassHierarchy
+ */
+template<typename Derived>
+class DenseCoeffsBase<Derived, DirectWriteAccessors>
+ : public DenseCoeffsBase<Derived, WriteAccessors>
+{
+ public:
+
+ typedef DenseCoeffsBase<Derived, WriteAccessors> Base;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::derived;
+
+ /** \returns the pointer increment between two consecutive elements within a slice in the inner direction.
+ *
+ * \sa outerStride(), rowStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const
+ {
+ return derived().innerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive inner slices (for example, between two consecutive columns
+ * in a column-major matrix).
+ *
+ * \sa innerStride(), rowStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const
+ {
+ return derived().outerStride();
+ }
+
+ // FIXME shall we remove it ?
+ inline Index stride() const
+ {
+ return Derived::IsVectorAtCompileTime ? innerStride() : outerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive rows.
+ *
+ * \sa innerStride(), outerStride(), colStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index rowStride() const
+ {
+ return Derived::IsRowMajor ? outerStride() : innerStride();
+ }
+
+ /** \returns the pointer increment between two consecutive columns.
+ *
+ * \sa innerStride(), outerStride(), rowStride()
+ */
+ EIGEN_DEVICE_FUNC
+ inline Index colStride() const
+ {
+ return Derived::IsRowMajor ? innerStride() : outerStride();
+ }
+};
+
+namespace internal {
+
+template<typename Derived, bool JustReturnZero>
+struct first_aligned_impl
+{
+ static inline typename Derived::Index run(const Derived&)
+ { return 0; }
+};
+
+template<typename Derived>
+struct first_aligned_impl<Derived, false>
+{
+ static inline typename Derived::Index run(const Derived& m)
+ {
+ return internal::first_aligned(&m.const_cast_derived().coeffRef(0,0), m.size());
+ }
+};
+
+/** \internal \returns the index of the first element of the array that is well aligned for vectorization.
+ *
+ * There is also the variant first_aligned(const Scalar*, Integer) defined in Memory.h. See it for more
+ * documentation.
+ */
+template<typename Derived>
+static inline typename Derived::Index first_aligned(const Derived& m)
+{
+ return first_aligned_impl
+ <Derived, (Derived::Flags & AlignedBit) || !(Derived::Flags & DirectAccessBit)>
+ ::run(m);
+}
+
+template<typename Derived, bool HasDirectAccess = has_direct_access<Derived>::ret>
+struct inner_stride_at_compile_time
+{
+ enum { ret = traits<Derived>::InnerStrideAtCompileTime };
+};
+
+template<typename Derived>
+struct inner_stride_at_compile_time<Derived, false>
+{
+ enum { ret = 0 };
+};
+
+template<typename Derived, bool HasDirectAccess = has_direct_access<Derived>::ret>
+struct outer_stride_at_compile_time
+{
+ enum { ret = traits<Derived>::OuterStrideAtCompileTime };
+};
+
+template<typename Derived>
+struct outer_stride_at_compile_time<Derived, false>
+{
+ enum { ret = 0 };
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_DENSECOEFFSBASE_H
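As an illustrative aside (not part of the patch), the four accessor levels above map onto user code roughly as follows: operator() is the range-checked read/write path, coeff()/coeffRef() skip the assert unless EIGEN_INTERNAL_DEBUGGING is defined, and the stride accessors only make sense for directly addressable objects such as Map. A minimal sketch, assuming only the standard public Eigen API:

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::Matrix3d m = Eigen::Matrix3d::Zero();
      m(0, 2) = 7.0;                          // WriteAccessors: checked operator()
      double x = m.coeff(0, 2);               // ReadOnlyAccessors: unchecked read
      m.coeffRef(1, 1) = x;                   // unchecked write

      // DirectWriteAccessors: strides of a row-major block mapped over raw memory.
      double buf[12] = {0};
      Eigen::Map<Eigen::Matrix<double, 3, 4, Eigen::RowMajor> > view(buf);
      std::cout << "inner stride: " << view.innerStride()     // 1
                << ", outer stride: " << view.outerStride()   // 4
                << "\n";
      return 0;
    }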
diff --git a/third_party/eigen3/Eigen/src/Core/DenseStorage.h b/third_party/eigen3/Eigen/src/Core/DenseStorage.h
new file mode 100644
index 0000000000..59f5154956
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/DenseStorage.h
@@ -0,0 +1,480 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2010-2013 Hauke Heibel <hauke.heibel@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATRIXSTORAGE_H
+#define EIGEN_MATRIXSTORAGE_H
+
+#ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #define EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN EIGEN_DENSE_STORAGE_CTOR_PLUGIN;
+#else
+ #define EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+#endif
+
+namespace Eigen {
+
+namespace internal {
+
+struct constructor_without_unaligned_array_assert {};
+
+template<typename T, int Size>
+EIGEN_DEVICE_FUNC
+void check_static_allocation_size()
+{
+ // if EIGEN_STACK_ALLOCATION_LIMIT is defined to 0, then no limit
+ #if EIGEN_STACK_ALLOCATION_LIMIT
+ EIGEN_STATIC_ASSERT(Size * sizeof(T) <= EIGEN_STACK_ALLOCATION_LIMIT, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);
+ #endif
+}
+
+/** \internal
+ * Static array. If the MatrixOrArrayOptions require auto-alignment, the array will be automatically aligned
+ * to a 16-byte boundary if the total size is a multiple of 16 bytes.
+ */
+template <typename T, int Size, int MatrixOrArrayOptions,
+ int Alignment = (MatrixOrArrayOptions&DontAlign) ? 0
+ : (((Size*sizeof(T))%EIGEN_ALIGN_BYTES)==0) ? EIGEN_ALIGN_BYTES
+ : 0 >
+struct plain_array
+{
+ T array[Size];
+
+ EIGEN_DEVICE_FUNC
+ plain_array()
+ {
+ check_static_allocation_size<T,Size>();
+ }
+
+ EIGEN_DEVICE_FUNC
+ plain_array(constructor_without_unaligned_array_assert)
+ {
+ check_static_allocation_size<T,Size>();
+ }
+};
+
+#if defined(EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT)
+ #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask)
+#elif EIGEN_GNUC_AT_LEAST(4,7)
+ // GCC 4.7 is too aggressive in its optimizations and removes the alignment test based on the fact that the array is declared to be aligned.
+ // See this bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53900
+ // Hiding the origin of the array pointer behind a function argument seems to do the trick even if the function is inlined:
+ template<typename PtrType>
+ EIGEN_ALWAYS_INLINE PtrType eigen_unaligned_array_assert_workaround_gcc47(PtrType array) { return array; }
+ #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask) \
+ eigen_assert((reinterpret_cast<size_t>(eigen_unaligned_array_assert_workaround_gcc47(array)) & (sizemask)) == 0 \
+ && "this assertion is explained here: " \
+ "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" \
+ " **** READ THIS WEB PAGE !!! ****");
+#else
+ #define EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(sizemask) \
+ eigen_assert((reinterpret_cast<size_t>(array) & (sizemask)) == 0 \
+ && "this assertion is explained here: " \
+ "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" \
+ " **** READ THIS WEB PAGE !!! ****");
+#endif
+
+template <typename T, int Size, int MatrixOrArrayOptions>
+struct plain_array<T, Size, MatrixOrArrayOptions, EIGEN_ALIGN_BYTES>
+{
+ EIGEN_USER_ALIGN_DEFAULT T array[Size];
+
+ EIGEN_DEVICE_FUNC
+ plain_array()
+ {
+ EIGEN_MAKE_UNALIGNED_ARRAY_ASSERT(EIGEN_ALIGN_BYTES-1);
+ check_static_allocation_size<T,Size>();
+ }
+
+ EIGEN_DEVICE_FUNC
+ plain_array(constructor_without_unaligned_array_assert)
+ {
+ check_static_allocation_size<T,Size>();
+ }
+};
+
+template <typename T, int MatrixOrArrayOptions, int Alignment>
+struct plain_array<T, 0, MatrixOrArrayOptions, Alignment>
+{
+ EIGEN_USER_ALIGN_DEFAULT T array[1];
+ EIGEN_DEVICE_FUNC plain_array() {}
+ EIGEN_DEVICE_FUNC plain_array(constructor_without_unaligned_array_assert) {}
+};
+
+} // end namespace internal
+
+/** \internal
+ *
+ * \class DenseStorage
+ * \ingroup Core_Module
+ *
+ * \brief Stores the data of a matrix
+ *
+ * This class stores the data of fixed-size, dynamic-size or mixed matrices
+ * in a way as compact as possible.
+ *
+ * \sa Matrix
+ */
+template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseStorage;
+
+// purely fixed-size matrix
+template<typename T, int Size, int _Rows, int _Cols, int _Options> class DenseStorage
+{
+ internal::plain_array<T,Size,_Options> m_data;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() {}
+ EIGEN_DEVICE_FUNC
+ DenseStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(internal::constructor_without_unaligned_array_assert()) {}
+ EIGEN_DEVICE_FUNC
+ DenseStorage(const DenseStorage& other) : m_data(other.m_data) {}
+ EIGEN_DEVICE_FUNC
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other) m_data = other.m_data;
+ return *this;
+ }
+ EIGEN_DEVICE_FUNC DenseStorage(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC void swap(DenseStorage& other) { std::swap(m_data,other.m_data); }
+ EIGEN_DEVICE_FUNC static DenseIndex rows(void) {return _Rows;}
+ EIGEN_DEVICE_FUNC static DenseIndex cols(void) {return _Cols;}
+ EIGEN_DEVICE_FUNC void conservativeResize(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC void resize(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data.array; }
+};
+
+// null matrix
+template<typename T, int _Rows, int _Cols, int _Options> class DenseStorage<T, 0, _Rows, _Cols, _Options>
+{
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() {}
+ EIGEN_DEVICE_FUNC DenseStorage(internal::constructor_without_unaligned_array_assert) {}
+ EIGEN_DEVICE_FUNC DenseStorage(const DenseStorage&) {}
+ EIGEN_DEVICE_FUNC DenseStorage& operator=(const DenseStorage&) { return *this; }
+ EIGEN_DEVICE_FUNC DenseStorage(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC void swap(DenseStorage& ) {}
+ EIGEN_DEVICE_FUNC static DenseIndex rows(void) {return _Rows;}
+ EIGEN_DEVICE_FUNC static DenseIndex cols(void) {return _Cols;}
+ EIGEN_DEVICE_FUNC void conservativeResize(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC void resize(DenseIndex,DenseIndex,DenseIndex) {}
+ EIGEN_DEVICE_FUNC const T *data() const { return 0; }
+ EIGEN_DEVICE_FUNC T *data() { return 0; }
+};
+
+// more specializations for null matrices; these are necessary to resolve ambiguities
+template<typename T, int _Options> class DenseStorage<T, 0, Dynamic, Dynamic, _Options>
+: public DenseStorage<T, 0, 0, 0, _Options> { };
+
+template<typename T, int _Rows, int _Options> class DenseStorage<T, 0, _Rows, Dynamic, _Options>
+: public DenseStorage<T, 0, 0, 0, _Options> { };
+
+template<typename T, int _Cols, int _Options> class DenseStorage<T, 0, Dynamic, _Cols, _Options>
+: public DenseStorage<T, 0, 0, 0, _Options> { };
+
+// dynamic-size matrix with fixed-size storage
+template<typename T, int Size, int _Options> class DenseStorage<T, Size, Dynamic, Dynamic, _Options>
+{
+ internal::plain_array<T,Size,_Options> m_data;
+ DenseIndex m_rows;
+ DenseIndex m_cols;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0), m_cols(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0), m_cols(0) {}
+ DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows), m_cols(other.m_cols) {}
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ m_data = other.m_data;
+ m_rows = other.m_rows;
+ m_cols = other.m_cols;
+ }
+ return *this;
+ }
+ DenseStorage(DenseIndex, DenseIndex nbRows, DenseIndex nbCols) : m_rows(nbRows), m_cols(nbCols) {}
+ void swap(DenseStorage& other)
+ { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); std::swap(m_cols,other.m_cols); }
+ EIGEN_DEVICE_FUNC DenseIndex rows() const {return m_rows;}
+ EIGEN_DEVICE_FUNC DenseIndex cols() const {return m_cols;}
+ void conservativeResize(DenseIndex, DenseIndex nbRows, DenseIndex nbCols) { m_rows = nbRows; m_cols = nbCols; }
+ void resize(DenseIndex, DenseIndex nbRows, DenseIndex nbCols) { m_rows = nbRows; m_cols = nbCols; }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data.array; }
+};
+
+// dynamic-size matrix with fixed-size storage and fixed width
+template<typename T, int Size, int _Cols, int _Options> class DenseStorage<T, Size, Dynamic, _Cols, _Options>
+{
+ internal::plain_array<T,Size,_Options> m_data;
+ DenseIndex m_rows;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_rows(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(internal::constructor_without_unaligned_array_assert()), m_rows(0) {}
+ DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_rows(other.m_rows) {}
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ m_data = other.m_data;
+ m_rows = other.m_rows;
+ }
+ return *this;
+ }
+ DenseStorage(DenseIndex, DenseIndex nbRows, DenseIndex) : m_rows(nbRows) {}
+ void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); }
+ EIGEN_DEVICE_FUNC DenseIndex rows(void) const {return m_rows;}
+ EIGEN_DEVICE_FUNC DenseIndex cols(void) const {return _Cols;}
+ void conservativeResize(DenseIndex, DenseIndex nbRows, DenseIndex) { m_rows = nbRows; }
+ void resize(DenseIndex, DenseIndex nbRows, DenseIndex) { m_rows = nbRows; }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data.array; }
+};
+
+// dynamic-size matrix with fixed-size storage and fixed height
+template<typename T, int Size, int _Rows, int _Options> class DenseStorage<T, Size, _Rows, Dynamic, _Options>
+{
+ internal::plain_array<T,Size,_Options> m_data;
+ DenseIndex m_cols;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_cols(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(internal::constructor_without_unaligned_array_assert()), m_cols(0) {}
+ DenseStorage(const DenseStorage& other) : m_data(other.m_data), m_cols(other.m_cols) {}
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ m_data = other.m_data;
+ m_cols = other.m_cols;
+ }
+ return *this;
+ }
+ DenseStorage(DenseIndex, DenseIndex, DenseIndex nbCols) : m_cols(nbCols) {}
+ void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_cols,other.m_cols); }
+ EIGEN_DEVICE_FUNC DenseIndex rows(void) const {return _Rows;}
+ EIGEN_DEVICE_FUNC DenseIndex cols(void) const {return m_cols;}
+ void conservativeResize(DenseIndex, DenseIndex, DenseIndex nbCols) { m_cols = nbCols; }
+ void resize(DenseIndex, DenseIndex, DenseIndex nbCols) { m_cols = nbCols; }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data.array; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data.array; }
+};
+
+// purely dynamic matrix.
+template<typename T, int _Options> class DenseStorage<T, Dynamic, Dynamic, Dynamic, _Options>
+{
+ T *m_data;
+ DenseIndex m_rows;
+ DenseIndex m_cols;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_rows(0), m_cols(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(0), m_rows(0), m_cols(0) {}
+ DenseStorage(DenseIndex size, DenseIndex nbRows, DenseIndex nbCols)
+ : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size)), m_rows(nbRows), m_cols(nbCols)
+ { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN }
+ DenseStorage(const DenseStorage& other)
+ : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(other.m_rows*other.m_cols))
+ , m_rows(other.m_rows)
+ , m_cols(other.m_cols)
+ {
+ internal::smart_copy(other.m_data, other.m_data+other.m_rows*other.m_cols, m_data);
+ }
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ DenseStorage tmp(other);
+ this->swap(tmp);
+ }
+ return *this;
+ }
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ DenseStorage(DenseStorage&& other)
+ : m_data(std::move(other.m_data))
+ , m_rows(std::move(other.m_rows))
+ , m_cols(std::move(other.m_cols))
+ {
+ other.m_data = nullptr;
+ }
+ DenseStorage& operator=(DenseStorage&& other)
+ {
+ using std::swap;
+ swap(m_data, other.m_data);
+ swap(m_rows, other.m_rows);
+ swap(m_cols, other.m_cols);
+ return *this;
+ }
+#endif
+ ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, m_rows*m_cols); }
+ void swap(DenseStorage& other)
+ { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); std::swap(m_cols,other.m_cols); }
+ EIGEN_DEVICE_FUNC DenseIndex rows(void) const {return m_rows;}
+ EIGEN_DEVICE_FUNC DenseIndex cols(void) const {return m_cols;}
+ void conservativeResize(DenseIndex size, DenseIndex nbRows, DenseIndex nbCols)
+ {
+ m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*m_cols);
+ m_rows = nbRows;
+ m_cols = nbCols;
+ }
+ void resize(DenseIndex size, DenseIndex nbRows, DenseIndex nbCols)
+ {
+ if(size != m_rows*m_cols)
+ {
+ internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, m_rows*m_cols);
+ if (size)
+ m_data = internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_rows = nbRows;
+ m_cols = nbCols;
+ }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data; }
+};
+
+// matrix with dynamic width and fixed height (so that matrix has dynamic size).
+template<typename T, int _Rows, int _Options> class DenseStorage<T, Dynamic, _Rows, Dynamic, _Options>
+{
+ T *m_data;
+ DenseIndex m_cols;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_cols(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_cols(0) {}
+ DenseStorage(DenseIndex size, DenseIndex, DenseIndex nbCols) : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size)), m_cols(nbCols)
+ { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN }
+ DenseStorage(const DenseStorage& other)
+ : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(_Rows*other.m_cols))
+ , m_cols(other.m_cols)
+ {
+ internal::smart_copy(other.m_data, other.m_data+_Rows*m_cols, m_data);
+ }
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ DenseStorage tmp(other);
+ this->swap(tmp);
+ }
+ return *this;
+ }
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ DenseStorage(DenseStorage&& other)
+ : m_data(std::move(other.m_data))
+ , m_cols(std::move(other.m_cols))
+ {
+ other.m_data = nullptr;
+ }
+ DenseStorage& operator=(DenseStorage&& other)
+ {
+ using std::swap;
+ swap(m_data, other.m_data);
+ swap(m_cols, other.m_cols);
+ return *this;
+ }
+#endif
+ ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Rows*m_cols); }
+ void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_cols,other.m_cols); }
+ EIGEN_DEVICE_FUNC static DenseIndex rows(void) {return _Rows;}
+ EIGEN_DEVICE_FUNC DenseIndex cols(void) const {return m_cols;}
+ void conservativeResize(DenseIndex size, DenseIndex, DenseIndex nbCols)
+ {
+ m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, _Rows*m_cols);
+ m_cols = nbCols;
+ }
+ EIGEN_STRONG_INLINE void resize(DenseIndex size, DenseIndex, DenseIndex nbCols)
+ {
+ if(size != _Rows*m_cols)
+ {
+ internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Rows*m_cols);
+ if (size)
+ m_data = internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_cols = nbCols;
+ }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data; }
+};
+
+// matrix with dynamic height and fixed width (so that matrix has dynamic size).
+template<typename T, int _Cols, int _Options> class DenseStorage<T, Dynamic, Dynamic, _Cols, _Options>
+{
+ T *m_data;
+ DenseIndex m_rows;
+ public:
+ EIGEN_DEVICE_FUNC DenseStorage() : m_data(0), m_rows(0) {}
+ DenseStorage(internal::constructor_without_unaligned_array_assert) : m_data(0), m_rows(0) {}
+ DenseStorage(DenseIndex size, DenseIndex nbRows, DenseIndex) : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size)), m_rows(nbRows)
+ { EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN }
+ DenseStorage(const DenseStorage& other)
+ : m_data(internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(other.m_rows*_Cols))
+ , m_rows(other.m_rows)
+ {
+ internal::smart_copy(other.m_data, other.m_data+other.m_rows*_Cols, m_data);
+ }
+ DenseStorage& operator=(const DenseStorage& other)
+ {
+ if (this != &other)
+ {
+ DenseStorage tmp(other);
+ this->swap(tmp);
+ }
+ return *this;
+ }
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ DenseStorage(DenseStorage&& other)
+ : m_data(std::move(other.m_data))
+ , m_rows(std::move(other.m_rows))
+ {
+ other.m_data = nullptr;
+ }
+ DenseStorage& operator=(DenseStorage&& other)
+ {
+ using std::swap;
+ swap(m_data, other.m_data);
+ swap(m_rows, other.m_rows);
+ return *this;
+ }
+#endif
+ ~DenseStorage() { internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Cols*m_rows); }
+ void swap(DenseStorage& other) { std::swap(m_data,other.m_data); std::swap(m_rows,other.m_rows); }
+ EIGEN_DEVICE_FUNC DenseIndex rows(void) const {return m_rows;}
+ EIGEN_DEVICE_FUNC static DenseIndex cols(void) {return _Cols;}
+ void conservativeResize(DenseIndex size, DenseIndex nbRows, DenseIndex)
+ {
+ m_data = internal::conditional_aligned_realloc_new_auto<T,(_Options&DontAlign)==0>(m_data, size, m_rows*_Cols);
+ m_rows = nbRows;
+ }
+ EIGEN_STRONG_INLINE void resize(DenseIndex size, DenseIndex nbRows, DenseIndex)
+ {
+ if(size != m_rows*_Cols)
+ {
+ internal::conditional_aligned_delete_auto<T,(_Options&DontAlign)==0>(m_data, _Cols*m_rows);
+ if (size)
+ m_data = internal::conditional_aligned_new_auto<T,(_Options&DontAlign)==0>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_rows = nbRows;
+ }
+ EIGEN_DEVICE_FUNC const T *data() const { return m_data; }
+ EIGEN_DEVICE_FUNC T *data() { return m_data; }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/Diagonal.h b/third_party/eigen3/Eigen/src/Core/Diagonal.h
new file mode 100644
index 0000000000..d760762cc2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Diagonal.h
@@ -0,0 +1,258 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DIAGONAL_H
+#define EIGEN_DIAGONAL_H
+
+namespace Eigen {
+
+/** \class Diagonal
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a diagonal/subdiagonal/superdiagonal in a matrix
+ *
+ * \param MatrixType the type of the object in which we are taking a sub/main/super diagonal
+ * \param DiagIndex the index of the sub/super diagonal. The default is 0 and it means the main diagonal.
+ * A positive value means a superdiagonal, a negative value means a subdiagonal.
+ * You can also use DynamicIndex so the index can be set at runtime.
+ *
+ * The matrix is not required to be square.
+ *
+ * This class represents an expression of the main diagonal, or any sub/super diagonal
+ * of a matrix. It is the return type of MatrixBase::diagonal() and MatrixBase::diagonal(Index), and most of the
+ * time this is the only way it is used.
+ *
+ * \sa MatrixBase::diagonal(), MatrixBase::diagonal(Index)
+ */
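+
+// A minimal usage sketch (illustrative only; assumes <Eigen/Dense> is included):
+//   Eigen::MatrixXd m = Eigen::MatrixXd::Random(4, 4);
+//   Eigen::VectorXd d0 = m.diagonal();       // main diagonal
+//   Eigen::VectorXd d1 = m.diagonal(1);      // first super-diagonal, runtime index
+//   Eigen::VectorXd ds = m.diagonal<-1>();   // first sub-diagonal, compile-time index
+//   m.diagonal().setZero();                  // the expression is writable for lvalue matrices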
+
+namespace internal {
+template<typename MatrixType, int DiagIndex>
+struct traits<Diagonal<MatrixType,DiagIndex> >
+ : traits<MatrixType>
+{
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+ typedef typename MatrixType::StorageKind StorageKind;
+ enum {
+ RowsAtCompileTime = (int(DiagIndex) == DynamicIndex || int(MatrixType::SizeAtCompileTime) == Dynamic) ? Dynamic
+ : (EIGEN_PLAIN_ENUM_MIN(MatrixType::RowsAtCompileTime - EIGEN_PLAIN_ENUM_MAX(-DiagIndex, 0),
+ MatrixType::ColsAtCompileTime - EIGEN_PLAIN_ENUM_MAX( DiagIndex, 0))),
+ ColsAtCompileTime = 1,
+ MaxRowsAtCompileTime = int(MatrixType::MaxSizeAtCompileTime) == Dynamic ? Dynamic
+ : DiagIndex == DynamicIndex ? EIGEN_SIZE_MIN_PREFER_FIXED(MatrixType::MaxRowsAtCompileTime,
+ MatrixType::MaxColsAtCompileTime)
+ : (EIGEN_PLAIN_ENUM_MIN(MatrixType::MaxRowsAtCompileTime - EIGEN_PLAIN_ENUM_MAX(-DiagIndex, 0),
+ MatrixType::MaxColsAtCompileTime - EIGEN_PLAIN_ENUM_MAX( DiagIndex, 0))),
+ MaxColsAtCompileTime = 1,
+ MaskLvalueBit = is_lvalue<MatrixType>::value ? LvalueBit : 0,
+ Flags = (unsigned int)_MatrixTypeNested::Flags & (HereditaryBits | LinearAccessBit | MaskLvalueBit | DirectAccessBit) & ~RowMajorBit,
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost,
+ MatrixTypeOuterStride = outer_stride_at_compile_time<MatrixType>::ret,
+ InnerStrideAtCompileTime = MatrixTypeOuterStride == Dynamic ? Dynamic : MatrixTypeOuterStride+1,
+ OuterStrideAtCompileTime = 0
+ };
+};
+}
+
+template<typename MatrixType, int _DiagIndex> class Diagonal
+ : public internal::dense_xpr_base< Diagonal<MatrixType,_DiagIndex> >::type
+{
+ public:
+
+ enum { DiagIndex = _DiagIndex };
+ typedef typename internal::dense_xpr_base<Diagonal>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Diagonal)
+
+ EIGEN_DEVICE_FUNC
+ inline Diagonal(MatrixType& matrix, Index a_index = DiagIndex) : m_matrix(matrix), m_index(a_index) {}
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Diagonal)
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const
+ {
+ return m_index.value()<0 ? numext::mini(Index(m_matrix.cols()),Index(m_matrix.rows()+m_index.value()))
+ : numext::mini(Index(m_matrix.rows()),Index(m_matrix.cols()-m_index.value()));
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return 1; }
+
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const
+ {
+ return m_matrix.outerStride() + 1;
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const
+ {
+ return 0;
+ }
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<MatrixType>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue* data() { return &(m_matrix.const_cast_derived().coeffRef(rowOffset(), colOffset())); }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar* data() const { return &(m_matrix.const_cast_derived().coeffRef(rowOffset(), colOffset())); }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index row, Index)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
+ return m_matrix.const_cast_derived().coeffRef(row+rowOffset(), row+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index row, Index) const
+ {
+ return m_matrix.const_cast_derived().coeffRef(row+rowOffset(), row+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index row, Index) const
+ {
+ return m_matrix.coeff(row+rowOffset(), row+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index idx)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
+ return m_matrix.const_cast_derived().coeffRef(idx+rowOffset(), idx+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index idx) const
+ {
+ return m_matrix.const_cast_derived().coeffRef(idx+rowOffset(), idx+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index idx) const
+ {
+ return m_matrix.coeff(idx+rowOffset(), idx+colOffset());
+ }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() const
+ {
+ return m_matrix;
+ }
+
+ EIGEN_DEVICE_FUNC
+ int index() const
+ {
+ return m_index.value();
+ }
+
+ protected:
+ typename MatrixType::Nested m_matrix;
+ const internal::variable_if_dynamicindex<Index, DiagIndex> m_index;
+
+ private:
+ // some compilers may fail to optimize std::max etc in case of compile-time constants...
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index absDiagIndex() const { return m_index.value()>0 ? m_index.value() : -m_index.value(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rowOffset() const { return m_index.value()>0 ? 0 : -m_index.value(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index colOffset() const { return m_index.value()>0 ? m_index.value() : 0; }
+ // trigger a compile-time error if someone tries to call packet()
+ template<int LoadMode> typename MatrixType::PacketReturnType packet(Index) const;
+ template<int LoadMode> typename MatrixType::PacketReturnType packet(Index,Index) const;
+};
+
+/** \returns an expression of the main diagonal of the matrix \c *this
+ *
+ * \c *this is not required to be square.
+ *
+ * Example: \include MatrixBase_diagonal.cpp
+ * Output: \verbinclude MatrixBase_diagonal.out
+ *
+ * \sa class Diagonal */
+template<typename Derived>
+inline typename MatrixBase<Derived>::DiagonalReturnType
+MatrixBase<Derived>::diagonal()
+{
+ return derived();
+}
+
+/** This is the const version of diagonal(). */
+template<typename Derived>
+inline typename MatrixBase<Derived>::ConstDiagonalReturnType
+MatrixBase<Derived>::diagonal() const
+{
+ return ConstDiagonalReturnType(derived());
+}
+
+/** \returns an expression of the \a index-th sub or super diagonal of the matrix \c *this
+ *
+ * \c *this is not required to be square.
+ *
+ * The argument \a index represents a super diagonal if \a index > 0
+ * and a sub diagonal if \a index < 0; \a index == 0 is the main diagonal.
+ *
+ * Example: \include MatrixBase_diagonal_int.cpp
+ * Output: \verbinclude MatrixBase_diagonal_int.out
+ *
+ * \sa MatrixBase::diagonal(), class Diagonal */
+template<typename Derived>
+inline typename MatrixBase<Derived>::template DiagonalIndexReturnType<DynamicIndex>::Type
+MatrixBase<Derived>::diagonal(Index index)
+{
+ return typename DiagonalIndexReturnType<DynamicIndex>::Type(derived(), index);
+}
+
+/** This is the const version of diagonal(Index). */
+template<typename Derived>
+inline typename MatrixBase<Derived>::template ConstDiagonalIndexReturnType<DynamicIndex>::Type
+MatrixBase<Derived>::diagonal(Index index) const
+{
+ return typename ConstDiagonalIndexReturnType<DynamicIndex>::Type(derived(), index);
+}
+
+/** \returns an expression of the \a DiagIndex-th sub or super diagonal of the matrix \c *this
+ *
+ * \c *this is not required to be square.
+ *
+ * The template parameter \a DiagIndex represents a super diagonal if \a DiagIndex > 0
+ * and a sub diagonal otherwise. \a DiagIndex == 0 is equivalent to the main diagonal.
+ *
+ * Example: \include MatrixBase_diagonal_template_int.cpp
+ * Output: \verbinclude MatrixBase_diagonal_template_int.out
+ *
+ * \sa MatrixBase::diagonal(), class Diagonal */
+template<typename Derived>
+template<int Index>
+inline typename MatrixBase<Derived>::template DiagonalIndexReturnType<Index>::Type
+MatrixBase<Derived>::diagonal()
+{
+ return derived();
+}
+
+/** This is the const version of diagonal<int>(). */
+template<typename Derived>
+template<int Index>
+inline typename MatrixBase<Derived>::template ConstDiagonalIndexReturnType<Index>::Type
+MatrixBase<Derived>::diagonal() const
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_DIAGONAL_H
diff --git a/third_party/eigen3/Eigen/src/Core/DiagonalMatrix.h b/third_party/eigen3/Eigen/src/Core/DiagonalMatrix.h
new file mode 100644
index 0000000000..f7ac22f8b0
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/DiagonalMatrix.h
@@ -0,0 +1,346 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2007-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DIAGONALMATRIX_H
+#define EIGEN_DIAGONALMATRIX_H
+
+namespace Eigen {
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+template<typename Derived>
+class DiagonalBase : public EigenBase<Derived>
+{
+ public:
+ typedef typename internal::traits<Derived>::DiagonalVectorType DiagonalVectorType;
+ typedef typename DiagonalVectorType::Scalar Scalar;
+ typedef typename DiagonalVectorType::RealScalar RealScalar;
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+
+ enum {
+ RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ MaxRowsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
+ MaxColsAtCompileTime = DiagonalVectorType::MaxSizeAtCompileTime,
+ IsVectorAtCompileTime = 0,
+ Flags = 0
+ };
+
+ typedef Matrix<Scalar, RowsAtCompileTime, ColsAtCompileTime, 0, MaxRowsAtCompileTime, MaxColsAtCompileTime> DenseMatrixType;
+ typedef DenseMatrixType DenseType;
+ typedef DiagonalMatrix<Scalar,DiagonalVectorType::SizeAtCompileTime,DiagonalVectorType::MaxSizeAtCompileTime> PlainObject;
+
+ EIGEN_DEVICE_FUNC
+ inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
+ EIGEN_DEVICE_FUNC
+ inline Derived& derived() { return *static_cast<Derived*>(this); }
+
+ EIGEN_DEVICE_FUNC
+ DenseMatrixType toDenseMatrix() const { return derived(); }
+ template<typename DenseDerived>
+ EIGEN_DEVICE_FUNC
+ void evalTo(MatrixBase<DenseDerived> &other) const;
+ template<typename DenseDerived>
+ EIGEN_DEVICE_FUNC
+ void addTo(MatrixBase<DenseDerived> &other) const
+ { other.diagonal() += diagonal(); }
+ template<typename DenseDerived>
+ EIGEN_DEVICE_FUNC
+ void subTo(MatrixBase<DenseDerived> &other) const
+ { other.diagonal() -= diagonal(); }
+
+ EIGEN_DEVICE_FUNC
+ inline const DiagonalVectorType& diagonal() const { return derived().diagonal(); }
+ EIGEN_DEVICE_FUNC
+ inline DiagonalVectorType& diagonal() { return derived().diagonal(); }
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return diagonal().size(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return diagonal().size(); }
+
+ /** \returns the diagonal matrix product of \c *this by the matrix \a matrix.
+ */
+ template<typename MatrixDerived>
+ EIGEN_DEVICE_FUNC
+ const DiagonalProduct<MatrixDerived, Derived, OnTheLeft>
+ operator*(const MatrixBase<MatrixDerived> &matrix) const
+ {
+ return DiagonalProduct<MatrixDerived, Derived, OnTheLeft>(matrix.derived(), derived());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const DiagonalWrapper<const CwiseUnaryOp<internal::scalar_inverse_op<Scalar>, const DiagonalVectorType> >
+ inverse() const
+ {
+ return diagonal().cwiseInverse();
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const DiagonalWrapper<const CwiseUnaryOp<internal::scalar_multiple_op<Scalar>, const DiagonalVectorType> >
+ operator*(const Scalar& scalar) const
+ {
+ return diagonal() * scalar;
+ }
+ EIGEN_DEVICE_FUNC
+ friend inline const DiagonalWrapper<const CwiseUnaryOp<internal::scalar_multiple_op<Scalar>, const DiagonalVectorType> >
+ operator*(const Scalar& scalar, const DiagonalBase& other)
+ {
+ return other.diagonal() * scalar;
+ }
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ bool isApprox(const DiagonalBase<OtherDerived>& other, typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision()) const
+ {
+ return diagonal().isApprox(other.diagonal(), precision);
+ }
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ bool isApprox(const MatrixBase<OtherDerived>& other, typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision()) const
+ {
+ return toDenseMatrix().isApprox(other, precision);
+ }
+ #endif
+};
+
+template<typename Derived>
+template<typename DenseDerived>
+void DiagonalBase<Derived>::evalTo(MatrixBase<DenseDerived> &other) const
+{
+ other.setZero();
+ other.diagonal() = diagonal();
+}
+#endif
+
+/** \class DiagonalMatrix
+ * \ingroup Core_Module
+ *
+ * \brief Represents a diagonal matrix with its storage
+ *
+ * \param _Scalar the type of coefficients
+ * \param SizeAtCompileTime the dimension of the matrix, or Dynamic
+ * \param MaxSizeAtCompileTime the dimension of the matrix, or Dynamic. This parameter is optional and defaults
+ * to SizeAtCompileTime. Most of the time, you do not need to specify it.
+ *
+ * \sa class DiagonalWrapper
+ */
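+
+// A brief usage sketch (illustrative only):
+//   Eigen::DiagonalMatrix<double, 3> D(1.0, 2.0, 3.0);   // stores only the 3 diagonal coefficients
+//   Eigen::Matrix3d A = Eigen::Matrix3d::Random();
+//   Eigen::Matrix3d B = D * A;                            // scales the rows of A
+//   Eigen::Matrix3d C = A * D;                            // scales the columns of A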
+
+namespace internal {
+template<typename _Scalar, int SizeAtCompileTime, int MaxSizeAtCompileTime>
+struct traits<DiagonalMatrix<_Scalar,SizeAtCompileTime,MaxSizeAtCompileTime> >
+ : traits<Matrix<_Scalar,SizeAtCompileTime,SizeAtCompileTime,0,MaxSizeAtCompileTime,MaxSizeAtCompileTime> >
+{
+ typedef Matrix<_Scalar,SizeAtCompileTime,1,0,MaxSizeAtCompileTime,1> DiagonalVectorType;
+ typedef Dense StorageKind;
+ typedef DenseIndex Index;
+ enum {
+ Flags = LvalueBit
+ };
+};
+}
+template<typename _Scalar, int SizeAtCompileTime, int MaxSizeAtCompileTime>
+class DiagonalMatrix
+ : public DiagonalBase<DiagonalMatrix<_Scalar,SizeAtCompileTime,MaxSizeAtCompileTime> >
+{
+ public:
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef typename internal::traits<DiagonalMatrix>::DiagonalVectorType DiagonalVectorType;
+ typedef const DiagonalMatrix& Nested;
+ typedef _Scalar Scalar;
+ typedef typename internal::traits<DiagonalMatrix>::StorageKind StorageKind;
+ typedef typename internal::traits<DiagonalMatrix>::Index Index;
+ #endif
+
+ protected:
+
+ DiagonalVectorType m_diagonal;
+
+ public:
+
+ /** const version of diagonal(). */
+ EIGEN_DEVICE_FUNC
+ inline const DiagonalVectorType& diagonal() const { return m_diagonal; }
+ /** \returns a reference to the stored vector of diagonal coefficients. */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalVectorType& diagonal() { return m_diagonal; }
+
+ /** Default constructor without initialization */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalMatrix() {}
+
+ /** Constructs a diagonal matrix with given dimension */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalMatrix(Index dim) : m_diagonal(dim) {}
+
+ /** 2D constructor. */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalMatrix(const Scalar& x, const Scalar& y) : m_diagonal(x,y) {}
+
+ /** 3D constructor. */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalMatrix(const Scalar& x, const Scalar& y, const Scalar& z) : m_diagonal(x,y,z) {}
+
+ /** Copy constructor. */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ inline DiagonalMatrix(const DiagonalBase<OtherDerived>& other) : m_diagonal(other.diagonal()) {}
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** Copy constructor; prevents a default copy constructor from hiding the templated constructor above. */
+ inline DiagonalMatrix(const DiagonalMatrix& other) : m_diagonal(other.diagonal()) {}
+ #endif
+
+ /** generic constructor from expression of the diagonal coefficients */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ explicit inline DiagonalMatrix(const MatrixBase<OtherDerived>& other) : m_diagonal(other)
+ {}
+
+ /** Copy operator. */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ DiagonalMatrix& operator=(const DiagonalBase<OtherDerived>& other)
+ {
+ m_diagonal = other.diagonal();
+ return *this;
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ EIGEN_DEVICE_FUNC
+ DiagonalMatrix& operator=(const DiagonalMatrix& other)
+ {
+ m_diagonal = other.diagonal();
+ return *this;
+ }
+ #endif
+
+ /** Resizes to given size. */
+ EIGEN_DEVICE_FUNC
+ inline void resize(Index size) { m_diagonal.resize(size); }
+ /** Sets all coefficients to zero. */
+ EIGEN_DEVICE_FUNC
+ inline void setZero() { m_diagonal.setZero(); }
+ /** Resizes and sets all coefficients to zero. */
+ EIGEN_DEVICE_FUNC
+ inline void setZero(Index size) { m_diagonal.setZero(size); }
+ /** Sets this matrix to be the identity matrix of the current size. */
+ EIGEN_DEVICE_FUNC
+ inline void setIdentity() { m_diagonal.setOnes(); }
+ /** Sets this matrix to be the identity matrix of the given size. */
+ EIGEN_DEVICE_FUNC
+ inline void setIdentity(Index size) { m_diagonal.setOnes(size); }
+};
+
+/** \class DiagonalWrapper
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a diagonal matrix
+ *
+ * \param _DiagonalVectorType the type of the vector of diagonal coefficients
+ *
+ * This class is an expression of a diagonal matrix; it does not store its own vector of diagonal coefficients
+ * but instead wraps an existing vector expression. It is the return type of MatrixBase::asDiagonal()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa class DiagonalMatrix, class DiagonalBase, MatrixBase::asDiagonal()
+ */
+
+namespace internal {
+template<typename _DiagonalVectorType>
+struct traits<DiagonalWrapper<_DiagonalVectorType> >
+{
+ typedef _DiagonalVectorType DiagonalVectorType;
+ typedef typename DiagonalVectorType::Scalar Scalar;
+ typedef typename DiagonalVectorType::Index Index;
+ typedef typename DiagonalVectorType::StorageKind StorageKind;
+ enum {
+ RowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ ColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ MaxRowsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ MaxColsAtCompileTime = DiagonalVectorType::SizeAtCompileTime,
+ Flags = traits<DiagonalVectorType>::Flags & LvalueBit
+ };
+};
+}
+
+template<typename _DiagonalVectorType>
+class DiagonalWrapper
+ : public DiagonalBase<DiagonalWrapper<_DiagonalVectorType> >, internal::no_assignment_operator
+{
+ public:
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef _DiagonalVectorType DiagonalVectorType;
+ typedef DiagonalWrapper Nested;
+ #endif
+
+ /** Constructor from expression of diagonal coefficients to wrap. */
+ EIGEN_DEVICE_FUNC
+ inline DiagonalWrapper(DiagonalVectorType& a_diagonal) : m_diagonal(a_diagonal) {}
+
+ /** \returns a const reference to the wrapped expression of diagonal coefficients. */
+ EIGEN_DEVICE_FUNC
+ const DiagonalVectorType& diagonal() const { return m_diagonal; }
+
+ protected:
+ typename DiagonalVectorType::Nested m_diagonal;
+};
+
+/** \returns a pseudo-expression of a diagonal matrix with *this as vector of diagonal coefficients
+ *
+ * \only_for_vectors
+ *
+ * Example: \include MatrixBase_asDiagonal.cpp
+ * Output: \verbinclude MatrixBase_asDiagonal.out
+ *
+ * \sa class DiagonalWrapper, class DiagonalMatrix, diagonal(), isDiagonal()
+ **/
+template<typename Derived>
+inline const DiagonalWrapper<const Derived>
+MatrixBase<Derived>::asDiagonal() const
+{
+ return derived();
+}
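+
+// A short sketch (illustrative only): wrap a plain vector as a diagonal matrix on the fly,
+// without ever materializing the full 3x3 matrix.
+//   Eigen::Vector3d v(1.0, 2.0, 3.0);
+//   Eigen::Matrix3d A = Eigen::Matrix3d::Ones();
+//   Eigen::Matrix3d B = v.asDiagonal() * A;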
+
+/** \returns true if *this is approximately equal to a diagonal matrix,
+ * within the precision given by \a prec.
+ *
+ * Example: \include MatrixBase_isDiagonal.cpp
+ * Output: \verbinclude MatrixBase_isDiagonal.out
+ *
+ * \sa asDiagonal()
+ */
+template<typename Derived>
+bool MatrixBase<Derived>::isDiagonal(const RealScalar& prec) const
+{
+ using std::abs;
+ if(cols() != rows()) return false;
+ RealScalar maxAbsOnDiagonal = static_cast<RealScalar>(-1);
+ for(Index j = 0; j < cols(); ++j)
+ {
+ RealScalar absOnDiagonal = abs(coeff(j,j));
+ if(absOnDiagonal > maxAbsOnDiagonal) maxAbsOnDiagonal = absOnDiagonal;
+ }
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = 0; i < j; ++i)
+ {
+ if(!internal::isMuchSmallerThan(coeff(i, j), maxAbsOnDiagonal, prec)) return false;
+ if(!internal::isMuchSmallerThan(coeff(j, i), maxAbsOnDiagonal, prec)) return false;
+ }
+ return true;
+}
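+
+// A quick check (illustrative only):
+//   Eigen::Matrix3d D = Eigen::Vector3d(1.0, 2.0, 3.0).asDiagonal();
+//   bool ok = D.isDiagonal();   // true, up to the default precision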
+
+} // end namespace Eigen
+
+#endif // EIGEN_DIAGONALMATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/DiagonalProduct.h b/third_party/eigen3/Eigen/src/Core/DiagonalProduct.h
new file mode 100644
index 0000000000..c03a0c2e12
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/DiagonalProduct.h
@@ -0,0 +1,130 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2007-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DIAGONALPRODUCT_H
+#define EIGEN_DIAGONALPRODUCT_H
+
+namespace Eigen {
+
+namespace internal {
+template<typename MatrixType, typename DiagonalType, int ProductOrder>
+struct traits<DiagonalProduct<MatrixType, DiagonalType, ProductOrder> >
+ : traits<MatrixType>
+{
+ typedef typename scalar_product_traits<typename MatrixType::Scalar, typename DiagonalType::Scalar>::ReturnType Scalar;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+
+ _StorageOrder = MatrixType::Flags & RowMajorBit ? RowMajor : ColMajor,
+ _ScalarAccessOnDiag = !((int(_StorageOrder) == ColMajor && int(ProductOrder) == OnTheLeft)
+ ||(int(_StorageOrder) == RowMajor && int(ProductOrder) == OnTheRight)),
+ _SameTypes = is_same<typename MatrixType::Scalar, typename DiagonalType::Scalar>::value,
+ // FIXME currently we need same types, but in the future the next rule should be the one
+ //_Vectorizable = bool(int(MatrixType::Flags)&PacketAccessBit) && ((!_PacketOnDiag) || (_SameTypes && bool(int(DiagonalType::DiagonalVectorType::Flags)&PacketAccessBit))),
+ _Vectorizable = bool(int(MatrixType::Flags)&PacketAccessBit) && _SameTypes && (_ScalarAccessOnDiag || (bool(int(DiagonalType::DiagonalVectorType::Flags)&PacketAccessBit))),
+ _LinearAccessMask = (RowsAtCompileTime==1 || ColsAtCompileTime==1) ? LinearAccessBit : 0,
+
+ Flags = ((HereditaryBits|_LinearAccessMask) & (unsigned int)(MatrixType::Flags)) | (_Vectorizable ? PacketAccessBit : 0) | AlignedBit,//(int(MatrixType::Flags)&int(DiagonalType::DiagonalVectorType::Flags)&AlignedBit),
+ CoeffReadCost = NumTraits<Scalar>::MulCost + MatrixType::CoeffReadCost + DiagonalType::DiagonalVectorType::CoeffReadCost
+ };
+};
+}
+
+template<typename MatrixType, typename DiagonalType, int ProductOrder>
+class DiagonalProduct : internal::no_assignment_operator,
+ public MatrixBase<DiagonalProduct<MatrixType, DiagonalType, ProductOrder> >
+{
+ public:
+
+ typedef MatrixBase<DiagonalProduct> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(DiagonalProduct)
+
+ inline DiagonalProduct(const MatrixType& matrix, const DiagonalType& diagonal)
+ : m_matrix(matrix), m_diagonal(diagonal)
+ {
+ eigen_assert(diagonal.diagonal().size() == (ProductOrder == OnTheLeft ? matrix.rows() : matrix.cols()));
+ }
+
+ EIGEN_STRONG_INLINE Index rows() const { return m_matrix.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return m_matrix.cols(); }
+
+ EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
+ {
+ return m_diagonal.diagonal().coeff(ProductOrder == OnTheLeft ? row : col) * m_matrix.coeff(row, col);
+ }
+
+ EIGEN_STRONG_INLINE const Scalar coeff(Index idx) const
+ {
+ enum {
+ StorageOrder = int(MatrixType::Flags) & RowMajorBit ? RowMajor : ColMajor
+ };
+ return coeff(int(StorageOrder)==ColMajor?idx:0,int(StorageOrder)==ColMajor?0:idx);
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index row, Index col) const
+ {
+ enum {
+ StorageOrder = Flags & RowMajorBit ? RowMajor : ColMajor
+ };
+ const Index indexInDiagonalVector = ProductOrder == OnTheLeft ? row : col;
+ return packet_impl<LoadMode>(row,col,indexInDiagonalVector,typename internal::conditional<
+ ((int(StorageOrder) == RowMajor && int(ProductOrder) == OnTheLeft)
+ ||(int(StorageOrder) == ColMajor && int(ProductOrder) == OnTheRight)), internal::true_type, internal::false_type>::type());
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index idx) const
+ {
+ enum {
+ StorageOrder = int(MatrixType::Flags) & RowMajorBit ? RowMajor : ColMajor
+ };
+ return packet<LoadMode>(int(StorageOrder)==ColMajor?idx:0,int(StorageOrder)==ColMajor?0:idx);
+ }
+
+ protected:
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet_impl(Index row, Index col, Index id, internal::true_type) const
+ {
+ return internal::pmul(m_matrix.template packet<LoadMode>(row, col),
+ internal::pset1<PacketScalar>(m_diagonal.diagonal().coeff(id)));
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet_impl(Index row, Index col, Index id, internal::false_type) const
+ {
+ enum {
+ InnerSize = (MatrixType::Flags & RowMajorBit) ? MatrixType::ColsAtCompileTime : MatrixType::RowsAtCompileTime,
+ DiagonalVectorPacketLoadMode = (LoadMode == Aligned && (((InnerSize%16) == 0) || (int(DiagonalType::DiagonalVectorType::Flags)&AlignedBit)==AlignedBit) ? Aligned : Unaligned)
+ };
+ return internal::pmul(m_matrix.template packet<LoadMode>(row, col),
+ m_diagonal.diagonal().template packet<DiagonalVectorPacketLoadMode>(id));
+ }
+
+ typename MatrixType::Nested m_matrix;
+ typename DiagonalType::Nested m_diagonal;
+};
+
+/** \returns the diagonal matrix product of \c *this by the diagonal matrix \a diagonal.
+ */
+template<typename Derived>
+template<typename DiagonalDerived>
+inline const DiagonalProduct<Derived, DiagonalDerived, OnTheRight>
+MatrixBase<Derived>::operator*(const DiagonalBase<DiagonalDerived> &a_diagonal) const
+{
+ return DiagonalProduct<Derived, DiagonalDerived, OnTheRight>(derived(), a_diagonal.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_DIAGONALPRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/Core/Dot.h b/third_party/eigen3/Eigen/src/Core/Dot.h
new file mode 100644
index 0000000000..718de5d1af
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Dot.h
@@ -0,0 +1,270 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008, 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DOT_H
+#define EIGEN_DOT_H
+
+namespace Eigen {
+
+namespace internal {
+
+// helper function for dot(). The problem is that if we put that in the body of dot(), then upon calling dot
+// with mismatched types, the compiler emits errors about failing to instantiate cwiseProduct BEFORE
+// looking at the static assertions. Thus this is a trick to get better compile errors.
+template<typename T, typename U,
+// the NeedToTranspose condition here is taken straight from Assign.h
+ bool NeedToTranspose = T::IsVectorAtCompileTime
+ && U::IsVectorAtCompileTime
+ && ((int(T::RowsAtCompileTime) == 1 && int(U::ColsAtCompileTime) == 1)
+ | // FIXME | instead of || to please GCC 4.4.0 stupid warning "suggest parentheses around &&".
+ // revert to || as soon as not needed anymore.
+ (int(T::ColsAtCompileTime) == 1 && int(U::RowsAtCompileTime) == 1))
+>
+struct dot_nocheck
+{
+ typedef typename scalar_product_traits<typename traits<T>::Scalar,typename traits<U>::Scalar>::ReturnType ResScalar;
+ EIGEN_DEVICE_FUNC
+ static inline ResScalar run(const MatrixBase<T>& a, const MatrixBase<U>& b)
+ {
+ return a.template binaryExpr<scalar_conj_product_op<typename traits<T>::Scalar,typename traits<U>::Scalar> >(b).sum();
+ }
+};
+
+template<typename T, typename U>
+struct dot_nocheck<T, U, true>
+{
+ typedef typename scalar_product_traits<typename traits<T>::Scalar,typename traits<U>::Scalar>::ReturnType ResScalar;
+ EIGEN_DEVICE_FUNC
+ static inline ResScalar run(const MatrixBase<T>& a, const MatrixBase<U>& b)
+ {
+ return a.transpose().template binaryExpr<scalar_conj_product_op<typename traits<T>::Scalar,typename traits<U>::Scalar> >(b).sum();
+ }
+};
+
+} // end namespace internal
+
+/** \returns the dot product of *this with other.
+ *
+ * \only_for_vectors
+ *
+ * \note If the scalar type is complex numbers, then this function returns the hermitian
+ * (sesquilinear) dot product, conjugate-linear in the first variable and linear in the
+ * second variable.
+ *
+ * \sa squaredNorm(), norm()
+ */
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+typename internal::scalar_product_traits<typename internal::traits<Derived>::Scalar,typename internal::traits<OtherDerived>::Scalar>::ReturnType
+MatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(Derived,OtherDerived)
+ typedef internal::scalar_conj_product_op<Scalar,typename OtherDerived::Scalar> func;
+ EIGEN_CHECK_BINARY_COMPATIBILIY(func,Scalar,typename OtherDerived::Scalar);
+
+ eigen_assert(size() == other.size());
+
+ return internal::dot_nocheck<Derived,OtherDerived>::run(*this, other);
+}
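+
+// A small sketch of the Hermitian convention (illustrative only): the first argument is conjugated.
+//   Eigen::Vector2cd a(std::complex<double>(0.0, 1.0), std::complex<double>(1.0, 0.0));
+//   Eigen::Vector2cd b(std::complex<double>(1.0, 0.0), std::complex<double>(0.0, 1.0));
+//   std::complex<double> s = a.dot(b);   // conj(i)*1 + conj(1)*i == -i + i == 0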
+
+#ifdef EIGEN2_SUPPORT
+/** \returns the dot product of *this with other, with the Eigen2 convention that the dot product is linear in the first variable
+ * (conjugating the second variable). Of course this only makes a difference in the complex case.
+ *
+ * This method is only available in EIGEN2_SUPPORT mode.
+ *
+ * \only_for_vectors
+ *
+ * \sa dot()
+ */
+template<typename Derived>
+template<typename OtherDerived>
+typename internal::traits<Derived>::Scalar
+MatrixBase<Derived>::eigen2_dot(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(Derived,OtherDerived)
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ eigen_assert(size() == other.size());
+
+ return internal::dot_nocheck<OtherDerived,Derived>::run(other,*this);
+}
+#endif
+
+
+//---------- implementation of L2 norm and related functions ----------
+
+/** \returns, for vectors, the squared \em l2 norm of \c *this, and for matrices the Frobenius norm.
+ * In both cases, it is the sum of the squares of all the matrix entries.
+ * For vectors, this also equals the dot product of \c *this with itself.
+ *
+ * \sa dot(), norm()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::squaredNorm() const
+{
+ return numext::real((*this).cwiseAbs2().sum());
+}
+
+/** \returns, for vectors, the \em l2 norm of \c *this, and for matrices the Frobenius norm.
+ * In both cases, it is the square root of the sum of the squares of all the matrix entries.
+ * For vectors, this also equals the square root of the dot product of \c *this with itself.
+ *
+ * \sa dot(), squaredNorm()
+ */
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::norm() const
+{
+ using std::sqrt;
+ return sqrt(squaredNorm());
+}
+
+/** \returns an expression of the quotient of *this by its own norm.
+ *
+ * \only_for_vectors
+ *
+ * \sa norm(), normalize()
+ */
+template<typename Derived>
+inline const typename MatrixBase<Derived>::PlainObject
+MatrixBase<Derived>::normalized() const
+{
+ typedef typename internal::nested<Derived>::type Nested;
+ typedef typename internal::remove_reference<Nested>::type _Nested;
+ _Nested n(derived());
+ return n / n.norm();
+}
+
+/** Normalizes the vector, i.e. divides it by its own norm.
+ *
+ * \only_for_vectors
+ *
+ * \sa norm(), normalized()
+ */
+template<typename Derived>
+inline void MatrixBase<Derived>::normalize()
+{
+ *this /= norm();
+}
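+
+// The l2-norm family in one sketch (illustrative only):
+//   Eigen::Vector3d v(3.0, 0.0, 4.0);
+//   double n2 = v.squaredNorm();           // 25
+//   double n  = v.norm();                  // 5
+//   Eigen::Vector3d u = v.normalized();    // (0.6, 0, 0.8); v itself is left unchanged
+//   v.normalize();                         // now v has unit norm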
+
+//---------- implementation of other norms ----------
+
+namespace internal {
+
+template<typename Derived, int p>
+struct lpNorm_selector
+{
+ typedef typename NumTraits<typename traits<Derived>::Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const MatrixBase<Derived>& m)
+ {
+ using std::pow;
+ return pow(m.cwiseAbs().array().pow(p).sum(), RealScalar(1)/p);
+ }
+};
+
+template<typename Derived>
+struct lpNorm_selector<Derived, 1>
+{
+ EIGEN_DEVICE_FUNC
+ static inline typename NumTraits<typename traits<Derived>::Scalar>::Real run(const MatrixBase<Derived>& m)
+ {
+ return m.cwiseAbs().sum();
+ }
+};
+
+template<typename Derived>
+struct lpNorm_selector<Derived, 2>
+{
+ EIGEN_DEVICE_FUNC
+ static inline typename NumTraits<typename traits<Derived>::Scalar>::Real run(const MatrixBase<Derived>& m)
+ {
+ return m.norm();
+ }
+};
+
+template<typename Derived>
+struct lpNorm_selector<Derived, Infinity>
+{
+ EIGEN_DEVICE_FUNC
+ static inline typename NumTraits<typename traits<Derived>::Scalar>::Real run(const MatrixBase<Derived>& m)
+ {
+ return m.cwiseAbs().maxCoeff();
+ }
+};
+
+} // end namespace internal
+
+/** \returns the \f$ \ell^p \f$ norm of *this, that is, returns the p-th root of the sum of the p-th powers of the absolute values
+ * of the coefficients of *this. If \a p is the special value \a Eigen::Infinity, this function returns the \f$ \ell^\infty \f$
+ * norm, that is the maximum of the absolute values of the coefficients of *this.
+ *
+ * \sa norm()
+ */
+template<typename Derived>
+template<int p>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+MatrixBase<Derived>::lpNorm() const
+{
+ return internal::lpNorm_selector<Derived, p>::run(*this);
+}
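+
+// For example (illustrative only):
+//   Eigen::Vector3d v(1.0, -2.0, 3.0);
+//   double l1   = v.lpNorm<1>();                // 6, the sum of absolute values
+//   double l2   = v.lpNorm<2>();                // same as v.norm()
+//   double linf = v.lpNorm<Eigen::Infinity>();  // 3, the largest absolute value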
+
+//---------- implementation of isOrthogonal / isUnitary ----------
+
+/** \returns true if *this is approximately orthogonal to \a other,
+ * within the precision given by \a prec.
+ *
+ * Example: \include MatrixBase_isOrthogonal.cpp
+ * Output: \verbinclude MatrixBase_isOrthogonal.out
+ */
+template<typename Derived>
+template<typename OtherDerived>
+bool MatrixBase<Derived>::isOrthogonal
+(const MatrixBase<OtherDerived>& other, const RealScalar& prec) const
+{
+ typename internal::nested<Derived,2>::type nested(derived());
+ typename internal::nested<OtherDerived,2>::type otherNested(other.derived());
+ return numext::abs2(nested.dot(otherNested)) <= prec * prec * nested.squaredNorm() * otherNested.squaredNorm();
+}
+
+/** \returns true if *this is approximately a unitary matrix,
+ * within the precision given by \a prec. When the \a Scalar
+ * type is real, a unitary matrix is an orthogonal matrix, whence the name.
+ *
+ * \note This can be used to check whether a family of vectors forms an orthonormal basis.
+ * Indeed, \c m.isUnitary() returns true if and only if the columns (equivalently, the rows) of m form an
+ * orthonormal basis.
+ *
+ * Example: \include MatrixBase_isUnitary.cpp
+ * Output: \verbinclude MatrixBase_isUnitary.out
+ */
+template<typename Derived>
+bool MatrixBase<Derived>::isUnitary(const RealScalar& prec) const
+{
+ typename Derived::Nested nested(derived());
+ for(Index i = 0; i < cols(); ++i)
+ {
+ if(!internal::isApprox(nested.col(i).squaredNorm(), static_cast<RealScalar>(1), prec))
+ return false;
+ for(Index j = 0; j < i; ++j)
+ if(!internal::isMuchSmallerThan(nested.col(i).dot(nested.col(j)), static_cast<Scalar>(1), prec))
+ return false;
+ }
+ return true;
+}
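+
+// A short sketch covering both predicates (illustrative only):
+//   Eigen::Vector3d x = Eigen::Vector3d::UnitX();
+//   Eigen::Vector3d y = Eigen::Vector3d::UnitY();
+//   bool o = x.isOrthogonal(y);                        // true
+//   bool u = Eigen::Matrix3d::Identity().isUnitary();  // true: the columns form an orthonormal basis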
+
+} // end namespace Eigen
+
+#endif // EIGEN_DOT_H
diff --git a/third_party/eigen3/Eigen/src/Core/EigenBase.h b/third_party/eigen3/Eigen/src/Core/EigenBase.h
new file mode 100644
index 0000000000..1a577c2dce
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/EigenBase.h
@@ -0,0 +1,146 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_EIGENBASE_H
+#define EIGEN_EIGENBASE_H
+
+namespace Eigen {
+
+/** Common base class for all classes T such that MatrixBase has an operator=(T) and a constructor MatrixBase(T).
+ *
+ * In other words, an EigenBase object is an object that can be copied into a MatrixBase.
+ *
+ * Besides MatrixBase-derived classes, this also includes special matrix classes such as diagonal matrices, etc.
+ *
+ * Notice that this class is trivial; it is only used to disambiguate overloaded functions.
+ *
+ * \sa \ref TopicClassHierarchy
+ */
+template<typename Derived> struct EigenBase
+{
+// typedef typename internal::plain_matrix_type<Derived>::type PlainObject;
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+
+ /** \returns a reference to the derived object */
+ EIGEN_DEVICE_FUNC
+ Derived& derived() { return *static_cast<Derived*>(this); }
+ /** \returns a const reference to the derived object */
+ EIGEN_DEVICE_FUNC
+ const Derived& derived() const { return *static_cast<const Derived*>(this); }
+
+ EIGEN_DEVICE_FUNC
+ inline Derived& const_cast_derived() const
+ { return *static_cast<Derived*>(const_cast<EigenBase*>(this)); }
+ EIGEN_DEVICE_FUNC
+ inline const Derived& const_derived() const
+ { return *static_cast<const Derived*>(this); }
+
+ /** \returns the number of rows. \sa cols(), RowsAtCompileTime */
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return derived().rows(); }
+ /** \returns the number of columns. \sa rows(), ColsAtCompileTime*/
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return derived().cols(); }
+ /** \returns the number of coefficients, which is rows()*cols().
+ * \sa rows(), cols(), SizeAtCompileTime. */
+ EIGEN_DEVICE_FUNC
+ inline Index size() const { return rows() * cols(); }
+
+ /** \internal Don't use it, but do the equivalent: \code dst = *this; \endcode */
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void evalTo(Dest& dst) const
+ { derived().evalTo(dst); }
+
+ /** \internal Don't use it, but do the equivalent: \code dst += *this; \endcode */
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void addTo(Dest& dst) const
+ {
+ // This is the default implementation,
+ // derived class can reimplement it in a more optimized way.
+ typename Dest::PlainObject res(rows(),cols());
+ evalTo(res);
+ dst += res;
+ }
+
+ /** \internal Don't use it, but do the equivalent: \code dst -= *this; \endcode */
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void subTo(Dest& dst) const
+ {
+ // This is the default implementation,
+ // derived class can reimplement it in a more optimized way.
+ typename Dest::PlainObject res(rows(),cols());
+ evalTo(res);
+ dst -= res;
+ }
+
+ /** \internal Don't use it, but do the equivalent: \code dst.applyOnTheRight(*this); \endcode */
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC inline void applyThisOnTheRight(Dest& dst) const
+ {
+ // This is the default implementation,
+ // derived class can reimplement it in a more optimized way.
+ dst = dst * this->derived();
+ }
+
+ /** \internal Don't use it, but do the equivalent: \code dst.applyOnTheLeft(*this); \endcode */
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC inline void applyThisOnTheLeft(Dest& dst) const
+ {
+ // This is the default implementation,
+ // derived class can reimplement it in a more optimized way.
+ dst = this->derived() * dst;
+ }
+
+};
+
+/***************************************************************************
+* Implementation of matrix base methods
+***************************************************************************/
+
+/** \brief Copies the generic expression \a other into *this.
+ *
+ * \details The expression must provide a (templated) evalTo(Derived& dst) const
+ * function which does the actual job. In practice, this allows any user to write
+ * their own special matrix class without having to modify MatrixBase.
+ *
+ * \returns a reference to *this.
+ */
+template<typename Derived>
+template<typename OtherDerived>
+Derived& DenseBase<Derived>::operator=(const EigenBase<OtherDerived> &other)
+{
+ other.derived().evalTo(derived());
+ return derived();
+}
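+
+// A minimal illustration (a DiagonalMatrix is an EigenBase but not a MatrixBase):
+//   Eigen::DiagonalMatrix<double, 3> D(1.0, 2.0, 3.0);
+//   Eigen::Matrix3d M;
+//   M  = D;   // dispatches to D.evalTo(M): zeroes M, then fills its diagonal
+//   M += D;   // dispatches to D.addTo(M)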
+
+template<typename Derived>
+template<typename OtherDerived>
+Derived& DenseBase<Derived>::operator+=(const EigenBase<OtherDerived> &other)
+{
+ other.derived().addTo(derived());
+ return derived();
+}
+
+template<typename Derived>
+template<typename OtherDerived>
+Derived& DenseBase<Derived>::operator-=(const EigenBase<OtherDerived> &other)
+{
+ other.derived().subTo(derived());
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_EIGENBASE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Flagged.h b/third_party/eigen3/Eigen/src/Core/Flagged.h
new file mode 100644
index 0000000000..1f2955fc1d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Flagged.h
@@ -0,0 +1,140 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FLAGGED_H
+#define EIGEN_FLAGGED_H
+
+namespace Eigen {
+
+/** \class Flagged
+ * \ingroup Core_Module
+ *
+ * \brief Expression with modified flags
+ *
+ * \param ExpressionType the type of the object whose flags we are modifying
+ * \param Added the flags added to the expression
+ * \param Removed the flags removed from the expression (has priority over Added).
+ *
+ * This class represents an expression whose flags have been modified.
+ * It is the return type of MatrixBase::flagged()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::flagged()
+ */
+
+namespace internal {
+template<typename ExpressionType, unsigned int Added, unsigned int Removed>
+struct traits<Flagged<ExpressionType, Added, Removed> > : traits<ExpressionType>
+{
+ enum { Flags = (ExpressionType::Flags | Added) & ~Removed };
+};
+}
+
+template<typename ExpressionType, unsigned int Added, unsigned int Removed> class Flagged
+ : public MatrixBase<Flagged<ExpressionType, Added, Removed> >
+{
+ public:
+
+ typedef MatrixBase<Flagged> Base;
+
+ EIGEN_DENSE_PUBLIC_INTERFACE(Flagged)
+ typedef typename internal::conditional<internal::must_nest_by_value<ExpressionType>::ret,
+ ExpressionType, const ExpressionType&>::type ExpressionTypeNested;
+ typedef typename ExpressionType::InnerIterator InnerIterator;
+
+ inline Flagged(const ExpressionType& matrix) : m_matrix(matrix) {}
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+ inline Index outerStride() const { return m_matrix.outerStride(); }
+ inline Index innerStride() const { return m_matrix.innerStride(); }
+
+ inline CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_matrix.coeff(row, col);
+ }
+
+ inline CoeffReturnType coeff(Index index) const
+ {
+ return m_matrix.coeff(index);
+ }
+
+ inline const Scalar& coeffRef(Index row, Index col) const
+ {
+ return m_matrix.const_cast_derived().coeffRef(row, col);
+ }
+
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return m_matrix.const_cast_derived().coeffRef(index);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ return m_matrix.const_cast_derived().coeffRef(row, col);
+ }
+
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_matrix.const_cast_derived().coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index row, Index col) const
+ {
+ return m_matrix.template packet<LoadMode>(row, col);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_matrix.const_cast_derived().template writePacket<LoadMode>(row, col, x);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return m_matrix.template packet<LoadMode>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& x)
+ {
+ m_matrix.const_cast_derived().template writePacket<LoadMode>(index, x);
+ }
+
+ const ExpressionType& _expression() const { return m_matrix; }
+
+ template<typename OtherDerived>
+ typename ExpressionType::PlainObject solveTriangular(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived>
+ void solveTriangularInPlace(const MatrixBase<OtherDerived>& other) const;
+
+ protected:
+ ExpressionTypeNested m_matrix;
+};
+
+/** \returns an expression of *this with added and removed flags
+ *
+ * This is mostly for internal use.
+ *
+ * \sa class Flagged
+ */
+template<typename Derived>
+template<unsigned int Added,unsigned int Removed>
+inline const Flagged<Derived, Added, Removed>
+DenseBase<Derived>::flagged() const
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_FLAGGED_H
diff --git a/third_party/eigen3/Eigen/src/Core/ForceAlignedAccess.h b/third_party/eigen3/Eigen/src/Core/ForceAlignedAccess.h
new file mode 100644
index 0000000000..807c7a2934
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ForceAlignedAccess.h
@@ -0,0 +1,146 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FORCEALIGNEDACCESS_H
+#define EIGEN_FORCEALIGNEDACCESS_H
+
+namespace Eigen {
+
+/** \class ForceAlignedAccess
+ * \ingroup Core_Module
+ *
+ * \brief Enforce aligned packet loads and stores regardless of what is requested
+ *
+ * \param ExpressionType the type of the object for which we are forcing aligned packet access
+ *
+ * This class is the return type of MatrixBase::forceAlignedAccess()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::forceAlignedAccess()
+ */
+
+namespace internal {
+template<typename ExpressionType>
+struct traits<ForceAlignedAccess<ExpressionType> > : public traits<ExpressionType>
+{};
+}
+
+template<typename ExpressionType> class ForceAlignedAccess
+ : public internal::dense_xpr_base< ForceAlignedAccess<ExpressionType> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<ForceAlignedAccess>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(ForceAlignedAccess)
+
+ inline ForceAlignedAccess(const ExpressionType& matrix) : m_expression(matrix) {}
+
+ inline Index rows() const { return m_expression.rows(); }
+ inline Index cols() const { return m_expression.cols(); }
+ inline Index outerStride() const { return m_expression.outerStride(); }
+ inline Index innerStride() const { return m_expression.innerStride(); }
+
+ inline const CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_expression.coeff(row, col);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ return m_expression.const_cast_derived().coeffRef(row, col);
+ }
+
+ inline const CoeffReturnType coeff(Index index) const
+ {
+ return m_expression.coeff(index);
+ }
+
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index row, Index col) const
+ {
+ return m_expression.template packet<Aligned>(row, col);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_expression.const_cast_derived().template writePacket<Aligned>(row, col, x);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return m_expression.template packet<Aligned>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& x)
+ {
+ m_expression.const_cast_derived().template writePacket<Aligned>(index, x);
+ }
+
+ operator const ExpressionType&() const { return m_expression; }
+
+ protected:
+ const ExpressionType& m_expression;
+
+ private:
+ ForceAlignedAccess& operator=(const ForceAlignedAccess&);
+};
+
+/** \returns an expression of *this with forced aligned access
+ * \sa forceAlignedAccessIf(), class ForceAlignedAccess
+ */
+template<typename Derived>
+inline const ForceAlignedAccess<Derived>
+MatrixBase<Derived>::forceAlignedAccess() const
+{
+ return ForceAlignedAccess<Derived>(derived());
+}
+
+/** \returns an expression of *this with forced aligned access
+ * \sa forceAlignedAccessIf(), class ForceAlignedAccess
+ */
+template<typename Derived>
+inline ForceAlignedAccess<Derived>
+MatrixBase<Derived>::forceAlignedAccess()
+{
+ return ForceAlignedAccess<Derived>(derived());
+}
+
+/** \returns an expression of *this with forced aligned access if \a Enable is true.
+ * \sa forceAlignedAccess(), class ForceAlignedAccess
+ */
+template<typename Derived>
+template<bool Enable>
+inline typename internal::add_const_on_value_type<typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type>::type
+MatrixBase<Derived>::forceAlignedAccessIf() const
+{
+ return derived();
+}
+
+/** \returns an expression of *this with forced aligned access if \a Enable is true.
+ * \sa forceAlignedAccess(), class ForceAlignedAccess
+ */
+template<typename Derived>
+template<bool Enable>
+inline typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type
+MatrixBase<Derived>::forceAlignedAccessIf()
+{
+ return derived();
+}
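+// Illustrative sketch (not upstream Eigen documentation): forceAlignedAccessIf<Enable>()
+// is mainly useful in templated code where alignment is only known at compile time, e.g.
+//   template<int Alignment, typename Derived>
+//   void scale2(Eigen::MatrixBase<Derived>& m)
+//   { m.template forceAlignedAccessIf<Alignment==Eigen::Aligned>() *= 2; }
+// When Enable is false the call simply returns the expression itself.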
+
+} // end namespace Eigen
+
+#endif // EIGEN_FORCEALIGNEDACCESS_H
diff --git a/third_party/eigen3/Eigen/src/Core/Functors.h b/third_party/eigen3/Eigen/src/Core/Functors.h
new file mode 100644
index 0000000000..0a45fa31a9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Functors.h
@@ -0,0 +1,1020 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FUNCTORS_H
+#define EIGEN_FUNCTORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+// associative functors:
+
+/** \internal
+ * \brief Template functor to compute the sum of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, MatrixBase::sum()
+ */
+template<typename Scalar> struct scalar_sum_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return a + b; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::padd(a,b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sum_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAdd
+ };
+};
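+// Illustrative sketch (not upstream Eigen documentation): functors such as
+// scalar_sum_op are what CwiseBinaryOp evaluates coefficient-wise; e.g.
+//   Eigen::ArrayXf c = a + b;   // a, b: ArrayXf of equal size
+// builds a CwiseBinaryOp<scalar_sum_op<float>, ...> whose operator() and
+// packetOp() produce the per-coefficient and per-packet sums respectively.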
+
+/** \internal
+ * \brief Template functor to compute the product of two scalars
+ *
+ * \sa class CwiseBinaryOp, Cwise::operator*(), class VectorwiseOp, MatrixBase::redux()
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_product_op {
+ enum {
+ // TODO vectorize mixed product
+ Vectorizable = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasMul && packet_traits<RhsScalar>::HasMul
+ };
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op)
+ EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a * b; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmul(a,b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
+ { return internal::predux_mul(a); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_product_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = (NumTraits<LhsScalar>::MulCost + NumTraits<RhsScalar>::MulCost)/2, // rough estimate!
+ PacketAccess = scalar_product_op<LhsScalar,RhsScalar>::Vectorizable
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the conjugate product of two scalars
+ *
+ * This is a shortcut for conj(x) * y, which is needed for optimization purposes; in Eigen2 support mode, this becomes x * conj(y)
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_conj_product_op {
+
+ enum {
+ Conj = NumTraits<LhsScalar>::IsComplex
+ };
+
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_conj_product_op)
+ EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const
+ { return conj_helper<LhsScalar,RhsScalar,Conj,false>().pmul(a,b); }
+
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return conj_helper<Packet,Packet,Conj,false>().pmul(a,b); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_conj_product_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = NumTraits<LhsScalar>::MulCost,
+ PacketAccess = internal::is_same<LhsScalar, RhsScalar>::value && packet_traits<LhsScalar>::HasMul
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the min of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::cwiseMin, class VectorwiseOp, MatrixBase::minCoeff()
+ */
+template<typename Scalar> struct scalar_min_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_min_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { using std::min; return (min)(a, b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmin(a,b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux_min(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_min_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasMin
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the max of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::cwiseMax, class VectorwiseOp, MatrixBase::maxCoeff()
+ */
+template<typename Scalar> struct scalar_max_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_max_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { using std::max; return (max)(a, b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmax(a,b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux_max(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_max_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasMax
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the hypot of two scalars
+ *
+ * \sa MatrixBase::stableNorm(), class Redux
+ */
+template<typename Scalar> struct scalar_hypot_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_hypot_op)
+// typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& _x, const Scalar& _y) const
+ {
+ using std::max;
+ using std::min;
+ using std::sqrt;
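+ // Rescaling trick (editorial note): with p = max and q = min (non-negative in
+ // the intended stableNorm() usage), p * sqrt(1 + (q/p)^2) equals sqrt(p^2 + q^2)
+ // but avoids the overflow/underflow of the naive formula.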
+ Scalar p = (max)(_x, _y);
+ Scalar q = (min)(_x, _y);
+ Scalar qp = q/p;
+ return p * sqrt(Scalar(1) + qp*qp);
+ }
+};
+template<typename Scalar>
+struct functor_traits<scalar_hypot_op<Scalar> > {
+ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess=0 };
+};
+
+/** \internal
+ * \brief Template functor to compute the pow of two scalars
+ */
+template<typename Scalar, typename OtherScalar> struct scalar_binary_pow_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_binary_pow_op)
+ inline Scalar operator() (const Scalar& a, const OtherScalar& b) const { return numext::pow(a, b); }
+};
+template<typename Scalar, typename OtherScalar>
+struct functor_traits<scalar_binary_pow_op<Scalar,OtherScalar> > {
+ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false };
+};
+
+// other binary functors:
+
+/** \internal
+ * \brief Template functor to compute the difference of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::operator-
+ */
+template<typename Scalar> struct scalar_difference_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return a - b; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::psub(a,b); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_difference_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasSub
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the quotient of two scalars
+ *
+ * \sa class CwiseBinaryOp, Cwise::operator/()
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_quotient_op {
+ enum {
+ // TODO vectorize mixed product
+ Vectorizable = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasDiv && packet_traits<RhsScalar>::HasDiv
+ };
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op)
+ EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a / b; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pdiv(a,b); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_quotient_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = (NumTraits<LhsScalar>::MulCost + NumTraits<RhsScalar>::MulCost), // rough estimate!
+ PacketAccess = scalar_quotient_op<LhsScalar,RhsScalar>::Vectorizable
+ };
+};
+
+
+
+/** \internal
+ * \brief Template functor to compute the and of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator&&
+ */
+struct scalar_boolean_and_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_and_op)
+ EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a && b; }
+};
+template<> struct functor_traits<scalar_boolean_and_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the or of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator||
+ */
+struct scalar_boolean_or_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_or_op)
+ EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a || b; }
+};
+template<> struct functor_traits<scalar_boolean_or_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the xor of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator^
+ */
+struct scalar_boolean_xor_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_xor_op)
+ EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a ^ b; }
+};
+template<> struct functor_traits<scalar_boolean_xor_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+// unary functors:
+
+/** \internal
+ * \brief Template functor to compute the opposite of a scalar
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator-
+ */
+template<typename Scalar> struct scalar_opposite_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_opposite_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return -a; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pnegate(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_opposite_op<Scalar> >
+{ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasNegate };
+};
+
+/** \internal
+ * \brief Template functor to compute the absolute value of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::abs
+ */
+template<typename Scalar> struct scalar_abs_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_abs_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { using std::abs; return abs(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pabs(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_abs_op<Scalar> >
+{
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAbs
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the squared absolute value of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::abs2
+ */
+template<typename Scalar> struct scalar_abs2_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_abs2_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::abs2(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_abs2_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasAbs2 }; };
+
+/** \internal
+ * \brief Template functor to compute the conjugate of a complex value
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::conjugate()
+ */
+template<typename Scalar> struct scalar_conjugate_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_conjugate_op)
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::conj; return conj(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pconj(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_conjugate_op<Scalar> >
+{
+ enum {
+ Cost = NumTraits<Scalar>::IsComplex ? NumTraits<Scalar>::AddCost : 0,
+ PacketAccess = packet_traits<Scalar>::HasConj
+ };
+};
+
+/** \internal
+ * \brief Template functor to cast a scalar to another type
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::cast()
+ */
+template<typename Scalar, typename NewType>
+struct scalar_cast_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
+ typedef NewType result_type;
+ EIGEN_STRONG_INLINE const NewType operator() (const Scalar& a) const { return cast<Scalar, NewType>(a); }
+};
+template<typename Scalar, typename NewType>
+struct functor_traits<scalar_cast_op<Scalar,NewType> >
+{ enum { Cost = is_same<Scalar, NewType>::value ? 0 : NumTraits<NewType>::AddCost, PacketAccess = false }; };
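+// Illustrative sketch (not upstream Eigen documentation): scalar_cast_op is the
+// functor behind MatrixBase::cast(), e.g.
+//   Eigen::MatrixXf f = Eigen::MatrixXf::Random(2, 2);
+//   Eigen::MatrixXd d = f.cast<double>();   // element-wise float -> double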
+
+/** \internal
+ * \brief Template functor to convert a scalar to another type using a custom functor.
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::convert()
+ */
+template<typename Scalar, typename NewType, typename ConvertOp>
+struct scalar_convert_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_convert_op)
+ typedef NewType result_type;
+ EIGEN_STRONG_INLINE const NewType operator() (const Scalar& a) const { return ConvertOp()(a); }
+};
+template<typename Scalar, typename NewType, typename ConvertOp>
+struct functor_traits<scalar_convert_op<Scalar,NewType,ConvertOp> >
+{ enum { Cost = is_same<Scalar, NewType>::value ? 0 : NumTraits<NewType>::AddCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the real part of a complex
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::real()
+ */
+template<typename Scalar>
+struct scalar_real_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_real_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::real(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_real_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the imaginary part of a complex
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::imag()
+ */
+template<typename Scalar>
+struct scalar_imag_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::imag(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_imag_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the real part of a complex as a reference
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::real()
+ */
+template<typename Scalar>
+struct scalar_real_ref_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_real_ref_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::real_ref(*const_cast<Scalar*>(&a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_real_ref_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the imaginary part of a complex as a reference
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::imag()
+ */
+template<typename Scalar>
+struct scalar_imag_ref_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_ref_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::imag_ref(*const_cast<Scalar*>(&a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_imag_ref_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ *
+ * \brief Template functor to compute the exponential of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::exp()
+ */
+template<typename Scalar> struct scalar_exp_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_exp_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::exp; return exp(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pexp(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_exp_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasExp }; };
+
+/** \internal
+ *
+ * \brief Template functor to compute the logarithm of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::log()
+ */
+template<typename Scalar> struct scalar_log_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_log_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::log; return log(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::plog(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_log_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasLog }; };
+
+/** \internal
+ * \brief Template functor to multiply a scalar by a fixed other one
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator*, MatrixBase::operator/
+ */
+/* NOTE: why is doing the pset1() in packetOp() an optimization?
+ * At first sight it seems better to declare m_other as a Packet and do the pset1() once
+ * in the constructor. However, in practice:
+ *  - GCC does not like m_other as a Packet and generates a load every time it needs it
+ *  - on the other hand, GCC is able to move the pset1() outside the loop :)
+ *  - simpler code ;)
+ * (ICC and gcc 4.4 seem to perform well in both cases; the issue is visible with y = a*x + b*y)
+ */
+template<typename Scalar>
+struct scalar_multiple_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem bugged with std::complex<>
+ EIGEN_STRONG_INLINE scalar_multiple_op(const scalar_multiple_op& other) : m_other(other.m_other) { }
+ EIGEN_STRONG_INLINE scalar_multiple_op(const Scalar& other) : m_other(other) { }
+ EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a) const { return a * m_other; }
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a, pset1<Packet>(m_other)); }
+ typename add_const_on_value_type<typename NumTraits<Scalar>::Nested>::type m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_multiple_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+template<typename Scalar1, typename Scalar2>
+struct scalar_multiple2_op {
+ typedef typename packet_traits<Scalar1>::type Packet1;
+ typedef typename scalar_product_traits<Scalar1,Scalar2>::ReturnType result_type;
+ typedef typename packet_traits<result_type>::type packet_result_type;
+ EIGEN_STRONG_INLINE scalar_multiple2_op(const scalar_multiple2_op& other) : m_other(other.m_other) { }
+ EIGEN_STRONG_INLINE scalar_multiple2_op(const Scalar2& other) : m_other(other) { }
+ EIGEN_STRONG_INLINE result_type operator() (const Scalar1& a) const { return a * m_other; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const packet_result_type packetOp(const Packet1& a) const
+ { eigen_assert("packetOp is not defined"); }
+ typename add_const_on_value_type<typename NumTraits<Scalar2>::Nested>::type m_other;
+};
+template<typename Scalar1,typename Scalar2>
+struct functor_traits<scalar_multiple2_op<Scalar1,Scalar2> >
+{ enum { Cost = NumTraits<Scalar1>::MulCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to divide a scalar by a fixed other one
+ *
+ * This functor is used to implement the quotient of a matrix by
+ * a scalar where the scalar type is not necessarily a floating point type.
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator/
+ */
+template<typename Scalar>
+struct scalar_quotient1_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem bugged with std::complex<>
+ EIGEN_STRONG_INLINE scalar_quotient1_op(const scalar_quotient1_op& other) : m_other(other.m_other) { }
+ EIGEN_STRONG_INLINE scalar_quotient1_op(const Scalar& other) : m_other(other) {}
+ EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a) const { return a / m_other; }
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(a, pset1<Packet>(m_other)); }
+ typename add_const_on_value_type<typename NumTraits<Scalar>::Nested>::type m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_quotient1_op<Scalar> >
+{ enum { Cost = 2 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasDiv }; };
+
+// nullary functors
+
+template<typename Scalar>
+struct scalar_constant_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const scalar_constant_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const Scalar& other) : m_other(other) { }
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index, Index = 0) const { return m_other; }
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(Index, Index = 0) const { return internal::pset1<Packet>(m_other); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_constant_op<Scalar> >
+// FIXME replace this packet test by a safe one
+{ enum { Cost = 1, PacketAccess = packet_traits<Scalar>::Vectorizable, IsRepeatable = true }; };
+
+template<typename Scalar> struct scalar_identity_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_identity_op)
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Scalar operator() (Index row, Index col) const { return row==col ? Scalar(1) : Scalar(0); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_identity_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = false, IsRepeatable = true }; };
+
+template <typename Scalar, bool RandomAccess> struct linspaced_op_impl;
+
+// linear access for packet ops:
+// 1) initialization
+// base = [low, ..., low] + ([step, ..., step] * [-size, ..., 0])
+// 2) each step (where size is 1 for coeff access or PacketSize for packet access)
+// base += [size*step, ..., size*step]
+//
+// TODO: Perhaps it's better to initialize lazily (so not in the constructor but in packetOp)
+// in order to avoid the padd() in operator() ?
+template <typename Scalar>
+struct linspaced_op_impl<Scalar,false>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ linspaced_op_impl(const Scalar& low, const Scalar& step) :
+ m_low(low), m_step(step),
+ m_packetStep(pset1<Packet>(packet_traits<Scalar>::size*step)),
+ m_base(padd(pset1<Packet>(low), pmul(pset1<Packet>(step),plset<Scalar>(-packet_traits<Scalar>::size)))) {}
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Scalar operator() (Index i) const
+ {
+ m_base = padd(m_base, pset1<Packet>(m_step));
+ return m_low+Scalar(i)*m_step;
+ }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index) const { return m_base = padd(m_base,m_packetStep); }
+
+ const Scalar m_low;
+ const Scalar m_step;
+ const Packet m_packetStep;
+ mutable Packet m_base;
+};
+
+// random access for packet ops:
+// 1) each step
+// [low, ..., low] + ( [step, ..., step] * ( [i, ..., i] + [0, ..., size] ) )
+template <typename Scalar>
+struct linspaced_op_impl<Scalar,true>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ linspaced_op_impl(const Scalar& low, const Scalar& step) :
+ m_low(low), m_step(step),
+ m_lowPacket(pset1<Packet>(m_low)), m_stepPacket(pset1<Packet>(m_step)), m_interPacket(plset<Scalar>(0)) {}
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Scalar operator() (Index i) const { return m_low+i*m_step; }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index i) const
+ { return internal::padd(m_lowPacket, pmul(m_stepPacket, padd(pset1<Packet>(i),m_interPacket))); }
+
+ const Scalar m_low;
+ const Scalar m_step;
+ const Packet m_lowPacket;
+ const Packet m_stepPacket;
+ const Packet m_interPacket;
+};
+
+// ----- Linspace functor ----------------------------------------------------------------
+
+// Forward declaration (we default to random access, which does not really give
+// us a speed gain when using packet access, but it allows the functor to be
+// used in nested expressions).
+template <typename Scalar, bool RandomAccess = true> struct linspaced_op;
+template <typename Scalar, bool RandomAccess> struct functor_traits< linspaced_op<Scalar,RandomAccess> >
+{ enum { Cost = 1, PacketAccess = packet_traits<Scalar>::HasSetLinear, IsRepeatable = true }; };
+template <typename Scalar, bool RandomAccess> struct linspaced_op
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ linspaced_op(const Scalar& low, const Scalar& high, DenseIndex num_steps) : impl((num_steps==1 ? high : low), (num_steps==1 ? Scalar() : (high-low)/(num_steps-1))) {}
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Scalar operator() (Index i) const { return impl(i); }
+
+ // We need this function when assigning e.g. a RowVectorXd to a MatrixXd since
+ // in that case row==0 and col is used for the actual iteration.
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Scalar operator() (Index row, Index col) const
+ {
+ eigen_assert(col==0 || row==0);
+ return impl(col + row);
+ }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index i) const { return impl.packetOp(i); }
+
+ // We need this function when assigning e.g. a RowVectorXd to a MatrixXd since
+ // in that case row==0 and col is used for the actual iteration.
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index row, Index col) const
+ {
+ eigen_assert(col==0 || row==0);
+ return impl.packetOp(col + row);
+ }
+
+ // This proxy object handles the actual required temporaries, the different
+ // implementations (random vs. sequential access) as well as the
+ // correct piping to size 2/4 packet operations.
+ const linspaced_op_impl<Scalar,RandomAccess> impl;
+};
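+// Illustrative sketch (not upstream Eigen documentation): linspaced_op is the
+// nullary functor behind DenseBase::LinSpaced(), e.g.
+//   Eigen::VectorXf v = Eigen::VectorXf::LinSpaced(5, 0.f, 1.f);
+//   // v is [0, 0.25, 0.5, 0.75, 1]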
+
+// All functors allow linear access, except scalar_identity_op. So we define here a quick
+// meta-function to indicate whether a functor allows linear access, always answering 'yes'
+// except for scalar_identity_op.
+// FIXME move this to functor_traits adding a functor_default
+template<typename Functor> struct functor_has_linear_access { enum { ret = 1 }; };
+template<typename Scalar> struct functor_has_linear_access<scalar_identity_op<Scalar> > { enum { ret = 0 }; };
+
+// In Eigen, any binary op (Product, CwiseBinaryOp) requires the Lhs and Rhs to have the same scalar type, except for multiplication,
+// where the mixing of different types is handled by scalar_product_traits.
+// In particular, real * complex<real> is allowed.
+// FIXME move this to functor_traits adding a functor_default
+template<typename Functor> struct functor_is_product_like { enum { ret = 0 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_product_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_conj_product_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_quotient_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+
+
+/** \internal
+ * \brief Template functor to add a scalar to a fixed other one
+ * \sa class CwiseUnaryOp, Array::operator+
+ */
+/* If you wonder why doing the pset1() in packetOp() is an optimization check scalar_multiple_op */
+template<typename Scalar>
+struct scalar_add_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem bugged with std::complex<>
+ inline scalar_add_op(const scalar_add_op& other) : m_other(other.m_other) { }
+ inline scalar_add_op(const Scalar& other) : m_other(other) { }
+ inline Scalar operator() (const Scalar& a) const { return a + m_other; }
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::padd(a, pset1<Packet>(m_other)); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_add_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasAdd }; };
+
+/** \internal
+ * \brief Template functor to compute the square root of a scalar
+ * \sa class CwiseUnaryOp, Cwise::sqrt()
+ */
+template<typename Scalar> struct scalar_sqrt_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sqrt_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::sqrt; return sqrt(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::psqrt(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sqrt_op<Scalar> >
+{ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasSqrt
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the cosine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::cos()
+ */
+template<typename Scalar> struct scalar_cos_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cos_op)
+ inline Scalar operator() (const Scalar& a) const { using std::cos; return cos(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pcos(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_cos_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasCos
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the sine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::sin()
+ */
+template<typename Scalar> struct scalar_sin_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sin_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::sin; return sin(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::psin(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sin_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasSin
+ };
+};
+
+
+/** \internal
+ * \brief Template functor to compute the tan of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::tan()
+ */
+template<typename Scalar> struct scalar_tan_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_tan_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::tan; return tan(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::ptan(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_tan_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasTan
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the arc cosine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::acos()
+ */
+template<typename Scalar> struct scalar_acos_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_acos_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::acos; return acos(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pacos(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_acos_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasACos
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the arc sine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::asin()
+ */
+template<typename Scalar> struct scalar_asin_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_asin_op)
+ inline const Scalar operator() (const Scalar& a) const { using std::asin; return asin(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pasin(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_asin_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasASin
+ };
+};
+
+/** \internal
+ * \brief Template functor to raise a scalar to a power
+ * \sa class CwiseUnaryOp, Cwise::pow
+ */
+template<typename Scalar>
+struct scalar_pow_op {
+ // FIXME default copy constructors seem bugged with std::complex<>
+ inline scalar_pow_op(const scalar_pow_op& other) : m_exponent(other.m_exponent) { }
+ inline scalar_pow_op(const Scalar& exponent) : m_exponent(exponent) {}
+ inline Scalar operator() (const Scalar& a) const { return numext::pow(a, m_exponent); }
+ const Scalar m_exponent;
+};
+template<typename Scalar>
+struct functor_traits<scalar_pow_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to compute the quotient between a scalar and array entries.
+ * \sa class CwiseUnaryOp, Cwise::inverse()
+ */
+template<typename Scalar>
+struct scalar_inverse_mult_op {
+ scalar_inverse_mult_op(const Scalar& other) : m_other(other) {}
+ inline Scalar operator() (const Scalar& a) const { return m_other / a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(pset1<Packet>(m_other),a); }
+ Scalar m_other;
+};
+
+/** \internal
+ * \brief Template functor to compute the inverse of a scalar
+ * \sa class CwiseUnaryOp, Cwise::inverse()
+ */
+template<typename Scalar>
+struct scalar_inverse_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_inverse_op)
+ inline Scalar operator() (const Scalar& a) const { return Scalar(1)/a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(pset1<Packet>(Scalar(1)),a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_inverse_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasDiv }; };
+
+/** \internal
+ * \brief Template functor to compute the square of a scalar
+ * \sa class CwiseUnaryOp, Cwise::square()
+ */
+template<typename Scalar>
+struct scalar_square_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_square_op)
+ inline Scalar operator() (const Scalar& a) const { return a*a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_square_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+/** \internal
+ * \brief Template functor to compute the cube of a scalar
+ * \sa class CwiseUnaryOp, Cwise::cube()
+ */
+template<typename Scalar>
+struct scalar_cube_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cube_op)
+ inline Scalar operator() (const Scalar& a) const { return a*a*a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,pmul(a,a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_cube_op<Scalar> >
+{ enum { Cost = 2*NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+// default functor traits for STL functors:
+
+template<typename T>
+struct functor_traits<std::multiplies<T> >
+{ enum { Cost = NumTraits<T>::MulCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::divides<T> >
+{ enum { Cost = NumTraits<T>::MulCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::plus<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::minus<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::negate<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_or<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_and<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_not<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::greater<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::less<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::greater_equal<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::less_equal<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::equal_to<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::not_equal_to<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binder2nd<T> >
+{ enum { Cost = functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binder1st<T> >
+{ enum { Cost = functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::unary_negate<T> >
+{ enum { Cost = 1 + functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binary_negate<T> >
+{ enum { Cost = 1 + functor_traits<T>::Cost, PacketAccess = false }; };
+
+#ifdef EIGEN_STDEXT_SUPPORT
+
+template<typename T0,typename T1>
+struct functor_traits<std::project1st<T0,T1> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::project2nd<T0,T1> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::select2nd<std::pair<T0,T1> > >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::select1st<std::pair<T0,T1> > >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::unary_compose<T0,T1> >
+{ enum { Cost = functor_traits<T0>::Cost + functor_traits<T1>::Cost, PacketAccess = false }; };
+
+template<typename T0,typename T1,typename T2>
+struct functor_traits<std::binary_compose<T0,T1,T2> >
+{ enum { Cost = functor_traits<T0>::Cost + functor_traits<T1>::Cost + functor_traits<T2>::Cost, PacketAccess = false }; };
+
+#endif // EIGEN_STDEXT_SUPPORT
+
+// Allows adding new functors and specializations of functor_traits from outside Eigen.
+// This macro is really needed because functor_traits must be specialized after it is declared but before it is used...
+#ifdef EIGEN_FUNCTORS_PLUGIN
+#include EIGEN_FUNCTORS_PLUGIN
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/Fuzzy.h b/third_party/eigen3/Eigen/src/Core/Fuzzy.h
new file mode 100644
index 0000000000..0ff1b96f56
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Fuzzy.h
@@ -0,0 +1,155 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FUZZY_H
+#define EIGEN_FUZZY_H
+
+namespace Eigen {
+
+namespace internal
+{
+
+template<typename Derived, typename OtherDerived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
+struct isApprox_selector
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec)
+ {
+ typename internal::nested<Derived,2>::type nested(x);
+ typename internal::nested<OtherDerived,2>::type otherNested(y);
+ return (nested - otherNested).cwiseAbs2().sum() <= prec * prec * numext::mini(nested.cwiseAbs2().sum(), otherNested.cwiseAbs2().sum());
+ }
+};
+
+template<typename Derived, typename OtherDerived>
+struct isApprox_selector<Derived, OtherDerived, true>
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar&)
+ {
+ return x.matrix() == y.matrix();
+ }
+};
+
+template<typename Derived, typename OtherDerived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
+struct isMuchSmallerThan_object_selector
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const OtherDerived& y, const typename Derived::RealScalar& prec)
+ {
+ return x.cwiseAbs2().sum() <= numext::abs2(prec) * y.cwiseAbs2().sum();
+ }
+};
+
+template<typename Derived, typename OtherDerived>
+struct isMuchSmallerThan_object_selector<Derived, OtherDerived, true>
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const OtherDerived&, const typename Derived::RealScalar&)
+ {
+ return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix();
+ }
+};
+
+template<typename Derived, bool is_integer = NumTraits<typename Derived::Scalar>::IsInteger>
+struct isMuchSmallerThan_scalar_selector
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const typename Derived::RealScalar& y, const typename Derived::RealScalar& prec)
+ {
+ return x.cwiseAbs2().sum() <= numext::abs2(prec * y);
+ }
+};
+
+template<typename Derived>
+struct isMuchSmallerThan_scalar_selector<Derived, true>
+{
+ EIGEN_DEVICE_FUNC
+ static bool run(const Derived& x, const typename Derived::RealScalar&, const typename Derived::RealScalar&)
+ {
+ return x.matrix() == Derived::Zero(x.rows(), x.cols()).matrix();
+ }
+};
+
+} // end namespace internal
+
+
+/** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \note The fuzzy compares are done multiplicatively. Two vectors \f$ v \f$ and \f$ w \f$
+ * are considered to be approximately equal within precision \f$ p \f$ if
+ * \f[ \Vert v - w \Vert \leqslant p\,\min(\Vert v\Vert, \Vert w\Vert). \f]
+ * For matrices, the comparison is done using the Hilbert-Schmidt norm (a.k.a. the Frobenius
+ * norm or L2 norm).
+ *
+ * \note Because this comparison is multiplicative, one can't use this function
+ * to check whether \c *this is approximately equal to the zero matrix or vector.
+ * Indeed, \c isApprox(zero) returns false unless \c *this itself is exactly the zero matrix
+ * or vector. If you want to test whether \c *this is zero, use internal::isMuchSmallerThan(const
+ * RealScalar&, RealScalar) instead.
+ *
+ * \sa internal::isMuchSmallerThan(const RealScalar&, RealScalar) const
+ */
+template<typename Derived>
+template<typename OtherDerived>
+bool DenseBase<Derived>::isApprox(
+ const DenseBase<OtherDerived>& other,
+ const RealScalar& prec
+) const
+{
+ return internal::isApprox_selector<Derived, OtherDerived>::run(derived(), other.derived(), prec);
+}
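+// Illustrative sketch (not upstream Eigen documentation), using the default
+// precision NumTraits<double>::dummy_precision() (1e-12):
+//   Eigen::Vector3d a(1.0, 2.0, 3.0);
+//   Eigen::Vector3d b = a + Eigen::Vector3d::Constant(1e-13);
+//   bool close = a.isApprox(b);          // true: ||a-b|| <= prec * min(||a||, ||b||)
+//   bool tight = a.isApprox(b, 1e-15);   // false with this much tighter precision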
+
+/** \returns \c true if the norm of \c *this is much smaller than \a other,
+ * within the precision determined by \a prec.
+ *
+ * \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is
+ * considered to be much smaller than \f$ x \f$ within precision \f$ p \f$ if
+ * \f[ \Vert v \Vert \leqslant p\,\vert x\vert. \f]
+ *
+ * For matrices, the comparison is done using the Hilbert-Schmidt norm. For this reason,
+ * the value of the reference scalar \a other should come from the Hilbert-Schmidt norm
+ * of a reference matrix of same dimensions.
+ *
+ * \sa isApprox(), isMuchSmallerThan(const DenseBase<OtherDerived>&, RealScalar) const
+ */
+template<typename Derived>
+bool DenseBase<Derived>::isMuchSmallerThan(
+ const typename NumTraits<Scalar>::Real& other,
+ const RealScalar& prec
+) const
+{
+ return internal::isMuchSmallerThan_scalar_selector<Derived>::run(derived(), other, prec);
+}
+
+/** \returns \c true if the norm of \c *this is much smaller than the norm of \a other,
+ * within the precision determined by \a prec.
+ *
+ * \note The fuzzy compares are done multiplicatively. A vector \f$ v \f$ is
+ * considered to be much smaller than a vector \f$ w \f$ within precision \f$ p \f$ if
+ * \f[ \Vert v \Vert \leqslant p\,\Vert w\Vert. \f]
+ * For matrices, the comparison is done using the Hilbert-Schmidt norm.
+ *
+ * \sa isApprox(), isMuchSmallerThan(const RealScalar&, RealScalar) const
+ */
+template<typename Derived>
+template<typename OtherDerived>
+bool DenseBase<Derived>::isMuchSmallerThan(
+ const DenseBase<OtherDerived>& other,
+ const RealScalar& prec
+) const
+{
+ return internal::isMuchSmallerThan_object_selector<Derived, OtherDerived>::run(derived(), other.derived(), prec);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_FUZZY_H
diff --git a/third_party/eigen3/Eigen/src/Core/GeneralProduct.h b/third_party/eigen3/Eigen/src/Core/GeneralProduct.h
new file mode 100644
index 0000000000..d2618ba25b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/GeneralProduct.h
@@ -0,0 +1,674 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERAL_PRODUCT_H
+#define EIGEN_GENERAL_PRODUCT_H
+
+namespace Eigen {
+
+/** \class GeneralProduct
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the product of two general matrices or vectors
+ *
+ * \param LhsNested the type used to store the left-hand side
+ * \param RhsNested the type used to store the right-hand side
+ * \param ProductMode the type of the product
+ *
+ * This class represents an expression of the product of two general matrices.
+ * By a general matrix we mean a dense matrix with full storage; for instance,
+ * this excludes triangular, selfadjoint, and sparse matrices.
+ * It is the return type of the operator* between general matrices. Its template
+ * arguments are determined automatically by ProductReturnType. Therefore,
+ * GeneralProduct should never be used directly. To determine the result type of a
+ * function which involves a matrix product, use ProductReturnType::Type.
+ *
+ * \sa ProductReturnType, MatrixBase::operator*(const MatrixBase<OtherDerived>&)
+ */
+template<typename Lhs, typename Rhs, int ProductType = internal::product_type<Lhs,Rhs>::value>
+class GeneralProduct;
+
+enum {
+ Large = 2,
+ Small = 3
+};
+
+namespace internal {
+
+template<int Rows, int Cols, int Depth> struct product_type_selector;
+
+template<int Size, int MaxSize> struct product_size_category
+{
+ enum { is_large = MaxSize == Dynamic ||
+ Size >= EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD,
+ value = is_large ? Large
+ : Size == 1 ? 1
+ : Small
+ };
+};
+
+template<typename Lhs, typename Rhs> struct product_type
+{
+ typedef typename remove_all<Lhs>::type _Lhs;
+ typedef typename remove_all<Rhs>::type _Rhs;
+ enum {
+ MaxRows = _Lhs::MaxRowsAtCompileTime,
+ Rows = _Lhs::RowsAtCompileTime,
+ MaxCols = _Rhs::MaxColsAtCompileTime,
+ Cols = _Rhs::ColsAtCompileTime,
+ MaxDepth = EIGEN_SIZE_MIN_PREFER_FIXED(_Lhs::MaxColsAtCompileTime,
+ _Rhs::MaxRowsAtCompileTime),
+ Depth = EIGEN_SIZE_MIN_PREFER_FIXED(_Lhs::ColsAtCompileTime,
+ _Rhs::RowsAtCompileTime)
+ };
+
+ // the splitting into different lines of code here, introducing the _select enums and the typedef below,
+ // is to work around an internal compiler error with gcc 4.1 and 4.2.
+private:
+ enum {
+ rows_select = product_size_category<Rows,MaxRows>::value,
+ cols_select = product_size_category<Cols,MaxCols>::value,
+ depth_select = product_size_category<Depth,MaxDepth>::value
+ };
+ typedef product_type_selector<rows_select, cols_select, depth_select> selector;
+
+public:
+ enum {
+ value = selector::ret
+ };
+#ifdef EIGEN_DEBUG_PRODUCT
+ static void debug()
+ {
+ EIGEN_DEBUG_VAR(Rows);
+ EIGEN_DEBUG_VAR(Cols);
+ EIGEN_DEBUG_VAR(Depth);
+ EIGEN_DEBUG_VAR(rows_select);
+ EIGEN_DEBUG_VAR(cols_select);
+ EIGEN_DEBUG_VAR(depth_select);
+ EIGEN_DEBUG_VAR(value);
+ }
+#endif
+};
+
+
+/* The following allows the kind of product to be selected at compile time
+ * based on the three dimensions of the product.
+ * This is a compile-time mapping from {1,Small,Large}^3 -> {product types} */
+// FIXME I'm not sure the current mapping is the ideal one.
+template<int M, int N> struct product_type_selector<M,N,1> { enum { ret = OuterProduct }; };
+template<int Depth> struct product_type_selector<1, 1, Depth> { enum { ret = InnerProduct }; };
+template<> struct product_type_selector<1, 1, 1> { enum { ret = InnerProduct }; };
+template<> struct product_type_selector<Small,1, Small> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<1, Small,Small> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<Small,Small,Small> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<Small, Small, 1> { enum { ret = LazyCoeffBasedProductMode }; };
+template<> struct product_type_selector<Small, Large, 1> { enum { ret = LazyCoeffBasedProductMode }; };
+template<> struct product_type_selector<Large, Small, 1> { enum { ret = LazyCoeffBasedProductMode }; };
+template<> struct product_type_selector<1, Large,Small> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<1, Large,Large> { enum { ret = GemvProduct }; };
+template<> struct product_type_selector<1, Small,Large> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<Large,1, Small> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<Large,1, Large> { enum { ret = GemvProduct }; };
+template<> struct product_type_selector<Small,1, Large> { enum { ret = CoeffBasedProductMode }; };
+template<> struct product_type_selector<Small,Small,Large> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Large,Small,Large> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Small,Large,Large> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Large,Large,Large> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Large,Small,Small> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Small,Large,Small> { enum { ret = GemmProduct }; };
+template<> struct product_type_selector<Large,Large,Small> { enum { ret = GemmProduct }; };
+
+} // end namespace internal
+
+/** \class ProductReturnType
+ * \ingroup Core_Module
+ *
+ * \brief Helper class to get the correct and optimized returned type of operator*
+ *
+ * \param Lhs the type of the left-hand side
+ * \param Rhs the type of the right-hand side
+ * \param ProductMode the type of the product (determined automatically by internal::product_mode)
+ *
+ * This class defines the typename Type representing the optimized product expression
+ * between two matrix expressions. In practice, using ProductReturnType<Lhs,Rhs>::Type
+ * is the recommended way to define the result type of a function returning an expression
+ * which involves a matrix product. The class Product should never be
+ * used directly.
+ *
+ * \sa class Product, MatrixBase::operator*(const MatrixBase<OtherDerived>&)
+ */
+template<typename Lhs, typename Rhs, int ProductType>
+struct ProductReturnType
+{
+ // TODO use the nested type to reduce instantiations ????
+// typedef typename internal::nested<Lhs,Rhs::ColsAtCompileTime>::type LhsNested;
+// typedef typename internal::nested<Rhs,Lhs::RowsAtCompileTime>::type RhsNested;
+
+ typedef GeneralProduct<Lhs/*Nested*/, Rhs/*Nested*/, ProductType> Type;
+};
+
+template<typename Lhs, typename Rhs>
+struct ProductReturnType<Lhs,Rhs,CoeffBasedProductMode>
+{
+ typedef typename internal::nested<Lhs, Rhs::ColsAtCompileTime, typename internal::plain_matrix_type<Lhs>::type >::type LhsNested;
+ typedef typename internal::nested<Rhs, Lhs::RowsAtCompileTime, typename internal::plain_matrix_type<Rhs>::type >::type RhsNested;
+ typedef CoeffBasedProduct<LhsNested, RhsNested, EvalBeforeAssigningBit | EvalBeforeNestingBit> Type;
+};
+
+template<typename Lhs, typename Rhs>
+struct ProductReturnType<Lhs,Rhs,LazyCoeffBasedProductMode>
+{
+ typedef typename internal::nested<Lhs, Rhs::ColsAtCompileTime, typename internal::plain_matrix_type<Lhs>::type >::type LhsNested;
+ typedef typename internal::nested<Rhs, Lhs::RowsAtCompileTime, typename internal::plain_matrix_type<Rhs>::type >::type RhsNested;
+ typedef CoeffBasedProduct<LhsNested, RhsNested, NestByRefBit> Type;
+};
+
+// this is a workaround for sun CC
+template<typename Lhs, typename Rhs>
+struct LazyProductReturnType : public ProductReturnType<Lhs,Rhs,LazyCoeffBasedProductMode>
+{};
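+// Illustrative usage sketch (hypothetical helper, for exposition only): a generic
+// function returning a product expression can name its result type via
+// ProductReturnType, e.g.
+//   template<typename A, typename B>
+//   const typename ProductReturnType<A, B>::Type multiply(const A& a, const B& b)
+//   { return a * b; }
+// LazyProductReturnType plays the same role for a.lazyProduct(b).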
+
+/***********************************************************************
+* Implementation of Inner Vector Vector Product
+***********************************************************************/
+
+// FIXME: maybe the "inner product" could return a Scalar
+// instead of a 1x1 matrix?
+// Pro: more natural for the user
+// Con: this could be a problem if, in a meta-unrolled algorithm, a matrix-matrix
+// product ends up being a row-vector times col-vector product... To tackle this use
+// case, we could have a specialization for Block<MatrixType,1,1> with operator=(Scalar x);
+
+namespace internal {
+
+template<typename Lhs, typename Rhs>
+struct traits<GeneralProduct<Lhs,Rhs,InnerProduct> >
+ : traits<Matrix<typename scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType,1,1> >
+{};
+
+}
+
+template<typename Lhs, typename Rhs>
+class GeneralProduct<Lhs, Rhs, InnerProduct>
+ : internal::no_assignment_operator,
+ public Matrix<typename internal::scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType,1,1>
+{
+ typedef Matrix<typename internal::scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType,1,1> Base;
+ public:
+ GeneralProduct(const Lhs& lhs, const Rhs& rhs)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<typename Lhs::RealScalar, typename Rhs::RealScalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ Base::coeffRef(0,0) = (lhs.transpose().cwiseProduct(rhs)).sum();
+ }
+
+ /** Conversion to scalar */
+ operator const typename Base::Scalar() const {
+ return Base::coeff(0,0);
+ }
+};
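+// Illustrative usage sketch (for exposition only): thanks to the scalar conversion
+// operator above, an inner product can be assigned directly to a scalar, e.g.
+//   Eigen::Vector3d u(1, 2, 3), v(4, 5, 6);
+//   double d = u.transpose() * v;  // 1x1 inner-product expression, converted to 32.0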
+
+/***********************************************************************
+* Implementation of Outer Vector Vector Product
+***********************************************************************/
+
+namespace internal {
+
+// Column major
+template<typename ProductType, typename Dest, typename Func>
+EIGEN_DONT_INLINE void outer_product_selector_run(const ProductType& prod, Dest& dest, const Func& func, const false_type&)
+{
+ typedef typename Dest::Index Index;
+ // FIXME make sure lhs is sequentially stored
+ // FIXME not very good if rhs is real and lhs complex while alpha is real too
+ const Index cols = dest.cols();
+ for (Index j=0; j<cols; ++j)
+ func(dest.col(j), prod.rhs().coeff(j) * prod.lhs());
+}
+
+// Row major
+template<typename ProductType, typename Dest, typename Func>
+EIGEN_DONT_INLINE void outer_product_selector_run(const ProductType& prod, Dest& dest, const Func& func, const true_type&) {
+ typedef typename Dest::Index Index;
+ // FIXME make sure rhs is sequentially stored
+ // FIXME not very good if lhs is real and rhs complex while alpha is real too
+ const Index rows = dest.rows();
+ for (Index i=0; i<rows; ++i)
+ func(dest.row(i), prod.lhs().coeff(i) * prod.rhs());
+}
+
+template<typename Lhs, typename Rhs>
+struct traits<GeneralProduct<Lhs,Rhs,OuterProduct> >
+ : traits<ProductBase<GeneralProduct<Lhs,Rhs,OuterProduct>, Lhs, Rhs> >
+{};
+
+}
+
+template<typename Lhs, typename Rhs>
+class GeneralProduct<Lhs, Rhs, OuterProduct>
+ : public ProductBase<GeneralProduct<Lhs,Rhs,OuterProduct>, Lhs, Rhs>
+{
+ template<typename T> struct IsRowMajor : internal::conditional<(int(T::Flags)&RowMajorBit), internal::true_type, internal::false_type>::type {};
+
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(GeneralProduct)
+
+ GeneralProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<typename Lhs::RealScalar, typename Rhs::RealScalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ }
+
+ struct set { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() = src; } };
+ struct add { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() += src; } };
+ struct sub { template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const { dst.const_cast_derived() -= src; } };
+ struct adds {
+ Scalar m_scale;
+ adds(const Scalar& s) : m_scale(s) {}
+ template<typename Dst, typename Src> void operator()(const Dst& dst, const Src& src) const {
+ dst.const_cast_derived() += m_scale * src;
+ }
+ };
+
+ template<typename Dest>
+ inline void evalTo(Dest& dest) const {
+ internal::outer_product_selector_run(*this, dest, set(), IsRowMajor<Dest>());
+ }
+
+ template<typename Dest>
+ inline void addTo(Dest& dest) const {
+ internal::outer_product_selector_run(*this, dest, add(), IsRowMajor<Dest>());
+ }
+
+ template<typename Dest>
+ inline void subTo(Dest& dest) const {
+ internal::outer_product_selector_run(*this, dest, sub(), IsRowMajor<Dest>());
+ }
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ internal::outer_product_selector_run(*this, dest, adds(alpha), IsRowMajor<Dest>());
+ }
+};
+
+/***********************************************************************
+* Implementation of General Matrix Vector Product
+***********************************************************************/
+
+/* According to the shape/flags of the matrix we have to distinguish 3 different cases:
+ * 1 - the matrix is col-major, BLAS compatible and M is large => call fast BLAS-like colmajor routine
+ * 2 - the matrix is row-major, BLAS compatible and N is large => call fast BLAS-like rowmajor routine
+ * 3 - all other cases are handled using a simple loop along the outer-storage direction.
+ * Therefore we need a lower level meta selector.
+ * Furthermore, if the matrix is the rhs, then the product has to be transposed.
+ */
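+// Illustrative dispatch sketch (for exposition only): a column-major MatrixXf times a
+// VectorXf with direct access falls into case 1, i.e. roughly
+//   internal::gemv_selector<OnTheRight, ColMajor, true>::run(prod, dst, alpha);
+// while the same product with a row-major matrix falls into case 2 (RowMajor, true).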
+namespace internal {
+
+template<typename Lhs, typename Rhs>
+struct traits<GeneralProduct<Lhs,Rhs,GemvProduct> >
+ : traits<ProductBase<GeneralProduct<Lhs,Rhs,GemvProduct>, Lhs, Rhs> >
+{};
+
+template<int Side, int StorageOrder, bool BlasCompatible>
+struct gemv_selector;
+
+} // end namespace internal
+
+template<typename Lhs, typename Rhs>
+class GeneralProduct<Lhs, Rhs, GemvProduct>
+ : public ProductBase<GeneralProduct<Lhs,Rhs,GemvProduct>, Lhs, Rhs>
+{
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(GeneralProduct)
+
+ typedef typename Lhs::Scalar LhsScalar;
+ typedef typename Rhs::Scalar RhsScalar;
+
+ GeneralProduct(const Lhs& a_lhs, const Rhs& a_rhs) : Base(a_lhs,a_rhs)
+ {
+// EIGEN_STATIC_ASSERT((internal::is_same<typename Lhs::Scalar, typename Rhs::Scalar>::value),
+// YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ }
+
+ enum { Side = Lhs::IsVectorAtCompileTime ? OnTheLeft : OnTheRight };
+ typedef typename internal::conditional<int(Side)==OnTheRight,_LhsNested,_RhsNested>::type MatrixType;
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ eigen_assert(m_lhs.rows() == dst.rows() && m_rhs.cols() == dst.cols());
+ internal::gemv_selector<Side,(int(MatrixType::Flags)&RowMajorBit) ? RowMajor : ColMajor,
+ bool(internal::blas_traits<MatrixType>::HasUsableDirectAccess)>::run(*this, dst, alpha);
+ }
+};
+
+namespace internal {
+
+// The vector is on the left => transposition
+template<int StorageOrder, bool BlasCompatible>
+struct gemv_selector<OnTheLeft,StorageOrder,BlasCompatible>
+{
+ template<typename ProductType, typename Dest>
+ static void run(const ProductType& prod, Dest& dest, const typename ProductType::Scalar& alpha)
+ {
+ Transpose<Dest> destT(dest);
+ enum { OtherStorageOrder = StorageOrder == RowMajor ? ColMajor : RowMajor };
+ gemv_selector<OnTheRight,OtherStorageOrder,BlasCompatible>
+ ::run(GeneralProduct<Transpose<const typename ProductType::_RhsNested>,Transpose<const typename ProductType::_LhsNested>, GemvProduct>
+ (prod.rhs().transpose(), prod.lhs().transpose()), destT, alpha);
+ }
+};
+
+template<typename Scalar,int Size,int MaxSize,bool Cond> struct gemv_static_vector_if;
+
+template<typename Scalar,int Size,int MaxSize>
+struct gemv_static_vector_if<Scalar,Size,MaxSize,false>
+{
+ EIGEN_STRONG_INLINE Scalar* data() { eigen_internal_assert(false && "should never be called"); return 0; }
+};
+
+template<typename Scalar,int Size>
+struct gemv_static_vector_if<Scalar,Size,Dynamic,true>
+{
+ EIGEN_STRONG_INLINE Scalar* data() { return 0; }
+};
+
+template<typename Scalar,int Size,int MaxSize>
+struct gemv_static_vector_if<Scalar,Size,MaxSize,true>
+{
+ #if EIGEN_ALIGN_STATICALLY
+ internal::plain_array<Scalar,EIGEN_SIZE_MIN_PREFER_FIXED(Size,MaxSize),0> m_data;
+ EIGEN_STRONG_INLINE Scalar* data() { return m_data.array; }
+ #else
+ // Some architectures cannot align on the stack,
+ // => let's manually enforce alignment by allocating more data and returning the address of the first aligned element.
+ enum {
+ ForceAlignment = internal::packet_traits<Scalar>::Vectorizable,
+ PacketSize = internal::packet_traits<Scalar>::size
+ };
+ internal::plain_array<Scalar,EIGEN_SIZE_MIN_PREFER_FIXED(Size,MaxSize)+(ForceAlignment?PacketSize:0),0> m_data;
+ EIGEN_STRONG_INLINE Scalar* data() {
+ return ForceAlignment
+ ? reinterpret_cast<Scalar*>((reinterpret_cast<size_t>(m_data.array) & ~(size_t(EIGEN_ALIGN_BYTES-1))) + EIGEN_ALIGN_BYTES)
+ : m_data.array;
+ }
+ #endif
+};
+
+template<> struct gemv_selector<OnTheRight,ColMajor,true>
+{
+ template<typename ProductType, typename Dest>
+ static inline void run(const ProductType& prod, Dest& dest, const typename ProductType::Scalar& alpha)
+ {
+ typedef typename ProductType::Index Index;
+ typedef typename ProductType::LhsScalar LhsScalar;
+ typedef typename ProductType::RhsScalar RhsScalar;
+ typedef typename ProductType::Scalar ResScalar;
+ typedef typename ProductType::RealScalar RealScalar;
+ typedef typename ProductType::ActualLhsType ActualLhsType;
+ typedef typename ProductType::ActualRhsType ActualRhsType;
+ typedef typename ProductType::LhsBlasTraits LhsBlasTraits;
+ typedef typename ProductType::RhsBlasTraits RhsBlasTraits;
+ typedef Map<Matrix<ResScalar,Dynamic,1>, Aligned> MappedDest;
+
+ ActualLhsType actualLhs = LhsBlasTraits::extract(prod.lhs());
+ ActualRhsType actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs())
+ * RhsBlasTraits::extractScalarFactor(prod.rhs());
+
+ enum {
+ // FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
+ // on the other hand, it is good for the cache to pack the vector anyway...
+ EvalToDestAtCompileTime = Dest::InnerStrideAtCompileTime==1,
+ ComplexByReal = (NumTraits<LhsScalar>::IsComplex) && (!NumTraits<RhsScalar>::IsComplex),
+ MightCannotUseDest = (Dest::InnerStrideAtCompileTime!=1) || ComplexByReal
+ };
+
+ gemv_static_vector_if<ResScalar,Dest::SizeAtCompileTime,Dest::MaxSizeAtCompileTime,MightCannotUseDest> static_dest;
+
+ bool alphaIsCompatible = (!ComplexByReal) || (numext::imag(actualAlpha)==RealScalar(0));
+ bool evalToDest = EvalToDestAtCompileTime && alphaIsCompatible;
+
+ RhsScalar compatibleAlpha = get_factor<ResScalar,RhsScalar>::run(actualAlpha);
+
+ ei_declare_aligned_stack_constructed_variable(ResScalar,actualDestPtr,dest.size(),
+ evalToDest ? dest.data() : static_dest.data());
+
+ if(!evalToDest)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ int size = dest.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ if(!alphaIsCompatible)
+ {
+ MappedDest(actualDestPtr, dest.size()).setZero();
+ compatibleAlpha = RhsScalar(1);
+ }
+ else
+ MappedDest(actualDestPtr, dest.size()) = dest;
+ }
+
+ typedef const_blas_data_mapper<LhsScalar,Index,ColMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,RowMajor> RhsMapper;
+ general_matrix_vector_product
+ <Index,LhsScalar,LhsMapper,ColMajor,LhsBlasTraits::NeedToConjugate,RhsScalar,RhsMapper,RhsBlasTraits::NeedToConjugate>::run(
+ actualLhs.rows(), actualLhs.cols(),
+ LhsMapper(actualLhs.data(), actualLhs.outerStride()),
+ RhsMapper(actualRhs.data(), actualRhs.innerStride()),
+ actualDestPtr, 1,
+ compatibleAlpha);
+
+ if (!evalToDest)
+ {
+ if(!alphaIsCompatible)
+ dest += actualAlpha * MappedDest(actualDestPtr, dest.size());
+ else
+ dest = MappedDest(actualDestPtr, dest.size());
+ }
+ }
+};
+
+template<> struct gemv_selector<OnTheRight,RowMajor,true>
+{
+ template<typename ProductType, typename Dest>
+ static void run(const ProductType& prod, Dest& dest, const typename ProductType::Scalar& alpha)
+ {
+ typedef typename ProductType::LhsScalar LhsScalar;
+ typedef typename ProductType::RhsScalar RhsScalar;
+ typedef typename ProductType::Scalar ResScalar;
+ typedef typename ProductType::Index Index;
+ typedef typename ProductType::ActualLhsType ActualLhsType;
+ typedef typename ProductType::ActualRhsType ActualRhsType;
+ typedef typename ProductType::_ActualRhsType _ActualRhsType;
+ typedef typename ProductType::LhsBlasTraits LhsBlasTraits;
+ typedef typename ProductType::RhsBlasTraits RhsBlasTraits;
+
+ typename add_const<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(prod.lhs());
+ typename add_const<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs())
+ * RhsBlasTraits::extractScalarFactor(prod.rhs());
+
+ enum {
+ // FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
+ // on the other hand, it is good for the cache to pack the vector anyway...
+ DirectlyUseRhs = _ActualRhsType::InnerStrideAtCompileTime==1
+ };
+
+ gemv_static_vector_if<RhsScalar,_ActualRhsType::SizeAtCompileTime,_ActualRhsType::MaxSizeAtCompileTime,!DirectlyUseRhs> static_rhs;
+
+ ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhsPtr,actualRhs.size(),
+ DirectlyUseRhs ? const_cast<RhsScalar*>(actualRhs.data()) : static_rhs.data());
+
+ if(!DirectlyUseRhs)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ int size = actualRhs.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ Map<typename _ActualRhsType::PlainObject>(actualRhsPtr, actualRhs.size()) = actualRhs;
+ }
+
+ typedef const_blas_data_mapper<LhsScalar,Index,RowMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,ColMajor> RhsMapper;
+ general_matrix_vector_product
+ <Index,LhsScalar,LhsMapper,RowMajor,LhsBlasTraits::NeedToConjugate,RhsScalar,RhsMapper,RhsBlasTraits::NeedToConjugate>::run(
+ actualLhs.rows(), actualLhs.cols(),
+ LhsMapper(actualLhs.data(), actualLhs.outerStride()),
+ RhsMapper(actualRhsPtr, 1),
+ dest.data(), dest.innerStride(),
+ actualAlpha);
+ }
+};
+
+template<> struct gemv_selector<OnTheRight,ColMajor,false>
+{
+ template<typename ProductType, typename Dest>
+ static void run(const ProductType& prod, Dest& dest, const typename ProductType::Scalar& alpha)
+ {
+ typedef typename Dest::Index Index;
+ // TODO make sure dest is sequentially stored in memory, otherwise use a temp
+ const Index size = prod.rhs().rows();
+ for(Index k=0; k<size; ++k)
+ dest += (alpha*prod.rhs().coeff(k)) * prod.lhs().col(k);
+ }
+};
+
+template<> struct gemv_selector<OnTheRight,RowMajor,false>
+{
+ template<typename ProductType, typename Dest>
+ static void run(const ProductType& prod, Dest& dest, const typename ProductType::Scalar& alpha)
+ {
+ typedef typename Dest::Index Index;
+ // TODO make sure rhs is sequentially stored in memory, otherwise use a temp
+ const Index rows = prod.rows();
+ for(Index i=0; i<rows; ++i)
+ dest.coeffRef(i) += alpha * (prod.lhs().row(i).cwiseProduct(prod.rhs().transpose())).sum();
+ }
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* Implementation of matrix base methods
+***************************************************************************/
+
+/** \returns the matrix product of \c *this and \a other.
+ *
+ * \note If instead of the matrix product you want the coefficient-wise product, see Cwise::operator*().
+ *
+ * \sa lazyProduct(), operator*=(const MatrixBase&), Cwise::operator*()
+ */
+#ifndef __CUDACC__
+
+#ifdef EIGEN_TEST_EVALUATORS
+template<typename Derived>
+template<typename OtherDerived>
+inline const Product<Derived, OtherDerived>
+MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
+{
+ // A note regarding the function declaration: in MSVC, this function will sometimes
+ // not be inlined since DenseStorage is an unwindable object for dynamic
+ // matrices and product types hold a member to store the result.
+ // Thus, tagging this function with EIGEN_STRONG_INLINE does not help.
+ enum {
+ ProductIsValid = Derived::ColsAtCompileTime==Dynamic
+ || OtherDerived::RowsAtCompileTime==Dynamic
+ || int(Derived::ColsAtCompileTime)==int(OtherDerived::RowsAtCompileTime),
+ AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
+ SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived,OtherDerived)
+ };
+ // note to the lost user:
+ // * for a dot product use: v1.dot(v2)
+ // * for a coeff-wise product use: v1.cwiseProduct(v2)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes),
+ INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
+ INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
+ EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
+#ifdef EIGEN_DEBUG_PRODUCT
+ internal::product_type<Derived,OtherDerived>::debug();
+#endif
+
+ return Product<Derived, OtherDerived>(derived(), other.derived());
+}
+#else
+template<typename Derived>
+template<typename OtherDerived>
+inline const typename ProductReturnType<Derived, OtherDerived>::Type
+MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
+{
+ // A note regarding the function declaration: in MSVC, this function will sometimes
+ // not be inlined since DenseStorage is an unwindable object for dynamic
+ // matrices and product types hold a member to store the result.
+ // Thus, tagging this function with EIGEN_STRONG_INLINE does not help.
+ enum {
+ ProductIsValid = Derived::ColsAtCompileTime==Dynamic
+ || OtherDerived::RowsAtCompileTime==Dynamic
+ || int(Derived::ColsAtCompileTime)==int(OtherDerived::RowsAtCompileTime),
+ AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
+ SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived,OtherDerived)
+ };
+ // note to the lost user:
+ // * for a dot product use: v1.dot(v2)
+ // * for a coeff-wise product use: v1.cwiseProduct(v2)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes),
+ INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
+ INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
+ EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
+#ifdef EIGEN_DEBUG_PRODUCT
+ internal::product_type<Derived,OtherDerived>::debug();
+#endif
+ return typename ProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
+}
+#endif
+
+#endif
+/** \returns an expression of the matrix product of \c *this and \a other without implicit evaluation.
+ *
+ * The returned product will behave like any other expression: the coefficients of the product will be
+ * computed one at a time, as requested. This might be useful in some extremely rare cases when only
+ * a small and non-coherent fraction of the result's coefficients have to be computed.
+ *
+ * \warning This version of the matrix product can be much, much slower. So use it only if you know
+ * what you are doing and you have measured a true speed improvement.
+ *
+ * \sa operator*(const MatrixBase&)
+ */
+template<typename Derived>
+template<typename OtherDerived>
+const typename LazyProductReturnType<Derived,OtherDerived>::Type
+MatrixBase<Derived>::lazyProduct(const MatrixBase<OtherDerived> &other) const
+{
+ enum {
+ ProductIsValid = Derived::ColsAtCompileTime==Dynamic
+ || OtherDerived::RowsAtCompileTime==Dynamic
+ || int(Derived::ColsAtCompileTime)==int(OtherDerived::RowsAtCompileTime),
+ AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
+ SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived,OtherDerived)
+ };
+ // note to the lost user:
+ // * for a dot product use: v1.dot(v2)
+ // * for a coeff-wise product use: v1.cwiseProduct(v2)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes),
+ INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
+ INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
+ EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
+
+ return typename LazyProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
+}
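+// Illustrative usage sketch (for exposition only):
+//   Eigen::MatrixXd A(100, 100), B(100, 100);
+//   A.setRandom(); B.setRandom();
+//   Eigen::MatrixXd C = A * B;                  // full product, evaluated by the GEMM path
+//   double c00 = A.lazyProduct(B).coeff(0, 0);  // only this coefficient is computed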
+
+} // end namespace Eigen
+
+#endif // EIGEN_PRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/Core/GenericPacketMath.h b/third_party/eigen3/Eigen/src/Core/GenericPacketMath.h
new file mode 100644
index 0000000000..bf9d6f9c33
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/GenericPacketMath.h
@@ -0,0 +1,584 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERIC_PACKET_MATH_H
+#define EIGEN_GENERIC_PACKET_MATH_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * \file GenericPacketMath.h
+ *
+ * Default implementation for types not supported by the vectorization.
+ * In practice, these functions are provided to make it easier to write
+ * generic vectorized code.
+ */
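+// Illustrative sketch (for exposition only): for a plain scalar "packet" the generic
+// overloads below simply forward to the scalar operators, e.g.
+//   float r = padd(1.0f, 2.0f);  // r == 3.0f
+// while SIMD back-ends specialize these functions for their packet types (e.g. __m128).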
+
+#ifndef EIGEN_DEBUG_ALIGNED_LOAD
+#define EIGEN_DEBUG_ALIGNED_LOAD
+#endif
+
+#ifndef EIGEN_DEBUG_UNALIGNED_LOAD
+#define EIGEN_DEBUG_UNALIGNED_LOAD
+#endif
+
+#ifndef EIGEN_DEBUG_ALIGNED_STORE
+#define EIGEN_DEBUG_ALIGNED_STORE
+#endif
+
+#ifndef EIGEN_DEBUG_UNALIGNED_STORE
+#define EIGEN_DEBUG_UNALIGNED_STORE
+#endif
+
+struct default_packet_traits
+{
+ enum {
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasNegate = 1,
+ HasAbs = 1,
+ HasAbs2 = 1,
+ HasMin = 1,
+ HasMax = 1,
+ HasConj = 1,
+ HasSetLinear = 1,
+ HasBlend = 0,
+
+ HasDiv = 0,
+ HasSqrt = 0,
+ HasRsqrt = 0,
+ HasExp = 0,
+ HasLog = 0,
+ HasPow = 0,
+
+ HasSin = 0,
+ HasCos = 0,
+ HasTan = 0,
+ HasASin = 0,
+ HasACos = 0,
+ HasATan = 0,
+ HasTanH = 0
+ };
+};
+
+template<typename T> struct packet_traits : default_packet_traits
+{
+ typedef T type;
+ typedef T half;
+ enum {
+ Vectorizable = 0,
+ size = 1,
+ AlignedOnScalar = 0,
+ HasHalfPacket = 0
+ };
+ enum {
+ HasAdd = 0,
+ HasSub = 0,
+ HasMul = 0,
+ HasNegate = 0,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasConj = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<typename T> struct packet_traits<const T> : packet_traits<T> { };
+
+
+template <typename Src, typename Tgt> struct type_casting_traits {
+ enum {
+ VectorizedCast = 0,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+template <typename T> struct type_casting_traits<T, T> {
+ enum {
+ VectorizedCast = 1,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+
+/** \internal \returns static_cast<TgtType>(a) (coeff-wise) */
+template <typename SrcPacket, typename TgtPacket>
+EIGEN_DEVICE_FUNC inline TgtPacket
+pcast(const SrcPacket& a) {
+ return static_cast<TgtPacket>(a);
+}
+template <typename SrcPacket, typename TgtPacket>
+EIGEN_DEVICE_FUNC inline TgtPacket
+pcast(const SrcPacket& a, const SrcPacket& /*b*/) {
+ return static_cast<TgtPacket>(a);
+}
+
+template <typename SrcPacket, typename TgtPacket>
+EIGEN_DEVICE_FUNC inline TgtPacket
+pcast(const SrcPacket& a, const SrcPacket& /*b*/, const SrcPacket& /*c*/, const SrcPacket& /*d*/) {
+ return static_cast<TgtPacket>(a);
+}
+
+/** \internal \returns a + b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+padd(const Packet& a,
+ const Packet& b) { return a+b; }
+
+/** \internal \returns a - b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+psub(const Packet& a,
+ const Packet& b) { return a-b; }
+
+/** \internal \returns true if a == b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+peq(const Packet& a, const Packet& b) { return a == b; }
+
+/** \internal \returns true if a < b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+plt(const Packet& a, const Packet& b) { return a < b; }
+
+/** \internal \returns true if a <= b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+ple(const Packet& a, const Packet& b) { return a <= b; }
+
+/** \internal \returns b if false_mask is set, else a */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pselect(const Packet& a,
+ const Packet& b,
+ const Packet& false_mask) {
+ return false_mask ? b : a;
+}
+
+/** \internal \returns -a (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pnegate(const Packet& a) { return -a; }
+
+/** \internal \returns conj(a) (coeff-wise) */
+
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pconj(const Packet& a) { return numext::conj(a); }
+
+/** \internal \returns a * b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pmul(const Packet& a,
+ const Packet& b) { return a*b; }
+
+/** \internal \returns a / b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pdiv(const Packet& a,
+ const Packet& b) { return a/b; }
+
+/** \internal \returns the min of \a a and \a b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pmin(const Packet& a,
+ const Packet& b) { return numext::mini(a, b); }
+
+/** \internal \returns the max of \a a and \a b (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pmax(const Packet& a,
+ const Packet& b) { return numext::maxi(a, b); }
+
+/** \internal \returns the absolute value of \a a */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pabs(const Packet& a) { using std::abs; return abs(a); }
+
+/** \internal \returns the bitwise and of \a a and \a b */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pand(const Packet& a, const Packet& b) { return a & b; }
+
+/** \internal \returns the bitwise or of \a a and \a b */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+por(const Packet& a, const Packet& b) { return a | b; }
+
+/** \internal \returns the bitwise xor of \a a and \a b */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pxor(const Packet& a, const Packet& b) { return a ^ b; }
+
+/** \internal \returns the bitwise andnot of \a a and \a b */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pandnot(const Packet& a, const Packet& b) { return a & (!b); }
+
+/** \internal \returns a packet version of \a *from, from must be 16 bytes aligned */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pload(const typename unpacket_traits<Packet>::type* from) { return *from; }
+
+/** \internal \returns a packet version of \a *from, (un-aligned load) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+ploadu(const typename unpacket_traits<Packet>::type* from) { return *from; }
+
+/** \internal \returns a packet with constant coefficients \a a, e.g.: (a,a,a,a) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pset1(const typename unpacket_traits<Packet>::type& a) { return a; }
+
+/** \internal \returns a packet with constant coefficients \a a[0], e.g.: (a[0],a[0],a[0],a[0]) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pload1(const typename unpacket_traits<Packet>::type *a) { return pset1<Packet>(*a); }
+
+/** \internal \returns a packet with elements of \a *from duplicated.
+ * For instance, for a packet of 8 elements, 4 scalars will be read from \a *from and
+ * duplicated to form: {from[0],from[0],from[1],from[1],from[2],from[2],from[3],from[3]}
+ * Currently, this function is only used for scalar * complex products.
+ */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+ploaddup(const typename unpacket_traits<Packet>::type* from) { return *from; }
+
+/** \internal \returns a packet with elements of \a *from quadrupled.
+ * For instance, for a packet of 8 elements, 2 scalars will be read from \a *from and
+ * replicated to form: {from[0],from[0],from[0],from[0],from[1],from[1],from[1],from[1]}
+ * Currently, this function is only used in matrix products.
+ * For packet sizes smaller than or equal to 4, this function is equivalent to pload1.
+ */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+ploadquad(const typename unpacket_traits<Packet>::type* from)
+{ return pload1<Packet>(from); }
+
+/** \internal equivalent to
+ * \code
+ * a0 = pload1(a+0);
+ * a1 = pload1(a+1);
+ * a2 = pload1(a+2);
+ * a3 = pload1(a+3);
+ * \endcode
+ * \sa pset1, pload1, ploaddup, pbroadcast2
+ */
+template<typename Packet> EIGEN_DEVICE_FUNC
+inline void pbroadcast4(const typename unpacket_traits<Packet>::type *a,
+ Packet& a0, Packet& a1, Packet& a2, Packet& a3)
+{
+ a0 = pload1<Packet>(a+0);
+ a1 = pload1<Packet>(a+1);
+ a2 = pload1<Packet>(a+2);
+ a3 = pload1<Packet>(a+3);
+}
+
+/** \internal equivalent to
+ * \code
+ * a0 = pload1(a+0);
+ * a1 = pload1(a+1);
+ * \endcode
+ * \sa pset1, pload1, ploaddup, pbroadcast4
+ */
+template<typename Packet> EIGEN_DEVICE_FUNC
+inline void pbroadcast2(const typename unpacket_traits<Packet>::type *a,
+ Packet& a0, Packet& a1)
+{
+ a0 = pload1<Packet>(a+0);
+ a1 = pload1<Packet>(a+1);
+}
+
+/** \internal \brief Returns a packet with coefficients (a,a+1,...,a+packet_size-1). */
+template<typename Scalar> inline typename packet_traits<Scalar>::type
+plset(const Scalar& a) { return a; }
+
+/** \internal copy the packet \a from to \a *to, \a to must be 16 bytes aligned */
+template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstore(Scalar* to, const Packet& from)
+{ (*to) = from; }
+
+/** \internal copy the packet \a from to \a *to, (un-aligned store) */
+template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pstoreu(Scalar* to, const Packet& from)
+{ (*to) = from; }
+
+ template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline Packet pgather(const Scalar* from, int /*stride*/)
+ { return ploadu<Packet>(from); }
+
+ template<typename Scalar, typename Packet> EIGEN_DEVICE_FUNC inline void pscatter(Scalar* to, const Packet& from, int /*stride*/)
+ { pstore(to, from); }
+
+/** \internal tries to do cache prefetching of \a addr */
+template<typename Scalar> EIGEN_DEVICE_FUNC inline void prefetch(const Scalar* addr)
+{
+#ifdef __CUDA_ARCH__
+#if defined(__LP64__)
+ // 64-bit pointer operand constraint for inlined asm
+ asm(" prefetch.L1 [ %1 ];" : "=l"(addr) : "l"(addr));
+#else
+ // 32-bit pointer operand constraint for inlined asm
+ asm(" prefetch.L1 [ %1 ];" : "=r"(addr) : "r"(addr));
+#endif
+#elif !defined(_MSC_VER)
+ __builtin_prefetch(addr);
+#endif
+}
+
+/** \internal \returns the first element of a packet */
+template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type pfirst(const Packet& a)
+{ return a; }
+
+/** \internal \returns a packet where the element i contains the sum of the packet of \a vec[i] */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+preduxp(const Packet* vecs) { return vecs[0]; }
+
+/** \internal \returns the sum of the elements of \a a*/
+template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux(const Packet& a)
+{ return a; }
+
+/** \internal \returns the sum of the elements of \a a by blocks of 4 elements.
+ * For a packet {a0, a1, a2, a3, a4, a5, a6, a7}, it returns a half packet {a0+a4, a1+a5, a2+a6, a3+a7}.
+ * For packet sizes smaller than or equal to 4, this boils down to a no-op.
+ */
+template<typename Packet> EIGEN_DEVICE_FUNC inline
+typename conditional<(unpacket_traits<Packet>::size%8)==0,typename unpacket_traits<Packet>::half,Packet>::type
+predux4(const Packet& a)
+{ return a; }
+
+/** \internal \returns the product of the elements of \a a*/
+template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_mul(const Packet& a)
+{ return a; }
+
+/** \internal \returns the min of the elements of \a a*/
+template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_min(const Packet& a)
+{ return a; }
+
+/** \internal \returns the max of the elements of \a a*/
+template<typename Packet> EIGEN_DEVICE_FUNC inline typename unpacket_traits<Packet>::type predux_max(const Packet& a)
+{ return a; }
+
+/** \internal \returns the reversed elements of \a a*/
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet preverse(const Packet& a)
+{ return a; }
+
+template<size_t offset, typename Packet>
+struct protate_impl
+{
+ // Empty so attempts to use this unimplemented path will fail to compile.
+ // Only specializations of this template should be used.
+};
+
+/** \internal \returns a packet with the coefficients rotated to the right in little-endian convention,
+ * by the given offset, e.g. for offset == 1:
+ * (packet[3], packet[2], packet[1], packet[0]) becomes (packet[0], packet[3], packet[2], packet[1])
+ */
+template<size_t offset, typename Packet> EIGEN_DEVICE_FUNC inline Packet protate(const Packet& a)
+{
+ return offset ? protate_impl<offset, Packet>::run(a) : a;
+}
+
+/** \internal \returns \a a with real and imaginary part flipped (for complex type only) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet pcplxflip(const Packet& a)
+{
+ // FIXME: uncomment the following in case we drop the internal imag and real functions.
+// using std::imag;
+// using std::real;
+ return Packet(imag(a),real(a));
+}
+
+/**************************
+* Special math functions
+***************************/
+
+/** \internal \returns the sine of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet psin(const Packet& a) { using std::sin; return sin(a); }
+
+/** \internal \returns the cosine of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet pcos(const Packet& a) { using std::cos; return cos(a); }
+
+/** \internal \returns the tan of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet ptan(const Packet& a) { using std::tan; return tan(a); }
+
+/** \internal \returns the arc sine of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet pasin(const Packet& a) { using std::asin; return asin(a); }
+
+/** \internal \returns the arc cosine of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet pacos(const Packet& a) { using std::acos; return acos(a); }
+
+/** \internal \returns the atan of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet patan(const Packet& a) { using std::atan; return atan(a); }
+
+/** \internal \returns the exp of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet pexp(const Packet& a) { using std::exp; return exp(a); }
+
+/** \internal \returns the log of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet plog(const Packet& a) { using std::log; return log(a); }
+
+/** \internal \returns the square-root of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet psqrt(const Packet& a) { using std::sqrt; return sqrt(a); }
+
+/** \internal \returns the reciprocal square-root of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet prsqrt(const Packet& a) {
+ using std::sqrt;
+ const Packet one(1);
+ return one/sqrt(a);
+}
+
+// Default ptanh approximation threshold, assumes single precision
+// floating point.
+template<typename Packet> Packet ptanh_approx_threshold() {
+ return pset1<Packet>(0.01);
+}
+
+/** \internal \returns the hyperbolic tan of \a a (coeff-wise) */
+template<typename Packet> EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+Packet ptanh(const Packet& x)
+{
+ const Packet one = pset1<Packet>(1);
+ const Packet two = pset1<Packet>(2);
+ const Packet three = pset1<Packet>(3);
+ const Packet thresh = ptanh_approx_threshold<Packet>();
+ const Packet x2 = pmul(x, x);
+ const Packet small_approx = pmul(x, psub(one, pdiv(x2, three)));
+ const Packet med_approx = psub(one, pdiv(two, padd(pexp(pmul(two, x)), one)));
+
+ // For |x| <= thresh, use the truncated series tanh(x) ~= x*(1 - x^2/3),
+ // which follows from tanh(x) = x*(1 - x^2/3 + ...) for |x| < pi/2.
+ // For |x| > thresh, use the identity tanh(x) = 1 - 2/(exp(2*x) + 1).
+ // The threshold is chosen such that the error of the small-|x| approximation is O(eps);
+ // thresh = 0.01 matches the single-precision (float32) approximation threshold.
+ return pselect(med_approx, small_approx, ple(pabs(x), thresh));
+}
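+// Illustrative check (for exposition only): with the default thresh of 0.01,
+// x = 0.005 uses the series branch, ptanh(x) ~= 0.005*(1 - 0.005*0.005/3),
+// while x = 1.0 uses the exp-based branch, ptanh(1.0) = 1 - 2/(exp(2.0) + 1) ~= 0.7616.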
+
+/***************************************************************************
+* The following functions might not have to be overwritten for vectorized types
+***************************************************************************/
+
+/** \internal copy a packet with constant coefficient \a a (e.g., [a,a,a,a]) to \a *to. \a to must be 16 bytes aligned */
+// NOTE: this function must really be templated on the packet type (think about different packet types for the same scalar type)
+template<typename Packet>
+inline void pstore1(typename unpacket_traits<Packet>::type* to, const typename unpacket_traits<Packet>::type& a)
+{
+ pstore(to, pset1<Packet>(a));
+}
+
+/** \internal \returns a * b + c (coeff-wise) */
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pmadd(const Packet& a,
+ const Packet& b,
+ const Packet& c)
+{ return padd(pmul(a, b),c); }
+
+/** \internal \returns a packet version of \a *from.
+ * If LoadMode equals #Aligned, \a from must be 16 bytes aligned */
+template<typename Packet, int LoadMode>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet ploadt(const typename unpacket_traits<Packet>::type* from)
+{
+ if(LoadMode == Aligned)
+ return pload<Packet>(from);
+ else
+ return ploadu<Packet>(from);
+}
+
+/** \internal copy the packet \a from to \a *to.
+ * If StoreMode equals #Aligned, \a to must be 16 bytes aligned */
+template<typename Scalar, typename Packet, int LoadMode>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void pstoret(Scalar* to, const Packet& from)
+{
+ if(LoadMode == Aligned)
+ pstore(to, from);
+ else
+ pstoreu(to, from);
+}
+
+/** \internal \returns a packet version of \a *from.
+ * Unlike ploadt, ploadt_ro takes advantage of the read-only memory path on the
+ * hardware, if available, to speed up the loading of data that won't be modified
+ * by the current computation.
+ */
+template<typename Packet, int LoadMode>
+inline Packet ploadt_ro(const typename unpacket_traits<Packet>::type* from)
+{
+ return ploadt<Packet, LoadMode>(from);
+}
+
+/** \internal default implementation of palign() allowing partial specialization */
+template<int Offset,typename PacketType>
+struct palign_impl
+{
+ // by default data are aligned, so there is nothing to be done :)
+ static inline void run(PacketType&, const PacketType&) {}
+};
+
+/** \internal update \a first using the concatenation of the packet_size minus \a Offset last elements
+ * of \a first and \a Offset first elements of \a second.
+ *
+ * This function is currently only used to optimize matrix-vector products on unaligned matrices.
+ * It takes 2 packets that represent a contiguous memory array, and returns a packet starting
+ * at the position \a Offset. For instance, for packets of 4 elements, we have:
+ * Input:
+ * - first = {f0,f1,f2,f3}
+ * - second = {s0,s1,s2,s3}
+ * Output:
+ * - if Offset==0 then {f0,f1,f2,f3}
+ * - if Offset==1 then {f1,f2,f3,s0}
+ * - if Offset==2 then {f2,f3,s0,s1}
+ * - if Offset==3 then {f3,s0,s1,s2}
+ */
+template<int Offset,typename PacketType>
+inline void palign(PacketType& first, const PacketType& second)
+{
+ palign_impl<Offset,PacketType>::run(first,second);
+}
+
+/***************************************************************************
+* Fast complex products (GCC generates a function call which is very slow)
+***************************************************************************/
+
+// Eigen+CUDA does not support complexes.
+#ifndef __CUDACC__
+
+template<> inline std::complex<float> pmul(const std::complex<float>& a, const std::complex<float>& b)
+{ return std::complex<float>(real(a)*real(b) - imag(a)*imag(b), imag(a)*real(b) + real(a)*imag(b)); }
+
+template<> inline std::complex<double> pmul(const std::complex<double>& a, const std::complex<double>& b)
+{ return std::complex<double>(real(a)*real(b) - imag(a)*imag(b), imag(a)*real(b) + real(a)*imag(b)); }
+
+#endif
+
+
+/***************************************************************************
+ * PacketBlock, that is a collection of N packets where the number of words
+ * in the packet is a multiple of N.
+***************************************************************************/
+template <typename Packet,int N=unpacket_traits<Packet>::size> struct PacketBlock {
+ Packet packet[N];
+};
+
+template<typename SquarePacketBlock> EIGEN_DEVICE_FUNC inline void
+ptranspose(SquarePacketBlock& /*kernel*/) {
+ // Nothing to do in the scalar case, i.e. a 1x1 matrix.
+}
+
+
+/***************************************************************************
+ * Selector, i.e. vector of N boolean values used to select (i.e. blend)
+ * words from 2 packets.
+***************************************************************************/
+template <size_t N> struct Selector {
+ bool select[N];
+};
+
+template<typename Packet> EIGEN_DEVICE_FUNC inline Packet
+pblend(const Selector<unpacket_traits<Packet>::size>& ifPacket, const Packet& thenPacket, const Packet& elsePacket) {
+ return ifPacket.select[0] ? thenPacket : elsePacket;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERIC_PACKET_MATH_H
diff --git a/third_party/eigen3/Eigen/src/Core/GlobalFunctions.h b/third_party/eigen3/Eigen/src/Core/GlobalFunctions.h
new file mode 100644
index 0000000000..0b1ce46ba2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/GlobalFunctions.h
@@ -0,0 +1,94 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010-2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GLOBAL_FUNCTIONS_H
+#define EIGEN_GLOBAL_FUNCTIONS_H
+
+#define EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(NAME,FUNCTOR) \
+ template<typename Derived> \
+ inline const Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived> \
+ NAME(const Eigen::ArrayBase<Derived>& x) { \
+ return x.derived(); \
+ }
+
+#define EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(NAME,FUNCTOR) \
+ \
+ template<typename Derived> \
+ struct NAME##_retval<ArrayBase<Derived> > \
+ { \
+ typedef const Eigen::CwiseUnaryOp<Eigen::internal::FUNCTOR<typename Derived::Scalar>, const Derived> type; \
+ }; \
+ template<typename Derived> \
+ struct NAME##_impl<ArrayBase<Derived> > \
+ { \
+ static inline typename NAME##_retval<ArrayBase<Derived> >::type run(const Eigen::ArrayBase<Derived>& x) \
+ { \
+ return x.derived(); \
+ } \
+ };
+
+
+namespace Eigen
+{
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(real,scalar_real_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(imag,scalar_imag_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(conj,scalar_conjugate_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sin,scalar_sin_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(cos,scalar_cos_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(asin,scalar_asin_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(acos,scalar_acos_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tan,scalar_tan_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(atan,scalar_atan_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(tanh,scalar_tanh_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(exp,scalar_exp_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(log,scalar_log_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(abs,scalar_abs_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_UNARY(sqrt,scalar_sqrt_op)
+
+ template<typename Derived>
+ inline const Eigen::CwiseUnaryOp<Eigen::internal::scalar_pow_op<typename Derived::Scalar>, const Derived>
+ pow(const Eigen::ArrayBase<Derived>& x, const typename Derived::Scalar& exponent) {
+ return x.derived().pow(exponent);
+ }
+
+ template<typename Derived>
+ inline const Eigen::CwiseBinaryOp<Eigen::internal::scalar_binary_pow_op<typename Derived::Scalar, typename Derived::Scalar>, const Derived, const Derived>
+ pow(const Eigen::ArrayBase<Derived>& x, const Eigen::ArrayBase<Derived>& exponents)
+ {
+ return Eigen::CwiseBinaryOp<Eigen::internal::scalar_binary_pow_op<typename Derived::Scalar, typename Derived::Scalar>, const Derived, const Derived>(
+ x.derived(),
+ exponents.derived()
+ );
+ }
+
+ /**
+ * \brief Component-wise division of a scalar by array elements.
+ **/
+ template <typename Derived>
+ inline const Eigen::CwiseUnaryOp<Eigen::internal::scalar_inverse_mult_op<typename Derived::Scalar>, const Derived>
+ operator/(const typename Derived::Scalar& s, const Eigen::ArrayBase<Derived>& a)
+ {
+ return Eigen::CwiseUnaryOp<Eigen::internal::scalar_inverse_mult_op<typename Derived::Scalar>, const Derived>(
+ a.derived(),
+ Eigen::internal::scalar_inverse_mult_op<typename Derived::Scalar>(s)
+ );
+ }
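+ // Illustrative usage sketch (for exposition only):
+ //   Eigen::ArrayXd a = Eigen::ArrayXd::LinSpaced(4, 1.0, 4.0);  // {1, 2, 3, 4}
+ //   Eigen::ArrayXd r = 2.0 / a;                                 // {2, 1, 2/3, 0.5}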
+
+ namespace internal
+ {
+ EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(real,scalar_real_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(imag,scalar_imag_op)
+ EIGEN_ARRAY_DECLARE_GLOBAL_EIGEN_UNARY(abs2,scalar_abs2_op)
+ }
+}
+
+// TODO: cleanly disable those functions that are not supported on Array (numext::real_ref, internal::random, internal::isApprox...)
+
+#endif // EIGEN_GLOBAL_FUNCTIONS_H
diff --git a/third_party/eigen3/Eigen/src/Core/IO.h b/third_party/eigen3/Eigen/src/Core/IO.h
new file mode 100644
index 0000000000..a1a90c119d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/IO.h
@@ -0,0 +1,257 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_IO_H
+#define EIGEN_IO_H
+
+namespace Eigen {
+
+enum { DontAlignCols = 1 };
+enum { StreamPrecision = -1,
+ FullPrecision = -2 };
+
+namespace internal {
+template<typename Derived>
+std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt);
+}
+
+/** \class IOFormat
+ * \ingroup Core_Module
+ *
+ * \brief Stores a set of parameters controlling the way matrices are printed
+ *
+ * List of available parameters:
+ * - \b precision number of digits for floating point values, or one of the special constants \c StreamPrecision and \c FullPrecision.
+ * The default is the special value \c StreamPrecision which means to use the
+ * stream's own precision setting, as set for instance using \c cout.precision(3). The other special value
+ * \c FullPrecision means that the number of digits will be computed to match the full precision of each floating-point
+ * type.
+ * - \b flags an OR-ed combination of flags; the default value is 0, and the only currently available flag is \c DontAlignCols, which
+ * allows one to disable the alignment of columns, resulting in faster code.
+ * - \b coeffSeparator string printed between two coefficients of the same row
+ * - \b rowSeparator string printed between two rows
+ * - \b rowPrefix string printed at the beginning of each row
+ * - \b rowSuffix string printed at the end of each row
+ * - \b matPrefix string printed at the beginning of the matrix
+ * - \b matSuffix string printed at the end of the matrix
+ *
+ * Example: \include IOFormat.cpp
+ * Output: \verbinclude IOFormat.out
+ *
+ * \sa DenseBase::format(), class WithFormat
+ */
+struct IOFormat
+{
+ /** Default constructor, see class IOFormat for the meaning of the parameters */
+ IOFormat(int _precision = StreamPrecision, int _flags = 0,
+ const std::string& _coeffSeparator = " ",
+ const std::string& _rowSeparator = "\n", const std::string& _rowPrefix="", const std::string& _rowSuffix="",
+ const std::string& _matPrefix="", const std::string& _matSuffix="")
+ : matPrefix(_matPrefix), matSuffix(_matSuffix), rowPrefix(_rowPrefix), rowSuffix(_rowSuffix), rowSeparator(_rowSeparator),
+ rowSpacer(""), coeffSeparator(_coeffSeparator), precision(_precision), flags(_flags)
+ {
+ // TODO check if rowPrefix, rowSuffix or rowSeparator contains a newline
+ // don't add rowSpacer if columns are not to be aligned
+ if((flags & DontAlignCols))
+ return;
+ int i = int(matSuffix.length())-1;
+ while (i>=0 && matSuffix[i]!='\n')
+ {
+ rowSpacer += ' ';
+ i--;
+ }
+ }
+ std::string matPrefix, matSuffix;
+ std::string rowPrefix, rowSuffix, rowSeparator, rowSpacer;
+ std::string coeffSeparator;
+ int precision;
+ int flags;
+};
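+// Illustrative usage sketch (for exposition only): a custom format with bracketed rows:
+//   Eigen::IOFormat fmt(Eigen::StreamPrecision, Eigen::DontAlignCols, ", ", ";\n", "[", "]");
+//   Eigen::Matrix2d m; m << 1, 2, 3, 4;
+//   std::cout << m.format(fmt) << "\n";   // prints "[1, 2];" then "[3, 4]"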
+
+/** \class WithFormat
+ * \ingroup Core_Module
+ *
+ * \brief Pseudo expression providing matrix output with given format
+ *
+ * \param ExpressionType the type of the object on which IO stream operations are performed
+ *
+ * This class represents an expression with stream operators controlled by a given IOFormat.
+ * It is the return type of DenseBase::format()
+ * and most of the time this is the only way it is used.
+ *
+ * See class IOFormat for some examples.
+ *
+ * \sa DenseBase::format(), class IOFormat
+ */
+template<typename ExpressionType>
+class WithFormat
+{
+ public:
+
+ WithFormat(const ExpressionType& matrix, const IOFormat& format)
+ : m_matrix(matrix), m_format(format)
+ {}
+
+ friend std::ostream & operator << (std::ostream & s, const WithFormat& wf)
+ {
+ return internal::print_matrix(s, wf.m_matrix.eval(), wf.m_format);
+ }
+
+ protected:
+ const typename ExpressionType::Nested m_matrix;
+ IOFormat m_format;
+};
+
+/** \returns a WithFormat proxy object allowing to print a matrix with the given
+ * format \a fmt.
+ *
+ * See class IOFormat for some examples.
+ *
+ * \sa class IOFormat, class WithFormat
+ */
+template<typename Derived>
+inline const WithFormat<Derived>
+DenseBase<Derived>::format(const IOFormat& fmt) const
+{
+ return WithFormat<Derived>(derived(), fmt);
+}
+
+namespace internal {
+
+template<typename Scalar, bool IsInteger>
+struct significant_decimals_default_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ static inline int run()
+ {
+ using std::ceil;
+ using std::log;
+ return cast<RealScalar,int>(ceil(-log(NumTraits<RealScalar>::epsilon())/log(RealScalar(10))));
+ }
+};
+
+template<typename Scalar>
+struct significant_decimals_default_impl<Scalar, true>
+{
+ static inline int run()
+ {
+ return 0;
+ }
+};
+
+template<typename Scalar>
+struct significant_decimals_impl
+ : significant_decimals_default_impl<Scalar, NumTraits<Scalar>::IsInteger>
+{};
+
+/** \internal
+ * print the matrix \a _m to the output stream \a s using the output format \a fmt */
+template<typename Derived>
+std::ostream & print_matrix(std::ostream & s, const Derived& _m, const IOFormat& fmt)
+{
+ if(_m.size() == 0)
+ {
+ s << fmt.matPrefix << fmt.matSuffix;
+ return s;
+ }
+
+ typename Derived::Nested m = _m;
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Index Index;
+
+ Index width = 0;
+
+ std::streamsize explicit_precision;
+ if(fmt.precision == StreamPrecision)
+ {
+ explicit_precision = 0;
+ }
+ else if(fmt.precision == FullPrecision)
+ {
+ if (NumTraits<Scalar>::IsInteger)
+ {
+ explicit_precision = 0;
+ }
+ else
+ {
+ explicit_precision = significant_decimals_impl<Scalar>::run();
+ }
+ }
+ else
+ {
+ explicit_precision = fmt.precision;
+ }
+
+ std::streamsize old_precision = 0;
+ if(explicit_precision) old_precision = s.precision(explicit_precision);
+
+ bool align_cols = !(fmt.flags & DontAlignCols);
+ if(align_cols)
+ {
+ // compute the largest width
+ for(Index j = 0; j < m.cols(); ++j)
+ for(Index i = 0; i < m.rows(); ++i)
+ {
+ std::stringstream sstr;
+ sstr.copyfmt(s);
+ sstr << m.coeff(i,j);
+ width = std::max<Index>(width, Index(sstr.str().length()));
+ }
+ }
+ s << fmt.matPrefix;
+ const char old_fill = s.fill();
+ s.fill(' ');
+ for(Index i = 0; i < m.rows(); ++i)
+ {
+ if (i)
+ s << fmt.rowSpacer;
+ s << fmt.rowPrefix;
+ if(width) s.width(width);
+ s << m.coeff(i, 0);
+ for(Index j = 1; j < m.cols(); ++j)
+ {
+ s << fmt.coeffSeparator;
+ if (width) s.width(width);
+ s << m.coeff(i, j);
+ }
+ s << fmt.rowSuffix;
+ if( i < m.rows() - 1)
+ s << fmt.rowSeparator;
+ }
+ s.fill(old_fill);
+ s << fmt.matSuffix;
+ if(explicit_precision) s.precision(old_precision);
+ return s;
+}
+
+} // end namespace internal
+
+/** \relates DenseBase
+ *
+ * Outputs the matrix to the given stream.
+ *
+ * If you wish to print the matrix with a format different than the default, use DenseBase::format().
+ *
+ * It is also possible to change the default format by defining EIGEN_DEFAULT_IO_FORMAT before including Eigen headers.
+ * If not defined, this will automatically be defined to Eigen::IOFormat(), that is, the Eigen::IOFormat with default parameters.
+ *
+ * \sa DenseBase::format()
+ */
+template<typename Derived>
+std::ostream & operator <<
+(std::ostream & s,
+ const DenseBase<Derived> & m)
+{
+ return internal::print_matrix(s, m.eval(), EIGEN_DEFAULT_IO_FORMAT);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_IO_H
diff --git a/third_party/eigen3/Eigen/src/Core/Map.h b/third_party/eigen3/Eigen/src/Core/Map.h
new file mode 100644
index 0000000000..0838d69e37
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Map.h
@@ -0,0 +1,185 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MAP_H
+#define EIGEN_MAP_H
+
+namespace Eigen {
+
+/** \class Map
+ * \ingroup Core_Module
+ *
+ * \brief A matrix or vector expression mapping an existing array of data.
+ *
+ * \tparam PlainObjectType the equivalent matrix type of the mapped data
+ * \tparam MapOptions specifies whether the pointer is \c #Aligned, or \c #Unaligned.
+ * The default is \c #Unaligned.
+ * \tparam StrideType optionally specifies strides. By default, Map assumes the memory layout
+ * of an ordinary, contiguous array. This can be overridden by specifying strides.
+ * The type passed here must be a specialization of the Stride template, see examples below.
+ *
+ * This class represents a matrix or vector expression mapping an existing array of data.
+ * It can be used to let Eigen interface without any overhead with non-Eigen data structures,
+ * such as plain C arrays or structures from other libraries. By default, it assumes that the
+ * data is laid out contiguously in memory. You can however override this by explicitly specifying
+ * inner and outer strides.
+ *
+ * Here's an example of simply mapping a contiguous array as a \ref TopicStorageOrders "column-major" matrix:
+ * \include Map_simple.cpp
+ * Output: \verbinclude Map_simple.out
+ *
+ * If you need to map non-contiguous arrays, you can do so by specifying strides:
+ *
+ * Here's an example of mapping an array as a vector, specifying an inner stride, that is, the pointer
+ * increment between two consecutive coefficients. Here, we're specifying the inner stride as a compile-time
+ * fixed value.
+ * \include Map_inner_stride.cpp
+ * Output: \verbinclude Map_inner_stride.out
+ *
+ * Here's an example of mapping an array while specifying an outer stride. Here, since we're mapping
+ * as a column-major matrix, 'outer stride' means the pointer increment between two consecutive columns.
+ * Here, we're specifying the outer stride as a runtime parameter. Note that here \c OuterStride<> is
+ * a short version of \c OuterStride<Dynamic> because the default template parameter of OuterStride
+ * is \c Dynamic.
+ * \include Map_outer_stride.cpp
+ * Output: \verbinclude Map_outer_stride.out
+ *
+ * For more details and for an example of specifying both an inner and an outer stride, see class Stride.
+ *
+ * \b Tip: to change the array of data mapped by a Map object, you can use the C++
+ * placement new syntax:
+ *
+ * Example: \include Map_placement_new.cpp
+ * Output: \verbinclude Map_placement_new.out
+ *
+ * This class is the return type of PlainObjectBase::Map() but can also be used directly.
+ *
+ * \sa PlainObjectBase::Map(), \ref TopicStorageOrders
+ */
+
+namespace internal {
+template<typename PlainObjectType, int MapOptions, typename StrideType>
+struct traits<Map<PlainObjectType, MapOptions, StrideType> >
+ : public traits<PlainObjectType>
+{
+ typedef traits<PlainObjectType> TraitsBase;
+ typedef typename PlainObjectType::Index Index;
+ typedef typename PlainObjectType::Scalar Scalar;
+ enum {
+ InnerStrideAtCompileTime = StrideType::InnerStrideAtCompileTime == 0
+ ? int(PlainObjectType::InnerStrideAtCompileTime)
+ : int(StrideType::InnerStrideAtCompileTime),
+ OuterStrideAtCompileTime = StrideType::OuterStrideAtCompileTime == 0
+ ? int(PlainObjectType::OuterStrideAtCompileTime)
+ : int(StrideType::OuterStrideAtCompileTime),
+ HasNoInnerStride = InnerStrideAtCompileTime == 1,
+ HasNoOuterStride = StrideType::OuterStrideAtCompileTime == 0,
+ HasNoStride = HasNoInnerStride && HasNoOuterStride,
+ IsAligned = bool(EIGEN_ALIGN) && ((int(MapOptions)&Aligned)==Aligned),
+ IsDynamicSize = PlainObjectType::SizeAtCompileTime==Dynamic,
+ KeepsPacketAccess = bool(HasNoInnerStride)
+ && ( bool(IsDynamicSize)
+ || HasNoOuterStride
+ || ( OuterStrideAtCompileTime!=Dynamic
+ && ((static_cast<int>(sizeof(Scalar))*OuterStrideAtCompileTime)%EIGEN_ALIGN_BYTES)==0 ) ),
+ Flags0 = TraitsBase::Flags & (~NestByRefBit),
+ Flags1 = IsAligned ? (int(Flags0) | AlignedBit) : (int(Flags0) & ~AlignedBit),
+ Flags2 = (bool(HasNoStride) || bool(PlainObjectType::IsVectorAtCompileTime))
+ ? int(Flags1) : int(Flags1 & ~LinearAccessBit),
+ Flags3 = is_lvalue<PlainObjectType>::value ? int(Flags2) : (int(Flags2) & ~LvalueBit),
+ Flags = KeepsPacketAccess ? int(Flags3) : (int(Flags3) & ~PacketAccessBit)
+ };
+private:
+ enum { Options }; // Expressions don't have Options
+};
+}
+
+template<typename PlainObjectType, int MapOptions, typename StrideType> class Map
+ : public MapBase<Map<PlainObjectType, MapOptions, StrideType> >
+{
+ public:
+
+ typedef MapBase<Map> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Map)
+
+ typedef typename Base::PointerType PointerType;
+#if EIGEN2_SUPPORT_STAGE <= STAGE30_FULL_EIGEN3_API
+ typedef const Scalar* PointerArgType;
+ inline PointerType cast_to_pointer_type(PointerArgType ptr) { return const_cast<PointerType>(ptr); }
+#else
+ typedef PointerType PointerArgType;
+ EIGEN_DEVICE_FUNC
+ inline PointerType cast_to_pointer_type(PointerArgType ptr) { return ptr; }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const
+ {
+ return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const
+ {
+ return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()
+ : IsVectorAtCompileTime ? this->size()
+ : int(Flags)&RowMajorBit ? this->cols()
+ : this->rows();
+ }
+
+ /** Constructor in the fixed-size case.
+ *
+ * \param dataPtr pointer to the array to map
+ * \param a_stride optional Stride object, passing the strides.
+ */
+ EIGEN_DEVICE_FUNC
+ inline Map(PointerArgType dataPtr, const StrideType& a_stride = StrideType())
+ : Base(cast_to_pointer_type(dataPtr)), m_stride(a_stride)
+ {
+ PlainObjectType::Base::_check_template_params();
+ }
+
+ /** Constructor in the dynamic-size vector case.
+ *
+ * \param dataPtr pointer to the array to map
+ * \param a_size the size of the vector expression
+ * \param a_stride optional Stride object, passing the strides.
+ */
+ EIGEN_DEVICE_FUNC
+ inline Map(PointerArgType dataPtr, Index a_size, const StrideType& a_stride = StrideType())
+ : Base(cast_to_pointer_type(dataPtr), a_size), m_stride(a_stride)
+ {
+ PlainObjectType::Base::_check_template_params();
+ }
+
+ /** Constructor in the dynamic-size matrix case.
+ *
+ * \param dataPtr pointer to the array to map
+ * \param nbRows the number of rows of the matrix expression
+ * \param nbCols the number of columns of the matrix expression
+ * \param a_stride optional Stride object, passing the strides.
+ */
+ EIGEN_DEVICE_FUNC
+ inline Map(PointerArgType dataPtr, Index nbRows, Index nbCols, const StrideType& a_stride = StrideType())
+ : Base(cast_to_pointer_type(dataPtr), nbRows, nbCols), m_stride(a_stride)
+ {
+ PlainObjectType::Base::_check_template_params();
+ }
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Map)
+
+ protected:
+ StrideType m_stride;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_MAP_H
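
A short usage sketch for the Map class defined above, assuming <Eigen/Dense> and the InnerStride helper it refers to. The maps are views: no data is copied, and writes go straight to the underlying array.

    #include <iostream>
    #include <Eigen/Dense>

    int main() {
      double data[] = {1, 2, 3, 4, 5, 6, 7, 8};

      // View the first 6 entries as a 3x2 column-major matrix (no copy).
      Eigen::Map<Eigen::MatrixXd> m(data, 3, 2);
      std::cout << m << "\n\n";

      // View every other entry as a 4-element vector via an inner stride of 2
      // (the pointer increment between consecutive coefficients).
      Eigen::Map<Eigen::VectorXd, Eigen::Unaligned, Eigen::InnerStride<2> > strided(data, 4);
      std::cout << strided.transpose() << "\n";

      // Writing through the map modifies the original array.
      m(0, 0) = 42.0;
      std::cout << data[0] << std::endl;  // prints 42
      return 0;
    }
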
diff --git a/third_party/eigen3/Eigen/src/Core/MapBase.h b/third_party/eigen3/Eigen/src/Core/MapBase.h
new file mode 100644
index 0000000000..e8ecb175bf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/MapBase.h
@@ -0,0 +1,257 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MAPBASE_H
+#define EIGEN_MAPBASE_H
+
+#define EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived) \
+ EIGEN_STATIC_ASSERT((int(internal::traits<Derived>::Flags) & LinearAccessBit) || Derived::IsVectorAtCompileTime, \
+ YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT)
+
+namespace Eigen {
+
+/** \class MapBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for Map and Block expression with direct access
+ *
+ * \sa class Map, class Block
+ */
+template<typename Derived> class MapBase<Derived, ReadOnlyAccessors>
+ : public internal::dense_xpr_base<Derived>::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<Derived>::type Base;
+ enum {
+ RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
+ ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
+ SizeAtCompileTime = Base::SizeAtCompileTime
+ };
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::conditional<
+ bool(internal::is_lvalue<Derived>::value),
+ Scalar *,
+ const Scalar *>::type
+ PointerType;
+
+ using Base::derived;
+// using Base::RowsAtCompileTime;
+// using Base::ColsAtCompileTime;
+// using Base::SizeAtCompileTime;
+ using Base::MaxRowsAtCompileTime;
+ using Base::MaxColsAtCompileTime;
+ using Base::MaxSizeAtCompileTime;
+ using Base::IsVectorAtCompileTime;
+ using Base::Flags;
+ using Base::IsRowMajor;
+
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::coeff;
+ using Base::coeffRef;
+ using Base::lazyAssign;
+ using Base::eval;
+
+ using Base::innerStride;
+ using Base::outerStride;
+ using Base::rowStride;
+ using Base::colStride;
+
+ // bug 217 - compile error on ICC 11.1
+ using Base::operator=;
+
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+
+ EIGEN_DEVICE_FUNC inline Index rows() const { return m_rows.value(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return m_cols.value(); }
+
+ /** Returns a pointer to the first coefficient of the matrix or vector.
+ *
+ * \note When addressing this data, make sure to honor the strides returned by innerStride() and outerStride().
+ *
+ * \sa innerStride(), outerStride()
+ */
+ inline const Scalar* data() const { return m_data; }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeff(Index rowId, Index colId) const
+ {
+ return m_data[colId * colStride() + rowId * rowStride()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeff(Index index) const
+ {
+ EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived)
+ return m_data[index * innerStride()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return this->m_data[colId * colStride() + rowId * rowStride()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived)
+ return this->m_data[index * innerStride()];
+ }
+
+ template<int LoadMode>
+ inline PacketScalar packet(Index rowId, Index colId) const
+ {
+ return internal::ploadt<PacketScalar, LoadMode>
+ (m_data + (colId * colStride() + rowId * rowStride()));
+ }
+
+ template<int LoadMode>
+ inline PacketScalar packet(Index index) const
+ {
+ EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived)
+ return internal::ploadt<PacketScalar, LoadMode>(m_data + index * innerStride());
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline MapBase(PointerType dataPtr) : m_data(dataPtr), m_rows(RowsAtCompileTime), m_cols(ColsAtCompileTime)
+ {
+ EIGEN_STATIC_ASSERT_FIXED_SIZE(Derived)
+ checkSanity();
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline MapBase(PointerType dataPtr, Index vecSize)
+ : m_data(dataPtr),
+ m_rows(RowsAtCompileTime == Dynamic ? vecSize : Index(RowsAtCompileTime)),
+ m_cols(ColsAtCompileTime == Dynamic ? vecSize : Index(ColsAtCompileTime))
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ eigen_assert(vecSize >= 0);
+ eigen_assert(dataPtr == 0 || SizeAtCompileTime == Dynamic || SizeAtCompileTime == vecSize);
+ checkSanity();
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline MapBase(PointerType dataPtr, Index nbRows, Index nbCols)
+ : m_data(dataPtr), m_rows(nbRows), m_cols(nbCols)
+ {
+ eigen_assert( (dataPtr == 0)
+ || ( nbRows >= 0 && (RowsAtCompileTime == Dynamic || RowsAtCompileTime == nbRows)
+ && nbCols >= 0 && (ColsAtCompileTime == Dynamic || ColsAtCompileTime == nbCols)));
+ checkSanity();
+ }
+
+ protected:
+
+ EIGEN_DEVICE_FUNC
+ void checkSanity() const
+ {
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(internal::traits<Derived>::Flags&PacketAccessBit,
+ internal::inner_stride_at_compile_time<Derived>::ret==1),
+ PACKET_ACCESS_REQUIRES_TO_HAVE_INNER_STRIDE_FIXED_TO_1);
+ eigen_assert(EIGEN_IMPLIES(internal::traits<Derived>::Flags&AlignedBit, (size_t(m_data) % EIGEN_ALIGN_BYTES) == 0)
+ && "data is not aligned");
+ }
+
+ PointerType m_data;
+ const internal::variable_if_dynamic<Index, RowsAtCompileTime> m_rows;
+ const internal::variable_if_dynamic<Index, ColsAtCompileTime> m_cols;
+};
+
+template<typename Derived> class MapBase<Derived, WriteAccessors>
+ : public MapBase<Derived, ReadOnlyAccessors>
+{
+ public:
+
+ typedef MapBase<Derived, ReadOnlyAccessors> Base;
+
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::PacketScalar PacketScalar;
+ typedef typename Base::Index Index;
+ typedef typename Base::PointerType PointerType;
+
+ using Base::derived;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::coeff;
+ using Base::coeffRef;
+
+ using Base::innerStride;
+ using Base::outerStride;
+ using Base::rowStride;
+ using Base::colStride;
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<Derived>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar* data() const { return this->m_data; }
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue* data() { return this->m_data; } // no const-cast here so non-const-correct code will give a compile error
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue& coeffRef(Index row, Index col)
+ {
+ return this->m_data[col * colStride() + row * rowStride()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue& coeffRef(Index index)
+ {
+ EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived)
+ return this->m_data[index * innerStride()];
+ }
+
+ template<int StoreMode>
+ inline void writePacket(Index row, Index col, const PacketScalar& val)
+ {
+ internal::pstoret<Scalar, PacketScalar, StoreMode>
+ (this->m_data + (col * colStride() + row * rowStride()), val);
+ }
+
+ template<int StoreMode>
+ inline void writePacket(Index index, const PacketScalar& val)
+ {
+ EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS(Derived)
+ internal::pstoret<Scalar, PacketScalar, StoreMode>
+ (this->m_data + index * innerStride(), val);
+ }
+
+ EIGEN_DEVICE_FUNC explicit inline MapBase(PointerType dataPtr) : Base(dataPtr) {}
+ EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index vecSize) : Base(dataPtr, vecSize) {}
+ EIGEN_DEVICE_FUNC inline MapBase(PointerType dataPtr, Index nbRows, Index nbCols) : Base(dataPtr, nbRows, nbCols) {}
+
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const MapBase& other)
+ {
+ Base::Base::operator=(other);
+ return derived();
+ }
+
+ using Base::Base::operator=;
+};
+
+#undef EIGEN_STATIC_ASSERT_INDEX_BASED_ACCESS
+
+} // end namespace Eigen
+
+#endif // EIGEN_MAPBASE_H
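
MapBase::data() above comes with the contract that raw access must honor innerStride() and outerStride(). A small sketch exercising that contract through a Block (which also derives from MapBase and has direct access); the block shape and contents are arbitrary:

    #include <cassert>
    #include <Eigen/Dense>

    int main() {
      Eigen::MatrixXd a = Eigen::MatrixXd::Random(4, 4);
      Eigen::Block<Eigen::MatrixXd> blk = a.block(1, 1, 2, 3);

      // Walk the mapped storage by hand: for this column-major block,
      // coefficient (i,j) lives at data()[i*innerStride() + j*outerStride()].
      const double* p = blk.data();
      for (int j = 0; j < blk.cols(); ++j)
        for (int i = 0; i < blk.rows(); ++i)
          assert(p[i * blk.innerStride() + j * blk.outerStride()] == blk(i, j));
      return 0;
    }
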
diff --git a/third_party/eigen3/Eigen/src/Core/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/MathFunctions.h
new file mode 100644
index 0000000000..941f72d224
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/MathFunctions.h
@@ -0,0 +1,1089 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATHFUNCTIONS_H
+#define EIGEN_MATHFUNCTIONS_H
+
+// source: http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
+#define EIGEN_PI 3.141592653589793238462643383279502884197169399375105820974944592307816406
+
+namespace Eigen {
+
+// On WINCE, std::abs is defined for int only, so let's define our own overloads:
+// This issue has been confirmed with MSVC 2008 only, but the issue might exist for more recent versions too.
+#if EIGEN_OS_WINCE && EIGEN_COMP_MSVC && EIGEN_COMP_MSVC<=1500
+long abs(long x) { return (labs(x)); }
+double abs(double x) { return (fabs(x)); }
+float abs(float x) { return (fabsf(x)); }
+long double abs(long double x) { return (fabsl(x)); }
+#endif
+
+namespace internal {
+
+/** \internal \struct global_math_functions_filtering_base
+ *
+ * What it does:
+ * Defines a typedef 'type' as follows:
+ * - if type T has a member typedef Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl, then
+ * global_math_functions_filtering_base<T>::type is a typedef for it.
+ * - otherwise, global_math_functions_filtering_base<T>::type is a typedef for T.
+ *
+ * How it's used:
+ * To allow defining the global math functions (like sin...) in certain cases, such as for Array expressions.
+ * When you do sin(array1+array2), the object array1+array2 has a complicated expression type; all you want to know
+ * is that it inherits ArrayBase. So we implement a partial specialization of sin_impl for ArrayBase<Derived>.
+ * So we must make sure to use sin_impl<ArrayBase<Derived> > and not sin_impl<Derived>, otherwise our partial specialization
+ * won't be used. How does sin know that? That's exactly what global_math_functions_filtering_base tells it.
+ *
+ * How it's implemented:
+ * SFINAE in the style of enable_if. Highly susceptible to breaking compilers. With GCC, it sure does work, but if you replace
+ * the typename dummy with an integer template parameter, it doesn't work anymore!
+ */
+
+template<typename T, typename dummy = void>
+struct global_math_functions_filtering_base
+{
+ typedef T type;
+};
+
+template<typename T> struct always_void { typedef void type; };
+
+template<typename T>
+struct global_math_functions_filtering_base
+ <T,
+ typename always_void<typename T::Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl>::type
+ >
+{
+ typedef typename T::Eigen_BaseClassForSpecializationOfGlobalMathFuncImpl type;
+};
+
+#define EIGEN_MATHFUNC_IMPL(func, scalar) Eigen::internal::func##_impl<typename Eigen::internal::global_math_functions_filtering_base<scalar>::type>
+#define EIGEN_MATHFUNC_RETVAL(func, scalar) typename Eigen::internal::func##_retval<typename Eigen::internal::global_math_functions_filtering_base<scalar>::type>::type
+
+/****************************************************************************
+* Implementation of real *
+****************************************************************************/
+
+template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
+struct real_default_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ return x;
+ }
+};
+
+template<typename Scalar>
+struct real_default_impl<Scalar,true>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ using std::real;
+ return real(x);
+ }
+};
+
+template<typename Scalar> struct real_impl : real_default_impl<Scalar> {};
+
+template<typename Scalar>
+struct real_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of imag *
+****************************************************************************/
+
+template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
+struct imag_default_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar&)
+ {
+ return RealScalar(0);
+ }
+};
+
+template<typename Scalar>
+struct imag_default_impl<Scalar,true>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ using std::imag;
+ return imag(x);
+ }
+};
+
+template<typename Scalar> struct imag_impl : imag_default_impl<Scalar> {};
+
+template<typename Scalar>
+struct imag_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of real_ref *
+****************************************************************************/
+
+template<typename Scalar>
+struct real_ref_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar& run(Scalar& x)
+ {
+ return reinterpret_cast<RealScalar*>(&x)[0];
+ }
+ EIGEN_DEVICE_FUNC
+ static inline const RealScalar& run(const Scalar& x)
+ {
+ return reinterpret_cast<const RealScalar*>(&x)[0];
+ }
+};
+
+template<typename Scalar>
+struct real_ref_retval
+{
+ typedef typename NumTraits<Scalar>::Real & type;
+};
+
+/****************************************************************************
+* Implementation of imag_ref *
+****************************************************************************/
+
+template<typename Scalar, bool IsComplex>
+struct imag_ref_default_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar& run(Scalar& x)
+ {
+ return reinterpret_cast<RealScalar*>(&x)[1];
+ }
+ EIGEN_DEVICE_FUNC
+ static inline const RealScalar& run(const Scalar& x)
+ {
+ return reinterpret_cast<RealScalar*>(&x)[1];
+ }
+};
+
+template<typename Scalar>
+struct imag_ref_default_impl<Scalar, false>
+{
+ EIGEN_DEVICE_FUNC
+ static inline Scalar run(Scalar&)
+ {
+ return Scalar(0);
+ }
+ EIGEN_DEVICE_FUNC
+ static inline const Scalar run(const Scalar&)
+ {
+ return Scalar(0);
+ }
+};
+
+template<typename Scalar>
+struct imag_ref_impl : imag_ref_default_impl<Scalar, NumTraits<Scalar>::IsComplex> {};
+
+template<typename Scalar>
+struct imag_ref_retval
+{
+ typedef typename NumTraits<Scalar>::Real & type;
+};
+
+/****************************************************************************
+* Implementation of conj *
+****************************************************************************/
+
+template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
+struct conj_impl
+{
+ EIGEN_DEVICE_FUNC
+ static inline Scalar run(const Scalar& x)
+ {
+ return x;
+ }
+};
+
+template<typename Scalar>
+struct conj_impl<Scalar,true>
+{
+ EIGEN_DEVICE_FUNC
+ static inline Scalar run(const Scalar& x)
+ {
+ using std::conj;
+ return conj(x);
+ }
+};
+
+template<typename Scalar>
+struct conj_retval
+{
+ typedef Scalar type;
+};
+
+/****************************************************************************
+* Implementation of abs2 *
+****************************************************************************/
+
+template<typename Scalar>
+struct abs2_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ return x*x;
+ }
+};
+
+template<typename RealScalar>
+struct abs2_impl<std::complex<RealScalar> >
+{
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const std::complex<RealScalar>& x)
+ {
+ return real(x)*real(x) + imag(x)*imag(x);
+ }
+};
+
+template<typename Scalar>
+struct abs2_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of norm1 *
+****************************************************************************/
+
+template<typename Scalar, bool IsComplex>
+struct norm1_default_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ using std::abs;
+ return abs(real(x)) + abs(imag(x));
+ }
+};
+
+template<typename Scalar>
+struct norm1_default_impl<Scalar, false>
+{
+ EIGEN_DEVICE_FUNC
+ static inline Scalar run(const Scalar& x)
+ {
+ using std::abs;
+ return abs(x);
+ }
+};
+
+template<typename Scalar>
+struct norm1_impl : norm1_default_impl<Scalar, NumTraits<Scalar>::IsComplex> {};
+
+template<typename Scalar>
+struct norm1_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of hypot *
+****************************************************************************/
+
+template<typename Scalar>
+struct hypot_impl
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ static inline RealScalar run(const Scalar& x, const Scalar& y)
+ {
+ using std::abs;
+ using std::sqrt;
+ RealScalar _x = abs(x);
+ RealScalar _y = abs(y);
+ RealScalar p, qp;
+ if(_x>_y)
+ {
+ p = _x;
+ qp = _y / p;
+ }
+ else
+ {
+ p = _y;
+ qp = _x / p;
+ }
+ if(p==RealScalar(0)) return RealScalar(0);
+ return p * sqrt(RealScalar(1) + qp*qp);
+ }
+};
+
+template<typename Scalar>
+struct hypot_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of cast *
+****************************************************************************/
+
+template<typename OldType, typename NewType>
+struct cast_impl
+{
+ EIGEN_DEVICE_FUNC static inline NewType run(const OldType& x)
+ {
+ return static_cast<NewType>(x);
+ }
+};
+
+// here, for once, we're plainly returning NewType: we don't want cast to do weird things.
+
+template<typename OldType, typename NewType>
+EIGEN_DEVICE_FUNC inline NewType cast(const OldType& x)
+{
+ return cast_impl<OldType, NewType>::run(x);
+}
+
+/****************************************************************************
+* Implementation of atanh2 *
+****************************************************************************/
+
+template<typename Scalar>
+struct atanh2_impl
+{
+ static inline Scalar run(const Scalar& x, const Scalar& r)
+ {
+ EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
+ using std::abs;
+ using std::log;
+ using std::sqrt;
+ Scalar z = x / r;
+ if (r == 0 || abs(z) > sqrt(NumTraits<Scalar>::epsilon()))
+ return log((r + x) / (r - x)) / 2;
+ else
+ return z + z*z*z / 3;
+ }
+};
+
+template<typename RealScalar>
+struct atanh2_impl<std::complex<RealScalar> >
+{
+ typedef std::complex<RealScalar> Scalar;
+ static inline Scalar run(const Scalar& x, const Scalar& r)
+ {
+ using std::log;
+ using std::norm;
+ using std::sqrt;
+ Scalar z = x / r;
+ if (r == Scalar(0) || norm(z) > NumTraits<RealScalar>::epsilon())
+ return RealScalar(0.5) * log((r + x) / (r - x));
+ else
+ return z + z*z*z / RealScalar(3);
+ }
+};
+
+template<typename Scalar>
+struct atanh2_retval
+{
+ typedef Scalar type;
+};
+
+/****************************************************************************
+* Implementation of round *
+****************************************************************************/
+
+#if EIGEN_HAS_CXX11_MATH
+ template<typename Scalar>
+ struct round_impl {
+ static inline Scalar run(const Scalar& x)
+ {
+ EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
+ using std::round;
+ return round(x);
+ }
+ };
+#else
+ template<typename Scalar>
+ struct round_impl
+ {
+ static inline Scalar run(const Scalar& x)
+ {
+ EIGEN_STATIC_ASSERT((!NumTraits<Scalar>::IsComplex), NUMERIC_TYPE_MUST_BE_REAL)
+ using std::floor;
+ using std::ceil;
+ return (x > 0.0) ? floor(x + 0.5) : ceil(x - 0.5);
+ }
+ };
+#endif
+
+template<typename Scalar>
+struct round_retval
+{
+ typedef Scalar type;
+};
+
+/****************************************************************************
+* Implementation of arg *
+****************************************************************************/
+
+#if EIGEN_HAS_CXX11_MATH
+ template<typename Scalar>
+ struct arg_impl {
+ static inline Scalar run(const Scalar& x)
+ {
+ using std::arg;
+ return arg(x);
+ }
+ };
+#else
+ template<typename Scalar, bool IsComplex = NumTraits<Scalar>::IsComplex>
+ struct arg_default_impl
+ {
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ return (x < 0.0) ? EIGEN_PI : 0.0;
+ }
+ };
+
+ template<typename Scalar>
+ struct arg_default_impl<Scalar,true>
+ {
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline RealScalar run(const Scalar& x)
+ {
+ using std::arg;
+ return arg(x);
+ }
+ };
+
+ template<typename Scalar> struct arg_impl : arg_default_impl<Scalar> {};
+#endif
+
+template<typename Scalar>
+struct arg_retval
+{
+ typedef typename NumTraits<Scalar>::Real type;
+};
+
+/****************************************************************************
+* Implementation of log1p *
+****************************************************************************/
+template<typename Scalar, bool isComplex = NumTraits<Scalar>::IsComplex >
+struct log1p_impl
+{
+ static inline Scalar run(const Scalar& x)
+ {
+ EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ using std::log;
+ Scalar x1p = RealScalar(1) + x;
+ return ( x1p == Scalar(1) ) ? x : x * ( log(x1p) / (x1p - RealScalar(1)) );
+ }
+};
+
+#if EIGEN_HAS_CXX11_MATH
+template<typename Scalar>
+struct log1p_impl<Scalar, false> {
+ static inline Scalar run(const Scalar& x)
+ {
+ EIGEN_STATIC_ASSERT_NON_INTEGER(Scalar)
+ using std::log1p;
+ return log1p(x);
+ }
+};
+#endif
+
+template<typename Scalar>
+struct log1p_retval
+{
+ typedef Scalar type;
+};
+
+/****************************************************************************
+* Implementation of pow *
+****************************************************************************/
+
+template<typename Scalar, bool IsInteger>
+struct pow_default_impl
+{
+ typedef Scalar retval;
+ static inline Scalar run(const Scalar& x, const Scalar& y)
+ {
+ using std::pow;
+ return pow(x, y);
+ }
+};
+
+template<typename Scalar>
+struct pow_default_impl<Scalar, true>
+{
+ static inline Scalar run(Scalar x, Scalar y)
+ {
+ Scalar res(1);
+ eigen_assert(!NumTraits<Scalar>::IsSigned || y >= 0);
+ if(y & 1) res *= x;
+ y >>= 1;
+ while(y)
+ {
+ x *= x;
+ if(y&1) res *= x;
+ y >>= 1;
+ }
+ return res;
+ }
+};
+
+template<typename Scalar>
+struct pow_impl : pow_default_impl<Scalar, NumTraits<Scalar>::IsInteger> {};
+
+template<typename Scalar>
+struct pow_retval
+{
+ typedef Scalar type;
+};
+
+/****************************************************************************
+* Implementation of random *
+****************************************************************************/
+
+template<typename Scalar,
+ bool IsComplex,
+ bool IsInteger>
+struct random_default_impl {};
+
+template<typename Scalar>
+struct random_impl : random_default_impl<Scalar, NumTraits<Scalar>::IsComplex, NumTraits<Scalar>::IsInteger> {};
+
+template<typename Scalar>
+struct random_retval
+{
+ typedef Scalar type;
+};
+
+template<typename Scalar> inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random(const Scalar& x, const Scalar& y);
+template<typename Scalar> inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random();
+
+template<typename Scalar>
+struct random_default_impl<Scalar, false, false>
+{
+ static inline Scalar run(const Scalar& x, const Scalar& y)
+ {
+ return x + (y-x) * Scalar(std::rand()) / Scalar(RAND_MAX);
+ }
+ static inline Scalar run()
+ {
+ return run(Scalar(NumTraits<Scalar>::IsSigned ? -1 : 0), Scalar(1));
+ }
+};
+
+enum {
+ meta_floor_log2_terminate,
+ meta_floor_log2_move_up,
+ meta_floor_log2_move_down,
+ meta_floor_log2_bogus
+};
+
+template<unsigned int n, int lower, int upper> struct meta_floor_log2_selector
+{
+ enum { middle = (lower + upper) / 2,
+ value = (upper <= lower + 1) ? int(meta_floor_log2_terminate)
+ : (n < (1 << middle)) ? int(meta_floor_log2_move_down)
+ : (n==0) ? int(meta_floor_log2_bogus)
+ : int(meta_floor_log2_move_up)
+ };
+};
+
+template<unsigned int n,
+ int lower = 0,
+ int upper = sizeof(unsigned int) * CHAR_BIT - 1,
+ int selector = meta_floor_log2_selector<n, lower, upper>::value>
+struct meta_floor_log2 {};
+
+template<unsigned int n, int lower, int upper>
+struct meta_floor_log2<n, lower, upper, meta_floor_log2_move_down>
+{
+ enum { value = meta_floor_log2<n, lower, meta_floor_log2_selector<n, lower, upper>::middle>::value };
+};
+
+template<unsigned int n, int lower, int upper>
+struct meta_floor_log2<n, lower, upper, meta_floor_log2_move_up>
+{
+ enum { value = meta_floor_log2<n, meta_floor_log2_selector<n, lower, upper>::middle, upper>::value };
+};
+
+template<unsigned int n, int lower, int upper>
+struct meta_floor_log2<n, lower, upper, meta_floor_log2_terminate>
+{
+ enum { value = (n >= ((unsigned int)(1) << (lower+1))) ? lower+1 : lower };
+};
+
+template<unsigned int n, int lower, int upper>
+struct meta_floor_log2<n, lower, upper, meta_floor_log2_bogus>
+{
+ // no value, error at compile time
+};
+
+template<typename Scalar>
+struct random_default_impl<Scalar, false, true>
+{
+ static inline Scalar run(const Scalar& x, const Scalar& y)
+ {
+ typedef typename conditional<NumTraits<Scalar>::IsSigned,std::ptrdiff_t,std::size_t>::type ScalarX;
+ if(y<x)
+ return x;
+ std::size_t range = ScalarX(y)-ScalarX(x);
+ std::size_t offset = 0;
+ // rejection sampling
+ std::size_t divisor = (range+RAND_MAX-1)/(range+1);
+ std::size_t multiplier = (range+RAND_MAX-1)/std::size_t(RAND_MAX);
+
+ do {
+ offset = ( (std::size_t(std::rand()) * multiplier) / divisor );
+ } while (offset > range);
+
+ return Scalar(ScalarX(x) + offset);
+ }
+
+ static inline Scalar run()
+ {
+#ifdef EIGEN_MAKING_DOCS
+ return run(Scalar(NumTraits<Scalar>::IsSigned ? -10 : 0), Scalar(10));
+#else
+ enum { rand_bits = meta_floor_log2<(unsigned int)(RAND_MAX)+1>::value,
+ scalar_bits = sizeof(Scalar) * CHAR_BIT,
+ shift = EIGEN_PLAIN_ENUM_MAX(0, int(rand_bits) - int(scalar_bits)),
+ offset = NumTraits<Scalar>::IsSigned ? (1 << (EIGEN_PLAIN_ENUM_MIN(rand_bits,scalar_bits)-1)) : 0
+ };
+ return Scalar((std::rand() >> shift) - offset);
+#endif
+ }
+};
+
+template<typename Scalar>
+struct random_default_impl<Scalar, true, false>
+{
+ static inline Scalar run(const Scalar& x, const Scalar& y)
+ {
+ return Scalar(random(real(x), real(y)),
+ random(imag(x), imag(y)));
+ }
+ static inline Scalar run()
+ {
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ return Scalar(random<RealScalar>(), random<RealScalar>());
+ }
+};
+
+template<typename Scalar>
+inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random(const Scalar& x, const Scalar& y)
+{
+ return EIGEN_MATHFUNC_IMPL(random, Scalar)::run(x, y);
+}
+
+template<typename Scalar>
+inline EIGEN_MATHFUNC_RETVAL(random, Scalar) random()
+{
+ return EIGEN_MATHFUNC_IMPL(random, Scalar)::run();
+}
+
+} // end namespace internal
+
+/****************************************************************************
+* Generic math functions *
+****************************************************************************/
+
+namespace numext {
+
+#ifndef __CUDA_ARCH__
+template<typename T>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y)
+{
+ EIGEN_USING_STD_MATH(min);
+ return min EIGEN_NOT_A_MACRO (x,y);
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
+{
+ EIGEN_USING_STD_MATH(max);
+ return max EIGEN_NOT_A_MACRO (x,y);
+}
+#else
+template<typename T>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE T mini(const T& x, const T& y)
+{
+ return y < x ? y : x;
+}
+template<>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE float mini(const float& x, const float& y)
+{
+ return fmin(x, y);
+}
+template<typename T>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE T maxi(const T& x, const T& y)
+{
+ return x < y ? y : x;
+}
+template<>
+EIGEN_DEVICE_FUNC
+EIGEN_ALWAYS_INLINE float maxi(const float& x, const float& y)
+{
+ return fmax(x, y);
+}
+#endif
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(real, Scalar) real(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(real, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline typename internal::add_const_on_value_type< EIGEN_MATHFUNC_RETVAL(real_ref, Scalar) >::type real_ref(const Scalar& x)
+{
+ return internal::real_ref_impl<Scalar>::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(real_ref, Scalar) real_ref(Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(real_ref, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(imag, Scalar) imag(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(imag, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(arg, Scalar) arg(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(arg, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline typename internal::add_const_on_value_type< EIGEN_MATHFUNC_RETVAL(imag_ref, Scalar) >::type imag_ref(const Scalar& x)
+{
+ return internal::imag_ref_impl<Scalar>::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(imag_ref, Scalar) imag_ref(Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(imag_ref, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(conj, Scalar) conj(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(conj, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(abs2, Scalar) abs2(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(abs2, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(norm1, Scalar) norm1(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(norm1, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(hypot, Scalar) hypot(const Scalar& x, const Scalar& y)
+{
+ return EIGEN_MATHFUNC_IMPL(hypot, Scalar)::run(x, y);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(log1p, Scalar) log1p(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(log1p, Scalar)::run(x);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(atanh2, Scalar) atanh2(const Scalar& x, const Scalar& y)
+{
+ return EIGEN_MATHFUNC_IMPL(atanh2, Scalar)::run(x, y);
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(pow, Scalar) pow(const Scalar& x, const Scalar& y)
+{
+ return EIGEN_MATHFUNC_IMPL(pow, Scalar)::run(x, y);
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isfinite)(const T& x)
+{
+ #if EIGEN_HAS_CXX11_MATH
+ using std::isfinite;
+ return isfinite(x);
+ #else
+ return x<NumTraits<T>::highest() && x>NumTraits<T>::lowest();
+ #endif
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isfinite)(const std::complex<T>& x)
+{
+ return numext::isfinite(numext::real(x)) && numext::isfinite(numext::imag(x));
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isnan)(const T& x)
+{
+ #if EIGEN_HAS_CXX11_MATH
+ using std::isnan;
+ return isnan(x);
+ #else
+ return x != x;
+ #endif
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isnan)(const std::complex<T>& x)
+{
+ return numext::isnan(numext::real(x)) || numext::isnan(numext::imag(x));
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isinf)(const T& x)
+{
+ #if EIGEN_HAS_CXX11_MATH
+ using std::isinf;
+ return isinf(x);
+ #else
+ return x>NumTraits<T>::highest() || x<NumTraits<T>::lowest();
+ #endif
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+bool (isinf)(const std::complex<T>& x)
+{
+ return (numext::isinf(numext::real(x)) || numext::isinf(numext::imag(x))) && (!numext::isnan(x));
+}
+
+template<typename Scalar>
+EIGEN_DEVICE_FUNC
+inline EIGEN_MATHFUNC_RETVAL(round, Scalar) round(const Scalar& x)
+{
+ return EIGEN_MATHFUNC_IMPL(round, Scalar)::run(x);
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+T (floor)(const T& x)
+{
+ using std::floor;
+ return floor(x);
+}
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+T (ceil)(const T& x)
+{
+ using std::ceil;
+ return ceil(x);
+}
+
+// Log base 2 for 32 bits positive integers.
+// Conveniently returns 0 for x==0.
+inline int log2(int x)
+{
+ eigen_assert(x>=0);
+ unsigned int v(x);
+ static const int table[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 };
+ v |= v >> 1;
+ v |= v >> 2;
+ v |= v >> 4;
+ v |= v >> 8;
+ v |= v >> 16;
+ return table[(v * 0x07C4ACDDU) >> 27];
+}
+
+} // end namespace numext
+
+namespace internal {
+
+/****************************************************************************
+* Implementation of fuzzy comparisons *
+****************************************************************************/
+
+template<typename Scalar,
+ bool IsComplex,
+ bool IsInteger>
+struct scalar_fuzzy_default_impl {};
+
+template<typename Scalar>
+struct scalar_fuzzy_default_impl<Scalar, false, false>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ template<typename OtherScalar> EIGEN_DEVICE_FUNC
+ static inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y, const RealScalar& prec)
+ {
+ using std::abs;
+ return abs(x) <= abs(y) * prec;
+ }
+ EIGEN_DEVICE_FUNC
+ static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar& prec)
+ {
+ using std::abs;
+ return abs(x - y) <= numext::mini(abs(x), abs(y)) * prec;
+ }
+ EIGEN_DEVICE_FUNC
+ static inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y, const RealScalar& prec)
+ {
+ return x <= y || isApprox(x, y, prec);
+ }
+};
+
+template<typename Scalar>
+struct scalar_fuzzy_default_impl<Scalar, false, true>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ template<typename OtherScalar> EIGEN_DEVICE_FUNC
+ static inline bool isMuchSmallerThan(const Scalar& x, const Scalar&, const RealScalar&)
+ {
+ return x == Scalar(0);
+ }
+ EIGEN_DEVICE_FUNC
+ static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar&)
+ {
+ return x == y;
+ }
+ EIGEN_DEVICE_FUNC
+ static inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y, const RealScalar&)
+ {
+ return x <= y;
+ }
+};
+
+template<typename Scalar>
+struct scalar_fuzzy_default_impl<Scalar, true, false>
+{
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ template<typename OtherScalar>
+ static inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y, const RealScalar& prec)
+ {
+ return numext::abs2(x) <= numext::abs2(y) * prec * prec;
+ }
+ static inline bool isApprox(const Scalar& x, const Scalar& y, const RealScalar& prec)
+ {
+ return numext::abs2(x - y) <= numext::mini(numext::abs2(x), numext::abs2(y)) * prec * prec;
+ }
+};
+
+template<typename Scalar>
+struct scalar_fuzzy_impl : scalar_fuzzy_default_impl<Scalar, NumTraits<Scalar>::IsComplex, NumTraits<Scalar>::IsInteger> {};
+
+template<typename Scalar, typename OtherScalar> EIGEN_DEVICE_FUNC
+inline bool isMuchSmallerThan(const Scalar& x, const OtherScalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return scalar_fuzzy_impl<Scalar>::template isMuchSmallerThan<OtherScalar>(x, y, precision);
+}
+
+template<typename Scalar> EIGEN_DEVICE_FUNC
+inline bool isApprox(const Scalar& x, const Scalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return scalar_fuzzy_impl<Scalar>::isApprox(x, y, precision);
+}
+
+template<typename Scalar> EIGEN_DEVICE_FUNC
+inline bool isApproxOrLessThan(const Scalar& x, const Scalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return scalar_fuzzy_impl<Scalar>::isApproxOrLessThan(x, y, precision);
+}
+
+/******************************************
+*** The special case of the bool type ***
+******************************************/
+
+template<> struct random_impl<bool>
+{
+ static inline bool run()
+ {
+ return random<int>(0,1)==0 ? false : true;
+ }
+};
+
+template<> struct scalar_fuzzy_impl<bool>
+{
+ typedef bool RealScalar;
+
+ template<typename OtherScalar> EIGEN_DEVICE_FUNC
+ static inline bool isMuchSmallerThan(const bool& x, const bool&, const bool&)
+ {
+ return !x;
+ }
+
+ EIGEN_DEVICE_FUNC
+ static inline bool isApprox(bool x, bool y, bool)
+ {
+ return x == y;
+ }
+
+ EIGEN_DEVICE_FUNC
+ static inline bool isApproxOrLessThan(const bool& x, const bool& y, const bool&)
+ {
+ return (!x) || y;
+ }
+
+};
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATHFUNCTIONS_H
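
Among the implementations above, pow_default_impl<Scalar, true> is exponentiation by squaring: the exponent is scanned bit by bit, the base is squared at each step, and it is folded into the result whenever the current bit is set, so only O(log y) multiplications are needed. An equivalent standalone sketch with plain integer types (not the Eigen template itself):

    #include <cassert>

    long ipow(long x, unsigned y) {
      long res = 1;
      while (y) {
        if (y & 1) res *= x;  // current bit set: fold this power of x into the result
        x *= x;               // x, x^2, x^4, x^8, ...
        y >>= 1;
      }
      return res;
    }

    int main() {
      assert(ipow(3, 0) == 1);
      assert(ipow(3, 5) == 243);
      assert(ipow(2, 10) == 1024);
      return 0;
    }
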
diff --git a/third_party/eigen3/Eigen/src/Core/Matrix.h b/third_party/eigen3/Eigen/src/Core/Matrix.h
new file mode 100644
index 0000000000..782d67f54f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Matrix.h
@@ -0,0 +1,443 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATRIX_H
+#define EIGEN_MATRIX_H
+
+namespace Eigen {
+
+/** \class Matrix
+ * \ingroup Core_Module
+ *
+ * \brief The matrix class, also used for vectors and row-vectors
+ *
+ * The %Matrix class is the work-horse for all \em dense (\ref dense "note") matrices and vectors within Eigen.
+ * Vectors are matrices with one column, and row-vectors are matrices with one row.
+ *
+ * The %Matrix class encompasses \em both fixed-size and dynamic-size objects (\ref fixedsize "note").
+ *
+ * The first three template parameters are required:
+ * \tparam _Scalar \anchor matrix_tparam_scalar Numeric type, e.g. float, double, int or std::complex<float>.
+ * User-defined scalar types are supported as well (see \ref user_defined_scalars "here").
+ * \tparam _Rows Number of rows, or \b Dynamic
+ * \tparam _Cols Number of columns, or \b Dynamic
+ *
+ * The remaining template parameters are optional -- in most cases you don't have to worry about them.
+ * \tparam _Options \anchor matrix_tparam_options A combination of either \b #RowMajor or \b #ColMajor, and of either
+ * \b #AutoAlign or \b #DontAlign.
+ * The former controls \ref TopicStorageOrders "storage order", and defaults to column-major. The latter controls alignment, which is required
+ * for vectorization. It defaults to aligning matrices except for fixed sizes that aren't a multiple of the packet size.
+ * \tparam _MaxRows Maximum number of rows. Defaults to \a _Rows (\ref maxrows "note").
+ * \tparam _MaxCols Maximum number of columns. Defaults to \a _Cols (\ref maxrows "note").
+ *
+ * Eigen provides a number of typedefs covering the usual cases. Here are some examples:
+ *
+ * \li \c Matrix2d is a 2x2 square matrix of doubles (\c Matrix<double, 2, 2>)
+ * \li \c Vector4f is a vector of 4 floats (\c Matrix<float, 4, 1>)
+ * \li \c RowVector3i is a row-vector of 3 ints (\c Matrix<int, 1, 3>)
+ *
+ * \li \c MatrixXf is a dynamic-size matrix of floats (\c Matrix<float, Dynamic, Dynamic>)
+ * \li \c VectorXf is a dynamic-size vector of floats (\c Matrix<float, Dynamic, 1>)
+ *
+ * \li \c Matrix2Xf is a partially fixed-size (dynamic-size) matrix of floats (\c Matrix<float, 2, Dynamic>)
+ * \li \c MatrixX3d is a partially dynamic-size (fixed-size) matrix of double (\c Matrix<double, Dynamic, 3>)
+ *
+ * See \link matrixtypedefs this page \endlink for a complete list of predefined \em %Matrix and \em Vector typedefs.
+ *
+ * You can access elements of vectors and matrices using normal subscripting:
+ *
+ * \code
+ * Eigen::VectorXd v(10);
+ * v[0] = 0.1;
+ * v[1] = 0.2;
+ * v(0) = 0.3;
+ * v(1) = 0.4;
+ *
+ * Eigen::MatrixXi m(10, 10);
+ * m(0, 1) = 1;
+ * m(0, 2) = 2;
+ * m(0, 3) = 3;
+ * \endcode
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_MATRIX_PLUGIN.
+ *
+ * <i><b>Some notes:</b></i>
+ *
+ * <dl>
+ * <dt><b>\anchor dense Dense versus sparse:</b></dt>
+ * <dd>This %Matrix class handles dense, not sparse matrices and vectors. For sparse matrices and vectors, see the Sparse module.
+ *
+ * Dense matrices and vectors are plain usual arrays of coefficients. All the coefficients are stored, in an ordinary contiguous array.
+ * This is unlike Sparse matrices and vectors where the coefficients are stored as a list of nonzero coefficients.</dd>
+ *
+ * <dt><b>\anchor fixedsize Fixed-size versus dynamic-size:</b></dt>
+ * <dd>Fixed-size means that the numbers of rows and columns are known at compile-time. In this case, Eigen allocates the array
+ * of coefficients as a fixed-size array, as a class member. This makes sense for very small matrices, typically up to 4x4, sometimes up
+ * to 16x16. Larger matrices should be declared as dynamic-size even if one happens to know their size at compile-time.
+ *
+ * Dynamic-size means that the numbers of rows or columns are not necessarily known at compile-time. In this case they are runtime
+ * variables, and the array of coefficients is allocated dynamically on the heap.
+ *
+ * Note that \em dense matrices, be they Fixed-size or Dynamic-size, <em>do not</em> expand dynamically in the sense of a std::map.
+ * If you want this behavior, see the Sparse module.</dd>
+ *
+ * <dt><b>\anchor maxrows _MaxRows and _MaxCols:</b></dt>
+ * <dd>In most cases, one just leaves these parameters to the default values.
+ * These parameters mean the maximum size of rows and columns that the matrix may have. They are useful in cases
+ * when the exact numbers of rows and columns are not known at compile-time, but it is known at compile-time that they cannot
+ * exceed a certain value. This happens when taking dynamic-size blocks inside fixed-size matrices: in this case _MaxRows and _MaxCols
+ * are the dimensions of the original matrix, while _Rows and _Cols are Dynamic.</dd>
+ * </dl>
+ *
+ * \see MatrixBase for the majority of the API methods for matrices, \ref TopicClassHierarchy,
+ * \ref TopicStorageOrders
+ */
+
+namespace internal {
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct traits<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+{
+ typedef _Scalar Scalar;
+ typedef Dense StorageKind;
+ typedef DenseIndex Index;
+ typedef MatrixXpr XprKind;
+ enum {
+ RowsAtCompileTime = _Rows,
+ ColsAtCompileTime = _Cols,
+ MaxRowsAtCompileTime = _MaxRows,
+ MaxColsAtCompileTime = _MaxCols,
+ Flags = compute_matrix_flags<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>::ret,
+ CoeffReadCost = NumTraits<Scalar>::ReadCost,
+ Options = _Options,
+ InnerStrideAtCompileTime = 1,
+ OuterStrideAtCompileTime = (Options&RowMajor) ? ColsAtCompileTime : RowsAtCompileTime
+ };
+};
+}
+
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+class Matrix
+ : public PlainObjectBase<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+{
+ public:
+
+ /** \brief Base class typedef.
+ * \sa PlainObjectBase
+ */
+ typedef PlainObjectBase<Matrix> Base;
+
+ enum { Options = _Options };
+
+ EIGEN_DENSE_PUBLIC_INTERFACE(Matrix)
+
+ typedef typename Base::PlainObject PlainObject;
+
+ using Base::base;
+ using Base::coeffRef;
+
+ /**
+ * \brief Assigns matrices to each other.
+ *
+ * \note This is a special case of the templated operator=. Its purpose is
+ * to prevent a default operator= from hiding the templated operator=.
+ *
+ * \callgraph
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix& operator=(const Matrix& other)
+ {
+ return Base::_set(other);
+ }
+
+ /** \internal
+ * \brief Copies the value of the expression \a other into \c *this with automatic resizing.
+ *
+ * *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized),
+ * it will be initialized.
+ *
+ * Note that copying a row-vector into a vector (and conversely) is allowed.
+ * The resizing, if any, is then done in the appropriate way so that row-vectors
+ * remain row-vectors and vectors remain vectors.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix& operator=(const MatrixBase<OtherDerived>& other)
+ {
+ return Base::_set(other);
+ }
+
+ /* Here, doxygen failed to copy the brief information when using \copydoc */
+
+ /**
+ * \brief Copies the generic expression \a other into *this.
+ * \copydetails DenseBase::operator=(const EigenBase<OtherDerived> &other)
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix& operator=(const EigenBase<OtherDerived> &other)
+ {
+ return Base::operator=(other);
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix& operator=(const ReturnByValue<OtherDerived>& func)
+ {
+ return Base::operator=(func);
+ }
+
+ /** \brief Default constructor.
+ *
+ * For fixed-size matrices, does nothing.
+ *
+ * For dynamic-size matrices, creates an empty matrix of size 0. Does not allocate any array. Such a matrix
+ * is called a null matrix. This constructor is the unique way to create null matrices: resizing
+ * a matrix to 0 is not supported.
+ *
+ * \sa resize(Index,Index)
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix() : Base()
+ {
+ Base::_check_template_params();
+ EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+ // FIXME is it still needed
+ EIGEN_DEVICE_FUNC
+ Matrix(internal::constructor_without_unaligned_array_assert)
+ : Base(internal::constructor_without_unaligned_array_assert())
+ { Base::_check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED }
+
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ Matrix(Matrix&& other)
+ : Base(std::move(other))
+ {
+ Base::_check_template_params();
+ if (RowsAtCompileTime!=Dynamic && ColsAtCompileTime!=Dynamic)
+ Base::_set_noalias(other);
+ }
+ Matrix& operator=(Matrix&& other)
+ {
+ other.swap(*this);
+ return *this;
+ }
+#endif
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+
+ // This constructor is for both 1x1 matrices and dynamic vectors
+ template<typename T>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE explicit Matrix(const T& x)
+ {
+ Base::_check_template_params();
+ Base::template _init1<T>(x);
+ }
+
+ template<typename T0, typename T1>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const T0& x, const T1& y)
+ {
+ Base::_check_template_params();
+ Base::template _init2<T0,T1>(x, y);
+ }
+ #else
+ /** \brief Constructs a fixed-sized matrix initialized with coefficients starting at \a data */
+ EIGEN_DEVICE_FUNC
+ explicit Matrix(const Scalar *data);
+
+ /** \brief Constructs a vector or row-vector with given dimension. \only_for_vectors
+ *
+ * Note that this is only useful for dynamic-size vectors. For fixed-size vectors,
+ * it is redundant to pass the dimension here, so it makes more sense to use the default
+ * constructor Matrix() instead.
+ */
+ EIGEN_STRONG_INLINE explicit Matrix(Index dim);
+ /** \brief Constructs an initialized 1x1 matrix with the given coefficient */
+ Matrix(const Scalar& x);
+ /** \brief Constructs an uninitialized matrix with \a rows rows and \a cols columns.
+ *
+ * This is useful for dynamic-size matrices. For fixed-size matrices,
+ * it is redundant to pass these parameters, so one should use the default constructor
+ * Matrix() instead. */
+ EIGEN_DEVICE_FUNC
+ Matrix(Index rows, Index cols);
+ /** \brief Constructs an initialized 2D vector with given coefficients */
+ Matrix(const Scalar& x, const Scalar& y);
+ #endif
+
+ /** \brief Constructs an initialized 3D vector with given coefficients */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z)
+ {
+ Base::_check_template_params();
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Matrix, 3)
+ m_storage.data()[0] = x;
+ m_storage.data()[1] = y;
+ m_storage.data()[2] = z;
+ }
+ /** \brief Constructs an initialized 4D vector with given coefficients */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const Scalar& x, const Scalar& y, const Scalar& z, const Scalar& w)
+ {
+ Base::_check_template_params();
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Matrix, 4)
+ m_storage.data()[0] = x;
+ m_storage.data()[1] = y;
+ m_storage.data()[2] = z;
+ m_storage.data()[3] = w;
+ }
+
+
+ /** \brief Constructor copying the value of the expression \a other */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const MatrixBase<OtherDerived>& other)
+ : Base(other.rows() * other.cols(), other.rows(), other.cols())
+ {
+ // This test resides here, to bring the error messages closer to the user. Normally, these checks
+ // are performed deeply within the library, thus causing long and scary error traces.
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ Base::_check_template_params();
+ Base::_set_noalias(other);
+ }
+ /** \brief Copy constructor */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const Matrix& other)
+ : Base(other.rows() * other.cols(), other.rows(), other.cols())
+ {
+ Base::_check_template_params();
+ Base::_set_noalias(other);
+ }
+ /** \brief Copy constructor with in-place evaluation */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const ReturnByValue<OtherDerived>& other)
+ {
+ Base::_check_template_params();
+ Base::resize(other.rows(), other.cols());
+ other.evalTo(*this);
+ }
+
+ /** \brief Copy constructor for generic expressions.
+ * \sa MatrixBase::operator=(const EigenBase<OtherDerived>&)
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Matrix(const EigenBase<OtherDerived> &other)
+ : Base(other.derived().rows() * other.derived().cols(), other.derived().rows(), other.derived().cols())
+ {
+ Base::_check_template_params();
+ Base::_resize_to_match(other);
+ // FIXME/CHECK: wouldn't *this = other.derived() be more efficient? It would allow
+ // going for pure _set() implementations.
+ *this = other;
+ }
+
+ /** \internal
+ * \brief Override MatrixBase::swap() since for dynamic-sized matrices
+ * of the same type it is enough to swap the data pointers.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void swap(MatrixBase<OtherDerived> const & other)
+ { this->_swap(other.derived()); }
+
+ EIGEN_DEVICE_FUNC inline Index innerStride() const { return 1; }
+ EIGEN_DEVICE_FUNC inline Index outerStride() const { return this->innerSize(); }
+
+ /////////// Geometry module ///////////
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ explicit Matrix(const RotationBase<OtherDerived,ColsAtCompileTime>& r);
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Matrix& operator=(const RotationBase<OtherDerived,ColsAtCompileTime>& r);
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived>
+ explicit Matrix(const eigen2_RotationBase<OtherDerived,ColsAtCompileTime>& r);
+ template<typename OtherDerived>
+ Matrix& operator=(const eigen2_RotationBase<OtherDerived,ColsAtCompileTime>& r);
+ #endif
+
+ // allow extending Matrix outside Eigen
+ #ifdef EIGEN_MATRIX_PLUGIN
+ #include EIGEN_MATRIX_PLUGIN
+ #endif
+
+ protected:
+ template <typename Derived, typename OtherDerived, bool IsVector>
+ friend struct internal::conservative_resize_like_impl;
+
+ using Base::m_storage;
+};
+
+/** \defgroup matrixtypedefs Global matrix typedefs
+ *
+ * \ingroup Core_Module
+ *
+ * Eigen defines several typedef shortcuts for most common matrix and vector types.
+ *
+ * The general patterns are the following:
+ *
+ * \c MatrixSizeType where \c Size can be \c 2,\c 3,\c 4 for fixed size square matrices or \c X for dynamic size,
+ * and where \c Type can be \c i for integer, \c f for float, \c d for double, \c cf for complex float, \c cd
+ * for complex double.
+ *
+ * For example, \c Matrix3d is a fixed-size 3x3 matrix type of doubles, and \c MatrixXf is a dynamic-size matrix of floats.
+ *
+ * There are also \c VectorSizeType and \c RowVectorSizeType which are self-explanatory. For example, \c Vector4cf is
+ * a fixed-size vector of 4 complex floats.
+ *
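+ * As a brief usage sketch (illustrative, not part of the original documentation):
+ *
+ * \code
+ * Eigen::Matrix3d A;          // fixed-size 3x3 matrix of double
+ * A.setIdentity();
+ * Eigen::VectorXf v(5);       // dynamic-size column vector of float
+ * v.setZero();
+ * Eigen::RowVector4cf r;      // fixed-size row vector of 4 complex<float>
+ * \endcode
+ *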
+ * \sa class Matrix
+ */
+
+#define EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \
+/** \ingroup matrixtypedefs */ \
+typedef Matrix<Type, Size, Size> Matrix##SizeSuffix##TypeSuffix; \
+/** \ingroup matrixtypedefs */ \
+typedef Matrix<Type, Size, 1> Vector##SizeSuffix##TypeSuffix; \
+/** \ingroup matrixtypedefs */ \
+typedef Matrix<Type, 1, Size> RowVector##SizeSuffix##TypeSuffix;
+
+#define EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, Size) \
+/** \ingroup matrixtypedefs */ \
+typedef Matrix<Type, Size, Dynamic> Matrix##Size##X##TypeSuffix; \
+/** \ingroup matrixtypedefs */ \
+typedef Matrix<Type, Dynamic, Size> Matrix##X##Size##TypeSuffix;
+
+#define EIGEN_MAKE_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 2, 2) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 3, 3) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 4, 4) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Dynamic, X) \
+EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 2) \
+EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 3) \
+EIGEN_MAKE_FIXED_TYPEDEFS(Type, TypeSuffix, 4)
+
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(int, i)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(float, f)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(double, d)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(std::complex<float>, cf)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(std::complex<double>, cd)
+
+#undef EIGEN_MAKE_TYPEDEFS_ALL_SIZES
+#undef EIGEN_MAKE_TYPEDEFS
+#undef EIGEN_MAKE_FIXED_TYPEDEFS
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/MatrixBase.h b/third_party/eigen3/Eigen/src/Core/MatrixBase.h
new file mode 100644
index 0000000000..598b38ed47
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/MatrixBase.h
@@ -0,0 +1,614 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATRIXBASE_H
+#define EIGEN_MATRIXBASE_H
+
+namespace Eigen {
+
+/** \class MatrixBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for all dense matrices, vectors, and expressions
+ *
+ * This class is the base that is inherited by all matrix, vector, and related expression
+ * types. Most of the Eigen API is contained in this class and its base classes. Other important
+ * classes for the Eigen API are Matrix, and VectorwiseOp.
+ *
+ * Note that some methods are defined in other modules such as the \ref LU_Module LU module
+ * for all functions related to matrix inversions.
+ *
+ * \tparam Derived is the derived type, e.g. a matrix type, or an expression, etc.
+ *
+ * When writing a function taking Eigen objects as argument, if you want your function
+ * to take as argument any matrix, vector, or expression, just let it take a
+ * MatrixBase argument. As an example, here is a function printFirstRow which, given
+ * a matrix, vector, or expression \a x, prints the first row of \a x.
+ *
+ * \code
+ template<typename Derived>
+ void printFirstRow(const Eigen::MatrixBase<Derived>& x)
+ {
+ cout << x.row(0) << endl;
+ }
+ * \endcode
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_MATRIXBASE_PLUGIN.
+ *
+ * \sa \ref TopicClassHierarchy
+ */
+template<typename Derived> class MatrixBase
+ : public DenseBase<Derived>
+{
+ public:
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef MatrixBase StorageBaseType;
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ typedef DenseBase<Derived> Base;
+ using Base::RowsAtCompileTime;
+ using Base::ColsAtCompileTime;
+ using Base::SizeAtCompileTime;
+ using Base::MaxRowsAtCompileTime;
+ using Base::MaxColsAtCompileTime;
+ using Base::MaxSizeAtCompileTime;
+ using Base::IsVectorAtCompileTime;
+ using Base::Flags;
+ using Base::CoeffReadCost;
+
+ using Base::derived;
+ using Base::const_cast_derived;
+ using Base::rows;
+ using Base::cols;
+ using Base::size;
+ using Base::coeff;
+ using Base::coeffRef;
+ using Base::lazyAssign;
+ using Base::eval;
+ using Base::operator+=;
+ using Base::operator-=;
+ using Base::operator*=;
+ using Base::operator/=;
+
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+ typedef typename Base::ConstTransposeReturnType ConstTransposeReturnType;
+ typedef typename Base::RowXpr RowXpr;
+ typedef typename Base::ColXpr ColXpr;
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** type of the equivalent square matrix */
+ typedef Matrix<Scalar,EIGEN_SIZE_MAX(RowsAtCompileTime,ColsAtCompileTime),
+ EIGEN_SIZE_MAX(RowsAtCompileTime,ColsAtCompileTime)> SquareMatrixType;
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+ /** \returns the size of the main diagonal, which is min(rows(),cols()).
+ * \sa rows(), cols(), SizeAtCompileTime. */
+ EIGEN_DEVICE_FUNC
+ inline Index diagonalSize() const { return (std::min)(rows(),cols()); }
+
+ /** \brief The plain matrix type corresponding to this expression.
+ *
+ * This is not necessarily exactly the return type of eval(). In the case of plain matrices,
+ * the return type of eval() is a const reference to a matrix, not a matrix! It is however guaranteed
+ * that the return type of eval() is either PlainObject or const PlainObject&.
+ */
+ typedef Matrix<typename internal::traits<Derived>::Scalar,
+ internal::traits<Derived>::RowsAtCompileTime,
+ internal::traits<Derived>::ColsAtCompileTime,
+ AutoAlign | (internal::traits<Derived>::Flags&RowMajorBit ? RowMajor : ColMajor),
+ internal::traits<Derived>::MaxRowsAtCompileTime,
+ internal::traits<Derived>::MaxColsAtCompileTime
+ > PlainObject;
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal Represents a matrix with all coefficients equal to one another*/
+ typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>,Derived> ConstantReturnType;
+ /** \internal the return type of MatrixBase::adjoint() */
+ typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, ConstTransposeReturnType>,
+ ConstTransposeReturnType
+ >::type AdjointReturnType;
+ /** \internal Return type of eigenvalues() */
+ typedef Matrix<std::complex<RealScalar>, internal::traits<Derived>::ColsAtCompileTime, 1, ColMajor> EigenvaluesReturnType;
+ /** \internal the return type of identity */
+ typedef CwiseNullaryOp<internal::scalar_identity_op<Scalar>,Derived> IdentityReturnType;
+ /** \internal the return type of unit vectors */
+ typedef Block<const CwiseNullaryOp<internal::scalar_identity_op<Scalar>, SquareMatrixType>,
+ internal::traits<Derived>::RowsAtCompileTime,
+ internal::traits<Derived>::ColsAtCompileTime> BasisReturnType;
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::MatrixBase
+# include "../plugins/CommonCwiseUnaryOps.h"
+# include "../plugins/CommonCwiseBinaryOps.h"
+# include "../plugins/MatrixCwiseUnaryOps.h"
+# include "../plugins/MatrixCwiseBinaryOps.h"
+# ifdef EIGEN_MATRIXBASE_PLUGIN
+# include EIGEN_MATRIXBASE_PLUGIN
+# endif
+#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
+
+ /** Special case of the template operator=, in order to prevent the compiler
+ * from generating a default operator= (issue hit with g++ 4.1)
+ */
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const MatrixBase& other);
+
+ // We cannot inherit here via Base::operator= since it is causing
+ // trouble with MSVC.
+
+ template <typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const DenseBase<OtherDerived>& other);
+
+ template <typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const EigenBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator=(const ReturnByValue<OtherDerived>& other);
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ Derived& lazyAssign(const ProductBase<ProductDerived, Lhs,Rhs>& other);
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator+=(const MatrixBase<OtherDerived>& other);
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ Derived& operator-=(const MatrixBase<OtherDerived>& other);
+
+#ifdef __CUDACC__
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ const typename LazyProductReturnType<Derived,OtherDerived>::Type
+ operator*(const MatrixBase<OtherDerived> &other) const
+ { return this->lazyProduct(other); }
+#else
+
+#ifdef EIGEN_TEST_EVALUATORS
+ template<typename OtherDerived>
+ const Product<Derived,OtherDerived>
+ operator*(const MatrixBase<OtherDerived> &other) const;
+#else
+ template<typename OtherDerived>
+ const typename ProductReturnType<Derived,OtherDerived>::Type
+ operator*(const MatrixBase<OtherDerived> &other) const;
+#endif
+
+#endif
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ const typename LazyProductReturnType<Derived,OtherDerived>::Type
+ lazyProduct(const MatrixBase<OtherDerived> &other) const;
+
+ template<typename OtherDerived>
+ Derived& operator*=(const EigenBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ void applyOnTheLeft(const EigenBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ void applyOnTheRight(const EigenBase<OtherDerived>& other);
+
+ template<typename DiagonalDerived>
+ EIGEN_DEVICE_FUNC
+ const DiagonalProduct<Derived, DiagonalDerived, OnTheRight>
+ operator*(const DiagonalBase<DiagonalDerived> &diagonal) const;
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ typename internal::scalar_product_traits<typename internal::traits<Derived>::Scalar,typename internal::traits<OtherDerived>::Scalar>::ReturnType
+ dot(const MatrixBase<OtherDerived>& other) const;
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived>
+ Scalar eigen2_dot(const MatrixBase<OtherDerived>& other) const;
+ #endif
+
+ EIGEN_DEVICE_FUNC RealScalar squaredNorm() const;
+ EIGEN_DEVICE_FUNC RealScalar norm() const;
+ RealScalar stableNorm() const;
+ RealScalar blueNorm() const;
+ RealScalar hypotNorm() const;
+ EIGEN_DEVICE_FUNC const PlainObject normalized() const;
+ EIGEN_DEVICE_FUNC void normalize();
+
+ EIGEN_DEVICE_FUNC const AdjointReturnType adjoint() const;
+ EIGEN_DEVICE_FUNC void adjointInPlace();
+
+ typedef Diagonal<Derived> DiagonalReturnType;
+ EIGEN_DEVICE_FUNC
+ DiagonalReturnType diagonal();
+
+ typedef typename internal::add_const<Diagonal<const Derived> >::type ConstDiagonalReturnType;
+ EIGEN_DEVICE_FUNC
+ ConstDiagonalReturnType diagonal() const;
+
+ template<int Index> struct DiagonalIndexReturnType { typedef Diagonal<Derived,Index> Type; };
+ template<int Index> struct ConstDiagonalIndexReturnType { typedef const Diagonal<const Derived,Index> Type; };
+
+ template<int Index>
+ EIGEN_DEVICE_FUNC
+ typename DiagonalIndexReturnType<Index>::Type diagonal();
+
+ template<int Index>
+ EIGEN_DEVICE_FUNC
+ typename ConstDiagonalIndexReturnType<Index>::Type diagonal() const;
+
+ // Note: The "MatrixBase::" prefixes are added to help MSVC9 to match these declarations with the later implementations.
+ // On the other hand they confuse MSVC8...
+ #if EIGEN_COMP_MSVC >= 1500 // 2008 or later
+ typename MatrixBase::template DiagonalIndexReturnType<DynamicIndex>::Type diagonal(Index index);
+ typename MatrixBase::template ConstDiagonalIndexReturnType<DynamicIndex>::Type diagonal(Index index) const;
+ #else
+ EIGEN_DEVICE_FUNC
+ typename DiagonalIndexReturnType<DynamicIndex>::Type diagonal(Index index);
+
+ EIGEN_DEVICE_FUNC
+ typename ConstDiagonalIndexReturnType<DynamicIndex>::Type diagonal(Index index) const;
+ #endif
+
+ #ifdef EIGEN2_SUPPORT
+ template<unsigned int Mode> typename internal::eigen2_part_return_type<Derived, Mode>::type part();
+ template<unsigned int Mode> const typename internal::eigen2_part_return_type<Derived, Mode>::type part() const;
+
+ // huuuge hack. make Eigen2's matrix.part<Diagonal>() work in eigen3. Problem: Diagonal is now a class template instead
+ // of an integer constant. Solution: overload the part() method template wrt template parameters list.
+ template<template<typename T, int N> class U>
+ const DiagonalWrapper<ConstDiagonalReturnType> part() const
+ { return diagonal().asDiagonal(); }
+ #endif // EIGEN2_SUPPORT
+
+ template<unsigned int Mode> struct TriangularViewReturnType { typedef TriangularView<Derived, Mode> Type; };
+ template<unsigned int Mode> struct ConstTriangularViewReturnType { typedef const TriangularView<const Derived, Mode> Type; };
+
+ template<unsigned int Mode>
+ EIGEN_DEVICE_FUNC
+ typename TriangularViewReturnType<Mode>::Type triangularView();
+ template<unsigned int Mode>
+ EIGEN_DEVICE_FUNC
+ typename ConstTriangularViewReturnType<Mode>::Type triangularView() const;
+
+ template<unsigned int UpLo> struct SelfAdjointViewReturnType { typedef SelfAdjointView<Derived, UpLo> Type; };
+ template<unsigned int UpLo> struct ConstSelfAdjointViewReturnType { typedef const SelfAdjointView<const Derived, UpLo> Type; };
+
+ template<unsigned int UpLo>
+ EIGEN_DEVICE_FUNC
+ typename SelfAdjointViewReturnType<UpLo>::Type selfadjointView();
+ template<unsigned int UpLo>
+ EIGEN_DEVICE_FUNC
+ typename ConstSelfAdjointViewReturnType<UpLo>::Type selfadjointView() const;
+
+ const SparseView<Derived> sparseView(const Scalar& m_reference = Scalar(0),
+ const typename NumTraits<Scalar>::Real& m_epsilon = NumTraits<Scalar>::dummy_precision()) const;
+ EIGEN_DEVICE_FUNC static const IdentityReturnType Identity();
+ EIGEN_DEVICE_FUNC static const IdentityReturnType Identity(Index rows, Index cols);
+ EIGEN_DEVICE_FUNC static const BasisReturnType Unit(Index size, Index i);
+ EIGEN_DEVICE_FUNC static const BasisReturnType Unit(Index i);
+ EIGEN_DEVICE_FUNC static const BasisReturnType UnitX();
+ EIGEN_DEVICE_FUNC static const BasisReturnType UnitY();
+ EIGEN_DEVICE_FUNC static const BasisReturnType UnitZ();
+ EIGEN_DEVICE_FUNC static const BasisReturnType UnitW();
+
+ EIGEN_DEVICE_FUNC
+ const DiagonalWrapper<const Derived> asDiagonal() const;
+ const PermutationWrapper<const Derived> asPermutation() const;
+
+ EIGEN_DEVICE_FUNC
+ Derived& setIdentity();
+ EIGEN_DEVICE_FUNC
+ Derived& setIdentity(Index rows, Index cols);
+
+ bool isIdentity(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ bool isDiagonal(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+
+ bool isUpperTriangular(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ bool isLowerTriangular(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+
+ template<typename OtherDerived>
+ bool isOrthogonal(const MatrixBase<OtherDerived>& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+ bool isUnitary(const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const;
+
+ /** \returns true if all coefficients of \c *this and \a other are exactly equal.
+ * \warning When using floating point scalar values you probably should rather use a
+ * fuzzy comparison such as isApprox()
+ * \sa isApprox(), operator!= */
+ template<typename OtherDerived>
+ inline bool operator==(const MatrixBase<OtherDerived>& other) const
+ { return cwiseEqual(other).all(); }
+
+ /** \returns true if at least one pair of coefficients of \c *this and \a other are not exactly equal to each other.
+ * \warning When using floating point scalar values you probably should rather use a
+ * fuzzy comparison such as isApprox()
+ * \sa isApprox(), operator== */
+ template<typename OtherDerived>
+ inline bool operator!=(const MatrixBase<OtherDerived>& other) const
+ { return cwiseNotEqual(other).any(); }
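+
+ // Note (illustrative, not from the original header): for floating-point matrices a fuzzy
+ // comparison is usually preferable to the exact operators above, e.g. a.isApprox(b)
+ // or (a - b).isZero(prec).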
+
+ NoAlias<Derived,Eigen::MatrixBase > noalias();
+
+ inline const ForceAlignedAccess<Derived> forceAlignedAccess() const;
+ inline ForceAlignedAccess<Derived> forceAlignedAccess();
+ template<bool Enable> inline typename internal::add_const_on_value_type<typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type>::type forceAlignedAccessIf() const;
+ template<bool Enable> inline typename internal::conditional<Enable,ForceAlignedAccess<Derived>,Derived&>::type forceAlignedAccessIf();
+
+ Scalar trace() const;
+
+ template<int p> EIGEN_DEVICE_FUNC RealScalar lpNorm() const;
+
+ EIGEN_DEVICE_FUNC MatrixBase<Derived>& matrix() { return *this; }
+ EIGEN_DEVICE_FUNC const MatrixBase<Derived>& matrix() const { return *this; }
+
+ /** \returns an \link Eigen::ArrayBase Array \endlink expression of this matrix
+ * \sa ArrayBase::matrix() */
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ArrayWrapper<Derived> array() { return derived(); }
+ /** \returns a const \link Eigen::ArrayBase Array \endlink expression of this matrix
+ * \sa ArrayBase::matrix() */
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const ArrayWrapper<const Derived> array() const { return derived(); }
+
+/////////// LU module ///////////
+
+ EIGEN_DEVICE_FUNC const FullPivLU<PlainObject> fullPivLu() const;
+ EIGEN_DEVICE_FUNC const PartialPivLU<PlainObject> partialPivLu() const;
+
+ #if EIGEN2_SUPPORT_STAGE < STAGE20_RESOLVE_API_CONFLICTS
+ const LU<PlainObject> lu() const;
+ #endif
+
+ #ifdef EIGEN2_SUPPORT
+ const LU<PlainObject> eigen2_lu() const;
+ #endif
+
+ #if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+ const PartialPivLU<PlainObject> lu() const;
+ #endif
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename ResultType>
+ void computeInverse(MatrixBase<ResultType> *result) const {
+ *result = this->inverse();
+ }
+ #endif
+
+ EIGEN_DEVICE_FUNC
+ const internal::inverse_impl<Derived> inverse() const;
+ template<typename ResultType>
+ void computeInverseAndDetWithCheck(
+ ResultType& inverse,
+ typename ResultType::Scalar& determinant,
+ bool& invertible,
+ const RealScalar& absDeterminantThreshold = NumTraits<Scalar>::dummy_precision()
+ ) const;
+ template<typename ResultType>
+ void computeInverseWithCheck(
+ ResultType& inverse,
+ bool& invertible,
+ const RealScalar& absDeterminantThreshold = NumTraits<Scalar>::dummy_precision()
+ ) const;
+ Scalar determinant() const;
+
+/////////// Cholesky module ///////////
+
+ const LLT<PlainObject> llt() const;
+ const LDLT<PlainObject> ldlt() const;
+
+/////////// QR module ///////////
+
+ const HouseholderQR<PlainObject> householderQr() const;
+ const ColPivHouseholderQR<PlainObject> colPivHouseholderQr() const;
+ const FullPivHouseholderQR<PlainObject> fullPivHouseholderQr() const;
+
+ #ifdef EIGEN2_SUPPORT
+ const QR<PlainObject> qr() const;
+ #endif
+
+ EigenvaluesReturnType eigenvalues() const;
+ RealScalar operatorNorm() const;
+
+/////////// SVD module ///////////
+
+ JacobiSVD<PlainObject> jacobiSvd(unsigned int computationOptions = 0) const;
+
+ #ifdef EIGEN2_SUPPORT
+ SVD<PlainObject> svd() const;
+ #endif
+
+/////////// Geometry module ///////////
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /// \internal helper struct to form the return type of the cross product
+ template<typename OtherDerived> struct cross_product_return_type {
+ typedef typename internal::scalar_product_traits<typename internal::traits<Derived>::Scalar,typename internal::traits<OtherDerived>::Scalar>::ReturnType Scalar;
+ typedef Matrix<Scalar,MatrixBase::RowsAtCompileTime,MatrixBase::ColsAtCompileTime> type;
+ };
+ #endif // EIGEN_PARSED_BY_DOXYGEN
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ typename cross_product_return_type<OtherDerived>::type
+ cross(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ PlainObject cross3(const MatrixBase<OtherDerived>& other) const;
+
+ EIGEN_DEVICE_FUNC
+ PlainObject unitOrthogonal(void) const;
+
+ Matrix<Scalar,3,1> eulerAngles(Index a0, Index a1, Index a2) const;
+
+ #if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+ ScalarMultipleReturnType operator*(const UniformScaling<Scalar>& s) const;
+ // put this as separate enum value to work around possible GCC 4.3 bug (?)
+ enum { HomogeneousReturnTypeDirection = ColsAtCompileTime==1?Vertical:Horizontal };
+ typedef Homogeneous<Derived, HomogeneousReturnTypeDirection> HomogeneousReturnType;
+ HomogeneousReturnType homogeneous() const;
+ #endif
+
+ enum {
+ SizeMinusOne = SizeAtCompileTime==Dynamic ? Dynamic : SizeAtCompileTime-1
+ };
+ typedef Block<const Derived,
+ internal::traits<Derived>::ColsAtCompileTime==1 ? SizeMinusOne : 1,
+ internal::traits<Derived>::ColsAtCompileTime==1 ? 1 : SizeMinusOne> ConstStartMinusOne;
+ typedef CwiseUnaryOp<internal::scalar_quotient1_op<typename internal::traits<Derived>::Scalar>,
+ const ConstStartMinusOne > HNormalizedReturnType;
+
+ const HNormalizedReturnType hnormalized() const;
+
+////////// Householder module ///////////
+
+ void makeHouseholderInPlace(Scalar& tau, RealScalar& beta);
+ template<typename EssentialPart>
+ void makeHouseholder(EssentialPart& essential,
+ Scalar& tau, RealScalar& beta) const;
+ template<typename EssentialPart>
+ void applyHouseholderOnTheLeft(const EssentialPart& essential,
+ const Scalar& tau,
+ Scalar* workspace);
+ template<typename EssentialPart>
+ void applyHouseholderOnTheRight(const EssentialPart& essential,
+ const Scalar& tau,
+ Scalar* workspace);
+
+///////// Jacobi module /////////
+
+ template<typename OtherScalar>
+ void applyOnTheLeft(Index p, Index q, const JacobiRotation<OtherScalar>& j);
+ template<typename OtherScalar>
+ void applyOnTheRight(Index p, Index q, const JacobiRotation<OtherScalar>& j);
+
+///////// MatrixFunctions module /////////
+
+ typedef typename internal::stem_function<Scalar>::type StemFunction;
+ const MatrixExponentialReturnValue<Derived> exp() const;
+ const MatrixFunctionReturnValue<Derived> matrixFunction(StemFunction f) const;
+ const MatrixFunctionReturnValue<Derived> cosh() const;
+ const MatrixFunctionReturnValue<Derived> sinh() const;
+ const MatrixFunctionReturnValue<Derived> cos() const;
+ const MatrixFunctionReturnValue<Derived> sin() const;
+ const MatrixSquareRootReturnValue<Derived> sqrt() const;
+ const MatrixLogarithmReturnValue<Derived> log() const;
+ const MatrixPowerReturnValue<Derived> pow(const RealScalar& p) const;
+ const MatrixComplexPowerReturnValue<Derived> pow(const std::complex<RealScalar>& p) const;
+
+#ifdef EIGEN2_SUPPORT
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ Derived& operator+=(const Flagged<ProductBase<ProductDerived, Lhs,Rhs>, 0,
+ EvalBeforeAssigningBit>& other);
+
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ Derived& operator-=(const Flagged<ProductBase<ProductDerived, Lhs,Rhs>, 0,
+ EvalBeforeAssigningBit>& other);
+
+ /** \deprecated because .lazy() is deprecated
+ * Overloaded for cache friendly product evaluation */
+ template<typename OtherDerived>
+ Derived& lazyAssign(const Flagged<OtherDerived, 0, EvalBeforeAssigningBit>& other)
+ { return lazyAssign(other._expression()); }
+
+ template<unsigned int Added>
+ const Flagged<Derived, Added, 0> marked() const;
+ const Flagged<Derived, 0, EvalBeforeAssigningBit> lazy() const;
+
+ inline const Cwise<Derived> cwise() const;
+ inline Cwise<Derived> cwise();
+
+ VectorBlock<Derived> start(Index size);
+ const VectorBlock<const Derived> start(Index size) const;
+ VectorBlock<Derived> end(Index size);
+ const VectorBlock<const Derived> end(Index size) const;
+ template<int Size> VectorBlock<Derived,Size> start();
+ template<int Size> const VectorBlock<const Derived,Size> start() const;
+ template<int Size> VectorBlock<Derived,Size> end();
+ template<int Size> const VectorBlock<const Derived,Size> end() const;
+
+ Minor<Derived> minor(Index row, Index col);
+ const Minor<Derived> minor(Index row, Index col) const;
+#endif
+
+ protected:
+ EIGEN_DEVICE_FUNC MatrixBase() : Base() {}
+
+ private:
+ EIGEN_DEVICE_FUNC explicit MatrixBase(int);
+ EIGEN_DEVICE_FUNC MatrixBase(int,int);
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC explicit MatrixBase(const MatrixBase<OtherDerived>&);
+ protected:
+ // mixing arrays and matrices is not legal
+ template<typename OtherDerived> Derived& operator+=(const ArrayBase<OtherDerived>& )
+ {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;}
+ // mixing arrays and matrices is not legal
+ template<typename OtherDerived> Derived& operator-=(const ArrayBase<OtherDerived>& )
+ {EIGEN_STATIC_ASSERT(std::ptrdiff_t(sizeof(typename OtherDerived::Scalar))==-1,YOU_CANNOT_MIX_ARRAYS_AND_MATRICES); return *this;}
+};
+
+
+/***************************************************************************
+* Implementation of matrix base methods
+***************************************************************************/
+
+/** replaces \c *this by \c *this * \a other.
+ *
+ * \returns a reference to \c *this
+ *
+ * Example: \include MatrixBase_applyOnTheRight.cpp
+ * Output: \verbinclude MatrixBase_applyOnTheRight.out
+ */
+template<typename Derived>
+template<typename OtherDerived>
+inline Derived&
+MatrixBase<Derived>::operator*=(const EigenBase<OtherDerived> &other)
+{
+ other.derived().applyThisOnTheRight(derived());
+ return derived();
+}
+
+/** replaces \c *this by \c *this * \a other. It is equivalent to MatrixBase::operator*=().
+ *
+ * Example: \include MatrixBase_applyOnTheRight.cpp
+ * Output: \verbinclude MatrixBase_applyOnTheRight.out
+ */
+template<typename Derived>
+template<typename OtherDerived>
+inline void MatrixBase<Derived>::applyOnTheRight(const EigenBase<OtherDerived> &other)
+{
+ other.derived().applyThisOnTheRight(derived());
+}
+
+/** replaces \c *this by \a other * \c *this.
+ *
+ * Example: \include MatrixBase_applyOnTheLeft.cpp
+ * Output: \verbinclude MatrixBase_applyOnTheLeft.out
+ */
+template<typename Derived>
+template<typename OtherDerived>
+inline void MatrixBase<Derived>::applyOnTheLeft(const EigenBase<OtherDerived> &other)
+{
+ other.derived().applyThisOnTheLeft(derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATRIXBASE_H
diff --git a/third_party/eigen3/Eigen/src/Core/NestByValue.h b/third_party/eigen3/Eigen/src/Core/NestByValue.h
new file mode 100644
index 0000000000..1944bd7858
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/NestByValue.h
@@ -0,0 +1,112 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_NESTBYVALUE_H
+#define EIGEN_NESTBYVALUE_H
+
+namespace Eigen {
+
+/** \class NestByValue
+ * \ingroup Core_Module
+ *
+ * \brief Expression which must be nested by value
+ *
+ * \param ExpressionType the type of the object of which we are requiring nesting-by-value
+ *
+ * This class is the return type of MatrixBase::nestByValue()
+ * and most of the time this is the only way it is used.
+ *
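+ * A minimal usage sketch (illustrative only, not from the original documentation):
+ *
+ * \code
+ * Eigen::MatrixXd m(2, 2);
+ * m.setRandom();
+ * // request that the (m + m) sub-expression be nested by value
+ * Eigen::MatrixXd r = (m + m).nestByValue();
+ * \endcode
+ *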
+ * \sa MatrixBase::nestByValue()
+ */
+
+namespace internal {
+template <typename ExpressionType>
+struct traits<NestByValue<ExpressionType> > : public traits<ExpressionType> {
+ enum { Flags = traits<ExpressionType>::Flags & ~NestByRefBit };
+};
+}
+
+template<typename ExpressionType> class NestByValue
+ : public internal::dense_xpr_base< NestByValue<ExpressionType> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<NestByValue>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(NestByValue)
+
+ inline NestByValue(const ExpressionType& matrix) : m_expression(matrix) {}
+
+ inline Index rows() const { return m_expression.rows(); }
+ inline Index cols() const { return m_expression.cols(); }
+ inline Index outerStride() const { return m_expression.outerStride(); }
+ inline Index innerStride() const { return m_expression.innerStride(); }
+
+ inline const CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_expression.coeff(row, col);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ return m_expression.const_cast_derived().coeffRef(row, col);
+ }
+
+ inline const CoeffReturnType coeff(Index index) const
+ {
+ return m_expression.coeff(index);
+ }
+
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index row, Index col) const
+ {
+ return m_expression.template packet<LoadMode>(row, col);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(row, col, x);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return m_expression.template packet<LoadMode>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& x)
+ {
+ m_expression.const_cast_derived().template writePacket<LoadMode>(index, x);
+ }
+
+ operator const ExpressionType&() const { return m_expression; }
+
+ protected:
+ const ExpressionType m_expression;
+};
+
+/** \returns an expression of the temporary version of *this.
+ */
+template<typename Derived>
+inline const NestByValue<Derived>
+DenseBase<Derived>::nestByValue() const
+{
+ return NestByValue<Derived>(derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_NESTBYVALUE_H
diff --git a/third_party/eigen3/Eigen/src/Core/NoAlias.h b/third_party/eigen3/Eigen/src/Core/NoAlias.h
new file mode 100644
index 0000000000..0a1c327433
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/NoAlias.h
@@ -0,0 +1,141 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_NOALIAS_H
+#define EIGEN_NOALIAS_H
+
+namespace Eigen {
+
+/** \class NoAlias
+ * \ingroup Core_Module
+ *
+ * \brief Pseudo expression providing an operator = assuming no aliasing
+ *
+ * \param ExpressionType the type of the object on which to do the lazy assignment
+ *
+ * This class represents an expression with special assignment operators
+ * assuming no aliasing between the target expression and the source expression.
+ * More precisely, it allows one to bypass the EvalBeforeAssignBit flag of the source expression.
+ * It is the return type of MatrixBase::noalias()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::noalias()
+ */
+template<typename ExpressionType, template <typename> class StorageBase>
+class NoAlias
+{
+ typedef typename ExpressionType::Scalar Scalar;
+ public:
+ NoAlias(ExpressionType& expression) : m_expression(expression) {}
+
+ /** Behaves like MatrixBase::lazyAssign(other)
+ * \sa MatrixBase::lazyAssign() */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator=(const StorageBase<OtherDerived>& other)
+ { return internal::assign_selector<ExpressionType,OtherDerived,false>::run(m_expression,other.derived()); }
+
+ /** \sa MatrixBase::operator+= */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator+=(const StorageBase<OtherDerived>& other)
+ {
+ typedef SelfCwiseBinaryOp<internal::scalar_sum_op<Scalar>, ExpressionType, OtherDerived> SelfAdder;
+ SelfAdder tmp(m_expression);
+ typedef typename internal::nested<OtherDerived>::type OtherDerivedNested;
+ typedef typename internal::remove_all<OtherDerivedNested>::type _OtherDerivedNested;
+ internal::assign_selector<SelfAdder,_OtherDerivedNested,false>::run(tmp,OtherDerivedNested(other.derived()));
+ return m_expression;
+ }
+
+ /** \sa MatrixBase::operator-= */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator-=(const StorageBase<OtherDerived>& other)
+ {
+ typedef SelfCwiseBinaryOp<internal::scalar_difference_op<Scalar>, ExpressionType, OtherDerived> SelfAdder;
+ SelfAdder tmp(m_expression);
+ typedef typename internal::nested<OtherDerived>::type OtherDerivedNested;
+ typedef typename internal::remove_all<OtherDerivedNested>::type _OtherDerivedNested;
+ internal::assign_selector<SelfAdder,_OtherDerivedNested,false>::run(tmp,OtherDerivedNested(other.derived()));
+ return m_expression;
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator+=(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+ { other.derived().addTo(m_expression); return m_expression; }
+
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator-=(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+ { other.derived().subTo(m_expression); return m_expression; }
+
+ template<typename Lhs, typename Rhs, int NestingFlags>
+ EIGEN_STRONG_INLINE ExpressionType& operator+=(const CoeffBasedProduct<Lhs,Rhs,NestingFlags>& other)
+ { return m_expression.derived() += CoeffBasedProduct<Lhs,Rhs,NestByRefBit>(other.lhs(), other.rhs()); }
+
+ template<typename Lhs, typename Rhs, int NestingFlags>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE ExpressionType& operator-=(const CoeffBasedProduct<Lhs,Rhs,NestingFlags>& other)
+ { return m_expression.derived() -= CoeffBasedProduct<Lhs,Rhs,NestByRefBit>(other.lhs(), other.rhs()); }
+
+ template<typename OtherDerived>
+ ExpressionType& operator=(const ReturnByValue<OtherDerived>& func)
+ { return m_expression = func; }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ ExpressionType& expression() const
+ {
+ return m_expression;
+ }
+
+ protected:
+ ExpressionType& m_expression;
+};
+
+/** \returns a pseudo expression of \c *this with an operator= assuming
+ * no aliasing between \c *this and the source expression.
+ *
+ * More precisely, noalias() allows one to bypass the EvalBeforeAssignBit flag.
+ * Currently, even though several expressions may alias, only product
+ * expressions have this flag. Therefore, noalias() is only useful when
+ * the source expression contains a matrix product.
+ *
+ * Here are some examples where noalias is useful:
+ * \code
+ * D.noalias() = A * B;
+ * D.noalias() += A.transpose() * B;
+ * D.noalias() -= 2 * A * B.adjoint();
+ * \endcode
+ *
+ * On the other hand the following example will lead to a \b wrong result:
+ * \code
+ * A.noalias() = A * B;
+ * \endcode
+ * because the result matrix A is also an operand of the matrix product. Therefore,
+ * there is no alternative but to evaluate A * B in a temporary, which is the default
+ * behavior when you write:
+ * \code
+ * A = A * B;
+ * \endcode
+ *
+ * \sa class NoAlias
+ */
+template<typename Derived>
+NoAlias<Derived,MatrixBase> MatrixBase<Derived>::noalias()
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_NOALIAS_H
diff --git a/third_party/eigen3/Eigen/src/Core/NumTraits.h b/third_party/eigen3/Eigen/src/Core/NumTraits.h
new file mode 100644
index 0000000000..dee9159517
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/NumTraits.h
@@ -0,0 +1,177 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_NUMTRAITS_H
+#define EIGEN_NUMTRAITS_H
+
+namespace Eigen {
+
+/** \class NumTraits
+ * \ingroup Core_Module
+ *
+ * \brief Holds information about the various numeric (i.e. scalar) types allowed by Eigen.
+ *
+ * \param T the numeric type at hand
+ *
+ * This class stores enums, typedefs and static methods giving information about a numeric type.
+ *
+ * The provided data consists of:
+ * \li A typedef \a Real, giving the "real part" type of \a T. If \a T is already real,
+ * then \a Real is just a typedef to \a T. If \a T is \c std::complex<U> then \a Real
+ * is a typedef to \a U.
+ * \li A typedef \a NonInteger, giving the type that should be used for operations producing non-integral values,
+ * such as quotients, square roots, etc. If \a T is a floating-point type, then this typedef just gives
+ * \a T again. Note however that many Eigen functions such as internal::sqrt simply refuse to
+ * take integers. Outside of a few cases, Eigen doesn't do automatic type promotion. Thus, this typedef is
+ * only intended as a helper for code that needs to explicitly promote types.
+ * \li A typedef \a Nested giving the type to use to nest a value inside of the expression tree. If you don't know what
+ * this means, just use \a T here.
+ * \li An enum value \a IsComplex. It is equal to 1 if \a T is a \c std::complex
+ * type, and to 0 otherwise.
+ * \li An enum value \a IsInteger. It is equal to \c 1 if \a T is an integer type such as \c int,
+ * and to \c 0 otherwise.
+ * \li Enum values ReadCost, AddCost and MulCost representing a rough estimate of the number of CPU cycles needed
+ * by move / add / mul instructions respectively, assuming the data is already stored in CPU registers.
+ * These estimates are intentionally vague; nothing architecture-specific is intended.
+ * \li An enum value \a IsSigned. It is equal to \c 1 if \a T is a signed type and to 0 if \a T is unsigned.
+ * \li An enum value \a RequireInitialization. It is equal to \c 1 if the constructor of the numeric type \a T must
+ * be called, and to 0 if it is safe not to call it. Default is 0 if \a T is an arithmetic type, and 1 otherwise.
+ * \li An epsilon() function which, unlike std::numeric_limits::epsilon(), returns a \a Real instead of a \a T.
+ * \li A dummy_precision() function returning a weak epsilon value. It is mainly used as a default
+ * value by the fuzzy comparison operators.
+ * \li highest() and lowest() functions returning the highest and lowest possible values respectively.
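+ *
+ * A short usage sketch (illustrative only, not part of the original documentation):
+ *
+ * \code
+ * typedef Eigen::NumTraits<std::complex<float> > CTraits;
+ * CTraits::Real eps = CTraits::epsilon();          // Real is float here
+ * bool isComplex = CTraits::IsComplex;             // true
+ * double tol = Eigen::NumTraits<double>::dummy_precision();  // 1e-12
+ * \endcode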
+ */
+
+template<typename T> struct GenericNumTraits
+{
+ enum {
+ IsInteger = std::numeric_limits<T>::is_integer,
+ IsSigned = std::numeric_limits<T>::is_signed,
+ IsComplex = 0,
+ RequireInitialization = internal::is_arithmetic<T>::value ? 0 : 1,
+ ReadCost = 1,
+ AddCost = 1,
+ MulCost = 1
+ };
+
+ typedef T Real;
+ typedef typename internal::conditional<
+ IsInteger,
+ typename internal::conditional<sizeof(T)<=2, float, double>::type,
+ T
+ >::type NonInteger;
+ typedef T Nested;
+
+ EIGEN_DEVICE_FUNC
+ static inline Real epsilon()
+ {
+#if defined(__CUDA_ARCH__) && !defined(__GCUDACC__)
+ return internal::device::numeric_limits<T>::epsilon();
+#else
+ return std::numeric_limits<T>::epsilon();
+#endif
+ }
+ EIGEN_DEVICE_FUNC
+ static inline Real dummy_precision()
+ {
+ // make sure to override this for floating-point types
+ return Real(0);
+ }
+
+ EIGEN_DEVICE_FUNC
+ static inline T highest() {
+#if defined(__CUDA_ARCH__) && !defined(__GCUDACC__)
+ return internal::device::numeric_limits<T>::max();
+#else
+ return (std::numeric_limits<T>::max)();
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC
+ static inline T lowest() {
+#if defined(__CUDA_ARCH__) && !defined(__GCUDACC__)
+ return internal::device::numeric_limits<T>::lowest();
+#else
+ return IsInteger ? (std::numeric_limits<T>::min)() : (-(std::numeric_limits<T>::max)());
+#endif
+ }
+
+#ifdef EIGEN2_SUPPORT
+ enum {
+ HasFloatingPoint = !IsInteger
+ };
+ typedef NonInteger FloatingPoint;
+#endif
+};
+
+template<typename T> struct NumTraits : GenericNumTraits<T>
+{};
+
+template<> struct NumTraits<float>
+ : GenericNumTraits<float>
+{
+ EIGEN_DEVICE_FUNC
+ static inline float dummy_precision() { return 1e-5f; }
+};
+
+template<> struct NumTraits<double> : GenericNumTraits<double>
+{
+ EIGEN_DEVICE_FUNC
+ static inline double dummy_precision() { return 1e-12; }
+};
+
+template<> struct NumTraits<long double>
+ : GenericNumTraits<long double>
+{
+ static inline long double dummy_precision() { return 1e-15l; }
+};
+
+template<typename _Real> struct NumTraits<std::complex<_Real> >
+ : GenericNumTraits<std::complex<_Real> >
+{
+ typedef _Real Real;
+ enum {
+ IsComplex = 1,
+ RequireInitialization = NumTraits<_Real>::RequireInitialization,
+ ReadCost = 2 * NumTraits<_Real>::ReadCost,
+ AddCost = 2 * NumTraits<Real>::AddCost,
+ MulCost = 4 * NumTraits<Real>::MulCost + 2 * NumTraits<Real>::AddCost
+ };
+
+ static inline Real epsilon() { return NumTraits<Real>::epsilon(); }
+ static inline Real dummy_precision() { return NumTraits<Real>::dummy_precision(); }
+};
+
+template<typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
+struct NumTraits<Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> >
+{
+ typedef Array<Scalar, Rows, Cols, Options, MaxRows, MaxCols> ArrayType;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Array<RealScalar, Rows, Cols, Options, MaxRows, MaxCols> Real;
+ typedef typename NumTraits<Scalar>::NonInteger NonIntegerScalar;
+ typedef Array<NonIntegerScalar, Rows, Cols, Options, MaxRows, MaxCols> NonInteger;
+ typedef ArrayType & Nested;
+
+ enum {
+ IsComplex = NumTraits<Scalar>::IsComplex,
+ IsInteger = NumTraits<Scalar>::IsInteger,
+ IsSigned = NumTraits<Scalar>::IsSigned,
+ RequireInitialization = 1,
+ ReadCost = ArrayType::SizeAtCompileTime==Dynamic ? Dynamic : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::ReadCost,
+ AddCost = ArrayType::SizeAtCompileTime==Dynamic ? Dynamic : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::AddCost,
+ MulCost = ArrayType::SizeAtCompileTime==Dynamic ? Dynamic : ArrayType::SizeAtCompileTime * NumTraits<Scalar>::MulCost
+ };
+
+ static inline RealScalar epsilon() { return NumTraits<RealScalar>::epsilon(); }
+ static inline RealScalar dummy_precision() { return NumTraits<RealScalar>::dummy_precision(); }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_NUMTRAITS_H
diff --git a/third_party/eigen3/Eigen/src/Core/PermutationMatrix.h b/third_party/eigen3/Eigen/src/Core/PermutationMatrix.h
new file mode 100644
index 0000000000..1297b8413f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/PermutationMatrix.h
@@ -0,0 +1,689 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PERMUTATIONMATRIX_H
+#define EIGEN_PERMUTATIONMATRIX_H
+
+namespace Eigen {
+
+template<int RowCol,typename IndicesType,typename MatrixType, typename StorageKind> class PermutedImpl;
+
+/** \class PermutationBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for permutations
+ *
+ * \param Derived the derived class
+ *
+ * This class is the base class for all expressions representing a permutation matrix,
+ * internally stored as a vector of integers.
+ * The convention followed here is that if \f$ \sigma \f$ is a permutation, the corresponding permutation matrix
+ * \f$ P_\sigma \f$ is such that if \f$ (e_1,\ldots,e_p) \f$ is the canonical basis, we have:
+ * \f[ P_\sigma(e_i) = e_{\sigma(i)}. \f]
+ * This convention ensures that for any two permutations \f$ \sigma, \tau \f$, we have:
+ * \f[ P_{\sigma\circ\tau} = P_\sigma P_\tau. \f]
+ *
+ * Permutation matrices are square and invertible.
+ *
+ * Notice that in addition to the member functions and operators listed here, there also are non-member
+ * operator* to multiply any kind of permutation object with any kind of matrix expression (MatrixBase)
+ * on either side.
+ *
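+ * As a usage sketch (illustrative, not part of the original documentation):
+ *
+ * \code
+ * Eigen::PermutationMatrix<3> P;
+ * P.setIdentity();
+ * P.applyTranspositionOnTheRight(0, 2);   // swap indices 0 and 2
+ * Eigen::Matrix3d A = Eigen::Matrix3d::Random();
+ * Eigen::Matrix3d PA = P * A;             // permutes the rows of A
+ * Eigen::Matrix3d AP = A * P;             // permutes the columns of A
+ * \endcode
+ *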
+ * \sa class PermutationMatrix, class PermutationWrapper
+ */
+
+namespace internal {
+
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed=false>
+struct permut_matrix_product_retval;
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed=false>
+struct permut_sparsematrix_product_retval;
+enum PermPermProduct_t {PermPermProduct};
+
+} // end namespace internal
+
+template<typename Derived>
+class PermutationBase : public EigenBase<Derived>
+{
+ typedef internal::traits<Derived> Traits;
+ typedef EigenBase<Derived> Base;
+ public:
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef typename Traits::IndicesType IndicesType;
+ enum {
+ Flags = Traits::Flags,
+ CoeffReadCost = Traits::CoeffReadCost,
+ RowsAtCompileTime = Traits::RowsAtCompileTime,
+ ColsAtCompileTime = Traits::ColsAtCompileTime,
+ MaxRowsAtCompileTime = Traits::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = Traits::MaxColsAtCompileTime
+ };
+ typedef typename Traits::Scalar Scalar;
+ typedef typename Traits::Index Index;
+ typedef Matrix<Scalar,RowsAtCompileTime,ColsAtCompileTime,0,MaxRowsAtCompileTime,MaxColsAtCompileTime>
+ DenseMatrixType;
+ typedef PermutationMatrix<IndicesType::SizeAtCompileTime,IndicesType::MaxSizeAtCompileTime,Index>
+ PlainPermutationType;
+ using Base::derived;
+ #endif
+
+ /** Copies the other permutation into *this */
+ template<typename OtherDerived>
+ Derived& operator=(const PermutationBase<OtherDerived>& other)
+ {
+ indices() = other.indices();
+ return derived();
+ }
+
+ /** Assignment from the Transpositions \a tr */
+ template<typename OtherDerived>
+ Derived& operator=(const TranspositionsBase<OtherDerived>& tr)
+ {
+ setIdentity(tr.size());
+ for(Index k=size()-1; k>=0; --k)
+ applyTranspositionOnTheRight(k,tr.coeff(k));
+ return derived();
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ Derived& operator=(const PermutationBase& other)
+ {
+ indices() = other.indices();
+ return derived();
+ }
+ #endif
+
+ /** \returns the number of rows */
+ inline Index rows() const { return Index(indices().size()); }
+
+ /** \returns the number of columns */
+ inline Index cols() const { return Index(indices().size()); }
+
+ /** \returns the size of a side of the respective square matrix, i.e., the number of indices */
+ inline Index size() const { return Index(indices().size()); }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename DenseDerived>
+ void evalTo(MatrixBase<DenseDerived>& other) const
+ {
+ other.setZero();
+ for (int i=0; i<rows();++i)
+ other.coeffRef(indices().coeff(i),i) = typename DenseDerived::Scalar(1);
+ }
+ #endif
+
+ /** \returns a Matrix object initialized from this permutation matrix. Notice that it
+ * is inefficient to return this Matrix object by value. For efficiency, favor using
+ * the Matrix constructor taking EigenBase objects.
+ */
+ DenseMatrixType toDenseMatrix() const
+ {
+ return derived();
+ }
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return derived().indices(); }
+ /** \returns a reference to the stored array representing the permutation. */
+ IndicesType& indices() { return derived().indices(); }
+
+ /** Resizes to given size.
+ */
+ inline void resize(Index newSize)
+ {
+ indices().resize(newSize);
+ }
+
+ /** Sets *this to be the identity permutation matrix */
+ void setIdentity()
+ {
+ for(Index i = 0; i < size(); ++i)
+ indices().coeffRef(i) = i;
+ }
+
+ /** Sets *this to be the identity permutation matrix of given size.
+ */
+ void setIdentity(Index newSize)
+ {
+ resize(newSize);
+ setIdentity();
+ }
+
+ /** Multiplies *this by the transposition \f$(ij)\f$ on the left.
+ *
+ * \returns a reference to *this.
+ *
+ * \warning This is much slower than applyTranspositionOnTheRight(int,int):
+ * this has linear complexity and requires a lot of branching.
+ *
+ * \sa applyTranspositionOnTheRight(int,int)
+ */
+ Derived& applyTranspositionOnTheLeft(Index i, Index j)
+ {
+ eigen_assert(i>=0 && j>=0 && i<size() && j<size());
+ for(Index k = 0; k < size(); ++k)
+ {
+ if(indices().coeff(k) == i) indices().coeffRef(k) = j;
+ else if(indices().coeff(k) == j) indices().coeffRef(k) = i;
+ }
+ return derived();
+ }
+
+ /** Multiplies *this by the transposition \f$(ij)\f$ on the right.
+ *
+ * \returns a reference to *this.
+ *
+ * This is a fast operation; it only consists in swapping two indices.
+ *
+ * \sa applyTranspositionOnTheLeft(int,int)
+ */
+ Derived& applyTranspositionOnTheRight(Index i, Index j)
+ {
+ eigen_assert(i>=0 && j>=0 && i<size() && j<size());
+ std::swap(indices().coeffRef(i), indices().coeffRef(j));
+ return derived();
+ }
+
+ /** \returns the inverse permutation matrix.
+ *
+ * \note \note_try_to_help_rvo
+ */
+ inline Transpose<PermutationBase> inverse() const
+ { return derived(); }
+ /** \returns the transpose permutation matrix.
+ *
+ * \note \note_try_to_help_rvo
+ */
+ inline Transpose<PermutationBase> transpose() const
+ { return derived(); }
+
+ /**** multiplication helpers to hopefully get RVO ****/
+
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ protected:
+ template<typename OtherDerived>
+ void assignTranspose(const PermutationBase<OtherDerived>& other)
+ {
+ for (int i=0; i<rows();++i) indices().coeffRef(other.indices().coeff(i)) = i;
+ }
+ template<typename Lhs,typename Rhs>
+ void assignProduct(const Lhs& lhs, const Rhs& rhs)
+ {
+ eigen_assert(lhs.cols() == rhs.rows());
+ for (int i=0; i<rows();++i) indices().coeffRef(i) = lhs.indices().coeff(rhs.indices().coeff(i));
+ }
+#endif
+
+ public:
+
+ /** \returns the product permutation matrix.
+ *
+ * \note \note_try_to_help_rvo
+ */
+ template<typename Other>
+ inline PlainPermutationType operator*(const PermutationBase<Other>& other) const
+ { return PlainPermutationType(internal::PermPermProduct, derived(), other.derived()); }
+
+ /** \returns the product of a permutation with another inverse permutation.
+ *
+ * \note \note_try_to_help_rvo
+ */
+ template<typename Other>
+ inline PlainPermutationType operator*(const Transpose<PermutationBase<Other> >& other) const
+ { return PlainPermutationType(internal::PermPermProduct, *this, other.eval()); }
+
+ /** \returns the product of an inverse permutation with another permutation.
+ *
+ * \note \note_try_to_help_rvo
+ */
+ template<typename Other> friend
+ inline PlainPermutationType operator*(const Transpose<PermutationBase<Other> >& other, const PermutationBase& perm)
+ { return PlainPermutationType(internal::PermPermProduct, other.eval(), perm); }
+
+ protected:
+
+};
+
+/** \class PermutationMatrix
+ * \ingroup Core_Module
+ *
+ * \brief Permutation matrix
+ *
+ * \param SizeAtCompileTime the number of rows/cols, or Dynamic
+ * \param MaxSizeAtCompileTime the maximum number of rows/cols, or Dynamic. This optional parameter defaults to SizeAtCompileTime. Most of the time, you should not have to specify it.
+ * \param IndexType the integer type of the indices
+ *
+ * This class represents a permutation matrix, internally stored as a vector of integers.
+ *
+ * \sa class PermutationBase, class PermutationWrapper, class DiagonalMatrix
+ */
+
+namespace internal {
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType>
+struct traits<PermutationMatrix<SizeAtCompileTime, MaxSizeAtCompileTime, IndexType> >
+ : traits<Matrix<IndexType,SizeAtCompileTime,SizeAtCompileTime,0,MaxSizeAtCompileTime,MaxSizeAtCompileTime> >
+{
+ typedef IndexType Index;
+ typedef Matrix<IndexType, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1> IndicesType;
+};
+}
+
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType>
+class PermutationMatrix : public PermutationBase<PermutationMatrix<SizeAtCompileTime, MaxSizeAtCompileTime, IndexType> >
+{
+ typedef PermutationBase<PermutationMatrix> Base;
+ typedef internal::traits<PermutationMatrix> Traits;
+ public:
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef typename Traits::IndicesType IndicesType;
+ #endif
+
+ inline PermutationMatrix()
+ {}
+
+ /** Constructs an uninitialized permutation matrix of given size.
+ */
+ inline PermutationMatrix(int size) : m_indices(size)
+ {}
+
+ /** Copy constructor. */
+ template<typename OtherDerived>
+ inline PermutationMatrix(const PermutationBase<OtherDerived>& other)
+ : m_indices(other.indices()) {}
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** Standard copy constructor. Defined only to prevent a default copy constructor
+ * from hiding the other templated constructor */
+ inline PermutationMatrix(const PermutationMatrix& other) : m_indices(other.indices()) {}
+ #endif
+
+ /** Generic constructor from expression of the indices. The indices
+ * array has the meaning that the permutation sends each integer i to indices[i].
+ *
+ * \warning It is your responsibility to check that the indices array you pass actually
+ * describes a permutation, i.e., each value between 0 and n-1 occurs exactly once, where n is the
+ * array's size.
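+ *
+ * For instance (illustrative sketch, not from the original documentation), the following
+ * builds the permutation sending 0 to 2, 1 to 0 and 2 to 1:
+ * \code
+ * Eigen::Vector3i idx(2, 0, 1);
+ * Eigen::PermutationMatrix<3> P(idx);
+ * \endcode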
+ */
+ template<typename Other>
+ explicit inline PermutationMatrix(const MatrixBase<Other>& a_indices) : m_indices(a_indices)
+ {}
+
+ /** Convert the Transpositions \a tr to a permutation matrix */
+ template<typename Other>
+ explicit PermutationMatrix(const TranspositionsBase<Other>& tr)
+ : m_indices(tr.size())
+ {
+ *this = tr;
+ }
+
+ /** Copies the other permutation into *this */
+ template<typename Other>
+ PermutationMatrix& operator=(const PermutationBase<Other>& other)
+ {
+ m_indices = other.indices();
+ return *this;
+ }
+
+ /** Assignment from the Transpositions \a tr */
+ template<typename Other>
+ PermutationMatrix& operator=(const TranspositionsBase<Other>& tr)
+ {
+ return Base::operator=(tr.derived());
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ PermutationMatrix& operator=(const PermutationMatrix& other)
+ {
+ m_indices = other.m_indices;
+ return *this;
+ }
+ #endif
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return m_indices; }
+ /** \returns a reference to the stored array representing the permutation. */
+ IndicesType& indices() { return m_indices; }
+
+
+ /**** multiplication helpers to hopefully get RVO ****/
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename Other>
+ PermutationMatrix(const Transpose<PermutationBase<Other> >& other)
+ : m_indices(other.nestedPermutation().size())
+ {
+ for (int i=0; i<m_indices.size();++i) m_indices.coeffRef(other.nestedPermutation().indices().coeff(i)) = i;
+ }
+ template<typename Lhs,typename Rhs>
+ PermutationMatrix(internal::PermPermProduct_t, const Lhs& lhs, const Rhs& rhs)
+ : m_indices(lhs.indices().size())
+ {
+ Base::assignProduct(lhs,rhs);
+ }
+#endif
+
+ protected:
+
+ IndicesType m_indices;
+};
+
+
+namespace internal {
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType, int _PacketAccess>
+struct traits<Map<PermutationMatrix<SizeAtCompileTime, MaxSizeAtCompileTime, IndexType>,_PacketAccess> >
+ : traits<Matrix<IndexType,SizeAtCompileTime,SizeAtCompileTime,0,MaxSizeAtCompileTime,MaxSizeAtCompileTime> >
+{
+ typedef IndexType Index;
+ typedef Map<const Matrix<IndexType, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1>, _PacketAccess> IndicesType;
+};
+}
+
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType, int _PacketAccess>
+class Map<PermutationMatrix<SizeAtCompileTime, MaxSizeAtCompileTime, IndexType>,_PacketAccess>
+ : public PermutationBase<Map<PermutationMatrix<SizeAtCompileTime, MaxSizeAtCompileTime, IndexType>,_PacketAccess> >
+{
+ typedef PermutationBase<Map> Base;
+ typedef internal::traits<Map> Traits;
+ public:
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef typename Traits::IndicesType IndicesType;
+ typedef typename IndicesType::Scalar Index;
+ #endif
+
+ inline Map(const Index* indicesPtr)
+ : m_indices(indicesPtr)
+ {}
+
+ inline Map(const Index* indicesPtr, Index size)
+ : m_indices(indicesPtr,size)
+ {}
+
+ /** Copies the other permutation into *this */
+ template<typename Other>
+ Map& operator=(const PermutationBase<Other>& other)
+ { return Base::operator=(other.derived()); }
+
+ /** Assignment from the Transpositions \a tr */
+ template<typename Other>
+ Map& operator=(const TranspositionsBase<Other>& tr)
+ { return Base::operator=(tr.derived()); }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ Map& operator=(const Map& other)
+ {
+ m_indices = other.m_indices;
+ return *this;
+ }
+ #endif
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return m_indices; }
+ /** \returns a reference to the stored array representing the permutation. */
+ IndicesType& indices() { return m_indices; }
+
+ protected:
+
+ IndicesType m_indices;
+};
+
+/** \class PermutationWrapper
+ * \ingroup Core_Module
+ *
+ * \brief Class to view a vector of integers as a permutation matrix
+ *
+ * \param _IndicesType the type of the vector of integers (can be any compatible expression)
+ *
+ * This class makes it possible to view any vector expression of integers as a permutation matrix.
+ *
+ * \sa class PermutationBase, class PermutationMatrix
+ */
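+// Editorial sketch (not part of the upstream source; values are hypothetical): wrapping an
+// existing index vector as a permutation without copying it, here through
+// MatrixBase::asPermutation(), which returns a PermutationWrapper over the vector.
+//
+//   #include <Eigen/Dense>
+//
+//   void permutation_wrapper_demo()
+//   {
+//     Eigen::VectorXi idx(3);
+//     idx << 1, 2, 0;
+//     Eigen::MatrixXd A = Eigen::MatrixXd::Random(3, 3);
+//     Eigen::MatrixXd B = idx.asPermutation() * A;   // rows of A reordered through the wrapper
+//   }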
+
+struct PermutationStorage {};
+
+template<typename _IndicesType> class TranspositionsWrapper;
+namespace internal {
+template<typename _IndicesType>
+struct traits<PermutationWrapper<_IndicesType> >
+{
+ typedef PermutationStorage StorageKind;
+ typedef typename _IndicesType::Scalar Scalar;
+ typedef typename _IndicesType::Scalar Index;
+ typedef _IndicesType IndicesType;
+ enum {
+ RowsAtCompileTime = _IndicesType::SizeAtCompileTime,
+ ColsAtCompileTime = _IndicesType::SizeAtCompileTime,
+ MaxRowsAtCompileTime = IndicesType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = IndicesType::MaxColsAtCompileTime,
+ Flags = 0,
+ CoeffReadCost = _IndicesType::CoeffReadCost
+ };
+};
+}
+
+template<typename _IndicesType>
+class PermutationWrapper : public PermutationBase<PermutationWrapper<_IndicesType> >
+{
+ typedef PermutationBase<PermutationWrapper> Base;
+ typedef internal::traits<PermutationWrapper> Traits;
+ public:
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef typename Traits::IndicesType IndicesType;
+ #endif
+
+ inline PermutationWrapper(const IndicesType& a_indices)
+ : m_indices(a_indices)
+ {}
+
+ /** const version of indices(). */
+ const typename internal::remove_all<typename IndicesType::Nested>::type&
+ indices() const { return m_indices; }
+
+ protected:
+
+ typename IndicesType::Nested m_indices;
+};
+
+/** \returns the matrix with the permutation applied to the columns.
+ */
+template<typename Derived, typename PermutationDerived>
+inline const internal::permut_matrix_product_retval<PermutationDerived, Derived, OnTheRight>
+operator*(const MatrixBase<Derived>& matrix,
+ const PermutationBase<PermutationDerived> &permutation)
+{
+ return internal::permut_matrix_product_retval
+ <PermutationDerived, Derived, OnTheRight>
+ (permutation.derived(), matrix.derived());
+}
+
+/** \returns the matrix with the permutation applied to the rows.
+ */
+template<typename Derived, typename PermutationDerived>
+inline const internal::permut_matrix_product_retval
+ <PermutationDerived, Derived, OnTheLeft>
+operator*(const PermutationBase<PermutationDerived> &permutation,
+ const MatrixBase<Derived>& matrix)
+{
+ return internal::permut_matrix_product_retval
+ <PermutationDerived, Derived, OnTheLeft>
+ (permutation.derived(), matrix.derived());
+}
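+// Editorial note (not part of the upstream source): both products above return a lazy
+// permut_matrix_product_retval expression that is evaluated into its destination; when the
+// destination aliases the permuted matrix, the implementation below applies the permutation
+// in place. A hypothetical sketch:
+//
+//   Eigen::VectorXi idx(4);
+//   idx << 3, 2, 1, 0;
+//   Eigen::PermutationMatrix<Eigen::Dynamic> P(idx);
+//   Eigen::MatrixXd M = Eigen::MatrixXd::Random(4, 4);
+//   M = P * M;   // destination and operand are the same matrix: rows are permuted in place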
+
+namespace internal {
+
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed>
+struct traits<permut_matrix_product_retval<PermutationType, MatrixType, Side, Transposed> >
+{
+ typedef typename MatrixType::PlainObject ReturnType;
+};
+
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed>
+struct permut_matrix_product_retval
+ : public ReturnByValue<permut_matrix_product_retval<PermutationType, MatrixType, Side, Transposed> >
+{
+ typedef typename remove_all<typename MatrixType::Nested>::type MatrixTypeNestedCleaned;
+ typedef typename MatrixType::Index Index;
+
+ permut_matrix_product_retval(const PermutationType& perm, const MatrixType& matrix)
+ : m_permutation(perm), m_matrix(matrix)
+ {}
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ const Index n = Side==OnTheLeft ? rows() : cols();
+ // FIXME we need an is_same for expression that is not sensitive to constness. For instance
+ // is_same_xpr<Block<const Matrix>, Block<Matrix> >::value should be true.
+ if(is_same<MatrixTypeNestedCleaned,Dest>::value && extract_data(dst) == extract_data(m_matrix))
+ {
+ // apply the permutation inplace
+ Matrix<bool,PermutationType::RowsAtCompileTime,1,0,PermutationType::MaxRowsAtCompileTime> mask(m_permutation.size());
+ mask.fill(false);
+ Index r = 0;
+ while(r < m_permutation.size())
+ {
+ // search for the next seed
+ while(r<m_permutation.size() && mask[r]) r++;
+ if(r>=m_permutation.size())
+ break;
+ // we got one, let's follow it until we are back to the seed
+ Index k0 = r++;
+ Index kPrev = k0;
+ mask.coeffRef(k0) = true;
+ for(Index k=m_permutation.indices().coeff(k0); k!=k0; k=m_permutation.indices().coeff(k))
+ {
+ Block<Dest, Side==OnTheLeft ? 1 : Dest::RowsAtCompileTime, Side==OnTheRight ? 1 : Dest::ColsAtCompileTime>(dst, k)
+ .swap(Block<Dest, Side==OnTheLeft ? 1 : Dest::RowsAtCompileTime, Side==OnTheRight ? 1 : Dest::ColsAtCompileTime>
+ (dst,((Side==OnTheLeft) ^ Transposed) ? k0 : kPrev));
+
+ mask.coeffRef(k) = true;
+ kPrev = k;
+ }
+ }
+ }
+ else
+ {
+ for(int i = 0; i < n; ++i)
+ {
+ Block<Dest, Side==OnTheLeft ? 1 : Dest::RowsAtCompileTime, Side==OnTheRight ? 1 : Dest::ColsAtCompileTime>
+ (dst, ((Side==OnTheLeft) ^ Transposed) ? m_permutation.indices().coeff(i) : i)
+
+ =
+
+ Block<const MatrixTypeNestedCleaned,Side==OnTheLeft ? 1 : MatrixType::RowsAtCompileTime,Side==OnTheRight ? 1 : MatrixType::ColsAtCompileTime>
+ (m_matrix, ((Side==OnTheRight) ^ Transposed) ? m_permutation.indices().coeff(i) : i);
+ }
+ }
+ }
+
+ protected:
+ const PermutationType& m_permutation;
+ typename MatrixType::Nested m_matrix;
+};
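+// Editorial sketch of the cycle-following idea used in evalTo() above, rewritten as a
+// standalone (hypothetical) helper over std::vector so the control flow is easier to read:
+//
+//   #include <utility>
+//   #include <vector>
+//
+//   // Moves data[i] to data[perm[i]] for every i, using a visited mask and no second buffer.
+//   inline void apply_permutation_in_place(std::vector<double>& data, const std::vector<int>& perm)
+//   {
+//     std::vector<bool> done(perm.size(), false);
+//     for (std::size_t seed = 0; seed < perm.size(); ++seed) {
+//       if (done[seed]) continue;                  // this index already belongs to a handled cycle
+//       done[seed] = true;
+//       for (int k = perm[seed]; k != static_cast<int>(seed); k = perm[k]) {
+//         std::swap(data[seed], data[k]);          // walk the cycle, swapping against the seed
+//         done[k] = true;
+//       }
+//     }
+//   }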
+
+/* Template partial specialization for transposed/inverse permutations */
+
+template<typename Derived>
+struct traits<Transpose<PermutationBase<Derived> > >
+ : traits<Derived>
+{};
+
+} // end namespace internal
+
+template<typename Derived>
+class Transpose<PermutationBase<Derived> >
+ : public EigenBase<Transpose<PermutationBase<Derived> > >
+{
+ typedef Derived PermutationType;
+ typedef typename PermutationType::IndicesType IndicesType;
+ typedef typename PermutationType::PlainPermutationType PlainPermutationType;
+ public:
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ typedef internal::traits<PermutationType> Traits;
+ typedef typename Derived::DenseMatrixType DenseMatrixType;
+ enum {
+ Flags = Traits::Flags,
+ CoeffReadCost = Traits::CoeffReadCost,
+ RowsAtCompileTime = Traits::RowsAtCompileTime,
+ ColsAtCompileTime = Traits::ColsAtCompileTime,
+ MaxRowsAtCompileTime = Traits::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = Traits::MaxColsAtCompileTime
+ };
+ typedef typename Traits::Scalar Scalar;
+ #endif
+
+ Transpose(const PermutationType& p) : m_permutation(p) {}
+
+ inline int rows() const { return m_permutation.rows(); }
+ inline int cols() const { return m_permutation.cols(); }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename DenseDerived>
+ void evalTo(MatrixBase<DenseDerived>& other) const
+ {
+ other.setZero();
+ for (int i=0; i<rows();++i)
+ other.coeffRef(i, m_permutation.indices().coeff(i)) = typename DenseDerived::Scalar(1);
+ }
+ #endif
+
+ /** \return the equivalent permutation matrix */
+ PlainPermutationType eval() const { return *this; }
+
+ DenseMatrixType toDenseMatrix() const { return *this; }
+
+ /** \returns the matrix with the inverse permutation applied to the columns.
+ */
+ template<typename OtherDerived> friend
+ inline const internal::permut_matrix_product_retval<PermutationType, OtherDerived, OnTheRight, true>
+ operator*(const MatrixBase<OtherDerived>& matrix, const Transpose& trPerm)
+ {
+ return internal::permut_matrix_product_retval<PermutationType, OtherDerived, OnTheRight, true>(trPerm.m_permutation, matrix.derived());
+ }
+
+ /** \returns the matrix with the inverse permutation applied to the rows.
+ */
+ template<typename OtherDerived>
+ inline const internal::permut_matrix_product_retval<PermutationType, OtherDerived, OnTheLeft, true>
+ operator*(const MatrixBase<OtherDerived>& matrix) const
+ {
+ return internal::permut_matrix_product_retval<PermutationType, OtherDerived, OnTheLeft, true>(m_permutation, matrix.derived());
+ }
+
+ const PermutationType& nestedPermutation() const { return m_permutation; }
+
+ protected:
+ const PermutationType& m_permutation;
+};
+
+template<typename Derived>
+const PermutationWrapper<const Derived> MatrixBase<Derived>::asPermutation() const
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_PERMUTATIONMATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/PlainObjectBase.h b/third_party/eigen3/Eigen/src/Core/PlainObjectBase.h
new file mode 100644
index 0000000000..50c3656a98
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/PlainObjectBase.h
@@ -0,0 +1,895 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DENSESTORAGEBASE_H
+#define EIGEN_DENSESTORAGEBASE_H
+
+#if defined(EIGEN_INITIALIZE_MATRICES_BY_ZERO)
+# define EIGEN_INITIALIZE_COEFFS
+# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=Scalar(0);
+#elif defined(EIGEN_INITIALIZE_MATRICES_BY_NAN)
+# define EIGEN_INITIALIZE_COEFFS
+# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED for(int i=0;i<base().size();++i) coeffRef(i)=std::numeric_limits<Scalar>::quiet_NaN();
+#else
+# undef EIGEN_INITIALIZE_COEFFS
+# define EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+#endif
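+// Editorial note (not part of the upstream source): the hooks above are driven by configuration
+// macros that must be defined before any Eigen header is included. A hypothetical translation
+// unit that poisons freshly allocated coefficients with NaN to expose reads of uninitialized
+// entries:
+//
+//   #define EIGEN_INITIALIZE_MATRICES_BY_NAN
+//   #include <Eigen/Dense>
+//
+//   Eigen::MatrixXd m(2, 2);   // all four coefficients start as quiet NaN instead of garbage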
+
+namespace Eigen {
+
+namespace internal {
+
+template<int MaxSizeAtCompileTime> struct check_rows_cols_for_overflow {
+ template<typename Index>
+ EIGEN_DEVICE_FUNC
+ static EIGEN_ALWAYS_INLINE void run(Index, Index)
+ {
+ }
+};
+
+template<> struct check_rows_cols_for_overflow<Dynamic> {
+ template<typename Index>
+ EIGEN_DEVICE_FUNC
+ static EIGEN_ALWAYS_INLINE void run(Index rows, Index cols)
+ {
+ // http://hg.mozilla.org/mozilla-central/file/6c8a909977d3/xpcom/ds/CheckedInt.h#l242
+ // we assume Index is signed
+ Index max_index = (size_t(1) << (8 * sizeof(Index) - 1)) - 1; // assume Index is signed
+ bool error = (rows == 0 || cols == 0) ? false
+ : (rows > max_index / cols);
+ if (error)
+ throw_std_bad_alloc();
+ }
+};
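+// Editorial note: with a signed 32-bit Index, max_index is 2^31 - 1 = 2147483647, so for example
+// rows == cols == 70000 trips the check (70000 > 2147483647 / 70000 == 30678) and throws before
+// the rows*cols multiplication can overflow.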
+
+template <typename Derived,
+ typename OtherDerived = Derived,
+ bool IsVector = bool(Derived::IsVectorAtCompileTime) && bool(OtherDerived::IsVectorAtCompileTime)>
+struct conservative_resize_like_impl;
+
+template<typename MatrixTypeA, typename MatrixTypeB, bool SwapPointers> struct matrix_swap_impl;
+
+} // end namespace internal
+
+/** \class PlainObjectBase
+ * \brief %Dense storage base class for matrices and arrays.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_PLAINOBJECTBASE_PLUGIN.
+ *
+ * \sa \ref TopicClassHierarchy
+ */
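+// Editorial sketch of the plugin mechanism mentioned above (the file name is hypothetical): the
+// macro must be defined before including any Eigen header, and the named file is textually
+// included inside PlainObjectBase, so it should only contain member declarations/definitions.
+//
+//   // my_plain_object_base_plugin.h (user-provided)
+//   //   inline Scalar firstCoeff() const { return this->coeff(0); }
+//
+//   #define EIGEN_PLAINOBJECTBASE_PLUGIN "my_plain_object_base_plugin.h"
+//   #include <Eigen/Dense>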
+#ifdef EIGEN_PARSED_BY_DOXYGEN
+namespace internal {
+
+// this is a workaround for doxygen not being able to understand the inheritance logic
+// when it is hidden by the dense_xpr_base helper struct.
+template<typename Derived> struct dense_xpr_base_dispatcher_for_doxygen;// : public MatrixBase<Derived> {};
+/** This class is just a workaround for Doxygen and it does not actually exist. */
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct dense_xpr_base_dispatcher_for_doxygen<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+ : public MatrixBase<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> > {};
+/** This class is just a workaround for Doxygen and it does not actually exist. */
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct dense_xpr_base_dispatcher_for_doxygen<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> >
+ : public ArrayBase<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols> > {};
+
+} // namespace internal
+
+template<typename Derived>
+class PlainObjectBase : public internal::dense_xpr_base_dispatcher_for_doxygen<Derived>
+#else
+template<typename Derived>
+class PlainObjectBase : public internal::dense_xpr_base<Derived>::type
+#endif
+{
+ public:
+ enum { Options = internal::traits<Derived>::Options };
+ typedef typename internal::dense_xpr_base<Derived>::type Base;
+
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Derived DenseType;
+
+ using Base::RowsAtCompileTime;
+ using Base::ColsAtCompileTime;
+ using Base::SizeAtCompileTime;
+ using Base::MaxRowsAtCompileTime;
+ using Base::MaxColsAtCompileTime;
+ using Base::MaxSizeAtCompileTime;
+ using Base::IsVectorAtCompileTime;
+ using Base::Flags;
+
+ template<typename PlainObjectType, int MapOptions, typename StrideType> friend class Eigen::Map;
+ friend class Eigen::Map<Derived, Unaligned>;
+ typedef Eigen::Map<Derived, Unaligned> MapType;
+ friend class Eigen::Map<const Derived, Unaligned>;
+ typedef const Eigen::Map<const Derived, Unaligned> ConstMapType;
+ friend class Eigen::Map<Derived, Aligned>;
+ typedef Eigen::Map<Derived, Aligned> AlignedMapType;
+ friend class Eigen::Map<const Derived, Aligned>;
+ typedef const Eigen::Map<const Derived, Aligned> ConstAlignedMapType;
+ template<typename StrideType> struct StridedMapType { typedef Eigen::Map<Derived, Unaligned, StrideType> type; };
+ template<typename StrideType> struct StridedConstMapType { typedef Eigen::Map<const Derived, Unaligned, StrideType> type; };
+ template<typename StrideType> struct StridedAlignedMapType { typedef Eigen::Map<Derived, Aligned, StrideType> type; };
+ template<typename StrideType> struct StridedConstAlignedMapType { typedef Eigen::Map<const Derived, Aligned, StrideType> type; };
+
+ protected:
+ DenseStorage<Scalar, Base::MaxSizeAtCompileTime, Base::RowsAtCompileTime, Base::ColsAtCompileTime, Options> m_storage;
+
+ public:
+ enum { NeedsToAlign = SizeAtCompileTime != Dynamic && (internal::traits<Derived>::Flags & AlignedBit) != 0 };
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign)
+
+ EIGEN_DEVICE_FUNC
+ Base& base() { return *static_cast<Base*>(this); }
+ EIGEN_DEVICE_FUNC
+ const Base& base() const { return *static_cast<const Base*>(this); }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rows() const { return m_storage.rows(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index cols() const { return m_storage.cols(); }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeff(Index rowId, Index colId) const
+ {
+ if(Flags & RowMajorBit)
+ return m_storage.data()[colId + rowId * m_storage.cols()];
+ else // column-major
+ return m_storage.data()[rowId + colId * m_storage.rows()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const
+ {
+ return m_storage.data()[index];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index rowId, Index colId)
+ {
+ if(Flags & RowMajorBit)
+ return m_storage.data()[colId + rowId * m_storage.cols()];
+ else // column-major
+ return m_storage.data()[rowId + colId * m_storage.rows()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ return m_storage.data()[index];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ if(Flags & RowMajorBit)
+ return m_storage.data()[colId + rowId * m_storage.cols()];
+ else // column-major
+ return m_storage.data()[rowId + colId * m_storage.rows()];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeffRef(Index index) const
+ {
+ return m_storage.data()[index];
+ }
+
+ /** \internal */
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index rowId, Index colId) const
+ {
+ return internal::ploadt<PacketScalar, LoadMode>
+ (m_storage.data() + (Flags & RowMajorBit
+ ? colId + rowId * m_storage.cols()
+ : rowId + colId * m_storage.rows()));
+ }
+
+ /** \internal */
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketScalar packet(Index index) const
+ {
+ return internal::ploadt<PacketScalar, LoadMode>(m_storage.data() + index);
+ }
+
+ /** \internal */
+ template<int StoreMode>
+ EIGEN_STRONG_INLINE void writePacket(Index rowId, Index colId, const PacketScalar& val)
+ {
+ internal::pstoret<Scalar, PacketScalar, StoreMode>
+ (m_storage.data() + (Flags & RowMajorBit
+ ? colId + rowId * m_storage.cols()
+ : rowId + colId * m_storage.rows()), val);
+ }
+
+ /** \internal */
+ template<int StoreMode>
+ EIGEN_STRONG_INLINE void writePacket(Index index, const PacketScalar& val)
+ {
+ internal::pstoret<Scalar, PacketScalar, StoreMode>(m_storage.data() + index, val);
+ }
+
+ /** \returns a const pointer to the data array of this matrix */
+ EIGEN_STRONG_INLINE const Scalar *data() const
+ { return m_storage.data(); }
+
+ /** \returns a pointer to the data array of this matrix */
+ EIGEN_STRONG_INLINE Scalar *data()
+ { return m_storage.data(); }
+
+ /** Resizes \c *this to a \a rows x \a cols matrix.
+ *
+ * This method is intended for dynamic-size matrices, although it is legal to call it on any
+ * matrix as long as fixed dimensions are left unchanged. If you only want to change the number
+ * of rows and/or of columns, you can use resize(NoChange_t, Index), resize(Index, NoChange_t).
+ *
+ * If the current number of coefficients of \c *this exactly matches the
+ * product \a rows * \a cols, then no memory allocation is performed and
+ * the current values are left unchanged. In all other cases, including
+ * shrinking, the data is reallocated and all previous values are lost.
+ *
+ * Example: \include Matrix_resize_int_int.cpp
+ * Output: \verbinclude Matrix_resize_int_int.out
+ *
+ * \sa resize(Index) for vectors, resize(NoChange_t, Index), resize(Index, NoChange_t)
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void resize(Index nbRows, Index nbCols)
+ {
+ eigen_assert( EIGEN_IMPLIES(RowsAtCompileTime!=Dynamic,nbRows==RowsAtCompileTime)
+ && EIGEN_IMPLIES(ColsAtCompileTime!=Dynamic,nbCols==ColsAtCompileTime)
+ && EIGEN_IMPLIES(RowsAtCompileTime==Dynamic && MaxRowsAtCompileTime!=Dynamic,nbRows<=MaxRowsAtCompileTime)
+ && EIGEN_IMPLIES(ColsAtCompileTime==Dynamic && MaxColsAtCompileTime!=Dynamic,nbCols<=MaxColsAtCompileTime)
+ && nbRows>=0 && nbCols>=0 && "Invalid sizes when resizing a matrix or array.");
+ internal::check_rows_cols_for_overflow<MaxSizeAtCompileTime>::run(nbRows, nbCols);
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ Index size = nbRows*nbCols;
+ bool size_changed = size != this->size();
+ m_storage.resize(size, nbRows, nbCols);
+ if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ #else
+ m_storage.resize(nbRows*nbCols, nbRows, nbCols);
+ #endif
+ }
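+ // Editorial example of the reallocation rule documented above (sizes are hypothetical):
+ //
+ //   Eigen::MatrixXd m(2, 6);
+ //   m.resize(3, 4);   // still 12 coefficients: no reallocation, the data is kept and reinterpreted
+ //   m.resize(5, 5);   // different number of coefficients: reallocates, previous values are lost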
+
+ /** Resizes \c *this to a vector of length \a size
+ *
+ * \only_for_vectors. This method does not work for
+ * partially dynamic matrices when the static dimension is anything other
+ * than 1. For example it will not work with Matrix<double, 2, Dynamic>.
+ *
+ * Example: \include Matrix_resize_int.cpp
+ * Output: \verbinclude Matrix_resize_int.out
+ *
+ * \sa resize(Index,Index), resize(NoChange_t, Index), resize(Index, NoChange_t)
+ */
+ EIGEN_DEVICE_FUNC
+ inline void resize(Index size)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(PlainObjectBase)
+ eigen_assert(((SizeAtCompileTime == Dynamic && (MaxSizeAtCompileTime==Dynamic || size<=MaxSizeAtCompileTime)) || SizeAtCompileTime == size) && size>=0);
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ bool size_changed = size != this->size();
+ #endif
+ if(RowsAtCompileTime == 1)
+ m_storage.resize(size, 1, size);
+ else
+ m_storage.resize(size, size, 1);
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ #endif
+ }
+
+ /** Resizes the matrix, changing only the number of columns. For the parameter of type NoChange_t, just pass the special value \c NoChange
+ * as in the example below.
+ *
+ * Example: \include Matrix_resize_NoChange_int.cpp
+ * Output: \verbinclude Matrix_resize_NoChange_int.out
+ *
+ * \sa resize(Index,Index)
+ */
+ EIGEN_DEVICE_FUNC
+ inline void resize(NoChange_t, Index nbCols)
+ {
+ resize(rows(), nbCols);
+ }
+
+ /** Resizes the matrix, changing only the number of rows. For the parameter of type NoChange_t, just pass the special value \c NoChange
+ * as in the example below.
+ *
+ * Example: \include Matrix_resize_int_NoChange.cpp
+ * Output: \verbinclude Matrix_resize_int_NoChange.out
+ *
+ * \sa resize(Index,Index)
+ */
+ EIGEN_DEVICE_FUNC
+ inline void resize(Index nbRows, NoChange_t)
+ {
+ resize(nbRows, cols());
+ }
+
+ /** Resizes \c *this to have the same dimensions as \a other.
+ * Takes care of doing all the checking that's needed.
+ *
+ * Note that copying a row-vector into a vector (and conversely) is allowed.
+ * The resizing, if any, is then done in the appropriate way so that row-vectors
+ * remain row-vectors and vectors remain vectors.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void resizeLike(const EigenBase<OtherDerived>& _other)
+ {
+ const OtherDerived& other = _other.derived();
+ internal::check_rows_cols_for_overflow<MaxSizeAtCompileTime>::run(other.rows(), other.cols());
+ const Index othersize = other.rows()*other.cols();
+ if(RowsAtCompileTime == 1)
+ {
+ eigen_assert(other.rows() == 1 || other.cols() == 1);
+ resize(1, othersize);
+ }
+ else if(ColsAtCompileTime == 1)
+ {
+ eigen_assert(other.rows() == 1 || other.cols() == 1);
+ resize(othersize, 1);
+ }
+ else resize(other.rows(), other.cols());
+ }
+
+ /** Resizes the matrix to \a rows x \a cols while leaving old values untouched.
+ *
+ * The method is intended for matrices of dynamic size. If you only want to change the number
+ * of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or
+ * conservativeResize(Index, NoChange_t).
+ *
+ * Matrices are resized relative to the top-left element. In case values need to be
+ * appended to the matrix they will be uninitialized.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void conservativeResize(Index nbRows, Index nbCols)
+ {
+ internal::conservative_resize_like_impl<Derived>::run(*this, nbRows, nbCols);
+ }
+
+ /** Resizes the matrix to \a rows x \a cols while leaving old values untouched.
+ *
+ * As opposed to conservativeResize(Index rows, Index cols), this version leaves
+ * the number of columns unchanged.
+ *
+ * In case the matrix is growing, new rows will be uninitialized.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void conservativeResize(Index nbRows, NoChange_t)
+ {
+ // Note: see the comment in conservativeResize(Index,Index)
+ conservativeResize(nbRows, cols());
+ }
+
+ /** Resizes the matrix to \a rows x \a cols while leaving old values untouched.
+ *
+ * As opposed to conservativeResize(Index rows, Index cols), this version leaves
+ * the number of rows unchanged.
+ *
+ * In case the matrix is growing, new columns will be uninitialized.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void conservativeResize(NoChange_t, Index nbCols)
+ {
+ // Note: see the comment in conservativeResize(Index,Index)
+ conservativeResize(rows(), nbCols);
+ }
+
+ /** Resizes the vector to \a size while retaining old values.
+ *
+ * \only_for_vectors. This method does not work for
+ * partially dynamic matrices when the static dimension is anything other
+ * than 1. For example it will not work with Matrix<double, 2, Dynamic>.
+ *
+ * When values are appended, they will be uninitialized.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void conservativeResize(Index size)
+ {
+ internal::conservative_resize_like_impl<Derived>::run(*this, size);
+ }
+
+ /** Resizes the matrix to the \a rows x \a cols dimensions of \c other, while leaving old values untouched.
+ *
+ * The method is intended for matrices of dynamic size. If you only want to change the number
+ * of rows and/or of columns, you can use conservativeResize(NoChange_t, Index) or
+ * conservativeResize(Index, NoChange_t).
+ *
+ * Matrices are resized relative to the top-left element. In case values need to be
+ * appended to the matrix they will be copied from \c other.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void conservativeResizeLike(const DenseBase<OtherDerived>& other)
+ {
+ internal::conservative_resize_like_impl<Derived,OtherDerived>::run(*this, other);
+ }
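+ // Editorial example contrasting resize() and conservativeResize() (sizes are hypothetical):
+ //
+ //   Eigen::MatrixXd m = Eigen::MatrixXd::Constant(2, 2, 1.0);
+ //   m.conservativeResize(3, 3);                // the original 2x2 top-left block keeps its 1.0s,
+ //                                              // the appended row and column are uninitialized
+ //   m.conservativeResize(Eigen::NoChange, 2);  // shrink to 3x2 while keeping the remaining values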
+
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& operator=(const PlainObjectBase& other)
+ {
+ return _set(other);
+ }
+
+ /** \sa MatrixBase::lazyAssign() */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& lazyAssign(const DenseBase<OtherDerived>& other)
+ {
+ _resize_to_match(other);
+ return Base::lazyAssign(other.derived());
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& operator=(const ReturnByValue<OtherDerived>& func)
+ {
+ resize(func.rows(), func.cols());
+ return Base::operator=(func);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE PlainObjectBase() : m_storage()
+ {
+// _check_template_params();
+// EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ // FIXME is it still needed ?
+ /** \internal */
+ EIGEN_DEVICE_FUNC
+ PlainObjectBase(internal::constructor_without_unaligned_array_assert)
+ : m_storage(internal::constructor_without_unaligned_array_assert())
+ {
+// _check_template_params(); EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+#endif
+
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ EIGEN_DEVICE_FUNC
+ PlainObjectBase(PlainObjectBase&& other)
+ : m_storage( std::move(other.m_storage) )
+ {
+ }
+
+ EIGEN_DEVICE_FUNC
+ PlainObjectBase& operator=(PlainObjectBase&& other)
+ {
+ using std::swap;
+ swap(m_storage, other.m_storage);
+ return *this;
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE PlainObjectBase(Index a_size, Index nbRows, Index nbCols)
+ : m_storage(a_size, nbRows, nbCols)
+ {
+// _check_template_params();
+// EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+ /** \copydoc MatrixBase::operator=(const EigenBase<OtherDerived>&)
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& operator=(const EigenBase<OtherDerived> &other)
+ {
+ _resize_to_match(other);
+ Base::operator=(other.derived());
+ return this->derived();
+ }
+
+ /** \sa MatrixBase::operator=(const EigenBase<OtherDerived>&) */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE PlainObjectBase(const EigenBase<OtherDerived> &other)
+ : m_storage(other.derived().rows() * other.derived().cols(), other.derived().rows(), other.derived().cols())
+ {
+ _check_template_params();
+ internal::check_rows_cols_for_overflow<MaxSizeAtCompileTime>::run(other.derived().rows(), other.derived().cols());
+ Base::operator=(other.derived());
+ }
+
+ /** \name Map
+ * These are convenience functions returning Map objects. The Map() static functions return unaligned Map objects,
+ * while the MapAligned() functions return aligned Map objects and thus should be called only with 16-byte-aligned
+ * \a data pointers.
+ *
+ * \see class Map
+ */
+ //@{
+ static inline ConstMapType Map(const Scalar* data)
+ { return ConstMapType(data); }
+ static inline MapType Map(Scalar* data)
+ { return MapType(data); }
+ static inline ConstMapType Map(const Scalar* data, Index size)
+ { return ConstMapType(data, size); }
+ static inline MapType Map(Scalar* data, Index size)
+ { return MapType(data, size); }
+ static inline ConstMapType Map(const Scalar* data, Index rows, Index cols)
+ { return ConstMapType(data, rows, cols); }
+ static inline MapType Map(Scalar* data, Index rows, Index cols)
+ { return MapType(data, rows, cols); }
+
+ static inline ConstAlignedMapType MapAligned(const Scalar* data)
+ { return ConstAlignedMapType(data); }
+ static inline AlignedMapType MapAligned(Scalar* data)
+ { return AlignedMapType(data); }
+ static inline ConstAlignedMapType MapAligned(const Scalar* data, Index size)
+ { return ConstAlignedMapType(data, size); }
+ static inline AlignedMapType MapAligned(Scalar* data, Index size)
+ { return AlignedMapType(data, size); }
+ static inline ConstAlignedMapType MapAligned(const Scalar* data, Index rows, Index cols)
+ { return ConstAlignedMapType(data, rows, cols); }
+ static inline AlignedMapType MapAligned(Scalar* data, Index rows, Index cols)
+ { return AlignedMapType(data, rows, cols); }
+
+ template<int Outer, int Inner>
+ static inline typename StridedConstMapType<Stride<Outer, Inner> >::type Map(const Scalar* data, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstMapType<Stride<Outer, Inner> >::type(data, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedMapType<Stride<Outer, Inner> >::type Map(Scalar* data, const Stride<Outer, Inner>& stride)
+ { return typename StridedMapType<Stride<Outer, Inner> >::type(data, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedConstMapType<Stride<Outer, Inner> >::type Map(const Scalar* data, Index size, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstMapType<Stride<Outer, Inner> >::type(data, size, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedMapType<Stride<Outer, Inner> >::type Map(Scalar* data, Index size, const Stride<Outer, Inner>& stride)
+ { return typename StridedMapType<Stride<Outer, Inner> >::type(data, size, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedConstMapType<Stride<Outer, Inner> >::type Map(const Scalar* data, Index rows, Index cols, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstMapType<Stride<Outer, Inner> >::type(data, rows, cols, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedMapType<Stride<Outer, Inner> >::type Map(Scalar* data, Index rows, Index cols, const Stride<Outer, Inner>& stride)
+ { return typename StridedMapType<Stride<Outer, Inner> >::type(data, rows, cols, stride); }
+
+ template<int Outer, int Inner>
+ static inline typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type MapAligned(const Scalar* data, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type(data, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedAlignedMapType<Stride<Outer, Inner> >::type MapAligned(Scalar* data, const Stride<Outer, Inner>& stride)
+ { return typename StridedAlignedMapType<Stride<Outer, Inner> >::type(data, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type MapAligned(const Scalar* data, Index size, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type(data, size, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedAlignedMapType<Stride<Outer, Inner> >::type MapAligned(Scalar* data, Index size, const Stride<Outer, Inner>& stride)
+ { return typename StridedAlignedMapType<Stride<Outer, Inner> >::type(data, size, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type MapAligned(const Scalar* data, Index rows, Index cols, const Stride<Outer, Inner>& stride)
+ { return typename StridedConstAlignedMapType<Stride<Outer, Inner> >::type(data, rows, cols, stride); }
+ template<int Outer, int Inner>
+ static inline typename StridedAlignedMapType<Stride<Outer, Inner> >::type MapAligned(Scalar* data, Index rows, Index cols, const Stride<Outer, Inner>& stride)
+ { return typename StridedAlignedMapType<Stride<Outer, Inner> >::type(data, rows, cols, stride); }
+ //@}
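+ // Editorial usage sketch for the static Map() helpers above (buffer contents are hypothetical):
+ // wrapping memory that Eigen does not own, without copying it. The MapAligned() variants assume
+ // the pointer additionally satisfies Eigen's alignment requirement.
+ //
+ //   double buffer[6] = {1, 2, 3, 4, 5, 6};
+ //   Eigen::Map<Eigen::MatrixXd> view = Eigen::MatrixXd::Map(buffer, 2, 3);  // 2x3 view, no copy
+ //   view(0, 0) = 42.0;                                                      // writes into buffer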
+
+ using Base::setConstant;
+ EIGEN_DEVICE_FUNC Derived& setConstant(Index size, const Scalar& value);
+ EIGEN_DEVICE_FUNC Derived& setConstant(Index rows, Index cols, const Scalar& value);
+
+ using Base::setZero;
+ EIGEN_DEVICE_FUNC Derived& setZero(Index size);
+ EIGEN_DEVICE_FUNC Derived& setZero(Index rows, Index cols);
+
+ using Base::setOnes;
+ EIGEN_DEVICE_FUNC Derived& setOnes(Index size);
+ EIGEN_DEVICE_FUNC Derived& setOnes(Index rows, Index cols);
+
+ using Base::setRandom;
+ Derived& setRandom(Index size);
+ Derived& setRandom(Index rows, Index cols);
+
+ #ifdef EIGEN_PLAINOBJECTBASE_PLUGIN
+ #include EIGEN_PLAINOBJECTBASE_PLUGIN
+ #endif
+
+ protected:
+ /** \internal Resizes *this in preparation for assigning \a other to it.
+ * Takes care of doing all the checking that's needed.
+ *
+ * Note that copying a row-vector into a vector (and conversely) is allowed.
+ * The resizing, if any, is then done in the appropriate way so that row-vectors
+ * remain row-vectors and vectors remain vectors.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _resize_to_match(const EigenBase<OtherDerived>& other)
+ {
+ #ifdef EIGEN_NO_AUTOMATIC_RESIZING
+ eigen_assert((this->size()==0 || (IsVectorAtCompileTime ? (this->size() == other.size())
+ : (rows() == other.rows() && cols() == other.cols())))
+ && "Size mismatch. Automatic resizing is disabled because EIGEN_NO_AUTOMATIC_RESIZING is defined");
+ EIGEN_ONLY_USED_FOR_DEBUG(other);
+ #else
+ resizeLike(other);
+ #endif
+ }
+
+ /**
+ * \brief Copies the value of the expression \a other into \c *this with automatic resizing.
+ *
+ * *this might be resized to match the dimensions of \a other. If *this was a null matrix (not already initialized),
+ * it will be initialized.
+ *
+ * Note that copying a row-vector into a vector (and conversely) is allowed.
+ * The resizing, if any, is then done in the appropriate way so that row-vectors
+ * remain row-vectors and vectors remain vectors.
+ *
+ * \sa operator=(const MatrixBase<OtherDerived>&), _set_noalias()
+ *
+ * \internal
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& _set(const DenseBase<OtherDerived>& other)
+ {
+ _set_selector(other.derived(), typename internal::conditional<static_cast<bool>(int(OtherDerived::Flags) & EvalBeforeAssigningBit), internal::true_type, internal::false_type>::type());
+ return this->derived();
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _set_selector(const OtherDerived& other, const internal::true_type&) { _set_noalias(other.eval()); }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _set_selector(const OtherDerived& other, const internal::false_type&) { _set_noalias(other); }
+
+ /** \internal Like _set() but additionally makes the assumption that no aliasing effect can happen (which
+ * is the case when creating a new matrix) so one can enforce lazy evaluation.
+ *
+ * \sa operator=(const MatrixBase<OtherDerived>&), _set()
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& _set_noalias(const DenseBase<OtherDerived>& other)
+ {
+ // I don't think we need this resize call since the lazyAssign will anyway resize
+ // and lazyAssign will be called by the assign selector.
+ //_resize_to_match(other);
+ // the 'false' below means to enforce lazy evaluation. We don't use lazyAssign() because
+ // it wouldn't allow copying a row-vector into a column-vector.
+ return internal::assign_selector<Derived,OtherDerived,false>::run(this->derived(), other.derived());
+ }
+
+ template<typename T0, typename T1>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init2(Index nbRows, Index nbCols, typename internal::enable_if<Base::SizeAtCompileTime!=2,T0>::type* = 0)
+ {
+ EIGEN_STATIC_ASSERT(bool(NumTraits<T0>::IsInteger) &&
+ bool(NumTraits<T1>::IsInteger),
+ FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED)
+ resize(nbRows,nbCols);
+ }
+ template<typename T0, typename T1>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init2(const Scalar& val0, const Scalar& val1, typename internal::enable_if<Base::SizeAtCompileTime==2,T0>::type* = 0)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 2)
+ m_storage.data()[0] = val0;
+ m_storage.data()[1] = val1;
+ }
+
+ template<typename T>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(Index size, typename internal::enable_if<Base::SizeAtCompileTime!=1,T>::type* = 0)
+ {
+ EIGEN_STATIC_ASSERT(bool(NumTraits<T>::IsInteger),
+ FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED)
+ resize(size);
+ }
+ template<typename T>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const Scalar& val0, typename internal::enable_if<Base::SizeAtCompileTime==1,T>::type* = 0)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(PlainObjectBase, 1)
+ m_storage.data()[0] = val0;
+ }
+
+ template<typename T>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const Scalar* data){
+ this->_set_noalias(ConstMapType(data));
+ }
+
+ template<typename T, typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const DenseBase<OtherDerived>& other){
+ this->_set_noalias(other);
+ }
+
+ template<typename T, typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const EigenBase<OtherDerived>& other){
+ this->derived() = other;
+ }
+
+ template<typename T, typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const ReturnByValue<OtherDerived>& other)
+ {
+ resize(other.rows(), other.cols());
+ other.evalTo(this->derived());
+ }
+
+ template<typename T, typename OtherDerived, int ColsAtCompileTime>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void _init1(const RotationBase<OtherDerived,ColsAtCompileTime>& r)
+ {
+ this->derived() = r;
+ }
+
+ template<typename MatrixTypeA, typename MatrixTypeB, bool SwapPointers>
+ friend struct internal::matrix_swap_impl;
+
+ /** \internal generic implementation of swap for dense storage since for dynamic-sized matrices of the same type it is enough to swap the
+ * data pointers.
+ */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void _swap(DenseBase<OtherDerived> const & other)
+ {
+ enum { SwapPointers = internal::is_same<Derived, OtherDerived>::value && Base::SizeAtCompileTime==Dynamic };
+ internal::matrix_swap_impl<Derived, OtherDerived, bool(SwapPointers)>::run(this->derived(), other.const_cast_derived());
+ }
+
+ public:
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void _check_template_params()
+ {
+ EIGEN_STATIC_ASSERT((EIGEN_IMPLIES(MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1, (Options&RowMajor)==RowMajor)
+ && EIGEN_IMPLIES(MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1, (Options&RowMajor)==0)
+ && ((RowsAtCompileTime == Dynamic) || (RowsAtCompileTime >= 0))
+ && ((ColsAtCompileTime == Dynamic) || (ColsAtCompileTime >= 0))
+ && ((MaxRowsAtCompileTime == Dynamic) || (MaxRowsAtCompileTime >= 0))
+ && ((MaxColsAtCompileTime == Dynamic) || (MaxColsAtCompileTime >= 0))
+ && (MaxRowsAtCompileTime == RowsAtCompileTime || RowsAtCompileTime==Dynamic)
+ && (MaxColsAtCompileTime == ColsAtCompileTime || ColsAtCompileTime==Dynamic)
+ && (Options & (DontAlign|RowMajor)) == Options),
+ INVALID_MATRIX_TEMPLATE_PARAMETERS)
+ }
+#endif
+
+private:
+ enum { ThisConstantIsPrivateInPlainObjectBase };
+};
+
+namespace internal {
+
+template <typename Derived, typename OtherDerived, bool IsVector>
+struct conservative_resize_like_impl
+{
+ typedef typename Derived::Index Index;
+ static void run(DenseBase<Derived>& _this, Index rows, Index cols)
+ {
+ if (_this.rows() == rows && _this.cols() == cols) return;
+ EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived)
+
+ if ( ( Derived::IsRowMajor && _this.cols() == cols) || // row-major and we change only the number of rows
+ (!Derived::IsRowMajor && _this.rows() == rows) ) // column-major and we change only the number of columns
+ {
+ internal::check_rows_cols_for_overflow<Derived::MaxSizeAtCompileTime>::run(rows, cols);
+ _this.derived().m_storage.conservativeResize(rows*cols,rows,cols);
+ }
+ else
+ {
+ // The storage order does not allow us to use reallocation.
+ typename Derived::PlainObject tmp(rows,cols);
+ const Index common_rows = (std::min)(rows, _this.rows());
+ const Index common_cols = (std::min)(cols, _this.cols());
+ tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);
+ _this.derived().swap(tmp);
+ }
+ }
+
+ static void run(DenseBase<Derived>& _this, const DenseBase<OtherDerived>& other)
+ {
+ if (_this.rows() == other.rows() && _this.cols() == other.cols()) return;
+
+ // Note: There is room for improvement here. Basically, for conservativeResize(Index,Index),
+ // neither RowsAtCompileTime nor ColsAtCompileTime needs to be Dynamic. If only one of the
+ // dimensions is dynamic, one could use either conservativeResize(Index rows, NoChange_t) or
+ // conservativeResize(NoChange_t, Index cols). For these methods new static asserts like
+ // EIGEN_STATIC_ASSERT_DYNAMIC_ROWS and EIGEN_STATIC_ASSERT_DYNAMIC_COLS would be good.
+ EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(Derived)
+ EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(OtherDerived)
+
+ if ( ( Derived::IsRowMajor && _this.cols() == other.cols()) || // row-major and we change only the number of rows
+ (!Derived::IsRowMajor && _this.rows() == other.rows()) ) // column-major and we change only the number of columns
+ {
+ const Index new_rows = other.rows() - _this.rows();
+ const Index new_cols = other.cols() - _this.cols();
+ _this.derived().m_storage.conservativeResize(other.size(),other.rows(),other.cols());
+ if (new_rows>0)
+ _this.bottomRightCorner(new_rows, other.cols()) = other.bottomRows(new_rows);
+ else if (new_cols>0)
+ _this.bottomRightCorner(other.rows(), new_cols) = other.rightCols(new_cols);
+ }
+ else
+ {
+ // The storage order does not allow us to use reallocation.
+ typename Derived::PlainObject tmp(other);
+ const Index common_rows = (std::min)(tmp.rows(), _this.rows());
+ const Index common_cols = (std::min)(tmp.cols(), _this.cols());
+ tmp.block(0,0,common_rows,common_cols) = _this.block(0,0,common_rows,common_cols);
+ _this.derived().swap(tmp);
+ }
+ }
+};
+
+// Here, the specialization for vectors inherits from the general matrix case
+// to allow calling .conservativeResize(rows,cols) on vectors.
+template <typename Derived, typename OtherDerived>
+struct conservative_resize_like_impl<Derived,OtherDerived,true>
+ : conservative_resize_like_impl<Derived,OtherDerived,false>
+{
+ using conservative_resize_like_impl<Derived,OtherDerived,false>::run;
+
+ typedef typename Derived::Index Index;
+ static void run(DenseBase<Derived>& _this, Index size)
+ {
+ const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : size;
+ const Index new_cols = Derived::RowsAtCompileTime==1 ? size : 1;
+ _this.derived().m_storage.conservativeResize(size,new_rows,new_cols);
+ }
+
+ static void run(DenseBase<Derived>& _this, const DenseBase<OtherDerived>& other)
+ {
+ if (_this.rows() == other.rows() && _this.cols() == other.cols()) return;
+
+ const Index num_new_elements = other.size() - _this.size();
+
+ const Index new_rows = Derived::RowsAtCompileTime==1 ? 1 : other.rows();
+ const Index new_cols = Derived::RowsAtCompileTime==1 ? other.cols() : 1;
+ _this.derived().m_storage.conservativeResize(other.size(),new_rows,new_cols);
+
+ if (num_new_elements > 0)
+ _this.tail(num_new_elements) = other.tail(num_new_elements);
+ }
+};
+
+template<typename MatrixTypeA, typename MatrixTypeB, bool SwapPointers>
+struct matrix_swap_impl
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(MatrixTypeA& a, MatrixTypeB& b)
+ {
+ a.base().swap(b);
+ }
+};
+
+template<typename MatrixTypeA, typename MatrixTypeB>
+struct matrix_swap_impl<MatrixTypeA, MatrixTypeB, true>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(MatrixTypeA& a, MatrixTypeB& b)
+ {
+ static_cast<typename MatrixTypeA::Base&>(a).m_storage.swap(static_cast<typename MatrixTypeB::Base&>(b).m_storage);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_DENSESTORAGEBASE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Product.h b/third_party/eigen3/Eigen/src/Core/Product.h
new file mode 100644
index 0000000000..5d3789be74
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Product.h
@@ -0,0 +1,107 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PRODUCT_H
+#define EIGEN_PRODUCT_H
+
+namespace Eigen {
+
+template<typename Lhs, typename Rhs> class Product;
+template<typename Lhs, typename Rhs, typename StorageKind> class ProductImpl;
+
+/** \class Product
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the product of two arbitrary matrices or vectors
+ *
+ * \param Lhs the type of the left-hand side expression
+ * \param Rhs the type of the right-hand side expression
+ *
+ * This class represents an expression of the product of two arbitrary matrices.
+ *
+ */
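+// Editorial note (not part of the upstream source): in this revision Product<> is an
+// experimental expression type exercised through the internal prod() helper defined at the end
+// of this file; ordinary user code keeps writing lhs * rhs. A hypothetical, purely lazy sketch:
+//
+//   Eigen::MatrixXf A = Eigen::MatrixXf::Random(2, 3);
+//   Eigen::MatrixXf B = Eigen::MatrixXf::Random(3, 4);
+//   Eigen::Product<Eigen::MatrixXf, Eigen::MatrixXf> expr = Eigen::prod(A, B);
+//   // expr.rows() == 2 && expr.cols() == 4; no arithmetic has been performed yet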
+
+// Use ProductReturnType to get correct traits, in particular vectorization flags
+namespace internal {
+template<typename Lhs, typename Rhs>
+struct traits<Product<Lhs, Rhs> >
+ : traits<typename ProductReturnType<Lhs, Rhs>::Type>
+{
+ // We want A+B*C to be of type Product<Matrix, Sum> and not Product<Matrix, Matrix>
+ // TODO: This flag should eventually go in a separate evaluator traits class
+ enum {
+ Flags = traits<typename ProductReturnType<Lhs, Rhs>::Type>::Flags & ~(EvalBeforeNestingBit | DirectAccessBit)
+ };
+};
+} // end namespace internal
+
+
+template<typename Lhs, typename Rhs>
+class Product : public ProductImpl<Lhs,Rhs,typename internal::promote_storage_type<typename internal::traits<Lhs>::StorageKind,
+ typename internal::traits<Rhs>::StorageKind>::ret>
+{
+ public:
+
+ typedef typename ProductImpl<
+ Lhs, Rhs,
+ typename internal::promote_storage_type<typename Lhs::StorageKind,
+ typename Rhs::StorageKind>::ret>::Base Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(Product)
+
+ typedef typename Lhs::Nested LhsNested;
+ typedef typename Rhs::Nested RhsNested;
+ typedef typename internal::remove_all<LhsNested>::type LhsNestedCleaned;
+ typedef typename internal::remove_all<RhsNested>::type RhsNestedCleaned;
+
+ Product(const Lhs& lhs, const Rhs& rhs) : m_lhs(lhs), m_rhs(rhs)
+ {
+ eigen_assert(lhs.cols() == rhs.rows()
+ && "invalid matrix product"
+ && "if you wanted a coeff-wise or a dot product use the respective explicit functions");
+ }
+
+ inline Index rows() const { return m_lhs.rows(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ const LhsNestedCleaned& lhs() const { return m_lhs; }
+ const RhsNestedCleaned& rhs() const { return m_rhs; }
+
+ protected:
+
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+};
+
+template<typename Lhs, typename Rhs>
+class ProductImpl<Lhs,Rhs,Dense> : public internal::dense_xpr_base<Product<Lhs,Rhs> >::type
+{
+ typedef Product<Lhs, Rhs> Derived;
+ public:
+
+ typedef typename internal::dense_xpr_base<Product<Lhs, Rhs> >::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Derived)
+};
+
+/***************************************************************************
+* Implementation of matrix base methods
+***************************************************************************/
+
+
+/** \internal used to test the evaluator only
+ */
+template<typename Lhs,typename Rhs>
+const Product<Lhs,Rhs>
+prod(const Lhs& lhs, const Rhs& rhs)
+{
+ return Product<Lhs,Rhs>(lhs,rhs);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_PRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/Core/ProductBase.h b/third_party/eigen3/Eigen/src/Core/ProductBase.h
new file mode 100644
index 0000000000..b6152cb8ca
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ProductBase.h
@@ -0,0 +1,280 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PRODUCTBASE_H
+#define EIGEN_PRODUCTBASE_H
+
+namespace Eigen {
+
+/** \class ProductBase
+ * \ingroup Core_Module
+ *
+ */
+
+namespace internal {
+template<typename Derived, typename _Lhs, typename _Rhs>
+struct traits<ProductBase<Derived,_Lhs,_Rhs> >
+{
+ typedef MatrixXpr XprKind;
+ typedef typename remove_all<_Lhs>::type Lhs;
+ typedef typename remove_all<_Rhs>::type Rhs;
+ typedef typename scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType Scalar;
+ typedef typename promote_storage_type<typename traits<Lhs>::StorageKind,
+ typename traits<Rhs>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<Lhs>::Index,
+ typename traits<Rhs>::Index>::type Index;
+ enum {
+ RowsAtCompileTime = traits<Lhs>::RowsAtCompileTime,
+ ColsAtCompileTime = traits<Rhs>::ColsAtCompileTime,
+ MaxRowsAtCompileTime = traits<Lhs>::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = traits<Rhs>::MaxColsAtCompileTime,
+ Flags = (MaxRowsAtCompileTime==1 ? RowMajorBit : 0)
+ | EvalBeforeNestingBit | EvalBeforeAssigningBit | NestByRefBit,
+ // Note that EvalBeforeNestingBit and NestByRefBit
+ // are not used in practice because nested is overloaded for products
+ CoeffReadCost = 0 // FIXME why is it needed ?
+ };
+};
+}
+
+#define EIGEN_PRODUCT_PUBLIC_INTERFACE(Derived) \
+ typedef ProductBase<Derived, Lhs, Rhs > Base; \
+ EIGEN_DENSE_PUBLIC_INTERFACE(Derived) \
+ typedef typename Base::LhsNested LhsNested; \
+ typedef typename Base::_LhsNested _LhsNested; \
+ typedef typename Base::LhsBlasTraits LhsBlasTraits; \
+ typedef typename Base::ActualLhsType ActualLhsType; \
+ typedef typename Base::_ActualLhsType _ActualLhsType; \
+ typedef typename Base::RhsNested RhsNested; \
+ typedef typename Base::_RhsNested _RhsNested; \
+ typedef typename Base::RhsBlasTraits RhsBlasTraits; \
+ typedef typename Base::ActualRhsType ActualRhsType; \
+ typedef typename Base::_ActualRhsType _ActualRhsType; \
+ using Base::m_lhs; \
+ using Base::m_rhs;
+
+template<typename Derived, typename Lhs, typename Rhs>
+class ProductBase : public MatrixBase<Derived>
+{
+ public:
+ typedef MatrixBase<Derived> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(ProductBase)
+
+ typedef typename Lhs::Nested LhsNested;
+ typedef typename internal::remove_all<LhsNested>::type _LhsNested;
+ typedef internal::blas_traits<_LhsNested> LhsBlasTraits;
+ typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhsType;
+ typedef typename internal::remove_all<ActualLhsType>::type _ActualLhsType;
+ typedef typename internal::traits<Lhs>::Scalar LhsScalar;
+
+ typedef typename Rhs::Nested RhsNested;
+ typedef typename internal::remove_all<RhsNested>::type _RhsNested;
+ typedef internal::blas_traits<_RhsNested> RhsBlasTraits;
+ typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhsType;
+ typedef typename internal::remove_all<ActualRhsType>::type _ActualRhsType;
+ typedef typename internal::traits<Rhs>::Scalar RhsScalar;
+
+ // Diagonal of a product: no need to evaluate the arguments because they are going to be evaluated only once
+ typedef CoeffBasedProduct<LhsNested, RhsNested, 0> FullyLazyCoeffBaseProductType;
+
+ public:
+
+ typedef typename Base::PlainObject PlainObject;
+
+ ProductBase(const Lhs& a_lhs, const Rhs& a_rhs)
+ : m_lhs(a_lhs), m_rhs(a_rhs)
+ {
+ eigen_assert(a_lhs.cols() == a_rhs.rows()
+ && "invalid matrix product"
+ && "if you wanted a coeff-wise or a dot product use the respective explicit functions");
+ }
+
+ inline Index rows() const { return m_lhs.rows(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ template<typename Dest>
+ inline void evalTo(Dest& dst) const { dst.setZero(); scaleAndAddTo(dst,Scalar(1)); }
+
+ template<typename Dest>
+ inline void addTo(Dest& dst) const { scaleAndAddTo(dst,Scalar(1)); }
+
+ template<typename Dest>
+ inline void subTo(Dest& dst) const { scaleAndAddTo(dst,Scalar(-1)); }
+
+ template<typename Dest>
+ inline void scaleAndAddTo(Dest& dst, const Scalar& alpha) const { derived().scaleAndAddTo(dst,alpha); }
+
+ const _LhsNested& lhs() const { return m_lhs; }
+ const _RhsNested& rhs() const { return m_rhs; }
+
+ // Implicit conversion to the nested type (trigger the evaluation of the product)
+ operator const PlainObject& () const
+ {
+ m_result.resize(m_lhs.rows(), m_rhs.cols());
+ derived().evalTo(m_result);
+ return m_result;
+ }
+
+ const Diagonal<const FullyLazyCoeffBaseProductType,0> diagonal() const
+ { return FullyLazyCoeffBaseProductType(m_lhs, m_rhs); }
+
+ template<int Index>
+ const Diagonal<const FullyLazyCoeffBaseProductType,Index> diagonal() const
+ { return FullyLazyCoeffBaseProductType(m_lhs, m_rhs); }
+
+ const Diagonal<const FullyLazyCoeffBaseProductType, DynamicIndex> diagonal(Index index) const {
+ return Diagonal<const FullyLazyCoeffBaseProductType, DynamicIndex>(
+ FullyLazyCoeffBaseProductType(m_lhs, m_rhs), index);
+ }
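+ // Editorial example (sizes are hypothetical): the diagonal is taken through a fully lazy
+ // coefficient-based product, so only the diagonal coefficients are computed rather than the
+ // whole product:
+ //
+ //   Eigen::MatrixXd A = Eigen::MatrixXd::Random(100, 100);
+ //   Eigen::MatrixXd B = Eigen::MatrixXd::Random(100, 100);
+ //   Eigen::VectorXd d = (A * B).diagonal();   // 100 dot products instead of a full 100x100 product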
+
+ // restrict coeff accessors to 1x1 expressions. No need to care about mutators here since this isn't an lvalue expression
+ typename Base::CoeffReturnType coeff(Index row, Index col) const
+ {
+#ifdef EIGEN2_SUPPORT
+ return lhs().row(row).cwiseProduct(rhs().col(col).transpose()).sum();
+#else
+ EIGEN_STATIC_ASSERT_SIZE_1x1(Derived)
+ eigen_assert(this->rows() == 1 && this->cols() == 1);
+ Matrix<Scalar,1,1> result = *this;
+ return result.coeff(row,col);
+#endif
+ }
+
+ typename Base::CoeffReturnType coeff(Index i) const
+ {
+ EIGEN_STATIC_ASSERT_SIZE_1x1(Derived)
+ eigen_assert(this->rows() == 1 && this->cols() == 1);
+ Matrix<Scalar,1,1> result = *this;
+ return result.coeff(i);
+ }
+
+ const Scalar& coeffRef(Index row, Index col) const
+ {
+ EIGEN_STATIC_ASSERT_SIZE_1x1(Derived)
+ eigen_assert(this->rows() == 1 && this->cols() == 1);
+ return derived().coeffRef(row,col);
+ }
+
+ const Scalar& coeffRef(Index i) const
+ {
+ EIGEN_STATIC_ASSERT_SIZE_1x1(Derived)
+ eigen_assert(this->rows() == 1 && this->cols() == 1);
+ return derived().coeffRef(i);
+ }
+
+ protected:
+
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+
+ mutable PlainObject m_result;
+};
+
+// here we need to overload the nested rule for products
+// such that the nested type is a const reference to a plain matrix
+namespace internal {
+template<typename Lhs, typename Rhs, int Mode, int N, typename PlainObject>
+struct nested<GeneralProduct<Lhs,Rhs,Mode>, N, PlainObject>
+{
+ typedef PlainObject const& type;
+};
+}
+
+template<typename NestedProduct>
+class ScaledProduct;
+
+// Note that these two operator* functions are not defined as member
+// functions of ProductBase because otherwise we would have to
+// define all the overloads defined in MatrixBase. Furthermore, using
+// "using Base::operator*" would not work with MSVC.
+//
+// Also note that here we accept any compatible scalar types
+template<typename Derived,typename Lhs,typename Rhs>
+const ScaledProduct<Derived>
+operator*(const ProductBase<Derived,Lhs,Rhs>& prod, const typename Derived::Scalar& x)
+{ return ScaledProduct<Derived>(prod.derived(), x); }
+
+template<typename Derived,typename Lhs,typename Rhs>
+typename internal::enable_if<!internal::is_same<typename Derived::Scalar,typename Derived::RealScalar>::value,
+ const ScaledProduct<Derived> >::type
+operator*(const ProductBase<Derived,Lhs,Rhs>& prod, const typename Derived::RealScalar& x)
+{ return ScaledProduct<Derived>(prod.derived(), x); }
+
+
+template<typename Derived,typename Lhs,typename Rhs>
+const ScaledProduct<Derived>
+operator*(const typename Derived::Scalar& x,const ProductBase<Derived,Lhs,Rhs>& prod)
+{ return ScaledProduct<Derived>(prod.derived(), x); }
+
+template<typename Derived,typename Lhs,typename Rhs>
+typename internal::enable_if<!internal::is_same<typename Derived::Scalar,typename Derived::RealScalar>::value,
+ const ScaledProduct<Derived> >::type
+operator*(const typename Derived::RealScalar& x,const ProductBase<Derived,Lhs,Rhs>& prod)
+{ return ScaledProduct<Derived>(prod.derived(), x); }
+
+namespace internal {
+template<typename NestedProduct>
+struct traits<ScaledProduct<NestedProduct> >
+ : traits<ProductBase<ScaledProduct<NestedProduct>,
+ typename NestedProduct::_LhsNested,
+ typename NestedProduct::_RhsNested> >
+{
+ typedef typename traits<NestedProduct>::StorageKind StorageKind;
+};
+}
+
+template<typename NestedProduct>
+class ScaledProduct
+ : public ProductBase<ScaledProduct<NestedProduct>,
+ typename NestedProduct::_LhsNested,
+ typename NestedProduct::_RhsNested>
+{
+ public:
+ typedef ProductBase<ScaledProduct<NestedProduct>,
+ typename NestedProduct::_LhsNested,
+ typename NestedProduct::_RhsNested> Base;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::PlainObject PlainObject;
+// EIGEN_PRODUCT_PUBLIC_INTERFACE(ScaledProduct)
+
+ ScaledProduct(const NestedProduct& prod, const Scalar& x)
+ : Base(prod.lhs(),prod.rhs()), m_prod(prod), m_alpha(x) {}
+
+ template<typename Dest>
+ inline void evalTo(Dest& dst) const { dst.setZero(); scaleAndAddTo(dst, Scalar(1)); }
+
+ template<typename Dest>
+ inline void addTo(Dest& dst) const { scaleAndAddTo(dst, Scalar(1)); }
+
+ template<typename Dest>
+ inline void subTo(Dest& dst) const { scaleAndAddTo(dst, Scalar(-1)); }
+
+ template<typename Dest>
+ inline void scaleAndAddTo(Dest& dst, const Scalar& a_alpha) const { m_prod.derived().scaleAndAddTo(dst,a_alpha * m_alpha); }
+
+ const Scalar& alpha() const { return m_alpha; }
+
+ protected:
+ const NestedProduct& m_prod;
+ Scalar m_alpha;
+};
+
+/** \internal
+ * Overloaded to perform an efficient C = (A*B).lazy() */
+template<typename Derived>
+template<typename ProductDerived, typename Lhs, typename Rhs>
+Derived& MatrixBase<Derived>::lazyAssign(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+{
+ other.derived().evalTo(derived());
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_PRODUCTBASE_H
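
For context, a minimal usage sketch of what the evalTo()/addTo()/scaleAndAddTo() hooks and the free operator* overloads above enable at the API level; this is not part of the patch and only assumes the public Eigen 3 interface, with the dispatch described in the comments stated as an assumption:

    #include <Eigen/Dense>

    int main() {
      Eigen::MatrixXd A = Eigen::MatrixXd::Random(3, 4);
      Eigen::MatrixXd B = Eigen::MatrixXd::Random(4, 2);
      Eigen::MatrixXd C = Eigen::MatrixXd::Zero(3, 2);

      // evalTo() path: dst.setZero() followed by scaleAndAddTo(dst, 1).
      C = A * B;

      // addTo() path: the product is accumulated into C via scaleAndAddTo(dst, 1),
      // without first materializing A*B in a temporary.
      C += A * B;

      // The free operator* wraps the product in a ScaledProduct, so the factor 2
      // becomes the alpha of the same scaleAndAddTo(dst, alpha) call.
      C = 2.0 * (A * B);
      return 0;
    }
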
diff --git a/third_party/eigen3/Eigen/src/Core/ProductEvaluators.h b/third_party/eigen3/Eigen/src/Core/ProductEvaluators.h
new file mode 100644
index 0000000000..855914f2eb
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ProductEvaluators.h
@@ -0,0 +1,411 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2011 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+#ifndef EIGEN_PRODUCTEVALUATORS_H
+#define EIGEN_PRODUCTEVALUATORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+// We can evaluate the product either all at once, like GeneralProduct and its evalTo() function, or
+// traverse the matrix coefficient by coefficient, like CoeffBasedProduct. Use the existing logic
+// in ProductReturnType to decide.
+
+template<typename XprType, typename ProductType>
+struct product_evaluator_dispatcher;
+
+template<typename Lhs, typename Rhs>
+struct evaluator_impl<Product<Lhs, Rhs> >
+ : product_evaluator_dispatcher<Product<Lhs, Rhs>, typename ProductReturnType<Lhs, Rhs>::Type>
+{
+ typedef Product<Lhs, Rhs> XprType;
+ typedef product_evaluator_dispatcher<XprType, typename ProductReturnType<Lhs, Rhs>::Type> Base;
+
+ evaluator_impl(const XprType& xpr) : Base(xpr)
+ { }
+};
+
+template<typename XprType, typename ProductType>
+struct product_evaluator_traits_dispatcher;
+
+template<typename Lhs, typename Rhs>
+struct evaluator_traits<Product<Lhs, Rhs> >
+ : product_evaluator_traits_dispatcher<Product<Lhs, Rhs>, typename ProductReturnType<Lhs, Rhs>::Type>
+{
+ static const int AssumeAliasing = 1;
+};
+
+// Case 1: Evaluate all at once
+//
+// We can view the GeneralProduct class as a part of the product evaluator.
+// Four sub-cases: InnerProduct, OuterProduct, GemmProduct and GemvProduct.
+// InnerProduct is special because GeneralProduct does not have an evalTo() method in this case.
+
+template<typename Lhs, typename Rhs>
+struct product_evaluator_traits_dispatcher<Product<Lhs, Rhs>, GeneralProduct<Lhs, Rhs, InnerProduct> >
+{
+ static const int HasEvalTo = 0;
+};
+
+template<typename Lhs, typename Rhs>
+struct product_evaluator_dispatcher<Product<Lhs, Rhs>, GeneralProduct<Lhs, Rhs, InnerProduct> >
+ : public evaluator<typename Product<Lhs, Rhs>::PlainObject>::type
+{
+ typedef Product<Lhs, Rhs> XprType;
+ typedef typename XprType::PlainObject PlainObject;
+ typedef typename evaluator<PlainObject>::type evaluator_base;
+
+ // TODO: Computation is too early (?)
+ product_evaluator_dispatcher(const XprType& xpr) : evaluator_base(m_result)
+ {
+ m_result.coeffRef(0,0) = (xpr.lhs().transpose().cwiseProduct(xpr.rhs())).sum();
+ }
+
+protected:
+ PlainObject m_result;
+};
+
+// For the other three subcases, simply call the evalTo() method of GeneralProduct
+// TODO: GeneralProduct should take evaluators, not expression objects.
+
+template<typename Lhs, typename Rhs, int ProductType>
+struct product_evaluator_traits_dispatcher<Product<Lhs, Rhs>, GeneralProduct<Lhs, Rhs, ProductType> >
+{
+ static const int HasEvalTo = 1;
+};
+
+template<typename Lhs, typename Rhs, int ProductType>
+struct product_evaluator_dispatcher<Product<Lhs, Rhs>, GeneralProduct<Lhs, Rhs, ProductType> >
+{
+ typedef Product<Lhs, Rhs> XprType;
+ typedef typename XprType::PlainObject PlainObject;
+ typedef typename evaluator<PlainObject>::type evaluator_base;
+
+ product_evaluator_dispatcher(const XprType& xpr) : m_xpr(xpr)
+ { }
+
+ template<typename DstEvaluatorType, typename DstXprType>
+ void evalTo(DstEvaluatorType /* not used */, DstXprType& dst) const
+ {
+ dst.resize(m_xpr.rows(), m_xpr.cols());
+ GeneralProduct<Lhs, Rhs, ProductType>(m_xpr.lhs(), m_xpr.rhs()).evalTo(dst);
+ }
+
+protected:
+ const XprType& m_xpr;
+};
+
+// Case 2: Evaluate coeff by coeff
+//
+// This is mostly taken from CoeffBasedProduct.h
+// The main difference is that we add an extra argument to the etor_product_*_impl::run() function
+// for the inner dimension of the product, because evaluator objects do not know their size.
+
+template<int Traversal, int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl;
+
+template<int StorageOrder, int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl;
+
+template<typename Lhs, typename Rhs, typename LhsNested, typename RhsNested, int Flags>
+struct product_evaluator_traits_dispatcher<Product<Lhs, Rhs>, CoeffBasedProduct<LhsNested, RhsNested, Flags> >
+{
+ static const int HasEvalTo = 0;
+};
+
+template<typename Lhs, typename Rhs, typename LhsNested, typename RhsNested, int Flags>
+struct product_evaluator_dispatcher<Product<Lhs, Rhs>, CoeffBasedProduct<LhsNested, RhsNested, Flags> >
+ : evaluator_impl_base<Product<Lhs, Rhs> >
+{
+ typedef Product<Lhs, Rhs> XprType;
+ typedef CoeffBasedProduct<LhsNested, RhsNested, Flags> CoeffBasedProductType;
+
+ product_evaluator_dispatcher(const XprType& xpr)
+ : m_lhsImpl(xpr.lhs()),
+ m_rhsImpl(xpr.rhs()),
+ m_innerDim(xpr.lhs().cols())
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketScalar PacketScalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ // Everything below here is taken from CoeffBasedProduct.h
+
+ enum {
+ RowsAtCompileTime = traits<CoeffBasedProductType>::RowsAtCompileTime,
+ PacketSize = packet_traits<Scalar>::size,
+ InnerSize = traits<CoeffBasedProductType>::InnerSize,
+ CoeffReadCost = traits<CoeffBasedProductType>::CoeffReadCost,
+ Unroll = CoeffReadCost != Dynamic && CoeffReadCost <= EIGEN_UNROLLING_LIMIT,
+ CanVectorizeInner = traits<CoeffBasedProductType>::CanVectorizeInner
+ };
+
+ typedef typename evaluator<Lhs>::type LhsEtorType;
+ typedef typename evaluator<Rhs>::type RhsEtorType;
+ typedef etor_product_coeff_impl<CanVectorizeInner ? InnerVectorizedTraversal : DefaultTraversal,
+ Unroll ? InnerSize-1 : Dynamic,
+ LhsEtorType, RhsEtorType, Scalar> CoeffImpl;
+
+ const CoeffReturnType coeff(Index row, Index col) const
+ {
+ Scalar res;
+ CoeffImpl::run(row, col, m_lhsImpl, m_rhsImpl, m_innerDim, res);
+ return res;
+ }
+
+ /* Allow index-based non-packet access. It is impossible though to allow index-based packet access,
+ * which is why we don't set the LinearAccessBit.
+ */
+ const CoeffReturnType coeff(Index index) const
+ {
+ Scalar res;
+ const Index row = RowsAtCompileTime == 1 ? 0 : index;
+ const Index col = RowsAtCompileTime == 1 ? index : 0;
+ CoeffImpl::run(row, col, m_lhsImpl, m_rhsImpl, m_innerDim, res);
+ return res;
+ }
+
+ template<int LoadMode>
+ const PacketReturnType packet(Index row, Index col) const
+ {
+ PacketScalar res;
+ typedef etor_product_packet_impl<Flags&RowMajorBit ? RowMajor : ColMajor,
+ Unroll ? InnerSize-1 : Dynamic,
+ LhsEtorType, RhsEtorType, PacketScalar, LoadMode> PacketImpl;
+ PacketImpl::run(row, col, m_lhsImpl, m_rhsImpl, m_innerDim, res);
+ return res;
+ }
+
+protected:
+ typename evaluator<Lhs>::type m_lhsImpl;
+ typename evaluator<Rhs>::type m_rhsImpl;
+
+ // TODO: Get rid of m_innerDim if known at compile time
+ Index m_innerDim;
+};
+
+/***************************************************************************
+* Normal product .coeff() implementation (with meta-unrolling)
+***************************************************************************/
+
+/**************************************
+*** Scalar path - no vectorization ***
+**************************************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl<DefaultTraversal, UnrollingIndex, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, RetScalar &res)
+ {
+ etor_product_coeff_impl<DefaultTraversal, UnrollingIndex-1, Lhs, Rhs, RetScalar>::run(row, col, lhs, rhs, innerDim, res);
+ res += lhs.coeff(row, UnrollingIndex) * rhs.coeff(UnrollingIndex, col);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl<DefaultTraversal, 0, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, RetScalar &res)
+ {
+ res = lhs.coeff(row, 0) * rhs.coeff(0, col);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl<DefaultTraversal, Dynamic, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, RetScalar& res)
+ {
+ eigen_assert(innerDim>0 && "you are using a non initialized matrix");
+ res = lhs.coeff(row, 0) * rhs.coeff(0, col);
+ for(Index i = 1; i < innerDim; ++i)
+ res += lhs.coeff(row, i) * rhs.coeff(i, col);
+ }
+};
+
+/*******************************************
+*** Scalar path with inner vectorization ***
+*******************************************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet>
+struct etor_product_coeff_vectorized_unroller
+{
+ typedef typename Lhs::Index Index;
+ enum { PacketSize = packet_traits<typename Lhs::Scalar>::size };
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, typename Lhs::PacketScalar &pres)
+ {
+ etor_product_coeff_vectorized_unroller<UnrollingIndex-PacketSize, Lhs, Rhs, Packet>::run(row, col, lhs, rhs, innerDim, pres);
+ pres = padd(pres, pmul( lhs.template packet<Aligned>(row, UnrollingIndex) , rhs.template packet<Aligned>(UnrollingIndex, col) ));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet>
+struct etor_product_coeff_vectorized_unroller<0, Lhs, Rhs, Packet>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, typename Lhs::PacketScalar &pres)
+ {
+ pres = pmul(lhs.template packet<Aligned>(row, 0) , rhs.template packet<Aligned>(0, col));
+ }
+};
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl<InnerVectorizedTraversal, UnrollingIndex, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::PacketScalar Packet;
+ typedef typename Lhs::Index Index;
+ enum { PacketSize = packet_traits<typename Lhs::Scalar>::size };
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, RetScalar &res)
+ {
+ Packet pres;
+ etor_product_coeff_vectorized_unroller<UnrollingIndex+1-PacketSize, Lhs, Rhs, Packet>::run(row, col, lhs, rhs, innerDim, pres);
+ etor_product_coeff_impl<DefaultTraversal,UnrollingIndex,Lhs,Rhs,RetScalar>::run(row, col, lhs, rhs, innerDim, res);
+ res = predux(pres);
+ }
+};
+
+template<typename Lhs, typename Rhs, int LhsRows = Lhs::RowsAtCompileTime, int RhsCols = Rhs::ColsAtCompileTime>
+struct etor_product_coeff_vectorized_dyn_selector
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, typename Lhs::Scalar &res)
+ {
+ res = lhs.row(row).transpose().cwiseProduct(rhs.col(col)).sum();
+ }
+};
+
+// NOTE the following 3 specializations exist because taking .col(0) on a vector is a bit slower
+// NOTE maybe they are now useless since we have a specialization for Block<Matrix>
+template<typename Lhs, typename Rhs, int RhsCols>
+struct etor_product_coeff_vectorized_dyn_selector<Lhs,Rhs,1,RhsCols>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index /*row*/, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, typename Lhs::Scalar &res)
+ {
+ res = lhs.transpose().cwiseProduct(rhs.col(col)).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs, int LhsRows>
+struct etor_product_coeff_vectorized_dyn_selector<Lhs,Rhs,LhsRows,1>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index /*col*/, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, typename Lhs::Scalar &res)
+ {
+ res = lhs.row(row).transpose().cwiseProduct(rhs).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs>
+struct etor_product_coeff_vectorized_dyn_selector<Lhs,Rhs,1,1>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, typename Lhs::Scalar &res)
+ {
+ res = lhs.transpose().cwiseProduct(rhs).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct etor_product_coeff_impl<InnerVectorizedTraversal, Dynamic, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, typename Lhs::Scalar &res)
+ {
+ etor_product_coeff_vectorized_dyn_selector<Lhs,Rhs>::run(row, col, lhs, rhs, innerDim, res);
+ }
+};
+
+/*******************
+*** Packet path ***
+*******************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
+ {
+ etor_product_packet_impl<RowMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
+ res = pmadd(pset1<Packet>(lhs.coeff(row, UnrollingIndex)), rhs.template packet<LoadMode>(UnrollingIndex, col), res);
+ }
+};
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet &res)
+ {
+ etor_product_packet_impl<ColMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, innerDim, res);
+ res = pmadd(lhs.template packet<LoadMode>(row, UnrollingIndex), pset1<Packet>(rhs.coeff(UnrollingIndex, col)), res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
+ {
+ res = pmul(pset1<Packet>(lhs.coeff(row, 0)),rhs.template packet<LoadMode>(0, col));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index /*innerDim*/, Packet &res)
+ {
+ res = pmul(lhs.template packet<LoadMode>(row, 0), pset1<Packet>(rhs.coeff(0, col)));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
+ {
+ eigen_assert(innerDim>0 && "you are using a non initialized matrix");
+ res = pmul(pset1<Packet>(lhs.coeff(row, 0)),rhs.template packet<LoadMode>(0, col));
+ for(Index i = 1; i < innerDim; ++i)
+ res = pmadd(pset1<Packet>(lhs.coeff(row, i)), rhs.template packet<LoadMode>(i, col), res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct etor_product_packet_impl<ColMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Index innerDim, Packet& res)
+ {
+ eigen_assert(innerDim>0 && "you are using a non initialized matrix");
+ res = pmul(lhs.template packet<LoadMode>(row, 0), pset1<Packet>(rhs.coeff(0, col)));
+ for(Index i = 1; i < innerDim; ++i)
+ res = pmadd(lhs.template packet<LoadMode>(row, i), pset1<Packet>(rhs.coeff(i, col)), res);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PRODUCTEVALUATORS_H
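
The coefficient-by-coefficient path (Case 2 above) reduces each requested coefficient over the inner dimension. A small illustrative sketch of that computation, written against the public API with arbitrary row/col indices:

    #include <Eigen/Dense>
    #include <cassert>
    #include <cmath>

    int main() {
      Eigen::Matrix<double, 2, 3> L = Eigen::Matrix<double, 2, 3>::Random();
      Eigen::Matrix<double, 3, 2> R = Eigen::Matrix<double, 3, 2>::Random();

      // What etor_product_coeff_impl computes for one coefficient:
      // res = sum over k of L(row, k) * R(k, col), with innerDim == L.cols().
      const int row = 1, col = 0;
      double res = L(row, 0) * R(0, col);
      for (int k = 1; k < L.cols(); ++k)
        res += L(row, k) * R(k, col);

      assert(std::abs(res - (L * R)(row, col)) < 1e-12);
      return 0;
    }
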
diff --git a/third_party/eigen3/Eigen/src/Core/Random.h b/third_party/eigen3/Eigen/src/Core/Random.h
new file mode 100644
index 0000000000..2d3a7243bc
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Random.h
@@ -0,0 +1,193 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_RANDOM_H
+#define EIGEN_RANDOM_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Scalar> struct scalar_random_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_random_op)
+
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index, Index = 0) const {
+#ifndef __CUDA_ARCH__
+ // We're not compiling a cuda kernel
+ return random<Scalar>();
+#else
+ // We're trying to generate a random number from a cuda kernel.
+ assert(false && "Generating random numbers on gpu isn't supported yet");
+ return Scalar(0);
+#endif
+ }
+};
+
+template<typename Scalar>
+struct functor_traits<scalar_random_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false, IsRepeatable = false }; };
+
+} // end namespace internal
+
+/** \returns a random matrix expression
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * The parameters \a rows and \a cols are the number of rows and of columns of
+ * the returned matrix. Must be compatible with this MatrixBase type.
+ *
+ * \not_reentrant
+ *
+ * This variant is meant to be used for dynamic-size matrix types. For fixed-size types,
+ * it is redundant to pass \a rows and \a cols as arguments, so Random() should be used
+ * instead.
+ *
+ *
+ * Example: \include MatrixBase_random_int_int.cpp
+ * Output: \verbinclude MatrixBase_random_int_int.out
+ *
+ * This expression has the "evaluate before nesting" flag so that it will be evaluated into
+ * a temporary matrix whenever it is nested in a larger expression. This prevents unexpected
+ * behavior with expressions involving random matrices.
+ *
+ * See DenseBase::NullaryExpr(Index, const CustomNullaryOp&) for an example using C++11 random generators.
+ *
+ * \sa DenseBase::setRandom(), DenseBase::Random(Index), DenseBase::Random()
+ */
+template<typename Derived>
+inline const CwiseNullaryOp<internal::scalar_random_op<typename internal::traits<Derived>::Scalar>, Derived>
+DenseBase<Derived>::Random(Index rows, Index cols)
+{
+ return NullaryExpr(rows, cols, internal::scalar_random_op<Scalar>());
+}
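
A minimal usage sketch of the dynamic-size variant (illustrative only; scalar_random_op relies on the C PRNG, so the srand call is an optional seed for reproducibility):

    #include <Eigen/Dense>
    #include <cstdlib>
    #include <iostream>

    int main() {
      std::srand(42);                                      // optional: seed the C PRNG
      Eigen::MatrixXd m = Eigen::MatrixXd::Random(3, 4);   // doubles uniform in [-1, 1]
      Eigen::MatrixXi n = Eigen::MatrixXi::Random(2, 2);   // ints over their full range
      std::cout << m << "\n\n" << n << "\n";
      return 0;
    }
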
+
+/** \returns a random vector expression
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * The parameter \a size is the size of the returned vector.
+ * Must be compatible with this MatrixBase type.
+ *
+ * \only_for_vectors
+ * \not_reentrant
+ *
+ * This variant is meant to be used for dynamic-size vector types. For fixed-size types,
+ * it is redundant to pass \a size as argument, so Random() should be used
+ * instead.
+ *
+ * Example: \include MatrixBase_random_int.cpp
+ * Output: \verbinclude MatrixBase_random_int.out
+ *
+ * This expression has the "evaluate before nesting" flag so that it will be evaluated into
+ * a temporary vector whenever it is nested in a larger expression. This prevents unexpected
+ * behavior with expressions involving random matrices.
+ *
+ * \sa DenseBase::setRandom(), DenseBase::Random(Index,Index), DenseBase::Random()
+ */
+template<typename Derived>
+inline const CwiseNullaryOp<internal::scalar_random_op<typename internal::traits<Derived>::Scalar>, Derived>
+DenseBase<Derived>::Random(Index size)
+{
+ return NullaryExpr(size, internal::scalar_random_op<Scalar>());
+}
+
+/** \returns a fixed-size random matrix or vector expression
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * This variant is only for fixed-size MatrixBase types. For dynamic-size types, you
+ * need to use the variants taking size arguments.
+ *
+ * Example: \include MatrixBase_random.cpp
+ * Output: \verbinclude MatrixBase_random.out
+ *
+ * This expression has the "evaluate before nesting" flag so that it will be evaluated into
+ * a temporary matrix whenever it is nested in a larger expression. This prevents unexpected
+ * behavior with expressions involving random matrices.
+ *
+ * \not_reentrant
+ *
+ * \sa DenseBase::setRandom(), DenseBase::Random(Index,Index), DenseBase::Random(Index)
+ */
+template<typename Derived>
+inline const CwiseNullaryOp<internal::scalar_random_op<typename internal::traits<Derived>::Scalar>, Derived>
+DenseBase<Derived>::Random()
+{
+ return NullaryExpr(RowsAtCompileTime, ColsAtCompileTime, internal::scalar_random_op<Scalar>());
+}
+
+/** Sets all coefficients in this expression to random values.
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * \not_reentrant
+ *
+ * Example: \include MatrixBase_setRandom.cpp
+ * Output: \verbinclude MatrixBase_setRandom.out
+ *
+ * \sa class CwiseNullaryOp, setRandom(Index), setRandom(Index,Index)
+ */
+template<typename Derived>
+inline Derived& DenseBase<Derived>::setRandom()
+{
+ return *this = Random(rows(), cols());
+}
+
+/** Resizes to the given \a newSize, and sets all coefficients in this expression to random values.
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * \only_for_vectors
+ * \not_reentrant
+ *
+ * Example: \include Matrix_setRandom_int.cpp
+ * Output: \verbinclude Matrix_setRandom_int.out
+ *
+ * \sa DenseBase::setRandom(), setRandom(Index,Index), class CwiseNullaryOp, DenseBase::Random()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setRandom(Index newSize)
+{
+ resize(newSize);
+ return setRandom();
+}
+
+/** Resizes to the given size, and sets all coefficients in this expression to random values.
+ *
+ * Numbers are uniformly spread through their whole definition range for integer types,
+ * and in the [-1:1] range for floating point scalar types.
+ *
+ * \not_reentrant
+ *
+ * \param nbRows the new number of rows
+ * \param nbCols the new number of columns
+ *
+ * Example: \include Matrix_setRandom_int_int.cpp
+ * Output: \verbinclude Matrix_setRandom_int_int.out
+ *
+ * \sa DenseBase::setRandom(), setRandom(Index), class CwiseNullaryOp, DenseBase::Random()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+PlainObjectBase<Derived>::setRandom(Index nbRows, Index nbCols)
+{
+ resize(nbRows, nbCols);
+ return setRandom();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_RANDOM_H
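
The setRandom() overloads above differ only in whether they resize first; a short illustrative sketch:

    #include <Eigen/Dense>

    int main() {
      Eigen::MatrixXf m(2, 2);
      m.setRandom();        // DenseBase::setRandom(): fill the existing 2x2 storage

      Eigen::VectorXf v;
      v.setRandom(6);       // PlainObjectBase::setRandom(Index): resize to 6, then fill

      Eigen::MatrixXf n;
      n.setRandom(3, 5);    // PlainObjectBase::setRandom(Index,Index): resize, then fill
      return 0;
    }
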
diff --git a/third_party/eigen3/Eigen/src/Core/Redux.h b/third_party/eigen3/Eigen/src/Core/Redux.h
new file mode 100644
index 0000000000..5b82c9a654
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Redux.h
@@ -0,0 +1,417 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REDUX_H
+#define EIGEN_REDUX_H
+
+namespace Eigen {
+
+namespace internal {
+
+// TODO
+// * implement other kind of vectorization
+// * factorize code
+
+/***************************************************************************
+* Part 1 : the logic deciding a strategy for vectorization and unrolling
+***************************************************************************/
+
+template<typename Func, typename Derived>
+struct redux_traits
+{
+public:
+ enum {
+ PacketSize = packet_traits<typename Derived::Scalar>::size,
+ InnerMaxSize = int(Derived::IsRowMajor)
+ ? Derived::MaxColsAtCompileTime
+ : Derived::MaxRowsAtCompileTime
+ };
+
+ enum {
+ MightVectorize = (int(Derived::Flags)&ActualPacketAccessBit)
+ && (functor_traits<Func>::PacketAccess),
+ MayLinearVectorize = MightVectorize && (int(Derived::Flags)&LinearAccessBit),
+ MaySliceVectorize = MightVectorize && int(InnerMaxSize)>=3*PacketSize
+ };
+
+public:
+ enum {
+ Traversal = int(MayLinearVectorize) ? int(LinearVectorizedTraversal)
+ : int(MaySliceVectorize) ? int(SliceVectorizedTraversal)
+ : int(DefaultTraversal)
+ };
+
+public:
+ enum {
+ Cost = ( Derived::SizeAtCompileTime == Dynamic
+ || Derived::CoeffReadCost == Dynamic
+ || (Derived::SizeAtCompileTime!=1 && functor_traits<Func>::Cost == Dynamic)
+ ) ? Dynamic
+ : Derived::SizeAtCompileTime * Derived::CoeffReadCost
+ + (Derived::SizeAtCompileTime-1) * functor_traits<Func>::Cost,
+ UnrollingLimit = EIGEN_UNROLLING_LIMIT * (int(Traversal) == int(DefaultTraversal) ? 1 : int(PacketSize))
+ };
+
+public:
+ enum {
+ Unrolling = Cost != Dynamic && Cost <= UnrollingLimit
+ ? CompleteUnrolling
+ : NoUnrolling
+ };
+};
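
As a worked example of the cost formula above, consider summing a fixed-size Vector4f; the per-coefficient costs below are assumptions (typical defaults), not values taken from this patch:

    #include <cassert>

    int main() {
      // Mirrors the redux_traits cost formula with assumed per-coefficient costs.
      const int Size = 4;            // e.g. a Vector4f
      const int CoeffReadCost = 1;   // assumption: plain dense storage
      const int FuncCost = 1;        // assumption: scalar_sum_op<float>
      const int Cost = Size * CoeffReadCost + (Size - 1) * FuncCost;
      assert(Cost == 7);             // well below EIGEN_UNROLLING_LIMIT (100 by default),
      return 0;                      // so Unrolling resolves to CompleteUnrolling
    }
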
+
+/***************************************************************************
+* Part 2 : unrollers
+***************************************************************************/
+
+/*** no vectorization ***/
+
+template<typename Func, typename Derived, int Start, int Length>
+struct redux_novec_unroller
+{
+ enum {
+ HalfLength = Length/2
+ };
+
+ typedef typename Derived::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func& func)
+ {
+ return func(redux_novec_unroller<Func, Derived, Start, HalfLength>::run(mat,func),
+ redux_novec_unroller<Func, Derived, Start+HalfLength, Length-HalfLength>::run(mat,func));
+ }
+};
+
+template<typename Func, typename Derived, int Start>
+struct redux_novec_unroller<Func, Derived, Start, 1>
+{
+ enum {
+ outer = Start / Derived::InnerSizeAtCompileTime,
+ inner = Start % Derived::InnerSizeAtCompileTime
+ };
+
+ typedef typename Derived::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Scalar run(const Derived &mat, const Func&)
+ {
+ return mat.coeffByOuterInner(outer, inner);
+ }
+};
+
+// This is actually dead code and will never be called. It is only required
+// to prevent false warnings regarding failed inlining, since
+// for a length of 0, run() will never be called at all.
+template<typename Func, typename Derived, int Start>
+struct redux_novec_unroller<Func, Derived, Start, 0>
+{
+ typedef typename Derived::Scalar Scalar;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Scalar run(const Derived&, const Func&) { return Scalar(); }
+};
+
+/*** vectorization ***/
+
+template<typename Func, typename Derived, int Start, int Length>
+struct redux_vec_unroller
+{
+ enum {
+ PacketSize = packet_traits<typename Derived::Scalar>::size,
+ HalfLength = Length/2
+ };
+
+ typedef typename Derived::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type PacketScalar;
+
+ static EIGEN_STRONG_INLINE PacketScalar run(const Derived &mat, const Func& func)
+ {
+ return func.packetOp(
+ redux_vec_unroller<Func, Derived, Start, HalfLength>::run(mat,func),
+ redux_vec_unroller<Func, Derived, Start+HalfLength, Length-HalfLength>::run(mat,func) );
+ }
+};
+
+template<typename Func, typename Derived, int Start>
+struct redux_vec_unroller<Func, Derived, Start, 1>
+{
+ enum {
+ index = Start * packet_traits<typename Derived::Scalar>::size,
+ outer = index / int(Derived::InnerSizeAtCompileTime),
+ inner = index % int(Derived::InnerSizeAtCompileTime),
+ alignment = (Derived::Flags & AlignedBit) ? Aligned : Unaligned
+ };
+
+ typedef typename Derived::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type PacketScalar;
+
+ static EIGEN_STRONG_INLINE PacketScalar run(const Derived &mat, const Func&)
+ {
+ return mat.template packetByOuterInner<alignment>(outer, inner);
+ }
+};
+
+/***************************************************************************
+* Part 3 : implementation of all cases
+***************************************************************************/
+
+template<typename Func, typename Derived,
+ int Traversal = redux_traits<Func, Derived>::Traversal,
+ int Unrolling = redux_traits<Func, Derived>::Unrolling
+>
+struct redux_impl;
+
+template<typename Func, typename Derived>
+struct redux_impl<Func, Derived, DefaultTraversal, NoUnrolling>
+{
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE Scalar run(const Derived& mat, const Func& func)
+ {
+ eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
+ Scalar res;
+ res = mat.coeffByOuterInner(0, 0);
+ for(Index i = 1; i < mat.innerSize(); ++i)
+ res = func(res, mat.coeffByOuterInner(0, i));
+ for(Index i = 1; i < mat.outerSize(); ++i)
+ for(Index j = 0; j < mat.innerSize(); ++j)
+ res = func(res, mat.coeffByOuterInner(i, j));
+ return res;
+ }
+};
+
+template<typename Func, typename Derived>
+struct redux_impl<Func,Derived, DefaultTraversal, CompleteUnrolling>
+ : public redux_novec_unroller<Func,Derived, 0, Derived::SizeAtCompileTime>
+{};
+
+template<typename Func, typename Derived>
+struct redux_impl<Func, Derived, LinearVectorizedTraversal, NoUnrolling>
+{
+ typedef typename Derived::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type PacketScalar;
+ typedef typename Derived::Index Index;
+
+ static Scalar run(const Derived& mat, const Func& func)
+ {
+ const Index size = mat.size();
+ eigen_assert(size && "you are using an empty matrix");
+ const Index packetSize = packet_traits<Scalar>::size;
+ const Index alignedStart = internal::first_aligned(mat);
+ enum {
+ alignment = bool(Derived::Flags & DirectAccessBit) || bool(Derived::Flags & AlignedBit)
+ ? Aligned : Unaligned
+ };
+ const Index alignedSize2 = ((size-alignedStart)/(2*packetSize))*(2*packetSize);
+ const Index alignedSize = ((size-alignedStart)/(packetSize))*(packetSize);
+ const Index alignedEnd2 = alignedStart + alignedSize2;
+ const Index alignedEnd = alignedStart + alignedSize;
+ Scalar res;
+ if(alignedSize)
+ {
+ PacketScalar packet_res0 = mat.template packet<alignment>(alignedStart);
+ if(alignedSize>packetSize) // we have at least two packets to partly unroll the loop
+ {
+ PacketScalar packet_res1 = mat.template packet<alignment>(alignedStart+packetSize);
+ for(Index index = alignedStart + 2*packetSize; index < alignedEnd2; index += 2*packetSize)
+ {
+ packet_res0 = func.packetOp(packet_res0, mat.template packet<alignment>(index));
+ packet_res1 = func.packetOp(packet_res1, mat.template packet<alignment>(index+packetSize));
+ }
+
+ packet_res0 = func.packetOp(packet_res0,packet_res1);
+ if(alignedEnd>alignedEnd2)
+ packet_res0 = func.packetOp(packet_res0, mat.template packet<alignment>(alignedEnd2));
+ }
+ res = func.predux(packet_res0);
+
+ for(Index index = 0; index < alignedStart; ++index)
+ res = func(res,mat.coeff(index));
+
+ for(Index index = alignedEnd; index < size; ++index)
+ res = func(res,mat.coeff(index));
+ }
+ else // too small to vectorize anything.
+ // since this is dynamic-size hence inefficient anyway for such small sizes, don't try to optimize.
+ {
+ res = mat.coeff(0);
+ for(Index index = 1; index < size; ++index)
+ res = func(res,mat.coeff(index));
+ }
+
+ return res;
+ }
+};
+
+template<typename Func, typename Derived>
+struct redux_impl<Func, Derived, SliceVectorizedTraversal, NoUnrolling>
+{
+ typedef typename Derived::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type PacketScalar;
+ typedef typename Derived::Index Index;
+
+ static Scalar run(const Derived& mat, const Func& func)
+ {
+ eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
+ const Index innerSize = mat.innerSize();
+ const Index outerSize = mat.outerSize();
+ enum {
+ packetSize = packet_traits<Scalar>::size
+ };
+ const Index packetedInnerSize = ((innerSize)/packetSize)*packetSize;
+ Scalar res;
+ if(packetedInnerSize)
+ {
+ PacketScalar packet_res = mat.template packet<Unaligned>(0,0);
+ for(Index j=0; j<outerSize; ++j)
+ for(Index i=(j==0?packetSize:0); i<packetedInnerSize; i+=Index(packetSize))
+ packet_res = func.packetOp(packet_res, mat.template packetByOuterInner<Unaligned>(j,i));
+
+ res = func.predux(packet_res);
+ for(Index j=0; j<outerSize; ++j)
+ for(Index i=packetedInnerSize; i<innerSize; ++i)
+ res = func(res, mat.coeffByOuterInner(j,i));
+ }
+ else // too small to vectorize anything.
+ // since this is dynamic-size hence inefficient anyway for such small sizes, don't try to optimize.
+ {
+ res = redux_impl<Func, Derived, DefaultTraversal, NoUnrolling>::run(mat, func);
+ }
+
+ return res;
+ }
+};
+
+template<typename Func, typename Derived>
+struct redux_impl<Func, Derived, LinearVectorizedTraversal, CompleteUnrolling>
+{
+ typedef typename Derived::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type PacketScalar;
+ enum {
+ PacketSize = packet_traits<Scalar>::size,
+ Size = Derived::SizeAtCompileTime,
+ VectorizedSize = (Size / PacketSize) * PacketSize
+ };
+ static EIGEN_STRONG_INLINE Scalar run(const Derived& mat, const Func& func)
+ {
+ eigen_assert(mat.rows()>0 && mat.cols()>0 && "you are using an empty matrix");
+ if (VectorizedSize > 0) {
+ Scalar res = func.predux(redux_vec_unroller<Func, Derived, 0, Size / PacketSize>::run(mat,func));
+ if (VectorizedSize != Size)
+ res = func(res,redux_novec_unroller<Func, Derived, VectorizedSize, Size-VectorizedSize>::run(mat,func));
+ return res;
+ }
+ else {
+ return redux_novec_unroller<Func, Derived, 0, Size>::run(mat,func);
+ }
+ }
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* Part 4 : public API
+***************************************************************************/
+
+
+/** \returns the result of a full redux operation on the whole matrix or vector using \a func
+ *
+ * The template parameter \a BinaryOp is the type of the functor \a func which must be
+ * an associative operator. Both current STL and TR1 functor styles are handled.
+ *
+ * \sa DenseBase::sum(), DenseBase::minCoeff(), DenseBase::maxCoeff(), MatrixBase::colwise(), MatrixBase::rowwise()
+ */
+template<typename Derived>
+template<typename Func>
+EIGEN_STRONG_INLINE typename internal::result_of<Func(typename internal::traits<Derived>::Scalar)>::type
+DenseBase<Derived>::redux(const Func& func) const
+{
+ typedef typename internal::remove_all<typename Derived::Nested>::type ThisNested;
+ return internal::redux_impl<Func, ThisNested>
+ ::run(derived(), func);
+}
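
A hedged usage sketch of redux() with a user-defined binary functor; the AbsMaxOp name is made up for illustration, and the result_type typedef is what lets internal::result_of deduce the return type while the default functor_traits keep the functor on the scalar (non-packet) path:

    #include <Eigen/Dense>
    #include <algorithm>
    #include <cmath>
    #include <iostream>

    // Hypothetical associative functor: largest absolute value of two scalars.
    struct AbsMaxOp {
      typedef double result_type;    // consumed by internal::result_of
      double operator()(double a, double b) const {
        return std::max(std::abs(a), std::abs(b));
      }
    };

    int main() {
      Eigen::MatrixXd m = Eigen::MatrixXd::Random(4, 4);
      double amax = m.redux(AbsMaxOp());   // equivalent to m.cwiseAbs().maxCoeff()
      std::cout << amax << "\n";
      return 0;
    }
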
+
+/** \returns the minimum of all coefficients of \c *this.
+ * \warning the result is undefined if \c *this contains NaN.
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::minCoeff() const
+{
+ return this->redux(Eigen::internal::scalar_min_op<Scalar>());
+}
+
+/** \returns the maximum of all coefficients of \c *this.
+ * \warning the result is undefined if \c *this contains NaN.
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::maxCoeff() const
+{
+ return this->redux(Eigen::internal::scalar_max_op<Scalar>());
+}
+
+/** \returns the sum of all coefficients of *this
+ *
+ * \sa trace(), prod(), mean()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::sum() const
+{
+ if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0))
+ return Scalar(0);
+ return this->redux(Eigen::internal::scalar_sum_op<Scalar>());
+}
+
+/** \returns the mean of all coefficients of *this
+*
+* \sa trace(), prod(), sum()
+*/
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::mean() const
+{
+ return Scalar(this->redux(Eigen::internal::scalar_sum_op<Scalar>())) / Scalar(this->size());
+}
+
+/** \returns the product of all coefficients of *this
+ *
+ * Example: \include MatrixBase_prod.cpp
+ * Output: \verbinclude MatrixBase_prod.out
+ *
+ * \sa sum(), mean(), trace()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::prod() const
+{
+ if(SizeAtCompileTime==0 || (SizeAtCompileTime==Dynamic && size()==0))
+ return Scalar(1);
+ return this->redux(Eigen::internal::scalar_product_op<Scalar>());
+}
+
+/** \returns the trace of \c *this, i.e. the sum of the coefficients on the main diagonal.
+ *
+ * \c *this can be any matrix, not necessarily square.
+ *
+ * \sa diagonal(), sum()
+ */
+template<typename Derived>
+EIGEN_STRONG_INLINE typename internal::traits<Derived>::Scalar
+MatrixBase<Derived>::trace() const
+{
+ return derived().diagonal().sum();
+}
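
A compact usage sketch of the reductions defined above (the values in the comments follow directly from the 2x2 example):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::Matrix2d m;
      m << 1, 2,
           3, 4;
      std::cout << m.sum()      << "\n"    // 10
                << m.mean()     << "\n"    // 2.5
                << m.prod()     << "\n"    // 24
                << m.minCoeff() << "\n"    // 1
                << m.maxCoeff() << "\n"    // 4
                << m.trace()    << "\n";   // 5, i.e. m.diagonal().sum()
      return 0;
    }
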
+
+} // end namespace Eigen
+
+#endif // EIGEN_REDUX_H
diff --git a/third_party/eigen3/Eigen/src/Core/Ref.h b/third_party/eigen3/Eigen/src/Core/Ref.h
new file mode 100644
index 0000000000..cd6d949c4c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Ref.h
@@ -0,0 +1,260 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REF_H
+#define EIGEN_REF_H
+
+namespace Eigen {
+
+template<typename Derived> class RefBase;
+template<typename PlainObjectType, int Options = 0,
+ typename StrideType = typename internal::conditional<PlainObjectType::IsVectorAtCompileTime,InnerStride<1>,OuterStride<> >::type > class Ref;
+
+/** \class Ref
+ * \ingroup Core_Module
+ *
+ * \brief A matrix or vector expression mapping an existing expression
+ *
+ * \tparam PlainObjectType the equivalent matrix type of the mapped data
+ * \tparam Options specifies whether the pointer is \c #Aligned, or \c #Unaligned.
+ * The default is \c #Unaligned.
+ * \tparam StrideType optionally specifies strides. By default, Ref implies contiguous storage along the inner dimension (inner stride==1),
+ * but accepts a variable outer stride (leading dimension).
+ * This can be overridden by specifying strides.
+ * The type passed here must be a specialization of the Stride template, see examples below.
+ *
+ * This class makes it possible to write non-template functions taking Eigen objects as parameters while limiting the number of copies.
+ * A Ref<> object can represent either a const expression or an l-value:
+ * \code
+ * // in-out argument:
+ * void foo1(Ref<VectorXf> x);
+ *
+ * // read-only const argument:
+ * void foo2(const Ref<const VectorXf>& x);
+ * \endcode
+ *
+ * In the in-out case, the input argument must satisfy the constraints of the actual Ref<> type, otherwise a compilation error will be triggered.
+ * By default, a Ref<VectorXf> can reference any dense vector expression of float having a contiguous memory layout.
+ * Likewise, a Ref<MatrixXf> can reference any column-major dense matrix expression of float whose columns' elements are contiguously stored, with
+ * the possibility of a constant gap between columns, i.e.: the inner stride must be equal to 1, but the outer stride (or leading dimension)
+ * can be greater than the number of rows.
+ *
+ * In the const case, if the input expression does not match the above requirement, then it is evaluated into a temporary before being passed to the function.
+ * Here are some examples:
+ * \code
+ * MatrixXf A;
+ * VectorXf a;
+ * foo1(a.head()); // OK
+ * foo1(A.col()); // OK
+ * foo1(A.row()); // compilation error because here innerstride!=1
+ * foo2(A.row()); // The row is copied into a contiguous temporary
+ * foo2(2*a); // The expression is evaluated into a temporary
+ * foo2(A.col().segment(2,4)); // No temporary
+ * \endcode
+ *
+ * The range of inputs that can be referenced without temporary can be enlarged using the last two template parameters.
+ * Here is an example accepting an innerstride!=1:
+ * \code
+ * // in-out argument:
+ * void foo3(Ref<VectorXf,0,InnerStride<> > x);
+ * foo3(A.row()); // OK
+ * \endcode
+ * The downside here is that the function foo3 might be significantly slower than foo1 because it won't be able to exploit vectorization, and will involve more
+ * expensive address computations even if the input is contiguously stored in memory. To overcome this issue, one might provide overloads internally calling a
+ * template function, e.g.:
+ * \code
+ * // in the .h:
+ * void foo(const Ref<MatrixXf>& A);
+ * void foo(const Ref<MatrixXf,0,Stride<> >& A);
+ *
+ * // in the .cpp:
+ * template<typename TypeOfA> void foo_impl(const TypeOfA& A) {
+ * ... // crazy code goes here
+ * }
+ * void foo(const Ref<MatrixXf>& A) { foo_impl(A); }
+ * void foo(const Ref<MatrixXf,0,Stride<> >& A) { foo_impl(A); }
+ * \endcode
+ *
+ *
+ * \sa PlainObjectBase::Map(), \ref TopicStorageOrders
+ */
+
+namespace internal {
+
+template<typename _PlainObjectType, int _Options, typename _StrideType>
+struct traits<Ref<_PlainObjectType, _Options, _StrideType> >
+ : public traits<Map<_PlainObjectType, _Options, _StrideType> >
+{
+ typedef _PlainObjectType PlainObjectType;
+ typedef _StrideType StrideType;
+ enum {
+ Options = _Options,
+ Flags = traits<Map<_PlainObjectType, _Options, _StrideType> >::Flags | NestByRefBit
+ };
+
+ template<typename Derived> struct match {
+ enum {
+ HasDirectAccess = internal::has_direct_access<Derived>::ret,
+ StorageOrderMatch = PlainObjectType::IsVectorAtCompileTime || Derived::IsVectorAtCompileTime || ((PlainObjectType::Flags&RowMajorBit)==(Derived::Flags&RowMajorBit)),
+ InnerStrideMatch = int(StrideType::InnerStrideAtCompileTime)==int(Dynamic)
+ || int(StrideType::InnerStrideAtCompileTime)==int(Derived::InnerStrideAtCompileTime)
+ || (int(StrideType::InnerStrideAtCompileTime)==0 && int(Derived::InnerStrideAtCompileTime)==1),
+ OuterStrideMatch = Derived::IsVectorAtCompileTime
+ || int(StrideType::OuterStrideAtCompileTime)==int(Dynamic) || int(StrideType::OuterStrideAtCompileTime)==int(Derived::OuterStrideAtCompileTime),
+ AlignmentMatch = (_Options!=Aligned) || ((PlainObjectType::Flags&AlignedBit)==0) || ((traits<Derived>::Flags&AlignedBit)==AlignedBit),
+ MatchAtCompileTime = HasDirectAccess && StorageOrderMatch && InnerStrideMatch && OuterStrideMatch && AlignmentMatch
+ };
+ typedef typename internal::conditional<MatchAtCompileTime,internal::true_type,internal::false_type>::type type;
+ };
+
+};
+
+template<typename Derived>
+struct traits<RefBase<Derived> > : public traits<Derived> {};
+
+}
+
+template<typename Derived> class RefBase
+ : public MapBase<Derived>
+{
+ typedef typename internal::traits<Derived>::PlainObjectType PlainObjectType;
+ typedef typename internal::traits<Derived>::StrideType StrideType;
+
+public:
+
+ typedef MapBase<Derived> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(RefBase)
+
+ inline Index innerStride() const
+ {
+ return StrideType::InnerStrideAtCompileTime != 0 ? m_stride.inner() : 1;
+ }
+
+ inline Index outerStride() const
+ {
+ return StrideType::OuterStrideAtCompileTime != 0 ? m_stride.outer()
+ : IsVectorAtCompileTime ? this->size()
+ : int(Flags)&RowMajorBit ? this->cols()
+ : this->rows();
+ }
+
+ RefBase()
+ : Base(0,RowsAtCompileTime==Dynamic?0:RowsAtCompileTime,ColsAtCompileTime==Dynamic?0:ColsAtCompileTime),
+ // Stride<> does not allow default ctor for Dynamic strides, so let's initialize it with dummy values:
+ m_stride(StrideType::OuterStrideAtCompileTime==Dynamic?0:StrideType::OuterStrideAtCompileTime,
+ StrideType::InnerStrideAtCompileTime==Dynamic?0:StrideType::InnerStrideAtCompileTime)
+ {}
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(RefBase)
+
+protected:
+
+ typedef Stride<StrideType::OuterStrideAtCompileTime,StrideType::InnerStrideAtCompileTime> StrideBase;
+
+ template<typename Expression>
+ void construct(Expression& expr)
+ {
+ if(PlainObjectType::RowsAtCompileTime==1)
+ {
+ eigen_assert(expr.rows()==1 || expr.cols()==1);
+ ::new (static_cast<Base*>(this)) Base(expr.data(), 1, expr.size());
+ }
+ else if(PlainObjectType::ColsAtCompileTime==1)
+ {
+ eigen_assert(expr.rows()==1 || expr.cols()==1);
+ ::new (static_cast<Base*>(this)) Base(expr.data(), expr.size(), 1);
+ }
+ else
+ ::new (static_cast<Base*>(this)) Base(expr.data(), expr.rows(), expr.cols());
+
+ if(Expression::IsVectorAtCompileTime && (!PlainObjectType::IsVectorAtCompileTime) && ((Expression::Flags&RowMajorBit)!=(PlainObjectType::Flags&RowMajorBit)))
+ ::new (&m_stride) StrideBase(expr.innerStride(), StrideType::InnerStrideAtCompileTime==0?0:1);
+ else
+ ::new (&m_stride) StrideBase(StrideType::OuterStrideAtCompileTime==0?0:expr.outerStride(),
+ StrideType::InnerStrideAtCompileTime==0?0:expr.innerStride());
+ }
+
+ StrideBase m_stride;
+};
+
+
+template<typename PlainObjectType, int Options, typename StrideType> class Ref
+ : public RefBase<Ref<PlainObjectType, Options, StrideType> >
+{
+ typedef internal::traits<Ref> Traits;
+ public:
+
+ typedef RefBase<Ref> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Ref)
+
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename Derived>
+ inline Ref(PlainObjectBase<Derived>& expr,
+ typename internal::enable_if<bool(Traits::template match<Derived>::MatchAtCompileTime),Derived>::type* = 0)
+ {
+ Base::construct(expr);
+ }
+ template<typename Derived>
+ inline Ref(const DenseBase<Derived>& expr,
+ typename internal::enable_if<bool(internal::is_lvalue<Derived>::value&&bool(Traits::template match<Derived>::MatchAtCompileTime)),Derived>::type* = 0,
+ int = Derived::ThisConstantIsPrivateInPlainObjectBase)
+ #else
+ template<typename Derived>
+ inline Ref(DenseBase<Derived>& expr)
+ #endif
+ {
+ Base::construct(expr.const_cast_derived());
+ }
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Ref)
+
+};
+
+// this is the const ref version
+template<typename TPlainObjectType, int Options, typename StrideType> class Ref<const TPlainObjectType, Options, StrideType>
+ : public RefBase<Ref<const TPlainObjectType, Options, StrideType> >
+{
+ typedef internal::traits<Ref> Traits;
+ public:
+
+ typedef RefBase<Ref> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Ref)
+
+ template<typename Derived>
+ inline Ref(const DenseBase<Derived>& expr)
+ {
+// std::cout << match_helper<Derived>::HasDirectAccess << "," << match_helper<Derived>::OuterStrideMatch << "," << match_helper<Derived>::InnerStrideMatch << "\n";
+// std::cout << int(StrideType::OuterStrideAtCompileTime) << " - " << int(Derived::OuterStrideAtCompileTime) << "\n";
+// std::cout << int(StrideType::InnerStrideAtCompileTime) << " - " << int(Derived::InnerStrideAtCompileTime) << "\n";
+ construct(expr.derived(), typename Traits::template match<Derived>::type());
+ }
+
+ protected:
+
+ template<typename Expression>
+ void construct(const Expression& expr,internal::true_type)
+ {
+ Base::construct(expr);
+ }
+
+ template<typename Expression>
+ void construct(const Expression& expr, internal::false_type)
+ {
+ m_object.lazyAssign(expr);
+ Base::construct(m_object);
+ }
+
+ protected:
+ TPlainObjectType m_object;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_REF_H
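
A compilable version of the foo1/foo2 pattern described in the Ref class documentation; the function names and sizes here are illustrative, not part of the patch:

    #include <Eigen/Dense>
    #include <iostream>

    // In-out: binds without copying to any contiguous float vector expression.
    void scaleInPlace(Eigen::Ref<Eigen::VectorXf> x) { x *= 2.0f; }

    // Read-only: expressions that do not match are evaluated into a temporary.
    float endpointsSum(const Eigen::Ref<const Eigen::VectorXf>& x) {
      return x(0) + x(x.size() - 1);
    }

    int main() {
      Eigen::MatrixXf A = Eigen::MatrixXf::Random(4, 4);
      Eigen::VectorXf a = Eigen::VectorXf::Random(6);

      scaleInPlace(a.head(3));            // a segment is contiguous: no copy
      scaleInPlace(A.col(1));             // a column of a column-major matrix: no copy

      float s = endpointsSum(2.0f * a);   // a general expression: evaluated into a temporary
      std::cout << s << "\n";
      return 0;
    }
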
diff --git a/third_party/eigen3/Eigen/src/Core/Replicate.h b/third_party/eigen3/Eigen/src/Core/Replicate.h
new file mode 100644
index 0000000000..dde86a8349
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Replicate.h
@@ -0,0 +1,177 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REPLICATE_H
+#define EIGEN_REPLICATE_H
+
+namespace Eigen {
+
+/**
+ * \class Replicate
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the multiple replication of a matrix or vector
+ *
+ * \param MatrixType the type of the object we are replicating
+ *
+ * This class represents an expression of the multiple replication of a matrix or vector.
+ * It is the return type of DenseBase::replicate() and most of the time
+ * this is the only way it is used.
+ *
+ * \sa DenseBase::replicate()
+ */
+
+namespace internal {
+template<typename MatrixType,int RowFactor,int ColFactor>
+struct traits<Replicate<MatrixType,RowFactor,ColFactor> >
+ : traits<MatrixType>
+{
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename traits<MatrixType>::StorageKind StorageKind;
+ typedef typename traits<MatrixType>::XprKind XprKind;
+ enum {
+ Factor = (RowFactor==Dynamic || ColFactor==Dynamic) ? Dynamic : RowFactor*ColFactor
+ };
+ typedef typename nested<MatrixType,Factor>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+ enum {
+ RowsAtCompileTime = RowFactor==Dynamic || int(MatrixType::RowsAtCompileTime)==Dynamic
+ ? Dynamic
+ : RowFactor * MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = ColFactor==Dynamic || int(MatrixType::ColsAtCompileTime)==Dynamic
+ ? Dynamic
+ : ColFactor * MatrixType::ColsAtCompileTime,
+ //FIXME we don't propagate the max sizes !!!
+ MaxRowsAtCompileTime = RowsAtCompileTime,
+ MaxColsAtCompileTime = ColsAtCompileTime,
+ IsRowMajor = MaxRowsAtCompileTime==1 && MaxColsAtCompileTime!=1 ? 1
+ : MaxColsAtCompileTime==1 && MaxRowsAtCompileTime!=1 ? 0
+ : (MatrixType::Flags & RowMajorBit) ? 1 : 0,
+ Flags = (_MatrixTypeNested::Flags & HereditaryBits & ~RowMajorBit) | (IsRowMajor ? RowMajorBit : 0),
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost
+ };
+};
+}
+
+template<typename MatrixType,int RowFactor,int ColFactor> class Replicate
+ : public internal::dense_xpr_base< Replicate<MatrixType,RowFactor,ColFactor> >::type
+{
+ typedef typename internal::traits<Replicate>::MatrixTypeNested MatrixTypeNested;
+ typedef typename internal::traits<Replicate>::_MatrixTypeNested _MatrixTypeNested;
+ public:
+
+ typedef typename internal::dense_xpr_base<Replicate>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Replicate)
+
+ template<typename OriginalMatrixType>
+ inline explicit Replicate(const OriginalMatrixType& a_matrix)
+ : m_matrix(a_matrix), m_rowFactor(RowFactor), m_colFactor(ColFactor)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<typename internal::remove_const<MatrixType>::type,OriginalMatrixType>::value),
+ THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE)
+ eigen_assert(RowFactor!=Dynamic && ColFactor!=Dynamic);
+ }
+
+ template<typename OriginalMatrixType>
+ inline Replicate(const OriginalMatrixType& a_matrix, Index rowFactor, Index colFactor)
+ : m_matrix(a_matrix), m_rowFactor(rowFactor), m_colFactor(colFactor)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<typename internal::remove_const<MatrixType>::type,OriginalMatrixType>::value),
+ THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE)
+ }
+
+ inline Index rows() const { return m_matrix.rows() * m_rowFactor.value(); }
+ inline Index cols() const { return m_matrix.cols() * m_colFactor.value(); }
+
+ inline Scalar coeff(Index rowId, Index colId) const
+ {
+ // try to avoid using modulo; this is a pure optimization strategy
+ const Index actual_row = internal::traits<MatrixType>::RowsAtCompileTime==1 ? 0
+ : RowFactor==1 ? rowId
+ : rowId%m_matrix.rows();
+ const Index actual_col = internal::traits<MatrixType>::ColsAtCompileTime==1 ? 0
+ : ColFactor==1 ? colId
+ : colId%m_matrix.cols();
+
+ return m_matrix.coeff(actual_row, actual_col);
+ }
+ template<int LoadMode>
+ inline PacketScalar packet(Index rowId, Index colId) const
+ {
+ const Index actual_row = internal::traits<MatrixType>::RowsAtCompileTime==1 ? 0
+ : RowFactor==1 ? rowId
+ : rowId%m_matrix.rows();
+ const Index actual_col = internal::traits<MatrixType>::ColsAtCompileTime==1 ? 0
+ : ColFactor==1 ? colId
+ : colId%m_matrix.cols();
+
+ return m_matrix.template packet<LoadMode>(actual_row, actual_col);
+ }
+
+ const _MatrixTypeNested& nestedExpression() const
+ {
+ return m_matrix;
+ }
+
+ protected:
+ MatrixTypeNested m_matrix;
+ const internal::variable_if_dynamic<Index, RowFactor> m_rowFactor;
+ const internal::variable_if_dynamic<Index, ColFactor> m_colFactor;
+};
+
+/**
+ * \return an expression of the replication of \c *this
+ *
+ * Example: \include MatrixBase_replicate.cpp
+ * Output: \verbinclude MatrixBase_replicate.out
+ *
+ * \sa VectorwiseOp::replicate(), DenseBase::replicate(Index,Index), class Replicate
+ */
+template<typename Derived>
+template<int RowFactor, int ColFactor>
+inline const Replicate<Derived,RowFactor,ColFactor>
+DenseBase<Derived>::replicate() const
+{
+ return Replicate<Derived,RowFactor,ColFactor>(derived());
+}
+
+/**
+ * \return an expression of the replication of \c *this
+ *
+ * Example: \include MatrixBase_replicate_int_int.cpp
+ * Output: \verbinclude MatrixBase_replicate_int_int.out
+ *
+ * \sa VectorwiseOp::replicate(), DenseBase::replicate<int,int>(), class Replicate
+ */
+template<typename Derived>
+inline const Replicate<Derived,Dynamic,Dynamic>
+DenseBase<Derived>::replicate(Index rowFactor,Index colFactor) const
+{
+ return Replicate<Derived,Dynamic,Dynamic>(derived(),rowFactor,colFactor);
+}
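+
+// Usage sketch (illustrative only, not part of the original header): replicate()
+// builds a lazy tiling expression; nothing is copied until it is assigned.
+//
+//   Eigen::RowVector3d v(1.0, 2.0, 3.0);
+//   Eigen::MatrixXd a = v.replicate<4, 1>();   // 4x3: four stacked copies, factors fixed at compile time
+//   Eigen::MatrixXd b = v.replicate(4, 2);     // 4x6: run-time factors (Replicate<...,Dynamic,Dynamic>)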
+
+/**
+ * \return an expression of the replication of each column (or row) of \c *this
+ *
+ * Example: \include DirectionWise_replicate_int.cpp
+ * Output: \verbinclude DirectionWise_replicate_int.out
+ *
+ * \sa VectorwiseOp::replicate(), DenseBase::replicate(), class Replicate
+ */
+template<typename ExpressionType, int Direction>
+const typename VectorwiseOp<ExpressionType,Direction>::ReplicateReturnType
+VectorwiseOp<ExpressionType,Direction>::replicate(Index factor) const
+{
+ return typename VectorwiseOp<ExpressionType,Direction>::ReplicateReturnType
+ (_expression(),Direction==Vertical?factor:1,Direction==Horizontal?factor:1);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_REPLICATE_H
diff --git a/third_party/eigen3/Eigen/src/Core/ReturnByValue.h b/third_party/eigen3/Eigen/src/Core/ReturnByValue.h
new file mode 100644
index 0000000000..7834f6cbcd
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/ReturnByValue.h
@@ -0,0 +1,89 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_RETURNBYVALUE_H
+#define EIGEN_RETURNBYVALUE_H
+
+namespace Eigen {
+
+/** \class ReturnByValue
+ * \ingroup Core_Module
+ *
+ */
+
+namespace internal {
+
+template<typename Derived>
+struct traits<ReturnByValue<Derived> >
+ : public traits<typename traits<Derived>::ReturnType>
+{
+ enum {
+ // We're disabling the DirectAccess because e.g. the constructor of
+    // the Block-with-DirectAccess expression requires a coeffRef() method.
+ // Also, we don't want to have to implement the stride stuff.
+ Flags = (traits<typename traits<Derived>::ReturnType>::Flags
+ | EvalBeforeNestingBit) & ~DirectAccessBit
+ };
+};
+
+/* The ReturnByValue object doesn't even have a coeff() method.
+ * So the only way that nesting it in an expression can work is by evaluating it into a plain matrix.
+ * So internal::nested always gives the plain return matrix type.
+ *
+ * FIXME: I don't understand why we need this specialization: isn't this taken care of by the EvalBeforeNestingBit ??
+ */
+template<typename Derived,int n,typename PlainObject>
+struct nested<ReturnByValue<Derived>, n, PlainObject>
+{
+ typedef typename traits<Derived>::ReturnType type;
+};
+
+} // end namespace internal
+
+template<typename Derived> class ReturnByValue
+ : internal::no_assignment_operator, public internal::dense_xpr_base< ReturnByValue<Derived> >::type
+{
+ public:
+ typedef typename internal::traits<Derived>::ReturnType ReturnType;
+
+ typedef typename internal::dense_xpr_base<ReturnByValue>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(ReturnByValue)
+
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void evalTo(Dest& dst) const
+ { static_cast<const Derived*>(this)->evalTo(dst); }
+ EIGEN_DEVICE_FUNC inline Index rows() const { return static_cast<const Derived*>(this)->rows(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return static_cast<const Derived*>(this)->cols(); }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+#define Unusable YOU_ARE_TRYING_TO_ACCESS_A_SINGLE_COEFFICIENT_IN_A_SPECIAL_EXPRESSION_WHERE_THAT_IS_NOT_ALLOWED_BECAUSE_THAT_WOULD_BE_INEFFICIENT
+ class Unusable{
+ Unusable(const Unusable&) {}
+ Unusable& operator=(const Unusable&) {return *this;}
+ };
+ const Unusable& coeff(Index) const { return *reinterpret_cast<const Unusable*>(this); }
+ const Unusable& coeff(Index,Index) const { return *reinterpret_cast<const Unusable*>(this); }
+ Unusable& coeffRef(Index) { return *reinterpret_cast<Unusable*>(this); }
+ Unusable& coeffRef(Index,Index) { return *reinterpret_cast<Unusable*>(this); }
+#endif
+};
+
+template<typename Derived>
+template<typename OtherDerived>
+Derived& DenseBase<Derived>::operator=(const ReturnByValue<OtherDerived>& other)
+{
+ other.evalTo(derived());
+ return derived();
+}
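+
+// Sketch of how a user-defined expression can hook into this mechanism
+// (hypothetical type "OnesLike", assuming the usual Eigen 3 conventions): the
+// derived class only supplies rows(), cols() and evalTo(), and assignment is
+// routed through the operator= defined above.
+//
+//   struct OnesLike;
+//   namespace Eigen { namespace internal {
+//     template<> struct traits<OnesLike> { typedef Eigen::MatrixXd ReturnType; };
+//   } }
+//   struct OnesLike : public Eigen::ReturnByValue<OnesLike> {
+//     OnesLike(Eigen::DenseIndex r, Eigen::DenseIndex c) : m_rows(r), m_cols(c) {}
+//     template<typename Dest> void evalTo(Dest& dst) const { dst.setOnes(m_rows, m_cols); }
+//     Eigen::DenseIndex rows() const { return m_rows; }
+//     Eigen::DenseIndex cols() const { return m_cols; }
+//     Eigen::DenseIndex m_rows, m_cols;
+//   };
+//   // Eigen::MatrixXd m = OnesLike(3, 4);   // evaluates via OnesLike::evalTo(m)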
+
+} // end namespace Eigen
+
+#endif // EIGEN_RETURNBYVALUE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Reverse.h b/third_party/eigen3/Eigen/src/Core/Reverse.h
new file mode 100644
index 0000000000..e30ae3d281
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Reverse.h
@@ -0,0 +1,224 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Ricard Marxer <email@ricardmarxer.com>
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REVERSE_H
+#define EIGEN_REVERSE_H
+
+namespace Eigen {
+
+/** \class Reverse
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the reverse of a vector or matrix
+ *
+ * \param MatrixType the type of the object of which we are taking the reverse
+ *
+ * This class represents an expression of the reverse of a vector or matrix.
+ * It is the return type of MatrixBase::reverse() and VectorwiseOp::reverse()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::reverse(), VectorwiseOp::reverse()
+ */
+
+namespace internal {
+
+template<typename MatrixType, int Direction>
+struct traits<Reverse<MatrixType, Direction> >
+ : traits<MatrixType>
+{
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename traits<MatrixType>::StorageKind StorageKind;
+ typedef typename traits<MatrixType>::XprKind XprKind;
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+
+ // let's enable LinearAccess only with vectorization because of the product overhead
+ LinearAccess = ( (Direction==BothDirections) && (int(_MatrixTypeNested::Flags)&PacketAccessBit) )
+ ? LinearAccessBit : 0,
+
+ Flags = int(_MatrixTypeNested::Flags) & (HereditaryBits | LvalueBit | PacketAccessBit | LinearAccess),
+
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost
+ };
+};
+
+template<typename PacketScalar, bool ReversePacket> struct reverse_packet_cond
+{
+ static inline PacketScalar run(const PacketScalar& x) { return preverse(x); }
+};
+
+template<typename PacketScalar> struct reverse_packet_cond<PacketScalar,false>
+{
+ static inline PacketScalar run(const PacketScalar& x) { return x; }
+};
+
+} // end namespace internal
+
+template<typename MatrixType, int Direction> class Reverse
+ : public internal::dense_xpr_base< Reverse<MatrixType, Direction> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<Reverse>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Reverse)
+ using Base::IsRowMajor;
+
+    // the next line is necessary because otherwise the const version of operator()
+    // would be hidden by the non-const version defined in this file
+ using Base::operator();
+
+ protected:
+ enum {
+ PacketSize = internal::packet_traits<Scalar>::size,
+ IsColMajor = !IsRowMajor,
+ ReverseRow = (Direction == Vertical) || (Direction == BothDirections),
+ ReverseCol = (Direction == Horizontal) || (Direction == BothDirections),
+ OffsetRow = ReverseRow && IsColMajor ? PacketSize : 1,
+ OffsetCol = ReverseCol && IsRowMajor ? PacketSize : 1,
+ ReversePacket = (Direction == BothDirections)
+ || ((Direction == Vertical) && IsColMajor)
+ || ((Direction == Horizontal) && IsRowMajor)
+ };
+ typedef internal::reverse_packet_cond<PacketScalar,ReversePacket> reverse_packet;
+ public:
+
+ inline Reverse(const MatrixType& matrix) : m_matrix(matrix) { }
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Reverse)
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ inline Index innerStride() const
+ {
+ return -m_matrix.innerStride();
+ }
+
+ inline Scalar& operator()(Index row, Index col)
+ {
+ eigen_assert(row >= 0 && row < rows() && col >= 0 && col < cols());
+ return coeffRef(row, col);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ return m_matrix.const_cast_derived().coeffRef(ReverseRow ? m_matrix.rows() - row - 1 : row,
+ ReverseCol ? m_matrix.cols() - col - 1 : col);
+ }
+
+ inline CoeffReturnType coeff(Index row, Index col) const
+ {
+ return m_matrix.coeff(ReverseRow ? m_matrix.rows() - row - 1 : row,
+ ReverseCol ? m_matrix.cols() - col - 1 : col);
+ }
+
+ inline CoeffReturnType coeff(Index index) const
+ {
+ return m_matrix.coeff(m_matrix.size() - index - 1);
+ }
+
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_matrix.const_cast_derived().coeffRef(m_matrix.size() - index - 1);
+ }
+
+ inline Scalar& operator()(Index index)
+ {
+ eigen_assert(index >= 0 && index < m_matrix.size());
+ return coeffRef(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index row, Index col) const
+ {
+ return reverse_packet::run(m_matrix.template packet<LoadMode>(
+ ReverseRow ? m_matrix.rows() - row - OffsetRow : row,
+ ReverseCol ? m_matrix.cols() - col - OffsetCol : col));
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index row, Index col, const PacketScalar& x)
+ {
+ m_matrix.const_cast_derived().template writePacket<LoadMode>(
+ ReverseRow ? m_matrix.rows() - row - OffsetRow : row,
+ ReverseCol ? m_matrix.cols() - col - OffsetCol : col,
+ reverse_packet::run(x));
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return internal::preverse(m_matrix.template packet<LoadMode>( m_matrix.size() - index - PacketSize ));
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& x)
+ {
+ m_matrix.const_cast_derived().template writePacket<LoadMode>(m_matrix.size() - index - PacketSize, internal::preverse(x));
+ }
+
+ const typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() const
+ {
+ return m_matrix;
+ }
+
+ protected:
+ typename MatrixType::Nested m_matrix;
+};
+
+/** \returns an expression of the reverse of *this.
+ *
+ * Example: \include MatrixBase_reverse.cpp
+ * Output: \verbinclude MatrixBase_reverse.out
+ *
+ */
+template<typename Derived>
+inline typename DenseBase<Derived>::ReverseReturnType
+DenseBase<Derived>::reverse()
+{
+ return derived();
+}
+
+/** This is the const version of reverse(). */
+template<typename Derived>
+inline const typename DenseBase<Derived>::ConstReverseReturnType
+DenseBase<Derived>::reverse() const
+{
+ return derived();
+}
+
+/** This is the "in place" version of reverse: it reverses \c *this.
+ *
+ * In most cases it is probably better to simply use the reversed expression
+ * of a matrix. However, when reversing the matrix data itself is really needed,
+ * then this "in-place" version is probably the right choice because it provides
+ * the following additional features:
+ * - less error prone: doing the same operation with .reverse() requires special care:
+ * \code m = m.reverse().eval(); \endcode
+ * - this API makes it possible to avoid creating a temporary (the current implementation still creates one, but that could be avoided using swap)
+ * - it allows future optimizations (cache friendliness, etc.)
+ *
+ * \sa reverse() */
+template<typename Derived>
+inline void DenseBase<Derived>::reverseInPlace()
+{
+ derived() = derived().reverse().eval();
+}
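+
+// Usage sketch (illustrative only): reverse() is a lazy view, so assigning it
+// back onto the same object needs .eval(); reverseInPlace() wraps exactly that.
+//
+//   Eigen::VectorXd v(3);
+//   v << 1, 2, 3;
+//   Eigen::VectorXd w = v.reverse();   // w is (3, 2, 1), v is unchanged
+//   v.reverseInPlace();                // v itself becomes (3, 2, 1)
+//   // v = v.reverse();                // aliasing bug; write v = v.reverse().eval() instead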
+
+} // end namespace Eigen
+
+#endif // EIGEN_REVERSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Select.h b/third_party/eigen3/Eigen/src/Core/Select.h
new file mode 100644
index 0000000000..87993bbb55
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Select.h
@@ -0,0 +1,162 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELECT_H
+#define EIGEN_SELECT_H
+
+namespace Eigen {
+
+/** \class Select
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a coefficient wise version of the C++ ternary operator ?:
+ *
+ * \param ConditionMatrixType the type of the \em condition expression which must be a boolean matrix
+ * \param ThenMatrixType the type of the \em then expression
+ * \param ElseMatrixType the type of the \em else expression
+ *
+ * This class represents an expression of a coefficient wise version of the C++ ternary operator ?:.
+ * It is the return type of DenseBase::select() and most of the time this is the only way it is used.
+ *
+ * \sa DenseBase::select(const DenseBase<ThenDerived>&, const DenseBase<ElseDerived>&) const
+ */
+
+namespace internal {
+template<typename ConditionMatrixType, typename ThenMatrixType, typename ElseMatrixType>
+struct traits<Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >
+ : traits<ThenMatrixType>
+{
+ typedef typename traits<ThenMatrixType>::Scalar Scalar;
+ typedef Dense StorageKind;
+ typedef typename traits<ThenMatrixType>::XprKind XprKind;
+ typedef typename ConditionMatrixType::Nested ConditionMatrixNested;
+ typedef typename ThenMatrixType::Nested ThenMatrixNested;
+ typedef typename ElseMatrixType::Nested ElseMatrixNested;
+ enum {
+ RowsAtCompileTime = ConditionMatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = ConditionMatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = ConditionMatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = ConditionMatrixType::MaxColsAtCompileTime,
+ Flags = (unsigned int)ThenMatrixType::Flags & ElseMatrixType::Flags & HereditaryBits,
+ CoeffReadCost = traits<typename remove_all<ConditionMatrixNested>::type>::CoeffReadCost
+ + EIGEN_SIZE_MAX(traits<typename remove_all<ThenMatrixNested>::type>::CoeffReadCost,
+ traits<typename remove_all<ElseMatrixNested>::type>::CoeffReadCost)
+ };
+};
+}
+
+template<typename ConditionMatrixType, typename ThenMatrixType, typename ElseMatrixType>
+class Select : internal::no_assignment_operator,
+ public internal::dense_xpr_base< Select<ConditionMatrixType, ThenMatrixType, ElseMatrixType> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<Select>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Select)
+
+ Select(const ConditionMatrixType& a_conditionMatrix,
+ const ThenMatrixType& a_thenMatrix,
+ const ElseMatrixType& a_elseMatrix)
+ : m_condition(a_conditionMatrix), m_then(a_thenMatrix), m_else(a_elseMatrix)
+ {
+ eigen_assert(m_condition.rows() == m_then.rows() && m_condition.rows() == m_else.rows());
+ eigen_assert(m_condition.cols() == m_then.cols() && m_condition.cols() == m_else.cols());
+ }
+
+ Index rows() const { return m_condition.rows(); }
+ Index cols() const { return m_condition.cols(); }
+
+ const Scalar coeff(Index i, Index j) const
+ {
+ if (m_condition.coeff(i,j))
+ return m_then.coeff(i,j);
+ else
+ return m_else.coeff(i,j);
+ }
+
+ const Scalar coeff(Index i) const
+ {
+ if (m_condition.coeff(i))
+ return m_then.coeff(i);
+ else
+ return m_else.coeff(i);
+ }
+
+ const ConditionMatrixType& conditionMatrix() const
+ {
+ return m_condition;
+ }
+
+ const ThenMatrixType& thenMatrix() const
+ {
+ return m_then;
+ }
+
+ const ElseMatrixType& elseMatrix() const
+ {
+ return m_else;
+ }
+
+ protected:
+ typename ConditionMatrixType::Nested m_condition;
+ typename ThenMatrixType::Nested m_then;
+ typename ElseMatrixType::Nested m_else;
+};
+
+
+/** \returns a matrix where each coefficient (i,j) is equal to \a thenMatrix(i,j)
+ * if \c *this(i,j), and \a elseMatrix(i,j) otherwise.
+ *
+ * Example: \include MatrixBase_select.cpp
+ * Output: \verbinclude MatrixBase_select.out
+ *
+ * \sa class Select
+ */
+template<typename Derived>
+template<typename ThenDerived,typename ElseDerived>
+inline const Select<Derived,ThenDerived,ElseDerived>
+DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
+ const DenseBase<ElseDerived>& elseMatrix) const
+{
+ return Select<Derived,ThenDerived,ElseDerived>(derived(), thenMatrix.derived(), elseMatrix.derived());
+}
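+
+// Usage sketch (illustrative only): the condition can be any same-sized boolean
+// expression, e.g. an array comparison; scalar then/else variants follow below.
+//
+//   Eigen::MatrixXi m(2, 2);
+//   m << 1, -2,
+//       -3,  4;
+//   Eigen::MatrixXi r = (m.array() >= 0).select(m, -m);   // element-wise absolute value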
+
+/** Version of DenseBase::select(const DenseBase&, const DenseBase&) with
+ * the \em else expression being a scalar value.
+ *
+ * \sa DenseBase::select(const DenseBase<ThenDerived>&, const DenseBase<ElseDerived>&) const, class Select
+ */
+template<typename Derived>
+template<typename ThenDerived>
+inline const Select<Derived,ThenDerived, typename ThenDerived::ConstantReturnType>
+DenseBase<Derived>::select(const DenseBase<ThenDerived>& thenMatrix,
+ const typename ThenDerived::Scalar& elseScalar) const
+{
+ return Select<Derived,ThenDerived,typename ThenDerived::ConstantReturnType>(
+ derived(), thenMatrix.derived(), ThenDerived::Constant(rows(),cols(),elseScalar));
+}
+
+/** Version of DenseBase::select(const DenseBase&, const DenseBase&) with
+ * the \em then expression being a scalar value.
+ *
+ * \sa DenseBase::select(const DenseBase<ThenDerived>&, const DenseBase<ElseDerived>&) const, class Select
+ */
+template<typename Derived>
+template<typename ElseDerived>
+inline const Select<Derived, typename ElseDerived::ConstantReturnType, ElseDerived >
+DenseBase<Derived>::select(const typename ElseDerived::Scalar& thenScalar,
+ const DenseBase<ElseDerived>& elseMatrix) const
+{
+ return Select<Derived,typename ElseDerived::ConstantReturnType,ElseDerived>(
+ derived(), ElseDerived::Constant(rows(),cols(),thenScalar), elseMatrix.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELECT_H
diff --git a/third_party/eigen3/Eigen/src/Core/SelfAdjointView.h b/third_party/eigen3/Eigen/src/Core/SelfAdjointView.h
new file mode 100644
index 0000000000..8231e3f5cd
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/SelfAdjointView.h
@@ -0,0 +1,338 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINTMATRIX_H
+#define EIGEN_SELFADJOINTMATRIX_H
+
+namespace Eigen {
+
+/** \class SelfAdjointView
+ * \ingroup Core_Module
+ *
+ *
+ * \brief Expression of a selfadjoint matrix from a triangular part of a dense matrix
+ *
+ * \param MatrixType the type of the dense matrix storing the coefficients
+ * \param TriangularPart can be either \c #Lower or \c #Upper
+ *
+ * This class is an expression of a selfadjoint matrix from a triangular part of a matrix
+ * with given dense storage of the coefficients. It is the return type of MatrixBase::selfadjointView()
+ * and most of the time this is the only way that it is used.
+ *
+ * \sa class TriangularBase, MatrixBase::selfadjointView()
+ */
+
+namespace internal {
+template<typename MatrixType, unsigned int UpLo>
+struct traits<SelfAdjointView<MatrixType, UpLo> > : traits<MatrixType>
+{
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_all<MatrixTypeNested>::type MatrixTypeNestedCleaned;
+ typedef MatrixType ExpressionType;
+ typedef typename MatrixType::PlainObject DenseMatrixType;
+ enum {
+ Mode = UpLo | SelfAdjoint,
+ Flags = MatrixTypeNestedCleaned::Flags & (HereditaryBits)
+ & (~(PacketAccessBit | DirectAccessBit | LinearAccessBit)), // FIXME these flags should be preserved
+ CoeffReadCost = MatrixTypeNestedCleaned::CoeffReadCost
+ };
+};
+}
+
+template <typename Lhs, int LhsMode, bool LhsIsVector,
+ typename Rhs, int RhsMode, bool RhsIsVector>
+struct SelfadjointProductMatrix;
+
+// FIXME could also be called SelfAdjointWrapper to be consistent with DiagonalWrapper ??
+template<typename MatrixType, unsigned int UpLo> class SelfAdjointView
+ : public TriangularBase<SelfAdjointView<MatrixType, UpLo> >
+{
+ public:
+
+ typedef TriangularBase<SelfAdjointView> Base;
+ typedef typename internal::traits<SelfAdjointView>::MatrixTypeNested MatrixTypeNested;
+ typedef typename internal::traits<SelfAdjointView>::MatrixTypeNestedCleaned MatrixTypeNestedCleaned;
+
+ /** \brief The type of coefficients in this matrix */
+ typedef typename internal::traits<SelfAdjointView>::Scalar Scalar;
+
+ typedef typename MatrixType::Index Index;
+
+ enum {
+ Mode = internal::traits<SelfAdjointView>::Mode
+ };
+ typedef typename MatrixType::PlainObject PlainObject;
+
+ EIGEN_DEVICE_FUNC
+ inline SelfAdjointView(MatrixType& matrix) : m_matrix(matrix)
+ {}
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return m_matrix.rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return m_matrix.cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return m_matrix.outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return m_matrix.innerStride(); }
+
+ /** \sa MatrixBase::coeff()
+ * \warning the coordinates must fit into the referenced triangular part
+ */
+ EIGEN_DEVICE_FUNC
+ inline Scalar coeff(Index row, Index col) const
+ {
+ Base::check_coordinates_internal(row, col);
+ return m_matrix.coeff(row, col);
+ }
+
+ /** \sa MatrixBase::coeffRef()
+ * \warning the coordinates must fit into the referenced triangular part
+ */
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ Base::check_coordinates_internal(row, col);
+ return m_matrix.const_cast_derived().coeffRef(row, col);
+ }
+
+ /** \internal */
+ EIGEN_DEVICE_FUNC
+ const MatrixTypeNestedCleaned& _expression() const { return m_matrix; }
+
+ EIGEN_DEVICE_FUNC
+ const MatrixTypeNestedCleaned& nestedExpression() const { return m_matrix; }
+ EIGEN_DEVICE_FUNC
+ MatrixTypeNestedCleaned& nestedExpression() { return *const_cast<MatrixTypeNestedCleaned*>(&m_matrix); }
+
+ /** Efficient self-adjoint matrix times vector/matrix product */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ SelfadjointProductMatrix<MatrixType,Mode,false,OtherDerived,0,OtherDerived::IsVectorAtCompileTime>
+ operator*(const MatrixBase<OtherDerived>& rhs) const
+ {
+ return SelfadjointProductMatrix
+ <MatrixType,Mode,false,OtherDerived,0,OtherDerived::IsVectorAtCompileTime>
+ (m_matrix, rhs.derived());
+ }
+
+ /** Efficient vector/matrix times self-adjoint matrix product */
+ template<typename OtherDerived> friend
+ EIGEN_DEVICE_FUNC
+ SelfadjointProductMatrix<OtherDerived,0,OtherDerived::IsVectorAtCompileTime,MatrixType,Mode,false>
+ operator*(const MatrixBase<OtherDerived>& lhs, const SelfAdjointView& rhs)
+ {
+ return SelfadjointProductMatrix
+ <OtherDerived,0,OtherDerived::IsVectorAtCompileTime,MatrixType,Mode,false>
+ (lhs.derived(),rhs.m_matrix);
+ }
+
+ /** Perform a symmetric rank 2 update of the selfadjoint matrix \c *this:
+ * \f$ this = this + \alpha u v^* + conj(\alpha) v u^* \f$
+ * \returns a reference to \c *this
+ *
+    * The vectors \a u and \a v \b must be column vectors, though they may be
+    * adjoint expressions without any overhead. Only the meaningful triangular
+ * part of the matrix is updated, the rest is left unchanged.
+ *
+ * \sa rankUpdate(const MatrixBase<DerivedU>&, Scalar)
+ */
+ template<typename DerivedU, typename DerivedV>
+ EIGEN_DEVICE_FUNC
+ SelfAdjointView& rankUpdate(const MatrixBase<DerivedU>& u, const MatrixBase<DerivedV>& v, const Scalar& alpha = Scalar(1));
+
+ /** Perform a symmetric rank K update of the selfadjoint matrix \c *this:
+ * \f$ this = this + \alpha ( u u^* ) \f$ where \a u is a vector or matrix.
+ *
+ * \returns a reference to \c *this
+ *
+ * Note that to perform \f$ this = this + \alpha ( u^* u ) \f$ you can simply
+ * call this function with u.adjoint().
+ *
+ * \sa rankUpdate(const MatrixBase<DerivedU>&, const MatrixBase<DerivedV>&, Scalar)
+ */
+ template<typename DerivedU>
+ EIGEN_DEVICE_FUNC
+ SelfAdjointView& rankUpdate(const MatrixBase<DerivedU>& u, const Scalar& alpha = Scalar(1));
+
+/////////// Cholesky module ///////////
+
+ const LLT<PlainObject, UpLo> llt() const;
+ const LDLT<PlainObject, UpLo> ldlt() const;
+
+/////////// Eigenvalue module ///////////
+
+ /** Real part of #Scalar */
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ /** Return type of eigenvalues() */
+ typedef Matrix<RealScalar, internal::traits<MatrixType>::ColsAtCompileTime, 1> EigenvaluesReturnType;
+
+ EIGEN_DEVICE_FUNC
+ EigenvaluesReturnType eigenvalues() const;
+ EIGEN_DEVICE_FUNC
+ RealScalar operatorNorm() const;
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ SelfAdjointView& operator=(const MatrixBase<OtherDerived>& other)
+ {
+ enum {
+ OtherPart = UpLo == Upper ? StrictlyLower : StrictlyUpper
+ };
+ m_matrix.const_cast_derived().template triangularView<UpLo>() = other;
+ m_matrix.const_cast_derived().template triangularView<OtherPart>() = other.adjoint();
+ return *this;
+ }
+ template<typename OtherMatrixType, unsigned int OtherMode>
+ EIGEN_DEVICE_FUNC
+ SelfAdjointView& operator=(const TriangularView<OtherMatrixType, OtherMode>& other)
+ {
+ enum {
+ OtherPart = UpLo == Upper ? StrictlyLower : StrictlyUpper
+ };
+ m_matrix.const_cast_derived().template triangularView<UpLo>() = other.toDenseMatrix();
+ m_matrix.const_cast_derived().template triangularView<OtherPart>() = other.toDenseMatrix().adjoint();
+ return *this;
+ }
+ #endif
+
+ protected:
+ MatrixTypeNested m_matrix;
+};
+
+
+// template<typename OtherDerived, typename MatrixType, unsigned int UpLo>
+// internal::selfadjoint_matrix_product_returntype<OtherDerived,SelfAdjointView<MatrixType,UpLo> >
+// operator*(const MatrixBase<OtherDerived>& lhs, const SelfAdjointView<MatrixType,UpLo>& rhs)
+// {
+// return internal::matrix_selfadjoint_product_returntype<OtherDerived,SelfAdjointView<MatrixType,UpLo> >(lhs.derived(),rhs);
+// }
+
+// selfadjoint to dense matrix
+
+namespace internal {
+
+template<typename Derived1, typename Derived2, int UnrollCount, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, (SelfAdjoint|Upper), UnrollCount, ClearOpposite>
+{
+ enum {
+ col = (UnrollCount-1) / Derived1::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived1::RowsAtCompileTime
+ };
+
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ triangular_assignment_selector<Derived1, Derived2, (SelfAdjoint|Upper), UnrollCount-1, ClearOpposite>::run(dst, src);
+
+ if(row == col)
+ dst.coeffRef(row, col) = numext::real(src.coeff(row, col));
+ else if(row < col)
+ dst.coeffRef(col, row) = numext::conj(dst.coeffRef(row, col) = src.coeff(row, col));
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, SelfAdjoint|Upper, 0, ClearOpposite>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &, const Derived2 &) {}
+};
+
+template<typename Derived1, typename Derived2, int UnrollCount, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, (SelfAdjoint|Lower), UnrollCount, ClearOpposite>
+{
+ enum {
+ col = (UnrollCount-1) / Derived1::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived1::RowsAtCompileTime
+ };
+
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ triangular_assignment_selector<Derived1, Derived2, (SelfAdjoint|Lower), UnrollCount-1, ClearOpposite>::run(dst, src);
+
+ if(row == col)
+ dst.coeffRef(row, col) = numext::real(src.coeff(row, col));
+ else if(row > col)
+ dst.coeffRef(col, row) = numext::conj(dst.coeffRef(row, col) = src.coeff(row, col));
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, SelfAdjoint|Lower, 0, ClearOpposite>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &, const Derived2 &) {}
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, SelfAdjoint|Upper, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ for(Index i = 0; i < j; ++i)
+ {
+ dst.copyCoeff(i, j, src);
+ dst.coeffRef(j,i) = numext::conj(dst.coeff(i,j));
+ }
+ dst.copyCoeff(j, j, src);
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, SelfAdjoint|Lower, Dynamic, ClearOpposite>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ typedef typename Derived1::Index Index;
+ for(Index i = 0; i < dst.rows(); ++i)
+ {
+ for(Index j = 0; j < i; ++j)
+ {
+ dst.copyCoeff(i, j, src);
+ dst.coeffRef(j,i) = numext::conj(dst.coeff(i,j));
+ }
+ dst.copyCoeff(i, i, src);
+ }
+ }
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* Implementation of MatrixBase methods
+***************************************************************************/
+
+template<typename Derived>
+template<unsigned int UpLo>
+typename MatrixBase<Derived>::template ConstSelfAdjointViewReturnType<UpLo>::Type
+MatrixBase<Derived>::selfadjointView() const
+{
+ return derived();
+}
+
+template<typename Derived>
+template<unsigned int UpLo>
+typename MatrixBase<Derived>::template SelfAdjointViewReturnType<UpLo>::Type
+MatrixBase<Derived>::selfadjointView()
+{
+ return derived();
+}
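+
+// Usage sketch (illustrative only): only the referenced triangle of the dense
+// matrix is ever read or written through the view.
+//
+//   Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
+//   Eigen::VectorXd x = Eigen::VectorXd::Random(3);
+//   Eigen::MatrixXd full = a.selfadjointView<Eigen::Lower>();   // symmetrize from the lower triangle
+//   Eigen::VectorXd y = a.selfadjointView<Eigen::Lower>() * x;  // optimized symmetric product
+//   a.selfadjointView<Eigen::Lower>().rankUpdate(x, 2.0);       // a += 2*x*x^T, lower triangle only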
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINTMATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/SelfCwiseBinaryOp.h b/third_party/eigen3/Eigen/src/Core/SelfCwiseBinaryOp.h
new file mode 100644
index 0000000000..65864adf84
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/SelfCwiseBinaryOp.h
@@ -0,0 +1,226 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFCWISEBINARYOP_H
+#define EIGEN_SELFCWISEBINARYOP_H
+
+namespace Eigen {
+
+/** \class SelfCwiseBinaryOp
+ * \ingroup Core_Module
+ *
+ * \internal
+ *
+ * \brief Internal helper class for optimizing operators like +=, -=
+ *
+ * This is a pseudo expression class re-implementing the copyCoeff/copyPacket
+ * methods to directly perform +=/-= operations in an optimal way. In particular,
+ * this makes sure that the input/output data are loaded only once, using
+ * aligned packet loads.
+ *
+ * \sa class SwapWrapper for a similar trick.
+ */
+
+namespace internal {
+template<typename BinaryOp, typename Lhs, typename Rhs>
+struct traits<SelfCwiseBinaryOp<BinaryOp,Lhs,Rhs> >
+ : traits<CwiseBinaryOp<BinaryOp,Lhs,Rhs> >
+{
+ enum {
+ // Note that it is still a good idea to preserve the DirectAccessBit
+ // so that assign can correctly align the data.
+ Flags = traits<CwiseBinaryOp<BinaryOp,Lhs,Rhs> >::Flags | (Lhs::Flags&AlignedBit) | (Lhs::Flags&DirectAccessBit) | (Lhs::Flags&LvalueBit),
+ OuterStrideAtCompileTime = Lhs::OuterStrideAtCompileTime,
+ InnerStrideAtCompileTime = Lhs::InnerStrideAtCompileTime
+ };
+};
+}
+
+template<typename BinaryOp, typename Lhs, typename Rhs> class SelfCwiseBinaryOp
+ : public internal::dense_xpr_base< SelfCwiseBinaryOp<BinaryOp, Lhs, Rhs> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<SelfCwiseBinaryOp>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(SelfCwiseBinaryOp)
+
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+
+ EIGEN_DEVICE_FUNC
+ inline SelfCwiseBinaryOp(Lhs& xpr, const BinaryOp& func = BinaryOp()) : m_matrix(xpr), m_functor(func) {}
+
+ EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.rows(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.cols(); }
+ EIGEN_DEVICE_FUNC inline Index outerStride() const { return m_matrix.outerStride(); }
+ EIGEN_DEVICE_FUNC inline Index innerStride() const { return m_matrix.innerStride(); }
+ EIGEN_DEVICE_FUNC inline const Scalar* data() const { return m_matrix.data(); }
+
+ // note that this function is needed by assign to correctly align loads/stores
+ // TODO make Assign use .data()
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(Lhs)
+ return m_matrix.const_cast_derived().coeffRef(row, col);
+ }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index row, Index col) const
+ {
+ return m_matrix.coeffRef(row, col);
+ }
+
+ // note that this function is needed by assign to correctly align loads/stores
+ // TODO make Assign use .data()
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(Lhs)
+ return m_matrix.const_cast_derived().coeffRef(index);
+ }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return m_matrix.const_cast_derived().coeffRef(index);
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void copyCoeff(Index row, Index col, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ Scalar& tmp = m_matrix.coeffRef(row,col);
+ tmp = m_functor(tmp, _other.coeff(row,col));
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void copyCoeff(Index index, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(index >= 0 && index < m_matrix.size());
+ Scalar& tmp = m_matrix.coeffRef(index);
+ tmp = m_functor(tmp, _other.coeff(index));
+ }
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ void copyPacket(Index row, Index col, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(row >= 0 && row < rows()
+ && col >= 0 && col < cols());
+ m_matrix.template writePacket<StoreMode>(row, col,
+ m_functor.packetOp(m_matrix.template packet<StoreMode>(row, col),_other.template packet<LoadMode>(row, col)) );
+ }
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ void copyPacket(Index index, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(index >= 0 && index < m_matrix.size());
+ m_matrix.template writePacket<StoreMode>(index,
+ m_functor.packetOp(m_matrix.template packet<StoreMode>(index),_other.template packet<LoadMode>(index)) );
+ }
+
+ // reimplement lazyAssign to handle complex *= real
+ // see CwiseBinaryOp ctor for details
+ template<typename RhsDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE SelfCwiseBinaryOp& lazyAssign(const DenseBase<RhsDerived>& rhs)
+ {
+ EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Lhs,RhsDerived)
+ EIGEN_CHECK_BINARY_COMPATIBILIY(BinaryOp,typename Lhs::Scalar,typename RhsDerived::Scalar);
+
+ #ifdef EIGEN_DEBUG_ASSIGN
+ internal::assign_traits<SelfCwiseBinaryOp, RhsDerived>::debug();
+ #endif
+ eigen_assert(rows() == rhs.rows() && cols() == rhs.cols());
+ internal::assign_impl<SelfCwiseBinaryOp, RhsDerived>::run(*this,rhs.derived());
+ #ifndef EIGEN_NO_DEBUG
+ this->checkTransposeAliasing(rhs.derived());
+ #endif
+ return *this;
+ }
+
+ // overloaded to honor evaluation of special matrices
+ // maybe another solution would be to not use SelfCwiseBinaryOp
+ // at first...
+ EIGEN_DEVICE_FUNC
+ SelfCwiseBinaryOp& operator=(const Rhs& _rhs)
+ {
+ typename internal::nested<Rhs>::type rhs(_rhs);
+ return Base::operator=(rhs);
+ }
+
+ EIGEN_DEVICE_FUNC
+ Lhs& expression() const
+ {
+ return m_matrix;
+ }
+
+ EIGEN_DEVICE_FUNC
+ const BinaryOp& functor() const
+ {
+ return m_functor;
+ }
+
+ protected:
+ Lhs& m_matrix;
+ const BinaryOp& m_functor;
+
+ private:
+ SelfCwiseBinaryOp& operator=(const SelfCwiseBinaryOp&);
+};
+
+template<typename Derived>
+inline Derived& DenseBase<Derived>::operator*=(const Scalar& other)
+{
+ typedef typename Derived::PlainObject PlainObject;
+ SelfCwiseBinaryOp<internal::scalar_product_op<Scalar>, Derived, typename PlainObject::ConstantReturnType> tmp(derived());
+ tmp = PlainObject::Constant(rows(),cols(),other);
+ return derived();
+}
+
+template<typename Derived>
+inline Derived& ArrayBase<Derived>::operator+=(const Scalar& other)
+{
+ typedef typename Derived::PlainObject PlainObject;
+ SelfCwiseBinaryOp<internal::scalar_sum_op<Scalar>, Derived, typename PlainObject::ConstantReturnType> tmp(derived());
+ tmp = PlainObject::Constant(rows(),cols(),other);
+ return derived();
+}
+
+template<typename Derived>
+inline Derived& ArrayBase<Derived>::operator-=(const Scalar& other)
+{
+ typedef typename Derived::PlainObject PlainObject;
+ SelfCwiseBinaryOp<internal::scalar_difference_op<Scalar>, Derived, typename PlainObject::ConstantReturnType> tmp(derived());
+ tmp = PlainObject::Constant(rows(),cols(),other);
+ return derived();
+}
+
+template<typename Derived>
+inline Derived& DenseBase<Derived>::operator/=(const Scalar& other)
+{
+ typedef typename internal::conditional<NumTraits<Scalar>::IsInteger,
+ internal::scalar_quotient_op<Scalar>,
+ internal::scalar_product_op<Scalar> >::type BinOp;
+ typedef typename Derived::PlainObject PlainObject;
+ SelfCwiseBinaryOp<BinOp, Derived, typename PlainObject::ConstantReturnType> tmp(derived());
+ Scalar actual_other;
+ if(NumTraits<Scalar>::IsInteger) actual_other = other;
+ else actual_other = Scalar(1)/other;
+ tmp = PlainObject::Constant(rows(),cols(), actual_other);
+ return derived();
+}
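+
+// Usage sketch (illustrative only): these compound operators update the data in
+// a single pass; note the integer vs. floating-point handling of operator/=.
+//
+//   Eigen::ArrayXf a(3);
+//   a << 1.f, 2.f, 3.f;
+//   a *= 2.f;   // in-place scaling
+//   a /= 4.f;   // floating point: implemented as multiplication by 0.25
+//   Eigen::ArrayXi b(3);
+//   b << 10, 20, 30;
+//   b /= 3;     // integer scalar type: genuine element-wise integer division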
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFCWISEBINARYOP_H
diff --git a/third_party/eigen3/Eigen/src/Core/SolveTriangular.h b/third_party/eigen3/Eigen/src/Core/SolveTriangular.h
new file mode 100644
index 0000000000..e158e31626
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/SolveTriangular.h
@@ -0,0 +1,260 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SOLVETRIANGULAR_H
+#define EIGEN_SOLVETRIANGULAR_H
+
+namespace Eigen {
+
+namespace internal {
+
+// Forward declarations:
+// The following two routines are implemented in the products/TriangularSolver*.h files
+template<typename LhsScalar, typename RhsScalar, typename Index, int Side, int Mode, bool Conjugate, int StorageOrder>
+struct triangular_solve_vector;
+
+template <typename Scalar, typename Index, int Side, int Mode, bool Conjugate, int TriStorageOrder, int OtherStorageOrder>
+struct triangular_solve_matrix;
+
+// small helper struct extracting some traits on the underlying solver operation
+template<typename Lhs, typename Rhs, int Side>
+class trsolve_traits
+{
+ private:
+ enum {
+ RhsIsVectorAtCompileTime = (Side==OnTheLeft ? Rhs::ColsAtCompileTime : Rhs::RowsAtCompileTime)==1
+ };
+ public:
+ enum {
+ Unrolling = (RhsIsVectorAtCompileTime && Rhs::SizeAtCompileTime != Dynamic && Rhs::SizeAtCompileTime <= 8)
+ ? CompleteUnrolling : NoUnrolling,
+ RhsVectors = RhsIsVectorAtCompileTime ? 1 : Dynamic
+ };
+};
+
+template<typename Lhs, typename Rhs,
+ int Side, // can be OnTheLeft/OnTheRight
+ int Mode, // can be Upper/Lower | UnitDiag
+ int Unrolling = trsolve_traits<Lhs,Rhs,Side>::Unrolling,
+ int RhsVectors = trsolve_traits<Lhs,Rhs,Side>::RhsVectors
+ >
+struct triangular_solver_selector;
+
+template<typename Lhs, typename Rhs, int Side, int Mode>
+struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,1>
+{
+ typedef typename Lhs::Scalar LhsScalar;
+ typedef typename Rhs::Scalar RhsScalar;
+ typedef blas_traits<Lhs> LhsProductTraits;
+ typedef typename LhsProductTraits::ExtractType ActualLhsType;
+ typedef Map<Matrix<RhsScalar,Dynamic,1>, Aligned> MappedRhs;
+ static void run(const Lhs& lhs, Rhs& rhs)
+ {
+ ActualLhsType actualLhs = LhsProductTraits::extract(lhs);
+
+ // FIXME find a way to allow an inner stride if packet_traits<Scalar>::size==1
+
+ bool useRhsDirectly = Rhs::InnerStrideAtCompileTime==1 || rhs.innerStride()==1;
+
+ ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhs,rhs.size(),
+ (useRhsDirectly ? rhs.data() : 0));
+
+ if(!useRhsDirectly)
+ MappedRhs(actualRhs,rhs.size()) = rhs;
+
+ triangular_solve_vector<LhsScalar, RhsScalar, typename Lhs::Index, Side, Mode, LhsProductTraits::NeedToConjugate,
+ (int(Lhs::Flags) & RowMajorBit) ? RowMajor : ColMajor>
+ ::run(actualLhs.cols(), actualLhs.data(), actualLhs.outerStride(), actualRhs);
+
+ if(!useRhsDirectly)
+ rhs = MappedRhs(actualRhs, rhs.size());
+ }
+};
+
+// the rhs is a matrix
+template<typename Lhs, typename Rhs, int Side, int Mode>
+struct triangular_solver_selector<Lhs,Rhs,Side,Mode,NoUnrolling,Dynamic>
+{
+ typedef typename Rhs::Scalar Scalar;
+ typedef typename Rhs::Index Index;
+ typedef blas_traits<Lhs> LhsProductTraits;
+ typedef typename LhsProductTraits::DirectLinearAccessType ActualLhsType;
+
+ static void run(const Lhs& lhs, Rhs& rhs)
+ {
+ typename internal::add_const_on_value_type<ActualLhsType>::type actualLhs = LhsProductTraits::extract(lhs);
+
+ const Index size = lhs.rows();
+ const Index othersize = Side==OnTheLeft? rhs.cols() : rhs.rows();
+
+ typedef internal::gemm_blocking_space<(Rhs::Flags&RowMajorBit) ? RowMajor : ColMajor,Scalar,Scalar,
+ Rhs::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime, Lhs::MaxRowsAtCompileTime,4> BlockingType;
+
+ BlockingType blocking(rhs.rows(), rhs.cols(), size, 1, false);
+
+ triangular_solve_matrix<Scalar,Index,Side,Mode,LhsProductTraits::NeedToConjugate,(int(Lhs::Flags) & RowMajorBit) ? RowMajor : ColMajor,
+ (Rhs::Flags&RowMajorBit) ? RowMajor : ColMajor>
+ ::run(size, othersize, &actualLhs.coeffRef(0,0), actualLhs.outerStride(), &rhs.coeffRef(0,0), rhs.outerStride(), blocking);
+ }
+};
+
+/***************************************************************************
+* meta-unrolling implementation
+***************************************************************************/
+
+template<typename Lhs, typename Rhs, int Mode, int Index, int Size,
+ bool Stop = Index==Size>
+struct triangular_solver_unroller;
+
+template<typename Lhs, typename Rhs, int Mode, int Index, int Size>
+struct triangular_solver_unroller<Lhs,Rhs,Mode,Index,Size,false> {
+ enum {
+ IsLower = ((Mode&Lower)==Lower),
+ I = IsLower ? Index : Size - Index - 1,
+ S = IsLower ? 0 : I+1
+ };
+ static void run(const Lhs& lhs, Rhs& rhs)
+ {
+ if (Index>0)
+ rhs.coeffRef(I) -= lhs.row(I).template segment<Index>(S).transpose()
+ .cwiseProduct(rhs.template segment<Index>(S)).sum();
+
+ if(!(Mode & UnitDiag))
+ rhs.coeffRef(I) /= lhs.coeff(I,I);
+
+ triangular_solver_unroller<Lhs,Rhs,Mode,Index+1,Size>::run(lhs,rhs);
+ }
+};
+
+template<typename Lhs, typename Rhs, int Mode, int Index, int Size>
+struct triangular_solver_unroller<Lhs,Rhs,Mode,Index,Size,true> {
+ static void run(const Lhs&, Rhs&) {}
+};
+
+template<typename Lhs, typename Rhs, int Mode>
+struct triangular_solver_selector<Lhs,Rhs,OnTheLeft,Mode,CompleteUnrolling,1> {
+ static void run(const Lhs& lhs, Rhs& rhs)
+ { triangular_solver_unroller<Lhs,Rhs,Mode,0,Rhs::SizeAtCompileTime>::run(lhs,rhs); }
+};
+
+template<typename Lhs, typename Rhs, int Mode>
+struct triangular_solver_selector<Lhs,Rhs,OnTheRight,Mode,CompleteUnrolling,1> {
+ static void run(const Lhs& lhs, Rhs& rhs)
+ {
+ Transpose<const Lhs> trLhs(lhs);
+ Transpose<Rhs> trRhs(rhs);
+
+ triangular_solver_unroller<Transpose<const Lhs>,Transpose<Rhs>,
+ ((Mode&Upper)==Upper ? Lower : Upper) | (Mode&UnitDiag),
+ 0,Rhs::SizeAtCompileTime>::run(trLhs,trRhs);
+ }
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* TriangularView methods
+***************************************************************************/
+
+/** "in-place" version of TriangularView::solve() where the result is written in \a other
+ *
+ * \warning The parameter is only marked 'const' to make the C++ compiler accept a temporary expression here.
+ * This function will const_cast it, so constness isn't honored here.
+ *
+ * See TriangularView::solve() for the details.
+ */
+template<typename MatrixType, unsigned int Mode>
+template<int Side, typename OtherDerived>
+void TriangularView<MatrixType,Mode>::solveInPlace(const MatrixBase<OtherDerived>& _other) const
+{
+ OtherDerived& other = _other.const_cast_derived();
+ eigen_assert( cols() == rows() && ((Side==OnTheLeft && cols() == other.rows()) || (Side==OnTheRight && cols() == other.cols())) );
+ eigen_assert((!(Mode & ZeroDiag)) && bool(Mode & (Upper|Lower)));
+
+ enum { copy = internal::traits<OtherDerived>::Flags & RowMajorBit && OtherDerived::IsVectorAtCompileTime };
+ typedef typename internal::conditional<copy,
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type, OtherDerived&>::type OtherCopy;
+ OtherCopy otherCopy(other);
+
+ internal::triangular_solver_selector<MatrixType, typename internal::remove_reference<OtherCopy>::type,
+ Side, Mode>::run(nestedExpression(), otherCopy);
+
+ if (copy)
+ other = otherCopy;
+}
+
+/** \returns the product of the inverse of \c *this with \a other, \a *this being triangular.
+ *
+ * This function computes the inverse-matrix matrix product inverse(\c *this) * \a other if
+ * \a Side==OnTheLeft (the default), or the right-inverse-multiply \a other * inverse(\c *this) if
+ * \a Side==OnTheRight.
+ *
+ * The matrix \c *this must be triangular and invertible (i.e., all the coefficients of the
+ * diagonal must be non-zero). It works as a forward (resp. backward) substitution if \c *this
+ * is a lower (resp. upper) triangular matrix.
+ *
+ * Example: \include MatrixBase_marked.cpp
+ * Output: \verbinclude MatrixBase_marked.out
+ *
+ * This function returns an expression of the inverse-multiply and can work in-place if it is assigned
+ * to the same matrix or vector \a other.
+ *
+ * For users coming from BLAS, this function (and more specifically solveInPlace()) offers
+ * all the operations supported by the \c *TRSV and \c *TRSM BLAS routines.
+ *
+ * \sa TriangularView::solveInPlace()
+ */
+template<typename Derived, unsigned int Mode>
+template<int Side, typename Other>
+const internal::triangular_solve_retval<Side,TriangularView<Derived,Mode>,Other>
+TriangularView<Derived,Mode>::solve(const MatrixBase<Other>& other) const
+{
+ return internal::triangular_solve_retval<Side,TriangularView,Other>(*this, other.derived());
+}
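+
+// Usage sketch (illustrative only): the triangular part acts as the matrix to
+// invert; solveInPlace() overwrites the right-hand side and avoids a temporary.
+//
+//   Eigen::MatrixXd a = Eigen::MatrixXd::Random(3, 3);
+//   Eigen::VectorXd b = Eigen::VectorXd::Random(3);
+//   Eigen::VectorXd x = a.triangularView<Eigen::Lower>().solve(b);   // forward substitution
+//   a.triangularView<Eigen::Upper>().solveInPlace(b);                // b <- U^{-1} b, no temporary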
+
+namespace internal {
+
+
+template<int Side, typename TriangularType, typename Rhs>
+struct traits<triangular_solve_retval<Side, TriangularType, Rhs> >
+{
+ typedef typename internal::plain_matrix_type_column_major<Rhs>::type ReturnType;
+};
+
+template<int Side, typename TriangularType, typename Rhs> struct triangular_solve_retval
+ : public ReturnByValue<triangular_solve_retval<Side, TriangularType, Rhs> >
+{
+ typedef typename remove_all<typename Rhs::Nested>::type RhsNestedCleaned;
+ typedef ReturnByValue<triangular_solve_retval> Base;
+ typedef typename Base::Index Index;
+
+ triangular_solve_retval(const TriangularType& tri, const Rhs& rhs)
+ : m_triangularMatrix(tri), m_rhs(rhs)
+ {}
+
+ inline Index rows() const { return m_rhs.rows(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ if(!(is_same<RhsNestedCleaned,Dest>::value && extract_data(dst) == extract_data(m_rhs)))
+ dst = m_rhs;
+ m_triangularMatrix.template solveInPlace<Side>(dst);
+ }
+
+ protected:
+ const TriangularType& m_triangularMatrix;
+ typename Rhs::Nested m_rhs;
+};
+
+} // namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SOLVETRIANGULAR_H
diff --git a/third_party/eigen3/Eigen/src/Core/StableNorm.h b/third_party/eigen3/Eigen/src/Core/StableNorm.h
new file mode 100644
index 0000000000..c862c0b63e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/StableNorm.h
@@ -0,0 +1,200 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STABLENORM_H
+#define EIGEN_STABLENORM_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename ExpressionType, typename Scalar>
+inline void stable_norm_kernel(const ExpressionType& bl, Scalar& ssq, Scalar& scale, Scalar& invScale)
+{
+ using std::max;
+ Scalar maxCoeff = bl.cwiseAbs().maxCoeff();
+
+ if (maxCoeff>scale)
+ {
+ ssq = ssq * numext::abs2(scale/maxCoeff);
+ Scalar tmp = Scalar(1)/maxCoeff;
+ if(tmp > NumTraits<Scalar>::highest())
+ {
+ invScale = NumTraits<Scalar>::highest();
+ scale = Scalar(1)/invScale;
+ }
+ else
+ {
+ scale = maxCoeff;
+ invScale = tmp;
+ }
+ }
+
+ // TODO if the maxCoeff is much much smaller than the current scale,
+ // then we can neglect this sub vector
+ if(scale>Scalar(0)) // if scale==0, then bl is 0
+ ssq += (bl*invScale).squaredNorm();
+}
+
+template<typename Derived>
+inline typename NumTraits<typename traits<Derived>::Scalar>::Real
+blueNorm_impl(const EigenBase<Derived>& _vec)
+{
+ typedef typename Derived::RealScalar RealScalar;
+ typedef typename Derived::Index Index;
+ using std::pow;
+ using std::sqrt;
+ using std::abs;
+ const Derived& vec(_vec.derived());
+ static bool initialized = false;
+ static RealScalar b1, b2, s1m, s2m, overfl, rbig, relerr;
+ if(!initialized)
+ {
+ int ibeta, it, iemin, iemax, iexp;
+ RealScalar eps;
+ // This program calculates the machine-dependent constants
+    // b1, b2, s1m, s2m, relerr, overfl
+ // from the "basic" machine-dependent numbers
+ // nbig, ibeta, it, iemin, iemax, rbig.
+ // The following define the basic machine-dependent constants.
+    // For portability, the PORT subprograms "i1mach" and "r1mach"
+ // are used. For any specific computer, each of the assignment
+ // statements can be replaced
+ ibeta = std::numeric_limits<RealScalar>::radix; // base for floating-point numbers
+ it = std::numeric_limits<RealScalar>::digits; // number of base-beta digits in mantissa
+ iemin = std::numeric_limits<RealScalar>::min_exponent; // minimum exponent
+ iemax = std::numeric_limits<RealScalar>::max_exponent; // maximum exponent
+ rbig = (std::numeric_limits<RealScalar>::max)(); // largest floating-point number
+
+ iexp = -((1-iemin)/2);
+ b1 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // lower boundary of midrange
+ iexp = (iemax + 1 - it)/2;
+ b2 = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // upper boundary of midrange
+
+ iexp = (2-iemin)/2;
+ s1m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for lower range
+ iexp = - ((iemax+it)/2);
+ s2m = RealScalar(pow(RealScalar(ibeta),RealScalar(iexp))); // scaling factor for upper range
+
+ overfl = rbig*s2m; // overflow boundary for abig
+ eps = RealScalar(pow(double(ibeta), 1-it));
+ relerr = sqrt(eps); // tolerance for neglecting asml
+ initialized = true;
+ }
+ Index n = vec.size();
+ RealScalar ab2 = b2 / RealScalar(n);
+ RealScalar asml = RealScalar(0);
+ RealScalar amed = RealScalar(0);
+ RealScalar abig = RealScalar(0);
+ for(typename Derived::InnerIterator it(vec, 0); it; ++it)
+ {
+ RealScalar ax = abs(it.value());
+ if(ax > ab2) abig += numext::abs2(ax*s2m);
+ else if(ax < b1) asml += numext::abs2(ax*s1m);
+ else amed += numext::abs2(ax);
+ }
+ if(abig > RealScalar(0))
+ {
+ abig = sqrt(abig);
+ if(abig > overfl)
+ {
+ return rbig;
+ }
+ if(amed > RealScalar(0))
+ {
+ abig = abig/s2m;
+ amed = sqrt(amed);
+ }
+ else
+ return abig/s2m;
+ }
+ else if(asml > RealScalar(0))
+ {
+ if (amed > RealScalar(0))
+ {
+ abig = sqrt(amed);
+ amed = sqrt(asml) / s1m;
+ }
+ else
+ return sqrt(asml)/s1m;
+ }
+ else
+ return sqrt(amed);
+ asml = numext::mini(abig, amed);
+ abig = numext::maxi(abig, amed);
+ if(asml <= abig*relerr)
+ return abig;
+ else
+ return abig * sqrt(RealScalar(1) + numext::abs2(asml/abig));
+}
+
+} // end namespace internal
+
+/** \returns the \em l2 norm of \c *this avoiding underflow and overflow.
+ * This version uses a blockwise two-pass algorithm:
+ * 1 - find the absolute largest coefficient \c s
+ * 2 - compute \f$ s \Vert \frac{*this}{s} \Vert \f$ in a standard way
+ *
+ * For architecture/scalar types supporting vectorization, this version
+ * is faster than blueNorm(). Otherwise the blueNorm() is much faster.
+ *
+ * \sa norm(), blueNorm(), hypotNorm()
+ */
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+MatrixBase<Derived>::stableNorm() const
+{
+ using std::sqrt;
+ const Index blockSize = 4096;
+ RealScalar scale(0);
+ RealScalar invScale(1);
+  RealScalar ssq(0); // sum of squares
+ enum {
+ Alignment = (int(Flags)&DirectAccessBit) || (int(Flags)&AlignedBit) ? 1 : 0
+ };
+ Index n = size();
+ Index bi = internal::first_aligned(derived());
+ if (bi>0)
+ internal::stable_norm_kernel(this->head(bi), ssq, scale, invScale);
+ for (; bi<n; bi+=blockSize)
+ internal::stable_norm_kernel(this->segment(bi,numext::mini(blockSize, n - bi)).template forceAlignedAccessIf<Alignment>(), ssq, scale, invScale);
+ return scale * sqrt(ssq);
+}
+
+/** \returns the \em l2 norm of \c *this using Blue's algorithm.
+ * A Portable Fortran Program to Find the Euclidean Norm of a Vector,
+ * ACM TOMS, Vol 4, Issue 1, 1978.
+ *
+ * For architecture/scalar types without vectorization, this version
+ * is much faster than stableNorm(). Otherwise the stableNorm() is faster.
+ *
+ * \sa norm(), stableNorm(), hypotNorm()
+ */
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+MatrixBase<Derived>::blueNorm() const
+{
+ return internal::blueNorm_impl(*this);
+}
+
+/** \returns the \em l2 norm of \c *this avoiding underflow and overflow.
+ * This version uses a concatenation of hypot() calls, and it is very slow.
+ *
+ * \sa norm(), stableNorm()
+ */
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+MatrixBase<Derived>::hypotNorm() const
+{
+ return this->cwiseAbs().redux(internal::scalar_hypot_op<RealScalar>());
+}
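+
+/* A short comparative sketch (illustrative assumption, not upstream documentation): the three
+ * robust variants compute the same mathematical quantity and differ only in speed and in how
+ * they guard against overflow/underflow; the plain norm() is fastest but least robust.
+ * \code
+ * Eigen::VectorXd v = Eigen::VectorXd::Random(1000);
+ * double a = v.stableNorm();  // blockwise rescaling, fast when vectorization is available
+ * double b = v.blueNorm();    // Blue's algorithm with separate small/medium/large accumulators
+ * double c = v.hypotNorm();   // chained hypot() calls, robust but very slow
+ * // a, b and c agree up to rounding error
+ * \endcode
+ */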
+
+} // end namespace Eigen
+
+#endif // EIGEN_STABLENORM_H
diff --git a/third_party/eigen3/Eigen/src/Core/Stride.h b/third_party/eigen3/Eigen/src/Core/Stride.h
new file mode 100644
index 0000000000..d3d454e4e2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Stride.h
@@ -0,0 +1,113 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STRIDE_H
+#define EIGEN_STRIDE_H
+
+namespace Eigen {
+
+/** \class Stride
+ * \ingroup Core_Module
+ *
+ * \brief Holds stride information for Map
+ *
+ * This class holds the stride information needed to map strided arrays with the Map class.
+ *
+ * It holds two values: the inner stride and the outer stride.
+ *
+ * The inner stride is the pointer increment between two consecutive entries within a given row of a
+ * row-major matrix or within a given column of a column-major matrix.
+ *
+ * The outer stride is the pointer increment between two consecutive rows of a row-major matrix or
+ * between two consecutive columns of a column-major matrix.
+ *
+ * These two values can be passed either at compile-time as template parameters, or at runtime as
+ * arguments to the constructor.
+ *
+ * Indeed, this class takes two template parameters:
+ * \param _OuterStrideAtCompileTime the outer stride, or Dynamic if you want to specify it at runtime.
+ * \param _InnerStrideAtCompileTime the inner stride, or Dynamic if you want to specify it at runtime.
+ *
+ * Here is an example:
+ * \include Map_general_stride.cpp
+ * Output: \verbinclude Map_general_stride.out
+ *
+ * \sa class InnerStride, class OuterStride, \ref TopicStorageOrders
+ */
+template<int _OuterStrideAtCompileTime, int _InnerStrideAtCompileTime>
+class Stride
+{
+ public:
+ typedef DenseIndex Index;
+ enum {
+ InnerStrideAtCompileTime = _InnerStrideAtCompileTime,
+ OuterStrideAtCompileTime = _OuterStrideAtCompileTime
+ };
+
+ /** Default constructor, for use when strides are fixed at compile time */
+ EIGEN_DEVICE_FUNC
+ Stride()
+ : m_outer(OuterStrideAtCompileTime), m_inner(InnerStrideAtCompileTime)
+ {
+ eigen_assert(InnerStrideAtCompileTime != Dynamic && OuterStrideAtCompileTime != Dynamic);
+ }
+
+ /** Constructor that allows passing the strides at runtime */
+ EIGEN_DEVICE_FUNC
+ Stride(Index outerStride, Index innerStride)
+ : m_outer(outerStride), m_inner(innerStride)
+ {
+ eigen_assert(innerStride>=0 && outerStride>=0);
+ }
+
+ /** Copy constructor */
+ EIGEN_DEVICE_FUNC
+ Stride(const Stride& other)
+ : m_outer(other.outer()), m_inner(other.inner())
+ {}
+
+ /** \returns the outer stride */
+ EIGEN_DEVICE_FUNC
+ inline Index outer() const { return m_outer.value(); }
+ /** \returns the inner stride */
+ EIGEN_DEVICE_FUNC
+ inline Index inner() const { return m_inner.value(); }
+
+ protected:
+ internal::variable_if_dynamic<Index, OuterStrideAtCompileTime> m_outer;
+ internal::variable_if_dynamic<Index, InnerStrideAtCompileTime> m_inner;
+};
+
+/** \brief Convenience specialization of Stride to specify only an inner stride.
+ * See class Map for some examples. */
+template<int Value = Dynamic>
+class InnerStride : public Stride<0, Value>
+{
+ typedef Stride<0, Value> Base;
+ public:
+ typedef DenseIndex Index;
+ EIGEN_DEVICE_FUNC InnerStride() : Base() {}
+ EIGEN_DEVICE_FUNC InnerStride(Index v) : Base(0, v) {}
+};
+
+/** \brief Convenience specialization of Stride to specify only an outer stride.
+ * See class Map for some examples. */
+template<int Value = Dynamic>
+class OuterStride : public Stride<Value, 0>
+{
+ typedef Stride<Value, 0> Base;
+ public:
+ typedef DenseIndex Index;
+ EIGEN_DEVICE_FUNC OuterStride() : Base() {}
+ EIGEN_DEVICE_FUNC OuterStride(Index v) : Base(v,0) {}
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_STRIDE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Swap.h b/third_party/eigen3/Eigen/src/Core/Swap.h
new file mode 100644
index 0000000000..d602fba653
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Swap.h
@@ -0,0 +1,140 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SWAP_H
+#define EIGEN_SWAP_H
+
+namespace Eigen {
+
+/** \class SwapWrapper
+ * \ingroup Core_Module
+ *
+ * \internal
+ *
+ * \brief Internal helper class for swapping two expressions
+ */
+namespace internal {
+template<typename ExpressionType>
+struct traits<SwapWrapper<ExpressionType> > : traits<ExpressionType> {};
+}
+
+template<typename ExpressionType> class SwapWrapper
+ : public internal::dense_xpr_base<SwapWrapper<ExpressionType> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<SwapWrapper>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(SwapWrapper)
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+
+ EIGEN_DEVICE_FUNC
+ inline SwapWrapper(ExpressionType& xpr) : m_expression(xpr) {}
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return m_expression.rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return m_expression.cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return m_expression.outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return m_expression.innerStride(); }
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<ExpressionType>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue* data() { return m_expression.data(); }
+ EIGEN_DEVICE_FUNC
+ inline const Scalar* data() const { return m_expression.data(); }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index rowId, Index colId)
+ {
+ return m_expression.const_cast_derived().coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index)
+ {
+ return m_expression.const_cast_derived().coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return m_expression.coeffRef(rowId, colId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index index) const
+ {
+ return m_expression.coeffRef(index);
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void copyCoeff(Index rowId, Index colId, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(rowId >= 0 && rowId < rows()
+ && colId >= 0 && colId < cols());
+ Scalar tmp = m_expression.coeff(rowId, colId);
+ m_expression.coeffRef(rowId, colId) = _other.coeff(rowId, colId);
+ _other.coeffRef(rowId, colId) = tmp;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void copyCoeff(Index index, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(index >= 0 && index < m_expression.size());
+ Scalar tmp = m_expression.coeff(index);
+ m_expression.coeffRef(index) = _other.coeff(index);
+ _other.coeffRef(index) = tmp;
+ }
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ void copyPacket(Index rowId, Index colId, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(rowId >= 0 && rowId < rows()
+ && colId >= 0 && colId < cols());
+ Packet tmp = m_expression.template packet<StoreMode>(rowId, colId);
+ m_expression.template writePacket<StoreMode>(rowId, colId,
+ _other.template packet<LoadMode>(rowId, colId)
+ );
+ _other.template writePacket<LoadMode>(rowId, colId, tmp);
+ }
+
+ template<typename OtherDerived, int StoreMode, int LoadMode>
+ void copyPacket(Index index, const DenseBase<OtherDerived>& other)
+ {
+ OtherDerived& _other = other.const_cast_derived();
+ eigen_internal_assert(index >= 0 && index < m_expression.size());
+ Packet tmp = m_expression.template packet<StoreMode>(index);
+ m_expression.template writePacket<StoreMode>(index,
+ _other.template packet<LoadMode>(index)
+ );
+ _other.template writePacket<LoadMode>(index, tmp);
+ }
+
+ EIGEN_DEVICE_FUNC
+ ExpressionType& expression() const { return m_expression; }
+
+ protected:
+ ExpressionType& m_expression;
+};
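+
+/* A hedged usage note (illustrative, not upstream documentation): user code normally never
+ * names SwapWrapper directly; it is created internally when swap() is called on expressions,
+ * exchanging coefficients in place without allocating a temporary.
+ * \code
+ * Eigen::MatrixXf m = Eigen::MatrixXf::Random(4, 4);
+ * m.row(0).swap(m.row(2));  // coefficient-wise swap of the two rows, no temporary allocated
+ * \endcode
+ */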
+
+} // end namespace Eigen
+
+#endif // EIGEN_SWAP_H
diff --git a/third_party/eigen3/Eigen/src/Core/Transpose.h b/third_party/eigen3/Eigen/src/Core/Transpose.h
new file mode 100644
index 0000000000..aba3f66704
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Transpose.h
@@ -0,0 +1,428 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRANSPOSE_H
+#define EIGEN_TRANSPOSE_H
+
+namespace Eigen {
+
+/** \class Transpose
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the transpose of a matrix
+ *
+ * \param MatrixType the type of the object of which we are taking the transpose
+ *
+ * This class represents an expression of the transpose of a matrix.
+ * It is the return type of MatrixBase::transpose() and MatrixBase::adjoint()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::transpose(), MatrixBase::adjoint()
+ */
+
+namespace internal {
+template<typename MatrixType>
+struct traits<Transpose<MatrixType> > : traits<MatrixType>
+{
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type MatrixTypeNestedPlain;
+ typedef typename traits<MatrixType>::StorageKind StorageKind;
+ typedef typename traits<MatrixType>::XprKind XprKind;
+ enum {
+ RowsAtCompileTime = MatrixType::ColsAtCompileTime,
+ ColsAtCompileTime = MatrixType::RowsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ FlagsLvalueBit = is_lvalue<MatrixType>::value ? LvalueBit : 0,
+ Flags0 = MatrixTypeNestedPlain::Flags & ~(LvalueBit | NestByRefBit),
+ Flags1 = Flags0 | FlagsLvalueBit,
+ Flags = Flags1 ^ RowMajorBit,
+ CoeffReadCost = MatrixTypeNestedPlain::CoeffReadCost,
+ InnerStrideAtCompileTime = inner_stride_at_compile_time<MatrixType>::ret,
+ OuterStrideAtCompileTime = outer_stride_at_compile_time<MatrixType>::ret
+ };
+};
+}
+
+template<typename MatrixType, typename StorageKind> class TransposeImpl;
+
+template<typename MatrixType> class Transpose
+ : public TransposeImpl<MatrixType,typename internal::traits<MatrixType>::StorageKind>
+{
+ public:
+
+ typedef typename TransposeImpl<MatrixType,typename internal::traits<MatrixType>::StorageKind>::Base Base;
+ EIGEN_GENERIC_PUBLIC_INTERFACE(Transpose)
+
+ EIGEN_DEVICE_FUNC
+ inline Transpose(MatrixType& a_matrix) : m_matrix(a_matrix) {}
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Transpose)
+
+ EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.cols(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.rows(); }
+
+ /** \returns the nested expression */
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() const { return m_matrix; }
+
+ /** \returns the nested expression */
+ EIGEN_DEVICE_FUNC
+ typename internal::remove_all<typename MatrixType::Nested>::type&
+ nestedExpression() { return m_matrix.const_cast_derived(); }
+
+ protected:
+ typename MatrixType::Nested m_matrix;
+};
+
+namespace internal {
+
+template<typename MatrixType, bool HasDirectAccess = has_direct_access<MatrixType>::ret>
+struct TransposeImpl_base
+{
+ typedef typename dense_xpr_base<Transpose<MatrixType> >::type type;
+};
+
+template<typename MatrixType>
+struct TransposeImpl_base<MatrixType, false>
+{
+ typedef typename dense_xpr_base<Transpose<MatrixType> >::type type;
+};
+
+} // end namespace internal
+
+template<typename MatrixType> class TransposeImpl<MatrixType,Dense>
+ : public internal::TransposeImpl_base<MatrixType>::type
+{
+ public:
+
+ typedef typename internal::TransposeImpl_base<MatrixType>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Transpose<MatrixType>)
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(TransposeImpl)
+
+ EIGEN_DEVICE_FUNC inline Index innerStride() const { return derived().nestedExpression().innerStride(); }
+ EIGEN_DEVICE_FUNC inline Index outerStride() const { return derived().nestedExpression().outerStride(); }
+
+ typedef typename internal::conditional<
+ internal::is_lvalue<MatrixType>::value,
+ Scalar,
+ const Scalar
+ >::type ScalarWithConstIfNotLvalue;
+
+ inline ScalarWithConstIfNotLvalue* data() { return derived().nestedExpression().data(); }
+ inline const Scalar* data() const { return derived().nestedExpression().data(); }
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue& coeffRef(Index rowId, Index colId)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
+ return derived().nestedExpression().const_cast_derived().coeffRef(colId, rowId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline ScalarWithConstIfNotLvalue& coeffRef(Index index)
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
+ return derived().nestedExpression().const_cast_derived().coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index rowId, Index colId) const
+ {
+ return derived().nestedExpression().coeffRef(colId, rowId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline const Scalar& coeffRef(Index index) const
+ {
+ return derived().nestedExpression().coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index rowId, Index colId) const
+ {
+ return derived().nestedExpression().coeff(colId, rowId);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffReturnType coeff(Index index) const
+ {
+ return derived().nestedExpression().coeff(index);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index rowId, Index colId) const
+ {
+ return derived().nestedExpression().template packet<LoadMode>(colId, rowId);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index rowId, Index colId, const PacketScalar& x)
+ {
+ derived().nestedExpression().const_cast_derived().template writePacket<LoadMode>(colId, rowId, x);
+ }
+
+ template<int LoadMode>
+ inline const PacketScalar packet(Index index) const
+ {
+ return derived().nestedExpression().template packet<LoadMode>(index);
+ }
+
+ template<int LoadMode>
+ inline void writePacket(Index index, const PacketScalar& x)
+ {
+ derived().nestedExpression().const_cast_derived().template writePacket<LoadMode>(index, x);
+ }
+};
+
+/** \returns an expression of the transpose of *this.
+ *
+ * Example: \include MatrixBase_transpose.cpp
+ * Output: \verbinclude MatrixBase_transpose.out
+ *
+ * \warning If you want to replace a matrix by its own transpose, do \b NOT do this:
+ * \code
+ * m = m.transpose(); // bug!!! caused by aliasing effect
+ * \endcode
+ * Instead, use the transposeInPlace() method:
+ * \code
+ * m.transposeInPlace();
+ * \endcode
+ * which gives Eigen good opportunities for optimization, or alternatively you can also do:
+ * \code
+ * m = m.transpose().eval();
+ * \endcode
+ *
+ * \sa transposeInPlace(), adjoint() */
+template<typename Derived>
+inline Transpose<Derived>
+DenseBase<Derived>::transpose()
+{
+ return derived();
+}
+
+/** This is the const version of transpose().
+ *
+ * Make sure you read the warning for transpose() !
+ *
+ * \sa transposeInPlace(), adjoint() */
+template<typename Derived>
+inline typename DenseBase<Derived>::ConstTransposeReturnType
+DenseBase<Derived>::transpose() const
+{
+ return ConstTransposeReturnType(derived());
+}
+
+/** \returns an expression of the adjoint (i.e. conjugate transpose) of *this.
+ *
+ * Example: \include MatrixBase_adjoint.cpp
+ * Output: \verbinclude MatrixBase_adjoint.out
+ *
+ * \warning If you want to replace a matrix by its own adjoint, do \b NOT do this:
+ * \code
+ * m = m.adjoint(); // bug!!! caused by aliasing effect
+ * \endcode
+ * Instead, use the adjointInPlace() method:
+ * \code
+ * m.adjointInPlace();
+ * \endcode
+ * which gives Eigen good opportunities for optimization, or alternatively you can also do:
+ * \code
+ * m = m.adjoint().eval();
+ * \endcode
+ *
+ * \sa adjointInPlace(), transpose(), conjugate(), class Transpose, class internal::scalar_conjugate_op */
+template<typename Derived>
+inline const typename MatrixBase<Derived>::AdjointReturnType
+MatrixBase<Derived>::adjoint() const
+{
+ return this->transpose(); // in the complex case, the .conjugate() is implicit here
+ // due to the implicit conversion to the return type
+}
+
+/***************************************************************************
+* "in place" transpose implementation
+***************************************************************************/
+
+namespace internal {
+
+template<typename MatrixType,
+ bool IsSquare = (MatrixType::RowsAtCompileTime == MatrixType::ColsAtCompileTime) && MatrixType::RowsAtCompileTime!=Dynamic>
+struct inplace_transpose_selector;
+
+template<typename MatrixType>
+struct inplace_transpose_selector<MatrixType,true> { // square matrix
+ static void run(MatrixType& m) {
+ m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose());
+ }
+};
+
+template<typename MatrixType>
+struct inplace_transpose_selector<MatrixType,false> { // non square matrix
+ static void run(MatrixType& m) {
+ if (m.rows()==m.cols())
+ m.matrix().template triangularView<StrictlyUpper>().swap(m.matrix().transpose());
+ else
+ m = m.transpose().eval();
+ }
+};
+
+} // end namespace internal
+
+/** This is the "in place" version of transpose(): it replaces \c *this by its own transpose.
+ * Thus, doing
+ * \code
+ * m.transposeInPlace();
+ * \endcode
+ * has the same effect on m as doing
+ * \code
+ * m = m.transpose().eval();
+ * \endcode
+ * and is faster and also safer because in the latter line of code, forgetting the eval() results
+ * in a bug caused by \ref TopicAliasing "aliasing".
+ *
+ * Notice however that this method is only useful if you want to replace a matrix by its own transpose.
+ * If you just need the transpose of a matrix, use transpose().
+ *
+ * \note if the matrix is not square, then \c *this must be a resizable matrix.
+ * This excludes (non-square) fixed-size matrices, block-expressions and maps.
+ *
+ * \sa transpose(), adjoint(), adjointInPlace() */
+template<typename Derived>
+inline void DenseBase<Derived>::transposeInPlace()
+{
+ eigen_assert((rows() == cols() || (RowsAtCompileTime == Dynamic && ColsAtCompileTime == Dynamic))
+ && "transposeInPlace() called on a non-square non-resizable matrix");
+ internal::inplace_transpose_selector<Derived>::run(derived());
+}
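+
+/* A minimal usage sketch (illustrative, not upstream Eigen documentation): in-place
+ * transposition of a resizable, non-square matrix, assuming an Eigen::MatrixXf.
+ * \code
+ * Eigen::MatrixXf m(2, 3);
+ * m << 1, 2, 3,
+ *      4, 5, 6;
+ * m.transposeInPlace();  // m is now 3x2; the non-square case falls back to m = m.transpose().eval()
+ * \endcode
+ */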
+
+/***************************************************************************
+* "in place" adjoint implementation
+***************************************************************************/
+
+/** This is the "in place" version of adjoint(): it replaces \c *this by its own transpose.
+ * Thus, doing
+ * \code
+ * m.adjointInPlace();
+ * \endcode
+ * has the same effect on m as doing
+ * \code
+ * m = m.adjoint().eval();
+ * \endcode
+ * and is faster and also safer because in the latter line of code, forgetting the eval() results
+ * in a bug caused by aliasing.
+ *
+ * Notice however that this method is only useful if you want to replace a matrix by its own adjoint.
+ * If you just need the adjoint of a matrix, use adjoint().
+ *
+ * \note if the matrix is not square, then \c *this must be a resizable matrix.
+ * This excludes (non-square) fixed-size matrices, block-expressions and maps.
+ *
+ * \sa transpose(), adjoint(), transposeInPlace() */
+template<typename Derived>
+inline void MatrixBase<Derived>::adjointInPlace()
+{
+ derived() = adjoint().eval();
+}
+
+#ifndef EIGEN_NO_DEBUG
+
+// The following is to detect aliasing problems in the most common cases.
+
+namespace internal {
+
+template<typename BinOp,typename NestedXpr,typename Rhs>
+struct blas_traits<SelfCwiseBinaryOp<BinOp,NestedXpr,Rhs> >
+ : blas_traits<NestedXpr>
+{
+ typedef SelfCwiseBinaryOp<BinOp,NestedXpr,Rhs> XprType;
+ static inline const XprType extract(const XprType& x) { return x; }
+};
+
+template<bool DestIsTransposed, typename OtherDerived>
+struct check_transpose_aliasing_compile_time_selector
+{
+ enum { ret = bool(blas_traits<OtherDerived>::IsTransposed) != DestIsTransposed };
+};
+
+template<bool DestIsTransposed, typename BinOp, typename DerivedA, typename DerivedB>
+struct check_transpose_aliasing_compile_time_selector<DestIsTransposed,CwiseBinaryOp<BinOp,DerivedA,DerivedB> >
+{
+ enum { ret = bool(blas_traits<DerivedA>::IsTransposed) != DestIsTransposed
+ || bool(blas_traits<DerivedB>::IsTransposed) != DestIsTransposed
+ };
+};
+
+template<typename Scalar, bool DestIsTransposed, typename OtherDerived>
+struct check_transpose_aliasing_run_time_selector
+{
+ static bool run(const Scalar* dest, const OtherDerived& src)
+ {
+ return (bool(blas_traits<OtherDerived>::IsTransposed) != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src));
+ }
+};
+
+template<typename Scalar, bool DestIsTransposed, typename BinOp, typename DerivedA, typename DerivedB>
+struct check_transpose_aliasing_run_time_selector<Scalar,DestIsTransposed,CwiseBinaryOp<BinOp,DerivedA,DerivedB> >
+{
+ static bool run(const Scalar* dest, const CwiseBinaryOp<BinOp,DerivedA,DerivedB>& src)
+ {
+ return ((blas_traits<DerivedA>::IsTransposed != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src.lhs())))
+ || ((blas_traits<DerivedB>::IsTransposed != DestIsTransposed) && (dest!=0 && dest==(const Scalar*)extract_data(src.rhs())));
+ }
+};
+
+// The following selector, checkTransposeAliasing_impl, dispatches on MightHaveTransposeAliasing
+// because ICC emits a warning when the condition controlling the assert is known at compile time.
+// This is actually a good warning: in expressions without any transposition, the condition is
+// known at compile time to be false, and using that fact we avoid generating the code of the
+// assert again and again for all the expressions that do not need it.
+
+template<typename Derived, typename OtherDerived,
+ bool MightHaveTransposeAliasing
+ = check_transpose_aliasing_compile_time_selector
+ <blas_traits<Derived>::IsTransposed,OtherDerived>::ret
+ >
+struct checkTransposeAliasing_impl
+{
+ static void run(const Derived& dst, const OtherDerived& other)
+ {
+ eigen_assert((!check_transpose_aliasing_run_time_selector
+ <typename Derived::Scalar,blas_traits<Derived>::IsTransposed,OtherDerived>
+ ::run(extract_data(dst), other))
+ && "aliasing detected during transposition, use transposeInPlace() "
+ "or evaluate the rhs into a temporary using .eval()");
+
+ }
+};
+
+template<typename Derived, typename OtherDerived>
+struct checkTransposeAliasing_impl<Derived, OtherDerived, false>
+{
+ static void run(const Derived&, const OtherDerived&)
+ {
+ }
+};
+
+} // end namespace internal
+
+template<typename Derived>
+template<typename OtherDerived>
+void DenseBase<Derived>::checkTransposeAliasing(const OtherDerived& other) const
+{
+ internal::checkTransposeAliasing_impl<Derived, OtherDerived>::run(derived(), other);
+}
+#endif
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRANSPOSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/Transpositions.h b/third_party/eigen3/Eigen/src/Core/Transpositions.h
new file mode 100644
index 0000000000..ac3aef5af5
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Transpositions.h
@@ -0,0 +1,436 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRANSPOSITIONS_H
+#define EIGEN_TRANSPOSITIONS_H
+
+namespace Eigen {
+
+/** \class Transpositions
+ * \ingroup Core_Module
+ *
+ * \brief Represents a sequence of transpositions (row/column interchange)
+ *
+ * \param SizeAtCompileTime the number of transpositions, or Dynamic
+ * \param MaxSizeAtCompileTime the maximum number of transpositions, or Dynamic. This optional parameter defaults to SizeAtCompileTime. Most of the time, you should not have to specify it.
+ *
+ * This class represents a permutation transformation as a sequence of \em n transpositions
+ * \f$[T_{n-1} \ldots T_{i} \ldots T_{0}]\f$. It is internally stored as a vector of integers \c indices.
+ * Each transposition \f$ T_{i} \f$ applied on the left of a matrix (\f$ T_{i} M\f$) interchanges
+ * the rows \c i and \c indices[i] of the matrix \c M.
+ * A transposition applied on the right (e.g., \f$ M T_{i}\f$) yields a column interchange.
+ *
+ * Compared to the class PermutationMatrix, such a sequence of transpositions is what is
+ * naturally computed during a decomposition with pivoting, and applying it in-place is faster.
+ *
+ * To apply a sequence of transpositions to a matrix, simply use the operator * as in the following example:
+ * \code
+ * Transpositions tr;
+ * MatrixXf mat;
+ * mat = tr * mat;
+ * \endcode
+ * In this example, we detect that the matrix appears on both sides, and so the transpositions
+ * are applied in-place without any temporary or extra copy.
+ *
+ * \sa class PermutationMatrix
+ */
+
+namespace internal {
+template<typename TranspositionType, typename MatrixType, int Side, bool Transposed=false> struct transposition_matrix_product_retval;
+}
+
+template<typename Derived>
+class TranspositionsBase
+{
+ typedef internal::traits<Derived> Traits;
+
+ public:
+
+ typedef typename Traits::IndicesType IndicesType;
+ typedef typename IndicesType::Scalar Index;
+
+ Derived& derived() { return *static_cast<Derived*>(this); }
+ const Derived& derived() const { return *static_cast<const Derived*>(this); }
+
+ /** Copies the \a other transpositions into \c *this */
+ template<typename OtherDerived>
+ Derived& operator=(const TranspositionsBase<OtherDerived>& other)
+ {
+ indices() = other.indices();
+ return derived();
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ Derived& operator=(const TranspositionsBase& other)
+ {
+ indices() = other.indices();
+ return derived();
+ }
+ #endif
+
+ /** \returns the number of transpositions */
+ inline Index size() const { return indices().size(); }
+
+ /** Direct access to the underlying index vector */
+ inline const Index& coeff(Index i) const { return indices().coeff(i); }
+ /** Direct access to the underlying index vector */
+ inline Index& coeffRef(Index i) { return indices().coeffRef(i); }
+ /** Direct access to the underlying index vector */
+ inline const Index& operator()(Index i) const { return indices()(i); }
+ /** Direct access to the underlying index vector */
+ inline Index& operator()(Index i) { return indices()(i); }
+ /** Direct access to the underlying index vector */
+ inline const Index& operator[](Index i) const { return indices()(i); }
+ /** Direct access to the underlying index vector */
+ inline Index& operator[](Index i) { return indices()(i); }
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return derived().indices(); }
+ /** \returns a reference to the stored array representing the transpositions. */
+ IndicesType& indices() { return derived().indices(); }
+
+ /** Resizes to given size. */
+ inline void resize(Index newSize)
+ {
+ indices().resize(newSize);
+ }
+
+ /** Sets \c *this to represent the identity transformation */
+ void setIdentity()
+ {
+ for(int i = 0; i < indices().size(); ++i)
+ coeffRef(i) = i;
+ }
+
+ // FIXME: do we want such methods ?
+ // might be useful when the target matrix expression is complex, e.g.:
+ // object.matrix().block(..,..,..,..) = trans * object.matrix().block(..,..,..,..);
+ /*
+ template<typename MatrixType>
+ void applyForwardToRows(MatrixType& mat) const
+ {
+ for(Index k=0 ; k<size() ; ++k)
+ if(m_indices(k)!=k)
+ mat.row(k).swap(mat.row(m_indices(k)));
+ }
+
+ template<typename MatrixType>
+ void applyBackwardToRows(MatrixType& mat) const
+ {
+ for(Index k=size()-1 ; k>=0 ; --k)
+ if(m_indices(k)!=k)
+ mat.row(k).swap(mat.row(m_indices(k)));
+ }
+ */
+
+ /** \returns the inverse transformation */
+ inline Transpose<TranspositionsBase> inverse() const
+ { return Transpose<TranspositionsBase>(derived()); }
+
+ /** \returns the transpose transformation */
+ inline Transpose<TranspositionsBase> transpose() const
+ { return Transpose<TranspositionsBase>(derived()); }
+
+ protected:
+};
+
+namespace internal {
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType>
+struct traits<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,IndexType> >
+{
+ typedef IndexType Index;
+ typedef Matrix<Index, SizeAtCompileTime, 1, 0, MaxSizeAtCompileTime, 1> IndicesType;
+};
+}
+
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType>
+class Transpositions : public TranspositionsBase<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,IndexType> >
+{
+ typedef internal::traits<Transpositions> Traits;
+ public:
+
+ typedef TranspositionsBase<Transpositions> Base;
+ typedef typename Traits::IndicesType IndicesType;
+ typedef typename IndicesType::Scalar Index;
+
+ inline Transpositions() {}
+
+ /** Copy constructor. */
+ template<typename OtherDerived>
+ inline Transpositions(const TranspositionsBase<OtherDerived>& other)
+ : m_indices(other.indices()) {}
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** Standard copy constructor. Defined only to prevent a default copy constructor
+ * from hiding the other templated constructor */
+ inline Transpositions(const Transpositions& other) : m_indices(other.indices()) {}
+ #endif
+
+ /** Generic constructor from expression of the transposition indices. */
+ template<typename Other>
+ explicit inline Transpositions(const MatrixBase<Other>& a_indices) : m_indices(a_indices)
+ {}
+
+ /** Copies the \a other transpositions into \c *this */
+ template<typename OtherDerived>
+ Transpositions& operator=(const TranspositionsBase<OtherDerived>& other)
+ {
+ return Base::operator=(other);
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ Transpositions& operator=(const Transpositions& other)
+ {
+ m_indices = other.m_indices;
+ return *this;
+ }
+ #endif
+
+ /** Constructs an uninitialized transposition sequence of the given size.
+ */
+ inline Transpositions(Index size) : m_indices(size)
+ {}
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return m_indices; }
+ /** \returns a reference to the stored array representing the transpositions. */
+ IndicesType& indices() { return m_indices; }
+
+ protected:
+
+ IndicesType m_indices;
+};
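+
+/* A minimal usage sketch (illustrative, not upstream Eigen documentation): building a
+ * transposition sequence from explicit indices and applying it to the rows of a matrix;
+ * the sizes and index values below are assumptions made for the example.
+ * \code
+ * Eigen::Transpositions<Eigen::Dynamic> tr(3);
+ * tr[0] = 1;  // T_0 swaps rows 0 and 1
+ * tr[1] = 1;  // T_1 is a no-op (swaps row 1 with itself)
+ * tr[2] = 2;  // T_2 is a no-op
+ * Eigen::MatrixXf mat = Eigen::MatrixXf::Random(3, 3);
+ * mat = tr * mat;  // applies T_2 T_1 T_0 on the left: here this swaps rows 0 and 1 in place
+ * \endcode
+ */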
+
+
+namespace internal {
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType, int _PacketAccess>
+struct traits<Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,IndexType>,_PacketAccess> >
+{
+ typedef IndexType Index;
+ typedef Map<const Matrix<Index,SizeAtCompileTime,1,0,MaxSizeAtCompileTime,1>, _PacketAccess> IndicesType;
+};
+}
+
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime, typename IndexType, int PacketAccess>
+class Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,IndexType>,PacketAccess>
+ : public TranspositionsBase<Map<Transpositions<SizeAtCompileTime,MaxSizeAtCompileTime,IndexType>,PacketAccess> >
+{
+ typedef internal::traits<Map> Traits;
+ public:
+
+ typedef TranspositionsBase<Map> Base;
+ typedef typename Traits::IndicesType IndicesType;
+ typedef typename IndicesType::Scalar Index;
+
+ inline Map(const Index* indicesPtr)
+ : m_indices(indicesPtr)
+ {}
+
+ inline Map(const Index* indicesPtr, Index size)
+ : m_indices(indicesPtr,size)
+ {}
+
+ /** Copies the \a other transpositions into \c *this */
+ template<typename OtherDerived>
+ Map& operator=(const TranspositionsBase<OtherDerived>& other)
+ {
+ return Base::operator=(other);
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ Map& operator=(const Map& other)
+ {
+ m_indices = other.m_indices;
+ return *this;
+ }
+ #endif
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return m_indices; }
+
+ /** \returns a reference to the stored array representing the transpositions. */
+ IndicesType& indices() { return m_indices; }
+
+ protected:
+
+ IndicesType m_indices;
+};
+
+namespace internal {
+template<typename _IndicesType>
+struct traits<TranspositionsWrapper<_IndicesType> >
+{
+ typedef typename _IndicesType::Scalar Index;
+ typedef _IndicesType IndicesType;
+};
+}
+
+template<typename _IndicesType>
+class TranspositionsWrapper
+ : public TranspositionsBase<TranspositionsWrapper<_IndicesType> >
+{
+ typedef internal::traits<TranspositionsWrapper> Traits;
+ public:
+
+ typedef TranspositionsBase<TranspositionsWrapper> Base;
+ typedef typename Traits::IndicesType IndicesType;
+ typedef typename IndicesType::Scalar Index;
+
+ inline TranspositionsWrapper(IndicesType& a_indices)
+ : m_indices(a_indices)
+ {}
+
+ /** Copies the \a other transpositions into \c *this */
+ template<typename OtherDerived>
+ TranspositionsWrapper& operator=(const TranspositionsBase<OtherDerived>& other)
+ {
+ return Base::operator=(other);
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is a special case of the templated operator=. Its purpose is to
+ * prevent a default operator= from hiding the templated operator=.
+ */
+ TranspositionsWrapper& operator=(const TranspositionsWrapper& other)
+ {
+ m_indices = other.m_indices;
+ return *this;
+ }
+ #endif
+
+ /** const version of indices(). */
+ const IndicesType& indices() const { return m_indices; }
+
+ /** \returns a reference to the stored array representing the transpositions. */
+ IndicesType& indices() { return m_indices; }
+
+ protected:
+
+ const typename IndicesType::Nested m_indices;
+};
+
+/** \returns the \a matrix with the \a transpositions applied to the columns.
+ */
+template<typename Derived, typename TranspositionsDerived>
+inline const internal::transposition_matrix_product_retval<TranspositionsDerived, Derived, OnTheRight>
+operator*(const MatrixBase<Derived>& matrix,
+ const TranspositionsBase<TranspositionsDerived> &transpositions)
+{
+ return internal::transposition_matrix_product_retval
+ <TranspositionsDerived, Derived, OnTheRight>
+ (transpositions.derived(), matrix.derived());
+}
+
+/** \returns the \a matrix with the \a transpositions applied to the rows.
+ */
+template<typename Derived, typename TranspositionDerived>
+inline const internal::transposition_matrix_product_retval
+ <TranspositionDerived, Derived, OnTheLeft>
+operator*(const TranspositionsBase<TranspositionDerived> &transpositions,
+ const MatrixBase<Derived>& matrix)
+{
+ return internal::transposition_matrix_product_retval
+ <TranspositionDerived, Derived, OnTheLeft>
+ (transpositions.derived(), matrix.derived());
+}
+
+namespace internal {
+
+template<typename TranspositionType, typename MatrixType, int Side, bool Transposed>
+struct traits<transposition_matrix_product_retval<TranspositionType, MatrixType, Side, Transposed> >
+{
+ typedef typename MatrixType::PlainObject ReturnType;
+};
+
+template<typename TranspositionType, typename MatrixType, int Side, bool Transposed>
+struct transposition_matrix_product_retval
+ : public ReturnByValue<transposition_matrix_product_retval<TranspositionType, MatrixType, Side, Transposed> >
+{
+ typedef typename remove_all<typename MatrixType::Nested>::type MatrixTypeNestedCleaned;
+ typedef typename TranspositionType::Index Index;
+
+ transposition_matrix_product_retval(const TranspositionType& tr, const MatrixType& matrix)
+ : m_transpositions(tr), m_matrix(matrix)
+ {}
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ const Index size = m_transpositions.size();
+ Index j = 0;
+
+ if(!(is_same<MatrixTypeNestedCleaned,Dest>::value && extract_data(dst) == extract_data(m_matrix)))
+ dst = m_matrix;
+
+ for(Index k=(Transposed?size-1:0) ; Transposed?k>=0:k<size ; Transposed?--k:++k)
+ if((j=m_transpositions.coeff(k))!=k)
+ {
+ if(Side==OnTheLeft)
+ dst.row(k).swap(dst.row(j));
+ else if(Side==OnTheRight)
+ dst.col(k).swap(dst.col(j));
+ }
+ }
+
+ protected:
+ const TranspositionType& m_transpositions;
+ typename MatrixType::Nested m_matrix;
+};
+
+} // end namespace internal
+
+/* Template partial specialization for transposed/inverse transpositions */
+
+template<typename TranspositionsDerived>
+class Transpose<TranspositionsBase<TranspositionsDerived> >
+{
+ typedef TranspositionsDerived TranspositionType;
+ typedef typename TranspositionType::IndicesType IndicesType;
+ public:
+
+ Transpose(const TranspositionType& t) : m_transpositions(t) {}
+
+ inline int size() const { return m_transpositions.size(); }
+
+ /** \returns the \a matrix with the inverse transpositions applied to the columns.
+ */
+ template<typename Derived> friend
+ inline const internal::transposition_matrix_product_retval<TranspositionType, Derived, OnTheRight, true>
+ operator*(const MatrixBase<Derived>& matrix, const Transpose& trt)
+ {
+ return internal::transposition_matrix_product_retval<TranspositionType, Derived, OnTheRight, true>(trt.m_transpositions, matrix.derived());
+ }
+
+ /** \returns the \a matrix with the inverse transpositions applied to the rows.
+ */
+ template<typename Derived>
+ inline const internal::transposition_matrix_product_retval<TranspositionType, Derived, OnTheLeft, true>
+ operator*(const MatrixBase<Derived>& matrix) const
+ {
+ return internal::transposition_matrix_product_retval<TranspositionType, Derived, OnTheLeft, true>(m_transpositions, matrix.derived());
+ }
+
+ protected:
+ const TranspositionType& m_transpositions;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRANSPOSITIONS_H
diff --git a/third_party/eigen3/Eigen/src/Core/TriangularMatrix.h b/third_party/eigen3/Eigen/src/Core/TriangularMatrix.h
new file mode 100644
index 0000000000..1d6e346506
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/TriangularMatrix.h
@@ -0,0 +1,900 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULARMATRIX_H
+#define EIGEN_TRIANGULARMATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<int Side, typename TriangularType, typename Rhs> struct triangular_solve_retval;
+
+}
+
+/** \internal
+ *
+ * \class TriangularBase
+ * \ingroup Core_Module
+ *
+ * \brief Base class for the triangular part of a matrix
+ */
+template<typename Derived> class TriangularBase : public EigenBase<Derived>
+{
+ public:
+
+ enum {
+ Mode = internal::traits<Derived>::Mode,
+ CoeffReadCost = internal::traits<Derived>::CoeffReadCost,
+ RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
+ ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
+ MaxRowsAtCompileTime = internal::traits<Derived>::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = internal::traits<Derived>::MaxColsAtCompileTime
+ };
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::traits<Derived>::DenseMatrixType DenseMatrixType;
+ typedef DenseMatrixType DenseType;
+
+ EIGEN_DEVICE_FUNC
+ inline TriangularBase() { eigen_assert(!((Mode&UnitDiag) && (Mode&ZeroDiag))); }
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return derived().rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return derived().cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return derived().outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return derived().innerStride(); }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar coeff(Index row, Index col) const { return derived().coeff(row,col); }
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index row, Index col) { return derived().coeffRef(row,col); }
+
+ /** \see MatrixBase::copyCoeff(row,col)
+ */
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void copyCoeff(Index row, Index col, Other& other)
+ {
+ derived().coeffRef(row, col) = other.coeff(row, col);
+ }
+
+ EIGEN_DEVICE_FUNC
+ inline Scalar operator()(Index row, Index col) const
+ {
+ check_coordinates(row, col);
+ return coeff(row,col);
+ }
+ EIGEN_DEVICE_FUNC
+ inline Scalar& operator()(Index row, Index col)
+ {
+ check_coordinates(row, col);
+ return coeffRef(row,col);
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ EIGEN_DEVICE_FUNC
+ inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
+ EIGEN_DEVICE_FUNC
+ inline Derived& derived() { return *static_cast<Derived*>(this); }
+ #endif // not EIGEN_PARSED_BY_DOXYGEN
+
+ template<typename DenseDerived>
+ EIGEN_DEVICE_FUNC
+ void evalTo(MatrixBase<DenseDerived> &other) const;
+ template<typename DenseDerived>
+ EIGEN_DEVICE_FUNC
+ void evalToLazy(MatrixBase<DenseDerived> &other) const;
+
+ EIGEN_DEVICE_FUNC
+ DenseMatrixType toDenseMatrix() const
+ {
+ DenseMatrixType res(rows(), cols());
+ evalToLazy(res);
+ return res;
+ }
+
+ protected:
+
+ void check_coordinates(Index row, Index col) const
+ {
+ EIGEN_ONLY_USED_FOR_DEBUG(row);
+ EIGEN_ONLY_USED_FOR_DEBUG(col);
+ eigen_assert(col>=0 && col<cols() && row>=0 && row<rows());
+ const int mode = int(Mode) & ~SelfAdjoint;
+ EIGEN_ONLY_USED_FOR_DEBUG(mode);
+ eigen_assert((mode==Upper && col>=row)
+ || (mode==Lower && col<=row)
+ || ((mode==StrictlyUpper || mode==UnitUpper) && col>row)
+ || ((mode==StrictlyLower || mode==UnitLower) && col<row));
+ }
+
+ #ifdef EIGEN_INTERNAL_DEBUGGING
+ void check_coordinates_internal(Index row, Index col) const
+ {
+ check_coordinates(row, col);
+ }
+ #else
+ void check_coordinates_internal(Index , Index ) const {}
+ #endif
+
+};
+
+/** \class TriangularView
+ * \ingroup Core_Module
+ *
+ * \brief Expression of the triangular part of a matrix
+ *
+ * \param MatrixType the type of the object in which we are taking the triangular part
+ * \param Mode the kind of triangular matrix expression to construct. Can be #Upper,
+ * #Lower, #UnitUpper, #UnitLower, #StrictlyUpper, or #StrictlyLower.
+ * This is in fact a bit field; it must have either #Upper or #Lower,
+ * and additionally it may have #UnitDiag or #ZeroDiag or neither.
+ *
+ * This class represents a triangular part of a matrix, not necessarily square. Strictly speaking, for rectangular
+ * matrices one should speak of "trapezoid" parts. This class is the return type
+ * of MatrixBase::triangularView() and most of the time this is the only way it is used.
+ *
+ * \sa MatrixBase::triangularView()
+ */
+namespace internal {
+template<typename MatrixType, unsigned int _Mode>
+struct traits<TriangularView<MatrixType, _Mode> > : traits<MatrixType>
+{
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type MatrixTypeNestedNonRef;
+ typedef typename remove_all<MatrixTypeNested>::type MatrixTypeNestedCleaned;
+ typedef MatrixType ExpressionType;
+ typedef typename MatrixType::PlainObject DenseMatrixType;
+ enum {
+ Mode = _Mode,
+ Flags = (MatrixTypeNestedCleaned::Flags & (HereditaryBits) & (~(PacketAccessBit | DirectAccessBit | LinearAccessBit))) | Mode,
+ CoeffReadCost = MatrixTypeNestedCleaned::CoeffReadCost
+ };
+};
+}
+
+template<int Mode, bool LhsIsTriangular,
+ typename Lhs, bool LhsIsVector,
+ typename Rhs, bool RhsIsVector>
+struct TriangularProduct;
+
+template<typename _MatrixType, unsigned int _Mode> class TriangularView
+ : public TriangularBase<TriangularView<_MatrixType, _Mode> >
+{
+ public:
+
+ typedef TriangularBase<TriangularView> Base;
+ typedef typename internal::traits<TriangularView>::Scalar Scalar;
+
+ typedef _MatrixType MatrixType;
+ typedef typename internal::traits<TriangularView>::DenseMatrixType DenseMatrixType;
+ typedef DenseMatrixType PlainObject;
+
+ protected:
+ typedef typename internal::traits<TriangularView>::MatrixTypeNested MatrixTypeNested;
+ typedef typename internal::traits<TriangularView>::MatrixTypeNestedNonRef MatrixTypeNestedNonRef;
+ typedef typename internal::traits<TriangularView>::MatrixTypeNestedCleaned MatrixTypeNestedCleaned;
+
+ typedef typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type MatrixConjugateReturnType;
+
+ public:
+ using Base::evalToLazy;
+
+
+ typedef typename internal::traits<TriangularView>::StorageKind StorageKind;
+ typedef typename internal::traits<TriangularView>::Index Index;
+
+ enum {
+ Mode = _Mode,
+ TransposeMode = (Mode & Upper ? Lower : 0)
+ | (Mode & Lower ? Upper : 0)
+ | (Mode & (UnitDiag))
+ | (Mode & (ZeroDiag))
+ };
+
+ EIGEN_DEVICE_FUNC
+ inline TriangularView(const MatrixType& matrix) : m_matrix(matrix)
+ {}
+
+ EIGEN_DEVICE_FUNC
+ inline Index rows() const { return m_matrix.rows(); }
+ EIGEN_DEVICE_FUNC
+ inline Index cols() const { return m_matrix.cols(); }
+ EIGEN_DEVICE_FUNC
+ inline Index outerStride() const { return m_matrix.outerStride(); }
+ EIGEN_DEVICE_FUNC
+ inline Index innerStride() const { return m_matrix.innerStride(); }
+
+ /** \sa MatrixBase::operator+=() */
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator+=(const DenseBase<Other>& other) { return *this = m_matrix + other.derived(); }
+ /** \sa MatrixBase::operator-=() */
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator-=(const DenseBase<Other>& other) { return *this = m_matrix - other.derived(); }
+ /** \sa MatrixBase::operator*=() */
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator*=(const typename internal::traits<MatrixType>::Scalar& other) { return *this = m_matrix * other; }
+ /** \sa MatrixBase::operator/=() */
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator/=(const typename internal::traits<MatrixType>::Scalar& other) { return *this = m_matrix / other; }
+
+ /** \sa MatrixBase::fill() */
+ EIGEN_DEVICE_FUNC
+ void fill(const Scalar& value) { setConstant(value); }
+ /** \sa MatrixBase::setConstant() */
+ EIGEN_DEVICE_FUNC
+ TriangularView& setConstant(const Scalar& value)
+ { return *this = MatrixType::Constant(rows(), cols(), value); }
+ /** \sa MatrixBase::setZero() */
+ EIGEN_DEVICE_FUNC
+ TriangularView& setZero() { return setConstant(Scalar(0)); }
+ /** \sa MatrixBase::setOnes() */
+ EIGEN_DEVICE_FUNC
+ TriangularView& setOnes() { return setConstant(Scalar(1)); }
+
+ /** \sa MatrixBase::coeff()
+ * \warning the coordinates must fit into the referenced triangular part
+ */
+ EIGEN_DEVICE_FUNC
+ inline Scalar coeff(Index row, Index col) const
+ {
+ Base::check_coordinates_internal(row, col);
+ return m_matrix.coeff(row, col);
+ }
+
+ /** \sa MatrixBase::coeffRef()
+ * \warning the coordinates must fit into the referenced triangular part
+ */
+ EIGEN_DEVICE_FUNC
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ Base::check_coordinates_internal(row, col);
+ return m_matrix.const_cast_derived().coeffRef(row, col);
+ }
+
+ EIGEN_DEVICE_FUNC
+ const MatrixTypeNestedCleaned& nestedExpression() const { return m_matrix; }
+ EIGEN_DEVICE_FUNC
+ MatrixTypeNestedCleaned& nestedExpression() { return *const_cast<MatrixTypeNestedCleaned*>(&m_matrix); }
+
+ /** Assigns a triangular matrix to a triangular part of a dense matrix */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator=(const TriangularBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator=(const MatrixBase<OtherDerived>& other);
+
+ EIGEN_DEVICE_FUNC
+ TriangularView& operator=(const TriangularView& other)
+ { return *this = other.nestedExpression(); }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void lazyAssign(const TriangularBase<OtherDerived>& other);
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void lazyAssign(const MatrixBase<OtherDerived>& other);
+
+ /** \sa MatrixBase::conjugate() */
+ EIGEN_DEVICE_FUNC
+ inline TriangularView<MatrixConjugateReturnType,Mode> conjugate()
+ { return m_matrix.conjugate(); }
+ /** \sa MatrixBase::conjugate() const */
+ EIGEN_DEVICE_FUNC
+ inline const TriangularView<MatrixConjugateReturnType,Mode> conjugate() const
+ { return m_matrix.conjugate(); }
+
+ /** \sa MatrixBase::adjoint() const */
+ EIGEN_DEVICE_FUNC
+ inline const TriangularView<const typename MatrixType::AdjointReturnType,TransposeMode> adjoint() const
+ { return m_matrix.adjoint(); }
+
+ /** \sa MatrixBase::transpose() */
+ EIGEN_DEVICE_FUNC
+ inline TriangularView<Transpose<MatrixType>,TransposeMode> transpose()
+ {
+ EIGEN_STATIC_ASSERT_LVALUE(MatrixType)
+ return m_matrix.const_cast_derived().transpose();
+ }
+ /** \sa MatrixBase::transpose() const */
+ EIGEN_DEVICE_FUNC
+ inline const TriangularView<Transpose<MatrixType>,TransposeMode> transpose() const
+ {
+ return m_matrix.transpose();
+ }
+
+ /** Efficient triangular matrix times vector/matrix product */
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ TriangularProduct<Mode,true,MatrixType,false,OtherDerived, OtherDerived::IsVectorAtCompileTime>
+ operator*(const MatrixBase<OtherDerived>& rhs) const
+ {
+ return TriangularProduct
+ <Mode,true,MatrixType,false,OtherDerived,OtherDerived::IsVectorAtCompileTime>
+ (m_matrix, rhs.derived());
+ }
+
+ /** Efficient vector/matrix times triangular matrix product */
+ template<typename OtherDerived> friend
+ EIGEN_DEVICE_FUNC
+ TriangularProduct<Mode,false,OtherDerived,OtherDerived::IsVectorAtCompileTime,MatrixType,false>
+ operator*(const MatrixBase<OtherDerived>& lhs, const TriangularView& rhs)
+ {
+ return TriangularProduct
+ <Mode,false,OtherDerived,OtherDerived::IsVectorAtCompileTime,MatrixType,false>
+ (lhs.derived(),rhs.m_matrix);
+ }
+
+ #ifdef EIGEN2_SUPPORT
+ template<typename OtherDerived>
+ struct eigen2_product_return_type
+ {
+ typedef typename TriangularView<MatrixType,Mode>::DenseMatrixType DenseMatrixType;
+ typedef typename OtherDerived::PlainObject::DenseType OtherPlainObject;
+ typedef typename ProductReturnType<DenseMatrixType, OtherPlainObject>::Type ProdRetType;
+ typedef typename ProdRetType::PlainObject type;
+ };
+ template<typename OtherDerived>
+ const typename eigen2_product_return_type<OtherDerived>::type
+ operator*(const EigenBase<OtherDerived>& rhs) const
+ {
+ typename OtherDerived::PlainObject::DenseType rhsPlainObject;
+ rhs.evalTo(rhsPlainObject);
+ return this->toDenseMatrix() * rhsPlainObject;
+ }
+ template<typename OtherMatrixType>
+ bool isApprox(const TriangularView<OtherMatrixType, Mode>& other, typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision()) const
+ {
+ return this->toDenseMatrix().isApprox(other.toDenseMatrix(), precision);
+ }
+ template<typename OtherDerived>
+ bool isApprox(const MatrixBase<OtherDerived>& other, typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision()) const
+ {
+ return this->toDenseMatrix().isApprox(other, precision);
+ }
+ #endif // EIGEN2_SUPPORT
+
+ template<int Side, typename Other>
+ EIGEN_DEVICE_FUNC
+ inline const internal::triangular_solve_retval<Side,TriangularView, Other>
+ solve(const MatrixBase<Other>& other) const;
+
+ template<int Side, typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void solveInPlace(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ inline const internal::triangular_solve_retval<OnTheLeft,TriangularView, Other>
+ solve(const MatrixBase<Other>& other) const
+ { return solve<OnTheLeft>(other); }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void solveInPlace(const MatrixBase<OtherDerived>& other) const
+ { return solveInPlace<OnTheLeft>(other); }
+
+ EIGEN_DEVICE_FUNC
+ const SelfAdjointView<MatrixTypeNestedNonRef,Mode> selfadjointView() const
+ {
+ EIGEN_STATIC_ASSERT((Mode&UnitDiag)==0,PROGRAMMING_ERROR);
+ return SelfAdjointView<MatrixTypeNestedNonRef,Mode>(m_matrix);
+ }
+ EIGEN_DEVICE_FUNC
+ SelfAdjointView<MatrixTypeNestedNonRef,Mode> selfadjointView()
+ {
+ EIGEN_STATIC_ASSERT((Mode&UnitDiag)==0,PROGRAMMING_ERROR);
+ return SelfAdjointView<MatrixTypeNestedNonRef,Mode>(m_matrix);
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void swap(TriangularBase<OtherDerived> const & other)
+ {
+ TriangularView<SwapWrapper<MatrixType>,Mode>(const_cast<MatrixType&>(m_matrix)).lazyAssign(other.derived());
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ void swap(MatrixBase<OtherDerived> const & other)
+ {
+ SwapWrapper<MatrixType> swaper(const_cast<MatrixType&>(m_matrix));
+ TriangularView<SwapWrapper<MatrixType>,Mode>(swaper).lazyAssign(other.derived());
+ }
+
+ EIGEN_DEVICE_FUNC
+ Scalar determinant() const
+ {
+ if (Mode & UnitDiag)
+ return 1;
+ else if (Mode & ZeroDiag)
+ return 0;
+ else
+ return m_matrix.diagonal().prod();
+ }
+
+ // TODO simplify the following:
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator=(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+ {
+ setZero();
+ return assignProduct(other,1);
+ }
+
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator+=(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+ {
+ return assignProduct(other,1);
+ }
+
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator-=(const ProductBase<ProductDerived, Lhs,Rhs>& other)
+ {
+ return assignProduct(other,-1);
+ }
+
+
+ template<typename ProductDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator=(const ScaledProduct<ProductDerived>& other)
+ {
+ setZero();
+ return assignProduct(other,other.alpha());
+ }
+
+ template<typename ProductDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator+=(const ScaledProduct<ProductDerived>& other)
+ {
+ return assignProduct(other,other.alpha());
+ }
+
+ template<typename ProductDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& operator-=(const ScaledProduct<ProductDerived>& other)
+ {
+ return assignProduct(other,-other.alpha());
+ }
+
+ protected:
+
+ template<typename ProductDerived, typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TriangularView& assignProduct(const ProductBase<ProductDerived, Lhs,Rhs>& prod, const Scalar& alpha);
+
+ MatrixTypeNested m_matrix;
+};
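+
+/* A minimal usage sketch (illustrative, not upstream Eigen documentation): reading, writing and
+ * solving through a triangular view of a dense matrix; the matrix sizes are assumptions made
+ * for the example.
+ * \code
+ * Eigen::Matrix3d m = Eigen::Matrix3d::Random();
+ * Eigen::Matrix3d l = m.triangularView<Eigen::Lower>();  // dense copy: lower part kept, rest zeroed
+ * m.triangularView<Eigen::StrictlyUpper>().setZero();    // modifies only the strict upper part of m
+ * Eigen::Vector3d b = Eigen::Vector3d::Ones();
+ * Eigen::Vector3d x = m.triangularView<Eigen::Lower>().solve(b);  // forward substitution
+ * \endcode
+ */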
+
+/***************************************************************************
+* Implementation of triangular evaluation/assignment
+***************************************************************************/
+
+namespace internal {
+
+template<typename Derived1, typename Derived2, unsigned int Mode, int UnrollCount, bool ClearOpposite>
+struct triangular_assignment_selector
+{
+ enum {
+ col = (UnrollCount-1) / Derived1::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived1::RowsAtCompileTime
+ };
+
+ typedef typename Derived1::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ triangular_assignment_selector<Derived1, Derived2, Mode, UnrollCount-1, ClearOpposite>::run(dst, src);
+
+ eigen_assert( Mode == Upper || Mode == Lower
+ || Mode == StrictlyUpper || Mode == StrictlyLower
+ || Mode == UnitUpper || Mode == UnitLower);
+ if((Mode == Upper && row <= col)
+ || (Mode == Lower && row >= col)
+ || (Mode == StrictlyUpper && row < col)
+ || (Mode == StrictlyLower && row > col)
+ || (Mode == UnitUpper && row < col)
+ || (Mode == UnitLower && row > col))
+ dst.copyCoeff(row, col, src);
+ else if(ClearOpposite)
+ {
+ if (Mode&UnitDiag && row==col)
+ dst.coeffRef(row, col) = Scalar(1);
+ else
+ dst.coeffRef(row, col) = Scalar(0);
+ }
+ }
+};
+
+// prevent buggy user code from causing an infinite recursion
+template<typename Derived1, typename Derived2, unsigned int Mode, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, Mode, 0, ClearOpposite>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &, const Derived2 &) {}
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, Upper, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ typedef typename Derived1::Scalar Scalar;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ Index maxi = (std::min)(j, dst.rows()-1);
+ for(Index i = 0; i <= maxi; ++i)
+ dst.copyCoeff(i, j, src);
+ if (ClearOpposite)
+ for(Index i = maxi+1; i < dst.rows(); ++i)
+ dst.coeffRef(i, j) = Scalar(0);
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, Lower, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ for(Index i = j; i < dst.rows(); ++i)
+ dst.copyCoeff(i, j, src);
+ Index maxi = (std::min)(j, dst.rows());
+ if (ClearOpposite)
+ for(Index i = 0; i < maxi; ++i)
+ dst.coeffRef(i, j) = static_cast<typename Derived1::Scalar>(0);
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, StrictlyUpper, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ typedef typename Derived1::Scalar Scalar;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ Index maxi = (std::min)(j, dst.rows());
+ for(Index i = 0; i < maxi; ++i)
+ dst.copyCoeff(i, j, src);
+ if (ClearOpposite)
+ for(Index i = maxi; i < dst.rows(); ++i)
+ dst.coeffRef(i, j) = Scalar(0);
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, StrictlyLower, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ for(Index i = j+1; i < dst.rows(); ++i)
+ dst.copyCoeff(i, j, src);
+ Index maxi = (std::min)(j, dst.rows()-1);
+ if (ClearOpposite)
+ for(Index i = 0; i <= maxi; ++i)
+ dst.coeffRef(i, j) = static_cast<typename Derived1::Scalar>(0);
+ }
+ }
+};
+
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, UnitUpper, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ Index maxi = (std::min)(j, dst.rows());
+ for(Index i = 0; i < maxi; ++i)
+ dst.copyCoeff(i, j, src);
+ if (ClearOpposite)
+ {
+ for(Index i = maxi+1; i < dst.rows(); ++i)
+ dst.coeffRef(i, j) = 0;
+ }
+ }
+ dst.diagonal().setOnes();
+ }
+};
+template<typename Derived1, typename Derived2, bool ClearOpposite>
+struct triangular_assignment_selector<Derived1, Derived2, UnitLower, Dynamic, ClearOpposite>
+{
+ typedef typename Derived1::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(Derived1 &dst, const Derived2 &src)
+ {
+ for(Index j = 0; j < dst.cols(); ++j)
+ {
+ Index maxi = (std::min)(j, dst.rows());
+ for(Index i = maxi+1; i < dst.rows(); ++i)
+ dst.copyCoeff(i, j, src);
+ if (ClearOpposite)
+ {
+ for(Index i = 0; i < maxi; ++i)
+ dst.coeffRef(i, j) = 0;
+ }
+ }
+ dst.diagonal().setOnes();
+ }
+};
+
+} // end namespace internal
+
+// FIXME should we keep that possibility
+template<typename MatrixType, unsigned int Mode>
+template<typename OtherDerived>
+inline TriangularView<MatrixType, Mode>&
+TriangularView<MatrixType, Mode>::operator=(const MatrixBase<OtherDerived>& other)
+{
+ if(OtherDerived::Flags & EvalBeforeAssigningBit)
+ {
+ typename internal::plain_matrix_type<OtherDerived>::type other_evaluated(other.rows(), other.cols());
+ other_evaluated.template triangularView<Mode>().lazyAssign(other.derived());
+ lazyAssign(other_evaluated);
+ }
+ else
+ lazyAssign(other.derived());
+ return *this;
+}
+
+// FIXME should we keep that possibility
+template<typename MatrixType, unsigned int Mode>
+template<typename OtherDerived>
+void TriangularView<MatrixType, Mode>::lazyAssign(const MatrixBase<OtherDerived>& other)
+{
+ enum {
+ unroll = MatrixType::SizeAtCompileTime != Dynamic
+ && internal::traits<OtherDerived>::CoeffReadCost != Dynamic
+ && MatrixType::SizeAtCompileTime*internal::traits<OtherDerived>::CoeffReadCost/2 <= EIGEN_UNROLLING_LIMIT
+ };
+ eigen_assert(m_matrix.rows() == other.rows() && m_matrix.cols() == other.cols());
+
+ internal::triangular_assignment_selector
+ <MatrixType, OtherDerived, int(Mode),
+ unroll ? int(MatrixType::SizeAtCompileTime) : Dynamic,
+ false // do not change the opposite triangular part
+ >::run(m_matrix.const_cast_derived(), other.derived());
+}
+
+
+
+template<typename MatrixType, unsigned int Mode>
+template<typename OtherDerived>
+inline TriangularView<MatrixType, Mode>&
+TriangularView<MatrixType, Mode>::operator=(const TriangularBase<OtherDerived>& other)
+{
+ eigen_assert(Mode == int(OtherDerived::Mode));
+ if(internal::traits<OtherDerived>::Flags & EvalBeforeAssigningBit)
+ {
+ typename OtherDerived::DenseMatrixType other_evaluated(other.rows(), other.cols());
+ other_evaluated.template triangularView<Mode>().lazyAssign(other.derived().nestedExpression());
+ lazyAssign(other_evaluated);
+ }
+ else
+ lazyAssign(other.derived().nestedExpression());
+ return *this;
+}
+
+template<typename MatrixType, unsigned int Mode>
+template<typename OtherDerived>
+void TriangularView<MatrixType, Mode>::lazyAssign(const TriangularBase<OtherDerived>& other)
+{
+ enum {
+ unroll = MatrixType::SizeAtCompileTime != Dynamic
+ && internal::traits<OtherDerived>::CoeffReadCost != Dynamic
+ && MatrixType::SizeAtCompileTime * internal::traits<OtherDerived>::CoeffReadCost / 2
+ <= EIGEN_UNROLLING_LIMIT
+ };
+ eigen_assert(m_matrix.rows() == other.rows() && m_matrix.cols() == other.cols());
+
+ internal::triangular_assignment_selector
+ <MatrixType, OtherDerived, int(Mode),
+ unroll ? int(MatrixType::SizeAtCompileTime) : Dynamic,
+ false // preserve the opposite triangular part
+ >::run(m_matrix.const_cast_derived(), other.derived().nestedExpression());
+}
+
+/***************************************************************************
+* Implementation of TriangularBase methods
+***************************************************************************/
+
+/** Assigns a triangular or selfadjoint matrix to a dense matrix.
+ * If the matrix is triangular, the opposite part is set to zero. */
+template<typename Derived>
+template<typename DenseDerived>
+void TriangularBase<Derived>::evalTo(MatrixBase<DenseDerived> &other) const
+{
+ if(internal::traits<Derived>::Flags & EvalBeforeAssigningBit)
+ {
+ typename internal::plain_matrix_type<Derived>::type other_evaluated(rows(), cols());
+ evalToLazy(other_evaluated);
+ other.derived().swap(other_evaluated);
+ }
+ else
+ evalToLazy(other.derived());
+}
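+
+// Minimal usage sketch: assigning a triangular view to a dense matrix is routed through
+// evalTo(), which clears the opposite triangular part (`A` and `U` are illustrative names):
+//   #include <Eigen/Dense>
+//   Eigen::Matrix3d A = Eigen::Matrix3d::Random();
+//   Eigen::Matrix3d U = A.triangularView<Eigen::Upper>(); // strictly lower part of U is zero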
+
+/** Assigns a triangular or selfadjoint matrix to a dense matrix.
+ * If the matrix is triangular, the opposite part is set to zero. */
+template<typename Derived>
+template<typename DenseDerived>
+void TriangularBase<Derived>::evalToLazy(MatrixBase<DenseDerived> &other) const
+{
+ enum {
+ unroll = DenseDerived::SizeAtCompileTime != Dynamic
+ && internal::traits<Derived>::CoeffReadCost != Dynamic
+ && DenseDerived::SizeAtCompileTime * internal::traits<Derived>::CoeffReadCost / 2
+ <= EIGEN_UNROLLING_LIMIT
+ };
+ other.derived().resize(this->rows(), this->cols());
+
+ internal::triangular_assignment_selector
+ <DenseDerived, typename internal::traits<Derived>::MatrixTypeNestedCleaned, Derived::Mode,
+ unroll ? int(DenseDerived::SizeAtCompileTime) : Dynamic,
+ true // clear the opposite triangular part
+ >::run(other.derived(), derived().nestedExpression());
+}
+
+/***************************************************************************
+* Implementation of TriangularView methods
+***************************************************************************/
+
+/***************************************************************************
+* Implementation of MatrixBase methods
+***************************************************************************/
+
+#ifdef EIGEN2_SUPPORT
+
+// implementation of part<>(), including the SelfAdjoint case.
+
+namespace internal {
+template<typename MatrixType, unsigned int Mode>
+struct eigen2_part_return_type
+{
+ typedef TriangularView<MatrixType, Mode> type;
+};
+
+template<typename MatrixType>
+struct eigen2_part_return_type<MatrixType, SelfAdjoint>
+{
+ typedef SelfAdjointView<MatrixType, Upper> type;
+};
+}
+
+/** \deprecated use MatrixBase::triangularView() */
+template<typename Derived>
+template<unsigned int Mode>
+const typename internal::eigen2_part_return_type<Derived, Mode>::type MatrixBase<Derived>::part() const
+{
+ return derived();
+}
+
+/** \deprecated use MatrixBase::triangularView() */
+template<typename Derived>
+template<unsigned int Mode>
+typename internal::eigen2_part_return_type<Derived, Mode>::type MatrixBase<Derived>::part()
+{
+ return derived();
+}
+#endif
+
+/**
+ * \returns an expression of a triangular view extracted from the current matrix
+ *
+ * The parameter \a Mode can have the following values: \c #Upper, \c #StrictlyUpper, \c #UnitUpper,
+ * \c #Lower, \c #StrictlyLower, \c #UnitLower.
+ *
+ * Example: \include MatrixBase_extract.cpp
+ * Output: \verbinclude MatrixBase_extract.out
+ *
+ * \sa class TriangularView
+ */
+template<typename Derived>
+template<unsigned int Mode>
+typename MatrixBase<Derived>::template TriangularViewReturnType<Mode>::Type
+MatrixBase<Derived>::triangularView()
+{
+ return derived();
+}
+
+/** This is the const version of MatrixBase::triangularView() */
+template<typename Derived>
+template<unsigned int Mode>
+typename MatrixBase<Derived>::template ConstTriangularViewReturnType<Mode>::Type
+MatrixBase<Derived>::triangularView() const
+{
+ return derived();
+}
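+
+// Minimal usage sketch of triangularView() (names are illustrative only):
+//   Eigen::Matrix3d A = Eigen::Matrix3d::Random();
+//   Eigen::Vector3d b = Eigen::Vector3d::Ones();
+//   Eigen::Vector3d x = A.triangularView<Eigen::Lower>().solve(b); // forward substitution
+//   A.triangularView<Eigen::StrictlyUpper>().setZero();            // write through the view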
+
+/** \returns true if *this is approximately equal to an upper triangular matrix,
+ * within the precision given by \a prec.
+ *
+ * \sa isLowerTriangular()
+ */
+template<typename Derived>
+bool MatrixBase<Derived>::isUpperTriangular(const RealScalar& prec) const
+{
+ using std::abs;
+ RealScalar maxAbsOnUpperPart = static_cast<RealScalar>(-1);
+ for(Index j = 0; j < cols(); ++j)
+ {
+ Index maxi = (std::min)(j, rows()-1);
+ for(Index i = 0; i <= maxi; ++i)
+ {
+ RealScalar absValue = abs(coeff(i,j));
+ if(absValue > maxAbsOnUpperPart) maxAbsOnUpperPart = absValue;
+ }
+ }
+ RealScalar threshold = maxAbsOnUpperPart * prec;
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = j+1; i < rows(); ++i)
+ if(abs(coeff(i, j)) > threshold) return false;
+ return true;
+}
+
+/** \returns true if *this is approximately equal to a lower triangular matrix,
+ * within the precision given by \a prec.
+ *
+ * \sa isUpperTriangular()
+ */
+template<typename Derived>
+bool MatrixBase<Derived>::isLowerTriangular(const RealScalar& prec) const
+{
+ using std::abs;
+ RealScalar maxAbsOnLowerPart = static_cast<RealScalar>(-1);
+ for(Index j = 0; j < cols(); ++j)
+ for(Index i = j; i < rows(); ++i)
+ {
+ RealScalar absValue = abs(coeff(i,j));
+ if(absValue > maxAbsOnLowerPart) maxAbsOnLowerPart = absValue;
+ }
+ RealScalar threshold = maxAbsOnLowerPart * prec;
+ for(Index j = 1; j < cols(); ++j)
+ {
+ Index maxi = (std::min)(j, rows()-1);
+ for(Index i = 0; i < maxi; ++i)
+ if(abs(coeff(i, j)) > threshold) return false;
+ }
+ return true;
+}
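+
+// Illustrative check combining the two predicates above; for a diagonal matrix both hold:
+//   Eigen::Matrix3d D = Eigen::Matrix3d::Identity();
+//   bool upper = D.isUpperTriangular(); // true
+//   bool lower = D.isLowerTriangular(); // true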
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULARMATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/VectorBlock.h b/third_party/eigen3/Eigen/src/Core/VectorBlock.h
new file mode 100644
index 0000000000..216c568c4f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/VectorBlock.h
@@ -0,0 +1,97 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_VECTORBLOCK_H
+#define EIGEN_VECTORBLOCK_H
+
+namespace Eigen {
+
+/** \class VectorBlock
+ * \ingroup Core_Module
+ *
+ * \brief Expression of a fixed-size or dynamic-size sub-vector
+ *
+ * \param VectorType the type of the object in which we are taking a sub-vector
+ * \param Size size of the sub-vector we are taking at compile time (optional)
+ *
+ * This class represents an expression of either a fixed-size or dynamic-size sub-vector.
+ * It is the return type of DenseBase::segment(Index,Index) and DenseBase::segment<int>(Index) and
+ * most of the time this is the only way it is used.
+ *
+  * However, if you want to directly manipulate sub-vector expressions,
+ * for instance if you want to write a function returning such an expression, you
+ * will need to use this class.
+ *
+ * Here is an example illustrating the dynamic case:
+ * \include class_VectorBlock.cpp
+ * Output: \verbinclude class_VectorBlock.out
+ *
+ * \note Even though this expression has dynamic size, in the case where \a VectorType
+ * has fixed size, this expression inherits a fixed maximal size which means that evaluating
+ * it does not cause a dynamic memory allocation.
+ *
+ * Here is an example illustrating the fixed-size case:
+ * \include class_FixedVectorBlock.cpp
+ * Output: \verbinclude class_FixedVectorBlock.out
+ *
+ * \sa class Block, DenseBase::segment(Index,Index,Index,Index), DenseBase::segment(Index,Index)
+ */
+
+namespace internal {
+template<typename VectorType, int Size>
+struct traits<VectorBlock<VectorType, Size> >
+ : public traits<Block<VectorType,
+ traits<VectorType>::Flags & RowMajorBit ? 1 : Size,
+ traits<VectorType>::Flags & RowMajorBit ? Size : 1> >
+{
+};
+}
+
+template<typename VectorType, int Size> class VectorBlock
+ : public Block<VectorType,
+ internal::traits<VectorType>::Flags & RowMajorBit ? 1 : Size,
+ internal::traits<VectorType>::Flags & RowMajorBit ? Size : 1>
+{
+ typedef Block<VectorType,
+ internal::traits<VectorType>::Flags & RowMajorBit ? 1 : Size,
+ internal::traits<VectorType>::Flags & RowMajorBit ? Size : 1> Base;
+ enum {
+ IsColVector = !(internal::traits<VectorType>::Flags & RowMajorBit)
+ };
+ public:
+ EIGEN_DENSE_PUBLIC_INTERFACE(VectorBlock)
+
+ using Base::operator=;
+
+ /** Dynamic-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline VectorBlock(VectorType& vector, Index start, Index size)
+ : Base(vector,
+ IsColVector ? start : 0, IsColVector ? 0 : start,
+ IsColVector ? size : 1, IsColVector ? 1 : size)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorBlock);
+ }
+
+ /** Fixed-size constructor
+ */
+ EIGEN_DEVICE_FUNC
+ inline VectorBlock(VectorType& vector, Index start)
+ : Base(vector, IsColVector ? start : 0, IsColVector ? 0 : start)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorBlock);
+ }
+};
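+
+// Minimal usage sketch: VectorBlock is normally obtained via segment() (names illustrative):
+//   Eigen::VectorXd v(6);
+//   v << 0, 1, 2, 3, 4, 5;
+//   v.segment(1, 3).setZero();            // dynamic-size sub-vector, elements 1..3
+//   Eigen::Vector3d h = v.segment<3>(0);  // fixed-size sub-vector of length 3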
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_VECTORBLOCK_H
diff --git a/third_party/eigen3/Eigen/src/Core/VectorwiseOp.h b/third_party/eigen3/Eigen/src/Core/VectorwiseOp.h
new file mode 100644
index 0000000000..f25ddca174
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/VectorwiseOp.h
@@ -0,0 +1,651 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PARTIAL_REDUX_H
+#define EIGEN_PARTIAL_REDUX_H
+
+namespace Eigen {
+
+/** \class PartialReduxExpr
+ * \ingroup Core_Module
+ *
+ * \brief Generic expression of a partially reduxed matrix
+ *
+  * \tparam MatrixType the type of the matrix to which the redux operation is applied
+ * \tparam MemberOp type of the member functor
+ * \tparam Direction indicates the direction of the redux (#Vertical or #Horizontal)
+ *
+ * This class represents an expression of a partial redux operator of a matrix.
+ * It is the return type of some VectorwiseOp functions,
+ * and most of the time this is the only way it is used.
+ *
+ * \sa class VectorwiseOp
+ */
+
+template< typename MatrixType, typename MemberOp, int Direction>
+class PartialReduxExpr;
+
+namespace internal {
+template<typename MatrixType, typename MemberOp, int Direction>
+struct traits<PartialReduxExpr<MatrixType, MemberOp, Direction> >
+ : traits<MatrixType>
+{
+ typedef typename MemberOp::result_type Scalar;
+ typedef typename traits<MatrixType>::StorageKind StorageKind;
+ typedef typename traits<MatrixType>::XprKind XprKind;
+ typedef typename MatrixType::Scalar InputScalar;
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_all<MatrixTypeNested>::type _MatrixTypeNested;
+ enum {
+ RowsAtCompileTime = Direction==Vertical ? 1 : MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = Direction==Horizontal ? 1 : MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = Direction==Vertical ? 1 : MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = Direction==Horizontal ? 1 : MatrixType::MaxColsAtCompileTime,
+ Flags0 = (unsigned int)_MatrixTypeNested::Flags & HereditaryBits,
+ Flags = (Flags0 & ~RowMajorBit) | (RowsAtCompileTime == 1 ? RowMajorBit : 0),
+ TraversalSize = Direction==Vertical ? MatrixType::RowsAtCompileTime : MatrixType::ColsAtCompileTime
+ };
+ #if EIGEN_GNUC_AT_LEAST(3,4)
+ typedef typename MemberOp::template Cost<InputScalar,int(TraversalSize)> CostOpType;
+ #else
+ typedef typename MemberOp::template Cost<InputScalar,TraversalSize> CostOpType;
+ #endif
+ enum {
+ CoeffReadCost = TraversalSize==Dynamic ? Dynamic
+ : TraversalSize * traits<_MatrixTypeNested>::CoeffReadCost + int(CostOpType::value)
+ };
+};
+}
+
+template< typename MatrixType, typename MemberOp, int Direction>
+class PartialReduxExpr : internal::no_assignment_operator,
+ public internal::dense_xpr_base< PartialReduxExpr<MatrixType, MemberOp, Direction> >::type
+{
+ public:
+
+ typedef typename internal::dense_xpr_base<PartialReduxExpr>::type Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(PartialReduxExpr)
+ typedef typename internal::traits<PartialReduxExpr>::MatrixTypeNested MatrixTypeNested;
+ typedef typename internal::traits<PartialReduxExpr>::_MatrixTypeNested _MatrixTypeNested;
+
+ PartialReduxExpr(const MatrixType& mat, const MemberOp& func = MemberOp())
+ : m_matrix(mat), m_functor(func) {}
+
+ Index rows() const { return (Direction==Vertical ? 1 : m_matrix.rows()); }
+ Index cols() const { return (Direction==Horizontal ? 1 : m_matrix.cols()); }
+
+ EIGEN_STRONG_INLINE const Scalar coeff(Index i, Index j) const
+ {
+ if (Direction==Vertical)
+ return m_functor(m_matrix.col(j));
+ else
+ return m_functor(m_matrix.row(i));
+ }
+
+ const Scalar coeff(Index index) const
+ {
+ if (Direction==Vertical)
+ return m_functor(m_matrix.col(index));
+ else
+ return m_functor(m_matrix.row(index));
+ }
+
+ protected:
+ MatrixTypeNested m_matrix;
+ const MemberOp m_functor;
+};
+
+#define EIGEN_MEMBER_FUNCTOR(MEMBER,COST) \
+ template <typename ResultType> \
+ struct member_##MEMBER { \
+ EIGEN_EMPTY_STRUCT_CTOR(member_##MEMBER) \
+ typedef ResultType result_type; \
+ template<typename Scalar, int Size> struct Cost \
+ { enum { value = COST }; }; \
+ template<typename XprType> \
+ EIGEN_STRONG_INLINE ResultType operator()(const XprType& mat) const \
+ { return mat.MEMBER(); } \
+ }
+
+namespace internal {
+
+EIGEN_MEMBER_FUNCTOR(squaredNorm, Size * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(norm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(stableNorm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(blueNorm, (Size+5) * NumTraits<Scalar>::MulCost + (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(hypotNorm, (Size-1) * functor_traits<scalar_hypot_op<Scalar> >::Cost );
+EIGEN_MEMBER_FUNCTOR(sum, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(mean, (Size-1)*NumTraits<Scalar>::AddCost + NumTraits<Scalar>::MulCost);
+EIGEN_MEMBER_FUNCTOR(minCoeff, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(maxCoeff, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(all, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(any, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(count, (Size-1)*NumTraits<Scalar>::AddCost);
+EIGEN_MEMBER_FUNCTOR(prod, (Size-1)*NumTraits<Scalar>::MulCost);
+
+
+template <typename BinaryOp, typename Scalar>
+struct member_redux {
+ typedef typename result_of<
+ BinaryOp(Scalar)
+ >::type result_type;
+ template<typename _Scalar, int Size> struct Cost
+ { enum { value = (Size-1) * functor_traits<BinaryOp>::Cost }; };
+ member_redux(const BinaryOp func) : m_functor(func) {}
+ template<typename Derived>
+ inline result_type operator()(const DenseBase<Derived>& mat) const
+ { return mat.redux(m_functor); }
+ const BinaryOp m_functor;
+};
+}
+
+/** \class VectorwiseOp
+ * \ingroup Core_Module
+ *
+ * \brief Pseudo expression providing partial reduction operations
+ *
+ * \param ExpressionType the type of the object on which to do partial reductions
+ * \param Direction indicates the direction of the redux (#Vertical or #Horizontal)
+ *
+ * This class represents a pseudo expression with partial reduction features.
+ * It is the return type of DenseBase::colwise() and DenseBase::rowwise()
+ * and most of the time this is the only way it is used.
+ *
+ * Example: \include MatrixBase_colwise.cpp
+ * Output: \verbinclude MatrixBase_colwise.out
+ *
+ * \sa DenseBase::colwise(), DenseBase::rowwise(), class PartialReduxExpr
+ */
+template<typename ExpressionType, int Direction> class VectorwiseOp
+{
+ public:
+
+ typedef typename ExpressionType::Scalar Scalar;
+ typedef typename ExpressionType::RealScalar RealScalar;
+ typedef typename ExpressionType::Index Index;
+ typedef typename internal::conditional<internal::must_nest_by_value<ExpressionType>::ret,
+ ExpressionType, ExpressionType&>::type ExpressionTypeNested;
+ typedef typename internal::remove_all<ExpressionTypeNested>::type ExpressionTypeNestedCleaned;
+
+ template<template<typename _Scalar> class Functor,
+ typename Scalar=typename internal::traits<ExpressionType>::Scalar> struct ReturnType
+ {
+ typedef PartialReduxExpr<ExpressionType,
+ Functor<Scalar>,
+ Direction
+ > Type;
+ };
+
+ template<typename BinaryOp> struct ReduxReturnType
+ {
+ typedef PartialReduxExpr<ExpressionType,
+ internal::member_redux<BinaryOp,typename internal::traits<ExpressionType>::Scalar>,
+ Direction
+ > Type;
+ };
+
+ enum {
+ IsVertical = (Direction==Vertical) ? 1 : 0,
+ IsHorizontal = (Direction==Horizontal) ? 1 : 0
+ };
+
+ protected:
+
+ /** \internal
+ * \returns the i-th subvector according to the \c Direction */
+ typedef typename internal::conditional<Direction==Vertical,
+ typename ExpressionType::ColXpr,
+ typename ExpressionType::RowXpr>::type SubVector;
+ SubVector subVector(Index i)
+ {
+ return SubVector(m_matrix.derived(),i);
+ }
+
+ /** \internal
+ * \returns the number of subvectors in the direction \c Direction */
+ Index subVectors() const
+ { return Direction==Vertical?m_matrix.cols():m_matrix.rows(); }
+
+ template<typename OtherDerived> struct ExtendedType {
+ typedef Replicate<OtherDerived,
+ Direction==Vertical ? 1 : ExpressionType::RowsAtCompileTime,
+ Direction==Horizontal ? 1 : ExpressionType::ColsAtCompileTime> Type;
+ };
+
+ /** \internal
+ * Replicates a vector to match the size of \c *this */
+ template<typename OtherDerived>
+ typename ExtendedType<OtherDerived>::Type
+ extendedTo(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(Direction==Vertical, OtherDerived::MaxColsAtCompileTime==1),
+ YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED)
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(Direction==Horizontal, OtherDerived::MaxRowsAtCompileTime==1),
+ YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED)
+ return typename ExtendedType<OtherDerived>::Type
+ (other.derived(),
+ Direction==Vertical ? 1 : m_matrix.rows(),
+ Direction==Horizontal ? 1 : m_matrix.cols());
+ }
+
+ template<typename OtherDerived> struct OppositeExtendedType {
+ typedef Replicate<OtherDerived,
+ Direction==Horizontal ? 1 : ExpressionType::RowsAtCompileTime,
+ Direction==Vertical ? 1 : ExpressionType::ColsAtCompileTime> Type;
+ };
+
+ /** \internal
+ * Replicates a vector in the opposite direction to match the size of \c *this */
+ template<typename OtherDerived>
+ typename OppositeExtendedType<OtherDerived>::Type
+ extendedToOpposite(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(Direction==Horizontal, OtherDerived::MaxColsAtCompileTime==1),
+ YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED)
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(Direction==Vertical, OtherDerived::MaxRowsAtCompileTime==1),
+ YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED)
+ return typename OppositeExtendedType<OtherDerived>::Type
+ (other.derived(),
+ Direction==Horizontal ? 1 : m_matrix.rows(),
+ Direction==Vertical ? 1 : m_matrix.cols());
+ }
+
+ public:
+
+ inline VectorwiseOp(ExpressionType& matrix) : m_matrix(matrix) {}
+
+ /** \internal */
+ inline const ExpressionType& _expression() const { return m_matrix; }
+
+ /** \returns a row or column vector expression of \c *this reduxed by \a func
+ *
+ * The template parameter \a BinaryOp is the type of the functor
+ * of the custom redux operator. Note that func must be an associative operator.
+ *
+ * \sa class VectorwiseOp, DenseBase::colwise(), DenseBase::rowwise()
+ */
+ template<typename BinaryOp>
+ const typename ReduxReturnType<BinaryOp>::Type
+ redux(const BinaryOp& func = BinaryOp()) const
+ { return typename ReduxReturnType<BinaryOp>::Type(_expression(), func); }
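+
+    // Minimal sketch of a custom partial redux; the functor `MaxAbsOp` and matrix `mat`
+    // are hypothetical names, not part of Eigen:
+    //   struct MaxAbsOp {
+    //     typedef double result_type;
+    //     double operator()(double a, double b) const { return std::max(std::abs(a), std::abs(b)); }
+    //   };
+    //   Eigen::RowVectorXd amp = mat.colwise().redux(MaxAbsOp()); // per-column max magnitude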
+
+ /** \returns a row (or column) vector expression of the smallest coefficient
+ * of each column (or row) of the referenced expression.
+ *
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * Example: \include PartialRedux_minCoeff.cpp
+ * Output: \verbinclude PartialRedux_minCoeff.out
+ *
+ * \sa DenseBase::minCoeff() */
+ const typename ReturnType<internal::member_minCoeff>::Type minCoeff() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the largest coefficient
+ * of each column (or row) of the referenced expression.
+ *
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * Example: \include PartialRedux_maxCoeff.cpp
+ * Output: \verbinclude PartialRedux_maxCoeff.out
+ *
+ * \sa DenseBase::maxCoeff() */
+ const typename ReturnType<internal::member_maxCoeff>::Type maxCoeff() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the squared norm
+ * of each column (or row) of the referenced expression.
+ * This is a vector with real entries, even if the original matrix has complex entries.
+ *
+ * Example: \include PartialRedux_squaredNorm.cpp
+ * Output: \verbinclude PartialRedux_squaredNorm.out
+ *
+ * \sa DenseBase::squaredNorm() */
+ const typename ReturnType<internal::member_squaredNorm,RealScalar>::Type squaredNorm() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the norm
+ * of each column (or row) of the referenced expression.
+ * This is a vector with real entries, even if the original matrix has complex entries.
+ *
+ * Example: \include PartialRedux_norm.cpp
+ * Output: \verbinclude PartialRedux_norm.out
+ *
+ * \sa DenseBase::norm() */
+ const typename ReturnType<internal::member_norm,RealScalar>::Type norm() const
+ { return _expression(); }
+
+
+ /** \returns a row (or column) vector expression of the norm
+ * of each column (or row) of the referenced expression, using
+ * Blue's algorithm.
+ * This is a vector with real entries, even if the original matrix has complex entries.
+ *
+ * \sa DenseBase::blueNorm() */
+ const typename ReturnType<internal::member_blueNorm,RealScalar>::Type blueNorm() const
+ { return _expression(); }
+
+
+ /** \returns a row (or column) vector expression of the norm
+ * of each column (or row) of the referenced expression, avoiding
+ * underflow and overflow.
+ * This is a vector with real entries, even if the original matrix has complex entries.
+ *
+ * \sa DenseBase::stableNorm() */
+ const typename ReturnType<internal::member_stableNorm,RealScalar>::Type stableNorm() const
+ { return _expression(); }
+
+
+ /** \returns a row (or column) vector expression of the norm
+ * of each column (or row) of the referenced expression, avoiding
+ * underflow and overflow using a concatenation of hypot() calls.
+ * This is a vector with real entries, even if the original matrix has complex entries.
+ *
+ * \sa DenseBase::hypotNorm() */
+ const typename ReturnType<internal::member_hypotNorm,RealScalar>::Type hypotNorm() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the sum
+ * of each column (or row) of the referenced expression.
+ *
+ * Example: \include PartialRedux_sum.cpp
+ * Output: \verbinclude PartialRedux_sum.out
+ *
+ * \sa DenseBase::sum() */
+ const typename ReturnType<internal::member_sum>::Type sum() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the mean
+ * of each column (or row) of the referenced expression.
+ *
+ * \sa DenseBase::mean() */
+ const typename ReturnType<internal::member_mean>::Type mean() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression representing
+ * whether \b all coefficients of each respective column (or row) are \c true.
+ * This expression can be assigned to a vector with entries of type \c bool.
+ *
+ * \sa DenseBase::all() */
+ const typename ReturnType<internal::member_all>::Type all() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression representing
+ * whether \b at \b least one coefficient of each respective column (or row) is \c true.
+ * This expression can be assigned to a vector with entries of type \c bool.
+ *
+ * \sa DenseBase::any() */
+ const typename ReturnType<internal::member_any>::Type any() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression representing
+ * the number of \c true coefficients of each respective column (or row).
+ * This expression can be assigned to a vector whose entries have the same type as is used to
+ * index entries of the original matrix; for dense matrices, this is \c std::ptrdiff_t .
+ *
+ * Example: \include PartialRedux_count.cpp
+ * Output: \verbinclude PartialRedux_count.out
+ *
+ * \sa DenseBase::count() */
+ const PartialReduxExpr<ExpressionType, internal::member_count<Index>, Direction> count() const
+ { return _expression(); }
+
+ /** \returns a row (or column) vector expression of the product
+ * of each column (or row) of the referenced expression.
+ *
+ * Example: \include PartialRedux_prod.cpp
+ * Output: \verbinclude PartialRedux_prod.out
+ *
+ * \sa DenseBase::prod() */
+ const typename ReturnType<internal::member_prod>::Type prod() const
+ { return _expression(); }
+
+
+ /** \returns a matrix expression
+      * where each column (or row) is reversed.
+ *
+ * Example: \include Vectorwise_reverse.cpp
+ * Output: \verbinclude Vectorwise_reverse.out
+ *
+ * \sa DenseBase::reverse() */
+ const Reverse<ExpressionType, Direction> reverse() const
+ { return Reverse<ExpressionType, Direction>( _expression() ); }
+
+ typedef Replicate<ExpressionType,Direction==Vertical?Dynamic:1,Direction==Horizontal?Dynamic:1> ReplicateReturnType;
+ const ReplicateReturnType replicate(Index factor) const;
+
+ /**
+ * \return an expression of the replication of each column (or row) of \c *this
+ *
+ * Example: \include DirectionWise_replicate.cpp
+ * Output: \verbinclude DirectionWise_replicate.out
+ *
+ * \sa VectorwiseOp::replicate(Index), DenseBase::replicate(), class Replicate
+ */
+    // NOTE implemented here because of Sun Studio's compilation errors
+ template<int Factor> const Replicate<ExpressionType,(IsVertical?Factor:1),(IsHorizontal?Factor:1)>
+ replicate(Index factor = Factor) const
+ {
+ return Replicate<ExpressionType,Direction==Vertical?Factor:1,Direction==Horizontal?Factor:1>
+ (_expression(),Direction==Vertical?factor:1,Direction==Horizontal?factor:1);
+ }
+
+/////////// Arithmetic operators ///////////
+
+ /** Copies the vector \a other to each subvector of \c *this */
+ template<typename OtherDerived>
+ ExpressionType& operator=(const DenseBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ //eigen_assert((m_matrix.isNull()) == (other.isNull())); FIXME
+ return const_cast<ExpressionType&>(m_matrix = extendedTo(other.derived()));
+ }
+
+ /** Adds the vector \a other to each subvector of \c *this */
+ template<typename OtherDerived>
+ ExpressionType& operator+=(const DenseBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return const_cast<ExpressionType&>(m_matrix += extendedTo(other.derived()));
+ }
+
+    /** Subtracts the vector \a other from each subvector of \c *this */
+ template<typename OtherDerived>
+ ExpressionType& operator-=(const DenseBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return const_cast<ExpressionType&>(m_matrix -= extendedTo(other.derived()));
+ }
+
+    /** Multiplies each subvector of \c *this by the vector \a other */
+ template<typename OtherDerived>
+ ExpressionType& operator*=(const DenseBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ m_matrix *= extendedTo(other.derived());
+ return const_cast<ExpressionType&>(m_matrix);
+ }
+
+ /** Divides each subvector of \c *this by the vector \a other */
+ template<typename OtherDerived>
+ ExpressionType& operator/=(const DenseBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ m_matrix /= extendedTo(other.derived());
+ return const_cast<ExpressionType&>(m_matrix);
+ }
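+
+    // Minimal broadcasting sketch using the compound operators above (names illustrative):
+    //   Eigen::MatrixXd m(2,3);  m.setRandom();
+    //   m.colwise() += Eigen::VectorXd::Ones(2);   // add a column vector to every column
+    //   Eigen::RowVectorXd mu = m.colwise().mean();
+    //   m.rowwise() -= mu;                         // subtract the column means from every row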
+
+    /** Returns the expression of the sum of the vector \a other and each subvector of \c *this */
+ template<typename OtherDerived> EIGEN_STRONG_INLINE
+ CwiseBinaryOp<internal::scalar_sum_op<Scalar>,
+ const ExpressionTypeNestedCleaned,
+ const typename ExtendedType<OtherDerived>::Type>
+ operator+(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return m_matrix + extendedTo(other.derived());
+ }
+
+ /** Returns the expression of the difference between each subvector of \c *this and the vector \a other */
+ template<typename OtherDerived>
+ CwiseBinaryOp<internal::scalar_difference_op<Scalar>,
+ const ExpressionTypeNestedCleaned,
+ const typename ExtendedType<OtherDerived>::Type>
+ operator-(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return m_matrix - extendedTo(other.derived());
+ }
+
+ /** Returns the expression where each subvector is the product of the vector \a other
+ * by the corresponding subvector of \c *this */
+ template<typename OtherDerived> EIGEN_STRONG_INLINE
+ CwiseBinaryOp<internal::scalar_product_op<Scalar>,
+ const ExpressionTypeNestedCleaned,
+ const typename ExtendedType<OtherDerived>::Type>
+ operator*(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return m_matrix * extendedTo(other.derived());
+ }
+
+ /** Returns the expression where each subvector is the quotient of the corresponding
+ * subvector of \c *this by the vector \a other */
+ template<typename OtherDerived>
+ CwiseBinaryOp<internal::scalar_quotient_op<Scalar>,
+ const ExpressionTypeNestedCleaned,
+ const typename ExtendedType<OtherDerived>::Type>
+ operator/(const DenseBase<OtherDerived>& other) const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_ARRAYXPR(ExpressionType)
+ EIGEN_STATIC_ASSERT_SAME_XPR_KIND(ExpressionType, OtherDerived)
+ return m_matrix / extendedTo(other.derived());
+ }
+
+    /** \returns an expression where each column (or row) of the referenced matrix is normalized.
+ * The referenced matrix is \b not modified.
+ * \sa MatrixBase::normalized(), normalize()
+ */
+ CwiseBinaryOp<internal::scalar_quotient_op<Scalar>,
+ const ExpressionTypeNestedCleaned,
+ const typename OppositeExtendedType<typename ReturnType<internal::member_norm,RealScalar>::Type>::Type>
+ normalized() const { return m_matrix.cwiseQuotient(extendedToOpposite(this->norm())); }
+
+
+    /** Normalizes each row (or column) of the referenced matrix in place.
+ * \sa MatrixBase::normalize(), normalized()
+ */
+ void normalize() {
+ m_matrix = this->normalized();
+ }
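+
+    // Minimal sketch (names illustrative):
+    //   Eigen::MatrixXd q = m.colwise().normalized(); // each column of q has unit norm
+    //   m.rowwise().normalize();                      // in place, row by row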
+
+/////////// Geometry module ///////////
+
+ #if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+ Homogeneous<ExpressionType,Direction> homogeneous() const;
+ #endif
+
+ typedef typename ExpressionType::PlainObject CrossReturnType;
+ template<typename OtherDerived>
+ const CrossReturnType cross(const MatrixBase<OtherDerived>& other) const;
+
+ enum {
+ HNormalized_Size = Direction==Vertical ? internal::traits<ExpressionType>::RowsAtCompileTime
+ : internal::traits<ExpressionType>::ColsAtCompileTime,
+ HNormalized_SizeMinusOne = HNormalized_Size==Dynamic ? Dynamic : HNormalized_Size-1
+ };
+ typedef Block<const ExpressionType,
+ Direction==Vertical ? int(HNormalized_SizeMinusOne)
+ : int(internal::traits<ExpressionType>::RowsAtCompileTime),
+ Direction==Horizontal ? int(HNormalized_SizeMinusOne)
+ : int(internal::traits<ExpressionType>::ColsAtCompileTime)>
+ HNormalized_Block;
+ typedef Block<const ExpressionType,
+ Direction==Vertical ? 1 : int(internal::traits<ExpressionType>::RowsAtCompileTime),
+ Direction==Horizontal ? 1 : int(internal::traits<ExpressionType>::ColsAtCompileTime)>
+ HNormalized_Factors;
+ typedef CwiseBinaryOp<internal::scalar_quotient_op<typename internal::traits<ExpressionType>::Scalar>,
+ const HNormalized_Block,
+ const Replicate<HNormalized_Factors,
+ Direction==Vertical ? HNormalized_SizeMinusOne : 1,
+ Direction==Horizontal ? HNormalized_SizeMinusOne : 1> >
+ HNormalizedReturnType;
+
+ const HNormalizedReturnType hnormalized() const;
+
+ protected:
+ ExpressionTypeNested m_matrix;
+};
+
+/** \returns a VectorwiseOp wrapper of *this providing additional partial reduction operations
+ *
+ * Example: \include MatrixBase_colwise.cpp
+ * Output: \verbinclude MatrixBase_colwise.out
+ *
+ * \sa rowwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
+ */
+template<typename Derived>
+inline const typename DenseBase<Derived>::ConstColwiseReturnType
+DenseBase<Derived>::colwise() const
+{
+ return derived();
+}
+
+/** \returns a writable VectorwiseOp wrapper of *this providing additional partial reduction operations
+ *
+ * \sa rowwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
+ */
+template<typename Derived>
+inline typename DenseBase<Derived>::ColwiseReturnType
+DenseBase<Derived>::colwise()
+{
+ return derived();
+}
+
+/** \returns a VectorwiseOp wrapper of *this providing additional partial reduction operations
+ *
+ * Example: \include MatrixBase_rowwise.cpp
+ * Output: \verbinclude MatrixBase_rowwise.out
+ *
+ * \sa colwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
+ */
+template<typename Derived>
+inline const typename DenseBase<Derived>::ConstRowwiseReturnType
+DenseBase<Derived>::rowwise() const
+{
+ return derived();
+}
+
+/** \returns a writable VectorwiseOp wrapper of *this providing additional partial reduction operations
+ *
+ * \sa colwise(), class VectorwiseOp, \ref TutorialReductionsVisitorsBroadcasting
+ */
+template<typename Derived>
+inline typename DenseBase<Derived>::RowwiseReturnType
+DenseBase<Derived>::rowwise()
+{
+ return derived();
+}
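+
+// Minimal usage sketch of the partial reductions reachable via colwise()/rowwise()
+// (names illustrative):
+//   Eigen::MatrixXd m(2,3);  m.setRandom();
+//   Eigen::RowVectorXd colSums = m.colwise().sum();      // 1 x 3
+//   Eigen::VectorXd    rowMaxs = m.rowwise().maxCoeff(); // 2 x 1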
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARTIAL_REDUX_H
diff --git a/third_party/eigen3/Eigen/src/Core/Visitor.h b/third_party/eigen3/Eigen/src/Core/Visitor.h
new file mode 100644
index 0000000000..64867b7a2c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/Visitor.h
@@ -0,0 +1,237 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_VISITOR_H
+#define EIGEN_VISITOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Visitor, typename Derived, int UnrollCount>
+struct visitor_impl
+{
+ enum {
+ col = (UnrollCount-1) / Derived::RowsAtCompileTime,
+ row = (UnrollCount-1) % Derived::RowsAtCompileTime
+ };
+
+ static inline void run(const Derived &mat, Visitor& visitor)
+ {
+ visitor_impl<Visitor, Derived, UnrollCount-1>::run(mat, visitor);
+ visitor(mat.coeff(row, col), row, col);
+ }
+};
+
+template<typename Visitor, typename Derived>
+struct visitor_impl<Visitor, Derived, 1>
+{
+ static inline void run(const Derived &mat, Visitor& visitor)
+ {
+ return visitor.init(mat.coeff(0, 0), 0, 0);
+ }
+};
+
+template<typename Visitor, typename Derived>
+struct visitor_impl<Visitor, Derived, Dynamic>
+{
+ typedef typename Derived::Index Index;
+ static inline void run(const Derived& mat, Visitor& visitor)
+ {
+ visitor.init(mat.coeff(0,0), 0, 0);
+ for(Index i = 1; i < mat.rows(); ++i)
+ visitor(mat.coeff(i, 0), i, 0);
+ for(Index j = 1; j < mat.cols(); ++j)
+ for(Index i = 0; i < mat.rows(); ++i)
+ visitor(mat.coeff(i, j), i, j);
+ }
+};
+
+} // end namespace internal
+
+/** Applies the visitor \a visitor to all the coefficients of the matrix or vector.
+ *
+ * The template parameter \a Visitor is the type of the visitor and provides the following interface:
+ * \code
+ * struct MyVisitor {
+ * // called for the first coefficient
+ * void init(const Scalar& value, Index i, Index j);
+ * // called for all other coefficients
+ * void operator() (const Scalar& value, Index i, Index j);
+ * };
+ * \endcode
+ *
+ * \note compared to one or two \em for \em loops, visitors offer automatic
+  * unrolling for small fixed-size matrices.
+ *
+ * \sa minCoeff(Index*,Index*), maxCoeff(Index*,Index*), DenseBase::redux()
+ */
+template<typename Derived>
+template<typename Visitor>
+void DenseBase<Derived>::visit(Visitor& visitor) const
+{
+ enum { unroll = SizeAtCompileTime != Dynamic
+ && CoeffReadCost != Dynamic
+ && (SizeAtCompileTime == 1 || internal::functor_traits<Visitor>::Cost != Dynamic)
+ && SizeAtCompileTime * CoeffReadCost + (SizeAtCompileTime-1) * internal::functor_traits<Visitor>::Cost
+ <= EIGEN_UNROLLING_LIMIT };
+ return internal::visitor_impl<Visitor, Derived,
+ unroll ? int(SizeAtCompileTime) : Dynamic
+ >::run(derived(), visitor);
+}
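+
+// Minimal visitor sketch following the interface documented above
+// (`CountAbove` and `m` are illustrative names, not part of Eigen):
+//   struct CountAbove {
+//     double threshold; Eigen::MatrixXd::Index count;
+//     void init(const double& v, Eigen::MatrixXd::Index, Eigen::MatrixXd::Index)
+//     { count = (v > threshold) ? 1 : 0; }
+//     void operator()(const double& v, Eigen::MatrixXd::Index, Eigen::MatrixXd::Index)
+//     { if (v > threshold) ++count; }
+//   };
+//   CountAbove c; c.threshold = 0.5; c.count = 0;
+//   m.visit(c); // c.count now holds the number of coefficients greater than 0.5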
+
+namespace internal {
+
+/** \internal
+ * \brief Base class to implement min and max visitors
+ */
+template <typename Derived>
+struct coeff_visitor
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ Index row, col;
+ Scalar res;
+ inline void init(const Scalar& value, Index i, Index j)
+ {
+ res = value;
+ row = i;
+ col = j;
+ }
+};
+
+/** \internal
+ * \brief Visitor computing the min coefficient with its value and coordinates
+ *
+ * \sa DenseBase::minCoeff(Index*, Index*)
+ */
+template <typename Derived>
+struct min_coeff_visitor : coeff_visitor<Derived>
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ void operator() (const Scalar& value, Index i, Index j)
+ {
+ if(value < this->res)
+ {
+ this->res = value;
+ this->row = i;
+ this->col = j;
+ }
+ }
+};
+
+template<typename Scalar>
+struct functor_traits<min_coeff_visitor<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost
+ };
+};
+
+/** \internal
+ * \brief Visitor computing the max coefficient with its value and coordinates
+ *
+ * \sa DenseBase::maxCoeff(Index*, Index*)
+ */
+template <typename Derived>
+struct max_coeff_visitor : coeff_visitor<Derived>
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ void operator() (const Scalar& value, Index i, Index j)
+ {
+ if(value > this->res)
+ {
+ this->res = value;
+ this->row = i;
+ this->col = j;
+ }
+ }
+};
+
+template<typename Scalar>
+struct functor_traits<max_coeff_visitor<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost
+ };
+};
+
+} // end namespace internal
+
+/** \returns the minimum of all coefficients of *this and puts in *row and *col its location.
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * \sa DenseBase::minCoeff(Index*), DenseBase::maxCoeff(Index*,Index*), DenseBase::visitor(), DenseBase::minCoeff()
+ */
+template<typename Derived>
+template<typename IndexType>
+typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::minCoeff(IndexType* rowId, IndexType* colId) const
+{
+ internal::min_coeff_visitor<Derived> minVisitor;
+ this->visit(minVisitor);
+ *rowId = minVisitor.row;
+ if (colId) *colId = minVisitor.col;
+ return minVisitor.res;
+}
+
+/** \returns the minimum of all coefficients of *this and puts in *index its location.
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::visitor(), DenseBase::minCoeff()
+ */
+template<typename Derived>
+template<typename IndexType>
+typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::minCoeff(IndexType* index) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ internal::min_coeff_visitor<Derived> minVisitor;
+ this->visit(minVisitor);
+ *index = (RowsAtCompileTime==1) ? minVisitor.col : minVisitor.row;
+ return minVisitor.res;
+}
+
+/** \returns the maximum of all coefficients of *this and puts in *row and *col its location.
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * \sa DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visitor(), DenseBase::maxCoeff()
+ */
+template<typename Derived>
+template<typename IndexType>
+typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::maxCoeff(IndexType* rowPtr, IndexType* colPtr) const
+{
+ internal::max_coeff_visitor<Derived> maxVisitor;
+ this->visit(maxVisitor);
+ *rowPtr = maxVisitor.row;
+ if (colPtr) *colPtr = maxVisitor.col;
+ return maxVisitor.res;
+}
+
+/** \returns the maximum of all coefficients of *this and puts in *index its location.
+ * \warning the result is undefined if \c *this contains NaN.
+ *
+ * \sa DenseBase::maxCoeff(IndexType*,IndexType*), DenseBase::minCoeff(IndexType*,IndexType*), DenseBase::visitor(), DenseBase::maxCoeff()
+ */
+template<typename Derived>
+template<typename IndexType>
+typename internal::traits<Derived>::Scalar
+DenseBase<Derived>::maxCoeff(IndexType* index) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ internal::max_coeff_visitor<Derived> maxVisitor;
+ this->visit(maxVisitor);
+ *index = (RowsAtCompileTime==1) ? maxVisitor.col : maxVisitor.row;
+ return maxVisitor.res;
+}
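+
+// Minimal sketch of the coefficient visitors above (names illustrative):
+//   Eigen::MatrixXd::Index i, j;
+//   double lo = m.minCoeff(&i, &j); // smallest coefficient and its (row, column) location
+//   double hi = m.maxCoeff(&i, &j); // largest coefficient and its (row, column) location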
+
+} // end namespace Eigen
+
+#endif // EIGEN_VISITOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AVX/Complex.h b/third_party/eigen3/Eigen/src/Core/arch/AVX/Complex.h
new file mode 100644
index 0000000000..e98c40e1f1
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AVX/Complex.h
@@ -0,0 +1,463 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner (benoit.steiner.goog@gmail.com)
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX_AVX_H
+#define EIGEN_COMPLEX_AVX_H
+
+namespace Eigen {
+
+namespace internal {
+
+//---------- float ----------
+struct Packet4cf
+{
+ EIGEN_STRONG_INLINE Packet4cf() {}
+ EIGEN_STRONG_INLINE explicit Packet4cf(const __m256& a) : v(a) {}
+ __m256 v;
+};
+
+template<> struct packet_traits<std::complex<float> > : default_packet_traits
+{
+ typedef Packet4cf type;
+ typedef Packet2cf half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 4,
+ HasHalfPacket = 1,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet4cf> { typedef std::complex<float> type; enum {size=4}; typedef Packet2cf half; };
+
+template<> EIGEN_STRONG_INLINE Packet4cf padd<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_add_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet4cf psub<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_sub_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet4cf pnegate(const Packet4cf& a)
+{
+ return Packet4cf(pnegate(a.v));
+}
+template<> EIGEN_STRONG_INLINE Packet4cf pconj(const Packet4cf& a)
+{
+ const __m256 mask = _mm256_castsi256_ps(_mm256_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000,0x00000000,0x80000000));
+ return Packet4cf(_mm256_xor_ps(a.v,mask));
+}
+
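+// The complex product (a + bi)(c + di) = (ac - bd) + (ad + bc)i is computed lane-wise below:
+// moveldup duplicates the real parts of `a`, movehdup its imaginary parts, the permute swaps
+// the real/imaginary halves of `b`, and addsub subtracts in the even (real) lanes while
+// adding in the odd (imaginary) lanes.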
+template<> EIGEN_STRONG_INLINE Packet4cf pmul<Packet4cf>(const Packet4cf& a, const Packet4cf& b)
+{
+ __m256 tmp1 = _mm256_mul_ps(_mm256_moveldup_ps(a.v), b.v);
+ __m256 tmp2 = _mm256_mul_ps(_mm256_movehdup_ps(a.v), _mm256_permute_ps(b.v, _MM_SHUFFLE(2,3,0,1)));
+ __m256 result = _mm256_addsub_ps(tmp1, tmp2);
+ return Packet4cf(result);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4cf pand <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_and_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet4cf por <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_or_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet4cf pxor <Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_xor_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet4cf pandnot<Packet4cf>(const Packet4cf& a, const Packet4cf& b) { return Packet4cf(_mm256_andnot_ps(a.v,b.v)); }
+
+template<> EIGEN_STRONG_INLINE Packet4cf pload <Packet4cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet4cf(pload<Packet8f>(&numext::real_ref(*from))); }
+template<> EIGEN_STRONG_INLINE Packet4cf ploadu<Packet4cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet4cf(ploadu<Packet8f>(&numext::real_ref(*from))); }
+
+
+template<> EIGEN_STRONG_INLINE Packet4cf pset1<Packet4cf>(const std::complex<float>& from)
+{
+ return Packet4cf(_mm256_castpd_ps(_mm256_broadcast_sd((const double*)(const void*)&from)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4cf ploaddup<Packet4cf>(const std::complex<float>* from)
+{
+ // FIXME The following might be optimized using _mm256_movedup_pd
+ Packet2cf a = ploaddup<Packet2cf>(from);
+ Packet2cf b = ploaddup<Packet2cf>(from+1);
+ return Packet4cf(_mm256_insertf128_ps(_mm256_castps128_ps256(a.v), b.v, 1));
+}
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float>* to, const Packet4cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float>* to, const Packet4cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), from.v); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet4cf pgather<std::complex<float>, Packet4cf>(const std::complex<float>* from, int stride)
+{
+ return Packet4cf(_mm256_set_ps(std::imag(from[3*stride]), std::real(from[3*stride]),
+ std::imag(from[2*stride]), std::real(from[2*stride]),
+ std::imag(from[1*stride]), std::real(from[1*stride]),
+ std::imag(from[0*stride]), std::real(from[0*stride])));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet4cf>(std::complex<float>* to, const Packet4cf& from, int stride)
+{
+ __m128 low = _mm256_extractf128_ps(from.v, 0);
+ to[stride*0] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(low, low, 0)),
+ _mm_cvtss_f32(_mm_shuffle_ps(low, low, 1)));
+ to[stride*1] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(low, low, 2)),
+ _mm_cvtss_f32(_mm_shuffle_ps(low, low, 3)));
+
+ __m128 high = _mm256_extractf128_ps(from.v, 1);
+ to[stride*2] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(high, high, 0)),
+ _mm_cvtss_f32(_mm_shuffle_ps(high, high, 1)));
+ to[stride*3] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(high, high, 2)),
+ _mm_cvtss_f32(_mm_shuffle_ps(high, high, 3)));
+
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet4cf>(const Packet4cf& a)
+{
+ return pfirst(Packet2cf(_mm256_castps256_ps128(a.v)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4cf preverse(const Packet4cf& a) {
+ __m128 low = _mm256_extractf128_ps(a.v, 0);
+ __m128 high = _mm256_extractf128_ps(a.v, 1);
+ __m128d lowd = _mm_castps_pd(low);
+ __m128d highd = _mm_castps_pd(high);
+ low = _mm_castpd_ps(_mm_shuffle_pd(lowd,lowd,0x1));
+ high = _mm_castpd_ps(_mm_shuffle_pd(highd,highd,0x1));
+ __m256 result = _mm256_setzero_ps();
+ result = _mm256_insertf128_ps(result, low, 1);
+ result = _mm256_insertf128_ps(result, high, 0);
+ return Packet4cf(result);
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet4cf>(const Packet4cf& a)
+{
+ return predux(padd(Packet2cf(_mm256_extractf128_ps(a.v,0)),
+ Packet2cf(_mm256_extractf128_ps(a.v,1))));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4cf preduxp<Packet4cf>(const Packet4cf* vecs)
+{
+ Packet8f t0 = _mm256_shuffle_ps(vecs[0].v, vecs[0].v, _MM_SHUFFLE(3, 1, 2 ,0));
+ Packet8f t1 = _mm256_shuffle_ps(vecs[1].v, vecs[1].v, _MM_SHUFFLE(3, 1, 2 ,0));
+ t0 = _mm256_hadd_ps(t0,t1);
+ Packet8f t2 = _mm256_shuffle_ps(vecs[2].v, vecs[2].v, _MM_SHUFFLE(3, 1, 2 ,0));
+ Packet8f t3 = _mm256_shuffle_ps(vecs[3].v, vecs[3].v, _MM_SHUFFLE(3, 1, 2 ,0));
+ t2 = _mm256_hadd_ps(t2,t3);
+
+ t1 = _mm256_permute2f128_ps(t0,t2, 0 + (2<<4));
+ t3 = _mm256_permute2f128_ps(t0,t2, 1 + (3<<4));
+
+ return Packet4cf(_mm256_add_ps(t1,t3));
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet4cf>(const Packet4cf& a)
+{
+ return predux_mul(pmul(Packet2cf(_mm256_extractf128_ps(a.v, 0)),
+ Packet2cf(_mm256_extractf128_ps(a.v, 1))));
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet4cf>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4cf& first, const Packet4cf& second)
+ {
+ if (Offset==0) return;
+ palign_impl<Offset*2,Packet8f>::run(first.v, second.v);
+ }
+};
+
+template<> struct conj_helper<Packet4cf, Packet4cf, false,true>
+{
+ EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet4cf, Packet4cf, true,false>
+{
+ EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet4cf, Packet4cf, true,true>
+{
+ EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet4cf& y, const Packet4cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& a, const Packet4cf& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
+template<> struct conj_helper<Packet8f, Packet4cf, false,false>
+{
+ EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet8f& x, const Packet4cf& y, const Packet4cf& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet4cf pmul(const Packet8f& x, const Packet4cf& y) const
+ { return Packet4cf(Eigen::internal::pmul(x, y.v)); }
+};
+
+template<> struct conj_helper<Packet4cf, Packet8f, false,false>
+{
+ EIGEN_STRONG_INLINE Packet4cf pmadd(const Packet4cf& x, const Packet8f& y, const Packet4cf& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet4cf pmul(const Packet4cf& x, const Packet8f& y) const
+ { return Packet4cf(Eigen::internal::pmul(x.v, y)); }
+};
+
+template<> EIGEN_STRONG_INLINE Packet4cf pdiv<Packet4cf>(const Packet4cf& a, const Packet4cf& b)
+{
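+ // a/b = (a * conj(b)) / |b|^2. The shuffle with 0xB1 swaps each (re^2, im^2)
+ // pair so the addition yields |b|^2 duplicated in both slots of every complex
+ // number, matching the interleaved layout of the numerator.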
+ Packet4cf num = pmul(a, pconj(b));
+ __m256 tmp = _mm256_mul_ps(b.v, b.v);
+ __m256 tmp2 = _mm256_shuffle_ps(tmp,tmp,0xB1);
+ __m256 denom = _mm256_add_ps(tmp, tmp2);
+ return Packet4cf(_mm256_div_ps(num.v, denom));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4cf pcplxflip<Packet4cf>(const Packet4cf& x)
+{
+ return Packet4cf(_mm256_shuffle_ps(x.v, x.v, _MM_SHUFFLE(2, 3, 0 ,1)));
+}
+
+//---------- double ----------
+struct Packet2cd
+{
+ EIGEN_STRONG_INLINE Packet2cd() {}
+ EIGEN_STRONG_INLINE explicit Packet2cd(const __m256d& a) : v(a) {}
+ __m256d v;
+};
+
+template<> struct packet_traits<std::complex<double> > : default_packet_traits
+{
+ typedef Packet2cd type;
+ typedef Packet1cd half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 0,
+ size = 2,
+ HasHalfPacket = 1,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet2cd> { typedef std::complex<double> type; enum {size=2}; typedef Packet1cd half; };
+
+template<> EIGEN_STRONG_INLINE Packet2cd padd<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_add_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd psub<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_sub_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd pnegate(const Packet2cd& a) { return Packet2cd(pnegate(a.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd pconj(const Packet2cd& a)
+{
+ const __m256d mask = _mm256_castsi256_pd(_mm256_set_epi32(0x80000000,0x0,0x0,0x0,0x80000000,0x0,0x0,0x0));
+ return Packet2cd(_mm256_xor_pd(a.v,mask));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd pmul<Packet2cd>(const Packet2cd& a, const Packet2cd& b)
+{
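+ // Classic complex product via addsub:
+ //   even = (a.re*b.re, a.re*b.im), odd = (a.im*b.im, a.im*b.re),
+ //   addsub(even, odd) = (a.re*b.re - a.im*b.im, a.re*b.im + a.im*b.re).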
+ __m256d tmp1 = _mm256_shuffle_pd(a.v,a.v,0x0);
+ __m256d even = _mm256_mul_pd(tmp1, b.v);
+ __m256d tmp2 = _mm256_shuffle_pd(a.v,a.v,0xF);
+ __m256d tmp3 = _mm256_shuffle_pd(b.v,b.v,0x5);
+ __m256d odd = _mm256_mul_pd(tmp2, tmp3);
+ return Packet2cd(_mm256_addsub_pd(even, odd));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd pand <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_and_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd por <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_or_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd pxor <Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_xor_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cd pandnot<Packet2cd>(const Packet2cd& a, const Packet2cd& b) { return Packet2cd(_mm256_andnot_pd(a.v,b.v)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cd pload <Packet2cd>(const std::complex<double>* from)
+{ EIGEN_DEBUG_ALIGNED_LOAD return Packet2cd(pload<Packet4d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE Packet2cd ploadu<Packet2cd>(const std::complex<double>* from)
+{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cd(ploadu<Packet4d>((const double*)from)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cd pset1<Packet2cd>(const std::complex<double>& from)
+{
+ // In case casting to a __m128d* is really not safe, we can still fall back to this (much slower) version:
+// return Packet2cd(_mm256_loadu2_m128d((const double*)&from,(const double*)&from));
+ return Packet2cd(_mm256_broadcast_pd((const __m128d*)(const void*)&from));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd ploaddup<Packet2cd>(const std::complex<double>* from) { return pset1<Packet2cd>(*from); }
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet2cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet2cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet2cd pgather<std::complex<double>, Packet2cd>(const std::complex<double>* from, int stride)
+{
+ return Packet2cd(_mm256_set_pd(std::imag(from[1*stride]), std::real(from[1*stride]),
+ std::imag(from[0*stride]), std::real(from[0*stride])));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet2cd>(std::complex<double>* to, const Packet2cd& from, int stride)
+{
+ __m128d low = _mm256_extractf128_pd(from.v, 0);
+ to[stride*0] = std::complex<double>(_mm_cvtsd_f64(low), _mm_cvtsd_f64(_mm_shuffle_pd(low, low, 1)));
+ __m128d high = _mm256_extractf128_pd(from.v, 1);
+ to[stride*1] = std::complex<double>(_mm_cvtsd_f64(high), _mm_cvtsd_f64(_mm_shuffle_pd(high, high, 1)));
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet2cd>(const Packet2cd& a)
+{
+ __m128d low = _mm256_extractf128_pd(a.v, 0);
+ EIGEN_ALIGN16 double res[2];
+ _mm_store_pd(res, low);
+ return std::complex<double>(res[0],res[1]);
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd preverse(const Packet2cd& a) {
+ __m256d result = _mm256_permute2f128_pd(a.v, a.v, 1);
+ return Packet2cd(result);
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet2cd>(const Packet2cd& a)
+{
+ return predux(padd(Packet1cd(_mm256_extractf128_pd(a.v,0)),
+ Packet1cd(_mm256_extractf128_pd(a.v,1))));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd preduxp<Packet2cd>(const Packet2cd* vecs)
+{
+ Packet4d t0 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 0 + (2<<4));
+ Packet4d t1 = _mm256_permute2f128_pd(vecs[0].v,vecs[1].v, 1 + (3<<4));
+
+ return Packet2cd(_mm256_add_pd(t0,t1));
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet2cd>(const Packet2cd& a)
+{
+ return predux(pmul(Packet1cd(_mm256_extractf128_pd(a.v,0)),
+ Packet1cd(_mm256_extractf128_pd(a.v,1))));
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet2cd>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2cd& first, const Packet2cd& second)
+ {
+ if (Offset==0) return;
+ palign_impl<Offset*2,Packet4d>::run(first.v, second.v);
+ }
+};
+
+template<> struct conj_helper<Packet2cd, Packet2cd, false,true>
+{
+ EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet2cd, Packet2cd, true,false>
+{
+ EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet2cd, Packet2cd, true,true>
+{
+ EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet2cd& y, const Packet2cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& a, const Packet2cd& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
+template<> struct conj_helper<Packet4d, Packet2cd, false,false>
+{
+ EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet4d& x, const Packet2cd& y, const Packet2cd& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet2cd pmul(const Packet4d& x, const Packet2cd& y) const
+ { return Packet2cd(Eigen::internal::pmul(x, y.v)); }
+};
+
+template<> struct conj_helper<Packet2cd, Packet4d, false,false>
+{
+ EIGEN_STRONG_INLINE Packet2cd pmadd(const Packet2cd& x, const Packet4d& y, const Packet2cd& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet2cd pmul(const Packet2cd& x, const Packet4d& y) const
+ { return Packet2cd(Eigen::internal::pmul(x.v, y)); }
+};
+
+template<> EIGEN_STRONG_INLINE Packet2cd pdiv<Packet2cd>(const Packet2cd& a, const Packet2cd& b)
+{
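+ // a/b = (a * conj(b)) / |b|^2; hadd duplicates re^2+im^2 within each 128-bit
+ // lane so the denominator lines up with both components of the numerator.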
+ Packet2cd num = pmul(a, pconj(b));
+ __m256d tmp = _mm256_mul_pd(b.v, b.v);
+ __m256d denom = _mm256_hadd_pd(tmp, tmp);
+ return Packet2cd(_mm256_div_pd(num.v, denom));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cd pcplxflip<Packet2cd>(const Packet2cd& x)
+{
+ return Packet2cd(_mm256_shuffle_pd(x.v, x.v, 0x5));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4cf,4>& kernel) {
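+ // Treat each complex<float> as a 64-bit double so that shuffle/permute move
+ // whole complex numbers; this is then a plain 4x4 transpose of 64-bit elements.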
+ __m256d P0 = _mm256_castps_pd(kernel.packet[0].v);
+ __m256d P1 = _mm256_castps_pd(kernel.packet[1].v);
+ __m256d P2 = _mm256_castps_pd(kernel.packet[2].v);
+ __m256d P3 = _mm256_castps_pd(kernel.packet[3].v);
+
+ __m256d T0 = _mm256_shuffle_pd(P0, P1, 15);
+ __m256d T1 = _mm256_shuffle_pd(P0, P1, 0);
+ __m256d T2 = _mm256_shuffle_pd(P2, P3, 15);
+ __m256d T3 = _mm256_shuffle_pd(P2, P3, 0);
+
+ kernel.packet[1].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T0, T2, 32));
+ kernel.packet[3].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T0, T2, 49));
+ kernel.packet[0].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T1, T3, 32));
+ kernel.packet[2].v = _mm256_castpd_ps(_mm256_permute2f128_pd(T1, T3, 49));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2cd,2>& kernel) {
+ __m256d tmp = _mm256_permute2f128_pd(kernel.packet[0].v, kernel.packet[1].v, 0+(2<<4));
+ kernel.packet[1].v = _mm256_permute2f128_pd(kernel.packet[0].v, kernel.packet[1].v, 1+(3<<4));
+ kernel.packet[0].v = tmp;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_AVX_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AVX/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/arch/AVX/MathFunctions.h
new file mode 100644
index 0000000000..faa5c79021
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AVX/MathFunctions.h
@@ -0,0 +1,495 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Pedro Gonnet (pedro.gonnet@gmail.com)
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATH_FUNCTIONS_AVX_H
+#define EIGEN_MATH_FUNCTIONS_AVX_H
+
+// For some reason, this function didn't make it into the avxintrin.h
+// used by the compiler, so we'll just wrap it.
+#define _mm256_setr_m128(lo, hi) \
+ _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1)
+
+/* The sin, cos, exp, and log functions of this file are loosely derived from
+ * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
+ */
+
+namespace Eigen {
+
+namespace internal {
+
+// Sine function
+// Computes sin(x) by wrapping x to the interval [-Pi/4,3*Pi/4] and
+// evaluating interpolants in [-Pi/4,Pi/4] or [Pi/4,3*Pi/4]. The interpolants
+// are (anti-)symmetric and thus have only odd/even coefficients
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+psin<Packet8f>(const Packet8f& _x) {
+ Packet8f x = _x;
+
+ // Some useful values.
+ _EIGEN_DECLARE_CONST_Packet8i(one, 1);
+ _EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
+ _EIGEN_DECLARE_CONST_Packet8f(two, 2.0f);
+ _EIGEN_DECLARE_CONST_Packet8f(one_over_four, 0.25f);
+ _EIGEN_DECLARE_CONST_Packet8f(one_over_pi, 3.183098861837907e-01f);
+ _EIGEN_DECLARE_CONST_Packet8f(neg_pi_first, -3.140625000000000e+00);
+ _EIGEN_DECLARE_CONST_Packet8f(neg_pi_second, -9.670257568359375e-04);
+ _EIGEN_DECLARE_CONST_Packet8f(neg_pi_third, -6.278329571784980e-07);
+ _EIGEN_DECLARE_CONST_Packet8f(four_over_pi, 1.273239544735163e+00);
+
+ // Map x from [-Pi/4,3*Pi/4] to z in [-1,3] and subtract the shifted period.
+ Packet8f z = pmul(x, p8f_one_over_pi);
+ Packet8f shift = _mm256_floor_ps(padd(z, p8f_one_over_four));
+ x = pmadd(shift, p8f_neg_pi_first, x);
+ x = pmadd(shift, p8f_neg_pi_second, x);
+ x = pmadd(shift, p8f_neg_pi_third, x);
+ z = pmul(x, p8f_four_over_pi);
+
+ // Make a mask for the entries that need flipping, i.e. wherever the shift
+ // is odd.
+ Packet8i shift_ints = _mm256_cvtps_epi32(shift);
+ Packet8i shift_isodd =
+ (__m256i)_mm256_and_ps((__m256)shift_ints, (__m256)p8i_one);
+#ifdef EIGEN_VECTORIZE_AVX2
+ Packet8i sign_flip_mask = _mm256_slli_epi32(shift_isodd, 31);
+#else
+ __m128i lo =
+ _mm_slli_epi32(_mm256_extractf128_si256((__m256i)shift_isodd, 0), 31);
+ __m128i hi =
+ _mm_slli_epi32(_mm256_extractf128_si256((__m256i)shift_isodd, 1), 31);
+ Packet8i sign_flip_mask = _mm256_setr_m128(lo, hi);
+#endif
+
+ // Create a mask for which interpolant to use, i.e. if z > 1, then the mask
+ // is set to ones for that entry.
+ Packet8f ival_mask = _mm256_cmp_ps(z, p8f_one, _CMP_GT_OQ);
+
+ // Evaluate the polynomial for the interval [1,3] in z.
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_right_0, 9.999999724233232e-01f);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_right_2, -3.084242535619928e-01);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_right_4, 1.584991525700324e-02);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_right_6, -3.188805084631342e-04);
+ Packet8f z_minus_two = psub(z, p8f_two);
+ Packet8f z_minus_two2 = pmul(z_minus_two, z_minus_two);
+ Packet8f right = pmadd(p8f_coeff_right_6, z_minus_two2, p8f_coeff_right_4);
+ right = pmadd(right, z_minus_two2, p8f_coeff_right_2);
+ right = pmadd(right, z_minus_two2, p8f_coeff_right_0);
+
+ // Evaluate the polynomial for the interval [-1,1] in z.
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_left_1, 7.853981525427295e-01);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_left_3, -8.074536727092352e-02);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_left_5, 2.489871967827018e-03);
+ _EIGEN_DECLARE_CONST_Packet8f(coeff_left_7, -3.587725841214251e-05);
+ Packet8f z2 = pmul(z, z);
+ Packet8f left = pmadd(p8f_coeff_left_7, z2, p8f_coeff_left_5);
+ left = pmadd(left, z2, p8f_coeff_left_3);
+ left = pmadd(left, z2, p8f_coeff_left_1);
+ left = pmul(left, z);
+
+ // Assemble the results, i.e. select the left and right polynomials.
+ left = _mm256_andnot_ps(ival_mask, left);
+ right = _mm256_and_ps(ival_mask, right);
+ Packet8f res = _mm256_or_ps(left, right);
+
+ // Flip the sign on the odd intervals and return the result.
+ res = _mm256_xor_ps(res, (__m256)sign_flip_mask);
+ return res;
+}
+
+// Natural logarithm
+// Computes log(x) as log(2^e * m) = C*e + log(m), where the constant C = log(2)
+// and m is in the range [sqrt(1/2),sqrt(2)). In this range, the logarithm can
+// be easily approximated by a polynomial centered on m=1 for stability.
+// TODO(gonnet): Further reduce the interval allowing for lower-degree
+// polynomial interpolants -> ... -> profit!
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+plog<Packet8f>(const Packet8f& _x) {
+ Packet8f x = _x;
+ _EIGEN_DECLARE_CONST_Packet8f(1, 1.0f);
+ _EIGEN_DECLARE_CONST_Packet8f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet8f(126f, 126.0f);
+
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inv_mant_mask, ~0x7f800000);
+
+ // The smallest non denormalized float number.
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(min_norm_pos, 0x00800000);
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(minus_inf, 0xff800000);
+
+ // Polynomial coefficients.
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_SQRTHF, 0.707106781186547524f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p0, 7.0376836292E-2f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p1, -1.1514610310E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p2, 1.1676998740E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p3, -1.2420140846E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p4, +1.4249322787E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p5, -1.6668057665E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p6, +2.0000714765E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p7, -2.4999993993E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_p8, +3.3333331174E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_q1, -2.12194440e-4f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_log_q2, 0.693359375f);
+
+ // invalid_mask is set to true when x is NaN
+ Packet8f invalid_mask = _mm256_cmp_ps(x, _mm256_setzero_ps(), _CMP_NGE_UQ);
+ Packet8f iszero_mask = _mm256_cmp_ps(x, _mm256_setzero_ps(), _CMP_EQ_OQ);
+
+ // Truncate input values to the minimum positive normal.
+ x = pmax(x, p8f_min_norm_pos);
+
+// Extract the shifted exponents (No bitwise shifting in regular AVX, so
+// convert to SSE and do it there).
+#ifdef EIGEN_VECTORIZE_AVX2
+ Packet8f emm0 = _mm256_cvtepi32_ps(_mm256_srli_epi32((__m256i)x, 23));
+#else
+ __m128i lo = _mm_srli_epi32(_mm256_extractf128_si256((__m256i)x, 0), 23);
+ __m128i hi = _mm_srli_epi32(_mm256_extractf128_si256((__m256i)x, 1), 23);
+ Packet8f emm0 = _mm256_cvtepi32_ps(_mm256_setr_m128(lo, hi));
+#endif
+ Packet8f e = _mm256_sub_ps(emm0, p8f_126f);
+
+ // Set the exponents to -1, i.e. x is now in the range [0.5,1).
+ x = _mm256_and_ps(x, p8f_inv_mant_mask);
+ x = _mm256_or_ps(x, p8f_half);
+
+ // part2: Shift the inputs from the range [0.5,1) to [sqrt(1/2),sqrt(2))
+ // and shift by -1. The values are then centered around 0, which improves
+ // the stability of the polynomial evaluation.
+ // if( x < SQRTHF ) {
+ // e -= 1;
+ // x = x + x - 1.0;
+ // } else { x = x - 1.0; }
+ Packet8f mask = _mm256_cmp_ps(x, p8f_cephes_SQRTHF, _CMP_LT_OQ);
+ Packet8f tmp = _mm256_and_ps(x, mask);
+ x = psub(x, p8f_1);
+ e = psub(e, _mm256_and_ps(p8f_1, mask));
+ x = padd(x, tmp);
+
+ Packet8f x2 = pmul(x, x);
+ Packet8f x3 = pmul(x2, x);
+
+ // Evaluate the polynomial approximant of degree 8 in three parts, probably
+ // to improve instruction-level parallelism.
+ Packet8f y, y1, y2;
+ y = pmadd(p8f_cephes_log_p0, x, p8f_cephes_log_p1);
+ y1 = pmadd(p8f_cephes_log_p3, x, p8f_cephes_log_p4);
+ y2 = pmadd(p8f_cephes_log_p6, x, p8f_cephes_log_p7);
+ y = pmadd(y, x, p8f_cephes_log_p2);
+ y1 = pmadd(y1, x, p8f_cephes_log_p5);
+ y2 = pmadd(y2, x, p8f_cephes_log_p8);
+ y = pmadd(y, x3, y1);
+ y = pmadd(y, x3, y2);
+ y = pmul(y, x3);
+
+ // Add the logarithm of the exponent back to the result of the interpolation.
+ y1 = pmul(e, p8f_cephes_log_q1);
+ tmp = pmul(x2, p8f_half);
+ y = padd(y, y1);
+ x = psub(x, tmp);
+ y2 = pmul(e, p8f_cephes_log_q2);
+ x = padd(x, y);
+ x = padd(x, y2);
+
+ // Filter out invalid inputs, i.e. negative arg will be NAN, 0 will be -INF.
+ return _mm256_or_ps(
+ _mm256_andnot_ps(iszero_mask, _mm256_or_ps(x, invalid_mask)),
+ _mm256_and_ps(iszero_mask, p8f_minus_inf));
+}
+
+// Exponential function. Works by writing "x = m*log(2) + r" where
+// "m = floor(x/log(2)+1/2)" and "r" is the remainder. The result is then
+// "exp(x) = 2^m*exp(r)", where "r" is in the range [-log(2)/2, log(2)/2].
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+pexp<Packet8f>(const Packet8f& _x) {
+ _EIGEN_DECLARE_CONST_Packet8f(1, 1.0f);
+ _EIGEN_DECLARE_CONST_Packet8f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet8f(127, 127.0f);
+
+ _EIGEN_DECLARE_CONST_Packet8f(exp_hi, 88.3762626647950f);
+ _EIGEN_DECLARE_CONST_Packet8f(exp_lo, -88.3762626647949f);
+
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_LOG2EF, 1.44269504088896341f);
+
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p0, 1.9875691500E-4f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p1, 1.3981999507E-3f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p2, 8.3334519073E-3f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p3, 4.1665795894E-2f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p4, 1.6666665459E-1f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_p5, 5.0000001201E-1f);
+
+ // Clamp x.
+ Packet8f x = pmax(pmin(_x, p8f_exp_hi), p8f_exp_lo);
+
+ // Express exp(x) as exp(m*ln(2) + r), start by extracting
+ // m = floor(x/ln(2) + 0.5).
+ Packet8f m = _mm256_floor_ps(pmadd(x, p8f_cephes_LOG2EF, p8f_half));
+
+// Get r = x - m*ln(2). If no FMA instructions are available, m*ln(2) is
+// subtracted out in two parts, m*C1+m*C2 = m*ln(2), to avoid accumulating
+// truncation errors. Note that we don't use the "pmadd" function here to
+// ensure that a precision-preserving FMA instruction is used.
+#ifdef EIGEN_VECTORIZE_FMA
+ _EIGEN_DECLARE_CONST_Packet8f(nln2, -0.6931471805599453f);
+ Packet8f r = _mm256_fmadd_ps(m, p8f_nln2, x);
+#else
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_C1, 0.693359375f);
+ _EIGEN_DECLARE_CONST_Packet8f(cephes_exp_C2, -2.12194440e-4f);
+ Packet8f r = psub(x, pmul(m, p8f_cephes_exp_C1));
+ r = psub(r, pmul(m, p8f_cephes_exp_C2));
+#endif
+
+ Packet8f r2 = pmul(r, r);
+
+ // TODO(gonnet): Split into odd/even polynomials and try to exploit
+ // instruction-level parallelism.
+ Packet8f y = p8f_cephes_exp_p0;
+ y = pmadd(y, r, p8f_cephes_exp_p1);
+ y = pmadd(y, r, p8f_cephes_exp_p2);
+ y = pmadd(y, r, p8f_cephes_exp_p3);
+ y = pmadd(y, r, p8f_cephes_exp_p4);
+ y = pmadd(y, r, p8f_cephes_exp_p5);
+ y = pmadd(y, r2, r);
+ y = padd(y, p8f_1);
+
+ // Build emm0 = 2^m.
+ Packet8i emm0 = _mm256_cvttps_epi32(padd(m, p8f_127));
+#ifdef EIGEN_VECTORIZE_AVX2
+ emm0 = _mm256_slli_epi32(emm0, 23);
+#else
+ __m128i lo = _mm_slli_epi32(_mm256_extractf128_si256(emm0, 0), 23);
+ __m128i hi = _mm_slli_epi32(_mm256_extractf128_si256(emm0, 1), 23);
+ emm0 = _mm256_setr_m128(lo, hi);
+#endif
+
+ // Return 2^m * exp(r).
+ return pmax(pmul(y, _mm256_castsi256_ps(emm0)), _x);
+}
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet4d
+pexp<Packet4d>(const Packet4d& _x) {
+ Packet4d x = _x;
+
+ _EIGEN_DECLARE_CONST_Packet4d(1, 1.0);
+ _EIGEN_DECLARE_CONST_Packet4d(2, 2.0);
+ _EIGEN_DECLARE_CONST_Packet4d(half, 0.5);
+
+ _EIGEN_DECLARE_CONST_Packet4d(exp_hi, 709.437);
+ _EIGEN_DECLARE_CONST_Packet4d(exp_lo, -709.436139303);
+
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_LOG2EF, 1.4426950408889634073599);
+
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p0, 1.26177193074810590878e-4);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p1, 3.02994407707441961300e-2);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_p2, 9.99999999999999999910e-1);
+
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q0, 3.00198505138664455042e-6);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q1, 2.52448340349684104192e-3);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q2, 2.27265548208155028766e-1);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_q3, 2.00000000000000000009e0);
+
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_C1, 0.693145751953125);
+ _EIGEN_DECLARE_CONST_Packet4d(cephes_exp_C2, 1.42860682030941723212e-6);
+ _EIGEN_DECLARE_CONST_Packet4i(1023, 1023);
+
+ Packet4d tmp, fx;
+
+ // clamp x
+ x = pmax(pmin(x, p4d_exp_hi), p4d_exp_lo);
+ // Express exp(x) as exp(g + n*log(2)).
+ fx = pmadd(p4d_cephes_LOG2EF, x, p4d_half);
+
+ // Get the integer modulus of log(2), i.e. the "n" described above.
+ fx = _mm256_floor_pd(fx);
+
+ // Get the remainder modulo log(2), i.e. the "g" described above. Subtract
+ // n*log(2) out in two steps, i.e. n*C1 + n*C2, C1+C2=log2 to get the last
+ // digits right.
+ tmp = pmul(fx, p4d_cephes_exp_C1);
+ Packet4d z = pmul(fx, p4d_cephes_exp_C2);
+ x = psub(x, tmp);
+ x = psub(x, z);
+
+ Packet4d x2 = pmul(x, x);
+
+ // Evaluate the numerator polynomial of the rational interpolant.
+ Packet4d px = p4d_cephes_exp_p0;
+ px = pmadd(px, x2, p4d_cephes_exp_p1);
+ px = pmadd(px, x2, p4d_cephes_exp_p2);
+ px = pmul(px, x);
+
+ // Evaluate the denominator polynomial of the rational interpolant.
+ Packet4d qx = p4d_cephes_exp_q0;
+ qx = pmadd(qx, x2, p4d_cephes_exp_q1);
+ qx = pmadd(qx, x2, p4d_cephes_exp_q2);
+ qx = pmadd(qx, x2, p4d_cephes_exp_q3);
+
+ // I don't really get this bit, copied from the SSE2 routines, so...
+ // TODO(gonnet): Figure out what is going on here, perhaps find a better
+ // rational interpolant?
+ x = _mm256_div_pd(px, psub(qx, px));
+ x = pmadd(p4d_2, x, p4d_1);
+
+ // Build e=2^n by constructing the exponents in a 128-bit vector and
+ // shifting them to where they belong in double-precision values.
+ __m128i emm0 = _mm256_cvtpd_epi32(fx);
+ emm0 = _mm_add_epi32(emm0, p4i_1023);
+ emm0 = _mm_shuffle_epi32(emm0, _MM_SHUFFLE(3, 1, 2, 0));
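+ // After the 32-bit shuffle the exponents are ordered (n0, n2, n1, n3); the
+ // 64-bit shifts below place n0,n1 in the low half and n2,n3 in the high half,
+ // each aligned with the exponent field of a double.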
+ __m128i lo = _mm_slli_epi64(emm0, 52);
+ __m128i hi = _mm_slli_epi64(_mm_srli_epi64(emm0, 32), 52);
+ __m256i e = _mm256_insertf128_si256(_mm256_setzero_si256(), lo, 0);
+ e = _mm256_insertf128_si256(e, hi, 1);
+
+ // Construct the result 2^n * exp(g) = e * x. The max is used to catch
+ // non-finite values in the input.
+ return pmax(pmul(x, _mm256_castsi256_pd(e)), _x);
+}
+
+// Functions for sqrt.
+// The EIGEN_FAST_MATH version uses the _mm_rsqrt_ps approximation and one step
+// of Newton's method, at a cost of 1-2 bits of precision as opposed to the
+// exact solution. The main advantage of this approach is not just speed, but
+// also the fact that it can be inlined and pipelined with other computations,
+// further reducing its effective latency.
+#if EIGEN_FAST_MATH
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+psqrt<Packet8f>(const Packet8f& _x) {
+ _EIGEN_DECLARE_CONST_Packet8f(one_point_five, 1.5f);
+ _EIGEN_DECLARE_CONST_Packet8f(minus_half, -0.5f);
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(flt_min, 0x00800000);
+
+ Packet8f neg_half = pmul(_x, p8f_minus_half);
+
+ // select only the inverse sqrt of positive normal inputs (denormals are
+ // flushed to zero and cause infs as well).
+ Packet8f non_zero_mask = _mm256_cmp_ps(_x, p8f_flt_min, _CMP_GE_OQ);
+ Packet8f x = _mm256_and_ps(non_zero_mask, _mm256_rsqrt_ps(_x));
+
+ // Do a single step of Newton's iteration.
+ x = pmul(x, pmadd(neg_half, pmul(x, x), p8f_one_point_five));
+
+ // Multiply the original _x by its reciprocal square root to extract the
+ // square root.
+ return pmul(_x, x);
+}
+#else
+template <>
+EIGEN_STRONG_INLINE Packet8f psqrt<Packet8f>(const Packet8f& x) {
+ return _mm256_sqrt_ps(x);
+}
+#endif
+template <>
+EIGEN_STRONG_INLINE Packet4d psqrt<Packet4d>(const Packet4d& x) {
+ return _mm256_sqrt_pd(x);
+}
+
+// Functions for rsqrt.
+// Almost identical to the sqrt routine, just leave out the last multiplication
+// and fill in NaN/Inf where needed. Note that this function only exists as an
+// iterative version since there is no instruction for directly computing the
+// reciprocal square root in AVX/AVX2 (there will be one in AVX-512).
+#if EIGEN_FAST_MATH
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+prsqrt<Packet8f>(const Packet8f& _x) {
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(nan, 0x7fc00000);
+ _EIGEN_DECLARE_CONST_Packet8f(one_point_five, 1.5f);
+ _EIGEN_DECLARE_CONST_Packet8f(minus_half, -0.5f);
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(flt_min, 0x00800000);
+
+ Packet8f neg_half = pmul(_x, p8f_minus_half);
+
+ // select only the inverse sqrt of positive normal inputs (denormals are
+ // flushed to zero and cause infs as well).
+ Packet8f le_zero_mask = _mm256_cmp_ps(_x, p8f_flt_min, _CMP_LT_OQ);
+ Packet8f x = _mm256_andnot_ps(le_zero_mask, _mm256_rsqrt_ps(_x));
+
+ // Fill in NaNs and Infs for the negative/zero entries.
+ Packet8f neg_mask = _mm256_cmp_ps(_x, _mm256_setzero_ps(), _CMP_LT_OQ);
+ Packet8f zero_mask = _mm256_andnot_ps(neg_mask, le_zero_mask);
+ Packet8f infs_and_nans = _mm256_or_ps(_mm256_and_ps(neg_mask, p8f_nan),
+ _mm256_and_ps(zero_mask, p8f_inf));
+
+ // Do a single step of Newton's iteration.
+ x = pmul(x, pmadd(neg_half, pmul(x, x), p8f_one_point_five));
+
+ // Insert NaNs and Infs in all the right places.
+ return _mm256_or_ps(x, infs_and_nans);
+}
+#else
+template <>
+EIGEN_STRONG_INLINE Packet8f prsqrt<Packet8f>(const Packet8f& x) {
+ _EIGEN_DECLARE_CONST_Packet8f(one, 1.0f);
+ return _mm256_div_ps(p8f_one, _mm256_sqrt_ps(x));
+}
+#endif
+template <>
+EIGEN_STRONG_INLINE Packet4d prsqrt<Packet4d>(const Packet4d& x) {
+ _EIGEN_DECLARE_CONST_Packet4d(one, 1.0);
+ return _mm256_div_pd(p4d_one, _mm256_sqrt_pd(x));
+}
+
+// Functions for division.
+// The EIGEN_FAST_MATH version uses the _mm_rcp_ps approximation and one step of
+// Newton's method, at a cost of 1-2 bits of precision as opposed to the exact
+// solution. The main advantage of this approach is not just speed, but also the
+// fact that it can be inlined and pipelined with other computations, further
+// reducing its effective latency.
+#if EIGEN_FAST_DIV
+template <>
+EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED Packet8f
+pdiv<Packet8f>(const Packet8f& a, const Packet8f& b) {
+ _EIGEN_DECLARE_CONST_Packet8f(two, 2.0f);
+ _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(inf, 0x7f800000);
+
+ Packet8f neg_b = pnegate(b);
+
+ /* select only the inverse of non-zero b */
+ Packet8f non_zero_mask = _mm256_cmp_ps(b, _mm256_setzero_ps(), _CMP_NEQ_OQ);
+ Packet8f x = _mm256_and_ps(non_zero_mask, _mm256_rcp_ps(b));
+
+ /* One step of Newton's method on b - x^-1 == 0. */
+ x = pmul(x, pmadd(neg_b, x, p8f_two));
+
+ /* Return Infs wherever there were zeros. */
+ return pmul(a, _mm256_or_ps(_mm256_and_ps(non_zero_mask, x),
+ _mm256_andnot_ps(non_zero_mask, p8f_inf)));
+}
+#else
+template <>
+EIGEN_STRONG_INLINE Packet8f
+pdiv<Packet8f>(const Packet8f& a, const Packet8f& b) {
+ return _mm256_div_ps(a, b);
+}
+#endif
+template <>
+EIGEN_STRONG_INLINE Packet4d
+pdiv<Packet4d>(const Packet4d& a, const Packet4d& b) {
+ return _mm256_div_pd(a, b);
+}
+template <>
+EIGEN_STRONG_INLINE Packet8i
+pdiv<Packet8i>(const Packet8i& /*a*/, const Packet8i& /*b*/) {
+ eigen_assert(false && "packet integer division is not supported by AVX");
+ return pset1<Packet8i>(0);
+}
+
+// Identical to ptanh in GenericPacketMath.h, but for doubles we use
+// a small/medium approximation threshold of 0.001.
+template<> EIGEN_STRONG_INLINE Packet4d ptanh_approx_threshold() {
+ return pset1<Packet4d>(0.001);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATH_FUNCTIONS_AVX_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AVX/PacketMath.h b/third_party/eigen3/Eigen/src/Core/arch/AVX/PacketMath.h
new file mode 100644
index 0000000000..6369a836ab
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AVX/PacketMath.h
@@ -0,0 +1,650 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner (benoit.steiner.goog@gmail.com)
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PACKET_MATH_AVX_H
+#define EIGEN_PACKET_MATH_AVX_H
+
+namespace Eigen {
+
+namespace internal {
+
+#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
+#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
+#endif
+
+#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
+#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS (2*sizeof(void*))
+#endif
+
+#ifdef __FMA__
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#endif
+#endif
+
+typedef __m256 Packet8f;
+typedef __m256i Packet8i;
+typedef __m256d Packet4d;
+
+template<> struct is_arithmetic<__m256> { enum { value = true }; };
+template<> struct is_arithmetic<__m256i> { enum { value = true }; };
+template<> struct is_arithmetic<__m256d> { enum { value = true }; };
+
+#define _EIGEN_DECLARE_CONST_Packet8f(NAME,X) \
+ const Packet8f p8f_##NAME = pset1<Packet8f>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet8f_FROM_INT(NAME,X) \
+ const Packet8f p8f_##NAME = (__m256)pset1<Packet8i>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet8i(NAME,X) \
+ const Packet8i p8i_##NAME = pset1<Packet8i>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4d(NAME,X) \
+ const Packet4d p4d_##NAME = pset1<Packet4d>(X)
+
+
+template<> struct packet_traits<float> : default_packet_traits
+{
+ typedef Packet8f type;
+ typedef Packet4f half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=8,
+ HasHalfPacket = 1,
+
+ HasDiv = 1,
+ HasSin = 1,
+ HasCos = 0,
+ HasTanH = 1,
+ HasBlend = 1,
+ HasLog = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+ };
+template<> struct packet_traits<double> : default_packet_traits
+{
+ typedef Packet4d type;
+ typedef Packet2d half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 4,
+ HasHalfPacket = 1,
+
+ HasDiv = 1,
+ HasBlend = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+};
+
+/* Proper support for integers is only provided by AVX2. In the meantime, we'll
+ use SSE instructions and packets to deal with integers.
+template<> struct packet_traits<int> : default_packet_traits
+{
+ typedef Packet8i type;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=8
+ };
+};
+*/
+
+template<> struct unpacket_traits<Packet8f> { typedef float type; typedef Packet4f half; enum {size=8}; };
+template<> struct unpacket_traits<Packet4d> { typedef double type; typedef Packet2d half; enum {size=4}; };
+template<> struct unpacket_traits<Packet8i> { typedef int type; typedef Packet4i half; enum {size=8}; };
+
+template<> EIGEN_STRONG_INLINE Packet8f pset1<Packet8f>(const float& from) { return _mm256_set1_ps(from); }
+template<> EIGEN_STRONG_INLINE Packet4d pset1<Packet4d>(const double& from) { return _mm256_set1_pd(from); }
+template<> EIGEN_STRONG_INLINE Packet8i pset1<Packet8i>(const int& from) { return _mm256_set1_epi32(from); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pload1<Packet8f>(const float* from) { return _mm256_broadcast_ss(from); }
+template<> EIGEN_STRONG_INLINE Packet4d pload1<Packet4d>(const double* from) { return _mm256_broadcast_sd(from); }
+
+template<> EIGEN_STRONG_INLINE Packet8f plset<float>(const float& a) { return _mm256_add_ps(_mm256_set1_ps(a), _mm256_set_ps(7.0,6.0,5.0,4.0,3.0,2.0,1.0,0.0)); }
+template<> EIGEN_STRONG_INLINE Packet4d plset<double>(const double& a) { return _mm256_add_pd(_mm256_set1_pd(a), _mm256_set_pd(3.0,2.0,1.0,0.0)); }
+
+template<> EIGEN_STRONG_INLINE Packet8f padd<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_add_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d padd<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_add_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f psub<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_sub_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d psub<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_sub_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f ple<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_NGT_UQ); }
+template<> EIGEN_STRONG_INLINE Packet4d ple<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_NGT_UQ); }
+
+template<> EIGEN_STRONG_INLINE Packet8f plt<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_NGE_UQ); }
+template<> EIGEN_STRONG_INLINE Packet4d plt<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_NGE_UQ); }
+
+template<> EIGEN_STRONG_INLINE Packet8f peq<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_cmp_ps(a,b,_CMP_EQ_UQ); }
+template<> EIGEN_STRONG_INLINE Packet4d peq<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_cmp_pd(a,b,_CMP_EQ_UQ); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pselect<Packet8f>(const Packet8f& a, const Packet8f& b, const Packet8f& false_mask) { return _mm256_blendv_ps(a,b,false_mask); }
+template<> EIGEN_STRONG_INLINE Packet4d pselect<Packet4d>(const Packet4d& a, const Packet4d& b, const Packet4d& false_mask) { return _mm256_blendv_pd(a,b,false_mask); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pnegate(const Packet8f& a)
+{
+ return _mm256_sub_ps(_mm256_set1_ps(0.0),a);
+}
+template<> EIGEN_STRONG_INLINE Packet4d pnegate(const Packet4d& a)
+{
+ return _mm256_sub_pd(_mm256_set1_pd(0.0),a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet8f pconj(const Packet8f& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet4d pconj(const Packet4d& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet8i pconj(const Packet8i& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet8f pmul<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_mul_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pmul<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_mul_pd(a,b); }
+
+#ifdef __FMA__
+template<> EIGEN_STRONG_INLINE Packet8f pmadd(const Packet8f& a, const Packet8f& b, const Packet8f& c) {
+#if EIGEN_COMP_GNUC || EIGEN_COMP_CLANG
+ // clang stupidly generates a vfmadd213ps instruction plus some vmovaps on registers,
+ // and gcc stupidly generates a vfmadd132ps instruction,
+ // so let's enforce it to generate a vfmadd231ps instruction since the most common use case is to accumulate
+ // the result of the product.
+ Packet8f res = c;
+ asm("vfmadd231ps %[a], %[b], %[c]" : [c] "+x" (res) : [a] "x" (a), [b] "x" (b));
+ return res;
+#else
+ return _mm256_fmadd_ps(a,b,c);
+#endif
+}
+template<> EIGEN_STRONG_INLINE Packet4d pmadd(const Packet4d& a, const Packet4d& b, const Packet4d& c) {
+#if EIGEN_COMP_GNUC || EIGEN_COMP_CLANG
+ // see above
+ Packet4d res = c;
+ asm("vfmadd231pd %[a], %[b], %[c]" : [c] "+x" (res) : [a] "x" (a), [b] "x" (b));
+ return res;
+#else
+ return _mm256_fmadd_pd(a,b,c);
+#endif
+}
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet8f pmin<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_min_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pmin<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_min_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pmax<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_max_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pmax<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_max_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pand<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_and_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pand<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_and_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f por<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_or_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d por<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_or_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pxor<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_xor_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pxor<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_xor_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pandnot<Packet8f>(const Packet8f& a, const Packet8f& b) { return _mm256_andnot_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4d pandnot<Packet4d>(const Packet4d& a, const Packet4d& b) { return _mm256_andnot_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet8f pload<Packet8f>(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_ps(from); }
+template<> EIGEN_STRONG_INLINE Packet4d pload<Packet4d>(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_pd(from); }
+template<> EIGEN_STRONG_INLINE Packet8i pload<Packet8i>(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_si256(reinterpret_cast<const __m256i*>(from)); }
+
+template<> EIGEN_STRONG_INLINE Packet8f ploadu<Packet8f>(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_ps(from); }
+template<> EIGEN_STRONG_INLINE Packet4d ploadu<Packet4d>(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_pd(from); }
+template<> EIGEN_STRONG_INLINE Packet8i ploadu<Packet8i>(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(from)); }
+
+// Loads 4 floats from memory and returns the packet {a0, a0, a1, a1, a2, a2, a3, a3}
+template<> EIGEN_STRONG_INLINE Packet8f ploaddup<Packet8f>(const float* from)
+{
+ // TODO try to find a way to avoid the need of a temporary register
+// Packet8f tmp = _mm256_castps128_ps256(_mm_loadu_ps(from));
+// tmp = _mm256_insertf128_ps(tmp, _mm_movehl_ps(_mm256_castps256_ps128(tmp),_mm256_castps256_ps128(tmp)), 1);
+// return _mm256_unpacklo_ps(tmp,tmp);
+
+ // _mm256_insertf128_ps is very slow on Haswell, thus:
+ Packet8f tmp = _mm256_broadcast_ps((const __m128*)(const void*)from);
+ // mimic an "inplace" permutation of the lower 128bits using a blend
+ tmp = _mm256_blend_ps(tmp,_mm256_castps128_ps256(_mm_permute_ps( _mm256_castps256_ps128(tmp), _MM_SHUFFLE(1,0,1,0))), 15);
+ // then we can perform a consistent permutation on the global register to get everything in shape:
+ return _mm256_permute_ps(tmp, _MM_SHUFFLE(3,3,2,2));
+}
+// Loads 2 doubles from memory and returns the packet {a0, a0, a1, a1}
+template<> EIGEN_STRONG_INLINE Packet4d ploaddup<Packet4d>(const double* from)
+{
+ Packet4d tmp = _mm256_broadcast_pd((const __m128d*)(const void*)from);
+ return _mm256_permute_pd(tmp, 3<<2);
+}
+
+// Loads 2 floats from memory and returns the packet {a0, a0, a0, a0, a1, a1, a1, a1}
+template<> EIGEN_STRONG_INLINE Packet8f ploadquad<Packet8f>(const float* from)
+{
+ Packet8f tmp = _mm256_castps128_ps256(_mm_broadcast_ss(from));
+ return _mm256_insertf128_ps(tmp, _mm_broadcast_ss(from+1), 1);
+}
+
+template<> EIGEN_STRONG_INLINE void pstore<float>(float* to, const Packet8f& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_store_ps(to, from); }
+template<> EIGEN_STRONG_INLINE void pstore<double>(double* to, const Packet4d& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_store_pd(to, from); }
+template<> EIGEN_STRONG_INLINE void pstore<int>(int* to, const Packet8i& from) { EIGEN_DEBUG_ALIGNED_STORE _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); }
+
+template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet8f& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_ps(to, from); }
+template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet4d& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_pd(to, from); }
+template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet8i& from) { EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(reinterpret_cast<__m256i*>(to), from); }
+
+// NOTE: leverage _mm256_i32gather_ps and _mm256_i32gather_pd if AVX2 instructions are available
+template<> EIGEN_DEVICE_FUNC inline Packet8f pgather<float, Packet8f>(const float* from, int stride)
+{
+#ifdef EIGEN_VECTORIZE_AVX2
+ return _mm256_i32gather_ps(from, _mm256_set1_epi32(stride), 4);
+#else
+ return _mm256_set_ps(from[7*stride], from[6*stride], from[5*stride], from[4*stride],
+ from[3*stride], from[2*stride], from[1*stride], from[0*stride]);
+#endif
+}
+template<> EIGEN_DEVICE_FUNC inline Packet4d pgather<double, Packet4d>(const double* from, int stride)
+{
+#ifdef EIGEN_VECTORIZE_AVX2
+ return _mm256_i32gather_pd(from, _mm_set1_epi32(stride), 8);
+#else
+ return _mm256_set_pd(from[3*stride], from[2*stride], from[1*stride], from[0*stride]);
+#endif
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet8f>(float* to, const Packet8f& from, int stride)
+{
+ __m128 low = _mm256_extractf128_ps(from, 0);
+ to[stride*0] = _mm_cvtss_f32(low);
+ to[stride*1] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 1));
+ to[stride*2] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 2));
+ to[stride*3] = _mm_cvtss_f32(_mm_shuffle_ps(low, low, 3));
+
+ __m128 high = _mm256_extractf128_ps(from, 1);
+ to[stride*4] = _mm_cvtss_f32(high);
+ to[stride*5] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 1));
+ to[stride*6] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 2));
+ to[stride*7] = _mm_cvtss_f32(_mm_shuffle_ps(high, high, 3));
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<double, Packet4d>(double* to, const Packet4d& from, int stride)
+{
+ __m128d low = _mm256_extractf128_pd(from, 0);
+ to[stride*0] = _mm_cvtsd_f64(low);
+ to[stride*1] = _mm_cvtsd_f64(_mm_shuffle_pd(low, low, 1));
+ __m128d high = _mm256_extractf128_pd(from, 1);
+ to[stride*2] = _mm_cvtsd_f64(high);
+ to[stride*3] = _mm_cvtsd_f64(_mm_shuffle_pd(high, high, 1));
+}
+
+template<> EIGEN_STRONG_INLINE void pstore1<Packet8f>(float* to, const float& a)
+{
+ Packet8f pa = pset1<Packet8f>(a);
+ pstore(to, pa);
+}
+template<> EIGEN_STRONG_INLINE void pstore1<Packet4d>(double* to, const double& a)
+{
+ Packet4d pa = pset1<Packet4d>(a);
+ pstore(to, pa);
+}
+template<> EIGEN_STRONG_INLINE void pstore1<Packet8i>(int* to, const int& a)
+{
+ Packet8i pa = pset1<Packet8i>(a);
+ pstore(to, pa);
+}
+
+template<> EIGEN_STRONG_INLINE void prefetch<float>(const float* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+template<> EIGEN_STRONG_INLINE void prefetch<double>(const double* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+template<> EIGEN_STRONG_INLINE void prefetch<int>(const int* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+
+template<> EIGEN_STRONG_INLINE float pfirst<Packet8f>(const Packet8f& a) {
+ return _mm_cvtss_f32(_mm256_castps256_ps128(a));
+}
+template<> EIGEN_STRONG_INLINE double pfirst<Packet4d>(const Packet4d& a) {
+ return _mm_cvtsd_f64(_mm256_castpd256_pd128(a));
+}
+template<> EIGEN_STRONG_INLINE int pfirst<Packet8i>(const Packet8i& a) {
+ return _mm_cvtsi128_si32(_mm256_castsi256_si128(a));
+}
+
+
+template<> EIGEN_STRONG_INLINE Packet8f preverse(const Packet8f& a)
+{
+ __m256 tmp = _mm256_shuffle_ps(a,a,0x1b);
+ return _mm256_permute2f128_ps(tmp, tmp, 1);
+}
+template<> EIGEN_STRONG_INLINE Packet4d preverse(const Packet4d& a)
+{
+ __m256d tmp = _mm256_shuffle_pd(a,a,5);
+ return _mm256_permute2f128_pd(tmp, tmp, 1);
+
+ // Alternative formulation; unreachable, kept only for reference:
+ // __m256d swap_halves = _mm256_permute2f128_pd(a,a,1);
+ // return _mm256_permute_pd(swap_halves,5);
+}
+
+// pabs should be ok
+template<> EIGEN_STRONG_INLINE Packet8f pabs(const Packet8f& a)
+{
+ const Packet8f mask = _mm256_castsi256_ps(_mm256_setr_epi32(0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF));
+ return _mm256_and_ps(a,mask);
+}
+template<> EIGEN_STRONG_INLINE Packet4d pabs(const Packet4d& a)
+{
+ const Packet4d mask = _mm256_castsi256_pd(_mm256_setr_epi32(0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF));
+ return _mm256_and_pd(a,mask);
+}
+
+// preduxp should be ok
+// FIXME: why is this ok? why isn't the simple implementation working as expected?
+template<> EIGEN_STRONG_INLINE Packet8f preduxp<Packet8f>(const Packet8f* vecs)
+{
+ __m256 hsum1 = _mm256_hadd_ps(vecs[0], vecs[1]);
+ __m256 hsum2 = _mm256_hadd_ps(vecs[2], vecs[3]);
+ __m256 hsum3 = _mm256_hadd_ps(vecs[4], vecs[5]);
+ __m256 hsum4 = _mm256_hadd_ps(vecs[6], vecs[7]);
+
+ __m256 hsum5 = _mm256_hadd_ps(hsum1, hsum1);
+ __m256 hsum6 = _mm256_hadd_ps(hsum2, hsum2);
+ __m256 hsum7 = _mm256_hadd_ps(hsum3, hsum3);
+ __m256 hsum8 = _mm256_hadd_ps(hsum4, hsum4);
+
+ __m256 perm1 = _mm256_permute2f128_ps(hsum5, hsum5, 0x23);
+ __m256 perm2 = _mm256_permute2f128_ps(hsum6, hsum6, 0x23);
+ __m256 perm3 = _mm256_permute2f128_ps(hsum7, hsum7, 0x23);
+ __m256 perm4 = _mm256_permute2f128_ps(hsum8, hsum8, 0x23);
+
+ __m256 sum1 = _mm256_add_ps(perm1, hsum5);
+ __m256 sum2 = _mm256_add_ps(perm2, hsum6);
+ __m256 sum3 = _mm256_add_ps(perm3, hsum7);
+ __m256 sum4 = _mm256_add_ps(perm4, hsum8);
+
+ __m256 blend1 = _mm256_blend_ps(sum1, sum2, 0xcc);
+ __m256 blend2 = _mm256_blend_ps(sum3, sum4, 0xcc);
+
+ __m256 final = _mm256_blend_ps(blend1, blend2, 0xf0);
+ return final;
+}
+template<> EIGEN_STRONG_INLINE Packet4d preduxp<Packet4d>(const Packet4d* vecs)
+{
+ Packet4d tmp0, tmp1;
+
+ tmp0 = _mm256_hadd_pd(vecs[0], vecs[1]);
+ tmp0 = _mm256_add_pd(tmp0, _mm256_permute2f128_pd(tmp0, tmp0, 1));
+
+ tmp1 = _mm256_hadd_pd(vecs[2], vecs[3]);
+ tmp1 = _mm256_add_pd(tmp1, _mm256_permute2f128_pd(tmp1, tmp1, 1));
+
+ return _mm256_blend_pd(tmp0, tmp1, 0xC);
+}
+
+template<> EIGEN_STRONG_INLINE float predux<Packet8f>(const Packet8f& a)
+{
+ Packet8f tmp0 = _mm256_hadd_ps(a,_mm256_permute2f128_ps(a,a,1));
+ tmp0 = _mm256_hadd_ps(tmp0,tmp0);
+ return pfirst(_mm256_hadd_ps(tmp0, tmp0));
+}
+template<> EIGEN_STRONG_INLINE double predux<Packet4d>(const Packet4d& a)
+{
+ Packet4d tmp0 = _mm256_hadd_pd(a,_mm256_permute2f128_pd(a,a,1));
+ return pfirst(_mm256_hadd_pd(tmp0,tmp0));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f predux4<Packet8f>(const Packet8f& a)
+{
+ return _mm_add_ps(_mm256_castps256_ps128(a),_mm256_extractf128_ps(a,1));
+}
+
+template<> EIGEN_STRONG_INLINE float predux_mul<Packet8f>(const Packet8f& a)
+{
+ Packet8f tmp;
+ tmp = _mm256_mul_ps(a, _mm256_permute2f128_ps(a,a,1));
+ tmp = _mm256_mul_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2)));
+ return pfirst(_mm256_mul_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1)));
+}
+template<> EIGEN_STRONG_INLINE double predux_mul<Packet4d>(const Packet4d& a)
+{
+ Packet4d tmp;
+ tmp = _mm256_mul_pd(a, _mm256_permute2f128_pd(a,a,1));
+ return pfirst(_mm256_mul_pd(tmp, _mm256_shuffle_pd(tmp,tmp,1)));
+}
+
+template<> EIGEN_STRONG_INLINE float predux_min<Packet8f>(const Packet8f& a)
+{
+ Packet8f tmp = _mm256_min_ps(a, _mm256_permute2f128_ps(a,a,1));
+ tmp = _mm256_min_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2)));
+ return pfirst(_mm256_min_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1)));
+}
+template<> EIGEN_STRONG_INLINE double predux_min<Packet4d>(const Packet4d& a)
+{
+ Packet4d tmp = _mm256_min_pd(a, _mm256_permute2f128_pd(a,a,1));
+ return pfirst(_mm256_min_pd(tmp, _mm256_shuffle_pd(tmp, tmp, 1)));
+}
+
+template<> EIGEN_STRONG_INLINE float predux_max<Packet8f>(const Packet8f& a)
+{
+ Packet8f tmp = _mm256_max_ps(a, _mm256_permute2f128_ps(a,a,1));
+ tmp = _mm256_max_ps(tmp, _mm256_shuffle_ps(tmp,tmp,_MM_SHUFFLE(1,0,3,2)));
+ return pfirst(_mm256_max_ps(tmp, _mm256_shuffle_ps(tmp,tmp,1)));
+}
+
+template<> EIGEN_STRONG_INLINE double predux_max<Packet4d>(const Packet4d& a)
+{
+ Packet4d tmp = _mm256_max_pd(a, _mm256_permute2f128_pd(a,a,1));
+ return pfirst(_mm256_max_pd(tmp, _mm256_shuffle_pd(tmp, tmp, 1)));
+}
+
+
+template<int Offset>
+struct palign_impl<Offset,Packet8f>
+{
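+ // Replaces first with {first[Offset], ..., first[7], second[0], ..., second[Offset-1]}.
+ // AVX has no cross-lane element shift, so each case is emulated with blends,
+ // in-lane permutes and 128-bit lane swaps.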
+ static EIGEN_STRONG_INLINE void run(Packet8f& first, const Packet8f& second)
+ {
+ if (Offset==1)
+ {
+ first = _mm256_blend_ps(first, second, 1);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(0,3,2,1));
+ first = _mm256_blend_ps(tmp, _mm256_permute2f128_ps (tmp, tmp, 1), 0x88);
+ }
+ else if (Offset==2)
+ {
+ first = _mm256_blend_ps(first, second, 3);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(1,0,3,2));
+ first = _mm256_blend_ps(tmp, _mm256_permute2f128_ps (tmp, tmp, 1), 0xcc);
+ }
+ else if (Offset==3)
+ {
+ first = _mm256_blend_ps(first, second, 7);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(2,1,0,3));
+ first = _mm256_blend_ps(tmp, _mm256_permute2f128_ps (tmp, tmp, 1), 0xee);
+ }
+ else if (Offset==4)
+ {
+ first = _mm256_blend_ps(first, second, 15);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(3,2,1,0));
+ first = _mm256_permute_ps(_mm256_permute2f128_ps (tmp, tmp, 1), _MM_SHUFFLE(3,2,1,0));
+ }
+ else if (Offset==5)
+ {
+ first = _mm256_blend_ps(first, second, 31);
+ first = _mm256_permute2f128_ps(first, first, 1);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(0,3,2,1));
+ first = _mm256_permute2f128_ps(tmp, tmp, 1);
+ first = _mm256_blend_ps(tmp, first, 0x88);
+ }
+ else if (Offset==6)
+ {
+ first = _mm256_blend_ps(first, second, 63);
+ first = _mm256_permute2f128_ps(first, first, 1);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(1,0,3,2));
+ first = _mm256_permute2f128_ps(tmp, tmp, 1);
+ first = _mm256_blend_ps(tmp, first, 0xcc);
+ }
+ else if (Offset==7)
+ {
+ first = _mm256_blend_ps(first, second, 127);
+ first = _mm256_permute2f128_ps(first, first, 1);
+ Packet8f tmp = _mm256_permute_ps (first, _MM_SHUFFLE(2,1,0,3));
+ first = _mm256_permute2f128_ps(tmp, tmp, 1);
+ first = _mm256_blend_ps(tmp, first, 0xee);
+ }
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet4d>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4d& first, const Packet4d& second)
+ {
+ if (Offset==1)
+ {
+ first = _mm256_blend_pd(first, second, 1);
+ __m256d tmp = _mm256_permute_pd(first, 5);
+ first = _mm256_permute2f128_pd(tmp, tmp, 1);
+ first = _mm256_blend_pd(tmp, first, 0xA);
+ }
+ else if (Offset==2)
+ {
+ first = _mm256_blend_pd(first, second, 3);
+ first = _mm256_permute2f128_pd(first, first, 1);
+ }
+ else if (Offset==3)
+ {
+ first = _mm256_blend_pd(first, second, 7);
+ __m256d tmp = _mm256_permute_pd(first, 5);
+ first = _mm256_permute2f128_pd(tmp, tmp, 1);
+ first = _mm256_blend_pd(tmp, first, 5);
+ }
+ }
+};
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet8f,8>& kernel) {
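+ // Standard 8x8 float transpose: interleave pairs of rows (unpacklo/hi), gather
+ // 4-element groups with shuffles, then swap 128-bit lanes to finish.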
+ __m256 T0 = _mm256_unpacklo_ps(kernel.packet[0], kernel.packet[1]);
+ __m256 T1 = _mm256_unpackhi_ps(kernel.packet[0], kernel.packet[1]);
+ __m256 T2 = _mm256_unpacklo_ps(kernel.packet[2], kernel.packet[3]);
+ __m256 T3 = _mm256_unpackhi_ps(kernel.packet[2], kernel.packet[3]);
+ __m256 T4 = _mm256_unpacklo_ps(kernel.packet[4], kernel.packet[5]);
+ __m256 T5 = _mm256_unpackhi_ps(kernel.packet[4], kernel.packet[5]);
+ __m256 T6 = _mm256_unpacklo_ps(kernel.packet[6], kernel.packet[7]);
+ __m256 T7 = _mm256_unpackhi_ps(kernel.packet[6], kernel.packet[7]);
+ __m256 S0 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(1,0,1,0));
+ __m256 S1 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(3,2,3,2));
+ __m256 S2 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(1,0,1,0));
+ __m256 S3 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(3,2,3,2));
+ __m256 S4 = _mm256_shuffle_ps(T4,T6,_MM_SHUFFLE(1,0,1,0));
+ __m256 S5 = _mm256_shuffle_ps(T4,T6,_MM_SHUFFLE(3,2,3,2));
+ __m256 S6 = _mm256_shuffle_ps(T5,T7,_MM_SHUFFLE(1,0,1,0));
+ __m256 S7 = _mm256_shuffle_ps(T5,T7,_MM_SHUFFLE(3,2,3,2));
+ kernel.packet[0] = _mm256_permute2f128_ps(S0, S4, 0x20);
+ kernel.packet[1] = _mm256_permute2f128_ps(S1, S5, 0x20);
+ kernel.packet[2] = _mm256_permute2f128_ps(S2, S6, 0x20);
+ kernel.packet[3] = _mm256_permute2f128_ps(S3, S7, 0x20);
+ kernel.packet[4] = _mm256_permute2f128_ps(S0, S4, 0x31);
+ kernel.packet[5] = _mm256_permute2f128_ps(S1, S5, 0x31);
+ kernel.packet[6] = _mm256_permute2f128_ps(S2, S6, 0x31);
+ kernel.packet[7] = _mm256_permute2f128_ps(S3, S7, 0x31);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet8f,4>& kernel) {
+ __m256 T0 = _mm256_unpacklo_ps(kernel.packet[0], kernel.packet[1]);
+ __m256 T1 = _mm256_unpackhi_ps(kernel.packet[0], kernel.packet[1]);
+ __m256 T2 = _mm256_unpacklo_ps(kernel.packet[2], kernel.packet[3]);
+ __m256 T3 = _mm256_unpackhi_ps(kernel.packet[2], kernel.packet[3]);
+
+ __m256 S0 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(1,0,1,0));
+ __m256 S1 = _mm256_shuffle_ps(T0,T2,_MM_SHUFFLE(3,2,3,2));
+ __m256 S2 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(1,0,1,0));
+ __m256 S3 = _mm256_shuffle_ps(T1,T3,_MM_SHUFFLE(3,2,3,2));
+
+ kernel.packet[0] = _mm256_permute2f128_ps(S0, S1, 0x20);
+ kernel.packet[1] = _mm256_permute2f128_ps(S2, S3, 0x20);
+ kernel.packet[2] = _mm256_permute2f128_ps(S0, S1, 0x31);
+ kernel.packet[3] = _mm256_permute2f128_ps(S2, S3, 0x31);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4d,4>& kernel) {
+ __m256d T0 = _mm256_shuffle_pd(kernel.packet[0], kernel.packet[1], 15);
+ __m256d T1 = _mm256_shuffle_pd(kernel.packet[0], kernel.packet[1], 0);
+ __m256d T2 = _mm256_shuffle_pd(kernel.packet[2], kernel.packet[3], 15);
+ __m256d T3 = _mm256_shuffle_pd(kernel.packet[2], kernel.packet[3], 0);
+
+ kernel.packet[1] = _mm256_permute2f128_pd(T0, T2, 32);
+ kernel.packet[3] = _mm256_permute2f128_pd(T0, T2, 49);
+ kernel.packet[0] = _mm256_permute2f128_pd(T1, T3, 32);
+ kernel.packet[2] = _mm256_permute2f128_pd(T1, T3, 49);
+}
+
+template<> EIGEN_STRONG_INLINE Packet8f pblend(const Selector<8>& ifPacket, const Packet8f& thenPacket, const Packet8f& elsePacket) {
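+ // Build a mask that is all ones where the selector is zero; blendv then picks
+ // elsePacket in those lanes and thenPacket everywhere else.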
+ const __m256 zero = _mm256_setzero_ps();
+ const __m256 select = _mm256_set_ps(ifPacket.select[7], ifPacket.select[6], ifPacket.select[5], ifPacket.select[4], ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]);
+ __m256 false_mask = _mm256_cmp_ps(select, zero, _CMP_EQ_UQ);
+ return _mm256_blendv_ps(thenPacket, elsePacket, false_mask);
+}
+template<> EIGEN_STRONG_INLINE Packet4d pblend(const Selector<4>& ifPacket, const Packet4d& thenPacket, const Packet4d& elsePacket) {
+ const __m256d zero = _mm256_setzero_pd();
+ const __m256d select = _mm256_set_pd(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]);
+ __m256d false_mask = _mm256_cmp_pd(select, zero, _CMP_EQ_UQ);
+ return _mm256_blendv_pd(thenPacket, elsePacket, false_mask);
+}
+
+// Helper functions to print vectors of different types; they make debugging much easier.
+namespace {
+void print4f(const char* name, __m128 val) {
+ float temp[4] __attribute__((aligned(32)));
+ _mm_store_ps(temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 4; k++) printf("%.8e ", temp[k]);
+ printf("\n");
+}
+void print8f(const char* name, __m256 val) {
+ float temp[8] __attribute__((aligned(32)));
+ _mm256_store_ps(temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 8; k++) printf("%.8e ", temp[k]);
+ printf("\n");
+}
+void print4i(const char* name, __m128i val) {
+ int temp[4] __attribute__((aligned(32)));
+ _mm_store_si128((__m128i*)temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 4; k++) printf("%i ", temp[k]);
+ printf("\n");
+}
+void print8i(const char* name, __m256i val) {
+ int temp[8] __attribute__((aligned(32)));
+ _mm256_store_si256((__m256i*)temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 8; k++) printf("%i ", temp[k]);
+ printf("\n");
+}
+void print8b(const char* name, __m256i val) {
+ int temp[8] __attribute__((aligned(32)));
+ _mm256_store_si256((__m256i*)temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 8; k++) printf("0x%08x ", temp[k]);
+ printf("\n");
+}
+void print4d(const char* name, __m256d val) {
+ double temp[4] __attribute__((aligned(32)));
+ _mm256_store_pd(temp, val);
+ printf("%s: ", name);
+ for (int k = 0; k < 4; k++) printf("%.16e ", temp[k]);
+ printf("\n");
+}
+} // end anonymous namespace
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PACKET_MATH_AVX_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AVX/TypeCasting.h b/third_party/eigen3/Eigen/src/Core/arch/AVX/TypeCasting.h
new file mode 100644
index 0000000000..83bfdc604b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AVX/TypeCasting.h
@@ -0,0 +1,51 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TYPE_CASTING_AVX_H
+#define EIGEN_TYPE_CASTING_AVX_H
+
+namespace Eigen {
+
+namespace internal {
+
+// For now we use SSE to handle integers, so we can't use AVX instructions to cast
+// from int to float
+template <>
+struct type_casting_traits<float, int> {
+ enum {
+ VectorizedCast = 0,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+template <>
+struct type_casting_traits<int, float> {
+ enum {
+ VectorizedCast = 0,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+
+
+template<> EIGEN_STRONG_INLINE Packet8i pcast<Packet8f, Packet8i>(const Packet8f& a) {
+ return _mm256_cvtps_epi32(a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet8f pcast<Packet8i, Packet8f>(const Packet8i& a) {
+ return _mm256_cvtepi32_ps(a);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TYPE_CASTING_AVX_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AltiVec/Complex.h b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/Complex.h
new file mode 100644
index 0000000000..57df9508b3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/Complex.h
@@ -0,0 +1,439 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX32_ALTIVEC_H
+#define EIGEN_COMPLEX32_ALTIVEC_H
+
+
+namespace Eigen {
+
+namespace internal {
+
+static Packet4ui p4ui_CONJ_XOR = vec_mergeh((Packet4ui)p4i_ZERO, (Packet4ui)p4f_ZERO_);//{ 0x00000000, 0x80000000, 0x00000000, 0x80000000 };
+#ifdef EIGEN_VECTORIZE_VSX
+#ifdef _BIG_ENDIAN
+static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2d_ZERO_, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 };
+static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_ZERO_, 8);//{ 0x8000000000000000, 0x0000000000000000 };
+#else
+static Packet2ul p2ul_CONJ_XOR1 = (Packet2ul) vec_sld((Packet4ui) p2l_ZERO, (Packet4ui) p2d_ZERO_, 8);//{ 0x8000000000000000, 0x0000000000000000 };
+static Packet2ul p2ul_CONJ_XOR2 = (Packet2ul) vec_sld((Packet4ui) p2d_ZERO_, (Packet4ui) p2l_ZERO, 8);//{ 0x8000000000000000, 0x0000000000000000 };
+#endif
+#endif // EIGEN_VECTORIZE_VSX
+
+//---------- float ----------
+struct Packet2cf
+{
+ EIGEN_STRONG_INLINE Packet2cf() {}
+ EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
+ Packet4f v;
+};
+
+template<> struct packet_traits<std::complex<float> > : default_packet_traits
+{
+ typedef Packet2cf type;
+ typedef Packet2cf half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 2,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2}; typedef Packet2cf half; };
+
+template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
+{
+ Packet2cf res;
+ /* On AltiVec we cannot load 64-bit registers, so we have to take care of alignment */
+ if((ptrdiff_t(&from) % 16) == 0)
+ res.v = pload<Packet4f>((const float *)&from);
+ else
+ res.v = ploadu<Packet4f>((const float *)&from);
+ res.v = vec_perm(res.v, res.v, p16uc_PSET64_HI);
+ return res;
+}
+
+template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, int stride)
+{
+ std::complex<float> EIGEN_ALIGN16 af[2];
+ af[0] = from[0*stride];
+ af[1] = from[1*stride];
+ return Packet2cf(vec_ld(0, (const float*)af));
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(std::complex<float>* to, const Packet2cf& from, int stride)
+{
+ std::complex<float> EIGEN_ALIGN16 af[2];
+ vec_st(from.v, 0, (float*)af);
+ to[0*stride] = af[0];
+ to[1*stride] = af[1];
+}
+
+
+template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_add(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_sub(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate(a.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a) { return Packet2cf((Packet4f)vec_xor((Packet4ui)a.v, p4ui_CONJ_XOR)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ Packet4f v1, v2;
+
+ // Permute and multiply the real parts of a and b
+ v1 = vec_perm(a.v, a.v, p16uc_PSET32_WODD);
+ // Get the imaginary parts of a
+ v2 = vec_perm(a.v, a.v, p16uc_PSET32_WEVEN);
+ // multiply a_re * b
+ v1 = vec_madd(v1, b.v, p4f_ZERO);
+ // multiply a_im * b and get the conjugate result
+ v2 = vec_madd(v2, b.v, p4f_ZERO);
+ v2 = (Packet4f) vec_xor((Packet4ui)v2, p4ui_CONJ_XOR);
+ // permute back to a proper order
+ v2 = vec_perm(v2, v2, p16uc_COMPLEX32_REV);
+
+ return Packet2cf(vec_add(v1, v2));
+}
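+
+// For reference, a sketch of the identity the permute/xor sequence above implements
+// (up to the endian-dependent lane ordering handled by the permute masks):
+//   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
+// v1 holds {ar,ar}*{br,bi} and v2 holds {ai,ai}*{br,bi}; flipping the sign of the
+// imaginary lanes of v2 (p4ui_CONJ_XOR) and swapping its real/imag lanes
+// (p16uc_COMPLEX32_REV) turns it into {-ai*bi, ai*br}, so the final vec_add
+// yields {ar*br - ai*bi, ar*bi + ai*br} for each of the two complex numbers.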
+
+template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_and(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_or(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_xor(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pandnot<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(vec_and(a.v, vec_nor(b.v,b.v))); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf pload <Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload<Packet4f>((const float*)from)); }
+template<> EIGEN_STRONG_INLINE Packet2cf ploadu<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu<Packet4f>((const float*)from)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<float>* from)
+{
+ return pset1<Packet2cf>(*from);
+}
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); }
+
+#ifndef __VSX__
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::complex<float> * addr) { vec_dstt((float *)addr, DST_CTRL(2,2,32), DST_CHAN); }
+#endif
+
+template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
+{
+ std::complex<float> EIGEN_ALIGN16 res[2];
+ pstore((float *)&res, a.v);
+
+ return res[0];
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a)
+{
+ Packet4f rev_a;
+ rev_a = vec_perm(a.v, a.v, p16uc_COMPLEX32_REV2);
+ return Packet2cf(rev_a);
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a)
+{
+ Packet4f b;
+ b = (Packet4f) vec_sld(a.v, a.v, 8);
+ b = padd(a.v, b);
+ return pfirst(Packet2cf(b));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
+{
+ Packet4f b1, b2;
+#ifdef _BIG_ENDIAN
+ b1 = (Packet4f) vec_sld(vecs[0].v, vecs[1].v, 8);
+ b2 = (Packet4f) vec_sld(vecs[1].v, vecs[0].v, 8);
+#else
+ b1 = (Packet4f) vec_sld(vecs[1].v, vecs[0].v, 8);
+ b2 = (Packet4f) vec_sld(vecs[0].v, vecs[1].v, 8);
+#endif
+ b2 = (Packet4f) vec_sld(b2, b2, 8);
+ b2 = padd(b1, b2);
+
+ return Packet2cf(b2);
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
+{
+ Packet4f b;
+ Packet2cf prod;
+ b = (Packet4f) vec_sld(a.v, a.v, 8);
+ prod = pmul(a, Packet2cf(b));
+
+ return pfirst(prod);
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet2cf>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
+ {
+ if (Offset==1)
+ {
+#ifdef _BIG_ENDIAN
+ first.v = vec_sld(first.v, second.v, 8);
+#else
+ first.v = vec_sld(second.v, first.v, 8);
+#endif
+ }
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
+template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ // TODO optimize it for AltiVec
+ Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a,b);
+ Packet4f s = vec_madd(b.v, b.v, p4f_ZERO);
+ return Packet2cf(pdiv(res.v, vec_add(s,vec_perm(s, s, p16uc_COMPLEX32_REV))));
+}
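+
+// For reference, the division above uses the standard identity
+//   a / b = a * conj(b) / (br^2 + bi^2):
+// `res` is a*conj(b), `s` holds the element-wise squares {br^2, bi^2}, and adding
+// `s` to its real/imag-swapped copy broadcasts the denominator br^2 + bi^2 to
+// both lanes of each complex number before the final pdiv.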
+
+template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip<Packet2cf>(const Packet2cf& x)
+{
+ return Packet2cf(vec_perm(x.v, x.v, p16uc_COMPLEX32_REV));
+}
+
+template<> EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet2cf,2>& kernel)
+{
+ Packet4f tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI);
+ kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO);
+ kernel.packet[0].v = tmp;
+}
+
+//---------- double ----------
+#if defined(EIGEN_VECTORIZE_VSX)
+struct Packet1cd
+{
+ EIGEN_STRONG_INLINE Packet1cd() {}
+ EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {}
+ Packet2d v;
+};
+
+template<> struct packet_traits<std::complex<double> > : default_packet_traits
+{
+ typedef Packet1cd type;
+ typedef Packet1cd half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 0,
+ size = 1,
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1}; typedef Packet1cd half; };
+
+template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload<Packet2d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu<Packet2d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
+{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
+
+// Google-local: Change type from DenseIndex to int in patch.
+template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, int/*DenseIndex*/ stride)
+{
+ std::complex<double> EIGEN_ALIGN16 af[2];
+ af[0] = from[0*stride];
+ af[1] = from[1*stride];
+ return pload<Packet1cd>(af);
+}
+// Google-local: Change type from DenseIndex to int in patch.
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, int/*DenseIndex*/ stride)
+{
+ std::complex<double> EIGEN_ALIGN16 af[2];
+ pstore<std::complex<double> >(af, from);
+ to[0*stride] = af[0];
+ to[1*stride] = af[1];
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_add(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_sub(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); }
+template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd((Packet2d)vec_xor((Packet2d)a.v, (Packet2d)p2ul_CONJ_XOR2)); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ Packet2d a_re, a_im, v1, v2;
+
+ // Permute and multiply the real parts of a and b
+ a_re = vec_perm(a.v, a.v, p16uc_PSET64_HI);
+ // Get the imaginary parts of a
+ a_im = vec_perm(a.v, a.v, p16uc_PSET64_LO);
+ // multiply a_re * b
+ v1 = vec_madd(a_re, b.v, p2d_ZERO);
+ // multiply a_im * b and get the conjugate result
+ v2 = vec_madd(a_im, b.v, p2d_ZERO);
+ v2 = (Packet2d) vec_sld((Packet4ui)v2, (Packet4ui)v2, 8);
+ v2 = (Packet2d) vec_xor((Packet2d)v2, (Packet2d) p2ul_CONJ_XOR1);
+
+ return Packet1cd(vec_add(v1, v2));
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_and(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_or(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_xor(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pandnot<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(vec_and(a.v, vec_nor(b.v,b.v))); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from)
+{
+ return pset1<Packet1cd>(*from);
+}
+
+#ifndef __VSX__
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> * addr) { vec_dstt((long *)addr, DST_CTRL(2,2,32), DST_CHAN); }
+#endif
+
+template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
+{
+ std::complex<double> EIGEN_ALIGN16 res[2];
+ pstore<std::complex<double> >(res, a);
+
+ return res[0];
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a)
+{
+ return pfirst(a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs)
+{
+ return vecs[0];
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a)
+{
+ return pfirst(a);
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet1cd>
+{
+ static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
+ {
+ // FIXME: is it guaranteed that we never have to align a Packet1cd?
+ // Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16-byte boundary...
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
+template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ // TODO optimize it for AltiVec
+ Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
+ Packet2d s = vec_madd(b.v, b.v, p2d_ZERO_);
+ return Packet1cd(pdiv(res.v, vec_add(s,vec_perm(s, s, p16uc_REVERSE64))));
+}
+
+EIGEN_STRONG_INLINE Packet1cd pcplxflip/*<Packet1cd>*/(const Packet1cd& x)
+{
+ return Packet1cd(preverse(Packet2d(x.v)));
+}
+
+EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet1cd,2>& kernel)
+{
+ Packet2d tmp = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_HI);
+ kernel.packet[1].v = vec_perm(kernel.packet[0].v, kernel.packet[1].v, p16uc_TRANSPOSE64_LO);
+ kernel.packet[0].v = tmp;
+}
+#endif // EIGEN_VECTORIZE_VSX
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX32_ALTIVEC_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AltiVec/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/MathFunctions.h
new file mode 100644
index 0000000000..e3545b4abc
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/MathFunctions.h
@@ -0,0 +1,299 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007 Julien Pommier
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/* The sin, cos, exp, and log functions of this file come from
+ * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
+ */
+
+#ifndef EIGEN_MATH_FUNCTIONS_ALTIVEC_H
+#define EIGEN_MATH_FUNCTIONS_ALTIVEC_H
+
+#include <iostream>
+
+#define DUMP(v) do { std::cout << #v " = " << (v) << std::endl; } while(0)
+
+namespace Eigen {
+
+namespace internal {
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f plog<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
+ _EIGEN_DECLARE_CONST_Packet4i(23, 23);
+
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inv_mant_mask, ~0x7f800000);
+
+ /* the smallest normalized (non-denormal) float number */
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(min_norm_pos, 0x00800000);
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_inf, 0xff800000); // -1.f/0.f
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_nan, 0xffffffff);
+
+ /* natural logarithm computed for 4 simultaneous floats;
+ returns NaN for x <= 0
+ */
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_SQRTHF, 0.707106781186547524f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p0, 7.0376836292E-2f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p1, - 1.1514610310E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p2, 1.1676998740E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p3, - 1.2420140846E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p4, + 1.4249322787E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p5, - 1.6668057665E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p6, + 2.0000714765E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p7, - 2.4999993993E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p8, + 3.3333331174E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q1, -2.12194440e-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q2, 0.693359375f);
+
+
+ Packet4i emm0;
+
+ /* isvalid_mask is 0 if x < 0 or x is NaN. */
+ Packet4ui isvalid_mask = reinterpret_cast<Packet4ui>(vec_cmpge(x, p4f_ZERO));
+ Packet4ui iszero_mask = reinterpret_cast<Packet4ui>(vec_cmpeq(x, p4f_ZERO));
+
+ x = pmax(x, p4f_min_norm_pos); /* cut off denormalized stuff */
+ emm0 = vec_sr(reinterpret_cast<Packet4i>(x),
+ reinterpret_cast<Packet4ui>(p4i_23));
+
+ /* keep only the fractional part */
+ x = pand(x, p4f_inv_mant_mask);
+ x = por(x, p4f_half);
+
+ emm0 = psub(emm0, p4i_0x7f);
+ Packet4f e = padd(vec_ctf(emm0, 0), p4f_1);
+
+ /* part2:
+ if( x < SQRTHF ) {
+ e -= 1;
+ x = x + x - 1.0;
+ } else { x = x - 1.0; }
+ */
+ Packet4f mask = reinterpret_cast<Packet4f>(vec_cmplt(x, p4f_cephes_SQRTHF));
+ Packet4f tmp = pand(x, mask);
+ x = psub(x, p4f_1);
+ e = psub(e, pand(p4f_1, mask));
+ x = padd(x, tmp);
+
+ Packet4f x2 = pmul(x,x);
+ Packet4f x3 = pmul(x2,x);
+
+ Packet4f y, y1, y2;
+ y = pmadd(p4f_cephes_log_p0, x, p4f_cephes_log_p1);
+ y1 = pmadd(p4f_cephes_log_p3, x, p4f_cephes_log_p4);
+ y2 = pmadd(p4f_cephes_log_p6, x, p4f_cephes_log_p7);
+ y = pmadd(y , x, p4f_cephes_log_p2);
+ y1 = pmadd(y1, x, p4f_cephes_log_p5);
+ y2 = pmadd(y2, x, p4f_cephes_log_p8);
+ y = pmadd(y, x3, y1);
+ y = pmadd(y, x3, y2);
+ y = pmul(y, x3);
+
+ y1 = pmul(e, p4f_cephes_log_q1);
+ tmp = pmul(x2, p4f_half);
+ y = padd(y, y1);
+ x = psub(x, tmp);
+ y2 = pmul(e, p4f_cephes_log_q2);
+ x = padd(x, y);
+ x = padd(x, y2);
+ // a negative arg will yield NaN, 0 will yield -Inf
+ x = vec_sel(x, p4f_minus_inf, iszero_mask);
+ x = vec_sel(p4f_minus_nan, x, isvalid_mask);
+ return x;
+}
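+
+// For reference, a sketch of the Cephes-style algorithm used above: write
+//   x = m * 2^e  with m in [sqrt(1/2), sqrt(2)),  so  log(x) = e*ln(2) + log(m),
+// evaluate log(m) with a polynomial in (m - 1), and split ln(2) into
+// cephes_log_q2 + cephes_log_q1 so the dominant part of e*ln(2) is added via a
+// short, exactly representable constant while the small q1 term carries the
+// remaining correction.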
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f pexp<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
+ _EIGEN_DECLARE_CONST_Packet4i(23, 23);
+
+
+ _EIGEN_DECLARE_CONST_Packet4f(exp_hi, 88.3762626647950f);
+ _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -88.3762626647949f);
+
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f);
+
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500E-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894E-2f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201E-1f);
+
+ Packet4f tmp, fx;
+ Packet4i emm0;
+
+ // clamp x
+ x = vec_max(vec_min(x, p4f_exp_hi), p4f_exp_lo);
+
+ /* express exp(x) as exp(g + n*log(2)) */
+ fx = pmadd(x, p4f_cephes_LOG2EF, p4f_half);
+
+ fx = vec_floor(fx);
+
+ tmp = pmul(fx, p4f_cephes_exp_C1);
+ Packet4f z = pmul(fx, p4f_cephes_exp_C2);
+ x = psub(x, tmp);
+ x = psub(x, z);
+
+ z = pmul(x,x);
+
+ Packet4f y = p4f_cephes_exp_p0;
+ y = pmadd(y, x, p4f_cephes_exp_p1);
+ y = pmadd(y, x, p4f_cephes_exp_p2);
+ y = pmadd(y, x, p4f_cephes_exp_p3);
+ y = pmadd(y, x, p4f_cephes_exp_p4);
+ y = pmadd(y, x, p4f_cephes_exp_p5);
+ y = pmadd(y, z, x);
+ y = padd(y, p4f_1);
+
+ // build 2^n
+ emm0 = vec_cts(fx, 0);
+ emm0 = vec_add(emm0, p4i_0x7f);
+ emm0 = vec_sl(emm0, reinterpret_cast<Packet4ui>(p4i_23));
+
+ // AltiVec's max & min operators silently drop NaNs. Check for NaNs in the
+ // inputs and return them unmodified.
+ Packet4ui isnumber_mask = reinterpret_cast<Packet4ui>(vec_cmpeq(_x, _x));
+ return vec_sel(_x, pmax(pmul(y, reinterpret_cast<Packet4f>(emm0)), _x),
+ isnumber_mask);
+}
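+
+// For reference, a sketch of the range reduction used above:
+//   exp(x) = 2^n * exp(g),  n = round(x * log2(e)),  g = x - n*ln(2),
+// where ln(2) is split into cephes_exp_C1 + cephes_exp_C2 so that g is computed
+// accurately. exp(g) is approximated by the cephes_exp_p* polynomial, and 2^n is
+// built directly in the float bit pattern by adding the bias 0x7f to n and
+// shifting it into the 8-bit exponent field (23 bits up).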
+
+#ifdef __VSX__
+
+#undef GCC_VERSION
+#define GCC_VERSION (__GNUC__ * 10000 \
+ + __GNUC_MINOR__ * 100 \
+ + __GNUC_PATCHLEVEL__)
+
+// VSX support varies between different compilers and even different
+// versions of the same compiler. For gcc version >= 4.9.3, we can use
+// vec_cts to efficiently convert Packet2d to Packet2l. Otherwise, use
+// a slow version that works with older compilers.
+static inline Packet2l ConvertToPacket2l(const Packet2d& x) {
+#if GCC_VERSION >= 40903 || defined(__clang__)
+ return vec_cts(x, 0);
+#else
+ double tmp[2];
+ memcpy(tmp, &x, sizeof(tmp));
+ Packet2l l = { static_cast<long long>(tmp[0]),
+ static_cast<long long>(tmp[1]) };
+ return l;
+#endif
+}
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet2d pexp<Packet2d>(const Packet2d& _x)
+{
+ Packet2d x = _x;
+
+ _EIGEN_DECLARE_CONST_Packet2d(1 , 1.0);
+ _EIGEN_DECLARE_CONST_Packet2d(2 , 2.0);
+ _EIGEN_DECLARE_CONST_Packet2d(half, 0.5);
+
+ _EIGEN_DECLARE_CONST_Packet2d(exp_hi, 709.437);
+ _EIGEN_DECLARE_CONST_Packet2d(exp_lo, -709.436139303);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_LOG2EF, 1.4426950408889634073599);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p0, 1.26177193074810590878e-4);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p1, 3.02994407707441961300e-2);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p2, 9.99999999999999999910e-1);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q0, 3.00198505138664455042e-6);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q1, 2.52448340349684104192e-3);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q2, 2.27265548208155028766e-1);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q3, 2.00000000000000000009e0);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C1, 0.693145751953125);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C2, 1.42860682030941723212e-6);
+
+ Packet2d tmp, fx;
+ Packet2l emm0;
+
+ // clamp x
+ x = pmax(pmin(x, p2d_exp_hi), p2d_exp_lo);
+ /* express exp(x) as exp(g + n*log(2)) */
+ fx = pmadd(p2d_cephes_LOG2EF, x, p2d_half);
+
+ fx = vec_floor(fx);
+
+ tmp = pmul(fx, p2d_cephes_exp_C1);
+ Packet2d z = pmul(fx, p2d_cephes_exp_C2);
+ x = psub(x, tmp);
+ x = psub(x, z);
+
+ Packet2d x2 = pmul(x,x);
+
+ Packet2d px = p2d_cephes_exp_p0;
+ px = pmadd(px, x2, p2d_cephes_exp_p1);
+ px = pmadd(px, x2, p2d_cephes_exp_p2);
+ px = pmul (px, x);
+
+ Packet2d qx = p2d_cephes_exp_q0;
+ qx = pmadd(qx, x2, p2d_cephes_exp_q1);
+ qx = pmadd(qx, x2, p2d_cephes_exp_q2);
+ qx = pmadd(qx, x2, p2d_cephes_exp_q3);
+
+ x = pdiv(px,psub(qx,px));
+ x = pmadd(p2d_2,x,p2d_1);
+
+ // build 2^n
+ emm0 = ConvertToPacket2l(fx);
+
+#ifdef __POWER8_VECTOR__
+ static const Packet2l p2l_1023 = { 1023, 1023 };
+ static const Packet2ul p2ul_52 = { 52, 52 };
+
+ emm0 = vec_add(emm0, p2l_1023);
+ emm0 = vec_sl(emm0, p2ul_52);
+#else
+ // Code is a bit complex for POWER7. There is actually a
+ // vec_xxsldi intrinsic but it is not supported by some gcc versions.
+ // So we shift (52-32) bits and do a word swap with zeros.
+ _EIGEN_DECLARE_CONST_Packet4i(1023, 1023);
+ _EIGEN_DECLARE_CONST_Packet4i(20, 20); // 52 - 32
+
+ Packet4i emm04i = reinterpret_cast<Packet4i>(emm0);
+ emm04i = vec_add(emm04i, p4i_1023);
+ emm04i = vec_sl(emm04i, reinterpret_cast<Packet4ui>(p4i_20));
+ static const Packet16uc perm = {
+ 0x14, 0x15, 0x16, 0x17, 0x00, 0x01, 0x02, 0x03,
+ 0x1c, 0x1d, 0x1e, 0x1f, 0x08, 0x09, 0x0a, 0x0b };
+#ifdef _BIG_ENDIAN
+ emm0 = reinterpret_cast<Packet2l>(vec_perm(p4i_ZERO, emm04i, perm));
+#else
+ emm0 = reinterpret_cast<Packet2l>(vec_perm(emm04i, p4i_ZERO, perm));
+#endif
+
+#endif
+
+ // AltiVec's max & min operators silently drop NaNs. Check for NaNs in the
+ // inputs and return them unmodified.
+ Packet2ul isnumber_mask = reinterpret_cast<Packet2ul>(vec_cmpeq(_x, _x));
+ return vec_sel(_x, pmax(pmul(x, reinterpret_cast<Packet2d>(emm0)), _x),
+ isnumber_mask);
+}
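+
+// For reference, the "build 2^n" step above constructs the double 2^n directly
+// from its IEEE-754 bit pattern: the biased exponent n + 1023 is shifted into
+// bit positions 52..62. On POWER8 this is a single 64-bit add and shift; on the
+// POWER7 path (where the needed 64-bit shift intrinsic may be unavailable), the
+// same value is produced by shifting the 32-bit halves left by 20 (= 52 - 32)
+// and merging them with zero words.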
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATH_FUNCTIONS_ALTIVEC_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/AltiVec/PacketMath.h b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/PacketMath.h
new file mode 100644
index 0000000000..640488e92b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/AltiVec/PacketMath.h
@@ -0,0 +1,943 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Konstantinos Margaritis <markos@codex.gr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PACKET_MATH_ALTIVEC_H
+#define EIGEN_PACKET_MATH_ALTIVEC_H
+
+namespace Eigen {
+
+namespace internal {
+
+#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
+#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 4
+#endif
+
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#endif
+
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
+#endif
+
+// NOTE Altivec has 32 registers, but Eigen only accepts a value of 8 or 16
+#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
+#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 32
+#endif
+
+typedef __vector float Packet4f;
+typedef __vector int Packet4i;
+typedef __vector unsigned int Packet4ui;
+typedef __vector __bool int Packet4bi;
+typedef __vector short int Packet8i;
+typedef __vector unsigned char Packet16uc;
+
+// We don't want to write the same code all the time, but we need to reuse the constants
+// and it doesn't really work to declare them global, so we define macros instead
+
+#define _EIGEN_DECLARE_CONST_FAST_Packet4f(NAME,X) \
+ Packet4f p4f_##NAME = (Packet4f) vec_splat_s32(X)
+
+#define _EIGEN_DECLARE_CONST_FAST_Packet4i(NAME,X) \
+ Packet4i p4i_##NAME = vec_splat_s32(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \
+ Packet4f p4f_##NAME = pset1<Packet4f>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \
+ Packet4i p4i_##NAME = pset1<Packet4i>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet2d(NAME,X) \
+ Packet2d p2d_##NAME = pset1<Packet2d>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet2l(NAME,X) \
+ Packet2l p2l_##NAME = pset1<Packet2l>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \
+ const Packet4f p4f_##NAME = reinterpret_cast<Packet4f>(pset1<Packet4i>(X))
+
+#define DST_CHAN 1
+#define DST_CTRL(size, count, stride) (((size) << 24) | ((count) << 16) | (stride))
+
+// These constants are endian-agnostic
+static _EIGEN_DECLARE_CONST_FAST_Packet4f(ZERO, 0);
+static _EIGEN_DECLARE_CONST_FAST_Packet4i(ZERO, 0);
+#ifndef __VSX__
+static _EIGEN_DECLARE_CONST_FAST_Packet4i(ONE,1);
+static Packet4f p4f_ONE = vec_ctf(p4i_ONE, 0);
+#endif
+static _EIGEN_DECLARE_CONST_FAST_Packet4i(MINUS16,-16);
+static _EIGEN_DECLARE_CONST_FAST_Packet4i(MINUS1,-1);
+static Packet4f p4f_ZERO_ = (Packet4f) vec_sl((Packet4ui)p4i_MINUS1, (Packet4ui)p4i_MINUS1);
+
+static Packet4f p4f_COUNTDOWN = { 0.0, 1.0, 2.0, 3.0 };
+static Packet4i p4i_COUNTDOWN = { 0, 1, 2, 3 };
+
+static Packet16uc p16uc_REVERSE32 = { 12,13,14,15, 8,9,10,11, 4,5,6,7, 0,1,2,3 };
+static Packet16uc p16uc_DUPLICATE32_HI = { 0,1,2,3, 0,1,2,3, 4,5,6,7, 4,5,6,7 };
+
+// Mask alignment
+#ifdef __PPC64__
+#define _EIGEN_MASK_ALIGNMENT 0xfffffffffffffff0
+#else
+#define _EIGEN_MASK_ALIGNMENT 0xfffffff0
+#endif
+
+#define _EIGEN_ALIGNED_PTR(x) ((ptrdiff_t)(x) & _EIGEN_MASK_ALIGNMENT)
+
+// Handle endianness properly while loading constants
+// Define global static constants:
+#ifdef _BIG_ENDIAN
+static Packet16uc p16uc_FORWARD = vec_lvsl(0, (float*)0);
+static Packet16uc p16uc_REVERSE64 = { 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 };
+static Packet16uc p16uc_PSET32_WODD = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 0), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 2), 8);//{ 0,1,2,3, 0,1,2,3, 8,9,10,11, 8,9,10,11 };
+static Packet16uc p16uc_PSET32_WEVEN = vec_sld(p16uc_DUPLICATE32_HI, (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 3), 8);//{ 4,5,6,7, 4,5,6,7, 12,13,14,15, 12,13,14,15 };
+static Packet16uc p16uc_HALF64_0_16 = vec_sld((Packet16uc)p4i_ZERO, vec_splat((Packet16uc) vec_abs(p4i_MINUS16), 3), 8); //{ 0,0,0,0, 0,0,0,0, 16,16,16,16, 16,16,16,16};
+#else
+static Packet16uc p16uc_FORWARD = p16uc_REVERSE32;
+static Packet16uc p16uc_REVERSE64 = { 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 };
+static Packet16uc p16uc_PSET32_WODD = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 1), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 3), 8);//{ 0,1,2,3, 0,1,2,3, 8,9,10,11, 8,9,10,11 };
+static Packet16uc p16uc_PSET32_WEVEN = vec_sld((Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 0), (Packet16uc) vec_splat((Packet4ui)p16uc_FORWARD, 2), 8);//{ 4,5,6,7, 4,5,6,7, 12,13,14,15, 12,13,14,15 };
+static Packet16uc p16uc_HALF64_0_16 = vec_sld(vec_splat((Packet16uc) vec_abs(p4i_MINUS16), 0), (Packet16uc)p4i_ZERO, 8); //{ 0,0,0,0, 0,0,0,0, 16,16,16,16, 16,16,16,16};
+#endif // _BIG_ENDIAN
+
+static Packet16uc p16uc_PSET64_HI = (Packet16uc) vec_mergeh((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 0,1,2,3, 4,5,6,7, 0,1,2,3, 4,5,6,7 };
+static Packet16uc p16uc_PSET64_LO = (Packet16uc) vec_mergel((Packet4ui)p16uc_PSET32_WODD, (Packet4ui)p16uc_PSET32_WEVEN); //{ 8,9,10,11, 12,13,14,15, 8,9,10,11, 12,13,14,15 };
+static Packet16uc p16uc_TRANSPOSE64_HI = vec_add(p16uc_PSET64_HI, p16uc_HALF64_0_16); //{ 0,1,2,3, 4,5,6,7, 16,17,18,19, 20,21,22,23};
+static Packet16uc p16uc_TRANSPOSE64_LO = vec_add(p16uc_PSET64_LO, p16uc_HALF64_0_16); //{ 8,9,10,11, 12,13,14,15, 24,25,26,27, 28,29,30,31};
+
+static Packet16uc p16uc_COMPLEX32_REV = vec_sld(p16uc_REVERSE32, p16uc_REVERSE32, 8); //{ 4,5,6,7, 0,1,2,3, 12,13,14,15, 8,9,10,11 };
+
+#ifdef _BIG_ENDIAN
+static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_FORWARD, p16uc_FORWARD, 8); //{ 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 };
+#else
+static Packet16uc p16uc_COMPLEX32_REV2 = vec_sld(p16uc_PSET64_HI, p16uc_PSET64_LO, 8); //{ 8,9,10,11, 12,13,14,15, 0,1,2,3, 4,5,6,7 };
+#endif // _BIG_ENDIAN
+
+template<> struct packet_traits<float> : default_packet_traits
+{
+ typedef Packet4f type;
+ typedef Packet4f half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4,
+
+ // FIXME check the Has*
+#if defined(__VSX__)
+ HasDiv = 1,
+#endif
+ HasSin = 0,
+ HasCos = 0,
+ HasLog = 1,
+ HasExp = 1,
+ HasSqrt = 0
+ };
+};
+template<> struct packet_traits<int> : default_packet_traits
+{
+ typedef Packet4i type;
+ typedef Packet4i half;
+ enum {
+ // FIXME check the Has*
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4
+ };
+};
+
+
+template<> struct unpacket_traits<Packet4f> { typedef float type; enum {size=4}; typedef Packet4f half; };
+template<> struct unpacket_traits<Packet4i> { typedef int type; enum {size=4}; typedef Packet4i half; };
+
+inline std::ostream & operator <<(std::ostream & s, const Packet16uc & v)
+{
+ union {
+ Packet16uc v;
+ unsigned char n[16];
+ } vt;
+ vt.v = v;
+ for (int i=0; i< 16; i++)
+ s << (int)vt.n[i] << ", ";
+ return s;
+}
+
+inline std::ostream & operator <<(std::ostream & s, const Packet4f & v)
+{
+ union {
+ Packet4f v;
+ float n[4];
+ } vt;
+ vt.v = v;
+ s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3];
+ return s;
+}
+
+inline std::ostream & operator <<(std::ostream & s, const Packet4i & v)
+{
+ union {
+ Packet4i v;
+ int n[4];
+ } vt;
+ vt.v = v;
+ s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3];
+ return s;
+}
+
+inline std::ostream & operator <<(std::ostream & s, const Packet4ui & v)
+{
+ union {
+ Packet4ui v;
+ unsigned int n[4];
+ } vt;
+ vt.v = v;
+ s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3];
+ return s;
+}
+/*
+inline std::ostream & operator <<(std::ostream & s, const Packet4bi & v)
+{
+ union {
+ Packet4bi v;
+ unsigned int n[4];
+ } vt;
+ vt.v = v;
+ s << vt.n[0] << ", " << vt.n[1] << ", " << vt.n[2] << ", " << vt.n[3];
+ return s;
+}*/
+
+
+// Need to define them first or we get specialization after instantiation errors
+template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return vec_ld(0, from); }
+template<> EIGEN_STRONG_INLINE Packet4i pload<Packet4i>(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return vec_ld(0, from); }
+
+template<> EIGEN_STRONG_INLINE void pstore<float>(float* to, const Packet4f& from) { EIGEN_DEBUG_ALIGNED_STORE vec_st(from, 0, to); }
+template<> EIGEN_STRONG_INLINE void pstore<int>(int* to, const Packet4i& from) { EIGEN_DEBUG_ALIGNED_STORE vec_st(from, 0, to); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pset1<Packet4f>(const float& from) {
+ // Taken from http://developer.apple.com/hardwaredrivers/ve/alignment.html
+ float EIGEN_ALIGN16 af[4];
+ af[0] = from;
+ Packet4f vc = pload<Packet4f>(af);
+ vc = vec_splat(vc, 0);
+ return vc;
+}
+
+template<> EIGEN_STRONG_INLINE Packet4i pset1<Packet4i>(const int& from) {
+ int EIGEN_ALIGN16 ai[4];
+ ai[0] = from;
+ Packet4i vc = pload<Packet4i>(ai);
+ vc = vec_splat(vc, 0);
+ return vc;
+}
+template<> EIGEN_STRONG_INLINE void
+pbroadcast4<Packet4f>(const float *a,
+ Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3)
+{
+ a3 = pload<Packet4f>(a);
+ a0 = vec_splat(a3, 0);
+ a1 = vec_splat(a3, 1);
+ a2 = vec_splat(a3, 2);
+ a3 = vec_splat(a3, 3);
+}
+template<> EIGEN_STRONG_INLINE void
+pbroadcast4<Packet4i>(const int *a,
+ Packet4i& a0, Packet4i& a1, Packet4i& a2, Packet4i& a3)
+{
+ a3 = pload<Packet4i>(a);
+ a0 = vec_splat(a3, 0);
+ a1 = vec_splat(a3, 1);
+ a2 = vec_splat(a3, 2);
+ a3 = vec_splat(a3, 3);
+}
+
+template<> EIGEN_DEVICE_FUNC inline Packet4f pgather<float, Packet4f>(const float* from, int stride)
+{
+ float EIGEN_ALIGN16 af[4];
+ af[0] = from[0*stride];
+ af[1] = from[1*stride];
+ af[2] = from[2*stride];
+ af[3] = from[3*stride];
+ return pload<Packet4f>(af);
+}
+template<> EIGEN_DEVICE_FUNC inline Packet4i pgather<int, Packet4i>(const int* from, int stride)
+{
+ int EIGEN_ALIGN16 ai[4];
+ ai[0] = from[0*stride];
+ ai[1] = from[1*stride];
+ ai[2] = from[2*stride];
+ ai[3] = from[3*stride];
+ return pload<Packet4i>(ai);
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet4f>(float* to, const Packet4f& from, int stride)
+{
+ float EIGEN_ALIGN16 af[4];
+ pstore<float>(af, from);
+ to[0*stride] = af[0];
+ to[1*stride] = af[1];
+ to[2*stride] = af[2];
+ to[3*stride] = af[3];
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<int, Packet4i>(int* to, const Packet4i& from, int stride)
+{
+ int EIGEN_ALIGN16 ai[4];
+ pstore<int>((int *)ai, from);
+ to[0*stride] = ai[0];
+ to[1*stride] = ai[1];
+ to[2*stride] = ai[2];
+ to[3*stride] = ai[3];
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f plset<float>(const float& a) { return vec_add(pset1<Packet4f>(a), p4f_COUNTDOWN); }
+template<> EIGEN_STRONG_INLINE Packet4i plset<int>(const int& a) { return vec_add(pset1<Packet4i>(a), p4i_COUNTDOWN); }
+
+template<> EIGEN_STRONG_INLINE Packet4f padd<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_add(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i padd<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_add(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f psub<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_sub(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i psub<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_sub(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { return psub<Packet4f>(p4f_ZERO, a); }
+template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return psub<Packet4i>(p4i_ZERO, a); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmul<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_madd(a,b,p4f_ZERO); }
+/* Commented out: it's actually slower than processing it as scalars
+ *
+template<> EIGEN_STRONG_INLINE Packet4i pmul<Packet4i>(const Packet4i& a, const Packet4i& b)
+{
+ // Detailed in: http://freevec.org/content/32bit_signed_integer_multiplication_altivec
+ // Set up constants, variables
+ Packet4i a1, b1, bswap, low_prod, high_prod, prod, prod_, v1sel;
+
+ // Get the absolute values
+ a1 = vec_abs(a);
+ b1 = vec_abs(b);
+
+ // Get the signs using xor
+ Packet4bi sgn = (Packet4bi) vec_cmplt(vec_xor(a, b), p4i_ZERO);
+
+ // Do the multiplication for the absolute values.
+ bswap = (Packet4i) vec_rl((Packet4ui) b1, (Packet4ui) p4i_MINUS16 );
+ low_prod = vec_mulo((Packet8i) a1, (Packet8i)b1);
+ high_prod = vec_msum((Packet8i) a1, (Packet8i) bswap, p4i_ZERO);
+ high_prod = (Packet4i) vec_sl((Packet4ui) high_prod, (Packet4ui) p4i_MINUS16);
+ prod = vec_add( low_prod, high_prod );
+
+ // NOR the product and select only the negative elements according to the sign mask
+ prod_ = vec_nor(prod, prod);
+ prod_ = vec_sel(p4i_ZERO, prod_, sgn);
+
+ // Add 1 to the result to get the negative numbers
+ v1sel = vec_sel(p4i_ZERO, p4i_ONE, sgn);
+ prod_ = vec_add(prod_, v1sel);
+
+ // Merge the results back to the final vector.
+ prod = vec_sel(prod, prod_, sgn);
+
+ return prod;
+}
+*/
+template<> EIGEN_STRONG_INLINE Packet4f pdiv<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+#if !defined(__VSX__) // VSX actually provides a div instruction
+ Packet4f t, y_0, y_1;
+
+ // AltiVec does not offer a divide instruction, so we have to do a reciprocal approximation
+ y_0 = vec_re(b);
+
+ // Do one Newton-Raphson iteration to get the needed accuracy
+ t = vec_nmsub(y_0, b, p4f_ONE);
+ y_1 = vec_madd(y_0, t, y_0);
+
+ return vec_madd(a, y_1, p4f_ZERO);
+#else
+ return vec_div(a, b);
+#endif
+}
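+
+// For reference, the non-VSX path above is one Newton-Raphson refinement of the
+// hardware reciprocal estimate: with y0 ~ 1/b from vec_re,
+//   t  = 1 - b*y0            (vec_nmsub)
+//   y1 = y0 + y0*t = y0*(2 - b*y0)
+// roughly doubles the number of correct bits in the reciprocal, and a/b is then
+// computed as a*y1.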
+
+template<> EIGEN_STRONG_INLINE Packet4i pdiv<Packet4i>(const Packet4i& /*a*/, const Packet4i& /*b*/)
+{ eigen_assert(false && "packet integer division is not supported by AltiVec");
+ return pset1<Packet4i>(0);
+}
+
+// For some weird reason, it has to be overloaded for packets of integers
+template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vec_madd(a, b, c); }
+template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return padd(pmul(a,b), c); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmin<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_min(a, b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmin<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_min(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmax<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_max(a, b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmax<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_max(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, b); }
+template<> EIGEN_STRONG_INLINE Packet4i pand<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f por<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_or(a, b); }
+template<> EIGEN_STRONG_INLINE Packet4i por<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_or(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pxor<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_xor(a, b); }
+template<> EIGEN_STRONG_INLINE Packet4i pxor<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_xor(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b) { return vec_and(a, vec_nor(b, b)); }
+template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return vec_and(a, vec_nor(b, b)); }
+
+#ifdef _BIG_ENDIAN
+template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from)
+{
+ EIGEN_DEBUG_ALIGNED_LOAD
+ Packet16uc MSQ, LSQ;
+ Packet16uc mask;
+ MSQ = vec_ld(0, (unsigned char *)from); // most significant quadword
+ LSQ = vec_ld(15, (unsigned char *)from); // least significant quadword
+ mask = vec_lvsl(0, from); // create the permute mask
+ return (Packet4f) vec_perm(MSQ, LSQ, mask); // align the data
+
+}
+template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from)
+{
+ EIGEN_DEBUG_ALIGNED_LOAD
+ // Taken from http://developer.apple.com/hardwaredrivers/ve/alignment.html
+ Packet16uc MSQ, LSQ;
+ Packet16uc mask;
+ MSQ = vec_ld(0, (unsigned char *)from); // most significant quadword
+ LSQ = vec_ld(15, (unsigned char *)from); // least significant quadword
+ mask = vec_lvsl(0, from); // create the permute mask
+ return (Packet4i) vec_perm(MSQ, LSQ, mask); // align the data
+}
+#else
+// We also need to redefine little-endian loading of Packet4i/Packet4f using VSX
+template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from)
+{
+ EIGEN_DEBUG_ALIGNED_LOAD
+ return (Packet4i) vec_vsx_ld((long)from & 15, (const Packet4i*) _EIGEN_ALIGNED_PTR(from));
+}
+template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from)
+{
+ EIGEN_DEBUG_ALIGNED_LOAD
+ return (Packet4f) vec_vsx_ld((long)from & 15, (const Packet4f*) _EIGEN_ALIGNED_PTR(from));
+}
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet4f ploaddup<Packet4f>(const float* from)
+{
+ Packet4f p;
+ if((ptrdiff_t(from) % 16) == 0) p = pload<Packet4f>(from);
+ else p = ploadu<Packet4f>(from);
+ return vec_perm(p, p, p16uc_DUPLICATE32_HI);
+}
+template<> EIGEN_STRONG_INLINE Packet4i ploaddup<Packet4i>(const int* from)
+{
+ Packet4i p;
+ if((ptrdiff_t(from) % 16) == 0) p = pload<Packet4i>(from);
+ else p = ploadu<Packet4i>(from);
+ return vec_perm(p, p, p16uc_DUPLICATE32_HI);
+}
+
+#ifdef _BIG_ENDIAN
+template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet4f& from)
+{
+ EIGEN_DEBUG_UNALIGNED_STORE
+ // Taken from http://developer.apple.com/hardwaredrivers/ve/alignment.html
+ // Warning: not thread safe!
+ Packet16uc MSQ, LSQ, edges;
+ Packet16uc edgeAlign, align;
+
+ MSQ = vec_ld(0, (unsigned char *)to); // most significant quadword
+ LSQ = vec_ld(15, (unsigned char *)to); // least significant quadword
+ edgeAlign = vec_lvsl(0, to); // permute map to extract edges
+ edges=vec_perm(LSQ,MSQ,edgeAlign); // extract the edges
+ align = vec_lvsr( 0, to ); // permute map to misalign data
+ MSQ = vec_perm(edges,(Packet16uc)from,align); // misalign the data (MSQ)
+ LSQ = vec_perm((Packet16uc)from,edges,align); // misalign the data (LSQ)
+ vec_st( LSQ, 15, (unsigned char *)to ); // Store the LSQ part first
+ vec_st( MSQ, 0, (unsigned char *)to ); // Store the MSQ part
+}
+template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& from)
+{
+ EIGEN_DEBUG_UNALIGNED_STORE
+ // Taken from http://developer.apple.com/hardwaredrivers/ve/alignment.html
+ // Warning: not thread safe!
+ Packet16uc MSQ, LSQ, edges;
+ Packet16uc edgeAlign, align;
+
+ MSQ = vec_ld(0, (unsigned char *)to); // most significant quadword
+ LSQ = vec_ld(15, (unsigned char *)to); // least significant quadword
+ edgeAlign = vec_lvsl(0, to); // permute map to extract edges
+ edges=vec_perm(LSQ, MSQ, edgeAlign); // extract the edges
+ align = vec_lvsr( 0, to ); // permute map to misalign data
+ MSQ = vec_perm(edges, (Packet16uc) from, align); // misalign the data (MSQ)
+ LSQ = vec_perm((Packet16uc) from, edges, align); // misalign the data (LSQ)
+ vec_st( LSQ, 15, (unsigned char *)to ); // Store the LSQ part first
+ vec_st( MSQ, 0, (unsigned char *)to ); // Store the MSQ part
+}
+#else
+// We also need to redefine little-endian stores of Packet4i/Packet4f using VSX
+template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& from)
+{
+ EIGEN_DEBUG_ALIGNED_STORE
+ vec_vsx_st(from, (long)to & 15, (Packet4i*) _EIGEN_ALIGNED_PTR(to));
+}
+template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet4f& from)
+{
+ EIGEN_DEBUG_ALIGNED_STORE
+ vec_vsx_st(from, (long)to & 15, (Packet4f*) _EIGEN_ALIGNED_PTR(to));
+}
+#endif
+
+#ifndef __VSX__
+template<> EIGEN_STRONG_INLINE void prefetch<float>(const float* addr) { vec_dstt(addr, DST_CTRL(2,2,32), DST_CHAN); }
+template<> EIGEN_STRONG_INLINE void prefetch<int>(const int* addr) { vec_dstt(addr, DST_CTRL(2,2,32), DST_CHAN); }
+#endif
+
+template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { float EIGEN_ALIGN16 x[4]; vec_st(a, 0, x); return x[0]; }
+template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { int EIGEN_ALIGN16 x[4]; vec_st(a, 0, x); return x[0]; }
+
+template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) { return (Packet4f)vec_perm((Packet16uc)a,(Packet16uc)a, p16uc_REVERSE32); }
+template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) { return (Packet4i)vec_perm((Packet16uc)a,(Packet16uc)a, p16uc_REVERSE32); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { return vec_abs(a); }
+template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vec_abs(a); }
+
+template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
+{
+ Packet4f b, sum;
+ b = (Packet4f) vec_sld(a, a, 8);
+ sum = vec_add(a, b);
+ b = (Packet4f) vec_sld(sum, sum, 4);
+ sum = vec_add(sum, b);
+ return pfirst(sum);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
+{
+ Packet4f v[4], sum[4];
+
+ // It's easier and faster to transpose and then add as columns
+ // Check: http://www.freevec.org/function/matrix_4x4_transpose_floats for explanation
+ // Do the transpose, first set of moves
+ v[0] = vec_mergeh(vecs[0], vecs[2]);
+ v[1] = vec_mergel(vecs[0], vecs[2]);
+ v[2] = vec_mergeh(vecs[1], vecs[3]);
+ v[3] = vec_mergel(vecs[1], vecs[3]);
+ // Get the resulting vectors
+ sum[0] = vec_mergeh(v[0], v[2]);
+ sum[1] = vec_mergel(v[0], v[2]);
+ sum[2] = vec_mergeh(v[1], v[3]);
+ sum[3] = vec_mergel(v[1], v[3]);
+
+ // Now do the summation:
+ // Lines 0+1
+ sum[0] = vec_add(sum[0], sum[1]);
+ // Lines 2+3
+ sum[1] = vec_add(sum[2], sum[3]);
+ // Add the results
+ sum[0] = vec_add(sum[0], sum[1]);
+
+ return sum[0];
+}
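+
+// For reference, the transpose-then-add trick above computes, for inputs
+// v0..v3, the vector { sum(v0), sum(v1), sum(v2), sum(v3) }: after the 4x4
+// transpose each intermediate vector holds one element from every input, so
+// three vector adds produce all four horizontal sums at once instead of doing
+// four separate horizontal reductions.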
+
+template<> EIGEN_STRONG_INLINE int predux<Packet4i>(const Packet4i& a)
+{
+ Packet4i sum;
+ sum = vec_sums(a, p4i_ZERO);
+#ifdef _BIG_ENDIAN
+ sum = vec_sld(sum, p4i_ZERO, 12);
+#else
+ sum = vec_sld(p4i_ZERO, sum, 4);
+#endif
+ return pfirst(sum);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs)
+{
+ Packet4i v[4], sum[4];
+
+ // It's easier and faster to transpose and then add as columns
+ // Check: http://www.freevec.org/function/matrix_4x4_transpose_floats for explanation
+ // Do the transpose, first set of moves
+ v[0] = vec_mergeh(vecs[0], vecs[2]);
+ v[1] = vec_mergel(vecs[0], vecs[2]);
+ v[2] = vec_mergeh(vecs[1], vecs[3]);
+ v[3] = vec_mergel(vecs[1], vecs[3]);
+ // Get the resulting vectors
+ sum[0] = vec_mergeh(v[0], v[2]);
+ sum[1] = vec_mergel(v[0], v[2]);
+ sum[2] = vec_mergeh(v[1], v[3]);
+ sum[3] = vec_mergel(v[1], v[3]);
+
+ // Now do the summation:
+ // Lines 0+1
+ sum[0] = vec_add(sum[0], sum[1]);
+ // Lines 2+3
+ sum[1] = vec_add(sum[2], sum[3]);
+ // Add the results
+ sum[0] = vec_add(sum[0], sum[1]);
+
+ return sum[0];
+}
+
+// Other reduction functions:
+// mul
+template<> EIGEN_STRONG_INLINE float predux_mul<Packet4f>(const Packet4f& a)
+{
+ Packet4f prod;
+ prod = pmul(a, (Packet4f)vec_sld(a, a, 8));
+ return pfirst(pmul(prod, (Packet4f)vec_sld(prod, prod, 4)));
+}
+
+template<> EIGEN_STRONG_INLINE int predux_mul<Packet4i>(const Packet4i& a)
+{
+ EIGEN_ALIGN16 int aux[4];
+ pstore(aux, a);
+ return aux[0] * aux[1] * aux[2] * aux[3];
+}
+
+// min
+template<> EIGEN_STRONG_INLINE float predux_min<Packet4f>(const Packet4f& a)
+{
+ Packet4f b, res;
+ b = vec_min(a, vec_sld(a, a, 8));
+ res = vec_min(b, vec_sld(b, b, 4));
+ return pfirst(res);
+}
+
+template<> EIGEN_STRONG_INLINE int predux_min<Packet4i>(const Packet4i& a)
+{
+ Packet4i b, res;
+ b = vec_min(a, vec_sld(a, a, 8));
+ res = vec_min(b, vec_sld(b, b, 4));
+ return pfirst(res);
+}
+
+// max
+template<> EIGEN_STRONG_INLINE float predux_max<Packet4f>(const Packet4f& a)
+{
+ Packet4f b, res;
+ b = vec_max(a, vec_sld(a, a, 8));
+ res = vec_max(b, vec_sld(b, b, 4));
+ return pfirst(res);
+}
+
+template<> EIGEN_STRONG_INLINE int predux_max<Packet4i>(const Packet4i& a)
+{
+ Packet4i b, res;
+ b = vec_max(a, vec_sld(a, a, 8));
+ res = vec_max(b, vec_sld(b, b, 4));
+ return pfirst(res);
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet4f>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4f& first, const Packet4f& second)
+ {
+#ifdef _BIG_ENDIAN
+ switch (Offset % 4) {
+ case 1:
+ first = vec_sld(first, second, 4); break;
+ case 2:
+ first = vec_sld(first, second, 8); break;
+ case 3:
+ first = vec_sld(first, second, 12); break;
+ }
+#else
+ switch (Offset % 4) {
+ case 1:
+ first = vec_sld(second, first, 12); break;
+ case 2:
+ first = vec_sld(second, first, 8); break;
+ case 3:
+ first = vec_sld(second, first, 4); break;
+ }
+#endif
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet4i>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4i& first, const Packet4i& second)
+ {
+#ifdef _BIG_ENDIAN
+ switch (Offset % 4) {
+ case 1:
+ first = vec_sld(first, second, 4); break;
+ case 2:
+ first = vec_sld(first, second, 8); break;
+ case 3:
+ first = vec_sld(first, second, 12); break;
+ }
+#else
+ switch (Offset % 4) {
+ case 1:
+ first = vec_sld(second, first, 12); break;
+ case 2:
+ first = vec_sld(second, first, 8); break;
+ case 3:
+ first = vec_sld(second, first, 4); break;
+ }
+#endif
+ }
+};
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4f,4>& kernel) {
+ Packet4f t0, t1, t2, t3;
+ t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]);
+ t1 = vec_mergel(kernel.packet[0], kernel.packet[2]);
+ t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]);
+ t3 = vec_mergel(kernel.packet[1], kernel.packet[3]);
+ kernel.packet[0] = vec_mergeh(t0, t2);
+ kernel.packet[1] = vec_mergel(t0, t2);
+ kernel.packet[2] = vec_mergeh(t1, t3);
+ kernel.packet[3] = vec_mergel(t1, t3);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4i,4>& kernel) {
+ Packet4i t0, t1, t2, t3;
+ t0 = vec_mergeh(kernel.packet[0], kernel.packet[2]);
+ t1 = vec_mergel(kernel.packet[0], kernel.packet[2]);
+ t2 = vec_mergeh(kernel.packet[1], kernel.packet[3]);
+ t3 = vec_mergel(kernel.packet[1], kernel.packet[3]);
+ kernel.packet[0] = vec_mergeh(t0, t2);
+ kernel.packet[1] = vec_mergel(t0, t2);
+ kernel.packet[2] = vec_mergeh(t1, t3);
+ kernel.packet[3] = vec_mergel(t1, t3);
+}
+
+
+//---------- double ----------
+#if defined(__VSX__)
+typedef __vector double Packet2d;
+typedef __vector unsigned long long Packet2ul;
+typedef __vector long long Packet2l;
+
+static Packet2l p2l_ZERO = (Packet2l) p4i_ZERO;
+static Packet2d p2d_ONE = { 1.0, 1.0 };
+static Packet2d p2d_ZERO = (Packet2d) p4f_ZERO;
+static Packet2d p2d_ZERO_ = { -0.0, -0.0 };
+
+#ifdef _BIG_ENDIAN
+static Packet2d p2d_COUNTDOWN = (Packet2d) vec_sld((Packet16uc) p2d_ZERO, (Packet16uc) p2d_ONE, 8);
+#else
+static Packet2d p2d_COUNTDOWN = (Packet2d) vec_sld((Packet16uc) p2d_ONE, (Packet16uc) p2d_ZERO, 8);
+#endif
+
+static EIGEN_STRONG_INLINE Packet2d vec_splat_dbl(Packet2d& a, int index)
+{
+ switch (index) {
+ case 0:
+ return (Packet2d) vec_perm(a, a, p16uc_PSET64_HI);
+ case 1:
+ return (Packet2d) vec_perm(a, a, p16uc_PSET64_LO);
+ }
+ return a;
+}
+
+template<> struct packet_traits<double> : default_packet_traits
+{
+ typedef Packet2d type;
+ typedef Packet2d half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=2,
+ HasHalfPacket = 0,
+
+ HasDiv = 1,
+ HasExp = 1,
+ HasSqrt = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet2d> { typedef double type; enum {size=2}; typedef Packet2d half; };
+
+
+inline std::ostream & operator <<(std::ostream & s, const Packet2d & v)
+{
+ union {
+ Packet2d v;
+ double n[2];
+ } vt;
+ vt.v = v;
+ s << vt.n[0] << ", " << vt.n[1];
+ return s;
+}
+
+// Need to define them first or we get specialization after instantiation errors
+template<> EIGEN_STRONG_INLINE Packet2d pload<Packet2d>(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return (Packet2d) vec_ld(0, (const float *) from); } //FIXME
+
+template<> EIGEN_STRONG_INLINE void pstore<double>(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE vec_st((Packet4f)from, 0, (float *)to); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pset1<Packet2d>(const double& from) {
+ double EIGEN_ALIGN16 af[2];
+ af[0] = from;
+ Packet2d vc = pload<Packet2d>(af);
+ vc = vec_splat_dbl(vc, 0);
+ return vc;
+}
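+// pbroadcast4 loads four consecutive doubles and splats each one into its own
+// packet: a0 = {a[0], a[0]}, a1 = {a[1], a[1]}, a2 = {a[2], a[2]}, a3 = {a[3], a[3]}.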
+template<> EIGEN_STRONG_INLINE void
+pbroadcast4<Packet2d>(const double *a,
+ Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3)
+{
+ a1 = pload<Packet2d>(a);
+ a0 = vec_splat_dbl(a1, 0);
+ a1 = vec_splat_dbl(a1, 1);
+ a3 = pload<Packet2d>(a+2);
+ a2 = vec_splat_dbl(a3, 0);
+ a3 = vec_splat_dbl(a3, 1);
+}
+// Google-local: Change type from DenseIndex to int in patch.
+template<> EIGEN_DEVICE_FUNC inline Packet2d pgather<double, Packet2d>(const double* from, int/*DenseIndex*/ stride)
+{
+ double EIGEN_ALIGN16 af[2];
+ af[0] = from[0*stride];
+ af[1] = from[1*stride];
+ return pload<Packet2d>(af);
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<double, Packet2d>(double* to, const Packet2d& from, /*DenseIndex*/int stride)
+{
+ double EIGEN_ALIGN16 af[2];
+ pstore<double>(af, from);
+ to[0*stride] = af[0];
+ to[1*stride] = af[1];
+}
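+// The strided gather/scatter above read and write the two lanes `stride` elements
+// apart. For instance, for a hypothetical row-major matrix m with `cols` columns,
+// pgather<double, Packet2d>(&m[i*cols + j], cols) would pack m(i,j) and m(i+1,j),
+// i.e. two consecutive rows of column j.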
+template<> EIGEN_STRONG_INLINE Packet2d plset<double>(const double& a) { return vec_add(pset1<Packet2d>(a), p2d_COUNTDOWN); }
+
+template<> EIGEN_STRONG_INLINE Packet2d padd<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_add(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d psub<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_sub(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { return psub<Packet2d>(p2d_ZERO, a); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet2d pmul<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_madd(a,b,p2d_ZERO); }
+template<> EIGEN_STRONG_INLINE Packet2d pdiv<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_div(a,b); }
+
+// For some weird reason, it has to be overloaded for packets of integers
+template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vec_madd(a, b, c); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pmin<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_min(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pmax<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_max(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pand<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_and(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d por<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_or(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pxor<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_xor(a, b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pandnot<Packet2d>(const Packet2d& a, const Packet2d& b) { return vec_and(a, vec_nor(b, b)); }
+
+template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from)
+{
+ EIGEN_DEBUG_ALIGNED_LOAD
+ return (Packet2d) vec_vsx_ld((long)from & 15, (const Packet2d*) _EIGEN_ALIGNED_PTR(from));
+}
+template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)
+{
+ Packet2d p;
+ if((ptrdiff_t(from) % 16) == 0) p = pload<Packet2d>(from);
+ else p = ploadu<Packet2d>(from);
+ return vec_perm(p, p, p16uc_PSET64_HI);
+}
+
+template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet2d& from)
+{
+ EIGEN_DEBUG_ALIGNED_STORE
+ vec_vsx_st((Packet4f)from, (long)to & 15, (Packet4f*) _EIGEN_ALIGNED_PTR(to));
+}
+
+#ifndef __VSX__
+template<> EIGEN_STRONG_INLINE void prefetch<double>(const double* addr) { vec_dstt((const float *) addr, DST_CTRL(2,2,32), DST_CHAN); }
+#endif
+
+template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { double EIGEN_ALIGN16 x[2]; pstore(x, a); return x[0]; }
+
+template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return (Packet2d)vec_perm((Packet16uc)a,(Packet16uc)a, p16uc_REVERSE64); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vec_abs(a); }
+
+template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a)
+{
+ Packet2d b, sum;
+ b = (Packet2d) vec_sld((Packet4ui) a, (Packet4ui)a, 8);
+ sum = vec_add(a, b);
+ return pfirst(sum);
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs)
+{
+ Packet2d v[2], sum;
+ v[0] = vec_add(vecs[0], (Packet2d) vec_sld((Packet4ui) vecs[0], (Packet4ui) vecs[0], 8));
+ v[1] = vec_add(vecs[1], (Packet2d) vec_sld((Packet4ui) vecs[1], (Packet4ui) vecs[1], 8));
+
+#ifdef _BIG_ENDIAN
+ sum = (Packet2d) vec_sld((Packet4ui) v[0], (Packet4ui) v[1], 8);
+#else
+ sum = (Packet2d) vec_sld((Packet4ui) v[1], (Packet4ui) v[0], 8);
+#endif
+
+ return sum;
+}
+// Other reduction functions:
+// mul
+template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a)
+{
+ return pfirst(pmul(a, (Packet2d)vec_sld((Packet4ui) a, (Packet4ui) a, 8)));
+}
+
+// min
+template<> EIGEN_STRONG_INLINE double predux_min<Packet2d>(const Packet2d& a)
+{
+ return pfirst(vec_min(a, (Packet2d) vec_sld((Packet4ui) a, (Packet4ui) a, 8)));
+}
+
+// max
+template<> EIGEN_STRONG_INLINE double predux_max<Packet2d>(const Packet2d& a)
+{
+ return pfirst(vec_max(a, (Packet2d) vec_sld((Packet4ui) a, (Packet4ui) a, 8)));
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet2d>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2d& first, const Packet2d& second)
+ {
+ if (Offset == 1)
+#ifdef _BIG_ENDIAN
+ first = (Packet2d) vec_sld((Packet4ui) first, (Packet4ui) second, 8);
+#else
+ first = (Packet2d) vec_sld((Packet4ui) second, (Packet4ui) first, 8);
+#endif
+ }
+};
+
+EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2d,2>& kernel) {
+ Packet2d t0, t1;
+ t0 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_HI);
+ t1 = vec_perm(kernel.packet[0], kernel.packet[1], p16uc_TRANSPOSE64_LO);
+ kernel.packet[0] = t0;
+ kernel.packet[1] = t1;
+}
+
+#endif // defined(__VSX__)
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PACKET_MATH_ALTIVEC_H
+
diff --git a/third_party/eigen3/Eigen/src/Core/arch/CUDA/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/arch/CUDA/MathFunctions.h
new file mode 100644
index 0000000000..675daae8f0
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/CUDA/MathFunctions.h
@@ -0,0 +1,75 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATH_FUNCTIONS_CUDA_H
+#define EIGEN_MATH_FUNCTIONS_CUDA_H
+
+namespace Eigen {
+
+namespace internal {
+
+// Make sure this is only available when targeting a GPU: we don't want to
+// introduce conflicts between these packet_traits definitions and the ones
+// we'll use on the host side (SSE, AVX, ...)
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
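+// These specializations simply apply the scalar CUDA device math functions
+// (logf/expf/sqrtf/rsqrtf and their double counterparts) to each lane of a
+// float4 or double2 packet.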
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+float4 plog<float4>(const float4& a)
+{
+ return make_float4(logf(a.x), logf(a.y), logf(a.z), logf(a.w));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+double2 plog<double2>(const double2& a)
+{
+ return make_double2(log(a.x), log(a.y));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+float4 pexp<float4>(const float4& a)
+{
+ return make_float4(expf(a.x), expf(a.y), expf(a.z), expf(a.w));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+double2 pexp<double2>(const double2& a)
+{
+ return make_double2(exp(a.x), exp(a.y));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+float4 psqrt<float4>(const float4& a)
+{
+ return make_float4(sqrtf(a.x), sqrtf(a.y), sqrtf(a.z), sqrtf(a.w));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+double2 psqrt<double2>(const double2& a)
+{
+ return make_double2(sqrt(a.x), sqrt(a.y));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+float4 prsqrt<float4>(const float4& a)
+{
+ return make_float4(rsqrtf(a.x), rsqrtf(a.y), rsqrtf(a.z), rsqrtf(a.w));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+double2 prsqrt<double2>(const double2& a)
+{
+ return make_double2(rsqrt(a.x), rsqrt(a.y));
+}
+
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATH_FUNCTIONS_CUDA_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/CUDA/PacketMath.h b/third_party/eigen3/Eigen/src/Core/arch/CUDA/PacketMath.h
new file mode 100644
index 0000000000..d11f5ba411
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/CUDA/PacketMath.h
@@ -0,0 +1,336 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PACKET_MATH_CUDA_H
+#define EIGEN_PACKET_MATH_CUDA_H
+
+namespace Eigen {
+
+namespace internal {
+// Make sure this is only available when targeting a GPU: we don't want to
+// introduce conflicts between these packet_traits definitions and the ones
+// we'll use on the host side (SSE, AVX, ...)
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+template<> struct is_arithmetic<float4> { enum { value = true }; };
+template<> struct is_arithmetic<double2> { enum { value = true }; };
+
+
+template<> struct packet_traits<float> : default_packet_traits
+{
+ typedef float4 type;
+ typedef float4 half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4,
+ HasHalfPacket = 0,
+
+ HasDiv = 1,
+ HasSin = 0,
+ HasCos = 0,
+ HasLog = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+
+ HasBlend = 0,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+};
+
+template<> struct packet_traits<double> : default_packet_traits
+{
+ typedef double2 type;
+ typedef double2 half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=2,
+ HasHalfPacket = 0,
+
+ HasDiv = 1,
+ HasLog = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+
+ HasBlend = 0,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+};
+
+
+template<> struct unpacket_traits<float4> { typedef float type; enum {size=4}; typedef float4 half; };
+template<> struct unpacket_traits<double2> { typedef double type; enum {size=2}; typedef double2 half; };
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pset1<float4>(const float& from) {
+ return make_float4(from, from, from, from);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pset1<double2>(const double& from) {
+ return make_double2(from, from);
+}
+
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plset<float>(const float& a) {
+ return make_float4(a, a+1, a+2, a+3);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plset<double>(const double& a) {
+ return make_double2(a, a+1);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 padd<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x+b.x, a.y+b.y, a.z+b.z, a.w+b.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 padd<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x+b.x, a.y+b.y);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 psub<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x-b.x, a.y-b.y, a.z-b.z, a.w-b.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 psub<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x-b.x, a.y-b.y);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 peq<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x == b.x ? 1.f : 0, a.y == b.y ? 1.f : 0, a.z == b.z ? 1.f : 0, a.w == b.w ? 1.f : 0);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 peq<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x == b.x ? 1. : 0, a.y == b.y ? 1. : 0);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ple<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x <= b.x ? 1.f : 0, a.y <= b.y ? 1.f : 0, a.z <= b.z ? 1.f : 0, a.w <= b.w ? 1.f : 0);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ple<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x <= b.x ? 1. : 0, a.y <= b.y ? 1. : 0);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 plt<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x < b.x ? 1.f : 0, a.y < b.y ? 1.f : 0, a.z < b.z ? 1.f : 0, a.w < b.w ? 1.f : 0);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 plt<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x < b.x ? 1. : 0, a.y < b.y ? 1. : 0);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pselect<float4>(const float4& a, const float4& b, const float4& c) {
+ return make_float4(c.x ? b.x : a.x, c.y ? b.y : a.y, c.z ? b.z : a.z, c.w ? b.w : a.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pselect<double2>(const double2& a, const double2& b, const double2& c) {
+ return make_double2(c.x ? b.x : a.x, c.y ? b.y : a.y);
+}
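+// The comparison packets above hold 1.0 in lanes where the predicate is true and
+// 0.0 elsewhere, and pselect(a, b, mask) returns b in lanes where mask is nonzero
+// and a otherwise. For example, pselect(a, b, ple(x, y)) keeps the lane of b
+// wherever the corresponding lane of x is <= the lane of y.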
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pnegate(const float4& a) {
+ return make_float4(-a.x, -a.y, -a.z, -a.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pnegate(const double2& a) {
+ return make_double2(-a.x, -a.y);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pconj(const float4& a) { return a; }
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pconj(const double2& a) { return a; }
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmul<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x*b.x, a.y*b.y, a.z*b.z, a.w*b.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmul<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x*b.x, a.y*b.y);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pdiv<float4>(const float4& a, const float4& b) {
+ return make_float4(a.x/b.x, a.y/b.y, a.z/b.z, a.w/b.w);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pdiv<double2>(const double2& a, const double2& b) {
+ return make_double2(a.x/b.x, a.y/b.y);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmin<float4>(const float4& a, const float4& b) {
+ return make_float4(fminf(a.x, b.x), fminf(a.y, b.y), fminf(a.z, b.z), fminf(a.w, b.w));
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmin<double2>(const double2& a, const double2& b) {
+ return make_double2(fmin(a.x, b.x), fmin(a.y, b.y));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pmax<float4>(const float4& a, const float4& b) {
+ return make_float4(fmaxf(a.x, b.x), fmaxf(a.y, b.y), fmaxf(a.z, b.z), fmaxf(a.w, b.w));
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pmax<double2>(const double2& a, const double2& b) {
+ return make_double2(fmax(a.x, b.x), fmax(a.y, b.y));
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 pload<float4>(const float* from) {
+ return *reinterpret_cast<const float4*>(from);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 pload<double2>(const double* from) {
+ return *reinterpret_cast<const double2*>(from);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE float4 ploadu<float4>(const float* from) {
+ return make_float4(from[0], from[1], from[2], from[3]);
+}
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE double2 ploadu<double2>(const double* from) {
+ return make_double2(from[0], from[1]);
+}
+
+template<> EIGEN_STRONG_INLINE float4 ploaddup<float4>(const float* from) {
+ return make_float4(from[0], from[0], from[1], from[1]);
+}
+template<> EIGEN_STRONG_INLINE double2 ploaddup<double2>(const double* from) {
+ return make_double2(from[0], from[0]);
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore<float>(float* to, const float4& from) {
+ *reinterpret_cast<float4*>(to) = from;
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstore<double>(double* to, const double2& from) {
+ *reinterpret_cast<double2*>(to) = from;
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const float4& from) {
+ to[0] = from.x;
+ to[1] = from.y;
+ to[2] = from.z;
+ to[3] = from.w;
+}
+
+template<> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const double2& from) {
+ to[0] = from.x;
+ to[1] = from.y;
+}
+
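+// On devices of compute capability 3.5 and above, ploadt_ro uses __ldg to load
+// through the read-only data cache, which can speed up loads of data that is
+// immutable for the lifetime of the kernel.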
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
+template<>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro<float4, Aligned>(const float* from) {
+ return __ldg((const float4*)from);
+}
+template<>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro<double2, Aligned>(const double* from) {
+ return __ldg((const double2*)from);
+}
+
+template<>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE float4 ploadt_ro<float4, Unaligned>(const float* from) {
+ return make_float4(__ldg(from+0), __ldg(from+1), __ldg(from+2), __ldg(from+3));
+}
+template<>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE double2 ploadt_ro<double2, Unaligned>(const double* from) {
+ return make_double2(__ldg(from+0), __ldg(from+1));
+}
+#endif
+
+template<> EIGEN_DEVICE_FUNC inline float4 pgather<float, float4>(const float* from, int stride) {
+ return make_float4(from[0*stride], from[1*stride], from[2*stride], from[3*stride]);
+}
+
+template<> EIGEN_DEVICE_FUNC inline double2 pgather<double, double2>(const double* from, int stride) {
+ return make_double2(from[0*stride], from[1*stride]);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<float, float4>(float* to, const float4& from, int stride) {
+ to[stride*0] = from.x;
+ to[stride*1] = from.y;
+ to[stride*2] = from.z;
+ to[stride*3] = from.w;
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<double, double2>(double* to, const double2& from, int stride) {
+ to[stride*0] = from.x;
+ to[stride*1] = from.y;
+}
+
+template<> EIGEN_DEVICE_FUNC inline float pfirst<float4>(const float4& a) {
+ return a.x;
+}
+template<> EIGEN_DEVICE_FUNC inline double pfirst<double2>(const double2& a) {
+ return a.x;
+}
+
+template<> EIGEN_DEVICE_FUNC inline float predux<float4>(const float4& a) {
+ return a.x + a.y + a.z + a.w;
+}
+template<> EIGEN_DEVICE_FUNC inline double predux<double2>(const double2& a) {
+ return a.x + a.y;
+}
+
+template<> EIGEN_DEVICE_FUNC inline float predux_max<float4>(const float4& a) {
+ return fmaxf(fmaxf(a.x, a.y), fmaxf(a.z, a.w));
+}
+template<> EIGEN_DEVICE_FUNC inline double predux_max<double2>(const double2& a) {
+ return fmax(a.x, a.y);
+}
+
+template<> EIGEN_DEVICE_FUNC inline float predux_min<float4>(const float4& a) {
+ return fminf(fminf(a.x, a.y), fminf(a.z, a.w));
+}
+template<> EIGEN_DEVICE_FUNC inline double predux_min<double2>(const double2& a) {
+ return fmin(a.x, a.y);
+}
+
+template <>
+EIGEN_DEVICE_FUNC inline float predux_mul<float4>(const float4& a) {
+ return a.x * a.y * a.z * a.w;
+}
+template <>
+EIGEN_DEVICE_FUNC inline double predux_mul<double2>(const double2& a) {
+ return a.x * a.y;
+}
+
+template<> EIGEN_DEVICE_FUNC inline float4 pabs<float4>(const float4& a) {
+ return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
+}
+template<> EIGEN_DEVICE_FUNC inline double2 pabs<double2>(const double2& a) {
+ return make_double2(fabs(a.x), fabs(a.y));
+}
+
+
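+// In-register 4x4 transpose for float4: only the six off-diagonal element pairs
+// (i,j)/(j,i) with i < j need to be swapped; the diagonal stays in place.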
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<float4,4>& kernel) {
+ float tmp = kernel.packet[0].y;
+ kernel.packet[0].y = kernel.packet[1].x;
+ kernel.packet[1].x = tmp;
+
+ tmp = kernel.packet[0].z;
+ kernel.packet[0].z = kernel.packet[2].x;
+ kernel.packet[2].x = tmp;
+
+ tmp = kernel.packet[0].w;
+ kernel.packet[0].w = kernel.packet[3].x;
+ kernel.packet[3].x = tmp;
+
+ tmp = kernel.packet[1].z;
+ kernel.packet[1].z = kernel.packet[2].y;
+ kernel.packet[2].y = tmp;
+
+ tmp = kernel.packet[1].w;
+ kernel.packet[1].w = kernel.packet[3].y;
+ kernel.packet[3].y = tmp;
+
+ tmp = kernel.packet[2].w;
+ kernel.packet[2].w = kernel.packet[3].z;
+ kernel.packet[3].z = tmp;
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<double2,2>& kernel) {
+ double tmp = kernel.packet[0].y;
+ kernel.packet[0].y = kernel.packet[1].x;
+ kernel.packet[1].x = tmp;
+}
+
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+
+#endif // EIGEN_PACKET_MATH_CUDA_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/Default/Settings.h b/third_party/eigen3/Eigen/src/Core/arch/Default/Settings.h
new file mode 100644
index 0000000000..097373c84d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/Default/Settings.h
@@ -0,0 +1,49 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+/* All the parameters defined in this file can be specialized in the
+ * architecture specific files, and/or by the user.
+ * More to come... */
+
+#ifndef EIGEN_DEFAULT_SETTINGS_H
+#define EIGEN_DEFAULT_SETTINGS_H
+
+/** Defines the maximal loop size to enable meta unrolling of loops.
+ * Note that the value here is expressed in Eigen's own notion of "number of FLOPS";
+ * it does not correspond to the number of iterations or the number of instructions.
+ */
+#ifndef EIGEN_UNROLLING_LIMIT
+#define EIGEN_UNROLLING_LIMIT 100
+#endif
+
+/** Defines the threshold between a "small" and a "large" matrix.
+ * This threshold is mainly used to select the proper product implementation.
+ */
+#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
+#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
+#endif
+
+/** Defines the maximal width of the blocks used in the triangular product and solver
+ * for vectors (level 2 blas xTRMV and xTRSV). The default is 8.
+ */
+#ifndef EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH
+#define EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH 8
+#endif
+
+
+/** Defines the default number of registers available for that architecture.
+ * Currently it must be 8 or 16. Other values will fail.
+ */
+#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
+#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 8
+#endif
+
+#endif // EIGEN_DEFAULT_SETTINGS_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/NEON/Complex.h b/third_party/eigen3/Eigen/src/Core/arch/NEON/Complex.h
new file mode 100644
index 0000000000..49e3fa1b02
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/NEON/Complex.h
@@ -0,0 +1,467 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX_NEON_H
+#define EIGEN_COMPLEX_NEON_H
+
+namespace Eigen {
+
+namespace internal {
+
+static uint32x4_t p4ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET4(0x00000000, 0x80000000, 0x00000000, 0x80000000);
+static uint32x2_t p2ui_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x00000000, 0x80000000);
+
+//---------- float ----------
+struct Packet2cf
+{
+ EIGEN_STRONG_INLINE Packet2cf() {}
+ EIGEN_STRONG_INLINE explicit Packet2cf(const Packet4f& a) : v(a) {}
+ Packet4f v;
+};
+
+template<> struct packet_traits<std::complex<float> > : default_packet_traits
+{
+ typedef Packet2cf type;
+ typedef Packet2cf half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 2,
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2}; typedef Packet2cf half; };
+
+template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
+{
+ float32x2_t r64;
+ r64 = vld1_f32((float *)&from);
+
+ return Packet2cf(vcombine_f32(r64, r64));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(padd<Packet4f>(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(psub<Packet4f>(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a) { return Packet2cf(pnegate<Packet4f>(a.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a)
+{
+ Packet4ui b = vreinterpretq_u32_f32(a.v);
+ return Packet2cf(vreinterpretq_f32_u32(veorq_u32(b, p4ui_CONJ_XOR)));
+}
+
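+// Complex product: (ar + i*ai)*(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br).
+// The broadcast/multiply/conjugate/swap sequence below builds exactly these two
+// terms lane by lane.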
+template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ Packet4f v1, v2;
+
+ // Get the real values of a | a1_re | a1_re | a2_re | a2_re |
+ v1 = vcombine_f32(vdup_lane_f32(vget_low_f32(a.v), 0), vdup_lane_f32(vget_high_f32(a.v), 0));
+ // Get the imag values of a | a1_im | a1_im | a2_im | a2_im |
+ v2 = vcombine_f32(vdup_lane_f32(vget_low_f32(a.v), 1), vdup_lane_f32(vget_high_f32(a.v), 1));
+ // Multiply the real a with b
+ v1 = vmulq_f32(v1, b.v);
+ // Multiply the imag a with b
+ v2 = vmulq_f32(v2, b.v);
+ // Conjugate v2
+ v2 = vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(v2), p4ui_CONJ_XOR));
+ // Swap real/imag elements in v2.
+ v2 = vrev64q_f32(v2);
+ // Add and return the result
+ return Packet2cf(vaddq_f32(v1, v2));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ return Packet2cf(vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ return Packet2cf(vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ return Packet2cf(vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet2cf pandnot<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ return Packet2cf(vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a.v),vreinterpretq_u32_f32(b.v))));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf pload<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload<Packet4f>((const float*)from)); }
+template<> EIGEN_STRONG_INLINE Packet2cf ploadu<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu<Packet4f>((const float*)from)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<float>* from) { return pset1<Packet2cf>(*from); }
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((float*)to, from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((float*)to, from.v); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, int stride)
+{
+ Packet4f res = pset1<Packet4f>(0.f);
+ res = vsetq_lane_f32(std::real(from[0*stride]), res, 0);
+ res = vsetq_lane_f32(std::imag(from[0*stride]), res, 1);
+ res = vsetq_lane_f32(std::real(from[1*stride]), res, 2);
+ res = vsetq_lane_f32(std::imag(from[1*stride]), res, 3);
+ return Packet2cf(res);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(std::complex<float>* to, const Packet2cf& from, int stride)
+{
+ to[stride*0] = std::complex<float>(vgetq_lane_f32(from.v, 0), vgetq_lane_f32(from.v, 1));
+ to[stride*1] = std::complex<float>(vgetq_lane_f32(from.v, 2), vgetq_lane_f32(from.v, 3));
+}
+
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::complex<float> * addr) { EIGEN_ARM_PREFETCH((float *)addr); }
+
+template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
+{
+ std::complex<float> EIGEN_ALIGN16 x[2];
+ vst1q_f32((float *)x, a.v);
+ return x[0];
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a)
+{
+ float32x2_t a_lo, a_hi;
+ Packet4f a_r128;
+
+ a_lo = vget_low_f32(a.v);
+ a_hi = vget_high_f32(a.v);
+ a_r128 = vcombine_f32(a_hi, a_lo);
+
+ return Packet2cf(a_r128);
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf pcplxflip<Packet2cf>(const Packet2cf& a)
+{
+ return Packet2cf(vrev64q_f32(a.v));
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a)
+{
+ float32x2_t a1, a2;
+ std::complex<float> s;
+
+ a1 = vget_low_f32(a.v);
+ a2 = vget_high_f32(a.v);
+ a2 = vadd_f32(a1, a2);
+ vst1_f32((float *)&s, a2);
+
+ return s;
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
+{
+ Packet4f sum1, sum2, sum;
+
+ // Combine the low halves of vecs[0] and vecs[1], and their high halves, then add
+ sum1 = vcombine_f32(vget_low_f32(vecs[0].v), vget_low_f32(vecs[1].v));
+ sum2 = vcombine_f32(vget_high_f32(vecs[0].v), vget_high_f32(vecs[1].v));
+ sum = vaddq_f32(sum1, sum2);
+
+ return Packet2cf(sum);
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
+{
+ float32x2_t a1, a2, v1, v2, prod;
+ std::complex<float> s;
+
+ a1 = vget_low_f32(a.v);
+ a2 = vget_high_f32(a.v);
+ // Get the real values of a | a1_re | a1_re | a2_re | a2_re |
+ v1 = vdup_lane_f32(a1, 0);
+ // Get the imag values of a | a1_im | a1_im | a2_im | a2_im |
+ v2 = vdup_lane_f32(a1, 1);
+ // Multiply the real a with b
+ v1 = vmul_f32(v1, a2);
+ // Multiply the imag a with b
+ v2 = vmul_f32(v2, a2);
+ // Conjugate v2
+ v2 = vreinterpret_f32_u32(veor_u32(vreinterpret_u32_f32(v2), p2ui_CONJ_XOR));
+ // Swap real/imag elements in v2.
+ v2 = vrev64_f32(v2);
+ // Add v1, v2
+ prod = vadd_f32(v1, v2);
+
+ vst1_f32((float *)&s, prod);
+
+ return s;
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet2cf>
+{
+ EIGEN_STRONG_INLINE static void run(Packet2cf& first, const Packet2cf& second)
+ {
+ if (Offset==1)
+ {
+ first.v = vextq_f32(first.v, second.v, 2);
+ }
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
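+// Complex division uses a/b = a*conj(b) / |b|^2; the numerator comes from the
+// conjugating conj_helper and s + rev_s leaves |b|^2 = br^2 + bi^2 in both lanes.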
+template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ // TODO optimize it for NEON
+ Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a,b);
+ Packet4f s, rev_s;
+
+ // s + rev_s below yields the squared norm |b|^2 = br^2 + bi^2 in both lanes
+ s = vmulq_f32(b.v, b.v);
+ rev_s = vrev64q_f32(s);
+
+ return Packet2cf(pdiv(res.v, vaddq_f32(s,rev_s)));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2cf,2>& kernel) {
+ Packet4f tmp = vcombine_f32(vget_high_f32(kernel.packet[0].v), vget_high_f32(kernel.packet[1].v));
+ kernel.packet[0].v = vcombine_f32(vget_low_f32(kernel.packet[0].v), vget_low_f32(kernel.packet[1].v));
+ kernel.packet[1].v = tmp;
+}
+
+//---------- double ----------
+#if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG
+
+static uint64x2_t p2ul_CONJ_XOR = EIGEN_INIT_NEON_PACKET2(0x0, 0x8000000000000000);
+
+struct Packet1cd
+{
+ EIGEN_STRONG_INLINE Packet1cd() {}
+ EIGEN_STRONG_INLINE explicit Packet1cd(const Packet2d& a) : v(a) {}
+ Packet2d v;
+};
+
+template<> struct packet_traits<std::complex<double> > : default_packet_traits
+{
+ typedef Packet1cd type;
+ typedef Packet1cd half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 0,
+ size = 1,
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1}; typedef Packet1cd half; };
+
+template<> EIGEN_STRONG_INLINE Packet1cd pload<Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload<Packet2d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu<Packet2d>((const double*)from)); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
+{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(padd<Packet2d>(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(psub<Packet2d>(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate<Packet2d>(a.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a) { return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v), p2ul_CONJ_XOR))); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ Packet2d v1, v2;
+
+ // Get the real values of a
+ v1 = vdupq_lane_f64(vget_low_f64(a.v), 0);
+ // Get the imag values of a
+ v2 = vdupq_lane_f64(vget_high_f64(a.v), 0);
+ // Multiply the real a with b
+ v1 = vmulq_f64(v1, b.v);
+ // Multiply the imag a with b
+ v2 = vmulq_f64(v2, b.v);
+ // Conjugate v2
+ v2 = vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(v2), p2ul_CONJ_XOR));
+ // Swap real/imag elements in v2.
+ v2 = preverse<Packet2d>(v2);
+ // Add and return the result
+ return Packet1cd(vaddq_f64(v1, v2));
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ return Packet1cd(vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ return Packet1cd(vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ return Packet1cd(vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
+}
+template<> EIGEN_STRONG_INLINE Packet1cd pandnot<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ return Packet1cd(vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a.v),vreinterpretq_u64_f64(b.v))));
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from) { return pset1<Packet1cd>(*from); }
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, from.v); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, from.v); }
+
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> * addr) { EIGEN_ARM_PREFETCH((double *)addr); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet1cd pgather<std::complex<double>, Packet1cd>(const std::complex<double>* from, int stride)
+{
+ Packet2d res = pset1<Packet2d>(0.0);
+ res = vsetq_lane_f64(std::real(from[0*stride]), res, 0);
+ res = vsetq_lane_f64(std::imag(from[0*stride]), res, 1);
+ return Packet1cd(res);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<double>, Packet1cd>(std::complex<double>* to, const Packet1cd& from, int stride)
+{
+ to[stride*0] = std::complex<double>(vgetq_lane_f64(from.v, 0), vgetq_lane_f64(from.v, 1));
+}
+
+
+template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
+{
+ std::complex<double> EIGEN_ALIGN16 res;
+ pstore<std::complex<double> >(&res, a);
+
+ return res;
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs) { return vecs[0]; }
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a) { return pfirst(a); }
+
+template<int Offset>
+struct palign_impl<Offset,Packet1cd>
+{
+ static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
+ {
+ // FIXME are we sure we never have to align a Packet1cd?
+ // Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16-byte boundary...
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return internal::pmul(a, pconj(b));
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return internal::pmul(pconj(a), b);
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ return pconj(internal::pmul(a, b));
+ }
+};
+
+template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ // TODO optimize it for NEON
+ Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
+ Packet2d s = pmul<Packet2d>(b.v, b.v);
+ Packet2d rev_s = preverse<Packet2d>(s);
+
+ return Packet1cd(pdiv(res.v, padd<Packet2d>(s,rev_s)));
+}
+
+EIGEN_STRONG_INLINE Packet1cd pcplxflip/*<Packet1cd>*/(const Packet1cd& x)
+{
+ return Packet1cd(preverse(Packet2d(x.v)));
+}
+
+EIGEN_STRONG_INLINE void ptranspose(PacketBlock<Packet1cd,2>& kernel)
+{
+ Packet2d tmp = vcombine_f64(vget_high_f64(kernel.packet[0].v), vget_high_f64(kernel.packet[1].v));
+ kernel.packet[0].v = vcombine_f64(vget_low_f64(kernel.packet[0].v), vget_low_f64(kernel.packet[1].v));
+ kernel.packet[1].v = tmp;
+}
+#endif // EIGEN_ARCH_ARM64
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_NEON_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/NEON/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/arch/NEON/MathFunctions.h
new file mode 100644
index 0000000000..6bb05bb922
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/NEON/MathFunctions.h
@@ -0,0 +1,91 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/* The sin, cos, exp, and log functions of this file come from
+ * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
+ */
+
+#ifndef EIGEN_MATH_FUNCTIONS_NEON_H
+#define EIGEN_MATH_FUNCTIONS_NEON_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f pexp<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ Packet4f tmp, fx;
+
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
+ _EIGEN_DECLARE_CONST_Packet4f(exp_hi, 88.3762626647950f);
+ _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -88.3762626647949f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500E-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894E-2f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201E-1f);
+
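+ /* Overall scheme (Cephes-style): clamp x, compute n = floor(x*log2(e) + 0.5),
+    subtract n*log(2) in two pieces (C1 + C2) for precision, evaluate a degree-5
+    polynomial on the reduced argument, then scale the result by 2^n built
+    directly in the float exponent bits. */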
+ x = vminq_f32(x, p4f_exp_hi);
+ x = vmaxq_f32(x, p4f_exp_lo);
+
+ /* express exp(x) as exp(g + n*log(2)) */
+ fx = vmlaq_f32(p4f_half, x, p4f_cephes_LOG2EF);
+
+ /* perform a floorf */
+ tmp = vcvtq_f32_s32(vcvtq_s32_f32(fx));
+
+ /* if greater, subtract 1 */
+ Packet4ui mask = vcgtq_f32(tmp, fx);
+ mask = vandq_u32(mask, vreinterpretq_u32_f32(p4f_1));
+
+ fx = vsubq_f32(tmp, vreinterpretq_f32_u32(mask));
+
+ tmp = vmulq_f32(fx, p4f_cephes_exp_C1);
+ Packet4f z = vmulq_f32(fx, p4f_cephes_exp_C2);
+ x = vsubq_f32(x, tmp);
+ x = vsubq_f32(x, z);
+
+ Packet4f y = vmulq_f32(p4f_cephes_exp_p0, x);
+ z = vmulq_f32(x, x);
+ y = vaddq_f32(y, p4f_cephes_exp_p1);
+ y = vmulq_f32(y, x);
+ y = vaddq_f32(y, p4f_cephes_exp_p2);
+ y = vmulq_f32(y, x);
+ y = vaddq_f32(y, p4f_cephes_exp_p3);
+ y = vmulq_f32(y, x);
+ y = vaddq_f32(y, p4f_cephes_exp_p4);
+ y = vmulq_f32(y, x);
+ y = vaddq_f32(y, p4f_cephes_exp_p5);
+
+ y = vmulq_f32(y, z);
+ y = vaddq_f32(y, x);
+ y = vaddq_f32(y, p4f_1);
+
+ /* build 2^n */
+ int32x4_t mm;
+ mm = vcvtq_s32_f32(fx);
+ mm = vaddq_s32(mm, p4i_0x7f);
+ mm = vshlq_n_s32(mm, 23);
+ Packet4f pow2n = vreinterpretq_f32_s32(mm);
+
+ y = vmulq_f32(y, pow2n);
+ return y;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATH_FUNCTIONS_NEON_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/NEON/PacketMath.h b/third_party/eigen3/Eigen/src/Core/arch/NEON/PacketMath.h
new file mode 100644
index 0000000000..856a65ad7b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/NEON/PacketMath.h
@@ -0,0 +1,745 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Konstantinos Margaritis <markos@codex.gr>
+// Heavily based on Gael's SSE version.
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PACKET_MATH_NEON_H
+#define EIGEN_PACKET_MATH_NEON_H
+
+namespace Eigen {
+
+namespace internal {
+
+#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
+#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 16
+#endif
+
+// FIXME NEON has 16 quad registers, but since the current register allocator
+// is so bad, it is much better to reduce it to 8
+#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
+#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS 16
+#endif
+
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#endif
+
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
+#endif
+
+typedef float32x2_t Packet2f;
+typedef float32x4_t Packet4f;
+typedef int32x4_t Packet4i;
+typedef int32x2_t Packet2i;
+typedef uint32x4_t Packet4ui;
+
+#define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \
+ const Packet4f p4f_##NAME = pset1<Packet4f>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \
+ const Packet4f p4f_##NAME = vreinterpretq_f32_u32(pset1<int>(X))
+
+#define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \
+ const Packet4i p4i_##NAME = pset1<Packet4i>(X)
+
+#if EIGEN_COMP_LLVM && !EIGEN_COMP_CLANG
+ //Special treatment for Apple's llvm-gcc; its NEON packet types are unions
+ #define EIGEN_INIT_NEON_PACKET2(X, Y) {{X, Y}}
+ #define EIGEN_INIT_NEON_PACKET4(X, Y, Z, W) {{X, Y, Z, W}}
+#else
+ //Default initializer for packets
+ #define EIGEN_INIT_NEON_PACKET2(X, Y) {X, Y}
+ #define EIGEN_INIT_NEON_PACKET4(X, Y, Z, W) {X, Y, Z, W}
+#endif
+
+// arm64 does have the pld instruction. If available, let's trust the __builtin_prefetch built-in function,
+// which is available on LLVM and GCC (at least).
+#if EIGEN_HAS_BUILTIN(__builtin_prefetch) || EIGEN_COMP_GNUC
+ #define EIGEN_ARM_PREFETCH(ADDR) __builtin_prefetch(ADDR);
+#elif defined __pld
+ #define EIGEN_ARM_PREFETCH(ADDR) __pld(ADDR)
+#elif !EIGEN_ARCH_ARM64
+ #define EIGEN_ARM_PREFETCH(ADDR) asm volatile ( " pld [%[addr]]\n" :: [addr] "r" (ADDR) : "cc" );
+#else
+ // by default no explicit prefetching
+ #define EIGEN_ARM_PREFETCH(ADDR)
+#endif
+
+template<> struct packet_traits<float> : default_packet_traits
+{
+ typedef Packet4f type;
+ typedef Packet4f half; // Packet2f intrinsics not implemented yet
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 4,
+ HasHalfPacket=0, // Packet2f intrinsics not implemented yet
+
+ HasDiv = 1,
+ // FIXME check the Has*
+ HasSin = 0,
+ HasCos = 0,
+ HasTanH = 1,
+ HasLog = 0,
+ HasExp = 1,
+ HasSqrt = 0
+ };
+};
+template<> struct packet_traits<int> : default_packet_traits
+{
+ typedef Packet4i type;
+ typedef Packet4i half; // Packet2i intrinsics not implemented yet
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4,
+ HasHalfPacket=0 // Packet2i intrinsics not implemented yet
+ // FIXME check the Has*
+ };
+};
+
+#if EIGEN_GNUC_AT_MOST(4,4) && !EIGEN_COMP_LLVM
+// work around a gcc 4.2, 4.3 and 4.4 compilation issue
+EIGEN_STRONG_INLINE float32x4_t vld1q_f32(const float* x) { return ::vld1q_f32((const float32_t*)x); }
+EIGEN_STRONG_INLINE float32x2_t vld1_f32 (const float* x) { return ::vld1_f32 ((const float32_t*)x); }
+EIGEN_STRONG_INLINE float32x2_t vld1_dup_f32 (const float* x) { return ::vld1_dup_f32 ((const float32_t*)x); }
+EIGEN_STRONG_INLINE void vst1q_f32(float* to, float32x4_t from) { ::vst1q_f32((float32_t*)to,from); }
+EIGEN_STRONG_INLINE void vst1_f32 (float* to, float32x2_t from) { ::vst1_f32 ((float32_t*)to,from); }
+#endif
+
+template<> struct unpacket_traits<Packet4f> { typedef float type; enum {size=4}; typedef Packet4f half; };
+template<> struct unpacket_traits<Packet4i> { typedef int type; enum {size=4}; typedef Packet4i half; };
+
+template<> EIGEN_STRONG_INLINE Packet4f pset1<Packet4f>(const float& from) { return vdupq_n_f32(from); }
+template<> EIGEN_STRONG_INLINE Packet4i pset1<Packet4i>(const int& from) { return vdupq_n_s32(from); }
+
+template<> EIGEN_STRONG_INLINE Packet4f plset<float>(const float& a)
+{
+ Packet4f countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
+ return vaddq_f32(pset1<Packet4f>(a), countdown);
+}
+template<> EIGEN_STRONG_INLINE Packet4i plset<int>(const int& a)
+{
+ Packet4i countdown = EIGEN_INIT_NEON_PACKET4(0, 1, 2, 3);
+ return vaddq_s32(pset1<Packet4i>(a), countdown);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f padd<Packet4f>(const Packet4f& a, const Packet4f& b) { return vaddq_f32(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i padd<Packet4i>(const Packet4i& a, const Packet4i& b) { return vaddq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f psub<Packet4f>(const Packet4f& a, const Packet4f& b) { return vsubq_f32(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i psub<Packet4i>(const Packet4i& a, const Packet4i& b) { return vsubq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a) { return vnegq_f32(a); }
+template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a) { return vnegq_s32(a); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmul<Packet4f>(const Packet4f& a, const Packet4f& b) { return vmulq_f32(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmul<Packet4i>(const Packet4i& a, const Packet4i& b) { return vmulq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pselect<Packet4f>(const Packet4f& a, const Packet4f& b, const Packet4f& false_mask) {
+ return vbslq_f32(vreinterpretq_u32_f32(false_mask), b, a);
+}
+template<> EIGEN_STRONG_INLINE Packet4i pselect<Packet4i>(const Packet4i& a, const Packet4i& b, const Packet4i& false_mask) {
+ return vbslq_s32(vreinterpretq_u32_s32(false_mask), b, a);
+}
+
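+// On 32-bit ARM there is no vector divide; pdiv below refines a reciprocal
+// estimate with one Newton-Raphson step: vrecpsq_f32(b, x) computes (2 - b*x),
+// so x*(2 - b*x) is the refined approximation of 1/b.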
+template<> EIGEN_STRONG_INLINE Packet4f pdiv<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+#if EIGEN_ARCH_ARM64
+ return vdivq_f32(a,b);
+#else
+ Packet4f inv, restep, div;
+
+ // NEON does not offer a divide instruction, so we have to use a reciprocal approximation.
+ // However, NEON, in contrast to other SIMD engines (AltiVec/SSE), offers
+ // a reciprocal estimate AND a reciprocal step, which saves a few instructions.
+ // vrecpeq_f32() returns an estimate of 1/b, which we then refine with
+ // Newton-Raphson and vrecpsq_f32().
+ inv = vrecpeq_f32(b);
+
+ // This returns a differential, by which we will have to multiply inv to get a better
+ // approximation of 1/b.
+ restep = vrecpsq_f32(b, inv);
+ inv = vmulq_f32(restep, inv);
+
+ // Finally, multiply a by 1/b and get the wanted result of the division.
+ div = vmulq_f32(a, inv);
+
+ return div;
+#endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet4i pdiv<Packet4i>(const Packet4i& /*a*/, const Packet4i& /*b*/)
+{ eigen_assert(false && "packet integer division are not supported by NEON");
+ return pset1<Packet4i>(0);
+}
+
+#ifdef __ARM_FEATURE_FMA
+// See bug 936.
+// FMA is available on VFPv4 i.e. when compiling with -mfpu=neon-vfpv4.
+// FMA is a true fused multiply-add i.e. only 1 rounding at the end, no intermediate rounding.
+// MLA is not fused i.e. does 2 roundings.
+// In addition to giving better accuracy, FMA also gives better performance here on a Krait (Nexus 4):
+// MLA: 10 GFlop/s ; FMA: 12 GFlops/s.
+template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vfmaq_f32(c,a,b); }
+#else
+template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return vmlaq_f32(c,a,b); }
+#endif
+
+// No FMA instruction for int, so use MLA unconditionally.
+template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return vmlaq_s32(c,a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmin<Packet4f>(const Packet4f& a, const Packet4f& b) { return vminq_f32(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmin<Packet4i>(const Packet4i& a, const Packet4i& b) { return vminq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmax<Packet4f>(const Packet4f& a, const Packet4f& b) { return vmaxq_f32(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmax<Packet4i>(const Packet4i& a, const Packet4i& b) { return vmaxq_s32(a,b); }
+
+// TODO(ebrevdo): add support for ple, plt, peq using vcle_f32/s32 or
+// vcleq_f32/s32, and their ilk, respectively, once it's clear which condition code to use.
+
+// Logical Operations are not supported for float, so we have to reinterpret casts using NEON intrinsics
+template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+ return vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b)));
+}
+template<> EIGEN_STRONG_INLINE Packet4i pand<Packet4i>(const Packet4i& a, const Packet4i& b) { return vandq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f por<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+ return vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b)));
+}
+template<> EIGEN_STRONG_INLINE Packet4i por<Packet4i>(const Packet4i& a, const Packet4i& b) { return vorrq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pxor<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+ return vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b)));
+}
+template<> EIGEN_STRONG_INLINE Packet4i pxor<Packet4i>(const Packet4i& a, const Packet4i& b) { return veorq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b)
+{
+ return vreinterpretq_f32_u32(vbicq_u32(vreinterpretq_u32_f32(a),vreinterpretq_u32_f32(b)));
+}
+template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return vbicq_s32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_f32(from); }
+template<> EIGEN_STRONG_INLINE Packet4i pload<Packet4i>(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_s32(from); }
+
+template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_f32(from); }
+template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_s32(from); }
+
+template<> EIGEN_STRONG_INLINE Packet4f ploaddup<Packet4f>(const float* from)
+{
+ float32x2_t lo, hi;
+ lo = vld1_dup_f32(from);
+ hi = vld1_dup_f32(from+1);
+ return vcombine_f32(lo, hi);
+}
+template<> EIGEN_STRONG_INLINE Packet4i ploaddup<Packet4i>(const int* from)
+{
+ int32x2_t lo, hi;
+ lo = vld1_dup_s32(from);
+ hi = vld1_dup_s32(from+1);
+ return vcombine_s32(lo, hi);
+}
+
+template<> EIGEN_STRONG_INLINE void pstore<float>(float* to, const Packet4f& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_f32(to, from); }
+template<> EIGEN_STRONG_INLINE void pstore<int>(int* to, const Packet4i& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_s32(to, from); }
+
+template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet4f& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_f32(to, from); }
+template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_s32(to, from); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet4f pgather<float, Packet4f>(const float* from, int stride)
+{
+ Packet4f res = pset1<Packet4f>(0);
+ res = vsetq_lane_f32(from[0*stride], res, 0);
+ res = vsetq_lane_f32(from[1*stride], res, 1);
+ res = vsetq_lane_f32(from[2*stride], res, 2);
+ res = vsetq_lane_f32(from[3*stride], res, 3);
+ return res;
+}
+template<> EIGEN_DEVICE_FUNC inline Packet4i pgather<int, Packet4i>(const int* from, int stride)
+{
+ Packet4i res = pset1<Packet4i>(0);
+ res = vsetq_lane_s32(from[0*stride], res, 0);
+ res = vsetq_lane_s32(from[1*stride], res, 1);
+ res = vsetq_lane_s32(from[2*stride], res, 2);
+ res = vsetq_lane_s32(from[3*stride], res, 3);
+ return res;
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet4f>(float* to, const Packet4f& from, int stride)
+{
+ to[stride*0] = vgetq_lane_f32(from, 0);
+ to[stride*1] = vgetq_lane_f32(from, 1);
+ to[stride*2] = vgetq_lane_f32(from, 2);
+ to[stride*3] = vgetq_lane_f32(from, 3);
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<int, Packet4i>(int* to, const Packet4i& from, int stride)
+{
+ to[stride*0] = vgetq_lane_s32(from, 0);
+ to[stride*1] = vgetq_lane_s32(from, 1);
+ to[stride*2] = vgetq_lane_s32(from, 2);
+ to[stride*3] = vgetq_lane_s32(from, 3);
+}
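+// Usage sketch (illustrative): for a row-major 4x4 float matrix m, pgather<float, Packet4f>(&m[0][1], 4)
+// loads the second column {m[0][1], m[1][1], m[2][1], m[3][1]}, and pscatter with the same
+// stride writes a packet back to that column.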
+
+template<> EIGEN_STRONG_INLINE void prefetch<float>(const float* addr) { EIGEN_ARM_PREFETCH(addr); }
+template<> EIGEN_STRONG_INLINE void prefetch<int>(const int* addr) { EIGEN_ARM_PREFETCH(addr); }
+
+// FIXME only store the first 2 elements?
+template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { float EIGEN_ALIGN16 x[4]; vst1q_f32(x, a); return x[0]; }
+template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { int EIGEN_ALIGN16 x[4]; vst1q_s32(x, a); return x[0]; }
+
+template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a) {
+ float32x2_t a_lo, a_hi;
+ Packet4f a_r64;
+
+ a_r64 = vrev64q_f32(a);
+ a_lo = vget_low_f32(a_r64);
+ a_hi = vget_high_f32(a_r64);
+ return vcombine_f32(a_hi, a_lo);
+}
+template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a) {
+ int32x2_t a_lo, a_hi;
+ Packet4i a_r64;
+
+ a_r64 = vrev64q_s32(a);
+ a_lo = vget_low_s32(a_r64);
+ a_hi = vget_high_s32(a_r64);
+ return vcombine_s32(a_hi, a_lo);
+}
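+// e.g. preverse({a0, a1, a2, a3}): vrev64q gives {a1, a0, a3, a2}; swapping the two
+// halves then yields {a3, a2, a1, a0}.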
+
+template<size_t offset>
+struct protate_impl<offset, Packet4f>
+{
+ static Packet4f run(const Packet4f& a) {
+ return vextq_f32(a, a, offset);
+ }
+};
+
+template<size_t offset>
+struct protate_impl<offset, Packet4i>
+{
+ static Packet4i run(const Packet4i& a) {
+ return vextq_s32(a, a, offset);
+ }
+};
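+// e.g. protate_impl<1, Packet4f>::run({a0, a1, a2, a3}) = {a1, a2, a3, a0}:
+// vextq picks lanes offset..offset+3 from the concatenation a:a.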
+
+template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a) { return vabsq_f32(a); }
+template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a) { return vabsq_s32(a); }
+
+template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
+{
+ float32x2_t a_lo, a_hi, sum;
+
+ a_lo = vget_low_f32(a);
+ a_hi = vget_high_f32(a);
+ sum = vpadd_f32(a_lo, a_hi);
+ sum = vpadd_f32(sum, sum);
+ return vget_lane_f32(sum, 0);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
+{
+ float32x4x2_t vtrn1, vtrn2, res1, res2;
+ Packet4f sum1, sum2, sum;
+
+  // NEON zip interleaves the supplied vectors.
+  // We perform two interleaves in a row to obtain the transposed vectors.
+ vtrn1 = vzipq_f32(vecs[0], vecs[2]);
+ vtrn2 = vzipq_f32(vecs[1], vecs[3]);
+ res1 = vzipq_f32(vtrn1.val[0], vtrn2.val[0]);
+ res2 = vzipq_f32(vtrn1.val[1], vtrn2.val[1]);
+
+ // Do the addition of the resulting vectors
+ sum1 = vaddq_f32(res1.val[0], res1.val[1]);
+ sum2 = vaddq_f32(res2.val[0], res2.val[1]);
+ sum = vaddq_f32(sum1, sum2);
+
+ return sum;
+}
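+// In the preduxp above, the two rounds of vzipq transpose the 4x4 block formed by
+// vecs[0..3], so lane k of the result is the horizontal sum of vecs[k].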
+
+template<> EIGEN_STRONG_INLINE int predux<Packet4i>(const Packet4i& a)
+{
+ int32x2_t a_lo, a_hi, sum;
+
+ a_lo = vget_low_s32(a);
+ a_hi = vget_high_s32(a);
+ sum = vpadd_s32(a_lo, a_hi);
+ sum = vpadd_s32(sum, sum);
+ return vget_lane_s32(sum, 0);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs)
+{
+ int32x4x2_t vtrn1, vtrn2, res1, res2;
+ Packet4i sum1, sum2, sum;
+
+  // NEON zip interleaves the supplied vectors.
+  // We perform two interleaves in a row to obtain the transposed vectors.
+ vtrn1 = vzipq_s32(vecs[0], vecs[2]);
+ vtrn2 = vzipq_s32(vecs[1], vecs[3]);
+ res1 = vzipq_s32(vtrn1.val[0], vtrn2.val[0]);
+ res2 = vzipq_s32(vtrn1.val[1], vtrn2.val[1]);
+
+ // Do the addition of the resulting vectors
+ sum1 = vaddq_s32(res1.val[0], res1.val[1]);
+ sum2 = vaddq_s32(res2.val[0], res2.val[1]);
+ sum = vaddq_s32(sum1, sum2);
+
+ return sum;
+}
+
+// Other reduction functions:
+// mul
+template<> EIGEN_STRONG_INLINE float predux_mul<Packet4f>(const Packet4f& a)
+{
+ float32x2_t a_lo, a_hi, prod;
+
+ // Get a_lo = |a1|a2| and a_hi = |a3|a4|
+ a_lo = vget_low_f32(a);
+ a_hi = vget_high_f32(a);
+ // Get the product of a_lo * a_hi -> |a1*a3|a2*a4|
+ prod = vmul_f32(a_lo, a_hi);
+ // Multiply prod with its swapped value |a2*a4|a1*a3|
+ prod = vmul_f32(prod, vrev64_f32(prod));
+
+ return vget_lane_f32(prod, 0);
+}
+template<> EIGEN_STRONG_INLINE int predux_mul<Packet4i>(const Packet4i& a)
+{
+ int32x2_t a_lo, a_hi, prod;
+
+ // Get a_lo = |a1|a2| and a_hi = |a3|a4|
+ a_lo = vget_low_s32(a);
+ a_hi = vget_high_s32(a);
+ // Get the product of a_lo * a_hi -> |a1*a3|a2*a4|
+ prod = vmul_s32(a_lo, a_hi);
+ // Multiply prod with its swapped value |a2*a4|a1*a3|
+ prod = vmul_s32(prod, vrev64_s32(prod));
+
+ return vget_lane_s32(prod, 0);
+}
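+// Worked example: for a = {2, 3, 4, 5}, prod = {2*4, 3*5} = {8, 15}; multiplying by its
+// swapped value gives {8*15, 15*8}, so lane 0 holds 120 = 2*3*4*5.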
+
+// min
+template<> EIGEN_STRONG_INLINE float predux_min<Packet4f>(const Packet4f& a)
+{
+ float32x2_t a_lo, a_hi, min;
+
+ a_lo = vget_low_f32(a);
+ a_hi = vget_high_f32(a);
+ min = vpmin_f32(a_lo, a_hi);
+ min = vpmin_f32(min, min);
+
+ return vget_lane_f32(min, 0);
+}
+
+template<> EIGEN_STRONG_INLINE int predux_min<Packet4i>(const Packet4i& a)
+{
+ int32x2_t a_lo, a_hi, min;
+
+ a_lo = vget_low_s32(a);
+ a_hi = vget_high_s32(a);
+ min = vpmin_s32(a_lo, a_hi);
+ min = vpmin_s32(min, min);
+
+ return vget_lane_s32(min, 0);
+}
+
+// max
+template<> EIGEN_STRONG_INLINE float predux_max<Packet4f>(const Packet4f& a)
+{
+ float32x2_t a_lo, a_hi, max;
+
+ a_lo = vget_low_f32(a);
+ a_hi = vget_high_f32(a);
+ max = vpmax_f32(a_lo, a_hi);
+ max = vpmax_f32(max, max);
+
+ return vget_lane_f32(max, 0);
+}
+
+template<> EIGEN_STRONG_INLINE int predux_max<Packet4i>(const Packet4i& a)
+{
+ int32x2_t a_lo, a_hi, max;
+
+ a_lo = vget_low_s32(a);
+ a_hi = vget_high_s32(a);
+ max = vpmax_s32(a_lo, a_hi);
+ max = vpmax_s32(max, max);
+
+ return vget_lane_s32(max, 0);
+}
+
+// this PALIGN_NEON business is to work around a bug in LLVM Clang 3.0 causing spurious compilation errors,
+// see bug 347 and this LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=11074
+#define PALIGN_NEON(Offset,Type,Command) \
+template<>\
+struct palign_impl<Offset,Type>\
+{\
+ EIGEN_STRONG_INLINE static void run(Type& first, const Type& second)\
+ {\
+ if (Offset!=0)\
+ first = Command(first, second, Offset);\
+ }\
+};\
+
+PALIGN_NEON(0,Packet4f,vextq_f32)
+PALIGN_NEON(1,Packet4f,vextq_f32)
+PALIGN_NEON(2,Packet4f,vextq_f32)
+PALIGN_NEON(3,Packet4f,vextq_f32)
+PALIGN_NEON(0,Packet4i,vextq_s32)
+PALIGN_NEON(1,Packet4i,vextq_s32)
+PALIGN_NEON(2,Packet4i,vextq_s32)
+PALIGN_NEON(3,Packet4i,vextq_s32)
+
+#undef PALIGN_NEON
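+// e.g. palign_impl<1, Packet4f>::run(first, second) sets
+// first = {first[1], first[2], first[3], second[0]}, i.e. the window starting one lane
+// into the concatenation first:second.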
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4f,4>& kernel) {
+ float32x4x2_t tmp1 = vzipq_f32(kernel.packet[0], kernel.packet[1]);
+ float32x4x2_t tmp2 = vzipq_f32(kernel.packet[2], kernel.packet[3]);
+
+ kernel.packet[0] = vcombine_f32(vget_low_f32(tmp1.val[0]), vget_low_f32(tmp2.val[0]));
+ kernel.packet[1] = vcombine_f32(vget_high_f32(tmp1.val[0]), vget_high_f32(tmp2.val[0]));
+ kernel.packet[2] = vcombine_f32(vget_low_f32(tmp1.val[1]), vget_low_f32(tmp2.val[1]));
+ kernel.packet[3] = vcombine_f32(vget_high_f32(tmp1.val[1]), vget_high_f32(tmp2.val[1]));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4i,4>& kernel) {
+ int32x4x2_t tmp1 = vzipq_s32(kernel.packet[0], kernel.packet[1]);
+ int32x4x2_t tmp2 = vzipq_s32(kernel.packet[2], kernel.packet[3]);
+ kernel.packet[0] = vcombine_s32(vget_low_s32(tmp1.val[0]), vget_low_s32(tmp2.val[0]));
+ kernel.packet[1] = vcombine_s32(vget_high_s32(tmp1.val[0]), vget_high_s32(tmp2.val[0]));
+ kernel.packet[2] = vcombine_s32(vget_low_s32(tmp1.val[1]), vget_low_s32(tmp2.val[1]));
+ kernel.packet[3] = vcombine_s32(vget_high_s32(tmp1.val[1]), vget_high_s32(tmp2.val[1]));
+}
+
+//---------- double ----------
+
+// Clang 3.5 in the iOS toolchain has an ICE triggered by NEON intrinsics for double.
+// Confirmed at least with __apple_build_version__ = 6000054.
+#ifdef __apple_build_version__
+// Let's hope that by the time __apple_build_version__ hits the 601* range, the bug will be fixed.
+// https://gist.github.com/yamaya/2924292 suggests that the first 3 digits are only updated with
+// major toolchain updates.
+#define EIGEN_APPLE_DOUBLE_NEON_BUG (__apple_build_version__ < 6010000)
+#else
+#define EIGEN_APPLE_DOUBLE_NEON_BUG 0
+#endif
+
+#if EIGEN_ARCH_ARM64 && !EIGEN_APPLE_DOUBLE_NEON_BUG
+
+#if (EIGEN_COMP_GNUC_STRICT && defined(__ANDROID__)) || defined(__apple_build_version__)
+// Bug 907: workaround missing declarations of the following two functions in the ADK
+__extension__ static __inline uint64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_u64_f64 (float64x2_t __a)
+{
+ return (uint64x2_t) __a;
+}
+
+__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
+vreinterpretq_f64_u64 (uint64x2_t __a)
+{
+ return (float64x2_t) __a;
+}
+#endif
+
+typedef float64x2_t Packet2d;
+typedef float64x1_t Packet1d;
+
+template<> struct packet_traits<double> : default_packet_traits
+{
+ typedef Packet2d type;
+ typedef Packet2d half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 2,
+ HasHalfPacket=0,
+
+ HasDiv = 1,
+ // FIXME check the Has*
+ HasSin = 0,
+ HasCos = 0,
+ HasLog = 0,
+ HasExp = 0,
+ HasSqrt = 0
+ };
+};
+
+template<> struct unpacket_traits<Packet2d> { typedef double type; enum {size=2}; typedef Packet2d half; };
+
+template<> EIGEN_STRONG_INLINE Packet2d pset1<Packet2d>(const double& from) { return vdupq_n_f64(from); }
+
+template<> EIGEN_STRONG_INLINE Packet2d plset<double>(const double& a)
+{
+ Packet2d countdown = EIGEN_INIT_NEON_PACKET2(0, 1);
+ return vaddq_f64(pset1<Packet2d>(a), countdown);
+}
+template<> EIGEN_STRONG_INLINE Packet2d padd<Packet2d>(const Packet2d& a, const Packet2d& b) { return vaddq_f64(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d psub<Packet2d>(const Packet2d& a, const Packet2d& b) { return vsubq_f64(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a) { return vnegq_f64(a); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet2d pselect<Packet2d>(const Packet2d& a, const Packet2d& b, const Packet2d& false_mask) {
+ return vbslq_f64(vreinterpretq_u64_f64(false_mask), b, a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d pmul<Packet2d>(const Packet2d& a, const Packet2d& b) { return vmulq_f64(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pdiv<Packet2d>(const Packet2d& a, const Packet2d& b) { return vdivq_f64(a,b); }
+
+#ifdef __ARM_FEATURE_FMA
+// See bug 936. See above comment about FMA for float.
+template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vfmaq_f64(c,a,b); }
+#else
+template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return vmlaq_f64(c,a,b); }
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet2d pmin<Packet2d>(const Packet2d& a, const Packet2d& b) { return vminq_f64(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet2d pmax<Packet2d>(const Packet2d& a, const Packet2d& b) { return vmaxq_f64(a,b); }
+
+// Bitwise logical operations are not supported for double, so we have to use reinterpret casts via NEON intrinsics
+template<> EIGEN_STRONG_INLINE Packet2d pand<Packet2d>(const Packet2d& a, const Packet2d& b)
+{
+ return vreinterpretq_f64_u64(vandq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d por<Packet2d>(const Packet2d& a, const Packet2d& b)
+{
+ return vreinterpretq_f64_u64(vorrq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d pxor<Packet2d>(const Packet2d& a, const Packet2d& b)
+{
+ return vreinterpretq_f64_u64(veorq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d pandnot<Packet2d>(const Packet2d& a, const Packet2d& b)
+{
+ return vreinterpretq_f64_u64(vbicq_u64(vreinterpretq_u64_f64(a),vreinterpretq_u64_f64(b)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d pload<Packet2d>(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return vld1q_f64(from); }
+
+template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return vld1q_f64(from); }
+
+template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)
+{
+ return vld1q_dup_f64(from);
+}
+template<> EIGEN_STRONG_INLINE void pstore<double>(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE vst1q_f64(to, from); }
+
+template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet2d& from) { EIGEN_DEBUG_UNALIGNED_STORE vst1q_f64(to, from); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet2d pgather<double, Packet2d>(const double* from, int stride)
+{
+ Packet2d res = pset1<Packet2d>(0.0);
+ res = vsetq_lane_f64(from[0*stride], res, 0);
+ res = vsetq_lane_f64(from[1*stride], res, 1);
+ return res;
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<double, Packet2d>(double* to, const Packet2d& from, int stride)
+{
+ to[stride*0] = vgetq_lane_f64(from, 0);
+ to[stride*1] = vgetq_lane_f64(from, 1);
+}
+template<> EIGEN_STRONG_INLINE void prefetch<double>(const double* addr) { EIGEN_ARM_PREFETCH(addr); }
+
+// FIXME only store the first 2 elements?
+template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { return vgetq_lane_f64(a, 0); }
+
+template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a) { return vcombine_f64(vget_high_f64(a), vget_low_f64(a)); }
+
+template<size_t offset>
+struct protate_impl<offset, Packet2d>
+{
+ static Packet2d run(const Packet2d& a) {
+ return vextq_f64(a, a, offset);
+ }
+};
+
+template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a) { return vabsq_f64(a); }
+
+#if EIGEN_COMP_CLANG && defined(__apple_build_version__)
+// workaround ICE, see bug 907
+template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) + vget_high_f64(a))[0]; }
+#else
+template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) { return vget_lane_f64(vget_low_f64(a) + vget_high_f64(a), 0); }
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs)
+{
+ float64x2_t trn1, trn2;
+
+  // vzip1q/vzip2q interleave the two input vectors, giving the transposed 2x2 block.
+ trn1 = vzip1q_f64(vecs[0], vecs[1]);
+ trn2 = vzip2q_f64(vecs[0], vecs[1]);
+
+ // Do the addition of the resulting vectors
+ return vaddq_f64(trn1, trn2);
+}
+// Other reduction functions:
+// mul
+#if EIGEN_COMP_CLANG && defined(__apple_build_version__)
+template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a) { return (vget_low_f64(a) * vget_high_f64(a))[0]; }
+#else
+template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a) { return vget_lane_f64(vget_low_f64(a) * vget_high_f64(a), 0); }
+#endif
+
+// min
+template<> EIGEN_STRONG_INLINE double predux_min<Packet2d>(const Packet2d& a) { return vgetq_lane_f64(vpminq_f64(a, a), 0); }
+
+// max
+template<> EIGEN_STRONG_INLINE double predux_max<Packet2d>(const Packet2d& a) { return vgetq_lane_f64(vpmaxq_f64(a, a), 0); }
+
+// this PALIGN_NEON business is to work around a bug in LLVM Clang 3.0 causing spurious compilation errors,
+// see bug 347 and this LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=11074
+#define PALIGN_NEON(Offset,Type,Command) \
+template<>\
+struct palign_impl<Offset,Type>\
+{\
+ EIGEN_STRONG_INLINE static void run(Type& first, const Type& second)\
+ {\
+ if (Offset!=0)\
+ first = Command(first, second, Offset);\
+ }\
+};\
+
+PALIGN_NEON(0,Packet2d,vextq_f64)
+PALIGN_NEON(1,Packet2d,vextq_f64)
+#undef PALIGN_NEON
+
+EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2d,2>& kernel) {
+ float64x2_t trn1 = vzip1q_f64(kernel.packet[0], kernel.packet[1]);
+ float64x2_t trn2 = vzip2q_f64(kernel.packet[0], kernel.packet[1]);
+
+ kernel.packet[0] = trn1;
+ kernel.packet[1] = trn2;
+}
+#endif // EIGEN_ARCH_ARM64
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PACKET_MATH_NEON_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/SSE/Complex.h b/third_party/eigen3/Eigen/src/Core/arch/SSE/Complex.h
new file mode 100644
index 0000000000..2722893dcf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/SSE/Complex.h
@@ -0,0 +1,486 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX_SSE_H
+#define EIGEN_COMPLEX_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+//---------- float ----------
+struct Packet2cf
+{
+ EIGEN_STRONG_INLINE Packet2cf() {}
+ EIGEN_STRONG_INLINE explicit Packet2cf(const __m128& a) : v(a) {}
+ __m128 v;
+};
+
+// Use the packet_traits defined in AVX/PacketMath.h instead if we're going
+// to leverage AVX instructions.
+#ifndef EIGEN_VECTORIZE_AVX
+template<> struct packet_traits<std::complex<float> > : default_packet_traits
+{
+ typedef Packet2cf type;
+ typedef Packet2cf half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 2,
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0,
+ HasBlend = 1,
+ };
+};
+#endif
+
+template<> struct unpacket_traits<Packet2cf> { typedef std::complex<float> type; enum {size=2}; typedef Packet2cf half; };
+
+template<> EIGEN_STRONG_INLINE Packet2cf padd<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_add_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf psub<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_sub_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pnegate(const Packet2cf& a)
+{
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x80000000,0x80000000));
+ return Packet2cf(_mm_xor_ps(a.v,mask));
+}
+template<> EIGEN_STRONG_INLINE Packet2cf pconj(const Packet2cf& a)
+{
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
+ return Packet2cf(_mm_xor_ps(a.v,mask));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf pmul<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ // TODO optimize it for SSE3 and 4
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return Packet2cf(_mm_addsub_ps(_mm_mul_ps(_mm_moveldup_ps(a.v), b.v),
+ _mm_mul_ps(_mm_movehdup_ps(a.v),
+ vec4f_swizzle1(b.v, 1, 0, 3, 2))));
+// return Packet2cf(_mm_addsub_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v),
+// _mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
+// vec4f_swizzle1(b.v, 1, 0, 3, 2))));
+ #else
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x00000000,0x80000000,0x00000000));
+ return Packet2cf(_mm_add_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v),
+ _mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
+ vec4f_swizzle1(b.v, 1, 0, 3, 2)), mask)));
+ #endif
+}
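+// In the pmul above, the lanes implement (ar + i*ai)*(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br):
+// the duplicated real parts of a are multiplied by b, the duplicated imaginary parts of a are
+// multiplied by the pair-swapped b, and addsub (or the sign mask) combines them with the
+// required signs.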
+
+template<> EIGEN_STRONG_INLINE Packet2cf pand <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_and_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf por <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_or_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pxor <Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_xor_ps(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet2cf pandnot<Packet2cf>(const Packet2cf& a, const Packet2cf& b) { return Packet2cf(_mm_andnot_ps(a.v,b.v)); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf pload <Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_ALIGNED_LOAD return Packet2cf(pload<Packet4f>(&numext::real_ref(*from))); }
+template<> EIGEN_STRONG_INLINE Packet2cf ploadu<Packet2cf>(const std::complex<float>* from) { EIGEN_DEBUG_UNALIGNED_LOAD return Packet2cf(ploadu<Packet4f>(&numext::real_ref(*from))); }
+
+template<> EIGEN_STRONG_INLINE Packet2cf pset1<Packet2cf>(const std::complex<float>& from)
+{
+ Packet2cf res;
+#if EIGEN_GNUC_AT_MOST(4,2)
+ // Workaround annoying "may be used uninitialized in this function" warning with gcc 4.2
+ res.v = _mm_loadl_pi(_mm_set1_ps(0.0f), reinterpret_cast<const __m64*>(&from));
+#elif EIGEN_GNUC_AT_LEAST(4,6)
+ // Suppress annoying "may be used uninitialized in this function" warning with gcc >= 4.6
+ #pragma GCC diagnostic push
+ #pragma GCC diagnostic ignored "-Wuninitialized"
+ res.v = _mm_loadl_pi(res.v, (const __m64*)&from);
+ #pragma GCC diagnostic pop
+#else
+ res.v = _mm_loadl_pi(res.v, (const __m64*)&from);
+#endif
+ return Packet2cf(_mm_movelh_ps(res.v,res.v));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf ploaddup<Packet2cf>(const std::complex<float>* from) { return pset1<Packet2cf>(*from); }
+
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_ALIGNED_STORE pstore(&numext::real_ref(*to), Packet4f(from.v)); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<float> >(std::complex<float> * to, const Packet2cf& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(&numext::real_ref(*to), Packet4f(from.v)); }
+
+
+template<> EIGEN_DEVICE_FUNC inline Packet2cf pgather<std::complex<float>, Packet2cf>(const std::complex<float>* from, int stride)
+{
+ return Packet2cf(_mm_set_ps(std::imag(from[1*stride]), std::real(from[1*stride]),
+ std::imag(from[0*stride]), std::real(from[0*stride])));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<std::complex<float>, Packet2cf>(std::complex<float>* to, const Packet2cf& from, int stride)
+{
+ to[stride*0] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 0)),
+ _mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 1)));
+ to[stride*1] = std::complex<float>(_mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 2)),
+ _mm_cvtss_f32(_mm_shuffle_ps(from.v, from.v, 3)));
+}
+
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<float> >(const std::complex<float> * addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+
+template<> EIGEN_STRONG_INLINE std::complex<float> pfirst<Packet2cf>(const Packet2cf& a)
+{
+ #if EIGEN_GNUC_AT_MOST(4,3)
+  // Workaround for a gcc 4.2 ICE - this is not ideal performance-wise, but who cares...
+  // This workaround also fixes invalid code generation with gcc 4.3.
+ EIGEN_ALIGN16 std::complex<float> res[2];
+ _mm_store_ps((float*)res, a.v);
+ return res[0];
+ #else
+ std::complex<float> res;
+ _mm_storel_pi((__m64*)&res, a.v);
+ return res;
+ #endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preverse(const Packet2cf& a) { return Packet2cf(_mm_castpd_ps(preverse(Packet2d(_mm_castps_pd(a.v))))); }
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux<Packet2cf>(const Packet2cf& a)
+{
+ return pfirst(Packet2cf(_mm_add_ps(a.v, _mm_movehl_ps(a.v,a.v))));
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf preduxp<Packet2cf>(const Packet2cf* vecs)
+{
+ return Packet2cf(_mm_add_ps(_mm_movelh_ps(vecs[0].v,vecs[1].v), _mm_movehl_ps(vecs[1].v,vecs[0].v)));
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<float> predux_mul<Packet2cf>(const Packet2cf& a)
+{
+ return pfirst(pmul(a, Packet2cf(_mm_movehl_ps(a.v,a.v))));
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet2cf>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2cf& first, const Packet2cf& second)
+ {
+ if (Offset==1)
+ {
+ first.v = _mm_movehl_ps(first.v, first.v);
+ first.v = _mm_movelh_ps(first.v, second.v);
+ }
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, false,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return internal::pmul(a, pconj(b));
+ #else
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
+ return Packet2cf(_mm_add_ps(_mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), mask),
+ _mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
+ vec4f_swizzle1(b.v, 1, 0, 3, 2))));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,false>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return internal::pmul(pconj(a), b);
+ #else
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
+ return Packet2cf(_mm_add_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v),
+ _mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
+ vec4f_swizzle1(b.v, 1, 0, 3, 2)), mask)));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet2cf, Packet2cf, true,true>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& a, const Packet2cf& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return pconj(internal::pmul(a, b));
+ #else
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0x00000000,0x80000000,0x00000000,0x80000000));
+ return Packet2cf(_mm_sub_ps(_mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a.v, 0, 0, 2, 2), b.v), mask),
+ _mm_mul_ps(vec4f_swizzle1(a.v, 1, 1, 3, 3),
+ vec4f_swizzle1(b.v, 1, 0, 3, 2))));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet4f, Packet2cf, false,false>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet4f& x, const Packet2cf& y, const Packet2cf& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet4f& x, const Packet2cf& y) const
+ { return Packet2cf(Eigen::internal::pmul<Packet4f>(x, y.v)); }
+};
+
+template<> struct conj_helper<Packet2cf, Packet4f, false,false>
+{
+ EIGEN_STRONG_INLINE Packet2cf pmadd(const Packet2cf& x, const Packet4f& y, const Packet2cf& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet2cf pmul(const Packet2cf& x, const Packet4f& y) const
+ { return Packet2cf(Eigen::internal::pmul<Packet4f>(x.v, y)); }
+};
+
+template<> EIGEN_STRONG_INLINE Packet2cf pdiv<Packet2cf>(const Packet2cf& a, const Packet2cf& b)
+{
+ // TODO optimize it for SSE3 and 4
+ Packet2cf res = conj_helper<Packet2cf,Packet2cf,false,true>().pmul(a,b);
+ __m128 s = _mm_mul_ps(b.v,b.v);
+ return Packet2cf(_mm_div_ps(res.v,_mm_add_ps(s,_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(s), 0xb1)))));
+}
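+// The pdiv above computes a/b as a*conj(b) / |b|^2: res = a*conj(b), s = {br^2, bi^2, ...},
+// and the epi32 shuffle (0xb1) swaps adjacent lanes so that s + swap(s) holds
+// |b|^2 = br^2 + bi^2 in every lane before the final division.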
+
+EIGEN_STRONG_INLINE Packet2cf pcplxflip/*<Packet2cf>*/(const Packet2cf& x)
+{
+ return Packet2cf(vec4f_swizzle1(x.v, 1, 0, 3, 2));
+}
+
+
+//---------- double ----------
+struct Packet1cd
+{
+ EIGEN_STRONG_INLINE Packet1cd() {}
+ EIGEN_STRONG_INLINE explicit Packet1cd(const __m128d& a) : v(a) {}
+ __m128d v;
+};
+
+// Use the packet_traits defined in AVX/PacketMath.h instead if we're going
+// to leverage AVX instructions.
+#ifndef EIGEN_VECTORIZE_AVX
+template<> struct packet_traits<std::complex<double> > : default_packet_traits
+{
+ typedef Packet1cd type;
+ typedef Packet1cd half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 0,
+ size = 1,
+ HasHalfPacket = 0,
+
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasDiv = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 0,
+ HasMax = 0,
+ HasSetLinear = 0
+ };
+};
+#endif
+
+template<> struct unpacket_traits<Packet1cd> { typedef std::complex<double> type; enum {size=1}; typedef Packet1cd half; };
+
+template<> EIGEN_STRONG_INLINE Packet1cd padd<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_add_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd psub<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_sub_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pnegate(const Packet1cd& a) { return Packet1cd(pnegate(Packet2d(a.v))); }
+template<> EIGEN_STRONG_INLINE Packet1cd pconj(const Packet1cd& a)
+{
+ const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
+ return Packet1cd(_mm_xor_pd(a.v,mask));
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ // TODO optimize it for SSE3 and 4
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return Packet1cd(_mm_addsub_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
+ _mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
+ vec2d_swizzle1(b.v, 1, 0))));
+ #else
+ const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
+ return Packet1cd(_mm_add_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
+ _mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
+ vec2d_swizzle1(b.v, 1, 0)), mask)));
+ #endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd pand <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_and_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd por <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_or_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pxor <Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_xor_pd(a.v,b.v)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pandnot<Packet1cd>(const Packet1cd& a, const Packet1cd& b) { return Packet1cd(_mm_andnot_pd(a.v,b.v)); }
+
+// FIXME force unaligned load, this is a temporary fix
+template<> EIGEN_STRONG_INLINE Packet1cd pload <Packet1cd>(const std::complex<double>* from)
+{ EIGEN_DEBUG_ALIGNED_LOAD return Packet1cd(pload<Packet2d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE Packet1cd ploadu<Packet1cd>(const std::complex<double>* from)
+{ EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu<Packet2d>((const double*)from)); }
+template<> EIGEN_STRONG_INLINE Packet1cd pset1<Packet1cd>(const std::complex<double>& from)
+{ /* here we really have to use unaligned loads :( */ return ploadu<Packet1cd>(&from); }
+
+template<> EIGEN_STRONG_INLINE Packet1cd ploaddup<Packet1cd>(const std::complex<double>* from) { return pset1<Packet1cd>(*from); }
+
+// FIXME force unaligned store, this is a temporary fix
+template<> EIGEN_STRONG_INLINE void pstore <std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_ALIGNED_STORE pstore((double*)to, Packet2d(from.v)); }
+template<> EIGEN_STRONG_INLINE void pstoreu<std::complex<double> >(std::complex<double> * to, const Packet1cd& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu((double*)to, Packet2d(from.v)); }
+
+template<> EIGEN_STRONG_INLINE void prefetch<std::complex<double> >(const std::complex<double> * addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+
+template<> EIGEN_STRONG_INLINE std::complex<double> pfirst<Packet1cd>(const Packet1cd& a)
+{
+ EIGEN_ALIGN16 double res[2];
+ _mm_store_pd(res, a.v);
+ return std::complex<double>(res[0],res[1]);
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd preverse(const Packet1cd& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux<Packet1cd>(const Packet1cd& a)
+{
+ return pfirst(a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet1cd preduxp<Packet1cd>(const Packet1cd* vecs)
+{
+ return vecs[0];
+}
+
+template<> EIGEN_STRONG_INLINE std::complex<double> predux_mul<Packet1cd>(const Packet1cd& a)
+{
+ return pfirst(a);
+}
+
+template<int Offset>
+struct palign_impl<Offset,Packet1cd>
+{
+ static EIGEN_STRONG_INLINE void run(Packet1cd& /*first*/, const Packet1cd& /*second*/)
+ {
+    // FIXME is it certain that we never have to align a Packet1cd?
+    // Even though a std::complex<double> has 16 bytes, it is not necessarily aligned on a 16-byte boundary...
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, false,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return internal::pmul(a, pconj(b));
+ #else
+ const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
+ return Packet1cd(_mm_add_pd(_mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v), mask),
+ _mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
+ vec2d_swizzle1(b.v, 1, 0))));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,false>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return internal::pmul(pconj(a), b);
+ #else
+ const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
+ return Packet1cd(_mm_add_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
+ _mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
+ vec2d_swizzle1(b.v, 1, 0)), mask)));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet1cd, Packet1cd, true,true>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(pmul(x,y),c); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& a, const Packet1cd& b) const
+ {
+ #ifdef EIGEN_VECTORIZE_SSE3
+ return pconj(internal::pmul(a, b));
+ #else
+ const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
+ return Packet1cd(_mm_sub_pd(_mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v), mask),
+ _mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
+ vec2d_swizzle1(b.v, 1, 0))));
+ #endif
+ }
+};
+
+template<> struct conj_helper<Packet2d, Packet1cd, false,false>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet2d& x, const Packet1cd& y, const Packet1cd& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet2d& x, const Packet1cd& y) const
+ { return Packet1cd(Eigen::internal::pmul<Packet2d>(x, y.v)); }
+};
+
+template<> struct conj_helper<Packet1cd, Packet2d, false,false>
+{
+ EIGEN_STRONG_INLINE Packet1cd pmadd(const Packet1cd& x, const Packet2d& y, const Packet1cd& c) const
+ { return padd(c, pmul(x,y)); }
+
+ EIGEN_STRONG_INLINE Packet1cd pmul(const Packet1cd& x, const Packet2d& y) const
+ { return Packet1cd(Eigen::internal::pmul<Packet2d>(x.v, y)); }
+};
+
+template<> EIGEN_STRONG_INLINE Packet1cd pdiv<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
+{
+ // TODO optimize it for SSE3 and 4
+ Packet1cd res = conj_helper<Packet1cd,Packet1cd,false,true>().pmul(a,b);
+ __m128d s = _mm_mul_pd(b.v,b.v);
+ return Packet1cd(_mm_div_pd(res.v, _mm_add_pd(s,_mm_shuffle_pd(s, s, 0x1))));
+}
+
+EIGEN_STRONG_INLINE Packet1cd pcplxflip/*<Packet1cd>*/(const Packet1cd& x)
+{
+ return Packet1cd(preverse(Packet2d(x.v)));
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2cf,2>& kernel) {
+ __m128d w1 = _mm_castps_pd(kernel.packet[0].v);
+ __m128d w2 = _mm_castps_pd(kernel.packet[1].v);
+
+ __m128 tmp = _mm_castpd_ps(_mm_unpackhi_pd(w1, w2));
+ kernel.packet[0].v = _mm_castpd_ps(_mm_unpacklo_pd(w1, w2));
+ kernel.packet[1].v = tmp;
+}
+
+template<> EIGEN_STRONG_INLINE Packet2cf pblend(const Selector<2>& ifPacket, const Packet2cf& thenPacket, const Packet2cf& elsePacket) {
+ __m128d result = pblend(ifPacket, _mm_castps_pd(thenPacket.v), _mm_castps_pd(elsePacket.v));
+ return Packet2cf(_mm_castpd_ps(result));
+}
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_SSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/SSE/MathFunctions.h b/third_party/eigen3/Eigen/src/Core/arch/SSE/MathFunctions.h
new file mode 100644
index 0000000000..0baa7b4b58
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/SSE/MathFunctions.h
@@ -0,0 +1,529 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007 Julien Pommier
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/* The sin, cos, exp, and log functions of this file come from
+ * Julien Pommier's sse math library: http://gruntthepeon.free.fr/ssemath/
+ */
+
+#ifndef EIGEN_MATH_FUNCTIONS_SSE_H
+#define EIGEN_MATH_FUNCTIONS_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f plog<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
+
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inv_mant_mask, ~0x7f800000);
+
+  /* the smallest non-denormalized float number */
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(min_norm_pos, 0x00800000);
+  _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(minus_inf, 0xff800000); // -1.f/0.f
+
+  /* natural logarithm computed for 4 floats at once;
+     returns NaN for x <= 0
+  */
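+  /* Outline: write x = m * 2^e with the mantissa m folded into [sqrt(0.5), sqrt(2))
+     (adjusting e accordingly), approximate log(m) with the cephes polynomial in m - 1,
+     and add e*log(2), with log(2) split into the constants q1 + q2 for extra precision. */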
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_SQRTHF, 0.707106781186547524f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p0, 7.0376836292E-2f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p1, - 1.1514610310E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p2, 1.1676998740E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p3, - 1.2420140846E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p4, + 1.4249322787E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p5, - 1.6668057665E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p6, + 2.0000714765E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p7, - 2.4999993993E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_p8, + 3.3333331174E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q1, -2.12194440e-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_log_q2, 0.693359375f);
+
+
+ Packet4i emm0;
+
+  // invalid_mask is set (all ones) when x is NaN or negative
+ Packet4f invalid_mask = _mm_cmpnge_ps(x, _mm_setzero_ps());
+ Packet4f iszero_mask = _mm_cmpeq_ps(x, _mm_setzero_ps());
+
+ x = pmax(x, p4f_min_norm_pos); /* cut off denormalized stuff */
+ emm0 = _mm_srli_epi32(_mm_castps_si128(x), 23);
+
+ /* keep only the fractional part */
+ x = _mm_and_ps(x, p4f_inv_mant_mask);
+ x = _mm_or_ps(x, p4f_half);
+
+ emm0 = _mm_sub_epi32(emm0, p4i_0x7f);
+ Packet4f e = padd(Packet4f(_mm_cvtepi32_ps(emm0)), p4f_1);
+
+ /* part2:
+ if( x < SQRTHF ) {
+ e -= 1;
+ x = x + x - 1.0;
+ } else { x = x - 1.0; }
+ */
+ Packet4f mask = _mm_cmplt_ps(x, p4f_cephes_SQRTHF);
+ Packet4f tmp = pand(x, mask);
+ x = psub(x, p4f_1);
+ e = psub(e, pand(p4f_1, mask));
+ x = padd(x, tmp);
+
+ Packet4f x2 = pmul(x,x);
+ Packet4f x3 = pmul(x2,x);
+
+ Packet4f y, y1, y2;
+ y = pmadd(p4f_cephes_log_p0, x, p4f_cephes_log_p1);
+ y1 = pmadd(p4f_cephes_log_p3, x, p4f_cephes_log_p4);
+ y2 = pmadd(p4f_cephes_log_p6, x, p4f_cephes_log_p7);
+ y = pmadd(y , x, p4f_cephes_log_p2);
+ y1 = pmadd(y1, x, p4f_cephes_log_p5);
+ y2 = pmadd(y2, x, p4f_cephes_log_p8);
+ y = pmadd(y, x3, y1);
+ y = pmadd(y, x3, y2);
+ y = pmul(y, x3);
+
+ y1 = pmul(e, p4f_cephes_log_q1);
+ tmp = pmul(x2, p4f_half);
+ y = padd(y, y1);
+ x = psub(x, tmp);
+ y2 = pmul(e, p4f_cephes_log_q2);
+ x = padd(x, y);
+ x = padd(x, y2);
+ // negative arg will be NAN, 0 will be -INF
+ return _mm_or_ps(_mm_andnot_ps(iszero_mask, _mm_or_ps(x, invalid_mask)),
+ _mm_and_ps(iszero_mask, p4f_minus_inf));
+}
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f pexp<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+ _EIGEN_DECLARE_CONST_Packet4i(0x7f, 0x7f);
+
+
+ _EIGEN_DECLARE_CONST_Packet4f(exp_hi, 88.3762626647950f);
+ _EIGEN_DECLARE_CONST_Packet4f(exp_lo, -88.3762626647949f);
+
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_LOG2EF, 1.44269504088896341f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C1, 0.693359375f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_C2, -2.12194440e-4f);
+
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p0, 1.9875691500E-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p1, 1.3981999507E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p2, 8.3334519073E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p3, 4.1665795894E-2f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p4, 1.6666665459E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_exp_p5, 5.0000001201E-1f);
+
+ Packet4f tmp, fx;
+ Packet4i emm0;
+
+ // clamp x
+ x = pmax(pmin(x, p4f_exp_hi), p4f_exp_lo);
+
+ /* express exp(x) as exp(g + n*log(2)) */
+ fx = pmadd(x, p4f_cephes_LOG2EF, p4f_half);
+
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ fx = _mm_floor_ps(fx);
+#else
+ emm0 = _mm_cvttps_epi32(fx);
+ tmp = _mm_cvtepi32_ps(emm0);
+  /* if greater, subtract 1 */
+ Packet4f mask = _mm_cmpgt_ps(tmp, fx);
+ mask = _mm_and_ps(mask, p4f_1);
+ fx = psub(tmp, mask);
+#endif
+
+ tmp = pmul(fx, p4f_cephes_exp_C1);
+ Packet4f z = pmul(fx, p4f_cephes_exp_C2);
+ x = psub(x, tmp);
+ x = psub(x, z);
+
+ z = pmul(x,x);
+
+ Packet4f y = p4f_cephes_exp_p0;
+ y = pmadd(y, x, p4f_cephes_exp_p1);
+ y = pmadd(y, x, p4f_cephes_exp_p2);
+ y = pmadd(y, x, p4f_cephes_exp_p3);
+ y = pmadd(y, x, p4f_cephes_exp_p4);
+ y = pmadd(y, x, p4f_cephes_exp_p5);
+ y = pmadd(y, z, x);
+ y = padd(y, p4f_1);
+
+ // build 2^n
+ emm0 = _mm_cvttps_epi32(fx);
+ emm0 = _mm_add_epi32(emm0, p4i_0x7f);
+ emm0 = _mm_slli_epi32(emm0, 23);
+ return pmax(pmul(y, Packet4f(_mm_castsi128_ps(emm0))), _x);
+}
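+// The tail of pexp above builds 2^n directly in IEEE754 bits: adding the bias 127 and shifting
+// into the exponent field yields the float 2^n, e.g. n = 3 gives 0x41000000, which is 8.0f.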
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet2d pexp<Packet2d>(const Packet2d& _x)
+{
+ Packet2d x = _x;
+
+ _EIGEN_DECLARE_CONST_Packet2d(1 , 1.0);
+ _EIGEN_DECLARE_CONST_Packet2d(2 , 2.0);
+ _EIGEN_DECLARE_CONST_Packet2d(half, 0.5);
+
+ _EIGEN_DECLARE_CONST_Packet2d(exp_hi, 709.437);
+ _EIGEN_DECLARE_CONST_Packet2d(exp_lo, -709.436139303);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_LOG2EF, 1.4426950408889634073599);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p0, 1.26177193074810590878e-4);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p1, 3.02994407707441961300e-2);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_p2, 9.99999999999999999910e-1);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q0, 3.00198505138664455042e-6);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q1, 2.52448340349684104192e-3);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q2, 2.27265548208155028766e-1);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_q3, 2.00000000000000000009e0);
+
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C1, 0.693145751953125);
+ _EIGEN_DECLARE_CONST_Packet2d(cephes_exp_C2, 1.42860682030941723212e-6);
+ static const __m128i p4i_1023_0 = _mm_setr_epi32(1023, 1023, 0, 0);
+
+ Packet2d tmp, fx;
+ Packet4i emm0;
+
+ // clamp x
+ x = pmax(pmin(x, p2d_exp_hi), p2d_exp_lo);
+ /* express exp(x) as exp(g + n*log(2)) */
+ fx = pmadd(p2d_cephes_LOG2EF, x, p2d_half);
+
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ fx = _mm_floor_pd(fx);
+#else
+ emm0 = _mm_cvttpd_epi32(fx);
+ tmp = _mm_cvtepi32_pd(emm0);
+  /* if greater, subtract 1 */
+ Packet2d mask = _mm_cmpgt_pd(tmp, fx);
+ mask = _mm_and_pd(mask, p2d_1);
+ fx = psub(tmp, mask);
+#endif
+
+ tmp = pmul(fx, p2d_cephes_exp_C1);
+ Packet2d z = pmul(fx, p2d_cephes_exp_C2);
+ x = psub(x, tmp);
+ x = psub(x, z);
+
+ Packet2d x2 = pmul(x,x);
+
+ Packet2d px = p2d_cephes_exp_p0;
+ px = pmadd(px, x2, p2d_cephes_exp_p1);
+ px = pmadd(px, x2, p2d_cephes_exp_p2);
+ px = pmul (px, x);
+
+ Packet2d qx = p2d_cephes_exp_q0;
+ qx = pmadd(qx, x2, p2d_cephes_exp_q1);
+ qx = pmadd(qx, x2, p2d_cephes_exp_q2);
+ qx = pmadd(qx, x2, p2d_cephes_exp_q3);
+
+ x = pdiv(px,psub(qx,px));
+ x = pmadd(p2d_2,x,p2d_1);
+
+ // build 2^n
+ emm0 = _mm_cvttpd_epi32(fx);
+ emm0 = _mm_add_epi32(emm0, p4i_1023_0);
+ emm0 = _mm_slli_epi32(emm0, 20);
+ emm0 = _mm_shuffle_epi32(emm0, _MM_SHUFFLE(1,2,0,3));
+ return pmax(pmul(x, Packet2d(_mm_castsi128_pd(emm0))), _x);
+}
+
+/* Evaluation of 4 sines at once, using SSE2 intrinsics.
+
+   The code is an exact rewriting of the cephes sinf function.
+   Precision is excellent as long as x < 8192 (the special handling cephes applies to
+   larger values is not reproduced here -- arguments over 8192 do not return garbage,
+   but the extra precision is missing).
+
+   Note that it is such that sinf((float)M_PI) = 8.74e-8, which is the
+   surprising but correct result.
+*/
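+/* Rough outline of the reduction below: j = the integer part of x*4/Pi, forced to an even
+   value; r = x - j*Pi/4 is computed in extended precision via the three DP constants; the
+   low bits of j then select between the sine and cosine polynomials on [0, Pi/4] and decide
+   whether the sign of the result must be flipped. */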
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f psin<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+
+ _EIGEN_DECLARE_CONST_Packet4i(1, 1);
+ _EIGEN_DECLARE_CONST_Packet4i(not1, ~1);
+ _EIGEN_DECLARE_CONST_Packet4i(2, 2);
+ _EIGEN_DECLARE_CONST_Packet4i(4, 4);
+
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(sign_mask, 0x80000000);
+
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP1,-0.78515625f);
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP2, -2.4187564849853515625e-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP3, -3.77489497744594108e-8f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p0, -1.9515295891E-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p1, 8.3321608736E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p2, -1.6666654611E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p0, 2.443315711809948E-005f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p1, -1.388731625493765E-003f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p2, 4.166664568298827E-002f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_FOPI, 1.27323954473516f); // 4 / M_PI
+
+ Packet4f xmm1, xmm2, xmm3, sign_bit, y;
+
+ Packet4i emm0, emm2;
+ sign_bit = x;
+ /* take the absolute value */
+ x = pabs(x);
+
+ /* take the modulo */
+
+ /* extract the sign bit (upper one) */
+ sign_bit = _mm_and_ps(sign_bit, p4f_sign_mask);
+
+ /* scale by 4/Pi */
+ y = pmul(x, p4f_cephes_FOPI);
+
+ /* store the integer part of y in mm0 */
+ emm2 = _mm_cvttps_epi32(y);
+ /* j=(j+1) & (~1) (see the cephes sources) */
+ emm2 = _mm_add_epi32(emm2, p4i_1);
+ emm2 = _mm_and_si128(emm2, p4i_not1);
+ y = _mm_cvtepi32_ps(emm2);
+ /* get the swap sign flag */
+ emm0 = _mm_and_si128(emm2, p4i_4);
+ emm0 = _mm_slli_epi32(emm0, 29);
+  /* get the polynomial selection mask:
+     there is one polynomial for 0 <= x <= Pi/4
+     and another one for Pi/4 < x <= Pi/2.
+
+     Both branches will be computed.
+  */
+ emm2 = _mm_and_si128(emm2, p4i_2);
+ emm2 = _mm_cmpeq_epi32(emm2, _mm_setzero_si128());
+
+ Packet4f swap_sign_bit = _mm_castsi128_ps(emm0);
+ Packet4f poly_mask = _mm_castsi128_ps(emm2);
+ sign_bit = _mm_xor_ps(sign_bit, swap_sign_bit);
+
+ /* The magic pass: "Extended precision modular arithmetic"
+ x = ((x - y * DP1) - y * DP2) - y * DP3; */
+ xmm1 = pmul(y, p4f_minus_cephes_DP1);
+ xmm2 = pmul(y, p4f_minus_cephes_DP2);
+ xmm3 = pmul(y, p4f_minus_cephes_DP3);
+ x = padd(x, xmm1);
+ x = padd(x, xmm2);
+ x = padd(x, xmm3);
+
+  /* Evaluate the first polynomial (0 <= x <= Pi/4) */
+ y = p4f_coscof_p0;
+ Packet4f z = _mm_mul_ps(x,x);
+
+ y = pmadd(y, z, p4f_coscof_p1);
+ y = pmadd(y, z, p4f_coscof_p2);
+ y = pmul(y, z);
+ y = pmul(y, z);
+ Packet4f tmp = pmul(z, p4f_half);
+ y = psub(y, tmp);
+ y = padd(y, p4f_1);
+
+  /* Evaluate the second polynomial (Pi/4 <= x <= Pi/2) */
+
+ Packet4f y2 = p4f_sincof_p0;
+ y2 = pmadd(y2, z, p4f_sincof_p1);
+ y2 = pmadd(y2, z, p4f_sincof_p2);
+ y2 = pmul(y2, z);
+ y2 = pmul(y2, x);
+ y2 = padd(y2, x);
+
+  /* select the correct result from the two polynomials */
+ y2 = _mm_and_ps(poly_mask, y2);
+ y = _mm_andnot_ps(poly_mask, y);
+ y = _mm_or_ps(y,y2);
+ /* update the sign */
+ return _mm_xor_ps(y, sign_bit);
+}
+
+/* almost the same as psin */
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f pcos<Packet4f>(const Packet4f& _x)
+{
+ Packet4f x = _x;
+ _EIGEN_DECLARE_CONST_Packet4f(1 , 1.0f);
+ _EIGEN_DECLARE_CONST_Packet4f(half, 0.5f);
+
+ _EIGEN_DECLARE_CONST_Packet4i(1, 1);
+ _EIGEN_DECLARE_CONST_Packet4i(not1, ~1);
+ _EIGEN_DECLARE_CONST_Packet4i(2, 2);
+ _EIGEN_DECLARE_CONST_Packet4i(4, 4);
+
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP1,-0.78515625f);
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP2, -2.4187564849853515625e-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(minus_cephes_DP3, -3.77489497744594108e-8f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p0, -1.9515295891E-4f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p1, 8.3321608736E-3f);
+ _EIGEN_DECLARE_CONST_Packet4f(sincof_p2, -1.6666654611E-1f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p0, 2.443315711809948E-005f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p1, -1.388731625493765E-003f);
+ _EIGEN_DECLARE_CONST_Packet4f(coscof_p2, 4.166664568298827E-002f);
+ _EIGEN_DECLARE_CONST_Packet4f(cephes_FOPI, 1.27323954473516f); // 4 / M_PI
+
+ Packet4f xmm1, xmm2, xmm3, y;
+ Packet4i emm0, emm2;
+
+ x = pabs(x);
+
+ /* scale by 4/Pi */
+ y = pmul(x, p4f_cephes_FOPI);
+
+ /* get the integer part of y */
+ emm2 = _mm_cvttps_epi32(y);
+ /* j=(j+1) & (~1) (see the cephes sources) */
+ emm2 = _mm_add_epi32(emm2, p4i_1);
+ emm2 = _mm_and_si128(emm2, p4i_not1);
+ y = _mm_cvtepi32_ps(emm2);
+
+ emm2 = _mm_sub_epi32(emm2, p4i_2);
+
+ /* get the swap sign flag */
+ emm0 = _mm_andnot_si128(emm2, p4i_4);
+ emm0 = _mm_slli_epi32(emm0, 29);
+  /* get the polynomial selection mask */
+ emm2 = _mm_and_si128(emm2, p4i_2);
+ emm2 = _mm_cmpeq_epi32(emm2, _mm_setzero_si128());
+
+ Packet4f sign_bit = _mm_castsi128_ps(emm0);
+ Packet4f poly_mask = _mm_castsi128_ps(emm2);
+
+ /* The magic pass: "Extended precision modular arithmetic"
+ x = ((x - y * DP1) - y * DP2) - y * DP3; */
+ xmm1 = pmul(y, p4f_minus_cephes_DP1);
+ xmm2 = pmul(y, p4f_minus_cephes_DP2);
+ xmm3 = pmul(y, p4f_minus_cephes_DP3);
+ x = padd(x, xmm1);
+ x = padd(x, xmm2);
+ x = padd(x, xmm3);
+
+  /* Evaluate the first polynomial (0 <= x <= Pi/4) */
+ y = p4f_coscof_p0;
+ Packet4f z = pmul(x,x);
+
+ y = pmadd(y,z,p4f_coscof_p1);
+ y = pmadd(y,z,p4f_coscof_p2);
+ y = pmul(y, z);
+ y = pmul(y, z);
+ Packet4f tmp = _mm_mul_ps(z, p4f_half);
+ y = psub(y, tmp);
+ y = padd(y, p4f_1);
+
+  /* Evaluate the second polynomial (Pi/4 <= x <= Pi/2) */
+ Packet4f y2 = p4f_sincof_p0;
+ y2 = pmadd(y2, z, p4f_sincof_p1);
+ y2 = pmadd(y2, z, p4f_sincof_p2);
+ y2 = pmul(y2, z);
+ y2 = pmadd(y2, x, x);
+
+  /* select the correct result from the two polynomials */
+ y2 = _mm_and_ps(poly_mask, y2);
+ y = _mm_andnot_ps(poly_mask, y);
+ y = _mm_or_ps(y,y2);
+
+ /* update the sign */
+ return _mm_xor_ps(y, sign_bit);
+}
+
+#if EIGEN_FAST_MATH
+
+// This is based on Quake3's fast inverse square root.
+// For details, see: http://www.beyond3d.com/content/articles/8/
+// It lacks 1 (or 2, in rare cases) bits of precision, and does not handle negative, +inf, or denormalized numbers correctly.
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f psqrt<Packet4f>(const Packet4f& _x)
+{
+ Packet4f half = pmul(_x, pset1<Packet4f>(.5f));
+
+ /* select only the inverse sqrt of non-zero inputs */
+ Packet4f non_zero_mask = _mm_cmpge_ps(_x, pset1<Packet4f>((std::numeric_limits<float>::min)()));
+ Packet4f x = _mm_and_ps(non_zero_mask, _mm_rsqrt_ps(_x));
+
+ x = pmul(x, psub(pset1<Packet4f>(1.5f), pmul(half, pmul(x,x))));
+ return pmul(_x,x);
+}
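+// The refinement in psqrt above is one Newton step for f(y) = 1/y^2 - x, i.e.
+// y <- y*(1.5 - 0.5*x*y^2), which sharpens the rsqrt estimate; multiplying by x at the end
+// turns 1/sqrt(x) into sqrt(x).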
+
+#else
+
+template<> EIGEN_STRONG_INLINE Packet4f psqrt<Packet4f>(const Packet4f& x) { return _mm_sqrt_ps(x); }
+
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet2d psqrt<Packet2d>(const Packet2d& x) { return _mm_sqrt_pd(x); }
+
+
+#if EIGEN_FAST_MATH
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f prsqrt<Packet4f>(const Packet4f& _x) {
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(inf, 0x7f800000);
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(nan, 0x7fc00000);
+ _EIGEN_DECLARE_CONST_Packet4f(one_point_five, 1.5f);
+ _EIGEN_DECLARE_CONST_Packet4f(minus_half, -0.5f);
+ _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(flt_min, 0x00800000);
+
+ Packet4f neg_half = pmul(_x, p4f_minus_half);
+
+ // select only the inverse sqrt of positive normal inputs (denormals are
+ // flushed to zero and cause infs as well).
+ Packet4f le_zero_mask = _mm_cmple_ps(_x, p4f_flt_min);
+ Packet4f x = _mm_andnot_ps(le_zero_mask, _mm_rsqrt_ps(_x));
+
+ // Fill in NaNs and Infs for the negative/zero entries.
+ Packet4f neg_mask = _mm_cmplt_ps(_x, _mm_setzero_ps());
+ Packet4f zero_mask = _mm_andnot_ps(neg_mask, le_zero_mask);
+ Packet4f infs_and_nans = _mm_or_ps(_mm_and_ps(neg_mask, p4f_nan),
+ _mm_and_ps(zero_mask, p4f_inf));
+
+ // Do a single step of Newton's iteration.
+ x = pmul(x, pmadd(neg_half, pmul(x, x), p4f_one_point_five));
+
+ // Insert NaNs and Infs in all the right places.
+ return _mm_or_ps(x, infs_and_nans);
+}
+
+#else
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet4f prsqrt<Packet4f>(const Packet4f& x) {
+  // Unfortunately we can't use the much faster _mm_rsqrt_ps since it only provides an approximation.
+ return _mm_div_ps(pset1<Packet4f>(1.0f), _mm_sqrt_ps(x));
+}
+
+#endif
+
+template<> EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS EIGEN_UNUSED
+Packet2d prsqrt<Packet2d>(const Packet2d& x) {
+  // Unfortunately there is no fast packed-double reciprocal sqrt intrinsic, so we fall back to a full-precision division.
+ return _mm_div_pd(pset1<Packet2d>(1.0), _mm_sqrt_pd(x));
+}
+
+// Identical to the ptanh in GenericPacketMath.h, but for doubles uses
+// a small/medium approximation threshold of 0.001.
+template<> EIGEN_STRONG_INLINE Packet2d ptanh_approx_threshold() {
+ return pset1<Packet2d>(0.001);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MATH_FUNCTIONS_SSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/SSE/PacketMath.h b/third_party/eigen3/Eigen/src/Core/arch/SSE/PacketMath.h
new file mode 100644
index 0000000000..7f4274fd99
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/SSE/PacketMath.h
@@ -0,0 +1,883 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PACKET_MATH_SSE_H
+#define EIGEN_PACKET_MATH_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+#ifndef EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
+#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
+#endif
+
+#ifndef EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS
+#define EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS (2*sizeof(void*))
+#endif
+
+#ifdef __FMA__
+#ifndef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+#define EIGEN_HAS_SINGLE_INSTRUCTION_MADD 1
+#endif
+#endif
+
+typedef __m128 Packet4f;
+typedef __m128i Packet4i;
+typedef __m128d Packet2d;
+
+template<> struct is_arithmetic<__m128> { enum { value = true }; };
+template<> struct is_arithmetic<__m128i> { enum { value = true }; };
+template<> struct is_arithmetic<__m128d> { enum { value = true }; };
+
+#define vec4f_swizzle1(v,p,q,r,s) \
+ (_mm_castsi128_ps(_mm_shuffle_epi32( _mm_castps_si128(v), ((s)<<6|(r)<<4|(q)<<2|(p)))))
+
+#define vec4i_swizzle1(v,p,q,r,s) \
+ (_mm_shuffle_epi32( v, ((s)<<6|(r)<<4|(q)<<2|(p))))
+
+#define vec2d_swizzle1(v,p,q) \
+ (_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), ((q*2+1)<<6|(q*2)<<4|(p*2+1)<<2|(p*2)))))
+
+#define vec4f_swizzle2(a,b,p,q,r,s) \
+ (_mm_shuffle_ps( (a), (b), ((s)<<6|(r)<<4|(q)<<2|(p))))
+
+#define vec4i_swizzle2(a,b,p,q,r,s) \
+ (_mm_castps_si128( (_mm_shuffle_ps( _mm_castsi128_ps(a), _mm_castsi128_ps(b), ((s)<<6|(r)<<4|(q)<<2|(p))))))
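+// e.g. vec4f_swizzle1(v,1,0,3,2) yields {v[1], v[0], v[3], v[2]}: p,q,r,s select the source
+// lane for result lanes 0..3 (this is how Complex.h swaps real/imaginary pairs).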
+
+#define _EIGEN_DECLARE_CONST_Packet4f(NAME,X) \
+ const Packet4f p4f_##NAME = pset1<Packet4f>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet2d(NAME,X) \
+ const Packet2d p2d_##NAME = pset1<Packet2d>(X)
+
+#define _EIGEN_DECLARE_CONST_Packet4f_FROM_INT(NAME,X) \
+ const Packet4f p4f_##NAME = _mm_castsi128_ps(pset1<Packet4i>(X))
+
+#define _EIGEN_DECLARE_CONST_Packet4i(NAME,X) \
+ const Packet4i p4i_##NAME = pset1<Packet4i>(X)
+
+
+// Use the packet_traits defined in AVX/PacketMath.h instead if we're going
+// to leverage AVX instructions.
+#ifndef EIGEN_VECTORIZE_AVX
+template<> struct packet_traits<float> : default_packet_traits
+{
+ typedef Packet4f type;
+ typedef Packet4f half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4,
+ HasHalfPacket = 0,
+
+ HasDiv = 1,
+ HasSin = EIGEN_FAST_MATH,
+ HasCos = EIGEN_FAST_MATH,
+ HasTanH = 1,
+ HasLog = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+
+ HasBlend = 1,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+};
+template<> struct packet_traits<double> : default_packet_traits
+{
+ typedef Packet2d type;
+ typedef Packet2d half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=2,
+ HasHalfPacket = 0,
+
+ HasDiv = 1,
+ HasTanH = 1,
+ HasExp = 1,
+ HasSqrt = 1,
+ HasRsqrt = 1,
+
+ HasBlend = 1,
+ HasSelect = 1,
+ HasEq = 1,
+ };
+};
+#endif
+template<> struct packet_traits<int> : default_packet_traits
+{
+ typedef Packet4i type;
+ typedef Packet4i half;
+ enum {
+ // FIXME check the Has*
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size=4,
+
+ HasBlend = 1,
+ };
+};
+
+template<> struct unpacket_traits<Packet4f> { typedef float type; enum {size=4}; typedef Packet4f half; };
+template<> struct unpacket_traits<Packet2d> { typedef double type; enum {size=2}; typedef Packet2d half; };
+template<> struct unpacket_traits<Packet4i> { typedef int type; enum {size=4}; typedef Packet4i half; };
+
+#if EIGEN_COMP_MSVC==1500
+// Workaround MSVC 9 internal compiler error.
+// TODO: It has been detected with win64 builds (amd64), so let's check whether it also happens in 32-bit + SSE mode.
+// TODO: let's check whether a better fix exists, like adding a pset0() function. (It crashed on pset1(0).)
+template<> EIGEN_STRONG_INLINE Packet4f pset1<Packet4f>(const float& from) { return _mm_set_ps(from,from,from,from); }
+template<> EIGEN_STRONG_INLINE Packet2d pset1<Packet2d>(const double& from) { return _mm_set_pd(from,from); }
+template<> EIGEN_STRONG_INLINE Packet4i pset1<Packet4i>(const int& from) { return _mm_set_epi32(from,from,from,from); }
+#else
+template<> EIGEN_STRONG_INLINE Packet4f pset1<Packet4f>(const float& from) { return _mm_set_ps1(from); }
+template<> EIGEN_STRONG_INLINE Packet2d pset1<Packet2d>(const double& from) { return _mm_set1_pd(from); }
+template<> EIGEN_STRONG_INLINE Packet4i pset1<Packet4i>(const int& from) { return _mm_set1_epi32(from); }
+#endif
+
+// GCC generates a shufps instruction for _mm_set1_ps/_mm_load1_ps instead of the more efficient pshufd instruction.
+// However, using intrinsics for pset1 makes gcc generate poor code in some cases (see bug 203).
+// Using inline assembly is also not an option because then gcc fails to reorder the instructions properly.
+// Therefore, we introduced the pload1 functions to be used in product kernels for which bug 203 does not apply.
+// Also note that with AVX, we want it to generate a vbroadcastss.
+#if EIGEN_COMP_GNUC_STRICT && (!defined __AVX__)
+template<> EIGEN_STRONG_INLINE Packet4f pload1<Packet4f>(const float *from) {
+ return vec4f_swizzle1(_mm_load_ss(from),0,0,0,0);
+}
+#endif
+
+#ifndef EIGEN_VECTORIZE_AVX
+template<> EIGEN_STRONG_INLINE Packet4f plset<float>(const float& a) { return _mm_add_ps(pset1<Packet4f>(a), _mm_set_ps(3,2,1,0)); }
+template<> EIGEN_STRONG_INLINE Packet2d plset<double>(const double& a) { return _mm_add_pd(pset1<Packet2d>(a),_mm_set_pd(1,0)); }
+#endif
+template<> EIGEN_STRONG_INLINE Packet4i plset<int>(const int& a) { return _mm_add_epi32(pset1<Packet4i>(a),_mm_set_epi32(3,2,1,0)); }
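+
+// Usage sketch (illustrative, hypothetical buffer): plset produces an arithmetic sequence
+// starting at `a`, which is handy when vectorizing loops over indices.
+//   float buf[4];
+//   pstoreu(buf, plset<float>(10.0f));  // buf becomes {10.f, 11.f, 12.f, 13.f}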
+
+template<> EIGEN_STRONG_INLINE Packet4f padd<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_add_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d padd<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_add_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i padd<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_add_epi32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f psub<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_sub_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d psub<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_sub_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i psub<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_sub_epi32(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f ple<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_cmple_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d ple<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_cmple_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f plt<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_cmplt_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d plt<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_cmplt_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f peq<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_cmpeq_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d peq<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_cmpeq_pd(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pselect<Packet4f>(const Packet4f& a, const Packet4f& b, const Packet4f& false_mask) {
+#if defined(EIGEN_VECTORIZE_SSE4_1)
+ return _mm_blendv_ps(a, b, false_mask);
+#else
+ return _mm_or_ps(_mm_andnot_ps(false_mask, a), _mm_and_ps(false_mask, b));
+#endif
+}
+template<> EIGEN_STRONG_INLINE Packet2d pselect<Packet2d>(const Packet2d& a, const Packet2d& b, const Packet2d& false_mask) {
+#if defined(EIGEN_VECTORIZE_SSE4_1)
+ return _mm_blendv_pd(a, b, false_mask);
+#else
+ return _mm_or_pd(_mm_andnot_pd(false_mask, a), _mm_and_pd(false_mask, b));
+#endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f pnegate(const Packet4f& a)
+{
+ const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x80000000,0x80000000,0x80000000,0x80000000));
+ return _mm_xor_ps(a,mask);
+}
+template<> EIGEN_STRONG_INLINE Packet2d pnegate(const Packet2d& a)
+{
+ const Packet2d mask = _mm_castsi128_pd(_mm_setr_epi32(0x0,0x80000000,0x0,0x80000000));
+ return _mm_xor_pd(a,mask);
+}
+template<> EIGEN_STRONG_INLINE Packet4i pnegate(const Packet4i& a)
+{
+ return psub(Packet4i(_mm_setr_epi32(0,0,0,0)), a);
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f pconj(const Packet4f& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet2d pconj(const Packet2d& a) { return a; }
+template<> EIGEN_STRONG_INLINE Packet4i pconj(const Packet4i& a) { return a; }
+
+template<> EIGEN_STRONG_INLINE Packet4f pmul<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_mul_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pmul<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_mul_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmul<Packet4i>(const Packet4i& a, const Packet4i& b)
+{
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_mullo_epi32(a,b);
+#else
+ // this version is slightly faster than 4 scalar products
+ return vec4i_swizzle1(
+ vec4i_swizzle2(
+ _mm_mul_epu32(a,b),
+ _mm_mul_epu32(vec4i_swizzle1(a,1,0,3,2),
+ vec4i_swizzle1(b,1,0,3,2)),
+ 0,2,0,2),
+ 0,2,1,3);
+#endif
+}
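+
+// Worked example of the SSE2 path above (illustrative): with a = {1,2,3,4} and b = {5,6,7,8},
+// _mm_mul_epu32 multiplies only the even lanes (0 and 2) into 64-bit results, so
+//   _mm_mul_epu32(a, b)                    -> {1*5, 3*7} = {5, 21}
+//   _mm_mul_epu32(swizzled a, swizzled b)  -> {2*6, 4*8} = {12, 32}
+// and the two swizzles interleave the low 32 bits back into {5, 12, 21, 32} = a*b.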
+
+template<> EIGEN_STRONG_INLINE Packet4f pdiv<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_div_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pdiv<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_div_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pdiv<Packet4i>(const Packet4i& /*a*/, const Packet4i& /*b*/)
+{ eigen_assert(false && "packet integer division are not supported by SSE");
+ return pset1<Packet4i>(0);
+}
+
+// For some unclear reasons, pmadd has to be overloaded for packets of integers.
+template<> EIGEN_STRONG_INLINE Packet4i pmadd(const Packet4i& a, const Packet4i& b, const Packet4i& c) { return padd(pmul(a,b), c); }
+#ifdef __FMA__
+template<> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c) { return _mm_fmadd_ps(a,b,c); }
+template<> EIGEN_STRONG_INLINE Packet2d pmadd(const Packet2d& a, const Packet2d& b, const Packet2d& c) { return _mm_fmadd_pd(a,b,c); }
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet4f pmin<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_min_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pmin<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_min_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmin<Packet4i>(const Packet4i& a, const Packet4i& b)
+{
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_min_epi32(a,b);
+#else
+ // after some benchmarking, this version *is* faster than a scalar implementation
+ Packet4i mask = _mm_cmplt_epi32(a,b);
+ return _mm_or_si128(_mm_and_si128(mask,a),_mm_andnot_si128(mask,b));
+#endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f pmax<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_max_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pmax<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_max_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pmax<Packet4i>(const Packet4i& a, const Packet4i& b)
+{
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_max_epi32(a,b);
+#else
+ // after some benchmarking, this version *is* faster than a scalar implementation
+ Packet4i mask = _mm_cmpgt_epi32(a,b);
+ return _mm_or_si128(_mm_and_si128(mask,a),_mm_andnot_si128(mask,b));
+#endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f pand<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_and_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pand<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_and_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pand<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_and_si128(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f por<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_or_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d por<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_or_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i por<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_or_si128(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pxor<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_xor_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pxor<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_xor_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pxor<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_xor_si128(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pandnot<Packet4f>(const Packet4f& a, const Packet4f& b) { return _mm_andnot_ps(a,b); }
+template<> EIGEN_STRONG_INLINE Packet2d pandnot<Packet2d>(const Packet2d& a, const Packet2d& b) { return _mm_andnot_pd(a,b); }
+template<> EIGEN_STRONG_INLINE Packet4i pandnot<Packet4i>(const Packet4i& a, const Packet4i& b) { return _mm_andnot_si128(a,b); }
+
+template<> EIGEN_STRONG_INLINE Packet4f pload<Packet4f>(const float* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_ps(from); }
+template<> EIGEN_STRONG_INLINE Packet2d pload<Packet2d>(const double* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_pd(from); }
+template<> EIGEN_STRONG_INLINE Packet4i pload<Packet4i>(const int* from) { EIGEN_DEBUG_ALIGNED_LOAD return _mm_load_si128(reinterpret_cast<const __m128i*>(from)); }
+
+#if EIGEN_COMP_MSVC
+ template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from) {
+ EIGEN_DEBUG_UNALIGNED_LOAD
+ #if (EIGEN_COMP_MSVC==1600)
+ // NOTE Some versions of MSVC10 generate bad code when using _mm_loadu_ps
+ // (i.e., they do not emit an unaligned load!).
+ // TODO On most architectures this version should also be faster than a single _mm_loadu_ps,
+ // so we could also enable it for MSVC08, but first we have to make sure the latter does not generate bad code.
+ __m128 res = _mm_loadl_pi(_mm_set1_ps(0.0f), (const __m64*)(from));
+ res = _mm_loadh_pi(res, (const __m64*)(from+2));
+ return res;
+ #else
+ return _mm_loadu_ps(from);
+ #endif
+ }
+ template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_pd(from); }
+ template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from) { EIGEN_DEBUG_UNALIGNED_LOAD return _mm_loadu_si128(reinterpret_cast<const __m128i*>(from)); }
+#else
+// Fast unaligned loads. Note that here we cannot directly use intrinsics: this would
+// require casting to incompatible pointer types and would lead to invalid code
+// because of the strict aliasing rule. The "dummy" stuff is required to enforce
+// a correct instruction dependency.
+// TODO: do the same for MSVC (ICC is compatible)
+// NOTE: with the code below, MSVC's compiler crashes!
+
+#if EIGEN_COMP_GNUC && (EIGEN_ARCH_i386 || (EIGEN_ARCH_x86_64 && EIGEN_GNUC_AT_LEAST(4, 8)))
+ // bug 195: gcc/i386 emits weird x87 fldl/fstpl instructions for _mm_load_sd
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS 1
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_STORES 1
+#elif EIGEN_COMP_CLANG
+ // bug 201: Segfaults in _mm_loadh_pd with clang 2.8
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS 1
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_STORES 0
+#else
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS 0
+ #define EIGEN_AVOID_CUSTOM_UNALIGNED_STORES 0
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet4f ploadu<Packet4f>(const float* from)
+{
+ EIGEN_DEBUG_UNALIGNED_LOAD
+#if EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS
+ return _mm_loadu_ps(from);
+#else
+ __m128d res;
+ res = _mm_load_sd((const double*)(from)) ;
+ res = _mm_loadh_pd(res, (const double*)(from+2)) ;
+ return _mm_castpd_ps(res);
+#endif
+}
+template<> EIGEN_STRONG_INLINE Packet2d ploadu<Packet2d>(const double* from)
+{
+ EIGEN_DEBUG_UNALIGNED_LOAD
+#if EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS
+ return _mm_loadu_pd(from);
+#else
+ __m128d res;
+ res = _mm_load_sd(from) ;
+ res = _mm_loadh_pd(res,from+1);
+ return res;
+#endif
+}
+template<> EIGEN_STRONG_INLINE Packet4i ploadu<Packet4i>(const int* from)
+{
+ EIGEN_DEBUG_UNALIGNED_LOAD
+#if EIGEN_AVOID_CUSTOM_UNALIGNED_LOADS
+ return _mm_loadu_si128(reinterpret_cast<const __m128i*>(from));
+#else
+ __m128d res;
+ res = _mm_load_sd((const double*)(from)) ;
+ res = _mm_loadh_pd(res, (const double*)(from+2)) ;
+ return _mm_castpd_si128(res);
+#endif
+}
+#endif
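+
+// Minimal usage sketch (illustrative, hypothetical data): pload/ploadu are the aligned and
+// unaligned load entry points used throughout Eigen.
+//   float data[8] = {0,1,2,3,4,5,6,7};
+//   Packet4f p = ploadu<Packet4f>(data + 1);  // loads {1,2,3,4} from an unaligned address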
+
+template<> EIGEN_STRONG_INLINE Packet4f ploaddup<Packet4f>(const float* from)
+{
+ return vec4f_swizzle1(_mm_castpd_ps(_mm_load_sd(reinterpret_cast<const double*>(from))), 0, 0, 1, 1);
+}
+template<> EIGEN_STRONG_INLINE Packet2d ploaddup<Packet2d>(const double* from)
+{ return pset1<Packet2d>(from[0]); }
+template<> EIGEN_STRONG_INLINE Packet4i ploaddup<Packet4i>(const int* from)
+{
+ Packet4i tmp;
+ tmp = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(from));
+ return vec4i_swizzle1(tmp, 0, 0, 1, 1);
+}
+
+template<> EIGEN_STRONG_INLINE void pstore<float>(float* to, const Packet4f& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_ps(to, from); }
+template<> EIGEN_STRONG_INLINE void pstore<double>(double* to, const Packet2d& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_pd(to, from); }
+template<> EIGEN_STRONG_INLINE void pstore<int>(int* to, const Packet4i& from) { EIGEN_DEBUG_ALIGNED_STORE _mm_store_si128(reinterpret_cast<__m128i*>(to), from); }
+
+template<> EIGEN_STRONG_INLINE void pstoreu<double>(double* to, const Packet2d& from) {
+ EIGEN_DEBUG_UNALIGNED_STORE
+#if EIGEN_AVOID_CUSTOM_UNALIGNED_STORES
+ _mm_storeu_pd(to, from);
+#else
+ _mm_storel_pd((to), from);
+ _mm_storeh_pd((to+1), from);
+#endif
+}
+template<> EIGEN_STRONG_INLINE void pstoreu<float>(float* to, const Packet4f& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast<double*>(to), Packet2d(_mm_castps_pd(from))); }
+template<> EIGEN_STRONG_INLINE void pstoreu<int>(int* to, const Packet4i& from) { EIGEN_DEBUG_UNALIGNED_STORE pstoreu(reinterpret_cast<double*>(to), Packet2d(_mm_castsi128_pd(from))); }
+
+template<> EIGEN_DEVICE_FUNC inline Packet4f pgather<float, Packet4f>(const float* from, int stride)
+{
+ return _mm_set_ps(from[3*stride], from[2*stride], from[1*stride], from[0*stride]);
+}
+template<> EIGEN_DEVICE_FUNC inline Packet2d pgather<double, Packet2d>(const double* from, int stride)
+{
+ return _mm_set_pd(from[1*stride], from[0*stride]);
+}
+template<> EIGEN_DEVICE_FUNC inline Packet4i pgather<int, Packet4i>(const int* from, int stride)
+{
+ return _mm_set_epi32(from[3*stride], from[2*stride], from[1*stride], from[0*stride]);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void pscatter<float, Packet4f>(float* to, const Packet4f& from, int stride)
+{
+ to[stride*0] = _mm_cvtss_f32(from);
+ to[stride*1] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 1));
+ to[stride*2] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 2));
+ to[stride*3] = _mm_cvtss_f32(_mm_shuffle_ps(from, from, 3));
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<double, Packet2d>(double* to, const Packet2d& from, int stride)
+{
+ to[stride*0] = _mm_cvtsd_f64(from);
+ to[stride*1] = _mm_cvtsd_f64(_mm_shuffle_pd(from, from, 1));
+}
+template<> EIGEN_DEVICE_FUNC inline void pscatter<int, Packet4i>(int* to, const Packet4i& from, int stride)
+{
+ to[stride*0] = _mm_cvtsi128_si32(from);
+ to[stride*1] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 1));
+ to[stride*2] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 2));
+ to[stride*3] = _mm_cvtsi128_si32(_mm_shuffle_epi32(from, 3));
+}
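+
+// Usage sketch (illustrative): pgather/pscatter read and write one packet with a runtime
+// stride, e.g. the first column of a hypothetical row-major 4x4 float matrix `m`:
+//   Packet4f col0 = pgather<float, Packet4f>(m, 4);  // {m[0], m[4], m[8], m[12]}
+//   pscatter<float, Packet4f>(m, col0, 4);           // writes the same elements back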
+
+// some compilers might be tempted to perform multiple moves instead of using a vector path.
+template<> EIGEN_STRONG_INLINE void pstore1<Packet4f>(float* to, const float& a)
+{
+ Packet4f pa = _mm_set_ss(a);
+ pstore(to, Packet4f(vec4f_swizzle1(pa,0,0,0,0)));
+}
+// some compilers might be tempted to perform multiple moves instead of using a vector path.
+template<> EIGEN_STRONG_INLINE void pstore1<Packet2d>(double* to, const double& a)
+{
+ Packet2d pa = _mm_set_sd(a);
+ pstore(to, Packet2d(vec2d_swizzle1(pa,0,0)));
+}
+
+#ifndef EIGEN_VECTORIZE_AVX
+template<> EIGEN_STRONG_INLINE void prefetch<float>(const float* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+template<> EIGEN_STRONG_INLINE void prefetch<double>(const double* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+template<> EIGEN_STRONG_INLINE void prefetch<int>(const int* addr) { _mm_prefetch((const char*)(addr), _MM_HINT_T0); }
+#endif
+
+#if EIGEN_COMP_MSVC_STRICT && EIGEN_OS_WIN64
+// The temporary variable fixes an internal compilation error in vs <= 2008 and a wrong-result bug in vs 2010
+// Direct access to the struct members fixed bug #62.
+template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { return a.m128_f32[0]; }
+template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { return a.m128d_f64[0]; }
+template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { int x = _mm_cvtsi128_si32(a); return x; }
+#elif EIGEN_COMP_MSVC_STRICT
+// The temporary variable fixes an internal compilation error in vs <= 2008 and a wrong-result bug in vs 2010
+template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { float x = _mm_cvtss_f32(a); return x; }
+template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { double x = _mm_cvtsd_f64(a); return x; }
+template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { int x = _mm_cvtsi128_si32(a); return x; }
+#else
+template<> EIGEN_STRONG_INLINE float pfirst<Packet4f>(const Packet4f& a) { return _mm_cvtss_f32(a); }
+template<> EIGEN_STRONG_INLINE double pfirst<Packet2d>(const Packet2d& a) { return _mm_cvtsd_f64(a); }
+template<> EIGEN_STRONG_INLINE int pfirst<Packet4i>(const Packet4i& a) { return _mm_cvtsi128_si32(a); }
+#endif
+
+template<> EIGEN_STRONG_INLINE Packet4f preverse(const Packet4f& a)
+{ return _mm_shuffle_ps(a,a,0x1B); }
+template<> EIGEN_STRONG_INLINE Packet2d preverse(const Packet2d& a)
+{ return _mm_shuffle_pd(a,a,0x1); }
+template<> EIGEN_STRONG_INLINE Packet4i preverse(const Packet4i& a)
+{ return _mm_shuffle_epi32(a,0x1B); }
+
+template<size_t offset>
+struct protate_impl<offset, Packet4f>
+{
+ static Packet4f run(const Packet4f& a) {
+ return vec4f_swizzle1(a, offset, (offset + 1) % 4, (offset + 2) % 4, (offset + 3) % 4);
+ }
+};
+
+template<size_t offset>
+struct protate_impl<offset, Packet4i>
+{
+ static Packet4i run(const Packet4i& a) {
+ return vec4i_swizzle1(a, offset, (offset + 1) % 4, (offset + 2) % 4, (offset + 3) % 4);
+ }
+};
+
+template<size_t offset>
+struct protate_impl<offset, Packet2d>
+{
+ static Packet2d run(const Packet2d& a) {
+ return vec2d_swizzle1(a, offset, (offset + 1) % 2);
+ }
+};
+
+template<> EIGEN_STRONG_INLINE Packet4f pabs(const Packet4f& a)
+{
+ const Packet4f mask = _mm_castsi128_ps(_mm_setr_epi32(0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF,0x7FFFFFFF));
+ return _mm_and_ps(a,mask);
+}
+template<> EIGEN_STRONG_INLINE Packet2d pabs(const Packet2d& a)
+{
+ const Packet2d mask = _mm_castsi128_pd(_mm_setr_epi32(0xFFFFFFFF,0x7FFFFFFF,0xFFFFFFFF,0x7FFFFFFF));
+ return _mm_and_pd(a,mask);
+}
+template<> EIGEN_STRONG_INLINE Packet4i pabs(const Packet4i& a)
+{
+ #ifdef EIGEN_VECTORIZE_SSSE3
+ return _mm_abs_epi32(a);
+ #else
+ Packet4i aux = _mm_srai_epi32(a,31);
+ return _mm_sub_epi32(_mm_xor_si128(a,aux),aux);
+ #endif
+}
+
+// with AVX, the default implementations based on pload1 are faster
+#ifndef __AVX__
+template<> EIGEN_STRONG_INLINE void
+pbroadcast4<Packet4f>(const float *a,
+ Packet4f& a0, Packet4f& a1, Packet4f& a2, Packet4f& a3)
+{
+ a3 = pload<Packet4f>(a);
+ a0 = vec4f_swizzle1(a3, 0,0,0,0);
+ a1 = vec4f_swizzle1(a3, 1,1,1,1);
+ a2 = vec4f_swizzle1(a3, 2,2,2,2);
+ a3 = vec4f_swizzle1(a3, 3,3,3,3);
+}
+template<> EIGEN_STRONG_INLINE void
+pbroadcast4<Packet2d>(const double *a,
+ Packet2d& a0, Packet2d& a1, Packet2d& a2, Packet2d& a3)
+{
+#ifdef EIGEN_VECTORIZE_SSE3
+ a0 = _mm_loaddup_pd(a+0);
+ a1 = _mm_loaddup_pd(a+1);
+ a2 = _mm_loaddup_pd(a+2);
+ a3 = _mm_loaddup_pd(a+3);
+#else
+ a1 = pload<Packet2d>(a);
+ a0 = vec2d_swizzle1(a1, 0,0);
+ a1 = vec2d_swizzle1(a1, 1,1);
+ a3 = pload<Packet2d>(a+2);
+ a2 = vec2d_swizzle1(a3, 0,0);
+ a3 = vec2d_swizzle1(a3, 1,1);
+#endif
+}
+#endif
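+
+// Usage sketch (illustrative, hypothetical data): pbroadcast4 loads four consecutive scalars
+// and broadcasts each one into its own packet, a pattern used by the matrix product kernels.
+//   EIGEN_ALIGN16 float rhs[4] = {1, 2, 3, 4};
+//   Packet4f b0, b1, b2, b3;
+//   pbroadcast4<Packet4f>(rhs, b0, b1, b2, b3);  // b0 == {1,1,1,1}, ..., b3 == {4,4,4,4}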
+
+EIGEN_STRONG_INLINE void punpackp(Packet4f* vecs)
+{
+ vecs[1] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0x55));
+ vecs[2] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0xAA));
+ vecs[3] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0xFF));
+ vecs[0] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vecs[0]), 0x00));
+}
+
+#ifdef EIGEN_VECTORIZE_SSE3
+// TODO implement SSE2 versions as well as integer versions
+template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
+{
+ return _mm_hadd_ps(_mm_hadd_ps(vecs[0], vecs[1]),_mm_hadd_ps(vecs[2], vecs[3]));
+}
+template<> EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs)
+{
+ return _mm_hadd_pd(vecs[0], vecs[1]);
+}
+// SSSE3 version:
+// EIGEN_STRONG_INLINE Packet4i preduxp(const Packet4i* vecs)
+// {
+// return _mm_hadd_epi32(_mm_hadd_epi32(vecs[0], vecs[1]),_mm_hadd_epi32(vecs[2], vecs[3]));
+// }
+
+template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
+{
+ Packet4f tmp0 = _mm_hadd_ps(a,a);
+ return pfirst<Packet4f>(_mm_hadd_ps(tmp0, tmp0));
+}
+
+template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a) { return pfirst<Packet2d>(_mm_hadd_pd(a, a)); }
+
+// SSSE3 version:
+// EIGEN_STRONG_INLINE float predux(const Packet4i& a)
+// {
+// Packet4i tmp0 = _mm_hadd_epi32(a,a);
+// return pfirst(_mm_hadd_epi32(tmp0, tmp0));
+// }
+#else
+// SSE2 versions
+template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
+{
+ Packet4f tmp = _mm_add_ps(a, _mm_movehl_ps(a,a));
+ return pfirst(_mm_add_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
+}
+template<> EIGEN_STRONG_INLINE double predux<Packet2d>(const Packet2d& a)
+{
+ return pfirst(_mm_add_sd(a, _mm_unpackhi_pd(a,a)));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4f preduxp<Packet4f>(const Packet4f* vecs)
+{
+ Packet4f tmp0, tmp1, tmp2;
+ tmp0 = _mm_unpacklo_ps(vecs[0], vecs[1]);
+ tmp1 = _mm_unpackhi_ps(vecs[0], vecs[1]);
+ tmp2 = _mm_unpackhi_ps(vecs[2], vecs[3]);
+ tmp0 = _mm_add_ps(tmp0, tmp1);
+ tmp1 = _mm_unpacklo_ps(vecs[2], vecs[3]);
+ tmp1 = _mm_add_ps(tmp1, tmp2);
+ tmp2 = _mm_movehl_ps(tmp1, tmp0);
+ tmp0 = _mm_movelh_ps(tmp0, tmp1);
+ return _mm_add_ps(tmp0, tmp2);
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d preduxp<Packet2d>(const Packet2d* vecs)
+{
+ return _mm_add_pd(_mm_unpacklo_pd(vecs[0], vecs[1]), _mm_unpackhi_pd(vecs[0], vecs[1]));
+}
+#endif // SSE3
+
+template<> EIGEN_STRONG_INLINE int predux<Packet4i>(const Packet4i& a)
+{
+ Packet4i tmp = _mm_add_epi32(a, _mm_unpackhi_epi64(a,a));
+ return pfirst(tmp) + pfirst<Packet4i>(_mm_shuffle_epi32(tmp, 1));
+}
+
+template<> EIGEN_STRONG_INLINE Packet4i preduxp<Packet4i>(const Packet4i* vecs)
+{
+ Packet4i tmp0, tmp1, tmp2;
+ tmp0 = _mm_unpacklo_epi32(vecs[0], vecs[1]);
+ tmp1 = _mm_unpackhi_epi32(vecs[0], vecs[1]);
+ tmp2 = _mm_unpackhi_epi32(vecs[2], vecs[3]);
+ tmp0 = _mm_add_epi32(tmp0, tmp1);
+ tmp1 = _mm_unpacklo_epi32(vecs[2], vecs[3]);
+ tmp1 = _mm_add_epi32(tmp1, tmp2);
+ tmp2 = _mm_unpacklo_epi64(tmp0, tmp1);
+ tmp0 = _mm_unpackhi_epi64(tmp0, tmp1);
+ return _mm_add_epi32(tmp0, tmp2);
+}
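+
+// Usage sketch (illustrative, hypothetical data): predux reduces one packet to the sum of its
+// elements; preduxp reduces four packets at once into a single packet holding the four sums.
+//   Packet4f v = ploadu<Packet4f>(data);
+//   float total = predux<Packet4f>(v);  // data[0] + data[1] + data[2] + data[3]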
+
+// Other reduction functions:
+
+// mul
+template<> EIGEN_STRONG_INLINE float predux_mul<Packet4f>(const Packet4f& a)
+{
+ Packet4f tmp = _mm_mul_ps(a, _mm_movehl_ps(a,a));
+ return pfirst<Packet4f>(_mm_mul_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
+}
+template<> EIGEN_STRONG_INLINE double predux_mul<Packet2d>(const Packet2d& a)
+{
+ return pfirst<Packet2d>(_mm_mul_sd(a, _mm_unpackhi_pd(a,a)));
+}
+template<> EIGEN_STRONG_INLINE int predux_mul<Packet4i>(const Packet4i& a)
+{
+ // after some experiments, it seems this is the fastest way to implement it
+ // for GCC (e.g., reusing pmul is very slow!)
+ // TODO try to call _mm_mul_epu32 directly
+ EIGEN_ALIGN16 int aux[4];
+ pstore(aux, a);
+ return (aux[0] * aux[1]) * (aux[2] * aux[3]);
+}
+
+// min
+template<> EIGEN_STRONG_INLINE float predux_min<Packet4f>(const Packet4f& a)
+{
+ Packet4f tmp = _mm_min_ps(a, _mm_movehl_ps(a,a));
+ return pfirst<Packet4f>(_mm_min_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
+}
+template<> EIGEN_STRONG_INLINE double predux_min<Packet2d>(const Packet2d& a)
+{
+ return pfirst<Packet2d>(_mm_min_sd(a, _mm_unpackhi_pd(a,a)));
+}
+template<> EIGEN_STRONG_INLINE int predux_min<Packet4i>(const Packet4i& a)
+{
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ Packet4i tmp = _mm_min_epi32(a, _mm_shuffle_epi32(a, _MM_SHUFFLE(0,0,3,2)));
+ return pfirst<Packet4i>(_mm_min_epi32(tmp,_mm_shuffle_epi32(tmp, 1)));
+#else
+ // after some experiments, it seems this is the fastest way to implement it
+ // for GCC (e.g., it does not like using std::min after the pstore!)
+ EIGEN_ALIGN16 int aux[4];
+ pstore(aux, a);
+ int aux0 = aux[0]<aux[1] ? aux[0] : aux[1];
+ int aux2 = aux[2]<aux[3] ? aux[2] : aux[3];
+ return aux0<aux2 ? aux0 : aux2;
+#endif // EIGEN_VECTORIZE_SSE4_1
+}
+
+// max
+template<> EIGEN_STRONG_INLINE float predux_max<Packet4f>(const Packet4f& a)
+{
+ Packet4f tmp = _mm_max_ps(a, _mm_movehl_ps(a,a));
+ return pfirst<Packet4f>(_mm_max_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
+}
+template<> EIGEN_STRONG_INLINE double predux_max<Packet2d>(const Packet2d& a)
+{
+ return pfirst<Packet2d>(_mm_max_sd(a, _mm_unpackhi_pd(a,a)));
+}
+template<> EIGEN_STRONG_INLINE int predux_max<Packet4i>(const Packet4i& a)
+{
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ Packet4i tmp = _mm_max_epi32(a, _mm_shuffle_epi32(a, _MM_SHUFFLE(0,0,3,2)));
+ return pfirst<Packet4i>(_mm_max_epi32(tmp,_mm_shuffle_epi32(tmp, 1)));
+#else
+ // after some experiments, it seems this is the fastest way to implement it
+ // for GCC (e.g., it does not like using std::max after the pstore!)
+ EIGEN_ALIGN16 int aux[4];
+ pstore(aux, a);
+ int aux0 = aux[0]>aux[1] ? aux[0] : aux[1];
+ int aux2 = aux[2]>aux[3] ? aux[2] : aux[3];
+ return aux0>aux2 ? aux0 : aux2;
+#endif // EIGEN_VECTORIZE_SSE4_1
+}
+
+#if EIGEN_COMP_GNUC
+// template <> EIGEN_STRONG_INLINE Packet4f pmadd(const Packet4f& a, const Packet4f& b, const Packet4f& c)
+// {
+// Packet4f res = b;
+// asm("mulps %[a], %[b] \n\taddps %[c], %[b]" : [b] "+x" (res) : [a] "x" (a), [c] "x" (c));
+// return res;
+// }
+// EIGEN_STRONG_INLINE Packet4i _mm_alignr_epi8(const Packet4i& a, const Packet4i& b, const int i)
+// {
+// Packet4i res = a;
+// asm("palignr %[i], %[a], %[b] " : [b] "+x" (res) : [a] "x" (a), [i] "i" (i));
+// return res;
+// }
+#endif
+
+#ifdef EIGEN_VECTORIZE_SSSE3
+// SSSE3 versions
+template<int Offset>
+struct palign_impl<Offset,Packet4f>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4f& first, const Packet4f& second)
+ {
+ if (Offset!=0)
+ first = _mm_castsi128_ps(_mm_alignr_epi8(_mm_castps_si128(second), _mm_castps_si128(first), Offset*4));
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet4i>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4i& first, const Packet4i& second)
+ {
+ if (Offset!=0)
+ first = _mm_alignr_epi8(second,first, Offset*4);
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet2d>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2d& first, const Packet2d& second)
+ {
+ if (Offset==1)
+ first = _mm_castsi128_pd(_mm_alignr_epi8(_mm_castpd_si128(second), _mm_castpd_si128(first), 8));
+ }
+};
+#else
+// SSE2 versions
+template<int Offset>
+struct palign_impl<Offset,Packet4f>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4f& first, const Packet4f& second)
+ {
+ if (Offset==1)
+ {
+ first = _mm_move_ss(first,second);
+ first = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(first),0x39));
+ }
+ else if (Offset==2)
+ {
+ first = _mm_movehl_ps(first,first);
+ first = _mm_movelh_ps(first,second);
+ }
+ else if (Offset==3)
+ {
+ first = _mm_move_ss(first,second);
+ first = _mm_shuffle_ps(first,second,0x93);
+ }
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet4i>
+{
+ static EIGEN_STRONG_INLINE void run(Packet4i& first, const Packet4i& second)
+ {
+ if (Offset==1)
+ {
+ first = _mm_castps_si128(_mm_move_ss(_mm_castsi128_ps(first),_mm_castsi128_ps(second)));
+ first = _mm_shuffle_epi32(first,0x39);
+ }
+ else if (Offset==2)
+ {
+ first = _mm_castps_si128(_mm_movehl_ps(_mm_castsi128_ps(first),_mm_castsi128_ps(first)));
+ first = _mm_castps_si128(_mm_movelh_ps(_mm_castsi128_ps(first),_mm_castsi128_ps(second)));
+ }
+ else if (Offset==3)
+ {
+ first = _mm_castps_si128(_mm_move_ss(_mm_castsi128_ps(first),_mm_castsi128_ps(second)));
+ first = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(first),_mm_castsi128_ps(second),0x93));
+ }
+ }
+};
+
+template<int Offset>
+struct palign_impl<Offset,Packet2d>
+{
+ static EIGEN_STRONG_INLINE void run(Packet2d& first, const Packet2d& second)
+ {
+ if (Offset==1)
+ {
+ first = _mm_castps_pd(_mm_movehl_ps(_mm_castpd_ps(first),_mm_castpd_ps(first)));
+ first = _mm_castps_pd(_mm_movelh_ps(_mm_castpd_ps(first),_mm_castpd_ps(second)));
+ }
+ }
+};
+#endif
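+
+// Worked example (illustrative): palign_impl<Offset>::run(first, second) replaces `first`
+// with elements Offset..Offset+3 of the concatenation [first, second]. With
+// first = {0,1,2,3}, second = {4,5,6,7} and Offset = 1, `first` becomes {1,2,3,4}.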
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4f,4>& kernel) {
+ _MM_TRANSPOSE4_PS(kernel.packet[0], kernel.packet[1], kernel.packet[2], kernel.packet[3]);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet2d,2>& kernel) {
+ __m128d tmp = _mm_unpackhi_pd(kernel.packet[0], kernel.packet[1]);
+ kernel.packet[0] = _mm_unpacklo_pd(kernel.packet[0], kernel.packet[1]);
+ kernel.packet[1] = tmp;
+}
+
+template<> EIGEN_DEVICE_FUNC inline void
+ptranspose(PacketBlock<Packet4i,4>& kernel) {
+ __m128i T0 = _mm_unpacklo_epi32(kernel.packet[0], kernel.packet[1]);
+ __m128i T1 = _mm_unpacklo_epi32(kernel.packet[2], kernel.packet[3]);
+ __m128i T2 = _mm_unpackhi_epi32(kernel.packet[0], kernel.packet[1]);
+ __m128i T3 = _mm_unpackhi_epi32(kernel.packet[2], kernel.packet[3]);
+
+ kernel.packet[0] = _mm_unpacklo_epi64(T0, T1);
+ kernel.packet[1] = _mm_unpackhi_epi64(T0, T1);
+ kernel.packet[2] = _mm_unpacklo_epi64(T2, T3);
+ kernel.packet[3] = _mm_unpackhi_epi64(T2, T3);
+}
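+
+// Usage sketch (illustrative, hypothetical row-major 4x4 float array `mat`): ptranspose
+// transposes a 4x4 block held in four packets in place.
+//   PacketBlock<Packet4f,4> block;
+//   for (int i = 0; i < 4; ++i) block.packet[i] = ploadu<Packet4f>(mat + 4*i);
+//   ptranspose(block);  // block.packet[r] now holds column r of the original block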
+
+template<> EIGEN_STRONG_INLINE Packet4i pblend(const Selector<4>& ifPacket, const Packet4i& thenPacket, const Packet4i& elsePacket) {
+ const __m128i zero = _mm_setzero_si128();
+ const __m128i select = _mm_set_epi32(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]);
+ __m128i false_mask = _mm_cmpeq_epi32(select, zero);
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_blendv_epi8(thenPacket, elsePacket, false_mask);
+#else
+ return _mm_or_si128(_mm_andnot_si128(false_mask, thenPacket), _mm_and_si128(false_mask, elsePacket));
+#endif
+}
+template<> EIGEN_STRONG_INLINE Packet4f pblend(const Selector<4>& ifPacket, const Packet4f& thenPacket, const Packet4f& elsePacket) {
+ const __m128 zero = _mm_setzero_ps();
+ const __m128 select = _mm_set_ps(ifPacket.select[3], ifPacket.select[2], ifPacket.select[1], ifPacket.select[0]);
+ __m128 false_mask = _mm_cmpeq_ps(select, zero);
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_blendv_ps(thenPacket, elsePacket, false_mask);
+#else
+ return _mm_or_ps(_mm_andnot_ps(false_mask, thenPacket), _mm_and_ps(false_mask, elsePacket));
+#endif
+}
+
+template<> EIGEN_STRONG_INLINE Packet2d pblend(const Selector<2>& ifPacket, const Packet2d& thenPacket, const Packet2d& elsePacket) {
+ const __m128d zero = _mm_setzero_pd();
+ const __m128d select = _mm_set_pd(ifPacket.select[1], ifPacket.select[0]);
+ __m128d false_mask = _mm_cmpeq_pd(select, zero);
+#ifdef EIGEN_VECTORIZE_SSE4_1
+ return _mm_blendv_pd(thenPacket, elsePacket, false_mask);
+#else
+ return _mm_or_pd(_mm_andnot_pd(false_mask, thenPacket), _mm_and_pd(false_mask, elsePacket));
+#endif
+}
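+
+// Usage sketch (illustrative): pblend keeps lanes of thenPacket where the selector entry is
+// non-zero and takes elsePacket otherwise.
+//   Selector<4> sel;
+//   sel.select[0] = true; sel.select[1] = false; sel.select[2] = true; sel.select[3] = false;
+//   Packet4f r = pblend(sel, pset1<Packet4f>(1.0f), pset1<Packet4f>(2.0f));  // r == {1, 2, 1, 2}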
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PACKET_MATH_SSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/arch/SSE/TypeCasting.h b/third_party/eigen3/Eigen/src/Core/arch/SSE/TypeCasting.h
new file mode 100644
index 0000000000..c848932306
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/arch/SSE/TypeCasting.h
@@ -0,0 +1,77 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TYPE_CASTING_SSE_H
+#define EIGEN_TYPE_CASTING_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+template <>
+struct type_casting_traits<float, int> {
+ enum {
+ VectorizedCast = 1,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+template<> EIGEN_STRONG_INLINE Packet4i pcast<Packet4f, Packet4i>(const Packet4f& a) {
+ return _mm_cvttps_epi32(a);
+}
+
+
+template <>
+struct type_casting_traits<int, float> {
+ enum {
+ VectorizedCast = 1,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 1
+ };
+};
+
+template<> EIGEN_STRONG_INLINE Packet4f pcast<Packet4i, Packet4f>(const Packet4i& a) {
+ return _mm_cvtepi32_ps(a);
+}
+
+
+template <>
+struct type_casting_traits<double, float> {
+ enum {
+ VectorizedCast = 1,
+ SrcCoeffRatio = 2,
+ TgtCoeffRatio = 1
+ };
+};
+
+template<> EIGEN_STRONG_INLINE Packet4f pcast<Packet2d, Packet4f>(const Packet2d& a, const Packet2d& b) {
+ return _mm_shuffle_ps(_mm_cvtpd_ps(a), _mm_cvtpd_ps(b), (1 << 2) | (1 << 6));
+}
+
+template <>
+struct type_casting_traits<float, double> {
+ enum {
+ VectorizedCast = 1,
+ SrcCoeffRatio = 1,
+ TgtCoeffRatio = 2
+ };
+};
+
+template<> EIGEN_STRONG_INLINE Packet2d pcast<Packet4f, Packet2d>(const Packet4f& a) {
+ // Simply discard the second half of the input
+ return _mm_cvtps_pd(a);
+}
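+
+// Usage sketch (illustrative): SrcCoeffRatio/TgtCoeffRatio describe how many source packets
+// feed one target packet. double -> float consumes two Packet2d per Packet4f:
+//   Packet2d lo = pset1<Packet2d>(1.0), hi = pset1<Packet2d>(2.0);
+//   Packet4f f = pcast<Packet2d, Packet4f>(lo, hi);  // {1.f, 1.f, 2.f, 2.f}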
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TYPE_CASTING_SSE_H
diff --git a/third_party/eigen3/Eigen/src/Core/functors/AssignmentFunctors.h b/third_party/eigen3/Eigen/src/Core/functors/AssignmentFunctors.h
new file mode 100644
index 0000000000..ae264aa640
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/functors/AssignmentFunctors.h
@@ -0,0 +1,167 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ASSIGNMENT_FUNCTORS_H
+#define EIGEN_ASSIGNMENT_FUNCTORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment
+ *
+ */
+template<typename Scalar> struct assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { a = b; }
+
+ template<int Alignment, typename Packet>
+ EIGEN_STRONG_INLINE void assignPacket(Scalar* a, const Packet& b) const
+ { internal::pstoret<Scalar,Packet,Alignment>(a,b); }
+};
+template<typename Scalar>
+struct functor_traits<assign_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::ReadCost,
+ PacketAccess = packet_traits<Scalar>::IsVectorized
+ };
+};
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment with addition
+ *
+ */
+template<typename Scalar> struct add_assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(add_assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { a += b; }
+
+ template<int Alignment, typename Packet>
+ EIGEN_STRONG_INLINE void assignPacket(Scalar* a, const Packet& b) const
+ { internal::pstoret<Scalar,Packet,Alignment>(a,internal::padd(internal::ploadt<Packet,Alignment>(a),b)); }
+};
+template<typename Scalar>
+struct functor_traits<add_assign_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::ReadCost + NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAdd
+ };
+};
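+
+// Rough sketch (illustrative, simplified; `copy_packets` is hypothetical): how these functors
+// are driven by a vectorized assignment loop, assuming `size` is a multiple of the packet size.
+//   template<typename Scalar, typename Func>
+//   void copy_packets(Scalar* dst, const Scalar* src, int size, const Func& func) {
+//     typedef typename packet_traits<Scalar>::type Packet;
+//     for (int i = 0; i < size; i += packet_traits<Scalar>::size)
+//       func.template assignPacket<Unaligned>(dst + i, ploadt<Packet, Unaligned>(src + i));
+//   }
+// Passing assign_op<float>() copies, while add_assign_op<float>() accumulates dst += src.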
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment with subtraction
+ *
+ */
+template<typename Scalar> struct sub_assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(sub_assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { a -= b; }
+
+ template<int Alignment, typename Packet>
+ EIGEN_STRONG_INLINE void assignPacket(Scalar* a, const Packet& b) const
+ { internal::pstoret<Scalar,Packet,Alignment>(a,internal::psub(internal::ploadt<Packet,Alignment>(a),b)); }
+};
+template<typename Scalar>
+struct functor_traits<sub_assign_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::ReadCost + NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAdd
+ };
+};
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment with multiplication
+ *
+ */
+template<typename Scalar> struct mul_assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(mul_assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { a *= b; }
+
+ template<int Alignment, typename Packet>
+ EIGEN_STRONG_INLINE void assignPacket(Scalar* a, const Packet& b) const
+ { internal::pstoret<Scalar,Packet,Alignment>(a,internal::pmul(internal::ploadt<Packet,Alignment>(a),b)); }
+};
+template<typename Scalar>
+struct functor_traits<mul_assign_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::ReadCost + NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasMul
+ };
+};
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment with division
+ *
+ */
+template<typename Scalar> struct div_assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(div_assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const { a /= b; }
+
+ template<int Alignment, typename Packet>
+ EIGEN_STRONG_INLINE void assignPacket(Scalar* a, const Packet& b) const
+ { internal::pstoret<Scalar,Packet,Alignment>(a,internal::pdiv(internal::ploadt<Packet,Alignment>(a),b)); }
+};
+template<typename Scalar>
+struct functor_traits<div_assign_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::ReadCost + NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasMul
+ };
+};
+
+
+/** \internal
+ * \brief Template functor for scalar/packet assignment with swapping
+ *
+ * It works as follows. For a non-vectorized evaluation loop, we have:
+ *   for(i) func(A.coeffRef(i), B.coeff(i));
+ * where B is a SwapWrapper expression. The trick is to make SwapWrapper::coeff behave like a non-const coeffRef.
+ * Actually, SwapWrapper might not even be needed: even if B is a plain expression, since it has to be writable,
+ * B.coeff already returns a const reference to the underlying scalar value.
+ *
+ * The case of a vectorized loop is more tricky:
+ * for(i,j) func.assignPacket<A_Align>(&A.coeffRef(i,j), B.packet<B_Align>(i,j));
+ * Here, B must be a SwapWrapper whose packet function actually returns a proxy object holding a Scalar*,
+ * the actual alignment and Packet type.
+ *
+ */
+template<typename Scalar> struct swap_assign_op {
+
+ EIGEN_EMPTY_STRUCT_CTOR(swap_assign_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void assignCoeff(Scalar& a, const Scalar& b) const
+ {
+ using std::swap;
+ swap(a,const_cast<Scalar&>(b));
+ }
+
+ template<int LhsAlignment, int RhsAlignment, typename Packet>
+ EIGEN_STRONG_INLINE void swapPacket(Scalar* a, Scalar* b) const
+ {
+ Packet tmp = internal::ploadt<Packet,RhsAlignment>(b);
+ internal::pstoret<Scalar,Packet,RhsAlignment>(b, internal::ploadt<Packet,LhsAlignment>(a));
+ internal::pstoret<Scalar,Packet,LhsAlignment>(a, tmp);
+ }
+};
+template<typename Scalar>
+struct functor_traits<swap_assign_op<Scalar> > {
+ enum {
+ Cost = 3 * NumTraits<Scalar>::ReadCost,
+ PacketAccess = packet_traits<Scalar>::IsVectorized
+ };
+};
+
+} // namespace internal
+
+} // namespace Eigen
+
+#endif // EIGEN_ASSIGNMENT_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/functors/BinaryFunctors.h b/third_party/eigen3/Eigen/src/Core/functors/BinaryFunctors.h
new file mode 100644
index 0000000000..d8ea058431
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/functors/BinaryFunctors.h
@@ -0,0 +1,498 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BINARY_FUNCTORS_H
+#define EIGEN_BINARY_FUNCTORS_H
+
+// clang-format off
+
+namespace Eigen {
+
+namespace internal {
+
+//---------- associative binary functors ----------
+
+/** \internal
+ * \brief Template functor to compute the sum of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::operator+, class VectorwiseOp, DenseBase::sum()
+ */
+template<typename Scalar> struct scalar_sum_op {
+// typedef Scalar result_type;
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sum_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return a + b; }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::padd(a,b); }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sum_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAdd
+ };
+};
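+
+// Usage sketch (illustrative, hypothetical float array `x` of length n, n a multiple of the
+// packet size): the packetOp/predux pair is how redux-style loops evaluate the functor,
+// e.g. a vectorized sum:
+//   typedef packet_traits<float>::type Packet;
+//   scalar_sum_op<float> op;
+//   Packet acc = pset1<Packet>(0.f);
+//   for (int i = 0; i < n; i += packet_traits<float>::size)
+//     acc = op.packetOp(acc, ploadu<Packet>(x + i));
+//   float sum = op.predux(acc);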
+
+/** \internal
+ * \brief Template specialization to deprecate the summation of boolean expressions.
+ * This is required to solve Bug 426.
+ * \sa DenseBase::count(), DenseBase::any(), ArrayBase::cast(), MatrixBase::cast()
+ */
+template<> struct scalar_sum_op<bool> : scalar_sum_op<int> {
+ EIGEN_DEPRECATED
+ scalar_sum_op() {}
+};
+
+
+/** \internal
+ * \brief Template functor to compute the product of two scalars
+ *
+ * \sa class CwiseBinaryOp, Cwise::operator*(), class VectorwiseOp, MatrixBase::redux()
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_product_op {
+ enum {
+ // TODO vectorize mixed product
+ Vectorizable = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasMul && packet_traits<RhsScalar>::HasMul
+ };
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_product_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a * b; }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmul(a,b); }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type predux(const Packet& a) const
+ { return internal::predux_mul(a); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_product_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = (NumTraits<LhsScalar>::MulCost + NumTraits<RhsScalar>::MulCost)/2, // rough estimate!
+ PacketAccess = scalar_product_op<LhsScalar,RhsScalar>::Vectorizable
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the conjugate product of two scalars
+ *
+ * This is a shortcut for conj(x) * y, which is needed for optimization purposes; in Eigen2 support mode, this becomes x * conj(y)
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_conj_product_op {
+
+ enum {
+ Conj = NumTraits<LhsScalar>::IsComplex
+ };
+
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_conj_product_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const
+ { return conj_helper<LhsScalar,RhsScalar,Conj,false>().pmul(a,b); }
+
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return conj_helper<Packet,Packet,Conj,false>().pmul(a,b); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_conj_product_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = NumTraits<LhsScalar>::MulCost,
+ PacketAccess = internal::is_same<LhsScalar, RhsScalar>::value && packet_traits<LhsScalar>::HasMul
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the min of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::cwiseMin, class VectorwiseOp, MatrixBase::minCoeff()
+ */
+template<typename Scalar> struct scalar_min_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_min_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return numext::mini(a, b); }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmin(a,b); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux_min(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_min_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasMin
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the max of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::cwiseMax, class VectorwiseOp, MatrixBase::maxCoeff()
+ */
+template<typename Scalar> struct scalar_max_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_max_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return numext::maxi(a, b); }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pmax(a,b); }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar predux(const Packet& a) const
+ { return internal::predux_max(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_max_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasMax
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the hypot of two scalars
+ *
+ * \sa MatrixBase::stableNorm(), class Redux
+ */
+template<typename Scalar> struct scalar_hypot_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_hypot_op)
+// typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& _x, const Scalar& _y) const
+ {
+ using std::sqrt;
+ Scalar p = numext::maxi(_x, _y);
+ Scalar q = numext::mini(_x, _y);
+ Scalar qp = q/p;
+ return p * sqrt(Scalar(1) + qp*qp);
+ }
+};
+template<typename Scalar>
+struct functor_traits<scalar_hypot_op<Scalar> > {
+ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess=0 };
+};
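+
+// Worked example (illustrative): computing sqrt(x*x + y*y) directly would overflow for
+// x = 3e200, y = 4e200, but the rescaled form above evaluates
+//   4e200 * sqrt(1 + (3/4)^2) = 4e200 * 1.25 = 5e200
+// without ever forming the squares of the large inputs.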
+
+/** \internal
+ * \brief Template functor to compute the pow of two scalars
+ */
+template<typename Scalar, typename OtherScalar> struct scalar_binary_pow_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_binary_pow_op)
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a, const OtherScalar& b) const { return numext::pow(a, b); }
+};
+template<typename Scalar, typename OtherScalar>
+struct functor_traits<scalar_binary_pow_op<Scalar,OtherScalar> > {
+ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false };
+};
+
+
+
+//---------- non associative binary functors ----------
+
+/** \internal
+ * \brief Template functor to compute the difference of two scalars
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::operator-
+ */
+template<typename Scalar> struct scalar_difference_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_difference_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a, const Scalar& b) const { return a - b; }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::psub(a,b); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_difference_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasSub
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the quotient of two scalars
+ *
+ * \sa class CwiseBinaryOp, Cwise::operator/()
+ */
+template<typename LhsScalar,typename RhsScalar> struct scalar_quotient_op {
+ enum {
+ // TODO vectorize mixed product
+ Vectorizable = is_same<LhsScalar,RhsScalar>::value && packet_traits<LhsScalar>::HasDiv && packet_traits<RhsScalar>::HasDiv
+ };
+ typedef typename scalar_product_traits<LhsScalar,RhsScalar>::ReturnType result_type;
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_quotient_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const LhsScalar& a, const RhsScalar& b) const { return a / b; }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a, const Packet& b) const
+ { return internal::pdiv(a,b); }
+};
+template<typename LhsScalar,typename RhsScalar>
+struct functor_traits<scalar_quotient_op<LhsScalar,RhsScalar> > {
+ enum {
+ Cost = (NumTraits<LhsScalar>::MulCost + NumTraits<RhsScalar>::MulCost), // rough estimate!
+ PacketAccess = scalar_quotient_op<LhsScalar,RhsScalar>::Vectorizable
+ };
+};
+
+
+
+/** \internal
+ * \brief Template functor to compute the and of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator&&
+ */
+struct scalar_boolean_and_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_and_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a && b; }
+};
+template<> struct functor_traits<scalar_boolean_and_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the or of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator||
+ */
+struct scalar_boolean_or_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_or_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a || b; }
+};
+template<> struct functor_traits<scalar_boolean_or_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the xor of two booleans
+ *
+ * \sa class CwiseBinaryOp, ArrayBase::operator^
+ */
+struct scalar_boolean_xor_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_boolean_xor_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool operator() (const bool& a, const bool& b) const { return a ^ b; }
+};
+template<> struct functor_traits<scalar_boolean_xor_op> {
+ enum {
+ Cost = NumTraits<bool>::AddCost,
+ PacketAccess = false
+ };
+};
+
+
+
+//---------- binary functors bound to a constant, thus appearing as a unary functor ----------
+
+/** \internal
+ * \brief Template functor to multiply a scalar by a fixed other one
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator*, MatrixBase::operator/
+ */
+/* NOTE Why is doing the pset1() in packetOp an optimization?
+ * Indeed, it seems better to declare m_other as a Packet and do the pset1() once
+ * in the constructor. However, in practice:
+ *  - GCC does not like m_other as a Packet and generates a load every time it needs it
+ *  - on the other hand, GCC is able to move the pset1() outside the loop :)
+ *  - simpler code ;)
+ * (ICC and gcc 4.4 seem to perform well in both cases; the issue is visible with y = a*x + b*y)
+ */
+template<typename Scalar>
+struct scalar_multiple_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem bugged with std::complex<>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_multiple_op(const scalar_multiple_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_multiple_op(const Scalar& other) : m_other(other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a) const { return a * m_other; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a, pset1<Packet>(m_other)); }
+ typename add_const_on_value_type<typename NumTraits<Scalar>::Nested>::type m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_multiple_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+template<typename Scalar1, typename Scalar2>
+struct scalar_multiple2_op {
+ typedef typename packet_traits<Scalar1>::type Packet1;
+ typedef typename scalar_product_traits<Scalar1,Scalar2>::ReturnType result_type;
+ typedef typename packet_traits<result_type>::type packet_result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_multiple2_op(const scalar_multiple2_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_multiple2_op(const Scalar2& other) : m_other(other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE result_type operator() (const Scalar1& a) const { return a * m_other; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const packet_result_type packetOp(const Packet1& a) const
+ { eigen_assert("packetOp is not defined"); }
+ typename add_const_on_value_type<typename NumTraits<Scalar2>::Nested>::type m_other;
+};
+template<typename Scalar1,typename Scalar2>
+struct functor_traits<scalar_multiple2_op<Scalar1,Scalar2> >
+{ enum { Cost = NumTraits<Scalar1>::MulCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to divide a scalar by a fixed other one
+ *
+ * This functor is used to implement the quotient of a matrix by
+ * a scalar where the scalar type is not necessarily a floating point type.
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator/
+ */
+template<typename Scalar>
+struct scalar_quotient1_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem bugged with std::complex<>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_quotient1_op(const scalar_quotient1_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_quotient1_op(const Scalar& other) : m_other(other) {}
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar operator() (const Scalar& a) const { return a / m_other; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(a, pset1<Packet>(m_other)); }
+ typename add_const_on_value_type<typename NumTraits<Scalar>::Nested>::type m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_quotient1_op<Scalar> >
+{ enum { Cost = 2 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasDiv }; };
+
+// In Eigen, any binary op (Product, CwiseBinaryOp) requires the Lhs and Rhs to have the same scalar type, except for multiplication,
+// where the mixing of different types is handled by scalar_product_traits.
+// In particular, real * complex<real> is allowed.
+// FIXME move this to functor_traits adding a functor_default
+template<typename Functor> struct functor_is_product_like { enum { ret = 0 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_product_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_conj_product_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+template<typename LhsScalar,typename RhsScalar> struct functor_is_product_like<scalar_quotient_op<LhsScalar,RhsScalar> > { enum { ret = 1 }; };
+
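For illustration, a minimal sketch of the mixing rule described in the comment above, assuming only that Eigen/Dense is available: multiplication may mix a real and a complex operand (routed through scalar_product_traits), whereas other coefficient-wise binary ops require an explicit cast.

#include <Eigen/Dense>
#include <complex>

int main() {
  Eigen::MatrixXd  A = Eigen::MatrixXd::Random(2, 2);
  Eigen::MatrixXcd B = Eigen::MatrixXcd::Random(2, 2);
  Eigen::MatrixXcd C = A * B;                                 // ok: real * complex<real> product
  Eigen::MatrixXcd D = A.cast<std::complex<double> >() + B;   // addition needs an explicit cast
  return 0;
}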
+
+/** \internal
+ * \brief Template functor to add a scalar to a fixed other one
+ * \sa class CwiseUnaryOp, Array::operator+
+ */
+/* If you wonder why doing the pset1() in packetOp() is an optimization, check scalar_multiple_op */
+template<typename Scalar>
+struct scalar_add_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ // FIXME default copy constructors seem buggy with std::complex<>
+ EIGEN_DEVICE_FUNC inline scalar_add_op(const scalar_add_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC inline scalar_add_op(const Scalar& other) : m_other(other) { }
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a + m_other; }
+ EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
+ { return internal::padd(a, pset1<Packet>(m_other)); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_add_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasAdd }; };
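The add/multiple functors above all follow the same recipe: store the constant once, implement a scalar operator() plus a vectorized packetOp that broadcasts the constant with pset1, and advertise cost and packet support through a functor_traits specialization. A minimal sketch of a user-defined functor following that recipe, assuming the 3.2-era internals shown in this file (the name scalar_axpb_op and its usage are illustrative, not part of Eigen):

#include <Eigen/Dense>

namespace Eigen { namespace internal {

// Computes a*x + b coefficient-wise, mirroring the pattern of scalar_add_op above.
template<typename Scalar>
struct scalar_axpb_op {
  typedef typename packet_traits<Scalar>::type Packet;
  EIGEN_DEVICE_FUNC inline scalar_axpb_op(const Scalar& a, const Scalar& b) : m_a(a), m_b(b) {}
  EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& x) const { return m_a * x + m_b; }
  EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& x) const
  { return internal::padd(internal::pmul(pset1<Packet>(m_a), x), pset1<Packet>(m_b)); }
  const Scalar m_a, m_b;
};

template<typename Scalar>
struct functor_traits<scalar_axpb_op<Scalar> >
{ enum { Cost = NumTraits<Scalar>::AddCost + NumTraits<Scalar>::MulCost,
         PacketAccess = packet_traits<Scalar>::HasAdd && packet_traits<Scalar>::HasMul }; };

} } // end namespace Eigen::internal

// Usage: y = 2*x + 1, vectorized whenever the packet traits allow it:
//   Eigen::ArrayXf y = x.unaryExpr(Eigen::internal::scalar_axpb_op<float>(2.f, 1.f));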
+
+/** \internal
+ * \brief Template functor to subtract a fixed scalar from another one
+ * \sa class CwiseUnaryOp, Array::operator-, struct scalar_add_op, struct scalar_rsub_op
+ */
+template<typename Scalar>
+struct scalar_sub_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ EIGEN_DEVICE_FUNC inline scalar_sub_op(const scalar_sub_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC inline scalar_sub_op(const Scalar& other) : m_other(other) { }
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a - m_other; }
+ EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
+ { return internal::psub(a, pset1<Packet>(m_other)); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_sub_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasAdd }; };
+
+/** \internal
+ * \brief Template functor to subtract a scalar from a fixed one
+ * \sa class CwiseUnaryOp, Array::operator-, struct scalar_add_op, struct scalar_sub_op
+ */
+template<typename Scalar>
+struct scalar_rsub_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ EIGEN_DEVICE_FUNC inline scalar_rsub_op(const scalar_rsub_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC inline scalar_rsub_op(const Scalar& other) : m_other(other) { }
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return m_other - a; }
+ EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
+ { return internal::psub(pset1<Packet>(m_other), a); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_rsub_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = packet_traits<Scalar>::HasAdd }; };
+
+/** \internal
+ * \brief Template functor to raise a scalar to a power
+ * \sa class CwiseUnaryOp, Cwise::pow
+ */
+template<typename Scalar>
+struct scalar_pow_op {
+ // FIXME default copy constructors seem buggy with std::complex<>
+ EIGEN_DEVICE_FUNC inline scalar_pow_op(const scalar_pow_op& other) : m_exponent(other.m_exponent) { }
+ EIGEN_DEVICE_FUNC inline scalar_pow_op(const Scalar& exponent) : m_exponent(exponent) {}
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return numext::pow(a, m_exponent); }
+ const Scalar m_exponent;
+};
+template<typename Scalar>
+struct functor_traits<scalar_pow_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to compute the quotient between a scalar and array entries.
+ * \sa class CwiseUnaryOp, Cwise::inverse()
+ */
+template<typename Scalar>
+struct scalar_inverse_mult_op {
+ EIGEN_DEVICE_FUNC scalar_inverse_mult_op(const Scalar& other) : m_other(other) {}
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return m_other / a; }
+ template<typename Packet>
+ EIGEN_DEVICE_FUNC inline const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(pset1<Packet>(m_other),a); }
+ Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_inverse_mult_op<Scalar> >
+{ enum { Cost = 2 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasDiv }; };
+
+/** \internal
+ * \brief Template functor to compute the modulo between an array and a scalar.
+ */
+template <typename Scalar>
+struct scalar_mod_op {
+ EIGEN_DEVICE_FUNC scalar_mod_op(const Scalar& divisor) : m_divisor(divisor) {}
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a % m_divisor; }
+ const Scalar m_divisor;
+};
+template <typename Scalar>
+struct functor_traits<scalar_mod_op<Scalar> >
+{ enum { Cost = 2 * NumTraits<Scalar>::MulCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to compute the float modulo between an array and a scalar.
+ */
+template <typename Scalar>
+struct scalar_fmod_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_fmod_op);
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar
+ operator()(const Scalar& a, const Scalar& b) const {
+ EIGEN_USING_STD_MATH(fmod);
+ return (fmod)(a, b);
+ }
+};
+
+template <typename Scalar>
+struct functor_traits<scalar_fmod_op<Scalar> > {
+ enum { Cost = 2 * NumTraits<Scalar>::MulCost, PacketAccess = false };
+};
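Unlike the functors above, scalar_fmod_op is a true binary functor (it takes both operands at call time rather than storing one), so it is naturally reached through the generic binaryExpr() mechanism. A small usage sketch, assuming Eigen arrays; there is no dedicated fmod() member in this revision, so the call below is only an illustration:

Eigen::ArrayXd a = Eigen::ArrayXd::LinSpaced(5, 0.0, 10.0);
Eigen::ArrayXd b = Eigen::ArrayXd::Constant(5, 3.0);
// r(i) = fmod(a(i), b(i)); scalar path only, since PacketAccess is false above.
Eigen::ArrayXd r = a.binaryExpr(b, Eigen::internal::scalar_fmod_op<double>());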
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BINARY_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/functors/NullaryFunctors.h b/third_party/eigen3/Eigen/src/Core/functors/NullaryFunctors.h
new file mode 100644
index 0000000000..6e464b2b8a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/functors/NullaryFunctors.h
@@ -0,0 +1,158 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_NULLARY_FUNCTORS_H
+#define EIGEN_NULLARY_FUNCTORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Scalar>
+struct scalar_constant_op {
+ typedef typename packet_traits<Scalar>::type Packet;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const scalar_constant_op& other) : m_other(other.m_other) { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE scalar_constant_op(const Scalar& other) : m_other(other) { }
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index, Index = 0) const { return m_other; }
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet packetOp(Index, Index = 0) const { return internal::pset1<Packet>(m_other); }
+ const Scalar m_other;
+};
+template<typename Scalar>
+struct functor_traits<scalar_constant_op<Scalar> >
+// FIXME replace this packet test by a safe one
+{ enum { Cost = 1, PacketAccess = packet_traits<Scalar>::Vectorizable, IsRepeatable = true }; };
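scalar_constant_op is the nullary functor behind Constant() and setConstant(); the same expression can also be spelled directly with NullaryExpr. A short sketch, assuming Eigen/Dense:

Eigen::MatrixXd A = Eigen::MatrixXd::Constant(3, 3, 1.5);
Eigen::MatrixXd B = Eigen::MatrixXd::NullaryExpr(3, 3, Eigen::internal::scalar_constant_op<double>(1.5));
// A and B hold the same values; the functor is evaluated per coefficient, or per packet
// when the scalar type is vectorizable (see the PacketAccess flag above).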
+
+template<typename Scalar> struct scalar_identity_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_identity_op)
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index row, Index col) const { return row==col ? Scalar(1) : Scalar(0); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_identity_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::AddCost, PacketAccess = false, IsRepeatable = true }; };
+
+template <typename Scalar, bool RandomAccess> struct linspaced_op_impl;
+
+// linear access for packet ops:
+// 1) initialization
+// base = [low, ..., low] + ([step, ..., step] * [-size, ..., 0])
+// 2) each step (where size is 1 for coeff access or PacketSize for packet access)
+// base += [size*step, ..., size*step]
+//
+// TODO: Perhaps it's better to initialize lazily (so not in the constructor but in packetOp)
+// in order to avoid the padd() in operator() ?
+template <typename Scalar>
+struct linspaced_op_impl<Scalar,false>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ linspaced_op_impl(const Scalar& low, const Scalar& step) :
+ m_low(low), m_step(step),
+ m_packetStep(pset1<Packet>(packet_traits<Scalar>::size*step)),
+ m_base(padd(pset1<Packet>(low), pmul(pset1<Packet>(step),plset<Scalar>(-packet_traits<Scalar>::size)))) {}
+
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index i) const
+ {
+ m_base = padd(m_base, pset1<Packet>(m_step));
+ return m_low+Scalar(i)*m_step;
+ }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index) const { return m_base = padd(m_base,m_packetStep); }
+
+ const Scalar m_low;
+ const Scalar m_step;
+ const Packet m_packetStep;
+ mutable Packet m_base;
+};
+
+// random access for packet ops:
+// 1) each step
+// [low, ..., low] + ( [step, ..., step] * ( [i, ..., i] + [0, ..., size] ) )
+template <typename Scalar>
+struct linspaced_op_impl<Scalar,true>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ linspaced_op_impl(const Scalar& low, const Scalar& step) :
+ m_low(low), m_step(step),
+ m_lowPacket(pset1<Packet>(m_low)), m_stepPacket(pset1<Packet>(m_step)), m_interPacket(plset<Scalar>(0)) {}
+
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index i) const { return m_low+i*m_step; }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index i) const
+ { return internal::padd(m_lowPacket, pmul(m_stepPacket, padd(pset1<Packet>(i),m_interPacket))); }
+
+ const Scalar m_low;
+ const Scalar m_step;
+ const Packet m_lowPacket;
+ const Packet m_stepPacket;
+ const Packet m_interPacket;
+};
+
+// ----- Linspace functor ----------------------------------------------------------------
+
+// Forward declaration (we default to random access, which does not really give
+// us a speed gain when using packet access, but it allows the functor to be used
+// in nested expressions).
+template <typename Scalar, bool RandomAccess = true> struct linspaced_op;
+template <typename Scalar, bool RandomAccess> struct functor_traits< linspaced_op<Scalar,RandomAccess> >
+{ enum { Cost = 1, PacketAccess = packet_traits<Scalar>::HasSetLinear, IsRepeatable = true }; };
+template <typename Scalar, bool RandomAccess> struct linspaced_op
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ linspaced_op(const Scalar& low, const Scalar& high, DenseIndex num_steps) : impl((num_steps==1 ? high : low), (num_steps==1 ? Scalar() : (high-low)/(num_steps-1))) {}
+
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index i) const { return impl(i); }
+
+ // We need this function when assigning e.g. a RowVectorXd to a MatrixXd since
+ // there row==0 and col is used for the actual iteration.
+ template<typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (Index row, Index col) const
+ {
+ eigen_assert(col==0 || row==0);
+ return impl(col + row);
+ }
+
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index i) const { return impl.packetOp(i); }
+
+ // We need this function when assigning e.g. a RowVectorXd to a MatrixXd since
+ // there row==0 and col is used for the actual iteration.
+ template<typename Index>
+ EIGEN_STRONG_INLINE const Packet packetOp(Index row, Index col) const
+ {
+ eigen_assert(col==0 || row==0);
+ return impl.packetOp(col + row);
+ }
+
+ // This proxy object handles the actual required temporaries, the different
+ // implementations (random vs. sequential access) as well as the
+ // correct piping to size 2/4 packet operations.
+ const linspaced_op_impl<Scalar,RandomAccess> impl;
+};
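In user code this functor is reached through LinSpaced(); per the constructor above, the step is (high - low) / (num_steps - 1), and a single step degenerates to the 'high' value. A short sketch:

Eigen::VectorXd v = Eigen::VectorXd::LinSpaced(5, 0.0, 1.0);  // 0, 0.25, 0.5, 0.75, 1
Eigen::VectorXd w = Eigen::VectorXd::LinSpaced(1, 0.0, 1.0);  // single entry: 1 (the 'high' bound)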
+
+// All functors allow linear access, except scalar_identity_op. So we define here a quick
+// meta-function indicating whether a functor allows linear access, always answering 'yes'
+// except for scalar_identity_op.
+// FIXME move this to functor_traits adding a functor_default
+template<typename Functor> struct functor_has_linear_access { enum { ret = 1 }; };
+template<typename Scalar> struct functor_has_linear_access<scalar_identity_op<Scalar> > { enum { ret = 0 }; };
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_NULLARY_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/functors/StlFunctors.h b/third_party/eigen3/Eigen/src/Core/functors/StlFunctors.h
new file mode 100644
index 0000000000..863fd451d3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/functors/StlFunctors.h
@@ -0,0 +1,129 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STL_FUNCTORS_H
+#define EIGEN_STL_FUNCTORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+// default functor traits for STL functors:
+
+template<typename T>
+struct functor_traits<std::multiplies<T> >
+{ enum { Cost = NumTraits<T>::MulCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::divides<T> >
+{ enum { Cost = NumTraits<T>::MulCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::plus<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::minus<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::negate<T> >
+{ enum { Cost = NumTraits<T>::AddCost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_or<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_and<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::logical_not<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::greater<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::less<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::greater_equal<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::less_equal<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::equal_to<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::not_equal_to<T> >
+{ enum { Cost = 1, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binder2nd<T> >
+{ enum { Cost = functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binder1st<T> >
+{ enum { Cost = functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::unary_negate<T> >
+{ enum { Cost = 1 + functor_traits<T>::Cost, PacketAccess = false }; };
+
+template<typename T>
+struct functor_traits<std::binary_negate<T> >
+{ enum { Cost = 1 + functor_traits<T>::Cost, PacketAccess = false }; };
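These specializations give Eigen accurate cost estimates for plain STL functors used inside expressions; since PacketAccess is false they always take the scalar path. A usage sketch, assuming <functional> and Eigen arrays:

Eigen::ArrayXd x = Eigen::ArrayXd::Random(8), y = Eigen::ArrayXd::Random(8);
Eigen::ArrayXd s = x.binaryExpr(y, std::plus<double>());   // same result as x + y, unvectorized
Eigen::ArrayXd n = x.unaryExpr(std::negate<double>());     // same result as -x, unvectorized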
+
+#ifdef EIGEN_STDEXT_SUPPORT
+
+template<typename T0,typename T1>
+struct functor_traits<std::project1st<T0,T1> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::project2nd<T0,T1> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::select2nd<std::pair<T0,T1> > >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::select1st<std::pair<T0,T1> > >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+template<typename T0,typename T1>
+struct functor_traits<std::unary_compose<T0,T1> >
+{ enum { Cost = functor_traits<T0>::Cost + functor_traits<T1>::Cost, PacketAccess = false }; };
+
+template<typename T0,typename T1,typename T2>
+struct functor_traits<std::binary_compose<T0,T1,T2> >
+{ enum { Cost = functor_traits<T0>::Cost + functor_traits<T1>::Cost + functor_traits<T2>::Cost, PacketAccess = false }; };
+
+#endif // EIGEN_STDEXT_SUPPORT
+
+// Allow new functors and specializations of functor_traits to be added from outside Eigen.
+// This macro is really needed because functor_traits must be specialized after it is declared but before it is used...
+#ifdef EIGEN_FUNCTORS_PLUGIN
+#include EIGEN_FUNCTORS_PLUGIN
+#endif
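The plugin hook works by defining the macro to a header name before including any Eigen header; that file is then textually included at this point, inside namespace Eigen::internal, which is what lets it add functor_traits specializations. A sketch with a hypothetical file and functor name:

// In the user's code, before the first Eigen include:
//   #define EIGEN_FUNCTORS_PLUGIN "my_eigen_functors.h"
//
// my_eigen_functors.h (hypothetical) can then provide, e.g.:
//   template<typename Scalar> struct my_clamp_op { /* operator(), packetOp(), ... */ };
//   template<typename Scalar>
//   struct functor_traits<my_clamp_op<Scalar> >
//   { enum { Cost = 2 * NumTraits<Scalar>::AddCost, PacketAccess = false }; };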
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_STL_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/functors/UnaryFunctors.h b/third_party/eigen3/Eigen/src/Core/functors/UnaryFunctors.h
new file mode 100644
index 0000000000..2a22e5bc19
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/functors/UnaryFunctors.h
@@ -0,0 +1,493 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_UNARY_FUNCTORS_H
+#define EIGEN_UNARY_FUNCTORS_H
+
+namespace Eigen {
+
+namespace internal {
+
+#if defined(__NVCC__) || !defined(__CUDA_ARCH__)
+using std::abs;
+using std::exp;
+using std::log;
+using std::min;
+using std::sqrt;
+using std::cos;
+using std::sin;
+using std::tan;
+using std::acos;
+using std::asin;
+using std::atan;
+#endif
+
+/** \internal
+ * \brief Template functor to compute the opposite of a scalar
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::operator-
+ */
+template<typename Scalar> struct scalar_opposite_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_opposite_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { return -a; }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pnegate(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_opposite_op<Scalar> >
+{ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasNegate };
+};
+
+/** \internal
+ * \brief Template functor to compute the absolute value of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::abs
+ */
+template<typename Scalar> struct scalar_abs_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_abs_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return abs(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pabs(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_abs_op<Scalar> >
+{
+ enum {
+ Cost = NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasAbs
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the squared absolute value of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::abs2
+ */
+template<typename Scalar> struct scalar_abs2_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_abs2_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const result_type operator() (const Scalar& a) const { return numext::abs2(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_abs2_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasAbs2 }; };
+
+/** \internal
+ * \brief Template functor to compute the conjugate of a complex value
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::conjugate()
+ */
+template<typename Scalar> struct scalar_conjugate_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_conjugate_op)
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator() (const Scalar& a) const { using numext::conj; return conj(a); }
+ template<typename Packet>
+ EIGEN_STRONG_INLINE const Packet packetOp(const Packet& a) const { return internal::pconj(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_conjugate_op<Scalar> >
+{
+ enum {
+ Cost = NumTraits<Scalar>::IsComplex ? NumTraits<Scalar>::AddCost : 0,
+ PacketAccess = packet_traits<Scalar>::HasConj
+ };
+};
+
+/** \internal
+ * \brief Template functor to cast a scalar to another type
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::cast()
+ */
+template<typename Scalar, typename NewType>
+struct scalar_cast_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cast_op)
+ typedef NewType result_type;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const NewType operator() (const Scalar& a) const { return cast<Scalar, NewType>(a); }
+};
+template<typename Scalar, typename NewType>
+struct functor_traits<scalar_cast_op<Scalar,NewType> >
+{ enum { Cost = is_same<Scalar, NewType>::value ? 0 : NumTraits<NewType>::AddCost, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to convert a scalar to another type using a custom functor.
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::convert()
+ */
+template<typename Scalar, typename NewType, typename ConvertOp>
+struct scalar_convert_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_convert_op)
+ typedef NewType result_type;
+ EIGEN_STRONG_INLINE const NewType operator() (const Scalar& a) const { return ConvertOp()(a); }
+};
+template<typename Scalar, typename NewType, typename ConvertOp>
+struct functor_traits<scalar_convert_op<Scalar,NewType,ConvertOp> >
+{ enum { Cost = is_same<Scalar, NewType>::value ? 0 : NumTraits<NewType>::AddCost, PacketAccess = false }; };
+
+
+/** \internal
+ * \brief Template functor to extract the real part of a complex
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::real()
+ */
+template<typename Scalar>
+struct scalar_real_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_real_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::real(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_real_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the imaginary part of a complex
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::imag()
+ */
+template<typename Scalar>
+struct scalar_imag_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE result_type operator() (const Scalar& a) const { return numext::imag(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_imag_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the real part of a complex as a reference
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::real()
+ */
+template<typename Scalar>
+struct scalar_real_ref_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_real_ref_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::real_ref(*const_cast<Scalar*>(&a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_real_ref_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ * \brief Template functor to extract the imaginary part of a complex as a reference
+ *
+ * \sa class CwiseUnaryOp, MatrixBase::imag()
+ */
+template<typename Scalar>
+struct scalar_imag_ref_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_imag_ref_op)
+ typedef typename NumTraits<Scalar>::Real result_type;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE result_type& operator() (const Scalar& a) const { return numext::imag_ref(*const_cast<Scalar*>(&a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_imag_ref_op<Scalar> >
+{ enum { Cost = 0, PacketAccess = false }; };
+
+/** \internal
+ *
+ * \brief Template functor to compute the exponential of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::exp()
+ */
+template<typename Scalar> struct scalar_exp_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_exp_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return exp(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pexp(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_exp_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasExp }; };
+
+/** \internal
+ *
+ * \brief Template functor to compute the logarithm of a scalar
+ *
+ * \sa class CwiseUnaryOp, Cwise::log()
+ */
+template<typename Scalar> struct scalar_log_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_log_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return log(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::plog(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_log_op<Scalar> >
+{ enum { Cost = 5 * NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasLog }; };
+
+
+/** \internal
+ * \brief Template functor to compute the square root of a scalar
+ * \sa class CwiseUnaryOp, Cwise::sqrt()
+ */
+template<typename Scalar> struct scalar_sqrt_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sqrt_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return sqrt(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::psqrt(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sqrt_op<Scalar> >
+{ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasSqrt
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the reciprocal square root of a scalar
+ * \sa class CwiseUnaryOp, Cwise::rsqrt()
+ */
+template<typename Scalar> struct scalar_rsqrt_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_rsqrt_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return Scalar(1)/sqrt(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::prsqrt(a); }
+};
+
+template<typename Scalar>
+struct functor_traits<scalar_rsqrt_op<Scalar> >
+{ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasRsqrt
+ };
+};
+
+
+/** \internal
+ * \brief Template functor to compute the cosine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::cos()
+ */
+template<typename Scalar> struct scalar_cos_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cos_op)
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return cos(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pcos(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_cos_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasCos
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the sine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::sin()
+ */
+template<typename Scalar> struct scalar_sin_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sin_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return sin(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::psin(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_sin_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasSin
+ };
+};
+
+
+/** \internal
+ * \brief Template functor to compute the tan of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::tan()
+ */
+template<typename Scalar> struct scalar_tan_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_tan_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return tan(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::ptan(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_tan_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasTan
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the arc cosine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::acos()
+ */
+template<typename Scalar> struct scalar_acos_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_acos_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return acos(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pacos(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_acos_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasACos
+ };
+};
+
+/** \internal
+ * \brief Template functor to compute the arc sine of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::asin()
+ */
+template<typename Scalar> struct scalar_asin_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_asin_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { return asin(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::pasin(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_asin_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasASin
+ };
+};
+
+
+/** \internal
+ * \brief Template functor to compute the atan of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::atan()
+ */
+template<typename Scalar> struct scalar_atan_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_atan_op)
+ inline const Scalar operator() (const Scalar& a) const { return atan(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::patan(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_atan_op<Scalar> >
+{
+ enum {
+ Cost = 5 * NumTraits<Scalar>::MulCost,
+ PacketAccess = packet_traits<Scalar>::HasATan
+ };
+};
+
+ /** \internal
+ * \brief Template functor to compute the tanh of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::tanh()
+ */
+template<typename Scalar> struct scalar_tanh_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_tanh_op)
+ EIGEN_DEVICE_FUNC inline const Scalar operator() (const Scalar& a) const { using std::tanh; return tanh(a); }
+ typedef typename packet_traits<Scalar>::type Packet;
+ inline Packet packetOp(const Packet& a) const { return internal::ptanh(a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_tanh_op<Scalar> >
+{
+ enum {
+ Cost = 6 * NumTraits<Scalar>::MulCost + 4 * NumTraits<Scalar>::AddCost,
+ PacketAccess = packet_traits<Scalar>::HasTanH
+ };
+};
+
+ /** \internal
+ * \brief Template functor to compute the sigmoid of a scalar
+ * \sa class CwiseUnaryOp, ArrayBase::sigmoid()
+ */
+template <typename T>
+struct scalar_sigmoid_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sigmoid_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& x) const {
+ const T one = T(1);
+ return one / (one + std::exp(-x));
+ }
+
+ template <typename Packet>
+ inline Packet packetOp(const Packet& x) const {
+ const Packet one = pset1<Packet>(1);
+ return pdiv(one, padd(one, pexp(pnegate(x))));
+ }
+};
+
+template <typename T>
+struct functor_traits<scalar_sigmoid_op<T> > {
+ enum {
+ Cost = NumTraits<T>::AddCost * 2 + NumTraits<T>::MulCost * 6,
+ PacketAccess = packet_traits<T>::HasAdd && packet_traits<T>::HasDiv &&
+ packet_traits<T>::HasNegate && packet_traits<T>::HasExp
+ };
+};
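A quick usage sketch of the sigmoid functor above: applied through unaryExpr it matches the array expression 1 / (1 + exp(-x)) written out by hand.

Eigen::ArrayXf x  = Eigen::ArrayXf::Random(16);
Eigen::ArrayXf s1 = x.unaryExpr(Eigen::internal::scalar_sigmoid_op<float>());
Eigen::ArrayXf s2 = (1.0f + (-x).exp()).inverse();   // same values, spelled out explicitly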
+
+/** \internal
+ * \brief Template functor to compute the inverse of a scalar
+ * \sa class CwiseUnaryOp, Cwise::inverse()
+ */
+template<typename Scalar>
+struct scalar_inverse_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_inverse_op)
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return Scalar(1)/a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pdiv(pset1<Packet>(Scalar(1)),a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_inverse_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasDiv }; };
+
+/** \internal
+ * \brief Template functor to compute the square of a scalar
+ * \sa class CwiseUnaryOp, Cwise::square()
+ */
+template<typename Scalar>
+struct scalar_square_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_square_op)
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a*a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,a); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_square_op<Scalar> >
+{ enum { Cost = NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+/** \internal
+ * \brief Template functor to compute the cube of a scalar
+ * \sa class CwiseUnaryOp, Cwise::cube()
+ */
+template<typename Scalar>
+struct scalar_cube_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_cube_op)
+ EIGEN_DEVICE_FUNC inline Scalar operator() (const Scalar& a) const { return a*a*a; }
+ template<typename Packet>
+ inline const Packet packetOp(const Packet& a) const
+ { return internal::pmul(a,pmul(a,a)); }
+};
+template<typename Scalar>
+struct functor_traits<scalar_cube_op<Scalar> >
+{ enum { Cost = 2*NumTraits<Scalar>::MulCost, PacketAccess = packet_traits<Scalar>::HasMul }; };
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_UNARY_FUNCTORS_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/CoeffBasedProduct.h b/third_party/eigen3/Eigen/src/Core/products/CoeffBasedProduct.h
new file mode 100644
index 0000000000..35a6e36e81
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/CoeffBasedProduct.h
@@ -0,0 +1,454 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COEFFBASED_PRODUCT_H
+#define EIGEN_COEFFBASED_PRODUCT_H
+
+namespace Eigen {
+
+namespace internal {
+
+/*********************************************************************************
+* Coefficient based product implementation.
+* It is designed for the following use cases:
+* - small fixed sizes
+* - lazy products
+*********************************************************************************/
+
+/* Since all the dimensions of the product are small, here we can rely
+ * on the generic Assign mechanism to evaluate the product per coeff (or packet).
+ *
+ * Note that here the inner-loops should always be unrolled.
+ */
+
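Both use cases listed above can be reached from user code; a short sketch (in this revision, small fixed-size products take this path automatically, while dynamic-size ones do so only when explicitly requested via lazyProduct):

Eigen::Matrix3d A = Eigen::Matrix3d::Random(), B = Eigen::Matrix3d::Random();
Eigen::Matrix3d C = A * B;             // small fixed size: coeff-based, inner loops unrolled

Eigen::MatrixXd X = Eigen::MatrixXd::Random(100, 100), Y = Eigen::MatrixXd::Random(100, 100);
double d = X.lazyProduct(Y)(0, 0);     // lazy product: only the requested coefficient is computed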
+template<int Traversal, int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl;
+
+template<int StorageOrder, int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl;
+
+template<typename LhsNested, typename RhsNested, int NestingFlags>
+struct traits<CoeffBasedProduct<LhsNested,RhsNested,NestingFlags> >
+{
+ typedef MatrixXpr XprKind;
+ typedef typename remove_all<LhsNested>::type _LhsNested;
+ typedef typename remove_all<RhsNested>::type _RhsNested;
+ typedef typename scalar_product_traits<typename _LhsNested::Scalar, typename _RhsNested::Scalar>::ReturnType Scalar;
+ typedef typename promote_storage_type<typename traits<_LhsNested>::StorageKind,
+ typename traits<_RhsNested>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<_LhsNested>::Index,
+ typename traits<_RhsNested>::Index>::type Index;
+
+ enum {
+ LhsCoeffReadCost = _LhsNested::CoeffReadCost,
+ RhsCoeffReadCost = _RhsNested::CoeffReadCost,
+ LhsFlags = _LhsNested::Flags,
+ RhsFlags = _RhsNested::Flags,
+
+ RowsAtCompileTime = _LhsNested::RowsAtCompileTime,
+ ColsAtCompileTime = _RhsNested::ColsAtCompileTime,
+ InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(_LhsNested::ColsAtCompileTime, _RhsNested::RowsAtCompileTime),
+
+ MaxRowsAtCompileTime = _LhsNested::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = _RhsNested::MaxColsAtCompileTime,
+
+ LhsRowMajor = LhsFlags & RowMajorBit,
+ RhsRowMajor = RhsFlags & RowMajorBit,
+
+ SameType = is_same<typename _LhsNested::Scalar,typename _RhsNested::Scalar>::value,
+
+ CanVectorizeRhs = RhsRowMajor && (RhsFlags & PacketAccessBit)
+ && (ColsAtCompileTime == Dynamic
+ || ( (ColsAtCompileTime % packet_traits<Scalar>::size) == 0
+ && (RhsFlags&AlignedBit)
+ )
+ ),
+
+ CanVectorizeLhs = (!LhsRowMajor) && (LhsFlags & PacketAccessBit)
+ && (RowsAtCompileTime == Dynamic
+ || ( (RowsAtCompileTime % packet_traits<Scalar>::size) == 0
+ && (LhsFlags&AlignedBit)
+ )
+ ),
+
+ EvalToRowMajor = (MaxRowsAtCompileTime==1&&MaxColsAtCompileTime!=1) ? 1
+ : (MaxColsAtCompileTime==1&&MaxRowsAtCompileTime!=1) ? 0
+ : (RhsRowMajor && !CanVectorizeLhs),
+
+ Flags = ((unsigned int)(LhsFlags | RhsFlags) & HereditaryBits & ~RowMajorBit)
+ | (EvalToRowMajor ? RowMajorBit : 0)
+ | NestingFlags
+ | (CanVectorizeLhs ? (LhsFlags & AlignedBit) : 0)
+ | (CanVectorizeRhs ? (RhsFlags & AlignedBit) : 0)
+ // TODO enable vectorization for mixed types
+ | (SameType && (CanVectorizeLhs || CanVectorizeRhs) ? PacketAccessBit : 0),
+
+ CoeffReadCost = InnerSize == Dynamic ? Dynamic
+ : InnerSize * (NumTraits<Scalar>::MulCost + LhsCoeffReadCost + RhsCoeffReadCost)
+ + (InnerSize - 1) * NumTraits<Scalar>::AddCost,
+
+ /* CanVectorizeInner deserves special explanation. It does not affect the product flags. It is not used outside
+ * of Product. If the Product itself is not a packet-access expression, there is still a chance that the inner
+ * loop of the product might be vectorized. This is the meaning of CanVectorizeInner. Since it doesn't affect
+ * the Flags, it is safe to make this value depend on ActualPacketAccessBit, that doesn't affect the ABI.
+ */
+ CanVectorizeInner = SameType
+ && LhsRowMajor
+ && (!RhsRowMajor)
+ && (LhsFlags & RhsFlags & ActualPacketAccessBit)
+ && (LhsFlags & RhsFlags & AlignedBit)
+ && (InnerSize % packet_traits<Scalar>::size == 0)
+ };
+};
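As a concrete reading of the CoeffReadCost formula above: for a Matrix3d product, InnerSize = 3 and, with the double defaults MulCost = AddCost = 1 and a per-coefficient read cost of 1 for plain stored matrices, CoeffReadCost = 3 * (1 + 1 + 1) + 2 * 1 = 11. That value is finite and well below the default EIGEN_UNROLLING_LIMIT of 100, so the Unroll flag in CoeffBasedProduct below is set and product_coeff_impl is instantiated with a fixed unrolling index rather than Dynamic.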
+
+} // end namespace internal
+
+template<typename LhsNested, typename RhsNested, int NestingFlags>
+class CoeffBasedProduct
+ : internal::no_assignment_operator,
+ public MatrixBase<CoeffBasedProduct<LhsNested, RhsNested, NestingFlags> >
+{
+ public:
+
+ typedef MatrixBase<CoeffBasedProduct> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(CoeffBasedProduct)
+ typedef typename Base::PlainObject PlainObject;
+
+ private:
+
+ typedef typename internal::traits<CoeffBasedProduct>::_LhsNested _LhsNested;
+ typedef typename internal::traits<CoeffBasedProduct>::_RhsNested _RhsNested;
+
+ enum {
+ PacketSize = internal::packet_traits<Scalar>::size,
+ InnerSize = internal::traits<CoeffBasedProduct>::InnerSize,
+ Unroll = CoeffReadCost != Dynamic && CoeffReadCost <= EIGEN_UNROLLING_LIMIT,
+ CanVectorizeInner = internal::traits<CoeffBasedProduct>::CanVectorizeInner
+ };
+
+ typedef internal::product_coeff_impl<CanVectorizeInner ? InnerVectorizedTraversal : DefaultTraversal,
+ Unroll ? InnerSize-1 : Dynamic,
+ _LhsNested, _RhsNested, Scalar> ScalarCoeffImpl;
+
+ typedef CoeffBasedProduct<LhsNested,RhsNested,NestByRefBit> LazyCoeffBasedProductType;
+
+ public:
+
+ EIGEN_DEVICE_FUNC
+ inline CoeffBasedProduct(const CoeffBasedProduct& other)
+ : Base(), m_lhs(other.m_lhs), m_rhs(other.m_rhs)
+ {}
+
+ template<typename Lhs, typename Rhs>
+ EIGEN_DEVICE_FUNC
+ inline CoeffBasedProduct(const Lhs& lhs, const Rhs& rhs)
+ : m_lhs(lhs), m_rhs(rhs)
+ {
+ // we don't allow taking products of matrices of different real types, as that wouldn't be vectorizable.
+ // We still allow to mix T and complex<T>.
+ EIGEN_STATIC_ASSERT((internal::scalar_product_traits<typename Lhs::RealScalar, typename Rhs::RealScalar>::Defined),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ eigen_assert(lhs.cols() == rhs.rows()
+ && "invalid matrix product"
+ && "if you wanted a coeff-wise or a dot product use the respective explicit functions");
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rows() const { return m_lhs.rows(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index cols() const { return m_rhs.cols(); }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index row, Index col) const
+ {
+ Scalar res;
+ ScalarCoeffImpl::run(row, col, m_lhs, m_rhs, res);
+ return res;
+ }
+
+    /* Allow index-based non-packet access. It is impossible though to allow index-based packet access,
+ * which is why we don't set the LinearAccessBit.
+ */
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index index) const
+ {
+ Scalar res;
+ const Index row = RowsAtCompileTime == 1 ? 0 : index;
+ const Index col = RowsAtCompileTime == 1 ? index : 0;
+ ScalarCoeffImpl::run(row, col, m_lhs, m_rhs, res);
+ return res;
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE const PacketScalar packet(Index row, Index col) const
+ {
+ PacketScalar res;
+ internal::product_packet_impl<Flags&RowMajorBit ? RowMajor : ColMajor,
+ Unroll ? InnerSize-1 : Dynamic,
+ _LhsNested, _RhsNested, PacketScalar, LoadMode>
+ ::run(row, col, m_lhs, m_rhs, res);
+ return res;
+ }
+
+ // Implicit conversion to the nested type (trigger the evaluation of the product)
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE operator const PlainObject& () const
+ {
+ m_result.lazyAssign(*this);
+ return m_result;
+ }
+
+ EIGEN_DEVICE_FUNC const _LhsNested& lhs() const { return m_lhs; }
+ EIGEN_DEVICE_FUNC const _RhsNested& rhs() const { return m_rhs; }
+
+ EIGEN_DEVICE_FUNC
+ const Diagonal<const LazyCoeffBasedProductType,0> diagonal() const
+ { return reinterpret_cast<const LazyCoeffBasedProductType&>(*this); }
+
+ template<int DiagonalIndex>
+ EIGEN_DEVICE_FUNC
+ const Diagonal<const LazyCoeffBasedProductType,DiagonalIndex> diagonal() const
+ { return reinterpret_cast<const LazyCoeffBasedProductType&>(*this); }
+
+ EIGEN_DEVICE_FUNC
+ const Diagonal<const LazyCoeffBasedProductType, DynamicIndex> diagonal(Index index) const {
+ return Diagonal<const LazyCoeffBasedProductType, DynamicIndex>(
+ reinterpret_cast<const LazyCoeffBasedProductType&>(*this), index);
+ }
+
+ protected:
+ typename internal::add_const_on_value_type<LhsNested>::type m_lhs;
+ typename internal::add_const_on_value_type<RhsNested>::type m_rhs;
+
+ mutable PlainObject m_result;
+};
+
+namespace internal {
+
+// here we need to overload the nested rule for products
+// such that the nested type is a const reference to a plain matrix
+template<typename Lhs, typename Rhs, int N, typename PlainObject>
+struct nested<CoeffBasedProduct<Lhs,Rhs,EvalBeforeNestingBit|EvalBeforeAssigningBit>, N, PlainObject>
+{
+ typedef PlainObject const& type;
+};
+
+/***************************************************************************
+* Normal product .coeff() implementation (with meta-unrolling)
+***************************************************************************/
+
+/**************************************
+*** Scalar path - no vectorization ***
+**************************************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl<DefaultTraversal, UnrollingIndex, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, RetScalar &res)
+ {
+ product_coeff_impl<DefaultTraversal, UnrollingIndex-1, Lhs, Rhs, RetScalar>::run(row, col, lhs, rhs, res);
+ res += lhs.coeff(row, UnrollingIndex) * rhs.coeff(UnrollingIndex, col);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl<DefaultTraversal, 0, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, RetScalar &res)
+ {
+ res = lhs.coeff(row, 0) * rhs.coeff(0, col);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl<DefaultTraversal, Dynamic, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, RetScalar& res)
+ {
+ eigen_assert(lhs.cols()>0 && "you are using a non initialized matrix");
+ res = lhs.coeff(row, 0) * rhs.coeff(0, col);
+ for(Index i = 1; i < lhs.cols(); ++i)
+ res += lhs.coeff(row, i) * rhs.coeff(i, col);
+ }
+};
+
+/*******************************************
+*** Scalar path with inner vectorization ***
+*******************************************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet>
+struct product_coeff_vectorized_unroller
+{
+ typedef typename Lhs::Index Index;
+ enum { PacketSize = packet_traits<typename Lhs::Scalar>::size };
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, typename Lhs::PacketScalar &pres)
+ {
+ product_coeff_vectorized_unroller<UnrollingIndex-PacketSize, Lhs, Rhs, Packet>::run(row, col, lhs, rhs, pres);
+ pres = padd(pres, pmul( lhs.template packet<Aligned>(row, UnrollingIndex) , rhs.template packet<Aligned>(UnrollingIndex, col) ));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet>
+struct product_coeff_vectorized_unroller<0, Lhs, Rhs, Packet>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, typename Lhs::PacketScalar &pres)
+ {
+ pres = pmul(lhs.template packet<Aligned>(row, 0) , rhs.template packet<Aligned>(0, col));
+ }
+};
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl<InnerVectorizedTraversal, UnrollingIndex, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::PacketScalar Packet;
+ typedef typename Lhs::Index Index;
+ enum { PacketSize = packet_traits<typename Lhs::Scalar>::size };
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, RetScalar &res)
+ {
+ Packet pres;
+ product_coeff_vectorized_unroller<UnrollingIndex+1-PacketSize, Lhs, Rhs, Packet>::run(row, col, lhs, rhs, pres);
+ res = predux(pres);
+ }
+};
+
+template<typename Lhs, typename Rhs, int LhsRows = Lhs::RowsAtCompileTime, int RhsCols = Rhs::ColsAtCompileTime>
+struct product_coeff_vectorized_dyn_selector
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, typename Lhs::Scalar &res)
+ {
+ res = lhs.row(row).transpose().cwiseProduct(rhs.col(col)).sum();
+ }
+};
+
+// NOTE the 3 following specializations are because taking .col(0) on a vector is a bit slower
+// NOTE maybe they are now useless since we have a specialization for Block<Matrix>
+template<typename Lhs, typename Rhs, int RhsCols>
+struct product_coeff_vectorized_dyn_selector<Lhs,Rhs,1,RhsCols>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index /*row*/, Index col, const Lhs& lhs, const Rhs& rhs, typename Lhs::Scalar &res)
+ {
+ res = lhs.transpose().cwiseProduct(rhs.col(col)).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs, int LhsRows>
+struct product_coeff_vectorized_dyn_selector<Lhs,Rhs,LhsRows,1>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index /*col*/, const Lhs& lhs, const Rhs& rhs, typename Lhs::Scalar &res)
+ {
+ res = lhs.row(row).transpose().cwiseProduct(rhs).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs>
+struct product_coeff_vectorized_dyn_selector<Lhs,Rhs,1,1>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index /*row*/, Index /*col*/, const Lhs& lhs, const Rhs& rhs, typename Lhs::Scalar &res)
+ {
+ res = lhs.transpose().cwiseProduct(rhs).sum();
+ }
+};
+
+template<typename Lhs, typename Rhs, typename RetScalar>
+struct product_coeff_impl<InnerVectorizedTraversal, Dynamic, Lhs, Rhs, RetScalar>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, typename Lhs::Scalar &res)
+ {
+ product_coeff_vectorized_dyn_selector<Lhs,Rhs>::run(row, col, lhs, rhs, res);
+ }
+};
+
+/*******************
+*** Packet path ***
+*******************/
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<RowMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet &res)
+ {
+ product_packet_impl<RowMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, res);
+ res = pmadd(pset1<Packet>(lhs.coeff(row, UnrollingIndex)), rhs.template packet<LoadMode>(UnrollingIndex, col), res);
+ }
+};
+
+template<int UnrollingIndex, typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<ColMajor, UnrollingIndex, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet &res)
+ {
+ product_packet_impl<ColMajor, UnrollingIndex-1, Lhs, Rhs, Packet, LoadMode>::run(row, col, lhs, rhs, res);
+ res = pmadd(lhs.template packet<LoadMode>(row, UnrollingIndex), pset1<Packet>(rhs.coeff(UnrollingIndex, col)), res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<RowMajor, 0, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet &res)
+ {
+ res = pmul(pset1<Packet>(lhs.coeff(row, 0)),rhs.template packet<LoadMode>(0, col));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<ColMajor, 0, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet &res)
+ {
+ res = pmul(lhs.template packet<LoadMode>(row, 0), pset1<Packet>(rhs.coeff(0, col)));
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<RowMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet& res)
+ {
+ eigen_assert(lhs.cols()>0 && "you are using a non initialized matrix");
+ res = pmul(pset1<Packet>(lhs.coeff(row, 0)),rhs.template packet<LoadMode>(0, col));
+ for(Index i = 1; i < lhs.cols(); ++i)
+ res = pmadd(pset1<Packet>(lhs.coeff(row, i)), rhs.template packet<LoadMode>(i, col), res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename Packet, int LoadMode>
+struct product_packet_impl<ColMajor, Dynamic, Lhs, Rhs, Packet, LoadMode>
+{
+ typedef typename Lhs::Index Index;
+ static EIGEN_STRONG_INLINE void run(Index row, Index col, const Lhs& lhs, const Rhs& rhs, Packet& res)
+ {
+ eigen_assert(lhs.cols()>0 && "you are using a non initialized matrix");
+ res = pmul(lhs.template packet<LoadMode>(row, 0), pset1<Packet>(rhs.coeff(0, col)));
+ for(Index i = 1; i < lhs.cols(); ++i)
+ res = pmadd(lhs.template packet<LoadMode>(row, i), pset1<Packet>(rhs.coeff(i, col)), res);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COEFFBASED_PRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralBlockPanelKernel.h b/third_party/eigen3/Eigen/src/Core/products/GeneralBlockPanelKernel.h
new file mode 100644
index 0000000000..80bd6aa0e6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralBlockPanelKernel.h
@@ -0,0 +1,2197 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERAL_BLOCK_PANEL_H
+#define EIGEN_GENERAL_BLOCK_PANEL_H
+
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename _LhsScalar, typename _RhsScalar, bool _ConjLhs=false, bool _ConjRhs=false>
+class gebp_traits;
+
+
+/** \internal \returns b if a<=0, and returns a otherwise. */
+inline std::ptrdiff_t manage_caching_sizes_helper(std::ptrdiff_t a, std::ptrdiff_t b)
+{
+ return a<=0 ? b : a;
+}
+
+#if EIGEN_ARCH_i386_OR_x86_64
+const std::ptrdiff_t defaultL1CacheSize = 32*1024;
+const std::ptrdiff_t defaultL2CacheSize = 256*1024;
+const std::ptrdiff_t defaultL3CacheSize = 2*1024*1024;
+#else
+const std::ptrdiff_t defaultL1CacheSize = 16*1024;
+const std::ptrdiff_t defaultL2CacheSize = 512*1024;
+const std::ptrdiff_t defaultL3CacheSize = 512*1024;
+#endif
+
+/** \internal */
+inline void manage_caching_sizes(Action action, std::ptrdiff_t* l1, std::ptrdiff_t* l2, std::ptrdiff_t* l3)
+{
+ static bool m_cache_sizes_initialized = false;
+ static std::ptrdiff_t m_l1CacheSize = 0;
+ static std::ptrdiff_t m_l2CacheSize = 0;
+ static std::ptrdiff_t m_l3CacheSize = 0;
+
+ if(EIGEN_UNLIKELY(!m_cache_sizes_initialized))
+ {
+ int l1CacheSize, l2CacheSize, l3CacheSize;
+ queryCacheSizes(l1CacheSize, l2CacheSize, l3CacheSize);
+ m_l1CacheSize = manage_caching_sizes_helper(l1CacheSize, defaultL1CacheSize);
+ m_l2CacheSize = manage_caching_sizes_helper(l2CacheSize, defaultL2CacheSize);
+ m_l3CacheSize = manage_caching_sizes_helper(l3CacheSize, defaultL3CacheSize);
+ m_cache_sizes_initialized = true;
+ }
+
+ if(EIGEN_UNLIKELY(action==SetAction))
+ {
+    // set the CPU cache sizes; all block sizes are derived from these (values in bytes)
+ eigen_internal_assert(l1!=0 && l2!=0);
+ m_l1CacheSize = *l1;
+ m_l2CacheSize = *l2;
+ m_l3CacheSize = *l3;
+ }
+ else if(EIGEN_LIKELY(action==GetAction))
+ {
+ eigen_internal_assert(l1!=0 && l2!=0);
+ *l1 = m_l1CacheSize;
+ *l2 = m_l2CacheSize;
+ *l3 = m_l3CacheSize;
+ }
+ else
+ {
+ eigen_internal_assert(false);
+ }
+}
+
+#define CEIL(a, b) ((a)+(b)-1)/(b)
+
+/* Helper for computeProductBlockingSizes.
+ *
+ * Given a m x k times k x n matrix product of scalar types \c LhsScalar and \c RhsScalar,
+ * this function computes the blocking size parameters along the respective dimensions
+ * for matrix products and related algorithms. The blocking sizes depend on various
+ * parameters:
+ * - the L1 and L2 cache sizes,
+ * - the register level blocking sizes defined by gebp_traits,
+ * - the number of scalars that fit into a packet (when vectorization is enabled).
+ *
+ * \sa setCpuCacheSizes */
+template<typename LhsScalar, typename RhsScalar, int KcFactor, typename Index>
+void evaluateProductBlockingSizesHeuristic(Index& k, Index& m, Index& n, Index num_threads = 1)
+{
+  // Explanations:
+  // Let's recall that the product algorithm forms kc x nc horizontal panels B' on the rhs and
+  // mc x kc blocks A' on the lhs. A' has to fit into the L2 cache. Moreover, B' is processed
+  // in kc x nr small vertical panels, where nr is the blocking size along the n dimension
+  // at the register level. For vectorization purposes, these small vertical panels are unpacked,
+  // e.g., each coefficient is replicated to fit a packet. Such a small vertical panel has to
+  // stay in the L1 cache.
+ typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+ typedef typename Traits::ResScalar ResScalar;
+ enum {
+ kdiv = KcFactor * (Traits::mr * sizeof(LhsScalar) + Traits::nr * sizeof(RhsScalar)),
+ ksub = Traits::mr * Traits::nr * sizeof(ResScalar),
+ k_mask = (0xffffffff/8)*8,
+
+ mr = Traits::mr,
+ mr_mask = (0xffffffff/mr)*mr,
+
+ nr = Traits::nr,
+ nr_mask = (0xffffffff/nr)*nr
+ };
+
+ std::ptrdiff_t l1, l2, l3;
+ manage_caching_sizes(GetAction, &l1, &l2, &l3);
+
+ // Increasing k gives us more time to prefetch the content of the "C"
+ // registers. However once the latency is hidden there is no point in
+ // increasing the value of k, so we'll cap it at 320 (value determined
+ // experimentally).
+ const Index k_cache = (std::min<Index>)((l1-ksub)/kdiv, 320);
+ if (k_cache < k) {
+ k = k_cache & k_mask;
+ eigen_assert(k > 0);
+ }
+
+ const Index n_cache = (l2-l1) / (nr * sizeof(RhsScalar) * k);
+ Index n_per_thread = CEIL(n, num_threads);
+ if (n_cache <= n_per_thread) {
+ // Don't exceed the capacity of the l2 cache.
+ if (n_cache < nr) {
+ n = nr;
+ } else {
+ n = n_cache & nr_mask;
+ eigen_assert(n > 0);
+ }
+ } else {
+ n = (std::min<Index>)(n, (n_per_thread + nr - 1) & nr_mask);
+ }
+
+ if (l3 > l2) {
+ // l3 is shared between all cores, so we'll give each thread its own chunk of l3.
+ const Index m_cache = (l3-l2) / (sizeof(LhsScalar) * k * num_threads);
+ const Index m_per_thread = CEIL(m, num_threads);
+ if(m_cache < m_per_thread && m_cache >= static_cast<Index>(mr)) {
+ m = m_cache & mr_mask;
+ eigen_assert(m > 0);
+ } else {
+ m = (std::min<Index>)(m, (m_per_thread + mr - 1) & mr_mask);
+ }
+ }
+}
+
+template <typename Index>
+bool useSpecificBlockingSizes(Index& k, Index& m, Index& n)
+{
+#ifdef EIGEN_TEST_SPECIFIC_BLOCKING_SIZES
+ if (EIGEN_TEST_SPECIFIC_BLOCKING_SIZES) {
+ k = std::min<Index>(k, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_K);
+ m = std::min<Index>(m, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_M);
+ n = std::min<Index>(n, EIGEN_TEST_SPECIFIC_BLOCKING_SIZE_N);
+ return true;
+ }
+#else
+ EIGEN_UNUSED_VARIABLE(k)
+ EIGEN_UNUSED_VARIABLE(m)
+ EIGEN_UNUSED_VARIABLE(n)
+#endif
+ return false;
+}
+
+/** \brief Computes the blocking parameters for a m x k times k x n matrix product
+ *
+ * \param[in,out] k Input: the third dimension of the product. Output: the blocking size along the same dimension.
+ * \param[in,out] m Input: the number of rows of the left hand side. Output: the blocking size along the same dimension.
+ * \param[in,out] n Input: the number of columns of the right hand side. Output: the blocking size along the same dimension.
+ *
+ * Given an m x k times k x n matrix product of scalar types \c LhsScalar and \c RhsScalar,
+ * this function computes the blocking size parameters along the respective dimensions
+ * for matrix products and related algorithms.
+ *
+ * The blocking size parameters may be evaluated:
+ * - either by a heuristic based on cache sizes;
+ * - or using fixed prescribed values (for testing purposes).
+ *
+ * \sa setCpuCacheSizes */
+
+template<typename LhsScalar, typename RhsScalar, int KcFactor, typename Index>
+void computeProductBlockingSizes(Index& k, Index& m, Index& n, Index num_threads = 1)
+{
+ if (!k || !m || !n) {
+ return;
+ }
+
+ if (!useSpecificBlockingSizes(k, m, n)) {
+ evaluateProductBlockingSizesHeuristic<LhsScalar, RhsScalar, KcFactor>(k, m, n, num_threads);
+ }
+
+#if !EIGEN_ARCH_i386_OR_x86_64
+ // The following code rounds k,m,n down to the nearest multiple of register-level blocking sizes.
+ // We should always do that, and in upstream Eigen we always do that.
+ // Unfortunately, we can't do that in Google3 on x86[-64] because it makes tiny differences in the results, and
+ // we have some unfortunate tests that require very specific relative errors and fail because of that,
+ // at least //learning/laser/algorithms/wals:wals_batch_solver_test.
+ // Note that this wouldn't make any difference if we had only been using correctly rounded values,
+ // but we have not! See how, in evaluateProductBlockingSizesHeuristic, we do the rounding down by
+ // bit-masking, e.g. mr_mask = (0xffffffff/mr)*mr, implicitly assuming that mr is always a power of
+ // two, which is not the case with the 3px4 kernel.
+ typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+ enum {
+ kr = 8,
+ mr = Traits::mr,
+ nr = Traits::nr
+ };
+ if (k > kr) k -= k % kr;
+ if (m > mr) m -= m % mr;
+ if (n > nr) n -= n % nr;
+#endif
+}
+
+template<typename LhsScalar, typename RhsScalar, typename Index>
+inline void computeProductBlockingSizes(Index& k, Index& m, Index& n, Index num_threads)
+{
+ computeProductBlockingSizes<LhsScalar,RhsScalar,1>(k, m, n, num_threads);
+}
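+
+// Illustrative usage sketch (an editor's note, not part of the file): assuming float
+// operands and the default KcFactor of 1, the blocking sizes chosen by the heuristic
+// for a 1024 x 1024 x 1024 product could be queried like this:
+//
+//   std::ptrdiff_t k = 1024, m = 1024, n = 1024;
+//   Eigen::internal::computeProductBlockingSizes<float, float>(k, m, n, std::ptrdiff_t(1));
+//   // k, m and n now hold the kc/mc/nc block sizes used by the packed GEMM kernels.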
+
+#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_CJMADD
+ #define CJMADD(CJ,A,B,C,T) C = CJ.pmadd(A,B,C);
+#else
+
+ // FIXME (maybe a bit of an overkill?)
+
+ template<typename CJ, typename A, typename B, typename C, typename T> struct gebp_madd_selector {
+ EIGEN_ALWAYS_INLINE static void run(const CJ& cj, A& a, B& b, C& c, T& /*t*/)
+ {
+ c = cj.pmadd(a,b,c);
+ }
+ };
+
+ template<typename CJ, typename T> struct gebp_madd_selector<CJ,T,T,T,T> {
+ EIGEN_ALWAYS_INLINE static void run(const CJ& cj, T& a, T& b, T& c, T& t)
+ {
+ t = b; t = cj.pmul(a,t); c = padd(c,t);
+ }
+ };
+
+ template<typename CJ, typename A, typename B, typename C, typename T>
+ EIGEN_STRONG_INLINE void gebp_madd(const CJ& cj, A& a, B& b, C& c, T& t)
+ {
+ gebp_madd_selector<CJ,A,B,C,T>::run(cj,a,b,c,t);
+ }
+
+ #define CJMADD(CJ,A,B,C,T) gebp_madd(CJ,A,B,C,T);
+// #define CJMADD(CJ,A,B,C,T) T = B; T = CJ.pmul(A,T); C = padd(C,T);
+#endif
+
+/* Vectorization logic
+ * real*real: unpack rhs to constant packets, ...
+ *
+ * cd*cd : unpack rhs to (b_r,b_r), (b_i,b_i), mul to get (a_r b_r,a_i b_r) (a_r b_i,a_i b_i),
+ * storing each res packet into two packets (2x2),
+ * at the end combine them: swap the second and addsub them
+ * cf*cf : same but with 2x4 blocks
+ * cplx*real : unpack rhs to constant packets, ...
+ * real*cplx : load lhs as (a0,a0,a1,a1), and mul as usual
+ */
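+
+// Editor's note, a worked instance of the cd*cd combination step described above:
+// with a = a_r + i*a_i and b = b_r + i*b_i, the two halves of a DoublePacket accumulator hold
+//   first  = (a_r*b_r, a_i*b_r)   // lhs times the broadcast real part of b
+//   second = (a_r*b_i, a_i*b_i)   // lhs times the broadcast imaginary part of b
+// In the non-conjugated case, acc() conjugates "second" (negating its second lane),
+// swaps its lanes with pcplxflip to get (-a_i*b_i, a_r*b_i), and adds "first", yielding
+//   (a_r*b_r - a_i*b_i, a_i*b_r + a_r*b_i) = a*b,
+// which is then scaled by alpha and accumulated into the result with pmadd.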
+template<typename _LhsScalar, typename _RhsScalar, bool _ConjLhs, bool _ConjRhs>
+class gebp_traits
+{
+public:
+ typedef _LhsScalar LhsScalar;
+ typedef _RhsScalar RhsScalar;
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+
+ enum {
+ ConjLhs = _ConjLhs,
+ ConjRhs = _ConjRhs,
+ Vectorizable = packet_traits<LhsScalar>::Vectorizable && packet_traits<RhsScalar>::Vectorizable,
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1,
+
+ NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS,
+
+ // register block size along the N direction must be 1 or 4
+ nr = 4,
+
+ // register block size along the M direction (currently, this one cannot be modified)
+ default_mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*LhsPacketSize,
+#if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD) && !defined(EIGEN_VECTORIZE_ALTIVEC) && !defined(EIGEN_VECTORIZE_VSX)
+ // we assume 16 registers
+ mr = Vectorizable ? 3*LhsPacketSize : default_mr,
+#else
+ mr = default_mr,
+#endif
+
+ LhsProgress = LhsPacketSize,
+ RhsProgress = 1
+ };
+
+ typedef typename packet_traits<LhsScalar>::type _LhsPacket;
+ typedef typename packet_traits<RhsScalar>::type _RhsPacket;
+ typedef typename packet_traits<ResScalar>::type _ResPacket;
+
+ typedef typename conditional<Vectorizable,_LhsPacket,LhsScalar>::type LhsPacket;
+ typedef typename conditional<Vectorizable,_RhsPacket,RhsScalar>::type RhsPacket;
+ typedef typename conditional<Vectorizable,_ResPacket,ResScalar>::type ResPacket;
+
+ typedef ResPacket AccPacket;
+
+ EIGEN_STRONG_INLINE void initAcc(AccPacket& p)
+ {
+ p = pset1<ResPacket>(ResScalar(0));
+ }
+
+ EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1, RhsPacket& b2, RhsPacket& b3)
+ {
+ pbroadcast4(b, b0, b1, b2, b3);
+ }
+
+// EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1)
+// {
+// pbroadcast2(b, b0, b1);
+// }
+
+ template<typename RhsPacketType>
+ EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacketType& dest) const
+ {
+ dest = pset1<RhsPacketType>(*b);
+ }
+
+ EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
+ {
+ dest = ploadquad<RhsPacket>(b);
+ }
+
+ template<typename LhsPacketType>
+ EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacketType& dest) const
+ {
+ dest = pload<LhsPacketType>(a);
+ }
+
+ template<typename LhsPacketType>
+ EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacketType& dest) const
+ {
+ dest = ploadu<LhsPacketType>(a);
+ }
+
+ template<typename LhsPacketType, typename RhsPacketType, typename AccPacketType>
+ EIGEN_STRONG_INLINE void madd(const LhsPacketType& a, const RhsPacketType& b, AccPacketType& c, AccPacketType& tmp) const
+ {
+ // It would be a lot cleaner to call pmadd all the time. Unfortunately, if we
+ // let gcc allocate the register in which to store the result of the pmul
+ // (in the case where there is no FMA), gcc fails to figure out how to avoid
+ // spilling registers.
+#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+ EIGEN_UNUSED_VARIABLE(tmp);
+ c = pmadd(a,b,c);
+#else
+ tmp = b; tmp = pmul(a,tmp); c = padd(c,tmp);
+#endif
+ }
+
+ EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const
+ {
+ r = pmadd(c,alpha,r);
+ }
+
+ template<typename ResPacketHalf>
+ EIGEN_STRONG_INLINE void acc(const ResPacketHalf& c, const ResPacketHalf& alpha, ResPacketHalf& r) const
+ {
+ r = pmadd(c,alpha,r);
+ }
+
+protected:
+// conj_helper<LhsScalar,RhsScalar,ConjLhs,ConjRhs> cj;
+// conj_helper<LhsPacket,RhsPacket,ConjLhs,ConjRhs> pcj;
+};
+
+template<typename RealScalar, bool _ConjLhs>
+class gebp_traits<std::complex<RealScalar>, RealScalar, _ConjLhs, false>
+{
+public:
+ typedef std::complex<RealScalar> LhsScalar;
+ typedef RealScalar RhsScalar;
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+
+ enum {
+ ConjLhs = _ConjLhs,
+ ConjRhs = false,
+ Vectorizable = packet_traits<LhsScalar>::Vectorizable && packet_traits<RhsScalar>::Vectorizable,
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1,
+
+ NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS,
+ nr = 4,
+#if defined(EIGEN_HAS_SINGLE_INSTRUCTION_MADD) && !defined(EIGEN_VECTORIZE_ALTIVEC) && !defined(EIGEN_VECTORIZE_VSX)
+ // we assume 16 registers
+ mr = 3*LhsPacketSize,
+#else
+ mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*LhsPacketSize,
+#endif
+
+ LhsProgress = LhsPacketSize,
+ RhsProgress = 1
+ };
+
+ typedef typename packet_traits<LhsScalar>::type _LhsPacket;
+ typedef typename packet_traits<RhsScalar>::type _RhsPacket;
+ typedef typename packet_traits<ResScalar>::type _ResPacket;
+
+ typedef typename conditional<Vectorizable,_LhsPacket,LhsScalar>::type LhsPacket;
+ typedef typename conditional<Vectorizable,_RhsPacket,RhsScalar>::type RhsPacket;
+ typedef typename conditional<Vectorizable,_ResPacket,ResScalar>::type ResPacket;
+
+ typedef ResPacket AccPacket;
+
+ EIGEN_STRONG_INLINE void initAcc(AccPacket& p)
+ {
+ p = pset1<ResPacket>(ResScalar(0));
+ }
+
+ EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
+ {
+ dest = pset1<RhsPacket>(*b);
+ }
+
+ EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
+ {
+ dest = pset1<RhsPacket>(*b);
+ }
+
+ EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = pload<LhsPacket>(a);
+ }
+
+ EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = ploadu<LhsPacket>(a);
+ }
+
+ EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1, RhsPacket& b2, RhsPacket& b3)
+ {
+ pbroadcast4(b, b0, b1, b2, b3);
+ }
+
+// EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1)
+// {
+// pbroadcast2(b, b0, b1);
+// }
+
+ EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& tmp) const
+ {
+ madd_impl(a, b, c, tmp, typename conditional<Vectorizable,true_type,false_type>::type());
+ }
+
+ EIGEN_STRONG_INLINE void madd_impl(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& tmp, const true_type&) const
+ {
+#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+ EIGEN_UNUSED_VARIABLE(tmp);
+ c.v = pmadd(a.v,b,c.v);
+#else
+ tmp = b; tmp = pmul(a.v,tmp); c.v = padd(c.v,tmp);
+#endif
+ }
+
+ EIGEN_STRONG_INLINE void madd_impl(const LhsScalar& a, const RhsScalar& b, ResScalar& c, RhsScalar& /*tmp*/, const false_type&) const
+ {
+ c += a * b;
+ }
+
+ EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const
+ {
+ r = cj.pmadd(c,alpha,r);
+ }
+
+protected:
+ conj_helper<ResPacket,ResPacket,ConjLhs,false> cj;
+};
+
+template<typename Packet>
+struct DoublePacket
+{
+ Packet first;
+ Packet second;
+};
+
+template<typename Packet>
+DoublePacket<Packet> padd(const DoublePacket<Packet> &a, const DoublePacket<Packet> &b)
+{
+ DoublePacket<Packet> res;
+ res.first = padd(a.first, b.first);
+ res.second = padd(a.second,b.second);
+ return res;
+}
+
+template<typename Packet>
+const DoublePacket<Packet>& predux4(const DoublePacket<Packet> &a)
+{
+ return a;
+}
+
+template<typename Packet> struct unpacket_traits<DoublePacket<Packet> > { typedef DoublePacket<Packet> half; };
+// template<typename Packet>
+// DoublePacket<Packet> pmadd(const DoublePacket<Packet> &a, const DoublePacket<Packet> &b)
+// {
+// DoublePacket<Packet> res;
+// res.first = padd(a.first, b.first);
+// res.second = padd(a.second,b.second);
+// return res;
+// }
+
+template<typename RealScalar, bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<std::complex<RealScalar>, std::complex<RealScalar>, _ConjLhs, _ConjRhs >
+{
+public:
+ typedef std::complex<RealScalar> Scalar;
+ typedef std::complex<RealScalar> LhsScalar;
+ typedef std::complex<RealScalar> RhsScalar;
+ typedef std::complex<RealScalar> ResScalar;
+
+ enum {
+ ConjLhs = _ConjLhs,
+ ConjRhs = _ConjRhs,
+ Vectorizable = packet_traits<RealScalar>::Vectorizable
+ && packet_traits<Scalar>::Vectorizable,
+ RealPacketSize = Vectorizable ? packet_traits<RealScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1,
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+
+ // FIXME: should depend on NumberOfRegisters
+ nr = 4,
+ mr = ResPacketSize,
+
+ LhsProgress = ResPacketSize,
+ RhsProgress = 1
+ };
+
+ typedef typename packet_traits<RealScalar>::type RealPacket;
+ typedef typename packet_traits<Scalar>::type ScalarPacket;
+ typedef DoublePacket<RealPacket> DoublePacketType;
+
+ typedef typename conditional<Vectorizable,RealPacket, Scalar>::type LhsPacket;
+ typedef typename conditional<Vectorizable,DoublePacketType,Scalar>::type RhsPacket;
+ typedef typename conditional<Vectorizable,ScalarPacket,Scalar>::type ResPacket;
+ typedef typename conditional<Vectorizable,DoublePacketType,Scalar>::type AccPacket;
+
+ EIGEN_STRONG_INLINE void initAcc(Scalar& p) { p = Scalar(0); }
+
+ EIGEN_STRONG_INLINE void initAcc(DoublePacketType& p)
+ {
+ p.first = pset1<RealPacket>(RealScalar(0));
+ p.second = pset1<RealPacket>(RealScalar(0));
+ }
+
+ // Scalar path
+ EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, ResPacket& dest) const
+ {
+ dest = pset1<ResPacket>(*b);
+ }
+
+ // Vectorized path
+ EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, DoublePacketType& dest) const
+ {
+ dest.first = pset1<RealPacket>(real(*b));
+ dest.second = pset1<RealPacket>(imag(*b));
+ }
+
+ EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, ResPacket& dest) const
+ {
+ loadRhs(b,dest);
+ }
+ EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, DoublePacketType& dest) const
+ {
+ eigen_internal_assert(unpacket_traits<ScalarPacket>::size<=4);
+ loadRhs(b,dest);
+ }
+
+ EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1, RhsPacket& b2, RhsPacket& b3)
+ {
+ // FIXME not sure that's the best way to implement it!
+ loadRhs(b+0, b0);
+ loadRhs(b+1, b1);
+ loadRhs(b+2, b2);
+ loadRhs(b+3, b3);
+ }
+
+ // Vectorized path
+ EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, DoublePacketType& b0, DoublePacketType& b1)
+ {
+ // FIXME not sure that's the best way to implement it!
+ loadRhs(b+0, b0);
+ loadRhs(b+1, b1);
+ }
+
+ // Scalar path
+ EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsScalar& b0, RhsScalar& b1)
+ {
+ // FIXME not sure that's the best way to implement it!
+ loadRhs(b+0, b0);
+ loadRhs(b+1, b1);
+ }
+
+ // nothing special here
+ EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = pload<LhsPacket>((const typename unpacket_traits<LhsPacket>::type*)(a));
+ }
+
+ EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = ploadu<LhsPacket>((const typename unpacket_traits<LhsPacket>::type*)(a));
+ }
+
+ EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, DoublePacketType& c, RhsPacket& /*tmp*/) const
+ {
+ c.first = padd(pmul(a,b.first), c.first);
+ c.second = padd(pmul(a,b.second),c.second);
+ }
+
+ EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, ResPacket& c, RhsPacket& /*tmp*/) const
+ {
+ c = cj.pmadd(a,b,c);
+ }
+
+ EIGEN_STRONG_INLINE void acc(const Scalar& c, const Scalar& alpha, Scalar& r) const { r += alpha * c; }
+
+ EIGEN_STRONG_INLINE void acc(const DoublePacketType& c, const ResPacket& alpha, ResPacket& r) const
+ {
+ // assemble c
+ ResPacket tmp;
+ if((!ConjLhs)&&(!ConjRhs))
+ {
+ tmp = pcplxflip(pconj(ResPacket(c.second)));
+ tmp = padd(ResPacket(c.first),tmp);
+ }
+ else if((!ConjLhs)&&(ConjRhs))
+ {
+ tmp = pconj(pcplxflip(ResPacket(c.second)));
+ tmp = padd(ResPacket(c.first),tmp);
+ }
+ else if((ConjLhs)&&(!ConjRhs))
+ {
+ tmp = pcplxflip(ResPacket(c.second));
+ tmp = padd(pconj(ResPacket(c.first)),tmp);
+ }
+ else if((ConjLhs)&&(ConjRhs))
+ {
+ tmp = pcplxflip(ResPacket(c.second));
+ tmp = psub(pconj(ResPacket(c.first)),tmp);
+ }
+
+ r = pmadd(tmp,alpha,r);
+ }
+
+protected:
+ conj_helper<LhsScalar,RhsScalar,ConjLhs,ConjRhs> cj;
+};
+
+template<typename RealScalar, bool _ConjRhs>
+class gebp_traits<RealScalar, std::complex<RealScalar>, false, _ConjRhs >
+{
+public:
+ typedef std::complex<RealScalar> Scalar;
+ typedef RealScalar LhsScalar;
+ typedef Scalar RhsScalar;
+ typedef Scalar ResScalar;
+
+ enum {
+ ConjLhs = false,
+ ConjRhs = _ConjRhs,
+ Vectorizable = packet_traits<RealScalar>::Vectorizable
+ && packet_traits<Scalar>::Vectorizable,
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1,
+
+ NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS,
+ // FIXME: should depend on NumberOfRegisters
+ nr = 4,
+ mr = (EIGEN_PLAIN_ENUM_MIN(16,NumberOfRegisters)/2/nr)*ResPacketSize,
+
+ LhsProgress = ResPacketSize,
+ RhsProgress = 1
+ };
+
+ typedef typename packet_traits<LhsScalar>::type _LhsPacket;
+ typedef typename packet_traits<RhsScalar>::type _RhsPacket;
+ typedef typename packet_traits<ResScalar>::type _ResPacket;
+
+ typedef typename conditional<Vectorizable,_LhsPacket,LhsScalar>::type LhsPacket;
+ typedef typename conditional<Vectorizable,_RhsPacket,RhsScalar>::type RhsPacket;
+ typedef typename conditional<Vectorizable,_ResPacket,ResScalar>::type ResPacket;
+
+ typedef ResPacket AccPacket;
+
+ EIGEN_STRONG_INLINE void initAcc(AccPacket& p)
+ {
+ p = pset1<ResPacket>(ResScalar(0));
+ }
+
+ EIGEN_STRONG_INLINE void loadRhs(const RhsScalar* b, RhsPacket& dest) const
+ {
+ dest = pset1<RhsPacket>(*b);
+ }
+
+ void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1, RhsPacket& b2, RhsPacket& b3)
+ {
+ pbroadcast4(b, b0, b1, b2, b3);
+ }
+
+// EIGEN_STRONG_INLINE void broadcastRhs(const RhsScalar* b, RhsPacket& b0, RhsPacket& b1)
+// {
+// // FIXME not sure that's the best way to implement it!
+// b0 = pload1<RhsPacket>(b+0);
+// b1 = pload1<RhsPacket>(b+1);
+// }
+
+ EIGEN_STRONG_INLINE void loadLhs(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = ploaddup<LhsPacket>(a);
+ }
+
+ EIGEN_STRONG_INLINE void loadRhsQuad(const RhsScalar* b, RhsPacket& dest) const
+ {
+ eigen_internal_assert(unpacket_traits<RhsPacket>::size<=4);
+ loadRhs(b,dest);
+ }
+
+ EIGEN_STRONG_INLINE void loadLhsUnaligned(const LhsScalar* a, LhsPacket& dest) const
+ {
+ dest = ploaddup<LhsPacket>(a);
+ }
+
+ EIGEN_STRONG_INLINE void madd(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& tmp) const
+ {
+ madd_impl(a, b, c, tmp, typename conditional<Vectorizable,true_type,false_type>::type());
+ }
+
+ EIGEN_STRONG_INLINE void madd_impl(const LhsPacket& a, const RhsPacket& b, AccPacket& c, RhsPacket& tmp, const true_type&) const
+ {
+#ifdef EIGEN_HAS_SINGLE_INSTRUCTION_MADD
+ EIGEN_UNUSED_VARIABLE(tmp);
+ c.v = pmadd(a,b.v,c.v);
+#else
+ tmp = b; tmp.v = pmul(a,tmp.v); c = padd(c,tmp);
+#endif
+
+ }
+
+ EIGEN_STRONG_INLINE void madd_impl(const LhsScalar& a, const RhsScalar& b, ResScalar& c, RhsScalar& /*tmp*/, const false_type&) const
+ {
+ c += a * b;
+ }
+
+ EIGEN_STRONG_INLINE void acc(const AccPacket& c, const ResPacket& alpha, ResPacket& r) const
+ {
+ r = cj.pmadd(alpha,c,r);
+ }
+
+protected:
+ conj_helper<ResPacket,ResPacket,false,ConjRhs> cj;
+};
+
+// helper for the rotating kernel below
+template <typename GebpKernel, bool UseRotatingKernel = GebpKernel::UseRotatingKernel>
+struct PossiblyRotatingKernelHelper
+{
+ // default implementation, not rotating
+
+ typedef typename GebpKernel::Traits Traits;
+ typedef typename Traits::RhsScalar RhsScalar;
+ typedef typename Traits::RhsPacket RhsPacket;
+ typedef typename Traits::AccPacket AccPacket;
+
+ const Traits& traits;
+ EIGEN_ALWAYS_INLINE PossiblyRotatingKernelHelper(const Traits& t) : traits(t) {}
+
+
+ template <size_t K, size_t Index> EIGEN_ALWAYS_INLINE
+ void loadOrRotateRhs(RhsPacket& to, const RhsScalar* from) const
+ {
+ traits.loadRhs(from + (Index+4*K)*Traits::RhsProgress, to);
+ }
+
+ EIGEN_ALWAYS_INLINE void unrotateResult(AccPacket&,
+ AccPacket&,
+ AccPacket&,
+ AccPacket&)
+ {
+ }
+};
+
+// rotating implementation
+template <typename GebpKernel>
+struct PossiblyRotatingKernelHelper<GebpKernel, true>
+{
+ typedef typename GebpKernel::Traits Traits;
+ typedef typename Traits::RhsScalar RhsScalar;
+ typedef typename Traits::RhsPacket RhsPacket;
+ typedef typename Traits::AccPacket AccPacket;
+
+ const Traits& traits;
+ EIGEN_ALWAYS_INLINE PossiblyRotatingKernelHelper(const Traits& t) : traits(t) {}
+
+ template <size_t K, size_t Index> EIGEN_ALWAYS_INLINE
+ void loadOrRotateRhs(RhsPacket& to, const RhsScalar* from) const
+ {
+ if (Index == 0) {
+ to = pload<RhsPacket>(from + 4*K*Traits::RhsProgress);
+ } else {
+ EIGEN_ASM_COMMENT("Do not reorder code, we're very tight on registers");
+ to = protate<1>(to);
+ }
+ }
+
+ EIGEN_ALWAYS_INLINE void unrotateResult(AccPacket& res0,
+ AccPacket& res1,
+ AccPacket& res2,
+ AccPacket& res3)
+ {
+ PacketBlock<AccPacket> resblock;
+ resblock.packet[0] = res0;
+ resblock.packet[1] = res1;
+ resblock.packet[2] = res2;
+ resblock.packet[3] = res3;
+ ptranspose(resblock);
+ resblock.packet[3] = protate<1>(resblock.packet[3]);
+ resblock.packet[2] = protate<2>(resblock.packet[2]);
+ resblock.packet[1] = protate<3>(resblock.packet[1]);
+ ptranspose(resblock);
+ res0 = resblock.packet[0];
+ res1 = resblock.packet[1];
+ res2 = resblock.packet[2];
+ res3 = resblock.packet[3];
+ }
+};
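+
+// Editor's note on the rotating variant above: for Index == 0 a full packet of four
+// consecutive rhs values is loaded once; the next three columns reuse it via protate<1>
+// instead of issuing three more broadcast loads, so each accumulator receives the rhs
+// coefficients in a rotated lane order. unrotateResult() undoes that permutation
+// (transpose, per-row rotate, transpose) before the accumulators are scaled by alpha
+// and stored back to the result.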
+
+/* optimized GEneral packed Block * packed Panel product kernel
+ *
+ * Mixing type logic: C += A * B
+ * | A | B | comments
+ * |real |cplx | no vectorization yet, would require packing A with duplication
+ * |cplx |real | easy vectorization
+ */
+template<typename LhsScalar, typename RhsScalar, typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel
+{
+ typedef gebp_traits<LhsScalar,RhsScalar,ConjugateLhs,ConjugateRhs> Traits;
+ typedef typename Traits::ResScalar ResScalar;
+ typedef typename Traits::LhsPacket LhsPacket;
+ typedef typename Traits::RhsPacket RhsPacket;
+ typedef typename Traits::ResPacket ResPacket;
+ typedef typename Traits::AccPacket AccPacket;
+
+ typedef gebp_traits<RhsScalar,LhsScalar,ConjugateRhs,ConjugateLhs> SwappedTraits;
+ typedef typename SwappedTraits::ResScalar SResScalar;
+ typedef typename SwappedTraits::LhsPacket SLhsPacket;
+ typedef typename SwappedTraits::RhsPacket SRhsPacket;
+ typedef typename SwappedTraits::ResPacket SResPacket;
+ typedef typename SwappedTraits::AccPacket SAccPacket;
+
+ typedef typename DataMapper::LinearMapper LinearMapper;
+
+ enum {
+ Vectorizable = Traits::Vectorizable,
+ LhsProgress = Traits::LhsProgress,
+ RhsProgress = Traits::RhsProgress,
+ ResPacketSize = Traits::ResPacketSize
+ };
+
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB,
+ Index rows, Index depth, Index cols, ResScalar alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+
+ static const bool UseRotatingKernel =
+ EIGEN_ARCH_ARM &&
+ internal::is_same<LhsScalar, float>::value &&
+ internal::is_same<RhsScalar, float>::value &&
+ internal::is_same<ResScalar, float>::value &&
+ Traits::LhsPacketSize == 4 &&
+ Traits::RhsPacketSize == 4 &&
+ Traits::ResPacketSize == 4;
+};
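+
+// Editor's summary (drawn from the implementation below, not an authoritative spec):
+// gebp_kernel::operator() computes res += alpha * blockA * blockB, where blockA and
+// blockB are packed panels produced by gemm_pack_lhs / gemm_pack_rhs. Rows are processed
+// in groups of 3, 2 and 1 times LhsProgress and columns in groups of nr (=4), with
+// scalar tail loops handling the leftovers.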
+
+template<typename LhsScalar, typename RhsScalar, typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<LhsScalar, RhsScalar, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+ ::operator()(const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB,
+ Index rows, Index depth, Index cols, ResScalar alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+ {
+ Traits traits;
+ SwappedTraits straits;
+
+ if(strideA==-1) strideA = depth;
+ if(strideB==-1) strideB = depth;
+ conj_helper<LhsScalar,RhsScalar,ConjugateLhs,ConjugateRhs> cj;
+ Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
+ const Index peeled_mc3 = mr>=3*Traits::LhsProgress ? (rows/(3*LhsProgress))*(3*LhsProgress) : 0;
+ const Index peeled_mc2 = mr>=2*Traits::LhsProgress ? peeled_mc3+((rows-peeled_mc3)/(2*LhsProgress))*(2*LhsProgress) : 0;
+ const Index peeled_mc1 = mr>=1*Traits::LhsProgress ? (rows/(1*LhsProgress))*(1*LhsProgress) : 0;
+ enum { pk = 8 }; // NOTE Such a large peeling factor is important for large matrices (~ +5% when >1000 on Haswell)
+ const Index peeled_kc = depth & ~(pk-1);
+ const Index prefetch_res_offset = 0;
+// const Index depth2 = depth & ~1;
+
+ //---------- Process 3 * LhsProgress rows at once ----------
+ // This corresponds to 3*LhsProgress x nr register blocks.
+ // Usually this only makes sense with FMA
+ if(mr>=3*Traits::LhsProgress)
+ {
+ PossiblyRotatingKernelHelper<gebp_kernel> possiblyRotatingKernelHelper(traits);
+
+ // loops on each largest micro horizontal panel of lhs (3*Traits::LhsProgress x depth)
+ for(Index i=0; i<peeled_mc3; i+=3*Traits::LhsProgress)
+ {
+ // loops on each largest micro vertical panel of rhs (depth * nr)
+ for(Index j2=0; j2<packet_cols4; j2+=nr)
+ {
+ // We select a 3*Traits::LhsProgress x nr micro block of res which is entirely
+ // stored into 3 x nr registers.
+
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(3*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB*nr];
+ prefetch(&blB[0]);
+ LhsPacket A0, A1;
+
+ // gets res block as register
+ AccPacket C0, C1, C2, C3,
+ C4, C5, C6, C7,
+ C8, C9, C10, C11;
+ traits.initAcc(C0); traits.initAcc(C1); traits.initAcc(C2); traits.initAcc(C3);
+ traits.initAcc(C4); traits.initAcc(C5); traits.initAcc(C6); traits.initAcc(C7);
+ traits.initAcc(C8); traits.initAcc(C9); traits.initAcc(C10); traits.initAcc(C11);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2 + 0);
+ LinearMapper r1 = res.getLinearMapper(i, j2 + 1);
+ LinearMapper r2 = res.getLinearMapper(i, j2 + 2);
+ LinearMapper r3 = res.getLinearMapper(i, j2 + 3);
+
+ r0.prefetch(0);
+ r1.prefetch(0);
+ r2.prefetch(0);
+ r3.prefetch(0);
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 3pX4");
+ RhsPacket B_0, T0;
+ LhsPacket A2;
+
+#define EIGEN_GEBP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 3pX4"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ internal::prefetch(blA+(3*K+16)*LhsProgress); \
+ if (EIGEN_ARCH_ARM) internal::prefetch(blB+(4*K+16)*RhsProgress); /* Bug 953 */ \
+ traits.loadLhs(&blA[(0+3*K)*LhsProgress], A0); \
+ traits.loadLhs(&blA[(1+3*K)*LhsProgress], A1); \
+ traits.loadLhs(&blA[(2+3*K)*LhsProgress], A2); \
+ possiblyRotatingKernelHelper.template loadOrRotateRhs<K, 0>(B_0, blB); \
+ traits.madd(A0, B_0, C0, T0); \
+ traits.madd(A1, B_0, C4, T0); \
+ traits.madd(A2, B_0, C8, B_0); \
+ possiblyRotatingKernelHelper.template loadOrRotateRhs<K, 1>(B_0, blB); \
+ traits.madd(A0, B_0, C1, T0); \
+ traits.madd(A1, B_0, C5, T0); \
+ traits.madd(A2, B_0, C9, B_0); \
+ possiblyRotatingKernelHelper.template loadOrRotateRhs<K, 2>(B_0, blB); \
+ traits.madd(A0, B_0, C2, T0); \
+ traits.madd(A1, B_0, C6, T0); \
+ traits.madd(A2, B_0, C10, B_0); \
+ possiblyRotatingKernelHelper.template loadOrRotateRhs<K, 3>(B_0, blB); \
+ traits.madd(A0, B_0, C3 , T0); \
+ traits.madd(A1, B_0, C7, T0); \
+ traits.madd(A2, B_0, C11, B_0); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 3pX4"); \
+ } while(false)
+
+ internal::prefetch(blB);
+ EIGEN_GEBP_ONESTEP(0);
+ EIGEN_GEBP_ONESTEP(1);
+ EIGEN_GEBP_ONESTEP(2);
+ EIGEN_GEBP_ONESTEP(3);
+ EIGEN_GEBP_ONESTEP(4);
+ EIGEN_GEBP_ONESTEP(5);
+ EIGEN_GEBP_ONESTEP(6);
+ EIGEN_GEBP_ONESTEP(7);
+
+ blB += pk*4*RhsProgress;
+ blA += pk*3*Traits::LhsProgress;
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 3pX4");
+ }
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0, T0;
+ LhsPacket A2;
+ EIGEN_GEBP_ONESTEP(0);
+ blB += 4*RhsProgress;
+ blA += 3*Traits::LhsProgress;
+ }
+#undef EIGEN_GEBP_ONESTEP
+
+ possiblyRotatingKernelHelper.unrotateResult(C0, C1, C2, C3);
+ possiblyRotatingKernelHelper.unrotateResult(C4, C5, C6, C7);
+ possiblyRotatingKernelHelper.unrotateResult(C8, C9, C10, C11);
+
+ ResPacket R0, R1, R2;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r0.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r0.loadPacket(2 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ traits.acc(C4, alphav, R1);
+ traits.acc(C8, alphav, R2);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ r0.storePacket(1 * Traits::ResPacketSize, R1);
+ r0.storePacket(2 * Traits::ResPacketSize, R2);
+
+ R0 = r1.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r1.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r1.loadPacket(2 * Traits::ResPacketSize);
+ traits.acc(C1, alphav, R0);
+ traits.acc(C5, alphav, R1);
+ traits.acc(C9, alphav, R2);
+ r1.storePacket(0 * Traits::ResPacketSize, R0);
+ r1.storePacket(1 * Traits::ResPacketSize, R1);
+ r1.storePacket(2 * Traits::ResPacketSize, R2);
+
+ R0 = r2.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r2.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r2.loadPacket(2 * Traits::ResPacketSize);
+ traits.acc(C2, alphav, R0);
+ traits.acc(C6, alphav, R1);
+ traits.acc(C10, alphav, R2);
+ r2.storePacket(0 * Traits::ResPacketSize, R0);
+ r2.storePacket(1 * Traits::ResPacketSize, R1);
+ r2.storePacket(2 * Traits::ResPacketSize, R2);
+
+ R0 = r3.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r3.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r3.loadPacket(2 * Traits::ResPacketSize);
+ traits.acc(C3, alphav, R0);
+ traits.acc(C7, alphav, R1);
+ traits.acc(C11, alphav, R2);
+ r3.storePacket(0 * Traits::ResPacketSize, R0);
+ r3.storePacket(1 * Traits::ResPacketSize, R1);
+ r3.storePacket(2 * Traits::ResPacketSize, R2);
+ }
+
+ // Deal with remaining columns of the rhs
+ for(Index j2=packet_cols4; j2<cols; j2++)
+ {
+ // One column at a time
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(3*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB];
+ prefetch(&blB[0]);
+ // gets res block as register
+ AccPacket C0, C4, C8;
+ traits.initAcc(C0);
+ traits.initAcc(C4);
+ traits.initAcc(C8);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2);
+ r0.prefetch(0);
+ LhsPacket A0, A1, A2;
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 3pX1");
+ RhsPacket B_0;
+#define EIGEN_GEBGP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 3pX1"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ traits.loadLhs(&blA[(0+3*K)*LhsProgress], A0); \
+ traits.loadLhs(&blA[(1+3*K)*LhsProgress], A1); \
+ traits.loadLhs(&blA[(2+3*K)*LhsProgress], A2); \
+ traits.loadRhs(&blB[(0+K)*RhsProgress], B_0); \
+ traits.madd(A0, B_0, C0, B_0); \
+ traits.madd(A1, B_0, C4, B_0); \
+ traits.madd(A2, B_0, C8, B_0); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 3pX1"); \
+ } while(false)
+
+ EIGEN_GEBGP_ONESTEP(0);
+ EIGEN_GEBGP_ONESTEP(1);
+ EIGEN_GEBGP_ONESTEP(2);
+ EIGEN_GEBGP_ONESTEP(3);
+ EIGEN_GEBGP_ONESTEP(4);
+ EIGEN_GEBGP_ONESTEP(5);
+ EIGEN_GEBGP_ONESTEP(6);
+ EIGEN_GEBGP_ONESTEP(7);
+
+ blB += pk*RhsProgress;
+ blA += pk*3*Traits::LhsProgress;
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 3pX1");
+ }
+
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0;
+ EIGEN_GEBGP_ONESTEP(0);
+ blB += RhsProgress;
+ blA += 3*Traits::LhsProgress;
+ }
+#undef EIGEN_GEBGP_ONESTEP
+ ResPacket R0, R1, R2;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r0.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r0.loadPacket(2 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ traits.acc(C4, alphav, R1);
+ traits.acc(C8, alphav, R2);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ r0.storePacket(1 * Traits::ResPacketSize, R1);
+ r0.storePacket(2 * Traits::ResPacketSize, R2);
+ }
+ }
+ }
+
+ //---------- Process 2 * LhsProgress rows at once ----------
+ if(mr>=2*Traits::LhsProgress)
+ {
+ // loops on each largest micro horizontal panel of lhs (2*LhsProgress x depth)
+ for(Index i=peeled_mc3; i<peeled_mc2; i+=2*LhsProgress)
+ {
+ // loops on each largest micro vertical panel of rhs (depth * nr)
+ for(Index j2=0; j2<packet_cols4; j2+=nr)
+ {
+ // We select a 2*Traits::LhsProgress x nr micro block of res which is entirely
+ // stored into 2 x nr registers.
+
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(2*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB*nr];
+ prefetch(&blB[0]);
+
+ // gets res block as register
+ AccPacket C0, C1, C2, C3,
+ C4, C5, C6, C7;
+ traits.initAcc(C0); traits.initAcc(C1); traits.initAcc(C2); traits.initAcc(C3);
+ traits.initAcc(C4); traits.initAcc(C5); traits.initAcc(C6); traits.initAcc(C7);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2 + 0);
+ LinearMapper r1 = res.getLinearMapper(i, j2 + 1);
+ LinearMapper r2 = res.getLinearMapper(i, j2 + 2);
+ LinearMapper r3 = res.getLinearMapper(i, j2 + 3);
+
+ r0.prefetch(prefetch_res_offset);
+ r1.prefetch(prefetch_res_offset);
+ r2.prefetch(prefetch_res_offset);
+ r3.prefetch(prefetch_res_offset);
+
+ LhsPacket A0, A1;
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 2pX4");
+ RhsPacket B_0, B1, B2, B3, T0;
+
+ // The 2 ASM comments in the #define are intended to prevent gcc
+ // from optimizing the code across steps since it ends up spilling
+ // registers in this case.
+ #define EIGEN_GEBGP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 2pX4"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ traits.loadLhs(&blA[(0+2*K)*LhsProgress], A0); \
+ traits.loadLhs(&blA[(1+2*K)*LhsProgress], A1); \
+ traits.broadcastRhs(&blB[(0+4*K)*RhsProgress], B_0, B1, B2, B3); \
+ traits.madd(A0, B_0, C0, T0); \
+ traits.madd(A1, B_0, C4, B_0); \
+ traits.madd(A0, B1, C1, T0); \
+ traits.madd(A1, B1, C5, B1); \
+ traits.madd(A0, B2, C2, T0); \
+ traits.madd(A1, B2, C6, B2); \
+ traits.madd(A0, B3, C3, T0); \
+ traits.madd(A1, B3, C7, B3); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 2pX4"); \
+ } while(false)
+
+ prefetch(&blB[pk*4*RhsProgress]);
+ EIGEN_GEBGP_ONESTEP(0);
+ EIGEN_GEBGP_ONESTEP(1);
+ EIGEN_GEBGP_ONESTEP(2);
+ EIGEN_GEBGP_ONESTEP(3);
+ EIGEN_GEBGP_ONESTEP(4);
+ EIGEN_GEBGP_ONESTEP(5);
+ EIGEN_GEBGP_ONESTEP(6);
+ EIGEN_GEBGP_ONESTEP(7);
+
+ blB += pk*4*RhsProgress;
+ blA += pk*(2*Traits::LhsProgress);
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 2pX4");
+ }
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0, B1, B2, B3, T0;
+ EIGEN_GEBGP_ONESTEP(0);
+ blB += 4*RhsProgress;
+ blA += 2*Traits::LhsProgress;
+ }
+#undef EIGEN_GEBGP_ONESTEP
+
+ ResPacket R0, R1, R2, R3;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r0.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r1.loadPacket(0 * Traits::ResPacketSize);
+ R3 = r1.loadPacket(1 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ traits.acc(C4, alphav, R1);
+ traits.acc(C1, alphav, R2);
+ traits.acc(C5, alphav, R3);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ r0.storePacket(1 * Traits::ResPacketSize, R1);
+ r1.storePacket(0 * Traits::ResPacketSize, R2);
+ r1.storePacket(1 * Traits::ResPacketSize, R3);
+
+ R0 = r2.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r2.loadPacket(1 * Traits::ResPacketSize);
+ R2 = r3.loadPacket(0 * Traits::ResPacketSize);
+ R3 = r3.loadPacket(1 * Traits::ResPacketSize);
+ traits.acc(C2, alphav, R0);
+ traits.acc(C6, alphav, R1);
+ traits.acc(C3, alphav, R2);
+ traits.acc(C7, alphav, R3);
+ r2.storePacket(0 * Traits::ResPacketSize, R0);
+ r2.storePacket(1 * Traits::ResPacketSize, R1);
+ r3.storePacket(0 * Traits::ResPacketSize, R2);
+ r3.storePacket(1 * Traits::ResPacketSize, R3);
+ }
+
+ // Deal with remaining columns of the rhs
+ for(Index j2=packet_cols4; j2<cols; j2++)
+ {
+ // One column at a time
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(2*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB];
+ prefetch(&blB[0]);
+
+ // gets res block as register
+ AccPacket C0, C4;
+ traits.initAcc(C0);
+ traits.initAcc(C4);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2);
+ r0.prefetch(prefetch_res_offset);
+ LhsPacket A0, A1;
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 2pX1");
+ RhsPacket B_0, B1;
+
+#define EIGEN_GEBGP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 2pX1"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ traits.loadLhs(&blA[(0+2*K)*LhsProgress], A0); \
+ traits.loadLhs(&blA[(1+2*K)*LhsProgress], A1); \
+ traits.loadRhs(&blB[(0+K)*RhsProgress], B_0); \
+ traits.madd(A0, B_0, C0, B1); \
+ traits.madd(A1, B_0, C4, B_0); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 2pX1"); \
+ } while(false)
+
+ EIGEN_GEBGP_ONESTEP(0);
+ EIGEN_GEBGP_ONESTEP(1);
+ EIGEN_GEBGP_ONESTEP(2);
+ EIGEN_GEBGP_ONESTEP(3);
+ EIGEN_GEBGP_ONESTEP(4);
+ EIGEN_GEBGP_ONESTEP(5);
+ EIGEN_GEBGP_ONESTEP(6);
+ EIGEN_GEBGP_ONESTEP(7);
+
+ blB += pk*RhsProgress;
+ blA += pk*2*Traits::LhsProgress;
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 2pX1");
+ }
+
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0, B1;
+ EIGEN_GEBGP_ONESTEP(0);
+ blB += RhsProgress;
+ blA += 2*Traits::LhsProgress;
+ }
+#undef EIGEN_GEBGP_ONESTEP
+ ResPacket R0, R1;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r0.loadPacket(1 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ traits.acc(C4, alphav, R1);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ r0.storePacket(1 * Traits::ResPacketSize, R1);
+ }
+ }
+ }
+ //---------- Process 1 * LhsProgress rows at once ----------
+ if(mr>=1*Traits::LhsProgress)
+ {
+ // loops on each largest micro horizontal panel of lhs (1*LhsProgress x depth)
+ for(Index i=peeled_mc2; i<peeled_mc1; i+=1*LhsProgress)
+ {
+ // loops on each largest micro vertical panel of rhs (depth * nr)
+ for(Index j2=0; j2<packet_cols4; j2+=nr)
+ {
+ // We select a 1*Traits::LhsProgress x nr micro block of res which is entirely
+ // stored into 1 x nr registers.
+
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(1*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB*nr];
+ prefetch(&blB[0]);
+
+ // gets res block as register
+ AccPacket C0, C1, C2, C3;
+ traits.initAcc(C0);
+ traits.initAcc(C1);
+ traits.initAcc(C2);
+ traits.initAcc(C3);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2 + 0);
+ LinearMapper r1 = res.getLinearMapper(i, j2 + 1);
+ LinearMapper r2 = res.getLinearMapper(i, j2 + 2);
+ LinearMapper r3 = res.getLinearMapper(i, j2 + 3);
+
+ r0.prefetch(prefetch_res_offset);
+ r1.prefetch(prefetch_res_offset);
+ r2.prefetch(prefetch_res_offset);
+ r3.prefetch(prefetch_res_offset);
+ LhsPacket A0;
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 1pX4");
+ RhsPacket B_0, B1, B2, B3;
+
+#define EIGEN_GEBGP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 1pX4"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ traits.loadLhs(&blA[(0+1*K)*LhsProgress], A0); \
+ traits.broadcastRhs(&blB[(0+4*K)*RhsProgress], B_0, B1, B2, B3); \
+ traits.madd(A0, B_0, C0, B_0); \
+ traits.madd(A0, B1, C1, B1); \
+ traits.madd(A0, B2, C2, B2); \
+ traits.madd(A0, B3, C3, B3); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 1pX4"); \
+ } while(false)
+
+ EIGEN_GEBGP_ONESTEP(0);
+ EIGEN_GEBGP_ONESTEP(1);
+ EIGEN_GEBGP_ONESTEP(2);
+ EIGEN_GEBGP_ONESTEP(3);
+ EIGEN_GEBGP_ONESTEP(4);
+ EIGEN_GEBGP_ONESTEP(5);
+ EIGEN_GEBGP_ONESTEP(6);
+ EIGEN_GEBGP_ONESTEP(7);
+
+ blB += pk*4*RhsProgress;
+ blA += pk*1*LhsProgress;
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 1pX4");
+ }
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0, B1, B2, B3;
+ EIGEN_GEBGP_ONESTEP(0);
+ blB += 4*RhsProgress;
+ blA += 1*LhsProgress;
+ }
+#undef EIGEN_GEBGP_ONESTEP
+
+ ResPacket R0, R1;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r1.loadPacket(0 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ traits.acc(C1, alphav, R1);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ r1.storePacket(0 * Traits::ResPacketSize, R1);
+
+ R0 = r2.loadPacket(0 * Traits::ResPacketSize);
+ R1 = r3.loadPacket(0 * Traits::ResPacketSize);
+ traits.acc(C2, alphav, R0);
+ traits.acc(C3, alphav, R1);
+ r2.storePacket(0 * Traits::ResPacketSize, R0);
+ r3.storePacket(0 * Traits::ResPacketSize, R1);
+ }
+
+ // Deal with remaining columns of the rhs
+ for(Index j2=packet_cols4; j2<cols; j2++)
+ {
+ // One column at a time
+ const LhsScalar* blA = &blockA[i*strideA+offsetA*(1*Traits::LhsProgress)];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB];
+ prefetch(&blB[0]);
+
+ // gets res block as register
+ AccPacket C0;
+ traits.initAcc(C0);
+
+ LinearMapper r0 = res.getLinearMapper(i, j2);
+ LhsPacket A0;
+
+ // performs "inner" products
+ for(Index k=0; k<peeled_kc; k+=pk)
+ {
+ EIGEN_ASM_COMMENT("begin gebp micro kernel 2pX1");
+ RhsPacket B_0;
+
+#define EIGEN_GEBGP_ONESTEP(K) \
+ do { \
+ EIGEN_ASM_COMMENT("begin step of gebp micro kernel 2pX1"); \
+ EIGEN_ASM_COMMENT("Note: these asm comments work around bug 935!"); \
+ traits.loadLhs(&blA[(0+1*K)*LhsProgress], A0); \
+ traits.loadRhs(&blB[(0+K)*RhsProgress], B_0); \
+ traits.madd(A0, B_0, C0, B_0); \
+ EIGEN_ASM_COMMENT("end step of gebp micro kernel 2pX1"); \
+ } while(false)
+
+ EIGEN_GEBGP_ONESTEP(0);
+ EIGEN_GEBGP_ONESTEP(1);
+ EIGEN_GEBGP_ONESTEP(2);
+ EIGEN_GEBGP_ONESTEP(3);
+ EIGEN_GEBGP_ONESTEP(4);
+ EIGEN_GEBGP_ONESTEP(5);
+ EIGEN_GEBGP_ONESTEP(6);
+ EIGEN_GEBGP_ONESTEP(7);
+
+ blB += pk*RhsProgress;
+ blA += pk*1*Traits::LhsProgress;
+
+ EIGEN_ASM_COMMENT("end gebp micro kernel 2pX1");
+ }
+
+ // process remaining peeled loop
+ for(Index k=peeled_kc; k<depth; k++)
+ {
+ RhsPacket B_0;
+ EIGEN_GEBGP_ONESTEP(0);
+ blB += RhsProgress;
+ blA += 1*Traits::LhsProgress;
+ }
+#undef EIGEN_GEBGP_ONESTEP
+ ResPacket R0;
+ ResPacket alphav = pset1<ResPacket>(alpha);
+ R0 = r0.loadPacket(0 * Traits::ResPacketSize);
+ traits.acc(C0, alphav, R0);
+ r0.storePacket(0 * Traits::ResPacketSize, R0);
+ }
+ }
+ }
+ //---------- Process remaining rows, 1 by 1 ----------
+ for(Index i=peeled_mc1; i<rows; i+=1)
+ {
+ // loop on each panel of the rhs
+ for(Index j2=0; j2<packet_cols4; j2+=nr)
+ {
+ const LhsScalar* blA = &blockA[i*strideA+offsetA];
+ prefetch(&blA[0]);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB*nr];
+ prefetch(&blB[0]);
+
+ if( (SwappedTraits::LhsProgress % 4)==0 )
+ {
+ // NOTE: The following piece of code won't work for 512-bit registers
+ SAccPacket C0, C1, C2, C3;
+ straits.initAcc(C0);
+ straits.initAcc(C1);
+ straits.initAcc(C2);
+ straits.initAcc(C3);
+
+ const Index spk = (std::max)(1,SwappedTraits::LhsProgress/4);
+ const Index endk = (depth/spk)*spk;
+ const Index endk4 = (depth/(spk*4))*(spk*4);
+
+ Index k=0;
+ for(; k<endk4; k+=4*spk)
+ {
+ prefetch(&blB[4*SwappedTraits::LhsProgress]);
+
+ SLhsPacket A0,A1,A2,A3;
+ SRhsPacket B_0,B_1,B_2,B_3;
+
+ straits.loadLhsUnaligned(blB+0*SwappedTraits::LhsProgress, A0);
+ straits.loadLhsUnaligned(blB+1*SwappedTraits::LhsProgress, A1);
+ straits.loadRhsQuad(blA+0*spk, B_0);
+ straits.loadRhsQuad(blA+1*spk, B_1);
+ straits.madd(A0,B_0,C0,B_0);
+ straits.madd(A1,B_1,C1,B_1);
+
+ straits.loadLhsUnaligned(blB+2*SwappedTraits::LhsProgress, A2);
+ straits.loadLhsUnaligned(blB+3*SwappedTraits::LhsProgress, A3);
+ straits.loadRhsQuad(blA+2*spk, B_2);
+ straits.loadRhsQuad(blA+3*spk, B_3);
+ straits.madd(A2,B_2,C2,B_2);
+ straits.madd(A3,B_3,C3,B_3);
+
+ blB += 4*SwappedTraits::LhsProgress;
+ blA += 4*spk;
+ }
+ C0 = padd(padd(C0,C1),padd(C2,C3));
+ for(; k<endk; k+=spk)
+ {
+ SLhsPacket A0;
+ SRhsPacket B_0;
+
+ straits.loadLhsUnaligned(blB, A0);
+ straits.loadRhsQuad(blA, B_0);
+ straits.madd(A0,B_0,C0,B_0);
+
+ blB += SwappedTraits::LhsProgress;
+ blA += spk;
+ }
+ if(SwappedTraits::LhsProgress==8)
+ {
+ // Special case where we have to first reduce the accumulation register C0
+ typedef typename conditional<SwappedTraits::LhsProgress==8,typename unpacket_traits<SResPacket>::half,SResPacket>::type SResPacketHalf;
+ typedef typename conditional<SwappedTraits::LhsProgress==8,typename unpacket_traits<SLhsPacket>::half,SLhsPacket>::type SLhsPacketHalf;
+ typedef typename conditional<SwappedTraits::LhsProgress==8,typename unpacket_traits<SLhsPacket>::half,SRhsPacket>::type SRhsPacketHalf;
+ typedef typename conditional<SwappedTraits::LhsProgress==8,typename unpacket_traits<SAccPacket>::half,SAccPacket>::type SAccPacketHalf;
+
+ SResPacketHalf R = res.template gatherPacket<SResPacketHalf>(i, j2);
+ SResPacketHalf alphav = pset1<SResPacketHalf>(alpha);
+
+ if(depth-endk>0)
+ {
+ // We have to handle the last row of the rhs which corresponds to a half-packet
+ SLhsPacketHalf a0;
+ SRhsPacketHalf b0;
+ straits.loadLhsUnaligned(blB, a0);
+ straits.loadRhs(blA, b0);
+ SAccPacketHalf c0 = predux4(C0);
+ straits.madd(a0,b0,c0,b0);
+ straits.acc(c0, alphav, R);
+ }
+ else
+ {
+ straits.acc(predux4(C0), alphav, R);
+ }
+ res.scatterPacket(i, j2, R);
+ }
+ else
+ {
+ SResPacket R = res.template gatherPacket<SResPacket>(i, j2);
+ SResPacket alphav = pset1<SResPacket>(alpha);
+ straits.acc(C0, alphav, R);
+ res.scatterPacket(i, j2, R);
+ }
+ }
+ else // scalar path
+ {
+ // get a 1 x 4 res block as registers
+ ResScalar C0(0), C1(0), C2(0), C3(0);
+
+ for(Index k=0; k<depth; k++)
+ {
+ LhsScalar A0 = blA[k];
+ RhsScalar B_0 = blB[0];
+ RhsScalar B_1 = blB[1];
+ CJMADD(cj,A0,B_0,C0, B_0);
+ CJMADD(cj,A0,B_1,C1, B_1);
+ RhsScalar B_2 = blB[2];
+ RhsScalar B_3 = blB[3];
+ CJMADD(cj,A0,B_2,C2, B_2);
+ CJMADD(cj,A0,B_3,C3, B_3);
+
+ blB += 4;
+ }
+ res(i, j2 + 0) += alpha * C0;
+ res(i, j2 + 1) += alpha * C1;
+ res(i, j2 + 2) += alpha * C2;
+ res(i, j2 + 3) += alpha * C3;
+ }
+ }
+
+ // remaining columns
+ for(Index j2=packet_cols4; j2<cols; j2++)
+ {
+ const LhsScalar* blA = &blockA[i*strideA+offsetA];
+ // prefetch(blA);
+ // gets a 1 x 1 res block as registers
+ ResScalar C0(0);
+ const RhsScalar* blB = &blockB[j2*strideB+offsetB];
+ for(Index k=0; k<depth; k++)
+ {
+ LhsScalar A0 = blA[k];
+ RhsScalar B_0 = blB[k];
+ CJMADD(cj, A0, B_0, C0, B_0);
+ }
+ res(i, j2) += alpha * C0;
+ }
+ }
+ }
+
+
+#undef CJMADD
+
+// pack a block of the lhs
+// The traversal is as follows (mr==4):
+// 0 4 8 12 ...
+// 1 5 9 13 ...
+// 2 6 10 14 ...
+// 3 7 11 15 ...
+//
+// 16 20 24 28 ...
+// 17 21 25 29 ...
+// 18 22 26 30 ...
+// 19 23 27 31 ...
+//
+// 32 33 34 35 ...
+// 36 37 38 39 ...
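+// Editor's note: equivalently, within a micro-panel of height p (p = 3*PacketSize,
+// 2*PacketSize, PacketSize or Pack2 below), the lhs coefficient at local row r and depth
+// index k is stored at offset k*p + r from the start of that micro-panel, and the
+// micro-panels are laid out one after another, as the numbering above shows for p==4.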
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+struct gemm_pack_lhs<Scalar, Index, DataMapper, Pack1, Pack2, ColMajor, Conjugate, PanelMode>
+{
+ typedef typename DataMapper::LinearMapper LinearMapper;
+ EIGEN_DONT_INLINE void operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0);
+};
+
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_lhs<Scalar, Index, DataMapper, Pack1, Pack2, ColMajor, Conjugate, PanelMode>
+ ::operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset)
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ enum { PacketSize = packet_traits<Scalar>::size };
+
+ EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK LHS");
+ EIGEN_UNUSED_VARIABLE(stride);
+ EIGEN_UNUSED_VARIABLE(offset);
+ eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
+ eigen_assert( ((Pack1%PacketSize)==0 && Pack1<=4*PacketSize) || (Pack1<=4) );
+ conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
+
+ const Index peeled_mc3 = Pack1>=3*PacketSize ? (rows/(3*PacketSize))*(3*PacketSize) : 0;
+ const Index peeled_mc2 = Pack1>=2*PacketSize ? peeled_mc3+((rows-peeled_mc3)/(2*PacketSize))*(2*PacketSize) : 0;
+ const Index peeled_mc1 = Pack1>=1*PacketSize ? (rows/(1*PacketSize))*(1*PacketSize) : 0;
+ const Index peeled_mc0 = Pack2>=1*PacketSize ? peeled_mc1
+ : Pack2>1 ? (rows/Pack2)*Pack2 : 0;
+
+ Index i=0;
+
+ // Pack 3 packets
+ if(Pack1>=3*PacketSize)
+ {
+ if(PanelMode)
+ {
+ for(; i<peeled_mc3; i+=3*PacketSize)
+ {
+ blockA += (3*PacketSize) * offset;
+
+ for(Index k=0; k<depth; k++)
+ {
+ Packet A, B, C;
+ A = lhs.loadPacket(i+0*PacketSize, k);
+ B = lhs.loadPacket(i+1*PacketSize, k);
+ C = lhs.loadPacket(i+2*PacketSize, k);
+ pstore(blockA+0*PacketSize, cj.pconj(A));
+ pstore(blockA+1*PacketSize, cj.pconj(B));
+ pstore(blockA+2*PacketSize, cj.pconj(C));
+ blockA += 3*PacketSize;
+ }
+ blockA += (3*PacketSize) * (stride-offset-depth);
+ }
+ }
+ else
+ {
+ // Read the data from DRAM as sequentially as possible. We're writing to
+ // SRAM so the order of the writes shouldn't impact performance.
+ for(Index k=0; k<depth; k++)
+ {
+ Scalar* localBlockA = blockA + 3*PacketSize*k;
+ for(Index local_i = i; local_i<peeled_mc3; local_i+=3*PacketSize)
+ {
+ Packet A, B, C;
+ A = lhs.loadPacket(local_i+0*PacketSize, k);
+ B = lhs.loadPacket(local_i+1*PacketSize, k);
+ C = lhs.loadPacket(local_i+2*PacketSize, k);
+ pstore(localBlockA+0*PacketSize, cj.pconj(A));
+ pstore(localBlockA+1*PacketSize, cj.pconj(B));
+ pstore(localBlockA+2*PacketSize, cj.pconj(C));
+ localBlockA += 3*PacketSize*depth;
+ }
+ }
+ blockA += depth*peeled_mc3;
+ i = peeled_mc3;
+ }
+ }
+ // Pack 2 packets
+ if(Pack1>=2*PacketSize)
+ {
+ if(PanelMode)
+ {
+ for(; i<peeled_mc2; i+=2*PacketSize)
+ {
+ blockA += (2*PacketSize) * offset;
+
+ for(Index k=0; k<depth; k++)
+ {
+ Packet A, B;
+ A = lhs.loadPacket(i+0*PacketSize, k);
+ B = lhs.loadPacket(i+1*PacketSize, k);
+ pstore(blockA+0*PacketSize, cj.pconj(A));
+ pstore(blockA+1*PacketSize, cj.pconj(B));
+ blockA += 2*PacketSize;
+ }
+ blockA += (2*PacketSize) * (stride-offset-depth);
+ }
+ }
+ else
+ {
+ // Read the data from RAM as sequentially as possible.
+ for(Index k=0; k<depth; k++)
+ {
+ Scalar* localBlockA = blockA + 2*PacketSize*k;
+ for(Index local_i = i; local_i<peeled_mc2; local_i+=2*PacketSize)
+ {
+ Packet A, B;
+ A = lhs.loadPacket(local_i+0*PacketSize, k);
+ B = lhs.loadPacket(local_i+1*PacketSize, k);
+ pstore(localBlockA+0*PacketSize, cj.pconj(A));
+ pstore(localBlockA+1*PacketSize, cj.pconj(B));
+ localBlockA += 2*PacketSize*depth;
+ }
+ }
+ blockA += depth*(peeled_mc2-i);
+ i = peeled_mc2;
+ }
+ }
+ // Pack 1 packets
+ if(Pack1>=1*PacketSize)
+ {
+ if(PanelMode)
+ {
+ for(; i<peeled_mc1; i+=1*PacketSize)
+ {
+ blockA += (1*PacketSize) * offset;
+
+ for(Index k=0; k<depth; k++)
+ {
+ Packet A;
+ A = lhs.loadPacket(i+0*PacketSize, k);
+ pstore(blockA, cj.pconj(A));
+ blockA+=PacketSize;
+ }
+ blockA += (1*PacketSize) * (stride-offset-depth);
+ }
+ }
+ else
+ {
+ // Read the data from RAM as sequentially as possible.
+ for(Index k=0; k<depth; k++)
+ {
+ Scalar* localBlockA = blockA + PacketSize*k;
+ for(Index local_i = i; local_i<peeled_mc1; local_i+=1*PacketSize)
+ {
+ Packet A;
+ A = lhs.loadPacket(local_i+0*PacketSize, k);
+ pstore(localBlockA, cj.pconj(A));
+ localBlockA += PacketSize*depth;
+ }
+ }
+ blockA += depth*(peeled_mc1-i);
+ i = peeled_mc1;
+ }
+ }
+ // Pack scalars
+ if(Pack2<PacketSize && Pack2>1)
+ {
+ for(; i<peeled_mc0; i+=Pack2)
+ {
+ if (PanelMode) {
+ blockA += Pack2 * offset;
+ }
+
+ for(Index k=0; k<depth; k++) {
+ const LinearMapper dm0 = lhs.getLinearMapper(i, k);
+ for(Index w=0; w<Pack2; w++) {
+ *blockA = cj(dm0(w));
+ blockA += 1;
+ }
+ }
+
+ if(PanelMode) blockA += Pack2 * (stride-offset-depth);
+ }
+ }
+ for(; i<rows; i++)
+ {
+ if(PanelMode) blockA += offset;
+ for(Index k=0; k<depth; k++) {
+ *blockA = cj(lhs(i, k));
+ blockA += 1;
+ }
+ if(PanelMode) blockA += (stride-offset-depth);
+ }
+}
+
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+struct gemm_pack_lhs<Scalar, Index, DataMapper, Pack1, Pack2, RowMajor, Conjugate, PanelMode>
+{
+ typedef typename DataMapper::LinearMapper LinearMapper;
+ EIGEN_DONT_INLINE void operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride=0, Index offset=0);
+};
+
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_lhs<Scalar, Index, DataMapper, Pack1, Pack2, RowMajor, Conjugate, PanelMode>
+ ::operator()(Scalar* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset)
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ enum { PacketSize = packet_traits<Scalar>::size };
+
+ EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK LHS");
+ EIGEN_UNUSED_VARIABLE(stride);
+ EIGEN_UNUSED_VARIABLE(offset);
+ eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
+ conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
+
+// const Index peeled_mc3 = Pack1>=3*PacketSize ? (rows/(3*PacketSize))*(3*PacketSize) : 0;
+// const Index peeled_mc2 = Pack1>=2*PacketSize ? peeled_mc3+((rows-peeled_mc3)/(2*PacketSize))*(2*PacketSize) : 0;
+// const Index peeled_mc1 = Pack1>=1*PacketSize ? (rows/(1*PacketSize))*(1*PacketSize) : 0;
+
+ int pack = Pack1;
+ Index i = 0;
+ while(pack>0)
+ {
+ Index remaining_rows = rows-i;
+ Index peeled_mc = i+(remaining_rows/pack)*pack;
+ for(; i<peeled_mc; i+=pack)
+ {
+ if(PanelMode) blockA += pack * offset;
+
+ const Index peeled_k = (depth/PacketSize)*PacketSize;
+ Index k=0;
+ if(pack>=PacketSize)
+ {
+ for(; k<peeled_k; k+=PacketSize)
+ {
+ for (Index m = 0; m < pack; m += PacketSize)
+ {
+ PacketBlock<Packet> kernel;
+ for (int p = 0; p < PacketSize; ++p) kernel.packet[p] = lhs.loadPacket(i+p+m, k);
+ ptranspose(kernel);
+ for (int p = 0; p < PacketSize; ++p) pstore(blockA+m+(pack)*p, cj.pconj(kernel.packet[p]));
+ }
+ blockA += PacketSize*pack;
+ }
+ }
+ for(; k<depth; k++)
+ {
+ Index w=0;
+ for(; w<pack-3; w+=4)
+ {
+ Scalar a(cj(lhs(i+w+0, k))),
+ b(cj(lhs(i+w+1, k))),
+ c(cj(lhs(i+w+2, k))),
+ d(cj(lhs(i+w+3, k)));
+ blockA[0] = a;
+ blockA[1] = b;
+ blockA[2] = c;
+ blockA[3] = d;
+ blockA += 4;
+ }
+ if(pack%4)
+ for(;w<pack;++w) {
+ *blockA = cj(lhs(i+w, k));
+ blockA += 1;
+ }
+ }
+
+ if(PanelMode) blockA += pack * (stride-offset-depth);
+ }
+
+ pack -= PacketSize;
+ if(pack<Pack2 && (pack+PacketSize)!=Pack2)
+ pack = Pack2;
+ }
+
+ for(; i<rows; i++)
+ {
+ if(PanelMode) blockA += offset;
+ for(Index k=0; k<depth; k++) {
+ *blockA = cj(lhs(i, k));
+ blockA += 1;
+ }
+ if(PanelMode) blockA += (stride-offset-depth);
+ }
+}
+
+// copy a complete panel of the rhs
+// this version is optimized for column major matrices
+// The traversal order is as follows (nr==4):
+// 0 1 2 3 12 13 14 15 24 27
+// 4 5 6 7 16 17 18 19 25 28
+// 8 9 10 11 20 21 22 23 26 29
+// . . . . . . . . . .
+template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+struct gemm_pack_rhs<Scalar, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename DataMapper::LinearMapper LinearMapper;
+ enum { PacketSize = packet_traits<Scalar>::size };
+ EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0);
+};
+
+template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_rhs<Scalar, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>
+::operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset)
+{
+ EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS COLMAJOR");
+ EIGEN_UNUSED_VARIABLE(stride);
+ EIGEN_UNUSED_VARIABLE(offset);
+ eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
+ conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
+ Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0;
+ Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
+ const Index peeled_k = (depth/PacketSize)*PacketSize;
+// if(nr>=8)
+// {
+// for(Index j2=0; j2<packet_cols8; j2+=8)
+// {
+// // skip what we have before
+// if(PanelMode) count += 8 * offset;
+// const Scalar* b0 = &rhs[(j2+0)*rhsStride];
+// const Scalar* b1 = &rhs[(j2+1)*rhsStride];
+// const Scalar* b2 = &rhs[(j2+2)*rhsStride];
+// const Scalar* b3 = &rhs[(j2+3)*rhsStride];
+// const Scalar* b4 = &rhs[(j2+4)*rhsStride];
+// const Scalar* b5 = &rhs[(j2+5)*rhsStride];
+// const Scalar* b6 = &rhs[(j2+6)*rhsStride];
+// const Scalar* b7 = &rhs[(j2+7)*rhsStride];
+// Index k=0;
+//        if(PacketSize==8) // TODO enable vectorized transposition for PacketSize==4
+// {
+// for(; k<peeled_k; k+=PacketSize) {
+// PacketBlock<Packet> kernel;
+// for (int p = 0; p < PacketSize; ++p) {
+// kernel.packet[p] = ploadu<Packet>(&rhs[(j2+p)*rhsStride+k]);
+// }
+// ptranspose(kernel);
+// for (int p = 0; p < PacketSize; ++p) {
+// pstoreu(blockB+count, cj.pconj(kernel.packet[p]));
+// count+=PacketSize;
+// }
+// }
+// }
+// for(; k<depth; k++)
+// {
+// blockB[count+0] = cj(b0[k]);
+// blockB[count+1] = cj(b1[k]);
+// blockB[count+2] = cj(b2[k]);
+// blockB[count+3] = cj(b3[k]);
+// blockB[count+4] = cj(b4[k]);
+// blockB[count+5] = cj(b5[k]);
+// blockB[count+6] = cj(b6[k]);
+// blockB[count+7] = cj(b7[k]);
+// count += 8;
+// }
+// // skip what we have after
+// if(PanelMode) count += 8 * (stride-offset-depth);
+// }
+// }
+
+ if(nr>=4)
+ {
+ for(Index j2=packet_cols8; j2<packet_cols4; j2+=4)
+ {
+ // skip what we have before
+ if(PanelMode) blockB += 4 * offset;
+
+ // TODO: each of these makes a copy of the stride :(
+ const LinearMapper dm0 = rhs.getLinearMapper(0, j2 + 0);
+ const LinearMapper dm1 = rhs.getLinearMapper(0, j2 + 1);
+ const LinearMapper dm2 = rhs.getLinearMapper(0, j2 + 2);
+ const LinearMapper dm3 = rhs.getLinearMapper(0, j2 + 3);
+
+ Index k=0;
+ if((PacketSize%4)==0) // TODO enable vectorized transposition for PacketSize==2 ??
+ {
+ for(; k<peeled_k; k+=PacketSize) {
+ PacketBlock<Packet, 4> kernel;
+ kernel.packet[0] = dm0.loadPacket(k);
+ kernel.packet[1] = dm1.loadPacket(k);
+ kernel.packet[2] = dm2.loadPacket(k);
+ kernel.packet[3] = dm3.loadPacket(k);
+ ptranspose(kernel);
+ pstoreu(blockB+0*PacketSize, cj.pconj(kernel.packet[0]));
+ pstoreu(blockB+1*PacketSize, cj.pconj(kernel.packet[1]));
+ pstoreu(blockB+2*PacketSize, cj.pconj(kernel.packet[2]));
+ pstoreu(blockB+3*PacketSize, cj.pconj(kernel.packet[3]));
+ blockB+=4*PacketSize;
+ }
+ }
+ for(; k<depth; k++)
+ {
+ blockB[0] = cj(dm0(k));
+ blockB[1] = cj(dm1(k));
+ blockB[2] = cj(dm2(k));
+ blockB[3] = cj(dm3(k));
+ blockB += 4;
+ }
+ // skip what we have after
+ if(PanelMode) blockB += 4 * (stride-offset-depth);
+ }
+ }
+
+ // copy the remaining columns one at a time (nr==1)
+ for(Index j2=packet_cols4; j2<cols; ++j2)
+ {
+ const LinearMapper dm0 = rhs.getLinearMapper(0, j2);
+ if(PanelMode) blockB += offset;
+ for(Index k=0; k<depth; k++)
+ {
+ *blockB = cj(dm0(k));
+ blockB += 1;
+ }
+ if(PanelMode) blockB += (stride-offset-depth);
+ }
+}
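
As an aside, the traversal order documented above is easy to reproduce outside Eigen. The following stand-alone sketch (illustrative only, not part of this header) interleaves packs of four columns row by row and then appends the leftover columns one at a time, matching the index diagram given before the struct:

```cpp
#include <cstdio>
#include <vector>

int main() {
  const int depth = 3, cols = 6, nr = 4;
  // rhs(k, j) stored column-major, filled with recognizable values.
  std::vector<double> rhs(depth * cols);
  for (int j = 0; j < cols; ++j)
    for (int k = 0; k < depth; ++k)
      rhs[j * depth + k] = 10 * j + k;

  std::vector<double> blockB;
  int j2 = 0;
  for (; j2 + nr <= cols; j2 += nr)      // packs of 4 columns, interleaved per k
    for (int k = 0; k < depth; ++k)
      for (int w = 0; w < nr; ++w)
        blockB.push_back(rhs[(j2 + w) * depth + k]);
  for (; j2 < cols; ++j2)                // remaining columns, one at a time
    for (int k = 0; k < depth; ++k)
      blockB.push_back(rhs[j2 * depth + k]);

  for (double v : blockB) std::printf("%g ", v);
  std::printf("\n");
  return 0;
}
```
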
+
+// this version is optimized for row major matrices
+template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+struct gemm_pack_rhs<Scalar, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode>
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename packet_traits<Scalar>::half HalfPacket;
+ typedef typename DataMapper::LinearMapper LinearMapper;
+ enum {
+ PacketSize = packet_traits<Scalar>::size,
+ HalfPacketSize = packet_traits<Scalar>::HasHalfPacket ? unpacket_traits<typename packet_traits<Scalar>::half>::size : 0
+ };
+ EIGEN_DONT_INLINE void operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0);
+};
+
+template<typename Scalar, typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_rhs<Scalar, Index, DataMapper, nr, RowMajor, Conjugate, PanelMode>
+ ::operator()(Scalar* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset)
+{
+ EIGEN_ASM_COMMENT("EIGEN PRODUCT PACK RHS ROWMAJOR");
+ EIGEN_UNUSED_VARIABLE(stride);
+ EIGEN_UNUSED_VARIABLE(offset);
+ eigen_assert(((!PanelMode) && stride==0 && offset==0) || (PanelMode && stride>=depth && offset<=stride));
+ conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
+ Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0;
+ Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
+
+// if(nr>=8)
+// {
+// for(Index j2=0; j2<packet_cols8; j2+=8)
+// {
+// // skip what we have before
+// if(PanelMode) count += 8 * offset;
+// for(Index k=0; k<depth; k++)
+// {
+// if (PacketSize==8) {
+// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
+// pstoreu(blockB+count, cj.pconj(A));
+// } else if (PacketSize==4) {
+// Packet A = ploadu<Packet>(&rhs[k*rhsStride + j2]);
+// Packet B = ploadu<Packet>(&rhs[k*rhsStride + j2 + PacketSize]);
+// pstoreu(blockB+count, cj.pconj(A));
+// pstoreu(blockB+count+PacketSize, cj.pconj(B));
+// } else {
+// const Scalar* b0 = &rhs[k*rhsStride + j2];
+// blockB[count+0] = cj(b0[0]);
+// blockB[count+1] = cj(b0[1]);
+// blockB[count+2] = cj(b0[2]);
+// blockB[count+3] = cj(b0[3]);
+// blockB[count+4] = cj(b0[4]);
+// blockB[count+5] = cj(b0[5]);
+// blockB[count+6] = cj(b0[6]);
+// blockB[count+7] = cj(b0[7]);
+// }
+// count += 8;
+// }
+// // skip what we have after
+// if(PanelMode) count += 8 * (stride-offset-depth);
+// }
+// }
+ if(nr>=4)
+ {
+ for(Index j2=packet_cols8; j2<packet_cols4; j2+=4)
+ {
+ // skip what we have before
+ if(PanelMode) blockB += 4 * offset;
+ for(Index k=0; k<depth; k++)
+ {
+ if (PacketSize==4) {
+ Packet A = rhs.loadPacket(k, j2);
+ pstore(blockB, cj.pconj(A));
+ blockB += PacketSize;
+ }
+ else if (HalfPacketSize==4) {
+ HalfPacket A = rhs.loadHalfPacket(k, j2);
+ pstore<Scalar, HalfPacket>(blockB, cj.pconj(A));
+ blockB += HalfPacketSize;
+ }
+ else {
+ const LinearMapper dm0 = rhs.getLinearMapper(k, j2);
+ blockB[0] = cj(dm0(0));
+ blockB[1] = cj(dm0(1));
+ blockB[2] = cj(dm0(2));
+ blockB[3] = cj(dm0(3));
+ blockB += 4;
+ }
+ }
+ // skip what we have after
+ if(PanelMode) blockB += 4 * (stride-offset-depth);
+ }
+ }
+ // copy the remaining columns one at a time (nr==1)
+ for(Index j2=packet_cols4; j2<cols; ++j2)
+ {
+ if(PanelMode) blockB += offset;
+ for(Index k=0; k<depth; k++)
+ {
+ *blockB = cj(rhs(k, j2));
+ blockB += 1;
+ }
+ if(PanelMode) blockB += stride-offset-depth;
+ }
+}
+
+} // end namespace internal
+
+/** \returns the currently set level 1 cpu cache size (in bytes) used to estimate the ideal blocking size parameters.
+ * \sa setCpuCacheSize */
+inline std::ptrdiff_t l1CacheSize()
+{
+ std::ptrdiff_t l1, l2, l3;
+ internal::manage_caching_sizes(GetAction, &l1, &l2, &l3);
+ return l1;
+}
+
+/** \returns the currently set level 2 cpu cache size (in bytes) used to estimate the ideal blocking size parameters.
+ * \sa setCpuCacheSize */
+inline std::ptrdiff_t l2CacheSize()
+{
+ std::ptrdiff_t l1, l2, l3;
+ internal::manage_caching_sizes(GetAction, &l1, &l2, &l3);
+ return l2;
+}
+
+/** \returns the currently set level 3 cpu cache size (in bytes) used to estimate the ideal blocking size parameters.
+ * \sa setCpuCacheSize */
+inline std::ptrdiff_t l3CacheSize()
+{
+ std::ptrdiff_t l1, l2, l3;
+ internal::manage_caching_sizes(GetAction, &l1, &l2, &l3);
+ return l3;
+}
+
+/** Set the CPU L1, L2, and L3 cache sizes (in bytes).
+  * These values are used to adjust the size of the blocks
+  * for the algorithms working per block.
+  *
+  * \sa computeProductBlockingSizes */
+inline void setCpuCacheSizes(std::ptrdiff_t l1, std::ptrdiff_t l2, std::ptrdiff_t l3)
+{
+ internal::manage_caching_sizes(SetAction, &l1, &l2, &l3);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_BLOCK_PANEL_H
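
The cache-size accessors defined at the end of this header (l1CacheSize, l2CacheSize, l3CacheSize, setCpuCacheSizes) are the public knobs behind the blocking heuristics. A minimal usage sketch follows; the byte counts passed to setCpuCacheSizes are placeholders, not recommendations:

```cpp
#include <cstdio>
#include <Eigen/Core>

int main() {
  // Query the cache sizes Eigen currently uses for its blocking heuristics.
  std::printf("L1: %td  L2: %td  L3: %td (bytes)\n",
              Eigen::l1CacheSize(), Eigen::l2CacheSize(), Eigen::l3CacheSize());

  // Override them, e.g. when the built-in detection is wrong for a given CPU.
  Eigen::setCpuCacheSizes(32 * 1024, 512 * 1024, 8 * 1024 * 1024);
  return 0;
}
```
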
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix.h
new file mode 100644
index 0000000000..c3715b1a39
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix.h
@@ -0,0 +1,465 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERAL_MATRIX_MATRIX_H
+#define EIGEN_GENERAL_MATRIX_MATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename _LhsScalar, typename _RhsScalar> class level3_blocking;
+
+/* Specialization for a row-major destination matrix => simple transposition of the product */
+template<
+ typename Index,
+ typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs>
+struct general_matrix_matrix_product<Index,LhsScalar,LhsStorageOrder,ConjugateLhs,RhsScalar,RhsStorageOrder,ConjugateRhs,RowMajor>
+{
+ typedef gebp_traits<RhsScalar,LhsScalar> Traits;
+
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+ static EIGEN_STRONG_INLINE void run(
+ Index rows, Index cols, Index depth,
+ const LhsScalar* lhs, Index lhsStride,
+ const RhsScalar* rhs, Index rhsStride,
+ ResScalar* res, Index resStride,
+ ResScalar alpha,
+ level3_blocking<RhsScalar,LhsScalar>& blocking,
+ GemmParallelInfo<Index>* info = 0)
+ {
+ // transpose the product such that the result is column major
+ general_matrix_matrix_product<Index,
+ RhsScalar, RhsStorageOrder==RowMajor ? ColMajor : RowMajor, ConjugateRhs,
+ LhsScalar, LhsStorageOrder==RowMajor ? ColMajor : RowMajor, ConjugateLhs,
+ ColMajor>
+ ::run(cols,rows,depth,rhs,rhsStride,lhs,lhsStride,res,resStride,alpha,blocking,info);
+ }
+};
+
+/* Specialization for a col-major destination matrix
+ * => Blocking algorithm following Goto's paper */
+template<
+ typename Index,
+ typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs>
+struct general_matrix_matrix_product<Index,LhsScalar,LhsStorageOrder,ConjugateLhs,RhsScalar,RhsStorageOrder,ConjugateRhs,ColMajor>
+{
+
+typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+
+typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+static void run(Index rows, Index cols, Index depth,
+ const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsStride,
+ ResScalar* _res, Index resStride,
+ ResScalar alpha,
+ level3_blocking<LhsScalar,RhsScalar>& blocking,
+ GemmParallelInfo<Index>* info = 0)
+{
+ typedef const_blas_data_mapper<LhsScalar, Index, LhsStorageOrder> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar, Index, RhsStorageOrder> RhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ RhsMapper rhs(_rhs,rhsStride);
+ ResMapper res(_res, resStride);
+
+ Index kc = blocking.kc(); // cache block size along the K direction
+ Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction
+ Index nc = (std::min)(cols,blocking.nc()); // cache block size along the N direction
+
+ gemm_pack_lhs<LhsScalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ gemm_pack_rhs<RhsScalar, Index, RhsMapper, Traits::nr, RhsStorageOrder> pack_rhs;
+ gebp_kernel<LhsScalar, RhsScalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp;
+
+#ifdef EIGEN_HAS_OPENMP
+ if(info)
+ {
+ // this is the parallel version!
+ Index tid = omp_get_thread_num();
+ Index threads = omp_get_num_threads();
+
+ LhsScalar* blockA = blocking.blockA();
+ eigen_internal_assert(blockA!=0);
+
+ std::size_t sizeB = kc*nc;
+ ei_declare_aligned_stack_constructed_variable(RhsScalar, blockB, sizeB, 0);
+
+ // For each horizontal panel of the rhs, and corresponding vertical panel of the lhs...
+ for(Index k=0; k<depth; k+=kc)
+ {
+ const Index actual_kc = (std::min)(k+kc,depth)-k; // => rows of B', and cols of the A'
+
+ // In order to reduce the chance that a thread has to wait for the other,
+ // let's start by packing B'.
+ pack_rhs(blockB, rhs.getSubMapper(k,0), actual_kc, nc);
+
+ // Pack A_k to A' in a parallel fashion:
+ // each thread packs the sub block A_k,i to A'_i where i is the thread id.
+
+ // However, before copying to A'_i, we have to make sure that no other thread is still using it,
+ // i.e., we test that info[tid].users equals 0.
+ // Then, we set info[tid].users to the number of threads to mark that all other threads are going to use it.
+ while(info[tid].users!=0) {}
+ info[tid].users += threads;
+
+ pack_lhs(blockA+info[tid].lhs_start*actual_kc, lhs.getSubMapper(info[tid].lhs_start,k), actual_kc, info[tid].lhs_length);
+
+ // Notify the other threads that the part A'_i is ready to go.
+ info[tid].sync = k;
+
+ // Computes C_i += A' * B' per A'_i
+ for(Index shift=0; shift<threads; ++shift)
+ {
+ Index i = (tid+shift)%threads;
+
+ // At this point we have to make sure that A'_i has been updated by the thread i,
+ // we use testAndSetOrdered to mimic a volatile access.
+ // However, no need to wait for the B' part which has been updated by the current thread!
+ if (shift>0) {
+ while(info[i].sync!=k) {
+ }
+ }
+
+ gebp(res.getSubMapper(info[i].lhs_start, 0), blockA+info[i].lhs_start*actual_kc, blockB, info[i].lhs_length, actual_kc, nc, alpha);
+ }
+
+ // Then keep going as usual with the remaining B'
+ for(Index j=nc; j<cols; j+=nc)
+ {
+ const Index actual_nc = (std::min)(j+nc,cols)-j;
+
+ // pack B_k,j to B'
+ pack_rhs(blockB, rhs.getSubMapper(k,j), actual_kc, actual_nc);
+
+ // C_j += A' * B'
+ gebp(res.getSubMapper(0, j), blockA, blockB, rows, actual_kc, actual_nc, alpha);
+ }
+
+ // Release all the sub blocks A'_i of A' for the current thread,
+ // i.e., we simply decrement the number of users by 1
+ #pragma omp critical
+ {
+ for(Index i=0; i<threads; ++i)
+ #pragma omp atomic
+ --(info[i].users);
+ }
+ }
+ }
+ else
+#endif // EIGEN_HAS_OPENMP
+ {
+ EIGEN_UNUSED_VARIABLE(info);
+
+ // this is the sequential version!
+ std::size_t sizeA = kc*mc;
+ std::size_t sizeB = kc*nc;
+
+ ei_declare_aligned_stack_constructed_variable(LhsScalar, blockA, sizeA, blocking.blockA());
+ ei_declare_aligned_stack_constructed_variable(RhsScalar, blockB, sizeB, blocking.blockB());
+
+ const bool pack_rhs_once = mc!=rows && kc==depth && nc==cols;
+
+ // For each horizontal panel of the rhs, and corresponding panel of the lhs...
+ for(Index i2=0; i2<rows; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,rows)-i2;
+
+ for(Index k2=0; k2<depth; k2+=kc)
+ {
+ const Index actual_kc = (std::min)(k2+kc,depth)-k2;
+
+ // OK, here we have selected one horizontal panel of rhs and one vertical panel of lhs.
+ // => Pack lhs's panel into a sequential chunk of memory (L2/L3 caching)
+ // Note that this panel will be read as many times as the number of blocks in the rhs's
+ // horizontal panel which is, in practice, a very low number.
+ pack_lhs(blockA, lhs.getSubMapper(i2,k2), actual_kc, actual_mc);
+
+ // For each kc x nc block of the rhs's horizontal panel...
+ for(Index j2=0; j2<cols; j2+=nc)
+ {
+ const Index actual_nc = (std::min)(j2+nc,cols)-j2;
+
+ // We pack the rhs's block into a sequential chunk of memory (L2 caching)
+ // Note that this block will be read a very high number of times, which is equal to the number of
+          // micro horizontal panels of the large rhs's panel (e.g., rows/12 times).
+ if((!pack_rhs_once) || i2==0)
+ pack_rhs(blockB, rhs.getSubMapper(k2,j2), actual_kc, actual_nc);
+
+ // Everything is packed, we can now call the panel * block kernel:
+ gebp(res.getSubMapper(i2, j2), blockA, blockB, actual_mc, actual_kc, actual_nc, alpha);
+ }
+ }
+ }
+ }
+}
+
+};
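
The sequential path above follows the classic Goto blocking scheme. Below is a scalar sketch of the same loop nest, with plain triple loops standing in for the packed gebp kernel and fixed block sizes standing in for blocking.mc()/kc()/nc(); it is illustrative only, not a replacement for the vectorized kernels:

```cpp
#include <algorithm>
#include <vector>

// C (rows x cols) += alpha * A (rows x depth) * B (depth x cols), all column-major.
void blocked_gemm(int rows, int cols, int depth,
                  const double* A, const double* B, double* C, double alpha) {
  const int mc = 256, kc = 128, nc = 512;  // stand-ins for blocking.mc()/kc()/nc()
  std::vector<double> blockA(mc * kc), blockB(kc * nc);

  for (int i2 = 0; i2 < rows; i2 += mc) {
    const int actual_mc = std::min(i2 + mc, rows) - i2;
    for (int k2 = 0; k2 < depth; k2 += kc) {
      const int actual_kc = std::min(k2 + kc, depth) - k2;
      // Pack the actual_mc x actual_kc panel of A into a contiguous buffer.
      for (int k = 0; k < actual_kc; ++k)
        for (int i = 0; i < actual_mc; ++i)
          blockA[k * actual_mc + i] = A[(k2 + k) * rows + (i2 + i)];
      for (int j2 = 0; j2 < cols; j2 += nc) {
        const int actual_nc = std::min(j2 + nc, cols) - j2;
        // Pack the actual_kc x actual_nc block of B (Eigen skips repacking
        // it when pack_rhs_once holds).
        for (int j = 0; j < actual_nc; ++j)
          for (int k = 0; k < actual_kc; ++k)
            blockB[j * actual_kc + k] = B[(j2 + j) * depth + (k2 + k)];
        // gebp stand-in: multiply the two packed blocks.
        for (int j = 0; j < actual_nc; ++j)
          for (int i = 0; i < actual_mc; ++i) {
            double acc = 0;
            for (int k = 0; k < actual_kc; ++k)
              acc += blockA[k * actual_mc + i] * blockB[j * actual_kc + k];
            C[(j2 + j) * rows + (i2 + i)] += alpha * acc;
          }
      }
    }
  }
}
```
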
+
+/*********************************************************************************
+* Specialization of GeneralProduct<> for "large" GEMM, i.e.,
+* implementation of the high level wrapper to general_matrix_matrix_product
+**********************************************************************************/
+
+template<typename Lhs, typename Rhs>
+struct traits<GeneralProduct<Lhs,Rhs,GemmProduct> >
+ : traits<ProductBase<GeneralProduct<Lhs,Rhs,GemmProduct>, Lhs, Rhs> >
+{};
+
+template<typename Scalar, typename Index, typename Gemm, typename Lhs, typename Rhs, typename Dest, typename BlockingType>
+struct gemm_functor
+{
+ gemm_functor(const Lhs& lhs, const Rhs& rhs, Dest& dest, const Scalar& actualAlpha, BlockingType& blocking)
+ : m_lhs(lhs), m_rhs(rhs), m_dest(dest), m_actualAlpha(actualAlpha), m_blocking(blocking)
+ {}
+
+ void initParallelSession() const
+ {
+ m_blocking.allocateA();
+ }
+
+ void operator() (Index row, Index rows, Index col=0, Index cols=-1, GemmParallelInfo<Index>* info=0) const
+ {
+ if(cols==-1)
+ cols = m_rhs.cols();
+
+ Gemm::run(rows, cols, m_lhs.cols(),
+ /*(const Scalar*)*/&m_lhs.coeffRef(row,0), m_lhs.outerStride(),
+ /*(const Scalar*)*/&m_rhs.coeffRef(0,col), m_rhs.outerStride(),
+ (Scalar*)&(m_dest.coeffRef(row,col)), m_dest.outerStride(),
+ m_actualAlpha, m_blocking, info);
+ }
+
+ typedef typename Gemm::Traits Traits;
+
+ protected:
+ const Lhs& m_lhs;
+ const Rhs& m_rhs;
+ Dest& m_dest;
+ Scalar m_actualAlpha;
+ BlockingType& m_blocking;
+};
+
+template<int StorageOrder, typename LhsScalar, typename RhsScalar, int MaxRows, int MaxCols, int MaxDepth, int KcFactor=1,
+bool FiniteAtCompileTime = MaxRows!=Dynamic && MaxCols!=Dynamic && MaxDepth != Dynamic> class gemm_blocking_space;
+
+template<typename _LhsScalar, typename _RhsScalar>
+class level3_blocking
+{
+ typedef _LhsScalar LhsScalar;
+ typedef _RhsScalar RhsScalar;
+
+ protected:
+ LhsScalar* m_blockA;
+ RhsScalar* m_blockB;
+
+ DenseIndex m_mc;
+ DenseIndex m_nc;
+ DenseIndex m_kc;
+
+ public:
+
+ level3_blocking()
+ : m_blockA(0), m_blockB(0), m_mc(0), m_nc(0), m_kc(0)
+ {}
+
+ inline DenseIndex mc() const { return m_mc; }
+ inline DenseIndex nc() const { return m_nc; }
+ inline DenseIndex kc() const { return m_kc; }
+
+ inline LhsScalar* blockA() { return m_blockA; }
+ inline RhsScalar* blockB() { return m_blockB; }
+};
+
+template<int StorageOrder, typename _LhsScalar, typename _RhsScalar, int MaxRows, int MaxCols, int MaxDepth, int KcFactor>
+class gemm_blocking_space<StorageOrder,_LhsScalar,_RhsScalar,MaxRows, MaxCols, MaxDepth, KcFactor, true>
+ : public level3_blocking<
+ typename conditional<StorageOrder==RowMajor,_RhsScalar,_LhsScalar>::type,
+ typename conditional<StorageOrder==RowMajor,_LhsScalar,_RhsScalar>::type>
+{
+ enum {
+ Transpose = StorageOrder==RowMajor,
+ ActualRows = Transpose ? MaxCols : MaxRows,
+ ActualCols = Transpose ? MaxRows : MaxCols
+ };
+ typedef typename conditional<Transpose,_RhsScalar,_LhsScalar>::type LhsScalar;
+ typedef typename conditional<Transpose,_LhsScalar,_RhsScalar>::type RhsScalar;
+ typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+ enum {
+ SizeA = ActualRows * MaxDepth,
+ SizeB = ActualCols * MaxDepth
+ };
+
+ EIGEN_ALIGN_DEFAULT LhsScalar m_staticA[SizeA];
+ EIGEN_ALIGN_DEFAULT RhsScalar m_staticB[SizeB];
+
+ public:
+
+ gemm_blocking_space(DenseIndex /*rows*/, DenseIndex /*cols*/, DenseIndex /*depth*/, int /*num_threads*/, bool /*full_rows = false*/)
+ {
+ this->m_mc = ActualRows;
+ this->m_nc = ActualCols;
+ this->m_kc = MaxDepth;
+ this->m_blockA = m_staticA;
+ this->m_blockB = m_staticB;
+ }
+
+ inline void allocateA() {}
+ inline void allocateB() {}
+ inline void allocateAll() {}
+};
+
+template<int StorageOrder, typename _LhsScalar, typename _RhsScalar, int MaxRows, int MaxCols, int MaxDepth, int KcFactor>
+class gemm_blocking_space<StorageOrder,_LhsScalar,_RhsScalar,MaxRows, MaxCols, MaxDepth, KcFactor, false>
+ : public level3_blocking<
+ typename conditional<StorageOrder==RowMajor,_RhsScalar,_LhsScalar>::type,
+ typename conditional<StorageOrder==RowMajor,_LhsScalar,_RhsScalar>::type>
+{
+ enum {
+ Transpose = StorageOrder==RowMajor
+ };
+ typedef typename conditional<Transpose,_RhsScalar,_LhsScalar>::type LhsScalar;
+ typedef typename conditional<Transpose,_LhsScalar,_RhsScalar>::type RhsScalar;
+ typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+
+ DenseIndex m_sizeA;
+ DenseIndex m_sizeB;
+
+ public:
+
+ gemm_blocking_space(DenseIndex rows, DenseIndex cols, DenseIndex depth, DenseIndex num_threads, bool l3_blocking)
+ {
+ this->m_mc = Transpose ? cols : rows;
+ this->m_nc = Transpose ? rows : cols;
+ this->m_kc = depth;
+
+ if(l3_blocking)
+ {
+ computeProductBlockingSizes<LhsScalar,RhsScalar,KcFactor>(this->m_kc, this->m_mc, this->m_nc, num_threads);
+ }
+ else // no l3 blocking
+ {
+ DenseIndex m = this->m_mc;
+ DenseIndex n = this->m_nc;
+ computeProductBlockingSizes<LhsScalar,RhsScalar,KcFactor>(this->m_kc, m, n, num_threads);
+ }
+
+ m_sizeA = this->m_mc * this->m_kc;
+ m_sizeB = this->m_kc * this->m_nc;
+ }
+
+ void allocateA()
+ {
+ if(this->m_blockA==0)
+ this->m_blockA = aligned_new<LhsScalar>(m_sizeA);
+ }
+
+ void allocateB()
+ {
+ if(this->m_blockB==0)
+ this->m_blockB = aligned_new<RhsScalar>(m_sizeB);
+ }
+
+ void allocateAll()
+ {
+ allocateA();
+ allocateB();
+ }
+
+ ~gemm_blocking_space()
+ {
+ aligned_delete(this->m_blockA, m_sizeA);
+ aligned_delete(this->m_blockB, m_sizeB);
+ }
+};
+
+} // end namespace internal
+
+template<typename Lhs, typename Rhs>
+class GeneralProduct<Lhs, Rhs, GemmProduct>
+ : public ProductBase<GeneralProduct<Lhs,Rhs,GemmProduct>, Lhs, Rhs>
+{
+ enum {
+ MaxDepthAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED(Lhs::MaxColsAtCompileTime,Rhs::MaxRowsAtCompileTime)
+ };
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(GeneralProduct)
+
+ typedef typename Lhs::Scalar LhsScalar;
+ typedef typename Rhs::Scalar RhsScalar;
+ typedef Scalar ResScalar;
+
+ GeneralProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {
+ typedef internal::scalar_product_op<LhsScalar,RhsScalar> BinOp;
+ EIGEN_CHECK_BINARY_COMPATIBILIY(BinOp,LhsScalar,RhsScalar);
+ }
+
+ template<typename Dest>
+ inline void evalTo(Dest& dst) const
+ {
+ if((m_rhs.rows()+dst.rows()+dst.cols())<20 && m_rhs.rows()>0)
+ dst.noalias() = m_lhs .lazyProduct( m_rhs );
+ else
+ {
+ dst.setZero();
+ scaleAndAddTo(dst,Scalar(1));
+ }
+ }
+
+ template<typename Dest>
+ inline void addTo(Dest& dst) const
+ {
+ if((m_rhs.rows()+dst.rows()+dst.cols())<20 && m_rhs.rows()>0)
+ dst.noalias() += m_lhs .lazyProduct( m_rhs );
+ else
+ scaleAndAddTo(dst,Scalar(1));
+ }
+
+ template<typename Dest>
+ inline void subTo(Dest& dst) const
+ {
+ if((m_rhs.rows()+dst.rows()+dst.cols())<20 && m_rhs.rows()>0)
+ dst.noalias() -= m_lhs .lazyProduct( m_rhs );
+ else
+ scaleAndAddTo(dst,Scalar(-1));
+ }
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ eigen_assert(dst.rows()==m_lhs.rows() && dst.cols()==m_rhs.cols());
+
+ typename internal::add_const_on_value_type<ActualLhsType>::type lhs = LhsBlasTraits::extract(m_lhs);
+ typename internal::add_const_on_value_type<ActualRhsType>::type rhs = RhsBlasTraits::extract(m_rhs);
+
+ Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(m_lhs)
+ * RhsBlasTraits::extractScalarFactor(m_rhs);
+
+ typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,LhsScalar,RhsScalar,
+ Dest::MaxRowsAtCompileTime,Dest::MaxColsAtCompileTime,MaxDepthAtCompileTime> BlockingType;
+
+ typedef internal::gemm_functor<
+ Scalar, Index,
+ internal::general_matrix_matrix_product<
+ Index,
+ LhsScalar, (_ActualLhsType::Flags&RowMajorBit) ? RowMajor : ColMajor, bool(LhsBlasTraits::NeedToConjugate),
+ RhsScalar, (_ActualRhsType::Flags&RowMajorBit) ? RowMajor : ColMajor, bool(RhsBlasTraits::NeedToConjugate),
+ (Dest::Flags&RowMajorBit) ? RowMajor : ColMajor>,
+ _ActualLhsType, _ActualRhsType, Dest, BlockingType> GemmFunctor;
+
+ BlockingType blocking(dst.rows(), dst.cols(), lhs.cols(), 1, true);
+
+ internal::parallelize_gemm<(Dest::MaxRowsAtCompileTime>32 || Dest::MaxRowsAtCompileTime==Dynamic)>(GemmFunctor(lhs, rhs, dst, actualAlpha, blocking), this->rows(), this->cols(), Dest::Flags&RowMajorBit);
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h
new file mode 100644
index 0000000000..e4c10e88d1
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h
@@ -0,0 +1,285 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H
+#define EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H
+
+namespace Eigen {
+
+template<typename Scalar, typename Index, int StorageOrder, int UpLo, bool ConjLhs, bool ConjRhs>
+struct selfadjoint_rank1_update;
+
+namespace internal {
+
+/**********************************************************************
+* This file implements a general A * B product while
+* evaluating only one triangular part of the product.
+* This is a more general version of the self adjoint product (C += A A^T)
+* provided by the level 3 SYRK BLAS routine.
+**********************************************************************/
+
+// forward declarations (defined at the end of this file)
+template<typename LhsScalar, typename RhsScalar, typename Index, int mr, int nr, bool ConjLhs, bool ConjRhs, int UpLo>
+struct tribb_kernel;
+
+/* Optimized matrix-matrix product evaluating only one triangular half */
+template <typename Index,
+ typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs,
+ int ResStorageOrder, int UpLo, int Version = Specialized>
+struct general_matrix_matrix_triangular_product;
+
+// as usual if the result is row major => we transpose the product
+template <typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs, int UpLo, int Version>
+struct general_matrix_matrix_triangular_product<Index,LhsScalar,LhsStorageOrder,ConjugateLhs,RhsScalar,RhsStorageOrder,ConjugateRhs,RowMajor,UpLo,Version>
+{
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const LhsScalar* lhs, Index lhsStride,
+ const RhsScalar* rhs, Index rhsStride, ResScalar* res, Index resStride, const ResScalar& alpha)
+ {
+ general_matrix_matrix_triangular_product<Index,
+ RhsScalar, RhsStorageOrder==RowMajor ? ColMajor : RowMajor, ConjugateRhs,
+ LhsScalar, LhsStorageOrder==RowMajor ? ColMajor : RowMajor, ConjugateLhs,
+ ColMajor, UpLo==Lower?Upper:Lower>
+ ::run(size,depth,rhs,rhsStride,lhs,lhsStride,res,resStride,alpha);
+ }
+};
+
+template <typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs, int UpLo, int Version>
+struct general_matrix_matrix_triangular_product<Index,LhsScalar,LhsStorageOrder,ConjugateLhs,RhsScalar,RhsStorageOrder,ConjugateRhs,ColMajor,UpLo,Version>
+{
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsStride, ResScalar* _res, Index resStride, const ResScalar& alpha)
+ {
+ typedef gebp_traits<LhsScalar,RhsScalar> Traits;
+
+ typedef const_blas_data_mapper<LhsScalar, Index, LhsStorageOrder> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar, Index, RhsStorageOrder> RhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ RhsMapper rhs(_rhs,rhsStride);
+ ResMapper res(_res, resStride);
+
+ Index kc = depth; // cache block size along the K direction
+ Index mc = size; // cache block size along the M direction
+ Index nc = size; // cache block size along the N direction
+ computeProductBlockingSizes<LhsScalar,RhsScalar>(kc, mc, nc, Index(1));
+ // !!! mc must be a multiple of nr:
+ if(mc > Traits::nr)
+ mc = (mc/Traits::nr)*Traits::nr;
+
+ ei_declare_aligned_stack_constructed_variable(LhsScalar, blockA, kc*mc, 0);
+ ei_declare_aligned_stack_constructed_variable(RhsScalar, blockB, kc*size, 0);
+
+ gemm_pack_lhs<LhsScalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ gemm_pack_rhs<RhsScalar, Index, RhsMapper, Traits::nr, RhsStorageOrder> pack_rhs;
+ gebp_kernel<LhsScalar, RhsScalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp;
+ tribb_kernel<LhsScalar, RhsScalar, Index, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs, UpLo> sybb;
+
+ for(Index k2=0; k2<depth; k2+=kc)
+ {
+ const Index actual_kc = (std::min)(k2+kc,depth)-k2;
+
+ // note that the actual rhs is the transpose/adjoint of mat
+ pack_rhs(blockB, rhs.getSubMapper(k2,0), actual_kc, size);
+
+ for(Index i2=0; i2<size; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,size)-i2;
+
+ pack_lhs(blockA, lhs.getSubMapper(i2, k2), actual_kc, actual_mc);
+
+        // the selected actual_mc * size panel of res is split into three different parts:
+ // 1 - before the diagonal => processed with gebp or skipped
+ // 2 - the actual_mc x actual_mc symmetric block => processed with a special kernel
+ // 3 - after the diagonal => processed with gebp or skipped
+ if (UpLo==Lower)
+ gebp(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc,
+ (std::min)(size,i2), alpha, -1, -1, 0, 0);
+
+
+ sybb(_res+resStride*i2 + i2, resStride, blockA, blockB + actual_kc*i2, actual_mc, actual_kc, alpha);
+
+ if (UpLo==Upper)
+ {
+ Index j2 = i2+actual_mc;
+ gebp(res.getSubMapper(i2, j2), blockA, blockB+actual_kc*j2, actual_mc,
+ actual_kc, (std::max)(Index(0), size-j2), alpha, -1, -1, 0, 0);
+ }
+ }
+ }
+ }
+};
+
+// Optimized packed Block * packed Block product kernel evaluating only one given triangular part
+// This kernel is built on top of the gebp kernel:
+// - the current destination block is processed per panel of actual_mc x BlockSize
+// where BlockSize is set to the minimal value allowing gebp to be as fast as possible
+// - then, as usual, each panel is split into three parts along the diagonal,
+// the sub blocks above and below the diagonal are processed as usual,
+// while the triangular block overlapping the diagonal is evaluated into a
+// small temporary buffer which is then accumulated into the result using a
+// triangular traversal.
+template<typename LhsScalar, typename RhsScalar, typename Index, int mr, int nr, bool ConjLhs, bool ConjRhs, int UpLo>
+struct tribb_kernel
+{
+ typedef gebp_traits<LhsScalar,RhsScalar,ConjLhs,ConjRhs> Traits;
+ typedef typename Traits::ResScalar ResScalar;
+
+ enum {
+ BlockSize = EIGEN_PLAIN_ENUM_MAX(mr,nr)
+ };
+ void operator()(ResScalar* _res, Index resStride, const LhsScalar* blockA, const RhsScalar* blockB, Index size, Index depth, const ResScalar& alpha)
+ {
+ typedef blas_data_mapper<ResScalar, Index, ColMajor> ResMapper;
+ ResMapper res(_res, resStride);
+ gebp_kernel<LhsScalar, RhsScalar, Index, ResMapper, mr, nr, ConjLhs, ConjRhs> gebp_kernel;
+
+ Matrix<ResScalar,BlockSize,BlockSize,ColMajor> buffer;
+
+ // let's process the block per panel of actual_mc x BlockSize,
+ // again, each is split into three parts, etc.
+ for (Index j=0; j<size; j+=BlockSize)
+ {
+ Index actualBlockSize = std::min<Index>(BlockSize,size - j);
+ const RhsScalar* actual_b = blockB+j*depth;
+
+ if(UpLo==Upper)
+ gebp_kernel(res.getSubMapper(0, j), blockA, actual_b, j, depth, actualBlockSize, alpha,
+ -1, -1, 0, 0);
+
+ // selfadjoint micro block
+ {
+ Index i = j;
+ buffer.setZero();
+ // 1 - apply the kernel on the temporary buffer
+ gebp_kernel(ResMapper(buffer.data(), BlockSize), blockA+depth*i, actual_b, actualBlockSize, depth, actualBlockSize, alpha,
+ -1, -1, 0, 0);
+ // 2 - triangular accumulation
+ for(Index j1=0; j1<actualBlockSize; ++j1)
+ {
+ ResScalar* r = &res(i, j + j1);
+ for(Index i1=UpLo==Lower ? j1 : 0;
+ UpLo==Lower ? i1<actualBlockSize : i1<=j1; ++i1)
+ r[i1] += buffer(i1,j1);
+ }
+ }
+
+ if(UpLo==Lower)
+ {
+ Index i = j+actualBlockSize;
+ gebp_kernel(res.getSubMapper(i, j), blockA+depth*i, actual_b, size-i,
+ depth, actualBlockSize, alpha, -1, -1, 0, 0);
+ }
+ }
+ }
+};
+
+} // end namespace internal
+
+// high level API
+
+template<typename MatrixType, typename ProductType, int UpLo, bool IsOuterProduct>
+struct general_product_to_triangular_selector;
+
+
+template<typename MatrixType, typename ProductType, int UpLo>
+struct general_product_to_triangular_selector<MatrixType,ProductType,UpLo,true>
+{
+ static void run(MatrixType& mat, const ProductType& prod, const typename MatrixType::Scalar& alpha)
+ {
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+
+ typedef typename internal::remove_all<typename ProductType::LhsNested>::type Lhs;
+ typedef internal::blas_traits<Lhs> LhsBlasTraits;
+ typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhs;
+ typedef typename internal::remove_all<ActualLhs>::type _ActualLhs;
+ typename internal::add_const_on_value_type<ActualLhs>::type actualLhs = LhsBlasTraits::extract(prod.lhs());
+
+ typedef typename internal::remove_all<typename ProductType::RhsNested>::type Rhs;
+ typedef internal::blas_traits<Rhs> RhsBlasTraits;
+ typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhs;
+ typedef typename internal::remove_all<ActualRhs>::type _ActualRhs;
+ typename internal::add_const_on_value_type<ActualRhs>::type actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs().derived()) * RhsBlasTraits::extractScalarFactor(prod.rhs().derived());
+
+ enum {
+ StorageOrder = (internal::traits<MatrixType>::Flags&RowMajorBit) ? RowMajor : ColMajor,
+ UseLhsDirectly = _ActualLhs::InnerStrideAtCompileTime==1,
+ UseRhsDirectly = _ActualRhs::InnerStrideAtCompileTime==1
+ };
+
+ internal::gemv_static_vector_if<Scalar,Lhs::SizeAtCompileTime,Lhs::MaxSizeAtCompileTime,!UseLhsDirectly> static_lhs;
+ ei_declare_aligned_stack_constructed_variable(Scalar, actualLhsPtr, actualLhs.size(),
+ (UseLhsDirectly ? const_cast<Scalar*>(actualLhs.data()) : static_lhs.data()));
+ if(!UseLhsDirectly) Map<typename _ActualLhs::PlainObject>(actualLhsPtr, actualLhs.size()) = actualLhs;
+
+ internal::gemv_static_vector_if<Scalar,Rhs::SizeAtCompileTime,Rhs::MaxSizeAtCompileTime,!UseRhsDirectly> static_rhs;
+ ei_declare_aligned_stack_constructed_variable(Scalar, actualRhsPtr, actualRhs.size(),
+ (UseRhsDirectly ? const_cast<Scalar*>(actualRhs.data()) : static_rhs.data()));
+ if(!UseRhsDirectly) Map<typename _ActualRhs::PlainObject>(actualRhsPtr, actualRhs.size()) = actualRhs;
+
+
+ selfadjoint_rank1_update<Scalar,Index,StorageOrder,UpLo,
+ LhsBlasTraits::NeedToConjugate && NumTraits<Scalar>::IsComplex,
+ RhsBlasTraits::NeedToConjugate && NumTraits<Scalar>::IsComplex>
+ ::run(actualLhs.size(), mat.data(), mat.outerStride(), actualLhsPtr, actualRhsPtr, actualAlpha);
+ }
+};
+
+template<typename MatrixType, typename ProductType, int UpLo>
+struct general_product_to_triangular_selector<MatrixType,ProductType,UpLo,false>
+{
+ static void run(MatrixType& mat, const ProductType& prod, const typename MatrixType::Scalar& alpha)
+ {
+ typedef typename MatrixType::Index Index;
+
+ typedef typename internal::remove_all<typename ProductType::LhsNested>::type Lhs;
+ typedef internal::blas_traits<Lhs> LhsBlasTraits;
+ typedef typename LhsBlasTraits::DirectLinearAccessType ActualLhs;
+ typedef typename internal::remove_all<ActualLhs>::type _ActualLhs;
+ typename internal::add_const_on_value_type<ActualLhs>::type actualLhs = LhsBlasTraits::extract(prod.lhs());
+
+ typedef typename internal::remove_all<typename ProductType::RhsNested>::type Rhs;
+ typedef internal::blas_traits<Rhs> RhsBlasTraits;
+ typedef typename RhsBlasTraits::DirectLinearAccessType ActualRhs;
+ typedef typename internal::remove_all<ActualRhs>::type _ActualRhs;
+ typename internal::add_const_on_value_type<ActualRhs>::type actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ typename ProductType::Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs().derived()) * RhsBlasTraits::extractScalarFactor(prod.rhs().derived());
+
+ internal::general_matrix_matrix_triangular_product<Index,
+ typename Lhs::Scalar, _ActualLhs::Flags&RowMajorBit ? RowMajor : ColMajor, LhsBlasTraits::NeedToConjugate,
+ typename Rhs::Scalar, _ActualRhs::Flags&RowMajorBit ? RowMajor : ColMajor, RhsBlasTraits::NeedToConjugate,
+ MatrixType::Flags&RowMajorBit ? RowMajor : ColMajor, UpLo>
+ ::run(mat.cols(), actualLhs.cols(),
+ &actualLhs.coeffRef(0,0), actualLhs.outerStride(), &actualRhs.coeffRef(0,0), actualRhs.outerStride(),
+ mat.data(), mat.outerStride(), actualAlpha);
+ }
+};
+
+template<typename MatrixType, unsigned int UpLo>
+template<typename ProductDerived, typename _Lhs, typename _Rhs>
+TriangularView<MatrixType,UpLo>& TriangularView<MatrixType,UpLo>::assignProduct(const ProductBase<ProductDerived, _Lhs,_Rhs>& prod, const Scalar& alpha)
+{
+ eigen_assert(m_matrix.rows() == prod.rows() && m_matrix.cols() == prod.cols());
+
+ general_product_to_triangular_selector<MatrixType, ProductDerived, UpLo, (_Lhs::ColsAtCompileTime==1) || (_Rhs::RowsAtCompileTime==1)>::run(m_matrix.const_cast_derived(), prod.derived(), alpha);
+
+ return *this;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_H
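
From user code, these triangular GEMM kernels are typically reached through rank updates on a selfadjoint view. A minimal sketch of such a call (standard dense-module usage; sizes are arbitrary):

```cpp
#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 3);
  Eigen::MatrixXd C = Eigen::MatrixXd::Zero(6, 6);

  // C_lower += 2 * A * A^T, writing only the lower triangle; this is the kind
  // of update the triangular product kernels above are built for.
  C.selfadjointView<Eigen::Lower>().rankUpdate(A, 2.0);

  std::cout << C << std::endl;  // entries strictly above the diagonal stay zero
  return 0;
}
```
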
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h
new file mode 100644
index 0000000000..3deed068e3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h
@@ -0,0 +1,146 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Level 3 BLAS SYRK/HERK implementation.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_MKL_H
+#define EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+template <typename Index, typename Scalar, int AStorageOrder, bool ConjugateA, int ResStorageOrder, int UpLo>
+struct general_matrix_matrix_rankupdate :
+ general_matrix_matrix_triangular_product<
+ Index,Scalar,AStorageOrder,ConjugateA,Scalar,AStorageOrder,ConjugateA,ResStorageOrder,UpLo,BuiltIn> {};
+
+
+// try to go to BLAS specialization
+#define EIGEN_MKL_RANKUPDATE_SPECIALIZE(Scalar) \
+template <typename Index, int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs, int UpLo> \
+struct general_matrix_matrix_triangular_product<Index,Scalar,LhsStorageOrder,ConjugateLhs, \
+ Scalar,RhsStorageOrder,ConjugateRhs,ColMajor,UpLo,Specialized> { \
+ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const Scalar* lhs, Index lhsStride, \
+ const Scalar* rhs, Index rhsStride, Scalar* res, Index resStride, Scalar alpha) \
+ { \
+ if (lhs==rhs) { \
+ general_matrix_matrix_rankupdate<Index,Scalar,LhsStorageOrder,ConjugateLhs,ColMajor,UpLo> \
+ ::run(size,depth,lhs,lhsStride,rhs,rhsStride,res,resStride,alpha); \
+ } else { \
+ general_matrix_matrix_triangular_product<Index, \
+ Scalar, LhsStorageOrder, ConjugateLhs, \
+ Scalar, RhsStorageOrder, ConjugateRhs, \
+ ColMajor, UpLo, BuiltIn> \
+ ::run(size,depth,lhs,lhsStride,rhs,rhsStride,res,resStride,alpha); \
+ } \
+ } \
+};
+
+EIGEN_MKL_RANKUPDATE_SPECIALIZE(double)
+//EIGEN_MKL_RANKUPDATE_SPECIALIZE(dcomplex)
+EIGEN_MKL_RANKUPDATE_SPECIALIZE(float)
+//EIGEN_MKL_RANKUPDATE_SPECIALIZE(scomplex)
+
+// SYRK for float/double
+#define EIGEN_MKL_RANKUPDATE_R(EIGTYPE, MKLTYPE, MKLFUNC) \
+template <typename Index, int AStorageOrder, bool ConjugateA, int UpLo> \
+struct general_matrix_matrix_rankupdate<Index,EIGTYPE,AStorageOrder,ConjugateA,ColMajor,UpLo> { \
+ enum { \
+ IsLower = (UpLo&Lower) == Lower, \
+ LowUp = IsLower ? Lower : Upper, \
+ conjA = ((AStorageOrder==ColMajor) && ConjugateA) ? 1 : 0 \
+ }; \
+ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const EIGTYPE* lhs, Index lhsStride, \
+ const EIGTYPE* rhs, Index rhsStride, EIGTYPE* res, Index resStride, EIGTYPE alpha) \
+ { \
+ /* typedef Matrix<EIGTYPE, Dynamic, Dynamic, RhsStorageOrder> MatrixRhs;*/ \
+\
+ MKL_INT lda=lhsStride, ldc=resStride, n=size, k=depth; \
+ char uplo=(IsLower) ? 'L' : 'U', trans=(AStorageOrder==RowMajor) ? 'T':'N'; \
+ MKLTYPE alpha_, beta_; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(beta_, EIGTYPE(1)); \
+ MKLFUNC(&uplo, &trans, &n, &k, &alpha_, lhs, &lda, &beta_, res, &ldc); \
+ } \
+};
+
+// HERK for complex data
+#define EIGEN_MKL_RANKUPDATE_C(EIGTYPE, MKLTYPE, RTYPE, MKLFUNC) \
+template <typename Index, int AStorageOrder, bool ConjugateA, int UpLo> \
+struct general_matrix_matrix_rankupdate<Index,EIGTYPE,AStorageOrder,ConjugateA,ColMajor,UpLo> { \
+ enum { \
+ IsLower = (UpLo&Lower) == Lower, \
+ LowUp = IsLower ? Lower : Upper, \
+ conjA = (((AStorageOrder==ColMajor) && ConjugateA) || ((AStorageOrder==RowMajor) && !ConjugateA)) ? 1 : 0 \
+ }; \
+ static EIGEN_STRONG_INLINE void run(Index size, Index depth,const EIGTYPE* lhs, Index lhsStride, \
+ const EIGTYPE* rhs, Index rhsStride, EIGTYPE* res, Index resStride, EIGTYPE alpha) \
+ { \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, AStorageOrder> MatrixType; \
+\
+ MKL_INT lda=lhsStride, ldc=resStride, n=size, k=depth; \
+ char uplo=(IsLower) ? 'L' : 'U', trans=(AStorageOrder==RowMajor) ? 'C':'N'; \
+ RTYPE alpha_, beta_; \
+ const EIGTYPE* a_ptr; \
+\
+/* Set alpha_ & beta_ */ \
+/* assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); */\
+/* assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(beta_, EIGTYPE(1));*/ \
+ alpha_ = alpha.real(); \
+ beta_ = 1.0; \
+/* Copy with conjugation in some cases*/ \
+ MatrixType a; \
+ if (conjA) { \
+ Map<const MatrixType, 0, OuterStride<> > mapA(lhs,n,k,OuterStride<>(lhsStride)); \
+ a = mapA.conjugate(); \
+ lda = a.outerStride(); \
+ a_ptr = a.data(); \
+ } else a_ptr=lhs; \
+ MKLFUNC(&uplo, &trans, &n, &k, &alpha_, (MKLTYPE*)a_ptr, &lda, &beta_, (MKLTYPE*)res, &ldc); \
+ } \
+};
+
+
+EIGEN_MKL_RANKUPDATE_R(double, double, dsyrk)
+EIGEN_MKL_RANKUPDATE_R(float, float, ssyrk)
+
+//EIGEN_MKL_RANKUPDATE_C(dcomplex, MKL_Complex16, double, zherk)
+//EIGEN_MKL_RANKUPDATE_C(scomplex, MKL_Complex8, double, cherk)
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_MATRIX_TRIANGULAR_MKL_H
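
The real-valued bindings above forward to SYRK. For reference, the equivalent operation expressed through a CBLAS-style interface looks as follows; availability of cblas.h (MKL, OpenBLAS, ...) is assumed and the sizes are illustrative:

```cpp
#include <vector>
#include <cblas.h>

int main() {
  const int n = 4, k = 3;
  std::vector<double> A(n * k, 1.0);  // n x k, column-major
  std::vector<double> C(n * n, 0.0);  // n x n, only the lower triangle is written

  // C := 1.0*A*A^T + 1.0*C on the lower triangle -- the operation the
  // EIGEN_MKL_RANKUPDATE_R(double, double, dsyrk) binding dispatches to.
  cblas_dsyrk(CblasColMajor, CblasLower, CblasNoTrans,
              n, k, /*alpha=*/1.0, A.data(), /*lda=*/n,
              /*beta=*/1.0, C.data(), /*ldc=*/n);
  return 0;
}
```
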
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h
new file mode 100644
index 0000000000..060af328eb
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h
@@ -0,0 +1,118 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * General matrix-matrix product functionality based on ?GEMM.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_GENERAL_MATRIX_MATRIX_MKL_H
+#define EIGEN_GENERAL_MATRIX_MATRIX_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**********************************************************************
+* This file implements general matrix-matrix multiplication using BLAS
+* gemm function via partial specialization of
+* general_matrix_matrix_product::run(..) method for float, double,
+* std::complex<float> and std::complex<double> types
+**********************************************************************/
+
+// gemm specialization
+
+#define GEMM_SPECIALIZATION(EIGTYPE, EIGPREFIX, MKLTYPE, MKLPREFIX) \
+template< \
+ typename Index, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct general_matrix_matrix_product<Index,EIGTYPE,LhsStorageOrder,ConjugateLhs,EIGTYPE,RhsStorageOrder,ConjugateRhs,ColMajor> \
+{ \
+static void run(Index rows, Index cols, Index depth, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha, \
+ level3_blocking<EIGTYPE, EIGTYPE>& /*blocking*/, \
+ GemmParallelInfo<Index>* /*info = 0*/) \
+{ \
+ using std::conj; \
+\
+ char transa, transb; \
+ MKL_INT m, n, k, lda, ldb, ldc; \
+ const EIGTYPE *a, *b; \
+ MKLTYPE alpha_, beta_; \
+ MatrixX##EIGPREFIX a_tmp, b_tmp; \
+ EIGTYPE myone(1);\
+\
+/* Set transpose options */ \
+ transa = (LhsStorageOrder==RowMajor) ? ((ConjugateLhs) ? 'C' : 'T') : 'N'; \
+ transb = (RhsStorageOrder==RowMajor) ? ((ConjugateRhs) ? 'C' : 'T') : 'N'; \
+\
+/* Set m, n, k */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)cols; \
+ k = (MKL_INT)depth; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+\
+/* Set lda, ldb, ldc */ \
+ lda = (MKL_INT)lhsStride; \
+ ldb = (MKL_INT)rhsStride; \
+ ldc = (MKL_INT)resStride; \
+\
+/* Set a, b, c */ \
+ if ((LhsStorageOrder==ColMajor) && (ConjugateLhs)) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > lhs(_lhs,m,k,OuterStride<>(lhsStride)); \
+ a_tmp = lhs.conjugate(); \
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else a = _lhs; \
+\
+ if ((RhsStorageOrder==ColMajor) && (ConjugateRhs)) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > rhs(_rhs,k,n,OuterStride<>(rhsStride)); \
+ b_tmp = rhs.conjugate(); \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+ } else b = _rhs; \
+\
+ MKLPREFIX##gemm(&transa, &transb, &m, &n, &k, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)b, &ldb, &beta_, (MKLTYPE*)res, &ldc); \
+}};
+
+GEMM_SPECIALIZATION(double, d, double, d)
+GEMM_SPECIALIZATION(float, f, float, s)
+GEMM_SPECIALIZATION(dcomplex, cd, MKL_Complex16, z)
+GEMM_SPECIALIZATION(scomplex, cf, MKL_Complex8, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_MATRIX_MKL_H
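
Likewise, the GEMM_SPECIALIZATION bindings forward to ?gemm. The double-precision case corresponds to a call of the following shape through a CBLAS-style interface (cblas.h assumed; sizes illustrative):

```cpp
#include <vector>
#include <cblas.h>

int main() {
  const int m = 4, n = 5, k = 3;
  std::vector<double> A(m * k, 1.0);  // m x k, column-major
  std::vector<double> B(k * n, 2.0);  // k x n, column-major
  std::vector<double> C(m * n, 0.0);  // m x n, column-major

  // C := alpha*A*B + beta*C -- what the dgemm binding above reduces to when
  // neither operand needs transposition or conjugation.
  cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
              m, n, k, /*alpha=*/1.0, A.data(), /*lda=*/m,
              B.data(), /*ldb=*/k, /*beta=*/1.0, C.data(), /*ldc=*/m);
  return 0;
}
```
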
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector.h
new file mode 100644
index 0000000000..cb67d5d0a9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector.h
@@ -0,0 +1,618 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERAL_MATRIX_VECTOR_H
+#define EIGEN_GENERAL_MATRIX_VECTOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+/* Optimized col-major matrix * vector product:
+ * This algorithm processes 4 columns at once, which allows us to both reduce
+ * the number of load/stores of the result by a factor of 4 and to reduce
+ * the instruction dependency. Moreover, we know that all bands have the
+ * same alignment pattern.
+ *
+ * Mixing type logic: C += alpha * A * B
+ * | A | B |alpha| comments
+ * |real |cplx |cplx | no vectorization
+ * |real |cplx |real | alpha is converted to a cplx when calling the run function, no vectorization
+ * |cplx |real |cplx | invalid, the caller has to do tmp: = A * B; C += alpha*tmp
+ * |cplx |real |real | optimal case, vectorization possible via real-cplx mul
+ *
+ * Accesses to the matrix coefficients follow the following logic:
+ *
+ * - if all columns have the same alignment then
+ * - if the columns have the same alignment as the result vector, then easy! (-> AllAligned case)
+ * - otherwise perform unaligned loads only (-> NoneAligned case)
+ * - otherwise
+ * - if even columns have the same alignment then
+ * // odd columns are guaranteed to have the same alignment too
+ * - if even or odd columns have the same alignment as the result, then
+ *       // for a register size of 2 scalars, this is guaranteed to be the case (e.g., SSE with double)
+ * - perform half aligned and half unaligned loads (-> EvenAligned case)
+ * - otherwise perform unaligned loads only (-> NoneAligned case)
+ * - otherwise, if the register size is 4 scalars (e.g., SSE with float) then
+ *   - one out of every 4 consecutive columns is guaranteed to be aligned with the result vector,
+ *     perform simple aligned loads for this column and aligned loads plus re-alignment for the others. (-> FirstAligned case)
+ * // this re-alignment is done by the palign function implemented for SSE in Eigen/src/Core/arch/SSE/PacketMath.h
+ * - otherwise,
+ * // if we get here, this means the register size is greater than 4 (e.g., AVX with floats),
+ * // we currently fall back to the NoneAligned case
+ *
+ * The same reasoning applies to the transposed case.
+ *
+ * The last case (PacketSize>4) could probably be improved by generalizing the FirstAligned case, but since we do not support AVX yet...
+ * One might also wonder why in the EvenAligned case we perform unaligned loads instead of using the aligned-loads plus re-alignment
+ * strategy as in the FirstAligned case. The reason is that we observed that unaligned loads on an 8 byte boundary are not too slow
+ * compared to unaligned loads on a 4 byte boundary.
+ *
+ */
+template<typename Index, typename LhsScalar, typename LhsMapper, bool ConjugateLhs, typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version>
+struct general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,ConjugateLhs,RhsScalar,RhsMapper,ConjugateRhs,Version>
+{
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+
+enum {
+ Vectorizable = packet_traits<LhsScalar>::Vectorizable && packet_traits<RhsScalar>::Vectorizable
+ && int(packet_traits<LhsScalar>::size)==int(packet_traits<RhsScalar>::size),
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1
+};
+
+typedef typename packet_traits<LhsScalar>::type _LhsPacket;
+typedef typename packet_traits<RhsScalar>::type _RhsPacket;
+typedef typename packet_traits<ResScalar>::type _ResPacket;
+
+typedef typename conditional<Vectorizable,_LhsPacket,LhsScalar>::type LhsPacket;
+typedef typename conditional<Vectorizable,_RhsPacket,RhsScalar>::type RhsPacket;
+typedef typename conditional<Vectorizable,_ResPacket,ResScalar>::type ResPacket;
+
+EIGEN_DONT_INLINE static void run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ ResScalar* res, Index resIncr,
+ RhsScalar alpha);
+};
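
Before the full implementation below, here is a scalar sketch of the "4 columns at once" strategy from the comment above: y += alpha * A * x for a column-major A, accumulating four columns per pass so each y[i] is loaded and stored once per group of four columns. Vectorization and the alignment cases are exactly what the real run() adds on top; this is illustrative only:

```cpp
// Scalar sketch: y += alpha * A * x, A column-major with leading dimension lhsStride.
void gemv_col4(int rows, int cols, const double* A, int lhsStride,
               const double* x, double* y, double alpha) {
  int j = 0;
  for (; j + 4 <= cols; j += 4) {
    const double c0 = alpha * x[j],     c1 = alpha * x[j + 1],
                 c2 = alpha * x[j + 2], c3 = alpha * x[j + 3];
    const double *a0 = A + (j + 0) * lhsStride, *a1 = A + (j + 1) * lhsStride,
                 *a2 = A + (j + 2) * lhsStride, *a3 = A + (j + 3) * lhsStride;
    for (int i = 0; i < rows; ++i)      // one pass over y for four columns
      y[i] += c0 * a0[i] + c1 * a1[i] + c2 * a2[i] + c3 * a3[i];
  }
  for (; j < cols; ++j) {               // leftover columns, one at a time
    const double c = alpha * x[j];
    const double* a = A + j * lhsStride;
    for (int i = 0; i < rows; ++i) y[i] += c * a[i];
  }
}
```
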
+
+template<typename Index, typename LhsScalar, typename LhsMapper, bool ConjugateLhs, typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,ConjugateLhs,RhsScalar,RhsMapper,ConjugateRhs,Version>::run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ ResScalar* res, Index resIncr,
+ RhsScalar alpha)
+{
+ EIGEN_UNUSED_VARIABLE(resIncr);
+ eigen_internal_assert(resIncr==1);
+ #ifdef _EIGEN_ACCUMULATE_PACKETS
+ #error _EIGEN_ACCUMULATE_PACKETS has already been defined
+ #endif
+ #define _EIGEN_ACCUMULATE_PACKETS(Alignment0,Alignment13,Alignment2) \
+ pstore(&res[j], \
+ padd(pload<ResPacket>(&res[j]), \
+ padd( \
+ padd(pcj.pmul(lhs0.template load<LhsPacket, Alignment0>(j), ptmp0), \
+ pcj.pmul(lhs1.template load<LhsPacket, Alignment13>(j), ptmp1)), \
+ padd(pcj.pmul(lhs2.template load<LhsPacket, Alignment2>(j), ptmp2), \
+ pcj.pmul(lhs3.template load<LhsPacket, Alignment13>(j), ptmp3)) )))
+
+ typedef typename LhsMapper::VectorMapper LhsScalars;
+
+ conj_helper<LhsScalar,RhsScalar,ConjugateLhs,ConjugateRhs> cj;
+ conj_helper<LhsPacket,RhsPacket,ConjugateLhs,ConjugateRhs> pcj;
+ if(ConjugateRhs)
+ alpha = numext::conj(alpha);
+
+ enum { AllAligned = 0, EvenAligned, FirstAligned, NoneAligned };
+ const Index columnsAtOnce = 4;
+ const Index peels = 2;
+ const Index LhsPacketAlignedMask = LhsPacketSize-1;
+ const Index ResPacketAlignedMask = ResPacketSize-1;
+// const Index PeelAlignedMask = ResPacketSize*peels-1;
+ const Index size = rows;
+
+ const Index lhsStride = lhs.stride();
+
+ // How many coeffs of the result do we have to skip to be aligned.
+ // Here we assume data are at least aligned on the base scalar type.
+ Index alignedStart = internal::first_aligned(res,size);
+ Index alignedSize = ResPacketSize>1 ? alignedStart + ((size-alignedStart) & ~ResPacketAlignedMask) : 0;
+ const Index peeledSize = alignedSize - RhsPacketSize*peels - RhsPacketSize + 1;
+
+ const Index alignmentStep = LhsPacketSize>1 ? (LhsPacketSize - lhsStride % LhsPacketSize) & LhsPacketAlignedMask : 0;
+ Index alignmentPattern = alignmentStep==0 ? AllAligned
+ : alignmentStep==(LhsPacketSize/2) ? EvenAligned
+ : FirstAligned;
+
+ // we cannot assume the first element is aligned because of sub-matrices
+ const Index lhsAlignmentOffset = lhs.firstAligned(size);
+
+ // find how many columns we have to skip to be aligned with the result (if possible)
+ Index skipColumns = 0;
+ // if the data cannot be aligned (TODO add some compile time tests when possible, e.g. for floats)
+ if( (lhsAlignmentOffset < 0) || (lhsAlignmentOffset == size) || (size_t(res)%sizeof(ResScalar)) )
+ {
+ alignedSize = 0;
+ alignedStart = 0;
+ alignmentPattern = NoneAligned;
+ }
+ else if(LhsPacketSize > 4)
+ {
+ // TODO: extend the code to support aligned loads whenever possible when LhsPacketSize > 4.
+ // Currently, it seems to be better to perform unaligned loads anyway
+ alignmentPattern = NoneAligned;
+ }
+ else if (LhsPacketSize>1)
+ {
+ // eigen_internal_assert(size_t(firstLhs+lhsAlignmentOffset)%sizeof(LhsPacket)==0 || size<LhsPacketSize);
+
+ while (skipColumns<LhsPacketSize &&
+ alignedStart != ((lhsAlignmentOffset + alignmentStep*skipColumns)%LhsPacketSize))
+ ++skipColumns;
+ if (skipColumns==LhsPacketSize)
+ {
+ // nothing can be aligned, no need to skip any column
+ alignmentPattern = NoneAligned;
+ skipColumns = 0;
+ }
+ else
+ {
+ skipColumns = (std::min)(skipColumns,cols);
+ // note that the skipped columns are processed later.
+ }
+
+ /* eigen_internal_assert( (alignmentPattern==NoneAligned)
+ || (skipColumns + columnsAtOnce >= cols)
+ || LhsPacketSize > size
+ || (size_t(firstLhs+alignedStart+lhsStride*skipColumns)%sizeof(LhsPacket))==0);*/
+ }
+ else if(Vectorizable)
+ {
+ alignedStart = 0;
+ alignedSize = size;
+ alignmentPattern = AllAligned;
+ }
+
+ const Index offset1 = (FirstAligned && alignmentStep==1?3:1);
+ const Index offset3 = (FirstAligned && alignmentStep==1?1:3);
+
+ Index columnBound = ((cols-skipColumns)/columnsAtOnce)*columnsAtOnce + skipColumns;
+ for (Index i=skipColumns; i<columnBound; i+=columnsAtOnce)
+ {
+ RhsPacket ptmp0 = pset1<RhsPacket>(alpha*rhs(i, 0)),
+ ptmp1 = pset1<RhsPacket>(alpha*rhs(i+offset1, 0)),
+ ptmp2 = pset1<RhsPacket>(alpha*rhs(i+2, 0)),
+ ptmp3 = pset1<RhsPacket>(alpha*rhs(i+offset3, 0));
+
+ // this helps the compiler generate better binary code
+ const LhsScalars lhs0 = lhs.getVectorMapper(0, i+0), lhs1 = lhs.getVectorMapper(0, i+offset1),
+ lhs2 = lhs.getVectorMapper(0, i+2), lhs3 = lhs.getVectorMapper(0, i+offset3);
+
+ if (Vectorizable)
+ {
+ /* explicit vectorization */
+ // process initial unaligned coeffs
+ for (Index j=0; j<alignedStart; ++j)
+ {
+ res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
+ res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
+ res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
+ res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
+ }
+
+ if (alignedSize>alignedStart)
+ {
+ switch(alignmentPattern)
+ {
+ case AllAligned:
+ for (Index j = alignedStart; j<alignedSize; j+=ResPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Aligned,Aligned);
+ break;
+ case EvenAligned:
+ for (Index j = alignedStart; j<alignedSize; j+=ResPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Unaligned,Aligned);
+ break;
+ case FirstAligned:
+ {
+ Index j = alignedStart;
+ if(peels>1)
+ {
+ LhsPacket A00, A01, A02, A03, A10, A11, A12, A13;
+ ResPacket T0, T1;
+
+ A01 = lhs1.template load<LhsPacket, Aligned>(alignedStart-1);
+ A02 = lhs2.template load<LhsPacket, Aligned>(alignedStart-2);
+ A03 = lhs3.template load<LhsPacket, Aligned>(alignedStart-3);
+
+ for (; j<peeledSize; j+=peels*ResPacketSize)
+ {
+ A11 = lhs1.template load<LhsPacket, Aligned>(j-1+LhsPacketSize); palign<1>(A01,A11);
+ A12 = lhs2.template load<LhsPacket, Aligned>(j-2+LhsPacketSize); palign<2>(A02,A12);
+ A13 = lhs3.template load<LhsPacket, Aligned>(j-3+LhsPacketSize); palign<3>(A03,A13);
+
+ A00 = lhs0.template load<LhsPacket, Aligned>(j);
+ A10 = lhs0.template load<LhsPacket, Aligned>(j+LhsPacketSize);
+ T0 = pcj.pmadd(A00, ptmp0, pload<ResPacket>(&res[j]));
+ T1 = pcj.pmadd(A10, ptmp0, pload<ResPacket>(&res[j+ResPacketSize]));
+
+ T0 = pcj.pmadd(A01, ptmp1, T0);
+ A01 = lhs1.template load<LhsPacket, Aligned>(j-1+2*LhsPacketSize); palign<1>(A11,A01);
+ T0 = pcj.pmadd(A02, ptmp2, T0);
+ A02 = lhs2.template load<LhsPacket, Aligned>(j-2+2*LhsPacketSize); palign<2>(A12,A02);
+ T0 = pcj.pmadd(A03, ptmp3, T0);
+ pstore(&res[j],T0);
+ A03 = lhs3.template load<LhsPacket, Aligned>(j-3+2*LhsPacketSize); palign<3>(A13,A03);
+ T1 = pcj.pmadd(A11, ptmp1, T1);
+ T1 = pcj.pmadd(A12, ptmp2, T1);
+ T1 = pcj.pmadd(A13, ptmp3, T1);
+ pstore(&res[j+ResPacketSize],T1);
+ }
+ }
+ for (; j<alignedSize; j+=ResPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Unaligned,Unaligned);
+ break;
+ }
+ default:
+ for (Index j = alignedStart; j<alignedSize; j+=ResPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Unaligned,Unaligned,Unaligned);
+ break;
+ }
+ }
+ } // end explicit vectorization
+
+ /* process remaining coeffs (or all if there is no explicit vectorization) */
+ for (Index j=alignedSize; j<size; ++j)
+ {
+ res[j] = cj.pmadd(lhs0(j), pfirst(ptmp0), res[j]);
+ res[j] = cj.pmadd(lhs1(j), pfirst(ptmp1), res[j]);
+ res[j] = cj.pmadd(lhs2(j), pfirst(ptmp2), res[j]);
+ res[j] = cj.pmadd(lhs3(j), pfirst(ptmp3), res[j]);
+ }
+ }
+
+ // process remaining first and last columns (at most columnsAtOnce-1)
+ Index end = cols;
+ Index start = columnBound;
+ do
+ {
+ for (Index k=start; k<end; ++k)
+ {
+ RhsPacket ptmp0 = pset1<RhsPacket>(alpha*rhs(k, 0));
+ const LhsScalars lhs0 = lhs.getVectorMapper(0, k);
+
+ if (Vectorizable)
+ {
+ /* explicit vectorization */
+ // process first unaligned result's coeffs
+ for (Index j=0; j<alignedStart; ++j)
+ res[j] += cj.pmul(lhs0(j), pfirst(ptmp0));
+ // process aligned result's coeffs
+ if (lhs0.template aligned<LhsPacket>(alignedStart))
+ for (Index i = alignedStart;i<alignedSize;i+=ResPacketSize)
+ pstore(&res[i], pcj.pmadd(lhs0.template load<LhsPacket, Aligned>(i), ptmp0, pload<ResPacket>(&res[i])));
+ else
+ for (Index i = alignedStart;i<alignedSize;i+=ResPacketSize)
+ pstore(&res[i], pcj.pmadd(lhs0.template load<LhsPacket, Unaligned>(i), ptmp0, pload<ResPacket>(&res[i])));
+ }
+
+ // process remaining scalars (or all if no explicit vectorization)
+ for (Index i=alignedSize; i<size; ++i)
+ res[i] += cj.pmul(lhs0(i), pfirst(ptmp0));
+ }
+ if (skipColumns)
+ {
+ start = 0;
+ end = skipColumns;
+ skipColumns = 0;
+ }
+ else
+ break;
+ } while(Vectorizable);
+ #undef _EIGEN_ACCUMULATE_PACKETS
+}
+
+/* Optimized row-major matrix * vector product:
+ * This algorithm processes 4 rows at once, which both reduces the
+ * number of load/stores of the result by a factor of 4 and reduces
+ * the instruction dependency. Moreover, we know that all bands have the
+ * same alignment pattern.
+ *
+ * Mixing type logic:
+ * - alpha is always a complex (or converted to a complex)
+ * - no vectorization
+ */
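+// In scalar terms, and ignoring the offset1/offset3 row reordering and conjugation, each
+// iteration of the main loop below is just four simultaneous dot products (a sketch):
+//   tmp0 += lhs(i+0,j) * rhs(j);  tmp1 += lhs(i+1,j) * rhs(j);
+//   tmp2 += lhs(i+2,j) * rhs(j);  tmp3 += lhs(i+3,j) * rhs(j);
+//   ...
+//   res[i+0] += alpha*tmp0;  res[i+1] += alpha*tmp1;  // etc.
+// The vectorized path accumulates ptmp0..ptmp3 packet-wise and reduces each of them with
+// predux, which sums the lanes of a packet (e.g. predux({a,b,c,d}) == a+b+c+d).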
+template<typename Index, typename LhsScalar, typename LhsMapper, bool ConjugateLhs, typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version>
+struct general_matrix_vector_product<Index,LhsScalar,LhsMapper,RowMajor,ConjugateLhs,RhsScalar,RhsMapper,ConjugateRhs,Version>
+{
+typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+
+enum {
+ Vectorizable = packet_traits<LhsScalar>::Vectorizable && packet_traits<RhsScalar>::Vectorizable
+ && int(packet_traits<LhsScalar>::size)==int(packet_traits<RhsScalar>::size),
+ LhsPacketSize = Vectorizable ? packet_traits<LhsScalar>::size : 1,
+ RhsPacketSize = Vectorizable ? packet_traits<RhsScalar>::size : 1,
+ ResPacketSize = Vectorizable ? packet_traits<ResScalar>::size : 1
+};
+
+typedef typename packet_traits<LhsScalar>::type _LhsPacket;
+typedef typename packet_traits<RhsScalar>::type _RhsPacket;
+typedef typename packet_traits<ResScalar>::type _ResPacket;
+
+typedef typename conditional<Vectorizable,_LhsPacket,LhsScalar>::type LhsPacket;
+typedef typename conditional<Vectorizable,_RhsPacket,RhsScalar>::type RhsPacket;
+typedef typename conditional<Vectorizable,_ResPacket,ResScalar>::type ResPacket;
+
+EIGEN_DONT_INLINE static void run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ ResScalar* res, Index resIncr,
+ ResScalar alpha);
+};
+
+template<typename Index, typename LhsScalar, typename LhsMapper, bool ConjugateLhs, typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void general_matrix_vector_product<Index,LhsScalar,LhsMapper,RowMajor,ConjugateLhs,RhsScalar,RhsMapper,ConjugateRhs,Version>::run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ ResScalar* res, Index resIncr,
+ ResScalar alpha)
+{
+ eigen_internal_assert(rhs.stride()==1);
+
+ #ifdef _EIGEN_ACCUMULATE_PACKETS
+ #error _EIGEN_ACCUMULATE_PACKETS has already been defined
+ #endif
+
+ #define _EIGEN_ACCUMULATE_PACKETS(Alignment0,Alignment13,Alignment2) {\
+ RhsPacket b = rhs.getVectorMapper(j, 0).template load<RhsPacket, Aligned>(0); \
+ ptmp0 = pcj.pmadd(lhs0.template load<LhsPacket, Alignment0>(j), b, ptmp0); \
+ ptmp1 = pcj.pmadd(lhs1.template load<LhsPacket, Alignment13>(j), b, ptmp1); \
+ ptmp2 = pcj.pmadd(lhs2.template load<LhsPacket, Alignment2>(j), b, ptmp2); \
+ ptmp3 = pcj.pmadd(lhs3.template load<LhsPacket, Alignment13>(j), b, ptmp3); }
+
+ conj_helper<LhsScalar,RhsScalar,ConjugateLhs,ConjugateRhs> cj;
+ conj_helper<LhsPacket,RhsPacket,ConjugateLhs,ConjugateRhs> pcj;
+
+ typedef typename LhsMapper::VectorMapper LhsScalars;
+
+ enum { AllAligned=0, EvenAligned=1, FirstAligned=2, NoneAligned=3 };
+ const Index rowsAtOnce = 4;
+ const Index peels = 2;
+ const Index RhsPacketAlignedMask = RhsPacketSize-1;
+ const Index LhsPacketAlignedMask = LhsPacketSize-1;
+ const Index depth = cols;
+ const Index lhsStride = lhs.stride();
+
+ // How many coeffs of the result do we have to skip to be aligned.
+ // Here we assume data are at least aligned on the base scalar type
+ // if that's not the case then vectorization is discarded, see below.
+ Index alignedStart = rhs.firstAligned(depth);
+ Index alignedSize = RhsPacketSize>1 ? alignedStart + ((depth-alignedStart) & ~RhsPacketAlignedMask) : 0;
+ const Index peeledSize = alignedSize - RhsPacketSize*peels - RhsPacketSize + 1;
+
+ const Index alignmentStep = LhsPacketSize>1 ? (LhsPacketSize - lhsStride % LhsPacketSize) & LhsPacketAlignedMask : 0;
+ Index alignmentPattern = alignmentStep==0 ? AllAligned
+ : alignmentStep==(LhsPacketSize/2) ? EvenAligned
+ : FirstAligned;
+
+ // we cannot assume the first element is aligned because of sub-matrices
+ const Index lhsAlignmentOffset = lhs.firstAligned(depth);
+ const Index rhsAlignmentOffset = rhs.firstAligned(rows);
+
+ // find how many rows we have to skip to be aligned with rhs (if possible)
+ Index skipRows = 0;
+ // if the data cannot be aligned (TODO add some compile time tests when possible, e.g. for floats)
+ if( (sizeof(LhsScalar)!=sizeof(RhsScalar))
+ || (lhsAlignmentOffset < 0) || (lhsAlignmentOffset == depth)
+ || (rhsAlignmentOffset < 0) || (rhsAlignmentOffset == rows))
+ {
+ alignedSize = 0;
+ alignedStart = 0;
+ alignmentPattern = NoneAligned;
+ }
+ else if(LhsPacketSize > 4)
+ {
+ // TODO: extend the code to support aligned loads whenever possible when LhsPacketSize > 4.
+ alignmentPattern = NoneAligned;
+ }
+ else if (LhsPacketSize>1)
+ {
+ // eigen_internal_assert(size_t(firstLhs+lhsAlignmentOffset)%sizeof(LhsPacket)==0 || depth<LhsPacketSize);
+
+ while (skipRows<LhsPacketSize &&
+ alignedStart != ((lhsAlignmentOffset + alignmentStep*skipRows)%LhsPacketSize))
+ ++skipRows;
+ if (skipRows==LhsPacketSize)
+ {
+ // nothing can be aligned, no need to skip any row
+ alignmentPattern = NoneAligned;
+ skipRows = 0;
+ }
+ else
+ {
+ skipRows = (std::min)(skipRows,Index(rows));
+ // note that the skipped rows are processed later.
+ }
+ /* eigen_internal_assert( alignmentPattern==NoneAligned
+ || LhsPacketSize==1
+ || (skipRows + rowsAtOnce >= rows)
+ || LhsPacketSize > depth
+ || (size_t(firstLhs+alignedStart+lhsStride*skipRows)%sizeof(LhsPacket))==0);*/
+ }
+ else if(Vectorizable)
+ {
+ alignedStart = 0;
+ alignedSize = depth;
+ alignmentPattern = AllAligned;
+ }
+
+ const Index offset1 = (FirstAligned && alignmentStep==1?3:1);
+ const Index offset3 = (FirstAligned && alignmentStep==1?1:3);
+
+ Index rowBound = ((rows-skipRows)/rowsAtOnce)*rowsAtOnce + skipRows;
+ for (Index i=skipRows; i<rowBound; i+=rowsAtOnce)
+ {
+ EIGEN_ALIGN_DEFAULT ResScalar tmp0 = ResScalar(0);
+ ResScalar tmp1 = ResScalar(0), tmp2 = ResScalar(0), tmp3 = ResScalar(0);
+
+ // this helps the compiler generate good binary code
+ const LhsScalars lhs0 = lhs.getVectorMapper(i+0, 0), lhs1 = lhs.getVectorMapper(i+offset1, 0),
+ lhs2 = lhs.getVectorMapper(i+2, 0), lhs3 = lhs.getVectorMapper(i+offset3, 0);
+
+ if (Vectorizable)
+ {
+ /* explicit vectorization */
+ ResPacket ptmp0 = pset1<ResPacket>(ResScalar(0)), ptmp1 = pset1<ResPacket>(ResScalar(0)),
+ ptmp2 = pset1<ResPacket>(ResScalar(0)), ptmp3 = pset1<ResPacket>(ResScalar(0));
+
+ // process initial unaligned coeffs
+ // FIXME this loop gets vectorized by the compiler!
+ for (Index j=0; j<alignedStart; ++j)
+ {
+ RhsScalar b = rhs(j, 0);
+ tmp0 += cj.pmul(lhs0(j),b); tmp1 += cj.pmul(lhs1(j),b);
+ tmp2 += cj.pmul(lhs2(j),b); tmp3 += cj.pmul(lhs3(j),b);
+ }
+
+ if (alignedSize>alignedStart)
+ {
+ switch(alignmentPattern)
+ {
+ case AllAligned:
+ for (Index j = alignedStart; j<alignedSize; j+=RhsPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Aligned,Aligned);
+ break;
+ case EvenAligned:
+ for (Index j = alignedStart; j<alignedSize; j+=RhsPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Unaligned,Aligned);
+ break;
+ case FirstAligned:
+ {
+ Index j = alignedStart;
+ if (peels>1)
+ {
+ /* Here we process 4 rows with two peeled iterations to hide
+ * the overhead of unaligned loads. Moreover, unaligned loads are handled
+ * using special shift/move operations between the two aligned packets
+ * overlapping the desired unaligned packet. This is *much* more efficient
+ * than basic unaligned loads.
+ */
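+ // For illustration: palign<Offset>(A, B) overwrites A with the packet starting
+ // Offset scalars into the concatenation [A, B]. With 4-lane packets, e.g.
+ // A = {a0,a1,a2,a3} and B = {b0,b1,b2,b3}, palign<1>(A, B) leaves A == {a1,a2,a3,b0}.
+ // This is how the unaligned packets of lhs1..lhs3 are rebuilt from two aligned loads
+ // in the peeled loop below.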
+ LhsPacket A01, A02, A03, A11, A12, A13;
+ A01 = lhs1.template load<LhsPacket, Aligned>(alignedStart-1);
+ A02 = lhs2.template load<LhsPacket, Aligned>(alignedStart-2);
+ A03 = lhs3.template load<LhsPacket, Aligned>(alignedStart-3);
+
+ for (; j<peeledSize; j+=peels*RhsPacketSize)
+ {
+ RhsPacket b = rhs.getVectorMapper(j, 0).template load<RhsPacket, Aligned>(0);
+ A11 = lhs1.template load<LhsPacket, Aligned>(j-1+LhsPacketSize); palign<1>(A01,A11);
+ A12 = lhs2.template load<LhsPacket, Aligned>(j-2+LhsPacketSize); palign<2>(A02,A12);
+ A13 = lhs3.template load<LhsPacket, Aligned>(j-3+LhsPacketSize); palign<3>(A03,A13);
+
+ ptmp0 = pcj.pmadd(lhs0.template load<LhsPacket, Aligned>(j), b, ptmp0);
+ ptmp1 = pcj.pmadd(A01, b, ptmp1);
+ A01 = lhs1.template load<LhsPacket, Aligned>(j-1+2*LhsPacketSize); palign<1>(A11,A01);
+ ptmp2 = pcj.pmadd(A02, b, ptmp2);
+ A02 = lhs2.template load<LhsPacket, Aligned>(j-2+2*LhsPacketSize); palign<2>(A12,A02);
+ ptmp3 = pcj.pmadd(A03, b, ptmp3);
+ A03 = lhs3.template load<LhsPacket, Aligned>(j-3+2*LhsPacketSize); palign<3>(A13,A03);
+
+ b = rhs.getVectorMapper(j+RhsPacketSize, 0).template load<RhsPacket, Aligned>(0);
+ ptmp0 = pcj.pmadd(lhs0.template load<LhsPacket, Aligned>(j+LhsPacketSize), b, ptmp0);
+ ptmp1 = pcj.pmadd(A11, b, ptmp1);
+ ptmp2 = pcj.pmadd(A12, b, ptmp2);
+ ptmp3 = pcj.pmadd(A13, b, ptmp3);
+ }
+ }
+ for (; j<alignedSize; j+=RhsPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Aligned,Unaligned,Unaligned);
+ break;
+ }
+ default:
+ for (Index j = alignedStart; j<alignedSize; j+=RhsPacketSize)
+ _EIGEN_ACCUMULATE_PACKETS(Unaligned,Unaligned,Unaligned);
+ break;
+ }
+ tmp0 += predux(ptmp0);
+ tmp1 += predux(ptmp1);
+ tmp2 += predux(ptmp2);
+ tmp3 += predux(ptmp3);
+ }
+ } // end explicit vectorization
+
+ // process remaining coeffs (or all if no explicit vectorization)
+ // FIXME this loop gets vectorized by the compiler!
+ for (Index j=alignedSize; j<depth; ++j)
+ {
+ RhsScalar b = rhs(j, 0);
+ tmp0 += cj.pmul(lhs0(j),b); tmp1 += cj.pmul(lhs1(j),b);
+ tmp2 += cj.pmul(lhs2(j),b); tmp3 += cj.pmul(lhs3(j),b);
+ }
+ res[i*resIncr] += alpha*tmp0;
+ res[(i+offset1)*resIncr] += alpha*tmp1;
+ res[(i+2)*resIncr] += alpha*tmp2;
+ res[(i+offset3)*resIncr] += alpha*tmp3;
+ }
+
+ // process remaining first and last rows (at most rowsAtOnce-1)
+ Index end = rows;
+ Index start = rowBound;
+ do
+ {
+ for (Index i=start; i<end; ++i)
+ {
+ EIGEN_ALIGN_DEFAULT ResScalar tmp0 = ResScalar(0);
+ ResPacket ptmp0 = pset1<ResPacket>(tmp0);
+ const LhsScalars lhs0 = lhs.getVectorMapper(i, 0);
+ // process first unaligned result's coeffs
+ // FIXME this loop gets vectorized by the compiler!
+ for (Index j=0; j<alignedStart; ++j)
+ tmp0 += cj.pmul(lhs0(j), rhs(j, 0));
+
+ if (alignedSize>alignedStart)
+ {
+ // process aligned rhs coeffs
+ if (lhs0.template aligned<LhsPacket>(alignedStart))
+ for (Index j = alignedStart;j<alignedSize;j+=RhsPacketSize)
+ ptmp0 = pcj.pmadd(lhs0.template load<LhsPacket, Aligned>(j), rhs.getVectorMapper(j, 0).template load<RhsPacket, Aligned>(0), ptmp0);
+ else
+ for (Index j = alignedStart;j<alignedSize;j+=RhsPacketSize)
+ ptmp0 = pcj.pmadd(lhs0.template load<LhsPacket, Unaligned>(j), rhs.getVectorMapper(j, 0).template load<RhsPacket, Aligned>(0), ptmp0);
+ tmp0 += predux(ptmp0);
+ }
+
+ // process remaining scalars
+ // FIXME this loop gets vectorized by the compiler!
+ for (Index j=alignedSize; j<depth; ++j)
+ tmp0 += cj.pmul(lhs0(j), rhs(j, 0));
+ res[i*resIncr] += alpha*tmp0;
+ }
+ if (skipRows)
+ {
+ start = 0;
+ end = skipRows;
+ skipRows = 0;
+ }
+ else
+ break;
+ } while(Vectorizable);
+
+ #undef _EIGEN_ACCUMULATE_PACKETS
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_VECTOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector_MKL.h b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector_MKL.h
new file mode 100644
index 0000000000..1cb9fe6b5a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/GeneralMatrixVector_MKL.h
@@ -0,0 +1,131 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * General matrix-vector product functionality based on ?GEMV.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_GENERAL_MATRIX_VECTOR_MKL_H
+#define EIGEN_GENERAL_MATRIX_VECTOR_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**********************************************************************
+* This file implements general matrix-vector multiplication using BLAS
+* gemv function via partial specialization of
+* general_matrix_vector_product::run(..) method for float, double,
+* std::complex<float> and std::complex<double> types
+**********************************************************************/
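+// For reference, the BLAS call issued by the specializations below computes
+//   res := alpha * op(lhs) * rhs + beta * res,   with beta fixed to 1 here,
+// e.g. in double precision the underlying routine has the classic gemv signature
+//   dgemv(&trans, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy);
+// where trans is 'N', 'T' or 'C' depending on the storage order and conjugation flags.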
+
+// gemv specialization
+
+template<typename Index, typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs, typename RhsScalar, bool ConjugateRhs>
+struct general_matrix_vector_product_gemv :
+ general_matrix_vector_product<Index,LhsScalar,LhsStorageOrder,ConjugateLhs,RhsScalar,ConjugateRhs,BuiltIn> {};
+
+#define EIGEN_MKL_GEMV_SPECIALIZE(Scalar) \
+template<typename Index, bool ConjugateLhs, bool ConjugateRhs> \
+struct general_matrix_vector_product<Index,Scalar,ColMajor,ConjugateLhs,Scalar,ConjugateRhs,Specialized> { \
+static void run( \
+ Index rows, Index cols, \
+ const Scalar* lhs, Index lhsStride, \
+ const Scalar* rhs, Index rhsIncr, \
+ Scalar* res, Index resIncr, Scalar alpha) \
+{ \
+ if (ConjugateLhs) { \
+ general_matrix_vector_product<Index,Scalar,ColMajor,ConjugateLhs,Scalar,ConjugateRhs,BuiltIn>::run( \
+ rows, cols, lhs, lhsStride, rhs, rhsIncr, res, resIncr, alpha); \
+ } else { \
+ general_matrix_vector_product_gemv<Index,Scalar,ColMajor,ConjugateLhs,Scalar,ConjugateRhs>::run( \
+ rows, cols, lhs, lhsStride, rhs, rhsIncr, res, resIncr, alpha); \
+ } \
+} \
+}; \
+template<typename Index, bool ConjugateLhs, bool ConjugateRhs> \
+struct general_matrix_vector_product<Index,Scalar,RowMajor,ConjugateLhs,Scalar,ConjugateRhs,Specialized> { \
+static void run( \
+ Index rows, Index cols, \
+ const Scalar* lhs, Index lhsStride, \
+ const Scalar* rhs, Index rhsIncr, \
+ Scalar* res, Index resIncr, Scalar alpha) \
+{ \
+ general_matrix_vector_product_gemv<Index,Scalar,RowMajor,ConjugateLhs,Scalar,ConjugateRhs>::run( \
+ rows, cols, lhs, lhsStride, rhs, rhsIncr, res, resIncr, alpha); \
+} \
+}; \
+
+EIGEN_MKL_GEMV_SPECIALIZE(double)
+EIGEN_MKL_GEMV_SPECIALIZE(float)
+EIGEN_MKL_GEMV_SPECIALIZE(dcomplex)
+EIGEN_MKL_GEMV_SPECIALIZE(scomplex)
+
+#define EIGEN_MKL_GEMV_SPECIALIZATION(EIGTYPE,MKLTYPE,MKLPREFIX) \
+template<typename Index, int LhsStorageOrder, bool ConjugateLhs, bool ConjugateRhs> \
+struct general_matrix_vector_product_gemv<Index,EIGTYPE,LhsStorageOrder,ConjugateLhs,EIGTYPE,ConjugateRhs> \
+{ \
+typedef Matrix<EIGTYPE,Dynamic,1,ColMajor> GEMVVector;\
+\
+static void run( \
+ Index rows, Index cols, \
+ const EIGTYPE* lhs, Index lhsStride, \
+ const EIGTYPE* rhs, Index rhsIncr, \
+ EIGTYPE* res, Index resIncr, EIGTYPE alpha) \
+{ \
+ MKL_INT m=rows, n=cols, lda=lhsStride, incx=rhsIncr, incy=resIncr; \
+ MKLTYPE alpha_, beta_; \
+ const EIGTYPE *x_ptr, myone(1); \
+ char trans=(LhsStorageOrder==ColMajor) ? 'N' : (ConjugateLhs) ? 'C' : 'T'; \
+ if (LhsStorageOrder==RowMajor) { \
+ m=cols; \
+ n=rows; \
+ }\
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+ GEMVVector x_tmp; \
+ if (ConjugateRhs) { \
+ Map<const GEMVVector, 0, InnerStride<> > map_x(rhs,cols,1,InnerStride<>(incx)); \
+ x_tmp=map_x.conjugate(); \
+ x_ptr=x_tmp.data(); \
+ incx=1; \
+ } else x_ptr=rhs; \
+ MKLPREFIX##gemv(&trans, &m, &n, &alpha_, (const MKLTYPE*)lhs, &lda, (const MKLTYPE*)x_ptr, &incx, &beta_, (MKLTYPE*)res, &incy); \
+}\
+};
+
+EIGEN_MKL_GEMV_SPECIALIZATION(double, double, d)
+EIGEN_MKL_GEMV_SPECIALIZATION(float, float, s)
+EIGEN_MKL_GEMV_SPECIALIZATION(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_GEMV_SPECIALIZATION(scomplex, MKL_Complex8, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERAL_MATRIX_VECTOR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/Parallelizer.h b/third_party/eigen3/Eigen/src/Core/products/Parallelizer.h
new file mode 100644
index 0000000000..837e69415b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/Parallelizer.h
@@ -0,0 +1,158 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PARALLELIZER_H
+#define EIGEN_PARALLELIZER_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal */
+inline void manage_multi_threading(Action action, int* v)
+{
+ static EIGEN_UNUSED int m_maxThreads = -1;
+
+ if(action==SetAction)
+ {
+ eigen_internal_assert(v!=0);
+ m_maxThreads = *v;
+ }
+ else if(action==GetAction)
+ {
+ eigen_internal_assert(v!=0);
+ #ifdef EIGEN_HAS_OPENMP
+ if(m_maxThreads>0)
+ *v = m_maxThreads;
+ else
+ *v = omp_get_max_threads();
+ #else
+ *v = 1;
+ #endif
+ }
+ else
+ {
+ eigen_internal_assert(false);
+ }
+}
+
+}
+
+/** Must be called first when calling Eigen from multiple threads */
+inline void initParallel()
+{
+ int nbt;
+ internal::manage_multi_threading(GetAction, &nbt);
+ std::ptrdiff_t l1, l2, l3;
+ internal::manage_caching_sizes(GetAction, &l1, &l2, &l3);
+}
+
+/** \returns the max number of threads reserved for Eigen
+ * \sa setNbThreads */
+inline int nbThreads()
+{
+ int ret;
+ internal::manage_multi_threading(GetAction, &ret);
+ return ret;
+}
+
+/** Sets the max number of threads reserved for Eigen
+ * \sa nbThreads */
+inline void setNbThreads(int v)
+{
+ internal::manage_multi_threading(SetAction, &v);
+}
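+// A minimal usage sketch of the three entry points above (illustrative only, kept out of
+// the build on purpose; the matrix sizes are arbitrary):
+#if 0
+#include <Eigen/Dense>
+int main() {
+  Eigen::initParallel();           // call once before using Eigen from several threads
+  Eigen::setNbThreads(4);          // cap the number of threads Eigen may use
+  Eigen::MatrixXd A = Eigen::MatrixXd::Random(1024, 1024);
+  Eigen::MatrixXd B = Eigen::MatrixXd::Random(1024, 1024);
+  Eigen::MatrixXd C = A * B;       // a large product like this one may run on 4 threads
+  return Eigen::nbThreads() > 0 ? 0 : 1;
+}
+#endif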
+
+namespace internal {
+
+template<typename Index> struct GemmParallelInfo
+{
+ GemmParallelInfo() : sync(-1), users(0), lhs_start(0), lhs_length(0) {}
+
+ int volatile sync;
+ int volatile users;
+
+ Index lhs_start;
+ Index lhs_length;
+};
+
+template<bool Condition, typename Functor, typename Index>
+void parallelize_gemm(const Functor& func, Index rows, Index cols, bool transpose)
+{
+ // TODO when EIGEN_USE_BLAS is defined,
+ // we should still enable OMP for other scalar types
+#if !(defined (EIGEN_HAS_OPENMP)) || defined (EIGEN_USE_BLAS)
+ // FIXME the transpose variable is only needed to properly split
+ // the matrix product when multithreading is enabled. This is a temporary
+ // fix to support row-major destination matrices. This whole
+ // parallelizer mechanism has to be redesigned anyway.
+ EIGEN_UNUSED_VARIABLE(transpose);
+ func(0,rows, 0,cols);
+#else
+
+ // Dynamically check whether we should enable or disable OpenMP.
+ // The conditions are:
+ // - the max number of threads we can create is greater than 1
+ // - we are not already in a parallel code
+ // - the sizes are large enough
+
+ // 1- are we already in a parallel session?
+ // FIXME omp_get_num_threads()>1 only works for OpenMP; what if the user does not use OpenMP?
+ if((!Condition) || (omp_get_num_threads()>1))
+ return func(0,rows, 0,cols);
+
+ Index size = transpose ? rows : cols;
+
+ // 2- compute the maximal number of threads from the size of the product:
+ // FIXME this has to be fine tuned
+ Index max_threads = std::max<Index>(1,size / 32);
+
+ // 3 - compute the number of threads we are going to use
+ Index threads = std::min<Index>(nbThreads(), max_threads);
+
+ if(threads==1)
+ return func(0,rows, 0,cols);
+
+ Eigen::initParallel();
+ func.initParallelSession();
+
+ if(transpose)
+ std::swap(rows,cols);
+
+ Index blockCols = (cols / threads) & ~Index(0x3);
+ Index blockRows = (rows / threads);
+ blockRows = (blockRows/Functor::Traits::mr)*Functor::Traits::mr;
+
+ GemmParallelInfo<Index>* info = new GemmParallelInfo<Index>[threads];
+
+ #pragma omp parallel num_threads(threads)
+ {
+ Index i = omp_get_thread_num();
+ Index r0 = i*blockRows;
+ Index actualBlockRows = (i+1==threads) ? rows-r0 : blockRows;
+
+ Index c0 = i*blockCols;
+ Index actualBlockCols = (i+1==threads) ? cols-c0 : blockCols;
+
+ info[i].lhs_start = r0;
+ info[i].lhs_length = actualBlockRows;
+
+ if(transpose) func(c0, actualBlockCols, 0, rows, info);
+ else func(0, rows, c0, actualBlockCols, info);
+ }
+
+ delete[] info;
+#endif
+}
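+// A worked example of the heuristic above, with made-up numbers: for cols==250 and no
+// transpose, size==250 so max_threads = max(1, 250/32) = 7; if nbThreads() returns 4,
+// then threads = 4 and
+//   blockCols = (250 / 4) & ~3 = 60,
+// so threads 0..2 each handle 60 columns and the last thread takes the remaining 70.
+// The row split is likewise rounded down to a multiple of the packing granularity
+// Functor::Traits::mr.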
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARALLELIZER_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix.h
new file mode 100644
index 0000000000..4a60ef7dc5
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix.h
@@ -0,0 +1,523 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINT_MATRIX_MATRIX_H
+#define EIGEN_SELFADJOINT_MATRIX_MATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+// pack a selfadjoint block diagonal for use with the gebp_kernel
+template<typename Scalar, typename Index, int Pack1, int Pack2_dummy, int StorageOrder>
+struct symm_pack_lhs
+{
+ template<int BlockRows> inline
+ void pack(Scalar* blockA, const const_blas_data_mapper<Scalar,Index,StorageOrder>& lhs, Index cols, Index i, Index& count)
+ {
+ // normal copy
+ for(Index k=0; k<i; k++)
+ for(Index w=0; w<BlockRows; w++)
+ blockA[count++] = lhs(i+w,k); // normal
+ // symmetric copy
+ Index h = 0;
+ for(Index k=i; k<i+BlockRows; k++)
+ {
+ for(Index w=0; w<h; w++)
+ blockA[count++] = numext::conj(lhs(k, i+w)); // transposed
+
+ blockA[count++] = numext::real(lhs(k,k)); // real (diagonal)
+
+ for(Index w=h+1; w<BlockRows; w++)
+ blockA[count++] = lhs(i+w, k); // normal
+ ++h;
+ }
+ // transposed copy
+ for(Index k=i+BlockRows; k<cols; k++)
+ for(Index w=0; w<BlockRows; w++)
+ blockA[count++] = numext::conj(lhs(k, i+w)); // transposed
+ }
+ void operator()(Scalar* blockA, const Scalar* _lhs, Index lhsStride, Index cols, Index rows)
+ {
+ enum { PacketSize = packet_traits<Scalar>::size };
+ const_blas_data_mapper<Scalar,Index,StorageOrder> lhs(_lhs,lhsStride);
+ Index count = 0;
+ //Index peeled_mc3 = (rows/Pack1)*Pack1;
+
+ const Index peeled_mc3 = Pack1>=3*PacketSize ? (rows/(3*PacketSize))*(3*PacketSize) : 0;
+ const Index peeled_mc2 = Pack1>=2*PacketSize ? peeled_mc3+((rows-peeled_mc3)/(2*PacketSize))*(2*PacketSize) : 0;
+ const Index peeled_mc1 = Pack1>=1*PacketSize ? (rows/(1*PacketSize))*(1*PacketSize) : 0;
+
+ if(Pack1>=3*PacketSize)
+ for(Index i=0; i<peeled_mc3; i+=3*PacketSize)
+ pack<3*PacketSize>(blockA, lhs, cols, i, count);
+
+ if(Pack1>=2*PacketSize)
+ for(Index i=peeled_mc3; i<peeled_mc2; i+=2*PacketSize)
+ pack<2*PacketSize>(blockA, lhs, cols, i, count);
+
+ if(Pack1>=1*PacketSize)
+ for(Index i=peeled_mc2; i<peeled_mc1; i+=1*PacketSize)
+ pack<1*PacketSize>(blockA, lhs, cols, i, count);
+
+ // do the same with mr==1
+ for(Index i=peeled_mc1; i<rows; i++)
+ {
+ for(Index k=0; k<i; k++)
+ blockA[count++] = lhs(i, k); // normal
+
+ blockA[count++] = numext::real(lhs(i, i)); // real (diagonal)
+
+ for(Index k=i+1; k<cols; k++)
+ blockA[count++] = numext::conj(lhs(k, i)); // transposed
+ }
+ }
+};
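+// To illustrate the packing above: for a self-adjoint lhs only one triangle is actually
+// stored, so each block of rows [i, i+BlockRows) is reconstructed from three sources
+// (sketched here for a lower-stored matrix A):
+//   columns k < i            ->  A(i+w, k)                 (stored entries, copied as-is)
+//   columns inside the block ->  mix of A(i+w, k) and conj(A(k, i+w)), with the real part
+//                                of A(k, k) on the diagonal
+//   columns k >= i+BlockRows ->  conj(A(k, i+w))            (mirrored from the stored triangle)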
+
+template<typename Scalar, typename Index, int nr, int StorageOrder>
+struct symm_pack_rhs
+{
+ enum { PacketSize = packet_traits<Scalar>::size };
+ void operator()(Scalar* blockB, const Scalar* _rhs, Index rhsStride, Index rows, Index cols, Index k2)
+ {
+ Index end_k = k2 + rows;
+ Index count = 0;
+ const_blas_data_mapper<Scalar,Index,StorageOrder> rhs(_rhs,rhsStride);
+ Index packet_cols8 = nr>=8 ? (cols/8) * 8 : 0;
+ Index packet_cols4 = nr>=4 ? (cols/4) * 4 : 0;
+
+ // first part: normal case
+ for(Index j2=0; j2<k2; j2+=nr)
+ {
+ for(Index k=k2; k<end_k; k++)
+ {
+ blockB[count+0] = rhs(k,j2+0);
+ blockB[count+1] = rhs(k,j2+1);
+ if (nr>=4)
+ {
+ blockB[count+2] = rhs(k,j2+2);
+ blockB[count+3] = rhs(k,j2+3);
+ }
+ if (nr>=8)
+ {
+ blockB[count+4] = rhs(k,j2+4);
+ blockB[count+5] = rhs(k,j2+5);
+ blockB[count+6] = rhs(k,j2+6);
+ blockB[count+7] = rhs(k,j2+7);
+ }
+ count += nr;
+ }
+ }
+
+ // second part: diagonal block
+ Index end8 = nr>=8 ? (std::min)(k2+rows,packet_cols8) : k2;
+ if(nr>=8)
+ {
+ for(Index j2=k2; j2<end8; j2+=8)
+ {
+ // again we can split vertically into three different parts (transpose, symmetric, normal)
+ // transpose
+ for(Index k=k2; k<j2; k++)
+ {
+ blockB[count+0] = numext::conj(rhs(j2+0,k));
+ blockB[count+1] = numext::conj(rhs(j2+1,k));
+ blockB[count+2] = numext::conj(rhs(j2+2,k));
+ blockB[count+3] = numext::conj(rhs(j2+3,k));
+ blockB[count+4] = numext::conj(rhs(j2+4,k));
+ blockB[count+5] = numext::conj(rhs(j2+5,k));
+ blockB[count+6] = numext::conj(rhs(j2+6,k));
+ blockB[count+7] = numext::conj(rhs(j2+7,k));
+ count += 8;
+ }
+ // symmetric
+ Index h = 0;
+ for(Index k=j2; k<j2+8; k++)
+ {
+ // normal
+ for (Index w=0 ; w<h; ++w)
+ blockB[count+w] = rhs(k,j2+w);
+
+ blockB[count+h] = numext::real(rhs(k,k));
+
+ // transpose
+ for (Index w=h+1 ; w<8; ++w)
+ blockB[count+w] = numext::conj(rhs(j2+w,k));
+ count += 8;
+ ++h;
+ }
+ // normal
+ for(Index k=j2+8; k<end_k; k++)
+ {
+ blockB[count+0] = rhs(k,j2+0);
+ blockB[count+1] = rhs(k,j2+1);
+ blockB[count+2] = rhs(k,j2+2);
+ blockB[count+3] = rhs(k,j2+3);
+ blockB[count+4] = rhs(k,j2+4);
+ blockB[count+5] = rhs(k,j2+5);
+ blockB[count+6] = rhs(k,j2+6);
+ blockB[count+7] = rhs(k,j2+7);
+ count += 8;
+ }
+ }
+ }
+ if(nr>=4)
+ {
+ for(Index j2=end8; j2<(std::min)(k2+rows,packet_cols4); j2+=4)
+ {
+ // again we can split vertically into three different parts (transpose, symmetric, normal)
+ // transpose
+ for(Index k=k2; k<j2; k++)
+ {
+ blockB[count+0] = numext::conj(rhs(j2+0,k));
+ blockB[count+1] = numext::conj(rhs(j2+1,k));
+ blockB[count+2] = numext::conj(rhs(j2+2,k));
+ blockB[count+3] = numext::conj(rhs(j2+3,k));
+ count += 4;
+ }
+ // symmetric
+ Index h = 0;
+ for(Index k=j2; k<j2+4; k++)
+ {
+ // normal
+ for (Index w=0 ; w<h; ++w)
+ blockB[count+w] = rhs(k,j2+w);
+
+ blockB[count+h] = numext::real(rhs(k,k));
+
+ // transpose
+ for (Index w=h+1 ; w<4; ++w)
+ blockB[count+w] = numext::conj(rhs(j2+w,k));
+ count += 4;
+ ++h;
+ }
+ // normal
+ for(Index k=j2+4; k<end_k; k++)
+ {
+ blockB[count+0] = rhs(k,j2+0);
+ blockB[count+1] = rhs(k,j2+1);
+ blockB[count+2] = rhs(k,j2+2);
+ blockB[count+3] = rhs(k,j2+3);
+ count += 4;
+ }
+ }
+ }
+
+ // third part: transposed
+ if(nr>=8)
+ {
+ for(Index j2=k2+rows; j2<packet_cols8; j2+=8)
+ {
+ for(Index k=k2; k<end_k; k++)
+ {
+ blockB[count+0] = numext::conj(rhs(j2+0,k));
+ blockB[count+1] = numext::conj(rhs(j2+1,k));
+ blockB[count+2] = numext::conj(rhs(j2+2,k));
+ blockB[count+3] = numext::conj(rhs(j2+3,k));
+ blockB[count+4] = numext::conj(rhs(j2+4,k));
+ blockB[count+5] = numext::conj(rhs(j2+5,k));
+ blockB[count+6] = numext::conj(rhs(j2+6,k));
+ blockB[count+7] = numext::conj(rhs(j2+7,k));
+ count += 8;
+ }
+ }
+ }
+ if(nr>=4)
+ {
+ for(Index j2=(std::max)(packet_cols8,k2+rows); j2<packet_cols4; j2+=4)
+ {
+ for(Index k=k2; k<end_k; k++)
+ {
+ blockB[count+0] = numext::conj(rhs(j2+0,k));
+ blockB[count+1] = numext::conj(rhs(j2+1,k));
+ blockB[count+2] = numext::conj(rhs(j2+2,k));
+ blockB[count+3] = numext::conj(rhs(j2+3,k));
+ count += 4;
+ }
+ }
+ }
+
+ // copy the remaining columns one at a time (=> the same with nr==1)
+ for(Index j2=packet_cols4; j2<cols; ++j2)
+ {
+ // transpose
+ Index half = (std::min)(end_k,j2);
+ for(Index k=k2; k<half; k++)
+ {
+ blockB[count] = numext::conj(rhs(j2,k));
+ count += 1;
+ }
+
+ if(half==j2 && half<k2+rows)
+ {
+ blockB[count] = numext::real(rhs(j2,j2));
+ count += 1;
+ }
+ else
+ half--;
+
+ // normal
+ for(Index k=half+1; k<k2+rows; k++)
+ {
+ blockB[count] = rhs(k,j2);
+ count += 1;
+ }
+ }
+ }
+};
+
+/* Optimized selfadjoint matrix * matrix (_SYMM) product built on top of
+ * the general matrix matrix product.
+ */
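+// In scalar terms the specializations below compute, ignoring conjugation,
+//   res(i,j) += alpha * sum_k lhs(i,k) * rhs(k,j)
+// where one of the operands is self-adjoint and only one of its triangles is physically
+// stored; the packing helpers above materialize the missing triangle on the fly.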
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool LhsSelfAdjoint, bool ConjugateLhs,
+ int RhsStorageOrder, bool RhsSelfAdjoint, bool ConjugateRhs,
+ int ResStorageOrder>
+struct product_selfadjoint_matrix;
+
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool LhsSelfAdjoint, bool ConjugateLhs,
+ int RhsStorageOrder, bool RhsSelfAdjoint, bool ConjugateRhs>
+struct product_selfadjoint_matrix<Scalar,Index,LhsStorageOrder,LhsSelfAdjoint,ConjugateLhs, RhsStorageOrder,RhsSelfAdjoint,ConjugateRhs,RowMajor>
+{
+
+ static EIGEN_STRONG_INLINE void run(
+ Index rows, Index cols,
+ const Scalar* lhs, Index lhsStride,
+ const Scalar* rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha)
+ {
+ product_selfadjoint_matrix<Scalar, Index,
+ EIGEN_LOGICAL_XOR(RhsSelfAdjoint,RhsStorageOrder==RowMajor) ? ColMajor : RowMajor,
+ RhsSelfAdjoint, NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(RhsSelfAdjoint,ConjugateRhs),
+ EIGEN_LOGICAL_XOR(LhsSelfAdjoint,LhsStorageOrder==RowMajor) ? ColMajor : RowMajor,
+ LhsSelfAdjoint, NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(LhsSelfAdjoint,ConjugateLhs),
+ ColMajor>
+ ::run(cols, rows, rhs, rhsStride, lhs, lhsStride, res, resStride, alpha);
+ }
+};
+
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs>
+struct product_selfadjoint_matrix<Scalar,Index,LhsStorageOrder,true,ConjugateLhs, RhsStorageOrder,false,ConjugateRhs,ColMajor>
+{
+
+ static EIGEN_DONT_INLINE void run(
+ Index rows, Index cols,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha);
+};
+
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs>
+EIGEN_DONT_INLINE void product_selfadjoint_matrix<Scalar,Index,LhsStorageOrder,true,ConjugateLhs, RhsStorageOrder,false,ConjugateRhs,ColMajor>::run(
+ Index rows, Index cols,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* _res, Index resStride,
+ const Scalar& alpha)
+ {
+ Index size = rows;
+
+ typedef gebp_traits<Scalar,Scalar> Traits;
+
+ typedef const_blas_data_mapper<Scalar, Index, LhsStorageOrder> LhsMapper;
+ typedef const_blas_data_mapper<Scalar, Index, (LhsStorageOrder == RowMajor) ? ColMajor : RowMajor> LhsTransposeMapper;
+ typedef const_blas_data_mapper<Scalar, Index, RhsStorageOrder> RhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ LhsTransposeMapper lhs_transpose(_lhs,lhsStride);
+ RhsMapper rhs(_rhs,rhsStride);
+ ResMapper res(_res, resStride);
+
+ Index kc = size; // cache block size along the K direction
+ Index mc = rows; // cache block size along the M direction
+ Index nc = cols; // cache block size along the N direction
+ computeProductBlockingSizes<Scalar,Scalar>(kc, mc, nc, Index(1));
+ // kc must be smaller than mc
+ kc = (std::min)(kc,mc);
+
+ std::size_t sizeB = kc*cols;
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, kc*mc, 0);
+ ei_declare_aligned_stack_constructed_variable(Scalar, allocatedBlockB, sizeB, 0);
+ Scalar* blockB = allocatedBlockB;
+
+ gebp_kernel<Scalar, Scalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp_kernel;
+ symm_pack_lhs<Scalar, Index, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr,RhsStorageOrder> pack_rhs;
+ gemm_pack_lhs<Scalar, Index, LhsTransposeMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder==RowMajor?ColMajor:RowMajor, true> pack_lhs_transposed;
+
+ for(Index k2=0; k2<size; k2+=kc)
+ {
+ const Index actual_kc = (std::min)(k2+kc,size)-k2;
+
+ // we have selected one row panel of rhs and one column panel of lhs
+ // pack rhs's panel into a sequential chunk of memory
+ // and expand each coeff to a constant packet for further reuse
+ pack_rhs(blockB, rhs.getSubMapper(k2,0), actual_kc, cols);
+
+ // the selected lhs panel has to be split into three different parts:
+ // 1 - the transposed panel above the diagonal block => transposed packed copy
+ // 2 - the diagonal block => special packed copy
+ // 3 - the panel below the diagonal block => generic packed copy
+ for(Index i2=0; i2<k2; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,k2)-i2;
+ // transposed packed copy
+ pack_lhs_transposed(blockA, lhs_transpose.getSubMapper(i2, k2), actual_kc, actual_mc);
+
+ gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha);
+ }
+ // the block diagonal
+ {
+ const Index actual_mc = (std::min)(k2+kc,size)-k2;
+ // symmetric packed copy
+ pack_lhs(blockA, &lhs(k2,k2), lhsStride, actual_kc, actual_mc);
+
+ gebp_kernel(res.getSubMapper(k2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha);
+ }
+
+ for(Index i2=k2+kc; i2<size; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,size)-i2;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder,false>()
+ (blockA, lhs.getSubMapper(i2, k2), actual_kc, actual_mc);
+
+ gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha);
+ }
+ }
+ }
+
+// matrix * selfadjoint product
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs>
+struct product_selfadjoint_matrix<Scalar,Index,LhsStorageOrder,false,ConjugateLhs, RhsStorageOrder,true,ConjugateRhs,ColMajor>
+{
+
+ static EIGEN_DONT_INLINE void run(
+ Index rows, Index cols,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha);
+};
+
+template <typename Scalar, typename Index,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs>
+EIGEN_DONT_INLINE void product_selfadjoint_matrix<Scalar,Index,LhsStorageOrder,false,ConjugateLhs, RhsStorageOrder,true,ConjugateRhs,ColMajor>::run(
+ Index rows, Index cols,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* _res, Index resStride,
+ const Scalar& alpha)
+ {
+ Index size = cols;
+
+ typedef gebp_traits<Scalar,Scalar> Traits;
+
+ typedef const_blas_data_mapper<Scalar, Index, LhsStorageOrder> LhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ ResMapper res(_res,resStride);
+
+ Index kc = size; // cache block size along the K direction
+ Index mc = rows; // cache block size along the M direction
+ Index nc = cols; // cache block size along the N direction
+ computeProductBlockingSizes<Scalar,Scalar>(kc, mc, nc, Index(1));
+ std::size_t sizeB = kc*cols;
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, kc*mc, 0);
+ ei_declare_aligned_stack_constructed_variable(Scalar, allocatedBlockB, sizeB, 0);
+ Scalar* blockB = allocatedBlockB;
+
+ gebp_kernel<Scalar, Scalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp_kernel;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ symm_pack_rhs<Scalar, Index, Traits::nr,RhsStorageOrder> pack_rhs;
+
+ for(Index k2=0; k2<size; k2+=kc)
+ {
+ const Index actual_kc = (std::min)(k2+kc,size)-k2;
+
+ pack_rhs(blockB, _rhs, rhsStride, actual_kc, cols, k2);
+
+ // => GEPP
+ for(Index i2=0; i2<rows; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,rows)-i2;
+ pack_lhs(blockA, lhs.getSubMapper(i2, k2), actual_kc, actual_mc);
+
+ gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, alpha);
+ }
+ }
+ }
+
+} // end namespace internal
+
+/***************************************************************************
+* Wrapper to product_selfadjoint_matrix
+***************************************************************************/
+
+namespace internal {
+template<typename Lhs, int LhsMode, typename Rhs, int RhsMode>
+struct traits<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,RhsMode,false> >
+ : traits<ProductBase<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,RhsMode,false>, Lhs, Rhs> >
+{};
+}
+
+template<typename Lhs, int LhsMode, typename Rhs, int RhsMode>
+struct SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,RhsMode,false>
+ : public ProductBase<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,RhsMode,false>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(SelfadjointProductMatrix)
+
+ SelfadjointProductMatrix(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ enum {
+ LhsIsUpper = (LhsMode&(Upper|Lower))==Upper,
+ LhsIsSelfAdjoint = (LhsMode&SelfAdjoint)==SelfAdjoint,
+ RhsIsUpper = (RhsMode&(Upper|Lower))==Upper,
+ RhsIsSelfAdjoint = (RhsMode&SelfAdjoint)==SelfAdjoint
+ };
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ eigen_assert(dst.rows()==m_lhs.rows() && dst.cols()==m_rhs.cols());
+
+ typename internal::add_const_on_value_type<ActualLhsType>::type lhs = LhsBlasTraits::extract(m_lhs);
+ typename internal::add_const_on_value_type<ActualRhsType>::type rhs = RhsBlasTraits::extract(m_rhs);
+
+ Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(m_lhs)
+ * RhsBlasTraits::extractScalarFactor(m_rhs);
+
+ internal::product_selfadjoint_matrix<Scalar, Index,
+ EIGEN_LOGICAL_XOR(LhsIsUpper,
+ internal::traits<Lhs>::Flags &RowMajorBit) ? RowMajor : ColMajor, LhsIsSelfAdjoint,
+ NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(LhsIsUpper,bool(LhsBlasTraits::NeedToConjugate)),
+ EIGEN_LOGICAL_XOR(RhsIsUpper,
+ internal::traits<Rhs>::Flags &RowMajorBit) ? RowMajor : ColMajor, RhsIsSelfAdjoint,
+ NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(RhsIsUpper,bool(RhsBlasTraits::NeedToConjugate)),
+ internal::traits<Dest>::Flags&RowMajorBit ? RowMajor : ColMajor>
+ ::run(
+ lhs.rows(), rhs.cols(), // sizes
+ &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info
+ &rhs.coeffRef(0,0), rhs.outerStride(), // rhs info
+ &dst.coeffRef(0,0), dst.outerStride(), // result info
+ actualAlpha // alpha
+ );
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINT_MATRIX_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h
new file mode 100644
index 0000000000..dfa687fefe
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h
@@ -0,0 +1,295 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Self adjoint matrix * matrix product functionality based on ?SYMM/?HEMM.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_SELFADJOINT_MATRIX_MATRIX_MKL_H
+#define EIGEN_SELFADJOINT_MATRIX_MATRIX_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+
+/* Optimized selfadjoint matrix * matrix (?SYMM/?HEMM) product */
+
+#define EIGEN_MKL_SYMM_L(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_selfadjoint_matrix<EIGTYPE,Index,LhsStorageOrder,true,ConjugateLhs,RhsStorageOrder,false,ConjugateRhs,ColMajor> \
+{\
+\
+ static void run( \
+ Index rows, Index cols, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha) \
+ { \
+ char side='L', uplo='L'; \
+ MKL_INT m, n, lda, ldb, ldc; \
+ const EIGTYPE *a, *b; \
+ MKLTYPE alpha_, beta_; \
+ MatrixX##EIGPREFIX b_tmp; \
+ EIGTYPE myone(1);\
+\
+/* Set transpose options */ \
+/* Set m, n, k */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)cols; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+\
+/* Set lda, ldb, ldc */ \
+ lda = (MKL_INT)lhsStride; \
+ ldb = (MKL_INT)rhsStride; \
+ ldc = (MKL_INT)resStride; \
+\
+/* Set a, b, c */ \
+ if (LhsStorageOrder==RowMajor) uplo='U'; \
+ a = _lhs; \
+\
+ if (RhsStorageOrder==RowMajor) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \
+ b_tmp = rhs.adjoint(); \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+ } else b = _rhs; \
+\
+ MKLPREFIX##symm(&side, &uplo, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)b, &ldb, &beta_, (MKLTYPE*)res, &ldc); \
+\
+ } \
+};
+
+
+#define EIGEN_MKL_HEMM_L(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_selfadjoint_matrix<EIGTYPE,Index,LhsStorageOrder,true,ConjugateLhs,RhsStorageOrder,false,ConjugateRhs,ColMajor> \
+{\
+ static void run( \
+ Index rows, Index cols, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha) \
+ { \
+ char side='L', uplo='L'; \
+ MKL_INT m, n, lda, ldb, ldc; \
+ const EIGTYPE *a, *b; \
+ MKLTYPE alpha_, beta_; \
+ MatrixX##EIGPREFIX b_tmp; \
+ Matrix<EIGTYPE, Dynamic, Dynamic, LhsStorageOrder> a_tmp; \
+ EIGTYPE myone(1); \
+\
+/* Set transpose options */ \
+/* Set m, n, k */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)cols; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+\
+/* Set lda, ldb, ldc */ \
+ lda = (MKL_INT)lhsStride; \
+ ldb = (MKL_INT)rhsStride; \
+ ldc = (MKL_INT)resStride; \
+\
+/* Set a, b, c */ \
+ if (((LhsStorageOrder==ColMajor) && ConjugateLhs) || ((LhsStorageOrder==RowMajor) && (!ConjugateLhs))) { \
+ Map<const Matrix<EIGTYPE, Dynamic, Dynamic, LhsStorageOrder>, 0, OuterStride<> > lhs(_lhs,m,m,OuterStride<>(lhsStride)); \
+ a_tmp = lhs.conjugate(); \
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else a = _lhs; \
+ if (LhsStorageOrder==RowMajor) uplo='U'; \
+\
+ if (RhsStorageOrder==ColMajor && (!ConjugateRhs)) { \
+ b = _rhs; } \
+ else { \
+ if (RhsStorageOrder==ColMajor && ConjugateRhs) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > rhs(_rhs,m,n,OuterStride<>(rhsStride)); \
+ b_tmp = rhs.conjugate(); \
+ } else \
+ if (ConjugateRhs) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \
+ b_tmp = rhs.adjoint(); \
+ } else { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > rhs(_rhs,n,m,OuterStride<>(rhsStride)); \
+ b_tmp = rhs.transpose(); \
+ } \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+ } \
+\
+ MKLPREFIX##hemm(&side, &uplo, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)b, &ldb, &beta_, (MKLTYPE*)res, &ldc); \
+\
+ } \
+};
+
+EIGEN_MKL_SYMM_L(double, double, d, d)
+EIGEN_MKL_SYMM_L(float, float, f, s)
+EIGEN_MKL_HEMM_L(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_HEMM_L(scomplex, MKL_Complex8, cf, c)
+
+
+/* Optimized matrix * selfadjoint matrix (?SYMM/?HEMM) product */
+
+#define EIGEN_MKL_SYMM_R(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_selfadjoint_matrix<EIGTYPE,Index,LhsStorageOrder,false,ConjugateLhs,RhsStorageOrder,true,ConjugateRhs,ColMajor> \
+{\
+\
+ static void run( \
+ Index rows, Index cols, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha) \
+ { \
+ char side='R', uplo='L'; \
+ MKL_INT m, n, lda, ldb, ldc; \
+ const EIGTYPE *a, *b; \
+ MKLTYPE alpha_, beta_; \
+ MatrixX##EIGPREFIX b_tmp; \
+ EIGTYPE myone(1);\
+\
+/* Set m, n, k */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)cols; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+\
+/* Set lda, ldb, ldc */ \
+ lda = (MKL_INT)rhsStride; \
+ ldb = (MKL_INT)lhsStride; \
+ ldc = (MKL_INT)resStride; \
+\
+/* Set a, b, c */ \
+ if (RhsStorageOrder==RowMajor) uplo='U'; \
+ a = _rhs; \
+\
+ if (LhsStorageOrder==RowMajor) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > lhs(_lhs,n,m,OuterStride<>(rhsStride)); \
+ b_tmp = lhs.adjoint(); \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+ } else b = _lhs; \
+\
+ MKLPREFIX##symm(&side, &uplo, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)b, &ldb, &beta_, (MKLTYPE*)res, &ldc); \
+\
+ } \
+};
+
+
+#define EIGEN_MKL_HEMM_R(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_selfadjoint_matrix<EIGTYPE,Index,LhsStorageOrder,false,ConjugateLhs,RhsStorageOrder,true,ConjugateRhs,ColMajor> \
+{\
+ static void run( \
+ Index rows, Index cols, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha) \
+ { \
+ char side='R', uplo='L'; \
+ MKL_INT m, n, lda, ldb, ldc; \
+ const EIGTYPE *a, *b; \
+ MKLTYPE alpha_, beta_; \
+ MatrixX##EIGPREFIX b_tmp; \
+ Matrix<EIGTYPE, Dynamic, Dynamic, RhsStorageOrder> a_tmp; \
+ EIGTYPE myone(1); \
+\
+/* Set m, n, k */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)cols; \
+\
+/* Set alpha_ & beta_ */ \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+\
+/* Set lda, ldb, ldc */ \
+ lda = (MKL_INT)rhsStride; \
+ ldb = (MKL_INT)lhsStride; \
+ ldc = (MKL_INT)resStride; \
+\
+/* Set a, b, c */ \
+ if (((RhsStorageOrder==ColMajor) && ConjugateRhs) || ((RhsStorageOrder==RowMajor) && (!ConjugateRhs))) { \
+ Map<const Matrix<EIGTYPE, Dynamic, Dynamic, RhsStorageOrder>, 0, OuterStride<> > rhs(_rhs,n,n,OuterStride<>(rhsStride)); \
+ a_tmp = rhs.conjugate(); \
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else a = _rhs; \
+ if (RhsStorageOrder==RowMajor) uplo='U'; \
+\
+ if (LhsStorageOrder==ColMajor && (!ConjugateLhs)) { \
+ b = _lhs; } \
+ else { \
+ if (LhsStorageOrder==ColMajor && ConjugateLhs) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > lhs(_lhs,m,n,OuterStride<>(lhsStride)); \
+ b_tmp = lhs.conjugate(); \
+ } else \
+ if (ConjugateLhs) { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > lhs(_lhs,n,m,OuterStride<>(lhsStride)); \
+ b_tmp = lhs.adjoint(); \
+ } else { \
+ Map<const MatrixX##EIGPREFIX, 0, OuterStride<> > lhs(_lhs,n,m,OuterStride<>(lhsStride)); \
+ b_tmp = lhs.transpose(); \
+ } \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+ } \
+\
+ MKLPREFIX##hemm(&side, &uplo, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)b, &ldb, &beta_, (MKLTYPE*)res, &ldc); \
+ } \
+};
+
+EIGEN_MKL_SYMM_R(double, double, d, d)
+EIGEN_MKL_SYMM_R(float, float, f, s)
+EIGEN_MKL_HEMM_R(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_HEMM_R(scomplex, MKL_Complex8, cf, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINT_MATRIX_MATRIX_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector.h
new file mode 100644
index 0000000000..fdc81205ab
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector.h
@@ -0,0 +1,281 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINT_MATRIX_VECTOR_H
+#define EIGEN_SELFADJOINT_MATRIX_VECTOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+/* Optimized selfadjoint matrix * vector product:
+ * This algorithm processes 2 columns at once, which both reduces
+ * the number of loads/stores of the result by a factor of 2 and reduces
+ * the instruction dependency.
+ */
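+
+// Illustrative usage sketch (assuming the standard Eigen public API; the
+// variable names below are hypothetical):
+//   MatrixXd A(n,n); VectorXd x(n), y(n);
+//   y.noalias() = A.selfadjointView<Eigen::Lower>() * x;  // dispatches to the kernel below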
+
+template<typename Scalar, typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs, int Version=Specialized>
+struct selfadjoint_matrix_vector_product;
+
+template<typename Scalar, typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs, int Version>
+struct selfadjoint_matrix_vector_product
+
+{
+static EIGEN_DONT_INLINE void run(
+ Index size,
+ const Scalar* lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsIncr,
+ Scalar* res,
+ Scalar alpha);
+};
+
+template<typename Scalar, typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void selfadjoint_matrix_vector_product<Scalar,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs,Version>::run(
+ Index size,
+ const Scalar* lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsIncr,
+ Scalar* res,
+ Scalar alpha)
+{
+ typedef typename packet_traits<Scalar>::type Packet;
+ const Index PacketSize = sizeof(Packet)/sizeof(Scalar);
+
+ enum {
+ IsRowMajor = StorageOrder==RowMajor ? 1 : 0,
+ IsLower = UpLo == Lower ? 1 : 0,
+ FirstTriangular = IsRowMajor == IsLower
+ };
+
+ conj_helper<Scalar,Scalar,NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, IsRowMajor), ConjugateRhs> cj0;
+ conj_helper<Scalar,Scalar,NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, !IsRowMajor), ConjugateRhs> cj1;
+ conj_helper<Scalar,Scalar,NumTraits<Scalar>::IsComplex, ConjugateRhs> cjd;
+
+ conj_helper<Packet,Packet,NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, IsRowMajor), ConjugateRhs> pcj0;
+ conj_helper<Packet,Packet,NumTraits<Scalar>::IsComplex && EIGEN_LOGICAL_XOR(ConjugateLhs, !IsRowMajor), ConjugateRhs> pcj1;
+
+ Scalar cjAlpha = ConjugateRhs ? numext::conj(alpha) : alpha;
+
+ // FIXME this copy is now handled outside product_selfadjoint_vector, so it could probably be removed.
+ // if the rhs is not sequentially stored in memory, we copy it to a temporary buffer;
+ // this is because we need to extract packets from it
+ ei_declare_aligned_stack_constructed_variable(Scalar,rhs,size,rhsIncr==1 ? const_cast<Scalar*>(_rhs) : 0);
+ if (rhsIncr!=1)
+ {
+ const Scalar* it = _rhs;
+ for (Index i=0; i<size; ++i, it+=rhsIncr)
+ rhs[i] = *it;
+ }
+
+ Index bound = (std::max)(Index(0),size-8) & 0xfffffffe;
+ if (FirstTriangular)
+ bound = size - bound;
+
+ for (Index j=FirstTriangular ? bound : 0;
+ j<(FirstTriangular ? size : bound);j+=2)
+ {
+ const Scalar* EIGEN_RESTRICT A0 = lhs + j*lhsStride;
+ const Scalar* EIGEN_RESTRICT A1 = lhs + (j+1)*lhsStride;
+
+ Scalar t0 = cjAlpha * rhs[j];
+ Packet ptmp0 = pset1<Packet>(t0);
+ Scalar t1 = cjAlpha * rhs[j+1];
+ Packet ptmp1 = pset1<Packet>(t1);
+
+ Scalar t2(0);
+ Packet ptmp2 = pset1<Packet>(t2);
+ Scalar t3(0);
+ Packet ptmp3 = pset1<Packet>(t3);
+
+ size_t starti = FirstTriangular ? 0 : j+2;
+ size_t endi = FirstTriangular ? j : size;
+ size_t alignedStart = (starti) + internal::first_aligned(&res[starti], endi-starti);
+ size_t alignedEnd = alignedStart + ((endi-alignedStart)/(PacketSize))*(PacketSize);
+
+ // TODO make sure this product is a real * complex and that the rhs is properly conjugated if needed
+ res[j] += cjd.pmul(numext::real(A0[j]), t0);
+ res[j+1] += cjd.pmul(numext::real(A1[j+1]), t1);
+ if(FirstTriangular)
+ {
+ res[j] += cj0.pmul(A1[j], t1);
+ t3 += cj1.pmul(A1[j], rhs[j]);
+ }
+ else
+ {
+ res[j+1] += cj0.pmul(A0[j+1],t0);
+ t2 += cj1.pmul(A0[j+1], rhs[j+1]);
+ }
+
+ for (size_t i=starti; i<alignedStart; ++i)
+ {
+ res[i] += cj0.pmul(A0[i], t0) + cj0.pmul(A1[i],t1);
+ t2 += cj1.pmul(A0[i], rhs[i]);
+ t3 += cj1.pmul(A1[i], rhs[i]);
+ }
+ // Yes, this is an optimization for gcc 4.3 and 4.4 (=> huge speed up)
+ // gcc 4.2 does this optimization automatically.
+ const Scalar* EIGEN_RESTRICT a0It = A0 + alignedStart;
+ const Scalar* EIGEN_RESTRICT a1It = A1 + alignedStart;
+ const Scalar* EIGEN_RESTRICT rhsIt = rhs + alignedStart;
+ Scalar* EIGEN_RESTRICT resIt = res + alignedStart;
+ for (size_t i=alignedStart; i<alignedEnd; i+=PacketSize)
+ {
+ Packet A0i = ploadu<Packet>(a0It); a0It += PacketSize;
+ Packet A1i = ploadu<Packet>(a1It); a1It += PacketSize;
+ Packet Bi = ploadu<Packet>(rhsIt); rhsIt += PacketSize; // FIXME should be aligned in most cases
+ Packet Xi = pload <Packet>(resIt);
+
+ Xi = pcj0.pmadd(A0i,ptmp0, pcj0.pmadd(A1i,ptmp1,Xi));
+ ptmp2 = pcj1.pmadd(A0i, Bi, ptmp2);
+ ptmp3 = pcj1.pmadd(A1i, Bi, ptmp3);
+ pstore(resIt,Xi); resIt += PacketSize;
+ }
+ for (size_t i=alignedEnd; i<endi; i++)
+ {
+ res[i] += cj0.pmul(A0[i], t0) + cj0.pmul(A1[i],t1);
+ t2 += cj1.pmul(A0[i], rhs[i]);
+ t3 += cj1.pmul(A1[i], rhs[i]);
+ }
+
+ res[j] += alpha * (t2 + predux(ptmp2));
+ res[j+1] += alpha * (t3 + predux(ptmp3));
+ }
+ for (Index j=FirstTriangular ? 0 : bound;j<(FirstTriangular ? bound : size);j++)
+ {
+ const Scalar* EIGEN_RESTRICT A0 = lhs + j*lhsStride;
+
+ Scalar t1 = cjAlpha * rhs[j];
+ Scalar t2(0);
+ // TODO make sure this product is a real * complex and that the rhs is properly conjugated if needed
+ res[j] += cjd.pmul(numext::real(A0[j]), t1);
+ for (Index i=FirstTriangular ? 0 : j+1; i<(FirstTriangular ? j : size); i++)
+ {
+ res[i] += cj0.pmul(A0[i], t1);
+ t2 += cj1.pmul(A0[i], rhs[i]);
+ }
+ res[j] += alpha * t2;
+ }
+}
+
+} // end namespace internal
+
+/***************************************************************************
+* Wrapper to product_selfadjoint_vector
+***************************************************************************/
+
+namespace internal {
+template<typename Lhs, int LhsMode, typename Rhs>
+struct traits<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,0,true> >
+ : traits<ProductBase<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,0,true>, Lhs, Rhs> >
+{};
+}
+
+template<typename Lhs, int LhsMode, typename Rhs>
+struct SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,0,true>
+ : public ProductBase<SelfadjointProductMatrix<Lhs,LhsMode,false,Rhs,0,true>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(SelfadjointProductMatrix)
+
+ enum {
+ LhsUpLo = LhsMode&(Upper|Lower)
+ };
+
+ SelfadjointProductMatrix(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ typedef typename Dest::Scalar ResScalar;
+ typedef typename Base::RhsScalar RhsScalar;
+ typedef Map<Matrix<ResScalar,Dynamic,1>, Aligned> MappedDest;
+
+ eigen_assert(dest.rows()==m_lhs.rows() && dest.cols()==m_rhs.cols());
+
+ typename internal::add_const_on_value_type<ActualLhsType>::type lhs = LhsBlasTraits::extract(m_lhs);
+ typename internal::add_const_on_value_type<ActualRhsType>::type rhs = RhsBlasTraits::extract(m_rhs);
+
+ Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(m_lhs)
+ * RhsBlasTraits::extractScalarFactor(m_rhs);
+
+ enum {
+ EvalToDest = (Dest::InnerStrideAtCompileTime==1),
+ UseRhs = (_ActualRhsType::InnerStrideAtCompileTime==1)
+ };
+
+ internal::gemv_static_vector_if<ResScalar,Dest::SizeAtCompileTime,Dest::MaxSizeAtCompileTime,!EvalToDest> static_dest;
+ internal::gemv_static_vector_if<RhsScalar,_ActualRhsType::SizeAtCompileTime,_ActualRhsType::MaxSizeAtCompileTime,!UseRhs> static_rhs;
+
+ ei_declare_aligned_stack_constructed_variable(ResScalar,actualDestPtr,dest.size(),
+ EvalToDest ? dest.data() : static_dest.data());
+
+ ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhsPtr,rhs.size(),
+ UseRhs ? const_cast<RhsScalar*>(rhs.data()) : static_rhs.data());
+
+ if(!EvalToDest)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ int size = dest.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ MappedDest(actualDestPtr, dest.size()) = dest;
+ }
+
+ if(!UseRhs)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ int size = rhs.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ Map<typename _ActualRhsType::PlainObject>(actualRhsPtr, rhs.size()) = rhs;
+ }
+
+
+ internal::selfadjoint_matrix_vector_product<Scalar, Index, (internal::traits<_ActualLhsType>::Flags&RowMajorBit) ? RowMajor : ColMajor, int(LhsUpLo), bool(LhsBlasTraits::NeedToConjugate), bool(RhsBlasTraits::NeedToConjugate)>::run
+ (
+ lhs.rows(), // size
+ &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info
+ actualRhsPtr, 1, // rhs info
+ actualDestPtr, // result info
+ actualAlpha // scale factor
+ );
+
+ if(!EvalToDest)
+ dest = MappedDest(actualDestPtr, dest.size());
+ }
+};
+
+namespace internal {
+template<typename Lhs, typename Rhs, int RhsMode>
+struct traits<SelfadjointProductMatrix<Lhs,0,true,Rhs,RhsMode,false> >
+ : traits<ProductBase<SelfadjointProductMatrix<Lhs,0,true,Rhs,RhsMode,false>, Lhs, Rhs> >
+{};
+}
+
+template<typename Lhs, typename Rhs, int RhsMode>
+struct SelfadjointProductMatrix<Lhs,0,true,Rhs,RhsMode,false>
+ : public ProductBase<SelfadjointProductMatrix<Lhs,0,true,Rhs,RhsMode,false>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(SelfadjointProductMatrix)
+
+ enum {
+ RhsUpLo = RhsMode&(Upper|Lower)
+ };
+
+ SelfadjointProductMatrix(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ // let's simply transpose the product
+ Transpose<Dest> destT(dest);
+ SelfadjointProductMatrix<Transpose<const Rhs>, int(RhsUpLo)==Upper ? Lower : Upper, false,
+ Transpose<const Lhs>, 0, true>(m_rhs.transpose(), m_lhs.transpose()).scaleAndAddTo(destT, alpha);
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINT_MATRIX_VECTOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h
new file mode 100644
index 0000000000..86684b66d9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h
@@ -0,0 +1,114 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Selfadjoint matrix-vector product functionality based on ?SYMV/HEMV.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_SELFADJOINT_MATRIX_VECTOR_MKL_H
+#define EIGEN_SELFADJOINT_MATRIX_VECTOR_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**********************************************************************
+* This file implements selfadjoint matrix-vector multiplication using BLAS
+**********************************************************************/
+
+// symv/hemv specialization
+
+template<typename Scalar, typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs>
+struct selfadjoint_matrix_vector_product_symv :
+ selfadjoint_matrix_vector_product<Scalar,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs,BuiltIn> {};
+
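+// Dispatch note: the Specialized entry point defined by the macro below falls
+// back to the built-in Eigen kernel when IsColMajor == ConjugateLhs (e.g. a
+// column-major lhs that must be conjugated), and otherwise forwards to the
+// MKL ?symv/?hemv wrapper selfadjoint_matrix_vector_product_symv.
+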
+#define EIGEN_MKL_SYMV_SPECIALIZE(Scalar) \
+template<typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs> \
+struct selfadjoint_matrix_vector_product<Scalar,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs,Specialized> { \
+static void run( \
+ Index size, const Scalar* lhs, Index lhsStride, \
+ const Scalar* _rhs, Index rhsIncr, Scalar* res, Scalar alpha) { \
+ enum {\
+ IsColMajor = StorageOrder==ColMajor \
+ }; \
+ if (IsColMajor == ConjugateLhs) {\
+ selfadjoint_matrix_vector_product<Scalar,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs,BuiltIn>::run( \
+ size, lhs, lhsStride, _rhs, rhsIncr, res, alpha); \
+ } else {\
+ selfadjoint_matrix_vector_product_symv<Scalar,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs>::run( \
+ size, lhs, lhsStride, _rhs, rhsIncr, res, alpha); \
+ }\
+ } \
+}; \
+
+EIGEN_MKL_SYMV_SPECIALIZE(double)
+EIGEN_MKL_SYMV_SPECIALIZE(float)
+EIGEN_MKL_SYMV_SPECIALIZE(dcomplex)
+EIGEN_MKL_SYMV_SPECIALIZE(scomplex)
+
+#define EIGEN_MKL_SYMV_SPECIALIZATION(EIGTYPE,MKLTYPE,MKLFUNC) \
+template<typename Index, int StorageOrder, int UpLo, bool ConjugateLhs, bool ConjugateRhs> \
+struct selfadjoint_matrix_vector_product_symv<EIGTYPE,Index,StorageOrder,UpLo,ConjugateLhs,ConjugateRhs> \
+{ \
+typedef Matrix<EIGTYPE,Dynamic,1,ColMajor> SYMVVector;\
+\
+static void run( \
+Index size, const EIGTYPE* lhs, Index lhsStride, \
+const EIGTYPE* _rhs, Index rhsIncr, EIGTYPE* res, EIGTYPE alpha) \
+{ \
+ enum {\
+ IsRowMajor = StorageOrder==RowMajor ? 1 : 0, \
+ IsLower = UpLo == Lower ? 1 : 0 \
+ }; \
+ MKL_INT n=size, lda=lhsStride, incx=rhsIncr, incy=1; \
+ MKLTYPE alpha_, beta_; \
+ const EIGTYPE *x_ptr, myone(1); \
+ char uplo=(IsRowMajor) ? (IsLower ? 'U' : 'L') : (IsLower ? 'L' : 'U'); \
+ assign_scalar_eig2mkl(alpha_, alpha); \
+ assign_scalar_eig2mkl(beta_, myone); \
+ SYMVVector x_tmp; \
+ if (ConjugateRhs) { \
+ Map<const SYMVVector, 0, InnerStride<> > map_x(_rhs,size,1,InnerStride<>(incx)); \
+ x_tmp=map_x.conjugate(); \
+ x_ptr=x_tmp.data(); \
+ incx=1; \
+ } else x_ptr=_rhs; \
+ MKLFUNC(&uplo, &n, &alpha_, (const MKLTYPE*)lhs, &lda, (const MKLTYPE*)x_ptr, &incx, &beta_, (MKLTYPE*)res, &incy); \
+}\
+};
+
+EIGEN_MKL_SYMV_SPECIALIZATION(double, double, dsymv)
+EIGEN_MKL_SYMV_SPECIALIZATION(float, float, ssymv)
+EIGEN_MKL_SYMV_SPECIALIZATION(dcomplex, MKL_Complex16, zhemv)
+EIGEN_MKL_SYMV_SPECIALIZATION(scomplex, MKL_Complex8, chemv)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINT_MATRIX_VECTOR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointProduct.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointProduct.h
new file mode 100644
index 0000000000..6ca4ae6c0f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointProduct.h
@@ -0,0 +1,123 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINT_PRODUCT_H
+#define EIGEN_SELFADJOINT_PRODUCT_H
+
+/**********************************************************************
+* This file implements a selfadjoint product: C += A A^T, updating only
+* half of the selfadjoint matrix C.
+* It corresponds to the level 3 SYRK and level 2 SYR BLAS routines.
+**********************************************************************/
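+//
+// Illustrative usage sketch (assuming the standard Eigen public API; the
+// variable names below are hypothetical):
+//   MatrixXd C(n,n), A(n,k);
+//   C.selfadjointView<Eigen::Lower>().rankUpdate(A, alpha);  // C += alpha * A * A^T, lower half only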
+
+namespace Eigen {
+
+
+template<typename Scalar, typename Index, int UpLo, bool ConjLhs, bool ConjRhs>
+struct selfadjoint_rank1_update<Scalar,Index,ColMajor,UpLo,ConjLhs,ConjRhs>
+{
+ static void run(Index size, Scalar* mat, Index stride, const Scalar* vecX, const Scalar* vecY, const Scalar& alpha)
+ {
+ internal::conj_if<ConjRhs> cj;
+ typedef Map<const Matrix<Scalar,Dynamic,1> > OtherMap;
+ typedef typename internal::conditional<ConjLhs,typename OtherMap::ConjugateReturnType,const OtherMap&>::type ConjLhsType;
+ for (Index i=0; i<size; ++i)
+ {
+ Map<Matrix<Scalar,Dynamic,1> >(mat+stride*i+(UpLo==Lower ? i : 0), (UpLo==Lower ? size-i : (i+1)))
+ += (alpha * cj(vecY[i])) * ConjLhsType(OtherMap(vecX+(UpLo==Lower ? i : 0),UpLo==Lower ? size-i : (i+1)));
+ }
+ }
+};
+
+template<typename Scalar, typename Index, int UpLo, bool ConjLhs, bool ConjRhs>
+struct selfadjoint_rank1_update<Scalar,Index,RowMajor,UpLo,ConjLhs,ConjRhs>
+{
+ static void run(Index size, Scalar* mat, Index stride, const Scalar* vecX, const Scalar* vecY, const Scalar& alpha)
+ {
+ selfadjoint_rank1_update<Scalar,Index,ColMajor,UpLo==Lower?Upper:Lower,ConjRhs,ConjLhs>::run(size,mat,stride,vecY,vecX,alpha);
+ }
+};
+
+template<typename MatrixType, typename OtherType, int UpLo, bool OtherIsVector = OtherType::IsVectorAtCompileTime>
+struct selfadjoint_product_selector;
+
+template<typename MatrixType, typename OtherType, int UpLo>
+struct selfadjoint_product_selector<MatrixType,OtherType,UpLo,true>
+{
+ static void run(MatrixType& mat, const OtherType& other, const typename MatrixType::Scalar& alpha)
+ {
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef internal::blas_traits<OtherType> OtherBlasTraits;
+ typedef typename OtherBlasTraits::DirectLinearAccessType ActualOtherType;
+ typedef typename internal::remove_all<ActualOtherType>::type _ActualOtherType;
+ typename internal::add_const_on_value_type<ActualOtherType>::type actualOther = OtherBlasTraits::extract(other.derived());
+
+ Scalar actualAlpha = alpha * OtherBlasTraits::extractScalarFactor(other.derived());
+
+ enum {
+ StorageOrder = (internal::traits<MatrixType>::Flags&RowMajorBit) ? RowMajor : ColMajor,
+ UseOtherDirectly = _ActualOtherType::InnerStrideAtCompileTime==1
+ };
+ internal::gemv_static_vector_if<Scalar,OtherType::SizeAtCompileTime,OtherType::MaxSizeAtCompileTime,!UseOtherDirectly> static_other;
+
+ ei_declare_aligned_stack_constructed_variable(Scalar, actualOtherPtr, other.size(),
+ (UseOtherDirectly ? const_cast<Scalar*>(actualOther.data()) : static_other.data()));
+
+ if(!UseOtherDirectly)
+ Map<typename _ActualOtherType::PlainObject>(actualOtherPtr, actualOther.size()) = actualOther;
+
+ selfadjoint_rank1_update<Scalar,Index,StorageOrder,UpLo,
+ OtherBlasTraits::NeedToConjugate && NumTraits<Scalar>::IsComplex,
+ (!OtherBlasTraits::NeedToConjugate) && NumTraits<Scalar>::IsComplex>
+ ::run(other.size(), mat.data(), mat.outerStride(), actualOtherPtr, actualOtherPtr, actualAlpha);
+ }
+};
+
+template<typename MatrixType, typename OtherType, int UpLo>
+struct selfadjoint_product_selector<MatrixType,OtherType,UpLo,false>
+{
+ static void run(MatrixType& mat, const OtherType& other, const typename MatrixType::Scalar& alpha)
+ {
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef internal::blas_traits<OtherType> OtherBlasTraits;
+ typedef typename OtherBlasTraits::DirectLinearAccessType ActualOtherType;
+ typedef typename internal::remove_all<ActualOtherType>::type _ActualOtherType;
+ typename internal::add_const_on_value_type<ActualOtherType>::type actualOther = OtherBlasTraits::extract(other.derived());
+
+ Scalar actualAlpha = alpha * OtherBlasTraits::extractScalarFactor(other.derived());
+
+ enum { IsRowMajor = (internal::traits<MatrixType>::Flags&RowMajorBit) ? 1 : 0 };
+
+ internal::general_matrix_matrix_triangular_product<Index,
+ Scalar, _ActualOtherType::Flags&RowMajorBit ? RowMajor : ColMajor, OtherBlasTraits::NeedToConjugate && NumTraits<Scalar>::IsComplex,
+ Scalar, _ActualOtherType::Flags&RowMajorBit ? ColMajor : RowMajor, (!OtherBlasTraits::NeedToConjugate) && NumTraits<Scalar>::IsComplex,
+ MatrixType::Flags&RowMajorBit ? RowMajor : ColMajor, UpLo>
+ ::run(mat.cols(), actualOther.cols(),
+ &actualOther.coeffRef(0,0), actualOther.outerStride(), &actualOther.coeffRef(0,0), actualOther.outerStride(),
+ mat.data(), mat.outerStride(), actualAlpha);
+ }
+};
+
+// high level API
+
+template<typename MatrixType, unsigned int UpLo>
+template<typename DerivedU>
+SelfAdjointView<MatrixType,UpLo>& SelfAdjointView<MatrixType,UpLo>
+::rankUpdate(const MatrixBase<DerivedU>& u, const Scalar& alpha)
+{
+ selfadjoint_product_selector<MatrixType,DerivedU,UpLo>::run(_expression().const_cast_derived(), u.derived(), alpha);
+
+ return *this;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINT_PRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/SelfadjointRank2Update.h b/third_party/eigen3/Eigen/src/Core/products/SelfadjointRank2Update.h
new file mode 100644
index 0000000000..8594a97cea
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/SelfadjointRank2Update.h
@@ -0,0 +1,93 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINTRANK2UPTADE_H
+#define EIGEN_SELFADJOINTRANK2UPTADE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/* Optimized selfadjoint matrix += alpha * uv' + conj(alpha)*vu'
+ * It corresponds to the Level 2 SYR2 BLAS routine.
+ */
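+
+// Illustrative usage sketch (assuming the standard Eigen public API; the
+// variable names below are hypothetical):
+//   MatrixXcd M(n,n); VectorXcd u(n), v(n);
+//   M.selfadjointView<Eigen::Upper>().rankUpdate(u, v, alpha);  // M += alpha*u*v^* + conj(alpha)*v*u^*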
+
+template<typename Scalar, typename Index, typename UType, typename VType, int UpLo>
+struct selfadjoint_rank2_update_selector;
+
+template<typename Scalar, typename Index, typename UType, typename VType>
+struct selfadjoint_rank2_update_selector<Scalar,Index,UType,VType,Lower>
+{
+ static void run(Scalar* mat, Index stride, const UType& u, const VType& v, const Scalar& alpha)
+ {
+ const Index size = u.size();
+ for (Index i=0; i<size; ++i)
+ {
+ Map<Matrix<Scalar,Dynamic,1> >(mat+stride*i+i, size-i) +=
+ (numext::conj(alpha) * numext::conj(u.coeff(i))) * v.tail(size-i)
+ + (alpha * numext::conj(v.coeff(i))) * u.tail(size-i);
+ }
+ }
+};
+
+template<typename Scalar, typename Index, typename UType, typename VType>
+struct selfadjoint_rank2_update_selector<Scalar,Index,UType,VType,Upper>
+{
+ static void run(Scalar* mat, Index stride, const UType& u, const VType& v, const Scalar& alpha)
+ {
+ const Index size = u.size();
+ for (Index i=0; i<size; ++i)
+ Map<Matrix<Scalar,Dynamic,1> >(mat+stride*i, i+1) +=
+ (numext::conj(alpha) * numext::conj(u.coeff(i))) * v.head(i+1)
+ + (alpha * numext::conj(v.coeff(i))) * u.head(i+1);
+ }
+};
+
+template<bool Cond, typename T> struct conj_expr_if
+ : conditional<!Cond, const T&,
+ CwiseUnaryOp<scalar_conjugate_op<typename traits<T>::Scalar>,T> > {};
+
+} // end namespace internal
+
+template<typename MatrixType, unsigned int UpLo>
+template<typename DerivedU, typename DerivedV>
+SelfAdjointView<MatrixType,UpLo>& SelfAdjointView<MatrixType,UpLo>
+::rankUpdate(const MatrixBase<DerivedU>& u, const MatrixBase<DerivedV>& v, const Scalar& alpha)
+{
+ typedef internal::blas_traits<DerivedU> UBlasTraits;
+ typedef typename UBlasTraits::DirectLinearAccessType ActualUType;
+ typedef typename internal::remove_all<ActualUType>::type _ActualUType;
+ typename internal::add_const_on_value_type<ActualUType>::type actualU = UBlasTraits::extract(u.derived());
+
+ typedef internal::blas_traits<DerivedV> VBlasTraits;
+ typedef typename VBlasTraits::DirectLinearAccessType ActualVType;
+ typedef typename internal::remove_all<ActualVType>::type _ActualVType;
+ typename internal::add_const_on_value_type<ActualVType>::type actualV = VBlasTraits::extract(v.derived());
+
+ // If MatrixType is row major, then we use the routine for lower triangular in the upper triangular case and
+ // vice versa, and take the complex conjugate of all coefficients and vector entries.
+
+ enum { IsRowMajor = (internal::traits<MatrixType>::Flags&RowMajorBit) ? 1 : 0 };
+ Scalar actualAlpha = alpha * UBlasTraits::extractScalarFactor(u.derived())
+ * numext::conj(VBlasTraits::extractScalarFactor(v.derived()));
+ if (IsRowMajor)
+ actualAlpha = numext::conj(actualAlpha);
+
+ internal::selfadjoint_rank2_update_selector<Scalar, Index,
+ typename internal::remove_all<typename internal::conj_expr_if<IsRowMajor ^ UBlasTraits::NeedToConjugate,_ActualUType>::type>::type,
+ typename internal::remove_all<typename internal::conj_expr_if<IsRowMajor ^ VBlasTraits::NeedToConjugate,_ActualVType>::type>::type,
+ (IsRowMajor ? int(UpLo==Upper ? Lower : Upper) : UpLo)>
+ ::run(_expression().const_cast_derived().data(),_expression().outerStride(),actualU,actualV,actualAlpha);
+
+ return *this;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINTRANK2UPTADE_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix.h b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix.h
new file mode 100644
index 0000000000..4cbb79da0c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix.h
@@ -0,0 +1,434 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULAR_MATRIX_MATRIX_H
+#define EIGEN_TRIANGULAR_MATRIX_MATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+// template<typename Scalar, int mr, int StorageOrder, bool Conjugate, int Mode>
+// struct gemm_pack_lhs_triangular
+// {
+// Matrix<Scalar,mr,mr,
+// void operator()(Scalar* blockA, const EIGEN_RESTRICT Scalar* _lhs, int lhsStride, int depth, int rows)
+// {
+// conj_if<NumTraits<Scalar>::IsComplex && Conjugate> cj;
+// const_blas_data_mapper<Scalar, StorageOrder> lhs(_lhs,lhsStride);
+// int count = 0;
+// const int peeled_mc = (rows/mr)*mr;
+// for(int i=0; i<peeled_mc; i+=mr)
+// {
+// for(int k=0; k<depth; k++)
+// for(int w=0; w<mr; w++)
+// blockA[count++] = cj(lhs(i+w, k));
+// }
+// for(int i=peeled_mc; i<rows; i++)
+// {
+// for(int k=0; k<depth; k++)
+// blockA[count++] = cj(lhs(i, k));
+// }
+// }
+// };
+
+/* Optimized triangular matrix * matrix (_TRMM++) product built on top of
+ * the general matrix matrix product.
+ */
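+
+// Illustrative usage sketch (assuming the standard Eigen public API; the
+// variable names below are hypothetical):
+//   MatrixXd A(m,m), B(m,n), C(m,n);
+//   C.noalias() = A.triangularView<Eigen::Lower>() * B;  // triangular * general product
+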
+template <typename Scalar, typename Index,
+ int Mode, bool LhsIsTriangular,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs,
+ int ResStorageOrder, int Version = Specialized>
+struct product_triangular_matrix_matrix;
+
+template <typename Scalar, typename Index,
+ int Mode, bool LhsIsTriangular,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs, int Version>
+struct product_triangular_matrix_matrix<Scalar,Index,Mode,LhsIsTriangular,
+ LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder,ConjugateRhs,RowMajor,Version>
+{
+ static EIGEN_STRONG_INLINE void run(
+ Index rows, Index cols, Index depth,
+ const Scalar* lhs, Index lhsStride,
+ const Scalar* rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha, level3_blocking<Scalar,Scalar>& blocking)
+ {
+ product_triangular_matrix_matrix<Scalar, Index,
+ (Mode&(UnitDiag|ZeroDiag)) | ((Mode&Upper) ? Lower : Upper),
+ (!LhsIsTriangular),
+ RhsStorageOrder==RowMajor ? ColMajor : RowMajor,
+ ConjugateRhs,
+ LhsStorageOrder==RowMajor ? ColMajor : RowMajor,
+ ConjugateLhs,
+ ColMajor>
+ ::run(cols, rows, depth, rhs, rhsStride, lhs, lhsStride, res, resStride, alpha, blocking);
+ }
+};
+
+// implements col-major += alpha * op(triangular) * op(general)
+template <typename Scalar, typename Index, int Mode,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs, int Version>
+struct product_triangular_matrix_matrix<Scalar,Index,Mode,true,
+ LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder,ConjugateRhs,ColMajor,Version>
+{
+
+ typedef gebp_traits<Scalar,Scalar> Traits;
+ enum {
+ SmallPanelWidth = 2 * EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr),
+ IsLower = (Mode&Lower) == Lower,
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1
+ };
+
+ static EIGEN_DONT_INLINE void run(
+ Index _rows, Index _cols, Index _depth,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha, level3_blocking<Scalar,Scalar>& blocking);
+};
+
+template <typename Scalar, typename Index, int Mode,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void product_triangular_matrix_matrix<Scalar,Index,Mode,true,
+ LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder,ConjugateRhs,ColMajor,Version>::run(
+ Index _rows, Index _cols, Index _depth,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* _res, Index resStride,
+ const Scalar& alpha, level3_blocking<Scalar,Scalar>& blocking)
+ {
+ // strip zeros
+ Index diagSize = (std::min)(_rows,_depth);
+ Index rows = IsLower ? _rows : diagSize;
+ Index depth = IsLower ? diagSize : _depth;
+ Index cols = _cols;
+
+ typedef const_blas_data_mapper<Scalar, Index, LhsStorageOrder> LhsMapper;
+ typedef const_blas_data_mapper<Scalar, Index, RhsStorageOrder> RhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ RhsMapper rhs(_rhs,rhsStride);
+ ResMapper res(_res, resStride);
+
+ Index kc = blocking.kc(); // cache block size along the K direction
+ Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction
+
+ std::size_t sizeA = kc*mc;
+ std::size_t sizeB = kc*cols;
+
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA());
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB());
+
+ Matrix<Scalar,SmallPanelWidth,SmallPanelWidth,LhsStorageOrder> triangularBuffer;
+ triangularBuffer.setZero();
+ if((Mode&ZeroDiag)==ZeroDiag)
+ triangularBuffer.diagonal().setZero();
+ else
+ triangularBuffer.diagonal().setOnes();
+
+ gebp_kernel<Scalar, Scalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp_kernel;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr,RhsStorageOrder> pack_rhs;
+
+ for(Index k2=IsLower ? depth : 0;
+ IsLower ? k2>0 : k2<depth;
+ IsLower ? k2-=kc : k2+=kc)
+ {
+ Index actual_kc = (std::min)(IsLower ? k2 : depth-k2, kc);
+ Index actual_k2 = IsLower ? k2-actual_kc : k2;
+
+ // align blocks with the end of the triangular part for trapezoidal lhs
+ if((!IsLower)&&(k2<rows)&&(k2+actual_kc>rows))
+ {
+ actual_kc = rows-k2;
+ k2 = k2+actual_kc-kc;
+ }
+
+ pack_rhs(blockB, rhs.getSubMapper(actual_k2,0), actual_kc, cols);
+
+ // the selected lhs panel has to be split into three different parts:
+ // 1 - the part which is zero => skip it
+ // 2 - the diagonal block => special kernel
+ // 3 - the dense panel below (lower case) or above (upper case) the diagonal block => GEPP
+
+ // the block diagonal, if any:
+ if(IsLower || actual_k2<rows)
+ {
+ // for each small vertical panels of lhs
+ for (Index k1=0; k1<actual_kc; k1+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-k1, SmallPanelWidth);
+ Index lengthTarget = IsLower ? actual_kc-k1-actualPanelWidth : k1;
+ Index startBlock = actual_k2+k1;
+ Index blockBOffset = k1;
+
+ // => GEBP with the micro triangular block
+ // The trick is to pack this micro block while filling the opposite triangular part with zeros.
+ // To this end we do an extra triangular copy to a small temporary buffer
+ for (Index k=0;k<actualPanelWidth;++k)
+ {
+ if (SetDiag)
+ triangularBuffer.coeffRef(k,k) = lhs(startBlock+k,startBlock+k);
+ for (Index i=IsLower ? k+1 : 0; IsLower ? i<actualPanelWidth : i<k; ++i)
+ triangularBuffer.coeffRef(i,k) = lhs(startBlock+i,startBlock+k);
+ }
+ pack_lhs(blockA, LhsMapper(triangularBuffer.data(), triangularBuffer.outerStride()), actualPanelWidth, actualPanelWidth);
+
+ gebp_kernel(res.getSubMapper(startBlock, 0), blockA, blockB,
+ actualPanelWidth, actualPanelWidth, cols, alpha,
+ actualPanelWidth, actual_kc, 0, blockBOffset);
+
+ // GEBP with remaining micro panel
+ if (lengthTarget>0)
+ {
+ Index startTarget = IsLower ? actual_k2+k1+actualPanelWidth : actual_k2;
+
+ pack_lhs(blockA, lhs.getSubMapper(startTarget,startBlock), actualPanelWidth, lengthTarget);
+
+ gebp_kernel(res.getSubMapper(startTarget, 0), blockA, blockB,
+ lengthTarget, actualPanelWidth, cols, alpha,
+ actualPanelWidth, actual_kc, 0, blockBOffset);
+ }
+ }
+ }
+ // the part below (lower case) or above (upper case) the diagonal => GEPP
+ {
+ Index start = IsLower ? k2 : 0;
+ Index end = IsLower ? rows : (std::min)(actual_k2,rows);
+ for(Index i2=start; i2<end; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(i2+mc,end)-i2;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr,Traits::LhsProgress, LhsStorageOrder,false>()
+ (blockA, lhs.getSubMapper(i2, actual_k2), actual_kc, actual_mc);
+
+ gebp_kernel(res.getSubMapper(i2, 0), blockA, blockB, actual_mc,
+ actual_kc, cols, alpha, -1, -1, 0, 0);
+ }
+ }
+ }
+ }
+
+// implements col-major += alpha * op(general) * op(triangular)
+template <typename Scalar, typename Index, int Mode,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs, int Version>
+struct product_triangular_matrix_matrix<Scalar,Index,Mode,false,
+ LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder,ConjugateRhs,ColMajor,Version>
+{
+ typedef gebp_traits<Scalar,Scalar> Traits;
+ enum {
+ SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr),
+ IsLower = (Mode&Lower) == Lower,
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1
+ };
+
+ static EIGEN_DONT_INLINE void run(
+ Index _rows, Index _cols, Index _depth,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* res, Index resStride,
+ const Scalar& alpha, level3_blocking<Scalar,Scalar>& blocking);
+};
+
+template <typename Scalar, typename Index, int Mode,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void product_triangular_matrix_matrix<Scalar,Index,Mode,false,
+ LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder,ConjugateRhs,ColMajor,Version>::run(
+ Index _rows, Index _cols, Index _depth,
+ const Scalar* _lhs, Index lhsStride,
+ const Scalar* _rhs, Index rhsStride,
+ Scalar* _res, Index resStride,
+ const Scalar& alpha, level3_blocking<Scalar,Scalar>& blocking)
+ {
+ // strip zeros
+ Index diagSize = (std::min)(_cols,_depth);
+ Index rows = _rows;
+ Index depth = IsLower ? _depth : diagSize;
+ Index cols = IsLower ? diagSize : _cols;
+
+ typedef const_blas_data_mapper<Scalar, Index, LhsStorageOrder> LhsMapper;
+ typedef const_blas_data_mapper<Scalar, Index, RhsStorageOrder> RhsMapper;
+ typedef blas_data_mapper<typename Traits::ResScalar, Index, ColMajor> ResMapper;
+ LhsMapper lhs(_lhs,lhsStride);
+ RhsMapper rhs(_rhs,rhsStride);
+ ResMapper res(_res, resStride);
+
+ Index kc = blocking.kc(); // cache block size along the K direction
+ Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction
+
+ std::size_t sizeA = kc*mc;
+ std::size_t sizeB = kc*cols+EIGEN_ALIGN_BYTES/sizeof(Scalar);
+
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA());
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB());
+
+ Matrix<Scalar,SmallPanelWidth,SmallPanelWidth,RhsStorageOrder> triangularBuffer;
+ triangularBuffer.setZero();
+ if((Mode&ZeroDiag)==ZeroDiag)
+ triangularBuffer.diagonal().setZero();
+ else
+ triangularBuffer.diagonal().setOnes();
+
+ gebp_kernel<Scalar, Scalar, Index, ResMapper, Traits::mr, Traits::nr, ConjugateLhs, ConjugateRhs> gebp_kernel;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, LhsStorageOrder> pack_lhs;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr,RhsStorageOrder> pack_rhs;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr,RhsStorageOrder,false,true> pack_rhs_panel;
+
+ for(Index k2=IsLower ? 0 : depth;
+ IsLower ? k2<depth : k2>0;
+ IsLower ? k2+=kc : k2-=kc)
+ {
+ Index actual_kc = (std::min)(IsLower ? depth-k2 : k2, kc);
+ Index actual_k2 = IsLower ? k2 : k2-actual_kc;
+
+ // align blocks with the end of the triangular part for trapezoidal rhs
+ if(IsLower && (k2<cols) && (actual_k2+actual_kc>cols))
+ {
+ actual_kc = cols-k2;
+ k2 = actual_k2 + actual_kc - kc;
+ }
+
+ // remaining size
+ Index rs = IsLower ? (std::min)(cols,actual_k2) : cols - k2;
+ // size of the triangular part
+ Index ts = (IsLower && actual_k2>=cols) ? 0 : actual_kc;
+
+ Scalar* geb = blockB+ts*ts;
+ geb = geb + internal::first_aligned(geb,EIGEN_ALIGN_BYTES/sizeof(Scalar));
+
+ pack_rhs(geb, rhs.getSubMapper(actual_k2,IsLower ? 0 : k2), actual_kc, rs);
+
+ // pack the triangular part of the rhs padding the unrolled blocks with zeros
+ if(ts>0)
+ {
+ for (Index j2=0; j2<actual_kc; j2+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-j2, SmallPanelWidth);
+ Index actual_j2 = actual_k2 + j2;
+ Index panelOffset = IsLower ? j2+actualPanelWidth : 0;
+ Index panelLength = IsLower ? actual_kc-j2-actualPanelWidth : j2;
+ // general part
+ pack_rhs_panel(blockB+j2*actual_kc,
+ rhs.getSubMapper(actual_k2+panelOffset, actual_j2),
+ panelLength, actualPanelWidth,
+ actual_kc, panelOffset);
+
+ // append the triangular part via a temporary buffer
+ for (Index j=0;j<actualPanelWidth;++j)
+ {
+ if (SetDiag)
+ triangularBuffer.coeffRef(j,j) = rhs(actual_j2+j,actual_j2+j);
+ for (Index k=IsLower ? j+1 : 0; IsLower ? k<actualPanelWidth : k<j; ++k)
+ triangularBuffer.coeffRef(k,j) = rhs(actual_j2+k,actual_j2+j);
+ }
+
+ pack_rhs_panel(blockB+j2*actual_kc,
+ RhsMapper(triangularBuffer.data(), triangularBuffer.outerStride()),
+ actualPanelWidth, actualPanelWidth,
+ actual_kc, j2);
+ }
+ }
+
+ for (Index i2=0; i2<rows; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(mc,rows-i2);
+ pack_lhs(blockA, lhs.getSubMapper(i2, actual_k2), actual_kc, actual_mc);
+
+ // triangular kernel
+ if(ts>0)
+ {
+ for (Index j2=0; j2<actual_kc; j2+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-j2, SmallPanelWidth);
+ Index panelLength = IsLower ? actual_kc-j2 : j2+actualPanelWidth;
+ Index blockOffset = IsLower ? j2 : 0;
+
+ gebp_kernel(res.getSubMapper(i2, actual_k2 + j2),
+ blockA, blockB+j2*actual_kc,
+ actual_mc, panelLength, actualPanelWidth,
+ alpha,
+ actual_kc, actual_kc, // strides
+ blockOffset, blockOffset);// offsets
+ }
+ }
+ gebp_kernel(res.getSubMapper(i2, IsLower ? 0 : k2),
+ blockA, geb, actual_mc, actual_kc, rs,
+ alpha,
+ -1, -1, 0, 0);
+ }
+ }
+ }
+
+/***************************************************************************
+* Wrapper to product_triangular_matrix_matrix
+***************************************************************************/
+
+template<int Mode, bool LhsIsTriangular, typename Lhs, typename Rhs>
+struct traits<TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,false> >
+ : traits<ProductBase<TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,false>, Lhs, Rhs> >
+{};
+
+} // end namespace internal
+
+template<int Mode, bool LhsIsTriangular, typename Lhs, typename Rhs>
+struct TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,false>
+ : public ProductBase<TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,false>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(TriangularProduct)
+
+ TriangularProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ typename internal::add_const_on_value_type<ActualLhsType>::type lhs = LhsBlasTraits::extract(m_lhs);
+ typename internal::add_const_on_value_type<ActualRhsType>::type rhs = RhsBlasTraits::extract(m_rhs);
+
+ Scalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(m_lhs)
+ * RhsBlasTraits::extractScalarFactor(m_rhs);
+
+ typedef internal::gemm_blocking_space<(Dest::Flags&RowMajorBit) ? RowMajor : ColMajor,Scalar,Scalar,
+ Lhs::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime, Lhs::MaxColsAtCompileTime,4> BlockingType;
+
+ enum { IsLower = (Mode&Lower) == Lower };
+ Index stripedRows = ((!LhsIsTriangular) || (IsLower)) ? lhs.rows() : (std::min)(lhs.rows(),lhs.cols());
+ Index stripedCols = ((LhsIsTriangular) || (!IsLower)) ? rhs.cols() : (std::min)(rhs.cols(),rhs.rows());
+ Index stripedDepth = LhsIsTriangular ? ((!IsLower) ? lhs.cols() : (std::min)(lhs.cols(),lhs.rows()))
+ : ((IsLower) ? rhs.rows() : (std::min)(rhs.rows(),rhs.cols()));
+
+ BlockingType blocking(stripedRows, stripedCols, stripedDepth, 1, false);
+
+ internal::product_triangular_matrix_matrix<Scalar, Index,
+ Mode, LhsIsTriangular,
+ (internal::traits<_ActualLhsType>::Flags&RowMajorBit) ? RowMajor : ColMajor, LhsBlasTraits::NeedToConjugate,
+ (internal::traits<_ActualRhsType>::Flags&RowMajorBit) ? RowMajor : ColMajor, RhsBlasTraits::NeedToConjugate,
+ (internal::traits<Dest >::Flags&RowMajorBit) ? RowMajor : ColMajor>
+ ::run(
+ stripedRows, stripedCols, stripedDepth, // sizes
+ &lhs.coeffRef(0,0), lhs.outerStride(), // lhs info
+ &rhs.coeffRef(0,0), rhs.outerStride(), // rhs info
+ &dst.coeffRef(0,0), dst.outerStride(), // result info
+ actualAlpha, blocking
+ );
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_MATRIX_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h
new file mode 100644
index 0000000000..ba41a1c99f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h
@@ -0,0 +1,309 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Triangular matrix * matrix product functionality based on ?TRMM.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_TRIANGULAR_MATRIX_MATRIX_MKL_H
+#define EIGEN_TRIANGULAR_MATRIX_MATRIX_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+
+template <typename Scalar, typename Index,
+ int Mode, bool LhsIsTriangular,
+ int LhsStorageOrder, bool ConjugateLhs,
+ int RhsStorageOrder, bool ConjugateRhs,
+ int ResStorageOrder>
+struct product_triangular_matrix_matrix_trmm :
+ product_triangular_matrix_matrix<Scalar,Index,Mode,
+ LhsIsTriangular,LhsStorageOrder,ConjugateLhs,
+ RhsStorageOrder, ConjugateRhs, ResStorageOrder, BuiltIn> {};
+
+
+// try to go to BLAS specialization
+#define EIGEN_MKL_TRMM_SPECIALIZE(Scalar, LhsIsTriangular) \
+template <typename Index, int Mode, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_triangular_matrix_matrix<Scalar,Index, Mode, LhsIsTriangular, \
+ LhsStorageOrder,ConjugateLhs, RhsStorageOrder,ConjugateRhs,ColMajor,Specialized> { \
+ static inline void run(Index _rows, Index _cols, Index _depth, const Scalar* _lhs, Index lhsStride,\
+ const Scalar* _rhs, Index rhsStride, Scalar* res, Index resStride, Scalar alpha, level3_blocking<Scalar,Scalar>& blocking) { \
+ product_triangular_matrix_matrix_trmm<Scalar,Index,Mode, \
+ LhsIsTriangular,LhsStorageOrder,ConjugateLhs, \
+ RhsStorageOrder, ConjugateRhs, ColMajor>::run( \
+ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, resStride, alpha, blocking); \
+ } \
+};
+
+EIGEN_MKL_TRMM_SPECIALIZE(double, true)
+EIGEN_MKL_TRMM_SPECIALIZE(double, false)
+EIGEN_MKL_TRMM_SPECIALIZE(dcomplex, true)
+EIGEN_MKL_TRMM_SPECIALIZE(dcomplex, false)
+EIGEN_MKL_TRMM_SPECIALIZE(float, true)
+EIGEN_MKL_TRMM_SPECIALIZE(float, false)
+EIGEN_MKL_TRMM_SPECIALIZE(scomplex, true)
+EIGEN_MKL_TRMM_SPECIALIZE(scomplex, false)
+
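+// Note on the ?TRMM specializations below: MKL ?TRMM only handles a square
+// triangular factor. When the triangular operand is trapezoidal (rows != depth
+// for the left variant, cols != depth for the right one), the code either
+// falls back to Eigen's built-in kernel or materializes the triangular part
+// and calls ?GEMM, depending on the MKL thread count and on how far the
+// matrix is from being square.
+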
+// implements col-major += alpha * op(triangular) * op(general)
+#define EIGEN_MKL_TRMM_L(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, int Mode, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_triangular_matrix_matrix_trmm<EIGTYPE,Index,Mode,true, \
+ LhsStorageOrder,ConjugateLhs,RhsStorageOrder,ConjugateRhs,ColMajor> \
+{ \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ LowUp = IsLower ? Lower : Upper, \
+ conjA = ((LhsStorageOrder==ColMajor) && ConjugateLhs) ? 1 : 0 \
+ }; \
+\
+ static void run( \
+ Index _rows, Index _cols, Index _depth, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha, level3_blocking<EIGTYPE,EIGTYPE>& blocking) \
+ { \
+ Index diagSize = (std::min)(_rows,_depth); \
+ Index rows = IsLower ? _rows : diagSize; \
+ Index depth = IsLower ? diagSize : _depth; \
+ Index cols = _cols; \
+\
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, LhsStorageOrder> MatrixLhs; \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, RhsStorageOrder> MatrixRhs; \
+\
+/* Non-square case - does not fit MKL ?TRMM; fall back to the default triangular product or call MKL ?GEMM */ \
+ if (rows != depth) { \
+\
+ int nthr = mkl_domain_get_max_threads(MKL_BLAS); \
+\
+ if (((nthr==1) && (((std::max)(rows,depth)-diagSize)/(double)diagSize < 0.5))) { \
+ /* Most likely no benefit in calling TRMM or GEMM from MKL */ \
+ product_triangular_matrix_matrix<EIGTYPE,Index,Mode,true, \
+ LhsStorageOrder,ConjugateLhs, RhsStorageOrder, ConjugateRhs, ColMajor, BuiltIn>::run( \
+ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, resStride, alpha, blocking); \
+ /*std::cout << "TRMM_L: A is not square! Go to Eigen TRMM implementation!\n";*/ \
+ } else { \
+ /* Makes sense to call GEMM */ \
+ Map<const MatrixLhs, 0, OuterStride<> > lhsMap(_lhs,rows,depth,OuterStride<>(lhsStride)); \
+ MatrixLhs aa_tmp=lhsMap.template triangularView<Mode>(); \
+ MKL_INT aStride = aa_tmp.outerStride(); \
+ gemm_blocking_space<ColMajor,EIGTYPE,EIGTYPE,Dynamic,Dynamic,Dynamic> gemm_blocking(_rows,_cols,_depth); \
+ general_matrix_matrix_product<Index,EIGTYPE,LhsStorageOrder,ConjugateLhs,EIGTYPE,RhsStorageOrder,ConjugateRhs,ColMajor>::run( \
+ rows, cols, depth, aa_tmp.data(), aStride, _rhs, rhsStride, res, resStride, alpha, gemm_blocking, 0); \
+\
+ /*std::cout << "TRMM_L: A is not square! Go to MKL GEMM implementation! " << nthr<<" \n";*/ \
+ } \
+ return; \
+ } \
+ char side = 'L', transa, uplo, diag = 'N'; \
+ EIGTYPE *b; \
+ const EIGTYPE *a; \
+ MKL_INT m, n, lda, ldb; \
+ MKLTYPE alpha_; \
+\
+/* Set alpha_*/ \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); \
+\
+/* Set m, n */ \
+ m = (MKL_INT)diagSize; \
+ n = (MKL_INT)cols; \
+\
+/* Set trans */ \
+ transa = (LhsStorageOrder==RowMajor) ? ((ConjugateLhs) ? 'C' : 'T') : 'N'; \
+\
+/* Set b, ldb */ \
+ Map<const MatrixRhs, 0, OuterStride<> > rhs(_rhs,depth,cols,OuterStride<>(rhsStride)); \
+ MatrixX##EIGPREFIX b_tmp; \
+\
+ if (ConjugateRhs) b_tmp = rhs.conjugate(); else b_tmp = rhs; \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+\
+/* Set uplo */ \
+ uplo = IsLower ? 'L' : 'U'; \
+ if (LhsStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \
+/* Set a, lda */ \
+ Map<const MatrixLhs, 0, OuterStride<> > lhs(_lhs,rows,depth,OuterStride<>(lhsStride)); \
+ MatrixLhs a_tmp; \
+\
+ if ((conjA!=0) || (SetDiag==0)) { \
+ if (conjA) a_tmp = lhs.conjugate(); else a_tmp = lhs; \
+ if (IsZeroDiag) \
+ a_tmp.diagonal().setZero(); \
+ else if (IsUnitDiag) \
+ a_tmp.diagonal().setOnes();\
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else { \
+ a = _lhs; \
+ lda = lhsStride; \
+ } \
+ /*std::cout << "TRMM_L: A is square! Go to MKL TRMM implementation! \n";*/ \
+/* call ?trmm*/ \
+ MKLPREFIX##trmm(&side, &uplo, &transa, &diag, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (MKLTYPE*)b, &ldb); \
+\
+/* Add op(a_triangular)*b into res*/ \
+ Map<MatrixX##EIGPREFIX, 0, OuterStride<> > res_tmp(res,rows,cols,OuterStride<>(resStride)); \
+ res_tmp=res_tmp+b_tmp; \
+ } \
+};
+
+EIGEN_MKL_TRMM_L(double, double, d, d)
+EIGEN_MKL_TRMM_L(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_TRMM_L(float, float, f, s)
+EIGEN_MKL_TRMM_L(scomplex, MKL_Complex8, cf, c)
+
+// implements col-major += alpha * op(general) * op(triangular)
+#define EIGEN_MKL_TRMM_R(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template <typename Index, int Mode, \
+ int LhsStorageOrder, bool ConjugateLhs, \
+ int RhsStorageOrder, bool ConjugateRhs> \
+struct product_triangular_matrix_matrix_trmm<EIGTYPE,Index,Mode,false, \
+ LhsStorageOrder,ConjugateLhs,RhsStorageOrder,ConjugateRhs,ColMajor> \
+{ \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ LowUp = IsLower ? Lower : Upper, \
+ conjA = ((RhsStorageOrder==ColMajor) && ConjugateRhs) ? 1 : 0 \
+ }; \
+\
+ static void run( \
+ Index _rows, Index _cols, Index _depth, \
+ const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsStride, \
+ EIGTYPE* res, Index resStride, \
+ EIGTYPE alpha, level3_blocking<EIGTYPE,EIGTYPE>& blocking) \
+ { \
+ Index diagSize = (std::min)(_cols,_depth); \
+ Index rows = _rows; \
+ Index depth = IsLower ? _depth : diagSize; \
+ Index cols = IsLower ? diagSize : _cols; \
+\
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, LhsStorageOrder> MatrixLhs; \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, RhsStorageOrder> MatrixRhs; \
+\
+/* Non-square case - does not fit MKL ?TRMM; fall back to the default triangular product or call MKL ?GEMM */ \
+ if (cols != depth) { \
+\
+ int nthr = mkl_domain_get_max_threads(MKL_BLAS); \
+\
+ if ((nthr==1) && (((std::max)(cols,depth)-diagSize)/(double)diagSize < 0.5)) { \
+ /* Most likely no benefit in calling TRMM or GEMM from MKL */ \
+ product_triangular_matrix_matrix<EIGTYPE,Index,Mode,false, \
+ LhsStorageOrder,ConjugateLhs, RhsStorageOrder, ConjugateRhs, ColMajor, BuiltIn>::run( \
+ _rows, _cols, _depth, _lhs, lhsStride, _rhs, rhsStride, res, resStride, alpha, blocking); \
+ /*std::cout << "TRMM_R: A is not square! Go to Eigen TRMM implementation!\n";*/ \
+ } else { \
+ /* Makes sense to call GEMM */ \
+ Map<const MatrixRhs, 0, OuterStride<> > rhsMap(_rhs,depth,cols, OuterStride<>(rhsStride)); \
+ MatrixRhs aa_tmp=rhsMap.template triangularView<Mode>(); \
+ MKL_INT aStride = aa_tmp.outerStride(); \
+ gemm_blocking_space<ColMajor,EIGTYPE,EIGTYPE,Dynamic,Dynamic,Dynamic> gemm_blocking(_rows,_cols,_depth); \
+ general_matrix_matrix_product<Index,EIGTYPE,LhsStorageOrder,ConjugateLhs,EIGTYPE,RhsStorageOrder,ConjugateRhs,ColMajor>::run( \
+ rows, cols, depth, _lhs, lhsStride, aa_tmp.data(), aStride, res, resStride, alpha, gemm_blocking, 0); \
+\
+ /*std::cout << "TRMM_R: A is not square! Go to MKL GEMM implementation! " << nthr<<" \n";*/ \
+ } \
+ return; \
+ } \
+ char side = 'R', transa, uplo, diag = 'N'; \
+ EIGTYPE *b; \
+ const EIGTYPE *a; \
+ MKL_INT m, n, lda, ldb; \
+ MKLTYPE alpha_; \
+\
+/* Set alpha_*/ \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); \
+\
+/* Set m, n */ \
+ m = (MKL_INT)rows; \
+ n = (MKL_INT)diagSize; \
+\
+/* Set trans */ \
+ transa = (RhsStorageOrder==RowMajor) ? ((ConjugateRhs) ? 'C' : 'T') : 'N'; \
+\
+/* Set b, ldb */ \
+ Map<const MatrixLhs, 0, OuterStride<> > lhs(_lhs,rows,depth,OuterStride<>(lhsStride)); \
+ MatrixX##EIGPREFIX b_tmp; \
+\
+ if (ConjugateLhs) b_tmp = lhs.conjugate(); else b_tmp = lhs; \
+ b = b_tmp.data(); \
+ ldb = b_tmp.outerStride(); \
+\
+/* Set uplo */ \
+ uplo = IsLower ? 'L' : 'U'; \
+ if (RhsStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \
+/* Set a, lda */ \
+ Map<const MatrixRhs, 0, OuterStride<> > rhs(_rhs,depth,cols, OuterStride<>(rhsStride)); \
+ MatrixRhs a_tmp; \
+\
+ if ((conjA!=0) || (SetDiag==0)) { \
+ if (conjA) a_tmp = rhs.conjugate(); else a_tmp = rhs; \
+ if (IsZeroDiag) \
+ a_tmp.diagonal().setZero(); \
+ else if (IsUnitDiag) \
+ a_tmp.diagonal().setOnes();\
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else { \
+ a = _rhs; \
+ lda = rhsStride; \
+ } \
+ /*std::cout << "TRMM_R: A is square! Go to MKL TRMM implementation! \n";*/ \
+/* call ?trmm*/ \
+ MKLPREFIX##trmm(&side, &uplo, &transa, &diag, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (MKLTYPE*)b, &ldb); \
+\
+/* Add op(a_triangular)*b into res*/ \
+ Map<MatrixX##EIGPREFIX, 0, OuterStride<> > res_tmp(res,rows,cols,OuterStride<>(resStride)); \
+ res_tmp=res_tmp+b_tmp; \
+ } \
+};
+
+EIGEN_MKL_TRMM_R(double, double, d, d)
+EIGEN_MKL_TRMM_R(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_TRMM_R(float, float, f, s)
+EIGEN_MKL_TRMM_R(scomplex, MKL_Complex8, cf, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_MATRIX_MATRIX_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector.h b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector.h
new file mode 100644
index 0000000000..9863076958
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector.h
@@ -0,0 +1,354 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULARMATRIXVECTOR_H
+#define EIGEN_TRIANGULARMATRIXVECTOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs, int StorageOrder, int Version=Specialized>
+struct triangular_matrix_vector_product;
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs, int Version>
+struct triangular_matrix_vector_product<Index,Mode,LhsScalar,ConjLhs,RhsScalar,ConjRhs,ColMajor,Version>
+{
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+ enum {
+ IsLower = ((Mode&Lower)==Lower),
+ HasUnitDiag = (Mode & UnitDiag)==UnitDiag,
+ HasZeroDiag = (Mode & ZeroDiag)==ZeroDiag
+ };
+ static EIGEN_DONT_INLINE void run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha);
+};
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs, int Version>
+EIGEN_DONT_INLINE void triangular_matrix_vector_product<Index,Mode,LhsScalar,ConjLhs,RhsScalar,ConjRhs,ColMajor,Version>
+ ::run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha)
+ {
+ static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH;
+ Index size = (std::min)(_rows,_cols);
+ Index rows = IsLower ? _rows : (std::min)(_rows,_cols);
+ Index cols = IsLower ? (std::min)(_rows,_cols) : _cols;
+
+ typedef Map<const Matrix<LhsScalar,Dynamic,Dynamic,ColMajor>, 0, OuterStride<> > LhsMap;
+ const LhsMap lhs(_lhs,rows,cols,OuterStride<>(lhsStride));
+ typename conj_expr_if<ConjLhs,LhsMap>::type cjLhs(lhs);
+
+ typedef Map<const Matrix<RhsScalar,Dynamic,1>, 0, InnerStride<> > RhsMap;
+ const RhsMap rhs(_rhs,cols,InnerStride<>(rhsIncr));
+ typename conj_expr_if<ConjRhs,RhsMap>::type cjRhs(rhs);
+
+ typedef Map<Matrix<ResScalar,Dynamic,1> > ResMap;
+ ResMap res(_res,rows);
+
+ typedef const_blas_data_mapper<LhsScalar,Index,ColMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,RowMajor> RhsMapper;
+
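+ // Process the triangular block in vertical panels of width PanelWidth: within a
+ // panel each column is applied coefficient-wise to res, and the rectangular part
+ // below (lower case) / above (upper case) the panel is delegated to the general
+ // matrix-vector kernel.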
+ for (Index pi=0; pi<size; pi+=PanelWidth)
+ {
+ Index actualPanelWidth = (std::min)(PanelWidth, size-pi);
+ for (Index k=0; k<actualPanelWidth; ++k)
+ {
+ Index i = pi + k;
+ Index s = IsLower ? ((HasUnitDiag||HasZeroDiag) ? i+1 : i ) : pi;
+ Index r = IsLower ? actualPanelWidth-k : k+1;
+ if ((!(HasUnitDiag||HasZeroDiag)) || (--r)>0)
+ res.segment(s,r) += (alpha * cjRhs.coeff(i)) * cjLhs.col(i).segment(s,r);
+ if (HasUnitDiag)
+ res.coeffRef(i) += alpha * cjRhs.coeff(i);
+ }
+ Index r = IsLower ? rows - pi - actualPanelWidth : pi;
+ if (r>0)
+ {
+ Index s = IsLower ? pi+actualPanelWidth : 0;
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,ConjLhs,RhsScalar,RhsMapper,ConjRhs,BuiltIn>::run(
+ r, actualPanelWidth,
+ LhsMapper(&lhs.coeffRef(s,pi), lhsStride),
+ RhsMapper(&rhs.coeffRef(pi), rhsIncr),
+ &res.coeffRef(s), resIncr, alpha);
+ }
+ }
+ if((!IsLower) && cols>size)
+ {
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,ConjLhs,RhsScalar,RhsMapper,ConjRhs>::run(
+ rows, cols-size,
+ LhsMapper(&lhs.coeffRef(0,size), lhsStride),
+ RhsMapper(&rhs.coeffRef(size), rhsIncr),
+ _res, resIncr, alpha);
+ }
+ }
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs,int Version>
+struct triangular_matrix_vector_product<Index,Mode,LhsScalar,ConjLhs,RhsScalar,ConjRhs,RowMajor,Version>
+{
+ typedef typename scalar_product_traits<LhsScalar, RhsScalar>::ReturnType ResScalar;
+ enum {
+ IsLower = ((Mode&Lower)==Lower),
+ HasUnitDiag = (Mode & UnitDiag)==UnitDiag,
+ HasZeroDiag = (Mode & ZeroDiag)==ZeroDiag
+ };
+ static EIGEN_DONT_INLINE void run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha);
+};
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs,int Version>
+EIGEN_DONT_INLINE void triangular_matrix_vector_product<Index,Mode,LhsScalar,ConjLhs,RhsScalar,ConjRhs,RowMajor,Version>
+ ::run(Index _rows, Index _cols, const LhsScalar* _lhs, Index lhsStride,
+ const RhsScalar* _rhs, Index rhsIncr, ResScalar* _res, Index resIncr, const ResScalar& alpha)
+ {
+ static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH;
+ Index diagSize = (std::min)(_rows,_cols);
+ Index rows = IsLower ? _rows : diagSize;
+ Index cols = IsLower ? diagSize : _cols;
+
+ typedef Map<const Matrix<LhsScalar,Dynamic,Dynamic,RowMajor>, 0, OuterStride<> > LhsMap;
+ const LhsMap lhs(_lhs,rows,cols,OuterStride<>(lhsStride));
+ typename conj_expr_if<ConjLhs,LhsMap>::type cjLhs(lhs);
+
+ typedef Map<const Matrix<RhsScalar,Dynamic,1> > RhsMap;
+ const RhsMap rhs(_rhs,cols);
+ typename conj_expr_if<ConjRhs,RhsMap>::type cjRhs(rhs);
+
+ typedef Map<Matrix<ResScalar,Dynamic,1>, 0, InnerStride<> > ResMap;
+ ResMap res(_res,rows,InnerStride<>(resIncr));
+
+ typedef const_blas_data_mapper<LhsScalar,Index,RowMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,RowMajor> RhsMapper;
+
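+ // Same panel decomposition as the column-major case, but here each row of the
+ // triangular panel is reduced with a dot product, and the off-panel rectangular
+ // part is handled by the row-major general matrix-vector kernel.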
+ for (Index pi=0; pi<diagSize; pi+=PanelWidth)
+ {
+ Index actualPanelWidth = (std::min)(PanelWidth, diagSize-pi);
+ for (Index k=0; k<actualPanelWidth; ++k)
+ {
+ Index i = pi + k;
+ Index s = IsLower ? pi : ((HasUnitDiag||HasZeroDiag) ? i+1 : i);
+ Index r = IsLower ? k+1 : actualPanelWidth-k;
+ if ((!(HasUnitDiag||HasZeroDiag)) || (--r)>0)
+ res.coeffRef(i) += alpha * (cjLhs.row(i).segment(s,r).cwiseProduct(cjRhs.segment(s,r).transpose())).sum();
+ if (HasUnitDiag)
+ res.coeffRef(i) += alpha * cjRhs.coeff(i);
+ }
+ Index r = IsLower ? pi : cols - pi - actualPanelWidth;
+ if (r>0)
+ {
+ Index s = IsLower ? 0 : pi + actualPanelWidth;
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,RowMajor,ConjLhs,RhsScalar,RhsMapper,ConjRhs,BuiltIn>::run(
+ actualPanelWidth, r,
+ LhsMapper(&lhs.coeffRef(pi,s), lhsStride),
+ RhsMapper(&rhs.coeffRef(s), rhsIncr),
+ &res.coeffRef(pi), resIncr, alpha);
+ }
+ }
+ if(IsLower && rows>diagSize)
+ {
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,RowMajor,ConjLhs,RhsScalar,RhsMapper,ConjRhs>::run(
+ rows-diagSize, cols,
+ LhsMapper(&lhs.coeffRef(diagSize,0), lhsStride),
+ RhsMapper(&rhs.coeffRef(0), rhsIncr),
+ &res.coeffRef(diagSize), resIncr, alpha);
+ }
+ }
+
+/***************************************************************************
+* Wrapper to product_triangular_vector
+***************************************************************************/
+
+template<int Mode, bool LhsIsTriangular, typename Lhs, typename Rhs>
+struct traits<TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,true> >
+ : traits<ProductBase<TriangularProduct<Mode,LhsIsTriangular,Lhs,false,Rhs,true>, Lhs, Rhs> >
+{};
+
+template<int Mode, bool LhsIsTriangular, typename Lhs, typename Rhs>
+struct traits<TriangularProduct<Mode,LhsIsTriangular,Lhs,true,Rhs,false> >
+ : traits<ProductBase<TriangularProduct<Mode,LhsIsTriangular,Lhs,true,Rhs,false>, Lhs, Rhs> >
+{};
+
+
+template<int StorageOrder>
+struct trmv_selector;
+
+} // end namespace internal
+
+template<int Mode, typename Lhs, typename Rhs>
+struct TriangularProduct<Mode,true,Lhs,false,Rhs,true>
+ : public ProductBase<TriangularProduct<Mode,true,Lhs,false,Rhs,true>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(TriangularProduct)
+
+ TriangularProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ eigen_assert(dst.rows()==m_lhs.rows() && dst.cols()==m_rhs.cols());
+
+ internal::trmv_selector<(int(internal::traits<Lhs>::Flags)&RowMajorBit) ? RowMajor : ColMajor>::run(*this, dst, alpha);
+ }
+};
+
+template<int Mode, typename Lhs, typename Rhs>
+struct TriangularProduct<Mode,false,Lhs,true,Rhs,false>
+ : public ProductBase<TriangularProduct<Mode,false,Lhs,true,Rhs,false>, Lhs, Rhs >
+{
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(TriangularProduct)
+
+ TriangularProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs) {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dst, const Scalar& alpha) const
+ {
+ eigen_assert(dst.rows()==m_lhs.rows() && dst.cols()==m_rhs.cols());
+
+ typedef TriangularProduct<(Mode & (UnitDiag|ZeroDiag)) | ((Mode & Lower) ? Upper : Lower),true,Transpose<const Rhs>,false,Transpose<const Lhs>,true> TriangularProductTranspose;
+ Transpose<Dest> dstT(dst);
+ internal::trmv_selector<(int(internal::traits<Rhs>::Flags)&RowMajorBit) ? ColMajor : RowMajor>::run(
+ TriangularProductTranspose(m_rhs.transpose(),m_lhs.transpose()), dstT, alpha);
+ }
+};
+
+namespace internal {
+
+// TODO: find a way to factorize this piece of code with gemv_selector since the logic is exactly the same.
+
+template<> struct trmv_selector<ColMajor>
+{
+ template<int Mode, typename Lhs, typename Rhs, typename Dest>
+ static void run(const TriangularProduct<Mode,true,Lhs,false,Rhs,true>& prod, Dest& dest, const typename TriangularProduct<Mode,true,Lhs,false,Rhs,true>::Scalar& alpha)
+ {
+ typedef TriangularProduct<Mode,true,Lhs,false,Rhs,true> ProductType;
+ typedef typename ProductType::Index Index;
+ typedef typename ProductType::LhsScalar LhsScalar;
+ typedef typename ProductType::RhsScalar RhsScalar;
+ typedef typename ProductType::Scalar ResScalar;
+ typedef typename ProductType::RealScalar RealScalar;
+ typedef typename ProductType::ActualLhsType ActualLhsType;
+ typedef typename ProductType::ActualRhsType ActualRhsType;
+ typedef typename ProductType::LhsBlasTraits LhsBlasTraits;
+ typedef typename ProductType::RhsBlasTraits RhsBlasTraits;
+ typedef Map<Matrix<ResScalar,Dynamic,1>, Aligned> MappedDest;
+
+ typename internal::add_const_on_value_type<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(prod.lhs());
+ typename internal::add_const_on_value_type<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs())
+ * RhsBlasTraits::extractScalarFactor(prod.rhs());
+
+ enum {
+ // FIXME find a way to allow an inner stride on the result if packet_traits<Scalar>::size==1
+ // on the other hand it is good for the cache to pack the vector anyway...
+ EvalToDestAtCompileTime = Dest::InnerStrideAtCompileTime==1,
+ ComplexByReal = (NumTraits<LhsScalar>::IsComplex) && (!NumTraits<RhsScalar>::IsComplex),
+ MightCannotUseDest = (Dest::InnerStrideAtCompileTime!=1) || ComplexByReal
+ };
+
+ gemv_static_vector_if<ResScalar,Dest::SizeAtCompileTime,Dest::MaxSizeAtCompileTime,MightCannotUseDest> static_dest;
+
+ bool alphaIsCompatible = (!ComplexByReal) || (numext::imag(actualAlpha)==RealScalar(0));
+ bool evalToDest = EvalToDestAtCompileTime && alphaIsCompatible;
+
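+ // When the destination cannot be used directly (non-unit inner stride, or a
+ // complex-by-real product whose scalar factor cannot be folded in), the
+ // product is accumulated into a temporary buffer and copied/added back below.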
+ RhsScalar compatibleAlpha = get_factor<ResScalar,RhsScalar>::run(actualAlpha);
+
+ ei_declare_aligned_stack_constructed_variable(ResScalar,actualDestPtr,dest.size(),
+ evalToDest ? dest.data() : static_dest.data());
+
+ if(!evalToDest)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ Index size = dest.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ if(!alphaIsCompatible)
+ {
+ MappedDest(actualDestPtr, dest.size()).setZero();
+ compatibleAlpha = RhsScalar(1);
+ }
+ else
+ MappedDest(actualDestPtr, dest.size()) = dest;
+ }
+
+ internal::triangular_matrix_vector_product
+ <Index,Mode,
+ LhsScalar, LhsBlasTraits::NeedToConjugate,
+ RhsScalar, RhsBlasTraits::NeedToConjugate,
+ ColMajor>
+ ::run(actualLhs.rows(),actualLhs.cols(),
+ actualLhs.data(),actualLhs.outerStride(),
+ actualRhs.data(),actualRhs.innerStride(),
+ actualDestPtr,1,compatibleAlpha);
+
+ if (!evalToDest)
+ {
+ if(!alphaIsCompatible)
+ dest += actualAlpha * MappedDest(actualDestPtr, dest.size());
+ else
+ dest = MappedDest(actualDestPtr, dest.size());
+ }
+ }
+};
+
+template<> struct trmv_selector<RowMajor>
+{
+ template<int Mode, typename Lhs, typename Rhs, typename Dest>
+ static void run(const TriangularProduct<Mode,true,Lhs,false,Rhs,true>& prod, Dest& dest, const typename TriangularProduct<Mode,true,Lhs,false,Rhs,true>::Scalar& alpha)
+ {
+ typedef TriangularProduct<Mode,true,Lhs,false,Rhs,true> ProductType;
+ typedef typename ProductType::LhsScalar LhsScalar;
+ typedef typename ProductType::RhsScalar RhsScalar;
+ typedef typename ProductType::Scalar ResScalar;
+ typedef typename ProductType::Index Index;
+ typedef typename ProductType::ActualLhsType ActualLhsType;
+ typedef typename ProductType::ActualRhsType ActualRhsType;
+ typedef typename ProductType::_ActualRhsType _ActualRhsType;
+ typedef typename ProductType::LhsBlasTraits LhsBlasTraits;
+ typedef typename ProductType::RhsBlasTraits RhsBlasTraits;
+
+ typename add_const<ActualLhsType>::type actualLhs = LhsBlasTraits::extract(prod.lhs());
+ typename add_const<ActualRhsType>::type actualRhs = RhsBlasTraits::extract(prod.rhs());
+
+ ResScalar actualAlpha = alpha * LhsBlasTraits::extractScalarFactor(prod.lhs())
+ * RhsBlasTraits::extractScalarFactor(prod.rhs());
+
+ enum {
+ DirectlyUseRhs = _ActualRhsType::InnerStrideAtCompileTime==1
+ };
+
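+ // If the rhs already has unit inner stride its data is used in place; otherwise
+ // it is copied into a packed temporary before calling the row-major kernel.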
+ gemv_static_vector_if<RhsScalar,_ActualRhsType::SizeAtCompileTime,_ActualRhsType::MaxSizeAtCompileTime,!DirectlyUseRhs> static_rhs;
+
+ ei_declare_aligned_stack_constructed_variable(RhsScalar,actualRhsPtr,actualRhs.size(),
+ DirectlyUseRhs ? const_cast<RhsScalar*>(actualRhs.data()) : static_rhs.data());
+
+ if(!DirectlyUseRhs)
+ {
+ #ifdef EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ int size = actualRhs.size();
+ EIGEN_DENSE_STORAGE_CTOR_PLUGIN
+ #endif
+ Map<typename _ActualRhsType::PlainObject>(actualRhsPtr, actualRhs.size()) = actualRhs;
+ }
+
+ internal::triangular_matrix_vector_product
+ <Index,Mode,
+ LhsScalar, LhsBlasTraits::NeedToConjugate,
+ RhsScalar, RhsBlasTraits::NeedToConjugate,
+ RowMajor>
+ ::run(actualLhs.rows(),actualLhs.cols(),
+ actualLhs.data(),actualLhs.outerStride(),
+ actualRhsPtr,1,
+ dest.data(),dest.innerStride(),
+ actualAlpha);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULARMATRIXVECTOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector_MKL.h b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector_MKL.h
new file mode 100644
index 0000000000..09f110da71
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularMatrixVector_MKL.h
@@ -0,0 +1,247 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Triangular matrix-vector product functionality based on ?TRMV.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_TRIANGULAR_MATRIX_VECTOR_MKL_H
+#define EIGEN_TRIANGULAR_MATRIX_VECTOR_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**********************************************************************
+* This file implements triangular matrix-vector multiplication using BLAS
+**********************************************************************/
+
+// trmv/hemv specialization
+
+template<typename Index, int Mode, typename LhsScalar, bool ConjLhs, typename RhsScalar, bool ConjRhs, int StorageOrder>
+struct triangular_matrix_vector_product_trmv :
+ triangular_matrix_vector_product<Index,Mode,LhsScalar,ConjLhs,RhsScalar,ConjRhs,StorageOrder,BuiltIn> {};
+
+#define EIGEN_MKL_TRMV_SPECIALIZE(Scalar) \
+template<typename Index, int Mode, bool ConjLhs, bool ConjRhs> \
+struct triangular_matrix_vector_product<Index,Mode,Scalar,ConjLhs,Scalar,ConjRhs,ColMajor,Specialized> { \
+ static void run(Index _rows, Index _cols, const Scalar* _lhs, Index lhsStride, \
+ const Scalar* _rhs, Index rhsIncr, Scalar* _res, Index resIncr, Scalar alpha) { \
+ triangular_matrix_vector_product_trmv<Index,Mode,Scalar,ConjLhs,Scalar,ConjRhs,ColMajor>::run( \
+ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \
+ } \
+}; \
+template<typename Index, int Mode, bool ConjLhs, bool ConjRhs> \
+struct triangular_matrix_vector_product<Index,Mode,Scalar,ConjLhs,Scalar,ConjRhs,RowMajor,Specialized> { \
+ static void run(Index _rows, Index _cols, const Scalar* _lhs, Index lhsStride, \
+ const Scalar* _rhs, Index rhsIncr, Scalar* _res, Index resIncr, Scalar alpha) { \
+ triangular_matrix_vector_product_trmv<Index,Mode,Scalar,ConjLhs,Scalar,ConjRhs,RowMajor>::run( \
+ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \
+ } \
+};
+
+EIGEN_MKL_TRMV_SPECIALIZE(double)
+EIGEN_MKL_TRMV_SPECIALIZE(float)
+EIGEN_MKL_TRMV_SPECIALIZE(dcomplex)
+EIGEN_MKL_TRMV_SPECIALIZE(scomplex)
+
+// implements col-major: res += alpha * op(triangular) * vector
+#define EIGEN_MKL_TRMV_CM(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template<typename Index, int Mode, bool ConjLhs, bool ConjRhs> \
+struct triangular_matrix_vector_product_trmv<Index,Mode,EIGTYPE,ConjLhs,EIGTYPE,ConjRhs,ColMajor> { \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ LowUp = IsLower ? Lower : Upper \
+ }; \
+ static void run(Index _rows, Index _cols, const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsIncr, EIGTYPE* _res, Index resIncr, EIGTYPE alpha) \
+ { \
+ if (ConjLhs || IsZeroDiag) { \
+ triangular_matrix_vector_product<Index,Mode,EIGTYPE,ConjLhs,EIGTYPE,ConjRhs,ColMajor,BuiltIn>::run( \
+ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \
+ return; \
+ }\
+ Index size = (std::min)(_rows,_cols); \
+ Index rows = IsLower ? _rows : size; \
+ Index cols = IsLower ? size : _cols; \
+\
+ typedef VectorX##EIGPREFIX VectorRhs; \
+ EIGTYPE *x, *y;\
+\
+/* Set x*/ \
+ Map<const VectorRhs, 0, InnerStride<> > rhs(_rhs,cols,InnerStride<>(rhsIncr)); \
+ VectorRhs x_tmp; \
+ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \
+ x = x_tmp.data(); \
+\
+/* Square part handling */\
+\
+ char trans, uplo, diag; \
+ MKL_INT m, n, lda, incx, incy; \
+ EIGTYPE const *a; \
+ MKLTYPE alpha_, beta_; \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(beta_, EIGTYPE(1)); \
+\
+/* Set m, n */ \
+ n = (MKL_INT)size; \
+ lda = lhsStride; \
+ incx = 1; \
+ incy = resIncr; \
+\
+/* Set uplo, trans and diag*/ \
+ trans = 'N'; \
+ uplo = IsLower ? 'L' : 'U'; \
+ diag = IsUnitDiag ? 'U' : 'N'; \
+\
+/* call ?TRMV*/ \
+ MKLPREFIX##trmv(&uplo, &trans, &diag, &n, (const MKLTYPE*)_lhs, &lda, (MKLTYPE*)x, &incx); \
+\
+/* Add op(a_tr)rhs into res*/ \
+ MKLPREFIX##axpy(&n, &alpha_,(const MKLTYPE*)x, &incx, (MKLTYPE*)_res, &incy); \
+/* Non-square case - doesn't fit MKL ?TRMV: the rectangular remainder is handled with a ?GEMV call below*/ \
+ if (size<(std::max)(rows,cols)) { \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic> MatrixLhs; \
+ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \
+ x = x_tmp.data(); \
+ if (size<rows) { \
+ y = _res + size*resIncr; \
+ a = _lhs + size; \
+ m = rows-size; \
+ n = size; \
+ } \
+ else { \
+ x += size; \
+ y = _res; \
+ a = _lhs + size*lda; \
+ m = size; \
+ n = cols-size; \
+ } \
+ MKLPREFIX##gemv(&trans, &m, &n, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)x, &incx, &beta_, (MKLTYPE*)y, &incy); \
+ } \
+ } \
+};
+
+EIGEN_MKL_TRMV_CM(double, double, d, d)
+EIGEN_MKL_TRMV_CM(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_TRMV_CM(float, float, f, s)
+EIGEN_MKL_TRMV_CM(scomplex, MKL_Complex8, cf, c)
+
+// implements row-major: res += alpha * op(triangular) * vector
+#define EIGEN_MKL_TRMV_RM(EIGTYPE, MKLTYPE, EIGPREFIX, MKLPREFIX) \
+template<typename Index, int Mode, bool ConjLhs, bool ConjRhs> \
+struct triangular_matrix_vector_product_trmv<Index,Mode,EIGTYPE,ConjLhs,EIGTYPE,ConjRhs,RowMajor> { \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ SetDiag = (Mode&(ZeroDiag|UnitDiag)) ? 0 : 1, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ LowUp = IsLower ? Lower : Upper \
+ }; \
+ static void run(Index _rows, Index _cols, const EIGTYPE* _lhs, Index lhsStride, \
+ const EIGTYPE* _rhs, Index rhsIncr, EIGTYPE* _res, Index resIncr, EIGTYPE alpha) \
+ { \
+ if (IsZeroDiag) { \
+ triangular_matrix_vector_product<Index,Mode,EIGTYPE,ConjLhs,EIGTYPE,ConjRhs,RowMajor,BuiltIn>::run( \
+ _rows, _cols, _lhs, lhsStride, _rhs, rhsIncr, _res, resIncr, alpha); \
+ return; \
+ }\
+ Index size = (std::min)(_rows,_cols); \
+ Index rows = IsLower ? _rows : size; \
+ Index cols = IsLower ? size : _cols; \
+\
+ typedef VectorX##EIGPREFIX VectorRhs; \
+ EIGTYPE *x, *y;\
+\
+/* Set x*/ \
+ Map<const VectorRhs, 0, InnerStride<> > rhs(_rhs,cols,InnerStride<>(rhsIncr)); \
+ VectorRhs x_tmp; \
+ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \
+ x = x_tmp.data(); \
+\
+/* Square part handling */\
+\
+ char trans, uplo, diag; \
+ MKL_INT m, n, lda, incx, incy; \
+ EIGTYPE const *a; \
+ MKLTYPE alpha_, beta_; \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(alpha_, alpha); \
+ assign_scalar_eig2mkl<MKLTYPE, EIGTYPE>(beta_, EIGTYPE(1)); \
+\
+/* Set m, n */ \
+ n = (MKL_INT)size; \
+ lda = lhsStride; \
+ incx = 1; \
+ incy = resIncr; \
+\
+/* Set uplo, trans and diag*/ \
+ trans = ConjLhs ? 'C' : 'T'; \
+ uplo = IsLower ? 'U' : 'L'; \
+ diag = IsUnitDiag ? 'U' : 'N'; \
+\
+/* call ?TRMV*/ \
+ MKLPREFIX##trmv(&uplo, &trans, &diag, &n, (const MKLTYPE*)_lhs, &lda, (MKLTYPE*)x, &incx); \
+\
+/* Add op(a_tr)rhs into res*/ \
+ MKLPREFIX##axpy(&n, &alpha_,(const MKLTYPE*)x, &incx, (MKLTYPE*)_res, &incy); \
+/* Non-square case - doesn't fit MKL ?TRMV: the rectangular remainder is handled with a ?GEMV call below*/ \
+ if (size<(std::max)(rows,cols)) { \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic> MatrixLhs; \
+ if (ConjRhs) x_tmp = rhs.conjugate(); else x_tmp = rhs; \
+ x = x_tmp.data(); \
+ if (size<rows) { \
+ y = _res + size*resIncr; \
+ a = _lhs + size*lda; \
+ m = rows-size; \
+ n = size; \
+ } \
+ else { \
+ x += size; \
+ y = _res; \
+ a = _lhs + size; \
+ m = size; \
+ n = cols-size; \
+ } \
+ MKLPREFIX##gemv(&trans, &n, &m, &alpha_, (const MKLTYPE*)a, &lda, (const MKLTYPE*)x, &incx, &beta_, (MKLTYPE*)y, &incy); \
+ } \
+ } \
+};
+
+EIGEN_MKL_TRMV_RM(double, double, d, d)
+EIGEN_MKL_TRMV_RM(dcomplex, MKL_Complex16, cd, z)
+EIGEN_MKL_TRMV_RM(float, float, f, s)
+EIGEN_MKL_TRMV_RM(scomplex, MKL_Complex8, cf, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_MATRIX_VECTOR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix.h b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix.h
new file mode 100644
index 0000000000..f5de67c59f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix.h
@@ -0,0 +1,331 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULAR_SOLVER_MATRIX_H
+#define EIGEN_TRIANGULAR_SOLVER_MATRIX_H
+
+namespace Eigen {
+
+namespace internal {
+
+// if the rhs is row major, let's transpose the product
+template <typename Scalar, typename Index, int Side, int Mode, bool Conjugate, int TriStorageOrder>
+struct triangular_solve_matrix<Scalar,Index,Side,Mode,Conjugate,TriStorageOrder,RowMajor>
+{
+ static void run(
+ Index size, Index cols,
+ const Scalar* tri, Index triStride,
+ Scalar* _other, Index otherStride,
+ level3_blocking<Scalar,Scalar>& blocking)
+ {
+ triangular_solve_matrix<
+ Scalar, Index, Side==OnTheLeft?OnTheRight:OnTheLeft,
+ (Mode&UnitDiag) | ((Mode&Upper) ? Lower : Upper),
+ NumTraits<Scalar>::IsComplex && Conjugate,
+ TriStorageOrder==RowMajor ? ColMajor : RowMajor, ColMajor>
+ ::run(size, cols, tri, triStride, _other, otherStride, blocking);
+ }
+};
+
+/* Optimized triangular solver with multiple right hand side and the triangular matrix on the left
+ */
+template <typename Scalar, typename Index, int Mode, bool Conjugate, int TriStorageOrder>
+struct triangular_solve_matrix<Scalar,Index,OnTheLeft,Mode,Conjugate,TriStorageOrder,ColMajor>
+{
+ static EIGEN_DONT_INLINE void run(
+ Index size, Index otherSize,
+ const Scalar* _tri, Index triStride,
+ Scalar* _other, Index otherStride,
+ level3_blocking<Scalar,Scalar>& blocking);
+};
+template <typename Scalar, typename Index, int Mode, bool Conjugate, int TriStorageOrder>
+EIGEN_DONT_INLINE void triangular_solve_matrix<Scalar,Index,OnTheLeft,Mode,Conjugate,TriStorageOrder,ColMajor>::run(
+ Index size, Index otherSize,
+ const Scalar* _tri, Index triStride,
+ Scalar* _other, Index otherStride,
+ level3_blocking<Scalar,Scalar>& blocking)
+ {
+ Index cols = otherSize;
+
+ typedef const_blas_data_mapper<Scalar, Index, TriStorageOrder> TriMapper;
+ typedef blas_data_mapper<Scalar, Index, ColMajor> OtherMapper;
+ TriMapper tri(_tri, triStride);
+ OtherMapper other(_other, otherStride);
+
+ typedef gebp_traits<Scalar,Scalar> Traits;
+
+ enum {
+ SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr),
+ IsLower = (Mode&Lower) == Lower
+ };
+
+ Index kc = blocking.kc(); // cache block size along the K direction
+ Index mc = (std::min)(size,blocking.mc()); // cache block size along the M direction
+
+ std::size_t sizeA = kc*mc;
+ std::size_t sizeB = kc*cols;
+
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA());
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB());
+
+ conj_if<Conjugate> conj;
+ gebp_kernel<Scalar, Scalar, Index, OtherMapper, Traits::mr, Traits::nr, Conjugate, false> gebp_kernel;
+ gemm_pack_lhs<Scalar, Index, TriMapper, Traits::mr, Traits::LhsProgress, TriStorageOrder> pack_lhs;
+ gemm_pack_rhs<Scalar, Index, OtherMapper, Traits::nr, ColMajor, false, true> pack_rhs;
+
+ // the goal here is to subdivide the Rhs panels such that we keep some cache
+ // coherence when accessing the rhs elements
+ std::ptrdiff_t l1, l2, l3;
+ manage_caching_sizes(GetAction, &l1, &l2, &l3);
+ Index subcols = cols>0 ? l2/(4 * sizeof(Scalar) * otherStride) : 0;
+ subcols = std::max<Index>((subcols/Traits::nr)*Traits::nr, Traits::nr);
+
+ for(Index k2=IsLower ? 0 : size;
+ IsLower ? k2<size : k2>0;
+ IsLower ? k2+=kc : k2-=kc)
+ {
+ const Index actual_kc = (std::min)(IsLower ? size-k2 : k2, kc);
+
+ // We have selected and packed a big horizontal panel R1 of rhs. Let B be the packed copy of this panel,
+ // and R2 the remaining part of rhs. The corresponding vertical panel of lhs is split into
+ // A11 (the triangular part) and A21 the remaining rectangular part.
+ // Then the high level algorithm is:
+ // - B = R1 => general block copy (done during the next step)
+ // - R1 = A11^-1 B => tricky part
+ // - update B from the new R1 => actually this has to be performed continuously during the above step
+ // - R2 -= A21 * B => GEPP
+
+ // The tricky part: compute R1 = A11^-1 B while updating B from R1
+ // The idea is to split A11 into multiple small vertical panels.
+ // Each panel can be split into a small triangular part T1k which is processed without optimization,
+ // and the remaining small part T2k which is processed using gebp with appropriate block strides
+ for(Index j2=0; j2<cols; j2+=subcols)
+ {
+ Index actual_cols = (std::min)(cols-j2,subcols);
+ // for each small vertical panels [T1k^T, T2k^T]^T of lhs
+ for (Index k1=0; k1<actual_kc; k1+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-k1, SmallPanelWidth);
+ // tr solve
+ for (Index k=0; k<actualPanelWidth; ++k)
+ {
+ // TODO write a small kernel handling this (can be shared with trsv)
+ Index i = IsLower ? k2+k1+k : k2-k1-k-1;
+ Index s = IsLower ? k2+k1 : i+1;
+ Index rs = actualPanelWidth - k - 1; // remaining size
+
+ Scalar a = (Mode & UnitDiag) ? Scalar(1) : Scalar(1)/conj(tri(i,i));
+ for (Index j=j2; j<j2+actual_cols; ++j)
+ {
+ if (TriStorageOrder==RowMajor)
+ {
+ Scalar b(0);
+ const Scalar* l = &tri(i,s);
+ Scalar* r = &other(s,j);
+ for (Index i3=0; i3<k; ++i3)
+ b += conj(l[i3]) * r[i3];
+
+ other(i,j) = (other(i,j) - b)*a;
+ }
+ else
+ {
+ Index s = IsLower ? i+1 : i-rs;
+ Scalar b = (other(i,j) *= a);
+ Scalar* r = &other(s,j);
+ const Scalar* l = &tri(s,i);
+ for (Index i3=0;i3<rs;++i3)
+ r[i3] -= b * conj(l[i3]);
+ }
+ }
+ }
+
+ Index lengthTarget = actual_kc-k1-actualPanelWidth;
+ Index startBlock = IsLower ? k2+k1 : k2-k1-actualPanelWidth;
+ Index blockBOffset = IsLower ? k1 : lengthTarget;
+
+ // update the respective rows of B from other
+ pack_rhs(blockB+actual_kc*j2, other.getSubMapper(startBlock,j2), actualPanelWidth, actual_cols, actual_kc, blockBOffset);
+
+ // GEBP
+ if (lengthTarget>0)
+ {
+ Index startTarget = IsLower ? k2+k1+actualPanelWidth : k2-actual_kc;
+
+ pack_lhs(blockA, tri.getSubMapper(startTarget,startBlock), actualPanelWidth, lengthTarget);
+
+ gebp_kernel(other.getSubMapper(startTarget,j2), blockA, blockB+actual_kc*j2, lengthTarget, actualPanelWidth, actual_cols, Scalar(-1),
+ actualPanelWidth, actual_kc, 0, blockBOffset);
+ }
+ }
+ }
+
+ // R2 -= A21 * B => GEPP
+ {
+ Index start = IsLower ? k2+kc : 0;
+ Index end = IsLower ? size : k2-kc;
+ for(Index i2=start; i2<end; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(mc,end-i2);
+ if (actual_mc>0)
+ {
+ pack_lhs(blockA, tri.getSubMapper(i2, IsLower ? k2 : k2-kc), actual_kc, actual_mc);
+
+ gebp_kernel(other.getSubMapper(i2, 0), blockA, blockB, actual_mc, actual_kc, cols, Scalar(-1), -1, -1, 0, 0);
+ }
+ }
+ }
+ }
+ }
+
+/* Optimized triangular solver with multiple left hand sides and the triangular matrix on the right
+ */
+template <typename Scalar, typename Index, int Mode, bool Conjugate, int TriStorageOrder>
+struct triangular_solve_matrix<Scalar,Index,OnTheRight,Mode,Conjugate,TriStorageOrder,ColMajor>
+{
+ static EIGEN_DONT_INLINE void run(
+ Index size, Index otherSize,
+ const Scalar* _tri, Index triStride,
+ Scalar* _other, Index otherStride,
+ level3_blocking<Scalar,Scalar>& blocking);
+};
+template <typename Scalar, typename Index, int Mode, bool Conjugate, int TriStorageOrder>
+EIGEN_DONT_INLINE void triangular_solve_matrix<Scalar,Index,OnTheRight,Mode,Conjugate,TriStorageOrder,ColMajor>::run(
+ Index size, Index otherSize,
+ const Scalar* _tri, Index triStride,
+ Scalar* _other, Index otherStride,
+ level3_blocking<Scalar,Scalar>& blocking)
+ {
+ Index rows = otherSize;
+
+ typedef blas_data_mapper<Scalar, Index, ColMajor> LhsMapper;
+ typedef const_blas_data_mapper<Scalar, Index, TriStorageOrder> RhsMapper;
+ LhsMapper lhs(_other, otherStride);
+ RhsMapper rhs(_tri, triStride);
+
+ typedef gebp_traits<Scalar,Scalar> Traits;
+ enum {
+ RhsStorageOrder = TriStorageOrder,
+ SmallPanelWidth = EIGEN_PLAIN_ENUM_MAX(Traits::mr,Traits::nr),
+ IsLower = (Mode&Lower) == Lower
+ };
+
+ Index kc = blocking.kc(); // cache block size along the K direction
+ Index mc = (std::min)(rows,blocking.mc()); // cache block size along the M direction
+
+ std::size_t sizeA = kc*mc;
+ std::size_t sizeB = kc*size;
+
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockA, sizeA, blocking.blockA());
+ ei_declare_aligned_stack_constructed_variable(Scalar, blockB, sizeB, blocking.blockB());
+
+ conj_if<Conjugate> conj;
+ gebp_kernel<Scalar, Scalar, Index, LhsMapper, Traits::mr, Traits::nr, false, Conjugate> gebp_kernel;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr, RhsStorageOrder> pack_rhs;
+ gemm_pack_rhs<Scalar, Index, RhsMapper, Traits::nr, RhsStorageOrder,false,true> pack_rhs_panel;
+ gemm_pack_lhs<Scalar, Index, LhsMapper, Traits::mr, Traits::LhsProgress, ColMajor, false, true> pack_lhs_panel;
+
+ for(Index k2=IsLower ? size : 0;
+ IsLower ? k2>0 : k2<size;
+ IsLower ? k2-=kc : k2+=kc)
+ {
+ const Index actual_kc = (std::min)(IsLower ? k2 : size-k2, kc);
+ Index actual_k2 = IsLower ? k2-actual_kc : k2 ;
+
+ Index startPanel = IsLower ? 0 : k2+actual_kc;
+ Index rs = IsLower ? actual_k2 : size - actual_k2 - actual_kc;
+ Scalar* geb = blockB+actual_kc*actual_kc;
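+ // blockB is split in two: the first actual_kc*actual_kc entries hold the packed
+ // panels overlapping the diagonal block, and geb holds the packed off-diagonal
+ // (rectangular) part of the rhs.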
+
+ if (rs>0) pack_rhs(geb, rhs.getSubMapper(actual_k2,startPanel), actual_kc, rs);
+
+ // triangular packing (we only pack the panels off the diagonal,
+ // neglecting the blocks overlapping the diagonal)
+ {
+ for (Index j2=0; j2<actual_kc; j2+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-j2, SmallPanelWidth);
+ Index actual_j2 = actual_k2 + j2;
+ Index panelOffset = IsLower ? j2+actualPanelWidth : 0;
+ Index panelLength = IsLower ? actual_kc-j2-actualPanelWidth : j2;
+
+ if (panelLength>0)
+ pack_rhs_panel(blockB+j2*actual_kc,
+ rhs.getSubMapper(actual_k2+panelOffset, actual_j2),
+ panelLength, actualPanelWidth,
+ actual_kc, panelOffset);
+ }
+ }
+
+ for(Index i2=0; i2<rows; i2+=mc)
+ {
+ const Index actual_mc = (std::min)(mc,rows-i2);
+
+ // triangular solver kernel
+ {
+ // for each small block of the diagonal (=> vertical panels of rhs)
+ for (Index j2 = IsLower
+ ? (actual_kc - ((actual_kc%SmallPanelWidth) ? Index(actual_kc%SmallPanelWidth)
+ : Index(SmallPanelWidth)))
+ : 0;
+ IsLower ? j2>=0 : j2<actual_kc;
+ IsLower ? j2-=SmallPanelWidth : j2+=SmallPanelWidth)
+ {
+ Index actualPanelWidth = std::min<Index>(actual_kc-j2, SmallPanelWidth);
+ Index absolute_j2 = actual_k2 + j2;
+ Index panelOffset = IsLower ? j2+actualPanelWidth : 0;
+ Index panelLength = IsLower ? actual_kc - j2 - actualPanelWidth : j2;
+
+ // GEBP
+ if(panelLength>0)
+ {
+ gebp_kernel(lhs.getSubMapper(i2,absolute_j2),
+ blockA, blockB+j2*actual_kc,
+ actual_mc, panelLength, actualPanelWidth,
+ Scalar(-1),
+ actual_kc, actual_kc, // strides
+ panelOffset, panelOffset); // offsets
+ }
+
+ // unblocked triangular solve
+ for (Index k=0; k<actualPanelWidth; ++k)
+ {
+ Index j = IsLower ? absolute_j2+actualPanelWidth-k-1 : absolute_j2+k;
+
+ Scalar* r = &lhs(i2,j);
+ for (Index k3=0; k3<k; ++k3)
+ {
+ Scalar b = conj(rhs(IsLower ? j+1+k3 : absolute_j2+k3,j));
+ Scalar* a = &lhs(i2,IsLower ? j+1+k3 : absolute_j2+k3);
+ for (Index i=0; i<actual_mc; ++i)
+ r[i] -= a[i] * b;
+ }
+ Scalar b = (Mode & UnitDiag) ? Scalar(1) : Scalar(1)/conj(rhs(j,j));
+ for (Index i=0; i<actual_mc; ++i)
+ r[i] *= b;
+ }
+
+ // pack the just computed part of lhs to A
+ pack_lhs_panel(blockA, LhsMapper(_other+absolute_j2*otherStride+i2, otherStride),
+ actualPanelWidth, actual_mc,
+ actual_kc, j2);
+ }
+ }
+
+ if (rs>0)
+ gebp_kernel(lhs.getSubMapper(i2, startPanel), blockA, geb,
+ actual_mc, actual_kc, rs, Scalar(-1),
+ -1, -1, 0, 0);
+ }
+ }
+ }
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_SOLVER_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix_MKL.h b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix_MKL.h
new file mode 100644
index 0000000000..6a0bb83393
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverMatrix_MKL.h
@@ -0,0 +1,155 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Triangular matrix-matrix solve functionality based on ?TRSM.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_TRIANGULAR_SOLVER_MATRIX_MKL_H
+#define EIGEN_TRIANGULAR_SOLVER_MATRIX_MKL_H
+
+namespace Eigen {
+
+namespace internal {
+
+// implements LeftSide op(triangular)^-1 * general
+#define EIGEN_MKL_TRSM_L(EIGTYPE, MKLTYPE, MKLPREFIX) \
+template <typename Index, int Mode, bool Conjugate, int TriStorageOrder> \
+struct triangular_solve_matrix<EIGTYPE,Index,OnTheLeft,Mode,Conjugate,TriStorageOrder,ColMajor> \
+{ \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ conjA = ((TriStorageOrder==ColMajor) && Conjugate) ? 1 : 0 \
+ }; \
+ static void run( \
+ Index size, Index otherSize, \
+ const EIGTYPE* _tri, Index triStride, \
+ EIGTYPE* _other, Index otherStride, level3_blocking<EIGTYPE,EIGTYPE>& /*blocking*/) \
+ { \
+ MKL_INT m = size, n = otherSize, lda, ldb; \
+ char side = 'L', uplo, diag='N', transa; \
+ /* Set alpha_ */ \
+ MKLTYPE alpha; \
+ EIGTYPE myone(1); \
+ assign_scalar_eig2mkl(alpha, myone); \
+ ldb = otherStride;\
+\
+ const EIGTYPE *a; \
+/* Set trans */ \
+ transa = (TriStorageOrder==RowMajor) ? ((Conjugate) ? 'C' : 'T') : 'N'; \
+/* Set uplo */ \
+ uplo = IsLower ? 'L' : 'U'; \
+ if (TriStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \
+/* Set a, lda */ \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, TriStorageOrder> MatrixTri; \
+ Map<const MatrixTri, 0, OuterStride<> > tri(_tri,size,size,OuterStride<>(triStride)); \
+ MatrixTri a_tmp; \
+\
+ if (conjA) { \
+ a_tmp = tri.conjugate(); \
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else { \
+ a = _tri; \
+ lda = triStride; \
+ } \
+ if (IsUnitDiag) diag='U'; \
+/* call ?trsm*/ \
+ MKLPREFIX##trsm(&side, &uplo, &transa, &diag, &m, &n, &alpha, (const MKLTYPE*)a, &lda, (MKLTYPE*)_other, &ldb); \
+ } \
+};
+
+EIGEN_MKL_TRSM_L(double, double, d)
+EIGEN_MKL_TRSM_L(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_TRSM_L(float, float, s)
+EIGEN_MKL_TRSM_L(scomplex, MKL_Complex8, c)
+
+
+// implements RightSide general * op(triangular)^-1
+#define EIGEN_MKL_TRSM_R(EIGTYPE, MKLTYPE, MKLPREFIX) \
+template <typename Index, int Mode, bool Conjugate, int TriStorageOrder> \
+struct triangular_solve_matrix<EIGTYPE,Index,OnTheRight,Mode,Conjugate,TriStorageOrder,ColMajor> \
+{ \
+ enum { \
+ IsLower = (Mode&Lower) == Lower, \
+ IsUnitDiag = (Mode&UnitDiag) ? 1 : 0, \
+ IsZeroDiag = (Mode&ZeroDiag) ? 1 : 0, \
+ conjA = ((TriStorageOrder==ColMajor) && Conjugate) ? 1 : 0 \
+ }; \
+ static void run( \
+ Index size, Index otherSize, \
+ const EIGTYPE* _tri, Index triStride, \
+ EIGTYPE* _other, Index otherStride, level3_blocking<EIGTYPE,EIGTYPE>& /*blocking*/) \
+ { \
+ MKL_INT m = otherSize, n = size, lda, ldb; \
+ char side = 'R', uplo, diag='N', transa; \
+ /* Set alpha_ */ \
+ MKLTYPE alpha; \
+ EIGTYPE myone(1); \
+ assign_scalar_eig2mkl(alpha, myone); \
+ ldb = otherStride;\
+\
+ const EIGTYPE *a; \
+/* Set trans */ \
+ transa = (TriStorageOrder==RowMajor) ? ((Conjugate) ? 'C' : 'T') : 'N'; \
+/* Set uplo */ \
+ uplo = IsLower ? 'L' : 'U'; \
+ if (TriStorageOrder==RowMajor) uplo = (uplo == 'L') ? 'U' : 'L'; \
+/* Set a, lda */ \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, TriStorageOrder> MatrixTri; \
+ Map<const MatrixTri, 0, OuterStride<> > tri(_tri,size,size,OuterStride<>(triStride)); \
+ MatrixTri a_tmp; \
+\
+ if (conjA) { \
+ a_tmp = tri.conjugate(); \
+ a = a_tmp.data(); \
+ lda = a_tmp.outerStride(); \
+ } else { \
+ a = _tri; \
+ lda = triStride; \
+ } \
+ if (IsUnitDiag) diag='U'; \
+/* call ?trsm*/ \
+ MKLPREFIX##trsm(&side, &uplo, &transa, &diag, &m, &n, &alpha, (const MKLTYPE*)a, &lda, (MKLTYPE*)_other, &ldb); \
+ /*std::cout << "TRSM_R specialization!\n";*/ \
+ } \
+};
+
+EIGEN_MKL_TRSM_R(double, double, d)
+EIGEN_MKL_TRSM_R(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_TRSM_R(float, float, s)
+EIGEN_MKL_TRSM_R(scomplex, MKL_Complex8, c)
+
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_SOLVER_MATRIX_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Core/products/TriangularSolverVector.h b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverVector.h
new file mode 100644
index 0000000000..b994759b26
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/products/TriangularSolverVector.h
@@ -0,0 +1,145 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULAR_SOLVER_VECTOR_H
+#define EIGEN_TRIANGULAR_SOLVER_VECTOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename LhsScalar, typename RhsScalar, typename Index, int Mode, bool Conjugate, int StorageOrder>
+struct triangular_solve_vector<LhsScalar, RhsScalar, Index, OnTheRight, Mode, Conjugate, StorageOrder>
+{
+ static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs)
+ {
+ triangular_solve_vector<LhsScalar,RhsScalar,Index,OnTheLeft,
+ ((Mode&Upper)==Upper ? Lower : Upper) | (Mode&UnitDiag),
+ Conjugate,StorageOrder==RowMajor?ColMajor:RowMajor
+ >::run(size, _lhs, lhsStride, rhs);
+ }
+};
+
+// forward and backward substitution, row-major, rhs is a vector
+template<typename LhsScalar, typename RhsScalar, typename Index, int Mode, bool Conjugate>
+struct triangular_solve_vector<LhsScalar, RhsScalar, Index, OnTheLeft, Mode, Conjugate, RowMajor>
+{
+ enum {
+ IsLower = ((Mode&Lower)==Lower)
+ };
+ static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs)
+ {
+ typedef Map<const Matrix<LhsScalar,Dynamic,Dynamic,RowMajor>, 0, OuterStride<> > LhsMap;
+ const LhsMap lhs(_lhs,size,size,OuterStride<>(lhsStride));
+
+ typedef const_blas_data_mapper<LhsScalar,Index,RowMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,ColMajor> RhsMapper;
+
+ typename internal::conditional<
+ Conjugate,
+ const CwiseUnaryOp<typename internal::scalar_conjugate_op<LhsScalar>,LhsMap>,
+ const LhsMap&>
+ ::type cjLhs(lhs);
+ static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH;
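+ // Forward substitution for the lower case (panels swept top to bottom),
+ // backward substitution for the upper case (panels swept bottom to top).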
+ for(Index pi=IsLower ? 0 : size;
+ IsLower ? pi<size : pi>0;
+ IsLower ? pi+=PanelWidth : pi-=PanelWidth)
+ {
+ Index actualPanelWidth = (std::min)(IsLower ? size - pi : pi, PanelWidth);
+
+ Index r = IsLower ? pi : size - pi; // remaining size
+ if (r > 0)
+ {
+ // let's directly call the low level product function because:
+ // 1 - it is faster to compile
+ // 2 - it is slightly faster at runtime
+ Index startRow = IsLower ? pi : pi-actualPanelWidth;
+ Index startCol = IsLower ? 0 : pi;
+
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,RowMajor,Conjugate,RhsScalar,RhsMapper,false>::run(
+ actualPanelWidth, r,
+ LhsMapper(&lhs.coeffRef(startRow,startCol), lhsStride),
+ RhsMapper(rhs + startCol, 1),
+ rhs + startRow, 1,
+ RhsScalar(-1));
+ }
+
+ for(Index k=0; k<actualPanelWidth; ++k)
+ {
+ Index i = IsLower ? pi+k : pi-k-1;
+ Index s = IsLower ? pi : i+1;
+ if (k>0)
+ rhs[i] -= (cjLhs.row(i).segment(s,k).transpose().cwiseProduct(Map<const Matrix<RhsScalar,Dynamic,1> >(rhs+s,k))).sum();
+
+ if(!(Mode & UnitDiag))
+ rhs[i] /= cjLhs(i,i);
+ }
+ }
+ }
+};
+
+// forward and backward substitution, column-major, rhs is a vector
+template<typename LhsScalar, typename RhsScalar, typename Index, int Mode, bool Conjugate>
+struct triangular_solve_vector<LhsScalar, RhsScalar, Index, OnTheLeft, Mode, Conjugate, ColMajor>
+{
+ enum {
+ IsLower = ((Mode&Lower)==Lower)
+ };
+ static void run(Index size, const LhsScalar* _lhs, Index lhsStride, RhsScalar* rhs)
+ {
+ typedef Map<const Matrix<LhsScalar,Dynamic,Dynamic,ColMajor>, 0, OuterStride<> > LhsMap;
+ const LhsMap lhs(_lhs,size,size,OuterStride<>(lhsStride));
+ typedef const_blas_data_mapper<LhsScalar,Index,ColMajor> LhsMapper;
+ typedef const_blas_data_mapper<RhsScalar,Index,ColMajor> RhsMapper;
+ typename internal::conditional<Conjugate,
+ const CwiseUnaryOp<typename internal::scalar_conjugate_op<LhsScalar>,LhsMap>,
+ const LhsMap&
+ >::type cjLhs(lhs);
+ static const Index PanelWidth = EIGEN_TUNE_TRIANGULAR_PANEL_WIDTH;
+
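+ // Same substitution order as above: forward for lower, backward for upper. Here
+ // the already-solved entries are eliminated column-wise from the remaining rhs.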
+ for(Index pi=IsLower ? 0 : size;
+ IsLower ? pi<size : pi>0;
+ IsLower ? pi+=PanelWidth : pi-=PanelWidth)
+ {
+ Index actualPanelWidth = (std::min)(IsLower ? size - pi : pi, PanelWidth);
+ Index startBlock = IsLower ? pi : pi-actualPanelWidth;
+ Index endBlock = IsLower ? pi + actualPanelWidth : 0;
+
+ for(Index k=0; k<actualPanelWidth; ++k)
+ {
+ Index i = IsLower ? pi+k : pi-k-1;
+ if(!(Mode & UnitDiag))
+ rhs[i] /= cjLhs.coeff(i,i);
+
+ Index r = actualPanelWidth - k - 1; // remaining size
+ Index s = IsLower ? i+1 : i-r;
+ if (r>0)
+ Map<Matrix<RhsScalar,Dynamic,1> >(rhs+s,r) -= rhs[i] * cjLhs.col(i).segment(s,r);
+ }
+ Index r = IsLower ? size - endBlock : startBlock; // remaining size
+ if (r > 0)
+ {
+ // let's directly call the low level product function because:
+ // 1 - it is faster to compile
+ // 2 - it is slightly faster at runtime
+ general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,Conjugate,RhsScalar,RhsMapper,false>::run(
+ r, actualPanelWidth,
+ LhsMapper(&lhs.coeffRef(endBlock,startBlock), lhsStride),
+ RhsMapper(rhs+startBlock, 1),
+ rhs+endBlock, 1, RhsScalar(-1));
+ }
+ }
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_SOLVER_VECTOR_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/BlasUtil.h b/third_party/eigen3/Eigen/src/Core/util/BlasUtil.h
new file mode 100644
index 0000000000..bbaff8dd0e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/BlasUtil.h
@@ -0,0 +1,237 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BLASUTIL_H
+#define EIGEN_BLASUTIL_H
+
+// This file contains many lightweight helper classes used to
+// implement and control fast level 2 and level 3 BLAS-like routines.
+
+namespace Eigen {
+
+namespace internal {
+
+// forward declarations
+template<typename LhsScalar, typename RhsScalar, typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs=false, bool ConjugateRhs=false>
+struct gebp_kernel;
+
+template<typename Scalar, typename Index, typename DataMapper, int nr, int StorageOrder, bool Conjugate = false, bool PanelMode=false>
+struct gemm_pack_rhs;
+
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, int StorageOrder, bool Conjugate = false, bool PanelMode = false>
+struct gemm_pack_lhs;
+
+template<
+ typename Index,
+ typename LhsScalar, int LhsStorageOrder, bool ConjugateLhs,
+ typename RhsScalar, int RhsStorageOrder, bool ConjugateRhs,
+ int ResStorageOrder>
+struct general_matrix_matrix_product;
+
+template<typename Index, typename LhsScalar, typename LhsMapper, int LhsStorageOrder, bool ConjugateLhs, typename RhsScalar, typename RhsMapper, bool ConjugateRhs, int Version=Specialized>
+struct general_matrix_vector_product;
+
+
+template<bool Conjugate> struct conj_if;
+
+template<> struct conj_if<true> {
+ template<typename T>
+ inline T operator()(const T& x) { return numext::conj(x); }
+ template<typename T>
+ inline T pconj(const T& x) { return internal::pconj(x); }
+};
+
+template<> struct conj_if<false> {
+ template<typename T>
+ inline const T& operator()(const T& x) { return x; }
+ template<typename T>
+ inline const T& pconj(const T& x) { return x; }
+};
+
+template<typename Scalar> struct conj_helper<Scalar,Scalar,false,false>
+{
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const { return internal::pmadd(x,y,c); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const { return internal::pmul(x,y); }
+};
+
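+// The three complex/complex specializations below expand the product in real
+// arithmetic with the requested conjugations applied: (false,true) computes
+// x*conj(y), (true,false) computes conj(x)*y, and (true,true) computes
+// conj(x)*conj(y).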
+template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, false,true>
+{
+ typedef std::complex<RealScalar> Scalar;
+ EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
+ { return c + pmul(x,y); }
+
+ EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
+ { return Scalar(numext::real(x)*numext::real(y) + numext::imag(x)*numext::imag(y), numext::imag(x)*numext::real(y) - numext::real(x)*numext::imag(y)); }
+};
+
+template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, true,false>
+{
+ typedef std::complex<RealScalar> Scalar;
+ EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
+ { return c + pmul(x,y); }
+
+ EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
+ { return Scalar(numext::real(x)*numext::real(y) + numext::imag(x)*numext::imag(y), numext::real(x)*numext::imag(y) - numext::imag(x)*numext::real(y)); }
+};
+
+template<typename RealScalar> struct conj_helper<std::complex<RealScalar>, std::complex<RealScalar>, true,true>
+{
+ typedef std::complex<RealScalar> Scalar;
+ EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const Scalar& y, const Scalar& c) const
+ { return c + pmul(x,y); }
+
+ EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const Scalar& y) const
+ { return Scalar(numext::real(x)*numext::real(y) - numext::imag(x)*numext::imag(y), - numext::real(x)*numext::imag(y) - numext::imag(x)*numext::real(y)); }
+};
+
+template<typename RealScalar,bool Conj> struct conj_helper<std::complex<RealScalar>, RealScalar, Conj,false>
+{
+ typedef std::complex<RealScalar> Scalar;
+ EIGEN_STRONG_INLINE Scalar pmadd(const Scalar& x, const RealScalar& y, const Scalar& c) const
+ { return padd(c, pmul(x,y)); }
+ EIGEN_STRONG_INLINE Scalar pmul(const Scalar& x, const RealScalar& y) const
+ { return conj_if<Conj>()(x)*y; }
+};
+
+template<typename RealScalar,bool Conj> struct conj_helper<RealScalar, std::complex<RealScalar>, false,Conj>
+{
+ typedef std::complex<RealScalar> Scalar;
+ EIGEN_STRONG_INLINE Scalar pmadd(const RealScalar& x, const Scalar& y, const Scalar& c) const
+ { return padd(c, pmul(x,y)); }
+ EIGEN_STRONG_INLINE Scalar pmul(const RealScalar& x, const Scalar& y) const
+ { return x*conj_if<Conj>()(y); }
+};
+
+template<typename From,typename To> struct get_factor {
+ EIGEN_DEVICE_FUNC static EIGEN_STRONG_INLINE To run(const From& x) { return x; }
+};
+
+template<typename Scalar> struct get_factor<Scalar,typename NumTraits<Scalar>::Real> {
+ EIGEN_DEVICE_FUNC
+ static EIGEN_STRONG_INLINE typename NumTraits<Scalar>::Real run(const Scalar& x) { return numext::real(x); }
+};
+
+
+/* Helper class to analyze the factors of a Product expression.
+ * In particular it allows one to peel off negations (operator-), scalar
+ * multiples, conjugates and transposes */
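+// For instance, for an expression of the form (alpha * A).adjoint(), the nested
+// blas_traits specializations below extract the plain matrix A, set IsTransposed
+// and NeedToConjugate accordingly, and extractScalarFactor() returns the
+// accumulated scalar factor, so the BLAS-like kernels can run directly on A's data.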
+template<typename XprType> struct blas_traits
+{
+ typedef typename traits<XprType>::Scalar Scalar;
+ typedef const XprType& ExtractType;
+ typedef XprType _ExtractType;
+ enum {
+ IsComplex = NumTraits<Scalar>::IsComplex,
+ IsTransposed = false,
+ NeedToConjugate = false,
+ HasUsableDirectAccess = ( (int(XprType::Flags)&DirectAccessBit)
+ && ( bool(XprType::IsVectorAtCompileTime)
+ || int(inner_stride_at_compile_time<XprType>::ret) == 1)
+ ) ? 1 : 0
+ };
+ typedef typename conditional<bool(HasUsableDirectAccess),
+ ExtractType,
+ typename _ExtractType::PlainObject
+ >::type DirectLinearAccessType;
+ static inline ExtractType extract(const XprType& x) { return x; }
+ static inline const Scalar extractScalarFactor(const XprType&) { return Scalar(1); }
+};
+
+// pop conjugate
+template<typename Scalar, typename NestedXpr>
+struct blas_traits<CwiseUnaryOp<scalar_conjugate_op<Scalar>, NestedXpr> >
+ : blas_traits<NestedXpr>
+{
+ typedef blas_traits<NestedXpr> Base;
+ typedef CwiseUnaryOp<scalar_conjugate_op<Scalar>, NestedXpr> XprType;
+ typedef typename Base::ExtractType ExtractType;
+
+ enum {
+ IsComplex = NumTraits<Scalar>::IsComplex,
+ NeedToConjugate = Base::NeedToConjugate ? 0 : IsComplex
+ };
+ static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); }
+ static inline Scalar extractScalarFactor(const XprType& x) { return conj(Base::extractScalarFactor(x.nestedExpression())); }
+};
+
+// pop scalar multiple
+template<typename Scalar, typename NestedXpr>
+struct blas_traits<CwiseUnaryOp<scalar_multiple_op<Scalar>, NestedXpr> >
+ : blas_traits<NestedXpr>
+{
+ typedef blas_traits<NestedXpr> Base;
+ typedef CwiseUnaryOp<scalar_multiple_op<Scalar>, NestedXpr> XprType;
+ typedef typename Base::ExtractType ExtractType;
+ static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); }
+ static inline Scalar extractScalarFactor(const XprType& x)
+ { return x.functor().m_other * Base::extractScalarFactor(x.nestedExpression()); }
+};
+
+// pop opposite
+template<typename Scalar, typename NestedXpr>
+struct blas_traits<CwiseUnaryOp<scalar_opposite_op<Scalar>, NestedXpr> >
+ : blas_traits<NestedXpr>
+{
+ typedef blas_traits<NestedXpr> Base;
+ typedef CwiseUnaryOp<scalar_opposite_op<Scalar>, NestedXpr> XprType;
+ typedef typename Base::ExtractType ExtractType;
+ static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); }
+ static inline Scalar extractScalarFactor(const XprType& x)
+ { return - Base::extractScalarFactor(x.nestedExpression()); }
+};
+
+// pop/push transpose
+template<typename NestedXpr>
+struct blas_traits<Transpose<NestedXpr> >
+ : blas_traits<NestedXpr>
+{
+ typedef typename NestedXpr::Scalar Scalar;
+ typedef blas_traits<NestedXpr> Base;
+ typedef Transpose<NestedXpr> XprType;
+ typedef Transpose<const typename Base::_ExtractType> ExtractType; // const to get rid of a compile error; anyway blas traits are only used on the RHS
+ typedef Transpose<const typename Base::_ExtractType> _ExtractType;
+ typedef typename conditional<bool(Base::HasUsableDirectAccess),
+ ExtractType,
+ typename ExtractType::PlainObject
+ >::type DirectLinearAccessType;
+ enum {
+ IsTransposed = Base::IsTransposed ? 0 : 1
+ };
+ static inline ExtractType extract(const XprType& x) { return Base::extract(x.nestedExpression()); }
+ static inline Scalar extractScalarFactor(const XprType& x) { return Base::extractScalarFactor(x.nestedExpression()); }
+};
+
+template<typename T>
+struct blas_traits<const T>
+ : blas_traits<T>
+{};
+
+template<typename T, bool HasUsableDirectAccess=blas_traits<T>::HasUsableDirectAccess>
+struct extract_data_selector {
+ static const typename T::Scalar* run(const T& m)
+ {
+ return blas_traits<T>::extract(m).data();
+ }
+};
+
+template<typename T>
+struct extract_data_selector<T,false> {
+ static typename T::Scalar* run(const T&) { return 0; }
+};
+
+template<typename T> const typename T::Scalar* extract_data(const T& m)
+{
+ return extract_data_selector<T>::run(m);
+}
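+// Illustrative note (editorial sketch, not part of upstream Eigen): for an
+// expression such as  s * A.adjoint()  with A a MatrixXcf and s a complex scalar,
+// the nested blas_traits specializations peel off the scalar multiple, the
+// transpose and the conjugation: extractScalarFactor() returns s, IsTransposed
+// and NeedToConjugate end up set, and extract_data() yields A's raw pointer,
+// which a BLAS-style kernel can then consume directly.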
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BLASUTIL_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/Constants.h b/third_party/eigen3/Eigen/src/Core/util/Constants.h
new file mode 100644
index 0000000000..be14df0168
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/Constants.h
@@ -0,0 +1,453 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2007-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CONSTANTS_H
+#define EIGEN_CONSTANTS_H
+
+namespace Eigen {
+
+/** This value means that a positive quantity (e.g., a size) is not known at compile-time, and that instead the value is
+ * stored in some runtime variable.
+ *
+ * Changing the value of Dynamic breaks the ABI, as Dynamic is often used as a template parameter for Matrix.
+ */
+const int Dynamic = -1;
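+// Illustrative sketch (editorial note, not part of upstream Eigen): Dynamic is used
+// wherever a dimension is only known at runtime, e.g.
+// \code
+// Eigen::Matrix<float, Eigen::Dynamic, 3> m(5, 3);  // row count chosen at runtime
+// Eigen::MatrixXd d(4, 4);                          // MatrixXd == Matrix<double, Dynamic, Dynamic>
+// \endcode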
+
+/** This value means that a signed quantity (e.g., a signed index) is not known at compile-time, and that instead its value
+ * has to be specified at runtime.
+ */
+const int DynamicIndex = 0xffffff;
+
+/** This value means +Infinity; it is currently used only as the p parameter to MatrixBase::lpNorm<int>().
+ * The value Infinity there means the L-infinity norm.
+ */
+const int Infinity = -1;
+
+/** \defgroup flags Flags
+ * \ingroup Core_Module
+ *
+ * These are the possible bits which can be OR'ed to constitute the flags of a matrix or
+ * expression.
+ *
+ * It is important to note that these flags are a purely compile-time notion. They are a compile-time property of
+ * an expression type, implemented as enum's. They are not stored in memory at runtime, and they do not incur any
+ * runtime overhead.
+ *
+ * \sa MatrixBase::Flags
+ */
+
+/** \ingroup flags
+ *
+ * for a matrix, this means that the storage order is row-major.
+ * If this bit is not set, the storage order is column-major.
+ * For an expression, this determines the storage order of
+ * the matrix created by evaluation of that expression.
+ * \sa \ref TopicStorageOrders */
+const unsigned int RowMajorBit = 0x1;
+
+/** \ingroup flags
+ *
+ * means the expression should be evaluated by the calling expression */
+const unsigned int EvalBeforeNestingBit = 0x2;
+
+/** \ingroup flags
+ *
+ * means the expression should be evaluated before any assignment */
+const unsigned int EvalBeforeAssigningBit = 0x4;
+
+/** \ingroup flags
+ *
+ * Short version: means the expression might be vectorized
+ *
+ * Long version: means that the coefficients can be handled by packets
+ * and start at a memory location whose alignment meets the requirements
+ * of the present CPU architecture for optimized packet access. In the fixed-size
+ * case, there is the additional condition that it be possible to access all the
+ * coefficients by packets (this implies the requirement that the size be a multiple of 16 bytes,
+ * and that any nontrivial strides don't break the alignment). In the dynamic-size case,
+ * there is no such condition on the total size and strides, so it might not be possible to access
+ * all coeffs by packets.
+ *
+ * \note This bit can be set regardless of whether vectorization is actually enabled.
+ * To check for actual vectorizability, see \a ActualPacketAccessBit.
+ */
+const unsigned int PacketAccessBit = 0x8;
+
+#ifdef EIGEN_VECTORIZE
+/** \ingroup flags
+ *
+ * If vectorization is enabled (EIGEN_VECTORIZE is defined) this constant
+ * is set to the value \a PacketAccessBit.
+ *
+ * If vectorization is not enabled (EIGEN_VECTORIZE is not defined) this constant
+ * is set to the value 0.
+ */
+const unsigned int ActualPacketAccessBit = PacketAccessBit;
+#else
+const unsigned int ActualPacketAccessBit = 0x0;
+#endif
+
+/** \ingroup flags
+ *
+ * Short version: means the expression can be seen as a 1D vector.
+ *
+ * Long version: means that one can access the coefficients
+ * of this expression by coeff(int), and coeffRef(int) in the case of an lvalue expression. These
+ * index-based access methods are guaranteed not to require any runtime computation of a
+ * (row, col) pair from the index, so whenever index-based access is available it is at least as
+ * fast as (row,col)-based access. Expressions for which that isn't possible don't have the LinearAccessBit.
+ *
+ * If both PacketAccessBit and LinearAccessBit are set, then the
+ * packets of this expression can be accessed by packet(int), and writePacket(int) in the case of
+ * an lvalue expression.
+ *
+ * Typically, all vector expressions have the LinearAccessBit, but there is one exception:
+ * Product expressions don't have it, because it would be troublesome for vectorization, even when the
+ * Product is a vector expression. Thus, vector Product expressions allow index-based coefficient access but
+ * not index-based packet access, so they don't have the LinearAccessBit.
+ */
+const unsigned int LinearAccessBit = 0x10;
+
+/** \ingroup flags
+ *
+ * Means the expression has a coeffRef() method, i.e. is writable as its individual coefficients are directly addressable.
+ * This rules out read-only expressions.
+ *
+ * Note that DirectAccessBit and LvalueBit are mutually orthogonal, as there are examples of expressions having one but not
+ * the other:
+ * \li writable expressions that don't have a very simple memory layout as a strided array have LvalueBit but not DirectAccessBit
+ * \li Map-to-const expressions, for example Map<const Matrix>, have DirectAccessBit but not LvalueBit
+ *
+ * Expressions having LvalueBit also have their coeff() method returning a const reference instead of returning a new value.
+ */
+const unsigned int LvalueBit = 0x20;
+
+/** \ingroup flags
+ *
+ * Means that the underlying array of coefficients can be directly accessed as a plain strided array. The memory layout
+ * of the array of coefficients must be exactly the natural one suggested by rows(), cols(),
+ * outerStride(), innerStride(), and the RowMajorBit. This rules out expressions such as Diagonal, whose coefficients,
+ * though referenceable, do not have such a regular memory layout.
+ *
+ * See the comment on LvalueBit for an explanation of how LvalueBit and DirectAccessBit are mutually orthogonal.
+ */
+const unsigned int DirectAccessBit = 0x40;
+
+/** \ingroup flags
+ *
+ * means the first coefficient packet is guaranteed to be aligned.
+ * An expression cannot have the AlignedBit without the PacketAccessBit flag.
+ * In other words, this means we are allowed to perform an aligned packet access to the first element regardless
+ * of the expression kind:
+ * \code
+ * expression.packet<Aligned>(0);
+ * \endcode
+ */
+const unsigned int AlignedBit = 0x80;
+
+const unsigned int NestByRefBit = 0x100;
+
+// list of flags that are inherited by default
+const unsigned int HereditaryBits = RowMajorBit
+ | EvalBeforeNestingBit
+ | EvalBeforeAssigningBit;
+
+/** \defgroup enums Enumerations
+ * \ingroup Core_Module
+ *
+ * Various enumerations used in %Eigen. Many of these are used as template parameters.
+ */
+
+/** \ingroup enums
+ * Enum containing possible values for the \p Mode parameter of
+ * MatrixBase::selfadjointView() and MatrixBase::triangularView(). */
+enum {
+ /** View matrix as a lower triangular matrix. */
+ Lower=0x1,
+ /** View matrix as an upper triangular matrix. */
+ Upper=0x2,
+ /** %Matrix has ones on the diagonal; to be used in combination with #Lower or #Upper. */
+ UnitDiag=0x4,
+ /** %Matrix has zeros on the diagonal; to be used in combination with #Lower or #Upper. */
+ ZeroDiag=0x8,
+ /** View matrix as a lower triangular matrix with ones on the diagonal. */
+ UnitLower=UnitDiag|Lower,
+ /** View matrix as an upper triangular matrix with ones on the diagonal. */
+ UnitUpper=UnitDiag|Upper,
+ /** View matrix as a lower triangular matrix with zeros on the diagonal. */
+ StrictlyLower=ZeroDiag|Lower,
+ /** View matrix as an upper triangular matrix with zeros on the diagonal. */
+ StrictlyUpper=ZeroDiag|Upper,
+ /** Used in BandMatrix and SelfAdjointView to indicate that the matrix is self-adjoint. */
+ SelfAdjoint=0x10,
+ /** Used to support symmetric, non-selfadjoint, complex matrices. */
+ Symmetric=0x20
+};
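+// Illustrative sketch (editorial note, not part of upstream Eigen): these values are
+// typically passed to triangularView() / selfadjointView(), e.g.
+// \code
+// Eigen::Matrix3d A, B;
+// A.setRandom(); B.setRandom();
+// Eigen::Vector3d b = Eigen::Vector3d::Ones();
+// Eigen::Vector3d x = A.triangularView<Eigen::Lower>().solve(b);  // forward substitution
+// Eigen::Matrix3d S = B.selfadjointView<Eigen::Upper>();          // symmetric view built from B's upper part
+// \endcode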
+
+/** \ingroup enums
+ * Enum for indicating whether an object is aligned or not. */
+enum {
+ /** Object is not correctly aligned for vectorization. */
+ Unaligned=0,
+ /** Object is aligned for vectorization. */
+ Aligned=1
+};
+
+/** \ingroup enums
+ * Enum used by DenseBase::corner() in Eigen2 compatibility mode. */
+// FIXME after the corner() API change, this was not needed anymore, except by AlignedBox
+// TODO: find out what to do with that. Adapt the AlignedBox API ?
+enum CornerType { TopLeft, TopRight, BottomLeft, BottomRight };
+
+/** \ingroup enums
+ * Enum containing possible values for the \p Direction parameter of
+ * Reverse, PartialReduxExpr and VectorwiseOp. */
+enum DirectionType {
+ /** For Reverse, all columns are reversed;
+ * for PartialReduxExpr and VectorwiseOp, act on columns. */
+ Vertical,
+ /** For Reverse, all rows are reversed;
+ * for PartialReduxExpr and VectorwiseOp, act on rows. */
+ Horizontal,
+ /** For Reverse, both rows and columns are reversed;
+ * not used for PartialReduxExpr and VectorwiseOp. */
+ BothDirections
+};
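+// Illustrative sketch (editorial note, not part of upstream Eigen): roughly, colwise()
+// and rowwise() build VectorwiseOp objects with the Vertical and Horizontal directions, e.g.
+// \code
+// Eigen::MatrixXd m(2, 3);
+// m << 1, 2, 3,
+//      4, 5, 6;
+// Eigen::RowVector3d colSums = m.colwise().sum();      // acts per column (Vertical)
+// Eigen::Vector2d rowMaxs = m.rowwise().maxCoeff();    // acts per row (Horizontal)
+// \endcode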
+
+/** \internal \ingroup enums
+ * Enum to specify how to traverse the entries of a matrix. */
+enum {
+ /** \internal Default traversal, no vectorization, no index-based access */
+ DefaultTraversal,
+ /** \internal No vectorization, use index-based access to have only one for loop instead of 2 nested loops */
+ LinearTraversal,
+ /** \internal Equivalent to a slice vectorization for fixed-size matrices having good alignment
+ * and good size */
+ InnerVectorizedTraversal,
+ /** \internal Vectorization path using a single loop plus scalar loops for the
+ * unaligned boundaries */
+ LinearVectorizedTraversal,
+ /** \internal Generic vectorization path using one vectorized loop per row/column with some
+ * scalar loops to handle the unaligned boundaries */
+ SliceVectorizedTraversal,
+ /** \internal Special case to properly handle incompatible scalar types or other defective cases */
+ InvalidTraversal,
+ /** \internal Evaluate all entries at once */
+ AllAtOnceTraversal
+};
+
+/** \internal \ingroup enums
+ * Enum to specify whether to unroll loops when traversing over the entries of a matrix. */
+enum {
+ /** \internal Do not unroll loops. */
+ NoUnrolling,
+ /** \internal Unroll only the inner loop, but not the outer loop. */
+ InnerUnrolling,
+ /** \internal Unroll both the inner and the outer loop. If there is only one loop,
+ * because linear traversal is used, then unroll that loop. */
+ CompleteUnrolling
+};
+
+/** \internal \ingroup enums
+ * Enum to specify whether to use the default (built-in) implementation or the specialization. */
+enum {
+ Specialized,
+ BuiltIn
+};
+
+/** \ingroup enums
+ * Enum containing possible values for the \p _Options template parameter of
+ * Matrix, Array and BandMatrix. */
+enum {
+ /** Storage order is column major (see \ref TopicStorageOrders). */
+ ColMajor = 0,
+ /** Storage order is row major (see \ref TopicStorageOrders). */
+ RowMajor = 0x1, // it is only a coincidence that this is equal to RowMajorBit -- don't rely on that
+ /** Align the matrix itself if it is of vectorizable fixed size */
+ AutoAlign = 0,
+ /** Don't require alignment for the matrix itself (the array of coefficients, if dynamically allocated, may still be requested to be aligned) */ // FIXME --- clarify the situation
+ DontAlign = 0x2,
+ AllocateDefault = 0,
+ AllocateUVM = 0x8
+};
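+// Illustrative sketch (editorial note, not part of upstream Eigen): the storage order
+// is the third template parameter of Matrix/Array, e.g.
+// \code
+// Eigen::Matrix<float, 3, 4, Eigen::RowMajor> r;  // rows stored contiguously
+// Eigen::Matrix<float, 3, 4>                  c;  // default: ColMajor
+// \endcode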
+
+/** \ingroup enums
+ * Enum for specifying whether to apply or solve on the left or right. */
+enum {
+ /** Apply transformation on the left. */
+ OnTheLeft = 1,
+ /** Apply transformation on the right. */
+ OnTheRight = 2
+};
+
+/* the following used to be written as:
+ *
+ * struct NoChange_t {};
+ * namespace {
+ * EIGEN_UNUSED NoChange_t NoChange;
+ * }
+ *
+ * on the grounds that it feels dangerous to disambiguate overloaded functions on enum/integer types.
+ * However, this leads to "variable declared but never referenced" warnings on Intel Composer XE,
+ * and we do not know how to get rid of them (bug 450).
+ */
+
+enum NoChange_t { NoChange };
+enum Sequential_t { Sequential };
+enum Default_t { Default };
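+// Illustrative sketch (editorial note, not part of upstream Eigen): NoChange is passed
+// to the resize functions to keep one dimension untouched, e.g.
+// \code
+// Eigen::MatrixXd m(3, 5);
+// m.conservativeResize(Eigen::NoChange, 8);  // still 3 rows, now 8 columns
+// \endcode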
+
+/** \internal \ingroup enums
+ * Used in AmbiVector. */
+enum {
+ IsDense = 0,
+ IsSparse
+};
+
+/** \ingroup enums
+ * Used as template parameter in DenseCoeffBase and MapBase to indicate
+ * which accessors should be provided. */
+enum AccessorLevels {
+ /** Read-only access via a member function. */
+ ReadOnlyAccessors,
+ /** Read/write access via member functions. */
+ WriteAccessors,
+ /** Direct read-only access to the coefficients. */
+ DirectAccessors,
+ /** Direct read/write access to the coefficients. */
+ DirectWriteAccessors
+};
+
+/** \ingroup enums
+ * Enum with options to give to various decompositions. */
+enum DecompositionOptions {
+ /** \internal Not used (meant for LDLT?). */
+ Pivoting = 0x01,
+ /** \internal Not used (meant for LDLT?). */
+ NoPivoting = 0x02,
+ /** Used in JacobiSVD to indicate that the square matrix U is to be computed. */
+ ComputeFullU = 0x04,
+ /** Used in JacobiSVD to indicate that the thin matrix U is to be computed. */
+ ComputeThinU = 0x08,
+ /** Used in JacobiSVD to indicate that the square matrix V is to be computed. */
+ ComputeFullV = 0x10,
+ /** Used in JacobiSVD to indicate that the thin matrix V is to be computed. */
+ ComputeThinV = 0x20,
+ /** Used in SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver to specify
+ * that only the eigenvalues are to be computed and not the eigenvectors. */
+ EigenvaluesOnly = 0x40,
+ /** Used in SelfAdjointEigenSolver and GeneralizedSelfAdjointEigenSolver to specify
+ * that both the eigenvalues and the eigenvectors are to be computed. */
+ ComputeEigenvectors = 0x80,
+ /** \internal */
+ EigVecMask = EigenvaluesOnly | ComputeEigenvectors,
+ /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should
+ * solve the generalized eigenproblem \f$ Ax = \lambda B x \f$. */
+ Ax_lBx = 0x100,
+ /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should
+ * solve the generalized eigenproblem \f$ ABx = \lambda x \f$. */
+ ABx_lx = 0x200,
+ /** Used in GeneralizedSelfAdjointEigenSolver to indicate that it should
+ * solve the generalized eigenproblem \f$ BAx = \lambda x \f$. */
+ BAx_lx = 0x400,
+ /** \internal */
+ GenEigMask = Ax_lBx | ABx_lx | BAx_lx
+};
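+// Illustrative sketch (editorial note, not part of upstream Eigen): the Compute* options
+// are OR'ed together when constructing a decomposition, e.g.
+// \code
+// Eigen::MatrixXf A(6, 4);
+// A.setRandom();
+// Eigen::VectorXf b = Eigen::VectorXf::Ones(6);
+// Eigen::JacobiSVD<Eigen::MatrixXf> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
+// Eigen::VectorXf x = svd.solve(b);  // least-squares solution of A x = b
+// \endcode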
+
+/** \ingroup enums
+ * Possible values for the \p QRPreconditioner template parameter of JacobiSVD. */
+enum QRPreconditioners {
+ /** Do not specify what is to be done if the SVD of a non-square matrix is asked for. */
+ NoQRPreconditioner,
+ /** Use a QR decomposition without pivoting as the first step. */
+ HouseholderQRPreconditioner,
+ /** Use a QR decomposition with column pivoting as the first step. */
+ ColPivHouseholderQRPreconditioner,
+ /** Use a QR decomposition with full pivoting as the first step. */
+ FullPivHouseholderQRPreconditioner
+};
+
+#ifdef Success
+#error The preprocessor symbol 'Success' is defined, possibly by the X11 header file X.h
+#endif
+
+/** \ingroup enums
+ * Enum for reporting the status of a computation. */
+enum ComputationInfo {
+ /** Computation was successful. */
+ Success = 0,
+ /** The provided data did not satisfy the prerequisites. */
+ NumericalIssue = 1,
+ /** Iterative procedure did not converge. */
+ NoConvergence = 2,
+ /** The inputs are invalid, or the algorithm has been improperly called.
+ * When assertions are enabled, such errors trigger an assert. */
+ InvalidInput = 3
+};
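+// Illustrative sketch (editorial note, not part of upstream Eigen): solvers report this
+// status through their info() member, e.g.
+// \code
+// Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
+// Eigen::LLT<Eigen::MatrixXd> llt(A.transpose() * A);  // SPD by construction
+// if (llt.info() != Eigen::Success) { /* NumericalIssue: matrix was not positive definite */ }
+// \endcode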
+
+/** \ingroup enums
+ * Enum used to specify how a particular transformation is stored in a matrix.
+ * \sa Transform, Hyperplane::transform(). */
+enum TransformTraits {
+ /** Transformation is an isometry. */
+ Isometry = 0x1,
+ /** Transformation is an affine transformation stored as a (Dim+1)^2 matrix whose last row is
+ * assumed to be [0 ... 0 1]. */
+ Affine = 0x2,
+ /** Transformation is an affine transformation stored as a (Dim) x (Dim+1) matrix. */
+ AffineCompact = 0x10 | Affine,
+ /** Transformation is a general projective transformation stored as a (Dim+1)^2 matrix. */
+ Projective = 0x20
+};
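+// Illustrative sketch (editorial note, not part of upstream Eigen): the traits select how
+// a Transform is stored, e.g.
+// \code
+// Eigen::Transform<float, 3, Eigen::Affine> T = Eigen::Transform<float, 3, Eigen::Affine>::Identity();
+// T.translate(Eigen::Vector3f(1, 2, 3));
+// Eigen::Vector3f p = T * Eigen::Vector3f::Zero();  // p == (1, 2, 3)
+// \endcode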
+
+/** \internal \ingroup enums
+ * Enum used to choose between implementation depending on the computer architecture. */
+namespace Architecture
+{
+ enum Type {
+ Generic = 0x0,
+ SSE = 0x1,
+ AltiVec = 0x2,
+ VSX = 0x3,
+ NEON = 0x4,
+#if defined EIGEN_VECTORIZE_SSE
+ Target = SSE
+#elif defined EIGEN_VECTORIZE_ALTIVEC
+ Target = AltiVec
+#elif defined EIGEN_VECTORIZE_VSX
+ Target = VSX
+#elif defined EIGEN_VECTORIZE_NEON
+ Target = NEON
+#else
+ Target = Generic
+#endif
+ };
+}
+
+/** \internal \ingroup enums
+ * Enum used as template parameter in GeneralProduct. */
+enum { CoeffBasedProductMode, LazyCoeffBasedProductMode, OuterProduct, InnerProduct, GemvProduct, GemmProduct };
+
+/** \internal \ingroup enums
+ * Enum used in experimental parallel implementation. */
+enum Action {GetAction, SetAction};
+
+/** The type used to identify a dense storage. */
+struct Dense {};
+
+/** The type used to identify a matrix expression */
+struct MatrixXpr {};
+
+/** The type used to identify an array expression */
+struct ArrayXpr {};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CONSTANTS_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/DisableStupidWarnings.h b/third_party/eigen3/Eigen/src/Core/util/DisableStupidWarnings.h
new file mode 100644
index 0000000000..6a0bf0629c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/DisableStupidWarnings.h
@@ -0,0 +1,40 @@
+#ifndef EIGEN_WARNINGS_DISABLED
+#define EIGEN_WARNINGS_DISABLED
+
+#ifdef _MSC_VER
+ // 4100 - unreferenced formal parameter (occurred e.g. in aligned_allocator::destroy(pointer p))
+ // 4101 - unreferenced local variable
+ // 4127 - conditional expression is constant
+ // 4181 - qualifier applied to reference type ignored
+ // 4211 - nonstandard extension used : redefined extern to static
+ // 4244 - 'argument' : conversion from 'type1' to 'type2', possible loss of data
+ // 4273 - QtAlignedMalloc, inconsistent DLL linkage
+ // 4324 - structure was padded due to declspec(align())
+ // 4512 - assignment operator could not be generated
+ // 4522 - 'class' : multiple assignment operators specified
+ // 4700 - uninitialized local variable 'xyz' used
+ // 4717 - 'function' : recursive on all control paths, function will cause runtime stack overflow
+ #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS
+ #pragma warning( push )
+ #endif
+ #pragma warning( disable : 4100 4101 4127 4181 4211 4244 4273 4324 4512 4522 4700 4717 )
+#elif defined __INTEL_COMPILER
+ // 2196 - routine is both "inline" and "noinline" ("noinline" assumed)
+ // ICC 12 generates this warning even without any inline keyword, when defining class methods 'inline' i.e. inside of class body
+ // typedef that may be a reference type.
+ // 279 - controlling expression is constant
+ // ICC 12 generates this warning on assert(constant_expression_depending_on_template_params) and frankly this is a legitimate use case.
+ #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS
+ #pragma warning push
+ #endif
+ #pragma warning disable 2196 279
+#elif defined __clang__
+ // -Wconstant-logical-operand - warning: use of logical && with constant operand; switch to bitwise & or remove constant
+ // this is really a stupid warning as it warns on compile-time expressions involving enums
+ #ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS
+ #pragma clang diagnostic push
+ #endif
+ #pragma clang diagnostic ignored "-Wconstant-logical-operand"
+#endif
+
+#endif // not EIGEN_WARNINGS_DISABLED
diff --git a/third_party/eigen3/Eigen/src/Core/util/ForwardDeclarations.h b/third_party/eigen3/Eigen/src/Core/util/ForwardDeclarations.h
new file mode 100644
index 0000000000..be39d731ad
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/ForwardDeclarations.h
@@ -0,0 +1,301 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2007-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FORWARDDECLARATIONS_H
+#define EIGEN_FORWARDDECLARATIONS_H
+
+namespace Eigen {
+namespace internal {
+
+template<typename T> struct traits;
+
+// here we say once and for all that traits<const T> == traits<T>
+// When constness must affect traits, it has to be constness on template parameters on which T itself depends.
+// For example, traits<Map<const T> > != traits<Map<T> >, but
+// traits<const Map<T> > == traits<Map<T> >
+template<typename T> struct traits<const T> : traits<T> {};
+
+template<typename Derived> struct has_direct_access
+{
+ enum { ret = (traits<Derived>::Flags & DirectAccessBit) ? 1 : 0 };
+};
+
+template<typename Derived> struct accessors_level
+{
+ enum { has_direct_access = (traits<Derived>::Flags & DirectAccessBit) ? 1 : 0,
+ has_write_access = (traits<Derived>::Flags & LvalueBit) ? 1 : 0,
+ value = has_direct_access ? (has_write_access ? DirectWriteAccessors : DirectAccessors)
+ : (has_write_access ? WriteAccessors : ReadOnlyAccessors)
+ };
+};
+
+} // end namespace internal
+
+template<typename T> struct NumTraits;
+
+template<typename Derived> struct EigenBase;
+template<typename Derived> class DenseBase;
+template<typename Derived> class PlainObjectBase;
+
+
+template<typename Derived,
+ int Level = internal::accessors_level<Derived>::value >
+class DenseCoeffsBase;
+
+template<typename _Scalar, int _Rows, int _Cols,
+ int _Options = AutoAlign |
+#if EIGEN_GNUC_AT(3,4)
+ // workaround a bug in at least gcc 3.4.6
+ // the innermost ?: ternary operator is misparsed. We write it slightly
+ // differently and this makes gcc 3.4.6 happy, but it's ugly.
+ // The error would only show up with EIGEN_DEFAULT_TO_ROW_MAJOR is defined
+ // (when EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION is RowMajor)
+ ( (_Rows==1 && _Cols!=1) ? Eigen::RowMajor
+ : !(_Cols==1 && _Rows!=1) ? EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION
+ : Eigen::ColMajor ),
+#else
+ ( (_Rows==1 && _Cols!=1) ? Eigen::RowMajor
+ : (_Cols==1 && _Rows!=1) ? Eigen::ColMajor
+ : EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION ),
+#endif
+ int _MaxRows = _Rows,
+ int _MaxCols = _Cols
+> class Matrix;
+
+template<typename Derived> class MatrixBase;
+template<typename Derived> class ArrayBase;
+
+template<typename ExpressionType, unsigned int Added, unsigned int Removed> class Flagged;
+template<typename ExpressionType, template <typename> class StorageBase > class NoAlias;
+template<typename ExpressionType> class NestByValue;
+template<typename ExpressionType> class ForceAlignedAccess;
+template<typename ExpressionType> class SwapWrapper;
+
+template<typename XprType, int BlockRows=Dynamic, int BlockCols=Dynamic, bool InnerPanel = false> class Block;
+
+template<typename MatrixType, int Size=Dynamic> class VectorBlock;
+template<typename MatrixType> class Transpose;
+template<typename MatrixType> class Conjugate;
+template<typename NullaryOp, typename MatrixType> class CwiseNullaryOp;
+template<typename UnaryOp, typename MatrixType> class CwiseUnaryOp;
+template<typename ViewOp, typename MatrixType> class CwiseUnaryView;
+template<typename BinaryOp, typename Lhs, typename Rhs> class CwiseBinaryOp;
+template<typename BinOp, typename Lhs, typename Rhs> class SelfCwiseBinaryOp;
+template<typename Derived, typename Lhs, typename Rhs> class ProductBase;
+template<typename Lhs, typename Rhs> class Product;
+template<typename Lhs, typename Rhs, int Mode> class GeneralProduct;
+template<typename Lhs, typename Rhs, int NestingFlags> class CoeffBasedProduct;
+
+template<typename Derived> class DiagonalBase;
+template<typename _DiagonalVectorType> class DiagonalWrapper;
+template<typename _Scalar, int SizeAtCompileTime, int MaxSizeAtCompileTime=SizeAtCompileTime> class DiagonalMatrix;
+template<typename MatrixType, typename DiagonalType, int ProductOrder> class DiagonalProduct;
+template<typename MatrixType, int Index = 0> class Diagonal;
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime = SizeAtCompileTime, typename IndexType=int> class PermutationMatrix;
+template<int SizeAtCompileTime, int MaxSizeAtCompileTime = SizeAtCompileTime, typename IndexType=int> class Transpositions;
+template<typename Derived> class PermutationBase;
+template<typename Derived> class TranspositionsBase;
+template<typename _IndicesType> class PermutationWrapper;
+template<typename _IndicesType> class TranspositionsWrapper;
+
+template<typename Derived,
+ int Level = internal::accessors_level<Derived>::has_write_access ? WriteAccessors : ReadOnlyAccessors
+> class MapBase;
+template<int InnerStrideAtCompileTime, int OuterStrideAtCompileTime> class Stride;
+template<typename MatrixType, int MapOptions=Unaligned, typename StrideType = Stride<0,0> > class Map;
+
+template<typename Derived> class TriangularBase;
+template<typename MatrixType, unsigned int Mode> class TriangularView;
+template<typename MatrixType, unsigned int Mode> class SelfAdjointView;
+template<typename MatrixType> class SparseView;
+template<typename ExpressionType> class WithFormat;
+template<typename MatrixType> struct CommaInitializer;
+template<typename Derived> class ReturnByValue;
+template<typename ExpressionType> class ArrayWrapper;
+template<typename ExpressionType> class MatrixWrapper;
+
+namespace internal {
+template<typename DecompositionType, typename Rhs> struct solve_retval_base;
+template<typename DecompositionType, typename Rhs> struct solve_retval;
+template<typename DecompositionType> struct kernel_retval_base;
+template<typename DecompositionType> struct kernel_retval;
+template<typename DecompositionType> struct image_retval_base;
+template<typename DecompositionType> struct image_retval;
+} // end namespace internal
+
+namespace internal {
+template<typename _Scalar, int Rows=Dynamic, int Cols=Dynamic, int Supers=Dynamic, int Subs=Dynamic, int Options=0> class BandMatrix;
+}
+
+namespace internal {
+template<typename Lhs, typename Rhs> struct product_type;
+}
+
+template<typename Lhs, typename Rhs,
+ int ProductType = internal::product_type<Lhs,Rhs>::value>
+struct ProductReturnType;
+
+// this is a workaround for sun CC
+template<typename Lhs, typename Rhs> struct LazyProductReturnType;
+
+namespace internal {
+
+// Provides scalar/packet-wise product and product with accumulation
+// with optional conjugation of the arguments.
+template<typename LhsScalar, typename RhsScalar, bool ConjLhs=false, bool ConjRhs=false> struct conj_helper;
+
+template<typename Scalar> struct scalar_sum_op;
+template<typename Scalar> struct scalar_difference_op;
+template<typename LhsScalar,typename RhsScalar> struct scalar_conj_product_op;
+template<typename Scalar> struct scalar_opposite_op;
+template<typename Scalar> struct scalar_conjugate_op;
+template<typename Scalar> struct scalar_real_op;
+template<typename Scalar> struct scalar_imag_op;
+template<typename Scalar> struct scalar_abs_op;
+template<typename Scalar> struct scalar_abs2_op;
+template<typename Scalar> struct scalar_sqrt_op;
+template<typename Scalar> struct scalar_rsqrt_op;
+template<typename Scalar> struct scalar_exp_op;
+template<typename Scalar> struct scalar_log_op;
+template<typename Scalar> struct scalar_cos_op;
+template<typename Scalar> struct scalar_sin_op;
+template<typename Scalar> struct scalar_acos_op;
+template<typename Scalar> struct scalar_asin_op;
+template<typename Scalar> struct scalar_tan_op;
+template<typename Scalar> struct scalar_pow_op;
+template<typename Scalar> struct scalar_inverse_op;
+template<typename Scalar> struct scalar_square_op;
+template<typename Scalar> struct scalar_cube_op;
+template<typename Scalar, typename NewType> struct scalar_cast_op;
+template<typename Scalar> struct scalar_multiple_op;
+template<typename Scalar> struct scalar_quotient1_op;
+template<typename Scalar> struct scalar_min_op;
+template<typename Scalar> struct scalar_max_op;
+template<typename Scalar> struct scalar_random_op;
+template<typename Scalar> struct scalar_add_op;
+template<typename Scalar> struct scalar_constant_op;
+template<typename Scalar> struct scalar_identity_op;
+
+template<typename LhsScalar,typename RhsScalar=LhsScalar> struct scalar_product_op;
+template<typename LhsScalar,typename RhsScalar> struct scalar_multiple2_op;
+template<typename LhsScalar,typename RhsScalar=LhsScalar> struct scalar_quotient_op;
+
+} // end namespace internal
+
+struct IOFormat;
+
+// Array module
+template<typename _Scalar, int _Rows, int _Cols,
+ int _Options = AutoAlign |
+#if EIGEN_GNUC_AT(3,4)
+ // workaround a bug in at least gcc 3.4.6
+ // the innermost ?: ternary operator is misparsed. We write it slightly
+ // differently and this makes gcc 3.4.6 happy, but it's ugly.
+ // The error would only show up with EIGEN_DEFAULT_TO_ROW_MAJOR is defined
+ // (when EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION is RowMajor)
+ ( (_Rows==1 && _Cols!=1) ? Eigen::RowMajor
+ : !(_Cols==1 && _Rows!=1) ? EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION
+ : Eigen::ColMajor ),
+#else
+ ( (_Rows==1 && _Cols!=1) ? Eigen::RowMajor
+ : (_Cols==1 && _Rows!=1) ? Eigen::ColMajor
+ : EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION ),
+#endif
+ int _MaxRows = _Rows, int _MaxCols = _Cols> class Array;
+template<typename ConditionMatrixType, typename ThenMatrixType, typename ElseMatrixType> class Select;
+template<typename MatrixType, typename BinaryOp, int Direction> class PartialReduxExpr;
+template<typename ExpressionType, int Direction> class VectorwiseOp;
+template<typename MatrixType,int RowFactor,int ColFactor> class Replicate;
+template<typename MatrixType, int Direction = BothDirections> class Reverse;
+
+template<typename MatrixType> class FullPivLU;
+template<typename MatrixType> class PartialPivLU;
+namespace internal {
+template<typename MatrixType> struct inverse_impl;
+}
+template<typename MatrixType> class HouseholderQR;
+template<typename MatrixType> class ColPivHouseholderQR;
+template<typename MatrixType> class FullPivHouseholderQR;
+template<typename MatrixType, int QRPreconditioner = ColPivHouseholderQRPreconditioner> class JacobiSVD;
+template<typename MatrixType, int UpLo = Lower> class LLT;
+template<typename MatrixType, int UpLo = Lower> class LDLT;
+template<typename VectorsType, typename CoeffsType, int Side=OnTheLeft> class HouseholderSequence;
+template<typename Scalar> class JacobiRotation;
+
+// Geometry module:
+template<typename Derived, int _Dim> class RotationBase;
+template<typename Lhs, typename Rhs> class Cross;
+template<typename Derived> class QuaternionBase;
+template<typename Scalar> class Rotation2D;
+template<typename Scalar> class AngleAxis;
+template<typename Scalar,int Dim> class Translation;
+
+#ifdef EIGEN2_SUPPORT
+template<typename Derived, int _Dim> class eigen2_RotationBase;
+template<typename Lhs, typename Rhs> class eigen2_Cross;
+template<typename Scalar> class eigen2_Quaternion;
+template<typename Scalar> class eigen2_Rotation2D;
+template<typename Scalar> class eigen2_AngleAxis;
+template<typename Scalar,int Dim> class eigen2_Transform;
+template <typename _Scalar, int _AmbientDim> class eigen2_ParametrizedLine;
+template <typename _Scalar, int _AmbientDim> class eigen2_Hyperplane;
+template<typename Scalar,int Dim> class eigen2_Translation;
+template<typename Scalar,int Dim> class eigen2_Scaling;
+#endif
+
+#if EIGEN2_SUPPORT_STAGE < STAGE20_RESOLVE_API_CONFLICTS
+template<typename Scalar> class Quaternion;
+template<typename Scalar,int Dim> class Transform;
+template <typename _Scalar, int _AmbientDim> class ParametrizedLine;
+template <typename _Scalar, int _AmbientDim> class Hyperplane;
+template<typename Scalar,int Dim> class Scaling;
+#endif
+
+#if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+template<typename Scalar, int Options = AutoAlign> class Quaternion;
+template<typename Scalar,int Dim,int Mode,int _Options=AutoAlign> class Transform;
+template <typename _Scalar, int _AmbientDim, int Options=AutoAlign> class ParametrizedLine;
+template <typename _Scalar, int _AmbientDim, int Options=AutoAlign> class Hyperplane;
+template<typename Scalar> class UniformScaling;
+template<typename MatrixType,int Direction> class Homogeneous;
+#endif
+
+// MatrixFunctions module
+template<typename Derived> struct MatrixExponentialReturnValue;
+template<typename Derived> class MatrixFunctionReturnValue;
+template<typename Derived> class MatrixSquareRootReturnValue;
+template<typename Derived> class MatrixLogarithmReturnValue;
+template<typename Derived> class MatrixPowerReturnValue;
+template<typename Derived> class MatrixComplexPowerReturnValue;
+
+namespace internal {
+template <typename Scalar>
+struct stem_function
+{
+ typedef std::complex<typename NumTraits<Scalar>::Real> ComplexScalar;
+ typedef ComplexScalar type(ComplexScalar, int);
+};
+}
+
+
+#ifdef EIGEN2_SUPPORT
+template<typename ExpressionType> class Cwise;
+template<typename MatrixType> class Minor;
+template<typename MatrixType> class LU;
+template<typename MatrixType> class QR;
+template<typename MatrixType> class SVD;
+namespace internal {
+template<typename MatrixType, unsigned int Mode> struct eigen2_part_return_type;
+}
+#endif
+
+} // end namespace Eigen
+
+#endif // EIGEN_FORWARDDECLARATIONS_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/MKL_support.h b/third_party/eigen3/Eigen/src/Core/util/MKL_support.h
new file mode 100644
index 0000000000..8acca9c8c5
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/MKL_support.h
@@ -0,0 +1,126 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Include file with common MKL declarations
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_MKL_SUPPORT_H
+#define EIGEN_MKL_SUPPORT_H
+
+#ifdef EIGEN_USE_MKL_ALL
+ #ifndef EIGEN_USE_BLAS
+ #define EIGEN_USE_BLAS
+ #endif
+ #ifndef EIGEN_USE_LAPACKE
+ #define EIGEN_USE_LAPACKE
+ #endif
+ #ifndef EIGEN_USE_MKL_VML
+ #define EIGEN_USE_MKL_VML
+ #endif
+#endif
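+/* Illustrative sketch (editorial note, not part of upstream Eigen): a project that wants
+   MKL-backed kernels defines the switch before including any Eigen header and links
+   against the MKL libraries:
+
+     #define EIGEN_USE_MKL_ALL   // or just EIGEN_USE_BLAS / EIGEN_USE_LAPACKE
+     #include <Eigen/Dense>
+*/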
+
+#ifdef EIGEN_USE_LAPACKE_STRICT
+ #define EIGEN_USE_LAPACKE
+#endif
+
+#if defined(EIGEN_USE_BLAS) || defined(EIGEN_USE_LAPACKE) || defined(EIGEN_USE_MKL_VML)
+ #define EIGEN_USE_MKL
+#endif
+
+#if defined EIGEN_USE_MKL
+# include <mkl.h>
+/*Check IMKL version for compatibility: < 10.3 is not usable with Eigen*/
+# ifndef INTEL_MKL_VERSION
+# undef EIGEN_USE_MKL /* INTEL_MKL_VERSION is not even defined on older versions */
+# elif INTEL_MKL_VERSION < 100305 /* the intel-mkl-103-release-notes say this was when the lapacke.h interface was added*/
+# undef EIGEN_USE_MKL
+# endif
+# ifndef EIGEN_USE_MKL
+ /*If the MKL version is too old, undef everything*/
+# undef EIGEN_USE_MKL_ALL
+# undef EIGEN_USE_BLAS
+# undef EIGEN_USE_LAPACKE
+# undef EIGEN_USE_MKL_VML
+# undef EIGEN_USE_LAPACKE_STRICT
+# undef EIGEN_USE_LAPACKE
+# endif
+#endif
+
+#if defined EIGEN_USE_MKL
+#include <mkl_lapacke.h>
+#define EIGEN_MKL_VML_THRESHOLD 128
+
+namespace Eigen {
+
+typedef std::complex<double> dcomplex;
+typedef std::complex<float> scomplex;
+
+namespace internal {
+
+template<typename MKLType, typename EigenType>
+static inline void assign_scalar_eig2mkl(MKLType& mklScalar, const EigenType& eigenScalar) {
+ mklScalar=eigenScalar;
+}
+
+template<typename MKLType, typename EigenType>
+static inline void assign_conj_scalar_eig2mkl(MKLType& mklScalar, const EigenType& eigenScalar) {
+ mklScalar=eigenScalar;
+}
+
+template <>
+inline void assign_scalar_eig2mkl<MKL_Complex16,dcomplex>(MKL_Complex16& mklScalar, const dcomplex& eigenScalar) {
+ mklScalar.real=eigenScalar.real();
+ mklScalar.imag=eigenScalar.imag();
+}
+
+template <>
+inline void assign_scalar_eig2mkl<MKL_Complex8,scomplex>(MKL_Complex8& mklScalar, const scomplex& eigenScalar) {
+ mklScalar.real=eigenScalar.real();
+ mklScalar.imag=eigenScalar.imag();
+}
+
+template <>
+inline void assign_conj_scalar_eig2mkl<MKL_Complex16,dcomplex>(MKL_Complex16& mklScalar, const dcomplex& eigenScalar) {
+ mklScalar.real=eigenScalar.real();
+ mklScalar.imag=-eigenScalar.imag();
+}
+
+template <>
+inline void assign_conj_scalar_eig2mkl<MKL_Complex8,scomplex>(MKL_Complex8& mklScalar, const scomplex& eigenScalar) {
+ mklScalar.real=eigenScalar.real();
+ mklScalar.imag=-eigenScalar.imag();
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif
+
+#endif // EIGEN_MKL_SUPPORT_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/Macros.h b/third_party/eigen3/Eigen/src/Core/util/Macros.h
new file mode 100644
index 0000000000..729a451324
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/Macros.h
@@ -0,0 +1,740 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MACROS_H
+#define EIGEN_MACROS_H
+
+#define EIGEN_WORLD_VERSION 3
+#define EIGEN_MAJOR_VERSION 2
+#define EIGEN_MINOR_VERSION 90
+
+#define EIGEN_VERSION_AT_LEAST(x,y,z) (EIGEN_WORLD_VERSION>x || (EIGEN_WORLD_VERSION>=x && \
+ (EIGEN_MAJOR_VERSION>y || (EIGEN_MAJOR_VERSION>=y && \
+ EIGEN_MINOR_VERSION>=z))))
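+// Illustrative sketch (editorial note, not part of upstream Eigen): client code can guard
+// version-specific workarounds with this macro, e.g.
+//
+//   #if EIGEN_VERSION_AT_LEAST(3,2,0)
+//     // use an API that only exists from Eigen 3.2 onwards
+//   #endif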
+
+// Compiler identification, EIGEN_COMP_*
+/// \internal EIGEN_COMP_GNUC set to 1 for all compilers compatible with GCC
+#ifdef __GNUC__
+ #define EIGEN_COMP_GNUC 1
+#else
+ #define EIGEN_COMP_GNUC 0
+#endif
+
+/// \internal EIGEN_COMP_CLANG set to 1 if the compiler is clang (alias for __clang__)
+#if defined(__clang__)
+ #define EIGEN_COMP_CLANG 1
+#else
+ #define EIGEN_COMP_CLANG 0
+#endif
+
+
+/// \internal EIGEN_COMP_LLVM set to 1 if the compiler backend is llvm
+#if defined(__llvm__)
+ #define EIGEN_COMP_LLVM 1
+#else
+ #define EIGEN_COMP_LLVM 0
+#endif
+
+/// \internal EIGEN_COMP_ICC set to __INTEL_COMPILER if the compiler is Intel compiler, 0 otherwise
+#if defined(__INTEL_COMPILER)
+ #define EIGEN_COMP_ICC __INTEL_COMPILER
+#else
+ #define EIGEN_COMP_ICC 0
+#endif
+
+/// \internal EIGEN_COMP_MINGW set to 1 if the compiler is mingw
+#if defined(__MINGW32__)
+ #define EIGEN_COMP_MINGW 1
+#else
+ #define EIGEN_COMP_MINGW 0
+#endif
+
+/// \internal EIGEN_COMP_SUNCC set to 1 if the compiler is Solaris Studio
+#if defined(__SUNPRO_CC)
+ #define EIGEN_COMP_SUNCC 1
+#else
+ #define EIGEN_COMP_SUNCC 0
+#endif
+
+/// \internal EIGEN_COMP_MSVC set to _MSC_VER if the compiler is Microsoft Visual C++, 0 otherwise.
+#if defined(_MSC_VER)
+ #define EIGEN_COMP_MSVC _MSC_VER
+#else
+ #define EIGEN_COMP_MSVC 0
+#endif
+
+/// \internal EIGEN_COMP_MSVC_STRICT set to 1 if the compiler is really Microsoft Visual C++ and not, e.g., ICC
+#if EIGEN_COMP_MSVC && !(EIGEN_COMP_ICC)
+ #define EIGEN_COMP_MSVC_STRICT 1
+#else
+ #define EIGEN_COMP_MSVC_STRICT 0
+#endif
+
+/// \internal EIGEN_COMP_IBM set to 1 if the compiler is IBM XL C++
+#if defined(__IBMCPP__) || defined(__xlc__)
+ #define EIGEN_COMP_IBM 1
+#else
+ #define EIGEN_COMP_IBM 0
+#endif
+
+/// \internal EIGEN_COMP_PGI set to 1 if the compiler is Portland Group Compiler
+#if defined(__PGI)
+ #define EIGEN_COMP_PGI 1
+#else
+ #define EIGEN_COMP_PGI 0
+#endif
+
+/// \internal EIGEN_COMP_ARM set to 1 if the compiler is ARM Compiler
+#if defined(__CC_ARM) || defined(__ARMCC_VERSION)
+ #define EIGEN_COMP_ARM 1
+#else
+ #define EIGEN_COMP_ARM 0
+#endif
+
+
+/// \internal EIGEN_COMP_GNUC_STRICT set to 1 if the compiler is really GCC and not a compatible compiler (e.g., ICC, clang, mingw, etc.)
+#if EIGEN_COMP_GNUC && !(EIGEN_COMP_CLANG || EIGEN_COMP_ICC || EIGEN_COMP_MINGW || EIGEN_COMP_PGI || EIGEN_COMP_IBM || EIGEN_COMP_ARM )
+ #define EIGEN_COMP_GNUC_STRICT 1
+#else
+ #define EIGEN_COMP_GNUC_STRICT 0
+#endif
+
+
+#if EIGEN_COMP_GNUC
+ #define EIGEN_GNUC_AT_LEAST(x,y) ((__GNUC__==x && __GNUC_MINOR__>=y) || __GNUC__>x)
+ #define EIGEN_GNUC_AT_MOST(x,y) ((__GNUC__==x && __GNUC_MINOR__<=y) || __GNUC__<x)
+ #define EIGEN_GNUC_AT(x,y) ( __GNUC__==x && __GNUC_MINOR__==y )
+#else
+ #define EIGEN_GNUC_AT_LEAST(x,y) 0
+ #define EIGEN_GNUC_AT_MOST(x,y) 0
+ #define EIGEN_GNUC_AT(x,y) 0
+#endif
+
+// FIXME: could probably be removed as we do not support gcc 3.x anymore
+#if EIGEN_COMP_GNUC && (__GNUC__ <= 3)
+#define EIGEN_GCC3_OR_OLDER 1
+#else
+#define EIGEN_GCC3_OR_OLDER 0
+#endif
+
+
+// Architecture identification, EIGEN_ARCH_*
+
+#if defined(__x86_64__) || defined(_M_X64) || defined(__amd64)
+ #define EIGEN_ARCH_x86_64 1
+#else
+ #define EIGEN_ARCH_x86_64 0
+#endif
+
+#if defined(__i386__) || defined(_M_IX86) || defined(_X86_) || defined(__i386)
+ #define EIGEN_ARCH_i386 1
+#else
+ #define EIGEN_ARCH_i386 0
+#endif
+
+#if EIGEN_ARCH_x86_64 || EIGEN_ARCH_i386
+ #define EIGEN_ARCH_i386_OR_x86_64 1
+#else
+ #define EIGEN_ARCH_i386_OR_x86_64 0
+#endif
+
+/// \internal EIGEN_ARCH_ARM set to 1 if the architecture is ARM
+#if defined(__arm__)
+ #define EIGEN_ARCH_ARM 1
+#else
+ #define EIGEN_ARCH_ARM 0
+#endif
+
+/// \internal EIGEN_ARCH_ARM64 set to 1 if the architecture is ARM64
+#if defined(__aarch64__)
+ #define EIGEN_ARCH_ARM64 1
+#else
+ #define EIGEN_ARCH_ARM64 0
+#endif
+
+#if EIGEN_ARCH_ARM || EIGEN_ARCH_ARM64
+ #define EIGEN_ARCH_ARM_OR_ARM64 1
+#else
+ #define EIGEN_ARCH_ARM_OR_ARM64 0
+#endif
+
+/// \internal EIGEN_ARCH_MIPS set to 1 if the architecture is MIPS
+#if defined(__mips__) || defined(__mips)
+ #define EIGEN_ARCH_MIPS 1
+#else
+ #define EIGEN_ARCH_MIPS 0
+#endif
+
+/// \internal EIGEN_ARCH_SPARC set to 1 if the architecture is SPARC
+#if defined(__sparc__) || defined(__sparc)
+ #define EIGEN_ARCH_SPARC 1
+#else
+ #define EIGEN_ARCH_SPARC 0
+#endif
+
+/// \internal EIGEN_ARCH_IA64 set to 1 if the architecture is Intel Itanium
+#if defined(__ia64__)
+ #define EIGEN_ARCH_IA64 1
+#else
+ #define EIGEN_ARCH_IA64 0
+#endif
+
+/// \internal EIGEN_ARCH_PPC set to 1 if the architecture is PowerPC
+#if defined(__powerpc__) || defined(__ppc__) || defined(_M_PPC)
+ #define EIGEN_ARCH_PPC 1
+#else
+ #define EIGEN_ARCH_PPC 0
+#endif
+
+
+
+// Operating system identification, EIGEN_OS_*
+
+/// \internal EIGEN_OS_UNIX set to 1 if the OS is a unix variant
+#if defined(__unix__) || defined(__unix)
+ #define EIGEN_OS_UNIX 1
+#else
+ #define EIGEN_OS_UNIX 0
+#endif
+
+/// \internal EIGEN_OS_LINUX set to 1 if the OS is based on Linux kernel
+#if defined(__linux__)
+ #define EIGEN_OS_LINUX 1
+#else
+ #define EIGEN_OS_LINUX 0
+#endif
+
+/// \internal EIGEN_OS_ANDROID set to 1 if the OS is Android
+// note: ANDROID is defined when using ndk_build, __ANDROID__ is defined when using a standalone toolchain.
+#if defined(__ANDROID__) || defined(ANDROID)
+ #define EIGEN_OS_ANDROID 1
+#else
+ #define EIGEN_OS_ANDROID 0
+#endif
+
+/// \internal EIGEN_OS_GNULINUX set to 1 if the OS is GNU/Linux and not some other Linux-based OS (e.g., not Android)
+#if defined(__gnu_linux__) && !(EIGEN_OS_ANDROID)
+ #define EIGEN_OS_GNULINUX 1
+#else
+ #define EIGEN_OS_GNULINUX 0
+#endif
+
+/// \internal EIGEN_OS_BSD set to 1 if the OS is a BSD variant
+#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || defined(__bsdi__) || defined(__DragonFly__)
+ #define EIGEN_OS_BSD 1
+#else
+ #define EIGEN_OS_BSD 0
+#endif
+
+/// \internal EIGEN_OS_MAC set to 1 if the OS is MacOS
+#if defined(__APPLE__)
+ #define EIGEN_OS_MAC 1
+#else
+ #define EIGEN_OS_MAC 0
+#endif
+
+/// \internal EIGEN_OS_QNX set to 1 if the OS is QNX
+#if defined(__QNX__)
+ #define EIGEN_OS_QNX 1
+#else
+ #define EIGEN_OS_QNX 0
+#endif
+
+/// \internal EIGEN_OS_WIN set to 1 if the OS is Windows based
+#if defined(_WIN32)
+ #define EIGEN_OS_WIN 1
+#else
+ #define EIGEN_OS_WIN 0
+#endif
+
+/// \internal EIGEN_OS_WIN64 set to 1 if the OS is 64-bit Windows
+#if defined(_WIN64)
+ #define EIGEN_OS_WIN64 1
+#else
+ #define EIGEN_OS_WIN64 0
+#endif
+
+/// \internal EIGEN_OS_WINCE set to 1 if the OS is Windows CE
+#if defined(_WIN32_WCE)
+ #define EIGEN_OS_WINCE 1
+#else
+ #define EIGEN_OS_WINCE 0
+#endif
+
+/// \internal EIGEN_OS_CYGWIN set to 1 if the OS is Windows/Cygwin
+#if defined(__CYGWIN__)
+ #define EIGEN_OS_CYGWIN 1
+#else
+ #define EIGEN_OS_CYGWIN 0
+#endif
+
+/// \internal EIGEN_OS_WIN_STRICT set to 1 if the OS is really Windows and not one of its variants (e.g., WinCE or Cygwin)
+#if EIGEN_OS_WIN && !( EIGEN_OS_WINCE || EIGEN_OS_CYGWIN )
+ #define EIGEN_OS_WIN_STRICT 1
+#else
+ #define EIGEN_OS_WIN_STRICT 0
+#endif
+
+
+
+
+#if EIGEN_GNUC_AT_MOST(4,3) && !EIGEN_COMP_CLANG
+ // see bug 89
+ #define EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO 0
+#else
+ #define EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO 1
+#endif
+
+// 16 byte alignment is only useful for vectorization. Since it affects the ABI, we need to enable
+// 16 byte alignment on all platforms where vectorization might be enabled. In theory we could always
+// enable alignment, but it can be a cause of problems on some platforms, so we just disable it on
+// certain common platforms (compiler+architecture combinations) to avoid these problems.
+// Only static alignment is really problematic (it relies on nonstandard compiler extensions that don't
+// work everywhere, for example not on GCC/ARM), so we try to keep heap alignment even
+// when we have to disable static alignment.
+#if EIGEN_COMP_GNUC && !(EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_PPC || EIGEN_ARCH_IA64)
+#define EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT 1
+#else
+#define EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT 0
+#endif
+
+// static alignment is completely disabled with GCC 3, Sun Studio, and QCC/QNX
+#if !EIGEN_GCC_AND_ARCH_DOESNT_WANT_STACK_ALIGNMENT \
+ && !EIGEN_GCC3_OR_OLDER \
+ && !EIGEN_COMP_SUNCC \
+ && !EIGEN_OS_QNX
+ #define EIGEN_ARCH_WANTS_STACK_ALIGNMENT 1
+#else
+ #define EIGEN_ARCH_WANTS_STACK_ALIGNMENT 0
+#endif
+
+// Defines the boundary (in bytes) on which the data needs to be aligned. Note
+// that unless EIGEN_ALIGN is defined and not equal to 0, the data may not be
+// aligned at all regardless of the value of this #define.
+#define EIGEN_ALIGN_BYTES 16
+
+#ifdef EIGEN_DONT_ALIGN
+ #ifndef EIGEN_DONT_ALIGN_STATICALLY
+ #define EIGEN_DONT_ALIGN_STATICALLY
+ #endif
+ #define EIGEN_ALIGN 0
+#elif !defined(EIGEN_DONT_VECTORIZE)
+ #if defined(__AVX__)
+ #undef EIGEN_ALIGN_BYTES
+ #define EIGEN_ALIGN_BYTES 32
+ #endif
+ #define EIGEN_ALIGN 1
+#else
+ #define EIGEN_ALIGN 0
+#endif
+
+#define EIGEN_MAX_ALIGN_BYTES EIGEN_ALIGN_BYTES
+
+
+// This macro can be used to prevent macro expansion, e.g.:
+// std::max EIGEN_NOT_A_MACRO(a,b)
+#define EIGEN_NOT_A_MACRO
+
+// EIGEN_ALIGN_STATICALLY is the true test whether we want to align arrays on the stack or not. It takes into account both the user choice to explicitly disable
+// alignment (EIGEN_DONT_ALIGN_STATICALLY) and the architecture config (EIGEN_ARCH_WANTS_STACK_ALIGNMENT). Henceforth, only EIGEN_ALIGN_STATICALLY should be used.
+#if EIGEN_ARCH_WANTS_STACK_ALIGNMENT && !defined(EIGEN_DONT_ALIGN_STATICALLY)
+ #define EIGEN_ALIGN_STATICALLY 1
+#else
+ #define EIGEN_ALIGN_STATICALLY 0
+ #ifndef EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT
+ #define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT
+ #endif
+#endif
+
+#ifdef EIGEN_DEFAULT_TO_ROW_MAJOR
+#define EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION Eigen::RowMajor
+#else
+#define EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION Eigen::ColMajor
+#endif
+
+#ifndef EIGEN_DEFAULT_DENSE_INDEX_TYPE
+#define EIGEN_DEFAULT_DENSE_INDEX_TYPE std::ptrdiff_t
+#endif
+
+// Cross compiler wrapper around LLVM's __has_builtin
+#ifdef __has_builtin
+# define EIGEN_HAS_BUILTIN(x) __has_builtin(x)
+#else
+# define EIGEN_HAS_BUILTIN(x) 0
+#endif
+
+// A Clang feature extension to determine compiler features.
+// We use it to determine 'cxx_rvalue_references'
+#ifndef __has_feature
+# define __has_feature(x) 0
+#endif
+
+#if __cplusplus > 199711L
+#define EIGEN_HAS_VARIADIC_TEMPLATES 1
+#endif
+
+// Does the compiler support const expressions?
+#if __cplusplus > 199711L && !defined(__NVCC__) && !defined(GOOGLE_LIBCXX) && !defined(__APPLE__)
+#define EIGEN_HAS_CONSTEXPR 1
+#endif
+
+/** Allows one to disable some optimizations which might affect the accuracy of the result.
+ * Such optimizations are enabled by default; set EIGEN_FAST_MATH to 0 to disable them.
+ * They currently include:
+ * - single precision Cwise::sin() and Cwise::cos() when SSE vectorization is enabled.
+ */
+#ifndef EIGEN_FAST_MATH
+#define EIGEN_FAST_MATH 1
+#endif
+
+#define EIGEN_DEBUG_VAR(x) std::cerr << #x << " = " << x << std::endl;
+
+// concatenate two tokens
+#define EIGEN_CAT2(a,b) a ## b
+#define EIGEN_CAT(a,b) EIGEN_CAT2(a,b)
+
+// convert a token to a string
+#define EIGEN_MAKESTRING2(a) #a
+#define EIGEN_MAKESTRING(a) EIGEN_MAKESTRING2(a)
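+// Illustrative sketch (editorial note, not part of upstream Eigen): the extra level of
+// indirection lets arguments expand before pasting/stringizing, e.g.
+//
+//   EIGEN_CAT(EIGEN_WORLD_, VERSION)        // -> EIGEN_WORLD_VERSION -> 3
+//   EIGEN_MAKESTRING(EIGEN_WORLD_VERSION)   // -> "3"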
+
+// EIGEN_STRONG_INLINE is a stronger version of inline, using __forceinline on MSVC,
+// but it still doesn't use GCC's always_inline. This is useful in (common) situations where MSVC needs forceinline
+// but GCC is still doing fine with just inline.
+#if EIGEN_COMP_MSVC || EIGEN_COMP_ICC
+#define EIGEN_STRONG_INLINE __forceinline
+#else
+#define EIGEN_STRONG_INLINE inline
+#endif
+
+// EIGEN_ALWAYS_INLINE is the strongest: it makes the function inline and adds every possible
+// attribute to maximize inlining. This should only be used when really necessary: in particular,
+// it uses __attribute__((always_inline)) on GCC, which most of the time is useless and can severely harm compile times.
+// FIXME with the always_inline attribute,
+// gcc 3.4.x reports the following compilation error:
+// Eval.h:91: sorry, unimplemented: inlining failed in call to 'const Eigen::Eval<Derived> Eigen::MatrixBase<Scalar, Derived>::eval() const'
+// : function body not available
+#if EIGEN_GNUC_AT_LEAST(4,0)
+#define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline
+#else
+#define EIGEN_ALWAYS_INLINE EIGEN_STRONG_INLINE
+#endif
+
+#if EIGEN_COMP_GNUC
+#define EIGEN_DONT_INLINE __attribute__((noinline))
+#elif EIGEN_COMP_MSVC
+#define EIGEN_DONT_INLINE __declspec(noinline)
+#else
+#define EIGEN_DONT_INLINE
+#endif
+
+#if EIGEN_COMP_GNUC
+#define EIGEN_PERMISSIVE_EXPR __extension__
+#else
+#define EIGEN_PERMISSIVE_EXPR
+#endif
+
+#if EIGEN_COMP_GNUC
+#define EIGEN_LIKELY(x) __builtin_expect((x), 1)
+#define EIGEN_UNLIKELY(x) __builtin_expect((x), 0)
+#else
+#define EIGEN_LIKELY(x) (x)
+#define EIGEN_UNLIKELY(x) (x)
+#endif
+
+// this macro allows one to get rid of linking errors about multiply defined functions.
+// - static is not very good because it prevents definitions from different object files from being merged,
+//   so static causes the resulting linked executable to be bloated with multiple copies of the same function.
+// - inline is not perfect either as it unwantedly hints the compiler toward inlining the function.
+#define EIGEN_DECLARE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS
+#define EIGEN_DEFINE_FUNCTION_ALLOWING_MULTIPLE_DEFINITIONS inline
+
+#ifdef NDEBUG
+# ifndef EIGEN_NO_DEBUG
+# define EIGEN_NO_DEBUG
+# endif
+#endif
+
+#if !defined(EIGEN_NO_CHECK) || (!defined(EIGEN_NO_DEBUG) && !EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO)
+ // Custom assertion code that works regardless of the compilation mode.
+ #include <cstdlib> // for abort
+ #include <iostream> // for std::cerr
+
+ namespace Eigen {
+ namespace internal {
+ // trivial function copying a bool. Must be EIGEN_DONT_INLINE, so we implement it after including Eigen headers.
+ // see bug 89.
+ namespace {
+ EIGEN_DONT_INLINE bool copy_bool(bool b) { return b; }
+ }
+ inline void assert_fail(const char *condition, const char *function, const char *file, int line)
+ {
+ copy_bool(true); // dummy call to avoid warnings about unused functions.
+ std::cerr << "assertion failed: " << condition << " in function " << function << " at " << file << ":" << line << std::endl;
+ abort();
+ }
+ }
+ }
+ #define eigen_internal_check(x) \
+ do { \
+ if(!Eigen::internal::copy_bool(x)) \
+ Eigen::internal::assert_fail(EIGEN_MAKESTRING(x), __PRETTY_FUNCTION__, __FILE__, __LINE__); \
+ } while(false)
+#endif
+
+#ifdef EIGEN_NO_CHECK
+ #define eigen_check(x)
+#else
+ #define eigen_check(x) eigen_internal_check(x)
+#endif
+
+// eigen_plain_assert is where we implement the workaround for the assert() bug in GCC <= 4.3, see bug 89
+#ifdef EIGEN_NO_DEBUG
+ #define eigen_plain_assert(x)
+#else
+ #if EIGEN_SAFE_TO_USE_STANDARD_ASSERT_MACRO
+ namespace Eigen {
+ namespace internal {
+ inline bool copy_bool(bool b) { return b; }
+ }
+ }
+ #define eigen_plain_assert(x) assert(x)
+ #else
+ // work around bug 89
+ #define eigen_plain_assert(x) eigen_internal_check(x)
+ #endif
+#endif
+
+// eigen_assert can be overridden
+#ifndef eigen_assert
+#define eigen_assert(x) eigen_plain_assert(x)
+#endif
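+// Illustrative sketch (editorial note, not part of upstream Eigen): a client that prefers
+// exceptions over abort() can override eigen_assert before including any Eigen header:
+//
+//   #include <stdexcept>
+//   #define eigen_assert(x) \
+//     if (!(x)) { throw std::runtime_error("Eigen assertion failed: " #x); }
+//   #include <Eigen/Dense>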
+
+#ifdef EIGEN_INTERNAL_DEBUGGING
+#define eigen_internal_assert(x) eigen_assert(x)
+#else
+#define eigen_internal_assert(x)
+#endif
+
+#ifdef EIGEN_NO_DEBUG
+#define EIGEN_ONLY_USED_FOR_DEBUG(x) (void)x
+#else
+#define EIGEN_ONLY_USED_FOR_DEBUG(x)
+#endif
+
+#ifndef EIGEN_NO_DEPRECATED_WARNING
+ #if EIGEN_COMP_GNUC
+ #define EIGEN_DEPRECATED __attribute__((deprecated))
+ #elif (defined _MSC_VER)
+ #define EIGEN_DEPRECATED __declspec(deprecated)
+ #else
+ #define EIGEN_DEPRECATED
+ #endif
+#else
+ #define EIGEN_DEPRECATED
+#endif
+
+#if EIGEN_COMP_GNUC
+#define EIGEN_UNUSED __attribute__((unused))
+#else
+#define EIGEN_UNUSED
+#endif
+
+// Suppresses 'unused variable' warnings.
+namespace Eigen {
+ namespace internal {
+ template<typename T> void ignore_unused_variable(const T&) {}
+ }
+}
+#define EIGEN_UNUSED_VARIABLE(var) Eigen::internal::ignore_unused_variable(var);
+
+#if !defined(EIGEN_ASM_COMMENT)
+ #if EIGEN_COMP_GNUC && (EIGEN_ARCH_i386_OR_x86_64 || EIGEN_ARCH_ARM_OR_ARM64)
+ #define EIGEN_ASM_COMMENT(X) asm("#" X)
+ #else
+ #define EIGEN_ASM_COMMENT(X)
+ #endif
+#endif
+
+/* EIGEN_ALIGN_TO_BOUNDARY(n) forces data to be n-byte aligned. This is used to satisfy SIMD requirements.
+ * However, we do that EVEN if vectorization (EIGEN_VECTORIZE) is disabled,
+ * so that vectorization doesn't affect binary compatibility.
+ *
+ * If we made alignment depend on whether or not EIGEN_VECTORIZE is defined, it would be impossible to link
+ * vectorized and non-vectorized code.
+ */
+#if (defined __CUDACC__)
+ #define EIGEN_ALIGN_TO_BOUNDARY(n) __align__(n)
+#elif EIGEN_COMP_GNUC || EIGEN_COMP_PGI || EIGEN_COMP_IBM || EIGEN_COMP_ARM
+ #define EIGEN_ALIGN_TO_BOUNDARY(n) __attribute__((aligned(n)))
+#elif EIGEN_COMP_MSVC
+ #define EIGEN_ALIGN_TO_BOUNDARY(n) __declspec(align(n))
+#elif EIGEN_COMP_SUNCC
+ // FIXME not sure about this one:
+ #define EIGEN_ALIGN_TO_BOUNDARY(n) __attribute__((aligned(n)))
+#else
+ #error Please tell me what is the equivalent of __attribute__((aligned(n))) for your compiler
+#endif
+
+#define EIGEN_ALIGN16 EIGEN_ALIGN_TO_BOUNDARY(16)
+#define EIGEN_ALIGN32 EIGEN_ALIGN_TO_BOUNDARY(32)
+#define EIGEN_ALIGN_DEFAULT EIGEN_ALIGN_TO_BOUNDARY(EIGEN_ALIGN_BYTES)
+#define EIGEN_ALIGN_MAX EIGEN_ALIGN_DEFAULT
+
+#if EIGEN_ALIGN_STATICALLY
+#define EIGEN_USER_ALIGN_TO_BOUNDARY(n) EIGEN_ALIGN_TO_BOUNDARY(n)
+#define EIGEN_USER_ALIGN16 EIGEN_ALIGN16
+#define EIGEN_USER_ALIGN32 EIGEN_ALIGN32
+#define EIGEN_USER_ALIGN_DEFAULT EIGEN_ALIGN_DEFAULT
+#else
+#define EIGEN_USER_ALIGN_TO_BOUNDARY(n)
+#define EIGEN_USER_ALIGN16
+#define EIGEN_USER_ALIGN32
+#define EIGEN_USER_ALIGN_DEFAULT
+#endif
+
+#ifdef EIGEN_DONT_USE_RESTRICT_KEYWORD
+ #define EIGEN_RESTRICT
+#endif
+#ifndef EIGEN_RESTRICT
+ #define EIGEN_RESTRICT __restrict
+#endif
+
+#ifndef EIGEN_STACK_ALLOCATION_LIMIT
+#define EIGEN_STACK_ALLOCATION_LIMIT 20000
+#endif
+
+#ifndef EIGEN_DEFAULT_IO_FORMAT
+#ifdef EIGEN_MAKING_DOCS
+// format used in Eigen's documentation
+// We need to define it here because escaping characters in CMake's add_definitions argument seems very problematic.
+#define EIGEN_DEFAULT_IO_FORMAT Eigen::IOFormat(3, 0, " ", "\n", "", "")
+#else
+#define EIGEN_DEFAULT_IO_FORMAT Eigen::IOFormat()
+#endif
+#endif
+
+// just an empty macro !
+#define EIGEN_EMPTY
+
+#if EIGEN_COMP_MSVC_STRICT
+ #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \
+ using Base::operator =;
+#elif EIGEN_COMP_CLANG // workaround clang bug (see http://forum.kde.org/viewtopic.php?f=74&t=102653)
+ #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \
+ using Base::operator =; \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Derived& other) { Base::operator=(other); return *this; } \
+ template <typename OtherDerived> \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const DenseBase<OtherDerived>& other) { Base::operator=(other.derived()); return *this; }
+#else
+ #define EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived) \
+ using Base::operator =; \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& operator=(const Derived& other) \
+ { \
+ Base::operator=(other); \
+ return *this; \
+ }
+#endif
+
+#define EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Derived) EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Derived)
+
+/**
+* Just a side note. Commenting within defines works only by documenting
+* behind the object (via '!<'). Comments cannot be multi-line, and thus
+* we have these extra long lines. What confuses doxygen here is
+* that we use '\' and basically have a bunch of typedefs with their
+* documentation on a single line.
+**/
+
+#define EIGEN_GENERIC_PUBLIC_INTERFACE(Derived) \
+ typedef typename Eigen::internal::traits<Derived>::Scalar Scalar; /*!< \brief Numeric type, e.g. float, double, int or std::complex<float>. */ \
+  typedef typename Eigen::NumTraits<Scalar>::Real RealScalar; /*!< \brief The underlying numeric type for composed scalar types. \details In cases where Scalar is e.g. std::complex<T>, T corresponds to RealScalar. */ \
+ typedef typename Base::CoeffReturnType CoeffReturnType; /*!< \brief The return type for coefficient access. \details Depending on whether the object allows direct coefficient access (e.g. for a MatrixXd), this type is either 'const Scalar&' or simply 'Scalar' for objects that do not allow direct coefficient access. */ \
+ typedef typename Eigen::internal::nested<Derived>::type Nested; \
+ typedef typename Eigen::internal::traits<Derived>::StorageKind StorageKind; \
+ typedef typename Eigen::internal::traits<Derived>::Index Index; \
+ enum { RowsAtCompileTime = Eigen::internal::traits<Derived>::RowsAtCompileTime, \
+ ColsAtCompileTime = Eigen::internal::traits<Derived>::ColsAtCompileTime, \
+ Flags = Eigen::internal::traits<Derived>::Flags, \
+ CoeffReadCost = Eigen::internal::traits<Derived>::CoeffReadCost, \
+ SizeAtCompileTime = Base::SizeAtCompileTime, \
+ MaxSizeAtCompileTime = Base::MaxSizeAtCompileTime, \
+ IsVectorAtCompileTime = Base::IsVectorAtCompileTime };
+
+
+#define EIGEN_DENSE_PUBLIC_INTERFACE(Derived) \
+ typedef typename Eigen::internal::traits<Derived>::Scalar Scalar; /*!< \brief Numeric type, e.g. float, double, int or std::complex<float>. */ \
+  typedef typename Eigen::NumTraits<Scalar>::Real RealScalar; /*!< \brief The underlying numeric type for composed scalar types. \details In cases where Scalar is e.g. std::complex<T>, T corresponds to RealScalar. */ \
+ typedef typename Base::PacketScalar PacketScalar; \
+ typedef typename Base::CoeffReturnType CoeffReturnType; /*!< \brief The return type for coefficient access. \details Depending on whether the object allows direct coefficient access (e.g. for a MatrixXd), this type is either 'const Scalar&' or simply 'Scalar' for objects that do not allow direct coefficient access. */ \
+ typedef typename Eigen::internal::nested<Derived>::type Nested; \
+ typedef typename Eigen::internal::traits<Derived>::StorageKind StorageKind; \
+ typedef typename Eigen::internal::traits<Derived>::Index Index; \
+ enum { RowsAtCompileTime = Eigen::internal::traits<Derived>::RowsAtCompileTime, \
+ ColsAtCompileTime = Eigen::internal::traits<Derived>::ColsAtCompileTime, \
+ MaxRowsAtCompileTime = Eigen::internal::traits<Derived>::MaxRowsAtCompileTime, \
+ MaxColsAtCompileTime = Eigen::internal::traits<Derived>::MaxColsAtCompileTime, \
+ Flags = Eigen::internal::traits<Derived>::Flags, \
+ CoeffReadCost = Eigen::internal::traits<Derived>::CoeffReadCost, \
+ SizeAtCompileTime = Base::SizeAtCompileTime, \
+ MaxSizeAtCompileTime = Base::MaxSizeAtCompileTime, \
+ IsVectorAtCompileTime = Base::IsVectorAtCompileTime }; \
+ using Base::derived; \
+ using Base::const_cast_derived;
+
+
+#define EIGEN_PLAIN_ENUM_MIN(a,b) (((int)a <= (int)b) ? (int)a : (int)b)
+#define EIGEN_PLAIN_ENUM_MAX(a,b) (((int)a >= (int)b) ? (int)a : (int)b)
+
+// EIGEN_SIZE_MIN_PREFER_DYNAMIC gives the min between compile-time sizes. 0 has absolute priority, followed by 1,
+// followed by Dynamic, followed by other finite values. The reason for giving Dynamic the priority over
+// finite values is that min(3, Dynamic) should be Dynamic, since that could be anything between 0 and 3.
+#define EIGEN_SIZE_MIN_PREFER_DYNAMIC(a,b) (((int)a == 0 || (int)b == 0) ? 0 \
+ : ((int)a == 1 || (int)b == 1) ? 1 \
+ : ((int)a == Dynamic || (int)b == Dynamic) ? Dynamic \
+ : ((int)a <= (int)b) ? (int)a : (int)b)
+
+// EIGEN_SIZE_MIN_PREFER_FIXED is a variant of EIGEN_SIZE_MIN_PREFER_DYNAMIC comparing MaxSizes. The difference is that finite values
+// now have priority over Dynamic, so that min(3, Dynamic) gives 3. Indeed, whatever the actual value is
+// (between 0 and 3), it is not more than 3.
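+// For illustration: EIGEN_SIZE_MIN_PREFER_DYNAMIC(3,Dynamic) is Dynamic, whereas
+// EIGEN_SIZE_MIN_PREFER_FIXED(3,Dynamic) (defined below) is 3; both yield 0 whenever either argument is 0.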
+#define EIGEN_SIZE_MIN_PREFER_FIXED(a,b) (((int)a == 0 || (int)b == 0) ? 0 \
+ : ((int)a == 1 || (int)b == 1) ? 1 \
+ : ((int)a == Dynamic && (int)b == Dynamic) ? Dynamic \
+ : ((int)a == Dynamic) ? (int)b \
+ : ((int)b == Dynamic) ? (int)a \
+ : ((int)a <= (int)b) ? (int)a : (int)b)
+
+// see EIGEN_SIZE_MIN_PREFER_DYNAMIC. No need for a separate variant for MaxSizes here.
+#define EIGEN_SIZE_MAX(a,b) (((int)a == Dynamic || (int)b == Dynamic) ? Dynamic \
+ : ((int)a >= (int)b) ? (int)a : (int)b)
+
+#define EIGEN_LOGICAL_XOR(a,b) (((a) || (b)) && !((a) && (b)))
+
+#define EIGEN_IMPLIES(a,b) (!(a) || (b))
+
+#define EIGEN_MAKE_CWISE_BINARY_OP(METHOD,FUNCTOR) \
+ template<typename OtherDerived> \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const CwiseBinaryOp<FUNCTOR<Scalar>, const Derived, const OtherDerived> \
+ (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
+ { \
+ return CwiseBinaryOp<FUNCTOR<Scalar>, const Derived, const OtherDerived>(derived(), other.derived()); \
+ }
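+// Illustrative usage (mirroring Eigen's own plugin headers, assuming internal::scalar_min_op):
+//   EIGEN_MAKE_CWISE_BINARY_OP(min, internal::scalar_min_op)
+// declares a min() member returning a coefficient-wise binary expression.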
+
+// the expression type of a cwise product
+#define EIGEN_CWISE_PRODUCT_RETURN_TYPE(LHS,RHS) \
+ CwiseBinaryOp< \
+ internal::scalar_product_op< \
+ typename internal::traits<LHS>::Scalar, \
+ typename internal::traits<RHS>::Scalar \
+ >, \
+ const LHS, \
+ const RHS \
+ >
+
+#endif // EIGEN_MACROS_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/MatrixMapper.h b/third_party/eigen3/Eigen/src/Core/util/MatrixMapper.h
new file mode 100644
index 0000000000..ec2ad018ff
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/MatrixMapper.h
@@ -0,0 +1,155 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Eric Martin <eric@ericmart.in>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATRIXMAPPER_H
+#define EIGEN_MATRIXMAPPER_H
+
+// To support both matrices and tensors, we need a way to abstractly access an
+// element of a matrix (where the matrix might be an implicitly flattened
+// tensor). This file abstracts the logic needed to access elements in a row
+// major or column major matrix.
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Scalar, typename Index>
+class BlasVectorMapper {
+ public:
+ EIGEN_ALWAYS_INLINE BlasVectorMapper(Scalar *data) : m_data(data) {}
+
+ EIGEN_ALWAYS_INLINE Scalar operator()(Index i) const {
+ return m_data[i];
+ }
+ template <typename Packet, int AlignmentType>
+ EIGEN_ALWAYS_INLINE Packet load(Index i) const {
+ return ploadt<Packet, AlignmentType>(m_data + i);
+ }
+
+ template <typename Packet>
+ bool aligned(Index i) const {
+ return (size_t(m_data+i)%sizeof(Packet))==0;
+ }
+
+ protected:
+ Scalar* m_data;
+};
+
+// We need a fast way to iterate down columns (if column major) that doesn't
+// involve performing a multiplication for each lookup.
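+// For illustration: a LinearMapper anchored at column j of a column-major matrix addresses m(i,j)
+// as data[i], so walking down the column only increments i instead of recomputing i + j*stride.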
+template<typename Scalar, typename Index, int AlignmentType>
+class BlasLinearMapper {
+ public:
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename packet_traits<Scalar>::half HalfPacket;
+
+ EIGEN_ALWAYS_INLINE BlasLinearMapper(Scalar *data) : m_data(data) {}
+
+ EIGEN_ALWAYS_INLINE void prefetch(int i) const {
+ internal::prefetch(&operator()(i));
+ }
+
+ EIGEN_ALWAYS_INLINE Scalar& operator()(Index i) const {
+ return m_data[i];
+ }
+
+ EIGEN_ALWAYS_INLINE Packet loadPacket(Index i) const {
+ return ploadt<Packet, AlignmentType>(m_data + i);
+ }
+
+ EIGEN_ALWAYS_INLINE HalfPacket loadHalfPacket(Index i) const {
+ return ploadt<HalfPacket, AlignmentType>(m_data + i);
+ }
+
+ EIGEN_ALWAYS_INLINE void storePacket(Index i, Packet p) const {
+ pstoret<Scalar, Packet, AlignmentType>(m_data + i, p);
+ }
+
+ protected:
+ Scalar* m_data;
+};
+
+// This mapper allows access into a matrix by coordinates i and j.
+template<typename Scalar, typename Index, int StorageOrder, int AlignmentType = Unaligned>
+class blas_data_mapper {
+ public:
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename packet_traits<Scalar>::half HalfPacket;
+
+ typedef BlasLinearMapper<Scalar, Index, AlignmentType> LinearMapper;
+ typedef BlasVectorMapper<Scalar, Index> VectorMapper;
+
+ EIGEN_ALWAYS_INLINE blas_data_mapper(Scalar* data, Index stride) : m_data(data), m_stride(stride) {}
+
+ EIGEN_ALWAYS_INLINE blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType>
+ getSubMapper(Index i, Index j) const {
+ return blas_data_mapper<Scalar, Index, StorageOrder, AlignmentType>(&operator()(i, j), m_stride);
+ }
+
+ EIGEN_ALWAYS_INLINE LinearMapper getLinearMapper(Index i, Index j) const {
+ return LinearMapper(&operator()(i, j));
+ }
+
+ EIGEN_ALWAYS_INLINE VectorMapper getVectorMapper(Index i, Index j) const {
+ return VectorMapper(&operator()(i, j));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Scalar& operator()(Index i, Index j) const {
+ return m_data[StorageOrder==RowMajor ? j + i*m_stride : i + j*m_stride];
+ }
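+  // For illustration: with column-major storage and stride 4 (a 4xN matrix), element (i,j) lives at
+  // m_data[i + 4*j]; with row-major storage and stride 3 (an Nx3 matrix), it lives at m_data[j + 3*i].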
+
+ EIGEN_ALWAYS_INLINE Packet loadPacket(Index i, Index j) const {
+ return ploadt<Packet, AlignmentType>(&operator()(i, j));
+ }
+
+ EIGEN_ALWAYS_INLINE HalfPacket loadHalfPacket(Index i, Index j) const {
+ return ploadt<HalfPacket, AlignmentType>(&operator()(i, j));
+ }
+
+ template<typename SubPacket>
+ EIGEN_ALWAYS_INLINE void scatterPacket(Index i, Index j, SubPacket p) const {
+ pscatter<Scalar, SubPacket>(&operator()(i, j), p, m_stride);
+ }
+
+ template<typename SubPacket>
+ EIGEN_ALWAYS_INLINE SubPacket gatherPacket(Index i, Index j) const {
+ return pgather<Scalar, SubPacket>(&operator()(i, j), m_stride);
+ }
+
+ const Index stride() const { return m_stride; }
+
+ Index firstAligned(Index size) const {
+ if (size_t(m_data)%sizeof(Scalar)) {
+ return -1;
+ }
+ return internal::first_aligned(m_data, size);
+ }
+
+ protected:
+ Scalar* EIGEN_RESTRICT m_data;
+ const Index m_stride;
+};
+
+// This is just a convenient way to work with
+// blas_data_mapper<const Scalar, Index, StorageOrder>.
+template<typename Scalar, typename Index, int StorageOrder>
+class const_blas_data_mapper : public blas_data_mapper<const Scalar, Index, StorageOrder> {
+ public:
+ EIGEN_ALWAYS_INLINE const_blas_data_mapper(const Scalar *data, Index stride) : blas_data_mapper<const Scalar, Index, StorageOrder>(data, stride) {}
+
+ EIGEN_ALWAYS_INLINE const_blas_data_mapper<Scalar, Index, StorageOrder> getSubMapper(Index i, Index j) const {
+ return const_blas_data_mapper<Scalar, Index, StorageOrder>(&(this->operator()(i, j)), this->m_stride);
+ }
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif //EIGEN_MATRIXMAPPER_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/Memory.h b/third_party/eigen3/Eigen/src/Core/util/Memory.h
new file mode 100644
index 0000000000..03a699177a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/Memory.h
@@ -0,0 +1,984 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2008-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Kenneth Riddile <kfriddile@yahoo.com>
+// Copyright (C) 2010 Hauke Heibel <hauke.heibel@gmail.com>
+// Copyright (C) 2010 Thomas Capricelli <orzel@freehackers.org>
+// Copyright (C) 2013 Pavel Holoborodko <pavel@holoborodko.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+/*****************************************************************************
+*** Platform checks for aligned malloc functions ***
+*****************************************************************************/
+
+#ifndef EIGEN_MEMORY_H
+#define EIGEN_MEMORY_H
+
+// See bug 554 (http://eigen.tuxfamily.org/bz/show_bug.cgi?id=554)
+// It seems to be unsafe to check _POSIX_ADVISORY_INFO without including unistd.h first.
+// Currently, let's include it only on unix systems:
+#if defined(__unix__) || defined(__unix)
+ #include <unistd.h>
+ #if ((defined __QNXNTO__) || (defined _GNU_SOURCE) || ((defined _XOPEN_SOURCE) && (_XOPEN_SOURCE >= 600))) && (defined _POSIX_ADVISORY_INFO) && (_POSIX_ADVISORY_INFO > 0)
+ #define EIGEN_HAS_POSIX_MEMALIGN 1
+ #endif
+#endif
+
+#ifndef EIGEN_HAS_POSIX_MEMALIGN
+ #define EIGEN_HAS_POSIX_MEMALIGN 0
+#endif
+
+#if defined EIGEN_VECTORIZE_SSE || defined EIGEN_VECTORIZE_AVX
+ #define EIGEN_HAS_MM_MALLOC 1
+#else
+ #define EIGEN_HAS_MM_MALLOC 0
+#endif
+
+namespace Eigen {
+
+namespace internal {
+
+EIGEN_DEVICE_FUNC inline void throw_std_bad_alloc()
+{
+#ifndef __CUDA_ARCH__
+ #ifdef EIGEN_EXCEPTIONS
+ throw std::bad_alloc();
+ #else
+ std::size_t huge = static_cast<std::size_t>(-1);
+ new int[huge];
+ #endif
+#endif
+}
+
+/*****************************************************************************
+*** Implementation of handmade aligned functions ***
+*****************************************************************************/
+
+/* ----- Hand made implementations of aligned malloc/free and realloc ----- */
+
+/** \internal Like malloc, but the returned pointer is guaranteed to be EIGEN_ALIGN_BYTES aligned.
+  * Fast, but wastes EIGEN_ALIGN_BYTES additional bytes of memory. Does not throw any exception.
+ */
+inline void* handmade_aligned_malloc(std::size_t size)
+{
+ void *original = std::malloc(size+EIGEN_ALIGN_BYTES);
+ if (original == 0) return 0;
+ void *aligned = reinterpret_cast<void*>((reinterpret_cast<std::size_t>(original) & ~(std::size_t(EIGEN_ALIGN_BYTES-1))) + EIGEN_ALIGN_BYTES);
+ *(reinterpret_cast<void**>(aligned) - 1) = original;
+ return aligned;
+}
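+// For illustration: if std::malloc returns p, handmade_aligned_malloc returns the first
+// EIGEN_ALIGN_BYTES-aligned address strictly past p and stashes p in the word just before the
+// returned pointer, which is how handmade_aligned_free recovers the original allocation.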
+
+/** \internal Frees memory allocated with handmade_aligned_malloc */
+inline void handmade_aligned_free(void *ptr)
+{
+ if (ptr) std::free(*(reinterpret_cast<void**>(ptr) - 1));
+}
+
+/** \internal
+  * \brief Reallocates aligned memory.
+  * Since our handmade version is based on std::malloc, we can rely on
+  * std::realloc to implement efficient reallocation.
+  */
+inline void* handmade_aligned_realloc(void* ptr, std::size_t size, std::size_t = 0)
+{
+ if (ptr == 0) return handmade_aligned_malloc(size);
+ void *original = *(reinterpret_cast<void**>(ptr) - 1);
+ std::ptrdiff_t previous_offset = static_cast<char *>(ptr)-static_cast<char *>(original);
+ original = std::realloc(original,size+EIGEN_ALIGN_BYTES);
+ if (original == 0) return 0;
+ void *aligned = reinterpret_cast<void*>((reinterpret_cast<std::size_t>(original) & ~(std::size_t(EIGEN_ALIGN_BYTES-1))) + EIGEN_ALIGN_BYTES);
+ void *previous_aligned = static_cast<char *>(original)+previous_offset;
+ if(aligned!=previous_aligned)
+ std::memmove(aligned, previous_aligned, size);
+
+ *(reinterpret_cast<void**>(aligned) - 1) = original;
+ return aligned;
+}
+
+/*****************************************************************************
+*** Implementation of generic aligned realloc (when no realloc can be used)***
+*****************************************************************************/
+
+EIGEN_DEVICE_FUNC void* aligned_malloc(std::size_t size);
+EIGEN_DEVICE_FUNC void aligned_free(void *ptr);
+
+/** \internal
+ * \brief Reallocates aligned memory.
+ * Allows reallocation with aligned ptr types. This implementation will
+ * always create a new memory chunk and copy the old data.
+ */
+inline void* generic_aligned_realloc(void* ptr, size_t size, size_t old_size)
+{
+ if (ptr==0)
+ return aligned_malloc(size);
+
+ if (size==0)
+ {
+ aligned_free(ptr);
+ return 0;
+ }
+
+ void* newptr = aligned_malloc(size);
+ if (newptr == 0)
+ {
+ #ifdef EIGEN_HAS_ERRNO
+ errno = ENOMEM; // according to the standard
+ #endif
+ return 0;
+ }
+
+ if (ptr != 0)
+ {
+ std::memcpy(newptr, ptr, (std::min)(size,old_size));
+ aligned_free(ptr);
+ }
+
+ return newptr;
+}
+
+/*****************************************************************************
+*** Implementation of portable aligned versions of malloc/free/realloc ***
+*****************************************************************************/
+
+#ifdef EIGEN_NO_MALLOC
+EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed()
+{
+ eigen_assert(false && "heap allocation is forbidden (EIGEN_NO_MALLOC is defined)");
+}
+#elif defined EIGEN_RUNTIME_NO_MALLOC
+EIGEN_DEVICE_FUNC inline bool is_malloc_allowed_impl(bool update, bool new_value = false)
+{
+ static bool value = true;
+ if (update == 1)
+ value = new_value;
+ return value;
+}
+EIGEN_DEVICE_FUNC inline bool is_malloc_allowed() { return is_malloc_allowed_impl(false); }
+EIGEN_DEVICE_FUNC inline bool set_is_malloc_allowed(bool new_value) { return is_malloc_allowed_impl(true, new_value); }
+EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed()
+{
+ eigen_assert(is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)");
+}
+#else
+EIGEN_DEVICE_FUNC inline void check_that_malloc_is_allowed()
+{}
+#endif
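+// Illustrative usage when EIGEN_RUNTIME_NO_MALLOC is defined (e.g. in a test that must not allocate):
+//   Eigen::internal::set_is_malloc_allowed(false);
+//   ... run code that is expected to be allocation-free ...
+//   Eigen::internal::set_is_malloc_allowed(true);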
+
+/** \internal Allocates \a size bytes. The returned pointer is guaranteed to have 16 or 32 bytes alignment depending on the requirements.
+ * On allocation error, the returned pointer is null, and std::bad_alloc is thrown.
+ */
+EIGEN_DEVICE_FUNC
+inline void* aligned_malloc(size_t size)
+{
+ check_that_malloc_is_allowed();
+
+ void *result;
+ #if !EIGEN_ALIGN
+ result = std::malloc(size);
+ #elif EIGEN_HAS_POSIX_MEMALIGN
+ if(posix_memalign(&result, EIGEN_ALIGN_BYTES, size)) result = 0;
+ #elif EIGEN_HAS_MM_MALLOC
+ result = _mm_malloc(size, EIGEN_ALIGN_BYTES);
+ #elif defined(_MSC_VER) && (!defined(_WIN32_WCE))
+ result = _aligned_malloc(size, EIGEN_ALIGN_BYTES);
+ #else
+ result = handmade_aligned_malloc(size);
+ #endif
+
+ if(!result && size)
+ throw_std_bad_alloc();
+
+ return result;
+}
+
+/** \internal Frees memory allocated with aligned_malloc. */
+EIGEN_DEVICE_FUNC
+inline void aligned_free(void *ptr)
+{
+ #if !EIGEN_ALIGN
+ std::free(ptr);
+ #elif EIGEN_HAS_POSIX_MEMALIGN
+ std::free(ptr);
+ #elif EIGEN_HAS_MM_MALLOC
+ _mm_free(ptr);
+ #elif defined(_MSC_VER) && (!defined(_WIN32_WCE))
+ _aligned_free(ptr);
+ #else
+ handmade_aligned_free(ptr);
+ #endif
+}
+
+/**
+* \internal
+* \brief Reallocates an aligned block of memory.
+* \throws std::bad_alloc on allocation failure
+**/
+inline void* aligned_realloc(void *ptr, size_t new_size, size_t old_size)
+{
+ EIGEN_UNUSED_VARIABLE(old_size);
+
+ void *result;
+#if !EIGEN_ALIGN
+ result = std::realloc(ptr,new_size);
+#elif EIGEN_HAS_POSIX_MEMALIGN
+ result = generic_aligned_realloc(ptr,new_size,old_size);
+#elif EIGEN_HAS_MM_MALLOC
+ // The defined(_mm_free) is just here to verify that this MSVC version
+ // implements _mm_malloc/_mm_free based on the corresponding _aligned_
+ // functions. This may not always be the case and we just try to be safe.
+ #if EIGEN_OS_WIN_STRICT && defined(_mm_free)
+ result = _aligned_realloc(ptr,new_size,EIGEN_ALIGN_BYTES);
+ #else
+ result = generic_aligned_realloc(ptr,new_size,old_size);
+ #endif
+#elif EIGEN_OS_WIN_STRICT
+ result = _aligned_realloc(ptr,new_size,EIGEN_ALIGN_BYTES);
+#else
+ result = handmade_aligned_realloc(ptr,new_size,old_size);
+#endif
+
+ if (!result && new_size)
+ throw_std_bad_alloc();
+
+ return result;
+}
+
+/*****************************************************************************
+*** Implementation of conditionally aligned functions ***
+*****************************************************************************/
+
+/** \internal Allocates \a size bytes. If Align is true, then the returned ptr is 16-byte-aligned.
+ * On allocation error, the returned pointer is null, and a std::bad_alloc is thrown.
+ */
+template<bool Align> EIGEN_DEVICE_FUNC inline void* conditional_aligned_malloc(size_t size)
+{
+ return aligned_malloc(size);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void* conditional_aligned_malloc<false>(size_t size)
+{
+ check_that_malloc_is_allowed();
+
+ void *result = std::malloc(size);
+ if(!result && size)
+ throw_std_bad_alloc();
+ return result;
+}
+
+/** \internal Frees memory allocated with conditional_aligned_malloc */
+template<bool Align> EIGEN_DEVICE_FUNC inline void conditional_aligned_free(void *ptr)
+{
+ aligned_free(ptr);
+}
+
+template<> EIGEN_DEVICE_FUNC inline void conditional_aligned_free<false>(void *ptr)
+{
+ std::free(ptr);
+}
+
+template<bool Align> inline void* conditional_aligned_realloc(void* ptr, size_t new_size, size_t old_size)
+{
+ return aligned_realloc(ptr, new_size, old_size);
+}
+
+template<> inline void* conditional_aligned_realloc<false>(void* ptr, size_t new_size, size_t)
+{
+ return std::realloc(ptr, new_size);
+}
+
+/*****************************************************************************
+*** Construction/destruction of array elements ***
+*****************************************************************************/
+
+/** \internal Constructs the elements of an array.
+  * The \a size parameter tells how many objects to call the constructor of T on.
+  */
+template<typename T> EIGEN_DEVICE_FUNC inline T* construct_elements_of_array(T *ptr, size_t size)
+{
+ for (size_t i=0; i < size; ++i) ::new (ptr + i) T;
+ return ptr;
+}
+
+/** \internal Destructs the elements of an array.
+  * The \a size parameter tells how many objects to call the destructor of T on.
+  */
+template<typename T> EIGEN_DEVICE_FUNC inline void destruct_elements_of_array(T *ptr, size_t size)
+{
+ // always destruct an array starting from the end.
+ if(ptr)
+ while(size) ptr[--size].~T();
+}
+
+/*****************************************************************************
+*** Implementation of aligned new/delete-like functions ***
+*****************************************************************************/
+
+template<typename T>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void check_size_for_overflow(size_t size)
+{
+ if(size > size_t(-1) / sizeof(T))
+ throw_std_bad_alloc();
+}
+
+/** \internal Allocates \a size objects of type T. The returned pointer is guaranteed to have 16 bytes alignment.
+ * On allocation error, the returned pointer is undefined, but a std::bad_alloc is thrown.
+ * The default constructor of T is called.
+ */
+template<typename T> EIGEN_DEVICE_FUNC inline T* aligned_new(size_t size)
+{
+ check_size_for_overflow<T>(size);
+ T *result = reinterpret_cast<T*>(aligned_malloc(sizeof(T)*size));
+ return construct_elements_of_array(result, size);
+}
+
+template<typename T, bool Align> EIGEN_DEVICE_FUNC inline T* conditional_aligned_new(size_t size)
+{
+ check_size_for_overflow<T>(size);
+ T *result = reinterpret_cast<T*>(conditional_aligned_malloc<Align>(sizeof(T)*size));
+ return construct_elements_of_array(result, size);
+}
+
+template<typename T> EIGEN_DEVICE_FUNC inline T* allocate_uvm(size_t size)
+{
+#if defined(EIGEN_USE_GPU) && defined(__CUDA_ARCH__)
+ return (T*)malloc(size);
+#elif defined(EIGEN_USE_GPU) && defined(__NVCC__)
+ T* result = NULL;
+ if (cudaMallocManaged(&result, size) != cudaSuccess) {
+ throw_std_bad_alloc();
+ }
+ return result;
+#else
+ return reinterpret_cast<T*>(conditional_aligned_malloc<true>(sizeof(T)*size));
+#endif
+}
+
+template<typename T> EIGEN_DEVICE_FUNC void deallocate_uvm(T* ptr)
+{
+#if defined(EIGEN_USE_GPU) && defined(__CUDA_ARCH__)
+ free(ptr);
+#elif defined(EIGEN_USE_GPU) && defined(__NVCC__)
+ if (cudaFree(ptr) != cudaSuccess) {
+ throw_std_bad_alloc();
+ }
+#else
+ return conditional_aligned_free<true>(ptr);
+#endif
+}
+
+/** \internal Deletes objects constructed with aligned_new.
+  * The \a size parameter tells how many objects to call the destructor of T on.
+  */
+template<typename T> EIGEN_DEVICE_FUNC inline void aligned_delete(T *ptr, size_t size)
+{
+ destruct_elements_of_array<T>(ptr, size);
+ aligned_free(ptr);
+}
+
+/** \internal Deletes objects constructed with conditional_aligned_new.
+  * The \a size parameter tells how many objects to call the destructor of T on.
+  */
+template<typename T, bool Align> EIGEN_DEVICE_FUNC inline void conditional_aligned_delete(T *ptr, size_t size)
+{
+ destruct_elements_of_array<T>(ptr, size);
+ conditional_aligned_free<Align>(ptr);
+}
+
+template<typename T, bool Align> EIGEN_DEVICE_FUNC inline T* conditional_aligned_realloc_new(T* pts, size_t new_size, size_t old_size)
+{
+ check_size_for_overflow<T>(new_size);
+ check_size_for_overflow<T>(old_size);
+ if(new_size < old_size)
+ destruct_elements_of_array(pts+new_size, old_size-new_size);
+ T *result = reinterpret_cast<T*>(conditional_aligned_realloc<Align>(reinterpret_cast<void*>(pts), sizeof(T)*new_size, sizeof(T)*old_size));
+ if(new_size > old_size)
+ construct_elements_of_array(result+old_size, new_size-old_size);
+ return result;
+}
+
+
+template<typename T, bool Align> EIGEN_DEVICE_FUNC inline T* conditional_aligned_new_auto(size_t size)
+{
+ check_size_for_overflow<T>(size);
+ T *result = reinterpret_cast<T*>(conditional_aligned_malloc<Align>(sizeof(T)*size));
+ if(NumTraits<T>::RequireInitialization)
+ construct_elements_of_array(result, size);
+ return result;
+}
+
+template<typename T, bool Align, bool UseUVM> EIGEN_DEVICE_FUNC inline T* conditional_managed_new_auto(size_t size)
+{
+ check_size_for_overflow<T>(size);
+ T *result;
+ if (UseUVM) {
+ result = allocate_uvm<T>(size*sizeof(T));
+ }
+ else {
+ result = reinterpret_cast<T*>(conditional_aligned_malloc<Align>(sizeof(T)*size));
+ }
+ if(NumTraits<T>::RequireInitialization)
+ construct_elements_of_array(result, size);
+ return result;
+}
+
+template<typename T, bool Align, bool UseUVM> EIGEN_DEVICE_FUNC inline void conditional_managed_delete_auto(T* ptr, size_t size)
+{
+ if(NumTraits<T>::RequireInitialization)
+ destruct_elements_of_array<T>(ptr, size);
+ if (UseUVM) {
+ deallocate_uvm(ptr);
+ }
+ else {
+ conditional_aligned_free<Align>(ptr);
+ }
+}
+
+template<typename T, bool Align> inline T* conditional_aligned_realloc_new_auto(T* pts, size_t new_size, size_t old_size)
+{
+ check_size_for_overflow<T>(new_size);
+ check_size_for_overflow<T>(old_size);
+ if(NumTraits<T>::RequireInitialization && (new_size < old_size))
+ destruct_elements_of_array(pts+new_size, old_size-new_size);
+ T *result = reinterpret_cast<T*>(conditional_aligned_realloc<Align>(reinterpret_cast<void*>(pts), sizeof(T)*new_size, sizeof(T)*old_size));
+ if(NumTraits<T>::RequireInitialization && (new_size > old_size))
+ construct_elements_of_array(result+old_size, new_size-old_size);
+ return result;
+}
+
+template<typename T, bool Align> EIGEN_DEVICE_FUNC inline void conditional_aligned_delete_auto(T *ptr, size_t size)
+{
+ if(NumTraits<T>::RequireInitialization)
+ destruct_elements_of_array<T>(ptr, size);
+ conditional_aligned_free<Align>(ptr);
+}
+
+/****************************************************************************/
+
+/** \internal Returns the index of the first element of the array that is well aligned for vectorization.
+ *
+ * \param array the address of the start of the array
+ * \param size the size of the array
+ *
+ * \note If no element of the array is well aligned, the size of the array is returned. Typically,
+ * for example with SSE, "well aligned" means 16-byte-aligned. If vectorization is disabled or if the
+ * packet size for the given scalar type is 1, then everything is considered well-aligned.
+ *
+ * \note If the scalar type is vectorizable, we rely on the following assumptions: sizeof(Scalar) is a
+ * power of 2, the packet size in bytes is also a power of 2, and is a multiple of sizeof(Scalar). On the
+ * other hand, we do not assume that the array address is a multiple of sizeof(Scalar), as that fails for
+ * example with Scalar=double on certain 32-bit platforms, see bug #79.
+ *
+ * There is also the variant first_aligned(const MatrixBase&) defined in DenseCoeffsBase.h.
+ */
+template<typename Scalar, typename Index>
+inline Index first_aligned(const Scalar* array, Index size)
+{
+ enum { PacketSize = packet_traits<Scalar>::size,
+ PacketAlignedMask = PacketSize-1
+ };
+
+ if(PacketSize==1)
+ {
+ // Either there is no vectorization, or a packet consists of exactly 1 scalar so that all elements
+ // of the array have the same alignment.
+ return 0;
+ }
+ else if(size_t(array) & (sizeof(Scalar)-1))
+ {
+ // There is vectorization for this scalar type, but the array is not aligned to the size of a single scalar.
+ // Consequently, no element of the array is well aligned.
+ return size;
+ }
+ else
+ {
+ return std::min<Index>( (PacketSize - (Index((size_t(array)/sizeof(Scalar))) & PacketAlignedMask))
+ & PacketAlignedMask, size);
+ }
+}
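+// For illustration (assuming 4-float SSE packets): for a float array starting 8 bytes past a
+// 16-byte boundary, first_aligned returns 2, since array+2 is the first 16-byte-aligned element
+// (provided the array holds at least 2 elements).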
+
+/** \internal Returns the smallest integer multiple of \a base that is greater than or equal to \a size
+ */
+template<typename Index>
+inline Index first_multiple(Index size, Index base)
+{
+ return ((size+base-1)/base)*base;
+}
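+// For illustration: first_multiple(13, 4) returns 16, while first_multiple(12, 4) returns 12.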
+
+// std::copy is much slower than memcpy, so let's introduce a smart_copy which
+// uses memcpy on trivial types, i.e., on types that do not require an initialization ctor.
+template<typename T, bool UseMemcpy> struct smart_copy_helper;
+
+template<typename T> EIGEN_DEVICE_FUNC void smart_copy(const T* start, const T* end, T* target)
+{
+ smart_copy_helper<T,!NumTraits<T>::RequireInitialization>::run(start, end, target);
+}
+
+template<typename T> struct smart_copy_helper<T,true> {
+ static inline EIGEN_DEVICE_FUNC void run(const T* start, const T* end, T* target)
+ { memcpy(target, start, std::ptrdiff_t(end)-std::ptrdiff_t(start)); }
+};
+
+template<typename T> struct smart_copy_helper<T,false> {
+ static inline EIGEN_DEVICE_FUNC void run(const T* start, const T* end, T* target)
+ { std::copy(start, end, target); }
+};
+
+// intelligent memmove. falls back to std::memmove for POD types, uses std::copy otherwise.
+template<typename T, bool UseMemmove> struct smart_memmove_helper;
+
+template<typename T> void smart_memmove(const T* start, const T* end, T* target)
+{
+ smart_memmove_helper<T,!NumTraits<T>::RequireInitialization>::run(start, end, target);
+}
+
+template<typename T> struct smart_memmove_helper<T,true> {
+ static inline void run(const T* start, const T* end, T* target)
+ { std::memmove(target, start, std::ptrdiff_t(end)-std::ptrdiff_t(start)); }
+};
+
+template<typename T> struct smart_memmove_helper<T,false> {
+ static inline void run(const T* start, const T* end, T* target)
+ {
+ if (uintptr_t(target) < uintptr_t(start))
+ {
+ std::copy(start, end, target);
+ }
+ else
+ {
+ std::ptrdiff_t count = (std::ptrdiff_t(end)-std::ptrdiff_t(start)) / sizeof(T);
+ std::copy_backward(start, end, target + count);
+ }
+ }
+};
+
+
+/*****************************************************************************
+*** Implementation of runtime stack allocation (falling back to malloc) ***
+*****************************************************************************/
+
+// You can override Eigen's default behavior regarding alloca by defining EIGEN_ALLOCA
+// to the appropriate stack allocation function.
+#ifndef EIGEN_ALLOCA
+ #if (defined __linux__) || (defined __APPLE__)
+ #define EIGEN_ALLOCA alloca
+ #elif defined(_MSC_VER)
+ #define EIGEN_ALLOCA _alloca
+ #endif
+#endif
+
+// This helper class constructs the allocated memory, and takes care of destructing and freeing the handled data
+// at destruction time. In practice this helper class is mainly useful to avoid memory leaks in case of exceptions.
+template<typename T> class aligned_stack_memory_handler
+{
+ public:
+ /* Creates a stack_memory_handler responsible for the buffer \a ptr of size \a size.
+ * Note that \a ptr can be 0 regardless of the other parameters.
+ * This constructor takes care of constructing/initializing the elements of the buffer if required by the scalar type T (see NumTraits<T>::RequireInitialization).
+     * In this case, the buffer elements will also be destructed when this handler is destructed.
+ * Finally, if \a dealloc is true, then the pointer \a ptr is freed.
+ **/
+ aligned_stack_memory_handler(T* ptr, size_t size, bool dealloc)
+ : m_ptr(ptr), m_size(size), m_deallocate(dealloc)
+ {
+ if(NumTraits<T>::RequireInitialization && m_ptr)
+ Eigen::internal::construct_elements_of_array(m_ptr, size);
+ }
+ ~aligned_stack_memory_handler()
+ {
+ if(NumTraits<T>::RequireInitialization && m_ptr)
+ Eigen::internal::destruct_elements_of_array<T>(m_ptr, m_size);
+ if(m_deallocate)
+ Eigen::internal::aligned_free(m_ptr);
+ }
+ protected:
+ T* m_ptr;
+ size_t m_size;
+ bool m_deallocate;
+};
+
+} // end namespace internal
+
+/** \internal
+  * Declares, allocates and constructs an aligned buffer named NAME of SIZE elements of type TYPE on the stack
+  * if SIZE is smaller than EIGEN_STACK_ALLOCATION_LIMIT, and if stack allocation is supported by the platform
+  * (currently Linux, Mac OS X, and Visual Studio only). Otherwise the memory is allocated on the heap.
+ * The allocated buffer is automatically deleted when exiting the scope of this declaration.
+ * If BUFFER is non null, then the declared variable is simply an alias for BUFFER, and no allocation/deletion occurs.
+ * Here is an example:
+ * \code
+ * {
+ * ei_declare_aligned_stack_constructed_variable(float,data,size,0);
+ * // use data[0] to data[size-1]
+ * }
+ * \endcode
+  * The underlying stack allocation function can be controlled with the EIGEN_ALLOCA preprocessor token.
+ */
+#ifdef EIGEN_ALLOCA
+ // The native alloca() that comes with llvm aligns buffer on 16 bytes even when AVX is enabled.
+#if defined(__arm__) || defined(_WIN32) || EIGEN_ALIGN_BYTES > 16
+ #define EIGEN_ALIGNED_ALLOCA(SIZE) reinterpret_cast<void*>((reinterpret_cast<size_t>(EIGEN_ALLOCA(SIZE+EIGEN_ALIGN_BYTES)) & ~(size_t(EIGEN_ALIGN_BYTES-1))) + EIGEN_ALIGN_BYTES)
+ #else
+ #define EIGEN_ALIGNED_ALLOCA EIGEN_ALLOCA
+ #endif
+
+ #define ei_declare_aligned_stack_constructed_variable(TYPE,NAME,SIZE,BUFFER) \
+ Eigen::internal::check_size_for_overflow<TYPE>(SIZE); \
+ TYPE* NAME = (BUFFER)!=0 ? (BUFFER) \
+ : reinterpret_cast<TYPE*>( \
+ (sizeof(TYPE)*SIZE<=EIGEN_STACK_ALLOCATION_LIMIT) ? EIGEN_ALIGNED_ALLOCA(sizeof(TYPE)*SIZE) \
+ : Eigen::internal::aligned_malloc(sizeof(TYPE)*SIZE) ); \
+ Eigen::internal::aligned_stack_memory_handler<TYPE> EIGEN_CAT(NAME,_stack_memory_destructor)((BUFFER)==0 ? NAME : 0,SIZE,sizeof(TYPE)*SIZE>EIGEN_STACK_ALLOCATION_LIMIT)
+
+#else
+
+ #define ei_declare_aligned_stack_constructed_variable(TYPE,NAME,SIZE,BUFFER) \
+ Eigen::internal::check_size_for_overflow<TYPE>(SIZE); \
+ TYPE* NAME = (BUFFER)!=0 ? BUFFER : reinterpret_cast<TYPE*>(Eigen::internal::aligned_malloc(sizeof(TYPE)*SIZE)); \
+ Eigen::internal::aligned_stack_memory_handler<TYPE> EIGEN_CAT(NAME,_stack_memory_destructor)((BUFFER)==0 ? NAME : 0,SIZE,true)
+
+#endif
+
+
+/*****************************************************************************
+*** Implementation of EIGEN_MAKE_ALIGNED_OPERATOR_NEW [_IF] ***
+*****************************************************************************/
+
+#if EIGEN_ALIGN
+ #ifdef EIGEN_EXCEPTIONS
+ #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) \
+ void* operator new(size_t size, const std::nothrow_t&) throw() { \
+ try { return Eigen::internal::conditional_aligned_malloc<NeedsToAlign>(size); } \
+ catch (...) { return 0; } \
+ return 0; \
+ }
+ #else
+ #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) \
+ void* operator new(size_t size, const std::nothrow_t&) throw() { \
+ return Eigen::internal::conditional_aligned_malloc<NeedsToAlign>(size); \
+ }
+ #endif
+
+ #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign) \
+ void *operator new(size_t size) { \
+ return Eigen::internal::conditional_aligned_malloc<NeedsToAlign>(size); \
+ } \
+ void *operator new[](size_t size) { \
+ return Eigen::internal::conditional_aligned_malloc<NeedsToAlign>(size); \
+ } \
+ void operator delete(void * ptr) throw() { Eigen::internal::conditional_aligned_free<NeedsToAlign>(ptr); } \
+ void operator delete[](void * ptr) throw() { Eigen::internal::conditional_aligned_free<NeedsToAlign>(ptr); } \
+ /* in-place new and delete. since (at least afaik) there is no actual */ \
+ /* memory allocated we can safely let the default implementation handle */ \
+ /* this particular case. */ \
+ static void *operator new(size_t size, void *ptr) { return ::operator new(size,ptr); } \
+ static void *operator new[](size_t size, void* ptr) { return ::operator new[](size,ptr); } \
+ void operator delete(void * memory, void *ptr) throw() { return ::operator delete(memory,ptr); } \
+ void operator delete[](void * memory, void *ptr) throw() { return ::operator delete[](memory,ptr); } \
+ /* nothrow-new (returns zero instead of std::bad_alloc) */ \
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_NOTHROW(NeedsToAlign) \
+ void operator delete(void *ptr, const std::nothrow_t&) throw() { \
+ Eigen::internal::conditional_aligned_free<NeedsToAlign>(ptr); \
+ } \
+ typedef void eigen_aligned_operator_new_marker_type;
+#else
+ #define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(NeedsToAlign)
+#endif
+
+#define EIGEN_MAKE_ALIGNED_OPERATOR_NEW EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(true)
+#define EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(Scalar,Size) \
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(bool(((Size)!=Eigen::Dynamic) && ((sizeof(Scalar)*(Size))%EIGEN_ALIGN_BYTES==0)))
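+// Illustrative usage (MyState and pose are hypothetical names): a class with fixed-size
+// vectorizable members, e.g.
+//   struct MyState { Eigen::Matrix4f pose; EIGEN_MAKE_ALIGNED_OPERATOR_NEW };
+// receives suitably aligned storage when instances are created with 'new MyState'.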
+
+/****************************************************************************/
+
+/** \class aligned_allocator
+* \ingroup Core_Module
+*
+* \brief STL compatible allocator to use with 16 byte aligned types
+*
+* Example:
+* \code
+* // Matrix4f requires 16 bytes alignment:
+* std::map< int, Matrix4f, std::less<int>,
+* aligned_allocator<std::pair<const int, Matrix4f> > > my_map_mat4;
+* // Vector3f does not require 16 bytes alignment, no need to use Eigen's allocator:
+* std::map< int, Vector3f > my_map_vec3;
+* \endcode
+*
+* \sa \ref TopicStlContainers.
+*/
+template<class T>
+class aligned_allocator : public std::allocator<T>
+{
+public:
+ typedef size_t size_type;
+ typedef std::ptrdiff_t difference_type;
+ typedef T* pointer;
+ typedef const T* const_pointer;
+ typedef T& reference;
+ typedef const T& const_reference;
+ typedef T value_type;
+
+ template<class U>
+ struct rebind
+ {
+ typedef aligned_allocator<U> other;
+ };
+
+ aligned_allocator() : std::allocator<T>() {}
+
+ aligned_allocator(const aligned_allocator& other) : std::allocator<T>(other) {}
+
+ template<class U>
+ aligned_allocator(const aligned_allocator<U>& other) : std::allocator<T>(other) {}
+
+ ~aligned_allocator() {}
+
+ pointer allocate(size_type num, const void* /*hint*/ = 0)
+ {
+ internal::check_size_for_overflow<T>(num);
+ return static_cast<pointer>( internal::aligned_malloc(num * sizeof(T)) );
+ }
+
+ void deallocate(pointer p, size_type /*num*/)
+ {
+ internal::aligned_free(p);
+ }
+};
+
+//---------- Cache sizes ----------
+
+#if !defined(EIGEN_NO_CPUID)
+# if EIGEN_COMP_GNUC && EIGEN_ARCH_i386_OR_x86_64
+# if defined(__PIC__) && EIGEN_ARCH_i386
+ // Case for x86 with PIC
+# define EIGEN_CPUID(abcd,func,id) \
+ __asm__ __volatile__ ("xchgl %%ebx, %k1;cpuid; xchgl %%ebx,%k1": "=a" (abcd[0]), "=&r" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "a" (func), "c" (id));
+# elif defined(__PIC__) && EIGEN_ARCH_x86_64
+ // Case for x64 with PIC. In theory this is only a problem with recent gcc and with medium or large code model, not with the default small code model.
+ // However, we cannot detect which code model is used, and the xchg overhead is negligible anyway.
+# define EIGEN_CPUID(abcd,func,id) \
+ __asm__ __volatile__ ("xchg{q}\t{%%}rbx, %q1; cpuid; xchg{q}\t{%%}rbx, %q1": "=a" (abcd[0]), "=&r" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "0" (func), "2" (id));
+# else
+ // Case for x86_64 or x86 w/o PIC
+# define EIGEN_CPUID(abcd,func,id) \
+ __asm__ __volatile__ ("cpuid": "=a" (abcd[0]), "=b" (abcd[1]), "=c" (abcd[2]), "=d" (abcd[3]) : "0" (func), "2" (id) );
+# endif
+# elif EIGEN_COMP_MSVC
+# if (EIGEN_COMP_MSVC > 1500) && EIGEN_ARCH_i386_OR_x86_64
+# define EIGEN_CPUID(abcd,func,id) __cpuidex((int*)abcd,func,id)
+# endif
+# endif
+#endif
+
+namespace internal {
+
+#ifdef EIGEN_CPUID
+
+inline bool cpuid_is_vendor(int abcd[4], const char* vendor)
+{
+ return abcd[1]==(reinterpret_cast<const int*>(vendor))[0] && abcd[3]==(reinterpret_cast<const int*>(vendor))[1] && abcd[2]==(reinterpret_cast<const int*>(vendor))[2];
+}
+
+inline void queryCacheSizes_intel_direct(int& l1, int& l2, int& l3)
+{
+ int abcd[4];
+ l1 = l2 = l3 = 0;
+ int cache_id = 0;
+ int cache_type = 0;
+ do {
+ abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0;
+ EIGEN_CPUID(abcd,0x4,cache_id);
+ cache_type = (abcd[0] & 0x0F) >> 0;
+ if(cache_type==1||cache_type==3) // data or unified cache
+ {
+ int cache_level = (abcd[0] & 0xE0) >> 5; // A[7:5]
+ int ways = (abcd[1] & 0xFFC00000) >> 22; // B[31:22]
+ int partitions = (abcd[1] & 0x003FF000) >> 12; // B[21:12]
+ int line_size = (abcd[1] & 0x00000FFF) >> 0; // B[11:0]
+ int sets = (abcd[2]); // C[31:0]
+
+ int cache_size = (ways+1) * (partitions+1) * (line_size+1) * (sets+1);
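+      // For illustration: a 32 KB, 8-way data cache with 64-byte lines and 64 sets reports
+      // ways=7, partitions=0, line_size=63 and sets=63, giving 8*1*64*64 = 32768 bytes.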
+
+ switch(cache_level)
+ {
+ case 1: l1 = cache_size; break;
+ case 2: l2 = cache_size; break;
+ case 3: l3 = cache_size; break;
+ default: break;
+ }
+ }
+ cache_id++;
+ } while(cache_type>0 && cache_id<16);
+}
+
+inline void queryCacheSizes_intel_codes(int& l1, int& l2, int& l3)
+{
+ int abcd[4];
+ abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0;
+ l1 = l2 = l3 = 0;
+ EIGEN_CPUID(abcd,0x00000002,0);
+ unsigned char * bytes = reinterpret_cast<unsigned char *>(abcd)+2;
+ bool check_for_p2_core2 = false;
+ for(int i=0; i<14; ++i)
+ {
+ switch(bytes[i])
+ {
+ case 0x0A: l1 = 8; break; // 0Ah data L1 cache, 8 KB, 2 ways, 32 byte lines
+ case 0x0C: l1 = 16; break; // 0Ch data L1 cache, 16 KB, 4 ways, 32 byte lines
+ case 0x0E: l1 = 24; break; // 0Eh data L1 cache, 24 KB, 6 ways, 64 byte lines
+ case 0x10: l1 = 16; break; // 10h data L1 cache, 16 KB, 4 ways, 32 byte lines (IA-64)
+ case 0x15: l1 = 16; break; // 15h code L1 cache, 16 KB, 4 ways, 32 byte lines (IA-64)
+ case 0x2C: l1 = 32; break; // 2Ch data L1 cache, 32 KB, 8 ways, 64 byte lines
+ case 0x30: l1 = 32; break; // 30h code L1 cache, 32 KB, 8 ways, 64 byte lines
+ case 0x60: l1 = 16; break; // 60h data L1 cache, 16 KB, 8 ways, 64 byte lines, sectored
+ case 0x66: l1 = 8; break; // 66h data L1 cache, 8 KB, 4 ways, 64 byte lines, sectored
+ case 0x67: l1 = 16; break; // 67h data L1 cache, 16 KB, 4 ways, 64 byte lines, sectored
+ case 0x68: l1 = 32; break; // 68h data L1 cache, 32 KB, 4 ways, 64 byte lines, sectored
+ case 0x1A: l2 = 96; break; // code and data L2 cache, 96 KB, 6 ways, 64 byte lines (IA-64)
+ case 0x22: l3 = 512; break; // code and data L3 cache, 512 KB, 4 ways (!), 64 byte lines, dual-sectored
+ case 0x23: l3 = 1024; break; // code and data L3 cache, 1024 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x25: l3 = 2048; break; // code and data L3 cache, 2048 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x29: l3 = 4096; break; // code and data L3 cache, 4096 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x39: l2 = 128; break; // code and data L2 cache, 128 KB, 4 ways, 64 byte lines, sectored
+ case 0x3A: l2 = 192; break; // code and data L2 cache, 192 KB, 6 ways, 64 byte lines, sectored
+ case 0x3B: l2 = 128; break; // code and data L2 cache, 128 KB, 2 ways, 64 byte lines, sectored
+ case 0x3C: l2 = 256; break; // code and data L2 cache, 256 KB, 4 ways, 64 byte lines, sectored
+ case 0x3D: l2 = 384; break; // code and data L2 cache, 384 KB, 6 ways, 64 byte lines, sectored
+ case 0x3E: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 64 byte lines, sectored
+ case 0x40: l2 = 0; break; // no integrated L2 cache (P6 core) or L3 cache (P4 core)
+ case 0x41: l2 = 128; break; // code and data L2 cache, 128 KB, 4 ways, 32 byte lines
+ case 0x42: l2 = 256; break; // code and data L2 cache, 256 KB, 4 ways, 32 byte lines
+ case 0x43: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 32 byte lines
+ case 0x44: l2 = 1024; break; // code and data L2 cache, 1024 KB, 4 ways, 32 byte lines
+ case 0x45: l2 = 2048; break; // code and data L2 cache, 2048 KB, 4 ways, 32 byte lines
+ case 0x46: l3 = 4096; break; // code and data L3 cache, 4096 KB, 4 ways, 64 byte lines
+ case 0x47: l3 = 8192; break; // code and data L3 cache, 8192 KB, 8 ways, 64 byte lines
+ case 0x48: l2 = 3072; break; // code and data L2 cache, 3072 KB, 12 ways, 64 byte lines
+ case 0x49: if(l2!=0) l3 = 4096; else {check_for_p2_core2=true; l3 = l2 = 4096;} break;// code and data L3 cache, 4096 KB, 16 ways, 64 byte lines (P4) or L2 for core2
+ case 0x4A: l3 = 6144; break; // code and data L3 cache, 6144 KB, 12 ways, 64 byte lines
+ case 0x4B: l3 = 8192; break; // code and data L3 cache, 8192 KB, 16 ways, 64 byte lines
+ case 0x4C: l3 = 12288; break; // code and data L3 cache, 12288 KB, 12 ways, 64 byte lines
+ case 0x4D: l3 = 16384; break; // code and data L3 cache, 16384 KB, 16 ways, 64 byte lines
+ case 0x4E: l2 = 6144; break; // code and data L2 cache, 6144 KB, 24 ways, 64 byte lines
+ case 0x78: l2 = 1024; break; // code and data L2 cache, 1024 KB, 4 ways, 64 byte lines
+ case 0x79: l2 = 128; break; // code and data L2 cache, 128 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x7A: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x7B: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x7C: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 64 byte lines, dual-sectored
+ case 0x7D: l2 = 2048; break; // code and data L2 cache, 2048 KB, 8 ways, 64 byte lines
+ case 0x7E: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 128 byte lines, sect. (IA-64)
+ case 0x7F: l2 = 512; break; // code and data L2 cache, 512 KB, 2 ways, 64 byte lines
+ case 0x80: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 64 byte lines
+ case 0x81: l2 = 128; break; // code and data L2 cache, 128 KB, 8 ways, 32 byte lines
+ case 0x82: l2 = 256; break; // code and data L2 cache, 256 KB, 8 ways, 32 byte lines
+ case 0x83: l2 = 512; break; // code and data L2 cache, 512 KB, 8 ways, 32 byte lines
+ case 0x84: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 32 byte lines
+ case 0x85: l2 = 2048; break; // code and data L2 cache, 2048 KB, 8 ways, 32 byte lines
+ case 0x86: l2 = 512; break; // code and data L2 cache, 512 KB, 4 ways, 64 byte lines
+ case 0x87: l2 = 1024; break; // code and data L2 cache, 1024 KB, 8 ways, 64 byte lines
+ case 0x88: l3 = 2048; break; // code and data L3 cache, 2048 KB, 4 ways, 64 byte lines (IA-64)
+ case 0x89: l3 = 4096; break; // code and data L3 cache, 4096 KB, 4 ways, 64 byte lines (IA-64)
+ case 0x8A: l3 = 8192; break; // code and data L3 cache, 8192 KB, 4 ways, 64 byte lines (IA-64)
+ case 0x8D: l3 = 3072; break; // code and data L3 cache, 3072 KB, 12 ways, 128 byte lines (IA-64)
+
+ default: break;
+ }
+ }
+ if(check_for_p2_core2 && l2 == l3)
+ l3 = 0;
+ l1 *= 1024;
+ l2 *= 1024;
+ l3 *= 1024;
+}
+
+inline void queryCacheSizes_intel(int& l1, int& l2, int& l3, int max_std_funcs)
+{
+ if(max_std_funcs>=4)
+ queryCacheSizes_intel_direct(l1,l2,l3);
+ else
+ queryCacheSizes_intel_codes(l1,l2,l3);
+}
+
+inline void queryCacheSizes_amd(int& l1, int& l2, int& l3)
+{
+ int abcd[4];
+ abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0;
+ EIGEN_CPUID(abcd,0x80000005,0);
+ l1 = (abcd[2] >> 24) * 1024; // C[31:24] = L1 size in KB
+ abcd[0] = abcd[1] = abcd[2] = abcd[3] = 0;
+ EIGEN_CPUID(abcd,0x80000006,0);
+  l2 = (abcd[2] >> 16) * 1024; // C[31:16] = L2 cache size in KB
+  l3 = ((abcd[3] & 0xFFFC000) >> 18) * 512 * 1024; // D[31:18] = L3 cache size in units of 512 KB
+}
+#endif
+
+/** \internal
+ * Queries and returns the cache sizes in Bytes of the L1, L2, and L3 data caches respectively */
+inline void queryCacheSizes(int& l1, int& l2, int& l3)
+{
+ #ifdef EIGEN_CPUID
+ int abcd[4];
+
+ // identify the CPU vendor
+ EIGEN_CPUID(abcd,0x0,0);
+ int max_std_funcs = abcd[1];
+ if(cpuid_is_vendor(abcd,"GenuineIntel"))
+ queryCacheSizes_intel(l1,l2,l3,max_std_funcs);
+ else if(cpuid_is_vendor(abcd,"AuthenticAMD") || cpuid_is_vendor(abcd,"AMDisbetter!"))
+ queryCacheSizes_amd(l1,l2,l3);
+ else
+ // by default let's use Intel's API
+ queryCacheSizes_intel(l1,l2,l3,max_std_funcs);
+
+ // here is the list of other vendors:
+// ||cpuid_is_vendor(abcd,"VIA VIA VIA ")
+// ||cpuid_is_vendor(abcd,"CyrixInstead")
+// ||cpuid_is_vendor(abcd,"CentaurHauls")
+// ||cpuid_is_vendor(abcd,"GenuineTMx86")
+// ||cpuid_is_vendor(abcd,"TransmetaCPU")
+// ||cpuid_is_vendor(abcd,"RiseRiseRise")
+// ||cpuid_is_vendor(abcd,"Geode by NSC")
+// ||cpuid_is_vendor(abcd,"SiS SiS SiS ")
+// ||cpuid_is_vendor(abcd,"UMC UMC UMC ")
+// ||cpuid_is_vendor(abcd,"NexGenDriven")
+ #else
+ l1 = l2 = l3 = -1;
+ #endif
+}
+
+/** \internal
+ * \returns the size in Bytes of the L1 data cache */
+inline int queryL1CacheSize()
+{
+ int l1(-1), l2, l3;
+ queryCacheSizes(l1,l2,l3);
+ return l1;
+}
+
+inline int queryL2CacheSize()
+{
+ int l1, l2(-1), l3;
+ queryCacheSizes(l1,l2,l3);
+ return l2;
+}
+
+/** \internal
+  * \returns the size in Bytes of the L3 cache if it is present, otherwise of the L2 cache */
+inline int queryTopLevelCacheSize()
+{
+ int l1, l2(-1), l3(-1);
+ queryCacheSizes(l1,l2,l3);
+ return (std::max)(l2,l3);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_MEMORY_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/Meta.h b/third_party/eigen3/Eigen/src/Core/util/Meta.h
new file mode 100644
index 0000000000..7576b32689
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/Meta.h
@@ -0,0 +1,334 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_META_H
+#define EIGEN_META_H
+
+#if defined(__CUDA_ARCH__) && !defined(__GCUDACC__)
+#include <math_constants.h>
+#endif
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * \file Meta.h
+ * This file contains generic metaprogramming classes which are not specifically related to Eigen.
+ * \note In case you wonder: yes, we are aware that Boost already provides all these features;
+ * however, we do not want to add a dependency on Boost.
+ */
+
+struct true_type { enum { value = 1 }; };
+struct false_type { enum { value = 0 }; };
+
+template<bool Condition, typename Then, typename Else>
+struct conditional { typedef Then type; };
+
+template<typename Then, typename Else>
+struct conditional <false, Then, Else> { typedef Else type; };
+
+template<typename T, typename U> struct is_same { enum { value = 0 }; };
+template<typename T> struct is_same<T,T> { enum { value = 1 }; };
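+// For illustration: conditional<false,int,float>::type is float, is_same<int,int>::value is 1,
+// and is_same<int,long>::value is 0.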
+
+template<typename T> struct remove_reference { typedef T type; };
+template<typename T> struct remove_reference<T&> { typedef T type; };
+
+template<typename T> struct remove_pointer { typedef T type; };
+template<typename T> struct remove_pointer<T*> { typedef T type; };
+template<typename T> struct remove_pointer<T*const> { typedef T type; };
+
+template <class T> struct remove_const { typedef T type; };
+template <class T> struct remove_const<const T> { typedef T type; };
+template <class T> struct remove_const<const T[]> { typedef T type[]; };
+template <class T, unsigned int Size> struct remove_const<const T[Size]> { typedef T type[Size]; };
+
+template<typename T> struct remove_all { typedef T type; };
+template<typename T> struct remove_all<const T> { typedef typename remove_all<T>::type type; };
+template<typename T> struct remove_all<T const&> { typedef typename remove_all<T>::type type; };
+template<typename T> struct remove_all<T&> { typedef typename remove_all<T>::type type; };
+template<typename T> struct remove_all<T const*> { typedef typename remove_all<T>::type type; };
+template<typename T> struct remove_all<T*> { typedef typename remove_all<T>::type type; };
+
+template<typename T> struct is_arithmetic { enum { value = false }; };
+template<> struct is_arithmetic<float> { enum { value = true }; };
+template<> struct is_arithmetic<double> { enum { value = true }; };
+template<> struct is_arithmetic<long double> { enum { value = true }; };
+template<> struct is_arithmetic<bool> { enum { value = true }; };
+template<> struct is_arithmetic<char> { enum { value = true }; };
+template<> struct is_arithmetic<signed char> { enum { value = true }; };
+template<> struct is_arithmetic<unsigned char> { enum { value = true }; };
+template<> struct is_arithmetic<signed short> { enum { value = true }; };
+template<> struct is_arithmetic<unsigned short>{ enum { value = true }; };
+template<> struct is_arithmetic<signed int> { enum { value = true }; };
+template<> struct is_arithmetic<unsigned int> { enum { value = true }; };
+template<> struct is_arithmetic<signed long> { enum { value = true }; };
+template<> struct is_arithmetic<unsigned long> { enum { value = true }; };
+
+template <typename T> struct add_const { typedef const T type; };
+template <typename T> struct add_const<T&> { typedef T& type; };
+
+template <typename T> struct is_const { enum { value = 0 }; };
+template <typename T> struct is_const<T const> { enum { value = 1 }; };
+
+template<typename T> struct add_const_on_value_type { typedef const T type; };
+template<typename T> struct add_const_on_value_type<T&> { typedef T const& type; };
+template<typename T> struct add_const_on_value_type<T*> { typedef T const* type; };
+template<typename T> struct add_const_on_value_type<T* const> { typedef T const* const type; };
+template<typename T> struct add_const_on_value_type<T const* const> { typedef T const* const type; };
+
+/** \internal Allows enabling/disabling an overload
+ * according to a compile-time condition.
+ */
+template<bool Condition, typename T> struct enable_if;
+
+template<typename T> struct enable_if<true,T>
+{ typedef T type; };
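+
+// Usage sketch (illustrative, not part of the upstream header): enable_if is typically
+// used to drop an overload from the candidate set via SFINAE, e.g.
+//   template<typename T>
+//   typename enable_if<bool(is_arithmetic<T>::value), T>::type twice(const T& x) { return x + x; }
+// For a non-arithmetic T the nested 'type' does not exist and the overload is discarded.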
+
+#if defined(__CUDA_ARCH__) && !defined(__GCUDACC__)
+
+namespace device {
+
+template<typename T> struct numeric_limits
+{
+ EIGEN_DEVICE_FUNC
+ static T epsilon() { return 0; }
+ static T max() { assert(false && "Max not supported for this type"); }
+ static T lowest() { assert(false && "Lowest not supported for this type"); }
+};
+template<> struct numeric_limits<float>
+{
+ EIGEN_DEVICE_FUNC
+ static float epsilon() { return __FLT_EPSILON__; }
+ EIGEN_DEVICE_FUNC
+ static float max() { return CUDART_MAX_NORMAL_F; }
+ EIGEN_DEVICE_FUNC
+ static float lowest() { return -CUDART_MAX_NORMAL_F; }
+};
+template<> struct numeric_limits<double>
+{
+ EIGEN_DEVICE_FUNC
+ static double epsilon() { return __DBL_EPSILON__; }
+ EIGEN_DEVICE_FUNC
+ static double max() { return CUDART_INF; }
+ EIGEN_DEVICE_FUNC
+ static double lowest() { return -CUDART_INF; }
+};
+template<> struct numeric_limits<int>
+{
+ EIGEN_DEVICE_FUNC
+ static int epsilon() { return 0; }
+ EIGEN_DEVICE_FUNC
+ static int max() { return INT_MAX; }
+ EIGEN_DEVICE_FUNC
+ static int lowest() { return INT_MIN; }
+};
+template<> struct numeric_limits<long>
+{
+ EIGEN_DEVICE_FUNC
+ static long epsilon() { return 0; }
+ EIGEN_DEVICE_FUNC
+ static long max() { return LONG_MAX; }
+ EIGEN_DEVICE_FUNC
+ static long lowest() { return LONG_MIN; }
+};
+template<> struct numeric_limits<long long>
+{
+ EIGEN_DEVICE_FUNC
+ static long long epsilon() { return 0; }
+ EIGEN_DEVICE_FUNC
+ static long long max() { return LLONG_MAX; }
+ EIGEN_DEVICE_FUNC
+ static long long lowest() { return LLONG_MIN; }
+};
+
+}
+
+#endif
+
+/** \internal
+ * A base class to disable the default copy ctor and copy assignment operator.
+ */
+class noncopyable
+{
+ noncopyable(const noncopyable&);
+ const noncopyable& operator=(const noncopyable&);
+protected:
+ noncopyable() {}
+ ~noncopyable() {}
+};
+
+
+/** \internal
+ * Convenient struct to get the result type of a unary or binary functor.
+ *
+ * It supports both the current STL mechanism (using the result_type member) and the
+ * upcoming next STL generation (using a templated result member).
+ * If neither member is provided, then the type of the first argument is returned. FIXME: that behavior is a pretty bad hack.
+ */
+template<typename T> struct result_of {};
+
+struct has_none {int a[1];};
+struct has_std_result_type {int a[2];};
+struct has_tr1_result {int a[3];};
+
+template<typename Func, typename ArgType, int SizeOf=sizeof(has_none)>
+struct unary_result_of_select {typedef ArgType type;};
+
+template<typename Func, typename ArgType>
+struct unary_result_of_select<Func, ArgType, sizeof(has_std_result_type)> {typedef typename Func::result_type type;};
+
+template<typename Func, typename ArgType>
+struct unary_result_of_select<Func, ArgType, sizeof(has_tr1_result)> {typedef typename Func::template result<Func(ArgType)>::type type;};
+
+template<typename Func, typename ArgType>
+struct result_of<Func(ArgType)> {
+ template<typename T>
+ static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0);
+ template<typename T>
+ static has_tr1_result testFunctor(T const *, typename T::template result<T(ArgType)>::type const * = 0);
+ static has_none testFunctor(...);
+
+ // note that the following indirection is needed for gcc-3.3
+ enum {FunctorType = sizeof(testFunctor(static_cast<Func*>(0)))};
+ typedef typename unary_result_of_select<Func, ArgType, FunctorType>::type type;
+};
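+
+// Deduction sketch (illustrative): for a functor that declares a result_type, e.g.
+//   struct negate_float { typedef float result_type; float operator()(float x) const { return -x; } };
+// result_of<negate_float(float)>::type resolves to float through has_std_result_type;
+// a functor providing neither result_type nor a templated result member falls back to
+// the argument type itself (the FIXME'd behavior noted above).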
+
+template<typename Func, typename ArgType0, typename ArgType1, int SizeOf=sizeof(has_none)>
+struct binary_result_of_select {typedef ArgType0 type;};
+
+template<typename Func, typename ArgType0, typename ArgType1>
+struct binary_result_of_select<Func, ArgType0, ArgType1, sizeof(has_std_result_type)>
+{typedef typename Func::result_type type;};
+
+template<typename Func, typename ArgType0, typename ArgType1>
+struct binary_result_of_select<Func, ArgType0, ArgType1, sizeof(has_tr1_result)>
+{typedef typename Func::template result<Func(ArgType0,ArgType1)>::type type;};
+
+template<typename Func, typename ArgType0, typename ArgType1>
+struct result_of<Func(ArgType0,ArgType1)> {
+ template<typename T>
+ static has_std_result_type testFunctor(T const *, typename T::result_type const * = 0);
+ template<typename T>
+ static has_tr1_result testFunctor(T const *, typename T::template result<T(ArgType0,ArgType1)>::type const * = 0);
+ static has_none testFunctor(...);
+
+ // note that the following indirection is needed for gcc-3.3
+ enum {FunctorType = sizeof(testFunctor(static_cast<Func*>(0)))};
+ typedef typename binary_result_of_select<Func, ArgType0, ArgType1, FunctorType>::type type;
+};
+
+/** \internal In short, it computes int(sqrt(\a Y)) with \a Y an integer.
+ * Usage example: \code meta_sqrt<1023>::ret \endcode
+ */
+template<int Y,
+ int InfX = 0,
+ int SupX = ((Y==1) ? 1 : Y/2),
+ bool Done = ((SupX-InfX)<=1 ? true : ((SupX*SupX <= Y) && ((SupX+1)*(SupX+1) > Y))) >
+ // use ?: instead of || just to shut up a stupid gcc 4.3 warning
+class meta_sqrt
+{
+ enum {
+ MidX = (InfX+SupX)/2,
+ TakeInf = MidX*MidX > Y ? 1 : 0,
+ NewInf = int(TakeInf) ? InfX : int(MidX),
+ NewSup = int(TakeInf) ? int(MidX) : SupX
+ };
+ public:
+ enum { ret = meta_sqrt<Y,NewInf,NewSup>::ret };
+};
+
+template<int Y, int InfX, int SupX>
+class meta_sqrt<Y, InfX, SupX, true> { public: enum { ret = (SupX*SupX <= Y) ? SupX : InfX }; };
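+
+// Worked example (illustrative): meta_sqrt<1023>::ret evaluates to 31, since 31*31 == 961 <= 1023
+// while 32*32 == 1024 > 1023; the recursion above bisects [InfX,SupX] entirely at compile time.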
+
+/** \internal determines whether the product of two numeric types is allowed and what the return type is */
+template<typename T, typename U> struct scalar_product_traits
+{
+ enum { Defined = 0 };
+};
+
+template<typename T> struct scalar_product_traits<T,T>
+{
+ enum {
+ // Cost = NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef T ReturnType;
+};
+
+template<typename T> struct scalar_product_traits<T, const T>
+{
+ enum {
+ // Cost = NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef T ReturnType;
+};
+
+template<typename T> struct scalar_product_traits<const T, T>
+{
+ enum {
+ // Cost = NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef T ReturnType;
+};
+
+template<typename T> struct scalar_product_traits<T,std::complex<T> >
+{
+ enum {
+ // Cost = 2*NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef std::complex<T> ReturnType;
+};
+
+template<typename T> struct scalar_product_traits<std::complex<T>, T>
+{
+ enum {
+ // Cost = 2*NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef std::complex<T> ReturnType;
+};
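+
+// Usage sketch (illustrative):
+//   scalar_product_traits<double, double>::ReturnType              is double
+//   scalar_product_traits<float, std::complex<float> >::ReturnType is std::complex<float>
+//   scalar_product_traits<float, double>                           has Defined == 0 and no
+//   ReturnType, so a product mixing distinct real types is rejected at compile time.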
+
+// FIXME quick workaround around current limitation of result_of
+// template<typename Scalar, typename ArgType0, typename ArgType1>
+// struct result_of<scalar_product_op<Scalar>(ArgType0,ArgType1)> {
+// typedef typename scalar_product_traits<typename remove_all<ArgType0>::type, typename remove_all<ArgType1>::type>::ReturnType type;
+// };
+
+template<typename T> struct is_diagonal
+{ enum { ret = false }; };
+
+template<typename T> struct is_diagonal<DiagonalBase<T> >
+{ enum { ret = true }; };
+
+template<typename T> struct is_diagonal<DiagonalWrapper<T> >
+{ enum { ret = true }; };
+
+template<typename T, int S> struct is_diagonal<DiagonalMatrix<T,S> >
+{ enum { ret = true }; };
+
+} // end namespace internal
+
+namespace numext {
+
+#if defined(__CUDA_ARCH__)
+template<typename T> EIGEN_DEVICE_FUNC void swap(T &a, T &b) { T tmp = b; b = a; a = tmp; }
+#else
+template<typename T> EIGEN_STRONG_INLINE void swap(T &a, T &b) { std::swap(a,b); }
+#endif
+
+} // end namespace numext
+
+} // end namespace Eigen
+
+#endif // EIGEN_META_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/ReenableStupidWarnings.h b/third_party/eigen3/Eigen/src/Core/util/ReenableStupidWarnings.h
new file mode 100644
index 0000000000..5ddfbd4aa6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/ReenableStupidWarnings.h
@@ -0,0 +1,14 @@
+#ifdef EIGEN_WARNINGS_DISABLED
+#undef EIGEN_WARNINGS_DISABLED
+
+#ifndef EIGEN_PERMANENTLY_DISABLE_STUPID_WARNINGS
+ #ifdef _MSC_VER
+ #pragma warning( pop )
+ #elif defined __INTEL_COMPILER
+ #pragma warning pop
+ #elif defined __clang__
+ #pragma clang diagnostic pop
+ #endif
+#endif
+
+#endif // EIGEN_WARNINGS_DISABLED
diff --git a/third_party/eigen3/Eigen/src/Core/util/StaticAssert.h b/third_party/eigen3/Eigen/src/Core/util/StaticAssert.h
new file mode 100644
index 0000000000..396e27b900
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/StaticAssert.h
@@ -0,0 +1,206 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STATIC_ASSERT_H
+#define EIGEN_STATIC_ASSERT_H
+
+/* Some notes on Eigen's static assertion mechanism:
+ *
+ * - in EIGEN_STATIC_ASSERT(CONDITION,MSG) the parameter CONDITION must be a compile time boolean
+ * expression, and MSG an enum listed in struct internal::static_assertion<true>
+ *
+ * - define EIGEN_NO_STATIC_ASSERT to disable them (and save compilation time)
+ * in that case, the static assertion is converted to the following runtime assert:
+ * eigen_assert(CONDITION && "MSG")
+ *
+ * - currently EIGEN_STATIC_ASSERT can only be used in function scope
+ *
+ */
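+
+// Usage sketch (illustrative): a typical call site looks like
+//   EIGEN_STATIC_ASSERT(Derived::IsVectorAtCompileTime, YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX)
+// With C++11 support this expands to a native static_assert; otherwise looking up MSG in
+// internal::static_assertion<false> fails to compile, surfacing the message identifier in the error.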
+
+#ifndef EIGEN_NO_STATIC_ASSERT
+
+ #if defined(__GXX_EXPERIMENTAL_CXX0X__) || (EIGEN_COMP_MSVC >= 1600)
+
+ // if native static_assert is enabled, let's use it
+ #define EIGEN_STATIC_ASSERT(X,MSG) static_assert(X,#MSG);
+
+ #else // not CXX0X
+
+ namespace Eigen {
+
+ namespace internal {
+
+ template<bool condition>
+ struct static_assertion {};
+
+ template<>
+ struct static_assertion<true>
+ {
+ enum {
+ YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX,
+ YOU_MIXED_VECTORS_OF_DIFFERENT_SIZES,
+ YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES,
+ THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE,
+ THIS_METHOD_IS_ONLY_FOR_MATRICES_OF_A_SPECIFIC_SIZE,
+ THIS_METHOD_IS_ONLY_FOR_OBJECTS_OF_A_SPECIFIC_SIZE,
+ YOU_MADE_A_PROGRAMMING_MISTAKE,
+ EIGEN_INTERNAL_ERROR_PLEASE_FILE_A_BUG_REPORT,
+ EIGEN_INTERNAL_COMPILATION_ERROR_OR_YOU_MADE_A_PROGRAMMING_MISTAKE,
+ YOU_CALLED_A_FIXED_SIZE_METHOD_ON_A_DYNAMIC_SIZE_MATRIX_OR_VECTOR,
+ YOU_CALLED_A_DYNAMIC_SIZE_METHOD_ON_A_FIXED_SIZE_MATRIX_OR_VECTOR,
+ UNALIGNED_LOAD_AND_STORE_OPERATIONS_UNIMPLEMENTED_ON_ALTIVEC,
+ THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES,
+ FLOATING_POINT_ARGUMENT_PASSED__INTEGER_WAS_EXPECTED,
+ NUMERIC_TYPE_MUST_BE_REAL,
+ COEFFICIENT_WRITE_ACCESS_TO_SELFADJOINT_NOT_SUPPORTED,
+ WRITING_TO_TRIANGULAR_PART_WITH_UNIT_DIAGONAL_IS_NOT_SUPPORTED,
+ THIS_METHOD_IS_ONLY_FOR_FIXED_SIZE,
+ INVALID_MATRIX_PRODUCT,
+ INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS,
+ INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION,
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY,
+ THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES,
+ THIS_METHOD_IS_ONLY_FOR_ROW_MAJOR_MATRICES,
+ INVALID_MATRIX_TEMPLATE_PARAMETERS,
+ INVALID_MATRIXBASE_TEMPLATE_PARAMETERS,
+ BOTH_MATRICES_MUST_HAVE_THE_SAME_STORAGE_ORDER,
+ THIS_METHOD_IS_ONLY_FOR_DIAGONAL_MATRIX,
+ THE_MATRIX_OR_EXPRESSION_THAT_YOU_PASSED_DOES_NOT_HAVE_THE_EXPECTED_TYPE,
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES,
+ YOU_ALREADY_SPECIFIED_THIS_STRIDE,
+ INVALID_STORAGE_ORDER_FOR_THIS_VECTOR_EXPRESSION,
+ THE_BRACKET_OPERATOR_IS_ONLY_FOR_VECTORS__USE_THE_PARENTHESIS_OPERATOR_INSTEAD,
+ PACKET_ACCESS_REQUIRES_TO_HAVE_INNER_STRIDE_FIXED_TO_1,
+ THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS,
+ YOU_CANNOT_MIX_ARRAYS_AND_MATRICES,
+ YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION,
+ THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY,
+ YOU_ARE_TRYING_TO_USE_AN_INDEX_BASED_ACCESSOR_ON_AN_EXPRESSION_THAT_DOES_NOT_SUPPORT_THAT,
+ THIS_METHOD_IS_ONLY_FOR_1x1_EXPRESSIONS,
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_OF_BOOL,
+ THIS_METHOD_IS_ONLY_FOR_ARRAYS_NOT_MATRICES,
+ YOU_PASSED_A_ROW_VECTOR_BUT_A_COLUMN_VECTOR_WAS_EXPECTED,
+ YOU_PASSED_A_COLUMN_VECTOR_BUT_A_ROW_VECTOR_WAS_EXPECTED,
+ THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE,
+ THE_STORAGE_ORDER_OF_BOTH_SIDES_MUST_MATCH,
+ OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG
+ };
+ };
+
+ } // end namespace internal
+
+ } // end namespace Eigen
+
+ // Specialized implementation for MSVC to avoid "conditional
+ // expression is constant" warnings. This implementation doesn't
+ // appear to work under GCC, hence the multiple implementations.
+ #if EIGEN_COMP_MSVC
+
+ #define EIGEN_STATIC_ASSERT(CONDITION,MSG) \
+ {Eigen::internal::static_assertion<bool(CONDITION)>::MSG;}
+
+ #else
+ // In some cases clang interprets bool(CONDITION) as function declaration
+ #define EIGEN_STATIC_ASSERT(CONDITION,MSG) \
+ if (Eigen::internal::static_assertion<static_cast<bool>(CONDITION)>::MSG) {}
+
+ #endif
+
+ #endif // not CXX0X
+
+#else // EIGEN_NO_STATIC_ASSERT
+
+ #define EIGEN_STATIC_ASSERT(CONDITION,MSG) eigen_assert((CONDITION) && #MSG);
+
+#endif // EIGEN_NO_STATIC_ASSERT
+
+
+// static assertion failing if the type \a TYPE is not a vector type
+#define EIGEN_STATIC_ASSERT_VECTOR_ONLY(TYPE) \
+ EIGEN_STATIC_ASSERT(TYPE::IsVectorAtCompileTime, \
+ YOU_TRIED_CALLING_A_VECTOR_METHOD_ON_A_MATRIX)
+
+// static assertion failing if the type \a TYPE is not fixed-size
+#define EIGEN_STATIC_ASSERT_FIXED_SIZE(TYPE) \
+ EIGEN_STATIC_ASSERT(TYPE::SizeAtCompileTime!=Eigen::Dynamic, \
+ YOU_CALLED_A_FIXED_SIZE_METHOD_ON_A_DYNAMIC_SIZE_MATRIX_OR_VECTOR)
+
+// static assertion failing if the type \a TYPE is not dynamic-size
+#define EIGEN_STATIC_ASSERT_DYNAMIC_SIZE(TYPE) \
+ EIGEN_STATIC_ASSERT(TYPE::SizeAtCompileTime==Eigen::Dynamic, \
+ YOU_CALLED_A_DYNAMIC_SIZE_METHOD_ON_A_FIXED_SIZE_MATRIX_OR_VECTOR)
+
+// static assertion failing if the type \a TYPE is not a vector type of the given size
+#define EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(TYPE, SIZE) \
+ EIGEN_STATIC_ASSERT(TYPE::IsVectorAtCompileTime && TYPE::SizeAtCompileTime==SIZE, \
+ THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE)
+
+// static assertion failing if the type \a TYPE is not a vector type of the given size
+#define EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(TYPE, ROWS, COLS) \
+ EIGEN_STATIC_ASSERT(TYPE::RowsAtCompileTime==ROWS && TYPE::ColsAtCompileTime==COLS, \
+ THIS_METHOD_IS_ONLY_FOR_MATRICES_OF_A_SPECIFIC_SIZE)
+
+// static assertion failing if the two vector expression types are not compatible (same fixed-size or dynamic size)
+#define EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(TYPE0,TYPE1) \
+ EIGEN_STATIC_ASSERT( \
+ (int(TYPE0::SizeAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE1::SizeAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE0::SizeAtCompileTime)==int(TYPE1::SizeAtCompileTime)),\
+ YOU_MIXED_VECTORS_OF_DIFFERENT_SIZES)
+
+#define EIGEN_PREDICATE_SAME_MATRIX_SIZE(TYPE0,TYPE1) \
+ ( \
+ (int(TYPE0::SizeAtCompileTime)==0 && int(TYPE1::SizeAtCompileTime)==0) \
+ || (\
+ (int(TYPE0::RowsAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE1::RowsAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE0::RowsAtCompileTime)==int(TYPE1::RowsAtCompileTime)) \
+ && (int(TYPE0::ColsAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE1::ColsAtCompileTime)==Eigen::Dynamic \
+ || int(TYPE0::ColsAtCompileTime)==int(TYPE1::ColsAtCompileTime))\
+ ) \
+ )
+
+#ifdef EIGEN2_SUPPORT
+ #define EIGEN_STATIC_ASSERT_NON_INTEGER(TYPE) \
+ eigen_assert(!NumTraits<Scalar>::IsInteger);
+#else
+ #define EIGEN_STATIC_ASSERT_NON_INTEGER(TYPE) \
+ EIGEN_STATIC_ASSERT(!NumTraits<TYPE>::IsInteger, THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES)
+#endif
+
+
+// static assertion failing if it is guaranteed at compile-time that the two matrix expression types have different sizes
+#define EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(TYPE0,TYPE1) \
+ EIGEN_STATIC_ASSERT( \
+ EIGEN_PREDICATE_SAME_MATRIX_SIZE(TYPE0,TYPE1),\
+ YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES)
+
+#define EIGEN_STATIC_ASSERT_SIZE_1x1(TYPE) \
+ EIGEN_STATIC_ASSERT((TYPE::RowsAtCompileTime == 1 || TYPE::RowsAtCompileTime == Dynamic) && \
+ (TYPE::ColsAtCompileTime == 1 || TYPE::ColsAtCompileTime == Dynamic), \
+ THIS_METHOD_IS_ONLY_FOR_1x1_EXPRESSIONS)
+
+#define EIGEN_STATIC_ASSERT_LVALUE(Derived) \
+ EIGEN_STATIC_ASSERT(internal::is_lvalue<Derived>::value, \
+ THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY)
+
+#define EIGEN_STATIC_ASSERT_ARRAYXPR(Derived) \
+ EIGEN_STATIC_ASSERT((internal::is_same<typename internal::traits<Derived>::XprKind, ArrayXpr>::value), \
+ THIS_METHOD_IS_ONLY_FOR_ARRAYS_NOT_MATRICES)
+
+#define EIGEN_STATIC_ASSERT_SAME_XPR_KIND(Derived1, Derived2) \
+ EIGEN_STATIC_ASSERT((internal::is_same<typename internal::traits<Derived1>::XprKind, \
+ typename internal::traits<Derived2>::XprKind \
+ >::value), \
+ YOU_CANNOT_MIX_ARRAYS_AND_MATRICES)
+
+
+#endif // EIGEN_STATIC_ASSERT_H
diff --git a/third_party/eigen3/Eigen/src/Core/util/XprHelper.h b/third_party/eigen3/Eigen/src/Core/util/XprHelper.h
new file mode 100644
index 0000000000..13285909b4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Core/util/XprHelper.h
@@ -0,0 +1,481 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_XPRHELPER_H
+#define EIGEN_XPRHELPER_H
+
+// just a workaround because GCC seems to not really like empty structs
+// FIXME: gcc 4.3 generates bad code when strict-aliasing is enabled
+// so currently we simply disable this optimization for gcc 4.3
+#if EIGEN_COMP_GNUC && !EIGEN_GNUC_AT(4,3)
+ #define EIGEN_EMPTY_STRUCT_CTOR(X) \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE X() {} \
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE X(const X& ) {}
+#else
+ #define EIGEN_EMPTY_STRUCT_CTOR(X)
+#endif
+
+namespace Eigen {
+
+typedef EIGEN_DEFAULT_DENSE_INDEX_TYPE DenseIndex;
+
+namespace internal {
+
+// Classes inheriting no_assignment_operator don't generate a default operator=.
+class no_assignment_operator
+{
+ private:
+ no_assignment_operator& operator=(const no_assignment_operator&);
+};
+
+/** \internal return the index type with the largest number of bits */
+template<typename I1, typename I2>
+struct promote_index_type
+{
+ typedef typename conditional<(sizeof(I1)<sizeof(I2)), I2, I1>::type type;
+};
+
+/** \internal If the template parameter Value is Dynamic, this class is just a wrapper around a T variable that
+ * can be accessed using value() and setValue().
+ * Otherwise, this class is an empty structure and value() just returns the template parameter Value.
+ */
+template<typename T, int Value> class variable_if_dynamic
+{
+ public:
+ EIGEN_EMPTY_STRUCT_CTOR(variable_if_dynamic)
+ EIGEN_DEVICE_FUNC explicit variable_if_dynamic(T v) { EIGEN_ONLY_USED_FOR_DEBUG(v); eigen_assert(v == T(Value)); }
+ EIGEN_DEVICE_FUNC static T value() { return T(Value); }
+ EIGEN_DEVICE_FUNC void setValue(T) {}
+};
+
+template<typename T> class variable_if_dynamic<T, Dynamic>
+{
+ T m_value;
+ EIGEN_DEVICE_FUNC variable_if_dynamic() { eigen_assert(false); }
+ public:
+ EIGEN_DEVICE_FUNC explicit variable_if_dynamic(T value) : m_value(value) {}
+ EIGEN_DEVICE_FUNC T value() const { return m_value; }
+ EIGEN_DEVICE_FUNC void setValue(T value) { m_value = value; }
+};
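+
+// Usage sketch (illustrative): the same code path can carry a fixed or a runtime size,
+//   variable_if_dynamic<DenseIndex, 3> rows3(3);        // empty struct, value() returns 3
+//   variable_if_dynamic<DenseIndex, Dynamic> rowsN(n);  // stores n in a member
+// so fixed-size expressions pay no storage cost for sizes that are known at compile time.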
+
+/** \internal like variable_if_dynamic but for DynamicIndex
+ */
+template<typename T, int Value> class variable_if_dynamicindex
+{
+ public:
+ EIGEN_EMPTY_STRUCT_CTOR(variable_if_dynamicindex)
+ EIGEN_DEVICE_FUNC explicit variable_if_dynamicindex(T v) { EIGEN_ONLY_USED_FOR_DEBUG(v); eigen_assert(v == T(Value)); }
+ EIGEN_DEVICE_FUNC static T value() { return T(Value); }
+ EIGEN_DEVICE_FUNC void setValue(T) {}
+};
+
+template<typename T> class variable_if_dynamicindex<T, DynamicIndex>
+{
+ T m_value;
+ EIGEN_DEVICE_FUNC variable_if_dynamicindex() { eigen_assert(false); }
+ public:
+ EIGEN_DEVICE_FUNC explicit variable_if_dynamicindex(T value) : m_value(value) {}
+ EIGEN_DEVICE_FUNC T value() const { return m_value; }
+ EIGEN_DEVICE_FUNC void setValue(T value) { m_value = value; }
+};
+
+template<typename T> struct functor_traits
+{
+ enum
+ {
+ Cost = 10,
+ PacketAccess = false,
+ IsRepeatable = false
+ };
+};
+
+template<typename T> struct packet_traits;
+
+template<typename T> struct unpacket_traits
+{
+ typedef T type;
+ typedef T half;
+ enum {size=1};
+};
+
+template<typename _Scalar, int _Rows, int _Cols,
+ int _Options = AutoAlign |
+ ( (_Rows==1 && _Cols!=1) ? RowMajor
+ : (_Cols==1 && _Rows!=1) ? ColMajor
+ : EIGEN_DEFAULT_MATRIX_STORAGE_ORDER_OPTION ),
+ int _MaxRows = _Rows,
+ int _MaxCols = _Cols
+> class make_proper_matrix_type
+{
+ enum {
+ IsColVector = _Cols==1 && _Rows!=1,
+ IsRowVector = _Rows==1 && _Cols!=1,
+ Options = IsColVector ? (_Options | ColMajor) & ~RowMajor
+ : IsRowVector ? (_Options | RowMajor) & ~ColMajor
+ : _Options
+ };
+ public:
+ typedef Matrix<_Scalar, _Rows, _Cols, Options, _MaxRows, _MaxCols> type;
+};
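+
+// Illustrative: make_proper_matrix_type<float,1,4>::type forces RowMajor storage (a row
+// vector), while make_proper_matrix_type<float,4,1>::type forces ColMajor (a column vector);
+// general matrices keep the requested _Options unchanged.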
+
+template<typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
+class compute_matrix_flags
+{
+ enum {
+ row_major_bit = Options&RowMajor ? RowMajorBit : 0,
+ is_dynamic_size_storage = MaxRows==Dynamic || MaxCols==Dynamic,
+
+ aligned_bit =
+ (
+ ((Options&DontAlign)==0)
+ && (
+#if EIGEN_ALIGN_STATICALLY
+ ((!is_dynamic_size_storage) && (((MaxCols*MaxRows*int(sizeof(Scalar))) % EIGEN_ALIGN_BYTES) == 0))
+#else
+ 0
+#endif
+
+ ||
+
+#if EIGEN_ALIGN
+ is_dynamic_size_storage
+#else
+ 0
+#endif
+
+ )
+ ) ? AlignedBit : 0,
+ packet_access_bit = packet_traits<Scalar>::Vectorizable && aligned_bit ? PacketAccessBit : 0
+ };
+
+ public:
+ enum { ret = LinearAccessBit | LvalueBit | DirectAccessBit | NestByRefBit | packet_access_bit | row_major_bit | aligned_bit };
+};
+
+template<int _Rows, int _Cols> struct size_at_compile_time
+{
+ enum { ret = (_Rows==Dynamic || _Cols==Dynamic) ? Dynamic : _Rows * _Cols };
+};
+
+/* plain_matrix_type : the difference from eval is that plain_matrix_type is always a plain matrix type,
+ * whereas eval is a const reference in the case of a matrix
+ */
+
+template<typename T, typename StorageKind = typename traits<T>::StorageKind> struct plain_matrix_type;
+template<typename T, typename BaseClassType> struct plain_matrix_type_dense;
+template<typename T> struct plain_matrix_type<T,Dense>
+{
+ typedef typename plain_matrix_type_dense<T,typename traits<T>::XprKind>::type type;
+};
+
+template<typename T> struct plain_matrix_type_dense<T,MatrixXpr>
+{
+ typedef Matrix<typename traits<T>::Scalar,
+ traits<T>::RowsAtCompileTime,
+ traits<T>::ColsAtCompileTime,
+ AutoAlign | (traits<T>::Flags&RowMajorBit ? RowMajor : ColMajor),
+ traits<T>::MaxRowsAtCompileTime,
+ traits<T>::MaxColsAtCompileTime
+ > type;
+};
+
+template<typename T> struct plain_matrix_type_dense<T,ArrayXpr>
+{
+ typedef Array<typename traits<T>::Scalar,
+ traits<T>::RowsAtCompileTime,
+ traits<T>::ColsAtCompileTime,
+ AutoAlign | (traits<T>::Flags&RowMajorBit ? RowMajor : ColMajor),
+ traits<T>::MaxRowsAtCompileTime,
+ traits<T>::MaxColsAtCompileTime
+ > type;
+};
+
+/* eval : the return type of eval(). For matrices, this is just a const reference
+ * in order to avoid a useless copy
+ */
+
+template<typename T, typename StorageKind = typename traits<T>::StorageKind> struct eval;
+
+template<typename T> struct eval<T,Dense>
+{
+ typedef typename plain_matrix_type<T>::type type;
+// typedef typename T::PlainObject type;
+// typedef T::Matrix<typename traits<T>::Scalar,
+// traits<T>::RowsAtCompileTime,
+// traits<T>::ColsAtCompileTime,
+// AutoAlign | (traits<T>::Flags&RowMajorBit ? RowMajor : ColMajor),
+// traits<T>::MaxRowsAtCompileTime,
+// traits<T>::MaxColsAtCompileTime
+// > type;
+};
+
+// for matrices, no need to evaluate, just use a const reference to avoid a useless copy
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct eval<Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>, Dense>
+{
+ typedef const Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>& type;
+};
+
+template<typename _Scalar, int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols>
+struct eval<Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>, Dense>
+{
+ typedef const Array<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>& type;
+};
+
+
+
+/* plain_matrix_type_column_major : same as plain_matrix_type but guaranteed to be column-major
+ */
+template<typename T> struct plain_matrix_type_column_major
+{
+ enum { Rows = traits<T>::RowsAtCompileTime,
+ Cols = traits<T>::ColsAtCompileTime,
+ MaxRows = traits<T>::MaxRowsAtCompileTime,
+ MaxCols = traits<T>::MaxColsAtCompileTime
+ };
+ typedef Matrix<typename traits<T>::Scalar,
+ Rows,
+ Cols,
+ (MaxRows==1&&MaxCols!=1) ? RowMajor : ColMajor,
+ MaxRows,
+ MaxCols
+ > type;
+};
+
+/* plain_matrix_type_row_major : same as plain_matrix_type but guaranteed to be row-major
+ */
+template<typename T> struct plain_matrix_type_row_major
+{
+ enum { Rows = traits<T>::RowsAtCompileTime,
+ Cols = traits<T>::ColsAtCompileTime,
+ MaxRows = traits<T>::MaxRowsAtCompileTime,
+ MaxCols = traits<T>::MaxColsAtCompileTime
+ };
+ typedef Matrix<typename traits<T>::Scalar,
+ Rows,
+ Cols,
+ (MaxCols==1&&MaxRows!=1) ? RowMajor : ColMajor,
+ MaxRows,
+ MaxCols
+ > type;
+};
+
+// we should be able to get rid of this one too
+template<typename T> struct must_nest_by_value { enum { ret = false }; };
+
+/** \internal The reference selector for template expressions. The idea is that we don't
+ * need to use references for expressions since they are lightweight proxy
+ * objects which should generate no copying overhead. */
+template <typename T>
+struct ref_selector
+{
+ typedef typename conditional<
+ bool(traits<T>::Flags & NestByRefBit),
+ T const&,
+ const T
+ >::type type;
+};
+
+/** \internal Adds the const qualifier on the value-type of T2 if and only if T1 is a const type */
+template<typename T1, typename T2>
+struct transfer_constness
+{
+ typedef typename conditional<
+ bool(internal::is_const<T1>::value),
+ typename internal::add_const_on_value_type<T2>::type,
+ T2
+ >::type type;
+};
+
+/** \internal Determines how a given expression should be nested into another one.
+ * For example, when you do a * (b+c), Eigen will determine how the expression b+c should be
+ * nested into the bigger product expression. The choice is between nesting the expression b+c as-is, or
+ * evaluating that expression b+c into a temporary variable d, and nest d so that the resulting expression is
+ * a*d. Evaluating can be beneficial for example if every coefficient access in the resulting expression causes
+ * many coefficient accesses in the nested expressions -- as is the case with matrix product for example.
+ *
+ * \param T the type of the expression being nested
+ * \param n the number of coefficient accesses in the nested expression for each coefficient access in the bigger expression.
+ *
+ * Note that if no evaluation occurs, then the constness of T is preserved.
+ *
+ * Example. Suppose that a, b, and c are of type Matrix3d. The user forms the expression a*(b+c).
+ * b+c is an expression "sum of matrices", which we will denote by S. In order to determine how to nest it,
+ * the Product expression uses: nested<S, 3>::type, which turns out to be Matrix3d because the internal logic of
+ * nested determined that in this case it was better to evaluate the expression b+c into a temporary. On the other hand,
+ * since a is of type Matrix3d, the Product expression nests it as nested<Matrix3d, 3>::type, which turns out to be
+ * const Matrix3d&, because the internal logic of nested determined that since a was already a matrix, there was no point
+ * in copying it into another matrix.
+ */
+template<typename T, int n=1, typename PlainObject = typename eval<T>::type> struct nested
+{
+ enum {
+ // for the purpose of this test, to keep it reasonably simple, we arbitrarily choose an integer stand-in for Dynamic.
+ // the choice of 10000 makes it larger than any practical fixed value and even most dynamic values.
+ // in extreme cases where these assumptions would be wrong, we would still at worst suffer performance issues
+ // (poor choice of temporaries).
+ // it's important that this value can still be squared without integer overflowing.
+ DynamicAsInteger = 10000,
+ ScalarReadCost = NumTraits<typename traits<T>::Scalar>::ReadCost,
+ ScalarReadCostAsInteger = ScalarReadCost == Dynamic ? int(DynamicAsInteger) : int(ScalarReadCost),
+ CoeffReadCost = traits<T>::CoeffReadCost,
+ CoeffReadCostAsInteger = CoeffReadCost == Dynamic ? int(DynamicAsInteger) : int(CoeffReadCost),
+ NAsInteger = n == Dynamic ? int(DynamicAsInteger) : n,
+ CostEvalAsInteger = (NAsInteger+1) * ScalarReadCostAsInteger + CoeffReadCostAsInteger,
+ CostNoEvalAsInteger = NAsInteger * CoeffReadCostAsInteger
+ };
+
+ typedef typename conditional<
+ ( (int(traits<T>::Flags) & EvalBeforeNestingBit) ||
+ int(CostEvalAsInteger) < int(CostNoEvalAsInteger)
+ ),
+ PlainObject,
+ typename ref_selector<T>::type
+ >::type type;
+};
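+
+// Worked example (illustrative, assuming unit read/add costs for double): for T = Matrix3d and
+// n = 3, CostNoEval = 3*1 beats CostEval = 4*1 + 1, so nested<Matrix3d,3>::type is const Matrix3d&;
+// for the sum expression b+c the per-coefficient read cost is about 3, so CostEval = 4 + 3 beats
+// CostNoEval = 9 and the sum is evaluated into a temporary Matrix3d, matching the example above.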
+
+template<typename T>
+EIGEN_DEVICE_FUNC
+T* const_cast_ptr(const T* ptr)
+{
+ return const_cast<T*>(ptr);
+}
+
+template<typename Derived, typename XprKind = typename traits<Derived>::XprKind>
+struct dense_xpr_base
+{
+ /* dense_xpr_base should only ever be used on dense expressions, thus falling either into the MatrixXpr or into the ArrayXpr cases */
+};
+
+template<typename Derived>
+struct dense_xpr_base<Derived, MatrixXpr>
+{
+ typedef MatrixBase<Derived> type;
+};
+
+template<typename Derived>
+struct dense_xpr_base<Derived, ArrayXpr>
+{
+ typedef ArrayBase<Derived> type;
+};
+
+/** \internal Helper base class to add a scalar multiple operator
+ * overloads for complex types */
+template<typename Derived,typename Scalar,typename OtherScalar,
+ bool EnableIt = !is_same<Scalar,OtherScalar>::value >
+struct special_scalar_op_base : public DenseCoeffsBase<Derived>
+{
+ // dummy operator* so that the
+ // "using special_scalar_op_base::operator*" compiles
+ void operator*() const;
+};
+
+template<typename Derived,typename Scalar,typename OtherScalar>
+struct special_scalar_op_base<Derived,Scalar,OtherScalar,true> : public DenseCoeffsBase<Derived>
+{
+ const CwiseUnaryOp<scalar_multiple2_op<Scalar,OtherScalar>, Derived>
+ operator*(const OtherScalar& scalar) const
+ {
+ return CwiseUnaryOp<scalar_multiple2_op<Scalar,OtherScalar>, Derived>
+ (*static_cast<const Derived*>(this), scalar_multiple2_op<Scalar,OtherScalar>(scalar));
+ }
+
+ inline friend const CwiseUnaryOp<scalar_multiple2_op<Scalar,OtherScalar>, Derived>
+ operator*(const OtherScalar& scalar, const Derived& matrix)
+ { return static_cast<const special_scalar_op_base&>(matrix).operator*(scalar); }
+};
+
+template<typename XprType, typename CastType> struct cast_return_type
+{
+ typedef typename XprType::Scalar CurrentScalarType;
+ typedef typename remove_all<CastType>::type _CastType;
+ typedef typename _CastType::Scalar NewScalarType;
+ typedef typename conditional<is_same<CurrentScalarType,NewScalarType>::value,
+ const XprType&,CastType>::type type;
+};
+
+template <typename A, typename B> struct promote_storage_type;
+
+template <typename A> struct promote_storage_type<A,A>
+{
+ typedef A ret;
+};
+template <typename A> struct promote_storage_type<A, const A>
+{
+ typedef A ret;
+};
+template <typename A> struct promote_storage_type<const A, A>
+{
+ typedef A ret;
+};
+
+
+
+/** \internal gives the plain matrix or array type to store a row/column/diagonal of a matrix type.
+ * \param Scalar optional parameter that allows passing a different scalar type than that of the MatrixType.
+ */
+template<typename ExpressionType, typename Scalar = typename ExpressionType::Scalar>
+struct plain_row_type
+{
+ typedef Matrix<Scalar, 1, ExpressionType::ColsAtCompileTime,
+ ExpressionType::PlainObject::Options | RowMajor, 1, ExpressionType::MaxColsAtCompileTime> MatrixRowType;
+ typedef Array<Scalar, 1, ExpressionType::ColsAtCompileTime,
+ ExpressionType::PlainObject::Options | RowMajor, 1, ExpressionType::MaxColsAtCompileTime> ArrayRowType;
+
+ typedef typename conditional<
+ is_same< typename traits<ExpressionType>::XprKind, MatrixXpr >::value,
+ MatrixRowType,
+ ArrayRowType
+ >::type type;
+};
+
+template<typename ExpressionType, typename Scalar = typename ExpressionType::Scalar>
+struct plain_col_type
+{
+ typedef Matrix<Scalar, ExpressionType::RowsAtCompileTime, 1,
+ ExpressionType::PlainObject::Options & ~RowMajor, ExpressionType::MaxRowsAtCompileTime, 1> MatrixColType;
+ typedef Array<Scalar, ExpressionType::RowsAtCompileTime, 1,
+ ExpressionType::PlainObject::Options & ~RowMajor, ExpressionType::MaxRowsAtCompileTime, 1> ArrayColType;
+
+ typedef typename conditional<
+ is_same< typename traits<ExpressionType>::XprKind, MatrixXpr >::value,
+ MatrixColType,
+ ArrayColType
+ >::type type;
+};
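+
+// Illustrative: for ExpressionType = Matrix<float,2,3>, plain_row_type<...>::type is a 1x3
+// row-major Matrix and plain_col_type<...>::type is a 2x1 column-major Matrix, so a single
+// row or column extracted from the expression can be stored without a dynamic allocation.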
+
+template<typename ExpressionType, typename Scalar = typename ExpressionType::Scalar>
+struct plain_diag_type
+{
+ enum { diag_size = EIGEN_SIZE_MIN_PREFER_DYNAMIC(ExpressionType::RowsAtCompileTime, ExpressionType::ColsAtCompileTime),
+ max_diag_size = EIGEN_SIZE_MIN_PREFER_FIXED(ExpressionType::MaxRowsAtCompileTime, ExpressionType::MaxColsAtCompileTime)
+ };
+ typedef Matrix<Scalar, diag_size, 1, ExpressionType::PlainObject::Options & ~RowMajor, max_diag_size, 1> MatrixDiagType;
+ typedef Array<Scalar, diag_size, 1, ExpressionType::PlainObject::Options & ~RowMajor, max_diag_size, 1> ArrayDiagType;
+
+ typedef typename conditional<
+ is_same< typename traits<ExpressionType>::XprKind, MatrixXpr >::value,
+ MatrixDiagType,
+ ArrayDiagType
+ >::type type;
+};
+
+template<typename ExpressionType>
+struct is_lvalue
+{
+ enum { value = !bool(is_const<ExpressionType>::value) &&
+ bool(traits<ExpressionType>::Flags & LvalueBit) };
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_XPRHELPER_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Block.h b/third_party/eigen3/Eigen/src/Eigen2Support/Block.h
new file mode 100644
index 0000000000..604456f40e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Block.h
@@ -0,0 +1,126 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BLOCK2_H
+#define EIGEN_BLOCK2_H
+
+namespace Eigen {
+
+/** \returns a dynamic-size expression of a corner of *this.
+ *
+ * \param type the type of corner. Can be \a Eigen::TopLeft, \a Eigen::TopRight,
+ * \a Eigen::BottomLeft, \a Eigen::BottomRight.
+ * \param cRows the number of rows in the corner
+ * \param cCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_corner_enum_int_int.cpp
+ * Output: \verbinclude MatrixBase_corner_enum_int_int.out
+ *
+ * \note Even though the returned expression has dynamic size, in the case
+ * when it is applied to a fixed-size matrix, it inherits a fixed maximal size,
+ * which means that evaluating it does not cause a dynamic memory allocation.
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<typename Derived>
+inline Block<Derived> DenseBase<Derived>
+ ::corner(CornerType type, Index cRows, Index cCols)
+{
+ switch(type)
+ {
+ default:
+ eigen_assert(false && "Bad corner type.");
+ case TopLeft:
+ return Block<Derived>(derived(), 0, 0, cRows, cCols);
+ case TopRight:
+ return Block<Derived>(derived(), 0, cols() - cCols, cRows, cCols);
+ case BottomLeft:
+ return Block<Derived>(derived(), rows() - cRows, 0, cRows, cCols);
+ case BottomRight:
+ return Block<Derived>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+ }
+}
+
+/** This is the const version of corner(CornerType, Index, Index).*/
+template<typename Derived>
+inline const Block<Derived>
+DenseBase<Derived>::corner(CornerType type, Index cRows, Index cCols) const
+{
+ switch(type)
+ {
+ default:
+ eigen_assert(false && "Bad corner type.");
+ case TopLeft:
+ return Block<Derived>(derived(), 0, 0, cRows, cCols);
+ case TopRight:
+ return Block<Derived>(derived(), 0, cols() - cCols, cRows, cCols);
+ case BottomLeft:
+ return Block<Derived>(derived(), rows() - cRows, 0, cRows, cCols);
+ case BottomRight:
+ return Block<Derived>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+ }
+}
+
+/** \returns a fixed-size expression of a corner of *this.
+ *
+ * \param type the type of corner. Can be \a Eigen::TopLeft, \a Eigen::TopRight,
+ * \a Eigen::BottomLeft, \a Eigen::BottomRight.
+ *
+ * The template parameters CRows and CCols are the number of rows and columns in the corner.
+ *
+ * Example: \include MatrixBase_template_int_int_corner_enum.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_corner_enum.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<typename Derived>
+template<int CRows, int CCols>
+inline Block<Derived, CRows, CCols>
+DenseBase<Derived>::corner(CornerType type)
+{
+ switch(type)
+ {
+ default:
+ eigen_assert(false && "Bad corner type.");
+ case TopLeft:
+ return Block<Derived, CRows, CCols>(derived(), 0, 0);
+ case TopRight:
+ return Block<Derived, CRows, CCols>(derived(), 0, cols() - CCols);
+ case BottomLeft:
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, 0);
+ case BottomRight:
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, cols() - CCols);
+ }
+}
+
+/** This is the const version of corner<int, int>(CornerType).*/
+template<typename Derived>
+template<int CRows, int CCols>
+inline const Block<Derived, CRows, CCols>
+DenseBase<Derived>::corner(CornerType type) const
+{
+ switch(type)
+ {
+ default:
+ eigen_assert(false && "Bad corner type.");
+ case TopLeft:
+ return Block<Derived, CRows, CCols>(derived(), 0, 0);
+ case TopRight:
+ return Block<Derived, CRows, CCols>(derived(), 0, cols() - CCols);
+ case BottomLeft:
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, 0);
+ case BottomRight:
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, cols() - CCols);
+ }
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_BLOCK2_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Cwise.h b/third_party/eigen3/Eigen/src/Eigen2Support/Cwise.h
new file mode 100644
index 0000000000..d95009b6e2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Cwise.h
@@ -0,0 +1,192 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CWISE_H
+#define EIGEN_CWISE_H
+
+namespace Eigen {
+
+/** \internal
+ * convenient macro to define the return type of a cwise binary operation */
+#define EIGEN_CWISE_BINOP_RETURN_TYPE(OP) \
+ CwiseBinaryOp<OP<typename internal::traits<ExpressionType>::Scalar>, ExpressionType, OtherDerived>
+
+/** \internal
+ * convenient macro to define the return type of a cwise unary operation */
+#define EIGEN_CWISE_UNOP_RETURN_TYPE(OP) \
+ CwiseUnaryOp<OP<typename internal::traits<ExpressionType>::Scalar>, ExpressionType>
+
+/** \internal
+ * convenient macro to define the return type of a cwise comparison to a scalar */
+#define EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(OP) \
+ CwiseBinaryOp<OP<typename internal::traits<ExpressionType>::Scalar>, ExpressionType, \
+ typename ExpressionType::ConstantReturnType >
+
+/** \class Cwise
+ *
+ * \brief Pseudo expression providing additional coefficient-wise operations
+ *
+ * \param ExpressionType the type of the object on which to do coefficient-wise operations
+ *
+ * This class represents an expression with additional coefficient-wise features.
+ * It is the return type of MatrixBase::cwise()
+ * and most of the time this is the only way it is used.
+ *
+ * Example: \include MatrixBase_cwise_const.cpp
+ * Output: \verbinclude MatrixBase_cwise_const.out
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_CWISE_PLUGIN.
+ *
+ * \sa MatrixBase::cwise() const, MatrixBase::cwise()
+ */
+template<typename ExpressionType> class Cwise
+{
+ public:
+
+ typedef typename internal::traits<ExpressionType>::Scalar Scalar;
+ typedef typename internal::conditional<internal::must_nest_by_value<ExpressionType>::ret,
+ ExpressionType, const ExpressionType&>::type ExpressionTypeNested;
+ typedef CwiseUnaryOp<internal::scalar_add_op<Scalar>, ExpressionType> ScalarAddReturnType;
+
+ inline Cwise(const ExpressionType& matrix) : m_matrix(matrix) {}
+
+ /** \internal */
+ inline const ExpressionType& _expression() const { return m_matrix; }
+
+ template<typename OtherDerived>
+ const EIGEN_CWISE_PRODUCT_RETURN_TYPE(ExpressionType,OtherDerived)
+ operator*(const MatrixBase<OtherDerived> &other) const;
+
+ template<typename OtherDerived>
+ const EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_quotient_op)
+ operator/(const MatrixBase<OtherDerived> &other) const;
+
+ /** \deprecated ArrayBase::min() */
+ template<typename OtherDerived>
+ const EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_min_op)
+ (min)(const MatrixBase<OtherDerived> &other) const
+ { return EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_min_op)(_expression(), other.derived()); }
+
+ /** \deprecated ArrayBase::max() */
+ template<typename OtherDerived>
+ const EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_max_op)
+ (max)(const MatrixBase<OtherDerived> &other) const
+ { return EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_max_op)(_expression(), other.derived()); }
+
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_abs_op) abs() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_abs2_op) abs2() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_square_op) square() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_cube_op) cube() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_inverse_op) inverse() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_sqrt_op) sqrt() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_exp_op) exp() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_log_op) log() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_cos_op) cos() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_sin_op) sin() const;
+ const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_pow_op) pow(const Scalar& exponent) const;
+
+ const ScalarAddReturnType
+ operator+(const Scalar& scalar) const;
+
+ /** \relates Cwise */
+ friend const ScalarAddReturnType
+ operator+(const Scalar& scalar, const Cwise& mat)
+ { return mat + scalar; }
+
+ ExpressionType& operator+=(const Scalar& scalar);
+
+ const ScalarAddReturnType
+ operator-(const Scalar& scalar) const;
+
+ ExpressionType& operator-=(const Scalar& scalar);
+
+ template<typename OtherDerived>
+ inline ExpressionType& operator*=(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ inline ExpressionType& operator/=(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::less)
+ operator<(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::less_equal)
+ operator<=(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater)
+ operator>(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater_equal)
+ operator>=(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::equal_to)
+ operator==(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> const EIGEN_CWISE_BINOP_RETURN_TYPE(std::not_equal_to)
+ operator!=(const MatrixBase<OtherDerived>& other) const;
+
+ // comparisons to a scalar value
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less)
+ operator<(Scalar s) const;
+
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less_equal)
+ operator<=(Scalar s) const;
+
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater)
+ operator>(Scalar s) const;
+
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater_equal)
+ operator>=(Scalar s) const;
+
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::equal_to)
+ operator==(Scalar s) const;
+
+ const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::not_equal_to)
+ operator!=(Scalar s) const;
+
+ // allows extending Cwise outside Eigen
+ #ifdef EIGEN_CWISE_PLUGIN
+ #include EIGEN_CWISE_PLUGIN
+ #endif
+
+ protected:
+ ExpressionTypeNested m_matrix;
+};
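+
+// Usage sketch (illustrative, Eigen2 compatibility API): given Matrix3d a, b;
+//   a.cwise() * b       // coefficient-wise product
+//   a.cwise().sqrt()    // coefficient-wise square root
+//   a.cwise() < b       // coefficient-wise comparison
+// The Eigen3 equivalents are spelled through the array interface, e.g. a.array() * b.array().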
+
+
+/** \returns a Cwise wrapper of *this providing additional coefficient-wise operations
+ *
+ * Example: \include MatrixBase_cwise_const.cpp
+ * Output: \verbinclude MatrixBase_cwise_const.out
+ *
+ * \sa class Cwise, cwise()
+ */
+template<typename Derived>
+inline const Cwise<Derived> MatrixBase<Derived>::cwise() const
+{
+ return derived();
+}
+
+/** \returns a Cwise wrapper of *this providing additional coefficient-wise operations
+ *
+ * Example: \include MatrixBase_cwise.cpp
+ * Output: \verbinclude MatrixBase_cwise.out
+ *
+ * \sa class Cwise, cwise() const
+ */
+template<typename Derived>
+inline Cwise<Derived> MatrixBase<Derived>::cwise()
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CWISE_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/CwiseOperators.h b/third_party/eigen3/Eigen/src/Eigen2Support/CwiseOperators.h
new file mode 100644
index 0000000000..482f306485
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/CwiseOperators.h
@@ -0,0 +1,298 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ARRAY_CWISE_OPERATORS_H
+#define EIGEN_ARRAY_CWISE_OPERATORS_H
+
+namespace Eigen {
+
+/***************************************************************************
+* The following functions were defined in Core
+***************************************************************************/
+
+
+/** \deprecated ArrayBase::abs() */
+template<typename ExpressionType>
+EIGEN_STRONG_INLINE const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_abs_op)
+Cwise<ExpressionType>::abs() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::abs2() */
+template<typename ExpressionType>
+EIGEN_STRONG_INLINE const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_abs2_op)
+Cwise<ExpressionType>::abs2() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::exp() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_exp_op)
+Cwise<ExpressionType>::exp() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::log() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_log_op)
+Cwise<ExpressionType>::log() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::operator*() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE const EIGEN_CWISE_PRODUCT_RETURN_TYPE(ExpressionType,OtherDerived)
+Cwise<ExpressionType>::operator*(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_PRODUCT_RETURN_TYPE(ExpressionType,OtherDerived)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator/() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE const EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_quotient_op)
+Cwise<ExpressionType>::operator/(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(internal::scalar_quotient_op)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator*=() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline ExpressionType& Cwise<ExpressionType>::operator*=(const MatrixBase<OtherDerived> &other)
+{
+ return m_matrix.const_cast_derived() = *this * other;
+}
+
+/** \deprecated ArrayBase::operator/=() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline ExpressionType& Cwise<ExpressionType>::operator/=(const MatrixBase<OtherDerived> &other)
+{
+ return m_matrix.const_cast_derived() = *this / other;
+}
+
+/***************************************************************************
+* The following functions were defined in Array
+***************************************************************************/
+
+// -- unary operators --
+
+/** \deprecated ArrayBase::sqrt() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_sqrt_op)
+Cwise<ExpressionType>::sqrt() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::cos() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_cos_op)
+Cwise<ExpressionType>::cos() const
+{
+ return _expression();
+}
+
+
+/** \deprecated ArrayBase::sin() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_sin_op)
+Cwise<ExpressionType>::sin() const
+{
+ return _expression();
+}
+
+
+/** \deprecated ArrayBase::pow() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_pow_op)
+Cwise<ExpressionType>::pow(const Scalar& exponent) const
+{
+ return EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_pow_op)(_expression(), internal::scalar_pow_op<Scalar>(exponent));
+}
+
+
+/** \deprecated ArrayBase::inverse() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_inverse_op)
+Cwise<ExpressionType>::inverse() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::square() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_square_op)
+Cwise<ExpressionType>::square() const
+{
+ return _expression();
+}
+
+/** \deprecated ArrayBase::cube() */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_UNOP_RETURN_TYPE(internal::scalar_cube_op)
+Cwise<ExpressionType>::cube() const
+{
+ return _expression();
+}
+
+
+// -- binary operators --
+
+/** \deprecated ArrayBase::operator<() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::less)
+Cwise<ExpressionType>::operator<(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::less)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator<=() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::less_equal)
+Cwise<ExpressionType>::operator<=(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::less_equal)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator>() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater)
+Cwise<ExpressionType>::operator>(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator>=() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater_equal)
+Cwise<ExpressionType>::operator>=(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::greater_equal)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator==() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::equal_to)
+Cwise<ExpressionType>::operator==(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::equal_to)(_expression(), other.derived());
+}
+
+/** \deprecated ArrayBase::operator!=() */
+template<typename ExpressionType>
+template<typename OtherDerived>
+inline const EIGEN_CWISE_BINOP_RETURN_TYPE(std::not_equal_to)
+Cwise<ExpressionType>::operator!=(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_BINOP_RETURN_TYPE(std::not_equal_to)(_expression(), other.derived());
+}
+
+// comparisons to scalar value
+
+/** \deprecated ArrayBase::operator<(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less)
+Cwise<ExpressionType>::operator<(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+/** \deprecated ArrayBase::operator<=(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less_equal)
+Cwise<ExpressionType>::operator<=(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::less_equal)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+/** \deprecated ArrayBase::operator>(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater)
+Cwise<ExpressionType>::operator>(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+/** \deprecated ArrayBase::operator>=(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater_equal)
+Cwise<ExpressionType>::operator>=(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::greater_equal)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+/** \deprecated ArrayBase::operator==(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::equal_to)
+Cwise<ExpressionType>::operator==(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::equal_to)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+/** \deprecated ArrayBase::operator!=(Scalar) */
+template<typename ExpressionType>
+inline const EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::not_equal_to)
+Cwise<ExpressionType>::operator!=(Scalar s) const
+{
+ return EIGEN_CWISE_COMP_TO_SCALAR_RETURN_TYPE(std::not_equal_to)(_expression(),
+ typename ExpressionType::ConstantReturnType(_expression().rows(), _expression().cols(), s));
+}
+
+// scalar addition
+
+/** \deprecated ArrayBase::operator+(Scalar) */
+template<typename ExpressionType>
+inline const typename Cwise<ExpressionType>::ScalarAddReturnType
+Cwise<ExpressionType>::operator+(const Scalar& scalar) const
+{
+ return typename Cwise<ExpressionType>::ScalarAddReturnType(m_matrix, internal::scalar_add_op<Scalar>(scalar));
+}
+
+/** \deprecated ArrayBase::operator+=(Scalar) */
+template<typename ExpressionType>
+inline ExpressionType& Cwise<ExpressionType>::operator+=(const Scalar& scalar)
+{
+ return m_matrix.const_cast_derived() = *this + scalar;
+}
+
+/** \deprecated ArrayBase::operator-(Scalar) */
+template<typename ExpressionType>
+inline const typename Cwise<ExpressionType>::ScalarAddReturnType
+Cwise<ExpressionType>::operator-(const Scalar& scalar) const
+{
+ return *this + (-scalar);
+}
+
+/** \deprecated ArrayBase::operator-=(Scalar) */
+template<typename ExpressionType>
+inline ExpressionType& Cwise<ExpressionType>::operator-=(const Scalar& scalar)
+{
+ return m_matrix.const_cast_derived() = *this - scalar;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ARRAY_CWISE_OPERATORS_H
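The deprecated Cwise wrappers above exist only for Eigen 2 source compatibility. A minimal usage sketch, assuming an Eigen 3 build compiled with EIGEN2_SUPPORT (the modern .array() equivalents are noted in comments):

#include <Eigen/Core>
#include <iostream>

int main() {
  Eigen::Matrix3f a = Eigen::Matrix3f::Random();
  Eigen::Matrix3f b = Eigen::Matrix3f::Zero();
  // Element-wise comparison through the Eigen 2 compatibility layer.
  bool anyNegative = (a.cwise() < b).any();      // modern Eigen: (a.array() < 0).any()
  // Element-wise scalar addition through the same layer.
  Eigen::Matrix3f shifted = a.cwise() + 1.0f;    // modern Eigen: (a.array() + 1.0f).matrix()
  std::cout << anyNegative << "\n" << shifted << "\n";
}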
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AlignedBox.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AlignedBox.h
new file mode 100644
index 0000000000..2e4309dd94
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AlignedBox.h
@@ -0,0 +1,159 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ * \nonstableyet
+ *
+ * \class AlignedBox
+ *
+ * \brief An axis aligned box
+ *
+ * \param _Scalar the type of the scalar coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ *
+ * This class represents an axis aligned box as a pair of the minimal and maximal corners.
+ */
+template <typename _Scalar, int _AmbientDim>
+class AlignedBox
+{
+public:
+EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim==Dynamic ? Dynamic : _AmbientDim+1)
+ enum { AmbientDimAtCompileTime = _AmbientDim };
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1> VectorType;
+
+ /** Default constructor initializing a null box. */
+ inline AlignedBox()
+ { if (AmbientDimAtCompileTime!=Dynamic) setNull(); }
+
+ /** Constructs a null box with \a _dim the dimension of the ambient space. */
+ inline explicit AlignedBox(int _dim) : m_min(_dim), m_max(_dim)
+ { setNull(); }
+
+ /** Constructs a box with extremities \a _min and \a _max. */
+ inline AlignedBox(const VectorType& _min, const VectorType& _max) : m_min(_min), m_max(_max) {}
+
+ /** Constructs a box containing a single point \a p. */
+ inline explicit AlignedBox(const VectorType& p) : m_min(p), m_max(p) {}
+
+ ~AlignedBox() {}
+
+ /** \returns the dimension of the ambient space in which the box lives */
+ inline int dim() const { return AmbientDimAtCompileTime==Dynamic ? m_min.size() : AmbientDimAtCompileTime; }
+
+ /** \returns true if the box is null, i.e, empty. */
+ inline bool isNull() const { return (m_min.cwise() > m_max).any(); }
+
+ /** Makes \c *this a null/empty box. */
+ inline void setNull()
+ {
+ m_min.setConstant( (std::numeric_limits<Scalar>::max)());
+ m_max.setConstant(-(std::numeric_limits<Scalar>::max)());
+ }
+
+ /** \returns the minimal corner */
+ inline const VectorType& (min)() const { return m_min; }
+ /** \returns a non const reference to the minimal corner */
+ inline VectorType& (min)() { return m_min; }
+ /** \returns the maximal corner */
+ inline const VectorType& (max)() const { return m_max; }
+ /** \returns a non const reference to the maximal corner */
+ inline VectorType& (max)() { return m_max; }
+
+ /** \returns true if the point \a p is inside the box \c *this. */
+ inline bool contains(const VectorType& p) const
+ { return (m_min.cwise()<=p).all() && (p.cwise()<=m_max).all(); }
+
+ /** \returns true if the box \a b is entirely inside the box \c *this. */
+ inline bool contains(const AlignedBox& b) const
+ { return (m_min.cwise()<=(b.min)()).all() && ((b.max)().cwise()<=m_max).all(); }
+
+ /** Extends \c *this such that it contains the point \a p and returns a reference to \c *this. */
+ inline AlignedBox& extend(const VectorType& p)
+ { m_min = (m_min.cwise().min)(p); m_max = (m_max.cwise().max)(p); return *this; }
+
+ /** Extends \c *this such that it contains the box \a b and returns a reference to \c *this. */
+ inline AlignedBox& extend(const AlignedBox& b)
+ { m_min = (m_min.cwise().min)(b.m_min); m_max = (m_max.cwise().max)(b.m_max); return *this; }
+
+ /** Clamps \c *this by the box \a b and returns a reference to \c *this. */
+ inline AlignedBox& clamp(const AlignedBox& b)
+ { m_min = (m_min.cwise().max)(b.m_min); m_max = (m_max.cwise().min)(b.m_max); return *this; }
+
+ /** Translate \c *this by the vector \a t and returns a reference to \c *this. */
+ inline AlignedBox& translate(const VectorType& t)
+ { m_min += t; m_max += t; return *this; }
+
+ /** \returns the squared distance between the point \a p and the box \c *this,
+ * and zero if \a p is inside the box.
+ * \sa exteriorDistance()
+ */
+ inline Scalar squaredExteriorDistance(const VectorType& p) const;
+
+ /** \returns the distance between the point \a p and the box \c *this,
+ * and zero if \a p is inside the box.
+ * \sa squaredExteriorDistance()
+ */
+ inline Scalar exteriorDistance(const VectorType& p) const
+ { return ei_sqrt(squaredExteriorDistance(p)); }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<AlignedBox,
+ AlignedBox<NewScalarType,AmbientDimAtCompileTime> >::type cast() const
+ {
+ return typename internal::cast_return_type<AlignedBox,
+ AlignedBox<NewScalarType,AmbientDimAtCompileTime> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit AlignedBox(const AlignedBox<OtherScalarType,AmbientDimAtCompileTime>& other)
+ {
+ m_min = (other.min)().template cast<Scalar>();
+ m_max = (other.max)().template cast<Scalar>();
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const AlignedBox& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_min.isApprox(other.m_min, prec) && m_max.isApprox(other.m_max, prec); }
+
+protected:
+
+ VectorType m_min, m_max;
+};
+
+template<typename Scalar,int AmbientDim>
+inline Scalar AlignedBox<Scalar,AmbientDim>::squaredExteriorDistance(const VectorType& p) const
+{
+ Scalar dist2(0);
+ Scalar aux;
+ for (int k=0; k<dim(); ++k)
+ {
+ if ((aux = (p[k]-m_min[k]))<Scalar(0))
+ dist2 += aux*aux;
+ else if ( (aux = (m_max[k]-p[k]))<Scalar(0))
+ dist2 += aux*aux;
+ }
+ return dist2;
+}
+
+} // end namespace Eigen
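A minimal sketch of the AlignedBox interface declared above (hedged: it assumes the fixed-size float instantiation; the same methods also exist on the Eigen 3 AlignedBox, so it does not depend on the compatibility layer alone):

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;
  AlignedBox<float,3> box(Vector3f(0,0,0), Vector3f(1,1,1));   // min / max corners
  box.extend(Vector3f(2.0f, 0.5f, 0.5f));                      // grow the box to contain a point
  bool inside = box.contains(Vector3f(1.5f, 0.5f, 0.5f));      // true after the extend above
  float d = box.exteriorDistance(Vector3f(3.0f, 0.5f, 0.5f));  // 0 for points inside the box
  std::cout << inside << " " << d << "\n";
}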
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/All.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/All.h
new file mode 100644
index 0000000000..e0b00fcccc
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/All.h
@@ -0,0 +1,115 @@
+#ifndef EIGEN2_GEOMETRY_MODULE_H
+#define EIGEN2_GEOMETRY_MODULE_H
+
+#include <limits>
+
+#ifndef M_PI
+#define M_PI 3.14159265358979323846
+#endif
+
+#if EIGEN2_SUPPORT_STAGE < STAGE20_RESOLVE_API_CONFLICTS
+#include "RotationBase.h"
+#include "Rotation2D.h"
+#include "Quaternion.h"
+#include "AngleAxis.h"
+#include "Transform.h"
+#include "Translation.h"
+#include "Scaling.h"
+#include "AlignedBox.h"
+#include "Hyperplane.h"
+#include "ParametrizedLine.h"
+#endif
+
+
+#define RotationBase eigen2_RotationBase
+#define Rotation2D eigen2_Rotation2D
+#define Rotation2Df eigen2_Rotation2Df
+#define Rotation2Dd eigen2_Rotation2Dd
+
+#define Quaternion eigen2_Quaternion
+#define Quaternionf eigen2_Quaternionf
+#define Quaterniond eigen2_Quaterniond
+
+#define AngleAxis eigen2_AngleAxis
+#define AngleAxisf eigen2_AngleAxisf
+#define AngleAxisd eigen2_AngleAxisd
+
+#define Transform eigen2_Transform
+#define Transform2f eigen2_Transform2f
+#define Transform2d eigen2_Transform2d
+#define Transform3f eigen2_Transform3f
+#define Transform3d eigen2_Transform3d
+
+#define Translation eigen2_Translation
+#define Translation2f eigen2_Translation2f
+#define Translation2d eigen2_Translation2d
+#define Translation3f eigen2_Translation3f
+#define Translation3d eigen2_Translation3d
+
+#define Scaling eigen2_Scaling
+#define Scaling2f eigen2_Scaling2f
+#define Scaling2d eigen2_Scaling2d
+#define Scaling3f eigen2_Scaling3f
+#define Scaling3d eigen2_Scaling3d
+
+#define AlignedBox eigen2_AlignedBox
+
+#define Hyperplane eigen2_Hyperplane
+#define ParametrizedLine eigen2_ParametrizedLine
+
+#define ei_toRotationMatrix eigen2_ei_toRotationMatrix
+#define ei_quaternion_assign_impl eigen2_ei_quaternion_assign_impl
+#define ei_transform_product_impl eigen2_ei_transform_product_impl
+
+#include "RotationBase.h"
+#include "Rotation2D.h"
+#include "Quaternion.h"
+#include "AngleAxis.h"
+#include "Transform.h"
+#include "Translation.h"
+#include "Scaling.h"
+#include "AlignedBox.h"
+#include "Hyperplane.h"
+#include "ParametrizedLine.h"
+
+#undef ei_toRotationMatrix
+#undef ei_quaternion_assign_impl
+#undef ei_transform_product_impl
+
+#undef RotationBase
+#undef Rotation2D
+#undef Rotation2Df
+#undef Rotation2Dd
+
+#undef Quaternion
+#undef Quaternionf
+#undef Quaterniond
+
+#undef AngleAxis
+#undef AngleAxisf
+#undef AngleAxisd
+
+#undef Transform
+#undef Transform2f
+#undef Transform2d
+#undef Transform3f
+#undef Transform3d
+
+#undef Translation
+#undef Translation2f
+#undef Translation2d
+#undef Translation3f
+#undef Translation3d
+
+#undef Scaling
+#undef Scaling2f
+#undef Scaling2d
+#undef Scaling3f
+#undef Scaling3d
+
+#undef AlignedBox
+
+#undef Hyperplane
+#undef ParametrizedLine
+
+#endif // EIGEN2_GEOMETRY_MODULE_H
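The #define/#undef blocks above compile the same geometry headers a second time with every public name prefixed by eigen2_, so both API generations can coexist in one translation unit. A rough sketch of the resulting names (an assumption about how client code reaches them; it presumes EIGEN2_SUPPORT is defined before including Eigen):

// Compile with -DEIGEN2_SUPPORT so that Eigen/Geometry pulls in this compatibility layer.
#include <Eigen/Geometry>

int main() {
  Eigen::Quaternionf q3 = Eigen::Quaternionf::Identity();       // Eigen 3 quaternion
  Eigen::eigen2_Quaternionf q2(1.0f, 0.0f, 0.0f, 0.0f);         // renamed Eigen 2 quaternion (w, x, y, z)
  Eigen::eigen2_AngleAxisf aa(0.5f, Eigen::Vector3f::UnitZ());  // renamed Eigen 2 angle-axis
  (void)q3; (void)q2; (void)aa;
}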
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AngleAxis.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AngleAxis.h
new file mode 100644
index 0000000000..a0b4ac44e7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/AngleAxis.h
@@ -0,0 +1,228 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class AngleAxis
+ *
+ * \brief Represents a 3D rotation as a rotation angle around an arbitrary 3D axis
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ *
+ * The following two typedefs are provided for convenience:
+ * \li \c AngleAxisf for \c float
+ * \li \c AngleAxisd for \c double
+ *
+ * \addexample AngleAxisForEuler \label How to define a rotation from Euler-angles
+ *
+ * Combined with MatrixBase::Unit{X,Y,Z}, AngleAxis can be used to easily
+ * mimic Euler-angles. Here is an example:
+ * \include AngleAxis_mimic_euler.cpp
+ * Output: \verbinclude AngleAxis_mimic_euler.out
+ *
+ * \note This class is not intended to store a rotation transformation,
+ * but rather to ease the creation of other rotation (Quaternion, rotation matrix)
+ * and transformation objects.
+ *
+ * \sa class Quaternion, class Transform, MatrixBase::UnitX()
+ */
+
+template<typename _Scalar> struct ei_traits<AngleAxis<_Scalar> >
+{
+ typedef _Scalar Scalar;
+};
+
+template<typename _Scalar>
+class AngleAxis : public RotationBase<AngleAxis<_Scalar>,3>
+{
+ typedef RotationBase<AngleAxis<_Scalar>,3> Base;
+
+public:
+
+ using Base::operator*;
+
+ enum { Dim = 3 };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ typedef Matrix<Scalar,3,3> Matrix3;
+ typedef Matrix<Scalar,3,1> Vector3;
+ typedef Quaternion<Scalar> QuaternionType;
+
+protected:
+
+ Vector3 m_axis;
+ Scalar m_angle;
+
+public:
+
+ /** Default constructor without initialization. */
+ AngleAxis() {}
+
+ /** Constructs and initializes the angle-axis rotation from an \a angle in radians
+ * and an \a axis, which must be normalized. */
+ template<typename Derived>
+ inline AngleAxis(Scalar angle, const MatrixBase<Derived>& axis) : m_axis(axis), m_angle(angle)
+ {
+ using std::sqrt;
+ using std::abs;
+ // since we compare against 1, this is equal to computing the relative error
+ eigen_assert( abs(m_axis.derived().squaredNorm() - 1) < sqrt( NumTraits<Scalar>::dummy_precision() ) );
+ }
+
+ /** Constructs and initializes the angle-axis rotation from a quaternion \a q. */
+ inline AngleAxis(const QuaternionType& q) { *this = q; }
+
+ /** Constructs and initializes the angle-axis rotation from a 3x3 rotation matrix. */
+ template<typename Derived>
+ inline explicit AngleAxis(const MatrixBase<Derived>& m) { *this = m; }
+
+ Scalar angle() const { return m_angle; }
+ Scalar& angle() { return m_angle; }
+
+ const Vector3& axis() const { return m_axis; }
+ Vector3& axis() { return m_axis; }
+
+ /** Concatenates two rotations */
+ inline QuaternionType operator* (const AngleAxis& other) const
+ { return QuaternionType(*this) * QuaternionType(other); }
+
+ /** Concatenates two rotations */
+ inline QuaternionType operator* (const QuaternionType& other) const
+ { return QuaternionType(*this) * other; }
+
+ /** Concatenates two rotations */
+ friend inline QuaternionType operator* (const QuaternionType& a, const AngleAxis& b)
+ { return a * QuaternionType(b); }
+
+ /** Concatenates two rotations */
+ inline Matrix3 operator* (const Matrix3& other) const
+ { return toRotationMatrix() * other; }
+
+ /** Concatenates two rotations */
+ inline friend Matrix3 operator* (const Matrix3& a, const AngleAxis& b)
+ { return a * b.toRotationMatrix(); }
+
+ /** Applies rotation to vector */
+ inline Vector3 operator* (const Vector3& other) const
+ { return toRotationMatrix() * other; }
+
+ /** \returns the inverse rotation, i.e., an angle-axis with opposite rotation angle */
+ AngleAxis inverse() const
+ { return AngleAxis(-m_angle, m_axis); }
+
+ AngleAxis& operator=(const QuaternionType& q);
+ template<typename Derived>
+ AngleAxis& operator=(const MatrixBase<Derived>& m);
+
+ template<typename Derived>
+ AngleAxis& fromRotationMatrix(const MatrixBase<Derived>& m);
+ Matrix3 toRotationMatrix(void) const;
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<AngleAxis,AngleAxis<NewScalarType> >::type cast() const
+ { return typename internal::cast_return_type<AngleAxis,AngleAxis<NewScalarType> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit AngleAxis(const AngleAxis<OtherScalarType>& other)
+ {
+ m_axis = other.axis().template cast<Scalar>();
+ m_angle = Scalar(other.angle());
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const AngleAxis& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_axis.isApprox(other.m_axis, prec) && ei_isApprox(m_angle,other.m_angle, prec); }
+};
+
+/** \ingroup Geometry_Module
+ * single precision angle-axis type */
+typedef AngleAxis<float> AngleAxisf;
+/** \ingroup Geometry_Module
+ * double precision angle-axis type */
+typedef AngleAxis<double> AngleAxisd;
+
+/** Set \c *this from a quaternion.
+ * The axis is normalized.
+ */
+template<typename Scalar>
+AngleAxis<Scalar>& AngleAxis<Scalar>::operator=(const QuaternionType& q)
+{
+ Scalar n2 = q.vec().squaredNorm();
+ if (n2 < precision<Scalar>()*precision<Scalar>())
+ {
+ m_angle = 0;
+ m_axis << 1, 0, 0;
+ }
+ else
+ {
+ m_angle = 2*std::acos(q.w());
+ m_axis = q.vec() / ei_sqrt(n2);
+
+ using std::sqrt;
+ using std::abs;
+ // since we compare against 1, this is equal to computing the relative error
+ eigen_assert( abs(m_axis.derived().squaredNorm() - 1) < sqrt( NumTraits<Scalar>::dummy_precision() ) );
+ }
+ return *this;
+}
+
+/** Set \c *this from a 3x3 rotation matrix \a mat.
+ */
+template<typename Scalar>
+template<typename Derived>
+AngleAxis<Scalar>& AngleAxis<Scalar>::operator=(const MatrixBase<Derived>& mat)
+{
+ // Since a direct conversion would not be really faster,
+ // let's use the robust Quaternion implementation:
+ return *this = QuaternionType(mat);
+}
+
+/** Constructs and \returns an equivalent 3x3 rotation matrix.
+ */
+template<typename Scalar>
+typename AngleAxis<Scalar>::Matrix3
+AngleAxis<Scalar>::toRotationMatrix(void) const
+{
+ Matrix3 res;
+ Vector3 sin_axis = ei_sin(m_angle) * m_axis;
+ Scalar c = ei_cos(m_angle);
+ Vector3 cos1_axis = (Scalar(1)-c) * m_axis;
+
+ Scalar tmp;
+ tmp = cos1_axis.x() * m_axis.y();
+ res.coeffRef(0,1) = tmp - sin_axis.z();
+ res.coeffRef(1,0) = tmp + sin_axis.z();
+
+ tmp = cos1_axis.x() * m_axis.z();
+ res.coeffRef(0,2) = tmp + sin_axis.y();
+ res.coeffRef(2,0) = tmp - sin_axis.y();
+
+ tmp = cos1_axis.y() * m_axis.z();
+ res.coeffRef(1,2) = tmp - sin_axis.x();
+ res.coeffRef(2,1) = tmp + sin_axis.x();
+
+ res.diagonal() = (cos1_axis.cwise() * m_axis).cwise() + c;
+
+ return res;
+}
+
+} // end namespace Eigen
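The class comment above points at AngleAxis_mimic_euler.cpp; a minimal sketch along the same lines (assuming the float typedefs and the Matrix-from-rotation assignment shown later in RotationBase.h):

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;
  // Compose Z-Y-Z "Euler angles" (in radians) from three angle-axis rotations.
  Matrix3f m;
  m = AngleAxisf(0.25f, Vector3f::UnitZ())
    * AngleAxisf(0.50f, Vector3f::UnitY())
    * AngleAxisf(0.33f, Vector3f::UnitZ());
  Vector3f v = m * Vector3f::UnitX();   // rotate a vector with the resulting matrix
  std::cout << m << "\n" << v.transpose() << "\n";
}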
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Hyperplane.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Hyperplane.h
new file mode 100644
index 0000000000..b95bf00ecf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Hyperplane.h
@@ -0,0 +1,254 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Hyperplane
+ *
+ * \brief A hyperplane
+ *
+ * A hyperplane is an affine subspace of dimension n-1 in a space of dimension n.
+ * For example, a hyperplane in a plane is a line; a hyperplane in 3-space is a plane.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ * Notice that the dimension of the hyperplane is _AmbientDim-1.
+ *
+ * This class represents a hyperplane as the zero set of the implicit equation
+ * \f$ n \cdot x + d = 0 \f$ where \f$ n \f$ is a unit normal vector of the plane (linear part)
+ * and \f$ d \f$ is the distance (offset) to the origin.
+ */
+template <typename _Scalar, int _AmbientDim>
+class Hyperplane
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim==Dynamic ? Dynamic : _AmbientDim+1)
+ enum { AmbientDimAtCompileTime = _AmbientDim };
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1> VectorType;
+ typedef Matrix<Scalar,int(AmbientDimAtCompileTime)==Dynamic
+ ? Dynamic
+ : int(AmbientDimAtCompileTime)+1,1> Coefficients;
+ typedef Block<Coefficients,AmbientDimAtCompileTime,1> NormalReturnType;
+
+ /** Default constructor without initialization */
+ inline Hyperplane() {}
+
+ /** Constructs a dynamic-size hyperplane with \a _dim the dimension
+ * of the ambient space */
+ inline explicit Hyperplane(int _dim) : m_coeffs(_dim+1) {}
+
+ /** Constructs a plane from its normal \a n and a point \a e on the plane.
+ * \warning the normal vector \a n is assumed to be normalized.
+ */
+ inline Hyperplane(const VectorType& n, const VectorType& e)
+ : m_coeffs(n.size()+1)
+ {
+ normal() = n;
+ offset() = -e.eigen2_dot(n);
+ }
+
+ /** Constructs a plane from its normal \a n and distance to the origin \a d
+ * such that the algebraic equation of the plane is \f$ n \cdot x + d = 0 \f$.
+ * \warning the normal vector is assumed to be normalized.
+ */
+ inline Hyperplane(const VectorType& n, Scalar d)
+ : m_coeffs(n.size()+1)
+ {
+ normal() = n;
+ offset() = d;
+ }
+
+ /** Constructs a hyperplane passing through the two points. If the dimension of the ambient space
+ * is greater than 2, the hyperplane is not uniquely determined, so an arbitrary choice is made.
+ */
+ static inline Hyperplane Through(const VectorType& p0, const VectorType& p1)
+ {
+ Hyperplane result(p0.size());
+ result.normal() = (p1 - p0).unitOrthogonal();
+ result.offset() = -result.normal().eigen2_dot(p0);
+ return result;
+ }
+
+ /** Constructs a hyperplane passing through the three points. The dimension of the ambient space
+ * is required to be exactly 3.
+ */
+ static inline Hyperplane Through(const VectorType& p0, const VectorType& p1, const VectorType& p2)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 3)
+ Hyperplane result(p0.size());
+ result.normal() = (p2 - p0).cross(p1 - p0).normalized();
+ result.offset() = -result.normal().eigen2_dot(p0);
+ return result;
+ }
+
+ /** Constructs a hyperplane passing through the parametrized line \a parametrized.
+ * If the dimension of the ambient space is greater than 2, the hyperplane is not uniquely determined,
+ * so an arbitrary choice is made.
+ */
+ // FIXME to be consistent with the rest this could be implemented as a static Through function ??
+ explicit Hyperplane(const ParametrizedLine<Scalar, AmbientDimAtCompileTime>& parametrized)
+ {
+ normal() = parametrized.direction().unitOrthogonal();
+ offset() = -normal().eigen2_dot(parametrized.origin());
+ }
+
+ ~Hyperplane() {}
+
+ /** \returns the dimension of the ambient space in which the plane lives */
+ inline int dim() const { return int(AmbientDimAtCompileTime)==Dynamic ? m_coeffs.size()-1 : int(AmbientDimAtCompileTime); }
+
+ /** normalizes \c *this */
+ void normalize(void)
+ {
+ m_coeffs /= normal().norm();
+ }
+
+ /** \returns the signed distance between the plane \c *this and a point \a p.
+ * \sa absDistance()
+ */
+ inline Scalar signedDistance(const VectorType& p) const { return p.eigen2_dot(normal()) + offset(); }
+
+ /** \returns the absolute distance between the plane \c *this and a point \a p.
+ * \sa signedDistance()
+ */
+ inline Scalar absDistance(const VectorType& p) const { return ei_abs(signedDistance(p)); }
+
+ /** \returns the projection of a point \a p onto the plane \c *this.
+ */
+ inline VectorType projection(const VectorType& p) const { return p - signedDistance(p) * normal(); }
+
+ /** \returns a constant reference to the unit normal vector of the plane, which corresponds
+ * to the linear part of the implicit equation.
+ */
+ inline const NormalReturnType normal() const { return NormalReturnType(*const_cast<Coefficients*>(&m_coeffs),0,0,dim(),1); }
+
+ /** \returns a non-constant reference to the unit normal vector of the plane, which corresponds
+ * to the linear part of the implicit equation.
+ */
+ inline NormalReturnType normal() { return NormalReturnType(m_coeffs,0,0,dim(),1); }
+
+ /** \returns the distance to the origin, which is also the "constant term" of the implicit equation
+ * \warning the normal vector is assumed to be normalized.
+ */
+ inline const Scalar& offset() const { return m_coeffs.coeff(dim()); }
+
+ /** \returns a non-constant reference to the distance to the origin, which is also the constant part
+ * of the implicit equation */
+ inline Scalar& offset() { return m_coeffs(dim()); }
+
+ /** \returns a constant reference to the coefficients c_i of the plane equation:
+ * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$
+ */
+ inline const Coefficients& coeffs() const { return m_coeffs; }
+
+ /** \returns a non-constant reference to the coefficients c_i of the plane equation:
+ * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$
+ */
+ inline Coefficients& coeffs() { return m_coeffs; }
+
+ /** \returns the intersection of *this with \a other.
+ *
+ * \warning The ambient space must be a plane, i.e. have dimension 2, so that \c *this and \a other are lines.
+ *
+ * \note If \a other is approximately parallel to *this, this method will return any point on *this.
+ */
+ VectorType intersection(const Hyperplane& other)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2)
+ Scalar det = coeffs().coeff(0) * other.coeffs().coeff(1) - coeffs().coeff(1) * other.coeffs().coeff(0);
+ // since the line equations ax+by=c are normalized with a^2+b^2=1, the following tests
+ // whether the two lines are approximately parallel.
+ if(ei_isMuchSmallerThan(det, Scalar(1)))
+ { // special case where the two lines are approximately parallel. Pick any point on the first line.
+ if(ei_abs(coeffs().coeff(1))>ei_abs(coeffs().coeff(0)))
+ return VectorType(coeffs().coeff(1), -coeffs().coeff(2)/coeffs().coeff(1)-coeffs().coeff(0));
+ else
+ return VectorType(-coeffs().coeff(2)/coeffs().coeff(0)-coeffs().coeff(1), coeffs().coeff(0));
+ }
+ else
+ { // general case
+ Scalar invdet = Scalar(1) / det;
+ return VectorType(invdet*(coeffs().coeff(1)*other.coeffs().coeff(2)-other.coeffs().coeff(1)*coeffs().coeff(2)),
+ invdet*(other.coeffs().coeff(0)*coeffs().coeff(2)-coeffs().coeff(0)*other.coeffs().coeff(2)));
+ }
+ }
+
+ /** Applies the transformation matrix \a mat to \c *this and returns a reference to \c *this.
+ *
+ * \param mat the Dim x Dim transformation matrix
+ * \param traits specifies whether the matrix \a mat represents an Isometry
+ * or a more generic Affine transformation. The default is Affine.
+ */
+ template<typename XprType>
+ inline Hyperplane& transform(const MatrixBase<XprType>& mat, TransformTraits traits = Affine)
+ {
+ if (traits==Affine)
+ normal() = mat.inverse().transpose() * normal();
+ else if (traits==Isometry)
+ normal() = mat * normal();
+ else
+ {
+ ei_assert("invalid traits value in Hyperplane::transform()");
+ }
+ return *this;
+ }
+
+ /** Applies the transformation \a t to \c *this and returns a reference to \c *this.
+ *
+ * \param t the transformation of dimension Dim
+ * \param traits specifies whether the transformation \a t represents an Isometry
+ * or a more generic Affine transformation. The default is Affine.
+ * Other kind of transformations are not supported.
+ */
+ inline Hyperplane& transform(const Transform<Scalar,AmbientDimAtCompileTime>& t,
+ TransformTraits traits = Affine)
+ {
+ transform(t.linear(), traits);
+ offset() -= t.translation().eigen2_dot(normal());
+ return *this;
+ }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Hyperplane,
+ Hyperplane<NewScalarType,AmbientDimAtCompileTime> >::type cast() const
+ {
+ return typename internal::cast_return_type<Hyperplane,
+ Hyperplane<NewScalarType,AmbientDimAtCompileTime> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Hyperplane(const Hyperplane<OtherScalarType,AmbientDimAtCompileTime>& other)
+ { m_coeffs = other.coeffs().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Hyperplane& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+protected:
+
+ Coefficients m_coeffs;
+};
+
+} // end namespace Eigen
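A minimal sketch of the Hyperplane interface above in the 2D case, where a hyperplane is a line (hedged: the same calls exist on the Eigen 3 Hyperplane, so this does not rely on the compatibility layer alone):

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;
  // Line through two points; normal() and offset() then describe n.x + d = 0.
  Hyperplane<float,2> line = Hyperplane<float,2>::Through(Vector2f(0,0), Vector2f(1,1));
  float d = line.signedDistance(Vector2f(1,0));   // sign tells which side of the line
  Vector2f p = line.projection(Vector2f(1,0));    // closest point on the line
  std::cout << d << "\n" << p.transpose() << "\n";
}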
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/ParametrizedLine.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/ParametrizedLine.h
new file mode 100644
index 0000000000..9b57b7e0bb
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/ParametrizedLine.h
@@ -0,0 +1,141 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class ParametrizedLine
+ *
+ * \brief A parametrized line
+ *
+ * A parametrized line is defined by an origin point \f$ \mathbf{o} \f$ and a unit
+ * direction vector \f$ \mathbf{d} \f$ such that the line corresponds to
+ * the set \f$ l(t) = \mathbf{o} + t \mathbf{d} \f$, \f$ t \in \mathbf{R} \f$.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ */
+template <typename _Scalar, int _AmbientDim>
+class ParametrizedLine
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim)
+ enum { AmbientDimAtCompileTime = _AmbientDim };
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1> VectorType;
+
+ /** Default constructor without initialization */
+ inline ParametrizedLine() {}
+
+ /** Constructs a dynamic-size line with \a _dim the dimension
+ * of the ambient space */
+ inline explicit ParametrizedLine(int _dim) : m_origin(_dim), m_direction(_dim) {}
+
+ /** Initializes a parametrized line of direction \a direction and origin \a origin.
+ * \warning the direction vector is assumed to be normalized.
+ */
+ ParametrizedLine(const VectorType& origin, const VectorType& direction)
+ : m_origin(origin), m_direction(direction) {}
+
+ explicit ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim>& hyperplane);
+
+ /** Constructs a parametrized line going from \a p0 to \a p1. */
+ static inline ParametrizedLine Through(const VectorType& p0, const VectorType& p1)
+ { return ParametrizedLine(p0, (p1-p0).normalized()); }
+
+ ~ParametrizedLine() {}
+
+ /** \returns the dimension of the ambient space in which the line lives */
+ inline int dim() const { return m_direction.size(); }
+
+ const VectorType& origin() const { return m_origin; }
+ VectorType& origin() { return m_origin; }
+
+ const VectorType& direction() const { return m_direction; }
+ VectorType& direction() { return m_direction; }
+
+ /** \returns the squared distance of a point \a p to its projection onto the line \c *this.
+ * \sa distance()
+ */
+ RealScalar squaredDistance(const VectorType& p) const
+ {
+ VectorType diff = p-origin();
+ return (diff - diff.eigen2_dot(direction())* direction()).squaredNorm();
+ }
+ /** \returns the distance of a point \a p to its projection onto the line \c *this.
+ * \sa squaredDistance()
+ */
+ RealScalar distance(const VectorType& p) const { return ei_sqrt(squaredDistance(p)); }
+
+ /** \returns the projection of a point \a p onto the line \c *this. */
+ VectorType projection(const VectorType& p) const
+ { return origin() + (p-origin()).eigen2_dot(direction()) * direction(); }
+
+ Scalar intersection(const Hyperplane<_Scalar, _AmbientDim>& hyperplane);
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<ParametrizedLine,
+ ParametrizedLine<NewScalarType,AmbientDimAtCompileTime> >::type cast() const
+ {
+ return typename internal::cast_return_type<ParametrizedLine,
+ ParametrizedLine<NewScalarType,AmbientDimAtCompileTime> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit ParametrizedLine(const ParametrizedLine<OtherScalarType,AmbientDimAtCompileTime>& other)
+ {
+ m_origin = other.origin().template cast<Scalar>();
+ m_direction = other.direction().template cast<Scalar>();
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const ParametrizedLine& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_origin.isApprox(other.m_origin, prec) && m_direction.isApprox(other.m_direction, prec); }
+
+protected:
+
+ VectorType m_origin, m_direction;
+};
+
+/** Constructs a parametrized line from a 2D hyperplane
+ *
+ * \warning the ambient space must have dimension 2 such that the hyperplane actually describes a line
+ */
+template <typename _Scalar, int _AmbientDim>
+inline ParametrizedLine<_Scalar, _AmbientDim>::ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim>& hyperplane)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2)
+ direction() = hyperplane.normal().unitOrthogonal();
+ origin() = -hyperplane.normal()*hyperplane.offset();
+}
+
+/** \returns the parameter value of the intersection between \c *this and the given hyperplane
+ */
+template <typename _Scalar, int _AmbientDim>
+inline _Scalar ParametrizedLine<_Scalar, _AmbientDim>::intersection(const Hyperplane<_Scalar, _AmbientDim>& hyperplane)
+{
+ return -(hyperplane.offset()+origin().eigen2_dot(hyperplane.normal()))
+ /(direction().eigen2_dot(hyperplane.normal()));
+}
+
+} // end namespace Eigen
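A minimal sketch combining ParametrizedLine with Hyperplane, matching the intersection() routine defined above (hedged: it assumes the 2D float instantiation; Eigen 3 provides equivalent methods):

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;
  ParametrizedLine<float,2> line =
      ParametrizedLine<float,2>::Through(Vector2f(0,0), Vector2f(1,0));  // the x axis
  Hyperplane<float,2> plane(Vector2f(1,0), -2.0f);   // n.x + d = 0, i.e. the vertical line x = 2
  float t = line.intersection(plane);                // parameter of the intersection point
  Vector2f hit = line.origin() + t * line.direction();
  std::cout << t << " -> " << hit.transpose() << "\n";
}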
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Quaternion.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Quaternion.h
new file mode 100644
index 0000000000..4b6390cf1d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Quaternion.h
@@ -0,0 +1,495 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+template<typename Other,
+ int OtherRows=Other::RowsAtCompileTime,
+ int OtherCols=Other::ColsAtCompileTime>
+struct ei_quaternion_assign_impl;
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Quaternion
+ *
+ * \brief The quaternion class used to represent 3D orientations and rotations
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ *
+ * This class represents a quaternion \f$ w+xi+yj+zk \f$ that is a convenient representation of
+ * orientations and rotations of objects in three dimensions. Compared to other representations
+ * like Euler angles or 3x3 matrices, quaternions offer the following advantages:
+ * \li \b compact storage (4 scalars)
+ * \li \b efficient to compose (28 flops),
+ * \li \b stable spherical interpolation
+ *
+ * The following two typedefs are provided for convenience:
+ * \li \c Quaternionf for \c float
+ * \li \c Quaterniond for \c double
+ *
+ * \sa class AngleAxis, class Transform
+ */
+
+template<typename _Scalar> struct ei_traits<Quaternion<_Scalar> >
+{
+ typedef _Scalar Scalar;
+};
+
+template<typename _Scalar>
+class Quaternion : public RotationBase<Quaternion<_Scalar>,3>
+{
+ typedef RotationBase<Quaternion<_Scalar>,3> Base;
+
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,4)
+
+ using Base::operator*;
+
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+
+ /** the type of the Coefficients 4-vector */
+ typedef Matrix<Scalar, 4, 1> Coefficients;
+ /** the type of a 3D vector */
+ typedef Matrix<Scalar,3,1> Vector3;
+ /** the equivalent rotation matrix type */
+ typedef Matrix<Scalar,3,3> Matrix3;
+ /** the equivalent angle-axis type */
+ typedef AngleAxis<Scalar> AngleAxisType;
+
+ /** \returns the \c x coefficient */
+ inline Scalar x() const { return m_coeffs.coeff(0); }
+ /** \returns the \c y coefficient */
+ inline Scalar y() const { return m_coeffs.coeff(1); }
+ /** \returns the \c z coefficient */
+ inline Scalar z() const { return m_coeffs.coeff(2); }
+ /** \returns the \c w coefficient */
+ inline Scalar w() const { return m_coeffs.coeff(3); }
+
+ /** \returns a reference to the \c x coefficient */
+ inline Scalar& x() { return m_coeffs.coeffRef(0); }
+ /** \returns a reference to the \c y coefficient */
+ inline Scalar& y() { return m_coeffs.coeffRef(1); }
+ /** \returns a reference to the \c z coefficient */
+ inline Scalar& z() { return m_coeffs.coeffRef(2); }
+ /** \returns a reference to the \c w coefficient */
+ inline Scalar& w() { return m_coeffs.coeffRef(3); }
+
+ /** \returns a read-only vector expression of the imaginary part (x,y,z) */
+ inline const Block<const Coefficients,3,1> vec() const { return m_coeffs.template start<3>(); }
+
+ /** \returns a vector expression of the imaginary part (x,y,z) */
+ inline Block<Coefficients,3,1> vec() { return m_coeffs.template start<3>(); }
+
+ /** \returns a read-only vector expression of the coefficients (x,y,z,w) */
+ inline const Coefficients& coeffs() const { return m_coeffs; }
+
+ /** \returns a vector expression of the coefficients (x,y,z,w) */
+ inline Coefficients& coeffs() { return m_coeffs; }
+
+ /** Default constructor leaving the quaternion uninitialized. */
+ inline Quaternion() {}
+
+ /** Constructs and initializes the quaternion \f$ w+xi+yj+zk \f$ from
+ * its four coefficients \a w, \a x, \a y and \a z.
+ *
+ * \warning Note the order of the arguments: the real \a w coefficient first,
+ * while internally the coefficients are stored in the following order:
+ * [\c x, \c y, \c z, \c w]
+ */
+ inline Quaternion(Scalar w, Scalar x, Scalar y, Scalar z)
+ { m_coeffs << x, y, z, w; }
+
+ /** Copy constructor */
+ inline Quaternion(const Quaternion& other) { m_coeffs = other.m_coeffs; }
+
+ /** Constructs and initializes a quaternion from the angle-axis \a aa */
+ explicit inline Quaternion(const AngleAxisType& aa) { *this = aa; }
+
+ /** Constructs and initializes a quaternion from either:
+ * - a rotation matrix expression,
+ * - a 4D vector expression representing quaternion coefficients.
+ * \sa operator=(MatrixBase<Derived>)
+ */
+ template<typename Derived>
+ explicit inline Quaternion(const MatrixBase<Derived>& other) { *this = other; }
+
+ Quaternion& operator=(const Quaternion& other);
+ Quaternion& operator=(const AngleAxisType& aa);
+ template<typename Derived>
+ Quaternion& operator=(const MatrixBase<Derived>& m);
+
+ /** \returns a quaternion representing an identity rotation
+ * \sa MatrixBase::Identity()
+ */
+ static inline Quaternion Identity() { return Quaternion(1, 0, 0, 0); }
+
+ /** \sa Quaternion::Identity(), MatrixBase::setIdentity()
+ */
+ inline Quaternion& setIdentity() { m_coeffs << 0, 0, 0, 1; return *this; }
+
+ /** \returns the squared norm of the quaternion's coefficients
+ * \sa Quaternion::norm(), MatrixBase::squaredNorm()
+ */
+ inline Scalar squaredNorm() const { return m_coeffs.squaredNorm(); }
+
+ /** \returns the norm of the quaternion's coefficients
+ * \sa Quaternion::squaredNorm(), MatrixBase::norm()
+ */
+ inline Scalar norm() const { return m_coeffs.norm(); }
+
+ /** Normalizes the quaternion \c *this
+ * \sa normalized(), MatrixBase::normalize() */
+ inline void normalize() { m_coeffs.normalize(); }
+ /** \returns a normalized version of \c *this
+ * \sa normalize(), MatrixBase::normalized() */
+ inline Quaternion normalized() const { return Quaternion(m_coeffs.normalized()); }
+
+ /** \returns the dot product of \c *this and \a other
+ * Geometrically speaking, the dot product of two unit quaternions
+ * corresponds to the cosine of half the angle between the two rotations.
+ * \sa angularDistance()
+ */
+ inline Scalar eigen2_dot(const Quaternion& other) const { return m_coeffs.eigen2_dot(other.m_coeffs); }
+
+ inline Scalar angularDistance(const Quaternion& other) const;
+
+ Matrix3 toRotationMatrix(void) const;
+
+ template<typename Derived1, typename Derived2>
+ Quaternion& setFromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b);
+
+ inline Quaternion operator* (const Quaternion& q) const;
+ inline Quaternion& operator*= (const Quaternion& q);
+
+ Quaternion inverse(void) const;
+ Quaternion conjugate(void) const;
+
+ Quaternion slerp(Scalar t, const Quaternion& other) const;
+
+ template<typename Derived>
+ Vector3 operator* (const MatrixBase<Derived>& vec) const;
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Quaternion,Quaternion<NewScalarType> >::type cast() const
+ { return typename internal::cast_return_type<Quaternion,Quaternion<NewScalarType> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Quaternion(const Quaternion<OtherScalarType>& other)
+ { m_coeffs = other.coeffs().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Quaternion& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+protected:
+ Coefficients m_coeffs;
+};
+
+/** \ingroup Geometry_Module
+ * single precision quaternion type */
+typedef Quaternion<float> Quaternionf;
+/** \ingroup Geometry_Module
+ * double precision quaternion type */
+typedef Quaternion<double> Quaterniond;
+
+// Generic Quaternion * Quaternion product
+template<typename Scalar> inline Quaternion<Scalar>
+ei_quaternion_product(const Quaternion<Scalar>& a, const Quaternion<Scalar>& b)
+{
+ return Quaternion<Scalar>
+ (
+ a.w() * b.w() - a.x() * b.x() - a.y() * b.y() - a.z() * b.z(),
+ a.w() * b.x() + a.x() * b.w() + a.y() * b.z() - a.z() * b.y(),
+ a.w() * b.y() + a.y() * b.w() + a.z() * b.x() - a.x() * b.z(),
+ a.w() * b.z() + a.z() * b.w() + a.x() * b.y() - a.y() * b.x()
+ );
+}
+
+/** \returns the concatenation of two rotations as a quaternion-quaternion product */
+template <typename Scalar>
+inline Quaternion<Scalar> Quaternion<Scalar>::operator* (const Quaternion& other) const
+{
+ return ei_quaternion_product(*this,other);
+}
+
+/** \sa operator*(Quaternion) */
+template <typename Scalar>
+inline Quaternion<Scalar>& Quaternion<Scalar>::operator*= (const Quaternion& other)
+{
+ return (*this = *this * other);
+}
+
+/** Rotation of a vector by a quaternion.
+ * \remarks If the quaternion is used to rotate several points (>1)
+ * then it is much more efficient to first convert it to a 3x3 Matrix.
+ * Comparison of the operation cost for n transformations:
+ * - Quaternion: 30n
+ * - Via a Matrix3: 24 + 15n
+ */
+template <typename Scalar>
+template<typename Derived>
+inline typename Quaternion<Scalar>::Vector3
+Quaternion<Scalar>::operator* (const MatrixBase<Derived>& v) const
+{
+ // Note that this algorithm comes from the optimization by hand
+ // of the conversion to a Matrix followed by a Matrix/Vector product.
+ // It appears to be much faster than the common algorithm found
+ // in the literature (30 versus 39 flops). It also requires two
+ // Vector3 as temporaries.
+ Vector3 uv;
+ uv = 2 * this->vec().cross(v);
+ return v + this->w() * uv + this->vec().cross(uv);
+}
+
+template<typename Scalar>
+inline Quaternion<Scalar>& Quaternion<Scalar>::operator=(const Quaternion& other)
+{
+ m_coeffs = other.m_coeffs;
+ return *this;
+}
+
+/** Set \c *this from an angle-axis \a aa and returns a reference to \c *this
+ */
+template<typename Scalar>
+inline Quaternion<Scalar>& Quaternion<Scalar>::operator=(const AngleAxisType& aa)
+{
+ Scalar ha = Scalar(0.5)*aa.angle(); // Scalar(0.5) to suppress precision loss warnings
+ this->w() = ei_cos(ha);
+ this->vec() = ei_sin(ha) * aa.axis();
+ return *this;
+}
+
+/** Set \c *this from the expression \a xpr:
+ * - if \a xpr is a 4x1 vector, then \a xpr is assumed to be a quaternion
+ * - if \a xpr is a 3x3 matrix, then \a xpr is assumed to be a rotation matrix
+ * and \a xpr is converted to a quaternion
+ */
+template<typename Scalar>
+template<typename Derived>
+inline Quaternion<Scalar>& Quaternion<Scalar>::operator=(const MatrixBase<Derived>& xpr)
+{
+ ei_quaternion_assign_impl<Derived>::run(*this, xpr.derived());
+ return *this;
+}
+
+/** Convert the quaternion to a 3x3 rotation matrix */
+template<typename Scalar>
+inline typename Quaternion<Scalar>::Matrix3
+Quaternion<Scalar>::toRotationMatrix(void) const
+{
+ // NOTE if inlined, then gcc 4.2 and 4.4 get rid of the temporary (not gcc 4.3 !!)
+ // if not inlined then the cost of the return by value is huge ~ +35%,
+ // however, not inlining this function is an order of magnitude slower, so
+ // it has to be inlined, and so the return by value is not an issue
+ Matrix3 res;
+
+ const Scalar tx = Scalar(2)*this->x();
+ const Scalar ty = Scalar(2)*this->y();
+ const Scalar tz = Scalar(2)*this->z();
+ const Scalar twx = tx*this->w();
+ const Scalar twy = ty*this->w();
+ const Scalar twz = tz*this->w();
+ const Scalar txx = tx*this->x();
+ const Scalar txy = ty*this->x();
+ const Scalar txz = tz*this->x();
+ const Scalar tyy = ty*this->y();
+ const Scalar tyz = tz*this->y();
+ const Scalar tzz = tz*this->z();
+
+ res.coeffRef(0,0) = Scalar(1)-(tyy+tzz);
+ res.coeffRef(0,1) = txy-twz;
+ res.coeffRef(0,2) = txz+twy;
+ res.coeffRef(1,0) = txy+twz;
+ res.coeffRef(1,1) = Scalar(1)-(txx+tzz);
+ res.coeffRef(1,2) = tyz-twx;
+ res.coeffRef(2,0) = txz-twy;
+ res.coeffRef(2,1) = tyz+twx;
+ res.coeffRef(2,2) = Scalar(1)-(txx+tyy);
+
+ return res;
+}
+
+/** Sets *this to be a quaternion representing a rotation sending the vector \a a to the vector \a b.
+ *
+ * \returns a reference to *this.
+ *
+ * Note that the two input vectors do \b not have to be normalized.
+ */
+template<typename Scalar>
+template<typename Derived1, typename Derived2>
+inline Quaternion<Scalar>& Quaternion<Scalar>::setFromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b)
+{
+ Vector3 v0 = a.normalized();
+ Vector3 v1 = b.normalized();
+ Scalar c = v0.eigen2_dot(v1);
+
+ // if dot == 1, vectors are the same
+ if (ei_isApprox(c,Scalar(1)))
+ {
+ // set to identity
+ this->w() = 1; this->vec().setZero();
+ return *this;
+ }
+ // if dot == -1, vectors are opposites
+ if (ei_isApprox(c,Scalar(-1)))
+ {
+ this->vec() = v0.unitOrthogonal();
+ this->w() = 0;
+ return *this;
+ }
+
+ Vector3 axis = v0.cross(v1);
+ Scalar s = ei_sqrt((Scalar(1)+c)*Scalar(2));
+ Scalar invs = Scalar(1)/s;
+ this->vec() = axis * invs;
+ this->w() = s * Scalar(0.5);
+
+ return *this;
+}
+
+/** \returns the multiplicative inverse of \c *this
+ * Note that in most cases, i.e., if you simply want the opposite rotation,
+ * and/or the quaternion is normalized, then it is enough to use the conjugate.
+ *
+ * \sa Quaternion::conjugate()
+ */
+template <typename Scalar>
+inline Quaternion<Scalar> Quaternion<Scalar>::inverse() const
+{
+ // FIXME should this function be called multiplicativeInverse and conjugate() be called inverse() or opposite() ??
+ Scalar n2 = this->squaredNorm();
+ if (n2 > 0)
+ return Quaternion(conjugate().coeffs() / n2);
+ else
+ {
+ // return an invalid result to flag the error
+ return Quaternion(Coefficients::Zero());
+ }
+}
+
+/** \returns the conjugate of the \c *this which is equal to the multiplicative inverse
+ * if the quaternion is normalized.
+ * The conjugate of a quaternion represents the opposite rotation.
+ *
+ * \sa Quaternion::inverse()
+ */
+template <typename Scalar>
+inline Quaternion<Scalar> Quaternion<Scalar>::conjugate() const
+{
+ return Quaternion(this->w(),-this->x(),-this->y(),-this->z());
+}
+
+/** \returns the angle (in radians) between two rotations
+ * \sa eigen2_dot()
+ */
+template <typename Scalar>
+inline Scalar Quaternion<Scalar>::angularDistance(const Quaternion& other) const
+{
+ double d = ei_abs(this->eigen2_dot(other));
+ if (d>=1.0)
+ return 0;
+ return Scalar(2) * std::acos(d);
+}
+
+/** \returns the spherical linear interpolation between the two quaternions
+ * \c *this and \a other at the parameter \a t
+ */
+template <typename Scalar>
+Quaternion<Scalar> Quaternion<Scalar>::slerp(Scalar t, const Quaternion& other) const
+{
+ static const Scalar one = Scalar(1) - machine_epsilon<Scalar>();
+ Scalar d = this->eigen2_dot(other);
+ Scalar absD = ei_abs(d);
+
+ Scalar scale0;
+ Scalar scale1;
+
+ if (absD>=one)
+ {
+ scale0 = Scalar(1) - t;
+ scale1 = t;
+ }
+ else
+ {
+ // theta is the angle between the 2 quaternions
+ Scalar theta = std::acos(absD);
+ Scalar sinTheta = ei_sin(theta);
+
+ scale0 = ei_sin( ( Scalar(1) - t ) * theta) / sinTheta;
+ scale1 = ei_sin( ( t * theta) ) / sinTheta;
+ if (d<0)
+ scale1 = -scale1;
+ }
+
+ return Quaternion<Scalar>(scale0 * coeffs() + scale1 * other.coeffs());
+}
+
+// set from a rotation matrix
+template<typename Other>
+struct ei_quaternion_assign_impl<Other,3,3>
+{
+ typedef typename Other::Scalar Scalar;
+ static inline void run(Quaternion<Scalar>& q, const Other& mat)
+ {
+ // This algorithm comes from "Quaternion Calculus and Fast Animation",
+ // Ken Shoemake, 1987 SIGGRAPH course notes
+ Scalar t = mat.trace();
+ if (t > 0)
+ {
+ t = ei_sqrt(t + Scalar(1.0));
+ q.w() = Scalar(0.5)*t;
+ t = Scalar(0.5)/t;
+ q.x() = (mat.coeff(2,1) - mat.coeff(1,2)) * t;
+ q.y() = (mat.coeff(0,2) - mat.coeff(2,0)) * t;
+ q.z() = (mat.coeff(1,0) - mat.coeff(0,1)) * t;
+ }
+ else
+ {
+ int i = 0;
+ if (mat.coeff(1,1) > mat.coeff(0,0))
+ i = 1;
+ if (mat.coeff(2,2) > mat.coeff(i,i))
+ i = 2;
+ int j = (i+1)%3;
+ int k = (j+1)%3;
+
+ t = ei_sqrt(mat.coeff(i,i)-mat.coeff(j,j)-mat.coeff(k,k) + Scalar(1.0));
+ q.coeffs().coeffRef(i) = Scalar(0.5) * t;
+ t = Scalar(0.5)/t;
+ q.w() = (mat.coeff(k,j)-mat.coeff(j,k))*t;
+ q.coeffs().coeffRef(j) = (mat.coeff(j,i)+mat.coeff(i,j))*t;
+ q.coeffs().coeffRef(k) = (mat.coeff(k,i)+mat.coeff(i,k))*t;
+ }
+ }
+};
+
+// set from a vector of coefficients assumed to be a quaternion
+template<typename Other>
+struct ei_quaternion_assign_impl<Other,4,1>
+{
+ typedef typename Other::Scalar Scalar;
+ static inline void run(Quaternion<Scalar>& q, const Other& vec)
+ {
+ q.coeffs() = vec;
+ }
+};
+
+} // end namespace Eigen
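A minimal sketch of the Quaternion operations defined above: setFromTwoVectors(), vector rotation, slerp() and the matrix conversion (hedged: written against the float typedef; the same members also exist on the Eigen 3 Quaternion):

#include <Eigen/Core>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;
  Quaternionf q;
  q.setFromTwoVectors(Vector3f::UnitX(), Vector3f::UnitY());  // rotation sending X onto Y
  Vector3f v = q * Vector3f::UnitX();                         // approximately UnitY
  Quaternionf half = q.slerp(0.5f, Quaternionf::Identity());  // halfway back towards identity
  Matrix3f m = q.toRotationMatrix();
  std::cout << v.transpose() << "\n" << half.coeffs().transpose() << "\n" << m << "\n";
}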
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Rotation2D.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Rotation2D.h
new file mode 100644
index 0000000000..19b8582a1b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Rotation2D.h
@@ -0,0 +1,145 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Rotation2D
+ *
+ * \brief Represents a rotation/orientation in a 2 dimensional space.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ *
+ * This class is equivalent to a single scalar representing a counterclockwise rotation
+ * as a single angle in radians. It provides some additional features such as the automatic
+ * conversion from/to a 2x2 rotation matrix. Moreover this class aims to provide a similar
+ * interface to Quaternion in order to facilitate the writing of generic algorithms
+ * dealing with rotations.
+ *
+ * \sa class Quaternion, class Transform
+ */
+template<typename _Scalar> struct ei_traits<Rotation2D<_Scalar> >
+{
+ typedef _Scalar Scalar;
+};
+
+template<typename _Scalar>
+class Rotation2D : public RotationBase<Rotation2D<_Scalar>,2>
+{
+ typedef RotationBase<Rotation2D<_Scalar>,2> Base;
+
+public:
+
+ using Base::operator*;
+
+ enum { Dim = 2 };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ typedef Matrix<Scalar,2,1> Vector2;
+ typedef Matrix<Scalar,2,2> Matrix2;
+
+protected:
+
+ Scalar m_angle;
+
+public:
+
+ /** Constructs a 2D counterclockwise rotation from the angle \a a in radians. */
+ inline Rotation2D(Scalar a) : m_angle(a) {}
+
+ /** \returns the rotation angle */
+ inline Scalar angle() const { return m_angle; }
+
+ /** \returns a read-write reference to the rotation angle */
+ inline Scalar& angle() { return m_angle; }
+
+ /** \returns the inverse rotation */
+ inline Rotation2D inverse() const { return -m_angle; }
+
+ /** Concatenates two rotations */
+ inline Rotation2D operator*(const Rotation2D& other) const
+ { return m_angle + other.m_angle; }
+
+ /** Concatenates two rotations */
+ inline Rotation2D& operator*=(const Rotation2D& other)
+ { m_angle += other.m_angle; return *this; }
+
+ /** Applies the rotation to a 2D vector */
+ Vector2 operator* (const Vector2& vec) const
+ { return toRotationMatrix() * vec; }
+
+ template<typename Derived>
+ Rotation2D& fromRotationMatrix(const MatrixBase<Derived>& m);
+ Matrix2 toRotationMatrix(void) const;
+
+ /** \returns the spherical interpolation between \c *this and \a other using
+ * parameter \a t. It is in fact equivalent to a linear interpolation.
+ */
+ inline Rotation2D slerp(Scalar t, const Rotation2D& other) const
+ { return m_angle * (1-t) + other.angle() * t; }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Rotation2D,Rotation2D<NewScalarType> >::type cast() const
+ { return typename internal::cast_return_type<Rotation2D,Rotation2D<NewScalarType> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Rotation2D(const Rotation2D<OtherScalarType>& other)
+ {
+ m_angle = Scalar(other.angle());
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Rotation2D& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return ei_isApprox(m_angle,other.m_angle, prec); }
+};
+
+/** \ingroup Geometry_Module
+ * single precision 2D rotation type */
+typedef Rotation2D<float> Rotation2Df;
+/** \ingroup Geometry_Module
+ * double precision 2D rotation type */
+typedef Rotation2D<double> Rotation2Dd;
+
+/** Set \c *this from a 2x2 rotation matrix \a mat.
+ * In other words, this function extracts the rotation angle
+ * from the rotation matrix.
+ */
+template<typename Scalar>
+template<typename Derived>
+Rotation2D<Scalar>& Rotation2D<Scalar>::fromRotationMatrix(const MatrixBase<Derived>& mat)
+{
+ EIGEN_STATIC_ASSERT(Derived::RowsAtCompileTime==2 && Derived::ColsAtCompileTime==2,YOU_MADE_A_PROGRAMMING_MISTAKE)
+ m_angle = ei_atan2(mat.coeff(1,0), mat.coeff(0,0));
+ return *this;
+}
+
+/** Constructs and \returns an equivalent 2x2 rotation matrix.
+ */
+template<typename Scalar>
+typename Rotation2D<Scalar>::Matrix2
+Rotation2D<Scalar>::toRotationMatrix(void) const
+{
+ Scalar sinA = ei_sin(m_angle);
+ Scalar cosA = ei_cos(m_angle);
+ return (Matrix2() << cosA, -sinA, sinA, cosA).finished();
+}
+
+} // end namespace Eigen
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/RotationBase.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/RotationBase.h
new file mode 100644
index 0000000000..b1c8f38da9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/RotationBase.h
@@ -0,0 +1,123 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+// this file aims to contain the various representations of rotation/orientation
+// in 2D and 3D space, except Matrix and Quaternion.
+
+/** \class RotationBase
+ *
+ * \brief Common base class for compact rotation representations
+ *
+ * \param Derived is the derived type, i.e., a rotation type
+ * \param _Dim the dimension of the space
+ */
+template<typename Derived, int _Dim>
+class RotationBase
+{
+ public:
+ enum { Dim = _Dim };
+ /** the scalar type of the coefficients */
+ typedef typename ei_traits<Derived>::Scalar Scalar;
+
+ /** corresponding linear transformation matrix type */
+ typedef Matrix<Scalar,Dim,Dim> RotationMatrixType;
+
+ inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
+ inline Derived& derived() { return *static_cast<Derived*>(this); }
+
+ /** \returns an equivalent rotation matrix */
+ inline RotationMatrixType toRotationMatrix() const { return derived().toRotationMatrix(); }
+
+ /** \returns the inverse rotation */
+ inline Derived inverse() const { return derived().inverse(); }
+
+ /** \returns the concatenation of the rotation \c *this with a translation \a t */
+ inline Transform<Scalar,Dim> operator*(const Translation<Scalar,Dim>& t) const
+ { return toRotationMatrix() * t; }
+
+ /** \returns the concatenation of the rotation \c *this with a scaling \a s */
+ inline RotationMatrixType operator*(const Scaling<Scalar,Dim>& s) const
+ { return toRotationMatrix() * s; }
+
+ /** \returns the concatenation of the rotation \c *this with an affine transformation \a t */
+ inline Transform<Scalar,Dim> operator*(const Transform<Scalar,Dim>& t) const
+ { return toRotationMatrix() * t; }
+};
+
+/** \geometry_module
+ *
+ * Constructs a Dim x Dim rotation matrix from the rotation \a r
+ */
+template<typename _Scalar, int _Rows, int _Cols, int _Storage, int _MaxRows, int _MaxCols>
+template<typename OtherDerived>
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>
+::Matrix(const RotationBase<OtherDerived,ColsAtCompileTime>& r)
+{
+ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim))
+ *this = r.toRotationMatrix();
+}
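+
+// A short sketch of the conversion described above (illustrative only):
+//   Eigen::Matrix2f R(Eigen::Rotation2Df(0.1f));   // construct a rotation matrix from a rotation
+//   R = Eigen::Rotation2Df(0.2f);                  // or assign a rotation to an existing matrix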
+
+/** \geometry_module
+ *
+ * Set a Dim x Dim rotation matrix from the rotation \a r
+ */
+template<typename _Scalar, int _Rows, int _Cols, int _Storage, int _MaxRows, int _MaxCols>
+template<typename OtherDerived>
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>&
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>
+::operator=(const RotationBase<OtherDerived,ColsAtCompileTime>& r)
+{
+ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim))
+ return *this = r.toRotationMatrix();
+}
+
+/** \internal
+ *
+ * Helper function to convert an arbitrary rotation object to a rotation matrix.
+ *
+ * \param Scalar the numeric type of the matrix coefficients
+ * \param Dim the dimension of the current space
+ *
+ * It returns a Dim x Dim fixed size matrix.
+ *
+ * Default specializations are provided for:
+ * - any scalar type (2D),
+ * - any matrix expression,
+ * - any type based on RotationBase (e.g., Quaternion, AngleAxis, Rotation2D)
+ *
+ * Currently ei_toRotationMatrix is only used by Transform.
+ *
+ * \sa class Transform, class Rotation2D, class Quaternion, class AngleAxis
+ */
+template<typename Scalar, int Dim>
+static inline Matrix<Scalar,2,2> ei_toRotationMatrix(const Scalar& s)
+{
+ EIGEN_STATIC_ASSERT(Dim==2,YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return Rotation2D<Scalar>(s).toRotationMatrix();
+}
+
+template<typename Scalar, int Dim, typename OtherDerived>
+static inline Matrix<Scalar,Dim,Dim> ei_toRotationMatrix(const RotationBase<OtherDerived,Dim>& r)
+{
+ return r.toRotationMatrix();
+}
+
+template<typename Scalar, int Dim, typename OtherDerived>
+static inline const MatrixBase<OtherDerived>& ei_toRotationMatrix(const MatrixBase<OtherDerived>& mat)
+{
+ EIGEN_STATIC_ASSERT(OtherDerived::RowsAtCompileTime==Dim && OtherDerived::ColsAtCompileTime==Dim,
+ YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return mat;
+}
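+
+// These overloads are what let higher level code accept several rotation representations.
+// Illustrative sketch, assuming the Eigen2Support Transform typedefs and the AngleAxis type
+// provided elsewhere in this module:
+//   Eigen::Transform2f t2; t2.setIdentity(); t2.rotate(0.5f);                     // plain scalar -> Rotation2D
+//   Eigen::Transform3f t3; t3.setIdentity();
+//   t3.rotate(Eigen::AngleAxisf(0.5f, Eigen::Vector3f::UnitZ()));                 // any RotationBase-derived type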
+
+} // end namespace Eigen
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Scaling.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Scaling.h
new file mode 100644
index 0000000000..b8fa6cd3f6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Scaling.h
@@ -0,0 +1,167 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Scaling
+ *
+ * \brief Represents a possibly non-uniform scaling transformation
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ * \param _Dim the dimension of the space, can be a compile time value or Dynamic
+ *
+ * \note This class is not aimed at storing a scaling transformation,
+ * but rather at making the construction and update of Transform objects easier.
+ *
+ * \sa class Translation, class Transform
+ */
+template<typename _Scalar, int _Dim>
+class Scaling
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim)
+ /** dimension of the space */
+ enum { Dim = _Dim };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ /** corresponding vector type */
+ typedef Matrix<Scalar,Dim,1> VectorType;
+ /** corresponding linear transformation matrix type */
+ typedef Matrix<Scalar,Dim,Dim> LinearMatrixType;
+ /** corresponding translation type */
+ typedef Translation<Scalar,Dim> TranslationType;
+ /** corresponding affine transformation type */
+ typedef Transform<Scalar,Dim> TransformType;
+
+protected:
+
+ VectorType m_coeffs;
+
+public:
+
+ /** Default constructor without initialization. */
+ Scaling() {}
+ /** Constructs and initializes a uniform scaling transformation */
+ explicit inline Scaling(const Scalar& s) { m_coeffs.setConstant(s); }
+ /** 2D only */
+ inline Scaling(const Scalar& sx, const Scalar& sy)
+ {
+ ei_assert(Dim==2);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ }
+ /** 3D only */
+ inline Scaling(const Scalar& sx, const Scalar& sy, const Scalar& sz)
+ {
+ ei_assert(Dim==3);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ m_coeffs.z() = sz;
+ }
+ /** Constructs and initializes the scaling transformation from a vector of scaling coefficients */
+ explicit inline Scaling(const VectorType& coeffs) : m_coeffs(coeffs) {}
+
+ const VectorType& coeffs() const { return m_coeffs; }
+ VectorType& coeffs() { return m_coeffs; }
+
+ /** Concatenates two scalings */
+ inline Scaling operator* (const Scaling& other) const
+ { return Scaling(coeffs().cwise() * other.coeffs()); }
+
+ /** Concatenates a scaling and a translation */
+ inline TransformType operator* (const TranslationType& t) const;
+
+ /** Concatenates a scaling and an affine transformation */
+ inline TransformType operator* (const TransformType& t) const;
+
+ /** Concatenates a scaling and a linear transformation matrix */
+ // TODO returns an expression
+ inline LinearMatrixType operator* (const LinearMatrixType& other) const
+ { return coeffs().asDiagonal() * other; }
+
+ /** Concatenates a linear transformation matrix and a scaling */
+ // TODO returns an expression
+ friend inline LinearMatrixType operator* (const LinearMatrixType& other, const Scaling& s)
+ { return other * s.coeffs().asDiagonal(); }
+
+ template<typename Derived>
+ inline LinearMatrixType operator*(const RotationBase<Derived,Dim>& r) const
+ { return *this * r.toRotationMatrix(); }
+
+ /** Applies scaling to vector */
+ inline VectorType operator* (const VectorType& other) const
+ { return coeffs().asDiagonal() * other; }
+
+ /** \returns the inverse scaling */
+ inline Scaling inverse() const
+ { return Scaling(coeffs().cwise().inverse()); }
+
+ inline Scaling& operator=(const Scaling& other)
+ {
+ m_coeffs = other.m_coeffs;
+ return *this;
+ }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Scaling,Scaling<NewScalarType,Dim> >::type cast() const
+ { return typename internal::cast_return_type<Scaling,Scaling<NewScalarType,Dim> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Scaling(const Scaling<OtherScalarType,Dim>& other)
+ { m_coeffs = other.coeffs().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Scaling& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+};
+
+/** \addtogroup Geometry_Module */
+//@{
+typedef Scaling<float, 2> Scaling2f;
+typedef Scaling<double,2> Scaling2d;
+typedef Scaling<float, 3> Scaling3f;
+typedef Scaling<double,3> Scaling3d;
+//@}
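+
+// A minimal usage sketch (illustrative only):
+//   Eigen::Scaling3f s(1.f, 2.f, 3.f);                               // non-uniform scaling
+//   Eigen::Vector3f v = s * Eigen::Vector3f::Ones();                 // yields (1, 2, 3)
+//   Eigen::Transform3f t = s * Eigen::Translation3f(1.f, 0.f, 0.f);  // scaling composed with a translation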
+
+template<typename Scalar, int Dim>
+inline typename Scaling<Scalar,Dim>::TransformType
+Scaling<Scalar,Dim>::operator* (const TranslationType& t) const
+{
+ TransformType res;
+ res.matrix().setZero();
+ res.linear().diagonal() = coeffs();
+ res.translation() = m_coeffs.cwise() * t.vector();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+template<typename Scalar, int Dim>
+inline typename Scaling<Scalar,Dim>::TransformType
+Scaling<Scalar,Dim>::operator* (const TransformType& t) const
+{
+ TransformType res = t;
+ res.prescale(m_coeffs);
+ return res;
+}
+
+} // end namespace Eigen
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Transform.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Transform.h
new file mode 100644
index 0000000000..fab60b251d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Transform.h
@@ -0,0 +1,786 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+// Note that we have to pass Dim and HDim because it is not allowed to use a template
+// parameter to define a template specialization. To be more precise, in the following
+// specializations, it is not allowed to use Dim+1 instead of HDim.
+template< typename Other,
+ int Dim,
+ int HDim,
+ int OtherRows=Other::RowsAtCompileTime,
+ int OtherCols=Other::ColsAtCompileTime>
+struct ei_transform_product_impl;
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Transform
+ *
+ * \brief Represents a homogeneous transformation in an N-dimensional space
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ * \param _Dim the dimension of the space
+ *
+ * The homography is internally represented and stored as a (Dim+1)^2 matrix which
+ * is available through the matrix() method.
+ *
+ * Conversion methods from/to Qt's QMatrix and QTransform are available if the
+ * preprocessor token EIGEN_QT_SUPPORT is defined.
+ *
+ * \sa class Matrix, class Quaternion
+ */
+template<typename _Scalar, int _Dim>
+class Transform
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim==Dynamic ? Dynamic : (_Dim+1)*(_Dim+1))
+ enum {
+ Dim = _Dim, ///< space dimension in which the transformation holds
+ HDim = _Dim+1 ///< size of a respective homogeneous vector
+ };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ /** type of the matrix used to represent the transformation */
+ typedef Matrix<Scalar,HDim,HDim> MatrixType;
+ /** type of the matrix used to represent the linear part of the transformation */
+ typedef Matrix<Scalar,Dim,Dim> LinearMatrixType;
+ /** type of read/write reference to the linear part of the transformation */
+ typedef Block<MatrixType,Dim,Dim> LinearPart;
+ /** type of read/write reference to the linear part of the transformation */
+ typedef const Block<const MatrixType,Dim,Dim> ConstLinearPart;
+ /** type of a vector */
+ typedef Matrix<Scalar,Dim,1> VectorType;
+ /** type of a read/write reference to the translation part of the rotation */
+ typedef Block<MatrixType,Dim,1> TranslationPart;
+ /** type of a read/write reference to the translation part of the rotation */
+ typedef const Block<const MatrixType,Dim,1> ConstTranslationPart;
+ /** corresponding translation type */
+ typedef Translation<Scalar,Dim> TranslationType;
+ /** corresponding scaling transformation type */
+ typedef Scaling<Scalar,Dim> ScalingType;
+
+protected:
+
+ MatrixType m_matrix;
+
+public:
+
+ /** Default constructor without initialization of the coefficients. */
+ inline Transform() { }
+
+ inline Transform(const Transform& other)
+ {
+ m_matrix = other.m_matrix;
+ }
+
+ inline explicit Transform(const TranslationType& t) { *this = t; }
+ inline explicit Transform(const ScalingType& s) { *this = s; }
+ template<typename Derived>
+ inline explicit Transform(const RotationBase<Derived, Dim>& r) { *this = r; }
+
+ inline Transform& operator=(const Transform& other)
+ { m_matrix = other.m_matrix; return *this; }
+
+ template<typename OtherDerived, bool BigMatrix> // MSVC 2005 will commit suicide if BigMatrix has a default value
+ struct construct_from_matrix
+ {
+ static inline void run(Transform *transform, const MatrixBase<OtherDerived>& other)
+ {
+ transform->matrix() = other;
+ }
+ };
+
+ template<typename OtherDerived> struct construct_from_matrix<OtherDerived, true>
+ {
+ static inline void run(Transform *transform, const MatrixBase<OtherDerived>& other)
+ {
+ transform->linear() = other;
+ transform->translation().setZero();
+ transform->matrix()(Dim,Dim) = Scalar(1);
+ transform->matrix().template block<1,Dim>(Dim,0).setZero();
+ }
+ };
+
+ /** Constructs and initializes a transformation from a Dim^2 or a (Dim+1)^2 matrix. */
+ template<typename OtherDerived>
+ inline explicit Transform(const MatrixBase<OtherDerived>& other)
+ {
+ construct_from_matrix<OtherDerived, int(OtherDerived::RowsAtCompileTime) == Dim>::run(this, other);
+ }
+
+ /** Set \c *this from a (Dim+1)^2 matrix. */
+ template<typename OtherDerived>
+ inline Transform& operator=(const MatrixBase<OtherDerived>& other)
+ { m_matrix = other; return *this; }
+
+ #ifdef EIGEN_QT_SUPPORT
+ inline Transform(const QMatrix& other);
+ inline Transform& operator=(const QMatrix& other);
+ inline QMatrix toQMatrix(void) const;
+ inline Transform(const QTransform& other);
+ inline Transform& operator=(const QTransform& other);
+ inline QTransform toQTransform(void) const;
+ #endif
+
+ /** shortcut for m_matrix(row,col);
+ * \sa MatrixBase::operator()(int,int) const */
+ inline Scalar operator() (int row, int col) const { return m_matrix(row,col); }
+ /** shortcut for m_matrix(row,col);
+ * \sa MatrixBase::operator()(int,int) */
+ inline Scalar& operator() (int row, int col) { return m_matrix(row,col); }
+
+ /** \returns a read-only expression of the transformation matrix */
+ inline const MatrixType& matrix() const { return m_matrix; }
+ /** \returns a writable expression of the transformation matrix */
+ inline MatrixType& matrix() { return m_matrix; }
+
+ /** \returns a read-only expression of the linear part of the transformation */
+ inline ConstLinearPart linear() const { return m_matrix.template block<Dim,Dim>(0,0); }
+ /** \returns a writable expression of the linear part of the transformation */
+ inline LinearPart linear() { return m_matrix.template block<Dim,Dim>(0,0); }
+
+ /** \returns a read-only expression of the translation vector of the transformation */
+ inline ConstTranslationPart translation() const { return m_matrix.template block<Dim,1>(0,Dim); }
+ /** \returns a writable expression of the translation vector of the transformation */
+ inline TranslationPart translation() { return m_matrix.template block<Dim,1>(0,Dim); }
+
+ /** \returns an expression of the product between the transform \c *this and a matrix expression \a other
+ *
+ * The right hand side \a other might be either:
+ * \li a vector of size Dim,
+ * \li a homogeneous vector of size Dim+1,
+ * \li a transformation matrix of size Dim+1 x Dim+1.
+ */
+ // note: this function is defined here because some compilers cannot find the respective declaration
+ template<typename OtherDerived>
+ inline const typename ei_transform_product_impl<OtherDerived,_Dim,_Dim+1>::ResultType
+ operator * (const MatrixBase<OtherDerived> &other) const
+ { return ei_transform_product_impl<OtherDerived,Dim,HDim>::run(*this,other.derived()); }
+
+ /** \returns the product expression of a transformation matrix \a a times a transform \a b.
+ * The transformation matrix \a a must be of size (Dim+1) x (Dim+1). */
+ template<typename OtherDerived>
+ friend inline const typename ProductReturnType<OtherDerived,MatrixType>::Type
+ operator * (const MatrixBase<OtherDerived> &a, const Transform &b)
+ { return a.derived() * b.matrix(); }
+
+ /** Concatenates two transformations */
+ inline const Transform
+ operator * (const Transform& other) const
+ { return Transform(m_matrix * other.matrix()); }
+
+ /** \sa MatrixBase::setIdentity() */
+ void setIdentity() { m_matrix.setIdentity(); }
+ static const typename MatrixType::IdentityReturnType Identity()
+ {
+ return MatrixType::Identity();
+ }
+
+ template<typename OtherDerived>
+ inline Transform& scale(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ inline Transform& prescale(const MatrixBase<OtherDerived> &other);
+
+ inline Transform& scale(Scalar s);
+ inline Transform& prescale(Scalar s);
+
+ template<typename OtherDerived>
+ inline Transform& translate(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ inline Transform& pretranslate(const MatrixBase<OtherDerived> &other);
+
+ template<typename RotationType>
+ inline Transform& rotate(const RotationType& rotation);
+
+ template<typename RotationType>
+ inline Transform& prerotate(const RotationType& rotation);
+
+ Transform& shear(Scalar sx, Scalar sy);
+ Transform& preshear(Scalar sx, Scalar sy);
+
+ inline Transform& operator=(const TranslationType& t);
+ inline Transform& operator*=(const TranslationType& t) { return translate(t.vector()); }
+ inline Transform operator*(const TranslationType& t) const;
+
+ inline Transform& operator=(const ScalingType& t);
+ inline Transform& operator*=(const ScalingType& s) { return scale(s.coeffs()); }
+ inline Transform operator*(const ScalingType& s) const;
+ friend inline Transform operator*(const LinearMatrixType& mat, const Transform& t)
+ {
+ Transform res = t;
+ res.matrix().row(Dim) = t.matrix().row(Dim);
+ res.matrix().template block<Dim,HDim>(0,0) = (mat * t.matrix().template block<Dim,HDim>(0,0)).lazy();
+ return res;
+ }
+
+ template<typename Derived>
+ inline Transform& operator=(const RotationBase<Derived,Dim>& r);
+ template<typename Derived>
+ inline Transform& operator*=(const RotationBase<Derived,Dim>& r) { return rotate(r.toRotationMatrix()); }
+ template<typename Derived>
+ inline Transform operator*(const RotationBase<Derived,Dim>& r) const;
+
+ LinearMatrixType rotation() const;
+ template<typename RotationMatrixType, typename ScalingMatrixType>
+ void computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const;
+ template<typename ScalingMatrixType, typename RotationMatrixType>
+ void computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const;
+
+ template<typename PositionDerived, typename OrientationType, typename ScaleDerived>
+ Transform& fromPositionOrientationScale(const MatrixBase<PositionDerived> &position,
+ const OrientationType& orientation, const MatrixBase<ScaleDerived> &scale);
+
+ inline const MatrixType inverse(TransformTraits traits = Affine) const;
+
+ /** \returns a const pointer to the column major internal matrix */
+ const Scalar* data() const { return m_matrix.data(); }
+ /** \returns a non-const pointer to the column major internal matrix */
+ Scalar* data() { return m_matrix.data(); }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Transform,Transform<NewScalarType,Dim> >::type cast() const
+ { return typename internal::cast_return_type<Transform,Transform<NewScalarType,Dim> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Transform(const Transform<OtherScalarType,Dim>& other)
+ { m_matrix = other.matrix().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Transform& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_matrix.isApprox(other.m_matrix, prec); }
+
+ #ifdef EIGEN_TRANSFORM_PLUGIN
+ #include EIGEN_TRANSFORM_PLUGIN
+ #endif
+
+protected:
+
+};
+
+/** \ingroup Geometry_Module */
+typedef Transform<float,2> Transform2f;
+/** \ingroup Geometry_Module */
+typedef Transform<float,3> Transform3f;
+/** \ingroup Geometry_Module */
+typedef Transform<double,2> Transform2d;
+/** \ingroup Geometry_Module */
+typedef Transform<double,3> Transform3d;
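+
+// A minimal usage sketch of the fluent API below (illustrative only; the values are arbitrary,
+// and AngleAxis is assumed to be available from this module):
+//   Eigen::Transform3f t;
+//   t.setIdentity();
+//   t.translate(Eigen::Vector3f(1.f, 2.f, 3.f));
+//   t.rotate(Eigen::AngleAxisf(1.57f, Eigen::Vector3f::UnitZ()));
+//   t.scale(2.f);
+//   Eigen::Vector3f p = t * Eigen::Vector3f::UnitX();  // scales, rotates, then translates the point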
+
+/**************************
+*** Optional QT support ***
+**************************/
+
+#ifdef EIGEN_QT_SUPPORT
+/** Initialises \c *this from a QMatrix assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>::Transform(const QMatrix& other)
+{
+ *this = other;
+}
+
+/** Set \c *this from a QMatrix assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>& Transform<Scalar,Dim>::operator=(const QMatrix& other)
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy(),
+ 0, 0, 1;
+ return *this;
+}
+
+/** \returns a QMatrix from \c *this assuming the dimension is 2.
+ *
+ * \warning this conversion might lose data if \c *this is not affine
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+QMatrix Transform<Scalar,Dim>::toQMatrix(void) const
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return QMatrix(m_matrix.coeff(0,0), m_matrix.coeff(1,0),
+ m_matrix.coeff(0,1), m_matrix.coeff(1,1),
+ m_matrix.coeff(0,2), m_matrix.coeff(1,2));
+}
+
+/** Initialises \c *this from a QTransform assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>::Transform(const QTransform& other)
+{
+ *this = other;
+}
+
+/** Set \c *this from a QTransform assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>& Transform<Scalar,Dim>::operator=(const QTransform& other)
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy(),
+ other.m13(), other.m23(), other.m33();
+ return *this;
+}
+
+/** \returns a QTransform from \c *this assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim>
+QTransform Transform<Scalar,Dim>::toQTransform(void) const
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return QTransform(m_matrix.coeff(0,0), m_matrix.coeff(1,0), m_matrix.coeff(2,0),
+ m_matrix.coeff(0,1), m_matrix.coeff(1,1), m_matrix.coeff(2,1),
+ m_matrix.coeff(0,2), m_matrix.coeff(1,2), m_matrix.coeff(2,2));
+}
+#endif
+
+/*********************
+*** Procedural API ***
+*********************/
+
+/** Applies on the right the non-uniform scale transformation represented
+ * by the vector \a other to \c *this and returns a reference to \c *this.
+ * \sa prescale()
+ */
+template<typename Scalar, int Dim>
+template<typename OtherDerived>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::scale(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ linear() = (linear() * other.asDiagonal()).lazy();
+ return *this;
+}
+
+/** Applies on the right a uniform scale of a factor \a s to \c *this
+ * and returns a reference to \c *this.
+ * \sa prescale(Scalar)
+ */
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim>& Transform<Scalar,Dim>::scale(Scalar s)
+{
+ linear() *= s;
+ return *this;
+}
+
+/** Applies on the left the non-uniform scale transformation represented
+ * by the vector \a other to \c *this and returns a reference to \c *this.
+ * \sa scale()
+ */
+template<typename Scalar, int Dim>
+template<typename OtherDerived>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::prescale(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ m_matrix.template block<Dim,HDim>(0,0) = (other.asDiagonal() * m_matrix.template block<Dim,HDim>(0,0)).lazy();
+ return *this;
+}
+
+/** Applies on the left a uniform scale of a factor \a s to \c *this
+ * and returns a reference to \c *this.
+ * \sa scale(Scalar)
+ */
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim>& Transform<Scalar,Dim>::prescale(Scalar s)
+{
+ m_matrix.template corner<Dim,HDim>(TopLeft) *= s;
+ return *this;
+}
+
+/** Applies on the right the translation matrix represented by the vector \a other
+ * to \c *this and returns a reference to \c *this.
+ * \sa pretranslate()
+ */
+template<typename Scalar, int Dim>
+template<typename OtherDerived>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::translate(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ translation() += linear() * other;
+ return *this;
+}
+
+/** Applies on the left the translation matrix represented by the vector \a other
+ * to \c *this and returns a reference to \c *this.
+ * \sa translate()
+ */
+template<typename Scalar, int Dim>
+template<typename OtherDerived>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::pretranslate(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ translation() += other;
+ return *this;
+}
+
+/** Applies on the right the rotation represented by the rotation \a rotation
+ * to \c *this and returns a reference to \c *this.
+ *
+ * The template parameter \a RotationType is the type of the rotation which
+ * must be known by ei_toRotationMatrix<>.
+ *
+ * Natively supported types include:
+ * - any scalar (2D),
+ * - a Dim x Dim matrix expression,
+ * - a Quaternion (3D),
+ * - an AngleAxis (3D)
+ *
+ * This mechanism is easily extendable to support user types such as Euler angles,
+ * or a pair of Quaternion for 4D rotations.
+ *
+ * \sa rotate(Scalar), class Quaternion, class AngleAxis, prerotate(RotationType)
+ */
+template<typename Scalar, int Dim>
+template<typename RotationType>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::rotate(const RotationType& rotation)
+{
+ linear() *= ei_toRotationMatrix<Scalar,Dim>(rotation);
+ return *this;
+}
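+
+// The difference with prerotate() below, for some transform t and rotation r of a supported type
+// (illustrative sketch):
+//   t.rotate(r);     // t = t * r : r is applied first, in the local frame of t
+//   t.prerotate(r);  // t = r * t : r is applied last, in the world frame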
+
+/** Applies on the left the rotation represented by the rotation \a rotation
+ * to \c *this and returns a reference to \c *this.
+ *
+ * See rotate() for further details.
+ *
+ * \sa rotate()
+ */
+template<typename Scalar, int Dim>
+template<typename RotationType>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::prerotate(const RotationType& rotation)
+{
+ m_matrix.template block<Dim,HDim>(0,0) = ei_toRotationMatrix<Scalar,Dim>(rotation)
+ * m_matrix.template block<Dim,HDim>(0,0);
+ return *this;
+}
+
+/** Applies on the right the shear transformation represented
+ * by the factors \a sx and \a sy to \c *this and returns a reference to \c *this.
+ * \warning 2D only.
+ * \sa preshear()
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::shear(Scalar sx, Scalar sy)
+{
+ EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ VectorType tmp = linear().col(0)*sy + linear().col(1);
+ linear() << linear().col(0) + linear().col(1)*sx, tmp;
+ return *this;
+}
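+
+// Illustrative sketch: shear(sx, sy) right-multiplies the linear part by [1 sy; sx 1], e.g.
+//   Eigen::Transform2f t; t.setIdentity();
+//   t.shear(0.5f, 0.f);   // for the identity this maps (x, y) to (x, y + 0.5*x)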
+
+/** Applies on the left the shear transformation represented
+ * by the factors \a sx and \a sy to \c *this and returns a reference to \c *this.
+ * \warning 2D only.
+ * \sa shear()
+ */
+template<typename Scalar, int Dim>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::preshear(Scalar sx, Scalar sy)
+{
+ EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ m_matrix.template block<Dim,HDim>(0,0) = LinearMatrixType(1, sx, sy, 1) * m_matrix.template block<Dim,HDim>(0,0);
+ return *this;
+}
+
+/******************************************************
+*** Scaling, Translation and Rotation compatibility ***
+******************************************************/
+
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim>& Transform<Scalar,Dim>::operator=(const TranslationType& t)
+{
+ linear().setIdentity();
+ translation() = t.vector();
+ m_matrix.template block<1,Dim>(Dim,0).setZero();
+ m_matrix(Dim,Dim) = Scalar(1);
+ return *this;
+}
+
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim> Transform<Scalar,Dim>::operator*(const TranslationType& t) const
+{
+ Transform res = *this;
+ res.translate(t.vector());
+ return res;
+}
+
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim>& Transform<Scalar,Dim>::operator=(const ScalingType& s)
+{
+ m_matrix.setZero();
+ linear().diagonal() = s.coeffs();
+ m_matrix.coeffRef(Dim,Dim) = Scalar(1);
+ return *this;
+}
+
+template<typename Scalar, int Dim>
+inline Transform<Scalar,Dim> Transform<Scalar,Dim>::operator*(const ScalingType& s) const
+{
+ Transform res = *this;
+ res.scale(s.coeffs());
+ return res;
+}
+
+template<typename Scalar, int Dim>
+template<typename Derived>
+inline Transform<Scalar,Dim>& Transform<Scalar,Dim>::operator=(const RotationBase<Derived,Dim>& r)
+{
+ linear() = ei_toRotationMatrix<Scalar,Dim>(r);
+ translation().setZero();
+ m_matrix.template block<1,Dim>(Dim,0).setZero();
+ m_matrix.coeffRef(Dim,Dim) = Scalar(1);
+ return *this;
+}
+
+template<typename Scalar, int Dim>
+template<typename Derived>
+inline Transform<Scalar,Dim> Transform<Scalar,Dim>::operator*(const RotationBase<Derived,Dim>& r) const
+{
+ Transform res = *this;
+ res.rotate(r.derived());
+ return res;
+}
+
+/************************
+*** Special functions ***
+************************/
+
+/** \returns the rotation part of the transformation
+ * \nonstableyet
+ *
+ * \svd_module
+ *
+ * \sa computeRotationScaling(), computeScalingRotation(), class SVD
+ */
+template<typename Scalar, int Dim>
+typename Transform<Scalar,Dim>::LinearMatrixType
+Transform<Scalar,Dim>::rotation() const
+{
+ LinearMatrixType result;
+ computeRotationScaling(&result, (LinearMatrixType*)0);
+ return result;
+}
+
+
+/** decomposes the linear part of the transformation as a product rotation x scaling, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * \nonstableyet
+ *
+ * \svd_module
+ *
+ * \sa computeScalingRotation(), rotation(), class SVD
+ */
+template<typename Scalar, int Dim>
+template<typename RotationMatrixType, typename ScalingMatrixType>
+void Transform<Scalar,Dim>::computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const
+{
+ JacobiSVD<LinearMatrixType> svd(linear(), ComputeFullU|ComputeFullV);
+ Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant(); // so x has absolute value 1
+ Matrix<Scalar, Dim, 1> sv(svd.singularValues());
+ sv.coeffRef(0) *= x;
+ if(scaling)
+ {
+ scaling->noalias() = svd.matrixV() * sv.asDiagonal() * svd.matrixV().adjoint();
+ }
+ if(rotation)
+ {
+ LinearMatrixType m(svd.matrixU());
+ m.col(0) /= x;
+ rotation->noalias() = m * svd.matrixV().adjoint();
+ }
+}
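+
+// Usage sketch (illustrative; t is assumed to be an affine Transform3f):
+//   Eigen::Matrix3f R, S;
+//   t.computeRotationScaling(&R, &S);                    // t.linear() ~= R * S, with R a rotation matrix
+//   t.computeRotationScaling(&R, (Eigen::Matrix3f*)0);   // pass a null pointer to skip the scaling part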
+
+/** decomposes the linear part of the transformation as a product scaling x rotation, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * \nonstableyet
+ *
+ * \svd_module
+ *
+ * \sa computeRotationScaling(), rotation(), class SVD
+ */
+template<typename Scalar, int Dim>
+template<typename ScalingMatrixType, typename RotationMatrixType>
+void Transform<Scalar,Dim>::computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const
+{
+ JacobiSVD<LinearMatrixType> svd(linear(), ComputeFullU|ComputeFullV);
+ Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant(); // so x has absolute value 1
+ Matrix<Scalar, Dim, 1> sv(svd.singularValues());
+ sv.coeffRef(0) *= x;
+ if(scaling)
+ {
+ scaling->noalias() = svd.matrixU() * sv.asDiagonal() * svd.matrixU().adjoint();
+ }
+ if(rotation)
+ {
+ LinearMatrixType m(svd.matrixU());
+ m.col(0) /= x;
+ rotation->noalias() = m * svd.matrixV().adjoint();
+ }
+}
+
+/** Convenient method to set \c *this from a position, orientation and scale
+ * of a 3D object.
+ */
+template<typename Scalar, int Dim>
+template<typename PositionDerived, typename OrientationType, typename ScaleDerived>
+Transform<Scalar,Dim>&
+Transform<Scalar,Dim>::fromPositionOrientationScale(const MatrixBase<PositionDerived> &position,
+ const OrientationType& orientation, const MatrixBase<ScaleDerived> &scale)
+{
+ linear() = ei_toRotationMatrix<Scalar,Dim>(orientation);
+ linear() *= scale.asDiagonal();
+ translation() = position;
+ m_matrix.template block<1,Dim>(Dim,0).setZero();
+ m_matrix(Dim,Dim) = Scalar(1);
+ return *this;
+}
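+
+// Usage sketch (illustrative values; AngleAxis is assumed to be available from this module):
+//   Eigen::Transform3f t;
+//   t.fromPositionOrientationScale(Eigen::Vector3f(1.f, 2.f, 3.f),                   // position
+//                                  Eigen::AngleAxisf(0.5f, Eigen::Vector3f::UnitY()), // orientation
+//                                  Eigen::Vector3f::Constant(2.f));                   // per-axis scale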
+
+/** \nonstableyet
+ *
+ * \returns the inverse transformation matrix according to some given knowledge
+ * on \c *this.
+ *
+ * \param traits allows optimizing the inversion process when the transformation
+ * is known not to be a general transformation. The possible values are:
+ * - Projective if the transformation is not necessarily affine, i.e., if the
+ * last row is not guaranteed to be [0 ... 0 1]
+ * - Affine is the default, the last row is assumed to be [0 ... 0 1]
+ * - Isometry if the transformation is only a concatenation of translations
+ * and rotations.
+ *
+ * \warning unless \a traits is set to Isometry, this function
+ * requires the generic inverse method of MatrixBase defined in the LU module. If
+ * you forget to include this module, then you will get hard to debug linking errors.
+ *
+ * \sa MatrixBase::inverse()
+ */
+template<typename Scalar, int Dim>
+inline const typename Transform<Scalar,Dim>::MatrixType
+Transform<Scalar,Dim>::inverse(TransformTraits traits) const
+{
+ if (traits == Projective)
+ {
+ return m_matrix.inverse();
+ }
+ else
+ {
+ MatrixType res;
+ if (traits == Affine)
+ {
+ res.template corner<Dim,Dim>(TopLeft) = linear().inverse();
+ }
+ else if (traits == Isometry)
+ {
+ res.template corner<Dim,Dim>(TopLeft) = linear().transpose();
+ }
+ else
+ {
+ ei_assert(false && "invalid traits value in Transform::inverse()");
+ }
+ // translation and remaining parts
+ res.template corner<Dim,1>(TopRight) = - res.template corner<Dim,Dim>(TopLeft) * translation();
+ res.template corner<1,Dim>(BottomLeft).setZero();
+ res.coeffRef(Dim,Dim) = Scalar(1);
+ return res;
+ }
+}
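+
+// Usage sketch (illustrative; t is assumed to be a Transform3f):
+//   Eigen::Matrix4f inv_affine   = t.inverse();                  // default: last row assumed [0 0 0 1]
+//   Eigen::Matrix4f inv_isometry = t.inverse(Eigen::Isometry);   // rotation + translation only
+//   Eigen::Matrix4f inv_general  = t.inverse(Eigen::Projective); // full (Dim+1)x(Dim+1) inverse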
+
+/*****************************************************
+*** Specializations of operator* with a MatrixBase ***
+*****************************************************/
+
+template<typename Other, int Dim, int HDim>
+struct ei_transform_product_impl<Other,Dim,HDim, HDim,HDim>
+{
+ typedef Transform<typename Other::Scalar,Dim> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef typename ProductReturnType<MatrixType,Other>::Type ResultType;
+ static ResultType run(const TransformType& tr, const Other& other)
+ { return tr.matrix() * other; }
+};
+
+template<typename Other, int Dim, int HDim>
+struct ei_transform_product_impl<Other,Dim,HDim, Dim,Dim>
+{
+ typedef Transform<typename Other::Scalar,Dim> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef TransformType ResultType;
+ static ResultType run(const TransformType& tr, const Other& other)
+ {
+ TransformType res;
+ res.translation() = tr.translation();
+ res.matrix().row(Dim) = tr.matrix().row(Dim);
+ res.linear() = (tr.linear() * other).lazy();
+ return res;
+ }
+};
+
+template<typename Other, int Dim, int HDim>
+struct ei_transform_product_impl<Other,Dim,HDim, HDim,1>
+{
+ typedef Transform<typename Other::Scalar,Dim> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef typename ProductReturnType<MatrixType,Other>::Type ResultType;
+ static ResultType run(const TransformType& tr, const Other& other)
+ { return tr.matrix() * other; }
+};
+
+template<typename Other, int Dim, int HDim>
+struct ei_transform_product_impl<Other,Dim,HDim, Dim,1>
+{
+ typedef typename Other::Scalar Scalar;
+ typedef Transform<Scalar,Dim> TransformType;
+ typedef Matrix<Scalar,Dim,1> ResultType;
+ static ResultType run(const TransformType& tr, const Other& other)
+ { return ((tr.linear() * other) + tr.translation())
+ * (Scalar(1) / ( (tr.matrix().template block<1,Dim>(Dim,0) * other).coeff(0) + tr.matrix().coeff(Dim,Dim))); }
+};
+
+} // end namespace Eigen
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Translation.h b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Translation.h
new file mode 100644
index 0000000000..2b9859f6f4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Geometry/Translation.h
@@ -0,0 +1,184 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// no include guard, we'll include this twice from All.h from Eigen2Support, and it's internal anyway
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Translation
+ *
+ * \brief Represents a translation transformation
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ * \param _Dim the dimension of the space, can be a compile time value or Dynamic
+ *
+ * \note This class is not aimed at storing a translation transformation,
+ * but rather at making the construction and update of Transform objects easier.
+ *
+ * \sa class Scaling, class Transform
+ */
+template<typename _Scalar, int _Dim>
+class Translation
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim)
+ /** dimension of the space */
+ enum { Dim = _Dim };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ /** corresponding vector type */
+ typedef Matrix<Scalar,Dim,1> VectorType;
+ /** corresponding linear transformation matrix type */
+ typedef Matrix<Scalar,Dim,Dim> LinearMatrixType;
+ /** corresponding scaling transformation type */
+ typedef Scaling<Scalar,Dim> ScalingType;
+ /** corresponding affine transformation type */
+ typedef Transform<Scalar,Dim> TransformType;
+
+protected:
+
+ VectorType m_coeffs;
+
+public:
+
+ /** Default constructor without initialization. */
+ Translation() {}
+ /** 2D only */
+ inline Translation(const Scalar& sx, const Scalar& sy)
+ {
+ ei_assert(Dim==2);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ }
+ /** 3D only */
+ inline Translation(const Scalar& sx, const Scalar& sy, const Scalar& sz)
+ {
+ ei_assert(Dim==3);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ m_coeffs.z() = sz;
+ }
+ /** Constructs and initializes the translation transformation from a vector of translation coefficients */
+ explicit inline Translation(const VectorType& vector) : m_coeffs(vector) {}
+
+ const VectorType& vector() const { return m_coeffs; }
+ VectorType& vector() { return m_coeffs; }
+
+ /** Concatenates two translations */
+ inline Translation operator* (const Translation& other) const
+ { return Translation(m_coeffs + other.m_coeffs); }
+
+ /** Concatenates a translation and a scaling */
+ inline TransformType operator* (const ScalingType& other) const;
+
+ /** Concatenates a translation and a linear transformation */
+ inline TransformType operator* (const LinearMatrixType& linear) const;
+
+ template<typename Derived>
+ inline TransformType operator*(const RotationBase<Derived,Dim>& r) const
+ { return *this * r.toRotationMatrix(); }
+
+ /** Concatenates a linear transformation and a translation */
+ // it's a nightmare to define a templated friend function outside its declaration
+ friend inline TransformType operator* (const LinearMatrixType& linear, const Translation& t)
+ {
+ TransformType res;
+ res.matrix().setZero();
+ res.linear() = linear;
+ res.translation() = linear * t.m_coeffs;
+ res.matrix().row(Dim).setZero();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+ }
+
+ /** Concatenates a translation and an affine transformation */
+ inline TransformType operator* (const TransformType& t) const;
+
+ /** Applies translation to vector */
+ inline VectorType operator* (const VectorType& other) const
+ { return m_coeffs + other; }
+
+ /** \returns the inverse translation (opposite) */
+ Translation inverse() const { return Translation(-m_coeffs); }
+
+ Translation& operator=(const Translation& other)
+ {
+ m_coeffs = other.m_coeffs;
+ return *this;
+ }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Translation,Translation<NewScalarType,Dim> >::type cast() const
+ { return typename internal::cast_return_type<Translation,Translation<NewScalarType,Dim> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Translation(const Translation<OtherScalarType,Dim>& other)
+ { m_coeffs = other.vector().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Translation& other, typename NumTraits<Scalar>::Real prec = precision<Scalar>()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+};
+
+/** \addtogroup Geometry_Module */
+//@{
+typedef Translation<float, 2> Translation2f;
+typedef Translation<double,2> Translation2d;
+typedef Translation<float, 3> Translation3f;
+typedef Translation<double,3> Translation3d;
+//@}
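+
+// A minimal usage sketch (illustrative only):
+//   Eigen::Translation3f tr(1.f, 2.f, 3.f);
+//   Eigen::Vector3f p = tr * Eigen::Vector3f::Zero();   // yields (1, 2, 3)
+//   Eigen::Transform3f t = tr * Eigen::Scaling3f(2.f);  // scales first, then translates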
+
+
+template<typename Scalar, int Dim>
+inline typename Translation<Scalar,Dim>::TransformType
+Translation<Scalar,Dim>::operator* (const ScalingType& other) const
+{
+ TransformType res;
+ res.matrix().setZero();
+ res.linear().diagonal() = other.coeffs();
+ res.translation() = m_coeffs;
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+template<typename Scalar, int Dim>
+inline typename Translation<Scalar,Dim>::TransformType
+Translation<Scalar,Dim>::operator* (const LinearMatrixType& linear) const
+{
+ TransformType res;
+ res.matrix().setZero();
+ res.linear() = linear;
+ res.translation() = m_coeffs;
+ res.matrix().row(Dim).setZero();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+template<typename Scalar, int Dim>
+inline typename Translation<Scalar,Dim>::TransformType
+Translation<Scalar,Dim>::operator* (const TransformType& t) const
+{
+ TransformType res = t;
+ res.pretranslate(m_coeffs);
+ return res;
+}
+
+} // end namespace Eigen
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/LU.h b/third_party/eigen3/Eigen/src/Eigen2Support/LU.h
new file mode 100644
index 0000000000..49f19ad76e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/LU.h
@@ -0,0 +1,120 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_LU_H
+#define EIGEN2_LU_H
+
+namespace Eigen {
+
+template<typename MatrixType>
+class LU : public FullPivLU<MatrixType>
+{
+ public:
+
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef Matrix<int, 1, MatrixType::ColsAtCompileTime, MatrixType::Options, 1, MatrixType::MaxColsAtCompileTime> IntRowVectorType;
+ typedef Matrix<int, MatrixType::RowsAtCompileTime, 1, MatrixType::Options, MatrixType::MaxRowsAtCompileTime, 1> IntColVectorType;
+ typedef Matrix<Scalar, 1, MatrixType::ColsAtCompileTime, MatrixType::Options, 1, MatrixType::MaxColsAtCompileTime> RowVectorType;
+ typedef Matrix<Scalar, MatrixType::RowsAtCompileTime, 1, MatrixType::Options, MatrixType::MaxRowsAtCompileTime, 1> ColVectorType;
+
+ typedef Matrix<typename MatrixType::Scalar,
+ MatrixType::ColsAtCompileTime, // the number of rows in the "kernel matrix" is the number of cols of the original matrix
+ // so that the product "matrix * kernel = zero" makes sense
+ Dynamic, // we don't know at compile-time the dimension of the kernel
+ MatrixType::Options,
+ MatrixType::MaxColsAtCompileTime, // see explanation for 2nd template parameter
+ MatrixType::MaxColsAtCompileTime // the kernel is a subspace of the domain space, whose dimension is the number
+ // of columns of the original matrix
+ > KernelResultType;
+
+ typedef Matrix<typename MatrixType::Scalar,
+ MatrixType::RowsAtCompileTime, // the image is a subspace of the destination space, whose dimension is the number
+ // of rows of the original matrix
+ Dynamic, // we don't know at compile time the dimension of the image (the rank)
+ MatrixType::Options,
+ MatrixType::MaxRowsAtCompileTime, // the image matrix will consist of columns from the original matrix,
+ MatrixType::MaxColsAtCompileTime // so it has the same number of rows and at most as many columns.
+ > ImageResultType;
+
+ typedef FullPivLU<MatrixType> Base;
+
+ template<typename T>
+ explicit LU(const T& t) : Base(t), m_originalMatrix(t) {}
+
+ template<typename OtherDerived, typename ResultType>
+ bool solve(const MatrixBase<OtherDerived>& b, ResultType *result) const
+ {
+ *result = static_cast<const Base*>(this)->solve(b);
+ return true;
+ }
+
+ template<typename ResultType>
+ inline void computeInverse(ResultType *result) const
+ {
+ solve(MatrixType::Identity(this->rows(), this->cols()), result);
+ }
+
+ template<typename KernelMatrixType>
+ void computeKernel(KernelMatrixType *result) const
+ {
+ *result = static_cast<const Base*>(this)->kernel();
+ }
+
+ template<typename ImageMatrixType>
+ void computeImage(ImageMatrixType *result) const
+ {
+ *result = static_cast<const Base*>(this)->image(m_originalMatrix);
+ }
+
+ const ImageResultType image() const
+ {
+ return static_cast<const Base*>(this)->image(m_originalMatrix);
+ }
+
+ const MatrixType& m_originalMatrix;
+};
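+
+// A minimal usage sketch of this Eigen2-style wrapper (illustrative only):
+//   Eigen::Matrix3f A; A.setRandom();
+//   Eigen::LU<Eigen::Matrix3f> lu(A);
+//   Eigen::Vector3f b(1.f, 2.f, 3.f), x;
+//   lu.solve(b, &x);                                 // solves A x = b
+//   Eigen::Matrix3f Ainv; lu.computeInverse(&Ainv);  // Ainv = A^-1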
+
+#if EIGEN2_SUPPORT_STAGE < STAGE20_RESOLVE_API_CONFLICTS
+/** \lu_module
+ *
+ * Synonym of partialPivLu().
+ *
+ * \return the partial-pivoting LU decomposition of \c *this.
+ *
+ * \sa class PartialPivLU
+ */
+template<typename Derived>
+inline const LU<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::lu() const
+{
+ return LU<PlainObject>(eval());
+}
+#endif
+
+#ifdef EIGEN2_SUPPORT
+/** \lu_module
+ *
+ * Synonym of partialPivLu().
+ *
+ * \return the partial-pivoting LU decomposition of \c *this.
+ *
+ * \sa class PartialPivLU
+ */
+template<typename Derived>
+inline const LU<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::eigen2_lu() const
+{
+ return LU<PlainObject>(eval());
+}
+#endif
+
+} // end namespace Eigen
+
+#endif // EIGEN2_LU_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Lazy.h b/third_party/eigen3/Eigen/src/Eigen2Support/Lazy.h
new file mode 100644
index 0000000000..593fc78e6d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Lazy.h
@@ -0,0 +1,71 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_LAZY_H
+#define EIGEN_LAZY_H
+
+namespace Eigen {
+
+/** \deprecated it is only used by lazy() which is deprecated
+ *
+ * \returns an expression of *this with added flags
+ *
+ * Example: \include MatrixBase_marked.cpp
+ * Output: \verbinclude MatrixBase_marked.out
+ *
+ * \sa class Flagged, extract(), part()
+ */
+template<typename Derived>
+template<unsigned int Added>
+inline const Flagged<Derived, Added, 0>
+MatrixBase<Derived>::marked() const
+{
+ return derived();
+}
+
+/** \deprecated use MatrixBase::noalias()
+ *
+ * \returns an expression of *this with the EvalBeforeAssigningBit flag removed.
+ *
+ * Example: \include MatrixBase_lazy.cpp
+ * Output: \verbinclude MatrixBase_lazy.out
+ *
+ * \sa class Flagged, marked()
+ */
+template<typename Derived>
+inline const Flagged<Derived, 0, EvalBeforeAssigningBit>
+MatrixBase<Derived>::lazy() const
+{
+ return derived();
+}
+
+
+/** \internal
+ * Overloaded to perform an efficient C += (A*B).lazy() */
+template<typename Derived>
+template<typename ProductDerived, typename Lhs, typename Rhs>
+Derived& MatrixBase<Derived>::operator+=(const Flagged<ProductBase<ProductDerived, Lhs,Rhs>, 0,
+ EvalBeforeAssigningBit>& other)
+{
+ other._expression().derived().addTo(derived()); return derived();
+}
+
+/** \internal
+ * Overloaded to perform an efficient C -= (A*B).lazy() */
+template<typename Derived>
+template<typename ProductDerived, typename Lhs, typename Rhs>
+Derived& MatrixBase<Derived>::operator-=(const Flagged<ProductBase<ProductDerived, Lhs,Rhs>, 0,
+ EvalBeforeAssigningBit>& other)
+{
+ other._expression().derived().subTo(derived()); return derived();
+}
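+
+// Sketch of the deprecated Eigen 2 idiom these overloads support (A, B, C are assumed dense matrices):
+//   C += (A * B).lazy();   // evaluates the product directly into C, without a temporary
+//   // the Eigen 3 equivalent is: C.noalias() += A * B;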
+
+} // end namespace Eigen
+
+#endif // EIGEN_LAZY_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/LeastSquares.h b/third_party/eigen3/Eigen/src/Eigen2Support/LeastSquares.h
new file mode 100644
index 0000000000..0e6fdb4889
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/LeastSquares.h
@@ -0,0 +1,170 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_LEASTSQUARES_H
+#define EIGEN2_LEASTSQUARES_H
+
+namespace Eigen {
+
+/** \ingroup LeastSquares_Module
+ *
+ * \leastsquares_module
+ *
+ * For a set of points, this function tries to express
+ * one of the coords as a linear (affine) function of the other coords.
+ *
+ * This is best explained by an example. This function works in full
+ * generality, for points in a space of arbitrary dimension, and also over
+ * the complex numbers, but for this example we will work in dimension 3
+ * over the real numbers (doubles).
+ *
+ * So let us work with the following set of 5 points given by their
+ * \f$(x,y,z)\f$ coordinates:
+ * @code
+ Vector3d points[5];
+ points[0] = Vector3d( 3.02, 6.89, -4.32 );
+ points[1] = Vector3d( 2.01, 5.39, -3.79 );
+ points[2] = Vector3d( 2.41, 6.01, -4.01 );
+ points[3] = Vector3d( 2.09, 5.55, -3.86 );
+ points[4] = Vector3d( 2.58, 6.32, -4.10 );
+ * @endcode
+ * Suppose that we want to express the second coordinate (\f$y\f$) as a linear
+ * expression in \f$x\f$ and \f$z\f$, that is,
+ * \f[ y=ax+bz+c \f]
+ * for some constants \f$a,b,c\f$. Thus, we want to find the best possible
+ * constants \f$a,b,c\f$ so that the plane of equation \f$y=ax+bz+c\f$ fits
+ * best the five above points. To do that, call this function as follows:
+ * @code
+ Vector3d* pointPtrs[5];
+ for(int i = 0; i < 5; ++i) pointPtrs[i] = &points[i];
+ Vector3d coeffs; // will store the coefficients a, b, c
+ linearRegression(
+ 5,
+ pointPtrs, // array of pointers to the points
+ &coeffs,
+ 1 // the coord to express as a function of
+ // the other ones. 0 means x, 1 means y, 2 means z.
+ );
+ * @endcode
+ * Now the vector \a coeffs is approximately
+ * \f$( 0.495 , -1.927 , -2.906 )\f$.
+ * Thus, we get \f$a=0.495, b = -1.927, c = -2.906\f$. Let us check for
+ * instance how near points[0] is from the plane of equation \f$y=ax+bz+c\f$.
+ * Looking at the coords of points[0], we see that:
+ * \f[ax+bz+c = 0.495 * 3.02 + (-1.927) * (-4.32) + (-2.906) = 6.91.\f]
+ * On the other hand, we have \f$y=6.89\f$. We see that the values
+ * \f$6.91\f$ and \f$6.89\f$
+ * are near, so points[0] is very near the plane of equation \f$y=ax+bz+c\f$.
+ *
+ * Let's now describe precisely the parameters:
+ * @param numPoints the number of points
+ * @param points the array of pointers to the points on which to perform the linear regression
+ * @param result pointer to the vector in which to store the result.
+ This vector must be of the same type and size as the
+ data points. The meaning of its coords is as follows.
+ For brevity, let \f$n=Size\f$,
+ \f$r_i=result[i]\f$,
+ and \f$f=funcOfOthers\f$. Denote by
+ \f$x_0,\ldots,x_{n-1}\f$
+ the n coordinates in the n-dimensional space.
+ Then the resulting equation is:
+ \f[ x_f = r_0 x_0 + \cdots + r_{f-1}x_{f-1}
+ + r_{f+1}x_{f+1} + \cdots + r_{n-1}x_{n-1} + r_n. \f]
+ * @param funcOfOthers Determines which coord to express as a function of the
+ others. Coords are numbered starting from 0, so that a
+ value of 0 means \f$x\f$, 1 means \f$y\f$,
+ 2 means \f$z\f$, ...
+ *
+ * \sa fitHyperplane()
+ */
+template<typename VectorType>
+void linearRegression(int numPoints,
+ VectorType **points,
+ VectorType *result,
+ int funcOfOthers )
+{
+ typedef typename VectorType::Scalar Scalar;
+ typedef Hyperplane<Scalar, VectorType::SizeAtCompileTime> HyperplaneType;
+ const int size = points[0]->size();
+ result->resize(size);
+ HyperplaneType h(size);
+ fitHyperplane(numPoints, points, &h);
+ for(int i = 0; i < funcOfOthers; i++)
+ result->coeffRef(i) = - h.coeffs()[i] / h.coeffs()[funcOfOthers];
+ for(int i = funcOfOthers; i < size; i++)
+ result->coeffRef(i) = - h.coeffs()[i+1] / h.coeffs()[funcOfOthers];
+}
+
+/** \ingroup LeastSquares_Module
+ *
+ * \leastsquares_module
+ *
+ * This function is quite similar to linearRegression(), so we refer to that
+ * function's documentation and only list the differences here.
+ *
+ * The main difference from linearRegression() is that this function doesn't
+ * take a \a funcOfOthers argument. Instead, it finds a general equation
+ * of the form
+ * \f[ r_0 x_0 + \cdots + r_{n-1}x_{n-1} + r_n = 0, \f]
+ * where \f$n=Size\f$, \f$r_i=retCoefficients[i]\f$, and we denote by
+ * \f$x_0,\ldots,x_{n-1}\f$ the n coordinates in the n-dimensional space.
+ *
+ * Thus, the vector \a retCoefficients has size \f$n+1\f$, which is another
+ * difference from linearRegression().
+ *
+ * In practice, this function performs a hyperplane fit in a total least squares sense
+ * via the following steps:
+ *  1 - center the data around the mean
+ *  2 - compute the covariance matrix
+ *  3 - pick the eigenvector corresponding to the smallest eigenvalue of the covariance matrix
+ * The ratio between the smallest eigenvalue and the second smallest gives a hint about the
+ * relevance of the solution. This value is optionally returned in \a soundness.
+ *
+ * \sa linearRegression()
+ */
+template<typename VectorType, typename HyperplaneType>
+void fitHyperplane(int numPoints,
+ VectorType **points,
+ HyperplaneType *result,
+ typename NumTraits<typename VectorType::Scalar>::Real* soundness = 0)
+{
+ typedef typename VectorType::Scalar Scalar;
+ typedef Matrix<Scalar,VectorType::SizeAtCompileTime,VectorType::SizeAtCompileTime> CovMatrixType;
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(VectorType)
+ ei_assert(numPoints >= 1);
+ int size = points[0]->size();
+ ei_assert(size+1 == result->coeffs().size());
+
+ // compute the mean of the data
+ VectorType mean = VectorType::Zero(size);
+ for(int i = 0; i < numPoints; ++i)
+ mean += *(points[i]);
+ mean /= numPoints;
+
+ // compute the covariance matrix
+ CovMatrixType covMat = CovMatrixType::Zero(size, size);
+ VectorType remean = VectorType::Zero(size);
+ for(int i = 0; i < numPoints; ++i)
+ {
+ VectorType diff = (*(points[i]) - mean).conjugate();
+ covMat += diff * diff.adjoint();
+ }
+
+ // now we just have to pick the eigen vector with smallest eigen value
+ SelfAdjointEigenSolver<CovMatrixType> eig(covMat);
+ result->normal() = eig.eigenvectors().col(0);
+ if (soundness)
+ *soundness = eig.eigenvalues().coeff(0)/eig.eigenvalues().coeff(1);
+
+ // let's compute the constant coefficient such that the
+  // plane passes through the mean point:
+ result->offset() = - (result->normal().cwise()* mean).sum();
+}
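+
+// A minimal usage sketch for fitHyperplane(), assuming the Geometry module's
+// Hyperplane type is available and reusing the five points from the
+// linearRegression() example above:
+// \code
+//   Vector3d* ptrs[5];
+//   for(int k = 0; k < 5; ++k) ptrs[k] = &points[k];
+//   Hyperplane<double, 3> plane(3);
+//   double soundness;
+//   fitHyperplane(5, ptrs, &plane, &soundness);
+//   // plane.normal() and plane.offset() now describe the fitted plane; a soundness
+//   // value close to zero indicates a well separated smallest eigenvalue.
+// \endcode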
+
+} // end namespace Eigen
+
+#endif // EIGEN2_LEASTSQUARES_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Macros.h b/third_party/eigen3/Eigen/src/Eigen2Support/Macros.h
new file mode 100644
index 0000000000..351c32afb6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Macros.h
@@ -0,0 +1,20 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_MACROS_H
+#define EIGEN2_MACROS_H
+
+#define ei_assert eigen_assert
+#define ei_internal_assert eigen_internal_assert
+
+#define EIGEN_ALIGN_128 EIGEN_ALIGN16
+
+#define EIGEN_ARCH_WANTS_ALIGNMENT EIGEN_ALIGN_STATICALLY
+
+#endif // EIGEN2_MACROS_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/MathFunctions.h b/third_party/eigen3/Eigen/src/Eigen2Support/MathFunctions.h
new file mode 100644
index 0000000000..3544af2538
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/MathFunctions.h
@@ -0,0 +1,57 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_MATH_FUNCTIONS_H
+#define EIGEN2_MATH_FUNCTIONS_H
+
+namespace Eigen {
+
+template<typename T> inline typename NumTraits<T>::Real ei_real(const T& x) { return numext::real(x); }
+template<typename T> inline typename NumTraits<T>::Real ei_imag(const T& x) { return numext::imag(x); }
+template<typename T> inline T ei_conj(const T& x) { return numext::conj(x); }
+template<typename T> inline typename NumTraits<T>::Real ei_abs (const T& x) { using std::abs; return abs(x); }
+template<typename T> inline typename NumTraits<T>::Real ei_abs2(const T& x) { return numext::abs2(x); }
+template<typename T> inline T ei_sqrt(const T& x) { using std::sqrt; return sqrt(x); }
+template<typename T> inline T ei_exp (const T& x) { using std::exp; return exp(x); }
+template<typename T> inline T ei_log (const T& x) { using std::log; return log(x); }
+template<typename T> inline T ei_sin (const T& x) { using std::sin; return sin(x); }
+template<typename T> inline T ei_cos (const T& x) { using std::cos; return cos(x); }
+template<typename T> inline T ei_atan2(const T& x,const T& y) { using std::atan2; return atan2(x,y); }
+template<typename T> inline T ei_pow (const T& x,const T& y) { return numext::pow(x,y); }
+template<typename T> inline T ei_random () { return internal::random<T>(); }
+template<typename T> inline T ei_random (const T& x, const T& y) { return internal::random(x, y); }
+
+template<typename T> inline T precision () { return NumTraits<T>::dummy_precision(); }
+template<typename T> inline T machine_epsilon () { return NumTraits<T>::epsilon(); }
+
+
+template<typename Scalar, typename OtherScalar>
+inline bool ei_isMuchSmallerThan(const Scalar& x, const OtherScalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return internal::isMuchSmallerThan(x, y, precision);
+}
+
+template<typename Scalar>
+inline bool ei_isApprox(const Scalar& x, const Scalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return internal::isApprox(x, y, precision);
+}
+
+template<typename Scalar>
+inline bool ei_isApproxOrLessThan(const Scalar& x, const Scalar& y,
+ typename NumTraits<Scalar>::Real precision = NumTraits<Scalar>::dummy_precision())
+{
+ return internal::isApproxOrLessThan(x, y, precision);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN2_MATH_FUNCTIONS_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Memory.h b/third_party/eigen3/Eigen/src/Eigen2Support/Memory.h
new file mode 100644
index 0000000000..f86372b6b5
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Memory.h
@@ -0,0 +1,45 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_MEMORY_H
+#define EIGEN2_MEMORY_H
+
+namespace Eigen {
+
+inline void* ei_aligned_malloc(size_t size) { return internal::aligned_malloc(size); }
+inline void ei_aligned_free(void *ptr) { internal::aligned_free(ptr); }
+inline void* ei_aligned_realloc(void *ptr, size_t new_size, size_t old_size) { return internal::aligned_realloc(ptr, new_size, old_size); }
+inline void* ei_handmade_aligned_malloc(size_t size) { return internal::handmade_aligned_malloc(size); }
+inline void ei_handmade_aligned_free(void *ptr) { internal::handmade_aligned_free(ptr); }
+
+template<bool Align> inline void* ei_conditional_aligned_malloc(size_t size)
+{
+ return internal::conditional_aligned_malloc<Align>(size);
+}
+template<bool Align> inline void ei_conditional_aligned_free(void *ptr)
+{
+ internal::conditional_aligned_free<Align>(ptr);
+}
+template<bool Align> inline void* ei_conditional_aligned_realloc(void* ptr, size_t new_size, size_t old_size)
+{
+ return internal::conditional_aligned_realloc<Align>(ptr, new_size, old_size);
+}
+
+template<typename T> inline T* ei_aligned_new(size_t size)
+{
+ return internal::aligned_new<T>(size);
+}
+template<typename T> inline void ei_aligned_delete(T *ptr, size_t size)
+{
+ return internal::aligned_delete(ptr, size);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN2_MEMORY_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Meta.h b/third_party/eigen3/Eigen/src/Eigen2Support/Meta.h
new file mode 100644
index 0000000000..fa37cfc961
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Meta.h
@@ -0,0 +1,75 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_META_H
+#define EIGEN2_META_H
+
+namespace Eigen {
+
+template<typename T>
+struct ei_traits : internal::traits<T>
+{};
+
+struct ei_meta_true { enum { ret = 1 }; };
+struct ei_meta_false { enum { ret = 0 }; };
+
+template<bool Condition, typename Then, typename Else>
+struct ei_meta_if { typedef Then ret; };
+
+template<typename Then, typename Else>
+struct ei_meta_if <false, Then, Else> { typedef Else ret; };
+
+template<typename T, typename U> struct ei_is_same_type { enum { ret = 0 }; };
+template<typename T> struct ei_is_same_type<T,T> { enum { ret = 1 }; };
+
+template<typename T> struct ei_unref { typedef T type; };
+template<typename T> struct ei_unref<T&> { typedef T type; };
+
+template<typename T> struct ei_unpointer { typedef T type; };
+template<typename T> struct ei_unpointer<T*> { typedef T type; };
+template<typename T> struct ei_unpointer<T*const> { typedef T type; };
+
+template<typename T> struct ei_unconst { typedef T type; };
+template<typename T> struct ei_unconst<const T> { typedef T type; };
+template<typename T> struct ei_unconst<T const &> { typedef T & type; };
+template<typename T> struct ei_unconst<T const *> { typedef T * type; };
+
+template<typename T> struct ei_cleantype { typedef T type; };
+template<typename T> struct ei_cleantype<const T> { typedef typename ei_cleantype<T>::type type; };
+template<typename T> struct ei_cleantype<const T&> { typedef typename ei_cleantype<T>::type type; };
+template<typename T> struct ei_cleantype<T&> { typedef typename ei_cleantype<T>::type type; };
+template<typename T> struct ei_cleantype<const T*> { typedef typename ei_cleantype<T>::type type; };
+template<typename T> struct ei_cleantype<T*> { typedef typename ei_cleantype<T>::type type; };
+
+/** \internal In short, it computes int(sqrt(\a Y)) with \a Y an integer.
+ * Usage example: \code ei_meta_sqrt<1023>::ret \endcode
+ */
+template<int Y,
+ int InfX = 0,
+ int SupX = ((Y==1) ? 1 : Y/2),
+ bool Done = ((SupX-InfX)<=1 ? true : ((SupX*SupX <= Y) && ((SupX+1)*(SupX+1) > Y))) >
+ // use ?: instead of || just to shut up a stupid gcc 4.3 warning
+class ei_meta_sqrt
+{
+ enum {
+ MidX = (InfX+SupX)/2,
+ TakeInf = MidX*MidX > Y ? 1 : 0,
+ NewInf = int(TakeInf) ? InfX : int(MidX),
+ NewSup = int(TakeInf) ? int(MidX) : SupX
+ };
+ public:
+ enum { ret = ei_meta_sqrt<Y,NewInf,NewSup>::ret };
+};
+
+template<int Y, int InfX, int SupX>
+class ei_meta_sqrt<Y, InfX, SupX, true> { public: enum { ret = (SupX*SupX <= Y) ? SupX : InfX }; };
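+
+// A small compile-time sanity check of ei_meta_sqrt (a sketch, assuming C++11
+// static_assert is available in the surrounding translation unit):
+// \code
+//   static_assert(ei_meta_sqrt<1023>::ret == 31, "floor(sqrt(1023)) == 31");
+//   static_assert(ei_meta_sqrt<1024>::ret == 32, "floor(sqrt(1024)) == 32");
+// \endcode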
+
+} // end namespace Eigen
+
+#endif // EIGEN2_META_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/Minor.h b/third_party/eigen3/Eigen/src/Eigen2Support/Minor.h
new file mode 100644
index 0000000000..4cded5734f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/Minor.h
@@ -0,0 +1,117 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MINOR_H
+#define EIGEN_MINOR_H
+
+namespace Eigen {
+
+/**
+ * \class Minor
+ *
+ * \brief Expression of a minor
+ *
+ * \param MatrixType the type of the object in which we are taking a minor
+ *
+ * This class represents an expression of a minor. It is the return
+ * type of MatrixBase::minor() and most of the time this is the only way it
+ * is used.
+ *
+ * \sa MatrixBase::minor()
+ */
+
+namespace internal {
+template<typename MatrixType>
+struct traits<Minor<MatrixType> >
+ : traits<MatrixType>
+{
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+ typedef typename MatrixType::StorageKind StorageKind;
+ enum {
+ RowsAtCompileTime = (MatrixType::RowsAtCompileTime != Dynamic) ?
+ int(MatrixType::RowsAtCompileTime) - 1 : Dynamic,
+ ColsAtCompileTime = (MatrixType::ColsAtCompileTime != Dynamic) ?
+ int(MatrixType::ColsAtCompileTime) - 1 : Dynamic,
+ MaxRowsAtCompileTime = (MatrixType::MaxRowsAtCompileTime != Dynamic) ?
+ int(MatrixType::MaxRowsAtCompileTime) - 1 : Dynamic,
+ MaxColsAtCompileTime = (MatrixType::MaxColsAtCompileTime != Dynamic) ?
+ int(MatrixType::MaxColsAtCompileTime) - 1 : Dynamic,
+ Flags = _MatrixTypeNested::Flags & (HereditaryBits | LvalueBit),
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost // minor is used typically on tiny matrices,
+ // where loops are unrolled and the 'if' evaluates at compile time
+ };
+};
+}
+
+template<typename MatrixType> class Minor
+ : public MatrixBase<Minor<MatrixType> >
+{
+ public:
+
+ typedef MatrixBase<Minor> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Minor)
+
+ inline Minor(const MatrixType& matrix,
+ Index row, Index col)
+ : m_matrix(matrix), m_row(row), m_col(col)
+ {
+ eigen_assert(row >= 0 && row < matrix.rows()
+ && col >= 0 && col < matrix.cols());
+ }
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(Minor)
+
+ inline Index rows() const { return m_matrix.rows() - 1; }
+ inline Index cols() const { return m_matrix.cols() - 1; }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ return m_matrix.const_cast_derived().coeffRef(row + (row >= m_row), col + (col >= m_col));
+ }
+
+ inline const Scalar coeff(Index row, Index col) const
+ {
+ return m_matrix.coeff(row + (row >= m_row), col + (col >= m_col));
+ }
+
+ protected:
+ const typename MatrixType::Nested m_matrix;
+ const Index m_row, m_col;
+};
+
+/**
+ * \return an expression of the (\a row, \a col)-minor of *this,
+ * i.e. an expression constructed from *this by removing the specified
+ * row and column.
+ *
+ * Example: \include MatrixBase_minor.cpp
+ * Output: \verbinclude MatrixBase_minor.out
+ *
+ * \sa class Minor
+ */
+template<typename Derived>
+inline Minor<Derived>
+MatrixBase<Derived>::minor(Index row, Index col)
+{
+ return Minor<Derived>(derived(), row, col);
+}
+
+/**
+ * This is the const version of minor(). */
+template<typename Derived>
+inline const Minor<Derived>
+MatrixBase<Derived>::minor(Index row, Index col) const
+{
+ return Minor<Derived>(derived(), row, col);
+}
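+
+// A minimal usage sketch for minor(), assuming the Eigen2 support module is
+// enabled so that MatrixBase::minor() is declared:
+// \code
+//   Matrix3d m = Matrix3d::Random();
+//   Matrix2d sub = m.minor(0, 1); // m with row 0 and column 1 removed
+// \endcode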
+
+} // end namespace Eigen
+
+#endif // EIGEN_MINOR_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/QR.h b/third_party/eigen3/Eigen/src/Eigen2Support/QR.h
new file mode 100644
index 0000000000..2042c98510
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/QR.h
@@ -0,0 +1,67 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+// Copyright (C) 2011 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_QR_H
+#define EIGEN2_QR_H
+
+namespace Eigen {
+
+template<typename MatrixType>
+class QR : public HouseholderQR<MatrixType>
+{
+ public:
+
+ typedef HouseholderQR<MatrixType> Base;
+ typedef Block<const MatrixType, MatrixType::ColsAtCompileTime, MatrixType::ColsAtCompileTime> MatrixRBlockType;
+
+ QR() : Base() {}
+
+ template<typename T>
+ explicit QR(const T& t) : Base(t) {}
+
+ template<typename OtherDerived, typename ResultType>
+ bool solve(const MatrixBase<OtherDerived>& b, ResultType *result) const
+ {
+ *result = static_cast<const Base*>(this)->solve(b);
+ return true;
+ }
+
+ MatrixType matrixQ(void) const {
+ MatrixType ret = MatrixType::Identity(this->rows(), this->cols());
+ ret = this->householderQ() * ret;
+ return ret;
+ }
+
+ bool isFullRank() const {
+ return true;
+ }
+
+ const TriangularView<MatrixRBlockType, UpperTriangular>
+ matrixR(void) const
+ {
+ int cols = this->cols();
+ return MatrixRBlockType(this->matrixQR(), 0, 0, cols, cols).template triangularView<UpperTriangular>();
+ }
+};
+
+/** \return the QR decomposition of \c *this.
+ *
+ * \sa class QR
+ */
+template<typename Derived>
+const QR<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::qr() const
+{
+ return QR<PlainObject>(eval());
+}
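+
+// A minimal usage sketch of this Eigen2-style QR wrapper (a thin layer over
+// HouseholderQR; the matrix sizes below are illustrative):
+// \code
+//   MatrixXd A = MatrixXd::Random(4, 4);
+//   VectorXd b = VectorXd::Random(4);
+//   QR<MatrixXd> qr(A);
+//   VectorXd x;
+//   qr.solve(b, &x);           // solves A x = b through HouseholderQR
+//   MatrixXd Q = qr.matrixQ(); // explicit Q factor
+// \endcode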
+
+} // end namespace Eigen
+
+#endif // EIGEN2_QR_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/SVD.h b/third_party/eigen3/Eigen/src/Eigen2Support/SVD.h
new file mode 100644
index 0000000000..3d03d2288d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/SVD.h
@@ -0,0 +1,637 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <g.gael@free.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_SVD_H
+#define EIGEN2_SVD_H
+
+namespace Eigen {
+
+/** \ingroup SVD_Module
+ * \nonstableyet
+ *
+ * \class SVD
+ *
+ * \brief Standard SVD decomposition of a matrix and associated features
+ *
+ * \param MatrixType the type of the matrix of which we are computing the SVD decomposition
+ *
+ * This class performs a standard SVD decomposition of a real matrix A of size \c M x \c N
+ * with \c M \>= \c N.
+ *
+ *
+ * \sa MatrixBase::SVD()
+ */
+template<typename MatrixType> class SVD
+{
+ private:
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+
+ enum {
+ PacketSize = internal::packet_traits<Scalar>::size,
+ AlignmentMask = int(PacketSize)-1,
+ MinSize = EIGEN_SIZE_MIN_PREFER_DYNAMIC(MatrixType::RowsAtCompileTime, MatrixType::ColsAtCompileTime)
+ };
+
+ typedef Matrix<Scalar, MatrixType::RowsAtCompileTime, 1> ColVector;
+ typedef Matrix<Scalar, MatrixType::ColsAtCompileTime, 1> RowVector;
+
+ typedef Matrix<Scalar, MatrixType::RowsAtCompileTime, MinSize> MatrixUType;
+ typedef Matrix<Scalar, MatrixType::ColsAtCompileTime, MatrixType::ColsAtCompileTime> MatrixVType;
+ typedef Matrix<Scalar, MinSize, 1> SingularValuesType;
+
+ public:
+
+    SVD() {} // a user who relied on the compiler-generated default constructor reported problems with MSVC in 2.0.7
+
+ SVD(const MatrixType& matrix)
+ : m_matU(matrix.rows(), (std::min)(matrix.rows(), matrix.cols())),
+ m_matV(matrix.cols(),matrix.cols()),
+ m_sigma((std::min)(matrix.rows(),matrix.cols()))
+ {
+ compute(matrix);
+ }
+
+ template<typename OtherDerived, typename ResultType>
+ bool solve(const MatrixBase<OtherDerived> &b, ResultType* result) const;
+
+ const MatrixUType& matrixU() const { return m_matU; }
+ const SingularValuesType& singularValues() const { return m_sigma; }
+ const MatrixVType& matrixV() const { return m_matV; }
+
+ void compute(const MatrixType& matrix);
+ SVD& sort();
+
+ template<typename UnitaryType, typename PositiveType>
+ void computeUnitaryPositive(UnitaryType *unitary, PositiveType *positive) const;
+ template<typename PositiveType, typename UnitaryType>
+ void computePositiveUnitary(PositiveType *positive, UnitaryType *unitary) const;
+ template<typename RotationType, typename ScalingType>
+ void computeRotationScaling(RotationType *unitary, ScalingType *positive) const;
+ template<typename ScalingType, typename RotationType>
+ void computeScalingRotation(ScalingType *positive, RotationType *unitary) const;
+
+ protected:
+ /** \internal */
+ MatrixUType m_matU;
+ /** \internal */
+ MatrixVType m_matV;
+ /** \internal */
+ SingularValuesType m_sigma;
+};
+
+/** Computes / recomputes the SVD decomposition A = U S V^* of \a matrix
+ *
+ * \note this code has been adapted from JAMA (public domain)
+ */
+template<typename MatrixType>
+void SVD<MatrixType>::compute(const MatrixType& matrix)
+{
+ const int m = matrix.rows();
+ const int n = matrix.cols();
+ const int nu = (std::min)(m,n);
+ ei_assert(m>=n && "In Eigen 2.0, SVD only works for MxN matrices with M>=N. Sorry!");
+ ei_assert(m>1 && "In Eigen 2.0, SVD doesn't work on 1x1 matrices");
+
+ m_matU.resize(m, nu);
+ m_matU.setZero();
+ m_sigma.resize((std::min)(m,n));
+ m_matV.resize(n,n);
+
+ RowVector e(n);
+ ColVector work(m);
+ MatrixType matA(matrix);
+ const bool wantu = true;
+ const bool wantv = true;
+ int i=0, j=0, k=0;
+
+ // Reduce A to bidiagonal form, storing the diagonal elements
+ // in s and the super-diagonal elements in e.
+ int nct = (std::min)(m-1,n);
+ int nrt = (std::max)(0,(std::min)(n-2,m));
+ for (k = 0; k < (std::max)(nct,nrt); ++k)
+ {
+ if (k < nct)
+ {
+ // Compute the transformation for the k-th column and
+ // place the k-th diagonal in m_sigma[k].
+ m_sigma[k] = matA.col(k).end(m-k).norm();
+ if (m_sigma[k] != 0.0) // FIXME
+ {
+ if (matA(k,k) < 0.0)
+ m_sigma[k] = -m_sigma[k];
+ matA.col(k).end(m-k) /= m_sigma[k];
+ matA(k,k) += 1.0;
+ }
+ m_sigma[k] = -m_sigma[k];
+ }
+
+ for (j = k+1; j < n; ++j)
+ {
+ if ((k < nct) && (m_sigma[k] != 0.0))
+ {
+ // Apply the transformation.
+ Scalar t = matA.col(k).end(m-k).eigen2_dot(matA.col(j).end(m-k)); // FIXME dot product or cwise prod + .sum() ??
+ t = -t/matA(k,k);
+ matA.col(j).end(m-k) += t * matA.col(k).end(m-k);
+ }
+
+ // Place the k-th row of A into e for the
+ // subsequent calculation of the row transformation.
+ e[j] = matA(k,j);
+ }
+
+ // Place the transformation in U for subsequent back multiplication.
+ if (wantu & (k < nct))
+ m_matU.col(k).end(m-k) = matA.col(k).end(m-k);
+
+ if (k < nrt)
+ {
+ // Compute the k-th row transformation and place the
+ // k-th super-diagonal in e[k].
+ e[k] = e.end(n-k-1).norm();
+ if (e[k] != 0.0)
+ {
+ if (e[k+1] < 0.0)
+ e[k] = -e[k];
+ e.end(n-k-1) /= e[k];
+ e[k+1] += 1.0;
+ }
+ e[k] = -e[k];
+ if ((k+1 < m) & (e[k] != 0.0))
+ {
+ // Apply the transformation.
+ work.end(m-k-1) = matA.corner(BottomRight,m-k-1,n-k-1) * e.end(n-k-1);
+ for (j = k+1; j < n; ++j)
+ matA.col(j).end(m-k-1) += (-e[j]/e[k+1]) * work.end(m-k-1);
+ }
+
+ // Place the transformation in V for subsequent back multiplication.
+ if (wantv)
+ m_matV.col(k).end(n-k-1) = e.end(n-k-1);
+ }
+ }
+
+
+  // Set up the final bidiagonal matrix of order p.
+ int p = (std::min)(n,m+1);
+ if (nct < n)
+ m_sigma[nct] = matA(nct,nct);
+ if (m < p)
+ m_sigma[p-1] = 0.0;
+ if (nrt+1 < p)
+ e[nrt] = matA(nrt,p-1);
+ e[p-1] = 0.0;
+
+ // If required, generate U.
+ if (wantu)
+ {
+ for (j = nct; j < nu; ++j)
+ {
+ m_matU.col(j).setZero();
+ m_matU(j,j) = 1.0;
+ }
+ for (k = nct-1; k >= 0; k--)
+ {
+ if (m_sigma[k] != 0.0)
+ {
+ for (j = k+1; j < nu; ++j)
+ {
+ Scalar t = m_matU.col(k).end(m-k).eigen2_dot(m_matU.col(j).end(m-k)); // FIXME is it really a dot product we want ?
+ t = -t/m_matU(k,k);
+ m_matU.col(j).end(m-k) += t * m_matU.col(k).end(m-k);
+ }
+ m_matU.col(k).end(m-k) = - m_matU.col(k).end(m-k);
+ m_matU(k,k) = Scalar(1) + m_matU(k,k);
+ if (k-1>0)
+ m_matU.col(k).start(k-1).setZero();
+ }
+ else
+ {
+ m_matU.col(k).setZero();
+ m_matU(k,k) = 1.0;
+ }
+ }
+ }
+
+ // If required, generate V.
+ if (wantv)
+ {
+ for (k = n-1; k >= 0; k--)
+ {
+ if ((k < nrt) & (e[k] != 0.0))
+ {
+ for (j = k+1; j < nu; ++j)
+ {
+ Scalar t = m_matV.col(k).end(n-k-1).eigen2_dot(m_matV.col(j).end(n-k-1)); // FIXME is it really a dot product we want ?
+ t = -t/m_matV(k+1,k);
+ m_matV.col(j).end(n-k-1) += t * m_matV.col(k).end(n-k-1);
+ }
+ }
+ m_matV.col(k).setZero();
+ m_matV(k,k) = 1.0;
+ }
+ }
+
+ // Main iteration loop for the singular values.
+ int pp = p-1;
+ int iter = 0;
+ Scalar eps = ei_pow(Scalar(2),ei_is_same_type<Scalar,float>::ret ? Scalar(-23) : Scalar(-52));
+ while (p > 0)
+ {
+ int k=0;
+ int kase=0;
+
+ // Here is where a test for too many iterations would go.
+
+ // This section of the program inspects for
+ // negligible elements in the s and e arrays. On
+ // completion the variables kase and k are set as follows.
+
+ // kase = 1 if s(p) and e[k-1] are negligible and k<p
+ // kase = 2 if s(k) is negligible and k<p
+ // kase = 3 if e[k-1] is negligible, k<p, and
+ // s(k), ..., s(p) are not negligible (qr step).
+ // kase = 4 if e(p-1) is negligible (convergence).
+
+ for (k = p-2; k >= -1; --k)
+ {
+ if (k == -1)
+ break;
+ if (ei_abs(e[k]) <= eps*(ei_abs(m_sigma[k]) + ei_abs(m_sigma[k+1])))
+ {
+ e[k] = 0.0;
+ break;
+ }
+ }
+ if (k == p-2)
+ {
+ kase = 4;
+ }
+ else
+ {
+ int ks;
+ for (ks = p-1; ks >= k; --ks)
+ {
+ if (ks == k)
+ break;
+ Scalar t = (ks != p ? ei_abs(e[ks]) : Scalar(0)) + (ks != k+1 ? ei_abs(e[ks-1]) : Scalar(0));
+ if (ei_abs(m_sigma[ks]) <= eps*t)
+ {
+ m_sigma[ks] = 0.0;
+ break;
+ }
+ }
+ if (ks == k)
+ {
+ kase = 3;
+ }
+ else if (ks == p-1)
+ {
+ kase = 1;
+ }
+ else
+ {
+ kase = 2;
+ k = ks;
+ }
+ }
+ ++k;
+
+ // Perform the task indicated by kase.
+ switch (kase)
+ {
+
+ // Deflate negligible s(p).
+ case 1:
+ {
+ Scalar f(e[p-2]);
+ e[p-2] = 0.0;
+ for (j = p-2; j >= k; --j)
+ {
+ Scalar t(numext::hypot(m_sigma[j],f));
+ Scalar cs(m_sigma[j]/t);
+ Scalar sn(f/t);
+ m_sigma[j] = t;
+ if (j != k)
+ {
+ f = -sn*e[j-1];
+ e[j-1] = cs*e[j-1];
+ }
+ if (wantv)
+ {
+ for (i = 0; i < n; ++i)
+ {
+ t = cs*m_matV(i,j) + sn*m_matV(i,p-1);
+ m_matV(i,p-1) = -sn*m_matV(i,j) + cs*m_matV(i,p-1);
+ m_matV(i,j) = t;
+ }
+ }
+ }
+ }
+ break;
+
+ // Split at negligible s(k).
+ case 2:
+ {
+ Scalar f(e[k-1]);
+ e[k-1] = 0.0;
+ for (j = k; j < p; ++j)
+ {
+ Scalar t(numext::hypot(m_sigma[j],f));
+ Scalar cs( m_sigma[j]/t);
+ Scalar sn(f/t);
+ m_sigma[j] = t;
+ f = -sn*e[j];
+ e[j] = cs*e[j];
+ if (wantu)
+ {
+ for (i = 0; i < m; ++i)
+ {
+ t = cs*m_matU(i,j) + sn*m_matU(i,k-1);
+ m_matU(i,k-1) = -sn*m_matU(i,j) + cs*m_matU(i,k-1);
+ m_matU(i,j) = t;
+ }
+ }
+ }
+ }
+ break;
+
+ // Perform one qr step.
+ case 3:
+ {
+ // Calculate the shift.
+ Scalar scale = (std::max)((std::max)((std::max)((std::max)(
+ ei_abs(m_sigma[p-1]),ei_abs(m_sigma[p-2])),ei_abs(e[p-2])),
+ ei_abs(m_sigma[k])),ei_abs(e[k]));
+ Scalar sp = m_sigma[p-1]/scale;
+ Scalar spm1 = m_sigma[p-2]/scale;
+ Scalar epm1 = e[p-2]/scale;
+ Scalar sk = m_sigma[k]/scale;
+ Scalar ek = e[k]/scale;
+ Scalar b = ((spm1 + sp)*(spm1 - sp) + epm1*epm1)/Scalar(2);
+ Scalar c = (sp*epm1)*(sp*epm1);
+ Scalar shift(0);
+ if ((b != 0.0) || (c != 0.0))
+ {
+ shift = ei_sqrt(b*b + c);
+ if (b < 0.0)
+ shift = -shift;
+ shift = c/(b + shift);
+ }
+ Scalar f = (sk + sp)*(sk - sp) + shift;
+ Scalar g = sk*ek;
+
+ // Chase zeros.
+
+ for (j = k; j < p-1; ++j)
+ {
+ Scalar t = numext::hypot(f,g);
+ Scalar cs = f/t;
+ Scalar sn = g/t;
+ if (j != k)
+ e[j-1] = t;
+ f = cs*m_sigma[j] + sn*e[j];
+ e[j] = cs*e[j] - sn*m_sigma[j];
+ g = sn*m_sigma[j+1];
+ m_sigma[j+1] = cs*m_sigma[j+1];
+ if (wantv)
+ {
+ for (i = 0; i < n; ++i)
+ {
+ t = cs*m_matV(i,j) + sn*m_matV(i,j+1);
+ m_matV(i,j+1) = -sn*m_matV(i,j) + cs*m_matV(i,j+1);
+ m_matV(i,j) = t;
+ }
+ }
+ t = numext::hypot(f,g);
+ cs = f/t;
+ sn = g/t;
+ m_sigma[j] = t;
+ f = cs*e[j] + sn*m_sigma[j+1];
+ m_sigma[j+1] = -sn*e[j] + cs*m_sigma[j+1];
+ g = sn*e[j+1];
+ e[j+1] = cs*e[j+1];
+ if (wantu && (j < m-1))
+ {
+ for (i = 0; i < m; ++i)
+ {
+ t = cs*m_matU(i,j) + sn*m_matU(i,j+1);
+ m_matU(i,j+1) = -sn*m_matU(i,j) + cs*m_matU(i,j+1);
+ m_matU(i,j) = t;
+ }
+ }
+ }
+ e[p-2] = f;
+ iter = iter + 1;
+ }
+ break;
+
+ // Convergence.
+ case 4:
+ {
+ // Make the singular values positive.
+ if (m_sigma[k] <= 0.0)
+ {
+ m_sigma[k] = m_sigma[k] < Scalar(0) ? -m_sigma[k] : Scalar(0);
+ if (wantv)
+ m_matV.col(k).start(pp+1) = -m_matV.col(k).start(pp+1);
+ }
+
+ // Order the singular values.
+ while (k < pp)
+ {
+ if (m_sigma[k] >= m_sigma[k+1])
+ break;
+ Scalar t = m_sigma[k];
+ m_sigma[k] = m_sigma[k+1];
+ m_sigma[k+1] = t;
+ if (wantv && (k < n-1))
+ m_matV.col(k).swap(m_matV.col(k+1));
+ if (wantu && (k < m-1))
+ m_matU.col(k).swap(m_matU.col(k+1));
+ ++k;
+ }
+ iter = 0;
+ p--;
+ }
+ break;
+ } // end big switch
+ } // end iterations
+}
+
+template<typename MatrixType>
+SVD<MatrixType>& SVD<MatrixType>::sort()
+{
+ int mu = m_matU.rows();
+ int mv = m_matV.rows();
+ int n = m_matU.cols();
+
+ for (int i=0; i<n; ++i)
+ {
+ int k = i;
+ Scalar p = m_sigma.coeff(i);
+
+ for (int j=i+1; j<n; ++j)
+ {
+ if (m_sigma.coeff(j) > p)
+ {
+ k = j;
+ p = m_sigma.coeff(j);
+ }
+ }
+ if (k != i)
+ {
+ m_sigma.coeffRef(k) = m_sigma.coeff(i); // i.e.
+ m_sigma.coeffRef(i) = p; // swaps the i-th and the k-th elements
+
+ int j = mu;
+ for(int s=0; j!=0; ++s, --j)
+ std::swap(m_matU.coeffRef(s,i), m_matU.coeffRef(s,k));
+
+ j = mv;
+ for (int s=0; j!=0; ++s, --j)
+ std::swap(m_matV.coeffRef(s,i), m_matV.coeffRef(s,k));
+ }
+ }
+ return *this;
+}
+
+/** \returns the solution of \f$ A x = b \f$ using the current SVD decomposition of A.
+ * The parts of the solution corresponding to zero singular values are ignored.
+ *
+ * \sa MatrixBase::svd(), LU::solve(), LLT::solve()
+ */
+template<typename MatrixType>
+template<typename OtherDerived, typename ResultType>
+bool SVD<MatrixType>::solve(const MatrixBase<OtherDerived> &b, ResultType* result) const
+{
+ ei_assert(b.rows() == m_matU.rows());
+
+ Scalar maxVal = m_sigma.cwise().abs().maxCoeff();
+ for (int j=0; j<b.cols(); ++j)
+ {
+ Matrix<Scalar,MatrixUType::RowsAtCompileTime,1> aux = m_matU.transpose() * b.col(j);
+
+ for (int i = 0; i <m_matU.cols(); ++i)
+ {
+ Scalar si = m_sigma.coeff(i);
+ if (ei_isMuchSmallerThan(ei_abs(si),maxVal))
+ aux.coeffRef(i) = 0;
+ else
+ aux.coeffRef(i) /= si;
+ }
+
+ result->col(j) = m_matV * aux;
+ }
+ return true;
+}
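+
+// A minimal usage sketch of SVD-based solving, assuming an over-determined
+// system (more rows than columns, as this Eigen2-compatibility SVD requires):
+// \code
+//   MatrixXd A = MatrixXd::Random(6, 3);
+//   VectorXd b = VectorXd::Random(6);
+//   VectorXd x(3);
+//   SVD<MatrixXd> svd(A);
+//   svd.solve(b, &x); // least-squares solution; near-zero singular values are ignored
+// \endcode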
+
+/** Computes the polar decomposition of the matrix, as a product unitary x positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * Only for square matrices.
+ *
+ * \sa computePositiveUnitary(), computeRotationScaling()
+ */
+template<typename MatrixType>
+template<typename UnitaryType, typename PositiveType>
+void SVD<MatrixType>::computeUnitaryPositive(UnitaryType *unitary,
+ PositiveType *positive) const
+{
+ ei_assert(m_matU.cols() == m_matV.cols() && "Polar decomposition is only for square matrices");
+ if(unitary) *unitary = m_matU * m_matV.adjoint();
+ if(positive) *positive = m_matV * m_sigma.asDiagonal() * m_matV.adjoint();
+}
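+
+// A minimal sketch of the polar decomposition A = unitary * positive computed
+// from the SVD (square matrices only; the names below are illustrative):
+// \code
+//   Matrix3d A = Matrix3d::Random();
+//   Matrix3d unitary, positive;
+//   SVD<Matrix3d> svd(A);
+//   svd.computeUnitaryPositive(&unitary, &positive); // A equals unitary * positive up to rounding
+// \endcode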
+
+/** Computes the polar decomposition of the matrix, as a product positive x unitary.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * Only for square matrices.
+ *
+ * \sa computeUnitaryPositive(), computeRotationScaling()
+ */
+template<typename MatrixType>
+template<typename UnitaryType, typename PositiveType>
+void SVD<MatrixType>::computePositiveUnitary(UnitaryType *positive,
+ PositiveType *unitary) const
+{
+ ei_assert(m_matU.rows() == m_matV.rows() && "Polar decomposition is only for square matrices");
+ if(unitary) *unitary = m_matU * m_matV.adjoint();
+ if(positive) *positive = m_matU * m_sigma.asDiagonal() * m_matU.adjoint();
+}
+
+/** decomposes the matrix as a product rotation x scaling, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * This method requires the Geometry module.
+ *
+ * \sa computeScalingRotation(), computeUnitaryPositive()
+ */
+template<typename MatrixType>
+template<typename RotationType, typename ScalingType>
+void SVD<MatrixType>::computeRotationScaling(RotationType *rotation, ScalingType *scaling) const
+{
+ ei_assert(m_matU.rows() == m_matV.rows() && "Polar decomposition is only for square matrices");
+ Scalar x = (m_matU * m_matV.adjoint()).determinant(); // so x has absolute value 1
+ Matrix<Scalar, MatrixType::RowsAtCompileTime, 1> sv(m_sigma);
+ sv.coeffRef(0) *= x;
+ if(scaling) scaling->lazyAssign(m_matV * sv.asDiagonal() * m_matV.adjoint());
+ if(rotation)
+ {
+ MatrixType m(m_matU);
+ m.col(0) /= x;
+ rotation->lazyAssign(m * m_matV.adjoint());
+ }
+}
+
+/** decomposes the matrix as a product scaling x rotation, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ * This method requires the Geometry module.
+ *
+ * \sa computeRotationScaling(), computeUnitaryPositive()
+ */
+template<typename MatrixType>
+template<typename ScalingType, typename RotationType>
+void SVD<MatrixType>::computeScalingRotation(ScalingType *scaling, RotationType *rotation) const
+{
+ ei_assert(m_matU.rows() == m_matV.rows() && "Polar decomposition is only for square matrices");
+ Scalar x = (m_matU * m_matV.adjoint()).determinant(); // so x has absolute value 1
+ Matrix<Scalar, MatrixType::RowsAtCompileTime, 1> sv(m_sigma);
+ sv.coeffRef(0) *= x;
+ if(scaling) scaling->lazyAssign(m_matU * sv.asDiagonal() * m_matU.adjoint());
+ if(rotation)
+ {
+ MatrixType m(m_matU);
+ m.col(0) /= x;
+ rotation->lazyAssign(m * m_matV.adjoint());
+ }
+}
+
+
+/** \svd_module
+ * \returns the SVD decomposition of \c *this
+ */
+template<typename Derived>
+inline SVD<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::svd() const
+{
+ return SVD<PlainObject>(derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN2_SVD_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/TriangularSolver.h b/third_party/eigen3/Eigen/src/Eigen2Support/TriangularSolver.h
new file mode 100644
index 0000000000..ebbeb3b495
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/TriangularSolver.h
@@ -0,0 +1,42 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIANGULAR_SOLVER2_H
+#define EIGEN_TRIANGULAR_SOLVER2_H
+
+namespace Eigen {
+
+const unsigned int UnitDiagBit = UnitDiag;
+const unsigned int SelfAdjointBit = SelfAdjoint;
+const unsigned int UpperTriangularBit = Upper;
+const unsigned int LowerTriangularBit = Lower;
+
+const unsigned int UpperTriangular = Upper;
+const unsigned int LowerTriangular = Lower;
+const unsigned int UnitUpperTriangular = UnitUpper;
+const unsigned int UnitLowerTriangular = UnitLower;
+
+template<typename ExpressionType, unsigned int Added, unsigned int Removed>
+template<typename OtherDerived>
+typename ExpressionType::PlainObject
+Flagged<ExpressionType,Added,Removed>::solveTriangular(const MatrixBase<OtherDerived>& other) const
+{
+ return m_matrix.template triangularView<Added>().solve(other.derived());
+}
+
+template<typename ExpressionType, unsigned int Added, unsigned int Removed>
+template<typename OtherDerived>
+void Flagged<ExpressionType,Added,Removed>::solveTriangularInPlace(const MatrixBase<OtherDerived>& other) const
+{
+ m_matrix.template triangularView<Added>().solveInPlace(other.derived());
+}
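+
+// A minimal sketch of the Eigen2-style spelling these shims keep alive, assuming
+// the Eigen2 support marked<>() API (which returns a Flagged expression) is enabled:
+// \code
+//   Matrix3d A = Matrix3d::Random();
+//   Vector3d x = Vector3d::Random();
+//   A.marked<UpperTriangular>().solveTriangularInPlace(x); // x now solves the upper triangular system
+// \endcode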
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIANGULAR_SOLVER2_H
diff --git a/third_party/eigen3/Eigen/src/Eigen2Support/VectorBlock.h b/third_party/eigen3/Eigen/src/Eigen2Support/VectorBlock.h
new file mode 100644
index 0000000000..71a8080a9f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigen2Support/VectorBlock.h
@@ -0,0 +1,94 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN2_VECTORBLOCK_H
+#define EIGEN2_VECTORBLOCK_H
+
+namespace Eigen {
+
+/** \deprecated use DenseBase::head(Index) */
+template<typename Derived>
+inline VectorBlock<Derived>
+MatrixBase<Derived>::start(Index size)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<Derived>(derived(), 0, size);
+}
+
+/** \deprecated use DenseBase::head(Index) */
+template<typename Derived>
+inline const VectorBlock<const Derived>
+MatrixBase<Derived>::start(Index size) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<const Derived>(derived(), 0, size);
+}
+
+/** \deprecated use DenseBase::tail(Index) */
+template<typename Derived>
+inline VectorBlock<Derived>
+MatrixBase<Derived>::end(Index size)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<Derived>(derived(), this->size() - size, size);
+}
+
+/** \deprecated use DenseBase::tail(Index) */
+template<typename Derived>
+inline const VectorBlock<const Derived>
+MatrixBase<Derived>::end(Index size) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<const Derived>(derived(), this->size() - size, size);
+}
+
+/** \deprecated use DenseBase::head() */
+template<typename Derived>
+template<int Size>
+inline VectorBlock<Derived,Size>
+MatrixBase<Derived>::start()
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<Derived,Size>(derived(), 0);
+}
+
+/** \deprecated use DenseBase::head() */
+template<typename Derived>
+template<int Size>
+inline const VectorBlock<const Derived,Size>
+MatrixBase<Derived>::start() const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<const Derived,Size>(derived(), 0);
+}
+
+/** \deprecated use DenseBase::tail() */
+template<typename Derived>
+template<int Size>
+inline VectorBlock<Derived,Size>
+MatrixBase<Derived>::end()
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<Derived, Size>(derived(), size() - Size);
+}
+
+/** \deprecated use DenseBase::tail() */
+template<typename Derived>
+template<int Size>
+inline const VectorBlock<const Derived,Size>
+MatrixBase<Derived>::end() const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return VectorBlock<const Derived, Size>(derived(), size() - Size);
+}
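+
+// A minimal sketch contrasting the deprecated Eigen2 names above with their
+// Eigen3 replacements (both spellings select the same sub-vector):
+// \code
+//   VectorXd v = VectorXd::Random(10);
+//   VectorXd a = v.start(3); // deprecated, equivalent to v.head(3)
+//   VectorXd b = v.end(4);   // deprecated, equivalent to v.tail(4)
+// \endcode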
+
+} // end namespace Eigen
+
+#endif // EIGEN2_VECTORBLOCK_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/ComplexEigenSolver.h b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexEigenSolver.h
new file mode 100644
index 0000000000..af434bc9bd
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexEigenSolver.h
@@ -0,0 +1,333 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Claire Maurice
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010,2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX_EIGEN_SOLVER_H
+#define EIGEN_COMPLEX_EIGEN_SOLVER_H
+
+#include "./ComplexSchur.h"
+
+namespace Eigen {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class ComplexEigenSolver
+ *
+ * \brief Computes eigenvalues and eigenvectors of general complex matrices
+ *
+ * \tparam _MatrixType the type of the matrix of which we are
+ * computing the eigendecomposition; this is expected to be an
+ * instantiation of the Matrix class template.
+ *
+ * The eigenvalues and eigenvectors of a matrix \f$ A \f$ are scalars
+ * \f$ \lambda \f$ and vectors \f$ v \f$ such that \f$ Av = \lambda v
+ * \f$. If \f$ D \f$ is a diagonal matrix with the eigenvalues on
+ * the diagonal, and \f$ V \f$ is a matrix with the eigenvectors as
+ * its columns, then \f$ A V = V D \f$. The matrix \f$ V \f$ is
+ * almost always invertible, in which case we have \f$ A = V D V^{-1}
+ * \f$. This is called the eigendecomposition.
+ *
+ * The main function in this class is compute(), which computes the
+ * eigenvalues and eigenvectors of a given matrix. The
+ * documentation for that function contains an example showing the
+ * main features of the class.
+ *
+ * \sa class EigenSolver, class SelfAdjointEigenSolver
+ */
+template<typename _MatrixType> class ComplexEigenSolver
+{
+ public:
+
+ /** \brief Synonym for the template parameter \p _MatrixType. */
+ typedef _MatrixType MatrixType;
+
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+
+ /** \brief Scalar type for matrices of type #MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Complex scalar type for #MatrixType.
+ *
+ * This is \c std::complex<Scalar> if #Scalar is real (e.g.,
+ * \c float or \c double) and just \c Scalar if #Scalar is
+ * complex.
+ */
+ typedef std::complex<RealScalar> ComplexScalar;
+
+ /** \brief Type for vector of eigenvalues as returned by eigenvalues().
+ *
+ * This is a column vector with entries of type #ComplexScalar.
+ * The length of the vector is the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, ColsAtCompileTime, 1, Options&(~RowMajor), MaxColsAtCompileTime, 1> EigenvalueType;
+
+ /** \brief Type for matrix of eigenvectors as returned by eigenvectors().
+ *
+ * This is a square matrix with entries of type #ComplexScalar.
+ * The size is the same as the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, RowsAtCompileTime, ColsAtCompileTime, Options, MaxRowsAtCompileTime, MaxColsAtCompileTime> EigenvectorType;
+
+ /** \brief Default constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute().
+ */
+ ComplexEigenSolver()
+ : m_eivec(),
+ m_eivalues(),
+ m_schur(),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_matX()
+ {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa ComplexEigenSolver()
+ */
+ ComplexEigenSolver(Index size)
+ : m_eivec(size, size),
+ m_eivalues(size),
+ m_schur(size),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_matX(size, size)
+ {}
+
+ /** \brief Constructor; computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are
+ * computed.
+ *
+ * This constructor calls compute() to compute the eigendecomposition.
+ */
+ ComplexEigenSolver(const MatrixType& matrix, bool computeEigenvectors = true)
+ : m_eivec(matrix.rows(),matrix.cols()),
+ m_eivalues(matrix.cols()),
+ m_schur(matrix.rows()),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_matX(matrix.rows(),matrix.cols())
+ {
+ compute(matrix, computeEigenvectors);
+ }
+
+ /** \brief Returns the eigenvectors of given matrix.
+ *
+ * \returns A const reference to the matrix whose columns are the eigenvectors.
+ *
+ * \pre Either the constructor
+ * ComplexEigenSolver(const MatrixType& matrix, bool) or the member
+ * function compute(const MatrixType& matrix, bool) has been called before
+ * to compute the eigendecomposition of a matrix, and
+ * \p computeEigenvectors was set to true (the default).
+ *
+ * This function returns a matrix whose columns are the eigenvectors. Column
+ * \f$ k \f$ is an eigenvector corresponding to eigenvalue number \f$ k
+ * \f$ as returned by eigenvalues(). The eigenvectors are normalized to
+ * have (Euclidean) norm equal to one. The matrix returned by this
+ * function is the matrix \f$ V \f$ in the eigendecomposition \f$ A = V D
+ * V^{-1} \f$, if it exists.
+ *
+ * Example: \include ComplexEigenSolver_eigenvectors.cpp
+ * Output: \verbinclude ComplexEigenSolver_eigenvectors.out
+ */
+ const EigenvectorType& eigenvectors() const
+ {
+ eigen_assert(m_isInitialized && "ComplexEigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ return m_eivec;
+ }
+
+ /** \brief Returns the eigenvalues of given matrix.
+ *
+ * \returns A const reference to the column vector containing the eigenvalues.
+ *
+ * \pre Either the constructor
+ * ComplexEigenSolver(const MatrixType& matrix, bool) or the member
+ * function compute(const MatrixType& matrix, bool) has been called before
+ * to compute the eigendecomposition of a matrix.
+ *
+ * This function returns a column vector containing the
+ * eigenvalues. Eigenvalues are repeated according to their
+ * algebraic multiplicity, so there are as many eigenvalues as
+ * rows in the matrix. The eigenvalues are not sorted in any particular
+ * order.
+ *
+ * Example: \include ComplexEigenSolver_eigenvalues.cpp
+ * Output: \verbinclude ComplexEigenSolver_eigenvalues.out
+ */
+ const EigenvalueType& eigenvalues() const
+ {
+ eigen_assert(m_isInitialized && "ComplexEigenSolver is not initialized.");
+ return m_eivalues;
+ }
+
+ /** \brief Computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are
+ * computed.
+ * \returns Reference to \c *this
+ *
+ * This function computes the eigenvalues of the complex matrix \p matrix.
+ * The eigenvalues() function can be used to retrieve them. If
+ * \p computeEigenvectors is true, then the eigenvectors are also computed
+ * and can be retrieved by calling eigenvectors().
+ *
+ * The matrix is first reduced to Schur form using the
+ * ComplexSchur class. The Schur decomposition is then used to
+ * compute the eigenvalues and eigenvectors.
+ *
+ * The cost of the computation is dominated by the cost of the
+ * Schur decomposition, which is \f$ O(n^3) \f$ where \f$ n \f$
+ * is the size of the matrix.
+ *
+ * Example: \include ComplexEigenSolver_compute.cpp
+ * Output: \verbinclude ComplexEigenSolver_compute.out
+ */
+ ComplexEigenSolver& compute(const MatrixType& matrix, bool computeEigenvectors = true);
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful, \c NoConvergence otherwise.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "ComplexEigenSolver is not initialized.");
+ return m_schur.info();
+ }
+
+ /** \brief Sets the maximum number of iterations allowed. */
+ ComplexEigenSolver& setMaxIterations(Index maxIters)
+ {
+ m_schur.setMaxIterations(maxIters);
+ return *this;
+ }
+
+ /** \brief Returns the maximum number of iterations. */
+ Index getMaxIterations()
+ {
+ return m_schur.getMaxIterations();
+ }
+
+ protected:
+ EigenvectorType m_eivec;
+ EigenvalueType m_eivalues;
+ ComplexSchur<MatrixType> m_schur;
+ bool m_isInitialized;
+ bool m_eigenvectorsOk;
+ EigenvectorType m_matX;
+
+ private:
+ void doComputeEigenvectors(const RealScalar& matrixnorm);
+ void sortEigenvalues(bool computeEigenvectors);
+};
+
+
+template<typename MatrixType>
+ComplexEigenSolver<MatrixType>&
+ComplexEigenSolver<MatrixType>::compute(const MatrixType& matrix, bool computeEigenvectors)
+{
+  // this code is inspired by Jampack
+ eigen_assert(matrix.cols() == matrix.rows());
+
+ // Do a complex Schur decomposition, A = U T U^*
+ // The eigenvalues are on the diagonal of T.
+ m_schur.compute(matrix, computeEigenvectors);
+
+ if(m_schur.info() == Success)
+ {
+ m_eivalues = m_schur.matrixT().diagonal();
+ if(computeEigenvectors)
+ doComputeEigenvectors(matrix.norm());
+ sortEigenvalues(computeEigenvectors);
+ }
+
+ m_isInitialized = true;
+ m_eigenvectorsOk = computeEigenvectors;
+ return *this;
+}
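+
+// A minimal usage sketch (the 4x4 size is illustrative):
+// \code
+//   MatrixXcd A = MatrixXcd::Random(4, 4);
+//   ComplexEigenSolver<MatrixXcd> ces(A);
+//   if(ces.info() == Success)
+//   {
+//     MatrixXcd V = ces.eigenvectors();
+//     VectorXcd lambda = ces.eigenvalues();
+//     // for each k, A * V.col(k) is approximately lambda(k) * V.col(k)
+//   }
+// \endcode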
+
+
+template<typename MatrixType>
+void ComplexEigenSolver<MatrixType>::doComputeEigenvectors(const RealScalar& matrixnorm)
+{
+ const Index n = m_eivalues.size();
+
+ // Compute X such that T = X D X^(-1), where D is the diagonal of T.
+ // The matrix X is unit triangular.
+ m_matX = EigenvectorType::Zero(n, n);
+ for(Index k=n-1 ; k>=0 ; k--)
+ {
+ m_matX.coeffRef(k,k) = ComplexScalar(1.0,0.0);
+ // Compute X(i,k) using the (i,k) entry of the equation X T = D X
+ for(Index i=k-1 ; i>=0 ; i--)
+ {
+ m_matX.coeffRef(i,k) = -m_schur.matrixT().coeff(i,k);
+ if(k-i-1>0)
+ m_matX.coeffRef(i,k) -= (m_schur.matrixT().row(i).segment(i+1,k-i-1) * m_matX.col(k).segment(i+1,k-i-1)).value();
+ ComplexScalar z = m_schur.matrixT().coeff(i,i) - m_schur.matrixT().coeff(k,k);
+ if(z==ComplexScalar(0))
+ {
+ // If the i-th and k-th eigenvalue are equal, then z equals 0.
+ // Use a small value instead, to prevent division by zero.
+ numext::real_ref(z) = NumTraits<RealScalar>::epsilon() * matrixnorm;
+ }
+ m_matX.coeffRef(i,k) = m_matX.coeff(i,k) / z;
+ }
+ }
+
+ // Compute V as V = U X; now A = U T U^* = U X D X^(-1) U^* = V D V^(-1)
+ m_eivec.noalias() = m_schur.matrixU() * m_matX;
+ // .. and normalize the eigenvectors
+ for(Index k=0 ; k<n ; k++)
+ {
+ m_eivec.col(k).normalize();
+ }
+}
+
+
+template<typename MatrixType>
+void ComplexEigenSolver<MatrixType>::sortEigenvalues(bool computeEigenvectors)
+{
+ const Index n = m_eivalues.size();
+ for (Index i=0; i<n; i++)
+ {
+ Index k;
+ m_eivalues.cwiseAbs().tail(n-i).minCoeff(&k);
+ if (k != 0)
+ {
+ k += i;
+ std::swap(m_eivalues[k],m_eivalues[i]);
+ if(computeEigenvectors)
+ m_eivec.col(i).swap(m_eivec.col(k));
+ }
+ }
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_EIGEN_SOLVER_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur.h b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur.h
new file mode 100644
index 0000000000..89e6cade33
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur.h
@@ -0,0 +1,456 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Claire Maurice
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010,2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPLEX_SCHUR_H
+#define EIGEN_COMPLEX_SCHUR_H
+
+#include "./HessenbergDecomposition.h"
+
+namespace Eigen {
+
+namespace internal {
+template<typename MatrixType, bool IsComplex> struct complex_schur_reduce_to_hessenberg;
+}
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class ComplexSchur
+ *
+ * \brief Performs a complex Schur decomposition of a real or complex square matrix
+ *
+ * \tparam _MatrixType the type of the matrix of which we are
+ * computing the Schur decomposition; this is expected to be an
+ * instantiation of the Matrix class template.
+ *
+ * Given a real or complex square matrix A, this class computes the
+ * Schur decomposition: \f$ A = U T U^*\f$ where U is a unitary
+ * complex matrix, and T is a complex upper triangular matrix. The
+ * diagonal of the matrix T corresponds to the eigenvalues of the
+ * matrix A.
+ *
+ * Call the function compute() to compute the Schur decomposition of
+ * a given matrix. Alternatively, you can use the
+ * ComplexSchur(const MatrixType&, bool) constructor which computes
+ * the Schur decomposition at construction time. Once the
+ * decomposition is computed, you can use the matrixU() and matrixT()
+ * functions to retrieve the matrices U and T in the decomposition.
+ *
+ * \note This code is inspired by Jampack
+ *
+ * \sa class RealSchur, class EigenSolver, class ComplexEigenSolver
+ */
+template<typename _MatrixType> class ComplexSchur
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+
+ /** \brief Scalar type for matrices of type \p _MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Complex scalar type for \p _MatrixType.
+ *
+ * This is \c std::complex<Scalar> if #Scalar is real (e.g.,
+ * \c float or \c double) and just \c Scalar if #Scalar is
+ * complex.
+ */
+ typedef std::complex<RealScalar> ComplexScalar;
+
+ /** \brief Type for the matrices in the Schur decomposition.
+ *
+ * This is a square matrix with entries of type #ComplexScalar.
+ * The size is the same as the size of \p _MatrixType.
+ */
+ typedef Matrix<ComplexScalar, RowsAtCompileTime, ColsAtCompileTime, Options, MaxRowsAtCompileTime, MaxColsAtCompileTime> ComplexMatrixType;
+
+ /** \brief Default constructor.
+ *
+ * \param [in] size Positive integer, size of the matrix whose Schur decomposition will be computed.
+ *
+ * The default constructor is useful in cases in which the user
+ * intends to perform decompositions via compute(). The \p size
+ * parameter is only used as a hint. It is not an error to give a
+ * wrong \p size, but it may impair performance.
+ *
+ * \sa compute() for an example.
+ */
+ ComplexSchur(Index size = RowsAtCompileTime==Dynamic ? 1 : RowsAtCompileTime)
+ : m_matT(size,size),
+ m_matU(size,size),
+ m_hess(size),
+ m_isInitialized(false),
+ m_matUisUptodate(false),
+ m_maxIters(-1)
+ {}
+
+ /** \brief Constructor; computes Schur decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Schur decomposition is to be computed.
+ * \param[in] computeU If true, both T and U are computed; if false, only T is computed.
+ *
+ * This constructor calls compute() to compute the Schur decomposition.
+ *
+ * \sa matrixT() and matrixU() for examples.
+ */
+ ComplexSchur(const MatrixType& matrix, bool computeU = true)
+ : m_matT(matrix.rows(),matrix.cols()),
+ m_matU(matrix.rows(),matrix.cols()),
+ m_hess(matrix.rows()),
+ m_isInitialized(false),
+ m_matUisUptodate(false),
+ m_maxIters(-1)
+ {
+ compute(matrix, computeU);
+ }
+
+ /** \brief Returns the unitary matrix in the Schur decomposition.
+ *
+ * \returns A const reference to the matrix U.
+ *
+ * It is assumed that either the constructor
+ * ComplexSchur(const MatrixType& matrix, bool computeU) or the
+ * member function compute(const MatrixType& matrix, bool computeU)
+ * has been called before to compute the Schur decomposition of a
+ * matrix, and that \p computeU was set to true (the default
+ * value).
+ *
+ * Example: \include ComplexSchur_matrixU.cpp
+ * Output: \verbinclude ComplexSchur_matrixU.out
+ */
+ const ComplexMatrixType& matrixU() const
+ {
+ eigen_assert(m_isInitialized && "ComplexSchur is not initialized.");
+ eigen_assert(m_matUisUptodate && "The matrix U has not been computed during the ComplexSchur decomposition.");
+ return m_matU;
+ }
+
+ /** \brief Returns the triangular matrix in the Schur decomposition.
+ *
+ * \returns A const reference to the matrix T.
+ *
+ * It is assumed that either the constructor
+ * ComplexSchur(const MatrixType& matrix, bool computeU) or the
+ * member function compute(const MatrixType& matrix, bool computeU)
+ * has been called before to compute the Schur decomposition of a
+ * matrix.
+ *
+ * Note that this function returns a plain square matrix. If you want to reference
+ * only the upper triangular part, use:
+ * \code schur.matrixT().triangularView<Upper>() \endcode
+ *
+ * Example: \include ComplexSchur_matrixT.cpp
+ * Output: \verbinclude ComplexSchur_matrixT.out
+ */
+ const ComplexMatrixType& matrixT() const
+ {
+ eigen_assert(m_isInitialized && "ComplexSchur is not initialized.");
+ return m_matT;
+ }
+
+ /** \brief Computes Schur decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Schur decomposition is to be computed.
+ * \param[in] computeU If true, both T and U are computed; if false, only T is computed.
+ *
+ * \returns Reference to \c *this
+ *
+ * The Schur decomposition is computed by first reducing the
+ * matrix to Hessenberg form using the class
+ * HessenbergDecomposition. The Hessenberg matrix is then reduced
+ * to triangular form by performing QR iterations with a single
+ * shift. The cost of computing the Schur decomposition depends
+ * on the number of iterations; as a rough guide, it may be taken
+ * to be \f$25n^3\f$ complex flops, or \f$10n^3\f$ complex flops
+ * if \a computeU is false.
+ *
+ * Example: \include ComplexSchur_compute.cpp
+ * Output: \verbinclude ComplexSchur_compute.out
+ *
+ * \sa compute(const MatrixType&, bool, Index)
+ */
+ ComplexSchur& compute(const MatrixType& matrix, bool computeU = true);
+
+ /** \brief Computes the Schur decomposition from a given Hessenberg matrix.
+ * \param[in] matrixH Matrix in Hessenberg form H
+ * \param[in] matrixQ Orthogonal matrix Q that transforms a matrix A to H : A = Q H Q^T
+ * \param computeU Whether to compute the matrix U of the Schur vectors
+ * \return Reference to \c *this
+ *
+ * This routine assumes that the matrix has already been reduced to the Hessenberg form \p matrixH,
+ * using either the class HessenbergDecomposition or some other means.
+ * It computes the upper triangular matrix T of the Schur decomposition of H.
+ * When \p computeU is true, this routine also computes the matrix U such that
+ * A = U T U^T = (QZ) T (QZ)^T = Q H Q^T, where A is the initial matrix.
+ *
+ * NOTE: Q is referenced if \p computeU is true; so, if the initial orthogonal matrix
+ * is not available, the user should provide an identity matrix (Q.setIdentity()).
+ *
+ * \sa compute(const MatrixType&, bool)
+ */
+ template<typename HessMatrixType, typename OrthMatrixType>
+ ComplexSchur& computeFromHessenberg(const HessMatrixType& matrixH, const OrthMatrixType& matrixQ, bool computeU=true);
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful, \c NoConvergence otherwise.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "ComplexSchur is not initialized.");
+ return m_info;
+ }
+
+ /** \brief Sets the maximum number of iterations allowed.
+ *
+ * If not specified by the user, the maximum number of iterations is m_maxIterationsPerRow times the size
+ * of the matrix.
+ */
+ ComplexSchur& setMaxIterations(Index maxIters)
+ {
+ m_maxIters = maxIters;
+ return *this;
+ }
+
+ /** \brief Returns the maximum number of iterations. */
+ Index getMaxIterations()
+ {
+ return m_maxIters;
+ }
+
+ /** \brief Maximum number of iterations per row.
+ *
+ * If not otherwise specified, the maximum number of iterations is this number times the size of the
+ * matrix. It is currently set to 30.
+ */
+ static const int m_maxIterationsPerRow = 30;
+
+ protected:
+ ComplexMatrixType m_matT, m_matU;
+ HessenbergDecomposition<MatrixType> m_hess;
+ ComputationInfo m_info;
+ bool m_isInitialized;
+ bool m_matUisUptodate;
+ Index m_maxIters;
+
+ private:
+ bool subdiagonalEntryIsNeglegible(Index i);
+ ComplexScalar computeShift(Index iu, Index iter);
+ void reduceToTriangularForm(bool computeU);
+ friend struct internal::complex_schur_reduce_to_hessenberg<MatrixType, NumTraits<Scalar>::IsComplex>;
+};
+
+/** If m_matT(i+1,i) is negligible in floating point arithmetic
+ * compared to m_matT(i,i) and m_matT(j,j), then set it to zero and
+ * return true, else return false. */
+template<typename MatrixType>
+inline bool ComplexSchur<MatrixType>::subdiagonalEntryIsNeglegible(Index i)
+{
+ RealScalar d = numext::norm1(m_matT.coeff(i,i)) + numext::norm1(m_matT.coeff(i+1,i+1));
+ RealScalar sd = numext::norm1(m_matT.coeff(i+1,i));
+ if (internal::isMuchSmallerThan(sd, d, NumTraits<RealScalar>::epsilon()))
+ {
+ m_matT.coeffRef(i+1,i) = ComplexScalar(0);
+ return true;
+ }
+ return false;
+}
+
+
+/** Compute the shift in the current QR iteration. */
+template<typename MatrixType>
+typename ComplexSchur<MatrixType>::ComplexScalar ComplexSchur<MatrixType>::computeShift(Index iu, Index iter)
+{
+ using std::abs;
+ if (iter == 10 || iter == 20)
+ {
+ // exceptional shift, taken from http://www.netlib.org/eispack/comqr.f
+ return abs(numext::real(m_matT.coeff(iu,iu-1))) + abs(numext::real(m_matT.coeff(iu-1,iu-2)));
+ }
+
+ // compute the shift as one of the eigenvalues of t, the 2x2
+ // diagonal block on the bottom of the active submatrix
+ Matrix<ComplexScalar,2,2> t = m_matT.template block<2,2>(iu-1,iu-1);
+ RealScalar normt = t.cwiseAbs().sum();
+ t /= normt; // the normalization by normt is to avoid under/overflow
+
+ ComplexScalar b = t.coeff(0,1) * t.coeff(1,0);
+ ComplexScalar c = t.coeff(0,0) - t.coeff(1,1);
+ ComplexScalar disc = sqrt(c*c + RealScalar(4)*b);
+ ComplexScalar det = t.coeff(0,0) * t.coeff(1,1) - b;
+ ComplexScalar trace = t.coeff(0,0) + t.coeff(1,1);
+ ComplexScalar eival1 = (trace + disc) / RealScalar(2);
+ ComplexScalar eival2 = (trace - disc) / RealScalar(2);
+
+ if(numext::norm1(eival1) > numext::norm1(eival2))
+ eival2 = det / eival1;
+ else
+ eival1 = det / eival2;
+
+ // choose the eigenvalue closest to the bottom entry of the diagonal
+ if(numext::norm1(eival1-t.coeff(1,1)) < numext::norm1(eival2-t.coeff(1,1)))
+ return normt * eival1;
+ else
+ return normt * eival2;
+}
+
+
+template<typename MatrixType>
+ComplexSchur<MatrixType>& ComplexSchur<MatrixType>::compute(const MatrixType& matrix, bool computeU)
+{
+ m_matUisUptodate = false;
+ eigen_assert(matrix.cols() == matrix.rows());
+
+ if(matrix.cols() == 1)
+ {
+ m_matT = matrix.template cast<ComplexScalar>();
+ if(computeU) m_matU = ComplexMatrixType::Identity(1,1);
+ m_info = Success;
+ m_isInitialized = true;
+ m_matUisUptodate = computeU;
+ return *this;
+ }
+
+ internal::complex_schur_reduce_to_hessenberg<MatrixType, NumTraits<Scalar>::IsComplex>::run(*this, matrix, computeU);
+ computeFromHessenberg(m_matT, m_matU, computeU);
+ return *this;
+}
+
+template<typename MatrixType>
+template<typename HessMatrixType, typename OrthMatrixType>
+ComplexSchur<MatrixType>& ComplexSchur<MatrixType>::computeFromHessenberg(const HessMatrixType& matrixH, const OrthMatrixType& matrixQ, bool computeU)
+{
+ m_matT = matrixH;
+ if(computeU)
+ m_matU = matrixQ;
+ reduceToTriangularForm(computeU);
+ return *this;
+}
+namespace internal {
+
+/* Reduce given matrix to Hessenberg form */
+template<typename MatrixType, bool IsComplex>
+struct complex_schur_reduce_to_hessenberg
+{
+ // this is the implementation for the case IsComplex = true
+ static void run(ComplexSchur<MatrixType>& _this, const MatrixType& matrix, bool computeU)
+ {
+ _this.m_hess.compute(matrix);
+ _this.m_matT = _this.m_hess.matrixH();
+ if(computeU) _this.m_matU = _this.m_hess.matrixQ();
+ }
+};
+
+template<typename MatrixType>
+struct complex_schur_reduce_to_hessenberg<MatrixType, false>
+{
+ static void run(ComplexSchur<MatrixType>& _this, const MatrixType& matrix, bool computeU)
+ {
+ typedef typename ComplexSchur<MatrixType>::ComplexScalar ComplexScalar;
+
+ // Note: m_hess is over RealScalar; m_matT and m_matU are over ComplexScalar
+ _this.m_hess.compute(matrix);
+ _this.m_matT = _this.m_hess.matrixH().template cast<ComplexScalar>();
+ if(computeU)
+ {
+ // This may cause an allocation which seems to be avoidable
+ MatrixType Q = _this.m_hess.matrixQ();
+ _this.m_matU = Q.template cast<ComplexScalar>();
+ }
+ }
+};
+
+} // end namespace internal
+
+// Reduce the Hessenberg matrix m_matT to triangular form by QR iteration.
+template<typename MatrixType>
+void ComplexSchur<MatrixType>::reduceToTriangularForm(bool computeU)
+{
+ Index maxIters = m_maxIters;
+ if (maxIters == -1)
+ maxIters = m_maxIterationsPerRow * m_matT.rows();
+
+ // The matrix m_matT is divided into three parts.
+ // Rows 0,...,il-1 are decoupled from the rest because m_matT(il,il-1) is zero.
+ // Rows il,...,iu are the part we are working on (the active submatrix).
+ // Rows iu+1,...,end have already been brought into triangular form.
+ Index iu = m_matT.cols() - 1;
+ Index il;
+ Index iter = 0; // number of iterations we are working on the (iu,iu) element
+ Index totalIter = 0; // number of iterations for whole matrix
+
+ while(true)
+ {
+ // find iu, the bottom row of the active submatrix
+ while(iu > 0)
+ {
+ if(!subdiagonalEntryIsNeglegible(iu-1)) break;
+ iter = 0;
+ --iu;
+ }
+
+ // if iu is zero then we are done; the whole matrix is triangularized
+ if(iu==0) break;
+
+ // if we spent too many iterations, we give up
+ iter++;
+ totalIter++;
+ if(totalIter > maxIters) break;
+
+ // find il, the top row of the active submatrix
+ il = iu-1;
+ while(il > 0 && !subdiagonalEntryIsNeglegible(il-1))
+ {
+ --il;
+ }
+
+ /* perform the QR step using Givens rotations. The first rotation
+ creates a bulge; the (il+2,il) element becomes nonzero. This
+ bulge is chased down to the bottom of the active submatrix. */
+
+ ComplexScalar shift = computeShift(iu, iter);
+ JacobiRotation<ComplexScalar> rot;
+ rot.makeGivens(m_matT.coeff(il,il) - shift, m_matT.coeff(il+1,il));
+ m_matT.rightCols(m_matT.cols()-il).applyOnTheLeft(il, il+1, rot.adjoint());
+ m_matT.topRows((std::min)(il+2,iu)+1).applyOnTheRight(il, il+1, rot);
+ if(computeU) m_matU.applyOnTheRight(il, il+1, rot);
+
+ for(Index i=il+1 ; i<iu ; i++)
+ {
+ rot.makeGivens(m_matT.coeffRef(i,i-1), m_matT.coeffRef(i+1,i-1), &m_matT.coeffRef(i,i-1));
+ m_matT.coeffRef(i+1,i-1) = ComplexScalar(0);
+ m_matT.rightCols(m_matT.cols()-i).applyOnTheLeft(i, i+1, rot.adjoint());
+ m_matT.topRows((std::min)(i+2,iu)+1).applyOnTheRight(i, i+1, rot);
+ if(computeU) m_matU.applyOnTheRight(i, i+1, rot);
+ }
+ }
+
+ if(totalIter <= maxIters)
+ m_info = Success;
+ else
+ m_info = NoConvergence;
+
+ m_isInitialized = true;
+ m_matUisUptodate = computeU;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_SCHUR_H
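
A minimal usage sketch for the ComplexSchur class added above (not part of the diff; it assumes the vendored Eigen headers are on the include path, and the 4x4 size is an arbitrary illustrative choice):

    #include <iostream>
    #include <Eigen/Eigenvalues>   // provides Eigen::ComplexSchur

    int main() {
      Eigen::Matrix4cd A = Eigen::Matrix4cd::Random();

      // Computes both T (upper triangular) and U (unitary) at construction.
      Eigen::ComplexSchur<Eigen::Matrix4cd> schur(A);
      if (schur.info() != Eigen::Success) return 1;

      const Eigen::Matrix4cd& T = schur.matrixT();
      const Eigen::Matrix4cd& U = schur.matrixU();

      // A = U T U^* should hold up to rounding error.
      std::cout << "||A - U T U^*|| = " << (A - U * T * U.adjoint()).norm() << "\n";
      return 0;
    }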
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur_MKL.h b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur_MKL.h
new file mode 100644
index 0000000000..91496ae5bd
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/ComplexSchur_MKL.h
@@ -0,0 +1,94 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Complex Schur decomposition needed for complex non-symmetric eigenvalue/eigenvector problems.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_COMPLEX_SCHUR_MKL_H
+#define EIGEN_COMPLEX_SCHUR_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_SCHUR_COMPLEX(EIGTYPE, MKLTYPE, MKLPREFIX, MKLPREFIX_U, EIGCOLROW, MKLCOLROW) \
+template<> inline \
+ComplexSchur<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >& \
+ComplexSchur<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >::compute(const Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW>& matrix, bool computeU) \
+{ \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> MatrixType; \
+ typedef MatrixType::Scalar Scalar; \
+ typedef MatrixType::RealScalar RealScalar; \
+ typedef std::complex<RealScalar> ComplexScalar; \
+\
+ eigen_assert(matrix.cols() == matrix.rows()); \
+\
+ m_matUisUptodate = false; \
+ if(matrix.cols() == 1) \
+ { \
+ m_matT = matrix.cast<ComplexScalar>(); \
+ if(computeU) m_matU = ComplexMatrixType::Identity(1,1); \
+ m_info = Success; \
+ m_isInitialized = true; \
+ m_matUisUptodate = computeU; \
+ return *this; \
+ } \
+ lapack_int n = matrix.cols(), sdim, info; \
+ lapack_int lda = matrix.outerStride(); \
+ lapack_int matrix_order = MKLCOLROW; \
+ char jobvs, sort='N'; \
+ LAPACK_##MKLPREFIX_U##_SELECT1 select = 0; \
+ jobvs = (computeU) ? 'V' : 'N'; \
+ m_matU.resize(n, n); \
+ lapack_int ldvs = m_matU.outerStride(); \
+ m_matT = matrix; \
+ Matrix<EIGTYPE, Dynamic, Dynamic> w; \
+ w.resize(n, 1);\
+ info = LAPACKE_##MKLPREFIX##gees( matrix_order, jobvs, sort, select, n, (MKLTYPE*)m_matT.data(), lda, &sdim, (MKLTYPE*)w.data(), (MKLTYPE*)m_matU.data(), ldvs ); \
+ if(info == 0) \
+ m_info = Success; \
+ else \
+ m_info = NoConvergence; \
+\
+ m_isInitialized = true; \
+ m_matUisUptodate = computeU; \
+ return *this; \
+\
+}
+
+EIGEN_MKL_SCHUR_COMPLEX(dcomplex, MKL_Complex16, z, Z, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SCHUR_COMPLEX(scomplex, MKL_Complex8, c, C, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SCHUR_COMPLEX(dcomplex, MKL_Complex16, z, Z, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_SCHUR_COMPLEX(scomplex, MKL_Complex8, c, C, RowMajor, LAPACK_ROW_MAJOR)
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPLEX_SCHUR_MKL_H
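
The macro above only takes effect in MKL-enabled builds. A sketch of how that path would typically be selected (assuming an MKL installation and link line, and that this vendored Eigen is configured like upstream, none of which is part of this diff):

    // Defining EIGEN_USE_MKL_ALL before including Eigen enables the LAPACKE-backed
    // specializations, so the same ComplexSchur call as before is routed to ?gees.
    #define EIGEN_USE_MKL_ALL
    #include <Eigen/Eigenvalues>

    int main() {
      Eigen::MatrixXcd A = Eigen::MatrixXcd::Random(200, 200);
      Eigen::ComplexSchur<Eigen::MatrixXcd> schur(A);   // LAPACKE_zgees under the hood
      return schur.info() == Eigen::Success ? 0 : 1;
    }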
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/EigenSolver.h b/third_party/eigen3/Eigen/src/Eigenvalues/EigenSolver.h
new file mode 100644
index 0000000000..1763fed197
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/EigenSolver.h
@@ -0,0 +1,629 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010,2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_EIGENSOLVER_H
+#define EIGEN_EIGENSOLVER_H
+
+#include "./RealSchur.h"
+
+namespace Eigen {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class EigenSolver
+ *
+ * \brief Computes eigenvalues and eigenvectors of general matrices
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * eigendecomposition; this is expected to be an instantiation of the Matrix
+ * class template. Currently, only real matrices are supported.
+ *
+ * The eigenvalues and eigenvectors of a matrix \f$ A \f$ are scalars
+ * \f$ \lambda \f$ and vectors \f$ v \f$ such that \f$ Av = \lambda v \f$. If
+ * \f$ D \f$ is a diagonal matrix with the eigenvalues on the diagonal, and
+ * \f$ V \f$ is a matrix with the eigenvectors as its columns, then \f$ A V =
+ * V D \f$. The matrix \f$ V \f$ is almost always invertible, in which case we
+ * have \f$ A = V D V^{-1} \f$. This is called the eigendecomposition.
+ *
+ * The eigenvalues and eigenvectors of a matrix may be complex, even when the
+ * matrix is real. However, we can choose real matrices \f$ V \f$ and \f$ D
+ * \f$ satisfying \f$ A V = V D \f$, just like the eigendecomposition, if the
+ * matrix \f$ D \f$ is not required to be diagonal, but if it is allowed to
+ * have blocks of the form
+ * \f[ \begin{bmatrix} u & v \\ -v & u \end{bmatrix} \f]
+ * (where \f$ u \f$ and \f$ v \f$ are real numbers) on the diagonal. These
+ * blocks correspond to complex eigenvalue pairs \f$ u \pm iv \f$. We call
+ * this variant of the eigendecomposition the pseudo-eigendecomposition.
+ *
+ * Call the function compute() to compute the eigenvalues and eigenvectors of
+ * a given matrix. Alternatively, you can use the
+ * EigenSolver(const MatrixType&, bool) constructor which computes the
+ * eigenvalues and eigenvectors at construction time. Once the eigenvalue and
+ * eigenvectors are computed, they can be retrieved with the eigenvalues() and
+ * eigenvectors() functions. The pseudoEigenvalueMatrix() and
+ * pseudoEigenvectors() methods allow the construction of the
+ * pseudo-eigendecomposition.
+ *
+ * The documentation for EigenSolver(const MatrixType&, bool) contains an
+ * example of the typical use of this class.
+ *
+ * \note The implementation is adapted from
+ * <a href="http://math.nist.gov/javanumerics/jama/">JAMA</a> (public domain).
+ * Their code is based on EISPACK.
+ *
+ * \sa MatrixBase::eigenvalues(), class ComplexEigenSolver, class SelfAdjointEigenSolver
+ */
+template<typename _MatrixType> class EigenSolver
+{
+ public:
+
+ /** \brief Synonym for the template parameter \p _MatrixType. */
+ typedef _MatrixType MatrixType;
+
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+
+ /** \brief Scalar type for matrices of type #MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Complex scalar type for #MatrixType.
+ *
+ * This is \c std::complex<Scalar> if #Scalar is real (e.g.,
+ * \c float or \c double) and just \c Scalar if #Scalar is
+ * complex.
+ */
+ typedef std::complex<RealScalar> ComplexScalar;
+
+ /** \brief Type for vector of eigenvalues as returned by eigenvalues().
+ *
+ * This is a column vector with entries of type #ComplexScalar.
+ * The length of the vector is the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> EigenvalueType;
+
+ /** \brief Type for matrix of eigenvectors as returned by eigenvectors().
+ *
+ * This is a square matrix with entries of type #ComplexScalar.
+ * The size is the same as the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, RowsAtCompileTime, ColsAtCompileTime, Options, MaxRowsAtCompileTime, MaxColsAtCompileTime> EigenvectorsType;
+
+ /** \brief Default constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via EigenSolver::compute(const MatrixType&, bool).
+ *
+ * \sa compute() for an example.
+ */
+ EigenSolver() : m_eivec(), m_eivalues(), m_isInitialized(false), m_realSchur(), m_matT(), m_tmp() {}
+
+ /** \brief Default constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa EigenSolver()
+ */
+ EigenSolver(Index size)
+ : m_eivec(size, size),
+ m_eivalues(size),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_realSchur(size),
+ m_matT(size, size),
+ m_tmp(size)
+ {}
+
+ /** \brief Constructor; computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are
+ * computed.
+ *
+ * This constructor calls compute() to compute the eigenvalues
+ * and eigenvectors.
+ *
+ * Example: \include EigenSolver_EigenSolver_MatrixType.cpp
+ * Output: \verbinclude EigenSolver_EigenSolver_MatrixType.out
+ *
+ * \sa compute()
+ */
+ EigenSolver(const MatrixType& matrix, bool computeEigenvectors = true)
+ : m_eivec(matrix.rows(), matrix.cols()),
+ m_eivalues(matrix.cols()),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_realSchur(matrix.cols()),
+ m_matT(matrix.rows(), matrix.cols()),
+ m_tmp(matrix.cols())
+ {
+ compute(matrix, computeEigenvectors);
+ }
+
+ /** \brief Returns the eigenvectors of given matrix.
+ *
+ * \returns %Matrix whose columns are the (possibly complex) eigenvectors.
+ *
+ * \pre Either the constructor
+ * EigenSolver(const MatrixType&,bool) or the member function
+ * compute(const MatrixType&, bool) has been called before, and
+ * \p computeEigenvectors was set to true (the default).
+ *
+ * Column \f$ k \f$ of the returned matrix is an eigenvector corresponding
+ * to eigenvalue number \f$ k \f$ as returned by eigenvalues(). The
+ * eigenvectors are normalized to have (Euclidean) norm equal to one. The
+ * matrix returned by this function is the matrix \f$ V \f$ in the
+ * eigendecomposition \f$ A = V D V^{-1} \f$, if it exists.
+ *
+ * Example: \include EigenSolver_eigenvectors.cpp
+ * Output: \verbinclude EigenSolver_eigenvectors.out
+ *
+ * \sa eigenvalues(), pseudoEigenvectors()
+ */
+ EigenvectorsType eigenvectors() const;
+
+ /** \brief Returns the pseudo-eigenvectors of given matrix.
+ *
+ * \returns Const reference to matrix whose columns are the pseudo-eigenvectors.
+ *
+ * \pre Either the constructor
+ * EigenSolver(const MatrixType&,bool) or the member function
+ * compute(const MatrixType&, bool) has been called before, and
+ * \p computeEigenvectors was set to true (the default).
+ *
+ * The real matrix \f$ V \f$ returned by this function and the
+ * block-diagonal matrix \f$ D \f$ returned by pseudoEigenvalueMatrix()
+ * satisfy \f$ AV = VD \f$.
+ *
+ * Example: \include EigenSolver_pseudoEigenvectors.cpp
+ * Output: \verbinclude EigenSolver_pseudoEigenvectors.out
+ *
+ * \sa pseudoEigenvalueMatrix(), eigenvectors()
+ */
+ const MatrixType& pseudoEigenvectors() const
+ {
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ return m_eivec;
+ }
+
+ /** \brief Returns the block-diagonal matrix in the pseudo-eigendecomposition.
+ *
+ * \returns A block-diagonal matrix.
+ *
+ * \pre Either the constructor
+ * EigenSolver(const MatrixType&,bool) or the member function
+ * compute(const MatrixType&, bool) has been called before.
+ *
+ * The matrix \f$ D \f$ returned by this function is real and
+ * block-diagonal. The blocks on the diagonal are either 1-by-1 or 2-by-2
+ * blocks of the form
+ * \f$ \begin{bmatrix} u & v \\ -v & u \end{bmatrix} \f$.
+ * These blocks are not sorted in any particular order.
+ * The matrix \f$ D \f$ and the matrix \f$ V \f$ returned by
+ * pseudoEigenvectors() satisfy \f$ AV = VD \f$.
+ *
+ * \sa pseudoEigenvectors() for an example, eigenvalues()
+ */
+ MatrixType pseudoEigenvalueMatrix() const;
+
+ /** \brief Returns the eigenvalues of given matrix.
+ *
+ * \returns A const reference to the column vector containing the eigenvalues.
+ *
+ * \pre Either the constructor
+ * EigenSolver(const MatrixType&,bool) or the member function
+ * compute(const MatrixType&, bool) has been called before.
+ *
+ * The eigenvalues are repeated according to their algebraic multiplicity,
+ * so there are as many eigenvalues as rows in the matrix. The eigenvalues
+ * are not sorted in any particular order.
+ *
+ * Example: \include EigenSolver_eigenvalues.cpp
+ * Output: \verbinclude EigenSolver_eigenvalues.out
+ *
+ * \sa eigenvectors(), pseudoEigenvalueMatrix(),
+ * MatrixBase::eigenvalues()
+ */
+ const EigenvalueType& eigenvalues() const
+ {
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ return m_eivalues;
+ }
+
+ /** \brief Computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are
+ * computed.
+ * \returns Reference to \c *this
+ *
+ * This function computes the eigenvalues of the real matrix \p matrix.
+ * The eigenvalues() function can be used to retrieve them. If
+ * \p computeEigenvectors is true, then the eigenvectors are also computed
+ * and can be retrieved by calling eigenvectors().
+ *
+ * The matrix is first reduced to real Schur form using the RealSchur
+ * class. The Schur decomposition is then used to compute the eigenvalues
+ * and eigenvectors.
+ *
+ * The cost of the computation is dominated by the cost of the
+ * Schur decomposition, which is very approximately \f$ 25n^3 \f$
+ * (where \f$ n \f$ is the size of the matrix) if \p computeEigenvectors
+ * is true, and \f$ 10n^3 \f$ if \p computeEigenvectors is false.
+ *
+ * This method reuses the allocated data in the EigenSolver object.
+ *
+ * Example: \include EigenSolver_compute.cpp
+ * Output: \verbinclude EigenSolver_compute.out
+ */
+ EigenSolver& compute(const MatrixType& matrix, bool computeEigenvectors = true);
+
+ /** \returns NumericalIssue if the input contains INF or NaN values or overflow occurred. Returns Success otherwise. */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ return m_info;
+ }
+
+ /** \brief Sets the maximum number of iterations allowed. */
+ EigenSolver& setMaxIterations(Index maxIters)
+ {
+ m_realSchur.setMaxIterations(maxIters);
+ return *this;
+ }
+
+ /** \brief Returns the maximum number of iterations. */
+ Index getMaxIterations()
+ {
+ return m_realSchur.getMaxIterations();
+ }
+
+ private:
+ void doComputeEigenvectors();
+
+ protected:
+ MatrixType m_eivec;
+ EigenvalueType m_eivalues;
+ bool m_isInitialized;
+ bool m_eigenvectorsOk;
+ ComputationInfo m_info;
+ RealSchur<MatrixType> m_realSchur;
+ MatrixType m_matT;
+
+ typedef Matrix<Scalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> ColumnVectorType;
+ ColumnVectorType m_tmp;
+};
+
+template<typename MatrixType>
+MatrixType EigenSolver<MatrixType>::pseudoEigenvalueMatrix() const
+{
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ Index n = m_eivalues.rows();
+ MatrixType matD = MatrixType::Zero(n,n);
+ for (Index i=0; i<n; ++i)
+ {
+ if (internal::isMuchSmallerThan(numext::imag(m_eivalues.coeff(i)), numext::real(m_eivalues.coeff(i))))
+ matD.coeffRef(i,i) = numext::real(m_eivalues.coeff(i));
+ else
+ {
+ matD.template block<2,2>(i,i) << numext::real(m_eivalues.coeff(i)), numext::imag(m_eivalues.coeff(i)),
+ -numext::imag(m_eivalues.coeff(i)), numext::real(m_eivalues.coeff(i));
+ ++i;
+ }
+ }
+ return matD;
+}
+
+template<typename MatrixType>
+typename EigenSolver<MatrixType>::EigenvectorsType EigenSolver<MatrixType>::eigenvectors() const
+{
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ Index n = m_eivec.cols();
+ EigenvectorsType matV(n,n);
+ for (Index j=0; j<n; ++j)
+ {
+ if (internal::isMuchSmallerThan(numext::imag(m_eivalues.coeff(j)), numext::real(m_eivalues.coeff(j))) || j+1==n)
+ {
+ // we have a real eigen value
+ matV.col(j) = m_eivec.col(j).template cast<ComplexScalar>();
+ matV.col(j).normalize();
+ }
+ else
+ {
+ // we have a pair of complex eigen values
+ for (Index i=0; i<n; ++i)
+ {
+ matV.coeffRef(i,j) = ComplexScalar(m_eivec.coeff(i,j), m_eivec.coeff(i,j+1));
+ matV.coeffRef(i,j+1) = ComplexScalar(m_eivec.coeff(i,j), -m_eivec.coeff(i,j+1));
+ }
+ matV.col(j).normalize();
+ matV.col(j+1).normalize();
+ ++j;
+ }
+ }
+ return matV;
+}
+
+template<typename MatrixType>
+EigenSolver<MatrixType>&
+EigenSolver<MatrixType>::compute(const MatrixType& matrix, bool computeEigenvectors)
+{
+ using std::sqrt;
+ using std::abs;
+ using std::max;
+ using numext::isfinite;
+ eigen_assert(matrix.cols() == matrix.rows());
+
+ // Reduce to real Schur form.
+ m_realSchur.compute(matrix, computeEigenvectors);
+
+ m_info = m_realSchur.info();
+
+ if (m_info == Success)
+ {
+ m_matT = m_realSchur.matrixT();
+ if (computeEigenvectors)
+ m_eivec = m_realSchur.matrixU();
+
+ // Compute eigenvalues from matT
+ m_eivalues.resize(matrix.cols());
+ Index i = 0;
+ while (i < matrix.cols())
+ {
+ if (i == matrix.cols() - 1 || m_matT.coeff(i+1, i) == Scalar(0))
+ {
+ m_eivalues.coeffRef(i) = m_matT.coeff(i, i);
+ if(!isfinite(m_eivalues.coeffRef(i)))
+ {
+ m_isInitialized = true;
+ m_eigenvectorsOk = false;
+ m_info = NumericalIssue;
+ return *this;
+ }
+ ++i;
+ }
+ else
+ {
+ Scalar p = Scalar(0.5) * (m_matT.coeff(i, i) - m_matT.coeff(i+1, i+1));
+ Scalar z;
+ // Compute z = sqrt(abs(p * p + m_matT.coeff(i+1, i) * m_matT.coeff(i, i+1)));
+ // without overflow
+ {
+ Scalar t0 = m_matT.coeff(i+1, i);
+ Scalar t1 = m_matT.coeff(i, i+1);
+ Scalar maxval = (max)(abs(p),(max)(abs(t0),abs(t1)));
+ t0 /= maxval;
+ t1 /= maxval;
+ Scalar p0 = p/maxval;
+ z = maxval * sqrt(abs(p0 * p0 + t0 * t1));
+ }
+
+ m_eivalues.coeffRef(i) = ComplexScalar(m_matT.coeff(i+1, i+1) + p, z);
+ m_eivalues.coeffRef(i+1) = ComplexScalar(m_matT.coeff(i+1, i+1) + p, -z);
+ if(!(isfinite(m_eivalues.coeffRef(i)) && isfinite(m_eivalues.coeffRef(i+1))))
+ {
+ m_isInitialized = true;
+ m_eigenvectorsOk = false;
+ m_info = NumericalIssue;
+ return *this;
+ }
+ i += 2;
+ }
+ }
+
+ // Compute eigenvectors.
+ if (computeEigenvectors)
+ doComputeEigenvectors();
+ }
+
+ m_isInitialized = true;
+ m_eigenvectorsOk = computeEigenvectors;
+
+ return *this;
+}
+
+// Complex scalar division.
+template<typename Scalar>
+std::complex<Scalar> cdiv(const Scalar& xr, const Scalar& xi, const Scalar& yr, const Scalar& yi)
+{
+ using std::abs;
+ Scalar r,d;
+ if (abs(yr) > abs(yi))
+ {
+ r = yi/yr;
+ d = yr + r*yi;
+ return std::complex<Scalar>((xr + r*xi)/d, (xi - r*xr)/d);
+ }
+ else
+ {
+ r = yr/yi;
+ d = yi + r*yr;
+ return std::complex<Scalar>((r*xr + xi)/d, (r*xi - xr)/d);
+ }
+}
+
+
+template<typename MatrixType>
+void EigenSolver<MatrixType>::doComputeEigenvectors()
+{
+ using std::abs;
+ const Index size = m_eivec.cols();
+ const Scalar eps = NumTraits<Scalar>::epsilon();
+
+ // inefficient! this is already computed in RealSchur
+ Scalar norm(0);
+ for (Index j = 0; j < size; ++j)
+ {
+ norm += m_matT.row(j).segment((std::max)(j-1,Index(0)), size-(std::max)(j-1,Index(0))).cwiseAbs().sum();
+ }
+
+ // Backsubstitute to find vectors of upper triangular form
+ if (norm == 0.0)
+ {
+ return;
+ }
+
+ for (Index n = size-1; n >= 0; n--)
+ {
+ Scalar p = m_eivalues.coeff(n).real();
+ Scalar q = m_eivalues.coeff(n).imag();
+
+ // Scalar vector
+ if (q == Scalar(0))
+ {
+ Scalar lastr(0), lastw(0);
+ Index l = n;
+
+ m_matT.coeffRef(n,n) = 1.0;
+ for (Index i = n-1; i >= 0; i--)
+ {
+ Scalar w = m_matT.coeff(i,i) - p;
+ Scalar r = m_matT.row(i).segment(l,n-l+1).dot(m_matT.col(n).segment(l, n-l+1));
+
+ if (m_eivalues.coeff(i).imag() < 0.0)
+ {
+ lastw = w;
+ lastr = r;
+ }
+ else
+ {
+ l = i;
+ if (m_eivalues.coeff(i).imag() == 0.0)
+ {
+ if (w != 0.0)
+ m_matT.coeffRef(i,n) = -r / w;
+ else
+ m_matT.coeffRef(i,n) = -r / (eps * norm);
+ }
+ else // Solve real equations
+ {
+ Scalar x = m_matT.coeff(i,i+1);
+ Scalar y = m_matT.coeff(i+1,i);
+ Scalar denom = (m_eivalues.coeff(i).real() - p) * (m_eivalues.coeff(i).real() - p) + m_eivalues.coeff(i).imag() * m_eivalues.coeff(i).imag();
+ Scalar t = (x * lastr - lastw * r) / denom;
+ m_matT.coeffRef(i,n) = t;
+ if (abs(x) > abs(lastw))
+ m_matT.coeffRef(i+1,n) = (-r - w * t) / x;
+ else
+ m_matT.coeffRef(i+1,n) = (-lastr - y * t) / lastw;
+ }
+
+ // Overflow control
+ Scalar t = abs(m_matT.coeff(i,n));
+ if ((eps * t) * t > Scalar(1))
+ m_matT.col(n).tail(size-i) /= t;
+ }
+ }
+ }
+ else if (q < Scalar(0) && n > 0) // Complex vector
+ {
+ Scalar lastra(0), lastsa(0), lastw(0);
+ Index l = n-1;
+
+ // Last vector component imaginary so matrix is triangular
+ if (abs(m_matT.coeff(n,n-1)) > abs(m_matT.coeff(n-1,n)))
+ {
+ m_matT.coeffRef(n-1,n-1) = q / m_matT.coeff(n,n-1);
+ m_matT.coeffRef(n-1,n) = -(m_matT.coeff(n,n) - p) / m_matT.coeff(n,n-1);
+ }
+ else
+ {
+ std::complex<Scalar> cc = cdiv<Scalar>(0.0,-m_matT.coeff(n-1,n),m_matT.coeff(n-1,n-1)-p,q);
+ m_matT.coeffRef(n-1,n-1) = numext::real(cc);
+ m_matT.coeffRef(n-1,n) = numext::imag(cc);
+ }
+ m_matT.coeffRef(n,n-1) = 0.0;
+ m_matT.coeffRef(n,n) = 1.0;
+ for (Index i = n-2; i >= 0; i--)
+ {
+ Scalar ra = m_matT.row(i).segment(l, n-l+1).dot(m_matT.col(n-1).segment(l, n-l+1));
+ Scalar sa = m_matT.row(i).segment(l, n-l+1).dot(m_matT.col(n).segment(l, n-l+1));
+ Scalar w = m_matT.coeff(i,i) - p;
+
+ if (m_eivalues.coeff(i).imag() < 0.0)
+ {
+ lastw = w;
+ lastra = ra;
+ lastsa = sa;
+ }
+ else
+ {
+ l = i;
+ if (m_eivalues.coeff(i).imag() == RealScalar(0))
+ {
+ std::complex<Scalar> cc = cdiv(-ra,-sa,w,q);
+ m_matT.coeffRef(i,n-1) = numext::real(cc);
+ m_matT.coeffRef(i,n) = numext::imag(cc);
+ }
+ else
+ {
+ // Solve complex equations
+ Scalar x = m_matT.coeff(i,i+1);
+ Scalar y = m_matT.coeff(i+1,i);
+ Scalar vr = (m_eivalues.coeff(i).real() - p) * (m_eivalues.coeff(i).real() - p) + m_eivalues.coeff(i).imag() * m_eivalues.coeff(i).imag() - q * q;
+ Scalar vi = (m_eivalues.coeff(i).real() - p) * Scalar(2) * q;
+ if ((vr == 0.0) && (vi == 0.0))
+ vr = eps * norm * (abs(w) + abs(q) + abs(x) + abs(y) + abs(lastw));
+
+ std::complex<Scalar> cc = cdiv(x*lastra-lastw*ra+q*sa,x*lastsa-lastw*sa-q*ra,vr,vi);
+ m_matT.coeffRef(i,n-1) = numext::real(cc);
+ m_matT.coeffRef(i,n) = numext::imag(cc);
+ if (abs(x) > (abs(lastw) + abs(q)))
+ {
+ m_matT.coeffRef(i+1,n-1) = (-ra - w * m_matT.coeff(i,n-1) + q * m_matT.coeff(i,n)) / x;
+ m_matT.coeffRef(i+1,n) = (-sa - w * m_matT.coeff(i,n) - q * m_matT.coeff(i,n-1)) / x;
+ }
+ else
+ {
+ cc = cdiv(-lastra-y*m_matT.coeff(i,n-1),-lastsa-y*m_matT.coeff(i,n),lastw,q);
+ m_matT.coeffRef(i+1,n-1) = numext::real(cc);
+ m_matT.coeffRef(i+1,n) = numext::imag(cc);
+ }
+ }
+
+ // Overflow control
+ Scalar t = numext::maxi(abs(m_matT.coeff(i,n-1)),abs(m_matT.coeff(i,n)));
+ if ((eps * t) * t > Scalar(1))
+ m_matT.block(i, n-1, size-i, 2) /= t;
+
+ }
+ }
+
+ // We handled a pair of complex conjugate eigenvalues, so need to skip them both
+ n--;
+ }
+ else
+ {
+ eigen_assert(0 && "Internal bug in EigenSolver (INF or NaN has not been detected)"); // this should not happen
+ }
+ }
+
+ // Back transformation to get eigenvectors of original matrix
+ for (Index j = size-1; j >= 0; j--)
+ {
+ m_tmp.noalias() = m_eivec.leftCols(j+1) * m_matT.col(j).segment(0, j+1);
+ m_eivec.col(j) = m_tmp;
+ }
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_EIGENSOLVER_H
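
A minimal sketch (not part of the diff) exercising the EigenSolver API above, including the pseudo-eigendecomposition; the 4x4 random matrix is an arbitrary illustrative choice:

    #include <complex>
    #include <iostream>
    #include <Eigen/Eigenvalues>

    int main() {
      Eigen::Matrix4d A = Eigen::Matrix4d::Random();

      Eigen::EigenSolver<Eigen::Matrix4d> es(A);
      if (es.info() != Eigen::Success) return 1;

      // Complex eigendecomposition: A * V = V * D up to rounding.
      Eigen::Matrix4cd V = es.eigenvectors();
      Eigen::Vector4cd d = es.eigenvalues();
      std::cout << "||A V - V D|| = "
                << (A.cast<std::complex<double> >() * V - V * d.asDiagonal()).norm() << "\n";

      // Real pseudo-eigendecomposition: A * Vp = Vp * Dp, with Dp block-diagonal.
      Eigen::Matrix4d Vp = es.pseudoEigenvectors();
      Eigen::Matrix4d Dp = es.pseudoEigenvalueMatrix();
      std::cout << "||A Vp - Vp Dp|| = " << (A * Vp - Vp * Dp).norm() << "\n";
      return 0;
    }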
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedEigenSolver.h b/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedEigenSolver.h
new file mode 100644
index 0000000000..dc240e13e1
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedEigenSolver.h
@@ -0,0 +1,341 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010,2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERALIZEDEIGENSOLVER_H
+#define EIGEN_GENERALIZEDEIGENSOLVER_H
+
+#include "./RealQZ.h"
+
+namespace Eigen {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class GeneralizedEigenSolver
+ *
+ * \brief Computes the generalized eigenvalues and eigenvectors of a pair of general matrices
+ *
+ * \tparam _MatrixType the type of the matrices of which we are computing the
+ * eigen-decomposition; this is expected to be an instantiation of the Matrix
+ * class template. Currently, only real matrices are supported.
+ *
+ * The generalized eigenvalues and eigenvectors of a matrix pair \f$ A \f$ and \f$ B \f$ are scalars
+ * \f$ \lambda \f$ and vectors \f$ v \f$ such that \f$ Av = \lambda Bv \f$. If
+ * \f$ D \f$ is a diagonal matrix with the eigenvalues on the diagonal, and
+ * \f$ V \f$ is a matrix with the eigenvectors as its columns, then \f$ A V =
+ * B V D \f$. The matrix \f$ V \f$ is almost always invertible, in which case we
+ * have \f$ A = B V D V^{-1} \f$. This is called the generalized eigen-decomposition.
+ *
+ * The generalized eigenvalues and eigenvectors of a matrix pair may be complex, even when the
+ * matrices are real. Moreover, the generalized eigenvalue might be infinite if the matrix B is
+ * singular. To work around this difficulty, the eigenvalues are provided as a pair of complex \f$ \alpha \f$
+ * and real \f$ \beta \f$ such that: \f$ \lambda_i = \alpha_i / \beta_i \f$. If \f$ \beta_i \f$ is (nearly) zero,
+ * then one can consider the well-defined left eigenvalue \f$ \mu = \beta_i / \alpha_i\f$ such that:
+ * \f$ \mu_i A v_i = B v_i \f$, or even \f$ \mu_i u_i^T A = u_i^T B \f$ where \f$ u_i \f$ is
+ * called the left eigenvector.
+ *
+ * Call the function compute() to compute the generalized eigenvalues and eigenvectors of
+ * a given matrix pair. Alternatively, you can use the
+ * GeneralizedEigenSolver(const MatrixType&, const MatrixType&, bool) constructor which computes the
+ * eigenvalues and eigenvectors at construction time. Once the eigenvalue and
+ * eigenvectors are computed, they can be retrieved with the eigenvalues() and
+ * eigenvectors() functions.
+ *
+ * Here is a usage example of this class:
+ * Example: \include GeneralizedEigenSolver.cpp
+ * Output: \verbinclude GeneralizedEigenSolver.out
+ *
+ * \sa MatrixBase::eigenvalues(), class ComplexEigenSolver, class SelfAdjointEigenSolver
+ */
+template<typename _MatrixType> class GeneralizedEigenSolver
+{
+ public:
+
+ /** \brief Synonym for the template parameter \p _MatrixType. */
+ typedef _MatrixType MatrixType;
+
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+
+ /** \brief Scalar type for matrices of type #MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Complex scalar type for #MatrixType.
+ *
+ * This is \c std::complex<Scalar> if #Scalar is real (e.g.,
+ * \c float or \c double) and just \c Scalar if #Scalar is
+ * complex.
+ */
+ typedef std::complex<RealScalar> ComplexScalar;
+
+ /** \brief Type for the vector of real scalar values \f$ \beta_i \f$ as returned by betas().
+ *
+ * This is a column vector with entries of type #Scalar.
+ * The length of the vector is the size of #MatrixType.
+ */
+ typedef Matrix<Scalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> VectorType;
+
+ /** \brief Type for the vector of complex scalar values \f$ \alpha_i \f$ as returned by alphas().
+ *
+ * This is a column vector with entries of type #ComplexScalar.
+ * The length of the vector is the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> ComplexVectorType;
+
+ /** \brief Expression type for the eigenvalues as returned by eigenvalues().
+ */
+ typedef CwiseBinaryOp<internal::scalar_quotient_op<ComplexScalar,Scalar>,ComplexVectorType,VectorType> EigenvalueType;
+
+ /** \brief Type for matrix of eigenvectors as returned by eigenvectors().
+ *
+ * This is a square matrix with entries of type #ComplexScalar.
+ * The size is the same as the size of #MatrixType.
+ */
+ typedef Matrix<ComplexScalar, RowsAtCompileTime, ColsAtCompileTime, Options, MaxRowsAtCompileTime, MaxColsAtCompileTime> EigenvectorsType;
+
+ /** \brief Default constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(const MatrixType&, const MatrixType&, bool).
+ *
+ * \sa compute() for an example.
+ */
+ GeneralizedEigenSolver() : m_eivec(), m_alphas(), m_betas(), m_isInitialized(false), m_realQZ(), m_matS(), m_tmp() {}
+
+ /** \brief Default constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa GeneralizedEigenSolver()
+ */
+ GeneralizedEigenSolver(Index size)
+ : m_eivec(size, size),
+ m_alphas(size),
+ m_betas(size),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_realQZ(size),
+ m_matS(size, size),
+ m_tmp(size)
+ {}
+
+ /** \brief Constructor; computes the generalized eigendecomposition of given matrix pair.
+ *
+ * \param[in] A Square matrix whose eigendecomposition is to be computed.
+ * \param[in] B Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are computed.
+ *
+ * This constructor calls compute() to compute the generalized eigenvalues
+ * and eigenvectors.
+ *
+ * \sa compute()
+ */
+ GeneralizedEigenSolver(const MatrixType& A, const MatrixType& B, bool computeEigenvectors = true)
+ : m_eivec(A.rows(), A.cols()),
+ m_alphas(A.cols()),
+ m_betas(A.cols()),
+ m_isInitialized(false),
+ m_eigenvectorsOk(false),
+ m_realQZ(A.cols()),
+ m_matS(A.rows(), A.cols()),
+ m_tmp(A.cols())
+ {
+ compute(A, B, computeEigenvectors);
+ }
+
+ /* \brief Returns the computed generalized eigenvectors.
+ *
+ * \returns %Matrix whose columns are the (possibly complex) eigenvectors.
+ *
+ * \pre Either the constructor
+ * GeneralizedEigenSolver(const MatrixType&,const MatrixType&, bool) or the member function
+ * compute(const MatrixType&, const MatrixType& bool) has been called before, and
+ * \p computeEigenvectors was set to true (the default).
+ *
+ * Column \f$ k \f$ of the returned matrix is an eigenvector corresponding
+ * to eigenvalue number \f$ k \f$ as returned by eigenvalues(). The
+ * eigenvectors are normalized to have (Euclidean) norm equal to one. The
+ * matrix returned by this function is the matrix \f$ V \f$ in the
+ * generalized eigendecomposition \f$ A = B V D V^{-1} \f$, if it exists.
+ *
+ * \sa eigenvalues()
+ */
+// EigenvectorsType eigenvectors() const;
+
+ /** \brief Returns an expression of the computed generalized eigenvalues.
+ *
+ * \returns An expression of the column vector containing the eigenvalues.
+ *
+ * It is a shortcut for \code this->alphas().cwiseQuotient(this->betas()); \endcode
+ * Note that betas might contain zeros. It is therefore not recommended to use this function,
+ * but rather to deal directly with the alphas and betas vectors.
+ *
+ * \pre Either the constructor
+ * GeneralizedEigenSolver(const MatrixType&,const MatrixType&,bool) or the member function
+ * compute(const MatrixType&,const MatrixType&,bool) has been called before.
+ *
+ * The eigenvalues are repeated according to their algebraic multiplicity,
+ * so there are as many eigenvalues as rows in the matrix. The eigenvalues
+ * are not sorted in any particular order.
+ *
+ * \sa alphas(), betas(), eigenvectors()
+ */
+ EigenvalueType eigenvalues() const
+ {
+ eigen_assert(m_isInitialized && "GeneralizedEigenSolver is not initialized.");
+ return EigenvalueType(m_alphas,m_betas);
+ }
+
+ /** \returns The vector containing the alpha values
+ *
+ * This vector permits reconstructing the j-th eigenvalue as alphas(j)/betas(j).
+ *
+ * \sa betas(), eigenvalues() */
+ ComplexVectorType alphas() const
+ {
+ eigen_assert(m_isInitialized && "GeneralizedEigenSolver is not initialized.");
+ return m_alphas;
+ }
+
+ /** \returns The vector containing the beta values
+ *
+ * This vector permits reconstructing the j-th eigenvalue as alphas(j)/betas(j).
+ *
+ * \sa alphas(), eigenvalues() */
+ VectorType betas() const
+ {
+ eigen_assert(m_isInitialized && "GeneralizedEigenSolver is not initialized.");
+ return m_betas;
+ }
+
+ /** \brief Computes generalized eigendecomposition of given matrix.
+ *
+ * \param[in] A Square matrix whose eigendecomposition is to be computed.
+ * \param[in] B Square matrix whose eigendecomposition is to be computed.
+ * \param[in] computeEigenvectors If true, both the eigenvectors and the
+ * eigenvalues are computed; if false, only the eigenvalues are
+ * computed.
+ * \returns Reference to \c *this
+ *
+ * This function computes the generalized eigenvalues of the real matrix pair \p A, \p B.
+ * The eigenvalues() function can be used to retrieve them. If
+ * \p computeEigenvectors is true, then the eigenvectors are also computed
+ * and can be retrieved by calling eigenvectors().
+ *
+ * The matrix pair is first reduced to real generalized Schur form using the RealQZ
+ * class. The generalized Schur decomposition is then used to compute the eigenvalues
+ * and eigenvectors.
+ *
+ * The cost of the computation is dominated by the cost of the
+ * generalized Schur decomposition.
+ *
+ * This method reuses the allocated data in the GeneralizedEigenSolver object.
+ */
+ GeneralizedEigenSolver& compute(const MatrixType& A, const MatrixType& B, bool computeEigenvectors = true);
+
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+ return m_realQZ.info();
+ }
+
+ /** Sets the maximal number of iterations allowed.
+ */
+ GeneralizedEigenSolver& setMaxIterations(Index maxIters)
+ {
+ m_realQZ.setMaxIterations(maxIters);
+ return *this;
+ }
+
+ protected:
+ MatrixType m_eivec;
+ ComplexVectorType m_alphas;
+ VectorType m_betas;
+ bool m_isInitialized;
+ bool m_eigenvectorsOk;
+ RealQZ<MatrixType> m_realQZ;
+ MatrixType m_matS;
+
+ typedef Matrix<Scalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> ColumnVectorType;
+ ColumnVectorType m_tmp;
+};
+
+//template<typename MatrixType>
+//typename GeneralizedEigenSolver<MatrixType>::EigenvectorsType GeneralizedEigenSolver<MatrixType>::eigenvectors() const
+//{
+// eigen_assert(m_isInitialized && "EigenSolver is not initialized.");
+// eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+// Index n = m_eivec.cols();
+// EigenvectorsType matV(n,n);
+// // TODO
+// return matV;
+//}
+
+template<typename MatrixType>
+GeneralizedEigenSolver<MatrixType>&
+GeneralizedEigenSolver<MatrixType>::compute(const MatrixType& A, const MatrixType& B, bool computeEigenvectors)
+{
+ using std::sqrt;
+ using std::abs;
+ eigen_assert(A.cols() == A.rows() && B.cols() == A.rows() && B.cols() == B.rows());
+
+ // Reduce to generalized real Schur form:
+ // A = Q S Z and B = Q T Z
+ m_realQZ.compute(A, B, computeEigenvectors);
+
+ if (m_realQZ.info() == Success)
+ {
+ m_matS = m_realQZ.matrixS();
+ if (computeEigenvectors)
+ m_eivec = m_realQZ.matrixZ().transpose();
+
+ // Compute eigenvalues from matS
+ m_alphas.resize(A.cols());
+ m_betas.resize(A.cols());
+ Index i = 0;
+ while (i < A.cols())
+ {
+ if (i == A.cols() - 1 || m_matS.coeff(i+1, i) == Scalar(0))
+ {
+ m_alphas.coeffRef(i) = m_matS.coeff(i, i);
+ m_betas.coeffRef(i) = m_realQZ.matrixT().coeff(i,i);
+ ++i;
+ }
+ else
+ {
+ Scalar p = Scalar(0.5) * (m_matS.coeff(i, i) - m_matS.coeff(i+1, i+1));
+ Scalar z = sqrt(abs(p * p + m_matS.coeff(i+1, i) * m_matS.coeff(i, i+1)));
+ m_alphas.coeffRef(i) = ComplexScalar(m_matS.coeff(i+1, i+1) + p, z);
+ m_alphas.coeffRef(i+1) = ComplexScalar(m_matS.coeff(i+1, i+1) + p, -z);
+
+ m_betas.coeffRef(i) = m_realQZ.matrixT().coeff(i,i);
+ m_betas.coeffRef(i+1) = m_realQZ.matrixT().coeff(i,i);
+ i += 2;
+ }
+ }
+ }
+
+ m_isInitialized = true;
+ m_eigenvectorsOk = false;//computeEigenvectors;
+
+ return *this;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERALIZEDEIGENSOLVER_H
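
A minimal sketch (not part of the diff) of the alpha/beta interface described above; the matrix sizes and the beta threshold are arbitrary illustrative choices:

    #include <cmath>
    #include <complex>
    #include <iostream>
    #include <Eigen/Eigenvalues>

    int main() {
      Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 5);
      Eigen::MatrixXd B = Eigen::MatrixXd::Random(5, 5);

      // Eigenvectors are not implemented in this version, so request values only.
      Eigen::GeneralizedEigenSolver<Eigen::MatrixXd> ges(A, B, false);
      if (ges.info() != Eigen::Success) return 1;

      for (int i = 0; i < ges.alphas().size(); ++i) {
        double beta = ges.betas()(i);
        if (std::abs(beta) > 1e-12)
          std::cout << "lambda_" << i << " = " << ges.alphas()(i) / beta << "\n";
        else
          std::cout << "lambda_" << i << " is infinite (beta ~ 0, B nearly singular)\n";
      }
      return 0;
    }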
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h b/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h
new file mode 100644
index 0000000000..07bf1ea095
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h
@@ -0,0 +1,227 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GENERALIZEDSELFADJOINTEIGENSOLVER_H
+#define EIGEN_GENERALIZEDSELFADJOINTEIGENSOLVER_H
+
+#include "./Tridiagonalization.h"
+
+namespace Eigen {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class GeneralizedSelfAdjointEigenSolver
+ *
+ * \brief Computes eigenvalues and eigenvectors of the generalized selfadjoint eigen problem
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * eigendecomposition; this is expected to be an instantiation of the Matrix
+ * class template.
+ *
+ * This class solves the generalized eigenvalue problem
+ * \f$ Av = \lambda Bv \f$. In this case, the matrix \f$ A \f$ should be
+ * selfadjoint and the matrix \f$ B \f$ should be positive definite.
+ *
+ * Only the \b lower \b triangular \b part of the input matrix is referenced.
+ *
+ * Call the function compute() to compute the eigenvalues and eigenvectors of
+ * a given matrix. Alternatively, you can use the
+ * GeneralizedSelfAdjointEigenSolver(const MatrixType&, const MatrixType&, int)
+ * constructor which computes the eigenvalues and eigenvectors at construction time.
+ * Once the eigenvalue and eigenvectors are computed, they can be retrieved with the eigenvalues()
+ * and eigenvectors() functions.
+ *
+ * The documentation for GeneralizedSelfAdjointEigenSolver(const MatrixType&, const MatrixType&, int)
+ * contains an example of the typical use of this class.
+ *
+ * \sa class SelfAdjointEigenSolver, class EigenSolver, class ComplexEigenSolver
+ */
+template<typename _MatrixType>
+class GeneralizedSelfAdjointEigenSolver : public SelfAdjointEigenSolver<_MatrixType>
+{
+ typedef SelfAdjointEigenSolver<_MatrixType> Base;
+ public:
+
+ typedef typename Base::Index Index;
+ typedef _MatrixType MatrixType;
+
+ /** \brief Default constructor for fixed-size matrices.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). This constructor
+ * can only be used if \p _MatrixType is a fixed-size matrix; use
+ * GeneralizedSelfAdjointEigenSolver(Index) for dynamic-size matrices.
+ */
+ GeneralizedSelfAdjointEigenSolver() : Base() {}
+
+ /** \brief Constructor, pre-allocates memory for dynamic-size matrices.
+ *
+ * \param [in] size Positive integer, size of the matrix whose
+ * eigenvalues and eigenvectors will be computed.
+ *
+ * This constructor is useful for dynamic-size matrices, when the user
+ * intends to perform decompositions via compute(). The \p size
+ * parameter is only used as a hint. It is not an error to give a wrong
+ * \p size, but it may impair performance.
+ *
+ * \sa compute() for an example
+ */
+ GeneralizedSelfAdjointEigenSolver(Index size)
+ : Base(size)
+ {}
+
+ /** \brief Constructor; computes generalized eigendecomposition of given matrix pencil.
+ *
+ * \param[in] matA Selfadjoint matrix in matrix pencil.
+ * Only the lower triangular part of the matrix is referenced.
+ * \param[in] matB Positive-definite matrix in matrix pencil.
+ * Only the lower triangular part of the matrix is referenced.
+ * \param[in] options An or-ed set of flags {#ComputeEigenvectors,#EigenvaluesOnly} | {#Ax_lBx,#ABx_lx,#BAx_lx}.
+ * Default is #ComputeEigenvectors|#Ax_lBx.
+ *
+ * This constructor calls compute(const MatrixType&, const MatrixType&, int)
+ * to compute the eigenvalues and (if requested) the eigenvectors of the
+ * generalized eigenproblem \f$ Ax = \lambda B x \f$ with \a matA the
+ * selfadjoint matrix \f$ A \f$ and \a matB the positive definite matrix
+ * \f$ B \f$. Each eigenvector \f$ x \f$ satisfies the property
+ * \f$ x^* B x = 1 \f$. The eigenvectors are computed if
+ * \a options contains ComputeEigenvectors.
+ *
+ * In addition, the two following variants can be solved via \p options:
+ * - \c ABx_lx: \f$ ABx = \lambda x \f$
+ * - \c BAx_lx: \f$ BAx = \lambda x \f$
+ *
+ * Example: \include SelfAdjointEigenSolver_SelfAdjointEigenSolver_MatrixType2.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_SelfAdjointEigenSolver_MatrixType2.out
+ *
+ * \sa compute(const MatrixType&, const MatrixType&, int)
+ */
+ GeneralizedSelfAdjointEigenSolver(const MatrixType& matA, const MatrixType& matB,
+ int options = ComputeEigenvectors|Ax_lBx)
+ : Base(matA.cols())
+ {
+ compute(matA, matB, options);
+ }
+
+ /** \brief Computes generalized eigendecomposition of given matrix pencil.
+ *
+ * \param[in] matA Selfadjoint matrix in matrix pencil.
+ * Only the lower triangular part of the matrix is referenced.
+ * \param[in] matB Positive-definite matrix in matrix pencil.
+ * Only the lower triangular part of the matrix is referenced.
+ * \param[in] options An or-ed set of flags {#ComputeEigenvectors,#EigenvaluesOnly} | {#Ax_lBx,#ABx_lx,#BAx_lx}.
+ * Default is #ComputeEigenvectors|#Ax_lBx.
+ *
+ * \returns Reference to \c *this
+ *
+ * According to \p options, this function computes eigenvalues and (if requested)
+ * the eigenvectors of one of the following three generalized eigenproblems:
+ * - \c Ax_lBx: \f$ Ax = \lambda B x \f$
+ * - \c ABx_lx: \f$ ABx = \lambda x \f$
+ * - \c BAx_lx: \f$ BAx = \lambda x \f$
+ * with \a matA the selfadjoint matrix \f$ A \f$ and \a matB the positive definite
+ * matrix \f$ B \f$.
+ * In addition, each eigenvector \f$ x \f$ satisfies the property \f$ x^* B x = 1 \f$.
+ *
+ * The eigenvalues() function can be used to retrieve
+ * the eigenvalues. If \p options contains ComputeEigenvectors, then the
+ * eigenvectors are also computed and can be retrieved by calling
+ * eigenvectors().
+ *
+ * The implementation uses LLT to compute the Cholesky decomposition
+ * \f$ B = LL^* \f$ and computes the classical eigendecomposition
+ * of the selfadjoint matrix \f$ L^{-1} A (L^*)^{-1} \f$ if \p options contains Ax_lBx
+ * and of \f$ L^{*} A L \f$ otherwise. This solves the
+ * generalized eigenproblem, because any solution of the generalized
+ * eigenproblem \f$ Ax = \lambda B x \f$ corresponds to a solution
+ * \f$ L^{-1} A (L^*)^{-1} (L^* x) = \lambda (L^* x) \f$ of the
+ * eigenproblem for \f$ L^{-1} A (L^*)^{-1} \f$. Similar statements
+ * can be made for the two other variants.
+ *
+ * Example: \include SelfAdjointEigenSolver_compute_MatrixType2.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_compute_MatrixType2.out
+ *
+ * \sa GeneralizedSelfAdjointEigenSolver(const MatrixType&, const MatrixType&, int)
+ */
+ GeneralizedSelfAdjointEigenSolver& compute(const MatrixType& matA, const MatrixType& matB,
+ int options = ComputeEigenvectors|Ax_lBx);
+
+ protected:
+
+};
+
+
+template<typename MatrixType>
+GeneralizedSelfAdjointEigenSolver<MatrixType>& GeneralizedSelfAdjointEigenSolver<MatrixType>::
+compute(const MatrixType& matA, const MatrixType& matB, int options)
+{
+ eigen_assert(matA.cols()==matA.rows() && matB.rows()==matA.rows() && matB.cols()==matB.rows());
+ eigen_assert((options&~(EigVecMask|GenEigMask))==0
+ && (options&EigVecMask)!=EigVecMask
+ && ((options&GenEigMask)==0 || (options&GenEigMask)==Ax_lBx
+ || (options&GenEigMask)==ABx_lx || (options&GenEigMask)==BAx_lx)
+ && "invalid option parameter");
+
+ bool computeEigVecs = ((options&EigVecMask)==0) || ((options&EigVecMask)==ComputeEigenvectors);
+
+ // Compute the Cholesky decomposition of matB = L L' = U'U
+ LLT<MatrixType> cholB(matB);
+
+ int type = (options&GenEigMask);
+ if(type==0)
+ type = Ax_lBx;
+
+ if(type==Ax_lBx)
+ {
+ // compute C = inv(L) A inv(L')
+ MatrixType matC = matA.template selfadjointView<Lower>();
+ cholB.matrixL().template solveInPlace<OnTheLeft>(matC);
+ cholB.matrixU().template solveInPlace<OnTheRight>(matC);
+
+ Base::compute(matC, computeEigVecs ? ComputeEigenvectors : EigenvaluesOnly );
+
+ // transform back the eigen vectors: evecs = inv(U) * evecs
+ if(computeEigVecs)
+ cholB.matrixU().solveInPlace(Base::m_eivec);
+ }
+ else if(type==ABx_lx)
+ {
+ // compute C = L' A L
+ MatrixType matC = matA.template selfadjointView<Lower>();
+ matC = matC * cholB.matrixL();
+ matC = cholB.matrixU() * matC;
+
+ Base::compute(matC, computeEigVecs ? ComputeEigenvectors : EigenvaluesOnly);
+
+ // transform back the eigen vectors: evecs = inv(U) * evecs
+ if(computeEigVecs)
+ cholB.matrixU().solveInPlace(Base::m_eivec);
+ }
+ else if(type==BAx_lx)
+ {
+ // compute C = L' A L
+ MatrixType matC = matA.template selfadjointView<Lower>();
+ matC = matC * cholB.matrixL();
+ matC = cholB.matrixU() * matC;
+
+ Base::compute(matC, computeEigVecs ? ComputeEigenvectors : EigenvaluesOnly);
+
+ // transform back the eigen vectors: evecs = L * evecs
+ if(computeEigVecs)
+ Base::m_eivec = cholB.matrixL() * Base::m_eivec;
+ }
+
+ return *this;
+}
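+
+// A minimal usage sketch, assuming a selfadjoint matrix A and a positive-definite
+// matrix B; the variable names below are illustrative only:
+//
+//   Eigen::MatrixXd X = Eigen::MatrixXd::Random(4,4), Y = Eigen::MatrixXd::Random(4,4);
+//   Eigen::MatrixXd A = X + X.transpose();                                    // selfadjoint
+//   Eigen::MatrixXd B = Y * Y.transpose() + 4*Eigen::MatrixXd::Identity(4,4); // positive definite
+//   Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> es(A, B);       // solves A x = lambda B x
+//   // es.eigenvalues() holds the generalized eigenvalues; each column x of
+//   // es.eigenvectors() satisfies x^T B x = 1, as documented above.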
+
+} // end namespace Eigen
+
+#endif // EIGEN_GENERALIZEDSELFADJOINTEIGENSOLVER_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/HessenbergDecomposition.h b/third_party/eigen3/Eigen/src/Eigenvalues/HessenbergDecomposition.h
new file mode 100644
index 0000000000..3db0c0106c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/HessenbergDecomposition.h
@@ -0,0 +1,373 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_HESSENBERGDECOMPOSITION_H
+#define EIGEN_HESSENBERGDECOMPOSITION_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType> struct HessenbergDecompositionMatrixHReturnType;
+template<typename MatrixType>
+struct traits<HessenbergDecompositionMatrixHReturnType<MatrixType> >
+{
+ typedef MatrixType ReturnType;
+};
+
+}
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class HessenbergDecomposition
+ *
+ * \brief Reduces a square matrix to Hessenberg form by an orthogonal similarity transformation
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the Hessenberg decomposition
+ *
+ * This class performs an Hessenberg decomposition of a matrix \f$ A \f$. In
+ * the real case, the Hessenberg decomposition consists of an orthogonal
+ * matrix \f$ Q \f$ and a Hessenberg matrix \f$ H \f$ such that \f$ A = Q H
+ * Q^T \f$. An orthogonal matrix is a matrix whose inverse equals its
+ * transpose (\f$ Q^{-1} = Q^T \f$). A Hessenberg matrix has zeros below the
+ * subdiagonal, so it is almost upper triangular. The Hessenberg decomposition
+ * of a complex matrix is \f$ A = Q H Q^* \f$ with \f$ Q \f$ unitary (that is,
+ * \f$ Q^{-1} = Q^* \f$).
+ *
+ * Call the function compute() to compute the Hessenberg decomposition of a
+ * given matrix. Alternatively, you can use the
+ * HessenbergDecomposition(const MatrixType&) constructor which computes the
+ * Hessenberg decomposition at construction time. Once the decomposition is
+ * computed, you can use the matrixH() and matrixQ() functions to construct
+ * the matrices H and Q in the decomposition.
+ *
+ * The documentation for matrixH() contains an example of the typical use of
+ * this class.
+ *
+ * \sa class ComplexSchur, class Tridiagonalization, \ref QR_Module "QR Module"
+ */
+template<typename _MatrixType> class HessenbergDecomposition
+{
+ public:
+
+ /** \brief Synonym for the template parameter \p _MatrixType. */
+ typedef _MatrixType MatrixType;
+
+ enum {
+ Size = MatrixType::RowsAtCompileTime,
+ SizeMinusOne = Size == Dynamic ? Dynamic : Size - 1,
+ Options = MatrixType::Options,
+ MaxSize = MatrixType::MaxRowsAtCompileTime,
+ MaxSizeMinusOne = MaxSize == Dynamic ? Dynamic : MaxSize - 1
+ };
+
+ /** \brief Scalar type for matrices of type #MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Type for vector of Householder coefficients.
+ *
+ * This is a column vector with entries of type #Scalar. The length of the
+ * vector is one less than the size of #MatrixType, if it is a fixed-size
+ * type.
+ */
+ typedef Matrix<Scalar, SizeMinusOne, 1, Options & ~RowMajor, MaxSizeMinusOne, 1> CoeffVectorType;
+
+ /** \brief Return type of matrixQ() */
+ typedef HouseholderSequence<MatrixType,typename internal::remove_all<typename CoeffVectorType::ConjugateReturnType>::type> HouseholderSequenceType;
+
+ typedef internal::HessenbergDecompositionMatrixHReturnType<MatrixType> MatrixHReturnType;
+
+ /** \brief Default constructor; the decomposition will be computed later.
+ *
+ * \param [in] size The size of the matrix whose Hessenberg decomposition will be computed.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). The \p size parameter is only
+ * used as a hint. It is not an error to give a wrong \p size, but it may
+ * impair performance.
+ *
+ * \sa compute() for an example.
+ */
+ HessenbergDecomposition(Index size = Size==Dynamic ? 2 : Size)
+ : m_matrix(size,size),
+ m_temp(size),
+ m_isInitialized(false)
+ {
+ if(size>1)
+ m_hCoeffs.resize(size-1);
+ }
+
+ /** \brief Constructor; computes Hessenberg decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Hessenberg decomposition is to be computed.
+ *
+ * This constructor calls compute() to compute the Hessenberg
+ * decomposition.
+ *
+ * \sa matrixH() for an example.
+ */
+ HessenbergDecomposition(const MatrixType& matrix)
+ : m_matrix(matrix),
+ m_temp(matrix.rows()),
+ m_isInitialized(false)
+ {
+ if(matrix.rows()<2)
+ {
+ m_isInitialized = true;
+ return;
+ }
+ m_hCoeffs.resize(matrix.rows()-1,1);
+ _compute(m_matrix, m_hCoeffs, m_temp);
+ m_isInitialized = true;
+ }
+
+ /** \brief Computes Hessenberg decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Hessenberg decomposition is to be computed.
+ * \returns Reference to \c *this
+ *
+ * The Hessenberg decomposition is computed by bringing the columns of the
+ * matrix successively in the required form using Householder reflections
+ * (see, e.g., Algorithm 7.4.2 in Golub \& Van Loan, <i>%Matrix
+ * Computations</i>). The cost is \f$ 10n^3/3 \f$ flops, where \f$ n \f$
+ * denotes the size of the given matrix.
+ *
+ * This method reuses the data already allocated in the HessenbergDecomposition
+ * object.
+ *
+ * Example: \include HessenbergDecomposition_compute.cpp
+ * Output: \verbinclude HessenbergDecomposition_compute.out
+ */
+ HessenbergDecomposition& compute(const MatrixType& matrix)
+ {
+ m_matrix = matrix;
+ if(matrix.rows()<2)
+ {
+ m_isInitialized = true;
+ return *this;
+ }
+ m_hCoeffs.resize(matrix.rows()-1,1);
+ _compute(m_matrix, m_hCoeffs, m_temp);
+ m_isInitialized = true;
+ return *this;
+ }
+
+ /** \brief Returns the Householder coefficients.
+ *
+ * \returns a const reference to the vector of Householder coefficients
+ *
+ * \pre Either the constructor HessenbergDecomposition(const MatrixType&)
+ * or the member function compute(const MatrixType&) has been called
+ * before to compute the Hessenberg decomposition of a matrix.
+ *
+ * The Householder coefficients allow the reconstruction of the matrix
+ * \f$ Q \f$ in the Hessenberg decomposition from the packed data.
+ *
+ * \sa packedMatrix(), \ref Householder_Module "Householder module"
+ */
+ const CoeffVectorType& householderCoefficients() const
+ {
+ eigen_assert(m_isInitialized && "HessenbergDecomposition is not initialized.");
+ return m_hCoeffs;
+ }
+
+ /** \brief Returns the internal representation of the decomposition
+ *
+ * \returns a const reference to a matrix with the internal representation
+ * of the decomposition.
+ *
+ * \pre Either the constructor HessenbergDecomposition(const MatrixType&)
+ * or the member function compute(const MatrixType&) has been called
+ * before to compute the Hessenberg decomposition of a matrix.
+ *
+ * The returned matrix contains the following information:
+ * - the upper part and lower sub-diagonal represent the Hessenberg matrix H
+ * - the rest of the lower part contains the Householder vectors that, combined with
+ * the Householder coefficients returned by householderCoefficients(),
+ * allow the matrix Q to be reconstructed as
+ * \f$ Q = H_{N-1} \ldots H_1 H_0 \f$.
+ * Here, the matrices \f$ H_i \f$ are the Householder transformations
+ * \f$ H_i = (I - h_i v_i v_i^T) \f$
+ * where \f$ h_i \f$ is the \f$ i \f$th Householder coefficient and
+ * \f$ v_i \f$ is the Householder vector defined by
+ * \f$ v_i = [ 0, \ldots, 0, 1, M(i+2,i), \ldots, M(N-1,i) ]^T \f$
+ * with M the matrix returned by this function.
+ *
+ * See LAPACK for further details on this packed storage.
+ *
+ * Example: \include HessenbergDecomposition_packedMatrix.cpp
+ * Output: \verbinclude HessenbergDecomposition_packedMatrix.out
+ *
+ * \sa householderCoefficients()
+ */
+ const MatrixType& packedMatrix() const
+ {
+ eigen_assert(m_isInitialized && "HessenbergDecomposition is not initialized.");
+ return m_matrix;
+ }
+
+ /** \brief Reconstructs the orthogonal matrix Q in the decomposition
+ *
+ * \returns object representing the matrix Q
+ *
+ * \pre Either the constructor HessenbergDecomposition(const MatrixType&)
+ * or the member function compute(const MatrixType&) has been called
+ * before to compute the Hessenberg decomposition of a matrix.
+ *
+ * This function returns a light-weight object of template class
+ * HouseholderSequence. You can either apply it directly to a matrix or
+ * you can convert it to a matrix of type #MatrixType.
+ *
+ * \sa matrixH() for an example, class HouseholderSequence
+ */
+ HouseholderSequenceType matrixQ() const
+ {
+ eigen_assert(m_isInitialized && "HessenbergDecomposition is not initialized.");
+ return HouseholderSequenceType(m_matrix, m_hCoeffs.conjugate())
+ .setLength(m_matrix.rows() - 1)
+ .setShift(1);
+ }
+
+ /** \brief Constructs the Hessenberg matrix H in the decomposition
+ *
+ * \returns expression object representing the matrix H
+ *
+ * \pre Either the constructor HessenbergDecomposition(const MatrixType&)
+ * or the member function compute(const MatrixType&) has been called
+ * before to compute the Hessenberg decomposition of a matrix.
+ *
+ * The object returned by this function constructs the Hessenberg matrix H
+ * when it is assigned to a matrix or otherwise evaluated. The matrix H is
+ * constructed from the packed matrix as returned by packedMatrix(): The
+ * upper part (including the subdiagonal) of the packed matrix contains
+ * the matrix H. It may sometimes be better to directly use the packed
+ * matrix instead of constructing the matrix H.
+ *
+ * Example: \include HessenbergDecomposition_matrixH.cpp
+ * Output: \verbinclude HessenbergDecomposition_matrixH.out
+ *
+ * \sa matrixQ(), packedMatrix()
+ */
+ MatrixHReturnType matrixH() const
+ {
+ eigen_assert(m_isInitialized && "HessenbergDecomposition is not initialized.");
+ return MatrixHReturnType(*this);
+ }
+
+ private:
+
+ typedef Matrix<Scalar, 1, Size, Options | RowMajor, 1, MaxSize> VectorType;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ static void _compute(MatrixType& matA, CoeffVectorType& hCoeffs, VectorType& temp);
+
+ protected:
+ MatrixType m_matrix;
+ CoeffVectorType m_hCoeffs;
+ VectorType m_temp;
+ bool m_isInitialized;
+};
+
+/** \internal
+ * Performs a Hessenberg reduction of \a matA in place.
+ *
+ * \param matA the input square matrix
+ * \param hCoeffs returned Householder coefficients
+ * \param temp workspace used when applying the Householder reflections
+ *
+ * On return, \a matA holds the Hessenberg matrix and the Householder vectors
+ * in the packed form described in packedMatrix().
+ *
+ * Implemented from Golub's "%Matrix Computations", algorithm 7.4.2.
+ *
+ * \sa packedMatrix()
+ */
+template<typename MatrixType>
+void HessenbergDecomposition<MatrixType>::_compute(MatrixType& matA, CoeffVectorType& hCoeffs, VectorType& temp)
+{
+ eigen_assert(matA.rows()==matA.cols());
+ Index n = matA.rows();
+ temp.resize(n);
+ for (Index i = 0; i<n-1; ++i)
+ {
+ // let's consider the vector v = i-th column starting at position i+1
+ Index remainingSize = n-i-1;
+ RealScalar beta;
+ Scalar h;
+ matA.col(i).tail(remainingSize).makeHouseholderInPlace(h, beta);
+ matA.col(i).coeffRef(i+1) = beta;
+ hCoeffs.coeffRef(i) = h;
+
+ // Apply similarity transformation to remaining columns,
+ // i.e., compute A = H A H'
+
+ // A = H A
+ matA.bottomRightCorner(remainingSize, remainingSize)
+ .applyHouseholderOnTheLeft(matA.col(i).tail(remainingSize-1), h, &temp.coeffRef(0));
+
+ // A = A H'
+ matA.rightCols(remainingSize)
+ .applyHouseholderOnTheRight(matA.col(i).tail(remainingSize-1).conjugate(), numext::conj(h), &temp.coeffRef(0));
+ }
+}
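+
+// A minimal usage sketch, assuming a real square matrix A (variable names are illustrative):
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(5,5);
+//   Eigen::HessenbergDecomposition<Eigen::MatrixXd> hd(A);
+//   Eigen::MatrixXd H = hd.matrixH();   // Hessenberg: zeros below the subdiagonal
+//   Eigen::MatrixXd Q = hd.matrixQ();   // orthogonal
+//   // Up to rounding errors, (Q * H * Q.transpose() - A).norm() is tiny.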
+
+namespace internal {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \brief Expression type for return value of HessenbergDecomposition::matrixH()
+ *
+ * \tparam MatrixType type of matrix in the Hessenberg decomposition
+ *
+ * Objects of this type represent the Hessenberg matrix in the Hessenberg
+ * decomposition of some matrix. The object holds a reference to the
+ * HessenbergDecomposition class until it is assigned or evaluated for
+ * some other reason (the reference should remain valid during the life time
+ * of this object). This class is the return type of
+ * HessenbergDecomposition::matrixH(); there is probably no other use for this
+ * class.
+ */
+template<typename MatrixType> struct HessenbergDecompositionMatrixHReturnType
+: public ReturnByValue<HessenbergDecompositionMatrixHReturnType<MatrixType> >
+{
+ typedef typename MatrixType::Index Index;
+ public:
+ /** \brief Constructor.
+ *
+ * \param[in] hess Hessenberg decomposition
+ */
+ HessenbergDecompositionMatrixHReturnType(const HessenbergDecomposition<MatrixType>& hess) : m_hess(hess) { }
+
+ /** \brief Hessenberg matrix in decomposition.
+ *
+ * \param[out] result Hessenberg matrix in decomposition \p hess which
+ * was passed to the constructor
+ */
+ template <typename ResultType>
+ inline void evalTo(ResultType& result) const
+ {
+ result = m_hess.packedMatrix();
+ Index n = result.rows();
+ if (n>2)
+ result.bottomLeftCorner(n-2, n-2).template triangularView<Lower>().setZero();
+ }
+
+ Index rows() const { return m_hess.packedMatrix().rows(); }
+ Index cols() const { return m_hess.packedMatrix().cols(); }
+
+ protected:
+ const HessenbergDecomposition<MatrixType>& m_hess;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_HESSENBERGDECOMPOSITION_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h b/third_party/eigen3/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h
new file mode 100644
index 0000000000..4fec8af0a3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h
@@ -0,0 +1,160 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MATRIXBASEEIGENVALUES_H
+#define EIGEN_MATRIXBASEEIGENVALUES_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Derived, bool IsComplex>
+struct eigenvalues_selector
+{
+ // this is the implementation for the case IsComplex = true
+ static inline typename MatrixBase<Derived>::EigenvaluesReturnType const
+ run(const MatrixBase<Derived>& m)
+ {
+ typedef typename Derived::PlainObject PlainObject;
+ PlainObject m_eval(m);
+ return ComplexEigenSolver<PlainObject>(m_eval, false).eigenvalues();
+ }
+};
+
+template<typename Derived>
+struct eigenvalues_selector<Derived, false>
+{
+ static inline typename MatrixBase<Derived>::EigenvaluesReturnType const
+ run(const MatrixBase<Derived>& m)
+ {
+ typedef typename Derived::PlainObject PlainObject;
+ PlainObject m_eval(m);
+ return EigenSolver<PlainObject>(m_eval, false).eigenvalues();
+ }
+};
+
+} // end namespace internal
+
+/** \brief Computes the eigenvalues of a matrix
+ * \returns Column vector containing the eigenvalues.
+ *
+ * \eigenvalues_module
+ * This function computes the eigenvalues with the help of the EigenSolver
+ * class (for real matrices) or the ComplexEigenSolver class (for complex
+ * matrices).
+ *
+ * The eigenvalues are repeated according to their algebraic multiplicity,
+ * so there are as many eigenvalues as rows in the matrix.
+ *
+ * The SelfAdjointView class provides a better algorithm for selfadjoint
+ * matrices.
+ *
+ * Example: \include MatrixBase_eigenvalues.cpp
+ * Output: \verbinclude MatrixBase_eigenvalues.out
+ *
+ * \sa EigenSolver::eigenvalues(), ComplexEigenSolver::eigenvalues(),
+ * SelfAdjointView::eigenvalues()
+ */
+template<typename Derived>
+inline typename MatrixBase<Derived>::EigenvaluesReturnType
+MatrixBase<Derived>::eigenvalues() const
+{
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ return internal::eigenvalues_selector<Derived, NumTraits<Scalar>::IsComplex>::run(derived());
+}
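+
+// A minimal usage sketch for a real, not necessarily selfadjoint matrix
+// (variable names are illustrative):
+//
+//   Eigen::MatrixXd m = Eigen::MatrixXd::Random(4,4);
+//   Eigen::VectorXcd eivals = m.eigenvalues();   // complex column vector, one entry per row of m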
+
+/** \brief Computes the eigenvalues of a matrix
+ * \returns Column vector containing the eigenvalues.
+ *
+ * \eigenvalues_module
+ * This function computes the eigenvalues with the help of the
+ * SelfAdjointEigenSolver class. The eigenvalues are repeated according to
+ * their algebraic multiplicity, so there are as many eigenvalues as rows in
+ * the matrix.
+ *
+ * Example: \include SelfAdjointView_eigenvalues.cpp
+ * Output: \verbinclude SelfAdjointView_eigenvalues.out
+ *
+ * \sa SelfAdjointEigenSolver::eigenvalues(), MatrixBase::eigenvalues()
+ */
+template<typename MatrixType, unsigned int UpLo>
+inline typename SelfAdjointView<MatrixType, UpLo>::EigenvaluesReturnType
+SelfAdjointView<MatrixType, UpLo>::eigenvalues() const
+{
+ typedef typename SelfAdjointView<MatrixType, UpLo>::PlainObject PlainObject;
+ PlainObject thisAsMatrix(*this);
+ return SelfAdjointEigenSolver<PlainObject>(thisAsMatrix, false).eigenvalues();
+}
+
+
+
+/** \brief Computes the L2 operator norm
+ * \returns Operator norm of the matrix.
+ *
+ * \eigenvalues_module
+ * This function computes the L2 operator norm of a matrix, which is also
+ * known as the spectral norm. The norm of a matrix \f$ A \f$ is defined to be
+ * \f[ \|A\|_2 = \max_x \frac{\|Ax\|_2}{\|x\|_2} \f]
+ * where the maximum is over all vectors and the norm on the right is the
+ * Euclidean vector norm. The norm equals the largest singular value, which is
+ * the square root of the largest eigenvalue of the positive semi-definite
+ * matrix \f$ A^*A \f$.
+ *
+ * The current implementation uses the eigenvalues of \f$ A^*A \f$, as computed
+ * by SelfAdjointView::eigenvalues(), to compute the operator norm of a
+ * matrix. The SelfAdjointView class provides a better algorithm for
+ * selfadjoint matrices.
+ *
+ * Example: \include MatrixBase_operatorNorm.cpp
+ * Output: \verbinclude MatrixBase_operatorNorm.out
+ *
+ * \sa SelfAdjointView::eigenvalues(), SelfAdjointView::operatorNorm()
+ */
+template<typename Derived>
+inline typename MatrixBase<Derived>::RealScalar
+MatrixBase<Derived>::operatorNorm() const
+{
+ using std::sqrt;
+ typename Derived::PlainObject m_eval(derived());
+ // FIXME if it is really guaranteed that the eigenvalues are already sorted,
+ // then we don't need to compute a maxCoeff() here, comparing the 1st and last ones is enough.
+ return sqrt((m_eval*m_eval.adjoint())
+ .eval()
+ .template selfadjointView<Lower>()
+ .eigenvalues()
+ .maxCoeff()
+ );
+}
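+
+// A minimal sketch relating operatorNorm() to the largest singular value
+// (variable names are illustrative):
+//
+//   Eigen::MatrixXd m = Eigen::MatrixXd::Random(4,4);
+//   double n1 = m.operatorNorm();
+//   double n2 = Eigen::JacobiSVD<Eigen::MatrixXd>(m).singularValues()(0);
+//   // Up to rounding errors, n1 and n2 agree.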
+
+/** \brief Computes the L2 operator norm
+ * \returns Operator norm of the matrix.
+ *
+ * \eigenvalues_module
+ * This function computes the L2 operator norm of a self-adjoint matrix. For a
+ * self-adjoint matrix, the operator norm is the largest absolute value of its eigenvalues.
+ *
+ * The current implementation uses the eigenvalues of the matrix, as computed
+ * by eigenvalues(), to compute the operator norm of the matrix.
+ *
+ * Example: \include SelfAdjointView_operatorNorm.cpp
+ * Output: \verbinclude SelfAdjointView_operatorNorm.out
+ *
+ * \sa eigenvalues(), MatrixBase::operatorNorm()
+ */
+template<typename MatrixType, unsigned int UpLo>
+inline typename SelfAdjointView<MatrixType, UpLo>::RealScalar
+SelfAdjointView<MatrixType, UpLo>::operatorNorm() const
+{
+ return eigenvalues().cwiseAbs().maxCoeff();
+}
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/RealQZ.h b/third_party/eigen3/Eigen/src/Eigenvalues/RealQZ.h
new file mode 100644
index 0000000000..5706eeebe9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/RealQZ.h
@@ -0,0 +1,624 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Alexey Korepanov <kaikaikai@yandex.ru>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REAL_QZ_H
+#define EIGEN_REAL_QZ_H
+
+namespace Eigen {
+
+ /** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class RealQZ
+ *
+ * \brief Performs a real QZ decomposition of a pair of square matrices
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * real QZ decomposition; this is expected to be an instantiation of the
+ * Matrix class template.
+ *
+ * Given real square matrices A and B, this class computes the real QZ
+ * decomposition: \f$ A = Q S Z \f$, \f$ B = Q T Z \f$ where Q and Z are
+ * real orthogonal matrices, T is an upper-triangular matrix, and S is an
+ * upper quasi-triangular matrix. An orthogonal matrix is a matrix whose
+ * inverse is equal to its transpose, \f$ U^{-1} = U^T \f$. A quasi-triangular
+ * matrix is a block-triangular matrix whose diagonal consists of 1-by-1
+ * blocks and 2-by-2 blocks where further reduction is impossible due to
+ * complex eigenvalues.
+ *
+ * The eigenvalues of the pencil \f$ A - z B \f$ can be obtained from
+ * 1x1 and 2x2 blocks on the diagonals of S and T.
+ *
+ * Call the function compute() to compute the real QZ decomposition of a
+ * given pair of matrices. Alternatively, you can use the
+ * RealQZ(const MatrixType& A, const MatrixType& B, bool computeQZ)
+ * constructor which computes the real QZ decomposition at construction
+ * time. Once the decomposition is computed, you can use the matrixS(),
+ * matrixT(), matrixQ() and matrixZ() functions to retrieve the matrices
+ * S, T, Q and Z in the decomposition. If computeQZ==false, some time
+ * is saved by not computing matrices Q and Z.
+ *
+ * Example: \include RealQZ_compute.cpp
+ * Output: \verbinclude RealQZ_compute.out
+ *
+ * \note The implementation is based on the algorithm in "Matrix Computations"
+ * by Gene H. Golub and Charles F. Van Loan, and a paper "An algorithm for
+ * generalized eigenvalue problems" by C.B.Moler and G.W.Stewart.
+ *
+ * \sa class RealSchur, class ComplexSchur, class EigenSolver, class ComplexEigenSolver
+ */
+
+ template<typename _MatrixType> class RealQZ
+ {
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef std::complex<typename NumTraits<Scalar>::Real> ComplexScalar;
+ typedef typename MatrixType::Index Index;
+
+ typedef Matrix<ComplexScalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> EigenvalueType;
+ typedef Matrix<Scalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> ColumnVectorType;
+
+ /** \brief Default constructor.
+ *
+ * \param [in] size Positive integer, size of the matrix whose QZ decomposition will be computed.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). The \p size parameter is only
+ * used as a hint. It is not an error to give a wrong \p size, but it may
+ * impair performance.
+ *
+ * \sa compute() for an example.
+ */
+ RealQZ(Index size = RowsAtCompileTime==Dynamic ? 1 : RowsAtCompileTime) :
+ m_S(size, size),
+ m_T(size, size),
+ m_Q(size, size),
+ m_Z(size, size),
+ m_workspace(size*2),
+ m_maxIters(400),
+ m_isInitialized(false)
+ { }
+
+ /** \brief Constructor; computes real QZ decomposition of given matrices
+ *
+ * \param[in] A Matrix A.
+ * \param[in] B Matrix B.
+ * \param[in] computeQZ If false, the matrices Q and Z are not computed.
+ *
+ * This constructor calls compute() to compute the QZ decomposition.
+ */
+ RealQZ(const MatrixType& A, const MatrixType& B, bool computeQZ = true) :
+ m_S(A.rows(),A.cols()),
+ m_T(A.rows(),A.cols()),
+ m_Q(A.rows(),A.cols()),
+ m_Z(A.rows(),A.cols()),
+ m_workspace(A.rows()*2),
+ m_maxIters(400),
+ m_isInitialized(false) {
+ compute(A, B, computeQZ);
+ }
+
+ /** \brief Returns matrix Q in the QZ decomposition.
+ *
+ * \returns A const reference to the matrix Q.
+ */
+ const MatrixType& matrixQ() const {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ eigen_assert(m_computeQZ && "The matrices Q and Z have not been computed during the QZ decomposition.");
+ return m_Q;
+ }
+
+ /** \brief Returns matrix Z in the QZ decomposition.
+ *
+ * \returns A const reference to the matrix Z.
+ */
+ const MatrixType& matrixZ() const {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ eigen_assert(m_computeQZ && "The matrices Q and Z have not been computed during the QZ decomposition.");
+ return m_Z;
+ }
+
+ /** \brief Returns matrix S in the QZ decomposition.
+ *
+ * \returns A const reference to the matrix S.
+ */
+ const MatrixType& matrixS() const {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ return m_S;
+ }
+
+ /** \brief Returns matrix T in the QZ decomposition.
+ *
+ * \returns A const reference to the matrix T.
+ */
+ const MatrixType& matrixT() const {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ return m_T;
+ }
+
+ /** \brief Computes QZ decomposition of given matrix.
+ *
+ * \param[in] A Matrix A.
+ * \param[in] B Matrix B.
+ * \param[in] computeQZ If false, the matrices Q and Z are not computed.
+ * \returns Reference to \c *this
+ */
+ RealQZ& compute(const MatrixType& A, const MatrixType& B, bool computeQZ = true);
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful, \c NoConvergence otherwise.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ return m_info;
+ }
+
+ /** \brief Returns number of performed QR-like iterations.
+ */
+ Index iterations() const
+ {
+ eigen_assert(m_isInitialized && "RealQZ is not initialized.");
+ return m_global_iter;
+ }
+
+ /** Sets the maximal number of iterations allowed to converge to one eigenvalue
+ * or decouple the problem.
+ */
+ RealQZ& setMaxIterations(Index maxIters)
+ {
+ m_maxIters = maxIters;
+ return *this;
+ }
+
+ private:
+
+ MatrixType m_S, m_T, m_Q, m_Z;
+ Matrix<Scalar,Dynamic,1> m_workspace;
+ ComputationInfo m_info;
+ Index m_maxIters;
+ bool m_isInitialized;
+ bool m_computeQZ;
+ Scalar m_normOfT, m_normOfS;
+ Index m_global_iter;
+
+ typedef Matrix<Scalar,3,1> Vector3s;
+ typedef Matrix<Scalar,2,1> Vector2s;
+ typedef Matrix<Scalar,2,2> Matrix2s;
+ typedef JacobiRotation<Scalar> JRs;
+
+ void hessenbergTriangular();
+ void computeNorms();
+ Index findSmallSubdiagEntry(Index iu);
+ Index findSmallDiagEntry(Index f, Index l);
+ void splitOffTwoRows(Index i);
+ void pushDownZero(Index z, Index f, Index l);
+ void step(Index f, Index l, Index iter);
+
+ }; // RealQZ
+
+ /** \internal Reduces S and T to upper Hessenberg-triangular form */
+ template<typename MatrixType>
+ void RealQZ<MatrixType>::hessenbergTriangular()
+ {
+
+ const Index dim = m_S.cols();
+
+ // perform QR decomposition of T, overwrite T with R, save Q
+ HouseholderQR<MatrixType> qrT(m_T);
+ m_T = qrT.matrixQR();
+ m_T.template triangularView<StrictlyLower>().setZero();
+ m_Q = qrT.householderQ();
+ // overwrite S with Q* S
+ m_S.applyOnTheLeft(m_Q.adjoint());
+ // init Z as Identity
+ if (m_computeQZ)
+ m_Z = MatrixType::Identity(dim,dim);
+ // reduce S to upper Hessenberg with Givens rotations
+ for (Index j=0; j<=dim-3; j++) {
+ for (Index i=dim-1; i>=j+2; i--) {
+ JRs G;
+ // kill S(i,j)
+ if(m_S.coeff(i,j) != 0)
+ {
+ G.makeGivens(m_S.coeff(i-1,j), m_S.coeff(i,j), &m_S.coeffRef(i-1, j));
+ m_S.coeffRef(i,j) = Scalar(0.0);
+ m_S.rightCols(dim-j-1).applyOnTheLeft(i-1,i,G.adjoint());
+ m_T.rightCols(dim-i+1).applyOnTheLeft(i-1,i,G.adjoint());
+ }
+ // update Q
+ if (m_computeQZ)
+ m_Q.applyOnTheRight(i-1,i,G);
+ // kill T(i,i-1)
+ if(m_T.coeff(i,i-1)!=Scalar(0))
+ {
+ G.makeGivens(m_T.coeff(i,i), m_T.coeff(i,i-1), &m_T.coeffRef(i,i));
+ m_T.coeffRef(i,i-1) = Scalar(0.0);
+ m_S.applyOnTheRight(i,i-1,G);
+ m_T.topRows(i).applyOnTheRight(i,i-1,G);
+ }
+ // update Z
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(i,i-1,G.adjoint());
+ }
+ }
+ }
+
+ /** \internal Computes vector L1 norms of S and T when in Hessenberg-Triangular form already */
+ template<typename MatrixType>
+ inline void RealQZ<MatrixType>::computeNorms()
+ {
+ const Index size = m_S.cols();
+ m_normOfS = Scalar(0.0);
+ m_normOfT = Scalar(0.0);
+ for (Index j = 0; j < size; ++j)
+ {
+ m_normOfS += m_S.col(j).segment(0, (std::min)(size,j+2)).cwiseAbs().sum();
+ m_normOfT += m_T.row(j).segment(j, size - j).cwiseAbs().sum();
+ }
+ }
+
+
+ /** \internal Look for single small sub-diagonal element S(res, res-1) and return res (or 0) */
+ template<typename MatrixType>
+ inline typename MatrixType::Index RealQZ<MatrixType>::findSmallSubdiagEntry(Index iu)
+ {
+ using std::abs;
+ Index res = iu;
+ while (res > 0)
+ {
+ Scalar s = abs(m_S.coeff(res-1,res-1)) + abs(m_S.coeff(res,res));
+ if (s == Scalar(0.0))
+ s = m_normOfS;
+ if (abs(m_S.coeff(res,res-1)) < NumTraits<Scalar>::epsilon() * s)
+ break;
+ res--;
+ }
+ return res;
+ }
+
+ /** \internal Look for single small diagonal element T(res, res) for res between f and l, and return res (or f-1) */
+ template<typename MatrixType>
+ inline typename MatrixType::Index RealQZ<MatrixType>::findSmallDiagEntry(Index f, Index l)
+ {
+ using std::abs;
+ Index res = l;
+ while (res >= f) {
+ if (abs(m_T.coeff(res,res)) <= NumTraits<Scalar>::epsilon() * m_normOfT)
+ break;
+ res--;
+ }
+ return res;
+ }
+
+ /** \internal decouple 2x2 diagonal block in rows i, i+1 if eigenvalues are real */
+ template<typename MatrixType>
+ inline void RealQZ<MatrixType>::splitOffTwoRows(Index i)
+ {
+ using std::abs;
+ using std::sqrt;
+ const Index dim=m_S.cols();
+ if (abs(m_S.coeff(i+1,i))==Scalar(0))
+ return;
+ Index z = findSmallDiagEntry(i,i+1);
+ if (z==i-1)
+ {
+ // block of (S T^{-1})
+ Matrix2s STi = m_T.template block<2,2>(i,i).template triangularView<Upper>().
+ template solve<OnTheRight>(m_S.template block<2,2>(i,i));
+ Scalar p = Scalar(0.5)*(STi(0,0)-STi(1,1));
+ Scalar q = p*p + STi(1,0)*STi(0,1);
+ if (q>=0) {
+ Scalar z = sqrt(q);
+ // one QR-like iteration for ABi - lambda I
+ // is enough - when we know exact eigenvalue in advance,
+ // convergence is immediate
+ JRs G;
+ if (p>=0)
+ G.makeGivens(p + z, STi(1,0));
+ else
+ G.makeGivens(p - z, STi(1,0));
+ m_S.rightCols(dim-i).applyOnTheLeft(i,i+1,G.adjoint());
+ m_T.rightCols(dim-i).applyOnTheLeft(i,i+1,G.adjoint());
+ // update Q
+ if (m_computeQZ)
+ m_Q.applyOnTheRight(i,i+1,G);
+
+ G.makeGivens(m_T.coeff(i+1,i+1), m_T.coeff(i+1,i));
+ m_S.topRows(i+2).applyOnTheRight(i+1,i,G);
+ m_T.topRows(i+2).applyOnTheRight(i+1,i,G);
+ // update Z
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(i+1,i,G.adjoint());
+
+ m_S.coeffRef(i+1,i) = Scalar(0.0);
+ m_T.coeffRef(i+1,i) = Scalar(0.0);
+ }
+ }
+ else
+ {
+ pushDownZero(z,i,i+1);
+ }
+ }
+
+ /** \internal use zero in T(z,z) to zero S(l,l-1), working in block f..l */
+ template<typename MatrixType>
+ inline void RealQZ<MatrixType>::pushDownZero(Index z, Index f, Index l)
+ {
+ JRs G;
+ const Index dim = m_S.cols();
+ for (Index zz=z; zz<l; zz++)
+ {
+ // push 0 down
+ Index firstColS = zz>f ? (zz-1) : zz;
+ G.makeGivens(m_T.coeff(zz, zz+1), m_T.coeff(zz+1, zz+1));
+ m_S.rightCols(dim-firstColS).applyOnTheLeft(zz,zz+1,G.adjoint());
+ m_T.rightCols(dim-zz).applyOnTheLeft(zz,zz+1,G.adjoint());
+ m_T.coeffRef(zz+1,zz+1) = Scalar(0.0);
+ // update Q
+ if (m_computeQZ)
+ m_Q.applyOnTheRight(zz,zz+1,G);
+ // kill S(zz+1, zz-1)
+ if (zz>f)
+ {
+ G.makeGivens(m_S.coeff(zz+1, zz), m_S.coeff(zz+1,zz-1));
+ m_S.topRows(zz+2).applyOnTheRight(zz, zz-1,G);
+ m_T.topRows(zz+1).applyOnTheRight(zz, zz-1,G);
+ m_S.coeffRef(zz+1,zz-1) = Scalar(0.0);
+ // update Z
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(zz,zz-1,G.adjoint());
+ }
+ }
+ // finally kill S(l,l-1)
+ G.makeGivens(m_S.coeff(l,l), m_S.coeff(l,l-1));
+ m_S.applyOnTheRight(l,l-1,G);
+ m_T.applyOnTheRight(l,l-1,G);
+ m_S.coeffRef(l,l-1)=Scalar(0.0);
+ // update Z
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(l,l-1,G.adjoint());
+ }
+
+ /** \internal QR-like iterative step for block f..l */
+ template<typename MatrixType>
+ inline void RealQZ<MatrixType>::step(Index f, Index l, Index iter)
+ {
+ using std::abs;
+ const Index dim = m_S.cols();
+
+ // x, y, z
+ Scalar x, y, z;
+ if (iter==10)
+ {
+ // Wilkinson ad hoc shift
+ const Scalar
+ a11=m_S.coeff(f+0,f+0), a12=m_S.coeff(f+0,f+1),
+ a21=m_S.coeff(f+1,f+0), a22=m_S.coeff(f+1,f+1), a32=m_S.coeff(f+2,f+1),
+ b12=m_T.coeff(f+0,f+1),
+ b11i=Scalar(1.0)/m_T.coeff(f+0,f+0),
+ b22i=Scalar(1.0)/m_T.coeff(f+1,f+1),
+ a87=m_S.coeff(l-1,l-2),
+ a98=m_S.coeff(l-0,l-1),
+ b77i=Scalar(1.0)/m_T.coeff(l-2,l-2),
+ b88i=Scalar(1.0)/m_T.coeff(l-1,l-1);
+ Scalar ss = abs(a87*b77i) + abs(a98*b88i),
+ lpl = Scalar(1.5)*ss,
+ ll = ss*ss;
+ x = ll + a11*a11*b11i*b11i - lpl*a11*b11i + a12*a21*b11i*b22i
+ - a11*a21*b12*b11i*b11i*b22i;
+ y = a11*a21*b11i*b11i - lpl*a21*b11i + a21*a22*b11i*b22i
+ - a21*a21*b12*b11i*b11i*b22i;
+ z = a21*a32*b11i*b22i;
+ }
+ else if (iter==16)
+ {
+ // another exceptional shift
+ x = m_S.coeff(f,f)/m_T.coeff(f,f)-m_S.coeff(l,l)/m_T.coeff(l,l) + m_S.coeff(l,l-1)*m_T.coeff(l-1,l) /
+ (m_T.coeff(l-1,l-1)*m_T.coeff(l,l));
+ y = m_S.coeff(f+1,f)/m_T.coeff(f,f);
+ z = 0;
+ }
+ else if (iter>23 && !(iter%8))
+ {
+ // extremely exceptional shift
+ x = internal::random<Scalar>(-1.0,1.0);
+ y = internal::random<Scalar>(-1.0,1.0);
+ z = internal::random<Scalar>(-1.0,1.0);
+ }
+ else
+ {
+ // Compute the shifts: (x,y,z,0...) = (AB^-1 - l1 I) (AB^-1 - l2 I) e1
+ // where l1 and l2 are the eigenvalues of the 2x2 matrix C = U V^-1 where
+ // U and V are 2x2 bottom right sub matrices of A and B. Thus:
+ // = (AB^-1)(AB^-1) + l1 l2 I - (l1+l2)(AB^-1)
+ // = (AB^-1)(AB^-1) + det(C) I - tr(C)(AB^-1)
+ // Since we are only interested in having x, y, z with a correct ratio, we have:
+ const Scalar
+ a11 = m_S.coeff(f,f), a12 = m_S.coeff(f,f+1),
+ a21 = m_S.coeff(f+1,f), a22 = m_S.coeff(f+1,f+1),
+ a32 = m_S.coeff(f+2,f+1),
+
+ a88 = m_S.coeff(l-1,l-1), a89 = m_S.coeff(l-1,l),
+ a98 = m_S.coeff(l,l-1), a99 = m_S.coeff(l,l),
+
+ b11 = m_T.coeff(f,f), b12 = m_T.coeff(f,f+1),
+ b22 = m_T.coeff(f+1,f+1),
+
+ b88 = m_T.coeff(l-1,l-1), b89 = m_T.coeff(l-1,l),
+ b99 = m_T.coeff(l,l);
+
+ x = ( (a88/b88 - a11/b11)*(a99/b99 - a11/b11) - (a89/b99)*(a98/b88) + (a98/b88)*(b89/b99)*(a11/b11) ) * (b11/a21)
+ + a12/b22 - (a11/b11)*(b12/b22);
+ y = (a22/b22-a11/b11) - (a21/b11)*(b12/b22) - (a88/b88-a11/b11) - (a99/b99-a11/b11) + (a98/b88)*(b89/b99);
+ z = a32/b22;
+ }
+
+ JRs G;
+
+ for (Index k=f; k<=l-2; k++)
+ {
+ // variables for Householder reflections
+ Vector2s essential2;
+ Scalar tau, beta;
+
+ Vector3s hr(x,y,z);
+
+ // Q_k to annihilate S(k+1,k-1) and S(k+2,k-1)
+ hr.makeHouseholderInPlace(tau, beta);
+ essential2 = hr.template bottomRows<2>();
+ Index fc=(std::max)(k-1,Index(0)); // first col to update
+ m_S.template middleRows<3>(k).rightCols(dim-fc).applyHouseholderOnTheLeft(essential2, tau, m_workspace.data());
+ m_T.template middleRows<3>(k).rightCols(dim-fc).applyHouseholderOnTheLeft(essential2, tau, m_workspace.data());
+ if (m_computeQZ)
+ m_Q.template middleCols<3>(k).applyHouseholderOnTheRight(essential2, tau, m_workspace.data());
+ if (k>f)
+ m_S.coeffRef(k+2,k-1) = m_S.coeffRef(k+1,k-1) = Scalar(0.0);
+
+ // Z_{k1} to annihilate T(k+2,k+1) and T(k+2,k)
+ hr << m_T.coeff(k+2,k+2),m_T.coeff(k+2,k),m_T.coeff(k+2,k+1);
+ hr.makeHouseholderInPlace(tau, beta);
+ essential2 = hr.template bottomRows<2>();
+ {
+ Index lr = (std::min)(k+4,dim); // last row to update
+ Map<Matrix<Scalar,Dynamic,1> > tmp(m_workspace.data(),lr);
+ // S
+ tmp = m_S.template middleCols<2>(k).topRows(lr) * essential2;
+ tmp += m_S.col(k+2).head(lr);
+ m_S.col(k+2).head(lr) -= tau*tmp;
+ m_S.template middleCols<2>(k).topRows(lr) -= (tau*tmp) * essential2.adjoint();
+ // T
+ tmp = m_T.template middleCols<2>(k).topRows(lr) * essential2;
+ tmp += m_T.col(k+2).head(lr);
+ m_T.col(k+2).head(lr) -= tau*tmp;
+ m_T.template middleCols<2>(k).topRows(lr) -= (tau*tmp) * essential2.adjoint();
+ }
+ if (m_computeQZ)
+ {
+ // Z
+ Map<Matrix<Scalar,1,Dynamic> > tmp(m_workspace.data(),dim);
+ tmp = essential2.adjoint()*(m_Z.template middleRows<2>(k));
+ tmp += m_Z.row(k+2);
+ m_Z.row(k+2) -= tau*tmp;
+ m_Z.template middleRows<2>(k) -= essential2 * (tau*tmp);
+ }
+ m_T.coeffRef(k+2,k) = m_T.coeffRef(k+2,k+1) = Scalar(0.0);
+
+ // Z_{k2} to annihilate T(k+1,k)
+ G.makeGivens(m_T.coeff(k+1,k+1), m_T.coeff(k+1,k));
+ m_S.applyOnTheRight(k+1,k,G);
+ m_T.applyOnTheRight(k+1,k,G);
+ // update Z
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(k+1,k,G.adjoint());
+ m_T.coeffRef(k+1,k) = Scalar(0.0);
+
+ // update x,y,z
+ x = m_S.coeff(k+1,k);
+ y = m_S.coeff(k+2,k);
+ if (k < l-2)
+ z = m_S.coeff(k+3,k);
+ } // loop over k
+
+ // Q_{n-1} to annihilate y = S(l,l-2)
+ G.makeGivens(x,y);
+ m_S.applyOnTheLeft(l-1,l,G.adjoint());
+ m_T.applyOnTheLeft(l-1,l,G.adjoint());
+ if (m_computeQZ)
+ m_Q.applyOnTheRight(l-1,l,G);
+ m_S.coeffRef(l,l-2) = Scalar(0.0);
+
+ // Z_{n-1} to annihilate T(l,l-1)
+ G.makeGivens(m_T.coeff(l,l),m_T.coeff(l,l-1));
+ m_S.applyOnTheRight(l,l-1,G);
+ m_T.applyOnTheRight(l,l-1,G);
+ if (m_computeQZ)
+ m_Z.applyOnTheLeft(l,l-1,G.adjoint());
+ m_T.coeffRef(l,l-1) = Scalar(0.0);
+ }
+
+
+ template<typename MatrixType>
+ RealQZ<MatrixType>& RealQZ<MatrixType>::compute(const MatrixType& A_in, const MatrixType& B_in, bool computeQZ)
+ {
+
+ const Index dim = A_in.cols();
+
+ eigen_assert (A_in.rows()==dim && A_in.cols()==dim
+ && B_in.rows()==dim && B_in.cols()==dim
+ && "Need square matrices of the same dimension");
+
+ m_isInitialized = true;
+ m_computeQZ = computeQZ;
+ m_S = A_in; m_T = B_in;
+ m_workspace.resize(dim*2);
+ m_global_iter = 0;
+
+ // entrance point: hessenberg triangular decomposition
+ hessenbergTriangular();
+ // compute L1 vector norms of T, S into m_normOfS, m_normOfT
+ computeNorms();
+
+ Index l = dim-1,
+ f,
+ local_iter = 0;
+
+ while (l>0 && local_iter<m_maxIters)
+ {
+ f = findSmallSubdiagEntry(l);
+ // now rows and columns f..l (including) decouple from the rest of the problem
+ if (f>0) m_S.coeffRef(f,f-1) = Scalar(0.0);
+ if (f == l) // One root found
+ {
+ l--;
+ local_iter = 0;
+ }
+ else if (f == l-1) // Two roots found
+ {
+ splitOffTwoRows(f);
+ l -= 2;
+ local_iter = 0;
+ }
+ else // No convergence yet
+ {
+ // if there's zero on diagonal of T, we can isolate an eigenvalue with Givens rotations
+ Index z = findSmallDiagEntry(f,l);
+ if (z>=f)
+ {
+ // zero found
+ pushDownZero(z,f,l);
+ }
+ else
+ {
+ // We are sure now that S.block(f,f, l-f+1,l-f+1) is unreduced upper-Hessenberg
+ // and T.block(f,f, l-f+1,l-f+1) is invertible upper-triangular, which allows us to
+ // apply a QR-like iteration to rows and columns f..l.
+ step(f,l, local_iter);
+ local_iter++;
+ m_global_iter++;
+ }
+ }
+ }
+ // check if we converged before reaching iterations limit
+ m_info = (local_iter<m_maxIters) ? Success : NoConvergence;
+ return *this;
+ } // end compute
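+
+  // A minimal usage sketch, assuming real square matrices A and B of the same size
+  // (variable names are illustrative):
+  //
+  //   Eigen::MatrixXd A = Eigen::MatrixXd::Random(4,4), B = Eigen::MatrixXd::Random(4,4);
+  //   Eigen::RealQZ<Eigen::MatrixXd> qz(A, B);   // computes Q, S, Z and T
+  //   // Up to rounding errors, A ~ qz.matrixQ() * qz.matrixS() * qz.matrixZ()
+  //   //                    and B ~ qz.matrixQ() * qz.matrixT() * qz.matrixZ().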
+
+} // end namespace Eigen
+
+#endif // EIGEN_REAL_QZ_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur.h b/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur.h
new file mode 100644
index 0000000000..64d1363414
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur.h
@@ -0,0 +1,529 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010,2012 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_REAL_SCHUR_H
+#define EIGEN_REAL_SCHUR_H
+
+#include "./HessenbergDecomposition.h"
+
+namespace Eigen {
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class RealSchur
+ *
+ * \brief Performs a real Schur decomposition of a square matrix
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * real Schur decomposition; this is expected to be an instantiation of the
+ * Matrix class template.
+ *
+ * Given a real square matrix A, this class computes the real Schur
+ * decomposition: \f$ A = U T U^T \f$ where U is a real orthogonal matrix and
+ * T is a real quasi-triangular matrix. An orthogonal matrix is a matrix whose
+ * inverse is equal to its transpose, \f$ U^{-1} = U^T \f$. A quasi-triangular
+ * matrix is a block-triangular matrix whose diagonal consists of 1-by-1
+ * blocks and 2-by-2 blocks with complex eigenvalues. The eigenvalues of the
+ * blocks on the diagonal of T are the same as the eigenvalues of the matrix
+ * A, and thus the real Schur decomposition is used in EigenSolver to compute
+ * the eigendecomposition of a matrix.
+ *
+ * Call the function compute() to compute the real Schur decomposition of a
+ * given matrix. Alternatively, you can use the RealSchur(const MatrixType&, bool)
+ * constructor which computes the real Schur decomposition at construction
+ * time. Once the decomposition is computed, you can use the matrixU() and
+ * matrixT() functions to retrieve the matrices U and T in the decomposition.
+ *
+ * The documentation of RealSchur(const MatrixType&, bool) contains an example
+ * of the typical use of this class.
+ *
+ * \note The implementation is adapted from
+ * <a href="http://math.nist.gov/javanumerics/jama/">JAMA</a> (public domain).
+ * Their code is based on EISPACK.
+ *
+ * \sa class ComplexSchur, class EigenSolver, class ComplexEigenSolver
+ */
+template<typename _MatrixType> class RealSchur
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef std::complex<typename NumTraits<Scalar>::Real> ComplexScalar;
+ typedef typename MatrixType::Index Index;
+
+ typedef Matrix<ComplexScalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> EigenvalueType;
+ typedef Matrix<Scalar, ColsAtCompileTime, 1, Options & ~RowMajor, MaxColsAtCompileTime, 1> ColumnVectorType;
+
+ /** \brief Default constructor.
+ *
+ * \param [in] size Positive integer, size of the matrix whose Schur decomposition will be computed.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). The \p size parameter is only
+ * used as a hint. It is not an error to give a wrong \p size, but it may
+ * impair performance.
+ *
+ * \sa compute() for an example.
+ */
+ RealSchur(Index size = RowsAtCompileTime==Dynamic ? 1 : RowsAtCompileTime)
+ : m_matT(size, size),
+ m_matU(size, size),
+ m_workspaceVector(size),
+ m_hess(size),
+ m_isInitialized(false),
+ m_matUisUptodate(false),
+ m_maxIters(-1)
+ { }
+
+ /** \brief Constructor; computes real Schur decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Schur decomposition is to be computed.
+ * \param[in] computeU If true, both T and U are computed; if false, only T is computed.
+ *
+ * This constructor calls compute() to compute the Schur decomposition.
+ *
+ * Example: \include RealSchur_RealSchur_MatrixType.cpp
+ * Output: \verbinclude RealSchur_RealSchur_MatrixType.out
+ */
+ RealSchur(const MatrixType& matrix, bool computeU = true)
+ : m_matT(matrix.rows(),matrix.cols()),
+ m_matU(matrix.rows(),matrix.cols()),
+ m_workspaceVector(matrix.rows()),
+ m_hess(matrix.rows()),
+ m_isInitialized(false),
+ m_matUisUptodate(false),
+ m_maxIters(-1)
+ {
+ compute(matrix, computeU);
+ }
+
+ /** \brief Returns the orthogonal matrix in the Schur decomposition.
+ *
+ * \returns A const reference to the matrix U.
+ *
+ * \pre Either the constructor RealSchur(const MatrixType&, bool) or the
+ * member function compute(const MatrixType&, bool) has been called before
+ * to compute the Schur decomposition of a matrix, and \p computeU was set
+ * to true (the default value).
+ *
+ * \sa RealSchur(const MatrixType&, bool) for an example
+ */
+ const MatrixType& matrixU() const
+ {
+ eigen_assert(m_isInitialized && "RealSchur is not initialized.");
+ eigen_assert(m_matUisUptodate && "The matrix U has not been computed during the RealSchur decomposition.");
+ return m_matU;
+ }
+
+ /** \brief Returns the quasi-triangular matrix in the Schur decomposition.
+ *
+ * \returns A const reference to the matrix T.
+ *
+ * \pre Either the constructor RealSchur(const MatrixType&, bool) or the
+ * member function compute(const MatrixType&, bool) has been called before
+ * to compute the Schur decomposition of a matrix.
+ *
+ * \sa RealSchur(const MatrixType&, bool) for an example
+ */
+ const MatrixType& matrixT() const
+ {
+ eigen_assert(m_isInitialized && "RealSchur is not initialized.");
+ return m_matT;
+ }
+
+ /** \brief Computes Schur decomposition of given matrix.
+ *
+ * \param[in] matrix Square matrix whose Schur decomposition is to be computed.
+ * \param[in] computeU If true, both T and U are computed; if false, only T is computed.
+ * \returns Reference to \c *this
+ *
+ * The Schur decomposition is computed by first reducing the matrix to
+ * Hessenberg form using the class HessenbergDecomposition. The Hessenberg
+ * matrix is then reduced to triangular form by performing Francis QR
+ * iterations with implicit double shift. The cost of computing the Schur
+ * decomposition depends on the number of iterations; as a rough guide, it
+ * may be taken to be \f$25n^3\f$ flops if \a computeU is true and
+ * \f$10n^3\f$ flops if \a computeU is false.
+ *
+ * Example: \include RealSchur_compute.cpp
+ * Output: \verbinclude RealSchur_compute.out
+ *
+ * \sa compute(const MatrixType&, bool, Index)
+ */
+ RealSchur& compute(const MatrixType& matrix, bool computeU = true);
+
+ /** \brief Computes Schur decomposition of a Hessenberg matrix H = Z T Z^T
+ * \param[in] matrixH Matrix in Hessenberg form H
+ * \param[in] matrixQ Orthogonal matrix Q that transforms a matrix A to H: A = Q H Q^T
+ * \param computeU Computes the matrix U of the Schur vectors
+ * \return Reference to \c *this
+ *
+ * This routine assumes that the matrix is already reduced to the Hessenberg form matrixH
+ * using either the class HessenbergDecomposition or some other means.
+ * It computes the upper quasi-triangular matrix T of the Schur decomposition of H.
+ * When computeU is true, this routine computes the matrix U such that
+ * A = U T U^T = (QZ) T (QZ)^T = Q H Q^T, where A is the initial matrix.
+ *
+ * NOTE Q is referenced if computeU is true; so, if the initial orthogonal matrix
+ * is not available, the user should give an identity matrix (Q.setIdentity())
+ *
+ * \sa compute(const MatrixType&, bool)
+ */
+ template<typename HessMatrixType, typename OrthMatrixType>
+ RealSchur& computeFromHessenberg(const HessMatrixType& matrixH, const OrthMatrixType& matrixQ, bool computeU);
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful, \c NoConvergence otherwise.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "RealSchur is not initialized.");
+ return m_info;
+ }
+
+ /** \brief Sets the maximum number of iterations allowed.
+ *
+ * If not specified by the user, the maximum number of iterations is m_maxIterationsPerRow times the size
+ * of the matrix.
+ */
+ RealSchur& setMaxIterations(Index maxIters)
+ {
+ m_maxIters = maxIters;
+ return *this;
+ }
+
+ /** \brief Returns the maximum number of iterations. */
+ Index getMaxIterations()
+ {
+ return m_maxIters;
+ }
+
+ /** \brief Maximum number of iterations per row.
+ *
+ * If not otherwise specified, the maximum number of iterations is this number times the size of the
+ * matrix. It is currently set to 40.
+ */
+ static const int m_maxIterationsPerRow = 40;
+
+ private:
+
+ MatrixType m_matT;
+ MatrixType m_matU;
+ ColumnVectorType m_workspaceVector;
+ HessenbergDecomposition<MatrixType> m_hess;
+ ComputationInfo m_info;
+ bool m_isInitialized;
+ bool m_matUisUptodate;
+ Index m_maxIters;
+
+ typedef Matrix<Scalar,3,1> Vector3s;
+
+ Scalar computeNormOfT();
+ Index findSmallSubdiagEntry(Index iu, const Scalar& norm);
+ void splitOffTwoRows(Index iu, bool computeU, const Scalar& exshift);
+ void computeShift(Index iu, Index iter, Scalar& exshift, Vector3s& shiftInfo);
+ void initFrancisQRStep(Index il, Index iu, const Vector3s& shiftInfo, Index& im, Vector3s& firstHouseholderVector);
+ void performFrancisQRStep(Index il, Index im, Index iu, bool computeU, const Vector3s& firstHouseholderVector, Scalar* workspace);
+};
+
+
+template<typename MatrixType>
+RealSchur<MatrixType>& RealSchur<MatrixType>::compute(const MatrixType& matrix, bool computeU)
+{
+ eigen_assert(matrix.cols() == matrix.rows());
+ Index maxIters = m_maxIters;
+ if (maxIters == -1)
+ maxIters = m_maxIterationsPerRow * matrix.rows();
+
+ // Step 1. Reduce to Hessenberg form
+ m_hess.compute(matrix);
+
+ // Step 2. Reduce to real Schur form
+ computeFromHessenberg(m_hess.matrixH(), m_hess.matrixQ(), computeU);
+
+ return *this;
+}
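+
+// A minimal usage sketch, assuming a real square matrix A (variable names are illustrative):
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(6,6);
+//   Eigen::RealSchur<Eigen::MatrixXd> schur(A);
+//   Eigen::MatrixXd U = schur.matrixU();   // orthogonal
+//   Eigen::MatrixXd T = schur.matrixT();   // quasi-triangular
+//   // Up to rounding errors, (U * T * U.transpose() - A).norm() is tiny.
+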
+template<typename MatrixType>
+template<typename HessMatrixType, typename OrthMatrixType>
+RealSchur<MatrixType>& RealSchur<MatrixType>::computeFromHessenberg(const HessMatrixType& matrixH, const OrthMatrixType& matrixQ, bool computeU)
+{
+ m_matT = matrixH;
+ if(computeU)
+ m_matU = matrixQ;
+
+ Index maxIters = m_maxIters;
+ if (maxIters == -1)
+ maxIters = m_maxIterationsPerRow * matrixH.rows();
+ m_workspaceVector.resize(m_matT.cols());
+ Scalar* workspace = &m_workspaceVector.coeffRef(0);
+
+ // The matrix m_matT is divided in three parts.
+ // Rows 0,...,il-1 are decoupled from the rest because m_matT(il,il-1) is zero.
+ // Rows il,...,iu form the part we are working on (the active window).
+ // Rows iu+1,...,end have already been brought into triangular form.
+ Index iu = m_matT.cols() - 1;
+ Index iter = 0; // iteration count for current eigenvalue
+ Index totalIter = 0; // iteration count for whole matrix
+ Scalar exshift(0); // sum of exceptional shifts
+ Scalar norm = computeNormOfT();
+
+ if(norm!=0)
+ {
+ while (iu >= 0)
+ {
+ Index il = findSmallSubdiagEntry(iu, norm);
+
+ // Check for convergence
+ if (il == iu) // One root found
+ {
+ m_matT.coeffRef(iu,iu) = m_matT.coeff(iu,iu) + exshift;
+ if (iu > 0)
+ m_matT.coeffRef(iu, iu-1) = Scalar(0);
+ iu--;
+ iter = 0;
+ }
+ else if (il == iu-1) // Two roots found
+ {
+ splitOffTwoRows(iu, computeU, exshift);
+ iu -= 2;
+ iter = 0;
+ }
+ else // No convergence yet
+ {
+ // The firstHouseholderVector vector has to be initialized to something to get rid of a silly GCC warning (-O1 -Wall -DNDEBUG )
+ Vector3s firstHouseholderVector(0,0,0), shiftInfo;
+ computeShift(iu, iter, exshift, shiftInfo);
+ iter = iter + 1;
+ totalIter = totalIter + 1;
+ if (totalIter > maxIters) break;
+ Index im;
+ initFrancisQRStep(il, iu, shiftInfo, im, firstHouseholderVector);
+ performFrancisQRStep(il, im, iu, computeU, firstHouseholderVector, workspace);
+ }
+ }
+ }
+ if(totalIter <= maxIters)
+ m_info = Success;
+ else
+ m_info = NoConvergence;
+
+ m_isInitialized = true;
+ m_matUisUptodate = computeU;
+ return *this;
+}
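+
+// A minimal sketch of computeFromHessenberg(), assuming the Hessenberg form was obtained
+// with HessenbergDecomposition (variable names are illustrative):
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(6,6);
+//   Eigen::HessenbergDecomposition<Eigen::MatrixXd> hd(A);
+//   Eigen::MatrixXd H = hd.matrixH(), Q = hd.matrixQ();
+//   Eigen::RealSchur<Eigen::MatrixXd> schur(A.rows());
+//   schur.computeFromHessenberg(H, Q, true);   // A = U T U^T with U = Q * Z
+//   // schur.matrixT() and schur.matrixU() are then available, just as after compute(A, true).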
+
+/** \internal Computes and returns vector L1 norm of T */
+template<typename MatrixType>
+inline typename MatrixType::Scalar RealSchur<MatrixType>::computeNormOfT()
+{
+ const Index size = m_matT.cols();
+ // FIXME to be efficient the following would require a triangular reduction code
+ // Scalar norm = m_matT.upper().cwiseAbs().sum()
+ // + m_matT.bottomLeftCorner(size-1,size-1).diagonal().cwiseAbs().sum();
+ Scalar norm(0);
+ for (Index j = 0; j < size; ++j)
+ norm += m_matT.col(j).segment(0, (std::min)(size,j+2)).cwiseAbs().sum();
+ return norm;
+}
+
+/** \internal Look for single small sub-diagonal element and returns its index */
+template<typename MatrixType>
+inline typename MatrixType::Index RealSchur<MatrixType>::findSmallSubdiagEntry(Index iu, const Scalar& norm)
+{
+ using std::abs;
+ Index res = iu;
+ while (res > 0)
+ {
+ Scalar s = abs(m_matT.coeff(res-1,res-1)) + abs(m_matT.coeff(res,res));
+ if (s == 0.0)
+ s = norm;
+ if (abs(m_matT.coeff(res,res-1)) < NumTraits<Scalar>::epsilon() * s)
+ break;
+ res--;
+ }
+ return res;
+}
+
+/** \internal Update T given that rows iu-1 and iu decouple from the rest. */
+template<typename MatrixType>
+inline void RealSchur<MatrixType>::splitOffTwoRows(Index iu, bool computeU, const Scalar& exshift)
+{
+ using std::sqrt;
+ using std::abs;
+ const Index size = m_matT.cols();
+
+ // The eigenvalues of the 2x2 matrix [a b; c d] are
+ // trace +/- sqrt(discr/4) where discr = tr^2 - 4*det, tr = a + d, det = ad - bc
+ Scalar p = Scalar(0.5) * (m_matT.coeff(iu-1,iu-1) - m_matT.coeff(iu,iu));
+ Scalar q = p * p + m_matT.coeff(iu,iu-1) * m_matT.coeff(iu-1,iu); // q = tr^2 / 4 - det = discr/4
+ m_matT.coeffRef(iu,iu) += exshift;
+ m_matT.coeffRef(iu-1,iu-1) += exshift;
+
+ if (q >= Scalar(0)) // Two real eigenvalues
+ {
+ Scalar z = sqrt(abs(q));
+ JacobiRotation<Scalar> rot;
+ if (p >= Scalar(0))
+ rot.makeGivens(p + z, m_matT.coeff(iu, iu-1));
+ else
+ rot.makeGivens(p - z, m_matT.coeff(iu, iu-1));
+
+ m_matT.rightCols(size-iu+1).applyOnTheLeft(iu-1, iu, rot.adjoint());
+ m_matT.topRows(iu+1).applyOnTheRight(iu-1, iu, rot);
+ m_matT.coeffRef(iu, iu-1) = Scalar(0);
+ if (computeU)
+ m_matU.applyOnTheRight(iu-1, iu, rot);
+ }
+
+ if (iu > 1)
+ m_matT.coeffRef(iu-1, iu-2) = Scalar(0);
+}
+
+/** \internal Form shift in shiftInfo, and update exshift if an exceptional shift is performed. */
+template<typename MatrixType>
+inline void RealSchur<MatrixType>::computeShift(Index iu, Index iter, Scalar& exshift, Vector3s& shiftInfo)
+{
+ using std::sqrt;
+ using std::abs;
+ shiftInfo.coeffRef(0) = m_matT.coeff(iu,iu);
+ shiftInfo.coeffRef(1) = m_matT.coeff(iu-1,iu-1);
+ shiftInfo.coeffRef(2) = m_matT.coeff(iu,iu-1) * m_matT.coeff(iu-1,iu);
+
+ // Wilkinson's original ad hoc shift
+ if (iter == 10)
+ {
+ exshift += shiftInfo.coeff(0);
+ for (Index i = 0; i <= iu; ++i)
+ m_matT.coeffRef(i,i) -= shiftInfo.coeff(0);
+ Scalar s = abs(m_matT.coeff(iu,iu-1)) + abs(m_matT.coeff(iu-1,iu-2));
+ shiftInfo.coeffRef(0) = Scalar(0.75) * s;
+ shiftInfo.coeffRef(1) = Scalar(0.75) * s;
+ shiftInfo.coeffRef(2) = Scalar(-0.4375) * s * s;
+ }
+
+ // MATLAB's new ad hoc shift
+ if (iter == 30)
+ {
+ Scalar s = (shiftInfo.coeff(1) - shiftInfo.coeff(0)) / Scalar(2.0);
+ s = s * s + shiftInfo.coeff(2);
+ if (s > Scalar(0))
+ {
+ s = sqrt(s);
+ if (shiftInfo.coeff(1) < shiftInfo.coeff(0))
+ s = -s;
+ s = s + (shiftInfo.coeff(1) - shiftInfo.coeff(0)) / Scalar(2.0);
+ s = shiftInfo.coeff(0) - shiftInfo.coeff(2) / s;
+ exshift += s;
+ for (Index i = 0; i <= iu; ++i)
+ m_matT.coeffRef(i,i) -= s;
+ shiftInfo.setConstant(Scalar(0.964));
+ }
+ }
+}
+
+/** \internal Compute index im at which Francis QR step starts and the first Householder vector. */
+template<typename MatrixType>
+inline void RealSchur<MatrixType>::initFrancisQRStep(Index il, Index iu, const Vector3s& shiftInfo, Index& im, Vector3s& firstHouseholderVector)
+{
+ using std::abs;
+ Vector3s& v = firstHouseholderVector; // alias to save typing
+
+ for (im = iu-2; im >= il; --im)
+ {
+ const Scalar Tmm = m_matT.coeff(im,im);
+ const Scalar r = shiftInfo.coeff(0) - Tmm;
+ const Scalar s = shiftInfo.coeff(1) - Tmm;
+ v.coeffRef(0) = (r * s - shiftInfo.coeff(2)) / m_matT.coeff(im+1,im) + m_matT.coeff(im,im+1);
+ v.coeffRef(1) = m_matT.coeff(im+1,im+1) - Tmm - r - s;
+ v.coeffRef(2) = m_matT.coeff(im+2,im+1);
+ if (im == il) {
+ break;
+ }
+ const Scalar lhs = m_matT.coeff(im,im-1) * (abs(v.coeff(1)) + abs(v.coeff(2)));
+ const Scalar rhs = v.coeff(0) * (abs(m_matT.coeff(im-1,im-1)) + abs(Tmm) + abs(m_matT.coeff(im+1,im+1)));
+ if (abs(lhs) < NumTraits<Scalar>::epsilon() * rhs)
+ {
+ break;
+ }
+ }
+}
+
+/** \internal Perform a Francis QR step involving rows il:iu and columns im:iu. */
+template<typename MatrixType>
+inline void RealSchur<MatrixType>::performFrancisQRStep(Index il, Index im, Index iu, bool computeU, const Vector3s& firstHouseholderVector, Scalar* workspace)
+{
+ eigen_assert(im >= il);
+ eigen_assert(im <= iu-2);
+
+ const Index size = m_matT.cols();
+
+ for (Index k = im; k <= iu-2; ++k)
+ {
+ bool firstIteration = (k == im);
+
+ Vector3s v;
+ if (firstIteration)
+ v = firstHouseholderVector;
+ else
+ v = m_matT.template block<3,1>(k,k-1);
+
+ Scalar tau, beta;
+ Matrix<Scalar, 2, 1> ess;
+ v.makeHouseholder(ess, tau, beta);
+
+ if (beta != Scalar(0)) // if v is not zero
+ {
+ if (firstIteration && k > il)
+ m_matT.coeffRef(k,k-1) = -m_matT.coeff(k,k-1);
+ else if (!firstIteration)
+ m_matT.coeffRef(k,k-1) = beta;
+
+ // These Householder transformations form the O(n^3) part of the algorithm
+ m_matT.block(k, k, 3, size-k).applyHouseholderOnTheLeft(ess, tau, workspace);
+ m_matT.block(0, k, (std::min)(iu,k+3) + 1, 3).applyHouseholderOnTheRight(ess, tau, workspace);
+ if (computeU)
+ m_matU.block(0, k, size, 3).applyHouseholderOnTheRight(ess, tau, workspace);
+ }
+ }
+
+ Matrix<Scalar, 2, 1> v = m_matT.template block<2,1>(iu-1, iu-2);
+ Scalar tau, beta;
+ Matrix<Scalar, 1, 1> ess;
+ v.makeHouseholder(ess, tau, beta);
+
+ if (beta != Scalar(0)) // if v is not zero
+ {
+ m_matT.coeffRef(iu-1, iu-2) = beta;
+ m_matT.block(iu-1, iu-1, 2, size-iu+1).applyHouseholderOnTheLeft(ess, tau, workspace);
+ m_matT.block(0, iu-1, iu+1, 2).applyHouseholderOnTheRight(ess, tau, workspace);
+ if (computeU)
+ m_matU.block(0, iu-1, size, 2).applyHouseholderOnTheRight(ess, tau, workspace);
+ }
+
+ // clean up pollution due to round-off errors
+ for (Index i = im+2; i <= iu; ++i)
+ {
+ m_matT.coeffRef(i,i-2) = Scalar(0);
+ if (i > im+2)
+ m_matT.coeffRef(i,i-3) = Scalar(0);
+ }
+}
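+/* Editor's note: a minimal usage sketch for the RealSchur class implemented in this
+ * file (an illustrative addition, not upstream Eigen code). It only assumes the
+ * public RealSchur interface (matrixU(), matrixT(), info()):
+ *
+ *   #include <Eigen/Dense>
+ *   Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);
+ *   Eigen::RealSchur<Eigen::MatrixXd> schur(A);
+ *   if (schur.info() == Eigen::Success) {
+ *     Eigen::MatrixXd U = schur.matrixU();  // orthogonal factor
+ *     Eigen::MatrixXd T = schur.matrixT();  // real quasi-triangular factor
+ *     // A is recovered, up to rounding error, as U * T * U.transpose().
+ *   }
+ */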
+
+} // end namespace Eigen
+
+#endif // EIGEN_REAL_SCHUR_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur_MKL.h b/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur_MKL.h
new file mode 100644
index 0000000000..ad97364602
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/RealSchur_MKL.h
@@ -0,0 +1,83 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ *     Real Schur decomposition needed for real unsymmetric eigenvalues/eigenvectors.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_REAL_SCHUR_MKL_H
+#define EIGEN_REAL_SCHUR_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_SCHUR_REAL(EIGTYPE, MKLTYPE, MKLPREFIX, MKLPREFIX_U, EIGCOLROW, MKLCOLROW) \
+template<> inline \
+RealSchur<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >& \
+RealSchur<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >::compute(const Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW>& matrix, bool computeU) \
+{ \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> MatrixType; \
+ typedef MatrixType::Scalar Scalar; \
+ typedef MatrixType::RealScalar RealScalar; \
+\
+ eigen_assert(matrix.cols() == matrix.rows()); \
+\
+ lapack_int n = matrix.cols(), sdim, info; \
+ lapack_int lda = matrix.outerStride(); \
+ lapack_int matrix_order = MKLCOLROW; \
+ char jobvs, sort='N'; \
+ LAPACK_##MKLPREFIX_U##_SELECT2 select = 0; \
+ jobvs = (computeU) ? 'V' : 'N'; \
+ m_matU.resize(n, n); \
+ lapack_int ldvs = m_matU.outerStride(); \
+ m_matT = matrix; \
+ Matrix<EIGTYPE, Dynamic, Dynamic> wr, wi; \
+ wr.resize(n, 1); wi.resize(n, 1); \
+ info = LAPACKE_##MKLPREFIX##gees( matrix_order, jobvs, sort, select, n, (MKLTYPE*)m_matT.data(), lda, &sdim, (MKLTYPE*)wr.data(), (MKLTYPE*)wi.data(), (MKLTYPE*)m_matU.data(), ldvs ); \
+ if(info == 0) \
+ m_info = Success; \
+ else \
+ m_info = NoConvergence; \
+\
+ m_isInitialized = true; \
+ m_matUisUptodate = computeU; \
+ return *this; \
+\
+}
+
+EIGEN_MKL_SCHUR_REAL(double, double, d, D, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SCHUR_REAL(float, float, s, S, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SCHUR_REAL(double, double, d, D, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_SCHUR_REAL(float, float, s, S, RowMajor, LAPACK_ROW_MAJOR)
+
+} // end namespace Eigen
+
+#endif // EIGEN_REAL_SCHUR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h b/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h
new file mode 100644
index 0000000000..d97d905273
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h
@@ -0,0 +1,884 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SELFADJOINTEIGENSOLVER_H
+#define EIGEN_SELFADJOINTEIGENSOLVER_H
+
+#include "./Tridiagonalization.h"
+
+namespace Eigen {
+
+template<typename _MatrixType>
+class GeneralizedSelfAdjointEigenSolver;
+
+namespace internal {
+template<typename SolverType,int Size,bool IsComplex> struct direct_selfadjoint_eigenvalues;
+template<typename MatrixType, typename DiagType, typename SubDiagType>
+ComputationInfo computeFromTridiagonal_impl(DiagType& diag, SubDiagType& subdiag, const typename MatrixType::Index maxIterations, bool computeEigenvectors, MatrixType& eivec);
+}
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class SelfAdjointEigenSolver
+ *
+ * \brief Computes eigenvalues and eigenvectors of selfadjoint matrices
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * eigendecomposition; this is expected to be an instantiation of the Matrix
+ * class template.
+ *
+ * A matrix \f$ A \f$ is selfadjoint if it equals its adjoint. For real
+ * matrices, this means that the matrix is symmetric: it equals its
+ * transpose. This class computes the eigenvalues and eigenvectors of a
+ * selfadjoint matrix. These are the scalars \f$ \lambda \f$ and vectors
+ * \f$ v \f$ such that \f$ Av = \lambda v \f$. The eigenvalues of a
+ * selfadjoint matrix are always real. If \f$ D \f$ is a diagonal matrix with
+ * the eigenvalues on the diagonal, and \f$ V \f$ is a matrix with the
+ * eigenvectors as its columns, then \f$ A = V D V^{-1} \f$ (for selfadjoint
+ * matrices, the matrix \f$ V \f$ is always invertible). This is called the
+ * eigendecomposition.
+ *
+ * The algorithm exploits the fact that the matrix is selfadjoint, making it
+ * faster and more accurate than the general purpose eigenvalue algorithms
+ * implemented in EigenSolver and ComplexEigenSolver.
+ *
+ * Only the \b lower \b triangular \b part of the input matrix is referenced.
+ *
+ * Call the function compute() to compute the eigenvalues and eigenvectors of
+ * a given matrix. Alternatively, you can use the
+ * SelfAdjointEigenSolver(const MatrixType&, int) constructor which computes
+ * the eigenvalues and eigenvectors at construction time. Once the eigenvalue
+ * and eigenvectors are computed, they can be retrieved with the eigenvalues()
+ * and eigenvectors() functions.
+ *
+ * The documentation for SelfAdjointEigenSolver(const MatrixType&, int)
+ * contains an example of the typical use of this class.
+ *
+ * To solve the \em generalized eigenvalue problem \f$ Av = \lambda Bv \f$ and
+ * the likes, see the class GeneralizedSelfAdjointEigenSolver.
+ *
+ * \sa MatrixBase::eigenvalues(), class EigenSolver, class ComplexEigenSolver
+ */
+template<typename _MatrixType> class SelfAdjointEigenSolver
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ Size = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+
+ /** \brief Scalar type for matrices of type \p _MatrixType. */
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \brief Real scalar type for \p _MatrixType.
+ *
+ * This is just \c Scalar if #Scalar is real (e.g., \c float or
+ * \c double), and the type of the real part of \c Scalar if #Scalar is
+ * complex.
+ */
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ friend struct internal::direct_selfadjoint_eigenvalues<SelfAdjointEigenSolver,Size,NumTraits<Scalar>::IsComplex>;
+
+ /** \brief Type for vector of eigenvalues as returned by eigenvalues().
+ *
+ * This is a column vector with entries of type #RealScalar.
+ * The length of the vector is the size of \p _MatrixType.
+ */
+ typedef typename internal::plain_col_type<MatrixType, RealScalar>::type RealVectorType;
+ typedef Tridiagonalization<MatrixType> TridiagonalizationType;
+ typedef typename TridiagonalizationType::SubDiagonalType SubDiagonalType;
+
+ /** \brief Default constructor for fixed-size matrices.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). This constructor
+ * can only be used if \p _MatrixType is a fixed-size matrix; use
+ * SelfAdjointEigenSolver(Index) for dynamic-size matrices.
+ *
+ * Example: \include SelfAdjointEigenSolver_SelfAdjointEigenSolver.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_SelfAdjointEigenSolver.out
+ */
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver()
+ : m_eivec(),
+ m_eivalues(),
+ m_subdiag(),
+ m_isInitialized(false)
+ { }
+
+ /** \brief Constructor, pre-allocates memory for dynamic-size matrices.
+ *
+ * \param [in] size Positive integer, size of the matrix whose
+ * eigenvalues and eigenvectors will be computed.
+ *
+ * This constructor is useful for dynamic-size matrices, when the user
+ * intends to perform decompositions via compute(). The \p size
+ * parameter is only used as a hint. It is not an error to give a wrong
+ * \p size, but it may impair performance.
+ *
+ * \sa compute() for an example
+ */
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver(Index size)
+ : m_eivec(size, size),
+ m_eivalues(size),
+ m_subdiag(size > 1 ? size - 1 : 1),
+ m_isInitialized(false)
+ {}
+
+ /** \brief Constructor; computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Selfadjoint matrix whose eigendecomposition is to
+ * be computed. Only the lower triangular part of the matrix is referenced.
+ * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly.
+ *
+ * This constructor calls compute(const MatrixType&, int) to compute the
+ * eigenvalues of the matrix \p matrix. The eigenvectors are computed if
+ * \p options equals #ComputeEigenvectors.
+ *
+ * Example: \include SelfAdjointEigenSolver_SelfAdjointEigenSolver_MatrixType.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_SelfAdjointEigenSolver_MatrixType.out
+ *
+ * \sa compute(const MatrixType&, int)
+ */
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver(const MatrixType& matrix, int options = ComputeEigenvectors)
+ : m_eivec(matrix.rows(), matrix.cols()),
+ m_eivalues(matrix.cols()),
+ m_subdiag(matrix.rows() > 1 ? matrix.rows() - 1 : 1),
+ m_isInitialized(false)
+ {
+ compute(matrix, options);
+ }
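+    /* Editor's note: a minimal usage sketch (illustrative addition, not upstream
+     * Eigen documentation), assuming only the public interface declared in this
+     * class:
+     *
+     *   Eigen::Matrix3d A;
+     *   A << 2, 1, 0,
+     *        1, 2, 1,
+     *        0, 1, 2;                        // real symmetric, hence selfadjoint
+     *   Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> es(A);
+     *   if (es.info() == Eigen::Success) {
+     *     Eigen::Vector3d lambda = es.eigenvalues();   // sorted in increasing order
+     *     Eigen::Matrix3d V      = es.eigenvectors();  // unit-norm columns
+     *     // A is recovered, up to rounding, as V * lambda.asDiagonal() * V.transpose().
+     *   }
+     */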
+
+ /** \brief Computes eigendecomposition of given matrix.
+ *
+ * \param[in] matrix Selfadjoint matrix whose eigendecomposition is to
+ * be computed. Only the lower triangular part of the matrix is referenced.
+ * \param[in] options Can be #ComputeEigenvectors (default) or #EigenvaluesOnly.
+ * \returns Reference to \c *this
+ *
+ * This function computes the eigenvalues of \p matrix. The eigenvalues()
+ * function can be used to retrieve them. If \p options equals #ComputeEigenvectors,
+ * then the eigenvectors are also computed and can be retrieved by
+ * calling eigenvectors().
+ *
+ * This implementation uses a symmetric QR algorithm. The matrix is first
+ * reduced to tridiagonal form using the Tridiagonalization class. The
+ * tridiagonal matrix is then brought to diagonal form with implicit
+ * symmetric QR steps with Wilkinson shift. Details can be found in
+ * Section 8.3 of Golub \& Van Loan, <i>%Matrix Computations</i>.
+ *
+ * The cost of the computation is about \f$ 9n^3 \f$ if the eigenvectors
+ * are required and \f$ 4n^3/3 \f$ if they are not required.
+ *
+ * This method reuses the memory in the SelfAdjointEigenSolver object that
+ * was allocated when the object was constructed, if the size of the
+ * matrix does not change.
+ *
+ * Example: \include SelfAdjointEigenSolver_compute_MatrixType.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_compute_MatrixType.out
+ *
+ * \sa SelfAdjointEigenSolver(const MatrixType&, int)
+ */
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver& compute(const MatrixType& matrix, int options = ComputeEigenvectors);
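+    /* Editor's note: an illustrative sketch (not upstream documentation) of reusing
+     * one solver object and requesting eigenvalues only, as described above. Here n,
+     * A and B are placeholders for the problem size and two selfadjoint matrices:
+     *
+     *   Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(n);  // pre-allocate for size n
+     *   es.compute(A, Eigen::EigenvaluesOnly);        // ~4n^3/3 flops, no eigenvectors
+     *   es.compute(B, Eigen::ComputeEigenvectors);    // reuses the allocated storage
+     */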
+
+ /** \brief Computes eigendecomposition of given matrix using a direct algorithm
+ *
+ * This is a variant of compute(const MatrixType&, int options) which
+ * directly solves the underlying polynomial equation.
+ *
+      * Currently only 2x2 and 3x3 matrices for which the sizes are known at compile time are supported (e.g., Matrix3d).
+ *
+ * This method is usually significantly faster than the QR algorithm
+ * but it might also be less accurate. It is also worth noting that
+ * for 3x3 matrices it involves trigonometric operations which are
+ * not necessarily available for all scalar types.
+ *
+ * \sa compute(const MatrixType&, int options)
+ */
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver& computeDirect(const MatrixType& matrix, int options = ComputeEigenvectors);
+
+ /**
+      * \brief Computes the eigendecomposition from a tridiagonal symmetric matrix
+ *
+ * \param[in] diag The vector containing the diagonal of the matrix.
+ * \param[in] subdiag The subdiagonal of the matrix.
+ * \returns Reference to \c *this
+ *
+ * This function assumes that the matrix has been reduced to tridiagonal form.
+ *
+ * \sa compute(const MatrixType&, int) for more information
+ */
+ SelfAdjointEigenSolver& computeFromTridiagonal(const RealVectorType& diag, const SubDiagonalType& subdiag , int options=ComputeEigenvectors);
+
+ /** \brief Returns the eigenvectors of given matrix.
+ *
+ * \returns A const reference to the matrix whose columns are the eigenvectors.
+ *
+ * \pre The eigenvectors have been computed before.
+ *
+ * Column \f$ k \f$ of the returned matrix is an eigenvector corresponding
+ * to eigenvalue number \f$ k \f$ as returned by eigenvalues(). The
+ * eigenvectors are normalized to have (Euclidean) norm equal to one. If
+ * this object was used to solve the eigenproblem for the selfadjoint
+ * matrix \f$ A \f$, then the matrix returned by this function is the
+ * matrix \f$ V \f$ in the eigendecomposition \f$ A = V D V^{-1} \f$.
+ *
+ * Example: \include SelfAdjointEigenSolver_eigenvectors.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_eigenvectors.out
+ *
+ * \sa eigenvalues()
+ */
+ EIGEN_DEVICE_FUNC
+ const MatrixType& eigenvectors() const
+ {
+ eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ return m_eivec;
+ }
+
+ /** \brief Returns the eigenvalues of given matrix.
+ *
+ * \returns A const reference to the column vector containing the eigenvalues.
+ *
+ * \pre The eigenvalues have been computed before.
+ *
+ * The eigenvalues are repeated according to their algebraic multiplicity,
+ * so there are as many eigenvalues as rows in the matrix. The eigenvalues
+ * are sorted in increasing order.
+ *
+ * Example: \include SelfAdjointEigenSolver_eigenvalues.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_eigenvalues.out
+ *
+ * \sa eigenvectors(), MatrixBase::eigenvalues()
+ */
+ EIGEN_DEVICE_FUNC
+ const RealVectorType& eigenvalues() const
+ {
+ eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized.");
+ return m_eivalues;
+ }
+
+ /** \brief Computes the positive-definite square root of the matrix.
+ *
+ * \returns the positive-definite square root of the matrix
+ *
+ * \pre The eigenvalues and eigenvectors of a positive-definite matrix
+ * have been computed before.
+ *
+ * The square root of a positive-definite matrix \f$ A \f$ is the
+ * positive-definite matrix whose square equals \f$ A \f$. This function
+ * uses the eigendecomposition \f$ A = V D V^{-1} \f$ to compute the
+ * square root as \f$ A^{1/2} = V D^{1/2} V^{-1} \f$.
+ *
+ * Example: \include SelfAdjointEigenSolver_operatorSqrt.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_operatorSqrt.out
+ *
+ * \sa operatorInverseSqrt(),
+ * \ref MatrixFunctions_Module "MatrixFunctions Module"
+ */
+ EIGEN_DEVICE_FUNC
+ MatrixType operatorSqrt() const
+ {
+ eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ return m_eivec * m_eivalues.cwiseSqrt().asDiagonal() * m_eivec.adjoint();
+ }
+
+ /** \brief Computes the inverse square root of the matrix.
+ *
+ * \returns the inverse positive-definite square root of the matrix
+ *
+ * \pre The eigenvalues and eigenvectors of a positive-definite matrix
+ * have been computed before.
+ *
+ * This function uses the eigendecomposition \f$ A = V D V^{-1} \f$ to
+ * compute the inverse square root as \f$ V D^{-1/2} V^{-1} \f$. This is
+ * cheaper than first computing the square root with operatorSqrt() and
+ * then its inverse with MatrixBase::inverse().
+ *
+ * Example: \include SelfAdjointEigenSolver_operatorInverseSqrt.cpp
+ * Output: \verbinclude SelfAdjointEigenSolver_operatorInverseSqrt.out
+ *
+ * \sa operatorSqrt(), MatrixBase::inverse(),
+ * \ref MatrixFunctions_Module "MatrixFunctions Module"
+ */
+ EIGEN_DEVICE_FUNC
+ MatrixType operatorInverseSqrt() const
+ {
+ eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized.");
+ eigen_assert(m_eigenvectorsOk && "The eigenvectors have not been computed together with the eigenvalues.");
+ return m_eivec * m_eivalues.cwiseInverse().cwiseSqrt().asDiagonal() * m_eivec.adjoint();
+ }
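+    /* Editor's note: an illustrative sketch (not upstream documentation) of the two
+     * functions above. P is a placeholder for any selfadjoint positive-definite
+     * matrix:
+     *
+     *   Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(P);
+     *   Eigen::MatrixXd S    = es.operatorSqrt();         // S * S            ~= P
+     *   Eigen::MatrixXd Sinv = es.operatorInverseSqrt();  // Sinv * P * Sinv  ~= Identity
+     */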
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful, \c NoConvergence otherwise.
+ */
+ EIGEN_DEVICE_FUNC
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "SelfAdjointEigenSolver is not initialized.");
+ return m_info;
+ }
+
+ /** \brief Maximum number of iterations.
+ *
+ * The algorithm terminates if it does not converge within m_maxIterations * n iterations, where n
+ * denotes the size of the matrix. This value is currently set to 30 (copied from LAPACK).
+ */
+ static const int m_maxIterations = 30;
+
+ #ifdef EIGEN2_SUPPORT
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver(const MatrixType& matrix, bool computeEigenvectors)
+ : m_eivec(matrix.rows(), matrix.cols()),
+ m_eivalues(matrix.cols()),
+ m_subdiag(matrix.rows() > 1 ? matrix.rows() - 1 : 1),
+ m_isInitialized(false)
+ {
+ compute(matrix, computeEigenvectors);
+ }
+
+ EIGEN_DEVICE_FUNC
+ SelfAdjointEigenSolver(const MatrixType& matA, const MatrixType& matB, bool computeEigenvectors = true)
+ : m_eivec(matA.cols(), matA.cols()),
+ m_eivalues(matA.cols()),
+ m_subdiag(matA.cols() > 1 ? matA.cols() - 1 : 1),
+ m_isInitialized(false)
+ {
+ static_cast<GeneralizedSelfAdjointEigenSolver<MatrixType>*>(this)->compute(matA, matB, computeEigenvectors ? ComputeEigenvectors : EigenvaluesOnly);
+ }
+
+ EIGEN_DEVICE_FUNC
+ void compute(const MatrixType& matrix, bool computeEigenvectors)
+ {
+ compute(matrix, computeEigenvectors ? ComputeEigenvectors : EigenvaluesOnly);
+ }
+
+ EIGEN_DEVICE_FUNC
+ void compute(const MatrixType& matA, const MatrixType& matB, bool computeEigenvectors = true)
+ {
+ compute(matA, matB, computeEigenvectors ? ComputeEigenvectors : EigenvaluesOnly);
+ }
+ #endif // EIGEN2_SUPPORT
+
+ protected:
+ MatrixType m_eivec;
+ RealVectorType m_eivalues;
+ typename TridiagonalizationType::SubDiagonalType m_subdiag;
+ ComputationInfo m_info;
+ bool m_isInitialized;
+ bool m_eigenvectorsOk;
+};
+
+/** \internal
+ *
+ * \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ * Performs a QR step on a tridiagonal symmetric matrix represented as a
+ * pair of two vectors \a diag and \a subdiag.
+ *
+ * \param diag the diagonal part of the tridiagonal matrix
+ * \param subdiag the sub-diagonal part of the tridiagonal matrix
+ *
+ * For compilation efficiency reasons, this procedure does not use eigen expression
+ * for its arguments.
+ *
+ * Implemented from Golub's "Matrix Computations", algorithm 8.3.2:
+ * "implicit symmetric QR step with Wilkinson shift"
+ */
+namespace internal {
+template<int StorageOrder,typename RealScalar, typename Scalar, typename Index>
+EIGEN_DEVICE_FUNC
+static void tridiagonal_qr_step(RealScalar* diag, RealScalar* subdiag, Index start, Index end, Scalar* matrixQ, Index n);
+}
+
+template<typename MatrixType>
+EIGEN_DEVICE_FUNC
+SelfAdjointEigenSolver<MatrixType>& SelfAdjointEigenSolver<MatrixType>
+::compute(const MatrixType& matrix, int options)
+{
+ using std::abs;
+ eigen_assert(matrix.cols() == matrix.rows());
+ eigen_assert((options&~(EigVecMask|GenEigMask))==0
+ && (options&EigVecMask)!=EigVecMask
+ && "invalid option parameter");
+ bool computeEigenvectors = (options&ComputeEigenvectors)==ComputeEigenvectors;
+ Index n = matrix.cols();
+ m_eivalues.resize(n,1);
+
+ if(n==1)
+ {
+ m_eivalues.coeffRef(0,0) = numext::real(matrix.coeff(0,0));
+ if(computeEigenvectors)
+ m_eivec.setOnes(n,n);
+ m_info = Success;
+ m_isInitialized = true;
+ m_eigenvectorsOk = computeEigenvectors;
+ return *this;
+ }
+
+ // declare some aliases
+ RealVectorType& diag = m_eivalues;
+ MatrixType& mat = m_eivec;
+
+ // map the matrix coefficients to [-1:1] to avoid over- and underflow.
+ mat = matrix.template triangularView<Lower>();
+ RealScalar scale = mat.cwiseAbs().maxCoeff();
+ if(scale==RealScalar(0)) scale = RealScalar(1);
+ mat.template triangularView<Lower>() /= scale;
+ m_subdiag.resize(n-1);
+ internal::tridiagonalization_inplace(mat, diag, m_subdiag, computeEigenvectors);
+
+ m_info = internal::computeFromTridiagonal_impl(diag, m_subdiag, m_maxIterations, computeEigenvectors, m_eivec);
+
+ // scale back the eigen values
+ m_eivalues *= scale;
+
+ m_isInitialized = true;
+ m_eigenvectorsOk = computeEigenvectors;
+ return *this;
+}
+
+template<typename MatrixType>
+SelfAdjointEigenSolver<MatrixType>& SelfAdjointEigenSolver<MatrixType>
+::computeFromTridiagonal(const RealVectorType& diag, const SubDiagonalType& subdiag , int options)
+{
+ //TODO : Add an option to scale the values beforehand
+ bool computeEigenvectors = (options&ComputeEigenvectors)==ComputeEigenvectors;
+
+ m_eivalues = diag;
+ m_subdiag = subdiag;
+ if (computeEigenvectors)
+ {
+ m_eivec.setIdentity(diag.size(), diag.size());
+ }
+ m_info = computeFromTridiagonal_impl(m_eivalues, m_subdiag, m_maxIterations, computeEigenvectors, m_eivec);
+
+ m_isInitialized = true;
+ m_eigenvectorsOk = computeEigenvectors;
+ return *this;
+}
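+/* Editor's note: an illustrative sketch of computeFromTridiagonal() (not upstream
+ * documentation), for a symmetric tridiagonal matrix given by its diagonal and
+ * sub-diagonal:
+ *
+ *   Eigen::VectorXd diag(4), subdiag(3);
+ *   diag    << 2, 2, 2, 2;
+ *   subdiag << -1, -1, -1;
+ *   Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(4);
+ *   es.computeFromTridiagonal(diag, subdiag, Eigen::ComputeEigenvectors);
+ *   // es.eigenvalues() and es.eigenvectors() now refer to that tridiagonal matrix.
+ */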
+
+namespace internal {
+/**
+ * \internal
+ * \brief Compute the eigendecomposition from a tridiagonal matrix
+ *
+ * \param[in,out] diag : On input, the diagonal of the matrix, on output the eigenvalues
+ * \param[in] subdiag : The subdiagonal part of the matrix.
+ * \param[in] maxIterations : The maximum number of iterations; the algorithm gives up after maxIterations * n iterations in total.
+ * \param[out] eivec : The matrix in which to store the eigenvectors, if needed; it must already be allocated on input.
+ * \returns \c Success or \c NoConvergence
+ */
+template<typename MatrixType, typename DiagType, typename SubDiagType>
+ComputationInfo computeFromTridiagonal_impl(DiagType& diag, SubDiagType& subdiag, const typename MatrixType::Index maxIterations, bool computeEigenvectors, MatrixType& eivec)
+{
+ using std::abs;
+
+ ComputationInfo info;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+
+ Index n = diag.size();
+ Index end = n-1;
+ Index start = 0;
+ Index iter = 0; // total number of iterations
+
+ while (end>0)
+ {
+ for (Index i = start; i<end; ++i)
+ if (internal::isMuchSmallerThan(abs(subdiag[i]),(abs(diag[i])+abs(diag[i+1]))))
+ subdiag[i] = 0;
+
+ // find the largest unreduced block
+ while (end>0 && subdiag[end-1]==0)
+ {
+ end--;
+ }
+ if (end<=0)
+ break;
+
+ // if we spent too many iterations, we give up
+ iter++;
+ if(iter > maxIterations * n) break;
+
+ start = end - 1;
+ while (start>0 && subdiag[start-1]!=0)
+ start--;
+
+ internal::tridiagonal_qr_step<MatrixType::Flags&RowMajorBit ? RowMajor : ColMajor>(diag.data(), subdiag.data(), start, end, computeEigenvectors ? eivec.data() : (Scalar*)0, n);
+ }
+ if (iter <= maxIterations * n)
+ info = Success;
+ else
+ info = NoConvergence;
+
+ // Sort eigenvalues and corresponding vectors.
+ // TODO make the sort optional ?
+ // TODO use a better sort algorithm !!
+ if (info == Success)
+ {
+ for (Index i = 0; i < n-1; ++i)
+ {
+ Index k;
+ diag.segment(i,n-i).minCoeff(&k);
+ if (k > 0)
+ {
+ std::swap(diag[i], diag[k+i]);
+ if(computeEigenvectors)
+ eivec.col(i).swap(eivec.col(k+i));
+ }
+ }
+ }
+ return info;
+}
+
+template<typename SolverType,int Size,bool IsComplex> struct direct_selfadjoint_eigenvalues
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(SolverType& eig, const typename SolverType::MatrixType& A, int options)
+ { eig.compute(A,options); }
+};
+
+template<typename SolverType> struct direct_selfadjoint_eigenvalues<SolverType,3,false>
+{
+ typedef typename SolverType::MatrixType MatrixType;
+ typedef typename SolverType::RealVectorType VectorType;
+ typedef typename SolverType::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC
+ static inline void computeRoots(const MatrixType& m, VectorType& roots)
+ {
+ EIGEN_USING_STD_MATH(sqrt)
+ EIGEN_USING_STD_MATH(atan2)
+ EIGEN_USING_STD_MATH(cos)
+ EIGEN_USING_STD_MATH(sin)
+ const Scalar s_inv3 = Scalar(1.0)/Scalar(3.0);
+ const Scalar s_sqrt3 = sqrt(Scalar(3.0));
+
+ // The characteristic equation is x^3 - c2*x^2 + c1*x - c0 = 0. The
+ // eigenvalues are the roots to this equation, all guaranteed to be
+ // real-valued, because the matrix is symmetric.
+ Scalar c0 = m(0,0)*m(1,1)*m(2,2) + Scalar(2)*m(1,0)*m(2,0)*m(2,1) - m(0,0)*m(2,1)*m(2,1) - m(1,1)*m(2,0)*m(2,0) - m(2,2)*m(1,0)*m(1,0);
+ Scalar c1 = m(0,0)*m(1,1) - m(1,0)*m(1,0) + m(0,0)*m(2,2) - m(2,0)*m(2,0) + m(1,1)*m(2,2) - m(2,1)*m(2,1);
+ Scalar c2 = m(0,0) + m(1,1) + m(2,2);
+
+ // Construct the parameters used in classifying the roots of the equation
+ // and in solving the equation for the roots in closed form.
+ Scalar c2_over_3 = c2*s_inv3;
+ Scalar a_over_3 = (c1 - c2*c2_over_3)*s_inv3;
+ if (a_over_3 > Scalar(0))
+ a_over_3 = Scalar(0);
+
+ Scalar half_b = Scalar(0.5)*(c0 + c2_over_3*(Scalar(2)*c2_over_3*c2_over_3 - c1));
+
+ Scalar q = half_b*half_b + a_over_3*a_over_3*a_over_3;
+ if (q > Scalar(0))
+ q = Scalar(0);
+
+ // Compute the eigenvalues by solving for the roots of the polynomial.
+ Scalar rho = sqrt(-a_over_3);
+ Scalar theta = atan2(sqrt(-q),half_b)*s_inv3;
+ Scalar cos_theta = cos(theta);
+ Scalar sin_theta = sin(theta);
+ roots(0) = c2_over_3 + Scalar(2)*rho*cos_theta;
+ roots(1) = c2_over_3 - rho*(cos_theta + s_sqrt3*sin_theta);
+ roots(2) = c2_over_3 - rho*(cos_theta - s_sqrt3*sin_theta);
+
+ // Sort in increasing order.
+ if (roots(0) >= roots(1))
+ numext::swap(roots(0),roots(1));
+ if (roots(1) >= roots(2))
+ {
+ numext::swap(roots(1),roots(2));
+ if (roots(0) >= roots(1))
+ numext::swap(roots(0),roots(1));
+ }
+ }
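+  // Editor's note: a small worked check of the coefficients above (an illustrative
+  // addition, not upstream code). For the diagonal matrix diag(1,2,3):
+  //   c2 = 1+2+3 = 6, c1 = 1*2 + 1*3 + 2*3 = 11, c0 = det = 6,
+  // so the characteristic equation is x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3) = 0
+  // and the roots {1, 2, 3} are returned in increasing order.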
+
+ EIGEN_DEVICE_FUNC
+ static inline void run(SolverType& solver, const MatrixType& mat, int options)
+ {
+ using std::sqrt;
+ eigen_assert(mat.cols() == 3 && mat.cols() == mat.rows());
+ eigen_assert((options&~(EigVecMask|GenEigMask))==0
+ && (options&EigVecMask)!=EigVecMask
+ && "invalid option parameter");
+ bool computeEigenvectors = (options&ComputeEigenvectors)==ComputeEigenvectors;
+
+ MatrixType& eivecs = solver.m_eivec;
+ VectorType& eivals = solver.m_eivalues;
+
+ // map the matrix coefficients to [-1:1] to avoid over- and underflow.
+ Scalar scale = mat.cwiseAbs().maxCoeff();
+ MatrixType scaledMat = mat / scale;
+
+ // compute the eigenvalues
+ computeRoots(scaledMat,eivals);
+
+    // compute the eigenvectors
+ if(computeEigenvectors)
+ {
+ Scalar safeNorm2 = Eigen::NumTraits<Scalar>::epsilon();
+ safeNorm2 *= safeNorm2;
+ if((eivals(2)-eivals(0))<=Eigen::NumTraits<Scalar>::epsilon())
+ {
+ eivecs.setIdentity();
+ }
+ else
+ {
+ scaledMat = scaledMat.template selfadjointView<Lower>();
+ MatrixType tmp;
+ tmp = scaledMat;
+
+ Scalar d0 = eivals(2) - eivals(1);
+ Scalar d1 = eivals(1) - eivals(0);
+ int k = d0 > d1 ? 2 : 0;
+ d0 = d0 > d1 ? d1 : d0;
+
+ tmp.diagonal().array () -= eivals(k);
+ VectorType cross;
+ Scalar n;
+ n = (cross = tmp.row(0).cross(tmp.row(1))).squaredNorm();
+
+ if(n>safeNorm2)
+ eivecs.col(k) = cross / sqrt(n);
+ else
+ {
+ n = (cross = tmp.row(0).cross(tmp.row(2))).squaredNorm();
+
+ if(n>safeNorm2)
+ eivecs.col(k) = cross / sqrt(n);
+ else
+ {
+ n = (cross = tmp.row(1).cross(tmp.row(2))).squaredNorm();
+
+ if(n>safeNorm2)
+ eivecs.col(k) = cross / sqrt(n);
+ else
+ {
+            // the input matrix and/or the eigenvalues probably contain some inf/NaN values,
+ // => exit
+ // scale back to the original size.
+ eivals *= scale;
+
+ solver.m_info = NumericalIssue;
+ solver.m_isInitialized = true;
+ solver.m_eigenvectorsOk = computeEigenvectors;
+ return;
+ }
+ }
+ }
+
+ tmp = scaledMat;
+ tmp.diagonal().array() -= eivals(1);
+
+ if(d0<=Eigen::NumTraits<Scalar>::epsilon())
+ eivecs.col(1) = eivecs.col(k).unitOrthogonal();
+ else
+ {
+ n = (cross = eivecs.col(k).cross(tmp.row(0).normalized())).squaredNorm();
+ if(n>safeNorm2)
+ eivecs.col(1) = cross / sqrt(n);
+ else
+ {
+ n = (cross = eivecs.col(k).cross(tmp.row(1))).squaredNorm();
+ if(n>safeNorm2)
+ eivecs.col(1) = cross / sqrt(n);
+ else
+ {
+ n = (cross = eivecs.col(k).cross(tmp.row(2))).squaredNorm();
+ if(n>safeNorm2)
+ eivecs.col(1) = cross / sqrt(n);
+ else
+ {
+              // we should never reach this point;
+              // if we do, the last two eigenvalues are likely to be very close to each other
+ eivecs.col(1) = eivecs.col(k).unitOrthogonal();
+ }
+ }
+ }
+
+          // make sure that eivecs.col(1) is orthogonal to eivecs.col(k)
+ Scalar d = eivecs.col(1).dot(eivecs.col(k));
+ eivecs.col(1) = (eivecs.col(1) - d * eivecs.col(k)).normalized();
+ }
+
+ eivecs.col(k==2 ? 0 : 2) = eivecs.col(k).cross(eivecs.col(1)).normalized();
+ }
+ }
+ // Rescale back to the original size.
+ eivals *= scale;
+
+ solver.m_info = Success;
+ solver.m_isInitialized = true;
+ solver.m_eigenvectorsOk = computeEigenvectors;
+ }
+};
+
+// 2x2 direct eigenvalues decomposition, code from Hauke Heibel
+template<typename SolverType>
+struct direct_selfadjoint_eigenvalues<SolverType,2,false>
+{
+ typedef typename SolverType::MatrixType MatrixType;
+ typedef typename SolverType::RealVectorType VectorType;
+ typedef typename SolverType::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC
+ static inline void computeRoots(const MatrixType& m, VectorType& roots)
+ {
+ using std::sqrt;
+ const Scalar t0 = Scalar(0.5) * sqrt( numext::abs2(m(0,0)-m(1,1)) + Scalar(4)*m(1,0)*m(1,0));
+ const Scalar t1 = Scalar(0.5) * (m(0,0) + m(1,1));
+ roots(0) = t1 - t0;
+ roots(1) = t1 + t0;
+ }
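+  // Editor's note: a small worked check of the formulas above (an illustrative
+  // addition, not upstream code). For m = [3 1; 1 3]:
+  //   t0 = 0.5*sqrt((3-3)^2 + 4*1) = 1, t1 = 0.5*(3+3) = 3,
+  // so the roots are t1 - t0 = 2 and t1 + t0 = 4, the eigenvalues of m.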
+
+ EIGEN_DEVICE_FUNC
+ static inline void run(SolverType& solver, const MatrixType& mat, int options)
+ {
+ EIGEN_USING_STD_MATH(sqrt);
+
+ eigen_assert(mat.cols() == 2 && mat.cols() == mat.rows());
+ eigen_assert((options&~(EigVecMask|GenEigMask))==0
+ && (options&EigVecMask)!=EigVecMask
+ && "invalid option parameter");
+ bool computeEigenvectors = (options&ComputeEigenvectors)==ComputeEigenvectors;
+
+ MatrixType& eivecs = solver.m_eivec;
+ VectorType& eivals = solver.m_eivalues;
+
+ // map the matrix coefficients to [-1:1] to avoid over- and underflow.
+ Scalar scale = mat.cwiseAbs().maxCoeff();
+ scale = numext::maxi(scale,Scalar(1));
+ MatrixType scaledMat = mat / scale;
+
+ // Compute the eigenvalues
+ computeRoots(scaledMat,eivals);
+
+    // compute the eigenvectors
+ if(computeEigenvectors)
+ {
+ scaledMat.diagonal().array () -= eivals(1);
+ Scalar a2 = numext::abs2(scaledMat(0,0));
+ Scalar c2 = numext::abs2(scaledMat(1,1));
+ Scalar b2 = numext::abs2(scaledMat(1,0));
+ if(a2>c2)
+ {
+ eivecs.col(1) << -scaledMat(1,0), scaledMat(0,0);
+ eivecs.col(1) /= sqrt(a2+b2);
+ }
+ else
+ {
+ eivecs.col(1) << -scaledMat(1,1), scaledMat(1,0);
+ eivecs.col(1) /= sqrt(c2+b2);
+ }
+
+ eivecs.col(0) << eivecs.col(1).unitOrthogonal();
+ }
+
+ // Rescale back to the original size.
+ eivals *= scale;
+
+ solver.m_info = Success;
+ solver.m_isInitialized = true;
+ solver.m_eigenvectorsOk = computeEigenvectors;
+ }
+};
+
+}
+
+template<typename MatrixType>
+EIGEN_DEVICE_FUNC
+SelfAdjointEigenSolver<MatrixType>& SelfAdjointEigenSolver<MatrixType>
+::computeDirect(const MatrixType& matrix, int options)
+{
+ internal::direct_selfadjoint_eigenvalues<SelfAdjointEigenSolver,Size,NumTraits<Scalar>::IsComplex>::run(*this,matrix,options);
+ return *this;
+}
+
+namespace internal {
+template<int StorageOrder,typename RealScalar, typename Scalar, typename Index>
+EIGEN_DEVICE_FUNC
+static void tridiagonal_qr_step(RealScalar* diag, RealScalar* subdiag, Index start, Index end, Scalar* matrixQ, Index n)
+{
+ using std::abs;
+ RealScalar td = (diag[end-1] - diag[end])*RealScalar(0.5);
+ RealScalar e = subdiag[end-1];
+ // Note that thanks to scaling, e^2 or td^2 cannot overflow, however they can still
+ // underflow thus leading to inf/NaN values when using the following commented code:
+// RealScalar e2 = numext::abs2(subdiag[end-1]);
+// RealScalar mu = diag[end] - e2 / (td + (td>0 ? 1 : -1) * sqrt(td*td + e2));
+  // This explains the following, somewhat more complicated, version:
+ RealScalar mu = diag[end];
+ if(td==0)
+ mu -= abs(e);
+ else
+ {
+ RealScalar e2 = numext::abs2(subdiag[end-1]);
+ RealScalar h = numext::hypot(td,e);
+ if(e2==0) mu -= (e / (td + (td>0 ? 1 : -1))) * (e / h);
+ else mu -= e2 / (td + (td>0 ? h : -h));
+ }
+
+ RealScalar x = diag[start] - mu;
+ RealScalar z = subdiag[start];
+ for (Index k = start; k < end; ++k)
+ {
+ JacobiRotation<RealScalar> rot;
+ rot.makeGivens(x, z);
+
+ // do T = G' T G
+ RealScalar sdk = rot.s() * diag[k] + rot.c() * subdiag[k];
+ RealScalar dkp1 = rot.s() * subdiag[k] + rot.c() * diag[k+1];
+
+ diag[k] = rot.c() * (rot.c() * diag[k] - rot.s() * subdiag[k]) - rot.s() * (rot.c() * subdiag[k] - rot.s() * diag[k+1]);
+ diag[k+1] = rot.s() * sdk + rot.c() * dkp1;
+ subdiag[k] = rot.c() * sdk - rot.s() * dkp1;
+
+
+ if (k > start)
+ subdiag[k - 1] = rot.c() * subdiag[k-1] - rot.s() * z;
+
+ x = subdiag[k];
+
+ if (k < end - 1)
+ {
+ z = -rot.s() * subdiag[k+1];
+ subdiag[k + 1] = rot.c() * subdiag[k+1];
+ }
+
+ // apply the givens rotation to the unit matrix Q = Q * G
+ if (matrixQ)
+ {
+ // FIXME if StorageOrder == RowMajor this operation is not very efficient
+ Map<Matrix<Scalar,Dynamic,Dynamic,StorageOrder> > q(matrixQ,n,n);
+ q.applyOnTheRight(k,k+1,rot);
+ }
+ }
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SELFADJOINTEIGENSOLVER_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h b/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h
new file mode 100644
index 0000000000..17c0dadd23
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h
@@ -0,0 +1,92 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Self-adjoint eigenvalues/eigenvectors.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_SAEIGENSOLVER_MKL_H
+#define EIGEN_SAEIGENSOLVER_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_EIG_SELFADJ(EIGTYPE, MKLTYPE, MKLRTYPE, MKLNAME, EIGCOLROW, MKLCOLROW ) \
+template<> inline \
+SelfAdjointEigenSolver<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >& \
+SelfAdjointEigenSolver<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW> >::compute(const Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW>& matrix, int options) \
+{ \
+ eigen_assert(matrix.cols() == matrix.rows()); \
+ eigen_assert((options&~(EigVecMask|GenEigMask))==0 \
+ && (options&EigVecMask)!=EigVecMask \
+ && "invalid option parameter"); \
+ bool computeEigenvectors = (options&ComputeEigenvectors)==ComputeEigenvectors; \
+ lapack_int n = matrix.cols(), lda, matrix_order, info; \
+ m_eivalues.resize(n,1); \
+ m_subdiag.resize(n-1); \
+ m_eivec = matrix; \
+\
+ if(n==1) \
+ { \
+ m_eivalues.coeffRef(0,0) = numext::real(matrix.coeff(0,0)); \
+ if(computeEigenvectors) m_eivec.setOnes(n,n); \
+ m_info = Success; \
+ m_isInitialized = true; \
+ m_eigenvectorsOk = computeEigenvectors; \
+ return *this; \
+ } \
+\
+ lda = matrix.outerStride(); \
+ matrix_order=MKLCOLROW; \
+ char jobz, uplo='L'/*, range='A'*/; \
+ jobz = computeEigenvectors ? 'V' : 'N'; \
+\
+ info = LAPACKE_##MKLNAME( matrix_order, jobz, uplo, n, (MKLTYPE*)m_eivec.data(), lda, (MKLRTYPE*)m_eivalues.data() ); \
+ m_info = (info==0) ? Success : NoConvergence; \
+ m_isInitialized = true; \
+ m_eigenvectorsOk = computeEigenvectors; \
+ return *this; \
+}
+
+
+EIGEN_MKL_EIG_SELFADJ(double, double, double, dsyev, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(float, float, float, ssyev, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(dcomplex, MKL_Complex16, double, zheev, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(scomplex, MKL_Complex8, float, cheev, ColMajor, LAPACK_COL_MAJOR)
+
+EIGEN_MKL_EIG_SELFADJ(double, double, double, dsyev, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(float, float, float, ssyev, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(dcomplex, MKL_Complex16, double, zheev, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_EIG_SELFADJ(scomplex, MKL_Complex8, float, cheev, RowMajor, LAPACK_ROW_MAJOR)
+
+} // end namespace Eigen
+
+#endif // EIGEN_SAEIGENSOLVER_MKL_H
diff --git a/third_party/eigen3/Eigen/src/Eigenvalues/Tridiagonalization.h b/third_party/eigen3/Eigen/src/Eigenvalues/Tridiagonalization.h
new file mode 100644
index 0000000000..192278d685
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Eigenvalues/Tridiagonalization.h
@@ -0,0 +1,557 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Jitse Niesen <jitse@maths.leeds.ac.uk>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRIDIAGONALIZATION_H
+#define EIGEN_TRIDIAGONALIZATION_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType> struct TridiagonalizationMatrixTReturnType;
+template<typename MatrixType>
+struct traits<TridiagonalizationMatrixTReturnType<MatrixType> >
+{
+ typedef typename MatrixType::PlainObject ReturnType;
+};
+
+template<typename MatrixType, typename CoeffVectorType>
+void tridiagonalization_inplace(MatrixType& matA, CoeffVectorType& hCoeffs);
+}
+
+/** \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ *
+ * \class Tridiagonalization
+ *
+ * \brief Tridiagonal decomposition of a selfadjoint matrix
+ *
+ * \tparam _MatrixType the type of the matrix of which we are computing the
+ * tridiagonal decomposition; this is expected to be an instantiation of the
+ * Matrix class template.
+ *
+ * This class performs a tridiagonal decomposition of a selfadjoint matrix \f$ A \f$ such that:
+ * \f$ A = Q T Q^* \f$ where \f$ Q \f$ is unitary and \f$ T \f$ a real symmetric tridiagonal matrix.
+ *
+ * A tridiagonal matrix is a matrix which has nonzero elements only on the
+ * main diagonal and the first diagonal below and above it. The Hessenberg
+ * decomposition of a selfadjoint matrix is in fact a tridiagonal
+ * decomposition. This class is used in SelfAdjointEigenSolver to compute the
+ * eigenvalues and eigenvectors of a selfadjoint matrix.
+ *
+ * Call the function compute() to compute the tridiagonal decomposition of a
+ * given matrix. Alternatively, you can use the Tridiagonalization(const MatrixType&)
+ * constructor which computes the tridiagonal decomposition at
+ * construction time. Once the decomposition is computed, you can use the
+ * matrixQ() and matrixT() functions to retrieve the matrices Q and T in the
+ * decomposition.
+ *
+ * The documentation of Tridiagonalization(const MatrixType&) contains an
+ * example of the typical use of this class.
+ *
+ * \sa class HessenbergDecomposition, class SelfAdjointEigenSolver
+ */
+template<typename _MatrixType> class Tridiagonalization
+{
+ public:
+
+ /** \brief Synonym for the template parameter \p _MatrixType. */
+ typedef _MatrixType MatrixType;
+
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ enum {
+ Size = MatrixType::RowsAtCompileTime,
+ SizeMinusOne = Size == Dynamic ? Dynamic : (Size > 1 ? Size - 1 : 1),
+ Options = MatrixType::Options,
+ MaxSize = MatrixType::MaxRowsAtCompileTime,
+ MaxSizeMinusOne = MaxSize == Dynamic ? Dynamic : (MaxSize > 1 ? MaxSize - 1 : 1)
+ };
+
+ typedef Matrix<Scalar, SizeMinusOne, 1, Options & ~RowMajor, MaxSizeMinusOne, 1> CoeffVectorType;
+ typedef typename internal::plain_col_type<MatrixType, RealScalar>::type DiagonalType;
+ typedef Matrix<RealScalar, SizeMinusOne, 1, Options & ~RowMajor, MaxSizeMinusOne, 1> SubDiagonalType;
+ typedef typename internal::remove_all<typename MatrixType::RealReturnType>::type MatrixTypeRealView;
+ typedef internal::TridiagonalizationMatrixTReturnType<MatrixTypeRealView> MatrixTReturnType;
+
+ typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ typename internal::add_const_on_value_type<typename Diagonal<const MatrixType>::RealReturnType>::type,
+ const Diagonal<const MatrixType>
+ >::type DiagonalReturnType;
+
+ typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ typename internal::add_const_on_value_type<typename Diagonal<
+ Block<const MatrixType,SizeMinusOne,SizeMinusOne> >::RealReturnType>::type,
+ const Diagonal<
+ Block<const MatrixType,SizeMinusOne,SizeMinusOne> >
+ >::type SubDiagonalReturnType;
+
+ /** \brief Return type of matrixQ() */
+ typedef HouseholderSequence<MatrixType,typename internal::remove_all<typename CoeffVectorType::ConjugateReturnType>::type> HouseholderSequenceType;
+
+ /** \brief Default constructor.
+ *
+ * \param [in] size Positive integer, size of the matrix whose tridiagonal
+ * decomposition will be computed.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via compute(). The \p size parameter is only
+ * used as a hint. It is not an error to give a wrong \p size, but it may
+ * impair performance.
+ *
+ * \sa compute() for an example.
+ */
+ Tridiagonalization(Index size = Size==Dynamic ? 2 : Size)
+ : m_matrix(size,size),
+ m_hCoeffs(size > 1 ? size-1 : 1),
+ m_isInitialized(false)
+ {}
+
+ /** \brief Constructor; computes tridiagonal decomposition of given matrix.
+ *
+ * \param[in] matrix Selfadjoint matrix whose tridiagonal decomposition
+ * is to be computed.
+ *
+ * This constructor calls compute() to compute the tridiagonal decomposition.
+ *
+ * Example: \include Tridiagonalization_Tridiagonalization_MatrixType.cpp
+ * Output: \verbinclude Tridiagonalization_Tridiagonalization_MatrixType.out
+ */
+ Tridiagonalization(const MatrixType& matrix)
+ : m_matrix(matrix),
+ m_hCoeffs(matrix.cols() > 1 ? matrix.cols()-1 : 1),
+ m_isInitialized(false)
+ {
+ internal::tridiagonalization_inplace(m_matrix, m_hCoeffs);
+ m_isInitialized = true;
+ }
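+    /* Editor's note: a minimal usage sketch (illustrative addition, not upstream
+     * Eigen documentation), assuming only the public interface of this class:
+     *
+     *   Eigen::MatrixXd X = Eigen::MatrixXd::Random(5, 5);
+     *   Eigen::MatrixXd A = X + X.transpose();          // selfadjoint input
+     *   Eigen::Tridiagonalization<Eigen::MatrixXd> tri(A);
+     *   Eigen::MatrixXd Q = tri.matrixQ();              // unitary factor
+     *   Eigen::MatrixXd T = tri.matrixT();              // real symmetric tridiagonal factor
+     *   // A is recovered, up to rounding error, as Q * T * Q.adjoint().
+     */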
+
+ /** \brief Computes tridiagonal decomposition of given matrix.
+ *
+ * \param[in] matrix Selfadjoint matrix whose tridiagonal decomposition
+ * is to be computed.
+ * \returns Reference to \c *this
+ *
+ * The tridiagonal decomposition is computed by bringing the columns of
+ * the matrix successively in the required form using Householder
+ * reflections. The cost is \f$ 4n^3/3 \f$ flops, where \f$ n \f$ denotes
+ * the size of the given matrix.
+ *
+    * This method reuses the allocated data in the Tridiagonalization
+ * object, if the size of the matrix does not change.
+ *
+ * Example: \include Tridiagonalization_compute.cpp
+ * Output: \verbinclude Tridiagonalization_compute.out
+ */
+ Tridiagonalization& compute(const MatrixType& matrix)
+ {
+ m_matrix = matrix;
+ m_hCoeffs.resize(matrix.rows()-1, 1);
+ internal::tridiagonalization_inplace(m_matrix, m_hCoeffs);
+ m_isInitialized = true;
+ return *this;
+ }
+
+ /** \brief Returns the Householder coefficients.
+ *
+ * \returns a const reference to the vector of Householder coefficients
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * The Householder coefficients allow the reconstruction of the matrix
+ * \f$ Q \f$ in the tridiagonal decomposition from the packed data.
+ *
+ * Example: \include Tridiagonalization_householderCoefficients.cpp
+ * Output: \verbinclude Tridiagonalization_householderCoefficients.out
+ *
+ * \sa packedMatrix(), \ref Householder_Module "Householder module"
+ */
+ inline CoeffVectorType householderCoefficients() const
+ {
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ return m_hCoeffs;
+ }
+
+ /** \brief Returns the internal representation of the decomposition
+ *
+ * \returns a const reference to a matrix with the internal representation
+ * of the decomposition.
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * The returned matrix contains the following information:
+ * - the strict upper triangular part is equal to the input matrix A.
+ * - the diagonal and lower sub-diagonal represent the real tridiagonal
+ * symmetric matrix T.
+ * - the rest of the lower part contains the Householder vectors that,
+ * combined with Householder coefficients returned by
+    *    householderCoefficients(), allow the matrix Q to be reconstructed as
+ * \f$ Q = H_{N-1} \ldots H_1 H_0 \f$.
+ * Here, the matrices \f$ H_i \f$ are the Householder transformations
+ * \f$ H_i = (I - h_i v_i v_i^T) \f$
+ * where \f$ h_i \f$ is the \f$ i \f$th Householder coefficient and
+ * \f$ v_i \f$ is the Householder vector defined by
+ * \f$ v_i = [ 0, \ldots, 0, 1, M(i+2,i), \ldots, M(N-1,i) ]^T \f$
+ * with M the matrix returned by this function.
+ *
+ * See LAPACK for further details on this packed storage.
+ *
+ * Example: \include Tridiagonalization_packedMatrix.cpp
+ * Output: \verbinclude Tridiagonalization_packedMatrix.out
+ *
+ * \sa householderCoefficients()
+ */
+ inline const MatrixType& packedMatrix() const
+ {
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ return m_matrix;
+ }
+
+ /** \brief Returns the unitary matrix Q in the decomposition
+ *
+ * \returns object representing the matrix Q
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * This function returns a light-weight object of template class
+ * HouseholderSequence. You can either apply it directly to a matrix or
+ * you can convert it to a matrix of type #MatrixType.
+ *
+ * \sa Tridiagonalization(const MatrixType&) for an example,
+ * matrixT(), class HouseholderSequence
+ */
+ HouseholderSequenceType matrixQ() const
+ {
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ return HouseholderSequenceType(m_matrix, m_hCoeffs.conjugate())
+ .setLength(m_matrix.rows() - 1)
+ .setShift(1);
+ }
+
+ /** \brief Returns an expression of the tridiagonal matrix T in the decomposition
+ *
+ * \returns expression object representing the matrix T
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * Currently, this function can be used to extract the matrix T from internal
+ * data and copy it to a dense matrix object. In most cases, it may be
+ * sufficient to directly use the packed matrix or the vector expressions
+ * returned by diagonal() and subDiagonal() instead of creating a new
+ * dense copy matrix with this function.
+ *
+ * \sa Tridiagonalization(const MatrixType&) for an example,
+ * matrixQ(), packedMatrix(), diagonal(), subDiagonal()
+ */
+ MatrixTReturnType matrixT() const
+ {
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ return MatrixTReturnType(m_matrix.real());
+ }
+
+ /** \brief Returns the diagonal of the tridiagonal matrix T in the decomposition.
+ *
+ * \returns expression representing the diagonal of T
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * Example: \include Tridiagonalization_diagonal.cpp
+ * Output: \verbinclude Tridiagonalization_diagonal.out
+ *
+ * \sa matrixT(), subDiagonal()
+ */
+ DiagonalReturnType diagonal() const;
+
+ /** \brief Returns the subdiagonal of the tridiagonal matrix T in the decomposition.
+ *
+ * \returns expression representing the subdiagonal of T
+ *
+ * \pre Either the constructor Tridiagonalization(const MatrixType&) or
+ * the member function compute(const MatrixType&) has been called before
+ * to compute the tridiagonal decomposition of a matrix.
+ *
+ * \sa diagonal() for an example, matrixT()
+ */
+ SubDiagonalReturnType subDiagonal() const;
+
+ protected:
+
+ MatrixType m_matrix;
+ CoeffVectorType m_hCoeffs;
+ bool m_isInitialized;
+};
+
+template<typename MatrixType>
+typename Tridiagonalization<MatrixType>::DiagonalReturnType
+Tridiagonalization<MatrixType>::diagonal() const
+{
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ return m_matrix.diagonal();
+}
+
+template<typename MatrixType>
+typename Tridiagonalization<MatrixType>::SubDiagonalReturnType
+Tridiagonalization<MatrixType>::subDiagonal() const
+{
+ eigen_assert(m_isInitialized && "Tridiagonalization is not initialized.");
+ Index n = m_matrix.rows();
+ return Block<const MatrixType,SizeMinusOne,SizeMinusOne>(m_matrix, 1, 0, n-1,n-1).diagonal();
+}
+
+namespace internal {
+
+/** \internal
+ * Performs a tridiagonal decomposition of the selfadjoint matrix \a matA in-place.
+ *
+ * \param[in,out] matA On input the selfadjoint matrix. Only the \b lower triangular part is referenced.
+ * On output, the strict upper part is left unchanged, and the lower triangular part
+ *                     represents the T and Q matrices in packed format, as detailed below.
+ * \param[out] hCoeffs returned Householder coefficients (see below)
+ *
+ * On output, the tridiagonal selfadjoint matrix T is stored in the diagonal
+ * and lower sub-diagonal of the matrix \a matA.
+ * The unitary matrix Q is represented in a compact way as a product of
+ * Householder reflectors \f$ H_i \f$ such that:
+ * \f$ Q = H_{N-1} \ldots H_1 H_0 \f$.
+ * The Householder reflectors are defined as
+ * \f$ H_i = (I - h_i v_i v_i^T) \f$
+ * where \f$ h_i = hCoeffs[i]\f$ is the \f$ i \f$th Householder coefficient and
+ * \f$ v_i \f$ is the Householder vector defined by
+ * \f$ v_i = [ 0, \ldots, 0, 1, matA(i+2,i), \ldots, matA(N-1,i) ]^T \f$.
+ *
+ * Implemented from Golub's "Matrix Computations", algorithm 8.3.1.
+ *
+ * \sa Tridiagonalization::packedMatrix()
+ */
+template<typename MatrixType, typename CoeffVectorType>
+void tridiagonalization_inplace(MatrixType& matA, CoeffVectorType& hCoeffs)
+{
+ using numext::conj;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ Index n = matA.rows();
+ eigen_assert(n==matA.cols());
+ eigen_assert(n==hCoeffs.size()+1 || n==1);
+
+ for (Index i = 0; i<n-1; ++i)
+ {
+ Index remainingSize = n-i-1;
+ RealScalar beta;
+ Scalar h;
+ matA.col(i).tail(remainingSize).makeHouseholderInPlace(h, beta);
+
+ // Apply similarity transformation to remaining columns,
+ // i.e., A = H A H' where H = I - h v v' and v = matA.col(i).tail(n-i-1)
+ matA.col(i).coeffRef(i+1) = 1;
+
+ hCoeffs.tail(n-i-1).noalias() = (matA.bottomRightCorner(remainingSize,remainingSize).template selfadjointView<Lower>()
+ * (conj(h) * matA.col(i).tail(remainingSize)));
+
+ hCoeffs.tail(n-i-1) += (conj(h)*Scalar(-0.5)*(hCoeffs.tail(remainingSize).dot(matA.col(i).tail(remainingSize)))) * matA.col(i).tail(n-i-1);
+
+ matA.bottomRightCorner(remainingSize, remainingSize).template selfadjointView<Lower>()
+ .rankUpdate(matA.col(i).tail(remainingSize), hCoeffs.tail(remainingSize), -1);
+
+ matA.col(i).coeffRef(i+1) = beta;
+ hCoeffs.coeffRef(i) = h;
+ }
+}
+
+// forward declaration, implementation at the end of this file
+template<typename MatrixType,
+ int Size=MatrixType::ColsAtCompileTime,
+ bool IsComplex=NumTraits<typename MatrixType::Scalar>::IsComplex>
+struct tridiagonalization_inplace_selector;
+
+/** \brief Performs a full tridiagonalization in place
+ *
+ * \param[in,out] mat On input, the selfadjoint matrix whose tridiagonal
+ * decomposition is to be computed. Only the lower triangular part is referenced.
+ * The rest is left unchanged. On output, it holds the orthogonal matrix Q
+ * of the decomposition if \p extractQ is true.
+ * \param[out] diag The diagonal of the tridiagonal matrix T in the
+ * decomposition.
+ * \param[out] subdiag The subdiagonal of the tridiagonal matrix T in
+ * the decomposition.
+ * \param[in] extractQ If true, the orthogonal matrix Q in the
+ * decomposition is computed and stored in \p mat.
+ *
+ * Computes the tridiagonal decomposition of the selfadjoint matrix \p mat in place
+ * such that \f$ mat = Q T Q^* \f$ where \f$ Q \f$ is unitary and \f$ T \f$ a real
+ * symmetric tridiagonal matrix.
+ *
+ * The tridiagonal matrix T is passed to the output parameters \p diag and \p subdiag. If
+ * \p extractQ is true, then the orthogonal matrix Q is passed to \p mat. Otherwise the lower
+ * part of the matrix \p mat is destroyed.
+ *
+ * The vectors \p diag and \p subdiag are not resized. The function
+ * assumes that they are already of the correct size. The length of the
+ * vector \p diag should equal the number of rows in \p mat, and the
+ * length of the vector \p subdiag should be one less.
+ *
+ * This implementation contains an optimized path for 3-by-3 matrices
+ * which is especially useful for plane fitting.
+ *
+ * \note Currently, it requires two temporary vectors to hold the intermediate
+ * Householder coefficients, and to reconstruct the matrix Q from the Householder
+ * reflectors.
+ *
+ * Example (this uses the same matrix as the example in
+ * Tridiagonalization::Tridiagonalization(const MatrixType&)):
+ * \include Tridiagonalization_decomposeInPlace.cpp
+ * Output: \verbinclude Tridiagonalization_decomposeInPlace.out
+ *
+ * \sa class Tridiagonalization
+ */
+template<typename MatrixType, typename DiagonalType, typename SubDiagonalType>
+void tridiagonalization_inplace(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
+{
+ eigen_assert(mat.cols()==mat.rows() && diag.size()==mat.rows() && subdiag.size()==mat.rows()-1);
+ tridiagonalization_inplace_selector<MatrixType>::run(mat, diag, subdiag, extractQ);
+}
+
+/** \internal
+ * General full tridiagonalization
+ */
+template<typename MatrixType, int Size, bool IsComplex>
+struct tridiagonalization_inplace_selector
+{
+ typedef typename Tridiagonalization<MatrixType>::CoeffVectorType CoeffVectorType;
+ typedef typename Tridiagonalization<MatrixType>::HouseholderSequenceType HouseholderSequenceType;
+ typedef typename MatrixType::Index Index;
+ template<typename DiagonalType, typename SubDiagonalType>
+ static void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
+ {
+ CoeffVectorType hCoeffs(mat.cols()-1);
+ tridiagonalization_inplace(mat,hCoeffs);
+ diag = mat.diagonal().real();
+ subdiag = mat.template diagonal<-1>().real();
+ if(extractQ)
+ mat = HouseholderSequenceType(mat, hCoeffs.conjugate())
+ .setLength(mat.rows() - 1)
+ .setShift(1);
+ }
+};
+
+/** \internal
+ * Specialization for 3x3 real matrices.
+ * Especially useful for plane fitting.
+ */
+template<typename MatrixType>
+struct tridiagonalization_inplace_selector<MatrixType,3,false>
+{
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+
+ template<typename DiagonalType, typename SubDiagonalType>
+ static void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType& subdiag, bool extractQ)
+ {
+ using std::sqrt;
+ diag[0] = mat(0,0);
+ RealScalar v1norm2 = numext::abs2(mat(2,0));
+ if(v1norm2 == RealScalar(0))
+ {
+ diag[1] = mat(1,1);
+ diag[2] = mat(2,2);
+ subdiag[0] = mat(1,0);
+ subdiag[1] = mat(2,1);
+ if (extractQ)
+ mat.setIdentity();
+ }
+ else
+ {
+ RealScalar beta = sqrt(numext::abs2(mat(1,0)) + v1norm2);
+ RealScalar invBeta = RealScalar(1)/beta;
+ Scalar m01 = mat(1,0) * invBeta;
+ Scalar m02 = mat(2,0) * invBeta;
+ Scalar q = RealScalar(2)*m01*mat(2,1) + m02*(mat(2,2) - mat(1,1));
+ diag[1] = mat(1,1) + m02*q;
+ diag[2] = mat(2,2) - m02*q;
+ subdiag[0] = beta;
+ subdiag[1] = mat(2,1) - m01 * q;
+ if (extractQ)
+ {
+ mat << 1, 0, 0,
+ 0, m01, m02,
+ 0, m02, -m01;
+ }
+ }
+ }
+};
+
+/** \internal
+ * Trivial specialization for 1x1 matrices
+ */
+template<typename MatrixType, bool IsComplex>
+struct tridiagonalization_inplace_selector<MatrixType,1,IsComplex>
+{
+ typedef typename MatrixType::Scalar Scalar;
+
+ template<typename DiagonalType, typename SubDiagonalType>
+ static void run(MatrixType& mat, DiagonalType& diag, SubDiagonalType&, bool extractQ)
+ {
+ diag(0,0) = numext::real(mat(0,0));
+ if(extractQ)
+ mat(0,0) = Scalar(1);
+ }
+};
+
+/** \internal
+ * \eigenvalues_module \ingroup Eigenvalues_Module
+ *
+ * \brief Expression type for return value of Tridiagonalization::matrixT()
+ *
+ * \tparam MatrixType type of underlying dense matrix
+ */
+template<typename MatrixType> struct TridiagonalizationMatrixTReturnType
+: public ReturnByValue<TridiagonalizationMatrixTReturnType<MatrixType> >
+{
+ typedef typename MatrixType::Index Index;
+ public:
+ /** \brief Constructor.
+ *
+ * \param[in] mat The underlying dense matrix
+ */
+ TridiagonalizationMatrixTReturnType(const MatrixType& mat) : m_matrix(mat) { }
+
+ template <typename ResultType>
+ inline void evalTo(ResultType& result) const
+ {
+ result.setZero();
+ result.template diagonal<1>() = m_matrix.template diagonal<-1>().conjugate();
+ result.diagonal() = m_matrix.diagonal();
+ result.template diagonal<-1>() = m_matrix.template diagonal<-1>();
+ }
+
+ Index rows() const { return m_matrix.rows(); }
+ Index cols() const { return m_matrix.cols(); }
+
+ protected:
+ typename MatrixType::Nested m_matrix;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRIDIAGONALIZATION_H
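A minimal sketch of how the Tridiagonalization class defined above is typically used. It is not part of the patch; the 4x4 random selfadjoint input is an illustrative assumption.

#include <Eigen/Dense>
#include <Eigen/Eigenvalues>
#include <iostream>

int main() {
  // Build a small selfadjoint matrix A = B + B^T.
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(4, 4);
  Eigen::MatrixXd A = B + B.transpose();

  // Decompose A = Q T Q^T with T tridiagonal and Q orthogonal.
  Eigen::Tridiagonalization<Eigen::MatrixXd> tri(A);
  Eigen::MatrixXd T = tri.matrixT();
  Eigen::MatrixXd Q = tri.matrixQ();

  // diagonal() and subDiagonal() expose T without forming it densely.
  std::cout << "diag(T)    = " << tri.diagonal().transpose() << "\n";
  std::cout << "subdiag(T) = " << tri.subDiagonal().transpose() << "\n";

  // Sanity check: the factors reconstruct A.
  std::cout << "||A - Q T Q^T|| = " << (A - Q * T * Q.transpose()).norm() << "\n";
}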
diff --git a/third_party/eigen3/Eigen/src/Geometry/AlignedBox.h b/third_party/eigen3/Eigen/src/Geometry/AlignedBox.h
new file mode 100644
index 0000000000..b6a2f0e24c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/AlignedBox.h
@@ -0,0 +1,379 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ALIGNEDBOX_H
+#define EIGEN_ALIGNEDBOX_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ *
+ * \class AlignedBox
+ *
+ * \brief An axis aligned box
+ *
+ * \param _Scalar the type of the scalar coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ *
+ * This class represents an axis aligned box as a pair of the minimal and maximal corners.
+ */
+template <typename _Scalar, int _AmbientDim>
+class AlignedBox
+{
+public:
+EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim)
+ enum { AmbientDimAtCompileTime = _AmbientDim };
+ typedef _Scalar Scalar;
+ typedef NumTraits<Scalar> ScalarTraits;
+ typedef DenseIndex Index;
+ typedef typename ScalarTraits::Real RealScalar;
+ typedef typename ScalarTraits::NonInteger NonInteger;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1> VectorType;
+
+ /** Define constants to name the corners of a 1D, 2D or 3D axis aligned bounding box */
+ enum CornerType
+ {
+ /** 1D names */
+ Min=0, Max=1,
+
+ /** Added names for 2D */
+ BottomLeft=0, BottomRight=1,
+ TopLeft=2, TopRight=3,
+
+ /** Added names for 3D */
+ BottomLeftFloor=0, BottomRightFloor=1,
+ TopLeftFloor=2, TopRightFloor=3,
+ BottomLeftCeil=4, BottomRightCeil=5,
+ TopLeftCeil=6, TopRightCeil=7
+ };
+
+
+ /** Default constructor initializing a null box. */
+ inline AlignedBox()
+ { if (AmbientDimAtCompileTime!=Dynamic) setEmpty(); }
+
+ /** Constructs a null box with \a _dim the dimension of the ambient space. */
+ inline explicit AlignedBox(Index _dim) : m_min(_dim), m_max(_dim)
+ { setEmpty(); }
+
+ /** Constructs a box with extremities \a _min and \a _max. */
+ template<typename OtherVectorType1, typename OtherVectorType2>
+ inline AlignedBox(const OtherVectorType1& _min, const OtherVectorType2& _max) : m_min(_min), m_max(_max) {}
+
+ /** Constructs a box containing a single point \a p. */
+ template<typename Derived>
+ inline explicit AlignedBox(const MatrixBase<Derived>& a_p)
+ {
+ typename internal::nested<Derived,2>::type p(a_p.derived());
+ m_min = p;
+ m_max = p;
+ }
+
+ ~AlignedBox() {}
+
+ /** \returns the dimension of the ambient space in which the box lives */
+ inline Index dim() const { return AmbientDimAtCompileTime==Dynamic ? m_min.size() : Index(AmbientDimAtCompileTime); }
+
+ /** \deprecated use isEmpty */
+ inline bool isNull() const { return isEmpty(); }
+
+ /** \deprecated use setEmpty */
+ inline void setNull() { setEmpty(); }
+
+ /** \returns true if the box is empty. */
+ inline bool isEmpty() const { return (m_min.array() > m_max.array()).any(); }
+
+ /** Makes \c *this an empty box. */
+ inline void setEmpty()
+ {
+ m_min.setConstant( ScalarTraits::highest() );
+ m_max.setConstant( ScalarTraits::lowest() );
+ }
+
+ /** \returns the minimal corner */
+ inline const VectorType& (min)() const { return m_min; }
+ /** \returns a non const reference to the minimal corner */
+ inline VectorType& (min)() { return m_min; }
+ /** \returns the maximal corner */
+ inline const VectorType& (max)() const { return m_max; }
+ /** \returns a non const reference to the maximal corner */
+ inline VectorType& (max)() { return m_max; }
+
+ /** \returns the center of the box */
+ inline const CwiseUnaryOp<internal::scalar_quotient1_op<Scalar>,
+ const CwiseBinaryOp<internal::scalar_sum_op<Scalar>, const VectorType, const VectorType> >
+ center() const
+ { return (m_min+m_max)/2; }
+
+ /** \returns the lengths of the sides of the bounding box.
+ * Note that this function does not return the same
+ * result for integral and floating-point scalar types.
+ */
+ inline const CwiseBinaryOp< internal::scalar_difference_op<Scalar>, const VectorType, const VectorType> sizes() const
+ { return m_max - m_min; }
+
+ /** \returns the volume of the bounding box */
+ inline Scalar volume() const
+ { return sizes().prod(); }
+
+ /** \returns an expression for the bounding box diagonal vector.
+ * If the length of the diagonal is needed, diagonal().norm()
+ * will provide it.
+ */
+ inline CwiseBinaryOp< internal::scalar_difference_op<Scalar>, const VectorType, const VectorType> diagonal() const
+ { return sizes(); }
+
+ /** \returns the vertex of the bounding box at the corner defined by
+ * the corner-id corner. It works only for a 1D, 2D or 3D bounding box.
+ * For 1D bounding boxes corners are named by 2 enum constants:
+ * BottomLeft and BottomRight.
+ * For 2D bounding boxes, corners are named by 4 enum constants:
+ * BottomLeft, BottomRight, TopLeft, TopRight.
+ * For 3D bounding boxes, the following names are added:
+ * BottomLeftCeil, BottomRightCeil, TopLeftCeil, TopRightCeil.
+ */
+ inline VectorType corner(CornerType corner) const
+ {
+ EIGEN_STATIC_ASSERT(_AmbientDim <= 3, THIS_METHOD_IS_ONLY_FOR_VECTORS_OF_A_SPECIFIC_SIZE);
+
+ VectorType res;
+
+ Index mult = 1;
+ for(Index d=0; d<dim(); ++d)
+ {
+ if( mult & corner ) res[d] = m_max[d];
+ else res[d] = m_min[d];
+ mult *= 2;
+ }
+ return res;
+ }
+
+ /** \returns a random point inside the bounding box sampled with
+ * a uniform distribution */
+ inline VectorType sample() const
+ {
+ VectorType r;
+ for(Index d=0; d<dim(); ++d)
+ {
+ if(!ScalarTraits::IsInteger)
+ {
+ r[d] = m_min[d] + (m_max[d]-m_min[d])
+ * internal::random<Scalar>(Scalar(0), Scalar(1));
+ }
+ else
+ r[d] = internal::random(m_min[d], m_max[d]);
+ }
+ return r;
+ }
+
+ /** \returns true if the point \a p is inside the box \c *this. */
+ template<typename Derived>
+ inline bool contains(const MatrixBase<Derived>& a_p) const
+ {
+ typename internal::nested<Derived,2>::type p(a_p.derived());
+ return (m_min.array()<=p.array()).all() && (p.array()<=m_max.array()).all();
+ }
+
+ /** \returns true if the box \a b is entirely inside the box \c *this. */
+ inline bool contains(const AlignedBox& b) const
+ { return (m_min.array()<=(b.min)().array()).all() && ((b.max)().array()<=m_max.array()).all(); }
+
+ /** \returns true if the box \a b is intersecting the box \c *this. */
+ inline bool intersects(const AlignedBox& b) const
+ { return (m_min.array()<=(b.max)().array()).all() && ((b.min)().array()<=m_max.array()).all(); }
+
+ /** Extends \c *this such that it contains the point \a p and returns a reference to \c *this. */
+ template<typename Derived>
+ inline AlignedBox& extend(const MatrixBase<Derived>& a_p)
+ {
+ typename internal::nested<Derived,2>::type p(a_p.derived());
+ m_min = m_min.cwiseMin(p);
+ m_max = m_max.cwiseMax(p);
+ return *this;
+ }
+
+ /** Extends \c *this such that it contains the box \a b and returns a reference to \c *this. */
+ inline AlignedBox& extend(const AlignedBox& b)
+ {
+ m_min = m_min.cwiseMin(b.m_min);
+ m_max = m_max.cwiseMax(b.m_max);
+ return *this;
+ }
+
+ /** Clamps \c *this by the box \a b and returns a reference to \c *this. */
+ inline AlignedBox& clamp(const AlignedBox& b)
+ {
+ m_min = m_min.cwiseMax(b.m_min);
+ m_max = m_max.cwiseMin(b.m_max);
+ return *this;
+ }
+
+ /** Returns an AlignedBox that is the intersection of \a b and \c *this */
+ inline AlignedBox intersection(const AlignedBox& b) const
+ {return AlignedBox(m_min.cwiseMax(b.m_min), m_max.cwiseMin(b.m_max)); }
+
+ /** Returns an AlignedBox that is the union of \a b and \c *this */
+ inline AlignedBox merged(const AlignedBox& b) const
+ { return AlignedBox(m_min.cwiseMin(b.m_min), m_max.cwiseMax(b.m_max)); }
+
+ /** Translates \c *this by the vector \a t and returns a reference to \c *this. */
+ template<typename Derived>
+ inline AlignedBox& translate(const MatrixBase<Derived>& a_t)
+ {
+ const typename internal::nested<Derived,2>::type t(a_t.derived());
+ m_min += t;
+ m_max += t;
+ return *this;
+ }
+
+ /** \returns the squared distance between the point \a p and the box \c *this,
+ * and zero if \a p is inside the box.
+ * \sa exteriorDistance()
+ */
+ template<typename Derived>
+ inline Scalar squaredExteriorDistance(const MatrixBase<Derived>& a_p) const;
+
+ /** \returns the squared distance between the boxes \a b and \c *this,
+ * and zero if the boxes intersect.
+ * \sa exteriorDistance()
+ */
+ inline Scalar squaredExteriorDistance(const AlignedBox& b) const;
+
+ /** \returns the distance between the point \a p and the box \c *this,
+ * and zero if \a p is inside the box.
+ * \sa squaredExteriorDistance()
+ */
+ template<typename Derived>
+ inline NonInteger exteriorDistance(const MatrixBase<Derived>& p) const
+ { using std::sqrt; return sqrt(NonInteger(squaredExteriorDistance(p))); }
+
+ /** \returns the distance between the boxes \a b and \c *this,
+ * and zero if the boxes intersect.
+ * \sa squaredExteriorDistance()
+ */
+ inline NonInteger exteriorDistance(const AlignedBox& b) const
+ { using std::sqrt; return sqrt(NonInteger(squaredExteriorDistance(b))); }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<AlignedBox,
+ AlignedBox<NewScalarType,AmbientDimAtCompileTime> >::type cast() const
+ {
+ return typename internal::cast_return_type<AlignedBox,
+ AlignedBox<NewScalarType,AmbientDimAtCompileTime> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit AlignedBox(const AlignedBox<OtherScalarType,AmbientDimAtCompileTime>& other)
+ {
+ m_min = (other.min)().template cast<Scalar>();
+ m_max = (other.max)().template cast<Scalar>();
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const AlignedBox& other, const RealScalar& prec = ScalarTraits::dummy_precision()) const
+ { return m_min.isApprox(other.m_min, prec) && m_max.isApprox(other.m_max, prec); }
+
+protected:
+
+ VectorType m_min, m_max;
+};
+
+
+
+template<typename Scalar,int AmbientDim>
+template<typename Derived>
+inline Scalar AlignedBox<Scalar,AmbientDim>::squaredExteriorDistance(const MatrixBase<Derived>& a_p) const
+{
+ typename internal::nested<Derived,2*AmbientDim>::type p(a_p.derived());
+ Scalar dist2(0);
+ Scalar aux;
+ for (Index k=0; k<dim(); ++k)
+ {
+ if( m_min[k] > p[k] )
+ {
+ aux = m_min[k] - p[k];
+ dist2 += aux*aux;
+ }
+ else if( p[k] > m_max[k] )
+ {
+ aux = p[k] - m_max[k];
+ dist2 += aux*aux;
+ }
+ }
+ return dist2;
+}
+
+template<typename Scalar,int AmbientDim>
+inline Scalar AlignedBox<Scalar,AmbientDim>::squaredExteriorDistance(const AlignedBox& b) const
+{
+ Scalar dist2(0);
+ Scalar aux;
+ for (Index k=0; k<dim(); ++k)
+ {
+ if( m_min[k] > b.m_max[k] )
+ {
+ aux = m_min[k] - b.m_max[k];
+ dist2 += aux*aux;
+ }
+ else if( b.m_min[k] > m_max[k] )
+ {
+ aux = b.m_min[k] - m_max[k];
+ dist2 += aux*aux;
+ }
+ }
+ return dist2;
+}
+
+/** \defgroup alignedboxtypedefs Global aligned box typedefs
+ *
+ * \ingroup Geometry_Module
+ *
+ * Eigen defines several typedef shortcuts for most common aligned box types.
+ *
+ * The general patterns are the following:
+ *
+ * \c AlignedBoxSizeType where \c Size can be \c 1, \c 2,\c 3,\c 4 for fixed size boxes or \c X for dynamic size,
+ * and where \c Type can be \c i for integer, \c f for float, \c d for double.
+ *
+ * For example, \c AlignedBox3d is a fixed-size 3-dimensional aligned box of doubles, and \c AlignedBoxXf is a dynamic-size aligned box of floats.
+ *
+ * \sa class AlignedBox
+ */
+
+#define EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Size, SizeSuffix) \
+/** \ingroup alignedboxtypedefs */ \
+typedef AlignedBox<Type, Size> AlignedBox##SizeSuffix##TypeSuffix;
+
+#define EIGEN_MAKE_TYPEDEFS_ALL_SIZES(Type, TypeSuffix) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 1, 1) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 2, 2) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 3, 3) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, 4, 4) \
+EIGEN_MAKE_TYPEDEFS(Type, TypeSuffix, Dynamic, X)
+
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(int, i)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(float, f)
+EIGEN_MAKE_TYPEDEFS_ALL_SIZES(double, d)
+
+#undef EIGEN_MAKE_TYPEDEFS_ALL_SIZES
+#undef EIGEN_MAKE_TYPEDEFS
+
+} // end namespace Eigen
+
+#endif // EIGEN_ALIGNEDBOX_H
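A minimal sketch exercising the AlignedBox API added above, assuming an empty box grown from a few hand-picked points; it is not part of the patch.

#include <Eigen/Geometry>
#include <iostream>

int main() {
  using Eigen::AlignedBox3d;
  using Eigen::Vector3d;

  // Start from an empty box and grow it to enclose a few points.
  AlignedBox3d box;
  box.extend(Vector3d(0.0, 0.0, 0.0));
  box.extend(Vector3d(1.0, 2.0, 3.0));

  std::cout << "min    = " << (box.min)().transpose() << "\n";
  std::cout << "sizes  = " << box.sizes().transpose() << "\n";
  std::cout << "center = " << box.center().transpose() << "\n";

  // Containment and exterior distance queries.
  Vector3d p(2.0, 2.0, 2.0);
  std::cout << "contains p: " << box.contains(p) << "\n";
  std::cout << "exteriorDistance(p) = " << box.exteriorDistance(p) << "\n";
}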
diff --git a/third_party/eigen3/Eigen/src/Geometry/AngleAxis.h b/third_party/eigen3/Eigen/src/Geometry/AngleAxis.h
new file mode 100644
index 0000000000..636712c2b9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/AngleAxis.h
@@ -0,0 +1,233 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ANGLEAXIS_H
+#define EIGEN_ANGLEAXIS_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class AngleAxis
+ *
+ * \brief Represents a 3D rotation as a rotation angle around an arbitrary 3D axis
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ *
+ * \warning When setting up an AngleAxis object, the axis vector \b must \b be \b normalized.
+ *
+ * The following two typedefs are provided for convenience:
+ * \li \c AngleAxisf for \c float
+ * \li \c AngleAxisd for \c double
+ *
+ * Combined with MatrixBase::Unit{X,Y,Z}, AngleAxis can be used to easily
+ * mimic Euler-angles. Here is an example:
+ * \include AngleAxis_mimic_euler.cpp
+ * Output: \verbinclude AngleAxis_mimic_euler.out
+ *
+ * \note This class is not meant to be used to store a rotation transformation,
+ * but rather to ease the creation of other rotation objects (Quaternion, rotation Matrix)
+ * and transformation objects.
+ *
+ * \sa class Quaternion, class Transform, MatrixBase::UnitX()
+ */
+
+namespace internal {
+template<typename _Scalar> struct traits<AngleAxis<_Scalar> >
+{
+ typedef _Scalar Scalar;
+};
+}
+
+template<typename _Scalar>
+class AngleAxis : public RotationBase<AngleAxis<_Scalar>,3>
+{
+ typedef RotationBase<AngleAxis<_Scalar>,3> Base;
+
+public:
+
+ using Base::operator*;
+
+ enum { Dim = 3 };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ typedef Matrix<Scalar,3,3> Matrix3;
+ typedef Matrix<Scalar,3,1> Vector3;
+ typedef Quaternion<Scalar> QuaternionType;
+
+protected:
+
+ Vector3 m_axis;
+ Scalar m_angle;
+
+public:
+
+ /** Default constructor without initialization. */
+ AngleAxis() {}
+ /** Constructs and initializes the angle-axis rotation from an \a angle in radians
+ * and an \a axis which \b must \b be \b normalized.
+ *
+ * \warning If the \a axis vector is not normalized, then the angle-axis object
+ * represents an invalid rotation. */
+ template<typename Derived>
+ inline AngleAxis(const Scalar& angle, const MatrixBase<Derived>& axis) : m_axis(axis), m_angle(angle) {}
+ /** Constructs and initializes the angle-axis rotation from a quaternion \a q.
+ * This function implicitly normalizes the quaternion \a q.
+ */
+ template<typename QuatDerived> inline explicit AngleAxis(const QuaternionBase<QuatDerived>& q) { *this = q; }
+ /** Constructs and initializes the angle-axis rotation from a 3x3 rotation matrix. */
+ template<typename Derived>
+ inline explicit AngleAxis(const MatrixBase<Derived>& m) { *this = m; }
+
+ Scalar angle() const { return m_angle; }
+ Scalar& angle() { return m_angle; }
+
+ const Vector3& axis() const { return m_axis; }
+ Vector3& axis() { return m_axis; }
+
+ /** Concatenates two rotations */
+ inline QuaternionType operator* (const AngleAxis& other) const
+ { return QuaternionType(*this) * QuaternionType(other); }
+
+ /** Concatenates two rotations */
+ inline QuaternionType operator* (const QuaternionType& other) const
+ { return QuaternionType(*this) * other; }
+
+ /** Concatenates two rotations */
+ friend inline QuaternionType operator* (const QuaternionType& a, const AngleAxis& b)
+ { return a * QuaternionType(b); }
+
+ /** \returns the inverse rotation, i.e., an angle-axis with opposite rotation angle */
+ AngleAxis inverse() const
+ { return AngleAxis(-m_angle, m_axis); }
+
+ template<class QuatDerived>
+ AngleAxis& operator=(const QuaternionBase<QuatDerived>& q);
+ template<typename Derived>
+ AngleAxis& operator=(const MatrixBase<Derived>& m);
+
+ template<typename Derived>
+ AngleAxis& fromRotationMatrix(const MatrixBase<Derived>& m);
+ Matrix3 toRotationMatrix(void) const;
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<AngleAxis,AngleAxis<NewScalarType> >::type cast() const
+ { return typename internal::cast_return_type<AngleAxis,AngleAxis<NewScalarType> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit AngleAxis(const AngleAxis<OtherScalarType>& other)
+ {
+ m_axis = other.axis().template cast<Scalar>();
+ m_angle = Scalar(other.angle());
+ }
+
+ static inline const AngleAxis Identity() { return AngleAxis(0, Vector3::UnitX()); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const AngleAxis& other, const typename NumTraits<Scalar>::Real& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return m_axis.isApprox(other.m_axis, prec) && internal::isApprox(m_angle,other.m_angle, prec); }
+};
+
+/** \ingroup Geometry_Module
+ * single precision angle-axis type */
+typedef AngleAxis<float> AngleAxisf;
+/** \ingroup Geometry_Module
+ * double precision angle-axis type */
+typedef AngleAxis<double> AngleAxisd;
+
+/** Set \c *this from a \b unit quaternion.
+ * The resulting axis is normalized.
+ *
+ * This function implicitly normalizes the quaternion \a q.
+ */
+template<typename Scalar>
+template<typename QuatDerived>
+AngleAxis<Scalar>& AngleAxis<Scalar>::operator=(const QuaternionBase<QuatDerived>& q)
+{
+ using std::atan2;
+ Scalar n = q.vec().norm();
+ if(n<NumTraits<Scalar>::epsilon())
+ n = q.vec().stableNorm();
+ if (n > Scalar(0))
+ {
+ m_angle = Scalar(2)*atan2(n, q.w());
+ m_axis = q.vec() / n;
+ }
+ else
+ {
+ m_angle = 0;
+ m_axis << 1, 0, 0;
+ }
+ return *this;
+}
+
+/** Set \c *this from a 3x3 rotation matrix \a mat.
+ */
+template<typename Scalar>
+template<typename Derived>
+AngleAxis<Scalar>& AngleAxis<Scalar>::operator=(const MatrixBase<Derived>& mat)
+{
+ // Since a direct conversion would not be really faster,
+ // let's use the robust Quaternion implementation:
+ return *this = QuaternionType(mat);
+}
+
+/**
+* \brief Sets \c *this from a 3x3 rotation matrix.
+**/
+template<typename Scalar>
+template<typename Derived>
+AngleAxis<Scalar>& AngleAxis<Scalar>::fromRotationMatrix(const MatrixBase<Derived>& mat)
+{
+ return *this = QuaternionType(mat);
+}
+
+/** Constructs and \returns an equivalent 3x3 rotation matrix.
+ */
+template<typename Scalar>
+typename AngleAxis<Scalar>::Matrix3
+AngleAxis<Scalar>::toRotationMatrix(void) const
+{
+ using std::sin;
+ using std::cos;
+ Matrix3 res;
+ Vector3 sin_axis = sin(m_angle) * m_axis;
+ Scalar c = cos(m_angle);
+ Vector3 cos1_axis = (Scalar(1)-c) * m_axis;
+
+ Scalar tmp;
+ tmp = cos1_axis.x() * m_axis.y();
+ res.coeffRef(0,1) = tmp - sin_axis.z();
+ res.coeffRef(1,0) = tmp + sin_axis.z();
+
+ tmp = cos1_axis.x() * m_axis.z();
+ res.coeffRef(0,2) = tmp + sin_axis.y();
+ res.coeffRef(2,0) = tmp - sin_axis.y();
+
+ tmp = cos1_axis.y() * m_axis.z();
+ res.coeffRef(1,2) = tmp - sin_axis.x();
+ res.coeffRef(2,1) = tmp + sin_axis.x();
+
+ res.diagonal() = (cos1_axis.cwiseProduct(m_axis)).array() + c;
+
+ return res;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ANGLEAXIS_H
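A minimal sketch of the AngleAxis class added above, showing conversion to a rotation matrix and a quaternion and the Euler-angle style composition mentioned in the class documentation; the angle values are illustrative assumptions.

#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;

  // A quarter turn around the (already normalized) z axis.
  const double kPi = 3.141592653589793;
  AngleAxisd aa(kPi / 2.0, Vector3d::UnitZ());

  // Convert to other rotation representations.
  Matrix3d R = aa.toRotationMatrix();
  Quaterniond q(aa);

  // Rotating a vector gives the same result in every representation.
  Vector3d v(1.0, 0.0, 0.0);
  std::cout << "R * v  = " << (R * v).transpose() << "\n";
  std::cout << "q * v  = " << (q * v).transpose() << "\n";
  std::cout << "aa * v = " << (aa * v).transpose() << "\n";

  // Euler-angle style composition, as suggested in the documentation.
  Matrix3d m = (AngleAxisd(0.1, Vector3d::UnitZ())
              * AngleAxisd(0.2, Vector3d::UnitY())
              * AngleAxisd(0.3, Vector3d::UnitZ())).toRotationMatrix();
  std::cout << "composed rotation:\n" << m << "\n";
}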
diff --git a/third_party/eigen3/Eigen/src/Geometry/EulerAngles.h b/third_party/eigen3/Eigen/src/Geometry/EulerAngles.h
new file mode 100644
index 0000000000..82802fb43c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/EulerAngles.h
@@ -0,0 +1,104 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_EULERANGLES_H
+#define EIGEN_EULERANGLES_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ *
+ * \returns the Euler-angles of the rotation matrix \c *this using the convention defined by the triplet (\a a0,\a a1,\a a2)
+ *
+ * Each of the three parameters \a a0,\a a1,\a a2 represents the respective rotation axis as an integer in {0,1,2}.
+ * For instance, in:
+ * \code Vector3f ea = mat.eulerAngles(2, 0, 2); \endcode
+ * "2" represents the z axis and "0" the x axis, etc. The returned angles are such that
+ * we have the following equality:
+ * \code
+ * mat == AngleAxisf(ea[0], Vector3f::UnitZ())
+ * * AngleAxisf(ea[1], Vector3f::UnitX())
+ * * AngleAxisf(ea[2], Vector3f::UnitZ()); \endcode
+ * This corresponds to the right-multiply conventions (with right hand side frames).
+ *
+ * The returned angles are in the ranges [0:pi]x[-pi:pi]x[-pi:pi].
+ *
+ * \sa class AngleAxis
+ */
+template<typename Derived>
+inline Matrix<typename MatrixBase<Derived>::Scalar,3,1>
+MatrixBase<Derived>::eulerAngles(Index a0, Index a1, Index a2) const
+{
+ using std::atan2;
+ using std::sin;
+ using std::cos;
+ /* Implemented from Graphics Gems IV */
+ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Derived,3,3)
+
+ Matrix<Scalar,3,1> res;
+ typedef Matrix<typename Derived::Scalar,2,1> Vector2;
+
+ const Index odd = ((a0+1)%3 == a1) ? 0 : 1;
+ const Index i = a0;
+ const Index j = (a0 + 1 + odd)%3;
+ const Index k = (a0 + 2 - odd)%3;
+
+ if (a0==a2)
+ {
+ res[0] = atan2(coeff(j,i), coeff(k,i));
+ if((odd && res[0]<Scalar(0)) || ((!odd) && res[0]>Scalar(0)))
+ {
+ res[0] = (res[0] > Scalar(0)) ? res[0] - Scalar(M_PI) : res[0] + Scalar(M_PI);
+ Scalar s2 = Vector2(coeff(j,i), coeff(k,i)).norm();
+ res[1] = -atan2(s2, coeff(i,i));
+ }
+ else
+ {
+ Scalar s2 = Vector2(coeff(j,i), coeff(k,i)).norm();
+ res[1] = atan2(s2, coeff(i,i));
+ }
+
+ // With a=(0,1,0), we have i=0; j=1; k=2, and after computing the first two angles,
+ // we can compute their respective rotation, and apply its inverse to M. Since the result must
+ // be a rotation around x, we have:
+ //
+ // c2 s1.s2 c1.s2 1 0 0
+ // 0 c1 -s1 * M = 0 c3 s3
+ // -s2 s1.c2 c1.c2 0 -s3 c3
+ //
+ // Thus: m11.c1 - m21.s1 = c3 & m12.c1 - m22.s1 = s3
+
+ Scalar s1 = sin(res[0]);
+ Scalar c1 = cos(res[0]);
+ res[2] = atan2(c1*coeff(j,k)-s1*coeff(k,k), c1*coeff(j,j) - s1 * coeff(k,j));
+ }
+ else
+ {
+ res[0] = atan2(coeff(j,k), coeff(k,k));
+ Scalar c2 = Vector2(coeff(i,i), coeff(i,j)).norm();
+ if((odd && res[0]<Scalar(0)) || ((!odd) && res[0]>Scalar(0))) {
+ res[0] = (res[0] > Scalar(0)) ? res[0] - Scalar(M_PI) : res[0] + Scalar(M_PI);
+ res[1] = atan2(-coeff(i,k), -c2);
+ }
+ else
+ res[1] = atan2(-coeff(i,k), c2);
+ Scalar s1 = sin(res[0]);
+ Scalar c1 = cos(res[0]);
+ res[2] = atan2(s1*coeff(k,i)-c1*coeff(j,i), c1*coeff(j,j) - s1 * coeff(k,j));
+ }
+ if (!odd)
+ res = -res;
+
+ return res;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_EULERANGLES_H
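A minimal sketch for MatrixBase::eulerAngles() as defined above, using the Z-X-Z convention from its documentation; the three input angles are illustrative assumptions.

#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;

  // Build a rotation from three angles, then recover them with eulerAngles(2, 0, 2).
  Matrix3f mat = (AngleAxisf(0.5f, Vector3f::UnitZ())
                * AngleAxisf(1.0f, Vector3f::UnitX())
                * AngleAxisf(0.3f, Vector3f::UnitZ())).toRotationMatrix();

  Vector3f ea = mat.eulerAngles(2, 0, 2);
  std::cout << "recovered angles = " << ea.transpose() << "\n";

  // Reassembling the rotation from the recovered angles reproduces the original.
  Matrix3f check = (AngleAxisf(ea[0], Vector3f::UnitZ())
                  * AngleAxisf(ea[1], Vector3f::UnitX())
                  * AngleAxisf(ea[2], Vector3f::UnitZ())).toRotationMatrix();
  std::cout << "||mat - check|| = " << (mat - check).norm() << "\n";
}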
diff --git a/third_party/eigen3/Eigen/src/Geometry/Homogeneous.h b/third_party/eigen3/Eigen/src/Geometry/Homogeneous.h
new file mode 100644
index 0000000000..00e71d190c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Homogeneous.h
@@ -0,0 +1,307 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_HOMOGENEOUS_H
+#define EIGEN_HOMOGENEOUS_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Homogeneous
+ *
+ * \brief Expression of one (or a set of) homogeneous vector(s)
+ *
+ * \param MatrixType the type of the object to be made homogeneous
+ *
+ * This class represents an expression of one (or a set of) homogeneous vector(s).
+ * It is the return type of MatrixBase::homogeneous() and most of the time
+ * this is the only way it is used.
+ *
+ * \sa MatrixBase::homogeneous()
+ */
+
+namespace internal {
+
+template<typename MatrixType,int Direction>
+struct traits<Homogeneous<MatrixType,Direction> >
+ : traits<MatrixType>
+{
+ typedef typename traits<MatrixType>::StorageKind StorageKind;
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+ enum {
+ RowsPlusOne = (MatrixType::RowsAtCompileTime != Dynamic) ?
+ int(MatrixType::RowsAtCompileTime) + 1 : Dynamic,
+ ColsPlusOne = (MatrixType::ColsAtCompileTime != Dynamic) ?
+ int(MatrixType::ColsAtCompileTime) + 1 : Dynamic,
+ RowsAtCompileTime = Direction==Vertical ? RowsPlusOne : MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = Direction==Horizontal ? ColsPlusOne : MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = RowsAtCompileTime,
+ MaxColsAtCompileTime = ColsAtCompileTime,
+ TmpFlags = _MatrixTypeNested::Flags & HereditaryBits,
+ Flags = ColsAtCompileTime==1 ? (TmpFlags & ~RowMajorBit)
+ : RowsAtCompileTime==1 ? (TmpFlags | RowMajorBit)
+ : TmpFlags,
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost
+ };
+};
+
+template<typename MatrixType,typename Lhs> struct homogeneous_left_product_impl;
+template<typename MatrixType,typename Rhs> struct homogeneous_right_product_impl;
+
+} // end namespace internal
+
+template<typename MatrixType,int _Direction> class Homogeneous
+ : internal::no_assignment_operator, public MatrixBase<Homogeneous<MatrixType,_Direction> >
+{
+ public:
+
+ enum { Direction = _Direction };
+
+ typedef MatrixBase<Homogeneous> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(Homogeneous)
+
+ inline Homogeneous(const MatrixType& matrix)
+ : m_matrix(matrix)
+ {}
+
+ inline Index rows() const { return m_matrix.rows() + (int(Direction)==Vertical ? 1 : 0); }
+ inline Index cols() const { return m_matrix.cols() + (int(Direction)==Horizontal ? 1 : 0); }
+
+ inline Scalar coeff(Index row, Index col) const
+ {
+ if( (int(Direction)==Vertical && row==m_matrix.rows())
+ || (int(Direction)==Horizontal && col==m_matrix.cols()))
+ return 1;
+ return m_matrix.coeff(row, col);
+ }
+
+ template<typename Rhs>
+ inline const internal::homogeneous_right_product_impl<Homogeneous,Rhs>
+ operator* (const MatrixBase<Rhs>& rhs) const
+ {
+ eigen_assert(int(Direction)==Horizontal);
+ return internal::homogeneous_right_product_impl<Homogeneous,Rhs>(m_matrix,rhs.derived());
+ }
+
+ template<typename Lhs> friend
+ inline const internal::homogeneous_left_product_impl<Homogeneous,Lhs>
+ operator* (const MatrixBase<Lhs>& lhs, const Homogeneous& rhs)
+ {
+ eigen_assert(int(Direction)==Vertical);
+ return internal::homogeneous_left_product_impl<Homogeneous,Lhs>(lhs.derived(),rhs.m_matrix);
+ }
+
+ template<typename Scalar, int Dim, int Mode, int Options> friend
+ inline const internal::homogeneous_left_product_impl<Homogeneous,Transform<Scalar,Dim,Mode,Options> >
+ operator* (const Transform<Scalar,Dim,Mode,Options>& lhs, const Homogeneous& rhs)
+ {
+ eigen_assert(int(Direction)==Vertical);
+ return internal::homogeneous_left_product_impl<Homogeneous,Transform<Scalar,Dim,Mode,Options> >(lhs,rhs.m_matrix);
+ }
+
+ protected:
+ typename MatrixType::Nested m_matrix;
+};
+
+/** \geometry_module
+ *
+ * \return an expression of the equivalent homogeneous vector
+ *
+ * \only_for_vectors
+ *
+ * Example: \include MatrixBase_homogeneous.cpp
+ * Output: \verbinclude MatrixBase_homogeneous.out
+ *
+ * \sa class Homogeneous
+ */
+template<typename Derived>
+inline typename MatrixBase<Derived>::HomogeneousReturnType
+MatrixBase<Derived>::homogeneous() const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
+ return derived();
+}
+
+/** \geometry_module
+ *
+ * \returns a matrix expression of homogeneous column (or row) vectors
+ *
+ * Example: \include VectorwiseOp_homogeneous.cpp
+ * Output: \verbinclude VectorwiseOp_homogeneous.out
+ *
+ * \sa MatrixBase::homogeneous() */
+template<typename ExpressionType, int Direction>
+inline Homogeneous<ExpressionType,Direction>
+VectorwiseOp<ExpressionType,Direction>::homogeneous() const
+{
+ return _expression();
+}
+
+/** \geometry_module
+ *
+ * \returns an expression of the homogeneous normalized vector of \c *this
+ *
+ * Example: \include MatrixBase_hnormalized.cpp
+ * Output: \verbinclude MatrixBase_hnormalized.out
+ *
+ * \sa VectorwiseOp::hnormalized() */
+template<typename Derived>
+inline const typename MatrixBase<Derived>::HNormalizedReturnType
+MatrixBase<Derived>::hnormalized() const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
+ return ConstStartMinusOne(derived(),0,0,
+ ColsAtCompileTime==1?size()-1:1,
+ ColsAtCompileTime==1?1:size()-1) / coeff(size()-1);
+}
+
+/** \geometry_module
+ *
+ * \returns an expression of the homogeneous normalized vector of \c *this
+ *
+ * Example: \include DirectionWise_hnormalized.cpp
+ * Output: \verbinclude DirectionWise_hnormalized.out
+ *
+ * \sa MatrixBase::hnormalized() */
+template<typename ExpressionType, int Direction>
+inline const typename VectorwiseOp<ExpressionType,Direction>::HNormalizedReturnType
+VectorwiseOp<ExpressionType,Direction>::hnormalized() const
+{
+ return HNormalized_Block(_expression(),0,0,
+ Direction==Vertical ? _expression().rows()-1 : _expression().rows(),
+ Direction==Horizontal ? _expression().cols()-1 : _expression().cols()).cwiseQuotient(
+ Replicate<HNormalized_Factors,
+ Direction==Vertical ? HNormalized_SizeMinusOne : 1,
+ Direction==Horizontal ? HNormalized_SizeMinusOne : 1>
+ (HNormalized_Factors(_expression(),
+ Direction==Vertical ? _expression().rows()-1:0,
+ Direction==Horizontal ? _expression().cols()-1:0,
+ Direction==Vertical ? 1 : _expression().rows(),
+ Direction==Horizontal ? 1 : _expression().cols()),
+ Direction==Vertical ? _expression().rows()-1 : 1,
+ Direction==Horizontal ? _expression().cols()-1 : 1));
+}
+
+namespace internal {
+
+template<typename MatrixOrTransformType>
+struct take_matrix_for_product
+{
+ typedef MatrixOrTransformType type;
+ static const type& run(const type &x) { return x; }
+};
+
+template<typename Scalar, int Dim, int Mode,int Options>
+struct take_matrix_for_product<Transform<Scalar, Dim, Mode, Options> >
+{
+ typedef Transform<Scalar, Dim, Mode, Options> TransformType;
+ typedef typename internal::add_const<typename TransformType::ConstAffinePart>::type type;
+ static type run (const TransformType& x) { return x.affine(); }
+};
+
+template<typename Scalar, int Dim, int Options>
+struct take_matrix_for_product<Transform<Scalar, Dim, Projective, Options> >
+{
+ typedef Transform<Scalar, Dim, Projective, Options> TransformType;
+ typedef typename TransformType::MatrixType type;
+ static const type& run (const TransformType& x) { return x.matrix(); }
+};
+
+template<typename MatrixType,typename Lhs>
+struct traits<homogeneous_left_product_impl<Homogeneous<MatrixType,Vertical>,Lhs> >
+{
+ typedef typename take_matrix_for_product<Lhs>::type LhsMatrixType;
+ typedef typename remove_all<MatrixType>::type MatrixTypeCleaned;
+ typedef typename remove_all<LhsMatrixType>::type LhsMatrixTypeCleaned;
+ typedef typename make_proper_matrix_type<
+ typename traits<MatrixTypeCleaned>::Scalar,
+ LhsMatrixTypeCleaned::RowsAtCompileTime,
+ MatrixTypeCleaned::ColsAtCompileTime,
+ MatrixTypeCleaned::PlainObject::Options,
+ LhsMatrixTypeCleaned::MaxRowsAtCompileTime,
+ MatrixTypeCleaned::MaxColsAtCompileTime>::type ReturnType;
+};
+
+template<typename MatrixType,typename Lhs>
+struct homogeneous_left_product_impl<Homogeneous<MatrixType,Vertical>,Lhs>
+ : public ReturnByValue<homogeneous_left_product_impl<Homogeneous<MatrixType,Vertical>,Lhs> >
+{
+ typedef typename traits<homogeneous_left_product_impl>::LhsMatrixType LhsMatrixType;
+ typedef typename remove_all<LhsMatrixType>::type LhsMatrixTypeCleaned;
+ typedef typename remove_all<typename LhsMatrixTypeCleaned::Nested>::type LhsMatrixTypeNested;
+ typedef typename MatrixType::Index Index;
+ homogeneous_left_product_impl(const Lhs& lhs, const MatrixType& rhs)
+ : m_lhs(take_matrix_for_product<Lhs>::run(lhs)),
+ m_rhs(rhs)
+ {}
+
+ inline Index rows() const { return m_lhs.rows(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ // FIXME investigate how to allow lazy evaluation of this product when possible
+ dst = Block<const LhsMatrixTypeNested,
+ LhsMatrixTypeNested::RowsAtCompileTime,
+ LhsMatrixTypeNested::ColsAtCompileTime==Dynamic?Dynamic:LhsMatrixTypeNested::ColsAtCompileTime-1>
+ (m_lhs,0,0,m_lhs.rows(),m_lhs.cols()-1) * m_rhs;
+ dst += m_lhs.col(m_lhs.cols()-1).rowwise()
+ .template replicate<MatrixType::ColsAtCompileTime>(m_rhs.cols());
+ }
+
+ typename LhsMatrixTypeCleaned::Nested m_lhs;
+ typename MatrixType::Nested m_rhs;
+};
+
+template<typename MatrixType,typename Rhs>
+struct traits<homogeneous_right_product_impl<Homogeneous<MatrixType,Horizontal>,Rhs> >
+{
+ typedef typename make_proper_matrix_type<typename traits<MatrixType>::Scalar,
+ MatrixType::RowsAtCompileTime,
+ Rhs::ColsAtCompileTime,
+ MatrixType::PlainObject::Options,
+ MatrixType::MaxRowsAtCompileTime,
+ Rhs::MaxColsAtCompileTime>::type ReturnType;
+};
+
+template<typename MatrixType,typename Rhs>
+struct homogeneous_right_product_impl<Homogeneous<MatrixType,Horizontal>,Rhs>
+ : public ReturnByValue<homogeneous_right_product_impl<Homogeneous<MatrixType,Horizontal>,Rhs> >
+{
+ typedef typename remove_all<typename Rhs::Nested>::type RhsNested;
+ typedef typename MatrixType::Index Index;
+ homogeneous_right_product_impl(const MatrixType& lhs, const Rhs& rhs)
+ : m_lhs(lhs), m_rhs(rhs)
+ {}
+
+ inline Index rows() const { return m_lhs.rows(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ // FIXME investigate how to allow lazy evaluation of this product when possible
+ dst = m_lhs * Block<const RhsNested,
+ RhsNested::RowsAtCompileTime==Dynamic?Dynamic:RhsNested::RowsAtCompileTime-1,
+ RhsNested::ColsAtCompileTime>
+ (m_rhs,0,0,m_rhs.rows()-1,m_rhs.cols());
+ dst += m_rhs.row(m_rhs.rows()-1).colwise()
+ .template replicate<MatrixType::RowsAtCompileTime>(m_lhs.rows());
+ }
+
+ typename MatrixType::Nested m_lhs;
+ typename Rhs::Nested m_rhs;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_HOMOGENEOUS_H
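A minimal sketch of homogeneous() and hnormalized() as declared above, applying a 4x4 projective transform to a single point and to a batch of column points; the random transform is an illustrative assumption.

#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;

  // p.homogeneous() is [p; 1]; hnormalized() divides by the last coefficient.
  Vector3d p(1.0, 2.0, 3.0);
  Matrix4d P = Matrix4d::Random();
  Vector4d hp = P * p.homogeneous();   // evaluate the product into a temporary
  Vector3d q  = hp.hnormalized();
  std::cout << "transformed point = " << q.transpose() << "\n";

  // The column-wise variants work on a whole set of points at once.
  Matrix<double, 3, 4> pts  = Matrix<double, 3, 4>::Random();
  Matrix<double, 4, 4> hpts = P * pts.colwise().homogeneous();
  Matrix<double, 3, 4> out  = hpts.colwise().hnormalized();
  std::cout << "batch result:\n" << out << "\n";
}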
diff --git a/third_party/eigen3/Eigen/src/Geometry/Hyperplane.h b/third_party/eigen3/Eigen/src/Geometry/Hyperplane.h
new file mode 100644
index 0000000000..aeff43fefa
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Hyperplane.h
@@ -0,0 +1,270 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_HYPERPLANE_H
+#define EIGEN_HYPERPLANE_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Hyperplane
+ *
+ * \brief A hyperplane
+ *
+ * A hyperplane is an affine subspace of dimension n-1 in a space of dimension n.
+ * For example, a hyperplane in a plane is a line; a hyperplane in 3-space is a plane.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ * Notice that the dimension of the hyperplane is _AmbientDim-1.
+ *
+ * This class represents a hyperplane as the zero set of the implicit equation
+ * \f$ n \cdot x + d = 0 \f$ where \f$ n \f$ is a unit normal vector of the plane (linear part)
+ * and \f$ d \f$ is the distance (offset) to the origin.
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+class Hyperplane
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim==Dynamic ? Dynamic : _AmbientDim+1)
+ enum {
+ AmbientDimAtCompileTime = _AmbientDim,
+ Options = _Options
+ };
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef DenseIndex Index;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1> VectorType;
+ typedef Matrix<Scalar,Index(AmbientDimAtCompileTime)==Dynamic
+ ? Dynamic
+ : Index(AmbientDimAtCompileTime)+1,1,Options> Coefficients;
+ typedef Block<Coefficients,AmbientDimAtCompileTime,1> NormalReturnType;
+ typedef const Block<const Coefficients,AmbientDimAtCompileTime,1> ConstNormalReturnType;
+
+ /** Default constructor without initialization */
+ inline Hyperplane() {}
+
+ template<int OtherOptions>
+ Hyperplane(const Hyperplane<Scalar,AmbientDimAtCompileTime,OtherOptions>& other)
+ : m_coeffs(other.coeffs())
+ {}
+
+ /** Constructs a dynamic-size hyperplane with \a _dim the dimension
+ * of the ambient space */
+ inline explicit Hyperplane(Index _dim) : m_coeffs(_dim+1) {}
+
+ /** Constructs a plane from its normal \a n and a point \a e on the plane.
+ * \warning the vector normal is assumed to be normalized.
+ */
+ inline Hyperplane(const VectorType& n, const VectorType& e)
+ : m_coeffs(n.size()+1)
+ {
+ normal() = n;
+ offset() = -n.dot(e);
+ }
+
+ /** Constructs a plane from its normal \a n and distance to the origin \a d
+ * such that the algebraic equation of the plane is \f$ n \cdot x + d = 0 \f$.
+ * \warning the vector normal is assumed to be normalized.
+ */
+ inline Hyperplane(const VectorType& n, const Scalar& d)
+ : m_coeffs(n.size()+1)
+ {
+ normal() = n;
+ offset() = d;
+ }
+
+ /** Constructs a hyperplane passing through the two points. If the dimension of the ambient space
+ * is greater than 2, then the hyperplane is not unique, so an arbitrary choice is made.
+ */
+ static inline Hyperplane Through(const VectorType& p0, const VectorType& p1)
+ {
+ Hyperplane result(p0.size());
+ result.normal() = (p1 - p0).unitOrthogonal();
+ result.offset() = -p0.dot(result.normal());
+ return result;
+ }
+
+ /** Constructs a hyperplane passing through the three points. The dimension of the ambient space
+ * is required to be exactly 3.
+ */
+ static inline Hyperplane Through(const VectorType& p0, const VectorType& p1, const VectorType& p2)
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 3)
+ Hyperplane result(p0.size());
+ result.normal() = (p2 - p0).cross(p1 - p0).normalized();
+ result.offset() = -p0.dot(result.normal());
+ return result;
+ }
+
+ /** Constructs a hyperplane passing through the parametrized line \a parametrized.
+ * If the dimension of the ambient space is greater than 2, then the hyperplane is not unique,
+ * so an arbitrary choice is made.
+ */
+ // FIXME to be consistent with the rest this could be implemented as a static Through function ??
+ explicit Hyperplane(const ParametrizedLine<Scalar, AmbientDimAtCompileTime>& parametrized)
+ {
+ normal() = parametrized.direction().unitOrthogonal();
+ offset() = -parametrized.origin().dot(normal());
+ }
+
+ ~Hyperplane() {}
+
+ /** \returns the dimension of the ambient space in which the plane lives */
+ inline Index dim() const { return AmbientDimAtCompileTime==Dynamic ? m_coeffs.size()-1 : Index(AmbientDimAtCompileTime); }
+
+ /** normalizes \c *this */
+ void normalize(void)
+ {
+ m_coeffs /= normal().norm();
+ }
+
+ /** \returns the signed distance between the plane \c *this and a point \a p.
+ * \sa absDistance()
+ */
+ inline Scalar signedDistance(const VectorType& p) const { return normal().dot(p) + offset(); }
+
+ /** \returns the absolute distance between the plane \c *this and a point \a p.
+ * \sa signedDistance()
+ */
+ inline Scalar absDistance(const VectorType& p) const { using std::abs; return abs(signedDistance(p)); }
+
+ /** \returns the projection of a point \a p onto the plane \c *this.
+ */
+ inline VectorType projection(const VectorType& p) const { return p - signedDistance(p) * normal(); }
+
+ /** \returns a constant reference to the unit normal vector of the plane, which corresponds
+ * to the linear part of the implicit equation.
+ */
+ inline ConstNormalReturnType normal() const { return ConstNormalReturnType(m_coeffs,0,0,dim(),1); }
+
+ /** \returns a non-constant reference to the unit normal vector of the plane, which corresponds
+ * to the linear part of the implicit equation.
+ */
+ inline NormalReturnType normal() { return NormalReturnType(m_coeffs,0,0,dim(),1); }
+
+ /** \returns the distance to the origin, which is also the "constant term" of the implicit equation
+ * \warning the vector normal is assumed to be normalized.
+ */
+ inline const Scalar& offset() const { return m_coeffs.coeff(dim()); }
+
+ /** \returns a non-constant reference to the distance to the origin, which is also the constant part
+ * of the implicit equation */
+ inline Scalar& offset() { return m_coeffs(dim()); }
+
+ /** \returns a constant reference to the coefficients c_i of the plane equation:
+ * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$
+ */
+ inline const Coefficients& coeffs() const { return m_coeffs; }
+
+ /** \returns a non-constant reference to the coefficients c_i of the plane equation:
+ * \f$ c_0*x_0 + ... + c_{d-1}*x_{d-1} + c_d = 0 \f$
+ */
+ inline Coefficients& coeffs() { return m_coeffs; }
+
+ /** \returns the intersection of *this with \a other.
+ *
+ * \warning The ambient space must be a plane, i.e. have dimension 2, so that \c *this and \a other are lines.
+ *
+ * \note If \a other is approximately parallel to *this, this method will return any point on *this.
+ */
+ VectorType intersection(const Hyperplane& other) const
+ {
+ using std::abs;
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2)
+ Scalar det = coeffs().coeff(0) * other.coeffs().coeff(1) - coeffs().coeff(1) * other.coeffs().coeff(0);
+ // since the line equations ax+by=c are normalized with a^2+b^2=1, the following tests
+ // whether the two lines are approximately parallel.
+ if(internal::isMuchSmallerThan(det, Scalar(1)))
+ { // special case where the two lines are approximately parallel. Pick any point on the first line.
+ if(abs(coeffs().coeff(1))>abs(coeffs().coeff(0)))
+ return VectorType(coeffs().coeff(1), -coeffs().coeff(2)/coeffs().coeff(1)-coeffs().coeff(0));
+ else
+ return VectorType(-coeffs().coeff(2)/coeffs().coeff(0)-coeffs().coeff(1), coeffs().coeff(0));
+ }
+ else
+ { // general case
+ Scalar invdet = Scalar(1) / det;
+ return VectorType(invdet*(coeffs().coeff(1)*other.coeffs().coeff(2)-other.coeffs().coeff(1)*coeffs().coeff(2)),
+ invdet*(other.coeffs().coeff(0)*coeffs().coeff(2)-coeffs().coeff(0)*other.coeffs().coeff(2)));
+ }
+ }
+
+ /** Applies the transformation matrix \a mat to \c *this and returns a reference to \c *this.
+ *
+ * \param mat the Dim x Dim transformation matrix
+ * \param traits specifies whether the matrix \a mat represents an #Isometry
+ * or a more generic #Affine transformation. The default is #Affine.
+ */
+ template<typename XprType>
+ inline Hyperplane& transform(const MatrixBase<XprType>& mat, TransformTraits traits = Affine)
+ {
+ if (traits==Affine)
+ normal() = mat.inverse().transpose() * normal();
+ else if (traits==Isometry)
+ normal() = mat * normal();
+ else
+ {
+ eigen_assert(0 && "invalid traits value in Hyperplane::transform()");
+ }
+ return *this;
+ }
+
+ /** Applies the transformation \a t to \c *this and returns a reference to \c *this.
+ *
+ * \param t the transformation of dimension Dim
+ * \param traits specifies whether the transformation \a t represents an #Isometry
+ * or a more generic #Affine transformation. The default is #Affine.
+ * Other kind of transformations are not supported.
+ */
+ template<int TrOptions>
+ inline Hyperplane& transform(const Transform<Scalar,AmbientDimAtCompileTime,Affine,TrOptions>& t,
+ TransformTraits traits = Affine)
+ {
+ transform(t.linear(), traits);
+ offset() -= normal().dot(t.translation());
+ return *this;
+ }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Hyperplane,
+ Hyperplane<NewScalarType,AmbientDimAtCompileTime,Options> >::type cast() const
+ {
+ return typename internal::cast_return_type<Hyperplane,
+ Hyperplane<NewScalarType,AmbientDimAtCompileTime,Options> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType,int OtherOptions>
+ inline explicit Hyperplane(const Hyperplane<OtherScalarType,AmbientDimAtCompileTime,OtherOptions>& other)
+ { m_coeffs = other.coeffs().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ template<int OtherOptions>
+ bool isApprox(const Hyperplane<Scalar,AmbientDimAtCompileTime,OtherOptions>& other, const typename NumTraits<Scalar>::Real& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+protected:
+
+ Coefficients m_coeffs;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_HYPERPLANE_H
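A minimal sketch of the Hyperplane class added above: a plane through three 3D points, distance and projection queries, and a 2D line intersection; all coordinates are illustrative assumptions.

#include <Eigen/Geometry>
#include <iostream>

int main() {
  using namespace Eigen;

  // A plane through three points (they all lie on z = 1).
  Vector3d p0(0.0, 0.0, 1.0), p1(1.0, 0.0, 1.0), p2(0.0, 1.0, 1.0);
  Hyperplane<double, 3> plane = Hyperplane<double, 3>::Through(p0, p1, p2);
  std::cout << "normal = " << plane.normal().transpose() << "\n";
  std::cout << "offset = " << plane.offset() << "\n";

  // Signed distance and projection of a query point.
  Vector3d q(0.5, 0.5, 3.0);
  std::cout << "signedDistance(q) = " << plane.signedDistance(q) << "\n";
  std::cout << "projection(q)     = " << plane.projection(q).transpose() << "\n";

  // In 2D a hyperplane is a line, and two lines can be intersected.
  typedef Hyperplane<double, 2> Line2;
  Line2 l1 = Line2::Through(Vector2d(0.0, 0.0), Vector2d(1.0, 1.0));
  Line2 l2 = Line2::Through(Vector2d(0.0, 1.0), Vector2d(1.0, 0.0));
  std::cout << "intersection = " << l1.intersection(l2).transpose() << "\n";
}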
diff --git a/third_party/eigen3/Eigen/src/Geometry/OrthoMethods.h b/third_party/eigen3/Eigen/src/Geometry/OrthoMethods.h
new file mode 100644
index 0000000000..26be3ee5b9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/OrthoMethods.h
@@ -0,0 +1,221 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ORTHOMETHODS_H
+#define EIGEN_ORTHOMETHODS_H
+
+namespace Eigen {
+
+/** \geometry_module
+ *
+ * \returns the cross product of \c *this and \a other
+ *
+ * Here is a very good explanation of cross-product: http://xkcd.com/199/
+ * \sa MatrixBase::cross3()
+ */
+template<typename Derived>
+template<typename OtherDerived>
+inline typename MatrixBase<Derived>::template cross_product_return_type<OtherDerived>::type
+MatrixBase<Derived>::cross(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Derived,3)
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,3)
+
+ // Note that there is no need for an expression here since the compiler
+ // optimizes such a small temporary very well (even within a complex expression)
+ typename internal::nested<Derived,2>::type lhs(derived());
+ typename internal::nested<OtherDerived,2>::type rhs(other.derived());
+ return typename cross_product_return_type<OtherDerived>::type(
+ numext::conj(lhs.coeff(1) * rhs.coeff(2) - lhs.coeff(2) * rhs.coeff(1)),
+ numext::conj(lhs.coeff(2) * rhs.coeff(0) - lhs.coeff(0) * rhs.coeff(2)),
+ numext::conj(lhs.coeff(0) * rhs.coeff(1) - lhs.coeff(1) * rhs.coeff(0))
+ );
+}
+
+namespace internal {
+
+template< int Arch,typename VectorLhs,typename VectorRhs,
+ typename Scalar = typename VectorLhs::Scalar,
+ bool Vectorizable = bool((VectorLhs::Flags&VectorRhs::Flags)&PacketAccessBit)>
+struct cross3_impl {
+ static inline typename internal::plain_matrix_type<VectorLhs>::type
+ run(const VectorLhs& lhs, const VectorRhs& rhs)
+ {
+ return typename internal::plain_matrix_type<VectorLhs>::type(
+ numext::conj(lhs.coeff(1) * rhs.coeff(2) - lhs.coeff(2) * rhs.coeff(1)),
+ numext::conj(lhs.coeff(2) * rhs.coeff(0) - lhs.coeff(0) * rhs.coeff(2)),
+ numext::conj(lhs.coeff(0) * rhs.coeff(1) - lhs.coeff(1) * rhs.coeff(0)),
+ 0
+ );
+ }
+};
+
+}
+
+/** \geometry_module
+ *
+ * \returns the cross product of \c *this and \a other using only the x, y, and z coefficients
+ *
+ * The size of \c *this and \a other must be four. This function is especially useful
+ * when using 4D vectors instead of 3D ones to take advantage of SSE/AltiVec vectorization.
+ *
+ * \sa MatrixBase::cross()
+ */
+template<typename Derived>
+template<typename OtherDerived>
+inline typename MatrixBase<Derived>::PlainObject
+MatrixBase<Derived>::cross3(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(Derived,4)
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,4)
+
+ typedef typename internal::nested<Derived,2>::type DerivedNested;
+ typedef typename internal::nested<OtherDerived,2>::type OtherDerivedNested;
+ DerivedNested lhs(derived());
+ OtherDerivedNested rhs(other.derived());
+
+ return internal::cross3_impl<Architecture::Target,
+ typename internal::remove_all<DerivedNested>::type,
+ typename internal::remove_all<OtherDerivedNested>::type>::run(lhs,rhs);
+}
+
+/** \returns a matrix expression of the cross product of each column or row
+ * of the referenced expression with the \a other vector.
+ *
+ * The referenced matrix must have one dimension equal to 3.
+ * The result matrix has the same dimensions as the referenced one.
+ *
+ * \geometry_module
+ *
+ * \sa MatrixBase::cross() */
+template<typename ExpressionType, int Direction>
+template<typename OtherDerived>
+const typename VectorwiseOp<ExpressionType,Direction>::CrossReturnType
+VectorwiseOp<ExpressionType,Direction>::cross(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,3)
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ CrossReturnType res(_expression().rows(),_expression().cols());
+ if(Direction==Vertical)
+ {
+ eigen_assert(CrossReturnType::RowsAtCompileTime==3 && "the matrix must have exactly 3 rows");
+ res.row(0) = (_expression().row(1) * other.coeff(2) - _expression().row(2) * other.coeff(1)).conjugate();
+ res.row(1) = (_expression().row(2) * other.coeff(0) - _expression().row(0) * other.coeff(2)).conjugate();
+ res.row(2) = (_expression().row(0) * other.coeff(1) - _expression().row(1) * other.coeff(0)).conjugate();
+ }
+ else
+ {
+ eigen_assert(CrossReturnType::ColsAtCompileTime==3 && "the matrix must have exactly 3 columns");
+ res.col(0) = (_expression().col(1) * other.coeff(2) - _expression().col(2) * other.coeff(1)).conjugate();
+ res.col(1) = (_expression().col(2) * other.coeff(0) - _expression().col(0) * other.coeff(2)).conjugate();
+ res.col(2) = (_expression().col(0) * other.coeff(1) - _expression().col(1) * other.coeff(0)).conjugate();
+ }
+ return res;
+}
+
+namespace internal {
+
+template<typename Derived, int Size = Derived::SizeAtCompileTime>
+struct unitOrthogonal_selector
+{
+ typedef typename plain_matrix_type<Derived>::type VectorType;
+ typedef typename traits<Derived>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Derived::Index Index;
+ typedef Matrix<Scalar,2,1> Vector2;
+ EIGEN_DEVICE_FUNC
+ static inline VectorType run(const Derived& src)
+ {
+ VectorType perp = VectorType::Zero(src.size());
+ Index maxi = 0;
+ Index sndi = 0;
+ src.cwiseAbs().maxCoeff(&maxi);
+ if (maxi==0)
+ sndi = 1;
+ RealScalar invnm = RealScalar(1)/(Vector2() << src.coeff(sndi),src.coeff(maxi)).finished().norm();
+ perp.coeffRef(maxi) = -numext::conj(src.coeff(sndi)) * invnm;
+ perp.coeffRef(sndi) = numext::conj(src.coeff(maxi)) * invnm;
+
+ return perp;
+ }
+};
+
+template<typename Derived>
+struct unitOrthogonal_selector<Derived,3>
+{
+ typedef typename plain_matrix_type<Derived>::type VectorType;
+ typedef typename traits<Derived>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ EIGEN_DEVICE_FUNC
+ static inline VectorType run(const Derived& src)
+ {
+ VectorType perp;
+ /* Let us compute the cross product of *this with a vector
+ * that is not too close to being collinear with *this.
+ */
+
+ /* unless the x and y coords are both close to zero, we can
+ * simply take ( -y, x, 0 ) and normalize it.
+ */
+ if((!isMuchSmallerThan(src.x(), src.z()))
+ || (!isMuchSmallerThan(src.y(), src.z())))
+ {
+ RealScalar invnm = RealScalar(1)/src.template head<2>().norm();
+ perp.coeffRef(0) = -numext::conj(src.y())*invnm;
+ perp.coeffRef(1) = numext::conj(src.x())*invnm;
+ perp.coeffRef(2) = 0;
+ }
+ /* if both x and y are close to zero, then the vector is close
+ * to the z-axis, so it's far from collinear with the x-axis for instance.
+ * So we take the cross product with (1,0,0) and normalize it.
+ */
+ else
+ {
+ RealScalar invnm = RealScalar(1)/src.template tail<2>().norm();
+ perp.coeffRef(0) = 0;
+ perp.coeffRef(1) = -numext::conj(src.z())*invnm;
+ perp.coeffRef(2) = numext::conj(src.y())*invnm;
+ }
+
+ return perp;
+ }
+};
+
+template<typename Derived>
+struct unitOrthogonal_selector<Derived,2>
+{
+ typedef typename plain_matrix_type<Derived>::type VectorType;
+ EIGEN_DEVICE_FUNC
+ static inline VectorType run(const Derived& src)
+ { return VectorType(-numext::conj(src.y()), numext::conj(src.x())).normalized(); }
+};
+
+} // end namespace internal
+
+/** \returns a unit vector which is orthogonal to \c *this
+ *
+ * The size of \c *this must be at least 2. If the size is exactly 2,
+ * then the returned vector is a counterclockwise rotation of \c *this, i.e., (-y,x).normalized().
+ *
+ * \sa cross()
+ */
+template<typename Derived>
+typename MatrixBase<Derived>::PlainObject
+MatrixBase<Derived>::unitOrthogonal() const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return internal::unitOrthogonal_selector<Derived>::run(derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ORTHOMETHODS_H
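
The methods defined in the file above (cross(), cross3(), colwise().cross(), and unitOrthogonal()) are typically used as in the following minimal sketch; the program and its values are illustrative only and assume the standard Eigen/Geometry headers are available.

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  Eigen::Vector3d a(1.0, 0.0, 0.0);
  Eigen::Vector3d b(0.0, 1.0, 0.0);

  // 3D cross product: (1,0,0) x (0,1,0) == (0,0,1).
  Eigen::Vector3d c = a.cross(b);

  // cross3() operates on 4D vectors and ignores the w component, which
  // allows the computation to use SSE/AltiVec packets.
  Eigen::Vector4d a4(1.0, 0.0, 0.0, 0.0), b4(0.0, 1.0, 0.0, 0.0);
  Eigen::Vector4d c4 = a4.cross3(b4);

  // Cross product of every column of a 3xN matrix with a fixed vector.
  Eigen::Matrix3d m = Eigen::Matrix3d::Identity();
  Eigen::Matrix3d mc = m.colwise().cross(b);

  // Some unit vector orthogonal to a (any valid choice may be returned).
  Eigen::Vector3d n = a.unitOrthogonal();

  std::cout << c.transpose() << "\n" << c4.transpose() << "\n"
            << mc << "\n" << n.transpose() << std::endl;
}
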
diff --git a/third_party/eigen3/Eigen/src/Geometry/ParametrizedLine.h b/third_party/eigen3/Eigen/src/Geometry/ParametrizedLine.h
new file mode 100644
index 0000000000..77fa228e6a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/ParametrizedLine.h
@@ -0,0 +1,195 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PARAMETRIZEDLINE_H
+#define EIGEN_PARAMETRIZEDLINE_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class ParametrizedLine
+ *
+ * \brief A parametrized line
+ *
+ * A parametrized line is defined by an origin point \f$ \mathbf{o} \f$ and a unit
+ * direction vector \f$ \mathbf{d} \f$ such that the line corresponds to
+ * the set \f$ l(t) = \mathbf{o} + t \mathbf{d} \f$, \f$ t \in \mathbf{R} \f$.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ * \param _AmbientDim the dimension of the ambient space, can be a compile time value or Dynamic.
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+class ParametrizedLine
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_AmbientDim)
+ enum {
+ AmbientDimAtCompileTime = _AmbientDim,
+ Options = _Options
+ };
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef DenseIndex Index;
+ typedef Matrix<Scalar,AmbientDimAtCompileTime,1,Options> VectorType;
+
+ /** Default constructor without initialization */
+ inline ParametrizedLine() {}
+
+ template<int OtherOptions>
+ ParametrizedLine(const ParametrizedLine<Scalar,AmbientDimAtCompileTime,OtherOptions>& other)
+ : m_origin(other.origin()), m_direction(other.direction())
+ {}
+
+ /** Constructs a dynamic-size line with \a _dim the dimension
+ * of the ambient space */
+ inline explicit ParametrizedLine(Index _dim) : m_origin(_dim), m_direction(_dim) {}
+
+ /** Initializes a parametrized line of direction \a direction and origin \a origin.
+ * \warning the vector direction is assumed to be normalized.
+ */
+ ParametrizedLine(const VectorType& origin, const VectorType& direction)
+ : m_origin(origin), m_direction(direction) {}
+
+ template <int OtherOptions>
+ explicit ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane);
+
+ /** Constructs a parametrized line going from \a p0 to \a p1. */
+ static inline ParametrizedLine Through(const VectorType& p0, const VectorType& p1)
+ { return ParametrizedLine(p0, (p1-p0).normalized()); }
+
+ ~ParametrizedLine() {}
+
+ /** \returns the dimension of the ambient space in which the line lives */
+ inline Index dim() const { return m_direction.size(); }
+
+ const VectorType& origin() const { return m_origin; }
+ VectorType& origin() { return m_origin; }
+
+ const VectorType& direction() const { return m_direction; }
+ VectorType& direction() { return m_direction; }
+
+ /** \returns the squared distance of a point \a p to its projection onto the line \c *this.
+ * \sa distance()
+ */
+ RealScalar squaredDistance(const VectorType& p) const
+ {
+ VectorType diff = p - origin();
+ return (diff - direction().dot(diff) * direction()).squaredNorm();
+ }
+ /** \returns the distance of a point \a p to its projection onto the line \c *this.
+ * \sa squaredDistance()
+ */
+ RealScalar distance(const VectorType& p) const { using std::sqrt; return sqrt(squaredDistance(p)); }
+
+ /** \returns the projection of a point \a p onto the line \c *this. */
+ VectorType projection(const VectorType& p) const
+ { return origin() + direction().dot(p-origin()) * direction(); }
+
+ VectorType pointAt(const Scalar& t) const;
+
+ template <int OtherOptions>
+ Scalar intersectionParameter(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const;
+
+ template <int OtherOptions>
+ Scalar intersection(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const;
+
+ template <int OtherOptions>
+ VectorType intersectionPoint(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const;
+
+ /** \returns \c *this with scalar type cast to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<ParametrizedLine,
+ ParametrizedLine<NewScalarType,AmbientDimAtCompileTime,Options> >::type cast() const
+ {
+ return typename internal::cast_return_type<ParametrizedLine,
+ ParametrizedLine<NewScalarType,AmbientDimAtCompileTime,Options> >::type(*this);
+ }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType,int OtherOptions>
+ inline explicit ParametrizedLine(const ParametrizedLine<OtherScalarType,AmbientDimAtCompileTime,OtherOptions>& other)
+ {
+ m_origin = other.origin().template cast<Scalar>();
+ m_direction = other.direction().template cast<Scalar>();
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const ParametrizedLine& other, typename NumTraits<Scalar>::Real prec = NumTraits<Scalar>::dummy_precision()) const
+ { return m_origin.isApprox(other.m_origin, prec) && m_direction.isApprox(other.m_direction, prec); }
+
+protected:
+
+ VectorType m_origin, m_direction;
+};
+
+/** Constructs a parametrized line from a 2D hyperplane
+ *
+ * \warning the ambient space must have dimension 2 such that the hyperplane actually describes a line
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+template <int OtherOptions>
+inline ParametrizedLine<_Scalar, _AmbientDim,_Options>::ParametrizedLine(const Hyperplane<_Scalar, _AmbientDim,OtherOptions>& hyperplane)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(VectorType, 2)
+ direction() = hyperplane.normal().unitOrthogonal();
+ origin() = -hyperplane.normal()*hyperplane.offset();
+}
+
+/** \returns the point at \a t along this line
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+inline typename ParametrizedLine<_Scalar, _AmbientDim,_Options>::VectorType
+ParametrizedLine<_Scalar, _AmbientDim,_Options>::pointAt(const _Scalar& t) const
+{
+ return origin() + (direction()*t);
+}
+
+/** \returns the parameter value of the intersection between \c *this and the given \a hyperplane
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+template <int OtherOptions>
+inline _Scalar ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersectionParameter(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const
+{
+ return -(hyperplane.offset()+hyperplane.normal().dot(origin()))
+ / hyperplane.normal().dot(direction());
+}
+
+
+/** \deprecated use intersectionParameter()
+ * \returns the parameter value of the intersection between \c *this and the given \a hyperplane
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+template <int OtherOptions>
+inline _Scalar ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersection(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const
+{
+ return intersectionParameter(hyperplane);
+}
+
+/** \returns the point of the intersection between \c *this and the given hyperplane
+ */
+template <typename _Scalar, int _AmbientDim, int _Options>
+template <int OtherOptions>
+inline typename ParametrizedLine<_Scalar, _AmbientDim,_Options>::VectorType
+ParametrizedLine<_Scalar, _AmbientDim,_Options>::intersectionPoint(const Hyperplane<_Scalar, _AmbientDim, OtherOptions>& hyperplane) const
+{
+ return pointAt(intersectionParameter(hyperplane));
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARAMETRIZEDLINE_H
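
A minimal sketch of how ParametrizedLine is meant to be used together with Hyperplane, assuming the usual Eigen default for the _Options template parameter; the numeric values are illustrative only.

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <iostream>

int main() {
  typedef Eigen::ParametrizedLine<double, 2> Line2;
  typedef Eigen::Hyperplane<double, 2> Plane2;

  // Line through (0,0) and (2,0): origin (0,0), unit direction (1,0).
  Line2 line = Line2::Through(Eigen::Vector2d(0.0, 0.0), Eigen::Vector2d(2.0, 0.0));

  // Distance of a point to the line and its projection onto the line.
  Eigen::Vector2d p(1.0, 3.0);
  double d = line.distance(p);             // 3
  Eigen::Vector2d q = line.projection(p);  // (1, 0)

  // Intersection with a hyperplane, here the vertical line x = 1
  // written as n.x + offset = 0 with n = (1,0) and offset = -1.
  Plane2 plane(Eigen::Vector2d(1.0, 0.0), -1.0);
  double t = line.intersectionParameter(plane);       // 1
  Eigen::Vector2d x = line.intersectionPoint(plane);  // (1, 0)

  std::cout << d << "\n" << q.transpose() << "\n"
            << t << "\n" << x.transpose() << std::endl;
}
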
diff --git a/third_party/eigen3/Eigen/src/Geometry/Quaternion.h b/third_party/eigen3/Eigen/src/Geometry/Quaternion.h
new file mode 100644
index 0000000000..8524befddf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Quaternion.h
@@ -0,0 +1,778 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Mathieu Gautier <mathieu.gautier@cea.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_QUATERNION_H
+#define EIGEN_QUATERNION_H
+namespace Eigen {
+
+
+/***************************************************************************
+* Definition of QuaternionBase<Derived>
+* The implementation is at the end of the file
+***************************************************************************/
+
+namespace internal {
+template<typename Other,
+ int OtherRows=Other::RowsAtCompileTime,
+ int OtherCols=Other::ColsAtCompileTime>
+struct quaternionbase_assign_impl;
+}
+
+/** \geometry_module \ingroup Geometry_Module
+ * \class QuaternionBase
+ * \brief Base class for quaternion expressions
+ * \tparam Derived derived type (CRTP)
+ * \sa class Quaternion
+ */
+template<class Derived>
+class QuaternionBase : public RotationBase<Derived, 3>
+{
+ public:
+ typedef RotationBase<Derived, 3> Base;
+
+ using Base::operator*;
+ using Base::derived;
+
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::traits<Derived>::Coefficients Coefficients;
+ enum {
+ Flags = Eigen::internal::traits<Derived>::Flags
+ };
+
+ // typedef typename Matrix<Scalar,4,1> Coefficients;
+ /** the type of a 3D vector */
+ typedef Matrix<Scalar,3,1> Vector3;
+ /** the equivalent rotation matrix type */
+ typedef Matrix<Scalar,3,3> Matrix3;
+ /** the equivalent angle-axis type */
+ typedef AngleAxis<Scalar> AngleAxisType;
+
+
+
+ /** \returns the \c x coefficient */
+ inline Scalar x() const { return this->derived().coeffs().coeff(0); }
+ /** \returns the \c y coefficient */
+ inline Scalar y() const { return this->derived().coeffs().coeff(1); }
+ /** \returns the \c z coefficient */
+ inline Scalar z() const { return this->derived().coeffs().coeff(2); }
+ /** \returns the \c w coefficient */
+ inline Scalar w() const { return this->derived().coeffs().coeff(3); }
+
+ /** \returns a reference to the \c x coefficient */
+ inline Scalar& x() { return this->derived().coeffs().coeffRef(0); }
+ /** \returns a reference to the \c y coefficient */
+ inline Scalar& y() { return this->derived().coeffs().coeffRef(1); }
+ /** \returns a reference to the \c z coefficient */
+ inline Scalar& z() { return this->derived().coeffs().coeffRef(2); }
+ /** \returns a reference to the \c w coefficient */
+ inline Scalar& w() { return this->derived().coeffs().coeffRef(3); }
+
+ /** \returns a read-only vector expression of the imaginary part (x,y,z) */
+ inline const VectorBlock<const Coefficients,3> vec() const { return coeffs().template head<3>(); }
+
+ /** \returns a vector expression of the imaginary part (x,y,z) */
+ inline VectorBlock<Coefficients,3> vec() { return coeffs().template head<3>(); }
+
+ /** \returns a read-only vector expression of the coefficients (x,y,z,w) */
+ inline const typename internal::traits<Derived>::Coefficients& coeffs() const { return derived().coeffs(); }
+
+ /** \returns a vector expression of the coefficients (x,y,z,w) */
+ inline typename internal::traits<Derived>::Coefficients& coeffs() { return derived().coeffs(); }
+
+ EIGEN_STRONG_INLINE QuaternionBase<Derived>& operator=(const QuaternionBase<Derived>& other);
+ template<class OtherDerived> EIGEN_STRONG_INLINE Derived& operator=(const QuaternionBase<OtherDerived>& other);
+
+// disabled this copy operator as it is giving very strange compilation errors when compiling
+// test_stdvector with GCC 4.4.2. This looks like a GCC bug though, so feel free to re-enable it if it's
+// useful; however notice that we already have the templated operator= above and e.g. in MatrixBase
+// we didn't have to add, in addition to templated operator=, such a non-templated copy operator.
+// Derived& operator=(const QuaternionBase& other)
+// { return operator=<Derived>(other); }
+
+ Derived& operator=(const AngleAxisType& aa);
+ template<class OtherDerived> Derived& operator=(const MatrixBase<OtherDerived>& m);
+
+ /** \returns a quaternion representing an identity rotation
+ * \sa MatrixBase::Identity()
+ */
+ static inline Quaternion<Scalar> Identity() { return Quaternion<Scalar>(1, 0, 0, 0); }
+
+ /** \sa QuaternionBase::Identity(), MatrixBase::setIdentity()
+ */
+ inline QuaternionBase& setIdentity() { coeffs() << 0, 0, 0, 1; return *this; }
+
+ /** \returns the squared norm of the quaternion's coefficients
+ * \sa QuaternionBase::norm(), MatrixBase::squaredNorm()
+ */
+ inline Scalar squaredNorm() const { return coeffs().squaredNorm(); }
+
+ /** \returns the norm of the quaternion's coefficients
+ * \sa QuaternionBase::squaredNorm(), MatrixBase::norm()
+ */
+ inline Scalar norm() const { return coeffs().norm(); }
+
+ /** Normalizes the quaternion \c *this
+ * \sa normalized(), MatrixBase::normalize() */
+ inline void normalize() { coeffs().normalize(); }
+ /** \returns a normalized copy of \c *this
+ * \sa normalize(), MatrixBase::normalized() */
+ inline Quaternion<Scalar> normalized() const { return Quaternion<Scalar>(coeffs().normalized()); }
+
+ /** \returns the dot product of \c *this and \a other
+ * Geometrically speaking, the dot product of two unit quaternions
+ * corresponds to the cosine of half the angle between the two rotations.
+ * \sa angularDistance()
+ */
+ template<class OtherDerived> inline Scalar dot(const QuaternionBase<OtherDerived>& other) const { return coeffs().dot(other.coeffs()); }
+
+ template<class OtherDerived> Scalar angularDistance(const QuaternionBase<OtherDerived>& other) const;
+
+ /** \returns an equivalent 3x3 rotation matrix */
+ Matrix3 toRotationMatrix() const;
+
+ /** \returns the quaternion which transforms \a a into \a b through a rotation */
+ template<typename Derived1, typename Derived2>
+ Derived& setFromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b);
+
+ template<class OtherDerived> EIGEN_STRONG_INLINE Quaternion<Scalar> operator* (const QuaternionBase<OtherDerived>& q) const;
+ template<class OtherDerived> EIGEN_STRONG_INLINE Derived& operator*= (const QuaternionBase<OtherDerived>& q);
+
+ /** \returns the quaternion describing the inverse rotation */
+ Quaternion<Scalar> inverse() const;
+
+ /** \returns the conjugated quaternion */
+ Quaternion<Scalar> conjugate() const;
+
+ template<class OtherDerived> Quaternion<Scalar> slerp(const Scalar& t, const QuaternionBase<OtherDerived>& other) const;
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ template<class OtherDerived>
+ bool isApprox(const QuaternionBase<OtherDerived>& other, const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return coeffs().isApprox(other.coeffs(), prec); }
+
+ /** \returns the vector \a v rotated by the rotation represented by \c *this */
+ EIGEN_STRONG_INLINE Vector3 _transformVector(Vector3 v) const;
+
+ /** \returns \c *this with scalar type cast to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Derived,Quaternion<NewScalarType> >::type cast() const
+ {
+ return typename internal::cast_return_type<Derived,Quaternion<NewScalarType> >::type(derived());
+ }
+
+#ifdef EIGEN_QUATERNIONBASE_PLUGIN
+# include EIGEN_QUATERNIONBASE_PLUGIN
+#endif
+};
+
+/***************************************************************************
+* Definition/implementation of Quaternion<Scalar>
+***************************************************************************/
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Quaternion
+ *
+ * \brief The quaternion class used to represent 3D orientations and rotations
+ *
+ * \tparam _Scalar the scalar type, i.e., the type of the coefficients
+ * \tparam _Options controls the memory alignment of the coefficients. Can be \# AutoAlign or \# DontAlign. Default is AutoAlign.
+ *
+ * This class represents a quaternion \f$ w+xi+yj+zk \f$ that is a convenient representation of
+ * orientations and rotations of objects in three dimensions. Compared to other representations
+ * like Euler angles or 3x3 matrices, quaternions offer the following advantages:
+ * \li \b compact storage (4 scalars)
+ * \li \b efficient to compose (28 flops),
+ * \li \b stable spherical interpolation
+ *
+ * The following two typedefs are provided for convenience:
+ * \li \c Quaternionf for \c float
+ * \li \c Quaterniond for \c double
+ *
+ * \warning Operations interpreting the quaternion as rotation have undefined behavior if the quaternion is not normalized.
+ *
+ * \sa class AngleAxis, class Transform
+ */
+
+namespace internal {
+template<typename _Scalar,int _Options>
+struct traits<Quaternion<_Scalar,_Options> >
+{
+ typedef Quaternion<_Scalar,_Options> PlainObject;
+ typedef _Scalar Scalar;
+ typedef Matrix<_Scalar,4,1,_Options> Coefficients;
+ enum{
+ IsAligned = internal::traits<Coefficients>::Flags & AlignedBit,
+ Flags = IsAligned ? (AlignedBit | LvalueBit) : LvalueBit
+ };
+};
+}
+
+template<typename _Scalar, int _Options>
+class Quaternion : public QuaternionBase<Quaternion<_Scalar,_Options> >
+{
+public:
+ typedef QuaternionBase<Quaternion<_Scalar,_Options> > Base;
+ enum { IsAligned = internal::traits<Quaternion>::IsAligned };
+
+ typedef _Scalar Scalar;
+
+ EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Quaternion)
+ using Base::operator*=;
+
+ typedef typename internal::traits<Quaternion>::Coefficients Coefficients;
+ typedef typename Base::AngleAxisType AngleAxisType;
+
+ /** Default constructor leaving the quaternion uninitialized. */
+ inline Quaternion() {}
+
+ /** Constructs and initializes the quaternion \f$ w+xi+yj+zk \f$ from
+ * its four coefficients \a w, \a x, \a y and \a z.
+ *
+ * \warning Note the order of the arguments: the real \a w coefficient first,
+ * while internally the coefficients are stored in the following order:
+ * [\c x, \c y, \c z, \c w]
+ */
+ inline Quaternion(const Scalar& w, const Scalar& x, const Scalar& y, const Scalar& z) : m_coeffs(x, y, z, w){}
+
+ /** Constructs and initializes a quaternion from the array data */
+ inline Quaternion(const Scalar* data) : m_coeffs(data) {}
+
+ /** Copy constructor */
+ template<class Derived> EIGEN_STRONG_INLINE Quaternion(const QuaternionBase<Derived>& other) { this->Base::operator=(other); }
+
+ /** Constructs and initializes a quaternion from the angle-axis \a aa */
+ explicit inline Quaternion(const AngleAxisType& aa) { *this = aa; }
+
+ /** Constructs and initializes a quaternion from either:
+ * - a rotation matrix expression,
+ * - a 4D vector expression representing quaternion coefficients.
+ */
+ template<typename Derived>
+ explicit inline Quaternion(const MatrixBase<Derived>& other) { *this = other; }
+
+ /** Explicit copy constructor with scalar conversion */
+ template<typename OtherScalar, int OtherOptions>
+ explicit inline Quaternion(const Quaternion<OtherScalar, OtherOptions>& other)
+ { m_coeffs = other.coeffs().template cast<Scalar>(); }
+
+ template<typename Derived1, typename Derived2>
+ static Quaternion FromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b);
+
+ inline Coefficients& coeffs() { return m_coeffs;}
+ inline const Coefficients& coeffs() const { return m_coeffs;}
+
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF(IsAligned)
+
+protected:
+ Coefficients m_coeffs;
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ static EIGEN_STRONG_INLINE void _check_template_params()
+ {
+ EIGEN_STATIC_ASSERT( (_Options & DontAlign) == _Options,
+ INVALID_MATRIX_TEMPLATE_PARAMETERS)
+ }
+#endif
+};
+
+/** \ingroup Geometry_Module
+ * single precision quaternion type */
+typedef Quaternion<float> Quaternionf;
+/** \ingroup Geometry_Module
+ * double precision quaternion type */
+typedef Quaternion<double> Quaterniond;
+
+/***************************************************************************
+* Specialization of Map<Quaternion<Scalar>>
+***************************************************************************/
+
+namespace internal {
+ template<typename _Scalar, int _Options>
+ struct traits<Map<Quaternion<_Scalar>, _Options> > : traits<Quaternion<_Scalar, (int(_Options)&Aligned)==Aligned ? AutoAlign : DontAlign> >
+ {
+ typedef Map<Matrix<_Scalar,4,1>, _Options> Coefficients;
+ };
+}
+
+namespace internal {
+ template<typename _Scalar, int _Options>
+ struct traits<Map<const Quaternion<_Scalar>, _Options> > : traits<Quaternion<_Scalar, (int(_Options)&Aligned)==Aligned ? AutoAlign : DontAlign> >
+ {
+ typedef Map<const Matrix<_Scalar,4,1>, _Options> Coefficients;
+ typedef traits<Quaternion<_Scalar, (int(_Options)&Aligned)==Aligned ? AutoAlign : DontAlign> > TraitsBase;
+ enum {
+ Flags = TraitsBase::Flags & ~LvalueBit
+ };
+ };
+}
+
+/** \ingroup Geometry_Module
+ * \brief Quaternion expression mapping a constant memory buffer
+ *
+ * \tparam _Scalar the type of the Quaternion coefficients
+ * \tparam _Options see class Map
+ *
+ * This is a specialization of class Map for Quaternion. It allows viewing
+ * a 4-scalar memory buffer as an Eigen Quaternion object.
+ *
+ * \sa class Map, class Quaternion, class QuaternionBase
+ */
+template<typename _Scalar, int _Options>
+class Map<const Quaternion<_Scalar>, _Options >
+ : public QuaternionBase<Map<const Quaternion<_Scalar>, _Options> >
+{
+ public:
+ typedef QuaternionBase<Map<const Quaternion<_Scalar>, _Options> > Base;
+
+ typedef _Scalar Scalar;
+ typedef typename internal::traits<Map>::Coefficients Coefficients;
+ EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Map)
+ using Base::operator*=;
+
+ /** Constructs a Mapped Quaternion object from the pointer \a coeffs
+ *
+ * The pointer \a coeffs must reference the four coefficients of Quaternion in the following order:
+ * \code *coeffs == {x, y, z, w} \endcode
+ *
+ * If the template parameter _Options is set to #Aligned, then the pointer coeffs must be aligned. */
+ EIGEN_STRONG_INLINE Map(const Scalar* coeffs) : m_coeffs(coeffs) {}
+
+ inline const Coefficients& coeffs() const { return m_coeffs;}
+
+ protected:
+ const Coefficients m_coeffs;
+};
+
+/** \ingroup Geometry_Module
+ * \brief Expression of a quaternion from a memory buffer
+ *
+ * \tparam _Scalar the type of the Quaternion coefficients
+ * \tparam _Options see class Map
+ *
+ * This is a specialization of class Map for Quaternion. It allows viewing
+ * a 4-scalar memory buffer as an Eigen Quaternion object.
+ *
+ * \sa class Map, class Quaternion, class QuaternionBase
+ */
+template<typename _Scalar, int _Options>
+class Map<Quaternion<_Scalar>, _Options >
+ : public QuaternionBase<Map<Quaternion<_Scalar>, _Options> >
+{
+ public:
+ typedef QuaternionBase<Map<Quaternion<_Scalar>, _Options> > Base;
+
+ typedef _Scalar Scalar;
+ typedef typename internal::traits<Map>::Coefficients Coefficients;
+ EIGEN_INHERIT_ASSIGNMENT_EQUAL_OPERATOR(Map)
+ using Base::operator*=;
+
+ /** Constructs a Mapped Quaternion object from the pointer \a coeffs
+ *
+ * The pointer \a coeffs must reference the four coefficients of Quaternion in the following order:
+ * \code *coeffs == {x, y, z, w} \endcode
+ *
+ * If the template parameter _Options is set to #Aligned, then the pointer coeffs must be aligned. */
+ EIGEN_STRONG_INLINE Map(Scalar* coeffs) : m_coeffs(coeffs) {}
+
+ inline Coefficients& coeffs() { return m_coeffs; }
+ inline const Coefficients& coeffs() const { return m_coeffs; }
+
+ protected:
+ Coefficients m_coeffs;
+};
+
+/** \ingroup Geometry_Module
+ * Map an unaligned array of single precision scalars as a quaternion */
+typedef Map<Quaternion<float>, 0> QuaternionMapf;
+/** \ingroup Geometry_Module
+ * Map an unaligned array of double precision scalars as a quaternion */
+typedef Map<Quaternion<double>, 0> QuaternionMapd;
+/** \ingroup Geometry_Module
+ * Map a 16-byte aligned array of single precision scalars as a quaternion */
+typedef Map<Quaternion<float>, Aligned> QuaternionMapAlignedf;
+/** \ingroup Geometry_Module
+ * Map a 16-byte aligned array of double precision scalars as a quaternion */
+typedef Map<Quaternion<double>, Aligned> QuaternionMapAlignedd;
+
+/***************************************************************************
+* Implementation of QuaternionBase methods
+***************************************************************************/
+
+// Generic Quaternion * Quaternion product
+// This product can be specialized for a given architecture via the Arch template argument.
+namespace internal {
+template<int Arch, class Derived1, class Derived2, typename Scalar, int _Options> struct quat_product
+{
+ static EIGEN_STRONG_INLINE Quaternion<Scalar> run(const QuaternionBase<Derived1>& a, const QuaternionBase<Derived2>& b){
+ return Quaternion<Scalar>
+ (
+ a.w() * b.w() - a.x() * b.x() - a.y() * b.y() - a.z() * b.z(),
+ a.w() * b.x() + a.x() * b.w() + a.y() * b.z() - a.z() * b.y(),
+ a.w() * b.y() + a.y() * b.w() + a.z() * b.x() - a.x() * b.z(),
+ a.w() * b.z() + a.z() * b.w() + a.x() * b.y() - a.y() * b.x()
+ );
+ }
+};
+}
+
+/** \returns the concatenation of two rotations as a quaternion-quaternion product */
+template <class Derived>
+template <class OtherDerived>
+EIGEN_STRONG_INLINE Quaternion<typename internal::traits<Derived>::Scalar>
+QuaternionBase<Derived>::operator* (const QuaternionBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<typename Derived::Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ return internal::quat_product<Architecture::Target, Derived, OtherDerived,
+ typename internal::traits<Derived>::Scalar,
+ internal::traits<Derived>::IsAligned && internal::traits<OtherDerived>::IsAligned>::run(*this, other);
+}
+
+/** \sa operator*(Quaternion) */
+template <class Derived>
+template <class OtherDerived>
+EIGEN_STRONG_INLINE Derived& QuaternionBase<Derived>::operator*= (const QuaternionBase<OtherDerived>& other)
+{
+ derived() = derived() * other.derived();
+ return derived();
+}
+
+/** Rotation of a vector by a quaternion.
+ * \remarks If the quaternion is used to rotate several points (>1)
+ * then it is much more efficient to first convert it to a 3x3 Matrix.
+ * Comparison of the operation cost for n transformations:
+ * - Quaternion: 30n
+ * - Via a Matrix3: 24 + 15n
+ */
+template <class Derived>
+EIGEN_STRONG_INLINE typename QuaternionBase<Derived>::Vector3
+QuaternionBase<Derived>::_transformVector(Vector3 v) const
+{
+ // Note that this algorithm comes from the optimization by hand
+ // of the conversion to a Matrix followed by a Matrix/Vector product.
+ // It appears to be much faster than the common algorithm found
+ // in the literature (30 versus 39 flops). It also requires two
+ // Vector3 as temporaries.
+ Vector3 uv = this->vec().cross(v);
+ uv += uv;
+ return v + this->w() * uv + this->vec().cross(uv);
+}
+
+template<class Derived>
+EIGEN_STRONG_INLINE QuaternionBase<Derived>& QuaternionBase<Derived>::operator=(const QuaternionBase<Derived>& other)
+{
+ coeffs() = other.coeffs();
+ return derived();
+}
+
+template<class Derived>
+template<class OtherDerived>
+EIGEN_STRONG_INLINE Derived& QuaternionBase<Derived>::operator=(const QuaternionBase<OtherDerived>& other)
+{
+ coeffs() = other.coeffs();
+ return derived();
+}
+
+/** Sets \c *this from an angle-axis \a aa and returns a reference to \c *this
+ */
+template<class Derived>
+EIGEN_STRONG_INLINE Derived& QuaternionBase<Derived>::operator=(const AngleAxisType& aa)
+{
+ using std::cos;
+ using std::sin;
+ Scalar ha = Scalar(0.5)*aa.angle(); // Scalar(0.5) to suppress precision loss warnings
+ this->w() = cos(ha);
+ this->vec() = sin(ha) * aa.axis();
+ return derived();
+}
+
+/** Sets \c *this from the expression \a xpr:
+ * - if \a xpr is a 4x1 vector, then \a xpr is assumed to be a quaternion
+ * - if \a xpr is a 3x3 matrix, then \a xpr is assumed to be a rotation matrix
+ * and \a xpr is converted to a quaternion
+ */
+
+template<class Derived>
+template<class MatrixDerived>
+inline Derived& QuaternionBase<Derived>::operator=(const MatrixBase<MatrixDerived>& xpr)
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<typename Derived::Scalar, typename MatrixDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ internal::quaternionbase_assign_impl<MatrixDerived>::run(*this, xpr.derived());
+ return derived();
+}
+
+/** Convert the quaternion to a 3x3 rotation matrix. The quaternion is required to
+ * be normalized, otherwise the result is undefined.
+ */
+template<class Derived>
+inline typename QuaternionBase<Derived>::Matrix3
+QuaternionBase<Derived>::toRotationMatrix(void) const
+{
+ // NOTE if inlined, then gcc 4.2 and 4.4 get rid of the temporary (not gcc 4.3 !!)
+ // if not inlined then the cost of the return by value is huge ~ +35%,
+ // however, not inlining this function is an order of magnitude slower, so
+ // it has to be inlined, and so the return by value is not an issue
+ Matrix3 res;
+
+ const Scalar tx = Scalar(2)*this->x();
+ const Scalar ty = Scalar(2)*this->y();
+ const Scalar tz = Scalar(2)*this->z();
+ const Scalar twx = tx*this->w();
+ const Scalar twy = ty*this->w();
+ const Scalar twz = tz*this->w();
+ const Scalar txx = tx*this->x();
+ const Scalar txy = ty*this->x();
+ const Scalar txz = tz*this->x();
+ const Scalar tyy = ty*this->y();
+ const Scalar tyz = tz*this->y();
+ const Scalar tzz = tz*this->z();
+
+ res.coeffRef(0,0) = Scalar(1)-(tyy+tzz);
+ res.coeffRef(0,1) = txy-twz;
+ res.coeffRef(0,2) = txz+twy;
+ res.coeffRef(1,0) = txy+twz;
+ res.coeffRef(1,1) = Scalar(1)-(txx+tzz);
+ res.coeffRef(1,2) = tyz-twx;
+ res.coeffRef(2,0) = txz-twy;
+ res.coeffRef(2,1) = tyz+twx;
+ res.coeffRef(2,2) = Scalar(1)-(txx+tyy);
+
+ return res;
+}
+
+/** Sets \c *this to be a quaternion representing a rotation between
+ * the two arbitrary vectors \a a and \a b. In other words, the built
+ * rotation represents a rotation sending the line of direction \a a
+ * to the line of direction \a b, both lines passing through the origin.
+ *
+ * \returns a reference to \c *this.
+ *
+ * Note that the two input vectors do \b not have to be normalized, and
+ * do not need to have the same norm.
+ */
+template<class Derived>
+template<typename Derived1, typename Derived2>
+inline Derived& QuaternionBase<Derived>::setFromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b)
+{
+ using std::sqrt;
+ Vector3 v0 = a.normalized();
+ Vector3 v1 = b.normalized();
+ Scalar c = v1.dot(v0);
+
+ // if dot == -1, vectors are nearly opposites
+ // => accurately compute the rotation axis by computing the
+ // intersection of the two planes. This is done by solving:
+ // x^T v0 = 0
+ // x^T v1 = 0
+ // under the constraint:
+ // ||x|| = 1
+ // which yields a singular value problem
+ if (c < Scalar(-1)+NumTraits<Scalar>::dummy_precision())
+ {
+ c = numext::maxi(c,Scalar(-1));
+ Matrix<Scalar,2,3> m; m << v0.transpose(), v1.transpose();
+ JacobiSVD<Matrix<Scalar,2,3> > svd(m, ComputeFullV);
+ Vector3 axis = svd.matrixV().col(2);
+
+ Scalar w2 = (Scalar(1)+c)*Scalar(0.5);
+ this->w() = sqrt(w2);
+ this->vec() = axis * sqrt(Scalar(1) - w2);
+ return derived();
+ }
+ Vector3 axis = v0.cross(v1);
+ Scalar s = sqrt((Scalar(1)+c)*Scalar(2));
+ Scalar invs = Scalar(1)/s;
+ this->vec() = axis * invs;
+ this->w() = s * Scalar(0.5);
+
+ return derived();
+}
+
+
+/** Returns a quaternion representing a rotation between
+ * the two arbitrary vectors \a a and \a b. In other words, the built
+ * rotation represents a rotation sending the line of direction \a a
+ * to the line of direction \a b, both lines passing through the origin.
+ *
+ * \returns resulting quaternion
+ *
+ * Note that the two input vectors do \b not have to be normalized, and
+ * do not need to have the same norm.
+ */
+template<typename Scalar, int Options>
+template<typename Derived1, typename Derived2>
+Quaternion<Scalar,Options> Quaternion<Scalar,Options>::FromTwoVectors(const MatrixBase<Derived1>& a, const MatrixBase<Derived2>& b)
+{
+ Quaternion quat;
+ quat.setFromTwoVectors(a, b);
+ return quat;
+}
+
+
+/** \returns the multiplicative inverse of \c *this
+ * Note that in most cases, i.e., if you simply want the opposite rotation,
+ * and/or the quaternion is normalized, then it is enough to use the conjugate.
+ *
+ * \sa QuaternionBase::conjugate()
+ */
+template <class Derived>
+inline Quaternion<typename internal::traits<Derived>::Scalar> QuaternionBase<Derived>::inverse() const
+{
+ // FIXME should this function be called multiplicativeInverse and conjugate() be called inverse() or opposite() ??
+ Scalar n2 = this->squaredNorm();
+ if (n2 > 0)
+ return Quaternion<Scalar>(conjugate().coeffs() / n2);
+ else
+ {
+ // return an invalid result to flag the error
+ return Quaternion<Scalar>(Coefficients::Zero());
+ }
+}
+
+/** \returns the conjugate of \c *this, which is equal to the multiplicative inverse
+ * if the quaternion is normalized.
+ * The conjugate of a quaternion represents the opposite rotation.
+ *
+ * \sa QuaternionBase::inverse()
+ */
+template <class Derived>
+inline Quaternion<typename internal::traits<Derived>::Scalar>
+QuaternionBase<Derived>::conjugate() const
+{
+ return Quaternion<Scalar>(this->w(),-this->x(),-this->y(),-this->z());
+}
+
+/** \returns the angle (in radians) between two rotations
+ * \sa dot()
+ */
+template <class Derived>
+template <class OtherDerived>
+inline typename internal::traits<Derived>::Scalar
+QuaternionBase<Derived>::angularDistance(const QuaternionBase<OtherDerived>& other) const
+{
+ using std::acos;
+ using std::abs;
+ Scalar d = abs(this->dot(other));
+ if (d>=Scalar(1))
+ return Scalar(0);
+ return Scalar(2) * acos(d);
+}
+
+
+
+/** \returns the spherical linear interpolation between the two quaternions
+ * \c *this and \a other at the parameter \a t in [0;1].
+ *
+ * This represents an interpolation for a constant motion between \c *this and \a other,
+ * see also http://en.wikipedia.org/wiki/Slerp.
+ */
+template <class Derived>
+template <class OtherDerived>
+Quaternion<typename internal::traits<Derived>::Scalar>
+QuaternionBase<Derived>::slerp(const Scalar& t, const QuaternionBase<OtherDerived>& other) const
+{
+ using std::acos;
+ using std::sin;
+ using std::abs;
+ static const Scalar one = Scalar(1) - NumTraits<Scalar>::epsilon();
+ Scalar d = this->dot(other);
+ Scalar absD = abs(d);
+
+ Scalar scale0;
+ Scalar scale1;
+
+ if(absD>=one)
+ {
+ scale0 = Scalar(1) - t;
+ scale1 = t;
+ }
+ else
+ {
+ // theta is the angle between the 2 quaternions
+ Scalar theta = acos(absD);
+ Scalar sinTheta = sin(theta);
+
+ scale0 = sin( ( Scalar(1) - t ) * theta) / sinTheta;
+ scale1 = sin( ( t * theta) ) / sinTheta;
+ }
+ if(d<0) scale1 = -scale1;
+
+ return Quaternion<Scalar>(scale0 * coeffs() + scale1 * other.coeffs());
+}
+
+namespace internal {
+
+// set from a rotation matrix
+template<typename Other>
+struct quaternionbase_assign_impl<Other,3,3>
+{
+ typedef typename Other::Scalar Scalar;
+ typedef DenseIndex Index;
+ template<class Derived> static inline void run(QuaternionBase<Derived>& q, const Other& mat)
+ {
+ using std::sqrt;
+ // This algorithm comes from "Quaternion Calculus and Fast Animation",
+ // Ken Shoemake, 1987 SIGGRAPH course notes
+ Scalar t = mat.trace();
+ if (t > Scalar(0))
+ {
+ t = sqrt(t + Scalar(1.0));
+ q.w() = Scalar(0.5)*t;
+ t = Scalar(0.5)/t;
+ q.x() = (mat.coeff(2,1) - mat.coeff(1,2)) * t;
+ q.y() = (mat.coeff(0,2) - mat.coeff(2,0)) * t;
+ q.z() = (mat.coeff(1,0) - mat.coeff(0,1)) * t;
+ }
+ else
+ {
+ DenseIndex i = 0;
+ if (mat.coeff(1,1) > mat.coeff(0,0))
+ i = 1;
+ if (mat.coeff(2,2) > mat.coeff(i,i))
+ i = 2;
+ DenseIndex j = (i+1)%3;
+ DenseIndex k = (j+1)%3;
+
+ t = sqrt(mat.coeff(i,i)-mat.coeff(j,j)-mat.coeff(k,k) + Scalar(1.0));
+ q.coeffs().coeffRef(i) = Scalar(0.5) * t;
+ t = Scalar(0.5)/t;
+ q.w() = (mat.coeff(k,j)-mat.coeff(j,k))*t;
+ q.coeffs().coeffRef(j) = (mat.coeff(j,i)+mat.coeff(i,j))*t;
+ q.coeffs().coeffRef(k) = (mat.coeff(k,i)+mat.coeff(i,k))*t;
+ }
+ }
+};
+
+// set from a vector of coefficients assumed to be a quaternion
+template<typename Other>
+struct quaternionbase_assign_impl<Other,4,1>
+{
+ typedef typename Other::Scalar Scalar;
+ template<class Derived> static inline void run(QuaternionBase<Derived>& q, const Other& vec)
+ {
+ q.coeffs() = vec;
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_QUATERNION_H
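
A minimal usage sketch for the quaternion API defined above (FromTwoVectors(), angularDistance(), slerp(), toRotationMatrix(), vector rotation, and the Map specializations); the program is illustrative only.

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <cmath>
#include <iostream>

int main() {
  const double kHalfPi = std::acos(0.0);

  // Quaternion rotating the x-axis onto the y-axis (90 degrees about z).
  Eigen::Quaterniond q = Eigen::Quaterniond::FromTwoVectors(
      Eigen::Vector3d::UnitX(), Eigen::Vector3d::UnitY());

  // The same rotation built from an angle-axis; the angular distance
  // between the two quaternions is therefore (numerically) zero.
  Eigen::Quaterniond r(Eigen::AngleAxisd(kHalfPi, Eigen::Vector3d::UnitZ()));
  double dist = q.angularDistance(r);

  // Rotating a vector and converting to an equivalent 3x3 matrix.
  Eigen::Vector3d v = q * Eigen::Vector3d::UnitX();  // ~ (0, 1, 0)
  Eigen::Matrix3d R = q.toRotationMatrix();

  // Spherical linear interpolation: halfway between the identity and q.
  Eigen::Quaterniond h = Eigen::Quaterniond::Identity().slerp(0.5, q);

  // Viewing a raw buffer (stored in x, y, z, w order) as a quaternion
  // without copying, via the Map specialization.
  double buf[4] = {0.0, 0.0, 0.0, 1.0};  // identity rotation
  Eigen::QuaternionMapd qm(buf);

  std::cout << dist << "\n" << v.transpose() << "\n" << R << "\n"
            << h.coeffs().transpose() << "\n"
            << qm.coeffs().transpose() << std::endl;
}
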
diff --git a/third_party/eigen3/Eigen/src/Geometry/Rotation2D.h b/third_party/eigen3/Eigen/src/Geometry/Rotation2D.h
new file mode 100644
index 0000000000..1cac343a5e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Rotation2D.h
@@ -0,0 +1,157 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ROTATION2D_H
+#define EIGEN_ROTATION2D_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Rotation2D
+ *
+ * \brief Represents a rotation/orientation in a 2 dimensional space.
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients
+ *
+ * This class is equivalent to a single scalar representing a counterclockwise rotation
+ * as a single angle in radians. It provides some additional features such as the automatic
+ * conversion from/to a 2x2 rotation matrix. Moreover this class aims to provide a similar
+ * interface to Quaternion in order to facilitate the writing of generic algorithms
+ * dealing with rotations.
+ *
+ * \sa class Quaternion, class Transform
+ */
+
+namespace internal {
+
+template<typename _Scalar> struct traits<Rotation2D<_Scalar> >
+{
+ typedef _Scalar Scalar;
+};
+} // end namespace internal
+
+template<typename _Scalar>
+class Rotation2D : public RotationBase<Rotation2D<_Scalar>,2>
+{
+ typedef RotationBase<Rotation2D<_Scalar>,2> Base;
+
+public:
+
+ using Base::operator*;
+
+ enum { Dim = 2 };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ typedef Matrix<Scalar,2,1> Vector2;
+ typedef Matrix<Scalar,2,2> Matrix2;
+
+protected:
+
+ Scalar m_angle;
+
+public:
+
+ /** Constructs a 2D counterclockwise rotation from the angle \a a in radians. */
+ inline Rotation2D(const Scalar& a) : m_angle(a) {}
+
+ /** \returns the rotation angle */
+ inline Scalar angle() const { return m_angle; }
+
+ /** \returns a read-write reference to the rotation angle */
+ inline Scalar& angle() { return m_angle; }
+
+ /** \returns the inverse rotation */
+ inline Rotation2D inverse() const { return -m_angle; }
+
+ /** Concatenates two rotations */
+ inline Rotation2D operator*(const Rotation2D& other) const
+ { return m_angle + other.m_angle; }
+
+ /** Concatenates two rotations */
+ inline Rotation2D& operator*=(const Rotation2D& other)
+ { m_angle += other.m_angle; return *this; }
+
+ /** Applies the rotation to a 2D vector */
+ Vector2 operator* (const Vector2& vec) const
+ { return toRotationMatrix() * vec; }
+
+ template<typename Derived>
+ Rotation2D& fromRotationMatrix(const MatrixBase<Derived>& m);
+ Matrix2 toRotationMatrix(void) const;
+
+ /** \returns the spherical interpolation between \c *this and \a other using
+ * parameter \a t. It is in fact equivalent to a linear interpolation.
+ */
+ inline Rotation2D slerp(const Scalar& t, const Rotation2D& other) const
+ { return m_angle * (1-t) + other.angle() * t; }
+
+ /** \returns \c *this with scalar type cast to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Rotation2D,Rotation2D<NewScalarType> >::type cast() const
+ { return typename internal::cast_return_type<Rotation2D,Rotation2D<NewScalarType> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Rotation2D(const Rotation2D<OtherScalarType>& other)
+ {
+ m_angle = Scalar(other.angle());
+ }
+
+ static inline Rotation2D Identity() { return Rotation2D(0); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Rotation2D& other, const typename NumTraits<Scalar>::Real& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return internal::isApprox(m_angle,other.m_angle, prec); }
+};
+
+/** \ingroup Geometry_Module
+ * single precision 2D rotation type */
+typedef Rotation2D<float> Rotation2Df;
+/** \ingroup Geometry_Module
+ * double precision 2D rotation type */
+typedef Rotation2D<double> Rotation2Dd;
+
+/** Set \c *this from a 2x2 rotation matrix \a mat.
+ * In other words, this function extracts the rotation angle
+ * from the rotation matrix.
+ */
+template<typename Scalar>
+template<typename Derived>
+Rotation2D<Scalar>& Rotation2D<Scalar>::fromRotationMatrix(const MatrixBase<Derived>& mat)
+{
+ using std::atan2;
+ EIGEN_STATIC_ASSERT(Derived::RowsAtCompileTime==2 && Derived::ColsAtCompileTime==2,YOU_MADE_A_PROGRAMMING_MISTAKE)
+ m_angle = atan2(mat.coeff(1,0), mat.coeff(0,0));
+ return *this;
+}
+
+/** Constructs and \returns an equivalent 2x2 rotation matrix.
+ */
+template<typename Scalar>
+typename Rotation2D<Scalar>::Matrix2
+Rotation2D<Scalar>::toRotationMatrix(void) const
+{
+ using std::sin;
+ using std::cos;
+ Scalar sinA = sin(m_angle);
+ Scalar cosA = cos(m_angle);
+ return (Matrix2() << cosA, -sinA, sinA, cosA).finished();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_ROTATION2D_H
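
A short sketch of the Rotation2D interface described above (construction from an angle, application to vectors, conversion to and from a 2x2 matrix, composition, and slerp()); values are illustrative only.

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <cmath>
#include <iostream>

int main() {
  const double kHalfPi = std::acos(0.0);

  // A 90-degree counterclockwise rotation of the plane.
  Eigen::Rotation2Dd r(kHalfPi);

  // Applying it to a vector: (1,0) maps to (0,1) up to rounding.
  Eigen::Vector2d v = r * Eigen::Vector2d(1.0, 0.0);

  // Round-tripping through the equivalent 2x2 rotation matrix.
  Eigen::Matrix2d R = r.toRotationMatrix();
  Eigen::Rotation2Dd r2 = Eigen::Rotation2Dd::Identity();
  r2.fromRotationMatrix(R);                                 // angle ~ pi/2

  // Composition is just addition of angles; slerp is linear in the angle.
  Eigen::Rotation2Dd ident = r * r.inverse();               // angle 0
  Eigen::Rotation2Dd half = Eigen::Rotation2Dd(0.0).slerp(0.5, r);

  std::cout << v.transpose() << "\n" << R << "\n"
            << r2.angle() << " " << ident.angle() << " "
            << half.angle() << std::endl;
}
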
diff --git a/third_party/eigen3/Eigen/src/Geometry/RotationBase.h b/third_party/eigen3/Eigen/src/Geometry/RotationBase.h
new file mode 100644
index 0000000000..b88661de6b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/RotationBase.h
@@ -0,0 +1,206 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ROTATIONBASE_H
+#define EIGEN_ROTATIONBASE_H
+
+namespace Eigen {
+
+// forward declaration
+namespace internal {
+template<typename RotationDerived, typename MatrixType, bool IsVector=MatrixType::IsVectorAtCompileTime>
+struct rotation_base_generic_product_selector;
+}
+
+/** \class RotationBase
+ *
+ * \brief Common base class for compact rotation representations
+ *
+ * \param Derived is the derived type, i.e., a rotation type
+ * \param _Dim the dimension of the space
+ */
+template<typename Derived, int _Dim>
+class RotationBase
+{
+ public:
+ enum { Dim = _Dim };
+ /** the scalar type of the coefficients */
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+
+ /** corresponding linear transformation matrix type */
+ typedef Matrix<Scalar,Dim,Dim> RotationMatrixType;
+ typedef Matrix<Scalar,Dim,1> VectorType;
+
+ public:
+ inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
+ inline Derived& derived() { return *static_cast<Derived*>(this); }
+
+ /** \returns an equivalent rotation matrix */
+ inline RotationMatrixType toRotationMatrix() const { return derived().toRotationMatrix(); }
+
+ /** \returns an equivalent rotation matrix
+ * This function is added to conform to the Transform class's naming scheme.
+ */
+ inline RotationMatrixType matrix() const { return derived().toRotationMatrix(); }
+
+ /** \returns the inverse rotation */
+ inline Derived inverse() const { return derived().inverse(); }
+
+ /** \returns the concatenation of the rotation \c *this with a translation \a t */
+ inline Transform<Scalar,Dim,Isometry> operator*(const Translation<Scalar,Dim>& t) const
+ { return Transform<Scalar,Dim,Isometry>(*this) * t; }
+
+ /** \returns the concatenation of the rotation \c *this with a uniform scaling \a s */
+ inline RotationMatrixType operator*(const UniformScaling<Scalar>& s) const
+ { return toRotationMatrix() * s.factor(); }
+
+ /** \returns the concatenation of the rotation \c *this with a generic expression \a e
+ * \a e can be:
+ * - a DimxDim linear transformation matrix
+ * - a DimxDim diagonal matrix (axis aligned scaling)
+ * - a vector of size Dim
+ */
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE typename internal::rotation_base_generic_product_selector<Derived,OtherDerived,OtherDerived::IsVectorAtCompileTime>::ReturnType
+ operator*(const EigenBase<OtherDerived>& e) const
+ { return internal::rotation_base_generic_product_selector<Derived,OtherDerived>::run(derived(), e.derived()); }
+
+ /** \returns the concatenation of a linear transformation \a l with the rotation \a r */
+ template<typename OtherDerived> friend
+ inline RotationMatrixType operator*(const EigenBase<OtherDerived>& l, const Derived& r)
+ { return l.derived() * r.toRotationMatrix(); }
+
+ /** \returns the concatenation of a scaling \a l with the rotation \a r */
+ friend inline Transform<Scalar,Dim,Affine> operator*(const DiagonalMatrix<Scalar,Dim>& l, const Derived& r)
+ {
+ Transform<Scalar,Dim,Affine> res(r);
+ res.linear().applyOnTheLeft(l);
+ return res;
+ }
+
+ /** \returns the concatenation of the rotation \c *this with a transformation \a t */
+ template<int Mode, int Options>
+ inline Transform<Scalar,Dim,Mode> operator*(const Transform<Scalar,Dim,Mode,Options>& t) const
+ { return toRotationMatrix() * t; }
+
+ template<typename OtherVectorType>
+ inline VectorType _transformVector(const OtherVectorType& v) const
+ { return toRotationMatrix() * v; }
+};
+
+namespace internal {
+
+// implementation of the generic product rotation * matrix
+template<typename RotationDerived, typename MatrixType>
+struct rotation_base_generic_product_selector<RotationDerived,MatrixType,false>
+{
+ enum { Dim = RotationDerived::Dim };
+ typedef Matrix<typename RotationDerived::Scalar,Dim,Dim> ReturnType;
+ static inline ReturnType run(const RotationDerived& r, const MatrixType& m)
+ { return r.toRotationMatrix() * m; }
+};
+
+template<typename RotationDerived, typename Scalar, int Dim, int MaxDim>
+struct rotation_base_generic_product_selector< RotationDerived, DiagonalMatrix<Scalar,Dim,MaxDim>, false >
+{
+ typedef Transform<Scalar,Dim,Affine> ReturnType;
+ static inline ReturnType run(const RotationDerived& r, const DiagonalMatrix<Scalar,Dim,MaxDim>& m)
+ {
+ ReturnType res(r);
+ res.linear() *= m;
+ return res;
+ }
+};
+
+template<typename RotationDerived,typename OtherVectorType>
+struct rotation_base_generic_product_selector<RotationDerived,OtherVectorType,true>
+{
+ enum { Dim = RotationDerived::Dim };
+ typedef Matrix<typename RotationDerived::Scalar,Dim,1> ReturnType;
+ static EIGEN_STRONG_INLINE ReturnType run(const RotationDerived& r, const OtherVectorType& v)
+ {
+ return r._transformVector(v);
+ }
+};
+
+} // end namespace internal
+
+/** \geometry_module
+ *
+ * \brief Constructs a Dim x Dim rotation matrix from the rotation \a r
+ */
+template<typename _Scalar, int _Rows, int _Cols, int _Storage, int _MaxRows, int _MaxCols>
+template<typename OtherDerived>
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>
+::Matrix(const RotationBase<OtherDerived,ColsAtCompileTime>& r)
+{
+ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim))
+ *this = r.toRotationMatrix();
+}
+
+/** \geometry_module
+ *
+ * \brief Set a Dim x Dim rotation matrix from the rotation \a r
+ */
+template<typename _Scalar, int _Rows, int _Cols, int _Storage, int _MaxRows, int _MaxCols>
+template<typename OtherDerived>
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>&
+Matrix<_Scalar, _Rows, _Cols, _Storage, _MaxRows, _MaxCols>
+::operator=(const RotationBase<OtherDerived,ColsAtCompileTime>& r)
+{
+ EIGEN_STATIC_ASSERT_MATRIX_SPECIFIC_SIZE(Matrix,int(OtherDerived::Dim),int(OtherDerived::Dim))
+ return *this = r.toRotationMatrix();
+}
+
+namespace internal {
+
+/** \internal
+ *
+ * Helper function to convert an arbitrary rotation object to a rotation matrix.
+ *
+ * \param Scalar the numeric type of the matrix coefficients
+ * \param Dim the dimension of the current space
+ *
+ * It returns a Dim x Dim fixed size matrix.
+ *
+ * Default specializations are provided for:
+ * - any scalar type (2D),
+ * - any matrix expression,
+ * - any type based on RotationBase (e.g., Quaternion, AngleAxis, Rotation2D)
+ *
+ * Currently toRotationMatrix is only used by Transform.
+ *
+ * \sa class Transform, class Rotation2D, class Quaternion, class AngleAxis
+ */
+template<typename Scalar, int Dim>
+static inline Matrix<Scalar,2,2> toRotationMatrix(const Scalar& s)
+{
+ EIGEN_STATIC_ASSERT(Dim==2,YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return Rotation2D<Scalar>(s).toRotationMatrix();
+}
+
+template<typename Scalar, int Dim, typename OtherDerived>
+static inline Matrix<Scalar,Dim,Dim> toRotationMatrix(const RotationBase<OtherDerived,Dim>& r)
+{
+ return r.toRotationMatrix();
+}
+
+template<typename Scalar, int Dim, typename OtherDerived>
+static inline const MatrixBase<OtherDerived>& toRotationMatrix(const MatrixBase<OtherDerived>& mat)
+{
+ EIGEN_STATIC_ASSERT(OtherDerived::RowsAtCompileTime==Dim && OtherDerived::ColsAtCompileTime==Dim,
+ YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return mat;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_ROTATIONBASE_H
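
RotationBase gives every compact rotation type (Quaternion, AngleAxis, Rotation2D) a uniform set of products and conversions; a small sketch using AngleAxis as the derived type, illustrative only.

#include <Eigen/Dense>
#include <Eigen/Geometry>
#include <cmath>
#include <iostream>

int main() {
  const double kHalfPi = std::acos(0.0);
  Eigen::AngleAxisd rot(kHalfPi, Eigen::Vector3d::UnitZ());

  // Conversion and assignment to a dense fixed-size matrix go through
  // RotationBase::toRotationMatrix() and Matrix::operator=(RotationBase).
  Eigen::Matrix3d R = rot.toRotationMatrix();
  Eigen::Matrix3d R2;
  R2 = rot;

  // Generic products dispatched by rotation_base_generic_product_selector:
  Eigen::Vector3d v = rot * Eigen::Vector3d::UnitX();      // rotated vector
  Eigen::Matrix3d M = rot * Eigen::Matrix3d::Identity();   // equals R
  Eigen::DiagonalMatrix<double, 3> S(1.0, 2.0, 3.0);
  Eigen::Transform<double, 3, Eigen::Affine> T = rot * S;  // affine transform

  std::cout << R << "\n" << R2 << "\n" << v.transpose() << "\n"
            << M << "\n" << T.matrix() << std::endl;
}
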
diff --git a/third_party/eigen3/Eigen/src/Geometry/Scaling.h b/third_party/eigen3/Eigen/src/Geometry/Scaling.h
new file mode 100644
index 0000000000..023fba2eec
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Scaling.h
@@ -0,0 +1,166 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SCALING_H
+#define EIGEN_SCALING_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Scaling
+ *
+ * \brief Represents a generic uniform scaling transformation
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ *
+ * This class represents a uniform scaling transformation. It is the return
+ * type of Scaling(Scalar), and most of the time this is the only way it
+ * is used. In particular, this class is not meant to store a scaling transformation,
+ * but rather to simplify the construction and updating of Transform objects.
+ *
+ * To represent an axis aligned scaling, use the DiagonalMatrix class.
+ *
+ * \sa Scaling(), class DiagonalMatrix, MatrixBase::asDiagonal(), class Translation, class Transform
+ */
+template<typename _Scalar>
+class UniformScaling
+{
+public:
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+
+protected:
+
+ Scalar m_factor;
+
+public:
+
+ /** Default constructor without initialization. */
+ UniformScaling() {}
+ /** Constructs and initializes a uniform scaling transformation */
+ explicit inline UniformScaling(const Scalar& s) : m_factor(s) {}
+
+ inline const Scalar& factor() const { return m_factor; }
+ inline Scalar& factor() { return m_factor; }
+
+ /** Concatenates two uniform scalings */
+ inline UniformScaling operator* (const UniformScaling& other) const
+ { return UniformScaling(m_factor * other.factor()); }
+
+ /** Concatenates a uniform scaling and a translation */
+ template<int Dim>
+ inline Transform<Scalar,Dim,Affine> operator* (const Translation<Scalar,Dim>& t) const;
+
+ /** Concatenates a uniform scaling and an affine transformation */
+ template<int Dim, int Mode, int Options>
+ inline Transform<Scalar,Dim,(int(Mode)==int(Isometry)?Affine:Mode)> operator* (const Transform<Scalar,Dim, Mode, Options>& t) const
+ {
+ Transform<Scalar,Dim,(int(Mode)==int(Isometry)?Affine:Mode)> res = t;
+ res.prescale(factor());
+ return res;
+ }
+
+ /** Concatenates a uniform scaling and a linear transformation matrix */
+ // TODO returns an expression
+ template<typename Derived>
+ inline typename internal::plain_matrix_type<Derived>::type operator* (const MatrixBase<Derived>& other) const
+ { return other * m_factor; }
+
+ template<typename Derived,int Dim>
+ inline Matrix<Scalar,Dim,Dim> operator*(const RotationBase<Derived,Dim>& r) const
+ { return r.toRotationMatrix() * m_factor; }
+
+ /** \returns the inverse scaling */
+ inline UniformScaling inverse() const
+ { return UniformScaling(Scalar(1)/m_factor); }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline UniformScaling<NewScalarType> cast() const
+ { return UniformScaling<NewScalarType>(NewScalarType(m_factor)); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit UniformScaling(const UniformScaling<OtherScalarType>& other)
+ { m_factor = Scalar(other.factor()); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const UniformScaling& other, const typename NumTraits<Scalar>::Real& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return internal::isApprox(m_factor, other.factor(), prec); }
+
+};
+
+/** Concatenates a linear transformation matrix and a uniform scaling */
+// NOTE this operator is defined in MatrixBase and not as a friend function
+// of UniformScaling to fix an internal crash of Intel's ICC
+template<typename Derived> typename MatrixBase<Derived>::ScalarMultipleReturnType
+MatrixBase<Derived>::operator*(const UniformScaling<Scalar>& s) const
+{ return derived() * s.factor(); }
+
+/** Constructs a uniform scaling from scale factor \a s */
+static inline UniformScaling<float> Scaling(float s) { return UniformScaling<float>(s); }
+/** Constructs a uniform scaling from scale factor \a s */
+static inline UniformScaling<double> Scaling(double s) { return UniformScaling<double>(s); }
+/** Constructs a uniform scaling from scale factor \a s */
+template<typename RealScalar>
+static inline UniformScaling<std::complex<RealScalar> > Scaling(const std::complex<RealScalar>& s)
+{ return UniformScaling<std::complex<RealScalar> >(s); }
+
+/** Constructs a 2D axis aligned scaling */
+template<typename Scalar>
+static inline DiagonalMatrix<Scalar,2> Scaling(const Scalar& sx, const Scalar& sy)
+{ return DiagonalMatrix<Scalar,2>(sx, sy); }
+/** Constructs a 3D axis aligned scaling */
+template<typename Scalar>
+static inline DiagonalMatrix<Scalar,3> Scaling(const Scalar& sx, const Scalar& sy, const Scalar& sz)
+{ return DiagonalMatrix<Scalar,3>(sx, sy, sz); }
+
+/** Constructs an axis aligned scaling expression from vector expression \a coeffs
+ * This is an alias for coeffs.asDiagonal()
+ */
+template<typename Derived>
+static inline const DiagonalWrapper<const Derived> Scaling(const MatrixBase<Derived>& coeffs)
+{ return coeffs.asDiagonal(); }
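+
+// A minimal usage sketch (illustrative only; it assumes the application includes
+// <Eigen/Geometry>, which also provides Translation3f and the Affine3f typedef):
+// \code
+// Eigen::UniformScaling<float>   u = Eigen::Scaling(2.0f);               // uniform scaling
+// Eigen::DiagonalMatrix<float,3> d = Eigen::Scaling(1.0f, 2.0f, 3.0f);   // axis aligned scaling
+// Eigen::Affine3f T = u * Eigen::Translation3f(1.0f, 0.0f, 0.0f);        // scaling times translation
+// Eigen::Vector3f w = T * Eigen::Vector3f(1.0f, 1.0f, 1.0f);             // w = (4, 2, 2)
+// \endcode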
+
+/** \addtogroup Geometry_Module */
+//@{
+/** \deprecated */
+typedef DiagonalMatrix<float, 2> AlignedScaling2f;
+/** \deprecated */
+typedef DiagonalMatrix<double,2> AlignedScaling2d;
+/** \deprecated */
+typedef DiagonalMatrix<float, 3> AlignedScaling3f;
+/** \deprecated */
+typedef DiagonalMatrix<double,3> AlignedScaling3d;
+//@}
+
+template<typename Scalar>
+template<int Dim>
+inline Transform<Scalar,Dim,Affine>
+UniformScaling<Scalar>::operator* (const Translation<Scalar,Dim>& t) const
+{
+ Transform<Scalar,Dim,Affine> res;
+ res.matrix().setZero();
+ res.linear().diagonal().fill(factor());
+ res.translation() = factor() * t.vector();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SCALING_H
diff --git a/third_party/eigen3/Eigen/src/Geometry/Transform.h b/third_party/eigen3/Eigen/src/Geometry/Transform.h
new file mode 100644
index 0000000000..b44c0324b0
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Transform.h
@@ -0,0 +1,1444 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2010 Hauke Heibel <hauke.heibel@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRANSFORM_H
+#define EIGEN_TRANSFORM_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Transform>
+struct transform_traits
+{
+ enum
+ {
+ Dim = Transform::Dim,
+ HDim = Transform::HDim,
+ Mode = Transform::Mode,
+ IsProjective = (int(Mode)==int(Projective))
+ };
+};
+
+template< typename TransformType,
+ typename MatrixType,
+ int Case = transform_traits<TransformType>::IsProjective ? 0
+ : int(MatrixType::RowsAtCompileTime) == int(transform_traits<TransformType>::HDim) ? 1
+ : 2>
+struct transform_right_product_impl;
+
+template< typename Other,
+ int Mode,
+ int Options,
+ int Dim,
+ int HDim,
+ int OtherRows=Other::RowsAtCompileTime,
+ int OtherCols=Other::ColsAtCompileTime>
+struct transform_left_product_impl;
+
+template< typename Lhs,
+ typename Rhs,
+ bool AnyProjective =
+ transform_traits<Lhs>::IsProjective ||
+ transform_traits<Rhs>::IsProjective>
+struct transform_transform_product_impl;
+
+template< typename Other,
+ int Mode,
+ int Options,
+ int Dim,
+ int HDim,
+ int OtherRows=Other::RowsAtCompileTime,
+ int OtherCols=Other::ColsAtCompileTime>
+struct transform_construct_from_matrix;
+
+template<typename TransformType> struct transform_take_affine_part;
+
+} // end namespace internal
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Transform
+ *
+ * \brief Represents a homogeneous transformation in an N-dimensional space
+ *
+ * \tparam _Scalar the scalar type, i.e., the type of the coefficients
+ * \tparam _Dim the dimension of the space
+ * \tparam _Mode the type of the transformation. Can be:
+ * - #Affine: the transformation is stored as a (Dim+1)^2 matrix,
+ * where the last row is assumed to be [0 ... 0 1].
+ * - #AffineCompact: the transformation is stored as a (Dim)x(Dim+1) matrix.
+ * - #Projective: the transformation is stored as a (Dim+1)^2 matrix
+ * without any assumption.
+ * \tparam _Options has the same meaning as in class Matrix. It allows specifying DontAlign and/or RowMajor.
+ * These Options are passed directly to the underlying matrix type.
+ *
+ * The homography is internally represented and stored by a matrix which
+ * is available through the matrix() method. To understand the behavior of
+ * this class you have to think of a Transform object as its internal
+ * matrix representation. The chosen convention is right multiply:
+ *
+ * \code v' = T * v \endcode
+ *
+ * Therefore, an affine transformation matrix M is shaped like this:
+ *
+ * \f$ \left( \begin{array}{cc}
+ * linear & translation\\
+ * 0 ... 0 & 1
+ * \end{array} \right) \f$
+ *
+ * Note that for a projective transformation the last row can be anything,
+ * and then the interpretation of different parts might be slightly different.
+ *
+ * However, unlike a plain matrix, the Transform class provides many features
+ * simplifying both its assembly and usage. In particular, it can be composed
+ * with any other transformations (Transform,Translation,RotationBase,Matrix)
+ * and can be directly used to transform implicit homogeneous vectors. All these
+ * operations are handled via the operator*. For the composition of transformations,
+ * the principle is to first convert the right/left hand sides of the product
+ * to a compatible (Dim+1)^2 matrix and then perform a pure matrix product.
+ * Of course, internally, operator* tries to perform the minimal number of operations
+ * according to the nature of each term. Likewise, when applying the transform
+ * to non homogeneous vectors, the latter are automatically promoted to homogeneous
+ * ones before doing the matrix product. The conversions to homogeneous representations
+ * are performed as follows:
+ *
+ * \b Translation t (Dim)x(1):
+ * \f$ \left( \begin{array}{cc}
+ * I & t \\
+ * 0\,...\,0 & 1
+ * \end{array} \right) \f$
+ *
+ * \b Rotation R (Dim)x(Dim):
+ * \f$ \left( \begin{array}{cc}
+ * R & 0\\
+ * 0\,...\,0 & 1
+ * \end{array} \right) \f$
+ *
+ * \b Linear \b Matrix L (Dim)x(Dim):
+ * \f$ \left( \begin{array}{cc}
+ * L & 0\\
+ * 0\,...\,0 & 1
+ * \end{array} \right) \f$
+ *
+ * \b Affine \b Matrix A (Dim)x(Dim+1):
+ * \f$ \left( \begin{array}{c}
+ * A\\
+ * 0\,...\,0\,1
+ * \end{array} \right) \f$
+ *
+ * \b Column \b vector v (Dim)x(1):
+ * \f$ \left( \begin{array}{c}
+ * v\\
+ * 1
+ * \end{array} \right) \f$
+ *
+ * \b Set \b of \b column \b vectors V1...Vn (Dim)x(n):
+ * \f$ \left( \begin{array}{ccc}
+ * v_1 & ... & v_n\\
+ * 1 & ... & 1
+ * \end{array} \right) \f$
+ *
+ * The concatenation of a Transform object with any kind of other transformation
+ * always returns a Transform object.
+ *
+ * A little exception to the "as pure matrix product" rule is the case of the
+ * transformation of non homogeneous vectors by an affine transformation. In
+ * that case the last matrix row can be ignored, and the product returns non
+ * homogeneous vectors.
+ *
+ * Since, for instance, a Dim x Dim matrix is interpreted as a linear transformation,
+ * it is not possible to directly transform Dim vectors stored in a Dim x Dim matrix.
+ * The solution is either to use a Dim x Dynamic matrix or explicitly request a
+ * vector transformation by making the vector homogeneous:
+ * \code
+ * m' = T * m.colwise().homogeneous();
+ * \endcode
+ * Note that there is zero overhead.
+ *
+ * Conversion methods from/to Qt's QMatrix and QTransform are available if the
+ * preprocessor token EIGEN_QT_SUPPORT is defined.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_TRANSFORM_PLUGIN.
+ *
+ * \sa class Matrix, class Quaternion
+ */
+template<typename _Scalar, int _Dim, int _Mode, int _Options>
+class Transform
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim==Dynamic ? Dynamic : (_Dim+1)*(_Dim+1))
+ enum {
+ Mode = _Mode,
+ Options = _Options,
+ Dim = _Dim, ///< space dimension in which the transformation holds
+ HDim = _Dim+1, ///< size of a respective homogeneous vector
+ Rows = int(Mode)==(AffineCompact) ? Dim : HDim
+ };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ typedef DenseIndex Index;
+ /** type of the matrix used to represent the transformation */
+ typedef typename internal::make_proper_matrix_type<Scalar,Rows,HDim,Options>::type MatrixType;
+ /** constified MatrixType */
+ typedef const MatrixType ConstMatrixType;
+ /** type of the matrix used to represent the linear part of the transformation */
+ typedef Matrix<Scalar,Dim,Dim,Options> LinearMatrixType;
+ /** type of read/write reference to the linear part of the transformation */
+ typedef Block<MatrixType,Dim,Dim,int(Mode)==(AffineCompact) && (Options&RowMajor)==0> LinearPart;
+ /** type of read reference to the linear part of the transformation */
+ typedef const Block<ConstMatrixType,Dim,Dim,int(Mode)==(AffineCompact) && (Options&RowMajor)==0> ConstLinearPart;
+ /** type of read/write reference to the affine part of the transformation */
+ typedef typename internal::conditional<int(Mode)==int(AffineCompact),
+ MatrixType&,
+ Block<MatrixType,Dim,HDim> >::type AffinePart;
+ /** type of read reference to the affine part of the transformation */
+ typedef typename internal::conditional<int(Mode)==int(AffineCompact),
+ const MatrixType&,
+ const Block<const MatrixType,Dim,HDim> >::type ConstAffinePart;
+ /** type of a vector */
+ typedef Matrix<Scalar,Dim,1> VectorType;
+ /** type of a read/write reference to the translation part of the rotation */
+ typedef Block<MatrixType,Dim,1,!(internal::traits<MatrixType>::Flags & RowMajorBit)> TranslationPart;
+ /** type of a read reference to the translation part of the rotation */
+ typedef const Block<ConstMatrixType,Dim,1,!(internal::traits<MatrixType>::Flags & RowMajorBit)> ConstTranslationPart;
+ /** corresponding translation type */
+ typedef Translation<Scalar,Dim> TranslationType;
+
+ // this intermediate enum is needed to avoid an ICE with gcc 3.4 and 4.0
+ enum { TransformTimeDiagonalMode = ((Mode==int(Isometry))?Affine:int(Mode)) };
+ /** The return type of the product between a diagonal matrix and a transform */
+ typedef Transform<Scalar,Dim,TransformTimeDiagonalMode> TransformTimeDiagonalReturnType;
+
+protected:
+
+ MatrixType m_matrix;
+
+public:
+
+ /** Default constructor without initialization of the meaningful coefficients.
+ * If Mode==Affine, then the last row is set to [0 ... 0 1] */
+ inline Transform()
+ {
+ check_template_params();
+ if (int(Mode)==Affine)
+ makeAffine();
+ }
+
+ inline Transform(const Transform& other)
+ {
+ check_template_params();
+ m_matrix = other.m_matrix;
+ }
+
+ inline explicit Transform(const TranslationType& t)
+ {
+ check_template_params();
+ *this = t;
+ }
+ inline explicit Transform(const UniformScaling<Scalar>& s)
+ {
+ check_template_params();
+ *this = s;
+ }
+ template<typename Derived>
+ inline explicit Transform(const RotationBase<Derived, Dim>& r)
+ {
+ check_template_params();
+ *this = r;
+ }
+
+ inline Transform& operator=(const Transform& other)
+ { m_matrix = other.m_matrix; return *this; }
+
+ typedef internal::transform_take_affine_part<Transform> take_affine_part;
+
+ /** Constructs and initializes a transformation from a Dim^2 or a (Dim+1)^2 matrix. */
+ template<typename OtherDerived>
+ inline explicit Transform(const EigenBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar,typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY);
+
+ check_template_params();
+ internal::transform_construct_from_matrix<OtherDerived,Mode,Options,Dim,HDim>::run(this, other.derived());
+ }
+
+ /** Set \c *this from a Dim^2 or (Dim+1)^2 matrix. */
+ template<typename OtherDerived>
+ inline Transform& operator=(const EigenBase<OtherDerived>& other)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar,typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY);
+
+ internal::transform_construct_from_matrix<OtherDerived,Mode,Options,Dim,HDim>::run(this, other.derived());
+ return *this;
+ }
+
+ template<int OtherOptions>
+ inline Transform(const Transform<Scalar,Dim,Mode,OtherOptions>& other)
+ {
+ check_template_params();
+ // only the options change, we can directly copy the matrices
+ m_matrix = other.matrix();
+ }
+
+ template<int OtherMode,int OtherOptions>
+ inline Transform(const Transform<Scalar,Dim,OtherMode,OtherOptions>& other)
+ {
+ check_template_params();
+ // prevent conversions as:
+ // Affine | AffineCompact | Isometry = Projective
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(OtherMode==int(Projective), Mode==int(Projective)),
+ YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION)
+
+ // prevent conversions as:
+ // Isometry = Affine | AffineCompact
+ EIGEN_STATIC_ASSERT(EIGEN_IMPLIES(OtherMode==int(Affine)||OtherMode==int(AffineCompact), Mode!=int(Isometry)),
+ YOU_PERFORMED_AN_INVALID_TRANSFORMATION_CONVERSION)
+
+ enum { ModeIsAffineCompact = Mode == int(AffineCompact),
+ OtherModeIsAffineCompact = OtherMode == int(AffineCompact)
+ };
+
+ if(ModeIsAffineCompact == OtherModeIsAffineCompact)
+ {
+ // We need the block expression because the code is compiled for all
+ // combinations of transformations and will trigger a compile time error
+ // if one tries to assign the matrices directly
+ m_matrix.template block<Dim,Dim+1>(0,0) = other.matrix().template block<Dim,Dim+1>(0,0);
+ makeAffine();
+ }
+ else if(OtherModeIsAffineCompact)
+ {
+ typedef typename Transform<Scalar,Dim,OtherMode,OtherOptions>::MatrixType OtherMatrixType;
+ internal::transform_construct_from_matrix<OtherMatrixType,Mode,Options,Dim,HDim>::run(this, other.matrix());
+ }
+ else
+ {
+ // here we know that Mode == AffineCompact and OtherMode != AffineCompact.
+ // if OtherMode were Projective, the static assert above would already have caught it.
+ // So the only possibility is that OtherMode == Affine
+ linear() = other.linear();
+ translation() = other.translation();
+ }
+ }
+
+ template<typename OtherDerived>
+ Transform(const ReturnByValue<OtherDerived>& other)
+ {
+ check_template_params();
+ other.evalTo(*this);
+ }
+
+ template<typename OtherDerived>
+ Transform& operator=(const ReturnByValue<OtherDerived>& other)
+ {
+ other.evalTo(*this);
+ return *this;
+ }
+
+ #ifdef EIGEN_QT_SUPPORT
+ inline Transform(const QMatrix& other);
+ inline Transform& operator=(const QMatrix& other);
+ inline QMatrix toQMatrix(void) const;
+ inline Transform(const QTransform& other);
+ inline Transform& operator=(const QTransform& other);
+ inline QTransform toQTransform(void) const;
+ #endif
+
+ /** shortcut for m_matrix(row,col);
+ * \sa MatrixBase::operator(Index,Index) const */
+ inline Scalar operator() (Index row, Index col) const { return m_matrix(row,col); }
+ /** shortcut for m_matrix(row,col);
+ * \sa MatrixBase::operator(Index,Index) */
+ inline Scalar& operator() (Index row, Index col) { return m_matrix(row,col); }
+
+ /** \returns a read-only expression of the transformation matrix */
+ inline const MatrixType& matrix() const { return m_matrix; }
+ /** \returns a writable expression of the transformation matrix */
+ inline MatrixType& matrix() { return m_matrix; }
+
+ /** \returns a read-only expression of the linear part of the transformation */
+ inline ConstLinearPart linear() const { return ConstLinearPart(m_matrix,0,0); }
+ /** \returns a writable expression of the linear part of the transformation */
+ inline LinearPart linear() { return LinearPart(m_matrix,0,0); }
+
+ /** \returns a read-only expression of the Dim x HDim affine part of the transformation */
+ inline ConstAffinePart affine() const { return take_affine_part::run(m_matrix); }
+ /** \returns a writable expression of the Dim x HDim affine part of the transformation */
+ inline AffinePart affine() { return take_affine_part::run(m_matrix); }
+
+ /** \returns a read-only expression of the translation vector of the transformation */
+ inline ConstTranslationPart translation() const { return ConstTranslationPart(m_matrix,0,Dim); }
+ /** \returns a writable expression of the translation vector of the transformation */
+ inline TranslationPart translation() { return TranslationPart(m_matrix,0,Dim); }
+
+ /** \returns an expression of the product between the transform \c *this and a matrix expression \a other
+ *
+ * The right hand side \a other might be either:
+ * \li a vector of size Dim,
+ * \li a homogeneous vector of size Dim+1,
+ * \li a set of vectors of size Dim x Dynamic,
+ * \li a set of homogeneous vectors of size Dim+1 x Dynamic,
+ * \li a linear transformation matrix of size Dim x Dim,
+ * \li an affine transformation matrix of size Dim x Dim+1,
+ * \li a transformation matrix of size Dim+1 x Dim+1.
+ */
+ // note: this function is defined here because some compilers cannot find the respective declaration
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE const typename internal::transform_right_product_impl<Transform, OtherDerived>::ResultType
+ operator * (const EigenBase<OtherDerived> &other) const
+ { return internal::transform_right_product_impl<Transform, OtherDerived>::run(*this,other.derived()); }
+
+ /** \returns the product expression of a transformation matrix \a a times a transform \a b
+ *
+ * The left hand side \a a might be either:
+ * \li a linear transformation matrix of size Dim x Dim,
+ * \li an affine transformation matrix of size Dim x Dim+1,
+ * \li a general transformation matrix of size Dim+1 x Dim+1.
+ */
+ template<typename OtherDerived> friend
+ inline const typename internal::transform_left_product_impl<OtherDerived,Mode,Options,_Dim,_Dim+1>::ResultType
+ operator * (const EigenBase<OtherDerived> &a, const Transform &b)
+ { return internal::transform_left_product_impl<OtherDerived,Mode,Options,Dim,HDim>::run(a.derived(),b); }
+
+ /** \returns The product expression of a transform \a a times a diagonal matrix \a b
+ *
+ * The rhs diagonal matrix is interpreted as an affine scaling transformation. The
+ * product results in a Transform of the same type (mode) as the lhs, unless the lhs
+ * mode is Isometry, in which case the returned transform is an affinity.
+ */
+ template<typename DiagonalDerived>
+ inline const TransformTimeDiagonalReturnType
+ operator * (const DiagonalBase<DiagonalDerived> &b) const
+ {
+ TransformTimeDiagonalReturnType res(*this);
+ res.linear() *= b;
+ return res;
+ }
+
+ /** \returns The product expression of a diagonal matrix \a a times a transform \a b
+ *
+ * The lhs diagonal matrix is interpreted as an affine scaling transformation. The
+ * product results in a Transform of the same type (mode) as the rhs transform, unless that
+ * mode is Isometry, in which case the returned transform is an affinity.
+ */
+ template<typename DiagonalDerived>
+ friend inline TransformTimeDiagonalReturnType
+ operator * (const DiagonalBase<DiagonalDerived> &a, const Transform &b)
+ {
+ TransformTimeDiagonalReturnType res;
+ res.linear().noalias() = a*b.linear();
+ res.translation().noalias() = a*b.translation();
+ if (Mode!=int(AffineCompact))
+ res.matrix().row(Dim) = b.matrix().row(Dim);
+ return res;
+ }
+
+ template<typename OtherDerived>
+ inline Transform& operator*=(const EigenBase<OtherDerived>& other) { return *this = *this * other; }
+
+ /** Concatenates two transformations */
+ inline const Transform operator * (const Transform& other) const
+ {
+ return internal::transform_transform_product_impl<Transform,Transform>::run(*this,other);
+ }
+
+ #if EIGEN_COMP_ICC
+private:
+ // this intermediate structure permits to workaround a bug in ICC 11:
+ // error: template instantiation resulted in unexpected function type of "Eigen::Transform<double, 3, 32, 0>
+ // (const Eigen::Transform<double, 3, 2, 0> &) const"
+ // (the meaning of a name may have changed since the template declaration -- the type of the template is:
+ // "Eigen::internal::transform_transform_product_impl<Eigen::Transform<double, 3, 32, 0>,
+ // Eigen::Transform<double, 3, Mode, Options>, <expression>>::ResultType (const Eigen::Transform<double, 3, Mode, Options> &) const")
+ //
+ template<int OtherMode,int OtherOptions> struct icc_11_workaround
+ {
+ typedef internal::transform_transform_product_impl<Transform,Transform<Scalar,Dim,OtherMode,OtherOptions> > ProductType;
+ typedef typename ProductType::ResultType ResultType;
+ };
+
+public:
+ /** Concatenates two different transformations */
+ template<int OtherMode,int OtherOptions>
+ inline typename icc_11_workaround<OtherMode,OtherOptions>::ResultType
+ operator * (const Transform<Scalar,Dim,OtherMode,OtherOptions>& other) const
+ {
+ typedef typename icc_11_workaround<OtherMode,OtherOptions>::ProductType ProductType;
+ return ProductType::run(*this,other);
+ }
+ #else
+ /** Concatenates two different transformations */
+ template<int OtherMode,int OtherOptions>
+ inline typename internal::transform_transform_product_impl<Transform,Transform<Scalar,Dim,OtherMode,OtherOptions> >::ResultType
+ operator * (const Transform<Scalar,Dim,OtherMode,OtherOptions>& other) const
+ {
+ return internal::transform_transform_product_impl<Transform,Transform<Scalar,Dim,OtherMode,OtherOptions> >::run(*this,other);
+ }
+ #endif
+
+ /** \sa MatrixBase::setIdentity() */
+ void setIdentity() { m_matrix.setIdentity(); }
+
+ /**
+ * \brief Returns an identity transformation.
+ * \todo In the future this function should be returning a Transform expression.
+ */
+ static const Transform Identity()
+ {
+ return Transform(MatrixType::Identity());
+ }
+
+ template<typename OtherDerived>
+ inline Transform& scale(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ inline Transform& prescale(const MatrixBase<OtherDerived> &other);
+
+ inline Transform& scale(const Scalar& s);
+ inline Transform& prescale(const Scalar& s);
+
+ template<typename OtherDerived>
+ inline Transform& translate(const MatrixBase<OtherDerived> &other);
+
+ template<typename OtherDerived>
+ inline Transform& pretranslate(const MatrixBase<OtherDerived> &other);
+
+ template<typename RotationType>
+ inline Transform& rotate(const RotationType& rotation);
+
+ template<typename RotationType>
+ inline Transform& prerotate(const RotationType& rotation);
+
+ Transform& shear(const Scalar& sx, const Scalar& sy);
+ Transform& preshear(const Scalar& sx, const Scalar& sy);
+
+ inline Transform& operator=(const TranslationType& t);
+ inline Transform& operator*=(const TranslationType& t) { return translate(t.vector()); }
+ inline Transform operator*(const TranslationType& t) const;
+
+ inline Transform& operator=(const UniformScaling<Scalar>& t);
+ inline Transform& operator*=(const UniformScaling<Scalar>& s) { return scale(s.factor()); }
+ inline TransformTimeDiagonalReturnType operator*(const UniformScaling<Scalar>& s) const
+ {
+ TransformTimeDiagonalReturnType res = *this;
+ res.scale(s.factor());
+ return res;
+ }
+
+ inline Transform& operator*=(const DiagonalMatrix<Scalar,Dim>& s) { linear() *= s; return *this; }
+
+ template<typename Derived>
+ inline Transform& operator=(const RotationBase<Derived,Dim>& r);
+ template<typename Derived>
+ inline Transform& operator*=(const RotationBase<Derived,Dim>& r) { return rotate(r.toRotationMatrix()); }
+ template<typename Derived>
+ inline Transform operator*(const RotationBase<Derived,Dim>& r) const;
+
+ const LinearMatrixType rotation() const;
+ template<typename RotationMatrixType, typename ScalingMatrixType>
+ void computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const;
+ template<typename ScalingMatrixType, typename RotationMatrixType>
+ void computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const;
+
+ template<typename PositionDerived, typename OrientationType, typename ScaleDerived>
+ Transform& fromPositionOrientationScale(const MatrixBase<PositionDerived> &position,
+ const OrientationType& orientation, const MatrixBase<ScaleDerived> &scale);
+
+ inline Transform inverse(TransformTraits traits = (TransformTraits)Mode) const;
+
+ /** \returns a const pointer to the column major internal matrix */
+ const Scalar* data() const { return m_matrix.data(); }
+ /** \returns a non-const pointer to the column major internal matrix */
+ Scalar* data() { return m_matrix.data(); }
+
+ /** \returns \c *this with scalar type casted to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Transform,Transform<NewScalarType,Dim,Mode,Options> >::type cast() const
+ { return typename internal::cast_return_type<Transform,Transform<NewScalarType,Dim,Mode,Options> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Transform(const Transform<OtherScalarType,Dim,Mode,Options>& other)
+ {
+ check_template_params();
+ m_matrix = other.matrix().template cast<Scalar>();
+ }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Transform& other, const typename NumTraits<Scalar>::Real& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return m_matrix.isApprox(other.m_matrix, prec); }
+
+ /** Sets the last row to [0 ... 0 1]
+ */
+ void makeAffine()
+ {
+ if(int(Mode)!=int(AffineCompact))
+ {
+ matrix().template block<1,Dim>(Dim,0).setZero();
+ matrix().coeffRef(Dim,Dim) = Scalar(1);
+ }
+ }
+
+ /** \internal
+ * \returns the Dim x Dim linear part if the transformation is affine,
+ * and the HDim x Dim part for projective transformations.
+ */
+ inline Block<MatrixType,int(Mode)==int(Projective)?HDim:Dim,Dim> linearExt()
+ { return m_matrix.template block<int(Mode)==int(Projective)?HDim:Dim,Dim>(0,0); }
+ /** \internal
+ * \returns the Dim x Dim linear part if the transformation is affine,
+ * and the HDim x Dim part for projective transformations.
+ */
+ inline const Block<MatrixType,int(Mode)==int(Projective)?HDim:Dim,Dim> linearExt() const
+ { return m_matrix.template block<int(Mode)==int(Projective)?HDim:Dim,Dim>(0,0); }
+
+ /** \internal
+ * \returns the translation part if the transformation is affine,
+ * and the last column for projective transformations.
+ */
+ inline Block<MatrixType,int(Mode)==int(Projective)?HDim:Dim,1> translationExt()
+ { return m_matrix.template block<int(Mode)==int(Projective)?HDim:Dim,1>(0,Dim); }
+ /** \internal
+ * \returns the translation part if the transformation is affine,
+ * and the last column for projective transformations.
+ */
+ inline const Block<MatrixType,int(Mode)==int(Projective)?HDim:Dim,1> translationExt() const
+ { return m_matrix.template block<int(Mode)==int(Projective)?HDim:Dim,1>(0,Dim); }
+
+
+ #ifdef EIGEN_TRANSFORM_PLUGIN
+ #include EIGEN_TRANSFORM_PLUGIN
+ #endif
+
+protected:
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ static EIGEN_STRONG_INLINE void check_template_params()
+ {
+ EIGEN_STATIC_ASSERT((Options & (DontAlign|RowMajor)) == Options, INVALID_MATRIX_TEMPLATE_PARAMETERS)
+ }
+ #endif
+
+};
+
+/** \ingroup Geometry_Module */
+typedef Transform<float,2,Isometry> Isometry2f;
+/** \ingroup Geometry_Module */
+typedef Transform<float,3,Isometry> Isometry3f;
+/** \ingroup Geometry_Module */
+typedef Transform<double,2,Isometry> Isometry2d;
+/** \ingroup Geometry_Module */
+typedef Transform<double,3,Isometry> Isometry3d;
+
+/** \ingroup Geometry_Module */
+typedef Transform<float,2,Affine> Affine2f;
+/** \ingroup Geometry_Module */
+typedef Transform<float,3,Affine> Affine3f;
+/** \ingroup Geometry_Module */
+typedef Transform<double,2,Affine> Affine2d;
+/** \ingroup Geometry_Module */
+typedef Transform<double,3,Affine> Affine3d;
+
+/** \ingroup Geometry_Module */
+typedef Transform<float,2,AffineCompact> AffineCompact2f;
+/** \ingroup Geometry_Module */
+typedef Transform<float,3,AffineCompact> AffineCompact3f;
+/** \ingroup Geometry_Module */
+typedef Transform<double,2,AffineCompact> AffineCompact2d;
+/** \ingroup Geometry_Module */
+typedef Transform<double,3,AffineCompact> AffineCompact3d;
+
+/** \ingroup Geometry_Module */
+typedef Transform<float,2,Projective> Projective2f;
+/** \ingroup Geometry_Module */
+typedef Transform<float,3,Projective> Projective3f;
+/** \ingroup Geometry_Module */
+typedef Transform<double,2,Projective> Projective2d;
+/** \ingroup Geometry_Module */
+typedef Transform<double,3,Projective> Projective3d;
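+
+// A minimal composition sketch (illustrative, assuming <Eigen/Geometry> is included;
+// Translation3f, AngleAxisf and Scaling come from the other Geometry headers). It
+// follows the right-multiply convention v' = T * v documented above:
+// \code
+// Eigen::Affine3f T = Eigen::Translation3f(1.0f, 2.0f, 3.0f)
+//                   * Eigen::AngleAxisf(0.5f, Eigen::Vector3f::UnitZ())
+//                   * Eigen::Scaling(2.0f);
+// Eigen::Vector3f w = T * Eigen::Vector3f(1.0f, 0.0f, 0.0f);  // scale, then rotate, then translate
+// Eigen::Matrix4f M = T.matrix();                             // the underlying (Dim+1)x(Dim+1) matrix
+// \endcode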
+
+/**************************
+*** Optional QT support ***
+**************************/
+
+#ifdef EIGEN_QT_SUPPORT
+/** Initializes \c *this from a QMatrix assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode,int Options>
+Transform<Scalar,Dim,Mode,Options>::Transform(const QMatrix& other)
+{
+ check_template_params();
+ *this = other;
+}
+
+/** Set \c *this from a QMatrix assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode,int Options>
+Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::operator=(const QMatrix& other)
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ if (Mode == int(AffineCompact))
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy();
+ else
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy(),
+ 0, 0, 1;
+ return *this;
+}
+
+/** \returns a QMatrix from \c *this assuming the dimension is 2.
+ *
+ * \warning this conversion might lose data if \c *this is not affine
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+QMatrix Transform<Scalar,Dim,Mode,Options>::toQMatrix(void) const
+{
+ check_template_params();
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return QMatrix(m_matrix.coeff(0,0), m_matrix.coeff(1,0),
+ m_matrix.coeff(0,1), m_matrix.coeff(1,1),
+ m_matrix.coeff(0,2), m_matrix.coeff(1,2));
+}
+
+/** Initializes \c *this from a QTransform assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode,int Options>
+Transform<Scalar,Dim,Mode,Options>::Transform(const QTransform& other)
+{
+ check_template_params();
+ *this = other;
+}
+
+/** Set \c *this from a QTransform assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::operator=(const QTransform& other)
+{
+ check_template_params();
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ if (Mode == int(AffineCompact))
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy();
+ else
+ m_matrix << other.m11(), other.m21(), other.dx(),
+ other.m12(), other.m22(), other.dy(),
+ other.m13(), other.m23(), other.m33();
+ return *this;
+}
+
+/** \returns a QTransform from \c *this assuming the dimension is 2.
+ *
+ * This function is available only if the token EIGEN_QT_SUPPORT is defined.
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+QTransform Transform<Scalar,Dim,Mode,Options>::toQTransform(void) const
+{
+ EIGEN_STATIC_ASSERT(Dim==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ if (Mode == int(AffineCompact))
+ return QTransform(m_matrix.coeff(0,0), m_matrix.coeff(1,0),
+ m_matrix.coeff(0,1), m_matrix.coeff(1,1),
+ m_matrix.coeff(0,2), m_matrix.coeff(1,2));
+ else
+ return QTransform(m_matrix.coeff(0,0), m_matrix.coeff(1,0), m_matrix.coeff(2,0),
+ m_matrix.coeff(0,1), m_matrix.coeff(1,1), m_matrix.coeff(2,1),
+ m_matrix.coeff(0,2), m_matrix.coeff(1,2), m_matrix.coeff(2,2));
+}
+#endif
+
+/*********************
+*** Procedural API ***
+*********************/
+
+/** Applies on the right the non uniform scale transformation represented
+ * by the vector \a other to \c *this and returns a reference to \c *this.
+ * \sa prescale()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename OtherDerived>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::scale(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ linearExt().noalias() = (linearExt() * other.asDiagonal());
+ return *this;
+}
+
+/** Applies on the right a uniform scale of a factor \a s to \c *this
+ * and returns a reference to \c *this.
+ * \sa prescale(Scalar)
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+inline Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::scale(const Scalar& s)
+{
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ linearExt() *= s;
+ return *this;
+}
+
+/** Applies on the left the non uniform scale transformation represented
+ * by the vector \a other to \c *this and returns a reference to \c *this.
+ * \sa scale()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename OtherDerived>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::prescale(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ m_matrix.template block<Dim,HDim>(0,0).noalias() = (other.asDiagonal() * m_matrix.template block<Dim,HDim>(0,0));
+ return *this;
+}
+
+/** Applies on the left a uniform scale of a factor \a s to \c *this
+ * and returns a reference to \c *this.
+ * \sa scale(Scalar)
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+inline Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::prescale(const Scalar& s)
+{
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ m_matrix.template topRows<Dim>() *= s;
+ return *this;
+}
+
+/** Applies on the right the translation matrix represented by the vector \a other
+ * to \c *this and returns a reference to \c *this.
+ * \sa pretranslate()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename OtherDerived>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::translate(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ translationExt() += linearExt() * other;
+ return *this;
+}
+
+/** Applies on the left the translation matrix represented by the vector \a other
+ * to \c *this and returns a reference to \c *this.
+ * \sa translate()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename OtherDerived>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::pretranslate(const MatrixBase<OtherDerived> &other)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_SPECIFIC_SIZE(OtherDerived,int(Dim))
+ if(int(Mode)==int(Projective))
+ affine() += other * m_matrix.row(Dim);
+ else
+ translation() += other;
+ return *this;
+}
+
+/** Applies on the right the rotation represented by the rotation \a rotation
+ * to \c *this and returns a reference to \c *this.
+ *
+ * The template parameter \a RotationType is the type of the rotation which
+ * must be known by internal::toRotationMatrix<>.
+ *
+ * Natively supported types include:
+ * - any scalar (2D),
+ * - a Dim x Dim matrix expression,
+ * - a Quaternion (3D),
+ * - an AngleAxis (3D)
+ *
+ * This mechanism is easily extendable to support user types such as Euler angles,
+ * or a pair of Quaternion for 4D rotations.
+ *
+ * \sa rotate(Scalar), class Quaternion, class AngleAxis, prerotate(RotationType)
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename RotationType>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::rotate(const RotationType& rotation)
+{
+ linearExt() *= internal::toRotationMatrix<Scalar,Dim>(rotation);
+ return *this;
+}
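+
+// A short sketch (illustrative) of the natively supported rotation types listed above:
+// \code
+// Eigen::Affine3f T = Eigen::Affine3f::Identity();
+// T.rotate(Eigen::AngleAxisf(0.25f, Eigen::Vector3f::UnitX()));  // AngleAxis (3D)
+// T.rotate(Eigen::Quaternionf::Identity());                      // Quaternion (3D)
+// Eigen::Affine2f T2 = Eigen::Affine2f::Identity();
+// T2.rotate(0.5f);                                               // plain angle in radian (2D)
+// \endcode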
+
+/** Applies on the left the rotation represented by the rotation \a rotation
+ * to \c *this and returns a reference to \c *this.
+ *
+ * See rotate() for further details.
+ *
+ * \sa rotate()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename RotationType>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::prerotate(const RotationType& rotation)
+{
+ m_matrix.template block<Dim,HDim>(0,0) = internal::toRotationMatrix<Scalar,Dim>(rotation)
+ * m_matrix.template block<Dim,HDim>(0,0);
+ return *this;
+}
+
+/** Applies on the right the shear transformation represented
+ * by the shear factors \a sx and \a sy to \c *this and returns a reference to \c *this.
+ * \warning 2D only.
+ * \sa preshear()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::shear(const Scalar& sx, const Scalar& sy)
+{
+ EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ VectorType tmp = linear().col(0)*sy + linear().col(1);
+ linear() << linear().col(0) + linear().col(1)*sx, tmp;
+ return *this;
+}
+
+/** Applies on the left the shear transformation represented
+ * by the shear factors \a sx and \a sy to \c *this and returns a reference to \c *this.
+ * \warning 2D only.
+ * \sa shear()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::preshear(const Scalar& sx, const Scalar& sy)
+{
+ EIGEN_STATIC_ASSERT(int(Dim)==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ EIGEN_STATIC_ASSERT(Mode!=int(Isometry), THIS_METHOD_IS_ONLY_FOR_SPECIFIC_TRANSFORMATIONS)
+ m_matrix.template block<Dim,HDim>(0,0) = LinearMatrixType(1, sx, sy, 1) * m_matrix.template block<Dim,HDim>(0,0);
+ return *this;
+}
+
+/******************************************************
+*** Scaling, Translation and Rotation compatibility ***
+******************************************************/
+
+template<typename Scalar, int Dim, int Mode, int Options>
+inline Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::operator=(const TranslationType& t)
+{
+ linear().setIdentity();
+ translation() = t.vector();
+ makeAffine();
+ return *this;
+}
+
+template<typename Scalar, int Dim, int Mode, int Options>
+inline Transform<Scalar,Dim,Mode,Options> Transform<Scalar,Dim,Mode,Options>::operator*(const TranslationType& t) const
+{
+ Transform res = *this;
+ res.translate(t.vector());
+ return res;
+}
+
+template<typename Scalar, int Dim, int Mode, int Options>
+inline Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::operator=(const UniformScaling<Scalar>& s)
+{
+ m_matrix.setZero();
+ linear().diagonal().fill(s.factor());
+ makeAffine();
+ return *this;
+}
+
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename Derived>
+inline Transform<Scalar,Dim,Mode,Options>& Transform<Scalar,Dim,Mode,Options>::operator=(const RotationBase<Derived,Dim>& r)
+{
+ linear() = internal::toRotationMatrix<Scalar,Dim>(r);
+ translation().setZero();
+ makeAffine();
+ return *this;
+}
+
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename Derived>
+inline Transform<Scalar,Dim,Mode,Options> Transform<Scalar,Dim,Mode,Options>::operator*(const RotationBase<Derived,Dim>& r) const
+{
+ Transform res = *this;
+ res.rotate(r.derived());
+ return res;
+}
+
+/************************
+*** Special functions ***
+************************/
+
+/** \returns the rotation part of the transformation
+ *
+ *
+ * \svd_module
+ *
+ * \sa computeRotationScaling(), computeScalingRotation(), class SVD
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+const typename Transform<Scalar,Dim,Mode,Options>::LinearMatrixType
+Transform<Scalar,Dim,Mode,Options>::rotation() const
+{
+ LinearMatrixType result;
+ computeRotationScaling(&result, (LinearMatrixType*)0);
+ return result;
+}
+
+
+/** decomposes the linear part of the transformation as a product rotation x scaling, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ *
+ *
+ * \svd_module
+ *
+ * \sa computeScalingRotation(), rotation(), class SVD
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename RotationMatrixType, typename ScalingMatrixType>
+void Transform<Scalar,Dim,Mode,Options>::computeRotationScaling(RotationMatrixType *rotation, ScalingMatrixType *scaling) const
+{
+ JacobiSVD<LinearMatrixType> svd(linear(), ComputeFullU | ComputeFullV);
+
+ Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant(); // so x has absolute value 1
+ VectorType sv(svd.singularValues());
+ sv.coeffRef(0) *= x;
+ if(scaling) scaling->lazyAssign(svd.matrixV() * sv.asDiagonal() * svd.matrixV().adjoint());
+ if(rotation)
+ {
+ LinearMatrixType m(svd.matrixU());
+ m.col(0) /= x;
+ rotation->lazyAssign(m * svd.matrixV().adjoint());
+ }
+}
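+
+// A minimal sketch (illustrative; it additionally assumes the SVD module is available,
+// since the decomposition relies on JacobiSVD):
+// \code
+// Eigen::Affine3f T(Eigen::AngleAxisf(0.3f, Eigen::Vector3f::UnitY()));
+// T.scale(Eigen::Vector3f(1.0f, 2.0f, 3.0f));
+// Eigen::Matrix3f R, S;
+// T.computeRotationScaling(&R, &S);      // T.linear() ~= R * S
+// Eigen::Matrix3f Ronly = T.rotation();  // same R, the scaling is discarded
+// \endcode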
+
+/** decomposes the linear part of the transformation as a product scaling x rotation, the scaling being
+ * not necessarily positive.
+ *
+ * If either pointer is zero, the corresponding computation is skipped.
+ *
+ *
+ *
+ * \svd_module
+ *
+ * \sa computeRotationScaling(), rotation(), class SVD
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename ScalingMatrixType, typename RotationMatrixType>
+void Transform<Scalar,Dim,Mode,Options>::computeScalingRotation(ScalingMatrixType *scaling, RotationMatrixType *rotation) const
+{
+ JacobiSVD<LinearMatrixType> svd(linear(), ComputeFullU | ComputeFullV);
+
+ Scalar x = (svd.matrixU() * svd.matrixV().adjoint()).determinant(); // so x has absolute value 1
+ VectorType sv(svd.singularValues());
+ sv.coeffRef(0) *= x;
+ if(scaling) scaling->lazyAssign(svd.matrixU() * sv.asDiagonal() * svd.matrixU().adjoint());
+ if(rotation)
+ {
+ LinearMatrixType m(svd.matrixU());
+ m.col(0) /= x;
+ rotation->lazyAssign(m * svd.matrixV().adjoint());
+ }
+}
+
+/** Convenient method to set \c *this from a position, orientation and scale
+ * of a 3D object.
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+template<typename PositionDerived, typename OrientationType, typename ScaleDerived>
+Transform<Scalar,Dim,Mode,Options>&
+Transform<Scalar,Dim,Mode,Options>::fromPositionOrientationScale(const MatrixBase<PositionDerived> &position,
+ const OrientationType& orientation, const MatrixBase<ScaleDerived> &scale)
+{
+ linear() = internal::toRotationMatrix<Scalar,Dim>(orientation);
+ linear() *= scale.asDiagonal();
+ translation() = position;
+ makeAffine();
+ return *this;
+}
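+
+// A minimal sketch (illustrative) of the typical call; the result is equivalent to
+// composing translate(position) * rotate(orientation) * scale(scale):
+// \code
+// Eigen::Affine3f T;
+// T.fromPositionOrientationScale(Eigen::Vector3f(1.0f, 2.0f, 3.0f),
+//                                Eigen::Quaternionf::Identity(),
+//                                Eigen::Vector3f(2.0f, 2.0f, 2.0f));
+// \endcode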
+
+namespace internal {
+
+// selector needed to avoid taking the inverse of a 3x4 matrix
+template<typename TransformType, int Mode=TransformType::Mode>
+struct projective_transform_inverse
+{
+ static inline void run(const TransformType&, TransformType&)
+ {}
+};
+
+template<typename TransformType>
+struct projective_transform_inverse<TransformType, Projective>
+{
+ static inline void run(const TransformType& m, TransformType& res)
+ {
+ res.matrix() = m.matrix().inverse();
+ }
+};
+
+} // end namespace internal
+
+
+/**
+ *
+ * \returns the inverse transformation according to some given knowledge
+ * on \c *this.
+ *
+ * \param hint allows optimizing the inversion process when the transformation
+ * is known not to be a general transformation (optional). The possible values are:
+ * - #Projective if the transformation is not necessarily affine, i.e., if the
+ * last row is not guaranteed to be [0 ... 0 1]
+ * - #Affine if the last row can be assumed to be [0 ... 0 1]
+ * - #Isometry if the transformation is only a concatenation of translations
+ * and rotations.
+ * The default is the template class parameter \c Mode.
+ *
+ * \warning unless \a hint is #Isometry, this function requires the generic inverse
+ * method of MatrixBase defined in the LU module. If you forget to include this
+ * module, then you will get hard to debug linking errors.
+ *
+ * \sa MatrixBase::inverse()
+ */
+template<typename Scalar, int Dim, int Mode, int Options>
+Transform<Scalar,Dim,Mode,Options>
+Transform<Scalar,Dim,Mode,Options>::inverse(TransformTraits hint) const
+{
+ Transform res;
+ if (hint == Projective)
+ {
+ internal::projective_transform_inverse<Transform>::run(*this, res);
+ }
+ else
+ {
+ if (hint == Isometry)
+ {
+ res.matrix().template topLeftCorner<Dim,Dim>() = linear().transpose();
+ }
+ else if(hint&Affine)
+ {
+ res.matrix().template topLeftCorner<Dim,Dim>() = linear().inverse();
+ }
+ else
+ {
+ eigen_assert(false && "Invalid transform traits in Transform::Inverse");
+ }
+ // translation and remaining parts
+ res.matrix().template topRightCorner<Dim,1>()
+ = - res.matrix().template topLeftCorner<Dim,Dim>() * translation();
+ res.makeAffine(); // we do need this, because in the beginning res is uninitialized
+ }
+ return res;
+}
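+
+// A minimal sketch (illustrative) of the effect of the hint; the Isometry path only
+// transposes the linear part, while Affine/Projective fall back to a generic inverse:
+// \code
+// Eigen::Isometry3f T(Eigen::Translation3f(1.0f, 2.0f, 3.0f));
+// Eigen::Isometry3f Ti = T.inverse();              // default hint == Isometry: cheap transpose path
+// Eigen::Affine3f A(Eigen::Translation3f(1.0f, 2.0f, 3.0f));
+// A.scale(0.5f);
+// Eigen::Affine3f Ai = A.inverse(Eigen::Affine);   // generic inverse of the Dim x Dim linear part
+// \endcode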
+
+namespace internal {
+
+/*****************************************************
+*** Specializations of take affine part ***
+*****************************************************/
+
+template<typename TransformType> struct transform_take_affine_part {
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef typename TransformType::AffinePart AffinePart;
+ typedef typename TransformType::ConstAffinePart ConstAffinePart;
+ static inline AffinePart run(MatrixType& m)
+ { return m.template block<TransformType::Dim,TransformType::HDim>(0,0); }
+ static inline ConstAffinePart run(const MatrixType& m)
+ { return m.template block<TransformType::Dim,TransformType::HDim>(0,0); }
+};
+
+template<typename Scalar, int Dim, int Options>
+struct transform_take_affine_part<Transform<Scalar,Dim,AffineCompact, Options> > {
+ typedef typename Transform<Scalar,Dim,AffineCompact,Options>::MatrixType MatrixType;
+ static inline MatrixType& run(MatrixType& m) { return m; }
+ static inline const MatrixType& run(const MatrixType& m) { return m; }
+};
+
+/*****************************************************
+*** Specializations of construct from matrix ***
+*****************************************************/
+
+template<typename Other, int Mode, int Options, int Dim, int HDim>
+struct transform_construct_from_matrix<Other, Mode,Options,Dim,HDim, Dim,Dim>
+{
+ static inline void run(Transform<typename Other::Scalar,Dim,Mode,Options> *transform, const Other& other)
+ {
+ transform->linear() = other;
+ transform->translation().setZero();
+ transform->makeAffine();
+ }
+};
+
+template<typename Other, int Mode, int Options, int Dim, int HDim>
+struct transform_construct_from_matrix<Other, Mode,Options,Dim,HDim, Dim,HDim>
+{
+ static inline void run(Transform<typename Other::Scalar,Dim,Mode,Options> *transform, const Other& other)
+ {
+ transform->affine() = other;
+ transform->makeAffine();
+ }
+};
+
+template<typename Other, int Mode, int Options, int Dim, int HDim>
+struct transform_construct_from_matrix<Other, Mode,Options,Dim,HDim, HDim,HDim>
+{
+ static inline void run(Transform<typename Other::Scalar,Dim,Mode,Options> *transform, const Other& other)
+ { transform->matrix() = other; }
+};
+
+template<typename Other, int Options, int Dim, int HDim>
+struct transform_construct_from_matrix<Other, AffineCompact,Options,Dim,HDim, HDim,HDim>
+{
+ static inline void run(Transform<typename Other::Scalar,Dim,AffineCompact,Options> *transform, const Other& other)
+ { transform->matrix() = other.template block<Dim,HDim>(0,0); }
+};
+
+/**********************************************************
+*** Specializations of operator* with rhs EigenBase ***
+**********************************************************/
+
+template<int LhsMode,int RhsMode>
+struct transform_product_result
+{
+ enum
+ {
+ Mode =
+ (LhsMode == (int)Projective || RhsMode == (int)Projective ) ? Projective :
+ (LhsMode == (int)Affine || RhsMode == (int)Affine ) ? Affine :
+ (LhsMode == (int)AffineCompact || RhsMode == (int)AffineCompact ) ? AffineCompact :
+ (LhsMode == (int)Isometry || RhsMode == (int)Isometry ) ? Isometry : Projective
+ };
+};
+
+template< typename TransformType, typename MatrixType >
+struct transform_right_product_impl< TransformType, MatrixType, 0 >
+{
+ typedef typename MatrixType::PlainObject ResultType;
+
+ static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
+ {
+ return T.matrix() * other;
+ }
+};
+
+template< typename TransformType, typename MatrixType >
+struct transform_right_product_impl< TransformType, MatrixType, 1 >
+{
+ enum {
+ Dim = TransformType::Dim,
+ HDim = TransformType::HDim,
+ OtherRows = MatrixType::RowsAtCompileTime,
+ OtherCols = MatrixType::ColsAtCompileTime
+ };
+
+ typedef typename MatrixType::PlainObject ResultType;
+
+ static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
+ {
+ EIGEN_STATIC_ASSERT(OtherRows==HDim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES);
+
+ typedef Block<ResultType, Dim, OtherCols, int(MatrixType::RowsAtCompileTime)==Dim> TopLeftLhs;
+
+ ResultType res(other.rows(),other.cols());
+ TopLeftLhs(res, 0, 0, Dim, other.cols()).noalias() = T.affine() * other;
+ res.row(OtherRows-1) = other.row(OtherRows-1);
+
+ return res;
+ }
+};
+
+template< typename TransformType, typename MatrixType >
+struct transform_right_product_impl< TransformType, MatrixType, 2 >
+{
+ enum {
+ Dim = TransformType::Dim,
+ HDim = TransformType::HDim,
+ OtherRows = MatrixType::RowsAtCompileTime,
+ OtherCols = MatrixType::ColsAtCompileTime
+ };
+
+ typedef typename MatrixType::PlainObject ResultType;
+
+ static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
+ {
+ EIGEN_STATIC_ASSERT(OtherRows==Dim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES);
+
+ typedef Block<ResultType, Dim, OtherCols, true> TopLeftLhs;
+ ResultType res(Replicate<typename TransformType::ConstTranslationPart, 1, OtherCols>(T.translation(),1,other.cols()));
+ TopLeftLhs(res, 0, 0, Dim, other.cols()).noalias() += T.linear() * other;
+
+ return res;
+ }
+};
+
+/**********************************************************
+*** Specializations of operator* with lhs EigenBase ***
+**********************************************************/
+
+// generic HDim x HDim matrix * T => Projective
+template<typename Other,int Mode, int Options, int Dim, int HDim>
+struct transform_left_product_impl<Other,Mode,Options,Dim,HDim, HDim,HDim>
+{
+ typedef Transform<typename Other::Scalar,Dim,Mode,Options> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef Transform<typename Other::Scalar,Dim,Projective,Options> ResultType;
+ static ResultType run(const Other& other,const TransformType& tr)
+ { return ResultType(other * tr.matrix()); }
+};
+
+// generic HDim x HDim matrix * AffineCompact => Projective
+template<typename Other, int Options, int Dim, int HDim>
+struct transform_left_product_impl<Other,AffineCompact,Options,Dim,HDim, HDim,HDim>
+{
+ typedef Transform<typename Other::Scalar,Dim,AffineCompact,Options> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef Transform<typename Other::Scalar,Dim,Projective,Options> ResultType;
+ static ResultType run(const Other& other,const TransformType& tr)
+ {
+ ResultType res;
+ res.matrix().noalias() = other.template block<HDim,Dim>(0,0) * tr.matrix();
+ res.matrix().col(Dim) += other.col(Dim);
+ return res;
+ }
+};
+
+// affine matrix * T
+template<typename Other,int Mode, int Options, int Dim, int HDim>
+struct transform_left_product_impl<Other,Mode,Options,Dim,HDim, Dim,HDim>
+{
+ typedef Transform<typename Other::Scalar,Dim,Mode,Options> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef TransformType ResultType;
+ static ResultType run(const Other& other,const TransformType& tr)
+ {
+ ResultType res;
+ res.affine().noalias() = other * tr.matrix();
+ res.matrix().row(Dim) = tr.matrix().row(Dim);
+ return res;
+ }
+};
+
+// affine matrix * AffineCompact
+template<typename Other, int Options, int Dim, int HDim>
+struct transform_left_product_impl<Other,AffineCompact,Options,Dim,HDim, Dim,HDim>
+{
+ typedef Transform<typename Other::Scalar,Dim,AffineCompact,Options> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef TransformType ResultType;
+ static ResultType run(const Other& other,const TransformType& tr)
+ {
+ ResultType res;
+ res.matrix().noalias() = other.template block<Dim,Dim>(0,0) * tr.matrix();
+ res.translation() += other.col(Dim);
+ return res;
+ }
+};
+
+// linear matrix * T
+template<typename Other,int Mode, int Options, int Dim, int HDim>
+struct transform_left_product_impl<Other,Mode,Options,Dim,HDim, Dim,Dim>
+{
+ typedef Transform<typename Other::Scalar,Dim,Mode,Options> TransformType;
+ typedef typename TransformType::MatrixType MatrixType;
+ typedef TransformType ResultType;
+ static ResultType run(const Other& other, const TransformType& tr)
+ {
+ TransformType res;
+ if(Mode!=int(AffineCompact))
+ res.matrix().row(Dim) = tr.matrix().row(Dim);
+ res.matrix().template topRows<Dim>().noalias()
+ = other * tr.matrix().template topRows<Dim>();
+ return res;
+ }
+};
+
+/**********************************************************
+*** Specializations of operator* with another Transform ***
+**********************************************************/
+
+template<typename Scalar, int Dim, int LhsMode, int LhsOptions, int RhsMode, int RhsOptions>
+struct transform_transform_product_impl<Transform<Scalar,Dim,LhsMode,LhsOptions>,Transform<Scalar,Dim,RhsMode,RhsOptions>,false >
+{
+ enum { ResultMode = transform_product_result<LhsMode,RhsMode>::Mode };
+ typedef Transform<Scalar,Dim,LhsMode,LhsOptions> Lhs;
+ typedef Transform<Scalar,Dim,RhsMode,RhsOptions> Rhs;
+ typedef Transform<Scalar,Dim,ResultMode,LhsOptions> ResultType;
+ static ResultType run(const Lhs& lhs, const Rhs& rhs)
+ {
+ ResultType res;
+ res.linear() = lhs.linear() * rhs.linear();
+ res.translation() = lhs.linear() * rhs.translation() + lhs.translation();
+ res.makeAffine();
+ return res;
+ }
+};
+
+template<typename Scalar, int Dim, int LhsMode, int LhsOptions, int RhsMode, int RhsOptions>
+struct transform_transform_product_impl<Transform<Scalar,Dim,LhsMode,LhsOptions>,Transform<Scalar,Dim,RhsMode,RhsOptions>,true >
+{
+ typedef Transform<Scalar,Dim,LhsMode,LhsOptions> Lhs;
+ typedef Transform<Scalar,Dim,RhsMode,RhsOptions> Rhs;
+ typedef Transform<Scalar,Dim,Projective> ResultType;
+ static ResultType run(const Lhs& lhs, const Rhs& rhs)
+ {
+ return ResultType( lhs.matrix() * rhs.matrix() );
+ }
+};
+
+template<typename Scalar, int Dim, int LhsOptions, int RhsOptions>
+struct transform_transform_product_impl<Transform<Scalar,Dim,AffineCompact,LhsOptions>,Transform<Scalar,Dim,Projective,RhsOptions>,true >
+{
+ typedef Transform<Scalar,Dim,AffineCompact,LhsOptions> Lhs;
+ typedef Transform<Scalar,Dim,Projective,RhsOptions> Rhs;
+ typedef Transform<Scalar,Dim,Projective> ResultType;
+ static ResultType run(const Lhs& lhs, const Rhs& rhs)
+ {
+ ResultType res;
+ res.matrix().template topRows<Dim>() = lhs.matrix() * rhs.matrix();
+ res.matrix().row(Dim) = rhs.matrix().row(Dim);
+ return res;
+ }
+};
+
+template<typename Scalar, int Dim, int LhsOptions, int RhsOptions>
+struct transform_transform_product_impl<Transform<Scalar,Dim,Projective,LhsOptions>,Transform<Scalar,Dim,AffineCompact,RhsOptions>,true >
+{
+ typedef Transform<Scalar,Dim,Projective,LhsOptions> Lhs;
+ typedef Transform<Scalar,Dim,AffineCompact,RhsOptions> Rhs;
+ typedef Transform<Scalar,Dim,Projective> ResultType;
+ static ResultType run(const Lhs& lhs, const Rhs& rhs)
+ {
+ ResultType res(lhs.matrix().template leftCols<Dim>() * rhs.matrix());
+ res.matrix().col(Dim) += lhs.matrix().col(Dim);
+ return res;
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRANSFORM_H
diff --git a/third_party/eigen3/Eigen/src/Geometry/Translation.h b/third_party/eigen3/Eigen/src/Geometry/Translation.h
new file mode 100644
index 0000000000..7fda179cc3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Translation.h
@@ -0,0 +1,206 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_TRANSLATION_H
+#define EIGEN_TRANSLATION_H
+
+namespace Eigen {
+
+/** \geometry_module \ingroup Geometry_Module
+ *
+ * \class Translation
+ *
+ * \brief Represents a translation transformation
+ *
+ * \param _Scalar the scalar type, i.e., the type of the coefficients.
+ * \param _Dim the dimension of the space, can be a compile time value or Dynamic
+ *
+ * \note This class is not intended to store a translation transformation,
+ * but rather to ease the construction and update of Transform objects.
+ *
+ * \sa class Scaling, class Transform
+ */
+template<typename _Scalar, int _Dim>
+class Translation
+{
+public:
+ EIGEN_MAKE_ALIGNED_OPERATOR_NEW_IF_VECTORIZABLE_FIXED_SIZE(_Scalar,_Dim)
+ /** dimension of the space */
+ enum { Dim = _Dim };
+ /** the scalar type of the coefficients */
+ typedef _Scalar Scalar;
+ /** corresponding vector type */
+ typedef Matrix<Scalar,Dim,1> VectorType;
+ /** corresponding linear transformation matrix type */
+ typedef Matrix<Scalar,Dim,Dim> LinearMatrixType;
+ /** corresponding affine transformation type */
+ typedef Transform<Scalar,Dim,Affine> AffineTransformType;
+ /** corresponding isometric transformation type */
+ typedef Transform<Scalar,Dim,Isometry> IsometryTransformType;
+
+protected:
+
+ VectorType m_coeffs;
+
+public:
+
+ /** Default constructor without initialization. */
+ Translation() {}
+ /** */
+ inline Translation(const Scalar& sx, const Scalar& sy)
+ {
+ eigen_assert(Dim==2);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ }
+ /** */
+ inline Translation(const Scalar& sx, const Scalar& sy, const Scalar& sz)
+ {
+ eigen_assert(Dim==3);
+ m_coeffs.x() = sx;
+ m_coeffs.y() = sy;
+ m_coeffs.z() = sz;
+ }
+  /** Constructs and initializes the translation transformation from a vector of translation coefficients */
+ explicit inline Translation(const VectorType& vector) : m_coeffs(vector) {}
+
+  /** \brief Returns the x-translation by value. **/
+  inline Scalar x() const { return m_coeffs.x(); }
+  /** \brief Returns the y-translation by value. **/
+  inline Scalar y() const { return m_coeffs.y(); }
+  /** \brief Returns the z-translation by value. **/
+  inline Scalar z() const { return m_coeffs.z(); }
+
+  /** \brief Returns the x-translation as a reference. **/
+  inline Scalar& x() { return m_coeffs.x(); }
+  /** \brief Returns the y-translation as a reference. **/
+  inline Scalar& y() { return m_coeffs.y(); }
+  /** \brief Returns the z-translation as a reference. **/
+  inline Scalar& z() { return m_coeffs.z(); }
+
+ const VectorType& vector() const { return m_coeffs; }
+ VectorType& vector() { return m_coeffs; }
+
+ const VectorType& translation() const { return m_coeffs; }
+ VectorType& translation() { return m_coeffs; }
+
+  /** Concatenates two translations */
+ inline Translation operator* (const Translation& other) const
+ { return Translation(m_coeffs + other.m_coeffs); }
+
+ /** Concatenates a translation and a uniform scaling */
+ inline AffineTransformType operator* (const UniformScaling<Scalar>& other) const;
+
+ /** Concatenates a translation and a linear transformation */
+ template<typename OtherDerived>
+ inline AffineTransformType operator* (const EigenBase<OtherDerived>& linear) const;
+
+ /** Concatenates a translation and a rotation */
+ template<typename Derived>
+ inline IsometryTransformType operator*(const RotationBase<Derived,Dim>& r) const
+ { return *this * IsometryTransformType(r); }
+
+ /** \returns the concatenation of a linear transformation \a l with the translation \a t */
+  // it's a nightmare to define a templated friend function outside its declaration
+ template<typename OtherDerived> friend
+ inline AffineTransformType operator*(const EigenBase<OtherDerived>& linear, const Translation& t)
+ {
+ AffineTransformType res;
+ res.matrix().setZero();
+ res.linear() = linear.derived();
+ res.translation() = linear.derived() * t.m_coeffs;
+ res.matrix().row(Dim).setZero();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+ }
+
+ /** Concatenates a translation and a transformation */
+ template<int Mode, int Options>
+ inline Transform<Scalar,Dim,Mode> operator* (const Transform<Scalar,Dim,Mode,Options>& t) const
+ {
+ Transform<Scalar,Dim,Mode> res = t;
+ res.pretranslate(m_coeffs);
+ return res;
+ }
+
+ /** Applies translation to vector */
+ inline VectorType operator* (const VectorType& other) const
+ { return m_coeffs + other; }
+
+ /** \returns the inverse translation (opposite) */
+ Translation inverse() const { return Translation(-m_coeffs); }
+
+ Translation& operator=(const Translation& other)
+ {
+ m_coeffs = other.m_coeffs;
+ return *this;
+ }
+
+ static const Translation Identity() { return Translation(VectorType::Zero()); }
+
+  /** \returns \c *this with scalar type cast to \a NewScalarType
+ *
+ * Note that if \a NewScalarType is equal to the current scalar type of \c *this
+ * then this function smartly returns a const reference to \c *this.
+ */
+ template<typename NewScalarType>
+ inline typename internal::cast_return_type<Translation,Translation<NewScalarType,Dim> >::type cast() const
+ { return typename internal::cast_return_type<Translation,Translation<NewScalarType,Dim> >::type(*this); }
+
+ /** Copy constructor with scalar type conversion */
+ template<typename OtherScalarType>
+ inline explicit Translation(const Translation<OtherScalarType,Dim>& other)
+ { m_coeffs = other.vector().template cast<Scalar>(); }
+
+ /** \returns \c true if \c *this is approximately equal to \a other, within the precision
+ * determined by \a prec.
+ *
+ * \sa MatrixBase::isApprox() */
+ bool isApprox(const Translation& other, typename NumTraits<Scalar>::Real prec = NumTraits<Scalar>::dummy_precision()) const
+ { return m_coeffs.isApprox(other.m_coeffs, prec); }
+
+};
+
+/** \addtogroup Geometry_Module */
+//@{
+typedef Translation<float, 2> Translation2f;
+typedef Translation<double,2> Translation2d;
+typedef Translation<float, 3> Translation3f;
+typedef Translation<double,3> Translation3d;
+//@}
+
+template<typename Scalar, int Dim>
+inline typename Translation<Scalar,Dim>::AffineTransformType
+Translation<Scalar,Dim>::operator* (const UniformScaling<Scalar>& other) const
+{
+ AffineTransformType res;
+ res.matrix().setZero();
+ res.linear().diagonal().fill(other.factor());
+ res.translation() = m_coeffs;
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+template<typename Scalar, int Dim>
+template<typename OtherDerived>
+inline typename Translation<Scalar,Dim>::AffineTransformType
+Translation<Scalar,Dim>::operator* (const EigenBase<OtherDerived>& linear) const
+{
+ AffineTransformType res;
+ res.matrix().setZero();
+ res.linear() = linear.derived();
+ res.translation() = m_coeffs;
+ res.matrix().row(Dim).setZero();
+ res(Dim,Dim) = Scalar(1);
+ return res;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_TRANSLATION_H
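
As the documentation above notes, Translation exists mainly to ease the construction of Transform objects. A minimal usage sketch (illustrative values; assumes the Geometry module is included and follows the usual translation * rotation * scaling idiom):

    #include <Eigen/Geometry>
    #include <iostream>

    int main() {
      using namespace Eigen;
      // Compose a translation, a rotation and a uniform scaling into an affine transform.
      Translation3f shift(1.0f, 2.0f, 3.0f);
      AngleAxisf rot(0.5f, Vector3f::UnitZ());
      Affine3f T = shift * rot * Scaling(2.0f);
      // Apply the resulting transform to a point.
      Vector3f p(1.0f, 0.0f, 0.0f);
      std::cout << (T * p).transpose() << std::endl;
      return 0;
    }
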
diff --git a/third_party/eigen3/Eigen/src/Geometry/Umeyama.h b/third_party/eigen3/Eigen/src/Geometry/Umeyama.h
new file mode 100644
index 0000000000..5e20662f80
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/Umeyama.h
@@ -0,0 +1,177 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_UMEYAMA_H
+#define EIGEN_UMEYAMA_H
+
+// This file requires the user to include
+// * Eigen/Core
+// * Eigen/LU
+// * Eigen/SVD
+// * Eigen/Array
+
+namespace Eigen {
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+
+// These helpers make it possible to use mixed types as parameters
+// for the Umeyama function. The problem with mixed parameters is that the return type
+// cannot trivially be deduced when float and double types are mixed.
+namespace internal {
+
+// Compile time return type deduction for different MatrixBase types.
+// "Different" here means different alignment and parameters, but the same underlying
+// real scalar type.
+template<typename MatrixType, typename OtherMatrixType>
+struct umeyama_transform_matrix_type
+{
+ enum {
+ MinRowsAtCompileTime = EIGEN_SIZE_MIN_PREFER_DYNAMIC(MatrixType::RowsAtCompileTime, OtherMatrixType::RowsAtCompileTime),
+
+ // When possible we want to choose some small fixed size value since the result
+ // is likely to fit on the stack. So here, EIGEN_SIZE_MIN_PREFER_DYNAMIC is not what we want.
+ HomogeneousDimension = int(MinRowsAtCompileTime) == Dynamic ? Dynamic : int(MinRowsAtCompileTime)+1
+ };
+
+ typedef Matrix<typename traits<MatrixType>::Scalar,
+ HomogeneousDimension,
+ HomogeneousDimension,
+ AutoAlign | (traits<MatrixType>::Flags & RowMajorBit ? RowMajor : ColMajor),
+ HomogeneousDimension,
+ HomogeneousDimension
+ > type;
+};
+
+}
+
+#endif
+
+/**
+* \geometry_module \ingroup Geometry_Module
+*
+* \brief Returns the transformation between two point sets.
+*
+* The algorithm is based on:
+* "Least-squares estimation of transformation parameters between two point patterns",
+* Shinji Umeyama, PAMI 1991, DOI: 10.1109/34.88573
+*
+* It estimates parameters \f$ c, \mathbf{R}, \f$ and \f$ \mathbf{t} \f$ such that
+* \f{align*}
+* \frac{1}{n} \sum_{i=1}^n \vert\vert y_i - (c\mathbf{R}x_i + \mathbf{t}) \vert\vert_2^2
+* \f}
+* is minimized.
+*
+* The algorithm is based on the analysis of the covariance matrix
+* \f$ \Sigma_{\mathbf{x}\mathbf{y}} \in \mathbb{R}^{d \times d} \f$
+* of the input point sets \f$ \mathbf{x} \f$ and \f$ \mathbf{y} \f$ where
+* \f$d\f$ is the dimension (which is typically small).
+* The analysis involves an SVD with complexity \f$O(d^3)\f$,
+* though the actual computational effort lies in the covariance
+* matrix computation, which has an asymptotic lower bound of \f$O(dm)\f$ when
+* the input point sets have dimension \f$d \times m\f$.
+*
+* Currently the method only works for floating point matrices.
+*
+* \todo Should the return type of umeyama() become a Transform?
+*
+* \param src Source points \f$ \mathbf{x} = \left( x_1, \hdots, x_n \right) \f$.
+* \param dst Destination points \f$ \mathbf{y} = \left( y_1, \hdots, y_n \right) \f$.
+* \param with_scaling Sets \f$ c=1 \f$ when <code>false</code> is passed.
+* \return The homogeneous transformation
+* \f{align*}
+* T = \begin{bmatrix} c\mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}
+* \f}
+* minimizing the residual above. This transformation is always returned as an
+* Eigen::Matrix.
+*/
+template <typename Derived, typename OtherDerived>
+typename internal::umeyama_transform_matrix_type<Derived, OtherDerived>::type
+umeyama(const MatrixBase<Derived>& src, const MatrixBase<OtherDerived>& dst, bool with_scaling = true)
+{
+ typedef typename internal::umeyama_transform_matrix_type<Derived, OtherDerived>::type TransformationMatrixType;
+ typedef typename internal::traits<TransformationMatrixType>::Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Derived::Index Index;
+
+ EIGEN_STATIC_ASSERT(!NumTraits<Scalar>::IsComplex, NUMERIC_TYPE_MUST_BE_REAL)
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename internal::traits<OtherDerived>::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ enum { Dimension = EIGEN_SIZE_MIN_PREFER_DYNAMIC(Derived::RowsAtCompileTime, OtherDerived::RowsAtCompileTime) };
+
+ typedef Matrix<Scalar, Dimension, 1> VectorType;
+ typedef Matrix<Scalar, Dimension, Dimension> MatrixType;
+ typedef typename internal::plain_matrix_type_row_major<Derived>::type RowMajorMatrixType;
+
+ const Index m = src.rows(); // dimension
+ const Index n = src.cols(); // number of measurements
+
+ // required for demeaning ...
+ const RealScalar one_over_n = RealScalar(1) / static_cast<RealScalar>(n);
+
+ // computation of mean
+ const VectorType src_mean = src.rowwise().sum() * one_over_n;
+ const VectorType dst_mean = dst.rowwise().sum() * one_over_n;
+
+ // demeaning of src and dst points
+ const RowMajorMatrixType src_demean = src.colwise() - src_mean;
+ const RowMajorMatrixType dst_demean = dst.colwise() - dst_mean;
+
+ // Eq. (36)-(37)
+ const Scalar src_var = src_demean.rowwise().squaredNorm().sum() * one_over_n;
+
+ // Eq. (38)
+ const MatrixType sigma = one_over_n * dst_demean * src_demean.transpose();
+
+ JacobiSVD<MatrixType> svd(sigma, ComputeFullU | ComputeFullV);
+
+ // Initialize the resulting transformation with an identity matrix...
+ TransformationMatrixType Rt = TransformationMatrixType::Identity(m+1,m+1);
+
+ // Eq. (39)
+ VectorType S = VectorType::Ones(m);
+ if (sigma.determinant()<Scalar(0)) S(m-1) = Scalar(-1);
+
+ // Eq. (40) and (43)
+ const VectorType& d = svd.singularValues();
+ Index rank = 0; for (Index i=0; i<m; ++i) if (!internal::isMuchSmallerThan(d.coeff(i),d.coeff(0))) ++rank;
+ if (rank == m-1) {
+ if ( svd.matrixU().determinant() * svd.matrixV().determinant() > Scalar(0) ) {
+ Rt.block(0,0,m,m).noalias() = svd.matrixU()*svd.matrixV().transpose();
+ } else {
+ const Scalar s = S(m-1); S(m-1) = Scalar(-1);
+ Rt.block(0,0,m,m).noalias() = svd.matrixU() * S.asDiagonal() * svd.matrixV().transpose();
+ S(m-1) = s;
+ }
+ } else {
+ Rt.block(0,0,m,m).noalias() = svd.matrixU() * S.asDiagonal() * svd.matrixV().transpose();
+ }
+
+ if (with_scaling)
+ {
+ // Eq. (42)
+ const Scalar c = Scalar(1)/src_var * svd.singularValues().dot(S);
+
+ // Eq. (41)
+ Rt.col(m).head(m) = dst_mean;
+ Rt.col(m).head(m).noalias() -= c*Rt.topLeftCorner(m,m)*src_mean;
+ Rt.block(0,0,m,m) *= c;
+ }
+ else
+ {
+ Rt.col(m).head(m) = dst_mean;
+ Rt.col(m).head(m).noalias() -= Rt.topLeftCorner(m,m)*src_mean;
+ }
+
+ return Rt;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_UMEYAMA_H
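
A sketch of calling umeyama() on two 3xN point sets stored one point per column, as the documentation above assumes (all names and values are illustrative):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      using namespace Eigen;
      const int n = 100;
      Matrix3Xd src = Matrix3Xd::Random(3, n);
      // Build a known similarity transform: scale, rotation and translation.
      double scale = 1.5;
      Matrix3d R = AngleAxisd(0.3, Vector3d::UnitY()).toRotationMatrix();
      Vector3d t(0.5, -1.0, 2.0);
      Matrix3Xd dst = (scale * R * src).colwise() + t;
      // Recover the 4x4 homogeneous transform; its top-left 3x3 block is c*R.
      Matrix4d T = umeyama(src, dst, /*with_scaling=*/true);
      std::cout << T << std::endl;
      return 0;
    }
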
diff --git a/third_party/eigen3/Eigen/src/Geometry/arch/Geometry_SSE.h b/third_party/eigen3/Eigen/src/Geometry/arch/Geometry_SSE.h
new file mode 100644
index 0000000000..3d8284f2d0
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Geometry/arch/Geometry_SSE.h
@@ -0,0 +1,115 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Rohit Garg <rpg.314@gmail.com>
+// Copyright (C) 2009-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_GEOMETRY_SSE_H
+#define EIGEN_GEOMETRY_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<class Derived, class OtherDerived>
+struct quat_product<Architecture::SSE, Derived, OtherDerived, float, Aligned>
+{
+ static inline Quaternion<float> run(const QuaternionBase<Derived>& _a, const QuaternionBase<OtherDerived>& _b)
+ {
+ const __m128 mask = _mm_castsi128_ps(_mm_setr_epi32(0,0,0,0x80000000));
+ Quaternion<float> res;
+ __m128 a = _a.coeffs().template packet<Aligned>(0);
+ __m128 b = _b.coeffs().template packet<Aligned>(0);
+ __m128 flip1 = _mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a,1,2,0,2),
+ vec4f_swizzle1(b,2,0,1,2)),mask);
+ __m128 flip2 = _mm_xor_ps(_mm_mul_ps(vec4f_swizzle1(a,3,3,3,1),
+ vec4f_swizzle1(b,0,1,2,1)),mask);
+ pstore(&res.x(),
+ _mm_add_ps(_mm_sub_ps(_mm_mul_ps(a,vec4f_swizzle1(b,3,3,3,3)),
+ _mm_mul_ps(vec4f_swizzle1(a,2,0,1,0),
+ vec4f_swizzle1(b,1,2,0,0))),
+ _mm_add_ps(flip1,flip2)));
+ return res;
+ }
+};
+
+template<typename VectorLhs,typename VectorRhs>
+struct cross3_impl<Architecture::SSE,VectorLhs,VectorRhs,float,true>
+{
+ static inline typename plain_matrix_type<VectorLhs>::type
+ run(const VectorLhs& lhs, const VectorRhs& rhs)
+ {
+ __m128 a = lhs.template packet<VectorLhs::Flags&AlignedBit ? Aligned : Unaligned>(0);
+ __m128 b = rhs.template packet<VectorRhs::Flags&AlignedBit ? Aligned : Unaligned>(0);
+ __m128 mul1=_mm_mul_ps(vec4f_swizzle1(a,1,2,0,3),vec4f_swizzle1(b,2,0,1,3));
+ __m128 mul2=_mm_mul_ps(vec4f_swizzle1(a,2,0,1,3),vec4f_swizzle1(b,1,2,0,3));
+ typename plain_matrix_type<VectorLhs>::type res;
+ pstore(&res.x(),_mm_sub_ps(mul1,mul2));
+ return res;
+ }
+};
+
+
+
+
+template<class Derived, class OtherDerived>
+struct quat_product<Architecture::SSE, Derived, OtherDerived, double, Aligned>
+{
+ static inline Quaternion<double> run(const QuaternionBase<Derived>& _a, const QuaternionBase<OtherDerived>& _b)
+ {
+ const Packet2d mask = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
+
+ Quaternion<double> res;
+
+ const double* a = _a.coeffs().data();
+ Packet2d b_xy = _b.coeffs().template packet<Aligned>(0);
+ Packet2d b_zw = _b.coeffs().template packet<Aligned>(2);
+ Packet2d a_xx = pset1<Packet2d>(a[0]);
+ Packet2d a_yy = pset1<Packet2d>(a[1]);
+ Packet2d a_zz = pset1<Packet2d>(a[2]);
+ Packet2d a_ww = pset1<Packet2d>(a[3]);
+
+ // two temporaries:
+ Packet2d t1, t2;
+
+ /*
+ * t1 = ww*xy + yy*zw
+ * t2 = zz*xy - xx*zw
+ * res.xy = t1 +/- swap(t2)
+ */
+ t1 = padd(pmul(a_ww, b_xy), pmul(a_yy, b_zw));
+ t2 = psub(pmul(a_zz, b_xy), pmul(a_xx, b_zw));
+#ifdef EIGEN_VECTORIZE_SSE3
+ EIGEN_UNUSED_VARIABLE(mask)
+ pstore(&res.x(), _mm_addsub_pd(t1, preverse(t2)));
+#else
+ pstore(&res.x(), padd(t1, pxor(mask,preverse(t2))));
+#endif
+
+ /*
+ * t1 = ww*zw - yy*xy
+ * t2 = zz*zw + xx*xy
+ * res.zw = t1 -/+ swap(t2) = swap( swap(t1) +/- t2)
+ */
+ t1 = psub(pmul(a_ww, b_zw), pmul(a_yy, b_xy));
+ t2 = padd(pmul(a_zz, b_zw), pmul(a_xx, b_xy));
+#ifdef EIGEN_VECTORIZE_SSE3
+ EIGEN_UNUSED_VARIABLE(mask)
+ pstore(&res.z(), preverse(_mm_addsub_pd(preverse(t1), t2)));
+#else
+ pstore(&res.z(), psub(t1, pxor(mask,preverse(t2))));
+#endif
+
+ return res;
+}
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_GEOMETRY_SSE_H
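
These specializations are selected through Eigen's architecture dispatch; user code simply multiplies quaternions. A minimal sketch (whether the SSE kernel above is actually used depends on the compile-time vectorization flags and on the operands being vectorizable/aligned):

    #include <Eigen/Geometry>
    #include <iostream>

    int main() {
      using namespace Eigen;
      Quaternionf a(AngleAxisf(0.3f, Vector3f::UnitX()));
      Quaternionf b(AngleAxisf(0.7f, Vector3f::UnitZ()));
      Quaternionf c = a * b;  // quaternion composition
      std::cout << c.coeffs().transpose() << std::endl;
      return 0;
    }
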
diff --git a/third_party/eigen3/Eigen/src/Householder/BlockHouseholder.h b/third_party/eigen3/Eigen/src/Householder/BlockHouseholder.h
new file mode 100644
index 0000000000..60dbea5f56
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Householder/BlockHouseholder.h
@@ -0,0 +1,68 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Vincent Lejeune
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BLOCK_HOUSEHOLDER_H
+#define EIGEN_BLOCK_HOUSEHOLDER_H
+
+// This file contains some helper function to deal with block householder reflectors
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal */
+template<typename TriangularFactorType,typename VectorsType,typename CoeffsType>
+void make_block_householder_triangular_factor(TriangularFactorType& triFactor, const VectorsType& vectors, const CoeffsType& hCoeffs)
+{
+ typedef typename TriangularFactorType::Index Index;
+ typedef typename VectorsType::Scalar Scalar;
+ const Index nbVecs = vectors.cols();
+ eigen_assert(triFactor.rows() == nbVecs && triFactor.cols() == nbVecs && vectors.rows()>=nbVecs);
+
+ for(Index i = 0; i < nbVecs; i++)
+ {
+ Index rs = vectors.rows() - i;
+ Scalar Vii = vectors(i,i);
+ vectors.const_cast_derived().coeffRef(i,i) = Scalar(1);
+ triFactor.col(i).head(i).noalias() = -hCoeffs(i) * vectors.block(i, 0, rs, i).adjoint()
+ * vectors.col(i).tail(rs);
+ vectors.const_cast_derived().coeffRef(i, i) = Vii;
+ // FIXME add .noalias() once the triangular product can work inplace
+ triFactor.col(i).head(i) = triFactor.block(0,0,i,i).template triangularView<Upper>()
+ * triFactor.col(i).head(i);
+ triFactor(i,i) = hCoeffs(i);
+ }
+}
+
+/** \internal */
+template<typename MatrixType,typename VectorsType,typename CoeffsType>
+void apply_block_householder_on_the_left(MatrixType& mat, const VectorsType& vectors, const CoeffsType& hCoeffs)
+{
+ typedef typename MatrixType::Index Index;
+ enum { TFactorSize = MatrixType::ColsAtCompileTime };
+ Index nbVecs = vectors.cols();
+ Matrix<typename MatrixType::Scalar, TFactorSize, TFactorSize, ColMajor> T(nbVecs,nbVecs);
+ make_block_householder_triangular_factor(T, vectors, hCoeffs);
+
+ const TriangularView<const VectorsType, UnitLower>& V(vectors);
+
+ // A -= V T V^* A
+ Matrix<typename MatrixType::Scalar,VectorsType::ColsAtCompileTime,MatrixType::ColsAtCompileTime,0,
+ VectorsType::MaxColsAtCompileTime,MatrixType::MaxColsAtCompileTime> tmp = V.adjoint() * mat;
+ // FIXME add .noalias() once the triangular product can work inplace
+ tmp = T.template triangularView<Upper>().adjoint() * tmp;
+ mat.noalias() -= V * tmp;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BLOCK_HOUSEHOLDER_H
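
For reference, the identity these helpers rely on is the standard compact WY representation (the notation below is a sketch, not Eigen's): with V holding the essential Householder vectors as unit lower-trapezoidal columns and T the upper triangular factor built by make_block_householder_triangular_factor,

    H_0 H_1 \cdots H_{k-1} \;=\; I - V\,T\,V^*

so applying such a block to a matrix A (here, its adjoint, since the code multiplies by T.adjoint()) reduces to two products with the tall-skinny V plus one small triangular product:

    A \;\leftarrow\; A - V\,\bigl(T^{*}\,(V^{*}A)\bigr)

which is exactly the sequence of three assignments in apply_block_householder_on_the_left.
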
diff --git a/third_party/eigen3/Eigen/src/Householder/Householder.h b/third_party/eigen3/Eigen/src/Householder/Householder.h
new file mode 100644
index 0000000000..32112af9bf
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Householder/Householder.h
@@ -0,0 +1,171 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_HOUSEHOLDER_H
+#define EIGEN_HOUSEHOLDER_H
+
+namespace Eigen {
+
+namespace internal {
+template<int n> struct decrement_size
+{
+ enum {
+ ret = n==Dynamic ? n : n-1
+ };
+};
+}
+
+/** Computes the elementary reflector H such that:
+ * \f$ H *this = [ beta 0 ... 0]^T \f$
+ * where the transformation H is:
+ * \f$ H = I - tau v v^*\f$
+ * and the vector v is:
+ * \f$ v^T = [1 essential^T] \f$
+ *
+ * The essential part of the vector \c v is stored in *this.
+ *
+ * On output:
+ * \param tau the scaling factor of the Householder transformation
+ * \param beta the result of H * \c *this
+ *
+ * \sa MatrixBase::makeHouseholder(), MatrixBase::applyHouseholderOnTheLeft(),
+ * MatrixBase::applyHouseholderOnTheRight()
+ */
+template<typename Derived>
+void MatrixBase<Derived>::makeHouseholderInPlace(Scalar& tau, RealScalar& beta)
+{
+ VectorBlock<Derived, internal::decrement_size<Base::SizeAtCompileTime>::ret> essentialPart(derived(), 1, size()-1);
+ makeHouseholder(essentialPart, tau, beta);
+}
+
+/** Computes the elementary reflector H such that:
+ * \f$ H *this = [ beta 0 ... 0]^T \f$
+ * where the transformation H is:
+ * \f$ H = I - tau v v^*\f$
+ * and the vector v is:
+ * \f$ v^T = [1 essential^T] \f$
+ *
+ * On output:
+ * \param essential the essential part of the vector \c v
+ * \param tau the scaling factor of the Householder transformation
+ * \param beta the result of H * \c *this
+ *
+ * \sa MatrixBase::makeHouseholderInPlace(), MatrixBase::applyHouseholderOnTheLeft(),
+ * MatrixBase::applyHouseholderOnTheRight()
+ */
+template<typename Derived>
+template<typename EssentialPart>
+void MatrixBase<Derived>::makeHouseholder(
+ EssentialPart& essential,
+ Scalar& tau,
+ RealScalar& beta) const
+{
+ using std::sqrt;
+ using numext::conj;
+
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(EssentialPart)
+ VectorBlock<const Derived, EssentialPart::SizeAtCompileTime> tail(derived(), 1, size()-1);
+
+ RealScalar tailSqNorm = size()==1 ? RealScalar(0) : tail.squaredNorm();
+ Scalar c0 = coeff(0);
+
+ if(tailSqNorm == RealScalar(0) && numext::imag(c0)==RealScalar(0))
+ {
+ tau = RealScalar(0);
+ beta = numext::real(c0);
+ essential.setZero();
+ }
+ else
+ {
+ beta = sqrt(numext::abs2(c0) + tailSqNorm);
+ if (numext::real(c0)>=RealScalar(0))
+ beta = -beta;
+ essential = tail / (c0 - beta);
+ tau = conj((beta - c0) / beta);
+ }
+}
+
+/** Apply the elementary reflector H given by
+ * \f$ H = I - tau v v^*\f$
+ * with
+ * \f$ v^T = [1 essential^T] \f$
+ * from the left to a vector or matrix.
+ *
+ * On input:
+ * \param essential the essential part of the vector \c v
+ * \param tau the scaling factor of the Householder transformation
+ * \param workspace a pointer to working space with at least
+ * this->cols() * essential.size() entries
+ *
+ * \sa MatrixBase::makeHouseholder(), MatrixBase::makeHouseholderInPlace(),
+ * MatrixBase::applyHouseholderOnTheRight()
+ */
+template<typename Derived>
+template<typename EssentialPart>
+void MatrixBase<Derived>::applyHouseholderOnTheLeft(
+ const EssentialPart& essential,
+ const Scalar& tau,
+ Scalar* workspace)
+{
+ if(rows() == 1)
+ {
+ *this *= Scalar(1)-tau;
+ }
+ else
+ {
+ Map<typename internal::plain_row_type<PlainObject>::type> tmp(workspace,cols());
+ Block<Derived, EssentialPart::SizeAtCompileTime, Derived::ColsAtCompileTime> bottom(derived(), 1, 0, rows()-1, cols());
+ tmp.noalias() = essential.adjoint() * bottom;
+ tmp += this->row(0);
+ this->row(0) -= tau * tmp;
+ bottom.noalias() -= tau * essential * tmp;
+ }
+}
+
+/** Apply the elementary reflector H given by
+ * \f$ H = I - tau v v^*\f$
+ * with
+ * \f$ v^T = [1 essential^T] \f$
+ * from the right to a vector or matrix.
+ *
+ * On input:
+ * \param essential the essential part of the vector \c v
+ * \param tau the scaling factor of the Householder transformation
+ * \param workspace a pointer to working space with at least
+ * this->cols() * essential.size() entries
+ *
+ * \sa MatrixBase::makeHouseholder(), MatrixBase::makeHouseholderInPlace(),
+ * MatrixBase::applyHouseholderOnTheLeft()
+ */
+template<typename Derived>
+template<typename EssentialPart>
+void MatrixBase<Derived>::applyHouseholderOnTheRight(
+ const EssentialPart& essential,
+ const Scalar& tau,
+ Scalar* workspace)
+{
+ if(cols() == 1)
+ {
+ *this *= Scalar(1)-tau;
+ }
+ else
+ {
+ Map<typename internal::plain_col_type<PlainObject>::type> tmp(workspace,rows());
+ Block<Derived, Derived::RowsAtCompileTime, EssentialPart::SizeAtCompileTime> right(derived(), 0, 1, rows(), cols()-1);
+ tmp.noalias() = right * essential.conjugate();
+ tmp += this->col(0);
+ this->col(0) -= tau * tmp;
+ right.noalias() -= tau * tmp * essential.transpose();
+ }
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_HOUSEHOLDER_H
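
A sketch of the typical call pattern for these primitives: build a reflector from a column so that everything below its first entry becomes zero, then apply it to a matrix from the left (the workspace length A.cols() matches the single temporary row the implementation maps):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      using namespace Eigen;
      MatrixXd A = MatrixXd::Random(5, 3);

      // Build a reflector H from the first column, so that H*A(:,0) = [beta, 0, ..., 0]^T.
      VectorXd col0 = A.col(0);
      double tau, beta;
      col0.makeHouseholderInPlace(tau, beta);  // essential part is now stored in col0.tail(4)

      VectorXd workspace(A.cols());            // one temporary row
      A.applyHouseholderOnTheLeft(col0.tail(4), tau, workspace.data());

      std::cout << A.col(0).transpose() << std::endl;  // ~ [beta, 0, 0, 0, 0]
      return 0;
    }
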
diff --git a/third_party/eigen3/Eigen/src/Householder/HouseholderSequence.h b/third_party/eigen3/Eigen/src/Householder/HouseholderSequence.h
new file mode 100644
index 0000000000..d800ca1fa4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Householder/HouseholderSequence.h
@@ -0,0 +1,441 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_HOUSEHOLDER_SEQUENCE_H
+#define EIGEN_HOUSEHOLDER_SEQUENCE_H
+
+namespace Eigen {
+
+/** \ingroup Householder_Module
+ * \householder_module
+ * \class HouseholderSequence
+ * \brief Sequence of Householder reflections acting on subspaces with decreasing size
+ * \tparam VectorsType type of matrix containing the Householder vectors
+ * \tparam CoeffsType type of vector containing the Householder coefficients
+ * \tparam Side either OnTheLeft (the default) or OnTheRight
+ *
+ * This class represents a product sequence of Householder reflections where the first Householder reflection
+ * acts on the whole space, the second Householder reflection leaves the one-dimensional subspace spanned by
+ * the first unit vector invariant, the third Householder reflection leaves the two-dimensional subspace
+ * spanned by the first two unit vectors invariant, and so on up to the last reflection which leaves all but
+ * one dimensions invariant and acts only on the last dimension. Such sequences of Householder reflections
+ * are used in several algorithms to zero out certain parts of a matrix. Indeed, the methods
+ * HessenbergDecomposition::matrixQ(), Tridiagonalization::matrixQ(), HouseholderQR::householderQ(),
+ * and ColPivHouseholderQR::householderQ() all return a %HouseholderSequence.
+ *
+ * More precisely, the class %HouseholderSequence represents an \f$ n \times n \f$ matrix \f$ H \f$ of the
+ * form \f$ H = \prod_{i=0}^{n-1} H_i \f$ where the i-th Householder reflection is \f$ H_i = I - h_i v_i
+ * v_i^* \f$. The i-th Householder coefficient \f$ h_i \f$ is a scalar and the i-th Householder vector \f$
+ * v_i \f$ is a vector of the form
+ * \f[
+ * v_i = [\underbrace{0, \ldots, 0}_{i-1\mbox{ zeros}}, 1, \underbrace{*, \ldots,*}_{n-i\mbox{ arbitrary entries}} ].
+ * \f]
+ * The last \f$ n-i \f$ entries of \f$ v_i \f$ are called the essential part of the Householder vector.
+ *
+ * Typical usages are listed below, where H is a HouseholderSequence:
+ * \code
+ * A.applyOnTheRight(H); // A = A * H
+ * A.applyOnTheLeft(H); // A = H * A
+ * A.applyOnTheRight(H.adjoint()); // A = A * H^*
+ * A.applyOnTheLeft(H.adjoint()); // A = H^* * A
+ * MatrixXd Q = H; // conversion to a dense matrix
+ * \endcode
+ * In addition to the adjoint, you can also apply the inverse (=adjoint), the transpose, and the conjugate operators.
+ *
+ * See the documentation for HouseholderSequence(const VectorsType&, const CoeffsType&) for an example.
+ *
+ * \sa MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+
+namespace internal {
+
+template<typename VectorsType, typename CoeffsType, int Side>
+struct traits<HouseholderSequence<VectorsType,CoeffsType,Side> >
+{
+ typedef typename VectorsType::Scalar Scalar;
+ typedef typename VectorsType::Index Index;
+ typedef typename VectorsType::StorageKind StorageKind;
+ enum {
+ RowsAtCompileTime = Side==OnTheLeft ? traits<VectorsType>::RowsAtCompileTime
+ : traits<VectorsType>::ColsAtCompileTime,
+ ColsAtCompileTime = RowsAtCompileTime,
+ MaxRowsAtCompileTime = Side==OnTheLeft ? traits<VectorsType>::MaxRowsAtCompileTime
+ : traits<VectorsType>::MaxColsAtCompileTime,
+ MaxColsAtCompileTime = MaxRowsAtCompileTime,
+ Flags = 0
+ };
+};
+
+template<typename VectorsType, typename CoeffsType, int Side>
+struct hseq_side_dependent_impl
+{
+ typedef Block<const VectorsType, Dynamic, 1> EssentialVectorType;
+ typedef HouseholderSequence<VectorsType, CoeffsType, OnTheLeft> HouseholderSequenceType;
+ typedef typename VectorsType::Index Index;
+ static inline const EssentialVectorType essentialVector(const HouseholderSequenceType& h, Index k)
+ {
+ Index start = k+1+h.m_shift;
+ return Block<const VectorsType,Dynamic,1>(h.m_vectors, start, k, h.rows()-start, 1);
+ }
+};
+
+template<typename VectorsType, typename CoeffsType>
+struct hseq_side_dependent_impl<VectorsType, CoeffsType, OnTheRight>
+{
+ typedef Transpose<Block<const VectorsType, 1, Dynamic> > EssentialVectorType;
+ typedef HouseholderSequence<VectorsType, CoeffsType, OnTheRight> HouseholderSequenceType;
+ typedef typename VectorsType::Index Index;
+ static inline const EssentialVectorType essentialVector(const HouseholderSequenceType& h, Index k)
+ {
+ Index start = k+1+h.m_shift;
+ return Block<const VectorsType,1,Dynamic>(h.m_vectors, k, start, 1, h.rows()-start).transpose();
+ }
+};
+
+template<typename OtherScalarType, typename MatrixType> struct matrix_type_times_scalar_type
+{
+ typedef typename scalar_product_traits<OtherScalarType, typename MatrixType::Scalar>::ReturnType
+ ResultScalar;
+ typedef Matrix<ResultScalar, MatrixType::RowsAtCompileTime, MatrixType::ColsAtCompileTime,
+ 0, MatrixType::MaxRowsAtCompileTime, MatrixType::MaxColsAtCompileTime> Type;
+};
+
+} // end namespace internal
+
+template<typename VectorsType, typename CoeffsType, int Side> class HouseholderSequence
+ : public EigenBase<HouseholderSequence<VectorsType,CoeffsType,Side> >
+{
+ typedef typename internal::hseq_side_dependent_impl<VectorsType,CoeffsType,Side>::EssentialVectorType EssentialVectorType;
+
+ public:
+ enum {
+ RowsAtCompileTime = internal::traits<HouseholderSequence>::RowsAtCompileTime,
+ ColsAtCompileTime = internal::traits<HouseholderSequence>::ColsAtCompileTime,
+ MaxRowsAtCompileTime = internal::traits<HouseholderSequence>::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = internal::traits<HouseholderSequence>::MaxColsAtCompileTime
+ };
+ typedef typename internal::traits<HouseholderSequence>::Scalar Scalar;
+ typedef typename VectorsType::Index Index;
+
+ typedef HouseholderSequence<
+ typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ typename internal::remove_all<typename VectorsType::ConjugateReturnType>::type,
+ VectorsType>::type,
+ typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ typename internal::remove_all<typename CoeffsType::ConjugateReturnType>::type,
+ CoeffsType>::type,
+ Side
+ > ConjugateReturnType;
+
+ /** \brief Constructor.
+ * \param[in] v %Matrix containing the essential parts of the Householder vectors
+ * \param[in] h Vector containing the Householder coefficients
+ *
+ * Constructs the Householder sequence with coefficients given by \p h and vectors given by \p v. The
+ * i-th Householder coefficient \f$ h_i \f$ is given by \p h(i) and the essential part of the i-th
+ * Householder vector \f$ v_i \f$ is given by \p v(k,i) with \p k > \p i (the subdiagonal part of the
+ * i-th column). If \p v has fewer columns than rows, then the Householder sequence contains as many
+ * Householder reflections as there are columns.
+ *
+ * \note The %HouseholderSequence object stores \p v and \p h by reference.
+ *
+ * Example: \include HouseholderSequence_HouseholderSequence.cpp
+ * Output: \verbinclude HouseholderSequence_HouseholderSequence.out
+ *
+ * \sa setLength(), setShift()
+ */
+ HouseholderSequence(const VectorsType& v, const CoeffsType& h)
+ : m_vectors(v), m_coeffs(h), m_trans(false), m_length(v.diagonalSize()),
+ m_shift(0)
+ {
+ }
+
+ /** \brief Copy constructor. */
+ HouseholderSequence(const HouseholderSequence& other)
+ : m_vectors(other.m_vectors),
+ m_coeffs(other.m_coeffs),
+ m_trans(other.m_trans),
+ m_length(other.m_length),
+ m_shift(other.m_shift)
+ {
+ }
+
+ /** \brief Number of rows of transformation viewed as a matrix.
+ * \returns Number of rows
+ * \details This equals the dimension of the space that the transformation acts on.
+ */
+ Index rows() const { return Side==OnTheLeft ? m_vectors.rows() : m_vectors.cols(); }
+
+ /** \brief Number of columns of transformation viewed as a matrix.
+ * \returns Number of columns
+ * \details This equals the dimension of the space that the transformation acts on.
+ */
+ Index cols() const { return rows(); }
+
+ /** \brief Essential part of a Householder vector.
+ * \param[in] k Index of Householder reflection
+ * \returns Vector containing non-trivial entries of k-th Householder vector
+ *
+ * This function returns the essential part of the Householder vector \f$ v_i \f$. This is a vector of
+ * length \f$ n-i \f$ containing the last \f$ n-i \f$ entries of the vector
+ * \f[
+ * v_i = [\underbrace{0, \ldots, 0}_{i-1\mbox{ zeros}}, 1, \underbrace{*, \ldots,*}_{n-i\mbox{ arbitrary entries}} ].
+ * \f]
+ * The index \f$ i \f$ equals \p k + shift(), corresponding to the k-th column of the matrix \p v
+ * passed to the constructor.
+ *
+ * \sa setShift(), shift()
+ */
+ const EssentialVectorType essentialVector(Index k) const
+ {
+ eigen_assert(k >= 0 && k < m_length);
+ return internal::hseq_side_dependent_impl<VectorsType,CoeffsType,Side>::essentialVector(*this, k);
+ }
+
+ /** \brief %Transpose of the Householder sequence. */
+ HouseholderSequence transpose() const
+ {
+ return HouseholderSequence(*this).setTrans(!m_trans);
+ }
+
+ /** \brief Complex conjugate of the Householder sequence. */
+ ConjugateReturnType conjugate() const
+ {
+ return ConjugateReturnType(m_vectors.conjugate(), m_coeffs.conjugate())
+ .setTrans(m_trans)
+ .setLength(m_length)
+ .setShift(m_shift);
+ }
+
+ /** \brief Adjoint (conjugate transpose) of the Householder sequence. */
+ ConjugateReturnType adjoint() const
+ {
+ return conjugate().setTrans(!m_trans);
+ }
+
+ /** \brief Inverse of the Householder sequence (equals the adjoint). */
+ ConjugateReturnType inverse() const { return adjoint(); }
+
+ /** \internal */
+ template<typename DestType> inline void evalTo(DestType& dst) const
+ {
+ Matrix<Scalar, DestType::RowsAtCompileTime, 1,
+ AutoAlign|ColMajor, DestType::MaxRowsAtCompileTime, 1> workspace(rows());
+ evalTo(dst, workspace);
+ }
+
+ /** \internal */
+ template<typename Dest, typename Workspace>
+ void evalTo(Dest& dst, Workspace& workspace) const
+ {
+ workspace.resize(rows());
+ Index vecs = m_length;
+ if( internal::is_same<typename internal::remove_all<VectorsType>::type,Dest>::value
+ && internal::extract_data(dst) == internal::extract_data(m_vectors))
+ {
+ // in-place
+ dst.diagonal().setOnes();
+ dst.template triangularView<StrictlyUpper>().setZero();
+ for(Index k = vecs-1; k >= 0; --k)
+ {
+ Index cornerSize = rows() - k - m_shift;
+ if(m_trans)
+ dst.bottomRightCorner(cornerSize, cornerSize)
+ .applyHouseholderOnTheRight(essentialVector(k), m_coeffs.coeff(k), workspace.data());
+ else
+ dst.bottomRightCorner(cornerSize, cornerSize)
+ .applyHouseholderOnTheLeft(essentialVector(k), m_coeffs.coeff(k), workspace.data());
+
+ // clear the off diagonal vector
+ dst.col(k).tail(rows()-k-1).setZero();
+ }
+ // clear the remaining columns if needed
+ for(Index k = 0; k<cols()-vecs ; ++k)
+ dst.col(k).tail(rows()-k-1).setZero();
+ }
+ else
+ {
+ dst.setIdentity(rows(), rows());
+ for(Index k = vecs-1; k >= 0; --k)
+ {
+ Index cornerSize = rows() - k - m_shift;
+ if(m_trans)
+ dst.bottomRightCorner(cornerSize, cornerSize)
+ .applyHouseholderOnTheRight(essentialVector(k), m_coeffs.coeff(k), &workspace.coeffRef(0));
+ else
+ dst.bottomRightCorner(cornerSize, cornerSize)
+ .applyHouseholderOnTheLeft(essentialVector(k), m_coeffs.coeff(k), &workspace.coeffRef(0));
+ }
+ }
+ }
+
+ /** \internal */
+ template<typename Dest> inline void applyThisOnTheRight(Dest& dst) const
+ {
+ Matrix<Scalar,1,Dest::RowsAtCompileTime,RowMajor,1,Dest::MaxRowsAtCompileTime> workspace(dst.rows());
+ applyThisOnTheRight(dst, workspace);
+ }
+
+ /** \internal */
+ template<typename Dest, typename Workspace>
+ inline void applyThisOnTheRight(Dest& dst, Workspace& workspace) const
+ {
+ workspace.resize(dst.rows());
+ for(Index k = 0; k < m_length; ++k)
+ {
+ Index actual_k = m_trans ? m_length-k-1 : k;
+ dst.rightCols(rows()-m_shift-actual_k)
+ .applyHouseholderOnTheRight(essentialVector(actual_k), m_coeffs.coeff(actual_k), workspace.data());
+ }
+ }
+
+ /** \internal */
+ template<typename Dest> inline void applyThisOnTheLeft(Dest& dst) const
+ {
+ Matrix<Scalar,1,Dest::ColsAtCompileTime,RowMajor,1,Dest::MaxColsAtCompileTime> workspace(dst.cols());
+ applyThisOnTheLeft(dst, workspace);
+ }
+
+ /** \internal */
+ template<typename Dest, typename Workspace>
+ inline void applyThisOnTheLeft(Dest& dst, Workspace& workspace) const
+ {
+ workspace.resize(dst.cols());
+ for(Index k = 0; k < m_length; ++k)
+ {
+ Index actual_k = m_trans ? k : m_length-k-1;
+ dst.bottomRows(rows()-m_shift-actual_k)
+ .applyHouseholderOnTheLeft(essentialVector(actual_k), m_coeffs.coeff(actual_k), workspace.data());
+ }
+ }
+
+ /** \brief Computes the product of a Householder sequence with a matrix.
+ * \param[in] other %Matrix being multiplied.
+ * \returns Expression object representing the product.
+ *
+ * This function computes \f$ HM \f$ where \f$ H \f$ is the Householder sequence represented by \p *this
+ * and \f$ M \f$ is the matrix \p other.
+ */
+ template<typename OtherDerived>
+ typename internal::matrix_type_times_scalar_type<Scalar, OtherDerived>::Type operator*(const MatrixBase<OtherDerived>& other) const
+ {
+ typename internal::matrix_type_times_scalar_type<Scalar, OtherDerived>::Type
+ res(other.template cast<typename internal::matrix_type_times_scalar_type<Scalar,OtherDerived>::ResultScalar>());
+ applyThisOnTheLeft(res);
+ return res;
+ }
+
+ template<typename _VectorsType, typename _CoeffsType, int _Side> friend struct internal::hseq_side_dependent_impl;
+
+ /** \brief Sets the length of the Householder sequence.
+ * \param [in] length New value for the length.
+ *
+ * By default, the length \f$ n \f$ of the Householder sequence \f$ H = H_0 H_1 \ldots H_{n-1} \f$ is set
+ * to the number of columns of the matrix \p v passed to the constructor, or the number of rows if that
+ * is smaller. After this function is called, the length equals \p length.
+ *
+ * \sa length()
+ */
+ HouseholderSequence& setLength(Index length)
+ {
+ m_length = length;
+ return *this;
+ }
+
+ /** \brief Sets the shift of the Householder sequence.
+ * \param [in] shift New value for the shift.
+ *
+ * By default, a %HouseholderSequence object represents \f$ H = H_0 H_1 \ldots H_{n-1} \f$ and the i-th
+ * column of the matrix \p v passed to the constructor corresponds to the i-th Householder
+ * reflection. After this function is called, the object represents \f$ H = H_{\mathrm{shift}}
+ * H_{\mathrm{shift}+1} \ldots H_{n-1} \f$ and the i-th column of \p v corresponds to the (shift+i)-th
+ * Householder reflection.
+ *
+ * \sa shift()
+ */
+ HouseholderSequence& setShift(Index shift)
+ {
+ m_shift = shift;
+ return *this;
+ }
+
+ Index length() const { return m_length; } /**< \brief Returns the length of the Householder sequence. */
+ Index shift() const { return m_shift; } /**< \brief Returns the shift of the Householder sequence. */
+
+ /* Necessary for .adjoint() and .conjugate() */
+ template <typename VectorsType2, typename CoeffsType2, int Side2> friend class HouseholderSequence;
+
+ protected:
+
+ /** \brief Sets the transpose flag.
+ * \param [in] trans New value of the transpose flag.
+ *
+ * By default, the transpose flag is not set. If the transpose flag is set, then this object represents
+ * \f$ H^T = H_{n-1}^T \ldots H_1^T H_0^T \f$ instead of \f$ H = H_0 H_1 \ldots H_{n-1} \f$.
+ *
+ * \sa trans()
+ */
+ HouseholderSequence& setTrans(bool trans)
+ {
+ m_trans = trans;
+ return *this;
+ }
+
+ bool trans() const { return m_trans; } /**< \brief Returns the transpose flag. */
+
+ typename VectorsType::Nested m_vectors;
+ typename CoeffsType::Nested m_coeffs;
+ bool m_trans;
+ Index m_length;
+ Index m_shift;
+};
+
+/** \brief Computes the product of a matrix with a Householder sequence.
+ * \param[in] other %Matrix being multiplied.
+ * \param[in] h %HouseholderSequence being multiplied.
+ * \returns Expression object representing the product.
+ *
+ * This function computes \f$ MH \f$ where \f$ M \f$ is the matrix \p other and \f$ H \f$ is the
+ * Householder sequence represented by \p h.
+ */
+template<typename OtherDerived, typename VectorsType, typename CoeffsType, int Side>
+typename internal::matrix_type_times_scalar_type<typename VectorsType::Scalar,OtherDerived>::Type operator*(const MatrixBase<OtherDerived>& other, const HouseholderSequence<VectorsType,CoeffsType,Side>& h)
+{
+ typename internal::matrix_type_times_scalar_type<typename VectorsType::Scalar,OtherDerived>::Type
+ res(other.template cast<typename internal::matrix_type_times_scalar_type<typename VectorsType::Scalar,OtherDerived>::ResultScalar>());
+ h.applyThisOnTheRight(res);
+ return res;
+}
+
+/** \ingroup Householder_Module \householder_module
+ * \brief Convenience function for constructing a Householder sequence.
+ * \returns A HouseholderSequence constructed from the specified arguments.
+ */
+template<typename VectorsType, typename CoeffsType>
+HouseholderSequence<VectorsType,CoeffsType> householderSequence(const VectorsType& v, const CoeffsType& h)
+{
+ return HouseholderSequence<VectorsType,CoeffsType,OnTheLeft>(v, h);
+}
+
+/** \ingroup Householder_Module \householder_module
+ * \brief Convenience function for constructing a Householder sequence.
+ * \returns A HouseholderSequence constructed from the specified arguments.
+ * \details This function differs from householderSequence() in that the template argument \p OnTheSide of
+ * the constructed HouseholderSequence is set to OnTheRight, instead of the default OnTheLeft.
+ */
+template<typename VectorsType, typename CoeffsType>
+HouseholderSequence<VectorsType,CoeffsType,OnTheRight> rightHouseholderSequence(const VectorsType& v, const CoeffsType& h)
+{
+ return HouseholderSequence<VectorsType,CoeffsType,OnTheRight>(v, h);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_HOUSEHOLDER_SEQUENCE_H
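
In practice a HouseholderSequence is usually obtained from a decomposition rather than constructed by hand. A sketch using HouseholderQR, whose householderQ() the documentation above mentions:

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      using namespace Eigen;
      MatrixXd A = MatrixXd::Random(6, 4);
      HouseholderQR<MatrixXd> qr(A);

      MatrixXd Q = qr.householderQ();  // dense conversion of the HouseholderSequence
      MatrixXd R = qr.matrixQR().triangularView<Upper>();
      std::cout << (Q * R - A).norm() << std::endl;  // ~0

      // Apply Q without forming it explicitly:
      MatrixXd B = MatrixXd::Random(6, 2);
      B.applyOnTheLeft(qr.householderQ());  // B = Q * B
      return 0;
    }
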
diff --git a/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h
new file mode 100644
index 0000000000..1f3c060d02
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h
@@ -0,0 +1,149 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BASIC_PRECONDITIONERS_H
+#define EIGEN_BASIC_PRECONDITIONERS_H
+
+namespace Eigen {
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \brief A preconditioner based on the diagonal entries
+ *
+ * This class allows one to approximately solve A.x = b problems assuming A is a diagonal matrix.
+ * In other words, this preconditioner neglects all off-diagonal entries and, in Eigen's language, solves for:
+ * \code
+ * A.diagonal().asDiagonal() . x = b
+ * \endcode
+ *
+ * \tparam _Scalar the type of the scalar.
+ *
+ * This preconditioner is suitable for both selfadjoint and general problems.
+ * The diagonal entries are pre-inverted and stored into a dense vector.
+ *
+ * \note A variant that has yet to be implemented would attempt to preserve the norm of each column.
+ *
+ */
+template <typename _Scalar>
+class DiagonalPreconditioner
+{
+ typedef _Scalar Scalar;
+ typedef Matrix<Scalar,Dynamic,1> Vector;
+ typedef typename Vector::Index Index;
+
+ public:
+ // this typedef is only to export the scalar type and compile-time dimensions to solve_retval
+ typedef Matrix<Scalar,Dynamic,Dynamic> MatrixType;
+
+ DiagonalPreconditioner() : m_isInitialized(false) {}
+
+ template<typename MatType>
+ DiagonalPreconditioner(const MatType& mat) : m_invdiag(mat.cols())
+ {
+ compute(mat);
+ }
+
+ Index rows() const { return m_invdiag.size(); }
+ Index cols() const { return m_invdiag.size(); }
+
+ template<typename MatType>
+ DiagonalPreconditioner& analyzePattern(const MatType& )
+ {
+ return *this;
+ }
+
+ template<typename MatType>
+ DiagonalPreconditioner& factorize(const MatType& mat)
+ {
+ m_invdiag.resize(mat.cols());
+ for(int j=0; j<mat.outerSize(); ++j)
+ {
+ typename MatType::InnerIterator it(mat,j);
+ while(it && it.index()!=j) ++it;
+ if(it && it.index()==j && it.value()!=Scalar(0))
+ m_invdiag(j) = Scalar(1)/it.value();
+ else
+ m_invdiag(j) = Scalar(1);
+ }
+ m_isInitialized = true;
+ return *this;
+ }
+
+ template<typename MatType>
+ DiagonalPreconditioner& compute(const MatType& mat)
+ {
+ return factorize(mat);
+ }
+
+ template<typename Rhs, typename Dest>
+ void _solve(const Rhs& b, Dest& x) const
+ {
+ x = m_invdiag.array() * b.array() ;
+ }
+
+ template<typename Rhs> inline const internal::solve_retval<DiagonalPreconditioner, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "DiagonalPreconditioner is not initialized.");
+ eigen_assert(m_invdiag.size()==b.rows()
+ && "DiagonalPreconditioner::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<DiagonalPreconditioner, Rhs>(*this, b.derived());
+ }
+
+ protected:
+ Vector m_invdiag;
+ bool m_isInitialized;
+};
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<DiagonalPreconditioner<_MatrixType>, Rhs>
+ : solve_retval_base<DiagonalPreconditioner<_MatrixType>, Rhs>
+{
+ typedef DiagonalPreconditioner<_MatrixType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+}
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \brief A naive preconditioner which approximates any matrix as the identity matrix
+ *
+ * \sa class DiagonalPreconditioner
+ */
+class IdentityPreconditioner
+{
+ public:
+
+ IdentityPreconditioner() {}
+
+ template<typename MatrixType>
+ IdentityPreconditioner(const MatrixType& ) {}
+
+ template<typename MatrixType>
+ IdentityPreconditioner& analyzePattern(const MatrixType& ) { return *this; }
+
+ template<typename MatrixType>
+ IdentityPreconditioner& factorize(const MatrixType& ) { return *this; }
+
+ template<typename MatrixType>
+ IdentityPreconditioner& compute(const MatrixType& ) { return *this; }
+
+ template<typename Rhs>
+ inline const Rhs& solve(const Rhs& b) const { return b; }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_BASIC_PRECONDITIONERS_H
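
These preconditioners are normally passed to an iterative solver as a template argument. A sketch with ConjugateGradient on a small SPD tridiagonal system (illustrative setup; IdentityPreconditioner in the same slot would effectively disable preconditioning):

    #include <Eigen/Dense>
    #include <Eigen/Sparse>
    #include <iostream>
    #include <vector>

    int main() {
      using namespace Eigen;
      const int n = 100;
      std::vector<Triplet<double> > entries;
      for (int i = 0; i < n; ++i) {
        entries.push_back(Triplet<double>(i, i, 2.0));
        if (i + 1 < n) {
          entries.push_back(Triplet<double>(i, i + 1, -1.0));
          entries.push_back(Triplet<double>(i + 1, i, -1.0));
        }
      }
      SparseMatrix<double> A(n, n);
      A.setFromTriplets(entries.begin(), entries.end());
      VectorXd b = VectorXd::Ones(n);

      // Jacobi (diagonal) preconditioning.
      ConjugateGradient<SparseMatrix<double>, Lower, DiagonalPreconditioner<double> > cg;
      cg.compute(A);
      VectorXd x = cg.solve(b);
      std::cout << "#iterations: " << cg.iterations()
                << "  estimated error: " << cg.error() << std::endl;
      return 0;
    }
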
diff --git a/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h
new file mode 100644
index 0000000000..7a46b51fa6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/BiCGSTAB.h
@@ -0,0 +1,254 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BICGSTAB_H
+#define EIGEN_BICGSTAB_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal Low-level bi conjugate gradient stabilized algorithm
+ * \param mat The matrix A
+ * \param rhs The right hand side vector b
+ * \param x On input an initial solution, on output the computed solution.
+ * \param precond A preconditioner able to efficiently solve for an
+ *                approximation of Ax=b (regardless of b)
+ * \param iters On input the max number of iterations, on output the number of performed iterations.
+ * \param tol_error On input the tolerance error, on output an estimation of the relative error.
+ * \return false in the case of numerical issue, for example a break down of BiCGSTAB.
+ */
+template<typename MatrixType, typename Rhs, typename Dest, typename Preconditioner>
+bool bicgstab(const MatrixType& mat, const Rhs& rhs, Dest& x,
+ const Preconditioner& precond, int& iters,
+ typename Dest::RealScalar& tol_error)
+{
+ using std::sqrt;
+ using std::abs;
+ typedef typename Dest::RealScalar RealScalar;
+ typedef typename Dest::Scalar Scalar;
+ typedef Matrix<Scalar,Dynamic,1> VectorType;
+ RealScalar tol = tol_error;
+ int maxIters = iters;
+
+ int n = mat.cols();
+ x = precond.solve(x);
+ VectorType r = rhs - mat * x;
+ VectorType r0 = r;
+
+ RealScalar r0_sqnorm = r0.squaredNorm();
+ RealScalar rhs_sqnorm = rhs.squaredNorm();
+ if(rhs_sqnorm == 0)
+ {
+ x.setZero();
+ return true;
+ }
+ Scalar rho = 1;
+ Scalar alpha = 1;
+ Scalar w = 1;
+
+ VectorType v = VectorType::Zero(n), p = VectorType::Zero(n);
+ VectorType y(n), z(n);
+ VectorType kt(n), ks(n);
+
+ VectorType s(n), t(n);
+
+ RealScalar tol2 = tol*tol;
+ int i = 0;
+ int restarts = 0;
+
+ while ( r.squaredNorm()/rhs_sqnorm > tol2 && i<maxIters )
+ {
+ Scalar rho_old = rho;
+
+ rho = r0.dot(r);
+ if (internal::isMuchSmallerThan(rho,r0_sqnorm))
+ {
+      // The new residual vector became too orthogonal to the arbitrarily chosen direction r0
+ // Let's restart with a new r0:
+ r0 = r;
+ rho = r0_sqnorm = r.squaredNorm();
+ if(restarts++ == 0)
+ i = 0;
+ }
+ Scalar beta = (rho/rho_old) * (alpha / w);
+ p = r + beta * (p - w * v);
+
+ y = precond.solve(p);
+
+ v.noalias() = mat * y;
+
+ alpha = rho / r0.dot(v);
+ s = r - alpha * v;
+
+ z = precond.solve(s);
+ t.noalias() = mat * z;
+
+ RealScalar tmp = t.squaredNorm();
+ if(tmp>RealScalar(0))
+ w = t.dot(s) / tmp;
+ else
+ w = Scalar(0);
+ x += alpha * y + w * z;
+ r = s - w * t;
+ ++i;
+ }
+ tol_error = sqrt(r.squaredNorm()/rhs_sqnorm);
+ iters = i;
+ return true;
+}
+
+}
+
+template< typename _MatrixType,
+ typename _Preconditioner = DiagonalPreconditioner<typename _MatrixType::Scalar> >
+class BiCGSTAB;
+
+namespace internal {
+
+template< typename _MatrixType, typename _Preconditioner>
+struct traits<BiCGSTAB<_MatrixType,_Preconditioner> >
+{
+ typedef _MatrixType MatrixType;
+ typedef _Preconditioner Preconditioner;
+};
+
+}
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \brief A bi conjugate gradient stabilized solver for sparse square problems
+ *
+  * This class allows solving A.x = b sparse linear problems using a bi conjugate gradient
+ * stabilized algorithm. The vectors x and b can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, can be a dense or a sparse matrix.
+ * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner
+ *
+ * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations()
+ * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations
+ * and NumTraits<Scalar>::epsilon() for the tolerance.
+ *
+  * This class can be used in the same way as the direct solver classes. Here is a typical usage example:
+ * \include BiCGSTAB_simple.cpp
+ *
+ * By default the iterations start with x=0 as an initial guess of the solution.
+ * One can control the start using the solveWithGuess() method. Here is a step by
+ * step execution example starting with a random guess and printing the evolution
+ * of the estimated error:
+ * \include BiCGSTAB_step_by_step.cpp
+  * Note that such a step-by-step execution is slightly slower.
+ *
+ * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner
+ */
+template< typename _MatrixType, typename _Preconditioner>
+class BiCGSTAB : public IterativeSolverBase<BiCGSTAB<_MatrixType,_Preconditioner> >
+{
+ typedef IterativeSolverBase<BiCGSTAB> Base;
+ using Base::mp_matrix;
+ using Base::m_error;
+ using Base::m_iterations;
+ using Base::m_info;
+ using Base::m_isInitialized;
+public:
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef _Preconditioner Preconditioner;
+
+public:
+
+ /** Default constructor. */
+ BiCGSTAB() : Base() {}
+
+ /** Initialize the solver with matrix \a A for further \c Ax=b solving.
+ *
+ * This constructor is a shortcut for the default constructor followed
+ * by a call to compute().
+ *
+ * \warning this class stores a reference to the matrix A as well as some
+ * precomputed values that depend on it. Therefore, if \a A is changed
+ * this class becomes invalid. Call compute() to update it with the new
+ * matrix A, or modify a copy of A.
+ */
+ BiCGSTAB(const MatrixType& A) : Base(A) {}
+
+ ~BiCGSTAB() {}
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A
+    * and \a x0 as an initial solution.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs,typename Guess>
+ inline const internal::solve_retval_with_guess<BiCGSTAB, Rhs, Guess>
+ solveWithGuess(const MatrixBase<Rhs>& b, const Guess& x0) const
+ {
+ eigen_assert(m_isInitialized && "BiCGSTAB is not initialized.");
+ eigen_assert(Base::rows()==b.rows()
+ && "BiCGSTAB::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval_with_guess
+ <BiCGSTAB, Rhs, Guess>(*this, b.derived(), x0);
+ }
+
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solveWithGuess(const Rhs& b, Dest& x) const
+ {
+ bool failed = false;
+ for(int j=0; j<b.cols(); ++j)
+ {
+ m_iterations = Base::maxIterations();
+ m_error = Base::m_tolerance;
+
+ typename Dest::ColXpr xj(x,j);
+ if(!internal::bicgstab(*mp_matrix, b.col(j), xj, Base::m_preconditioner, m_iterations, m_error))
+ failed = true;
+ }
+ m_info = failed ? NumericalIssue
+ : m_error <= Base::m_tolerance ? Success
+ : NoConvergence;
+ m_isInitialized = true;
+ }
+
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solve(const Rhs& b, Dest& x) const
+ {
+// x.setZero();
+ x = b;
+ _solveWithGuess(b,x);
+ }
+
+protected:
+
+};
+
+
+namespace internal {
+
+ template<typename _MatrixType, typename _Preconditioner, typename Rhs>
+struct solve_retval<BiCGSTAB<_MatrixType, _Preconditioner>, Rhs>
+ : solve_retval_base<BiCGSTAB<_MatrixType, _Preconditioner>, Rhs>
+{
+ typedef BiCGSTAB<_MatrixType, _Preconditioner> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BICGSTAB_H
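
The class documentation above refers to BiCGSTAB_simple.cpp, which is not included in this patch. For reference, a minimal usage sketch along the same lines, assuming the file added here is reachable through the usual umbrella headers <Eigen/Sparse> and <Eigen/IterativeLinearSolvers>:

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>
    #include <iostream>

    int main() {
      const int n = 100;
      // Illustrative system: a nonsingular tridiagonal matrix and an all-ones right-hand side.
      Eigen::SparseMatrix<double> A(n, n);
      for (int i = 0; i < n; ++i) {
        A.insert(i, i) = 4.0;
        if (i > 0)     A.insert(i, i - 1) = -1.0;
        if (i + 1 < n) A.insert(i, i + 1) = -1.0;
      }
      A.makeCompressed();
      Eigen::VectorXd b = Eigen::VectorXd::Ones(n);

      // The second template argument defaults to DiagonalPreconditioner.
      Eigen::BiCGSTAB<Eigen::SparseMatrix<double> > solver;
      solver.compute(A);
      Eigen::VectorXd x = solver.solve(b);
      std::cout << "#iterations: " << solver.iterations()
                << ", estimated error: " << solver.error() << std::endl;
      return solver.info() == Eigen::Success ? 0 : 1;
    }
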
diff --git a/third_party/eigen3/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h
new file mode 100644
index 0000000000..3ce5179409
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/ConjugateGradient.h
@@ -0,0 +1,265 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CONJUGATE_GRADIENT_H
+#define EIGEN_CONJUGATE_GRADIENT_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal Low-level conjugate gradient algorithm
+ * \param mat The matrix A
+ * \param rhs The right hand side vector b
+  * \param x On input an initial solution, on output the computed solution.
+ * \param precond A preconditioner being able to efficiently solve for an
+ * approximation of Ax=b (regardless of b)
+  * \param iters On input the max number of iterations, on output the number of performed iterations.
+ * \param tol_error On input the tolerance error, on output an estimation of the relative error.
+ */
+template<typename MatrixType, typename Rhs, typename Dest, typename Preconditioner>
+EIGEN_DONT_INLINE
+void conjugate_gradient(const MatrixType& mat, const Rhs& rhs, Dest& x,
+ const Preconditioner& precond, int& iters,
+ typename Dest::RealScalar& tol_error)
+{
+ using std::sqrt;
+ using std::abs;
+ typedef typename Dest::RealScalar RealScalar;
+ typedef typename Dest::Scalar Scalar;
+ typedef Matrix<Scalar,Dynamic,1> VectorType;
+
+ RealScalar tol = tol_error;
+ int maxIters = iters;
+
+ int n = mat.cols();
+
+ VectorType residual = rhs - mat * x; //initial residual
+
+ RealScalar rhsNorm2 = rhs.squaredNorm();
+ if(rhsNorm2 == 0)
+ {
+ x.setZero();
+ iters = 0;
+ tol_error = 0;
+ return;
+ }
+ RealScalar threshold = tol*tol*rhsNorm2;
+ RealScalar residualNorm2 = residual.squaredNorm();
+ if (residualNorm2 < threshold)
+ {
+ iters = 0;
+ tol_error = sqrt(residualNorm2 / rhsNorm2);
+ return;
+ }
+
+ VectorType p(n);
+ p = precond.solve(residual); //initial search direction
+
+ VectorType z(n), tmp(n);
+ RealScalar absNew = numext::real(residual.dot(p)); // the square of the absolute value of r scaled by invM
+ int i = 0;
+ while(i < maxIters)
+ {
+ tmp.noalias() = mat * p; // the bottleneck of the algorithm
+
+ Scalar alpha = absNew / p.dot(tmp); // the amount we travel on dir
+ x += alpha * p; // update solution
+ residual -= alpha * tmp; // update residue
+
+ residualNorm2 = residual.squaredNorm();
+ if(residualNorm2 < threshold)
+ break;
+
+ z = precond.solve(residual); // approximately solve for "A z = residual"
+
+ RealScalar absOld = absNew;
+ absNew = numext::real(residual.dot(z)); // update the absolute value of r
+ RealScalar beta = absNew / absOld; // calculate the Gram-Schmidt value used to create the new search direction
+ p = z + beta * p; // update search direction
+ i++;
+ }
+ tol_error = sqrt(residualNorm2 / rhsNorm2);
+ iters = i;
+}
+
+}
+
+template< typename _MatrixType, int _UpLo=Lower,
+ typename _Preconditioner = DiagonalPreconditioner<typename _MatrixType::Scalar> >
+class ConjugateGradient;
+
+namespace internal {
+
+template< typename _MatrixType, int _UpLo, typename _Preconditioner>
+struct traits<ConjugateGradient<_MatrixType,_UpLo,_Preconditioner> >
+{
+ typedef _MatrixType MatrixType;
+ typedef _Preconditioner Preconditioner;
+};
+
+}
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \brief A conjugate gradient solver for sparse (or dense) self-adjoint problems
+ *
+  * This class allows solving A.x = b linear problems using an iterative conjugate gradient algorithm.
+ * The matrix A must be selfadjoint. The matrix A and the vectors x and b can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the matrix A, can be a dense or a sparse matrix.
+ * \tparam _UpLo the triangular part that will be used for the computations. It can be Lower
+ * or Upper. Default is Lower.
+ * \tparam _Preconditioner the type of the preconditioner. Default is DiagonalPreconditioner
+ *
+ * The maximal number of iterations and tolerance value can be controlled via the setMaxIterations()
+ * and setTolerance() methods. The defaults are the size of the problem for the maximal number of iterations
+ * and NumTraits<Scalar>::epsilon() for the tolerance.
+ *
+  * This class can be used in the same way as the direct solver classes. Here is a typical usage example:
+ * \code
+ * int n = 10000;
+ * VectorXd x(n), b(n);
+ * SparseMatrix<double> A(n,n);
+ * // fill A and b
+ * ConjugateGradient<SparseMatrix<double> > cg;
+ * cg.compute(A);
+ * x = cg.solve(b);
+ * std::cout << "#iterations: " << cg.iterations() << std::endl;
+ * std::cout << "estimated error: " << cg.error() << std::endl;
+ * // update b, and solve again
+ * x = cg.solve(b);
+ * \endcode
+ *
+ * By default the iterations start with x=0 as an initial guess of the solution.
+ * One can control the start using the solveWithGuess() method. Here is a step by
+ * step execution example starting with a random guess and printing the evolution
+ * of the estimated error:
+  * \code
+ * x = VectorXd::Random(n);
+ * cg.setMaxIterations(1);
+ * int i = 0;
+ * do {
+ * x = cg.solveWithGuess(b,x);
+ * std::cout << i << " : " << cg.error() << std::endl;
+ * ++i;
+ * } while (cg.info()!=Success && i<100);
+ * \endcode
+  * Note that such a step-by-step execution is slightly slower.
+ *
+ * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner
+ */
+template< typename _MatrixType, int _UpLo, typename _Preconditioner>
+class ConjugateGradient : public IterativeSolverBase<ConjugateGradient<_MatrixType,_UpLo,_Preconditioner> >
+{
+ typedef IterativeSolverBase<ConjugateGradient> Base;
+ using Base::mp_matrix;
+ using Base::m_error;
+ using Base::m_iterations;
+ using Base::m_info;
+ using Base::m_isInitialized;
+public:
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef _Preconditioner Preconditioner;
+
+ enum {
+ UpLo = _UpLo
+ };
+
+public:
+
+ /** Default constructor. */
+ ConjugateGradient() : Base() {}
+
+ /** Initialize the solver with matrix \a A for further \c Ax=b solving.
+ *
+ * This constructor is a shortcut for the default constructor followed
+ * by a call to compute().
+ *
+ * \warning this class stores a reference to the matrix A as well as some
+ * precomputed values that depend on it. Therefore, if \a A is changed
+ * this class becomes invalid. Call compute() to update it with the new
+ * matrix A, or modify a copy of A.
+ */
+ ConjugateGradient(const MatrixType& A) : Base(A) {}
+
+ ~ConjugateGradient() {}
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A
+    * and \a x0 as an initial solution.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs,typename Guess>
+ inline const internal::solve_retval_with_guess<ConjugateGradient, Rhs, Guess>
+ solveWithGuess(const MatrixBase<Rhs>& b, const Guess& x0) const
+ {
+ eigen_assert(m_isInitialized && "ConjugateGradient is not initialized.");
+ eigen_assert(Base::rows()==b.rows()
+ && "ConjugateGradient::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval_with_guess
+ <ConjugateGradient, Rhs, Guess>(*this, b.derived(), x0);
+ }
+
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solveWithGuess(const Rhs& b, Dest& x) const
+ {
+ m_iterations = Base::maxIterations();
+ m_error = Base::m_tolerance;
+
+ for(int j=0; j<b.cols(); ++j)
+ {
+ m_iterations = Base::maxIterations();
+ m_error = Base::m_tolerance;
+
+ typename Dest::ColXpr xj(x,j);
+ internal::conjugate_gradient(mp_matrix->template selfadjointView<UpLo>(), b.col(j), xj,
+ Base::m_preconditioner, m_iterations, m_error);
+ }
+
+ m_isInitialized = true;
+ m_info = m_error <= Base::m_tolerance ? Success : NoConvergence;
+ }
+
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solve(const Rhs& b, Dest& x) const
+ {
+ x.setOnes();
+ _solveWithGuess(b,x);
+ }
+
+protected:
+
+};
+
+
+namespace internal {
+
+template<typename _MatrixType, int _UpLo, typename _Preconditioner, typename Rhs>
+struct solve_retval<ConjugateGradient<_MatrixType,_UpLo,_Preconditioner>, Rhs>
+ : solve_retval_base<ConjugateGradient<_MatrixType,_UpLo,_Preconditioner>, Rhs>
+{
+ typedef ConjugateGradient<_MatrixType,_UpLo,_Preconditioner> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CONJUGATE_GRADIENT_H
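
The _UpLo parameter documented above is not exercised by the inline example. A short sketch, assuming a self-adjoint positive definite matrix of which only the lower triangle has been assembled; the function name and arguments are illustrative:

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>

    // Solve A x = b when only the lower triangle of the self-adjoint matrix A is stored;
    // the Lower template argument tells the solver which triangle to read.
    Eigen::VectorXd solveSpdLower(const Eigen::SparseMatrix<double>& A_lower,
                                  const Eigen::VectorXd& b) {
      Eigen::ConjugateGradient<Eigen::SparseMatrix<double>, Eigen::Lower> cg;
      cg.setTolerance(1e-10);   // see setTolerance() in IterativeSolverBase
      cg.compute(A_lower);
      return cg.solve(b);
    }
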
diff --git a/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h
new file mode 100644
index 0000000000..b55afc1363
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IncompleteLUT.h
@@ -0,0 +1,467 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_INCOMPLETE_LUT_H
+#define EIGEN_INCOMPLETE_LUT_H
+
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * Compute a quick-sort split of a vector
+ * On output, the vector row is permuted such that its elements satisfy
+ * abs(row(i)) >= abs(row(ncut)) if i<ncut
+ * abs(row(i)) <= abs(row(ncut)) if i>ncut
+ * \param row The vector of values
+ * \param ind The array of index for the elements in @p row
+ * \param ncut The number of largest elements to keep
+ **/
+template <typename VectorV, typename VectorI, typename Index>
+Index QuickSplit(VectorV &row, VectorI &ind, Index ncut)
+{
+ typedef typename VectorV::RealScalar RealScalar;
+ using std::swap;
+ using std::abs;
+ Index mid;
+ Index n = row.size(); /* length of the vector */
+ Index first, last ;
+
+ ncut--; /* to fit the zero-based indices */
+ first = 0;
+ last = n-1;
+ if (ncut < first || ncut > last ) return 0;
+
+ do {
+ mid = first;
+ RealScalar abskey = abs(row(mid));
+ for (Index j = first + 1; j <= last; j++) {
+ if ( abs(row(j)) > abskey) {
+ ++mid;
+ swap(row(mid), row(j));
+ swap(ind(mid), ind(j));
+ }
+ }
+ /* Interchange for the pivot element */
+ swap(row(mid), row(first));
+ swap(ind(mid), ind(first));
+
+ if (mid > ncut) last = mid - 1;
+ else if (mid < ncut ) first = mid + 1;
+ } while (mid != ncut );
+
+ return 0; /* mid is equal to ncut */
+}
+
+}// end namespace internal
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \class IncompleteLUT
+ * \brief Incomplete LU factorization with dual-threshold strategy
+ *
+ * During the numerical factorization, two dropping rules are used :
+ * 1) any element whose magnitude is less than some tolerance is dropped.
+ * This tolerance is obtained by multiplying the input tolerance @p droptol
+ * by the average magnitude of all the original elements in the current row.
+ * 2) After the elimination of the row, only the @p fill largest elements in
+ * the L part and the @p fill largest elements in the U part are kept
+ * (in addition to the diagonal element ). Note that @p fill is computed from
+  * the input parameter @p fillfactor, which is used as the ratio to control the fill_in
+  * relative to the initial number of nonzero elements.
+ *
+ * The two extreme cases are when @p droptol=0 (to keep all the @p fill*2 largest elements)
+  * and when @p fill=n/2 with @p droptol being different from zero.
+ *
+ * References : Yousef Saad, ILUT: A dual threshold incomplete LU factorization,
+ * Numerical Linear Algebra with Applications, 1(4), pp 387-402, 1994.
+ *
+ * NOTE : The following implementation is derived from the ILUT implementation
+ * in the SPARSKIT package, Copyright (C) 2005, the Regents of the University of Minnesota
+ * released under the terms of the GNU LGPL:
+ * http://www-users.cs.umn.edu/~saad/software/SPARSKIT/README
+ * However, Yousef Saad gave us permission to relicense his ILUT code to MPL2.
+ * See the Eigen mailing list archive, thread: ILUT, date: July 8, 2012:
+ * http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2012/07/msg00064.html
+ * alternatively, on GMANE:
+ * http://comments.gmane.org/gmane.comp.lib.eigen/3302
+ */
+template <typename _Scalar>
+class IncompleteLUT : internal::noncopyable
+{
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef Matrix<Scalar,Dynamic,1> Vector;
+ typedef SparseMatrix<Scalar,RowMajor> FactorType;
+ typedef SparseMatrix<Scalar,ColMajor> PermutType;
+ typedef typename FactorType::Index Index;
+
+ public:
+ typedef Matrix<Scalar,Dynamic,Dynamic> MatrixType;
+
+ IncompleteLUT()
+ : m_droptol(NumTraits<Scalar>::dummy_precision()), m_fillfactor(10),
+ m_analysisIsOk(false), m_factorizationIsOk(false), m_isInitialized(false)
+ {}
+
+ template<typename MatrixType>
+ IncompleteLUT(const MatrixType& mat, const RealScalar& droptol=NumTraits<Scalar>::dummy_precision(), int fillfactor = 10)
+ : m_droptol(droptol),m_fillfactor(fillfactor),
+ m_analysisIsOk(false),m_factorizationIsOk(false),m_isInitialized(false)
+ {
+ eigen_assert(fillfactor != 0);
+ compute(mat);
+ }
+
+ Index rows() const { return m_lu.rows(); }
+
+ Index cols() const { return m_lu.cols(); }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful,
+      *          \c NumericalIssue if the matrix appears to be negative.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "IncompleteLUT is not initialized.");
+ return m_info;
+ }
+
+ template<typename MatrixType>
+ void analyzePattern(const MatrixType& amat);
+
+ template<typename MatrixType>
+ void factorize(const MatrixType& amat);
+
+ /**
+ * Compute an incomplete LU factorization with dual threshold on the matrix mat
+ * No pivoting is done in this version
+ *
+ **/
+ template<typename MatrixType>
+ IncompleteLUT<Scalar>& compute(const MatrixType& amat)
+ {
+ analyzePattern(amat);
+ factorize(amat);
+ m_isInitialized = m_factorizationIsOk;
+ return *this;
+ }
+
+ void setDroptol(const RealScalar& droptol);
+ void setFillfactor(int fillfactor);
+
+ template<typename Rhs, typename Dest>
+ void _solve(const Rhs& b, Dest& x) const
+ {
+ x = m_Pinv * b;
+ x = m_lu.template triangularView<UnitLower>().solve(x);
+ x = m_lu.template triangularView<Upper>().solve(x);
+ x = m_P * x;
+ }
+
+ template<typename Rhs> inline const internal::solve_retval<IncompleteLUT, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "IncompleteLUT is not initialized.");
+ eigen_assert(cols()==b.rows()
+ && "IncompleteLUT::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<IncompleteLUT, Rhs>(*this, b.derived());
+ }
+
+protected:
+
+ /** keeps off-diagonal entries; drops diagonal entries */
+ struct keep_diag {
+ inline bool operator() (const Index& row, const Index& col, const Scalar&) const
+ {
+ return row!=col;
+ }
+ };
+
+protected:
+
+ FactorType m_lu;
+ RealScalar m_droptol;
+ int m_fillfactor;
+ bool m_analysisIsOk;
+ bool m_factorizationIsOk;
+ bool m_isInitialized;
+ ComputationInfo m_info;
+ PermutationMatrix<Dynamic,Dynamic,Index> m_P; // Fill-reducing permutation
+ PermutationMatrix<Dynamic,Dynamic,Index> m_Pinv; // Inverse permutation
+};
+
+/**
+ * Set control parameter droptol
+ * \param droptol Drop any element whose magnitude is less than this tolerance
+ **/
+template<typename Scalar>
+void IncompleteLUT<Scalar>::setDroptol(const RealScalar& droptol)
+{
+ this->m_droptol = droptol;
+}
+
+/**
+ * Set control parameter fillfactor
+ * \param fillfactor This is used to compute the number @p fill_in of largest elements to keep on each row.
+ **/
+template<typename Scalar>
+void IncompleteLUT<Scalar>::setFillfactor(int fillfactor)
+{
+ this->m_fillfactor = fillfactor;
+}
+
+template <typename Scalar>
+template<typename _MatrixType>
+void IncompleteLUT<Scalar>::analyzePattern(const _MatrixType& amat)
+{
+ // Compute the Fill-reducing permutation
+ SparseMatrix<Scalar,ColMajor, Index> mat1 = amat;
+ SparseMatrix<Scalar,ColMajor, Index> mat2 = amat.transpose();
+ // Symmetrize the pattern
+ // FIXME for a matrix with nearly symmetric pattern, mat2+mat1 is the appropriate choice.
+ // on the other hand for a really non-symmetric pattern, mat2*mat1 should be prefered...
+ SparseMatrix<Scalar,ColMajor, Index> AtA = mat2 + mat1;
+ AtA.prune(keep_diag());
+ internal::minimum_degree_ordering<Scalar, Index>(AtA, m_P); // Then compute the AMD ordering...
+
+ m_Pinv = m_P.inverse(); // ... and the inverse permutation
+
+ m_analysisIsOk = true;
+}
+
+template <typename Scalar>
+template<typename _MatrixType>
+void IncompleteLUT<Scalar>::factorize(const _MatrixType& amat)
+{
+ using std::sqrt;
+ using std::swap;
+ using std::abs;
+
+ eigen_assert((amat.rows() == amat.cols()) && "The factorization should be done on a square matrix");
+ Index n = amat.cols(); // Size of the matrix
+ m_lu.resize(n,n);
+ // Declare Working vectors and variables
+ Vector u(n) ; // real values of the row -- maximum size is n --
+ VectorXi ju(n); // column position of the values in u -- maximum size is n
+ VectorXi jr(n); // Indicate the position of the nonzero elements in the vector u -- A zero location is indicated by -1
+
+ // Apply the fill-reducing permutation
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ SparseMatrix<Scalar,RowMajor, Index> mat;
+ mat = amat.twistedBy(m_Pinv);
+
+ // Initialization
+ jr.fill(-1);
+ ju.fill(0);
+ u.fill(0);
+
+ // number of largest elements to keep in each row:
+ Index fill_in = static_cast<Index> (amat.nonZeros()*m_fillfactor)/n+1;
+ if (fill_in > n) fill_in = n;
+
+ // number of largest nonzero elements to keep in the L and the U part of the current row:
+ Index nnzL = fill_in/2;
+ Index nnzU = nnzL;
+ m_lu.reserve(n * (nnzL + nnzU + 1));
+
+ // global loop over the rows of the sparse matrix
+ for (Index ii = 0; ii < n; ii++)
+ {
+ // 1 - copy the lower and the upper part of the row i of mat in the working vector u
+
+ Index sizeu = 1; // number of nonzero elements in the upper part of the current row
+ Index sizel = 0; // number of nonzero elements in the lower part of the current row
+ ju(ii) = ii;
+ u(ii) = 0;
+ jr(ii) = ii;
+ RealScalar rownorm = 0;
+
+ typename FactorType::InnerIterator j_it(mat, ii); // Iterate through the current row ii
+ for (; j_it; ++j_it)
+ {
+ Index k = j_it.index();
+ if (k < ii)
+ {
+ // copy the lower part
+ ju(sizel) = k;
+ u(sizel) = j_it.value();
+ jr(k) = sizel;
+ ++sizel;
+ }
+ else if (k == ii)
+ {
+ u(ii) = j_it.value();
+ }
+ else
+ {
+ // copy the upper part
+ Index jpos = ii + sizeu;
+ ju(jpos) = k;
+ u(jpos) = j_it.value();
+ jr(k) = jpos;
+ ++sizeu;
+ }
+ rownorm += numext::abs2(j_it.value());
+ }
+
+ // 2 - detect possible zero row
+ if(rownorm==0)
+ {
+ m_info = NumericalIssue;
+ return;
+ }
+ // Take the 2-norm of the current row as a relative tolerance
+ rownorm = sqrt(rownorm);
+
+ // 3 - eliminate the previous nonzero rows
+ Index jj = 0;
+ Index len = 0;
+ while (jj < sizel)
+ {
+ // In order to eliminate in the correct order,
+ // we must select first the smallest column index among ju(jj:sizel)
+ Index k;
+ Index minrow = ju.segment(jj,sizel-jj).minCoeff(&k); // k is relative to the segment
+ k += jj;
+ if (minrow != ju(jj))
+ {
+ // swap the two locations
+ Index j = ju(jj);
+ swap(ju(jj), ju(k));
+ jr(minrow) = jj; jr(j) = k;
+ swap(u(jj), u(k));
+ }
+ // Reset this location
+ jr(minrow) = -1;
+
+ // Start elimination
+ typename FactorType::InnerIterator ki_it(m_lu, minrow);
+ while (ki_it && ki_it.index() < minrow) ++ki_it;
+ eigen_internal_assert(ki_it && ki_it.col()==minrow);
+ Scalar fact = u(jj) / ki_it.value();
+
+ // drop too small elements
+ if(abs(fact) <= m_droptol)
+ {
+ jj++;
+ continue;
+ }
+
+ // linear combination of the current row ii and the row minrow
+ ++ki_it;
+ for (; ki_it; ++ki_it)
+ {
+ Scalar prod = fact * ki_it.value();
+ Index j = ki_it.index();
+ Index jpos = jr(j);
+ if (jpos == -1) // fill-in element
+ {
+ Index newpos;
+ if (j >= ii) // dealing with the upper part
+ {
+ newpos = ii + sizeu;
+ sizeu++;
+ eigen_internal_assert(sizeu<=n);
+ }
+ else // dealing with the lower part
+ {
+ newpos = sizel;
+ sizel++;
+ eigen_internal_assert(sizel<=ii);
+ }
+ ju(newpos) = j;
+ u(newpos) = -prod;
+ jr(j) = newpos;
+ }
+ else
+ u(jpos) -= prod;
+ }
+ // store the pivot element
+ u(len) = fact;
+ ju(len) = minrow;
+ ++len;
+
+ jj++;
+ } // end of the elimination on the row ii
+
+ // reset the upper part of the pointer jr to zero
+ for(Index k = 0; k <sizeu; k++) jr(ju(ii+k)) = -1;
+
+ // 4 - partially sort and insert the elements in the m_lu matrix
+
+ // sort the L-part of the row
+ sizel = len;
+ len = (std::min)(sizel, nnzL);
+ typename Vector::SegmentReturnType ul(u.segment(0, sizel));
+ typename VectorXi::SegmentReturnType jul(ju.segment(0, sizel));
+ internal::QuickSplit(ul, jul, len);
+
+ // store the largest m_fill elements of the L part
+ m_lu.startVec(ii);
+ for(Index k = 0; k < len; k++)
+ m_lu.insertBackByOuterInnerUnordered(ii,ju(k)) = u(k);
+
+ // store the diagonal element
+ // apply a shifting rule to avoid zero pivots (we are doing an incomplete factorization)
+ if (u(ii) == Scalar(0))
+ u(ii) = sqrt(m_droptol) * rownorm;
+ m_lu.insertBackByOuterInnerUnordered(ii, ii) = u(ii);
+
+ // sort the U-part of the row
+ // apply the dropping rule first
+ len = 0;
+ for(Index k = 1; k < sizeu; k++)
+ {
+ if(abs(u(ii+k)) > m_droptol * rownorm )
+ {
+ ++len;
+ u(ii + len) = u(ii + k);
+ ju(ii + len) = ju(ii + k);
+ }
+ }
+ sizeu = len + 1; // +1 to take into account the diagonal element
+ len = (std::min)(sizeu, nnzU);
+ typename Vector::SegmentReturnType uu(u.segment(ii+1, sizeu-1));
+ typename VectorXi::SegmentReturnType juu(ju.segment(ii+1, sizeu-1));
+ internal::QuickSplit(uu, juu, len);
+
+ // store the largest elements of the U part
+ for(Index k = ii + 1; k < ii + len; k++)
+ m_lu.insertBackByOuterInnerUnordered(ii,ju(k)) = u(k);
+ }
+
+ m_lu.finalize();
+ m_lu.makeCompressed();
+
+ m_factorizationIsOk = true;
+ m_info = Success;
+}
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<IncompleteLUT<_MatrixType>, Rhs>
+ : solve_retval_base<IncompleteLUT<_MatrixType>, Rhs>
+{
+ typedef IncompleteLUT<_MatrixType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_INCOMPLETE_LUT_H
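
IncompleteLUT is intended to be plugged into the iterative solvers above through their _Preconditioner template argument. A sketch of that combination, with illustrative droptol/fillfactor settings:

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>

    // Solve A x = b with BiCGSTAB preconditioned by an incomplete LU factorization.
    Eigen::VectorXd solveWithIlut(const Eigen::SparseMatrix<double>& A,
                                  const Eigen::VectorXd& b) {
      Eigen::BiCGSTAB<Eigen::SparseMatrix<double>,
                      Eigen::IncompleteLUT<double> > solver;
      solver.preconditioner().setDroptol(1e-4);   // dual-threshold drop tolerance
      solver.preconditioner().setFillfactor(20);  // fill ratio per row
      solver.compute(A);                          // also factorizes the preconditioner
      return solver.solve(b);
    }
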
diff --git a/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h
new file mode 100644
index 0000000000..2036922d69
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h
@@ -0,0 +1,254 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ITERATIVE_SOLVER_BASE_H
+#define EIGEN_ITERATIVE_SOLVER_BASE_H
+
+namespace Eigen {
+
+/** \ingroup IterativeLinearSolvers_Module
+ * \brief Base class for linear iterative solvers
+ *
+ * \sa class SimplicialCholesky, DiagonalPreconditioner, IdentityPreconditioner
+ */
+template< typename Derived>
+class IterativeSolverBase : internal::noncopyable
+{
+public:
+ typedef typename internal::traits<Derived>::MatrixType MatrixType;
+ typedef typename internal::traits<Derived>::Preconditioner Preconditioner;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::RealScalar RealScalar;
+
+public:
+
+ Derived& derived() { return *static_cast<Derived*>(this); }
+ const Derived& derived() const { return *static_cast<const Derived*>(this); }
+
+ /** Default constructor. */
+ IterativeSolverBase()
+ : mp_matrix(0)
+ {
+ init();
+ }
+
+ /** Initialize the solver with matrix \a A for further \c Ax=b solving.
+ *
+ * This constructor is a shortcut for the default constructor followed
+ * by a call to compute().
+ *
+ * \warning this class stores a reference to the matrix A as well as some
+ * precomputed values that depend on it. Therefore, if \a A is changed
+ * this class becomes invalid. Call compute() to update it with the new
+ * matrix A, or modify a copy of A.
+ */
+ IterativeSolverBase(const MatrixType& A)
+ {
+ init();
+ compute(A);
+ }
+
+ ~IterativeSolverBase() {}
+
+  /** Initializes the iterative solver for the sparsity pattern of the matrix \a A for further solving \c Ax=b problems.
+ *
+    * Currently, this function mostly calls analyzePattern on the preconditioner. In the future
+    * we might, for instance, implement column reordering for faster matrix-vector products.
+ */
+ Derived& analyzePattern(const MatrixType& A)
+ {
+ m_preconditioner.analyzePattern(A);
+ m_isInitialized = true;
+ m_analysisIsOk = true;
+ m_info = Success;
+ return derived();
+ }
+
+ /** Initializes the iterative solver with the numerical values of the matrix \a A for further solving \c Ax=b problems.
+ *
+    * Currently, this function mostly calls factorize on the preconditioner.
+ *
+ * \warning this class stores a reference to the matrix A as well as some
+ * precomputed values that depend on it. Therefore, if \a A is changed
+ * this class becomes invalid. Call compute() to update it with the new
+ * matrix A, or modify a copy of A.
+ */
+ Derived& factorize(const MatrixType& A)
+ {
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ mp_matrix = &A;
+ m_preconditioner.factorize(A);
+ m_factorizationIsOk = true;
+ m_info = Success;
+ return derived();
+ }
+
+ /** Initializes the iterative solver with the matrix \a A for further solving \c Ax=b problems.
+ *
+    * Currently, this function mostly initializes/computes the preconditioner. In the future
+    * we might, for instance, implement column reordering for faster matrix-vector products.
+ *
+ * \warning this class stores a reference to the matrix A as well as some
+ * precomputed values that depend on it. Therefore, if \a A is changed
+ * this class becomes invalid. Call compute() to update it with the new
+ * matrix A, or modify a copy of A.
+ */
+ Derived& compute(const MatrixType& A)
+ {
+ mp_matrix = &A;
+ m_preconditioner.compute(A);
+ m_isInitialized = true;
+ m_analysisIsOk = true;
+ m_factorizationIsOk = true;
+ m_info = Success;
+ return derived();
+ }
+
+ /** \internal */
+ Index rows() const { return mp_matrix ? mp_matrix->rows() : 0; }
+ /** \internal */
+ Index cols() const { return mp_matrix ? mp_matrix->cols() : 0; }
+
+ /** \returns the tolerance threshold used by the stopping criteria */
+ RealScalar tolerance() const { return m_tolerance; }
+
+ /** Sets the tolerance threshold used by the stopping criteria */
+ Derived& setTolerance(const RealScalar& tolerance)
+ {
+ m_tolerance = tolerance;
+ return derived();
+ }
+
+ /** \returns a read-write reference to the preconditioner for custom configuration. */
+ Preconditioner& preconditioner() { return m_preconditioner; }
+
+ /** \returns a read-only reference to the preconditioner. */
+ const Preconditioner& preconditioner() const { return m_preconditioner; }
+
+ /** \returns the max number of iterations */
+ int maxIterations() const
+ {
+ return (mp_matrix && m_maxIterations<0) ? mp_matrix->cols() : m_maxIterations;
+ }
+
+ /** Sets the max number of iterations */
+ Derived& setMaxIterations(int maxIters)
+ {
+ m_maxIterations = maxIters;
+ return derived();
+ }
+
+ /** \returns the number of iterations performed during the last solve */
+ int iterations() const
+ {
+ eigen_assert(m_isInitialized && "ConjugateGradient is not initialized.");
+ return m_iterations;
+ }
+
+ /** \returns the tolerance error reached during the last solve */
+ RealScalar error() const
+ {
+ eigen_assert(m_isInitialized && "ConjugateGradient is not initialized.");
+ return m_error;
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs> inline const internal::solve_retval<Derived, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "IterativeSolverBase is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "IterativeSolverBase::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<Derived, Rhs>(derived(), b.derived());
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<IterativeSolverBase, Rhs>
+ solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "IterativeSolverBase is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "IterativeSolverBase::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<IterativeSolverBase, Rhs>(*this, b.derived());
+ }
+
+ /** \returns Success if the iterations converged, and NoConvergence otherwise. */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "IterativeSolverBase is not initialized.");
+ return m_info;
+ }
+
+ /** \internal */
+ template<typename Rhs, typename DestScalar, int DestOptions, typename DestIndex>
+ void _solve_sparse(const Rhs& b, SparseMatrix<DestScalar,DestOptions,DestIndex> &dest) const
+ {
+ eigen_assert(rows()==b.rows());
+
+ int rhsCols = b.cols();
+ int size = b.rows();
+ Eigen::Matrix<DestScalar,Dynamic,1> tb(size);
+ Eigen::Matrix<DestScalar,Dynamic,1> tx(size);
+ for(int k=0; k<rhsCols; ++k)
+ {
+ tb = b.col(k);
+ tx = derived().solve(tb);
+ dest.col(k) = tx.sparseView(0);
+ }
+ }
+
+protected:
+ void init()
+ {
+ m_isInitialized = false;
+ m_analysisIsOk = false;
+ m_factorizationIsOk = false;
+ m_maxIterations = -1;
+ m_tolerance = NumTraits<Scalar>::epsilon();
+ }
+ const MatrixType* mp_matrix;
+ Preconditioner m_preconditioner;
+
+ int m_maxIterations;
+ RealScalar m_tolerance;
+
+ mutable RealScalar m_error;
+ mutable int m_iterations;
+ mutable ComputationInfo m_info;
+ mutable bool m_isInitialized, m_analysisIsOk, m_factorizationIsOk;
+};
+
+namespace internal {
+
+template<typename Derived, typename Rhs>
+struct sparse_solve_retval<IterativeSolverBase<Derived>, Rhs>
+ : sparse_solve_retval_base<IterativeSolverBase<Derived>, Rhs>
+{
+ typedef IterativeSolverBase<Derived> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec().derived()._solve_sparse(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_ITERATIVE_SOLVER_BASE_H
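
The sparse_solve_retval specialization above means a derived solver can also accept a sparse right-hand side: _solve_sparse() solves each column densely and converts the result back with sparseView(). A sketch, assuming ConjugateGradient as the derived solver and a self-adjoint positive definite A:

    #include <Eigen/Sparse>
    #include <Eigen/IterativeLinearSolvers>

    // Solve A X = B column by column, where B and X are sparse.
    Eigen::SparseMatrix<double> solveSparseRhs(const Eigen::SparseMatrix<double>& A,
                                               const Eigen::SparseMatrix<double>& B) {
      Eigen::ConjugateGradient<Eigen::SparseMatrix<double> > solver(A);
      Eigen::SparseMatrix<double> X = solver.solve(B);
      return X;
    }
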
diff --git a/third_party/eigen3/Eigen/src/Jacobi/Jacobi.h b/third_party/eigen3/Eigen/src/Jacobi/Jacobi.h
new file mode 100644
index 0000000000..956f72d570
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/Jacobi/Jacobi.h
@@ -0,0 +1,433 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_JACOBI_H
+#define EIGEN_JACOBI_H
+
+namespace Eigen {
+
+/** \ingroup Jacobi_Module
+ * \jacobi_module
+ * \class JacobiRotation
+ * \brief Rotation given by a cosine-sine pair.
+ *
+ * This class represents a Jacobi or Givens rotation.
+ * This is a 2D rotation in the plane \c J of angle \f$ \theta \f$ defined by
+  * its cosine \c c and sine \c s as follows:
+ * \f$ J = \left ( \begin{array}{cc} c & \overline s \\ -s & \overline c \end{array} \right ) \f$
+ *
+ * You can apply the respective counter-clockwise rotation to a column vector \c v by
+ * applying its adjoint on the left: \f$ v = J^* v \f$ that translates to the following Eigen code:
+ * \code
+ * v.applyOnTheLeft(J.adjoint());
+ * \endcode
+ *
+ * \sa MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+template<typename Scalar> class JacobiRotation
+{
+ public:
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ /** Default constructor without any initialization. */
+ JacobiRotation() {}
+
+    /** Construct a planar rotation from a cosine-sine pair (\a c, \a s). */
+ JacobiRotation(const Scalar& c, const Scalar& s) : m_c(c), m_s(s) {}
+
+ Scalar& c() { return m_c; }
+ Scalar c() const { return m_c; }
+ Scalar& s() { return m_s; }
+ Scalar s() const { return m_s; }
+
+    /** Concatenates two planar rotations */
+ JacobiRotation operator*(const JacobiRotation& other)
+ {
+ using numext::conj;
+ return JacobiRotation(m_c * other.m_c - conj(m_s) * other.m_s,
+ conj(m_c * conj(other.m_s) + conj(m_s) * conj(other.m_c)));
+ }
+
+ /** Returns the transposed transformation */
+ JacobiRotation transpose() const { using numext::conj; return JacobiRotation(m_c, -conj(m_s)); }
+
+ /** Returns the adjoint transformation */
+ JacobiRotation adjoint() const { using numext::conj; return JacobiRotation(conj(m_c), -m_s); }
+
+ template<typename Derived>
+ bool makeJacobi(const MatrixBase<Derived>&, typename Derived::Index p, typename Derived::Index q);
+ bool makeJacobi(const RealScalar& x, const Scalar& y, const RealScalar& z);
+
+ void makeGivens(const Scalar& p, const Scalar& q, Scalar* z=0);
+
+ protected:
+ void makeGivens(const Scalar& p, const Scalar& q, Scalar* z, internal::true_type);
+ void makeGivens(const Scalar& p, const Scalar& q, Scalar* z, internal::false_type);
+
+ Scalar m_c, m_s;
+};
+
+/** Makes \c *this as a Jacobi rotation \a J such that applying \a J on both the right and left sides of the selfadjoint 2x2 matrix
+ * \f$ B = \left ( \begin{array}{cc} x & y \\ \overline y & z \end{array} \right )\f$ yields a diagonal matrix \f$ A = J^* B J \f$
+ *
+ * \sa MatrixBase::makeJacobi(const MatrixBase<Derived>&, Index, Index), MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+template<typename Scalar>
+bool JacobiRotation<Scalar>::makeJacobi(const RealScalar& x, const Scalar& y, const RealScalar& z)
+{
+ using std::sqrt;
+ using std::abs;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ if(y == Scalar(0))
+ {
+ m_c = Scalar(1);
+ m_s = Scalar(0);
+ return false;
+ }
+ else
+ {
+ RealScalar tau = (x-z)/(RealScalar(2)*abs(y));
+ RealScalar w = sqrt(numext::abs2(tau) + RealScalar(1));
+ RealScalar t;
+ if(tau>RealScalar(0))
+ {
+ t = RealScalar(1) / (tau + w);
+ }
+ else
+ {
+ t = RealScalar(1) / (tau - w);
+ }
+ RealScalar sign_t = t > RealScalar(0) ? RealScalar(1) : RealScalar(-1);
+ RealScalar n = RealScalar(1) / sqrt(numext::abs2(t)+RealScalar(1));
+ m_s = - sign_t * (numext::conj(y) / abs(y)) * abs(t) * n;
+ m_c = n;
+ return true;
+ }
+}
+
+/** Makes \c *this as a Jacobi rotation \c J such that applying \a J on both the right and left sides of the 2x2 selfadjoint matrix
+ * \f$ B = \left ( \begin{array}{cc} \text{this}_{pp} & \text{this}_{pq} \\ (\text{this}_{pq})^* & \text{this}_{qq} \end{array} \right )\f$ yields
+ * a diagonal matrix \f$ A = J^* B J \f$
+ *
+ * Example: \include Jacobi_makeJacobi.cpp
+ * Output: \verbinclude Jacobi_makeJacobi.out
+ *
+ * \sa JacobiRotation::makeJacobi(RealScalar, Scalar, RealScalar), MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+template<typename Scalar>
+template<typename Derived>
+inline bool JacobiRotation<Scalar>::makeJacobi(const MatrixBase<Derived>& m, typename Derived::Index p, typename Derived::Index q)
+{
+ return makeJacobi(numext::real(m.coeff(p,p)), m.coeff(p,q), numext::real(m.coeff(q,q)));
+}
+
+/** Makes \c *this as a Givens rotation \c G such that applying \f$ G^* \f$ to the left of the vector
+ * \f$ V = \left ( \begin{array}{c} p \\ q \end{array} \right )\f$ yields:
+ * \f$ G^* V = \left ( \begin{array}{c} r \\ 0 \end{array} \right )\f$.
+ *
+ * The value of \a z is returned if \a z is not null (the default is null).
+ * Also note that G is built such that the cosine is always real.
+ *
+ * Example: \include Jacobi_makeGivens.cpp
+ * Output: \verbinclude Jacobi_makeGivens.out
+ *
+ * This function implements the continuous Givens rotation generation algorithm
+ * found in Anderson (2000), Discontinuous Plane Rotations and the Symmetric Eigenvalue Problem.
+ * LAPACK Working Note 150, University of Tennessee, UT-CS-00-454, December 4, 2000.
+ *
+ * \sa MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+template<typename Scalar>
+void JacobiRotation<Scalar>::makeGivens(const Scalar& p, const Scalar& q, Scalar* z)
+{
+ makeGivens(p, q, z, typename internal::conditional<NumTraits<Scalar>::IsComplex, internal::true_type, internal::false_type>::type());
+}
+
+
+// specialization for complexes
+template<typename Scalar>
+void JacobiRotation<Scalar>::makeGivens(const Scalar& p, const Scalar& q, Scalar* r, internal::true_type)
+{
+ using std::sqrt;
+ using std::abs;
+ using numext::conj;
+
+ if(q==Scalar(0))
+ {
+ m_c = numext::real(p)<0 ? Scalar(-1) : Scalar(1);
+ m_s = 0;
+ if(r) *r = m_c * p;
+ }
+ else if(p==Scalar(0))
+ {
+ m_c = 0;
+ m_s = -q/abs(q);
+ if(r) *r = abs(q);
+ }
+ else
+ {
+ RealScalar p1 = numext::norm1(p);
+ RealScalar q1 = numext::norm1(q);
+ if(p1>=q1)
+ {
+ Scalar ps = p / p1;
+ RealScalar p2 = numext::abs2(ps);
+ Scalar qs = q / p1;
+ RealScalar q2 = numext::abs2(qs);
+
+ RealScalar u = sqrt(RealScalar(1) + q2/p2);
+ if(numext::real(p)<RealScalar(0))
+ u = -u;
+
+ m_c = Scalar(1)/u;
+ m_s = -qs*conj(ps)*(m_c/p2);
+ if(r) *r = p * u;
+ }
+ else
+ {
+ Scalar ps = p / q1;
+ RealScalar p2 = numext::abs2(ps);
+ Scalar qs = q / q1;
+ RealScalar q2 = numext::abs2(qs);
+
+ RealScalar u = q1 * sqrt(p2 + q2);
+ if(numext::real(p)<RealScalar(0))
+ u = -u;
+
+ p1 = abs(p);
+ ps = p/p1;
+ m_c = p1/u;
+ m_s = -conj(ps) * (q/u);
+ if(r) *r = ps * u;
+ }
+ }
+}
+
+// specialization for reals
+template<typename Scalar>
+void JacobiRotation<Scalar>::makeGivens(const Scalar& p, const Scalar& q, Scalar* r, internal::false_type)
+{
+ using std::sqrt;
+ using std::abs;
+ if(q==Scalar(0))
+ {
+ m_c = p<Scalar(0) ? Scalar(-1) : Scalar(1);
+ m_s = Scalar(0);
+ if(r) *r = abs(p);
+ }
+ else if(p==Scalar(0))
+ {
+ m_c = Scalar(0);
+ m_s = q<Scalar(0) ? Scalar(1) : Scalar(-1);
+ if(r) *r = abs(q);
+ }
+ else if(abs(p) > abs(q))
+ {
+ Scalar t = q/p;
+ Scalar u = sqrt(Scalar(1) + numext::abs2(t));
+ if(p<Scalar(0))
+ u = -u;
+ m_c = Scalar(1)/u;
+ m_s = -t * m_c;
+ if(r) *r = p * u;
+ }
+ else
+ {
+ Scalar t = p/q;
+ Scalar u = sqrt(Scalar(1) + numext::abs2(t));
+ if(q<Scalar(0))
+ u = -u;
+ m_s = -Scalar(1)/u;
+ m_c = -t * m_s;
+ if(r) *r = q * u;
+ }
+
+}
+
+/****************************************************************************************
+* Implementation of MatrixBase methods
+****************************************************************************************/
+
+/** \jacobi_module
+  * Applies the clockwise 2D rotation \a j to the set of 2D vectors of coordinates \a x and \a y:
+ * \f$ \left ( \begin{array}{cc} x \\ y \end{array} \right ) = J \left ( \begin{array}{cc} x \\ y \end{array} \right ) \f$
+ *
+ * \sa MatrixBase::applyOnTheLeft(), MatrixBase::applyOnTheRight()
+ */
+namespace internal {
+template<typename VectorX, typename VectorY, typename OtherScalar>
+void apply_rotation_in_the_plane(VectorX& _x, VectorY& _y, const JacobiRotation<OtherScalar>& j);
+}
+
+/** \jacobi_module
+ * Applies the rotation in the plane \a j to the rows \a p and \a q of \c *this, i.e., it computes B = J * B,
+ * with \f$ B = \left ( \begin{array}{cc} \text{*this.row}(p) \\ \text{*this.row}(q) \end{array} \right ) \f$.
+ *
+ * \sa class JacobiRotation, MatrixBase::applyOnTheRight(), internal::apply_rotation_in_the_plane()
+ */
+template<typename Derived>
+template<typename OtherScalar>
+inline void MatrixBase<Derived>::applyOnTheLeft(Index p, Index q, const JacobiRotation<OtherScalar>& j)
+{
+ RowXpr x(this->row(p));
+ RowXpr y(this->row(q));
+ internal::apply_rotation_in_the_plane(x, y, j);
+}
+
+/** \ingroup Jacobi_Module
+ * Applies the rotation in the plane \a j to the columns \a p and \a q of \c *this, i.e., it computes B = B * J
+ * with \f$ B = \left ( \begin{array}{cc} \text{*this.col}(p) & \text{*this.col}(q) \end{array} \right ) \f$.
+ *
+ * \sa class JacobiRotation, MatrixBase::applyOnTheLeft(), internal::apply_rotation_in_the_plane()
+ */
+template<typename Derived>
+template<typename OtherScalar>
+inline void MatrixBase<Derived>::applyOnTheRight(Index p, Index q, const JacobiRotation<OtherScalar>& j)
+{
+ ColXpr x(this->col(p));
+ ColXpr y(this->col(q));
+ internal::apply_rotation_in_the_plane(x, y, j.transpose());
+}
+
+namespace internal {
+template<typename VectorX, typename VectorY, typename OtherScalar>
+void /*EIGEN_DONT_INLINE*/ apply_rotation_in_the_plane(VectorX& _x, VectorY& _y, const JacobiRotation<OtherScalar>& j)
+{
+ typedef typename VectorX::Index Index;
+ typedef typename VectorX::Scalar Scalar;
+ enum { PacketSize = packet_traits<Scalar>::size };
+ typedef typename packet_traits<Scalar>::type Packet;
+ eigen_assert(_x.size() == _y.size());
+ Index size = _x.size();
+ Index incrx = _x.innerStride();
+ Index incry = _y.innerStride();
+
+ Scalar* EIGEN_RESTRICT x = &_x.coeffRef(0);
+ Scalar* EIGEN_RESTRICT y = &_y.coeffRef(0);
+
+ OtherScalar c = j.c();
+ OtherScalar s = j.s();
+ if (c==OtherScalar(1) && s==OtherScalar(0))
+ return;
+
+ /*** dynamic-size vectorized paths ***/
+
+ if(VectorX::SizeAtCompileTime == Dynamic &&
+ (VectorX::Flags & VectorY::Flags & PacketAccessBit) &&
+ ((incrx==1 && incry==1) || PacketSize == 1))
+ {
+ // both vectors are sequentially stored in memory => vectorization
+ enum { Peeling = 2 };
+
+ Index alignedStart = internal::first_aligned(y, size);
+ Index alignedEnd = alignedStart + ((size-alignedStart)/PacketSize)*PacketSize;
+
+ const Packet pc = pset1<Packet>(c);
+ const Packet ps = pset1<Packet>(s);
+ conj_helper<Packet,Packet,NumTraits<Scalar>::IsComplex,false> pcj;
+
+ for(Index i=0; i<alignedStart; ++i)
+ {
+ Scalar xi = x[i];
+ Scalar yi = y[i];
+ x[i] = c * xi + numext::conj(s) * yi;
+ y[i] = -s * xi + numext::conj(c) * yi;
+ }
+
+ Scalar* EIGEN_RESTRICT px = x + alignedStart;
+ Scalar* EIGEN_RESTRICT py = y + alignedStart;
+
+ if(internal::first_aligned(x, size)==alignedStart)
+ {
+ for(Index i=alignedStart; i<alignedEnd; i+=PacketSize)
+ {
+ Packet xi = pload<Packet>(px);
+ Packet yi = pload<Packet>(py);
+ pstore(px, padd(pmul(pc,xi),pcj.pmul(ps,yi)));
+ pstore(py, psub(pcj.pmul(pc,yi),pmul(ps,xi)));
+ px += PacketSize;
+ py += PacketSize;
+ }
+ }
+ else
+ {
+ Index peelingEnd = alignedStart + ((size-alignedStart)/(Peeling*PacketSize))*(Peeling*PacketSize);
+ for(Index i=alignedStart; i<peelingEnd; i+=Peeling*PacketSize)
+ {
+ Packet xi = ploadu<Packet>(px);
+ Packet xi1 = ploadu<Packet>(px+PacketSize);
+ Packet yi = pload <Packet>(py);
+ Packet yi1 = pload <Packet>(py+PacketSize);
+ pstoreu(px, padd(pmul(pc,xi),pcj.pmul(ps,yi)));
+ pstoreu(px+PacketSize, padd(pmul(pc,xi1),pcj.pmul(ps,yi1)));
+ pstore (py, psub(pcj.pmul(pc,yi),pmul(ps,xi)));
+ pstore (py+PacketSize, psub(pcj.pmul(pc,yi1),pmul(ps,xi1)));
+ px += Peeling*PacketSize;
+ py += Peeling*PacketSize;
+ }
+ if(alignedEnd!=peelingEnd)
+ {
+ Packet xi = ploadu<Packet>(x+peelingEnd);
+ Packet yi = pload <Packet>(y+peelingEnd);
+ pstoreu(x+peelingEnd, padd(pmul(pc,xi),pcj.pmul(ps,yi)));
+ pstore (y+peelingEnd, psub(pcj.pmul(pc,yi),pmul(ps,xi)));
+ }
+ }
+
+ for(Index i=alignedEnd; i<size; ++i)
+ {
+ Scalar xi = x[i];
+ Scalar yi = y[i];
+ x[i] = c * xi + numext::conj(s) * yi;
+ y[i] = -s * xi + numext::conj(c) * yi;
+ }
+ }
+
+ /*** fixed-size vectorized path ***/
+ else if(VectorX::SizeAtCompileTime != Dynamic &&
+ (VectorX::Flags & VectorY::Flags & PacketAccessBit) &&
+ (VectorX::Flags & VectorY::Flags & AlignedBit))
+ {
+ const Packet pc = pset1<Packet>(c);
+ const Packet ps = pset1<Packet>(s);
+ conj_helper<Packet,Packet,NumTraits<Scalar>::IsComplex,false> pcj;
+ Scalar* EIGEN_RESTRICT px = x;
+ Scalar* EIGEN_RESTRICT py = y;
+ for(Index i=0; i<size; i+=PacketSize)
+ {
+ Packet xi = pload<Packet>(px);
+ Packet yi = pload<Packet>(py);
+ pstore(px, padd(pmul(pc,xi),pcj.pmul(ps,yi)));
+ pstore(py, psub(pcj.pmul(pc,yi),pmul(ps,xi)));
+ px += PacketSize;
+ py += PacketSize;
+ }
+ }
+
+ /*** non-vectorized path ***/
+ else
+ {
+ for(Index i=0; i<size; ++i)
+ {
+ Scalar xi = *x;
+ Scalar yi = *y;
+ *x = c * xi + numext::conj(s) * yi;
+ *y = -s * xi + numext::conj(c) * yi;
+ x += incrx;
+ y += incry;
+ }
+ }
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_JACOBI_H
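
A small sketch of the Givens path described above: makeGivens() builds a rotation that annihilates the second entry of a 2-vector, and applying its adjoint on the left of rows 0 and 1 yields (r, 0):

    #include <Eigen/Core>
    #include <Eigen/Jacobi>
    #include <iostream>

    int main() {
      Eigen::Vector2d v(3.0, 4.0);
      Eigen::JacobiRotation<double> G;
      double r;
      G.makeGivens(v.x(), v.y(), &r);       // G is built so that G^* * v = (r, 0)^T
      v.applyOnTheLeft(0, 1, G.adjoint());  // rotate rows 0 and 1 of v
      std::cout << "r = " << r << ", v = " << v.transpose() << std::endl;  // r = 5, v = 5 0
      return 0;
    }
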
diff --git a/third_party/eigen3/Eigen/src/LU/Determinant.h b/third_party/eigen3/Eigen/src/LU/Determinant.h
new file mode 100644
index 0000000000..bb8e78a8a8
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/Determinant.h
@@ -0,0 +1,101 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_DETERMINANT_H
+#define EIGEN_DETERMINANT_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Derived>
+inline const typename Derived::Scalar bruteforce_det3_helper
+(const MatrixBase<Derived>& matrix, int a, int b, int c)
+{
+ return matrix.coeff(0,a)
+ * (matrix.coeff(1,b) * matrix.coeff(2,c) - matrix.coeff(1,c) * matrix.coeff(2,b));
+}
+
+template<typename Derived>
+const typename Derived::Scalar bruteforce_det4_helper
+(const MatrixBase<Derived>& matrix, int j, int k, int m, int n)
+{
+ return (matrix.coeff(j,0) * matrix.coeff(k,1) - matrix.coeff(k,0) * matrix.coeff(j,1))
+ * (matrix.coeff(m,2) * matrix.coeff(n,3) - matrix.coeff(n,2) * matrix.coeff(m,3));
+}
+
+template<typename Derived,
+ int DeterminantType = Derived::RowsAtCompileTime
+> struct determinant_impl
+{
+ static inline typename traits<Derived>::Scalar run(const Derived& m)
+ {
+ if(Derived::ColsAtCompileTime==Dynamic && m.rows()==0)
+ return typename traits<Derived>::Scalar(1);
+ return m.partialPivLu().determinant();
+ }
+};
+
+template<typename Derived> struct determinant_impl<Derived, 1>
+{
+ static inline typename traits<Derived>::Scalar run(const Derived& m)
+ {
+ return m.coeff(0,0);
+ }
+};
+
+template<typename Derived> struct determinant_impl<Derived, 2>
+{
+ static inline typename traits<Derived>::Scalar run(const Derived& m)
+ {
+ return m.coeff(0,0) * m.coeff(1,1) - m.coeff(1,0) * m.coeff(0,1);
+ }
+};
+
+template<typename Derived> struct determinant_impl<Derived, 3>
+{
+ static inline typename traits<Derived>::Scalar run(const Derived& m)
+ {
+ return bruteforce_det3_helper(m,0,1,2)
+ - bruteforce_det3_helper(m,1,0,2)
+ + bruteforce_det3_helper(m,2,0,1);
+ }
+};
+
+template<typename Derived> struct determinant_impl<Derived, 4>
+{
+ static typename traits<Derived>::Scalar run(const Derived& m)
+ {
+ // trick by Martin Costabel to compute 4x4 det with only 30 muls
+ return bruteforce_det4_helper(m,0,1,2,3)
+ - bruteforce_det4_helper(m,0,2,1,3)
+ + bruteforce_det4_helper(m,0,3,1,2)
+ + bruteforce_det4_helper(m,1,2,0,3)
+ - bruteforce_det4_helper(m,1,3,0,2)
+ + bruteforce_det4_helper(m,2,3,0,1);
+ }
+};
+
+} // end namespace internal
+
+/** \lu_module
+ *
+ * \returns the determinant of this matrix
+ */
+template<typename Derived>
+inline typename internal::traits<Derived>::Scalar MatrixBase<Derived>::determinant() const
+{
+ eigen_assert(rows() == cols());
+ typedef typename internal::nested<Derived,Base::RowsAtCompileTime>::type Nested;
+ return internal::determinant_impl<typename internal::remove_all<Nested>::type>::run(derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_DETERMINANT_H
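
A tiny sketch of the entry point defined above: determinant() dispatches on the compile-time size, so a 3x3 matrix goes through the cofactor helper rather than a partial-pivoting LU factorization.

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::Matrix3d m;
      m << 1, 2, 3,
           4, 5, 6,
           7, 8, 10;
      std::cout << "det = " << m.determinant() << std::endl;  // prints det = -3
      return 0;
    }
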
diff --git a/third_party/eigen3/Eigen/src/LU/FullPivLU.h b/third_party/eigen3/Eigen/src/LU/FullPivLU.h
new file mode 100644
index 0000000000..971b9da1d4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/FullPivLU.h
@@ -0,0 +1,745 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_LU_H
+#define EIGEN_LU_H
+
+namespace Eigen {
+
+/** \ingroup LU_Module
+ *
+ * \class FullPivLU
+ *
+ * \brief LU decomposition of a matrix with complete pivoting, and related features
+ *
+ * \param MatrixType the type of the matrix of which we are computing the LU decomposition
+ *
+ * This class represents a LU decomposition of any matrix, with complete pivoting: the matrix A is
+ * decomposed as \f$ A = P^{-1} L U Q^{-1} \f$ where L is unit-lower-triangular, U is
+ * upper-triangular, and P and Q are permutation matrices. This is a rank-revealing LU
+ * decomposition. The eigenvalues (diagonal coefficients) of U are sorted in such a way that any
+ * zeros are at the end.
+ *
+ * This decomposition provides the generic approach to solving systems of linear equations, computing
+ * the rank, invertibility, inverse, kernel, and determinant.
+ *
+ * This LU decomposition is very stable and well tested with large matrices. However there are use cases where the SVD
+ * decomposition is inherently more stable and/or flexible. For example, when computing the kernel of a matrix,
+  * working with the SVD allows selecting the smallest singular values of the matrix, something that
+ * the LU decomposition doesn't see.
+ *
+ * The data of the LU decomposition can be directly accessed through the methods matrixLU(),
+ * permutationP(), permutationQ().
+ *
+  * As an example, here is how the original matrix can be retrieved:
+ * \include class_FullPivLU.cpp
+ * Output: \verbinclude class_FullPivLU.out
+ *
+ * \sa MatrixBase::fullPivLu(), MatrixBase::determinant(), MatrixBase::inverse()
+ */
+template<typename _MatrixType> class FullPivLU
+{
+ public:
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef typename internal::traits<MatrixType>::StorageKind StorageKind;
+ typedef typename MatrixType::Index Index;
+ typedef typename internal::plain_row_type<MatrixType, Index>::type IntRowVectorType;
+ typedef typename internal::plain_col_type<MatrixType, Index>::type IntColVectorType;
+ typedef PermutationMatrix<ColsAtCompileTime, MaxColsAtCompileTime> PermutationQType;
+ typedef PermutationMatrix<RowsAtCompileTime, MaxRowsAtCompileTime> PermutationPType;
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via LU::compute(const MatrixType&).
+ */
+ FullPivLU();
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa FullPivLU()
+ */
+ FullPivLU(Index rows, Index cols);
+
+ /** Constructor.
+ *
+ * \param matrix the matrix of which to compute the LU decomposition.
+ * It is required to be nonzero.
+ */
+ FullPivLU(const MatrixType& matrix);
+
+ /** Computes the LU decomposition of the given matrix.
+ *
+ * \param matrix the matrix of which to compute the LU decomposition.
+ * It is required to be nonzero.
+ *
+ * \returns a reference to *this
+ */
+ FullPivLU& compute(const MatrixType& matrix);
+
+ /** \returns the LU decomposition matrix: the upper-triangular part is U, the
+ * unit-lower-triangular part is L (at least for square matrices; in the non-square
+ * case, special care is needed, see the documentation of class FullPivLU).
+ *
+ * \sa matrixL(), matrixU()
+ */
+ inline const MatrixType& matrixLU() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return m_lu;
+ }
+
+ /** \returns the number of nonzero pivots in the LU decomposition.
+ * Here nonzero is meant in the exact sense, not in a fuzzy sense.
+ * So that notion isn't really intrinsically interesting, but it is
+ * still useful when implementing algorithms.
+ *
+ * \sa rank()
+ */
+ inline Index nonzeroPivots() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return m_nonzero_pivots;
+ }
+
+ /** \returns the absolute value of the biggest pivot, i.e. the biggest
+ * diagonal coefficient of U.
+ */
+ RealScalar maxPivot() const { return m_maxpivot; }
+
+ /** \returns the permutation matrix P
+ *
+ * \sa permutationQ()
+ */
+ inline const PermutationPType& permutationP() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return m_p;
+ }
+
+ /** \returns the permutation matrix Q
+ *
+ * \sa permutationP()
+ */
+ inline const PermutationQType& permutationQ() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return m_q;
+ }
+
+ /** \returns the kernel of the matrix, also called its null-space. The columns of the returned matrix
+ * will form a basis of the kernel.
+ *
+ * \note If the kernel has dimension zero, then the returned matrix is a column-vector filled with zeros.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ *
+ * Example: \include FullPivLU_kernel.cpp
+ * Output: \verbinclude FullPivLU_kernel.out
+ *
+ * \sa image()
+ */
+ inline const internal::kernel_retval<FullPivLU> kernel() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return internal::kernel_retval<FullPivLU>(*this);
+ }
+
+ /** \returns the image of the matrix, also called its column-space. The columns of the returned matrix
+      * will form a basis of the image (column-space).
+ *
+ * \param originalMatrix the original matrix, of which *this is the LU decomposition.
+      *                       It needs to be passed here because doing so allows a large
+      *                       optimization: otherwise this method would have to reconstruct it
+      *                       from the LU decomposition.
+ *
+ * \note If the image has dimension zero, then the returned matrix is a column-vector filled with zeros.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ *
+ * Example: \include FullPivLU_image.cpp
+ * Output: \verbinclude FullPivLU_image.out
+ *
+ * \sa kernel()
+ */
+ inline const internal::image_retval<FullPivLU>
+ image(const MatrixType& originalMatrix) const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return internal::image_retval<FullPivLU>(*this, originalMatrix);
+ }
+
+ /** \return a solution x to the equation Ax=b, where A is the matrix of which
+ * *this is the LU decomposition.
+ *
+ * \param b the right-hand-side of the equation to solve. Can be a vector or a matrix,
+ * the only requirement in order for the equation to make sense is that
+ * b.rows()==A.rows(), where A is the matrix of which *this is the LU decomposition.
+ *
+ * \returns a solution.
+ *
+ * \note_about_checking_solutions
+ *
+ * \note_about_arbitrary_choice_of_solution
+ * \note_about_using_kernel_to_study_multiple_solutions
+ *
+ * Example: \include FullPivLU_solve.cpp
+ * Output: \verbinclude FullPivLU_solve.out
+ *
+ * \sa TriangularView::solve(), kernel(), inverse()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<FullPivLU, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return internal::solve_retval<FullPivLU, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the determinant of the matrix of which
+ * *this is the LU decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the LU decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \note For fixed-size matrices of size up to 4, MatrixBase::determinant() offers
+ * optimized paths.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ *
+ * \sa MatrixBase::determinant()
+ */
+ typename internal::traits<MatrixType>::Scalar determinant() const;
+
+    /** Allows prescribing a threshold to be used by certain methods, such as rank(),
+      * which need to determine when pivots are to be considered nonzero. This is not used for the
+ * LU decomposition itself.
+ *
+ * When it needs to get the threshold value, Eigen calls threshold(). By default, this
+ * uses a formula to automatically determine a reasonable threshold.
+ * Once you have called the present method setThreshold(const RealScalar&),
+ * your value is used instead.
+ *
+ * \param threshold The new value to use as the threshold.
+ *
+      * A pivot will be considered nonzero if its absolute value satisfies
+      *  \f$ \vert pivot \vert > threshold \times \vert maxpivot \vert \f$
+ * where maxpivot is the biggest pivot.
+ *
+ * If you want to come back to the default behavior, call setThreshold(Default_t)
+ */
+ FullPivLU& setThreshold(const RealScalar& threshold)
+ {
+ m_usePrescribedThreshold = true;
+ m_prescribedThreshold = threshold;
+ return *this;
+ }
+
+    /** Allows returning to the default behavior, letting Eigen use its default formula for
+ * determining the threshold.
+ *
+ * You should pass the special object Eigen::Default as parameter here.
+ * \code lu.setThreshold(Eigen::Default); \endcode
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ FullPivLU& setThreshold(Default_t)
+ {
+ m_usePrescribedThreshold = false;
+ return *this;
+ }
+
+ /** Returns the threshold that will be used by certain methods such as rank().
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ RealScalar threshold() const
+ {
+ eigen_assert(m_isInitialized || m_usePrescribedThreshold);
+ return m_usePrescribedThreshold ? m_prescribedThreshold
+ // this formula comes from experimenting (see "LU precision tuning" thread on the list)
+ // and turns out to be identical to Higham's formula used already in LDLt.
+ : NumTraits<Scalar>::epsilon() * m_lu.diagonalSize();
+ }
+
+ /** \returns the rank of the matrix of which *this is the LU decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index rank() const
+ {
+ using std::abs;
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ RealScalar premultiplied_threshold = abs(m_maxpivot) * threshold();
+ Index result = 0;
+ for(Index i = 0; i < m_nonzero_pivots; ++i)
+ result += (abs(m_lu.coeff(i,i)) > premultiplied_threshold);
+ return result;
+ }
+
+ /** \returns the dimension of the kernel of the matrix of which *this is the LU decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index dimensionOfKernel() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return cols() - rank();
+ }
+
+ /** \returns true if the matrix of which *this is the LU decomposition represents an injective
+ * linear map, i.e. has trivial kernel; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInjective() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return rank() == cols();
+ }
+
+ /** \returns true if the matrix of which *this is the LU decomposition represents a surjective
+ * linear map; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isSurjective() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return rank() == rows();
+ }
+
+ /** \returns true if the matrix of which *this is the LU decomposition is invertible.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInvertible() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ return isInjective() && (m_lu.rows() == m_lu.cols());
+ }
+
+ /** \returns the inverse of the matrix of which *this is the LU decomposition.
+ *
+ * \note If this matrix is not invertible, the returned matrix has undefined coefficients.
+ * Use isInvertible() to first determine whether this matrix is invertible.
+ *
+ * \sa MatrixBase::inverse()
+ */
+ inline const internal::solve_retval<FullPivLU,typename MatrixType::IdentityReturnType> inverse() const
+ {
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ eigen_assert(m_lu.rows() == m_lu.cols() && "You can't take the inverse of a non-square matrix!");
+ return internal::solve_retval<FullPivLU,typename MatrixType::IdentityReturnType>
+ (*this, MatrixType::Identity(m_lu.rows(), m_lu.cols()));
+ }
+
+ MatrixType reconstructedMatrix() const;
+
+ inline Index rows() const { return m_lu.rows(); }
+ inline Index cols() const { return m_lu.cols(); }
+
+ protected:
+ MatrixType m_lu;
+ PermutationPType m_p;
+ PermutationQType m_q;
+ IntColVectorType m_rowsTranspositions;
+ IntRowVectorType m_colsTranspositions;
+ Index m_det_pq, m_nonzero_pivots;
+ RealScalar m_maxpivot, m_prescribedThreshold;
+ bool m_isInitialized, m_usePrescribedThreshold;
+};
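+// Editor's note: a minimal usage sketch (not part of the Eigen sources) of the FullPivLU
+// interface documented above; the sizes and variable names are illustrative only.
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(3, 5);
+//   Eigen::VectorXd b = Eigen::VectorXd::Random(3);
+//   Eigen::FullPivLU<Eigen::MatrixXd> lu(A);
+//   lu.setThreshold(1e-10);                         // optional: override the pivot threshold
+//   Eigen::VectorXd x = lu.solve(b);                // one solution of A x = b, if any exists
+//   Eigen::MatrixXd::Index r = lu.rank();           // numerical rank under the threshold
+//   Eigen::MatrixXd ker = lu.kernel();              // columns span the null space of A
+//   Eigen::MatrixXd img = lu.image(A);              // columns span the column space of A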
+
+template<typename MatrixType>
+FullPivLU<MatrixType>::FullPivLU()
+ : m_isInitialized(false), m_usePrescribedThreshold(false)
+{
+}
+
+template<typename MatrixType>
+FullPivLU<MatrixType>::FullPivLU(Index rows, Index cols)
+ : m_lu(rows, cols),
+ m_p(rows),
+ m_q(cols),
+ m_rowsTranspositions(rows),
+ m_colsTranspositions(cols),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false)
+{
+}
+
+template<typename MatrixType>
+FullPivLU<MatrixType>::FullPivLU(const MatrixType& matrix)
+ : m_lu(matrix.rows(), matrix.cols()),
+ m_p(matrix.rows()),
+ m_q(matrix.cols()),
+ m_rowsTranspositions(matrix.rows()),
+ m_colsTranspositions(matrix.cols()),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false)
+{
+ compute(matrix);
+}
+
+template<typename MatrixType>
+FullPivLU<MatrixType>& FullPivLU<MatrixType>::compute(const MatrixType& matrix)
+{
+ // the permutations are stored as int indices, so just to be sure:
+ eigen_assert(matrix.rows()<=NumTraits<int>::highest() && matrix.cols()<=NumTraits<int>::highest());
+
+ m_isInitialized = true;
+ m_lu = matrix;
+
+ const Index size = matrix.diagonalSize();
+ const Index rows = matrix.rows();
+ const Index cols = matrix.cols();
+
+ // will store the transpositions, before we accumulate them at the end.
+ // can't accumulate on-the-fly because that will be done in reverse order for the rows.
+ m_rowsTranspositions.resize(matrix.rows());
+ m_colsTranspositions.resize(matrix.cols());
+ Index number_of_transpositions = 0; // number of NONTRIVIAL transpositions, i.e. m_rowsTranspositions[i]!=i
+
+ m_nonzero_pivots = size; // the generic case is that in which all pivots are nonzero (invertible case)
+ m_maxpivot = RealScalar(0);
+
+ for(Index k = 0; k < size; ++k)
+ {
+ // First, we need to find the pivot.
+
+ // biggest coefficient in the remaining bottom-right corner (starting at row k, col k)
+ Index row_of_biggest_in_corner, col_of_biggest_in_corner;
+ RealScalar biggest_in_corner;
+ biggest_in_corner = m_lu.bottomRightCorner(rows-k, cols-k)
+ .cwiseAbs()
+ .maxCoeff(&row_of_biggest_in_corner, &col_of_biggest_in_corner);
+ row_of_biggest_in_corner += k; // correct the values! since they were computed in the corner,
+ col_of_biggest_in_corner += k; // need to add k to them.
+
+ if(biggest_in_corner==RealScalar(0))
+ {
+ // before exiting, make sure to initialize the still uninitialized transpositions
+ // in a sane state without destroying what we already have.
+ m_nonzero_pivots = k;
+ for(Index i = k; i < size; ++i)
+ {
+ m_rowsTranspositions.coeffRef(i) = i;
+ m_colsTranspositions.coeffRef(i) = i;
+ }
+ break;
+ }
+
+ if(biggest_in_corner > m_maxpivot) m_maxpivot = biggest_in_corner;
+
+ // Now that we've found the pivot, we need to apply the row/col swaps to
+ // bring it to the location (k,k).
+
+ m_rowsTranspositions.coeffRef(k) = row_of_biggest_in_corner;
+ m_colsTranspositions.coeffRef(k) = col_of_biggest_in_corner;
+ if(k != row_of_biggest_in_corner) {
+ m_lu.row(k).swap(m_lu.row(row_of_biggest_in_corner));
+ ++number_of_transpositions;
+ }
+ if(k != col_of_biggest_in_corner) {
+ m_lu.col(k).swap(m_lu.col(col_of_biggest_in_corner));
+ ++number_of_transpositions;
+ }
+
+ // Now that the pivot is at the right location, we update the remaining
+ // bottom-right corner by Gaussian elimination.
+
+ if(k<rows-1)
+ m_lu.col(k).tail(rows-k-1) /= m_lu.coeff(k,k);
+ if(k<size-1)
+ m_lu.block(k+1,k+1,rows-k-1,cols-k-1).noalias() -= m_lu.col(k).tail(rows-k-1) * m_lu.row(k).tail(cols-k-1);
+ }
+
+ // the main loop is over, we still have to accumulate the transpositions to find the
+ // permutations P and Q
+
+ m_p.setIdentity(rows);
+ for(Index k = size-1; k >= 0; --k)
+ m_p.applyTranspositionOnTheRight(k, m_rowsTranspositions.coeff(k));
+
+ m_q.setIdentity(cols);
+ for(Index k = 0; k < size; ++k)
+ m_q.applyTranspositionOnTheRight(k, m_colsTranspositions.coeff(k));
+
+ m_det_pq = (number_of_transpositions%2) ? -1 : 1;
+ return *this;
+}
+
+template<typename MatrixType>
+typename internal::traits<MatrixType>::Scalar FullPivLU<MatrixType>::determinant() const
+{
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ eigen_assert(m_lu.rows() == m_lu.cols() && "You can't take the determinant of a non-square matrix!");
+ return Scalar(m_det_pq) * Scalar(m_lu.diagonal().prod());
+}
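+// Editor's note (not part of the Eigen sources): since L is unit-lower-triangular and P, Q are
+// permutations, det(A) = det(P^{-1}) * det(L) * det(U) * det(Q^{-1})
+//                      = (-1)^(number of nontrivial row and column transpositions) * prod(diag(U)),
+// which is exactly what Scalar(m_det_pq) * Scalar(m_lu.diagonal().prod()) computes above.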
+
+/** \returns the matrix represented by the decomposition,
+ * i.e., it returns the product: \f$ P^{-1} L U Q^{-1} \f$.
+ * This function is provided for debug purposes. */
+template<typename MatrixType>
+MatrixType FullPivLU<MatrixType>::reconstructedMatrix() const
+{
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ const Index smalldim = (std::min)(m_lu.rows(), m_lu.cols());
+ // LU
+ MatrixType res(m_lu.rows(),m_lu.cols());
+ // FIXME the .toDenseMatrix() should not be needed...
+ res = m_lu.leftCols(smalldim)
+ .template triangularView<UnitLower>().toDenseMatrix()
+ * m_lu.topRows(smalldim)
+ .template triangularView<Upper>().toDenseMatrix();
+
+ // P^{-1}(LU)
+ res = m_p.inverse() * res;
+
+ // (P^{-1}LU)Q^{-1}
+ res = res * m_q.inverse();
+
+ return res;
+}
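+// Editor's note: a small sanity-check sketch (not part of the Eigen sources); the
+// reconstruction error should be at the level of rounding errors:
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 6);
+//   Eigen::FullPivLU<Eigen::MatrixXd> lu(A);
+//   double err = (lu.reconstructedMatrix() - A).norm();   // expect something like 1e-15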
+
+/********* Implementation of kernel() **************************************************/
+
+namespace internal {
+template<typename _MatrixType>
+struct kernel_retval<FullPivLU<_MatrixType> >
+ : kernel_retval_base<FullPivLU<_MatrixType> >
+{
+ EIGEN_MAKE_KERNEL_HELPERS(FullPivLU<_MatrixType>)
+
+ enum { MaxSmallDimAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED(
+ MatrixType::MaxColsAtCompileTime,
+ MatrixType::MaxRowsAtCompileTime)
+ };
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ using std::abs;
+ const Index cols = dec().matrixLU().cols(), dimker = cols - rank();
+ if(dimker == 0)
+ {
+ // The Kernel is just {0}, so it doesn't have a basis properly speaking, but let's
+ // avoid crashing/asserting as that depends on floating point calculations. Let's
+ // just return a single column vector filled with zeros.
+ dst.setZero();
+ return;
+ }
+
+ /* Let us use the following lemma:
+ *
+ * Lemma: If the matrix A has the LU decomposition PAQ = LU,
+ * then Ker A = Q(Ker U).
+ *
+ * Proof: trivial: just keep in mind that P, Q, L are invertible.
+ */
+
+ /* Thus, all we need to do is to compute Ker U, and then apply Q.
+ *
+ * U is upper triangular, with eigenvalues sorted so that any zeros appear at the end.
+ * Thus, the diagonal of U ends with exactly
+ * dimKer zero's. Let us use that to construct dimKer linearly
+ * independent vectors in Ker U.
+ */
+
+ Matrix<Index, Dynamic, 1, 0, MaxSmallDimAtCompileTime, 1> pivots(rank());
+ RealScalar premultiplied_threshold = dec().maxPivot() * dec().threshold();
+ Index p = 0;
+ for(Index i = 0; i < dec().nonzeroPivots(); ++i)
+ if(abs(dec().matrixLU().coeff(i,i)) > premultiplied_threshold)
+ pivots.coeffRef(p++) = i;
+ eigen_internal_assert(p == rank());
+
+    // we construct a temporary trapezoid matrix m, by taking the U matrix and
+ // permuting the rows and cols to bring the nonnegligible pivots to the top of
+ // the main diagonal. We need that to be able to apply our triangular solvers.
+ // FIXME when we get triangularView-for-rectangular-matrices, this can be simplified
+ Matrix<typename MatrixType::Scalar, Dynamic, Dynamic, MatrixType::Options,
+ MaxSmallDimAtCompileTime, MatrixType::MaxColsAtCompileTime>
+ m(dec().matrixLU().block(0, 0, rank(), cols));
+ for(Index i = 0; i < rank(); ++i)
+ {
+ if(i) m.row(i).head(i).setZero();
+ m.row(i).tail(cols-i) = dec().matrixLU().row(pivots.coeff(i)).tail(cols-i);
+ }
+ m.block(0, 0, rank(), rank());
+ m.block(0, 0, rank(), rank()).template triangularView<StrictlyLower>().setZero();
+ for(Index i = 0; i < rank(); ++i)
+ m.col(i).swap(m.col(pivots.coeff(i)));
+
+ // ok, we have our trapezoid matrix, we can apply the triangular solver.
+ // notice that the math behind this suggests that we should apply this to the
+ // negative of the RHS, but for performance we just put the negative sign elsewhere, see below.
+ m.topLeftCorner(rank(), rank())
+ .template triangularView<Upper>().solveInPlace(
+ m.topRightCorner(rank(), dimker)
+ );
+
+ // now we must undo the column permutation that we had applied!
+ for(Index i = rank()-1; i >= 0; --i)
+ m.col(i).swap(m.col(pivots.coeff(i)));
+
+ // see the negative sign in the next line, that's what we were talking about above.
+ for(Index i = 0; i < rank(); ++i) dst.row(dec().permutationQ().indices().coeff(i)) = -m.row(i).tail(dimker);
+ for(Index i = rank(); i < cols; ++i) dst.row(dec().permutationQ().indices().coeff(i)).setZero();
+ for(Index k = 0; k < dimker; ++k) dst.coeffRef(dec().permutationQ().indices().coeff(rank()+k), k) = Scalar(1);
+ }
+};
+
+/***** Implementation of image() *****************************************************/
+
+template<typename _MatrixType>
+struct image_retval<FullPivLU<_MatrixType> >
+ : image_retval_base<FullPivLU<_MatrixType> >
+{
+ EIGEN_MAKE_IMAGE_HELPERS(FullPivLU<_MatrixType>)
+
+ enum { MaxSmallDimAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED(
+ MatrixType::MaxColsAtCompileTime,
+ MatrixType::MaxRowsAtCompileTime)
+ };
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ using std::abs;
+ if(rank() == 0)
+ {
+ // The Image is just {0}, so it doesn't have a basis properly speaking, but let's
+ // avoid crashing/asserting as that depends on floating point calculations. Let's
+ // just return a single column vector filled with zeros.
+ dst.setZero();
+ return;
+ }
+
+ Matrix<Index, Dynamic, 1, 0, MaxSmallDimAtCompileTime, 1> pivots(rank());
+ RealScalar premultiplied_threshold = dec().maxPivot() * dec().threshold();
+ Index p = 0;
+ for(Index i = 0; i < dec().nonzeroPivots(); ++i)
+ if(abs(dec().matrixLU().coeff(i,i)) > premultiplied_threshold)
+ pivots.coeffRef(p++) = i;
+ eigen_internal_assert(p == rank());
+
+ for(Index i = 0; i < rank(); ++i)
+ dst.col(i) = originalMatrix().col(dec().permutationQ().indices().coeff(pivots.coeff(i)));
+ }
+};
+
+/***** Implementation of solve() *****************************************************/
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<FullPivLU<_MatrixType>, Rhs>
+ : solve_retval_base<FullPivLU<_MatrixType>, Rhs>
+{
+ EIGEN_MAKE_SOLVE_HELPERS(FullPivLU<_MatrixType>,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ /* The decomposition PAQ = LU can be rewritten as A = P^{-1} L U Q^{-1}.
+ * So we proceed as follows:
+ * Step 1: compute c = P * rhs.
+ * Step 2: replace c by the solution x to Lx = c. Exists because L is invertible.
+ * Step 3: replace c by the solution x to Ux = c. May or may not exist.
+ * Step 4: result = Q * c;
+ */
+
+ const Index rows = dec().rows(), cols = dec().cols(),
+ nonzero_pivots = dec().nonzeroPivots();
+ eigen_assert(rhs().rows() == rows);
+ const Index smalldim = (std::min)(rows, cols);
+
+ if(nonzero_pivots == 0)
+ {
+ dst.setZero();
+ return;
+ }
+
+ typename Rhs::PlainObject c(rhs().rows(), rhs().cols());
+
+ // Step 1
+ c = dec().permutationP() * rhs();
+
+ // Step 2
+ dec().matrixLU()
+ .topLeftCorner(smalldim,smalldim)
+ .template triangularView<UnitLower>()
+ .solveInPlace(c.topRows(smalldim));
+ if(rows>cols)
+ {
+ c.bottomRows(rows-cols)
+ -= dec().matrixLU().bottomRows(rows-cols)
+ * c.topRows(cols);
+ }
+
+ // Step 3
+ dec().matrixLU()
+ .topLeftCorner(nonzero_pivots, nonzero_pivots)
+ .template triangularView<Upper>()
+ .solveInPlace(c.topRows(nonzero_pivots));
+
+ // Step 4
+ for(Index i = 0; i < nonzero_pivots; ++i)
+ dst.row(dec().permutationQ().indices().coeff(i)) = c.row(i);
+ for(Index i = nonzero_pivots; i < dec().matrixLU().cols(); ++i)
+ dst.row(dec().permutationQ().indices().coeff(i)).setZero();
+ }
+};
+
+} // end namespace internal
+
+/******* MatrixBase methods *****************************************************************/
+
+/** \lu_module
+ *
+ * \return the full-pivoting LU decomposition of \c *this.
+ *
+ * \sa class FullPivLU
+ */
+#ifndef __CUDACC__
+template<typename Derived>
+inline const FullPivLU<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::fullPivLu() const
+{
+ return FullPivLU<PlainObject>(eval());
+}
+#endif
+
+} // end namespace Eigen
+
+#endif // EIGEN_LU_H
diff --git a/third_party/eigen3/Eigen/src/LU/Inverse.h b/third_party/eigen3/Eigen/src/LU/Inverse.h
new file mode 100644
index 0000000000..8d1364e0a9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/Inverse.h
@@ -0,0 +1,417 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_INVERSE_H
+#define EIGEN_INVERSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**********************************
+*** General case implementation ***
+**********************************/
+
+template<typename MatrixType, typename ResultType, int Size = MatrixType::RowsAtCompileTime>
+struct compute_inverse
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(const MatrixType& matrix, ResultType& result)
+ {
+ result = matrix.partialPivLu().inverse();
+ }
+};
+
+template<typename MatrixType, typename ResultType, int Size = MatrixType::RowsAtCompileTime>
+struct compute_inverse_and_det_with_check { /* nothing! general case not supported. */ };
+
+/****************************
+*** Size 1 implementation ***
+****************************/
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse<MatrixType, ResultType, 1>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(const MatrixType& matrix, ResultType& result)
+ {
+ typedef typename MatrixType::Scalar Scalar;
+ result.coeffRef(0,0) = Scalar(1) / matrix.coeff(0,0);
+ }
+};
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_and_det_with_check<MatrixType, ResultType, 1>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(
+ const MatrixType& matrix,
+ const typename MatrixType::RealScalar& absDeterminantThreshold,
+ ResultType& result,
+ typename ResultType::Scalar& determinant,
+ bool& invertible
+ )
+ {
+ using std::abs;
+ determinant = matrix.coeff(0,0);
+ invertible = abs(determinant) > absDeterminantThreshold;
+ if(invertible) result.coeffRef(0,0) = typename ResultType::Scalar(1) / determinant;
+ }
+};
+
+/****************************
+*** Size 2 implementation ***
+****************************/
+
+template<typename MatrixType, typename ResultType>
+EIGEN_DEVICE_FUNC
+inline void compute_inverse_size2_helper(
+ const MatrixType& matrix, const typename ResultType::Scalar& invdet,
+ ResultType& result)
+{
+ result.coeffRef(0,0) = matrix.coeff(1,1) * invdet;
+ result.coeffRef(1,0) = -matrix.coeff(1,0) * invdet;
+ result.coeffRef(0,1) = -matrix.coeff(0,1) * invdet;
+ result.coeffRef(1,1) = matrix.coeff(0,0) * invdet;
+}
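+// Editor's note (not part of the Eigen sources): the helper above is the closed-form 2x2
+// inverse written entry by entry: with M = [a b; c d] and invdet = 1/(a*d - b*c), the
+// result is filled with [ d -b; -c a ] * invdet.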
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse<MatrixType, ResultType, 2>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(const MatrixType& matrix, ResultType& result)
+ {
+ typedef typename ResultType::Scalar Scalar;
+ const Scalar invdet = typename MatrixType::Scalar(1) / matrix.determinant();
+ compute_inverse_size2_helper(matrix, invdet, result);
+ }
+};
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_and_det_with_check<MatrixType, ResultType, 2>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(
+ const MatrixType& matrix,
+ const typename MatrixType::RealScalar& absDeterminantThreshold,
+ ResultType& inverse,
+ typename ResultType::Scalar& determinant,
+ bool& invertible
+ )
+ {
+ using std::abs;
+ typedef typename ResultType::Scalar Scalar;
+ determinant = matrix.determinant();
+ invertible = abs(determinant) > absDeterminantThreshold;
+ if(!invertible) return;
+ const Scalar invdet = Scalar(1) / determinant;
+ compute_inverse_size2_helper(matrix, invdet, inverse);
+ }
+};
+
+/****************************
+*** Size 3 implementation ***
+****************************/
+
+template<typename MatrixType, int i, int j>
+EIGEN_DEVICE_FUNC
+inline typename MatrixType::Scalar cofactor_3x3(const MatrixType& m)
+{
+ enum {
+ i1 = (i+1) % 3,
+ i2 = (i+2) % 3,
+ j1 = (j+1) % 3,
+ j2 = (j+2) % 3
+ };
+ return m.coeff(i1, j1) * m.coeff(i2, j2)
+ - m.coeff(i1, j2) * m.coeff(i2, j1);
+}
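+// Editor's note (not part of the Eigen sources): because the indices above are taken
+// cyclically (mod 3), cofactor_3x3<M,i,j> already carries the (-1)^(i+j) cofactor sign of
+// entry (i,j); the callers below store it at position (j,i), building the adjugate, so
+// inverse = adjugate * invdet with no extra sign bookkeeping.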
+
+template<typename MatrixType, typename ResultType>
+EIGEN_DEVICE_FUNC
+inline void compute_inverse_size3_helper(
+ const MatrixType& matrix,
+ const typename ResultType::Scalar& invdet,
+ const Matrix<typename ResultType::Scalar,3,1>& cofactors_col0,
+ ResultType& result)
+{
+ result.row(0) = cofactors_col0 * invdet;
+ result.coeffRef(1,0) = cofactor_3x3<MatrixType,0,1>(matrix) * invdet;
+ result.coeffRef(1,1) = cofactor_3x3<MatrixType,1,1>(matrix) * invdet;
+ result.coeffRef(1,2) = cofactor_3x3<MatrixType,2,1>(matrix) * invdet;
+ result.coeffRef(2,0) = cofactor_3x3<MatrixType,0,2>(matrix) * invdet;
+ result.coeffRef(2,1) = cofactor_3x3<MatrixType,1,2>(matrix) * invdet;
+ result.coeffRef(2,2) = cofactor_3x3<MatrixType,2,2>(matrix) * invdet;
+}
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse<MatrixType, ResultType, 3>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(const MatrixType& matrix, ResultType& result)
+ {
+ typedef typename ResultType::Scalar Scalar;
+ Matrix<typename MatrixType::Scalar,3,1> cofactors_col0;
+ cofactors_col0.coeffRef(0) = cofactor_3x3<MatrixType,0,0>(matrix);
+ cofactors_col0.coeffRef(1) = cofactor_3x3<MatrixType,1,0>(matrix);
+ cofactors_col0.coeffRef(2) = cofactor_3x3<MatrixType,2,0>(matrix);
+ const Scalar det = (cofactors_col0.cwiseProduct(matrix.col(0))).sum();
+ const Scalar invdet = Scalar(1) / det;
+ compute_inverse_size3_helper(matrix, invdet, cofactors_col0, result);
+ }
+};
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_and_det_with_check<MatrixType, ResultType, 3>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(
+ const MatrixType& matrix,
+ const typename MatrixType::RealScalar& absDeterminantThreshold,
+ ResultType& inverse,
+ typename ResultType::Scalar& determinant,
+ bool& invertible
+ )
+ {
+ using std::abs;
+ typedef typename ResultType::Scalar Scalar;
+ Matrix<Scalar,3,1> cofactors_col0;
+ cofactors_col0.coeffRef(0) = cofactor_3x3<MatrixType,0,0>(matrix);
+ cofactors_col0.coeffRef(1) = cofactor_3x3<MatrixType,1,0>(matrix);
+ cofactors_col0.coeffRef(2) = cofactor_3x3<MatrixType,2,0>(matrix);
+ determinant = (cofactors_col0.cwiseProduct(matrix.col(0))).sum();
+ invertible = abs(determinant) > absDeterminantThreshold;
+ if(!invertible) return;
+ const Scalar invdet = Scalar(1) / determinant;
+ compute_inverse_size3_helper(matrix, invdet, cofactors_col0, inverse);
+ }
+};
+
+/****************************
+*** Size 4 implementation ***
+****************************/
+
+template<typename Derived>
+EIGEN_DEVICE_FUNC
+inline const typename Derived::Scalar general_det3_helper
+(const MatrixBase<Derived>& matrix, int i1, int i2, int i3, int j1, int j2, int j3)
+{
+ return matrix.coeff(i1,j1)
+ * (matrix.coeff(i2,j2) * matrix.coeff(i3,j3) - matrix.coeff(i2,j3) * matrix.coeff(i3,j2));
+}
+
+template<typename MatrixType, int i, int j>
+EIGEN_DEVICE_FUNC
+inline typename MatrixType::Scalar cofactor_4x4(const MatrixType& matrix)
+{
+ enum {
+ i1 = (i+1) % 4,
+ i2 = (i+2) % 4,
+ i3 = (i+3) % 4,
+ j1 = (j+1) % 4,
+ j2 = (j+2) % 4,
+ j3 = (j+3) % 4
+ };
+ return general_det3_helper(matrix, i1, i2, i3, j1, j2, j3)
+ + general_det3_helper(matrix, i2, i3, i1, j1, j2, j3)
+ + general_det3_helper(matrix, i3, i1, i2, j1, j2, j3);
+}
+
+template<int Arch, typename Scalar, typename MatrixType, typename ResultType>
+struct compute_inverse_size4
+{
+ EIGEN_DEVICE_FUNC
+ static void run(const MatrixType& matrix, ResultType& result)
+ {
+ result.coeffRef(0,0) = cofactor_4x4<MatrixType,0,0>(matrix);
+ result.coeffRef(1,0) = -cofactor_4x4<MatrixType,0,1>(matrix);
+ result.coeffRef(2,0) = cofactor_4x4<MatrixType,0,2>(matrix);
+ result.coeffRef(3,0) = -cofactor_4x4<MatrixType,0,3>(matrix);
+ result.coeffRef(0,2) = cofactor_4x4<MatrixType,2,0>(matrix);
+ result.coeffRef(1,2) = -cofactor_4x4<MatrixType,2,1>(matrix);
+ result.coeffRef(2,2) = cofactor_4x4<MatrixType,2,2>(matrix);
+ result.coeffRef(3,2) = -cofactor_4x4<MatrixType,2,3>(matrix);
+ result.coeffRef(0,1) = -cofactor_4x4<MatrixType,1,0>(matrix);
+ result.coeffRef(1,1) = cofactor_4x4<MatrixType,1,1>(matrix);
+ result.coeffRef(2,1) = -cofactor_4x4<MatrixType,1,2>(matrix);
+ result.coeffRef(3,1) = cofactor_4x4<MatrixType,1,3>(matrix);
+ result.coeffRef(0,3) = -cofactor_4x4<MatrixType,3,0>(matrix);
+ result.coeffRef(1,3) = cofactor_4x4<MatrixType,3,1>(matrix);
+ result.coeffRef(2,3) = -cofactor_4x4<MatrixType,3,2>(matrix);
+ result.coeffRef(3,3) = cofactor_4x4<MatrixType,3,3>(matrix);
+ result /= (matrix.col(0).cwiseProduct(result.row(0).transpose())).sum();
+ }
+};
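+// Editor's note (not part of the Eigen sources): the generic size-4 path above fills the
+// result with the adjugate (transposed cofactor matrix) of the input, then divides by the
+// determinant, recovered as the dot product of the first column of the matrix with the
+// first row of the adjugate (a Laplace expansion along column 0). Architecture-specific
+// specializations of compute_inverse_size4 (such as the SSE version added later in this
+// change) can replace this scalar path.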
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse<MatrixType, ResultType, 4>
+ : compute_inverse_size4<Architecture::Target, typename MatrixType::Scalar,
+ MatrixType, ResultType>
+{
+};
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_and_det_with_check<MatrixType, ResultType, 4>
+{
+ EIGEN_DEVICE_FUNC
+ static inline void run(
+ const MatrixType& matrix,
+ const typename MatrixType::RealScalar& absDeterminantThreshold,
+ ResultType& inverse,
+ typename ResultType::Scalar& determinant,
+ bool& invertible
+ )
+ {
+ using std::abs;
+ determinant = matrix.determinant();
+ invertible = abs(determinant) > absDeterminantThreshold;
+ if(invertible) compute_inverse<MatrixType, ResultType>::run(matrix, inverse);
+ }
+};
+
+/*************************
+*** MatrixBase methods ***
+*************************/
+
+template<typename MatrixType>
+struct traits<inverse_impl<MatrixType> >
+{
+ typedef typename MatrixType::PlainObject ReturnType;
+};
+
+template<typename MatrixType>
+struct inverse_impl : public ReturnByValue<inverse_impl<MatrixType> >
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename internal::eval<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_all<MatrixTypeNested>::type MatrixTypeNestedCleaned;
+ MatrixTypeNested m_matrix;
+
+ EIGEN_DEVICE_FUNC
+ inverse_impl(const MatrixType& matrix)
+ : m_matrix(matrix)
+ {}
+
+ EIGEN_DEVICE_FUNC inline Index rows() const { return m_matrix.rows(); }
+ EIGEN_DEVICE_FUNC inline Index cols() const { return m_matrix.cols(); }
+
+ template<typename Dest>
+ EIGEN_DEVICE_FUNC
+ inline void evalTo(Dest& dst) const
+ {
+ const int Size = EIGEN_PLAIN_ENUM_MIN(MatrixType::ColsAtCompileTime,Dest::ColsAtCompileTime);
+ EIGEN_ONLY_USED_FOR_DEBUG(Size);
+ eigen_assert(( (Size<=1) || (Size>4) || (extract_data(m_matrix)!=extract_data(dst)))
+ && "Aliasing problem detected in inverse(), you need to do inverse().eval() here.");
+
+ compute_inverse<MatrixTypeNestedCleaned, Dest>::run(m_matrix, dst);
+ }
+};
+
+} // end namespace internal
+
+/** \lu_module
+ *
+ * \returns the matrix inverse of this matrix.
+ *
+ * For small fixed sizes up to 4x4, this method uses cofactors.
+ * In the general case, this method uses class PartialPivLU.
+ *
+ * \note This matrix must be invertible, otherwise the result is undefined. If you need an
+ * invertibility check, do the following:
+ * \li for fixed sizes up to 4x4, use computeInverseAndDetWithCheck().
+ * \li for the general case, use class FullPivLU.
+ *
+ * Example: \include MatrixBase_inverse.cpp
+ * Output: \verbinclude MatrixBase_inverse.out
+ *
+ * \sa computeInverseAndDetWithCheck()
+ */
+template<typename Derived>
+inline const internal::inverse_impl<Derived> MatrixBase<Derived>::inverse() const
+{
+ EIGEN_STATIC_ASSERT(!NumTraits<Scalar>::IsInteger,THIS_FUNCTION_IS_NOT_FOR_INTEGER_NUMERIC_TYPES)
+ eigen_assert(rows() == cols());
+ return internal::inverse_impl<Derived>(derived());
+}
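+// Editor's note: a one-line usage sketch (not part of the Eigen sources); as documented
+// above, the matrix must already be known to be invertible:
+//
+//   Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
+//   Eigen::Matrix4d Tinv = T.inverse();   // fixed size 4x4, so the cofactor path is used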
+
+/** \lu_module
+ *
+ * Computation of matrix inverse and determinant, with invertibility check.
+ *
+ * This is only for fixed-size square matrices of size up to 4x4.
+ *
+ * \param inverse Reference to the matrix in which to store the inverse.
+ * \param determinant Reference to the variable in which to store the determinant.
+ * \param invertible Reference to the bool variable in which to store whether the matrix is invertible.
+ * \param absDeterminantThreshold Optional parameter controlling the invertibility check.
+ * The matrix will be declared invertible if the absolute value of its
+ * determinant is greater than this threshold.
+ *
+ * Example: \include MatrixBase_computeInverseAndDetWithCheck.cpp
+ * Output: \verbinclude MatrixBase_computeInverseAndDetWithCheck.out
+ *
+ * \sa inverse(), computeInverseWithCheck()
+ */
+template<typename Derived>
+template<typename ResultType>
+inline void MatrixBase<Derived>::computeInverseAndDetWithCheck(
+ ResultType& inverse,
+ typename ResultType::Scalar& determinant,
+ bool& invertible,
+ const RealScalar& absDeterminantThreshold
+ ) const
+{
+  // I'd love to put some static assertions there, but SFINAE means that they have no effect...
+ eigen_assert(rows() == cols());
+ // for 2x2, it's worth giving a chance to avoid evaluating.
+ // for larger sizes, evaluating has negligible cost and limits code size.
+ typedef typename internal::conditional<
+ RowsAtCompileTime == 2,
+ typename internal::remove_all<typename internal::nested<Derived, 2>::type>::type,
+ PlainObject
+ >::type MatrixType;
+ internal::compute_inverse_and_det_with_check<MatrixType, ResultType>::run
+ (derived(), absDeterminantThreshold, inverse, determinant, invertible);
+}
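+// Editor's note: a minimal usage sketch (not part of the Eigen sources) for the
+// checked-inverse API above; names are illustrative:
+//
+//   Eigen::Matrix3d M = Eigen::Matrix3d::Random();
+//   Eigen::Matrix3d Minv;
+//   double det;
+//   bool invertible;
+//   M.computeInverseAndDetWithCheck(Minv, det, invertible);
+//   if (invertible) {
+//     // safe to use Minv; otherwise fall back to FullPivLU or a pseudo-inverse
+//   }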
+
+/** \lu_module
+ *
+ * Computation of matrix inverse, with invertibility check.
+ *
+ * This is only for fixed-size square matrices of size up to 4x4.
+ *
+ * \param inverse Reference to the matrix in which to store the inverse.
+ * \param invertible Reference to the bool variable in which to store whether the matrix is invertible.
+ * \param absDeterminantThreshold Optional parameter controlling the invertibility check.
+ * The matrix will be declared invertible if the absolute value of its
+ * determinant is greater than this threshold.
+ *
+ * Example: \include MatrixBase_computeInverseWithCheck.cpp
+ * Output: \verbinclude MatrixBase_computeInverseWithCheck.out
+ *
+ * \sa inverse(), computeInverseAndDetWithCheck()
+ */
+template<typename Derived>
+template<typename ResultType>
+inline void MatrixBase<Derived>::computeInverseWithCheck(
+ ResultType& inverse,
+ bool& invertible,
+ const RealScalar& absDeterminantThreshold
+ ) const
+{
+ RealScalar determinant;
+  // I'd love to put some static assertions there, but SFINAE means that they have no effect...
+ eigen_assert(rows() == cols());
+ computeInverseAndDetWithCheck(inverse,determinant,invertible,absDeterminantThreshold);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_INVERSE_H
diff --git a/third_party/eigen3/Eigen/src/LU/PartialPivLU.h b/third_party/eigen3/Eigen/src/LU/PartialPivLU.h
new file mode 100644
index 0000000000..1d389ecac7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/PartialPivLU.h
@@ -0,0 +1,506 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2006-2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PARTIALLU_H
+#define EIGEN_PARTIALLU_H
+
+namespace Eigen {
+
+/** \ingroup LU_Module
+ *
+ * \class PartialPivLU
+ *
+ * \brief LU decomposition of a matrix with partial pivoting, and related features
+ *
+ * \param MatrixType the type of the matrix of which we are computing the LU decomposition
+ *
+  * This class represents an LU decomposition of a \b square \b invertible matrix, with partial pivoting: the matrix A
+ * is decomposed as A = PLU where L is unit-lower-triangular, U is upper-triangular, and P
+ * is a permutation matrix.
+ *
+ * Typically, partial pivoting LU decomposition is only considered numerically stable for square invertible
+ * matrices. Thus LAPACK's dgesv and dgesvx require the matrix to be square and invertible. The present class
+ * does the same. It will assert that the matrix is square, but it won't (actually it can't) check that the
+ * matrix is invertible: it is your task to check that you only use this decomposition on invertible matrices.
+ *
+ * The guaranteed safe alternative, working for all matrices, is the full pivoting LU decomposition, provided
+ * by class FullPivLU.
+ *
+ * This is \b not a rank-revealing LU decomposition. Many features are intentionally absent from this class,
+ * such as rank computation. If you need these features, use class FullPivLU.
+ *
+ * This LU decomposition is suitable to invert invertible matrices. It is what MatrixBase::inverse() uses
+ * in the general case.
+ * On the other hand, it is \b not suitable to determine whether a given matrix is invertible.
+ *
+ * The data of the LU decomposition can be directly accessed through the methods matrixLU(), permutationP().
+ *
+ * \sa MatrixBase::partialPivLu(), MatrixBase::determinant(), MatrixBase::inverse(), MatrixBase::computeInverse(), class FullPivLU
+ */
+template<typename _MatrixType> class PartialPivLU
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef typename internal::traits<MatrixType>::StorageKind StorageKind;
+ typedef typename MatrixType::Index Index;
+ typedef PermutationMatrix<RowsAtCompileTime, MaxRowsAtCompileTime> PermutationType;
+ typedef Transpositions<RowsAtCompileTime, MaxRowsAtCompileTime> TranspositionType;
+
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via PartialPivLU::compute(const MatrixType&).
+ */
+ PartialPivLU();
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa PartialPivLU()
+ */
+ PartialPivLU(Index size);
+
+ /** Constructor.
+ *
+ * \param matrix the matrix of which to compute the LU decomposition.
+ *
+ * \warning The matrix should have full rank (e.g. if it's square, it should be invertible).
+ * If you need to deal with non-full rank, use class FullPivLU instead.
+ */
+ PartialPivLU(const MatrixType& matrix);
+
+ PartialPivLU& compute(const MatrixType& matrix);
+
+ /** \returns the LU decomposition matrix: the upper-triangular part is U, the
+ * unit-lower-triangular part is L (at least for square matrices; in the non-square
+ * case, special care is needed, see the documentation of class FullPivLU).
+ *
+ * \sa matrixL(), matrixU()
+ */
+ inline const MatrixType& matrixLU() const
+ {
+ eigen_assert(m_isInitialized && "PartialPivLU is not initialized.");
+ return m_lu;
+ }
+
+ /** \returns the permutation matrix P.
+ */
+ inline const PermutationType& permutationP() const
+ {
+ eigen_assert(m_isInitialized && "PartialPivLU is not initialized.");
+ return m_p;
+ }
+
+ /** This method returns the solution x to the equation Ax=b, where A is the matrix of which
+ * *this is the LU decomposition.
+ *
+ * \param b the right-hand-side of the equation to solve. Can be a vector or a matrix,
+ * the only requirement in order for the equation to make sense is that
+ * b.rows()==A.rows(), where A is the matrix of which *this is the LU decomposition.
+ *
+ * \returns the solution.
+ *
+ * Example: \include PartialPivLU_solve.cpp
+ * Output: \verbinclude PartialPivLU_solve.out
+ *
+ * Since this PartialPivLU class assumes anyway that the matrix A is invertible, the solution
+ * theoretically exists and is unique regardless of b.
+ *
+ * \sa TriangularView::solve(), inverse(), computeInverse()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<PartialPivLU, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "PartialPivLU is not initialized.");
+ return internal::solve_retval<PartialPivLU, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the inverse of the matrix of which *this is the LU decomposition.
+ *
+ * \warning The matrix being decomposed here is assumed to be invertible. If you need to check for
+ * invertibility, use class FullPivLU instead.
+ *
+ * \sa MatrixBase::inverse(), LU::inverse()
+ */
+ inline const internal::solve_retval<PartialPivLU,typename MatrixType::IdentityReturnType> inverse() const
+ {
+ eigen_assert(m_isInitialized && "PartialPivLU is not initialized.");
+ return internal::solve_retval<PartialPivLU,typename MatrixType::IdentityReturnType>
+ (*this, MatrixType::Identity(m_lu.rows(), m_lu.cols()));
+ }
+
+ /** \returns the determinant of the matrix of which
+ * *this is the LU decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the LU decomposition has already been computed.
+ *
+ * \note For fixed-size matrices of size up to 4, MatrixBase::determinant() offers
+ * optimized paths.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ *
+ * \sa MatrixBase::determinant()
+ */
+ typename internal::traits<MatrixType>::Scalar determinant() const;
+
+ MatrixType reconstructedMatrix() const;
+
+ inline Index rows() const { return m_lu.rows(); }
+ inline Index cols() const { return m_lu.cols(); }
+
+ protected:
+ MatrixType m_lu;
+ PermutationType m_p;
+ TranspositionType m_rowsTranspositions;
+ Index m_det_p;
+ bool m_isInitialized;
+};
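+// Editor's note: a minimal usage sketch (not part of the Eigen sources) of the PartialPivLU
+// interface documented above, on an assumed invertible square matrix:
+//
+//   Eigen::MatrixXd A = Eigen::MatrixXd::Random(4, 4);   // assumed invertible
+//   Eigen::VectorXd b = Eigen::VectorXd::Random(4);
+//   Eigen::PartialPivLU<Eigen::MatrixXd> lu(A);
+//   Eigen::VectorXd x = lu.solve(b);                     // solves A x = b
+//   Eigen::MatrixXd Ainv = lu.inverse();
+//   double detA = lu.determinant();
+//
+// The same decomposition is available as A.partialPivLu() from MatrixBase (see below).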
+
+template<typename MatrixType>
+PartialPivLU<MatrixType>::PartialPivLU()
+ : m_lu(),
+ m_p(),
+ m_rowsTranspositions(),
+ m_det_p(0),
+ m_isInitialized(false)
+{
+}
+
+template<typename MatrixType>
+PartialPivLU<MatrixType>::PartialPivLU(Index size)
+ : m_lu(size, size),
+ m_p(size),
+ m_rowsTranspositions(size),
+ m_det_p(0),
+ m_isInitialized(false)
+{
+}
+
+template<typename MatrixType>
+PartialPivLU<MatrixType>::PartialPivLU(const MatrixType& matrix)
+ : m_lu(matrix.rows(), matrix.rows()),
+ m_p(matrix.rows()),
+ m_rowsTranspositions(matrix.rows()),
+ m_det_p(0),
+ m_isInitialized(false)
+{
+ compute(matrix);
+}
+
+namespace internal {
+
+/** \internal Unblocked and blocked in-place implementations of LU decomposition with partial pivoting. */
+template<typename Scalar, int StorageOrder, typename PivIndex>
+struct partial_lu_impl
+{
+ // FIXME add a stride to Map, so that the following mapping becomes easier,
+ // another option would be to create an expression being able to automatically
+  // wrap any Map, Matrix, and Block expressions as a unique type, but since that's exactly
+  // a Map + stride, why not add a stride to Map, and convenient ctors from a Matrix,
+ // and Block.
+ typedef Map<Matrix<Scalar, Dynamic, Dynamic, StorageOrder> > MapLU;
+ typedef Block<MapLU, Dynamic, Dynamic> MatrixType;
+ typedef Block<MatrixType,Dynamic,Dynamic> BlockType;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+
+ /** \internal performs the LU decomposition in-place of the matrix \a lu
+ * using an unblocked algorithm.
+ *
+ * In addition, this function returns the row transpositions in the
+ * vector \a row_transpositions which must have a size equal to the number
+ * of columns of the matrix \a lu, and an integer \a nb_transpositions
+ * which returns the actual number of transpositions.
+ *
+ * \returns The index of the first pivot which is exactly zero if any, or a negative number otherwise.
+ */
+ static Index unblocked_lu(MatrixType& lu, PivIndex* row_transpositions, PivIndex& nb_transpositions)
+ {
+ const Index rows = lu.rows();
+ const Index cols = lu.cols();
+ const Index size = (std::min)(rows,cols);
+ nb_transpositions = 0;
+ Index first_zero_pivot = -1;
+ for(Index k = 0; k < size; ++k)
+ {
+ Index rrows = rows-k-1;
+ Index rcols = cols-k-1;
+
+ Index row_of_biggest_in_col;
+ RealScalar biggest_in_corner
+ = lu.col(k).tail(rows-k).cwiseAbs().maxCoeff(&row_of_biggest_in_col);
+ row_of_biggest_in_col += k;
+
+ row_transpositions[k] = PivIndex(row_of_biggest_in_col);
+
+ if(biggest_in_corner != RealScalar(0))
+ {
+ if(k != row_of_biggest_in_col)
+ {
+ lu.row(k).swap(lu.row(row_of_biggest_in_col));
+ ++nb_transpositions;
+ }
+
+        // FIXME shall we introduce a safe quotient expression in case 1/lu.coeff(k,k)
+        // overflows but the actual quotient does not?
+ lu.col(k).tail(rrows) /= lu.coeff(k,k);
+ }
+ else if(first_zero_pivot==-1)
+ {
+ // the pivot is exactly zero, we record the index of the first pivot which is exactly 0,
+        // and continue the factorization such that we still have A = PLU
+ first_zero_pivot = k;
+ }
+
+ if(k<rows-1)
+ lu.bottomRightCorner(rrows,rcols).noalias() -= lu.col(k).tail(rrows) * lu.row(k).tail(rcols);
+ }
+ return first_zero_pivot;
+ }
+
+ /** \internal performs the LU decomposition in-place of the matrix represented
+ * by the variables \a rows, \a cols, \a lu_data, and \a lu_stride using a
+ * recursive, blocked algorithm.
+ *
+ * In addition, this function returns the row transpositions in the
+ * vector \a row_transpositions which must have a size equal to the number
+ * of columns of the matrix \a lu, and an integer \a nb_transpositions
+ * which returns the actual number of transpositions.
+ *
+ * \returns The index of the first pivot which is exactly zero if any, or a negative number otherwise.
+ *
+    * \note This very low-level interface using pointers, etc. is meant to:
+    *   1 - reduce the number of instantiations to the strict minimum
+    *   2 - avoid infinite recursion of the instantiations with Block<Block<Block<...> > >
+ */
+ static Index blocked_lu(Index rows, Index cols, Scalar* lu_data, Index luStride, PivIndex* row_transpositions, PivIndex& nb_transpositions, Index maxBlockSize=256)
+ {
+ MapLU lu1(lu_data,StorageOrder==RowMajor?rows:luStride,StorageOrder==RowMajor?luStride:cols);
+ MatrixType lu(lu1,0,0,rows,cols);
+
+ const Index size = (std::min)(rows,cols);
+
+ // if the matrix is too small, no blocking:
+ if(size<=16)
+ {
+ return unblocked_lu(lu, row_transpositions, nb_transpositions);
+ }
+
+ // automatically adjust the number of subdivisions to the size
+ // of the matrix so that there is enough sub blocks:
+ Index blockSize;
+ {
+ blockSize = size/8;
+ blockSize = (blockSize/16)*16;
+ blockSize = (std::min)((std::max)(blockSize,Index(8)), maxBlockSize);
+ }
+
+ nb_transpositions = 0;
+ Index first_zero_pivot = -1;
+ for(Index k = 0; k < size; k+=blockSize)
+ {
+ Index bs = (std::min)(size-k,blockSize); // actual size of the block
+ Index trows = rows - k - bs; // trailing rows
+ Index tsize = size - k - bs; // trailing size
+
+ // partition the matrix:
+ // A00 | A01 | A02
+ // lu = A_0 | A_1 | A_2 = A10 | A11 | A12
+ // A20 | A21 | A22
+ BlockType A_0(lu,0,0,rows,k);
+ BlockType A_2(lu,0,k+bs,rows,tsize);
+ BlockType A11(lu,k,k,bs,bs);
+ BlockType A12(lu,k,k+bs,bs,tsize);
+ BlockType A21(lu,k+bs,k,trows,bs);
+ BlockType A22(lu,k+bs,k+bs,trows,tsize);
+
+ PivIndex nb_transpositions_in_panel;
+ // recursively call the blocked LU algorithm on [A11^T A21^T]^T
+ // with a very small blocking size:
+ Index ret = blocked_lu(trows+bs, bs, &lu.coeffRef(k,k), luStride,
+ row_transpositions+k, nb_transpositions_in_panel, 16);
+ if(ret>=0 && first_zero_pivot==-1)
+ first_zero_pivot = k+ret;
+
+ nb_transpositions += nb_transpositions_in_panel;
+ // update permutations and apply them to A_0
+ for(Index i=k; i<k+bs; ++i)
+ {
+ Index piv = (row_transpositions[i] += k);
+ A_0.row(i).swap(A_0.row(piv));
+ }
+
+ if(trows)
+ {
+ // apply permutations to A_2
+ for(Index i=k;i<k+bs; ++i)
+ A_2.row(i).swap(A_2.row(row_transpositions[i]));
+
+ // A12 = A11^-1 A12
+ A11.template triangularView<UnitLower>().solveInPlace(A12);
+
+ A22.noalias() -= A21 * A12;
+ }
+ }
+ return first_zero_pivot;
+ }
+};
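+// Editor's note (not part of the Eigen sources): summary of the blocked algorithm above.
+// For each panel of bs columns it (1) recursively factorizes the panel [A11; A21] with a
+// small block size, (2) applies the panel's row transpositions to the columns on the left
+// (A_0) and on the right (A_2), (3) computes A12 = A11^{-1} A12 by a unit-lower triangular
+// solve, and (4) performs the Schur-complement update A22 -= A21 * A12. The index of the
+// first exactly-zero pivot, if any, is propagated so callers can detect singularity.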
+
+/** \internal performs the LU decomposition with partial pivoting in-place.
+ */
+template<typename MatrixType, typename TranspositionType>
+void partial_lu_inplace(MatrixType& lu, TranspositionType& row_transpositions, typename TranspositionType::Index& nb_transpositions)
+{
+ eigen_assert(lu.cols() == row_transpositions.size());
+ eigen_assert((&row_transpositions.coeffRef(1)-&row_transpositions.coeffRef(0)) == 1);
+
+ partial_lu_impl
+ <typename MatrixType::Scalar, MatrixType::Flags&RowMajorBit?RowMajor:ColMajor, typename TranspositionType::Index>
+ ::blocked_lu(lu.rows(), lu.cols(), &lu.coeffRef(0,0), lu.outerStride(), &row_transpositions.coeffRef(0), nb_transpositions);
+}
+
+} // end namespace internal
+
+template<typename MatrixType>
+PartialPivLU<MatrixType>& PartialPivLU<MatrixType>::compute(const MatrixType& matrix)
+{
+ // the row permutation is stored as int indices, so just to be sure:
+ eigen_assert(matrix.rows()<NumTraits<int>::highest());
+
+ m_lu = matrix;
+
+ eigen_assert(matrix.rows() == matrix.cols() && "PartialPivLU is only for square (and moreover invertible) matrices");
+ const Index size = matrix.rows();
+
+ m_rowsTranspositions.resize(size);
+
+ typename TranspositionType::Index nb_transpositions;
+ internal::partial_lu_inplace(m_lu, m_rowsTranspositions, nb_transpositions);
+ m_det_p = (nb_transpositions%2) ? -1 : 1;
+
+ m_p = m_rowsTranspositions;
+
+ m_isInitialized = true;
+ return *this;
+}
+
+template<typename MatrixType>
+typename internal::traits<MatrixType>::Scalar PartialPivLU<MatrixType>::determinant() const
+{
+ eigen_assert(m_isInitialized && "PartialPivLU is not initialized.");
+ return Scalar(m_det_p) * m_lu.diagonal().prod();
+}
+
+/** \returns the matrix represented by the decomposition,
+ * i.e., it returns the product: P^{-1} L U.
+  * This function is provided for debug purposes. */
+template<typename MatrixType>
+MatrixType PartialPivLU<MatrixType>::reconstructedMatrix() const
+{
+ eigen_assert(m_isInitialized && "LU is not initialized.");
+ // LU
+ MatrixType res = m_lu.template triangularView<UnitLower>().toDenseMatrix()
+ * m_lu.template triangularView<Upper>();
+
+ // P^{-1}(LU)
+ res = m_p.inverse() * res;
+
+ return res;
+}
+
+/***** Implementation of solve() *****************************************************/
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<PartialPivLU<_MatrixType>, Rhs>
+ : solve_retval_base<PartialPivLU<_MatrixType>, Rhs>
+{
+ EIGEN_MAKE_SOLVE_HELPERS(PartialPivLU<_MatrixType>,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ /* The decomposition PA = LU can be rewritten as A = P^{-1} L U.
+ * So we proceed as follows:
+ * Step 1: compute c = Pb.
+ * Step 2: replace c by the solution x to Lx = c.
+ * Step 3: replace c by the solution x to Ux = c.
+ */
+
+ eigen_assert(rhs().rows() == dec().matrixLU().rows());
+
+ // Step 1
+ dst = dec().permutationP() * rhs();
+
+ // Step 2
+ dec().matrixLU().template triangularView<UnitLower>().solveInPlace(dst);
+
+ // Step 3
+ dec().matrixLU().template triangularView<Upper>().solveInPlace(dst);
+ }
+};
+
+} // end namespace internal
+
+/******** MatrixBase methods *******/
+
+/** \lu_module
+ *
+ * \return the partial-pivoting LU decomposition of \c *this.
+ *
+ * \sa class PartialPivLU
+ */
+#ifndef __CUDACC__
+template<typename Derived>
+inline const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::partialPivLu() const
+{
+ return PartialPivLU<PlainObject>(eval());
+}
+#endif
+
+#if EIGEN2_SUPPORT_STAGE > STAGE20_RESOLVE_API_CONFLICTS
+/** \lu_module
+ *
+ * Synonym of partialPivLu().
+ *
+ * \return the partial-pivoting LU decomposition of \c *this.
+ *
+ * \sa class PartialPivLU
+ */
+#ifndef __CUDACC__
+template<typename Derived>
+inline const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::lu() const
+{
+ return PartialPivLU<PlainObject>(eval());
+}
+#endif
+
+#endif
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARTIALLU_H
diff --git a/third_party/eigen3/Eigen/src/LU/PartialPivLU_MKL.h b/third_party/eigen3/Eigen/src/LU/PartialPivLU_MKL.h
new file mode 100644
index 0000000000..9035953c82
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/PartialPivLU_MKL.h
@@ -0,0 +1,85 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * LU decomposition with partial pivoting based on LAPACKE_?getrf function.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_PARTIALLU_LAPACK_H
+#define EIGEN_PARTIALLU_LAPACK_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_LU_PARTPIV(EIGTYPE, MKLTYPE, MKLPREFIX) \
+template<int StorageOrder> \
+struct partial_lu_impl<EIGTYPE, StorageOrder, lapack_int> \
+{ \
+  /* \internal performs the LU decomposition in place on the matrix represented by lu_data */ \
+ static lapack_int blocked_lu(lapack_int rows, lapack_int cols, EIGTYPE* lu_data, lapack_int luStride, lapack_int* row_transpositions, lapack_int& nb_transpositions, lapack_int maxBlockSize=256) \
+ { \
+ EIGEN_UNUSED_VARIABLE(maxBlockSize);\
+ lapack_int matrix_order, first_zero_pivot; \
+ lapack_int m, n, lda, *ipiv, info; \
+ EIGTYPE* a; \
+/* Set up parameters for ?getrf */ \
+ matrix_order = StorageOrder==RowMajor ? LAPACK_ROW_MAJOR : LAPACK_COL_MAJOR; \
+ lda = luStride; \
+ a = lu_data; \
+ ipiv = row_transpositions; \
+ m = rows; \
+ n = cols; \
+ nb_transpositions = 0; \
+\
+ info = LAPACKE_##MKLPREFIX##getrf( matrix_order, m, n, (MKLTYPE*)a, lda, ipiv ); \
+\
+ for(int i=0;i<m;i++) { ipiv[i]--; if (ipiv[i]!=i) nb_transpositions++; } \
+\
+ eigen_assert(info >= 0); \
+/* something should be done with nb_transpositions */ \
+\
+ first_zero_pivot = info; \
+ return first_zero_pivot; \
+ } \
+};
+
+EIGEN_MKL_LU_PARTPIV(double, double, d)
+EIGEN_MKL_LU_PARTPIV(float, float, s)
+EIGEN_MKL_LU_PARTPIV(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_LU_PARTPIV(scomplex, MKL_Complex8, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARTIALLU_LAPACK_H
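The only non-trivial glue in the binding above is the pivot post-processing: LAPACKE_?getrf returns 1-based pivot indices, while the caller expects 0-based row transpositions plus a transposition count. A standalone sketch of that loop, with a hypothetical pivot vector:

#include <cstdio>

int main() {
  int ipiv[3] = {2, 2, 3};   // hypothetical 1-based output of getrf for a 3x3 matrix
  int nb_transpositions = 0;
  for (int i = 0; i < 3; ++i) {
    ipiv[i]--;                               // convert to 0-based indices
    if (ipiv[i] != i) ++nb_transpositions;   // count actual row swaps
  }
  std::printf("%d transposition(s)\n", nb_transpositions);   // prints 1 for this example
  return 0;
}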
diff --git a/third_party/eigen3/Eigen/src/LU/arch/Inverse_SSE.h b/third_party/eigen3/Eigen/src/LU/arch/Inverse_SSE.h
new file mode 100644
index 0000000000..60b7a23763
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/LU/arch/Inverse_SSE.h
@@ -0,0 +1,329 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2001 Intel Corporation
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// The SSE code for the 4x4 float and double matrix inverse in this file
+// comes from the following Intel's library:
+// http://software.intel.com/en-us/articles/optimized-matrix-library-for-use-with-the-intel-pentiumr-4-processors-sse2-instructions/
+//
+// Here is the respective copyright and license statement:
+//
+// Copyright (c) 2001 Intel Corporation.
+//
+// Permition is granted to use, copy, distribute and prepare derivative works
+// of this library for any purpose and without fee, provided, that the above
+// copyright notice and this statement appear in all copies.
+// Intel makes no representations about the suitability of this software for
+// any purpose, and specifically disclaims all warranties.
+// See LEGAL.TXT for all the legal information.
+
+#ifndef EIGEN_INVERSE_SSE_H
+#define EIGEN_INVERSE_SSE_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_size4<Architecture::SSE, float, MatrixType, ResultType>
+{
+ enum {
+ MatrixAlignment = bool(MatrixType::Flags&AlignedBit),
+ ResultAlignment = bool(ResultType::Flags&AlignedBit),
+ StorageOrdersMatch = (MatrixType::Flags&RowMajorBit) == (ResultType::Flags&RowMajorBit)
+ };
+
+ static void run(const MatrixType& matrix, ResultType& result)
+ {
+ EIGEN_ALIGN16 const unsigned int _Sign_PNNP[4] = { 0x00000000, 0x80000000, 0x80000000, 0x00000000 };
+
+ // Load the full matrix into registers
+ __m128 _L1 = matrix.template packet<MatrixAlignment>( 0);
+ __m128 _L2 = matrix.template packet<MatrixAlignment>( 4);
+ __m128 _L3 = matrix.template packet<MatrixAlignment>( 8);
+ __m128 _L4 = matrix.template packet<MatrixAlignment>(12);
+
+    // The inverse is calculated using the "Divide and Conquer" technique. The
+    // original matrix is divided into four 2x2 sub-matrices. Since each
+    // register holds four matrix elements, each smaller matrix is
+    // represented by a single register. Hence we get better locality of the
+    // calculations.
+
+ __m128 A, B, C, D; // the four sub-matrices
+ if(!StorageOrdersMatch)
+ {
+ A = _mm_unpacklo_ps(_L1, _L2);
+ B = _mm_unpacklo_ps(_L3, _L4);
+ C = _mm_unpackhi_ps(_L1, _L2);
+ D = _mm_unpackhi_ps(_L3, _L4);
+ }
+ else
+ {
+ A = _mm_movelh_ps(_L1, _L2);
+ B = _mm_movehl_ps(_L2, _L1);
+ C = _mm_movelh_ps(_L3, _L4);
+ D = _mm_movehl_ps(_L4, _L3);
+ }
+
+ __m128 iA, iB, iC, iD, // partial inverse of the sub-matrices
+ DC, AB;
+ __m128 dA, dB, dC, dD; // determinant of the sub-matrices
+ __m128 det, d, d1, d2;
+ __m128 rd; // reciprocal of the determinant
+
+ // AB = A# * B
+ AB = _mm_mul_ps(_mm_shuffle_ps(A,A,0x0F), B);
+ AB = _mm_sub_ps(AB,_mm_mul_ps(_mm_shuffle_ps(A,A,0xA5), _mm_shuffle_ps(B,B,0x4E)));
+ // DC = D# * C
+ DC = _mm_mul_ps(_mm_shuffle_ps(D,D,0x0F), C);
+ DC = _mm_sub_ps(DC,_mm_mul_ps(_mm_shuffle_ps(D,D,0xA5), _mm_shuffle_ps(C,C,0x4E)));
+
+ // dA = |A|
+ dA = _mm_mul_ps(_mm_shuffle_ps(A, A, 0x5F),A);
+ dA = _mm_sub_ss(dA, _mm_movehl_ps(dA,dA));
+ // dB = |B|
+ dB = _mm_mul_ps(_mm_shuffle_ps(B, B, 0x5F),B);
+ dB = _mm_sub_ss(dB, _mm_movehl_ps(dB,dB));
+
+ // dC = |C|
+ dC = _mm_mul_ps(_mm_shuffle_ps(C, C, 0x5F),C);
+ dC = _mm_sub_ss(dC, _mm_movehl_ps(dC,dC));
+ // dD = |D|
+ dD = _mm_mul_ps(_mm_shuffle_ps(D, D, 0x5F),D);
+ dD = _mm_sub_ss(dD, _mm_movehl_ps(dD,dD));
+
+ // d = trace(AB*DC) = trace(A#*B*D#*C)
+ d = _mm_mul_ps(_mm_shuffle_ps(DC,DC,0xD8),AB);
+
+ // iD = C*A#*B
+ iD = _mm_mul_ps(_mm_shuffle_ps(C,C,0xA0), _mm_movelh_ps(AB,AB));
+ iD = _mm_add_ps(iD,_mm_mul_ps(_mm_shuffle_ps(C,C,0xF5), _mm_movehl_ps(AB,AB)));
+ // iA = B*D#*C
+ iA = _mm_mul_ps(_mm_shuffle_ps(B,B,0xA0), _mm_movelh_ps(DC,DC));
+ iA = _mm_add_ps(iA,_mm_mul_ps(_mm_shuffle_ps(B,B,0xF5), _mm_movehl_ps(DC,DC)));
+
+ // d = trace(AB*DC) = trace(A#*B*D#*C) [continue]
+ d = _mm_add_ps(d, _mm_movehl_ps(d, d));
+ d = _mm_add_ss(d, _mm_shuffle_ps(d, d, 1));
+ d1 = _mm_mul_ss(dA,dD);
+ d2 = _mm_mul_ss(dB,dC);
+
+ // iD = D*|A| - C*A#*B
+ iD = _mm_sub_ps(_mm_mul_ps(D,_mm_shuffle_ps(dA,dA,0)), iD);
+
+ // iA = A*|D| - B*D#*C;
+ iA = _mm_sub_ps(_mm_mul_ps(A,_mm_shuffle_ps(dD,dD,0)), iA);
+
+ // det = |A|*|D| + |B|*|C| - trace(A#*B*D#*C)
+ det = _mm_sub_ss(_mm_add_ss(d1,d2),d);
+ rd = _mm_div_ss(_mm_set_ss(1.0f), det);
+
+// #ifdef ZERO_SINGULAR
+// rd = _mm_and_ps(_mm_cmpneq_ss(det,_mm_setzero_ps()), rd);
+// #endif
+
+ // iB = D * (A#B)# = D*B#*A
+ iB = _mm_mul_ps(D, _mm_shuffle_ps(AB,AB,0x33));
+ iB = _mm_sub_ps(iB, _mm_mul_ps(_mm_shuffle_ps(D,D,0xB1), _mm_shuffle_ps(AB,AB,0x66)));
+ // iC = A * (D#C)# = A*C#*D
+ iC = _mm_mul_ps(A, _mm_shuffle_ps(DC,DC,0x33));
+ iC = _mm_sub_ps(iC, _mm_mul_ps(_mm_shuffle_ps(A,A,0xB1), _mm_shuffle_ps(DC,DC,0x66)));
+
+ rd = _mm_shuffle_ps(rd,rd,0);
+ rd = _mm_xor_ps(rd, _mm_load_ps((float*)_Sign_PNNP));
+
+ // iB = C*|B| - D*B#*A
+ iB = _mm_sub_ps(_mm_mul_ps(C,_mm_shuffle_ps(dB,dB,0)), iB);
+
+ // iC = B*|C| - A*C#*D;
+ iC = _mm_sub_ps(_mm_mul_ps(B,_mm_shuffle_ps(dC,dC,0)), iC);
+
+ // iX = iX / det
+ iA = _mm_mul_ps(rd,iA);
+ iB = _mm_mul_ps(rd,iB);
+ iC = _mm_mul_ps(rd,iC);
+ iD = _mm_mul_ps(rd,iD);
+
+ result.template writePacket<ResultAlignment>( 0, _mm_shuffle_ps(iA,iB,0x77));
+ result.template writePacket<ResultAlignment>( 4, _mm_shuffle_ps(iA,iB,0x22));
+ result.template writePacket<ResultAlignment>( 8, _mm_shuffle_ps(iC,iD,0x77));
+ result.template writePacket<ResultAlignment>(12, _mm_shuffle_ps(iC,iD,0x22));
+ }
+
+};
+
+template<typename MatrixType, typename ResultType>
+struct compute_inverse_size4<Architecture::SSE, double, MatrixType, ResultType>
+{
+ enum {
+ MatrixAlignment = bool(MatrixType::Flags&AlignedBit),
+ ResultAlignment = bool(ResultType::Flags&AlignedBit),
+ StorageOrdersMatch = (MatrixType::Flags&RowMajorBit) == (ResultType::Flags&RowMajorBit)
+ };
+ static void run(const MatrixType& matrix, ResultType& result)
+ {
+ const __m128d _Sign_NP = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
+ const __m128d _Sign_PN = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
+
+    // The inverse is calculated using the "Divide and Conquer" technique. The
+    // original matrix is divided into four 2x2 sub-matrices. Since each
+    // register of the matrix holds two elements, each smaller matrix
+    // consists of two registers. Hence we get better locality of the
+    // calculations.
+
+ // the four sub-matrices
+ __m128d A1, A2, B1, B2, C1, C2, D1, D2;
+
+ if(StorageOrdersMatch)
+ {
+ A1 = matrix.template packet<MatrixAlignment>( 0); B1 = matrix.template packet<MatrixAlignment>( 2);
+ A2 = matrix.template packet<MatrixAlignment>( 4); B2 = matrix.template packet<MatrixAlignment>( 6);
+ C1 = matrix.template packet<MatrixAlignment>( 8); D1 = matrix.template packet<MatrixAlignment>(10);
+ C2 = matrix.template packet<MatrixAlignment>(12); D2 = matrix.template packet<MatrixAlignment>(14);
+ }
+ else
+ {
+ __m128d tmp;
+ A1 = matrix.template packet<MatrixAlignment>( 0); C1 = matrix.template packet<MatrixAlignment>( 2);
+ A2 = matrix.template packet<MatrixAlignment>( 4); C2 = matrix.template packet<MatrixAlignment>( 6);
+ tmp = A1;
+ A1 = _mm_unpacklo_pd(A1,A2);
+ A2 = _mm_unpackhi_pd(tmp,A2);
+ tmp = C1;
+ C1 = _mm_unpacklo_pd(C1,C2);
+ C2 = _mm_unpackhi_pd(tmp,C2);
+
+ B1 = matrix.template packet<MatrixAlignment>( 8); D1 = matrix.template packet<MatrixAlignment>(10);
+ B2 = matrix.template packet<MatrixAlignment>(12); D2 = matrix.template packet<MatrixAlignment>(14);
+ tmp = B1;
+ B1 = _mm_unpacklo_pd(B1,B2);
+ B2 = _mm_unpackhi_pd(tmp,B2);
+ tmp = D1;
+ D1 = _mm_unpacklo_pd(D1,D2);
+ D2 = _mm_unpackhi_pd(tmp,D2);
+ }
+
+    __m128d iA1, iA2, iB1, iB2, iC1, iC2, iD1, iD2, // partial inverse of the sub-matrices
+ DC1, DC2, AB1, AB2;
+ __m128d dA, dB, dC, dD; // determinant of the sub-matrices
+ __m128d det, d1, d2, rd;
+
+ // dA = |A|
+ dA = _mm_shuffle_pd(A2, A2, 1);
+ dA = _mm_mul_pd(A1, dA);
+ dA = _mm_sub_sd(dA, _mm_shuffle_pd(dA,dA,3));
+ // dB = |B|
+ dB = _mm_shuffle_pd(B2, B2, 1);
+ dB = _mm_mul_pd(B1, dB);
+ dB = _mm_sub_sd(dB, _mm_shuffle_pd(dB,dB,3));
+
+ // AB = A# * B
+ AB1 = _mm_mul_pd(B1, _mm_shuffle_pd(A2,A2,3));
+ AB2 = _mm_mul_pd(B2, _mm_shuffle_pd(A1,A1,0));
+ AB1 = _mm_sub_pd(AB1, _mm_mul_pd(B2, _mm_shuffle_pd(A1,A1,3)));
+ AB2 = _mm_sub_pd(AB2, _mm_mul_pd(B1, _mm_shuffle_pd(A2,A2,0)));
+
+ // dC = |C|
+ dC = _mm_shuffle_pd(C2, C2, 1);
+ dC = _mm_mul_pd(C1, dC);
+ dC = _mm_sub_sd(dC, _mm_shuffle_pd(dC,dC,3));
+ // dD = |D|
+ dD = _mm_shuffle_pd(D2, D2, 1);
+ dD = _mm_mul_pd(D1, dD);
+ dD = _mm_sub_sd(dD, _mm_shuffle_pd(dD,dD,3));
+
+ // DC = D# * C
+ DC1 = _mm_mul_pd(C1, _mm_shuffle_pd(D2,D2,3));
+ DC2 = _mm_mul_pd(C2, _mm_shuffle_pd(D1,D1,0));
+ DC1 = _mm_sub_pd(DC1, _mm_mul_pd(C2, _mm_shuffle_pd(D1,D1,3)));
+ DC2 = _mm_sub_pd(DC2, _mm_mul_pd(C1, _mm_shuffle_pd(D2,D2,0)));
+
+ // rd = trace(AB*DC) = trace(A#*B*D#*C)
+ d1 = _mm_mul_pd(AB1, _mm_shuffle_pd(DC1, DC2, 0));
+ d2 = _mm_mul_pd(AB2, _mm_shuffle_pd(DC1, DC2, 3));
+ rd = _mm_add_pd(d1, d2);
+ rd = _mm_add_sd(rd, _mm_shuffle_pd(rd, rd,3));
+
+ // iD = C*A#*B
+ iD1 = _mm_mul_pd(AB1, _mm_shuffle_pd(C1,C1,0));
+ iD2 = _mm_mul_pd(AB1, _mm_shuffle_pd(C2,C2,0));
+ iD1 = _mm_add_pd(iD1, _mm_mul_pd(AB2, _mm_shuffle_pd(C1,C1,3)));
+ iD2 = _mm_add_pd(iD2, _mm_mul_pd(AB2, _mm_shuffle_pd(C2,C2,3)));
+
+ // iA = B*D#*C
+ iA1 = _mm_mul_pd(DC1, _mm_shuffle_pd(B1,B1,0));
+ iA2 = _mm_mul_pd(DC1, _mm_shuffle_pd(B2,B2,0));
+ iA1 = _mm_add_pd(iA1, _mm_mul_pd(DC2, _mm_shuffle_pd(B1,B1,3)));
+ iA2 = _mm_add_pd(iA2, _mm_mul_pd(DC2, _mm_shuffle_pd(B2,B2,3)));
+
+ // iD = D*|A| - C*A#*B
+ dA = _mm_shuffle_pd(dA,dA,0);
+ iD1 = _mm_sub_pd(_mm_mul_pd(D1, dA), iD1);
+ iD2 = _mm_sub_pd(_mm_mul_pd(D2, dA), iD2);
+
+ // iA = A*|D| - B*D#*C;
+ dD = _mm_shuffle_pd(dD,dD,0);
+ iA1 = _mm_sub_pd(_mm_mul_pd(A1, dD), iA1);
+ iA2 = _mm_sub_pd(_mm_mul_pd(A2, dD), iA2);
+
+ d1 = _mm_mul_sd(dA, dD);
+ d2 = _mm_mul_sd(dB, dC);
+
+ // iB = D * (A#B)# = D*B#*A
+ iB1 = _mm_mul_pd(D1, _mm_shuffle_pd(AB2,AB1,1));
+ iB2 = _mm_mul_pd(D2, _mm_shuffle_pd(AB2,AB1,1));
+ iB1 = _mm_sub_pd(iB1, _mm_mul_pd(_mm_shuffle_pd(D1,D1,1), _mm_shuffle_pd(AB2,AB1,2)));
+ iB2 = _mm_sub_pd(iB2, _mm_mul_pd(_mm_shuffle_pd(D2,D2,1), _mm_shuffle_pd(AB2,AB1,2)));
+
+ // det = |A|*|D| + |B|*|C| - trace(A#*B*D#*C)
+ det = _mm_add_sd(d1, d2);
+ det = _mm_sub_sd(det, rd);
+
+ // iC = A * (D#C)# = A*C#*D
+ iC1 = _mm_mul_pd(A1, _mm_shuffle_pd(DC2,DC1,1));
+ iC2 = _mm_mul_pd(A2, _mm_shuffle_pd(DC2,DC1,1));
+ iC1 = _mm_sub_pd(iC1, _mm_mul_pd(_mm_shuffle_pd(A1,A1,1), _mm_shuffle_pd(DC2,DC1,2)));
+ iC2 = _mm_sub_pd(iC2, _mm_mul_pd(_mm_shuffle_pd(A2,A2,1), _mm_shuffle_pd(DC2,DC1,2)));
+
+ rd = _mm_div_sd(_mm_set_sd(1.0), det);
+// #ifdef ZERO_SINGULAR
+// rd = _mm_and_pd(_mm_cmpneq_sd(det,_mm_setzero_pd()), rd);
+// #endif
+ rd = _mm_shuffle_pd(rd,rd,0);
+
+ // iB = C*|B| - D*B#*A
+ dB = _mm_shuffle_pd(dB,dB,0);
+ iB1 = _mm_sub_pd(_mm_mul_pd(C1, dB), iB1);
+ iB2 = _mm_sub_pd(_mm_mul_pd(C2, dB), iB2);
+
+ d1 = _mm_xor_pd(rd, _Sign_PN);
+ d2 = _mm_xor_pd(rd, _Sign_NP);
+
+ // iC = B*|C| - A*C#*D;
+ dC = _mm_shuffle_pd(dC,dC,0);
+ iC1 = _mm_sub_pd(_mm_mul_pd(B1, dC), iC1);
+ iC2 = _mm_sub_pd(_mm_mul_pd(B2, dC), iC2);
+
+ result.template writePacket<ResultAlignment>( 0, _mm_mul_pd(_mm_shuffle_pd(iA2, iA1, 3), d1)); // iA# / det
+ result.template writePacket<ResultAlignment>( 4, _mm_mul_pd(_mm_shuffle_pd(iA2, iA1, 0), d2));
+ result.template writePacket<ResultAlignment>( 2, _mm_mul_pd(_mm_shuffle_pd(iB2, iB1, 3), d1)); // iB# / det
+ result.template writePacket<ResultAlignment>( 6, _mm_mul_pd(_mm_shuffle_pd(iB2, iB1, 0), d2));
+ result.template writePacket<ResultAlignment>( 8, _mm_mul_pd(_mm_shuffle_pd(iC2, iC1, 3), d1)); // iC# / det
+ result.template writePacket<ResultAlignment>(12, _mm_mul_pd(_mm_shuffle_pd(iC2, iC1, 0), d2));
+ result.template writePacket<ResultAlignment>(10, _mm_mul_pd(_mm_shuffle_pd(iD2, iD1, 3), d1)); // iD# / det
+ result.template writePacket<ResultAlignment>(14, _mm_mul_pd(_mm_shuffle_pd(iD2, iD1, 0), d2));
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_INVERSE_SSE_H
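From user code these kernels are reached through inverse() on fixed-size 4x4 matrices; when SSE vectorization is enabled, Eigen dispatches that case to the compute_inverse_size4 specializations above. A quick sanity-check sketch with an arbitrary well-conditioned matrix:

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::Matrix4f M = Eigen::Matrix4f::Random();
  M += 4.0f * Eigen::Matrix4f::Identity();      // keep M comfortably non-singular
  Eigen::Matrix4f Minv = M.inverse();           // hits compute_inverse_size4 when SSE is on
  std::cout << (M * Minv - Eigen::Matrix4f::Identity()).norm() << "\n";   // ~0
  return 0;
}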
diff --git a/third_party/eigen3/Eigen/src/MetisSupport/MetisSupport.h b/third_party/eigen3/Eigen/src/MetisSupport/MetisSupport.h
new file mode 100644
index 0000000000..f2bbef20c8
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/MetisSupport/MetisSupport.h
@@ -0,0 +1,137 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef METIS_SUPPORT_H
+#define METIS_SUPPORT_H
+
+namespace Eigen {
+/**
+ * Get the fill-reducing ordering from the METIS package
+ *
+ * If A is the original matrix and Ap is the permuted matrix,
+ * the fill-reducing permutation is defined as follows :
+ * Row (column) i of A is the matperm(i) row (column) of Ap.
+ * WARNING: As computed by METIS, this corresponds to the vector iperm (instead of perm)
+ */
+template <typename Index>
+class MetisOrdering
+{
+public:
+ typedef PermutationMatrix<Dynamic,Dynamic,Index> PermutationType;
+ typedef Matrix<Index,Dynamic,1> IndexVector;
+
+ template <typename MatrixType>
+ void get_symmetrized_graph(const MatrixType& A)
+ {
+ Index m = A.cols();
+    eigen_assert((A.rows() == A.cols()) && "ONLY FOR SQUARE MATRICES");
+ // Get the transpose of the input matrix
+ MatrixType At = A.transpose();
+ // Get the number of nonzeros elements in each row/col of At+A
+ Index TotNz = 0;
+ IndexVector visited(m);
+ visited.setConstant(-1);
+ for (int j = 0; j < m; j++)
+ {
+      // Compute the union structure of A(j,:) and At(j,:)
+ visited(j) = j; // Do not include the diagonal element
+ // Get the nonzeros in row/column j of A
+ for (typename MatrixType::InnerIterator it(A, j); it; ++it)
+ {
+ Index idx = it.index(); // Get the row index (for column major) or column index (for row major)
+ if (visited(idx) != j )
+ {
+ visited(idx) = j;
+ ++TotNz;
+ }
+ }
+ //Get the nonzeros in row/column j of At
+ for (typename MatrixType::InnerIterator it(At, j); it; ++it)
+ {
+ Index idx = it.index();
+ if(visited(idx) != j)
+ {
+ visited(idx) = j;
+ ++TotNz;
+ }
+ }
+ }
+ // Reserve place for A + At
+ m_indexPtr.resize(m+1);
+ m_innerIndices.resize(TotNz);
+
+ // Now compute the real adjacency list of each column/row
+ visited.setConstant(-1);
+ Index CurNz = 0;
+ for (int j = 0; j < m; j++)
+ {
+ m_indexPtr(j) = CurNz;
+
+ visited(j) = j; // Do not include the diagonal element
+ // Add the pattern of row/column j of A to A+At
+ for (typename MatrixType::InnerIterator it(A,j); it; ++it)
+ {
+ Index idx = it.index(); // Get the row index (for column major) or column index (for row major)
+ if (visited(idx) != j )
+ {
+ visited(idx) = j;
+ m_innerIndices(CurNz) = idx;
+ CurNz++;
+ }
+ }
+ //Add the pattern of row/column j of At to A+At
+ for (typename MatrixType::InnerIterator it(At, j); it; ++it)
+ {
+ Index idx = it.index();
+ if(visited(idx) != j)
+ {
+ visited(idx) = j;
+ m_innerIndices(CurNz) = idx;
+ ++CurNz;
+ }
+ }
+ }
+ m_indexPtr(m) = CurNz;
+ }
+
+ template <typename MatrixType>
+ void operator() (const MatrixType& A, PermutationType& matperm)
+ {
+ Index m = A.cols();
+ IndexVector perm(m),iperm(m);
+ // First, symmetrize the matrix graph.
+ get_symmetrized_graph(A);
+ int output_error;
+
+ // Call the fill-reducing routine from METIS
+ output_error = METIS_NodeND(&m, m_indexPtr.data(), m_innerIndices.data(), NULL, NULL, perm.data(), iperm.data());
+
+ if(output_error != METIS_OK)
+ {
+ //FIXME The ordering interface should define a class of possible errors
+ std::cerr << "ERROR WHILE CALLING THE METIS PACKAGE \n";
+ return;
+ }
+
+ // Get the fill-reducing permutation
+ //NOTE: If Ap is the permuted matrix then perm and iperm vectors are defined as follows
+ // Row (column) i of Ap is the perm(i) row(column) of A, and row (column) i of A is the iperm(i) row(column) of Ap
+
+ matperm.resize(m);
+ for (int j = 0; j < m; j++)
+ matperm.indices()(iperm(j)) = j;
+
+ }
+
+ protected:
+  IndexVector m_indexPtr; // Pointer to the adjacency list of each row/column
+ IndexVector m_innerIndices; // Adjacency list
+};
+
+} // end namespace Eigen
+#endif
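MetisOrdering is normally consumed as the ordering parameter of a sparse solver rather than called directly. A usage sketch, assuming METIS is installed and linked; the 4x4 pattern below is purely illustrative:

#include <Eigen/SparseLU>
#include <Eigen/MetisSupport>

int main() {
  typedef Eigen::SparseMatrix<double> SpMat;
  SpMat A(4, 4);
  A.insert(0, 0) = 4; A.insert(1, 1) = 4; A.insert(2, 2) = 4; A.insert(3, 3) = 4;
  A.insert(0, 3) = 1; A.insert(3, 0) = 1;   // symmetric off-diagonal coupling
  A.makeCompressed();

  Eigen::SparseLU<SpMat, Eigen::MetisOrdering<int> > solver;
  solver.analyzePattern(A);   // invokes MetisOrdering::operator() on the pattern of A
  solver.factorize(A);
  Eigen::Vector4d b(1, 2, 3, 4);
  Eigen::Vector4d x = solver.solve(b);
  return (A * x - b).norm() < 1e-10 ? 0 : 1;
}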
diff --git a/third_party/eigen3/Eigen/src/OrderingMethods/Eigen_Colamd.h b/third_party/eigen3/Eigen/src/OrderingMethods/Eigen_Colamd.h
new file mode 100644
index 0000000000..44548f6607
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/OrderingMethods/Eigen_Colamd.h
@@ -0,0 +1,1850 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Desire Nuentsa Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// This file is modified from the colamd/symamd library. The copyright is below
+
+// The authors of the code itself are Stefan I. Larimore and Timothy A.
+// Davis (davis@cise.ufl.edu), University of Florida. The algorithm was
+// developed in collaboration with John Gilbert, Xerox PARC, and Esmond
+// Ng, Oak Ridge National Laboratory.
+//
+// Date:
+//
+// September 8, 2003. Version 2.3.
+//
+// Acknowledgements:
+//
+// This work was supported by the National Science Foundation, under
+// grants DMS-9504974 and DMS-9803599.
+//
+// Notice:
+//
+// Copyright (c) 1998-2003 by the University of Florida.
+// All Rights Reserved.
+//
+// THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+// EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+//
+// Permission is hereby granted to use, copy, modify, and/or distribute
+// this program, provided that the Copyright, this License, and the
+// Availability of the original version is retained on all copies and made
+// accessible to the end-user of any code or package that includes COLAMD
+// or any modified version of COLAMD.
+//
+// Availability:
+//
+// The colamd/symamd library is available at
+//
+// http://www.cise.ufl.edu/research/sparse/colamd/
+
+// This is the http://www.cise.ufl.edu/research/sparse/colamd/colamd.h
+// file. It is required by the colamd.c, colamdmex.c, and symamdmex.c
+// files, and by any C code that calls the routines whose prototypes are
+// listed below, or that uses the colamd/symamd definitions listed below.
+
+#ifndef EIGEN_COLAMD_H
+#define EIGEN_COLAMD_H
+
+namespace internal {
+/* Ensure that debugging is turned off: */
+#ifndef COLAMD_NDEBUG
+#define COLAMD_NDEBUG
+#endif /* COLAMD_NDEBUG */
+/* ========================================================================== */
+/* === Knob and statistics definitions ====================================== */
+/* ========================================================================== */
+
+/* size of the knobs [ ] array. Only knobs [0..1] are currently used. */
+#define COLAMD_KNOBS 20
+
+/* number of output statistics. Only stats [0..6] are currently used. */
+#define COLAMD_STATS 20
+
+/* knobs [0] and stats [0]: dense row knob and output statistic. */
+#define COLAMD_DENSE_ROW 0
+
+/* knobs [1] and stats [1]: dense column knob and output statistic. */
+#define COLAMD_DENSE_COL 1
+
+/* stats [2]: memory defragmentation count output statistic */
+#define COLAMD_DEFRAG_COUNT 2
+
+/* stats [3]: colamd status: zero OK, > 0 warning or notice, < 0 error */
+#define COLAMD_STATUS 3
+
+/* stats [4..6]: error info, or info on jumbled columns */
+#define COLAMD_INFO1 4
+#define COLAMD_INFO2 5
+#define COLAMD_INFO3 6
+
+/* error codes returned in stats [3]: */
+#define COLAMD_OK (0)
+#define COLAMD_OK_BUT_JUMBLED (1)
+#define COLAMD_ERROR_A_not_present (-1)
+#define COLAMD_ERROR_p_not_present (-2)
+#define COLAMD_ERROR_nrow_negative (-3)
+#define COLAMD_ERROR_ncol_negative (-4)
+#define COLAMD_ERROR_nnz_negative (-5)
+#define COLAMD_ERROR_p0_nonzero (-6)
+#define COLAMD_ERROR_A_too_small (-7)
+#define COLAMD_ERROR_col_length_negative (-8)
+#define COLAMD_ERROR_row_index_out_of_bounds (-9)
+#define COLAMD_ERROR_out_of_memory (-10)
+#define COLAMD_ERROR_internal_error (-999)
+
+/* ========================================================================== */
+/* === Definitions ========================================================== */
+/* ========================================================================== */
+
+#define COLAMD_MAX(a,b) (((a) > (b)) ? (a) : (b))
+#define COLAMD_MIN(a,b) (((a) < (b)) ? (a) : (b))
+
+#define ONES_COMPLEMENT(r) (-(r)-1)
+
+/* -------------------------------------------------------------------------- */
+
+#define COLAMD_EMPTY (-1)
+
+/* Row and column status */
+#define ALIVE (0)
+#define DEAD (-1)
+
+/* Column status */
+#define DEAD_PRINCIPAL (-1)
+#define DEAD_NON_PRINCIPAL (-2)
+
+/* Macros for row and column status update and checking. */
+#define ROW_IS_DEAD(r) ROW_IS_MARKED_DEAD (Row[r].shared2.mark)
+#define ROW_IS_MARKED_DEAD(row_mark) (row_mark < ALIVE)
+#define ROW_IS_ALIVE(r) (Row [r].shared2.mark >= ALIVE)
+#define COL_IS_DEAD(c) (Col [c].start < ALIVE)
+#define COL_IS_ALIVE(c) (Col [c].start >= ALIVE)
+#define COL_IS_DEAD_PRINCIPAL(c) (Col [c].start == DEAD_PRINCIPAL)
+#define KILL_ROW(r) { Row [r].shared2.mark = DEAD ; }
+#define KILL_PRINCIPAL_COL(c) { Col [c].start = DEAD_PRINCIPAL ; }
+#define KILL_NON_PRINCIPAL_COL(c) { Col [c].start = DEAD_NON_PRINCIPAL ; }
+
+/* ========================================================================== */
+/* === Colamd reporting mechanism =========================================== */
+/* ========================================================================== */
+
+// == Row and Column structures ==
+template <typename Index>
+struct colamd_col
+{
+ Index start ; /* index for A of first row in this column, or DEAD */
+ /* if column is dead */
+ Index length ; /* number of rows in this column */
+ union
+ {
+ Index thickness ; /* number of original columns represented by this */
+ /* col, if the column is alive */
+ Index parent ; /* parent in parent tree super-column structure, if */
+ /* the column is dead */
+ } shared1 ;
+ union
+ {
+ Index score ; /* the score used to maintain heap, if col is alive */
+ Index order ; /* pivot ordering of this column, if col is dead */
+ } shared2 ;
+ union
+ {
+ Index headhash ; /* head of a hash bucket, if col is at the head of */
+ /* a degree list */
+ Index hash ; /* hash value, if col is not in a degree list */
+ Index prev ; /* previous column in degree list, if col is in a */
+ /* degree list (but not at the head of a degree list) */
+ } shared3 ;
+ union
+ {
+ Index degree_next ; /* next column, if col is in a degree list */
+ Index hash_next ; /* next column, if col is in a hash list */
+ } shared4 ;
+
+};
+
+template <typename Index>
+struct Colamd_Row
+{
+ Index start ; /* index for A of first col in this row */
+ Index length ; /* number of principal columns in this row */
+ union
+ {
+ Index degree ; /* number of principal & non-principal columns in row */
+ Index p ; /* used as a row pointer in init_rows_cols () */
+ } shared1 ;
+ union
+ {
+ Index mark ; /* for computing set differences and marking dead rows*/
+ Index first_column ;/* first column in row (used in garbage collection) */
+ } shared2 ;
+
+};
+
+/* ========================================================================== */
+/* === Colamd recommended memory size ======================================= */
+/* ========================================================================== */
+
+/*
+ The recommended length Alen of the array A passed to colamd is given by
+ the COLAMD_RECOMMENDED (nnz, n_row, n_col) macro. It returns -1 if any
+ argument is negative. 2*nnz space is required for the row and column
+ indices of the matrix. colamd_c (n_col) + colamd_r (n_row) space is
+ required for the Col and Row arrays, respectively, which are internal to
+ colamd. An additional n_col space is the minimal amount of "elbow room",
+ and nnz/5 more space is recommended for run time efficiency.
+
+ This macro is not needed when using symamd.
+
+ Explicit typecast to Index added Sept. 23, 2002, COLAMD version 2.2, to avoid
+ gcc -pedantic warning messages.
+*/
+template <typename Index>
+inline Index colamd_c(Index n_col)
+{ return Index( ((n_col) + 1) * sizeof (colamd_col<Index>) / sizeof (Index) ) ; }
+
+template <typename Index>
+inline Index colamd_r(Index n_row)
+{ return Index(((n_row) + 1) * sizeof (Colamd_Row<Index>) / sizeof (Index)); }
+
+// Prototypes of non-user callable routines
+template <typename Index>
+static Index init_rows_cols (Index n_row, Index n_col, Colamd_Row<Index> Row [], colamd_col<Index> col [], Index A [], Index p [], Index stats[COLAMD_STATS] );
+
+template <typename Index>
+static void init_scoring (Index n_row, Index n_col, Colamd_Row<Index> Row [], colamd_col<Index> Col [], Index A [], Index head [], double knobs[COLAMD_KNOBS], Index *p_n_row2, Index *p_n_col2, Index *p_max_deg);
+
+template <typename Index>
+static Index find_ordering (Index n_row, Index n_col, Index Alen, Colamd_Row<Index> Row [], colamd_col<Index> Col [], Index A [], Index head [], Index n_col2, Index max_deg, Index pfree);
+
+template <typename Index>
+static void order_children (Index n_col, colamd_col<Index> Col [], Index p []);
+
+template <typename Index>
+static void detect_super_cols (colamd_col<Index> Col [], Index A [], Index head [], Index row_start, Index row_length ) ;
+
+template <typename Index>
+static Index garbage_collection (Index n_row, Index n_col, Colamd_Row<Index> Row [], colamd_col<Index> Col [], Index A [], Index *pfree) ;
+
+template <typename Index>
+static inline Index clear_mark (Index n_row, Colamd_Row<Index> Row [] ) ;
+
+/* === No debugging ========================================================= */
+
+#define COLAMD_DEBUG0(params) ;
+#define COLAMD_DEBUG1(params) ;
+#define COLAMD_DEBUG2(params) ;
+#define COLAMD_DEBUG3(params) ;
+#define COLAMD_DEBUG4(params) ;
+
+#define COLAMD_ASSERT(expression) ((void) 0)
+
+
+/**
+ * \brief Returns the recommended value of Alen
+ *
+ * Returns recommended value of Alen for use by colamd.
+ * Returns -1 if any input argument is negative.
+ * The use of this routine or macro is optional.
+ * Note that the macro uses its arguments more than once,
+ * so be careful of side effects if you pass expressions as arguments to COLAMD_RECOMMENDED.
+ *
+ * \param nnz nonzeros in A
+ * \param n_row number of rows in A
+ * \param n_col number of columns in A
+ * \return recommended value of Alen for use by colamd
+ */
+template <typename Index>
+inline Index colamd_recommended ( Index nnz, Index n_row, Index n_col)
+{
+ if ((nnz) < 0 || (n_row) < 0 || (n_col) < 0)
+ return (-1);
+ else
+ return (2 * (nnz) + colamd_c (n_col) + colamd_r (n_row) + (n_col) + ((nnz) / 5));
+}
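A worked instance of the recommendation above, assuming 4-byte Index and no struct padding: for n_row = 100, n_col = 80 and nnz = 500, colamd_col<int> occupies 6 ints and Colamd_Row<int> occupies 4, so colamd_c(80) = 81*6 = 486 and colamd_r(100) = 101*4 = 404, giving a recommended Alen of 2*500 + 486 + 404 + 80 + 500/5 = 2070 Index slots for the matrix indices, the Col/Row workspace, and the elbow room.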
+
+/**
+ * \brief Set default parameters. The use of this routine is optional.
+ *
+ * Colamd: rows with more than (knobs [COLAMD_DENSE_ROW] * n_col)
+ * entries are removed prior to ordering. Columns with more than
+ * (knobs [COLAMD_DENSE_COL] * n_row) entries are removed prior to
+ * ordering, and placed last in the output column ordering.
+ *
+ * COLAMD_DENSE_ROW and COLAMD_DENSE_COL are defined as 0 and 1,
+ * respectively, in colamd.h. Default values of these two knobs
+ * are both 0.5. Currently, only knobs [0] and knobs [1] are
+ * used, but future versions may use more knobs. If so, they will
+ * be properly set to their defaults by the future version of
+ * colamd_set_defaults, so that the code that calls colamd will
+ * not need to change, assuming that you either use
+ * colamd_set_defaults, or pass a (double *) NULL pointer as the
+ * knobs array to colamd or symamd.
+ *
+ * \param knobs parameter settings for colamd
+ */
+
+static inline void colamd_set_defaults(double knobs[COLAMD_KNOBS])
+{
+ /* === Local variables ================================================== */
+
+ int i ;
+
+ if (!knobs)
+ {
+ return ; /* no knobs to initialize */
+ }
+ for (i = 0 ; i < COLAMD_KNOBS ; i++)
+ {
+ knobs [i] = 0 ;
+ }
+ knobs [COLAMD_DENSE_ROW] = 0.5 ; /* ignore rows over 50% dense */
+ knobs [COLAMD_DENSE_COL] = 0.5 ; /* ignore columns over 50% dense */
+}
+
+/**
+ * \brief Computes a column ordering using the column approximate minimum degree ordering
+ *
+ * Computes a column ordering (Q) of A such that P(AQ)=LU or
+ * (AQ)'AQ=LL' have less fill-in and require fewer floating point
+ * operations than factorizing the unpermuted matrix A or A'A,
+ * respectively.
+ *
+ *
+ * \param n_row number of rows in A
+ * \param n_col number of columns in A
+ * \param Alen size of the array A
+ * \param A row indices of the matrix, of size ALen
+ * \param p column pointers of A, of size n_col+1
+ * \param knobs parameter settings for colamd
+ * \param stats colamd output statistics and error codes
+ */
+template <typename Index>
+static bool colamd(Index n_row, Index n_col, Index Alen, Index *A, Index *p, double knobs[COLAMD_KNOBS], Index stats[COLAMD_STATS])
+{
+ /* === Local variables ================================================== */
+
+ Index i ; /* loop index */
+ Index nnz ; /* nonzeros in A */
+ Index Row_size ; /* size of Row [], in integers */
+ Index Col_size ; /* size of Col [], in integers */
+ Index need ; /* minimum required length of A */
+ Colamd_Row<Index> *Row ; /* pointer into A of Row [0..n_row] array */
+ colamd_col<Index> *Col ; /* pointer into A of Col [0..n_col] array */
+ Index n_col2 ; /* number of non-dense, non-empty columns */
+ Index n_row2 ; /* number of non-dense, non-empty rows */
+ Index ngarbage ; /* number of garbage collections performed */
+ Index max_deg ; /* maximum row degree */
+ double default_knobs [COLAMD_KNOBS] ; /* default knobs array */
+
+
+ /* === Check the input arguments ======================================== */
+
+ if (!stats)
+ {
+ COLAMD_DEBUG0 (("colamd: stats not present\n")) ;
+ return (false) ;
+ }
+ for (i = 0 ; i < COLAMD_STATS ; i++)
+ {
+ stats [i] = 0 ;
+ }
+ stats [COLAMD_STATUS] = COLAMD_OK ;
+ stats [COLAMD_INFO1] = -1 ;
+ stats [COLAMD_INFO2] = -1 ;
+
+ if (!A) /* A is not present */
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_A_not_present ;
+ COLAMD_DEBUG0 (("colamd: A not present\n")) ;
+ return (false) ;
+ }
+
+ if (!p) /* p is not present */
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_p_not_present ;
+ COLAMD_DEBUG0 (("colamd: p not present\n")) ;
+ return (false) ;
+ }
+
+ if (n_row < 0) /* n_row must be >= 0 */
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_nrow_negative ;
+ stats [COLAMD_INFO1] = n_row ;
+ COLAMD_DEBUG0 (("colamd: nrow negative %d\n", n_row)) ;
+ return (false) ;
+ }
+
+ if (n_col < 0) /* n_col must be >= 0 */
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_ncol_negative ;
+ stats [COLAMD_INFO1] = n_col ;
+ COLAMD_DEBUG0 (("colamd: ncol negative %d\n", n_col)) ;
+ return (false) ;
+ }
+
+ nnz = p [n_col] ;
+ if (nnz < 0) /* nnz must be >= 0 */
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_nnz_negative ;
+ stats [COLAMD_INFO1] = nnz ;
+ COLAMD_DEBUG0 (("colamd: number of entries negative %d\n", nnz)) ;
+ return (false) ;
+ }
+
+ if (p [0] != 0)
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_p0_nonzero ;
+ stats [COLAMD_INFO1] = p [0] ;
+ COLAMD_DEBUG0 (("colamd: p[0] not zero %d\n", p [0])) ;
+ return (false) ;
+ }
+
+ /* === If no knobs, set default knobs =================================== */
+
+ if (!knobs)
+ {
+ colamd_set_defaults (default_knobs) ;
+ knobs = default_knobs ;
+ }
+
+ /* === Allocate the Row and Col arrays from array A ===================== */
+
+ Col_size = colamd_c (n_col) ;
+ Row_size = colamd_r (n_row) ;
+ need = 2*nnz + n_col + Col_size + Row_size ;
+
+ if (need > Alen)
+ {
+ /* not enough space in array A to perform the ordering */
+ stats [COLAMD_STATUS] = COLAMD_ERROR_A_too_small ;
+ stats [COLAMD_INFO1] = need ;
+ stats [COLAMD_INFO2] = Alen ;
+ COLAMD_DEBUG0 (("colamd: Need Alen >= %d, given only Alen = %d\n", need,Alen));
+ return (false) ;
+ }
+
+ Alen -= Col_size + Row_size ;
+ Col = (colamd_col<Index> *) &A [Alen] ;
+ Row = (Colamd_Row<Index> *) &A [Alen + Col_size] ;
+
+ /* === Construct the row and column data structures ===================== */
+
+ if (!Eigen::internal::init_rows_cols (n_row, n_col, Row, Col, A, p, stats))
+ {
+ /* input matrix is invalid */
+ COLAMD_DEBUG0 (("colamd: Matrix invalid\n")) ;
+ return (false) ;
+ }
+
+ /* === Initialize scores, kill dense rows/columns ======================= */
+
+ Eigen::internal::init_scoring (n_row, n_col, Row, Col, A, p, knobs,
+ &n_row2, &n_col2, &max_deg) ;
+
+ /* === Order the supercolumns =========================================== */
+
+ ngarbage = Eigen::internal::find_ordering (n_row, n_col, Alen, Row, Col, A, p,
+ n_col2, max_deg, 2*nnz) ;
+
+ /* === Order the non-principal columns ================================== */
+
+ Eigen::internal::order_children (n_col, Col, p) ;
+
+ /* === Return statistics in stats ======================================= */
+
+ stats [COLAMD_DENSE_ROW] = n_row - n_row2 ;
+ stats [COLAMD_DENSE_COL] = n_col - n_col2 ;
+ stats [COLAMD_DEFRAG_COUNT] = ngarbage ;
+ COLAMD_DEBUG0 (("colamd: done.\n")) ;
+ return (true) ;
+}
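User code does not call this routine directly; Eigen wraps it in the COLAMDOrdering functor of the OrderingMethods module, which allocates the recommended workspace and returns the ordering as a permutation. A usage sketch on a small illustrative pattern:

#include <Eigen/SparseCore>
#include <Eigen/OrderingMethods>

int main() {
  Eigen::SparseMatrix<double> A(3, 3);
  A.insert(0, 0) = 1; A.insert(1, 0) = 1;   // column 0: rows {0,1}
  A.insert(1, 1) = 1; A.insert(2, 1) = 1;   // column 1: rows {1,2}
  A.insert(0, 2) = 1; A.insert(2, 2) = 1;   // column 2: rows {0,2}
  A.makeCompressed();

  Eigen::COLAMDOrdering<int> ordering;
  Eigen::PermutationMatrix<Eigen::Dynamic, Eigen::Dynamic, int> perm;
  ordering(A, perm);   // perm now holds the fill-reducing column ordering
  return perm.indices().size() == 3 ? 0 : 1;
}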
+
+/* ========================================================================== */
+/* === NON-USER-CALLABLE ROUTINES: ========================================== */
+/* ========================================================================== */
+
+/* There are no user-callable routines beyond this point in the file */
+
+
+/* ========================================================================== */
+/* === init_rows_cols ======================================================= */
+/* ========================================================================== */
+
+/*
+ Takes the column form of the matrix in A and creates the row form of the
+ matrix. Also, row and column attributes are stored in the Col and Row
+ structs. If the columns are un-sorted or contain duplicate row indices,
+ this routine will also sort and remove duplicate row indices from the
+ column form of the matrix. Returns false if the matrix is invalid,
+ true otherwise. Not user-callable.
+*/
+template <typename Index>
+static Index init_rows_cols /* returns true if OK, or false otherwise */
+ (
+ /* === Parameters ======================================================= */
+
+ Index n_row, /* number of rows of A */
+ Index n_col, /* number of columns of A */
+ Colamd_Row<Index> Row [], /* of size n_row+1 */
+ colamd_col<Index> Col [], /* of size n_col+1 */
+ Index A [], /* row indices of A, of size Alen */
+ Index p [], /* pointers to columns in A, of size n_col+1 */
+ Index stats [COLAMD_STATS] /* colamd statistics */
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index col ; /* a column index */
+ Index row ; /* a row index */
+ Index *cp ; /* a column pointer */
+ Index *cp_end ; /* a pointer to the end of a column */
+ Index *rp ; /* a row pointer */
+ Index *rp_end ; /* a pointer to the end of a row */
+ Index last_row ; /* previous row */
+
+ /* === Initialize columns, and check column pointers ==================== */
+
+ for (col = 0 ; col < n_col ; col++)
+ {
+ Col [col].start = p [col] ;
+ Col [col].length = p [col+1] - p [col] ;
+
+ if (Col [col].length < 0)
+ {
+ /* column pointers must be non-decreasing */
+ stats [COLAMD_STATUS] = COLAMD_ERROR_col_length_negative ;
+ stats [COLAMD_INFO1] = col ;
+ stats [COLAMD_INFO2] = Col [col].length ;
+ COLAMD_DEBUG0 (("colamd: col %d length %d < 0\n", col, Col [col].length)) ;
+ return (false) ;
+ }
+
+ Col [col].shared1.thickness = 1 ;
+ Col [col].shared2.score = 0 ;
+ Col [col].shared3.prev = COLAMD_EMPTY ;
+ Col [col].shared4.degree_next = COLAMD_EMPTY ;
+ }
+
+ /* p [0..n_col] no longer needed, used as "head" in subsequent routines */
+
+ /* === Scan columns, compute row degrees, and check row indices ========= */
+
+ stats [COLAMD_INFO3] = 0 ; /* number of duplicate or unsorted row indices*/
+
+ for (row = 0 ; row < n_row ; row++)
+ {
+ Row [row].length = 0 ;
+ Row [row].shared2.mark = -1 ;
+ }
+
+ for (col = 0 ; col < n_col ; col++)
+ {
+ last_row = -1 ;
+
+ cp = &A [p [col]] ;
+ cp_end = &A [p [col+1]] ;
+
+ while (cp < cp_end)
+ {
+ row = *cp++ ;
+
+ /* make sure row indices within range */
+ if (row < 0 || row >= n_row)
+ {
+ stats [COLAMD_STATUS] = COLAMD_ERROR_row_index_out_of_bounds ;
+ stats [COLAMD_INFO1] = col ;
+ stats [COLAMD_INFO2] = row ;
+ stats [COLAMD_INFO3] = n_row ;
+ COLAMD_DEBUG0 (("colamd: row %d col %d out of bounds\n", row, col)) ;
+ return (false) ;
+ }
+
+ if (row <= last_row || Row [row].shared2.mark == col)
+ {
+        /* row indices are unsorted or repeated (or both), thus col */
+ /* is jumbled. This is a notice, not an error condition. */
+ stats [COLAMD_STATUS] = COLAMD_OK_BUT_JUMBLED ;
+ stats [COLAMD_INFO1] = col ;
+ stats [COLAMD_INFO2] = row ;
+ (stats [COLAMD_INFO3]) ++ ;
+ COLAMD_DEBUG1 (("colamd: row %d col %d unsorted/duplicate\n",row,col));
+ }
+
+ if (Row [row].shared2.mark != col)
+ {
+ Row [row].length++ ;
+ }
+ else
+ {
+ /* this is a repeated entry in the column, */
+ /* it will be removed */
+ Col [col].length-- ;
+ }
+
+ /* mark the row as having been seen in this column */
+ Row [row].shared2.mark = col ;
+
+ last_row = row ;
+ }
+ }
+
+ /* === Compute row pointers ============================================= */
+
+ /* row form of the matrix starts directly after the column */
+ /* form of matrix in A */
+ Row [0].start = p [n_col] ;
+ Row [0].shared1.p = Row [0].start ;
+ Row [0].shared2.mark = -1 ;
+ for (row = 1 ; row < n_row ; row++)
+ {
+ Row [row].start = Row [row-1].start + Row [row-1].length ;
+ Row [row].shared1.p = Row [row].start ;
+ Row [row].shared2.mark = -1 ;
+ }
+
+ /* === Create row form ================================================== */
+
+ if (stats [COLAMD_STATUS] == COLAMD_OK_BUT_JUMBLED)
+ {
+ /* if cols jumbled, watch for repeated row indices */
+ for (col = 0 ; col < n_col ; col++)
+ {
+ cp = &A [p [col]] ;
+ cp_end = &A [p [col+1]] ;
+ while (cp < cp_end)
+ {
+ row = *cp++ ;
+ if (Row [row].shared2.mark != col)
+ {
+ A [(Row [row].shared1.p)++] = col ;
+ Row [row].shared2.mark = col ;
+ }
+ }
+ }
+ }
+ else
+ {
+ /* if cols not jumbled, we don't need the mark (this is faster) */
+ for (col = 0 ; col < n_col ; col++)
+ {
+ cp = &A [p [col]] ;
+ cp_end = &A [p [col+1]] ;
+ while (cp < cp_end)
+ {
+ A [(Row [*cp++].shared1.p)++] = col ;
+ }
+ }
+ }
+
+ /* === Clear the row marks and set row degrees ========================== */
+
+ for (row = 0 ; row < n_row ; row++)
+ {
+ Row [row].shared2.mark = 0 ;
+ Row [row].shared1.degree = Row [row].length ;
+ }
+
+ /* === See if we need to re-create columns ============================== */
+
+ if (stats [COLAMD_STATUS] == COLAMD_OK_BUT_JUMBLED)
+ {
+ COLAMD_DEBUG0 (("colamd: reconstructing column form, matrix jumbled\n")) ;
+
+
+ /* === Compute col pointers ========================================= */
+
+ /* col form of the matrix starts at A [0]. */
+ /* Note, we may have a gap between the col form and the row */
+ /* form if there were duplicate entries, if so, it will be */
+ /* removed upon the first garbage collection */
+ Col [0].start = 0 ;
+ p [0] = Col [0].start ;
+ for (col = 1 ; col < n_col ; col++)
+ {
+ /* note that the lengths here are for pruned columns, i.e. */
+ /* no duplicate row indices will exist for these columns */
+ Col [col].start = Col [col-1].start + Col [col-1].length ;
+ p [col] = Col [col].start ;
+ }
+
+ /* === Re-create col form =========================================== */
+
+ for (row = 0 ; row < n_row ; row++)
+ {
+ rp = &A [Row [row].start] ;
+ rp_end = rp + Row [row].length ;
+ while (rp < rp_end)
+ {
+ A [(p [*rp++])++] = row ;
+ }
+ }
+ }
+
+ /* === Done. Matrix is not (or no longer) jumbled ====================== */
+
+ return (true) ;
+}
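A compact scalar sketch of the column-form to row-form pass implemented above (count row lengths, prefix-sum the row starts, then scatter each column index into its row), leaving out the duplicate/jumbled-column handling; the 3x3 pattern is illustrative:

#include <vector>

void column_to_row_form_demo() {
  const int n_row = 3, n_col = 3;
  const int p[]  = {0, 2, 4, 6};          // column pointers (column form)
  const int ri[] = {0, 1, 1, 2, 0, 2};    // row indices, column by column

  std::vector<int> row_len(n_row, 0), row_start(n_row + 1, 0);
  for (int k = 0; k < p[n_col]; ++k) row_len[ri[k]]++;                          // row degrees
  for (int r = 0; r < n_row; ++r) row_start[r + 1] = row_start[r] + row_len[r]; // row starts

  std::vector<int> row_form(p[n_col]);    // column indices, grouped by row
  std::vector<int> next(row_start.begin(), row_start.begin() + n_row);
  for (int col = 0; col < n_col; ++col)
    for (int k = p[col]; k < p[col + 1]; ++k)
      row_form[next[ri[k]]++] = col;      // scatter col into the next free slot of its row
  // row_form now reads {0,2, 0,1, 1,2}: row 0 meets columns {0,2}, row 1 {0,1}, row 2 {1,2}.
}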
+
+
+/* ========================================================================== */
+/* === init_scoring ========================================================= */
+/* ========================================================================== */
+
+/*
+ Kills dense or empty columns and rows, calculates an initial score for
+ each column, and places all columns in the degree lists. Not user-callable.
+*/
+template <typename Index>
+static void init_scoring
+ (
+ /* === Parameters ======================================================= */
+
+ Index n_row, /* number of rows of A */
+ Index n_col, /* number of columns of A */
+ Colamd_Row<Index> Row [], /* of size n_row+1 */
+ colamd_col<Index> Col [], /* of size n_col+1 */
+ Index A [], /* column form and row form of A */
+ Index head [], /* of size n_col+1 */
+ double knobs [COLAMD_KNOBS],/* parameters */
+ Index *p_n_row2, /* number of non-dense, non-empty rows */
+ Index *p_n_col2, /* number of non-dense, non-empty columns */
+ Index *p_max_deg /* maximum row degree */
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index c ; /* a column index */
+ Index r, row ; /* a row index */
+ Index *cp ; /* a column pointer */
+ Index deg ; /* degree of a row or column */
+ Index *cp_end ; /* a pointer to the end of a column */
+ Index *new_cp ; /* new column pointer */
+ Index col_length ; /* length of pruned column */
+ Index score ; /* current column score */
+ Index n_col2 ; /* number of non-dense, non-empty columns */
+ Index n_row2 ; /* number of non-dense, non-empty rows */
+ Index dense_row_count ; /* remove rows with more entries than this */
+ Index dense_col_count ; /* remove cols with more entries than this */
+ Index min_score ; /* smallest column score */
+ Index max_deg ; /* maximum row degree */
+ Index next_col ; /* Used to add to degree list.*/
+
+
+ /* === Extract knobs ==================================================== */
+
+ dense_row_count = COLAMD_MAX (0, COLAMD_MIN (knobs [COLAMD_DENSE_ROW] * n_col, n_col)) ;
+ dense_col_count = COLAMD_MAX (0, COLAMD_MIN (knobs [COLAMD_DENSE_COL] * n_row, n_row)) ;
+ COLAMD_DEBUG1 (("colamd: densecount: %d %d\n", dense_row_count, dense_col_count)) ;
+ max_deg = 0 ;
+ n_col2 = n_col ;
+ n_row2 = n_row ;
+
+ /* === Kill empty columns =============================================== */
+
+ /* Put the empty columns at the end in their natural order, so that LU */
+ /* factorization can proceed as far as possible. */
+ for (c = n_col-1 ; c >= 0 ; c--)
+ {
+ deg = Col [c].length ;
+ if (deg == 0)
+ {
+      /* this is an empty column, kill and order it last */
+ Col [c].shared2.order = --n_col2 ;
+ KILL_PRINCIPAL_COL (c) ;
+ }
+ }
+ COLAMD_DEBUG1 (("colamd: null columns killed: %d\n", n_col - n_col2)) ;
+
+ /* === Kill dense columns =============================================== */
+
+ /* Put the dense columns at the end, in their natural order */
+ for (c = n_col-1 ; c >= 0 ; c--)
+ {
+ /* skip any dead columns */
+ if (COL_IS_DEAD (c))
+ {
+ continue ;
+ }
+ deg = Col [c].length ;
+ if (deg > dense_col_count)
+ {
+ /* this is a dense column, kill and order it last */
+ Col [c].shared2.order = --n_col2 ;
+ /* decrement the row degrees */
+ cp = &A [Col [c].start] ;
+ cp_end = cp + Col [c].length ;
+ while (cp < cp_end)
+ {
+ Row [*cp++].shared1.degree-- ;
+ }
+ KILL_PRINCIPAL_COL (c) ;
+ }
+ }
+ COLAMD_DEBUG1 (("colamd: Dense and null columns killed: %d\n", n_col - n_col2)) ;
+
+ /* === Kill dense and empty rows ======================================== */
+
+ for (r = 0 ; r < n_row ; r++)
+ {
+ deg = Row [r].shared1.degree ;
+ COLAMD_ASSERT (deg >= 0 && deg <= n_col) ;
+ if (deg > dense_row_count || deg == 0)
+ {
+ /* kill a dense or empty row */
+ KILL_ROW (r) ;
+ --n_row2 ;
+ }
+ else
+ {
+ /* keep track of max degree of remaining rows */
+ max_deg = COLAMD_MAX (max_deg, deg) ;
+ }
+ }
+ COLAMD_DEBUG1 (("colamd: Dense and null rows killed: %d\n", n_row - n_row2)) ;
+
+ /* === Compute initial column scores ==================================== */
+
+ /* At this point the row degrees are accurate. They reflect the number */
+ /* of "live" (non-dense) columns in each row. No empty rows exist. */
+ /* Some "live" columns may contain only dead rows, however. These are */
+ /* pruned in the code below. */
+
+ /* now find the initial matlab score for each column */
+ for (c = n_col-1 ; c >= 0 ; c--)
+ {
+ /* skip dead column */
+ if (COL_IS_DEAD (c))
+ {
+ continue ;
+ }
+ score = 0 ;
+ cp = &A [Col [c].start] ;
+ new_cp = cp ;
+ cp_end = cp + Col [c].length ;
+ while (cp < cp_end)
+ {
+ /* get a row */
+ row = *cp++ ;
+ /* skip if dead */
+ if (ROW_IS_DEAD (row))
+ {
+ continue ;
+ }
+ /* compact the column */
+ *new_cp++ = row ;
+ /* add row's external degree */
+ score += Row [row].shared1.degree - 1 ;
+ /* guard against integer overflow */
+ score = COLAMD_MIN (score, n_col) ;
+ }
+ /* determine pruned column length */
+ col_length = (Index) (new_cp - &A [Col [c].start]) ;
+ if (col_length == 0)
+ {
+ /* a newly-made null column (all rows in this col are "dense" */
+ /* and have already been killed) */
+ COLAMD_DEBUG2 (("Newly null killed: %d\n", c)) ;
+ Col [c].shared2.order = --n_col2 ;
+ KILL_PRINCIPAL_COL (c) ;
+ }
+ else
+ {
+ /* set column length and set score */
+ COLAMD_ASSERT (score >= 0) ;
+ COLAMD_ASSERT (score <= n_col) ;
+ Col [c].length = col_length ;
+ Col [c].shared2.score = score ;
+ }
+ }
+ COLAMD_DEBUG1 (("colamd: Dense, null, and newly-null columns killed: %d\n",
+ n_col-n_col2)) ;
+
+ /* At this point, all empty rows and columns are dead. All live columns */
+ /* are "clean" (containing no dead rows) and simplicial (no supercolumns */
+ /* yet). Rows may contain dead columns, but all live rows contain at */
+ /* least one live column. */
+
+ /* === Initialize degree lists ========================================== */
+
+
+ /* clear the hash buckets */
+ for (c = 0 ; c <= n_col ; c++)
+ {
+ head [c] = COLAMD_EMPTY ;
+ }
+ min_score = n_col ;
+ /* place in reverse order, so low column indices are at the front */
+ /* of the lists. This is to encourage natural tie-breaking */
+ for (c = n_col-1 ; c >= 0 ; c--)
+ {
+ /* only add principal columns to degree lists */
+ if (COL_IS_ALIVE (c))
+ {
+ COLAMD_DEBUG4 (("place %d score %d minscore %d ncol %d\n",
+ c, Col [c].shared2.score, min_score, n_col)) ;
+
+ /* === Add columns score to DList =============================== */
+
+ score = Col [c].shared2.score ;
+
+ COLAMD_ASSERT (min_score >= 0) ;
+ COLAMD_ASSERT (min_score <= n_col) ;
+ COLAMD_ASSERT (score >= 0) ;
+ COLAMD_ASSERT (score <= n_col) ;
+ COLAMD_ASSERT (head [score] >= COLAMD_EMPTY) ;
+
+ /* now add this column to dList at proper score location */
+ next_col = head [score] ;
+ Col [c].shared3.prev = COLAMD_EMPTY ;
+ Col [c].shared4.degree_next = next_col ;
+
+ /* if there already was a column with the same score, set its */
+ /* previous pointer to this new column */
+ if (next_col != COLAMD_EMPTY)
+ {
+ Col [next_col].shared3.prev = c ;
+ }
+ head [score] = c ;
+
+ /* see if this score is less than current min */
+ min_score = COLAMD_MIN (min_score, score) ;
+
+
+ }
+ }
+
+
+ /* === Return number of remaining columns, and max row degree =========== */
+
+ *p_n_col2 = n_col2 ;
+ *p_n_row2 = n_row2 ;
+ *p_max_deg = max_deg ;
+}
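A worked instance of the scoring rule applied above: a live column whose pruned pattern touches three live rows of degrees 4, 2 and 5 gets the initial score (4-1) + (2-1) + (5-1) = 8, i.e. the sum of the rows' external degrees, clamped to n_col to guard against overflow.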
+
+
+/* ========================================================================== */
+/* === find_ordering ======================================================== */
+/* ========================================================================== */
+
+/*
+ Order the principal columns of the supercolumn form of the matrix
+  (no supercolumns on input). Uses an approximate column minimum
+  degree ordering method. Not user-callable.
+*/
+template <typename Index>
+static Index find_ordering /* return the number of garbage collections */
+ (
+ /* === Parameters ======================================================= */
+
+ Index n_row, /* number of rows of A */
+ Index n_col, /* number of columns of A */
+ Index Alen, /* size of A, 2*nnz + n_col or larger */
+ Colamd_Row<Index> Row [], /* of size n_row+1 */
+ colamd_col<Index> Col [], /* of size n_col+1 */
+ Index A [], /* column form and row form of A */
+ Index head [], /* of size n_col+1 */
+ Index n_col2, /* Remaining columns to order */
+ Index max_deg, /* Maximum row degree */
+ Index pfree /* index of first free slot (2*nnz on entry) */
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index k ; /* current pivot ordering step */
+ Index pivot_col ; /* current pivot column */
+ Index *cp ; /* a column pointer */
+ Index *rp ; /* a row pointer */
+ Index pivot_row ; /* current pivot row */
+ Index *new_cp ; /* modified column pointer */
+ Index *new_rp ; /* modified row pointer */
+ Index pivot_row_start ; /* pointer to start of pivot row */
+ Index pivot_row_degree ; /* number of columns in pivot row */
+ Index pivot_row_length ; /* number of supercolumns in pivot row */
+ Index pivot_col_score ; /* score of pivot column */
+ Index needed_memory ; /* free space needed for pivot row */
+ Index *cp_end ; /* pointer to the end of a column */
+ Index *rp_end ; /* pointer to the end of a row */
+ Index row ; /* a row index */
+ Index col ; /* a column index */
+ Index max_score ; /* maximum possible score */
+ Index cur_score ; /* score of current column */
+ unsigned int hash ; /* hash value for supernode detection */
+ Index head_column ; /* head of hash bucket */
+ Index first_col ; /* first column in hash bucket */
+ Index tag_mark ; /* marker value for mark array */
+ Index row_mark ; /* Row [row].shared2.mark */
+ Index set_difference ; /* set difference size of row with pivot row */
+ Index min_score ; /* smallest column score */
+ Index col_thickness ; /* "thickness" (no. of columns in a supercol) */
+ Index max_mark ; /* maximum value of tag_mark */
+ Index pivot_col_thickness ; /* number of columns represented by pivot col */
+ Index prev_col ; /* Used by Dlist operations. */
+ Index next_col ; /* Used by Dlist operations. */
+ Index ngarbage ; /* number of garbage collections performed */
+
+
+ /* === Initialization and clear mark ==================================== */
+
+ max_mark = INT_MAX - n_col ; /* INT_MAX defined in <limits.h> */
+ tag_mark = Eigen::internal::clear_mark (n_row, Row) ;
+ min_score = 0 ;
+ ngarbage = 0 ;
+ COLAMD_DEBUG1 (("colamd: Ordering, n_col2=%d\n", n_col2)) ;
+
+ /* === Order the columns ================================================ */
+
+ for (k = 0 ; k < n_col2 ; /* 'k' is incremented below */)
+ {
+
+ /* === Select pivot column, and order it ============================ */
+
+ /* make sure degree list isn't empty */
+ COLAMD_ASSERT (min_score >= 0) ;
+ COLAMD_ASSERT (min_score <= n_col) ;
+ COLAMD_ASSERT (head [min_score] >= COLAMD_EMPTY) ;
+
+ /* get pivot column from head of minimum degree list */
+ while (head [min_score] == COLAMD_EMPTY && min_score < n_col)
+ {
+ min_score++ ;
+ }
+ pivot_col = head [min_score] ;
+ COLAMD_ASSERT (pivot_col >= 0 && pivot_col <= n_col) ;
+ next_col = Col [pivot_col].shared4.degree_next ;
+ head [min_score] = next_col ;
+ if (next_col != COLAMD_EMPTY)
+ {
+ Col [next_col].shared3.prev = COLAMD_EMPTY ;
+ }
+
+ COLAMD_ASSERT (COL_IS_ALIVE (pivot_col)) ;
+ COLAMD_DEBUG3 (("Pivot col: %d\n", pivot_col)) ;
+
+ /* remember score for defrag check */
+ pivot_col_score = Col [pivot_col].shared2.score ;
+
+ /* the pivot column is the kth column in the pivot order */
+ Col [pivot_col].shared2.order = k ;
+
+ /* increment order count by column thickness */
+ pivot_col_thickness = Col [pivot_col].shared1.thickness ;
+ k += pivot_col_thickness ;
+ COLAMD_ASSERT (pivot_col_thickness > 0) ;
+
+ /* === Garbage_collection, if necessary ============================= */
+
+ needed_memory = COLAMD_MIN (pivot_col_score, n_col - k) ;
+ if (pfree + needed_memory >= Alen)
+ {
+ pfree = Eigen::internal::garbage_collection (n_row, n_col, Row, Col, A, &A [pfree]) ;
+ ngarbage++ ;
+ /* after garbage collection we will have enough */
+ COLAMD_ASSERT (pfree + needed_memory < Alen) ;
+ /* garbage collection has wiped out the Row[].shared2.mark array */
+ tag_mark = Eigen::internal::clear_mark (n_row, Row) ;
+
+ }
+
+ /* === Compute pivot row pattern ==================================== */
+
+ /* get starting location for this new merged row */
+ pivot_row_start = pfree ;
+
+ /* initialize new row counts to zero */
+ pivot_row_degree = 0 ;
+
+ /* tag pivot column as having been visited so it isn't included */
+ /* in merged pivot row */
+ Col [pivot_col].shared1.thickness = -pivot_col_thickness ;
+
+ /* pivot row is the union of all rows in the pivot column pattern */
+ cp = &A [Col [pivot_col].start] ;
+ cp_end = cp + Col [pivot_col].length ;
+ while (cp < cp_end)
+ {
+ /* get a row */
+ row = *cp++ ;
+ COLAMD_DEBUG4 (("Pivot col pattern %d %d\n", ROW_IS_ALIVE (row), row)) ;
+ /* skip if row is dead */
+ if (ROW_IS_DEAD (row))
+ {
+ continue ;
+ }
+ rp = &A [Row [row].start] ;
+ rp_end = rp + Row [row].length ;
+ while (rp < rp_end)
+ {
+ /* get a column */
+ col = *rp++ ;
+ /* add the column, if alive and untagged */
+ col_thickness = Col [col].shared1.thickness ;
+ if (col_thickness > 0 && COL_IS_ALIVE (col))
+ {
+ /* tag column in pivot row */
+ Col [col].shared1.thickness = -col_thickness ;
+ COLAMD_ASSERT (pfree < Alen) ;
+ /* place column in pivot row */
+ A [pfree++] = col ;
+ pivot_row_degree += col_thickness ;
+ }
+ }
+ }
+
+ /* clear tag on pivot column */
+ Col [pivot_col].shared1.thickness = pivot_col_thickness ;
+ max_deg = COLAMD_MAX (max_deg, pivot_row_degree) ;
+
+
+ /* === Kill all rows used to construct pivot row ==================== */
+
+ /* also kill pivot row, temporarily */
+ cp = &A [Col [pivot_col].start] ;
+ cp_end = cp + Col [pivot_col].length ;
+ while (cp < cp_end)
+ {
+ /* may be killing an already dead row */
+ row = *cp++ ;
+ COLAMD_DEBUG3 (("Kill row in pivot col: %d\n", row)) ;
+ KILL_ROW (row) ;
+ }
+
+ /* === Select a row index to use as the new pivot row =============== */
+
+ pivot_row_length = pfree - pivot_row_start ;
+ if (pivot_row_length > 0)
+ {
+ /* pick the "pivot" row arbitrarily (first row in col) */
+ pivot_row = A [Col [pivot_col].start] ;
+ COLAMD_DEBUG3 (("Pivotal row is %d\n", pivot_row)) ;
+ }
+ else
+ {
+ /* there is no pivot row, since it is of zero length */
+ pivot_row = COLAMD_EMPTY ;
+ COLAMD_ASSERT (pivot_row_length == 0) ;
+ }
+ COLAMD_ASSERT (Col [pivot_col].length > 0 || pivot_row_length == 0) ;
+
+ /* === Approximate degree computation =============================== */
+
+ /* Here begins the computation of the approximate degree. The column */
+ /* score is the sum of the pivot row "length", plus the size of the */
+ /* set differences of each row in the column minus the pattern of the */
+ /* pivot row itself. The column ("thickness") itself is also */
+ /* excluded from the column score (we thus use an approximate */
+ /* external degree). */
+
+ /* The time taken by the following code (compute set differences, and */
+ /* add them up) is proportional to the size of the data structure */
+ /* being scanned - that is, the sum of the sizes of each column in */
+ /* the pivot row. Thus, the amortized time to compute a column score */
+ /* is proportional to the size of that column (where size, in this */
+ /* context, is the column "length", or the number of row indices */
+ /* in that column). The number of row indices in a column is */
+ /* monotonically non-decreasing, from the length of the original */
+ /* column on input to colamd. */
+
+ /* === Compute set differences ====================================== */
+
+ COLAMD_DEBUG3 (("** Computing set differences phase. **\n")) ;
+
+ /* pivot row is currently dead - it will be revived later. */
+
+ COLAMD_DEBUG3 (("Pivot row: ")) ;
+ /* for each column in pivot row */
+ rp = &A [pivot_row_start] ;
+ rp_end = rp + pivot_row_length ;
+ while (rp < rp_end)
+ {
+ col = *rp++ ;
+ COLAMD_ASSERT (COL_IS_ALIVE (col) && col != pivot_col) ;
+ COLAMD_DEBUG3 (("Col: %d\n", col)) ;
+
+ /* clear tags used to construct pivot row pattern */
+ col_thickness = -Col [col].shared1.thickness ;
+ COLAMD_ASSERT (col_thickness > 0) ;
+ Col [col].shared1.thickness = col_thickness ;
+
+ /* === Remove column from degree list =========================== */
+
+ cur_score = Col [col].shared2.score ;
+ prev_col = Col [col].shared3.prev ;
+ next_col = Col [col].shared4.degree_next ;
+ COLAMD_ASSERT (cur_score >= 0) ;
+ COLAMD_ASSERT (cur_score <= n_col) ;
+ COLAMD_ASSERT (cur_score >= COLAMD_EMPTY) ;
+ if (prev_col == COLAMD_EMPTY)
+ {
+ head [cur_score] = next_col ;
+ }
+ else
+ {
+ Col [prev_col].shared4.degree_next = next_col ;
+ }
+ if (next_col != COLAMD_EMPTY)
+ {
+ Col [next_col].shared3.prev = prev_col ;
+ }
+
+ /* === Scan the column ========================================== */
+
+ cp = &A [Col [col].start] ;
+ cp_end = cp + Col [col].length ;
+ while (cp < cp_end)
+ {
+ /* get a row */
+ row = *cp++ ;
+ row_mark = Row [row].shared2.mark ;
+ /* skip if dead */
+ if (ROW_IS_MARKED_DEAD (row_mark))
+ {
+ continue ;
+ }
+ COLAMD_ASSERT (row != pivot_row) ;
+ set_difference = row_mark - tag_mark ;
+ /* check if the row has been seen yet */
+ if (set_difference < 0)
+ {
+ COLAMD_ASSERT (Row [row].shared1.degree <= max_deg) ;
+ set_difference = Row [row].shared1.degree ;
+ }
+ /* subtract column thickness from this row's set difference */
+ set_difference -= col_thickness ;
+ COLAMD_ASSERT (set_difference >= 0) ;
+ /* absorb this row if the set difference becomes zero */
+ if (set_difference == 0)
+ {
+ COLAMD_DEBUG3 (("aggressive absorption. Row: %d\n", row)) ;
+ KILL_ROW (row) ;
+ }
+ else
+ {
+ /* save the new mark */
+ Row [row].shared2.mark = set_difference + tag_mark ;
+ }
+ }
+ }
+
+
+ /* === Add up set differences for each column ======================= */
+
+ COLAMD_DEBUG3 (("** Adding set differences phase. **\n")) ;
+
+ /* for each column in pivot row */
+ rp = &A [pivot_row_start] ;
+ rp_end = rp + pivot_row_length ;
+ while (rp < rp_end)
+ {
+ /* get a column */
+ col = *rp++ ;
+ COLAMD_ASSERT (COL_IS_ALIVE (col) && col != pivot_col) ;
+ hash = 0 ;
+ cur_score = 0 ;
+ cp = &A [Col [col].start] ;
+ /* compact the column */
+ new_cp = cp ;
+ cp_end = cp + Col [col].length ;
+
+ COLAMD_DEBUG4 (("Adding set diffs for Col: %d.\n", col)) ;
+
+ while (cp < cp_end)
+ {
+ /* get a row */
+ row = *cp++ ;
+ COLAMD_ASSERT(row >= 0 && row < n_row) ;
+ row_mark = Row [row].shared2.mark ;
+ /* skip if dead */
+ if (ROW_IS_MARKED_DEAD (row_mark))
+ {
+ continue ;
+ }
+ COLAMD_ASSERT (row_mark > tag_mark) ;
+ /* compact the column */
+ *new_cp++ = row ;
+ /* compute hash function */
+ hash += row ;
+ /* add set difference */
+ cur_score += row_mark - tag_mark ;
+ /* integer overflow... */
+ cur_score = COLAMD_MIN (cur_score, n_col) ;
+ }
+
+ /* recompute the column's length */
+ Col [col].length = (Index) (new_cp - &A [Col [col].start]) ;
+
+ /* === Further mass elimination ================================= */
+
+ if (Col [col].length == 0)
+ {
+ COLAMD_DEBUG4 (("further mass elimination. Col: %d\n", col)) ;
+ /* nothing left but the pivot row in this column */
+ KILL_PRINCIPAL_COL (col) ;
+ pivot_row_degree -= Col [col].shared1.thickness ;
+ COLAMD_ASSERT (pivot_row_degree >= 0) ;
+ /* order it */
+ Col [col].shared2.order = k ;
+ /* increment order count by column thickness */
+ k += Col [col].shared1.thickness ;
+ }
+ else
+ {
+ /* === Prepare for supercolumn detection ==================== */
+
+ COLAMD_DEBUG4 (("Preparing supercol detection for Col: %d.\n", col)) ;
+
+ /* save score so far */
+ Col [col].shared2.score = cur_score ;
+
+ /* add column to hash table, for supercolumn detection */
+ hash %= n_col + 1 ;
+
+ COLAMD_DEBUG4 ((" Hash = %d, n_col = %d.\n", hash, n_col)) ;
+ COLAMD_ASSERT (hash <= n_col) ;
+
+ head_column = head [hash] ;
+ if (head_column > COLAMD_EMPTY)
+ {
+ /* degree list "hash" is non-empty, use prev (shared3) of */
+ /* first column in degree list as head of hash bucket */
+ first_col = Col [head_column].shared3.headhash ;
+ Col [head_column].shared3.headhash = col ;
+ }
+ else
+ {
+ /* degree list "hash" is empty, use head as hash bucket */
+ first_col = - (head_column + 2) ;
+ head [hash] = - (col + 2) ;
+ }
+ Col [col].shared4.hash_next = first_col ;
+
+ /* save hash function in Col [col].shared3.hash */
+ Col [col].shared3.hash = (Index) hash ;
+ COLAMD_ASSERT (COL_IS_ALIVE (col)) ;
+ }
+ }
+
+ /* The approximate external column degree is now computed. */
+
+ /* === Supercolumn detection ======================================== */
+
+ COLAMD_DEBUG3 (("** Supercolumn detection phase. **\n")) ;
+
+ Eigen::internal::detect_super_cols (Col, A, head, pivot_row_start, pivot_row_length) ;
+
+ /* === Kill the pivotal column ====================================== */
+
+ KILL_PRINCIPAL_COL (pivot_col) ;
+
+ /* === Clear mark =================================================== */
+
+ tag_mark += (max_deg + 1) ;
+ if (tag_mark >= max_mark)
+ {
+ COLAMD_DEBUG2 (("clearing tag_mark\n")) ;
+ tag_mark = Eigen::internal::clear_mark (n_row, Row) ;
+ }
+
+ /* === Finalize the new pivot row, and column scores ================ */
+
+ COLAMD_DEBUG3 (("** Finalize scores phase. **\n")) ;
+
+ /* for each column in pivot row */
+ rp = &A [pivot_row_start] ;
+ /* compact the pivot row */
+ new_rp = rp ;
+ rp_end = rp + pivot_row_length ;
+ while (rp < rp_end)
+ {
+ col = *rp++ ;
+ /* skip dead columns */
+ if (COL_IS_DEAD (col))
+ {
+ continue ;
+ }
+ *new_rp++ = col ;
+ /* add new pivot row to column */
+ A [Col [col].start + (Col [col].length++)] = pivot_row ;
+
+ /* retrieve score so far and add on pivot row's degree. */
+ /* (we wait until here for this in case the pivot */
+ /* row's degree was reduced due to mass elimination). */
+ cur_score = Col [col].shared2.score + pivot_row_degree ;
+
+ /* calculate the max possible score as the number of */
+ /* external columns minus the 'k' value minus the */
+      /* column's thickness */
+ max_score = n_col - k - Col [col].shared1.thickness ;
+
+ /* make the score the external degree of the union-of-rows */
+ cur_score -= Col [col].shared1.thickness ;
+
+      /* make sure score is less than or equal to the max score */
+ cur_score = COLAMD_MIN (cur_score, max_score) ;
+ COLAMD_ASSERT (cur_score >= 0) ;
+
+ /* store updated score */
+ Col [col].shared2.score = cur_score ;
+
+ /* === Place column back in degree list ========================= */
+
+ COLAMD_ASSERT (min_score >= 0) ;
+ COLAMD_ASSERT (min_score <= n_col) ;
+ COLAMD_ASSERT (cur_score >= 0) ;
+ COLAMD_ASSERT (cur_score <= n_col) ;
+ COLAMD_ASSERT (head [cur_score] >= COLAMD_EMPTY) ;
+ next_col = head [cur_score] ;
+ Col [col].shared4.degree_next = next_col ;
+ Col [col].shared3.prev = COLAMD_EMPTY ;
+ if (next_col != COLAMD_EMPTY)
+ {
+ Col [next_col].shared3.prev = col ;
+ }
+ head [cur_score] = col ;
+
+ /* see if this score is less than current min */
+ min_score = COLAMD_MIN (min_score, cur_score) ;
+
+ }
+
+ /* === Resurrect the new pivot row ================================== */
+
+ if (pivot_row_degree > 0)
+ {
+ /* update pivot row length to reflect any cols that were killed */
+ /* during super-col detection and mass elimination */
+ Row [pivot_row].start = pivot_row_start ;
+ Row [pivot_row].length = (Index) (new_rp - &A[pivot_row_start]) ;
+ Row [pivot_row].shared1.degree = pivot_row_degree ;
+ Row [pivot_row].shared2.mark = 0 ;
+ /* pivot row is no longer dead */
+ }
+ }
+
+ /* === All principal columns have now been ordered ====================== */
+
+ return (ngarbage) ;
+}
+
+
+/* ========================================================================== */
+/* === order_children ======================================================= */
+/* ========================================================================== */
+
+/*
+ The find_ordering routine has ordered all of the principal columns (the
+ representatives of the supercolumns). The non-principal columns have not
+ yet been ordered. This routine orders those columns by walking up the
+ parent tree (a column is a child of the column which absorbed it). The
+ final permutation vector is then placed in p [0 ... n_col-1], with p [0]
+  being the first column, and p [n_col-1] being the last. Although not
+  immediately obvious, the time taken by this routine is O (n_col), that is,
+  linear in the number of columns. Not user-callable.
+*/
+template <typename Index>
+static inline void order_children
+(
+ /* === Parameters ======================================================= */
+
+ Index n_col, /* number of columns of A */
+ colamd_col<Index> Col [], /* of size n_col+1 */
+ Index p [] /* p [0 ... n_col-1] is the column permutation*/
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index i ; /* loop counter for all columns */
+ Index c ; /* column index */
+ Index parent ; /* index of column's parent */
+ Index order ; /* column's order */
+
+ /* === Order each non-principal column ================================== */
+
+ for (i = 0 ; i < n_col ; i++)
+ {
+ /* find an un-ordered non-principal column */
+ COLAMD_ASSERT (COL_IS_DEAD (i)) ;
+ if (!COL_IS_DEAD_PRINCIPAL (i) && Col [i].shared2.order == COLAMD_EMPTY)
+ {
+ parent = i ;
+ /* once found, find its principal parent */
+ do
+ {
+ parent = Col [parent].shared1.parent ;
+ } while (!COL_IS_DEAD_PRINCIPAL (parent)) ;
+
+ /* now, order all un-ordered non-principal columns along path */
+ /* to this parent. collapse tree at the same time */
+ c = i ;
+ /* get order of parent */
+ order = Col [parent].shared2.order ;
+
+ do
+ {
+ COLAMD_ASSERT (Col [c].shared2.order == COLAMD_EMPTY) ;
+
+ /* order this column */
+ Col [c].shared2.order = order++ ;
+	/* collapse tree */
+ Col [c].shared1.parent = parent ;
+
+ /* get immediate parent of this column */
+ c = Col [c].shared1.parent ;
+
+ /* continue until we hit an ordered column. There are */
+	/* guaranteed not to be any more unordered columns */
+ /* above an ordered column */
+ } while (Col [c].shared2.order == COLAMD_EMPTY) ;
+
+ /* re-order the super_col parent to largest order for this group */
+ Col [parent].shared2.order = order ;
+ }
+ }
+
+ /* === Generate the permutation ========================================= */
+
+ for (c = 0 ; c < n_col ; c++)
+ {
+ p [Col [c].shared2.order] = c ;
+ }
+}
+
+
+/* ========================================================================== */
+/* === detect_super_cols ==================================================== */
+/* ========================================================================== */
+
+/*
+ Detects supercolumns by finding matches between columns in the hash buckets.
+ Check amongst columns in the set A [row_start ... row_start + row_length-1].
+ The columns under consideration are currently *not* in the degree lists,
+ and have already been placed in the hash buckets.
+
+ The hash bucket for columns whose hash function is equal to h is stored
+ as follows:
+
+ if head [h] is >= 0, then head [h] contains a degree list, so:
+
+ head [h] is the first column in degree bucket h.
+ Col [head [h]].headhash gives the first column in hash bucket h.
+
+ otherwise, the degree list is empty, and:
+
+ -(head [h] + 2) is the first column in hash bucket h.
+
+ For a column c in a hash bucket, Col [c].shared3.prev is NOT a "previous
+ column" pointer. Col [c].shared3.hash is used instead as the hash number
+ for that column. The value of Col [c].shared4.hash_next is the next column
+ in the same hash bucket.
+
+ Assuming no, or "few" hash collisions, the time taken by this routine is
+ linear in the sum of the sizes (lengths) of each column whose score has
+ just been computed in the approximate degree computation.
+ Not user-callable.
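+
+  As a purely illustrative example (hypothetical column indices): if columns
+  5 and then 9 hash to h = 3 while degree list 3 is empty, the bucket is
+  stored as head [3] = -(9 + 2) = -11, Col [9].shared4.hash_next = 5 and
+  Col [5].shared4.hash_next = COLAMD_EMPTY, so decoding - (head [3] + 2) = 9
+  recovers the most recently inserted column as the head of the bucket.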
+*/
+template <typename Index>
+static void detect_super_cols
+(
+ /* === Parameters ======================================================= */
+
+ colamd_col<Index> Col [], /* of size n_col+1 */
+ Index A [], /* row indices of A */
+ Index head [], /* head of degree lists and hash buckets */
+ Index row_start, /* pointer to set of columns to check */
+ Index row_length /* number of columns to check */
+)
+{
+ /* === Local variables ================================================== */
+
+ Index hash ; /* hash value for a column */
+ Index *rp ; /* pointer to a row */
+ Index c ; /* a column index */
+ Index super_c ; /* column index of the column to absorb into */
+ Index *cp1 ; /* column pointer for column super_c */
+ Index *cp2 ; /* column pointer for column c */
+ Index length ; /* length of column super_c */
+ Index prev_c ; /* column preceding c in hash bucket */
+ Index i ; /* loop counter */
+ Index *rp_end ; /* pointer to the end of the row */
+ Index col ; /* a column index in the row to check */
+ Index head_column ; /* first column in hash bucket or degree list */
+ Index first_col ; /* first column in hash bucket */
+
+ /* === Consider each column in the row ================================== */
+
+ rp = &A [row_start] ;
+ rp_end = rp + row_length ;
+ while (rp < rp_end)
+ {
+ col = *rp++ ;
+ if (COL_IS_DEAD (col))
+ {
+ continue ;
+ }
+
+ /* get hash number for this column */
+ hash = Col [col].shared3.hash ;
+ COLAMD_ASSERT (hash <= n_col) ;
+
+ /* === Get the first column in this hash bucket ===================== */
+
+ head_column = head [hash] ;
+ if (head_column > COLAMD_EMPTY)
+ {
+ first_col = Col [head_column].shared3.headhash ;
+ }
+ else
+ {
+ first_col = - (head_column + 2) ;
+ }
+
+ /* === Consider each column in the hash bucket ====================== */
+
+ for (super_c = first_col ; super_c != COLAMD_EMPTY ;
+ super_c = Col [super_c].shared4.hash_next)
+ {
+ COLAMD_ASSERT (COL_IS_ALIVE (super_c)) ;
+ COLAMD_ASSERT (Col [super_c].shared3.hash == hash) ;
+ length = Col [super_c].length ;
+
+ /* prev_c is the column preceding column c in the hash bucket */
+ prev_c = super_c ;
+
+ /* === Compare super_c with all columns after it ================ */
+
+ for (c = Col [super_c].shared4.hash_next ;
+ c != COLAMD_EMPTY ; c = Col [c].shared4.hash_next)
+ {
+ COLAMD_ASSERT (c != super_c) ;
+ COLAMD_ASSERT (COL_IS_ALIVE (c)) ;
+ COLAMD_ASSERT (Col [c].shared3.hash == hash) ;
+
+ /* not identical if lengths or scores are different */
+ if (Col [c].length != length ||
+ Col [c].shared2.score != Col [super_c].shared2.score)
+ {
+ prev_c = c ;
+ continue ;
+ }
+
+ /* compare the two columns */
+ cp1 = &A [Col [super_c].start] ;
+ cp2 = &A [Col [c].start] ;
+
+ for (i = 0 ; i < length ; i++)
+ {
+ /* the columns are "clean" (no dead rows) */
+ COLAMD_ASSERT (ROW_IS_ALIVE (*cp1)) ;
+ COLAMD_ASSERT (ROW_IS_ALIVE (*cp2)) ;
+	  /* row indices will be in the same order for both supercols, */
+	  /* no gather-scatter necessary */
+ if (*cp1++ != *cp2++)
+ {
+ break ;
+ }
+ }
+
+ /* the two columns are different if the for-loop "broke" */
+ if (i != length)
+ {
+ prev_c = c ;
+ continue ;
+ }
+
+ /* === Got it! two columns are identical =================== */
+
+ COLAMD_ASSERT (Col [c].shared2.score == Col [super_c].shared2.score) ;
+
+ Col [super_c].shared1.thickness += Col [c].shared1.thickness ;
+ Col [c].shared1.parent = super_c ;
+ KILL_NON_PRINCIPAL_COL (c) ;
+ /* order c later, in order_children() */
+ Col [c].shared2.order = COLAMD_EMPTY ;
+ /* remove c from hash bucket */
+ Col [prev_c].shared4.hash_next = Col [c].shared4.hash_next ;
+ }
+ }
+
+ /* === Empty this hash bucket ======================================= */
+
+ if (head_column > COLAMD_EMPTY)
+ {
+ /* corresponding degree list "hash" is not empty */
+ Col [head_column].shared3.headhash = COLAMD_EMPTY ;
+ }
+ else
+ {
+ /* corresponding degree list "hash" is empty */
+ head [hash] = COLAMD_EMPTY ;
+ }
+ }
+}
+
+
+/* ========================================================================== */
+/* === garbage_collection =================================================== */
+/* ========================================================================== */
+
+/*
+ Defragments and compacts columns and rows in the workspace A. Used when
+  all available memory has been used while performing row merging. Returns
+  the index of the first free position in A, after garbage collection. The
+  time taken by this routine is linear in the size of the array A, which is
+ itself linear in the number of nonzeros in the input matrix.
+ Not user-callable.
+*/
+template <typename Index>
+static Index garbage_collection /* returns the new value of pfree */
+ (
+ /* === Parameters ======================================================= */
+
+ Index n_row, /* number of rows */
+ Index n_col, /* number of columns */
+ Colamd_Row<Index> Row [], /* row info */
+ colamd_col<Index> Col [], /* column info */
+ Index A [], /* A [0 ... Alen-1] holds the matrix */
+ Index *pfree /* &A [0] ... pfree is in use */
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index *psrc ; /* source pointer */
+ Index *pdest ; /* destination pointer */
+ Index j ; /* counter */
+ Index r ; /* a row index */
+ Index c ; /* a column index */
+ Index length ; /* length of a row or column */
+
+ /* === Defragment the columns =========================================== */
+
+ pdest = &A[0] ;
+ for (c = 0 ; c < n_col ; c++)
+ {
+ if (COL_IS_ALIVE (c))
+ {
+ psrc = &A [Col [c].start] ;
+
+ /* move and compact the column */
+ COLAMD_ASSERT (pdest <= psrc) ;
+ Col [c].start = (Index) (pdest - &A [0]) ;
+ length = Col [c].length ;
+ for (j = 0 ; j < length ; j++)
+ {
+ r = *psrc++ ;
+ if (ROW_IS_ALIVE (r))
+ {
+ *pdest++ = r ;
+ }
+ }
+ Col [c].length = (Index) (pdest - &A [Col [c].start]) ;
+ }
+ }
+
+ /* === Prepare to defragment the rows =================================== */
+
+ for (r = 0 ; r < n_row ; r++)
+ {
+ if (ROW_IS_ALIVE (r))
+ {
+ if (Row [r].length == 0)
+ {
+ /* this row is of zero length. cannot compact it, so kill it */
+ COLAMD_DEBUG3 (("Defrag row kill\n")) ;
+ KILL_ROW (r) ;
+ }
+ else
+ {
+ /* save first column index in Row [r].shared2.first_column */
+ psrc = &A [Row [r].start] ;
+ Row [r].shared2.first_column = *psrc ;
+ COLAMD_ASSERT (ROW_IS_ALIVE (r)) ;
+ /* flag the start of the row with the one's complement of row */
+ *psrc = ONES_COMPLEMENT (r) ;
+
+ }
+ }
+ }
+
+ /* === Defragment the rows ============================================== */
+
+ psrc = pdest ;
+ while (psrc < pfree)
+ {
+ /* find a negative number ... the start of a row */
+ if (*psrc++ < 0)
+ {
+ psrc-- ;
+ /* get the row index */
+ r = ONES_COMPLEMENT (*psrc) ;
+ COLAMD_ASSERT (r >= 0 && r < n_row) ;
+ /* restore first column index */
+ *psrc = Row [r].shared2.first_column ;
+ COLAMD_ASSERT (ROW_IS_ALIVE (r)) ;
+
+ /* move and compact the row */
+ COLAMD_ASSERT (pdest <= psrc) ;
+ Row [r].start = (Index) (pdest - &A [0]) ;
+ length = Row [r].length ;
+ for (j = 0 ; j < length ; j++)
+ {
+ c = *psrc++ ;
+ if (COL_IS_ALIVE (c))
+ {
+ *pdest++ = c ;
+ }
+ }
+ Row [r].length = (Index) (pdest - &A [Row [r].start]) ;
+
+ }
+ }
+ /* ensure we found all the rows */
+ COLAMD_ASSERT (debug_rows == 0) ;
+
+ /* === Return the new value of pfree ==================================== */
+
+ return ((Index) (pdest - &A [0])) ;
+}
+
+
+/* ========================================================================== */
+/* === clear_mark =========================================================== */
+/* ========================================================================== */
+
+/*
+  Clears the Row [].shared2.mark array, and returns the new tag_mark.
+  Not user-callable.
+*/
+template <typename Index>
+static inline Index clear_mark /* return the new value for tag_mark */
+ (
+ /* === Parameters ======================================================= */
+
+ Index n_row, /* number of rows in A */
+ Colamd_Row<Index> Row [] /* Row [0 ... n_row-1].shared2.mark is set to zero */
+ )
+{
+ /* === Local variables ================================================== */
+
+ Index r ;
+
+ for (r = 0 ; r < n_row ; r++)
+ {
+ if (ROW_IS_ALIVE (r))
+ {
+ Row [r].shared2.mark = 0 ;
+ }
+ }
+ return (1) ;
+}
+
+
+} // namespace internal
+#endif
diff --git a/third_party/eigen3/Eigen/src/OrderingMethods/Ordering.h b/third_party/eigen3/Eigen/src/OrderingMethods/Ordering.h
new file mode 100644
index 0000000000..4e06097849
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/OrderingMethods/Ordering.h
@@ -0,0 +1,154 @@
+
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_ORDERING_H
+#define EIGEN_ORDERING_H
+
+namespace Eigen {
+
+#include "Eigen_Colamd.h"
+
+namespace internal {
+
+/** \internal
+ * \ingroup OrderingMethods_Module
+ * \returns the symmetric pattern A^T+A from the input matrix A.
+ * FIXME: The values should not be considered here
+ */
+template<typename MatrixType>
+void ordering_helper_at_plus_a(const MatrixType& mat, MatrixType& symmat)
+{
+ MatrixType C;
+ C = mat.transpose(); // NOTE: Could be costly
+ for (int i = 0; i < C.rows(); i++)
+ {
+ for (typename MatrixType::InnerIterator it(C, i); it; ++it)
+ it.valueRef() = 0.0;
+ }
+ symmat = C + mat;
+}
+
+}
+
+#ifndef EIGEN_MPL2_ONLY
+
+/** \ingroup OrderingMethods_Module
+ * \class AMDOrdering
+ *
+ * Functor computing the \em approximate \em minimum \em degree ordering.
+ * If the matrix is not structurally symmetric, an ordering of A^T+A is computed.
+ * \tparam Index The type of indices of the matrix
+ * \sa COLAMDOrdering
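+ *
+ * A minimal usage sketch (assuming a previously assembled SparseMatrix<double> A;
+ * the variable names are illustrative only):
+ * \code
+ * AMDOrdering<int> ordering;
+ * AMDOrdering<int>::PermutationType perm;
+ * ordering(A, perm);   // perm now holds an AMD permutation of the pattern of A^T+A
+ * \endcode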
+ */
+template <typename Index>
+class AMDOrdering
+{
+ public:
+ typedef PermutationMatrix<Dynamic, Dynamic, Index> PermutationType;
+
+ /** Compute the permutation vector from a sparse matrix
+ * This routine is much faster if the input matrix is column-major
+ */
+ template <typename MatrixType>
+ void operator()(const MatrixType& mat, PermutationType& perm)
+ {
+ // Compute the symmetric pattern
+ SparseMatrix<typename MatrixType::Scalar, ColMajor, Index> symm;
+ internal::ordering_helper_at_plus_a(mat,symm);
+
+ // Call the AMD routine
+ //m_mat.prune(keep_diag());
+ internal::minimum_degree_ordering(symm, perm);
+ }
+
+ /** Compute the permutation with a selfadjoint matrix */
+ template <typename SrcType, unsigned int SrcUpLo>
+ void operator()(const SparseSelfAdjointView<SrcType, SrcUpLo>& mat, PermutationType& perm)
+ {
+ SparseMatrix<typename SrcType::Scalar, ColMajor, Index> C; C = mat;
+
+ // Call the AMD routine
+ // m_mat.prune(keep_diag()); //Remove the diagonal elements
+ internal::minimum_degree_ordering(C, perm);
+ }
+};
+
+#endif // EIGEN_MPL2_ONLY
+
+/** \ingroup OrderingMethods_Module
+ * \class NaturalOrdering
+ *
+ * Functor computing the natural ordering (identity)
+ *
+ * \note Returns an empty permutation matrix
+ * \tparam Index The type of indices of the matrix
+ */
+template <typename Index>
+class NaturalOrdering
+{
+ public:
+ typedef PermutationMatrix<Dynamic, Dynamic, Index> PermutationType;
+
+ /** Compute the permutation vector from a column-major sparse matrix */
+ template <typename MatrixType>
+ void operator()(const MatrixType& /*mat*/, PermutationType& perm)
+ {
+ perm.resize(0);
+ }
+
+};
+
+/** \ingroup OrderingMethods_Module
+ * \class COLAMDOrdering
+ *
+ * Functor computing the \em column \em approximate \em minimum \em degree ordering.
+ * The matrix should be column-major and in \b compressed format (see SparseMatrix::makeCompressed()).
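+ *
+ * A minimal usage sketch (assuming a column-major SparseMatrix<double> A;
+ * the variable names are illustrative only):
+ * \code
+ * A.makeCompressed();
+ * COLAMDOrdering<int> ordering;
+ * COLAMDOrdering<int>::PermutationType perm;
+ * ordering(A, perm);   // perm now holds the column approximate minimum degree permutation
+ * \endcode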
+ */
+template<typename Index>
+class COLAMDOrdering
+{
+ public:
+ typedef PermutationMatrix<Dynamic, Dynamic, Index> PermutationType;
+ typedef Matrix<Index, Dynamic, 1> IndexVector;
+
+    /** Compute the permutation vector \a perm from the sparse matrix \a mat
+ * \warning The input sparse matrix \a mat must be in compressed mode (see SparseMatrix::makeCompressed()).
+ */
+ template <typename MatrixType>
+ void operator() (const MatrixType& mat, PermutationType& perm)
+ {
+ eigen_assert(mat.isCompressed() && "COLAMDOrdering requires a sparse matrix in compressed mode. Call .makeCompressed() before passing it to COLAMDOrdering");
+
+ Index m = mat.rows();
+ Index n = mat.cols();
+ Index nnz = mat.nonZeros();
+ // Get the recommended value of Alen to be used by colamd
+ Index Alen = internal::colamd_recommended(nnz, m, n);
+ // Set the default parameters
+ double knobs [COLAMD_KNOBS];
+ Index stats [COLAMD_STATS];
+ internal::colamd_set_defaults(knobs);
+
+ Index info;
+ IndexVector p(n+1), A(Alen);
+ for(Index i=0; i <= n; i++) p(i) = mat.outerIndexPtr()[i];
+ for(Index i=0; i < nnz; i++) A(i) = mat.innerIndexPtr()[i];
+ // Call Colamd routine to compute the ordering
+ info = internal::colamd(m, n, Alen, A.data(), p.data(), knobs, stats);
+ eigen_assert( info && "COLAMD failed " );
+
+ perm.resize(n);
+ for (Index i = 0; i < n; i++) perm.indices()(p(i)) = i;
+ }
+};
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/PaStiXSupport/PaStiXSupport.h b/third_party/eigen3/Eigen/src/PaStiXSupport/PaStiXSupport.h
new file mode 100644
index 0000000000..8a546dc2ff
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/PaStiXSupport/PaStiXSupport.h
@@ -0,0 +1,729 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_PASTIXSUPPORT_H
+#define EIGEN_PASTIXSUPPORT_H
+
+namespace Eigen {
+
+#if defined(DCOMPLEX)
+ #define PASTIX_COMPLEX COMPLEX
+ #define PASTIX_DCOMPLEX DCOMPLEX
+#else
+ #define PASTIX_COMPLEX std::complex<float>
+ #define PASTIX_DCOMPLEX std::complex<double>
+#endif
+
+/** \ingroup PaStiXSupport_Module
+ * \brief Interface to the PaStix solver
+ *
+ * This class is used to solve the linear systems A.X = B via the PaStix library.
+ * The matrix can be either real or complex, symmetric or not.
+ *
+ * \sa TutorialSparseDirectSolvers
+ */
+template<typename _MatrixType, bool IsStrSym = false> class PastixLU;
+template<typename _MatrixType, int Options> class PastixLLT;
+template<typename _MatrixType, int Options> class PastixLDLT;
+
+namespace internal
+{
+
+ template<class Pastix> struct pastix_traits;
+
+ template<typename _MatrixType>
+ struct pastix_traits< PastixLU<_MatrixType> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+ template<typename _MatrixType, int Options>
+ struct pastix_traits< PastixLLT<_MatrixType,Options> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+ template<typename _MatrixType, int Options>
+ struct pastix_traits< PastixLDLT<_MatrixType,Options> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+ void eigen_pastix(pastix_data_t **pastix_data, int pastix_comm, int n, int *ptr, int *idx, float *vals, int *perm, int * invp, float *x, int nbrhs, int *iparm, double *dparm)
+ {
+ if (n == 0) { ptr = NULL; idx = NULL; vals = NULL; }
+ if (nbrhs == 0) {x = NULL; nbrhs=1;}
+ s_pastix(pastix_data, pastix_comm, n, ptr, idx, vals, perm, invp, x, nbrhs, iparm, dparm);
+ }
+
+ void eigen_pastix(pastix_data_t **pastix_data, int pastix_comm, int n, int *ptr, int *idx, double *vals, int *perm, int * invp, double *x, int nbrhs, int *iparm, double *dparm)
+ {
+ if (n == 0) { ptr = NULL; idx = NULL; vals = NULL; }
+ if (nbrhs == 0) {x = NULL; nbrhs=1;}
+ d_pastix(pastix_data, pastix_comm, n, ptr, idx, vals, perm, invp, x, nbrhs, iparm, dparm);
+ }
+
+ void eigen_pastix(pastix_data_t **pastix_data, int pastix_comm, int n, int *ptr, int *idx, std::complex<float> *vals, int *perm, int * invp, std::complex<float> *x, int nbrhs, int *iparm, double *dparm)
+ {
+ if (n == 0) { ptr = NULL; idx = NULL; vals = NULL; }
+ if (nbrhs == 0) {x = NULL; nbrhs=1;}
+ c_pastix(pastix_data, pastix_comm, n, ptr, idx, reinterpret_cast<PASTIX_COMPLEX*>(vals), perm, invp, reinterpret_cast<PASTIX_COMPLEX*>(x), nbrhs, iparm, dparm);
+ }
+
+ void eigen_pastix(pastix_data_t **pastix_data, int pastix_comm, int n, int *ptr, int *idx, std::complex<double> *vals, int *perm, int * invp, std::complex<double> *x, int nbrhs, int *iparm, double *dparm)
+ {
+ if (n == 0) { ptr = NULL; idx = NULL; vals = NULL; }
+ if (nbrhs == 0) {x = NULL; nbrhs=1;}
+ z_pastix(pastix_data, pastix_comm, n, ptr, idx, reinterpret_cast<PASTIX_DCOMPLEX*>(vals), perm, invp, reinterpret_cast<PASTIX_DCOMPLEX*>(x), nbrhs, iparm, dparm);
+ }
+
+ // Convert the matrix to Fortran-style Numbering
+ template <typename MatrixType>
+ void c_to_fortran_numbering (MatrixType& mat)
+ {
+ if ( !(mat.outerIndexPtr()[0]) )
+ {
+ int i;
+ for(i = 0; i <= mat.rows(); ++i)
+ ++mat.outerIndexPtr()[i];
+ for(i = 0; i < mat.nonZeros(); ++i)
+ ++mat.innerIndexPtr()[i];
+ }
+ }
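+
+  // Purely illustrative example (hypothetical 3x3 pattern): the C-style arrays
+  // outerIndexPtr = {0,2,3,5} and innerIndexPtr = {0,1,2,0,2} become
+  // {1,3,4,6} and {1,2,3,1,3} after the shift, matching the 1-based
+  // (Fortran-style) numbering expected by PaStiX; fortran_to_c_numbering()
+  // below performs the inverse shift.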
+
+ // Convert to C-style Numbering
+ template <typename MatrixType>
+ void fortran_to_c_numbering (MatrixType& mat)
+ {
+ // Check the Numbering
+ if ( mat.outerIndexPtr()[0] == 1 )
+ { // Convert to C-style numbering
+ int i;
+ for(i = 0; i <= mat.rows(); ++i)
+ --mat.outerIndexPtr()[i];
+ for(i = 0; i < mat.nonZeros(); ++i)
+ --mat.innerIndexPtr()[i];
+ }
+ }
+}
+
+// This is the base class to interface with PaStiX functions.
+// Users should not use this class directly.
+template <class Derived>
+class PastixBase : internal::noncopyable
+{
+ public:
+ typedef typename internal::pastix_traits<Derived>::MatrixType _MatrixType;
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar,Dynamic,1> Vector;
+ typedef SparseMatrix<Scalar, ColMajor> ColSpMatrix;
+
+ public:
+
+ PastixBase() : m_initisOk(false), m_analysisIsOk(false), m_factorizationIsOk(false), m_isInitialized(false), m_pastixdata(0), m_size(0)
+ {
+ init();
+ }
+
+ ~PastixBase()
+ {
+ clean();
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<PastixBase, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "Pastix solver is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "PastixBase::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<PastixBase, Rhs>(*this, b.derived());
+ }
+
+ template<typename Rhs,typename Dest>
+ bool _solve (const MatrixBase<Rhs> &b, MatrixBase<Dest> &x) const;
+
+ Derived& derived()
+ {
+ return *static_cast<Derived*>(this);
+ }
+ const Derived& derived() const
+ {
+ return *static_cast<const Derived*>(this);
+ }
+
+ /** Returns a reference to the integer vector IPARM of PaStiX parameters
+ * to modify the default parameters.
+ * The statistics related to the different phases of factorization and solve are saved here as well
+ * \sa analyzePattern() factorize()
+ */
+ Array<Index,IPARM_SIZE,1>& iparm()
+ {
+ return m_iparm;
+ }
+
+ /** Return a reference to a particular index parameter of the IPARM vector
+ * \sa iparm()
+ */
+
+ int& iparm(int idxparam)
+ {
+ return m_iparm(idxparam);
+ }
+
+ /** Returns a reference to the double vector DPARM of PaStiX parameters
+ * The statistics related to the different phases of factorization and solve are saved here as well
+ * \sa analyzePattern() factorize()
+ */
+ Array<RealScalar,IPARM_SIZE,1>& dparm()
+ {
+ return m_dparm;
+ }
+
+
+ /** Return a reference to a particular index parameter of the DPARM vector
+ * \sa dparm()
+ */
+ double& dparm(int idxparam)
+ {
+ return m_dparm(idxparam);
+ }
+
+ inline Index cols() const { return m_size; }
+ inline Index rows() const { return m_size; }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+     * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the PaStiX reports a problem
+ * \c InvalidInput if the input matrix is invalid
+ *
+ * \sa iparm()
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<PastixBase, Rhs>
+ solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "Pastix LU, LLT or LDLT is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "PastixBase::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<PastixBase, Rhs>(*this, b.derived());
+ }
+
+ protected:
+
+ // Initialize the Pastix data structure, check the matrix
+ void init();
+
+ // Compute the ordering and the symbolic factorization
+ void analyzePattern(ColSpMatrix& mat);
+
+ // Compute the numerical factorization
+ void factorize(ColSpMatrix& mat);
+
+ // Free all the data allocated by Pastix
+ void clean()
+ {
+ eigen_assert(m_initisOk && "The Pastix structure should be allocated first");
+ m_iparm(IPARM_START_TASK) = API_TASK_CLEAN;
+ m_iparm(IPARM_END_TASK) = API_TASK_CLEAN;
+ internal::eigen_pastix(&m_pastixdata, MPI_COMM_WORLD, 0, 0, 0, (Scalar*)0,
+ m_perm.data(), m_invp.data(), 0, 0, m_iparm.data(), m_dparm.data());
+ }
+
+ void compute(ColSpMatrix& mat);
+
+ int m_initisOk;
+ int m_analysisIsOk;
+ int m_factorizationIsOk;
+ bool m_isInitialized;
+ mutable ComputationInfo m_info;
+ mutable pastix_data_t *m_pastixdata; // Data structure for pastix
+ mutable int m_comm; // The MPI communicator identifier
+ mutable Matrix<int,IPARM_SIZE,1> m_iparm; // integer vector for the input parameters
+ mutable Matrix<double,DPARM_SIZE,1> m_dparm; // Scalar vector for the input parameters
+ mutable Matrix<Index,Dynamic,1> m_perm; // Permutation vector
+ mutable Matrix<Index,Dynamic,1> m_invp; // Inverse permutation vector
+ mutable int m_size; // Size of the matrix
+};
+
+ /** Initialize the PaStiX data structure.
+ * A first call to this function fills iparm and dparm with the default PaStiX parameters.
+ * \sa iparm() dparm()
+ */
+template <class Derived>
+void PastixBase<Derived>::init()
+{
+ m_size = 0;
+ m_iparm.setZero(IPARM_SIZE);
+ m_dparm.setZero(DPARM_SIZE);
+
+ m_iparm(IPARM_MODIFY_PARAMETER) = API_NO;
+ pastix(&m_pastixdata, MPI_COMM_WORLD,
+ 0, 0, 0, 0,
+ 0, 0, 0, 1, m_iparm.data(), m_dparm.data());
+
+ m_iparm[IPARM_MATRIX_VERIFICATION] = API_NO;
+ m_iparm[IPARM_VERBOSE] = 2;
+ m_iparm[IPARM_ORDERING] = API_ORDER_SCOTCH;
+ m_iparm[IPARM_INCOMPLETE] = API_NO;
+ m_iparm[IPARM_OOC_LIMIT] = 2000;
+ m_iparm[IPARM_RHS_MAKING] = API_RHS_B;
+ m_iparm(IPARM_MATRIX_VERIFICATION) = API_NO;
+
+ m_iparm(IPARM_START_TASK) = API_TASK_INIT;
+ m_iparm(IPARM_END_TASK) = API_TASK_INIT;
+ internal::eigen_pastix(&m_pastixdata, MPI_COMM_WORLD, 0, 0, 0, (Scalar*)0,
+ 0, 0, 0, 0, m_iparm.data(), m_dparm.data());
+
+ // Check the returned error
+ if(m_iparm(IPARM_ERROR_NUMBER)) {
+ m_info = InvalidInput;
+ m_initisOk = false;
+ }
+ else {
+ m_info = Success;
+ m_initisOk = true;
+ }
+}
+
+template <class Derived>
+void PastixBase<Derived>::compute(ColSpMatrix& mat)
+{
+  eigen_assert(mat.rows() == mat.cols() && "The input matrix should be square");
+
+ analyzePattern(mat);
+ factorize(mat);
+
+ m_iparm(IPARM_MATRIX_VERIFICATION) = API_NO;
+ m_isInitialized = m_factorizationIsOk;
+}
+
+
+template <class Derived>
+void PastixBase<Derived>::analyzePattern(ColSpMatrix& mat)
+{
+ eigen_assert(m_initisOk && "The initialization of PaSTiX failed");
+
+ // clean previous calls
+ if(m_size>0)
+ clean();
+
+ m_size = mat.rows();
+ m_perm.resize(m_size);
+ m_invp.resize(m_size);
+
+ m_iparm(IPARM_START_TASK) = API_TASK_ORDERING;
+ m_iparm(IPARM_END_TASK) = API_TASK_ANALYSE;
+ internal::eigen_pastix(&m_pastixdata, MPI_COMM_WORLD, m_size, mat.outerIndexPtr(), mat.innerIndexPtr(),
+ mat.valuePtr(), m_perm.data(), m_invp.data(), 0, 0, m_iparm.data(), m_dparm.data());
+
+ // Check the returned error
+ if(m_iparm(IPARM_ERROR_NUMBER))
+ {
+ m_info = NumericalIssue;
+ m_analysisIsOk = false;
+ }
+ else
+ {
+ m_info = Success;
+ m_analysisIsOk = true;
+ }
+}
+
+template <class Derived>
+void PastixBase<Derived>::factorize(ColSpMatrix& mat)
+{
+// if(&m_cpyMat != &mat) m_cpyMat = mat;
+ eigen_assert(m_analysisIsOk && "The analysis phase should be called before the factorization phase");
+ m_iparm(IPARM_START_TASK) = API_TASK_NUMFACT;
+ m_iparm(IPARM_END_TASK) = API_TASK_NUMFACT;
+ m_size = mat.rows();
+
+ internal::eigen_pastix(&m_pastixdata, MPI_COMM_WORLD, m_size, mat.outerIndexPtr(), mat.innerIndexPtr(),
+ mat.valuePtr(), m_perm.data(), m_invp.data(), 0, 0, m_iparm.data(), m_dparm.data());
+
+ // Check the returned error
+ if(m_iparm(IPARM_ERROR_NUMBER))
+ {
+ m_info = NumericalIssue;
+ m_factorizationIsOk = false;
+ m_isInitialized = false;
+ }
+ else
+ {
+ m_info = Success;
+ m_factorizationIsOk = true;
+ m_isInitialized = true;
+ }
+}
+
+/* Solve the system */
+template<typename Base>
+template<typename Rhs,typename Dest>
+bool PastixBase<Base>::_solve (const MatrixBase<Rhs> &b, MatrixBase<Dest> &x) const
+{
+ eigen_assert(m_isInitialized && "The matrix should be factorized first");
+ EIGEN_STATIC_ASSERT((Dest::Flags&RowMajorBit)==0,
+ THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
+ int rhs = 1;
+
+ x = b; /* on return, x is overwritten by the computed solution */
+
+ for (int i = 0; i < b.cols(); i++){
+ m_iparm[IPARM_START_TASK] = API_TASK_SOLVE;
+ m_iparm[IPARM_END_TASK] = API_TASK_REFINE;
+
+ internal::eigen_pastix(&m_pastixdata, MPI_COMM_WORLD, x.rows(), 0, 0, 0,
+ m_perm.data(), m_invp.data(), &x(0, i), rhs, m_iparm.data(), m_dparm.data());
+ }
+
+ // Check the returned error
+ m_info = m_iparm(IPARM_ERROR_NUMBER)==0 ? Success : NumericalIssue;
+
+ return m_iparm(IPARM_ERROR_NUMBER)==0;
+}
+
+/** \ingroup PaStiXSupport_Module
+ * \class PastixLU
+ * \brief Sparse direct LU solver based on PaStiX library
+ *
+ * This class is used to solve the linear systems A.X = B with a supernodal LU
+  * factorization in the PaStiX library. The matrix A should be square and nonsingular.
+  * PaStiX requires that the matrix A has a symmetric structural pattern;
+  * this interface symmetrizes the input matrix otherwise.
+ * The vectors or matrices X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam IsStrSym Indicates if the input matrix has a symmetric pattern, default is false
+  * NOTE : If the analysis and factorization phases are called separately,
+  * the input matrix will be symmetrized at each call, hence it is advised to
+  * symmetrize the matrix in the end-user program and set \p IsStrSym to true
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ *
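+  * A minimal usage sketch (assuming a square SparseMatrix<double> A and a
+  * VectorXd b are already set up; the names are illustrative only):
+  * \code
+  * PastixLU<SparseMatrix<double> > solver;
+  * solver.compute(A);               // pattern analysis + numerical factorization
+  * VectorXd x;
+  * if(solver.info() == Success)
+  *   x = solver.solve(b);           // solve A x = b
+  * \endcode
+  *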
+ */
+template<typename _MatrixType, bool IsStrSym>
+class PastixLU : public PastixBase< PastixLU<_MatrixType> >
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef PastixBase<PastixLU<MatrixType> > Base;
+ typedef typename Base::ColSpMatrix ColSpMatrix;
+ typedef typename MatrixType::Index Index;
+
+ public:
+ PastixLU() : Base()
+ {
+ init();
+ }
+
+ PastixLU(const MatrixType& matrix):Base()
+ {
+ init();
+ compute(matrix);
+ }
+ /** Compute the LU supernodal factorization of \p matrix.
+ * iparm and dparm can be used to tune the PaStiX parameters.
+ * see the PaStiX user's manual
+ * \sa analyzePattern() factorize()
+ */
+ void compute (const MatrixType& matrix)
+ {
+ m_structureIsUptodate = false;
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::compute(temp);
+ }
+ /** Compute the LU symbolic factorization of \p matrix using its sparsity pattern.
+ * Several ordering methods can be used at this step. See the PaStiX user's manual.
+ * The result of this operation can be used with successive matrices having the same pattern as \p matrix
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ m_structureIsUptodate = false;
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::analyzePattern(temp);
+ }
+
+ /** Compute the LU supernodal factorization of \p matrix
+      * WARNING The matrix \p matrix should have the same structural pattern
+      * as the one used in the analysis phase.
+ * \sa analyzePattern()
+ */
+ void factorize(const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::factorize(temp);
+ }
+ protected:
+
+ void init()
+ {
+ m_structureIsUptodate = false;
+ m_iparm(IPARM_SYM) = API_SYM_NO;
+ m_iparm(IPARM_FACTORIZATION) = API_FACT_LU;
+ }
+
+ void grabMatrix(const MatrixType& matrix, ColSpMatrix& out)
+ {
+ if(IsStrSym)
+ out = matrix;
+ else
+ {
+ if(!m_structureIsUptodate)
+ {
+ // update the transposed structure
+ m_transposedStructure = matrix.transpose();
+
+ // Set the elements of the matrix to zero
+ for (Index j=0; j<m_transposedStructure.outerSize(); ++j)
+ for(typename ColSpMatrix::InnerIterator it(m_transposedStructure, j); it; ++it)
+ it.valueRef() = 0.0;
+
+ m_structureIsUptodate = true;
+ }
+
+ out = m_transposedStructure + matrix;
+ }
+ internal::c_to_fortran_numbering(out);
+ }
+
+ using Base::m_iparm;
+ using Base::m_dparm;
+
+ ColSpMatrix m_transposedStructure;
+ bool m_structureIsUptodate;
+};
+
+/** \ingroup PaStiXSupport_Module
+ * \class PastixLLT
+ * \brief A sparse direct supernodal Cholesky (LLT) factorization and solver based on the PaStiX library
+ *
+  * This class is used to solve the linear systems A.X = B via an LL^T supernodal Cholesky factorization
+  * available in the PaStiX library. The matrix A should be symmetric and positive definite.
+  * WARNING Selfadjoint complex matrices are not supported in the current version of PaStiX.
+ * The vectors or matrices X and B can be either dense or sparse
+ *
+ * \tparam MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam UpLo The part of the matrix to use : Lower or Upper. The default is Lower as required by PaStiX
+ *
+ * \sa \ref TutorialSparseDirectSolvers
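+  *
+  * A minimal usage sketch (assuming a symmetric positive definite
+  * SparseMatrix<double> A and a VectorXd b; the names are illustrative only):
+  * \code
+  * PastixLLT<SparseMatrix<double>, Lower> cholesky;
+  * cholesky.compute(A);             // LL^T factorization of the Lower part of A
+  * VectorXd x;
+  * if(cholesky.info() == Success)
+  *   x = cholesky.solve(b);
+  * \endcode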
+ */
+template<typename _MatrixType, int _UpLo>
+class PastixLLT : public PastixBase< PastixLLT<_MatrixType, _UpLo> >
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef PastixBase<PastixLLT<MatrixType, _UpLo> > Base;
+ typedef typename Base::ColSpMatrix ColSpMatrix;
+
+ public:
+ enum { UpLo = _UpLo };
+ PastixLLT() : Base()
+ {
+ init();
+ }
+
+ PastixLLT(const MatrixType& matrix):Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ /** Compute the L factor of the LL^T supernodal factorization of \p matrix
+ * \sa analyzePattern() factorize()
+ */
+ void compute (const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::compute(temp);
+ }
+
+ /** Compute the LL^T symbolic factorization of \p matrix using its sparsity pattern
+ * The result of this operation can be used with successive matrices having the same pattern as \p matrix
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::analyzePattern(temp);
+ }
+ /** Compute the LL^T supernodal numerical factorization of \p matrix
+ * \sa analyzePattern()
+ */
+ void factorize(const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::factorize(temp);
+ }
+ protected:
+ using Base::m_iparm;
+
+ void init()
+ {
+ m_iparm(IPARM_SYM) = API_SYM_YES;
+ m_iparm(IPARM_FACTORIZATION) = API_FACT_LLT;
+ }
+
+ void grabMatrix(const MatrixType& matrix, ColSpMatrix& out)
+ {
+ // Pastix supports only lower, column-major matrices
+ out.template selfadjointView<Lower>() = matrix.template selfadjointView<UpLo>();
+ internal::c_to_fortran_numbering(out);
+ }
+};
+
+/** \ingroup PaStiXSupport_Module
+ * \class PastixLDLT
+  * \brief A sparse direct supernodal Cholesky (LDLT) factorization and solver based on the PaStiX library
+ *
+  * This class is used to solve the linear systems A.X = B via an LDL^T supernodal Cholesky factorization
+  * available in the PaStiX library. The matrix A should be symmetric and positive definite.
+  * WARNING Selfadjoint complex matrices are not supported in the current version of PaStiX.
+ * The vectors or matrices X and B can be either dense or sparse
+ *
+ * \tparam MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam UpLo The part of the matrix to use : Lower or Upper. The default is Lower as required by PaStiX
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ */
+template<typename _MatrixType, int _UpLo>
+class PastixLDLT : public PastixBase< PastixLDLT<_MatrixType, _UpLo> >
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef PastixBase<PastixLDLT<MatrixType, _UpLo> > Base;
+ typedef typename Base::ColSpMatrix ColSpMatrix;
+
+ public:
+ enum { UpLo = _UpLo };
+ PastixLDLT():Base()
+ {
+ init();
+ }
+
+ PastixLDLT(const MatrixType& matrix):Base()
+ {
+ init();
+ compute(matrix);
+ }
+
+ /** Compute the L and D factors of the LDL^T factorization of \p matrix
+ * \sa analyzePattern() factorize()
+ */
+ void compute (const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::compute(temp);
+ }
+
+ /** Compute the LDL^T symbolic factorization of \p matrix using its sparsity pattern
+ * The result of this operation can be used with successive matrices having the same pattern as \p matrix
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::analyzePattern(temp);
+ }
+ /** Compute the LDL^T supernodal numerical factorization of \p matrix
+ *
+ */
+ void factorize(const MatrixType& matrix)
+ {
+ ColSpMatrix temp;
+ grabMatrix(matrix, temp);
+ Base::factorize(temp);
+ }
+
+ protected:
+ using Base::m_iparm;
+
+ void init()
+ {
+ m_iparm(IPARM_SYM) = API_SYM_YES;
+ m_iparm(IPARM_FACTORIZATION) = API_FACT_LDLT;
+ }
+
+ void grabMatrix(const MatrixType& matrix, ColSpMatrix& out)
+ {
+ // Pastix supports only lower, column-major matrices
+ out.template selfadjointView<Lower>() = matrix.template selfadjointView<UpLo>();
+ internal::c_to_fortran_numbering(out);
+ }
+};
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<PastixBase<_MatrixType>, Rhs>
+ : solve_retval_base<PastixBase<_MatrixType>, Rhs>
+{
+ typedef PastixBase<_MatrixType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+template<typename _MatrixType, typename Rhs>
+struct sparse_solve_retval<PastixBase<_MatrixType>, Rhs>
+ : sparse_solve_retval_base<PastixBase<_MatrixType>, Rhs>
+{
+ typedef PastixBase<_MatrixType> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/PardisoSupport/PardisoSupport.h b/third_party/eigen3/Eigen/src/PardisoSupport/PardisoSupport.h
new file mode 100644
index 0000000000..b6571069e4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/PardisoSupport/PardisoSupport.h
@@ -0,0 +1,581 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL PARDISO
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_PARDISOSUPPORT_H
+#define EIGEN_PARDISOSUPPORT_H
+
+namespace Eigen {
+
+template<typename _MatrixType> class PardisoLU;
+template<typename _MatrixType, int Options=Upper> class PardisoLLT;
+template<typename _MatrixType, int Options=Upper> class PardisoLDLT;
+
+namespace internal
+{
+ template<typename Index>
+ struct pardiso_run_selector
+ {
+ static Index run( _MKL_DSS_HANDLE_t pt, Index maxfct, Index mnum, Index type, Index phase, Index n, void *a,
+ Index *ia, Index *ja, Index *perm, Index nrhs, Index *iparm, Index msglvl, void *b, void *x)
+ {
+ Index error = 0;
+ ::pardiso(pt, &maxfct, &mnum, &type, &phase, &n, a, ia, ja, perm, &nrhs, iparm, &msglvl, b, x, &error);
+ return error;
+ }
+ };
+ template<>
+ struct pardiso_run_selector<long long int>
+ {
+ typedef long long int Index;
+ static Index run( _MKL_DSS_HANDLE_t pt, Index maxfct, Index mnum, Index type, Index phase, Index n, void *a,
+ Index *ia, Index *ja, Index *perm, Index nrhs, Index *iparm, Index msglvl, void *b, void *x)
+ {
+ Index error = 0;
+ ::pardiso_64(pt, &maxfct, &mnum, &type, &phase, &n, a, ia, ja, perm, &nrhs, iparm, &msglvl, b, x, &error);
+ return error;
+ }
+ };
+
+ template<class Pardiso> struct pardiso_traits;
+
+ template<typename _MatrixType>
+ struct pardiso_traits< PardisoLU<_MatrixType> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+ template<typename _MatrixType, int Options>
+ struct pardiso_traits< PardisoLLT<_MatrixType, Options> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+ template<typename _MatrixType, int Options>
+ struct pardiso_traits< PardisoLDLT<_MatrixType, Options> >
+ {
+ typedef _MatrixType MatrixType;
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef typename _MatrixType::Index Index;
+ };
+
+}
+
+template<class Derived>
+class PardisoImpl : internal::noncopyable
+{
+ typedef internal::pardiso_traits<Derived> Traits;
+ public:
+ typedef typename Traits::MatrixType MatrixType;
+ typedef typename Traits::Scalar Scalar;
+ typedef typename Traits::RealScalar RealScalar;
+ typedef typename Traits::Index Index;
+ typedef SparseMatrix<Scalar,RowMajor,Index> SparseMatrixType;
+ typedef Matrix<Scalar,Dynamic,1> VectorType;
+ typedef Matrix<Index, 1, MatrixType::ColsAtCompileTime> IntRowVectorType;
+ typedef Matrix<Index, MatrixType::RowsAtCompileTime, 1> IntColVectorType;
+ typedef Array<Index,64,1,DontAlign> ParameterType;
+ enum {
+ ScalarIsComplex = NumTraits<Scalar>::IsComplex
+ };
+
+ PardisoImpl()
+ {
+ eigen_assert((sizeof(Index) >= sizeof(_INTEGER_t) && sizeof(Index) <= 8) && "Non-supported index type");
+ m_iparm.setZero();
+ m_msglvl = 0; // No output
+ m_initialized = false;
+ }
+
+ ~PardisoImpl()
+ {
+ pardisoRelease();
+ }
+
+ inline Index cols() const { return m_size; }
+ inline Index rows() const { return m_size; }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+     * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the matrix appears to be negative.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_initialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ /** \warning for advanced usage only.
+ * \returns a reference to the parameter array controlling PARDISO.
+ * See the PARDISO manual to know how to use it. */
+ ParameterType& pardisoParameterArray()
+ {
+ return m_iparm;
+ }
+
+    /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+ * This function is particularly useful when solving for several problems having the same structure.
+ *
+ * \sa factorize()
+ */
+ Derived& analyzePattern(const MatrixType& matrix);
+
+ /** Performs a numeric decomposition of \a matrix
+ *
+     * The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed.
+ *
+ * \sa analyzePattern()
+ */
+ Derived& factorize(const MatrixType& matrix);
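+
+    // A minimal reuse sketch (illustrative names; assumes A1 and A2 are
+    // SparseMatrix<double> objects sharing one sparsity pattern and b is a VectorXd):
+    //   PardisoLU<SparseMatrix<double> > solver;
+    //   solver.analyzePattern(A1);          // symbolic decomposition, done once
+    //   solver.factorize(A1);               // numeric factorization of A1
+    //   VectorXd x1 = solver.solve(b);
+    //   solver.factorize(A2);               // reuse the symbolic analysis for A2
+    //   VectorXd x2 = solver.solve(b);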
+
+ Derived& compute(const MatrixType& matrix);
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<PardisoImpl, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_initialized && "Pardiso solver is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "PardisoImpl::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<PardisoImpl, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<PardisoImpl, Rhs>
+ solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_initialized && "Pardiso solver is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "PardisoImpl::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<PardisoImpl, Rhs>(*this, b.derived());
+ }
+
+ Derived& derived()
+ {
+ return *static_cast<Derived*>(this);
+ }
+ const Derived& derived() const
+ {
+ return *static_cast<const Derived*>(this);
+ }
+
+ template<typename BDerived, typename XDerived>
+ bool _solve(const MatrixBase<BDerived> &b, MatrixBase<XDerived>& x) const;
+
+ protected:
+ void pardisoRelease()
+ {
+ if(m_initialized) // Factorization ran at least once
+ {
+ internal::pardiso_run_selector<Index>::run(m_pt, 1, 1, m_type, -1, m_size, 0, 0, 0, m_perm.data(), 0,
+ m_iparm.data(), m_msglvl, 0, 0);
+ }
+ }
+
+ void pardisoInit(int type)
+ {
+ m_type = type;
+ bool symmetric = std::abs(m_type) < 10;
+ m_iparm[0] = 1; // No solver default
+ m_iparm[1] = 3; // use Metis for the ordering
+ m_iparm[2] = 1; // Numbers of processors, value of OMP_NUM_THREADS
+ m_iparm[3] = 0; // No iterative-direct algorithm
+ m_iparm[4] = 0; // No user fill-in reducing permutation
+ m_iparm[5] = 0; // Write solution into x
+ m_iparm[6] = 0; // Not in use
+ m_iparm[7] = 2; // Max numbers of iterative refinement steps
+ m_iparm[8] = 0; // Not in use
+ m_iparm[9] = 13; // Perturb the pivot elements with 1E-13
+ m_iparm[10] = symmetric ? 0 : 1; // Use nonsymmetric permutation and scaling MPS
+ m_iparm[11] = 0; // Not in use
+ m_iparm[12] = symmetric ? 0 : 1; // Maximum weighted matching algorithm is switched-off (default for symmetric).
+ // Try m_iparm[12] = 1 in case of inappropriate accuracy
+ m_iparm[13] = 0; // Output: Number of perturbed pivots
+ m_iparm[14] = 0; // Not in use
+ m_iparm[15] = 0; // Not in use
+ m_iparm[16] = 0; // Not in use
+ m_iparm[17] = -1; // Output: Number of nonzeros in the factor LU
+ m_iparm[18] = -1; // Output: Mflops for LU factorization
+ m_iparm[19] = 0; // Output: Numbers of CG Iterations
+
+ m_iparm[20] = 0; // 1x1 pivoting
+ m_iparm[26] = 0; // No matrix checker
+ m_iparm[27] = (sizeof(RealScalar) == 4) ? 1 : 0; // Input arrays are single precision when RealScalar is float
+ m_iparm[34] = 1; // C indexing
+ m_iparm[59] = 1; // Automatic switch between In-Core and Out-of-Core modes
+ }
+
+ protected:
+ // cached data to reduce reallocation, etc.
+
+ void manageErrorCode(Index error)
+ {
+ switch(error)
+ {
+ case 0:
+ m_info = Success;
+ break;
+ case -4:
+ case -7:
+ m_info = NumericalIssue;
+ break;
+ default:
+ m_info = InvalidInput;
+ }
+ }
+
+ mutable SparseMatrixType m_matrix;
+ ComputationInfo m_info;
+ bool m_initialized, m_analysisIsOk, m_factorizationIsOk;
+ Index m_type, m_msglvl;
+ mutable void *m_pt[64];
+ mutable ParameterType m_iparm;
+ mutable IntColVectorType m_perm;
+ Index m_size;
+
+};
+
+template<class Derived>
+Derived& PardisoImpl<Derived>::compute(const MatrixType& a)
+{
+ m_size = a.rows();
+ eigen_assert(a.rows() == a.cols());
+
+ pardisoRelease();
+ memset(m_pt, 0, sizeof(m_pt));
+ m_perm.setZero(m_size);
+ derived().getMatrix(a);
+
+ Index error;
+ error = internal::pardiso_run_selector<Index>::run(m_pt, 1, 1, m_type, 12, m_size,
+ m_matrix.valuePtr(), m_matrix.outerIndexPtr(), m_matrix.innerIndexPtr(),
+ m_perm.data(), 0, m_iparm.data(), m_msglvl, NULL, NULL);
+
+ manageErrorCode(error);
+ m_analysisIsOk = true;
+ m_factorizationIsOk = true;
+ m_initialized = true;
+ return derived();
+}
+
+template<class Derived>
+Derived& PardisoImpl<Derived>::analyzePattern(const MatrixType& a)
+{
+ m_size = a.rows();
+ eigen_assert(m_size == a.cols());
+
+ pardisoRelease();
+ memset(m_pt, 0, sizeof(m_pt));
+ m_perm.setZero(m_size);
+ derived().getMatrix(a);
+
+ Index error;
+ error = internal::pardiso_run_selector<Index>::run(m_pt, 1, 1, m_type, 11, m_size,
+ m_matrix.valuePtr(), m_matrix.outerIndexPtr(), m_matrix.innerIndexPtr(),
+ m_perm.data(), 0, m_iparm.data(), m_msglvl, NULL, NULL);
+
+ manageErrorCode(error);
+ m_analysisIsOk = true;
+ m_factorizationIsOk = false;
+ m_initialized = true;
+ return derived();
+}
+
+template<class Derived>
+Derived& PardisoImpl<Derived>::factorize(const MatrixType& a)
+{
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ eigen_assert(m_size == a.rows() && m_size == a.cols());
+
+ derived().getMatrix(a);
+
+ Index error;
+ error = internal::pardiso_run_selector<Index>::run(m_pt, 1, 1, m_type, 22, m_size,
+ m_matrix.valuePtr(), m_matrix.outerIndexPtr(), m_matrix.innerIndexPtr(),
+ m_perm.data(), 0, m_iparm.data(), m_msglvl, NULL, NULL);
+
+ manageErrorCode(error);
+ m_factorizationIsOk = true;
+ return derived();
+}
+
+template<class Base>
+template<typename BDerived,typename XDerived>
+bool PardisoImpl<Base>::_solve(const MatrixBase<BDerived> &b, MatrixBase<XDerived>& x) const
+{
+ if(m_iparm[0] == 0) // Factorization was not computed
+ return false;
+
+ //Index n = m_matrix.rows();
+ Index nrhs = Index(b.cols());
+ eigen_assert(m_size==b.rows());
+ eigen_assert(((MatrixBase<BDerived>::Flags & RowMajorBit) == 0 || nrhs == 1) && "Row-major right hand sides are not supported");
+ eigen_assert(((MatrixBase<XDerived>::Flags & RowMajorBit) == 0 || nrhs == 1) && "Row-major matrices of unknowns are not supported");
+ eigen_assert(((nrhs == 1) || b.outerStride() == b.rows()));
+
+
+// switch (transposed) {
+// case SvNoTrans : m_iparm[11] = 0 ; break;
+// case SvTranspose : m_iparm[11] = 2 ; break;
+// case SvAdjoint : m_iparm[11] = 1 ; break;
+// default:
+// //std::cerr << "Eigen: transposition option \"" << transposed << "\" not supported by the PARDISO backend\n";
+// m_iparm[11] = 0;
+// }
+
+ Scalar* rhs_ptr = const_cast<Scalar*>(b.derived().data());
+ Matrix<Scalar,Dynamic,Dynamic,ColMajor> tmp;
+
+ // Pardiso cannot solve in-place
+ if(rhs_ptr == x.derived().data())
+ {
+ tmp = b;
+ rhs_ptr = tmp.data();
+ }
+
+ Index error;
+ error = internal::pardiso_run_selector<Index>::run(m_pt, 1, 1, m_type, 33, m_size,
+ m_matrix.valuePtr(), m_matrix.outerIndexPtr(), m_matrix.innerIndexPtr(),
+ m_perm.data(), nrhs, m_iparm.data(), m_msglvl,
+ rhs_ptr, x.derived().data());
+
+ return error==0;
+}
+
+
+/** \ingroup PardisoSupport_Module
+ * \class PardisoLU
+ * \brief A sparse direct LU factorization and solver based on the PARDISO library
+ *
+ * This class allows solving A.X = B sparse linear problems via a direct LU factorization
+ * using the Intel MKL PARDISO library. The sparse matrix A must be square and invertible.
+ * The vectors or matrices X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ *
+ * \sa \ref TutorialSparseDirectSolvers
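+ *
+ * A minimal usage sketch (illustrative only; it assumes MKL PARDISO is linked,
+ * that \c A is a square, invertible SparseMatrix<double> and \c b a VectorXd):
+ *
+ * \code
+ * PardisoLU<SparseMatrix<double> > solver;
+ * solver.compute(A);                    // analyze the pattern and factorize
+ * if(solver.info() != Success) {
+ *   // the factorization failed
+ * }
+ * VectorXd x = solver.solve(b);         // solve A x = b
+ * \endcode
+ *
+ * When several matrices share the same sparsity pattern, analyzePattern() can be
+ * called once, followed by one factorize() call per set of numerical values.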
+ */
+template<typename MatrixType>
+class PardisoLU : public PardisoImpl< PardisoLU<MatrixType> >
+{
+ protected:
+ typedef PardisoImpl< PardisoLU<MatrixType> > Base;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::RealScalar RealScalar;
+ using Base::pardisoInit;
+ using Base::m_matrix;
+ friend class PardisoImpl< PardisoLU<MatrixType> >;
+
+ public:
+
+ using Base::compute;
+ using Base::solve;
+
+ PardisoLU()
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? 13 : 11); // PARDISO matrix types: 13 = complex unsymmetric, 11 = real unsymmetric
+ }
+
+ PardisoLU(const MatrixType& matrix)
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? 13 : 11);
+ compute(matrix);
+ }
+ protected:
+ void getMatrix(const MatrixType& matrix)
+ {
+ m_matrix = matrix;
+ }
+};
+
+/** \ingroup PardisoSupport_Module
+ * \class PardisoLLT
+ * \brief A sparse direct Cholesky (LLT) factorization and solver based on the PARDISO library
+ *
+ * This class allows solving A.X = B sparse linear problems via an LL^T Cholesky factorization
+ * using the Intel MKL PARDISO library. The sparse matrix A must be selfadjoint and positive definite.
+ * The vectors or matrices X and B can be either dense or sparse.
+ *
+ * \tparam MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam UpLo can be any bitwise combination of Upper, Lower. The default is Upper, meaning only the upper triangular part has to be used.
+ * Upper|Lower can be used to indicate that both triangular parts can be used as input.
+ *
+ * \sa \ref TutorialSparseDirectSolvers
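+ *
+ * A minimal sketch (illustrative only; assumes MKL PARDISO is available and
+ * \c A is a selfadjoint, positive definite SparseMatrix<double>):
+ *
+ * \code
+ * PardisoLLT<SparseMatrix<double>, Upper> llt(A);  // factorize A = L L^T
+ * VectorXd x = llt.solve(b);                       // b is a VectorXd
+ * \endcode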
+ */
+template<typename MatrixType, int _UpLo>
+class PardisoLLT : public PardisoImpl< PardisoLLT<MatrixType,_UpLo> >
+{
+ protected:
+ typedef PardisoImpl< PardisoLLT<MatrixType,_UpLo> > Base;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::Index Index;
+ typedef typename Base::RealScalar RealScalar;
+ using Base::pardisoInit;
+ using Base::m_matrix;
+ friend class PardisoImpl< PardisoLLT<MatrixType,_UpLo> >;
+
+ public:
+
+ enum { UpLo = _UpLo };
+ using Base::compute;
+ using Base::solve;
+
+ PardisoLLT()
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? 4 : 2); // PARDISO matrix types: 4 = complex Hermitian positive definite, 2 = real SPD
+ }
+
+ PardisoLLT(const MatrixType& matrix)
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? 4 : 2);
+ compute(matrix);
+ }
+
+ protected:
+
+ void getMatrix(const MatrixType& matrix)
+ {
+ // PARDISO supports only upper, row-major matrices
+ PermutationMatrix<Dynamic,Dynamic,Index> p_null;
+ m_matrix.resize(matrix.rows(), matrix.cols());
+ m_matrix.template selfadjointView<Upper>() = matrix.template selfadjointView<UpLo>().twistedBy(p_null);
+ }
+};
+
+/** \ingroup PardisoSupport_Module
+ * \class PardisoLDLT
+ * \brief A sparse direct Cholesky (LDLT) factorization and solver based on the PARDISO library
+ *
+ * This class allows solving A.X = B sparse linear problems via an LDL^T Cholesky factorization
+ * using the Intel MKL PARDISO library. The sparse matrix A is assumed to be selfadjoint and positive definite.
+ * For complex matrices, A can also be symmetric only, see the \a Options template parameter.
+ * The vectors or matrices X and B can be either dense or sparse.
+ *
+ * \tparam MatrixType the type of the sparse matrix A, it must be a SparseMatrix<>
+ * \tparam Options can be any bitwise combination of Upper, Lower, and Symmetric. The default is Upper, meaning only the upper triangular part has to be used.
+ * Symmetric can be used for symmetric, non-selfadjoint complex matrices, the default being to assume a selfadjoint matrix.
+ * Upper|Lower can be used to indicate that both triangular parts can be used as input.
+ *
+ * \sa \ref TutorialSparseDirectSolvers
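+ *
+ * A minimal sketch (illustrative only; assumes MKL PARDISO is available and
+ * \c A is a selfadjoint SparseMatrix<double> with its upper triangular part stored):
+ *
+ * \code
+ * PardisoLDLT<SparseMatrix<double>, Upper> ldlt;
+ * ldlt.compute(A);               // factorize A = L D L^T
+ * VectorXd x = ldlt.solve(b);    // b is a VectorXd
+ * \endcode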
+ */
+template<typename MatrixType, int Options>
+class PardisoLDLT : public PardisoImpl< PardisoLDLT<MatrixType,Options> >
+{
+ protected:
+ typedef PardisoImpl< PardisoLDLT<MatrixType,Options> > Base;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::Index Index;
+ typedef typename Base::RealScalar RealScalar;
+ using Base::pardisoInit;
+ using Base::m_matrix;
+ friend class PardisoImpl< PardisoLDLT<MatrixType,Options> >;
+
+ public:
+
+ using Base::compute;
+ using Base::solve;
+ enum { UpLo = Options&(Upper|Lower) };
+
+ PardisoLDLT()
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? ( bool(Options&Symmetric) ? 6 : -4 ) : -2); // 6 = complex symmetric, -4 = complex Hermitian indefinite, -2 = real symmetric indefinite
+ }
+
+ PardisoLDLT(const MatrixType& matrix)
+ : Base()
+ {
+ pardisoInit(Base::ScalarIsComplex ? ( bool(Options&Symmetric) ? 6 : -4 ) : -2);
+ compute(matrix);
+ }
+
+ void getMatrix(const MatrixType& matrix)
+ {
+ // PARDISO supports only upper, row-major matrices
+ PermutationMatrix<Dynamic,Dynamic,Index> p_null;
+ m_matrix.resize(matrix.rows(), matrix.cols());
+ m_matrix.template selfadjointView<Upper>() = matrix.template selfadjointView<UpLo>().twistedBy(p_null);
+ }
+};
+
+namespace internal {
+
+template<typename _Derived, typename Rhs>
+struct solve_retval<PardisoImpl<_Derived>, Rhs>
+ : solve_retval_base<PardisoImpl<_Derived>, Rhs>
+{
+ typedef PardisoImpl<_Derived> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+template<typename Derived, typename Rhs>
+struct sparse_solve_retval<PardisoImpl<Derived>, Rhs>
+ : sparse_solve_retval_base<PardisoImpl<Derived>, Rhs>
+{
+ typedef PardisoImpl<Derived> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_PARDISOSUPPORT_H
diff --git a/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR.h b/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR.h
new file mode 100644
index 0000000000..4824880f51
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR.h
@@ -0,0 +1,582 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COLPIVOTINGHOUSEHOLDERQR_H
+#define EIGEN_COLPIVOTINGHOUSEHOLDERQR_H
+
+namespace Eigen {
+
+/** \ingroup QR_Module
+ *
+ * \class ColPivHouseholderQR
+ *
+ * \brief Householder rank-revealing QR decomposition of a matrix with column-pivoting
+ *
+ * \tparam MatrixType the type of the matrix of which we are computing the QR decomposition
+ *
+ * This class performs a rank-revealing QR decomposition of a matrix \b A into matrices \b P, \b Q and \b R
+ * such that
+ * \f[
+ * \mathbf{A} \, \mathbf{P} = \mathbf{Q} \, \mathbf{R}
+ * \f]
+ * by using Householder transformations. Here, \b P is a permutation matrix, \b Q a unitary matrix and \b R an
+ * upper triangular matrix.
+ *
+ * This decomposition performs column pivoting in order to be rank-revealing and improve
+ * numerical stability. It is slower than HouseholderQR, and faster than FullPivHouseholderQR.
+ *
+ * \sa MatrixBase::colPivHouseholderQr()
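+ *
+ * A brief illustrative sketch (the matrix names are assumptions, not part of the API):
+ *
+ * \code
+ * MatrixXd A = MatrixXd::Random(4, 3);
+ * ColPivHouseholderQR<MatrixXd> qr(A);
+ * MatrixXd Q = qr.householderQ();                     // 4x4 unitary factor
+ * MatrixXd R = qr.matrixR().triangularView<Upper>();  // upper triangular factor
+ * MatrixXd P = qr.colsPermutation();                  // 3x3 column permutation
+ * // A * P is then (numerically) equal to Q * R
+ * \endcode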
+ */
+template<typename _MatrixType> class ColPivHouseholderQR
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar, RowsAtCompileTime, RowsAtCompileTime, Options, MaxRowsAtCompileTime, MaxRowsAtCompileTime> MatrixQType;
+ typedef typename internal::plain_diag_type<MatrixType>::type HCoeffsType;
+ typedef PermutationMatrix<ColsAtCompileTime, MaxColsAtCompileTime> PermutationType;
+ typedef typename internal::plain_row_type<MatrixType, Index>::type IntRowVectorType;
+ typedef typename internal::plain_row_type<MatrixType>::type RowVectorType;
+ typedef typename internal::plain_row_type<MatrixType, RealScalar>::type RealRowVectorType;
+ typedef HouseholderSequence<MatrixType,typename internal::remove_all<typename HCoeffsType::ConjugateReturnType>::type> HouseholderSequenceType;
+
+ private:
+
+ typedef typename PermutationType::Index PermIndexType;
+
+ public:
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via ColPivHouseholderQR::compute(const MatrixType&).
+ */
+ ColPivHouseholderQR()
+ : m_qr(),
+ m_hCoeffs(),
+ m_colsPermutation(),
+ m_colsTranspositions(),
+ m_temp(),
+ m_colSqNorms(),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false) {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa ColPivHouseholderQR()
+ */
+ ColPivHouseholderQR(Index rows, Index cols)
+ : m_qr(rows, cols),
+ m_hCoeffs((std::min)(rows,cols)),
+ m_colsPermutation(PermIndexType(cols)),
+ m_colsTranspositions(cols),
+ m_temp(cols),
+ m_colSqNorms(cols),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false) {}
+
+ /** \brief Constructs a QR factorization from a given matrix
+ *
+ * This constructor computes the QR factorization of the matrix \a matrix by calling
+ * the method compute(). It is a short cut for:
+ *
+ * \code
+ * ColPivHouseholderQR<MatrixType> qr(matrix.rows(), matrix.cols());
+ * qr.compute(matrix);
+ * \endcode
+ *
+ * \sa compute()
+ */
+ ColPivHouseholderQR(const MatrixType& matrix)
+ : m_qr(matrix.rows(), matrix.cols()),
+ m_hCoeffs((std::min)(matrix.rows(),matrix.cols())),
+ m_colsPermutation(PermIndexType(matrix.cols())),
+ m_colsTranspositions(matrix.cols()),
+ m_temp(matrix.cols()),
+ m_colSqNorms(matrix.cols()),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false)
+ {
+ compute(matrix);
+ }
+
+ /** This method finds a solution x to the equation Ax=b, where A is the matrix of which
+ * *this is the QR decomposition, if any exists.
+ *
+ * \param b the right-hand-side of the equation to solve.
+ *
+ * \returns a solution.
+ *
+ * \note The case where b is a matrix is not yet implemented. Also, this
+ * code is space inefficient.
+ *
+ * \note_about_checking_solutions
+ *
+ * \note_about_arbitrary_choice_of_solution
+ *
+ * Example: \include ColPivHouseholderQR_solve.cpp
+ * Output: \verbinclude ColPivHouseholderQR_solve.out
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<ColPivHouseholderQR, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return internal::solve_retval<ColPivHouseholderQR, Rhs>(*this, b.derived());
+ }
+
+ HouseholderSequenceType householderQ(void) const;
+ HouseholderSequenceType matrixQ(void) const
+ {
+ return householderQ();
+ }
+
+ /** \returns a reference to the matrix where the Householder QR decomposition is stored
+ */
+ const MatrixType& matrixQR() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return m_qr;
+ }
+
+ /** \returns a reference to the matrix where the result Householder QR is stored
+ * \warning The strict lower part of this matrix contains internal values.
+ * Only the upper triangular part should be referenced. To get it, use
+ * \code matrixR().template triangularView<Upper>() \endcode
+ * For rank-deficient matrices, use
+ * \code
+ * matrixR().topLeftCorner(rank(), rank()).template triangularView<Upper>()
+ * \endcode
+ */
+ const MatrixType& matrixR() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return m_qr;
+ }
+
+ ColPivHouseholderQR& compute(const MatrixType& matrix);
+
+ /** \returns a const reference to the column permutation matrix */
+ const PermutationType& colsPermutation() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return m_colsPermutation;
+ }
+
+ /** \returns the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ * One way to work around that is to use logAbsDeterminant() instead.
+ *
+ * \sa logAbsDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar absDeterminant() const;
+
+ /** \returns the natural log of the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \note This method is useful to work around the risk of overflow/underflow that's inherent
+ * to determinant computation.
+ *
+ * \sa absDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar logAbsDeterminant() const;
+
+ /** \returns the rank of the matrix of which *this is the QR decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index rank() const
+ {
+ using std::abs;
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ RealScalar premultiplied_threshold = abs(m_maxpivot) * threshold();
+ Index result = 0;
+ for(Index i = 0; i < m_nonzero_pivots; ++i)
+ result += (abs(m_qr.coeff(i,i)) > premultiplied_threshold);
+ return result;
+ }
+
+ /** \returns the dimension of the kernel of the matrix of which *this is the QR decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index dimensionOfKernel() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return cols() - rank();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition represents an injective
+ * linear map, i.e. has trivial kernel; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInjective() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return rank() == cols();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition represents a surjective
+ * linear map; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isSurjective() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return rank() == rows();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition is invertible.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInvertible() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return isInjective() && isSurjective();
+ }
+
+ /** \returns the inverse of the matrix of which *this is the QR decomposition.
+ *
+ * \note If this matrix is not invertible, the returned matrix has undefined coefficients.
+ * Use isInvertible() to first determine whether this matrix is invertible.
+ */
+ inline const
+ internal::solve_retval<ColPivHouseholderQR, typename MatrixType::IdentityReturnType>
+ inverse() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return internal::solve_retval<ColPivHouseholderQR,typename MatrixType::IdentityReturnType>
+ (*this, MatrixType::Identity(m_qr.rows(), m_qr.cols()));
+ }
+
+ inline Index rows() const { return m_qr.rows(); }
+ inline Index cols() const { return m_qr.cols(); }
+
+ /** \returns a const reference to the vector of Householder coefficients used to represent the factor \c Q.
+ *
+ * For advanced uses only.
+ */
+ const HCoeffsType& hCoeffs() const { return m_hCoeffs; }
+
+ /** Allows prescribing a threshold to be used by certain methods, such as rank(),
+ * which need to determine when pivots are to be considered nonzero. This is not used for the
+ * QR decomposition itself.
+ *
+ * When it needs to get the threshold value, Eigen calls threshold(). By default, this
+ * uses a formula to automatically determine a reasonable threshold.
+ * Once you have called the present method setThreshold(const RealScalar&),
+ * your value is used instead.
+ *
+ * \param threshold The new value to use as the threshold.
+ *
+ * A pivot will be considered nonzero if its absolute value is strictly greater than
+ *  \f$ threshold \times \vert maxpivot \vert \f$,
+ *  where maxpivot is the biggest pivot.
+ *
+ * If you want to come back to the default behavior, call setThreshold(Default_t)
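+ *
+ * For instance (an illustrative value, not a recommendation):
+ * \code qr.setThreshold(1e-5); \endcode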
+ */
+ ColPivHouseholderQR& setThreshold(const RealScalar& threshold)
+ {
+ m_usePrescribedThreshold = true;
+ m_prescribedThreshold = threshold;
+ return *this;
+ }
+
+ /** Allows returning to the default behavior, letting Eigen use its default formula for
+ * determining the threshold.
+ *
+ * You should pass the special object Eigen::Default as parameter here.
+ * \code qr.setThreshold(Eigen::Default); \endcode
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ ColPivHouseholderQR& setThreshold(Default_t)
+ {
+ m_usePrescribedThreshold = false;
+ return *this;
+ }
+
+ /** Returns the threshold that will be used by certain methods such as rank().
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ RealScalar threshold() const
+ {
+ eigen_assert(m_isInitialized || m_usePrescribedThreshold);
+ return m_usePrescribedThreshold ? m_prescribedThreshold
+ // this formula comes from experimenting (see "LU precision tuning" thread on the list)
+ // and turns out to be identical to Higham's formula used already in LDLt.
+ : NumTraits<Scalar>::epsilon() * RealScalar(m_qr.diagonalSize());
+ }
+
+ /** \returns the number of nonzero pivots in the QR decomposition.
+ * Here nonzero is meant in the exact sense, not in a fuzzy sense.
+ * So that notion isn't really intrinsically interesting, but it is
+ * still useful when implementing algorithms.
+ *
+ * \sa rank()
+ */
+ inline Index nonzeroPivots() const
+ {
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return m_nonzero_pivots;
+ }
+
+ /** \returns the absolute value of the biggest pivot, i.e. the biggest
+ * diagonal coefficient of R.
+ */
+ RealScalar maxPivot() const { return m_maxpivot; }
+
+ /** \brief Reports whether the QR factorization was successful.
+ *
+ * \note This function always returns \c Success. It is provided for compatibility
+ * with other factorization routines.
+ * \returns \c Success
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return Success;
+ }
+
+ protected:
+ MatrixType m_qr;
+ HCoeffsType m_hCoeffs;
+ PermutationType m_colsPermutation;
+ IntRowVectorType m_colsTranspositions;
+ RowVectorType m_temp;
+ RealRowVectorType m_colSqNorms;
+ bool m_isInitialized, m_usePrescribedThreshold;
+ RealScalar m_prescribedThreshold, m_maxpivot;
+ Index m_nonzero_pivots;
+ Index m_det_pq;
+};
+
+template<typename MatrixType>
+typename MatrixType::RealScalar ColPivHouseholderQR<MatrixType>::absDeterminant() const
+{
+ using std::abs;
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return abs(m_qr.diagonal().prod());
+}
+
+template<typename MatrixType>
+typename MatrixType::RealScalar ColPivHouseholderQR<MatrixType>::logAbsDeterminant() const
+{
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return m_qr.diagonal().cwiseAbs().array().log().sum();
+}
+
+/** Performs the QR factorization of the given matrix \a matrix. The result of
+ * the factorization is stored into \c *this, and a reference to \c *this
+ * is returned.
+ *
+ * \sa class ColPivHouseholderQR, ColPivHouseholderQR(const MatrixType&)
+ */
+template<typename MatrixType>
+ColPivHouseholderQR<MatrixType>& ColPivHouseholderQR<MatrixType>::compute(const MatrixType& matrix)
+{
+ using std::abs;
+ Index rows = matrix.rows();
+ Index cols = matrix.cols();
+ Index size = matrix.diagonalSize();
+
+ // the column permutation is stored as int indices, so just to be sure:
+ eigen_assert(cols<=NumTraits<int>::highest());
+
+ m_qr = matrix;
+ m_hCoeffs.resize(size);
+
+ m_temp.resize(cols);
+
+ m_colsTranspositions.resize(matrix.cols());
+ Index number_of_transpositions = 0;
+
+ m_colSqNorms.resize(cols);
+ for(Index k = 0; k < cols; ++k)
+ m_colSqNorms.coeffRef(k) = m_qr.col(k).squaredNorm();
+
+ RealScalar threshold_helper = m_colSqNorms.maxCoeff() * numext::abs2(NumTraits<Scalar>::epsilon()) / RealScalar(rows);
+
+ m_nonzero_pivots = size; // the generic case is that in which all pivots are nonzero (invertible case)
+ m_maxpivot = RealScalar(0);
+
+ for(Index k = 0; k < size; ++k)
+ {
+ // first, we look up in our table m_colSqNorms which column has the biggest squared norm
+ Index biggest_col_index;
+ RealScalar biggest_col_sq_norm = m_colSqNorms.tail(cols-k).maxCoeff(&biggest_col_index);
+ biggest_col_index += k;
+
+ // since our table m_colSqNorms accumulates imprecision at every step, we must now recompute
+ // the actual squared norm of the selected column.
+ // Note that not doing so does result in solve() sometimes returning inf/nan values
+ // when running the unit test with 1000 repetitions.
+ biggest_col_sq_norm = m_qr.col(biggest_col_index).tail(rows-k).squaredNorm();
+
+ // we store that back into our table: it can't hurt to correct our table.
+ m_colSqNorms.coeffRef(biggest_col_index) = biggest_col_sq_norm;
+
+ // if the current biggest column is smaller than epsilon times the initial biggest column,
+ // terminate to avoid generating nan/inf values.
+ // Note that here, if we test instead for "biggest == 0", we get a failure every 1000 (or so)
+ // repetitions of the unit test, with the result of solve() filled with large values of the order
+ // of 1/(size*epsilon).
+ if(biggest_col_sq_norm < threshold_helper * RealScalar(rows-k))
+ {
+ m_nonzero_pivots = k;
+ m_hCoeffs.tail(size-k).setZero();
+ m_qr.bottomRightCorner(rows-k,cols-k)
+ .template triangularView<StrictlyLower>()
+ .setZero();
+ break;
+ }
+
+ // apply the transposition to the columns
+ m_colsTranspositions.coeffRef(k) = biggest_col_index;
+ if(k != biggest_col_index) {
+ m_qr.col(k).swap(m_qr.col(biggest_col_index));
+ std::swap(m_colSqNorms.coeffRef(k), m_colSqNorms.coeffRef(biggest_col_index));
+ ++number_of_transpositions;
+ }
+
+ // generate the householder vector, store it below the diagonal
+ RealScalar beta;
+ m_qr.col(k).tail(rows-k).makeHouseholderInPlace(m_hCoeffs.coeffRef(k), beta);
+
+ // apply the householder transformation to the diagonal coefficient
+ m_qr.coeffRef(k,k) = beta;
+
+ // remember the maximum absolute value of diagonal coefficients
+ if(abs(beta) > m_maxpivot) m_maxpivot = abs(beta);
+
+ // apply the householder transformation
+ m_qr.bottomRightCorner(rows-k, cols-k-1)
+ .applyHouseholderOnTheLeft(m_qr.col(k).tail(rows-k-1), m_hCoeffs.coeffRef(k), &m_temp.coeffRef(k+1));
+
+ // update our table of squared norms of the columns
+ m_colSqNorms.tail(cols-k-1) -= m_qr.row(k).tail(cols-k-1).cwiseAbs2();
+ }
+
+ m_colsPermutation.setIdentity(PermIndexType(cols));
+ for(PermIndexType k = 0; k < m_nonzero_pivots; ++k)
+ m_colsPermutation.applyTranspositionOnTheRight(k, PermIndexType(m_colsTranspositions.coeff(k)));
+
+ m_det_pq = (number_of_transpositions%2) ? -1 : 1;
+ m_isInitialized = true;
+
+ return *this;
+}
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<ColPivHouseholderQR<_MatrixType>, Rhs>
+ : solve_retval_base<ColPivHouseholderQR<_MatrixType>, Rhs>
+{
+ EIGEN_MAKE_SOLVE_HELPERS(ColPivHouseholderQR<_MatrixType>,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ eigen_assert(rhs().rows() == dec().rows());
+
+ const Index cols = dec().cols(),
+ nonzero_pivots = dec().nonzeroPivots();
+
+ if(nonzero_pivots == 0)
+ {
+ dst.setZero();
+ return;
+ }
+
+ typename Rhs::PlainObject c(rhs());
+
+ // Note that the matrix Q = H_0^* H_1^*... so its inverse is Q^* = (H_0 H_1 ...)^T
+ c.applyOnTheLeft(householderSequence(dec().matrixQR(), dec().hCoeffs())
+ .setLength(dec().nonzeroPivots())
+ .transpose()
+ );
+
+ dec().matrixR()
+ .topLeftCorner(nonzero_pivots, nonzero_pivots)
+ .template triangularView<Upper>()
+ .solveInPlace(c.topRows(nonzero_pivots));
+
+ for(Index i = 0; i < nonzero_pivots; ++i) dst.row(dec().colsPermutation().indices().coeff(i)) = c.row(i);
+ for(Index i = nonzero_pivots; i < cols; ++i) dst.row(dec().colsPermutation().indices().coeff(i)).setZero();
+ }
+};
+
+} // end namespace internal
+
+/** \returns the matrix Q as a sequence of householder transformations */
+template<typename MatrixType>
+typename ColPivHouseholderQR<MatrixType>::HouseholderSequenceType ColPivHouseholderQR<MatrixType>
+ ::householderQ() const
+{
+ eigen_assert(m_isInitialized && "ColPivHouseholderQR is not initialized.");
+ return HouseholderSequenceType(m_qr, m_hCoeffs.conjugate()).setLength(m_nonzero_pivots);
+}
+
+#ifndef __CUDACC__
+/** \return the column-pivoting Householder QR decomposition of \c *this.
+ *
+ * \sa class ColPivHouseholderQR
+ */
+template<typename Derived>
+const ColPivHouseholderQR<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::colPivHouseholderQr() const
+{
+ return ColPivHouseholderQR<PlainObject>(eval());
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_COLPIVOTINGHOUSEHOLDERQR_H
diff --git a/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR_MKL.h b/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR_MKL.h
new file mode 100644
index 0000000000..b5b1983265
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/QR/ColPivHouseholderQR_MKL.h
@@ -0,0 +1,99 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Householder QR decomposition of a matrix with column pivoting based on
+ * LAPACKE_?geqp3 function.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_COLPIVOTINGHOUSEHOLDERQR_MKL_H
+#define EIGEN_COLPIVOTINGHOUSEHOLDERQR_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_QR_COLPIV(EIGTYPE, MKLTYPE, MKLPREFIX, EIGCOLROW, MKLCOLROW) \
+template<> inline \
+ColPivHouseholderQR<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic> >& \
+ColPivHouseholderQR<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic> >::compute( \
+ const Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic>& matrix) \
+\
+{ \
+ using std::abs; \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic> MatrixType; \
+ typedef MatrixType::Scalar Scalar; \
+ typedef MatrixType::RealScalar RealScalar; \
+ Index rows = matrix.rows();\
+ Index cols = matrix.cols();\
+ Index size = matrix.diagonalSize();\
+\
+ m_qr = matrix;\
+ m_hCoeffs.resize(size);\
+\
+ m_colsTranspositions.resize(cols);\
+ /*Index number_of_transpositions = 0;*/ \
+\
+ m_nonzero_pivots = 0; \
+ m_maxpivot = RealScalar(0);\
+ m_colsPermutation.resize(cols); \
+ m_colsPermutation.indices().setZero(); \
+\
+ lapack_int lda = m_qr.outerStride(), i; \
+ lapack_int matrix_order = MKLCOLROW; \
+ LAPACKE_##MKLPREFIX##geqp3( matrix_order, rows, cols, (MKLTYPE*)m_qr.data(), lda, (lapack_int*)m_colsPermutation.indices().data(), (MKLTYPE*)m_hCoeffs.data()); \
+ m_isInitialized = true; \
+ m_maxpivot=m_qr.diagonal().cwiseAbs().maxCoeff(); \
+ m_hCoeffs.adjointInPlace(); \
+ RealScalar premultiplied_threshold = abs(m_maxpivot) * threshold(); \
+ lapack_int *perm = m_colsPermutation.indices().data(); \
+ for(i=0;i<size;i++) { \
+ m_nonzero_pivots += (abs(m_qr.coeff(i,i)) > premultiplied_threshold);\
+ } \
+ for(i=0;i<cols;i++) perm[i]--;\
+\
+ /*m_det_pq = (number_of_transpositions%2) ? -1 : 1; // TODO: It's not needed now; fix upon availability in Eigen */ \
+\
+ return *this; \
+}
+
+EIGEN_MKL_QR_COLPIV(double, double, d, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_QR_COLPIV(float, float, s, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_QR_COLPIV(dcomplex, MKL_Complex16, z, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_QR_COLPIV(scomplex, MKL_Complex8, c, ColMajor, LAPACK_COL_MAJOR)
+
+EIGEN_MKL_QR_COLPIV(double, double, d, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_QR_COLPIV(float, float, s, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_QR_COLPIV(dcomplex, MKL_Complex16, z, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_QR_COLPIV(scomplex, MKL_Complex8, c, RowMajor, LAPACK_ROW_MAJOR)
+
+} // end namespace Eigen
+
+#endif // EIGEN_COLPIVOTINGHOUSEHOLDERQR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/QR/FullPivHouseholderQR.h b/third_party/eigen3/Eigen/src/QR/FullPivHouseholderQR.h
new file mode 100644
index 0000000000..a7b0fc16f3
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/QR/FullPivHouseholderQR.h
@@ -0,0 +1,616 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FULLPIVOTINGHOUSEHOLDERQR_H
+#define EIGEN_FULLPIVOTINGHOUSEHOLDERQR_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType> struct FullPivHouseholderQRMatrixQReturnType;
+
+template<typename MatrixType>
+struct traits<FullPivHouseholderQRMatrixQReturnType<MatrixType> >
+{
+ typedef typename MatrixType::PlainObject ReturnType;
+};
+
+}
+
+/** \ingroup QR_Module
+ *
+ * \class FullPivHouseholderQR
+ *
+ * \brief Householder rank-revealing QR decomposition of a matrix with full pivoting
+ *
+ * \tparam MatrixType the type of the matrix of which we are computing the QR decomposition
+ *
+ * This class performs a rank-revealing QR decomposition of a matrix \b A into matrices \b P, \b P', \b Q and \b R
+ * such that
+ * \f[
+ * \mathbf{P} \, \mathbf{A} \, \mathbf{P}' = \mathbf{Q} \, \mathbf{R}
+ * \f]
+ * by using Householder transformations. Here, \b P and \b P' are permutation matrices, \b Q a unitary matrix
+ * and \b R an upper triangular matrix.
+ *
+ * This decomposition performs a very prudent full pivoting in order to be rank-revealing and achieve optimal
+ * numerical stability. The trade-off is that it is slower than HouseholderQR and ColPivHouseholderQR.
+ *
+ * \sa MatrixBase::fullPivHouseholderQr()
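+ *
+ * A brief illustrative sketch (names are assumptions, not part of the API):
+ *
+ * \code
+ * MatrixXd A = MatrixXd::Random(4, 3);
+ * VectorXd b = VectorXd::Random(4);
+ * FullPivHouseholderQR<MatrixXd> qr(A);
+ * VectorXd x = qr.solve(b);       // least-squares solution of A x = b
+ * MatrixXd Q = qr.matrixQ();      // 4x4 unitary factor
+ * \endcode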
+ */
+template<typename _MatrixType> class FullPivHouseholderQR
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef internal::FullPivHouseholderQRMatrixQReturnType<MatrixType> MatrixQReturnType;
+ typedef typename internal::plain_diag_type<MatrixType>::type HCoeffsType;
+ typedef Matrix<Index, 1,
+ EIGEN_SIZE_MIN_PREFER_DYNAMIC(ColsAtCompileTime,RowsAtCompileTime), RowMajor, 1,
+ EIGEN_SIZE_MIN_PREFER_FIXED(MaxColsAtCompileTime,MaxRowsAtCompileTime)> IntDiagSizeVectorType;
+ typedef PermutationMatrix<ColsAtCompileTime, MaxColsAtCompileTime> PermutationType;
+ typedef typename internal::plain_row_type<MatrixType>::type RowVectorType;
+ typedef typename internal::plain_col_type<MatrixType>::type ColVectorType;
+
+ /** \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via FullPivHouseholderQR::compute(const MatrixType&).
+ */
+ FullPivHouseholderQR()
+ : m_qr(),
+ m_hCoeffs(),
+ m_rows_transpositions(),
+ m_cols_transpositions(),
+ m_cols_permutation(),
+ m_temp(),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false) {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa FullPivHouseholderQR()
+ */
+ FullPivHouseholderQR(Index rows, Index cols)
+ : m_qr(rows, cols),
+ m_hCoeffs((std::min)(rows,cols)),
+ m_rows_transpositions((std::min)(rows,cols)),
+ m_cols_transpositions((std::min)(rows,cols)),
+ m_cols_permutation(cols),
+ m_temp(cols),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false) {}
+
+ /** \brief Constructs a QR factorization from a given matrix
+ *
+ * This constructor computes the QR factorization of the matrix \a matrix by calling
+ * the method compute(). It is a short cut for:
+ *
+ * \code
+ * FullPivHouseholderQR<MatrixType> qr(matrix.rows(), matrix.cols());
+ * qr.compute(matrix);
+ * \endcode
+ *
+ * \sa compute()
+ */
+ FullPivHouseholderQR(const MatrixType& matrix)
+ : m_qr(matrix.rows(), matrix.cols()),
+ m_hCoeffs((std::min)(matrix.rows(), matrix.cols())),
+ m_rows_transpositions((std::min)(matrix.rows(), matrix.cols())),
+ m_cols_transpositions((std::min)(matrix.rows(), matrix.cols())),
+ m_cols_permutation(matrix.cols()),
+ m_temp(matrix.cols()),
+ m_isInitialized(false),
+ m_usePrescribedThreshold(false)
+ {
+ compute(matrix);
+ }
+
+ /** This method finds a solution x to the equation Ax=b, where A is the matrix of which
+ * \c *this is the QR decomposition.
+ *
+ * \param b the right-hand-side of the equation to solve.
+ *
+ * \returns the exact or least-squares solution if the rank is greater than or equal to the number of columns of A,
+ * and an arbitrary solution otherwise.
+ *
+ * \note The case where b is a matrix is not yet implemented. Also, this
+ * code is space inefficient.
+ *
+ * \note_about_checking_solutions
+ *
+ * \note_about_arbitrary_choice_of_solution
+ *
+ * Example: \include FullPivHouseholderQR_solve.cpp
+ * Output: \verbinclude FullPivHouseholderQR_solve.out
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<FullPivHouseholderQR, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return internal::solve_retval<FullPivHouseholderQR, Rhs>(*this, b.derived());
+ }
+
+ /** \returns Expression object representing the matrix Q
+ */
+ MatrixQReturnType matrixQ(void) const;
+
+ /** \returns a reference to the matrix where the Householder QR decomposition is stored
+ */
+ const MatrixType& matrixQR() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return m_qr;
+ }
+
+ FullPivHouseholderQR& compute(const MatrixType& matrix);
+
+ /** \returns a const reference to the column permutation matrix */
+ const PermutationType& colsPermutation() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return m_cols_permutation;
+ }
+
+ /** \returns a const reference to the vector of indices representing the rows transpositions */
+ const IntDiagSizeVectorType& rowsTranspositions() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return m_rows_transpositions;
+ }
+
+ /** \returns the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ * One way to work around that is to use logAbsDeterminant() instead.
+ *
+ * \sa logAbsDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar absDeterminant() const;
+
+ /** \returns the natural log of the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \note This method is useful to work around the risk of overflow/underflow that's inherent
+ * to determinant computation.
+ *
+ * \sa absDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar logAbsDeterminant() const;
+
+ /** \returns the rank of the matrix of which *this is the QR decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index rank() const
+ {
+ using std::abs;
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ RealScalar premultiplied_threshold = abs(m_maxpivot) * threshold();
+ Index result = 0;
+ for(Index i = 0; i < m_nonzero_pivots; ++i)
+ result += (abs(m_qr.coeff(i,i)) > premultiplied_threshold);
+ return result;
+ }
+
+ /** \returns the dimension of the kernel of the matrix of which *this is the QR decomposition.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index dimensionOfKernel() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return cols() - rank();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition represents an injective
+ * linear map, i.e. has trivial kernel; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInjective() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return rank() == cols();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition represents a surjective
+ * linear map; false otherwise.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isSurjective() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return rank() == rows();
+ }
+
+ /** \returns true if the matrix of which *this is the QR decomposition is invertible.
+ *
+ * \note This method has to determine which pivots should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline bool isInvertible() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return isInjective() && isSurjective();
+ }
+
+ /** \returns the inverse of the matrix of which *this is the QR decomposition.
+ *
+ * \note If this matrix is not invertible, the returned matrix has undefined coefficients.
+ * Use isInvertible() to first determine whether this matrix is invertible.
+ */
+ inline const
+ internal::solve_retval<FullPivHouseholderQR, typename MatrixType::IdentityReturnType>
+ inverse() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return internal::solve_retval<FullPivHouseholderQR,typename MatrixType::IdentityReturnType>
+ (*this, MatrixType::Identity(m_qr.rows(), m_qr.cols()));
+ }
+
+ inline Index rows() const { return m_qr.rows(); }
+ inline Index cols() const { return m_qr.cols(); }
+
+ /** \returns a const reference to the vector of Householder coefficients used to represent the factor \c Q.
+ *
+ * For advanced uses only.
+ */
+ const HCoeffsType& hCoeffs() const { return m_hCoeffs; }
+
+ /** Allows prescribing a threshold to be used by certain methods, such as rank(),
+ * which need to determine when pivots are to be considered nonzero. This is not used for the
+ * QR decomposition itself.
+ *
+ * When it needs to get the threshold value, Eigen calls threshold(). By default, this
+ * uses a formula to automatically determine a reasonable threshold.
+ * Once you have called the present method setThreshold(const RealScalar&),
+ * your value is used instead.
+ *
+ * \param threshold The new value to use as the threshold.
+ *
+ * A pivot will be considered nonzero if its absolute value is strictly greater than
+ *  \f$ threshold \times \vert maxpivot \vert \f$,
+ *  where maxpivot is the biggest pivot.
+ *
+ * If you want to come back to the default behavior, call setThreshold(Default_t)
+ */
+ FullPivHouseholderQR& setThreshold(const RealScalar& threshold)
+ {
+ m_usePrescribedThreshold = true;
+ m_prescribedThreshold = threshold;
+ return *this;
+ }
+
+ /** Allows returning to the default behavior, letting Eigen use its default formula for
+ * determining the threshold.
+ *
+ * You should pass the special object Eigen::Default as parameter here.
+ * \code qr.setThreshold(Eigen::Default); \endcode
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ FullPivHouseholderQR& setThreshold(Default_t)
+ {
+ m_usePrescribedThreshold = false;
+ return *this;
+ }
+
+ /** Returns the threshold that will be used by certain methods such as rank().
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ RealScalar threshold() const
+ {
+ eigen_assert(m_isInitialized || m_usePrescribedThreshold);
+ return m_usePrescribedThreshold ? m_prescribedThreshold
+ // this formula comes from experimenting (see "LU precision tuning" thread on the list)
+ // and turns out to be identical to Higham's formula used already in LDLt.
+ : NumTraits<Scalar>::epsilon() * RealScalar(m_qr.diagonalSize());
+ }
+
+ /** \returns the number of nonzero pivots in the QR decomposition.
+ * Here nonzero is meant in the exact sense, not in a fuzzy sense.
+ * So that notion isn't really intrinsically interesting, but it is
+ * still useful when implementing algorithms.
+ *
+ * \sa rank()
+ */
+ inline Index nonzeroPivots() const
+ {
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return m_nonzero_pivots;
+ }
+
+ /** \returns the absolute value of the biggest pivot, i.e. the biggest
+ * diagonal coefficient of R.
+ */
+ RealScalar maxPivot() const { return m_maxpivot; }
+
+ protected:
+ MatrixType m_qr;
+ HCoeffsType m_hCoeffs;
+ IntDiagSizeVectorType m_rows_transpositions;
+ IntDiagSizeVectorType m_cols_transpositions;
+ PermutationType m_cols_permutation;
+ RowVectorType m_temp;
+ bool m_isInitialized, m_usePrescribedThreshold;
+ RealScalar m_prescribedThreshold, m_maxpivot;
+ Index m_nonzero_pivots;
+ RealScalar m_precision;
+ Index m_det_pq;
+};
+
+template<typename MatrixType>
+typename MatrixType::RealScalar FullPivHouseholderQR<MatrixType>::absDeterminant() const
+{
+ using std::abs;
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return abs(m_qr.diagonal().prod());
+}
+
+template<typename MatrixType>
+typename MatrixType::RealScalar FullPivHouseholderQR<MatrixType>::logAbsDeterminant() const
+{
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return m_qr.diagonal().cwiseAbs().array().log().sum();
+}
+
+/** Performs the QR factorization of the given matrix \a matrix. The result of
+ * the factorization is stored into \c *this, and a reference to \c *this
+ * is returned.
+ *
+ * \sa class FullPivHouseholderQR, FullPivHouseholderQR(const MatrixType&)
+ */
+template<typename MatrixType>
+FullPivHouseholderQR<MatrixType>& FullPivHouseholderQR<MatrixType>::compute(const MatrixType& matrix)
+{
+ using std::abs;
+ Index rows = matrix.rows();
+ Index cols = matrix.cols();
+ Index size = (std::min)(rows,cols);
+
+ m_qr = matrix;
+ m_hCoeffs.resize(size);
+
+ m_temp.resize(cols);
+
+ m_precision = NumTraits<Scalar>::epsilon() * RealScalar(size);
+
+ m_rows_transpositions.resize(size);
+ m_cols_transpositions.resize(size);
+ Index number_of_transpositions = 0;
+
+ RealScalar biggest(0);
+
+ m_nonzero_pivots = size; // the generic case is that in which all pivots are nonzero (invertible case)
+ m_maxpivot = RealScalar(0);
+
+ for (Index k = 0; k < size; ++k)
+ {
+ Index row_of_biggest_in_corner, col_of_biggest_in_corner;
+ RealScalar biggest_in_corner;
+
+ biggest_in_corner = m_qr.bottomRightCorner(rows-k, cols-k)
+ .cwiseAbs()
+ .maxCoeff(&row_of_biggest_in_corner, &col_of_biggest_in_corner);
+ row_of_biggest_in_corner += k;
+ col_of_biggest_in_corner += k;
+ if(k==0) biggest = biggest_in_corner;
+
+ // if the corner is negligible, then we have less than full rank, and we can finish early
+ if(internal::isMuchSmallerThan(biggest_in_corner, biggest, m_precision))
+ {
+ m_nonzero_pivots = k;
+ for(Index i = k; i < size; i++)
+ {
+ m_rows_transpositions.coeffRef(i) = i;
+ m_cols_transpositions.coeffRef(i) = i;
+ m_hCoeffs.coeffRef(i) = Scalar(0);
+ }
+ break;
+ }
+
+ m_rows_transpositions.coeffRef(k) = row_of_biggest_in_corner;
+ m_cols_transpositions.coeffRef(k) = col_of_biggest_in_corner;
+ if(k != row_of_biggest_in_corner) {
+ m_qr.row(k).tail(cols-k).swap(m_qr.row(row_of_biggest_in_corner).tail(cols-k));
+ ++number_of_transpositions;
+ }
+ if(k != col_of_biggest_in_corner) {
+ m_qr.col(k).swap(m_qr.col(col_of_biggest_in_corner));
+ ++number_of_transpositions;
+ }
+
+ RealScalar beta;
+ m_qr.col(k).tail(rows-k).makeHouseholderInPlace(m_hCoeffs.coeffRef(k), beta);
+ m_qr.coeffRef(k,k) = beta;
+
+ // remember the maximum absolute value of diagonal coefficients
+ if(abs(beta) > m_maxpivot) m_maxpivot = abs(beta);
+
+ m_qr.bottomRightCorner(rows-k, cols-k-1)
+ .applyHouseholderOnTheLeft(m_qr.col(k).tail(rows-k-1), m_hCoeffs.coeffRef(k), &m_temp.coeffRef(k+1));
+ }
+
+ m_cols_permutation.setIdentity(cols);
+ for(Index k = 0; k < size; ++k)
+ m_cols_permutation.applyTranspositionOnTheRight(k, m_cols_transpositions.coeff(k));
+
+ m_det_pq = (number_of_transpositions%2) ? -1 : 1;
+ m_isInitialized = true;
+
+ return *this;
+}
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<FullPivHouseholderQR<_MatrixType>, Rhs>
+ : solve_retval_base<FullPivHouseholderQR<_MatrixType>, Rhs>
+{
+ EIGEN_MAKE_SOLVE_HELPERS(FullPivHouseholderQR<_MatrixType>,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ const Index rows = dec().rows(), cols = dec().cols();
+ eigen_assert(rhs().rows() == rows);
+
+ // FIXME introduce nonzeroPivots() and use it here. and more generally,
+ // make the same improvements in this dec as in FullPivLU.
+ if(dec().rank()==0)
+ {
+ dst.setZero();
+ return;
+ }
+
+ typename Rhs::PlainObject c(rhs());
+
+ Matrix<Scalar,1,Rhs::ColsAtCompileTime> temp(rhs().cols());
+ for (Index k = 0; k < dec().rank(); ++k)
+ {
+ Index remainingSize = rows-k;
+ c.row(k).swap(c.row(dec().rowsTranspositions().coeff(k)));
+ c.bottomRightCorner(remainingSize, rhs().cols())
+ .applyHouseholderOnTheLeft(dec().matrixQR().col(k).tail(remainingSize-1),
+ dec().hCoeffs().coeff(k), &temp.coeffRef(0));
+ }
+
+ dec().matrixQR()
+ .topLeftCorner(dec().rank(), dec().rank())
+ .template triangularView<Upper>()
+ .solveInPlace(c.topRows(dec().rank()));
+
+ for(Index i = 0; i < dec().rank(); ++i) dst.row(dec().colsPermutation().indices().coeff(i)) = c.row(i);
+ for(Index i = dec().rank(); i < cols; ++i) dst.row(dec().colsPermutation().indices().coeff(i)).setZero();
+ }
+};
+
+/** \ingroup QR_Module
+ *
+ * \brief Expression type for return value of FullPivHouseholderQR::matrixQ()
+ *
+ * \tparam MatrixType type of underlying dense matrix
+ */
+template<typename MatrixType> struct FullPivHouseholderQRMatrixQReturnType
+ : public ReturnByValue<FullPivHouseholderQRMatrixQReturnType<MatrixType> >
+{
+public:
+ typedef typename MatrixType::Index Index;
+ typedef typename FullPivHouseholderQR<MatrixType>::IntDiagSizeVectorType IntDiagSizeVectorType;
+ typedef typename internal::plain_diag_type<MatrixType>::type HCoeffsType;
+ typedef Matrix<typename MatrixType::Scalar, 1, MatrixType::RowsAtCompileTime, RowMajor, 1,
+ MatrixType::MaxRowsAtCompileTime> WorkVectorType;
+
+ FullPivHouseholderQRMatrixQReturnType(const MatrixType& qr,
+ const HCoeffsType& hCoeffs,
+ const IntDiagSizeVectorType& rowsTranspositions)
+ : m_qr(qr),
+ m_hCoeffs(hCoeffs),
+ m_rowsTranspositions(rowsTranspositions)
+ {}
+
+ template <typename ResultType>
+ void evalTo(ResultType& result) const
+ {
+ const Index rows = m_qr.rows();
+ WorkVectorType workspace(rows);
+ evalTo(result, workspace);
+ }
+
+ template <typename ResultType>
+ void evalTo(ResultType& result, WorkVectorType& workspace) const
+ {
+ using numext::conj;
+ // compute the product H'_0 H'_1 ... H'_n-1,
+ // where H_k is the k-th Householder transformation I - h_k v_k v_k'
+ // and v_k is the k-th Householder vector [1,m_qr(k+1,k), m_qr(k+2,k), ...]
+ const Index rows = m_qr.rows();
+ const Index cols = m_qr.cols();
+ const Index size = (std::min)(rows, cols);
+ workspace.resize(rows);
+ result.setIdentity(rows, rows);
+ for (Index k = size-1; k >= 0; k--)
+ {
+ result.block(k, k, rows-k, rows-k)
+ .applyHouseholderOnTheLeft(m_qr.col(k).tail(rows-k-1), conj(m_hCoeffs.coeff(k)), &workspace.coeffRef(k));
+ result.row(k).swap(result.row(m_rowsTranspositions.coeff(k)));
+ }
+ }
+
+ Index rows() const { return m_qr.rows(); }
+ Index cols() const { return m_qr.rows(); } // intentionally rows(): the Q factor is a square rows-by-rows matrix
+
+protected:
+ typename MatrixType::Nested m_qr;
+ typename HCoeffsType::Nested m_hCoeffs;
+ typename IntDiagSizeVectorType::Nested m_rowsTranspositions;
+};
+
+} // end namespace internal
+
+template<typename MatrixType>
+inline typename FullPivHouseholderQR<MatrixType>::MatrixQReturnType FullPivHouseholderQR<MatrixType>::matrixQ() const
+{
+ eigen_assert(m_isInitialized && "FullPivHouseholderQR is not initialized.");
+ return MatrixQReturnType(m_qr, m_hCoeffs, m_rows_transpositions);
+}
+
+#ifndef __CUDACC__
+/** \return the full-pivoting Householder QR decomposition of \c *this.
+ *
+ * \sa class FullPivHouseholderQR
+ */
+template<typename Derived>
+const FullPivHouseholderQR<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::fullPivHouseholderQr() const
+{
+ return FullPivHouseholderQR<PlainObject>(eval());
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_FULLPIVOTINGHOUSEHOLDERQR_H
diff --git a/third_party/eigen3/Eigen/src/QR/HouseholderQR.h b/third_party/eigen3/Eigen/src/QR/HouseholderQR.h
new file mode 100644
index 0000000000..352dbf3f0e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/QR/HouseholderQR.h
@@ -0,0 +1,382 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+// Copyright (C) 2010 Vincent Lejeune
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_QR_H
+#define EIGEN_QR_H
+
+namespace Eigen {
+
+/** \ingroup QR_Module
+ *
+ *
+ * \class HouseholderQR
+ *
+ * \brief Householder QR decomposition of a matrix
+ *
+ * \param MatrixType the type of the matrix of which we are computing the QR decomposition
+ *
+ * This class performs a QR decomposition of a matrix \b A into matrices \b Q and \b R
+ * such that
+ * \f[
+ * \mathbf{A} = \mathbf{Q} \, \mathbf{R}
+ * \f]
+ * by using Householder transformations. Here, \b Q is a unitary matrix and \b R is an upper triangular matrix.
+ * The result is stored in a compact way compatible with LAPACK.
+ *
+ * Note that no pivoting is performed. This is \b not a rank-revealing decomposition.
+ * If you want that feature, use FullPivHouseholderQR or ColPivHouseholderQR instead.
+ *
+ * This Householder QR decomposition is faster, but less numerically stable and less feature-rich than
+ * FullPivHouseholderQR or ColPivHouseholderQR.
+ *
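+ * For instance, a minimal usage sketch (the matrix \c A below is a hypothetical,
+ * dynamically-sized double matrix; any dense matrix type works the same way):
+ *
+ * \code
+ * Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 3);
+ * Eigen::HouseholderQR<Eigen::MatrixXd> qr(A);
+ * Eigen::MatrixXd Q = qr.householderQ();                            // 5x5 unitary factor
+ * Eigen::MatrixXd R = qr.matrixQR().triangularView<Eigen::Upper>(); // upper triangular factor
+ * // up to rounding, Q * R reproduces A
+ * \endcode
+ *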
+ * \sa MatrixBase::householderQr()
+ */
+template<typename _MatrixType> class HouseholderQR
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ Options = MatrixType::Options,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar, RowsAtCompileTime, RowsAtCompileTime, (MatrixType::Flags&RowMajorBit) ? RowMajor : ColMajor, MaxRowsAtCompileTime, MaxRowsAtCompileTime> MatrixQType;
+ typedef typename internal::plain_diag_type<MatrixType>::type HCoeffsType;
+ typedef typename internal::plain_row_type<MatrixType>::type RowVectorType;
+ typedef HouseholderSequence<MatrixType,typename internal::remove_all<typename HCoeffsType::ConjugateReturnType>::type> HouseholderSequenceType;
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via HouseholderQR::compute(const MatrixType&).
+ */
+ HouseholderQR() : m_qr(), m_hCoeffs(), m_temp(), m_isInitialized(false) {}
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem \a size.
+ * \sa HouseholderQR()
+ */
+ HouseholderQR(Index rows, Index cols)
+ : m_qr(rows, cols),
+ m_hCoeffs((std::min)(rows,cols)),
+ m_temp(cols),
+ m_isInitialized(false) {}
+
+ /** \brief Constructs a QR factorization from a given matrix
+ *
+ * This constructor computes the QR factorization of the matrix \a matrix by calling
+ * the method compute(). It is a short cut for:
+ *
+ * \code
+ * HouseholderQR<MatrixType> qr(matrix.rows(), matrix.cols());
+ * qr.compute(matrix);
+ * \endcode
+ *
+ * \sa compute()
+ */
+ HouseholderQR(const MatrixType& matrix)
+ : m_qr(matrix.rows(), matrix.cols()),
+ m_hCoeffs((std::min)(matrix.rows(),matrix.cols())),
+ m_temp(matrix.cols()),
+ m_isInitialized(false)
+ {
+ compute(matrix);
+ }
+
+ /** This method finds a solution x to the equation Ax=b, where A is the matrix of which
+ * *this is the QR decomposition, if any exists.
+ *
+ * \param b the right-hand-side of the equation to solve.
+ *
+ * \returns a solution.
+ *
+ * \note The case where b is a matrix is not yet implemented. Also, this
+ * code is space inefficient.
+ *
+ * \note_about_checking_solutions
+ *
+ * \note_about_arbitrary_choice_of_solution
+ *
+ * Example: \include HouseholderQR_solve.cpp
+ * Output: \verbinclude HouseholderQR_solve.out
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<HouseholderQR, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "HouseholderQR is not initialized.");
+ return internal::solve_retval<HouseholderQR, Rhs>(*this, b.derived());
+ }
+
+ /** This method returns an expression of the unitary matrix Q as a sequence of Householder transformations.
+ *
+ * The returned expression can directly be used to perform matrix products. It can also be assigned to a dense Matrix object.
+ * Here is an example showing how to recover the full or thin matrix Q, as well as how to perform matrix products using operator*:
+ *
+ * Example: \include HouseholderQR_householderQ.cpp
+ * Output: \verbinclude HouseholderQR_householderQ.out
+ */
+ HouseholderSequenceType householderQ() const
+ {
+ eigen_assert(m_isInitialized && "HouseholderQR is not initialized.");
+ return HouseholderSequenceType(m_qr, m_hCoeffs.conjugate());
+ }
+
+ /** \returns a reference to the matrix where the Householder QR decomposition is stored
+ * in a LAPACK-compatible way.
+ */
+ const MatrixType& matrixQR() const
+ {
+ eigen_assert(m_isInitialized && "HouseholderQR is not initialized.");
+ return m_qr;
+ }
+
+ HouseholderQR& compute(const MatrixType& matrix);
+
+ /** \returns the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ * One way to work around that is to use logAbsDeterminant() instead.
+ *
+ * \sa logAbsDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar absDeterminant() const;
+
+ /** \returns the natural log of the absolute value of the determinant of the matrix of which
+ * *this is the QR decomposition. It has only linear complexity
+ * (that is, O(n) where n is the dimension of the square matrix)
+ * as the QR decomposition has already been computed.
+ *
+ * \note This is only for square matrices.
+ *
+ * \note This method is useful to work around the risk of overflow/underflow that's inherent
+ * to determinant computation.
+ *
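+ * For instance, a sketch comparing determinant magnitudes through their logarithms
+ * (here \c qr1 and \c qr2 are assumed to be HouseholderQR objects computed from square matrices):
+ * \code
+ * bool firstHasLargerAbsDet = qr1.logAbsDeterminant() > qr2.logAbsDeterminant();
+ * \endcode
+ *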
+ * \sa absDeterminant(), MatrixBase::determinant()
+ */
+ typename MatrixType::RealScalar logAbsDeterminant() const;
+
+ inline Index rows() const { return m_qr.rows(); }
+ inline Index cols() const { return m_qr.cols(); }
+
+ /** \returns a const reference to the vector of Householder coefficients used to represent the factor \c Q.
+ *
+ * For advanced uses only.
+ */
+ const HCoeffsType& hCoeffs() const { return m_hCoeffs; }
+
+ protected:
+ MatrixType m_qr;
+ HCoeffsType m_hCoeffs;
+ RowVectorType m_temp;
+ bool m_isInitialized;
+};
+
+template<typename MatrixType>
+typename MatrixType::RealScalar HouseholderQR<MatrixType>::absDeterminant() const
+{
+ using std::abs;
+ eigen_assert(m_isInitialized && "HouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return abs(m_qr.diagonal().prod());
+}
+
+template<typename MatrixType>
+typename MatrixType::RealScalar HouseholderQR<MatrixType>::logAbsDeterminant() const
+{
+ eigen_assert(m_isInitialized && "HouseholderQR is not initialized.");
+ eigen_assert(m_qr.rows() == m_qr.cols() && "You can't take the determinant of a non-square matrix!");
+ return m_qr.diagonal().cwiseAbs().array().log().sum();
+}
+
+namespace internal {
+
+/** \internal */
+template<typename MatrixQR, typename HCoeffs>
+void householder_qr_inplace_unblocked(MatrixQR& mat, HCoeffs& hCoeffs, typename MatrixQR::Scalar* tempData = 0)
+{
+ typedef typename MatrixQR::Index Index;
+ typedef typename MatrixQR::Scalar Scalar;
+ typedef typename MatrixQR::RealScalar RealScalar;
+ Index rows = mat.rows();
+ Index cols = mat.cols();
+ Index size = (std::min)(rows,cols);
+
+ eigen_assert(hCoeffs.size() == size);
+
+ typedef Matrix<Scalar,MatrixQR::ColsAtCompileTime,1> TempType;
+ TempType tempVector;
+ if(tempData==0)
+ {
+ tempVector.resize(cols);
+ tempData = tempVector.data();
+ }
+
+ for(Index k = 0; k < size; ++k)
+ {
+ Index remainingRows = rows - k;
+ Index remainingCols = cols - k - 1;
+
+ RealScalar beta;
+ mat.col(k).tail(remainingRows).makeHouseholderInPlace(hCoeffs.coeffRef(k), beta);
+ mat.coeffRef(k,k) = beta;
+
+ // apply H to remaining part of m_qr from the left
+ mat.bottomRightCorner(remainingRows, remainingCols)
+ .applyHouseholderOnTheLeft(mat.col(k).tail(remainingRows-1), hCoeffs.coeffRef(k), tempData+k+1);
+ }
+}
+
+/** \internal */
+template<typename MatrixQR, typename HCoeffs,
+ typename MatrixQRScalar = typename MatrixQR::Scalar,
+ bool InnerStrideIsOne = (MatrixQR::InnerStrideAtCompileTime == 1 && HCoeffs::InnerStrideAtCompileTime == 1)>
+struct householder_qr_inplace_blocked
+{
+ // This is specialized for MKL-supported Scalar types in HouseholderQR_MKL.h
+ static void run(MatrixQR& mat, HCoeffs& hCoeffs,
+ typename MatrixQR::Index maxBlockSize=32,
+ typename MatrixQR::Scalar* tempData = 0)
+ {
+ typedef typename MatrixQR::Index Index;
+ typedef typename MatrixQR::Scalar Scalar;
+ typedef Block<MatrixQR,Dynamic,Dynamic> BlockType;
+
+ Index rows = mat.rows();
+ Index cols = mat.cols();
+ Index size = (std::min)(rows, cols);
+
+ typedef Matrix<Scalar,Dynamic,1,ColMajor,MatrixQR::MaxColsAtCompileTime,1> TempType;
+ TempType tempVector;
+ if(tempData==0)
+ {
+ tempVector.resize(cols);
+ tempData = tempVector.data();
+ }
+
+ Index blockSize = (std::min)(maxBlockSize,size);
+
+ Index k = 0;
+ for (k = 0; k < size; k += blockSize)
+ {
+ Index bs = (std::min)(size-k,blockSize); // actual size of the block
+ Index tcols = cols - k - bs; // trailing columns
+ Index brows = rows-k; // rows of the block
+
+ // partition the matrix:
+ // A00 | A01 | A02
+ // mat = A10 | A11 | A12
+ // A20 | A21 | A22
+ // and performs the QR decomposition of the panel [A11^T A21^T]^T,
+ // then updates the trailing block [A12^T A22^T]^T using level-3 operations.
+ // Finally, the algorithm continues on A22.
+
+ BlockType A11_21 = mat.block(k,k,brows,bs);
+ Block<HCoeffs,Dynamic,1> hCoeffsSegment = hCoeffs.segment(k,bs);
+
+ householder_qr_inplace_unblocked(A11_21, hCoeffsSegment, tempData);
+
+ if(tcols)
+ {
+ BlockType A21_22 = mat.block(k,k+bs,brows,tcols);
+ apply_block_householder_on_the_left(A21_22,A11_21,hCoeffsSegment.adjoint());
+ }
+ }
+ }
+};
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<HouseholderQR<_MatrixType>, Rhs>
+ : solve_retval_base<HouseholderQR<_MatrixType>, Rhs>
+{
+ EIGEN_MAKE_SOLVE_HELPERS(HouseholderQR<_MatrixType>,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ const Index rows = dec().rows(), cols = dec().cols();
+ const Index rank = (std::min)(rows, cols);
+ eigen_assert(rhs().rows() == rows);
+
+ typename Rhs::PlainObject c(rhs());
+
+ // Note that the matrix Q = H_0^* H_1^*... so its inverse is Q^* = (H_0 H_1 ...)^T
+ c.applyOnTheLeft(householderSequence(
+ dec().matrixQR().leftCols(rank),
+ dec().hCoeffs().head(rank)).transpose()
+ );
+
+ dec().matrixQR()
+ .topLeftCorner(rank, rank)
+ .template triangularView<Upper>()
+ .solveInPlace(c.topRows(rank));
+
+ dst.topRows(rank) = c.topRows(rank);
+ dst.bottomRows(cols-rank).setZero();
+ }
+};
+
+} // end namespace internal
+
+/** Performs the QR factorization of the given matrix \a matrix. The result of
+ * the factorization is stored into \c *this, and a reference to \c *this
+ * is returned.
+ *
+ * \sa class HouseholderQR, HouseholderQR(const MatrixType&)
+ */
+template<typename MatrixType>
+HouseholderQR<MatrixType>& HouseholderQR<MatrixType>::compute(const MatrixType& matrix)
+{
+ Index rows = matrix.rows();
+ Index cols = matrix.cols();
+ Index size = (std::min)(rows,cols);
+
+ m_qr = matrix;
+ m_hCoeffs.resize(size);
+
+ m_temp.resize(cols);
+
+ internal::householder_qr_inplace_blocked<MatrixType, HCoeffsType>::run(m_qr, m_hCoeffs, 48, m_temp.data());
+
+ m_isInitialized = true;
+ return *this;
+}
+
+#ifndef __CUDACC__
+/** \return the Householder QR decomposition of \c *this.
+ *
+ * \sa class HouseholderQR
+ */
+template<typename Derived>
+const HouseholderQR<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::householderQr() const
+{
+ return HouseholderQR<PlainObject>(eval());
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_QR_H
diff --git a/third_party/eigen3/Eigen/src/QR/HouseholderQR_MKL.h b/third_party/eigen3/Eigen/src/QR/HouseholderQR_MKL.h
new file mode 100644
index 0000000000..8a3a7e4063
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/QR/HouseholderQR_MKL.h
@@ -0,0 +1,71 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Householder QR decomposition of a matrix w/o pivoting based on
+ * LAPACKE_?geqrf function.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_QR_MKL_H
+#define EIGEN_QR_MKL_H
+
+#include "../Core/util/MKL_support.h"
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_QR_NOPIV(EIGTYPE, MKLTYPE, MKLPREFIX) \
+template<typename MatrixQR, typename HCoeffs> \
+struct householder_qr_inplace_blocked<MatrixQR, HCoeffs, EIGTYPE, true> \
+{ \
+ static void run(MatrixQR& mat, HCoeffs& hCoeffs, \
+ typename MatrixQR::Index = 32, \
+ typename MatrixQR::Scalar* = 0) \
+ { \
+ lapack_int m = (lapack_int) mat.rows(); \
+ lapack_int n = (lapack_int) mat.cols(); \
+ lapack_int lda = (lapack_int) mat.outerStride(); \
+ lapack_int matrix_order = (MatrixQR::IsRowMajor) ? LAPACK_ROW_MAJOR : LAPACK_COL_MAJOR; \
+ LAPACKE_##MKLPREFIX##geqrf( matrix_order, m, n, (MKLTYPE*)mat.data(), lda, (MKLTYPE*)hCoeffs.data()); \
+ hCoeffs.adjointInPlace(); \
+ } \
+};
+
+EIGEN_MKL_QR_NOPIV(double, double, d)
+EIGEN_MKL_QR_NOPIV(float, float, s)
+EIGEN_MKL_QR_NOPIV(dcomplex, MKL_Complex16, z)
+EIGEN_MKL_QR_NOPIV(scomplex, MKL_Complex8, c)
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_QR_MKL_H
diff --git a/third_party/eigen3/Eigen/src/SPQRSupport/SuiteSparseQRSupport.h b/third_party/eigen3/Eigen/src/SPQRSupport/SuiteSparseQRSupport.h
new file mode 100644
index 0000000000..a2cc2a9e26
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SPQRSupport/SuiteSparseQRSupport.h
@@ -0,0 +1,314 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Desire Nuentsa <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SUITESPARSEQRSUPPORT_H
+#define EIGEN_SUITESPARSEQRSUPPORT_H
+
+namespace Eigen {
+
+ template<typename MatrixType> class SPQR;
+ template<typename SPQRType> struct SPQRMatrixQReturnType;
+ template<typename SPQRType> struct SPQRMatrixQTransposeReturnType;
+ template <typename SPQRType, typename Derived> struct SPQR_QProduct;
+ namespace internal {
+ template <typename SPQRType> struct traits<SPQRMatrixQReturnType<SPQRType> >
+ {
+ typedef typename SPQRType::MatrixType ReturnType;
+ };
+ template <typename SPQRType> struct traits<SPQRMatrixQTransposeReturnType<SPQRType> >
+ {
+ typedef typename SPQRType::MatrixType ReturnType;
+ };
+ template <typename SPQRType, typename Derived> struct traits<SPQR_QProduct<SPQRType, Derived> >
+ {
+ typedef typename Derived::PlainObject ReturnType;
+ };
+ } // End namespace internal
+
+/**
+ * \ingroup SPQRSupport_Module
+ * \class SPQR
+ * \brief Sparse QR factorization based on SuiteSparseQR library
+ *
+ * This class is used to perform a multithreaded and multifrontal rank-revealing QR decomposition
+ * of sparse matrices. The result can then be used to solve linear least-squares systems.
+ * A QR factorization is computed such that A*P = Q*R, where:
+ *
+ * P is the column permutation. Use colsPermutation() to get it.
+ *
+ * Q is the orthogonal matrix represented as Householder reflectors.
+ * Use matrixQ() to get an expression and matrixQ().transpose() to get the transpose.
+ * You can then apply it to a vector.
+ *
+ * R is the sparse triangular factor. Use matrixR() to get it as a SparseMatrix.
+ * NOTE: the Index type of R is always UF_long. You can get it with SPQR::Index.
+ *
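+ * A minimal least-squares sketch (assuming SuiteSparseQR is available, \c <Eigen/SPQRSupport> is included,
+ * and the entries of \c A and \c b have been filled in elsewhere):
+ *
+ * \code
+ * Eigen::SparseMatrix<double> A(100, 20);          // column-major by default
+ * Eigen::VectorXd b = Eigen::VectorXd::Ones(100);
+ * Eigen::SPQR<Eigen::SparseMatrix<double> > spqr(A);
+ * Eigen::VectorXd x = spqr.solve(b);               // least-squares solution of A x = b
+ * \endcode
+ *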
+ * \tparam _MatrixType The type of the sparse matrix A, must be a column-major SparseMatrix<>
+ *
+ */
+template<typename _MatrixType>
+class SPQR
+{
+ public:
+ typedef typename _MatrixType::Scalar Scalar;
+ typedef typename _MatrixType::RealScalar RealScalar;
+ typedef UF_long Index ;
+ typedef SparseMatrix<Scalar, ColMajor, Index> MatrixType;
+ typedef PermutationMatrix<Dynamic, Dynamic> PermutationType;
+ public:
+ SPQR()
+ : m_isInitialized(false),
+ m_ordering(SPQR_ORDERING_DEFAULT),
+ m_allow_tol(SPQR_DEFAULT_TOL),
+ m_tolerance (NumTraits<Scalar>::epsilon())
+ {
+ cholmod_l_start(&m_cc);
+ }
+
+ SPQR(const _MatrixType& matrix)
+ : m_isInitialized(false),
+ m_ordering(SPQR_ORDERING_DEFAULT),
+ m_allow_tol(SPQR_DEFAULT_TOL),
+ m_tolerance (NumTraits<Scalar>::epsilon())
+ {
+ cholmod_l_start(&m_cc);
+ compute(matrix);
+ }
+
+ ~SPQR()
+ {
+ SPQR_free();
+ cholmod_l_finish(&m_cc);
+ }
+ void SPQR_free()
+ {
+ cholmod_l_free_sparse(&m_H, &m_cc);
+ cholmod_l_free_sparse(&m_cR, &m_cc);
+ cholmod_l_free_dense(&m_HTau, &m_cc);
+ std::free(m_E);
+ std::free(m_HPinv);
+ }
+
+ void compute(const _MatrixType& matrix)
+ {
+ if(m_isInitialized) SPQR_free();
+
+ MatrixType mat(matrix);
+ cholmod_sparse A;
+ A = viewAsCholmod(mat);
+ Index col = matrix.cols();
+ m_rank = SuiteSparseQR<Scalar>(m_ordering, m_tolerance, col, &A,
+ &m_cR, &m_E, &m_H, &m_HPinv, &m_HTau, &m_cc);
+
+ if (!m_cR)
+ {
+ m_info = NumericalIssue;
+ m_isInitialized = false;
+ return;
+ }
+ m_info = Success;
+ m_isInitialized = true;
+ m_isRUpToDate = false;
+ }
+ /**
+ * Get the number of rows of the input matrix and the Q matrix
+ */
+ inline Index rows() const {return m_H->nrow; }
+
+ /**
+ * Get the number of columns of the input matrix.
+ */
+ inline Index cols() const { return m_cR->ncol; }
+
+ /** \returns the solution X of \f$ A X = B \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<SPQR, Rhs> solve(const MatrixBase<Rhs>& B) const
+ {
+ eigen_assert(m_isInitialized && " The QR factorization should be computed first, call compute()");
+ eigen_assert(this->rows()==B.rows()
+ && "SPQR::solve(): invalid number of rows of the right hand side matrix B");
+ return internal::solve_retval<SPQR, Rhs>(*this, B.derived());
+ }
+
+ template<typename Rhs, typename Dest>
+ void _solve(const MatrixBase<Rhs> &b, MatrixBase<Dest> &dest) const
+ {
+ eigen_assert(m_isInitialized && " The QR factorization should be computed first, call compute()");
+ eigen_assert(b.cols()==1 && "This method is for vectors only");
+
+ //Compute Q^T * b
+ typename Dest::PlainObject y;
+ y = matrixQ().transpose() * b;
+ // Solves with the triangular matrix R
+ Index rk = this->rank();
+ y.topRows(rk) = this->matrixR().topLeftCorner(rk, rk).template triangularView<Upper>().solve(y.topRows(rk));
+ y.bottomRows(cols()-rk).setZero();
+ // Apply the column permutation
+ dest.topRows(cols()) = colsPermutation() * y.topRows(cols());
+
+ m_info = Success;
+ }
+
+ /** \returns the sparse triangular factor R. It is a sparse matrix
+ */
+ const MatrixType matrixR() const
+ {
+ eigen_assert(m_isInitialized && " The QR factorization should be computed first, call compute()");
+ if(!m_isRUpToDate) {
+ m_R = viewAsEigen<Scalar,ColMajor, typename MatrixType::Index>(*m_cR);
+ m_isRUpToDate = true;
+ }
+ return m_R;
+ }
+ /// Get an expression of the matrix Q
+ SPQRMatrixQReturnType<SPQR> matrixQ() const
+ {
+ return SPQRMatrixQReturnType<SPQR>(*this);
+ }
+ /// Get the permutation that was applied to columns of A
+ PermutationType colsPermutation() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ Index n = m_cR->ncol;
+ PermutationType colsPerm(n);
+ for(Index j = 0; j <n; j++) colsPerm.indices()(j) = m_E[j];
+ return colsPerm;
+
+ }
+ /**
+ * Gets the rank of the matrix.
+ * It should be equal to the number of columns (cols()) if the matrix is full-rank.
+ */
+ Index rank() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_cc.SPQR_istat[4];
+ }
+ /// Set the fill-reducing ordering method to be used
+ void setSPQROrdering(int ord) { m_ordering = ord;}
+ /// Set the tolerance \a tol: columns whose 2-norm is <= tol are treated as zero
+ void setPivotThreshold(const RealScalar& tol) { m_tolerance = tol; }
+
+ /** \returns a pointer to the SPQR workspace */
+ cholmod_common *cholmodCommon() const { return &m_cc; }
+
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if the computation was successful,
+ * \c NumericalIssue if the sparse QR cannot be computed
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+ protected:
+ bool m_isInitialized;
+ bool m_analysisIsOk;
+ bool m_factorizationIsOk;
+ mutable bool m_isRUpToDate;
+ mutable ComputationInfo m_info;
+ int m_ordering; // Ordering method to use, see SPQR's manual
+ int m_allow_tol; // Allow to use some tolerance during numerical factorization.
+ RealScalar m_tolerance; // treat columns with 2-norm below this tolerance as zero
+ mutable cholmod_sparse *m_cR; // The sparse R factor in cholmod format
+ mutable MatrixType m_R; // The sparse matrix R in Eigen format
+ mutable Index *m_E; // The permutation applied to columns
+ mutable cholmod_sparse *m_H; //The householder vectors
+ mutable Index *m_HPinv; // The row permutation of H
+ mutable cholmod_dense *m_HTau; // The Householder coefficients
+ mutable Index m_rank; // The rank of the matrix
+ mutable cholmod_common m_cc; // Workspace and parameters
+ template<typename ,typename > friend struct SPQR_QProduct;
+};
+
+template <typename SPQRType, typename Derived>
+struct SPQR_QProduct : ReturnByValue<SPQR_QProduct<SPQRType,Derived> >
+{
+ typedef typename SPQRType::Scalar Scalar;
+ typedef typename SPQRType::Index Index;
+ //Define the constructor to get reference to argument types
+ SPQR_QProduct(const SPQRType& spqr, const Derived& other, bool transpose) : m_spqr(spqr),m_other(other),m_transpose(transpose) {}
+
+ inline Index rows() const { return m_transpose ? m_spqr.rows() : m_spqr.cols(); }
+ inline Index cols() const { return m_other.cols(); }
+ // Assign to a vector
+ template<typename ResType>
+ void evalTo(ResType& res) const
+ {
+ cholmod_dense y_cd;
+ cholmod_dense *x_cd;
+ int method = m_transpose ? SPQR_QTX : SPQR_QX;
+ cholmod_common *cc = m_spqr.cholmodCommon();
+ y_cd = viewAsCholmod(m_other.const_cast_derived());
+ x_cd = SuiteSparseQR_qmult<Scalar>(method, m_spqr.m_H, m_spqr.m_HTau, m_spqr.m_HPinv, &y_cd, cc);
+ res = Matrix<Scalar,ResType::RowsAtCompileTime,ResType::ColsAtCompileTime>::Map(reinterpret_cast<Scalar*>(x_cd->x), x_cd->nrow, x_cd->ncol);
+ cholmod_l_free_dense(&x_cd, cc);
+ }
+ const SPQRType& m_spqr;
+ const Derived& m_other;
+ bool m_transpose;
+
+};
+template<typename SPQRType>
+struct SPQRMatrixQReturnType{
+
+ SPQRMatrixQReturnType(const SPQRType& spqr) : m_spqr(spqr) {}
+ template<typename Derived>
+ SPQR_QProduct<SPQRType, Derived> operator*(const MatrixBase<Derived>& other)
+ {
+ return SPQR_QProduct<SPQRType,Derived>(m_spqr,other.derived(),false);
+ }
+ SPQRMatrixQTransposeReturnType<SPQRType> adjoint() const
+ {
+ return SPQRMatrixQTransposeReturnType<SPQRType>(m_spqr);
+ }
+ // To use for operations with the transpose of Q
+ SPQRMatrixQTransposeReturnType<SPQRType> transpose() const
+ {
+ return SPQRMatrixQTransposeReturnType<SPQRType>(m_spqr);
+ }
+ const SPQRType& m_spqr;
+};
+
+template<typename SPQRType>
+struct SPQRMatrixQTransposeReturnType{
+ SPQRMatrixQTransposeReturnType(const SPQRType& spqr) : m_spqr(spqr) {}
+ template<typename Derived>
+ SPQR_QProduct<SPQRType,Derived> operator*(const MatrixBase<Derived>& other)
+ {
+ return SPQR_QProduct<SPQRType,Derived>(m_spqr,other.derived(), true);
+ }
+ const SPQRType& m_spqr;
+};
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<SPQR<_MatrixType>, Rhs>
+ : solve_retval_base<SPQR<_MatrixType>, Rhs>
+{
+ typedef SPQR<_MatrixType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+} // end namespace internal
+
+}// End namespace Eigen
+#endif
diff --git a/third_party/eigen3/Eigen/src/SVD/JacobiSVD.h b/third_party/eigen3/Eigen/src/SVD/JacobiSVD.h
new file mode 100644
index 0000000000..d17d3a667d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SVD/JacobiSVD.h
@@ -0,0 +1,960 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_JACOBISVD_H
+#define EIGEN_JACOBISVD_H
+
+namespace Eigen {
+
+namespace internal {
+// forward declaration (needed by ICC)
+// the empty body is required by MSVC
+template<typename MatrixType, int QRPreconditioner,
+ bool IsComplex = NumTraits<typename MatrixType::Scalar>::IsComplex>
+struct svd_precondition_2x2_block_to_be_real {};
+
+/*** QR preconditioners (R-SVD)
+ ***
+ *** Their role is to reduce the problem of computing the SVD to the case of a square matrix.
+ *** This approach, known as R-SVD, is an optimization for rectangular-enough matrices, and is a requirement for
+ *** JacobiSVD which by itself is only able to work on square matrices.
+ ***/
+
+enum { PreconditionIfMoreColsThanRows, PreconditionIfMoreRowsThanCols };
+
+template<typename MatrixType, int QRPreconditioner, int Case>
+struct qr_preconditioner_should_do_anything
+{
+ enum { a = MatrixType::RowsAtCompileTime != Dynamic &&
+ MatrixType::ColsAtCompileTime != Dynamic &&
+ MatrixType::ColsAtCompileTime <= MatrixType::RowsAtCompileTime,
+ b = MatrixType::RowsAtCompileTime != Dynamic &&
+ MatrixType::ColsAtCompileTime != Dynamic &&
+ MatrixType::RowsAtCompileTime <= MatrixType::ColsAtCompileTime,
+ ret = !( (QRPreconditioner == NoQRPreconditioner) ||
+ (Case == PreconditionIfMoreColsThanRows && bool(a)) ||
+ (Case == PreconditionIfMoreRowsThanCols && bool(b)) )
+ };
+};
+
+template<typename MatrixType, int QRPreconditioner, int Case,
+ bool DoAnything = qr_preconditioner_should_do_anything<MatrixType, QRPreconditioner, Case>::ret
+> struct qr_preconditioner_impl {};
+
+template<typename MatrixType, int QRPreconditioner, int Case>
+class qr_preconditioner_impl<MatrixType, QRPreconditioner, Case, false>
+{
+public:
+ typedef typename MatrixType::Index Index;
+ void allocate(const JacobiSVD<MatrixType, QRPreconditioner>&) {}
+ bool run(JacobiSVD<MatrixType, QRPreconditioner>&, const MatrixType&)
+ {
+ return false;
+ }
+};
+
+/*** preconditioner using FullPivHouseholderQR ***/
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, FullPivHouseholderQRPreconditioner, PreconditionIfMoreRowsThanCols, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ enum
+ {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime
+ };
+ typedef Matrix<Scalar, 1, RowsAtCompileTime, RowMajor, 1, MaxRowsAtCompileTime> WorkspaceType;
+
+ void allocate(const JacobiSVD<MatrixType, FullPivHouseholderQRPreconditioner>& svd)
+ {
+ if (svd.rows() != m_qr.rows() || svd.cols() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.rows(), svd.cols());
+ }
+ if (svd.m_computeFullU) m_workspace.resize(svd.rows());
+ }
+
+ bool run(JacobiSVD<MatrixType, FullPivHouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.rows() > matrix.cols())
+ {
+ m_qr.compute(matrix);
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.cols(),matrix.cols()).template triangularView<Upper>();
+ if(svd.m_computeFullU) m_qr.matrixQ().evalTo(svd.m_matrixU, m_workspace);
+ if(svd.computeV()) svd.m_matrixV = m_qr.colsPermutation();
+ return true;
+ }
+ return false;
+ }
+private:
+ typedef FullPivHouseholderQR<MatrixType> QRType;
+ QRType m_qr;
+ WorkspaceType m_workspace;
+};
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, FullPivHouseholderQRPreconditioner, PreconditionIfMoreColsThanRows, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ enum
+ {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ Options = MatrixType::Options
+ };
+ typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime>
+ TransposeTypeWithSameStorageOrder;
+
+ void allocate(const JacobiSVD<MatrixType, FullPivHouseholderQRPreconditioner>& svd)
+ {
+ if (svd.cols() != m_qr.rows() || svd.rows() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.cols(), svd.rows());
+ }
+ m_adjoint.resize(svd.cols(), svd.rows());
+ if (svd.m_computeFullV) m_workspace.resize(svd.cols());
+ }
+
+ bool run(JacobiSVD<MatrixType, FullPivHouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.cols() > matrix.rows())
+ {
+ m_adjoint = matrix.adjoint();
+ m_qr.compute(m_adjoint);
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.rows(),matrix.rows()).template triangularView<Upper>().adjoint();
+ if(svd.m_computeFullV) m_qr.matrixQ().evalTo(svd.m_matrixV, m_workspace);
+ if(svd.computeU()) svd.m_matrixU = m_qr.colsPermutation();
+ return true;
+ }
+ else return false;
+ }
+private:
+ typedef FullPivHouseholderQR<TransposeTypeWithSameStorageOrder> QRType;
+ QRType m_qr;
+ TransposeTypeWithSameStorageOrder m_adjoint;
+ typename internal::plain_row_type<MatrixType>::type m_workspace;
+};
+
+/*** preconditioner using ColPivHouseholderQR ***/
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, ColPivHouseholderQRPreconditioner, PreconditionIfMoreRowsThanCols, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+
+ void allocate(const JacobiSVD<MatrixType, ColPivHouseholderQRPreconditioner>& svd)
+ {
+ if (svd.rows() != m_qr.rows() || svd.cols() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.rows(), svd.cols());
+ }
+ if (svd.m_computeFullU) m_workspace.resize(svd.rows());
+ else if (svd.m_computeThinU) m_workspace.resize(svd.cols());
+ }
+
+ bool run(JacobiSVD<MatrixType, ColPivHouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.rows() > matrix.cols())
+ {
+ m_qr.compute(matrix);
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.cols(),matrix.cols()).template triangularView<Upper>();
+ if(svd.m_computeFullU) m_qr.householderQ().evalTo(svd.m_matrixU, m_workspace);
+ else if(svd.m_computeThinU)
+ {
+ svd.m_matrixU.setIdentity(matrix.rows(), matrix.cols());
+ m_qr.householderQ().applyThisOnTheLeft(svd.m_matrixU, m_workspace);
+ }
+ if(svd.computeV()) svd.m_matrixV = m_qr.colsPermutation();
+ return true;
+ }
+ return false;
+ }
+
+private:
+ typedef ColPivHouseholderQR<MatrixType> QRType;
+ QRType m_qr;
+ typename internal::plain_col_type<MatrixType>::type m_workspace;
+};
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, ColPivHouseholderQRPreconditioner, PreconditionIfMoreColsThanRows, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ enum
+ {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ Options = MatrixType::Options
+ };
+
+ typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime>
+ TransposeTypeWithSameStorageOrder;
+
+ void allocate(const JacobiSVD<MatrixType, ColPivHouseholderQRPreconditioner>& svd)
+ {
+ if (svd.cols() != m_qr.rows() || svd.rows() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.cols(), svd.rows());
+ }
+ if (svd.m_computeFullV) m_workspace.resize(svd.cols());
+ else if (svd.m_computeThinV) m_workspace.resize(svd.rows());
+ m_adjoint.resize(svd.cols(), svd.rows());
+ }
+
+ bool run(JacobiSVD<MatrixType, ColPivHouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.cols() > matrix.rows())
+ {
+ m_adjoint = matrix.adjoint();
+ m_qr.compute(m_adjoint);
+
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.rows(),matrix.rows()).template triangularView<Upper>().adjoint();
+ if(svd.m_computeFullV) m_qr.householderQ().evalTo(svd.m_matrixV, m_workspace);
+ else if(svd.m_computeThinV)
+ {
+ svd.m_matrixV.setIdentity(matrix.cols(), matrix.rows());
+ m_qr.householderQ().applyThisOnTheLeft(svd.m_matrixV, m_workspace);
+ }
+ if(svd.computeU()) svd.m_matrixU = m_qr.colsPermutation();
+ return true;
+ }
+ else return false;
+ }
+
+private:
+ typedef ColPivHouseholderQR<TransposeTypeWithSameStorageOrder> QRType;
+ QRType m_qr;
+ TransposeTypeWithSameStorageOrder m_adjoint;
+ typename internal::plain_row_type<MatrixType>::type m_workspace;
+};
+
+/*** preconditioner using HouseholderQR ***/
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, HouseholderQRPreconditioner, PreconditionIfMoreRowsThanCols, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+
+ void allocate(const JacobiSVD<MatrixType, HouseholderQRPreconditioner>& svd)
+ {
+ if (svd.rows() != m_qr.rows() || svd.cols() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.rows(), svd.cols());
+ }
+ if (svd.m_computeFullU) m_workspace.resize(svd.rows());
+ else if (svd.m_computeThinU) m_workspace.resize(svd.cols());
+ }
+
+ bool run(JacobiSVD<MatrixType, HouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.rows() > matrix.cols())
+ {
+ m_qr.compute(matrix);
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.cols(),matrix.cols()).template triangularView<Upper>();
+ if(svd.m_computeFullU) m_qr.householderQ().evalTo(svd.m_matrixU, m_workspace);
+ else if(svd.m_computeThinU)
+ {
+ svd.m_matrixU.setIdentity(matrix.rows(), matrix.cols());
+ m_qr.householderQ().applyThisOnTheLeft(svd.m_matrixU, m_workspace);
+ }
+ if(svd.computeV()) svd.m_matrixV.setIdentity(matrix.cols(), matrix.cols());
+ return true;
+ }
+ return false;
+ }
+private:
+ typedef HouseholderQR<MatrixType> QRType;
+ QRType m_qr;
+ typename internal::plain_col_type<MatrixType>::type m_workspace;
+};
+
+template<typename MatrixType>
+class qr_preconditioner_impl<MatrixType, HouseholderQRPreconditioner, PreconditionIfMoreColsThanRows, true>
+{
+public:
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ enum
+ {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ Options = MatrixType::Options
+ };
+
+ typedef Matrix<Scalar, ColsAtCompileTime, RowsAtCompileTime, Options, MaxColsAtCompileTime, MaxRowsAtCompileTime>
+ TransposeTypeWithSameStorageOrder;
+
+ void allocate(const JacobiSVD<MatrixType, HouseholderQRPreconditioner>& svd)
+ {
+ if (svd.cols() != m_qr.rows() || svd.rows() != m_qr.cols())
+ {
+ m_qr.~QRType();
+ ::new (&m_qr) QRType(svd.cols(), svd.rows());
+ }
+ if (svd.m_computeFullV) m_workspace.resize(svd.cols());
+ else if (svd.m_computeThinV) m_workspace.resize(svd.rows());
+ m_adjoint.resize(svd.cols(), svd.rows());
+ }
+
+ bool run(JacobiSVD<MatrixType, HouseholderQRPreconditioner>& svd, const MatrixType& matrix)
+ {
+ if(matrix.cols() > matrix.rows())
+ {
+ m_adjoint = matrix.adjoint();
+ m_qr.compute(m_adjoint);
+
+ svd.m_workMatrix = m_qr.matrixQR().block(0,0,matrix.rows(),matrix.rows()).template triangularView<Upper>().adjoint();
+ if(svd.m_computeFullV) m_qr.householderQ().evalTo(svd.m_matrixV, m_workspace);
+ else if(svd.m_computeThinV)
+ {
+ svd.m_matrixV.setIdentity(matrix.cols(), matrix.rows());
+ m_qr.householderQ().applyThisOnTheLeft(svd.m_matrixV, m_workspace);
+ }
+ if(svd.computeU()) svd.m_matrixU.setIdentity(matrix.rows(), matrix.rows());
+ return true;
+ }
+ else return false;
+ }
+
+private:
+ typedef HouseholderQR<TransposeTypeWithSameStorageOrder> QRType;
+ QRType m_qr;
+ TransposeTypeWithSameStorageOrder m_adjoint;
+ typename internal::plain_row_type<MatrixType>::type m_workspace;
+};
+
+/*** 2x2 SVD implementation
+ ***
+ *** JacobiSVD consists of solving a series of 2x2 SVD subproblems
+ ***/
+
+template<typename MatrixType, int QRPreconditioner>
+struct svd_precondition_2x2_block_to_be_real<MatrixType, QRPreconditioner, false>
+{
+ typedef JacobiSVD<MatrixType, QRPreconditioner> SVD;
+ typedef typename SVD::Index Index;
+ static void run(typename SVD::WorkMatrixType&, SVD&, Index, Index) {}
+};
+
+template<typename MatrixType, int QRPreconditioner>
+struct svd_precondition_2x2_block_to_be_real<MatrixType, QRPreconditioner, true>
+{
+ typedef JacobiSVD<MatrixType, QRPreconditioner> SVD;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename SVD::Index Index;
+ static void run(typename SVD::WorkMatrixType& work_matrix, SVD& svd, Index p, Index q)
+ {
+ using std::sqrt;
+ Scalar z;
+ JacobiRotation<Scalar> rot;
+ RealScalar n = sqrt(numext::abs2(work_matrix.coeff(p,p)) + numext::abs2(work_matrix.coeff(q,p)));
+ if(n==0)
+ {
+ z = abs(work_matrix.coeff(p,q)) / work_matrix.coeff(p,q);
+ work_matrix.row(p) *= z;
+ if(svd.computeU()) svd.m_matrixU.col(p) *= conj(z);
+ if(work_matrix.coeff(q,q)!=Scalar(0))
+ z = abs(work_matrix.coeff(q,q)) / work_matrix.coeff(q,q);
+ else
+ z = Scalar(0);
+ work_matrix.row(q) *= z;
+ if(svd.computeU()) svd.m_matrixU.col(q) *= conj(z);
+ }
+ else
+ {
+ rot.c() = conj(work_matrix.coeff(p,p)) / n;
+ rot.s() = work_matrix.coeff(q,p) / n;
+ work_matrix.applyOnTheLeft(p,q,rot);
+ if(svd.computeU()) svd.m_matrixU.applyOnTheRight(p,q,rot.adjoint());
+ if(work_matrix.coeff(p,q) != Scalar(0))
+ {
+ Scalar z = abs(work_matrix.coeff(p,q)) / work_matrix.coeff(p,q);
+ work_matrix.col(q) *= z;
+ if(svd.computeV()) svd.m_matrixV.col(q) *= z;
+ }
+ if(work_matrix.coeff(q,q) != Scalar(0))
+ {
+ z = abs(work_matrix.coeff(q,q)) / work_matrix.coeff(q,q);
+ work_matrix.row(q) *= z;
+ if(svd.computeU()) svd.m_matrixU.col(q) *= conj(z);
+ }
+ }
+ }
+};
+
+template<typename MatrixType, typename RealScalar, typename Index>
+void real_2x2_jacobi_svd(const MatrixType& matrix, Index p, Index q,
+ JacobiRotation<RealScalar> *j_left,
+ JacobiRotation<RealScalar> *j_right)
+{
+ using std::sqrt;
+ using std::abs;
+ Matrix<RealScalar,2,2> m;
+ m << numext::real(matrix.coeff(p,p)), numext::real(matrix.coeff(p,q)),
+ numext::real(matrix.coeff(q,p)), numext::real(matrix.coeff(q,q));
+ JacobiRotation<RealScalar> rot1;
+ RealScalar t = m.coeff(0,0) + m.coeff(1,1);
+ RealScalar d = m.coeff(1,0) - m.coeff(0,1);
+ if(t == RealScalar(0))
+ {
+ rot1.c() = RealScalar(0);
+ rot1.s() = d > RealScalar(0) ? RealScalar(1) : RealScalar(-1);
+ }
+ else
+ {
+ RealScalar t2d2 = numext::hypot(t,d);
+ rot1.c() = abs(t)/t2d2;
+ rot1.s() = d/t2d2;
+ if(t<RealScalar(0))
+ rot1.s() = -rot1.s();
+ }
+ m.applyOnTheLeft(0,1,rot1);
+ j_right->makeJacobi(m,0,1);
+ *j_left = rot1 * j_right->transpose();
+}
+
+} // end namespace internal
+
+/** \ingroup SVD_Module
+ *
+ *
+ * \class JacobiSVD
+ *
+ * \brief Two-sided Jacobi SVD decomposition of a rectangular matrix
+ *
+ * \param MatrixType the type of the matrix of which we are computing the SVD decomposition
+ * \param QRPreconditioner this optional parameter allows you to specify the type of QR decomposition that will be used internally
+ * for the R-SVD step for non-square matrices. See the discussion of possible values below.
+ *
+ * The SVD decomposition consists of decomposing any n-by-p matrix \a A as a product
+ * \f[ A = U S V^* \f]
+ * where \a U is a n-by-n unitary, \a V is a p-by-p unitary, and \a S is a n-by-p real positive matrix which is zero outside of its main diagonal;
+ * the diagonal entries of S are known as the \em singular \em values of \a A and the columns of \a U and \a V are known as the left
+ * and right \em singular \em vectors of \a A respectively.
+ *
+ * Singular values are always sorted in decreasing order.
+ *
+ * This JacobiSVD decomposition computes only the singular values by default. If you want \a U or \a V, you need to ask for them explicitly.
+ *
+ * You can ask for only \em thin \a U or \a V to be computed, meaning the following. In case of a rectangular n-by-p matrix, letting \a m be the
+ * smaller value among \a n and \a p, there are only \a m singular vectors; the remaining columns of \a U and \a V do not correspond to actual
+ * singular vectors. Asking for \em thin \a U or \a V means asking for only their \a m first columns to be formed. So \a U is then a n-by-m matrix,
+ * and \a V is then a p-by-m matrix. Notice that thin \a U and \a V are all you need for (least squares) solving.
+ *
+ * Here's an example demonstrating basic usage:
+ * \include JacobiSVD_basic.cpp
+ * Output: \verbinclude JacobiSVD_basic.out
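+ *
+ * A thin-SVD sketch in the same spirit (the matrix \c A below is a hypothetical
+ * dynamically-sized double matrix):
+ *
+ * \code
+ * Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 4);
+ * Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
+ * Eigen::MatrixXd thinU = svd.matrixU();          // 6x4
+ * Eigen::MatrixXd thinV = svd.matrixV();          // 4x4
+ * Eigen::VectorXd sv    = svd.singularValues();   // 4 values, in decreasing order
+ * \endcode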
+ *
+ * This JacobiSVD class is a two-sided Jacobi R-SVD decomposition, ensuring optimal reliability and accuracy. The downside is that it's slower than
+ * bidiagonalizing SVD algorithms for large square matrices; however its complexity is still \f$ O(n^2p) \f$ where \a n is the smaller dimension and
+ * \a p is the greater dimension, meaning that it is still of the same order of complexity as the faster bidiagonalizing R-SVD algorithms.
+ * In particular, like any R-SVD, it takes advantage of non-squareness in that its complexity is only linear in the greater dimension.
+ *
+ * If the input matrix has inf or nan coefficients, the result of the computation is undefined, but the computation is guaranteed to
+ * terminate in finite (and reasonable) time.
+ *
+ * The possible values for QRPreconditioner are:
+ * \li ColPivHouseholderQRPreconditioner is the default. In practice it's very safe. It uses column-pivoting QR.
+ * \li FullPivHouseholderQRPreconditioner is the safest and slowest. It uses full-pivoting QR.
+ * Contrary to the other QRs, it doesn't allow computing thin unitaries.
+ * \li HouseholderQRPreconditioner is the fastest, but less safe and accurate than the pivoting variants. It uses non-pivoting QR.
+ * This is very similar in safety and accuracy to the bidiagonalization process used by bidiagonalizing SVD algorithms (since bidiagonalization
+ * is inherently non-pivoting). However, the resulting SVD is still more reliable than bidiagonalizing SVDs because the Jacobi-based iterative
+ * process is more reliable than the optimized bidiagonal SVD iterations.
+ * \li NoQRPreconditioner lets you skip the QR preconditioner entirely. This is useful if you know that you will only be computing
+ * JacobiSVD decompositions of square matrices. Non-square matrices require a QR preconditioner. Using this option will result in
+ * faster compilation and smaller executable code. It won't significantly speed up computation, since JacobiSVD is always checking
+ * if QR preconditioning is needed before applying it anyway.
+ *
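+ * Should a non-default preconditioner be wanted, it is selected through the second template
+ * parameter; a sketch (with a hypothetical float matrix \c M):
+ * \code
+ * Eigen::MatrixXf M = Eigen::MatrixXf::Random(7, 3);
+ * Eigen::JacobiSVD<Eigen::MatrixXf, Eigen::FullPivHouseholderQRPreconditioner>
+ *     svd(M, Eigen::ComputeFullU | Eigen::ComputeFullV);   // thin unitaries are not allowed with this preconditioner
+ * \endcode
+ *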
+ * \sa MatrixBase::jacobiSvd()
+ */
+template<typename _MatrixType, int QRPreconditioner> class JacobiSVD
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename NumTraits<typename MatrixType::Scalar>::Real RealScalar;
+ typedef typename MatrixType::Index Index;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ DiagSizeAtCompileTime = EIGEN_SIZE_MIN_PREFER_DYNAMIC(RowsAtCompileTime,ColsAtCompileTime),
+ MaxRowsAtCompileTime = MatrixType::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = MatrixType::MaxColsAtCompileTime,
+ MaxDiagSizeAtCompileTime = EIGEN_SIZE_MIN_PREFER_FIXED(MaxRowsAtCompileTime,MaxColsAtCompileTime),
+ MatrixOptions = MatrixType::Options
+ };
+
+ typedef Matrix<Scalar, RowsAtCompileTime, RowsAtCompileTime,
+ MatrixOptions, MaxRowsAtCompileTime, MaxRowsAtCompileTime>
+ MatrixUType;
+ typedef Matrix<Scalar, ColsAtCompileTime, ColsAtCompileTime,
+ MatrixOptions, MaxColsAtCompileTime, MaxColsAtCompileTime>
+ MatrixVType;
+ typedef typename internal::plain_diag_type<MatrixType, RealScalar>::type SingularValuesType;
+ typedef typename internal::plain_row_type<MatrixType>::type RowType;
+ typedef typename internal::plain_col_type<MatrixType>::type ColType;
+ typedef Matrix<Scalar, DiagSizeAtCompileTime, DiagSizeAtCompileTime,
+ MatrixOptions, MaxDiagSizeAtCompileTime, MaxDiagSizeAtCompileTime>
+ WorkMatrixType;
+
+ /** \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via JacobiSVD::compute(const MatrixType&).
+ */
+ JacobiSVD()
+ : m_isInitialized(false),
+ m_isAllocated(false),
+ m_usePrescribedThreshold(false),
+ m_computationOptions(0),
+ m_rows(-1), m_cols(-1), m_diagSize(0)
+ {}
+
+
+ /** \brief Default Constructor with memory preallocation
+ *
+ * Like the default constructor but with preallocation of the internal data
+ * according to the specified problem size.
+ * \sa JacobiSVD()
+ */
+ JacobiSVD(Index rows, Index cols, unsigned int computationOptions = 0)
+ : m_isInitialized(false),
+ m_isAllocated(false),
+ m_usePrescribedThreshold(false),
+ m_computationOptions(0),
+ m_rows(-1), m_cols(-1)
+ {
+ allocate(rows, cols, computationOptions);
+ }
+
+ /** \brief Constructor performing the decomposition of given matrix.
+ *
+ * \param matrix the matrix to decompose
+ * \param computationOptions optional parameter allowing you to specify whether you want full or thin U or V unitaries to be computed.
+ * By default, none is computed. This is a bit-field, the possible bits are #ComputeFullU, #ComputeThinU,
+ * #ComputeFullV, #ComputeThinV.
+ *
+ * Thin unitaries are only available if your matrix type has a Dynamic number of columns (for example MatrixXf). They also are not
+ * available with the (non-default) FullPivHouseholderQR preconditioner.
+ */
+ JacobiSVD(const MatrixType& matrix, unsigned int computationOptions = 0)
+ : m_isInitialized(false),
+ m_isAllocated(false),
+ m_usePrescribedThreshold(false),
+ m_computationOptions(0),
+ m_rows(-1), m_cols(-1)
+ {
+ compute(matrix, computationOptions);
+ }
+
+ /** \brief Method performing the decomposition of given matrix using custom options.
+ *
+ * \param matrix the matrix to decompose
+ * \param computationOptions optional parameter allowing you to specify whether you want full or thin U or V unitaries to be computed.
+ * By default, none is computed. This is a bit-field, the possible bits are #ComputeFullU, #ComputeThinU,
+ * #ComputeFullV, #ComputeThinV.
+ *
+ * Thin unitaries are only available if your matrix type has a Dynamic number of columns (for example MatrixXf). They also are not
+ * available with the (non-default) FullPivHouseholderQR preconditioner.
+ */
+ JacobiSVD& compute(const MatrixType& matrix, unsigned int computationOptions);
+
+ /** \brief Method performing the decomposition of given matrix using current options.
+ *
+ * \param matrix the matrix to decompose
+ *
+ * This method uses the current \a computationOptions, as already passed to the constructor or to compute(const MatrixType&, unsigned int).
+ */
+ JacobiSVD& compute(const MatrixType& matrix)
+ {
+ return compute(matrix, m_computationOptions);
+ }
+
+ /** \returns the \a U matrix.
+ *
+ * For the SVD decomposition of a n-by-p matrix, letting \a m be the minimum of \a n and \a p,
+ * the U matrix is n-by-n if you asked for #ComputeFullU, and is n-by-m if you asked for #ComputeThinU.
+ *
+ * The \a m first columns of \a U are the left singular vectors of the matrix being decomposed.
+ *
+ * This method asserts that you asked for \a U to be computed.
+ */
+ const MatrixUType& matrixU() const
+ {
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ eigen_assert(computeU() && "This JacobiSVD decomposition didn't compute U. Did you ask for it?");
+ return m_matrixU;
+ }
+
+ /** \returns the \a V matrix.
+ *
+ * For the SVD decomposition of a n-by-p matrix, letting \a m be the minimum of \a n and \a p,
+ * the V matrix is p-by-p if you asked for #ComputeFullV, and is p-by-m if you asked for ComputeThinV.
+ *
+ * The \a m first columns of \a V are the right singular vectors of the matrix being decomposed.
+ *
+ * This method asserts that you asked for \a V to be computed.
+ */
+ const MatrixVType& matrixV() const
+ {
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ eigen_assert(computeV() && "This JacobiSVD decomposition didn't compute V. Did you ask for it?");
+ return m_matrixV;
+ }
+
+ /** \returns the vector of singular values.
+ *
+ * For the SVD decomposition of a n-by-p matrix, letting \a m be the minimum of \a n and \a p, the
+ * returned vector has size \a m. Singular values are always sorted in decreasing order.
+ */
+ const SingularValuesType& singularValues() const
+ {
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ return m_singularValues;
+ }
+
+ /** \returns true if \a U (full or thin) is asked for in this SVD decomposition */
+ inline bool computeU() const { return m_computeFullU || m_computeThinU; }
+ /** \returns true if \a V (full or thin) is asked for in this SVD decomposition */
+ inline bool computeV() const { return m_computeFullV || m_computeThinV; }
+
+ /** \returns a (least squares) solution of \f$ A x = b \f$ using the current SVD decomposition of A.
+ *
+ * \param b the right-hand-side of the equation to solve.
+ *
+ * \note Solving requires both U and V to be computed. Thin U and V are enough, there is no need for full U or V.
+ *
+ * \note SVD solving is implicitly least-squares. Thus, this method serves both purposes of exact solving and least-squares solving.
+ * In other words, the returned solution is guaranteed to minimize the Euclidean norm \f$ \Vert A x - b \Vert \f$.
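+ *
+ * A short sketch (here \c A and \c b are a hypothetical MatrixXd and VectorXd with matching row counts):
+ * \code
+ * Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
+ * Eigen::VectorXd x = svd.solve(b);   // minimizes ||A x - b||
+ * \endcode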
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<JacobiSVD, Rhs>
+ solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ eigen_assert(computeU() && computeV() && "JacobiSVD::solve() requires both unitaries U and V to be computed (thin unitaries suffice).");
+ return internal::solve_retval<JacobiSVD, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the number of singular values that are not exactly 0 */
+ Index nonzeroSingularValues() const
+ {
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ return m_nonzeroSingularValues;
+ }
+
+ /** \returns the rank of the matrix of which \c *this is the SVD.
+ *
+ * \note This method has to determine which singular values should be considered nonzero.
+ * For that, it uses the threshold value that you can control by calling
+ * setThreshold(const RealScalar&).
+ */
+ inline Index rank() const
+ {
+ using std::abs;
+ eigen_assert(m_isInitialized && "JacobiSVD is not initialized.");
+ if(m_singularValues.size()==0) return 0;
+ RealScalar premultiplied_threshold = m_singularValues.coeff(0) * threshold();
+ Index i = m_nonzeroSingularValues-1;
+ while(i>=0 && m_singularValues.coeff(i) < premultiplied_threshold) --i;
+ return i+1;
+ }
+
+ /** Allows one to prescribe a threshold to be used by certain methods, such as rank() and solve(),
+ * which need to determine when singular values are to be considered nonzero.
+ * This is not used for the SVD decomposition itself.
+ *
+ * When it needs to get the threshold value, Eigen calls threshold().
+ * The default is \c NumTraits<Scalar>::epsilon() scaled by the size of the diagonal; see threshold().
+ *
+ * \param threshold The new value to use as the threshold.
+ *
+ * A singular value will be considered nonzero if it is strictly greater than
+ * \f$ threshold \times \vert max singular value \vert \f$.
+ *
+ * If you want to come back to the default behavior, call setThreshold(Default_t)
+ */
+ JacobiSVD& setThreshold(const RealScalar& threshold)
+ {
+ m_usePrescribedThreshold = true;
+ m_prescribedThreshold = threshold;
+ return *this;
+ }
+
+ /** Allows one to come back to the default behavior, letting Eigen use its default formula for
+ * determining the threshold.
+ *
+ * You should pass the special object Eigen::Default as parameter here.
+ * \code svd.setThreshold(Eigen::Default); \endcode
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ JacobiSVD& setThreshold(Default_t)
+ {
+ m_usePrescribedThreshold = false;
+ return *this;
+ }
+
+ /** Returns the threshold that will be used by certain methods such as rank().
+ *
+ * See the documentation of setThreshold(const RealScalar&).
+ */
+ RealScalar threshold() const
+ {
+ eigen_assert(m_isInitialized || m_usePrescribedThreshold);
+ return m_usePrescribedThreshold ? m_prescribedThreshold
+ : (std::max<Index>)(1,m_diagSize)*NumTraits<Scalar>::epsilon();
+ }
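+
+ // Illustrative sketch (not part of Eigen): prescribing a threshold for rank detection.
+ // \code
+ // Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
+ // svd.setThreshold(1e-8); // singular values below 1e-8 * (max singular value) count as zero
+ // Eigen::MatrixXd::Index r = svd.rank();
+ // svd.setThreshold(Eigen::Default); // restore the default epsilon-based formula
+ // \endcode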
+
+ inline Index rows() const { return m_rows; }
+ inline Index cols() const { return m_cols; }
+
+ private:
+ void allocate(Index rows, Index cols, unsigned int computationOptions);
+
+ protected:
+ MatrixUType m_matrixU;
+ MatrixVType m_matrixV;
+ SingularValuesType m_singularValues;
+ WorkMatrixType m_workMatrix;
+ bool m_isInitialized, m_isAllocated, m_usePrescribedThreshold;
+ bool m_computeFullU, m_computeThinU;
+ bool m_computeFullV, m_computeThinV;
+ unsigned int m_computationOptions;
+ Index m_nonzeroSingularValues, m_rows, m_cols, m_diagSize;
+ RealScalar m_prescribedThreshold;
+
+ template<typename __MatrixType, int _QRPreconditioner, bool _IsComplex>
+ friend struct internal::svd_precondition_2x2_block_to_be_real;
+ template<typename __MatrixType, int _QRPreconditioner, int _Case, bool _DoAnything>
+ friend struct internal::qr_preconditioner_impl;
+
+ internal::qr_preconditioner_impl<MatrixType, QRPreconditioner, internal::PreconditionIfMoreColsThanRows> m_qr_precond_morecols;
+ internal::qr_preconditioner_impl<MatrixType, QRPreconditioner, internal::PreconditionIfMoreRowsThanCols> m_qr_precond_morerows;
+};
+
+template<typename MatrixType, int QRPreconditioner>
+void JacobiSVD<MatrixType, QRPreconditioner>::allocate(Index rows, Index cols, unsigned int computationOptions)
+{
+ eigen_assert(rows >= 0 && cols >= 0);
+
+ if (m_isAllocated &&
+ rows == m_rows &&
+ cols == m_cols &&
+ computationOptions == m_computationOptions)
+ {
+ return;
+ }
+
+ m_rows = rows;
+ m_cols = cols;
+ m_isInitialized = false;
+ m_isAllocated = true;
+ m_computationOptions = computationOptions;
+ m_computeFullU = (computationOptions & ComputeFullU) != 0;
+ m_computeThinU = (computationOptions & ComputeThinU) != 0;
+ m_computeFullV = (computationOptions & ComputeFullV) != 0;
+ m_computeThinV = (computationOptions & ComputeThinV) != 0;
+ eigen_assert(!(m_computeFullU && m_computeThinU) && "JacobiSVD: you can't ask for both full and thin U");
+ eigen_assert(!(m_computeFullV && m_computeThinV) && "JacobiSVD: you can't ask for both full and thin V");
+ eigen_assert(EIGEN_IMPLIES(m_computeThinU || m_computeThinV, MatrixType::ColsAtCompileTime==Dynamic) &&
+ "JacobiSVD: thin U and V are only available when your matrix has a dynamic number of columns.");
+ if (QRPreconditioner == FullPivHouseholderQRPreconditioner)
+ {
+ eigen_assert(!(m_computeThinU || m_computeThinV) &&
+ "JacobiSVD: can't compute thin U or thin V with the FullPivHouseholderQR preconditioner. "
+ "Use the ColPivHouseholderQR preconditioner instead.");
+ }
+ m_diagSize = (std::min)(m_rows, m_cols);
+ m_singularValues.resize(m_diagSize);
+ if(RowsAtCompileTime==Dynamic)
+ m_matrixU.resize(m_rows, m_computeFullU ? m_rows
+ : m_computeThinU ? m_diagSize
+ : 0);
+ if(ColsAtCompileTime==Dynamic)
+ m_matrixV.resize(m_cols, m_computeFullV ? m_cols
+ : m_computeThinV ? m_diagSize
+ : 0);
+ m_workMatrix.resize(m_diagSize, m_diagSize);
+
+ if(m_cols>m_rows) m_qr_precond_morecols.allocate(*this);
+ if(m_rows>m_cols) m_qr_precond_morerows.allocate(*this);
+}
+
+template<typename MatrixType, int QRPreconditioner>
+JacobiSVD<MatrixType, QRPreconditioner>&
+JacobiSVD<MatrixType, QRPreconditioner>::compute(const MatrixType& matrix, unsigned int computationOptions)
+{
+ using std::abs;
+ allocate(matrix.rows(), matrix.cols(), computationOptions);
+
+ // currently we stop when we reach precision 2*epsilon as the last bit of precision can require an unreasonable number of iterations,
+ // only worsening the precision of U and V as we accumulate more rotations
+ const RealScalar precision = RealScalar(2) * NumTraits<Scalar>::epsilon();
+
+ // limit for very small denormal numbers to be considered zero in order to avoid infinite loops (see bug 286)
+ const RealScalar considerAsZero = RealScalar(2) * std::numeric_limits<RealScalar>::denorm_min();
+
+ /*** step 1. The R-SVD step: we use a QR decomposition to reduce to the case of a square matrix */
+
+ if(!m_qr_precond_morecols.run(*this, matrix) && !m_qr_precond_morerows.run(*this, matrix))
+ {
+ m_workMatrix = matrix.block(0,0,m_diagSize,m_diagSize);
+ if(m_computeFullU) m_matrixU.setIdentity(m_rows,m_rows);
+ if(m_computeThinU) m_matrixU.setIdentity(m_rows,m_diagSize);
+ if(m_computeFullV) m_matrixV.setIdentity(m_cols,m_cols);
+ if(m_computeThinV) m_matrixV.setIdentity(m_cols, m_diagSize);
+ }
+
+ // Scaling factor to reduce over/under-flows
+ RealScalar scale = m_workMatrix.cwiseAbs().maxCoeff();
+ if(scale==RealScalar(0)) scale = RealScalar(1);
+ m_workMatrix /= scale;
+
+ /*** step 2. The main Jacobi SVD iteration. ***/
+
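+ // Illustrative note (not part of Eigen): each inner iteration below extracts the 2x2 block
+ // [ a_pp a_pq ; a_qp a_qq ], computes a pair of Jacobi rotations that make it diagonal, and
+ // accumulates those rotations into U and V, so that the product U * workMatrix * V^* is
+ // intended to keep representing the input matrix (up to the scale factor applied above).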
+ bool finished = false;
+ while(!finished)
+ {
+ finished = true;
+
+ // do a sweep: for all index pairs (p,q), perform SVD of the corresponding 2x2 sub-matrix
+
+ for(Index p = 1; p < m_diagSize; ++p)
+ {
+ for(Index q = 0; q < p; ++q)
+ {
+ // if this 2x2 sub-matrix is not diagonal already...
+ // notice that this comparison will evaluate to false if any NaN is involved, ensuring that NaN's don't
+ // keep us iterating forever. Similarly, small denormal numbers are considered zero.
+ RealScalar threshold = numext::maxi(considerAsZero, precision * numext::maxi(abs(m_workMatrix.coeff(p,p)),
+ abs(m_workMatrix.coeff(q,q))));
+ if(numext::maxi(abs(m_workMatrix.coeff(p,q)),abs(m_workMatrix.coeff(q,p))) > threshold)
+ {
+ finished = false;
+
+ // perform SVD decomposition of 2x2 sub-matrix corresponding to indices p,q to make it diagonal
+ internal::svd_precondition_2x2_block_to_be_real<MatrixType, QRPreconditioner>::run(m_workMatrix, *this, p, q);
+ JacobiRotation<RealScalar> j_left, j_right;
+ internal::real_2x2_jacobi_svd(m_workMatrix, p, q, &j_left, &j_right);
+
+ // accumulate resulting Jacobi rotations
+ m_workMatrix.applyOnTheLeft(p,q,j_left);
+ if(computeU()) m_matrixU.applyOnTheRight(p,q,j_left.transpose());
+
+ m_workMatrix.applyOnTheRight(p,q,j_right);
+ if(computeV()) m_matrixV.applyOnTheRight(p,q,j_right);
+ }
+ }
+ }
+ }
+
+ /*** step 3. The work matrix is now diagonal, so make its diagonal entries non-negative so that they are the singular values ***/
+
+ for(Index i = 0; i < m_diagSize; ++i)
+ {
+ RealScalar a = abs(m_workMatrix.coeff(i,i));
+ m_singularValues.coeffRef(i) = a;
+ if(computeU() && (a!=RealScalar(0))) m_matrixU.col(i) *= m_workMatrix.coeff(i,i)/a;
+ }
+
+ m_singularValues *= scale;
+
+ /*** step 4. Sort singular values in descending order and compute the number of nonzero singular values ***/
+
+ m_nonzeroSingularValues = m_diagSize;
+ for(Index i = 0; i < m_diagSize; i++)
+ {
+ Index pos;
+ RealScalar maxRemainingSingularValue = m_singularValues.tail(m_diagSize-i).maxCoeff(&pos);
+ if(maxRemainingSingularValue == RealScalar(0))
+ {
+ m_nonzeroSingularValues = i;
+ break;
+ }
+ if(pos)
+ {
+ pos += i;
+ std::swap(m_singularValues.coeffRef(i), m_singularValues.coeffRef(pos));
+ if(computeU()) m_matrixU.col(pos).swap(m_matrixU.col(i));
+ if(computeV()) m_matrixV.col(pos).swap(m_matrixV.col(i));
+ }
+ }
+
+ m_isInitialized = true;
+ return *this;
+}
+
+namespace internal {
+template<typename _MatrixType, int QRPreconditioner, typename Rhs>
+struct solve_retval<JacobiSVD<_MatrixType, QRPreconditioner>, Rhs>
+ : solve_retval_base<JacobiSVD<_MatrixType, QRPreconditioner>, Rhs>
+{
+ typedef JacobiSVD<_MatrixType, QRPreconditioner> JacobiSVDType;
+ EIGEN_MAKE_SOLVE_HELPERS(JacobiSVDType,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ eigen_assert(rhs().rows() == dec().rows());
+
+ // A = U S V^*
+ // So A^{-1} = V S^{-1} U^*
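+ // (restricted to the leading rank() singular values, i.e. a truncated pseudo-inverse
+ // applied to the right-hand side)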
+
+ Matrix<Scalar, Dynamic, Rhs::ColsAtCompileTime, 0, _MatrixType::MaxRowsAtCompileTime, Rhs::MaxColsAtCompileTime> tmp;
+ Index rank = dec().rank();
+
+ tmp.noalias() = dec().matrixU().leftCols(rank).adjoint() * rhs();
+ tmp = dec().singularValues().head(rank).asDiagonal().inverse() * tmp;
+ dst = dec().matrixV().leftCols(rank) * tmp;
+ }
+};
+} // end namespace internal
+
+#ifndef __CUDACC__
+/** \svd_module
+ *
+ * \return the singular value decomposition of \c *this computed by two-sided
+ * Jacobi transformations.
+ *
+ * \sa class JacobiSVD
+ */
+template<typename Derived>
+JacobiSVD<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::jacobiSvd(unsigned int computationOptions) const
+{
+ return JacobiSVD<PlainObject>(*this, computationOptions);
+}
+#endif // __CUDACC__
+
+} // end namespace Eigen
+
+#endif // EIGEN_JACOBISVD_H
diff --git a/third_party/eigen3/Eigen/src/SVD/JacobiSVD_MKL.h b/third_party/eigen3/Eigen/src/SVD/JacobiSVD_MKL.h
new file mode 100644
index 0000000000..decda75405
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SVD/JacobiSVD_MKL.h
@@ -0,0 +1,92 @@
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its contributors may
+ be used to endorse or promote products derived from this software without
+ specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ ********************************************************************************
+ * Content : Eigen bindings to Intel(R) MKL
+ * Singular Value Decomposition - SVD.
+ ********************************************************************************
+*/
+
+#ifndef EIGEN_JACOBISVD_MKL_H
+#define EIGEN_JACOBISVD_MKL_H
+
+#include "Eigen/src/Core/util/MKL_support.h"
+
+namespace Eigen {
+
+/** \internal Specialization for the data types supported by MKL */
+
+#define EIGEN_MKL_SVD(EIGTYPE, MKLTYPE, MKLRTYPE, MKLPREFIX, EIGCOLROW, MKLCOLROW) \
+template<> inline \
+JacobiSVD<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic>, ColPivHouseholderQRPreconditioner>& \
+JacobiSVD<Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic>, ColPivHouseholderQRPreconditioner>::compute(const Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic>& matrix, unsigned int computationOptions) \
+{ \
+ typedef Matrix<EIGTYPE, Dynamic, Dynamic, EIGCOLROW, Dynamic, Dynamic> MatrixType; \
+ typedef MatrixType::Scalar Scalar; \
+ typedef MatrixType::RealScalar RealScalar; \
+ allocate(matrix.rows(), matrix.cols(), computationOptions); \
+\
+ /*const RealScalar precision = RealScalar(2) * NumTraits<Scalar>::epsilon();*/ \
+ m_nonzeroSingularValues = m_diagSize; \
+\
+ lapack_int lda = matrix.outerStride(), ldu, ldvt; \
+ lapack_int matrix_order = MKLCOLROW; \
+ char jobu, jobvt; \
+ MKLTYPE *u, *vt, dummy; \
+ jobu = (m_computeFullU) ? 'A' : (m_computeThinU) ? 'S' : 'N'; \
+ jobvt = (m_computeFullV) ? 'A' : (m_computeThinV) ? 'S' : 'N'; \
+ if (computeU()) { \
+ ldu = m_matrixU.outerStride(); \
+ u = (MKLTYPE*)m_matrixU.data(); \
+ } else { ldu=1; u=&dummy; }\
+ MatrixType localV; \
+ ldvt = (m_computeFullV) ? m_cols : (m_computeThinV) ? m_diagSize : 1; \
+ if (computeV()) { \
+ localV.resize(ldvt, m_cols); \
+ vt = (MKLTYPE*)localV.data(); \
+ } else { ldvt=1; vt=&dummy; }\
+ Matrix<MKLRTYPE, Dynamic, Dynamic> superb; superb.resize(m_diagSize, 1); \
+ MatrixType m_temp; m_temp = matrix; \
+ LAPACKE_##MKLPREFIX##gesvd( matrix_order, jobu, jobvt, m_rows, m_cols, (MKLTYPE*)m_temp.data(), lda, (MKLRTYPE*)m_singularValues.data(), u, ldu, vt, ldvt, superb.data()); \
+ if (computeV()) m_matrixV = localV.adjoint(); \
+ /* for(int i=0;i<m_diagSize;i++) if (m_singularValues.coeffRef(i) < precision) { m_nonzeroSingularValues--; m_singularValues.coeffRef(i)=RealScalar(0);}*/ \
+ m_isInitialized = true; \
+ return *this; \
+}
+
+EIGEN_MKL_SVD(double, double, double, d, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SVD(float, float, float , s, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SVD(dcomplex, MKL_Complex16, double, z, ColMajor, LAPACK_COL_MAJOR)
+EIGEN_MKL_SVD(scomplex, MKL_Complex8, float , c, ColMajor, LAPACK_COL_MAJOR)
+
+EIGEN_MKL_SVD(double, double, double, d, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_SVD(float, float, float , s, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_SVD(dcomplex, MKL_Complex16, double, z, RowMajor, LAPACK_ROW_MAJOR)
+EIGEN_MKL_SVD(scomplex, MKL_Complex8, float , c, RowMajor, LAPACK_ROW_MAJOR)
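+
+// Illustrative note (not part of Eigen's documentation): with Eigen's MKL support enabled, these
+// specializations mean that e.g.
+// Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
+// dispatches compute() to LAPACKE_dgesvd instead of the two-sided Jacobi iteration, while the
+// public accessors (matrixU(), singularValues(), matrixV(), solve()) behave as before.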
+
+} // end namespace Eigen
+
+#endif // EIGEN_JACOBISVD_MKL_H
diff --git a/third_party/eigen3/Eigen/src/SVD/UpperBidiagonalization.h b/third_party/eigen3/Eigen/src/SVD/UpperBidiagonalization.h
new file mode 100644
index 0000000000..40067682c9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SVD/UpperBidiagonalization.h
@@ -0,0 +1,396 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_BIDIAGONALIZATION_H
+#define EIGEN_BIDIAGONALIZATION_H
+
+namespace Eigen {
+
+namespace internal {
+// UpperBidiagonalization will probably be replaced by a Bidiagonalization class; we don't want to make it a stable API.
+// At the same time, it's useful to keep for now as it's about the only thing that exercises the BandMatrix class.
+
+template<typename _MatrixType> class UpperBidiagonalization
+{
+ public:
+
+ typedef _MatrixType MatrixType;
+ enum {
+ RowsAtCompileTime = MatrixType::RowsAtCompileTime,
+ ColsAtCompileTime = MatrixType::ColsAtCompileTime,
+ ColsAtCompileTimeMinusOne = internal::decrement_size<ColsAtCompileTime>::ret
+ };
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar, 1, ColsAtCompileTime> RowVectorType;
+ typedef Matrix<Scalar, RowsAtCompileTime, 1> ColVectorType;
+ typedef BandMatrix<RealScalar, ColsAtCompileTime, ColsAtCompileTime, 1, 0, RowMajor> BidiagonalType;
+ typedef Matrix<Scalar, ColsAtCompileTime, 1> DiagVectorType;
+ typedef Matrix<Scalar, ColsAtCompileTimeMinusOne, 1> SuperDiagVectorType;
+ typedef HouseholderSequence<
+ const MatrixType,
+ CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, const Diagonal<const MatrixType,0> >
+ > HouseholderUSequenceType;
+ typedef HouseholderSequence<
+ const typename internal::remove_all<typename MatrixType::ConjugateReturnType>::type,
+ Diagonal<const MatrixType,1>,
+ OnTheRight
+ > HouseholderVSequenceType;
+
+ /**
+ * \brief Default Constructor.
+ *
+ * The default constructor is useful in cases in which the user intends to
+ * perform decompositions via Bidiagonalization::compute(const MatrixType&).
+ */
+ UpperBidiagonalization() : m_householder(), m_bidiagonal(), m_isInitialized(false) {}
+
+ UpperBidiagonalization(const MatrixType& matrix)
+ : m_householder(matrix.rows(), matrix.cols()),
+ m_bidiagonal(matrix.cols(), matrix.cols()),
+ m_isInitialized(false)
+ {
+ compute(matrix);
+ }
+
+ UpperBidiagonalization& compute(const MatrixType& matrix);
+ UpperBidiagonalization& computeUnblocked(const MatrixType& matrix);
+
+ const MatrixType& householder() const { return m_householder; }
+ const BidiagonalType& bidiagonal() const { return m_bidiagonal; }
+
+ const HouseholderUSequenceType householderU() const
+ {
+ eigen_assert(m_isInitialized && "UpperBidiagonalization is not initialized.");
+ return HouseholderUSequenceType(m_householder, m_householder.diagonal().conjugate());
+ }
+
+ const HouseholderVSequenceType householderV() // const here gives nasty errors and i'm lazy
+ {
+ eigen_assert(m_isInitialized && "UpperBidiagonalization is not initialized.");
+ return HouseholderVSequenceType(m_householder.conjugate(), m_householder.const_derived().template diagonal<1>())
+ .setLength(m_householder.cols()-1)
+ .setShift(1);
+ }
+
+ protected:
+ MatrixType m_householder;
+ BidiagonalType m_bidiagonal;
+ bool m_isInitialized;
+};
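+
+// Illustrative sketch (not part of Eigen): recovering A from the bidiagonalization, assuming
+// rows >= cols as this class requires.
+// Eigen::MatrixXd A = Eigen::MatrixXd::Random(8, 5);
+// internal::UpperBidiagonalization<Eigen::MatrixXd> ubd(A);
+// Eigen::MatrixXd B = Eigen::MatrixXd::Zero(8, 5);
+// B.topRows(5) = ubd.bidiagonal(); // cols-by-cols upper bidiagonal factor
+// Eigen::MatrixXd A2 = ubd.householderU() * B * ubd.householderV().adjoint(); // A2 ~= A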
+
+// Standard upper bidiagonalization without fancy optimizations
+// This version should be faster for small matrix sizes
+template<typename MatrixType>
+void upperbidiagonalization_inplace_unblocked(MatrixType& mat,
+ typename MatrixType::RealScalar *diagonal,
+ typename MatrixType::RealScalar *upper_diagonal,
+ typename MatrixType::Scalar* tempData = 0)
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+
+ Index rows = mat.rows();
+ Index cols = mat.cols();
+
+ typedef Matrix<Scalar,Dynamic,1,ColMajor,MatrixType::MaxRowsAtCompileTime,1> TempType;
+ TempType tempVector;
+ if(tempData==0)
+ {
+ tempVector.resize(rows);
+ tempData = tempVector.data();
+ }
+
+ for (Index k = 0; /* breaks at k==cols-1 below */ ; ++k)
+ {
+ Index remainingRows = rows - k;
+ Index remainingCols = cols - k - 1;
+
+ // construct left householder transform in-place in A
+ mat.col(k).tail(remainingRows)
+ .makeHouseholderInPlace(mat.coeffRef(k,k), diagonal[k]);
+ // apply householder transform to remaining part of A on the left
+ mat.bottomRightCorner(remainingRows, remainingCols)
+ .applyHouseholderOnTheLeft(mat.col(k).tail(remainingRows-1), mat.coeff(k,k), tempData);
+
+ if(k == cols-1) break;
+
+ // construct right householder transform in-place in mat
+ mat.row(k).tail(remainingCols)
+ .makeHouseholderInPlace(mat.coeffRef(k,k+1), upper_diagonal[k]);
+ // apply householder transform to remaining part of mat on the right
+ mat.bottomRightCorner(remainingRows-1, remainingCols)
+ .applyHouseholderOnTheRight(mat.row(k).tail(remainingCols-1).transpose(), mat.coeff(k,k+1), tempData);
+ }
+}
+
+/** \internal
+ * Helper routine for the block reduction to upper bidiagonal form.
+ *
+ * Let's partition the matrix A:
+ *
+ * | A00 A01 |
+ * A = | |
+ * | A10 A11 |
+ *
+ * This function reduces to bidiagonal form the left \c rows x \a blockSize vertical panel [A00/A10]
+ * and the \a blockSize x \c cols horizontal panel [A00 A01] of the matrix \a A. The bottom-right block A11
+ * is updated using matrix-matrix products:
+ * A11 -= V * Y^T + X * U^T
+ * where V and U contain the left and right Householder vectors. V and U are stored in A10 and A01
+ * respectively, and the update matrices X and Y are computed during the reduction.
+ *
+ */
+template<typename MatrixType>
+void upperbidiagonalization_blocked_helper(MatrixType& A,
+ typename MatrixType::RealScalar *diagonal,
+ typename MatrixType::RealScalar *upper_diagonal,
+ typename MatrixType::Index bs,
+ Ref<Matrix<typename MatrixType::Scalar, Dynamic, Dynamic> > X,
+ Ref<Matrix<typename MatrixType::Scalar, Dynamic, Dynamic> > Y)
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef Ref<Matrix<Scalar, Dynamic, 1> > SubColumnType;
+ typedef Ref<Matrix<Scalar, 1, Dynamic>, 0, InnerStride<> > SubRowType;
+ typedef Ref<Matrix<Scalar, Dynamic, Dynamic> > SubMatType;
+
+ Index brows = A.rows();
+ Index bcols = A.cols();
+
+ Scalar tau_u, tau_u_prev(0), tau_v;
+
+ for(Index k = 0; k < bs; ++k)
+ {
+ Index remainingRows = brows - k;
+ Index remainingCols = bcols - k - 1;
+
+ SubMatType X_k1( X.block(k,0, remainingRows,k) );
+ SubMatType V_k1( A.block(k,0, remainingRows,k) );
+
+ // 1 - update the k-th column of A
+ SubColumnType v_k = A.col(k).tail(remainingRows);
+ v_k -= V_k1 * Y.row(k).head(k).adjoint();
+ if(k) v_k -= X_k1 * A.col(k).head(k);
+
+ // 2 - construct left Householder transform in-place
+ v_k.makeHouseholderInPlace(tau_v, diagonal[k]);
+
+ if(k+1<bcols)
+ {
+ SubMatType Y_k ( Y.block(k+1,0, remainingCols, k+1) );
+ SubMatType U_k1 ( A.block(0,k+1, k,remainingCols) );
+
+ // this eases the application of Householder transformations
+ // A(k,k) will store tau_v later
+ A(k,k) = Scalar(1);
+
+ // 3 - Compute y_k^T = tau_v * ( A^T*v_k - Y_k-1*V_k-1^T*v_k - U_k-1*X_k-1^T*v_k )
+ {
+ SubColumnType y_k( Y.col(k).tail(remainingCols) );
+
+ // let's use the beginning of column k of Y as a temporary vector
+ SubColumnType tmp( Y.col(k).head(k) );
+ y_k.noalias() = A.block(k,k+1, remainingRows,remainingCols).adjoint() * v_k; // bottleneck
+ tmp.noalias() = V_k1.adjoint() * v_k;
+ y_k.noalias() -= Y_k.leftCols(k) * tmp;
+ tmp.noalias() = X_k1.adjoint() * v_k;
+ y_k.noalias() -= U_k1.adjoint() * tmp;
+ y_k *= numext::conj(tau_v);
+ }
+
+ // 4 - update k-th row of A (it will become u_k)
+ SubRowType u_k( A.row(k).tail(remainingCols) );
+ u_k = u_k.conjugate();
+ {
+ u_k -= Y_k * A.row(k).head(k+1).adjoint();
+ if(k) u_k -= U_k1.adjoint() * X.row(k).head(k).adjoint();
+ }
+
+ // 5 - construct right Householder transform in-place
+ u_k.makeHouseholderInPlace(tau_u, upper_diagonal[k]);
+
+ // this eases the application of Householder transformations
+ // A(k,k+1) will store tau_u later
+ A(k,k+1) = Scalar(1);
+
+ // 6 - Compute x_k = tau_u * ( A*u_k - X_k-1*U_k-1^T*u_k - V_k*Y_k^T*u_k )
+ {
+ SubColumnType x_k ( X.col(k).tail(remainingRows-1) );
+
+ // let's use the beginning of column k of X as temporary vectors
+ // note that tmp0 and tmp1 overlap
+ SubColumnType tmp0 ( X.col(k).head(k) ),
+ tmp1 ( X.col(k).head(k+1) );
+
+ x_k.noalias() = A.block(k+1,k+1, remainingRows-1,remainingCols) * u_k.transpose(); // bottleneck
+ tmp0.noalias() = U_k1 * u_k.transpose();
+ x_k.noalias() -= X_k1.bottomRows(remainingRows-1) * tmp0;
+ tmp1.noalias() = Y_k.adjoint() * u_k.transpose();
+ x_k.noalias() -= A.block(k+1,0, remainingRows-1,k+1) * tmp1;
+ x_k *= numext::conj(tau_u);
+ tau_u = numext::conj(tau_u);
+ u_k = u_k.conjugate();
+ }
+
+ if(k>0) A.coeffRef(k-1,k) = tau_u_prev;
+ tau_u_prev = tau_u;
+ }
+ else
+ A.coeffRef(k-1,k) = tau_u_prev;
+
+ A.coeffRef(k,k) = tau_v;
+ }
+
+ if(bs<bcols)
+ A.coeffRef(bs-1,bs) = tau_u_prev;
+
+ // update A22
+ if(bcols>bs && brows>bs)
+ {
+ SubMatType A11( A.bottomRightCorner(brows-bs,bcols-bs) );
+ SubMatType A10( A.block(bs,0, brows-bs,bs) );
+ SubMatType A01( A.block(0,bs, bs,bcols-bs) );
+ Scalar tmp = A01(bs-1,0);
+ A01(bs-1,0) = 1;
+ A11.noalias() -= A10 * Y.topLeftCorner(bcols,bs).bottomRows(bcols-bs).adjoint();
+ A11.noalias() -= X.topLeftCorner(brows,bs).bottomRows(brows-bs) * A01;
+ A01(bs-1,0) = tmp;
+ }
+}
+
+/** \internal
+ *
+ * Implementation of a block-bidiagonal reduction.
+ * It is based on the following paper:
+ * The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form.
+ * by Jaeyoung Choi, Jack J. Dongarra, David W. Walker. (1995)
+ * section 3.3
+ */
+template<typename MatrixType, typename BidiagType>
+void upperbidiagonalization_inplace_blocked(MatrixType& A, BidiagType& bidiagonal,
+ typename MatrixType::Index maxBlockSize=32,
+ typename MatrixType::Scalar* /*tempData*/ = 0)
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef Block<MatrixType,Dynamic,Dynamic> BlockType;
+
+ Index rows = A.rows();
+ Index cols = A.cols();
+ Index size = (std::min)(rows, cols);
+
+ Matrix<Scalar,MatrixType::RowsAtCompileTime,Dynamic,ColMajor,MatrixType::MaxRowsAtCompileTime> X(rows,maxBlockSize);
+ Matrix<Scalar,MatrixType::ColsAtCompileTime,Dynamic,ColMajor,MatrixType::MaxColsAtCompileTime> Y(cols,maxBlockSize);
+ Index blockSize = (std::min)(maxBlockSize,size);
+
+ Index k = 0;
+ for(k = 0; k < size; k += blockSize)
+ {
+ Index bs = (std::min)(size-k,blockSize); // actual size of the block
+ Index brows = rows - k; // rows of the block
+ Index bcols = cols - k; // columns of the block
+
+ // partition the matrix A:
+ //
+ // | A00 A01 A02 |
+ // | |
+ // A = | A10 A11 A12 |
+ // | |
+ // | A20 A21 A22 |
+ //
+ // where A11 is a bs x bs diagonal block,
+ // and let:
+ // | A11 A12 |
+ // B = | |
+ // | A21 A22 |
+
+ BlockType B = A.block(k,k,brows,bcols);
+
+ // This stage performs the bidiagonalization of A11, A21, A12, and the update of A22.
+ // Finally, the algorithm continues on the updated A22.
+ //
+ // However, if B is too small, or A22 is empty, we use an unblocked strategy
+ if(k+bs==cols || bcols<48) // somewhat arbitrary threshold
+ {
+ upperbidiagonalization_inplace_unblocked(B,
+ &(bidiagonal.template diagonal<0>().coeffRef(k)),
+ &(bidiagonal.template diagonal<1>().coeffRef(k)),
+ X.data()
+ );
+ break; // We're done
+ }
+ else
+ {
+ upperbidiagonalization_blocked_helper<BlockType>( B,
+ &(bidiagonal.template diagonal<0>().coeffRef(k)),
+ &(bidiagonal.template diagonal<1>().coeffRef(k)),
+ bs,
+ X.topLeftCorner(brows,bs),
+ Y.topLeftCorner(bcols,bs)
+ );
+ }
+ }
+}
+
+template<typename _MatrixType>
+UpperBidiagonalization<_MatrixType>& UpperBidiagonalization<_MatrixType>::computeUnblocked(const _MatrixType& matrix)
+{
+ Index rows = matrix.rows();
+ Index cols = matrix.cols();
+
+ eigen_assert(rows >= cols && "UpperBidiagonalization is only for matrices satisfying rows>=cols.");
+
+ m_householder = matrix;
+
+ ColVectorType temp(rows);
+
+ upperbidiagonalization_inplace_unblocked(m_householder,
+ &(m_bidiagonal.template diagonal<0>().coeffRef(0)),
+ &(m_bidiagonal.template diagonal<1>().coeffRef(0)),
+ temp.data());
+
+ m_isInitialized = true;
+ return *this;
+}
+
+template<typename _MatrixType>
+UpperBidiagonalization<_MatrixType>& UpperBidiagonalization<_MatrixType>::compute(const _MatrixType& matrix)
+{
+ Index rows = matrix.rows();
+ Index cols = matrix.cols();
+
+ eigen_assert(rows >= cols && "UpperBidiagonalization is only for matrices satisfying rows>=cols.");
+
+ m_householder = matrix;
+ upperbidiagonalization_inplace_blocked(m_householder, m_bidiagonal);
+
+ m_isInitialized = true;
+ return *this;
+}
+
+#if 0
+/** \return the Householder QR decomposition of \c *this.
+ *
+ * \sa class Bidiagonalization
+ */
+template<typename Derived>
+const UpperBidiagonalization<typename MatrixBase<Derived>::PlainObject>
+MatrixBase<Derived>::bidiagonalization() const
+{
+ return UpperBidiagonalization<PlainObject>(eval());
+}
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_BIDIAGONALIZATION_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/AmbiVector.h b/third_party/eigen3/Eigen/src/SparseCore/AmbiVector.h
new file mode 100644
index 0000000000..17fff96a78
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/AmbiVector.h
@@ -0,0 +1,373 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_AMBIVECTOR_H
+#define EIGEN_AMBIVECTOR_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * Hybrid sparse/dense vector class designed for intensive read-write operations.
+ *
+ * See BasicSparseLLT and SparseProduct for usage examples.
+ */
+template<typename _Scalar, typename _Index>
+class AmbiVector
+{
+ public:
+ typedef _Scalar Scalar;
+ typedef _Index Index;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ AmbiVector(Index size)
+ : m_buffer(0), m_zero(0), m_size(0), m_allocatedSize(0), m_allocatedElements(0), m_mode(-1)
+ {
+ resize(size);
+ }
+
+ void init(double estimatedDensity);
+ void init(int mode);
+
+ Index nonZeros() const;
+
+ /** Specifies a sub-vector to work on */
+ void setBounds(Index start, Index end) { m_start = start; m_end = end; }
+
+ void setZero();
+
+ void restart();
+ Scalar& coeffRef(Index i);
+ Scalar& coeff(Index i);
+
+ class Iterator;
+
+ ~AmbiVector() { delete[] m_buffer; }
+
+ void resize(Index size)
+ {
+ if (m_allocatedSize < size)
+ reallocate(size);
+ m_size = size;
+ }
+
+ Index size() const { return m_size; }
+
+ protected:
+
+ void reallocate(Index size)
+ {
+ // if the size of the matrix is not too large, let's allocate a bit more than needed such
+ // that we can handle a dense vector even in sparse mode.
+ delete[] m_buffer;
+ if (size<1000)
+ {
+ Index allocSize = (size * sizeof(ListEl))/sizeof(Scalar);
+ m_allocatedElements = (allocSize*sizeof(Scalar))/sizeof(ListEl);
+ m_buffer = new Scalar[allocSize];
+ }
+ else
+ {
+ m_allocatedElements = (size*sizeof(Scalar))/sizeof(ListEl);
+ m_buffer = new Scalar[size];
+ }
+ m_size = size;
+ m_start = 0;
+ m_end = m_size;
+ }
+
+ void reallocateSparse()
+ {
+ Index copyElements = m_allocatedElements;
+ m_allocatedElements = (std::min)(Index(m_allocatedElements*1.5),m_size);
+ Index allocSize = m_allocatedElements * sizeof(ListEl);
+ allocSize = allocSize/sizeof(Scalar) + (allocSize%sizeof(Scalar)>0?1:0);
+ Scalar* newBuffer = new Scalar[allocSize];
+ memcpy(newBuffer, m_buffer, copyElements * sizeof(ListEl));
+ delete[] m_buffer;
+ m_buffer = newBuffer;
+ }
+
+ protected:
+ // element type of the linked list
+ struct ListEl
+ {
+ Index next;
+ Index index;
+ Scalar value;
+ };
+
+ // used to store data in both modes
+ Scalar* m_buffer;
+ Scalar m_zero;
+ Index m_size;
+ Index m_start;
+ Index m_end;
+ Index m_allocatedSize;
+ Index m_allocatedElements;
+ Index m_mode;
+
+ // linked list mode
+ Index m_llStart;
+ Index m_llCurrent;
+ Index m_llSize;
+};
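+
+// Illustrative sketch (not part of Eigen): a typical write-then-read cycle. 'n' and 'process'
+// are placeholders for user code.
+// AmbiVector<double,int> vec(n);
+// vec.init(0.01); // low estimated density selects the linked-list (sparse) mode
+// vec.setZero();
+// vec.coeffRef(3) += 1.0; // between restart() calls, write indices must not decrease
+// vec.coeffRef(7) += 2.0;
+// for (AmbiVector<double,int>::Iterator it(vec); it; ++it)
+// process(it.index(), it.value());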
+
+/** \returns the number of non zeros in the current sub vector */
+template<typename _Scalar,typename _Index>
+_Index AmbiVector<_Scalar,_Index>::nonZeros() const
+{
+ if (m_mode==IsSparse)
+ return m_llSize;
+ else
+ return m_end - m_start;
+}
+
+template<typename _Scalar,typename _Index>
+void AmbiVector<_Scalar,_Index>::init(double estimatedDensity)
+{
+ if (estimatedDensity>0.1)
+ init(IsDense);
+ else
+ init(IsSparse);
+}
+
+template<typename _Scalar,typename _Index>
+void AmbiVector<_Scalar,_Index>::init(int mode)
+{
+ m_mode = mode;
+ if (m_mode==IsSparse)
+ {
+ m_llSize = 0;
+ m_llStart = -1;
+ }
+}
+
+/** Must be called whenever we might perform a write access
+ * with an index smaller than the previous one.
+ *
+ * Don't worry, this function is extremely cheap.
+ */
+template<typename _Scalar,typename _Index>
+void AmbiVector<_Scalar,_Index>::restart()
+{
+ m_llCurrent = m_llStart;
+}
+
+/** Set all coefficients of current subvector to zero */
+template<typename _Scalar,typename _Index>
+void AmbiVector<_Scalar,_Index>::setZero()
+{
+ if (m_mode==IsDense)
+ {
+ for (Index i=m_start; i<m_end; ++i)
+ m_buffer[i] = Scalar(0);
+ }
+ else
+ {
+ eigen_assert(m_mode==IsSparse);
+ m_llSize = 0;
+ m_llStart = -1;
+ }
+}
+
+template<typename _Scalar,typename _Index>
+_Scalar& AmbiVector<_Scalar,_Index>::coeffRef(_Index i)
+{
+ if (m_mode==IsDense)
+ return m_buffer[i];
+ else
+ {
+ ListEl* EIGEN_RESTRICT llElements = reinterpret_cast<ListEl*>(m_buffer);
+ // TODO factorize the following code to reduce code generation
+ eigen_assert(m_mode==IsSparse);
+ if (m_llSize==0)
+ {
+ // this is the first element
+ m_llStart = 0;
+ m_llCurrent = 0;
+ ++m_llSize;
+ llElements[0].value = Scalar(0);
+ llElements[0].index = i;
+ llElements[0].next = -1;
+ return llElements[0].value;
+ }
+ else if (i<llElements[m_llStart].index)
+ {
+ // this is going to be the new first element of the list
+ ListEl& el = llElements[m_llSize];
+ el.value = Scalar(0);
+ el.index = i;
+ el.next = m_llStart;
+ m_llStart = m_llSize;
+ ++m_llSize;
+ m_llCurrent = m_llStart;
+ return el.value;
+ }
+ else
+ {
+ Index nextel = llElements[m_llCurrent].next;
+ eigen_assert(i>=llElements[m_llCurrent].index && "you must call restart() before inserting an element with lower or equal index");
+ while (nextel >= 0 && llElements[nextel].index<=i)
+ {
+ m_llCurrent = nextel;
+ nextel = llElements[nextel].next;
+ }
+
+ if (llElements[m_llCurrent].index==i)
+ {
+ // the coefficient already exists and we found it!
+ return llElements[m_llCurrent].value;
+ }
+ else
+ {
+ if (m_llSize>=m_allocatedElements)
+ {
+ reallocateSparse();
+ llElements = reinterpret_cast<ListEl*>(m_buffer);
+ }
+ eigen_internal_assert(m_llSize<m_allocatedElements && "internal error: overflow in sparse mode");
+ // let's insert a new coefficient
+ ListEl& el = llElements[m_llSize];
+ el.value = Scalar(0);
+ el.index = i;
+ el.next = llElements[m_llCurrent].next;
+ llElements[m_llCurrent].next = m_llSize;
+ ++m_llSize;
+ return el.value;
+ }
+ }
+ }
+}
+
+template<typename _Scalar,typename _Index>
+_Scalar& AmbiVector<_Scalar,_Index>::coeff(_Index i)
+{
+ if (m_mode==IsDense)
+ return m_buffer[i];
+ else
+ {
+ ListEl* EIGEN_RESTRICT llElements = reinterpret_cast<ListEl*>(m_buffer);
+ eigen_assert(m_mode==IsSparse);
+ if ((m_llSize==0) || (i<llElements[m_llStart].index))
+ {
+ return m_zero;
+ }
+ else
+ {
+ Index elid = m_llStart;
+ while (elid >= 0 && llElements[elid].index<i)
+ elid = llElements[elid].next;
+
+ if (elid >= 0 && llElements[elid].index==i)
+ return llElements[elid].value;
+ else
+ return m_zero;
+ }
+ }
+}
+
+/** Iterator over the nonzero coefficients */
+template<typename _Scalar,typename _Index>
+class AmbiVector<_Scalar,_Index>::Iterator
+{
+ public:
+ typedef _Scalar Scalar;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ /** Default constructor
+ * \param vec the vector on which we iterate
+ * \param epsilon the minimal value used to prune zero coefficients.
+ * In practice, all coefficients having a magnitude smaller than \a epsilon
+ * are skipped.
+ */
+ Iterator(const AmbiVector& vec, const RealScalar& epsilon = 0)
+ : m_vector(vec)
+ {
+ using std::abs;
+ m_epsilon = epsilon;
+ m_isDense = m_vector.m_mode==IsDense;
+ if (m_isDense)
+ {
+ m_currentEl = 0; // this is to avoid a compilation warning
+ m_cachedValue = 0; // this is to avoid a compilation warning
+ m_cachedIndex = m_vector.m_start-1;
+ ++(*this);
+ }
+ else
+ {
+ ListEl* EIGEN_RESTRICT llElements = reinterpret_cast<ListEl*>(m_vector.m_buffer);
+ m_currentEl = m_vector.m_llStart;
+ while (m_currentEl>=0 && abs(llElements[m_currentEl].value)<=m_epsilon)
+ m_currentEl = llElements[m_currentEl].next;
+ if (m_currentEl<0)
+ {
+ m_cachedValue = 0; // this is to avoid a compilation warning
+ m_cachedIndex = -1;
+ }
+ else
+ {
+ m_cachedIndex = llElements[m_currentEl].index;
+ m_cachedValue = llElements[m_currentEl].value;
+ }
+ }
+ }
+
+ Index index() const { return m_cachedIndex; }
+ Scalar value() const { return m_cachedValue; }
+
+ operator bool() const { return m_cachedIndex>=0; }
+
+ Iterator& operator++()
+ {
+ using std::abs;
+ if (m_isDense)
+ {
+ do {
+ ++m_cachedIndex;
+ } while (m_cachedIndex<m_vector.m_end && abs(m_vector.m_buffer[m_cachedIndex])<m_epsilon);
+ if (m_cachedIndex<m_vector.m_end)
+ m_cachedValue = m_vector.m_buffer[m_cachedIndex];
+ else
+ m_cachedIndex=-1;
+ }
+ else
+ {
+ ListEl* EIGEN_RESTRICT llElements = reinterpret_cast<ListEl*>(m_vector.m_buffer);
+ do {
+ m_currentEl = llElements[m_currentEl].next;
+ } while (m_currentEl>=0 && abs(llElements[m_currentEl].value)<m_epsilon);
+ if (m_currentEl<0)
+ {
+ m_cachedIndex = -1;
+ }
+ else
+ {
+ m_cachedIndex = llElements[m_currentEl].index;
+ m_cachedValue = llElements[m_currentEl].value;
+ }
+ }
+ return *this;
+ }
+
+ protected:
+ const AmbiVector& m_vector; // the target vector
+ Index m_currentEl; // the current element in sparse/linked-list mode
+ RealScalar m_epsilon; // epsilon used to prune zero coefficients
+ Index m_cachedIndex; // current coordinate
+ Scalar m_cachedValue; // current value
+ bool m_isDense; // mode of the vector
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_AMBIVECTOR_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/CompressedStorage.h b/third_party/eigen3/Eigen/src/SparseCore/CompressedStorage.h
new file mode 100644
index 0000000000..ab3989ce28
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/CompressedStorage.h
@@ -0,0 +1,235 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_COMPRESSED_STORAGE_H
+#define EIGEN_COMPRESSED_STORAGE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * Stores a sparse set of values as a list of values and a list of indices.
+ *
+ */
+template<typename _Scalar,typename _Index>
+class CompressedStorage
+{
+ public:
+
+ typedef _Scalar Scalar;
+ typedef _Index Index;
+
+ protected:
+
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ public:
+
+ CompressedStorage()
+ : m_values(0), m_indices(0), m_size(0), m_allocatedSize(0)
+ {}
+
+ CompressedStorage(size_t size)
+ : m_values(0), m_indices(0), m_size(0), m_allocatedSize(0)
+ {
+ resize(size);
+ }
+
+ CompressedStorage(const CompressedStorage& other)
+ : m_values(0), m_indices(0), m_size(0), m_allocatedSize(0)
+ {
+ *this = other;
+ }
+
+ CompressedStorage& operator=(const CompressedStorage& other)
+ {
+ resize(other.size());
+ internal::smart_copy(other.m_values, other.m_values + m_size, m_values);
+ internal::smart_copy(other.m_indices, other.m_indices + m_size, m_indices);
+ return *this;
+ }
+
+ void swap(CompressedStorage& other)
+ {
+ std::swap(m_values, other.m_values);
+ std::swap(m_indices, other.m_indices);
+ std::swap(m_size, other.m_size);
+ std::swap(m_allocatedSize, other.m_allocatedSize);
+ }
+
+ ~CompressedStorage()
+ {
+ delete[] m_values;
+ delete[] m_indices;
+ }
+
+ void reserve(size_t size)
+ {
+ size_t newAllocatedSize = m_size + size;
+ if (newAllocatedSize > m_allocatedSize)
+ reallocate(newAllocatedSize);
+ }
+
+ void squeeze()
+ {
+ if (m_allocatedSize>m_size)
+ reallocate(m_size);
+ }
+
+ void resize(size_t size, float reserveSizeFactor = 0)
+ {
+ if (m_allocatedSize<size)
+ reallocate(size + size_t(reserveSizeFactor*size));
+ m_size = size;
+ }
+
+ void append(const Scalar& v, Index i)
+ {
+ Index id = static_cast<Index>(m_size);
+ resize(m_size+1, 1);
+ m_values[id] = v;
+ m_indices[id] = i;
+ }
+
+ inline size_t size() const { return m_size; }
+ inline size_t allocatedSize() const { return m_allocatedSize; }
+ inline void clear() { m_size = 0; }
+
+ inline Scalar& value(size_t i) { return m_values[i]; }
+ inline const Scalar& value(size_t i) const { return m_values[i]; }
+
+ inline Index& index(size_t i) { return m_indices[i]; }
+ inline const Index& index(size_t i) const { return m_indices[i]; }
+
+ static CompressedStorage Map(Index* indices, Scalar* values, size_t size)
+ {
+ CompressedStorage res;
+ res.m_indices = indices;
+ res.m_values = values;
+ res.m_allocatedSize = res.m_size = size;
+ return res;
+ }
+
+ /** \returns the largest \c k such that for all \c j in [0,k) index[\c j]\<\a key */
+ inline Index searchLowerIndex(Index key) const
+ {
+ return searchLowerIndex(0, m_size, key);
+ }
+
+ /** \returns the largest \c k in [start,end) such that for all \c j in [start,k) index[\c j]\<\a key */
+ inline Index searchLowerIndex(size_t start, size_t end, Index key) const
+ {
+ while(end>start)
+ {
+ size_t mid = (end+start)>>1;
+ if (m_indices[mid]<key)
+ start = mid+1;
+ else
+ end = mid;
+ }
+ return static_cast<Index>(start);
+ }
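+
+ // e.g. (illustrative) with stored indices {1, 4, 7}: searchLowerIndex(5) == 2,
+ // searchLowerIndex(4) == 1, and searchLowerIndex(9) == 3.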
+
+ /** \returns the stored value at index \a key
+ * If the value does not exist, then the value \a defaultValue is returned without any insertion. */
+ inline Scalar at(Index key, const Scalar& defaultValue = Scalar(0)) const
+ {
+ if (m_size==0)
+ return defaultValue;
+ else if (key==m_indices[m_size-1])
+ return m_values[m_size-1];
+ // ^^ optimization: let's first check if it is the last coefficient
+ // (very common in high level algorithms)
+ const size_t id = searchLowerIndex(0,m_size-1,key);
+ return ((id<m_size) && (m_indices[id]==key)) ? m_values[id] : defaultValue;
+ }
+
+ /** Like at(), but the search is performed in the range [start,end) */
+ inline Scalar atInRange(size_t start, size_t end, Index key, const Scalar& defaultValue = Scalar(0)) const
+ {
+ if (start>=end)
+ return defaultValue;
+ else if (end>start && key==m_indices[end-1])
+ return m_values[end-1];
+ // ^^ optimization: let's first check if it is the last coefficient
+ // (very common in high level algorithms)
+ const size_t id = searchLowerIndex(start,end-1,key);
+ return ((id<end) && (m_indices[id]==key)) ? m_values[id] : defaultValue;
+ }
+
+ /** \returns a reference to the value at index \a key
+ * If the value does not exist, then the value \a defaultValue is inserted
+ * such that the keys are sorted. */
+ inline Scalar& atWithInsertion(Index key, const Scalar& defaultValue = Scalar(0))
+ {
+ size_t id = searchLowerIndex(0,m_size,key);
+ if (id>=m_size || m_indices[id]!=key)
+ {
+ resize(m_size+1,1);
+ for (size_t j=m_size-1; j>id; --j)
+ {
+ m_indices[j] = m_indices[j-1];
+ m_values[j] = m_values[j-1];
+ }
+ m_indices[id] = key;
+ m_values[id] = defaultValue;
+ }
+ return m_values[id];
+ }
+
+ void prune(const Scalar& reference, const RealScalar& epsilon = NumTraits<RealScalar>::dummy_precision())
+ {
+ size_t k = 0;
+ size_t n = size();
+ for (size_t i=0; i<n; ++i)
+ {
+ if (!internal::isMuchSmallerThan(value(i), reference, epsilon))
+ {
+ value(k) = value(i);
+ index(k) = index(i);
+ ++k;
+ }
+ }
+ resize(k,0);
+ }
+
+ protected:
+
+ inline void reallocate(size_t size)
+ {
+ Scalar* newValues = new Scalar[size];
+ Index* newIndices = new Index[size];
+ size_t copySize = (std::min)(size, m_size);
+ // copy
+ if (copySize>0) {
+ internal::smart_copy(m_values, m_values+copySize, newValues);
+ internal::smart_copy(m_indices, m_indices+copySize, newIndices);
+ }
+ // delete old stuff
+ delete[] m_values;
+ delete[] m_indices;
+ m_values = newValues;
+ m_indices = newIndices;
+ m_allocatedSize = size;
+ }
+
+ protected:
+ Scalar* m_values;
+ Index* m_indices;
+ size_t m_size;
+ size_t m_allocatedSize;
+
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_COMPRESSED_STORAGE_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h b/third_party/eigen3/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h
new file mode 100644
index 0000000000..5c320e2d2d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h
@@ -0,0 +1,245 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CONSERVATIVESPARSESPARSEPRODUCT_H
+#define EIGEN_CONSERVATIVESPARSESPARSEPRODUCT_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, typename ResultType>
+static void conservative_sparse_sparse_product_impl(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+{
+ typedef typename remove_all<Lhs>::type::Scalar Scalar;
+ typedef typename remove_all<Lhs>::type::Index Index;
+
+ // make sure to call innerSize/outerSize since we fake the storage order.
+ Index rows = lhs.innerSize();
+ Index cols = rhs.outerSize();
+ eigen_assert(lhs.outerSize() == rhs.innerSize());
+
+ std::vector<bool> mask(rows,false);
+ Matrix<Scalar,Dynamic,1> values(rows);
+ Matrix<Index,Dynamic,1> indices(rows);
+
+ // estimate the number of non zero entries
+ // given a rhs column containing Y non zeros, we assume that the corresponding Y columns
+ // of the lhs differ on average by one non zero, thus the number of non zeros for
+ // the product of a rhs column with the lhs is X+Y where X is the average number of non zeros
+ // per column of the lhs.
+ // Therefore, we have nnz(lhs*rhs) = nnz(lhs) + nnz(rhs)
+ Index estimated_nnz_prod = lhs.nonZeros() + rhs.nonZeros();
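+ // e.g. (illustrative): if the lhs averages X = 10 non zeros per column and a given rhs column
+ // holds Y = 5 non zeros, the corresponding result column is estimated at about X + Y = 15.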
+
+ res.setZero();
+ res.reserve(Index(estimated_nnz_prod));
+ // we compute each column of the result, one after the other
+ for (Index j=0; j<cols; ++j)
+ {
+
+ res.startVec(j);
+ Index nnz = 0;
+ for (typename Rhs::InnerIterator rhsIt(rhs, j); rhsIt; ++rhsIt)
+ {
+ Scalar y = rhsIt.value();
+ Index k = rhsIt.index();
+ for (typename Lhs::InnerIterator lhsIt(lhs, k); lhsIt; ++lhsIt)
+ {
+ Index i = lhsIt.index();
+ Scalar x = lhsIt.value();
+ if(!mask[i])
+ {
+ mask[i] = true;
+ values[i] = x * y;
+ indices[nnz] = i;
+ ++nnz;
+ }
+ else
+ values[i] += x * y;
+ }
+ }
+
+ // unordered insertion
+ for(Index k=0; k<nnz; ++k)
+ {
+ Index i = indices[k];
+ res.insertBackByOuterInnerUnordered(j,i) = values[i];
+ mask[i] = false;
+ }
+
+#if 0
+ // alternative ordered insertion code:
+
+ Index t200 = rows/(log2(200)*1.39);
+ Index t = (rows*100)/139;
+
+ // FIXME reserve nnz non zeros
+ // FIXME implement fast sort algorithms for very small nnz
+ // if the result is sparse enough => use a quick sort
+ // otherwise => loop through the entire vector
+ // In order to avoid to perform an expensive log2 when the
+ // result is clearly very sparse we use a linear bound up to 200.
+ //if((nnz<200 && nnz<t200) || nnz * log2(nnz) < t)
+ //res.startVec(j);
+ if(true)
+ {
+ if(nnz>1) std::sort(indices.data(),indices.data()+nnz);
+ for(Index k=0; k<nnz; ++k)
+ {
+ Index i = indices[k];
+ res.insertBackByOuterInner(j,i) = values[i];
+ mask[i] = false;
+ }
+ }
+ else
+ {
+ // dense path
+ for(Index i=0; i<rows; ++i)
+ {
+ if(mask[i])
+ {
+ mask[i] = false;
+ res.insertBackByOuterInner(j,i) = values[i];
+ }
+ }
+ }
+#endif
+
+ }
+ res.finalize();
+}
+
+
+} // end namespace internal
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, typename ResultType,
+ int LhsStorageOrder = (traits<Lhs>::Flags&RowMajorBit) ? RowMajor : ColMajor,
+ int RhsStorageOrder = (traits<Rhs>::Flags&RowMajorBit) ? RowMajor : ColMajor,
+ int ResStorageOrder = (traits<ResultType>::Flags&RowMajorBit) ? RowMajor : ColMajor>
+struct conservative_sparse_sparse_product_selector;
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,ColMajor,ColMajor,ColMajor>
+{
+ typedef typename remove_all<Lhs>::type LhsCleaned;
+ typedef typename LhsCleaned::Scalar Scalar;
+
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,RowMajor,typename ResultType::Index> RowMajorMatrix;
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> ColMajorMatrix;
+ ColMajorMatrix resCol(lhs.rows(),rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Lhs,Rhs,ColMajorMatrix>(lhs, rhs, resCol);
+ // sort the non zeros:
+ RowMajorMatrix resRow(resCol);
+ res = resRow;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,RowMajor,ColMajor,ColMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,RowMajor,typename ResultType::Index> RowMajorMatrix;
+ RowMajorMatrix rhsRow = rhs;
+ RowMajorMatrix resRow(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<RowMajorMatrix,Lhs,RowMajorMatrix>(rhsRow, lhs, resRow);
+ res = resRow;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,ColMajor,RowMajor,ColMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,RowMajor,typename ResultType::Index> RowMajorMatrix;
+ RowMajorMatrix lhsRow = lhs;
+ RowMajorMatrix resRow(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Rhs,RowMajorMatrix,RowMajorMatrix>(rhs, lhsRow, resRow);
+ res = resRow;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,RowMajor,RowMajor,ColMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,RowMajor,typename ResultType::Index> RowMajorMatrix;
+ RowMajorMatrix resRow(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Rhs,Lhs,RowMajorMatrix>(rhs, lhs, resRow);
+ res = resRow;
+ }
+};
+
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,ColMajor,ColMajor,RowMajor>
+{
+ typedef typename traits<typename remove_all<Lhs>::type>::Scalar Scalar;
+
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> ColMajorMatrix;
+ ColMajorMatrix resCol(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Lhs,Rhs,ColMajorMatrix>(lhs, rhs, resCol);
+ res = resCol;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,RowMajor,ColMajor,RowMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> ColMajorMatrix;
+ ColMajorMatrix lhsCol = lhs;
+ ColMajorMatrix resCol(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<ColMajorMatrix,Rhs,ColMajorMatrix>(lhsCol, rhs, resCol);
+ res = resCol;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,ColMajor,RowMajor,RowMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> ColMajorMatrix;
+ ColMajorMatrix rhsCol = rhs;
+ ColMajorMatrix resCol(lhs.rows(), rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Lhs,ColMajorMatrix,ColMajorMatrix>(lhs, rhsCol, resCol);
+ res = resCol;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct conservative_sparse_sparse_product_selector<Lhs,Rhs,ResultType,RowMajor,RowMajor,RowMajor>
+{
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,RowMajor,typename ResultType::Index> RowMajorMatrix;
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> ColMajorMatrix;
+ RowMajorMatrix resRow(lhs.rows(),rhs.cols());
+ internal::conservative_sparse_sparse_product_impl<Rhs,Lhs,RowMajorMatrix>(rhs, lhs, resRow);
+ // sort the non zeros:
+ ColMajorMatrix resCol(resRow);
+ res = resCol;
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CONSERVATIVESPARSESPARSEPRODUCT_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/MappedSparseMatrix.h b/third_party/eigen3/Eigen/src/SparseCore/MappedSparseMatrix.h
new file mode 100644
index 0000000000..ab1a266a90
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/MappedSparseMatrix.h
@@ -0,0 +1,181 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MAPPED_SPARSEMATRIX_H
+#define EIGEN_MAPPED_SPARSEMATRIX_H
+
+namespace Eigen {
+
+/** \class MappedSparseMatrix
+ *
+ * \brief Sparse matrix
+ *
+ * \param _Scalar the scalar type, i.e. the type of the coefficients
+ *
+ * See http://www.netlib.org/linalg/html_templates/node91.html for details on the storage scheme.
+ *
+ */
+namespace internal {
+template<typename _Scalar, int _Flags, typename _Index>
+struct traits<MappedSparseMatrix<_Scalar, _Flags, _Index> > : traits<SparseMatrix<_Scalar, _Flags, _Index> >
+{};
+}
+
+template<typename _Scalar, int _Flags, typename _Index>
+class MappedSparseMatrix
+ : public SparseMatrixBase<MappedSparseMatrix<_Scalar, _Flags, _Index> >
+{
+ public:
+ EIGEN_SPARSE_PUBLIC_INTERFACE(MappedSparseMatrix)
+ enum { IsRowMajor = Base::IsRowMajor };
+
+ protected:
+
+ Index m_outerSize;
+ Index m_innerSize;
+ Index m_nnz;
+ Index* m_outerIndex;
+ Index* m_innerIndices;
+ Scalar* m_values;
+
+ public:
+
+ inline Index rows() const { return IsRowMajor ? m_outerSize : m_innerSize; }
+ inline Index cols() const { return IsRowMajor ? m_innerSize : m_outerSize; }
+ inline Index innerSize() const { return m_innerSize; }
+ inline Index outerSize() const { return m_outerSize; }
+
+ bool isCompressed() const { return true; }
+
+ //----------------------------------------
+ // direct access interface
+ inline const Scalar* valuePtr() const { return m_values; }
+ inline Scalar* valuePtr() { return m_values; }
+
+ inline const Index* innerIndexPtr() const { return m_innerIndices; }
+ inline Index* innerIndexPtr() { return m_innerIndices; }
+
+ inline const Index* outerIndexPtr() const { return m_outerIndex; }
+ inline Index* outerIndexPtr() { return m_outerIndex; }
+ //----------------------------------------
+
+ inline Scalar coeff(Index row, Index col) const
+ {
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ Index start = m_outerIndex[outer];
+ Index end = m_outerIndex[outer+1];
+ if (start==end)
+ return Scalar(0);
+ else if (end>0 && inner==m_innerIndices[end-1])
+ return m_values[end-1];
+ // ^^ optimization: let's first check if it is the last coefficient
+ // (very common in high level algorithms)
+
+ const Index* r = std::lower_bound(&m_innerIndices[start],&m_innerIndices[end-1],inner);
+ const Index id = r-&m_innerIndices[0];
+ return ((*r==inner) && (id<end)) ? m_values[id] : Scalar(0);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ Index start = m_outerIndex[outer];
+ Index end = m_outerIndex[outer+1];
+ eigen_assert(end>=start && "you probably called coeffRef on a non finalized matrix");
+ eigen_assert(end>start && "coeffRef cannot be called on a zero coefficient");
+ Index* r = std::lower_bound(&m_innerIndices[start],&m_innerIndices[end],inner);
+ const Index id = r-&m_innerIndices[0];
+ eigen_assert((*r==inner) && (id<end) && "coeffRef cannot be called on a zero coefficient");
+ return m_values[id];
+ }
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ /** \returns the number of non zero coefficients */
+ inline Index nonZeros() const { return m_nnz; }
+
+ inline MappedSparseMatrix(Index rows, Index cols, Index nnz, Index* outerIndexPtr, Index* innerIndexPtr, Scalar* valuePtr)
+ : m_outerSize(IsRowMajor?rows:cols), m_innerSize(IsRowMajor?cols:rows), m_nnz(nnz), m_outerIndex(outerIndexPtr),
+ m_innerIndices(innerIndexPtr), m_values(valuePtr)
+ {}
+
+ /** Empty destructor */
+ inline ~MappedSparseMatrix() {}
+};
+
+template<typename Scalar, int _Flags, typename _Index>
+class MappedSparseMatrix<Scalar,_Flags,_Index>::InnerIterator
+{
+ public:
+ InnerIterator(const MappedSparseMatrix& mat, Index outer)
+ : m_matrix(mat),
+ m_outer(outer),
+ m_id(mat.outerIndexPtr()[outer]),
+ m_start(m_id),
+ m_end(mat.outerIndexPtr()[outer+1])
+ {}
+
+ inline InnerIterator& operator++() { m_id++; return *this; }
+
+ inline Scalar value() const { return m_matrix.valuePtr()[m_id]; }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_matrix.valuePtr()[m_id]); }
+
+ inline Index index() const { return m_matrix.innerIndexPtr()[m_id]; }
+ inline Index row() const { return IsRowMajor ? m_outer : index(); }
+ inline Index col() const { return IsRowMajor ? index() : m_outer; }
+
+ inline operator bool() const { return (m_id < m_end) && (m_id>=m_start); }
+
+ protected:
+ const MappedSparseMatrix& m_matrix;
+ const Index m_outer;
+ Index m_id;
+ const Index m_start;
+ const Index m_end;
+};
+
+template<typename Scalar, int _Flags, typename _Index>
+class MappedSparseMatrix<Scalar,_Flags,_Index>::ReverseInnerIterator
+{
+ public:
+ ReverseInnerIterator(const MappedSparseMatrix& mat, Index outer)
+ : m_matrix(mat),
+ m_outer(outer),
+ m_id(mat.outerIndexPtr()[outer+1]),
+ m_start(mat.outerIndexPtr()[outer]),
+ m_end(m_id)
+ {}
+
+ inline ReverseInnerIterator& operator--() { m_id--; return *this; }
+
+ inline Scalar value() const { return m_matrix.valuePtr()[m_id-1]; }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_matrix.valuePtr()[m_id-1]); }
+
+ inline Index index() const { return m_matrix.innerIndexPtr()[m_id-1]; }
+ inline Index row() const { return IsRowMajor ? m_outer : index(); }
+ inline Index col() const { return IsRowMajor ? index() : m_outer; }
+
+ inline operator bool() const { return (m_id <= m_end) && (m_id>m_start); }
+
+ protected:
+ const MappedSparseMatrix& m_matrix;
+ const Index m_outer;
+ Index m_id;
+ const Index m_start;
+ const Index m_end;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_MAPPED_SPARSEMATRIX_H
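A minimal usage sketch for the class above (not part of the patch), assuming the default column-major layout and int indices: it wraps compressed-column arrays that are owned elsewhere, without copying them.

    #include <Eigen/Sparse>

    void wrap_csc() {
      // A 3x3 identity stored in compressed column format, owned by the caller.
      int    outer[4]  = {0, 1, 2, 3};   // column start offsets into inner/values
      int    inner[3]  = {0, 1, 2};      // row indices of the stored entries
      double values[3] = {1.0, 1.0, 1.0};
      Eigen::MappedSparseMatrix<double> M(3, 3, 3, outer, inner, values);
      double d = M.coeff(1, 1);          // reads the mapped buffers in place; d == 1.0
      (void)d;
    }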
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseBlock.h b/third_party/eigen3/Eigen/src/SparseCore/SparseBlock.h
new file mode 100644
index 0000000000..3a6d8a275c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseBlock.h
@@ -0,0 +1,547 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_BLOCK_H
+#define EIGEN_SPARSE_BLOCK_H
+
+namespace Eigen {
+
+template<typename XprType, int BlockRows, int BlockCols>
+class BlockImpl<XprType,BlockRows,BlockCols,true,Sparse>
+ : public SparseMatrixBase<Block<XprType,BlockRows,BlockCols,true> >
+{
+ typedef typename internal::remove_all<typename XprType::Nested>::type _MatrixTypeNested;
+ typedef Block<XprType, BlockRows, BlockCols, true> BlockType;
+public:
+ enum { IsRowMajor = internal::traits<BlockType>::IsRowMajor };
+protected:
+ enum { OuterSize = IsRowMajor ? BlockRows : BlockCols };
+public:
+ EIGEN_SPARSE_PUBLIC_INTERFACE(BlockType)
+
+ class InnerIterator: public XprType::InnerIterator
+ {
+ typedef typename BlockImpl::Index Index;
+ public:
+ inline InnerIterator(const BlockType& xpr, Index outer)
+ : XprType::InnerIterator(xpr.m_matrix, xpr.m_outerStart + outer), m_outer(outer)
+ {}
+ inline Index row() const { return IsRowMajor ? m_outer : this->index(); }
+ inline Index col() const { return IsRowMajor ? this->index() : m_outer; }
+ protected:
+ Index m_outer;
+ };
+ class ReverseInnerIterator: public XprType::ReverseInnerIterator
+ {
+ typedef typename BlockImpl::Index Index;
+ public:
+ inline ReverseInnerIterator(const BlockType& xpr, Index outer)
+ : XprType::ReverseInnerIterator(xpr.m_matrix, xpr.m_outerStart + outer), m_outer(outer)
+ {}
+ inline Index row() const { return IsRowMajor ? m_outer : this->index(); }
+ inline Index col() const { return IsRowMajor ? this->index() : m_outer; }
+ protected:
+ Index m_outer;
+ };
+
+ inline BlockImpl(const XprType& xpr, int i)
+ : m_matrix(xpr), m_outerStart(i), m_outerSize(OuterSize)
+ {}
+
+ inline BlockImpl(const XprType& xpr, int startRow, int startCol, int blockRows, int blockCols)
+ : m_matrix(xpr), m_outerStart(IsRowMajor ? startRow : startCol), m_outerSize(IsRowMajor ? blockRows : blockCols)
+ {}
+
+ EIGEN_STRONG_INLINE Index rows() const { return IsRowMajor ? m_outerSize.value() : m_matrix.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return IsRowMajor ? m_matrix.cols() : m_outerSize.value(); }
+
+ Index nonZeros() const
+ {
+ Index nnz = 0;
+ Index end = m_outerStart + m_outerSize.value();
+ for(int j=m_outerStart; j<end; ++j)
+ for(typename XprType::InnerIterator it(m_matrix, j); it; ++it)
+ ++nnz;
+ return nnz;
+ }
+
+ protected:
+
+ typename XprType::Nested m_matrix;
+ Index m_outerStart;
+ const internal::variable_if_dynamic<Index, OuterSize> m_outerSize;
+
+ public:
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl)
+};
+
+
+/***************************************************************************
+* specialization for SparseMatrix
+***************************************************************************/
+
+namespace internal {
+
+template<typename SparseMatrixType, int BlockRows, int BlockCols>
+class sparse_matrix_block_impl
+ : public SparseMatrixBase<Block<SparseMatrixType,BlockRows,BlockCols,true> >
+{
+ typedef typename internal::remove_all<typename SparseMatrixType::Nested>::type _MatrixTypeNested;
+ typedef Block<SparseMatrixType, BlockRows, BlockCols, true> BlockType;
+public:
+ enum { IsRowMajor = internal::traits<BlockType>::IsRowMajor };
+ EIGEN_SPARSE_PUBLIC_INTERFACE(BlockType)
+protected:
+ enum { OuterSize = IsRowMajor ? BlockRows : BlockCols };
+public:
+
+ class InnerIterator: public SparseMatrixType::InnerIterator
+ {
+ public:
+ inline InnerIterator(const BlockType& xpr, Index outer)
+ : SparseMatrixType::InnerIterator(xpr.m_matrix, xpr.m_outerStart + outer), m_outer(outer)
+ {}
+ inline Index row() const { return IsRowMajor ? m_outer : this->index(); }
+ inline Index col() const { return IsRowMajor ? this->index() : m_outer; }
+ protected:
+ Index m_outer;
+ };
+ class ReverseInnerIterator: public SparseMatrixType::ReverseInnerIterator
+ {
+ public:
+ inline ReverseInnerIterator(const BlockType& xpr, Index outer)
+ : SparseMatrixType::ReverseInnerIterator(xpr.m_matrix, xpr.m_outerStart + outer), m_outer(outer)
+ {}
+ inline Index row() const { return IsRowMajor ? m_outer : this->index(); }
+ inline Index col() const { return IsRowMajor ? this->index() : m_outer; }
+ protected:
+ Index m_outer;
+ };
+
+ inline sparse_matrix_block_impl(const SparseMatrixType& xpr, int i)
+ : m_matrix(xpr), m_outerStart(i), m_outerSize(OuterSize)
+ {}
+
+ inline sparse_matrix_block_impl(const SparseMatrixType& xpr, int startRow, int startCol, int blockRows, int blockCols)
+ : m_matrix(xpr), m_outerStart(IsRowMajor ? startRow : startCol), m_outerSize(IsRowMajor ? blockRows : blockCols)
+ {}
+
+ template<typename OtherDerived>
+ inline BlockType& operator=(const SparseMatrixBase<OtherDerived>& other)
+ {
+ typedef typename internal::remove_all<typename SparseMatrixType::Nested>::type _NestedMatrixType;
+ _NestedMatrixType& matrix = const_cast<_NestedMatrixType&>(m_matrix);
+ // This assignment is slow if this vector set is not empty
+ // and/or it is not at the end of the nonzeros of the underlying matrix.
+
+ // 1 - eval to a temporary to avoid transposition and/or aliasing issues
+ SparseMatrix<Scalar, IsRowMajor ? RowMajor : ColMajor, Index> tmp(other);
+
+ // 2 - let's check whether there is enough allocated memory
+ Index nnz = tmp.nonZeros();
+ Index start = m_outerStart==0 ? 0 : matrix.outerIndexPtr()[m_outerStart]; // starting position of the current block
+ Index end = m_matrix.outerIndexPtr()[m_outerStart+m_outerSize.value()]; // ending position of the current block
+ Index block_size = end - start; // available room in the current block
+ Index tail_size = m_matrix.outerIndexPtr()[m_matrix.outerSize()] - end;
+
+ Index free_size = m_matrix.isCompressed()
+ ? Index(matrix.data().allocatedSize()) + block_size
+ : block_size;
+
+ if(nnz>free_size)
+ {
+ // realloc manually to reduce copies
+ typename SparseMatrixType::Storage newdata(m_matrix.data().allocatedSize() - block_size + nnz);
+
+ internal::smart_copy(&m_matrix.data().value(0), &m_matrix.data().value(0) + start, &newdata.value(0));
+ internal::smart_copy(&m_matrix.data().index(0), &m_matrix.data().index(0) + start, &newdata.index(0));
+
+ internal::smart_copy(&tmp.data().value(0), &tmp.data().value(0) + nnz, &newdata.value(start));
+ internal::smart_copy(&tmp.data().index(0), &tmp.data().index(0) + nnz, &newdata.index(start));
+
+ internal::smart_copy(&matrix.data().value(end), &matrix.data().value(end) + tail_size, &newdata.value(start+nnz));
+ internal::smart_copy(&matrix.data().index(end), &matrix.data().index(end) + tail_size, &newdata.index(start+nnz));
+
+ newdata.resize(m_matrix.outerIndexPtr()[m_matrix.outerSize()] - block_size + nnz);
+
+ matrix.data().swap(newdata);
+ }
+ else
+ {
+ // no need to realloc, simply copy the tail at its respective position and insert tmp
+ matrix.data().resize(start + nnz + tail_size);
+
+ internal::smart_memmove(&matrix.data().value(end), &matrix.data().value(end) + tail_size, &matrix.data().value(start + nnz));
+ internal::smart_memmove(&matrix.data().index(end), &matrix.data().index(end) + tail_size, &matrix.data().index(start + nnz));
+
+ internal::smart_copy(&tmp.data().value(0), &tmp.data().value(0) + nnz, &matrix.data().value(start));
+ internal::smart_copy(&tmp.data().index(0), &tmp.data().index(0) + nnz, &matrix.data().index(start));
+ }
+
+ // update innerNonZeros
+ if(!m_matrix.isCompressed())
+ for(Index j=0; j<m_outerSize.value(); ++j)
+ matrix.innerNonZeroPtr()[m_outerStart+j] = tmp.innerVector(j).nonZeros();
+
+ // update outer index pointers
+ Index p = start;
+ for(Index k=0; k<m_outerSize.value(); ++k)
+ {
+ matrix.outerIndexPtr()[m_outerStart+k] = p;
+ p += tmp.innerVector(k).nonZeros();
+ }
+ std::ptrdiff_t offset = nnz - block_size;
+ for(Index k = m_outerStart + m_outerSize.value(); k<=matrix.outerSize(); ++k)
+ {
+ matrix.outerIndexPtr()[k] += offset;
+ }
+
+ return derived();
+ }
+
+ inline BlockType& operator=(const BlockType& other)
+ {
+ return operator=<BlockType>(other);
+ }
+
+ inline const Scalar* valuePtr() const
+ { return m_matrix.valuePtr() + m_matrix.outerIndexPtr()[m_outerStart]; }
+ inline Scalar* valuePtr()
+ { return m_matrix.const_cast_derived().valuePtr() + m_matrix.outerIndexPtr()[m_outerStart]; }
+
+ inline const Index* innerIndexPtr() const
+ { return m_matrix.innerIndexPtr() + m_matrix.outerIndexPtr()[m_outerStart]; }
+ inline Index* innerIndexPtr()
+ { return m_matrix.const_cast_derived().innerIndexPtr() + m_matrix.outerIndexPtr()[m_outerStart]; }
+
+ inline const Index* outerIndexPtr() const
+ { return m_matrix.outerIndexPtr() + m_outerStart; }
+ inline Index* outerIndexPtr()
+ { return m_matrix.const_cast_derived().outerIndexPtr() + m_outerStart; }
+
+ Index nonZeros() const
+ {
+ if(m_matrix.isCompressed())
+ return std::size_t(m_matrix.outerIndexPtr()[m_outerStart+m_outerSize.value()])
+ - std::size_t(m_matrix.outerIndexPtr()[m_outerStart]);
+ else if(m_outerSize.value()==0)
+ return 0;
+ else
+ return Map<const Matrix<Index,OuterSize,1> >(m_matrix.innerNonZeroPtr()+m_outerStart, m_outerSize.value()).sum();
+ }
+
+ const Scalar& lastCoeff() const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(sparse_matrix_block_impl);
+ eigen_assert(nonZeros()>0);
+ if(m_matrix.isCompressed())
+ return m_matrix.valuePtr()[m_matrix.outerIndexPtr()[m_outerStart+1]-1];
+ else
+ return m_matrix.valuePtr()[m_matrix.outerIndexPtr()[m_outerStart]+m_matrix.innerNonZeroPtr()[m_outerStart]-1];
+ }
+
+ EIGEN_STRONG_INLINE Index rows() const { return IsRowMajor ? m_outerSize.value() : m_matrix.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return IsRowMajor ? m_matrix.cols() : m_outerSize.value(); }
+
+ protected:
+
+ typename SparseMatrixType::Nested m_matrix;
+ Index m_outerStart;
+ const internal::variable_if_dynamic<Index, OuterSize> m_outerSize;
+
+};
+
+} // namespace internal
+
+template<typename _Scalar, int _Options, typename _Index, int BlockRows, int BlockCols>
+class BlockImpl<SparseMatrix<_Scalar, _Options, _Index>,BlockRows,BlockCols,true,Sparse>
+ : public internal::sparse_matrix_block_impl<SparseMatrix<_Scalar, _Options, _Index>,BlockRows,BlockCols>
+{
+public:
+ typedef SparseMatrix<_Scalar, _Options, _Index> SparseMatrixType;
+ typedef internal::sparse_matrix_block_impl<SparseMatrixType,BlockRows,BlockCols> Base;
+ inline BlockImpl(SparseMatrixType& xpr, int i)
+ : Base(xpr, i)
+ {}
+
+ inline BlockImpl(SparseMatrixType& xpr, int startRow, int startCol, int blockRows, int blockCols)
+ : Base(xpr, startRow, startCol, blockRows, blockCols)
+ {}
+
+ using Base::operator=;
+};
+
+template<typename _Scalar, int _Options, typename _Index, int BlockRows, int BlockCols>
+class BlockImpl<const SparseMatrix<_Scalar, _Options, _Index>,BlockRows,BlockCols,true,Sparse>
+ : public internal::sparse_matrix_block_impl<const SparseMatrix<_Scalar, _Options, _Index>,BlockRows,BlockCols>
+{
+public:
+ typedef const SparseMatrix<_Scalar, _Options, _Index> SparseMatrixType;
+ typedef internal::sparse_matrix_block_impl<SparseMatrixType,BlockRows,BlockCols> Base;
+ inline BlockImpl(SparseMatrixType& xpr, int i)
+ : Base(xpr, i)
+ {}
+
+ inline BlockImpl(SparseMatrixType& xpr, int startRow, int startCol, int blockRows, int blockCols)
+ : Base(xpr, startRow, startCol, blockRows, blockCols)
+ {}
+
+ using Base::operator=;
+};
+
+//----------
+
+/** \returns the \a outer -th column (resp. row) of the matrix \c *this if \c *this
+ * is col-major (resp. row-major).
+ */
+template<typename Derived>
+typename SparseMatrixBase<Derived>::InnerVectorReturnType SparseMatrixBase<Derived>::innerVector(Index outer)
+{ return InnerVectorReturnType(derived(), outer); }
+
+/** \returns the \a outer -th column (resp. row) of the matrix \c *this if \c *this
+ * is col-major (resp. row-major). Read-only.
+ */
+template<typename Derived>
+const typename SparseMatrixBase<Derived>::ConstInnerVectorReturnType SparseMatrixBase<Derived>::innerVector(Index outer) const
+{ return ConstInnerVectorReturnType(derived(), outer); }
+
+/** \returns the \a outerSize consecutive columns (resp. rows) of the matrix \c *this
+ * starting at \a outerStart, if \c *this is col-major (resp. row-major).
+ */
+template<typename Derived>
+Block<Derived,Dynamic,Dynamic,true> SparseMatrixBase<Derived>::innerVectors(Index outerStart, Index outerSize)
+{
+ return Block<Derived,Dynamic,Dynamic,true>(derived(),
+ IsRowMajor ? outerStart : 0, IsRowMajor ? 0 : outerStart,
+ IsRowMajor ? outerSize : rows(), IsRowMajor ? cols() : outerSize);
+
+}
+
+/** \returns the \a outerSize consecutive columns (resp. rows) of the matrix \c *this
+ * starting at \a outerStart, if \c *this is col-major (resp. row-major). Read-only.
+ */
+template<typename Derived>
+const Block<const Derived,Dynamic,Dynamic,true> SparseMatrixBase<Derived>::innerVectors(Index outerStart, Index outerSize) const
+{
+ return Block<const Derived,Dynamic,Dynamic,true>(derived(),
+ IsRowMajor ? outerStart : 0, IsRowMajor ? 0 : outerStart,
+ IsRowMajor ? outerSize : rows(), IsRowMajor ? cols() : outerSize);
+
+}
+
+namespace internal {
+
+template< typename XprType, int BlockRows, int BlockCols, bool InnerPanel,
+ bool OuterVector = (BlockCols==1 && XprType::IsRowMajor)
+ | // FIXME: "|" is used instead of "||" to silence GCC 4.4.0's "suggest parentheses around &&" warning;
+ // revert to "||" as soon as that workaround is no longer needed.
+ (BlockRows==1 && !XprType::IsRowMajor)>
+class GenericSparseBlockInnerIteratorImpl;
+
+}
+
+/** Generic implementation of sparse Block expression.
+ * Read-only.
+ */
+template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+class BlockImpl<XprType,BlockRows,BlockCols,InnerPanel,Sparse>
+ : public SparseMatrixBase<Block<XprType,BlockRows,BlockCols,InnerPanel> >, internal::no_assignment_operator
+{
+ typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
+public:
+ enum { IsRowMajor = internal::traits<BlockType>::IsRowMajor };
+ EIGEN_SPARSE_PUBLIC_INTERFACE(BlockType)
+
+ typedef typename internal::remove_all<typename XprType::Nested>::type _MatrixTypeNested;
+
+ /** Column or Row constructor
+ */
+ inline BlockImpl(const XprType& xpr, int i)
+ : m_matrix(xpr),
+ m_startRow( (BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) ? i : 0),
+ m_startCol( (BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) ? i : 0),
+ m_blockRows(BlockRows==1 ? 1 : xpr.rows()),
+ m_blockCols(BlockCols==1 ? 1 : xpr.cols())
+ {}
+
+ /** Dynamic-size constructor
+ */
+ inline BlockImpl(const XprType& xpr, int startRow, int startCol, int blockRows, int blockCols)
+ : m_matrix(xpr), m_startRow(startRow), m_startCol(startCol), m_blockRows(blockRows), m_blockCols(blockCols)
+ {}
+
+ inline int rows() const { return m_blockRows.value(); }
+ inline int cols() const { return m_blockCols.value(); }
+
+ inline Scalar& coeffRef(int row, int col)
+ {
+ return m_matrix.const_cast_derived()
+ .coeffRef(row + m_startRow.value(), col + m_startCol.value());
+ }
+
+ inline const Scalar coeff(int row, int col) const
+ {
+ return m_matrix.coeff(row + m_startRow.value(), col + m_startCol.value());
+ }
+
+ inline Scalar& coeffRef(int index)
+ {
+ return m_matrix.const_cast_derived()
+ .coeffRef(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ inline const Scalar coeff(int index) const
+ {
+ return m_matrix
+ .coeff(m_startRow.value() + (RowsAtCompileTime == 1 ? 0 : index),
+ m_startCol.value() + (RowsAtCompileTime == 1 ? index : 0));
+ }
+
+ inline const _MatrixTypeNested& nestedExpression() const { return m_matrix; }
+
+ typedef internal::GenericSparseBlockInnerIteratorImpl<XprType,BlockRows,BlockCols,InnerPanel> InnerIterator;
+
+ class ReverseInnerIterator : public _MatrixTypeNested::ReverseInnerIterator
+ {
+ typedef typename _MatrixTypeNested::ReverseInnerIterator Base;
+ const BlockType& m_block;
+ Index m_begin;
+ public:
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator(const BlockType& block, Index outer)
+ : Base(block.derived().nestedExpression(), outer + (IsRowMajor ? block.m_startRow.value() : block.m_startCol.value())),
+ m_block(block),
+ m_begin(IsRowMajor ? block.m_startCol.value() : block.m_startRow.value())
+ {
+ while( (Base::operator bool()) && (Base::index() >= (IsRowMajor ? m_block.m_startCol.value()+block.m_blockCols.value() : m_block.m_startRow.value()+block.m_blockRows.value())) )
+ Base::operator--();
+ }
+
+ inline Index index() const { return Base::index() - (IsRowMajor ? m_block.m_startCol.value() : m_block.m_startRow.value()); }
+ inline Index outer() const { return Base::outer() - (IsRowMajor ? m_block.m_startRow.value() : m_block.m_startCol.value()); }
+ inline Index row() const { return Base::row() - m_block.m_startRow.value(); }
+ inline Index col() const { return Base::col() - m_block.m_startCol.value(); }
+
+ inline operator bool() const { return Base::operator bool() && Base::index() >= m_begin; }
+ };
+ protected:
+ friend class internal::GenericSparseBlockInnerIteratorImpl<XprType,BlockRows,BlockCols,InnerPanel>;
+ friend class ReverseInnerIterator;
+
+ EIGEN_INHERIT_ASSIGNMENT_OPERATORS(BlockImpl)
+
+ typename XprType::Nested m_matrix;
+ const internal::variable_if_dynamic<Index, XprType::RowsAtCompileTime == 1 ? 0 : Dynamic> m_startRow;
+ const internal::variable_if_dynamic<Index, XprType::ColsAtCompileTime == 1 ? 0 : Dynamic> m_startCol;
+ const internal::variable_if_dynamic<Index, RowsAtCompileTime> m_blockRows;
+ const internal::variable_if_dynamic<Index, ColsAtCompileTime> m_blockCols;
+
+};
+
+namespace internal {
+ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+ class GenericSparseBlockInnerIteratorImpl<XprType,BlockRows,BlockCols,InnerPanel,false> : public Block<XprType, BlockRows, BlockCols, InnerPanel>::_MatrixTypeNested::InnerIterator
+ {
+ typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
+ enum {
+ IsRowMajor = BlockType::IsRowMajor
+ };
+ typedef typename BlockType::_MatrixTypeNested _MatrixTypeNested;
+ typedef typename BlockType::Index Index;
+ typedef typename _MatrixTypeNested::InnerIterator Base;
+ const BlockType& m_block;
+ Index m_end;
+ public:
+
+ EIGEN_STRONG_INLINE GenericSparseBlockInnerIteratorImpl(const BlockType& block, Index outer)
+ : Base(block.derived().nestedExpression(), outer + (IsRowMajor ? block.m_startRow.value() : block.m_startCol.value())),
+ m_block(block),
+ m_end(IsRowMajor ? block.m_startCol.value()+block.m_blockCols.value() : block.m_startRow.value()+block.m_blockRows.value())
+ {
+ while( (Base::operator bool()) && (Base::index() < (IsRowMajor ? m_block.m_startCol.value() : m_block.m_startRow.value())) )
+ Base::operator++();
+ }
+
+ inline Index index() const { return Base::index() - (IsRowMajor ? m_block.m_startCol.value() : m_block.m_startRow.value()); }
+ inline Index outer() const { return Base::outer() - (IsRowMajor ? m_block.m_startRow.value() : m_block.m_startCol.value()); }
+ inline Index row() const { return Base::row() - m_block.m_startRow.value(); }
+ inline Index col() const { return Base::col() - m_block.m_startCol.value(); }
+
+ inline operator bool() const { return Base::operator bool() && Base::index() < m_end; }
+ };
+
+ // Row vector of a column-major sparse matrix or column of a row-major one.
+ template<typename XprType, int BlockRows, int BlockCols, bool InnerPanel>
+ class GenericSparseBlockInnerIteratorImpl<XprType,BlockRows,BlockCols,InnerPanel,true>
+ {
+ typedef Block<XprType, BlockRows, BlockCols, InnerPanel> BlockType;
+ enum {
+ IsRowMajor = BlockType::IsRowMajor
+ };
+ typedef typename BlockType::_MatrixTypeNested _MatrixTypeNested;
+ typedef typename BlockType::Index Index;
+ typedef typename BlockType::Scalar Scalar;
+ const BlockType& m_block;
+ Index m_outerPos;
+ Index m_innerIndex;
+ Scalar m_value;
+ Index m_end;
+ public:
+
+ EIGEN_STRONG_INLINE GenericSparseBlockInnerIteratorImpl(const BlockType& block, Index outer = 0)
+ :
+ m_block(block),
+ m_outerPos( (IsRowMajor ? block.m_startCol.value() : block.m_startRow.value()) - 1), // -1 so that operator++ finds the first non-zero entry
+ m_innerIndex(IsRowMajor ? block.m_startRow.value() : block.m_startCol.value()),
+ m_end(IsRowMajor ? block.m_startCol.value()+block.m_blockCols.value() : block.m_startRow.value()+block.m_blockRows.value())
+ {
+ EIGEN_UNUSED_VARIABLE(outer);
+ eigen_assert(outer==0);
+
+ ++(*this);
+ }
+
+ inline Index index() const { return m_outerPos - (IsRowMajor ? m_block.m_startCol.value() : m_block.m_startRow.value()); }
+ inline Index outer() const { return 0; }
+ inline Index row() const { return IsRowMajor ? 0 : index(); }
+ inline Index col() const { return IsRowMajor ? index() : 0; }
+
+ inline Scalar value() const { return m_value; }
+
+ inline GenericSparseBlockInnerIteratorImpl& operator++()
+ {
+ // At end already?
+ if (m_outerPos >= m_end)
+ return *this;
+
+ // search next non-zero entry.
+ while(++m_outerPos<m_end)
+ {
+ typename XprType::InnerIterator it(m_block.m_matrix, m_outerPos);
+ // search for the key m_innerIndex in the current outer-vector
+ while(it && it.index() < m_innerIndex) ++it;
+ if(it && it.index()==m_innerIndex)
+ {
+ m_value = it.value();
+ break;
+ }
+ }
+ return *this;
+ }
+
+ inline operator bool() const { return m_outerPos < m_end; }
+ };
+
+} // end namespace internal
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_BLOCK_H
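The block expressions above back innerVector(), innerVectors(), and col()/row() on sparse matrices. A small sketch (not part of the patch), assuming a column-major SparseMatrix so that an inner vector is a column:

    #include <Eigen/Sparse>

    void block_views(Eigen::SparseMatrix<double>& A) {
      double s = A.innerVector(0).sum();                       // sum of the non-zeros of column 0
      Eigen::SparseMatrix<double> head = A.innerVectors(0, 2); // columns 0 and 1
      // Writable column block of a SparseMatrix: exercises the
      // sparse_matrix_block_impl assignment path above (sizes assumed compatible).
      A.col(1) = 2.0 * head.col(0);
      (void)s;
    }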
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseColEtree.h b/third_party/eigen3/Eigen/src/SparseCore/SparseColEtree.h
new file mode 100644
index 0000000000..f8745f4610
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseColEtree.h
@@ -0,0 +1,206 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+/*
+
+ * NOTE: This file is the modified version of sp_coletree.c file in SuperLU
+
+ * -- SuperLU routine (version 3.1) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * August 1, 2008
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSE_COLETREE_H
+#define SPARSE_COLETREE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** Find the root of the tree/set containing the vertex i, using path halving */
+template<typename Index, typename IndexVector>
+Index etree_find (Index i, IndexVector& pp)
+{
+ Index p = pp(i); // Parent
+ Index gp = pp(p); // Grand parent
+ while (gp != p)
+ {
+ pp(i) = gp; // Parent pointer on find path is changed to former grand parent
+ i = gp;
+ p = pp(i);
+ gp = pp(p);
+ }
+ return p;
+}
+
+/** Compute the column elimination tree of a sparse matrix
+ * \param mat The matrix in column-major format.
+ * \param parent The computed column elimination tree
+ * \param firstRowElt The column index of the first non-zero element in each row
+ * \param perm The permutation to apply to the columns of \b mat
+ */
+template <typename MatrixType, typename IndexVector>
+int coletree(const MatrixType& mat, IndexVector& parent, IndexVector& firstRowElt, typename MatrixType::Index *perm=0)
+{
+ typedef typename MatrixType::Index Index;
+ Index nc = mat.cols(); // Number of columns
+ Index m = mat.rows();
+ Index diagSize = (std::min)(nc,m);
+ IndexVector root(nc); // root of subtree of etree
+ root.setZero();
+ IndexVector pp(nc); // disjoint sets
+ pp.setZero(); // Initialize disjoint sets
+ parent.resize(mat.cols());
+ //Compute first nonzero column in each row
+ Index row,col;
+ firstRowElt.resize(m);
+ firstRowElt.setConstant(nc);
+ firstRowElt.segment(0, diagSize).setLinSpaced(diagSize, 0, diagSize-1);
+ bool found_diag;
+ for (col = 0; col < nc; col++)
+ {
+ Index pcol = col;
+ if(perm) pcol = perm[col];
+ for (typename MatrixType::InnerIterator it(mat, pcol); it; ++it)
+ {
+ row = it.row();
+ firstRowElt(row) = (std::min)(firstRowElt(row), col);
+ }
+ }
+ /* Compute etree by Liu's algorithm for symmetric matrices,
+ except use (firstRowElt[r],c) in place of an edge (r,c) of A.
+ Thus each row clique in A'*A is replaced by a star
+ centered at its first vertex, which has the same fill. */
+ Index rset, cset, rroot;
+ for (col = 0; col < nc; col++)
+ {
+ found_diag = col>=m;
+ pp(col) = col;
+ cset = col;
+ root(cset) = col;
+ parent(col) = nc;
+ /* The diagonal element is treated here even if it does not exist in the matrix
+ * hence the loop is executed once more */
+ Index pcol = col;
+ if(perm) pcol = perm[col];
+ for (typename MatrixType::InnerIterator it(mat, pcol); it||!found_diag; ++it)
+ { // A sequence of interleaved find and union is performed
+ Index i = col;
+ if(it) i = it.index();
+ if (i == col) found_diag = true;
+
+ row = firstRowElt(i);
+ if (row >= col) continue;
+ rset = internal::etree_find(row, pp); // Find the name of the set containing row
+ rroot = root(rset);
+ if (rroot != col)
+ {
+ parent(rroot) = col;
+ pp(cset) = rset;
+ cset = rset;
+ root(cset) = col;
+ }
+ }
+ }
+ return 0;
+}
+
+/**
+ * Depth-first search from vertex n. No recursion.
+ * This routine was contributed by Cédric Doucet, CEDRAT Group, Meylan, France.
+*/
+template <typename Index, typename IndexVector>
+void nr_etdfs (Index n, IndexVector& parent, IndexVector& first_kid, IndexVector& next_kid, IndexVector& post, Index postnum)
+{
+ Index current = n, first, next;
+ while (postnum != n)
+ {
+ // Get the first kid of the current node
+ first = first_kid(current);
+
+ // no kid for the current node
+ if (first == -1)
+ {
+ // Numbering this node because it has no kid
+ post(current) = postnum++;
+
+ // looking for the next kid
+ next = next_kid(current);
+ while (next == -1)
+ {
+ // No more kids : back to the parent node
+ current = parent(current);
+ // numbering the parent node
+ post(current) = postnum++;
+
+ // Get the next kid
+ next = next_kid(current);
+ }
+ // stopping criterion
+ if (postnum == n+1) return;
+
+ // Updating current node
+ current = next;
+ }
+ else
+ {
+ current = first;
+ }
+ }
+}
+
+
+/**
+ * \brief Post order a tree
+ * \param n the number of nodes
+ * \param parent Input tree
+ * \param post postordered tree
+ */
+template <typename Index, typename IndexVector>
+void treePostorder(Index n, IndexVector& parent, IndexVector& post)
+{
+ IndexVector first_kid, next_kid; // Linked list of children
+ Index postnum;
+ // Allocate storage for working arrays and results
+ first_kid.resize(n+1);
+ next_kid.setZero(n+1);
+ post.setZero(n+1);
+
+ // Set up structure describing children
+ Index v, dad;
+ first_kid.setConstant(-1);
+ for (v = n-1; v >= 0; v--)
+ {
+ dad = parent(v);
+ next_kid(v) = first_kid(dad);
+ first_kid(dad) = v;
+ }
+
+ // Depth-first search from dummy root vertex #n
+ postnum = 0;
+ internal::nr_etdfs(n, parent, first_kid, next_kid, post, postnum);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // SPARSE_COLETREE_H
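These helpers live in the internal namespace and are driven by the sparse solvers (SparseLU/SparseQR elsewhere in this patch). A sketch of a direct call (not part of the patch), assuming int indices and that the header is reachable through <Eigen/Sparse>; being internal API, the signatures may change between Eigen releases:

    #include <Eigen/Sparse>

    void etree_demo(const Eigen::SparseMatrix<double>& A) {
      Eigen::VectorXi parent, firstRowElt, post;
      // Column elimination tree of A; parent(i) == A.cols() marks a root.
      Eigen::internal::coletree(A, parent, firstRowElt);
      // Post-order the tree, e.g. to schedule a numeric factorization.
      Eigen::internal::treePostorder(int(A.cols()), parent, post);
    }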
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseBinaryOp.h b/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseBinaryOp.h
new file mode 100644
index 0000000000..ec86ca933c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseBinaryOp.h
@@ -0,0 +1,324 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_CWISE_BINARY_OP_H
+#define EIGEN_SPARSE_CWISE_BINARY_OP_H
+
+namespace Eigen {
+
+// Here we have to handle 3 cases:
+//  1 - sparse op dense
+//  2 - dense op sparse
+//  3 - sparse op sparse
+// We also need to implement a 4th iterator for:
+//  4 - dense op dense
+// Finally, we also need to distinguish between the product and other operations:
+//  configuration            returned mode
+//  1 - sparse op dense      product      sparse
+//                           generic      dense
+//  2 - dense op sparse      product      sparse
+//                           generic      dense
+//  3 - sparse op sparse     product      sparse
+//                           generic      sparse
+//  4 - dense op dense       product      dense
+//                           generic      dense
+
+namespace internal {
+
+template<> struct promote_storage_type<Dense,Sparse>
+{ typedef Sparse ret; };
+
+template<> struct promote_storage_type<Sparse,Dense>
+{ typedef Sparse ret; };
+
+template<typename BinaryOp, typename Lhs, typename Rhs, typename Derived,
+ typename _LhsStorageMode = typename traits<Lhs>::StorageKind,
+ typename _RhsStorageMode = typename traits<Rhs>::StorageKind>
+class sparse_cwise_binary_op_inner_iterator_selector;
+
+} // end namespace internal
+
+template<typename BinaryOp, typename Lhs, typename Rhs>
+class CwiseBinaryOpImpl<BinaryOp, Lhs, Rhs, Sparse>
+ : public SparseMatrixBase<CwiseBinaryOp<BinaryOp, Lhs, Rhs> >
+{
+ public:
+ class InnerIterator;
+ class ReverseInnerIterator;
+ typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> Derived;
+ EIGEN_SPARSE_PUBLIC_INTERFACE(Derived)
+ CwiseBinaryOpImpl()
+ {
+ typedef typename internal::traits<Lhs>::StorageKind LhsStorageKind;
+ typedef typename internal::traits<Rhs>::StorageKind RhsStorageKind;
+ EIGEN_STATIC_ASSERT((
+ (!internal::is_same<LhsStorageKind,RhsStorageKind>::value)
+ || ((Lhs::Flags&RowMajorBit) == (Rhs::Flags&RowMajorBit))),
+ THE_STORAGE_ORDER_OF_BOTH_SIDES_MUST_MATCH);
+ }
+};
+
+template<typename BinaryOp, typename Lhs, typename Rhs>
+class CwiseBinaryOpImpl<BinaryOp,Lhs,Rhs,Sparse>::InnerIterator
+ : public internal::sparse_cwise_binary_op_inner_iterator_selector<BinaryOp,Lhs,Rhs,typename CwiseBinaryOpImpl<BinaryOp,Lhs,Rhs,Sparse>::InnerIterator>
+{
+ public:
+ typedef typename Lhs::Index Index;
+ typedef internal::sparse_cwise_binary_op_inner_iterator_selector<
+ BinaryOp,Lhs,Rhs, InnerIterator> Base;
+
+ EIGEN_STRONG_INLINE InnerIterator(const CwiseBinaryOpImpl& binOp, Index outer)
+ : Base(binOp.derived(),outer)
+ {}
+};
+
+/***************************************************************************
+* Implementation of inner-iterators
+***************************************************************************/
+
+// template<typename T> struct internal::func_is_conjunction { enum { ret = false }; };
+// template<typename T> struct internal::func_is_conjunction<internal::scalar_product_op<T> > { enum { ret = true }; };
+
+// TODO generalize the internal::scalar_product_op specialization to all conjunctions if any !
+
+namespace internal {
+
+// sparse - sparse (generic)
+template<typename BinaryOp, typename Lhs, typename Rhs, typename Derived>
+class sparse_cwise_binary_op_inner_iterator_selector<BinaryOp, Lhs, Rhs, Derived, Sparse, Sparse>
+{
+ typedef CwiseBinaryOp<BinaryOp, Lhs, Rhs> CwiseBinaryXpr;
+ typedef typename traits<CwiseBinaryXpr>::Scalar Scalar;
+ typedef typename traits<CwiseBinaryXpr>::_LhsNested _LhsNested;
+ typedef typename traits<CwiseBinaryXpr>::_RhsNested _RhsNested;
+ typedef typename _LhsNested::InnerIterator LhsIterator;
+ typedef typename _RhsNested::InnerIterator RhsIterator;
+ typedef typename Lhs::Index Index;
+
+ public:
+
+ EIGEN_STRONG_INLINE sparse_cwise_binary_op_inner_iterator_selector(const CwiseBinaryXpr& xpr, Index outer)
+ : m_lhsIter(xpr.lhs(),outer), m_rhsIter(xpr.rhs(),outer), m_functor(xpr.functor())
+ {
+ this->operator++();
+ }
+
+ EIGEN_STRONG_INLINE Derived& operator++()
+ {
+ if (m_lhsIter && m_rhsIter && (m_lhsIter.index() == m_rhsIter.index()))
+ {
+ m_id = m_lhsIter.index();
+ m_value = m_functor(m_lhsIter.value(), m_rhsIter.value());
+ ++m_lhsIter;
+ ++m_rhsIter;
+ }
+ else if (m_lhsIter && (!m_rhsIter || (m_lhsIter.index() < m_rhsIter.index())))
+ {
+ m_id = m_lhsIter.index();
+ m_value = m_functor(m_lhsIter.value(), Scalar(0));
+ ++m_lhsIter;
+ }
+ else if (m_rhsIter && (!m_lhsIter || (m_lhsIter.index() > m_rhsIter.index())))
+ {
+ m_id = m_rhsIter.index();
+ m_value = m_functor(Scalar(0), m_rhsIter.value());
+ ++m_rhsIter;
+ }
+ else
+ {
+ m_value = 0; // this is to avoid a compilation warning
+ m_id = -1;
+ }
+ return *static_cast<Derived*>(this);
+ }
+
+ EIGEN_STRONG_INLINE Scalar value() const { return m_value; }
+
+ EIGEN_STRONG_INLINE Index index() const { return m_id; }
+ EIGEN_STRONG_INLINE Index row() const { return Lhs::IsRowMajor ? m_lhsIter.row() : index(); }
+ EIGEN_STRONG_INLINE Index col() const { return Lhs::IsRowMajor ? index() : m_lhsIter.col(); }
+
+ EIGEN_STRONG_INLINE operator bool() const { return m_id>=0; }
+
+ protected:
+ LhsIterator m_lhsIter;
+ RhsIterator m_rhsIter;
+ const BinaryOp& m_functor;
+ Scalar m_value;
+ Index m_id;
+};
+
+// sparse - sparse (product)
+template<typename T, typename Lhs, typename Rhs, typename Derived>
+class sparse_cwise_binary_op_inner_iterator_selector<scalar_product_op<T>, Lhs, Rhs, Derived, Sparse, Sparse>
+{
+ typedef scalar_product_op<T> BinaryFunc;
+ typedef CwiseBinaryOp<BinaryFunc, Lhs, Rhs> CwiseBinaryXpr;
+ typedef typename CwiseBinaryXpr::Scalar Scalar;
+ typedef typename traits<CwiseBinaryXpr>::_LhsNested _LhsNested;
+ typedef typename _LhsNested::InnerIterator LhsIterator;
+ typedef typename traits<CwiseBinaryXpr>::_RhsNested _RhsNested;
+ typedef typename _RhsNested::InnerIterator RhsIterator;
+ typedef typename Lhs::Index Index;
+ public:
+
+ EIGEN_STRONG_INLINE sparse_cwise_binary_op_inner_iterator_selector(const CwiseBinaryXpr& xpr, Index outer)
+ : m_lhsIter(xpr.lhs(),outer), m_rhsIter(xpr.rhs(),outer), m_functor(xpr.functor())
+ {
+ while (m_lhsIter && m_rhsIter && (m_lhsIter.index() != m_rhsIter.index()))
+ {
+ if (m_lhsIter.index() < m_rhsIter.index())
+ ++m_lhsIter;
+ else
+ ++m_rhsIter;
+ }
+ }
+
+ EIGEN_STRONG_INLINE Derived& operator++()
+ {
+ ++m_lhsIter;
+ ++m_rhsIter;
+ while (m_lhsIter && m_rhsIter && (m_lhsIter.index() != m_rhsIter.index()))
+ {
+ if (m_lhsIter.index() < m_rhsIter.index())
+ ++m_lhsIter;
+ else
+ ++m_rhsIter;
+ }
+ return *static_cast<Derived*>(this);
+ }
+
+ EIGEN_STRONG_INLINE Scalar value() const { return m_functor(m_lhsIter.value(), m_rhsIter.value()); }
+
+ EIGEN_STRONG_INLINE Index index() const { return m_lhsIter.index(); }
+ EIGEN_STRONG_INLINE Index row() const { return m_lhsIter.row(); }
+ EIGEN_STRONG_INLINE Index col() const { return m_lhsIter.col(); }
+
+ EIGEN_STRONG_INLINE operator bool() const { return (m_lhsIter && m_rhsIter); }
+
+ protected:
+ LhsIterator m_lhsIter;
+ RhsIterator m_rhsIter;
+ const BinaryFunc& m_functor;
+};
+
+// sparse - dense (product)
+template<typename T, typename Lhs, typename Rhs, typename Derived>
+class sparse_cwise_binary_op_inner_iterator_selector<scalar_product_op<T>, Lhs, Rhs, Derived, Sparse, Dense>
+{
+ typedef scalar_product_op<T> BinaryFunc;
+ typedef CwiseBinaryOp<BinaryFunc, Lhs, Rhs> CwiseBinaryXpr;
+ typedef typename CwiseBinaryXpr::Scalar Scalar;
+ typedef typename traits<CwiseBinaryXpr>::_LhsNested _LhsNested;
+ typedef typename traits<CwiseBinaryXpr>::RhsNested RhsNested;
+ typedef typename _LhsNested::InnerIterator LhsIterator;
+ typedef typename Lhs::Index Index;
+ enum { IsRowMajor = (int(Lhs::Flags)&RowMajorBit)==RowMajorBit };
+ public:
+
+ EIGEN_STRONG_INLINE sparse_cwise_binary_op_inner_iterator_selector(const CwiseBinaryXpr& xpr, Index outer)
+ : m_rhs(xpr.rhs()), m_lhsIter(xpr.lhs(),outer), m_functor(xpr.functor()), m_outer(outer)
+ {}
+
+ EIGEN_STRONG_INLINE Derived& operator++()
+ {
+ ++m_lhsIter;
+ return *static_cast<Derived*>(this);
+ }
+
+ EIGEN_STRONG_INLINE Scalar value() const
+ { return m_functor(m_lhsIter.value(),
+ m_rhs.coeff(IsRowMajor?m_outer:m_lhsIter.index(),IsRowMajor?m_lhsIter.index():m_outer)); }
+
+ EIGEN_STRONG_INLINE Index index() const { return m_lhsIter.index(); }
+ EIGEN_STRONG_INLINE Index row() const { return m_lhsIter.row(); }
+ EIGEN_STRONG_INLINE Index col() const { return m_lhsIter.col(); }
+
+ EIGEN_STRONG_INLINE operator bool() const { return m_lhsIter; }
+
+ protected:
+ RhsNested m_rhs;
+ LhsIterator m_lhsIter;
+ const BinaryFunc m_functor;
+ const Index m_outer;
+};
+
+// dense - sparse (product)
+template<typename T, typename Lhs, typename Rhs, typename Derived>
+class sparse_cwise_binary_op_inner_iterator_selector<scalar_product_op<T>, Lhs, Rhs, Derived, Dense, Sparse>
+{
+ typedef scalar_product_op<T> BinaryFunc;
+ typedef CwiseBinaryOp<BinaryFunc, Lhs, Rhs> CwiseBinaryXpr;
+ typedef typename CwiseBinaryXpr::Scalar Scalar;
+ typedef typename traits<CwiseBinaryXpr>::_RhsNested _RhsNested;
+ typedef typename _RhsNested::InnerIterator RhsIterator;
+ typedef typename Lhs::Index Index;
+
+ enum { IsRowMajor = (int(Rhs::Flags)&RowMajorBit)==RowMajorBit };
+ public:
+
+ EIGEN_STRONG_INLINE sparse_cwise_binary_op_inner_iterator_selector(const CwiseBinaryXpr& xpr, Index outer)
+ : m_xpr(xpr), m_rhsIter(xpr.rhs(),outer), m_functor(xpr.functor()), m_outer(outer)
+ {}
+
+ EIGEN_STRONG_INLINE Derived& operator++()
+ {
+ ++m_rhsIter;
+ return *static_cast<Derived*>(this);
+ }
+
+ EIGEN_STRONG_INLINE Scalar value() const
+ { return m_functor(m_xpr.lhs().coeff(IsRowMajor?m_outer:m_rhsIter.index(),IsRowMajor?m_rhsIter.index():m_outer), m_rhsIter.value()); }
+
+ EIGEN_STRONG_INLINE Index index() const { return m_rhsIter.index(); }
+ EIGEN_STRONG_INLINE Index row() const { return m_rhsIter.row(); }
+ EIGEN_STRONG_INLINE Index col() const { return m_rhsIter.col(); }
+
+ EIGEN_STRONG_INLINE operator bool() const { return m_rhsIter; }
+
+ protected:
+ const CwiseBinaryXpr& m_xpr;
+ RhsIterator m_rhsIter;
+ const BinaryFunc& m_functor;
+ const Index m_outer;
+};
+
+} // end namespace internal
+
+/***************************************************************************
+* Implementation of SparseMatrixBase and SparseCwise functions/operators
+***************************************************************************/
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+SparseMatrixBase<Derived>::operator-=(const SparseMatrixBase<OtherDerived> &other)
+{
+ return derived() = derived() - other.derived();
+}
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE Derived &
+SparseMatrixBase<Derived>::operator+=(const SparseMatrixBase<OtherDerived>& other)
+{
+ return derived() = derived() + other.derived();
+}
+
+template<typename Derived>
+template<typename OtherDerived>
+EIGEN_STRONG_INLINE const EIGEN_SPARSE_CWISE_PRODUCT_RETURN_TYPE
+SparseMatrixBase<Derived>::cwiseProduct(const MatrixBase<OtherDerived> &other) const
+{
+ return EIGEN_SPARSE_CWISE_PRODUCT_RETURN_TYPE(derived(), other.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_CWISE_BINARY_OP_H
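A short sketch (not part of the patch) of the expressions the iterators above serve, assuming conformant sizes and the default column-major storage on both operands (mixed storage orders trigger the static assertion in CwiseBinaryOpImpl):

    #include <Eigen/Sparse>

    void binary_ops(const Eigen::SparseMatrix<double>& A,
                    const Eigen::SparseMatrix<double>& B,
                    const Eigen::MatrixXd& D) {
      Eigen::SparseMatrix<double> S = A + B;             // sparse op sparse, generic iterator
      Eigen::SparseMatrix<double> P = A.cwiseProduct(B); // sparse op sparse, product iterator
      Eigen::SparseMatrix<double> Q = A.cwiseProduct(D); // sparse-dense product stays sparse
      (void)S; (void)P; (void)Q;
    }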
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseUnaryOp.h b/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseUnaryOp.h
new file mode 100644
index 0000000000..5a50c78030
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseCwiseUnaryOp.h
@@ -0,0 +1,163 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_CWISE_UNARY_OP_H
+#define EIGEN_SPARSE_CWISE_UNARY_OP_H
+
+namespace Eigen {
+
+template<typename UnaryOp, typename MatrixType>
+class CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>
+ : public SparseMatrixBase<CwiseUnaryOp<UnaryOp, MatrixType> >
+{
+ public:
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ typedef CwiseUnaryOp<UnaryOp, MatrixType> Derived;
+ EIGEN_SPARSE_PUBLIC_INTERFACE(Derived)
+
+ protected:
+ typedef typename internal::traits<Derived>::_XprTypeNested _MatrixTypeNested;
+ typedef typename _MatrixTypeNested::InnerIterator MatrixTypeIterator;
+ typedef typename _MatrixTypeNested::ReverseInnerIterator MatrixTypeReverseIterator;
+};
+
+template<typename UnaryOp, typename MatrixType>
+class CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::InnerIterator
+ : public CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::MatrixTypeIterator
+{
+ typedef typename CwiseUnaryOpImpl::Scalar Scalar;
+ typedef typename CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::MatrixTypeIterator Base;
+ public:
+
+ EIGEN_STRONG_INLINE InnerIterator(const CwiseUnaryOpImpl& unaryOp, typename CwiseUnaryOpImpl::Index outer)
+ : Base(unaryOp.derived().nestedExpression(),outer), m_functor(unaryOp.derived().functor())
+ {}
+
+ EIGEN_STRONG_INLINE InnerIterator& operator++()
+ { Base::operator++(); return *this; }
+
+ EIGEN_STRONG_INLINE typename CwiseUnaryOpImpl::Scalar value() const { return m_functor(Base::value()); }
+
+ protected:
+ const UnaryOp m_functor;
+ private:
+ typename CwiseUnaryOpImpl::Scalar& valueRef();
+};
+
+template<typename UnaryOp, typename MatrixType>
+class CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::ReverseInnerIterator
+ : public CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::MatrixTypeReverseIterator
+{
+ typedef typename CwiseUnaryOpImpl::Scalar Scalar;
+ typedef typename CwiseUnaryOpImpl<UnaryOp,MatrixType,Sparse>::MatrixTypeReverseIterator Base;
+ public:
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator(const CwiseUnaryOpImpl& unaryOp, typename CwiseUnaryOpImpl::Index outer)
+ : Base(unaryOp.derived().nestedExpression(),outer), m_functor(unaryOp.derived().functor())
+ {}
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator& operator--()
+ { Base::operator--(); return *this; }
+
+ EIGEN_STRONG_INLINE typename CwiseUnaryOpImpl::Scalar value() const { return m_functor(Base::value()); }
+
+ protected:
+ const UnaryOp m_functor;
+ private:
+ typename CwiseUnaryOpImpl::Scalar& valueRef();
+};
+
+template<typename ViewOp, typename MatrixType>
+class CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>
+ : public SparseMatrixBase<CwiseUnaryView<ViewOp, MatrixType> >
+{
+ public:
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ typedef CwiseUnaryView<ViewOp, MatrixType> Derived;
+ EIGEN_SPARSE_PUBLIC_INTERFACE(Derived)
+
+ protected:
+ typedef typename internal::traits<Derived>::_MatrixTypeNested _MatrixTypeNested;
+ typedef typename _MatrixTypeNested::InnerIterator MatrixTypeIterator;
+ typedef typename _MatrixTypeNested::ReverseInnerIterator MatrixTypeReverseIterator;
+};
+
+template<typename ViewOp, typename MatrixType>
+class CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::InnerIterator
+ : public CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::MatrixTypeIterator
+{
+ typedef typename CwiseUnaryViewImpl::Scalar Scalar;
+ typedef typename CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::MatrixTypeIterator Base;
+ public:
+
+ EIGEN_STRONG_INLINE InnerIterator(const CwiseUnaryViewImpl& unaryOp, typename CwiseUnaryViewImpl::Index outer)
+ : Base(unaryOp.derived().nestedExpression(),outer), m_functor(unaryOp.derived().functor())
+ {}
+
+ EIGEN_STRONG_INLINE InnerIterator& operator++()
+ { Base::operator++(); return *this; }
+
+ EIGEN_STRONG_INLINE typename CwiseUnaryViewImpl::Scalar value() const { return m_functor(Base::value()); }
+ EIGEN_STRONG_INLINE typename CwiseUnaryViewImpl::Scalar& valueRef() { return m_functor(Base::valueRef()); }
+
+ protected:
+ const ViewOp m_functor;
+};
+
+template<typename ViewOp, typename MatrixType>
+class CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::ReverseInnerIterator
+ : public CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::MatrixTypeReverseIterator
+{
+ typedef typename CwiseUnaryViewImpl::Scalar Scalar;
+ typedef typename CwiseUnaryViewImpl<ViewOp,MatrixType,Sparse>::MatrixTypeReverseIterator Base;
+ public:
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator(const CwiseUnaryViewImpl& unaryOp, typename CwiseUnaryViewImpl::Index outer)
+ : Base(unaryOp.derived().nestedExpression(),outer), m_functor(unaryOp.derived().functor())
+ {}
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator& operator--()
+ { Base::operator--(); return *this; }
+
+ EIGEN_STRONG_INLINE typename CwiseUnaryViewImpl::Scalar value() const { return m_functor(Base::value()); }
+ EIGEN_STRONG_INLINE typename CwiseUnaryViewImpl::Scalar& valueRef() { return m_functor(Base::valueRef()); }
+
+ protected:
+ const ViewOp m_functor;
+};
+
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+SparseMatrixBase<Derived>::operator*=(const Scalar& other)
+{
+ for (Index j=0; j<outerSize(); ++j)
+ for (typename Derived::InnerIterator i(derived(),j); i; ++i)
+ i.valueRef() *= other;
+ return derived();
+}
+
+template<typename Derived>
+EIGEN_STRONG_INLINE Derived&
+SparseMatrixBase<Derived>::operator/=(const Scalar& other)
+{
+ for (Index j=0; j<outerSize(); ++j)
+ for (typename Derived::InnerIterator i(derived(),j); i; ++i)
+ i.valueRef() /= other;
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_CWISE_UNARY_OP_H
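A sketch (not part of the patch) of the coefficient-wise unary paths above; operator*= and operator/= only visit the stored non-zeros:

    #include <Eigen/Sparse>

    void unary_ops(Eigen::SparseMatrix<double>& A) {
      A *= 0.5;                                     // in-place scaling over the non-zeros
      A /= 2.0;
      Eigen::SparseMatrix<double> B = A.cwiseAbs(); // unary functor applied through InnerIterator::value()
      (void)B;
    }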
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseDenseProduct.h b/third_party/eigen3/Eigen/src/SparseCore/SparseDenseProduct.h
new file mode 100644
index 0000000000..610833f3b0
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseDenseProduct.h
@@ -0,0 +1,311 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEDENSEPRODUCT_H
+#define EIGEN_SPARSEDENSEPRODUCT_H
+
+namespace Eigen {
+
+template<typename Lhs, typename Rhs, int InnerSize> struct SparseDenseProductReturnType
+{
+ typedef SparseTimeDenseProduct<Lhs,Rhs> Type;
+};
+
+template<typename Lhs, typename Rhs> struct SparseDenseProductReturnType<Lhs,Rhs,1>
+{
+ typedef SparseDenseOuterProduct<Lhs,Rhs,false> Type;
+};
+
+template<typename Lhs, typename Rhs, int InnerSize> struct DenseSparseProductReturnType
+{
+ typedef DenseTimeSparseProduct<Lhs,Rhs> Type;
+};
+
+template<typename Lhs, typename Rhs> struct DenseSparseProductReturnType<Lhs,Rhs,1>
+{
+ typedef SparseDenseOuterProduct<Rhs,Lhs,true> Type;
+};
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, bool Tr>
+struct traits<SparseDenseOuterProduct<Lhs,Rhs,Tr> >
+{
+ typedef Sparse StorageKind;
+ typedef typename scalar_product_traits<typename traits<Lhs>::Scalar,
+ typename traits<Rhs>::Scalar>::ReturnType Scalar;
+ typedef typename Lhs::Index Index;
+ typedef typename Lhs::Nested LhsNested;
+ typedef typename Rhs::Nested RhsNested;
+ typedef typename remove_all<LhsNested>::type _LhsNested;
+ typedef typename remove_all<RhsNested>::type _RhsNested;
+
+ enum {
+ LhsCoeffReadCost = traits<_LhsNested>::CoeffReadCost,
+ RhsCoeffReadCost = traits<_RhsNested>::CoeffReadCost,
+
+ RowsAtCompileTime = Tr ? int(traits<Rhs>::RowsAtCompileTime) : int(traits<Lhs>::RowsAtCompileTime),
+ ColsAtCompileTime = Tr ? int(traits<Lhs>::ColsAtCompileTime) : int(traits<Rhs>::ColsAtCompileTime),
+ MaxRowsAtCompileTime = Tr ? int(traits<Rhs>::MaxRowsAtCompileTime) : int(traits<Lhs>::MaxRowsAtCompileTime),
+ MaxColsAtCompileTime = Tr ? int(traits<Lhs>::MaxColsAtCompileTime) : int(traits<Rhs>::MaxColsAtCompileTime),
+
+ Flags = Tr ? RowMajorBit : 0,
+
+ CoeffReadCost = LhsCoeffReadCost + RhsCoeffReadCost + NumTraits<Scalar>::MulCost
+ };
+};
+
+} // end namespace internal
+
+template<typename Lhs, typename Rhs, bool Tr>
+class SparseDenseOuterProduct
+ : public SparseMatrixBase<SparseDenseOuterProduct<Lhs,Rhs,Tr> >
+{
+ public:
+
+ typedef SparseMatrixBase<SparseDenseOuterProduct> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(SparseDenseOuterProduct)
+ typedef internal::traits<SparseDenseOuterProduct> Traits;
+
+ private:
+
+ typedef typename Traits::LhsNested LhsNested;
+ typedef typename Traits::RhsNested RhsNested;
+ typedef typename Traits::_LhsNested _LhsNested;
+ typedef typename Traits::_RhsNested _RhsNested;
+
+ public:
+
+ class InnerIterator;
+
+ EIGEN_STRONG_INLINE SparseDenseOuterProduct(const Lhs& lhs, const Rhs& rhs)
+ : m_lhs(lhs), m_rhs(rhs)
+ {
+ EIGEN_STATIC_ASSERT(!Tr,YOU_MADE_A_PROGRAMMING_MISTAKE);
+ }
+
+ EIGEN_STRONG_INLINE SparseDenseOuterProduct(const Rhs& rhs, const Lhs& lhs)
+ : m_lhs(lhs), m_rhs(rhs)
+ {
+ EIGEN_STATIC_ASSERT(Tr,YOU_MADE_A_PROGRAMMING_MISTAKE);
+ }
+
+ EIGEN_STRONG_INLINE Index rows() const { return Tr ? m_rhs.rows() : m_lhs.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return Tr ? m_lhs.cols() : m_rhs.cols(); }
+
+ EIGEN_STRONG_INLINE const _LhsNested& lhs() const { return m_lhs; }
+ EIGEN_STRONG_INLINE const _RhsNested& rhs() const { return m_rhs; }
+
+ protected:
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+};
+
+template<typename Lhs, typename Rhs, bool Transpose>
+class SparseDenseOuterProduct<Lhs,Rhs,Transpose>::InnerIterator : public _LhsNested::InnerIterator
+{
+ typedef typename _LhsNested::InnerIterator Base;
+ typedef typename SparseDenseOuterProduct::Index Index;
+ public:
+ EIGEN_STRONG_INLINE InnerIterator(const SparseDenseOuterProduct& prod, Index outer)
+ : Base(prod.lhs(), 0), m_outer(outer), m_factor(prod.rhs().coeff(outer))
+ {
+ }
+
+ inline Index outer() const { return m_outer; }
+ inline Index row() const { return Transpose ? Base::row() : m_outer; }
+ inline Index col() const { return Transpose ? m_outer : Base::row(); }
+
+ inline Scalar value() const { return Base::value() * m_factor; }
+
+ protected:
+ Index m_outer;
+ Scalar m_factor;
+};
+
+namespace internal {
+template<typename Lhs, typename Rhs>
+struct traits<SparseTimeDenseProduct<Lhs,Rhs> >
+ : traits<ProductBase<SparseTimeDenseProduct<Lhs,Rhs>, Lhs, Rhs> >
+{
+ typedef Dense StorageKind;
+ typedef MatrixXpr XprKind;
+};
+
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType,
+ typename AlphaType,
+ int LhsStorageOrder = ((SparseLhsType::Flags&RowMajorBit)==RowMajorBit) ? RowMajor : ColMajor,
+ bool ColPerCol = ((DenseRhsType::Flags&RowMajorBit)==0) || DenseRhsType::ColsAtCompileTime==1>
+struct sparse_time_dense_product_impl;
+
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType>
+struct sparse_time_dense_product_impl<SparseLhsType,DenseRhsType,DenseResType, typename DenseResType::Scalar, RowMajor, true>
+{
+ typedef typename internal::remove_all<SparseLhsType>::type Lhs;
+ typedef typename internal::remove_all<DenseRhsType>::type Rhs;
+ typedef typename internal::remove_all<DenseResType>::type Res;
+ typedef typename Lhs::Index Index;
+ typedef typename Lhs::InnerIterator LhsInnerIterator;
+ static void run(const SparseLhsType& lhs, const DenseRhsType& rhs, DenseResType& res, const typename Res::Scalar& alpha)
+ {
+ for(Index c=0; c<rhs.cols(); ++c)
+ {
+ Index n = lhs.outerSize();
+ for(Index j=0; j<n; ++j)
+ {
+ typename Res::Scalar tmp(0);
+ for(LhsInnerIterator it(lhs,j); it ;++it)
+ tmp += it.value() * rhs.coeff(it.index(),c);
+ res.coeffRef(j,c) = alpha * tmp;
+ }
+ }
+ }
+};
+
+template<typename T1, typename T2/*, int _Options, typename _StrideType*/>
+struct scalar_product_traits<T1, Ref<T2/*, _Options, _StrideType*/> >
+{
+ enum {
+ Defined = 1
+ };
+ typedef typename CwiseUnaryOp<scalar_multiple2_op<T1, typename T2::Scalar>, T2>::PlainObject ReturnType;
+};
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType, typename AlphaType>
+struct sparse_time_dense_product_impl<SparseLhsType,DenseRhsType,DenseResType, AlphaType, ColMajor, true>
+{
+ typedef typename internal::remove_all<SparseLhsType>::type Lhs;
+ typedef typename internal::remove_all<DenseRhsType>::type Rhs;
+ typedef typename internal::remove_all<DenseResType>::type Res;
+ typedef typename Lhs::InnerIterator LhsInnerIterator;
+ typedef typename Lhs::Index Index;
+ static void run(const SparseLhsType& lhs, const DenseRhsType& rhs, DenseResType& res, const AlphaType& alpha)
+ {
+ for(Index c=0; c<rhs.cols(); ++c)
+ {
+ for(Index j=0; j<lhs.outerSize(); ++j)
+ {
+// typename Res::Scalar rhs_j = alpha * rhs.coeff(j,c);
+ typename internal::scalar_product_traits<AlphaType, typename Rhs::Scalar>::ReturnType rhs_j(alpha * rhs.coeff(j,c));
+ for(LhsInnerIterator it(lhs,j); it ;++it)
+ res.coeffRef(it.index(),c) += it.value() * rhs_j;
+ }
+ }
+ }
+};
+
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType>
+struct sparse_time_dense_product_impl<SparseLhsType,DenseRhsType,DenseResType, typename DenseResType::Scalar, RowMajor, false>
+{
+ typedef typename internal::remove_all<SparseLhsType>::type Lhs;
+ typedef typename internal::remove_all<DenseRhsType>::type Rhs;
+ typedef typename internal::remove_all<DenseResType>::type Res;
+ typedef typename Lhs::InnerIterator LhsInnerIterator;
+ typedef typename Lhs::Index Index;
+ static void run(const SparseLhsType& lhs, const DenseRhsType& rhs, DenseResType& res, const typename Res::Scalar& alpha)
+ {
+ for(Index j=0; j<lhs.outerSize(); ++j)
+ {
+ typename Res::RowXpr res_j(res.row(j));
+ for(LhsInnerIterator it(lhs,j); it ;++it)
+ res_j += (alpha*it.value()) * rhs.row(it.index());
+ }
+ }
+};
+
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType>
+struct sparse_time_dense_product_impl<SparseLhsType,DenseRhsType,DenseResType, typename DenseResType::Scalar, ColMajor, false>
+{
+ typedef typename internal::remove_all<SparseLhsType>::type Lhs;
+ typedef typename internal::remove_all<DenseRhsType>::type Rhs;
+ typedef typename internal::remove_all<DenseResType>::type Res;
+ typedef typename Lhs::InnerIterator LhsInnerIterator;
+ typedef typename Lhs::Index Index;
+ static void run(const SparseLhsType& lhs, const DenseRhsType& rhs, DenseResType& res, const typename Res::Scalar& alpha)
+ {
+ for(Index j=0; j<lhs.outerSize(); ++j)
+ {
+ typename Rhs::ConstRowXpr rhs_j(rhs.row(j));
+ for(LhsInnerIterator it(lhs,j); it ;++it)
+ res.row(it.index()) += (alpha*it.value()) * rhs_j;
+ }
+ }
+};
+
+template<typename SparseLhsType, typename DenseRhsType, typename DenseResType,typename AlphaType>
+inline void sparse_time_dense_product(const SparseLhsType& lhs, const DenseRhsType& rhs, DenseResType& res, const AlphaType& alpha)
+{
+ sparse_time_dense_product_impl<SparseLhsType,DenseRhsType,DenseResType, AlphaType>::run(lhs, rhs, res, alpha);
+}
+
+} // end namespace internal
+
+template<typename Lhs, typename Rhs>
+class SparseTimeDenseProduct
+ : public ProductBase<SparseTimeDenseProduct<Lhs,Rhs>, Lhs, Rhs>
+{
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(SparseTimeDenseProduct)
+
+ SparseTimeDenseProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ internal::sparse_time_dense_product(m_lhs, m_rhs, dest, alpha);
+ }
+
+ private:
+ SparseTimeDenseProduct& operator=(const SparseTimeDenseProduct&);
+};
+
+
+// dense = dense * sparse
+namespace internal {
+template<typename Lhs, typename Rhs>
+struct traits<DenseTimeSparseProduct<Lhs,Rhs> >
+ : traits<ProductBase<DenseTimeSparseProduct<Lhs,Rhs>, Lhs, Rhs> >
+{
+ typedef Dense StorageKind;
+};
+} // end namespace internal
+
+template<typename Lhs, typename Rhs>
+class DenseTimeSparseProduct
+ : public ProductBase<DenseTimeSparseProduct<Lhs,Rhs>, Lhs, Rhs>
+{
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(DenseTimeSparseProduct)
+
+ DenseTimeSparseProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ Transpose<const _LhsNested> lhs_t(m_lhs);
+ Transpose<const _RhsNested> rhs_t(m_rhs);
+ Transpose<Dest> dest_t(dest);
+ internal::sparse_time_dense_product(rhs_t, lhs_t, dest_t, alpha);
+ }
+
+ private:
+ DenseTimeSparseProduct& operator=(const DenseTimeSparseProduct&);
+};
+
+// sparse * dense
+template<typename Derived>
+template<typename OtherDerived>
+inline const typename SparseDenseProductReturnType<Derived,OtherDerived>::Type
+SparseMatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
+{
+ return typename SparseDenseProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEDENSEPRODUCT_H
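The specializations above cover the four combinations of sparse-operand storage order and column-by-column versus whole-row processing for res += alpha * (sparse * dense). A minimal usage sketch of the expressions that reach this file, assuming only the public Eigen 3 headers (sizes and values are illustrative and not taken from the patch):

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <iostream>

int main() {
  // Small column-major sparse matrix A.
  Eigen::SparseMatrix<double> A(3, 3);
  A.insert(0, 0) = 2.0;
  A.insert(1, 2) = -1.0;
  A.insert(2, 1) = 4.0;
  A.makeCompressed();

  Eigen::VectorXd x(3);
  x << 1.0, 2.0, 3.0;

  // sparse * dense: evaluated through SparseTimeDenseProduct and the
  // sparse_time_dense_product_impl kernels defined above.
  Eigen::VectorXd y = A * x;

  // dense * sparse: DenseTimeSparseProduct transposes both operands and
  // reuses the same kernels.
  Eigen::RowVectorXd z = x.transpose() * A;

  std::cout << y.transpose() << "\n" << z << std::endl;
  return 0;
}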
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseDiagonalProduct.h b/third_party/eigen3/Eigen/src/SparseCore/SparseDiagonalProduct.h
new file mode 100644
index 0000000000..1bb590e64d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseDiagonalProduct.h
@@ -0,0 +1,196 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_DIAGONAL_PRODUCT_H
+#define EIGEN_SPARSE_DIAGONAL_PRODUCT_H
+
+namespace Eigen {
+
+// The product of a diagonal matrix with a sparse matrix can be easily
+// implemented using expression template.
+// We have to consider two very different cases:
+// 1 - diag * row-major sparse
+// => each inner vector <=> scalar * sparse vector product
+// => so we can reuse CwiseUnaryOp::InnerIterator
+// 2 - diag * col-major sparse
+// => each inner vector <=> dense vector * sparse vector cwise product
+// => again, we can reuse specialization of CwiseBinaryOp::InnerIterator
+// for that particular case
+// The two other cases are symmetric.
+
+namespace internal {
+
+template<typename Lhs, typename Rhs>
+struct traits<SparseDiagonalProduct<Lhs, Rhs> >
+{
+ typedef typename remove_all<Lhs>::type _Lhs;
+ typedef typename remove_all<Rhs>::type _Rhs;
+ typedef typename _Lhs::Scalar Scalar;
+ typedef typename promote_index_type<typename traits<Lhs>::Index,
+ typename traits<Rhs>::Index>::type Index;
+ typedef Sparse StorageKind;
+ typedef MatrixXpr XprKind;
+ enum {
+ RowsAtCompileTime = _Lhs::RowsAtCompileTime,
+ ColsAtCompileTime = _Rhs::ColsAtCompileTime,
+
+ MaxRowsAtCompileTime = _Lhs::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = _Rhs::MaxColsAtCompileTime,
+
+ SparseFlags = is_diagonal<_Lhs>::ret ? int(_Rhs::Flags) : int(_Lhs::Flags),
+ Flags = (SparseFlags&RowMajorBit),
+ CoeffReadCost = Dynamic
+ };
+};
+
+enum {SDP_IsDiagonal, SDP_IsSparseRowMajor, SDP_IsSparseColMajor};
+template<typename Lhs, typename Rhs, typename SparseDiagonalProductType, int RhsMode, int LhsMode>
+class sparse_diagonal_product_inner_iterator_selector;
+
+} // end namespace internal
+
+template<typename Lhs, typename Rhs>
+class SparseDiagonalProduct
+ : public SparseMatrixBase<SparseDiagonalProduct<Lhs,Rhs> >,
+ internal::no_assignment_operator
+{
+ typedef typename Lhs::Nested LhsNested;
+ typedef typename Rhs::Nested RhsNested;
+
+ typedef typename internal::remove_all<LhsNested>::type _LhsNested;
+ typedef typename internal::remove_all<RhsNested>::type _RhsNested;
+
+ enum {
+ LhsMode = internal::is_diagonal<_LhsNested>::ret ? internal::SDP_IsDiagonal
+ : (_LhsNested::Flags&RowMajorBit) ? internal::SDP_IsSparseRowMajor : internal::SDP_IsSparseColMajor,
+ RhsMode = internal::is_diagonal<_RhsNested>::ret ? internal::SDP_IsDiagonal
+ : (_RhsNested::Flags&RowMajorBit) ? internal::SDP_IsSparseRowMajor : internal::SDP_IsSparseColMajor
+ };
+
+ public:
+
+ EIGEN_SPARSE_PUBLIC_INTERFACE(SparseDiagonalProduct)
+
+ typedef internal::sparse_diagonal_product_inner_iterator_selector
+ <_LhsNested,_RhsNested,SparseDiagonalProduct,LhsMode,RhsMode> InnerIterator;
+
+ // We do not want ReverseInnerIterator for diagonal-sparse products,
+ // but this dummy declaration is needed to make diag * sparse * diag compile.
+ class ReverseInnerIterator;
+
+ EIGEN_STRONG_INLINE SparseDiagonalProduct(const Lhs& lhs, const Rhs& rhs)
+ : m_lhs(lhs), m_rhs(rhs)
+ {
+ eigen_assert(lhs.cols() == rhs.rows() && "invalid sparse matrix * diagonal matrix product");
+ }
+
+ EIGEN_STRONG_INLINE Index rows() const { return m_lhs.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return m_rhs.cols(); }
+
+ EIGEN_STRONG_INLINE const _LhsNested& lhs() const { return m_lhs; }
+ EIGEN_STRONG_INLINE const _RhsNested& rhs() const { return m_rhs; }
+
+ protected:
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+};
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, typename SparseDiagonalProductType>
+class sparse_diagonal_product_inner_iterator_selector
+<Lhs,Rhs,SparseDiagonalProductType,SDP_IsDiagonal,SDP_IsSparseRowMajor>
+ : public CwiseUnaryOp<scalar_multiple_op<typename Lhs::Scalar>,const Rhs>::InnerIterator
+{
+ typedef typename CwiseUnaryOp<scalar_multiple_op<typename Lhs::Scalar>,const Rhs>::InnerIterator Base;
+ typedef typename Lhs::Index Index;
+ public:
+ inline sparse_diagonal_product_inner_iterator_selector(
+ const SparseDiagonalProductType& expr, Index outer)
+ : Base(expr.rhs()*(expr.lhs().diagonal().coeff(outer)), outer)
+ {}
+};
+
+template<typename Lhs, typename Rhs, typename SparseDiagonalProductType>
+class sparse_diagonal_product_inner_iterator_selector
+<Lhs,Rhs,SparseDiagonalProductType,SDP_IsDiagonal,SDP_IsSparseColMajor>
+ : public CwiseBinaryOp<
+ scalar_product_op<typename Lhs::Scalar>,
+ const typename Rhs::ConstInnerVectorReturnType,
+ const typename Lhs::DiagonalVectorType>::InnerIterator
+{
+ typedef typename CwiseBinaryOp<
+ scalar_product_op<typename Lhs::Scalar>,
+ const typename Rhs::ConstInnerVectorReturnType,
+ const typename Lhs::DiagonalVectorType>::InnerIterator Base;
+ typedef typename Lhs::Index Index;
+ Index m_outer;
+ public:
+ inline sparse_diagonal_product_inner_iterator_selector(
+ const SparseDiagonalProductType& expr, Index outer)
+ : Base(expr.rhs().innerVector(outer).cwiseProduct(expr.lhs().diagonal()), 0), m_outer(outer)
+ {}
+
+ inline Index outer() const { return m_outer; }
+ inline Index col() const { return m_outer; }
+};
+
+template<typename Lhs, typename Rhs, typename SparseDiagonalProductType>
+class sparse_diagonal_product_inner_iterator_selector
+<Lhs,Rhs,SparseDiagonalProductType,SDP_IsSparseColMajor,SDP_IsDiagonal>
+ : public CwiseUnaryOp<scalar_multiple_op<typename Rhs::Scalar>,const Lhs>::InnerIterator
+{
+ typedef typename CwiseUnaryOp<scalar_multiple_op<typename Rhs::Scalar>,const Lhs>::InnerIterator Base;
+ typedef typename Lhs::Index Index;
+ public:
+ inline sparse_diagonal_product_inner_iterator_selector(
+ const SparseDiagonalProductType& expr, Index outer)
+ : Base(expr.lhs()*expr.rhs().diagonal().coeff(outer), outer)
+ {}
+};
+
+template<typename Lhs, typename Rhs, typename SparseDiagonalProductType>
+class sparse_diagonal_product_inner_iterator_selector
+<Lhs,Rhs,SparseDiagonalProductType,SDP_IsSparseRowMajor,SDP_IsDiagonal>
+ : public CwiseBinaryOp<
+ scalar_product_op<typename Rhs::Scalar>,
+ const typename Lhs::ConstInnerVectorReturnType,
+ const Transpose<const typename Rhs::DiagonalVectorType> >::InnerIterator
+{
+ typedef typename CwiseBinaryOp<
+ scalar_product_op<typename Rhs::Scalar>,
+ const typename Lhs::ConstInnerVectorReturnType,
+ const Transpose<const typename Rhs::DiagonalVectorType> >::InnerIterator Base;
+ typedef typename Lhs::Index Index;
+ Index m_outer;
+ public:
+ inline sparse_diagonal_product_inner_iterator_selector(
+ const SparseDiagonalProductType& expr, Index outer)
+ : Base(expr.lhs().innerVector(outer).cwiseProduct(expr.rhs().diagonal().transpose()), 0), m_outer(outer)
+ {}
+
+ inline Index outer() const { return m_outer; }
+ inline Index row() const { return m_outer; }
+};
+
+} // end namespace internal
+
+// SparseMatrixBase functions
+
+template<typename Derived>
+template<typename OtherDerived>
+const SparseDiagonalProduct<Derived,OtherDerived>
+SparseMatrixBase<Derived>::operator*(const DiagonalBase<OtherDerived> &other) const
+{
+ return SparseDiagonalProduct<Derived,OtherDerived>(this->derived(), other.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_DIAGONAL_PRODUCT_H
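The comment at the top of this file enumerates the diagonal/sparse cases handled by the iterator selectors above. In user code both orders are written with asDiagonal(); a short sketch, illustrative only and assuming this Eigen snapshot exposes the usual entry points:

#include <Eigen/Sparse>
#include <Eigen/Dense>

int main() {
  Eigen::SparseMatrix<double> A(3, 3);
  A.insert(0, 1) = 1.0;
  A.insert(2, 2) = 5.0;
  A.makeCompressed();

  Eigen::VectorXd d(3);
  d << 1.0, 2.0, 3.0;

  // diagonal * sparse: scales the rows of A.
  Eigen::SparseMatrix<double> B = d.asDiagonal() * A;
  // sparse * diagonal: scales the columns of A (operator* defined above).
  Eigen::SparseMatrix<double> C = A * d.asDiagonal();

  return (B.nonZeros() + C.nonZeros()) > 0 ? 0 : 1;
}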
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseDot.h b/third_party/eigen3/Eigen/src/SparseCore/SparseDot.h
new file mode 100644
index 0000000000..db39c9aecc
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseDot.h
@@ -0,0 +1,101 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_DOT_H
+#define EIGEN_SPARSE_DOT_H
+
+namespace Eigen {
+
+template<typename Derived>
+template<typename OtherDerived>
+typename internal::traits<Derived>::Scalar
+SparseMatrixBase<Derived>::dot(const MatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(Derived,OtherDerived)
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ eigen_assert(size() == other.size());
+ eigen_assert(other.size()>0 && "you are using a non initialized vector");
+
+ typename Derived::InnerIterator i(derived(),0);
+ Scalar res(0);
+ while (i)
+ {
+ res += numext::conj(i.value()) * other.coeff(i.index());
+ ++i;
+ }
+ return res;
+}
+
+template<typename Derived>
+template<typename OtherDerived>
+typename internal::traits<Derived>::Scalar
+SparseMatrixBase<Derived>::dot(const SparseMatrixBase<OtherDerived>& other) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OtherDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(Derived,OtherDerived)
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ eigen_assert(size() == other.size());
+
+ typedef typename Derived::Nested Nested;
+ typedef typename OtherDerived::Nested OtherNested;
+ typedef typename internal::remove_all<Nested>::type NestedCleaned;
+ typedef typename internal::remove_all<OtherNested>::type OtherNestedCleaned;
+
+ Nested nthis(derived());
+ OtherNested nother(other.derived());
+
+ typename NestedCleaned::InnerIterator i(nthis,0);
+ typename OtherNestedCleaned::InnerIterator j(nother,0);
+ Scalar res(0);
+ while (i && j)
+ {
+ if (i.index()==j.index())
+ {
+ res += numext::conj(i.value()) * j.value();
+ ++i; ++j;
+ }
+ else if (i.index()<j.index())
+ ++i;
+ else
+ ++j;
+ }
+ return res;
+}
+
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+SparseMatrixBase<Derived>::squaredNorm() const
+{
+ return numext::real((*this).cwiseAbs2().sum());
+}
+
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+SparseMatrixBase<Derived>::norm() const
+{
+ using std::sqrt;
+ return sqrt(squaredNorm());
+}
+
+template<typename Derived>
+inline typename NumTraits<typename internal::traits<Derived>::Scalar>::Real
+SparseMatrixBase<Derived>::blueNorm() const
+{
+ return internal::blueNorm_impl(*this);
+}
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_DOT_H
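The two dot() overloads above iterate only over stored entries; the sparse/sparse variant merges two index-sorted sequences much like the merge step of merge sort. A usage sketch with SparseVector, which provides the required InnerIterator (values are arbitrary):

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <cassert>

int main() {
  Eigen::SparseVector<double> u(5), v(5);
  u.insert(0) = 1.0;  u.insert(3) = 2.0;
  v.insert(3) = 4.0;  v.insert(4) = -1.0;

  // sparse . sparse: only index 3 is shared, so the result is 2*4 = 8.
  assert(u.dot(v) == 8.0);

  // sparse . dense uses the MatrixBase overload above.
  Eigen::VectorXd w = Eigen::VectorXd::Ones(5);
  assert(u.dot(w) == 3.0);
  assert(u.squaredNorm() == 5.0);
  return 0;
}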
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseFuzzy.h b/third_party/eigen3/Eigen/src/SparseCore/SparseFuzzy.h
new file mode 100644
index 0000000000..45f36e9eb9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseFuzzy.h
@@ -0,0 +1,26 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_FUZZY_H
+#define EIGEN_SPARSE_FUZZY_H
+
+// template<typename Derived>
+// template<typename OtherDerived>
+// bool SparseMatrixBase<Derived>::isApprox(
+// const OtherDerived& other,
+// typename NumTraits<Scalar>::Real prec
+// ) const
+// {
+// const typename internal::nested<Derived,2>::type nested(derived());
+// const typename internal::nested<OtherDerived,2>::type otherNested(other.derived());
+// return (nested - otherNested).cwise().abs2().sum()
+// <= prec * prec * (std::min)(nested.cwise().abs2().sum(), otherNested.cwise().abs2().sum());
+// }
+
+#endif // EIGEN_SPARSE_FUZZY_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseMatrix.h b/third_party/eigen3/Eigen/src/SparseCore/SparseMatrix.h
new file mode 100644
index 0000000000..5070c81d9f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseMatrix.h
@@ -0,0 +1,1259 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEMATRIX_H
+#define EIGEN_SPARSEMATRIX_H
+
+namespace Eigen {
+
+/** \ingroup SparseCore_Module
+ *
+ * \class SparseMatrix
+ *
+ * \brief A versatile sparse matrix representation
+ *
+ * This class implements a more versatile variant of the common \em compressed row/column storage format.
+ * Each column's (resp. row's) non-zeros are stored as pairs of a value and its associated row (resp. column) index.
+ * All the non-zeros are stored in a single large buffer. Unlike the \em compressed format, there might be extra
+ * space in between the non-zeros of two successive columns (resp. rows) so that insertion of a new non-zero
+ * can be done with limited memory reallocation and copies.
+ *
+ * A call to the function makeCompressed() turns the matrix into the standard \em compressed format
+ * compatible with many libraries.
+ *
+ * More details on this storage scheme are given in the \ref TutorialSparse "manual pages".
+ *
+ * \tparam _Scalar the scalar type, i.e. the type of the coefficients
+ * \tparam _Options Union of bit flags controlling the storage scheme. Currently the only possibility
+ * is ColMajor or RowMajor. The default is 0 which means column-major.
+ * \tparam _Index the type of the indices. It has to be a \b signed type (e.g., short, int, std::ptrdiff_t). Default is \c int.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_SPARSEMATRIX_PLUGIN.
+ */
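The storage scheme documented above can be made concrete with a tiny worked example. The raw-buffer accessors used below (valuePtr(), innerIndexPtr(), outerIndexPtr()) are the ones declared further down in this class; the matrix contents are illustrative only:

#include <Eigen/Sparse>
#include <cstdio>

int main() {
  // Column-major 4x4 matrix with four stored entries.
  Eigen::SparseMatrix<double> A(4, 4);
  A.insert(1, 0) = 3.0;
  A.insert(0, 1) = 7.0;
  A.insert(3, 1) = 1.0;
  A.insert(2, 3) = 5.0;
  A.makeCompressed();  // standard compressed layout, innerNonZeroPtr() == 0

  // Expected buffers after makeCompressed():
  //   outerIndexPtr() : 0 1 3 3 4   (start of each column)
  //   innerIndexPtr() : 1 0 3 2     (row index of each stored value)
  //   valuePtr()      : 3 7 1 5
  for (int j = 0; j <= A.outerSize(); ++j)
    std::printf("%ld ", long(A.outerIndexPtr()[j]));
  std::printf("\n");
  return 0;
}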
+
+namespace internal {
+template<typename _Scalar, int _Options, typename _Index>
+struct traits<SparseMatrix<_Scalar, _Options, _Index> >
+{
+ typedef _Scalar Scalar;
+ typedef _Index Index;
+ typedef Sparse StorageKind;
+ typedef MatrixXpr XprKind;
+ enum {
+ RowsAtCompileTime = Dynamic,
+ ColsAtCompileTime = Dynamic,
+ MaxRowsAtCompileTime = Dynamic,
+ MaxColsAtCompileTime = Dynamic,
+ Flags = _Options | NestByRefBit | LvalueBit,
+ CoeffReadCost = NumTraits<Scalar>::ReadCost,
+ SupportedAccessPatterns = InnerRandomAccessPattern
+ };
+};
+
+template<typename _Scalar, int _Options, typename _Index, int DiagIndex>
+struct traits<Diagonal<const SparseMatrix<_Scalar, _Options, _Index>, DiagIndex> >
+{
+ typedef SparseMatrix<_Scalar, _Options, _Index> MatrixType;
+ typedef typename nested<MatrixType>::type MatrixTypeNested;
+ typedef typename remove_reference<MatrixTypeNested>::type _MatrixTypeNested;
+
+ typedef _Scalar Scalar;
+ typedef Dense StorageKind;
+ typedef _Index Index;
+ typedef MatrixXpr XprKind;
+
+ enum {
+ RowsAtCompileTime = Dynamic,
+ ColsAtCompileTime = 1,
+ MaxRowsAtCompileTime = Dynamic,
+ MaxColsAtCompileTime = 1,
+ Flags = 0,
+ CoeffReadCost = _MatrixTypeNested::CoeffReadCost*10
+ };
+};
+
+} // end namespace internal
+
+template<typename _Scalar, int _Options, typename _Index>
+class SparseMatrix
+ : public SparseMatrixBase<SparseMatrix<_Scalar, _Options, _Index> >
+{
+ public:
+ EIGEN_SPARSE_PUBLIC_INTERFACE(SparseMatrix)
+ EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(SparseMatrix, +=)
+ EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(SparseMatrix, -=)
+
+ typedef MappedSparseMatrix<Scalar,Flags> Map;
+ using Base::IsRowMajor;
+ typedef internal::CompressedStorage<Scalar,Index> Storage;
+ enum {
+ Options = _Options
+ };
+
+ protected:
+
+ typedef SparseMatrix<Scalar,(Flags&~RowMajorBit)|(IsRowMajor?RowMajorBit:0)> TransposedSparseMatrix;
+
+ Index m_outerSize;
+ Index m_innerSize;
+ Index* m_outerIndex;
+ Index* m_innerNonZeros; // optional, if null then the data is compressed
+ Storage m_data;
+
+ Eigen::Map<Matrix<Index,Dynamic,1> > innerNonZeros() { return Eigen::Map<Matrix<Index,Dynamic,1> >(m_innerNonZeros, m_innerNonZeros?m_outerSize:0); }
+ const Eigen::Map<const Matrix<Index,Dynamic,1> > innerNonZeros() const { return Eigen::Map<const Matrix<Index,Dynamic,1> >(m_innerNonZeros, m_innerNonZeros?m_outerSize:0); }
+
+ public:
+
+ /** \returns whether \c *this is in compressed form. */
+ inline bool isCompressed() const { return m_innerNonZeros==0; }
+
+ /** \returns the number of rows of the matrix */
+ inline Index rows() const { return IsRowMajor ? m_outerSize : m_innerSize; }
+ /** \returns the number of columns of the matrix */
+ inline Index cols() const { return IsRowMajor ? m_innerSize : m_outerSize; }
+
+ /** \returns the number of rows (resp. columns) of the matrix if the storage order is column major (resp. row major) */
+ inline Index innerSize() const { return m_innerSize; }
+ /** \returns the number of columns (resp. rows) of the matrix if the storage order is column major (resp. row major) */
+ inline Index outerSize() const { return m_outerSize; }
+
+ /** \returns a const pointer to the array of values.
+ * This function is aimed at interoperability with other libraries.
+ * \sa innerIndexPtr(), outerIndexPtr() */
+ inline const Scalar* valuePtr() const { return &m_data.value(0); }
+ /** \returns a non-const pointer to the array of values.
+ * This function is aimed at interoperability with other libraries.
+ * \sa innerIndexPtr(), outerIndexPtr() */
+ inline Scalar* valuePtr() { return &m_data.value(0); }
+
+ /** \returns a const pointer to the array of inner indices.
+ * This function is aimed at interoperability with other libraries.
+ * \sa valuePtr(), outerIndexPtr() */
+ inline const Index* innerIndexPtr() const { return &m_data.index(0); }
+ /** \returns a non-const pointer to the array of inner indices.
+ * This function is aimed at interoperability with other libraries.
+ * \sa valuePtr(), outerIndexPtr() */
+ inline Index* innerIndexPtr() { return &m_data.index(0); }
+
+ /** \returns a const pointer to the array of the starting positions of the inner vectors.
+ * This function is aimed at interoperability with other libraries.
+ * \sa valuePtr(), innerIndexPtr() */
+ inline const Index* outerIndexPtr() const { return m_outerIndex; }
+ /** \returns a non-const pointer to the array of the starting positions of the inner vectors.
+ * This function is aimed at interoperability with other libraries.
+ * \sa valuePtr(), innerIndexPtr() */
+ inline Index* outerIndexPtr() { return m_outerIndex; }
+
+ /** \returns a const pointer to the array of the number of non zeros of the inner vectors.
+ * This function is aimed at interoperability with other libraries.
+ * \warning it returns the null pointer 0 in compressed mode */
+ inline const Index* innerNonZeroPtr() const { return m_innerNonZeros; }
+ /** \returns a non-const pointer to the array of the number of non zeros of the inner vectors.
+ * This function is aimed at interoperability with other libraries.
+ * \warning it returns the null pointer 0 in compressed mode */
+ inline Index* innerNonZeroPtr() { return m_innerNonZeros; }
+
+ /** \internal */
+ inline Storage& data() { return m_data; }
+ /** \internal */
+ inline const Storage& data() const { return m_data; }
+
+ /** \returns the value of the matrix at position \a i, \a j
+ * This function returns Scalar(0) if the element is an explicit \em zero */
+ inline Scalar coeff(Index row, Index col) const
+ {
+ eigen_assert(row>=0 && row<rows() && col>=0 && col<cols());
+
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+ Index end = m_innerNonZeros ? m_outerIndex[outer] + m_innerNonZeros[outer] : m_outerIndex[outer+1];
+ return m_data.atInRange(m_outerIndex[outer], end, inner);
+ }
+
+ /** \returns a non-const reference to the value of the matrix at position \a i, \a j
+ *
+ * If the element does not exist then it is inserted via the insert(Index,Index) function
+ * which itself turns the matrix into a non compressed form if that was not the case.
+ *
+ * This is a O(log(nnz_j)) operation (binary search) plus the cost of insert(Index,Index)
+ * function if the element does not already exist.
+ */
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ eigen_assert(row>=0 && row<rows() && col>=0 && col<cols());
+
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ Index start = m_outerIndex[outer];
+ Index end = m_innerNonZeros ? m_outerIndex[outer] + m_innerNonZeros[outer] : m_outerIndex[outer+1];
+ eigen_assert(end>=start && "you probably called coeffRef on a non finalized matrix");
+ if(end<=start)
+ return insert(row,col);
+ const Index p = m_data.searchLowerIndex(start,end-1,inner);
+ if((p<end) && (m_data.index(p)==inner))
+ return m_data.value(p);
+ else
+ return insert(row,col);
+ }
+
+ /** \returns a reference to a new non-zero coefficient with coordinates \a row x \a col.
+ * The non-zero coefficient must \b not already exist.
+ *
+ * If the matrix \c *this is in compressed mode, then \c *this is turned into uncompressed
+ * mode while reserving room for 2 non-zeros per inner vector. It is strongly recommended to first
+ * call reserve(const SizesType &) to reserve a more appropriate number of elements per
+ * inner vector that better matches your scenario.
+ *
+ * This function performs a sorted insertion in O(1) if the elements of each inner vector are
+ * inserted in increasing inner index order, and in O(nnz_j) for a random insertion.
+ *
+ */
+ Scalar& insert(Index row, Index col)
+ {
+ eigen_assert(row>=0 && row<rows() && col>=0 && col<cols());
+
+ if(isCompressed())
+ {
+ reserve(Matrix<Index,Dynamic,1>::Constant(outerSize(), 2));
+ }
+ return insertUncompressed(row,col);
+ }
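As the documentation of coeffRef() and insert() above suggests, random insertion is cheap only if each inner vector already has room; the usual filling pattern reserves an estimate per column first and compresses at the end. A sketch (the per-column estimate of 6 and the band pattern are placeholders):

#include <Eigen/Sparse>

Eigen::SparseMatrix<double> fillExample(int rows, int cols) {
  Eigen::SparseMatrix<double> A(rows, cols);
  // Reserve room so insert() runs in O(1) amortized per entry instead of
  // repeatedly reallocating inner vectors.
  A.reserve(Eigen::VectorXi::Constant(cols, 6));
  for (int j = 0; j < cols; ++j)
    for (int i = j; i < rows && i < j + 3; ++i)   // arbitrary band pattern
      A.insert(i, j) = 1.0 / (1.0 + i + j);
  A.makeCompressed();  // return to the compressed format described above
  return A;
}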
+
+ public:
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ /** Removes all non-zeros but keeps the allocated memory */
+ inline void setZero()
+ {
+ m_data.clear();
+ memset(m_outerIndex, 0, (m_outerSize+1)*sizeof(Index));
+ if(m_innerNonZeros)
+ memset(m_innerNonZeros, 0, (m_outerSize)*sizeof(Index));
+ }
+
+ /** \returns the number of non zero coefficients */
+ inline Index nonZeros() const
+ {
+ if(m_innerNonZeros)
+ return innerNonZeros().sum();
+ return static_cast<Index>(m_data.size());
+ }
+
+ /** Preallocates \a reserveSize non zeros.
+ *
+ * Precondition: the matrix must be in compressed mode. */
+ inline void reserve(Index reserveSize)
+ {
+ eigen_assert(isCompressed() && "This function does not make sense in non compressed mode.");
+ m_data.reserve(reserveSize);
+ }
+
+ #ifdef EIGEN_PARSED_BY_DOXYGEN
+ /** Preallocates \a reserveSize[\c j] non zeros for each column (resp. row) \c j.
+ *
+ * This function turns the matrix into non-compressed mode. */
+ template<class SizesType>
+ inline void reserve(const SizesType& reserveSizes);
+ #else
+ template<class SizesType>
+ inline void reserve(const SizesType& reserveSizes, const typename SizesType::value_type& enableif = typename SizesType::value_type())
+ {
+ EIGEN_UNUSED_VARIABLE(enableif);
+ reserveInnerVectors(reserveSizes);
+ }
+ template<class SizesType>
+ inline void reserve(const SizesType& reserveSizes, const typename SizesType::Scalar& enableif =
+ #if (!EIGEN_COMP_MSVC) || (EIGEN_COMP_MSVC>=1500) // MSVC 2005 fails to compile with this typename
+ typename
+ #endif
+ SizesType::Scalar())
+ {
+ EIGEN_UNUSED_VARIABLE(enableif);
+ reserveInnerVectors(reserveSizes);
+ }
+ #endif // EIGEN_PARSED_BY_DOXYGEN
+ protected:
+ template<class SizesType>
+ inline void reserveInnerVectors(const SizesType& reserveSizes)
+ {
+ if(isCompressed())
+ {
+ std::size_t totalReserveSize = 0;
+ // turn the matrix into non-compressed mode
+ m_innerNonZeros = static_cast<Index*>(std::malloc(m_outerSize * sizeof(Index)));
+ if (!m_innerNonZeros) internal::throw_std_bad_alloc();
+
+ // temporarily use m_innerSizes to hold the new starting points.
+ Index* newOuterIndex = m_innerNonZeros;
+
+ Index count = 0;
+ for(Index j=0; j<m_outerSize; ++j)
+ {
+ newOuterIndex[j] = count;
+ count += reserveSizes[j] + (m_outerIndex[j+1]-m_outerIndex[j]);
+ totalReserveSize += reserveSizes[j];
+ }
+ m_data.reserve(totalReserveSize);
+ Index previousOuterIndex = m_outerIndex[m_outerSize];
+ for(Index j=m_outerSize-1; j>=0; --j)
+ {
+ Index innerNNZ = previousOuterIndex - m_outerIndex[j];
+ for(Index i=innerNNZ-1; i>=0; --i)
+ {
+ m_data.index(newOuterIndex[j]+i) = m_data.index(m_outerIndex[j]+i);
+ m_data.value(newOuterIndex[j]+i) = m_data.value(m_outerIndex[j]+i);
+ }
+ previousOuterIndex = m_outerIndex[j];
+ m_outerIndex[j] = newOuterIndex[j];
+ m_innerNonZeros[j] = innerNNZ;
+ }
+ m_outerIndex[m_outerSize] = m_outerIndex[m_outerSize-1] + m_innerNonZeros[m_outerSize-1] + reserveSizes[m_outerSize-1];
+
+ m_data.resize(m_outerIndex[m_outerSize]);
+ }
+ else
+ {
+ Index* newOuterIndex = static_cast<Index*>(std::malloc((m_outerSize+1)*sizeof(Index)));
+ if (!newOuterIndex) internal::throw_std_bad_alloc();
+
+ Index count = 0;
+ for(Index j=0; j<m_outerSize; ++j)
+ {
+ newOuterIndex[j] = count;
+ Index alreadyReserved = (m_outerIndex[j+1]-m_outerIndex[j]) - m_innerNonZeros[j];
+ Index toReserve = std::max<Index>(reserveSizes[j], alreadyReserved);
+ count += toReserve + m_innerNonZeros[j];
+ }
+ newOuterIndex[m_outerSize] = count;
+
+ m_data.resize(count);
+ for(Index j=m_outerSize-1; j>=0; --j)
+ {
+ Index offset = newOuterIndex[j] - m_outerIndex[j];
+ if(offset>0)
+ {
+ Index innerNNZ = m_innerNonZeros[j];
+ for(Index i=innerNNZ-1; i>=0; --i)
+ {
+ m_data.index(newOuterIndex[j]+i) = m_data.index(m_outerIndex[j]+i);
+ m_data.value(newOuterIndex[j]+i) = m_data.value(m_outerIndex[j]+i);
+ }
+ }
+ }
+
+ std::swap(m_outerIndex, newOuterIndex);
+ std::free(newOuterIndex);
+ }
+
+ }
+ public:
+
+ //--- low level purely coherent filling ---
+
+ /** \internal
+ * \returns a reference to the non zero coefficient at position \a row, \a col assuming that:
+ * - the nonzero does not already exist
+ * - the new coefficient is the last one according to the storage order
+ *
+ * Before filling a given inner vector you must call the startVec(Index) function.
+ *
+ * After an insertion session, you should call the finalize() function.
+ *
+ * \sa insert, insertBackByOuterInner, startVec */
+ inline Scalar& insertBack(Index row, Index col)
+ {
+ return insertBackByOuterInner(IsRowMajor?row:col, IsRowMajor?col:row);
+ }
+
+ /** \internal
+ * \sa insertBack, startVec */
+ inline Scalar& insertBackByOuterInner(Index outer, Index inner)
+ {
+ eigen_assert(size_t(m_outerIndex[outer+1]) == m_data.size() && "Invalid ordered insertion (invalid outer index)");
+ eigen_assert( (m_outerIndex[outer+1]-m_outerIndex[outer]==0 || m_data.index(m_data.size()-1)<inner) && "Invalid ordered insertion (invalid inner index)");
+ Index p = m_outerIndex[outer+1];
+ ++m_outerIndex[outer+1];
+ m_data.append(Scalar(0), inner);
+ return m_data.value(p);
+ }
+
+ /** \internal
+ * \warning use it only if you know what you are doing */
+ inline Scalar& insertBackByOuterInnerUnordered(Index outer, Index inner)
+ {
+ Index p = m_outerIndex[outer+1];
+ ++m_outerIndex[outer+1];
+ m_data.append(Scalar(0), inner);
+ return m_data.value(p);
+ }
+
+ /** \internal
+ * \sa insertBack, insertBackByOuterInner */
+ inline void startVec(Index outer)
+ {
+ eigen_assert(m_outerIndex[outer]==Index(m_data.size()) && "You must call startVec for each inner vector sequentially");
+ eigen_assert(m_outerIndex[outer+1]==0 && "You must call startVec for each inner vector sequentially");
+ m_outerIndex[outer+1] = m_outerIndex[outer];
+ }
+
+ /** \internal
+ * Must be called after inserting a set of non zero entries using the low level compressed API.
+ */
+ inline void finalize()
+ {
+ if(isCompressed())
+ {
+ Index size = static_cast<Index>(m_data.size());
+ Index i = m_outerSize;
+ // find the last filled column
+ while (i>=0 && m_outerIndex[i]==0)
+ --i;
+ ++i;
+ while (i<=m_outerSize)
+ {
+ m_outerIndex[i] = size;
+ ++i;
+ }
+ }
+ }
+
+ //---
+
+ template<typename InputIterators>
+ void setFromTriplets(const InputIterators& begin, const InputIterators& end);
+
+ void sumupDuplicates();
+
+ //---
+
+ /** \internal
+ * same as insert(Index,Index) except that the indices are given relative to the storage order */
+ Scalar& insertByOuterInner(Index j, Index i)
+ {
+ return insert(IsRowMajor ? j : i, IsRowMajor ? i : j);
+ }
+
+ /** Turns the matrix into the \em compressed format.
+ */
+ void makeCompressed()
+ {
+ if(isCompressed())
+ return;
+
+ Index oldStart = m_outerIndex[1];
+ m_outerIndex[1] = m_innerNonZeros[0];
+ for(Index j=1; j<m_outerSize; ++j)
+ {
+ Index nextOldStart = m_outerIndex[j+1];
+ Index offset = oldStart - m_outerIndex[j];
+ if(offset>0)
+ {
+ for(Index k=0; k<m_innerNonZeros[j]; ++k)
+ {
+ m_data.index(m_outerIndex[j]+k) = m_data.index(oldStart+k);
+ m_data.value(m_outerIndex[j]+k) = m_data.value(oldStart+k);
+ }
+ }
+ m_outerIndex[j+1] = m_outerIndex[j] + m_innerNonZeros[j];
+ oldStart = nextOldStart;
+ }
+ std::free(m_innerNonZeros);
+ m_innerNonZeros = 0;
+ m_data.resize(m_outerIndex[m_outerSize]);
+ m_data.squeeze();
+ }
+
+ /** Turns the matrix into the uncompressed mode */
+ void uncompress()
+ {
+ if(m_innerNonZeros != 0)
+ return;
+ m_innerNonZeros = static_cast<Index*>(std::malloc(m_outerSize * sizeof(Index)));
+ for (Index i = 0; i < m_outerSize; i++)
+ {
+ m_innerNonZeros[i] = m_outerIndex[i+1] - m_outerIndex[i];
+ }
+ }
+
+ /** Suppresses all nonzeros which are \b much \b smaller \b than \a reference under the tolerance \a epsilon */
+ void prune(const Scalar& reference, const RealScalar& epsilon = NumTraits<RealScalar>::dummy_precision())
+ {
+ prune(default_prunning_func(reference,epsilon));
+ }
+
+ /** Turns the matrix into compressed format, and suppresses all nonzeros which do not satisfy the predicate \a keep.
+ * The functor type \a KeepFunc must implement the following function:
+ * \code
+ * bool operator() (const Index& row, const Index& col, const Scalar& value) const;
+ * \endcode
+ * \sa prune(Scalar,RealScalar)
+ */
+ template<typename KeepFunc>
+ void prune(const KeepFunc& keep = KeepFunc())
+ {
+ // TODO optimize the uncompressed mode to avoid moving and allocating the data twice
+ // TODO also implement a unit test
+ makeCompressed();
+
+ Index k = 0;
+ for(Index j=0; j<m_outerSize; ++j)
+ {
+ Index previousStart = m_outerIndex[j];
+ m_outerIndex[j] = k;
+ Index end = m_outerIndex[j+1];
+ for(Index i=previousStart; i<end; ++i)
+ {
+ if(keep(IsRowMajor?j:m_data.index(i), IsRowMajor?m_data.index(i):j, m_data.value(i)))
+ {
+ m_data.value(k) = m_data.value(i);
+ m_data.index(k) = m_data.index(i);
+ ++k;
+ }
+ }
+ }
+ m_outerIndex[m_outerSize] = k;
+ m_data.resize(k,0);
+ }
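The KeepFunc predicate documented above is called with (row, col, value) for every stored entry, and entries for which it returns false are dropped. A sketch of a custom predicate (the functor and function names are made up):

#include <Eigen/Sparse>

// Hypothetical predicate: keep only entries on or below the diagonal.
struct KeepLowerTriangle {
  template<typename Index, typename Scalar>
  bool operator()(const Index& row, const Index& col, const Scalar&) const {
    return row >= col;
  }
};

void dropStrictUpperPart(Eigen::SparseMatrix<double>& A) {
  A.prune(KeepLowerTriangle());   // also leaves A in compressed form
}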
+
+ /** Resizes the matrix to a \a rows x \a cols matrix leaving old values untouched.
+ * \sa resizeNonZeros(Index), reserve(), setZero()
+ */
+ void conservativeResize(Index rows, Index cols)
+ {
+ // No change
+ if (this->rows() == rows && this->cols() == cols) return;
+
+ // If one dimension is null, then there is nothing to be preserved
+ if(rows==0 || cols==0) return resize(rows,cols);
+
+ Index innerChange = IsRowMajor ? cols - this->cols() : rows - this->rows();
+ Index outerChange = IsRowMajor ? rows - this->rows() : cols - this->cols();
+ Index newInnerSize = IsRowMajor ? cols : rows;
+
+ // Deals with inner non zeros
+ if (m_innerNonZeros)
+ {
+ // Resize m_innerNonZeros
+ Index *newInnerNonZeros = static_cast<Index*>(std::realloc(m_innerNonZeros, (m_outerSize + outerChange) * sizeof(Index)));
+ if (!newInnerNonZeros) internal::throw_std_bad_alloc();
+ m_innerNonZeros = newInnerNonZeros;
+
+ for(Index i=m_outerSize; i<m_outerSize+outerChange; i++)
+ m_innerNonZeros[i] = 0;
+ }
+ else if (innerChange < 0)
+ {
+ // Inner size decreased: allocate a new m_innerNonZeros
+ m_innerNonZeros = static_cast<Index*>(std::malloc((m_outerSize+outerChange+1) * sizeof(Index)));
+ if (!m_innerNonZeros) internal::throw_std_bad_alloc();
+ for(Index i = 0; i < m_outerSize; i++)
+ m_innerNonZeros[i] = m_outerIndex[i+1] - m_outerIndex[i];
+ }
+
+ // Change the m_innerNonZeros in case of a decrease of inner size
+ if (m_innerNonZeros && innerChange < 0)
+ {
+ for(Index i = 0; i < m_outerSize + (std::min)(outerChange, Index(0)); i++)
+ {
+ Index &n = m_innerNonZeros[i];
+ Index start = m_outerIndex[i];
+ while (n > 0 && m_data.index(start+n-1) >= newInnerSize) --n;
+ }
+ }
+
+ m_innerSize = newInnerSize;
+
+ // Re-allocate outer index structure if necessary
+ if (outerChange == 0)
+ return;
+
+ Index *newOuterIndex = static_cast<Index*>(std::realloc(m_outerIndex, (m_outerSize + outerChange + 1) * sizeof(Index)));
+ if (!newOuterIndex) internal::throw_std_bad_alloc();
+ m_outerIndex = newOuterIndex;
+ if (outerChange > 0)
+ {
+ Index last = m_outerSize == 0 ? 0 : m_outerIndex[m_outerSize];
+ for(Index i=m_outerSize; i<m_outerSize+outerChange+1; i++)
+ m_outerIndex[i] = last;
+ }
+ m_outerSize += outerChange;
+ }
+
+ /** Resizes the matrix to a \a rows x \a cols matrix and initializes it to zero.
+ * \sa resizeNonZeros(Index), reserve(), setZero()
+ */
+ void resize(Index rows, Index cols)
+ {
+ const Index outerSize = IsRowMajor ? rows : cols;
+ m_innerSize = IsRowMajor ? cols : rows;
+ m_data.clear();
+ if (m_outerSize != outerSize || m_outerSize==0)
+ {
+ std::free(m_outerIndex);
+ m_outerIndex = static_cast<Index*>(std::malloc((outerSize + 1) * sizeof(Index)));
+ if (!m_outerIndex) internal::throw_std_bad_alloc();
+
+ m_outerSize = outerSize;
+ }
+ if(m_innerNonZeros)
+ {
+ std::free(m_innerNonZeros);
+ m_innerNonZeros = 0;
+ }
+ memset(m_outerIndex, 0, (m_outerSize+1)*sizeof(Index));
+ }
+
+ /** \internal
+ * Resize the nonzero vector to \a size */
+ void resizeNonZeros(Index size)
+ {
+ // TODO remove this function
+ m_data.resize(size);
+ }
+
+ /** \returns a const expression of the diagonal coefficients */
+ const Diagonal<const SparseMatrix> diagonal() const { return *this; }
+
+ /** Default constructor yielding an empty \c 0 \c x \c 0 matrix */
+ inline SparseMatrix()
+ : m_outerSize(-1), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ check_template_parameters();
+ resize(0, 0);
+ }
+
+ /** Constructs a \a rows \c x \a cols empty matrix */
+ inline SparseMatrix(Index rows, Index cols)
+ : m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ check_template_parameters();
+ resize(rows, cols);
+ }
+
+ /** Constructs a sparse matrix from the sparse expression \a other */
+ template<typename OtherDerived>
+ inline SparseMatrix(const SparseMatrixBase<OtherDerived>& other)
+ : m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ check_template_parameters();
+ *this = other.derived();
+ }
+
+ /** Constructs a sparse matrix from the sparse selfadjoint view \a other */
+ template<typename OtherDerived, unsigned int UpLo>
+ inline SparseMatrix(const SparseSelfAdjointView<OtherDerived, UpLo>& other)
+ : m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ check_template_parameters();
+ *this = other;
+ }
+
+ /** Copy constructor (it performs a deep copy) */
+ inline SparseMatrix(const SparseMatrix& other)
+ : Base(), m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ check_template_parameters();
+ *this = other.derived();
+ }
+
+ /** \brief Copy constructor with in-place evaluation */
+ template<typename OtherDerived>
+ SparseMatrix(const ReturnByValue<OtherDerived>& other)
+ : Base(), m_outerSize(0), m_innerSize(0), m_outerIndex(0), m_innerNonZeros(0)
+ {
+ check_template_parameters();
+ initAssignment(other);
+ other.evalTo(*this);
+ }
+
+ /** Swaps the content of two sparse matrices of the same type.
+ * This is a fast operation that simply swaps the underlying pointers and parameters. */
+ inline void swap(SparseMatrix& other)
+ {
+ //EIGEN_DBG_SPARSE(std::cout << "SparseMatrix:: swap\n");
+ std::swap(m_outerIndex, other.m_outerIndex);
+ std::swap(m_innerSize, other.m_innerSize);
+ std::swap(m_outerSize, other.m_outerSize);
+ std::swap(m_innerNonZeros, other.m_innerNonZeros);
+ m_data.swap(other.m_data);
+ }
+
+ /** Sets *this to the identity matrix */
+ inline void setIdentity()
+ {
+ eigen_assert(rows() == cols() && "ONLY FOR SQUARED MATRICES");
+ this->m_data.resize(rows());
+ Eigen::Map<Matrix<Index, Dynamic, 1> >(&this->m_data.index(0), rows()).setLinSpaced(0, rows()-1);
+ Eigen::Map<Matrix<Scalar, Dynamic, 1> >(&this->m_data.value(0), rows()).setOnes();
+ Eigen::Map<Matrix<Index, Dynamic, 1> >(this->m_outerIndex, rows()+1).setLinSpaced(0, rows());
+ }
+ inline SparseMatrix& operator=(const SparseMatrix& other)
+ {
+ if (other.isRValue())
+ {
+ swap(other.const_cast_derived());
+ }
+ else if(this!=&other)
+ {
+ initAssignment(other);
+ if(other.isCompressed())
+ {
+ internal::smart_copy(other.m_outerIndex, other.m_outerIndex + m_outerSize + 1, m_outerIndex);
+ m_data = other.m_data;
+ }
+ else
+ {
+ Base::operator=(other);
+ }
+ }
+ return *this;
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename Lhs, typename Rhs>
+ inline SparseMatrix& operator=(const SparseSparseProduct<Lhs,Rhs>& product)
+ { return Base::operator=(product); }
+
+ template<typename OtherDerived>
+ inline SparseMatrix& operator=(const ReturnByValue<OtherDerived>& other)
+ {
+ initAssignment(other);
+ return Base::operator=(other.derived());
+ }
+
+ template<typename OtherDerived>
+ inline SparseMatrix& operator=(const EigenBase<OtherDerived>& other)
+ { return Base::operator=(other.derived()); }
+ #endif
+
+ template<typename OtherDerived>
+ EIGEN_DONT_INLINE SparseMatrix& operator=(const SparseMatrixBase<OtherDerived>& other);
+
+ friend std::ostream & operator << (std::ostream & s, const SparseMatrix& m)
+ {
+ EIGEN_DBG_SPARSE(
+ s << "Nonzero entries:\n";
+ if(m.isCompressed())
+ for (Index i=0; i<m.nonZeros(); ++i)
+ s << "(" << m.m_data.value(i) << "," << m.m_data.index(i) << ") ";
+ else
+ for (Index i=0; i<m.outerSize(); ++i)
+ {
+ Index p = m.m_outerIndex[i];
+ Index pe = m.m_outerIndex[i]+m.m_innerNonZeros[i];
+ Index k=p;
+ for (; k<pe; ++k)
+ s << "(" << m.m_data.value(k) << "," << m.m_data.index(k) << ") ";
+ for (; k<m.m_outerIndex[i+1]; ++k)
+ s << "(_,_) ";
+ }
+ s << std::endl;
+ s << std::endl;
+ s << "Outer pointers:\n";
+ for (Index i=0; i<m.outerSize(); ++i)
+ s << m.m_outerIndex[i] << " ";
+ s << " $" << std::endl;
+ if(!m.isCompressed())
+ {
+ s << "Inner non zeros:\n";
+ for (Index i=0; i<m.outerSize(); ++i)
+ s << m.m_innerNonZeros[i] << " ";
+ s << " $" << std::endl;
+ }
+ s << std::endl;
+ );
+ s << static_cast<const SparseMatrixBase<SparseMatrix>&>(m);
+ return s;
+ }
+
+ /** Destructor */
+ inline ~SparseMatrix()
+ {
+ std::free(m_outerIndex);
+ std::free(m_innerNonZeros);
+ }
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** Overloaded for performance */
+ Scalar sum() const;
+#endif
+
+# ifdef EIGEN_SPARSEMATRIX_PLUGIN
+# include EIGEN_SPARSEMATRIX_PLUGIN
+# endif
+
+protected:
+
+ template<typename Other>
+ void initAssignment(const Other& other)
+ {
+ resize(other.rows(), other.cols());
+ if(m_innerNonZeros)
+ {
+ std::free(m_innerNonZeros);
+ m_innerNonZeros = 0;
+ }
+ }
+
+ /** \internal
+ * \sa insert(Index,Index) */
+ EIGEN_DONT_INLINE Scalar& insertCompressed(Index row, Index col);
+
+ /** \internal
+ * A vector object that is equal to 0 everywhere except at position i, where it equals v */
+ class SingletonVector
+ {
+ Index m_index;
+ Index m_value;
+ public:
+ typedef Index value_type;
+ SingletonVector(Index i, Index v)
+ : m_index(i), m_value(v)
+ {}
+
+ Index operator[](Index i) const { return i==m_index ? m_value : 0; }
+ };
+
+ /** \internal
+ * \sa insert(Index,Index) */
+ EIGEN_DONT_INLINE Scalar& insertUncompressed(Index row, Index col);
+
+public:
+ /** \internal
+ * \sa insert(Index,Index) */
+ EIGEN_STRONG_INLINE Scalar& insertBackUncompressed(Index row, Index col)
+ {
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ eigen_assert(!isCompressed());
+ eigen_assert(m_innerNonZeros[outer]<=(m_outerIndex[outer+1] - m_outerIndex[outer]));
+
+ Index p = m_outerIndex[outer] + m_innerNonZeros[outer]++;
+ m_data.index(p) = inner;
+ return (m_data.value(p) = 0);
+ }
+
+private:
+ static void check_template_parameters()
+ {
+ EIGEN_STATIC_ASSERT(NumTraits<Index>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
+ EIGEN_STATIC_ASSERT((Options&(ColMajor|RowMajor))==Options,INVALID_MATRIX_TEMPLATE_PARAMETERS);
+ }
+
+ struct default_prunning_func {
+ default_prunning_func(const Scalar& ref, const RealScalar& eps) : reference(ref), epsilon(eps) {}
+ inline bool operator() (const Index&, const Index&, const Scalar& value) const
+ {
+ return !internal::isMuchSmallerThan(value, reference, epsilon);
+ }
+ Scalar reference;
+ RealScalar epsilon;
+ };
+};
+
+template<typename Scalar, int _Options, typename _Index>
+class SparseMatrix<Scalar,_Options,_Index>::InnerIterator
+{
+ public:
+ InnerIterator(const SparseMatrix& mat, Index outer)
+ : m_values(mat.valuePtr()), m_indices(mat.innerIndexPtr()), m_outer(outer), m_id(mat.m_outerIndex[outer])
+ {
+ if(mat.isCompressed())
+ m_end = mat.m_outerIndex[outer+1];
+ else
+ m_end = m_id + mat.m_innerNonZeros[outer];
+ }
+
+ inline InnerIterator& operator++() { m_id++; return *this; }
+
+ inline const Scalar& value() const { return m_values[m_id]; }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_values[m_id]); }
+
+ inline Index index() const { return m_indices[m_id]; }
+ inline Index outer() const { return m_outer; }
+ inline Index row() const { return IsRowMajor ? m_outer : index(); }
+ inline Index col() const { return IsRowMajor ? index() : m_outer; }
+
+ inline operator bool() const { return (m_id < m_end); }
+
+ protected:
+ const Scalar* m_values;
+ const Index* m_indices;
+ const Index m_outer;
+ Index m_id;
+ Index m_end;
+};
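This iterator is the intended way to visit the stored entries of one inner vector; combined with a loop over outerSize() it visits every non-zero exactly once. A short sketch:

#include <Eigen/Sparse>

double sumOfStoredEntries(const Eigen::SparseMatrix<double>& A) {
  double sum = 0.0;
  for (int k = 0; k < A.outerSize(); ++k)
    for (Eigen::SparseMatrix<double>::InnerIterator it(A, k); it; ++it)
      sum += it.value();   // it.row(), it.col() and it.index() are also available
  return sum;
}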
+
+template<typename Scalar, int _Options, typename _Index>
+class SparseMatrix<Scalar,_Options,_Index>::ReverseInnerIterator
+{
+ public:
+ ReverseInnerIterator(const SparseMatrix& mat, Index outer)
+ : m_values(mat.valuePtr()), m_indices(mat.innerIndexPtr()), m_outer(outer), m_start(mat.m_outerIndex[outer])
+ {
+ if(mat.isCompressed())
+ m_id = mat.m_outerIndex[outer+1];
+ else
+ m_id = m_start + mat.m_innerNonZeros[outer];
+ }
+
+ inline ReverseInnerIterator& operator--() { --m_id; return *this; }
+
+ inline const Scalar& value() const { return m_values[m_id-1]; }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_values[m_id-1]); }
+
+ inline Index index() const { return m_indices[m_id-1]; }
+ inline Index outer() const { return m_outer; }
+ inline Index row() const { return IsRowMajor ? m_outer : index(); }
+ inline Index col() const { return IsRowMajor ? index() : m_outer; }
+
+ inline operator bool() const { return (m_id > m_start); }
+
+ protected:
+ const Scalar* m_values;
+ const Index* m_indices;
+ const Index m_outer;
+ Index m_id;
+ const Index m_start;
+};
+
+namespace internal {
+
+template<typename InputIterator, typename SparseMatrixType>
+void set_from_triplets(const InputIterator& begin, const InputIterator& end, SparseMatrixType& mat, int Options = 0)
+{
+ EIGEN_UNUSED_VARIABLE(Options);
+ enum { IsRowMajor = SparseMatrixType::IsRowMajor };
+ typedef typename SparseMatrixType::Scalar Scalar;
+ typedef typename SparseMatrixType::Index Index;
+ SparseMatrix<Scalar,IsRowMajor?ColMajor:RowMajor> trMat(mat.rows(),mat.cols());
+
+ if(begin!=end)
+ {
+ // pass 1: count the nnz per inner-vector
+ Matrix<Index,Dynamic,1> wi(trMat.outerSize());
+ wi.setZero();
+ for(InputIterator it(begin); it!=end; ++it)
+ {
+ eigen_assert(it->row()>=0 && it->row()<mat.rows() && it->col()>=0 && it->col()<mat.cols());
+ wi(IsRowMajor ? it->col() : it->row())++;
+ }
+
+ // pass 2: insert all the elements into trMat
+ trMat.reserve(wi);
+ for(InputIterator it(begin); it!=end; ++it)
+ trMat.insertBackUncompressed(it->row(),it->col()) = it->value();
+
+ // pass 3:
+ trMat.sumupDuplicates();
+ }
+
+ // pass 4: transposed copy -> implicit sorting
+ mat = trMat;
+}
+
+}
+
+
+/** Fill the matrix \c *this with the list of \em triplets defined by the iterator range \a begin - \a end.
+ *
+ * A \em triplet is a tuple (i,j,value) defining a non-zero element.
+ * The input list of triplets does not have to be sorted, and can contain duplicated elements.
+ * In any case, the result is a \b sorted and \b compressed sparse matrix where the duplicates have been summed up.
+ * This is a \em O(n) operation, with \em n the number of triplet elements.
+ * The initial contents of \c *this is destroyed.
+ * The matrix \c *this must be properly resized beforehand using the SparseMatrix(Index,Index) constructor,
+ * or the resize(Index,Index) method. The sizes are not extracted from the triplet list.
+ *
+ * The \a InputIterators value_type must provide the following interface:
+ * \code
+ * Scalar value() const; // the value
+ * Index row() const; // the row index i
+ * Index col() const; // the column index j
+ * \endcode
+ * See for instance the Eigen::Triplet template class.
+ *
+ * Here is a typical usage example:
+ * \code
+ typedef Triplet<double> T;
+ std::vector<T> tripletList;
+ tripletList.reserve(estimation_of_entries);
+ for(...)
+ {
+ // ...
+ tripletList.push_back(T(i,j,v_ij));
+ }
+ SparseMatrixType m(rows,cols);
+ m.setFromTriplets(tripletList.begin(), tripletList.end());
+ // m is ready to go!
+ * \endcode
+ *
+ * \warning The list of triplets is read multiple times (at least twice). Therefore, it is not recommended to define
+ * an abstract iterator over a complex data-structure that would be expensive to evaluate. The triplets should rather
+ * be explicitly stored into a std::vector for instance.
+ */
+template<typename Scalar, int _Options, typename _Index>
+template<typename InputIterators>
+void SparseMatrix<Scalar,_Options,_Index>::setFromTriplets(const InputIterators& begin, const InputIterators& end)
+{
+ internal::set_from_triplets(begin, end, *this);
+}
+
+/** \internal */
+template<typename Scalar, int _Options, typename _Index>
+void SparseMatrix<Scalar,_Options,_Index>::sumupDuplicates()
+{
+ eigen_assert(!isCompressed());
+ // TODO, in practice we should be able to use m_innerNonZeros for that task
+ Matrix<Index,Dynamic,1> wi(innerSize());
+ wi.fill(-1);
+ Index count = 0;
+ // for each inner-vector, wi[inner_index] will hold the position of the first element in the index/value buffers
+ for(Index j=0; j<outerSize(); ++j)
+ {
+ Index start = count;
+ Index oldEnd = m_outerIndex[j]+m_innerNonZeros[j];
+ for(Index k=m_outerIndex[j]; k<oldEnd; ++k)
+ {
+ Index i = m_data.index(k);
+ if(wi(i)>=start)
+ {
+ // we have already met this entry => accumulate it
+ m_data.value(wi(i)) += m_data.value(k);
+ }
+ else
+ {
+ m_data.value(count) = m_data.value(k);
+ m_data.index(count) = m_data.index(k);
+ wi(i) = count;
+ ++count;
+ }
+ }
+ m_outerIndex[j] = start;
+ }
+ m_outerIndex[m_outerSize] = count;
+
+ // turn the matrix into compressed form
+ std::free(m_innerNonZeros);
+ m_innerNonZeros = 0;
+ m_data.resize(m_outerIndex[m_outerSize]);
+}
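A user-visible consequence of sumupDuplicates(): triplets that repeat the same (i,j) coordinate are accumulated rather than overwritten, as the setFromTriplets() documentation above states. A minimal sketch:

#include <Eigen/Sparse>
#include <cassert>
#include <vector>

int main() {
  typedef Eigen::Triplet<double> T;
  std::vector<T> triplets;
  triplets.push_back(T(0, 0, 1.0));
  triplets.push_back(T(0, 0, 2.0));   // duplicate coordinate
  triplets.push_back(T(1, 1, 5.0));

  Eigen::SparseMatrix<double> A(2, 2);
  A.setFromTriplets(triplets.begin(), triplets.end());
  assert(A.coeff(0, 0) == 3.0);       // duplicates were summed
  assert(A.nonZeros() == 2);
  return 0;
}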
+
+template<typename Scalar, int _Options, typename _Index>
+template<typename OtherDerived>
+EIGEN_DONT_INLINE SparseMatrix<Scalar,_Options,_Index>& SparseMatrix<Scalar,_Options,_Index>::operator=(const SparseMatrixBase<OtherDerived>& other)
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<Scalar, typename OtherDerived::Scalar>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+
+ const bool needToTranspose = (Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit);
+ if (needToTranspose)
+ {
+ // two-pass algorithm:
+ // 1 - compute the number of coeffs per dest inner vector
+ // 2 - do the actual copy/eval
+ // Since each coeff of the rhs has to be evaluated twice, let's evaluate it if needed
+ typedef typename internal::nested<OtherDerived,2>::type OtherCopy;
+ typedef typename internal::remove_all<OtherCopy>::type _OtherCopy;
+ OtherCopy otherCopy(other.derived());
+
+ SparseMatrix dest(other.rows(),other.cols());
+ Eigen::Map<Matrix<Index, Dynamic, 1> > (dest.m_outerIndex,dest.outerSize()).setZero();
+
+ // pass 1
+ // FIXME the above copy could be merged with that pass
+ for (Index j=0; j<otherCopy.outerSize(); ++j)
+ for (typename _OtherCopy::InnerIterator it(otherCopy, j); it; ++it)
+ ++dest.m_outerIndex[it.index()];
+
+ // prefix sum
+ Index count = 0;
+ Matrix<Index,Dynamic,1> positions(dest.outerSize());
+ for (Index j=0; j<dest.outerSize(); ++j)
+ {
+ Index tmp = dest.m_outerIndex[j];
+ dest.m_outerIndex[j] = count;
+ positions[j] = count;
+ count += tmp;
+ }
+ dest.m_outerIndex[dest.outerSize()] = count;
+ // alloc
+ dest.m_data.resize(count);
+ // pass 2
+ for (Index j=0; j<otherCopy.outerSize(); ++j)
+ {
+ for (typename _OtherCopy::InnerIterator it(otherCopy, j); it; ++it)
+ {
+ Index pos = positions[it.index()]++;
+ dest.m_data.index(pos) = j;
+ dest.m_data.value(pos) = it.value();
+ }
+ }
+ this->swap(dest);
+ return *this;
+ }
+ else
+ {
+ if(other.isRValue())
+ initAssignment(other.derived());
+ // there is no special optimization
+ return Base::operator=(other.derived());
+ }
+}
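The needToTranspose branch above is a counting sort: pass 1 counts entries per destination inner vector, the prefix sum turns those counts into start offsets, and pass 2 scatters each entry via positions[...]++. The same idea stripped of the Eigen expression machinery, as a sketch converting compressed-column arrays to compressed-row arrays:

#include <vector>

// Sketch: CSC (colPtr/rowIdx/vals) -> CSR (rowPtr/colIdx/outVals), using the
// same count / prefix-sum / scatter passes as the operator= above.
void cscToCsr(int rows, int cols,
              const std::vector<int>& colPtr,    // size cols+1
              const std::vector<int>& rowIdx,    // size nnz
              const std::vector<double>& vals,   // size nnz
              std::vector<int>& rowPtr,
              std::vector<int>& colIdx,
              std::vector<double>& outVals) {
  const int nnz = colPtr[cols];
  rowPtr.assign(rows + 1, 0);
  colIdx.resize(nnz);
  outVals.resize(nnz);

  // pass 1: count the entries of each destination row
  for (int k = 0; k < nnz; ++k) ++rowPtr[rowIdx[k] + 1];
  // prefix sum: rowPtr[i] becomes the start offset of row i
  for (int i = 0; i < rows; ++i) rowPtr[i + 1] += rowPtr[i];

  // pass 2: scatter, advancing one cursor per row (the positions[] array above)
  std::vector<int> next(rowPtr.begin(), rowPtr.end() - 1);
  for (int j = 0; j < cols; ++j)
    for (int k = colPtr[j]; k < colPtr[j + 1]; ++k) {
      const int p = next[rowIdx[k]]++;
      colIdx[p] = j;
      outVals[p] = vals[k];
    }
}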
+
+template<typename _Scalar, int _Options, typename _Index>
+EIGEN_DONT_INLINE typename SparseMatrix<_Scalar,_Options,_Index>::Scalar& SparseMatrix<_Scalar,_Options,_Index>::insertUncompressed(Index row, Index col)
+{
+ eigen_assert(!isCompressed());
+
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ Index room = m_outerIndex[outer+1] - m_outerIndex[outer];
+ Index innerNNZ = m_innerNonZeros[outer];
+ if(innerNNZ>=room)
+ {
+ // this inner vector is full, we need to reallocate the whole buffer :(
+ reserve(SingletonVector(outer,std::max<Index>(2,innerNNZ)));
+ }
+
+ Index startId = m_outerIndex[outer];
+ Index p = startId + m_innerNonZeros[outer];
+ while ( (p > startId) && (m_data.index(p-1) > inner) )
+ {
+ m_data.index(p) = m_data.index(p-1);
+ m_data.value(p) = m_data.value(p-1);
+ --p;
+ }
+ eigen_assert((p<=startId || m_data.index(p-1)!=inner) && "you cannot insert an element that already exists, you must call coeffRef to this end");
+
+ m_innerNonZeros[outer]++;
+
+ m_data.index(p) = inner;
+ return (m_data.value(p) = 0);
+}
+
+template<typename _Scalar, int _Options, typename _Index>
+EIGEN_DONT_INLINE typename SparseMatrix<_Scalar,_Options,_Index>::Scalar& SparseMatrix<_Scalar,_Options,_Index>::insertCompressed(Index row, Index col)
+{
+ eigen_assert(isCompressed());
+
+ const Index outer = IsRowMajor ? row : col;
+ const Index inner = IsRowMajor ? col : row;
+
+ Index previousOuter = outer;
+ if (m_outerIndex[outer+1]==0)
+ {
+ // we start a new inner vector
+ while (previousOuter>=0 && m_outerIndex[previousOuter]==0)
+ {
+ m_outerIndex[previousOuter] = static_cast<Index>(m_data.size());
+ --previousOuter;
+ }
+ m_outerIndex[outer+1] = m_outerIndex[outer];
+ }
+
+ // here we have to handle the tricky case where the outerIndex array
+ // starts with: [ 0 0 0 0 0 1 ...] and we are inserting in, e.g.,
+ // the 2nd inner vector...
+ bool isLastVec = (!(previousOuter==-1 && m_data.size()!=0))
+ && (size_t(m_outerIndex[outer+1]) == m_data.size());
+
+ size_t startId = m_outerIndex[outer];
+ // FIXME let's make sure sizeof(long int) == sizeof(size_t)
+ size_t p = m_outerIndex[outer+1];
+ ++m_outerIndex[outer+1];
+
+ float reallocRatio = 1;
+ if (m_data.allocatedSize()<=m_data.size())
+ {
+ // if there is no preallocated memory, let's reserve a minimum of 32 elements
+ if (m_data.size()==0)
+ {
+ m_data.reserve(32);
+ }
+ else
+ {
+ // we need to reallocate the data; to reduce the number of reallocations
+ // we use a smart resize algorithm based on the current filling ratio;
+ // in addition, we use float to avoid integer overflows
+ float nnzEstimate = float(m_outerIndex[outer])*float(m_outerSize)/float(outer+1);
+ reallocRatio = (nnzEstimate-float(m_data.size()))/float(m_data.size());
+ // furthermore we bound the realloc ratio to:
+ // 1) reduce multiple minor reallocs when the matrix is almost filled
+ // 2) avoid allocating too much memory when the matrix is almost empty
+ reallocRatio = (std::min)((std::max)(reallocRatio,1.5f),8.f);
+ }
+ }
+ m_data.resize(m_data.size()+1,reallocRatio);
+
+ if (!isLastVec)
+ {
+ if (previousOuter==-1)
+ {
+ // oops wrong guess.
+ // let's correct the outer offsets
+ for (Index k=0; k<=(outer+1); ++k)
+ m_outerIndex[k] = 0;
+ Index k=outer+1;
+ while(m_outerIndex[k]==0)
+ m_outerIndex[k++] = 1;
+ while (k<=m_outerSize && m_outerIndex[k]!=0)
+ m_outerIndex[k++]++;
+ p = 0;
+ --k;
+ k = m_outerIndex[k]-1;
+ while (k>0)
+ {
+ m_data.index(k) = m_data.index(k-1);
+ m_data.value(k) = m_data.value(k-1);
+ k--;
+ }
+ }
+ else
+ {
+ // we are not inserting into the last inner vec
+ // update outer indices:
+ Index j = outer+2;
+ while (j<=m_outerSize && m_outerIndex[j]!=0)
+ m_outerIndex[j++]++;
+ --j;
+ // shift data of last vecs:
+ Index k = m_outerIndex[j]-1;
+ while (k>=Index(p))
+ {
+ m_data.index(k) = m_data.index(k-1);
+ m_data.value(k) = m_data.value(k-1);
+ k--;
+ }
+ }
+ }
+
+ while ( (p > startId) && (m_data.index(p-1) > inner) )
+ {
+ m_data.index(p) = m_data.index(p-1);
+ m_data.value(p) = m_data.value(p-1);
+ --p;
+ }
+
+ m_data.index(p) = inner;
+ return (m_data.value(p) = 0);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEMATRIX_H
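A minimal usage sketch (illustrative only, not part of the diff): insert() on a compressed SparseMatrix goes through insertCompressed() above, while reserving per-column room first switches the matrix to uncompressed mode so that later insert() calls take the insertUncompressed() path.

    #include <Eigen/SparseCore>

    int main() {
      Eigen::SparseMatrix<double> m(4, 4);           // starts compressed and empty
      m.insert(0, 0) = 1.0;                          // insertCompressed() path
      m.reserve(Eigen::VectorXi::Constant(4, 2));    // per-column room: switches to uncompressed mode
      m.insert(2, 1) = 5.0;                          // insertUncompressed() path
      m.makeCompressed();                            // back to compressed storage
      return 0;
    }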
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseMatrixBase.h b/third_party/eigen3/Eigen/src/SparseCore/SparseMatrixBase.h
new file mode 100644
index 0000000000..bbcf7fb1c6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseMatrixBase.h
@@ -0,0 +1,451 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEMATRIXBASE_H
+#define EIGEN_SPARSEMATRIXBASE_H
+
+namespace Eigen {
+
+/** \ingroup SparseCore_Module
+ *
+ * \class SparseMatrixBase
+ *
+ * \brief Base class of any sparse matrices or sparse expressions
+ *
+ * \tparam Derived
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_SPARSEMATRIXBASE_PLUGIN.
+ */
+template<typename Derived> class SparseMatrixBase : public EigenBase<Derived>
+{
+ public:
+
+ typedef typename internal::traits<Derived>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type PacketScalar;
+ typedef typename internal::traits<Derived>::StorageKind StorageKind;
+ typedef typename internal::traits<Derived>::Index Index;
+ typedef typename internal::add_const_on_value_type_if_arithmetic<
+ typename internal::packet_traits<Scalar>::type
+ >::type PacketReturnType;
+
+ typedef SparseMatrixBase StorageBaseType;
+ typedef EigenBase<Derived> Base;
+
+ template<typename OtherDerived>
+ Derived& operator=(const EigenBase<OtherDerived> &other)
+ {
+ other.derived().evalTo(derived());
+ return derived();
+ }
+
+ enum {
+
+ RowsAtCompileTime = internal::traits<Derived>::RowsAtCompileTime,
+ /**< The number of rows at compile-time. This is just a copy of the value provided
+ * by the \a Derived type. If a value is not known at compile-time,
+ * it is set to the \a Dynamic constant.
+ * \sa MatrixBase::rows(), MatrixBase::cols(), ColsAtCompileTime, SizeAtCompileTime */
+
+ ColsAtCompileTime = internal::traits<Derived>::ColsAtCompileTime,
+ /**< The number of columns at compile-time. This is just a copy of the value provided
+ * by the \a Derived type. If a value is not known at compile-time,
+ * it is set to the \a Dynamic constant.
+ * \sa MatrixBase::rows(), MatrixBase::cols(), RowsAtCompileTime, SizeAtCompileTime */
+
+
+ SizeAtCompileTime = (internal::size_at_compile_time<internal::traits<Derived>::RowsAtCompileTime,
+ internal::traits<Derived>::ColsAtCompileTime>::ret),
+ /**< This is equal to the number of coefficients, i.e. the number of
+ * rows times the number of columns, or to \a Dynamic if this is not
+ * known at compile-time. \sa RowsAtCompileTime, ColsAtCompileTime */
+
+ MaxRowsAtCompileTime = RowsAtCompileTime,
+ MaxColsAtCompileTime = ColsAtCompileTime,
+
+ MaxSizeAtCompileTime = (internal::size_at_compile_time<MaxRowsAtCompileTime,
+ MaxColsAtCompileTime>::ret),
+
+ IsVectorAtCompileTime = RowsAtCompileTime == 1 || ColsAtCompileTime == 1,
+ /**< This is set to true if either the number of rows or the number of
+ * columns is known at compile-time to be equal to 1. Indeed, in that case,
+ * we are dealing with a column-vector (if there is only one column) or with
+ * a row-vector (if there is only one row). */
+
+ Flags = internal::traits<Derived>::Flags,
+ /**< This stores expression \ref flags flags which may or may not be inherited by new expressions
+ * constructed from this one. See the \ref flags "list of flags".
+ */
+
+ CoeffReadCost = internal::traits<Derived>::CoeffReadCost,
+ /**< This is a rough measure of how expensive it is to read one coefficient from
+ * this expression.
+ */
+
+ IsRowMajor = Flags&RowMajorBit ? 1 : 0,
+
+ InnerSizeAtCompileTime = int(IsVectorAtCompileTime) ? int(SizeAtCompileTime)
+ : int(IsRowMajor) ? int(ColsAtCompileTime) : int(RowsAtCompileTime),
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ _HasDirectAccess = (int(Flags)&DirectAccessBit) ? 1 : 0 // workaround sunCC
+ #endif
+ };
+
+ /** \internal the return type of MatrixBase::adjoint() */
+ typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, Eigen::Transpose<const Derived> >,
+ Transpose<const Derived>
+ >::type AdjointReturnType;
+
+
+ typedef SparseMatrix<Scalar, Flags&RowMajorBit ? RowMajor : ColMajor, Index> PlainObject;
+
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** This is the "real scalar" type; if the \a Scalar type is already a real number type
+ * (e.g. int, float or double) then \a RealScalar is just the same as \a Scalar. If
+ * \a Scalar is \a std::complex<T> then RealScalar is \a T.
+ *
+ * \sa class NumTraits
+ */
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ /** \internal the return type of coeff()
+ */
+ typedef typename internal::conditional<_HasDirectAccess, const Scalar&, Scalar>::type CoeffReturnType;
+
+ /** \internal Represents a matrix with all coefficients equal to one another */
+ typedef CwiseNullaryOp<internal::scalar_constant_op<Scalar>,Matrix<Scalar,Dynamic,Dynamic> > ConstantReturnType;
+
+ /** type of the equivalent square matrix */
+ typedef Matrix<Scalar,EIGEN_SIZE_MAX(RowsAtCompileTime,ColsAtCompileTime),
+ EIGEN_SIZE_MAX(RowsAtCompileTime,ColsAtCompileTime)> SquareMatrixType;
+
+ inline const Derived& derived() const { return *static_cast<const Derived*>(this); }
+ inline Derived& derived() { return *static_cast<Derived*>(this); }
+ inline Derived& const_cast_derived() const
+ { return *static_cast<Derived*>(const_cast<SparseMatrixBase*>(this)); }
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+#define EIGEN_CURRENT_STORAGE_BASE_CLASS Eigen::SparseMatrixBase
+# include "../plugins/CommonCwiseUnaryOps.h"
+# include "../plugins/CommonCwiseBinaryOps.h"
+# include "../plugins/MatrixCwiseUnaryOps.h"
+# include "../plugins/MatrixCwiseBinaryOps.h"
+# include "../plugins/BlockMethods.h"
+# ifdef EIGEN_SPARSEMATRIXBASE_PLUGIN
+# include EIGEN_SPARSEMATRIXBASE_PLUGIN
+# endif
+# undef EIGEN_CURRENT_STORAGE_BASE_CLASS
+#undef EIGEN_CURRENT_STORAGE_BASE_CLASS
+
+ /** \returns the number of rows. \sa cols() */
+ inline Index rows() const { return derived().rows(); }
+ /** \returns the number of columns. \sa rows() */
+ inline Index cols() const { return derived().cols(); }
+ /** \returns the number of coefficients, which is \a rows()*cols().
+ * \sa rows(), cols(). */
+ inline Index size() const { return rows() * cols(); }
+ /** \returns the number of nonzero coefficients which is in practice the number
+ * of stored coefficients. */
+ inline Index nonZeros() const { return derived().nonZeros(); }
+ /** \returns true if either the number of rows or the number of columns is equal to 1.
+ * In other words, this function returns
+ * \code rows()==1 || cols()==1 \endcode
+ * \sa rows(), cols(), IsVectorAtCompileTime. */
+ inline bool isVector() const { return rows()==1 || cols()==1; }
+ /** \returns the size of the storage major dimension,
+ * i.e., the number of columns for a column-major matrix, and the number of rows otherwise */
+ Index outerSize() const { return (int(Flags)&RowMajorBit) ? this->rows() : this->cols(); }
+ /** \returns the size of the inner dimension according to the storage order,
+ * i.e., the number of rows for a column-major matrix, and the number of columns otherwise */
+ Index innerSize() const { return (int(Flags)&RowMajorBit) ? this->cols() : this->rows(); }
+
+ bool isRValue() const { return m_isRValue; }
+ Derived& markAsRValue() { m_isRValue = true; return derived(); }
+
+ SparseMatrixBase() : m_isRValue(false) { /* TODO check flags */ }
+
+
+ template<typename OtherDerived>
+ Derived& operator=(const ReturnByValue<OtherDerived>& other)
+ {
+ other.evalTo(derived());
+ return derived();
+ }
+
+
+ template<typename OtherDerived>
+ inline Derived& operator=(const SparseMatrixBase<OtherDerived>& other)
+ {
+ return assign(other.derived());
+ }
+
+ inline Derived& operator=(const Derived& other)
+ {
+// if (other.isRValue())
+// derived().swap(other.const_cast_derived());
+// else
+ return assign(other.derived());
+ }
+
+ protected:
+
+ template<typename OtherDerived>
+ inline Derived& assign(const OtherDerived& other)
+ {
+ const bool transpose = (Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit);
+ const Index outerSize = (int(OtherDerived::Flags) & RowMajorBit) ? other.rows() : other.cols();
+ if ((!transpose) && other.isRValue())
+ {
+ // eval without temporary
+ derived().resize(other.rows(), other.cols());
+ derived().setZero();
+ derived().reserve((std::max)(this->rows(),this->cols())*2);
+ for (Index j=0; j<outerSize; ++j)
+ {
+ derived().startVec(j);
+ for (typename OtherDerived::InnerIterator it(other, j); it; ++it)
+ {
+ Scalar v = it.value();
+ derived().insertBackByOuterInner(j,it.index()) = v;
+ }
+ }
+ derived().finalize();
+ }
+ else
+ {
+ assignGeneric(other);
+ }
+ return derived();
+ }
+
+ template<typename OtherDerived>
+ inline void assignGeneric(const OtherDerived& other)
+ {
+ //const bool transpose = (Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit);
+ eigen_assert(( ((internal::traits<Derived>::SupportedAccessPatterns&OuterRandomAccessPattern)==OuterRandomAccessPattern) ||
+ (!((Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit)))) &&
+ "the transpose operation is supposed to be handled in SparseMatrix::operator=");
+
+ enum { Flip = (Flags & RowMajorBit) != (OtherDerived::Flags & RowMajorBit) };
+
+ const Index outerSize = other.outerSize();
+ //typedef typename internal::conditional<transpose, LinkedVectorMatrix<Scalar,Flags&RowMajorBit>, Derived>::type TempType;
+ // thanks to shallow copies, we always eval to a temporary
+ Derived temp(other.rows(), other.cols());
+
+ temp.reserve((std::max)(this->rows(),this->cols())*2);
+ for (Index j=0; j<outerSize; ++j)
+ {
+ temp.startVec(j);
+ for (typename OtherDerived::InnerIterator it(other.derived(), j); it; ++it)
+ {
+ Scalar v = it.value();
+ temp.insertBackByOuterInner(Flip?it.index():j,Flip?j:it.index()) = v;
+ }
+ }
+ temp.finalize();
+
+ derived() = temp.markAsRValue();
+ }
+
+ public:
+
+ template<typename Lhs, typename Rhs>
+ inline Derived& operator=(const SparseSparseProduct<Lhs,Rhs>& product);
+
+ friend std::ostream & operator << (std::ostream & s, const SparseMatrixBase& m)
+ {
+ typedef typename Derived::Nested Nested;
+ typedef typename internal::remove_all<Nested>::type NestedCleaned;
+
+ if (Flags&RowMajorBit)
+ {
+ const Nested nm(m.derived());
+ for (Index row=0; row<nm.outerSize(); ++row)
+ {
+ Index col = 0;
+ for (typename NestedCleaned::InnerIterator it(nm.derived(), row); it; ++it)
+ {
+ for ( ; col<it.index(); ++col)
+ s << "0 ";
+ s << it.value() << " ";
+ ++col;
+ }
+ for ( ; col<m.cols(); ++col)
+ s << "0 ";
+ s << std::endl;
+ }
+ }
+ else
+ {
+ const Nested nm(m.derived());
+ if (m.cols() == 1) {
+ Index row = 0;
+ for (typename NestedCleaned::InnerIterator it(nm.derived(), 0); it; ++it)
+ {
+ for ( ; row<it.index(); ++row)
+ s << "0" << std::endl;
+ s << it.value() << std::endl;
+ ++row;
+ }
+ for ( ; row<m.rows(); ++row)
+ s << "0" << std::endl;
+ }
+ else
+ {
+ SparseMatrix<Scalar, RowMajorBit, Index> trans = m;
+ s << static_cast<const SparseMatrixBase<SparseMatrix<Scalar, RowMajorBit, Index> >&>(trans);
+ }
+ }
+ return s;
+ }
+
+ template<typename OtherDerived>
+ Derived& operator+=(const SparseMatrixBase<OtherDerived>& other);
+ template<typename OtherDerived>
+ Derived& operator-=(const SparseMatrixBase<OtherDerived>& other);
+
+ Derived& operator*=(const Scalar& other);
+ Derived& operator/=(const Scalar& other);
+
+ #define EIGEN_SPARSE_CWISE_PRODUCT_RETURN_TYPE \
+ CwiseBinaryOp< \
+ internal::scalar_product_op< \
+ typename internal::scalar_product_traits< \
+ typename internal::traits<Derived>::Scalar, \
+ typename internal::traits<OtherDerived>::Scalar \
+ >::ReturnType \
+ >, \
+ const Derived, \
+ const OtherDerived \
+ >
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE const EIGEN_SPARSE_CWISE_PRODUCT_RETURN_TYPE
+ cwiseProduct(const MatrixBase<OtherDerived> &other) const;
+
+ // sparse * sparse
+ template<typename OtherDerived>
+ const typename SparseSparseProductReturnType<Derived,OtherDerived>::Type
+ operator*(const SparseMatrixBase<OtherDerived> &other) const;
+
+ // sparse * diagonal
+ template<typename OtherDerived>
+ const SparseDiagonalProduct<Derived,OtherDerived>
+ operator*(const DiagonalBase<OtherDerived> &other) const;
+
+ // diagonal * sparse
+ template<typename OtherDerived> friend
+ const SparseDiagonalProduct<OtherDerived,Derived>
+ operator*(const DiagonalBase<OtherDerived> &lhs, const SparseMatrixBase& rhs)
+ { return SparseDiagonalProduct<OtherDerived,Derived>(lhs.derived(), rhs.derived()); }
+
+ /** dense * sparse (return a dense object unless it is an outer product) */
+ template<typename OtherDerived> friend
+ const typename DenseSparseProductReturnType<OtherDerived,Derived>::Type
+ operator*(const MatrixBase<OtherDerived>& lhs, const Derived& rhs)
+ { return typename DenseSparseProductReturnType<OtherDerived,Derived>::Type(lhs.derived(),rhs); }
+
+ /** sparse * dense (returns a dense object unless it is an outer product) */
+ template<typename OtherDerived>
+ const typename SparseDenseProductReturnType<Derived,OtherDerived>::Type
+ operator*(const MatrixBase<OtherDerived> &other) const;
+
+ /** \returns an expression of P H P^-1 where H is the matrix represented by \c *this */
+ SparseSymmetricPermutationProduct<Derived,Upper|Lower> twistedBy(const PermutationMatrix<Dynamic,Dynamic,Index>& perm) const
+ {
+ return SparseSymmetricPermutationProduct<Derived,Upper|Lower>(derived(), perm);
+ }
+
+ template<typename OtherDerived>
+ Derived& operator*=(const SparseMatrixBase<OtherDerived>& other);
+
+ #ifdef EIGEN2_SUPPORT
+ // deprecated
+ template<typename OtherDerived>
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type
+ solveTriangular(const MatrixBase<OtherDerived>& other) const;
+
+ // deprecated
+ template<typename OtherDerived>
+ void solveTriangularInPlace(MatrixBase<OtherDerived>& other) const;
+ #endif // EIGEN2_SUPPORT
+
+ template<int Mode>
+ inline const SparseTriangularView<Derived, Mode> triangularView() const;
+
+ template<unsigned int UpLo> inline const SparseSelfAdjointView<Derived, UpLo> selfadjointView() const;
+ template<unsigned int UpLo> inline SparseSelfAdjointView<Derived, UpLo> selfadjointView();
+
+ template<typename OtherDerived> Scalar dot(const MatrixBase<OtherDerived>& other) const;
+ template<typename OtherDerived> Scalar dot(const SparseMatrixBase<OtherDerived>& other) const;
+ RealScalar squaredNorm() const;
+ RealScalar norm() const;
+ RealScalar blueNorm() const;
+
+ Transpose<Derived> transpose() { return derived(); }
+ const Transpose<const Derived> transpose() const { return derived(); }
+ const AdjointReturnType adjoint() const { return transpose(); }
+
+ // inner-vector
+ typedef Block<Derived,IsRowMajor?1:Dynamic,IsRowMajor?Dynamic:1,true> InnerVectorReturnType;
+ typedef Block<const Derived,IsRowMajor?1:Dynamic,IsRowMajor?Dynamic:1,true> ConstInnerVectorReturnType;
+ InnerVectorReturnType innerVector(Index outer);
+ const ConstInnerVectorReturnType innerVector(Index outer) const;
+
+ // set of inner-vectors
+ Block<Derived,Dynamic,Dynamic,true> innerVectors(Index outerStart, Index outerSize);
+ const Block<const Derived,Dynamic,Dynamic,true> innerVectors(Index outerStart, Index outerSize) const;
+
+ /** \internal use operator= */
+ template<typename DenseDerived>
+ void evalTo(MatrixBase<DenseDerived>& dst) const
+ {
+ dst.setZero();
+ for (Index j=0; j<outerSize(); ++j)
+ for (typename Derived::InnerIterator i(derived(),j); i; ++i)
+ dst.coeffRef(i.row(),i.col()) = i.value();
+ }
+
+ Matrix<Scalar,RowsAtCompileTime,ColsAtCompileTime> toDense() const
+ {
+ return derived();
+ }
+
+ template<typename OtherDerived>
+ bool isApprox(const SparseMatrixBase<OtherDerived>& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return toDense().isApprox(other.toDense(),prec); }
+
+ template<typename OtherDerived>
+ bool isApprox(const MatrixBase<OtherDerived>& other,
+ const RealScalar& prec = NumTraits<Scalar>::dummy_precision()) const
+ { return toDense().isApprox(other,prec); }
+
+ /** \returns the matrix or vector obtained by evaluating this expression.
+ *
+ * Notice that in the case of a plain matrix or vector (not an expression) this function just returns
+ * a const reference, in order to avoid a useless copy.
+ */
+ inline const typename internal::eval<Derived>::type eval() const
+ { return typename internal::eval<Derived>::type(derived()); }
+
+ Scalar sum() const;
+
+ protected:
+
+ bool m_isRValue;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEMATRIXBASE_H
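A hedged sketch of how this CRTP base is typically consumed: generic code written against SparseMatrixBase<Derived> can rely only on the members declared above (rows(), cols(), nonZeros()); the helper name below is made up for illustration.

    #include <Eigen/SparseCore>

    // fraction of stored coefficients in any sparse expression
    template <typename Derived>
    double density(const Eigen::SparseMatrixBase<Derived>& m) {
      return static_cast<double>(m.nonZeros()) /
             (static_cast<double>(m.rows()) * static_cast<double>(m.cols()));
    }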
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparsePermutation.h b/third_party/eigen3/Eigen/src/SparseCore/SparsePermutation.h
new file mode 100644
index 0000000000..b85be93f6f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparsePermutation.h
@@ -0,0 +1,148 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_PERMUTATION_H
+#define EIGEN_SPARSE_PERMUTATION_H
+
+// This file implements sparse * permutation products
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed>
+struct traits<permut_sparsematrix_product_retval<PermutationType, MatrixType, Side, Transposed> >
+{
+ typedef typename remove_all<typename MatrixType::Nested>::type MatrixTypeNestedCleaned;
+ typedef typename MatrixTypeNestedCleaned::Scalar Scalar;
+ typedef typename MatrixTypeNestedCleaned::Index Index;
+ enum {
+ SrcStorageOrder = MatrixTypeNestedCleaned::Flags&RowMajorBit ? RowMajor : ColMajor,
+ MoveOuter = SrcStorageOrder==RowMajor ? Side==OnTheLeft : Side==OnTheRight
+ };
+
+ typedef typename internal::conditional<MoveOuter,
+ SparseMatrix<Scalar,SrcStorageOrder,Index>,
+ SparseMatrix<Scalar,int(SrcStorageOrder)==RowMajor?ColMajor:RowMajor,Index> >::type ReturnType;
+};
+
+template<typename PermutationType, typename MatrixType, int Side, bool Transposed>
+struct permut_sparsematrix_product_retval
+ : public ReturnByValue<permut_sparsematrix_product_retval<PermutationType, MatrixType, Side, Transposed> >
+{
+ typedef typename remove_all<typename MatrixType::Nested>::type MatrixTypeNestedCleaned;
+ typedef typename MatrixTypeNestedCleaned::Scalar Scalar;
+ typedef typename MatrixTypeNestedCleaned::Index Index;
+
+ enum {
+ SrcStorageOrder = MatrixTypeNestedCleaned::Flags&RowMajorBit ? RowMajor : ColMajor,
+ MoveOuter = SrcStorageOrder==RowMajor ? Side==OnTheLeft : Side==OnTheRight
+ };
+
+ permut_sparsematrix_product_retval(const PermutationType& perm, const MatrixType& matrix)
+ : m_permutation(perm), m_matrix(matrix)
+ {}
+
+ inline int rows() const { return m_matrix.rows(); }
+ inline int cols() const { return m_matrix.cols(); }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ if(MoveOuter)
+ {
+ SparseMatrix<Scalar,SrcStorageOrder,Index> tmp(m_matrix.rows(), m_matrix.cols());
+ Matrix<Index,Dynamic,1> sizes(m_matrix.outerSize());
+ for(Index j=0; j<m_matrix.outerSize(); ++j)
+ {
+ Index jp = m_permutation.indices().coeff(j);
+ sizes[((Side==OnTheLeft) ^ Transposed) ? jp : j] = m_matrix.innerVector(((Side==OnTheRight) ^ Transposed) ? jp : j).size();
+ }
+ tmp.reserve(sizes);
+ for(Index j=0; j<m_matrix.outerSize(); ++j)
+ {
+ Index jp = m_permutation.indices().coeff(j);
+ Index jsrc = ((Side==OnTheRight) ^ Transposed) ? jp : j;
+ Index jdst = ((Side==OnTheLeft) ^ Transposed) ? jp : j;
+ for(typename MatrixTypeNestedCleaned::InnerIterator it(m_matrix,jsrc); it; ++it)
+ tmp.insertByOuterInner(jdst,it.index()) = it.value();
+ }
+ dst = tmp;
+ }
+ else
+ {
+ SparseMatrix<Scalar,int(SrcStorageOrder)==RowMajor?ColMajor:RowMajor,Index> tmp(m_matrix.rows(), m_matrix.cols());
+ Matrix<Index,Dynamic,1> sizes(tmp.outerSize());
+ sizes.setZero();
+ PermutationMatrix<Dynamic,Dynamic,Index> perm;
+ if((Side==OnTheLeft) ^ Transposed)
+ perm = m_permutation;
+ else
+ perm = m_permutation.transpose();
+
+ for(Index j=0; j<m_matrix.outerSize(); ++j)
+ for(typename MatrixTypeNestedCleaned::InnerIterator it(m_matrix,j); it; ++it)
+ sizes[perm.indices().coeff(it.index())]++;
+ tmp.reserve(sizes);
+ for(Index j=0; j<m_matrix.outerSize(); ++j)
+ for(typename MatrixTypeNestedCleaned::InnerIterator it(m_matrix,j); it; ++it)
+ tmp.insertByOuterInner(perm.indices().coeff(it.index()),j) = it.value();
+ dst = tmp;
+ }
+ }
+
+ protected:
+ const PermutationType& m_permutation;
+ typename MatrixType::Nested m_matrix;
+};
+
+}
+
+
+
+/** \returns the matrix with the permutation applied to the columns
+ */
+template<typename SparseDerived, typename PermDerived>
+inline const internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheRight, false>
+operator*(const SparseMatrixBase<SparseDerived>& matrix, const PermutationBase<PermDerived>& perm)
+{
+ return internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheRight, false>(perm, matrix.derived());
+}
+
+/** \returns the matrix with the permutation applied to the rows
+ */
+template<typename SparseDerived, typename PermDerived>
+inline const internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheLeft, false>
+operator*( const PermutationBase<PermDerived>& perm, const SparseMatrixBase<SparseDerived>& matrix)
+{
+ return internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheLeft, false>(perm, matrix.derived());
+}
+
+
+
+/** \returns the matrix with the inverse permutation applied to the columns.
+ */
+template<typename SparseDerived, typename PermDerived>
+inline const internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheRight, true>
+operator*(const SparseMatrixBase<SparseDerived>& matrix, const Transpose<PermutationBase<PermDerived> >& tperm)
+{
+ return internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheRight, true>(tperm.nestedPermutation(), matrix.derived());
+}
+
+/** \returns the matrix with the inverse permutation applied to the rows.
+ */
+template<typename SparseDerived, typename PermDerived>
+inline const internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheLeft, true>
+operator*(const Transpose<PermutationBase<PermDerived> >& tperm, const SparseMatrixBase<SparseDerived>& matrix)
+{
+ return internal::permut_sparsematrix_product_retval<PermutationBase<PermDerived>, SparseDerived, OnTheLeft, true>(tperm.nestedPermutation(), matrix.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_PERMUTATION_H
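For illustration, the operator* overloads above are used as follows (the function name is assumed, and the permutation must match the matrix size):

    #include <Eigen/SparseCore>

    void permuteDemo(const Eigen::SparseMatrix<double>& A,
                     const Eigen::PermutationMatrix<Eigen::Dynamic, Eigen::Dynamic>& P) {
      Eigen::SparseMatrix<double> rowPermuted, colPermuted;
      rowPermuted = P * A;  // permutation applied to the rows
      colPermuted = A * P;  // permutation applied to the columns
    }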
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseProduct.h b/third_party/eigen3/Eigen/src/SparseCore/SparseProduct.h
new file mode 100644
index 0000000000..cf76630700
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseProduct.h
@@ -0,0 +1,188 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEPRODUCT_H
+#define EIGEN_SPARSEPRODUCT_H
+
+namespace Eigen {
+
+template<typename Lhs, typename Rhs>
+struct SparseSparseProductReturnType
+{
+ typedef typename internal::traits<Lhs>::Scalar Scalar;
+ typedef typename internal::traits<Lhs>::Index Index;
+ enum {
+ LhsRowMajor = internal::traits<Lhs>::Flags & RowMajorBit,
+ RhsRowMajor = internal::traits<Rhs>::Flags & RowMajorBit,
+ TransposeRhs = (!LhsRowMajor) && RhsRowMajor,
+ TransposeLhs = LhsRowMajor && (!RhsRowMajor)
+ };
+
+ typedef typename internal::conditional<TransposeLhs,
+ SparseMatrix<Scalar,0,Index>,
+ typename internal::nested<Lhs,Rhs::RowsAtCompileTime>::type>::type LhsNested;
+
+ typedef typename internal::conditional<TransposeRhs,
+ SparseMatrix<Scalar,0,Index>,
+ typename internal::nested<Rhs,Lhs::RowsAtCompileTime>::type>::type RhsNested;
+
+ typedef SparseSparseProduct<LhsNested, RhsNested> Type;
+};
+
+namespace internal {
+template<typename LhsNested, typename RhsNested>
+struct traits<SparseSparseProduct<LhsNested, RhsNested> >
+{
+ typedef MatrixXpr XprKind;
+ // clean the nested types:
+ typedef typename remove_all<LhsNested>::type _LhsNested;
+ typedef typename remove_all<RhsNested>::type _RhsNested;
+ typedef typename _LhsNested::Scalar Scalar;
+ typedef typename promote_index_type<typename traits<_LhsNested>::Index,
+ typename traits<_RhsNested>::Index>::type Index;
+
+ enum {
+ LhsCoeffReadCost = _LhsNested::CoeffReadCost,
+ RhsCoeffReadCost = _RhsNested::CoeffReadCost,
+ LhsFlags = _LhsNested::Flags,
+ RhsFlags = _RhsNested::Flags,
+
+ RowsAtCompileTime = _LhsNested::RowsAtCompileTime,
+ ColsAtCompileTime = _RhsNested::ColsAtCompileTime,
+ MaxRowsAtCompileTime = _LhsNested::MaxRowsAtCompileTime,
+ MaxColsAtCompileTime = _RhsNested::MaxColsAtCompileTime,
+
+ InnerSize = EIGEN_SIZE_MIN_PREFER_FIXED(_LhsNested::ColsAtCompileTime, _RhsNested::RowsAtCompileTime),
+
+ EvalToRowMajor = (RhsFlags & LhsFlags & RowMajorBit),
+
+ RemovedBits = ~(EvalToRowMajor ? 0 : RowMajorBit),
+
+ Flags = (int(LhsFlags | RhsFlags) & HereditaryBits & RemovedBits)
+ | EvalBeforeAssigningBit
+ | EvalBeforeNestingBit,
+
+ CoeffReadCost = Dynamic
+ };
+
+ typedef Sparse StorageKind;
+};
+
+} // end namespace internal
+
+template<typename LhsNested, typename RhsNested>
+class SparseSparseProduct : internal::no_assignment_operator,
+ public SparseMatrixBase<SparseSparseProduct<LhsNested, RhsNested> >
+{
+ public:
+
+ typedef SparseMatrixBase<SparseSparseProduct> Base;
+ EIGEN_DENSE_PUBLIC_INTERFACE(SparseSparseProduct)
+
+ private:
+
+ typedef typename internal::traits<SparseSparseProduct>::_LhsNested _LhsNested;
+ typedef typename internal::traits<SparseSparseProduct>::_RhsNested _RhsNested;
+
+ public:
+
+ template<typename Lhs, typename Rhs>
+ EIGEN_STRONG_INLINE SparseSparseProduct(const Lhs& lhs, const Rhs& rhs)
+ : m_lhs(lhs), m_rhs(rhs), m_tolerance(0), m_conservative(true)
+ {
+ init();
+ }
+
+ template<typename Lhs, typename Rhs>
+ EIGEN_STRONG_INLINE SparseSparseProduct(const Lhs& lhs, const Rhs& rhs, const RealScalar& tolerance)
+ : m_lhs(lhs), m_rhs(rhs), m_tolerance(tolerance), m_conservative(false)
+ {
+ init();
+ }
+
+ SparseSparseProduct pruned(const Scalar& reference = 0, const RealScalar& epsilon = NumTraits<RealScalar>::dummy_precision()) const
+ {
+ using std::abs;
+ return SparseSparseProduct(m_lhs,m_rhs,abs(reference)*epsilon);
+ }
+
+ template<typename Dest>
+ void evalTo(Dest& result) const
+ {
+ if(m_conservative)
+ internal::conservative_sparse_sparse_product_selector<_LhsNested, _RhsNested, Dest>::run(lhs(),rhs(),result);
+ else
+ internal::sparse_sparse_product_with_pruning_selector<_LhsNested, _RhsNested, Dest>::run(lhs(),rhs(),result,m_tolerance);
+ }
+
+ EIGEN_STRONG_INLINE Index rows() const { return m_lhs.rows(); }
+ EIGEN_STRONG_INLINE Index cols() const { return m_rhs.cols(); }
+
+ EIGEN_STRONG_INLINE const _LhsNested& lhs() const { return m_lhs; }
+ EIGEN_STRONG_INLINE const _RhsNested& rhs() const { return m_rhs; }
+
+ protected:
+ void init()
+ {
+ eigen_assert(m_lhs.cols() == m_rhs.rows());
+
+ enum {
+ ProductIsValid = _LhsNested::ColsAtCompileTime==Dynamic
+ || _RhsNested::RowsAtCompileTime==Dynamic
+ || int(_LhsNested::ColsAtCompileTime)==int(_RhsNested::RowsAtCompileTime),
+ AreVectors = _LhsNested::IsVectorAtCompileTime && _RhsNested::IsVectorAtCompileTime,
+ SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(_LhsNested,_RhsNested)
+ };
+ // note to the lost user:
+ // * for a dot product use: v1.dot(v2)
+ // * for a coeff-wise product use: v1.cwise()*v2
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes),
+ INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
+ EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
+ INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
+ EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
+ }
+
+ LhsNested m_lhs;
+ RhsNested m_rhs;
+ RealScalar m_tolerance;
+ bool m_conservative;
+};
+
+// sparse = sparse * sparse
+template<typename Derived>
+template<typename Lhs, typename Rhs>
+inline Derived& SparseMatrixBase<Derived>::operator=(const SparseSparseProduct<Lhs,Rhs>& product)
+{
+ product.evalTo(derived());
+ return derived();
+}
+
+/** \returns an expression of the product of two sparse matrices.
+ * By default a conservative product preserving the symbolic non-zeros is performed.
+ * Automatic pruning of small values can be achieved by calling the pruned() function,
+ * in which case a different product algorithm is employed:
+ * \code
+ * C = (A*B).pruned(); // suppress numerical zeros (exact)
+ * C = (A*B).pruned(ref);
+ * C = (A*B).pruned(ref,epsilon);
+ * \endcode
+ * where \c ref is a meaningful non-zero reference value.
+ * */
+template<typename Derived>
+template<typename OtherDerived>
+inline const typename SparseSparseProductReturnType<Derived,OtherDerived>::Type
+SparseMatrixBase<Derived>::operator*(const SparseMatrixBase<OtherDerived> &other) const
+{
+ return typename SparseSparseProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEPRODUCT_H
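An illustrative use of the product operator defined above, mirroring the snippet from the doxygen comment; A and B are assumed to have compatible sizes.

    #include <Eigen/SparseCore>

    void productDemo(const Eigen::SparseMatrix<double>& A,
                     const Eigen::SparseMatrix<double>& B) {
      Eigen::SparseMatrix<double> C, D;
      C = A * B;             // conservative product keeping symbolic non-zeros
      D = (A * B).pruned();  // prunes exact numerical zeros during the product
    }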
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseRedux.h b/third_party/eigen3/Eigen/src/SparseCore/SparseRedux.h
new file mode 100644
index 0000000000..f3da93a71d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseRedux.h
@@ -0,0 +1,45 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEREDUX_H
+#define EIGEN_SPARSEREDUX_H
+
+namespace Eigen {
+
+template<typename Derived>
+typename internal::traits<Derived>::Scalar
+SparseMatrixBase<Derived>::sum() const
+{
+ eigen_assert(rows()>0 && cols()>0 && "you are using an uninitialized matrix");
+ Scalar res(0);
+ for (Index j=0; j<outerSize(); ++j)
+ for (typename Derived::InnerIterator iter(derived(),j); iter; ++iter)
+ res += iter.value();
+ return res;
+}
+
+template<typename _Scalar, int _Options, typename _Index>
+typename internal::traits<SparseMatrix<_Scalar,_Options,_Index> >::Scalar
+SparseMatrix<_Scalar,_Options,_Index>::sum() const
+{
+ eigen_assert(rows()>0 && cols()>0 && "you are using an uninitialized matrix");
+ return Matrix<Scalar,1,Dynamic>::Map(&m_data.value(0), m_data.size()).sum();
+}
+
+template<typename _Scalar, int _Options, typename _Index>
+typename internal::traits<SparseVector<_Scalar,_Options, _Index> >::Scalar
+SparseVector<_Scalar,_Options,_Index>::sum() const
+{
+ eigen_assert(rows()>0 && cols()>0 && "you are using an uninitialized matrix");
+ return Matrix<Scalar,1,Dynamic>::Map(&m_data.value(0), m_data.size()).sum();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEREDUX_H
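A small sketch of the reduction above; the helper name is hypothetical.

    #include <Eigen/SparseCore>

    double totalWeight(const Eigen::SparseMatrix<double>& A) {
      return A.sum();  // sums the stored coefficients via the Map-based overload above
    }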
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseSelfAdjointView.h b/third_party/eigen3/Eigen/src/SparseCore/SparseSelfAdjointView.h
new file mode 100644
index 0000000000..0eda96bc47
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseSelfAdjointView.h
@@ -0,0 +1,507 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
+#define EIGEN_SPARSE_SELFADJOINTVIEW_H
+
+namespace Eigen {
+
+/** \ingroup SparseCore_Module
+ * \class SparseSelfAdjointView
+ *
+ * \brief Pseudo expression to manipulate a triangular sparse matrix as a selfadjoint matrix.
+ *
+ * \param MatrixType the type of the sparse matrix storing the coefficients
+ * \param UpLo can be either \c #Lower or \c #Upper
+ *
+ * This class is an expression of a selfadjoint matrix from a triangular part of a matrix
+ * with given sparse storage of the coefficients. It is the return type of SparseMatrixBase::selfadjointView()
+ * and most of the time this is the only way it is used.
+ *
+ * \sa SparseMatrixBase::selfadjointView()
+ */
+template<typename Lhs, typename Rhs, int UpLo>
+class SparseSelfAdjointTimeDenseProduct;
+
+template<typename Lhs, typename Rhs, int UpLo>
+class DenseTimeSparseSelfAdjointProduct;
+
+namespace internal {
+
+template<typename MatrixType, unsigned int UpLo>
+struct traits<SparseSelfAdjointView<MatrixType,UpLo> > : traits<MatrixType> {
+};
+
+template<int SrcUpLo,int DstUpLo,typename MatrixType,int DestOrder>
+void permute_symm_to_symm(const MatrixType& mat, SparseMatrix<typename MatrixType::Scalar,DestOrder,typename MatrixType::Index>& _dest, const typename MatrixType::Index* perm = 0);
+
+template<int UpLo,typename MatrixType,int DestOrder>
+void permute_symm_to_fullsymm(const MatrixType& mat, SparseMatrix<typename MatrixType::Scalar,DestOrder,typename MatrixType::Index>& _dest, const typename MatrixType::Index* perm = 0);
+
+}
+
+template<typename MatrixType, unsigned int UpLo> class SparseSelfAdjointView
+ : public EigenBase<SparseSelfAdjointView<MatrixType,UpLo> >
+{
+ public:
+
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Index,Dynamic,1> VectorI;
+ typedef typename MatrixType::Nested MatrixTypeNested;
+ typedef typename internal::remove_all<MatrixTypeNested>::type _MatrixTypeNested;
+
+ inline SparseSelfAdjointView(const MatrixType& matrix) : m_matrix(matrix)
+ {
+ eigen_assert(rows()==cols() && "SelfAdjointView is only for square matrices");
+ }
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ /** \internal \returns a reference to the nested matrix */
+ const _MatrixTypeNested& matrix() const { return m_matrix; }
+ _MatrixTypeNested& matrix() { return m_matrix.const_cast_derived(); }
+
+ /** \returns an expression of the matrix product between a sparse self-adjoint matrix \c *this and a sparse matrix \a rhs.
+ *
+ * Note that there is no algorithmic advantage in performing such a product compared to a general sparse-sparse matrix product.
+ * Indeed, the SparseSelfAdjointView operand is first copied into a temporary SparseMatrix before computing the product.
+ */
+ template<typename OtherDerived>
+ SparseSparseProduct<typename OtherDerived::PlainObject, OtherDerived>
+ operator*(const SparseMatrixBase<OtherDerived>& rhs) const
+ {
+ return SparseSparseProduct<typename OtherDerived::PlainObject, OtherDerived>(*this, rhs.derived());
+ }
+
+ /** \returns an expression of the matrix product between a sparse matrix \a lhs and a sparse self-adjoint matrix \a rhs.
+ *
+ * Note that there is no algorithmic advantage in performing such a product compared to a general sparse-sparse matrix product.
+ * Indeed, the SparseSelfAdjointView operand is first copied into a temporary SparseMatrix before computing the product.
+ */
+ template<typename OtherDerived> friend
+ SparseSparseProduct<OtherDerived, typename OtherDerived::PlainObject >
+ operator*(const SparseMatrixBase<OtherDerived>& lhs, const SparseSelfAdjointView& rhs)
+ {
+ return SparseSparseProduct<OtherDerived, typename OtherDerived::PlainObject>(lhs.derived(), rhs);
+ }
+
+ /** Efficient sparse self-adjoint matrix times dense vector/matrix product */
+ template<typename OtherDerived>
+ SparseSelfAdjointTimeDenseProduct<MatrixType,OtherDerived,UpLo>
+ operator*(const MatrixBase<OtherDerived>& rhs) const
+ {
+ return SparseSelfAdjointTimeDenseProduct<MatrixType,OtherDerived,UpLo>(m_matrix, rhs.derived());
+ }
+
+ /** Efficient dense vector/matrix times sparse self-adjoint matrix product */
+ template<typename OtherDerived> friend
+ DenseTimeSparseSelfAdjointProduct<OtherDerived,MatrixType,UpLo>
+ operator*(const MatrixBase<OtherDerived>& lhs, const SparseSelfAdjointView& rhs)
+ {
+ return DenseTimeSparseSelfAdjointProduct<OtherDerived,_MatrixTypeNested,UpLo>(lhs.derived(), rhs.m_matrix);
+ }
+
+ /** Perform a symmetric rank K update of the selfadjoint matrix \c *this:
+ * \f$ this = this + \alpha ( u u^* ) \f$ where \a u is a vector or matrix.
+ *
+ * \returns a reference to \c *this
+ *
+ * To perform \f$ this = this + \alpha ( u^* u ) \f$ you can simply
+ * call this function with u.adjoint().
+ */
+ template<typename DerivedU>
+ SparseSelfAdjointView& rankUpdate(const SparseMatrixBase<DerivedU>& u, const Scalar& alpha = Scalar(1));
+
+ /** \internal triggered by sparse_matrix = SparseSelfadjointView; */
+ template<typename DestScalar,int StorageOrder> void evalTo(SparseMatrix<DestScalar,StorageOrder,Index>& _dest) const
+ {
+ internal::permute_symm_to_fullsymm<UpLo>(m_matrix, _dest);
+ }
+
+ template<typename DestScalar> void evalTo(DynamicSparseMatrix<DestScalar,ColMajor,Index>& _dest) const
+ {
+ // TODO directly evaluate into _dest;
+ SparseMatrix<DestScalar,ColMajor,Index> tmp(_dest.rows(),_dest.cols());
+ internal::permute_symm_to_fullsymm<UpLo>(m_matrix, tmp);
+ _dest = tmp;
+ }
+
+ /** \returns an expression of P H P^-1 */
+ SparseSymmetricPermutationProduct<_MatrixTypeNested,UpLo> twistedBy(const PermutationMatrix<Dynamic,Dynamic,Index>& perm) const
+ {
+ return SparseSymmetricPermutationProduct<_MatrixTypeNested,UpLo>(m_matrix, perm);
+ }
+
+ template<typename SrcMatrixType,int SrcUpLo>
+ SparseSelfAdjointView& operator=(const SparseSymmetricPermutationProduct<SrcMatrixType,SrcUpLo>& permutedMatrix)
+ {
+ permutedMatrix.evalTo(*this);
+ return *this;
+ }
+
+
+ SparseSelfAdjointView& operator=(const SparseSelfAdjointView& src)
+ {
+ PermutationMatrix<Dynamic> pnull;
+ return *this = src.twistedBy(pnull);
+ }
+
+ template<typename SrcMatrixType,unsigned int SrcUpLo>
+ SparseSelfAdjointView& operator=(const SparseSelfAdjointView<SrcMatrixType,SrcUpLo>& src)
+ {
+ PermutationMatrix<Dynamic> pnull;
+ return *this = src.twistedBy(pnull);
+ }
+
+
+ // const SparseLLT<PlainObject, UpLo> llt() const;
+ // const SparseLDLT<PlainObject, UpLo> ldlt() const;
+
+ protected:
+
+ typename MatrixType::Nested m_matrix;
+ mutable VectorI m_countPerRow;
+ mutable VectorI m_countPerCol;
+};
+
+/***************************************************************************
+* Implementation of SparseMatrixBase methods
+***************************************************************************/
+
+template<typename Derived>
+template<unsigned int UpLo>
+const SparseSelfAdjointView<Derived, UpLo> SparseMatrixBase<Derived>::selfadjointView() const
+{
+ return derived();
+}
+
+template<typename Derived>
+template<unsigned int UpLo>
+SparseSelfAdjointView<Derived, UpLo> SparseMatrixBase<Derived>::selfadjointView()
+{
+ return derived();
+}
+
+/***************************************************************************
+* Implementation of SparseSelfAdjointView methods
+***************************************************************************/
+
+template<typename MatrixType, unsigned int UpLo>
+template<typename DerivedU>
+SparseSelfAdjointView<MatrixType,UpLo>&
+SparseSelfAdjointView<MatrixType,UpLo>::rankUpdate(const SparseMatrixBase<DerivedU>& u, const Scalar& alpha)
+{
+ SparseMatrix<Scalar,MatrixType::Flags&RowMajorBit?RowMajor:ColMajor> tmp = u * u.adjoint();
+ if(alpha==Scalar(0))
+ m_matrix.const_cast_derived() = tmp.template triangularView<UpLo>();
+ else
+ m_matrix.const_cast_derived() += alpha * tmp.template triangularView<UpLo>();
+
+ return *this;
+}
+
+/***************************************************************************
+* Implementation of sparse self-adjoint time dense matrix
+***************************************************************************/
+
+namespace internal {
+template<typename Lhs, typename Rhs, int UpLo>
+struct traits<SparseSelfAdjointTimeDenseProduct<Lhs,Rhs,UpLo> >
+ : traits<ProductBase<SparseSelfAdjointTimeDenseProduct<Lhs,Rhs,UpLo>, Lhs, Rhs> >
+{
+ typedef Dense StorageKind;
+};
+}
+
+template<typename Lhs, typename Rhs, int UpLo>
+class SparseSelfAdjointTimeDenseProduct
+ : public ProductBase<SparseSelfAdjointTimeDenseProduct<Lhs,Rhs,UpLo>, Lhs, Rhs>
+{
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(SparseSelfAdjointTimeDenseProduct)
+
+ SparseSelfAdjointTimeDenseProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& dest, const Scalar& alpha) const
+ {
+ EIGEN_ONLY_USED_FOR_DEBUG(alpha);
+ // TODO use alpha
+ eigen_assert(alpha==Scalar(1) && "alpha != 1 is not implemented yet, sorry");
+ typedef typename internal::remove_all<Lhs>::type _Lhs;
+ typedef typename _Lhs::InnerIterator LhsInnerIterator;
+ enum {
+ LhsIsRowMajor = (_Lhs::Flags&RowMajorBit)==RowMajorBit,
+ ProcessFirstHalf =
+ ((UpLo&(Upper|Lower))==(Upper|Lower))
+ || ( (UpLo&Upper) && !LhsIsRowMajor)
+ || ( (UpLo&Lower) && LhsIsRowMajor),
+ ProcessSecondHalf = !ProcessFirstHalf
+ };
+ for (Index j=0; j<m_lhs.outerSize(); ++j)
+ {
+ LhsInnerIterator i(m_lhs,j);
+ if (ProcessSecondHalf)
+ {
+ while (i && i.index()<j) ++i;
+ if(i && i.index()==j)
+ {
+ dest.row(j) += i.value() * m_rhs.row(j);
+ ++i;
+ }
+ }
+ for(; (ProcessFirstHalf ? i && i.index() < j : i) ; ++i)
+ {
+ Index a = LhsIsRowMajor ? j : i.index();
+ Index b = LhsIsRowMajor ? i.index() : j;
+ typename Lhs::Scalar v = i.value();
+ dest.row(a) += (v) * m_rhs.row(b);
+ dest.row(b) += numext::conj(v) * m_rhs.row(a);
+ }
+ if (ProcessFirstHalf && i && (i.index()==j))
+ dest.row(j) += i.value() * m_rhs.row(j);
+ }
+ }
+
+ private:
+ SparseSelfAdjointTimeDenseProduct& operator=(const SparseSelfAdjointTimeDenseProduct&);
+};
+
+namespace internal {
+template<typename Lhs, typename Rhs, int UpLo>
+struct traits<DenseTimeSparseSelfAdjointProduct<Lhs,Rhs,UpLo> >
+ : traits<ProductBase<DenseTimeSparseSelfAdjointProduct<Lhs,Rhs,UpLo>, Lhs, Rhs> >
+{};
+}
+
+template<typename Lhs, typename Rhs, int UpLo>
+class DenseTimeSparseSelfAdjointProduct
+ : public ProductBase<DenseTimeSparseSelfAdjointProduct<Lhs,Rhs,UpLo>, Lhs, Rhs>
+{
+ public:
+ EIGEN_PRODUCT_PUBLIC_INTERFACE(DenseTimeSparseSelfAdjointProduct)
+
+ DenseTimeSparseSelfAdjointProduct(const Lhs& lhs, const Rhs& rhs) : Base(lhs,rhs)
+ {}
+
+ template<typename Dest> void scaleAndAddTo(Dest& /*dest*/, const Scalar& /*alpha*/) const
+ {
+ // TODO
+ }
+
+ private:
+ DenseTimeSparseSelfAdjointProduct& operator=(const DenseTimeSparseSelfAdjointProduct&);
+};
+
+/***************************************************************************
+* Implementation of symmetric copies and permutations
+***************************************************************************/
+namespace internal {
+
+template<typename MatrixType, int UpLo>
+struct traits<SparseSymmetricPermutationProduct<MatrixType,UpLo> > : traits<MatrixType> {
+};
+
+template<int UpLo,typename MatrixType,int DestOrder>
+void permute_symm_to_fullsymm(const MatrixType& mat, SparseMatrix<typename MatrixType::Scalar,DestOrder,typename MatrixType::Index>& _dest, const typename MatrixType::Index* perm)
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef SparseMatrix<Scalar,DestOrder,Index> Dest;
+ typedef Matrix<Index,Dynamic,1> VectorI;
+
+ Dest& dest(_dest.derived());
+ enum {
+ StorageOrderMatch = int(Dest::IsRowMajor) == int(MatrixType::IsRowMajor)
+ };
+
+ Index size = mat.rows();
+ VectorI count;
+ count.resize(size);
+ count.setZero();
+ dest.resize(size,size);
+ for(Index j = 0; j<size; ++j)
+ {
+ Index jp = perm ? perm[j] : j;
+ for(typename MatrixType::InnerIterator it(mat,j); it; ++it)
+ {
+ Index i = it.index();
+ Index r = it.row();
+ Index c = it.col();
+ Index ip = perm ? perm[i] : i;
+ if(UpLo==(Upper|Lower))
+ count[StorageOrderMatch ? jp : ip]++;
+ else if(r==c)
+ count[ip]++;
+ else if(( UpLo==Lower && r>c) || ( UpLo==Upper && r<c))
+ {
+ count[ip]++;
+ count[jp]++;
+ }
+ }
+ }
+ Index nnz = count.sum();
+
+ // reserve space
+ dest.resizeNonZeros(nnz);
+ dest.outerIndexPtr()[0] = 0;
+ for(Index j=0; j<size; ++j)
+ dest.outerIndexPtr()[j+1] = dest.outerIndexPtr()[j] + count[j];
+ for(Index j=0; j<size; ++j)
+ count[j] = dest.outerIndexPtr()[j];
+
+ // copy data
+ for(Index j = 0; j<size; ++j)
+ {
+ for(typename MatrixType::InnerIterator it(mat,j); it; ++it)
+ {
+ Index i = it.index();
+ Index r = it.row();
+ Index c = it.col();
+
+ Index jp = perm ? perm[j] : j;
+ Index ip = perm ? perm[i] : i;
+
+ if(UpLo==(Upper|Lower))
+ {
+ Index k = count[StorageOrderMatch ? jp : ip]++;
+ dest.innerIndexPtr()[k] = StorageOrderMatch ? ip : jp;
+ dest.valuePtr()[k] = it.value();
+ }
+ else if(r==c)
+ {
+ Index k = count[ip]++;
+ dest.innerIndexPtr()[k] = ip;
+ dest.valuePtr()[k] = it.value();
+ }
+ else if(( (UpLo&Lower)==Lower && r>c) || ( (UpLo&Upper)==Upper && r<c))
+ {
+ if(!StorageOrderMatch)
+ std::swap(ip,jp);
+ Index k = count[jp]++;
+ dest.innerIndexPtr()[k] = ip;
+ dest.valuePtr()[k] = it.value();
+ k = count[ip]++;
+ dest.innerIndexPtr()[k] = jp;
+ dest.valuePtr()[k] = numext::conj(it.value());
+ }
+ }
+ }
+}
+
+template<int _SrcUpLo,int _DstUpLo,typename MatrixType,int DstOrder>
+void permute_symm_to_symm(const MatrixType& mat, SparseMatrix<typename MatrixType::Scalar,DstOrder,typename MatrixType::Index>& _dest, const typename MatrixType::Index* perm)
+{
+ typedef typename MatrixType::Index Index;
+ typedef typename MatrixType::Scalar Scalar;
+ SparseMatrix<Scalar,DstOrder,Index>& dest(_dest.derived());
+ typedef Matrix<Index,Dynamic,1> VectorI;
+ enum {
+ SrcOrder = MatrixType::IsRowMajor ? RowMajor : ColMajor,
+ StorageOrderMatch = int(SrcOrder) == int(DstOrder),
+ DstUpLo = DstOrder==RowMajor ? (_DstUpLo==Upper ? Lower : Upper) : _DstUpLo,
+ SrcUpLo = SrcOrder==RowMajor ? (_SrcUpLo==Upper ? Lower : Upper) : _SrcUpLo
+ };
+
+ Index size = mat.rows();
+ VectorI count(size);
+ count.setZero();
+ dest.resize(size,size);
+ for(Index j = 0; j<size; ++j)
+ {
+ Index jp = perm ? perm[j] : j;
+ for(typename MatrixType::InnerIterator it(mat,j); it; ++it)
+ {
+ Index i = it.index();
+ if((int(SrcUpLo)==int(Lower) && i<j) || (int(SrcUpLo)==int(Upper) && i>j))
+ continue;
+
+ Index ip = perm ? perm[i] : i;
+ count[int(DstUpLo)==int(Lower) ? (std::min)(ip,jp) : (std::max)(ip,jp)]++;
+ }
+ }
+ dest.outerIndexPtr()[0] = 0;
+ for(Index j=0; j<size; ++j)
+ dest.outerIndexPtr()[j+1] = dest.outerIndexPtr()[j] + count[j];
+ dest.resizeNonZeros(dest.outerIndexPtr()[size]);
+ for(Index j=0; j<size; ++j)
+ count[j] = dest.outerIndexPtr()[j];
+
+ for(Index j = 0; j<size; ++j)
+ {
+
+ for(typename MatrixType::InnerIterator it(mat,j); it; ++it)
+ {
+ Index i = it.index();
+ if((int(SrcUpLo)==int(Lower) && i<j) || (int(SrcUpLo)==int(Upper) && i>j))
+ continue;
+
+ Index jp = perm ? perm[j] : j;
+ Index ip = perm? perm[i] : i;
+
+ Index k = count[int(DstUpLo)==int(Lower) ? (std::min)(ip,jp) : (std::max)(ip,jp)]++;
+ dest.innerIndexPtr()[k] = int(DstUpLo)==int(Lower) ? (std::max)(ip,jp) : (std::min)(ip,jp);
+
+ if(!StorageOrderMatch) std::swap(ip,jp);
+ if( ((int(DstUpLo)==int(Lower) && ip<jp) || (int(DstUpLo)==int(Upper) && ip>jp)))
+ dest.valuePtr()[k] = numext::conj(it.value());
+ else
+ dest.valuePtr()[k] = it.value();
+ }
+ }
+}
+
+}
+
+template<typename MatrixType,int UpLo>
+class SparseSymmetricPermutationProduct
+ : public EigenBase<SparseSymmetricPermutationProduct<MatrixType,UpLo> >
+{
+ public:
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::Index Index;
+ protected:
+ typedef PermutationMatrix<Dynamic,Dynamic,Index> Perm;
+ public:
+ typedef Matrix<Index,Dynamic,1> VectorI;
+ typedef typename MatrixType::Nested MatrixTypeNested;
+ typedef typename internal::remove_all<MatrixTypeNested>::type _MatrixTypeNested;
+
+ SparseSymmetricPermutationProduct(const MatrixType& mat, const Perm& perm)
+ : m_matrix(mat), m_perm(perm)
+ {}
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ template<typename DestScalar, int Options, typename DstIndex>
+ void evalTo(SparseMatrix<DestScalar,Options,DstIndex>& _dest) const
+ {
+// internal::permute_symm_to_fullsymm<UpLo>(m_matrix,_dest,m_perm.indices().data());
+ SparseMatrix<DestScalar,(Options&RowMajor)==RowMajor ? ColMajor : RowMajor, DstIndex> tmp;
+ internal::permute_symm_to_fullsymm<UpLo>(m_matrix,tmp,m_perm.indices().data());
+ _dest = tmp;
+ }
+
+ template<typename DestType,unsigned int DestUpLo> void evalTo(SparseSelfAdjointView<DestType,DestUpLo>& dest) const
+ {
+ internal::permute_symm_to_symm<UpLo,DestUpLo>(m_matrix,dest.matrix(),m_perm.indices().data());
+ }
+
+ protected:
+ MatrixTypeNested m_matrix;
+ const Perm& m_perm;
+
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_SELFADJOINTVIEW_H
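Hedged usage sketch of SparseSelfAdjointView, assuming only the lower triangle of A is stored: the product exercises SparseSelfAdjointTimeDenseProduct, and the copy goes through the permute_symm_to_fullsymm() path defined above.

    #include <Eigen/SparseCore>

    void selfadjointDemo(const Eigen::SparseMatrix<double>& A,  // lower triangle stored
                         const Eigen::VectorXd& x) {
      Eigen::VectorXd y = A.selfadjointView<Eigen::Lower>() * x;  // sparse selfadjoint * dense vector
      Eigen::SparseMatrix<double> full;
      full = A.selfadjointView<Eigen::Lower>();  // expand to the full symmetric matrix
    }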
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseSparseProductWithPruning.h b/third_party/eigen3/Eigen/src/SparseCore/SparseSparseProductWithPruning.h
new file mode 100644
index 0000000000..fcc18f5c9c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseSparseProductWithPruning.h
@@ -0,0 +1,150 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSESPARSEPRODUCTWITHPRUNING_H
+#define EIGEN_SPARSESPARSEPRODUCTWITHPRUNING_H
+
+namespace Eigen {
+
+namespace internal {
+
+
+// perform a pseudo in-place sparse * sparse product assuming all matrices are col major
+template<typename Lhs, typename Rhs, typename ResultType>
+static void sparse_sparse_product_with_pruning_impl(const Lhs& lhs, const Rhs& rhs, ResultType& res, const typename ResultType::RealScalar& tolerance)
+{
+ // return sparse_sparse_product_with_pruning_impl2(lhs,rhs,res);
+
+ typedef typename remove_all<Lhs>::type::Scalar Scalar;
+ typedef typename remove_all<Lhs>::type::Index Index;
+
+ // make sure to call innerSize/outerSize since we fake the storage order.
+ Index rows = lhs.innerSize();
+ Index cols = rhs.outerSize();
+ //Index size = lhs.outerSize();
+ eigen_assert(lhs.outerSize() == rhs.innerSize());
+
+ // allocate a temporary buffer
+ AmbiVector<Scalar,Index> tempVector(rows);
+
+ // estimate the number of non zero entries
+ // given a rhs column containing Y non zeros, we assume that the respective Y columns
+ // of the lhs differs in average of one non zeros, thus the number of non zeros for
+ // the product of a rhs column with the lhs is X+Y where X is the average number of non zero
+ // per column of the lhs.
+ // Therefore, we have nnz(lhs*rhs) = nnz(lhs) + nnz(rhs)
+ Index estimated_nnz_prod = lhs.nonZeros() + rhs.nonZeros();
+
+ // mimics a resizeByInnerOuter:
+ if(ResultType::IsRowMajor)
+ res.resize(cols, rows);
+ else
+ res.resize(rows, cols);
+
+ res.reserve(estimated_nnz_prod);
+ double ratioColRes = double(estimated_nnz_prod)/double(lhs.rows()*rhs.cols());
+ for (Index j=0; j<cols; ++j)
+ {
+ // FIXME:
+ //double ratioColRes = (double(rhs.innerVector(j).nonZeros()) + double(lhs.nonZeros())/double(lhs.cols()))/double(lhs.rows());
+ // let's do a more accurate determination of the nnz ratio for the current column j of res
+ tempVector.init(ratioColRes);
+ tempVector.setZero();
+ for (typename Rhs::InnerIterator rhsIt(rhs, j); rhsIt; ++rhsIt)
+ {
+ // FIXME should be written like this: tmp += rhsIt.value() * lhs.col(rhsIt.index())
+ tempVector.restart();
+ Scalar x = rhsIt.value();
+ for (typename Lhs::InnerIterator lhsIt(lhs, rhsIt.index()); lhsIt; ++lhsIt)
+ {
+ tempVector.coeffRef(lhsIt.index()) += lhsIt.value() * x;
+ }
+ }
+ res.startVec(j);
+ for (typename AmbiVector<Scalar,Index>::Iterator it(tempVector,tolerance); it; ++it)
+ res.insertBackByOuterInner(j,it.index()) = it.value();
+ }
+ res.finalize();
+}
+
+template<typename Lhs, typename Rhs, typename ResultType,
+ int LhsStorageOrder = traits<Lhs>::Flags&RowMajorBit,
+ int RhsStorageOrder = traits<Rhs>::Flags&RowMajorBit,
+ int ResStorageOrder = traits<ResultType>::Flags&RowMajorBit>
+struct sparse_sparse_product_with_pruning_selector;
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct sparse_sparse_product_with_pruning_selector<Lhs,Rhs,ResultType,ColMajor,ColMajor,ColMajor>
+{
+ typedef typename traits<typename remove_all<Lhs>::type>::Scalar Scalar;
+ typedef typename ResultType::RealScalar RealScalar;
+
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res, const RealScalar& tolerance)
+ {
+ typename remove_all<ResultType>::type _res(res.rows(), res.cols());
+ internal::sparse_sparse_product_with_pruning_impl<Lhs,Rhs,ResultType>(lhs, rhs, _res, tolerance);
+ res.swap(_res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct sparse_sparse_product_with_pruning_selector<Lhs,Rhs,ResultType,ColMajor,ColMajor,RowMajor>
+{
+ typedef typename ResultType::RealScalar RealScalar;
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res, const RealScalar& tolerance)
+ {
+ // we need a col-major matrix to hold the result
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename ResultType::Index> SparseTemporaryType;
+ SparseTemporaryType _res(res.rows(), res.cols());
+ internal::sparse_sparse_product_with_pruning_impl<Lhs,Rhs,SparseTemporaryType>(lhs, rhs, _res, tolerance);
+ res = _res;
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct sparse_sparse_product_with_pruning_selector<Lhs,Rhs,ResultType,RowMajor,RowMajor,RowMajor>
+{
+ typedef typename ResultType::RealScalar RealScalar;
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res, const RealScalar& tolerance)
+ {
+ // let's transpose the product to get a column x column product
+ typename remove_all<ResultType>::type _res(res.rows(), res.cols());
+ internal::sparse_sparse_product_with_pruning_impl<Rhs,Lhs,ResultType>(rhs, lhs, _res, tolerance);
+ res.swap(_res);
+ }
+};
+
+template<typename Lhs, typename Rhs, typename ResultType>
+struct sparse_sparse_product_with_pruning_selector<Lhs,Rhs,ResultType,RowMajor,RowMajor,ColMajor>
+{
+ typedef typename ResultType::RealScalar RealScalar;
+ static void run(const Lhs& lhs, const Rhs& rhs, ResultType& res, const RealScalar& tolerance)
+ {
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename Lhs::Index> ColMajorMatrixLhs;
+ typedef SparseMatrix<typename ResultType::Scalar,ColMajor,typename Lhs::Index> ColMajorMatrixRhs;
+ ColMajorMatrixLhs colLhs(lhs);
+ ColMajorMatrixRhs colRhs(rhs);
+ internal::sparse_sparse_product_with_pruning_impl<ColMajorMatrixLhs,ColMajorMatrixRhs,ResultType>(colLhs, colRhs, res, tolerance);
+
+ // let's transpose the product to get a column x column product
+// typedef SparseMatrix<typename ResultType::Scalar> SparseTemporaryType;
+// SparseTemporaryType _res(res.cols(), res.rows());
+// sparse_sparse_product_with_pruning_impl<Rhs,Lhs,SparseTemporaryType>(rhs, lhs, _res);
+// res = _res.transpose();
+ }
+};
+
+// NOTE the two other cases (col row *) must never occur since they are caught
+// by ProductReturnType, which transforms them to (col col *) by evaluating rhs.
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSESPARSEPRODUCTWITHPRUNING_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseTranspose.h b/third_party/eigen3/Eigen/src/SparseCore/SparseTranspose.h
new file mode 100644
index 0000000000..7c300ee8db
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseTranspose.h
@@ -0,0 +1,63 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSETRANSPOSE_H
+#define EIGEN_SPARSETRANSPOSE_H
+
+namespace Eigen {
+
+template<typename MatrixType> class TransposeImpl<MatrixType,Sparse>
+ : public SparseMatrixBase<Transpose<MatrixType> >
+{
+ typedef typename internal::remove_all<typename MatrixType::Nested>::type _MatrixTypeNested;
+ public:
+
+ EIGEN_SPARSE_PUBLIC_INTERFACE(Transpose<MatrixType> )
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ inline Index nonZeros() const { return derived().nestedExpression().nonZeros(); }
+};
+
+// NOTE: VC10 triggers an ICE if we don't put typename TransposeImpl<MatrixType,Sparse>:: in front of Index;
+// a typedef typename TransposeImpl<MatrixType,Sparse>::Index Index;
+// does not fix the issue.
+// An alternative is to define the nested class in the parent class itself.
+template<typename MatrixType> class TransposeImpl<MatrixType,Sparse>::InnerIterator
+ : public _MatrixTypeNested::InnerIterator
+{
+ typedef typename _MatrixTypeNested::InnerIterator Base;
+ typedef typename TransposeImpl::Index Index;
+ public:
+
+ EIGEN_STRONG_INLINE InnerIterator(const TransposeImpl& trans, typename TransposeImpl<MatrixType,Sparse>::Index outer)
+ : Base(trans.derived().nestedExpression(), outer)
+ {}
+ Index row() const { return Base::col(); }
+ Index col() const { return Base::row(); }
+};
+
+template<typename MatrixType> class TransposeImpl<MatrixType,Sparse>::ReverseInnerIterator
+ : public _MatrixTypeNested::ReverseInnerIterator
+{
+ typedef typename _MatrixTypeNested::ReverseInnerIterator Base;
+ typedef typename TransposeImpl::Index Index;
+ public:
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator(const TransposeImpl& xpr, typename TransposeImpl<MatrixType,Sparse>::Index outer)
+ : Base(xpr.derived().nestedExpression(), outer)
+ {}
+ Index row() const { return Base::col(); }
+ Index col() const { return Base::row(); }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSETRANSPOSE_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseTriangularView.h b/third_party/eigen3/Eigen/src/SparseCore/SparseTriangularView.h
new file mode 100644
index 0000000000..333127b78e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseTriangularView.h
@@ -0,0 +1,179 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_TRIANGULARVIEW_H
+#define EIGEN_SPARSE_TRIANGULARVIEW_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType, int Mode>
+struct traits<SparseTriangularView<MatrixType,Mode> >
+: public traits<MatrixType>
+{};
+
+} // namespace internal
+
+template<typename MatrixType, int Mode> class SparseTriangularView
+ : public SparseMatrixBase<SparseTriangularView<MatrixType,Mode> >
+{
+ enum { SkipFirst = ((Mode&Lower) && !(MatrixType::Flags&RowMajorBit))
+ || ((Mode&Upper) && (MatrixType::Flags&RowMajorBit)),
+ SkipLast = !SkipFirst,
+ SkipDiag = (Mode&ZeroDiag) ? 1 : 0,
+ HasUnitDiag = (Mode&UnitDiag) ? 1 : 0
+ };
+
+ public:
+
+ EIGEN_SPARSE_PUBLIC_INTERFACE(SparseTriangularView)
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ typedef typename MatrixType::Nested MatrixTypeNested;
+ typedef typename internal::remove_reference<MatrixTypeNested>::type MatrixTypeNestedNonRef;
+ typedef typename internal::remove_all<MatrixTypeNested>::type MatrixTypeNestedCleaned;
+
+ inline SparseTriangularView(const MatrixType& matrix) : m_matrix(matrix) {}
+
+ /** \internal */
+ inline const MatrixTypeNestedCleaned& nestedExpression() const { return m_matrix; }
+
+ template<typename OtherDerived>
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type
+ solve(const MatrixBase<OtherDerived>& other) const;
+
+ template<typename OtherDerived> void solveInPlace(MatrixBase<OtherDerived>& other) const;
+ template<typename OtherDerived> void solveInPlace(SparseMatrixBase<OtherDerived>& other) const;
+
+ protected:
+ MatrixTypeNested m_matrix;
+};
+
+template<typename MatrixType, int Mode>
+class SparseTriangularView<MatrixType,Mode>::InnerIterator : public MatrixTypeNestedCleaned::InnerIterator
+{
+ typedef typename MatrixTypeNestedCleaned::InnerIterator Base;
+ typedef typename SparseTriangularView::Index Index;
+ public:
+
+ EIGEN_STRONG_INLINE InnerIterator(const SparseTriangularView& view, Index outer)
+ : Base(view.nestedExpression(), outer), m_returnOne(false)
+ {
+ if(SkipFirst)
+ {
+ while((*this) && ((HasUnitDiag||SkipDiag) ? this->index()<=outer : this->index()<outer))
+ Base::operator++();
+ if(HasUnitDiag)
+ m_returnOne = true;
+ }
+ else if(HasUnitDiag && ((!Base::operator bool()) || Base::index()>=Base::outer()))
+ {
+ if((!SkipFirst) && Base::operator bool())
+ Base::operator++();
+ m_returnOne = true;
+ }
+ }
+
+ EIGEN_STRONG_INLINE InnerIterator& operator++()
+ {
+ if(HasUnitDiag && m_returnOne)
+ m_returnOne = false;
+ else
+ {
+ Base::operator++();
+ if(HasUnitDiag && (!SkipFirst) && ((!Base::operator bool()) || Base::index()>=Base::outer()))
+ {
+ if((!SkipFirst) && Base::operator bool())
+ Base::operator++();
+ m_returnOne = true;
+ }
+ }
+ return *this;
+ }
+
+ inline Index row() const { return (MatrixType::Flags&RowMajorBit ? Base::outer() : this->index()); }
+ inline Index col() const { return (MatrixType::Flags&RowMajorBit ? this->index() : Base::outer()); }
+ inline Index index() const
+ {
+ if(HasUnitDiag && m_returnOne) return Base::outer();
+ else return Base::index();
+ }
+ inline Scalar value() const
+ {
+ if(HasUnitDiag && m_returnOne) return Scalar(1);
+ else return Base::value();
+ }
+
+ EIGEN_STRONG_INLINE operator bool() const
+ {
+ if(HasUnitDiag && m_returnOne)
+ return true;
+ if(SkipFirst) return Base::operator bool();
+ else
+ {
+ if (SkipDiag) return (Base::operator bool() && this->index() < this->outer());
+ else return (Base::operator bool() && this->index() <= this->outer());
+ }
+ }
+ protected:
+ bool m_returnOne;
+};
+
+template<typename MatrixType, int Mode>
+class SparseTriangularView<MatrixType,Mode>::ReverseInnerIterator : public MatrixTypeNestedCleaned::ReverseInnerIterator
+{
+ typedef typename MatrixTypeNestedCleaned::ReverseInnerIterator Base;
+ typedef typename SparseTriangularView::Index Index;
+ public:
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator(const SparseTriangularView& view, Index outer)
+ : Base(view.nestedExpression(), outer)
+ {
+      eigen_assert((!HasUnitDiag) && "ReverseInnerIterator does not yet support triangular views with a unit diagonal");
+ if(SkipLast) {
+ while((*this) && (SkipDiag ? this->index()>=outer : this->index()>outer))
+ --(*this);
+ }
+ }
+
+ EIGEN_STRONG_INLINE ReverseInnerIterator& operator--()
+ { Base::operator--(); return *this; }
+
+ inline Index row() const { return Base::row(); }
+ inline Index col() const { return Base::col(); }
+
+ EIGEN_STRONG_INLINE operator bool() const
+ {
+ if (SkipLast) return Base::operator bool() ;
+ else
+ {
+ if(SkipDiag) return (Base::operator bool() && this->index() > this->outer());
+ else return (Base::operator bool() && this->index() >= this->outer());
+ }
+ }
+};
+
+template<typename Derived>
+template<int Mode>
+inline const SparseTriangularView<Derived, Mode>
+SparseMatrixBase<Derived>::triangularView() const
+{
+ return derived();
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_TRIANGULARVIEW_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseUtil.h b/third_party/eigen3/Eigen/src/SparseCore/SparseUtil.h
new file mode 100644
index 0000000000..05023858b1
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseUtil.h
@@ -0,0 +1,171 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEUTIL_H
+#define EIGEN_SPARSEUTIL_H
+
+namespace Eigen {
+
+#ifdef NDEBUG
+#define EIGEN_DBG_SPARSE(X)
+#else
+#define EIGEN_DBG_SPARSE(X) X
+#endif
+
+#define EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(Derived, Op) \
+template<typename OtherDerived> \
+EIGEN_STRONG_INLINE Derived& operator Op(const Eigen::SparseMatrixBase<OtherDerived>& other) \
+{ \
+ return Base::operator Op(other.derived()); \
+} \
+EIGEN_STRONG_INLINE Derived& operator Op(const Derived& other) \
+{ \
+ return Base::operator Op(other); \
+}
+
+#define EIGEN_SPARSE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, Op) \
+template<typename Other> \
+EIGEN_STRONG_INLINE Derived& operator Op(const Other& scalar) \
+{ \
+ return Base::operator Op(scalar); \
+}
+
+#define EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATORS(Derived) \
+EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(Derived, =) \
+EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(Derived, +=) \
+EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(Derived, -=) \
+EIGEN_SPARSE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, *=) \
+EIGEN_SPARSE_INHERIT_SCALAR_ASSIGNMENT_OPERATOR(Derived, /=)
+
+#define _EIGEN_SPARSE_PUBLIC_INTERFACE(Derived, BaseClass) \
+ typedef BaseClass Base; \
+ typedef typename Eigen::internal::traits<Derived >::Scalar Scalar; \
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar; \
+ typedef typename Eigen::internal::nested<Derived >::type Nested; \
+ typedef typename Eigen::internal::traits<Derived >::StorageKind StorageKind; \
+ typedef typename Eigen::internal::traits<Derived >::Index Index; \
+ enum { RowsAtCompileTime = Eigen::internal::traits<Derived >::RowsAtCompileTime, \
+ ColsAtCompileTime = Eigen::internal::traits<Derived >::ColsAtCompileTime, \
+ Flags = Eigen::internal::traits<Derived >::Flags, \
+ CoeffReadCost = Eigen::internal::traits<Derived >::CoeffReadCost, \
+ SizeAtCompileTime = Base::SizeAtCompileTime, \
+ IsVectorAtCompileTime = Base::IsVectorAtCompileTime }; \
+ using Base::derived; \
+ using Base::const_cast_derived;
+
+#define EIGEN_SPARSE_PUBLIC_INTERFACE(Derived) \
+ _EIGEN_SPARSE_PUBLIC_INTERFACE(Derived, Eigen::SparseMatrixBase<Derived >)
+
+const int CoherentAccessPattern = 0x1;
+const int InnerRandomAccessPattern = 0x2 | CoherentAccessPattern;
+const int OuterRandomAccessPattern = 0x4 | CoherentAccessPattern;
+const int RandomAccessPattern = 0x8 | OuterRandomAccessPattern | InnerRandomAccessPattern;
+
+template<typename Derived> class SparseMatrixBase;
+template<typename _Scalar, int _Flags = 0, typename _Index = int> class SparseMatrix;
+template<typename _Scalar, int _Flags = 0, typename _Index = int> class DynamicSparseMatrix;
+template<typename _Scalar, int _Flags = 0, typename _Index = int> class SparseVector;
+template<typename _Scalar, int _Flags = 0, typename _Index = int> class MappedSparseMatrix;
+
+template<typename MatrixType, int Mode> class SparseTriangularView;
+template<typename MatrixType, unsigned int UpLo> class SparseSelfAdjointView;
+template<typename Lhs, typename Rhs> class SparseDiagonalProduct;
+template<typename MatrixType> class SparseView;
+
+template<typename Lhs, typename Rhs> class SparseSparseProduct;
+template<typename Lhs, typename Rhs> class SparseTimeDenseProduct;
+template<typename Lhs, typename Rhs> class DenseTimeSparseProduct;
+template<typename Lhs, typename Rhs, bool Transpose> class SparseDenseOuterProduct;
+
+template<typename Lhs, typename Rhs> struct SparseSparseProductReturnType;
+template<typename Lhs, typename Rhs, int InnerSize = internal::traits<Lhs>::ColsAtCompileTime> struct DenseSparseProductReturnType;
+template<typename Lhs, typename Rhs, int InnerSize = internal::traits<Lhs>::ColsAtCompileTime> struct SparseDenseProductReturnType;
+template<typename MatrixType,int UpLo> class SparseSymmetricPermutationProduct;
+
+namespace internal {
+
+template<typename T,int Rows,int Cols> struct sparse_eval;
+
+template<typename T> struct eval<T,Sparse>
+ : public sparse_eval<T, traits<T>::RowsAtCompileTime,traits<T>::ColsAtCompileTime>
+{};
+
+template<typename T,int Cols> struct sparse_eval<T,1,Cols> {
+ typedef typename traits<T>::Scalar _Scalar;
+ typedef typename traits<T>::Index _Index;
+ public:
+ typedef SparseVector<_Scalar, RowMajor, _Index> type;
+};
+
+template<typename T,int Rows> struct sparse_eval<T,Rows,1> {
+ typedef typename traits<T>::Scalar _Scalar;
+ typedef typename traits<T>::Index _Index;
+ public:
+ typedef SparseVector<_Scalar, ColMajor, _Index> type;
+};
+
+template<typename T,int Rows,int Cols> struct sparse_eval {
+ typedef typename traits<T>::Scalar _Scalar;
+ typedef typename traits<T>::Index _Index;
+ enum { _Options = ((traits<T>::Flags&RowMajorBit)==RowMajorBit) ? RowMajor : ColMajor };
+ public:
+ typedef SparseMatrix<_Scalar, _Options, _Index> type;
+};
+
+template<typename T> struct sparse_eval<T,1,1> {
+ typedef typename traits<T>::Scalar _Scalar;
+ public:
+ typedef Matrix<_Scalar, 1, 1> type;
+};
+
+template<typename T> struct plain_matrix_type<T,Sparse>
+{
+ typedef typename traits<T>::Scalar _Scalar;
+ typedef typename traits<T>::Index _Index;
+ enum { _Options = ((traits<T>::Flags&RowMajorBit)==RowMajorBit) ? RowMajor : ColMajor };
+ public:
+ typedef SparseMatrix<_Scalar, _Options, _Index> type;
+};
+
+} // end namespace internal
+
+/** \ingroup SparseCore_Module
+ *
+ * \class Triplet
+ *
+ * \brief A small structure to hold a non-zero entry as a triplet (i,j,value).
+ *
+ * \sa SparseMatrix::setFromTriplets()
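+ *
+ * A minimal usage sketch (illustrative only; the matrix size and values below are made up):
+ * \code
+ * std::vector<Triplet<double> > triplets;
+ * triplets.push_back(Triplet<double>(0, 1, 3.5));   // entry at row 0, column 1
+ * triplets.push_back(Triplet<double>(2, 0, -1.0));  // entry at row 2, column 0
+ * SparseMatrix<double> A(3, 3);
+ * A.setFromTriplets(triplets.begin(), triplets.end());
+ * \endcode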
+ */
+template<typename Scalar, typename Index=typename SparseMatrix<Scalar>::Index >
+class Triplet
+{
+public:
+ Triplet() : m_row(0), m_col(0), m_value(0) {}
+
+ Triplet(const Index& i, const Index& j, const Scalar& v = Scalar(0))
+ : m_row(i), m_col(j), m_value(v)
+ {}
+
+ /** \returns the row index of the element */
+ const Index& row() const { return m_row; }
+
+ /** \returns the column index of the element */
+ const Index& col() const { return m_col; }
+
+ /** \returns the value of the element */
+ const Scalar& value() const { return m_value; }
+protected:
+ Index m_row, m_col;
+ Scalar m_value;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEUTIL_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseVector.h b/third_party/eigen3/Eigen/src/SparseCore/SparseVector.h
new file mode 100644
index 0000000000..7e15c814b6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseVector.h
@@ -0,0 +1,447 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEVECTOR_H
+#define EIGEN_SPARSEVECTOR_H
+
+namespace Eigen {
+
+/** \ingroup SparseCore_Module
+ * \class SparseVector
+ *
+ * \brief a sparse vector class
+ *
+ * \tparam _Scalar the scalar type, i.e. the type of the coefficients
+ *
+ * See http://www.netlib.org/linalg/html_templates/node91.html for details on the storage scheme.
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_SPARSEVECTOR_PLUGIN.
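+ *
+ * A short illustrative sketch (sizes and values are made up) of the typical random-access fill pattern:
+ * \code
+ * SparseVector<double> v(10);   // sparse vector of logical size 10
+ * v.coeffRef(2) = 1.0;          // sorted insertion of a new coefficient
+ * v.coeffRef(7) += 0.5;         // inserts the coefficient if absent, otherwise updates it
+ * double s = v.sum();           // sums the stored non-zeros
+ * \endcode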
+ */
+
+namespace internal {
+template<typename _Scalar, int _Options, typename _Index>
+struct traits<SparseVector<_Scalar, _Options, _Index> >
+{
+ typedef _Scalar Scalar;
+ typedef _Index Index;
+ typedef Sparse StorageKind;
+ typedef MatrixXpr XprKind;
+ enum {
+ IsColVector = (_Options & RowMajorBit) ? 0 : 1,
+
+ RowsAtCompileTime = IsColVector ? Dynamic : 1,
+ ColsAtCompileTime = IsColVector ? 1 : Dynamic,
+ MaxRowsAtCompileTime = RowsAtCompileTime,
+ MaxColsAtCompileTime = ColsAtCompileTime,
+ Flags = _Options | NestByRefBit | LvalueBit | (IsColVector ? 0 : RowMajorBit),
+ CoeffReadCost = NumTraits<Scalar>::ReadCost,
+ SupportedAccessPatterns = InnerRandomAccessPattern
+ };
+};
+
+// Sparse-Vector-Assignment kinds:
+enum {
+ SVA_RuntimeSwitch,
+ SVA_Inner,
+ SVA_Outer
+};
+
+template< typename Dest, typename Src,
+ int AssignmentKind = !bool(Src::IsVectorAtCompileTime) ? SVA_RuntimeSwitch
+ : Src::InnerSizeAtCompileTime==1 ? SVA_Outer
+ : SVA_Inner>
+struct sparse_vector_assign_selector;
+
+}
+
+template<typename _Scalar, int _Options, typename _Index>
+class SparseVector
+ : public SparseMatrixBase<SparseVector<_Scalar, _Options, _Index> >
+{
+ typedef SparseMatrixBase<SparseVector> SparseBase;
+
+ public:
+ EIGEN_SPARSE_PUBLIC_INTERFACE(SparseVector)
+ EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(SparseVector, +=)
+ EIGEN_SPARSE_INHERIT_ASSIGNMENT_OPERATOR(SparseVector, -=)
+
+ typedef internal::CompressedStorage<Scalar,Index> Storage;
+ enum { IsColVector = internal::traits<SparseVector>::IsColVector };
+
+ enum {
+ Options = _Options
+ };
+
+ EIGEN_STRONG_INLINE Index rows() const { return IsColVector ? m_size : 1; }
+ EIGEN_STRONG_INLINE Index cols() const { return IsColVector ? 1 : m_size; }
+ EIGEN_STRONG_INLINE Index innerSize() const { return m_size; }
+ EIGEN_STRONG_INLINE Index outerSize() const { return 1; }
+
+ EIGEN_STRONG_INLINE const Scalar* valuePtr() const { return &m_data.value(0); }
+ EIGEN_STRONG_INLINE Scalar* valuePtr() { return &m_data.value(0); }
+
+ EIGEN_STRONG_INLINE const Index* innerIndexPtr() const { return &m_data.index(0); }
+ EIGEN_STRONG_INLINE Index* innerIndexPtr() { return &m_data.index(0); }
+
+ /** \internal */
+ inline Storage& data() { return m_data; }
+ /** \internal */
+ inline const Storage& data() const { return m_data; }
+
+ inline Scalar coeff(Index row, Index col) const
+ {
+ eigen_assert(IsColVector ? (col==0 && row>=0 && row<m_size) : (row==0 && col>=0 && col<m_size));
+ return coeff(IsColVector ? row : col);
+ }
+ inline Scalar coeff(Index i) const
+ {
+ eigen_assert(i>=0 && i<m_size);
+ return m_data.at(i);
+ }
+
+ inline Scalar& coeffRef(Index row, Index col)
+ {
+ eigen_assert(IsColVector ? (col==0 && row>=0 && row<m_size) : (row==0 && col>=0 && col<m_size));
+      return coeffRef(IsColVector ? row : col);
+ }
+
+ /** \returns a reference to the coefficient value at given index \a i
+      * This operation involves a log(rho*size) binary search. If the coefficient does not
+ * exist yet, then a sorted insertion into a sequential buffer is performed.
+ *
+ * This insertion might be very costly if the number of nonzeros above \a i is large.
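+      *
+      * When the indices are known in strictly increasing order, reserve() followed by insertBack()
+      * avoids this cost entirely; a small illustrative sketch (sizes and values made up):
+      * \code
+      * SparseVector<double> v(100);
+      * v.reserve(3);
+      * v.insertBack(4)  = 1.0;
+      * v.insertBack(10) = 2.0;
+      * v.insertBack(42) = 3.0;
+      * \endcode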
+ */
+ inline Scalar& coeffRef(Index i)
+ {
+ eigen_assert(i>=0 && i<m_size);
+ return m_data.atWithInsertion(i);
+ }
+
+ public:
+
+ class InnerIterator;
+ class ReverseInnerIterator;
+
+ inline void setZero() { m_data.clear(); }
+
+ /** \returns the number of non zero coefficients */
+ inline Index nonZeros() const { return static_cast<Index>(m_data.size()); }
+
+ inline void startVec(Index outer)
+ {
+ EIGEN_UNUSED_VARIABLE(outer);
+ eigen_assert(outer==0);
+ }
+
+ inline Scalar& insertBackByOuterInner(Index outer, Index inner)
+ {
+ EIGEN_UNUSED_VARIABLE(outer);
+ eigen_assert(outer==0);
+ return insertBack(inner);
+ }
+ inline Scalar& insertBack(Index i)
+ {
+ m_data.append(0, i);
+ return m_data.value(m_data.size()-1);
+ }
+
+ inline Scalar& insert(Index row, Index col)
+ {
+ eigen_assert(IsColVector ? (col==0 && row>=0 && row<m_size) : (row==0 && col>=0 && col<m_size));
+
+ Index inner = IsColVector ? row : col;
+ Index outer = IsColVector ? col : row;
+ eigen_assert(outer==0);
+ return insert(inner);
+ }
+ Scalar& insert(Index i)
+ {
+ eigen_assert(i>=0 && i<m_size);
+
+ Index startId = 0;
+ Index p = Index(m_data.size()) - 1;
+ // TODO smart realloc
+ m_data.resize(p+2,1);
+
+ while ( (p >= startId) && (m_data.index(p) > i) )
+ {
+ m_data.index(p+1) = m_data.index(p);
+ m_data.value(p+1) = m_data.value(p);
+ --p;
+ }
+ m_data.index(p+1) = i;
+ m_data.value(p+1) = 0;
+ return m_data.value(p+1);
+ }
+
+ /**
+ */
+ inline void reserve(Index reserveSize) { m_data.reserve(reserveSize); }
+
+
+ inline void finalize() {}
+
+ void prune(const Scalar& reference, const RealScalar& epsilon = NumTraits<RealScalar>::dummy_precision())
+ {
+ m_data.prune(reference,epsilon);
+ }
+
+ void resize(Index rows, Index cols)
+ {
+ eigen_assert(rows==1 || cols==1);
+ resize(IsColVector ? rows : cols);
+ }
+
+ void resize(Index newSize)
+ {
+ m_size = newSize;
+ m_data.clear();
+ }
+
+ void resizeNonZeros(Index size) { m_data.resize(size); }
+
+ inline SparseVector() : m_size(0) { check_template_parameters(); resize(0); }
+
+ inline SparseVector(Index size) : m_size(0) { check_template_parameters(); resize(size); }
+
+ inline SparseVector(Index rows, Index cols) : m_size(0) { check_template_parameters(); resize(rows,cols); }
+
+ template<typename OtherDerived>
+ inline SparseVector(const SparseMatrixBase<OtherDerived>& other)
+ : m_size(0)
+ {
+ check_template_parameters();
+ *this = other.derived();
+ }
+
+ inline SparseVector(const SparseVector& other)
+ : SparseBase(other), m_size(0)
+ {
+ check_template_parameters();
+ *this = other.derived();
+ }
+
+ /** Swaps the values of \c *this and \a other.
+    * Overloaded for performance: this version performs a \em shallow swap by swapping pointers and attributes only.
+ * \sa SparseMatrixBase::swap()
+ */
+ inline void swap(SparseVector& other)
+ {
+ std::swap(m_size, other.m_size);
+ m_data.swap(other.m_data);
+ }
+
+ inline SparseVector& operator=(const SparseVector& other)
+ {
+ if (other.isRValue())
+ {
+ swap(other.const_cast_derived());
+ }
+ else
+ {
+ resize(other.size());
+ m_data = other.m_data;
+ }
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ inline SparseVector& operator=(const SparseMatrixBase<OtherDerived>& other)
+ {
+ SparseVector tmp(other.size());
+ internal::sparse_vector_assign_selector<SparseVector,OtherDerived>::run(tmp,other.derived());
+ this->swap(tmp);
+ return *this;
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ template<typename Lhs, typename Rhs>
+ inline SparseVector& operator=(const SparseSparseProduct<Lhs,Rhs>& product)
+ {
+ return Base::operator=(product);
+ }
+ #endif
+
+ friend std::ostream & operator << (std::ostream & s, const SparseVector& m)
+ {
+ for (Index i=0; i<m.nonZeros(); ++i)
+ s << "(" << m.m_data.value(i) << "," << m.m_data.index(i) << ") ";
+ s << std::endl;
+ return s;
+ }
+
+ /** Destructor */
+ inline ~SparseVector() {}
+
+ /** Overloaded for performance */
+ Scalar sum() const;
+
+ public:
+
+ /** \internal \deprecated use setZero() and reserve() */
+ EIGEN_DEPRECATED void startFill(Index reserve)
+ {
+ setZero();
+ m_data.reserve(reserve);
+ }
+
+ /** \internal \deprecated use insertBack(Index,Index) */
+ EIGEN_DEPRECATED Scalar& fill(Index r, Index c)
+ {
+ eigen_assert(r==0 || c==0);
+ return fill(IsColVector ? r : c);
+ }
+
+ /** \internal \deprecated use insertBack(Index) */
+ EIGEN_DEPRECATED Scalar& fill(Index i)
+ {
+ m_data.append(0, i);
+ return m_data.value(m_data.size()-1);
+ }
+
+ /** \internal \deprecated use insert(Index,Index) */
+ EIGEN_DEPRECATED Scalar& fillrand(Index r, Index c)
+ {
+ eigen_assert(r==0 || c==0);
+ return fillrand(IsColVector ? r : c);
+ }
+
+ /** \internal \deprecated use insert(Index) */
+ EIGEN_DEPRECATED Scalar& fillrand(Index i)
+ {
+ return insert(i);
+ }
+
+ /** \internal \deprecated use finalize() */
+ EIGEN_DEPRECATED void endFill() {}
+
+    // These two functions were here in the 3.1 release, so let's keep them in case some code relies on them.
+ /** \internal \deprecated use data() */
+ EIGEN_DEPRECATED Storage& _data() { return m_data; }
+ /** \internal \deprecated use data() */
+ EIGEN_DEPRECATED const Storage& _data() const { return m_data; }
+
+# ifdef EIGEN_SPARSEVECTOR_PLUGIN
+# include EIGEN_SPARSEVECTOR_PLUGIN
+# endif
+
+protected:
+
+ static void check_template_parameters()
+ {
+ EIGEN_STATIC_ASSERT(NumTraits<Index>::IsSigned,THE_INDEX_TYPE_MUST_BE_A_SIGNED_TYPE);
+ EIGEN_STATIC_ASSERT((_Options&(ColMajor|RowMajor))==Options,INVALID_MATRIX_TEMPLATE_PARAMETERS);
+ }
+
+ Storage m_data;
+ Index m_size;
+};
+
+template<typename Scalar, int _Options, typename _Index>
+class SparseVector<Scalar,_Options,_Index>::InnerIterator
+{
+ public:
+ InnerIterator(const SparseVector& vec, Index outer=0)
+ : m_data(vec.m_data), m_id(0), m_end(static_cast<Index>(m_data.size()))
+ {
+ EIGEN_UNUSED_VARIABLE(outer);
+ eigen_assert(outer==0);
+ }
+
+ InnerIterator(const internal::CompressedStorage<Scalar,Index>& data)
+ : m_data(data), m_id(0), m_end(static_cast<Index>(m_data.size()))
+ {}
+
+ inline InnerIterator& operator++() { m_id++; return *this; }
+
+ inline Scalar value() const { return m_data.value(m_id); }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_data.value(m_id)); }
+
+ inline Index index() const { return m_data.index(m_id); }
+ inline Index row() const { return IsColVector ? index() : 0; }
+ inline Index col() const { return IsColVector ? 0 : index(); }
+
+ inline operator bool() const { return (m_id < m_end); }
+
+ protected:
+ const internal::CompressedStorage<Scalar,Index>& m_data;
+ Index m_id;
+ const Index m_end;
+};
+
+template<typename Scalar, int _Options, typename _Index>
+class SparseVector<Scalar,_Options,_Index>::ReverseInnerIterator
+{
+ public:
+ ReverseInnerIterator(const SparseVector& vec, Index outer=0)
+ : m_data(vec.m_data), m_id(static_cast<Index>(m_data.size())), m_start(0)
+ {
+ EIGEN_UNUSED_VARIABLE(outer);
+ eigen_assert(outer==0);
+ }
+
+ ReverseInnerIterator(const internal::CompressedStorage<Scalar,Index>& data)
+ : m_data(data), m_id(static_cast<Index>(m_data.size())), m_start(0)
+ {}
+
+ inline ReverseInnerIterator& operator--() { m_id--; return *this; }
+
+ inline Scalar value() const { return m_data.value(m_id-1); }
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_data.value(m_id-1)); }
+
+ inline Index index() const { return m_data.index(m_id-1); }
+ inline Index row() const { return IsColVector ? index() : 0; }
+ inline Index col() const { return IsColVector ? 0 : index(); }
+
+ inline operator bool() const { return (m_id > m_start); }
+
+ protected:
+ const internal::CompressedStorage<Scalar,Index>& m_data;
+ Index m_id;
+ const Index m_start;
+};
+
+namespace internal {
+
+template< typename Dest, typename Src>
+struct sparse_vector_assign_selector<Dest,Src,SVA_Inner> {
+ static void run(Dest& dst, const Src& src) {
+ eigen_internal_assert(src.innerSize()==src.size());
+ for(typename Src::InnerIterator it(src, 0); it; ++it)
+ dst.insert(it.index()) = it.value();
+ }
+};
+
+template< typename Dest, typename Src>
+struct sparse_vector_assign_selector<Dest,Src,SVA_Outer> {
+ static void run(Dest& dst, const Src& src) {
+ eigen_internal_assert(src.outerSize()==src.size());
+ for(typename Dest::Index i=0; i<src.size(); ++i)
+ {
+ typename Src::InnerIterator it(src, i);
+ if(it)
+ dst.insert(i) = it.value();
+ }
+ }
+};
+
+template< typename Dest, typename Src>
+struct sparse_vector_assign_selector<Dest,Src,SVA_RuntimeSwitch> {
+ static void run(Dest& dst, const Src& src) {
+ if(src.outerSize()==1) sparse_vector_assign_selector<Dest,Src,SVA_Inner>::run(dst, src);
+ else sparse_vector_assign_selector<Dest,Src,SVA_Outer>::run(dst, src);
+ }
+};
+
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSEVECTOR_H
diff --git a/third_party/eigen3/Eigen/src/SparseCore/SparseView.h b/third_party/eigen3/Eigen/src/SparseCore/SparseView.h
new file mode 100644
index 0000000000..fd8450463f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/SparseView.h
@@ -0,0 +1,99 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2010 Daniel Lowengrub <lowdanie@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSEVIEW_H
+#define EIGEN_SPARSEVIEW_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename MatrixType>
+struct traits<SparseView<MatrixType> > : traits<MatrixType>
+{
+ typedef typename MatrixType::Index Index;
+ typedef Sparse StorageKind;
+ enum {
+ Flags = int(traits<MatrixType>::Flags) & (RowMajorBit)
+ };
+};
+
+} // end namespace internal
+
+template<typename MatrixType>
+class SparseView : public SparseMatrixBase<SparseView<MatrixType> >
+{
+ typedef typename MatrixType::Nested MatrixTypeNested;
+ typedef typename internal::remove_all<MatrixTypeNested>::type _MatrixTypeNested;
+public:
+ EIGEN_SPARSE_PUBLIC_INTERFACE(SparseView)
+
+ SparseView(const MatrixType& mat, const Scalar& m_reference = Scalar(0),
+ typename NumTraits<Scalar>::Real m_epsilon = NumTraits<Scalar>::dummy_precision()) :
+ m_matrix(mat), m_reference(m_reference), m_epsilon(m_epsilon) {}
+
+ class InnerIterator;
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ inline Index innerSize() const { return m_matrix.innerSize(); }
+ inline Index outerSize() const { return m_matrix.outerSize(); }
+
+protected:
+ MatrixTypeNested m_matrix;
+ Scalar m_reference;
+ typename NumTraits<Scalar>::Real m_epsilon;
+};
+
+template<typename MatrixType>
+class SparseView<MatrixType>::InnerIterator : public _MatrixTypeNested::InnerIterator
+{
+ typedef typename SparseView::Index Index;
+public:
+ typedef typename _MatrixTypeNested::InnerIterator IterBase;
+ InnerIterator(const SparseView& view, Index outer) :
+ IterBase(view.m_matrix, outer), m_view(view)
+ {
+ incrementToNonZero();
+ }
+
+ EIGEN_STRONG_INLINE InnerIterator& operator++()
+ {
+ IterBase::operator++();
+ incrementToNonZero();
+ return *this;
+ }
+
+ using IterBase::value;
+
+protected:
+ const SparseView& m_view;
+
+private:
+ void incrementToNonZero()
+ {
+ while((bool(*this)) && internal::isMuchSmallerThan(value(), m_view.m_reference, m_view.m_epsilon))
+ {
+ IterBase::operator++();
+ }
+ }
+};
+
+template<typename Derived>
+const SparseView<Derived> MatrixBase<Derived>::sparseView(const Scalar& m_reference,
+ const typename NumTraits<Scalar>::Real& m_epsilon) const
+{
+ return SparseView<Derived>(derived(), m_reference, m_epsilon);
+}
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/SparseCore/TriangularSolver.h b/third_party/eigen3/Eigen/src/SparseCore/TriangularSolver.h
new file mode 100644
index 0000000000..cb8ad82b4f
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseCore/TriangularSolver.h
@@ -0,0 +1,334 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSETRIANGULARSOLVER_H
+#define EIGEN_SPARSETRIANGULARSOLVER_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, int Mode,
+ int UpLo = (Mode & Lower)
+ ? Lower
+ : (Mode & Upper)
+ ? Upper
+ : -1,
+ int StorageOrder = int(traits<Lhs>::Flags) & RowMajorBit>
+struct sparse_solve_triangular_selector;
+
+// forward substitution, row-major
+template<typename Lhs, typename Rhs, int Mode>
+struct sparse_solve_triangular_selector<Lhs,Rhs,Mode,Lower,RowMajor>
+{
+ typedef typename Rhs::Scalar Scalar;
+ static void run(const Lhs& lhs, Rhs& other)
+ {
+ for(int col=0 ; col<other.cols() ; ++col)
+ {
+ for(int i=0; i<lhs.rows(); ++i)
+ {
+ Scalar tmp = other.coeff(i,col);
+ Scalar lastVal(0);
+ int lastIndex = 0;
+ for(typename Lhs::InnerIterator it(lhs, i); it; ++it)
+ {
+ lastVal = it.value();
+ lastIndex = it.index();
+ if(lastIndex==i)
+ break;
+ tmp -= lastVal * other.coeff(lastIndex,col);
+ }
+ if (Mode & UnitDiag)
+ other.coeffRef(i,col) = tmp;
+ else
+ {
+ eigen_assert(lastIndex==i);
+ other.coeffRef(i,col) = tmp/lastVal;
+ }
+ }
+ }
+ }
+};
+
+// backward substitution, row-major
+template<typename Lhs, typename Rhs, int Mode>
+struct sparse_solve_triangular_selector<Lhs,Rhs,Mode,Upper,RowMajor>
+{
+ typedef typename Rhs::Scalar Scalar;
+ static void run(const Lhs& lhs, Rhs& other)
+ {
+ for(int col=0 ; col<other.cols() ; ++col)
+ {
+ for(int i=lhs.rows()-1 ; i>=0 ; --i)
+ {
+ Scalar tmp = other.coeff(i,col);
+ Scalar l_ii = 0;
+ typename Lhs::InnerIterator it(lhs, i);
+ while(it && it.index()<i)
+ ++it;
+ if(!(Mode & UnitDiag))
+ {
+ eigen_assert(it && it.index()==i);
+ l_ii = it.value();
+ ++it;
+ }
+ else if (it && it.index() == i)
+ ++it;
+ for(; it; ++it)
+ {
+ tmp -= it.value() * other.coeff(it.index(),col);
+ }
+
+ if (Mode & UnitDiag)
+ other.coeffRef(i,col) = tmp;
+ else
+ other.coeffRef(i,col) = tmp/l_ii;
+ }
+ }
+ }
+};
+
+// forward substitution, col-major
+template<typename Lhs, typename Rhs, int Mode>
+struct sparse_solve_triangular_selector<Lhs,Rhs,Mode,Lower,ColMajor>
+{
+ typedef typename Rhs::Scalar Scalar;
+ static void run(const Lhs& lhs, Rhs& other)
+ {
+ for(int col=0 ; col<other.cols() ; ++col)
+ {
+ for(int i=0; i<lhs.cols(); ++i)
+ {
+ Scalar& tmp = other.coeffRef(i,col);
+ if (tmp!=Scalar(0)) // optimization when other is actually sparse
+ {
+ typename Lhs::InnerIterator it(lhs, i);
+ while(it && it.index()<i)
+ ++it;
+ if(!(Mode & UnitDiag))
+ {
+ eigen_assert(it && it.index()==i);
+ tmp /= it.value();
+ }
+ if (it && it.index()==i)
+ ++it;
+ for(; it; ++it)
+ other.coeffRef(it.index(), col) -= tmp * it.value();
+ }
+ }
+ }
+ }
+};
+
+// backward substitution, col-major
+template<typename Lhs, typename Rhs, int Mode>
+struct sparse_solve_triangular_selector<Lhs,Rhs,Mode,Upper,ColMajor>
+{
+ typedef typename Rhs::Scalar Scalar;
+ static void run(const Lhs& lhs, Rhs& other)
+ {
+ for(int col=0 ; col<other.cols() ; ++col)
+ {
+ for(int i=lhs.cols()-1; i>=0; --i)
+ {
+ Scalar& tmp = other.coeffRef(i,col);
+ if (tmp!=Scalar(0)) // optimization when other is actually sparse
+ {
+ if(!(Mode & UnitDiag))
+ {
+ // TODO replace this by a binary search. make sure the binary search is safe for partially sorted elements
+ typename Lhs::ReverseInnerIterator it(lhs, i);
+ while(it && it.index()!=i)
+ --it;
+ eigen_assert(it && it.index()==i);
+ other.coeffRef(i,col) /= it.value();
+ }
+ typename Lhs::InnerIterator it(lhs, i);
+ for(; it && it.index()<i; ++it)
+ other.coeffRef(it.index(), col) -= tmp * it.value();
+ }
+ }
+ }
+ }
+};
+
+} // end namespace internal
+
+template<typename ExpressionType,int Mode>
+template<typename OtherDerived>
+void SparseTriangularView<ExpressionType,Mode>::solveInPlace(MatrixBase<OtherDerived>& other) const
+{
+ eigen_assert(m_matrix.cols() == m_matrix.rows() && m_matrix.cols() == other.rows());
+ eigen_assert((!(Mode & ZeroDiag)) && bool(Mode & (Upper|Lower)));
+
+ enum { copy = internal::traits<OtherDerived>::Flags & RowMajorBit };
+
+ typedef typename internal::conditional<copy,
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type, OtherDerived&>::type OtherCopy;
+ OtherCopy otherCopy(other.derived());
+
+ internal::sparse_solve_triangular_selector<ExpressionType, typename internal::remove_reference<OtherCopy>::type, Mode>::run(m_matrix, otherCopy);
+
+ if (copy)
+ other = otherCopy;
+}
+
+template<typename ExpressionType,int Mode>
+template<typename OtherDerived>
+typename internal::plain_matrix_type_column_major<OtherDerived>::type
+SparseTriangularView<ExpressionType,Mode>::solve(const MatrixBase<OtherDerived>& other) const
+{
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type res(other);
+ solveInPlace(res);
+ return res;
+}
+
+// pure sparse path
+
+namespace internal {
+
+template<typename Lhs, typename Rhs, int Mode,
+ int UpLo = (Mode & Lower)
+ ? Lower
+ : (Mode & Upper)
+ ? Upper
+ : -1,
+ int StorageOrder = int(Lhs::Flags) & (RowMajorBit)>
+struct sparse_solve_triangular_sparse_selector;
+
+// forward substitution, col-major
+template<typename Lhs, typename Rhs, int Mode, int UpLo>
+struct sparse_solve_triangular_sparse_selector<Lhs,Rhs,Mode,UpLo,ColMajor>
+{
+ typedef typename Rhs::Scalar Scalar;
+ typedef typename promote_index_type<typename traits<Lhs>::Index,
+ typename traits<Rhs>::Index>::type Index;
+ static void run(const Lhs& lhs, Rhs& other)
+ {
+ const bool IsLower = (UpLo==Lower);
+ AmbiVector<Scalar,Index> tempVector(other.rows()*2);
+ tempVector.setBounds(0,other.rows());
+
+ Rhs res(other.rows(), other.cols());
+ res.reserve(other.nonZeros());
+
+ for(int col=0 ; col<other.cols() ; ++col)
+ {
+ // FIXME estimate number of non zeros
+ tempVector.init(.99/*float(other.col(col).nonZeros())/float(other.rows())*/);
+ tempVector.setZero();
+ tempVector.restart();
+ for (typename Rhs::InnerIterator rhsIt(other, col); rhsIt; ++rhsIt)
+ {
+ tempVector.coeffRef(rhsIt.index()) = rhsIt.value();
+ }
+
+ for(int i=IsLower?0:lhs.cols()-1;
+ IsLower?i<lhs.cols():i>=0;
+ i+=IsLower?1:-1)
+ {
+ tempVector.restart();
+ Scalar& ci = tempVector.coeffRef(i);
+ if (ci!=Scalar(0))
+ {
+ // find
+ typename Lhs::InnerIterator it(lhs, i);
+ if(!(Mode & UnitDiag))
+ {
+ if (IsLower)
+ {
+ eigen_assert(it.index()==i);
+ ci /= it.value();
+ }
+ else
+ ci /= lhs.coeff(i,i);
+ }
+ tempVector.restart();
+ if (IsLower)
+ {
+ if (it.index()==i)
+ ++it;
+ for(; it; ++it)
+ tempVector.coeffRef(it.index()) -= ci * it.value();
+ }
+ else
+ {
+ for(; it && it.index()<i; ++it)
+ tempVector.coeffRef(it.index()) -= ci * it.value();
+ }
+ }
+ }
+
+
+ int count = 0;
+ // FIXME compute a reference value to filter zeros
+ for (typename AmbiVector<Scalar,Index>::Iterator it(tempVector/*,1e-12*/); it; ++it)
+ {
+ ++ count;
+// std::cerr << "fill " << it.index() << ", " << col << "\n";
+// std::cout << it.value() << " ";
+ // FIXME use insertBack
+ res.insert(it.index(), col) = it.value();
+ }
+// std::cout << "tempVector.nonZeros() == " << int(count) << " / " << (other.rows()) << "\n";
+ }
+ res.finalize();
+ other = res.markAsRValue();
+ }
+};
+
+} // end namespace internal
+
+template<typename ExpressionType,int Mode>
+template<typename OtherDerived>
+void SparseTriangularView<ExpressionType,Mode>::solveInPlace(SparseMatrixBase<OtherDerived>& other) const
+{
+ eigen_assert(m_matrix.cols() == m_matrix.rows() && m_matrix.cols() == other.rows());
+ eigen_assert( (!(Mode & ZeroDiag)) && bool(Mode & (Upper|Lower)));
+
+// enum { copy = internal::traits<OtherDerived>::Flags & RowMajorBit };
+
+// typedef typename internal::conditional<copy,
+// typename internal::plain_matrix_type_column_major<OtherDerived>::type, OtherDerived&>::type OtherCopy;
+// OtherCopy otherCopy(other.derived());
+
+ internal::sparse_solve_triangular_sparse_selector<ExpressionType, OtherDerived, Mode>::run(m_matrix, other.derived());
+
+// if (copy)
+// other = otherCopy;
+}
+
+#ifdef EIGEN2_SUPPORT
+
+// deprecated stuff:
+
+/** \deprecated */
+template<typename Derived>
+template<typename OtherDerived>
+void SparseMatrixBase<Derived>::solveTriangularInPlace(MatrixBase<OtherDerived>& other) const
+{
+ this->template triangular<Flags&(Upper|Lower)>().solveInPlace(other);
+}
+
+/** \deprecated */
+template<typename Derived>
+template<typename OtherDerived>
+typename internal::plain_matrix_type_column_major<OtherDerived>::type
+SparseMatrixBase<Derived>::solveTriangular(const MatrixBase<OtherDerived>& other) const
+{
+ typename internal::plain_matrix_type_column_major<OtherDerived>::type res(other);
+ derived().solveTriangularInPlace(res);
+ return res;
+}
+#endif // EIGEN2_SUPPORT
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSETRIANGULARSOLVER_H
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU.h
new file mode 100644
index 0000000000..7a9aeec2da
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU.h
@@ -0,0 +1,762 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+#ifndef EIGEN_SPARSE_LU_H
+#define EIGEN_SPARSE_LU_H
+
+namespace Eigen {
+
+template <typename _MatrixType, typename _OrderingType = COLAMDOrdering<typename _MatrixType::Index> > class SparseLU;
+template <typename MappedSparseMatrixType> struct SparseLUMatrixLReturnType;
+template <typename MatrixLType, typename MatrixUType> struct SparseLUMatrixUReturnType;
+
+/** \ingroup SparseLU_Module
+ * \class SparseLU
+ *
+ * \brief Sparse supernodal LU factorization for general matrices
+ *
+ * This class implements the supernodal LU factorization for general matrices.
+ * It uses the main techniques from the sequential SuperLU package
+ * (http://crd-legacy.lbl.gov/~xiaoye/SuperLU/). It transparently handles real
+ * and complex arithmetic in single and double precision, depending on the
+ * scalar type of your input matrix.
+ * The code has been optimized to provide BLAS-3 operations during supernode-panel updates.
+ * It benefits directly from the built-in high-performance Eigen BLAS routines.
+ * Moreover, when the size of a supernode is very small, the BLAS calls are avoided to
+ * enable better optimization by the compiler. For best performance,
+ * you should compile with the NDEBUG flag to avoid the numerous bounds checks on vectors.
+ *
+ * An important parameter of this class is the ordering method. It is used to reorder the columns
+ * (and possibly the rows) of the matrix to reduce the number of new elements that are created during
+ * numerical factorization. The cheapest method available is COLAMD.
+ * See \link OrderingMethods_Module the OrderingMethods module \endlink for the list of
+ * built-in and external ordering methods.
+ *
+ * Simple example with key steps
+ * \code
+ * VectorXd x(n), b(n);
+ * SparseMatrix<double, ColMajor> A;
+ * SparseLU<SparseMatrix<double, ColMajor>, COLAMDOrdering<int> > solver;
+ * // fill A and b;
+ * // Compute the ordering permutation vector from the structural pattern of A
+ * solver.analyzePattern(A);
+ * // Compute the numerical factorization
+ * solver.factorize(A);
+ * //Use the factors to solve the linear system
+ * x = solver.solve(b);
+ * \endcode
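+ *
+ * Equivalently, compute() performs both steps at once; checking info() afterwards is the usual way
+ * to detect a failed factorization (illustrative sketch):
+ * \code
+ * solver.compute(A);
+ * if(solver.info() != Success) {
+ *   // factorization failed, see lastErrorMessage() for details
+ *   return;
+ * }
+ * x = solver.solve(b);
+ * \endcode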
+ *
+ * \warning The input matrix A should be in a \b compressed and \b column-major form.
+ * Otherwise an expensive copy will be made. You can call the inexpensive makeCompressed() to get a compressed matrix.
+ *
+ * \note Unlike the initial SuperLU implementation, there is no step to equilibrate the matrix.
+ * For badly scaled matrices, this step can be useful to reduce the pivoting during factorization.
+ * If this is the case for your matrices, you can try the basic scaling method at
+ * "unsupported/Eigen/src/IterativeSolvers/Scaling.h"
+ *
+ * \tparam _MatrixType The type of the sparse matrix. It must be a column-major SparseMatrix<>
+ * \tparam _OrderingType The ordering method to use, either AMD, COLAMD or METIS. Default is COLAMD.
+ *
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ * \sa \ref OrderingMethods_Module
+ */
+template <typename _MatrixType, typename _OrderingType>
+class SparseLU : public internal::SparseLUImpl<typename _MatrixType::Scalar, typename _MatrixType::Index>
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef _OrderingType OrderingType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef SparseMatrix<Scalar,ColMajor,Index> NCMatrix;
+ typedef internal::MappedSuperNodalMatrix<Scalar, Index> SCMatrix;
+ typedef Matrix<Scalar,Dynamic,1> ScalarVector;
+ typedef Matrix<Index,Dynamic,1> IndexVector;
+ typedef PermutationMatrix<Dynamic, Dynamic, Index> PermutationType;
+ typedef internal::SparseLUImpl<Scalar, Index> Base;
+
+ public:
+ SparseLU():m_isInitialized(true),m_lastError(""),m_Ustore(0,0,0,0,0,0),m_symmetricmode(false),m_diagpivotthresh(1.0),m_detPermR(1)
+ {
+ initperfvalues();
+ }
+ SparseLU(const MatrixType& matrix):m_isInitialized(true),m_lastError(""),m_Ustore(0,0,0,0,0,0),m_symmetricmode(false),m_diagpivotthresh(1.0),m_detPermR(1)
+ {
+ initperfvalues();
+ compute(matrix);
+ }
+
+ ~SparseLU()
+ {
+ // Free all explicit dynamic pointers
+ }
+
+ void analyzePattern (const MatrixType& matrix);
+ void factorize (const MatrixType& matrix);
+ void simplicialfactorize(const MatrixType& matrix);
+
+ /**
+ * Compute the symbolic and numeric factorization of the input sparse matrix.
+ * The input matrix should be in column-major storage.
+ */
+ void compute (const MatrixType& matrix)
+ {
+ // Analyze
+ analyzePattern(matrix);
+ //Factorize
+ factorize(matrix);
+ }
+
+ inline Index rows() const { return m_mat.rows(); }
+ inline Index cols() const { return m_mat.cols(); }
+ /** Indicate that the pattern of the input matrix is symmetric */
+ void isSymmetric(bool sym)
+ {
+ m_symmetricmode = sym;
+ }
+
+    /** \returns an expression of the matrix L, internally stored as supernodes.
+ * The only operation available with this expression is the triangular solve
+ * \code
+ * y = b; matrixL().solveInPlace(y);
+ * \endcode
+ */
+ SparseLUMatrixLReturnType<SCMatrix> matrixL() const
+ {
+ return SparseLUMatrixLReturnType<SCMatrix>(m_Lstore);
+ }
+    /** \returns an expression of the matrix U.
+ * The only operation available with this expression is the triangular solve
+ * \code
+ * y = b; matrixU().solveInPlace(y);
+ * \endcode
+ */
+ SparseLUMatrixUReturnType<SCMatrix,MappedSparseMatrix<Scalar,ColMajor,Index> > matrixU() const
+ {
+ return SparseLUMatrixUReturnType<SCMatrix, MappedSparseMatrix<Scalar,ColMajor,Index> >(m_Lstore, m_Ustore);
+ }
+
+ /**
+     * \returns a reference to the row permutation matrix \f$ P_r \f$ such that \f$P_r A P_c^T = L U\f$
+ * \sa colsPermutation()
+ */
+ inline const PermutationType& rowsPermutation() const
+ {
+ return m_perm_r;
+ }
+ /**
+     * \returns a reference to the column permutation matrix \f$ P_c^T \f$ such that \f$P_r A P_c^T = L U\f$
+ * \sa rowsPermutation()
+ */
+ inline const PermutationType& colsPermutation() const
+ {
+ return m_perm_c;
+ }
+ /** Set the threshold used for a diagonal entry to be an acceptable pivot. */
+ void setPivotThreshold(const RealScalar& thresh)
+ {
+ m_diagpivotthresh = thresh;
+ }
+
+ /** \returns the solution X of \f$ A X = B \f$ using the current decomposition of A.
+ *
+      * \warning the destination matrix X in X = this->solve(B) must be column-major.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<SparseLU, Rhs> solve(const MatrixBase<Rhs>& B) const
+ {
+ eigen_assert(m_factorizationIsOk && "SparseLU is not initialized.");
+ eigen_assert(rows()==B.rows()
+ && "SparseLU::solve(): invalid number of rows of the right hand side matrix B");
+ return internal::solve_retval<SparseLU, Rhs>(*this, B.derived());
+ }
+
+ /** \returns the solution X of \f$ A X = B \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<SparseLU, Rhs> solve(const SparseMatrixBase<Rhs>& B) const
+ {
+ eigen_assert(m_factorizationIsOk && "SparseLU is not initialized.");
+ eigen_assert(rows()==B.rows()
+ && "SparseLU::solve(): invalid number of rows of the right hand side matrix B");
+ return internal::sparse_solve_retval<SparseLU, Rhs>(*this, B.derived());
+ }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+      * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the LU factorization reports a problem, zero diagonal for instance
+ * \c InvalidInput if the input matrix is invalid
+ *
+ * \sa iparm()
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ /**
+ * \returns A string describing the type of error
+ */
+ std::string lastErrorMessage() const
+ {
+ return m_lastError;
+ }
+
+ template<typename Rhs, typename Dest>
+ bool _solve(const MatrixBase<Rhs> &B, MatrixBase<Dest> &X_base) const
+ {
+ Dest& X(X_base.derived());
+ eigen_assert(m_factorizationIsOk && "The matrix should be factorized first");
+ EIGEN_STATIC_ASSERT((Dest::Flags&RowMajorBit)==0,
+ THIS_METHOD_IS_ONLY_FOR_COLUMN_MAJOR_MATRICES);
+
+ // Permute the right hand side to form X = Pr*B
+ // on return, X is overwritten by the computed solution
+ X.resize(B.rows(),B.cols());
+
+ // this ugly const_cast_derived() helps to detect aliasing when applying the permutations
+ for(Index j = 0; j < B.cols(); ++j)
+ X.col(j) = rowsPermutation() * B.const_cast_derived().col(j);
+
+ //Forward substitution with L
+ this->matrixL().solveInPlace(X);
+ this->matrixU().solveInPlace(X);
+
+ // Permute back the solution
+ for (Index j = 0; j < B.cols(); ++j)
+ X.col(j) = colsPermutation().inverse() * X.col(j);
+
+ return true;
+ }
+
+ /**
+ * \returns the absolute value of the determinant of the matrix of which
+      * *this is the LU decomposition.
+ *
+ * \warning a determinant can be very big or small, so for matrices
+ * of large enough dimension, there is a risk of overflow/underflow.
+ * One way to work around that is to use logAbsDeterminant() instead.
+ *
+ * \sa logAbsDeterminant(), signDeterminant()
+ */
+ Scalar absDeterminant()
+ {
+ using std::abs;
+ eigen_assert(m_factorizationIsOk && "The matrix should be factorized first.");
+ // Initialize with the determinant of the row matrix
+ Scalar det = Scalar(1.);
+ //Note that the diagonal blocks of U are stored in supernodes,
+ // which are available in the L part :)
+ for (Index j = 0; j < this->cols(); ++j)
+ {
+ for (typename SCMatrix::InnerIterator it(m_Lstore, j); it; ++it)
+ {
+ if(it.row() < j) continue;
+ if(it.row() == j)
+ {
+ det *= abs(it.value());
+ break;
+ }
+ }
+ }
+ return det;
+ }
+
+ /** \returns the natural log of the absolute value of the determinant of the matrix
+      * of which *this is the LU decomposition.
+ *
+ * \note This method is useful to work around the risk of overflow/underflow that's
+ * inherent to the determinant computation.
+ *
+ * \sa absDeterminant(), signDeterminant()
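+      *
+      * A minimal illustrative sketch (a real Scalar and a factorized solver object named lu are assumed):
+      * \code
+      * double logAbsDet = lu.logAbsDeterminant();   // log(|det(A)|), robust to overflow/underflow
+      * \endcode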
+ */
+ Scalar logAbsDeterminant() const
+ {
+ using std::log;
+ using std::abs;
+
+ eigen_assert(m_factorizationIsOk && "The matrix should be factorized first.");
+ Scalar det = Scalar(0.);
+ for (Index j = 0; j < this->cols(); ++j)
+ {
+ for (typename SCMatrix::InnerIterator it(m_Lstore, j); it; ++it)
+ {
+ if(it.row() < j) continue;
+ if(it.row() == j)
+ {
+ det += log(abs(it.value()));
+ break;
+ }
+ }
+ }
+ return det;
+ }
+
+ /** \returns A number representing the sign of the determinant
+ *
+ * \sa absDeterminant(), logAbsDeterminant()
+ */
+ Scalar signDeterminant()
+ {
+ eigen_assert(m_factorizationIsOk && "The matrix should be factorized first.");
+ return Scalar(m_detPermR);
+ }
+
+ protected:
+ // Functions
+ void initperfvalues()
+ {
+ m_perfv.panel_size = 1;
+ m_perfv.relax = 1;
+ m_perfv.maxsuper = 128;
+ m_perfv.rowblk = 16;
+ m_perfv.colblk = 8;
+ m_perfv.fillfactor = 20;
+ }
+
+ // Variables
+ mutable ComputationInfo m_info;
+ bool m_isInitialized;
+ bool m_factorizationIsOk;
+ bool m_analysisIsOk;
+ std::string m_lastError;
+    NCMatrix m_mat; // The input (permuted) matrix
+ SCMatrix m_Lstore; // The lower triangular matrix (supernodal)
+ MappedSparseMatrix<Scalar,ColMajor,Index> m_Ustore; // The upper triangular matrix
+ PermutationType m_perm_c; // Column permutation
+ PermutationType m_perm_r ; // Row permutation
+ IndexVector m_etree; // Column elimination tree
+
+ typename Base::GlobalLU_t m_glu;
+
+ // SparseLU options
+ bool m_symmetricmode;
+ // values for performance
+ internal::perfvalues<Index> m_perfv;
+ RealScalar m_diagpivotthresh; // Specifies the threshold used for a diagonal entry to be an acceptable pivot
+ Index m_nnzL, m_nnzU; // Nonzeros in L and U factors
+ Index m_detPermR; // Determinant of the coefficient matrix
+ private:
+ // Disable copy constructor
+ SparseLU (const SparseLU& );
+
+}; // End class SparseLU
+
+
+
+// Functions needed by the analysis phase
+/**
+ * Compute the column permutation to minimize the fill-in
+ *
+ * - Apply this permutation to the input matrix -
+ *
+ * - Compute the column elimination tree on the permuted matrix
+ *
+ * - Postorder the elimination tree and the column permutation
+ *
+ */
+template <typename MatrixType, typename OrderingType>
+void SparseLU<MatrixType, OrderingType>::analyzePattern(const MatrixType& mat)
+{
+
+ //TODO It is possible as in SuperLU to compute row and columns scaling vectors to equilibrate the matrix mat.
+
+ OrderingType ord;
+ ord(mat,m_perm_c);
+
+ // Apply the permutation to the column of the input matrix
+ //First copy the whole input matrix.
+ m_mat = mat;
+ if (m_perm_c.size()) {
+ m_mat.uncompress(); //NOTE: The effect of this command is only to create the InnerNonzeros pointers. FIXME : This vector is filled but not subsequently used.
+ //Then, permute only the column pointers
+ const Index * outerIndexPtr;
+ if (mat.isCompressed()) outerIndexPtr = mat.outerIndexPtr();
+ else
+ {
+ Index *outerIndexPtr_t = new Index[mat.cols()+1];
+ for(Index i = 0; i <= mat.cols(); i++) outerIndexPtr_t[i] = m_mat.outerIndexPtr()[i];
+ outerIndexPtr = outerIndexPtr_t;
+ }
+ for (Index i = 0; i < mat.cols(); i++)
+ {
+ m_mat.outerIndexPtr()[m_perm_c.indices()(i)] = outerIndexPtr[i];
+ m_mat.innerNonZeroPtr()[m_perm_c.indices()(i)] = outerIndexPtr[i+1] - outerIndexPtr[i];
+ }
+ if(!mat.isCompressed()) delete[] outerIndexPtr;
+ }
+ // Compute the column elimination tree of the permuted matrix
+ IndexVector firstRowElt;
+ internal::coletree(m_mat, m_etree,firstRowElt);
+
+ // In symmetric mode, do not do postorder here
+ if (!m_symmetricmode) {
+ IndexVector post, iwork;
+ // Post order etree
+ internal::treePostorder(m_mat.cols(), m_etree, post);
+
+
+ // Renumber etree in postorder
+ Index m = m_mat.cols();
+ iwork.resize(m+1);
+ for (Index i = 0; i < m; ++i) iwork(post(i)) = post(m_etree(i));
+ m_etree = iwork;
+
+    // Postmultiply A*Pc by post, i.e. reorder the matrix according to the postorder of the etree
+ PermutationType post_perm(m);
+ for (Index i = 0; i < m; i++)
+ post_perm.indices()(i) = post(i);
+
+ // Combine the two permutations : postorder the permutation for future use
+ if(m_perm_c.size()) {
+ m_perm_c = post_perm * m_perm_c;
+ }
+
+ } // end postordering
+
+ m_analysisIsOk = true;
+}
+
+// Functions needed by the numerical factorization phase
+
+
+/**
+ * - Numerical factorization
+ * - Interleaved with the symbolic factorization
+ * On exit, info is
+ *
+ * = 0: successful factorization
+ *
+ * > 0: if info = i, and i is
+ *
+ * <= A->ncol: U(i,i) is exactly zero. The factorization has
+ * been completed, but the factor U is exactly singular,
+ * and division by zero will occur if it is used to solve a
+ * system of equations.
+ *
+ * > A->ncol: number of bytes allocated when memory allocation
+ * failure occurred, plus A->ncol. If lwork = -1, it is
+ * the estimated amount of space needed, plus A->ncol.
+ */
+template <typename MatrixType, typename OrderingType>
+void SparseLU<MatrixType, OrderingType>::factorize(const MatrixType& matrix)
+{
+ using internal::emptyIdxLU;
+ eigen_assert(m_analysisIsOk && "analyzePattern() should be called first");
+  eigen_assert((matrix.rows() == matrix.cols()) && "Only for square matrices");
+
+ typedef typename IndexVector::Scalar Index;
+
+
+ // Apply the column permutation computed in analyzePattern()
+ // m_mat = matrix * m_perm_c.inverse();
+ m_mat = matrix;
+ if (m_perm_c.size())
+ {
+ m_mat.uncompress(); //NOTE: The effect of this command is only to create the InnerNonzeros pointers.
+ //Then, permute only the column pointers
+ const Index * outerIndexPtr;
+ if (matrix.isCompressed()) outerIndexPtr = matrix.outerIndexPtr();
+ else
+ {
+ Index* outerIndexPtr_t = new Index[matrix.cols()+1];
+ for(Index i = 0; i <= matrix.cols(); i++) outerIndexPtr_t[i] = m_mat.outerIndexPtr()[i];
+ outerIndexPtr = outerIndexPtr_t;
+ }
+ for (Index i = 0; i < matrix.cols(); i++)
+ {
+ m_mat.outerIndexPtr()[m_perm_c.indices()(i)] = outerIndexPtr[i];
+ m_mat.innerNonZeroPtr()[m_perm_c.indices()(i)] = outerIndexPtr[i+1] - outerIndexPtr[i];
+ }
+ if(!matrix.isCompressed()) delete[] outerIndexPtr;
+ }
+ else
+ { //FIXME This should not be needed if the empty permutation is handled transparently
+ m_perm_c.resize(matrix.cols());
+ for(Index i = 0; i < matrix.cols(); ++i) m_perm_c.indices()(i) = i;
+ }
+
+ Index m = m_mat.rows();
+ Index n = m_mat.cols();
+ Index nnz = m_mat.nonZeros();
+ Index maxpanel = m_perfv.panel_size * m;
+ // Allocate working storage common to the factor routines
+ Index lwork = 0;
+ Index info = Base::memInit(m, n, nnz, lwork, m_perfv.fillfactor, m_perfv.panel_size, m_glu);
+ if (info)
+ {
+ m_lastError = "UNABLE TO ALLOCATE WORKING MEMORY\n\n" ;
+ m_factorizationIsOk = false;
+ return ;
+ }
+
+ // Set up pointers for integer working arrays
+ IndexVector segrep(m); segrep.setZero();
+ IndexVector parent(m); parent.setZero();
+ IndexVector xplore(m); xplore.setZero();
+ IndexVector repfnz(maxpanel);
+ IndexVector panel_lsub(maxpanel);
+ IndexVector xprune(n); xprune.setZero();
+ IndexVector marker(m*internal::LUNoMarker); marker.setZero();
+
+ repfnz.setConstant(-1);
+ panel_lsub.setConstant(-1);
+
+ // Set up pointers for scalar working arrays
+ ScalarVector dense;
+ dense.setZero(maxpanel);
+ ScalarVector tempv;
+ tempv.setZero(internal::LUnumTempV(m, m_perfv.panel_size, m_perfv.maxsuper, /*m_perfv.rowblk*/m) );
+
+ // Compute the inverse of perm_c
+ PermutationType iperm_c(m_perm_c.inverse());
+
+ // Identify initial relaxed snodes
+ IndexVector relax_end(n);
+ if ( m_symmetricmode == true )
+ Base::heap_relax_snode(n, m_etree, m_perfv.relax, marker, relax_end);
+ else
+ Base::relax_snode(n, m_etree, m_perfv.relax, marker, relax_end);
+
+
+ m_perm_r.resize(m);
+ m_perm_r.indices().setConstant(-1);
+ marker.setConstant(-1);
+ m_detPermR = 1; // Record the determinant of the row permutation
+
+ m_glu.supno(0) = emptyIdxLU; m_glu.xsup.setConstant(0);
+ m_glu.xsup(0) = m_glu.xlsub(0) = m_glu.xusub(0) = m_glu.xlusup(0) = Index(0);
+
+ // Work on one 'panel' at a time. A panel is one of the following :
+ // (a) a relaxed supernode at the bottom of the etree, or
+ // (b) panel_size contiguous columns, <panel_size> defined by the user
+ Index jcol;
+ IndexVector panel_histo(n);
+ Index pivrow; // Pivotal row number in the original matrix
+ Index nseg1; // Number of segments in U-column above panel row jcol
+ Index nseg; // Number of segments in each U-column
+ Index irep;
+ Index i, k, jj;
+ for (jcol = 0; jcol < n; )
+ {
+ // Adjust panel size so that a panel won't overlap with the next relaxed snode.
+ Index panel_size = m_perfv.panel_size; // upper bound on panel width
+ for (k = jcol + 1; k < (std::min)(jcol+panel_size, n); k++)
+ {
+ if (relax_end(k) != emptyIdxLU)
+ {
+ panel_size = k - jcol;
+ break;
+ }
+ }
+ if (k == n)
+ panel_size = n - jcol;
+
+ // Symbolic outer factorization on a panel of columns
+ Base::panel_dfs(m, panel_size, jcol, m_mat, m_perm_r.indices(), nseg1, dense, panel_lsub, segrep, repfnz, xprune, marker, parent, xplore, m_glu);
+
+ // Numeric sup-panel updates in topological order
+ Base::panel_bmod(m, panel_size, jcol, nseg1, dense, tempv, segrep, repfnz, m_glu);
+
+ // Sparse LU within the panel, and below the panel diagonal
+ for ( jj = jcol; jj< jcol + panel_size; jj++)
+ {
+ k = (jj - jcol) * m; // Column index for w-wide arrays
+
+ nseg = nseg1; // begin after all the panel segments
+ //Depth-first-search for the current column
+ VectorBlock<IndexVector> panel_lsubk(panel_lsub, k, m);
+ VectorBlock<IndexVector> repfnz_k(repfnz, k, m);
+ info = Base::column_dfs(m, jj, m_perm_r.indices(), m_perfv.maxsuper, nseg, panel_lsubk, segrep, repfnz_k, xprune, marker, parent, xplore, m_glu);
+ if ( info )
+ {
+ m_lastError = "UNABLE TO EXPAND MEMORY IN COLUMN_DFS() ";
+ m_info = NumericalIssue;
+ m_factorizationIsOk = false;
+ return;
+ }
+ // Numeric updates to this column
+ VectorBlock<ScalarVector> dense_k(dense, k, m);
+ VectorBlock<IndexVector> segrep_k(segrep, nseg1, m-nseg1);
+ info = Base::column_bmod(jj, (nseg - nseg1), dense_k, tempv, segrep_k, repfnz_k, jcol, m_glu);
+ if ( info )
+ {
+ m_lastError = "UNABLE TO EXPAND MEMORY IN COLUMN_BMOD() ";
+ m_info = NumericalIssue;
+ m_factorizationIsOk = false;
+ return;
+ }
+
+ // Copy the U-segments to ucol(*)
+ info = Base::copy_to_ucol(jj, nseg, segrep, repfnz_k ,m_perm_r.indices(), dense_k, m_glu);
+ if ( info )
+ {
+ m_lastError = "UNABLE TO EXPAND MEMORY IN COPY_TO_UCOL() ";
+ m_info = NumericalIssue;
+ m_factorizationIsOk = false;
+ return;
+ }
+
+ // Form the L-segment
+ info = Base::pivotL(jj, m_diagpivotthresh, m_perm_r.indices(), iperm_c.indices(), pivrow, m_glu);
+ if ( info )
+ {
+ m_lastError = "THE MATRIX IS STRUCTURALLY SINGULAR ... ZERO COLUMN AT ";
+ std::ostringstream returnInfo;
+ returnInfo << info;
+ m_lastError += returnInfo.str();
+ m_info = NumericalIssue;
+ m_factorizationIsOk = false;
+ return;
+ }
+
+ // Update the determinant of the row permutation matrix
+ if (pivrow != jj) m_detPermR *= -1;
+
+ // Prune columns (0:jj-1) using column jj
+ Base::pruneL(jj, m_perm_r.indices(), pivrow, nseg, segrep, repfnz_k, xprune, m_glu);
+
+ // Reset repfnz for this column
+ for (i = 0; i < nseg; i++)
+ {
+ irep = segrep(i);
+ repfnz_k(irep) = emptyIdxLU;
+ }
+ } // end SparseLU within the panel
+ jcol += panel_size; // Move to the next panel
+ } // end for -- end elimination
+
+ // Count the number of nonzeros in factors
+ Base::countnz(n, m_nnzL, m_nnzU, m_glu);
+ // Apply permutation to the L subscripts
+ Base::fixupL(n, m_perm_r.indices(), m_glu);
+
+ // Create supernode matrix L
+ m_Lstore.setInfos(m, n, m_glu.lusup, m_glu.xlusup, m_glu.lsub, m_glu.xlsub, m_glu.supno, m_glu.xsup);
+ // Create the column-major upper triangular sparse matrix U
+ new (&m_Ustore) MappedSparseMatrix<Scalar, ColMajor, Index> ( m, n, m_nnzU, m_glu.xusub.data(), m_glu.usub.data(), m_glu.ucol.data() );
+
+ m_info = Success;
+ m_factorizationIsOk = true;
+}
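+
+// Error-handling sketch (hedged: info() and lastErrorMessage() are assumed to
+// expose the m_info and m_lastError members set in factorize() above):
+//
+//   solver.factorize(A);
+//   if (solver.info() != Success)
+//     std::cerr << solver.lastErrorMessage() << std::endl;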
+
+template<typename MappedSupernodalType>
+struct SparseLUMatrixLReturnType : internal::no_assignment_operator
+{
+ typedef typename MappedSupernodalType::Index Index;
+ typedef typename MappedSupernodalType::Scalar Scalar;
+ SparseLUMatrixLReturnType(const MappedSupernodalType& mapL) : m_mapL(mapL)
+ { }
+ Index rows() { return m_mapL.rows(); }
+ Index cols() { return m_mapL.cols(); }
+ template<typename Dest>
+ void solveInPlace( MatrixBase<Dest> &X) const
+ {
+ m_mapL.solveInPlace(X);
+ }
+ const MappedSupernodalType& m_mapL;
+};
+
+template<typename MatrixLType, typename MatrixUType>
+struct SparseLUMatrixUReturnType : internal::no_assignment_operator
+{
+ typedef typename MatrixLType::Index Index;
+ typedef typename MatrixLType::Scalar Scalar;
+ SparseLUMatrixUReturnType(const MatrixLType& mapL, const MatrixUType& mapU)
+ : m_mapL(mapL),m_mapU(mapU)
+ { }
+ Index rows() { return m_mapL.rows(); }
+ Index cols() { return m_mapL.cols(); }
+
+ template<typename Dest> void solveInPlace(MatrixBase<Dest> &X) const
+ {
+ Index nrhs = X.cols();
+ Index n = X.rows();
+ // Backward solve with U
+ for (Index k = m_mapL.nsuper(); k >= 0; k--)
+ {
+ Index fsupc = m_mapL.supToCol()[k];
+ Index lda = m_mapL.colIndexPtr()[fsupc+1] - m_mapL.colIndexPtr()[fsupc]; // leading dimension
+ Index nsupc = m_mapL.supToCol()[k+1] - fsupc;
+ Index luptr = m_mapL.colIndexPtr()[fsupc];
+
+ if (nsupc == 1)
+ {
+ for (Index j = 0; j < nrhs; j++)
+ {
+ X(fsupc, j) /= m_mapL.valuePtr()[luptr];
+ }
+ }
+ else
+ {
+ Map<const Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > A( &(m_mapL.valuePtr()[luptr]), nsupc, nsupc, OuterStride<>(lda) );
+ Map< Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > U (&(X(fsupc,0)), nsupc, nrhs, OuterStride<>(n) );
+ U = A.template triangularView<Upper>().solve(U);
+ }
+
+ for (Index j = 0; j < nrhs; ++j)
+ {
+ for (Index jcol = fsupc; jcol < fsupc + nsupc; jcol++)
+ {
+ typename MatrixUType::InnerIterator it(m_mapU, jcol);
+ for ( ; it; ++it)
+ {
+ Index irow = it.index();
+ X(irow, j) -= X(jcol, j) * it.value();
+ }
+ }
+ }
+ } // End For U-solve
+ }
+ const MatrixLType& m_mapL;
+ const MatrixUType& m_mapU;
+};
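+
+// These return types are what the matrixL()/matrixU() accessors of SparseLU
+// are expected to hand back. A hedged sketch of a manual solve using them,
+// mirroring the permutation order assumed by the solver's _solve() path:
+//
+//   x = solver.rowsPermutation() * b;            // apply the row permutation
+//   solver.matrixL().solveInPlace(x);            // forward solve with L
+//   solver.matrixU().solveInPlace(x);            // backward solve with U
+//   x = solver.colsPermutation().inverse() * x;  // undo the column permutation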
+
+namespace internal {
+
+template<typename _MatrixType, typename Derived, typename Rhs>
+struct solve_retval<SparseLU<_MatrixType,Derived>, Rhs>
+ : solve_retval_base<SparseLU<_MatrixType,Derived>, Rhs>
+{
+ typedef SparseLU<_MatrixType,Derived> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+template<typename _MatrixType, typename Derived, typename Rhs>
+struct sparse_solve_retval<SparseLU<_MatrixType,Derived>, Rhs>
+ : sparse_solve_retval_base<SparseLU<_MatrixType,Derived>, Rhs>
+{
+ typedef SparseLU<_MatrixType,Derived> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+} // end namespace internal
+
+} // End namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLUImpl.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLUImpl.h
new file mode 100644
index 0000000000..14d70897df
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLUImpl.h
@@ -0,0 +1,64 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef SPARSELU_IMPL_H
+#define SPARSELU_IMPL_H
+
+namespace Eigen {
+namespace internal {
+
+/** \ingroup SparseLU_Module
+ * \class SparseLUImpl
+ * Base class for SparseLU
+ */
+template <typename Scalar, typename Index>
+class SparseLUImpl
+{
+ public:
+ typedef Matrix<Scalar,Dynamic,1> ScalarVector;
+ typedef Matrix<Index,Dynamic,1> IndexVector;
+ typedef typename ScalarVector::RealScalar RealScalar;
+ typedef Ref<Matrix<Scalar,Dynamic,1> > BlockScalarVector;
+ typedef Ref<Matrix<Index,Dynamic,1> > BlockIndexVector;
+ typedef LU_GlobalLU_t<IndexVector, ScalarVector> GlobalLU_t;
+ typedef SparseMatrix<Scalar,ColMajor,Index> MatrixType;
+
+ protected:
+ template <typename VectorType>
+ Index expand(VectorType& vec, Index& length, Index nbElts, Index keep_prev, Index& num_expansions);
+ Index memInit(Index m, Index n, Index annz, Index lwork, Index fillratio, Index panel_size, GlobalLU_t& glu);
+ template <typename VectorType>
+ Index memXpand(VectorType& vec, Index& maxlen, Index nbElts, MemType memtype, Index& num_expansions);
+ void heap_relax_snode (const Index n, IndexVector& et, const Index relax_columns, IndexVector& descendants, IndexVector& relax_end);
+ void relax_snode (const Index n, IndexVector& et, const Index relax_columns, IndexVector& descendants, IndexVector& relax_end);
+ Index snode_dfs(const Index jcol, const Index kcol,const MatrixType& mat, IndexVector& xprune, IndexVector& marker, GlobalLU_t& glu);
+ Index snode_bmod (const Index jcol, const Index fsupc, ScalarVector& dense, GlobalLU_t& glu);
+ Index pivotL(const Index jcol, const RealScalar& diagpivotthresh, IndexVector& perm_r, IndexVector& iperm_c, Index& pivrow, GlobalLU_t& glu);
+ template <typename Traits>
+ void dfs_kernel(const Index jj, IndexVector& perm_r,
+ Index& nseg, IndexVector& panel_lsub, IndexVector& segrep,
+ Ref<IndexVector> repfnz_col, IndexVector& xprune, Ref<IndexVector> marker, IndexVector& parent,
+ IndexVector& xplore, GlobalLU_t& glu, Index& nextl_col, Index krow, Traits& traits);
+ void panel_dfs(const Index m, const Index w, const Index jcol, MatrixType& A, IndexVector& perm_r, Index& nseg, ScalarVector& dense, IndexVector& panel_lsub, IndexVector& segrep, IndexVector& repfnz, IndexVector& xprune, IndexVector& marker, IndexVector& parent, IndexVector& xplore, GlobalLU_t& glu);
+
+ void panel_bmod(const Index m, const Index w, const Index jcol, const Index nseg, ScalarVector& dense, ScalarVector& tempv, IndexVector& segrep, IndexVector& repfnz, GlobalLU_t& glu);
+ Index column_dfs(const Index m, const Index jcol, IndexVector& perm_r, Index maxsuper, Index& nseg, BlockIndexVector lsub_col, IndexVector& segrep, BlockIndexVector repfnz, IndexVector& xprune, IndexVector& marker, IndexVector& parent, IndexVector& xplore, GlobalLU_t& glu);
+ Index column_bmod(const Index jcol, const Index nseg, BlockScalarVector dense, ScalarVector& tempv, BlockIndexVector segrep, BlockIndexVector repfnz, Index fpanelc, GlobalLU_t& glu);
+ Index copy_to_ucol(const Index jcol, const Index nseg, IndexVector& segrep, BlockIndexVector repfnz ,IndexVector& perm_r, BlockScalarVector dense, GlobalLU_t& glu);
+ void pruneL(const Index jcol, const IndexVector& perm_r, const Index pivrow, const Index nseg, const IndexVector& segrep, BlockIndexVector repfnz, IndexVector& xprune, GlobalLU_t& glu);
+ void countnz(const Index n, Index& nnzL, Index& nnzU, GlobalLU_t& glu);
+ void fixupL(const Index n, const IndexVector& perm_r, GlobalLU_t& glu);
+
+ template<typename , typename >
+ friend struct column_dfs_traits;
+};
+
+} // end namespace internal
+} // namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Memory.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Memory.h
new file mode 100644
index 0000000000..1ffa7d54e9
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Memory.h
@@ -0,0 +1,227 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]memory.c files in SuperLU
+
+ * -- SuperLU routine (version 3.1) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * August 1, 2008
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+
+#ifndef EIGEN_SPARSELU_MEMORY
+#define EIGEN_SPARSELU_MEMORY
+
+namespace Eigen {
+namespace internal {
+
+enum { LUNoMarker = 3 };
+enum {emptyIdxLU = -1};
+template<typename Index>
+inline Index LUnumTempV(Index& m, Index& w, Index& t, Index& b)
+{
+ return (std::max)(m, (t+b)*w);
+}
+
+template< typename Scalar, typename Index>
+inline Index LUTempSpace(Index&m, Index& w)
+{
+ return (2*w + 4 + LUNoMarker) * m * sizeof(Index) + (w + 1) * m * sizeof(Scalar);
+}
+
+
+
+
+/**
+ * Expand the existing storage to accommodate more fill-ins
+ * \param vec Valid pointer to the vector to allocate or expand
+ * \param[in,out] length On input, the current length of the vector to be increased. On output, the length of the newly allocated vector
+ * \param[in] nbElts Current number of elements in the factors
+ * \param keep_prev 1: use length and do not expand the vector; 0: compute new_len and expand
+ * \param[in,out] num_expansions Number of times the memory has been expanded
+ */
+template <typename Scalar, typename Index>
+template <typename VectorType>
+Index SparseLUImpl<Scalar,Index>::expand(VectorType& vec, Index& length, Index nbElts, Index keep_prev, Index& num_expansions)
+{
+
+ float alpha = 1.5; // Ratio of the memory increase
+ Index new_len; // New size of the allocated memory
+
+ if(num_expansions == 0 || keep_prev)
+ new_len = length ; // First-time allocation: use the requested length
+ else
+ new_len = (std::max)(length+1,Index(alpha * length));
+
+ VectorType old_vec; // Temporary vector to hold the previous values
+ if (nbElts > 0 )
+ old_vec = vec.segment(0,nbElts);
+
+ //Allocate or expand the current vector
+#ifdef EIGEN_EXCEPTIONS
+ try
+#endif
+ {
+ vec.resize(new_len);
+ }
+#ifdef EIGEN_EXCEPTIONS
+ catch(std::bad_alloc& )
+#else
+ if(!vec.size())
+#endif
+ {
+ if (!num_expansions)
+ {
+ // First time to allocate from LUMemInit()
+ // Let LUMemInit() deal with it.
+ return -1;
+ }
+ if (keep_prev)
+ {
+ // In this case, the memory length should not be reduced
+ return new_len;
+ }
+ else
+ {
+ // Reduce the size and increase again
+ Index tries = 0; // Number of attempts
+ do
+ {
+ alpha = (alpha + 1)/2;
+ new_len = (std::max)(length+1,Index(alpha * length));
+#ifdef EIGEN_EXCEPTIONS
+ try
+#endif
+ {
+ vec.resize(new_len);
+ }
+#ifdef EIGEN_EXCEPTIONS
+ catch(std::bad_alloc& )
+#else
+ if (!vec.size())
+#endif
+ {
+ tries += 1;
+ if ( tries > 10) return new_len;
+ }
+ } while (!vec.size());
+ }
+ }
+ //Copy the previous values to the newly allocated space
+ if (nbElts > 0)
+ vec.segment(0, nbElts) = old_vec;
+
+
+ length = new_len;
+ if(num_expansions) ++num_expansions;
+ return 0;
+}
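+
+// Worked example of the growth policy above (illustration only): with
+// alpha = 1.5 and length = 1000, the first expansion requests
+// new_len = max(1001, 1500) = 1500; if that allocation fails, alpha is
+// moved towards 1 ((1.5 + 1)/2 = 1.25) and new_len = max(1001, 1250) = 1250
+// is tried instead, for at most 10 attempts.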
+
+/**
+ * \brief Allocate various working space for the numerical factorization phase.
+ * \param m number of rows of the input matrix
+ * \param n number of columns
+ * \param annz number of initial nonzeros in the matrix
+ * \param lwork if lwork=-1, this routine returns an estimated size of the required memory
+ * \param glu persistent data to facilitate multiple factors : will be deleted later ??
+ * \param fillratio estimated ratio of fill in the factors
+ * \param panel_size Size of a panel
+ * \return an estimated size of the required memory if lwork = -1; otherwise, return the size of actually allocated memory when allocation failed, and 0 on success
+ * \note Unlike SuperLU, this routine does not support successive factorization with the same pattern and the same row permutation
+ */
+template <typename Scalar, typename Index>
+Index SparseLUImpl<Scalar,Index>::memInit(Index m, Index n, Index annz, Index lwork, Index fillratio, Index panel_size, GlobalLU_t& glu)
+{
+ Index& num_expansions = glu.num_expansions; //No memory expansions so far
+ num_expansions = 0;
+ glu.nzumax = glu.nzlumax = (std::min)(fillratio * annz / n, m) * n; // estimated number of nonzeros in U
+ glu.nzlmax = (std::max)(Index(4), fillratio) * annz / 4; // estimated nnz in L factor
+ // Return the estimated size to the user if necessary
+ Index tempSpace;
+ tempSpace = (2*panel_size + 4 + LUNoMarker) * m * sizeof(Index) + (panel_size + 1) * m * sizeof(Scalar);
+ if (lwork == emptyIdxLU)
+ {
+ Index estimated_size;
+ estimated_size = (5 * n + 5) * sizeof(Index) + tempSpace
+ + (glu.nzlmax + glu.nzumax) * sizeof(Index) + (glu.nzlumax+glu.nzumax) * sizeof(Scalar) + n;
+ return estimated_size;
+ }
+
+ // Setup the required space
+
+ // First allocate Integer pointers for L\U factors
+ glu.xsup.resize(n+1);
+ glu.supno.resize(n+1);
+ glu.xlsub.resize(n+1);
+ glu.xlusup.resize(n+1);
+ glu.xusub.resize(n+1);
+
+ // Reserve memory for L/U factors
+ do
+ {
+ if( (expand<ScalarVector>(glu.lusup, glu.nzlumax, 0, 0, num_expansions)<0)
+ || (expand<ScalarVector>(glu.ucol, glu.nzumax, 0, 0, num_expansions)<0)
+ || (expand<IndexVector> (glu.lsub, glu.nzlmax, 0, 0, num_expansions)<0)
+ || (expand<IndexVector> (glu.usub, glu.nzumax, 0, 1, num_expansions)<0) )
+ {
+ //Reduce the estimated size and retry
+ glu.nzlumax /= 2;
+ glu.nzumax /= 2;
+ glu.nzlmax /= 2;
+ if (glu.nzlumax < annz ) return glu.nzlumax;
+ }
+ } while (!glu.lusup.size() || !glu.ucol.size() || !glu.lsub.size() || !glu.usub.size());
+
+ ++num_expansions;
+ return 0;
+
+} // end LuMemInit
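+
+// Worked example of the size estimates above (illustration only): for a
+// 1000 x 1000 matrix with annz = 5000 and fillratio = 20,
+// nzumax = nzlumax = min(20*5000/1000, 1000) * 1000 = 100000 and
+// nzlmax = max(4, 20) * 5000 / 4 = 25000. If the initial allocations fail,
+// these estimates are halved and retried until nzlumax would drop below annz.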
+
+/**
+ * \brief Expand the existing storage
+ * \param vec vector to expand
+ * \param[in,out] maxlen On input, the previous size of vec (number of elements to copy); on output, the new size
+ * \param nbElts current number of elements in the vector.
+ * \param memtype Type of the element to expand
+ * \param num_expansions Number of expansions
+ * \return 0 on success, > 0 size of the memory allocated so far
+ */
+template <typename Scalar, typename Index>
+template <typename VectorType>
+Index SparseLUImpl<Scalar,Index>::memXpand(VectorType& vec, Index& maxlen, Index nbElts, MemType memtype, Index& num_expansions)
+{
+ Index failed_size;
+ if (memtype == USUB)
+ failed_size = this->expand<VectorType>(vec, maxlen, nbElts, 1, num_expansions);
+ else
+ failed_size = this->expand<VectorType>(vec, maxlen, nbElts, 0, num_expansions);
+
+ if (failed_size)
+ return failed_size;
+
+ return 0 ;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif // EIGEN_SPARSELU_MEMORY
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Structs.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Structs.h
new file mode 100644
index 0000000000..24d6bf1794
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Structs.h
@@ -0,0 +1,111 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+ * NOTE: This file comes from a partly modified version of files slu_[s,d,c,z]defs.h
+ * -- SuperLU routine (version 4.1) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * November, 2010
+ *
+ * Global data structures used in LU factorization -
+ *
+ * nsuper: #supernodes = nsuper + 1, numbered [0, nsuper].
+ * (xsup,supno): supno[i] is the supernode no to which i belongs;
+ * xsup(s) points to the beginning of the s-th supernode.
+ * e.g. supno 0 1 2 2 3 3 3 4 4 4 4 4 (n=12)
+ * xsup 0 1 2 4 7 12
+ * Note: dfs will be performed on supernode rep. relative to the new
+ * row pivoting ordering
+ *
+ * (xlsub,lsub): lsub[*] contains the compressed subscript of
+ * rectangular supernodes; xlsub[j] points to the starting
+ * location of the j-th column in lsub[*]. Note that xlsub
+ * is indexed by column.
+ * Storage: original row subscripts
+ *
+ * During the course of sparse LU factorization, we also use
+ * (xlsub,lsub) for the purpose of symmetric pruning. For each
+ * supernode {s,s+1,...,t=s+r} with first column s and last
+ * column t, the subscript set
+ * lsub[j], j=xlsub[s], .., xlsub[s+1]-1
+ * is the structure of column s (i.e. structure of this supernode).
+ * It is used for the storage of numerical values.
+ * Furthermore,
+ * lsub[j], j=xlsub[t], .., xlsub[t+1]-1
+ * is the structure of the last column t of this supernode.
+ * It is for the purpose of symmetric pruning. Therefore, the
+ * structural subscripts can be rearranged without making physical
+ * interchanges among the numerical values.
+ *
+ * However, if the supernode has only one column, then we
+ * only keep one set of subscripts. For any subscript interchange
+ * performed, similar interchange must be done on the numerical
+ * values.
+ *
+ * The last column structures (for pruning) will be removed
+ * after the numerical LU factorization phase.
+ *
+ * (xlusup,lusup): lusup[*] contains the numerical values of the
+ * rectangular supernodes; xlusup[j] points to the starting
+ * location of the j-th column in storage vector lusup[*]
+ * Note: xlusup is indexed by column.
+ * Each rectangular supernode is stored by column-major
+ * scheme, consistent with Fortran 2-dim array storage.
+ *
+ * (xusub,ucol,usub): ucol[*] stores the numerical values of
+ * U-columns outside the rectangular supernodes. The row
+ * subscript of nonzero ucol[k] is stored in usub[k].
+ * xusub[i] points to the starting location of column i in ucol.
+ * Storage: new row subscripts; that is subscripts of PA.
+ */
+
+#ifndef EIGEN_LU_STRUCTS
+#define EIGEN_LU_STRUCTS
+namespace Eigen {
+namespace internal {
+
+typedef enum {LUSUP, UCOL, LSUB, USUB, LLVL, ULVL} MemType;
+
+template <typename IndexVector, typename ScalarVector>
+struct LU_GlobalLU_t {
+ typedef typename IndexVector::Scalar Index;
+ IndexVector xsup; //First supernode column ... xsup(s) points to the beginning of the s-th supernode
+ IndexVector supno; // Supernode number corresponding to this column (column to supernode mapping)
+ ScalarVector lusup; // nonzero values of L ordered by columns
+ IndexVector lsub; // Compressed row indices of L rectangular supernodes.
+ IndexVector xlusup; // pointers to the beginning of each column in lusup
+ IndexVector xlsub; // pointers to the beginning of each column in lsub
+ Index nzlmax; // Current max size of lsub
+ Index nzlumax; // Current max size of lusup
+ ScalarVector ucol; // nonzero values of U ordered by columns
+ IndexVector usub; // row indices of U columns in ucol
+ IndexVector xusub; // Pointers to the beginning of each column of U in ucol
+ Index nzumax; // Current max size of ucol
+ Index n; // Number of columns in the matrix
+ Index num_expansions;
+};
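+
+// Access sketch (hedged): the numerical values of a column j live in
+// lusup[ xlusup(j) .. xlusup(j+1) ), while its row indices are those of the
+// enclosing supernode, stored once in lsub[ xlsub(fsupc) .. xlsub(fsupc+1) ),
+// where fsupc = xsup(supno(j)) is the first column of that supernode.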
+
+// Values to set for performance
+template <typename Index>
+struct perfvalues {
+ Index panel_size; // a panel consists of at most <panel_size> consecutive columns
+ Index relax; // To control degree of relaxing supernodes. If the number of nodes (columns)
+ // in a subtree of the elimination tree is less than relax, this subtree is considered
+ // as one supernode regardless of the row structures of those columns
+ Index maxsuper; // The maximum size for a supernode in complete LU
+ Index rowblk; // The minimum row dimension for 2-D blocking to be used;
+ Index colblk; // The minimum column dimension for 2-D blocking to be used;
+ Index fillfactor; // The estimated fills factors for L and U, compared with A
+};
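+
+// Hypothetical initialization sketch (the values are illustrative, not
+// guaranteed to match the library defaults):
+//
+//   internal::perfvalues<int> perfv;
+//   perfv.panel_size = 16;   // columns per panel
+//   perfv.relax      = 1;    // relaxation threshold for supernode detection
+//   perfv.maxsuper   = 128;  // maximum columns per supernode
+//   perfv.rowblk     = 16;   // minimum row dimension for 2-D blocking
+//   perfv.colblk     = 8;    // minimum column dimension for 2-D blocking
+//   perfv.fillfactor = 20;   // estimated fill ratio passed to memInit()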
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif // EIGEN_LU_STRUCTS
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h
new file mode 100644
index 0000000000..ad6f2183fe
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h
@@ -0,0 +1,298 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSELU_SUPERNODAL_MATRIX_H
+#define EIGEN_SPARSELU_SUPERNODAL_MATRIX_H
+
+namespace Eigen {
+namespace internal {
+
+/** \ingroup SparseLU_Module
+ * \brief a class to manipulate the L supernodal factor from the SparseLU factorization
+ *
+ * This class contains the data needed to easily store
+ * and manipulate the supernodes during the factorization and solution phases of SparseLU.
+ * Only the lower triangular matrix has supernodes.
+ *
+ * NOTE : This class corresponds to the SCformat structure in SuperLU
+ *
+ */
+/* TODO
+ * InnerIterator as for sparsematrix
+ * SuperInnerIterator to iterate through all supernodes
+ * Function for triangular solve
+ */
+template <typename _Scalar, typename _Index>
+class MappedSuperNodalMatrix
+{
+ public:
+ typedef _Scalar Scalar;
+ typedef _Index Index;
+ typedef Matrix<Index,Dynamic,1> IndexVector;
+ typedef Matrix<Scalar,Dynamic,1> ScalarVector;
+ public:
+ MappedSuperNodalMatrix()
+ {
+
+ }
+ MappedSuperNodalMatrix(Index m, Index n, ScalarVector& nzval, IndexVector& nzval_colptr, IndexVector& rowind,
+ IndexVector& rowind_colptr, IndexVector& col_to_sup, IndexVector& sup_to_col )
+ {
+ setInfos(m, n, nzval, nzval_colptr, rowind, rowind_colptr, col_to_sup, sup_to_col);
+ }
+
+ ~MappedSuperNodalMatrix()
+ {
+
+ }
+ /**
+ * Set appropriate pointers for the lower triangular supernodal matrix
+ * These infos are available at the end of the numerical factorization
+ * FIXME This class will be modified such that it can be use in the course
+ * of the factorization.
+ */
+ void setInfos(Index m, Index n, ScalarVector& nzval, IndexVector& nzval_colptr, IndexVector& rowind,
+ IndexVector& rowind_colptr, IndexVector& col_to_sup, IndexVector& sup_to_col )
+ {
+ m_row = m;
+ m_col = n;
+ m_nzval = nzval.data();
+ m_nzval_colptr = nzval_colptr.data();
+ m_rowind = rowind.data();
+ m_rowind_colptr = rowind_colptr.data();
+ m_nsuper = col_to_sup(n);
+ m_col_to_sup = col_to_sup.data();
+ m_sup_to_col = sup_to_col.data();
+ }
+
+ /**
+ * Number of rows
+ */
+ Index rows() { return m_row; }
+
+ /**
+ * Number of columns
+ */
+ Index cols() { return m_col; }
+
+ /**
+ * Return the array of nonzero values packed by column
+ *
+ * The size is nnz
+ */
+ Scalar* valuePtr() { return m_nzval; }
+
+ const Scalar* valuePtr() const
+ {
+ return m_nzval;
+ }
+ /**
+ * Return the pointers to the beginning of each column in \ref valuePtr()
+ */
+ Index* colIndexPtr()
+ {
+ return m_nzval_colptr;
+ }
+
+ const Index* colIndexPtr() const
+ {
+ return m_nzval_colptr;
+ }
+
+ /**
+ * Return the array of compressed row indices of all supernodes
+ */
+ Index* rowIndex() { return m_rowind; }
+
+ const Index* rowIndex() const
+ {
+ return m_rowind;
+ }
+
+ /**
+ * Return the locations in \ref rowIndex() at which each column starts
+ */
+ Index* rowIndexPtr() { return m_rowind_colptr; }
+
+ const Index* rowIndexPtr() const
+ {
+ return m_rowind_colptr;
+ }
+
+ /**
+ * Return the array of column-to-supernode mapping
+ */
+ Index* colToSup() { return m_col_to_sup; }
+
+ const Index* colToSup() const
+ {
+ return m_col_to_sup;
+ }
+ /**
+ * Return the array of supernode-to-column mapping
+ */
+ Index* supToCol() { return m_sup_to_col; }
+
+ const Index* supToCol() const
+ {
+ return m_sup_to_col;
+ }
+
+ /**
+ * Return the number of supernodes
+ */
+ Index nsuper() const
+ {
+ return m_nsuper;
+ }
+
+ class InnerIterator;
+ template<typename Dest>
+ void solveInPlace( MatrixBase<Dest>&X) const;
+
+
+
+
+ protected:
+ Index m_row; // Number of rows
+ Index m_col; // Number of columns
+ Index m_nsuper; // Number of supernodes
+ Scalar* m_nzval; //array of nonzero values packed by column
+ Index* m_nzval_colptr; //nzval_colptr[j] Stores the location in nzval[] which starts column j
+ Index* m_rowind; // Array of compressed row indices of rectangular supernodes
+ Index* m_rowind_colptr; //rowind_colptr[j] stores the location in rowind[] which starts column j
+ Index* m_col_to_sup; // col_to_sup[j] is the supernode number to which column j belongs
+ Index* m_sup_to_col; //sup_to_col[s] points to the starting column of the s-th supernode
+
+ private :
+};
+
+/**
+ * \brief InnerIterator class to iterate over nonzero values of the current column in the supernodal matrix L
+ *
+ */
+template<typename Scalar, typename Index>
+class MappedSuperNodalMatrix<Scalar,Index>::InnerIterator
+{
+ public:
+ InnerIterator(const MappedSuperNodalMatrix& mat, Index outer)
+ : m_matrix(mat),
+ m_outer(outer),
+ m_supno(mat.colToSup()[outer]),
+ m_idval(mat.colIndexPtr()[outer]),
+ m_startidval(m_idval),
+ m_endidval(mat.colIndexPtr()[outer+1]),
+ m_idrow(mat.rowIndexPtr()[outer]),
+ m_endidrow(mat.rowIndexPtr()[outer+1])
+ {}
+ inline InnerIterator& operator++()
+ {
+ m_idval++;
+ m_idrow++;
+ return *this;
+ }
+ inline Scalar value() const { return m_matrix.valuePtr()[m_idval]; }
+
+ inline Scalar& valueRef() { return const_cast<Scalar&>(m_matrix.valuePtr()[m_idval]); }
+
+ inline Index index() const { return m_matrix.rowIndex()[m_idrow]; }
+ inline Index row() const { return index(); }
+ inline Index col() const { return m_outer; }
+
+ inline Index supIndex() const { return m_supno; }
+
+ inline operator bool() const
+ {
+ return ( (m_idval < m_endidval) && (m_idval >= m_startidval)
+ && (m_idrow < m_endidrow) );
+ }
+
+ protected:
+ const MappedSuperNodalMatrix& m_matrix; // Supernodal lower triangular matrix
+ const Index m_outer; // Current column
+ const Index m_supno; // Current SuperNode number
+ Index m_idval; // Index to browse the values in the current column
+ const Index m_startidval; // Start of the column value
+ const Index m_endidval; // End of the column value
+ Index m_idrow; // Index to browse the row indices
+ Index m_endidrow; // End index of row indices of the current column
+};
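+
+// Iteration sketch (hedged): walking the nonzeros of column j of a mapped
+// supernodal factor L with the iterator defined above; process() is a
+// placeholder callback:
+//
+//   for (MappedSuperNodalMatrix<double,int>::InnerIterator it(L, j); it; ++it)
+//     process(it.row(), it.value());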
+
+/**
+ * \brief Solve with the supernode triangular matrix
+ *
+ */
+template<typename Scalar, typename Index>
+template<typename Dest>
+void MappedSuperNodalMatrix<Scalar,Index>::solveInPlace( MatrixBase<Dest>&X) const
+{
+ Index n = X.rows();
+ Index nrhs = X.cols();
+ const Scalar * Lval = valuePtr(); // Nonzero values
+ Matrix<Scalar,Dynamic,Dynamic> work(n, nrhs); // working matrix for intermediate results
+ work.setZero();
+ for (Index k = 0; k <= nsuper(); k ++)
+ {
+ Index fsupc = supToCol()[k]; // First column of the current supernode
+ Index istart = rowIndexPtr()[fsupc]; // Pointer index to the subscript of the current column
+ Index nsupr = rowIndexPtr()[fsupc+1] - istart; // Number of rows in the current supernode
+ Index nsupc = supToCol()[k+1] - fsupc; // Number of columns in the current supernode
+ Index nrow = nsupr - nsupc; // Number of rows in the non-diagonal part of the supernode
+ Index irow; // Current row index
+
+ if (nsupc == 1 )
+ {
+ for (Index j = 0; j < nrhs; j++)
+ {
+ InnerIterator it(*this, fsupc);
+ ++it; // Skip the diagonal element
+ for (; it; ++it)
+ {
+ irow = it.row();
+ X(irow, j) -= X(fsupc, j) * it.value();
+ }
+ }
+ }
+ else
+ {
+ // The supernode has more than one column
+ Index luptr = colIndexPtr()[fsupc];
+ Index lda = colIndexPtr()[fsupc+1] - luptr;
+
+ // Triangular solve
+ Map<const Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > A( &(Lval[luptr]), nsupc, nsupc, OuterStride<>(lda) );
+ Map< Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > U (&(X(fsupc,0)), nsupc, nrhs, OuterStride<>(n) );
+ U = A.template triangularView<UnitLower>().solve(U);
+
+ // Matrix-vector product
+ new (&A) Map<const Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > ( &(Lval[luptr+nsupc]), nrow, nsupc, OuterStride<>(lda) );
+ work.block(0, 0, nrow, nrhs) = A * U;
+
+ //Begin Scatter
+ for (Index j = 0; j < nrhs; j++)
+ {
+ Index iptr = istart + nsupc;
+ for (Index i = 0; i < nrow; i++)
+ {
+ irow = rowIndex()[iptr];
+ X(irow, j) -= work(i, j); // Scatter operation
+ work(i, j) = Scalar(0);
+ iptr++;
+ }
+ }
+ }
+ }
+}
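+
+// In matrix terms, each iteration of the loop above performs, for supernode k
+// with columns [fsupc, fsupc+nsupc):
+//   X(fsupc:fsupc+nsupc, :)      <-  L_kk^{-1} * X(fsupc:fsupc+nsupc, :)  // unit lower triangular solve
+//   X(rows below supernode k, :) -=  L_k * X(fsupc:fsupc+nsupc, :)        // scattered GEMM update
+// where L_kk is the nsupc x nsupc diagonal block and L_k the nrow x nsupc
+// sub-diagonal block of the supernode.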
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSELU_MATRIX_H
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Utils.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Utils.h
new file mode 100644
index 0000000000..15352ac33a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_Utils.h
@@ -0,0 +1,80 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+#ifndef EIGEN_SPARSELU_UTILS_H
+#define EIGEN_SPARSELU_UTILS_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Count the nonzero elements in the factors L and U
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::countnz(const Index n, Index& nnzL, Index& nnzU, GlobalLU_t& glu)
+{
+ nnzL = 0;
+ nnzU = (glu.xusub)(n);
+ Index nsuper = (glu.supno)(n);
+ Index jlen;
+ Index i, j, fsupc;
+ if (n <= 0 ) return;
+ // For each supernode
+ for (i = 0; i <= nsuper; i++)
+ {
+ fsupc = glu.xsup(i);
+ jlen = glu.xlsub(fsupc+1) - glu.xlsub(fsupc);
+
+ for (j = fsupc; j < glu.xsup(i+1); j++)
+ {
+ nnzL += jlen;
+ nnzU += j - fsupc + 1;
+ jlen--;
+ }
+ }
+}
+
+/**
+ * \brief Fix up the data storage lsub for L-subscripts.
+ *
+ * It removes the subscript sets used for structural pruning,
+ * and applies the row permutation to the remaining subscripts.
+ *
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::fixupL(const Index n, const IndexVector& perm_r, GlobalLU_t& glu)
+{
+ Index fsupc, i, j, k, jstart;
+
+ Index nextl = 0;
+ Index nsuper = (glu.supno)(n);
+
+ // For each supernode
+ for (i = 0; i <= nsuper; i++)
+ {
+ fsupc = glu.xsup(i);
+ jstart = glu.xlsub(fsupc);
+ glu.xlsub(fsupc) = nextl;
+ for (j = jstart; j < glu.xlsub(fsupc + 1); j++)
+ {
+ glu.lsub(nextl) = perm_r(glu.lsub(j)); // Now indexed into P*A
+ nextl++;
+ }
+ for (k = fsupc+1; k < glu.xsup(i+1); k++)
+ glu.xlsub(k) = nextl; // other columns in supernode i
+ }
+
+ glu.xlsub(n) = nextl;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif // EIGEN_SPARSELU_UTILS_H
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_bmod.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_bmod.h
new file mode 100644
index 0000000000..f24bd87d3e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_bmod.h
@@ -0,0 +1,180 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of xcolumn_bmod.c file in SuperLU
+
+ * -- SuperLU routine (version 3.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * October 15, 2003
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_COLUMN_BMOD_H
+#define SPARSELU_COLUMN_BMOD_H
+
+namespace Eigen {
+
+namespace internal {
+/**
+ * \brief Performs numeric block updates (sup-col) in topological order
+ *
+ * \param jcol current column to update
+ * \param nseg Number of segments in the U part
+ * \param dense Store the full representation of the column
+ * \param tempv working array
+ * \param segrep segment representative ...
+ * \param repfnz ??? First nonzero column in each row ??? ...
+ * \param fpanelc First column in the current panel
+ * \param glu Global LU data.
+ * \return 0 - successful return
+ * > 0 - number of bytes allocated when run out of space
+ *
+ */
+template <typename Scalar, typename Index>
+Index SparseLUImpl<Scalar,Index>::column_bmod(const Index jcol, const Index nseg, BlockScalarVector dense, ScalarVector& tempv, BlockIndexVector segrep, BlockIndexVector repfnz, Index fpanelc, GlobalLU_t& glu)
+{
+ Index jsupno, k, ksub, krep, ksupno;
+ Index lptr, nrow, isub, irow, nextlu, new_next, ufirst;
+ Index fsupc, nsupc, nsupr, luptr, kfnz, no_zeros;
+ /* krep = representative of current k-th supernode
+ * fsupc = first supernodal column
+ * nsupc = number of columns in a supernode
+ * nsupr = number of rows in a supernode
+ * luptr = location of supernodal LU-block in storage
+ * kfnz = first nonz in the k-th supernodal segment
+ * no_zeros = number of leading zeros in a supernodal U-segment
+ */
+
+ jsupno = glu.supno(jcol);
+ // For each nonzero supernode segment of U[*,j] in topological order
+ k = nseg - 1;
+ Index d_fsupc; // distance between the first column of the current panel and the
+ // first column of the current snode
+ Index fst_col; // First column within small LU update
+ Index segsize;
+ for (ksub = 0; ksub < nseg; ksub++)
+ {
+ krep = segrep(k); k--;
+ ksupno = glu.supno(krep);
+ if (jsupno != ksupno )
+ {
+ // outside the rectangular supernode
+ fsupc = glu.xsup(ksupno);
+ fst_col = (std::max)(fsupc, fpanelc);
+
+ // Distance from the current supernode to the current panel;
+ // d_fsupc = 0 if fsupc >= fpanelc
+ d_fsupc = fst_col - fsupc;
+
+ luptr = glu.xlusup(fst_col) + d_fsupc;
+ lptr = glu.xlsub(fsupc) + d_fsupc;
+
+ kfnz = repfnz(krep);
+ kfnz = (std::max)(kfnz, fpanelc);
+
+ segsize = krep - kfnz + 1;
+ nsupc = krep - fst_col + 1;
+ nsupr = glu.xlsub(fsupc+1) - glu.xlsub(fsupc);
+ nrow = nsupr - d_fsupc - nsupc;
+ Index lda = glu.xlusup(fst_col+1) - glu.xlusup(fst_col);
+
+
+ // Perform a triangular solver and block update,
+ // then scatter the result of sup-col update to dense
+ no_zeros = kfnz - fst_col;
+ if(segsize==1)
+ LU_kernel_bmod<1>::run(segsize, dense, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ else
+ LU_kernel_bmod<Dynamic>::run(segsize, dense, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ } // end if jsupno
+ } // end for each segment
+
+ // Process the supernodal portion of L\U[*,j]
+ nextlu = glu.xlusup(jcol);
+ fsupc = glu.xsup(jsupno);
+
+ // copy the SPA dense into L\U[*,j]
+ Index mem;
+ new_next = nextlu + glu.xlsub(fsupc + 1) - glu.xlsub(fsupc);
+ Index offset = internal::first_multiple<Index>(new_next, internal::packet_traits<Scalar>::size) - new_next;
+ if(offset)
+ new_next += offset;
+ while (new_next > glu.nzlumax )
+ {
+ mem = memXpand<ScalarVector>(glu.lusup, glu.nzlumax, nextlu, LUSUP, glu.num_expansions);
+ if (mem) return mem;
+ }
+
+ for (isub = glu.xlsub(fsupc); isub < glu.xlsub(fsupc+1); isub++)
+ {
+ irow = glu.lsub(isub);
+ glu.lusup(nextlu) = dense(irow);
+ dense(irow) = Scalar(0.0);
+ ++nextlu;
+ }
+
+ if(offset)
+ {
+ glu.lusup.segment(nextlu,offset).setZero();
+ nextlu += offset;
+ }
+ glu.xlusup(jcol + 1) = nextlu; // close L\U(*,jcol);
+
+ /* For more updates within the panel (also within the current supernode),
+ * should start from the first column of the panel, or the first column
+ * of the supernode, whichever is bigger. There are two cases:
+ * 1) fsupc < fpanelc, then fst_col <-- fpanelc
+ * 2) fsupc >= fpanelc, then fst_col <-- fsupc
+ */
+ fst_col = (std::max)(fsupc, fpanelc);
+
+ if (fst_col < jcol)
+ {
+ // Distance between the current supernode and the current panel
+ // d_fsupc = 0 if fsupc >= fpanelc
+ d_fsupc = fst_col - fsupc;
+
+ lptr = glu.xlsub(fsupc) + d_fsupc;
+ luptr = glu.xlusup(fst_col) + d_fsupc;
+ nsupr = glu.xlsub(fsupc+1) - glu.xlsub(fsupc); // leading dimension
+ nsupc = jcol - fst_col; // excluding jcol
+ nrow = nsupr - d_fsupc - nsupc;
+
+ // points to the beginning of jcol in snode L\U(jsupno)
+ ufirst = glu.xlusup(jcol) + d_fsupc;
+ Index lda = glu.xlusup(jcol+1) - glu.xlusup(jcol);
+ Map<Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > A( &(glu.lusup.data()[luptr]), nsupc, nsupc, OuterStride<>(lda) );
+ VectorBlock<ScalarVector> u(glu.lusup, ufirst, nsupc);
+ u = A.template triangularView<UnitLower>().solve(u);
+
+ new (&A) Map<Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > ( &(glu.lusup.data()[luptr+nsupc]), nrow, nsupc, OuterStride<>(lda) );
+ VectorBlock<ScalarVector> l(glu.lusup, ufirst+nsupc, nrow);
+ l.noalias() -= A * u;
+
+ } // End if fst_col
+ return 0;
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // SPARSELU_COLUMN_BMOD_H
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_dfs.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_dfs.h
new file mode 100644
index 0000000000..4c04b0e44e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_column_dfs.h
@@ -0,0 +1,177 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]column_dfs.c file in SuperLU
+
+ * -- SuperLU routine (version 2.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * November 15, 1997
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_COLUMN_DFS_H
+#define SPARSELU_COLUMN_DFS_H
+
+template <typename Scalar, typename Index> class SparseLUImpl;
+namespace Eigen {
+
+namespace internal {
+
+template<typename IndexVector, typename ScalarVector>
+struct column_dfs_traits : no_assignment_operator
+{
+ typedef typename ScalarVector::Scalar Scalar;
+ typedef typename IndexVector::Scalar Index;
+ column_dfs_traits(Index jcol, Index& jsuper, typename SparseLUImpl<Scalar, Index>::GlobalLU_t& glu, SparseLUImpl<Scalar, Index>& luImpl)
+ : m_jcol(jcol), m_jsuper_ref(jsuper), m_glu(glu), m_luImpl(luImpl)
+ {}
+ bool update_segrep(Index /*krep*/, Index /*jj*/)
+ {
+ return true;
+ }
+ void mem_expand(IndexVector& lsub, Index& nextl, Index chmark)
+ {
+ if (nextl >= m_glu.nzlmax)
+ m_luImpl.memXpand(lsub, m_glu.nzlmax, nextl, LSUB, m_glu.num_expansions);
+ if (chmark != (m_jcol-1)) m_jsuper_ref = emptyIdxLU;
+ }
+ enum { ExpandMem = true };
+
+ Index m_jcol;
+ Index& m_jsuper_ref;
+ typename SparseLUImpl<Scalar, Index>::GlobalLU_t& m_glu;
+ SparseLUImpl<Scalar, Index>& m_luImpl;
+};
+
+
+/**
+ * \brief Performs a symbolic factorization on column jcol and decides the supernode boundary
+ *
+ * A supernode representative is the last column of a supernode.
+ * The nonzeros in U[*,j] are segments that end at supernodes representatives.
+ * The routine returns a list of the supernodal representatives
+ * in topological order of the dfs that generates them.
+ * The location of the first nonzero in each supernodal segment
+ * (supernodal entry location) is also returned.
+ *
+ * \param m number of rows in the matrix
+ * \param jcol Current column
+ * \param perm_r Row permutation
+ * \param maxsuper Maximum number of columns allowed in a supernode
+ * \param [in,out] nseg Number of segments in current U[*,j] - new segments appended
+ * \param lsub_col defines the rhs vector to start the dfs
+ * \param [in,out] segrep Segment representatives - new segments appended
+ * \param repfnz First nonzero location in each row
+ * \param xprune
+ * \param marker marker[i] == jj, if i was visited during dfs of current column jj;
+ * \param parent
+ * \param xplore working array
+ * \param glu global LU data
+ * \return 0 success
+ * > 0 number of bytes allocated when run out of space
+ *
+ */
+template <typename Scalar, typename Index>
+Index SparseLUImpl<Scalar,Index>::column_dfs(const Index m, const Index jcol, IndexVector& perm_r, Index maxsuper, Index& nseg, BlockIndexVector lsub_col, IndexVector& segrep, BlockIndexVector repfnz, IndexVector& xprune, IndexVector& marker, IndexVector& parent, IndexVector& xplore, GlobalLU_t& glu)
+{
+
+ Index jsuper = glu.supno(jcol);
+ Index nextl = glu.xlsub(jcol);
+ VectorBlock<IndexVector> marker2(marker, 2*m, m);
+
+
+ column_dfs_traits<IndexVector, ScalarVector> traits(jcol, jsuper, glu, *this);
+
+ // For each nonzero in A(*,jcol) do dfs
+ for (Index k = 0; ((k < m) ? lsub_col[k] != emptyIdxLU : false) ; k++)
+ {
+ Index krow = lsub_col(k);
+ lsub_col(k) = emptyIdxLU;
+ Index kmark = marker2(krow);
+
+ // krow was visited before, go to the next nonz;
+ if (kmark == jcol) continue;
+
+ dfs_kernel(jcol, perm_r, nseg, glu.lsub, segrep, repfnz, xprune, marker2, parent,
+ xplore, glu, nextl, krow, traits);
+ } // for each nonzero ...
+
+ Index fsupc, jptr, jm1ptr, ito, ifrom, istop;
+ Index nsuper = glu.supno(jcol);
+ Index jcolp1 = jcol + 1;
+ Index jcolm1 = jcol - 1;
+
+ // check to see if j belongs in the same supernode as j-1
+ if ( jcol == 0 )
+ { // Do nothing for column 0
+ nsuper = glu.supno(0) = 0 ;
+ }
+ else
+ {
+ fsupc = glu.xsup(nsuper);
+ jptr = glu.xlsub(jcol); // Not yet compressed
+ jm1ptr = glu.xlsub(jcolm1);
+
+ // Use supernodes of type T2 : see SuperLU paper
+ if ( (nextl-jptr != jptr-jm1ptr-1) ) jsuper = emptyIdxLU;
+
+ // Make sure the number of columns in a supernode doesn't
+ // exceed threshold
+ if ( (jcol - fsupc) >= maxsuper) jsuper = emptyIdxLU;
+
+ /* If jcol starts a new supernode, reclaim storage space in
+ * glu.lsub from previous supernode. Note we only store
+ * the subscript set of the first and last columns of
+ * a supernode. (first for num values, last for pruning)
+ */
+ if (jsuper == emptyIdxLU)
+ { // starts a new supernode
+ if ( (fsupc < jcolm1-1) )
+ { // >= 3 columns in nsuper
+ ito = glu.xlsub(fsupc+1);
+ glu.xlsub(jcolm1) = ito;
+ istop = ito + jptr - jm1ptr;
+ xprune(jcolm1) = istop; // initialize xprune(jcol-1)
+ glu.xlsub(jcol) = istop;
+
+ for (ifrom = jm1ptr; ifrom < nextl; ++ifrom, ++ito)
+ glu.lsub(ito) = glu.lsub(ifrom);
+ nextl = ito; // = istop + length(jcol)
+ }
+ nsuper++;
+ glu.supno(jcol) = nsuper;
+ } // if a new supernode
+ } // end else: jcol > 0
+
+ // Tidy up the pointers before exit
+ glu.xsup(nsuper+1) = jcolp1;
+ glu.supno(jcolp1) = nsuper;
+ xprune(jcol) = nextl; // Initialize upper bound for pruning
+ glu.xlsub(jcolp1) = nextl;
+
+ return 0;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_copy_to_ucol.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_copy_to_ucol.h
new file mode 100644
index 0000000000..170610d9f2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_copy_to_ucol.h
@@ -0,0 +1,106 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]copy_to_ucol.c file in SuperLU
+
+ * -- SuperLU routine (version 2.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * November 15, 1997
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_COPY_TO_UCOL_H
+#define SPARSELU_COPY_TO_UCOL_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Performs numeric block updates (sup-col) in topological order
+ *
+ * \param jcol current column to update
+ * \param nseg Number of segments in the U part
+ * \param segrep segment representative ...
+ * \param repfnz First nonzero column in each row ...
+ * \param perm_r Row permutation
+ * \param dense Store the full representation of the column
+ * \param glu Global LU data.
+ * \return 0 - successful return
+ * > 0 - number of bytes allocated when run out of space
+ *
+ */
+template <typename Scalar, typename Index>
+Index SparseLUImpl<Scalar,Index>::copy_to_ucol(const Index jcol, const Index nseg, IndexVector& segrep, BlockIndexVector repfnz ,IndexVector& perm_r, BlockScalarVector dense, GlobalLU_t& glu)
+{
+ Index ksub, krep, ksupno;
+
+ Index jsupno = glu.supno(jcol);
+
+ // For each nonzero supernode segment of U[*,j] in topological order
+ Index k = nseg - 1, i;
+ Index nextu = glu.xusub(jcol);
+ Index kfnz, isub, segsize;
+ Index new_next,irow;
+ Index fsupc, mem;
+ for (ksub = 0; ksub < nseg; ksub++)
+ {
+ krep = segrep(k); k--;
+ ksupno = glu.supno(krep);
+ if (jsupno != ksupno ) // should go into ucol();
+ {
+ kfnz = repfnz(krep);
+ if (kfnz != emptyIdxLU)
+ { // Nonzero U-segment
+ fsupc = glu.xsup(ksupno);
+ isub = glu.xlsub(fsupc) + kfnz - fsupc;
+ segsize = krep - kfnz + 1;
+ new_next = nextu + segsize;
+ while (new_next > glu.nzumax)
+ {
+ mem = memXpand<ScalarVector>(glu.ucol, glu.nzumax, nextu, UCOL, glu.num_expansions);
+ if (mem) return mem;
+ mem = memXpand<IndexVector>(glu.usub, glu.nzumax, nextu, USUB, glu.num_expansions);
+ if (mem) return mem;
+
+ }
+
+ for (i = 0; i < segsize; i++)
+ {
+ irow = glu.lsub(isub);
+ glu.usub(nextu) = perm_r(irow); // Unlike the L part, the U part is stored in its final order
+ glu.ucol(nextu) = dense(irow);
+ dense(irow) = Scalar(0.0);
+ nextu++;
+ isub++;
+ }
+
+ } // end nonzero U-segment
+
+ } // end if jsupno
+
+ } // end for each segment
+ glu.xusub(jcol + 1) = nextu; // close U(*,jcol)
+ return 0;
+}
+
+} // namespace internal
+} // end namespace Eigen
+
+#endif // SPARSELU_COPY_TO_UCOL_H
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_gemm_kernel.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_gemm_kernel.h
new file mode 100644
index 0000000000..9e4e3e72b7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_gemm_kernel.h
@@ -0,0 +1,279 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSELU_GEMM_KERNEL_H
+#define EIGEN_SPARSELU_GEMM_KERNEL_H
+
+namespace Eigen {
+
+namespace internal {
+
+
+/** \internal
+ * A general matrix-matrix product kernel optimized for the SparseLU factorization.
+ * - A, B, and C must be column major
+ * - lda and ldc must be multiples of the respective packet size
+ * - C must have the same alignment as A
+ */
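+// Reference semantics (a plain sketch ignoring the blocking, peeling and
+// vectorization below): the kernel computes C += A * B with A (m x d),
+// B (d x n) and C (m x n), all column major:
+//
+//   for (Index j = 0; j < n; ++j)
+//     for (Index k = 0; k < d; ++k)
+//       for (Index i = 0; i < m; ++i)
+//         C[i + j*ldc] += A[i + k*lda] * B[k + j*ldb];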
+template<typename Scalar,typename Index>
+EIGEN_DONT_INLINE
+void sparselu_gemm(Index m, Index n, Index d, const Scalar* A, Index lda, const Scalar* B, Index ldb, Scalar* C, Index ldc)
+{
+ using namespace Eigen::internal;
+
+ typedef typename packet_traits<Scalar>::type Packet;
+ enum {
+ NumberOfRegisters = EIGEN_ARCH_DEFAULT_NUMBER_OF_REGISTERS,
+ PacketSize = packet_traits<Scalar>::size,
+ PM = 8, // peeling in M
+ RN = 2, // register blocking
+ RK = NumberOfRegisters>=16 ? 4 : 2, // register blocking
+ BM = 4096/sizeof(Scalar), // number of rows of A-C per chunk
+ SM = PM*PacketSize // step along M
+ };
+ Index d_end = (d/RK)*RK; // number of columns of A (rows of B) suitable for full register blocking
+ Index n_end = (n/RN)*RN; // number of columns of B-C suitable for processing RN columns at once
+ Index i0 = internal::first_aligned(A,m);
+
+ eigen_internal_assert(((lda%PacketSize)==0) && ((ldc%PacketSize)==0) && (i0==internal::first_aligned(C,m)));
+
+ // handle the non-aligned rows of A and C without any optimization:
+ for(Index i=0; i<i0; ++i)
+ {
+ for(Index j=0; j<n; ++j)
+ {
+ Scalar c = C[i+j*ldc];
+ for(Index k=0; k<d; ++k)
+ c += B[k+j*ldb] * A[i+k*lda];
+ C[i+j*ldc] = c;
+ }
+ }
+ // process the remaining rows per chunk of BM rows
+ for(Index ib=i0; ib<m; ib+=BM)
+ {
+ Index actual_b = std::min<Index>(BM, m-ib); // actual number of rows
+ Index actual_b_end1 = (actual_b/SM)*SM; // actual number of rows suitable for peeling
+ Index actual_b_end2 = (actual_b/PacketSize)*PacketSize; // actual number of rows suitable for vectorization
+
+ // Let's process two columns of B-C at once
+ for(Index j=0; j<n_end; j+=RN)
+ {
+ const Scalar* Bc0 = B+(j+0)*ldb;
+ const Scalar* Bc1 = B+(j+1)*ldb;
+
+ for(Index k=0; k<d_end; k+=RK)
+ {
+
+ // load and expand a RN x RK block of B
+ Packet b00, b10, b20, b30, b01, b11, b21, b31;
+ b00 = pset1<Packet>(Bc0[0]);
+ b10 = pset1<Packet>(Bc0[1]);
+ if(RK==4) b20 = pset1<Packet>(Bc0[2]);
+ if(RK==4) b30 = pset1<Packet>(Bc0[3]);
+ b01 = pset1<Packet>(Bc1[0]);
+ b11 = pset1<Packet>(Bc1[1]);
+ if(RK==4) b21 = pset1<Packet>(Bc1[2]);
+ if(RK==4) b31 = pset1<Packet>(Bc1[3]);
+
+ Packet a0, a1, a2, a3, c0, c1, t0, t1;
+
+ const Scalar* A0 = A+ib+(k+0)*lda;
+ const Scalar* A1 = A+ib+(k+1)*lda;
+ const Scalar* A2 = A+ib+(k+2)*lda;
+ const Scalar* A3 = A+ib+(k+3)*lda;
+
+ Scalar* C0 = C+ib+(j+0)*ldc;
+ Scalar* C1 = C+ib+(j+1)*ldc;
+
+ a0 = pload<Packet>(A0);
+ a1 = pload<Packet>(A1);
+ if(RK==4)
+ {
+ a2 = pload<Packet>(A2);
+ a3 = pload<Packet>(A3);
+ }
+ else
+ {
+ // workaround "may be used uninitialized in this function" warning
+ a2 = a3 = a0;
+ }
+
+#define KMADD(c, a, b, tmp) {tmp = b; tmp = pmul(a,tmp); c = padd(c,tmp);}
+#define WORK(I) \
+ c0 = pload<Packet>(C0+i+(I)*PacketSize); \
+ c1 = pload<Packet>(C1+i+(I)*PacketSize); \
+ KMADD(c0, a0, b00, t0) \
+ KMADD(c1, a0, b01, t1) \
+ a0 = pload<Packet>(A0+i+(I+1)*PacketSize); \
+ KMADD(c0, a1, b10, t0) \
+ KMADD(c1, a1, b11, t1) \
+ a1 = pload<Packet>(A1+i+(I+1)*PacketSize); \
+ if(RK==4) KMADD(c0, a2, b20, t0) \
+ if(RK==4) KMADD(c1, a2, b21, t1) \
+ if(RK==4) a2 = pload<Packet>(A2+i+(I+1)*PacketSize); \
+ if(RK==4) KMADD(c0, a3, b30, t0) \
+ if(RK==4) KMADD(c1, a3, b31, t1) \
+ if(RK==4) a3 = pload<Packet>(A3+i+(I+1)*PacketSize); \
+ pstore(C0+i+(I)*PacketSize, c0); \
+ pstore(C1+i+(I)*PacketSize, c1)
+
+ // process rows of A' - C' with aggressive vectorization and peeling
+ for(Index i=0; i<actual_b_end1; i+=PacketSize*8)
+ {
+ EIGEN_ASM_COMMENT("SPARSELU_GEMML_KERNEL1");
+ prefetch((A0+i+(5)*PacketSize));
+ prefetch((A1+i+(5)*PacketSize));
+ if(RK==4) prefetch((A2+i+(5)*PacketSize));
+ if(RK==4) prefetch((A3+i+(5)*PacketSize));
+ WORK(0);
+ WORK(1);
+ WORK(2);
+ WORK(3);
+ WORK(4);
+ WORK(5);
+ WORK(6);
+ WORK(7);
+ }
+ // process the remaining rows with vectorization only
+ for(Index i=actual_b_end1; i<actual_b_end2; i+=PacketSize)
+ {
+ WORK(0);
+ }
+#undef WORK
+ // process the remaining rows without vectorization
+ for(Index i=actual_b_end2; i<actual_b; ++i)
+ {
+ if(RK==4)
+ {
+ C0[i] += A0[i]*Bc0[0]+A1[i]*Bc0[1]+A2[i]*Bc0[2]+A3[i]*Bc0[3];
+ C1[i] += A0[i]*Bc1[0]+A1[i]*Bc1[1]+A2[i]*Bc1[2]+A3[i]*Bc1[3];
+ }
+ else
+ {
+ C0[i] += A0[i]*Bc0[0]+A1[i]*Bc0[1];
+ C1[i] += A0[i]*Bc1[0]+A1[i]*Bc1[1];
+ }
+ }
+
+ Bc0 += RK;
+ Bc1 += RK;
+ } // peeled loop on k
+ } // peeled loop on the columns j
+ // process the last column (we now perform a matrix-vector product)
+ if((n-n_end)>0)
+ {
+ const Scalar* Bc0 = B+(n-1)*ldb;
+
+ for(Index k=0; k<d_end; k+=RK)
+ {
+
+ // load and expand a 1 x RK block of B
+ Packet b00, b10, b20, b30;
+ b00 = pset1<Packet>(Bc0[0]);
+ b10 = pset1<Packet>(Bc0[1]);
+ if(RK==4) b20 = pset1<Packet>(Bc0[2]);
+ if(RK==4) b30 = pset1<Packet>(Bc0[3]);
+
+ Packet a0, a1, a2, a3, c0, t0/*, t1*/;
+
+ const Scalar* A0 = A+ib+(k+0)*lda;
+ const Scalar* A1 = A+ib+(k+1)*lda;
+ const Scalar* A2 = A+ib+(k+2)*lda;
+ const Scalar* A3 = A+ib+(k+3)*lda;
+
+ Scalar* C0 = C+ib+(n_end)*ldc;
+
+ a0 = pload<Packet>(A0);
+ a1 = pload<Packet>(A1);
+ if(RK==4)
+ {
+ a2 = pload<Packet>(A2);
+ a3 = pload<Packet>(A3);
+ }
+ else
+ {
+ // workaround "may be used uninitialized in this function" warning
+ a2 = a3 = a0;
+ }
+
+#define WORK(I) \
+ c0 = pload<Packet>(C0+i+(I)*PacketSize); \
+ KMADD(c0, a0, b00, t0) \
+ a0 = pload<Packet>(A0+i+(I+1)*PacketSize); \
+ KMADD(c0, a1, b10, t0) \
+ a1 = pload<Packet>(A1+i+(I+1)*PacketSize); \
+ if(RK==4) KMADD(c0, a2, b20, t0) \
+ if(RK==4) a2 = pload<Packet>(A2+i+(I+1)*PacketSize); \
+ if(RK==4) KMADD(c0, a3, b30, t0) \
+ if(RK==4) a3 = pload<Packet>(A3+i+(I+1)*PacketSize); \
+ pstore(C0+i+(I)*PacketSize, c0);
+
+ // aggressive vectorization and peeling
+ for(Index i=0; i<actual_b_end1; i+=PacketSize*8)
+ {
+ EIGEN_ASM_COMMENT("SPARSELU_GEMML_KERNEL2");
+ WORK(0);
+ WORK(1);
+ WORK(2);
+ WORK(3);
+ WORK(4);
+ WORK(5);
+ WORK(6);
+ WORK(7);
+ }
+ // vectorization only
+ for(Index i=actual_b_end1; i<actual_b_end2; i+=PacketSize)
+ {
+ WORK(0);
+ }
+ // remaining scalars
+ for(Index i=actual_b_end2; i<actual_b; ++i)
+ {
+ if(RK==4)
+ C0[i] += A0[i]*Bc0[0]+A1[i]*Bc0[1]+A2[i]*Bc0[2]+A3[i]*Bc0[3];
+ else
+ C0[i] += A0[i]*Bc0[0]+A1[i]*Bc0[1];
+ }
+
+ Bc0 += RK;
+#undef WORK
+ }
+ }
+
+ // process the last columns of A, corresponding to the last rows of B
+ Index rd = d-d_end;
+ if(rd>0)
+ {
+ for(Index j=0; j<n; ++j)
+ {
+ enum {
+ Alignment = PacketSize>1 ? Aligned : 0
+ };
+ typedef Map<Matrix<Scalar,Dynamic,1>, Alignment > MapVector;
+ typedef Map<const Matrix<Scalar,Dynamic,1>, Alignment > ConstMapVector;
+ if(rd==1) MapVector(C+j*ldc+ib,actual_b) += B[0+d_end+j*ldb] * ConstMapVector(A+(d_end+0)*lda+ib, actual_b);
+
+ else if(rd==2) MapVector(C+j*ldc+ib,actual_b) += B[0+d_end+j*ldb] * ConstMapVector(A+(d_end+0)*lda+ib, actual_b)
+ + B[1+d_end+j*ldb] * ConstMapVector(A+(d_end+1)*lda+ib, actual_b);
+
+ else MapVector(C+j*ldc+ib,actual_b) += B[0+d_end+j*ldb] * ConstMapVector(A+(d_end+0)*lda+ib, actual_b)
+ + B[1+d_end+j*ldb] * ConstMapVector(A+(d_end+1)*lda+ib, actual_b)
+ + B[2+d_end+j*ldb] * ConstMapVector(A+(d_end+2)*lda+ib, actual_b);
+ }
+ }
+
+ } // blocking on the rows of A and C
+}
+#undef KMADD
+
+} // namespace internal
+
+} // namespace Eigen
+
+#endif // EIGEN_SPARSELU_GEMM_KERNEL_H
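For reference, the kernel above computes an ordinary column-major accumulation C += A*B; the register blocking (RN/RK), peeling (PM) and packet loads are ways of evaluating the triple loop below faster, and the alignment and leading-dimension requirements listed in its comment only matter for the vectorized path. A plain, unoptimized sketch of the same contract:

// Reference (unoptimized) version of what sparselu_gemm computes, to make the
// indexing conventions explicit: all matrices are column-major, entry (i,k) of
// A lives at A[i + k*lda], and the kernel accumulates C += A * B.
template<typename Scalar, typename Index>
void gemm_reference(Index m, Index n, Index d,
                    const Scalar* A, Index lda,
                    const Scalar* B, Index ldb,
                    Scalar* C, Index ldc)
{
  for (Index j = 0; j < n; ++j)       // columns of B and C
    for (Index i = 0; i < m; ++i) {   // rows of A and C
      Scalar c = C[i + j*ldc];
      for (Index k = 0; k < d; ++k)   // inner dimension
        c += A[i + k*lda] * B[k + j*ldb];
      C[i + j*ldc] = c;
    }
}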
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_heap_relax_snode.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_heap_relax_snode.h
new file mode 100644
index 0000000000..7a4e4305aa
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_heap_relax_snode.h
@@ -0,0 +1,127 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/* This file is a modified version of heap_relax_snode.c file in SuperLU
+ * -- SuperLU routine (version 3.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * October 15, 2003
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+
+#ifndef SPARSELU_HEAP_RELAX_SNODE_H
+#define SPARSELU_HEAP_RELAX_SNODE_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Identify the initial relaxed supernodes
+ *
+ * This routine is applied to a symmetric elimination tree.
+ * It assumes that the matrix has been reordered according to the postorder of the etree
+ * \param n The number of columns
+ * \param et elimination tree
+ * \param relax_columns Maximum number of columns allowed in a relaxed snode
+ * \param descendants Number of descendants of each node in the etree
+ * \param relax_end last column in a supernode
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::heap_relax_snode (const Index n, IndexVector& et, const Index relax_columns, IndexVector& descendants, IndexVector& relax_end)
+{
+
+ // The etree may not be postordered, but it is heap ordered
+ IndexVector post;
+ internal::treePostorder(n, et, post); // Post order etree
+ IndexVector inv_post(n+1);
+ Index i;
+ for (i = 0; i < n+1; ++i) inv_post(post(i)) = i; // inv_post = post.inverse()???
+
+ // Renumber etree in postorder
+ IndexVector iwork(n);
+ IndexVector et_save(n+1);
+ for (i = 0; i < n; ++i)
+ {
+ iwork(post(i)) = post(et(i));
+ }
+ et_save = et; // Save the original etree
+ et = iwork;
+
+ // compute the number of descendants of each node in the etree
+ relax_end.setConstant(emptyIdxLU);
+ Index j, parent;
+ descendants.setZero();
+ for (j = 0; j < n; j++)
+ {
+ parent = et(j);
+ if (parent != n) // not the dummy root
+ descendants(parent) += descendants(j) + 1;
+ }
+ // Identify the relaxed supernodes by postorder traversal of the etree
+ Index snode_start; // beginning of a snode
+ Index k;
+ Index nsuper_et_post = 0; // Number of relaxed snodes in postordered etree
+ Index nsuper_et = 0; // Number of relaxed snodes in the original etree
+ Index l;
+ for (j = 0; j < n; )
+ {
+ parent = et(j);
+ snode_start = j;
+ while ( parent != n && descendants(parent) < relax_columns )
+ {
+ j = parent;
+ parent = et(j);
+ }
+ // Found a supernode in postordered etree, j is the last column
+ ++nsuper_et_post;
+ k = n;
+ for (i = snode_start; i <= j; ++i)
+ k = (std::min)(k, inv_post(i));
+ l = inv_post(j);
+ if ( (l - k) == (j - snode_start) ) // Same number of columns in the snode
+ {
+ // This is also a supernode in the original etree
+ relax_end(k) = l; // Record last column
+ ++nsuper_et;
+ }
+ else
+ {
+ for (i = snode_start; i <= j; ++i)
+ {
+ l = inv_post(i);
+ if (descendants(i) == 0)
+ {
+ relax_end(l) = l;
+ ++nsuper_et;
+ }
+ }
+ }
+ j++;
+ // Search for a new leaf
+ while (descendants(j) != 0 && j < n) j++;
+ } // End postorder traversal of the etree
+
+ // Recover the original etree
+ et = et_save;
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif // SPARSELU_HEAP_RELAX_SNODE_H
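Postorder renumbering aside, the grouping step of heap_relax_snode (and of relax_snode further below) boils down to a greedy merge of each leaf with its ancestors for as long as the ancestor's subtree stays below relax_columns descendants. A small sketch of that core, assuming the parent array is already postordered and uses n as the dummy root (illustrative only, not the Eigen implementation):

#include <vector>

// Toy sketch of the relaxation criterion: count descendants of every node in
// the etree, then walk up from each leaf while the ancestor stays "small".
// relax_end[first column of a relaxed snode] receives its last column.
std::vector<int> relaxed_supernode_ends(const std::vector<int>& parent,
                                        int relax_columns)
{
  const int n = static_cast<int>(parent.size());
  std::vector<int> descendants(n, 0), relax_end(n, -1);
  for (int j = 0; j < n; ++j)
    if (parent[j] != n)                        // skip the dummy root
      descendants[parent[j]] += descendants[j] + 1;
  for (int j = 0; j < n; ) {
    int snode_start = j;
    while (parent[j] != n && descendants[parent[j]] < relax_columns)
      j = parent[j];                           // absorb the ancestor
    relax_end[snode_start] = j;                // last column of this relaxed snode
    ++j;
    while (j < n && descendants[j] != 0)       // search for the next leaf
      ++j;
  }
  return relax_end;
}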
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_kernel_bmod.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_kernel_bmod.h
new file mode 100644
index 0000000000..0d0283b132
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_kernel_bmod.h
@@ -0,0 +1,130 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef SPARSELU_KERNEL_BMOD_H
+#define SPARSELU_KERNEL_BMOD_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Performs numeric block updates from a given supernode to a single column
+ *
+ * \param segsize Size of the segment (and blocks ) to use for updates
+ * \param[in,out] dense Packed values of the original matrix
+ * \param tempv temporary vector to use for updates
+ * \param lusup array containing the supernodes
+ * \param lda Leading dimension in the supernode
+ * \param nrow Number of rows in the rectangular part of the supernode
+ * \param lsub compressed row subscripts of supernodes
+ * \param lptr pointer to the first column of the current supernode in lsub
+ * \param no_zeros Number of nonzeros elements before the diagonal part of the supernode
+ * \return 0 on success
+ */
+template <int SegSizeAtCompileTime> struct LU_kernel_bmod
+{
+ template <typename BlockScalarVector, typename ScalarVector, typename IndexVector, typename Index>
+ static EIGEN_DONT_INLINE void run(const int segsize, BlockScalarVector& dense, ScalarVector& tempv, ScalarVector& lusup, Index& luptr, const Index lda,
+ const Index nrow, IndexVector& lsub, const Index lptr, const Index no_zeros);
+};
+
+template <int SegSizeAtCompileTime>
+template <typename BlockScalarVector, typename ScalarVector, typename IndexVector, typename Index>
+EIGEN_DONT_INLINE void LU_kernel_bmod<SegSizeAtCompileTime>::run(const int segsize, BlockScalarVector& dense, ScalarVector& tempv, ScalarVector& lusup, Index& luptr, const Index lda,
+ const Index nrow, IndexVector& lsub, const Index lptr, const Index no_zeros)
+{
+ typedef typename ScalarVector::Scalar Scalar;
+ // First, copy U[*,j] segment from dense(*) to tempv(*)
+ // The result of triangular solve is in tempv[*];
+ // The result of the matrix-vector update is in dense[*]
+ Index isub = lptr + no_zeros;
+ int i;
+ Index irow;
+ for (i = 0; i < ((SegSizeAtCompileTime==Dynamic)?segsize:SegSizeAtCompileTime); i++)
+ {
+ irow = lsub(isub);
+ tempv(i) = dense(irow);
+ ++isub;
+ }
+ // Dense triangular solve -- start effective triangle
+ luptr += lda * no_zeros + no_zeros;
+ // Form Eigen matrix and vector
+ Map<Matrix<Scalar,SegSizeAtCompileTime,SegSizeAtCompileTime>, 0, OuterStride<> > A( &(lusup.data()[luptr]), segsize, segsize, OuterStride<>(lda) );
+ Map<Matrix<Scalar,SegSizeAtCompileTime,1> > u(tempv.data(), segsize);
+
+ u = A.template triangularView<UnitLower>().solve(u);
+
+ // Dense matrix-vector product y <-- B*x
+ luptr += segsize;
+ const Index PacketSize = internal::packet_traits<Scalar>::size;
+ Index ldl = internal::first_multiple(nrow, PacketSize);
+ Map<Matrix<Scalar,Dynamic,SegSizeAtCompileTime>, 0, OuterStride<> > B( &(lusup.data()[luptr]), nrow, segsize, OuterStride<>(lda) );
+ Index aligned_offset = internal::first_aligned(tempv.data()+segsize, PacketSize);
+ Index aligned_with_B_offset = (PacketSize-internal::first_aligned(B.data(), PacketSize))%PacketSize;
+ Map<Matrix<Scalar,Dynamic,1>, 0, OuterStride<> > l(tempv.data()+segsize+aligned_offset+aligned_with_B_offset, nrow, OuterStride<>(ldl) );
+
+ l.setZero();
+ internal::sparselu_gemm<Scalar>(l.rows(), l.cols(), B.cols(), B.data(), B.outerStride(), u.data(), u.outerStride(), l.data(), l.outerStride());
+
+ // Scatter tempv[] into SPA dense[] as a temporary storage
+ isub = lptr + no_zeros;
+ for (i = 0; i < ((SegSizeAtCompileTime==Dynamic)?segsize:SegSizeAtCompileTime); i++)
+ {
+ irow = lsub(isub++);
+ dense(irow) = tempv(i);
+ }
+
+ // Scatter l into SPA dense[]
+ for (i = 0; i < nrow; i++)
+ {
+ irow = lsub(isub++);
+ dense(irow) -= l(i);
+ }
+}
+
+template <> struct LU_kernel_bmod<1>
+{
+ template <typename BlockScalarVector, typename ScalarVector, typename IndexVector, typename Index>
+ static EIGEN_DONT_INLINE void run(const int /*segsize*/, BlockScalarVector& dense, ScalarVector& /*tempv*/, ScalarVector& lusup, Index& luptr,
+ const Index lda, const Index nrow, IndexVector& lsub, const Index lptr, const Index no_zeros);
+};
+
+
+template <typename BlockScalarVector, typename ScalarVector, typename IndexVector, typename Index>
+EIGEN_DONT_INLINE void LU_kernel_bmod<1>::run(const int /*segsize*/, BlockScalarVector& dense, ScalarVector& /*tempv*/, ScalarVector& lusup, Index& luptr,
+ const Index lda, const Index nrow, IndexVector& lsub, const Index lptr, const Index no_zeros)
+{
+ typedef typename ScalarVector::Scalar Scalar;
+ Scalar f = dense(lsub(lptr + no_zeros));
+ luptr += lda * no_zeros + no_zeros + 1;
+ const Scalar* a(lusup.data() + luptr);
+ const /*typename IndexVector::Scalar*/Index* irow(lsub.data()+lptr + no_zeros + 1);
+ Index i = 0;
+ for (; i+1 < nrow; i+=2)
+ {
+ Index i0 = *(irow++);
+ Index i1 = *(irow++);
+ Scalar a0 = *(a++);
+ Scalar a1 = *(a++);
+ Scalar d0 = dense.coeff(i0);
+ Scalar d1 = dense.coeff(i1);
+ d0 -= f*a0;
+ d1 -= f*a1;
+ dense.coeffRef(i0) = d0;
+ dense.coeffRef(i1) = d1;
+ }
+ if(i<nrow)
+ dense.coeffRef(*(irow++)) -= f * *(a++);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif // SPARSELU_KERNEL_BMOD_H
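Once the segment has been gathered, the numeric update above is a dense two-step computation: a unit-lower triangular solve with the square top of the supernode, followed by a matrix-vector product with the rectangular part below it, whose result is later subtracted from the dense column. A dense Eigen sketch of those two steps, with made-up inputs rather than the packed lusup/lsub storage:

#include <Eigen/Dense>

// Dense analogue of LU_kernel_bmod for one U-segment (a sketch, not the
// internal API): triangular solve against the gathered segment, then a
// matrix-vector product with the rectangular part of the supernode.
void kernel_bmod_dense(const Eigen::MatrixXd& lusup_block,  // (segsize + nrow) x segsize
                       Eigen::VectorXd& u,                  // gathered U-segment, length segsize
                       Eigen::VectorXd& l)                  // receives the L update, length nrow
{
  const auto segsize = u.size();
  const auto nrow    = lusup_block.rows() - segsize;
  // Unit-lower triangular solve with the top of the supernode (tempv in the kernel).
  lusup_block.topRows(segsize).triangularView<Eigen::UnitLower>().solveInPlace(u);
  // Matrix-vector product with the rectangular part (what sparselu_gemm computes there).
  l.noalias() = lusup_block.bottomRows(nrow) * u;
}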
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_bmod.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_bmod.h
new file mode 100644
index 0000000000..da0e0fc3c6
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_bmod.h
@@ -0,0 +1,223 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]panel_bmod.c file in SuperLU
+
+ * -- SuperLU routine (version 3.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * October 15, 2003
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_PANEL_BMOD_H
+#define SPARSELU_PANEL_BMOD_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Performs numeric block updates (sup-panel) in topological order.
+ *
+ * Before entering this routine, the original nonzeros in the panel
+ * were already copied into the spa[m,w]
+ *
+ * \param m number of rows in the matrix
+ * \param w Panel size
+ * \param jcol Starting column of the panel
+ * \param nseg Number of segments in the U part
+ * \param dense Store the full representation of the panel
+ * \param tempv working array
+ * \param segrep segment representative... first row in the segment
+ * \param repfnz First nonzero rows
+ * \param glu Global LU data.
+ *
+ *
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::panel_bmod(const Index m, const Index w, const Index jcol,
+ const Index nseg, ScalarVector& dense, ScalarVector& tempv,
+ IndexVector& segrep, IndexVector& repfnz, GlobalLU_t& glu)
+{
+
+ Index ksub,jj,nextl_col;
+ Index fsupc, nsupc, nsupr, nrow;
+ Index krep, kfnz;
+ Index lptr; // points to the row subscripts of a supernode
+ Index luptr; // ...
+ Index segsize,no_zeros ;
+ // For each nonz supernode segment of U[*,j] in topological order
+ Index k = nseg - 1;
+ const Index PacketSize = internal::packet_traits<Scalar>::size;
+
+ for (ksub = 0; ksub < nseg; ksub++)
+ { // For each updating supernode
+ /* krep = representative of current k-th supernode
+ * fsupc = first supernodal column
+ * nsupc = number of columns in a supernode
+ * nsupr = number of rows in a supernode
+ */
+ krep = segrep(k); k--;
+ fsupc = glu.xsup(glu.supno(krep));
+ nsupc = krep - fsupc + 1;
+ nsupr = glu.xlsub(fsupc+1) - glu.xlsub(fsupc);
+ nrow = nsupr - nsupc;
+ lptr = glu.xlsub(fsupc);
+
+ // loop over the panel columns to detect the actual number of columns and rows
+ Index u_rows = 0;
+ Index u_cols = 0;
+ for (jj = jcol; jj < jcol + w; jj++)
+ {
+ nextl_col = (jj-jcol) * m;
+ VectorBlock<IndexVector> repfnz_col(repfnz, nextl_col, m); // First nonzero column index for each row
+
+ kfnz = repfnz_col(krep);
+ if ( kfnz == emptyIdxLU )
+ continue; // skip any zero segment
+
+ segsize = krep - kfnz + 1;
+ u_cols++;
+ u_rows = (std::max)(segsize,u_rows);
+ }
+
+ if(nsupc >= 2)
+ {
+ Index ldu = internal::first_multiple<Index>(u_rows, PacketSize);
+ Map<Matrix<Scalar,Dynamic,Dynamic>, Aligned, OuterStride<> > U(tempv.data(), u_rows, u_cols, OuterStride<>(ldu));
+
+ // gather U
+ Index u_col = 0;
+ for (jj = jcol; jj < jcol + w; jj++)
+ {
+ nextl_col = (jj-jcol) * m;
+ VectorBlock<IndexVector> repfnz_col(repfnz, nextl_col, m); // First nonzero column index for each row
+ VectorBlock<ScalarVector> dense_col(dense, nextl_col, m); // Scatter/gather entire matrix column from/to here
+
+ kfnz = repfnz_col(krep);
+ if ( kfnz == emptyIdxLU )
+ continue; // skip any zero segment
+
+ segsize = krep - kfnz + 1;
+ luptr = glu.xlusup(fsupc);
+ no_zeros = kfnz - fsupc;
+
+ Index isub = lptr + no_zeros;
+ Index off = u_rows-segsize;
+ for (Index i = 0; i < off; i++) U(i,u_col) = 0;
+ for (Index i = 0; i < segsize; i++)
+ {
+ Index irow = glu.lsub(isub);
+ U(i+off,u_col) = dense_col(irow);
+ ++isub;
+ }
+ u_col++;
+ }
+ // solve U = A^-1 U
+ luptr = glu.xlusup(fsupc);
+ Index lda = glu.xlusup(fsupc+1) - glu.xlusup(fsupc);
+ no_zeros = (krep - u_rows + 1) - fsupc;
+ luptr += lda * no_zeros + no_zeros;
+ Map<Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > A(glu.lusup.data()+luptr, u_rows, u_rows, OuterStride<>(lda) );
+ U = A.template triangularView<UnitLower>().solve(U);
+
+ // update
+ luptr += u_rows;
+ Map<Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > B(glu.lusup.data()+luptr, nrow, u_rows, OuterStride<>(lda) );
+ eigen_assert(tempv.size()>w*ldu + nrow*w + 1);
+
+ Index ldl = internal::first_multiple<Index>(nrow, PacketSize);
+ Index offset = (PacketSize-internal::first_aligned(B.data(), PacketSize)) % PacketSize;
+ Map<Matrix<Scalar,Dynamic,Dynamic>, 0, OuterStride<> > L(tempv.data()+w*ldu+offset, nrow, u_cols, OuterStride<>(ldl));
+
+ L.setZero();
+ internal::sparselu_gemm<Scalar>(L.rows(), L.cols(), B.cols(), B.data(), B.outerStride(), U.data(), U.outerStride(), L.data(), L.outerStride());
+
+ // scatter U and L
+ u_col = 0;
+ for (jj = jcol; jj < jcol + w; jj++)
+ {
+ nextl_col = (jj-jcol) * m;
+ VectorBlock<IndexVector> repfnz_col(repfnz, nextl_col, m); // First nonzero column index for each row
+ VectorBlock<ScalarVector> dense_col(dense, nextl_col, m); // Scatter/gather entire matrix column from/to here
+
+ kfnz = repfnz_col(krep);
+ if ( kfnz == emptyIdxLU )
+ continue; // skip any zero segment
+
+ segsize = krep - kfnz + 1;
+ no_zeros = kfnz - fsupc;
+ Index isub = lptr + no_zeros;
+
+ Index off = u_rows-segsize;
+ for (Index i = 0; i < segsize; i++)
+ {
+ Index irow = glu.lsub(isub++);
+ dense_col(irow) = U.coeff(i+off,u_col);
+ U.coeffRef(i+off,u_col) = 0;
+ }
+
+ // Scatter l into SPA dense[]
+ for (Index i = 0; i < nrow; i++)
+ {
+ Index irow = glu.lsub(isub++);
+ dense_col(irow) -= L.coeff(i,u_col);
+ L.coeffRef(i,u_col) = 0;
+ }
+ u_col++;
+ }
+ }
+ else // level 2 only
+ {
+ // Sequence through each column in the panel
+ for (jj = jcol; jj < jcol + w; jj++)
+ {
+ nextl_col = (jj-jcol) * m;
+ VectorBlock<IndexVector> repfnz_col(repfnz, nextl_col, m); // First nonzero column index for each row
+ VectorBlock<ScalarVector> dense_col(dense, nextl_col, m); // Scatter/gather entire matrix column from/to here
+
+ kfnz = repfnz_col(krep);
+ if ( kfnz == emptyIdxLU )
+ continue; // skip any zero segment
+
+ segsize = krep - kfnz + 1;
+ luptr = glu.xlusup(fsupc);
+
+ Index lda = glu.xlusup(fsupc+1)-glu.xlusup(fsupc);// nsupr
+
+ // Perform a triangular solve and block update,
+ // then scatter the result of sup-col update to dense[]
+ no_zeros = kfnz - fsupc;
+ if(segsize==1) LU_kernel_bmod<1>::run(segsize, dense_col, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ else if(segsize==2) LU_kernel_bmod<2>::run(segsize, dense_col, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ else if(segsize==3) LU_kernel_bmod<3>::run(segsize, dense_col, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ else LU_kernel_bmod<Dynamic>::run(segsize, dense_col, tempv, glu.lusup, luptr, lda, nrow, glu.lsub, lptr, no_zeros);
+ } // End for each column in the panel
+ }
+
+ } // End for each updating supernode
+} // end panel bmod
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // SPARSELU_PANEL_BMOD_H
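When the updating supernode has at least two columns, panel_bmod takes the blocked (level-3) path: the nonzero segments of all panel columns are gathered into one dense block, a single triangular solve handles them at once, and one GEMM produces the updates for the L part. A dense Eigen sketch of that blocked update, with illustrative shapes and names:

#include <Eigen/Dense>

// Dense sketch of the level-3 path of panel_bmod: U holds the gathered panel
// segments, A_triangle is the unit-lower triangle of the supernode, B_rect is
// its rectangular part, and L_update receives the block of updates that is
// scattered back into the dense panel columns.
void panel_update_dense(const Eigen::MatrixXd& A_triangle,  // u_rows x u_rows, unit lower
                        const Eigen::MatrixXd& B_rect,      // nrow x u_rows
                        Eigen::MatrixXd& U,                  // u_rows x u_cols, gathered panel
                        Eigen::MatrixXd& L_update)           // nrow x u_cols, result
{
  A_triangle.triangularView<Eigen::UnitLower>().solveInPlace(U);  // U <- A^-1 * U
  L_update.noalias() = B_rect * U;                                // one matrix-matrix product
}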
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_dfs.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_dfs.h
new file mode 100644
index 0000000000..dc0054efd2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_panel_dfs.h
@@ -0,0 +1,258 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]panel_dfs.c file in SuperLU
+
+ * -- SuperLU routine (version 2.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * November 15, 1997
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_PANEL_DFS_H
+#define SPARSELU_PANEL_DFS_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename IndexVector>
+struct panel_dfs_traits
+{
+ typedef typename IndexVector::Scalar Index;
+ panel_dfs_traits(Index jcol, Index* marker)
+ : m_jcol(jcol), m_marker(marker)
+ {}
+ bool update_segrep(Index krep, Index jj)
+ {
+ if(m_marker[krep]<m_jcol)
+ {
+ m_marker[krep] = jj;
+ return true;
+ }
+ return false;
+ }
+ void mem_expand(IndexVector& /*glu.lsub*/, Index /*nextl*/, Index /*chmark*/) {}
+ enum { ExpandMem = false };
+ Index m_jcol;
+ Index* m_marker;
+};
+
+
+template <typename Scalar, typename Index>
+template <typename Traits>
+void SparseLUImpl<Scalar,Index>::dfs_kernel(const Index jj, IndexVector& perm_r,
+ Index& nseg, IndexVector& panel_lsub, IndexVector& segrep,
+ Ref<IndexVector> repfnz_col, IndexVector& xprune, Ref<IndexVector> marker, IndexVector& parent,
+ IndexVector& xplore, GlobalLU_t& glu,
+ Index& nextl_col, Index krow, Traits& traits
+ )
+{
+
+ Index kmark = marker(krow);
+
+ // For each unmarked krow of jj
+ marker(krow) = jj;
+ Index kperm = perm_r(krow);
+ if (kperm == emptyIdxLU ) {
+ // krow is in L : place it in structure of L(*, jj)
+ panel_lsub(nextl_col++) = krow; // krow is indexed into A
+
+ traits.mem_expand(panel_lsub, nextl_col, kmark);
+ }
+ else
+ {
+ // krow is in U : if its supernode-representative krep
+ // has been explored, update repfnz(*)
+ // krep = supernode representative of the current row
+ Index krep = glu.xsup(glu.supno(kperm)+1) - 1;
+ // First nonzero element in the current column:
+ Index myfnz = repfnz_col(krep);
+
+ if (myfnz != emptyIdxLU )
+ {
+ // Representative visited before
+ if (myfnz > kperm ) repfnz_col(krep) = kperm;
+
+ }
+ else
+ {
+ // Otherwise, perform dfs starting at krep
+ Index oldrep = emptyIdxLU;
+ parent(krep) = oldrep;
+ repfnz_col(krep) = kperm;
+ Index xdfs = glu.xlsub(krep);
+ Index maxdfs = xprune(krep);
+
+ Index kpar;
+ do
+ {
+ // For each unmarked kchild of krep
+ while (xdfs < maxdfs)
+ {
+ Index kchild = glu.lsub(xdfs);
+ xdfs++;
+ Index chmark = marker(kchild);
+
+ if (chmark != jj )
+ {
+ marker(kchild) = jj;
+ Index chperm = perm_r(kchild);
+
+ if (chperm == emptyIdxLU)
+ {
+ // case kchild is in L: place it in L(*, j)
+ panel_lsub(nextl_col++) = kchild;
+ traits.mem_expand(panel_lsub, nextl_col, chmark);
+ }
+ else
+ {
+ // case kchild is in U :
+ // chrep = its supernode-rep. If its rep has been explored,
+ // update its repfnz(*)
+ Index chrep = glu.xsup(glu.supno(chperm)+1) - 1;
+ myfnz = repfnz_col(chrep);
+
+ if (myfnz != emptyIdxLU)
+ { // Visited before
+ if (myfnz > chperm)
+ repfnz_col(chrep) = chperm;
+ }
+ else
+ { // Cont. dfs at snode-rep of kchild
+ xplore(krep) = xdfs;
+ oldrep = krep;
+ krep = chrep; // Go deeper down G(L)
+ parent(krep) = oldrep;
+ repfnz_col(krep) = chperm;
+ xdfs = glu.xlsub(krep);
+ maxdfs = xprune(krep);
+
+ } // end if myfnz != -1
+ } // end if chperm == -1
+
+ } // end if chmark !=jj
+ } // end while xdfs < maxdfs
+
+ // krow has no more unexplored neighbors:
+ // Place snode-rep krep in postorder DFS, if this
+ // segment is seen for the first time. (Note that
+ // "repfnz(krep)" may change later.)
+ // Backtrack dfs to its parent
+ if(traits.update_segrep(krep,jj))
+ //if (marker1(krep) < jcol )
+ {
+ segrep(nseg) = krep;
+ ++nseg;
+ //marker1(krep) = jj;
+ }
+
+ kpar = parent(krep); // Pop recursion, mimic recursion
+ if (kpar == emptyIdxLU)
+ break; // dfs done
+ krep = kpar;
+ xdfs = xplore(krep);
+ maxdfs = xprune(krep);
+
+ } while (kpar != emptyIdxLU); // Do until empty stack
+
+ } // end if (myfnz = -1)
+
+ } // end if (kperm == -1)
+}
+
+/**
+ * \brief Performs a symbolic factorization on a panel of columns [jcol, jcol+w)
+ *
+ * A supernode representative is the last column of a supernode.
+ * The nonzeros in U[*,j] are segments that end at supernodes representatives
+ *
+ * The routine returns a list of the supernodal representatives
+ * in topological order of the dfs that generates them. This list is
+ * a superset of the topological order of each individual column within
+ * the panel.
+ * The location of the first nonzero in each supernodal segment
+ * (supernodal entry location) is also returned. Each column has
+ * a separate list for this purpose.
+ *
+ * Two markers arrays are used for dfs :
+ * marker[i] == jj, if i was visited during dfs of current column jj;
+ * marker1[i] >= jcol, if i was visited by earlier columns in this panel;
+ *
+ * \param[in] m number of rows in the matrix
+ * \param[in] w Panel size
+ * \param[in] jcol Starting column of the panel
+ * \param[in] A Input matrix in column-major storage
+ * \param[in] perm_r Row permutation
+ * \param[out] nseg Number of U segments
+ * \param[out] dense Accumulate the column vectors of the panel
+ * \param[out] panel_lsub Subscripts of the row in the panel
+ * \param[out] segrep Segment representative i.e first nonzero row of each segment
+ * \param[out] repfnz First nonzero location in each row
+ * \param[out] xprune The pruned elimination tree
+ * \param[out] marker work vector
+ * \param parent The elimination tree
+ * \param xplore work vector
+ * \param glu The global data structure
+ *
+ */
+
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::panel_dfs(const Index m, const Index w, const Index jcol, MatrixType& A, IndexVector& perm_r, Index& nseg, ScalarVector& dense, IndexVector& panel_lsub, IndexVector& segrep, IndexVector& repfnz, IndexVector& xprune, IndexVector& marker, IndexVector& parent, IndexVector& xplore, GlobalLU_t& glu)
+{
+ Index nextl_col; // Next available position in panel_lsub[*,jj]
+
+ // Initialize pointers
+ VectorBlock<IndexVector> marker1(marker, m, m);
+ nseg = 0;
+
+ panel_dfs_traits<IndexVector> traits(jcol, marker1.data());
+
+ // For each column in the panel
+ for (Index jj = jcol; jj < jcol + w; jj++)
+ {
+ nextl_col = (jj - jcol) * m;
+
+ VectorBlock<IndexVector> repfnz_col(repfnz, nextl_col, m); // First nonzero location in each row
+ VectorBlock<ScalarVector> dense_col(dense,nextl_col, m); // Accumulate a column vector here
+
+
+ // For each nnz in A[*, jj] do depth first search
+ for (typename MatrixType::InnerIterator it(A, jj); it; ++it)
+ {
+ Index krow = it.row();
+ dense_col(krow) = it.value();
+
+ Index kmark = marker(krow);
+ if (kmark == jj)
+ continue; // krow visited before, go to the next nonzero
+
+ dfs_kernel(jj, perm_r, nseg, panel_lsub, segrep, repfnz_col, xprune, marker, parent,
+ xplore, glu, nextl_col, krow, traits);
+ }// end for nonzeros in column jj
+
+ } // end for column jj
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // SPARSELU_PANEL_DFS_H
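The key trick in panel_dfs (and in dfs_kernel above) is that visited flags are never reset between columns: each node records the id of the last column that reached it, so a node counts as unvisited for column jj exactly when marker[node] != jj. A generic sketch of that marker-based DFS on an ordinary adjacency list (not the Eigen routine, which walks the L structure and bounds the search via xprune):

#include <cstddef>
#include <vector>

// Iterative DFS that reuses one marker array across many searches: callers
// pass a distinct id (here the column index jj) instead of clearing markers.
void dfs_with_markers(int start, int jj,
                      const std::vector<std::vector<int> >& adj,
                      std::vector<int>& marker,    // sized to the node count, initialized to -1
                      std::vector<int>& reached)   // output: nodes reachable from start
{
  std::vector<int> stack(1, start);
  while (!stack.empty()) {
    int k = stack.back(); stack.pop_back();
    if (marker[k] == jj) continue;   // already visited while processing column jj
    marker[k] = jj;
    reached.push_back(k);
    for (std::size_t t = 0; t < adj[k].size(); ++t)
      stack.push_back(adj[k][t]);
  }
}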
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pivotL.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pivotL.h
new file mode 100644
index 0000000000..457789c780
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pivotL.h
@@ -0,0 +1,136 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of xpivotL.c file in SuperLU
+
+ * -- SuperLU routine (version 3.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * October 15, 2003
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_PIVOTL_H
+#define SPARSELU_PIVOTL_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Performs the numerical pivoting on the current column of L, and the CDIV operation.
+ *
+ * Pivot policy :
+ * (1) Compute thresh = u * max_(i>=j) abs(A_ij);
+ * (2) IF user specifies pivot row k and abs(A_kj) >= thresh THEN
+ * pivot row = k;
+ * ELSE IF abs(A_jj) >= thresh THEN
+ * pivot row = j;
+ * ELSE
+ * pivot row = m;
+ *
+ * Note: If you absolutely want to use a given pivot order, then set u=0.0.
+ *
+ * \param jcol The current column of L
+ * \param diagpivotthresh diagonal pivoting threshold
+ * \param[in,out] perm_r Row permutation (threshold pivoting)
+ * \param[in] iperm_c column permutation - used to find the diagonal of Pc*A*Pc'
+ * \param[out] pivrow The pivot row
+ * \param glu Global LU data
+ * \return 0 if success, i > 0 if U(i,i) is exactly zero
+ *
+ */
+template <typename Scalar, typename Index>
+Index SparseLUImpl<Scalar,Index>::pivotL(const Index jcol, const RealScalar& diagpivotthresh, IndexVector& perm_r, IndexVector& iperm_c, Index& pivrow, GlobalLU_t& glu)
+{
+
+ Index fsupc = (glu.xsup)((glu.supno)(jcol)); // First column in the supernode containing the column jcol
+ Index nsupc = jcol - fsupc; // Number of columns in the supernode portion, excluding jcol; nsupc >=0
+ Index lptr = glu.xlsub(fsupc); // pointer to the starting location of the row subscripts for this supernode portion
+ Index nsupr = glu.xlsub(fsupc+1) - lptr; // Number of rows in the supernode
+ Index lda = glu.xlusup(fsupc+1) - glu.xlusup(fsupc); // leading dimension
+ Scalar* lu_sup_ptr = &(glu.lusup.data()[glu.xlusup(fsupc)]); // Start of the current supernode
+ Scalar* lu_col_ptr = &(glu.lusup.data()[glu.xlusup(jcol)]); // Start of jcol in the supernode
+ Index* lsub_ptr = &(glu.lsub.data()[lptr]); // Start of row indices of the supernode
+
+ // Determine the largest abs numerical value for partial pivoting
+ Index diagind = iperm_c(jcol); // diagonal index
+ RealScalar pivmax = 0.0;
+ Index pivptr = nsupc;
+ Index diag = emptyIdxLU;
+ RealScalar rtemp;
+ Index isub, icol, itemp, k;
+ for (isub = nsupc; isub < nsupr; ++isub) {
+ using std::abs;
+ rtemp = abs(lu_col_ptr[isub]);
+ if (rtemp > pivmax) {
+ pivmax = rtemp;
+ pivptr = isub;
+ }
+ if (lsub_ptr[isub] == diagind) diag = isub;
+ }
+
+ // Test for singularity
+ if ( pivmax == 0.0 ) {
+ pivrow = lsub_ptr[pivptr];
+ perm_r(pivrow) = jcol;
+ return (jcol+1);
+ }
+
+ RealScalar thresh = diagpivotthresh * pivmax;
+
+ // Choose appropriate pivotal element
+
+ {
+ // Test if the diagonal element can be used as a pivot (given the threshold value)
+ if (diag >= 0 )
+ {
+ // Diagonal element exists
+ using std::abs;
+ rtemp = abs(lu_col_ptr[diag]);
+ if (rtemp != 0.0 && rtemp >= thresh) pivptr = diag;
+ }
+ pivrow = lsub_ptr[pivptr];
+ }
+
+ // Record pivot row
+ perm_r(pivrow) = jcol;
+ // Interchange row subscripts
+ if (pivptr != nsupc )
+ {
+ std::swap( lsub_ptr[pivptr], lsub_ptr[nsupc] );
+ // Interchange numerical values as well, for the two rows in the whole snode
+ // such that L is indexed the same way as A
+ for (icol = 0; icol <= nsupc; icol++)
+ {
+ itemp = pivptr + icol * lda;
+ std::swap(lu_sup_ptr[itemp], lu_sup_ptr[nsupc + icol * lda]);
+ }
+ }
+ // cdiv operations
+ Scalar temp = Scalar(1.0) / lu_col_ptr[nsupc];
+ for (k = nsupc+1; k < nsupr; k++)
+ lu_col_ptr[k] *= temp;
+ return 0;
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // SPARSELU_PIVOTL_H
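The pivot policy documented above amounts to a few lines: take the entry of largest magnitude, but keep the diagonal whenever it is within a factor diagpivotthresh of that maximum, and report singularity when the whole column is zero. A standalone sketch of that rule over a plain dense column (illustrative names, not the supernode-packed version above):

#include <cmath>
#include <cstddef>
#include <vector>

// Threshold partial pivoting: returns the index of the chosen pivot inside
// `column`, or -1 if the column is numerically zero (singular case).
int choose_pivot(const std::vector<double>& column, int diag_index,
                 double diagpivotthresh)
{
  double pivmax = 0.0;
  int pivptr = -1;
  for (std::size_t i = 0; i < column.size(); ++i) {
    double a = std::abs(column[i]);
    if (a > pivmax) { pivmax = a; pivptr = static_cast<int>(i); }
  }
  if (pivmax == 0.0) return -1;                 // exactly singular
  double thresh = diagpivotthresh * pivmax;
  if (diag_index >= 0) {
    double d = std::abs(column[diag_index]);
    if (d != 0.0 && d >= thresh) return diag_index;  // diagonal is "good enough"
  }
  return pivptr;                                // fall back to the largest entry
}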
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pruneL.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pruneL.h
new file mode 100644
index 0000000000..66460d1688
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_pruneL.h
@@ -0,0 +1,135 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/*
+
+ * NOTE: This file is the modified version of [s,d,c,z]pruneL.c file in SuperLU
+
+ * -- SuperLU routine (version 2.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * November 15, 1997
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+#ifndef SPARSELU_PRUNEL_H
+#define SPARSELU_PRUNEL_H
+
+namespace Eigen {
+namespace internal {
+
+/**
+ * \brief Prunes the L-structure.
+ *
+ * It prunes the L-structure of supernodes whose L-structure contains the current pivot row "pivrow"
+ *
+ *
+ * \param jcol The current column of L
+ * \param[in] perm_r Row permutation
+ * \param[out] pivrow The pivot row
+ * \param nseg Number of segments
+ * \param segrep
+ * \param repfnz
+ * \param[out] xprune
+ * \param glu Global LU data
+ *
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::pruneL(const Index jcol, const IndexVector& perm_r, const Index pivrow, const Index nseg, const IndexVector& segrep, BlockIndexVector repfnz, IndexVector& xprune, GlobalLU_t& glu)
+{
+ // For each supernode-rep irep in U[*,j]
+ Index jsupno = glu.supno(jcol);
+ Index i,irep,irep1;
+ bool movnum, do_prune = false;
+ Index kmin = 0, kmax = 0, minloc, maxloc,krow;
+ for (i = 0; i < nseg; i++)
+ {
+ irep = segrep(i);
+ irep1 = irep + 1;
+ do_prune = false;
+
+ // Don't prune with a zero U-segment
+ if (repfnz(irep) == emptyIdxLU) continue;
+
+ // If a snode overlaps with the next panel, then the U-segment
+ // is fragmented into two parts -- irep and irep1. We should let
+ // pruning occur at the rep-column in irep1's snode.
+ if (glu.supno(irep) == glu.supno(irep1) ) continue; // don't prune
+
+ // If it has not been pruned & it has a nonz in row L(pivrow,i)
+ if (glu.supno(irep) != jsupno )
+ {
+ if ( xprune (irep) >= glu.xlsub(irep1) )
+ {
+ kmin = glu.xlsub(irep);
+ kmax = glu.xlsub(irep1) - 1;
+ for (krow = kmin; krow <= kmax; krow++)
+ {
+ if (glu.lsub(krow) == pivrow)
+ {
+ do_prune = true;
+ break;
+ }
+ }
+ }
+
+ if (do_prune)
+ {
+ // do a quicksort-type partition
+ // movnum=true means that the num values have to be exchanged
+ movnum = false;
+ if (irep == glu.xsup(glu.supno(irep)) ) // Snode of size 1
+ movnum = true;
+
+ while (kmin <= kmax)
+ {
+ if (perm_r(glu.lsub(kmax)) == emptyIdxLU)
+ kmax--;
+ else if ( perm_r(glu.lsub(kmin)) != emptyIdxLU)
+ kmin++;
+ else
+ {
+ // kmin below pivrow (not yet pivoted), and kmax
+ // above pivrow: interchange the two subscripts
+ std::swap(glu.lsub(kmin), glu.lsub(kmax));
+
+ // If the supernode has only one column, then we
+ // only keep one set of subscripts. For any subscript
+ // interchange performed, a similar interchange must be
+ // done on the numerical values.
+ if (movnum)
+ {
+ minloc = glu.xlusup(irep) + ( kmin - glu.xlsub(irep) );
+ maxloc = glu.xlusup(irep) + ( kmax - glu.xlsub(irep) );
+ std::swap(glu.lusup(minloc), glu.lusup(maxloc));
+ }
+ kmin++;
+ kmax--;
+ }
+ } // end while
+
+ xprune(irep) = kmin; //Pruning
+ } // end if do_prune
+ } // end pruning
+ } // End for each U-segment
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // SPARSELU_PRUNEL_H
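The inner while loop of pruneL is a two-pointer, quicksort-style partition: already-pivoted row subscripts are swept to the front of the range and the rest to the back, with the numerical values swapped along when the supernode has a single column. A generic sketch of just the partition step (hypothetical predicate, not the Eigen routine):

#include <algorithm>
#include <vector>

// Partition v so that elements satisfying the predicate come first, swapping
// from both ends the way kmin/kmax sweep the row subscripts of a supernode.
// Returns the first index of the part that does not satisfy the predicate.
template<typename Pred>
int partition_two_pointer(std::vector<int>& v, Pred already_pivoted)
{
  int kmin = 0, kmax = static_cast<int>(v.size()) - 1;
  while (kmin <= kmax) {
    if (!already_pivoted(v[kmax]))      kmax--;
    else if (already_pivoted(v[kmin]))  kmin++;
    else                                std::swap(v[kmin++], v[kmax--]);
  }
  return kmin;
}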
diff --git a/third_party/eigen3/Eigen/src/SparseLU/SparseLU_relax_snode.h b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_relax_snode.h
new file mode 100644
index 0000000000..58ec32e27e
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseLU/SparseLU_relax_snode.h
@@ -0,0 +1,83 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012 Désiré Nuentsa-Wakam <desire.nuentsa_wakam@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+/* This file is a modified version of heap_relax_snode.c file in SuperLU
+ * -- SuperLU routine (version 3.0) --
+ * Univ. of California Berkeley, Xerox Palo Alto Research Center,
+ * and Lawrence Berkeley National Lab.
+ * October 15, 2003
+ *
+ * Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+ *
+ * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+ * EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+ *
+ * Permission is hereby granted to use or copy this program for any
+ * purpose, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is
+ * granted, provided the above notices are retained, and a notice that
+ * the code was modified is included with the above copyright notice.
+ */
+
+#ifndef SPARSELU_RELAX_SNODE_H
+#define SPARSELU_RELAX_SNODE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/**
+ * \brief Identify the initial relaxed supernodes
+ *
+ * This routine is applied to a column elimination tree.
+ * It assumes that the matrix has been reordered according to the postorder of the etree
+ * \param n the number of columns
+ * \param et elimination tree
+ * \param relax_columns Maximum number of columns allowed in a relaxed snode
+ * \param descendants Number of descendants of each node in the etree
+ * \param relax_end last column in a supernode
+ */
+template <typename Scalar, typename Index>
+void SparseLUImpl<Scalar,Index>::relax_snode (const Index n, IndexVector& et, const Index relax_columns, IndexVector& descendants, IndexVector& relax_end)
+{
+
+ // compute the number of descendants of each node in the etree
+ Index j, parent;
+ relax_end.setConstant(emptyIdxLU);
+ descendants.setZero();
+ for (j = 0; j < n; j++)
+ {
+ parent = et(j);
+ if (parent != n) // not the dummy root
+ descendants(parent) += descendants(j) + 1;
+ }
+ // Identify the relaxed supernodes by postorder traversal of the etree
+ Index snode_start; // beginning of a snode
+ for (j = 0; j < n; )
+ {
+ parent = et(j);
+ snode_start = j;
+ while ( parent != n && descendants(parent) < relax_columns )
+ {
+ j = parent;
+ parent = et(j);
+ }
+ // Found a supernode in postordered etree, j is the last column
+ relax_end(snode_start) = j; // Record last column
+ j++;
+ // Search for a new leaf
+ while (descendants(j) != 0 && j < n) j++;
+ } // End postorder traversal of the etree
+
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+#endif
diff --git a/third_party/eigen3/Eigen/src/SparseQR/SparseQR.h b/third_party/eigen3/Eigen/src/SparseQR/SparseQR.h
new file mode 100644
index 0000000000..5fb5bc2038
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SparseQR/SparseQR.h
@@ -0,0 +1,675 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2012-2013 Desire Nuentsa <desire.nuentsa_wakam@inria.fr>
+// Copyright (C) 2012-2013 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_QR_H
+#define EIGEN_SPARSE_QR_H
+
+namespace Eigen {
+
+template<typename MatrixType, typename OrderingType> class SparseQR;
+template<typename SparseQRType> struct SparseQRMatrixQReturnType;
+template<typename SparseQRType> struct SparseQRMatrixQTransposeReturnType;
+template<typename SparseQRType, typename Derived> struct SparseQR_QProduct;
+namespace internal {
+ template <typename SparseQRType> struct traits<SparseQRMatrixQReturnType<SparseQRType> >
+ {
+ typedef typename SparseQRType::MatrixType ReturnType;
+ typedef typename ReturnType::Index Index;
+ typedef typename ReturnType::StorageKind StorageKind;
+ };
+ template <typename SparseQRType> struct traits<SparseQRMatrixQTransposeReturnType<SparseQRType> >
+ {
+ typedef typename SparseQRType::MatrixType ReturnType;
+ };
+ template <typename SparseQRType, typename Derived> struct traits<SparseQR_QProduct<SparseQRType, Derived> >
+ {
+ typedef typename Derived::PlainObject ReturnType;
+ };
+} // End namespace internal
+
+/**
+ * \ingroup SparseQR_Module
+ * \class SparseQR
+ * \brief Sparse left-looking rank-revealing QR factorization
+ *
+ * This class implements a left-looking rank-revealing QR decomposition
+ * of sparse matrices. When a column has a norm less than a given tolerance
+ * it is implicitly permuted to the end. The QR factorization thus obtained is
+ * given by A*P = Q*R where R is upper triangular or trapezoidal.
+ *
+ * P is the column permutation which is the product of the fill-reducing and the
+ * rank-revealing permutations. Use colsPermutation() to get it.
+ *
+ * Q is the orthogonal matrix represented as products of Householder reflectors.
+ * Use matrixQ() to get an expression and matrixQ().transpose() to get the transpose.
+ * You can then apply it to a vector.
+ *
+ * R is the sparse triangular or trapezoidal matrix. The latter occurs when A is rank-deficient.
+ * matrixR().topLeftCorner(rank(), rank()) always returns a triangular factor of full rank.
+ *
+ * \tparam _MatrixType The type of the sparse matrix A, must be a column-major SparseMatrix<>
+ * \tparam _OrderingType The fill-reducing ordering method. See the \link OrderingMethods_Module
+ * OrderingMethods \endlink module for the list of built-in and external ordering methods.
+ *
+ * \warning The input sparse matrix A must be in compressed mode (see SparseMatrix::makeCompressed()).
+ *
+ */
+template<typename _MatrixType, typename _OrderingType>
+class SparseQR
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef _OrderingType OrderingType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef SparseMatrix<Scalar,ColMajor,Index> QRMatrixType;
+ typedef Matrix<Index, Dynamic, 1> IndexVector;
+ typedef Matrix<Scalar, Dynamic, 1> ScalarVector;
+ typedef PermutationMatrix<Dynamic, Dynamic, Index> PermutationType;
+ public:
+ SparseQR () : m_isInitialized(false), m_analysisIsok(false), m_lastError(""), m_useDefaultThreshold(true),m_isQSorted(false)
+ { }
+
+ /** Construct a QR factorization of the matrix \a mat.
+ *
+ * \warning The matrix \a mat must be in compressed mode (see SparseMatrix::makeCompressed()).
+ *
+ * \sa compute()
+ */
+ SparseQR(const MatrixType& mat) : m_isInitialized(false), m_analysisIsok(false), m_lastError(""), m_useDefaultThreshold(true),m_isQSorted(false)
+ {
+ compute(mat);
+ }
+
+ /** Computes the QR factorization of the sparse matrix \a mat.
+ *
+ * \warning The matrix \a mat must be in compressed mode (see SparseMatrix::makeCompressed()).
+ *
+ * \sa analyzePattern(), factorize()
+ */
+ void compute(const MatrixType& mat)
+ {
+ analyzePattern(mat);
+ factorize(mat);
+ }
+ void analyzePattern(const MatrixType& mat);
+ void factorize(const MatrixType& mat);
+
+ /** \returns the number of rows of the represented matrix.
+ */
+ inline Index rows() const { return m_pmat.rows(); }
+
+ /** \returns the number of columns of the represented matrix.
+ */
+ inline Index cols() const { return m_pmat.cols();}
+
+ /** \returns a const reference to the \b sparse upper triangular matrix R of the QR factorization.
+ */
+ const QRMatrixType& matrixR() const { return m_R; }
+
+ /** \returns the rank of the matrix, i.e. the number of linearly independent columns determined by the pivoting threshold.
+ *
+ * \sa setPivotThreshold()
+ */
+ Index rank() const
+ {
+ eigen_assert(m_isInitialized && "The factorization should be called first, use compute()");
+ return m_nonzeropivots;
+ }
+
+ /** \returns an expression of the matrix Q as products of sparse Householder reflectors.
+ * The common usage of this function is to apply it to a dense matrix or vector
+ * \code
+ * VectorXd B1, B2;
+ * // Initialize B1
+ * B2 = matrixQ() * B1;
+ * \endcode
+ *
+ * To get a plain SparseMatrix representation of Q:
+ * \code
+ * SparseMatrix<double> Q;
+ * Q = SparseQR<SparseMatrix<double> >(A).matrixQ();
+ * \endcode
+ * Internally, this call simply performs a sparse product between the matrix Q
+ * and a sparse identity matrix. However, due to the fact that the sparse
+ * reflectors are stored unsorted, two transpositions are needed to sort
+ * them before performing the product.
+ */
+ SparseQRMatrixQReturnType<SparseQR> matrixQ() const
+ { return SparseQRMatrixQReturnType<SparseQR>(*this); }
+
+ /** \returns a const reference to the column permutation P that was applied to A such that A*P = Q*R
+ * It is the combination of the fill-in reducing permutation and numerical column pivoting.
+ */
+ const PermutationType& colsPermutation() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_outputPerm_c;
+ }
+
+ /** \returns A string describing the type of error.
+ * This method is provided to ease debugging, not to handle errors.
+ */
+ std::string lastErrorMessage() const { return m_lastError; }
+
+ /** \internal */
+ template<typename Rhs, typename Dest>
+ bool _solve(const MatrixBase<Rhs> &B, MatrixBase<Dest> &dest) const
+ {
+ eigen_assert(m_isInitialized && "The factorization should be called first, use compute()");
+ eigen_assert(this->rows() == B.rows() && "SparseQR::solve() : invalid number of rows in the right hand side matrix");
+
+ Index rank = this->rank();
+
+ // Compute Q^T * b;
+ typename Dest::PlainObject y, b;
+ y = this->matrixQ().transpose() * B;
+ b = y;
+
+ // Solve with the triangular matrix R
+ y.resize((std::max)(cols(),Index(y.rows())),y.cols());
+ y.topRows(rank) = this->matrixR().topLeftCorner(rank, rank).template triangularView<Upper>().solve(b.topRows(rank));
+ y.bottomRows(y.rows()-rank).setZero();
+
+ // Apply the column permutation
+ if (m_perm_c.size()) dest.topRows(cols()) = colsPermutation() * y.topRows(cols());
+ else dest = y.topRows(cols());
+
+ m_info = Success;
+ return true;
+ }
+
+
+ /** Sets the threshold that is used to determine linearly dependent columns during the factorization.
+ *
+ * In practice, if during the factorization the norm of the column that has to be eliminated is below
+ * this threshold, then the entire column is treated as zero, and it is moved to the end.
+ */
+ void setPivotThreshold(const RealScalar& threshold)
+ {
+ m_useDefaultThreshold = false;
+ m_threshold = threshold;
+ }
+
+ /** \returns the solution X of \f$ A X = B \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<SparseQR, Rhs> solve(const MatrixBase<Rhs>& B) const
+ {
+ eigen_assert(m_isInitialized && "The factorization should be called first, use compute()");
+ eigen_assert(this->rows() == B.rows() && "SparseQR::solve() : invalid number of rows in the right hand side matrix");
+ return internal::solve_retval<SparseQR, Rhs>(*this, B.derived());
+ }
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<SparseQR, Rhs> solve(const SparseMatrixBase<Rhs>& B) const
+ {
+ eigen_assert(m_isInitialized && "The factorization should be called first, use compute()");
+ eigen_assert(this->rows() == B.rows() && "SparseQR::solve() : invalid number of rows in the right hand side matrix");
+ return internal::sparse_solve_retval<SparseQR, Rhs>(*this, B.derived());
+ }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the QR factorization reports a numerical problem
+ * \c InvalidInput if the input matrix is invalid
+ *
+ * \sa iparm()
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ protected:
+ inline void sort_matrix_Q()
+ {
+ if(this->m_isQSorted) return;
+ // The matrix Q is sorted during the transposition
+ SparseMatrix<Scalar, RowMajor, Index> mQrm(this->m_Q);
+ this->m_Q = mQrm;
+ this->m_isQSorted = true;
+ }
+
+
+ protected:
+ bool m_isInitialized;
+ bool m_analysisIsok;
+ bool m_factorizationIsok;
+ mutable ComputationInfo m_info;
+ std::string m_lastError;
+ QRMatrixType m_pmat; // Temporary matrix
+ QRMatrixType m_R; // The triangular factor matrix
+ QRMatrixType m_Q; // The orthogonal reflectors
+ ScalarVector m_hcoeffs; // The Householder coefficients
+ PermutationType m_perm_c; // Fill-reducing Column permutation
+ PermutationType m_pivotperm; // The permutation for rank revealing
+ PermutationType m_outputPerm_c; // The final column permutation
+ RealScalar m_threshold; // Threshold to determine null Householder reflections
+ bool m_useDefaultThreshold; // Use default threshold
+ Index m_nonzeropivots; // Number of non zero pivots found
+ IndexVector m_etree; // Column elimination tree
+ IndexVector m_firstRowElt; // First element in each row
+ bool m_isQSorted; // whether Q is sorted or not
+
+ template <typename, typename > friend struct SparseQR_QProduct;
+ template <typename > friend struct SparseQRMatrixQReturnType;
+
+};
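A typical end-to-end use of this class, assuming a column-major sparse matrix assembled elsewhere (a sketch; COLAMDOrdering is one of the orderings provided by the OrderingMethods module):

#include <Eigen/Dense>
#include <Eigen/Sparse>

// Factorize a compressed sparse matrix with SparseQR and solve A*x = b
// (a least-squares solution when A is tall and has full column rank).
void solve_with_sparse_qr(Eigen::SparseMatrix<double>& A, const Eigen::VectorXd& b)
{
  A.makeCompressed();                                   // SparseQR requires compressed storage
  Eigen::SparseQR<Eigen::SparseMatrix<double>, Eigen::COLAMDOrdering<int> > qr;
  qr.compute(A);                                        // analyzePattern() + factorize()
  if (qr.info() != Eigen::Success) return;              // inspect lastErrorMessage() on failure
  Eigen::VectorXd x = qr.solve(b);                      // apply Q^T, back-substitute R, permute
  (void)x;
}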
+
+/** \brief Preprocessing step of a QR factorization
+ *
+ * \warning The matrix \a mat must be in compressed mode (see SparseMatrix::makeCompressed()).
+ *
+ * In this step, the fill-reducing permutation is computed and applied to the columns of A
+ * and the column elimination tree is computed as well. Only the sparsity pattern of \a mat is exploited.
+ *
+ * \note In this step it is assumed that there is no empty row in the matrix \a mat.
+ */
+template <typename MatrixType, typename OrderingType>
+void SparseQR<MatrixType,OrderingType>::analyzePattern(const MatrixType& mat)
+{
+ eigen_assert(mat.isCompressed() && "SparseQR requires a sparse matrix in compressed mode. Call .makeCompressed() before passing it to SparseQR");
+ // Compute the column fill reducing ordering
+ OrderingType ord;
+ ord(mat, m_perm_c);
+ Index n = mat.cols();
+ Index m = mat.rows();
+
+ if (!m_perm_c.size())
+ {
+ m_perm_c.resize(n);
+ m_perm_c.indices().setLinSpaced(n, 0,n-1);
+ }
+
+ // Compute the column elimination tree of the permuted matrix
+ m_outputPerm_c = m_perm_c.inverse();
+ internal::coletree(mat, m_etree, m_firstRowElt, m_outputPerm_c.indices().data());
+
+ m_R.resize(n, n);
+ m_Q.resize(m, n);
+
+ // Allocate space for nonzero elements : rough estimation
+ m_R.reserve(2*mat.nonZeros()); //FIXME Get a more accurate estimation through symbolic factorization with the etree
+ m_Q.reserve(2*mat.nonZeros());
+ m_hcoeffs.resize(n);
+ m_analysisIsok = true;
+}
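+
+// Usage sketch (illustrative only, not part of the library): the two-phase API lets the
+// symbolic analysis be reused when factorizing several matrices that share the same sparsity
+// pattern. The matrix type and ordering below are assumptions chosen for the example.
+//
+//   SparseMatrix<double> A1, A2;                              // same pattern, compressed storage
+//   SparseQR<SparseMatrix<double>, COLAMDOrdering<int> > qr;
+//   qr.analyzePattern(A1);                                    // symbolic step, done once
+//   qr.factorize(A1);                                         // numeric step for A1
+//   qr.factorize(A2);                                         // numeric step for A2, analysis reused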
+
+/** \brief Performs the numerical QR factorization of the input matrix
+ *
+ * The function SparseQR::analyzePattern(const MatrixType&) must have been called beforehand with
+ * a matrix having the same sparsity pattern as \a mat.
+ *
+ * \param mat The sparse column-major matrix
+ */
+template <typename MatrixType, typename OrderingType>
+void SparseQR<MatrixType,OrderingType>::factorize(const MatrixType& mat)
+{
+ using std::abs;
+ using std::max;
+
+ eigen_assert(m_analysisIsok && "analyzePattern() should be called before this step");
+ Index m = mat.rows();
+ Index n = mat.cols();
+ IndexVector mark(m); mark.setConstant(-1); // Record the visited nodes
+ IndexVector Ridx(n), Qidx(m); // Store temporarily the row indexes for the current column of R and Q
+ Index nzcolR, nzcolQ; // Number of nonzero for the current column of R and Q
+ ScalarVector tval(m); // The dense vector used to compute the current column
+ bool found_diag;
+
+ m_pmat = mat;
+ m_pmat.uncompress(); // To have the innerNonZeroPtr allocated
+ // Apply the fill-in reducing permutation lazily:
+ for (int i = 0; i < n; i++)
+ {
+ Index p = m_perm_c.size() ? m_perm_c.indices()(i) : i;
+ m_pmat.outerIndexPtr()[p] = mat.outerIndexPtr()[i];
+ m_pmat.innerNonZeroPtr()[p] = mat.outerIndexPtr()[i+1] - mat.outerIndexPtr()[i];
+ }
+
+ /* Compute the default threshold, see:
+ * Tim Davis, "Algorithm 915, SuiteSparseQR: Multifrontal Multithreaded Rank-Revealing
+ * Sparse QR Factorization", ACM Trans. on Math. Soft. 38(1), 2011, Page 8:3
+ */
+ if(m_useDefaultThreshold)
+ {
+ RealScalar max2Norm = 0.0;
+ for (int j = 0; j < n; j++) max2Norm = (max)(max2Norm, m_pmat.col(j).norm());
+ m_threshold = 20 * (m + n) * max2Norm * NumTraits<RealScalar>::epsilon();
+ }
+
+ // Initialize the numerical permutation
+ m_pivotperm.setIdentity(n);
+
+ Index nonzeroCol = 0; // Record the number of valid pivots
+
+ // Left looking rank-revealing QR factorization: compute a column of R and Q at a time
+ for (Index col = 0; col < (std::min)(n,m); ++col)
+ {
+ mark.setConstant(-1);
+ m_R.startVec(col);
+ m_Q.startVec(col);
+ mark(nonzeroCol) = col;
+ Qidx(0) = nonzeroCol;
+ nzcolR = 0; nzcolQ = 1;
+ found_diag = col>=m;
+ tval.setZero();
+
+ // Symbolic factorization: find the nonzero locations of the column k of the factors R and Q, i.e.,
+ // all the nodes (with indexes lower than rank) reachable through the column elimination tree (etree) rooted at node k.
+ // Note: if the diagonal entry does not exist, then its contribution must be explicitly added,
+ // hence the found_diag trick, which allows one extra iteration on the diagonal element if it has not been found yet.
+ for (typename MatrixType::InnerIterator itp(m_pmat, col); itp || !found_diag; ++itp)
+ {
+ Index curIdx = nonzeroCol ;
+ if(itp) curIdx = itp.row();
+ if(curIdx == nonzeroCol) found_diag = true;
+
+ // Get the nonzero indices of the current column of R
+ Index st = m_firstRowElt(curIdx); // The traversal of the etree starts here
+ if (st < 0 )
+ {
+ m_lastError = "Empty row found during numerical factorization";
+ m_info = InvalidInput;
+ return;
+ }
+
+ // Traverse the etree
+ Index bi = nzcolR;
+ for (; mark(st) != col; st = m_etree(st))
+ {
+ Ridx(nzcolR) = st; // Add this row to the list,
+ mark(st) = col; // and mark this row as visited
+ nzcolR++;
+ }
+
+ // Reverse the list to get the topological ordering
+ Index nt = nzcolR-bi;
+ for(Index i = 0; i < nt/2; i++) std::swap(Ridx(bi+i), Ridx(nzcolR-i-1));
+
+ // Copy the current (curIdx,pcol) value of the input matrix
+ if(itp) tval(curIdx) = itp.value();
+ else tval(curIdx) = Scalar(0);
+
+ // Compute the pattern of Q(:,k)
+ if(curIdx > nonzeroCol && mark(curIdx) != col )
+ {
+ Qidx(nzcolQ) = curIdx; // Add this row to the pattern of Q,
+ mark(curIdx) = col; // and mark it as visited
+ nzcolQ++;
+ }
+ }
+
+ // Browse all the indexes of R(:,col) in reverse order
+ for (Index i = nzcolR-1; i >= 0; i--)
+ {
+ Index curIdx = m_pivotperm.indices()(Ridx(i));
+
+ // Apply the curIdx-th householder vector to the current column (temporarily stored into tval)
+ Scalar tdot(0);
+
+ // First compute q' * tval
+ tdot = m_Q.col(curIdx).dot(tval);
+
+ tdot *= m_hcoeffs(curIdx);
+
+ // Then update tval = tval - q * tau
+ // FIXME: tval -= tdot * m_Q.col(curIdx) should amount to the same (need to check/add support for efficient "dense ?= sparse")
+ for (typename QRMatrixType::InnerIterator itq(m_Q, curIdx); itq; ++itq)
+ tval(itq.row()) -= itq.value() * tdot;
+
+ // Detect fill-in for the current column of Q
+ if(m_etree(Ridx(i)) == nonzeroCol)
+ {
+ for (typename QRMatrixType::InnerIterator itq(m_Q, curIdx); itq; ++itq)
+ {
+ Index iQ = itq.row();
+ if (mark(iQ) != col)
+ {
+ Qidx(nzcolQ++) = iQ; // Add this row to the pattern of Q,
+ mark(iQ) = col; // and mark it as visited
+ }
+ }
+ }
+ } // End update current column
+
+ // Compute the Householder reflection that eliminates the current column
+ // FIXME this step should call the Householder module.
+ Scalar tau;
+ RealScalar beta;
+ Scalar c0 = nzcolQ ? tval(Qidx(0)) : Scalar(0);
+
+ // First, the squared norm of Q((col+1):m, col)
+ RealScalar sqrNorm = 0.;
+ for (Index itq = 1; itq < nzcolQ; ++itq) sqrNorm += numext::abs2(tval(Qidx(itq)));
+
+ if(sqrNorm == RealScalar(0) && numext::imag(c0) == RealScalar(0))
+ {
+ tau = RealScalar(0);
+ beta = numext::real(c0);
+ tval(Qidx(0)) = 1;
+ }
+ else
+ {
+ using std::sqrt;
+ beta = sqrt(numext::abs2(c0) + sqrNorm);
+ if(numext::real(c0) >= RealScalar(0))
+ beta = -beta;
+ tval(Qidx(0)) = 1;
+ for (Index itq = 1; itq < nzcolQ; ++itq)
+ tval(Qidx(itq)) /= (c0 - beta);
+ tau = numext::conj((beta-c0) / beta);
+
+ }
+
+ // Insert values in R
+ for (Index i = nzcolR-1; i >= 0; i--)
+ {
+ Index curIdx = Ridx(i);
+ if(curIdx < nonzeroCol)
+ {
+ m_R.insertBackByOuterInnerUnordered(col, curIdx) = tval(curIdx);
+ tval(curIdx) = Scalar(0.);
+ }
+ }
+
+ if(abs(beta) >= m_threshold)
+ {
+ m_R.insertBackByOuterInner(col, nonzeroCol) = beta;
+ nonzeroCol++;
+ // The householder coefficient
+ m_hcoeffs(col) = tau;
+ // Record the householder reflections
+ for (Index itq = 0; itq < nzcolQ; ++itq)
+ {
+ Index iQ = Qidx(itq);
+ m_Q.insertBackByOuterInnerUnordered(col,iQ) = tval(iQ);
+ tval(iQ) = Scalar(0.);
+ }
+ }
+ else
+ {
+ // Zero pivot found: implicitly move this column to the end
+ m_hcoeffs(col) = Scalar(0);
+ for (Index j = nonzeroCol; j < n-1; j++)
+ std::swap(m_pivotperm.indices()(j), m_pivotperm.indices()[j+1]);
+
+ // Recompute the column elimination tree
+ internal::coletree(m_pmat, m_etree, m_firstRowElt, m_pivotperm.indices().data());
+ }
+ }
+
+ // Finalize the column pointers of the sparse matrices R and Q
+ m_Q.finalize();
+ m_Q.makeCompressed();
+ m_R.finalize();
+ m_R.makeCompressed();
+ m_isQSorted = false;
+
+ m_nonzeropivots = nonzeroCol;
+
+ if(nonzeroCol<n)
+ {
+ // Permute the triangular factor to put the 'dead' columns to the end
+ MatrixType tempR(m_R);
+ m_R = tempR * m_pivotperm;
+
+ // Update the column permutation
+ m_outputPerm_c = m_outputPerm_c * m_pivotperm;
+ }
+
+ m_isInitialized = true;
+ m_factorizationIsok = true;
+ m_info = Success;
+}
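+
+// A minimal end-to-end sketch, assuming a compressed column-major matrix A and a dense
+// right-hand side b (names are illustrative; compute() performs analyzePattern() + factorize()):
+//
+//   SparseMatrix<double> A;   VectorXd b, x;
+//   SparseQR<SparseMatrix<double>, COLAMDOrdering<int> > qr;
+//   A.makeCompressed();
+//   qr.compute(A);
+//   x = qr.solve(b);
+//   if(qr.info() != Success) { /* NumericalIssue or InvalidInput, see info() above */ }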
+
+namespace internal {
+
+template<typename _MatrixType, typename OrderingType, typename Rhs>
+struct solve_retval<SparseQR<_MatrixType,OrderingType>, Rhs>
+ : solve_retval_base<SparseQR<_MatrixType,OrderingType>, Rhs>
+{
+ typedef SparseQR<_MatrixType,OrderingType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+template<typename _MatrixType, typename OrderingType, typename Rhs>
+struct sparse_solve_retval<SparseQR<_MatrixType, OrderingType>, Rhs>
+ : sparse_solve_retval_base<SparseQR<_MatrixType, OrderingType>, Rhs>
+{
+ typedef SparseQR<_MatrixType, OrderingType> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec, Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+} // end namespace internal
+
+template <typename SparseQRType, typename Derived>
+struct SparseQR_QProduct : ReturnByValue<SparseQR_QProduct<SparseQRType, Derived> >
+{
+ typedef typename SparseQRType::QRMatrixType MatrixType;
+ typedef typename SparseQRType::Scalar Scalar;
+ typedef typename SparseQRType::Index Index;
+ // Get the references
+ SparseQR_QProduct(const SparseQRType& qr, const Derived& other, bool transpose) :
+ m_qr(qr),m_other(other),m_transpose(transpose) {}
+ inline Index rows() const { return m_transpose ? m_qr.rows() : m_qr.cols(); }
+ inline Index cols() const { return m_other.cols(); }
+
+ // Assign to a vector
+ template<typename DesType>
+ void evalTo(DesType& res) const
+ {
+ Index n = m_qr.cols();
+ res = m_other;
+ if (m_transpose)
+ {
+ eigen_assert(m_qr.m_Q.rows() == m_other.rows() && "Non conforming object sizes");
+ //Compute res = Q' * other column by column
+ for(Index j = 0; j < res.cols(); j++){
+ for (Index k = 0; k < n; k++)
+ {
+ Scalar tau = Scalar(0);
+ tau = m_qr.m_Q.col(k).dot(res.col(j));
+ if(tau==Scalar(0)) continue;
+ tau = tau * m_qr.m_hcoeffs(k);
+ res.col(j) -= tau * m_qr.m_Q.col(k);
+ }
+ }
+ }
+ else
+ {
+ eigen_assert(m_qr.m_Q.rows() == m_other.rows() && "Non conforming object sizes");
+ // Compute res = Q * other column by column
+ for(Index j = 0; j < res.cols(); j++)
+ {
+ for (Index k = n-1; k >=0; k--)
+ {
+ Scalar tau = Scalar(0);
+ tau = m_qr.m_Q.col(k).dot(res.col(j));
+ if(tau==Scalar(0)) continue;
+ tau = tau * m_qr.m_hcoeffs(k);
+ res.col(j) -= tau * m_qr.m_Q.col(k);
+ }
+ }
+ }
+ }
+
+ const SparseQRType& m_qr;
+ const Derived& m_other;
+ bool m_transpose;
+};
+
+template<typename SparseQRType>
+struct SparseQRMatrixQReturnType : public EigenBase<SparseQRMatrixQReturnType<SparseQRType> >
+{
+ typedef typename SparseQRType::Index Index;
+ typedef typename SparseQRType::Scalar Scalar;
+ typedef Matrix<Scalar,Dynamic,Dynamic> DenseMatrix;
+ SparseQRMatrixQReturnType(const SparseQRType& qr) : m_qr(qr) {}
+ template<typename Derived>
+ SparseQR_QProduct<SparseQRType, Derived> operator*(const MatrixBase<Derived>& other)
+ {
+ return SparseQR_QProduct<SparseQRType,Derived>(m_qr,other.derived(),false);
+ }
+ SparseQRMatrixQTransposeReturnType<SparseQRType> adjoint() const
+ {
+ return SparseQRMatrixQTransposeReturnType<SparseQRType>(m_qr);
+ }
+ inline Index rows() const { return m_qr.rows(); }
+ inline Index cols() const { return m_qr.cols(); }
+ // To use for operations with the transpose of Q
+ SparseQRMatrixQTransposeReturnType<SparseQRType> transpose() const
+ {
+ return SparseQRMatrixQTransposeReturnType<SparseQRType>(m_qr);
+ }
+ template<typename Dest> void evalTo(MatrixBase<Dest>& dest) const
+ {
+ dest.derived() = m_qr.matrixQ() * Dest::Identity(m_qr.rows(), m_qr.rows());
+ }
+ template<typename Dest> void evalTo(SparseMatrixBase<Dest>& dest) const
+ {
+ Dest idMat(m_qr.rows(), m_qr.rows());
+ idMat.setIdentity();
+ // Sort the sparse householder reflectors if needed
+ const_cast<SparseQRType *>(&m_qr)->sort_matrix_Q();
+ dest.derived() = SparseQR_QProduct<SparseQRType, Dest>(m_qr, idMat, false);
+ }
+
+ const SparseQRType& m_qr;
+};
+
+template<typename SparseQRType>
+struct SparseQRMatrixQTransposeReturnType
+{
+ SparseQRMatrixQTransposeReturnType(const SparseQRType& qr) : m_qr(qr) {}
+ template<typename Derived>
+ SparseQR_QProduct<SparseQRType,Derived> operator*(const MatrixBase<Derived>& other)
+ {
+ return SparseQR_QProduct<SparseQRType,Derived>(m_qr,other.derived(), true);
+ }
+ const SparseQRType& m_qr;
+};
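+
+// Illustrative use of the Q expressions above (assumed names): matrixQ() returns a
+// SparseQRMatrixQReturnType, so Q and its transpose can be applied without forming Q explicitly.
+//
+//   VectorXd v, Qv, Qtv;
+//   Qv  = qr.matrixQ() * v;                // applies the Householder reflectors
+//   Qtv = qr.matrixQ().transpose() * v;    // applies their adjoint/transpose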
+
+} // end namespace Eigen
+
+#endif
diff --git a/third_party/eigen3/Eigen/src/StlSupport/StdDeque.h b/third_party/eigen3/Eigen/src/StlSupport/StdDeque.h
new file mode 100644
index 0000000000..4ee8e5c10a
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/StlSupport/StdDeque.h
@@ -0,0 +1,134 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDDEQUE_H
+#define EIGEN_STDDEQUE_H
+
+#include "Eigen/src/StlSupport/details.h"
+
+// Define the explicit instantiation (e.g. necessary for the Intel compiler)
+#if defined(__INTEL_COMPILER) || defined(__GNUC__)
+ #define EIGEN_EXPLICIT_STL_DEQUE_INSTANTIATION(...) template class std::deque<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> >;
+#else
+ #define EIGEN_EXPLICIT_STL_DEQUE_INSTANTIATION(...)
+#endif
+
+/**
+ * This section contains a convenience MACRO which allows an easy specialization of
+ * std::deque such that for data types with alignment issues the correct allocator
+ * is used automatically.
+ */
+#define EIGEN_DEFINE_STL_DEQUE_SPECIALIZATION(...) \
+EIGEN_EXPLICIT_STL_DEQUE_INSTANTIATION(__VA_ARGS__) \
+namespace std \
+{ \
+ template<typename _Ay> \
+ class deque<__VA_ARGS__, _Ay> \
+ : public deque<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > \
+ { \
+ typedef deque<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > deque_base; \
+ public: \
+ typedef __VA_ARGS__ value_type; \
+ typedef typename deque_base::allocator_type allocator_type; \
+ typedef typename deque_base::size_type size_type; \
+ typedef typename deque_base::iterator iterator; \
+ explicit deque(const allocator_type& a = allocator_type()) : deque_base(a) {} \
+ template<typename InputIterator> \
+ deque(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) : deque_base(first, last, a) {} \
+ deque(const deque& c) : deque_base(c) {} \
+ explicit deque(size_type num, const value_type& val = value_type()) : deque_base(num, val) {} \
+ deque(iterator start, iterator end) : deque_base(start, end) {} \
+ deque& operator=(const deque& x) { \
+ deque_base::operator=(x); \
+ return *this; \
+ } \
+ }; \
+}
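+
+// Illustrative invocation of the macro above (the element type is an assumption for the example);
+// it must appear once at global scope, after including this header:
+//
+//   #include <Eigen/StdDeque>
+//   EIGEN_DEFINE_STL_DEQUE_SPECIALIZATION(Eigen::Vector4f)
+//   std::deque<Eigen::Vector4f> d;   // now transparently uses the aligned allocator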
+
+// check whether we really need the std::deque specialization
+#if !(defined(_GLIBCXX_DEQUE) && (!EIGEN_GNUC_AT_LEAST(4,1))) /* Note that before gcc-4.1 we already have: std::deque::resize(size_type,const T&). */
+
+namespace std {
+
+#define EIGEN_STD_DEQUE_SPECIALIZATION_BODY \
+ public: \
+ typedef T value_type; \
+ typedef typename deque_base::allocator_type allocator_type; \
+ typedef typename deque_base::size_type size_type; \
+ typedef typename deque_base::iterator iterator; \
+ typedef typename deque_base::const_iterator const_iterator; \
+ explicit deque(const allocator_type& a = allocator_type()) : deque_base(a) {} \
+ template<typename InputIterator> \
+ deque(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) \
+ : deque_base(first, last, a) {} \
+ deque(const deque& c) : deque_base(c) {} \
+ explicit deque(size_type num, const value_type& val = value_type()) : deque_base(num, val) {} \
+ deque(iterator start, iterator end) : deque_base(start, end) {} \
+ deque& operator=(const deque& x) { \
+ deque_base::operator=(x); \
+ return *this; \
+ }
+
+ template<typename T>
+ class deque<T,EIGEN_ALIGNED_ALLOCATOR<T> >
+ : public deque<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> >
+{
+ typedef deque<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> > deque_base;
+ EIGEN_STD_DEQUE_SPECIALIZATION_BODY
+
+ void resize(size_type new_size)
+ { resize(new_size, T()); }
+
+#if defined(_DEQUE_)
+ // workaround MSVC std::deque implementation
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (deque_base::size() < new_size)
+ deque_base::_Insert_n(deque_base::end(), new_size - deque_base::size(), x);
+ else if (new_size < deque_base::size())
+ deque_base::erase(deque_base::begin() + new_size, deque_base::end());
+ }
+ void push_back(const value_type& x)
+ { deque_base::push_back(x); }
+ void push_front(const value_type& x)
+ { deque_base::push_front(x); }
+ using deque_base::insert;
+ iterator insert(const_iterator position, const value_type& x)
+ { return deque_base::insert(position,x); }
+ void insert(const_iterator position, size_type new_size, const value_type& x)
+ { deque_base::insert(position, new_size, x); }
+#elif defined(_GLIBCXX_DEQUE) && EIGEN_GNUC_AT_LEAST(4,2)
+ // workaround GCC std::deque implementation
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (new_size < deque_base::size())
+ deque_base::_M_erase_at_end(this->_M_impl._M_start + new_size);
+ else
+ deque_base::insert(deque_base::end(), new_size - deque_base::size(), x);
+ }
+#else
+ // either GCC 4.1 or non-GCC
+ // default implementation which should always work.
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (new_size < deque_base::size())
+ deque_base::erase(deque_base::begin() + new_size, deque_base::end());
+ else if (new_size > deque_base::size())
+ deque_base::insert(deque_base::end(), new_size - deque_base::size(), x);
+ }
+#endif
+ };
+}
+
+#endif // check whether specialization is actually required
+
+#endif // EIGEN_STDDEQUE_H
diff --git a/third_party/eigen3/Eigen/src/StlSupport/StdList.h b/third_party/eigen3/Eigen/src/StlSupport/StdList.h
new file mode 100644
index 0000000000..627381ecec
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/StlSupport/StdList.h
@@ -0,0 +1,114 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDLIST_H
+#define EIGEN_STDLIST_H
+
+#include "Eigen/src/StlSupport/details.h"
+
+// Define the explicit instantiation (e.g. necessary for the Intel compiler)
+#if defined(__INTEL_COMPILER) || defined(__GNUC__)
+ #define EIGEN_EXPLICIT_STL_LIST_INSTANTIATION(...) template class std::list<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> >;
+#else
+ #define EIGEN_EXPLICIT_STL_LIST_INSTANTIATION(...)
+#endif
+
+/**
+ * This section contains a convenience MACRO which allows an easy specialization of
+ * std::list such that for data types with alignment issues the correct allocator
+ * is used automatically.
+ */
+#define EIGEN_DEFINE_STL_LIST_SPECIALIZATION(...) \
+EIGEN_EXPLICIT_STL_LIST_INSTANTIATION(__VA_ARGS__) \
+namespace std \
+{ \
+ template<typename _Ay> \
+ class list<__VA_ARGS__, _Ay> \
+ : public list<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > \
+ { \
+ typedef list<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > list_base; \
+ public: \
+ typedef __VA_ARGS__ value_type; \
+ typedef typename list_base::allocator_type allocator_type; \
+ typedef typename list_base::size_type size_type; \
+ typedef typename list_base::iterator iterator; \
+ explicit list(const allocator_type& a = allocator_type()) : list_base(a) {} \
+ template<typename InputIterator> \
+ list(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) : list_base(first, last, a) {} \
+ list(const list& c) : list_base(c) {} \
+ explicit list(size_type num, const value_type& val = value_type()) : list_base(num, val) {} \
+ list(iterator start, iterator end) : list_base(start, end) {} \
+ list& operator=(const list& x) { \
+ list_base::operator=(x); \
+ return *this; \
+ } \
+ }; \
+}
+
+// check whether we really need the std::list specialization
+#if !(defined(_GLIBCXX_VECTOR) && (!EIGEN_GNUC_AT_LEAST(4,1))) /* Note that before gcc-4.1 we already have: std::list::resize(size_type,const T&). */
+
+namespace std
+{
+
+#define EIGEN_STD_LIST_SPECIALIZATION_BODY \
+ public: \
+ typedef T value_type; \
+ typedef typename list_base::allocator_type allocator_type; \
+ typedef typename list_base::size_type size_type; \
+ typedef typename list_base::iterator iterator; \
+ typedef typename list_base::const_iterator const_iterator; \
+ explicit list(const allocator_type& a = allocator_type()) : list_base(a) {} \
+ template<typename InputIterator> \
+ list(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) \
+ : list_base(first, last, a) {} \
+ list(const list& c) : list_base(c) {} \
+ explicit list(size_type num, const value_type& val = value_type()) : list_base(num, val) {} \
+ list(iterator start, iterator end) : list_base(start, end) {} \
+ list& operator=(const list& x) { \
+ list_base::operator=(x); \
+ return *this; \
+ }
+
+ template<typename T>
+ class list<T,EIGEN_ALIGNED_ALLOCATOR<T> >
+ : public list<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> >
+ {
+ typedef list<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> > list_base;
+ EIGEN_STD_LIST_SPECIALIZATION_BODY
+
+ void resize(size_type new_size)
+ { resize(new_size, T()); }
+
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (list_base::size() < new_size)
+ list_base::insert(list_base::end(), new_size - list_base::size(), x);
+ else
+ while (new_size < list_base::size()) list_base::pop_back();
+ }
+
+#if defined(_LIST_)
+ // workaround MSVC std::list implementation
+ void push_back(const value_type& x)
+ { list_base::push_back(x); }
+ using list_base::insert;
+ iterator insert(const_iterator position, const value_type& x)
+ { return list_base::insert(position,x); }
+ void insert(const_iterator position, size_type new_size, const value_type& x)
+ { list_base::insert(position, new_size, x); }
+#endif
+ };
+}
+
+#endif // check whether specialization is actually required
+
+#endif // EIGEN_STDLIST_H
diff --git a/third_party/eigen3/Eigen/src/StlSupport/StdVector.h b/third_party/eigen3/Eigen/src/StlSupport/StdVector.h
new file mode 100644
index 0000000000..40a9abefa8
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/StlSupport/StdVector.h
@@ -0,0 +1,126 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STDVECTOR_H
+#define EIGEN_STDVECTOR_H
+
+#include "Eigen/src/StlSupport/details.h"
+
+/**
+ * This section contains a convenience MACRO which allows an easy specialization of
+ * std::vector such that for data types with alignment issues the correct allocator
+ * is used automatically.
+ */
+#define EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION(...) \
+namespace std \
+{ \
+ template<> \
+ class vector<__VA_ARGS__, std::allocator<__VA_ARGS__> > \
+ : public vector<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > \
+ { \
+ typedef vector<__VA_ARGS__, EIGEN_ALIGNED_ALLOCATOR<__VA_ARGS__> > vector_base; \
+ public: \
+ typedef __VA_ARGS__ value_type; \
+ typedef vector_base::allocator_type allocator_type; \
+ typedef vector_base::size_type size_type; \
+ typedef vector_base::iterator iterator; \
+ explicit vector(const allocator_type& a = allocator_type()) : vector_base(a) {} \
+ template<typename InputIterator> \
+ vector(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) : vector_base(first, last, a) {} \
+ vector(const vector& c) : vector_base(c) {} \
+ explicit vector(size_type num, const value_type& val = value_type()) : vector_base(num, val) {} \
+ vector(iterator start, iterator end) : vector_base(start, end) {} \
+ vector& operator=(const vector& x) { \
+ vector_base::operator=(x); \
+ return *this; \
+ } \
+ }; \
+}
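+
+// Illustrative invocation (assumed element type): unlike the deque/list variants above, this macro
+// fully specializes std::vector<T, std::allocator<T> >, again at global scope:
+//
+//   #include <Eigen/StdVector>
+//   EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION(Eigen::Matrix2d)
+//   std::vector<Eigen::Matrix2d> v;   // picks up the aligned allocator automatically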
+
+namespace std {
+
+#define EIGEN_STD_VECTOR_SPECIALIZATION_BODY \
+ public: \
+ typedef T value_type; \
+ typedef typename vector_base::allocator_type allocator_type; \
+ typedef typename vector_base::size_type size_type; \
+ typedef typename vector_base::iterator iterator; \
+ typedef typename vector_base::const_iterator const_iterator; \
+ explicit vector(const allocator_type& a = allocator_type()) : vector_base(a) {} \
+ template<typename InputIterator> \
+ vector(InputIterator first, InputIterator last, const allocator_type& a = allocator_type()) \
+ : vector_base(first, last, a) {} \
+ vector(const vector& c) : vector_base(c) {} \
+ explicit vector(size_type num, const value_type& val = value_type()) : vector_base(num, val) {} \
+ vector(iterator start, iterator end) : vector_base(start, end) {} \
+ vector& operator=(const vector& x) { \
+ vector_base::operator=(x); \
+ return *this; \
+ }
+
+ template<typename T>
+ class vector<T,EIGEN_ALIGNED_ALLOCATOR<T> >
+ : public vector<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> >
+{
+ typedef vector<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T),
+ Eigen::aligned_allocator_indirection<EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T)> > vector_base;
+ EIGEN_STD_VECTOR_SPECIALIZATION_BODY
+
+ void resize(size_type new_size)
+ { resize(new_size, T()); }
+
+#if defined(_VECTOR_)
+ // workaround MSVC std::vector implementation
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (vector_base::size() < new_size)
+ vector_base::_Insert_n(vector_base::end(), new_size - vector_base::size(), x);
+ else if (new_size < vector_base::size())
+ vector_base::erase(vector_base::begin() + new_size, vector_base::end());
+ }
+ void push_back(const value_type& x)
+ { vector_base::push_back(x); }
+ using vector_base::insert;
+ iterator insert(const_iterator position, const value_type& x)
+ { return vector_base::insert(position,x); }
+ void insert(const_iterator position, size_type new_size, const value_type& x)
+ { vector_base::insert(position, new_size, x); }
+#elif defined(_GLIBCXX_VECTOR) && (!(EIGEN_GNUC_AT_LEAST(4,1)))
+ /* Note that before gcc-4.1 we already have: std::vector::resize(size_type,const T&).
+ * However, this specialization is still needed to make the above EIGEN_DEFINE_STL_VECTOR_SPECIALIZATION trick to work. */
+ void resize(size_type new_size, const value_type& x)
+ {
+ vector_base::resize(new_size,x);
+ }
+#elif defined(_GLIBCXX_VECTOR) && EIGEN_GNUC_AT_LEAST(4,2)
+ // workaround GCC std::vector implementation
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (new_size < vector_base::size())
+ vector_base::_M_erase_at_end(this->_M_impl._M_start + new_size);
+ else
+ vector_base::insert(vector_base::end(), new_size - vector_base::size(), x);
+ }
+#else
+ // either GCC 4.1 or non-GCC
+ // default implementation which should always work.
+ void resize(size_type new_size, const value_type& x)
+ {
+ if (new_size < vector_base::size())
+ vector_base::erase(vector_base::begin() + new_size, vector_base::end());
+ else if (new_size > vector_base::size())
+ vector_base::insert(vector_base::end(), new_size - vector_base::size(), x);
+ }
+#endif
+ };
+}
+
+#endif // EIGEN_STDVECTOR_H
diff --git a/third_party/eigen3/Eigen/src/StlSupport/details.h b/third_party/eigen3/Eigen/src/StlSupport/details.h
new file mode 100644
index 0000000000..e42ec024f2
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/StlSupport/details.h
@@ -0,0 +1,84 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2009 Hauke Heibel <hauke.heibel@googlemail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_STL_DETAILS_H
+#define EIGEN_STL_DETAILS_H
+
+#ifndef EIGEN_ALIGNED_ALLOCATOR
+ #define EIGEN_ALIGNED_ALLOCATOR Eigen::aligned_allocator
+#endif
+
+namespace Eigen {
+
+ // This one is needed to prevent reimplementing the whole std::vector.
+ template <class T>
+ class aligned_allocator_indirection : public EIGEN_ALIGNED_ALLOCATOR<T>
+ {
+ public:
+ typedef size_t size_type;
+ typedef ptrdiff_t difference_type;
+ typedef T* pointer;
+ typedef const T* const_pointer;
+ typedef T& reference;
+ typedef const T& const_reference;
+ typedef T value_type;
+
+ template<class U>
+ struct rebind
+ {
+ typedef aligned_allocator_indirection<U> other;
+ };
+
+ aligned_allocator_indirection() {}
+ aligned_allocator_indirection(const aligned_allocator_indirection& ) : EIGEN_ALIGNED_ALLOCATOR<T>() {}
+ aligned_allocator_indirection(const EIGEN_ALIGNED_ALLOCATOR<T>& ) {}
+ template<class U>
+ aligned_allocator_indirection(const aligned_allocator_indirection<U>& ) {}
+ template<class U>
+ aligned_allocator_indirection(const EIGEN_ALIGNED_ALLOCATOR<U>& ) {}
+ ~aligned_allocator_indirection() {}
+ };
+
+#if EIGEN_COMP_MSVC
+
+ // Sometimes MSVC detects, at compile time, that the argument x
+ // in std::vector::resize(size_t s, T x) won't be aligned and generates an error
+ // even if this function is never called. Hence this little wrapper.
+#define EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T) \
+ typename Eigen::internal::conditional< \
+ Eigen::internal::is_arithmetic<T>::value, \
+ T, \
+ Eigen::internal::workaround_msvc_stl_support<T> \
+ >::type
+
+ namespace internal {
+ template<typename T> struct workaround_msvc_stl_support : public T
+ {
+ inline workaround_msvc_stl_support() : T() {}
+ inline workaround_msvc_stl_support(const T& other) : T(other) {}
+ inline operator T& () { return *static_cast<T*>(this); }
+ inline operator const T& () const { return *static_cast<const T*>(this); }
+ template<typename OtherT>
+ inline T& operator=(const OtherT& other)
+ { T::operator=(other); return *this; }
+ inline workaround_msvc_stl_support& operator=(const workaround_msvc_stl_support& other)
+ { T::operator=(other); return *this; }
+ };
+ }
+
+#else
+
+#define EIGEN_WORKAROUND_MSVC_STL_SUPPORT(T) T
+
+#endif
+
+}
+
+#endif // EIGEN_STL_DETAILS_H
diff --git a/third_party/eigen3/Eigen/src/SuperLUSupport/SuperLUSupport.h b/third_party/eigen3/Eigen/src/SuperLUSupport/SuperLUSupport.h
new file mode 100644
index 0000000000..bcb355760c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/SuperLUSupport/SuperLUSupport.h
@@ -0,0 +1,1026 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SUPERLUSUPPORT_H
+#define EIGEN_SUPERLUSUPPORT_H
+
+namespace Eigen {
+
+#define DECL_GSSVX(PREFIX,FLOATTYPE,KEYTYPE) \
+ extern "C" { \
+ typedef struct { FLOATTYPE for_lu; FLOATTYPE total_needed; int expansions; } PREFIX##mem_usage_t; \
+ extern void PREFIX##gssvx(superlu_options_t *, SuperMatrix *, int *, int *, int *, \
+ char *, FLOATTYPE *, FLOATTYPE *, SuperMatrix *, SuperMatrix *, \
+ void *, int, SuperMatrix *, SuperMatrix *, \
+ FLOATTYPE *, FLOATTYPE *, FLOATTYPE *, FLOATTYPE *, \
+ PREFIX##mem_usage_t *, SuperLUStat_t *, int *); \
+ } \
+ inline float SuperLU_gssvx(superlu_options_t *options, SuperMatrix *A, \
+ int *perm_c, int *perm_r, int *etree, char *equed, \
+ FLOATTYPE *R, FLOATTYPE *C, SuperMatrix *L, \
+ SuperMatrix *U, void *work, int lwork, \
+ SuperMatrix *B, SuperMatrix *X, \
+ FLOATTYPE *recip_pivot_growth, \
+ FLOATTYPE *rcond, FLOATTYPE *ferr, FLOATTYPE *berr, \
+ SuperLUStat_t *stats, int *info, KEYTYPE) { \
+ PREFIX##mem_usage_t mem_usage; \
+ PREFIX##gssvx(options, A, perm_c, perm_r, etree, equed, R, C, L, \
+ U, work, lwork, B, X, recip_pivot_growth, rcond, \
+ ferr, berr, &mem_usage, stats, info); \
+ return mem_usage.for_lu; /* bytes used by the factor storage */ \
+ }
+
+DECL_GSSVX(s,float,float)
+DECL_GSSVX(c,float,std::complex<float>)
+DECL_GSSVX(d,double,double)
+DECL_GSSVX(z,double,std::complex<double>)
+
+#ifdef MILU_ALPHA
+#define EIGEN_SUPERLU_HAS_ILU
+#endif
+
+#ifdef EIGEN_SUPERLU_HAS_ILU
+
+// similarly for the incomplete factorization using gsisx
+#define DECL_GSISX(PREFIX,FLOATTYPE,KEYTYPE) \
+ extern "C" { \
+ extern void PREFIX##gsisx(superlu_options_t *, SuperMatrix *, int *, int *, int *, \
+ char *, FLOATTYPE *, FLOATTYPE *, SuperMatrix *, SuperMatrix *, \
+ void *, int, SuperMatrix *, SuperMatrix *, FLOATTYPE *, FLOATTYPE *, \
+ PREFIX##mem_usage_t *, SuperLUStat_t *, int *); \
+ } \
+ inline float SuperLU_gsisx(superlu_options_t *options, SuperMatrix *A, \
+ int *perm_c, int *perm_r, int *etree, char *equed, \
+ FLOATTYPE *R, FLOATTYPE *C, SuperMatrix *L, \
+ SuperMatrix *U, void *work, int lwork, \
+ SuperMatrix *B, SuperMatrix *X, \
+ FLOATTYPE *recip_pivot_growth, \
+ FLOATTYPE *rcond, \
+ SuperLUStat_t *stats, int *info, KEYTYPE) { \
+ PREFIX##mem_usage_t mem_usage; \
+ PREFIX##gsisx(options, A, perm_c, perm_r, etree, equed, R, C, L, \
+ U, work, lwork, B, X, recip_pivot_growth, rcond, \
+ &mem_usage, stats, info); \
+ return mem_usage.for_lu; /* bytes used by the factor storage */ \
+ }
+
+DECL_GSISX(s,float,float)
+DECL_GSISX(c,float,std::complex<float>)
+DECL_GSISX(d,double,double)
+DECL_GSISX(z,double,std::complex<double>)
+
+#endif
+
+template<typename MatrixType>
+struct SluMatrixMapHelper;
+
+/** \internal
+ *
+ * A wrapper class for SuperLU matrices. It supports only compressed sparse matrices
+ * and dense matrices. Supernodal and other fancy formats are not supported by this wrapper.
+ *
+ * This wrapper class mainly aims to avoid the need for dynamic allocation of the storage structure.
+ */
+struct SluMatrix : SuperMatrix
+{
+ SluMatrix()
+ {
+ Store = &storage;
+ }
+
+ SluMatrix(const SluMatrix& other)
+ : SuperMatrix(other)
+ {
+ Store = &storage;
+ storage = other.storage;
+ }
+
+ SluMatrix& operator=(const SluMatrix& other)
+ {
+ SuperMatrix::operator=(static_cast<const SuperMatrix&>(other));
+ Store = &storage;
+ storage = other.storage;
+ return *this;
+ }
+
+ struct
+ {
+ union {int nnz;int lda;};
+ void *values;
+ int *innerInd;
+ int *outerInd;
+ } storage;
+
+ void setStorageType(Stype_t t)
+ {
+ Stype = t;
+ if (t==SLU_NC || t==SLU_NR || t==SLU_DN)
+ Store = &storage;
+ else
+ {
+ eigen_assert(false && "storage type not supported");
+ Store = 0;
+ }
+ }
+
+ template<typename Scalar>
+ void setScalarType()
+ {
+ if (internal::is_same<Scalar,float>::value)
+ Dtype = SLU_S;
+ else if (internal::is_same<Scalar,double>::value)
+ Dtype = SLU_D;
+ else if (internal::is_same<Scalar,std::complex<float> >::value)
+ Dtype = SLU_C;
+ else if (internal::is_same<Scalar,std::complex<double> >::value)
+ Dtype = SLU_Z;
+ else
+ {
+ eigen_assert(false && "Scalar type not supported by SuperLU");
+ }
+ }
+
+ template<typename MatrixType>
+ static SluMatrix Map(MatrixBase<MatrixType>& _mat)
+ {
+ MatrixType& mat(_mat.derived());
+ eigen_assert( ((MatrixType::Flags&RowMajorBit)!=RowMajorBit) && "row-major dense matrices are not supported by SuperLU");
+ SluMatrix res;
+ res.setStorageType(SLU_DN);
+ res.setScalarType<typename MatrixType::Scalar>();
+ res.Mtype = SLU_GE;
+
+ res.nrow = mat.rows();
+ res.ncol = mat.cols();
+
+ res.storage.lda = MatrixType::IsVectorAtCompileTime ? mat.size() : mat.outerStride();
+ res.storage.values = (void*)(mat.data());
+ return res;
+ }
+
+ template<typename MatrixType>
+ static SluMatrix Map(SparseMatrixBase<MatrixType>& mat)
+ {
+ SluMatrix res;
+ if ((MatrixType::Flags&RowMajorBit)==RowMajorBit)
+ {
+ res.setStorageType(SLU_NR);
+ res.nrow = mat.cols();
+ res.ncol = mat.rows();
+ }
+ else
+ {
+ res.setStorageType(SLU_NC);
+ res.nrow = mat.rows();
+ res.ncol = mat.cols();
+ }
+
+ res.Mtype = SLU_GE;
+
+ res.storage.nnz = mat.nonZeros();
+ res.storage.values = mat.derived().valuePtr();
+ res.storage.innerInd = mat.derived().innerIndexPtr();
+ res.storage.outerInd = mat.derived().outerIndexPtr();
+
+ res.setScalarType<typename MatrixType::Scalar>();
+
+ // FIXME the following is not very accurate
+ if (MatrixType::Flags & Upper)
+ res.Mtype = SLU_TRU;
+ if (MatrixType::Flags & Lower)
+ res.Mtype = SLU_TRL;
+
+ eigen_assert(((MatrixType::Flags & SelfAdjoint)==0) && "SelfAdjoint matrix shape not supported by SuperLU");
+
+ return res;
+ }
+};
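+
+// Internal usage sketch (illustrative only): Map() builds a SluMatrix view sharing the
+// storage of an existing Eigen object, with no dynamic allocation.
+//
+//   SparseMatrix<double> A;  MatrixXd B;
+//   SluMatrix sluA = SluMatrix::Map(A);   // compressed sparse view
+//   SluMatrix sluB = SluMatrix::Map(B);   // dense, column-major view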
+
+template<typename Scalar, int Rows, int Cols, int Options, int MRows, int MCols>
+struct SluMatrixMapHelper<Matrix<Scalar,Rows,Cols,Options,MRows,MCols> >
+{
+ typedef Matrix<Scalar,Rows,Cols,Options,MRows,MCols> MatrixType;
+ static void run(MatrixType& mat, SluMatrix& res)
+ {
+ eigen_assert( ((Options&RowMajor)!=RowMajor) && "row-major dense matrices are not supported by SuperLU");
+ res.setStorageType(SLU_DN);
+ res.setScalarType<Scalar>();
+ res.Mtype = SLU_GE;
+
+ res.nrow = mat.rows();
+ res.ncol = mat.cols();
+
+ res.storage.lda = mat.outerStride();
+ res.storage.values = mat.data();
+ }
+};
+
+template<typename Derived>
+struct SluMatrixMapHelper<SparseMatrixBase<Derived> >
+{
+ typedef Derived MatrixType;
+ static void run(MatrixType& mat, SluMatrix& res)
+ {
+ if ((MatrixType::Flags&RowMajorBit)==RowMajorBit)
+ {
+ res.setStorageType(SLU_NR);
+ res.nrow = mat.cols();
+ res.ncol = mat.rows();
+ }
+ else
+ {
+ res.setStorageType(SLU_NC);
+ res.nrow = mat.rows();
+ res.ncol = mat.cols();
+ }
+
+ res.Mtype = SLU_GE;
+
+ res.storage.nnz = mat.nonZeros();
+ res.storage.values = mat.valuePtr();
+ res.storage.innerInd = mat.innerIndexPtr();
+ res.storage.outerInd = mat.outerIndexPtr();
+
+ res.setScalarType<typename MatrixType::Scalar>();
+
+ // FIXME the following is not very accurate
+ if (MatrixType::Flags & Upper)
+ res.Mtype = SLU_TRU;
+ if (MatrixType::Flags & Lower)
+ res.Mtype = SLU_TRL;
+
+ eigen_assert(((MatrixType::Flags & SelfAdjoint)==0) && "SelfAdjoint matrix shape not supported by SuperLU");
+ }
+};
+
+namespace internal {
+
+template<typename MatrixType>
+SluMatrix asSluMatrix(MatrixType& mat)
+{
+ return SluMatrix::Map(mat);
+}
+
+/** View a SuperLU matrix as an Eigen expression */
+template<typename Scalar, int Flags, typename Index>
+MappedSparseMatrix<Scalar,Flags,Index> map_superlu(SluMatrix& sluMat)
+{
+ eigen_assert( ((Flags&RowMajor)==RowMajor && sluMat.Stype == SLU_NR)
+ || ((Flags&ColMajor)==ColMajor && sluMat.Stype == SLU_NC) );
+
+ Index outerSize = (Flags&RowMajor)==RowMajor ? sluMat.ncol : sluMat.nrow;
+
+ return MappedSparseMatrix<Scalar,Flags,Index>(
+ sluMat.nrow, sluMat.ncol, sluMat.storage.outerInd[outerSize],
+ sluMat.storage.outerInd, sluMat.storage.innerInd, reinterpret_cast<Scalar*>(sluMat.storage.values) );
+}
+
+} // end namespace internal
+
+/** \ingroup SuperLUSupport_Module
+ * \class SuperLUBase
+ * \brief The base class for the direct and incomplete LU factorization of SuperLU
+ */
+template<typename _MatrixType, typename Derived>
+class SuperLUBase : internal::noncopyable
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar,Dynamic,1> Vector;
+ typedef Matrix<int, 1, MatrixType::ColsAtCompileTime> IntRowVectorType;
+ typedef Matrix<int, MatrixType::RowsAtCompileTime, 1> IntColVectorType;
+ typedef SparseMatrix<Scalar> LUMatrixType;
+
+ public:
+
+ SuperLUBase() {}
+
+ ~SuperLUBase()
+ {
+ clearFactors();
+ }
+
+ Derived& derived() { return *static_cast<Derived*>(this); }
+ const Derived& derived() const { return *static_cast<const Derived*>(this); }
+
+ inline Index rows() const { return m_matrix.rows(); }
+ inline Index cols() const { return m_matrix.cols(); }
+
+ /** \returns a reference to the SuperLU options object used to configure the SuperLU algorithms. */
+ inline superlu_options_t& options() { return m_sluOptions; }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the factorization failed because of a numerical problem.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ /** Computes the sparse LU decomposition of \a matrix */
+ void compute(const MatrixType& matrix)
+ {
+ derived().analyzePattern(matrix);
+ derived().factorize(matrix);
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<SuperLUBase, Rhs> solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "SuperLU is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "SuperLU::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<SuperLUBase, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<SuperLUBase, Rhs> solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "SuperLU is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "SuperLU::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<SuperLUBase, Rhs>(*this, b.derived());
+ }
+
+ /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+ * This function is particularly useful when solving for several problems having the same structure.
+ *
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& /*matrix*/)
+ {
+ m_isInitialized = true;
+ m_info = Success;
+ m_analysisIsOk = true;
+ m_factorizationIsOk = false;
+ }
+
+ template<typename Stream>
+ void dumpMemory(Stream& /*s*/)
+ {}
+
+ protected:
+
+ void initFactorization(const MatrixType& a)
+ {
+ set_default_options(&this->m_sluOptions);
+
+ const int size = a.rows();
+ m_matrix = a;
+
+ m_sluA = internal::asSluMatrix(m_matrix);
+ clearFactors();
+
+ m_p.resize(size);
+ m_q.resize(size);
+ m_sluRscale.resize(size);
+ m_sluCscale.resize(size);
+ m_sluEtree.resize(size);
+
+ // set empty B and X
+ m_sluB.setStorageType(SLU_DN);
+ m_sluB.setScalarType<Scalar>();
+ m_sluB.Mtype = SLU_GE;
+ m_sluB.storage.values = 0;
+ m_sluB.nrow = 0;
+ m_sluB.ncol = 0;
+ m_sluB.storage.lda = size;
+ m_sluX = m_sluB;
+
+ m_extractedDataAreDirty = true;
+ }
+
+ void init()
+ {
+ m_info = InvalidInput;
+ m_isInitialized = false;
+ m_sluL.Store = 0;
+ m_sluU.Store = 0;
+ }
+
+ void extractData() const;
+
+ void clearFactors()
+ {
+ if(m_sluL.Store)
+ Destroy_SuperNode_Matrix(&m_sluL);
+ if(m_sluU.Store)
+ Destroy_CompCol_Matrix(&m_sluU);
+
+ m_sluL.Store = 0;
+ m_sluU.Store = 0;
+
+ memset(&m_sluL,0,sizeof m_sluL);
+ memset(&m_sluU,0,sizeof m_sluU);
+ }
+
+ // cached data to reduce reallocation, etc.
+ mutable LUMatrixType m_l;
+ mutable LUMatrixType m_u;
+ mutable IntColVectorType m_p;
+ mutable IntRowVectorType m_q;
+
+ mutable LUMatrixType m_matrix; // copy of the factorized matrix
+ mutable SluMatrix m_sluA;
+ mutable SuperMatrix m_sluL, m_sluU;
+ mutable SluMatrix m_sluB, m_sluX;
+ mutable SuperLUStat_t m_sluStat;
+ mutable superlu_options_t m_sluOptions;
+ mutable std::vector<int> m_sluEtree;
+ mutable Matrix<RealScalar,Dynamic,1> m_sluRscale, m_sluCscale;
+ mutable Matrix<RealScalar,Dynamic,1> m_sluFerr, m_sluBerr;
+ mutable char m_sluEqued;
+
+ mutable ComputationInfo m_info;
+ bool m_isInitialized;
+ int m_factorizationIsOk;
+ int m_analysisIsOk;
+ mutable bool m_extractedDataAreDirty;
+
+ private:
+ SuperLUBase(SuperLUBase& ) { }
+};
+
+
+/** \ingroup SuperLUSupport_Module
+ * \class SuperLU
+ * \brief A sparse direct LU factorization and solver based on the SuperLU library
+ *
+ * This class allows solving A.X = B sparse linear problems via a direct LU factorization
+ * using the SuperLU library. The sparse matrix A must be square and invertible. The vectors or matrices
+ * X and B can be either dense or sparse.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A; it must be a SparseMatrix<>
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ */
+template<typename _MatrixType>
+class SuperLU : public SuperLUBase<_MatrixType,SuperLU<_MatrixType> >
+{
+ public:
+ typedef SuperLUBase<_MatrixType,SuperLU> Base;
+ typedef _MatrixType MatrixType;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::RealScalar RealScalar;
+ typedef typename Base::Index Index;
+ typedef typename Base::IntRowVectorType IntRowVectorType;
+ typedef typename Base::IntColVectorType IntColVectorType;
+ typedef typename Base::LUMatrixType LUMatrixType;
+ typedef TriangularView<LUMatrixType, Lower|UnitDiag> LMatrixType;
+ typedef TriangularView<LUMatrixType, Upper> UMatrixType;
+
+ public:
+
+ SuperLU() : Base() { init(); }
+
+ SuperLU(const MatrixType& matrix) : Base()
+ {
+ init();
+ Base::compute(matrix);
+ }
+
+ ~SuperLU()
+ {
+ }
+
+ /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+ * This function is particularly useful when solving for several problems having the same structure.
+ *
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ m_info = InvalidInput;
+ m_isInitialized = false;
+ Base::analyzePattern(matrix);
+ }
+
+ /** Performs a numeric decomposition of \a matrix
+ *
+ * The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed.
+ *
+ * \sa analyzePattern()
+ */
+ void factorize(const MatrixType& matrix);
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solve(const MatrixBase<Rhs> &b, MatrixBase<Dest> &dest) const;
+ #endif // EIGEN_PARSED_BY_DOXYGEN
+
+ inline const LMatrixType& matrixL() const
+ {
+ if (m_extractedDataAreDirty) this->extractData();
+ return m_l;
+ }
+
+ inline const UMatrixType& matrixU() const
+ {
+ if (m_extractedDataAreDirty) this->extractData();
+ return m_u;
+ }
+
+ inline const IntColVectorType& permutationP() const
+ {
+ if (m_extractedDataAreDirty) this->extractData();
+ return m_p;
+ }
+
+ inline const IntRowVectorType& permutationQ() const
+ {
+ if (m_extractedDataAreDirty) this->extractData();
+ return m_q;
+ }
+
+ Scalar determinant() const;
+
+ protected:
+
+ using Base::m_matrix;
+ using Base::m_sluOptions;
+ using Base::m_sluA;
+ using Base::m_sluB;
+ using Base::m_sluX;
+ using Base::m_p;
+ using Base::m_q;
+ using Base::m_sluEtree;
+ using Base::m_sluEqued;
+ using Base::m_sluRscale;
+ using Base::m_sluCscale;
+ using Base::m_sluL;
+ using Base::m_sluU;
+ using Base::m_sluStat;
+ using Base::m_sluFerr;
+ using Base::m_sluBerr;
+ using Base::m_l;
+ using Base::m_u;
+
+ using Base::m_analysisIsOk;
+ using Base::m_factorizationIsOk;
+ using Base::m_extractedDataAreDirty;
+ using Base::m_isInitialized;
+ using Base::m_info;
+
+ void init()
+ {
+ Base::init();
+
+ set_default_options(&this->m_sluOptions);
+ m_sluOptions.PrintStat = NO;
+ m_sluOptions.ConditionNumber = NO;
+ m_sluOptions.Trans = NOTRANS;
+ m_sluOptions.ColPerm = COLAMD;
+ }
+
+
+ private:
+ SuperLU(SuperLU& ) { }
+};
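+
+// A minimal usage sketch, assuming a square, invertible SparseMatrix<double> A and a dense
+// right-hand side b (names are illustrative):
+//
+//   SuperLU<SparseMatrix<double> > solver;
+//   solver.compute(A);                       // analyzePattern() + factorize()
+//   if(solver.info() == Success)
+//     x = solver.solve(b);
+//   double d = solver.determinant();         // computed from the extracted factors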
+
+template<typename MatrixType>
+void SuperLU<MatrixType>::factorize(const MatrixType& a)
+{
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ if(!m_analysisIsOk)
+ {
+ m_info = InvalidInput;
+ return;
+ }
+
+ this->initFactorization(a);
+
+ m_sluOptions.ColPerm = COLAMD;
+ int info = 0;
+ RealScalar recip_pivot_growth, rcond;
+ RealScalar ferr, berr;
+
+ StatInit(&m_sluStat);
+ SuperLU_gssvx(&m_sluOptions, &m_sluA, m_q.data(), m_p.data(), &m_sluEtree[0],
+ &m_sluEqued, &m_sluRscale[0], &m_sluCscale[0],
+ &m_sluL, &m_sluU,
+ NULL, 0,
+ &m_sluB, &m_sluX,
+ &recip_pivot_growth, &rcond,
+ &ferr, &berr,
+ &m_sluStat, &info, Scalar());
+ StatFree(&m_sluStat);
+
+ m_extractedDataAreDirty = true;
+
+ // FIXME how to better check for errors ???
+ m_info = info == 0 ? Success : NumericalIssue;
+ m_factorizationIsOk = true;
+}
+
+template<typename MatrixType>
+template<typename Rhs,typename Dest>
+void SuperLU<MatrixType>::_solve(const MatrixBase<Rhs> &b, MatrixBase<Dest>& x) const
+{
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or analyzePattern()/factorize()");
+
+ const int size = m_matrix.rows();
+ const int rhsCols = b.cols();
+ eigen_assert(size==b.rows());
+
+ m_sluOptions.Trans = NOTRANS;
+ m_sluOptions.Fact = FACTORED;
+ m_sluOptions.IterRefine = NOREFINE;
+
+
+ m_sluFerr.resize(rhsCols);
+ m_sluBerr.resize(rhsCols);
+ m_sluB = SluMatrix::Map(b.const_cast_derived());
+ m_sluX = SluMatrix::Map(x.derived());
+
+ typename Rhs::PlainObject b_cpy;
+ if(m_sluEqued!='N')
+ {
+ b_cpy = b;
+ m_sluB = SluMatrix::Map(b_cpy.const_cast_derived());
+ }
+
+ StatInit(&m_sluStat);
+ int info = 0;
+ RealScalar recip_pivot_growth, rcond;
+ SuperLU_gssvx(&m_sluOptions, &m_sluA,
+ m_q.data(), m_p.data(),
+ &m_sluEtree[0], &m_sluEqued,
+ &m_sluRscale[0], &m_sluCscale[0],
+ &m_sluL, &m_sluU,
+ NULL, 0,
+ &m_sluB, &m_sluX,
+ &recip_pivot_growth, &rcond,
+ &m_sluFerr[0], &m_sluBerr[0],
+ &m_sluStat, &info, Scalar());
+ StatFree(&m_sluStat);
+ m_info = info==0 ? Success : NumericalIssue;
+}
+
+// the code of this extractData() function has been adapted from SuperLU's Matlab support code,
+//
+// Copyright (c) 1994 by Xerox Corporation. All rights reserved.
+//
+// THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY
+// EXPRESSED OR IMPLIED. ANY USE IS AT YOUR OWN RISK.
+//
+template<typename MatrixType, typename Derived>
+void SuperLUBase<MatrixType,Derived>::extractData() const
+{
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for extracting factors, you must first call either compute() or analyzePattern()/factorize()");
+ if (m_extractedDataAreDirty)
+ {
+ int upper;
+ int fsupc, istart, nsupr;
+ int lastl = 0, lastu = 0;
+ SCformat *Lstore = static_cast<SCformat*>(m_sluL.Store);
+ NCformat *Ustore = static_cast<NCformat*>(m_sluU.Store);
+ Scalar *SNptr;
+
+ const int size = m_matrix.rows();
+ m_l.resize(size,size);
+ m_l.resizeNonZeros(Lstore->nnz);
+ m_u.resize(size,size);
+ m_u.resizeNonZeros(Ustore->nnz);
+
+ int* Lcol = m_l.outerIndexPtr();
+ int* Lrow = m_l.innerIndexPtr();
+ Scalar* Lval = m_l.valuePtr();
+
+ int* Ucol = m_u.outerIndexPtr();
+ int* Urow = m_u.innerIndexPtr();
+ Scalar* Uval = m_u.valuePtr();
+
+ Lcol[0] = 0;
+ Ucol[0] = 0;
+
+ /* for each supernode */
+ for (int k = 0; k <= Lstore->nsuper; ++k)
+ {
+ fsupc = L_FST_SUPC(k);
+ istart = L_SUB_START(fsupc);
+ nsupr = L_SUB_START(fsupc+1) - istart;
+ upper = 1;
+
+ /* for each column in the supernode */
+ for (int j = fsupc; j < L_FST_SUPC(k+1); ++j)
+ {
+ SNptr = &((Scalar*)Lstore->nzval)[L_NZ_START(j)];
+
+ /* Extract U */
+ for (int i = U_NZ_START(j); i < U_NZ_START(j+1); ++i)
+ {
+ Uval[lastu] = ((Scalar*)Ustore->nzval)[i];
+ /* Matlab doesn't like explicit zero. */
+ if (Uval[lastu] != 0.0)
+ Urow[lastu++] = U_SUB(i);
+ }
+ for (int i = 0; i < upper; ++i)
+ {
+ /* upper triangle in the supernode */
+ Uval[lastu] = SNptr[i];
+ /* Matlab doesn't like explicit zero. */
+ if (Uval[lastu] != 0.0)
+ Urow[lastu++] = L_SUB(istart+i);
+ }
+ Ucol[j+1] = lastu;
+
+ /* Extract L */
+ Lval[lastl] = 1.0; /* unit diagonal */
+ Lrow[lastl++] = L_SUB(istart + upper - 1);
+ for (int i = upper; i < nsupr; ++i)
+ {
+ Lval[lastl] = SNptr[i];
+ /* Matlab doesn't like explicit zero. */
+ if (Lval[lastl] != 0.0)
+ Lrow[lastl++] = L_SUB(istart+i);
+ }
+ Lcol[j+1] = lastl;
+
+ ++upper;
+ } /* for j ... */
+
+ } /* for k ... */
+
+ // squeeze the matrices :
+ m_l.resizeNonZeros(lastl);
+ m_u.resizeNonZeros(lastu);
+
+ m_extractedDataAreDirty = false;
+ }
+}
+
+template<typename MatrixType>
+typename SuperLU<MatrixType>::Scalar SuperLU<MatrixType>::determinant() const
+{
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for computing the determinant, you must first call either compute() or analyzePattern()/factorize()");
+
+ if (m_extractedDataAreDirty)
+ this->extractData();
+
+ Scalar det = Scalar(1);
+ for (int j=0; j<m_u.cols(); ++j)
+ {
+ if (m_u.outerIndexPtr()[j+1]-m_u.outerIndexPtr()[j] > 0)
+ {
+ int lastId = m_u.outerIndexPtr()[j+1]-1;
+ eigen_assert(m_u.innerIndexPtr()[lastId]<=j);
+ if (m_u.innerIndexPtr()[lastId]==j)
+ det *= m_u.valuePtr()[lastId];
+ }
+ }
+ if(m_sluEqued!='N')
+ return det/m_sluRscale.prod()/m_sluCscale.prod();
+ else
+ return det;
+}
+
+#ifdef EIGEN_PARSED_BY_DOXYGEN
+#define EIGEN_SUPERLU_HAS_ILU
+#endif
+
+#ifdef EIGEN_SUPERLU_HAS_ILU
+
+/** \ingroup SuperLUSupport_Module
+ * \class SuperILU
+ * \brief A sparse direct \b incomplete LU factorization and solver based on the SuperLU library
+ *
+ * This class allows solving for an approximate solution of A.X = B sparse linear problems via an incomplete LU factorization
+ * using the SuperLU library. It is intended to be used as a preconditioner for iterative linear solvers.
+ *
+ * \warning This class requires SuperLU 4 or later.
+ *
+ * \tparam _MatrixType the type of the sparse matrix A; it must be a SparseMatrix<>
+ *
+ * \sa \ref TutorialSparseDirectSolvers, class ConjugateGradient, class BiCGSTAB
+ */
+
+template<typename _MatrixType>
+class SuperILU : public SuperLUBase<_MatrixType,SuperILU<_MatrixType> >
+{
+ public:
+ typedef SuperLUBase<_MatrixType,SuperILU> Base;
+ typedef _MatrixType MatrixType;
+ typedef typename Base::Scalar Scalar;
+ typedef typename Base::RealScalar RealScalar;
+ typedef typename Base::Index Index;
+
+ public:
+
+ SuperILU() : Base() { init(); }
+
+ SuperILU(const MatrixType& matrix) : Base()
+ {
+ init();
+ Base::compute(matrix);
+ }
+
+ ~SuperILU()
+ {
+ }
+
+ /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+ * This function is particularly useful when solving for several problems having the same structure.
+ *
+ * \sa factorize()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ Base::analyzePattern(matrix);
+ }
+
+ /** Performs a numeric decomposition of \a matrix
+ *
+ * The given matrix must have the same sparsity pattern as the matrix on which the symbolic decomposition has been performed.
+ *
+ * \sa analyzePattern()
+ */
+ void factorize(const MatrixType& matrix);
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal */
+ template<typename Rhs,typename Dest>
+ void _solve(const MatrixBase<Rhs> &b, MatrixBase<Dest> &dest) const;
+ #endif // EIGEN_PARSED_BY_DOXYGEN
+
+ protected:
+
+ using Base::m_matrix;
+ using Base::m_sluOptions;
+ using Base::m_sluA;
+ using Base::m_sluB;
+ using Base::m_sluX;
+ using Base::m_p;
+ using Base::m_q;
+ using Base::m_sluEtree;
+ using Base::m_sluEqued;
+ using Base::m_sluRscale;
+ using Base::m_sluCscale;
+ using Base::m_sluL;
+ using Base::m_sluU;
+ using Base::m_sluStat;
+ using Base::m_sluFerr;
+ using Base::m_sluBerr;
+ using Base::m_l;
+ using Base::m_u;
+
+ using Base::m_analysisIsOk;
+ using Base::m_factorizationIsOk;
+ using Base::m_extractedDataAreDirty;
+ using Base::m_isInitialized;
+ using Base::m_info;
+
+ void init()
+ {
+ Base::init();
+
+ ilu_set_default_options(&m_sluOptions);
+ m_sluOptions.PrintStat = NO;
+ m_sluOptions.ConditionNumber = NO;
+ m_sluOptions.Trans = NOTRANS;
+ m_sluOptions.ColPerm = MMD_AT_PLUS_A;
+
+ // no attempt to preserve column sum
+ m_sluOptions.ILU_MILU = SILU;
+ // only basic ILU(k) support -- no direct control over memory consumption
+ // better to use ILU_DropRule = DROP_BASIC | DROP_AREA
+ // and set ILU_FillFactor to max memory growth
+ m_sluOptions.ILU_DropRule = DROP_BASIC;
+ m_sluOptions.ILU_DropTol = NumTraits<Scalar>::dummy_precision()*10;
+ }
+
+ private:
+ SuperILU(SuperILU& ) { }
+};
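+
+// A minimal usage sketch (illustrative; SuperILU only yields an approximate solution, typically
+// used as a preconditioning step inside an iterative solver):
+//
+//   SuperILU<SparseMatrix<double> > ilu;
+//   ilu.compute(A);
+//   VectorXd x_approx = ilu.solve(b);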
+
+template<typename MatrixType>
+void SuperILU<MatrixType>::factorize(const MatrixType& a)
+{
+ eigen_assert(m_analysisIsOk && "You must first call analyzePattern()");
+ if(!m_analysisIsOk)
+ {
+ m_info = InvalidInput;
+ return;
+ }
+
+ this->initFactorization(a);
+
+ int info = 0;
+ RealScalar recip_pivot_growth, rcond;
+
+ StatInit(&m_sluStat);
+ SuperLU_gsisx(&m_sluOptions, &m_sluA, m_q.data(), m_p.data(), &m_sluEtree[0],
+ &m_sluEqued, &m_sluRscale[0], &m_sluCscale[0],
+ &m_sluL, &m_sluU,
+ NULL, 0,
+ &m_sluB, &m_sluX,
+ &recip_pivot_growth, &rcond,
+ &m_sluStat, &info, Scalar());
+ StatFree(&m_sluStat);
+
+ // FIXME how to better check for errors ???
+ m_info = info == 0 ? Success : NumericalIssue;
+ m_factorizationIsOk = true;
+}
+
+template<typename MatrixType>
+template<typename Rhs,typename Dest>
+void SuperILU<MatrixType>::_solve(const MatrixBase<Rhs> &b, MatrixBase<Dest>& x) const
+{
+ eigen_assert(m_factorizationIsOk && "The decomposition is not in a valid state for solving, you must first call either compute() or analyzePattern()/factorize()");
+
+ const int size = m_matrix.rows();
+ const int rhsCols = b.cols();
+ eigen_assert(size==b.rows());
+
+ m_sluOptions.Trans = NOTRANS;
+ m_sluOptions.Fact = FACTORED;
+ m_sluOptions.IterRefine = NOREFINE;
+
+ m_sluFerr.resize(rhsCols);
+ m_sluBerr.resize(rhsCols);
+ m_sluB = SluMatrix::Map(b.const_cast_derived());
+ m_sluX = SluMatrix::Map(x.derived());
+
+ typename Rhs::PlainObject b_cpy;
+ if(m_sluEqued!='N')
+ {
+ b_cpy = b;
+ m_sluB = SluMatrix::Map(b_cpy.const_cast_derived());
+ }
+
+ int info = 0;
+ RealScalar recip_pivot_growth, rcond;
+
+ StatInit(&m_sluStat);
+ SuperLU_gsisx(&m_sluOptions, &m_sluA,
+ m_q.data(), m_p.data(),
+ &m_sluEtree[0], &m_sluEqued,
+ &m_sluRscale[0], &m_sluCscale[0],
+ &m_sluL, &m_sluU,
+ NULL, 0,
+ &m_sluB, &m_sluX,
+ &recip_pivot_growth, &rcond,
+ &m_sluStat, &info, Scalar());
+ StatFree(&m_sluStat);
+
+ m_info = info==0 ? Success : NumericalIssue;
+}
+#endif
+
+namespace internal {
+
+template<typename _MatrixType, typename Derived, typename Rhs>
+struct solve_retval<SuperLUBase<_MatrixType,Derived>, Rhs>
+ : solve_retval_base<SuperLUBase<_MatrixType,Derived>, Rhs>
+{
+ typedef SuperLUBase<_MatrixType,Derived> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec().derived()._solve(rhs(),dst);
+ }
+};
+
+template<typename _MatrixType, typename Derived, typename Rhs>
+struct sparse_solve_retval<SuperLUBase<_MatrixType,Derived>, Rhs>
+ : sparse_solve_retval_base<SuperLUBase<_MatrixType,Derived>, Rhs>
+{
+ typedef SuperLUBase<_MatrixType,Derived> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SUPERLUSUPPORT_H
diff --git a/third_party/eigen3/Eigen/src/UmfPackSupport/UmfPackSupport.h b/third_party/eigen3/Eigen/src/UmfPackSupport/UmfPackSupport.h
new file mode 100644
index 0000000000..3a48cecf76
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/UmfPackSupport/UmfPackSupport.h
@@ -0,0 +1,432 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2011 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_UMFPACKSUPPORT_H
+#define EIGEN_UMFPACKSUPPORT_H
+
+namespace Eigen {
+
+/* TODO extract L, extract U, compute det, etc... */
+
+// generic double/complex<double> wrapper functions:
+
+inline void umfpack_free_numeric(void **Numeric, double)
+{ umfpack_di_free_numeric(Numeric); *Numeric = 0; }
+
+inline void umfpack_free_numeric(void **Numeric, std::complex<double>)
+{ umfpack_zi_free_numeric(Numeric); *Numeric = 0; }
+
+inline void umfpack_free_symbolic(void **Symbolic, double)
+{ umfpack_di_free_symbolic(Symbolic); *Symbolic = 0; }
+
+inline void umfpack_free_symbolic(void **Symbolic, std::complex<double>)
+{ umfpack_zi_free_symbolic(Symbolic); *Symbolic = 0; }
+
+inline int umfpack_symbolic(int n_row,int n_col,
+ const int Ap[], const int Ai[], const double Ax[], void **Symbolic,
+ const double Control [UMFPACK_CONTROL], double Info [UMFPACK_INFO])
+{
+ return umfpack_di_symbolic(n_row,n_col,Ap,Ai,Ax,Symbolic,Control,Info);
+}
+
+inline int umfpack_symbolic(int n_row,int n_col,
+ const int Ap[], const int Ai[], const std::complex<double> Ax[], void **Symbolic,
+ const double Control [UMFPACK_CONTROL], double Info [UMFPACK_INFO])
+{
+ return umfpack_zi_symbolic(n_row,n_col,Ap,Ai,&numext::real_ref(Ax[0]),0,Symbolic,Control,Info);
+}
+
+inline int umfpack_numeric( const int Ap[], const int Ai[], const double Ax[],
+ void *Symbolic, void **Numeric,
+ const double Control[UMFPACK_CONTROL],double Info [UMFPACK_INFO])
+{
+ return umfpack_di_numeric(Ap,Ai,Ax,Symbolic,Numeric,Control,Info);
+}
+
+inline int umfpack_numeric( const int Ap[], const int Ai[], const std::complex<double> Ax[],
+ void *Symbolic, void **Numeric,
+ const double Control[UMFPACK_CONTROL],double Info [UMFPACK_INFO])
+{
+ return umfpack_zi_numeric(Ap,Ai,&numext::real_ref(Ax[0]),0,Symbolic,Numeric,Control,Info);
+}
+
+inline int umfpack_solve( int sys, const int Ap[], const int Ai[], const double Ax[],
+ double X[], const double B[], void *Numeric,
+ const double Control[UMFPACK_CONTROL], double Info[UMFPACK_INFO])
+{
+ return umfpack_di_solve(sys,Ap,Ai,Ax,X,B,Numeric,Control,Info);
+}
+
+inline int umfpack_solve( int sys, const int Ap[], const int Ai[], const std::complex<double> Ax[],
+ std::complex<double> X[], const std::complex<double> B[], void *Numeric,
+ const double Control[UMFPACK_CONTROL], double Info[UMFPACK_INFO])
+{
+ return umfpack_zi_solve(sys,Ap,Ai,&numext::real_ref(Ax[0]),0,&numext::real_ref(X[0]),0,&numext::real_ref(B[0]),0,Numeric,Control,Info);
+}
+
+inline int umfpack_get_lunz(int *lnz, int *unz, int *n_row, int *n_col, int *nz_udiag, void *Numeric, double)
+{
+ return umfpack_di_get_lunz(lnz,unz,n_row,n_col,nz_udiag,Numeric);
+}
+
+inline int umfpack_get_lunz(int *lnz, int *unz, int *n_row, int *n_col, int *nz_udiag, void *Numeric, std::complex<double>)
+{
+ return umfpack_zi_get_lunz(lnz,unz,n_row,n_col,nz_udiag,Numeric);
+}
+
+inline int umfpack_get_numeric(int Lp[], int Lj[], double Lx[], int Up[], int Ui[], double Ux[],
+ int P[], int Q[], double Dx[], int *do_recip, double Rs[], void *Numeric)
+{
+ return umfpack_di_get_numeric(Lp,Lj,Lx,Up,Ui,Ux,P,Q,Dx,do_recip,Rs,Numeric);
+}
+
+inline int umfpack_get_numeric(int Lp[], int Lj[], std::complex<double> Lx[], int Up[], int Ui[], std::complex<double> Ux[],
+ int P[], int Q[], std::complex<double> Dx[], int *do_recip, double Rs[], void *Numeric)
+{
+ double& lx0_real = numext::real_ref(Lx[0]);
+ double& ux0_real = numext::real_ref(Ux[0]);
+ double& dx0_real = numext::real_ref(Dx[0]);
+ return umfpack_zi_get_numeric(Lp,Lj,Lx?&lx0_real:0,0,Up,Ui,Ux?&ux0_real:0,0,P,Q,
+ Dx?&dx0_real:0,0,do_recip,Rs,Numeric);
+}
+
+inline int umfpack_get_determinant(double *Mx, double *Ex, void *NumericHandle, double User_Info [UMFPACK_INFO])
+{
+ return umfpack_di_get_determinant(Mx,Ex,NumericHandle,User_Info);
+}
+
+inline int umfpack_get_determinant(std::complex<double> *Mx, double *Ex, void *NumericHandle, double User_Info [UMFPACK_INFO])
+{
+ double& mx_real = numext::real_ref(*Mx);
+ return umfpack_zi_get_determinant(&mx_real,0,Ex,NumericHandle,User_Info);
+}
+
+/** \ingroup UmfPackSupport_Module
+ * \brief A sparse LU factorization and solver based on UmfPack
+ *
+ * This class allows solving A.X = B sparse linear problems via an LU factorization
+ * using the UmfPack library. The sparse matrix A must be square and have full rank.
+ * The vectors or matrices X and B can be either dense or sparse (see the usage sketch below).
+ *
+ * \warning The input matrix A should be in a \b compressed and \b column-major form.
+ * Otherwise an expensive copy will be made. You can call the inexpensive makeCompressed() to get a compressed matrix.
+ * \tparam _MatrixType the type of the sparse matrix A; it must be a SparseMatrix<>
+ *
+ * \sa \ref TutorialSparseDirectSolvers
+ */
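+// A minimal usage sketch (illustrative only: A is assumed to be a compressed,
+// column-major SparseMatrix<double> and b a VectorXd):
+//
+//   UmfPackLU<SparseMatrix<double> > lu(A);  // runs analyzePattern(A) then factorize(A)
+//   VectorXd x;
+//   if(lu.info() == Success)
+//     x = lu.solve(b);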
+template<typename _MatrixType>
+class UmfPackLU : internal::noncopyable
+{
+ public:
+ typedef _MatrixType MatrixType;
+ typedef typename MatrixType::Scalar Scalar;
+ typedef typename MatrixType::RealScalar RealScalar;
+ typedef typename MatrixType::Index Index;
+ typedef Matrix<Scalar,Dynamic,1> Vector;
+ typedef Matrix<int, 1, MatrixType::ColsAtCompileTime> IntRowVectorType;
+ typedef Matrix<int, MatrixType::RowsAtCompileTime, 1> IntColVectorType;
+ typedef SparseMatrix<Scalar> LUMatrixType;
+ typedef SparseMatrix<Scalar,ColMajor,int> UmfpackMatrixType;
+
+ public:
+
+ UmfPackLU() { init(); }
+
+ UmfPackLU(const MatrixType& matrix)
+ {
+ init();
+ compute(matrix);
+ }
+
+ ~UmfPackLU()
+ {
+ if(m_symbolic) umfpack_free_symbolic(&m_symbolic,Scalar());
+ if(m_numeric) umfpack_free_numeric(&m_numeric,Scalar());
+ }
+
+ inline Index rows() const { return m_copyMatrix.rows(); }
+ inline Index cols() const { return m_copyMatrix.cols(); }
+
+ /** \brief Reports whether previous computation was successful.
+ *
+ * \returns \c Success if computation was successful,
+ * \c NumericalIssue if the numeric factorization failed.
+ */
+ ComputationInfo info() const
+ {
+ eigen_assert(m_isInitialized && "Decomposition is not initialized.");
+ return m_info;
+ }
+
+ inline const LUMatrixType& matrixL() const
+ {
+ if (m_extractedDataAreDirty) extractData();
+ return m_l;
+ }
+
+ inline const LUMatrixType& matrixU() const
+ {
+ if (m_extractedDataAreDirty) extractData();
+ return m_u;
+ }
+
+ inline const IntColVectorType& permutationP() const
+ {
+ if (m_extractedDataAreDirty) extractData();
+ return m_p;
+ }
+
+ inline const IntRowVectorType& permutationQ() const
+ {
+ if (m_extractedDataAreDirty) extractData();
+ return m_q;
+ }
+
+ /** Computes the sparse LU decomposition of \a matrix.
+ * Note that the matrix should be column-major, and in compressed format for best performance.
+ * \sa SparseMatrix::makeCompressed().
+ */
+ void compute(const MatrixType& matrix)
+ {
+ analyzePattern(matrix);
+ factorize(matrix);
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::solve_retval<UmfPackLU, Rhs> solve(const MatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "UmfPackLU is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "UmfPackLU::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::solve_retval<UmfPackLU, Rhs>(*this, b.derived());
+ }
+
+ /** \returns the solution x of \f$ A x = b \f$ using the current decomposition of A.
+ *
+ * \sa compute()
+ */
+ template<typename Rhs>
+ inline const internal::sparse_solve_retval<UmfPackLU, Rhs> solve(const SparseMatrixBase<Rhs>& b) const
+ {
+ eigen_assert(m_isInitialized && "UmfPackLU is not initialized.");
+ eigen_assert(rows()==b.rows()
+ && "UmfPackLU::solve(): invalid number of rows of the right hand side matrix b");
+ return internal::sparse_solve_retval<UmfPackLU, Rhs>(*this, b.derived());
+ }
+
+ /** Performs a symbolic decomposition on the sparsity pattern of \a matrix.
+ *
+ * This function is particularly useful when solving several problems having the same structure.
+ *
+ * \sa factorize(), compute()
+ */
+ void analyzePattern(const MatrixType& matrix)
+ {
+ if(m_symbolic)
+ umfpack_free_symbolic(&m_symbolic,Scalar());
+ if(m_numeric)
+ umfpack_free_numeric(&m_numeric,Scalar());
+
+ grapInput(matrix);
+
+ int errorCode = 0;
+ errorCode = umfpack_symbolic(matrix.rows(), matrix.cols(), m_outerIndexPtr, m_innerIndexPtr, m_valuePtr,
+ &m_symbolic, 0, 0);
+
+ m_isInitialized = true;
+ m_info = errorCode ? InvalidInput : Success;
+ m_analysisIsOk = true;
+ m_factorizationIsOk = false;
+ }
+
+ /** Performs a numeric decomposition of \a matrix.
+ *
+ * The given matrix must have the same sparsity pattern as the matrix on which the pattern analysis has been performed.
+ *
+ * \sa analyzePattern(), compute()
+ */
+ void factorize(const MatrixType& matrix)
+ {
+ eigen_assert(m_analysisIsOk && "UmfPackLU: you must first call analyzePattern()");
+ if(m_numeric)
+ umfpack_free_numeric(&m_numeric,Scalar());
+
+ grapInput(matrix);
+
+ int errorCode;
+ errorCode = umfpack_numeric(m_outerIndexPtr, m_innerIndexPtr, m_valuePtr,
+ m_symbolic, &m_numeric, 0, 0);
+
+ m_info = errorCode ? NumericalIssue : Success;
+ m_factorizationIsOk = true;
+ }
+
+ #ifndef EIGEN_PARSED_BY_DOXYGEN
+ /** \internal */
+ template<typename BDerived,typename XDerived>
+ bool _solve(const MatrixBase<BDerived> &b, MatrixBase<XDerived> &x) const;
+ #endif
+
+ Scalar determinant() const;
+
+ void extractData() const;
+
+ protected:
+
+
+ void init()
+ {
+ m_info = InvalidInput;
+ m_isInitialized = false;
+ m_numeric = 0;
+ m_symbolic = 0;
+ m_outerIndexPtr = 0;
+ m_innerIndexPtr = 0;
+ m_valuePtr = 0;
+ }
+
+ void grapInput(const MatrixType& mat)
+ {
+ m_copyMatrix.resize(mat.rows(), mat.cols());
+ if( ((MatrixType::Flags&RowMajorBit)==RowMajorBit) || sizeof(typename MatrixType::Index)!=sizeof(int) || !mat.isCompressed() )
+ {
+ // unsupported input -> copy
+ m_copyMatrix = mat;
+ m_outerIndexPtr = m_copyMatrix.outerIndexPtr();
+ m_innerIndexPtr = m_copyMatrix.innerIndexPtr();
+ m_valuePtr = m_copyMatrix.valuePtr();
+ }
+ else
+ {
+ m_outerIndexPtr = mat.outerIndexPtr();
+ m_innerIndexPtr = mat.innerIndexPtr();
+ m_valuePtr = mat.valuePtr();
+ }
+ }
+
+ // cached data to reduce reallocation, etc.
+ mutable LUMatrixType m_l;
+ mutable LUMatrixType m_u;
+ mutable IntColVectorType m_p;
+ mutable IntRowVectorType m_q;
+
+ UmfpackMatrixType m_copyMatrix;
+ const Scalar* m_valuePtr;
+ const int* m_outerIndexPtr;
+ const int* m_innerIndexPtr;
+ void* m_numeric;
+ void* m_symbolic;
+
+ mutable ComputationInfo m_info;
+ bool m_isInitialized;
+ int m_factorizationIsOk;
+ int m_analysisIsOk;
+ mutable bool m_extractedDataAreDirty;
+
+ private:
+ UmfPackLU(UmfPackLU& ) { }
+};
+
+
+template<typename MatrixType>
+void UmfPackLU<MatrixType>::extractData() const
+{
+ if (m_extractedDataAreDirty)
+ {
+ // get size of the data
+ int lnz, unz, rows, cols, nz_udiag;
+ umfpack_get_lunz(&lnz, &unz, &rows, &cols, &nz_udiag, m_numeric, Scalar());
+
+ // allocate data
+ m_l.resize(rows,(std::min)(rows,cols));
+ m_l.resizeNonZeros(lnz);
+
+ m_u.resize((std::min)(rows,cols),cols);
+ m_u.resizeNonZeros(unz);
+
+ m_p.resize(rows);
+ m_q.resize(cols);
+
+ // extract
+ umfpack_get_numeric(m_l.outerIndexPtr(), m_l.innerIndexPtr(), m_l.valuePtr(),
+ m_u.outerIndexPtr(), m_u.innerIndexPtr(), m_u.valuePtr(),
+ m_p.data(), m_q.data(), 0, 0, 0, m_numeric);
+
+ m_extractedDataAreDirty = false;
+ }
+}
+
+template<typename MatrixType>
+typename UmfPackLU<MatrixType>::Scalar UmfPackLU<MatrixType>::determinant() const
+{
+ Scalar det;
+ umfpack_get_determinant(&det, 0, m_numeric, 0);
+ return det;
+}
+
+template<typename MatrixType>
+template<typename BDerived,typename XDerived>
+bool UmfPackLU<MatrixType>::_solve(const MatrixBase<BDerived> &b, MatrixBase<XDerived> &x) const
+{
+ const int rhsCols = b.cols();
+ eigen_assert((BDerived::Flags&RowMajorBit)==0 && "UmfPackLU backend does not support non col-major rhs yet");
+ eigen_assert((XDerived::Flags&RowMajorBit)==0 && "UmfPackLU backend does not support non col-major result yet");
+ eigen_assert(b.derived().data() != x.derived().data() && " Umfpack does not support inplace solve");
+
+ int errorCode;
+ for (int j=0; j<rhsCols; ++j)
+ {
+ errorCode = umfpack_solve(UMFPACK_A,
+ m_outerIndexPtr, m_innerIndexPtr, m_valuePtr,
+ &x.col(j).coeffRef(0), &b.const_cast_derived().col(j).coeffRef(0), m_numeric, 0, 0);
+ if (errorCode!=0)
+ return false;
+ }
+
+ return true;
+}
+
+
+namespace internal {
+
+template<typename _MatrixType, typename Rhs>
+struct solve_retval<UmfPackLU<_MatrixType>, Rhs>
+ : solve_retval_base<UmfPackLU<_MatrixType>, Rhs>
+{
+ typedef UmfPackLU<_MatrixType> Dec;
+ EIGEN_MAKE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ dec()._solve(rhs(),dst);
+ }
+};
+
+template<typename _MatrixType, typename Rhs>
+struct sparse_solve_retval<UmfPackLU<_MatrixType>, Rhs>
+ : sparse_solve_retval_base<UmfPackLU<_MatrixType>, Rhs>
+{
+ typedef UmfPackLU<_MatrixType> Dec;
+ EIGEN_MAKE_SPARSE_SOLVE_HELPERS(Dec,Rhs)
+
+ template<typename Dest> void evalTo(Dest& dst) const
+ {
+ this->defaultEvalTo(dst);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_UMFPACKSUPPORT_H
diff --git a/third_party/eigen3/Eigen/src/misc/Image.h b/third_party/eigen3/Eigen/src/misc/Image.h
new file mode 100644
index 0000000000..75c5f433a8
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/misc/Image.h
@@ -0,0 +1,84 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MISC_IMAGE_H
+#define EIGEN_MISC_IMAGE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \class image_retval_base
+ *
+ */
+template<typename DecompositionType>
+struct traits<image_retval_base<DecompositionType> >
+{
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef Matrix<
+ typename MatrixType::Scalar,
+ MatrixType::RowsAtCompileTime, // the image is a subspace of the destination space, whose
+ // dimension is the number of rows of the original matrix
+ Dynamic, // we don't know at compile time the dimension of the image (the rank)
+ MatrixType::Options,
+ MatrixType::MaxRowsAtCompileTime, // the image matrix will consist of columns from the original matrix,
+ MatrixType::MaxColsAtCompileTime // so it has the same number of rows and at most as many columns.
+ > ReturnType;
+};
+
+template<typename _DecompositionType> struct image_retval_base
+ : public ReturnByValue<image_retval_base<_DecompositionType> >
+{
+ typedef _DecompositionType DecompositionType;
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef ReturnByValue<image_retval_base> Base;
+ typedef typename Base::Index Index;
+
+ image_retval_base(const DecompositionType& dec, const MatrixType& originalMatrix)
+ : m_dec(dec), m_rank(dec.rank()),
+ m_cols(m_rank == 0 ? 1 : m_rank),
+ m_originalMatrix(originalMatrix)
+ {}
+
+ inline Index rows() const { return m_dec.rows(); }
+ inline Index cols() const { return m_cols; }
+ inline Index rank() const { return m_rank; }
+ inline const DecompositionType& dec() const { return m_dec; }
+ inline const MatrixType& originalMatrix() const { return m_originalMatrix; }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ static_cast<const image_retval<DecompositionType>*>(this)->evalTo(dst);
+ }
+
+ protected:
+ const DecompositionType& m_dec;
+ Index m_rank, m_cols;
+ const MatrixType& m_originalMatrix;
+};
+
+} // end namespace internal
+
+#define EIGEN_MAKE_IMAGE_HELPERS(DecompositionType) \
+ typedef typename DecompositionType::MatrixType MatrixType; \
+ typedef typename MatrixType::Scalar Scalar; \
+ typedef typename MatrixType::RealScalar RealScalar; \
+ typedef typename MatrixType::Index Index; \
+ typedef Eigen::internal::image_retval_base<DecompositionType> Base; \
+ using Base::dec; \
+ using Base::originalMatrix; \
+ using Base::rank; \
+ using Base::rows; \
+ using Base::cols; \
+ image_retval(const DecompositionType& dec, const MatrixType& originalMatrix) \
+ : Base(dec, originalMatrix) {}
+
+} // end namespace Eigen
+
+#endif // EIGEN_MISC_IMAGE_H
diff --git a/third_party/eigen3/Eigen/src/misc/Kernel.h b/third_party/eigen3/Eigen/src/misc/Kernel.h
new file mode 100644
index 0000000000..b9e1518fd4
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/misc/Kernel.h
@@ -0,0 +1,81 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MISC_KERNEL_H
+#define EIGEN_MISC_KERNEL_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \class kernel_retval_base
+ *
+ */
+template<typename DecompositionType>
+struct traits<kernel_retval_base<DecompositionType> >
+{
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef Matrix<
+ typename MatrixType::Scalar,
+ MatrixType::ColsAtCompileTime, // the number of rows in the "kernel matrix"
+ // is the number of cols of the original matrix
+ // so that the product "matrix * kernel = zero" makes sense
+ Dynamic, // we don't know at compile-time the dimension of the kernel
+ MatrixType::Options,
+ MatrixType::MaxColsAtCompileTime, // see explanation for 2nd template parameter
+ MatrixType::MaxColsAtCompileTime // the kernel is a subspace of the domain space,
+ // whose dimension is the number of columns of the original matrix
+ > ReturnType;
+};
+
+template<typename _DecompositionType> struct kernel_retval_base
+ : public ReturnByValue<kernel_retval_base<_DecompositionType> >
+{
+ typedef _DecompositionType DecompositionType;
+ typedef ReturnByValue<kernel_retval_base> Base;
+ typedef typename Base::Index Index;
+
+ kernel_retval_base(const DecompositionType& dec)
+ : m_dec(dec),
+ m_rank(dec.rank()),
+ m_cols(m_rank==dec.cols() ? 1 : dec.cols() - m_rank)
+ {}
+
+ inline Index rows() const { return m_dec.cols(); }
+ inline Index cols() const { return m_cols; }
+ inline Index rank() const { return m_rank; }
+ inline const DecompositionType& dec() const { return m_dec; }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ static_cast<const kernel_retval<DecompositionType>*>(this)->evalTo(dst);
+ }
+
+ protected:
+ const DecompositionType& m_dec;
+ Index m_rank, m_cols;
+};
+
+} // end namespace internal
+
+#define EIGEN_MAKE_KERNEL_HELPERS(DecompositionType) \
+ typedef typename DecompositionType::MatrixType MatrixType; \
+ typedef typename MatrixType::Scalar Scalar; \
+ typedef typename MatrixType::RealScalar RealScalar; \
+ typedef typename MatrixType::Index Index; \
+ typedef Eigen::internal::kernel_retval_base<DecompositionType> Base; \
+ using Base::dec; \
+ using Base::rank; \
+ using Base::rows; \
+ using Base::cols; \
+ kernel_retval(const DecompositionType& dec) : Base(dec) {}
+
+} // end namespace Eigen
+
+#endif // EIGEN_MISC_KERNEL_H
diff --git a/third_party/eigen3/Eigen/src/misc/Solve.h b/third_party/eigen3/Eigen/src/misc/Solve.h
new file mode 100644
index 0000000000..7f70d60afb
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/misc/Solve.h
@@ -0,0 +1,76 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_MISC_SOLVE_H
+#define EIGEN_MISC_SOLVE_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \class solve_retval_base
+ *
+ */
+template<typename DecompositionType, typename Rhs>
+struct traits<solve_retval_base<DecompositionType, Rhs> >
+{
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef Matrix<typename Rhs::Scalar,
+ MatrixType::ColsAtCompileTime,
+ Rhs::ColsAtCompileTime,
+ Rhs::PlainObject::Options,
+ MatrixType::MaxColsAtCompileTime,
+ Rhs::MaxColsAtCompileTime> ReturnType;
+};
+
+template<typename _DecompositionType, typename Rhs> struct solve_retval_base
+ : public ReturnByValue<solve_retval_base<_DecompositionType, Rhs> >
+{
+ typedef typename remove_all<typename Rhs::Nested>::type RhsNestedCleaned;
+ typedef _DecompositionType DecompositionType;
+ typedef ReturnByValue<solve_retval_base> Base;
+ typedef typename Base::Index Index;
+
+ solve_retval_base(const DecompositionType& dec, const Rhs& rhs)
+ : m_dec(dec), m_rhs(rhs)
+ {}
+
+ inline Index rows() const { return m_dec.cols(); }
+ inline Index cols() const { return m_rhs.cols(); }
+ inline const DecompositionType& dec() const { return m_dec; }
+ inline const RhsNestedCleaned& rhs() const { return m_rhs; }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ static_cast<const solve_retval<DecompositionType,Rhs>*>(this)->evalTo(dst);
+ }
+
+ protected:
+ const DecompositionType& m_dec;
+ typename Rhs::Nested m_rhs;
+};
+
+} // end namespace internal
+
+#define EIGEN_MAKE_SOLVE_HELPERS(DecompositionType,Rhs) \
+ typedef typename DecompositionType::MatrixType MatrixType; \
+ typedef typename MatrixType::Scalar Scalar; \
+ typedef typename MatrixType::RealScalar RealScalar; \
+ typedef typename MatrixType::Index Index; \
+ typedef Eigen::internal::solve_retval_base<DecompositionType,Rhs> Base; \
+ using Base::dec; \
+ using Base::rhs; \
+ using Base::rows; \
+ using Base::cols; \
+ solve_retval(const DecompositionType& dec, const Rhs& rhs) \
+ : Base(dec, rhs) {}
+
+} // end namespace Eigen
+
+#endif // EIGEN_MISC_SOLVE_H
diff --git a/third_party/eigen3/Eigen/src/misc/SparseSolve.h b/third_party/eigen3/Eigen/src/misc/SparseSolve.h
new file mode 100644
index 0000000000..05caa9266b
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/misc/SparseSolve.h
@@ -0,0 +1,130 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_SPARSE_SOLVE_H
+#define EIGEN_SPARSE_SOLVE_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename _DecompositionType, typename Rhs> struct sparse_solve_retval_base;
+template<typename _DecompositionType, typename Rhs> struct sparse_solve_retval;
+
+template<typename DecompositionType, typename Rhs>
+struct traits<sparse_solve_retval_base<DecompositionType, Rhs> >
+{
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef SparseMatrix<typename Rhs::Scalar, Rhs::Options, typename Rhs::Index> ReturnType;
+};
+
+template<typename _DecompositionType, typename Rhs> struct sparse_solve_retval_base
+ : public ReturnByValue<sparse_solve_retval_base<_DecompositionType, Rhs> >
+{
+ typedef typename remove_all<typename Rhs::Nested>::type RhsNestedCleaned;
+ typedef _DecompositionType DecompositionType;
+ typedef ReturnByValue<sparse_solve_retval_base> Base;
+ typedef typename Base::Index Index;
+
+ sparse_solve_retval_base(const DecompositionType& dec, const Rhs& rhs)
+ : m_dec(dec), m_rhs(rhs)
+ {}
+
+ inline Index rows() const { return m_dec.cols(); }
+ inline Index cols() const { return m_rhs.cols(); }
+ inline const DecompositionType& dec() const { return m_dec; }
+ inline const RhsNestedCleaned& rhs() const { return m_rhs; }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ static_cast<const sparse_solve_retval<DecompositionType,Rhs>*>(this)->evalTo(dst);
+ }
+
+ protected:
+ template<typename DestScalar, int DestOptions, typename DestIndex>
+ inline void defaultEvalTo(SparseMatrix<DestScalar,DestOptions,DestIndex>& dst) const
+ {
+ // we process the sparse rhs in blocks of NbColsAtOnce columns, temporarily stored in a dense matrix.
+ static const int NbColsAtOnce = 4;
+ int rhsCols = m_rhs.cols();
+ int size = m_rhs.rows();
+ // the temporary matrices do not need more columns than NbColsAtOnce:
+ int tmpCols = (std::min)(rhsCols, NbColsAtOnce);
+ Eigen::Matrix<DestScalar,Dynamic,Dynamic> tmp(size,tmpCols);
+ Eigen::Matrix<DestScalar,Dynamic,Dynamic> tmpX(size,tmpCols);
+ for(int k=0; k<rhsCols; k+=NbColsAtOnce)
+ {
+ int actualCols = std::min<int>(rhsCols-k, NbColsAtOnce);
+ tmp.leftCols(actualCols) = m_rhs.middleCols(k,actualCols);
+ tmpX.leftCols(actualCols) = m_dec.solve(tmp.leftCols(actualCols));
+ dst.middleCols(k,actualCols) = tmpX.leftCols(actualCols).sparseView();
+ }
+ }
+ const DecompositionType& m_dec;
+ typename Rhs::Nested m_rhs;
+};
+
+#define EIGEN_MAKE_SPARSE_SOLVE_HELPERS(DecompositionType,Rhs) \
+ typedef typename DecompositionType::MatrixType MatrixType; \
+ typedef typename MatrixType::Scalar Scalar; \
+ typedef typename MatrixType::RealScalar RealScalar; \
+ typedef typename MatrixType::Index Index; \
+ typedef Eigen::internal::sparse_solve_retval_base<DecompositionType,Rhs> Base; \
+ using Base::dec; \
+ using Base::rhs; \
+ using Base::rows; \
+ using Base::cols; \
+ sparse_solve_retval(const DecompositionType& dec, const Rhs& rhs) \
+ : Base(dec, rhs) {}
+
+
+
+template<typename DecompositionType, typename Rhs, typename Guess> struct solve_retval_with_guess;
+
+template<typename DecompositionType, typename Rhs, typename Guess>
+struct traits<solve_retval_with_guess<DecompositionType, Rhs, Guess> >
+{
+ typedef typename DecompositionType::MatrixType MatrixType;
+ typedef Matrix<typename Rhs::Scalar,
+ MatrixType::ColsAtCompileTime,
+ Rhs::ColsAtCompileTime,
+ Rhs::PlainObject::Options,
+ MatrixType::MaxColsAtCompileTime,
+ Rhs::MaxColsAtCompileTime> ReturnType;
+};
+
+template<typename DecompositionType, typename Rhs, typename Guess> struct solve_retval_with_guess
+ : public ReturnByValue<solve_retval_with_guess<DecompositionType, Rhs, Guess> >
+{
+ typedef typename DecompositionType::Index Index;
+
+ solve_retval_with_guess(const DecompositionType& dec, const Rhs& rhs, const Guess& guess)
+ : m_dec(dec), m_rhs(rhs), m_guess(guess)
+ {}
+
+ inline Index rows() const { return m_dec.cols(); }
+ inline Index cols() const { return m_rhs.cols(); }
+
+ template<typename Dest> inline void evalTo(Dest& dst) const
+ {
+ dst = m_guess;
+ m_dec._solveWithGuess(m_rhs,dst);
+ }
+
+ protected:
+ const DecompositionType& m_dec;
+ const typename Rhs::Nested m_rhs;
+ const typename Guess::Nested m_guess;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_SPARSE_SOLVE_H
diff --git a/third_party/eigen3/Eigen/src/misc/blas.h b/third_party/eigen3/Eigen/src/misc/blas.h
new file mode 100644
index 0000000000..6fce99ed5c
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/misc/blas.h
@@ -0,0 +1,658 @@
+#ifndef BLAS_H
+#define BLAS_H
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#define BLASFUNC(FUNC) FUNC##_
+
+#ifdef __WIN64__
+typedef long long BLASLONG;
+typedef unsigned long long BLASULONG;
+#else
+typedef long BLASLONG;
+typedef unsigned long BLASULONG;
+#endif
+
+int BLASFUNC(xerbla)(const char *, int *info, int);
+
+float BLASFUNC(sdot) (int *, float *, int *, float *, int *);
+float BLASFUNC(sdsdot)(int *, float *, float *, int *, float *, int *);
+
+double BLASFUNC(dsdot) (int *, float *, int *, float *, int *);
+double BLASFUNC(ddot) (int *, double *, int *, double *, int *);
+double BLASFUNC(qdot) (int *, double *, int *, double *, int *);
+
+int BLASFUNC(cdotuw) (int *, float *, int *, float *, int *, float*);
+int BLASFUNC(cdotcw) (int *, float *, int *, float *, int *, float*);
+int BLASFUNC(zdotuw) (int *, double *, int *, double *, int *, double*);
+int BLASFUNC(zdotcw) (int *, double *, int *, double *, int *, double*);
+
+int BLASFUNC(saxpy) (int *, float *, float *, int *, float *, int *);
+int BLASFUNC(daxpy) (int *, double *, double *, int *, double *, int *);
+int BLASFUNC(qaxpy) (int *, double *, double *, int *, double *, int *);
+int BLASFUNC(caxpy) (int *, float *, float *, int *, float *, int *);
+int BLASFUNC(zaxpy) (int *, double *, double *, int *, double *, int *);
+int BLASFUNC(xaxpy) (int *, double *, double *, int *, double *, int *);
+int BLASFUNC(caxpyc)(int *, float *, float *, int *, float *, int *);
+int BLASFUNC(zaxpyc)(int *, double *, double *, int *, double *, int *);
+int BLASFUNC(xaxpyc)(int *, double *, double *, int *, double *, int *);
+
+int BLASFUNC(scopy) (int *, float *, int *, float *, int *);
+int BLASFUNC(dcopy) (int *, double *, int *, double *, int *);
+int BLASFUNC(qcopy) (int *, double *, int *, double *, int *);
+int BLASFUNC(ccopy) (int *, float *, int *, float *, int *);
+int BLASFUNC(zcopy) (int *, double *, int *, double *, int *);
+int BLASFUNC(xcopy) (int *, double *, int *, double *, int *);
+
+int BLASFUNC(sswap) (int *, float *, int *, float *, int *);
+int BLASFUNC(dswap) (int *, double *, int *, double *, int *);
+int BLASFUNC(qswap) (int *, double *, int *, double *, int *);
+int BLASFUNC(cswap) (int *, float *, int *, float *, int *);
+int BLASFUNC(zswap) (int *, double *, int *, double *, int *);
+int BLASFUNC(xswap) (int *, double *, int *, double *, int *);
+
+float BLASFUNC(sasum) (int *, float *, int *);
+float BLASFUNC(scasum)(int *, float *, int *);
+double BLASFUNC(dasum) (int *, double *, int *);
+double BLASFUNC(qasum) (int *, double *, int *);
+double BLASFUNC(dzasum)(int *, double *, int *);
+double BLASFUNC(qxasum)(int *, double *, int *);
+
+int BLASFUNC(isamax)(int *, float *, int *);
+int BLASFUNC(idamax)(int *, double *, int *);
+int BLASFUNC(iqamax)(int *, double *, int *);
+int BLASFUNC(icamax)(int *, float *, int *);
+int BLASFUNC(izamax)(int *, double *, int *);
+int BLASFUNC(ixamax)(int *, double *, int *);
+
+int BLASFUNC(ismax) (int *, float *, int *);
+int BLASFUNC(idmax) (int *, double *, int *);
+int BLASFUNC(iqmax) (int *, double *, int *);
+int BLASFUNC(icmax) (int *, float *, int *);
+int BLASFUNC(izmax) (int *, double *, int *);
+int BLASFUNC(ixmax) (int *, double *, int *);
+
+int BLASFUNC(isamin)(int *, float *, int *);
+int BLASFUNC(idamin)(int *, double *, int *);
+int BLASFUNC(iqamin)(int *, double *, int *);
+int BLASFUNC(icamin)(int *, float *, int *);
+int BLASFUNC(izamin)(int *, double *, int *);
+int BLASFUNC(ixamin)(int *, double *, int *);
+
+int BLASFUNC(ismin)(int *, float *, int *);
+int BLASFUNC(idmin)(int *, double *, int *);
+int BLASFUNC(iqmin)(int *, double *, int *);
+int BLASFUNC(icmin)(int *, float *, int *);
+int BLASFUNC(izmin)(int *, double *, int *);
+int BLASFUNC(ixmin)(int *, double *, int *);
+
+float BLASFUNC(samax) (int *, float *, int *);
+double BLASFUNC(damax) (int *, double *, int *);
+double BLASFUNC(qamax) (int *, double *, int *);
+float BLASFUNC(scamax)(int *, float *, int *);
+double BLASFUNC(dzamax)(int *, double *, int *);
+double BLASFUNC(qxamax)(int *, double *, int *);
+
+float BLASFUNC(samin) (int *, float *, int *);
+double BLASFUNC(damin) (int *, double *, int *);
+double BLASFUNC(qamin) (int *, double *, int *);
+float BLASFUNC(scamin)(int *, float *, int *);
+double BLASFUNC(dzamin)(int *, double *, int *);
+double BLASFUNC(qxamin)(int *, double *, int *);
+
+float BLASFUNC(smax) (int *, float *, int *);
+double BLASFUNC(dmax) (int *, double *, int *);
+double BLASFUNC(qmax) (int *, double *, int *);
+float BLASFUNC(scmax) (int *, float *, int *);
+double BLASFUNC(dzmax) (int *, double *, int *);
+double BLASFUNC(qxmax) (int *, double *, int *);
+
+float BLASFUNC(smin) (int *, float *, int *);
+double BLASFUNC(dmin) (int *, double *, int *);
+double BLASFUNC(qmin) (int *, double *, int *);
+float BLASFUNC(scmin) (int *, float *, int *);
+double BLASFUNC(dzmin) (int *, double *, int *);
+double BLASFUNC(qxmin) (int *, double *, int *);
+
+int BLASFUNC(sscal) (int *, float *, float *, int *);
+int BLASFUNC(dscal) (int *, double *, double *, int *);
+int BLASFUNC(qscal) (int *, double *, double *, int *);
+int BLASFUNC(cscal) (int *, float *, float *, int *);
+int BLASFUNC(zscal) (int *, double *, double *, int *);
+int BLASFUNC(xscal) (int *, double *, double *, int *);
+int BLASFUNC(csscal)(int *, float *, float *, int *);
+int BLASFUNC(zdscal)(int *, double *, double *, int *);
+int BLASFUNC(xqscal)(int *, double *, double *, int *);
+
+float BLASFUNC(snrm2) (int *, float *, int *);
+float BLASFUNC(scnrm2)(int *, float *, int *);
+
+double BLASFUNC(dnrm2) (int *, double *, int *);
+double BLASFUNC(qnrm2) (int *, double *, int *);
+double BLASFUNC(dznrm2)(int *, double *, int *);
+double BLASFUNC(qxnrm2)(int *, double *, int *);
+
+int BLASFUNC(srot) (int *, float *, int *, float *, int *, float *, float *);
+int BLASFUNC(drot) (int *, double *, int *, double *, int *, double *, double *);
+int BLASFUNC(qrot) (int *, double *, int *, double *, int *, double *, double *);
+int BLASFUNC(csrot) (int *, float *, int *, float *, int *, float *, float *);
+int BLASFUNC(zdrot) (int *, double *, int *, double *, int *, double *, double *);
+int BLASFUNC(xqrot) (int *, double *, int *, double *, int *, double *, double *);
+
+int BLASFUNC(srotg) (float *, float *, float *, float *);
+int BLASFUNC(drotg) (double *, double *, double *, double *);
+int BLASFUNC(qrotg) (double *, double *, double *, double *);
+int BLASFUNC(crotg) (float *, float *, float *, float *);
+int BLASFUNC(zrotg) (double *, double *, double *, double *);
+int BLASFUNC(xrotg) (double *, double *, double *, double *);
+
+int BLASFUNC(srotmg)(float *, float *, float *, float *, float *);
+int BLASFUNC(drotmg)(double *, double *, double *, double *, double *);
+
+int BLASFUNC(srotm) (int *, float *, int *, float *, int *, float *);
+int BLASFUNC(drotm) (int *, double *, int *, double *, int *, double *);
+int BLASFUNC(qrotm) (int *, double *, int *, double *, int *, double *);
+
+/* Level 2 routines */
+
+int BLASFUNC(sger)(int *, int *, float *, float *, int *,
+ float *, int *, float *, int *);
+int BLASFUNC(dger)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+int BLASFUNC(qger)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+int BLASFUNC(cgeru)(int *, int *, float *, float *, int *,
+ float *, int *, float *, int *);
+int BLASFUNC(cgerc)(int *, int *, float *, float *, int *,
+ float *, int *, float *, int *);
+int BLASFUNC(zgeru)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+int BLASFUNC(zgerc)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+int BLASFUNC(xgeru)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+int BLASFUNC(xgerc)(int *, int *, double *, double *, int *,
+ double *, int *, double *, int *);
+
+int BLASFUNC(sgemv)(char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dgemv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qgemv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(cgemv)(char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zgemv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xgemv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(strsv) (char *, char *, char *, int *, float *, int *,
+ float *, int *);
+int BLASFUNC(dtrsv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(qtrsv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(ctrsv) (char *, char *, char *, int *, float *, int *,
+ float *, int *);
+int BLASFUNC(ztrsv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(xtrsv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+
+int BLASFUNC(stpsv) (char *, char *, char *, int *, float *, float *, int *);
+int BLASFUNC(dtpsv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(qtpsv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(ctpsv) (char *, char *, char *, int *, float *, float *, int *);
+int BLASFUNC(ztpsv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(xtpsv) (char *, char *, char *, int *, double *, double *, int *);
+
+int BLASFUNC(strmv) (char *, char *, char *, int *, float *, int *,
+ float *, int *);
+int BLASFUNC(dtrmv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(qtrmv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(ctrmv) (char *, char *, char *, int *, float *, int *,
+ float *, int *);
+int BLASFUNC(ztrmv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+int BLASFUNC(xtrmv) (char *, char *, char *, int *, double *, int *,
+ double *, int *);
+
+int BLASFUNC(stpmv) (char *, char *, char *, int *, float *, float *, int *);
+int BLASFUNC(dtpmv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(qtpmv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(ctpmv) (char *, char *, char *, int *, float *, float *, int *);
+int BLASFUNC(ztpmv) (char *, char *, char *, int *, double *, double *, int *);
+int BLASFUNC(xtpmv) (char *, char *, char *, int *, double *, double *, int *);
+
+int BLASFUNC(stbmv) (char *, char *, char *, int *, int *, float *, int *, float *, int *);
+int BLASFUNC(dtbmv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(qtbmv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(ctbmv) (char *, char *, char *, int *, int *, float *, int *, float *, int *);
+int BLASFUNC(ztbmv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(xtbmv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+
+int BLASFUNC(stbsv) (char *, char *, char *, int *, int *, float *, int *, float *, int *);
+int BLASFUNC(dtbsv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(qtbsv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(ctbsv) (char *, char *, char *, int *, int *, float *, int *, float *, int *);
+int BLASFUNC(ztbsv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+int BLASFUNC(xtbsv) (char *, char *, char *, int *, int *, double *, int *, double *, int *);
+
+int BLASFUNC(ssymv) (char *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dsymv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qsymv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(csymv) (char *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zsymv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xsymv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(sspmv) (char *, int *, float *, float *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dspmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qspmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(cspmv) (char *, int *, float *, float *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zspmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xspmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(ssyr) (char *, int *, float *, float *, int *,
+ float *, int *);
+int BLASFUNC(dsyr) (char *, int *, double *, double *, int *,
+ double *, int *);
+int BLASFUNC(qsyr) (char *, int *, double *, double *, int *,
+ double *, int *);
+int BLASFUNC(csyr) (char *, int *, float *, float *, int *,
+ float *, int *);
+int BLASFUNC(zsyr) (char *, int *, double *, double *, int *,
+ double *, int *);
+int BLASFUNC(xsyr) (char *, int *, double *, double *, int *,
+ double *, int *);
+
+int BLASFUNC(ssyr2) (char *, int *, float *,
+ float *, int *, float *, int *, float *, int *);
+int BLASFUNC(dsyr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+int BLASFUNC(qsyr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+int BLASFUNC(csyr2) (char *, int *, float *,
+ float *, int *, float *, int *, float *, int *);
+int BLASFUNC(zsyr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+int BLASFUNC(xsyr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+
+int BLASFUNC(sspr) (char *, int *, float *, float *, int *,
+ float *);
+int BLASFUNC(dspr) (char *, int *, double *, double *, int *,
+ double *);
+int BLASFUNC(qspr) (char *, int *, double *, double *, int *,
+ double *);
+int BLASFUNC(cspr) (char *, int *, float *, float *, int *,
+ float *);
+int BLASFUNC(zspr) (char *, int *, double *, double *, int *,
+ double *);
+int BLASFUNC(xspr) (char *, int *, double *, double *, int *,
+ double *);
+
+int BLASFUNC(sspr2) (char *, int *, float *,
+ float *, int *, float *, int *, float *);
+int BLASFUNC(dspr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+int BLASFUNC(qspr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+int BLASFUNC(cspr2) (char *, int *, float *,
+ float *, int *, float *, int *, float *);
+int BLASFUNC(zspr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+int BLASFUNC(xspr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+
+int BLASFUNC(cher) (char *, int *, float *, float *, int *,
+ float *, int *);
+int BLASFUNC(zher) (char *, int *, double *, double *, int *,
+ double *, int *);
+int BLASFUNC(xher) (char *, int *, double *, double *, int *,
+ double *, int *);
+
+int BLASFUNC(chpr) (char *, int *, float *, float *, int *, float *);
+int BLASFUNC(zhpr) (char *, int *, double *, double *, int *, double *);
+int BLASFUNC(xhpr) (char *, int *, double *, double *, int *, double *);
+
+int BLASFUNC(cher2) (char *, int *, float *,
+ float *, int *, float *, int *, float *, int *);
+int BLASFUNC(zher2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+int BLASFUNC(xher2) (char *, int *, double *,
+ double *, int *, double *, int *, double *, int *);
+
+int BLASFUNC(chpr2) (char *, int *, float *,
+ float *, int *, float *, int *, float *);
+int BLASFUNC(zhpr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+int BLASFUNC(xhpr2) (char *, int *, double *,
+ double *, int *, double *, int *, double *);
+
+int BLASFUNC(chemv) (char *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zhemv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xhemv) (char *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(chpmv) (char *, int *, float *, float *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zhpmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xhpmv) (char *, int *, double *, double *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(snorm)(char *, int *, int *, float *, int *);
+int BLASFUNC(dnorm)(char *, int *, int *, double *, int *);
+int BLASFUNC(cnorm)(char *, int *, int *, float *, int *);
+int BLASFUNC(znorm)(char *, int *, int *, double *, int *);
+
+int BLASFUNC(sgbmv)(char *, int *, int *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dgbmv)(char *, int *, int *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qgbmv)(char *, int *, int *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(cgbmv)(char *, int *, int *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zgbmv)(char *, int *, int *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xgbmv)(char *, int *, int *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(ssbmv)(char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dsbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qsbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(csbmv)(char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zsbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xsbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(chbmv)(char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zhbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xhbmv)(char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+/* Level 3 routines */
+
+int BLASFUNC(sgemm)(char *, char *, int *, int *, int *, float *,
+ float *, int *, float *, int *, float *, float *, int *);
+int BLASFUNC(dgemm)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+int BLASFUNC(qgemm)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+int BLASFUNC(cgemm)(char *, char *, int *, int *, int *, float *,
+ float *, int *, float *, int *, float *, float *, int *);
+int BLASFUNC(zgemm)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+int BLASFUNC(xgemm)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+
+int BLASFUNC(cgemm3m)(char *, char *, int *, int *, int *, float *,
+ float *, int *, float *, int *, float *, float *, int *);
+int BLASFUNC(zgemm3m)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+int BLASFUNC(xgemm3m)(char *, char *, int *, int *, int *, double *,
+ double *, int *, double *, int *, double *, double *, int *);
+
+int BLASFUNC(sge2mm)(char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *,
+ float *, float *, int *);
+int BLASFUNC(dge2mm)(char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *,
+ double *, double *, int *);
+int BLASFUNC(cge2mm)(char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *,
+ float *, float *, int *);
+int BLASFUNC(zge2mm)(char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *,
+ double *, double *, int *);
+
+int BLASFUNC(strsm)(char *, char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *);
+int BLASFUNC(dtrsm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(qtrsm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(ctrsm)(char *, char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *);
+int BLASFUNC(ztrsm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(xtrsm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+
+int BLASFUNC(strmm)(char *, char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *);
+int BLASFUNC(dtrmm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(qtrmm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(ctrmm)(char *, char *, char *, char *, int *, int *,
+ float *, float *, int *, float *, int *);
+int BLASFUNC(ztrmm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+int BLASFUNC(xtrmm)(char *, char *, char *, char *, int *, int *,
+ double *, double *, int *, double *, int *);
+
+int BLASFUNC(ssymm)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dsymm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(qsymm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(csymm)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zsymm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xsymm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(csymm3m)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zsymm3m)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xsymm3m)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(ssyrk)(char *, char *, int *, int *, float *, float *, int *,
+ float *, float *, int *);
+int BLASFUNC(dsyrk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+int BLASFUNC(qsyrk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+int BLASFUNC(csyrk)(char *, char *, int *, int *, float *, float *, int *,
+ float *, float *, int *);
+int BLASFUNC(zsyrk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+int BLASFUNC(xsyrk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+
+int BLASFUNC(ssyr2k)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(dsyr2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(qsyr2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(csyr2k)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zsyr2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(xsyr2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+
+int BLASFUNC(chemm)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zhemm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xhemm)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(chemm3m)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zhemm3m)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+int BLASFUNC(xhemm3m)(char *, char *, int *, int *, double *, double *, int *,
+ double *, int *, double *, double *, int *);
+
+int BLASFUNC(cherk)(char *, char *, int *, int *, float *, float *, int *,
+ float *, float *, int *);
+int BLASFUNC(zherk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+int BLASFUNC(xherk)(char *, char *, int *, int *, double *, double *, int *,
+ double *, double *, int *);
+
+int BLASFUNC(cher2k)(char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zher2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(xher2k)(char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(cher2m)(char *, char *, char *, int *, int *, float *, float *, int *,
+ float *, int *, float *, float *, int *);
+int BLASFUNC(zher2m)(char *, char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+int BLASFUNC(xher2m)(char *, char *, char *, int *, int *, double *, double *, int *,
+ double*, int *, double *, double *, int *);
+
+int BLASFUNC(sgemt)(char *, int *, int *, float *, float *, int *,
+ float *, int *);
+int BLASFUNC(dgemt)(char *, int *, int *, double *, double *, int *,
+ double *, int *);
+int BLASFUNC(cgemt)(char *, int *, int *, float *, float *, int *,
+ float *, int *);
+int BLASFUNC(zgemt)(char *, int *, int *, double *, double *, int *,
+ double *, int *);
+
+int BLASFUNC(sgema)(char *, char *, int *, int *, float *,
+ float *, int *, float *, float *, int *, float *, int *);
+int BLASFUNC(dgema)(char *, char *, int *, int *, double *,
+ double *, int *, double*, double *, int *, double*, int *);
+int BLASFUNC(cgema)(char *, char *, int *, int *, float *,
+ float *, int *, float *, float *, int *, float *, int *);
+int BLASFUNC(zgema)(char *, char *, int *, int *, double *,
+ double *, int *, double*, double *, int *, double*, int *);
+
+int BLASFUNC(sgems)(char *, char *, int *, int *, float *,
+ float *, int *, float *, float *, int *, float *, int *);
+int BLASFUNC(dgems)(char *, char *, int *, int *, double *,
+ double *, int *, double*, double *, int *, double*, int *);
+int BLASFUNC(cgems)(char *, char *, int *, int *, float *,
+ float *, int *, float *, float *, int *, float *, int *);
+int BLASFUNC(zgems)(char *, char *, int *, int *, double *,
+ double *, int *, double*, double *, int *, double*, int *);
+
+int BLASFUNC(sgetf2)(int *, int *, float *, int *, int *, int *);
+int BLASFUNC(dgetf2)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(qgetf2)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(cgetf2)(int *, int *, float *, int *, int *, int *);
+int BLASFUNC(zgetf2)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(xgetf2)(int *, int *, double *, int *, int *, int *);
+
+int BLASFUNC(sgetrf)(int *, int *, float *, int *, int *, int *);
+int BLASFUNC(dgetrf)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(qgetrf)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(cgetrf)(int *, int *, float *, int *, int *, int *);
+int BLASFUNC(zgetrf)(int *, int *, double *, int *, int *, int *);
+int BLASFUNC(xgetrf)(int *, int *, double *, int *, int *, int *);
+
+int BLASFUNC(slaswp)(int *, float *, int *, int *, int *, int *, int *);
+int BLASFUNC(dlaswp)(int *, double *, int *, int *, int *, int *, int *);
+int BLASFUNC(qlaswp)(int *, double *, int *, int *, int *, int *, int *);
+int BLASFUNC(claswp)(int *, float *, int *, int *, int *, int *, int *);
+int BLASFUNC(zlaswp)(int *, double *, int *, int *, int *, int *, int *);
+int BLASFUNC(xlaswp)(int *, double *, int *, int *, int *, int *, int *);
+
+int BLASFUNC(sgetrs)(char *, int *, int *, float *, int *, int *, float *, int *, int *);
+int BLASFUNC(dgetrs)(char *, int *, int *, double *, int *, int *, double *, int *, int *);
+int BLASFUNC(qgetrs)(char *, int *, int *, double *, int *, int *, double *, int *, int *);
+int BLASFUNC(cgetrs)(char *, int *, int *, float *, int *, int *, float *, int *, int *);
+int BLASFUNC(zgetrs)(char *, int *, int *, double *, int *, int *, double *, int *, int *);
+int BLASFUNC(xgetrs)(char *, int *, int *, double *, int *, int *, double *, int *, int *);
+
+int BLASFUNC(sgesv)(int *, int *, float *, int *, int *, float *, int *, int *);
+int BLASFUNC(dgesv)(int *, int *, double *, int *, int *, double*, int *, int *);
+int BLASFUNC(qgesv)(int *, int *, double *, int *, int *, double*, int *, int *);
+int BLASFUNC(cgesv)(int *, int *, float *, int *, int *, float *, int *, int *);
+int BLASFUNC(zgesv)(int *, int *, double *, int *, int *, double*, int *, int *);
+int BLASFUNC(xgesv)(int *, int *, double *, int *, int *, double*, int *, int *);
+
+int BLASFUNC(spotf2)(char *, int *, float *, int *, int *);
+int BLASFUNC(dpotf2)(char *, int *, double *, int *, int *);
+int BLASFUNC(qpotf2)(char *, int *, double *, int *, int *);
+int BLASFUNC(cpotf2)(char *, int *, float *, int *, int *);
+int BLASFUNC(zpotf2)(char *, int *, double *, int *, int *);
+int BLASFUNC(xpotf2)(char *, int *, double *, int *, int *);
+
+int BLASFUNC(spotrf)(char *, int *, float *, int *, int *);
+int BLASFUNC(dpotrf)(char *, int *, double *, int *, int *);
+int BLASFUNC(qpotrf)(char *, int *, double *, int *, int *);
+int BLASFUNC(cpotrf)(char *, int *, float *, int *, int *);
+int BLASFUNC(zpotrf)(char *, int *, double *, int *, int *);
+int BLASFUNC(xpotrf)(char *, int *, double *, int *, int *);
+
+int BLASFUNC(slauu2)(char *, int *, float *, int *, int *);
+int BLASFUNC(dlauu2)(char *, int *, double *, int *, int *);
+int BLASFUNC(qlauu2)(char *, int *, double *, int *, int *);
+int BLASFUNC(clauu2)(char *, int *, float *, int *, int *);
+int BLASFUNC(zlauu2)(char *, int *, double *, int *, int *);
+int BLASFUNC(xlauu2)(char *, int *, double *, int *, int *);
+
+int BLASFUNC(slauum)(char *, int *, float *, int *, int *);
+int BLASFUNC(dlauum)(char *, int *, double *, int *, int *);
+int BLASFUNC(qlauum)(char *, int *, double *, int *, int *);
+int BLASFUNC(clauum)(char *, int *, float *, int *, int *);
+int BLASFUNC(zlauum)(char *, int *, double *, int *, int *);
+int BLASFUNC(xlauum)(char *, int *, double *, int *, int *);
+
+int BLASFUNC(strti2)(char *, char *, int *, float *, int *, int *);
+int BLASFUNC(dtrti2)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(qtrti2)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(ctrti2)(char *, char *, int *, float *, int *, int *);
+int BLASFUNC(ztrti2)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(xtrti2)(char *, char *, int *, double *, int *, int *);
+
+int BLASFUNC(strtri)(char *, char *, int *, float *, int *, int *);
+int BLASFUNC(dtrtri)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(qtrtri)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(ctrtri)(char *, char *, int *, float *, int *, int *);
+int BLASFUNC(ztrtri)(char *, char *, int *, double *, int *, int *);
+int BLASFUNC(xtrtri)(char *, char *, int *, double *, int *, int *);
+
+int BLASFUNC(spotri)(char *, int *, float *, int *, int *);
+int BLASFUNC(dpotri)(char *, int *, double *, int *, int *);
+int BLASFUNC(qpotri)(char *, int *, double *, int *, int *);
+int BLASFUNC(cpotri)(char *, int *, float *, int *, int *);
+int BLASFUNC(zpotri)(char *, int *, double *, int *, int *);
+int BLASFUNC(xpotri)(char *, int *, double *, int *, int *);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
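
The LAPACK-style prototypes above (getf2/getrf, getrs, gesv, potf2/potrf, trtri, potri, ...) all follow the Fortran calling convention: every argument is passed by pointer and matrices are stored column-major. As a rough usage sketch only — it assumes BLASFUNC(x) expands to the conventional trailing-underscore symbol (for example dgetrf_) and that a Fortran LAPACK/BLAS is linked in, neither of which this header by itself guarantees — a small system A*x = b can be solved through the dgetrf/dgetrs pair like this:

// Hypothetical sketch: assumes BLASFUNC(x) resolves to the Fortran symbol x##_
// and that a Fortran LAPACK providing dgetrf/dgetrs is linked in.
#include <cstdio>

extern "C" {
int dgetrf_(int *m, int *n, double *a, int *lda, int *ipiv, int *info);
int dgetrs_(char *trans, int *n, int *nrhs, double *a, int *lda,
            int *ipiv, double *b, int *ldb, int *info);
}

int main() {
  // Column-major 3x3 system A*x = b, as the Fortran interface expects.
  double A[9] = {4, 1, 2,   // column 0
                 1, 3, 0,   // column 1
                 2, 0, 5};  // column 2
  double b[3] = {7, 4, 7};
  int n = 3, nrhs = 1, lda = 3, ldb = 3, ipiv[3], info = 0;
  char trans = 'N';

  dgetrf_(&n, &n, A, &lda, ipiv, &info);          // LU factorization, A overwritten
  if (info == 0)
    dgetrs_(&trans, &n, &nrhs, A, &lda, ipiv, b, &ldb, &info);  // b overwritten with x
  if (info == 0)
    std::printf("x = [%g, %g, %g]\n", b[0], b[1], b[2]);
  return info;
}
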
diff --git a/third_party/eigen3/Eigen/src/plugins/ArrayCwiseBinaryOps.h b/third_party/eigen3/Eigen/src/plugins/ArrayCwiseBinaryOps.h
new file mode 100644
index 0000000000..6e3f674573
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/ArrayCwiseBinaryOps.h
@@ -0,0 +1,241 @@
+/** \returns an expression of the coefficient-wise product of \c *this and \a other
+ *
+ * \sa MatrixBase::cwiseProduct
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const EIGEN_CWISE_PRODUCT_RETURN_TYPE(Derived,OtherDerived)
+operator*(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_PRODUCT_RETURN_TYPE(Derived,OtherDerived)(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise quotient of \c *this and \a other
+ *
+ * \sa MatrixBase::cwiseQuotient
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_quotient_op<Scalar>, const Derived, const OtherDerived>
+operator/(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<internal::scalar_quotient_op<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise min of \c *this and \a other
+ *
+ * Example: \include Cwise_min.cpp
+ * Output: \verbinclude Cwise_min.out
+ *
+ * \sa max()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(min,internal::scalar_min_op)
+
+/** \returns an expression of the coefficient-wise min of \c *this and scalar \a other
+ *
+ * \sa max()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived,
+ const CwiseNullaryOp<internal::scalar_constant_op<Scalar>, PlainObject> >
+#ifdef EIGEN_PARSED_BY_DOXYGEN
+min
+#else
+(min)
+#endif
+(const Scalar &other) const
+{
+ return (min)(Derived::PlainObject::Constant(rows(), cols(), other));
+}
+
+/** \returns an expression of the coefficient-wise max of \c *this and \a other
+ *
+ * Example: \include Cwise_max.cpp
+ * Output: \verbinclude Cwise_max.out
+ *
+ * \sa min()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(max,internal::scalar_max_op)
+
+/** \returns an expression of the coefficient-wise max of \c *this and scalar \a other
+ *
+ * \sa min()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived,
+ const CwiseNullaryOp<internal::scalar_constant_op<Scalar>, PlainObject> >
+#ifdef EIGEN_PARSED_BY_DOXYGEN
+max
+#else
+(max)
+#endif
+(const Scalar &other) const
+{
+ return (max)(Derived::PlainObject::Constant(rows(), cols(), other));
+}
+
+/** \returns an expression of the coefficient-wise \< operator of *this and \a other
+ *
+ * Example: \include Cwise_less.cpp
+ * Output: \verbinclude Cwise_less.out
+ *
+ * \sa all(), any(), operator>(), operator<=()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator<,std::less)
+
+/** \returns an expression of the coefficient-wise \<= operator of *this and \a other
+ *
+ * Example: \include Cwise_less_equal.cpp
+ * Output: \verbinclude Cwise_less_equal.out
+ *
+ * \sa all(), any(), operator>=(), operator<()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator<=,std::less_equal)
+
+/** \returns an expression of the coefficient-wise \> operator of *this and \a other
+ *
+ * Example: \include Cwise_greater.cpp
+ * Output: \verbinclude Cwise_greater.out
+ *
+ * \sa all(), any(), operator>=(), operator<()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator>,std::greater)
+
+/** \returns an expression of the coefficient-wise \>= operator of *this and \a other
+ *
+ * Example: \include Cwise_greater_equal.cpp
+ * Output: \verbinclude Cwise_greater_equal.out
+ *
+ * \sa all(), any(), operator>(), operator<=()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator>=,std::greater_equal)
+
+/** \returns an expression of the coefficient-wise == operator of *this and \a other
+ *
+ * \warning this performs an exact comparison, which is generally a bad idea with floating-point types.
+ * In order to check for equality between two vectors or matrices with floating-point coefficients, it is
+ * generally a far better idea to use a fuzzy comparison as provided by isApprox() and
+ * isMuchSmallerThan().
+ *
+ * Example: \include Cwise_equal_equal.cpp
+ * Output: \verbinclude Cwise_equal_equal.out
+ *
+ * \sa all(), any(), isApprox(), isMuchSmallerThan()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator==,std::equal_to)
+
+/** \returns an expression of the coefficient-wise != operator of *this and \a other
+ *
+ * \warning this performs an exact comparison, which is generally a bad idea with floating-point types.
+ * In order to check for equality between two vectors or matrices with floating-point coefficients, it is
+ * generally a far better idea to use a fuzzy comparison as provided by isApprox() and
+ * isMuchSmallerThan().
+ *
+ * Example: \include Cwise_not_equal.cpp
+ * Output: \verbinclude Cwise_not_equal.out
+ *
+ * \sa all(), any(), isApprox(), isMuchSmallerThan()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator!=,std::not_equal_to)
+
+// scalar addition
+
+/** \returns an expression of \c *this with each coeff incremented by the constant \a scalar
+ *
+ * Example: \include Cwise_plus.cpp
+ * Output: \verbinclude Cwise_plus.out
+ *
+ * \sa operator+=(), operator-()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_add_op<Scalar>, const Derived>
+operator+(const Scalar& scalar) const
+{
+ return CwiseUnaryOp<internal::scalar_add_op<Scalar>, const Derived>(derived(), internal::scalar_add_op<Scalar>(scalar));
+}
+
+EIGEN_DEVICE_FUNC
+friend inline const CwiseUnaryOp<internal::scalar_add_op<Scalar>, const Derived>
+operator+(const Scalar& scalar,const EIGEN_CURRENT_STORAGE_BASE_CLASS<Derived>& other)
+{
+ return other + scalar;
+}
+
+/** \returns an expression of \c *this with each coeff decremented by the constant \a scalar
+ *
+ * Example: \include Cwise_minus.cpp
+ * Output: \verbinclude Cwise_minus.out
+ *
+ * \sa operator+(), operator-=()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_sub_op<Scalar>, const Derived>
+operator-(const Scalar& scalar) const
+{
+ return CwiseUnaryOp<internal::scalar_sub_op<Scalar>, const Derived>(derived(), internal::scalar_sub_op<Scalar>(scalar));
+}
+
+EIGEN_DEVICE_FUNC
+friend inline const CwiseUnaryOp<internal::scalar_rsub_op<Scalar>, const Derived>
+operator-(const Scalar& scalar,const EIGEN_CURRENT_STORAGE_BASE_CLASS<Derived>& other)
+{
+ return CwiseUnaryOp<internal::scalar_rsub_op<Scalar>, const Derived>(other.derived(), internal::scalar_rsub_op<Scalar>(scalar));
+}
+
+/** \returns an expression of the coefficient-wise && operator of *this and \a other
+ *
+ * \warning this operator is for expressions of bool only.
+ *
+ * Example: \include Cwise_boolean_and.cpp
+ * Output: \verbinclude Cwise_boolean_and.out
+ *
+ * \sa operator||(), select()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+inline const CwiseBinaryOp<internal::scalar_boolean_and_op, const Derived, const OtherDerived>
+operator&&(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<bool,Scalar>::value && internal::is_same<bool,typename OtherDerived::Scalar>::value),
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_OF_BOOL);
+ return CwiseBinaryOp<internal::scalar_boolean_and_op, const Derived, const OtherDerived>(derived(),other.derived());
+}
+
+/** \returns an expression of the coefficient-wise || operator of *this and \a other
+ *
+ * \warning this operator is for expressions of bool only.
+ *
+ * Example: \include Cwise_boolean_or.cpp
+ * Output: \verbinclude Cwise_boolean_or.out
+ *
+ * \sa operator&&(), select()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+inline const CwiseBinaryOp<internal::scalar_boolean_or_op, const Derived, const OtherDerived>
+operator||(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<bool,Scalar>::value && internal::is_same<bool,typename OtherDerived::Scalar>::value),
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_OF_BOOL);
+ return CwiseBinaryOp<internal::scalar_boolean_or_op, const Derived, const OtherDerived>(derived(),other.derived());
+}
+
+/** \returns an expression of the coefficient-wise ^ operator of *this and \a other
+ *
+ * \warning this operator is for expressions of bool only.
+ *
+ * Example: \include Cwise_boolean_xor.cpp
+ * Output: \verbinclude Cwise_boolean_xor.out
+ *
+ * \sa operator^(), select()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+inline const CwiseBinaryOp<internal::scalar_boolean_xor_op, const Derived, const OtherDerived>
+operator^(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ EIGEN_STATIC_ASSERT((internal::is_same<bool,Scalar>::value && internal::is_same<bool,typename OtherDerived::Scalar>::value),
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_OF_BOOL);
+ return CwiseBinaryOp<internal::scalar_boolean_xor_op, const Derived, const OtherDerived>(derived(),other.derived());
+}
+
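ArrayCwiseBinaryOps.h is injected into Eigen's ArrayBase, so the operators above become member expressions on Array objects. A minimal usage sketch, assuming a build in which these plugin headers are active; the behaviour shown is standard Eigen array semantics, not anything specific to this copy:

// Illustrative sketch: coefficient-wise binary operators on Eigen arrays.
#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::ArrayXf a(4), b(4);
  a << 1, 2, 3, 4;
  b << 4, 3, 2, 1;

  Eigen::ArrayXf prod    = a * b;       // coefficient-wise product
  Eigen::ArrayXf lo      = a.min(b);    // coefficient-wise min
  Eigen::ArrayXf shifted = a + 10.0f;   // scalar addition (scalar_add_op)

  // Comparisons yield bool-valued expressions, which combine with && / ||.
  Eigen::Array<bool, Eigen::Dynamic, 1> mask = (a > 1.0f) && (b > 1.0f);

  std::cout << "prod: " << prod.transpose()    << "\n"
            << "min:  " << lo.transpose()      << "\n"
            << "a+10: " << shifted.transpose() << "\n"
            << "mask: " << mask.transpose()    << std::endl;
  return 0;
}
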
diff --git a/third_party/eigen3/Eigen/src/plugins/ArrayCwiseUnaryOps.h b/third_party/eigen3/Eigen/src/plugins/ArrayCwiseUnaryOps.h
new file mode 100644
index 0000000000..ea6778c3f5
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/ArrayCwiseUnaryOps.h
@@ -0,0 +1,245 @@
+
+
+/** \returns an expression of the coefficient-wise absolute value of \c *this
+ *
+ * Example: \include Cwise_abs.cpp
+ * Output: \verbinclude Cwise_abs.out
+ *
+ * \sa abs2()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseUnaryOp<internal::scalar_abs_op<Scalar>, const Derived>
+abs() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise squared absolute value of \c *this
+ *
+ * Example: \include Cwise_abs2.cpp
+ * Output: \verbinclude Cwise_abs2.out
+ *
+ * \sa abs(), square()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseUnaryOp<internal::scalar_abs2_op<Scalar>, const Derived>
+abs2() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise exponential of *this.
+ *
+ * Example: \include Cwise_exp.cpp
+ * Output: \verbinclude Cwise_exp.out
+ *
+ * \sa pow(), log(), sin(), cos()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_exp_op<Scalar>, const Derived>
+exp() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise logarithm of *this.
+ *
+ * Example: \include Cwise_log.cpp
+ * Output: \verbinclude Cwise_log.out
+ *
+ * \sa exp()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_log_op<Scalar>, const Derived>
+log() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise square root of *this.
+ *
+ * Example: \include Cwise_sqrt.cpp
+ * Output: \verbinclude Cwise_sqrt.out
+ *
+ * \sa rsqrt(), pow(), square()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_sqrt_op<Scalar>, const Derived>
+sqrt() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise reciprocal square root of *this.
+ *
+ * \sa sqrt(), pow(), square()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_rsqrt_op<Scalar>, const Derived>
+rsqrt() const
+{
+ return derived();
+}
+
+
+/** \returns an expression of the coefficient-wise cosine of *this.
+ *
+ * Example: \include Cwise_cos.cpp
+ * Output: \verbinclude Cwise_cos.out
+ *
+ * \sa sin(), acos()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_cos_op<Scalar>, const Derived>
+cos() const
+{
+ return derived();
+}
+
+
+/** \returns an expression of the coefficient-wise sine of *this.
+ *
+ * Example: \include Cwise_sin.cpp
+ * Output: \verbinclude Cwise_sin.out
+ *
+ * \sa cos(), asin()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_sin_op<Scalar>, const Derived>
+sin() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise arc cosine of *this.
+ *
+ * Example: \include Cwise_acos.cpp
+ * Output: \verbinclude Cwise_acos.out
+ *
+ * \sa cos(), asin()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_acos_op<Scalar>, const Derived>
+acos() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise arc sine of *this.
+ *
+ * Example: \include Cwise_asin.cpp
+ * Output: \verbinclude Cwise_asin.out
+ *
+ * \sa sin(), acos()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_asin_op<Scalar>, const Derived>
+asin() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise tan of *this.
+ *
+ * Example: \include Cwise_tan.cpp
+ * Output: \verbinclude Cwise_tan.out
+ *
+ * \sa cos(), sin()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_tan_op<Scalar>, Derived>
+tan() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise arc tan of *this.
+ *
+ * Example: \include Cwise_atan.cpp
+ * Output: \verbinclude Cwise_atan.out
+ *
+ * \sa cos(), sin(), tan()
+ */
+inline const CwiseUnaryOp<internal::scalar_atan_op<Scalar>, Derived>
+atan() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise power of *this to the given exponent.
+ *
+ * Example: \include Cwise_pow.cpp
+ * Output: \verbinclude Cwise_pow.out
+ *
+ * \sa exp(), log()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_pow_op<Scalar>, const Derived>
+pow(const Scalar& exponent) const
+{
+ return CwiseUnaryOp<internal::scalar_pow_op<Scalar>, const Derived>
+ (derived(), internal::scalar_pow_op<Scalar>(exponent));
+}
+
+
+/** \returns an expression of the coefficient-wise inverse of *this.
+ *
+ * Example: \include Cwise_inverse.cpp
+ * Output: \verbinclude Cwise_inverse.out
+ *
+ * \sa operator/(), operator*()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_inverse_op<Scalar>, const Derived>
+inverse() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise square of *this.
+ *
+ * Example: \include Cwise_square.cpp
+ * Output: \verbinclude Cwise_square.out
+ *
+ * \sa operator/(), operator*(), abs2()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_square_op<Scalar>, const Derived>
+square() const
+{
+ return derived();
+}
+
+/** \returns an expression of the coefficient-wise cube of *this.
+ *
+ * Example: \include Cwise_cube.cpp
+ * Output: \verbinclude Cwise_cube.out
+ *
+ * \sa square(), pow()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_cube_op<Scalar>, const Derived>
+cube() const
+{
+ return derived();
+}
+
+#define EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(METHOD_NAME,FUNCTOR) \
+ EIGEN_DEVICE_FUNC \
+ inline const CwiseUnaryOp<std::binder2nd<FUNCTOR<Scalar> >, const Derived> \
+ METHOD_NAME(const Scalar& s) const { \
+ return CwiseUnaryOp<std::binder2nd<FUNCTOR<Scalar> >, const Derived> \
+ (derived(), std::bind2nd(FUNCTOR<Scalar>(), s)); \
+ } \
+ friend inline const CwiseUnaryOp<std::binder1st<FUNCTOR<Scalar> >, const Derived> \
+ METHOD_NAME(const Scalar& s, const Derived& d) { \
+ return CwiseUnaryOp<std::binder1st<FUNCTOR<Scalar> >, const Derived> \
+ (d, std::bind1st(FUNCTOR<Scalar>(), s)); \
+ }
+
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator==, std::equal_to)
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator!=, std::not_equal_to)
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator<, std::less)
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator<=, std::less_equal)
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator>, std::greater)
+EIGEN_MAKE_SCALAR_CWISE_UNARY_OP(operator>=, std::greater_equal)
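The unary plugin above works the same way: each method returns a lazy CwiseUnaryOp expression that is only evaluated on assignment or printing. A short illustrative sketch of the corresponding Array usage, again assuming a standard Eigen build with these plugins enabled:

// Illustrative sketch: coefficient-wise unary functions on an Eigen array.
#include <iostream>
#include <Eigen/Dense>

int main() {
  Eigen::ArrayXd x(4);
  x << -1.0, 0.25, 4.0, 9.0;

  std::cout << "abs:    " << x.abs().transpose()        << "\n";  // |x|
  std::cout << "square: " << x.square().transpose()     << "\n";  // x^2
  std::cout << "sqrt:   " << x.abs().sqrt().transpose() << "\n";  // sqrt of |x|
  std::cout << "exp:    " << x.exp().transpose()        << "\n";  // e^x
  std::cout << "pow(3): " << x.pow(3.0).transpose()     << "\n";  // x^3
  return 0;
}
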
diff --git a/third_party/eigen3/Eigen/src/plugins/BlockMethods.h b/third_party/eigen3/Eigen/src/plugins/BlockMethods.h
new file mode 100644
index 0000000000..9b7fdc4aa7
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/BlockMethods.h
@@ -0,0 +1,995 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2010 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2010 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+
+/** \internal expression type of a column */
+typedef Block<Derived, internal::traits<Derived>::RowsAtCompileTime, 1, !IsRowMajor> ColXpr;
+typedef const Block<const Derived, internal::traits<Derived>::RowsAtCompileTime, 1, !IsRowMajor> ConstColXpr;
+/** \internal expression type of a row */
+typedef Block<Derived, 1, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> RowXpr;
+typedef const Block<const Derived, 1, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> ConstRowXpr;
+/** \internal expression type of a block of whole columns */
+typedef Block<Derived, internal::traits<Derived>::RowsAtCompileTime, Dynamic, !IsRowMajor> ColsBlockXpr;
+typedef const Block<const Derived, internal::traits<Derived>::RowsAtCompileTime, Dynamic, !IsRowMajor> ConstColsBlockXpr;
+/** \internal expression type of a block of whole rows */
+typedef Block<Derived, Dynamic, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> RowsBlockXpr;
+typedef const Block<const Derived, Dynamic, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> ConstRowsBlockXpr;
+/** \internal expression type of a block of whole columns */
+template<int N> struct NColsBlockXpr { typedef Block<Derived, internal::traits<Derived>::RowsAtCompileTime, N, !IsRowMajor> Type; };
+template<int N> struct ConstNColsBlockXpr { typedef const Block<const Derived, internal::traits<Derived>::RowsAtCompileTime, N, !IsRowMajor> Type; };
+/** \internal expression type of a block of whole rows */
+template<int N> struct NRowsBlockXpr { typedef Block<Derived, N, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> Type; };
+template<int N> struct ConstNRowsBlockXpr { typedef const Block<const Derived, N, internal::traits<Derived>::ColsAtCompileTime, IsRowMajor> Type; };
+
+typedef VectorBlock<Derived> SegmentReturnType;
+typedef const VectorBlock<const Derived> ConstSegmentReturnType;
+template<int Size> struct FixedSegmentReturnType { typedef VectorBlock<Derived, Size> Type; };
+template<int Size> struct ConstFixedSegmentReturnType { typedef const VectorBlock<const Derived, Size> Type; };
+
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+/** \returns a dynamic-size expression of a block in *this.
+ *
+ * \param startRow the first row in the block
+ * \param startCol the first column in the block
+ * \param blockRows the number of rows in the block
+ * \param blockCols the number of columns in the block
+ *
+ * Example: \include MatrixBase_block_int_int_int_int.cpp
+ * Output: \verbinclude MatrixBase_block_int_int_int_int.out
+ *
+ * \note Even though the returned expression has dynamic size, in the case
+ * when it is applied to a fixed-size matrix, it inherits a fixed maximal size,
+ * which means that evaluating it does not cause a dynamic memory allocation.
+ *
+ * \sa class Block, block(Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline Block<Derived> block(Index startRow, Index startCol, Index blockRows, Index blockCols)
+{
+ return Block<Derived>(derived(), startRow, startCol, blockRows, blockCols);
+}
+
+/** This is the const version of block(Index,Index,Index,Index). */
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived> block(Index startRow, Index startCol, Index blockRows, Index blockCols) const
+{
+ return Block<const Derived>(derived(), startRow, startCol, blockRows, blockCols);
+}
+
+
+
+
+/** \returns a dynamic-size expression of a top-right corner of *this.
+ *
+ * \param cRows the number of rows in the corner
+ * \param cCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_topRightCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_topRightCorner_int_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline Block<Derived> topRightCorner(Index cRows, Index cCols)
+{
+ return Block<Derived>(derived(), 0, cols() - cCols, cRows, cCols);
+}
+
+/** This is the const version of topRightCorner(Index, Index).*/
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived> topRightCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived>(derived(), 0, cols() - cCols, cRows, cCols);
+}
+
+/** \returns an expression of a fixed-size top-right corner of *this.
+ *
+ * \tparam CRows the number of rows in the corner
+ * \tparam CCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_template_int_int_topRightCorner.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_topRightCorner.out
+ *
+ * \sa class Block, block<int,int>(Index,Index)
+ */
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline Block<Derived, CRows, CCols> topRightCorner()
+{
+ return Block<Derived, CRows, CCols>(derived(), 0, cols() - CCols);
+}
+
+/** This is the const version of topRightCorner<int, int>().*/
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived, CRows, CCols> topRightCorner() const
+{
+ return Block<const Derived, CRows, CCols>(derived(), 0, cols() - CCols);
+}
+
+/** \returns an expression of a top-right corner of *this.
+ *
+ * \tparam CRows number of rows in corner as specified at compile-time
+ * \tparam CCols number of columns in corner as specified at compile-time
+ * \param cRows number of rows in corner as specified at run-time
+ * \param cCols number of columns in corner as specified at run-time
+ *
+ * This function is mainly useful for corners where the number of rows is specified at compile-time
+ * and the number of columns is specified at run-time, or vice versa. The compile-time and run-time
+ * information should not contradict. In other words, \a cRows should equal \a CRows unless
+ * \a CRows is \a Dynamic, and the same for the number of columns.
+ *
+ * Example: \include MatrixBase_template_int_int_topRightCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_topRightCorner_int_int.out
+ *
+ * \sa class Block
+ */
+template<int CRows, int CCols>
+inline Block<Derived, CRows, CCols> topRightCorner(Index cRows, Index cCols)
+{
+ return Block<Derived, CRows, CCols>(derived(), 0, cols() - cCols, cRows, cCols);
+}
+
+/** This is the const version of topRightCorner<int, int>(Index, Index).*/
+template<int CRows, int CCols>
+inline const Block<const Derived, CRows, CCols> topRightCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived, CRows, CCols>(derived(), 0, cols() - cCols, cRows, cCols);
+}
+
+
+
+/** \returns a dynamic-size expression of a top-left corner of *this.
+ *
+ * \param cRows the number of rows in the corner
+ * \param cCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_topLeftCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_topLeftCorner_int_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline Block<Derived> topLeftCorner(Index cRows, Index cCols)
+{
+ return Block<Derived>(derived(), 0, 0, cRows, cCols);
+}
+
+/** This is the const version of topLeftCorner(Index, Index).*/
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived> topLeftCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived>(derived(), 0, 0, cRows, cCols);
+}
+
+/** \returns an expression of a fixed-size top-left corner of *this.
+ *
+ * The template parameters CRows and CCols are the number of rows and columns in the corner.
+ *
+ * Example: \include MatrixBase_template_int_int_topLeftCorner.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_topLeftCorner.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline Block<Derived, CRows, CCols> topLeftCorner()
+{
+ return Block<Derived, CRows, CCols>(derived(), 0, 0);
+}
+
+/** This is the const version of topLeftCorner<int, int>().*/
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived, CRows, CCols> topLeftCorner() const
+{
+ return Block<const Derived, CRows, CCols>(derived(), 0, 0);
+}
+
+/** \returns an expression of a top-left corner of *this.
+ *
+ * \tparam CRows number of rows in corner as specified at compile-time
+ * \tparam CCols number of columns in corner as specified at compile-time
+ * \param cRows number of rows in corner as specified at run-time
+ * \param cCols number of columns in corner as specified at run-time
+ *
+ * This function is mainly useful for corners where the number of rows is specified at compile-time
+ * and the number of columns is specified at run-time, or vice versa. The compile-time and run-time
+ * information should not contradict. In other words, \a cRows should equal \a CRows unless
+ * \a CRows is \a Dynamic, and the same for the number of columns.
+ *
+ * Example: \include MatrixBase_template_int_int_topLeftCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_topLeftCorner_int_int.out
+ *
+ * \sa class Block
+ */
+template<int CRows, int CCols>
+inline Block<Derived, CRows, CCols> topLeftCorner(Index cRows, Index cCols)
+{
+ return Block<Derived, CRows, CCols>(derived(), 0, 0, cRows, cCols);
+}
+
+/** This is the const version of topLeftCorner<int, int>(Index, Index).*/
+template<int CRows, int CCols>
+inline const Block<const Derived, CRows, CCols> topLeftCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived, CRows, CCols>(derived(), 0, 0, cRows, cCols);
+}
+
+
+
+/** \returns a dynamic-size expression of a bottom-right corner of *this.
+ *
+ * \param cRows the number of rows in the corner
+ * \param cCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_bottomRightCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_bottomRightCorner_int_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline Block<Derived> bottomRightCorner(Index cRows, Index cCols)
+{
+ return Block<Derived>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+}
+
+/** This is the const version of bottomRightCorner(Index, Index).*/
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived> bottomRightCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+}
+
+/** \returns an expression of a fixed-size bottom-right corner of *this.
+ *
+ * The template parameters CRows and CCols are the number of rows and columns in the corner.
+ *
+ * Example: \include MatrixBase_template_int_int_bottomRightCorner.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_bottomRightCorner.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline Block<Derived, CRows, CCols> bottomRightCorner()
+{
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, cols() - CCols);
+}
+
+/** This is the const version of bottomRightCorner<int, int>().*/
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived, CRows, CCols> bottomRightCorner() const
+{
+ return Block<const Derived, CRows, CCols>(derived(), rows() - CRows, cols() - CCols);
+}
+
+/** \returns an expression of a bottom-right corner of *this.
+ *
+ * \tparam CRows number of rows in corner as specified at compile-time
+ * \tparam CCols number of columns in corner as specified at compile-time
+ * \param cRows number of rows in corner as specified at run-time
+ * \param cCols number of columns in corner as specified at run-time
+ *
+ * This function is mainly useful for corners where the number of rows is specified at compile-time
+ * and the number of columns is specified at run-time, or vice versa. The compile-time and run-time
+ * information should not contradict. In other words, \a cRows should equal \a CRows unless
+ * \a CRows is \a Dynamic, and the same for the number of columns.
+ *
+ * Example: \include MatrixBase_template_int_int_bottomRightCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_bottomRightCorner_int_int.out
+ *
+ * \sa class Block
+ */
+template<int CRows, int CCols>
+inline Block<Derived, CRows, CCols> bottomRightCorner(Index cRows, Index cCols)
+{
+ return Block<Derived, CRows, CCols>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+}
+
+/** This is the const version of bottomRightCorner<int, int>(Index, Index).*/
+template<int CRows, int CCols>
+inline const Block<const Derived, CRows, CCols> bottomRightCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived, CRows, CCols>(derived(), rows() - cRows, cols() - cCols, cRows, cCols);
+}
+
+
+
+/** \returns a dynamic-size expression of a bottom-left corner of *this.
+ *
+ * \param cRows the number of rows in the corner
+ * \param cCols the number of columns in the corner
+ *
+ * Example: \include MatrixBase_bottomLeftCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_bottomLeftCorner_int_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline Block<Derived> bottomLeftCorner(Index cRows, Index cCols)
+{
+ return Block<Derived>(derived(), rows() - cRows, 0, cRows, cCols);
+}
+
+/** This is the const version of bottomLeftCorner(Index, Index).*/
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived> bottomLeftCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived>(derived(), rows() - cRows, 0, cRows, cCols);
+}
+
+/** \returns an expression of a fixed-size bottom-left corner of *this.
+ *
+ * The template parameters CRows and CCols are the number of rows and columns in the corner.
+ *
+ * Example: \include MatrixBase_template_int_int_bottomLeftCorner.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_bottomLeftCorner.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline Block<Derived, CRows, CCols> bottomLeftCorner()
+{
+ return Block<Derived, CRows, CCols>(derived(), rows() - CRows, 0);
+}
+
+/** This is the const version of bottomLeftCorner<int, int>().*/
+template<int CRows, int CCols>
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived, CRows, CCols> bottomLeftCorner() const
+{
+ return Block<const Derived, CRows, CCols>(derived(), rows() - CRows, 0);
+}
+
+/** \returns an expression of a bottom-left corner of *this.
+ *
+ * \tparam CRows number of rows in corner as specified at compile-time
+ * \tparam CCols number of columns in corner as specified at compile-time
+ * \param cRows number of rows in corner as specified at run-time
+ * \param cCols number of columns in corner as specified at run-time
+ *
+ * This function is mainly useful for corners where the number of rows is specified at compile-time
+ * and the number of columns is specified at run-time, or vice versa. The compile-time and run-time
+ * information should not contradict. In other words, \a cRows should equal \a CRows unless
+ * \a CRows is \a Dynamic, and the same for the number of columns.
+ *
+ * Example: \include MatrixBase_template_int_int_bottomLeftCorner_int_int.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_bottomLeftCorner_int_int.out
+ *
+ * \sa class Block
+ */
+template<int CRows, int CCols>
+inline Block<Derived, CRows, CCols> bottomLeftCorner(Index cRows, Index cCols)
+{
+ return Block<Derived, CRows, CCols>(derived(), rows() - cRows, 0, cRows, cCols);
+}
+
+/** This is the const version of bottomLeftCorner<int, int>(Index, Index).*/
+template<int CRows, int CCols>
+inline const Block<const Derived, CRows, CCols> bottomLeftCorner(Index cRows, Index cCols) const
+{
+ return Block<const Derived, CRows, CCols>(derived(), rows() - cRows, 0, cRows, cCols);
+}
+
+
+
+/** \returns a block consisting of the top rows of *this.
+ *
+ * \param n the number of rows in the block
+ *
+ * Example: \include MatrixBase_topRows_int.cpp
+ * Output: \verbinclude MatrixBase_topRows_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline RowsBlockXpr topRows(Index n)
+{
+ return RowsBlockXpr(derived(), 0, 0, n, cols());
+}
+
+/** This is the const version of topRows(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstRowsBlockXpr topRows(Index n) const
+{
+ return ConstRowsBlockXpr(derived(), 0, 0, n, cols());
+}
+
+/** \returns a block consisting of the top rows of *this.
+ *
+ * \tparam N the number of rows in the block as specified at compile-time
+ * \param n the number of rows in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_topRows.cpp
+ * Output: \verbinclude MatrixBase_template_int_topRows.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NRowsBlockXpr<N>::Type topRows(Index n = N)
+{
+ return typename NRowsBlockXpr<N>::Type(derived(), 0, 0, n, cols());
+}
+
+/** This is the const version of topRows<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNRowsBlockXpr<N>::Type topRows(Index n = N) const
+{
+ return typename ConstNRowsBlockXpr<N>::Type(derived(), 0, 0, n, cols());
+}
+
+
+
+/** \returns a block consisting of the bottom rows of *this.
+ *
+ * \param n the number of rows in the block
+ *
+ * Example: \include MatrixBase_bottomRows_int.cpp
+ * Output: \verbinclude MatrixBase_bottomRows_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline RowsBlockXpr bottomRows(Index n)
+{
+ return RowsBlockXpr(derived(), rows() - n, 0, n, cols());
+}
+
+/** This is the const version of bottomRows(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstRowsBlockXpr bottomRows(Index n) const
+{
+ return ConstRowsBlockXpr(derived(), rows() - n, 0, n, cols());
+}
+
+/** \returns a block consisting of the bottom rows of *this.
+ *
+ * \tparam N the number of rows in the block as specified at compile-time
+ * \param n the number of rows in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_bottomRows.cpp
+ * Output: \verbinclude MatrixBase_template_int_bottomRows.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NRowsBlockXpr<N>::Type bottomRows(Index n = N)
+{
+ return typename NRowsBlockXpr<N>::Type(derived(), rows() - n, 0, n, cols());
+}
+
+/** This is the const version of bottomRows<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNRowsBlockXpr<N>::Type bottomRows(Index n = N) const
+{
+ return typename ConstNRowsBlockXpr<N>::Type(derived(), rows() - n, 0, n, cols());
+}
+
+
+
+/** \returns a block consisting of a range of rows of *this.
+ *
+ * \param startRow the index of the first row in the block
+ * \param n the number of rows in the block
+ *
+ * Example: \include DenseBase_middleRows_int.cpp
+ * Output: \verbinclude DenseBase_middleRows_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline RowsBlockXpr middleRows(Index startRow, Index n)
+{
+ return RowsBlockXpr(derived(), startRow, 0, n, cols());
+}
+
+/** This is the const version of middleRows(Index,Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstRowsBlockXpr middleRows(Index startRow, Index n) const
+{
+ return ConstRowsBlockXpr(derived(), startRow, 0, n, cols());
+}
+
+/** \returns a block consisting of a range of rows of *this.
+ *
+ * \tparam N the number of rows in the block as specified at compile-time
+ * \param startRow the index of the first row in the block
+ * \param n the number of rows in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include DenseBase_template_int_middleRows.cpp
+ * Output: \verbinclude DenseBase_template_int_middleRows.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NRowsBlockXpr<N>::Type middleRows(Index startRow, Index n = N)
+{
+ return typename NRowsBlockXpr<N>::Type(derived(), startRow, 0, n, cols());
+}
+
+/** This is the const version of middleRows<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNRowsBlockXpr<N>::Type middleRows(Index startRow, Index n = N) const
+{
+ return typename ConstNRowsBlockXpr<N>::Type(derived(), startRow, 0, n, cols());
+}
+
+
+
+/** \returns a block consisting of the left columns of *this.
+ *
+ * \param n the number of columns in the block
+ *
+ * Example: \include MatrixBase_leftCols_int.cpp
+ * Output: \verbinclude MatrixBase_leftCols_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline ColsBlockXpr leftCols(Index n)
+{
+ return ColsBlockXpr(derived(), 0, 0, rows(), n);
+}
+
+/** This is the const version of leftCols(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstColsBlockXpr leftCols(Index n) const
+{
+ return ConstColsBlockXpr(derived(), 0, 0, rows(), n);
+}
+
+/** \returns a block consisting of the left columns of *this.
+ *
+ * \tparam N the number of columns in the block as specified at compile-time
+ * \param n the number of columns in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_leftCols.cpp
+ * Output: \verbinclude MatrixBase_template_int_leftCols.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NColsBlockXpr<N>::Type leftCols(Index n = N)
+{
+ return typename NColsBlockXpr<N>::Type(derived(), 0, 0, rows(), n);
+}
+
+/** This is the const version of leftCols<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNColsBlockXpr<N>::Type leftCols(Index n = N) const
+{
+ return typename ConstNColsBlockXpr<N>::Type(derived(), 0, 0, rows(), n);
+}
+
+
+
+/** \returns a block consisting of the right columns of *this.
+ *
+ * \param n the number of columns in the block
+ *
+ * Example: \include MatrixBase_rightCols_int.cpp
+ * Output: \verbinclude MatrixBase_rightCols_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline ColsBlockXpr rightCols(Index n)
+{
+ return ColsBlockXpr(derived(), 0, cols() - n, rows(), n);
+}
+
+/** This is the const version of rightCols(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstColsBlockXpr rightCols(Index n) const
+{
+ return ConstColsBlockXpr(derived(), 0, cols() - n, rows(), n);
+}
+
+/** \returns a block consisting of the right columns of *this.
+ *
+ * \tparam N the number of columns in the block as specified at compile-time
+ * \param n the number of columns in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_rightCols.cpp
+ * Output: \verbinclude MatrixBase_template_int_rightCols.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NColsBlockXpr<N>::Type rightCols(Index n = N)
+{
+ return typename NColsBlockXpr<N>::Type(derived(), 0, cols() - n, rows(), n);
+}
+
+/** This is the const version of rightCols<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNColsBlockXpr<N>::Type rightCols(Index n = N) const
+{
+ return typename ConstNColsBlockXpr<N>::Type(derived(), 0, cols() - n, rows(), n);
+}
+
+
+
+/** \returns a block consisting of a range of columns of *this.
+ *
+ * \param startCol the index of the first column in the block
+ * \param numCols the number of columns in the block
+ *
+ * Example: \include DenseBase_middleCols_int.cpp
+ * Output: \verbinclude DenseBase_middleCols_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline ColsBlockXpr middleCols(Index startCol, Index numCols)
+{
+ return ColsBlockXpr(derived(), 0, startCol, rows(), numCols);
+}
+
+/** This is the const version of middleCols(Index,Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstColsBlockXpr middleCols(Index startCol, Index numCols) const
+{
+ return ConstColsBlockXpr(derived(), 0, startCol, rows(), numCols);
+}
+
+/** \returns a block consisting of a range of columns of *this.
+ *
+ * \tparam N the number of columns in the block as specified at compile-time
+ * \param startCol the index of the first column in the block
+ * \param n the number of columns in the block as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include DenseBase_template_int_middleCols.cpp
+ * Output: \verbinclude DenseBase_template_int_middleCols.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename NColsBlockXpr<N>::Type middleCols(Index startCol, Index n = N)
+{
+ return typename NColsBlockXpr<N>::Type(derived(), 0, startCol, rows(), n);
+}
+
+/** This is the const version of middleCols<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstNColsBlockXpr<N>::Type middleCols(Index startCol, Index n = N) const
+{
+ return typename ConstNColsBlockXpr<N>::Type(derived(), 0, startCol, rows(), n);
+}
+
+
+
+/** \returns a fixed-size expression of a block in *this.
+ *
+ * The template parameters \a BlockRows and \a BlockCols are the number of
+ * rows and columns in the block.
+ *
+ * \param startRow the first row in the block
+ * \param startCol the first column in the block
+ *
+ * Example: \include MatrixBase_block_int_int.cpp
+ * Output: \verbinclude MatrixBase_block_int_int.out
+ *
+ * \note since block is a templated member, the keyword template has to be used
+ * if the matrix type is also a template parameter: \code m.template block<3,3>(1,1); \endcode
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int BlockRows, int BlockCols>
+EIGEN_DEVICE_FUNC
+inline Block<Derived, BlockRows, BlockCols> block(Index startRow, Index startCol)
+{
+ return Block<Derived, BlockRows, BlockCols>(derived(), startRow, startCol);
+}
+
+/** This is the const version of block<>(Index, Index). */
+template<int BlockRows, int BlockCols>
+EIGEN_DEVICE_FUNC
+inline const Block<const Derived, BlockRows, BlockCols> block(Index startRow, Index startCol) const
+{
+ return Block<const Derived, BlockRows, BlockCols>(derived(), startRow, startCol);
+}
+
+/** \returns an expression of a block in *this.
+ *
+ * \tparam BlockRows number of rows in block as specified at compile-time
+ * \tparam BlockCols number of columns in block as specified at compile-time
+ * \param startRow the first row in the block
+ * \param startCol the first column in the block
+ * \param blockRows number of rows in block as specified at run-time
+ * \param blockCols number of columns in block as specified at run-time
+ *
+ * This function is mainly useful for blocks where the number of rows is specified at compile-time
+ * and the number of columns is specified at run-time, or vice versa. The compile-time and run-time
+ * information should not contradict. In other words, \a blockRows should equal \a BlockRows unless
+ * \a BlockRows is \a Dynamic, and the same for the number of columns.
+ *
+ * Example: \include MatrixBase_template_int_int_block_int_int_int_int.cpp
+ * Output: \verbinclude MatrixBase_template_int_int_block_int_int_int_int.out
+ *
+ * \sa class Block, block(Index,Index,Index,Index)
+ */
+template<int BlockRows, int BlockCols>
+inline Block<Derived, BlockRows, BlockCols> block(Index startRow, Index startCol,
+ Index blockRows, Index blockCols)
+{
+ return Block<Derived, BlockRows, BlockCols>(derived(), startRow, startCol, blockRows, blockCols);
+}
+
+/** This is the const version of block<>(Index, Index, Index, Index). */
+template<int BlockRows, int BlockCols>
+inline const Block<const Derived, BlockRows, BlockCols> block(Index startRow, Index startCol,
+ Index blockRows, Index blockCols) const
+{
+ return Block<const Derived, BlockRows, BlockCols>(derived(), startRow, startCol, blockRows, blockCols);
+}
+
+/** \returns an expression of the \a i-th column of *this. Note that the numbering starts at 0.
+ *
+ * Example: \include MatrixBase_col.cpp
+ * Output: \verbinclude MatrixBase_col.out
+ *
+ * \sa row(), class Block */
+EIGEN_DEVICE_FUNC
+inline ColXpr col(Index i)
+{
+ return ColXpr(derived(), i);
+}
+
+/** This is the const version of col(). */
+EIGEN_DEVICE_FUNC
+inline ConstColXpr col(Index i) const
+{
+ return ConstColXpr(derived(), i);
+}
+
+/** \returns an expression of the \a i-th row of *this. Note that the numbering starts at 0.
+ *
+ * Example: \include MatrixBase_row.cpp
+ * Output: \verbinclude MatrixBase_row.out
+ *
+ * \sa col(), class Block */
+EIGEN_DEVICE_FUNC
+inline RowXpr row(Index i)
+{
+ return RowXpr(derived(), i);
+}
+
+/** This is the const version of row(). */
+EIGEN_DEVICE_FUNC
+inline ConstRowXpr row(Index i) const
+{
+ return ConstRowXpr(derived(), i);
+}
+
+/** \returns a dynamic-size expression of a segment (i.e. a vector block) in *this.
+ *
+ * \only_for_vectors
+ *
+ * \param start the first coefficient in the segment
+ * \param n the number of coefficients in the segment
+ *
+ * Example: \include MatrixBase_segment_int_int.cpp
+ * Output: \verbinclude MatrixBase_segment_int_int.out
+ *
+ * \note Even though the returned expression has dynamic size, in the case
+ * when it is applied to a fixed-size vector, it inherits a fixed maximal size,
+ * which means that evaluating it does not cause a dynamic memory allocation.
+ *
+ * \sa class Block, segment(Index)
+ */
+EIGEN_DEVICE_FUNC
+inline SegmentReturnType segment(Index start, Index n)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return SegmentReturnType(derived(), start, n);
+}
+
+
+/** This is the const version of segment(Index,Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstSegmentReturnType segment(Index start, Index n) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return ConstSegmentReturnType(derived(), start, n);
+}
+
+/** \returns a dynamic-size expression of the first coefficients of *this.
+ *
+ * \only_for_vectors
+ *
+ * \param n the number of coefficients in the segment
+ *
+ * Example: \include MatrixBase_start_int.cpp
+ * Output: \verbinclude MatrixBase_start_int.out
+ *
+ * \note Even though the returned expression has dynamic size, in the case
+ * when it is applied to a fixed-size vector, it inherits a fixed maximal size,
+ * which means that evaluating it does not cause a dynamic memory allocation.
+ *
+ * \sa class Block, block(Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline SegmentReturnType head(Index n)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return SegmentReturnType(derived(), 0, n);
+}
+
+/** This is the const version of head(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstSegmentReturnType head(Index n) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return ConstSegmentReturnType(derived(), 0, n);
+}
+
+/** \returns a dynamic-size expression of the last coefficients of *this.
+ *
+ * \only_for_vectors
+ *
+ * \param n the number of coefficients in the segment
+ *
+ * Example: \include MatrixBase_end_int.cpp
+ * Output: \verbinclude MatrixBase_end_int.out
+ *
+ * \note Even though the returned expression has dynamic size, in the case
+ * when it is applied to a fixed-size vector, it inherits a fixed maximal size,
+ * which means that evaluating it does not cause a dynamic memory allocation.
+ *
+ * \sa class Block, block(Index,Index)
+ */
+EIGEN_DEVICE_FUNC
+inline SegmentReturnType tail(Index n)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return SegmentReturnType(derived(), this->size() - n, n);
+}
+
+/** This is the const version of tail(Index).*/
+EIGEN_DEVICE_FUNC
+inline ConstSegmentReturnType tail(Index n) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return ConstSegmentReturnType(derived(), this->size() - n, n);
+}
+
+/** \returns a fixed-size expression of a segment (i.e. a vector block) in \c *this
+ *
+ * \only_for_vectors
+ *
+ * \tparam N the number of coefficients in the segment as specified at compile-time
+ * \param start the index of the first element in the segment
+ * \param n the number of coefficients in the segment as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_segment.cpp
+ * Output: \verbinclude MatrixBase_template_int_segment.out
+ *
+ * \sa class Block
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename FixedSegmentReturnType<N>::Type segment(Index start, Index n = N)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename FixedSegmentReturnType<N>::Type(derived(), start, n);
+}
+
+/** This is the const version of segment<int>(Index).*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstFixedSegmentReturnType<N>::Type segment(Index start, Index n = N) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename ConstFixedSegmentReturnType<N>::Type(derived(), start, n);
+}
+
+/** \returns a fixed-size expression of the first coefficients of *this.
+ *
+ * \only_for_vectors
+ *
+ * \tparam N the number of coefficients in the segment as specified at compile-time
+ * \param n the number of coefficients in the segment as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_start.cpp
+ * Output: \verbinclude MatrixBase_template_int_start.out
+ *
+ * \sa class Block
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename FixedSegmentReturnType<N>::Type head(Index n = N)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename FixedSegmentReturnType<N>::Type(derived(), 0, n);
+}
+
+/** This is the const version of head<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstFixedSegmentReturnType<N>::Type head(Index n = N) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename ConstFixedSegmentReturnType<N>::Type(derived(), 0, n);
+}
+
+/** \returns a fixed-size expression of the last coefficients of *this.
+ *
+ * \only_for_vectors
+ *
+ * \tparam N the number of coefficients in the segment as specified at compile-time
+ * \param n the number of coefficients in the segment as specified at run-time
+ *
+ * The compile-time and run-time information should not contradict. In other words,
+ * \a n should equal \a N unless \a N is \a Dynamic.
+ *
+ * Example: \include MatrixBase_template_int_end.cpp
+ * Output: \verbinclude MatrixBase_template_int_end.out
+ *
+ * \sa class Block
+ */
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename FixedSegmentReturnType<N>::Type tail(Index n = N)
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename FixedSegmentReturnType<N>::Type(derived(), size() - n);
+}
+
+/** This is the const version of tail<int>().*/
+template<int N>
+EIGEN_DEVICE_FUNC
+inline typename ConstFixedSegmentReturnType<N>::Type tail(Index n = N) const
+{
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived)
+ return typename ConstFixedSegmentReturnType<N>::Type(derived(), size() - n, n);
+}
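
A minimal usage sketch of the segment/head/tail methods declared above (illustrative values only; this snippet is not part of the vendored file):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::VectorXd v(6);
      v << 1, 2, 3, 4, 5, 6;

      std::cout << v.head(2).transpose()       << "\n";  // dynamic-size head: 1 2
      std::cout << v.tail<3>().transpose()     << "\n";  // fixed-size tail:   4 5 6
      std::cout << v.segment(1, 4).transpose() << "\n";  // 4 coefficients from index 1: 2 3 4 5
      std::cout << v.segment<2>(3).transpose() << "\n";  // fixed-size segment at index 3: 4 5
    }
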
diff --git a/third_party/eigen3/Eigen/src/plugins/CommonCwiseBinaryOps.h b/third_party/eigen3/Eigen/src/plugins/CommonCwiseBinaryOps.h
new file mode 100644
index 0000000000..a8fa287c90
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/CommonCwiseBinaryOps.h
@@ -0,0 +1,47 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// This file is a base class plugin containing common coefficient-wise functions.
+
+/** \returns an expression of the difference of \c *this and \a other
+ *
+ * \note If you want to subtract a given scalar from all coefficients, see Cwise::operator-().
+ *
+ * \sa class CwiseBinaryOp, operator-=()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator-,internal::scalar_difference_op)
+
+/** \returns an expression of the sum of \c *this and \a other
+ *
+ * \note If you want to add a given scalar to all coefficients, see Cwise::operator+().
+ *
+ * \sa class CwiseBinaryOp, operator+=()
+ */
+EIGEN_MAKE_CWISE_BINARY_OP(operator+,internal::scalar_sum_op)
+
+/** \returns an expression of a custom coefficient-wise operator \a func of *this and \a other
+ *
+ * The template parameter \a CustomBinaryOp is the type of the functor
+ * of the custom operator (see class CwiseBinaryOp for an example)
+ *
+ * Here is an example illustrating the use of custom functors:
+ * \include class_CwiseBinaryOp.cpp
+ * Output: \verbinclude class_CwiseBinaryOp.out
+ *
+ * \sa class CwiseBinaryOp, operator+(), operator-(), cwiseProduct()
+ */
+template<typename CustomBinaryOp, typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<CustomBinaryOp, const Derived, const OtherDerived>
+binaryExpr(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other, const CustomBinaryOp& func = CustomBinaryOp()) const
+{
+ return CwiseBinaryOp<CustomBinaryOp, const Derived, const OtherDerived>(derived(), other.derived(), func);
+}
+
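
A sketch of binaryExpr() from the plugin above, using a hypothetical functor MaxAbs (illustrative only; not part of the vendored file). The result_type typedef keeps the functor compatible with older Eigen return-type deduction:

    #include <Eigen/Dense>
    #include <algorithm>
    #include <cmath>
    #include <iostream>

    // Hypothetical functor: coefficient-wise maximum of absolute values.
    struct MaxAbs {
      typedef double result_type;
      double operator()(double a, double b) const {
        return std::max(std::abs(a), std::abs(b));
      }
    };

    int main() {
      Eigen::Matrix2d a, b;
      a <<  1, -4,
            2,  0;
      b << -3,  1,
            5, -2;

      // The expression is lazy; it is evaluated when assigned to c.
      Eigen::Matrix2d c = a.binaryExpr(b, MaxAbs());
      std::cout << c << "\n";  // 3 4
                               // 5 2
    }
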
diff --git a/third_party/eigen3/Eigen/src/plugins/CommonCwiseUnaryOps.h b/third_party/eigen3/Eigen/src/plugins/CommonCwiseUnaryOps.h
new file mode 100644
index 0000000000..aa20215745
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/CommonCwiseUnaryOps.h
@@ -0,0 +1,201 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// This file is a base class plugin containing common coefficient-wise functions.
+
+#ifndef EIGEN_PARSED_BY_DOXYGEN
+
+/** \internal Represents a scalar multiple of an expression */
+typedef CwiseUnaryOp<internal::scalar_multiple_op<Scalar>, const Derived> ScalarMultipleReturnType;
+/** \internal Represents a quotient of an expression by a scalar*/
+typedef CwiseUnaryOp<internal::scalar_quotient1_op<Scalar>, const Derived> ScalarQuotient1ReturnType;
+/** \internal the return type of conjugate() */
+typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ const CwiseUnaryOp<internal::scalar_conjugate_op<Scalar>, const Derived>,
+ const Derived&
+ >::type ConjugateReturnType;
+/** \internal the return type of real() const */
+typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ const CwiseUnaryOp<internal::scalar_real_op<Scalar>, const Derived>,
+ const Derived&
+ >::type RealReturnType;
+/** \internal the return type of real() */
+typedef typename internal::conditional<NumTraits<Scalar>::IsComplex,
+ CwiseUnaryView<internal::scalar_real_ref_op<Scalar>, Derived>,
+ Derived&
+ >::type NonConstRealReturnType;
+/** \internal the return type of imag() const */
+typedef CwiseUnaryOp<internal::scalar_imag_op<Scalar>, const Derived> ImagReturnType;
+/** \internal the return type of imag() */
+typedef CwiseUnaryView<internal::scalar_imag_ref_op<Scalar>, Derived> NonConstImagReturnType;
+
+#endif // not EIGEN_PARSED_BY_DOXYGEN
+
+/** \returns an expression of the opposite of \c *this
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_opposite_op<typename internal::traits<Derived>::Scalar>, const Derived>
+operator-() const { return derived(); }
+
+
+/** \returns an expression of \c *this scaled by the scalar factor \a scalar */
+EIGEN_DEVICE_FUNC
+inline const ScalarMultipleReturnType
+operator*(const Scalar& scalar) const
+{
+ return CwiseUnaryOp<internal::scalar_multiple_op<Scalar>, const Derived>
+ (derived(), internal::scalar_multiple_op<Scalar>(scalar));
+}
+
+#ifdef EIGEN_PARSED_BY_DOXYGEN
+const ScalarMultipleReturnType operator*(const RealScalar& scalar) const;
+#endif
+
+/** \returns an expression of \c *this divided by the scalar value \a scalar */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_quotient1_op<typename internal::traits<Derived>::Scalar>, const Derived>
+operator/(const Scalar& scalar) const
+{
+ return CwiseUnaryOp<internal::scalar_quotient1_op<Scalar>, const Derived>
+ (derived(), internal::scalar_quotient1_op<Scalar>(scalar));
+}
+
+/** Overloaded for efficient real matrix times complex scalar value */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_multiple2_op<Scalar,std::complex<Scalar> >, const Derived>
+operator*(const std::complex<Scalar>& scalar) const
+{
+ return CwiseUnaryOp<internal::scalar_multiple2_op<Scalar,std::complex<Scalar> >, const Derived>
+ (*static_cast<const Derived*>(this), internal::scalar_multiple2_op<Scalar,std::complex<Scalar> >(scalar));
+}
+
+EIGEN_DEVICE_FUNC
+inline friend const ScalarMultipleReturnType
+operator*(const Scalar& scalar, const StorageBaseType& matrix)
+{ return matrix*scalar; }
+
+EIGEN_DEVICE_FUNC
+inline friend const CwiseUnaryOp<internal::scalar_multiple2_op<Scalar,std::complex<Scalar> >, const Derived>
+operator*(const std::complex<Scalar>& scalar, const StorageBaseType& matrix)
+{ return matrix*scalar; }
+
+/** \returns an expression of *this with the \a Scalar type cast to
+ * \a NewType.
+ *
+ * The template parameter \a NewType is the type we are casting the scalars to.
+ *
+ * \sa class CwiseUnaryOp
+ */
+template<typename NewType>
+EIGEN_DEVICE_FUNC
+typename internal::cast_return_type<Derived,const CwiseUnaryOp<internal::scalar_cast_op<typename internal::traits<Derived>::Scalar, NewType>, const Derived> >::type
+cast() const
+{
+ return derived();
+}
+
+/** \returns an expression of *this with the \a Scalar type converted to
+ * \a NewType using the custom conversion functor \a ConvertOp.
+ *
+ * The template parameter \a NewType is the type we are casting the scalars to.
+ * The template parameter \a ConvertOp is the conversion functor.
+ *
+ * \sa class CwiseUnaryOp
+ */
+template<typename NewType, typename ConvertOp>
+typename internal::cast_return_type<Derived,const CwiseUnaryOp<internal::scalar_convert_op<typename internal::traits<Derived>::Scalar, NewType, ConvertOp>, const Derived> >::type
+convert() const
+{
+ return derived();
+}
+
+/** \returns an expression of the complex conjugate of \c *this.
+ *
+ * \sa adjoint() */
+EIGEN_DEVICE_FUNC
+inline ConjugateReturnType
+conjugate() const
+{
+ return ConjugateReturnType(derived());
+}
+
+/** \returns a read-only expression of the real part of \c *this.
+ *
+ * \sa imag() */
+EIGEN_DEVICE_FUNC
+inline RealReturnType
+real() const { return derived(); }
+
+/** \returns a read-only expression of the imaginary part of \c *this.
+ *
+ * \sa real() */
+EIGEN_DEVICE_FUNC
+inline const ImagReturnType
+imag() const { return derived(); }
+
+/** \brief Apply a unary operator coefficient-wise
+ * \param[in] func Functor implementing the unary operator
+ * \tparam CustomUnaryOp Type of \a func
+ * \returns An expression of a custom coefficient-wise unary operator \a func of *this
+ *
+ * The function \c ptr_fun() from the C++ standard library can be used to make functors out of normal functions.
+ *
+ * Example:
+ * \include class_CwiseUnaryOp_ptrfun.cpp
+ * Output: \verbinclude class_CwiseUnaryOp_ptrfun.out
+ *
+ * Genuine functors allow for more possibilities; for instance, they may contain state.
+ *
+ * Example:
+ * \include class_CwiseUnaryOp.cpp
+ * Output: \verbinclude class_CwiseUnaryOp.out
+ *
+ * \sa class CwiseUnaryOp, class CwiseBinaryOp
+ */
+template<typename CustomUnaryOp>
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<CustomUnaryOp, const Derived>
+unaryExpr(const CustomUnaryOp& func = CustomUnaryOp()) const
+{
+ return CwiseUnaryOp<CustomUnaryOp, const Derived>(derived(), func);
+}
+
+/** \returns an expression of a custom coefficient-wise unary operator \a func of *this
+ *
+ * The template parameter \a CustomUnaryOp is the type of the functor
+ * of the custom unary operator.
+ *
+ * Example:
+ * \include class_CwiseUnaryOp.cpp
+ * Output: \verbinclude class_CwiseUnaryOp.out
+ *
+ * \sa class CwiseUnaryOp, class CwiseBinaryOp
+ */
+template<typename CustomViewOp>
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryView<CustomViewOp, const Derived>
+unaryViewExpr(const CustomViewOp& func = CustomViewOp()) const
+{
+ return CwiseUnaryView<CustomViewOp, const Derived>(derived(), func);
+}
+
+/** \returns a non const expression of the real part of \c *this.
+ *
+ * \sa imag() */
+EIGEN_DEVICE_FUNC
+inline NonConstRealReturnType
+real() { return derived(); }
+
+/** \returns a non const expression of the imaginary part of \c *this.
+ *
+ * \sa real() */
+EIGEN_DEVICE_FUNC
+inline NonConstImagReturnType
+imag() { return derived(); }
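
A short sketch of the common unary ops above: operator-(), unaryExpr() with a hypothetical Clamp01 functor, cast<NewType>(), and the scalar-times-expression overload (illustrative values; not part of the vendored file):

    #include <Eigen/Dense>
    #include <iostream>

    // Hypothetical functor clamping each coefficient to [0, 1].
    struct Clamp01 {
      typedef double result_type;
      double operator()(double x) const { return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x); }
    };

    int main() {
      Eigen::Vector3d v(-0.5, 0.25, 2.0);

      std::cout << v.unaryExpr(Clamp01()).transpose() << "\n";  // 0 0.25 1
      std::cout << (-v).transpose()                   << "\n";  // 0.5 -0.25 -2
      std::cout << (2.0 * v).transpose()              << "\n";  // friend scalar * expression: -1 0.5 4
      Eigen::Vector3f f = v.cast<float>();                      // Scalar cast to float
      std::cout << f.transpose()                      << "\n";
    }
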
diff --git a/third_party/eigen3/Eigen/src/plugins/MatrixCwiseBinaryOps.h b/third_party/eigen3/Eigen/src/plugins/MatrixCwiseBinaryOps.h
new file mode 100644
index 0000000000..b9582a5a06
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/MatrixCwiseBinaryOps.h
@@ -0,0 +1,134 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// This file is a base class plugin containing matrix-specific coefficient-wise functions.
+
+/** \returns an expression of the Schur product (coefficient wise product) of *this and \a other
+ *
+ * Example: \include MatrixBase_cwiseProduct.cpp
+ * Output: \verbinclude MatrixBase_cwiseProduct.out
+ *
+ * \sa class CwiseBinaryOp, cwiseAbs2
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const EIGEN_CWISE_PRODUCT_RETURN_TYPE(Derived,OtherDerived)
+cwiseProduct(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return EIGEN_CWISE_PRODUCT_RETURN_TYPE(Derived,OtherDerived)(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise == operator of *this and \a other
+ *
+ * \warning this performs an exact comparison, which is generally a bad idea with floating-point types.
+ * In order to check for equality between two vectors or matrices with floating-point coefficients, it is
+ * generally a far better idea to use a fuzzy comparison as provided by isApprox() and
+ * isMuchSmallerThan().
+ *
+ * Example: \include MatrixBase_cwiseEqual.cpp
+ * Output: \verbinclude MatrixBase_cwiseEqual.out
+ *
+ * \sa cwiseNotEqual(), isApprox(), isMuchSmallerThan()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+inline const CwiseBinaryOp<std::equal_to<Scalar>, const Derived, const OtherDerived>
+cwiseEqual(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<std::equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise != operator of *this and \a other
+ *
+ * \warning this performs an exact comparison, which is generally a bad idea with floating-point types.
+ * In order to check for equality between two vectors or matrices with floating-point coefficients, it is
+ * generally a far better idea to use a fuzzy comparison as provided by isApprox() and
+ * isMuchSmallerThan().
+ *
+ * Example: \include MatrixBase_cwiseNotEqual.cpp
+ * Output: \verbinclude MatrixBase_cwiseNotEqual.out
+ *
+ * \sa cwiseEqual(), isApprox(), isMuchSmallerThan()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+inline const CwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const OtherDerived>
+cwiseNotEqual(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise min of *this and \a other
+ *
+ * Example: \include MatrixBase_cwiseMin.cpp
+ * Output: \verbinclude MatrixBase_cwiseMin.out
+ *
+ * \sa class CwiseBinaryOp, max()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived, const OtherDerived>
+cwiseMin(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise min of *this and scalar \a other
+ *
+ * \sa class CwiseBinaryOp, min()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived, const ConstantReturnType>
+cwiseMin(const Scalar &other) const
+{
+ return cwiseMin(Derived::Constant(rows(), cols(), other));
+}
+
+/** \returns an expression of the coefficient-wise max of *this and \a other
+ *
+ * Example: \include MatrixBase_cwiseMax.cpp
+ * Output: \verbinclude MatrixBase_cwiseMax.out
+ *
+ * \sa class CwiseBinaryOp, min()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived, const OtherDerived>
+cwiseMax(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
+
+/** \returns an expression of the coefficient-wise max of *this and scalar \a other
+ *
+ * \sa class CwiseBinaryOp, min()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived, const ConstantReturnType>
+cwiseMax(const Scalar &other) const
+{
+ return cwiseMax(Derived::Constant(rows(), cols(), other));
+}
+
+
+/** \returns an expression of the coefficient-wise quotient of *this and \a other
+ *
+ * Example: \include MatrixBase_cwiseQuotient.cpp
+ * Output: \verbinclude MatrixBase_cwiseQuotient.out
+ *
+ * \sa class CwiseBinaryOp, cwiseProduct(), cwiseInverse()
+ */
+template<typename OtherDerived>
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseBinaryOp<internal::scalar_quotient_op<Scalar>, const Derived, const OtherDerived>
+cwiseQuotient(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const
+{
+ return CwiseBinaryOp<internal::scalar_quotient_op<Scalar>, const Derived, const OtherDerived>(derived(), other.derived());
+}
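
An illustrative sketch of the matrix-specific coefficient-wise binary ops declared above (values chosen for readability; not part of the vendored file):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::Matrix2d a, b;
      a << 1, 2,
           3, 4;
      b << 4, 3,
           2, 1;

      std::cout << a.cwiseProduct(b)  << "\n\n";  // coefficient-wise (Schur) product
      std::cout << a.cwiseQuotient(b) << "\n\n";  // coefficient-wise division
      std::cout << a.cwiseMin(b)      << "\n\n";  // coefficient-wise minimum
      std::cout << a.cwiseMax(2.5)    << "\n\n";  // scalar overload of cwiseMax
      std::cout << a.cwiseEqual(b).count() << " coefficients compare exactly equal\n";
    }
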
diff --git a/third_party/eigen3/Eigen/src/plugins/MatrixCwiseUnaryOps.h b/third_party/eigen3/Eigen/src/plugins/MatrixCwiseUnaryOps.h
new file mode 100644
index 0000000000..1bb15f862d
--- /dev/null
+++ b/third_party/eigen3/Eigen/src/plugins/MatrixCwiseUnaryOps.h
@@ -0,0 +1,72 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2008-2009 Gael Guennebaud <gael.guennebaud@inria.fr>
+// Copyright (C) 2006-2008 Benoit Jacob <jacob.benoit.1@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+// This file is a base class plugin containing matrix-specific coefficient-wise functions.
+
+/** \returns an expression of the coefficient-wise absolute value of \c *this
+ *
+ * Example: \include MatrixBase_cwiseAbs.cpp
+ * Output: \verbinclude MatrixBase_cwiseAbs.out
+ *
+ * \sa cwiseAbs2()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseUnaryOp<internal::scalar_abs_op<Scalar>, const Derived>
+cwiseAbs() const { return derived(); }
+
+/** \returns an expression of the coefficient-wise squared absolute value of \c *this
+ *
+ * Example: \include MatrixBase_cwiseAbs2.cpp
+ * Output: \verbinclude MatrixBase_cwiseAbs2.out
+ *
+ * \sa cwiseAbs()
+ */
+EIGEN_DEVICE_FUNC
+EIGEN_STRONG_INLINE const CwiseUnaryOp<internal::scalar_abs2_op<Scalar>, const Derived>
+cwiseAbs2() const { return derived(); }
+
+/** \returns an expression of the coefficient-wise square root of *this.
+ *
+ * Example: \include MatrixBase_cwiseSqrt.cpp
+ * Output: \verbinclude MatrixBase_cwiseSqrt.out
+ *
+ * \sa cwisePow(), cwiseSquare()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_sqrt_op<Scalar>, const Derived>
+cwiseSqrt() const { return derived(); }
+
+/** \returns an expression of the coefficient-wise inverse of *this.
+ *
+ * Example: \include MatrixBase_cwiseInverse.cpp
+ * Output: \verbinclude MatrixBase_cwiseInverse.out
+ *
+ * \sa cwiseProduct()
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<internal::scalar_inverse_op<Scalar>, const Derived>
+cwiseInverse() const { return derived(); }
+
+/** \returns an expression of the coefficient-wise == operator of \c *this and a scalar \a s
+ *
+ * \warning this performs an exact comparison, which is generally a bad idea with floating-point types.
+ * In order to check for equality between two vectors or matrices with floating-point coefficients, it is
+ * generally a far better idea to use a fuzzy comparison as provided by isApprox() and
+ * isMuchSmallerThan().
+ *
+ * \sa cwiseEqual(const MatrixBase<OtherDerived> &) const
+ */
+EIGEN_DEVICE_FUNC
+inline const CwiseUnaryOp<std::binder1st<std::equal_to<Scalar> >, const Derived>
+cwiseEqual(const Scalar& s) const
+{
+ return CwiseUnaryOp<std::binder1st<std::equal_to<Scalar> >,const Derived>
+ (derived(), std::bind1st(std::equal_to<Scalar>(), s));
+}
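
A brief sketch of the unary ops above, and of why the documentation recommends isApprox() over exact comparison for floating-point coefficients (illustrative values; not part of the vendored file):

    #include <Eigen/Dense>
    #include <iostream>

    int main() {
      Eigen::Vector3d v(-4.0, 9.0, 0.25);

      std::cout << v.cwiseAbs().transpose()     << "\n";  // 4 9 0.25
      std::cout << v.cwiseAbs2().transpose()    << "\n";  // 16 81 0.0625
      std::cout << v.cwiseInverse().transpose() << "\n";  // -0.25 0.111... 4
      std::cout << v.cwiseSqrt().transpose()    << "\n";  // nan 3 0.5 (sqrt of a negative coefficient is NaN)

      Eigen::Vector3d w = v;
      w(1) += 1e-12;  // rounding-level perturbation
      std::cout << v.cwiseEqual(9.0).count() << " coefficient(s) exactly equal to 9\n";
      std::cout << std::boolalpha << v.isApprox(w) << "\n";  // true despite the perturbation
    }
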
diff --git a/third_party/eigen3/LICENSE b/third_party/eigen3/LICENSE
new file mode 100644
index 0000000000..a25d8e6fc6
--- /dev/null
+++ b/third_party/eigen3/LICENSE
@@ -0,0 +1,1936 @@
+Eigen is primarily MPL2 licensed. See COPYING.MPL2 and these links:
+ http://www.mozilla.org/MPL/2.0/
+ http://www.mozilla.org/MPL/2.0/FAQ.html
+
+Some files contain third-party code under BSD or LGPL licenses, whence
+the other COPYING.* files here.
+
+All the LGPL code is either LGPL 2.1-only, or LGPL 2.1-or-later.
+For this reason, the COPYING.LGPL file contains the LGPL 2.1 text.
+
+If you want to guarantee that the Eigen code that you are #including
+is licensed under the MPL2 and possibly more permissive licenses (like
+BSD), #define this preprocessor symbol: EIGEN_MPL2_ONLY
+For example, with most compilers, you could add this to your project
+ CXXFLAGS: -DEIGEN_MPL2_ONLY
+This will cause a compilation error to be generated if you #include
+any code that is LGPL licensed.
+
+----------------------------------------------------------------------
+Following applies to:
+./test/mapstaticmethods.cpp
+./test/schur_real.cpp
+./test/prec_inverse_4x4.cpp
+./test/smallvectors.cpp
+./test/redux.cpp
+./test/special_numbers.cpp
+./test/adjoint.cpp
+./test/resize.cpp
+./test/mixingtypes.cpp
+./test/product_trmv.cpp
+./test/sparse_solvers.cpp
+./test/cholesky.cpp
+./test/geo_quaternion.cpp
+./test/miscmatrices.cpp
+./test/stddeque.cpp
+./test/integer_types.cpp
+./test/product_large.cpp
+./test/eigensolver_generic.cpp
+./test/householder.cpp
+./test/geo_orthomethods.cpp
+./test/array_for_matrix.cpp
+./test/sparseLM.cpp
+./test/upperbidiagonalization.cpp
+./test/nomalloc.cpp
+./test/packetmath.cpp
+./test/jacobisvd.cpp
+./test/geo_transformations.cpp
+./test/swap.cpp
+./test/eigensolver_selfadjoint.cpp
+./test/inverse.cpp
+./test/product_selfadjoint.cpp
+./test/product_trsolve.cpp
+./test/product_extra.cpp
+./test/sparse_solver.h
+./test/mapstride.cpp
+./test/mapped_matrix.cpp
+./test/geo_eulerangles.cpp
+./test/eigen2support.cpp
+./test/denseLM.cpp
+./test/stdvector.cpp
+./test/nesting_ops.cpp
+./test/sparse_permutations.cpp
+./test/zerosized.cpp
+./test/exceptions.cpp
+./test/vectorwiseop.cpp
+./test/cwiseop.cpp
+./test/basicstuff.cpp
+./test/product_trmm.cpp
+./test/linearstructure.cpp
+./test/sparse_product.cpp
+./test/stdvector_overload.cpp
+./test/stable_norm.cpp
+./test/umeyama.cpp
+./test/unalignedcount.cpp
+./test/triangular.cpp
+./test/product_mmtr.cpp
+./test/sparse_basic.cpp
+./test/sparse_vector.cpp
+./test/meta.cpp
+./test/real_qz.cpp
+./test/ref.cpp
+./test/eigensolver_complex.cpp
+./test/cholmod_support.cpp
+./test/conjugate_gradient.cpp
+./test/sparse.h
+./test/simplicial_cholesky.cpp
+./test/bicgstab.cpp
+./test/dynalloc.cpp
+./test/product_notemporary.cpp
+./test/geo_hyperplane.cpp
+./test/lu.cpp
+./test/qr.cpp
+./test/hessenberg.cpp
+./test/sizeof.cpp
+./test/main.h
+./test/selfadjoint.cpp
+./test/permutationmatrices.cpp
+./test/superlu_support.cpp
+./test/qtvector.cpp
+./test/geo_homogeneous.cpp
+./test/determinant.cpp
+./test/array_reverse.cpp
+./test/unalignedassert.cpp
+./test/stdlist.cpp
+./test/product_symm.cpp
+./test/corners.cpp
+./test/dontalign.cpp
+./test/visitor.cpp
+./test/geo_alignedbox.cpp
+./test/diagonalmatrices.cpp
+./test/product_small.cpp
+./test/eigensolver_generalized_real.cpp
+./test/umfpack_support.cpp
+./test/first_aligned.cpp
+./test/qr_fullpivoting.cpp
+./test/array_replicate.cpp
+./test/geo_parametrizedline.cpp
+./test/eigen2/eigen2_unalignedassert.cpp
+./test/eigen2/eigen2_prec_inverse_4x4.cpp
+./test/eigen2/eigen2_alignedbox.cpp
+./test/eigen2/eigen2_sparse_product.cpp
+./test/eigen2/eigen2_meta.cpp
+./test/eigen2/eigen2_nomalloc.cpp
+./test/eigen2/eigen2_visitor.cpp
+./test/eigen2/eigen2_packetmath.cpp
+./test/eigen2/eigen2_svd.cpp
+./test/eigen2/eigen2_mixingtypes.cpp
+./test/eigen2/eigen2_qr.cpp
+./test/eigen2/eigen2_cwiseop.cpp
+./test/eigen2/eigen2_geometry_with_eigen2_prefix.cpp
+./test/eigen2/eigen2_smallvectors.cpp
+./test/eigen2/eigen2_commainitializer.cpp
+./test/eigen2/eigen2_sparse_solvers.cpp
+./test/eigen2/eigen2_hyperplane.cpp
+./test/eigen2/eigen2_eigensolver.cpp
+./test/eigen2/eigen2_linearstructure.cpp
+./test/eigen2/eigen2_sizeof.cpp
+./test/eigen2/eigen2_parametrizedline.cpp
+./test/eigen2/eigen2_lu.cpp
+./test/eigen2/eigen2_adjoint.cpp
+./test/eigen2/eigen2_geometry.cpp
+./test/eigen2/eigen2_stdvector.cpp
+./test/eigen2/eigen2_newstdvector.cpp
+./test/eigen2/eigen2_submatrices.cpp
+./test/eigen2/sparse.h
+./test/eigen2/eigen2_swap.cpp
+./test/eigen2/eigen2_triangular.cpp
+./test/eigen2/eigen2_basicstuff.cpp
+./test/eigen2/gsl_helper.h
+./test/eigen2/eigen2_dynalloc.cpp
+./test/eigen2/eigen2_array.cpp
+./test/eigen2/eigen2_map.cpp
+./test/eigen2/main.h
+./test/eigen2/eigen2_miscmatrices.cpp
+./test/eigen2/eigen2_product_large.cpp
+./test/eigen2/eigen2_first_aligned.cpp
+./test/eigen2/eigen2_cholesky.cpp
+./test/eigen2/eigen2_determinant.cpp
+./test/eigen2/eigen2_sum.cpp
+./test/eigen2/eigen2_inverse.cpp
+./test/eigen2/eigen2_regression.cpp
+./test/eigen2/eigen2_product_small.cpp
+./test/eigen2/eigen2_qtvector.cpp
+./test/eigen2/eigen2_sparse_vector.cpp
+./test/eigen2/product.h
+./test/eigen2/eigen2_sparse_basic.cpp
+./test/eigen2/eigen2_bug_132.cpp
+./test/array.cpp
+./test/product_syrk.cpp
+./test/commainitializer.cpp
+./test/conservative_resize.cpp
+./test/qr_colpivoting.cpp
+./test/nullary.cpp
+./test/bandmatrix.cpp
+./test/pastix_support.cpp
+./test/product.h
+./test/block.cpp
+./test/vectorization_logic.cpp
+./test/jacobi.cpp
+./test/diagonal.cpp
+./test/schur_complex.cpp
+./test/sizeoverflow.cpp
+./bench/BenchTimer.h
+./bench/benchFFT.cpp
+./bench/eig33.cpp
+./bench/spbench/spbenchsolver.h
+./bench/spbench/spbenchstyle.h
+./lapack/complex_double.cpp
+./lapack/cholesky.cpp
+./lapack/lapack_common.h
+./lapack/eigenvalues.cpp
+./lapack/single.cpp
+./lapack/lu.cpp
+./lapack/complex_single.cpp
+./lapack/double.cpp
+./demos/mix_eigen_and_c/binary_library.cpp
+./demos/mix_eigen_and_c/binary_library.h
+./demos/mix_eigen_and_c/example.c
+./demos/mandelbrot/mandelbrot.cpp
+./demos/mandelbrot/mandelbrot.h
+./demos/opengl/icosphere.cpp
+./demos/opengl/icosphere.h
+./demos/opengl/camera.cpp
+./demos/opengl/quaternion_demo.h
+./demos/opengl/camera.h
+./demos/opengl/trackball.h
+./demos/opengl/gpuhelper.h
+./demos/opengl/trackball.cpp
+./demos/opengl/gpuhelper.cpp
+./demos/opengl/quaternion_demo.cpp
+./debug/gdb/printers.py
+./unsupported/test/minres.cpp
+./unsupported/test/openglsupport.cpp
+./unsupported/test/jacobisvd.cpp
+./unsupported/test/dgmres.cpp
+./unsupported/test/matrix_square_root.cpp
+./unsupported/test/bdcsvd.cpp
+./unsupported/test/matrix_exponential.cpp
+./unsupported/test/forward_adolc.cpp
+./unsupported/test/polynomialsolver.cpp
+./unsupported/test/matrix_function.cpp
+./unsupported/test/sparse_extra.cpp
+./unsupported/test/matrix_functions.h
+./unsupported/test/svd_common.h
+./unsupported/test/FFTW.cpp
+./unsupported/test/alignedvector3.cpp
+./unsupported/test/autodiff.cpp
+./unsupported/test/gmres.cpp
+./unsupported/test/BVH.cpp
+./unsupported/test/levenberg_marquardt.cpp
+./unsupported/test/matrix_power.cpp
+./unsupported/test/kronecker_product.cpp
+./unsupported/test/splines.cpp
+./unsupported/test/polynomialutils.cpp
+./unsupported/bench/bench_svd.cpp
+./unsupported/Eigen/IterativeSolvers
+./unsupported/Eigen/src/IterativeSolvers/DGMRES.h
+./unsupported/Eigen/src/IterativeSolvers/IncompleteLU.h
+./unsupported/Eigen/src/IterativeSolvers/GMRES.h
+./unsupported/Eigen/src/IterativeSolvers/IncompleteCholesky.h
+./unsupported/Eigen/src/IterativeSolvers/Scaling.h
+./unsupported/Eigen/src/IterativeSolvers/MINRES.h
+./unsupported/Eigen/src/SparseExtra/RandomSetter.h
+./unsupported/Eigen/src/SparseExtra/MatrixMarketIterator.h
+./unsupported/Eigen/src/SparseExtra/DynamicSparseMatrix.h
+./unsupported/Eigen/src/SparseExtra/MarketIO.h
+./unsupported/Eigen/src/SparseExtra/BlockOfDynamicSparseMatrix.h
+./unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h
+./unsupported/Eigen/src/NonLinearOptimization/LevenbergMarquardt.h
+./unsupported/Eigen/src/NonLinearOptimization/HybridNonLinearSolver.h
+./unsupported/Eigen/src/BVH/BVAlgorithms.h
+./unsupported/Eigen/src/BVH/KdBVH.h
+./unsupported/Eigen/src/AutoDiff/AutoDiffScalar.h
+./unsupported/Eigen/src/AutoDiff/AutoDiffJacobian.h
+./unsupported/Eigen/src/AutoDiff/AutoDiffVector.h
+./unsupported/Eigen/src/Splines/Spline.h
+./unsupported/Eigen/src/Splines/SplineFitting.h
+./unsupported/Eigen/src/Splines/SplineFwd.h
+./unsupported/Eigen/src/SVD/JacobiSVD.h
+./unsupported/Eigen/src/SVD/BDCSVD.h
+./unsupported/Eigen/src/SVD/SVDBase.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixFunction.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixSquareRoot.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixLogarithm.h
+./unsupported/Eigen/src/MatrixFunctions/StemFunction.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixPower.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixExponential.h
+./unsupported/Eigen/src/MatrixFunctions/MatrixFunctionAtomic.h
+./unsupported/Eigen/src/MoreVectorization/MathFunctions.h
+./unsupported/Eigen/src/LevenbergMarquardt/LevenbergMarquardt.h
+./unsupported/Eigen/src/FFT/ei_fftw_impl.h
+./unsupported/Eigen/src/FFT/ei_kissfft_impl.h
+./unsupported/Eigen/src/Polynomials/PolynomialSolver.h
+./unsupported/Eigen/src/Polynomials/Companion.h
+./unsupported/Eigen/src/Polynomials/PolynomialUtils.h
+./unsupported/Eigen/src/NumericalDiff/NumericalDiff.h
+./unsupported/Eigen/src/Skyline/SkylineProduct.h
+./unsupported/Eigen/src/Skyline/SkylineMatrixBase.h
+./unsupported/Eigen/src/Skyline/SkylineStorage.h
+./unsupported/Eigen/src/Skyline/SkylineUtil.h
+./unsupported/Eigen/src/Skyline/SkylineInplaceLU.h
+./unsupported/Eigen/src/Skyline/SkylineMatrix.h
+./unsupported/Eigen/SparseExtra
+./unsupported/Eigen/AdolcForward
+./unsupported/Eigen/KroneckerProduct
+./unsupported/Eigen/NonLinearOptimization
+./unsupported/Eigen/BVH
+./unsupported/Eigen/OpenGLSupport
+./unsupported/Eigen/ArpackSupport
+./unsupported/Eigen/AutoDiff
+./unsupported/Eigen/Splines
+./unsupported/Eigen/MPRealSupport
+./unsupported/Eigen/MatrixFunctions
+./unsupported/Eigen/MoreVectorization
+./unsupported/Eigen/LevenbergMarquardt
+./unsupported/Eigen/AlignedVector3
+./unsupported/Eigen/FFT
+./unsupported/Eigen/Polynomials
+./unsupported/Eigen/NumericalDiff
+./unsupported/Eigen/Skyline
+./COPYING.README
+./COPYING.README
+./LICENSE
+./LICENSE
+./LICENSE
+./Eigen/Eigen2Support
+./Eigen/src/Eigen2Support/VectorBlock.h
+./Eigen/src/Eigen2Support/Cwise.h
+./Eigen/src/Eigen2Support/Minor.h
+./Eigen/src/Eigen2Support/Lazy.h
+./Eigen/src/Eigen2Support/Memory.h
+./Eigen/src/Eigen2Support/MathFunctions.h
+./Eigen/src/Eigen2Support/Geometry/AlignedBox.h
+./Eigen/src/Eigen2Support/Geometry/Hyperplane.h
+./Eigen/src/Eigen2Support/Geometry/Quaternion.h
+./Eigen/src/Eigen2Support/Geometry/Rotation2D.h
+./Eigen/src/Eigen2Support/Geometry/ParametrizedLine.h
+./Eigen/src/Eigen2Support/Geometry/RotationBase.h
+./Eigen/src/Eigen2Support/Geometry/Translation.h
+./Eigen/src/Eigen2Support/Geometry/Scaling.h
+./Eigen/src/Eigen2Support/Geometry/AngleAxis.h
+./Eigen/src/Eigen2Support/Geometry/Transform.h
+./Eigen/src/Eigen2Support/TriangularSolver.h
+./Eigen/src/Eigen2Support/LU.h
+./Eigen/src/Eigen2Support/QR.h
+./Eigen/src/Eigen2Support/SVD.h
+./Eigen/src/Eigen2Support/Meta.h
+./Eigen/src/Eigen2Support/Block.h
+./Eigen/src/Eigen2Support/Macros.h
+./Eigen/src/Eigen2Support/LeastSquares.h
+./Eigen/src/Eigen2Support/CwiseOperators.h
+./Eigen/src/Jacobi/Jacobi.h
+./Eigen/src/misc/Kernel.h
+./Eigen/src/misc/SparseSolve.h
+./Eigen/src/misc/Solve.h
+./Eigen/src/misc/Image.h
+./Eigen/src/SparseCore/SparseColEtree.h
+./Eigen/src/SparseCore/SparseTranspose.h
+./Eigen/src/SparseCore/SparseUtil.h
+./Eigen/src/SparseCore/SparseCwiseBinaryOp.h
+./Eigen/src/SparseCore/SparseDiagonalProduct.h
+./Eigen/src/SparseCore/SparseProduct.h
+./Eigen/src/SparseCore/SparseDot.h
+./Eigen/src/SparseCore/SparseCwiseUnaryOp.h
+./Eigen/src/SparseCore/SparseSparseProductWithPruning.h
+./Eigen/src/SparseCore/SparseBlock.h
+./Eigen/src/SparseCore/SparseDenseProduct.h
+./Eigen/src/SparseCore/CompressedStorage.h
+./Eigen/src/SparseCore/SparseMatrixBase.h
+./Eigen/src/SparseCore/MappedSparseMatrix.h
+./Eigen/src/SparseCore/SparseTriangularView.h
+./Eigen/src/SparseCore/SparseView.h
+./Eigen/src/SparseCore/SparseFuzzy.h
+./Eigen/src/SparseCore/TriangularSolver.h
+./Eigen/src/SparseCore/SparseSelfAdjointView.h
+./Eigen/src/SparseCore/SparseMatrix.h
+./Eigen/src/SparseCore/SparseVector.h
+./Eigen/src/SparseCore/AmbiVector.h
+./Eigen/src/SparseCore/ConservativeSparseSparseProduct.h
+./Eigen/src/SparseCore/SparseRedux.h
+./Eigen/src/SparseCore/SparsePermutation.h
+./Eigen/src/Eigenvalues/RealSchur.h
+./Eigen/src/Eigenvalues/ComplexEigenSolver.h
+./Eigen/src/Eigenvalues/GeneralizedEigenSolver.h
+./Eigen/src/Eigenvalues/ComplexSchur.h
+./Eigen/src/Eigenvalues/RealQZ.h
+./Eigen/src/Eigenvalues/EigenSolver.h
+./Eigen/src/Eigenvalues/HessenbergDecomposition.h
+./Eigen/src/Eigenvalues/GeneralizedSelfAdjointEigenSolver.h
+./Eigen/src/Eigenvalues/Tridiagonalization.h
+./Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h
+./Eigen/src/Eigenvalues/MatrixBaseEigenvalues.h
+./Eigen/src/SuperLUSupport/SuperLUSupport.h
+./Eigen/src/StlSupport/StdDeque.h
+./Eigen/src/StlSupport/StdVector.h
+./Eigen/src/StlSupport/StdList.h
+./Eigen/src/StlSupport/details.h
+./Eigen/src/SparseQR/SparseQR.h
+./Eigen/src/LU/Inverse.h
+./Eigen/src/LU/arch/Inverse_SSE.h
+./Eigen/src/LU/Determinant.h
+./Eigen/src/LU/PartialPivLU.h
+./Eigen/src/LU/FullPivLU.h
+./Eigen/src/UmfPackSupport/UmfPackSupport.h
+./Eigen/src/OrderingMethods/Ordering.h
+./Eigen/src/OrderingMethods/Eigen_Colamd.h
+./Eigen/src/QR/HouseholderQR.h
+./Eigen/src/QR/ColPivHouseholderQR.h
+./Eigen/src/QR/FullPivHouseholderQR.h
+./Eigen/src/SVD/JacobiSVD.h
+./Eigen/src/SVD/UpperBidiagonalization.h
+./Eigen/src/Geometry/OrthoMethods.h
+./Eigen/src/Geometry/AlignedBox.h
+./Eigen/src/Geometry/Hyperplane.h
+./Eigen/src/Geometry/Quaternion.h
+./Eigen/src/Geometry/EulerAngles.h
+./Eigen/src/Geometry/Rotation2D.h
+./Eigen/src/Geometry/ParametrizedLine.h
+./Eigen/src/Geometry/RotationBase.h
+./Eigen/src/Geometry/arch/Geometry_SSE.h
+./Eigen/src/Geometry/Umeyama.h
+./Eigen/src/Geometry/Homogeneous.h
+./Eigen/src/Geometry/Translation.h
+./Eigen/src/Geometry/Scaling.h
+./Eigen/src/Geometry/AngleAxis.h
+./Eigen/src/Geometry/Transform.h
+./Eigen/src/plugins/BlockMethods.h
+./Eigen/src/plugins/CommonCwiseUnaryOps.h
+./Eigen/src/plugins/CommonCwiseBinaryOps.h
+./Eigen/src/plugins/MatrixCwiseUnaryOps.h
+./Eigen/src/plugins/MatrixCwiseBinaryOps.h
+./Eigen/src/Householder/Householder.h
+./Eigen/src/Householder/HouseholderSequence.h
+./Eigen/src/Householder/BlockHouseholder.h
+./Eigen/src/Core/VectorBlock.h
+./Eigen/src/Core/Matrix.h
+./Eigen/src/Core/Ref.h
+./Eigen/src/Core/SelfAdjointView.h
+./Eigen/src/Core/MathFunctions.h
+./Eigen/src/Core/GlobalFunctions.h
+./Eigen/src/Core/MapBase.h
+./Eigen/src/Core/EigenBase.h
+./Eigen/src/Core/GenericPacketMath.h
+./Eigen/src/Core/NestByValue.h
+./Eigen/src/Core/CwiseUnaryOp.h
+./Eigen/src/Core/SolveTriangular.h
+./Eigen/src/Core/Fuzzy.h
+./Eigen/src/Core/Visitor.h
+./Eigen/src/Core/Map.h
+./Eigen/src/Core/NoAlias.h
+./Eigen/src/Core/Diagonal.h
+./Eigen/src/Core/StableNorm.h
+./Eigen/src/Core/CoreIterators.h
+./Eigen/src/Core/products/Parallelizer.h
+./Eigen/src/Core/products/SelfadjointMatrixVector.h
+./Eigen/src/Core/products/GeneralMatrixMatrixTriangular.h
+./Eigen/src/Core/products/TriangularSolverMatrix.h
+./Eigen/src/Core/products/GeneralMatrixMatrix.h
+./Eigen/src/Core/products/SelfadjointProduct.h
+./Eigen/src/Core/products/CoeffBasedProduct.h
+./Eigen/src/Core/products/TriangularMatrixVector.h
+./Eigen/src/Core/products/SelfadjointMatrixMatrix.h
+./Eigen/src/Core/products/TriangularSolverVector.h
+./Eigen/src/Core/products/SelfadjointRank2Update.h
+./Eigen/src/Core/products/GeneralBlockPanelKernel.h
+./Eigen/src/Core/products/GeneralMatrixVector.h
+./Eigen/src/Core/products/TriangularMatrixMatrix.h
+./Eigen/src/Core/Reverse.h
+./Eigen/src/Core/BooleanRedux.h
+./Eigen/src/Core/Replicate.h
+./Eigen/src/Core/arch/AltiVec/PacketMath.h
+./Eigen/src/Core/arch/AltiVec/Complex.h
+./Eigen/src/Core/arch/SSE/PacketMath.h
+./Eigen/src/Core/arch/SSE/Complex.h
+./Eigen/src/Core/arch/SSE/MathFunctions.h
+./Eigen/src/Core/arch/NEON/PacketMath.h
+./Eigen/src/Core/arch/NEON/Complex.h
+./Eigen/src/Core/arch/Default/Settings.h
+./Eigen/src/Core/CwiseUnaryView.h
+./Eigen/src/Core/Array.h
+./Eigen/src/Core/ArrayWrapper.h
+./Eigen/src/Core/Swap.h
+./Eigen/src/Core/Transpositions.h
+./Eigen/src/Core/Random.h
+./Eigen/src/Core/IO.h
+./Eigen/src/Core/SelfCwiseBinaryOp.h
+./Eigen/src/Core/VectorwiseOp.h
+./Eigen/src/Core/Select.h
+./Eigen/src/Core/ArrayBase.h
+./Eigen/src/Core/DenseCoeffsBase.h
+./Eigen/src/Core/DiagonalProduct.h
+./Eigen/src/Core/Assign.h
+./Eigen/src/Core/Redux.h
+./Eigen/src/Core/ForceAlignedAccess.h
+./Eigen/src/Core/BandMatrix.h
+./Eigen/src/Core/PlainObjectBase.h
+./Eigen/src/Core/DenseBase.h
+./Eigen/src/Core/Flagged.h
+./Eigen/src/Core/CwiseBinaryOp.h
+./Eigen/src/Core/ProductBase.h
+./Eigen/src/Core/TriangularMatrix.h
+./Eigen/src/Core/Transpose.h
+./Eigen/src/Core/DiagonalMatrix.h
+./Eigen/src/Core/Dot.h
+./Eigen/src/Core/Functors.h
+./Eigen/src/Core/PermutationMatrix.h
+./Eigen/src/Core/NumTraits.h
+./Eigen/src/Core/MatrixBase.h
+./Eigen/src/Core/DenseStorage.h
+./Eigen/src/Core/util/Memory.h
+./Eigen/src/Core/util/StaticAssert.h
+./Eigen/src/Core/util/BlasUtil.h
+./Eigen/src/Core/util/MatrixMapper.h
+./Eigen/src/Core/util/XprHelper.h
+./Eigen/src/Core/util/ForwardDeclarations.h
+./Eigen/src/Core/util/Meta.h
+./Eigen/src/Core/util/Macros.h
+./Eigen/src/Core/util/Constants.h
+./Eigen/src/Core/CwiseNullaryOp.h
+./Eigen/src/Core/Block.h
+./Eigen/src/Core/GeneralProduct.h
+./Eigen/src/Core/CommaInitializer.h
+./Eigen/src/Core/ReturnByValue.h
+./Eigen/src/Core/Stride.h
+./Eigen/src/SPQRSupport/SuiteSparseQRSupport.h
+./Eigen/src/SparseLU/SparseLU_column_dfs.h
+./Eigen/src/SparseLU/SparseLU_panel_dfs.h
+./Eigen/src/SparseLU/SparseLU_relax_snode.h
+./Eigen/src/SparseLU/SparseLU_panel_bmod.h
+./Eigen/src/SparseLU/SparseLU_SupernodalMatrix.h
+./Eigen/src/SparseLU/SparseLU_Utils.h
+./Eigen/src/SparseLU/SparseLU_gemm_kernel.h
+./Eigen/src/SparseLU/SparseLU_kernel_bmod.h
+./Eigen/src/SparseLU/SparseLU_pivotL.h
+./Eigen/src/SparseLU/SparseLU_Memory.h
+./Eigen/src/SparseLU/SparseLU_heap_relax_snode.h
+./Eigen/src/SparseLU/SparseLUImpl.h
+./Eigen/src/SparseLU/SparseLU_copy_to_ucol.h
+./Eigen/src/SparseLU/SparseLU_Structs.h
+./Eigen/src/SparseLU/SparseLU.h
+./Eigen/src/SparseLU/SparseLU_column_bmod.h
+./Eigen/src/SparseLU/SparseLU_pruneL.h
+./Eigen/src/IterativeLinearSolvers/IncompleteLUT.h
+./Eigen/src/IterativeLinearSolvers/BasicPreconditioners.h
+./Eigen/src/IterativeLinearSolvers/IterativeSolverBase.h
+./Eigen/src/IterativeLinearSolvers/ConjugateGradient.h
+./Eigen/src/IterativeLinearSolvers/BiCGSTAB.h
+./Eigen/src/SparseCholesky/SimplicialCholesky.h
+./Eigen/src/Cholesky/LDLT.h
+./Eigen/src/Cholesky/LLT.h
+./Eigen/src/CholmodSupport/CholmodSupport.h
+./Eigen/src/PaStiXSupport/PaStiXSupport.h
+./Eigen/src/MetisSupport/MetisSupport.h
+./Eigen/StdVector
+./Eigen/Core
+./Eigen/SparseLU
+./Eigen/StdList
+./Eigen/StdDeque
+./Eigen/SparseCholesky
+./scripts/relicense.py
+./scripts/relicense.py
+./blas/BandTriangularSolver.h
+./blas/PackedTriangularMatrixVector.h
+./blas/complex_double.cpp
+./blas/level2_real_impl.h
+./blas/level1_cplx_impl.h
+./blas/level1_impl.h
+./blas/level1_real_impl.h
+./blas/level3_impl.h
+./blas/single.cpp
+./blas/level2_cplx_impl.h
+./blas/PackedSelfadjointProduct.h
+./blas/Rank2Update.h
+./blas/complex_single.cpp
+./blas/PackedTriangularSolverVector.h
+./blas/double.cpp
+./blas/common.h
+./blas/level2_impl.h
+./blas/GeneralRank1Update.h
+
+Mozilla Public License Version 2.0
+==================================
+
+1. Definitions
+--------------
+
+1.1. "Contributor"
+ means each individual or legal entity that creates, contributes to
+ the creation of, or owns Covered Software.
+
+1.2. "Contributor Version"
+ means the combination of the Contributions of others (if any) used
+ by a Contributor and that particular Contributor's Contribution.
+
+1.3. "Contribution"
+ means Covered Software of a particular Contributor.
+
+1.4. "Covered Software"
+ means Source Code Form to which the initial Contributor has attached
+ the notice in Exhibit A, the Executable Form of such Source Code
+ Form, and Modifications of such Source Code Form, in each case
+ including portions thereof.
+
+1.5. "Incompatible With Secondary Licenses"
+ means
+
+ (a) that the initial Contributor has attached the notice described
+ in Exhibit B to the Covered Software; or
+
+ (b) that the Covered Software was made available under the terms of
+ version 1.1 or earlier of the License, but not also under the
+ terms of a Secondary License.
+
+1.6. "Executable Form"
+ means any form of the work other than Source Code Form.
+
+1.7. "Larger Work"
+ means a work that combines Covered Software with other material, in
+ a separate file or files, that is not Covered Software.
+
+1.8. "License"
+ means this document.
+
+1.9. "Licensable"
+ means having the right to grant, to the maximum extent possible,
+ whether at the time of the initial grant or subsequently, any and
+ all of the rights conveyed by this License.
+
+1.10. "Modifications"
+ means any of the following:
+
+ (a) any file in Source Code Form that results from an addition to,
+ deletion from, or modification of the contents of Covered
+ Software; or
+
+ (b) any new file in Source Code Form that contains any Covered
+ Software.
+
+1.11. "Patent Claims" of a Contributor
+ means any patent claim(s), including without limitation, method,
+ process, and apparatus claims, in any patent Licensable by such
+ Contributor that would be infringed, but for the grant of the
+ License, by the making, using, selling, offering for sale, having
+ made, import, or transfer of either its Contributions or its
+ Contributor Version.
+
+1.12. "Secondary License"
+ means either the GNU General Public License, Version 2.0, the GNU
+ Lesser General Public License, Version 2.1, the GNU Affero General
+ Public License, Version 3.0, or any later versions of those
+ licenses.
+
+1.13. "Source Code Form"
+ means the form of the work preferred for making modifications.
+
+1.14. "You" (or "Your")
+ means an individual or a legal entity exercising rights under this
+ License. For legal entities, "You" includes any entity that
+ controls, is controlled by, or is under common control with You. For
+ purposes of this definition, "control" means (a) the power, direct
+ or indirect, to cause the direction or management of such entity,
+ whether by contract or otherwise, or (b) ownership of more than
+ fifty percent (50%) of the outstanding shares or beneficial
+ ownership of such entity.
+
+2. License Grants and Conditions
+--------------------------------
+
+2.1. Grants
+
+Each Contributor hereby grants You a world-wide, royalty-free,
+non-exclusive license:
+
+(a) under intellectual property rights (other than patent or trademark)
+ Licensable by such Contributor to use, reproduce, make available,
+ modify, display, perform, distribute, and otherwise exploit its
+ Contributions, either on an unmodified basis, with Modifications, or
+ as part of a Larger Work; and
+
+(b) under Patent Claims of such Contributor to make, use, sell, offer
+ for sale, have made, import, and otherwise transfer either its
+ Contributions or its Contributor Version.
+
+2.2. Effective Date
+
+The licenses granted in Section 2.1 with respect to any Contribution
+become effective for each Contribution on the date the Contributor first
+distributes such Contribution.
+
+2.3. Limitations on Grant Scope
+
+The licenses granted in this Section 2 are the only rights granted under
+this License. No additional rights or licenses will be implied from the
+distribution or licensing of Covered Software under this License.
+Notwithstanding Section 2.1(b) above, no patent license is granted by a
+Contributor:
+
+(a) for any code that a Contributor has removed from Covered Software;
+ or
+
+(b) for infringements caused by: (i) Your and any other third party's
+ modifications of Covered Software, or (ii) the combination of its
+ Contributions with other software (except as part of its Contributor
+ Version); or
+
+(c) under Patent Claims infringed by Covered Software in the absence of
+ its Contributions.
+
+This License does not grant any rights in the trademarks, service marks,
+or logos of any Contributor (except as may be necessary to comply with
+the notice requirements in Section 3.4).
+
+2.4. Subsequent Licenses
+
+No Contributor makes additional grants as a result of Your choice to
+distribute the Covered Software under a subsequent version of this
+License (see Section 10.2) or under the terms of a Secondary License (if
+permitted under the terms of Section 3.3).
+
+2.5. Representation
+
+Each Contributor represents that the Contributor believes its
+Contributions are its original creation(s) or it has sufficient rights
+to grant the rights to its Contributions conveyed by this License.
+
+2.6. Fair Use
+
+This License is not intended to limit any rights You have under
+applicable copyright doctrines of fair use, fair dealing, or other
+equivalents.
+
+2.7. Conditions
+
+Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
+in Section 2.1.
+
+3. Responsibilities
+-------------------
+
+3.1. Distribution of Source Form
+
+All distribution of Covered Software in Source Code Form, including any
+Modifications that You create or to which You contribute, must be under
+the terms of this License. You must inform recipients that the Source
+Code Form of the Covered Software is governed by the terms of this
+License, and how they can obtain a copy of this License. You may not
+attempt to alter or restrict the recipients' rights in the Source Code
+Form.
+
+3.2. Distribution of Executable Form
+
+If You distribute Covered Software in Executable Form then:
+
+(a) such Covered Software must also be made available in Source Code
+ Form, as described in Section 3.1, and You must inform recipients of
+ the Executable Form how they can obtain a copy of such Source Code
+ Form by reasonable means in a timely manner, at a charge no more
+ than the cost of distribution to the recipient; and
+
+(b) You may distribute such Executable Form under the terms of this
+ License, or sublicense it under different terms, provided that the
+ license for the Executable Form does not attempt to limit or alter
+ the recipients' rights in the Source Code Form under this License.
+
+3.3. Distribution of a Larger Work
+
+You may create and distribute a Larger Work under terms of Your choice,
+provided that You also comply with the requirements of this License for
+the Covered Software. If the Larger Work is a combination of Covered
+Software with a work governed by one or more Secondary Licenses, and the
+Covered Software is not Incompatible With Secondary Licenses, this
+License permits You to additionally distribute such Covered Software
+under the terms of such Secondary License(s), so that the recipient of
+the Larger Work may, at their option, further distribute the Covered
+Software under the terms of either this License or such Secondary
+License(s).
+
+3.4. Notices
+
+You may not remove or alter the substance of any license notices
+(including copyright notices, patent notices, disclaimers of warranty,
+or limitations of liability) contained within the Source Code Form of
+the Covered Software, except that You may alter any license notices to
+the extent required to remedy known factual inaccuracies.
+
+3.5. Application of Additional Terms
+
+You may choose to offer, and to charge a fee for, warranty, support,
+indemnity or liability obligations to one or more recipients of Covered
+Software. However, You may do so only on Your own behalf, and not on
+behalf of any Contributor. You must make it absolutely clear that any
+such warranty, support, indemnity, or liability obligation is offered by
+You alone, and You hereby agree to indemnify every Contributor for any
+liability incurred by such Contributor as a result of warranty, support,
+indemnity or liability terms You offer. You may include additional
+disclaimers of warranty and limitations of liability specific to any
+jurisdiction.
+
+4. Inability to Comply Due to Statute or Regulation
+---------------------------------------------------
+
+If it is impossible for You to comply with any of the terms of this
+License with respect to some or all of the Covered Software due to
+statute, judicial order, or regulation then You must: (a) comply with
+the terms of this License to the maximum extent possible; and (b)
+describe the limitations and the code they affect. Such description must
+be placed in a text file included with all distributions of the Covered
+Software under this License. Except to the extent prohibited by statute
+or regulation, such description must be sufficiently detailed for a
+recipient of ordinary skill to be able to understand it.
+
+5. Termination
+--------------
+
+5.1. The rights granted under this License will terminate automatically
+if You fail to comply with any of its terms. However, if You become
+compliant, then the rights granted under this License from a particular
+Contributor are reinstated (a) provisionally, unless and until such
+Contributor explicitly and finally terminates Your grants, and (b) on an
+ongoing basis, if such Contributor fails to notify You of the
+non-compliance by some reasonable means prior to 60 days after You have
+come back into compliance. Moreover, Your grants from a particular
+Contributor are reinstated on an ongoing basis if such Contributor
+notifies You of the non-compliance by some reasonable means, this is the
+first time You have received notice of non-compliance with this License
+from such Contributor, and You become compliant prior to 30 days after
+Your receipt of the notice.
+
+5.2. If You initiate litigation against any entity by asserting a patent
+infringement claim (excluding declaratory judgment actions,
+counter-claims, and cross-claims) alleging that a Contributor Version
+directly or indirectly infringes any patent, then the rights granted to
+You by any and all Contributors for the Covered Software under Section
+2.1 of this License shall terminate.
+
+5.3. In the event of termination under Sections 5.1 or 5.2 above, all
+end user license agreements (excluding distributors and resellers) which
+have been validly granted by You or Your distributors under this License
+prior to termination shall survive termination.
+
+************************************************************************
+* *
+* 6. Disclaimer of Warranty *
+* ------------------------- *
+* *
+* Covered Software is provided under this License on an "as is" *
+* basis, without warranty of any kind, either expressed, implied, or *
+* statutory, including, without limitation, warranties that the *
+* Covered Software is free of defects, merchantable, fit for a *
+* particular purpose or non-infringing. The entire risk as to the *
+* quality and performance of the Covered Software is with You. *
+* Should any Covered Software prove defective in any respect, You *
+* (not any Contributor) assume the cost of any necessary servicing, *
+* repair, or correction. This disclaimer of warranty constitutes an *
+* essential part of this License. No use of any Covered Software is *
+* authorized under this License except under this disclaimer. *
+* *
+************************************************************************
+
+************************************************************************
+* *
+* 7. Limitation of Liability *
+* -------------------------- *
+* *
+* Under no circumstances and under no legal theory, whether tort *
+* (including negligence), contract, or otherwise, shall any *
+* Contributor, or anyone who distributes Covered Software as *
+* permitted above, be liable to You for any direct, indirect, *
+* special, incidental, or consequential damages of any character *
+* including, without limitation, damages for lost profits, loss of *
+* goodwill, work stoppage, computer failure or malfunction, or any *
+* and all other commercial damages or losses, even if such party *
+* shall have been informed of the possibility of such damages. This *
+* limitation of liability shall not apply to liability for death or *
+* personal injury resulting from such party's negligence to the *
+* extent applicable law prohibits such limitation. Some *
+* jurisdictions do not allow the exclusion or limitation of *
+* incidental or consequential damages, so this exclusion and *
+* limitation may not apply to You. *
+* *
+************************************************************************
+
+8. Litigation
+-------------
+
+Any litigation relating to this License may be brought only in the
+courts of a jurisdiction where the defendant maintains its principal
+place of business and such litigation shall be governed by laws of that
+jurisdiction, without reference to its conflict-of-law provisions.
+Nothing in this Section shall prevent a party's ability to bring
+cross-claims or counter-claims.
+
+9. Miscellaneous
+----------------
+
+This License represents the complete agreement concerning the subject
+matter hereof. If any provision of this License is held to be
+unenforceable, such provision shall be reformed only to the extent
+necessary to make it enforceable. Any law or regulation which provides
+that the language of a contract shall be construed against the drafter
+shall not be used to construe this License against a Contributor.
+
+10. Versions of the License
+---------------------------
+
+10.1. New Versions
+
+Mozilla Foundation is the license steward. Except as provided in Section
+10.3, no one other than the license steward has the right to modify or
+publish new versions of this License. Each version will be given a
+distinguishing version number.
+
+10.2. Effect of New Versions
+
+You may distribute the Covered Software under the terms of the version
+of the License under which You originally received the Covered Software,
+or under the terms of any subsequent version published by the license
+steward.
+
+10.3. Modified Versions
+
+If you create software not governed by this License, and you want to
+create a new license for such software, you may create and use a
+modified version of this License if you rename the license and remove
+any references to the name of the license steward (except to note that
+such modified license differs from this License).
+
+10.4. Distributing Source Code Form that is Incompatible With Secondary
+Licenses
+
+If You choose to distribute Source Code Form that is Incompatible With
+Secondary Licenses under the terms of this version of the License, the
+notice described in Exhibit B of this License must be attached.
+
+Exhibit A - Source Code Form License Notice
+-------------------------------------------
+
+ This Source Code Form is subject to the terms of the Mozilla Public
+ License, v. 2.0. If a copy of the MPL was not distributed with this
+ file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+If it is not possible or desirable to put the notice in a particular
+file, then You may include the notice in a location (such as a LICENSE
+file in a relevant directory) where a recipient would be likely to look
+for such a notice.
+
+You may add additional accurate notices of copyright ownership.
+
+Exhibit B - "Incompatible With Secondary Licenses" Notice
+---------------------------------------------------------
+
+ This Source Code Form is "Incompatible With Secondary Licenses", as
+ defined by the Mozilla Public License, v. 2.0.
+
+----------------------------------------------------------------------
+Following applies to:
+./doc/UsingIntelMKL.dox
+./Eigen/src/Eigenvalues/ComplexSchur_MKL.h
+./Eigen/src/Eigenvalues/SelfAdjointEigenSolver_MKL.h
+./Eigen/src/Eigenvalues/RealSchur_MKL.h
+./Eigen/src/LU/arch/Inverse_SSE.h
+./Eigen/src/LU/PartialPivLU_MKL.h
+./Eigen/src/QR/HouseholderQR_MKL.h
+./Eigen/src/QR/ColPivHouseholderQR_MKL.h
+./Eigen/src/SVD/JacobiSVD_MKL.h
+./Eigen/src/PardisoSupport/PardisoSupport.h
+./Eigen/src/Core/Assign_MKL.h
+./Eigen/src/Core/products/SelfadjointMatrixVector_MKL.h
+./Eigen/src/Core/products/GeneralMatrixVector_MKL.h
+./Eigen/src/Core/products/SelfadjointMatrixMatrix_MKL.h
+./Eigen/src/Core/products/TriangularMatrixMatrix_MKL.h
+./Eigen/src/Core/products/GeneralMatrixMatrix_MKL.h
+./Eigen/src/Core/products/TriangularMatrixVector_MKL.h
+./Eigen/src/Core/products/GeneralMatrixMatrixTriangular_MKL.h
+./Eigen/src/Core/products/TriangularSolverMatrix_MKL.h
+./Eigen/src/Core/util/MKL_support.h
+./Eigen/src/Cholesky/LLT_MKL.h
+
+
+/*
+ Copyright (c) 2011, Intel Corporation. All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+ * Neither the name of Intel Corporation nor the names of its
+   contributors may be used to endorse or promote products derived
+   from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+----------------------------------------------------------------------
+Following applies to:
+ everything under ./bench/btl
+
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds
+of works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal,
+family, or household purposes, or (2) anything designed or sold for
+incorporation into a dwelling. In determining whether a product is a
+consumer product, doubtful cases shall be resolved in favor of
+coverage. For a particular product received by a particular user,
+"normally used" refers to a typical or common use of that class of
+product, regardless of the status of the particular user or of the way
+in which the particular user actually uses, or expects or is expected
+to use, the product. A product is a consumer product regardless of
+whether the product has substantial commercial, industrial or
+non-consumer uses, unless such uses represent the only significant
+mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to
+install and execute modified versions of a covered work in that User
+Product from a modified version of its Corresponding Source. The
+information must suffice to ensure that the continued functioning of
+the modified object code is in no case prevented or interfered with
+solely because modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include
+a requirement to continue to provide support service, warranty, or
+updates for a work that has been modified or installed by the
+recipient, or for the User Product in which it has been modified or
+installed. Access to a network may be denied when the modification
+itself materially and adversely affects the operation of the network
+or violates the rules and protocols for communication across the
+network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material
+you add to a covered work, you may (if authorized by the copyright
+holders of that material) supplement the terms of this License with
+terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions
+ of it) with contractual assumptions of liability to the recipient,
+ for any liability that these contractual assumptions directly
+ impose on those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement
+or otherwise) that contradict the conditions of this License, they do
+not excuse you from the conditions of this License. If you cannot
+convey a covered work so as to satisfy simultaneously your obligations
+under this License and any other pertinent obligations, then as a
+consequence you may not convey it at all. For example, if you agree
+to terms that obligate you to collect a royalty for further conveying
+from those to whom you convey the Program, the only way you could
+satisfy both those terms and this License would be to refrain entirely
+from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions
+of the GNU General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
+WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
+PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
+DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
+CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES
+AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR
+DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL
+DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM
+(INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED
+INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF
+THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER
+OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these
+terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it
+ does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or
+ modify it under the terms of the GNU General Public License as
+ published by the Free Software Foundation, either version 3 of the
+ License, or (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see
+ <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author> This program comes
+ with ABSOLUTELY NO WARRANTY; for details type `show w'. This is
+ free software, and you are welcome to redistribute it under
+ certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the
+appropriate parts of the General Public License. Of course, your
+program's commands might be different; for a GUI interface, you would
+use an "about box".
+
+ You should also get your employer (if you work as a programmer) or
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. For more information on this, and how to apply and follow
+the GNU GPL, see <http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your
+program into proprietary programs. If your program is a subroutine
+library, you may consider it more useful to permit linking proprietary
+applications with the library. If this is what you want to do, use
+the GNU Lesser General Public License instead of this License. But
+first, please read <http://www.gnu.org/philosophy/why-not-lgpl.html>.
+
+
+----------------------------------------------------------------------
+Following applies to:
+./test/metis_support.cpp
+./test/sparselu.cpp
+./unsupported/test/mpreal/mpreal.h
+./unsupported/Eigen/src/IterativeSolvers/IterationController.h
+./unsupported/Eigen/src/IterativeSolvers/ConstrainedConjGrad.h
+./unsupported/Eigen/src/Eigenvalues/ArpackSelfAdjointEigenSolver.h
+./Eigen/src/OrderingMethods/Amd.h
+./Eigen/src/SparseCholesky/SimplicialCholesky_impl.h
+
+ GNU LESSER GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+ This version of the GNU Lesser General Public License incorporates
+the terms and conditions of version 3 of the GNU General Public
+License, supplemented by the additional permissions listed below.
+
+ 0. Additional Definitions.
+
+ As used herein, "this License" refers to version 3 of the GNU Lesser
+General Public License, and the "GNU GPL" refers to version 3 of the
+GNU General Public License.
+
+ "The Library" refers to a covered work governed by this License,
+other than an Application or a Combined Work as defined below.
+
+ An "Application" is any work that makes use of an interface provided
+by the Library, but which is not otherwise based on the Library.
+Defining a subclass of a class defined by the Library is deemed a mode
+of using an interface provided by the Library.
+
+ A "Combined Work" is a work produced by combining or linking an
+Application with the Library. The particular version of the Library
+with which the Combined Work was made is also called the "Linked
+Version".
+
+ The "Minimal Corresponding Source" for a Combined Work means the
+Corresponding Source for the Combined Work, excluding any source code
+for portions of the Combined Work that, considered in isolation, are
+based on the Application, and not on the Linked Version.
+
+ The "Corresponding Application Code" for a Combined Work means the
+object code and/or source code for the Application, including any data
+and utility programs needed for reproducing the Combined Work from the
+Application, but excluding the System Libraries of the Combined Work.
+
+ 1. Exception to Section 3 of the GNU GPL.
+
+ You may convey a covered work under sections 3 and 4 of this License
+without being bound by section 3 of the GNU GPL.
+
+ 2. Conveying Modified Versions.
+
+ If you modify a copy of the Library, and, in your modifications, a
+facility refers to a function or data to be supplied by an Application
+that uses the facility (other than as an argument passed when the
+facility is invoked), then you may convey a copy of the modified
+version:
+
+ a) under this License, provided that you make a good faith effort to
+ ensure that, in the event an Application does not supply the
+ function or data, the facility still operates, and performs
+ whatever part of its purpose remains meaningful, or
+
+ b) under the GNU GPL, with none of the additional permissions of
+ this License applicable to that copy.
+
+ 3. Object Code Incorporating Material from Library Header Files.
+
+ The object code form of an Application may incorporate material from
+a header file that is part of the Library. You may convey such object
+code under terms of your choice, provided that, if the incorporated
+material is not limited to numerical parameters, data structure
+layouts and accessors, or small macros, inline functions and templates
+(ten or fewer lines in length), you do both of the following:
+
+ a) Give prominent notice with each copy of the object code that the
+ Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the object code with a copy of the GNU GPL and this
+ license document.
+
+ 4. Combined Works.
+
+ You may convey a Combined Work under terms of your choice that,
+taken together, effectively do not restrict modification of the
+portions of the Library contained in the Combined Work and reverse
+engineering for debugging such modifications, if you also do each of
+the following:
+
+ a) Give prominent notice with each copy of the Combined Work that
+ the Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the Combined Work with a copy of the GNU GPL and this
+ license document.
+
+ c) For a Combined Work that displays copyright notices during
+ execution, include the copyright notice for the Library among
+ these notices, as well as a reference directing the user to the
+ copies of the GNU GPL and this license document.
+
+ d) Do one of the following:
+
+ 0) Convey the Minimal Corresponding Source under the terms of
+ this License, and the Corresponding Application Code in a form
+ suitable for, and under terms that permit, the user to
+ recombine or relink the Application with a modified version of
+ the Linked Version to produce a modified Combined Work, in the
+ manner specified by section 6 of the GNU GPL for conveying
+ Corresponding Source.
+
+ 1) Use a suitable shared library mechanism for linking with the
+ Library. A suitable mechanism is one that (a) uses at run time
+ a copy of the Library already present on the user's computer
+ system, and (b) will operate properly with a modified version
+ of the Library that is interface-compatible with the Linked
+ Version.
+
+ e) Provide Installation Information, but only if you would otherwise
+ be required to provide such information under section 6 of the
+ GNU GPL, and only to the extent that such information is
+ necessary to install and execute a modified version of the
+ Combined Work produced by recombining or relinking the
+ Application with a modified version of the Linked Version. (If
+ you use option 4d0, the Installation Information must accompany
+ the Minimal Corresponding Source and Corresponding Application
+ Code. If you use option 4d1, you must provide the Installation
+ Information in the manner specified by section 6 of the GNU GPL
+ for conveying Corresponding Source.)
+
+ 5. Combined Libraries.
+
+ You may place library facilities that are a work based on the
+Library side by side in a single library together with other library
+facilities that are not Applications and are not covered by this
+License, and convey such a combined library under terms of your
+choice, if you do both of the following:
+
+ a) Accompany the combined library with a copy of the same work based
+ on the Library, uncombined with any other library facilities,
+ conveyed under the terms of this License.
+
+ b) Give prominent notice with the combined library that part of it
+ is a work based on the Library, and explaining where to find the
+ accompanying uncombined form of the same work.
+
+ 6. Revised Versions of the GNU Lesser General Public License.
+
+ The Free Software Foundation may publish revised and/or new versions
+of the GNU Lesser General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Library as you received it specifies that a certain numbered version
+of the GNU Lesser General Public License "or any later version"
+applies to it, you have the option of following the terms and
+conditions either of that published version or of any later version
+published by the Free Software Foundation. If the Library as you
+received it does not specify a version number of the GNU Lesser
+General Public License, you may choose any version of the GNU Lesser
+General Public License ever published by the Free Software Foundation.
+
+ If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
+
+
+----------------------------------------------------------------------
+Following applies to:
+./unsupported/Eigen/src/LevenbergMarquardt/LevenbergMarquardt.h
+./unsupported/Eigen/src/LevenbergMarquardt/LMcovar.h
+./unsupported/Eigen/src/LevenbergMarquardt/LMonestep.h
+./unsupported/Eigen/src/LevenbergMarquardt/LMpar.h
+./unsupported/Eigen/src/LevenbergMarquardt/LMqrsolv.h
+
+Minpack Copyright Notice (1999) University of Chicago. All rights
+reserved
+
+Redistribution and use in source and binary forms, with or
+without modification, are permitted provided that the
+following conditions are met:
+
+1. Redistributions of source code must retain the above
+copyright notice, this list of conditions and the following
+disclaimer.
+
+2. Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following
+disclaimer in the documentation and/or other materials
+provided with the distribution.
+
+3. The end-user documentation included with the
+redistribution, if any, must include the following
+acknowledgment:
+
+ "This product includes software developed by the
+ University of Chicago, as Operator of Argonne National
+ Laboratory.
+
+Alternately, this acknowledgment may appear in the software
+itself, if and wherever such third-party acknowledgments
+normally appear.
+
+4. WARRANTY DISCLAIMER. THE SOFTWARE IS SUPPLIED "AS IS"
+WITHOUT WARRANTY OF ANY KIND. THE COPYRIGHT HOLDER, THE
+UNITED STATES, THE UNITED STATES DEPARTMENT OF ENERGY, AND
+THEIR EMPLOYEES: (1) DISCLAIM ANY WARRANTIES, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO ANY IMPLIED WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE
+OR NON-INFRINGEMENT, (2) DO NOT ASSUME ANY LEGAL LIABILITY
+OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR
+USEFULNESS OF THE SOFTWARE, (3) DO NOT REPRESENT THAT USE OF
+THE SOFTWARE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS, (4)
+DO NOT WARRANT THAT THE SOFTWARE WILL FUNCTION
+UNINTERRUPTED, THAT IT IS ERROR-FREE OR THAT ANY ERRORS WILL
+BE CORRECTED.
+
+5. LIMITATION OF LIABILITY. IN NO EVENT WILL THE COPYRIGHT
+HOLDER, THE UNITED STATES, THE UNITED STATES DEPARTMENT OF
+ENERGY, OR THEIR EMPLOYEES: BE LIABLE FOR ANY INDIRECT,
+INCIDENTAL, CONSEQUENTIAL, SPECIAL OR PUNITIVE DAMAGES OF
+ANY KIND OR NATURE, INCLUDING BUT NOT LIMITED TO LOSS OF
+PROFITS OR LOSS OF DATA, FOR ANY REASON WHATSOEVER, WHETHER
+SUCH LIABILITY IS ASSERTED ON THE BASIS OF CONTRACT, TORT
+(INCLUDING NEGLIGENCE OR STRICT LIABILITY), OR OTHERWISE,
+EVEN IF ANY OF SAID PARTIES HAS BEEN WARNED OF THE
+POSSIBILITY OF SUCH LOSS OR DAMAGES.
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/Core b/third_party/eigen3/unsupported/Eigen/CXX11/Core
new file mode 100644
index 0000000000..1b3690716c
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/Core
@@ -0,0 +1,46 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_CORE_MODULE
+#define EIGEN_CXX11_CORE_MODULE
+
+#include <Eigen/Core>
+
+#include <Eigen/src/Core/util/DisableStupidWarnings.h>
+
+/** \defgroup CXX11_Core_Module C++11 Core Module
+ *
+ * This module provides common core features for all modules that
+ * explicitly depend on C++11. Currently, this is only the Tensor
+ * module. Note that at this stage, you should not need to include
+ * this module directly.
+ *
+ * It also provides a limited fallback for compilers that don't support
+ * CXX11 yet, such as nvcc.
+ *
+ * \code
+ * #include <Eigen/CXX11/Core>
+ * \endcode
+ */
+
+// Only a subset of cxx11 is allowed at Google, so we default to emulate the
+// cxx11 functionality that we need.
+#include "src/Core/util/FixedSizeVector.h"
+#if 1
+#include <vector>
+#include "src/Core/util/EmulateCXX11Meta.h"
+#else
+#include "src/Core/util/CXX11Workarounds.h"
+#include "src/Core/util/CXX11Meta.h"
+#endif
+#include <Eigen/src/Core/util/ReenableStupidWarnings.h>
+
+#endif // EIGEN_CXX11_CORE_MODULE
+
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint b/third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint
new file mode 100644
index 0000000000..35b55de46d
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/FixedPoint
@@ -0,0 +1,51 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_MODULE
+#define EIGEN_CXX11_FIXED_POINT_MODULE
+
+#include <Eigen/Core>
+#include <stdint.h>
+
+/** \defgroup CXX11_FixedPoint_Module Fixed Point Module
+ *
+ * This module provides fixed point scalar types and the matrix-matrix
+ * and matrix-vector products that operate on them. Optimized
+ * implementations are selected below when AVX2 or NEON vectorization
+ * is available; otherwise a portable default implementation is used.
+ *
+ * \code
+ * #include <Eigen/CXX11/FixedPoint>
+ * \endcode
+ */
+
+#include "src/FixedPoint/FixedPointTypes.h"
+
+// Use optimized implementations whenever available
+#ifdef EIGEN_VECTORIZE_AVX2
+#define EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+#include "src/Tensor/TensorContractionThreadPool.h"
+#include "src/FixedPoint/PacketMathAVX2.h"
+#include "src/FixedPoint/MatMatProductAVX2.h"
+#include "src/FixedPoint/TypeCastingAVX2.h"
+
+#elif defined EIGEN_VECTORIZE_NEON
+#define EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+#include "src/FixedPoint/MatMatProductNEON.h"
+#endif
+
+// Use the default implementation when no optimized code is available
+#include "src/FixedPoint/MatMatProduct.h"
+#include "src/FixedPoint/MatVecProduct.h"
+
+
+#endif // EIGEN_CXX11_FIXED_POINT_MODULE
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks b/third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks
new file mode 100644
index 0000000000..7741b68d8a
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/NeuralNetworks
@@ -0,0 +1,35 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_MODULE
+#define EIGEN_CXX11_NEURAL_NETWORKS_MODULE
+
+#include "unsupported/Eigen/CXX11/Tensor"
+
+/** \defgroup CXX11_NeuralNetworks_Module Neural Networks Module
+ *
+ * This module provides an efficient implementation of the common primitives
+ * used by neural networks.
+ * The primitives are built on top of the tensor library.
+ *
+ * \code
+ * #include <Eigen/CXX11/NeuralNetworks>
+ * \endcode
+ */
+
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/Activations.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/Attention.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/Pooling.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/SoftMax.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardCuboidConvolutions.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/CuboidConvolution.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardSpatialConvolutions.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/SpatialConvolutions.h"
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_MODULE
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/Tensor b/third_party/eigen3/unsupported/Eigen/CXX11/Tensor
new file mode 100644
index 0000000000..3904c72eef
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/Tensor
@@ -0,0 +1,145 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_MODULE
+#define EIGEN_CXX11_TENSOR_MODULE
+
+#include "Eigen/src/Core/util/StaticAssert.h"
+#include "unsupported/Eigen/CXX11/Core"
+
+#include "Eigen/src/Core/util/DisableStupidWarnings.h"
+
+/** \defgroup CXX11_Tensor_Module Tensor Module
+ *
+ * This module provides a Tensor class for storing arbitrarily indexed
+ * objects.
+ *
+ * \code
+ * #include <Eigen/CXX11/Tensor>
+ * \endcode
+ */
+
+#include <cstddef>
+#include <cstring>
+#include <stdint.h>
+
+#if __cplusplus > 199711
+#include <random>
+#endif
+
+#ifdef EIGEN_USE_THREADS
+#if defined(EIGEN_USE_CUSTOM_THREAD_POOL)
+// Use the Eigen implementation of the ThreadPool class. We only need to
+// include a few multithreading headers
+#include <condition_variable>
+#include <deque>
+#include <mutex>
+#include <thread>
+#else
+#include "tensorflow/core/platform/port.h"
+#endif // EIGEN_USE_CUSTOM_THREAD_POOL
+
+#include <functional>
+
+#endif // EIGEN_USE_THREADS
+
+#ifdef EIGEN_USE_GPU
+#include "tensorflow/core/platform/port.h"
+#if !defined(__GCUDACC__) && !defined(__GCUDACC_HOST__)
+#include <cuda.h>
+#include <cufft.h>
+#include <cuda_runtime.h>
+#ifdef __CUDACC__
+#include <curand_kernel.h>
+#endif // defined(__CUDACC__)
+#else
+#include "perftools/gputools/executor/gcuda.h"
+#ifdef __CUDACC__
+#include "third_party/gpus/cuda/curand_device/curand_kernel.h"
+#endif // defined(__CUDACC__)
+#endif // __GCUDACC__
+#endif // EIGEN_USE_GPU
+
+#ifdef _WIN32
+#include <winbase.h>
+#elif defined(__APPLE__)
+#include <mach/mach_time.h>
+#else
+#include <time.h>
+#endif
+
+#include "Eigen/Core"
+
+// Beware: the order of the include matters to some compilers. For example
+// TensorIndexList.h should be included before TensorDimensions.h in order to
+// use index lists to encode tensor dimensions when compiling with llvm.
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorBase.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorContractionMappers.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h"
+#include "unsupported/Eigen/CXX11/src/NeuralNetworks/TensorConvolutionByFFT.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorTrueIndices.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/Tensor.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorVarDim.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorMap.h"
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorRef.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h"
+
+#include "unsupported/Eigen/CXX11/src/Tensor/TensorIO.h"
+
+#include "Eigen/src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_CXX11_TENSOR_MODULE
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/TensorSymmetry b/third_party/eigen3/unsupported/Eigen/CXX11/TensorSymmetry
new file mode 100644
index 0000000000..027c6087f9
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/TensorSymmetry
@@ -0,0 +1,40 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSORSYMMETRY_MODULE
+#define EIGEN_CXX11_TENSORSYMMETRY_MODULE
+
+#include <Eigen/CXX11/Tensor>
+
+#include <Eigen/src/Core/util/DisableStupidWarnings.h>
+
+/** \defgroup CXX11_TensorSymmetry_Module Tensor Symmetry Module
+ *
+ * This module provides classes that allow for the definition of
+ * symmetries w.r.t. tensor indices.
+ *
+ * Including this module will implicitly include the Tensor module.
+ *
+ * \code
+ * #include <Eigen/CXX11/TensorSymmetry>
+ * \endcode
+ */
+
+#include "src/TensorSymmetry/util/TemplateGroupTheory.h"
+#include "src/TensorSymmetry/Symmetry.h"
+#include "src/TensorSymmetry/StaticSymmetry.h"
+#include "src/TensorSymmetry/DynamicSymmetry.h"
+
+#include <Eigen/src/Core/util/ReenableStupidWarnings.h>
+
+#endif // EIGEN_CXX11_TENSORSYMMETRY_MODULE
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Meta.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Meta.h
new file mode 100644
index 0000000000..ad6a9dda10
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Meta.h
@@ -0,0 +1,508 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11META_H
+#define EIGEN_CXX11META_H
+
+namespace Eigen {
+
+namespace internal {
+
+/** \internal
+ * \file CXX11/Core/util/CXX11Meta.h
+ * This file contains generic metaprogramming classes which are not specifically related to Eigen.
+ * This file expands upon Core/util/Meta.h and adds support for C++11 specific features.
+ */
+
+template<typename... tt>
+struct type_list { constexpr static int count = sizeof...(tt); };
+
+template<typename t, typename... tt>
+struct type_list<t, tt...> { constexpr static int count = sizeof...(tt) + 1; typedef t first_type; };
+
+template<typename T, T... nn>
+struct numeric_list { constexpr static std::size_t count = sizeof...(nn); };
+
+template<typename T, T n, T... nn>
+struct numeric_list<T, n, nn...> { constexpr static std::size_t count = sizeof...(nn) + 1; constexpr static T first_value = n; };
+
+/* numeric list constructors
+ *
+ * equivalencies:
+ * constructor result
+ * typename gen_numeric_list<int, 5>::type numeric_list<int, 0,1,2,3,4>
+ * typename gen_numeric_list_reversed<int, 5>::type numeric_list<int, 4,3,2,1,0>
+ * typename gen_numeric_list_swapped_pair<int, 5,1,2>::type numeric_list<int, 0,2,1,3,4>
+ * typename gen_numeric_list_repeated<int, 0, 5>::type numeric_list<int, 0,0,0,0,0>
+ */
+
+template<typename T, std::size_t n, T... ii> struct gen_numeric_list : gen_numeric_list<T, n-1, n-1, ii...> {};
+template<typename T, T... ii> struct gen_numeric_list<T, 0, ii...> { typedef numeric_list<T, ii...> type; };
+
+template<typename T, std::size_t n, T... ii> struct gen_numeric_list_reversed : gen_numeric_list_reversed<T, n-1, ii..., n-1> {};
+template<typename T, T... ii> struct gen_numeric_list_reversed<T, 0, ii...> { typedef numeric_list<T, ii...> type; };
+
+template<typename T, std::size_t n, T a, T b, T... ii> struct gen_numeric_list_swapped_pair : gen_numeric_list_swapped_pair<T, n-1, a, b, (n-1) == a ? b : ((n-1) == b ? a : (n-1)), ii...> {};
+template<typename T, T a, T b, T... ii> struct gen_numeric_list_swapped_pair<T, 0, a, b, ii...> { typedef numeric_list<T, ii...> type; };
+
+template<typename T, std::size_t n, T V, T... nn> struct gen_numeric_list_repeated : gen_numeric_list_repeated<T, n-1, V, V, nn...> {};
+template<typename T, T V, T... nn> struct gen_numeric_list_repeated<T, 0, V, nn...> { typedef numeric_list<T, nn...> type; };
+
+/* list manipulation: concatenate */
+
+template<class a, class b> struct concat;
+
+template<typename... as, typename... bs> struct concat<type_list<as...>, type_list<bs...>> { typedef type_list<as..., bs...> type; };
+template<typename T, T... as, T... bs> struct concat<numeric_list<T, as...>, numeric_list<T, bs...> > { typedef numeric_list<T, as..., bs...> type; };
+
+template<typename... p> struct mconcat;
+template<typename a> struct mconcat<a> { typedef a type; };
+template<typename a, typename b> struct mconcat<a, b> : concat<a, b> {};
+template<typename a, typename b, typename... cs> struct mconcat<a, b, cs...> : concat<a, typename mconcat<b, cs...>::type> {};
+
+/* list manipulation: extract slices */
+
+template<int n, typename x> struct take;
+template<int n, typename a, typename... as> struct take<n, type_list<a, as...>> : concat<type_list<a>, typename take<n-1, type_list<as...>>::type> {};
+template<int n> struct take<n, type_list<>> { typedef type_list<> type; };
+template<typename a, typename... as> struct take<0, type_list<a, as...>> { typedef type_list<> type; };
+template<> struct take<0, type_list<>> { typedef type_list<> type; };
+
+template<typename T, int n, T a, T... as> struct take<n, numeric_list<T, a, as...>> : concat<numeric_list<T, a>, typename take<n-1, numeric_list<T, as...>>::type> {};
+template<typename T, int n> struct take<n, numeric_list<T>> { typedef numeric_list<T> type; };
+template<typename T, T a, T... as> struct take<0, numeric_list<T, a, as...>> { typedef numeric_list<T> type; };
+template<typename T> struct take<0, numeric_list<T>> { typedef numeric_list<T> type; };
+
+template<typename T, int n, T... ii> struct h_skip_helper_numeric;
+template<typename T, int n, T i, T... ii> struct h_skip_helper_numeric<T, n, i, ii...> : h_skip_helper_numeric<T, n-1, ii...> {};
+template<typename T, T i, T... ii> struct h_skip_helper_numeric<T, 0, i, ii...> { typedef numeric_list<T, i, ii...> type; };
+template<typename T, int n> struct h_skip_helper_numeric<T, n> { typedef numeric_list<T> type; };
+template<typename T> struct h_skip_helper_numeric<T, 0> { typedef numeric_list<T> type; };
+
+template<int n, typename... tt> struct h_skip_helper_type;
+template<int n, typename t, typename... tt> struct h_skip_helper_type<n, t, tt...> : h_skip_helper_type<n-1, tt...> {};
+template<typename t, typename... tt> struct h_skip_helper_type<0, t, tt...> { typedef type_list<t, tt...> type; };
+template<int n> struct h_skip_helper_type<n> { typedef type_list<> type; };
+template<> struct h_skip_helper_type<0> { typedef type_list<> type; };
+
+template<int n>
+struct h_skip {
+ template<typename T, T... ii>
+ constexpr static inline typename h_skip_helper_numeric<T, n, ii...>::type helper(numeric_list<T, ii...>) { return typename h_skip_helper_numeric<T, n, ii...>::type(); }
+ template<typename... tt>
+ constexpr static inline typename h_skip_helper_type<n, tt...>::type helper(type_list<tt...>) { return typename h_skip_helper_type<n, tt...>::type(); }
+};
+
+template<int n, typename a> struct skip { typedef decltype(h_skip<n>::helper(a())) type; };
+
+template<int start, int count, typename a> struct slice : take<count, typename skip<start, a>::type> {};
+
+/* list manipulation: retrieve single element from list */
+
+template<int n, typename x> struct get;
+
+template<int n, typename a, typename... as> struct get<n, type_list<a, as...>> : get<n-1, type_list<as...>> {};
+template<typename a, typename... as> struct get<0, type_list<a, as...>> { typedef a type; };
+template<int n EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, as)> struct get<n, type_list<EIGEN_TPL_PP_SPEC_HACK_USE(as)>> { static_assert((n - n) < 0, "meta-template get: The element to extract from a list must be smaller than the size of the list."); };
+
+template<typename T, int n, T a, T... as> struct get<n, numeric_list<T, a, as...>> : get<n-1, numeric_list<T, as...>> {};
+template<typename T, T a, T... as> struct get<0, numeric_list<T, a, as...>> { constexpr static T value = a; };
+template<typename T, int n EIGEN_TPL_PP_SPEC_HACK_DEFC(T, as)> struct get<n, numeric_list<T EIGEN_TPL_PP_SPEC_HACK_USEC(as)>> { static_assert((n - n) < 0, "meta-template get: The element to extract from a list must be smaller than the size of the list."); };
+
+/* always get type, regardless of dummy; good for parameter pack expansion */
+
+template<typename T, T dummy, typename t> struct id_numeric { typedef t type; };
+template<typename dummy, typename t> struct id_type { typedef t type; };
+
+/* equality checking, flagged version */
+
+template<typename a, typename b> struct is_same_gf : is_same<a, b> { constexpr static int global_flags = 0; };
+
+/* apply_op to list */
+
+template<
+ bool from_left, // false
+ template<typename, typename> class op,
+ typename additional_param,
+ typename... values
+>
+struct h_apply_op_helper { typedef type_list<typename op<values, additional_param>::type...> type; };
+template<
+ template<typename, typename> class op,
+ typename additional_param,
+ typename... values
+>
+struct h_apply_op_helper<true, op, additional_param, values...> { typedef type_list<typename op<additional_param, values>::type...> type; };
+
+template<
+ bool from_left,
+ template<typename, typename> class op,
+ typename additional_param
+>
+struct h_apply_op
+{
+ template<typename... values>
+ constexpr static typename h_apply_op_helper<from_left, op, additional_param, values...>::type helper(type_list<values...>)
+ { return typename h_apply_op_helper<from_left, op, additional_param, values...>::type(); }
+};
+
+template<
+ template<typename, typename> class op,
+ typename additional_param,
+ typename a
+>
+struct apply_op_from_left { typedef decltype(h_apply_op<true, op, additional_param>::helper(a())) type; };
+
+template<
+ template<typename, typename> class op,
+ typename additional_param,
+ typename a
+>
+struct apply_op_from_right { typedef decltype(h_apply_op<false, op, additional_param>::helper(a())) type; };
+
+/* see if an element is in a list */
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename h_list,
+ bool last_check_positive = false
+>
+struct contained_in_list;
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename h_list
+>
+struct contained_in_list<test, check_against, h_list, true>
+{
+ constexpr static bool value = true;
+};
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename a,
+ typename... as
+>
+struct contained_in_list<test, check_against, type_list<a, as...>, false> : contained_in_list<test, check_against, type_list<as...>, test<check_against, a>::value> {};
+
+template<
+ template<typename, typename> class test,
+ typename check_against
+ EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty)
+>
+struct contained_in_list<test, check_against, type_list<EIGEN_TPL_PP_SPEC_HACK_USE(empty)>, false> { constexpr static bool value = false; };
+
+/* see if an element is in a list and check for global flags */
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename h_list,
+ int default_flags = 0,
+ bool last_check_positive = false,
+ int last_check_flags = default_flags
+>
+struct contained_in_list_gf;
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename h_list,
+ int default_flags,
+ int last_check_flags
+>
+struct contained_in_list_gf<test, check_against, h_list, default_flags, true, last_check_flags>
+{
+ constexpr static bool value = true;
+ constexpr static int global_flags = last_check_flags;
+};
+
+template<
+ template<typename, typename> class test,
+ typename check_against,
+ typename a,
+ typename... as,
+ int default_flags,
+ int last_check_flags
+>
+struct contained_in_list_gf<test, check_against, type_list<a, as...>, default_flags, false, last_check_flags> : contained_in_list_gf<test, check_against, type_list<as...>, default_flags, test<check_against, a>::value, test<check_against, a>::global_flags> {};
+
+template<
+ template<typename, typename> class test,
+ typename check_against
+ EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty),
+ int default_flags,
+ int last_check_flags
+>
+struct contained_in_list_gf<test, check_against, type_list<EIGEN_TPL_PP_SPEC_HACK_USE(empty)>, default_flags, false, last_check_flags> { constexpr static bool value = false; constexpr static int global_flags = default_flags; };
+
+/* generic reductions */
+
+template<
+ typename Reducer,
+ typename... Ts
+> struct reduce;
+
+template<
+ typename Reducer,
+ typename A,
+ typename... Ts
+> struct reduce<Reducer, A, Ts...>
+{
+ constexpr static inline A run(A a, Ts...) { return a; }
+};
+
+template<
+ typename Reducer,
+ typename A,
+ typename B,
+ typename... Ts
+> struct reduce<Reducer, A, B, Ts...>
+{
+ constexpr static inline auto run(A a, B b, Ts... ts) -> decltype(Reducer::run(a, reduce<Reducer, B, Ts...>::run(b, ts...))) {
+ return Reducer::run(a, reduce<Reducer, B, Ts...>::run(b, ts...));
+ }
+};
+
+/* generic binary operations */
+
+struct sum_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a + b) { return a + b; } };
+struct product_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a * b) { return a * b; } };
+
+struct logical_and_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a && b) { return a && b; } };
+struct logical_or_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a || b) { return a || b; } };
+
+struct equal_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a == b) { return a == b; } };
+struct not_equal_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a != b) { return a != b; } };
+struct lesser_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a < b) { return a < b; } };
+struct lesser_equal_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a <= b) { return a <= b; } };
+struct greater_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a > b) { return a > b; } };
+struct greater_equal_op { template<typename A, typename B> constexpr static inline auto run(A a, B b) -> decltype(a >= b) { return a >= b; } };
+
+/* generic unary operations */
+
+struct not_op { template<typename A> constexpr static inline auto run(A a) -> decltype(!a) { return !a; } };
+struct negation_op { template<typename A> constexpr static inline auto run(A a) -> decltype(-a) { return -a; } };
+struct greater_equal_zero_op { template<typename A> constexpr static inline auto run(A a) -> decltype(a >= 0) { return a >= 0; } };
+
+
+/* reductions for lists */
+
+// using auto -> return value spec makes ICC 13.0 and 13.1 crash here, so we have to hack it
+// together in front... (13.0 doesn't work with array_prod/array_reduce/... anyway, but 13.1
+// does...)
+template<typename... Ts>
+constexpr inline decltype(reduce<product_op, Ts...>::run((*((Ts*)0))...)) arg_prod(Ts... ts)
+{
+ return reduce<product_op, Ts...>::run(ts...);
+}
+
+template<typename... Ts>
+constexpr inline decltype(reduce<sum_op, Ts...>::run((*((Ts*)0))...)) arg_sum(Ts... ts)
+{
+ return reduce<sum_op, Ts...>::run(ts...);
+}
+
+/* reverse arrays */
+
+template<typename Array, int... n>
+constexpr inline Array h_array_reverse(Array arr, numeric_list<int, n...>)
+{
+ return {{array_get<sizeof...(n) - n - 1>(arr)...}};
+}
+
+template<typename T, std::size_t N>
+constexpr inline std::array<T, N> array_reverse(std::array<T, N> arr)
+{
+ return h_array_reverse(arr, typename gen_numeric_list<int, N>::type());
+}
+
+/* generic array reductions */
+
+// can't reuse standard reduce() interface above because Intel's Compiler
+// *really* doesn't like it, so we just reimplement the stuff
+// (start from N - 1 and work down to 0 because specialization for
+// n == N - 1 also doesn't work in Intel's compiler, so it goes into
+// an infinite loop)
+template<typename Reducer, typename T, std::size_t N, std::size_t n = N - 1>
+struct h_array_reduce {
+ constexpr static inline auto run(std::array<T, N> arr, T identity) -> decltype(Reducer::run(h_array_reduce<Reducer, T, N, n - 1>::run(arr), array_get<n>(arr)))
+ {
+ return Reducer::run(h_array_reduce<Reducer, T, N, n - 1>::run(arr), array_get<n>(arr));
+ }
+};
+
+template<typename Reducer, typename T, std::size_t N>
+struct h_array_reduce<Reducer, T, N, 0>
+{
+ constexpr static inline T run(std::array<T, N> arr, T identity)
+ {
+ return array_get<0>(arr);
+ }
+};
+
+template<typename Reducer, typename T, std::size_t N>
+struct h_array_reduce<Reducer, T, 0>
+{
+ constexpr static inline T run(std::array<T, 0> arr, T identity)
+ {
+ return identity;
+ }
+};
+
+template<typename Reducer, typename T, std::size_t N>
+constexpr inline auto array_reduce(std::array<T, N> arr, T identity) -> decltype(h_array_reduce<Reducer, T, N>::run(arr))
+{
+ return h_array_reduce<Reducer, T, N>::run(arr, identity);
+}
+
+/* standard array reductions */
+
+template<typename T, std::size_t N>
+constexpr inline auto array_sum(std::array<T, N> arr) -> decltype(array_reduce<sum_op, T, N>(arr))
+{
+ return array_reduce<sum_op, T, N>(arr, 0);
+}
+
+template<typename T, std::size_t N>
+constexpr inline auto array_prod(std::array<T, N> arr) -> decltype(array_reduce<product_op, T, N>(arr))
+{
+ return array_reduce<product_op, T, N>(arr, 1);
+}
+
+/* zip an array */
+
+template<typename Op, typename A, typename B, std::size_t N, int... n>
+constexpr inline std::array<decltype(Op::run(A(), B())),N> h_array_zip(std::array<A, N> a, std::array<B, N> b, numeric_list<int, n...>)
+{
+ return std::array<decltype(Op::run(A(), B())),N>{{ Op::run(array_get<n>(a), array_get<n>(b))... }};
+}
+
+template<typename Op, typename A, typename B, std::size_t N>
+constexpr inline std::array<decltype(Op::run(A(), B())),N> array_zip(std::array<A, N> a, std::array<B, N> b)
+{
+ return h_array_zip<Op>(a, b, typename gen_numeric_list<int, N>::type());
+}
+
+/* zip an array and reduce the result */
+
+template<typename Reducer, typename Op, typename A, typename B, std::size_t N, int... n>
+constexpr inline auto h_array_zip_and_reduce(std::array<A, N> a, std::array<B, N> b, numeric_list<int, n...>) -> decltype(reduce<Reducer, typename id_numeric<int,n,decltype(Op::run(A(), B()))>::type...>::run(Op::run(array_get<n>(a), array_get<n>(b))...))
+{
+ return reduce<Reducer, typename id_numeric<int,n,decltype(Op::run(A(), B()))>::type...>::run(Op::run(array_get<n>(a), array_get<n>(b))...);
+}
+
+template<typename Reducer, typename Op, typename A, typename B, std::size_t N>
+constexpr inline auto array_zip_and_reduce(std::array<A, N> a, std::array<B, N> b) -> decltype(h_array_zip_and_reduce<Reducer, Op, A, B, N>(a, b, typename gen_numeric_list<int, N>::type()))
+{
+ return h_array_zip_and_reduce<Reducer, Op, A, B, N>(a, b, typename gen_numeric_list<int, N>::type());
+}
+
+/* apply stuff to an array */
+
+template<typename Op, typename A, std::size_t N, int... n>
+constexpr inline std::array<decltype(Op::run(A())),N> h_array_apply(std::array<A, N> a, numeric_list<int, n...>)
+{
+ return std::array<decltype(Op::run(A())),N>{{ Op::run(array_get<n>(a))... }};
+}
+
+template<typename Op, typename A, std::size_t N>
+constexpr inline std::array<decltype(Op::run(A())),N> array_apply(std::array<A, N> a)
+{
+ return h_array_apply<Op>(a, typename gen_numeric_list<int, N>::type());
+}
+
+/* apply stuff to an array and reduce */
+
+template<typename Reducer, typename Op, typename A, std::size_t N, int... n>
+constexpr inline auto h_array_apply_and_reduce(std::array<A, N> arr, numeric_list<int, n...>) -> decltype(reduce<Reducer, typename id_numeric<int,n,decltype(Op::run(A()))>::type...>::run(Op::run(array_get<n>(arr))...))
+{
+ return reduce<Reducer, typename id_numeric<int,n,decltype(Op::run(A()))>::type...>::run(Op::run(array_get<n>(arr))...);
+}
+
+template<typename Reducer, typename Op, typename A, std::size_t N>
+constexpr inline auto array_apply_and_reduce(std::array<A, N> a) -> decltype(h_array_apply_and_reduce<Reducer, Op, A, N>(a, typename gen_numeric_list<int, N>::type()))
+{
+ return h_array_apply_and_reduce<Reducer, Op, A, N>(a, typename gen_numeric_list<int, N>::type());
+}
+
+/* repeat a value n times (and make an array out of it)
+ * usage:
+ *   std::array<int, 16> arr = repeat<16>(42);
+ */
+
+template<int n>
+struct h_repeat
+{
+ template<typename t, int... ii>
+ constexpr static inline std::array<t, n> run(t v, numeric_list<int, ii...>)
+ {
+ return {{ typename id_numeric<int, ii, t>::type(v)... }};
+ }
+};
+
+template<int n, typename t>
+constexpr std::array<t, n> repeat(t v) { return h_repeat<n>::run(v, typename gen_numeric_list<int, n>::type()); }
+
+/* instantiate a class by a C-style array */
+template<class InstType, typename ArrType, std::size_t N, bool Reverse, typename... Ps>
+struct h_instantiate_by_c_array;
+
+template<class InstType, typename ArrType, std::size_t N, typename... Ps>
+struct h_instantiate_by_c_array<InstType, ArrType, N, false, Ps...>
+{
+ static InstType run(ArrType* arr, Ps... args)
+ {
+ return h_instantiate_by_c_array<InstType, ArrType, N - 1, false, Ps..., ArrType>::run(arr + 1, args..., arr[0]);
+ }
+};
+
+template<class InstType, typename ArrType, std::size_t N, typename... Ps>
+struct h_instantiate_by_c_array<InstType, ArrType, N, true, Ps...>
+{
+ static InstType run(ArrType* arr, Ps... args)
+ {
+ return h_instantiate_by_c_array<InstType, ArrType, N - 1, false, ArrType, Ps...>::run(arr + 1, arr[0], args...);
+ }
+};
+
+template<class InstType, typename ArrType, typename... Ps>
+struct h_instantiate_by_c_array<InstType, ArrType, 0, false, Ps...>
+{
+ static InstType run(ArrType* arr, Ps... args)
+ {
+ (void)arr;
+ return InstType(args...);
+ }
+};
+
+template<class InstType, typename ArrType, typename... Ps>
+struct h_instantiate_by_c_array<InstType, ArrType, 0, true, Ps...>
+{
+ static InstType run(ArrType* arr, Ps... args)
+ {
+ (void)arr;
+ return InstType(args...);
+ }
+};
+
+template<class InstType, typename ArrType, std::size_t N, bool Reverse = false>
+InstType instantiate_by_c_array(ArrType* arr)
+{
+ return h_instantiate_by_c_array<InstType, ArrType, N, Reverse>::run(arr);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11META_H
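As a quick, non-authoritative illustration of how the metaprogramming helpers in CXX11Meta.h compose, here is a minimal compile-time sketch; it assumes the header has been pulled in through the CXX11 Core module, and the Fwd/Rev alias names are invented for this example:

using Eigen::internal::gen_numeric_list;
using Eigen::internal::gen_numeric_list_reversed;
using Eigen::internal::arg_prod;

typedef gen_numeric_list<int, 5>::type Fwd;            // numeric_list<int, 0,1,2,3,4>
typedef gen_numeric_list_reversed<int, 5>::type Rev;   // numeric_list<int, 4,3,2,1,0>
static_assert(Fwd::count == 5, "list has five elements");
static_assert(Rev::first_value == 4, "reversed list starts at 4");
static_assert(arg_prod(2, 3, 4) == 24, "constexpr variadic product");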
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Workarounds.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Workarounds.h
new file mode 100644
index 0000000000..a590cf4e18
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/CXX11Workarounds.h
@@ -0,0 +1,116 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11WORKAROUNDS_H
+#define EIGEN_CXX11WORKAROUNDS_H
+
+/* COMPATIBILITY CHECKS
+ * (so users of compilers that are too old get some realistic error messages)
+ */
+#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER < 1310)
+#error Intel Compiler only supports required C++ features since version 13.1.
+// note that most stuff in principle works with 13.0 but when combining
+// some features, at some point 13.0 will just fail with an internal assertion
+#elif defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 6))
+// G++ < 4.6 by default will continue processing the source files - even if we use #error to make
+// it error out. For this reason, we use the pragma to make sure G++ aborts at the first error
+// it sees. Unfortunately, that is still not our #error directive, but at least the output is
+// short enough the user has a chance to see that the compiler version is not sufficient for
+// the funky template mojo we use.
+#pragma GCC diagnostic error "-Wfatal-errors"
+#error GNU C++ Compiler (g++) only supports required C++ features since version 4.6.
+#endif
+
+/* Check that the compiler at least claims to support C++11. It might not be sufficient
+ * because the compiler may not implement it correctly, but at least we'll know.
+ */
+#if __cplusplus <= 199711L
+#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
+#pragma GCC diagnostic error "-Wfatal-errors"
+#endif
+#error This library needs at least a C++11 compliant compiler. If you use g++/clang, please enable the -std=c++11 compiler flag. (-std=c++0x on older versions.)
+#endif
+
+namespace Eigen {
+
+// Use std::array as Eigen array
+template <typename T, std::size_t N> using array = std::array<T, N>;
+
+namespace internal {
+
+/* std::get is only constexpr in C++14, not yet in C++11
+ * - libstdc++ from version 4.7 onwards has it nevertheless,
+ * so use that
+ * - libstdc++ older versions: use _M_instance directly
+ * - libc++ all versions so far: use __elems_ directly
+ * - all other libs: use std::get to be portable, but
+ * this may not be constexpr
+ */
+#if defined(__GLIBCXX__) && __GLIBCXX__ < 20120322
+#define STD_GET_ARR_HACK a._M_instance[I]
+#elif defined(_LIBCPP_VERSION)
+#define STD_GET_ARR_HACK a.__elems_[I]
+#else
+#define STD_GET_ARR_HACK std::template get<I, T, N>(a)
+#endif
+
+template<std::size_t I, class T, std::size_t N> constexpr inline T& array_get(std::array<T,N>& a) { return (T&) STD_GET_ARR_HACK; }
+template<std::size_t I, class T, std::size_t N> constexpr inline T&& array_get(std::array<T,N>&& a) { return (T&&) STD_GET_ARR_HACK; }
+template<std::size_t I, class T, std::size_t N> constexpr inline T const& array_get(std::array<T,N> const& a) { return (T const&) STD_GET_ARR_HACK; }
+
+template<std::size_t I, class T> constexpr inline T& array_get(std::vector<T>& a) { return a[I]; }
+template<std::size_t I, class T> constexpr inline T&& array_get(std::vector<T>&& a) { return a[I]; }
+template<std::size_t I, class T> constexpr inline T const& array_get(std::vector<T> const& a) { return a[I]; }
+
+#undef STD_GET_ARR_HACK
+
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<const std::array<T,N> > {
+ static const size_t value = N;
+};
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<std::array<T,N> > {
+ static const size_t value = N;
+};
+
+/* Suppose you have a template of the form
+ * template<typename T> struct X;
+ * And you want to specialize it in such a way:
+ * template<typename S1, typename... SN> struct X<Foo<S1, SN...>> { ::: };
+ * template<> struct X<Foo<>> { ::: };
+ * This will work in Intel's compiler 13.0, but only to some extent in g++ 4.6, since
+ * g++ can only match templates called with parameter packs if the number of template
+ * arguments is not a fixed size (so inside the first specialization, referencing
+ * X<Foo<SN...>> will fail in g++). On the other hand, g++ will accept the following:
+ * template<typename... S> struct X<Foo<S...>> { ::: };
+ * as an additional (!) specialization, which will then only match the empty case.
+ * But Intel's compiler 13.0 won't accept that, it will only accept the empty syntax,
+ * so we have to create a workaround for this.
+ */
+#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
+#define EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n) mt... n
+#define EIGEN_TPL_PP_SPEC_HACK_DEFC(mt, n) , EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n)
+#define EIGEN_TPL_PP_SPEC_HACK_USE(n) n...
+#define EIGEN_TPL_PP_SPEC_HACK_USEC(n) , n...
+#else
+#define EIGEN_TPL_PP_SPEC_HACK_DEF(mt, n)
+#define EIGEN_TPL_PP_SPEC_HACK_DEFC(mt, n)
+#define EIGEN_TPL_PP_SPEC_HACK_USE(n)
+#define EIGEN_TPL_PP_SPEC_HACK_USEC(n)
+#endif
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11WORKAROUNDS_H
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
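To make the g++/ICC divergence described above concrete, here is a hedged sketch of the pattern the EIGEN_TPL_PP_SPEC_HACK_* macros enable (the same pattern the get<> and contained_in_list<> specializations in CXX11Meta.h rely on); X and Foo are hypothetical names taken from the comment:

template<typename... Ts> struct Foo {};   // hypothetical variadic wrapper
template<int n, typename T> struct X;

// General case: matches any non-empty Foo<...>.
template<int n, typename S1, typename... SN>
struct X<n, Foo<S1, SN...> > {};

// Empty case: on g++ the macros expand to an extra parameter pack that can only
// match Foo<>, on other compilers they expand to nothing and this becomes a
// plain Foo<> specialization.
template<int n EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty)>
struct X<n, Foo<EIGEN_TPL_PP_SPEC_HACK_USE(empty)> > {};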
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/EmulateCXX11Meta.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/EmulateCXX11Meta.h
new file mode 100644
index 0000000000..a1e1dca8e1
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/EmulateCXX11Meta.h
@@ -0,0 +1,456 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_EMULATE_CXX11_META_H
+#define EIGEN_EMULATE_CXX11_META_H
+
+
+
+namespace Eigen {
+
+// The array class is only available starting with cxx11. Emulate our own here
+// if needed
+template <typename T, size_t n> class array {
+ public:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE T& operator[] (size_t index) { return values[index]; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const T& operator[] (size_t index) const { return values[index]; }
+
+ static EIGEN_ALWAYS_INLINE std::size_t size() { return n; }
+
+ T values[n];
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array() { }
+ explicit EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v) {
+ EIGEN_STATIC_ASSERT(n==1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2) {
+ EIGEN_STATIC_ASSERT(n==2, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3) {
+ EIGEN_STATIC_ASSERT(n==3, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3,
+ const T& v4) {
+ EIGEN_STATIC_ASSERT(n==4, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ values[3] = v4;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4,
+ const T& v5) {
+ EIGEN_STATIC_ASSERT(n==5, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ values[3] = v4;
+ values[4] = v5;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4,
+ const T& v5, const T& v6) {
+ EIGEN_STATIC_ASSERT(n==6, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ values[3] = v4;
+ values[4] = v5;
+ values[5] = v6;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(const T& v1, const T& v2, const T& v3, const T& v4,
+ const T& v5, const T& v6, const T& v7) {
+ EIGEN_STATIC_ASSERT(n==7, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ values[3] = v4;
+ values[4] = v5;
+ values[5] = v6;
+ values[6] = v7;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array(
+ const T& v1, const T& v2, const T& v3, const T& v4,
+ const T& v5, const T& v6, const T& v7, const T& v8) {
+ EIGEN_STATIC_ASSERT(n==8, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ values[0] = v1;
+ values[1] = v2;
+ values[2] = v3;
+ values[3] = v4;
+ values[4] = v5;
+ values[5] = v6;
+ values[6] = v7;
+ values[7] = v8;
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ array(std::initializer_list<T> l) {
+ eigen_assert(l.size() == n);
+ internal::smart_copy(l.begin(), l.end(), values);
+ }
+#endif
+};
+
+// Specialize array for zero size
+template <typename T> class array<T, 0> {
+ public:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE T& operator[] (size_t index) {
+ eigen_assert(false && "Can't index a zero size array");
+ return *static_cast<T*>(NULL);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const T& operator[] (size_t index) const {
+ eigen_assert(false && "Can't index a zero size array");
+ return *static_cast<const T*>(NULL);
+ }
+
+ static EIGEN_ALWAYS_INLINE std::size_t size() { return 0; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE array() { }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ array(std::initializer_list<T> l) {
+ eigen_assert(l.size() == 0);
+ }
+#endif
+};
+
+namespace internal {
+
+/** \internal
+ * \file CXX11/Core/util/EmulateCXX11Meta.h
+ * This file emulates a subset of the functionality provided by CXXMeta.h for
+ * compilers that don't yet support cxx11 such as nvcc.
+ */
+
+struct empty_list { static const std::size_t count = 0; };
+
+template<typename T, typename Tail=empty_list> struct type_list {
+ typedef T HeadType;
+ typedef Tail TailType;
+ static const T head;
+ static const Tail tail;
+ static const std::size_t count = 1 + Tail::count;
+};
+
+struct null_type { };
+
+template<typename T1 = null_type, typename T2 = null_type, typename T3 = null_type,
+ typename T4 = null_type, typename T5 = null_type, typename T6 = null_type,
+ typename T7 = null_type, typename T8 = null_type>
+struct make_type_list {
+ typedef typename make_type_list<T2, T3, T4, T5, T6, T7, T8>::type tailresult;
+
+ typedef type_list<T1, tailresult> type;
+};
+
+template<> struct make_type_list<> {
+ typedef empty_list type;
+};
+
+
+template <std::size_t index, class TList> struct get_type;
+
+template <class Head, class Tail>
+struct get_type<0, type_list<Head, Tail> >
+{
+ typedef Head type;
+};
+
+template <std::size_t i, class Head, class Tail>
+struct get_type<i, type_list<Head, Tail> >
+{
+ typedef typename get_type<i-1, Tail>::type type;
+};
+
+
+/* numeric list */
+template <typename T, T n>
+struct type2val {
+ typedef T type;
+ static const T value = n;
+};
+
+
+template<typename T, size_t n, T V> struct gen_numeric_list_repeated;
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 1, V> {
+ typedef typename make_type_list<type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 2, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 3, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 4, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V>, type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 5, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V>, type2val<T, V>, type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 6, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V>,
+ type2val<T, V>, type2val<T, V>, type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 7, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V>,
+ type2val<T, V>, type2val<T, V>, type2val<T, V>,
+ type2val<T, V> >::type type;
+};
+
+template<typename T, T V> struct gen_numeric_list_repeated<T, 8, V> {
+ typedef typename make_type_list<type2val<T, V>, type2val<T, V>, type2val<T, V>,
+ type2val<T, V>, type2val<T, V>, type2val<T, V>,
+ type2val<T, V>, type2val<T, V> >::type type;
+};
+
+
+template <std::size_t index, class NList> struct get;
+
+template <std::size_t i>
+struct get<i, empty_list>
+{
+ get() { eigen_assert(false && "index overflow"); }
+ typedef void type;
+ static const char value = '\0';
+};
+
+template <std::size_t i, class Head>
+struct get<i, type_list<Head, empty_list> >
+{
+ get() { eigen_assert(false && "index overflow"); }
+ typedef void type;
+ static const char value = '\0';
+};
+
+template <class Head>
+struct get<0, type_list<Head, empty_list> >
+{
+ typedef typename Head::type type;
+ static const type value = Head::value;
+};
+
+template <class Head, class Tail>
+struct get<0, type_list<Head, Tail> >
+{
+ typedef typename Head::type type;
+ static const type value = Head::value;
+};
+
+template <std::size_t i, class Head, class Tail>
+struct get<i, type_list<Head, Tail> >
+{
+ typedef typename Tail::HeadType::type type;
+ static const type value = get<i-1, Tail>::value;
+};
+
+
+template <class NList> struct arg_prod {
+ static const typename NList::HeadType::type value = get<0, NList>::value * arg_prod<typename NList::TailType>::value;
+};
+template <> struct arg_prod<empty_list> {
+ static const int value = 1;
+};
+
+
+template<int n, typename t>
+array<t, n> repeat(t v) {
+ array<t, n> array;
+ array.fill(v);
+ return array;
+}
+
+template<std::size_t I, class Head, class Tail>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Head::type array_get(type_list<Head, Tail>& a) {
+ return get<I, type_list<Head, Tail> >::value;
+}
+template<std::size_t I, class Head, class Tail>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Head::type array_get(const type_list<Head, Tail>& a) {
+ return get<I, type_list<Head, Tail> >::value;
+}
+
+template <class NList>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NList::HeadType::type array_prod(const NList& l) {
+ return arg_prod<NList>::value;
+};
+
+template<std::size_t n, typename t>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE t array_prod(const array<t, n>& a) {
+ t prod = 1;
+ for (size_t i = 0; i < n; ++i) { prod *= a[i]; }
+ return prod;
+}
+
+template<typename t>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE t array_prod(const std::vector<t>& a) {
+ t prod = 1;
+ for (size_t i = 0; i < a.size(); ++i) { prod *= a[i]; }
+ return prod;
+}
+
+template<std::size_t I, class T, std::size_t N>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& array_get(array<T,N>& a) {
+ return a[I];
+}
+template<std::size_t I, class T, std::size_t N>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& array_get(const array<T,N>& a) {
+ return a[I];
+}
+
+template<std::size_t I, class T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T& array_get(std::vector<T>& a) {
+ return a[I];
+}
+template<std::size_t I, class T>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T& array_get(const std::vector<T>& a) {
+ return a[I];
+}
+
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<array<T,N> > {
+ static const size_t value = N;
+};
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<array<T,N>& > {
+ static const size_t value = N;
+};
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<const array<T,N> > {
+ static const size_t value = N;
+};
+template <typename T> struct array_size;
+template<class T, std::size_t N> struct array_size<const array<T,N>& > {
+ static const size_t value = N;
+};
+
+struct sum_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a + b; }
+};
+struct product_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a * b; }
+};
+
+struct logical_and_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a && b; }
+};
+struct logical_or_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a || b; }
+};
+
+struct equal_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a == b; }
+};
+struct not_equal_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a != b; }
+};
+struct lesser_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a < b; }
+};
+struct lesser_equal_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a <= b; }
+};
+
+struct greater_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a > b; }
+};
+struct greater_equal_op {
+ template<typename A, typename B> static inline bool run(A a, B b) { return a >= b; }
+};
+
+struct not_op {
+ template<typename A> static inline bool run(A a) { return !a; }
+};
+struct negation_op {
+ template<typename A> static inline bool run(A a) { return -a; }
+};
+struct greater_equal_zero_op {
+ template<typename A> static inline bool run(A a) { return a >= 0; }
+};
+
+
+template<typename Reducer, typename Op, typename A, std::size_t N>
+struct ArrayApplyAndReduce {
+ static inline bool run(const array<A, N>& a) {
+ EIGEN_STATIC_ASSERT(N >= 2, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ bool result = Reducer::run(Op::run(a[0]), Op::run(a[1]));
+ for (size_t i = 2; i < N; ++i) {
+ result = Reducer::run(result, Op::run(a[i]));
+ }
+ return result;
+ }
+};
+
+template<typename Reducer, typename Op, typename A>
+struct ArrayApplyAndReduce<Reducer, Op, A, 1> {
+ static inline bool run(const array<A, 1>& a) {
+ return Op::run(a[0]);
+ }
+};
+
+template<typename Reducer, typename Op, typename A, std::size_t N>
+inline bool array_apply_and_reduce(const array<A, N>& a) {
+ return ArrayApplyAndReduce<Reducer, Op, A, N>::run(a);
+}
+
+template<typename Reducer, typename Op, typename A, typename B, std::size_t N>
+struct ArrayZipAndReduce {
+ static inline bool run(const array<A, N>& a, const array<B, N>& b) {
+ EIGEN_STATIC_ASSERT(N >= 2, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ bool result = Reducer::run(Op::run(a[0], b[0]), Op::run(a[1], b[1]));
+ for (size_t i = 2; i < N; ++i) {
+ result = Reducer::run(result, Op::run(a[i], b[i]));
+ }
+ return result;
+ }
+};
+
+template<typename Reducer, typename Op, typename A, typename B>
+struct ArrayZipAndReduce<Reducer, Op, A, B, 1> {
+ static inline bool run(const array<A, 1>& a, const array<B, 1>& b) {
+ return Op::run(a[0], b[0]);
+ }
+};
+
+template<typename Reducer, typename Op, typename A, typename B, std::size_t N>
+inline bool array_zip_and_reduce(const array<A, N>& a, const array<B, N>& b) {
+ return ArrayZipAndReduce<Reducer, Op, A, B, N>::run(a, b);
+}
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+
+
+#endif // EIGEN_EMULATE_CXX11_META_H
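A small usage sketch of the emulated facilities above (assuming the Eigen headers that define EIGEN_DEVICE_FUNC and friends are included; the function name and the values are hypothetical):

int emulated_array_demo() {
  Eigen::array<int, 3> dims(4, 5, 6);                 // three-argument constructor above
  int count  = Eigen::internal::array_prod(dims);     // 4 * 5 * 6 = 120
  int middle = Eigen::internal::array_get<1>(dims);   // 5
  return count + middle;
}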
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/FixedSizeVector.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/FixedSizeVector.h
new file mode 100644
index 0000000000..c68119aa03
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Core/util/FixedSizeVector.h
@@ -0,0 +1,128 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FIXEDSIZEVECTOR_H
+#define EIGEN_FIXEDSIZEVECTOR_H
+
+namespace Eigen {
+
+/** \class FixedSizeVector
+ * \ingroup Core
+ *
+ * \brief The FixedSizeVector class.
+ *
+ * The %FixedSizeVector provides a subset of std::vector functionality.
+ *
+ * The goal is to provide basic std::vector operations when using
+ * std::vector is not an option (e.g. on GPU or when compiling using
+ * FMA/AVX, as this can cause either compilation failures or illegal
+ * instruction failures).
+ *
+ */
+template <typename T>
+class FixedSizeVector {
+ public:
+ // Construct a new FixedSizeVector, reserve n elements.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ explicit FixedSizeVector(size_t n)
+ : reserve_(n), size_(0),
+ data_(static_cast<T*>(internal::aligned_malloc(n * sizeof(T)))) {
+ for (size_t i = 0; i < n; ++i) { new (&data_[i]) T; }
+ }
+
+ // Construct a new FixedSizeVector, reserve and resize to n.
+ // Copy the init value to all elements.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ explicit FixedSizeVector(size_t n, const T& init)
+ : reserve_(n), size_(n),
+ data_(static_cast<T*>(internal::aligned_malloc(n * sizeof(T)))) {
+ for (size_t i = 0; i < n; ++i) { new (&data_[i]) T(init); }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ ~FixedSizeVector() {
+ for (size_t i = 0; i < size_; ++i) {
+ data_[i].~T();
+ }
+ internal::aligned_free(data_);
+ }
+
+ // Append new elements (up to reserved size).
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void push_back(const T& t) {
+ eigen_assert(size_ < reserve_);
+ data_[size_++] = t;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const T& operator[] (size_t i) const {
+ eigen_assert(i < size_);
+ return data_[i];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ T& operator[] (size_t i) {
+ eigen_assert(i < size_);
+ return data_[i];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ T& back() {
+ eigen_assert(size_ > 0);
+ return data_[size_ - 1];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const T& back() const {
+ eigen_assert(size_ > 0);
+ return data_[size_ - 1];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void pop_back() {
+ // NOTE: This does not destroy the value at the end the way
+ // std::vector's version of pop_back() does. That happens when
+ // the Vector is destroyed.
+ eigen_assert(size_ > 0);
+ size_--;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ size_t size() const { return size_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ bool empty() const { return size_ == 0; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ T* data() { return data_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const T* data() const { return data_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ T* begin() { return data_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ T* end() { return data_ + size_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const T* begin() const { return data_; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const T* end() const { return data_ + size_; }
+
+ private:
+ size_t reserve_;
+ size_t size_;
+ T* data_;
+};
+
+} // namespace Eigen
+
+#endif // EIGEN_FIXEDSIZEVECTOR_H
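A minimal usage sketch (hypothetical function name and values): the capacity is fixed when the container is constructed, so push_back may only be called up to the reserved count.

void fixed_size_vector_demo() {
  Eigen::FixedSizeVector<float> v(4);   // room for at most 4 elements
  v.push_back(1.0f);
  v.push_back(2.0f);
  float last = v.back();                // 2.0f
  (void)last;
}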
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/FixedPointTypes.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/FixedPointTypes.h
new file mode 100644
index 0000000000..564729ce48
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/FixedPointTypes.h
@@ -0,0 +1,341 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_TYPES_H
+#define EIGEN_CXX11_FIXED_POINT_TYPES_H
+
+#include <cmath>
+#include <iostream>
+
+namespace Eigen {
+
+// The mantissa part of the fixed point representation. See
+// go/tensorfixedpoint for details
+struct QInt8;
+struct QUInt8;
+struct QInt16;
+struct QUInt16;
+struct QInt32;
+
+template <>
+struct NumTraits<QInt8> : GenericNumTraits<int8_t> {};
+template <>
+struct NumTraits<QUInt8> : GenericNumTraits<uint8_t> {};
+template <>
+struct NumTraits<QInt16> : GenericNumTraits<int16_t> {};
+template <>
+struct NumTraits<QUInt16> : GenericNumTraits<uint16_t> {};
+template <>
+struct NumTraits<QInt32> : GenericNumTraits<int32_t> {};
+
+namespace internal {
+template <>
+struct scalar_product_traits<QInt32, double> {
+ enum {
+ // Cost = NumTraits<T>::MulCost,
+ Defined = 1
+ };
+ typedef QInt32 ReturnType;
+};
+}
+
+// Wrap the 8bit int into a QInt8 struct instead of using a typedef to prevent
+// the compiler from silently casting the mantissa into a bigger or a smaller
+// representation.
+struct QInt8 {
+ QInt8() {}
+ QInt8(const int8_t v) : value(v) {}
+ QInt8(const QInt32 v);
+
+ operator int() const { return static_cast<int>(value); }
+
+ int8_t value;
+};
+
+struct QUInt8 {
+ QUInt8() {}
+ QUInt8(const uint8_t v) : value(v) {}
+ QUInt8(const QInt32 v);
+
+ operator int() const { return static_cast<int>(value); }
+
+ uint8_t value;
+};
+
+struct QInt16 {
+ QInt16() {}
+ QInt16(const int16_t v) : value(v) {}
+ QInt16(const QInt32 v);
+ operator int() const { return static_cast<int>(value); }
+
+ int16_t value;
+};
+
+struct QUInt16 {
+ QUInt16() {}
+ QUInt16(const uint16_t v) : value(v) {}
+ QUInt16(const QInt32 v);
+ operator int() const { return static_cast<int>(value); }
+
+ uint16_t value;
+};
+
+struct QInt32 {
+ QInt32() {}
+ QInt32(const int8_t v) : value(v) {}
+ QInt32(const int32_t v) : value(v) {}
+ QInt32(const QInt8 v) : value(v.value) {}
+ QInt32(const float v) : value(static_cast<int32_t>(lrint(v))) {}
+#ifdef EIGEN_MAKING_DOCS
+ // Workaround to fix build on PPC.
+ QInt32(unsigned long v) : value(v) {}
+#endif
+
+ operator float() const { return static_cast<float>(value); }
+
+ int32_t value;
+};
+
+EIGEN_STRONG_INLINE QInt8::QInt8(const QInt32 v)
+ : value(v.value > 127 ? 127 : (v.value < -128 ? -128 : v.value)) {}
+EIGEN_STRONG_INLINE QUInt8::QUInt8(const QInt32 v)
+ : value(v.value > 255 ? 255 : (v.value < 0 ? 0 : v.value)) {}
+EIGEN_STRONG_INLINE QInt16::QInt16(const QInt32 v)
+ : value(v.value > 32767 ? 32767 : (v.value < -32768 ? -32768 : v.value)) {}
+EIGEN_STRONG_INLINE QUInt16::QUInt16(const QInt32 v)
+ : value(v.value > 65535 ? 65535 : (v.value < 0 ? 0 : v.value)) {}
+
+// Basic widening 8-bit operations: This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt8 a, const QInt8 b) {
+ return QInt32(static_cast<int32_t>(a.value) * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt8 a, const QUInt8 b) {
+ return QInt32(static_cast<int32_t>(a.value) * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt8 a, const QInt8 b) {
+ return QInt32(static_cast<int32_t>(a.value) + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt8 a, const QInt8 b) {
+ return QInt32(static_cast<int32_t>(a.value) - static_cast<int32_t>(b.value));
+}
+
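A hedged sketch of the intended arithmetic (hypothetical values): the 8-bit product widens to QInt32, and narrowing back to QInt8 saturates through the converting constructor defined above.

inline void qint8_widening_demo() {
  Eigen::QInt8 a(100), b(3);
  Eigen::QInt32 wide = a * b;    // 300, held in 32 bits without overflow
  Eigen::QInt8 narrow(wide);     // clamps to 127
  (void)narrow;
}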
+// Basic widening 16-bit operations: This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt16 a, const QInt16 b) {
+ return QInt32(static_cast<int32_t>(a.value) * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt16 a, const QUInt16 b) {
+ return QInt32(static_cast<int32_t>(a.value) * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt16 a, const QInt16 b) {
+ return QInt32(static_cast<int32_t>(a.value) + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt16 a, const QInt16 b) {
+ return QInt32(static_cast<int32_t>(a.value) - static_cast<int32_t>(b.value));
+}
+
+// Mixed QInt32 op QInt8 operations. This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt32 a, const QInt8 b) {
+ return QInt32(a.value + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) + b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a, const QInt8 b) {
+ return QInt32(a.value - static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) - b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const QInt8 b) {
+ return QInt32(a.value * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) * b.value);
+}
+
+// Mixed QInt32 op QInt16 operations. This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt32 a, const QInt16 b) {
+ return QInt32(a.value + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) + b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a, const QInt16 b) {
+ return QInt32(a.value - static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) - b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const QInt16 b) {
+ return QInt32(a.value * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) * b.value);
+}
+
+// Mixed QInt32 op QUInt8 operations. This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt32 a, const QUInt8 b) {
+ return QInt32(a.value + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QUInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) + b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a, const QUInt8 b) {
+ return QInt32(a.value - static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QUInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) - b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const QUInt8 b) {
+ return QInt32(a.value * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QUInt8 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) * b.value);
+}
+
+// Mixed QInt32 op QUInt16 operations. This will be vectorized in future CLs.
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt32 a, const QUInt16 b) {
+ return QInt32(a.value + static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator+(const QUInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) + b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a, const QUInt16 b) {
+ return QInt32(a.value - static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QUInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) - b.value);
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const QUInt16 b) {
+ return QInt32(a.value * static_cast<int32_t>(b.value));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QUInt16 a, const QInt32 b) {
+ return QInt32(static_cast<int32_t>(a.value) * b.value);
+}
+
+// Basic arithmetic operations on QInt32, which behaves like an int32_t.
+EIGEN_STRONG_INLINE QInt32 operator+(const QInt32 a, const QInt32 b) {
+ return a.value + b.value;
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a, const QInt32 b) {
+ return a.value - b.value;
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const QInt32 b) {
+ return a.value * b.value;
+}
+EIGEN_STRONG_INLINE QInt32 operator/(const QInt32 a, const QInt32 b) {
+ return a.value / b.value;
+}
+EIGEN_STRONG_INLINE QInt32& operator+=(QInt32& a, const QInt32 b) {
+ a.value += b.value;
+ return a;
+}
+EIGEN_STRONG_INLINE QInt32& operator-=(QInt32& a, const QInt32 b) {
+ a.value -= b.value;
+ return a;
+}
+EIGEN_STRONG_INLINE QInt32& operator*=(QInt32& a, const QInt32 b) {
+ a.value *= b.value;
+ return a;
+}
+EIGEN_STRONG_INLINE QInt32& operator/=(QInt32& a, const QInt32 b) {
+ a.value /= b.value;
+ return a;
+}
+EIGEN_STRONG_INLINE QInt32 operator-(const QInt32 a) {
+ return -a.value;
+}
+
+// Scaling QInt32 by double. We do the arithmetic in double because
+// float only has 23 bits of mantissa, so casting QInt32 to float might reduce
+// accuracy by discarding up to 7 (least significant) bits.
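+// For example, 16777217 (2^24 + 1) cannot be represented exactly as a float,
+// whereas a double represents every int32_t value exactly.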
+EIGEN_STRONG_INLINE QInt32 operator*(const QInt32 a, const double b) {
+ return static_cast<int32_t>(lrint(static_cast<double>(a.value) * b));
+}
+EIGEN_STRONG_INLINE QInt32 operator*(const double a, const QInt32 b) {
+ return static_cast<int32_t>(lrint(a * static_cast<double>(b.value)));
+}
+EIGEN_STRONG_INLINE QInt32& operator*=(QInt32& a, const double b) {
+ a.value = static_cast<int32_t>(lrint(static_cast<double>(a.value) * b));
+ return a;
+}
+
+// Comparisons
+EIGEN_STRONG_INLINE bool operator==(const QInt8 a, const QInt8 b) {
+ return a.value == b.value;
+}
+EIGEN_STRONG_INLINE bool operator==(const QUInt8 a, const QUInt8 b) {
+ return a.value == b.value;
+}
+EIGEN_STRONG_INLINE bool operator==(const QInt16 a, const QInt16 b) {
+ return a.value == b.value;
+}
+EIGEN_STRONG_INLINE bool operator==(const QUInt16 a, const QUInt16 b) {
+ return a.value == b.value;
+}
+EIGEN_STRONG_INLINE bool operator==(const QInt32 a, const QInt32 b) {
+ return a.value == b.value;
+}
+
+EIGEN_STRONG_INLINE bool operator<(const QInt8 a, const QInt8 b) {
+ return a.value < b.value;
+}
+EIGEN_STRONG_INLINE bool operator<(const QUInt8 a, const QUInt8 b) {
+ return a.value < b.value;
+}
+EIGEN_STRONG_INLINE bool operator<(const QInt16 a, const QInt16 b) {
+ return a.value < b.value;
+}
+EIGEN_STRONG_INLINE bool operator<(const QUInt16 a, const QUInt16 b) {
+ return a.value < b.value;
+}
+EIGEN_STRONG_INLINE bool operator<(const QInt32 a, const QInt32 b) {
+ return a.value < b.value;
+}
+
+EIGEN_STRONG_INLINE bool operator>(const QInt8 a, const QInt8 b) {
+ return a.value > b.value;
+}
+EIGEN_STRONG_INLINE bool operator>(const QUInt8 a, const QUInt8 b) {
+ return a.value > b.value;
+}
+EIGEN_STRONG_INLINE bool operator>(const QInt16 a, const QInt16 b) {
+ return a.value > b.value;
+}
+EIGEN_STRONG_INLINE bool operator>(const QUInt16 a, const QUInt16 b) {
+ return a.value > b.value;
+}
+EIGEN_STRONG_INLINE bool operator>(const QInt32 a, const QInt32 b) {
+ return a.value > b.value;
+}
+
+EIGEN_STRONG_INLINE std::ostream& operator<<(std::ostream& os, QInt8 a) {
+ os << static_cast<int>(a.value);
+ return os;
+}
+EIGEN_STRONG_INLINE std::ostream& operator<<(std::ostream& os, QUInt8 a) {
+ os << static_cast<int>(a.value);
+ return os;
+}
+EIGEN_STRONG_INLINE std::ostream& operator<<(std::ostream& os, QInt16 a) {
+ os << static_cast<int>(a.value);
+ return os;
+}
+EIGEN_STRONG_INLINE std::ostream& operator<<(std::ostream& os, QUInt16 a) {
+ os << static_cast<int>(a.value);
+ return os;
+}
+EIGEN_STRONG_INLINE std::ostream& operator<<(std::ostream& os, QInt32 a) {
+ os << a.value;
+ return os;
+}
+
+} // namespace Eigen
+
+#endif // EIGEN_CXX11_FIXED_POINT_TYPES_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProduct.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProduct.h
new file mode 100644
index 0000000000..4d0dca07df
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProduct.h
@@ -0,0 +1,255 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_H
+#define EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_H
+
+
+namespace Eigen {
+namespace internal {
+
+// Accumulate the product of 2 QInt8 inputs on 32 bits to prevent
+// overflows
+template<> struct scalar_product_traits<QInt8, QInt8>
+{
+ enum {
+ Defined = 1
+ };
+ typedef QInt32 ReturnType;
+};
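+// For example, (-128) * (-128) = 16384, so a 16-bit accumulator could overflow
+// after summing just two products; 32 bits leaves ample headroom for the
+// contraction dimension.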
+
+// Accumulate the product of QInt8 inputs with QUInt8 inputs on 32 bits
+// to prevent overflows
+template<> struct scalar_product_traits<QInt8, QUInt8>
+{
+ enum {
+ Defined = 1
+ };
+ typedef QInt32 ReturnType;
+};
+
+// Description of the product implementation. It's pretty simple now since
+// nothing is vectorized yet.
+// This definition tackles the case where both lhs and rhs are encoded using
+// signed 8-bit integers.
+#ifndef EIGEN_USE_OPTIMIZED_INT8_INT8_MAT_MAT_PRODUCT
+
+template<bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<QInt8, QInt8, _ConjLhs, _ConjRhs>
+{
+public:
+ typedef QInt8 LhsScalar;
+ typedef QInt8 RhsScalar;
+ typedef QInt32 ResScalar;
+
+ enum {
+ // register block size along the M and N directions
+ // One for the current implementation
+ nr = 1,
+ mr = 1,
+ // Progress made at each iteration of the product loop
+ // also 1 for the current implementation
+ LhsProgress = 1,
+ RhsProgress = 1
+ };
+};
+
+// The signed 8-bit Mat-Mat product itself.
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel<QInt8, QInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QInt8* blockA, const QInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<QInt8, QInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QInt8* blockA, const QInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
+ for (Index j = 0; j < cols; ++j) {
+ Index startB = j * depth;
+
+ for (Index i = 0; i < rows; ++i) {
+ Index startA = i * depth;
+
+ for (Index k = 0; k < depth; ++k) {
+ res(i, j) += blockA[startA + k] * blockB[startB + k];
+ }
+ }
+ }
+}
+#endif
+
+
+// This definition tackles the case where the lhs is encoded using signed 8-bit
+// integers and the rhs using unsigned 8-bit integers.
+#ifndef EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+template<bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<QInt8, QUInt8, _ConjLhs, _ConjRhs>
+{
+public:
+ typedef QInt8 LhsScalar;
+ typedef QUInt8 RhsScalar;
+ typedef QInt32 ResScalar;
+
+ enum {
+ // register block size along the M and N directions
+ // One for the current implementation
+ nr = 1,
+ mr = 1,
+ // Progress made at each iteration of the product loop
+ // also 1 for the current implementation
+ LhsProgress = 1,
+ RhsProgress = 1
+ };
+};
+
+// Mat-Mat product of a signed 8-bit lhs with an unsigned 8-bit rhs
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
+ for (Index j = 0; j < cols; ++j) {
+ Index startB = j * depth;
+
+ for (Index i = 0; i < rows; ++i) {
+ Index startA = i * depth;
+
+ for (Index k = 0; k < depth; ++k) {
+ res(i, j) += blockA[startA + k] * blockB[startB + k];
+ }
+ }
+ }
+}
+#endif
+
+
+// This definition tackles the case where the lhs is encoded using unsigned
+// 8-bit integers and the rhs using signed 8-bit integers.
+#ifndef EIGEN_USE_OPTIMIZED_UINT8_INT8_MAT_MAT_PRODUCT
+template<bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<QUInt8, QInt8, _ConjLhs, _ConjRhs>
+{
+public:
+ typedef QUInt8 LhsScalar;
+ typedef QInt8 RhsScalar;
+ typedef QInt32 ResScalar;
+
+ enum {
+ // register block size along the M and N directions
+ // One for the current implementation
+ nr = 1,
+ mr = 1,
+ // Progress made at each iteration of the product loop
+ // also 1 for the current implementation
+ LhsProgress = 1,
+ RhsProgress = 1
+ };
+};
+
+
+// Mat-Mat product of an unsigned 8-bit lhs with a signed 8-bit rhs
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel<QUInt8, QInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QUInt8* blockA, const QInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<QUInt8, QInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QUInt8* blockA, const QInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
+ for (Index j = 0; j < cols; ++j) {
+ Index startB = j * depth;
+
+ for (Index i = 0; i < rows; ++i) {
+ Index startA = i * depth;
+
+ for (Index k = 0; k < depth; ++k) {
+ res(i, j) += blockA[startA + k] * blockB[startB + k];
+ }
+ }
+ }
+}
+#endif
+
+} // namespace internal
+} // namespace Eigen
+
+
+
+#endif // EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductAVX2.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductAVX2.h
new file mode 100644
index 0000000000..d561b79fbd
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductAVX2.h
@@ -0,0 +1,1743 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+// Copyright (C) 2015 Matthew Sarett <msarett@google.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_AVX2_H
+#define EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_AVX2_H
+
+namespace Eigen {
+namespace internal {
+
+// AVX2 optimized implementation of Mat-Mat product.
+// LHS is encoded using signed 8-bit integers.
+// RHS is encoded using unsigned 8-bit integers.
+#ifdef EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+
+// Define quantized traits
+template<bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<QInt8, QUInt8, _ConjLhs, _ConjRhs>
+{
+public:
+ typedef QInt8 LhsScalar;
+ typedef QUInt8 RhsScalar;
+ typedef QInt32 ResScalar;
+
+ enum {
+ // Define register blocking scheme.
+ nr = 32,
+ mr = 32,
+ kr = 8,
+ // Ignore progress tracking per loop iteration.
+ LhsProgress = -1,
+ RhsProgress = -1
+ };
+};
+
+// Specialized blocking for quantized implementations.
+// Used by TensorContractionThreadPool; inputs must have dimensions that are
+// multiples of 32.
+template<int KcFactor, typename Index>
+struct ComputeGemmByColBlockingSizes<QInt8, QUInt8, KcFactor, Index> {
+ void operator()(Index& k, Index& m, Index& n, Index num_threads)
+ {
+ eigen_assert(m % 32 == 0);
+ eigen_assert(n % 32 == 0);
+ eigen_assert(k % 32 == 0);
+ if (!k || !m || !n) {
+ return;
+ }
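+    // For example, with n = 1024 and num_threads = 3, each thread's share of
+    // 1024 / 3 = 341 columns is rounded up to the next multiple of 32, i.e.
+    // n = 352.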
+ n = (((n / num_threads) + 31) / 32) * 32;
+ }
+};
+
+// Specialized blocking for quantized implementations.
+// Used by TensorContractionThreadPool; inputs must have dimensions that are
+// multiples of 32.
+template<int KcFactor, typename Index>
+struct ComputeGemmByRowBlockingSizes<QInt8, QUInt8, KcFactor, Index> {
+ void operator()(Index& k, Index& m, Index& n, Index num_threads)
+ {
+ eigen_assert(m % 32 == 0);
+ eigen_assert(n % 32 == 0 || n == 1);
+ eigen_assert(k % 32 == 0);
+ if (!k || !m || !n) {
+ return;
+ }
+ // Special case to avoid breaking the unimplemented matrix-vector case
+ if (n == 1) {
+ n = 32;
+ }
+ m = (((m / num_threads) + 31) / 32) * 32;
+ }
+};
+
+// Specialized blocking for quantized implementations.
+// Used by TensorContraction and GeneralMatrixMatrix; inputs are padded to
+// multiples of 32.
+template <int MaxRows, int MaxCols, int MaxDepth, int KcFactor>
+class gemm_blocking_space<ColMajor, QInt8, QInt8, MaxRows, MaxCols, MaxDepth,
+ KcFactor, false>
+ : public level3_blocking<QInt8, QInt8> {
+ DenseIndex m_sizeA;
+ DenseIndex m_sizeB;
+
+ public:
+ gemm_blocking_space(DenseIndex rows, DenseIndex cols, DenseIndex depth,
+ DenseIndex /*num_threads*/, bool /*l3_blocking*/) {
+ this->m_mc = ((rows + 31) / 32) * 32;
+ this->m_nc = ((cols + 31) / 32) * 32;
+ this->m_kc = ((depth + 31) / 32) * 32;
+ m_sizeA = this->m_mc * this->m_kc;
+ m_sizeB = this->m_kc * this->m_nc;
+ }
+ void allocateA() {
+ if (this->m_blockA == 0) this->m_blockA = aligned_new<QInt8>(m_sizeA);
+ }
+ void allocateB() {
+ if (this->m_blockB == 0) this->m_blockB = aligned_new<QInt8>(m_sizeB);
+ }
+ void allocateAll() {
+ allocateA();
+ allocateB();
+ }
+ ~gemm_blocking_space() {
+ aligned_delete(this->m_blockA, m_sizeA);
+ aligned_delete(this->m_blockB, m_sizeB);
+ }
+};
+
+
+template <int MaxRows, int MaxCols, int MaxDepth, int KcFactor>
+class gemm_blocking_space<ColMajor, QInt8, QUInt8, MaxRows, MaxCols, MaxDepth,
+ KcFactor, false>
+ : public level3_blocking<QInt8, QUInt8> {
+ DenseIndex m_sizeA;
+ DenseIndex m_sizeB;
+
+ public:
+ gemm_blocking_space(DenseIndex rows, DenseIndex cols, DenseIndex depth,
+ DenseIndex /*num_threads*/, bool /*l3_blocking*/) {
+ this->m_mc = ((rows + 31) / 32) * 32;
+ this->m_nc = ((cols + 31) / 32) * 32;
+ this->m_kc = ((depth + 31) / 32) * 32;
+ m_sizeA = this->m_mc * this->m_kc;
+ m_sizeB = this->m_kc * this->m_nc;
+ }
+ void allocateA() {
+ if (this->m_blockA == 0) this->m_blockA = aligned_new<QInt8>(m_sizeA);
+ }
+ void allocateB() {
+ if (this->m_blockB == 0) this->m_blockB = aligned_new<QUInt8>(m_sizeB);
+ }
+ void allocateAll() {
+ allocateA();
+ allocateB();
+ }
+ ~gemm_blocking_space() {
+ aligned_delete(this->m_blockA, m_sizeA);
+ aligned_delete(this->m_blockB, m_sizeB);
+ }
+};
+
+// Alternate templates for any input sizes
+template<typename Scalar, typename Index, typename DataMapper, int Pack1, int Pack2, int StorageOrder, bool Conjugate = false, bool PanelMode = false>
+struct gemm_pack_lhs_any;
+template <typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+struct gemm_pack_lhs_any<QInt8, Index, DataMapper, Pack1, Pack2, ColMajor, Conjugate, PanelMode> {
+ EIGEN_DONT_INLINE void operator()
+ (QInt8* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride = 0, Index offset = 0);
+};
+
+template<typename Scalar, typename Index, typename DataMapper, int nr, int StorageOrder, bool Conjugate = false, bool PanelMode=false>
+struct gemm_pack_rhs_any;
+template <typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+struct gemm_pack_rhs_any<QUInt8, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode> {
+ EIGEN_DONT_INLINE void operator()
+ (QUInt8* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride = 0, Index offset = 0);
+};
+
+template<typename LhsScalar, typename RhsScalar, typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs=false, bool ConjugateRhs=false>
+struct gebp_kernel_any;
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel_any<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ typedef typename DataMapper::LinearMapper LinearMapper;
+
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+// Alternate implementations for any input sizes
+template <typename Index, typename DataMapper, int Pack1, int Pack2, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_lhs_any<QInt8, Index, DataMapper, Pack1, Pack2, ColMajor, Conjugate, PanelMode>::
+operator()(QInt8* blockA, const DataMapper& lhs, Index depth, Index rows, Index stride, Index offset) {
+ eigen_assert(stride == 0);
+ eigen_assert(offset == 0);
+
+ // Get vector pointer
+ __m256i* blockA_256 = reinterpret_cast<__m256i*>(blockA);
+
+ // Get even multiples of the dimensions
+ Index rows_32 = (rows / 32) * 32;
+ Index depth_8 = (depth / 8) * 8;
+
+ // Get padding for when depth is not a multiple of 32
+ int padding = 0;
+ if (depth % 32 != 0) {
+ int depth_32 = (depth / 32) * 32;
+ int extra_depth = depth - depth_32;
+ int extra_depth_8 = ((extra_depth + 7) / 8) * 8;
+ padding = 32 - extra_depth_8;
+ }
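+  // For example, with depth == 20 each 32-row panel packs three 8-deep steps
+  // (24 depth entries, the last 4 zero-padded), so padding skips the remaining
+  // 8 of the 32 vector slots reserved per panel.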
+
+ // Pack rows in sets of 32
+ for (Index m = 0; m < rows_32; m += 32) {
+ // Pack depth in sets of 8
+ for (Index k = 0; k < depth_8; k += 8) {
+ // Load vectors
+ __m256i L_A = lhs.loadPacket(m, k);
+ __m256i L_B = lhs.loadPacket(m, k + 1);
+
+ // Interleave 8-bit elements
+ __m256i L_AB0_AB16 = _mm256_unpacklo_epi8(L_A, L_B);
+ __m256i L_AB8_AB24 = _mm256_unpackhi_epi8(L_A, L_B);
+
+ __m256i L_C = lhs.loadPacket(m, k + 2);
+ __m256i L_D = lhs.loadPacket(m, k + 3);
+ __m256i L_CD0_CD16 = _mm256_unpacklo_epi8(L_C, L_D);
+ __m256i L_CD8_CD24 = _mm256_unpackhi_epi8(L_C, L_D);
+
+ // Interleave 16-bit elements
+ __m256i L_AD0_AD16 = _mm256_unpacklo_epi16(L_AB0_AB16, L_CD0_CD16);
+ __m256i L_AD4_AD20 = _mm256_unpackhi_epi16(L_AB0_AB16, L_CD0_CD16);
+
+ // Use permute before we store to cross 128-bit lanes
+ __m256i L_AD0 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD0);
+
+ // Complete packing for 32 x 8 block
+ __m256i L_AD16 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x31);
+ __m256i L_AD8_AD24 = _mm256_unpacklo_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD12_AD28 = _mm256_unpackhi_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD8 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD8);
+ _mm256_store_si256(blockA_256++, L_AD16);
+ __m256i L_AD24 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x31);
+ _mm256_store_si256(blockA_256++, L_AD24);
+ __m256i L_E = lhs.loadPacket(m, k + 4);
+ __m256i L_F = lhs.loadPacket(m, k + 5);
+ __m256i L_EF0_EF16 = _mm256_unpacklo_epi8(L_E, L_F);
+ __m256i L_EF8_EF24 = _mm256_unpackhi_epi8(L_E, L_F);
+ __m256i L_G = lhs.loadPacket(m, k + 6);
+ __m256i L_H = lhs.loadPacket(m, k + 7);
+ __m256i L_GH0_GH16 = _mm256_unpacklo_epi8(L_G, L_H);
+ __m256i L_GH8_GH24 = _mm256_unpackhi_epi8(L_G, L_H);
+ __m256i L_EH0_EH16 = _mm256_unpacklo_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH4_EH20 = _mm256_unpackhi_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH0 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH0);
+ __m256i L_EH16 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x31);
+ __m256i L_EH8_EH24 = _mm256_unpacklo_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH12_EH28 = _mm256_unpackhi_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH8 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH8);
+ _mm256_store_si256(blockA_256++, L_EH16);
+ __m256i L_EH24 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x31);
+ _mm256_store_si256(blockA_256++, L_EH24);
+ }
+
+ // Finish the k dimension, padding with zeros
+ if (depth_8 < depth) {
+ __m256i L_A, L_B, L_C, L_D, L_E, L_F, L_G, L_H;
+ switch (depth - depth_8) {
+ case 1:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 2:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 3:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = lhs.loadPacket(m, depth_8 + 2);
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 4:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = lhs.loadPacket(m, depth_8 + 2);
+ L_D = lhs.loadPacket(m, depth_8 + 3);
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 5:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = lhs.loadPacket(m, depth_8 + 2);
+ L_D = lhs.loadPacket(m, depth_8 + 3);
+ L_E = lhs.loadPacket(m, depth_8 + 4);
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 6:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = lhs.loadPacket(m, depth_8 + 2);
+ L_D = lhs.loadPacket(m, depth_8 + 3);
+ L_E = lhs.loadPacket(m, depth_8 + 4);
+ L_F = lhs.loadPacket(m, depth_8 + 5);
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ break;
+ case 7:
+ L_A = lhs.loadPacket(m, depth_8);
+ L_B = lhs.loadPacket(m, depth_8 + 1);
+ L_C = lhs.loadPacket(m, depth_8 + 2);
+ L_D = lhs.loadPacket(m, depth_8 + 3);
+ L_E = lhs.loadPacket(m, depth_8 + 4);
+ L_F = lhs.loadPacket(m, depth_8 + 5);
+ L_G = lhs.loadPacket(m, depth_8 + 6);
+ L_H = _mm256_setzero_si256();
+ break;
+ }
+
+ // Interleave 8-bit elements
+ __m256i L_AB0_AB16 = _mm256_unpacklo_epi8(L_A, L_B);
+ __m256i L_AB8_AB24 = _mm256_unpackhi_epi8(L_A, L_B);
+
+ __m256i L_CD0_CD16 = _mm256_unpacklo_epi8(L_C, L_D);
+ __m256i L_CD8_CD24 = _mm256_unpackhi_epi8(L_C, L_D);
+
+ // Interleave 16-bit elements
+ __m256i L_AD0_AD16 = _mm256_unpacklo_epi16(L_AB0_AB16, L_CD0_CD16);
+ __m256i L_AD4_AD20 = _mm256_unpackhi_epi16(L_AB0_AB16, L_CD0_CD16);
+
+ // Use permute before we store to cross 128-bit lanes
+ __m256i L_AD0 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD0);
+
+ // Complete packing
+ __m256i L_AD16 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x31);
+ __m256i L_AD8_AD24 = _mm256_unpacklo_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD12_AD28 = _mm256_unpackhi_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD8 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD8);
+ _mm256_store_si256(blockA_256++, L_AD16);
+ __m256i L_AD24 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x31);
+ _mm256_store_si256(blockA_256++, L_AD24);
+ __m256i L_EF0_EF16 = _mm256_unpacklo_epi8(L_E, L_F);
+ __m256i L_EF8_EF24 = _mm256_unpackhi_epi8(L_E, L_F);
+ __m256i L_GH0_GH16 = _mm256_unpacklo_epi8(L_G, L_H);
+ __m256i L_GH8_GH24 = _mm256_unpackhi_epi8(L_G, L_H);
+ __m256i L_EH0_EH16 = _mm256_unpacklo_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH4_EH20 = _mm256_unpackhi_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH0 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH0);
+ __m256i L_EH16 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x31);
+ __m256i L_EH8_EH24 = _mm256_unpacklo_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH12_EH28 = _mm256_unpackhi_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH8 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH8);
+ _mm256_store_si256(blockA_256++, L_EH16);
+ __m256i L_EH24 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x31);
+ _mm256_store_si256(blockA_256++, L_EH24);
+ }
+ blockA_256 += padding;
+ }
+
+ // Finish the m dimension, padding with zeros
+ if (rows_32 < rows) {
+ // Pack depth in sets of 8
+ for (Index k = 0; k < depth_8; k += 8) {
+ // Load vectors
+ __m256i L_A = _mm256_setzero_si256();
+ __m256i L_B = _mm256_setzero_si256();
+ __m256i L_C = _mm256_setzero_si256();
+ __m256i L_D = _mm256_setzero_si256();
+ __m256i L_E = _mm256_setzero_si256();
+ __m256i L_F = _mm256_setzero_si256();
+ __m256i L_G = _mm256_setzero_si256();
+ __m256i L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ QInt8* ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, k);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, k + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, k + 2);
+ ptr = (QInt8*) &L_D;
+ ptr[m] = lhs(rows_32 + m, k + 3);
+ ptr = (QInt8*) &L_E;
+ ptr[m] = lhs(rows_32 + m, k + 4);
+ ptr = (QInt8*) &L_F;
+ ptr[m] = lhs(rows_32 + m, k + 5);
+ ptr = (QInt8*) &L_G;
+ ptr[m] = lhs(rows_32 + m, k + 6);
+ ptr = (QInt8*) &L_H;
+ ptr[m] = lhs(rows_32 + m, k + 7);
+ }
+
+ // Interleave 8-bit elements
+ __m256i L_AB0_AB16 = _mm256_unpacklo_epi8(L_A, L_B);
+ __m256i L_AB8_AB24 = _mm256_unpackhi_epi8(L_A, L_B);
+ __m256i L_CD0_CD16 = _mm256_unpacklo_epi8(L_C, L_D);
+ __m256i L_CD8_CD24 = _mm256_unpackhi_epi8(L_C, L_D);
+
+ // Interleave 16-bit elements
+ __m256i L_AD0_AD16 = _mm256_unpacklo_epi16(L_AB0_AB16, L_CD0_CD16);
+ __m256i L_AD4_AD20 = _mm256_unpackhi_epi16(L_AB0_AB16, L_CD0_CD16);
+
+ // Use permute before we store to cross 128-bit lanes
+ __m256i L_AD0 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD0);
+
+ // Complete packing for 32 x 8 block
+ __m256i L_AD16 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x31);
+ __m256i L_AD8_AD24 = _mm256_unpacklo_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD12_AD28 = _mm256_unpackhi_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD8 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD8);
+ _mm256_store_si256(blockA_256++, L_AD16);
+ __m256i L_AD24 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x31);
+ _mm256_store_si256(blockA_256++, L_AD24);
+ __m256i L_EF0_EF16 = _mm256_unpacklo_epi8(L_E, L_F);
+ __m256i L_EF8_EF24 = _mm256_unpackhi_epi8(L_E, L_F);
+ __m256i L_GH0_GH16 = _mm256_unpacklo_epi8(L_G, L_H);
+ __m256i L_GH8_GH24 = _mm256_unpackhi_epi8(L_G, L_H);
+ __m256i L_EH0_EH16 = _mm256_unpacklo_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH4_EH20 = _mm256_unpackhi_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH0 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH0);
+ __m256i L_EH16 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x31);
+ __m256i L_EH8_EH24 = _mm256_unpacklo_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH12_EH28 = _mm256_unpackhi_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH8 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH8);
+ _mm256_store_si256(blockA_256++, L_EH16);
+ __m256i L_EH24 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x31);
+ _mm256_store_si256(blockA_256++, L_EH24);
+ }
+
+ // Finish the k dimension, padding with zeros
+ if (depth_8 < depth) {
+ __m256i L_A, L_B, L_C, L_D, L_E, L_F, L_G, L_H;
+ QInt8* ptr;
+ switch (depth - depth_8) {
+ case 1:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ QInt8* ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ }
+ break;
+ case 2:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ }
+ break;
+ case 3:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 2);
+ }
+ break;
+ case 4:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 2);
+ ptr = (QInt8*) &L_D;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 3);
+ }
+ break;
+ case 5:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 2);
+ ptr = (QInt8*) &L_D;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 3);
+ ptr = (QInt8*) &L_E;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 4);
+ }
+ break;
+ case 6:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 2);
+ ptr = (QInt8*) &L_D;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 3);
+ ptr = (QInt8*) &L_E;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 4);
+ ptr = (QInt8*) &L_F;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 5);
+ }
+ break;
+ case 7:
+ L_A = _mm256_setzero_si256();
+ L_B = _mm256_setzero_si256();
+ L_C = _mm256_setzero_si256();
+ L_D = _mm256_setzero_si256();
+ L_E = _mm256_setzero_si256();
+ L_F = _mm256_setzero_si256();
+ L_G = _mm256_setzero_si256();
+ L_H = _mm256_setzero_si256();
+ for (Index m = 0; m < rows - rows_32; m++) {
+ ptr = (QInt8*) &L_A;
+ ptr[m] = lhs(rows_32 + m, depth_8);
+ ptr = (QInt8*) &L_B;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 1);
+ ptr = (QInt8*) &L_C;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 2);
+ ptr = (QInt8*) &L_D;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 3);
+ ptr = (QInt8*) &L_E;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 4);
+ ptr = (QInt8*) &L_F;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 5);
+ ptr = (QInt8*) &L_G;
+ ptr[m] = lhs(rows_32 + m, depth_8 + 6);
+ }
+ break;
+ }
+
+ // Interleave 8-bit elements
+ __m256i L_AB0_AB16 = _mm256_unpacklo_epi8(L_A, L_B);
+ __m256i L_AB8_AB24 = _mm256_unpackhi_epi8(L_A, L_B);
+ __m256i L_CD0_CD16 = _mm256_unpacklo_epi8(L_C, L_D);
+ __m256i L_CD8_CD24 = _mm256_unpackhi_epi8(L_C, L_D);
+
+ // Interleave 16-bit elements
+ __m256i L_AD0_AD16 = _mm256_unpacklo_epi16(L_AB0_AB16, L_CD0_CD16);
+ __m256i L_AD4_AD20 = _mm256_unpackhi_epi16(L_AB0_AB16, L_CD0_CD16);
+
+ // Use permute before we store to cross 128-bit lanes
+ __m256i L_AD0 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD0);
+
+ // Complete packing
+ __m256i L_AD16 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x31);
+ __m256i L_AD8_AD24 = _mm256_unpacklo_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD12_AD28 = _mm256_unpackhi_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD8 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD8);
+ _mm256_store_si256(blockA_256++, L_AD16);
+ __m256i L_AD24 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x31);
+ _mm256_store_si256(blockA_256++, L_AD24);
+ __m256i L_EF0_EF16 = _mm256_unpacklo_epi8(L_E, L_F);
+ __m256i L_EF8_EF24 = _mm256_unpackhi_epi8(L_E, L_F);
+ __m256i L_GH0_GH16 = _mm256_unpacklo_epi8(L_G, L_H);
+ __m256i L_GH8_GH24 = _mm256_unpackhi_epi8(L_G, L_H);
+ __m256i L_EH0_EH16 = _mm256_unpacklo_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH4_EH20 = _mm256_unpackhi_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH0 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH0);
+ __m256i L_EH16 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x31);
+ __m256i L_EH8_EH24 = _mm256_unpacklo_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH12_EH28 = _mm256_unpackhi_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH8 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH8);
+ _mm256_store_si256(blockA_256++, L_EH16);
+ __m256i L_EH24 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x31);
+ _mm256_store_si256(blockA_256++, L_EH24);
+ }
+ }
+}
+
+template <typename Index, typename DataMapper, int nr, bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_rhs_any<QUInt8, Index, DataMapper, nr, ColMajor, Conjugate, PanelMode>::
+operator()(QUInt8* blockB, const DataMapper& rhs, Index depth, Index cols, Index stride, Index offset) {
+ eigen_assert(stride == 0);
+ eigen_assert(offset == 0);
+
+ // Get vector pointer
+ __m256i* blockB_256 = reinterpret_cast<__m256i*>(blockB);
+
+ // Get even multiples of the dimensions
+ Index cols_32 = (cols / 32) * 32;
+ Index depth_32 = (depth / 32) * 32;
+
+ // Perform a step of the packing for 4 columns
+ __m256i R_AB_L, R_AB_H, R_CD_L, R_CD_H, R_AD_0, R_AD_8, R_AD_16, R_AD_24;
+#define PACK_STEP \
+ R_AB_L = _mm256_unpacklo_epi64(R_A, R_B); \
+ R_CD_L = _mm256_unpacklo_epi64(R_C, R_D); \
+ R_AB_H = _mm256_unpackhi_epi64(R_A, R_B); \
+ R_CD_H = _mm256_unpackhi_epi64(R_C, R_D); \
+ R_AD_0 = _mm256_permute2x128_si256(R_AB_L, R_CD_L, 0x20); \
+ R_AD_16 = _mm256_permute2x128_si256(R_AB_L, R_CD_L, 0x31); \
+ R_AD_8 = _mm256_permute2x128_si256(R_AB_H, R_CD_H, 0x20); \
+ R_AD_24 = _mm256_permute2x128_si256(R_AB_H, R_CD_H, 0x31); \
+ _mm256_store_si256(blockB_256, R_AD_0); \
+ _mm256_store_si256(blockB_256 + 8, R_AD_8); \
+ _mm256_store_si256(blockB_256 + 16, R_AD_16); \
+ _mm256_store_si256(blockB_256 + 24, R_AD_24); \
+ blockB_256++;
+
+ // Pack cols in sets of 32
+ for (Index n = 0; n < cols_32; n += 32) {
+ // Pack depth in sets of 32
+ for (Index k = 0; k < depth_32; k += 32) {
+ __m256i R_A = rhs.loadPacket(k, n);
+ __m256i R_B = rhs.loadPacket(k, n + 1);
+ __m256i R_C = rhs.loadPacket(k, n + 2);
+ __m256i R_D = rhs.loadPacket(k, n + 3);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 4);
+ R_B = rhs.loadPacket(k, n + 5);
+ R_C = rhs.loadPacket(k, n + 6);
+ R_D = rhs.loadPacket(k, n + 7);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 8);
+ R_B = rhs.loadPacket(k, n + 9);
+ R_C = rhs.loadPacket(k, n + 10);
+ R_D = rhs.loadPacket(k, n + 11);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 12);
+ R_B = rhs.loadPacket(k, n + 13);
+ R_C = rhs.loadPacket(k, n + 14);
+ R_D = rhs.loadPacket(k, n + 15);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 16);
+ R_B = rhs.loadPacket(k, n + 17);
+ R_C = rhs.loadPacket(k, n + 18);
+ R_D = rhs.loadPacket(k, n + 19);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 20);
+ R_B = rhs.loadPacket(k, n + 21);
+ R_C = rhs.loadPacket(k, n + 22);
+ R_D = rhs.loadPacket(k, n + 23);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 24);
+ R_B = rhs.loadPacket(k, n + 25);
+ R_C = rhs.loadPacket(k, n + 26);
+ R_D = rhs.loadPacket(k, n + 27);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 28);
+ R_B = rhs.loadPacket(k, n + 29);
+ R_C = rhs.loadPacket(k, n + 30);
+ R_D = rhs.loadPacket(k, n + 31);
+ PACK_STEP;
+
+ blockB_256 += 24;
+ }
+
+ if (depth_32 < depth) {
+ QUInt8* ptr;
+ __m256i R_A = _mm256_setzero_si256();
+ __m256i R_B = _mm256_setzero_si256();
+ __m256i R_C = _mm256_setzero_si256();
+ __m256i R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 1);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 2);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 3);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 4);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 5);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 6);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 7);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 8);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 9);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 10);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 11);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 12);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 13);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 14);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 15);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 16);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 17);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 18);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 19);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 20);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 21);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 22);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 23);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 24);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 25);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 26);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 27);
+ }
+ PACK_STEP;
+
+ R_A = _mm256_setzero_si256();
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n + 28);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 29);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 30);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 31);
+ }
+ PACK_STEP;
+ blockB_256 += 24;
+ }
+ }
+
+ // Finish packing cols
+ if (cols_32 < cols) {
+ // Pack depth in sets of 32
+ for (Index k = 0; k < depth_32; k += 32) {
+ __m256i R_A, R_B, R_C, R_D;
+ Index n;
+ for (n = cols_32; n < cols; n += 4) {
+ switch (cols - n) {
+ case 1:
+ R_A = rhs.loadPacket(k, n);
+ R_B = _mm256_setzero_si256();
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ PACK_STEP;
+ break;
+ case 2:
+ R_A = rhs.loadPacket(k, n);
+ R_B = rhs.loadPacket(k, n + 1);
+ R_C = _mm256_setzero_si256();
+ R_D = _mm256_setzero_si256();
+ PACK_STEP;
+ break;
+ case 3:
+ R_A = rhs.loadPacket(k, n);
+ R_B = rhs.loadPacket(k, n + 1);
+ R_C = rhs.loadPacket(k, n + 2);
+ R_D = _mm256_setzero_si256();
+ PACK_STEP;
+ break;
+ default:
+ R_A = rhs.loadPacket(k, n);
+ R_B = rhs.loadPacket(k, n + 1);
+ R_C = rhs.loadPacket(k, n + 2);
+ R_D = rhs.loadPacket(k, n + 3);
+ PACK_STEP;
+ break;
+ }
+ }
+
+ // Increment the block pointer.
+ // We must pad if cols is not a multiple of 32.
+ blockB_256 += 32 - (n - cols_32) / 4;
+ }
+
+ if (depth_32 < depth) {
+ for (Index n = cols_32; n < cols; n += 4) {
+ QUInt8* ptr;
+ __m256i R_A = _mm256_setzero_si256();
+ __m256i R_B = _mm256_setzero_si256();
+ __m256i R_C = _mm256_setzero_si256();
+ __m256i R_D = _mm256_setzero_si256();
+ switch (cols - n) {
+ case 1:
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n);
+ }
+ PACK_STEP;
+ break;
+ case 2:
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 1);
+ }
+ PACK_STEP;
+ break;
+ case 3:
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 1);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 2);
+ }
+ PACK_STEP;
+ break;
+ default:
+ for (Index k = depth_32; k < depth; k++) {
+ ptr = (QUInt8*) &R_A;
+ ptr[k - depth_32] = rhs(k, n);
+ ptr = (QUInt8*) &R_B;
+ ptr[k - depth_32] = rhs(k, n + 1);
+ ptr = (QUInt8*) &R_C;
+ ptr[k - depth_32] = rhs(k, n + 2);
+ ptr = (QUInt8*) &R_D;
+ ptr[k - depth_32] = rhs(k, n + 3);
+ }
+ PACK_STEP;
+ break;
+ }
+ }
+ }
+ }
+#undef PACK_STEP
+}
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel_any<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
+ Index rows_32 = ((rows + 31) / 32) * 32;
+ Index cols_32 = ((cols + 31) / 32) * 32;
+ Index depth_32 = ((depth + 31) / 32) * 32;
+
+ // Create result block
+ ei_declare_aligned_stack_constructed_variable(QInt32, blockO, 32 * 32, 0);
+ memset(blockO, 0, 32 * 32 * sizeof(QInt32));
+
+ // Get vectorized pointers
+ __m256i* blockO_256 = reinterpret_cast<__m256i*>(blockO);
+ const __m256i* blockA_256 = reinterpret_cast<const __m256i*>(blockA);
+ const __m256i* blockB_256 = reinterpret_cast<const __m256i*>(blockB);
+
+ // Loop over blocks of 32 columns
+ for (Index n = 0; n < cols_32; n += 32) {
+ // Reset index into blockA
+ Index indexL = 0;
+ // Loop over blocks of 32 rows
+ for (Index m = 0; m < rows_32; m += 32) {
+ // Reset index into blockB
+ Index indexR = n / 32 * depth_32;
+ // Loop over blocks of 8 on depth
+ for (Index k = 0; k < depth_32; k += 8) {
+ // Load inputs
+ __m256i L_AD0 = blockA_256[indexL++];
+ __m256i L_AD8 = blockA_256[indexL++];
+ __m256i L_AD16 = blockA_256[indexL++];
+ __m256i L_AD24 = blockA_256[indexL++];
+ __m256i L_EH0 = blockA_256[indexL++];
+ __m256i L_EH8 = blockA_256[indexL++];
+ __m256i L_EH16 = blockA_256[indexL++];
+ __m256i L_EH24 = blockA_256[indexL++];
+ __m256i R_AH0 = blockB_256[indexR++];
+ __m256i R_AH4 = blockB_256[indexR++];
+ __m256i R_AH8 = blockB_256[indexR++];
+ __m256i R_AH12 = blockB_256[indexR++];
+ __m256i R_AH16 = blockB_256[indexR++];
+ __m256i R_AH20 = blockB_256[indexR++];
+ __m256i R_AH24 = blockB_256[indexR++];
+ __m256i R_AH28 = blockB_256[indexR++];
+
+ // This constant is used with madd to convert 16 bit to 32 bit
+ const __m256i ONE = _mm256_set1_epi32(0x00010001);
+
+ // Declare variables used in COMPUTE_STEP
+ __m256i P_16_A, P_16_B, P_32_A, P_32_B, P_32;
+
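+      // COMPUTE_STEP combines maddubs and madd: maddubs multiplies the
+      // unsigned 8-bit rhs bytes by the signed 8-bit lhs bytes and sums
+      // adjacent pairs into 16-bit values; madd against ONE then sums adjacent
+      // 16-bit pairs into 32 bits. Each 32-bit lane of blockO therefore
+      // accumulates four 8-bit products per step.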
+#define COMPUTE_STEP(R_INPUT_A, R_INPUT_B, OFFSET) \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD0); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH0); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD8); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH8); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 1, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 1), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD16); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH16); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 2, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 2), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD24); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH24); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 3, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 3), P_32));
+
+ // Permute and shuffle to copy a single value across the entire vector
+ // Then compute the multiplication
+ __m256i R_AH0_ = _mm256_permute2x128_si256(R_AH0, R_AH0, 0x00);
+ __m256i R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ __m256i R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 0);
+ __m256i R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ __m256i R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 1);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH0, R_AH0, 0x11);
+ __m256i R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ __m256i R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 2);
+ __m256i R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ __m256i R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 3);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH4, R_AH4, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 4);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 5);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH4, R_AH4, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 6);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 7);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH8, R_AH8, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 8);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 9);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH8, R_AH8, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 10);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 11);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH12, R_AH12, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 12);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 13);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH12, R_AH12, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 14);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 15);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH16, R_AH16, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 16);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 17);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH16, R_AH16, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 18);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 19);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH20, R_AH20, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 20);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 21);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH20, R_AH20, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 22);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 23);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH24, R_AH24, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 24);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 25);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH24, R_AH24, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 26);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 27);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH28, R_AH28, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 28);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 29);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH28, R_AH28, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 30);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 31);
+
+#undef COMPUTE_STEP
+ }
+
+ // Transfer the results to the result matrix.
+ if (m + 32 <= rows && n + 32 <= cols) {
+ Index i = 0;
+ for (Index j = n; j < n + 32; j++) {
+ LinearMapper r0 = res.getLinearMapper(m, j);
+ LinearMapper r1 = res.getLinearMapper(m + 8, j);
+ LinearMapper r2 = res.getLinearMapper(m + 16, j);
+ LinearMapper r3 = res.getLinearMapper(m + 24, j);
+ r0.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r0.loadPacket(0)));
+ r1.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r1.loadPacket(0)));
+ r2.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r2.loadPacket(0)));
+ r3.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r3.loadPacket(0)));
+ }
+ }
+ else {
+ for (Index j = n; j < cols; j++) {
+ for (Index i = m; i < rows; i++) {
+ res(i, j) = blockO[(j - n) * 32 + (i - m)];
+ }
+ }
+ }
+
+ // Zero the result block so it can be reused
+ memset(blockO, 0, 32 * 32 * sizeof(QInt32));
+ }
+ }
+}
+
+// Below are the fully optimized versions that are correct only for sizes that
+// are multiples of 32. Keeping these implementations separate yields roughly a
+// 10% performance benefit.
+
+// Arrange a block of the left input matrix in contiguous memory.
+//
+// Given column major input (A0 beside A1 in memory):
+// A0 B0 C0 D0 E0 F0 G0 H0 ...
+// A1 B1 C1 D1 E1 F1 G1 H1 ...
+// A2 B2 C2 D2 E2 F2 G2 H2 ...
+// A3 B3 C3 D3 E3 F3 G3 H3 ...
+// A4 B4 C4 D4 E4 F4 G4 H4 ...
+// A5 B5 C5 D5 E5 F5 G5 H5 ...
+// A6 B6 C6 D6 E6 F6 G6 H6 ...
+// A7 B7 C7 D7 E7 F7 G7 H7 ...
+// A8 ...
+// ...
+//
+// Packing yields output (A0 beside B0 in memory):
+// A0 B0 C0 D0
+// A1 B1 C1 D1
+// A2 B2 C2 D2
+// A3 B3 C3 D3
+// A4 B4 C4 D4
+// A5 B5 C5 D5
+// A6 B6 C6 D6
+// A7 B7 C7 D7
+// ...
+// A31 B31 C31 D31
+// E0 F0 G0 H0
+// E1 F1 G1 H1
+// E2 F2 G2 H2
+// E3 F3 G3 H3
+// E4 F4 G4 H4
+// E5 F5 G5 H5
+// E6 F6 G6 H6
+// E7 F7 G7 H7
+// ...
+//
+// Four elements of the same row are arranged contiguously because maddubs and
+// madd both perform an adjacent addition in the kernel.
+template <typename Index, typename DataMapper, int Pack1, int Pack2,
+ bool Conjugate, bool PanelMode>
+struct gemm_pack_lhs<QInt8, Index, DataMapper, Pack1, Pack2, ColMajor,
+ Conjugate, PanelMode> {
+ EIGEN_DONT_INLINE void operator()(QInt8* blockA, const DataMapper& lhs,
+ Index depth, Index rows, Index stride = 0,
+ Index offset = 0);
+};
+
+template <typename Index, typename DataMapper, int Pack1, int Pack2,
+ bool Conjugate, bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_lhs<QInt8, Index, DataMapper, Pack1, Pack2,
+ ColMajor, Conjugate, PanelMode>::
+operator()(QInt8* blockA, const DataMapper& lhs, Index depth, Index rows,
+ Index stride, Index offset) {
+ eigen_assert(stride == 0);
+ eigen_assert(offset == 0);
+
+  // Fall back to the generic implementation for sizes that are not a
+  // multiple of 32.
+  if (rows % 32 != 0 || depth % 32 != 0) {
+    gemm_pack_lhs_any<QInt8, Index, DataMapper, Pack1, Pack2, ColMajor,
+                      Conjugate, PanelMode> lhs_pack;
+    return lhs_pack(blockA, lhs, depth, rows, stride, offset);
+ }
+
+ // Get vector pointer
+ __m256i* blockA_256 = reinterpret_cast<__m256i*>(blockA);
+
+ // Pack rows in sets of 32
+ for (Index m = 0; m < rows; m += 32) {
+ // Pack depth in sets of 8
+ for (Index k = 0; k < depth; k += 8) {
+ // Load vectors
+ __m256i L_A = lhs.loadPacket(m, k);
+ __m256i L_B = lhs.loadPacket(m, k + 1);
+
+ // Interleave 8-bit elements
+ __m256i L_AB0_AB16 = _mm256_unpacklo_epi8(L_A, L_B);
+ __m256i L_AB8_AB24 = _mm256_unpackhi_epi8(L_A, L_B);
+
+ __m256i L_C = lhs.loadPacket(m, k + 2);
+ __m256i L_D = lhs.loadPacket(m, k + 3);
+ __m256i L_CD0_CD16 = _mm256_unpacklo_epi8(L_C, L_D);
+ __m256i L_CD8_CD24 = _mm256_unpackhi_epi8(L_C, L_D);
+
+ // Interleave 16-bit elements
+ __m256i L_AD0_AD16 = _mm256_unpacklo_epi16(L_AB0_AB16, L_CD0_CD16);
+ __m256i L_AD4_AD20 = _mm256_unpackhi_epi16(L_AB0_AB16, L_CD0_CD16);
+
+ // Use permute before we store to cross 128-bit lanes
+ __m256i L_AD0 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD0);
+
+ // Complete packing for 32 x 8 block
+ __m256i L_AD16 = _mm256_permute2x128_si256(L_AD0_AD16, L_AD4_AD20, 0x31);
+ __m256i L_AD8_AD24 = _mm256_unpacklo_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD12_AD28 = _mm256_unpackhi_epi16(L_AB8_AB24, L_CD8_CD24);
+ __m256i L_AD8 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x20);
+ _mm256_store_si256(blockA_256++, L_AD8);
+ _mm256_store_si256(blockA_256++, L_AD16);
+ __m256i L_AD24 = _mm256_permute2x128_si256(L_AD8_AD24, L_AD12_AD28, 0x31);
+ _mm256_store_si256(blockA_256++, L_AD24);
+ __m256i L_E = lhs.loadPacket(m, k + 4);
+ __m256i L_F = lhs.loadPacket(m, k + 5);
+ __m256i L_EF0_EF16 = _mm256_unpacklo_epi8(L_E, L_F);
+ __m256i L_EF8_EF24 = _mm256_unpackhi_epi8(L_E, L_F);
+ __m256i L_G = lhs.loadPacket(m, k + 6);
+ __m256i L_H = lhs.loadPacket(m, k + 7);
+ __m256i L_GH0_GH16 = _mm256_unpacklo_epi8(L_G, L_H);
+ __m256i L_GH8_GH24 = _mm256_unpackhi_epi8(L_G, L_H);
+ __m256i L_EH0_EH16 = _mm256_unpacklo_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH4_EH20 = _mm256_unpackhi_epi16(L_EF0_EF16, L_GH0_GH16);
+ __m256i L_EH0 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH0);
+ __m256i L_EH16 = _mm256_permute2x128_si256(L_EH0_EH16, L_EH4_EH20, 0x31);
+ __m256i L_EH8_EH24 = _mm256_unpacklo_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH12_EH28 = _mm256_unpackhi_epi16(L_EF8_EF24, L_GH8_GH24);
+ __m256i L_EH8 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x20);
+ _mm256_store_si256(blockA_256++, L_EH8);
+ _mm256_store_si256(blockA_256++, L_EH16);
+ __m256i L_EH24 = _mm256_permute2x128_si256(L_EH8_EH24, L_EH12_EH28, 0x31);
+ _mm256_store_si256(blockA_256++, L_EH24);
+ }
+ }
+}
+
+// Arrange a block of the right input matrix in contiguous memory.
+//
+// Given column major input (A0 beside A1 in memory):
+// A0 B0 C0 D0 E0 F0 G0 H0 ...
+// A1 B1 C1 D1 E1 F1 G1 H1 ...
+// A2 B2 C2 D2 E2 F2 G2 H2 ...
+// A3 B3 C3 D3 E3 F3 G3 H3 ...
+// A4 B4 C4 D4 E4 F4 G4 H4 ...
+// A5 B5 C5 D5 E5 F5 G5 H5 ...
+// A6 B6 C6 D6 E6 F6 G6 H6 ...
+// A7 B7 C7 D7 E7 F7 G7 H7 ...
+// A8 ...
+// ...
+//
+// Packing yields row major output (A0 beside A1 in memory):
+// A0 A1 A2 A3 A4 A5 A6 A7
+// B0 B1 B2 B3 B4 B5 B6 B7
+// ...
+//
+// At least four elements of the same col are arranged contiguously because
+// maddubs and madd both perform an adjacent addition in the kernel. We can
+// save work by leaving 8 adjacent elements because kr = 8.
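+//
+// As a concrete illustration of the layout described above: within each
+// packed 32x32 block, the QUInt8 element for column offset c (0 <= c < 32)
+// and depth offset d (0 <= d < 32) ends up at byte offset
+//   (d / 8) * 256 + c * 8 + (d % 8),
+// i.e. every column contributes 8 consecutive depth values before the next
+// column starts.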
+template <typename Index, typename DataMapper, int nr, bool Conjugate,
+ bool PanelMode>
+struct gemm_pack_rhs<QUInt8, Index, DataMapper, nr, ColMajor, Conjugate,
+ PanelMode> {
+ EIGEN_DONT_INLINE void operator()(QUInt8* blockB, const DataMapper& rhs,
+ Index depth, Index cols, Index stride = 0,
+ Index offset = 0);
+};
+
+template <typename Index, typename DataMapper, int nr, bool Conjugate,
+ bool PanelMode>
+EIGEN_DONT_INLINE void gemm_pack_rhs<QUInt8, Index, DataMapper, nr, ColMajor,
+ Conjugate, PanelMode>::
+operator()(QUInt8* blockB, const DataMapper& rhs, Index depth, Index cols,
+ Index stride, Index offset) {
+ eigen_assert(stride == 0);
+ eigen_assert(offset == 0);
+
+  // Fall back to the generic implementation for sizes that are not a
+  // multiple of 32.
+  if (cols % 32 != 0 || depth % 32 != 0) {
+    gemm_pack_rhs_any<QUInt8, Index, DataMapper, nr, ColMajor, Conjugate,
+                      PanelMode> rhs_pack;
+    return rhs_pack(blockB, rhs, depth, cols, stride, offset);
+ }
+
+ // Get vector pointer
+ __m256i* blockB_256 = reinterpret_cast<__m256i*>(blockB);
+
+ // Perform a step of the packing for 4 columns
+ __m256i R_AB_L, R_AB_H, R_CD_L, R_CD_H, R_AD_0, R_AD_8, R_AD_16, R_AD_24;
+#define PACK_STEP \
+ R_AB_L = _mm256_unpacklo_epi64(R_A, R_B); \
+ R_CD_L = _mm256_unpacklo_epi64(R_C, R_D); \
+ R_AB_H = _mm256_unpackhi_epi64(R_A, R_B); \
+ R_CD_H = _mm256_unpackhi_epi64(R_C, R_D); \
+ R_AD_0 = _mm256_permute2x128_si256(R_AB_L, R_CD_L, 0x20); \
+ R_AD_16 = _mm256_permute2x128_si256(R_AB_L, R_CD_L, 0x31); \
+ R_AD_8 = _mm256_permute2x128_si256(R_AB_H, R_CD_H, 0x20); \
+ R_AD_24 = _mm256_permute2x128_si256(R_AB_H, R_CD_H, 0x31); \
+ _mm256_store_si256(blockB_256, R_AD_0); \
+ _mm256_store_si256(blockB_256 + 8, R_AD_8); \
+ _mm256_store_si256(blockB_256 + 16, R_AD_16); \
+ _mm256_store_si256(blockB_256 + 24, R_AD_24); \
+ blockB_256++;
+
+ // Pack cols in sets of 32
+ for (Index n = 0; n < cols; n += 32) {
+ // Pack depth in sets of 32
+ for (Index k = 0; k < depth; k += 32) {
+ __m256i R_A = rhs.loadPacket(k, n);
+ __m256i R_B = rhs.loadPacket(k, n + 1);
+ __m256i R_C = rhs.loadPacket(k, n + 2);
+ __m256i R_D = rhs.loadPacket(k, n + 3);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 4);
+ R_B = rhs.loadPacket(k, n + 5);
+ R_C = rhs.loadPacket(k, n + 6);
+ R_D = rhs.loadPacket(k, n + 7);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 8);
+ R_B = rhs.loadPacket(k, n + 9);
+ R_C = rhs.loadPacket(k, n + 10);
+ R_D = rhs.loadPacket(k, n + 11);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 12);
+ R_B = rhs.loadPacket(k, n + 13);
+ R_C = rhs.loadPacket(k, n + 14);
+ R_D = rhs.loadPacket(k, n + 15);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 16);
+ R_B = rhs.loadPacket(k, n + 17);
+ R_C = rhs.loadPacket(k, n + 18);
+ R_D = rhs.loadPacket(k, n + 19);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 20);
+ R_B = rhs.loadPacket(k, n + 21);
+ R_C = rhs.loadPacket(k, n + 22);
+ R_D = rhs.loadPacket(k, n + 23);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 24);
+ R_B = rhs.loadPacket(k, n + 25);
+ R_C = rhs.loadPacket(k, n + 26);
+ R_D = rhs.loadPacket(k, n + 27);
+ PACK_STEP;
+
+ R_A = rhs.loadPacket(k, n + 28);
+ R_B = rhs.loadPacket(k, n + 29);
+ R_C = rhs.loadPacket(k, n + 30);
+ R_D = rhs.loadPacket(k, n + 31);
+ PACK_STEP;
+
+ blockB_256 += 24;
+ }
+ }
+#undef PACK_STEP
+}
+
+// Perform the actual multiplication on packed inputs
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ typedef typename DataMapper::LinearMapper LinearMapper;
+
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
+  // Fall back to the generic implementation for sizes that are not a
+  // multiple of 32.
+  if (rows % 32 != 0 || cols % 32 != 0 || depth % 32 != 0) {
+    gebp_kernel_any<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs,
+                    ConjugateRhs> gebp;
+    return gebp(res, blockA, blockB, rows, depth, cols, alpha, strideA,
+                strideB, offsetA, offsetB);
+ }
+
+ // Create result block
+ QInt32* blockO = aligned_new<QInt32>(32 * 32);
+ // Allocating the result block is about 5-10% faster than declaring stack
+ // space. It is unclear why this is the case.
+ // ei_declare_aligned_stack_constructed_variable(QInt32, blockO, 32 * 32, 0);
+ memset(blockO, 0, 32 * 32 * sizeof(QInt32));
+
+ // Get vectorized pointers
+ __m256i* blockO_256 = reinterpret_cast<__m256i*>(blockO);
+ const __m256i* blockA_256 = reinterpret_cast<const __m256i*>(blockA);
+ const __m256i* blockB_256 = reinterpret_cast<const __m256i*>(blockB);
+
+ // Loop over blocks of 32 columns
+ for (Index n = 0; n < cols; n += 32) {
+ // Reset index into blockA
+ Index indexL = 0;
+ // Loop over blocks of 32 rows
+ for (Index m = 0; m < rows; m += 32) {
+ // Reset index into blockB
+ Index indexR = n / 32 * depth;
+ // Loop over blocks of 8 on depth
+ for (Index k = 0; k < depth; k += 8) {
+ // Load inputs
+ __m256i L_AD0 = blockA_256[indexL++];
+ __m256i L_AD8 = blockA_256[indexL++];
+ __m256i L_AD16 = blockA_256[indexL++];
+ __m256i L_AD24 = blockA_256[indexL++];
+ __m256i L_EH0 = blockA_256[indexL++];
+ __m256i L_EH8 = blockA_256[indexL++];
+ __m256i L_EH16 = blockA_256[indexL++];
+ __m256i L_EH24 = blockA_256[indexL++];
+ __m256i R_AH0 = blockB_256[indexR++];
+ __m256i R_AH4 = blockB_256[indexR++];
+ __m256i R_AH8 = blockB_256[indexR++];
+ __m256i R_AH12 = blockB_256[indexR++];
+ __m256i R_AH16 = blockB_256[indexR++];
+ __m256i R_AH20 = blockB_256[indexR++];
+ __m256i R_AH24 = blockB_256[indexR++];
+ __m256i R_AH28 = blockB_256[indexR++];
+
+ // This constant is used with madd to convert 16 bit to 32 bit
+ const __m256i ONE = _mm256_set1_epi32(0x00010001);
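+        // In COMPUTE_STEP below, maddubs multiplies each unsigned rhs byte
+        // with the corresponding signed lhs byte and adds adjacent pairs
+        // into saturated 16-bit lanes; madd against ONE then adds adjacent
+        // 16-bit lanes into 32-bit lanes, so every 32-bit accumulator
+        // receives the sum of four int8*uint8 products along the depth
+        // dimension.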
+
+ // Declare variables used in COMPUTE_STEP
+ __m256i P_16_A, P_16_B, P_32_A, P_32_B, P_32;
+
+#define COMPUTE_STEP(R_INPUT_A, R_INPUT_B, OFFSET) \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD0); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH0); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD8); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH8); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 1, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 1), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD16); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH16); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 2, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 2), P_32)); \
+ \
+ P_16_A = _mm256_maddubs_epi16(R_INPUT_A, L_AD24); \
+ P_32_A = _mm256_madd_epi16(P_16_A, ONE); \
+ P_16_B = _mm256_maddubs_epi16(R_INPUT_B, L_EH24); \
+ P_32_B = _mm256_madd_epi16(P_16_B, ONE); \
+ P_32 = _mm256_add_epi32(P_32_A, P_32_B); \
+ _mm256_store_si256( \
+ blockO_256 + 4 * OFFSET + 3, \
+ _mm256_add_epi32(_mm256_load_si256(blockO_256 + 4 * OFFSET + 3), P_32));
+
+ // Permute and shuffle to copy a single value across the entire vector
+ // Then compute the multiplication
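+        // Concretely: _mm256_permute2x128_si256(R, R, 0x00) duplicates the
+        // low 128-bit lane of R into both lanes (0x11 duplicates the high
+        // lane), and _mm256_shuffle_epi32 with 0x00/0x55/0xAA/0xFF
+        // broadcasts one 32-bit quartet of quantized values within each
+        // lane. The R_AD* registers therefore carry depths k..k+3 of a
+        // single rhs column and the R_EH* registers depths k+4..k+7 of the
+        // same column, replicated across the whole vector.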
+ __m256i R_AH0_ = _mm256_permute2x128_si256(R_AH0, R_AH0, 0x00);
+ __m256i R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ __m256i R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 0);
+ __m256i R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ __m256i R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 1);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH0, R_AH0, 0x11);
+ __m256i R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ __m256i R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 2);
+ __m256i R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ __m256i R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 3);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH4, R_AH4, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 4);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 5);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH4, R_AH4, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 6);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 7);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH8, R_AH8, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 8);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 9);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH8, R_AH8, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 10);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 11);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH12, R_AH12, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 12);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 13);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH12, R_AH12, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 14);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 15);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH16, R_AH16, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 16);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 17);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH16, R_AH16, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 18);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 19);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH20, R_AH20, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 20);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 21);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH20, R_AH20, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 22);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 23);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH24, R_AH24, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 24);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 25);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH24, R_AH24, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 26);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 27);
+
+ R_AH0_ = _mm256_permute2x128_si256(R_AH28, R_AH28, 0x00);
+ R_AD0 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH0 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD0, R_EH0, 28);
+ R_AD1 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH1 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD1, R_EH1, 29);
+ R_AH0_ = _mm256_permute2x128_si256(R_AH28, R_AH28, 0x11);
+ R_AD2 = _mm256_shuffle_epi32(R_AH0_, 0x00);
+ R_EH2 = _mm256_shuffle_epi32(R_AH0_, 0x55);
+ COMPUTE_STEP(R_AD2, R_EH2, 30);
+ R_AD3 = _mm256_shuffle_epi32(R_AH0_, 0xAA);
+ R_EH3 = _mm256_shuffle_epi32(R_AH0_, 0xFF);
+ COMPUTE_STEP(R_AD3, R_EH3, 31);
+
+#undef COMPUTE_STEP
+ }
+
+ // Transfer the results to the result matrix
+ Index i = 0;
+ for (Index j = n; j < n + 32; j++) {
+ LinearMapper r0 = res.getLinearMapper(m, j);
+ LinearMapper r1 = res.getLinearMapper(m + 8, j);
+ LinearMapper r2 = res.getLinearMapper(m + 16, j);
+ LinearMapper r3 = res.getLinearMapper(m + 24, j);
+ r0.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r0.loadPacket(0)));
+ r1.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r1.loadPacket(0)));
+ r2.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r2.loadPacket(0)));
+ r3.storePacket(
+ 0, _mm256_add_epi32(blockO_256[i++], r3.loadPacket(0)));
+ }
+
+ // Zero the result block so it can be reused
+ memset(blockO, 0, 32 * 32 * sizeof(QInt32));
+ }
+ }
+ aligned_delete(blockO, 32 * 32);
+}
+
+#endif // EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+
+} // namespace internal
+} // namespace Eigen
+
+#endif // EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_AVX2_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductNEON.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductNEON.h
new file mode 100644
index 0000000000..99894cafb5
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatMatProductNEON.h
@@ -0,0 +1,95 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+// Copyright (C) 2015 Benoit Jacob <benoitjacob@google.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_NEON_H
+#define EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_NEON_H
+
+
+namespace Eigen {
+namespace internal {
+
+
+// Reference (non-vectorized) implementation of the case where the lhs is
+// encoded using signed 8bit integers and the rhs using unsigned 8bit
+// integers.
+#ifdef EIGEN_USE_OPTIMIZED_INT8_UINT8_MAT_MAT_PRODUCT
+
+template<bool _ConjLhs, bool _ConjRhs>
+class gebp_traits<QInt8, QUInt8, _ConjLhs, _ConjRhs>
+{
+public:
+ typedef QInt8 LhsScalar;
+ typedef QUInt8 RhsScalar;
+ typedef QInt32 ResScalar;
+
+ enum {
+ // register block size along the M and N directions
+ // One for the current implementation
+ nr = 1,
+ mr = 1,
+ // Progress made at each iteration of the product loop
+ // also 1 for the current implementation
+ LhsProgress = 1,
+ RhsProgress = 1
+ };
+};
+
+// Mat-Mat product of a signed 8bit lhs with an unsigned 8bit rhs
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+struct gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+{
+ EIGEN_DONT_INLINE
+ void operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA=-1, Index strideB=-1, Index offsetA=0, Index offsetB=0);
+};
+
+template<typename Index, typename DataMapper, int mr, int nr, bool ConjugateLhs, bool ConjugateRhs>
+EIGEN_DONT_INLINE
+void gebp_kernel<QInt8, QUInt8, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>
+::operator()(const DataMapper& res, const QInt8* blockA, const QUInt8* blockB,
+ Index rows, Index depth, Index cols, QInt32 alpha,
+ Index strideA, Index strideB, Index offsetA, Index offsetB)
+{
+ EIGEN_STATIC_ASSERT(!ConjugateLhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(!ConjugateRhs, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ eigen_assert(alpha.value == 1);
+ eigen_assert(strideA == -1);
+ eigen_assert(strideB == -1);
+ eigen_assert(offsetA == 0);
+ eigen_assert(offsetB == 0);
+
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+ eigen_assert(depth > 0);
+ eigen_assert(blockA);
+ eigen_assert(blockB);
+
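+  // Straightforward triple loop over the packed blocks: both blockA and
+  // blockB store the depth dimension contiguously, so lhs row i occupies
+  // blockA[i * depth .. i * depth + depth) and rhs column j occupies
+  // blockB[j * depth .. j * depth + depth).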
+ for (Index j = 0; j < cols; ++j) {
+ Index startB = j * depth;
+
+ for (Index i = 0; i < rows; ++i) {
+ Index startA = i * depth;
+
+ for (Index k = 0; k < depth; ++k) {
+ res(i, j) += blockA[startA + k] * blockB[startB + k];
+ }
+ }
+ }
+}
+#endif
+
+
+} // namespace internal
+} // namespace Eigen
+
+
+
+#endif // EIGEN_CXX11_FIXED_POINT_MAT_MAT_PRODUCT_NEON_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatVecProduct.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatVecProduct.h
new file mode 100644
index 0000000000..18b5085b89
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/MatVecProduct.h
@@ -0,0 +1,123 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_FIXED_POINT_MAT_VEC_PRODUCT_H
+#define EIGEN_CXX11_FIXED_POINT_MAT_VEC_PRODUCT_H
+
+
+namespace Eigen {
+namespace internal {
+
+// Mat-Vec product
+// Both lhs and rhs are encoded as 8bit signed integers
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+struct general_matrix_vector_product<Index,QInt8,LhsMapper,ColMajor,ConjugateLhs,QInt8,RhsMapper,ConjugateRhs,Version>
+{
+EIGEN_DONT_INLINE static void run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QInt8 alpha);
+};
+
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void general_matrix_vector_product<Index,QInt8,LhsMapper,ColMajor,ConjugateLhs,QInt8,RhsMapper,ConjugateRhs,Version>::run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QInt8 alpha)
+{
+ eigen_assert(alpha.value == 1);
+ eigen_assert(resIncr == 1);
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+
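+  // Reference implementation: accumulate plain dot products row by row.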
+ for (Index i = 0; i < rows; ++i) {
+ for (Index j = 0; j < cols; ++j) {
+ res[i] += lhs(i, j) * rhs(j, 0);
+ }
+ }
+}
+
+
+// Mat-Vec product
+// The lhs is encoded using 8bit signed integers, the rhs using 8bit unsigned integers
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+struct general_matrix_vector_product<Index,QInt8,LhsMapper,ColMajor,ConjugateLhs,QUInt8,RhsMapper,ConjugateRhs,Version>
+{
+EIGEN_DONT_INLINE static void run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QUInt8 alpha);
+};
+
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void general_matrix_vector_product<Index,QInt8,LhsMapper,ColMajor,ConjugateLhs,QUInt8,RhsMapper,ConjugateRhs,Version>::run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QUInt8 alpha)
+{
+ eigen_assert(alpha.value == 1);
+ eigen_assert(resIncr == 1);
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+
+ for (Index i = 0; i < rows; ++i) {
+ for (Index j = 0; j < cols; ++j) {
+ res[i] += lhs(i, j) * rhs(j, 0);
+ }
+ }
+}
+
+
+// Mat-Vec product
+// The lhs is encoded using 8bit unsigned integers, the rhs using 8bit signed integers
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+struct general_matrix_vector_product<Index,QUInt8,LhsMapper,ColMajor,ConjugateLhs,QInt8,RhsMapper,ConjugateRhs,Version>
+{
+EIGEN_DONT_INLINE static void run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QInt8 alpha);
+};
+
+template<typename Index, typename LhsMapper, bool ConjugateLhs, typename RhsMapper, bool ConjugateRhs, int Version>
+EIGEN_DONT_INLINE void general_matrix_vector_product<Index,QUInt8,LhsMapper,ColMajor,ConjugateLhs,QInt8,RhsMapper,ConjugateRhs,Version>::run(
+ Index rows, Index cols,
+ const LhsMapper& lhs,
+ const RhsMapper& rhs,
+ QInt32* res, Index resIncr,
+ QInt8 alpha)
+{
+ eigen_assert(alpha.value == 1);
+ eigen_assert(resIncr == 1);
+ eigen_assert(rows > 0);
+ eigen_assert(cols > 0);
+
+ for (Index i = 0; i < rows; ++i) {
+ for (Index j = 0; j < cols; ++j) {
+ res[i] += lhs(i, j) * rhs(j, 0);
+ }
+ }
+}
+
+} // namespace internal
+} // namespace Eigen
+
+
+
+#endif // EIGEN_CXX11_FIXED_POINT_MAT_VEC_PRODUCT_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h
new file mode 100644
index 0000000000..cae1a0b06d
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/PacketMathAVX2.h
@@ -0,0 +1,409 @@
+#ifndef THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_PACKETMATHAVX2_H_
+#define THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_PACKETMATHAVX2_H_
+
+namespace Eigen {
+namespace internal {
+
+typedef struct Packet32q8i {
+ __m256i val;
+ operator __m256i() const { return val; }
+ Packet32q8i();
+ Packet32q8i(__m256i val) : val(val) {}
+} Packet32q8i;
+
+typedef struct Packet32q8u {
+ __m256i val;
+ operator __m256i() const { return val; }
+ Packet32q8u();
+ Packet32q8u(__m256i val) : val(val) {}
+} Packet32q8u;
+
+typedef struct Packet16q8i {
+ __m128i val;
+ operator __m128i() const { return val; }
+ Packet16q8i();
+ Packet16q8i(__m128i val) : val(val) {}
+} Packet16q8i;
+
+typedef struct Packet16q8u {
+ __m128i val;
+ operator __m128i() const { return val; }
+ Packet16q8u();
+ Packet16q8u(__m128i val) : val(val) {}
+} Packet16q8u;
+
+typedef struct Packet8q32i {
+ __m256i val;
+ operator __m256i() const { return val; }
+ Packet8q32i();
+ Packet8q32i(__m256i val) : val(val) {}
+} Packet8q32i;
+
+typedef struct Packet4q32i {
+ __m128i val;
+ operator __m128i() const { return val; }
+ Packet4q32i();
+ Packet4q32i(__m128i val) : val(val) {}
+} Packet4q32i;
+
+template <>
+struct packet_traits<QInt8> : default_packet_traits {
+ typedef Packet32q8i type;
+ typedef Packet16q8i half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 32,
+ };
+ enum {
+ HasAdd = 0,
+ HasSub = 0,
+ HasMul = 0,
+ HasNegate = 0,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 1,
+ HasMax = 1,
+ HasConj = 0,
+ HasSetLinear = 0
+ };
+};
+template <>
+struct packet_traits<QUInt8> : default_packet_traits {
+ typedef Packet32q8u type;
+ typedef Packet16q8u half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 32,
+ };
+ enum {
+ HasAdd = 0,
+ HasSub = 0,
+ HasMul = 0,
+ HasNegate = 0,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 1,
+ HasMax = 1,
+ HasConj = 0,
+ HasSetLinear = 0
+ };
+};
+template <>
+struct packet_traits<QInt32> : default_packet_traits {
+ typedef Packet8q32i type;
+ typedef Packet4q32i half;
+ enum {
+ Vectorizable = 1,
+ AlignedOnScalar = 1,
+ size = 8,
+ };
+ enum {
+ HasAdd = 1,
+ HasSub = 1,
+ HasMul = 1,
+ HasNegate = 1,
+ HasAbs = 0,
+ HasAbs2 = 0,
+ HasMin = 1,
+ HasMax = 1,
+ HasConj = 0,
+ HasSetLinear = 0
+ };
+};
+
+template <>
+struct unpacket_traits<Packet32q8i> {
+ typedef QInt8 type;
+ typedef Packet16q8i half;
+ enum { size = 32 };
+};
+template <>
+struct unpacket_traits<Packet32q8u> {
+ typedef QUInt8 type;
+ typedef Packet16q8u half;
+ enum { size = 32 };
+};
+template <>
+struct unpacket_traits<Packet8q32i> {
+ typedef QInt32 type;
+ typedef Packet4q32i half;
+ enum { size = 8 };
+};
+
+// Unaligned load
+template <>
+EIGEN_STRONG_INLINE Packet32q8i ploadu<Packet32q8i>(const QInt8* from) {
+ EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8u ploadu<Packet32q8u>(const QUInt8* from) {
+ EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i ploadu<Packet8q32i>(const QInt32* from) {
+ EIGEN_DEBUG_UNALIGNED_LOAD return _mm256_loadu_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+
+// Aligned load
+template <>
+EIGEN_STRONG_INLINE Packet32q8i pload<Packet32q8i>(const QInt8* from) {
+ EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8u pload<Packet32q8u>(const QUInt8* from) {
+ EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pload<Packet8q32i>(const QInt32* from) {
+ EIGEN_DEBUG_ALIGNED_LOAD return _mm256_load_si256(
+ reinterpret_cast<const __m256i*>(from));
+}
+
+// Unaligned store
+template <>
+EIGEN_STRONG_INLINE void pstoreu<QInt8>(QInt8* to, const Packet32q8i& from) {
+ EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(
+ reinterpret_cast<__m256i*>(to), from.val);
+}
+template <>
+EIGEN_STRONG_INLINE void pstoreu<QUInt8>(QUInt8* to, const Packet32q8u& from) {
+ EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(
+ reinterpret_cast<__m256i*>(to), from.val);
+}
+template <>
+EIGEN_STRONG_INLINE void pstoreu<QInt32>(QInt32* to, const Packet8q32i& from) {
+ EIGEN_DEBUG_UNALIGNED_STORE _mm256_storeu_si256(
+ reinterpret_cast<__m256i*>(to), from.val);
+}
+
+// Aligned store
+template <>
+EIGEN_STRONG_INLINE void pstore<QInt32>(QInt32* to, const Packet8q32i& from) {
+ EIGEN_DEBUG_ALIGNED_STORE _mm256_store_si256(reinterpret_cast<__m256i*>(to),
+ from.val);
+}
+template <>
+EIGEN_STRONG_INLINE void pstore<QUInt8>(QUInt8* to, const Packet32q8u& from) {
+ EIGEN_DEBUG_ALIGNED_STORE _mm256_store_si256(reinterpret_cast<__m256i*>(to),
+ from.val);
+}
+template <>
+EIGEN_STRONG_INLINE void pstore<QInt8>(QInt8* to, const Packet32q8i& from) {
+ EIGEN_DEBUG_ALIGNED_STORE _mm256_store_si256(reinterpret_cast<__m256i*>(to),
+ from.val);
+}
+
+// Extract first element.
+template <>
+EIGEN_STRONG_INLINE QInt32 pfirst<Packet8q32i>(const Packet8q32i& a) {
+ return _mm_cvtsi128_si32(_mm256_castsi256_si128(a));
+}
+template <>
+EIGEN_STRONG_INLINE QUInt8 pfirst<Packet32q8u>(const Packet32q8u& a) {
+ return static_cast<uint8_t>(_mm256_extract_epi8(a.val, 0));
+}
+template <>
+EIGEN_STRONG_INLINE QInt8 pfirst<Packet32q8i>(const Packet32q8i& a) {
+ return _mm256_extract_epi8(a.val, 0);
+}
+
+// Initialize to constant value.
+template <>
+EIGEN_STRONG_INLINE Packet32q8i pset1<Packet32q8i>(const QInt8& from) {
+ return _mm256_set1_epi8(from.value);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8u pset1<Packet32q8u>(const QUInt8& from) {
+ return _mm256_set1_epi8(static_cast<uint8_t>(from.value));
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pset1<Packet8q32i>(const QInt32& from) {
+ return _mm256_set1_epi32(from.value);
+}
+
+// Basic arithmetic packet ops for QInt32.
+template <>
+EIGEN_STRONG_INLINE Packet8q32i padd<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_add_epi32(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i psub<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_sub_epi32(a.val, b.val);
+}
+// Note: mullo truncates the result to 32 bits.
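+// For example, multiplying two lanes that each hold 0x00010000 (65536)
+// produces 0, since only the low 32 bits of the 64-bit product are kept.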
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pmul<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_mullo_epi32(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pnegate<Packet8q32i>(const Packet8q32i& a) {
+ return _mm256_sub_epi32(_mm256_setzero_si256(), a.val);
+}
+
+// Min and max.
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pmin<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_min_epi32(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pmax<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_max_epi32(a.val, b.val);
+}
+
+template <>
+EIGEN_STRONG_INLINE Packet32q8u pmin<Packet32q8u>(const Packet32q8u& a,
+ const Packet32q8u& b) {
+ return _mm256_min_epu8(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8u pmax<Packet32q8u>(const Packet32q8u& a,
+ const Packet32q8u& b) {
+ return _mm256_max_epu8(a.val, b.val);
+}
+
+template <>
+EIGEN_STRONG_INLINE Packet32q8i pmin<Packet32q8i>(const Packet32q8i& a,
+ const Packet32q8i& b) {
+ return _mm256_min_epi8(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8i pmax<Packet32q8i>(const Packet32q8i& a,
+ const Packet32q8i& b) {
+ return _mm256_max_epi8(a.val, b.val);
+}
+
+// Reductions.
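+// Each reduction repeatedly folds the register in half: first the two
+// 128-bit lanes are combined, then the 64-bit halves within a lane, then
+// 32-bit neighbours (and, for the 8-bit packets, 16-bit and finally 8-bit
+// neighbours), leaving the reduced value in the lowest element(s).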
+template <>
+EIGEN_STRONG_INLINE QInt32 predux_min<Packet8q32i>(const Packet8q32i& a) {
+ __m256i tmp = _mm256_min_epi32(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp =
+ _mm256_min_epi32(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return pfirst<Packet8q32i>(
+ _mm256_min_epi32(tmp, _mm256_shuffle_epi32(tmp, 1)));
+}
+template <>
+EIGEN_STRONG_INLINE QInt32 predux_max<Packet8q32i>(const Packet8q32i& a) {
+ __m256i tmp = _mm256_max_epi32(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp =
+ _mm256_max_epi32(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return pfirst<Packet8q32i>(
+ _mm256_max_epi32(tmp, _mm256_shuffle_epi32(tmp, 1)));
+}
+
+template <>
+EIGEN_STRONG_INLINE QUInt8 predux_min<Packet32q8u>(const Packet32q8u& a) {
+ __m256i tmp = _mm256_min_epu8(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp =
+ _mm256_min_epu8(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ tmp = _mm256_min_epu8(tmp, _mm256_shuffle_epi32(tmp, 1));
+ tmp = _mm256_min_epu8(tmp,
+ _mm256_shufflelo_epi16(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return std::min(static_cast<uint8_t>(_mm256_extract_epi8(tmp, 0)),
+ static_cast<uint8_t>(_mm256_extract_epi8(tmp, 1)));
+}
+template <>
+EIGEN_STRONG_INLINE QUInt8 predux_max<Packet32q8u>(const Packet32q8u& a) {
+ __m256i tmp = _mm256_max_epu8(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp =
+ _mm256_max_epu8(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ tmp = _mm256_max_epu8(tmp, _mm256_shuffle_epi32(tmp, 1));
+ tmp = _mm256_max_epu8(tmp,
+ _mm256_shufflelo_epi16(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return std::max(static_cast<uint8_t>(_mm256_extract_epi8(tmp, 0)),
+ static_cast<uint8_t>(_mm256_extract_epi8(tmp, 1)));
+}
+
+template <>
+EIGEN_STRONG_INLINE QInt8 predux_min<Packet32q8i>(const Packet32q8i& a) {
+ __m256i tmp = _mm256_min_epi8(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp = _mm256_min_epi8(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ tmp = _mm256_min_epi8(tmp, _mm256_shuffle_epi32(tmp, 1));
+ tmp = _mm256_min_epi8(tmp, _mm256_shufflelo_epi16(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return std::min(_mm256_extract_epi8(tmp, 0), _mm256_extract_epi8(tmp, 1));
+}
+template <>
+EIGEN_STRONG_INLINE QInt8 predux_max<Packet32q8i>(const Packet32q8i& a) {
+ __m256i tmp = _mm256_max_epi8(a, _mm256_permute2f128_si256(a, a, 1));
+ tmp = _mm256_max_epi8(tmp, _mm256_shuffle_epi32(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ tmp = _mm256_max_epi8(tmp, _mm256_shuffle_epi32(tmp, 1));
+ tmp = _mm256_max_epi8(tmp, _mm256_shufflelo_epi16(tmp, _MM_SHUFFLE(1, 0, 3, 2)));
+ return std::max(_mm256_extract_epi8(tmp, 0), _mm256_extract_epi8(tmp, 1));
+}
+
+// Comparisons
+template <>
+EIGEN_STRONG_INLINE Packet8q32i peq<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_cmpeq_epi32(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8i peq<Packet32q8i>(const Packet32q8i& a,
+ const Packet32q8i& b) {
+ return _mm256_cmpeq_epi8(a.val, b.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8u peq<Packet32q8u>(const Packet32q8u& a,
+ const Packet32q8u& b) {
+ return _mm256_cmpeq_epi8(a.val, b.val);
+}
+
+// Note: There are no instructions in AVX2 for unsigned lt/gt comparison.
+// These are added in AVX-512.
+template <>
+EIGEN_STRONG_INLINE Packet8q32i ple<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ const __m256i gt = _mm256_cmpgt_epi32(a.val, b.val);
+  // Negate the greater-than mask to get less-than-or-equal.
+  return _mm256_xor_si256(gt, _mm256_set1_epi32(-1));
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8i ple<Packet32q8i>(const Packet32q8i& a,
+ const Packet32q8i& b) {
+ const __m256i gt = _mm256_cmpgt_epi8(a.val, b.val);
+  // Negate the greater-than mask to get less-than-or-equal.
+  return _mm256_xor_si256(gt, _mm256_set1_epi8(-1));
+}
+
+template <>
+EIGEN_STRONG_INLINE Packet8q32i plt<Packet8q32i>(const Packet8q32i& a,
+ const Packet8q32i& b) {
+ return _mm256_cmpgt_epi32(b.val, a.val);
+}
+template <>
+EIGEN_STRONG_INLINE Packet32q8i plt<Packet32q8i>(const Packet32q8i& a,
+ const Packet32q8i& b) {
+ return _mm256_cmpgt_epi8(b.val, a.val);
+}
+
+// Vectorized scaling of a Packet8q32i (QInt32) packet by a double scale factor.
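+// The scaling below goes through double precision, presumably so that every
+// 32-bit integer value is represented exactly during the multiply (a float
+// mantissa only has 24 bits); this requires converting the packet in two
+// 4-element halves.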
+template <>
+struct functor_traits<scalar_multiple2_op<QInt32, double>> {
+ enum { Cost = 4 * NumTraits<float>::MulCost, PacketAccess = true };
+};
+
+template <>
+EIGEN_STRONG_INLINE const Packet8q32i
+scalar_multiple2_op<QInt32, double>::packetOp(const Packet8q32i& a) const {
+ __m256d scale = _mm256_set1_pd(m_other);
+ __m256d a_lo = _mm256_cvtepi32_pd(_mm256_castsi256_si128(a));
+ __m128i result_lo = _mm256_cvtpd_epi32(_mm256_mul_pd(scale, a_lo));
+ __m256d a_hi = _mm256_cvtepi32_pd(_mm256_extracti128_si256(a, 1));
+ __m128i result_hi = _mm256_cvtpd_epi32(_mm256_mul_pd(scale, a_hi));
+ return _mm256_insertf128_si256(_mm256_castsi128_si256(result_lo), result_hi,
+ 1);
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_PACKETMATHAVX2_H_
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/TypeCastingAVX2.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/TypeCastingAVX2.h
new file mode 100644
index 0000000000..045384d7fc
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/FixedPoint/TypeCastingAVX2.h
@@ -0,0 +1,66 @@
+#ifndef THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_TYPECASTINGAVX2_H_
+#define THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_TYPECASTINGAVX2_H_
+
+namespace Eigen {
+namespace internal {
+
+typedef __m256 Packet8f;
+
+template <>
+struct type_casting_traits<QInt32, float> {
+ enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 };
+};
+
+template <>
+EIGEN_STRONG_INLINE Packet8f pcast<Packet8q32i>(const Packet8q32i& a) {
+ return _mm256_cvtepi32_ps(a.val);
+}
+
+template <>
+struct type_casting_traits<float, QInt32> {
+ enum { VectorizedCast = 1, SrcCoeffRatio = 1, TgtCoeffRatio = 1 };
+};
+
+template <>
+EIGEN_STRONG_INLINE Packet8q32i pcast<Packet8f>(const Packet8f& a) {
+ return _mm256_cvtps_epi32(a);
+}
+
+template <>
+struct type_casting_traits<QInt32, QInt8> {
+ enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 };
+};
+
+template <>
+EIGEN_STRONG_INLINE Packet32q8i
+pcast<Packet8q32i, Packet32q8i>(const Packet8q32i& a, const Packet8q32i& b,
+ const Packet8q32i& c, const Packet8q32i& d) {
+ __m256i converted = _mm256_packs_epi16(_mm256_packs_epi32(a.val, b.val),
+ _mm256_packs_epi32(c.val, d.val));
+ // Since packs does not cross 128 bit lane boundaries,
+ // we have to permute to properly order the final result.
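+  // After the two pack steps the bytes are laid out per 128-bit lane as
+  // [a0-3 b0-3 c0-3 d0-3 | a4-7 b4-7 c4-7 d4-7]; permuting the 32-bit
+  // groups with indices {0, 4, 1, 5, 2, 6, 3, 7} restores the natural
+  // a0-7 b0-7 c0-7 d0-7 order.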
+ const __m256i permute_mask = _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0);
+ return _mm256_permutevar8x32_epi32(converted, permute_mask);
+}
+
+template <>
+struct type_casting_traits<QInt32, QUInt8> {
+ enum { VectorizedCast = 1, SrcCoeffRatio = 4, TgtCoeffRatio = 1 };
+};
+
+template <>
+EIGEN_STRONG_INLINE Packet32q8u
+pcast<Packet8q32i, Packet32q8u>(const Packet8q32i& a, const Packet8q32i& b,
+ const Packet8q32i& c, const Packet8q32i& d) {
+ const __m256i converted = _mm256_packus_epi16(
+ _mm256_packs_epi32(a.val, b.val), _mm256_packs_epi32(c.val, d.val));
+ // Since packus does not cross 128 bit lane boundaries,
+ // we have to permute to properly order the final result.
+ const __m256i permute_mask = _mm256_set_epi32(7, 3, 6, 2, 5, 1, 4, 0);
+ return _mm256_permutevar8x32_epi32(converted, permute_mask);
+}
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // THIRD_PARTY_EIGEN3_UNSUPPORTED_EIGEN_CXX11_SRC_FIXEDPOINT_TYPECASTINGAVX2_H_
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Activations.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Activations.h
new file mode 100644
index 0000000000..94d616f2b5
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Activations.h
@@ -0,0 +1,116 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_ACTIVATIONS_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_ACTIVATIONS_H
+
+namespace Eigen {
+
+/** scalar_sigmoid_fast_derivative_op
+ * \ingroup CXX11_NeuralNetworks_Module
+ * \brief Template functor to compute the fast derivative of a sigmoid
+ *
+ * Input should be the backpropagated gradient.
+ *
+ * \sa class CwiseUnaryOp, Cwise::sigmoid_fast_derivative()
+ */
+template <typename T>
+struct scalar_sigmoid_fast_derivative_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_sigmoid_fast_derivative_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& y) const {
+ const T one = T(1);
+ return (one - y) * y;
+ }
+
+ template <typename Packet>
+ inline Packet packetOp(const Packet& y) const {
+ const Packet one = internal::pset1<Packet>(1);
+ return internal::pmul(internal::psub(one, y), y);
+ }
+};
+
+namespace internal {
+template <typename T>
+struct functor_traits<scalar_sigmoid_fast_derivative_op<T> > {
+ enum {
+ Cost = NumTraits<T>::AddCost * 2 + NumTraits<T>::MulCost,
+ PacketAccess = packet_traits<T>::HasAdd && packet_traits<T>::HasMul &&
+ packet_traits<T>::HasNegate
+ };
+};
+} // namespace internal
+
+/** scalar_tanh_fast_derivative_op
+ * \ingroup CXX11_NeuralNetworks_Module
+ * \brief Template functor to compute the fast derivative of a tanh
+ *
+ * Input should be the backpropagated gradient.
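+ * For an input value y, the functor evaluates 1 - y * y.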
+ *
+ * \sa class CwiseUnaryOp, Cwise::tanh_fast_derivative()
+ */
+template <typename T>
+struct scalar_tanh_fast_derivative_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_tanh_fast_derivative_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator()(const T& y) const {
+ const T one = T(1);
+ return one - (y * y);
+ }
+
+ template <typename Packet>
+ inline Packet packetOp(const Packet& y) const {
+ const Packet one = internal::pset1<Packet>(1);
+ return internal::psub(one, internal::pmul(y, y));
+ }
+};
+
+namespace internal {
+template <typename T>
+struct functor_traits<scalar_tanh_fast_derivative_op<T> > {
+ enum {
+ Cost = NumTraits<T>::AddCost * 2 + NumTraits<T>::MulCost * 1,
+ PacketAccess = packet_traits<T>::HasAdd && packet_traits<T>::HasMul &&
+ packet_traits<T>::HasNegate
+ };
+};
+} // namespace internal
+
+/**
+ * \ingroup CXX11_NeuralNetworks_Module
+ * \brief Template functor to clip the magnitude of the first scalar.
+ *
+ * \sa class CwiseBinaryOp, MatrixBase::Clip
+ */
+template <typename Scalar>
+struct scalar_clip_op {
+ EIGEN_EMPTY_STRUCT_CTOR(scalar_clip_op)
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar
+ operator()(const Scalar& a, const Scalar& b) const {
+ return numext::mini(numext::maxi(a, -b), b);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Packet
+ packetOp(const Packet& a, const Packet& b) const {
+ return internal::pmin(internal::pmax(a, internal::pnegate(b)), b);
+ }
+};
+
+namespace internal {
+template <typename Scalar>
+struct functor_traits<scalar_clip_op<Scalar> > {
+ enum {
+ Cost = NumTraits<Scalar>::AddCost * 3,
+ PacketAccess = packet_traits<Scalar>::HasMax &&
+ packet_traits<Scalar>::HasMin &&
+ packet_traits<Scalar>::HasNegate
+ };
+};
+} // namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_ACTIVATIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Attention.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Attention.h
new file mode 100644
index 0000000000..d4bc7a3515
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Attention.h
@@ -0,0 +1,209 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_ATTENTION_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_ATTENTION_H
+
+namespace Eigen {
+
+/** ExtractGlimpses
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Extract glimpses from an input tensor.
+ *
+ * The input parameter is expected to be a col-major tensor with a rank of 4 (depth, x, y, and batch).
+ * The width and height parameters specify the extension of the returned glimpses.
+ * The offsets parameter specifies the x, y locations of the center of the glimpses relative to the center of the input image. The vector is expected to contain one IndexPair for each image in the batch dimension.
+ * The normalized boolean indicates if incoming coordinates are normalized so that 0.0 and 1.0 correspond to the minimum and maximum of each height and width dimension.
+ * The centered boolean indicates if incoming coordinates are centered relative to the image, in which case -1.0 and 1.0 correspond to minimum and maximum of each dimension while 0.0 corresponds to the center.
+ *
+ * The result can be assigned to a tensor of rank equal to that of the input. The result will be laid out in col-major order (depth, x, y, batch).
+ * The dimensions of the result will be equal to the dimensions of the input except for width and height which will be equal to the requested glimpse size.
+ */
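+// A minimal usage sketch (the tensor sizes and offsets below are purely
+// illustrative):
+//   Eigen::Tensor<float, 4> input(depth, width, height, batch);
+//   std::vector<Eigen::IndexPair<float> > offsets(batch);  // one per image
+//   Eigen::Tensor<float, 4> glimpses =
+//       ExtractGlimpses(input, /*width=*/8, /*height=*/8, offsets);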
+namespace {
+template <typename Index>
+struct GlimpseExtractionOp {
+ GlimpseExtractionOp(const Index width, const Index height,
+ const std::vector<IndexPair<float> >& offsets,
+ const bool normalized,
+ const bool centered,
+ const bool uniform_noise) :
+ width_(width), height_(height), offsets_(offsets),
+ normalized_(normalized), centered_(centered), uniform_noise_(uniform_noise) { }
+
+ template <typename Input>
+ DSizes<Index, 4> dimensions(const Input& input) const {
+ typedef typename internal::traits<Input>::Index IndexType;
+ typedef TensorRef<Tensor<typename internal::traits<Input>::Scalar, 4,
+ internal::traits<Input>::Layout, IndexType> > Ref;
+ Ref in(input);
+
+ DSizes<Index, 4> dims = in.dimensions();
+
+ dims[0] = in.dimension(0);
+ dims[1] = width_;
+ dims[2] = height_;
+ dims[3] = in.dimension(3);
+ return dims;
+ }
+
+ template <typename Input, typename Output, typename Device>
+ EIGEN_DEVICE_FUNC
+ void eval(const Input& input, Output& output, const Device& device) const
+ {
+ typedef typename internal::traits<Input>::Index IndexType;
+ typedef TensorRef<Tensor<typename internal::traits<Input>::Scalar, 4,
+ internal::traits<Input>::Layout, IndexType> > Ref;
+ Ref in(input);
+
+ const Index num_channels = in.dimension(0);
+ const Index input_width = in.dimension(1);
+ const Index input_height = in.dimension(2);
+ const Index batch_size = in.dimension(3);
+ eigen_assert(input_width > 0);
+ eigen_assert(input_height > 0);
+
+ for (Index i = 0; i < batch_size; ++i) {
+ float x = offsets_[i].first, y = offsets_[i].second;
+
+ // Un-normalize coordinates back to pixel space if normalized.
+ if (normalized_) {
+ x *= input_width;
+ y *= input_height;
+ }
+ // Un-center if coordinates are centered on the image center.
+ if (centered_) {
+ x /= 2.0f;
+ y /= 2.0f;
+ x += input_width / 2.0f;
+ y += input_height / 2.0f;
+ }
+ // Remove half of the glimpse window.
+ x -= width_ / 2.0f;
+ y -= height_ / 2.0f;
+
+ const Index offset_x = (Index) x;
+ const Index offset_y = (Index) y;
+ Index glimpse_width = width_;
+ Index glimpse_height = height_;
+ bool partial_overlap = false;
+ DSizes<Index, 3> slice_offset(0, offset_x, offset_y);
+ DSizes<Index, 3> slice_extent(num_channels, width_, height_);
+ DSizes<Index, 3> base_offset(0, 0, 0);
+
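+      // Clamp the glimpse against the image borders: slice_offset/slice_extent
+      // describe the part of the input that is actually read, and base_offset
+      // is the position inside the output glimpse where that part is written.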
+ if (offset_x < 0) {
+ slice_offset[1] = 0;
+ glimpse_width = (std::max<Index>)(0, width_ + offset_x);
+ slice_extent[1] = glimpse_width;
+ base_offset[1] = width_ - glimpse_width;
+ partial_overlap = true;
+ } else if (offset_x + width_ >= input_width) {
+ glimpse_width = (std::max<Index>)(0, input_width - offset_x);
+ slice_extent[1] = glimpse_width;
+ partial_overlap = true;
+ }
+ if (offset_y < 0) {
+ slice_offset[2] = 0;
+ glimpse_height = (std::max<Index>)(0, height_ + offset_y);
+ slice_extent[2] = glimpse_height;
+ base_offset[2] = height_ - glimpse_height;
+ partial_overlap = true;
+ } else if (offset_y + height_ >= input_height) {
+ glimpse_height = (std::max<Index>)(0, input_height - offset_y);
+ slice_extent[2] = glimpse_height;
+ partial_overlap = true;
+ }
+ slice_extent[1] = std::min<Index>(input_width, slice_extent[1]);
+ slice_extent[2] = std::min<Index>(input_height, slice_extent[2]);
+
+ if (partial_overlap) {
+ if (uniform_noise_) {
+ // Initialize the glimpse with uniform noise.
+ typedef typename internal::remove_const<
+ typename internal::traits<Input>::Scalar>::type Scalar;
+ TensorFixedSize<Scalar, Sizes<> > mini;
+ mini.device(device) = input.template chip<3>(i).minimum();
+ TensorFixedSize<float, Sizes<> > range;
+ range.device(device) =
+ (input.template chip<3>(i).maximum() - mini).template cast<float>();
+
+ DSizes<Index, 3> glimpse_size(num_channels, width_, height_);
+ TensorMap<Tensor<float, 3> > tmp(NULL, glimpse_size);
+ output.template chip<3>(i).device(device) =
+ mini.reshape(Sizes<1,1,1>()).broadcast(glimpse_size) +
+ (tmp.random() * range.reshape(Sizes<1,1,1>()).broadcast(glimpse_size)).template cast<Scalar>();
+ } else {
+ // Initialize the glimpse with white noise: compute the mean and sigma
+ // of each channel, and use them to shape the gaussian.
+ DSizes<Index, 2> glimpse_size(width_, height_);
+ DSizes<Index, 2> input_size(input_width, input_height);
+ typedef typename internal::remove_const<
+ typename internal::traits<Input>::Scalar>::type Scalar;
+
+ for (int j = 0; j < num_channels; ++j) {
+ TensorFixedSize<Scalar, Sizes<> > mean;
+ mean.device(device) = input.template chip<3>(i).template chip<0>(j).template cast<float>().mean();
+ TensorFixedSize<float, Sizes<> > sigma;
+ sigma.device(device) =
+ (input.template chip<3>(i).template chip<0>(j).template cast<float>() - mean.reshape(Sizes<1,1>()).broadcast(input_size)).square().mean().sqrt();
+ TensorFixedSize<Scalar, Sizes<> > mini;
+ mini.device(device) = input.template chip<3>(i).template chip<0>(j).minimum();
+ TensorFixedSize<float, Sizes<> > maxi;
+ maxi.device(device) = input.template chip<3>(i).template chip<0>(j).maximum();
+
+ TensorMap<Tensor<float, 2> > tmp(NULL, glimpse_size);
+ output.template chip<3>(i).template chip<0>(j).device(device) =
+ (mean.reshape(Sizes<1,1>()).broadcast(glimpse_size) +
+ (tmp.random(internal::NormalRandomGenerator<float>()) * sigma.reshape(Sizes<1,1>()).broadcast(glimpse_size)).template cast<Scalar>()).cwiseMin(maxi.reshape(Sizes<1,1>()).broadcast(glimpse_size)).cwiseMax(mini.reshape(Sizes<1,1>()).broadcast(glimpse_size));
+ }
+ }
+
+        // Copy the part of the glimpse that covers the input image, if any.
+ if (glimpse_width == 0 || glimpse_height == 0) {
+ continue;
+ }
+ output.template chip<3>(i).slice(base_offset, slice_extent).device(device) = input.template chip<3>(i).slice(slice_offset, slice_extent);
+ } else {
+ output.template chip<3>(i).device(device) = input.template chip<3>(i).slice(slice_offset, slice_extent);
+ }
+ }
+ }
+
+ private:
+ const Index width_;
+ const Index height_;
+ const std::vector<IndexPair<float> > offsets_;
+ const bool normalized_;
+ const bool centered_;
+ const bool uniform_noise_;
+};
+}
+
+
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorCustomUnaryOp<const GlimpseExtractionOp<typename internal::traits<Input>::Index>, const Input>
+ExtractGlimpses(const Input& input,
+ const typename internal::traits<Input>::Index width,
+ const typename internal::traits<Input>::Index height,
+ const std::vector<IndexPair<float> >& offsets,
+ const bool normalized = true, const bool centered = true,
+ const bool uniform_noise = true)
+{
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == ColMajor, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ typedef typename internal::traits<Input>::Index Index;
+ const GlimpseExtractionOp<Index> op(width, height, offsets, normalized,
+ centered, uniform_noise);
+ return input.customOp(op);
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_ATTENTION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardCuboidConvolutions.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardCuboidConvolutions.h
new file mode 100644
index 0000000000..12ce23444c
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardCuboidConvolutions.h
@@ -0,0 +1,523 @@
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_CUBOID_CONVOLUTIONS_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_CUBOID_CONVOLUTIONS_H
+
+#include "Patch3d.h"
+
+namespace Eigen {
+
+/** CuboidConvolutionBackwardInput
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Computes the backprop for the input of a 3D convolution.
+ *
+ * The output_backward parameter is expected to be a tensor with a rank of 4 or more (channels, depth, height, width, and optionally others)
+ * The kernel parameter is expected to be a 5D tensor (filters, channels, kernel_depth, kernel_height, kernel_width)
+ * output_backward and kernel have to be in the same layout.
+ *
+ * The dimensions of the result will be filters, depth, height, width (and others if applicable).
+ *
+ * It is possible to swap the order of the depth, width and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ *
+ * All dimension orders above are given for col-major, and should be reversed for row-major.
+ */
+
+template <typename OutputBackward, typename Kernel>
+EIGEN_ALWAYS_INLINE static const typename internal::conditional<
+ internal::traits<OutputBackward>::Layout == ColMajor,
+ TensorReshapingOp<
+ const DSizes<typename internal::traits<OutputBackward>::Index,
+ internal::traits<OutputBackward>::NumDimensions>,
+ const TensorContractionOp<
+ const array< IndexPair<typename internal::traits<OutputBackward>::Index>, 2>,
+ const TensorReshapingOp<
+ const DSizes< typename internal::traits<OutputBackward>::Index, 3>,
+ const TensorReverseOp<const array<bool, 5>, const Kernel>
+ >,
+ const TensorReshapingOp<
+ const DSizes< typename internal::traits<OutputBackward>::Index, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const OutputBackward>
+ >
+ >
+ >,
+ TensorReshapingOp<
+ const DSizes<typename internal::traits<OutputBackward>::Index,
+ internal::traits<OutputBackward>::NumDimensions>,
+ const TensorContractionOp<
+ const array< IndexPair<typename internal::traits<OutputBackward>::Index>, 2>,
+ const TensorReshapingOp<
+ const DSizes< typename internal::traits<OutputBackward>::Index, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const OutputBackward>
+ >,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<OutputBackward>::Index, 3>,
+ const TensorReverseOp<const array<bool, 5>, const Kernel>
+ >
+ >
+ >
+>::type
+CuboidConvolutionBackwardInput(
+ const Kernel& kernel, const OutputBackward& output_backward,
+ typename internal::traits<OutputBackward>::Index inputPlanes,
+ typename internal::traits<OutputBackward>::Index inputRows,
+ typename internal::traits<OutputBackward>::Index inputCols,
+ const DenseIndex stridePlanes = 1, const DenseIndex strideRows = 1,
+ const DenseIndex strideCols = 1) {
+ typedef typename internal::traits<OutputBackward>::Index TensorIndex;
+ const TensorRef<const Tensor<typename internal::traits<Kernel>::Scalar, internal::traits<Kernel>::NumDimensions, internal::traits<Kernel>::Layout, TensorIndex> > kern(kernel);
+ const TensorRef<const Tensor<typename internal::traits<OutputBackward>::Scalar, internal::traits<OutputBackward>::NumDimensions, internal::traits<OutputBackward>::Layout, TensorIndex> > out(output_backward);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Kernel>::Layout == internal::traits<OutputBackward>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ static const bool isColMajor = (internal::traits<OutputBackward>::Layout == ColMajor);
+
+ static const int NumDims = internal::traits<OutputBackward>::NumDimensions;
+
+ // Number of filters to apply. This is the same as the output depth of the result
+ const TensorIndex kernelFilters = isColMajor ? kern.dimensions()[0] : kern.dimensions()[4];
+ // Number of channels. This is the same as the input depth.
+ const TensorIndex kernelChannels = isColMajor ? kern.dimensions()[1] : kern.dimensions()[3];
+ const TensorIndex kernelPlanes = isColMajor ? kern.dimensions()[2] : kern.dimensions()[2];
+ const TensorIndex kernelRows = isColMajor ? kern.dimensions()[3] : kern.dimensions()[1];
+ const TensorIndex kernelCols = isColMajor ? kern.dimensions()[4] : kern.dimensions()[0];
+
+ const TensorIndex outputPlanes = isColMajor ? out.dimensions()[1] : out.dimensions()[NumDims - 2];
+ const TensorIndex outputRows = isColMajor ? out.dimensions()[2] : out.dimensions()[NumDims - 3];
+ const TensorIndex outputCols = isColMajor ? out.dimensions()[3] : out.dimensions()[NumDims - 4];
+
+ TensorIndex forward_pad_z, forward_pad_y, forward_pad_x;
+ const TensorIndex size_z = ceil(inputPlanes / static_cast<float>(stridePlanes));
+ const TensorIndex size_y = ceil(inputRows / static_cast<float>(strideRows));
+ const TensorIndex size_x = ceil(inputCols / static_cast<float>(strideCols));
+
+ // Infer padding type.
+ if (size_z == outputPlanes && size_y == outputRows && size_x == outputCols) {
+ // SAME padding.
+ const TensorIndex dz = size_z * stridePlanes + kernelPlanes - 1 - inputPlanes;
+ const TensorIndex dy = size_y * strideRows + kernelRows - 1 - inputRows;
+ const TensorIndex dx = size_x * strideCols + kernelCols - 1 - inputCols;
+
+ forward_pad_z = dz - dz / 2;
+ forward_pad_y = dy - dy / 2;
+ forward_pad_x = dx - dx / 2;
+ } else {
+ // VALID padding.
+ forward_pad_z = 0;
+ forward_pad_y = 0;
+ forward_pad_x = 0;
+ }
+ const TensorIndex padding_ztop = kernelPlanes - 1 - forward_pad_z;
+ const TensorIndex padding_top = kernelRows - 1 - forward_pad_y;
+ const TensorIndex padding_left = kernelCols - 1 - forward_pad_x;
+
+ const TensorIndex padding_zbottom = inputPlanes + kernelPlanes - 1 - (outputPlanes - 1) * stridePlanes - 1 - padding_ztop;
+ const TensorIndex padding_bottom = inputRows + kernelRows - 1 - (outputRows - 1) * strideRows - 1 - padding_top;
+ const TensorIndex padding_right = inputCols + kernelCols - 1 - (outputCols - 1) * strideCols - 1 - padding_left;
+
+ eigen_assert(padding_ztop >= 0);
+ eigen_assert(padding_zbottom >= 0);
+ eigen_assert(padding_top >= 0);
+ eigen_assert(padding_left >= 0);
+ eigen_assert(padding_bottom >= 0);
+ eigen_assert(padding_right >= 0);
+
+ // The kernel has dimensions filters X channels X patch_planes X patch_rows X patch_cols.
+ // We need to reverse the kernel along the spatial dimensions.
+ array<bool, 5> kernel_reverse;
+ if (isColMajor) {
+ kernel_reverse[0] = false;
+ kernel_reverse[1] = false;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = true;
+ kernel_reverse[4] = true;
+ } else {
+ kernel_reverse[0] = true;
+ kernel_reverse[1] = true;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = false;
+ kernel_reverse[4] = false;
+ }
+
+ DSizes<TensorIndex, 3> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelFilters;
+ kernel_dims[1] = kernelChannels;
+ kernel_dims[2] = kernelRows * kernelCols * kernelPlanes;
+ } else {
+ kernel_dims[0] = kernelRows * kernelCols * kernelPlanes;
+ kernel_dims[1] = kernelChannels;
+ kernel_dims[2] = kernelFilters;
+ }
+
+ // The output_backward has dimensions out_depth X out_planes X out_rows X out_cols X OTHERS
+ // When we extract the volume patches from output_backward, it will have dimensions:
+ // out_depth X (patch_planes * patch_rows * patch_cols) X (input_planes * input_rows * input_cols * OTHERS)
+ DSizes<TensorIndex, 3> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelFilters;
+ pre_contract_dims[1] = kernelRows * kernelCols * kernelPlanes;
+ pre_contract_dims[2] = inputRows * inputCols * inputPlanes;
+ for (int i = 4; i < NumDims; ++i) {
+ pre_contract_dims[2] *= out.dimension(i);
+ }
+ } else {
+ pre_contract_dims[2] = kernelFilters;
+ pre_contract_dims[1] = kernelRows * kernelCols * kernelPlanes;
+ pre_contract_dims[0] = inputRows * inputCols * inputPlanes;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ pre_contract_dims[0] *= out.dimension(i);
+ }
+ }
+
+ // We will contract along dimensions (0, 2) in kernel and (0, 1) in
+ // output_backward, if this is col-major, and
+ // dimensions (0, 2) in kernel and (1, 2) in output_backward, if this is row-major.
+ array<IndexPair<TensorIndex>, 2> contract_dims;
+ if (isColMajor) {
+ // col-major: kernel.contract(output.patches)
+ contract_dims[0] = IndexPair<TensorIndex>(0, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 1);
+ } else {
+ // row-major: output.patches.contract(kernel)
+ contract_dims[0] = IndexPair<TensorIndex>(1, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 2);
+ }
+
+ // Post contraction, the dimensions of the input_backprop are
+ // channels X input_planes X input_rows X input_cols X OTHERS
+ DSizes<TensorIndex, NumDims> post_contract_dims;
+ if (isColMajor) {
+ post_contract_dims[0] = kernelChannels;
+ post_contract_dims[1] = inputPlanes;
+ post_contract_dims[2] = inputRows;
+ post_contract_dims[3] = inputCols;
+ for (int i = 4; i < NumDims; ++i) {
+ post_contract_dims[i] = out.dimension(i);
+ }
+ } else {
+ post_contract_dims[NumDims - 1] = kernelChannels;
+ post_contract_dims[NumDims - 2] = inputPlanes;
+ post_contract_dims[NumDims - 3] = inputRows;
+ post_contract_dims[NumDims - 4] = inputCols;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ post_contract_dims[i] = out.dimension(i);
+ }
+ }
+
+ DSizes<TensorIndex, NumDims> strides;
+ for (int i = 0; i < NumDims; i++) {
+ strides[i] = 1;
+ }
+ if (isColMajor) {
+ strides[1] = stridePlanes;
+ strides[2] = strideRows;
+ strides[3] = strideCols;
+ } else {
+ strides[NumDims - 2] = stridePlanes;
+ strides[NumDims - 3] = strideRows;
+ strides[NumDims - 4] = strideCols;
+ }
+
+ return choose(
+ Cond<internal::traits<OutputBackward>::Layout == ColMajor>(),
+ kernel.reverse(kernel_reverse)
+ .reshape(kernel_dims)
+ .contract(
+ output_backward.extract_volume_patches(kernelPlanes, kernelRows, kernelCols,
+ 1, 1, 1, stridePlanes, strideRows, strideCols,
+ padding_ztop, padding_zbottom,
+ padding_top, padding_bottom,
+ padding_left, padding_right)
+ .reshape(pre_contract_dims),
+ contract_dims)
+ .reshape(post_contract_dims),
+ output_backward.extract_volume_patches(kernelPlanes, kernelRows, kernelCols,
+ 1, 1, 1, stridePlanes, strideRows, strideCols,
+ padding_ztop, padding_zbottom,
+ padding_top, padding_bottom,
+ padding_left, padding_right)
+ .reshape(pre_contract_dims)
+ .contract(kernel.reverse(kernel_reverse).reshape(kernel_dims),
+ contract_dims)
+ .reshape(post_contract_dims));
+}
+
+
+/** CuboidConvolutionBackwardKernel
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Computes the backprop for the filter of a 3D convolution.
+ *
+ * The input parameter is expected to be a tensor with a rank of 4 or more (channels, depth, height, width, and optionally others)
+ * The output_backward parameter is expected to be a tensor with a rank of 4 or more (filters, depth, height, width, and optionally others)
+ * input and output_backward have to be in the same layout.
+ *
+ * The dimensions of the result will be filters, channels, kernel_depth, kernel_height, kernel_width.
+ *
+ * It is possible to swap the order of the depth, width and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ *
+ * All dimension orders above are given for col-major, and should be reversed for row-major.
+ */
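+
+// Usage sketch (illustrative, hypothetical sizes, not from the original file),
+// continuing the example above: the gradient of a 2x3x3 filter bank is
+// recovered from the input and the output gradient.
+//
+//   Eigen::Tensor<float, 5> input(3, 8, 8, 8, 16);       // channels, planes, rows, cols, batch
+//   Eigen::Tensor<float, 5> out_grad(7, 7, 6, 6, 16);    // filters, planes, rows, cols, batch
+//   Eigen::Tensor<float, 5> kernel_grad(7, 3, 2, 3, 3);  // filters, channels, kd, kh, kw
+//   kernel_grad = Eigen::CuboidConvolutionBackwardKernel(input, out_grad,
+//                                                        /*kernelPlanes=*/2,
+//                                                        /*kernelRows=*/3,
+//                                                        /*kernelCols=*/3);
+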
+template <typename OutputBackward, typename Input>
+EIGEN_ALWAYS_INLINE static const typename internal::conditional<
+ internal::traits<OutputBackward>::Layout == ColMajor,
+ const TensorShufflingOp<
+ const array<typename internal::traits<OutputBackward>::Index, 5>,
+ const TensorReverseOp<
+ const array<bool, 5>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<OutputBackward>::Index, 5>,
+ const TensorContractionOp<
+ const array< IndexPair<typename internal::traits<Input>::Index>, 2>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 3>,
+ const Input>,
+ const TensorReshapingOp<
+ const DSizes< typename internal::traits<OutputBackward>::Index, 4>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const OutputBackward>
+ >
+ >
+ >
+ >
+ >,
+ const TensorShufflingOp<
+ const array<typename internal::traits<OutputBackward>::Index, 5>,
+ const TensorReverseOp<
+ const array<bool, 5>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<OutputBackward>::Index, 5>,
+ const TensorContractionOp<
+ const array< IndexPair<typename internal::traits<Input>::Index>, 2>,
+ const TensorReshapingOp<
+ const DSizes< typename internal::traits<OutputBackward>::Index, 4>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const OutputBackward>
+ >,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 3>,
+ const Input
+ >
+ >
+ >
+ >
+ >
+>::type
+CuboidConvolutionBackwardKernel(
+ const Input& input, const OutputBackward& output_backward,
+ typename internal::traits<Input>::Index kernelPlanes,
+ typename internal::traits<Input>::Index kernelRows,
+ typename internal::traits<Input>::Index kernelCols,
+ const DenseIndex stridePlanes = 1,
+ const DenseIndex strideRows = 1,
+ const DenseIndex strideCols = 1) {
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+ TensorRef<Tensor<typename internal::traits<OutputBackward>::Scalar, internal::traits<OutputBackward>::NumDimensions, internal::traits<OutputBackward>::Layout, TensorIndex> > out(output_backward);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == internal::traits<OutputBackward>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == internal::traits<OutputBackward>::NumDimensions, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const TensorIndex inputPlanes = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex inputRows = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+ const TensorIndex inputCols = isColMajor ? in.dimension(3) : in.dimension(NumDims - 4);
+
+ const TensorIndex outputPlanes = isColMajor ? out.dimension(1) : out.dimension(NumDims - 2);
+ const TensorIndex outputRows = isColMajor ? out.dimension(2) : out.dimension(NumDims - 3);
+ const TensorIndex outputCols = isColMajor ? out.dimension(3) : out.dimension(NumDims - 4);
+
+ const TensorIndex kernelFilters = isColMajor ? out.dimension(0) : out.dimension(NumDims - 1);
+ const TensorIndex kernelChannels = isColMajor ? in.dimension(0) : in.dimension(NumDims - 1);
+
+ TensorIndex forward_pad_z, forward_pad_y, forward_pad_x;
+ const TensorIndex size_z = ceil(inputPlanes / static_cast<float>(stridePlanes));
+ const TensorIndex size_y = ceil(inputRows / static_cast<float>(strideRows));
+ const TensorIndex size_x = ceil(inputCols / static_cast<float>(strideCols));
+
+ // Infer padding type.
+ if (size_z == outputPlanes && size_y == outputRows && size_x == outputCols) {
+ // SAME padding.
+ const TensorIndex dz = size_z * stridePlanes + kernelPlanes - 1 - inputPlanes;
+ const TensorIndex dy = size_y * strideRows + kernelRows - 1 - inputRows;
+ const TensorIndex dx = size_x * strideCols + kernelCols - 1 - inputCols;
+
+ forward_pad_z = dz - dz / 2;
+ forward_pad_y = dy - dy / 2;
+ forward_pad_x = dx - dx / 2;
+ } else {
+ // VALID padding.
+ forward_pad_z = 0;
+ forward_pad_y = 0;
+ forward_pad_x = 0;
+ }
+
+ const TensorIndex padding_ztop = kernelPlanes - 1 - forward_pad_z;
+ const TensorIndex padding_top = kernelRows - 1 - forward_pad_y;
+ const TensorIndex padding_left = kernelCols - 1 - forward_pad_x;
+
+ const TensorIndex padding_zbottom = inputPlanes + kernelPlanes - 1 - (outputPlanes - 1) * stridePlanes - 1 - padding_ztop;
+ const TensorIndex padding_bottom = inputRows + kernelRows - 1 - (outputRows - 1) * strideRows - 1 - padding_top;
+ const TensorIndex padding_right = inputCols + kernelCols - 1 - (outputCols - 1) * strideCols - 1 - padding_left;
+
+ eigen_assert(padding_ztop >= 0);
+ eigen_assert(padding_zbottom >= 0);
+ eigen_assert(padding_top >= 0);
+ eigen_assert(padding_left >= 0);
+ eigen_assert(padding_bottom >= 0);
+ eigen_assert(padding_right >= 0);
+
+ // The output_backward has dimensions out_depth X out_planes X out_rows X out_cols X OTHERS
+ // When we extract the volume patches from output_backward (with input as the
+ // kernel), it will have dimensions
+ // (out_depth) X (input_planes * input_rows * input_cols) X (kernel_planes * kernel_rows * kernel_cols) X OTHERS
+ DSizes<TensorIndex, 4> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelFilters;
+ pre_contract_dims[1] = inputRows * inputCols * inputPlanes;
+ pre_contract_dims[2] = kernelRows * kernelCols * kernelPlanes;
+ pre_contract_dims[3] = 1;
+ for (int i = 4; i < NumDims; ++i) {
+ pre_contract_dims[3] *= out.dimension(i);
+ }
+ } else {
+ pre_contract_dims[3] = kernelFilters;
+ pre_contract_dims[2] = inputRows * inputCols * inputPlanes;
+ pre_contract_dims[1] = kernelRows * kernelCols * kernelPlanes;
+ pre_contract_dims[0] = 1;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ pre_contract_dims[0] *= out.dimension(i);
+ }
+ }
+
+ // The input has dimensions in_depth X (input_planes * input_rows * input_cols) X OTHERS
+ DSizes<TensorIndex, 3> input_dims;
+ if (isColMajor) {
+ input_dims[0] = kernelChannels;
+ input_dims[1] = inputRows * inputCols * inputPlanes;
+ input_dims[2] = 1;
+ for (int i = 4; i < NumDims; ++i) {
+ input_dims[2] *= in.dimension(i);
+ }
+ eigen_assert(input_dims[2] == pre_contract_dims[3]);
+ } else {
+ input_dims[2] = kernelChannels;
+ input_dims[1] = inputRows * inputCols * inputPlanes;
+ input_dims[0] = 1;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ input_dims[0] *= in.dimension(i);
+ }
+ eigen_assert(input_dims[0] == pre_contract_dims[0]);
+ }
+
+ // We will contract along dimensions (1, 2) in in and (1, 3) in out, if
+ // this is col-major.
+ // For row-major, it's dimensions (0, 1) in in and (0, 2) in out.
+ array<IndexPair<TensorIndex>, 2> contract_dims;
+ if (isColMajor) {
+ // col-major: in.contract(output.patches)
+ contract_dims[0] = IndexPair<TensorIndex>(1, 1);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 3);
+ } else {
+ // row-major: output.patches.contract(in)
+ contract_dims[0] = IndexPair<TensorIndex>(0, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 1);
+ }
+
+ // After the contraction, the kernel will have dimension
+ // in_depth X out_depth X kernel_planes X kernel_rows X kernel_cols
+ // We will need to shuffle the first two dimensions and reverse the spatial dimensions.
+ // The end shape is:
+ // out_depth X in_depth X kernel_planes X kernel_rows X kernel_cols
+
+ // This is the shape of the kernel *before* the shuffling.
+ DSizes<TensorIndex, 5> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelChannels;
+ kernel_dims[1] = kernelFilters;
+ kernel_dims[2] = kernelPlanes;
+ kernel_dims[3] = kernelRows;
+ kernel_dims[4] = kernelCols;
+ } else {
+ kernel_dims[0] = kernelCols;
+ kernel_dims[1] = kernelRows;
+ kernel_dims[2] = kernelPlanes;
+ kernel_dims[3] = kernelFilters;
+ kernel_dims[4] = kernelChannels;
+ }
+
+ // Flip filters and channels.
+ array<TensorIndex, 5> kernel_shuffle;
+ if (isColMajor) {
+ kernel_shuffle[0] = 1;
+ kernel_shuffle[1] = 0;
+ kernel_shuffle[2] = 2;
+ kernel_shuffle[3] = 3;
+ kernel_shuffle[4] = 4;
+ } else {
+ kernel_shuffle[0] = 0;
+ kernel_shuffle[1] = 1;
+ kernel_shuffle[2] = 2;
+ kernel_shuffle[3] = 4;
+ kernel_shuffle[4] = 3;
+ }
+
+ // Reverse the spatial dimensions.
+ array<bool, 5> kernel_reverse;
+ if (isColMajor) {
+ kernel_reverse[0] = false;
+ kernel_reverse[1] = false;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = true;
+ kernel_reverse[4] = true;
+ } else {
+ kernel_reverse[0] = true;
+ kernel_reverse[1] = true;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = false;
+ kernel_reverse[4] = false;
+ }
+
+ DSizes<TensorIndex, NumDims> strides;
+ for (int i = 0; i < NumDims; i++) {
+ strides[i] = 1;
+ }
+ if (isColMajor) {
+ strides[1] = stridePlanes;
+ strides[2] = strideRows;
+ strides[3] = strideCols;
+ } else {
+ strides[NumDims - 2] = stridePlanes;
+ strides[NumDims - 3] = strideRows;
+ strides[NumDims - 4] = strideCols;
+ }
+ return choose(
+ Cond<internal::traits<Input>::Layout == ColMajor>(),
+ input.reshape(input_dims)
+ .contract(
+ output_backward.extract_volume_patches(
+ inputPlanes, inputRows, inputCols, 1,
+ 1, 1, stridePlanes, strideRows, strideCols,
+ padding_ztop, padding_zbottom, padding_top,
+ padding_bottom, padding_left, padding_right)
+ .reshape(pre_contract_dims),
+ contract_dims)
+ .reshape(kernel_dims)
+ .reverse(kernel_reverse)
+ .shuffle(kernel_shuffle),
+ output_backward.extract_volume_patches(
+ inputPlanes, inputRows, inputCols, 1, 1, 1,
+ stridePlanes, strideRows, strideCols, padding_ztop,
+ padding_zbottom, padding_top, padding_bottom,
+ padding_left, padding_right)
+ .reshape(pre_contract_dims)
+ .contract(input.reshape(input_dims), contract_dims)
+ .reshape(kernel_dims)
+ .reverse(kernel_reverse)
+ .shuffle(kernel_shuffle));
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_CUBOID_CONVOLUTIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardSpatialConvolutions.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardSpatialConvolutions.h
new file mode 100644
index 0000000000..188dc75bf6
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/BackwardSpatialConvolutions.h
@@ -0,0 +1,351 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Ke Yang <yangke@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_SPATIAL_CONVOLUTIONS_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_SPATIAL_CONVOLUTIONS_H
+
+namespace Eigen {
+
+/** SpatialConvolutionBackwardInput
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Computes the backprop for the input of a 2D convolution.
+ *
+ * The output_backward parameter is expected to be a tensor with a rank of 3 or more (filters, height, width, and optionally others)
+ * The kernel parameter is expected to be a 4D tensor (filters, channels, kernel_height, kernel_width)
+ * The output_backward and the kernel have to be in the same layout (col-major or row-major). The result will be in that layout as well.
+ *
+ * If in_stride > 1, then applies convolution with holes (aka atrous convolution), sampling every in_stride input pixels.
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the output_backward. The dimensions of the result will be channels, height, width (and others if applicable).
+ *
+ * It is possible to swap the order of the width and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ *
+ */
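+
+// Usage sketch (illustrative sizes, not from the original sources): backprop to
+// the input of a 3x3, stride-1 2D convolution on a col-major 3 x 8 x 8 x 16 input.
+//
+//   Eigen::Tensor<float, 4> kernel(7, 3, 3, 3);     // filters, channels, kh, kw
+//   Eigen::Tensor<float, 4> out_grad(7, 6, 6, 16);  // filters, rows, cols, batch
+//   Eigen::Tensor<float, 4> in_grad(3, 8, 8, 16);   // channels, rows, cols, batch
+//   in_grad = Eigen::SpatialConvolutionBackwardInput(kernel, out_grad,
+//                                                    /*inputRows=*/8, /*inputCols=*/8);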
+
+template <typename OutputBackward, typename Kernel>
+EIGEN_ALWAYS_INLINE
+static const typename internal::conditional<
+ internal::traits<OutputBackward>::Layout == ColMajor,
+ TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, internal::traits<OutputBackward>::NumDimensions>, const TensorContractionOp<const array<IndexPair<typename internal::traits<OutputBackward>::Index>, 2>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 3>, const TensorReverseOp<const array<bool, 4>, const Kernel> >, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 3>, const TensorImagePatchOp<Dynamic, Dynamic, const OutputBackward> > > >,
+ TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, internal::traits<OutputBackward>::NumDimensions>, const TensorContractionOp<const array<IndexPair<typename internal::traits<OutputBackward>::Index>, 2>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 3>, const TensorImagePatchOp<Dynamic, Dynamic, const OutputBackward> >, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 3>, const TensorReverseOp<const array<bool, 4>, const Kernel> > > > >::type
+SpatialConvolutionBackwardInput(const Kernel& kernel, const OutputBackward& output_backward, typename internal::traits<OutputBackward>::Index inputRows, typename internal::traits<OutputBackward>::Index inputCols, const DenseIndex stride = 1, const DenseIndex in_stride = 1) {
+
+ typedef typename internal::traits<OutputBackward>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Kernel>::Scalar, internal::traits<Kernel>::NumDimensions, internal::traits<Kernel>::Layout, TensorIndex> > kern(kernel);
+ TensorRef<Tensor<typename internal::traits<OutputBackward>::Scalar, internal::traits<OutputBackward>::NumDimensions, internal::traits<OutputBackward>::Layout, TensorIndex> > out(output_backward);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Kernel>::Layout == internal::traits<OutputBackward>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ static const bool isColMajor = (internal::traits<OutputBackward>::Layout == ColMajor);
+
+ static const int NumDims = internal::traits<OutputBackward>::NumDimensions;
+
+ // Number of filters to apply. This is the same as the output depth of the result
+ const TensorIndex kernelFilters = isColMajor ? kern.dimensions()[0] : kern.dimensions()[3];
+ // Number of channels. This is the same as the input depth.
+ const TensorIndex kernelChannels = isColMajor ? kern.dimensions()[1] : kern.dimensions()[2];
+ const TensorIndex kernelRows = isColMajor ? kern.dimensions()[2] : kern.dimensions()[1];
+ const TensorIndex kernelCols = isColMajor ? kern.dimensions()[3] : kern.dimensions()[0];
+
+ // This is the effective kernel size, taking into account the (in_stride - 1) zero-values
+ // inserted between consecutive kernel elements in atrous convolution
+ const TensorIndex kernelRowsEff = kernelRows + (kernelRows - 1) * (in_stride - 1);
+ const TensorIndex kernelColsEff = kernelCols + (kernelCols - 1) * (in_stride - 1);
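+ // For example, a 3x3 kernel with in_stride == 2 has an effective size of 5x5 (3 + (3 - 1) * (2 - 1) = 5).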
+
+ const TensorIndex outputRows = isColMajor ? output_backward.dimension(1) : output_backward.dimension(NumDims - 2);
+ const TensorIndex outputCols = isColMajor ? output_backward.dimension(2) : output_backward.dimension(NumDims - 3);
+
+ // Computing the forward padding
+ const TensorIndex forward_pad_top = ((outputRows - 1) * stride + kernelRowsEff - inputRows) / 2;
+ const TensorIndex forward_pad_left = ((outputCols - 1) * stride + kernelColsEff - inputCols) / 2;
+
+ const TensorIndex padding_top = kernelRowsEff - 1 - forward_pad_top;
+ const TensorIndex padding_left = kernelColsEff - 1 - forward_pad_left;
+ const TensorIndex padding_bottom = inputRows + kernelRowsEff - 1 - (outputRows - 1) * stride - 1 - padding_top;
+ const TensorIndex padding_right = inputCols + kernelColsEff - 1 - (outputCols - 1) * stride - 1 - padding_left;
+
+ eigen_assert(padding_top >= 0);
+ eigen_assert(padding_left >= 0);
+ eigen_assert(padding_bottom >= 0);
+ eigen_assert(padding_right >= 0);
+
+ // The kernel has dimensions filters X channels X patch_rows X patch_cols
+ // We need to reverse the kernel along dimensions corresponding to rows and
+ // cols.
+ // TODO(yangke): we can make things slightly faster by collapsing the dimensions
+ // where we don't reverse. Try that once we have a faster compiler.
+ array<bool, 4> kernel_reverse;
+ if (isColMajor) {
+ kernel_reverse[0] = false;
+ kernel_reverse[1] = false;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = true;
+ } else {
+ kernel_reverse[0] = true;
+ kernel_reverse[1] = true;
+ kernel_reverse[2] = false;
+ kernel_reverse[3] = false;
+ }
+
+ DSizes<TensorIndex, 3> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelFilters;
+ kernel_dims[1] = kernelChannels;
+ kernel_dims[2] = kernelRows * kernelCols;
+ } else {
+ kernel_dims[0] = kernelRows * kernelCols;
+ kernel_dims[1] = kernelChannels;
+ kernel_dims[2] = kernelFilters;
+ }
+
+ // The output_backward has dimensions out_depth X out_rows X out_cols X OTHERS
+ // When we extract the image patches from output_backward, it will have dimensions
+ // out_depth X (patch_rows * patch_cols) X (input_rows * input_cols * OTHERS)
+ DSizes<TensorIndex, 3> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelFilters;
+ pre_contract_dims[1] = kernelRows * kernelCols;
+ pre_contract_dims[2] = inputRows * inputCols;
+ for (int i = 3; i < NumDims; ++i) {
+ pre_contract_dims[2] *= out.dimension(i);
+ }
+ } else {
+ pre_contract_dims[2] = kernelFilters;
+ pre_contract_dims[1] = kernelRows * kernelCols;
+ pre_contract_dims[0] = inputRows * inputCols;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ pre_contract_dims[0] *= out.dimension(i);
+ }
+ }
+
+ // We will contract along dimensions (0, 2) in kernel and (0, 1) in
+ // output_backward, if this is col-major, and
+ // dimensions (0, 2) in kernel and (1, 2) in output_backward, if this is row-major.
+ array<IndexPair<TensorIndex>, 2> contract_dims;
+ if (isColMajor) {
+ // col-major: kernel.contract(output.patches)
+ contract_dims[0] = IndexPair<TensorIndex>(0, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 1);
+ } else {
+ // row-major: output.patches.contract(kernel)
+ contract_dims[0] = IndexPair<TensorIndex>(1, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 2);
+ }
+
+ // Post contraction, the dimensions of the input_backprop are
+ // channels X input_rows X input_cols X OTHERS
+ DSizes<TensorIndex, NumDims> post_contract_dims;
+ if (isColMajor) {
+ post_contract_dims[0] = kernelChannels;
+ post_contract_dims[1] = inputRows;
+ post_contract_dims[2] = inputCols;
+ for (int i = 3; i < NumDims; ++i) {
+ post_contract_dims[i] = out.dimension(i);
+ }
+ } else {
+ post_contract_dims[NumDims - 1] = kernelChannels;
+ post_contract_dims[NumDims - 2] = inputRows;
+ post_contract_dims[NumDims - 3] = inputCols;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ post_contract_dims[i] = out.dimension(i);
+ }
+ }
+
+ return choose(Cond<internal::traits<OutputBackward>::Layout == ColMajor>(),
+ kernel.reverse(kernel_reverse).reshape(kernel_dims).contract(output_backward.extract_image_patches(kernelRows, kernelCols, 1, 1, in_stride, in_stride, stride, stride, padding_top, padding_bottom, padding_left, padding_right, 0).reshape(pre_contract_dims), contract_dims).reshape(post_contract_dims),
+ output_backward.extract_image_patches(kernelRows, kernelCols, 1, 1, in_stride, in_stride, stride, stride, padding_top, padding_bottom, padding_left, padding_right, 0).reshape(pre_contract_dims).contract(kernel.reverse(kernel_reverse).reshape(kernel_dims), contract_dims).reshape(post_contract_dims));
+}
+
+
+/** SpatialConvolutionBackwardKernel
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Computes the backprop for the filter of a 2D convolution.
+ *
+ * The input parameter is expected to be a tensor with a rank of 3 or more (channels, height, width, and optionally others)
+ * The output_backward parameter is expected to be a tensor with a rank of 3 or more (filters, height, width, and optionally others)
+ * The input and the output_backward have to be in the same layout (col-major or row-major). The result will be in that layout as well.
+ *
+ * If in_stride > 1, then applies convolution with holes (aka atrous convolution), sampling every in_stride input pixels.
+ *
+ * The result is a 4D tensor whose dimensions are filters, channels, kernel_height, kernel_width.
+ *
+ * It is possible to swap the order of the width and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ *
+ */
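+
+// Usage sketch (illustrative, hypothetical sizes): the gradient of a 3x3 filter
+// bank, given the same input and output gradient as in the example above.
+//
+//   Eigen::Tensor<float, 4> input(3, 8, 8, 16);       // channels, rows, cols, batch
+//   Eigen::Tensor<float, 4> out_grad(7, 6, 6, 16);    // filters, rows, cols, batch
+//   Eigen::Tensor<float, 4> kernel_grad(7, 3, 3, 3);  // filters, channels, kh, kw
+//   kernel_grad = Eigen::SpatialConvolutionBackwardKernel(input, out_grad,
+//                                                         /*kernelRows=*/3,
+//                                                         /*kernelCols=*/3);
+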
+// TODO(gpapan): Resolve a bug in TensorContractionInputMapper at SpatialConvolutions.h that yangke circumvented by using .reshape().reshape().
+// This can significantly accelerate SpatialConvolutionBackwardKernel.
+
+template <typename OutputBackward, typename Input>
+EIGEN_ALWAYS_INLINE
+static const typename internal::conditional<
+ internal::traits<OutputBackward>::Layout == ColMajor,
+ const TensorShufflingOp<const array<typename internal::traits<OutputBackward>::Index, 4>, const TensorReverseOp<const array<bool, 4>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorContractionOp<const array<IndexPair<typename internal::traits<Input>::Index>, 2>, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 3>, const Input>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorImagePatchOp<Dynamic, Dynamic, const OutputBackward> > > > > > >,
+ const TensorShufflingOp<const array<typename internal::traits<OutputBackward>::Index, 4>, const TensorReverseOp<const array<bool, 4>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorContractionOp<const array<IndexPair<typename internal::traits<Input>::Index>, 2>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorReshapingOp<const DSizes<typename internal::traits<OutputBackward>::Index, 4>, const TensorImagePatchOp<Dynamic, Dynamic, const OutputBackward> > >, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 3>, const Input> > > > > >::type
+SpatialConvolutionBackwardKernel(const Input& input, const OutputBackward& output_backward, typename internal::traits<Input>::Index kernelRows, typename internal::traits<Input>::Index kernelCols, const DenseIndex stride = 1, const DenseIndex in_stride = 1) {
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+ TensorRef<Tensor<typename internal::traits<OutputBackward>::Scalar, internal::traits<OutputBackward>::NumDimensions, internal::traits<OutputBackward>::Layout, TensorIndex> > out(output_backward);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == internal::traits<OutputBackward>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ // stride and in_stride cannot both be larger than 1
+ eigen_assert(!(stride > 1 && in_stride > 1));
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == internal::traits<OutputBackward>::NumDimensions, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const TensorIndex inputRows = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex inputCols = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+
+ const TensorIndex outputRows = isColMajor ? output_backward.dimension(1) : output_backward.dimension(NumDims - 2);
+ const TensorIndex outputCols = isColMajor ? output_backward.dimension(2) : output_backward.dimension(NumDims - 3);
+
+ // Number of filters to apply. This is the same as the output depth of the result
+ const TensorIndex kernelFilters = isColMajor ? out.dimensions()[0] : out.dimensions()[NumDims - 1];
+
+ // Number of channels. This is the same as the input depth.
+ const TensorIndex kernelChannels = isColMajor ? in.dimensions()[0] : in.dimensions()[NumDims - 1];
+
+ // This is the effective kernel size, taking into account the (in_stride - 1) zero-values
+ // inserted between consecutive kernel elements in atrous convolution
+ const TensorIndex kernelRowsEff = kernelRows + (kernelRows - 1) * (in_stride - 1);
+ const TensorIndex kernelColsEff = kernelCols + (kernelCols - 1) * (in_stride - 1);
+
+ // Computing the forward padding
+ const TensorIndex forward_pad_top = ((outputRows - 1) * stride + kernelRowsEff - inputRows) / 2;
+ const TensorIndex forward_pad_left = ((outputCols - 1) * stride + kernelColsEff - inputCols) / 2;
+
+ // TODO: factor out the padding computation.
+ const TensorIndex padding_top = kernelRowsEff - 1 - forward_pad_top;
+ const TensorIndex padding_left = kernelColsEff - 1 - forward_pad_left;
+ const TensorIndex padding_bottom = inputRows + kernelRowsEff - 1 - (outputRows - 1) * stride - 1 - padding_top;
+ const TensorIndex padding_right = inputCols + kernelColsEff - 1 - (outputCols - 1) * stride - 1 - padding_left;
+
+ eigen_assert(padding_top >= 0);
+ eigen_assert(padding_left >= 0);
+ eigen_assert(padding_bottom >= 0);
+ eigen_assert(padding_right >= 0);
+
+ // The output_backward has dimensions out_depth X out_rows X out_cols X OTHERS
+ // When we extract the image patches from output_backward (with input as the
+ // kernel), it will have dimensions
+ // (out_depth) X (input_rows * input_cols) X (kernel_rows * kernel_cols) X OTHERS
+ DSizes<TensorIndex, 4> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelFilters;
+ pre_contract_dims[1] = inputRows * inputCols;
+ pre_contract_dims[2] = kernelRows * kernelCols;
+ pre_contract_dims[3] = 1;
+ for (int i = 3; i < NumDims; ++i) {
+ pre_contract_dims[3] *= out.dimension(i);
+ }
+ } else {
+ pre_contract_dims[3] = kernelFilters;
+ pre_contract_dims[2] = inputRows * inputCols;
+ pre_contract_dims[1] = kernelRows * kernelCols;
+ pre_contract_dims[0] = 1;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ pre_contract_dims[0] *= out.dimension(i);
+ }
+ }
+
+ // The input has dimensions in_depth X (input_rows * input_cols) X OTHERS
+ DSizes<TensorIndex, 3> input_dims;
+ if (isColMajor) {
+ input_dims[0] = kernelChannels;
+ input_dims[1] = inputRows * inputCols;
+ input_dims[2] = 1;
+ for (int i = 3; i < NumDims; ++i) {
+ input_dims[2] *= in.dimension(i);
+ }
+ eigen_assert(input_dims[2] == pre_contract_dims[3]);
+ } else {
+ input_dims[2] = kernelChannels;
+ input_dims[1] = inputRows * inputCols;
+ input_dims[0] = 1;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ input_dims[0] *= in.dimension(i);
+ }
+ eigen_assert(input_dims[0] == pre_contract_dims[0]);
+ }
+
+ // We will contract along dimensions (1, 2) in in and (1, 3) in out, if
+ // this is col-major.
+ // For row-major, it's dimensions (0, 1) in in and (0, 2) in out.
+ array<IndexPair<TensorIndex>, 2> contract_dims;
+ if (isColMajor) {
+ // col-major: in.contract(output.patches)
+ contract_dims[0] = IndexPair<TensorIndex>(1, 1);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 3);
+ } else {
+ // row-major: output.patches.contract(in)
+ contract_dims[0] = IndexPair<TensorIndex>(0, 0);
+ contract_dims[1] = IndexPair<TensorIndex>(2, 1);
+ }
+
+ // After the contraction, the kernel will have dimension
+ // in_depth X out_depth X kernel_rows X kernel_cols
+ // We will need to shuffle the first two dimensions and reverse the latter
+ // two dimensions.
+ // The end shape is
+ // out_depth X in_depth X kernel_rows X kernel_cols
+
+ // This is the shape of the kernel *before* the shuffling.
+ DSizes<TensorIndex, 4> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelChannels;
+ kernel_dims[1] = kernelFilters;
+ kernel_dims[2] = kernelRows;
+ kernel_dims[3] = kernelCols;
+ } else {
+ kernel_dims[0] = kernelCols;
+ kernel_dims[1] = kernelRows;
+ kernel_dims[2] = kernelFilters;
+ kernel_dims[3] = kernelChannels;
+ }
+
+ array<TensorIndex, 4> kernel_shuffle;
+ if (isColMajor) {
+ kernel_shuffle[0] = 1;
+ kernel_shuffle[1] = 0;
+ kernel_shuffle[2] = 2;
+ kernel_shuffle[3] = 3;
+ } else {
+ kernel_shuffle[0] = 0;
+ kernel_shuffle[1] = 1;
+ kernel_shuffle[2] = 3;
+ kernel_shuffle[3] = 2;
+ }
+
+ array<bool, 4> kernel_reverse;
+ if (isColMajor) {
+ kernel_reverse[0] = false;
+ kernel_reverse[1] = false;
+ kernel_reverse[2] = true;
+ kernel_reverse[3] = true;
+ } else {
+ kernel_reverse[0] = true;
+ kernel_reverse[1] = true;
+ kernel_reverse[2] = false;
+ kernel_reverse[3] = false;
+ }
+
+ return choose(Cond<internal::traits<Input>::Layout == ColMajor>(),
+ input.reshape(input_dims).contract(output_backward.extract_image_patches(inputRows, inputCols, in_stride, in_stride, 1, 1, stride, stride, padding_top, padding_bottom, padding_left, padding_right, 0).reshape(pre_contract_dims).reshape(pre_contract_dims), contract_dims).reshape(kernel_dims).reverse(kernel_reverse).shuffle(kernel_shuffle),
+ output_backward.extract_image_patches(inputRows, inputCols, in_stride, in_stride, 1, 1, stride, stride, padding_top, padding_bottom, padding_left, padding_right, 0).reshape(pre_contract_dims).reshape(pre_contract_dims).contract(input.reshape(input_dims), contract_dims).reshape(kernel_dims).reverse(kernel_reverse).shuffle(kernel_shuffle));
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_BACKWARD_SPATIAL_CONVOLUTIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/CuboidConvolution.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/CuboidConvolution.h
new file mode 100644
index 0000000000..dfb9dcedba
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/CuboidConvolution.h
@@ -0,0 +1,179 @@
+#ifndef EIGEN_CXX11_SRC_NEURAL_NETWORKS_CUBOID_CONVOLUTION_H
+#define EIGEN_CXX11_SRC_NEURAL_NETWORKS_CUBOID_CONVOLUTION_H
+
+#include "Patch3d.h"
+
+namespace Eigen {
+
+/** CuboidConvolution
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies a 3D convolution over a multichannel input voxel block.
+ *
+ * The input parameter is expected to be a tensor with a rank of 4 or more (channels, depth, height, width, and optionally others).
+ * The kernel parameter is expected to be a 5D tensor (filters, channels, kernel_depth, kernel_height, kernel_width).
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be filters, depth, height, width (and others if applicable).
+ *
+ * The input and kernel have to be in the same layout, and both row-major and
+ * col-major are supported. The shapes given above are for col-major layout.
+ * For row-major, all dimensions should be reversed.
+ *
+ * It is possible to swap the order of the depth, width, and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ */
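+
+// Usage sketch (not from the original file; sizes are illustrative assumptions):
+// a forward 3D convolution with the default unit strides and SAME padding on a
+// col-major input.
+//
+//   Eigen::Tensor<float, 5> input(3, 8, 8, 8, 16);   // channels, planes, rows, cols, batch
+//   Eigen::Tensor<float, 5> kernel(7, 3, 2, 3, 3);   // filters, channels, kd, kh, kw
+//   Eigen::Tensor<float, 5> output(7, 8, 8, 8, 16);  // filters, planes, rows, cols, batch
+//   output = Eigen::CuboidConvolution(input, kernel);  // defaults: stride 1, PADDING_SAME
+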
+template <typename Input, typename Kernel>
+EIGEN_ALWAYS_INLINE
+static const typename internal::conditional <
+ internal::traits<Input>::Layout == ColMajor,
+ TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions>,
+ const TensorContractionOp<
+ const array<IndexPair<typename internal::traits<Input>::Index>, 1>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 2>,
+ const Kernel>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 2>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic,
+ const Input> > > >,
+ TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions>,
+ const TensorContractionOp<
+ const array<IndexPair<typename internal::traits<Input>::Index>, 1>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 2>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic,
+ const Input> > ,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index, 2>,
+ const Kernel> > > >::type
+CuboidConvolution(const Input& input, const Kernel& kernel,
+ const DenseIndex stridePlanes = 1,
+ const DenseIndex strideRows = 1,
+ const DenseIndex strideCols = 1,
+ const PaddingType padding_type = PADDING_SAME) {
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+ TensorRef<Tensor<typename internal::traits<Kernel>::Scalar, internal::traits<Kernel>::NumDimensions, internal::traits<Kernel>::Layout, TensorIndex> > kern(kernel);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == internal::traits<Kernel>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+
+ // Number of filters to apply. This is the same as the output depth of the result.
+ const TensorIndex kernelFilters = isColMajor ? kern.dimensions()[0] : kern.dimensions()[4];
+ const TensorIndex kernelChannels = isColMajor ? kern.dimensions()[1] : kern.dimensions()[3];
+
+ // Spatial size of the kernel.
+ const TensorIndex kernelDepth = isColMajor ? kern.dimensions()[2] : kern.dimensions()[2];
+ const TensorIndex kernelRows = isColMajor ? kern.dimensions()[3] : kern.dimensions()[1];
+ const TensorIndex kernelCols = isColMajor ? kern.dimensions()[4] : kern.dimensions()[0];
+
+ if (isColMajor) {
+ eigen_assert(kernelChannels == in.dimension(0));
+ } else {
+ eigen_assert(kernelChannels == in.dimension(NumDims - 1));
+ }
+
+ const TensorIndex inputPlanes = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex inputRows = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+ const TensorIndex inputCols = isColMajor ? in.dimension(3) : in.dimension(NumDims - 4);
+
+ const float stride_planes_f = static_cast<float>(stridePlanes);
+ const float stride_rows_f = static_cast<float>(strideRows);
+ const float stride_cols_f = static_cast<float>(strideCols);
+ TensorIndex out_depth;
+ TensorIndex out_height;
+ TensorIndex out_width;
+ switch (padding_type) {
+ case PADDING_VALID:
+ out_depth = ceil((inputPlanes - kernelDepth + 1.f) / stride_planes_f);
+ out_height = ceil((inputRows - kernelRows + 1.f) / stride_rows_f);
+ out_width = ceil((inputCols - kernelCols + 1.f) / stride_cols_f);
+ break;
+ case PADDING_SAME:
+ out_depth = ceil(inputPlanes / stride_planes_f);
+ out_height = ceil(inputRows / stride_rows_f);
+ out_width = ceil(inputCols / stride_cols_f);
+ break;
+ default:
+ eigen_assert(false && "unexpected padding");
+ }
+
+ DSizes<TensorIndex, 2> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelFilters;
+ kernel_dims[1] = kernelChannels * kernelDepth * kernelRows * kernelCols;
+ } else {
+ kernel_dims[0] = kernelChannels * kernelDepth * kernelRows * kernelCols;
+ kernel_dims[1] = kernelFilters;
+ }
+
+ // Molds the output of the patch extraction result into a 2D tensor:
+ // - the first dimension (dims[0]): the patch values to be multiplied with the kernels
+ // - the second dimension (dims[1]): everything else
+ DSizes<TensorIndex, 2> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelChannels * kernelDepth * kernelRows * kernelCols;
+ pre_contract_dims[1] = out_depth * out_height * out_width;
+ for (int i = 4; i < NumDims; ++i) {
+ pre_contract_dims[1] *= in.dimension(i);
+ }
+ } else {
+ pre_contract_dims[1] = kernelChannels * kernelDepth * kernelRows * kernelCols;
+ pre_contract_dims[0] = out_depth * out_height * out_width;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ pre_contract_dims[0] *= in.dimension(i);
+ }
+ }
+
+ array<IndexPair<TensorIndex>, 1> contract_dims;
+ contract_dims[0] = IndexPair<TensorIndex>(1, 0);
+
+ // Molds the output of the contraction into the shape expected by the user
+ // (assuming ColMajor):
+ // - 1st dim: kernel filters
+ // - 2nd dim: output depth
+ // - 3rd dim: output height
+ // - 4th dim: output width
+ // - 5th dim and beyond: everything else including batch size
+ DSizes<TensorIndex, NumDims> post_contract_dims;
+ if (isColMajor) {
+ post_contract_dims[0] = kernelFilters;
+ post_contract_dims[1] = out_depth;
+ post_contract_dims[2] = out_height;
+ post_contract_dims[3] = out_width;
+ for (int i = 4; i < NumDims; ++i) {
+ post_contract_dims[i] = in.dimension(i);
+ }
+ } else {
+ post_contract_dims[NumDims - 1] = kernelFilters;
+ post_contract_dims[NumDims - 2] = out_depth;
+ post_contract_dims[NumDims - 3] = out_height;
+ post_contract_dims[NumDims - 4] = out_width;
+ for (int i = 0; i < NumDims - 4; ++i) {
+ post_contract_dims[i] = in.dimension(i);
+ }
+ }
+
+ return choose(
+ Cond<internal::traits<Input>::Layout == ColMajor>(),
+ kernel.reshape(kernel_dims)
+ .contract(input.extract_volume_patches(
+ kernelDepth, kernelRows, kernelCols, stridePlanes,
+ strideRows, strideCols, padding_type)
+ .reshape(pre_contract_dims),
+ contract_dims)
+ .reshape(post_contract_dims),
+ input.extract_volume_patches(kernelDepth, kernelRows, kernelCols,
+ stridePlanes, strideRows, strideCols,
+ padding_type)
+ .reshape(pre_contract_dims)
+ .contract(kernel.reshape(kernel_dims), contract_dims)
+ .reshape(post_contract_dims));
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_SRC_NEURAL_NETWORKS_CUBOID_CONVOLUTION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Patch3d.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Patch3d.h
new file mode 100644
index 0000000000..df60fe18a3
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Patch3d.h
@@ -0,0 +1,233 @@
+#ifndef EIGEN_CXX11_SRC_NEURAL_NETWORKS_PATCH3D_H
+#define EIGEN_CXX11_SRC_NEURAL_NETWORKS_PATCH3D_H
+
+#if !defined(__CUDACC__)
+#include <type_traits>
+#endif
+
+namespace Eigen {
+namespace internal {
+
+/** Extract3DPatches
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Extracts 3D patches from a multichannel input volume.
+ *
+ * The input parameter is expected to be a tensor with a rank of 4 or more
+ * (channels, depth, height, width, optional others in col-major, and the
+ * reverse order in row-major).
+ *
+ * The return value will be a tensor of three more dimensions than the input tensor.
+ * In col-major, the first 4 dimensions of the result are: channels, patch_depth,
+ * patch_height, patch_width. The next dimensions will identify the patch
+ * position on the 3D grid of extracted patches: z, y, x. The remaining
+ * dimensions, if any, will be the same as the 'other' dimensions of the input
+ * tensor.
+ */
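+
+// Usage sketch (illustrative sizes; this helper lives in Eigen::internal):
+// extracting 2x3x3 patches with unit strides and no padding from a rank-4
+// col-major volume yields a rank-7 patch tensor.
+//
+//   Eigen::Tensor<float, 4> input(3, 8, 8, 8);             // channels, planes, rows, cols
+//   Eigen::Tensor<float, 7> patches(3, 2, 3, 3, 7, 6, 6);  // channels, pd, ph, pw, z, y, x
+//   patches = Eigen::internal::Extract3DPatches(input, 2, 3, 3,     // patch sizes
+//                                               1, 1, 1,            // strides
+//                                               0, 0, 0, 0, 0, 0);  // explicit zero padding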
+
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorStridingOp<
+ const array<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions + 3>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions + 3>,
+ const TensorPatchOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions>,
+ const TensorPaddingOp<
+ const array<IndexPair<typename internal::traits<Input>::Index>,
+ internal::traits<Input>::NumDimensions>,
+ const Input> > > >
+Extract3DPatches(
+ const Input& input, const DenseIndex patchPlanes,
+ const DenseIndex patchRows, const DenseIndex patchCols,
+ const DenseIndex stridePlanes, const DenseIndex strideRows,
+ const DenseIndex strideCols,
+ const DenseIndex paddingZTop, const DenseIndex paddingZBottom,
+ const DenseIndex paddingTop, const DenseIndex paddingBottom,
+ const DenseIndex paddingLeft, const DenseIndex paddingRight,
+ const typename internal::traits<Input>::Scalar padding_value = 0) {
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions >= 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+ static const int ExtDims = NumDims + 3;
+
+ // Tensor size after patch extraction. We add three dimensions to unpack the
+ // linear patch index into a 3D grid over which stride() can work.
+ DSizes<TensorIndex, ExtDims> pre_stride_dims;
+
+ if (isColMajor) {
+ pre_stride_dims[0] = in.dimension(0);
+ pre_stride_dims[1] = patchPlanes;
+ pre_stride_dims[2] = patchRows;
+ pre_stride_dims[3] = patchCols;
+ } else {
+ pre_stride_dims[ExtDims - 1] = in.dimension(NumDims - 1);
+ pre_stride_dims[ExtDims - 4] = patchCols;
+ pre_stride_dims[ExtDims - 3] = patchRows;
+ pre_stride_dims[ExtDims - 2] = patchPlanes;
+ }
+
+ const TensorIndex inputPlanes = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex inputRows = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+ const TensorIndex inputCols = isColMajor ? in.dimension(3) : in.dimension(NumDims - 4);
+
+ array<IndexPair<TensorIndex>, NumDims> paddings;
+ for (int i = 0; i < NumDims; ++i) {
+ paddings[i] = IndexPair<TensorIndex>(0, 0);
+ }
+
+ paddings[isColMajor ? 1 : (NumDims - 2)] = IndexPair<TensorIndex>(paddingZTop, paddingZBottom);
+ paddings[isColMajor ? 2 : (NumDims - 3)] = IndexPair<TensorIndex>(paddingTop, paddingBottom);
+ paddings[isColMajor ? 3 : (NumDims - 4)] = IndexPair<TensorIndex>(paddingLeft, paddingRight);
+
+ pre_stride_dims[isColMajor ? 4 : (ExtDims - 5)] = inputPlanes + paddingZBottom + paddingZTop - patchPlanes + 1;
+ pre_stride_dims[isColMajor ? 5 : (ExtDims - 6)] = inputRows + paddingTop + paddingBottom - patchRows + 1;
+ pre_stride_dims[isColMajor ? 6 : (ExtDims - 7)] = inputCols + paddingLeft + paddingRight - patchCols + 1;
+
+ if (isColMajor) {
+ for (int i = 7; i < NumDims + 3; ++i) {
+ pre_stride_dims[i] = in.dimension(i - 3);
+ }
+ } else {
+ for (int i = 0; i < NumDims - 4; ++i) {
+ pre_stride_dims[i] = in.dimension(i);
+ }
+ }
+
+ DSizes<TensorIndex, NumDims> patch_dims;
+ if (isColMajor) {
+ patch_dims[0] = in.dimension(0);
+ patch_dims[1] = patchPlanes;
+ patch_dims[2] = patchRows;
+ patch_dims[3] = patchCols;
+ for (int i = 4; i < NumDims; ++i) {
+ patch_dims[i] = 1;
+ }
+ } else {
+ patch_dims[NumDims - 1] = in.dimension(NumDims - 1);
+ patch_dims[NumDims - 4] = patchCols;
+ patch_dims[NumDims - 3] = patchRows;
+ patch_dims[NumDims - 2] = patchPlanes;
+ for (int i = 0; i < NumDims - 4; i++) {
+ patch_dims[i] = 1;
+ }
+ }
+
+ array<TensorIndex, NumDims + 3> strides;
+ if (isColMajor) {
+ // No striding within the patches.
+ for (int i = 0; i < 4; ++i) {
+ strides[i] = 1;
+ }
+ // Apply striding in the spatial patch grid dimensions only.
+ strides[4] = stridePlanes;
+ strides[5] = strideRows;
+ strides[6] = strideCols;
+ // No striding in the remaining dimensions (batches, ...).
+ for (int i = 7; i < NumDims + 3; i++) {
+ strides[i] = 1;
+ }
+ } else {
+ // No striding within the patches.
+ for (int i = 1; i <= 4; ++i) {
+ strides[ExtDims - i] = 1;
+ }
+ // Apply striding in the spatial patch grid dimensions only.
+ strides[ExtDims - 7] = strideCols;
+ strides[ExtDims - 6] = strideRows;
+ strides[ExtDims - 5] = stridePlanes;
+ // No striding in the remaining dimensions (batches, ...).
+ for (int i = 0; i < NumDims - 4; i++) {
+ strides[i] = 1;
+ }
+ }
+
+ // TODO(mjanusz): Consider getting rid of pad() and stride(), and extend
+ // extract_patches to take additional parameters for padding/striding,
+ // similarly to extract_image_patches.
+ return input.pad(paddings, padding_value).extract_patches(patch_dims).reshape(pre_stride_dims).stride(strides);
+}
+
+
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorStridingOp<
+ const array<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions + 3>,
+ const TensorReshapingOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions + 3>,
+ const TensorPatchOp<
+ const DSizes<typename internal::traits<Input>::Index,
+ internal::traits<Input>::NumDimensions>,
+ const TensorPaddingOp<
+ const array<IndexPair<typename internal::traits<Input>::Index>,
+ internal::traits<Input>::NumDimensions>,
+ const Input> > > >
+Extract3DPatches(
+ const Input& input, const DenseIndex patchPlanes,
+ const DenseIndex patchRows, const DenseIndex patchCols,
+ const DenseIndex stridePlanes, const DenseIndex strideRows,
+ const DenseIndex strideCols, const PaddingType padding_type,
+ const typename internal::traits<Input>::Scalar padding_value = 0) {
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions >= 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+
+ const TensorIndex inputPlanes = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex inputRows = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+ const TensorIndex inputCols = isColMajor ? in.dimension(3) : in.dimension(NumDims - 4);
+
+ switch (padding_type) {
+ case PADDING_VALID:
+ // No padding in any dimension.
+ return Extract3DPatches(input, patchPlanes, patchRows, patchCols,
+ stridePlanes, strideRows, strideCols,
+ 0, 0, 0, 0, 0, 0, padding_value);
+ case PADDING_SAME:
+ // The size of the tensor before striding should be just the expected
+ // output times the stride.
+ const TensorIndex size_z = ceil(inputPlanes / static_cast<float>(stridePlanes)) * stridePlanes;
+ const TensorIndex size_y = ceil(inputRows / static_cast<float>(strideRows)) * strideRows;
+ const TensorIndex size_x = ceil(inputCols / static_cast<float>(strideCols)) * strideCols;
+
+ // The size of the patch space is going to be: padded_input_size - patch_size + 1.
+ // This has to match the expected size before striding (pre_stride_dims).
+ // The deltas below extend the input to the expected size.
+ const TensorIndex dz = size_z + patchPlanes - 1 - inputPlanes;
+ const TensorIndex dy = size_y + patchRows - 1 - inputRows;
+ const TensorIndex dx = size_x + patchCols - 1 - inputCols;
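+ // For example (illustrative numbers): inputRows == 7, strideRows == 2 and patchRows == 3
+ // give size_y == 8 and dy == 3, applied below as 2 rows of padding on top and 1 on the bottom.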
+
+ return Extract3DPatches(input, patchPlanes, patchRows, patchCols,
+ stridePlanes, strideRows, strideCols,
+ dz - dz / 2, dz / 2,
+ dy - dy / 2, dy / 2,
+ dx - dx / 2, dx / 2,
+ padding_value);
+ }
+}
+
+// TODO(mjanusz): Switch this to a 'using' alias once CUDA supports C++11.
+template <typename Input>
+struct Extract3DPatchesType {
+ typedef const TensorStridingOp< const array<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions + 3>,
+ const TensorReshapingOp< const DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions + 3>,
+ const TensorPatchOp< const DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>,
+ const TensorPaddingOp< const array< IndexPair<typename internal::traits<Input>::Index>, internal::traits<Input>::NumDimensions>,
+ const Input> > > > type;
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_SRC_NEURAL_NETWORKS_PATCH3D_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Pooling.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Pooling.h
new file mode 100644
index 0000000000..8dea22806c
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/Pooling.h
@@ -0,0 +1,442 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_POOLING_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_POOLING_H
+
+#include "Patch3d.h"
+
+namespace Eigen {
+
+/** SpatialMaxPooling
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies a max-pooling over a multichannel input image.
+ *
+ * The input parameter is expected to be a tensor with a rank of 4 (channels, height, width, others in col-major, and the reverse of that in row-major).
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be channels, height, width, and others (in col-major, and the reverse of that if the input was row-major).
+ *
+ * The order of the width and height dimensions can be swapped if needed.
+ *
+*/
+#if !defined(EIGEN_HAS_INDEX_LIST)
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorReshapingOp<const Eigen::DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorReductionOp<internal::MaxReducer<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>, const Eigen::array<int, 2>, const TensorImagePatchOp<Dynamic, Dynamic, const Input> > >
+#else
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorReshapingOp<const Eigen::DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorReductionOp<internal::MaxReducer<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>, typename internal::conditional<internal::traits<Input>::Layout == ColMajor, const Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<2> >, const Eigen::IndexList<Eigen::type2index<2>, Eigen::type2index<3> > >::type, const TensorImagePatchOp<Dynamic, Dynamic, const Input> > >
+#endif
+SpatialMaxPooling(const Input& input, DenseIndex patchRows, DenseIndex patchCols,
+ DenseIndex strideRows, DenseIndex strideCols, const PaddingType padding_type,
+ DenseIndex in_strideRows = 1, DenseIndex in_strideCols = 1)
+{
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ const DenseIndex patchRowsEff = patchRows + (patchRows - 1) * (in_strideRows - 1);
+ const DenseIndex patchColsEff = patchCols + (patchCols - 1) * (in_strideCols - 1);
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+ static const int idxRows = isColMajor ? 1 : 2;
+ static const int idxCols = isColMajor ? 2 : 1;
+
+ // Molds the output of the reduction into the shape expected by the user.
+ // (assuming col-major):
+ // - 1st dim: channels
+ // - 2nd dim: output height
+ // - 3rd dim: output width
+ // - 4th dim and beyond: everything else including batch size
+ Eigen::DSizes<TensorIndex, internal::traits<Input>::NumDimensions> post_reduce_dims;
+ post_reduce_dims[0] = in.dimension(0);
+ if (padding_type == PADDING_VALID) {
+ post_reduce_dims[idxRows] = numext::ceil((in.dimension(idxRows) - patchRowsEff + 1.f) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil((in.dimension(idxCols) - patchColsEff + 1.f) / static_cast<float>(strideCols));
+ } else {
+ post_reduce_dims[idxRows] = numext::ceil(in.dimension(idxRows) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil(in.dimension(idxCols) / static_cast<float>(strideCols));
+ }
+ post_reduce_dims[3] = in.dimension(3);
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ // nvcc doesn't support cxx11
+ Eigen::array<int, 2> reduction_dims;
+ if (isColMajor) {
+ reduction_dims[0] = 1;
+ reduction_dims[1] = 2;
+ } else {
+ reduction_dims[0] = 2;
+ reduction_dims[1] = 3;
+ }
+#else
+ // Take advantage of cxx11 to give the compiler information it can use to
+ // optimize the code.
+ typename internal::conditional<internal::traits<Input>::Layout == ColMajor, const Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<2> >, const Eigen::IndexList<Eigen::type2index<2>, Eigen::type2index<3> > >::type reduction_dims;
+#endif
+
+ return input.extract_image_patches(patchRows, patchCols, strideRows, strideCols, in_strideRows, in_strideCols, padding_type, -Eigen::NumTraits<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>::highest()).maximum(reduction_dims).reshape(post_reduce_dims);
+}
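+
+// Illustrative usage sketch (not part of the original header; the tensor sizes
+// below are arbitrary assumptions). In col-major the input is laid out as
+// (channels, height, width, batch) and the result keeps that layout:
+//
+//   Eigen::Tensor<float, 4> input(64, 112, 112, 16);
+//   input.setRandom();
+//   // 3x3 window, stride 2, VALID padding -> (64, 55, 55, 16)
+//   Eigen::Tensor<float, 4> pooled =
+//       Eigen::SpatialMaxPooling(input, 3, 3, 2, 2, Eigen::PADDING_VALID);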
+
+/** CuboidMaxPooling
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies a max-pooling over a multichannel input volume.
+ *
+ * The input parameter is expected to be a tensor with a rank of 5 (channels, depth, height, width, others in col-major, and the reverse of that in row-major).
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be channels, depth, height, width, and others (in col-major, and the reverse of that if the input was row-major).
+ *
+ * The order of the depth, width and height dimensions can be swapped if needed.
+ *
+*/
+#if !defined(EIGEN_HAS_INDEX_LIST)
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions>,
+ const TensorReductionOp<
+ internal::MaxReducer<float>, const Eigen::array<int, 1>,
+ const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Input> > > >
+#else
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions>,
+ const TensorReductionOp<
+ internal::MaxReducer<float>,
+ const Eigen::IndexList<Eigen::type2index<1> >,
+ const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Input> > > >
+#endif
+CuboidMaxPooling(const Input& input, DenseIndex patchPlanes,
+ DenseIndex patchRows, DenseIndex patchCols,
+ DenseIndex stridePlanes, DenseIndex strideRows,
+ DenseIndex strideCols, const PaddingType padding_type) {
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 5, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ static const int idxPlanes = isColMajor ? 1 : 3;
+ static const int idxRows = 2;
+ static const int idxCols = isColMajor ? 3 : 1;
+
+  // Molds the output of the reduction into the shape expected by the user
+ // (assuming col-major):
+ // - 1st dim: channels
+ // - 2nd dim: output depth
+ // - 3rd dim: output height
+ // - 4th dim: output width
+ // - 5th dim and beyond: everything else including batch size
+ Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions> post_reduce_dims;
+ post_reduce_dims[0] = in.dimension(0);
+ if (padding_type == PADDING_VALID) {
+ post_reduce_dims[idxPlanes] = numext::ceil((in.dimension(idxPlanes) - patchPlanes + 1.f) / static_cast<float>(stridePlanes));
+ post_reduce_dims[idxRows] = numext::ceil((in.dimension(idxRows) - patchRows + 1.f) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil((in.dimension(idxCols) - patchCols + 1.f) / static_cast<float>(strideCols));
+ } else {
+ post_reduce_dims[idxPlanes] = numext::ceil(in.dimension(idxPlanes) / static_cast<float>(stridePlanes));
+ post_reduce_dims[idxRows] = numext::ceil(in.dimension(idxRows) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil(in.dimension(idxCols) / static_cast<float>(strideCols));
+ }
+ post_reduce_dims[4] = in.dimension(4);
+
+ Eigen::DSizes<DenseIndex, 3> pre_reduce_dims;
+ pre_reduce_dims[1] = patchRows * patchCols * patchPlanes;
+ if (isColMajor) {
+ pre_reduce_dims[0] = post_reduce_dims[0];
+ pre_reduce_dims[2] = post_reduce_dims[1] * post_reduce_dims[2] * post_reduce_dims[3] * post_reduce_dims[4];
+ } else {
+ pre_reduce_dims[0] = post_reduce_dims[0] * post_reduce_dims[1] * post_reduce_dims[2] * post_reduce_dims[3];
+ pre_reduce_dims[2] = post_reduce_dims[4];
+ }
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ // nvcc doesn't support cxx11
+ Eigen::array<int, 1> reduction_dims;
+ reduction_dims[0] = 1;
+#else
+ // Take advantage of cxx11 to give the compiler information it can use to
+ // optimize the code.
+ Eigen::IndexList<Eigen::type2index<1> > reduction_dims;
+#endif
+ return input.extract_volume_patches(patchPlanes, patchRows, patchCols,
+ stridePlanes, strideRows, strideCols,
+ padding_type, -Eigen::NumTraits<float>::highest())
+ .reshape(pre_reduce_dims)
+ .maximum(reduction_dims)
+ .reshape(post_reduce_dims);
+}
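+
+// Illustrative usage sketch (not part of the original header; sizes are
+// assumptions). The input is a rank-5 col-major tensor laid out as
+// (channels, depth, height, width, batch):
+//
+//   Eigen::Tensor<float, 5> vol(8, 16, 32, 32, 4);
+//   vol.setRandom();
+//   // 2x2x2 window, stride 2, SAME padding -> (8, 8, 16, 16, 4)
+//   Eigen::Tensor<float, 5> pooled =
+//       Eigen::CuboidMaxPooling(vol, 2, 2, 2, 2, 2, 2, Eigen::PADDING_SAME);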
+
+
+/** SpatialAvgPooling
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies an average pooling over a multichannel input image.
+ *
+ * The input parameter is expected to be a tensor with a rank of 4 (channels, height, width, others in col-major, and the reverse of that in row-major).
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be channels, height, width, and others (in col-major, and the reverse of that if the input was row-major).
+ *
+ * The order of the width and height dimensions can be swapped if needed.
+ *
+*/
+namespace internal {
+
+template <typename T> struct AvgPoolMeanReducer
+{
+#if (EIGEN_ARCH_i386 || EIGEN_ARCH_x86_64 || defined (EIGEN_USE_GPU) || defined(__CUDACC__) || defined(__CUDA_ARCH__))
+ // We only support packet access for floats.
+ static const bool PacketAccess = internal::is_same<T, float>::value;
+#else
+ static const bool PacketAccess = false;
+#endif
+ static const bool IsStateful = true;
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE AvgPoolMeanReducer() : scalarCount_(0) {
+ typedef typename packet_traits<T>::type Packet;
+ packetCount_ = pset1<Packet>(0.0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) {
+ if (t != -Eigen::NumTraits<T>::highest()) {
+ (*accum) = (*accum) + t;
+ scalarCount_++;
+ }
+ }
+
+
+#if (!defined (EIGEN_USE_GPU) || !defined(__CUDACC__) || !defined(__CUDA_ARCH__))
+#ifdef EIGEN_VECTORIZE_AVX
+#define pequal(a,b) _mm256_cmp_ps(a,b,_CMP_EQ_UQ)
+#define psel(a,b,false_mask) _mm256_blendv_ps(a,b,false_mask)
+#else
+#define pequal(a,b) _mm_cmpeq_ps(a,b)
+#define psel(a,b,false_mask) _mm_or_ps(_mm_andnot_ps(false_mask, a), _mm_and_ps(false_mask, b))
+#endif
+
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) {
+ reducePacketWithType(static_cast<T>(0), p, accum);
+ }
+
+ template <typename Packet>
+ void reducePacketWithType(T, const Packet& p, Packet* accum) {
+ Packet skip_mask = pequal(p, pset1<Packet>(-Eigen::NumTraits<T>::highest()));
+ (*accum) = padd<Packet>(*accum, psel(p, pset1<Packet>(0), skip_mask));
+ packetCount_ = padd<Packet>(packetCount_, psel(pset1<Packet>(1), pset1<Packet>(0), skip_mask));
+ }
+
+#else
+#define pequal(a,b) make_float4(a.x == b.x ? 1.f : 0, a.y == b.y ? 1.f : 0, a.z == b.z ? 1.f : 0, a.w == b.w ? 1.f : 0)
+#define psel(a,b,c) make_float4(c.x ? b.x : a.x, c.y ? b.y : a.y, c.z ? b.z : a.z, c.w ? b.w : a.w)
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const float4& p, float4* accum) {
+ float4 skip_mask = pequal(p, pset1<float4>(-Eigen::NumTraits<float>::highest()));
+ (*accum) = padd<float4>(*accum, psel(p, pset1<float4>(0), skip_mask));
+ packetCount_ = padd<float4>(packetCount_, psel(pset1<float4>(1), pset1<float4>(0), skip_mask));
+ }
+
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return static_cast<T>(0);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(0);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ eigen_assert(scalarCount_ > 0);
+ return accum / scalarCount_;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return pdiv(vaccum, packetCount_);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return (saccum + predux(vaccum)) / (scalarCount_ + predux(packetCount_));
+ }
+
+ protected:
+ typedef typename packet_traits<T>::type Packet;
+ int scalarCount_;
+ Packet packetCount_;
+};
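+
+// Note (not part of the original comments): the reducer above treats values
+// equal to -NumTraits<T>::highest() as the padding sentinel passed to the
+// patch extraction below, so padded entries are excluded from both the sum
+// and the element count. For example, averaging {1, 2, -highest} yields
+// (1 + 2) / 2 = 1.5 rather than dividing by 3.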
+
+} // namespace internal
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorReshapingOp<const Eigen::DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorReductionOp<internal::AvgPoolMeanReducer<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>, const Eigen::array<int, 2>, const TensorImagePatchOp<Dynamic, Dynamic, const Input> > >
+#else
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorReshapingOp<const Eigen::DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorReductionOp<internal::AvgPoolMeanReducer<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>, typename internal::conditional<internal::traits<Input>::Layout == ColMajor, const Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<2> >, const Eigen::IndexList<Eigen::type2index<2>, Eigen::type2index<3> > >::type, const TensorImagePatchOp<Dynamic, Dynamic, const Input> > >
+#endif
+SpatialAvgPooling(const Input& input, DenseIndex patchRows, DenseIndex patchCols,
+ DenseIndex strideRows, DenseIndex strideCols, const PaddingType padding_type,
+ DenseIndex in_strideRows = 1, DenseIndex in_strideCols = 1)
+{
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ const DenseIndex patchRowsEff = patchRows + (patchRows - 1) * (in_strideRows - 1);
+ const DenseIndex patchColsEff = patchCols + (patchCols - 1) * (in_strideCols - 1);
+
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+ static const int idxRows = isColMajor ? 1 : 2;
+ static const int idxCols = isColMajor ? 2 : 1;
+
+ // Molds the output of the reduction into the shape expected by the user.
+ // (assuming col-major):
+ // - 1st dim: channels
+ // - 2nd dim: output height
+ // - 3rd dim: output width
+ // - 4th dim and beyond: everything else including batch size
+ Eigen::DSizes<TensorIndex, internal::traits<Input>::NumDimensions> post_reduce_dims;
+ post_reduce_dims[0] = in.dimension(0);
+ if (padding_type == PADDING_VALID) {
+ post_reduce_dims[idxRows] = numext::ceil((in.dimension(idxRows) - patchRowsEff + 1.f) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil((in.dimension(idxCols) - patchColsEff + 1.f) / static_cast<float>(strideCols));
+ } else {
+ post_reduce_dims[idxRows] = numext::ceil(in.dimension(idxRows) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil(in.dimension(idxCols) / static_cast<float>(strideCols));
+ }
+ post_reduce_dims[3] = in.dimension(3);
+
+ typedef typename internal::remove_const<typename internal::traits<Input>::Scalar>::type CoeffReturnType;
+ internal::AvgPoolMeanReducer<CoeffReturnType> mean_with_nan;
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ // nvcc doesn't support cxx11
+ Eigen::array<int, 2> reduction_dims;
+ if (isColMajor) {
+ reduction_dims[0] = 1;
+ reduction_dims[1] = 2;
+ } else {
+ reduction_dims[0] = 2;
+ reduction_dims[1] = 3;
+ }
+#else
+ // Take advantage of cxx11 to give the compiler information it can use to
+ // optimize the code.
+ typename internal::conditional<internal::traits<Input>::Layout == ColMajor, const Eigen::IndexList<Eigen::type2index<1>, Eigen::type2index<2> >, const Eigen::IndexList<Eigen::type2index<2>, Eigen::type2index<3> > >::type reduction_dims;
+#endif
+ return input.extract_image_patches(patchRows, patchCols, strideRows, strideCols, in_strideRows, in_strideCols, padding_type, -Eigen::NumTraits<typename internal::remove_const<typename internal::traits<Input>::Scalar>::type>::highest()).reduce(reduction_dims, mean_with_nan).reshape(post_reduce_dims);
+}
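+
+// Illustrative usage sketch (not part of the original header; sizes are
+// assumptions). Padding sentinels are excluded from the average by the
+// AvgPoolMeanReducer above:
+//
+//   Eigen::Tensor<float, 4> input(32, 28, 28, 8);   // (channels, h, w, batch)
+//   input.setRandom();
+//   // 2x2 window, stride 2, VALID padding -> (32, 14, 14, 8)
+//   Eigen::Tensor<float, 4> pooled =
+//       Eigen::SpatialAvgPooling(input, 2, 2, 2, 2, Eigen::PADDING_VALID);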
+
+
+/** CuboidAvgPooling
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies an average pooling over a multichannel input volume.
+ *
+ * The input parameter is expected to be a tensor with a rank of 5 (channels, depth, height, width, others in col-major, and the reverse of that in row-major).
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be channels, depth, height, width, and others (in col-major, and the reverse of that if the input was row-major).
+ *
+ * The order of the depth, width and height dimensions can be swapped if needed.
+ *
+*/
+#if !defined(EIGEN_HAS_INDEX_LIST)
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions>,
+ const TensorReductionOp<
+ internal::AvgPoolMeanReducer<float>, const Eigen::array<int, 1>,
+ const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Input> > > >
+#else
+template <typename Input>
+EIGEN_ALWAYS_INLINE static const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions>,
+ const TensorReductionOp<
+ internal::AvgPoolMeanReducer<float>,
+ const Eigen::IndexList<Eigen::type2index<1> >,
+ const TensorReshapingOp<
+ const Eigen::DSizes<DenseIndex, 3>,
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Input> > > >
+#endif
+CuboidAvgPooling(const Input& input, DenseIndex patchPlanes,
+ DenseIndex patchRows, DenseIndex patchCols,
+ DenseIndex stridePlanes, DenseIndex strideRows,
+ DenseIndex strideCols, const PaddingType padding_type) {
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 5, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+
+ static const int idxPlanes = isColMajor ? 1 : 3;
+ static const int idxRows = 2;
+ static const int idxCols = isColMajor ? 3 : 1;
+  // Molds the output of the reduction into the shape expected by the user
+ // (assuming col-major):
+ // - 1st dim: channels
+  //  - 2nd dim: output depth
+ // - 3rd dim: output height
+ // - 4th dim: output width
+ // - 5th dim and beyond: everything else including batch size
+ Eigen::DSizes<DenseIndex, internal::traits<Input>::NumDimensions> post_reduce_dims;
+ post_reduce_dims[0] = in.dimension(0);
+ if (padding_type == PADDING_VALID) {
+ post_reduce_dims[idxPlanes] = numext::ceil((in.dimension(idxPlanes) - patchPlanes + 1.f) / static_cast<float>(stridePlanes));
+ post_reduce_dims[idxRows] = numext::ceil((in.dimension(idxRows) - patchRows + 1.f) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil((in.dimension(idxCols) - patchCols + 1.f) / static_cast<float>(strideCols));
+ } else {
+ post_reduce_dims[idxPlanes] = numext::ceil(in.dimension(idxPlanes) / static_cast<float>(stridePlanes));
+ post_reduce_dims[idxRows] = numext::ceil(in.dimension(idxRows) / static_cast<float>(strideRows));
+ post_reduce_dims[idxCols] = numext::ceil(in.dimension(idxCols) / static_cast<float>(strideCols));
+ }
+ post_reduce_dims[4] = in.dimension(4);
+
+ Eigen::DSizes<DenseIndex, 3> pre_reduce_dims;
+ pre_reduce_dims[1] = patchRows * patchCols * patchPlanes;
+ if (isColMajor) {
+ pre_reduce_dims[0] = post_reduce_dims[0];
+ pre_reduce_dims[2] = post_reduce_dims[1] * post_reduce_dims[2] * post_reduce_dims[3] * post_reduce_dims[4];
+ } else {
+ pre_reduce_dims[0] = post_reduce_dims[0] * post_reduce_dims[1] * post_reduce_dims[2] * post_reduce_dims[3];
+ pre_reduce_dims[2] = post_reduce_dims[4];
+ }
+
+ typedef typename internal::remove_const<typename internal::traits<Input>::Scalar>::type CoeffReturnType;
+ internal::AvgPoolMeanReducer<CoeffReturnType> mean_with_nan;
+
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ // nvcc doesn't support cxx11
+ Eigen::array<int, 1> reduction_dims;
+ reduction_dims[0] = 1;
+#else
+ // Take advantage of cxx11 to give the compiler information it can use to
+ // optimize the code.
+ Eigen::IndexList<Eigen::type2index<1> > reduction_dims;
+#endif
+ return input.extract_volume_patches(patchPlanes, patchRows, patchCols,
+ stridePlanes, strideRows, strideCols,
+ padding_type, -Eigen::NumTraits<float>::highest())
+ .reshape(pre_reduce_dims)
+ .reduce(reduction_dims, mean_with_nan)
+ .reshape(post_reduce_dims);
+}
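+
+// Illustrative usage sketch (not part of the original header; sizes are
+// assumptions), mirroring CuboidMaxPooling but averaging over the window:
+//
+//   Eigen::Tensor<float, 5> vol(8, 16, 32, 32, 4);  // (channels, d, h, w, batch)
+//   vol.setRandom();
+//   // 3x3x3 window, stride 1, VALID padding -> (8, 14, 30, 30, 4)
+//   Eigen::Tensor<float, 5> pooled =
+//       Eigen::CuboidAvgPooling(vol, 3, 3, 3, 1, 1, 1, Eigen::PADDING_VALID);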
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_POOLING_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SoftMax.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SoftMax.h
new file mode 100644
index 0000000000..223ae28ffd
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SoftMax.h
@@ -0,0 +1,82 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_SOFTMAX_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_SOFTMAX_H
+
+namespace Eigen {
+
+/** SoftMax
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies a softmax along the depth dimension of the input.
+ *
+ * The input parameter is expected to be a col-major tensor with a rank of 2 (depth and other).
+ *
+ * The result can be assigned to a tensor of rank and dimensions equal to that of the input. The result will be laid out in col-major order.
+ *
+*/
+
+namespace {
+struct SoftmaxOp {
+ SoftmaxOp(const float beta) : beta_(beta) { }
+
+ template <typename Input>
+ typename Input::Dimensions dimensions(const Input& input) const {
+ return input.dimensions();
+ }
+
+ template <typename Input, typename Output, typename Device>
+ void eval(const Input& input, Output& output, const Device& device) const
+ {
+#if !defined(EIGEN_HAS_INDEX_LIST)
+ // nvcc doesn't support cxx11
+ Eigen::array<typename internal::traits<Input>::Index, 1> depth_dim;
+ depth_dim[0] = 0;
+ Eigen::array<typename internal::traits<Input>::Index, 2> bcast;
+ bcast[0] = dimensions(input)[0];
+ bcast[1] = 1;
+ DSizes<typename internal::traits<Input>::Index, 2> dims2d;
+ dims2d[0] = 1;
+ dims2d[1] = dimensions(input)[1];
+#else
+ // Take advantage of cxx11 to give the compiler information it can use to
+ // optimize the code.
+ Eigen::IndexList<Eigen::type2index<0>> depth_dim;
+ Eigen::IndexList<int, Eigen::type2index<1>> bcast;
+ bcast.set(0, dimensions(input)[0]);
+ Eigen::IndexList<Eigen::type2index<1>, typename internal::traits<Input>::Index> dims2d;
+ dims2d.set(1, dimensions(input)[1]);
+#endif
+
+ output.device(device) = ((input - input.maximum(depth_dim).eval().reshape(dims2d).broadcast(bcast)) * beta_).exp();
+ output.device(device) = output / (output.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast));
+ }
+
+ private:
+ const float beta_;
+};
+}
+
+
+template <typename Input>
+EIGEN_ALWAYS_INLINE
+static const TensorCustomUnaryOp<const SoftmaxOp, const Input>
+SoftMax(const Input& input, const float beta)
+{
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == ColMajor, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::NumDimensions == 2, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const SoftmaxOp op(beta);
+ return input.customOp(op);
+}
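+
+// Illustrative usage sketch (not part of the original header; sizes are
+// assumptions). The softmax is taken along dimension 0 (depth), so each
+// column of the result sums to 1:
+//
+//   Eigen::Tensor<float, 2> logits(10, 128);   // (depth, other), col-major
+//   logits.setRandom();
+//   Eigen::Tensor<float, 2> probs = Eigen::SoftMax(logits, /*beta=*/1.0f);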
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_SOFTMAX_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SpatialConvolutions.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SpatialConvolutions.h
new file mode 100644
index 0000000000..34a9fcf037
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/SpatialConvolutions.h
@@ -0,0 +1,634 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#ifndef EIGEN_CXX11_NEURAL_NETWORKS_SPATIAL_CONVOLUTIONS_H
+#define EIGEN_CXX11_NEURAL_NETWORKS_SPATIAL_CONVOLUTIONS_H
+
+namespace Eigen {
+
+namespace internal {
+
+// These optimizations require vector instructions
+#ifdef EIGEN_VECTORIZE
+
+// TODO: Consolidate this part of the code with the image patch extraction code
+// since they are both very similar.
+template <typename NewDimension, DenseIndex Rows, DenseIndex Cols, typename ArgType, typename Device,
+ typename Scalar, typename Index,
+ typename nocontract_t, typename contract_t,
+ int Side, size_t packet_size,
+ bool inner_dim_contiguous, bool inner_dim_reordered, int Alignment>
+class TensorContractionInputMapper<Scalar, Index, Side, TensorEvaluator<const TensorReshapingOp<NewDimension, const TensorImagePatchOp<Rows, Cols, ArgType> >, Device>, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>
+{
+ public:
+ typedef TensorContractionInputMapper<Scalar, Index, Side, TensorEvaluator<const TensorReshapingOp<NewDimension, const TensorImagePatchOp<Rows, Cols, ArgType> >, Device>, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> Self;
+ typedef Self SubMapper;
+ typedef Self VectorMapper;
+ typedef Self LinearMapper;
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ TensorContractionInputMapper(const TensorEvaluator<const TensorReshapingOp<NewDimension, const TensorImagePatchOp<Rows, Cols, ArgType> >, Device>& tensor,
+ const nocontract_t&, const nocontract_t&,
+ const contract_t&, const contract_t&,
+ const Index depth_offset = 0, const Index col_offset = 0)
+ : m_depth_offset(depth_offset), m_col_offset(col_offset), m_impl(tensor.impl().impl())
+ {
+ if (internal::traits<ArgType>::Layout == ColMajor) {
+ m_patch_depth = tensor.impl().dimensions()[0];
+ m_patch_rows = tensor.impl().dimensions()[1];
+ m_patch_cols = tensor.impl().dimensions()[2];
+ m_num_patches = tensor.impl().dimensions()[3];
+ } else {
+ static const int NumDims = tensor.impl().dimensions().size();
+ m_patch_depth = tensor.impl().dimensions()[NumDims - 1];
+ m_patch_rows = tensor.impl().dimensions()[NumDims - 2];
+ m_patch_cols = tensor.impl().dimensions()[NumDims - 3];
+ m_num_patches = tensor.impl().dimensions()[NumDims - 4];
+ }
+ m_patch_row_inflate_strides = tensor.impl().rowInflateStride();
+ m_patch_col_inflate_strides = tensor.impl().colInflateStride();
+
+ m_colStride = m_patch_rows;
+
+ m_outputRows = tensor.impl().outputRows();
+ m_row_strides = tensor.impl().userRowStride();
+ m_col_strides = tensor.impl().userColStride();
+
+ m_in_row_strides = tensor.impl().userInRowStride();
+ m_in_col_strides = tensor.impl().userInColStride();
+
+ if (internal::traits<ArgType>::Layout == ColMajor) {
+ m_inputRows = tensor.impl().impl().dimensions()[1];
+ m_inputCols = tensor.impl().impl().dimensions()[2];
+ } else {
+ static const int NumDims = tensor.impl().impl().dimensions().size();
+ m_inputRows = tensor.impl().impl().dimensions()[NumDims - 2];
+ m_inputCols = tensor.impl().impl().dimensions()[NumDims - 3];
+ }
+
+ m_rowInputStride = m_patch_depth;
+ m_colInputStride = m_patch_depth * m_inputRows;
+ m_patchInputStride = m_patch_depth * m_inputRows * m_inputCols;
+
+ m_rowPaddingTop = tensor.impl().rowPaddingTop();
+ m_colPaddingLeft = tensor.impl().colPaddingLeft();
+
+ m_fastInputRowStride = internal::TensorIntDivisor<Index>(m_patch_row_inflate_strides);
+ m_fastInputColStride = internal::TensorIntDivisor<Index>(m_patch_col_inflate_strides);
+ m_fastNumPatches = internal::TensorIntDivisor<Index>(m_num_patches);
+ m_fastColStride = internal::TensorIntDivisor<Index>(m_colStride);
+ m_fastOutputRows = internal::TensorIntDivisor<Index>(m_outputRows);
+ m_fastDimZero = internal::TensorIntDivisor<Index>(m_patch_depth);
+
+ computeBaseIndices(m_col_offset, m_rowIndex, m_colIndex, m_otherIndex);
+ }
+
+ TensorContractionInputMapper(const TensorContractionInputMapper& base_mapper,
+ const Index depth_offset,
+ const Index col_offset) : m_depth_offset(depth_offset), m_col_offset(col_offset), m_impl(base_mapper.m_impl) {
+ m_patch_depth = base_mapper.m_patch_depth;
+ m_patch_rows = base_mapper.m_patch_rows;
+ m_patch_cols = base_mapper.m_patch_cols;
+ m_num_patches = base_mapper.m_num_patches;
+ m_patch_row_inflate_strides = base_mapper.m_patch_row_inflate_strides;
+ m_patch_col_inflate_strides = base_mapper.m_patch_col_inflate_strides;
+
+ m_colStride = base_mapper.m_colStride;
+
+ m_rowInputStride = base_mapper.m_rowInputStride;
+ m_colInputStride = base_mapper.m_colInputStride;
+ m_patchInputStride = base_mapper.m_patchInputStride;
+
+ m_inputRows = base_mapper.m_inputRows;
+ m_inputCols = base_mapper.m_inputCols;
+
+ m_outputRows = base_mapper.m_outputRows;
+ m_row_strides = base_mapper.m_row_strides;
+ m_col_strides = base_mapper.m_col_strides;
+
+ m_in_row_strides = base_mapper.m_in_row_strides;
+ m_in_col_strides = base_mapper.m_in_col_strides;
+
+ m_rowPaddingTop = base_mapper.m_rowPaddingTop;
+ m_colPaddingLeft = base_mapper.m_colPaddingLeft;
+
+ m_fastInputRowStride = base_mapper.m_fastInputRowStride;
+ m_fastInputColStride = base_mapper.m_fastInputColStride;
+ m_fastNumPatches = base_mapper.m_fastNumPatches;
+ m_fastColStride = base_mapper.m_fastColStride;
+ m_fastOutputRows = base_mapper.m_fastOutputRows;
+ m_fastDimZero = base_mapper.m_fastDimZero;
+
+ computeBaseIndices(m_col_offset, m_rowIndex, m_colIndex, m_otherIndex);
+ }
+
+ // If true, turns off some optimizations for loading packets since the image
+  // patches are "non-standard", e.g. there are non-trivial strides or
+ // inflations in the input.
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE bool nonStandardPatches() const {
+ return m_in_row_strides != 1 || m_in_col_strides != 1 || m_patch_row_inflate_strides != 1 || m_patch_col_inflate_strides != 1;
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE SubMapper getSubMapper(Index i, Index j) const {
+ return SubMapper(*this, m_depth_offset + i, m_col_offset + j);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE LinearMapper getLinearMapper(Index i, Index j) const {
+ return LinearMapper(*this, m_depth_offset + i, m_col_offset + j);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Scalar operator()(Index row) const {
+ return loadCoeff(row + m_depth_offset, m_rowIndex, m_colIndex, m_otherIndex);
+ }
+
+ // Load the coefficient at the patchIndex location instead of the usual m_rowIndex,
+  // m_colIndex, m_otherIndex. This is currently only used by the gpu code.
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar operator()(Index row, Index patchIndex) const {
+ checkZeroOffsets();
+ Index rowIndex, colIndex, otherIndex;
+ computeBaseIndices(patchIndex, rowIndex, colIndex, otherIndex);
+ return loadCoeff(row, rowIndex, colIndex, otherIndex);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Packet loadPacket(Index row) const {
+ return loadPacket(row + m_depth_offset, m_rowIndex, m_colIndex, m_otherIndex);
+ }
+
+ // Load the packet at the patchIndex location instead of the usual m_rowIndex,
+ // m_colIndex, m_otherIndex. This is currently only used by the gpu code.
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Packet loadPacket(Index row, Index patchIndex) const {
+ checkZeroOffsets();
+ Index rowIndex, colIndex, otherIndex;
+ computeBaseIndices(patchIndex, rowIndex, colIndex, otherIndex);
+ return loadPacket(row, rowIndex, colIndex, otherIndex);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE const TensorEvaluator<ArgType, Device>& impl() const { return m_impl; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index patchDepth() const { return m_patch_depth; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index patchRows() const { return m_patch_rows; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index patchCols() const { return m_patch_cols; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE bool padRow(const Index row) const {
+ const Index r = m_rowIndex + row;
+ return r < 0 | r >= m_inputRows;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE bool padCol(const Index col) const {
+ const Index c = m_colIndex + col;
+ return c < 0 | c >= m_inputCols;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index baseIndex(const Index row, const Index col) const {
+ const Index r = m_rowIndex + row;
+ const Index c = m_colIndex + col;
+ return r * m_rowInputStride + c * m_colInputStride + m_otherIndex;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Packet packetNoPadding(const Index depth, const Index baseIndex) const {
+ const Index inputIndex = depth + baseIndex;
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index rowOffset() const {
+ const Index patchOffset = m_depth_offset / m_fastDimZero;
+ const Index colOffset = patchOffset / m_fastColStride;
+ return patchOffset-colOffset*m_colStride;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index colOffset() const {
+ const Index patchOffset = m_depth_offset / m_fastDimZero;
+ const Index colOffset = patchOffset / m_fastColStride;
+ return colOffset;
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Index depthOffset() const {
+ const Index patchOffset = m_depth_offset % m_patch_depth;
+ return patchOffset;
+ }
+
+ private:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar loadCoeff(Index patchId, Index rowIndex, Index colIndex, Index otherIndex) const {
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchOffset = patchId / m_fastDimZero;
+
+ const Index colOffset = patchOffset / m_fastColStride;
+ const Index inputCol = colIndex + colOffset * m_in_col_strides;
+ const Index origInputCol = (m_patch_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0);
+ const Index rowOffset = patchOffset - colOffset * m_colStride;
+ const Index inputRow = rowIndex + rowOffset * m_in_row_strides;
+ const Index origInputRow = (m_patch_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0);
+ if (origInputCol < 0 | origInputRow < 0 | origInputCol >= m_inputCols | origInputRow >= m_inputRows |
+ (inputCol != origInputCol * m_patch_col_inflate_strides) | (inputRow != origInputRow * m_patch_row_inflate_strides)) {
+ return Scalar(0);
+ }
+ const Index depth = patchId - patchOffset * m_patch_depth;
+ const Index inputIndex = depth + origInputRow * m_rowInputStride + origInputCol * m_colInputStride + otherIndex;
+ return m_impl.coeff(inputIndex);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_ALWAYS_INLINE Packet loadPacket(Index patchId, Index rowIndex, Index colIndex, Index otherIndex) const {
+ const Index packetSize = internal::unpacket_traits<Packet>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(patchId < m_patch_depth*m_patch_rows*m_patch_cols);
+
+ if (nonStandardPatches()) {
+ return packetWithPossibleZero(patchId, rowIndex, colIndex, otherIndex);
+ }
+
+ if ((m_patch_depth % packetSize) == 0) {
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchOffset = patchId / m_fastDimZero;
+ eigen_assert((patchId + packetSize - 1) / m_fastDimZero == patchOffset);
+
+ const Index colOffset = patchOffset / m_fastColStride;
+ const Index inputCol = colIndex + colOffset;
+ const Index rowOffset = patchOffset - colOffset*m_colStride;
+ const Index inputRow = rowIndex + rowOffset;
+ if (inputCol < 0 | inputRow < 0 | inputCol >= m_inputCols | inputRow >= m_inputRows) {
+ // all zeros
+ return internal::pset1<Packet>(Scalar(0));
+ }
+ // no padding
+ const Index depth = patchId - patchOffset * m_patch_depth;
+ const Index inputIndex = depth + inputRow * m_rowInputStride + inputCol * m_colInputStride + otherIndex;
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+ else {
+ const Index patchOffsets[2] = {patchId / m_fastDimZero, (patchId + packetSize - 1) / m_fastDimZero};
+
+ const Index colOffsets[2] = {patchOffsets[0] / m_fastColStride, patchOffsets[1] / m_fastColStride};
+
+ const Index inputCols[2] = {colIndex + colOffsets[0], colIndex + colOffsets[1]};
+ if (inputCols[0] >= m_inputCols | inputCols[1] < 0) {
+ // all zeros
+ return internal::pset1<Packet>(Scalar(0));
+ }
+
+ if (inputCols[0] == inputCols[1]) {
+ const Index rowOffsets[2] = {patchOffsets[0] - colOffsets[0]*m_colStride, patchOffsets[1] - colOffsets[1]*m_colStride};
+ eigen_assert(rowOffsets[0] <= rowOffsets[1]);
+ const Index inputRows[2] = {rowIndex + rowOffsets[0], rowIndex + rowOffsets[1]};
+
+ if (inputRows[0] >= m_inputRows | inputRows[1] < 0) {
+ // all zeros
+ return internal::pset1<Packet>(Scalar(0));
+ }
+
+ if (inputRows[0] >= 0 & inputRows[1] < m_inputRows) {
+ // no padding
+ const Index depth = patchId - patchOffsets[0] * m_patch_depth;
+ const Index inputIndex = depth + inputRows[0] * m_rowInputStride + inputCols[0] * m_colInputStride + otherIndex;
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+ }
+ }
+ return packetWithPossibleZero(patchId, rowIndex, colIndex, otherIndex);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet packetWithPossibleZero(Index patchId, Index rowIndex, Index colIndex, Index otherIndex) const
+ {
+ const int packetSize = internal::unpacket_traits<Packet>::size;
+ EIGEN_ALIGN_MAX typename internal::remove_const<Scalar>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = loadCoeff(patchId+i, rowIndex, colIndex, otherIndex);
+ }
+ Packet rslt = internal::pload<Packet>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void computeBaseIndices(Index patchIndex, Index& rowIndex, Index& colIndex, Index& otherIndex) const {
+ const int NumInputDims = array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ otherIndex = (NumInputDims == 3) ? 0 : patchIndex / m_fastNumPatches;
+ const Index patch2DIndex = (NumInputDims == 3) ? patchIndex : (patchIndex - otherIndex * m_num_patches);
+ otherIndex *= m_patchInputStride;
+ colIndex = patch2DIndex / m_fastOutputRows;
+ rowIndex = patch2DIndex - colIndex * m_outputRows;
+ colIndex = colIndex * m_col_strides - m_colPaddingLeft;
+ rowIndex = rowIndex * m_row_strides - m_rowPaddingTop;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void checkZeroOffsets() const {
+ eigen_assert(m_col_offset == 0);
+ eigen_assert(m_depth_offset == 0);
+ eigen_assert(m_rowIndex == 0);
+ eigen_assert(m_colIndex == 0);
+ eigen_assert(m_otherIndex == 0);
+ }
+
+ Index m_depth_offset; // First row in the input matrix
+ Index m_col_offset; // First col in the input matrix
+
+ Index m_patch_depth; // patch depth, which is equal to the input depth
+ Index m_patch_rows; // number of rows in the patch
+  Index m_patch_cols;                // number of columns in the patch
+ Index m_num_patches; // number of patches to extract.
+ Index m_patch_row_inflate_strides; // the strides for row inflation in the image patch
+ Index m_patch_col_inflate_strides; // the strides for col inflation in the image patch
+ // Fast representation of inflation strides.
+ internal::TensorIntDivisor<Index> m_fastInputRowStride;
+ internal::TensorIntDivisor<Index> m_fastInputColStride;
+
+ Index m_otherStride;
+ Index m_colStride;
+ internal::TensorIntDivisor<Index> m_fastNumPatches;
+ internal::TensorIntDivisor<Index> m_fastColStride;
+
+ Index m_rowInputStride; // row stride in the input tensor
+ Index m_colInputStride; // col stride in the input tensor
+ Index m_patchInputStride; // patch stride in the input tensor
+
+ Index m_inputRows; // Number of rows in the input tensor
+ Index m_inputCols; // Number of cols in the input tensor
+
+ Index m_outputRows; // Number of patch rows
+
+ Index m_row_strides; // User specified row stride
+ Index m_col_strides; // User specified col stride
+
+ Index m_in_row_strides; // User specified input row stride
+ Index m_in_col_strides; // User specified input col stride
+
+ Index m_rowPaddingTop; // Row padding
+ Index m_colPaddingLeft; // Column padding
+
+ internal::TensorIntDivisor<Index> m_fastOutputRows;
+ internal::TensorIntDivisor<Index> m_fastDimZero;
+
+ Index m_rowIndex; // precomputed row index corresponding to the col offset
+ Index m_colIndex; // precomputed col index corresponding to the col offset
+ Index m_otherIndex; // precomputed other index corresponding to the col offset
+
+ const TensorEvaluator<ArgType, Device> m_impl;
+};
+
+
+template <typename NewDimension, DenseIndex Rows, DenseIndex Cols, typename ArgType, typename Device,
+ typename Scalar, typename Index,
+ typename nocontract_t, typename contract_t,
+ int Side, size_t packet_size,
+ bool inner_dim_contiguous, bool inner_dim_reordered, int Alignment, int nr>
+struct gemm_pack_rhs<Scalar, Index, TensorContractionInputMapper<Scalar, Index, Side, TensorEvaluator<const TensorReshapingOp<NewDimension, const TensorImagePatchOp<Rows, Cols, ArgType> >, Device>, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>, nr, ColMajor, false, false> {
+
+ typedef TensorContractionInputMapper<Scalar, Index, Side, TensorEvaluator<const TensorReshapingOp<NewDimension, const TensorImagePatchOp<Rows, Cols, ArgType> >, Device>, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> DataMapper;
+
+ static inline Index ceil_div(Index a, Index b) {
+ return (a + b - 1) / b;
+ }
+
+ EIGEN_DONT_INLINE void operator()(Scalar* block, const DataMapper& rhs, Index depth, Index cols, Index stride=0, Index offset=0) const {
+ eigen_assert(stride == 0);
+ eigen_assert(offset == 0);
+
+ EIGEN_STATIC_ASSERT((nr == 4), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ typedef typename DataMapper::LinearMapper LinearMapper;
+ typedef typename packet_traits<Scalar>::type Packet;
+
+ const Index packet_cols4 = (cols/4) * 4;
+ const Index peeled_k = (depth/packet_size) * packet_size;
+
+ for(Index j2=0; j2<packet_cols4; j2+=4)
+ {
+ const LinearMapper dm0 = rhs.getLinearMapper(0, j2 + 0);
+ const LinearMapper dm1 = rhs.getLinearMapper(0, j2 + 1);
+ const LinearMapper dm2 = rhs.getLinearMapper(0, j2 + 2);
+ const LinearMapper dm3 = rhs.getLinearMapper(0, j2 + 3);
+
+ Index k=0;
+ if((packet_size%4)==0 && !rhs.nonStandardPatches())
+ {
+ const Index patch_depth = rhs.patchDepth();
+ if ((patch_depth % packet_size) == 0) {
+ const Index patch_cols = rhs.patchCols();
+ const Index patch_rows = rhs.patchRows();
+
+ const Index startCol = rhs.colOffset();
+ const Index max_cols = std::min<Index>(ceil_div(peeled_k, patch_rows*patch_depth)+startCol, patch_cols);
+
+ for (Index c = startCol; c < max_cols; ++c) {
+ eigen_assert(k < peeled_k);
+ const Index startRow = (c == startCol) ? rhs.rowOffset() : 0;
+ const Index max_rows = std::min<Index>(ceil_div(peeled_k-c*patch_rows*patch_depth, patch_depth)+startRow, patch_rows);
+
+ const bool pad_col0 = dm0.padCol(c);
+ const bool pad_col1 = dm1.padCol(c);
+ const bool pad_col2 = dm2.padCol(c);
+ const bool pad_col3 = dm3.padCol(c);
+ for (Index r = startRow; r < max_rows; ++r) {
+ eigen_assert(k < peeled_k);
+ const bool pad0 = pad_col0 || dm0.padRow(r);
+ const bool pad1 = pad_col1 || dm1.padRow(r);
+ const bool pad2 = pad_col2 || dm2.padRow(r);
+ const bool pad3 = pad_col3 || dm3.padRow(r);
+
+ const Index idx0 = dm0.baseIndex(r, c);
+ const Index idx1 = dm1.baseIndex(r, c);
+ const Index idx2 = dm2.baseIndex(r, c);
+ const Index idx3 = dm3.baseIndex(r, c);
+
+ const Index startDepth = ((c == startCol) && (r == startRow)) ? rhs.depthOffset() : 0;
+ const Index max_depth = std::min<Index>(peeled_k-c*patch_rows*patch_depth-r*patch_depth+startDepth, patch_depth);
+ eigen_assert(max_depth % packet_size == 0);
+ for (Index d = startDepth; d < max_depth; d += packet_size) {
+ eigen_assert(k < peeled_k);
+ PacketBlock<Packet, 4> kernel;
+ kernel.packet[0] = pad0 ? pset1<Packet>(0) : dm0.packetNoPadding(d, idx0);
+ kernel.packet[1] = pad1 ? pset1<Packet>(0) : dm1.packetNoPadding(d, idx1);
+ kernel.packet[2] = pad2 ? pset1<Packet>(0) : dm2.packetNoPadding(d, idx2);
+ kernel.packet[3] = pad3 ? pset1<Packet>(0) : dm3.packetNoPadding(d, idx3);
+ ptranspose(kernel);
+ pstoreu(block+0*packet_size, kernel.packet[0]);
+ pstoreu(block+1*packet_size, kernel.packet[1]);
+ pstoreu(block+2*packet_size, kernel.packet[2]);
+ pstoreu(block+3*packet_size, kernel.packet[3]);
+ block+=4*packet_size;
+ k += packet_size;
+ }
+ }
+ }
+ }
+
+ for(; k<peeled_k; k+=packet_size) {
+ PacketBlock<Packet, 4> kernel;
+ kernel.packet[0] = dm0.loadPacket(k);
+ kernel.packet[1] = dm1.loadPacket(k);
+ kernel.packet[2] = dm2.loadPacket(k);
+ kernel.packet[3] = dm3.loadPacket(k);
+ ptranspose(kernel);
+ pstoreu(block+0*packet_size, kernel.packet[0]);
+ pstoreu(block+1*packet_size, kernel.packet[1]);
+ pstoreu(block+2*packet_size, kernel.packet[2]);
+ pstoreu(block+3*packet_size, kernel.packet[3]);
+ block+=4*packet_size;
+ }
+ }
+ for(; k<depth; k++)
+ {
+ block[0] = dm0(k);
+ block[1] = dm1(k);
+ block[2] = dm2(k);
+ block[3] = dm3(k);
+ block += 4;
+ }
+ }
+
+ // copy the remaining columns one at a time (nr==1)
+ for(Index j2=packet_cols4; j2<cols; ++j2)
+ {
+ const LinearMapper dm0 = rhs.getLinearMapper(0, j2);
+ for(Index k=0; k<depth; k++)
+ {
+ *block = dm0(k);
+ block += 1;
+ }
+ }
+ }
+};
+
+#endif // EIGEN_VECTORIZE
+} // end namespace internal
+
+
+/** SpatialConvolution
+ * \ingroup CXX11_NeuralNetworks_Module
+ *
+ * \brief Applies a 2D convolution over a multichannel input image.
+ *
+ * The input parameter is expected to be a tensor with a rank of 3 or more (channels, height, width, and optionally others)
+ * The kernel parameter is expected to be a 4D tensor (filters, channels, kernel_height, kernel_width)
+ * The input and the kernel must both be in col-major layout. The result will also be in col-major layout.
+ *
+ * If in_stride > 1, then applies convolution with holes (aka atrous convolution), sampling every in_stride input pixels.
+ *
+ * The result can be assigned to a tensor of rank equal to the rank of the input. The dimensions of the result will be filters, height, width (and others if applicable).
+ *
+ * It is possible to swap the order of the width and height dimensions provided that the same order is used in the input, the kernel, and the output.
+ *
+ */
+template <typename Input, typename Kernel>
+EIGEN_ALWAYS_INLINE
+static const typename internal::conditional<
+ internal::traits<Input>::Layout == ColMajor,
+ TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorContractionOp<const array<IndexPair<typename internal::traits<Input>::Index>, 1>, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 2>, const Kernel>, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 2>, const TensorImagePatchOp<Dynamic, Dynamic, const Input> > > >,
+ TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, internal::traits<Input>::NumDimensions>, const TensorContractionOp<const array<IndexPair<typename internal::traits<Input>::Index>, 1>, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 2>, const TensorImagePatchOp<Dynamic, Dynamic, const Input> >, const TensorReshapingOp<const DSizes<typename internal::traits<Input>::Index, 2>, const Kernel> > > >::type
+SpatialConvolution(const Input& input, const Kernel& kernel, const DenseIndex stride = 1, const PaddingType padding_type = PADDING_SAME, const DenseIndex in_stride = 1) {
+
+ typedef typename internal::traits<Input>::Index TensorIndex;
+ TensorRef<Tensor<typename internal::traits<Input>::Scalar, internal::traits<Input>::NumDimensions, internal::traits<Input>::Layout, TensorIndex> > in(input);
+ TensorRef<Tensor<typename internal::traits<Kernel>::Scalar, internal::traits<Kernel>::NumDimensions, internal::traits<Kernel>::Layout, TensorIndex> > kern(kernel);
+
+ EIGEN_STATIC_ASSERT(internal::traits<Input>::Layout == internal::traits<Kernel>::Layout, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ static const bool isColMajor = (internal::traits<Input>::Layout == ColMajor);
+
+ static const int NumDims = internal::traits<Input>::NumDimensions;
+
+ // Number of filters to apply. This is the same as the output depth of the result
+ const TensorIndex kernelFilters = isColMajor ? kern.dimensions()[0] : kern.dimensions()[3];
+ // Number of channels. This is the same as the input depth.
+ const TensorIndex kernelChannels = isColMajor ? kern.dimensions()[1] : kern.dimensions()[2];
+ const TensorIndex kernelRows = isColMajor ? kern.dimensions()[2] : kern.dimensions()[1];
+ const TensorIndex kernelCols = isColMajor ? kern.dimensions()[3] : kern.dimensions()[0];
+
+ const DenseIndex kernelRowsEff = kernelRows + (kernelRows - 1) * (in_stride - 1);
+ const DenseIndex kernelColsEff = kernelCols + (kernelCols - 1) * (in_stride - 1);
+
+ array<IndexPair<TensorIndex>, 1> contract_dims;
+ contract_dims[0] = IndexPair<TensorIndex>(1, 0);
+
+ const TensorIndex InputRows = isColMajor ? in.dimension(1) : in.dimension(NumDims - 2);
+ const TensorIndex InputCols = isColMajor ? in.dimension(2) : in.dimension(NumDims - 3);
+
+ TensorIndex out_height;
+ TensorIndex out_width;
+ switch (padding_type) {
+ case PADDING_VALID:
+ out_height = numext::ceil((InputRows - kernelRowsEff + 1.f) / static_cast<float>(stride));
+ out_width = numext::ceil((InputCols - kernelColsEff + 1.f) / static_cast<float>(stride));
+ break;
+ case PADDING_SAME:
+ out_height = numext::ceil(InputRows / static_cast<float>(stride));
+ out_width = numext::ceil(InputCols / static_cast<float>(stride));
+ break;
+ default:
+ eigen_assert(false && "unexpected padding");
+ }
+
+ // Molds the output of the patch extraction code into a 2d tensor:
+ // - the first dimension (dims[0]): the patch values to be multiplied with the kernels
+ // - the second dimension (dims[1]): everything else
+ DSizes<TensorIndex, 2> pre_contract_dims;
+ if (isColMajor) {
+ pre_contract_dims[0] = kernelChannels * kernelRows * kernelCols;
+ pre_contract_dims[1] = out_height * out_width;
+ for (int i = 3; i < NumDims; ++i) {
+ pre_contract_dims[1] *= in.dimension(i);
+ }
+ } else {
+ pre_contract_dims[1] = kernelChannels * kernelRows * kernelCols;
+ pre_contract_dims[0] = out_height * out_width;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ pre_contract_dims[0] *= in.dimension(i);
+ }
+ }
+
+  // Molds the output of the contraction into the shape expected by the user
+ // (assuming this is ColMajor):
+ // - 1st dim: kernel filters
+ // - 2nd dim: output height
+ // - 3rd dim: output width
+ // - 4th dim and beyond: everything else including batch size
+ DSizes<TensorIndex, NumDims> post_contract_dims;
+ if (isColMajor) {
+ post_contract_dims[0] = kernelFilters;
+ post_contract_dims[1] = out_height;
+ post_contract_dims[2] = out_width;
+ for (int i = 3; i < NumDims; ++i) {
+ post_contract_dims[i] = in.dimension(i);
+ }
+ } else {
+ post_contract_dims[NumDims - 1] = kernelFilters;
+ post_contract_dims[NumDims - 2] = out_height;
+ post_contract_dims[NumDims - 3] = out_width;
+ for (int i = 0; i < NumDims - 3; ++i) {
+ post_contract_dims[i] = in.dimension(i);
+ }
+ }
+
+ DSizes<TensorIndex, 2> kernel_dims;
+ if (isColMajor) {
+ kernel_dims[0] = kernelFilters;
+ kernel_dims[1] = kernelChannels * kernelRows * kernelCols;
+ } else {
+ kernel_dims[0] = kernelChannels * kernelRows * kernelCols;
+ kernel_dims[1] = kernelFilters;
+ }
+ // TODO(yangke): choose() is defined in TensorContraction.h -- consider
+ // moving it to somewhere more "common".
+ return choose(Cond<internal::traits<Input>::Layout == ColMajor>(),
+ kernel.reshape(kernel_dims).contract(input.extract_image_patches(kernelRows, kernelCols, stride, stride, in_stride, in_stride, padding_type).reshape(pre_contract_dims), contract_dims).reshape(post_contract_dims),
+ input.extract_image_patches(kernelRows, kernelCols, stride, stride, in_stride, in_stride, padding_type).reshape(pre_contract_dims).contract(kernel.reshape(kernel_dims), contract_dims).reshape(post_contract_dims));
+}
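+
+// Illustrative usage sketch (not part of the original header; sizes are
+// assumptions). In col-major the input is (channels, height, width, batch)
+// and the kernel is (filters, channels, kernel_height, kernel_width):
+//
+//   Eigen::Tensor<float, 4> input(3, 32, 32, 16);
+//   Eigen::Tensor<float, 4> kernel(64, 3, 5, 5);
+//   input.setRandom();
+//   kernel.setRandom();
+//   // stride 1, SAME padding -> (64, 32, 32, 16)
+//   Eigen::Tensor<float, 4> output =
+//       Eigen::SpatialConvolution(input, kernel, 1, Eigen::PADDING_SAME);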
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_NEURAL_NETWORKS_SPATIAL_CONVOLUTIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/TensorConvolutionByFFT.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/TensorConvolutionByFFT.h
new file mode 100644
index 0000000000..0e72173536
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/NeuralNetworks/TensorConvolutionByFFT.h
@@ -0,0 +1,289 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+// Copyright (C) 2015 Jianwei Cui <thucjw@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTIONBYFFT_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTIONBYFFT_H
+
+namespace Eigen {
+
+/** \class TensorConvolutionByFFT
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor convolution class.
+ *
+ *
+ */
+namespace internal {
+
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct traits<TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef typename promote_storage_type<typename InputXprType::Scalar,
+ typename KernelXprType::Scalar>::ret Scalar;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename promote_storage_type<typename traits<InputXprType>::StorageKind,
+ typename traits<KernelXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<InputXprType>::Index,
+ typename traits<KernelXprType>::Index>::type Index;
+ typedef typename InputXprType::Nested LhsNested;
+ typedef typename KernelXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const int NumDimensions = traits<InputXprType>::NumDimensions;
+ static const int Layout = traits<InputXprType>::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct eval<TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType>, Eigen::Dense>
+{
+ typedef const TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType>& type;
+};
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct nested<TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType>, 1, typename eval<TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType> >::type>
+{
+ typedef TensorConvolutionByFFTOp<Dimensions, InputXprType, KernelXprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Indices, typename InputXprType, typename KernelXprType>
+class TensorConvolutionByFFTOp : public TensorBase<TensorConvolutionByFFTOp<Indices, InputXprType, KernelXprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorConvolutionByFFTOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorConvolutionByFFTOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::promote_storage_type<typename InputXprType::CoeffReturnType,
+ typename KernelXprType::CoeffReturnType>::ret CoeffReturnType;
+ typedef typename internal::promote_storage_type<typename InputXprType::PacketReturnType,
+ typename KernelXprType::PacketReturnType>::ret PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorConvolutionByFFTOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorConvolutionByFFTOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorConvolutionByFFTOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConvolutionByFFTOp(const InputXprType& input, const KernelXprType& kernel, const Indices& dims)
+ : m_input_xpr(input), m_kernel_xpr(kernel), m_indices(dims) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Indices& indices() const { return m_indices; }
+
+ /** \returns the nested expressions */
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const typename internal::remove_all<typename InputXprType::Nested>::type&
+ inputExpression() const { return m_input_xpr; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const typename internal::remove_all<typename KernelXprType::Nested>::type&
+ kernelExpression() const { return m_kernel_xpr; }
+
+ protected:
+ typename InputXprType::Nested m_input_xpr;
+ typename KernelXprType::Nested m_kernel_xpr;
+ const Indices m_indices;
+};
+
+
+template<typename Indices, typename InputArgType, typename KernelArgType, typename Device>
+struct TensorEvaluator<const TensorConvolutionByFFTOp<Indices, InputArgType, KernelArgType>, Device>
+{
+ typedef TensorConvolutionByFFTOp<Indices, InputArgType, KernelArgType> XprType;
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+
+ static const int NumDims = internal::array_size<typename TensorEvaluator<InputArgType, Device>::Dimensions>::value;
+ static const int NumKernelDims = internal::array_size<Indices>::value;
+ typedef typename XprType::Index Index;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<InputArgType, Device>::IsAligned &
+ TensorEvaluator<KernelArgType, Device>::IsAligned,
+ PacketAccess = false,
+ BlockAccess = false,
+ Layout = TensorEvaluator<InputArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_inputImpl(op.inputExpression(), device), m_kernelImpl(op.kernelExpression(), device), m_kernelArg(op.kernelExpression()), m_kernel(NULL), m_local_kernel(false), m_device(device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<InputArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<KernelArgType, Device>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const typename TensorEvaluator<InputArgType, Device>::Dimensions& input_dims = m_inputImpl.dimensions();
+ const typename TensorEvaluator<KernelArgType, Device>::Dimensions& kernel_dims = m_kernelImpl.dimensions();
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStride[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_inputStride[i] = m_inputStride[i - 1] * input_dims[i - 1];
+ }
+ } else {
+ m_inputStride[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_inputStride[i] = m_inputStride[i + 1] * input_dims[i + 1];
+ }
+ }
+
+ m_dimensions = m_inputImpl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumKernelDims; ++i) {
+ const Index index = op.indices()[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ m_dimensions[index] = result_dim;
+ if (i > 0) {
+ m_kernelStride[i] = m_kernelStride[i - 1] * kernel_dims[i - 1];
+ } else {
+ m_kernelStride[0] = 1;
+ }
+ m_indexStride[i] = m_inputStride[index];
+ }
+
+ m_outputStride[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStride[i] = m_outputStride[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ for (int i = NumKernelDims - 1; i >= 0; --i) {
+ const Index index = op.indices()[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ m_dimensions[index] = result_dim;
+ if (i < NumKernelDims - 1) {
+ m_kernelStride[i] = m_kernelStride[i + 1] * kernel_dims[i + 1];
+ } else {
+ m_kernelStride[NumKernelDims - 1] = 1;
+ }
+ m_indexStride[i] = m_inputStride[index];
+ }
+
+ m_outputStride[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_outputStride[i] = m_outputStride[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) {
+ m_inputImpl.evalSubExprsIfNeeded(NULL);
+ m_kernelImpl.evalSubExprsIfNeeded(NULL);
+
+ typedef typename internal::traits<InputArgType>::Index TensorIndex;
+
+ Tensor<Scalar, NumDims, Layout, TensorIndex> input(m_inputImpl.dimensions());
+ for (int i = 0; i < m_inputImpl.dimensions().TotalSize(); ++i) {
+ input.data()[i] = m_inputImpl.coeff(i);
+ }
+
+ Tensor<Scalar, NumDims, Layout, TensorIndex> kernel(m_kernelImpl.dimensions());
+ for (int i = 0; i < m_kernelImpl.dimensions().TotalSize(); ++i) {
+ kernel.data()[i] = m_kernelImpl.coeff(i);
+ }
+
+ array<std::pair<ptrdiff_t, ptrdiff_t>, NumDims> paddings;
+ for (int i = 0; i < NumDims; ++i) {
+ paddings[i] = std::make_pair(0, m_inputImpl.dimensions()[i] - m_kernelImpl.dimensions()[i]);
+ }
+
+ Eigen::array<bool, NumKernelDims> reverse;
+ for (int i = 0; i < NumKernelDims; ++i) {
+ reverse[i] = true;
+ }
+
+    // Dimension indices along which the forward and inverse FFTs are taken
+    // (all dimensions of the input).
+    Eigen::array<ptrdiff_t, NumDims> fft;
+    for (int i = 0; i < NumDims; ++i) {
+      fft[i] = i;
+    }
+
+ Eigen::DSizes<TensorIndex, NumDims> slice_offsets;
+ for (int i = 0; i < NumDims; ++i) {
+ slice_offsets[i] = m_kernelImpl.dimensions()[i] - 1;
+ }
+
+ Eigen::DSizes<TensorIndex, NumDims> slice_extents;
+ for (int i = 0; i < NumDims; ++i) {
+ slice_extents[i] = m_inputImpl.dimensions()[i] - m_kernelImpl.dimensions()[i] + 1;
+ }
+
+ Tensor<Scalar, NumDims, Layout, TensorIndex> kernel_variant = kernel.reverse(reverse).pad(paddings);
+ Tensor<std::complex<Scalar>, NumDims, Layout, TensorIndex> kernel_fft = kernel_variant.template fft<Eigen::BothParts, FFT_FORWARD>(fft);
+ //Tensor<std::complex<Scalar>, NumDims, Layout|IndexType> kernel_fft = kernel.reverse(reverse).pad(paddings).template fft<2>(fft);
+ Tensor<std::complex<Scalar>, NumDims, Layout, TensorIndex> input_fft = input.template fft<Eigen::BothParts, FFT_FORWARD>(fft);
+ Tensor<std::complex<Scalar>, NumDims, Layout, TensorIndex> prod = (input_fft * kernel_fft).template fft<Eigen::BothParts, FFT_REVERSE>(fft);
+ Tensor<std::complex<Scalar>, NumDims, Layout, TensorIndex> tensor_result = prod.slice(slice_offsets, slice_extents);
+
+ for (int i = 0; i < tensor_result.size(); ++i) {
+ data[i] = std::real(tensor_result.data()[i]);
+ }
+ return false;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_inputImpl.cleanup();
+ if (m_local_kernel) {
+ m_device.deallocate((void*)m_kernel);
+ m_local_kernel = false;
+ }
+ m_kernel = NULL;
+ }
+
+ void evalTo(typename XprType::Scalar* buffer) {
+ evalSubExprsIfNeeded(NULL);
+ for (int i = 0; i < dimensions().TotalSize(); ++i) {
+ buffer[i] += coeff(i);
+ }
+ cleanup();
+ }
+
+  EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+  {
+    // The FFT path writes the full result directly into the destination buffer
+    // in evalSubExprsIfNeeded(), so this accessor is a stub that always
+    // returns zero.
+    CoeffReturnType result = CoeffReturnType(0);
+    return result;
+  }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ private:
+ array<Index, NumDims> m_inputStride;
+ array<Index, NumDims> m_outputStride;
+
+ array<Index, NumKernelDims> m_indexStride;
+ array<Index, NumKernelDims> m_kernelStride;
+ TensorEvaluator<InputArgType, Device> m_inputImpl;
+ TensorEvaluator<KernelArgType, Device> m_kernelImpl;
+ Dimensions m_dimensions;
+
+ KernelArgType m_kernelArg;
+ const Scalar* m_kernel;
+ bool m_local_kernel;
+ const Device& m_device;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTIONBYFFT_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h
new file mode 100644
index 0000000000..9db0d2698f
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h
@@ -0,0 +1,461 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_H
+#define EIGEN_CXX11_TENSOR_TENSOR_H
+
+namespace Eigen {
+
+/** \class Tensor
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The tensor class.
+ *
+ * The %Tensor class is the work-horse for all \em dense tensors within Eigen.
+ *
+ * The %Tensor class encompasses only dynamic-size objects so far.
+ *
+ * The first two template parameters are required:
+ * \tparam Scalar_ \anchor tensor_tparam_scalar Numeric type, e.g. float, double, int or std::complex<float>.
+ * User defined scalar types are supported as well (see \ref user_defined_scalars "here").
+ * \tparam NumIndices_ Number of indices (i.e. rank of the tensor)
+ *
+ * The remaining template parameters are optional -- in most cases you don't have to worry about them.
+ * \tparam Options_ \anchor tensor_tparam_options A combination of either \b #RowMajor or \b #ColMajor, and of either
+ * \b #AutoAlign or \b #DontAlign.
+ * The former controls \ref TopicStorageOrders "storage order", and defaults to column-major. The latter controls alignment, which is required
+ * for vectorization. It defaults to aligning tensors. Note that tensors currently do not support any operations that profit from vectorization.
+ * Support for such operations (e.g. adding two tensors) is planned.
+ *
+ * You can access elements of tensors using normal subscripting:
+ *
+ * \code
+ * Eigen::Tensor<double, 4> t(10, 10, 10, 10);
+ * t(0, 1, 2, 3) = 42.0;
+ * \endcode
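+ *
+ * Storage order and alignment are fixed at compile time through the \c Options_
+ * template parameter; a small illustrative sketch (not part of the original
+ * documentation):
+ *
+ * \code
+ * // A rank-3, row-major tensor of floats.
+ * Eigen::Tensor<float, 3, Eigen::RowMajor> m(4, 5, 6);
+ * m(0, 1, 2) = 3.0f;
+ * \endcode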
+ *
+ * This class can be extended with the help of the plugin mechanism described on the page
+ * \ref TopicCustomizingEigen by defining the preprocessor symbol \c EIGEN_TENSOR_PLUGIN.
+ *
+ * <i><b>Some notes:</b></i>
+ *
+ * <dl>
+ * <dt><b>Relation to other parts of Eigen:</b></dt>
+ * <dd>The midterm development goal for this class is to have a hierarchy similar to the one Eigen uses for matrices, so that
+ * taking blocks or using tensors in expressions is easily possible, including an interface with the vector/matrix code
+ * by providing .asMatrix() and .asVector() (or similar) methods for rank 2 and 1 tensors. However, currently, the %Tensor
+ * class does not provide any of these features and is only available as a stand-alone class that just allows for
+ * coefficient access. Also, when fixed-size tensors are implemented, the number of template arguments is likely to
+ * change dramatically.</dd>
+ * </dl>
+ *
+ * \ref TopicStorageOrders
+ */
+
+template<typename Scalar_, std::size_t NumIndices_, int Options_, typename IndexType_>
+class Tensor : public TensorBase<Tensor<Scalar_, NumIndices_, Options_, IndexType_> >
+{
+ public:
+ typedef Tensor<Scalar_, NumIndices_, Options_, IndexType_> Self;
+ typedef TensorBase<Tensor<Scalar_, NumIndices_, Options_, IndexType_> > Base;
+ typedef typename Eigen::internal::nested<Self>::type Nested;
+ typedef typename internal::traits<Self>::StorageKind StorageKind;
+ typedef typename internal::traits<Self>::Index Index;
+ typedef Scalar_ Scalar;
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+ typedef typename Base::PacketReturnType PacketReturnType;
+
+ enum {
+ IsAligned = bool(EIGEN_ALIGN) & !(Options_ & DontAlign),
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = Options_ & RowMajor ? RowMajor : ColMajor,
+ CoordAccess = true,
+ };
+
+ static const int Options = Options_;
+ static const std::size_t NumIndices = NumIndices_;
+ typedef DSizes<Index, NumIndices_> Dimensions;
+
+ protected:
+ TensorStorage<Scalar, Dimensions, Options_> m_storage;
+
+ public:
+ // Metadata
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return NumIndices; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(std::size_t n) const { return m_storage.dimensions()[n]; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_storage.dimensions(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_storage.size(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); }
+
+ // This makes EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ // work, because that uses base().coeffRef() - and we don't yet
+ // implement a similar class hierarchy
+ inline Self& base() { return *this; }
+ inline const Self& base() const { return *this; }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ EIGEN_DEVICE_FUNC inline const Scalar& coeff(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeff(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(const array<Index, NumIndices>& indices) const
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff() const
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return m_storage.data()[0];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& coeffRef(Index firstIndex, Index secondIndex, IndexTypes... otherIndices)
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeffRef(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array<Index, NumIndices>& indices)
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef()
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return m_storage.data()[0];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline const Scalar& operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return this->operator()(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#else
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1) const
+ {
+ return coeff(array<Index, 2>(i0, i1));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2) const
+ {
+ return coeff(array<Index, 3>(i0, i1, i2));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3) const
+ {
+ return coeff(array<Index, 4>(i0, i1, i2, i3));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const
+ {
+ return coeff(array<Index, 5>(i0, i1, i2, i3, i4));
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(const array<Index, NumIndices>& indices) const
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeff(indices);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()() const
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeff();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator[](Index index) const
+ {
+ // The bracket operator is only for vectors, use the parenthesis operator instead.
+ EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeff(index);
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices)
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return operator()(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#else
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1)
+ {
+ return coeffRef(array<Index, 2>(i0, i1));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2)
+ {
+ return coeffRef(array<Index, 3>(i0, i1, i2));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3)
+ {
+ return coeffRef(array<Index, 4>(i0, i1, i2, i3));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4)
+ {
+ return coeffRef(array<Index, 5>(i0, i1, i2, i3, i4));
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(const array<Index, NumIndices>& indices)
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeffRef(indices);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()()
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeffRef();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index index)
+ {
+ eigen_assert(index >= 0 && index < size());
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator[](Index index)
+ {
+ // The bracket operator is only for vectors, use the parenthesis operator instead
+ EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor()
+ : m_storage()
+ {
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor(const Self& other)
+ : m_storage(other.m_storage)
+ {
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Tensor(Index firstDimension, IndexTypes... otherDimensions)
+ : m_storage(internal::array_prod(array<Index, NumIndices>{{firstDimension, otherDimensions...}}), array<Index, NumIndices>{{firstDimension, otherDimensions...}})
+ {
+ // The number of dimensions used to construct a tensor must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+#else
+ inline explicit Tensor(Index dim1)
+ : m_storage(dim1, array<Index, 1>(dim1))
+ {
+ EIGEN_STATIC_ASSERT(1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ inline explicit Tensor(Index dim1, Index dim2)
+ : m_storage(dim1*dim2, array<Index, 2>(dim1, dim2))
+ {
+ EIGEN_STATIC_ASSERT(2 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ inline explicit Tensor(Index dim1, Index dim2, Index dim3)
+ : m_storage(dim1*dim2*dim3, array<Index, 3>(dim1, dim2, dim3))
+ {
+ EIGEN_STATIC_ASSERT(3 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ inline explicit Tensor(Index dim1, Index dim2, Index dim3, Index dim4)
+ : m_storage(dim1*dim2*dim3*dim4, array<Index, 4>(dim1, dim2, dim3, dim4))
+ {
+ EIGEN_STATIC_ASSERT(4 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ inline explicit Tensor(Index dim1, Index dim2, Index dim3, Index dim4, Index dim5)
+    : m_storage(dim1*dim2*dim3*dim4*dim5, array<Index, 5>(dim1, dim2, dim3, dim4, dim5))
+ {
+ EIGEN_STATIC_ASSERT(5 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+#endif
+
+ inline explicit Tensor(const array<Index, NumIndices>& dimensions)
+ : m_storage(internal::array_prod(dimensions), dimensions)
+ {
+ EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor(const TensorBase<OtherDerived, ReadOnlyAccessors>& other)
+ {
+ typedef TensorAssignOp<Tensor, const OtherDerived> Assign;
+ Assign assign(*this, other.derived());
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ }
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor(const TensorBase<OtherDerived, WriteAccessors>& other)
+ {
+ typedef TensorAssignOp<Tensor, const OtherDerived> Assign;
+ Assign assign(*this, other.derived());
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor& operator=(const Tensor& other)
+ {
+ typedef TensorAssignOp<Tensor, const Tensor> Assign;
+ Assign assign(*this, other);
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Tensor& operator=(const Other& other)
+ {
+ typedef TensorAssignOp<Tensor, const Other> Assign;
+ Assign assign(*this, other);
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ void resize(Index firstDimension, IndexTypes... otherDimensions)
+ {
+ // The number of dimensions used to resize a tensor must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ resize(array<Index, NumIndices>{firstDimension, otherDimensions...});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ void resize()
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ // Nothing to do: rank 0 tensors have fixed size
+ }
+
+ EIGEN_DEVICE_FUNC
+ void resize(const array<Index, NumIndices>& dimensions)
+ {
+ Index size = Index(1);
+ for (size_t i = 0; i < NumIndices; i++) {
+ internal::check_rows_cols_for_overflow<Dynamic>::run(size, dimensions[i]);
+ size *= dimensions[i];
+ }
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ bool size_changed = size != this->size();
+ m_storage.resize(size, dimensions);
+ if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ #else
+ m_storage.resize(size, dimensions);
+ #endif
+ }
+
+ EIGEN_DEVICE_FUNC
+ void resize(const DSizes<Index, NumIndices>& dimensions) {
+ array<Index, NumIndices> dims;
+ for (int i = 0; i < NumIndices; ++i) {
+ dims[i] = dimensions[i];
+ }
+ resize(dims);
+ }
+
+#ifndef EIGEN_EMULATE_CXX11_META_H
+ template <typename std::size_t... Indices>
+ EIGEN_DEVICE_FUNC
+ void resize(const Sizes<Indices...>& dimensions) {
+ array<Index, NumIndices> dims;
+ for (int i = 0; i < NumIndices; ++i) {
+ dims[i] = dimensions[i];
+ }
+ resize(dims);
+ }
+#else
+ template <std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5>
+ EIGEN_DEVICE_FUNC
+ void resize(const Sizes<V1, V2, V3, V4, V5>& dimensions) {
+ array<Index, NumIndices> dims;
+ for (int i = 0; i < NumIndices; ++i) {
+ dims[i] = dimensions[i];
+ }
+ resize(dims);
+ }
+#endif
+
+ protected:
+
+ bool checkIndexRange(const array<Index, NumIndices>& indices) const
+ {
+ using internal::array_apply_and_reduce;
+ using internal::array_zip_and_reduce;
+ using internal::greater_equal_zero_op;
+ using internal::logical_and_op;
+ using internal::lesser_op;
+
+ return
+ // check whether the indices are all >= 0
+ array_apply_and_reduce<logical_and_op, greater_equal_zero_op>(indices) &&
+ // check whether the indices fit in the dimensions
+ array_zip_and_reduce<logical_and_op, lesser_op>(indices, m_storage.dimensions());
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index linearizedIndex(const array<Index, NumIndices>& indices) const
+ {
+ if (Options&RowMajor) {
+ return m_storage.dimensions().IndexOfRowMajor(indices);
+ } else {
+ return m_storage.dimensions().IndexOfColMajor(indices);
+ }
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h
new file mode 100644
index 0000000000..ee3bf7fe34
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorArgMax.h
@@ -0,0 +1,288 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Eugene Brevdo <ebrevdo@gmail.com>
+// Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H
+#define EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H
+
+namespace Eigen {
+namespace internal {
+
+/** \class TensorIndexTuple
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor + Index Tuple class: pairs each coefficient with its linear index, producing Tuple<Index, Scalar> values.
+ *
+ *
+ */
+template<typename XprType>
+struct traits<TensorIndexTupleOp<XprType> > : public traits<XprType>
+{
+ typedef traits<XprType> XprTraits;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef Tuple<Index, typename XprTraits::Scalar> Scalar;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename XprType>
+struct eval<TensorIndexTupleOp<XprType>, Eigen::Dense>
+{
+ typedef const TensorIndexTupleOp<XprType>& type;
+};
+
+template<typename XprType>
+struct nested<TensorIndexTupleOp<XprType>, 1,
+ typename eval<TensorIndexTupleOp<XprType> >::type>
+{
+ typedef TensorIndexTupleOp<XprType> type;
+};
+
+} // end namespace internal
+
+template<typename XprType>
+class TensorIndexTupleOp : public TensorBase<TensorIndexTupleOp<XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorIndexTupleOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename Eigen::internal::nested<TensorIndexTupleOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorIndexTupleOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorIndexTupleOp>::Index Index;
+ typedef Tuple<Index, typename XprType::CoeffReturnType> CoeffReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIndexTupleOp(const XprType& expr)
+ : m_xpr(expr) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+};
+
+// Eval as rvalue
+template<typename ArgType, typename Device>
+struct TensorEvaluator<const TensorIndexTupleOp<ArgType>, Device>
+{
+ typedef TensorIndexTupleOp<ArgType> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+ static const int NumDims = internal::array_size<Dimensions>::value;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = /*TensorEvaluator<ArgType, Device>::PacketAccess*/ false,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device) { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {
+ return m_impl.dimensions();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return CoeffReturnType(index, m_impl.coeff(index));
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+namespace internal {
+
+/** \class TensorTupleIndex
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Converts to Tensor<Tuple<Index, Scalar> > and reduces to Tensor<Index>.
+ *
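+ * A minimal sketch of the intended use (this assumes the companion argmax()
+ * helper on TensorBase, which is built on top of this reducer; names and
+ * sizes are illustrative):
+ * \code
+ * Eigen::Tensor<float, 2> t(3, 4);
+ * t.setRandom();
+ * // For each row, the index of the largest coefficient along dimension 1.
+ * Eigen::Tensor<Eigen::DenseIndex, 1> am = t.argmax(1);
+ * \endcode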
+ */
+template<typename ReduceOp, typename Dims, typename XprType>
+struct traits<TensorTupleReducerOp<ReduceOp, Dims, XprType> > : public traits<XprType>
+{
+ typedef traits<XprType> XprTraits;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef Index Scalar;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename ReduceOp, typename Dims, typename XprType>
+struct eval<TensorTupleReducerOp<ReduceOp, Dims, XprType>, Eigen::Dense>
+{
+ typedef const TensorTupleReducerOp<ReduceOp, Dims, XprType>& type;
+};
+
+template<typename ReduceOp, typename Dims, typename XprType>
+struct nested<TensorTupleReducerOp<ReduceOp, Dims, XprType>, 1,
+ typename eval<TensorTupleReducerOp<ReduceOp, Dims, XprType> >::type>
+{
+ typedef TensorTupleReducerOp<ReduceOp, Dims, XprType> type;
+};
+
+} // end namespace internal
+
+template<typename ReduceOp, typename Dims, typename XprType>
+class TensorTupleReducerOp : public TensorBase<TensorTupleReducerOp<ReduceOp, Dims, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorTupleReducerOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename Eigen::internal::nested<TensorTupleReducerOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorTupleReducerOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorTupleReducerOp>::Index Index;
+ typedef Index CoeffReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorTupleReducerOp(const XprType& expr,
+ const ReduceOp& reduce_op,
+ const int return_dim,
+ const Dims& reduce_dims)
+ : m_xpr(expr), m_reduce_op(reduce_op), m_return_dim(return_dim), m_reduce_dims(reduce_dims) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const ReduceOp& reduce_op() const { return m_reduce_op; }
+
+ EIGEN_DEVICE_FUNC
+ const Dims& reduce_dims() const { return m_reduce_dims; }
+
+ EIGEN_DEVICE_FUNC
+ int return_dim() const { return m_return_dim; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const ReduceOp m_reduce_op;
+ const int m_return_dim;
+ const Dims m_reduce_dims;
+};
+
+// Eval as rvalue
+template<typename ReduceOp, typename Dims, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorTupleReducerOp<ReduceOp, Dims, ArgType>, Device>
+{
+ typedef TensorTupleReducerOp<ReduceOp, Dims, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename TensorIndexTupleOp<ArgType>::CoeffReturnType TupleType;
+ typedef typename TensorEvaluator<const TensorReductionOp<ReduceOp, Dims, const TensorIndexTupleOp<ArgType> >, Device>::Dimensions Dimensions;
+ typedef typename TensorEvaluator<const TensorIndexTupleOp<ArgType> , Device>::Dimensions InputDimensions;
+ static const int NumDims = internal::array_size<InputDimensions>::value;
+ typedef array<Index, NumDims> StrideDims;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = /*TensorEvaluator<ArgType, Device>::PacketAccess*/ false,
+ BlockAccess = false,
+ Layout = TensorEvaluator<const TensorReductionOp<ReduceOp, Dims, const TensorIndexTupleOp<ArgType> >, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_orig_impl(op.expression(), device),
+ m_impl(op.expression().index_tuples().reduce(op.reduce_dims(), op.reduce_op()), device),
+ m_return_dim(op.return_dim()),
+ m_strides(gen_strides(m_orig_impl.dimensions())),
+ m_stride_mod(gen_stride_mod(m_orig_impl.dimensions())),
+ m_stride_div(gen_stride_div()) { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {
+ return m_impl.dimensions();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ const TupleType v = m_impl.coeff(index);
+ return (m_return_dim < 0) ? v.first : (v.first % m_stride_mod) / m_stride_div;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ private:
+ EIGEN_DEVICE_FUNC StrideDims gen_strides(const InputDimensions& dims) {
+ StrideDims strides;
+ if (m_return_dim < 0) return strides; // Won't be using these.
+ eigen_assert(m_return_dim < NumDims &&
+ "Asking to convert index to a dimension outside of the rank");
+
+ // Calculate m_stride_div and m_stride_mod, which are used to
+ // calculate the value of an index w.r.t. the m_return_dim.
+ if (Layout == static_cast<int>(ColMajor)) {
+ strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ strides[i] = strides[i-1] * dims[i-1];
+ }
+ } else {
+ strides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ strides[i] = strides[i+1] * dims[i+1];
+ }
+ }
+ return strides;
+ }
+
+ EIGEN_DEVICE_FUNC Index gen_stride_mod(const InputDimensions& dims) {
+ if (Layout == static_cast<int>(ColMajor)) {
+ return (m_return_dim < NumDims - 1) ? m_strides[m_return_dim + 1] : dims.TotalSize();
+ } else {
+ return (m_return_dim > 0) ? m_strides[m_return_dim - 1] : dims.TotalSize();
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Index gen_stride_div() {
+ return m_strides[m_return_dim];
+ }
+
+ protected:
+ TensorEvaluator<const TensorIndexTupleOp<ArgType>, Device> m_orig_impl;
+ TensorEvaluator<const TensorReductionOp<ReduceOp, Dims, const TensorIndexTupleOp<ArgType> >, Device> m_impl;
+ const int m_return_dim;
+ const StrideDims m_strides;
+ const Index m_stride_mod;
+ const Index m_stride_div;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_ARG_MAX_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h
new file mode 100644
index 0000000000..fdb943e713
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h
@@ -0,0 +1,179 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H
+#define EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H
+
+namespace Eigen {
+
+/** \class TensorAssign
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The tensor assignment class.
+ *
+ * This class represents the assignment of the values resulting from the evaluation of
+ * the rhs expression to the memory locations denoted by the lhs expression.
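+ *
+ * A rough sketch of how such an assignment is driven internally (this mirrors
+ * the pattern used by Tensor::operator=; illustrative only):
+ * \code
+ * typedef TensorAssignOp<LhsXprType, const RhsXprType> Assign;
+ * Assign assign(lhs, rhs);
+ * internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ * \endcode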
+ */
+namespace internal {
+template<typename LhsXprType, typename RhsXprType>
+struct traits<TensorAssignOp<LhsXprType, RhsXprType> >
+{
+ typedef typename LhsXprType::Scalar Scalar;
+ typedef typename traits<LhsXprType>::StorageKind StorageKind;
+ typedef typename promote_index_type<typename traits<LhsXprType>::Index,
+ typename traits<RhsXprType>::Index>::type Index;
+ typedef typename LhsXprType::Nested LhsNested;
+ typedef typename RhsXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const std::size_t NumDimensions = internal::traits<LhsXprType>::NumDimensions;
+ static const int Layout = internal::traits<LhsXprType>::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename LhsXprType, typename RhsXprType>
+struct eval<TensorAssignOp<LhsXprType, RhsXprType>, Eigen::Dense>
+{
+ typedef const TensorAssignOp<LhsXprType, RhsXprType>& type;
+};
+
+template<typename LhsXprType, typename RhsXprType>
+struct nested<TensorAssignOp<LhsXprType, RhsXprType>, 1, typename eval<TensorAssignOp<LhsXprType, RhsXprType> >::type>
+{
+ typedef TensorAssignOp<LhsXprType, RhsXprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename LhsXprType, typename RhsXprType>
+class TensorAssignOp : public TensorBase<TensorAssignOp<LhsXprType, RhsXprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorAssignOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename LhsXprType::CoeffReturnType CoeffReturnType;
+ typedef typename Eigen::internal::traits<TensorAssignOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorAssignOp>::Index Index;
+ static const std::size_t NumDims = Eigen::internal::traits<TensorAssignOp>::NumDimensions;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorAssignOp(LhsXprType& lhs, const RhsXprType& rhs)
+ : m_lhs_xpr(lhs), m_rhs_xpr(rhs) {}
+
+ /** \returns the nested expressions */
+ EIGEN_DEVICE_FUNC
+ typename internal::remove_all<typename LhsXprType::Nested>::type&
+ lhsExpression() const { return *((typename internal::remove_all<typename LhsXprType::Nested>::type*)&m_lhs_xpr); }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename RhsXprType::Nested>::type&
+ rhsExpression() const { return m_rhs_xpr; }
+
+ protected:
+ typename internal::remove_all<typename LhsXprType::Nested>::type& m_lhs_xpr;
+ const typename internal::remove_all<typename RhsXprType::Nested>::type& m_rhs_xpr;
+};
+
+
+template<typename LeftArgType, typename RightArgType, typename Device>
+struct TensorEvaluator<const TensorAssignOp<LeftArgType, RightArgType>, Device>
+{
+ typedef TensorAssignOp<LeftArgType, RightArgType> XprType;
+
+ enum {
+ IsAligned = TensorEvaluator<LeftArgType, Device>::IsAligned &
+ TensorEvaluator<RightArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<LeftArgType, Device>::PacketAccess &
+ TensorEvaluator<RightArgType, Device>::PacketAccess,
+ BlockAccess = TensorEvaluator<LeftArgType, Device>::BlockAccess &
+ TensorEvaluator<RightArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device) :
+ m_leftImpl(op.lhsExpression(), device),
+ m_rightImpl(op.rhsExpression(), device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<LeftArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<RightArgType, Device>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename TensorEvaluator<RightArgType, Device>::Dimensions Dimensions;
+ static const std::size_t NumDims = XprType::NumDims;
+
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, NumDims, Layout>
+ TensorBlock;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const
+ {
+ // TODO: use left impl instead if right impl dimensions are known at compile time.
+ return m_rightImpl.dimensions();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ eigen_assert(dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions()));
+ m_leftImpl.evalSubExprsIfNeeded(NULL);
+ // If the lhs provides raw access to its storage area (i.e. if m_leftImpl.data() returns a non
+ // null value), attempt to evaluate the rhs expression in place. Returns true iff in place
+ // evaluation isn't supported and the caller still needs to manually assign the values generated
+ // by the rhs to the lhs.
+ return m_rightImpl.evalSubExprsIfNeeded(m_leftImpl.data());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_leftImpl.cleanup();
+ m_rightImpl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalScalar(Index i) {
+ m_leftImpl.coeffRef(i) = m_rightImpl.coeff(i);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalPacket(Index i) {
+ const int LhsStoreMode = TensorEvaluator<LeftArgType, Device>::IsAligned ? Aligned : Unaligned;
+ const int RhsLoadMode = TensorEvaluator<RightArgType, Device>::IsAligned ? Aligned : Unaligned;
+ m_leftImpl.template writePacket<LhsStoreMode>(i, m_rightImpl.template packet<RhsLoadMode>(i));
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ m_leftImpl.getResourceRequirements(resources);
+ m_rightImpl.getResourceRequirements(resources);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalBlock(TensorBlock* block) {
+ m_rightImpl.block(block);
+ m_leftImpl.writeBlock(*block);
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const
+ {
+ return m_leftImpl.coeff(index);
+ }
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const
+ {
+ return m_leftImpl.template packet<LoadMode>(index);
+ }
+
+ private:
+ TensorEvaluator<LeftArgType, Device> m_leftImpl;
+ TensorEvaluator<RightArgType, Device> m_rightImpl;
+};
+
+} // end namespace Eigen
+
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_ASSIGN_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h
new file mode 100644
index 0000000000..35ebca151b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBase.h
@@ -0,0 +1,934 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_BASE_H
+#define EIGEN_CXX11_TENSOR_TENSOR_BASE_H
+
+// clang-format off
+
+namespace Eigen {
+
+/** \class TensorBase
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The tensor base class.
+ *
+ * This class is the common parent of the Tensor and TensorMap classes, thus
+ * making it possible to use either class interchangeably in expressions.
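+ *
+ * An illustrative sketch of mixing both classes in one expression (the raw
+ * buffer and sizes below are assumptions):
+ * \code
+ * float raw[6] = {0, 1, 2, 3, 4, 5};
+ * Eigen::TensorMap<Eigen::Tensor<float, 2> > m(raw, 2, 3);
+ * Eigen::Tensor<float, 2> t(2, 3);
+ * t.setZero();
+ * Eigen::Tensor<float, 2> sum = t + m;
+ * \endcode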
+ */
+
+template<typename Derived>
+class TensorBase<Derived, ReadOnlyAccessors>
+{
+ public:
+ typedef internal::traits<Derived> DerivedTraits;
+ typedef typename DerivedTraits::Scalar Scalar;
+ typedef typename DerivedTraits::Index Index;
+ typedef typename internal::remove_const<Scalar>::type CoeffReturnType;
+ typedef typename internal::packet_traits<CoeffReturnType>::type PacketReturnType;
+ static const int NumDimensions = DerivedTraits::NumDimensions;
+
+ // Generic nullary operation support.
+ template <typename CustomNullaryOp> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseNullaryOp<CustomNullaryOp, const Derived>
+ nullaryExpr(const CustomNullaryOp& func) const {
+ return TensorCwiseNullaryOp<CustomNullaryOp, const Derived>(derived(), func);
+ }
+
+ // Coefficient-wise nullary operators
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived>
+ constant(const Scalar& value) const {
+ return nullaryExpr(internal::scalar_constant_op<Scalar>(value));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseNullaryOp<internal::UniformRandomGenerator<Scalar>, const Derived>
+ random() const {
+ return nullaryExpr(internal::UniformRandomGenerator<Scalar>());
+ }
+ template <typename RandomGenerator> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseNullaryOp<RandomGenerator, const Derived>
+ random(const RandomGenerator& gen = RandomGenerator()) const {
+ return nullaryExpr(gen);
+ }
+
+ // Tensor generation
+ template <typename Generator> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorGeneratorOp<Generator, const Derived>
+ generate(const Generator& generator) const {
+ return TensorGeneratorOp<Generator, const Derived>(derived(), generator);
+ }
+
+ // Generic unary operation support.
+ template <typename CustomUnaryOp> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<CustomUnaryOp, const Derived>
+ unaryExpr(const CustomUnaryOp& func) const {
+ return TensorCwiseUnaryOp<CustomUnaryOp, const Derived>(derived(), func);
+ }
+
+ // Coefficient-wise unary operators
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_opposite_op<Scalar>, const Derived>
+ operator-() const {
+ return unaryExpr(internal::scalar_opposite_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_sqrt_op<Scalar>, const Derived>
+ sqrt() const {
+ return unaryExpr(internal::scalar_sqrt_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_rsqrt_op<Scalar>, const Derived>
+ rsqrt() const {
+ return unaryExpr(internal::scalar_rsqrt_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_square_op<Scalar>, const Derived>
+ square() const {
+ return unaryExpr(internal::scalar_square_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_cube_op<Scalar>, const Derived>
+ cube() const {
+ return unaryExpr(internal::scalar_cube_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_inverse_op<Scalar>, const Derived>
+ inverse() const {
+ return unaryExpr(internal::scalar_inverse_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_tanh_op<Scalar>, const Derived>
+ tanh() const {
+ return unaryExpr(internal::scalar_tanh_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_sigmoid_op<Scalar>, const Derived>
+ sigmoid() const {
+ return unaryExpr(internal::scalar_sigmoid_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_exp_op<Scalar>, const Derived>
+ exp() const {
+ return unaryExpr(internal::scalar_exp_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_log_op<Scalar>, const Derived>
+ log() const {
+ return unaryExpr(internal::scalar_log_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_abs_op<Scalar>, const Derived>
+ abs() const {
+ return unaryExpr(internal::scalar_abs_op<Scalar>());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_pow_op<Scalar>, const Derived>
+ pow(Scalar exponent) const {
+ return unaryExpr(internal::scalar_pow_op<Scalar>(exponent));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_add_op<Scalar>, const Derived>
+ operator+ (Scalar rhs) const {
+ return unaryExpr(internal::scalar_add_op<Scalar>(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_sub_op<Scalar>, const Derived>
+ operator- (Scalar rhs) const {
+ EIGEN_STATIC_ASSERT((std::numeric_limits<Scalar>::is_signed || internal::is_same<Scalar, const std::complex<float> >::value), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return unaryExpr(internal::scalar_sub_op<Scalar>(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_multiple_op<Scalar>, const Derived>
+ operator* (Scalar rhs) const {
+ return unaryExpr(internal::scalar_multiple_op<Scalar>(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_quotient1_op<Scalar>, const Derived>
+ operator/ (Scalar rhs) const {
+ // EIGEN_STATIC_ASSERT(!std::numeric_limits<Scalar>::is_integer, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return unaryExpr(internal::scalar_quotient1_op<Scalar>(rhs));
+ }
+
+ template <typename Scale>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_multiple2_op<Scalar, Scale>, const Derived>
+ scale (Scale rhs) const {
+ return unaryExpr(internal::scalar_multiple2_op<Scalar, Scale>(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseUnaryOp<internal::scalar_mod_op<Scalar>, const Derived>
+ operator% (Scalar rhs) const {
+ EIGEN_STATIC_ASSERT(std::numeric_limits<Scalar>::is_integer, YOU_MADE_A_PROGRAMMING_MISTAKE_TRY_MOD);
+ return unaryExpr(internal::scalar_mod_op<Scalar>(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<internal::scalar_fmod_op<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ mod(Scalar rhs) const {
+ EIGEN_STATIC_ASSERT(!std::numeric_limits<Scalar>::is_integer, YOU_MADE_A_PROGRAMMING_MISTAKE_FMOD_IS_NOT_FOR_INTEGERS);
+ return mod(constant(rhs));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ cwiseMax(Scalar threshold) const {
+ return cwiseMax(constant(threshold));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ cwiseMin(Scalar threshold) const {
+ return cwiseMin(constant(threshold));
+ }
+
+ template <typename NewType> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorConversionOp<NewType, const Derived>
+ cast() const {
+ return TensorConversionOp<NewType, const Derived>(derived());
+ }
+
+ // Generic binary operation support.
+ template <typename CustomBinaryOp, typename OtherDerived> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<CustomBinaryOp, const Derived, const OtherDerived>
+ binaryExpr(const OtherDerived& other, const CustomBinaryOp& func) const {
+ return TensorCwiseBinaryOp<CustomBinaryOp, const Derived, const OtherDerived>(derived(), other, func);
+ }
+
+ // Coefficient-wise binary operators.
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_sum_op<Scalar>, const Derived, const OtherDerived>
+ operator+(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_sum_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_difference_op<Scalar>, const Derived, const OtherDerived>
+ operator-(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_difference_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_product_op<Scalar>, const Derived, const OtherDerived>
+ operator*(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_product_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_quotient_op<Scalar>, const Derived, const OtherDerived>
+ operator/(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_quotient_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_fmod_op<Scalar>, const Derived, const OtherDerived>
+ mod(const OtherDerived& other) const {
+ EIGEN_STATIC_ASSERT(!std::numeric_limits<Scalar>::is_integer, YOU_MADE_A_PROGRAMMING_MISTAKE_FMOD_IS_NOT_FOR_INTEGERS);
+ return binaryExpr(other.derived(), internal::scalar_fmod_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_max_op<Scalar>, const Derived, const OtherDerived>
+ cwiseMax(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_max_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_min_op<Scalar>, const Derived, const OtherDerived>
+ cwiseMin(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_min_op<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_boolean_and_op, const Derived, const OtherDerived>
+ operator&&(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_boolean_and_op());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_boolean_or_op, const Derived, const OtherDerived>
+ operator||(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_boolean_or_op());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<internal::scalar_boolean_xor_op, const Derived, const OtherDerived>
+ operator^(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), internal::scalar_boolean_xor_op());
+ }
+
+ // Comparisons and tests.
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::less<Scalar>, const Derived, const OtherDerived>
+ operator<(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::less<Scalar>());
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::less_equal<Scalar>, const Derived, const OtherDerived>
+ operator<=(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::less_equal<Scalar>());
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::greater<Scalar>, const Derived, const OtherDerived>
+ operator>(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::greater<Scalar>());
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::greater_equal<Scalar>, const Derived, const OtherDerived>
+ operator>=(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::greater_equal<Scalar>());
+ }
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::equal_to<Scalar>, const Derived, const OtherDerived>
+ operator==(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::equal_to<Scalar>());
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const OtherDerived>
+ operator!=(const OtherDerived& other) const {
+ return binaryExpr(other.derived(), std::not_equal_to<Scalar>());
+ }
+
+ // comparisons and tests for Scalars
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::less<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator<(Scalar threshold) const {
+ return operator<(constant(threshold));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::less_equal<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator<=(Scalar threshold) const {
+ return operator<=(constant(threshold));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::greater<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator>(Scalar threshold) const {
+ return operator>(constant(threshold));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::greater_equal<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator>=(Scalar threshold) const {
+ return operator>=(constant(threshold));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::equal_to<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator==(Scalar threshold) const {
+ return operator==(constant(threshold));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const TensorCwiseBinaryOp<std::not_equal_to<Scalar>, const Derived, const TensorCwiseNullaryOp<internal::scalar_constant_op<Scalar>, const Derived> >
+ operator!=(Scalar threshold) const {
+ return operator!=(constant(threshold));
+ }
+
+ // Coefficient-wise ternary operators.
+ template<typename ThenDerived, typename ElseDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorSelectOp<const Derived, const ThenDerived, const ElseDerived>
+ select(const ThenDerived& thenTensor, const ElseDerived& elseTensor) const {
+ return TensorSelectOp<const Derived, const ThenDerived, const ElseDerived>(derived(), thenTensor.derived(), elseTensor.derived());
+ }
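+
+  // Illustrative sketch (not part of the original header): select() applies a
+  // coefficient-wise ternary choice in which the calling tensor acts as the
+  // boolean condition. Assuming a float tensor a:
+  //
+  //   Eigen::Tensor<float, 2> a(3, 4);
+  //   // relu(a): keep a where a > 0, otherwise take the zero constant.
+  //   Eigen::Tensor<float, 2> r = (a > 0.0f).select(a, a.constant(0.0f));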
+
+ // Contractions.
+ typedef Eigen::IndexPair<Index> DimensionPair;
+
+ template<typename OtherDerived, typename Dimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorContractionOp<const Dimensions, const Derived, const OtherDerived>
+ contract(const OtherDerived& other, const Dimensions& dims) const {
+ return TensorContractionOp<const Dimensions, const Derived, const OtherDerived>(derived(), other.derived(), dims);
+ }
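+
+  // Illustrative sketch (not part of the original header): a rank-2 by rank-2
+  // contraction over the inner dimension is the ordinary matrix product.
+  //
+  //   Eigen::Tensor<float, 2> A(2, 3), B(3, 4);
+  //   Eigen::array<Eigen::IndexPair<int>, 1> dims = {
+  //       Eigen::IndexPair<int>(1, 0)};   // contract A's dim 1 with B's dim 0
+  //   Eigen::Tensor<float, 2> C = A.contract(B, dims);  // 2 x 4 result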
+
+ // Convolutions.
+ template<typename KernelDerived, typename Dimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorConvolutionOp<const Dimensions, const Derived, const KernelDerived>
+ convolve(const KernelDerived& kernel, const Dimensions& dims) const {
+ return TensorConvolutionOp<const Dimensions, const Derived, const KernelDerived>(derived(), kernel.derived(), dims);
+ }
+
+ // Convolutions by fft.
+ template<typename KernelDerived, typename Dimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorConvolutionByFFTOp<const Dimensions, const Derived, const KernelDerived>
+ convolvebyfft(const KernelDerived& kernel, const Dimensions& dims) const {
+ return TensorConvolutionByFFTOp<const Dimensions, const Derived, const KernelDerived>(derived(), kernel.derived(), dims);
+ }
+
+ // Reductions.
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::SumReducer<CoeffReturnType>, const Dims, const Derived>
+ sum(const Dims& dims) const {
+ return TensorReductionOp<internal::SumReducer<CoeffReturnType>, const Dims, const Derived>(derived(), dims, internal::SumReducer<CoeffReturnType>());
+ }
+
+ const TensorReductionOp<internal::SumReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>
+ sum() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return TensorReductionOp<internal::SumReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>(derived(), in_dims, internal::SumReducer<CoeffReturnType>());
+ }
+
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::MeanReducer<CoeffReturnType>, const Dims, const Derived>
+ mean(const Dims& dims) const {
+ return TensorReductionOp<internal::MeanReducer<CoeffReturnType>, const Dims, const Derived>(derived(), dims, internal::MeanReducer<CoeffReturnType>());
+ }
+
+ const TensorReductionOp<internal::MeanReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>
+ mean() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return TensorReductionOp<internal::MeanReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>(derived(), in_dims, internal::MeanReducer<CoeffReturnType>());
+ }
+
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::ProdReducer<CoeffReturnType>, const Dims, const Derived>
+ prod(const Dims& dims) const {
+ return TensorReductionOp<internal::ProdReducer<CoeffReturnType>, const Dims, const Derived>(derived(), dims, internal::ProdReducer<CoeffReturnType>());
+ }
+
+ const TensorReductionOp<internal::ProdReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>
+ prod() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return TensorReductionOp<internal::ProdReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>(derived(), in_dims, internal::ProdReducer<CoeffReturnType>());
+ }
+
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::MaxReducer<CoeffReturnType>, const Dims, const Derived>
+ maximum(const Dims& dims) const {
+ return TensorReductionOp<internal::MaxReducer<CoeffReturnType>, const Dims, const Derived>(derived(), dims, internal::MaxReducer<CoeffReturnType>());
+ }
+
+ const TensorReductionOp<internal::MaxReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>
+ maximum() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return TensorReductionOp<internal::MaxReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>(derived(), in_dims, internal::MaxReducer<CoeffReturnType>());
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorTupleReducerOp<
+ internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, NumDimensions>, const Derived>
+ argmax() const {
+ array<Index, NumDimensions> in_dims;
+ for (int d = 0; d < NumDimensions; ++d) in_dims[d] = d;
+ return TensorTupleReducerOp<
+ internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, NumDimensions>,
+ const Derived>(derived(), internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >(), -1, in_dims);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorTupleReducerOp<
+ internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, NumDimensions>, const Derived>
+ argmin() const {
+ array<Index, NumDimensions> in_dims;
+ for (int d = 0; d < NumDimensions; ++d) in_dims[d] = d;
+ return TensorTupleReducerOp<
+ internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, NumDimensions>,
+ const Derived>(derived(), internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >(), -1, in_dims);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorTupleReducerOp<
+ internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, 1>, const Derived>
+ argmax(const int return_dim) const {
+ array<Index, 1> in_dims;
+ in_dims[0] = return_dim;
+ return TensorTupleReducerOp<
+ internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, 1>,
+ const Derived>(derived(), internal::ArgMaxTupleReducer<Tuple<Index, CoeffReturnType> >(), return_dim, in_dims);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorTupleReducerOp<
+ internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, 1>, const Derived>
+ argmin(const int return_dim) const {
+ array<Index, 1> in_dims;
+ in_dims[0] = return_dim;
+ return TensorTupleReducerOp<
+ internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >,
+ const array<Index, 1>,
+ const Derived>(derived(), internal::ArgMinTupleReducer<Tuple<Index, CoeffReturnType> >(), return_dim, in_dims);
+ }
+
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::MinReducer<CoeffReturnType>, const Dims, const Derived>
+ minimum(const Dims& dims) const {
+ return TensorReductionOp<internal::MinReducer<CoeffReturnType>, const Dims, const Derived>(derived(), dims, internal::MinReducer<CoeffReturnType>());
+ }
+
+ const TensorReductionOp<internal::MinReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>
+ minimum() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return TensorReductionOp<internal::MinReducer<CoeffReturnType>, const DimensionList<Index, NumDimensions>, const Derived>(derived(), in_dims, internal::MinReducer<CoeffReturnType>());
+ }
+
+  // This does not short-circuit, so it is potentially very inefficient.
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::AndReducer, const Dims, const TensorConversionOp<bool, const Derived> >
+ all(const Dims& dims) const {
+ return cast<bool>().reduce(dims, internal::AndReducer());
+ }
+
+  // This does not short-circuit, so it is potentially very inefficient.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::AndReducer, const DimensionList<Index, NumDimensions>, const TensorConversionOp<bool, const Derived> >
+ all() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return cast<bool>().reduce(in_dims, internal::AndReducer());
+ }
+
+  // This does not short-circuit, so it is potentially very inefficient.
+ template <typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::OrReducer, const Dims, const TensorConversionOp<bool, const Derived> >
+ any(const Dims& dims) const {
+ return cast<bool>().reduce(dims, internal::OrReducer());
+ }
+
+  // This does not short-circuit, so it is potentially very inefficient.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<internal::OrReducer, const DimensionList<Index, NumDimensions>, const TensorConversionOp<bool, const Derived> >
+ any() const {
+ DimensionList<Index, NumDimensions> in_dims;
+ return cast<bool>().reduce(in_dims, internal::OrReducer());
+ }
+
+ template <typename Reducer, typename Dims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReductionOp<Reducer, const Dims, const Derived>
+ reduce(const Dims& dims, const Reducer& reducer) const {
+ return TensorReductionOp<Reducer, const Dims, const Derived>(derived(), dims, reducer);
+ }
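+
+  // Illustrative sketch (not part of the original header): reductions take a
+  // list of dimensions to reduce over, and reduce() additionally takes the
+  // reducer functor that sum(), maximum(), etc. merely pre-select.
+  //
+  //   Eigen::Tensor<float, 3> t(4, 5, 6);
+  //   Eigen::array<int, 1> dims = {{1}};        // reduce over dimension 1
+  //   Eigen::Tensor<float, 2> s = t.sum(dims);  // result shape (4, 6)
+  //   Eigen::Tensor<float, 0> all = t.sum();    // reduce over all dimensions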
+
+ template <typename Broadcast> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorBroadcastingOp<const Broadcast, const Derived>
+ broadcast(const Broadcast& broadcast) const {
+ return TensorBroadcastingOp<const Broadcast, const Derived>(derived(), broadcast);
+ }
+
+ template <int FFTDataType, int FFTDirection, typename FFT> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorFFTOp<const FFT, const Derived, FFTDataType, FFTDirection>
+ fft(const FFT& fft) const {
+ return TensorFFTOp<const FFT, const Derived, FFTDataType, FFTDirection>(derived(), fft);
+ }
+
+ template <typename Axis, typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorConcatenationOp<Axis, const Derived, const OtherDerived>
+ concatenate(const OtherDerived& other, Axis axis) const {
+ return TensorConcatenationOp<Axis, const Derived, const OtherDerived>(derived(), other.derived(), axis);
+ }
+
+ template <typename PatchDims> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorPatchOp<const PatchDims, const Derived>
+ extract_patches(const PatchDims& patch_dims) const {
+ return TensorPatchOp<const PatchDims, const Derived>(derived(), patch_dims);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Derived>
+ extract_volume_patches(const Index patch_planes, const Index patch_rows, const Index patch_cols,
+ const Index plane_stride = 1, const Index row_stride = 1, const Index col_stride = 1,
+ const PaddingType padding_type = PADDING_SAME, const Scalar padding_value = 0) const {
+ return TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Derived>(derived(), patch_planes, patch_rows, patch_cols, plane_stride, row_stride, col_stride, 1, 1, 1, 1, 1, 1, padding_type, padding_value);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Derived>
+ extract_volume_patches(const Index patch_planes, const Index patch_rows, const Index patch_cols,
+ const Index plane_stride, const Index row_stride, const Index col_stride,
+ const Index plane_inflate_stride, const Index row_inflate_stride, const Index col_inflate_stride,
+ const Index padding_top_z, const Index padding_bottom_z,
+ const Index padding_top, const Index padding_bottom,
+ const Index padding_left, const Index padding_right, const Scalar padding_value = 0) const {
+ return TensorVolumePatchOp<Dynamic, Dynamic, Dynamic, const Derived>(derived(), patch_planes, patch_rows, patch_cols, plane_stride, row_stride, col_stride, 1, 1, 1, plane_inflate_stride, row_inflate_stride, col_inflate_stride, padding_top_z, padding_bottom_z, padding_top, padding_bottom, padding_left, padding_right, padding_value);
+ }
+
+ template <Index Rows, Index Cols> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Rows, Cols, const Derived>
+ extract_image_patches() const {
+ return TensorImagePatchOp<Rows, Cols, const Derived>(derived(), Rows, Cols, 1, 1, 1, 1, 1, 1, PADDING_SAME, 0);
+ }
+
+ template <Index Rows, Index Cols> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Rows, Cols, const Derived>
+ extract_image_patches(const PaddingType padding_type) const {
+ return TensorImagePatchOp<Rows, Cols, const Derived>(derived(), Rows, Cols, 1, 1, 1, 1, 1, 1, padding_type, 0);
+ }
+
+ template <Index Rows, Index Cols> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Rows, Cols, const Derived>
+ extract_image_patches(const Index stride, const PaddingType padding_type) const {
+ return TensorImagePatchOp<Rows, Cols, const Derived>(derived(), Rows, Cols, stride, stride, 1, 1, 1, 1, padding_type, 0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride = 1, const Index col_stride = 1) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ 1, 1, 1, 1, PADDING_SAME, 0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const PaddingType padding_type) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ 1, 1, 1, 1, padding_type, 0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const PaddingType padding_type, const Scalar padding_value) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ 1, 1, 1, 1, padding_type, padding_value);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ in_row_stride, in_col_stride, 1, 1, PADDING_SAME, 0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride,
+ const PaddingType padding_type) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ in_row_stride, in_col_stride, 1, 1, padding_type, 0);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride,
+ const PaddingType padding_type, const Scalar padding_value) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ in_row_stride, in_col_stride, 1, 1, padding_type, padding_value);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride,
+ const Index row_inflate_stride, const Index col_inflate_stride,
+ const PaddingType padding_type, const Scalar padding_value) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ in_row_stride, in_col_stride, row_inflate_stride, col_inflate_stride,
+ padding_type, padding_value);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorImagePatchOp<Dynamic, Dynamic, const Derived>
+ extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride,
+ const Index row_inflate_stride, const Index col_inflate_stride,
+ const Index padding_top, const Index padding_bottom,
+ const Index padding_left,const Index padding_right,
+ const Scalar padding_value) const {
+ return TensorImagePatchOp<Dynamic, Dynamic, const Derived>(derived(), patch_rows, patch_cols, row_stride, col_stride,
+ in_row_stride, in_col_stride, row_inflate_stride, col_inflate_stride,
+ padding_top, padding_bottom, padding_left, padding_right, padding_value);
+ }
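+
+  // Illustrative sketch (not part of the original header): assuming a
+  // column-major input laid out as (channels, rows, cols, batch), extracting
+  // 3x3 patches with unit strides and SAME padding yields a rank-5 expression
+  // of shape (channels, 3, 3, rows * cols, batch).
+  //
+  //   Eigen::Tensor<float, 4> img(depth, rows, cols, batch);
+  //   Eigen::Tensor<float, 5> patches = img.extract_image_patches(3, 3);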
+
+ // Morphing operators.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorLayoutSwapOp<const Derived>
+ swap_layout() const {
+ return TensorLayoutSwapOp<const Derived>(derived());
+ }
+ template <typename NewDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReshapingOp<const NewDimensions, const Derived>
+ reshape(const NewDimensions& newDimensions) const {
+ return TensorReshapingOp<const NewDimensions, const Derived>(derived(), newDimensions);
+ }
+ template <typename StartIndices, typename Sizes> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorSlicingOp<const StartIndices, const Sizes, const Derived>
+ slice(const StartIndices& startIndices, const Sizes& sizes) const {
+ return TensorSlicingOp<const StartIndices, const Sizes, const Derived>(derived(), startIndices, sizes);
+ }
+ template <Index DimId> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorChippingOp<DimId, const Derived>
+ chip(const Index offset) const {
+ return TensorChippingOp<DimId, const Derived>(derived(), offset, DimId);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorChippingOp<Dynamic, const Derived>
+ chip(const Index offset, const Index dim) const {
+ return TensorChippingOp<Dynamic, const Derived>(derived(), offset, dim);
+ }
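+
+  // Illustrative sketch (not part of the original header): the morphing ops
+  // only reinterpret or restrict indexing; they do not copy data. Assuming a
+  // rank-2 float tensor m:
+  //
+  //   Eigen::Tensor<float, 2> m(4, 6);
+  //   Eigen::array<Eigen::DenseIndex, 1> flat{{24}};
+  //   auto v = m.reshape(flat);                    // rank-1 view of 24 coeffs
+  //   Eigen::array<Eigen::DenseIndex, 2> off{{1, 2}}, ext{{2, 3}};
+  //   auto s = m.slice(off, ext);                  // 2 x 3 window
+  //   auto col2 = m.chip<1>(2);                    // rank-1 slice m(:, 2)
+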
+ template <typename ReverseDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReverseOp<const ReverseDimensions, const Derived>
+ reverse(const ReverseDimensions& rev) const {
+ return TensorReverseOp<const ReverseDimensions, const Derived>(derived(), rev);
+ }
+ template <typename PaddingDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorPaddingOp<const PaddingDimensions, const Derived>
+ pad(const PaddingDimensions& padding) const {
+ return TensorPaddingOp<const PaddingDimensions, const Derived>(derived(), padding, Scalar(0));
+ }
+ template <typename PaddingDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorPaddingOp<const PaddingDimensions, const Derived>
+  pad(const PaddingDimensions& padding, const Scalar padding_value) const {
+ return TensorPaddingOp<const PaddingDimensions, const Derived>(derived(), padding, padding_value);
+ }
+ template <typename Shuffle> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorShufflingOp<const Shuffle, const Derived>
+ shuffle(const Shuffle& shuffle) const {
+ return TensorShufflingOp<const Shuffle, const Derived>(derived(), shuffle);
+ }
+ template <typename Strides> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorStridingOp<const Strides, const Derived>
+ stride(const Strides& strides) const {
+ return TensorStridingOp<const Strides, const Derived>(derived(), strides);
+ }
+ template <typename Strides> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorInflationOp<const Strides, const Derived>
+ inflate(const Strides& strides) const {
+ return TensorInflationOp<const Strides, const Derived>(derived(), strides);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorTrueIndicesOp<const Derived>
+ true_indices(const Index& not_true_value = -1) const {
+ return TensorTrueIndicesOp<const Derived>(derived(), not_true_value);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorIndexTupleOp<const Derived>
+ index_tuples() const {
+ return TensorIndexTupleOp<const Derived>(derived());
+ }
+ template <typename CustomUnaryFunc>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCustomUnaryOp<const CustomUnaryFunc, const Derived> customOp(const CustomUnaryFunc& op) const {
+ return TensorCustomUnaryOp<const CustomUnaryFunc, const Derived>(derived(), op);
+ }
+ template <typename OtherDerived, typename CustomBinaryFunc>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorCustomBinaryOp<const CustomBinaryFunc, const Derived, const OtherDerived> customOp(const OtherDerived& other, const CustomBinaryFunc& op) const {
+ return TensorCustomBinaryOp<const CustomBinaryFunc, const Derived, const OtherDerived>(derived(), other, op);
+ }
+
+ // Force the evaluation of the expression.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorForcedEvalOp<const Derived> eval() const {
+ return TensorForcedEvalOp<const Derived>(derived());
+ }
+
+ protected:
+ template <typename Scalar, std::size_t NumIndices, int Options, typename IndexType> friend class Tensor;
+ template <typename Scalar, int Option, typename IndexTypes> friend class TensorVarDim;
+ template <typename Scalar, typename Dimensions, int Option, typename IndexTypes> friend class TensorFixedSize;
+ template <typename OtherDerived, int AccessLevel> friend class TensorBase;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Derived& derived() const { return *static_cast<const Derived*>(this); }
+};
+
+template<typename Derived>
+class TensorBase<Derived, WriteAccessors> : public TensorBase<Derived, ReadOnlyAccessors> {
+ public:
+ typedef internal::traits<Derived> DerivedTraits;
+ typedef typename DerivedTraits::Scalar Scalar;
+ typedef typename DerivedTraits::Index Index;
+ typedef Scalar CoeffReturnType;
+ typedef typename internal::packet_traits<Scalar>::type PacketReturnType;
+ static const int NumDimensions = DerivedTraits::NumDimensions;
+
+ template <typename Scalar, std::size_t NumIndices, int Options, typename IndexType> friend class Tensor;
+ template <typename Scalar, int Options, typename IndexType> friend class TensorVarDim;
+ template <typename OtherDerived, int AccessLevel> friend class TensorBase;
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& setZero() {
+ return setConstant(Scalar(0));
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& setConstant(const Scalar& val) {
+ return derived() = this->constant(val);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& setRandom() {
+ return derived() = this->random();
+ }
+ template <typename RandomGenerator> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& setRandom() {
+ return derived() = this->template random<RandomGenerator>();
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& setValues(
+ const typename internal::Initializer<Derived, NumDimensions>::InitList& vals) {
+ TensorEvaluator<Derived, DefaultDevice> eval(derived(), DefaultDevice());
+ internal::initialize_tensor<Derived, NumDimensions>(eval, vals);
+ return derived();
+ }
+#endif // EIGEN_HAS_VARIADIC_TEMPLATES
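+
+  // Illustrative sketch (not part of the original header): with variadic
+  // templates available, setValues() fills the tensor from a nested
+  // initializer list whose nesting depth matches the tensor's rank.
+  //
+  //   Eigen::Tensor<int, 2> t(2, 3);
+  //   t.setValues({{0, 1, 2}, {3, 4, 5}});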
+
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Derived& operator+=(const OtherDerived& other) {
+ return derived() = derived() + other.derived();
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Derived& operator-=(const OtherDerived& other) {
+ return derived() = derived() - other.derived();
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Derived& operator*=(const OtherDerived& other) {
+ return derived() = derived() * other.derived();
+ }
+ template<typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Derived& operator/=(const OtherDerived& other) {
+ return derived() = derived() / other.derived();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorLayoutSwapOp<const Derived>
+ swap_layout() const {
+ return TensorLayoutSwapOp<const Derived>(derived());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorLayoutSwapOp<Derived>
+ swap_layout() {
+ return TensorLayoutSwapOp<Derived>(derived());
+ }
+
+ template <typename Axis, typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorConcatenationOp<const Axis, const Derived, const OtherDerived>
+ concatenate(const OtherDerived& other, const Axis& axis) const {
+ return TensorConcatenationOp<const Axis, const Derived, const OtherDerived>(derived(), other, axis);
+ }
+ template <typename Axis, typename OtherDerived> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorConcatenationOp<const Axis, Derived, OtherDerived>
+ concatenate(const OtherDerived& other, const Axis& axis) {
+ return TensorConcatenationOp<const Axis, Derived, OtherDerived>(derived(), other, axis);
+ }
+
+ template <typename NewDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReshapingOp<const NewDimensions, const Derived>
+ reshape(const NewDimensions& newDimensions) const {
+ return TensorReshapingOp<const NewDimensions, const Derived>(derived(), newDimensions);
+ }
+ template <typename NewDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorReshapingOp<const NewDimensions, Derived>
+ reshape(const NewDimensions& newDimensions) {
+ return TensorReshapingOp<const NewDimensions, Derived>(derived(), newDimensions);
+ }
+
+ template <typename StartIndices, typename Sizes> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorSlicingOp<const StartIndices, const Sizes, const Derived>
+ slice(const StartIndices& startIndices, const Sizes& sizes) const {
+ return TensorSlicingOp<const StartIndices, const Sizes, const Derived>(derived(), startIndices, sizes);
+ }
+ template <typename StartIndices, typename Sizes> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorSlicingOp<const StartIndices, const Sizes, Derived>
+ slice(const StartIndices& startIndices, const Sizes& sizes) {
+ return TensorSlicingOp<const StartIndices, const Sizes, Derived>(derived(), startIndices, sizes);
+ }
+
+ template <DenseIndex DimId> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorChippingOp<DimId, const Derived>
+ chip(const Index offset) const {
+ return TensorChippingOp<DimId, const Derived>(derived(), offset, DimId);
+ }
+ template <Index DimId> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorChippingOp<DimId, Derived>
+ chip(const Index offset) {
+ return TensorChippingOp<DimId, Derived>(derived(), offset, DimId);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorChippingOp<Dynamic, const Derived>
+ chip(const Index offset, const Index dim) const {
+ return TensorChippingOp<Dynamic, const Derived>(derived(), offset, dim);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorChippingOp<Dynamic, Derived>
+ chip(const Index offset, const Index dim) {
+ return TensorChippingOp<Dynamic, Derived>(derived(), offset, dim);
+ }
+
+ template <typename ReverseDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorReverseOp<const ReverseDimensions, const Derived>
+ reverse(const ReverseDimensions& rev) const {
+ return TensorReverseOp<const ReverseDimensions, const Derived>(derived(), rev);
+ }
+ template <typename ReverseDimensions> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorReverseOp<const ReverseDimensions, Derived>
+ reverse(const ReverseDimensions& rev) {
+ return TensorReverseOp<const ReverseDimensions, Derived>(derived(), rev);
+ }
+
+ template <typename Shuffle> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorShufflingOp<const Shuffle, const Derived>
+ shuffle(const Shuffle& shuffle) const {
+ return TensorShufflingOp<const Shuffle, const Derived>(derived(), shuffle);
+ }
+ template <typename Shuffle> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorShufflingOp<const Shuffle, Derived>
+ shuffle(const Shuffle& shuffle) {
+ return TensorShufflingOp<const Shuffle, Derived>(derived(), shuffle);
+ }
+
+ template <typename Strides> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const TensorStridingOp<const Strides, const Derived>
+ stride(const Strides& strides) const {
+ return TensorStridingOp<const Strides, const Derived>(derived(), strides);
+ }
+ template <typename Strides> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorStridingOp<const Strides, Derived>
+ stride(const Strides& strides) {
+ return TensorStridingOp<const Strides, Derived>(derived(), strides);
+ }
+
+ // Select the device on which to evaluate the expression.
+ template <typename DeviceType>
+ TensorDevice<Derived, DeviceType> device(const DeviceType& device) {
+ return TensorDevice<Derived, DeviceType>(device, derived());
+ }
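+
+  // Illustrative sketch (not part of the original header): device() returns a
+  // lightweight assignment target that evaluates the right-hand side on the
+  // given device (e.g. a thread-pool or GPU device); with the default device:
+  //
+  //   Eigen::DefaultDevice dd;
+  //   c.device(dd) = a + b;  // evaluate a + b and store the result in c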
+
+ protected:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Derived& derived() { return *static_cast<Derived*>(this); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Derived& derived() const { return *static_cast<const Derived*>(this); }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_BASE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h
new file mode 100644
index 0000000000..ac428b169f
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBlock.h
@@ -0,0 +1,627 @@
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H
+#define EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H
+
+namespace Eigen {
+
+/** \class TensorBlock
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor block class.
+ *
+ * This class represents a tensor block specified by the index of the
+ * first block coefficient, and the size of the block in each dimension.
+ *
+ */
+
+namespace internal {
+
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout>
+class TensorBlock {
+ public:
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ TensorBlock(const Index first_coeff_index,
+ const Dimensions& block_sizes,
+ const Dimensions& block_strides,
+ const Dimensions& tensor_strides,
+ Scalar* data)
+ : m_first_coeff_index(first_coeff_index),
+ m_block_sizes(block_sizes),
+ m_block_strides(block_strides),
+ m_tensor_strides(tensor_strides),
+ m_data(data) {}
+
+ Index first_coeff_index() const { return m_first_coeff_index; }
+
+ const Dimensions& block_sizes() const { return m_block_sizes; }
+
+ const Dimensions& block_strides() const { return m_block_strides; }
+
+ const Dimensions& tensor_strides() const { return m_tensor_strides; }
+
+ Scalar* data() { return m_data; }
+
+ const Scalar* data() const { return m_data; }
+
+ private:
+ Index m_first_coeff_index;
+ Dimensions m_block_sizes;
+ Dimensions m_block_strides;
+ Dimensions m_tensor_strides;
+ Scalar* m_data; // Not owned.
+};
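+
+// Illustrative sketch (not part of the original header): a block is a view
+// descriptor plus a caller-owned scratch buffer. Assuming a rank-2 float
+// tensor of shape (rows, cols) in column-major layout, a block of shape
+// (br, bc) whose first coefficient has linear index `first` could be
+// described as:
+//
+//   float* scratch = ...;                        // caller-owned storage
+//   DSizes<DenseIndex, 2> sizes(br, bc);         // block extent per dimension
+//   DSizes<DenseIndex, 2> block_strides(1, br);  // strides within the block
+//   DSizes<DenseIndex, 2> tensor_strides(1, rows);  // strides in the tensor
+//   TensorBlock<DenseIndex, float, 2, ColMajor> block(
+//       first, sizes, block_strides, tensor_strides, scratch);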
+
+template <typename Index, typename Scalar, bool Vectorizable>
+struct TensorBlockCopyOp {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const Index num_coeff_to_copy, const Index dst_index,
+ const Index dst_stride, Scalar* EIGEN_RESTRICT dst_data, const Index src_index,
+ const Index src_stride, const Scalar* EIGEN_RESTRICT src_data) {
+ for (Index i = 0; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i * dst_stride] =
+ src_data[src_index + i * src_stride];
+ }
+ }
+};
+
+// NOTE: Benchmarks run on an implementation of this that broke each of the
+// loops in these conditionals into its own template specialization (to
+// avoid conditionals in the caller's loop) did not show an improvement.
+template <typename Index, typename Scalar>
+struct TensorBlockCopyOp<Index, Scalar, true> {
+ typedef typename packet_traits<Scalar>::type Packet;
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const Index num_coeff_to_copy, const Index dst_index,
+ const Index dst_stride, Scalar* EIGEN_RESTRICT dst_data,
+ const Index src_index, const Index src_stride,
+ const Scalar* EIGEN_RESTRICT src_data) {
+ if (src_stride == 1) {
+ const Index packet_size = internal::unpacket_traits<Packet>::size;
+ const Index vectorized_size =
+ (num_coeff_to_copy / packet_size) * packet_size;
+ if (dst_stride == 1) {
+ // LINEAR
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ Packet p = internal::ploadt<Packet, Unaligned>(
+ src_data + src_index + i);
+ internal::pstoret<Scalar, Packet, Unaligned>(
+ dst_data + dst_index + i, p);
+ }
+ for (Index i = vectorized_size; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i] = src_data[src_index + i];
+ }
+ } else {
+ // SCATTER
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ Packet p = internal::ploadt<Packet, Unaligned>(
+ src_data + src_index + i);
+ internal::pscatter<Scalar, Packet>(
+ dst_data + dst_index + i * dst_stride, p, dst_stride);
+ }
+ for (Index i = vectorized_size; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i * dst_stride] = src_data[src_index + i];
+ }
+ }
+ } else {
+ if (dst_stride == 1) {
+ // GATHER
+ const Index packet_size = internal::unpacket_traits<Packet>::size;
+ const Index vectorized_size =
+ (num_coeff_to_copy / packet_size) * packet_size;
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ Packet p = internal::pgather<Scalar, Packet>(
+ src_data + src_index + i * src_stride, src_stride);
+ internal::pstoret<Scalar, Packet, Unaligned>(
+ dst_data + dst_index + i, p);
+ }
+ for (Index i = vectorized_size; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i] = src_data[src_index + i * src_stride];
+ }
+ } else {
+ // RANDOM
+ for (Index i = 0; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i * dst_stride] =
+ src_data[src_index + i * src_stride];
+ }
+ }
+ }
+ }
+};
+
+/** \class TensorBlockIO
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor block IO class.
+ *
+ * This class is responsible for copying data between a tensor and a tensor
+ * block.
+ *
+ */
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout,
+ bool Vectorizable, bool BlockRead>
+class TensorBlockIO {
+ public:
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ TensorBlock;
+ typedef typename internal::TensorBlockCopyOp<Index, Scalar, Vectorizable>
+ TensorBlockCopyOp;
+
+ protected:
+ struct BlockIteratorState {
+ Index input_stride;
+ Index output_stride;
+ Index input_span;
+ Index output_span;
+ Index size;
+ Index count;
+ };
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Copy(
+ const TensorBlock& block, Index first_coeff_index,
+ const array<Index, NumDims>& tensor_to_block_dim_map,
+ const array<Index, NumDims>& tensor_strides, const Scalar* src_data,
+ Scalar* dst_data) {
+ // Calculate strides and dimensions.
+ const Index block_dim_for_tensor_stride1_dim =
+ NumDims == 0 ? 1 :
+ tensor_to_block_dim_map[static_cast<int>(Layout) ==
+ static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - 1];
+ const size_t block_inner_dim_size =
+ NumDims == 0 ? 1 :
+ block.block_sizes()[block_dim_for_tensor_stride1_dim];
+ const size_t block_outer_dim_size =
+ NumDims == 0 ? 1 :
+ block.block_sizes().TotalSize() / block_inner_dim_size;
+
+ Index inputIndex;
+ Index outputIndex;
+ Index input_stride;
+ Index output_stride;
+
+    // Set up strides to read/write along the tensor's stride-1 dimension.
+ if (BlockRead) {
+ inputIndex = first_coeff_index;
+ outputIndex = 0;
+ input_stride = 1;
+ output_stride = NumDims == 0 ? 1
+ : block.block_strides()[block_dim_for_tensor_stride1_dim];
+ } else {
+ inputIndex = 0;
+ outputIndex = first_coeff_index;
+ input_stride = NumDims == 0 ? 1
+ : block.block_strides()[block_dim_for_tensor_stride1_dim];
+ output_stride = 1;
+ }
+
+ const std::size_t at_least_1_dim = NumDims <= 1 ? 1 : NumDims - 1;
+ array<BlockIteratorState, at_least_1_dim> block_iter_state;
+
+ // Initialize block iterator state.
+ for (int i = 0; i < static_cast<int>(NumDims) - 1; ++i) {
+ const int dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i + 1
+ : NumDims - i - 2;
+ block_iter_state[i].size =
+ block.block_sizes()[tensor_to_block_dim_map[dim]];
+ if (BlockRead) {
+ block_iter_state[i].input_stride = tensor_strides[dim];
+ block_iter_state[i].output_stride =
+ block.block_strides()[tensor_to_block_dim_map[dim]];
+ } else {
+ block_iter_state[i].input_stride =
+ block.block_strides()[tensor_to_block_dim_map[dim]];
+ block_iter_state[i].output_stride = tensor_strides[dim];
+ }
+ block_iter_state[i].input_span =
+ block_iter_state[i].input_stride * (block_iter_state[i].size - 1);
+ block_iter_state[i].output_span =
+ block_iter_state[i].output_stride * (block_iter_state[i].size - 1);
+ block_iter_state[i].count = 0;
+ }
+
+ // Iterate copying data from src to dst.
+ for (Index i = 0; i < block_outer_dim_size; ++i) {
+ TensorBlockCopyOp::Run(block_inner_dim_size, outputIndex, output_stride,
+ dst_data, inputIndex, input_stride, src_data);
+ // Update index.
+ for (int i = 0; i < static_cast<int>(NumDims) - 1; ++i) {
+ if (++block_iter_state[i].count < block_iter_state[i].size) {
+ inputIndex += block_iter_state[i].input_stride;
+ outputIndex += block_iter_state[i].output_stride;
+ break;
+ }
+ block_iter_state[i].count = 0;
+ inputIndex -= block_iter_state[i].input_span;
+ outputIndex -= block_iter_state[i].output_span;
+ }
+ }
+ }
+};
+
+/** \class TensorBlockReader
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor block reader class.
+ *
+ * This class is responsible for filling a tensor block with data read from
+ * the source tensor.
+ *
+ */
+
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout,
+ bool Vectorizable>
+class TensorBlockReader : public TensorBlockIO<Index, Scalar, NumDims,
+ Layout, Vectorizable, true> {
+ public:
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ TensorBlock;
+ typedef TensorBlockIO<Index, Scalar, NumDims, Layout, Vectorizable, true>
+ Base;
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ TensorBlock* block, const Scalar* src_data) {
+ array<Index, NumDims> tensor_to_block_dim_map;
+ for (int i = 0; i < NumDims; ++i) {
+ tensor_to_block_dim_map[i] = i;
+ }
+ Base::Copy(*block, block->first_coeff_index(), tensor_to_block_dim_map,
+ block->tensor_strides(), src_data, block->data());
+ }
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ TensorBlock* block, Index first_coeff_index,
+ const array<Index, NumDims>& tensor_to_block_dim_map,
+ const array<Index, NumDims>& tensor_strides, const Scalar* src_data) {
+ Base::Copy(*block, first_coeff_index, tensor_to_block_dim_map,
+ tensor_strides, src_data, block->data());
+ }
+};
+
+/** \class TensorBlockWriter
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor block writer class.
+ *
+ * This class is responsible for writing a tensor block's data back into the
+ * destination tensor.
+ *
+ */
+
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout,
+ bool Vectorizable>
+class TensorBlockWriter : public TensorBlockIO<Index, Scalar, NumDims,
+ Layout, Vectorizable, false> {
+ public:
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ TensorBlock;
+ typedef TensorBlockIO<Index, Scalar, NumDims, Layout, Vectorizable, false>
+ Base;
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const TensorBlock& block, Scalar* dst_data) {
+ array<Index, NumDims> tensor_to_block_dim_map;
+ for (int i = 0; i < NumDims; ++i) {
+ tensor_to_block_dim_map[i] = i;
+ }
+ Base::Copy(block, block.first_coeff_index(), tensor_to_block_dim_map,
+ block.tensor_strides(), block.data(), dst_data);
+ }
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const TensorBlock& block, Index first_coeff_index,
+ const array<Index, NumDims>& tensor_to_block_dim_map,
+ const array<Index, NumDims>& tensor_strides, Scalar* dst_data) {
+ Base::Copy(block, first_coeff_index, tensor_to_block_dim_map,
+ tensor_strides, block.data(), dst_data);
+ }
+};
+
+enum TensorBlockShapeType {
+ kUniformAllDims,
+ kSkewedInnerDims,
+};
+
+struct TensorOpResourceRequirements {
+ TensorBlockShapeType block_shape;
+ std::size_t block_total_size;
+  // TODO(andydavis) Add 'target_num_threads' to support communication of
+  // thread-resource requirements. This will allow ops deep in the
+  // expression tree (like reductions) to communicate resource
+  // requirements based on local state (like the total number of reductions
+  // to be computed).
+ TensorOpResourceRequirements(internal::TensorBlockShapeType shape,
+ const std::size_t size)
+ : block_shape(shape), block_total_size(size) {}
+};
+
+/** \class TensorBlockMapper
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor block mapper class.
+ *
+ * This class is responsible for iterating over the blocks of a tensor.
+ *
+ */
+
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout>
+class TensorBlockMapper {
+ public:
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ TensorBlock;
+
+ TensorBlockMapper(const Eigen::DSizes<Index, NumDims>& dims,
+ const TensorBlockShapeType block_shape,
+ const size_t max_coeff_count)
+ : m_dimensions(dims), m_block_dim_sizes(dims), m_total_block_count(1) {
+ if (m_block_dim_sizes.TotalSize() > max_coeff_count) {
+ if (block_shape == kUniformAllDims) {
+ // Tensor will not fit within 'max_coeff_count' budget: calculate tensor
+ // block dimension sizes based on "square" dimension size target.
+ const size_t dim_size_target =
+ std::pow(static_cast<float>(max_coeff_count),
+ 1.0 / static_cast<float>(m_block_dim_sizes.rank()));
+ for (size_t i = 0; i < m_block_dim_sizes.rank(); ++i) {
+          // TODO(andydavis) Adjust the innermost 'm_block_dim_size' to make it
+          // a multiple of the packet size. Note that reducing 'm_block_dim_size'
+          // in this manner can increase the number of blocks, and so will
+          // amplify any per-block overhead.
+ m_block_dim_sizes[i] =
+ numext::mini(dim_size_target, static_cast<size_t>(m_dimensions[i]));
+ }
+ // Add any un-allocated coefficients to inner dimension(s).
+ Index total_size = m_block_dim_sizes.TotalSize();
+ for (int i = 0; i < NumDims; ++i) {
+ const int dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumDims - i - 1;
+ if (m_block_dim_sizes[dim] < m_dimensions[dim]) {
+ const Index total_size_other_dims = total_size /
+ m_block_dim_sizes[dim];
+ const Index alloc_avail = max_coeff_count / total_size_other_dims;
+ if (alloc_avail == m_block_dim_sizes[dim]) {
+ // Insufficient excess coefficients to allocate.
+ break;
+ }
+ m_block_dim_sizes[dim] = numext::mini(m_dimensions[dim], alloc_avail);
+ total_size = total_size_other_dims * m_block_dim_sizes[dim];
+ }
+ }
+ } else {
+ eigen_assert(block_shape == kSkewedInnerDims);
+ Index coeff_to_allocate = max_coeff_count;
+ for (int i = 0; i < NumDims; ++i) {
+ const int dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumDims - i - 1;
+ m_block_dim_sizes[dim] = numext::mini(coeff_to_allocate,
+ m_dimensions[dim]);
+ coeff_to_allocate /= numext::maxi(static_cast<Index>(1),
+ m_block_dim_sizes[dim]);
+ }
+ }
+ }
+
+ // Calculate block counts by dimension and total block count.
+ DSizes<Index, NumDims> block_count;
+ for (size_t i = 0; i < block_count.rank(); ++i) {
+ block_count[i] =
+ (m_dimensions[i] + m_block_dim_sizes[i] - 1) / m_block_dim_sizes[i];
+ }
+ m_total_block_count = array_prod(block_count);
+
+ // Calculate block strides (used for enumerating blocks).
+ if (NumDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_block_strides[0] = 1;
+ m_tensor_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_block_strides[i] = m_block_strides[i - 1] * block_count[i - 1];
+ m_tensor_strides[i] = m_tensor_strides[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ m_block_strides[NumDims - 1] = 1;
+ m_tensor_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_block_strides[i] = m_block_strides[i + 1] * block_count[i + 1];
+ m_tensor_strides[i] = m_tensor_strides[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock
+ GetBlockForIndex(Index block_index, Scalar* data) const {
+ Index first_coeff_index = 0;
+ DSizes<Index, NumDims> coords;
+ DSizes<Index, NumDims> sizes;
+ DSizes<Index, NumDims> strides;
+ if (NumDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = block_index / m_block_strides[i];
+ coords[i] = idx * m_block_dim_sizes[i];
+ sizes[i] =
+ numext::mini((m_dimensions[i] - coords[i]), m_block_dim_sizes[i]);
+ block_index -= idx * m_block_strides[i];
+ first_coeff_index += coords[i] * m_tensor_strides[i];
+ }
+ coords[0] = block_index * m_block_dim_sizes[0];
+ sizes[0] =
+ numext::mini((m_dimensions[0] - coords[0]), m_block_dim_sizes[0]);
+ first_coeff_index += coords[0] * m_tensor_strides[0];
+
+ strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ strides[i] = strides[i - 1] * sizes[i - 1];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = block_index / m_block_strides[i];
+ coords[i] = idx * m_block_dim_sizes[i];
+ sizes[i] =
+ numext::mini((m_dimensions[i] - coords[i]), m_block_dim_sizes[i]);
+ block_index -= idx * m_block_strides[i];
+ first_coeff_index += coords[i] * m_tensor_strides[i];
+ }
+ coords[NumDims - 1] = block_index * m_block_dim_sizes[NumDims - 1];
+ sizes[NumDims - 1] =
+ numext::mini((m_dimensions[NumDims - 1] - coords[NumDims - 1]),
+ m_block_dim_sizes[NumDims - 1]);
+ first_coeff_index += coords[NumDims - 1] * m_tensor_strides[NumDims - 1];
+
+ strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ strides[i] = strides[i + 1] * sizes[i + 1];
+ }
+ }
+ }
+
+ return TensorBlock(first_coeff_index, sizes, strides, m_tensor_strides,
+ data);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index total_block_count() const {
+ return m_total_block_count;
+ }
+
+ private:
+ DSizes<Index, NumDims> m_dimensions;
+ DSizes<Index, NumDims> m_block_dim_sizes;
+ DSizes<Index, NumDims> m_block_strides;
+ DSizes<Index, NumDims> m_tensor_strides;
+ Index m_total_block_count;
+};
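+
+// Illustrative sketch (not part of the original header): a mapper partitions
+// a tensor with the given dimensions into blocks of at most max_coeff_count
+// coefficients and enumerates them by block index (dims and scratch are
+// assumed to be provided by the caller).
+//
+//   TensorBlockMapper<DenseIndex, float, 2, ColMajor> mapper(
+//       dims, kSkewedInnerDims, /*max_coeff_count=*/1024);
+//   for (DenseIndex b = 0; b < mapper.total_block_count(); ++b) {
+//     TensorBlock<DenseIndex, float, 2, ColMajor> block =
+//         mapper.GetBlockForIndex(b, scratch);
+//     // ... copy coefficients in/out via TensorBlockReader/Writer ...
+//   }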
+
+/** \class TensorSliceBlockMapper
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor slice block mapper class.
+ *
+ * This class is responsible for iterating over the blocks of
+ * a slice of a tensor. It supports reordering the block strides so that
+ * dimensions that will be processed together can be given the smallest
+ * strides.
+ *
+ */
+
+template <typename Index, typename Scalar, std::size_t NumDims, int Layout>
+class TensorSliceBlockMapper {
+ public:
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ TensorBlock;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ TensorSliceBlockMapper(const Dimensions& tensor_dims,
+ const Dimensions& tensor_slice_offsets,
+ const Dimensions& tensor_slice_extents,
+ const Dimensions& block_dim_sizes,
+ const Dimensions& block_stride_order)
+ : m_tensor_dimensions(tensor_dims),
+ m_tensor_slice_offsets(tensor_slice_offsets),
+ m_tensor_slice_extents(tensor_slice_extents),
+ m_block_dim_sizes(block_dim_sizes),
+ m_block_stride_order(block_stride_order),
+ m_total_block_count(1) {
+ // Calculate block counts by dimension and total block count.
+ DSizes<Index, NumDims> block_count;
+ for (size_t i = 0; i < block_count.rank(); ++i) {
+ block_count[i] = (m_tensor_slice_extents[i] + m_block_dim_sizes[i] - 1) /
+ m_block_dim_sizes[i];
+ }
+ m_total_block_count = array_prod(block_count);
+
+ // Calculate block strides (used for enumerating blocks).
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_block_strides[0] = 1;
+ m_tensor_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_block_strides[i] = m_block_strides[i - 1] * block_count[i - 1];
+ m_tensor_strides[i] = m_tensor_strides[i - 1] *
+ m_tensor_dimensions[i - 1];
+ }
+ } else {
+ m_block_strides[NumDims - 1] = 1;
+ m_tensor_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_block_strides[i] = m_block_strides[i + 1] * block_count[i + 1];
+ m_tensor_strides[i] = m_tensor_strides[i + 1] *
+ m_tensor_dimensions[i + 1];
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBlock
+ GetBlockForIndex(Index block_index, Scalar* data) const {
+ Index first_coeff_index = 0;
+ DSizes<Index, NumDims> coords;
+ DSizes<Index, NumDims> sizes;
+ DSizes<Index, NumDims> strides;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = block_index / m_block_strides[i];
+ coords[i] = m_tensor_slice_offsets[i] + idx * m_block_dim_sizes[i];
+ sizes[i] = numext::mini(m_tensor_slice_offsets[i] + m_tensor_slice_extents[i] - coords[i],
+ m_block_dim_sizes[i]);
+ block_index -= idx * m_block_strides[i];
+ first_coeff_index += coords[i] * m_tensor_strides[i];
+ }
+ coords[0] = m_tensor_slice_offsets[0] +
+ block_index * m_block_dim_sizes[0];
+ sizes[0] = numext::mini(m_tensor_slice_offsets[0] + m_tensor_slice_extents[0] - coords[0],
+ m_block_dim_sizes[0]);
+ first_coeff_index += coords[0] * m_tensor_strides[0];
+
+ Index prev_dim = m_block_stride_order[0];
+ strides[prev_dim] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ const Index curr_dim = m_block_stride_order[i];
+ strides[curr_dim] = strides[prev_dim] * sizes[prev_dim];
+ prev_dim = curr_dim;
+ }
+ } else {
+ for (int i = 0; i < static_cast<int>(NumDims) - 1; ++i) {
+ const Index idx = block_index / m_block_strides[i];
+ coords[i] = m_tensor_slice_offsets[i] + idx * m_block_dim_sizes[i];
+ sizes[i] = numext::mini(m_tensor_slice_offsets[i] + m_tensor_slice_extents[i] - coords[i],
+ m_block_dim_sizes[i]);
+ block_index -= idx * m_block_strides[i];
+ first_coeff_index += coords[i] * m_tensor_strides[i];
+ }
+ coords[NumDims - 1] = m_tensor_slice_offsets[NumDims - 1] +
+ block_index * m_block_dim_sizes[NumDims - 1];
+ sizes[NumDims - 1] = numext::mini(
+ m_tensor_slice_offsets[NumDims - 1] + m_tensor_slice_extents[NumDims - 1] - coords[NumDims - 1],
+ m_block_dim_sizes[NumDims - 1]);
+ first_coeff_index += coords[NumDims - 1] * m_tensor_strides[NumDims - 1];
+
+ Index prev_dim = m_block_stride_order[NumDims - 1];
+ strides[prev_dim] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ const Index curr_dim = m_block_stride_order[i];
+ strides[curr_dim] = strides[prev_dim] * sizes[prev_dim];
+ prev_dim = curr_dim;
+ }
+ }
+
+ return TensorBlock(first_coeff_index, sizes, strides, m_tensor_strides,
+ data);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index total_block_count() const {
+ return m_total_block_count;
+ }
+
+ private:
+ Dimensions m_tensor_dimensions;
+ Dimensions m_tensor_slice_offsets;
+ Dimensions m_tensor_slice_extents;
+ Dimensions m_tensor_strides;
+ Dimensions m_block_dim_sizes;
+ Dimensions m_block_stride_order;
+ Dimensions m_block_strides;
+ Index m_total_block_count;
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_BLOCK_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h
new file mode 100644
index 0000000000..7e6d00fad6
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorBroadcasting.h
@@ -0,0 +1,352 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H
+
+namespace Eigen {
+
+/** \class TensorBroadcasting
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor broadcasting class.
+ *
+ *
+ */
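+
+// Illustrative sketch (not part of the original header): the broadcast
+// factors give the number of copies along each dimension, so a (2, 3) tensor
+// broadcast by factors {2, 1} evaluates to a (4, 3) expression.
+//
+//   Eigen::Tensor<float, 2> t(2, 3);
+//   Eigen::array<int, 2> bcast{{2, 1}};
+//   Eigen::Tensor<float, 2> big = t.broadcast(bcast);  // shape (4, 3)
+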
+namespace internal {
+template<typename Broadcast, typename XprType>
+struct traits<TensorBroadcastingOp<Broadcast, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename Broadcast, typename XprType>
+struct eval<TensorBroadcastingOp<Broadcast, XprType>, Eigen::Dense>
+{
+ typedef const TensorBroadcastingOp<Broadcast, XprType>& type;
+};
+
+template<typename Broadcast, typename XprType>
+struct nested<TensorBroadcastingOp<Broadcast, XprType>, 1, typename eval<TensorBroadcastingOp<Broadcast, XprType> >::type>
+{
+ typedef TensorBroadcastingOp<Broadcast, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Broadcast, typename XprType>
+class TensorBroadcastingOp : public TensorBase<TensorBroadcastingOp<Broadcast, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorBroadcastingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorBroadcastingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorBroadcastingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorBroadcastingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorBroadcastingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorBroadcastingOp(const XprType& expr, const Broadcast& broadcast)
+ : m_xpr(expr), m_broadcast(broadcast) {}
+
+ EIGEN_DEVICE_FUNC
+ const Broadcast& broadcast() const { return m_broadcast; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Broadcast m_broadcast;
+};
+
+
+// Eval as rvalue
+template<typename Broadcast, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorBroadcastingOp<Broadcast, ArgType>, Device>
+{
+ typedef TensorBroadcastingOp<Broadcast, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions InputDimensions;
+ EIGEN_STATIC_ASSERT(NumDims == internal::array_size<Broadcast>::value, "Broadcast cannot change rank")
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ const Broadcast& broadcast = op.broadcast();
+ for (int i = 0; i < NumDims; ++i) {
+ eigen_assert(input_dims[i] > 0);
+ m_dimensions[i] = input_dims[i] * broadcast[i];
+ }
+
+ if (NumDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStrides[0] = 1;
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ }
+ } else {
+ // NumDims is always > 0 here, but use max to avoid compiler warning
+ m_inputStrides[numext::maxi(0, NumDims-1)] = 1;
+ m_outputStrides[numext::maxi(0, NumDims-1)] = 1;
+ for (int i = NumDims-2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1];
+ }
+ }
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffReturnType coeff(Index index) const
+ {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ return coeffColMajor(index);
+ } else {
+ return coeffRowMajor(index);
+ }
+ }
+
+ // TODO: attempt to speed this up. The integer divisions and modulo are slow
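+ // Worked example (ColMajor): with input dims (2, 3) broadcast by (1, 4) the output dims
+ // are (2, 12). For output index 13: idx = 13 / 2 = 6 with remainder 1; the input column
+ // is 6 % 3 = 0, so inputIndex = 0 * 2 + 1 = 1, i.e. input entry (1, 0).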
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeffColMajor(Index index) const
+ {
+ Index inputIndex = 0;
+ if (NumDims > 0) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ if (internal::index_statically_eq<Broadcast>()(i, 1)) {
+ eigen_assert(idx < m_impl.dimensions()[i]);
+ inputIndex += idx * m_inputStrides[i];
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(i, 1)) {
+ eigen_assert(idx % m_impl.dimensions()[i] == 0);
+ } else {
+ inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i];
+ }
+ }
+ index -= idx * m_outputStrides[i];
+ }
+ if (internal::index_statically_eq<Broadcast>()(0, 1)) {
+ eigen_assert(index < m_impl.dimensions()[0]);
+ inputIndex += index;
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(0, 1)) {
+ eigen_assert(index % m_impl.dimensions()[0] == 0);
+ } else {
+ inputIndex += (index % m_impl.dimensions()[0]);
+ }
+ }
+ }
+ return m_impl.coeff(inputIndex);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeffRowMajor(Index index) const
+ {
+ Index inputIndex = 0;
+ if (NumDims > 0) {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i];
+ if (internal::index_statically_eq<Broadcast>()(i, 1)) {
+ eigen_assert(idx < m_impl.dimensions()[i]);
+ inputIndex += idx * m_inputStrides[i];
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(i, 1)) {
+ eigen_assert(idx % m_impl.dimensions()[i] == 0);
+ } else {
+ inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i];
+ }
+ }
+ index -= idx * m_outputStrides[i];
+ }
+ if (internal::index_statically_eq<Broadcast>()(NumDims-1, 1)) {
+ eigen_assert(index < m_impl.dimensions()[NumDims-1]);
+ inputIndex += index;
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(NumDims-1, 1)) {
+ eigen_assert(index % m_impl.dimensions()[NumDims-1] == 0);
+ } else {
+ inputIndex += (index % m_impl.dimensions()[NumDims-1]);
+ }
+ }
+ }
+ return m_impl.coeff(inputIndex);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketReturnType packet(Index index) const
+ {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ return packetColMajor<LoadMode>(index);
+ } else {
+ return packetRowMajor<LoadMode>(index);
+ }
+ }
+
+ // Ignore the LoadMode and always use unaligned loads since we can't guarantee
+ // the alignment at compile time.
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetColMajor(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ const Index originalIndex = index;
+
+ Index inputIndex = 0;
+ Index innermostLoc = 0;
+ if (NumDims > 0) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ if (internal::index_statically_eq<Broadcast>()(i, 1)) {
+ eigen_assert(idx < m_impl.dimensions()[i]);
+ inputIndex += idx * m_inputStrides[i];
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(i, 1)) {
+ eigen_assert(idx % m_impl.dimensions()[i] == 0);
+ } else {
+ inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i];
+ }
+ }
+ index -= idx * m_outputStrides[i];
+ }
+ if (internal::index_statically_eq<Broadcast>()(0, 1)) {
+ eigen_assert(index < m_impl.dimensions()[0]);
+ innermostLoc = index;
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(0, 1)) {
+ eigen_assert(index % m_impl.dimensions()[0] == 0);
+ innermostLoc = 0;
+ } else {
+ innermostLoc = index % m_impl.dimensions()[0];
+ }
+ }
+ inputIndex += innermostLoc;
+ }
+
+ // TODO: this could be extended to the second dimension if we're not
+ // broadcasting along the first dimension, and so on.
+ if (innermostLoc + packetSize <= m_impl.dimensions()[0]) {
+ return m_impl.template packet<Unaligned>(inputIndex);
+ } else {
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ values[0] = m_impl.coeff(inputIndex);
+ for (int i = 1; i < packetSize; ++i) {
+ values[i] = coeffColMajor(originalIndex+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetRowMajor(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ const Index originalIndex = index;
+
+ Index inputIndex = 0;
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i];
+ if (internal::index_statically_eq<Broadcast>()(i, 1)) {
+ eigen_assert(idx < m_impl.dimensions()[i]);
+ inputIndex += idx * m_inputStrides[i];
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(i, 1)) {
+ eigen_assert(idx % m_impl.dimensions()[i] == 0);
+ } else {
+ inputIndex += (idx % m_impl.dimensions()[i]) * m_inputStrides[i];
+ }
+ }
+ index -= idx * m_outputStrides[i];
+ }
+ Index innermostLoc;
+ if (internal::index_statically_eq<Broadcast>()(NumDims-1, 1)) {
+ eigen_assert(index < m_impl.dimensions()[NumDims-1]);
+ innermostLoc = index;
+ } else {
+ if (internal::index_statically_eq<InputDimensions>()(NumDims-1, 1)) {
+ eigen_assert(index % m_impl.dimensions()[NumDims-1] == 0);
+ innermostLoc = 0;
+ } else {
+ innermostLoc = index % m_impl.dimensions()[NumDims-1];
+ }
+ }
+ inputIndex += innermostLoc;
+
+ // TODO: this could be extended to the second dimension if we're not
+ // broadcasting along the first dimension, and so on.
+ if (innermostLoc + packetSize <= m_impl.dimensions()[NumDims-1]) {
+ return m_impl.template packet<Unaligned>(inputIndex);
+ } else {
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ values[0] = m_impl.coeff(inputIndex);
+ for (int i = 1; i < packetSize; ++i) {
+ values[i] = coeffRowMajor(originalIndex+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_BROADCASTING_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h
new file mode 100644
index 0000000000..36c436a613
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorChipping.h
@@ -0,0 +1,510 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H
+
+namespace Eigen {
+
+/** \class TensorChippingOp
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief A chip is a thin slice of a tensor taken at a fixed offset along one
+ * dimension (a single row or column in the 2-d case); the result has one fewer
+ * dimension than the input.
+ *
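+ * Example (a sketch, assuming the chip() helper declared on TensorBase):
+ * \code
+ * Eigen::Tensor<float, 3> input(4, 5, 6);
+ * // Fix dimension 1 at offset 2; the chipped result drops that dimension, shape (4, 6).
+ * Eigen::Tensor<float, 2> slice = input.chip(2, 1);
+ * \endcode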
+ *
+ */
+
+namespace internal {
+template<DenseIndex DimId, typename XprType>
+struct traits<TensorChippingOp<DimId, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions - 1;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<DenseIndex DimId, typename XprType>
+struct eval<TensorChippingOp<DimId, XprType>, Eigen::Dense>
+{
+ typedef const TensorChippingOp<DimId, XprType>& type;
+};
+
+template<DenseIndex DimId, typename XprType>
+struct nested<TensorChippingOp<DimId, XprType>, 1, typename eval<TensorChippingOp<DimId, XprType> >::type>
+{
+ typedef TensorChippingOp<DimId, XprType> type;
+};
+
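+// DimensionId records which dimension is being chipped. When DimId is known at compile
+// time the primary template stores nothing; the DimensionId<Dynamic> specialization below
+// keeps the dimension as a run-time member instead.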
+template <DenseIndex DimId>
+struct DimensionId
+{
+ DimensionId(DenseIndex dim) {
+ eigen_assert(dim == DimId);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex actualDim() const {
+ return DimId;
+ }
+};
+template <>
+struct DimensionId<Dynamic>
+{
+ DimensionId(DenseIndex dim) : actual_dim(dim) {
+ eigen_assert(dim >= 0);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex actualDim() const {
+ return actual_dim;
+ }
+ private:
+ const DenseIndex actual_dim;
+};
+
+
+} // end namespace internal
+
+
+
+template<DenseIndex DimId, typename XprType>
+class TensorChippingOp : public TensorBase<TensorChippingOp<DimId, XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorChippingOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorChippingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorChippingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorChippingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorChippingOp(const XprType& expr, const Index offset, const Index dim)
+ : m_xpr(expr), m_offset(offset), m_dim(dim) {
+ }
+
+ EIGEN_DEVICE_FUNC
+ const Index offset() const { return m_offset; }
+ EIGEN_DEVICE_FUNC
+ const Index dim() const { return m_dim.actualDim(); }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorChippingOp& operator = (const TensorChippingOp& other)
+ {
+ typedef TensorAssignOp<TensorChippingOp, const TensorChippingOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorChippingOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorChippingOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Index m_offset;
+ const internal::DimensionId<DimId> m_dim;
+};
+
+
+// Eval as rvalue
+template<DenseIndex DimId, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorChippingOp<DimId, ArgType>, Device>
+{
+ typedef TensorChippingOp<DimId, ArgType> XprType;
+ static const int NumInputDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ static const int NumDims = NumInputDims-1;
+ typedef typename XprType::Index Index;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+
+ enum {
+ // Alignment can't be guaranteed at compile time since it depends on the
+ // slice offsets.
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumInputDims, Layout>
+ InputTensorBlock;
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumDims, Layout>
+ OutputTensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_dim(op.dim()), m_device(device)
+ {
+ EIGEN_STATIC_ASSERT(NumInputDims >= 1, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ eigen_assert(NumInputDims > m_dim.actualDim());
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ eigen_assert(op.offset() < input_dims[m_dim.actualDim()]);
+
+ int j = 0;
+ for (int i = 0; i < NumInputDims; ++i) {
+ if (i != m_dim.actualDim()) {
+ m_dimensions[j] = input_dims[i];
+ ++j;
+ }
+ }
+
+ m_stride = 1;
+ m_inputStride = 1;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < m_dim.actualDim(); ++i) {
+ m_stride *= input_dims[i];
+ m_inputStride *= input_dims[i];
+ }
+ } else {
+ for (int i = NumInputDims-1; i > m_dim.actualDim(); --i) {
+ m_stride *= input_dims[i];
+ m_inputStride *= input_dims[i];
+ }
+ }
+ m_inputStride *= input_dims[m_dim.actualDim()];
+ m_inputOffset = m_stride * op.offset();
+
+ if (BlockAccess) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumInputDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i - 1] * input_dims[i - 1];
+ }
+ } else {
+ m_inputStrides[NumInputDims - 1] = 1;
+ for (int i = NumInputDims - 2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i + 1] * input_dims[i + 1];
+ }
+ }
+
+ m_block_total_size_max = numext::maxi(static_cast<std::size_t>(1),
+ device.lastLevelCacheSize() /
+ sizeof(Scalar));
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(srcCoeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ if ((static_cast<int>(Layout) == static_cast<int>(ColMajor) &&
+ m_dim.actualDim() == 0) ||
+ (static_cast<int>(Layout) == static_cast<int>(RowMajor) &&
+ m_dim.actualDim() == NumInputDims - 1)) {
+ // m_stride is equal to 1, so let's avoid the integer division.
+ eigen_assert(m_stride == 1);
+ Index inputIndex = index * m_inputStride + m_inputOffset;
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = m_impl.coeff(inputIndex);
+ inputIndex += m_inputStride;
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ } else if ((static_cast<int>(Layout) == static_cast<int>(ColMajor) &&
+ m_dim.actualDim() == NumInputDims - 1) ||
+ (static_cast<int>(Layout) == static_cast<int>(RowMajor) &&
+ m_dim.actualDim() == 0)) {
+ // m_stride is always greater than index, so let's avoid the integer division.
+ eigen_assert(m_stride > index);
+ return m_impl.template packet<LoadMode>(index + m_inputOffset);
+ } else {
+ const Index idx = index / m_stride;
+ const Index rem = index - idx * m_stride;
+ if (rem + packetSize <= m_stride) {
+ Index inputIndex = idx * m_inputStride + m_inputOffset + rem;
+ return m_impl.template packet<LoadMode>(inputIndex);
+ } else {
+ // Cross the stride boundary. Fallback to slow path.
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index);
+ ++index;
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ resources->push_back(internal::TensorOpResourceRequirements(
+ internal::kSkewedInnerDims, m_block_total_size_max));
+ m_impl.getResourceRequirements(resources);
+ }
+
+ // TODO(andydavis) Reduce the overhead of this function (experiment with
+ // using a fixed block size).
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ OutputTensorBlock* output_block) const {
+ // Calculate input block sizes.
+ const DSizes<Index, NumDims>& output_block_sizes =
+ output_block->block_sizes();
+ const DSizes<Index, NumDims>& output_block_strides =
+ output_block->block_strides();
+ const Index chip_dim = m_dim.actualDim();
+ DSizes<Index, NumInputDims> input_block_sizes;
+ DSizes<Index, NumInputDims> input_block_strides;
+ for (Index i = 0; i < NumInputDims; ++i) {
+ if (i < chip_dim) {
+ input_block_sizes[i] = output_block_sizes[i];
+ input_block_strides[i] = output_block_strides[i];
+ } else if (i > chip_dim) {
+ input_block_sizes[i] = output_block_sizes[i - 1];
+ input_block_strides[i] = output_block_strides[i - 1];
+ } else {
+ input_block_sizes[i] = 1;
+ }
+ }
+ // Fix up input_block_stride for chip dimension.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ if (chip_dim == 0) {
+ input_block_strides[chip_dim] = 1;
+ } else {
+ input_block_strides[chip_dim] = input_block_strides[chip_dim - 1] *
+ input_block_sizes[chip_dim - 1];
+ }
+ } else {
+ if (chip_dim == NumInputDims - 1) {
+ input_block_strides[chip_dim] = 1;
+ } else {
+ input_block_strides[chip_dim] = input_block_strides[chip_dim + 1] *
+ input_block_sizes[chip_dim + 1];
+ }
+ }
+ // Instantiate and read input block from input tensor.
+ InputTensorBlock input_block(srcCoeff(output_block->first_coeff_index()),
+ input_block_sizes,
+ input_block_strides,
+ m_inputStrides,
+ output_block->data());
+ m_impl.block(&input_block);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType* data() const {
+ CoeffReturnType* result = const_cast<CoeffReturnType*>(m_impl.data());
+ if (((static_cast<int>(Layout) == static_cast<int>(ColMajor) &&
+ m_dim.actualDim() == NumDims) ||
+ (static_cast<int>(Layout) == static_cast<int>(RowMajor) &&
+ m_dim.actualDim() == 0)) &&
+ result) {
+ return result + m_inputOffset;
+ } else {
+ return NULL;
+ }
+ }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const
+ {
+ Index inputIndex;
+ if ((static_cast<int>(Layout) == static_cast<int>(ColMajor) &&
+ m_dim.actualDim() == 0) ||
+ (static_cast<int>(Layout) == static_cast<int>(RowMajor) &&
+ m_dim.actualDim() == NumInputDims - 1)) {
+ // m_stride is equal to 1, so let's avoid the integer division.
+ eigen_assert(m_stride == 1);
+ inputIndex = index * m_inputStride + m_inputOffset;
+ } else if ((static_cast<int>(Layout) == static_cast<int>(ColMajor) &&
+ m_dim.actualDim() == NumInputDims - 1) ||
+ (static_cast<int>(Layout) == static_cast<int>(RowMajor) &&
+ m_dim.actualDim() == 0)) {
+ // m_stride is always greater than index, so let's avoid the integer division.
+ eigen_assert(m_stride > index);
+ inputIndex = index + m_inputOffset;
+ } else {
+ const Index idx = index / m_stride;
+ inputIndex = idx * m_inputStride + m_inputOffset;
+ index -= idx * m_stride;
+ inputIndex += index;
+ }
+ return inputIndex;
+ }
+
+ Dimensions m_dimensions;
+ Index m_stride;
+ Index m_inputOffset;
+ Index m_inputStride;
+ DSizes<Index, NumInputDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ const internal::DimensionId<DimId> m_dim;
+ const Device& m_device;
+ std::size_t m_block_total_size_max;
+};
+
+
+// Eval as lvalue
+template<DenseIndex DimId, typename ArgType, typename Device>
+struct TensorEvaluator<TensorChippingOp<DimId, ArgType>, Device>
+ : public TensorEvaluator<const TensorChippingOp<DimId, ArgType>, Device>
+{
+ typedef TensorEvaluator<const TensorChippingOp<DimId, ArgType>, Device> Base;
+ typedef TensorChippingOp<DimId, ArgType> XprType;
+ static const int NumInputDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ static const int NumDims = NumInputDims-1;
+ typedef typename XprType::Index Index;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device)
+ { }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumInputDims, Layout>
+ InputTensorBlock;
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumDims, Layout>
+ OutputTensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(this->srcCoeff(index));
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ static const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+
+ if ((static_cast<int>(this->Layout) == static_cast<int>(ColMajor) &&
+ this->m_dim.actualDim() == 0) ||
+ (static_cast<int>(this->Layout) == static_cast<int>(RowMajor) &&
+ this->m_dim.actualDim() == NumInputDims - 1)) {
+ // m_stride is equal to 1, so let's avoid the integer division.
+ eigen_assert(this->m_stride == 1);
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ Index inputIndex = index * this->m_inputStride + this->m_inputOffset;
+ for (int i = 0; i < packetSize; ++i) {
+ this->m_impl.coeffRef(inputIndex) = values[i];
+ inputIndex += this->m_inputStride;
+ }
+ } else if ((static_cast<int>(this->Layout) == static_cast<int>(ColMajor) &&
+ this->m_dim.actualDim() == NumInputDims - 1) ||
+ (static_cast<int>(this->Layout) == static_cast<int>(RowMajor) &&
+ this->m_dim.actualDim() == 0)) {
+ // m_stride is always greater than index, so let's avoid the integer division.
+ eigen_assert(this->m_stride > index);
+ this->m_impl.template writePacket<StoreMode>(index + this->m_inputOffset, x);
+ } else {
+ const Index idx = index / this->m_stride;
+ const Index rem = index - idx * this->m_stride;
+ if (rem + packetSize <= this->m_stride) {
+ const Index inputIndex = idx * this->m_inputStride + this->m_inputOffset + rem;
+ this->m_impl.template writePacket<StoreMode>(inputIndex, x);
+ } else {
+ // Cross stride boundary. Fallback to slow path.
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ for (int i = 0; i < packetSize; ++i) {
+ this->coeffRef(index) = values[i];
+ ++index;
+ }
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock(
+ const OutputTensorBlock& output_block) {
+ // Calculate input block sizes.
+ const DSizes<Index, NumDims>& output_block_sizes =
+ output_block.block_sizes();
+ const DSizes<Index, NumDims>& output_block_strides =
+ output_block.block_strides();
+ const Index chip_dim = this->m_dim.actualDim();
+ DSizes<Index, NumInputDims> input_block_sizes;
+ DSizes<Index, NumInputDims> input_block_strides;
+ for (Index i = 0; i < NumInputDims; ++i) {
+ if (i < chip_dim) {
+ input_block_sizes[i] = output_block_sizes[i];
+ input_block_strides[i] = output_block_strides[i];
+ } else if (i > chip_dim) {
+ input_block_sizes[i] = output_block_sizes[i - 1];
+ input_block_strides[i] = output_block_strides[i - 1];
+ } else {
+ input_block_sizes[i] = 1;
+ }
+ }
+ // Fix up input_block_stride for chip dimension.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ if (chip_dim == 0) {
+ input_block_strides[chip_dim] = 1;
+ } else {
+ input_block_strides[chip_dim] = input_block_strides[chip_dim - 1] *
+ input_block_sizes[chip_dim - 1];
+ }
+ } else {
+ if (chip_dim == NumInputDims - 1) {
+ input_block_strides[chip_dim] = 1;
+ } else {
+ input_block_strides[chip_dim] = input_block_strides[chip_dim + 1] *
+ input_block_sizes[chip_dim + 1];
+ }
+ }
+ // Write input block.
+ this->m_impl.writeBlock(
+ InputTensorBlock(this->srcCoeff(output_block.first_coeff_index()),
+ input_block_sizes,
+ input_block_strides,
+ this->m_inputStrides,
+ const_cast<ScalarNonConst*>(output_block.data())));
+ }
+
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CHIPPING_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h
new file mode 100644
index 0000000000..54d9e5f2c8
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConcatenation.h
@@ -0,0 +1,350 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H
+
+namespace Eigen {
+
+/** \class TensorConcatenationOp
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Concatenates two tensors along a given axis. The inputs must have the
+ * same rank and matching sizes in every dimension other than the concatenation axis.
+ *
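+ * Example (a sketch, assuming the concatenate() helper declared on TensorBase):
+ * \code
+ * Eigen::Tensor<float, 2> a(2, 3), b(2, 5);
+ * // Concatenate along dimension 1; dimension 0 must match. The result has shape (2, 8).
+ * Eigen::Tensor<float, 2> c = a.concatenate(b, 1);
+ * \endcode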
+ *
+ */
+namespace internal {
+template<typename Axis, typename LhsXprType, typename RhsXprType>
+struct traits<TensorConcatenationOp<Axis, LhsXprType, RhsXprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef typename promote_storage_type<typename LhsXprType::Scalar,
+ typename RhsXprType::Scalar>::ret Scalar;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename promote_storage_type<typename traits<LhsXprType>::StorageKind,
+ typename traits<RhsXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<LhsXprType>::Index,
+ typename traits<RhsXprType>::Index>::type Index;
+ typedef typename LhsXprType::Nested LhsNested;
+ typedef typename RhsXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const int NumDimensions = traits<LhsXprType>::NumDimensions;
+ static const int Layout = traits<LhsXprType>::Layout;
+ enum { Flags = 0 };
+};
+
+template<typename Axis, typename LhsXprType, typename RhsXprType>
+struct eval<TensorConcatenationOp<Axis, LhsXprType, RhsXprType>, Eigen::Dense>
+{
+ typedef const TensorConcatenationOp<Axis, LhsXprType, RhsXprType>& type;
+};
+
+template<typename Axis, typename LhsXprType, typename RhsXprType>
+struct nested<TensorConcatenationOp<Axis, LhsXprType, RhsXprType>, 1, typename eval<TensorConcatenationOp<Axis, LhsXprType, RhsXprType> >::type>
+{
+ typedef TensorConcatenationOp<Axis, LhsXprType, RhsXprType> type;
+};
+
+} // end namespace internal
+
+
+template<typename Axis, typename LhsXprType, typename RhsXprType>
+class TensorConcatenationOp : public TensorBase<TensorConcatenationOp<Axis, LhsXprType, RhsXprType>, WriteAccessors>
+{
+ public:
+ typedef typename internal::traits<TensorConcatenationOp>::Scalar Scalar;
+ typedef typename internal::traits<TensorConcatenationOp>::Packet Packet;
+ typedef typename internal::traits<TensorConcatenationOp>::StorageKind StorageKind;
+ typedef typename internal::traits<TensorConcatenationOp>::Index Index;
+ typedef typename internal::nested<TensorConcatenationOp>::type Nested;
+ typedef typename internal::promote_storage_type<typename LhsXprType::CoeffReturnType,
+ typename RhsXprType::CoeffReturnType>::ret CoeffReturnType;
+ typedef typename internal::promote_storage_type<typename LhsXprType::PacketReturnType,
+ typename RhsXprType::PacketReturnType>::ret PacketReturnType;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConcatenationOp(const LhsXprType& lhs, const RhsXprType& rhs, Axis axis)
+ : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_axis(axis) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename LhsXprType::Nested>::type&
+ lhsExpression() const { return m_lhs_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename RhsXprType::Nested>::type&
+ rhsExpression() const { return m_rhs_xpr; }
+
+ EIGEN_DEVICE_FUNC const Axis& axis() const { return m_axis; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorConcatenationOp& operator = (const TensorConcatenationOp& other)
+ {
+ typedef TensorAssignOp<TensorConcatenationOp, const TensorConcatenationOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorConcatenationOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorConcatenationOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename LhsXprType::Nested m_lhs_xpr;
+ typename RhsXprType::Nested m_rhs_xpr;
+ const Axis m_axis;
+};
+
+
+// Eval as rvalue
+template<typename Axis, typename LeftArgType, typename RightArgType, typename Device>
+struct TensorEvaluator<const TensorConcatenationOp<Axis, LeftArgType, RightArgType>, Device>
+{
+ typedef TensorConcatenationOp<Axis, LeftArgType, RightArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<LeftArgType, Device>::Dimensions>::value;
+ static const int RightNumDims = internal::array_size<typename TensorEvaluator<RightArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<LeftArgType, Device>::PacketAccess &
+ TensorEvaluator<RightArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_leftImpl(op.lhsExpression(), device), m_rightImpl(op.rhsExpression(), device), m_axis(op.axis())
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<LeftArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<RightArgType, Device>::Layout) || NumDims == 1), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT(NumDims == RightNumDims, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(0 <= m_axis && m_axis < NumDims);
+ const Dimensions& lhs_dims = m_leftImpl.dimensions();
+ const Dimensions& rhs_dims = m_rightImpl.dimensions();
+ int i = 0;
+ for (; i < m_axis; ++i) {
+ eigen_assert(lhs_dims[i] > 0);
+ eigen_assert(lhs_dims[i] == rhs_dims[i]);
+ m_dimensions[i] = lhs_dims[i];
+ }
+ eigen_assert(lhs_dims[i] > 0); // Now i == m_axis.
+ eigen_assert(rhs_dims[i] > 0);
+ m_dimensions[i] = lhs_dims[i] + rhs_dims[i];
+ for (++i; i < NumDims; ++i) {
+ eigen_assert(lhs_dims[i] > 0);
+ eigen_assert(lhs_dims[i] == rhs_dims[i]);
+ m_dimensions[i] = lhs_dims[i];
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_leftStrides[0] = 1;
+ m_rightStrides[0] = 1;
+ m_outputStrides[0] = 1;
+
+ for (int i = 1; i < NumDims; ++i) {
+ m_leftStrides[i] = m_leftStrides[i-1] * lhs_dims[i-1];
+ m_rightStrides[i] = m_rightStrides[i-1] * rhs_dims[i-1];
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ }
+ } else {
+ m_leftStrides[NumDims - 1] = 1;
+ m_rightStrides[NumDims - 1] = 1;
+ m_outputStrides[NumDims - 1] = 1;
+
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_leftStrides[i] = m_leftStrides[i+1] * lhs_dims[i+1];
+ m_rightStrides[i] = m_rightStrides[i+1] * rhs_dims[i+1];
+ m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1];
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ // TODO(phli): Add short-circuit memcpy evaluation if underlying data are linear?
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/)
+ {
+ m_leftImpl.evalSubExprsIfNeeded(NULL);
+ m_rightImpl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup()
+ {
+ m_leftImpl.cleanup();
+ m_rightImpl.cleanup();
+ }
+
+ // TODO(phli): attempt to speed this up. The integer divisions and modulo are slow.
+ // See CL/76180724 comments for more ideas.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ // Collect dimension-wise indices (subs).
+ array<Index, NumDims> subs;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ subs[i] = index / m_outputStrides[i];
+ index -= subs[i] * m_outputStrides[i];
+ }
+ subs[0] = index;
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ subs[i] = index / m_outputStrides[i];
+ index -= subs[i] * m_outputStrides[i];
+ }
+ subs[NumDims - 1] = index;
+ }
+
+ const Dimensions& left_dims = m_leftImpl.dimensions();
+ if (subs[m_axis] < left_dims[m_axis]) {
+ Index left_index;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ left_index = subs[0];
+ for (int i = 1; i < NumDims; ++i) {
+ left_index += (subs[i] % left_dims[i]) * m_leftStrides[i];
+ }
+ } else {
+ left_index = subs[NumDims - 1];
+ for (int i = NumDims - 2; i >= 0; --i) {
+ left_index += (subs[i] % left_dims[i]) * m_leftStrides[i];
+ }
+ }
+ return m_leftImpl.coeff(left_index);
+ } else {
+ subs[m_axis] -= left_dims[m_axis];
+ const Dimensions& right_dims = m_rightImpl.dimensions();
+ Index right_index;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ right_index = subs[0];
+ for (int i = 1; i < NumDims; ++i) {
+ right_index += (subs[i] % right_dims[i]) * m_rightStrides[i];
+ }
+ } else {
+ right_index = subs[NumDims - 1];
+ for (int i = NumDims - 2; i >= 0; --i) {
+ right_index += (subs[i] % right_dims[i]) * m_rightStrides[i];
+ }
+ }
+ return m_rightImpl.coeff(right_index);
+ }
+ }
+
+ // TODO(phli): Add a real vectorization.
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ static const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index + packetSize - 1 < dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_leftStrides;
+ array<Index, NumDims> m_rightStrides;
+ TensorEvaluator<LeftArgType, Device> m_leftImpl;
+ TensorEvaluator<RightArgType, Device> m_rightImpl;
+ const Axis m_axis;
+};
+
+// Eval as lvalue
+template<typename Axis, typename LeftArgType, typename RightArgType, typename Device>
+ struct TensorEvaluator<TensorConcatenationOp<Axis, LeftArgType, RightArgType>, Device>
+ : public TensorEvaluator<const TensorConcatenationOp<Axis, LeftArgType, RightArgType>, Device>
+{
+ typedef TensorEvaluator<const TensorConcatenationOp<Axis, LeftArgType, RightArgType>, Device> Base;
+ typedef TensorConcatenationOp<Axis, LeftArgType, RightArgType> XprType;
+ typedef typename Base::Dimensions Dimensions;
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<LeftArgType, Device>::PacketAccess &
+ TensorEvaluator<RightArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(XprType& op, const Device& device)
+ : Base(op, device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(Layout) == static_cast<int>(ColMajor)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index)
+ {
+ // Collect dimension-wise indices (subs).
+ array<Index, Base::NumDims> subs;
+ for (int i = Base::NumDims - 1; i > 0; --i) {
+ subs[i] = index / this->m_outputStrides[i];
+ index -= subs[i] * this->m_outputStrides[i];
+ }
+ subs[0] = index;
+
+ const Dimensions& left_dims = this->m_leftImpl.dimensions();
+ if (subs[this->m_axis] < left_dims[this->m_axis]) {
+ Index left_index = subs[0];
+ for (int i = 1; i < Base::NumDims; ++i) {
+ left_index += (subs[i] % left_dims[i]) * this->m_leftStrides[i];
+ }
+ return this->m_leftImpl.coeffRef(left_index);
+ } else {
+ subs[this->m_axis] -= left_dims[this->m_axis];
+ const Dimensions& right_dims = this->m_rightImpl.dimensions();
+ Index right_index = subs[0];
+ for (int i = 1; i < Base::NumDims; ++i) {
+ right_index += (subs[i] % right_dims[i]) * this->m_rightStrides[i];
+ }
+ return this->m_rightImpl.coeffRef(right_index);
+ }
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ static const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index + packetSize - 1 < this->dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ for (int i = 0; i < packetSize; ++i) {
+ coeffRef(index+i) = values[i];
+ }
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONCATENATION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
new file mode 100644
index 0000000000..7fb384c65e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContraction.h
@@ -0,0 +1,635 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Eric Martin <eric@ericmart.in>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H
+
+namespace Eigen {
+
+/** \class TensorContraction
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Contracts two tensors over the given pairs of indices, generalizing
+ * matrix multiplication to tensors of arbitrary rank.
+ *
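+ * Example (a sketch, assuming the contract() helper declared on TensorBase): an ordinary
+ * matrix product contracts the columns of the left operand with the rows of the right one.
+ * \code
+ * Eigen::Tensor<float, 2> a(2, 3), b(3, 4);
+ * Eigen::array<Eigen::IndexPair<int>, 1> dims = {{Eigen::IndexPair<int>(1, 0)}};
+ * Eigen::Tensor<float, 2> c = a.contract(b, dims);  // shape (2, 4)
+ * \endcode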
+ *
+ */
+namespace internal {
+template<typename Dimensions, typename LhsXprType, typename RhsXprType>
+struct traits<TensorContractionOp<Dimensions, LhsXprType, RhsXprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef typename scalar_product_traits<typename LhsXprType::Scalar, typename RhsXprType::Scalar>::ReturnType Scalar;
+
+ typedef typename scalar_product_traits<typename traits<LhsXprType>::StorageKind,
+ typename traits<RhsXprType>::StorageKind>::ReturnType StorageKind;
+ typedef typename promote_index_type<typename traits<LhsXprType>::Index,
+ typename traits<RhsXprType>::Index>::type Index;
+ typedef typename LhsXprType::Nested LhsNested;
+ typedef typename RhsXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+
+ // From NumDims below.
+ static const int NumDimensions = traits<LhsXprType>::NumDimensions + traits<RhsXprType>::NumDimensions - 2 * array_size<Dimensions>::value;
+ static const int Layout = traits<LhsXprType>::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename Dimensions, typename LhsXprType, typename RhsXprType>
+struct eval<TensorContractionOp<Dimensions, LhsXprType, RhsXprType>, Eigen::Dense>
+{
+ typedef const TensorContractionOp<Dimensions, LhsXprType, RhsXprType>& type;
+};
+
+template<typename Dimensions, typename LhsXprType, typename RhsXprType>
+struct nested<TensorContractionOp<Dimensions, LhsXprType, RhsXprType>, 1, typename eval<TensorContractionOp<Dimensions, LhsXprType, RhsXprType> >::type>
+{
+ typedef TensorContractionOp<Dimensions, LhsXprType, RhsXprType> type;
+};
+
+template<typename Indices_, typename LeftArgType_, typename RightArgType_, typename Device_>
+struct traits<TensorEvaluator<const TensorContractionOp<Indices_, LeftArgType_, RightArgType_>, Device_> > {
+ typedef Indices_ Indices;
+ typedef LeftArgType_ LeftArgType;
+ typedef RightArgType_ RightArgType;
+ typedef Device_ Device;
+
+ // From NumDims below.
+ static const int NumDimensions = traits<LeftArgType_>::NumDimensions + traits<RightArgType_>::NumDimensions - 2 * array_size<Indices_>::value;
+};
+
+} // end namespace internal
+
+template<typename Indices, typename LhsXprType, typename RhsXprType>
+class TensorContractionOp : public TensorBase<TensorContractionOp<Indices, LhsXprType, RhsXprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorContractionOp>::Scalar Scalar;
+ typedef typename internal::scalar_product_traits<typename LhsXprType::CoeffReturnType,
+ typename RhsXprType::CoeffReturnType>::ReturnType CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorContractionOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorContractionOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorContractionOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorContractionOp(
+ const LhsXprType& lhs, const RhsXprType& rhs, const Indices& dims)
+ : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_indices(dims) {}
+
+ EIGEN_DEVICE_FUNC const Indices& indices() const { return m_indices; }
+
+ /** \returns the nested expressions */
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename LhsXprType::Nested>::type&
+ lhsExpression() const { return m_lhs_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename RhsXprType::Nested>::type&
+ rhsExpression() const { return m_rhs_xpr; }
+
+ protected:
+ typename LhsXprType::Nested m_lhs_xpr;
+ typename RhsXprType::Nested m_rhs_xpr;
+ const Indices m_indices;
+};
+
+
+template<typename Derived>
+struct TensorContractionEvaluatorBase
+{
+ typedef typename internal::traits<Derived>::Indices Indices;
+ typedef typename internal::traits<Derived>::LeftArgType LeftArgType;
+ typedef typename internal::traits<Derived>::RightArgType RightArgType;
+ typedef typename internal::traits<Derived>::Device Device;
+
+ typedef TensorContractionOp<Indices, LeftArgType, RightArgType> XprType;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ enum {
+ IsAligned = true,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ // Most of the code is assuming that both input tensors are ColMajor. If the
+ // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS:
+ // If we want to compute A * B = C, where A is LHS and B is RHS, the code
+ // will pretend B is LHS and A is RHS.
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType;
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType;
+
+ static const int LDims =
+ internal::array_size<typename TensorEvaluator<EvalLeftArgType, Device>::Dimensions>::value;
+ static const int RDims =
+ internal::array_size<typename TensorEvaluator<EvalRightArgType, Device>::Dimensions>::value;
+ static const int ContractDims = internal::array_size<Indices>::value;
+ static const int NumDims = LDims + RDims - 2 * ContractDims;
+
+ typedef array<Index, LDims> left_dim_mapper_t;
+ typedef array<Index, RDims> right_dim_mapper_t;
+ typedef array<Index, ContractDims> contract_t;
+ typedef array<Index, LDims - ContractDims> left_nocontract_t;
+ typedef array<Index, RDims - ContractDims> right_nocontract_t;
+
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorContractionEvaluatorBase(const XprType& op, const Device& device)
+ : m_leftImpl(choose(Cond<static_cast<int>(Layout) == static_cast<int>(ColMajor)>(),
+ op.lhsExpression(), op.rhsExpression()), device),
+ m_rightImpl(choose(Cond<static_cast<int>(Layout) == static_cast<int>(ColMajor)>(),
+ op.rhsExpression(), op.lhsExpression()), device),
+ m_device(device),
+ m_result(NULL) {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<LeftArgType, Device>::Layout) ==
+ static_cast<int>(TensorEvaluator<RightArgType, Device>::Layout)),
+ YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ eigen_assert((contract_t::size > 0) && "Must contract on some indices");
+
+
+ DSizes<Index, LDims> eval_left_dims;
+ DSizes<Index, RDims> eval_right_dims;
+ array<IndexPair<Index>, ContractDims> eval_op_indices;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ // For ColMajor, we keep using the existing dimensions
+ for (int i = 0; i < LDims; i++) {
+ eval_left_dims[i] = m_leftImpl.dimensions()[i];
+ }
+ for (int i = 0; i < RDims; i++) {
+ eval_right_dims[i] = m_rightImpl.dimensions()[i];
+ }
+ // We keep the pairs of contracting indices.
+ for (int i = 0; i < ContractDims; i++) {
+ eval_op_indices[i].first = op.indices()[i].first;
+ eval_op_indices[i].second = op.indices()[i].second;
+ }
+ } else {
+ // For RowMajor, we need to reverse the existing dimensions
+ for (int i = 0; i < LDims; i++) {
+ eval_left_dims[i] = m_leftImpl.dimensions()[LDims - i - 1];
+ }
+ for (int i = 0; i < RDims; i++) {
+ eval_right_dims[i] = m_rightImpl.dimensions()[RDims - i - 1];
+ }
+ // We need to flip all the pairs of contracting indices as well as
+ // reversing the dimensions.
+ for (int i = 0; i < ContractDims; i++) {
+ eval_op_indices[i].first = LDims - 1 - op.indices()[ContractDims - 1 - i].second;
+ eval_op_indices[i].second = RDims - 1 - op.indices()[ContractDims - 1 - i].first;
+ }
+ }
+
+ array<Index, LDims> lhs_strides;
+ if (LDims > 0) {
+ lhs_strides[0] = 1;
+ for (int i = 0; i < LDims-1; ++i) {
+ lhs_strides[i+1] = lhs_strides[i] * eval_left_dims[i];
+ }
+ }
+
+ array<Index, RDims> rhs_strides;
+ if (RDims > 0) {
+ rhs_strides[0] = 1;
+ for (int i = 0; i < RDims-1; ++i) {
+ rhs_strides[i+1] = rhs_strides[i] * eval_right_dims[i];
+ }
+ }
+
+ if (m_i_strides.size() > 0) m_i_strides[0] = 1;
+ if (m_j_strides.size() > 0) m_j_strides[0] = 1;
+ if (m_k_strides.size() > 0) m_k_strides[0] = 1;
+
+ m_i_size = 1;
+ m_j_size = 1;
+ m_k_size = 1;
+
+ // To compute the dimension, we simply concatenate the non-contracting
+ // dimensions of the left and then the right tensor. Additionally, I also
+ // want to compute the cumulative products of the left non-contracting
+ // dimensions, right non-contracting dimensions, and the contracting
+ // dimensions (in the order of the contraction) to aid in the later
+ // computation of tensor indices for matrix indices.
+ m_lhs_inner_dim_contiguous = true;
+ int dim_idx = 0;
+ int nocontract_idx = 0;
+
+ for (int i = 0; i < LDims; i++) {
+ // find if we are contracting on index i of left tensor
+ bool contracting = false;
+ for (int j = 0; j < ContractDims; j++) {
+ if (eval_op_indices[j].first == i) {
+ contracting = true;
+ break;
+ }
+ }
+ if (!contracting) {
+ // add dimension size to output dimensions
+ m_dimensions[dim_idx] = eval_left_dims[i];
+ m_left_nocontract_strides[nocontract_idx] = lhs_strides[i];
+ if (dim_idx != i) {
+ m_lhs_inner_dim_contiguous = false;
+ }
+ if (nocontract_idx+1 < internal::array_size<left_nocontract_t>::value) {
+ m_i_strides[nocontract_idx+1] =
+ m_i_strides[nocontract_idx] * eval_left_dims[i];
+ } else {
+ m_i_size = m_i_strides[nocontract_idx] * eval_left_dims[i];
+ }
+ dim_idx++;
+ nocontract_idx++;
+ }
+ }
+
+ nocontract_idx = 0;
+ for (int i = 0; i < RDims; i++) {
+ bool contracting = false;
+ // find if we are contracting on index i of right tensor
+ for (int j = 0; j < ContractDims; j++) {
+ if (eval_op_indices[j].second == i) {
+ contracting = true;
+ break;
+ }
+ }
+ if (!contracting) {
+ m_dimensions[dim_idx] = eval_right_dims[i];
+ if (nocontract_idx+1 < internal::array_size<right_nocontract_t>::value) {
+ m_j_strides[nocontract_idx+1] =
+ m_j_strides[nocontract_idx] * eval_right_dims[i];
+ } else {
+ m_j_size = m_j_strides[nocontract_idx] * eval_right_dims[i];
+ }
+ m_right_nocontract_strides[nocontract_idx] = rhs_strides[i];
+ dim_idx++;
+ nocontract_idx++;
+ }
+ }
+
+ // now build contraction cumprod. We assumed above that non-contracting axes
+ // are represented in the same order in the matrix as they are in the tensor.
+ // This is not the case for contracting axes. As the contracting axes must be
+ // of the same size in each tensor, I'll only look at the first tensor here.
+ m_rhs_inner_dim_contiguous = true;
+ m_rhs_inner_dim_reordered = false;
+ for (int i = 0; i < ContractDims; i++) {
+ Index left = eval_op_indices[i].first;
+ Index right = eval_op_indices[i].second;
+
+ Index size = eval_left_dims[left];
+ eigen_assert(size == eval_right_dims[right] &&
+ "Contraction axes must be same size");
+
+ if (i+1 < internal::array_size<contract_t>::value) {
+ m_k_strides[i+1] = m_k_strides[i] * size;
+ } else {
+ m_k_size = m_k_strides[i] * size;
+ }
+ m_left_contracting_strides[i] = lhs_strides[left];
+ m_right_contracting_strides[i] = rhs_strides[right];
+
+ if (i > 0 && right < eval_op_indices[i-1].second) {
+ m_rhs_inner_dim_reordered = true;
+ }
+ if (right != i) {
+ m_rhs_inner_dim_contiguous = false;
+ }
+ }
+
+ // If the layout is RowMajor, we need to reverse the m_dimensions
+ if (static_cast<int>(Layout) == static_cast<int>(RowMajor)) {
+ for (int i = 0, j = NumDims - 1; i < j; i++, j--) {
+ numext::swap(m_dimensions[i], m_dimensions[j]);
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) {
+ m_leftImpl.evalSubExprsIfNeeded(NULL);
+ m_rightImpl.evalSubExprsIfNeeded(NULL);
+ if (data) {
+ evalTo(data);
+ return false;
+ } else {
+ m_result = static_cast<Scalar *>(m_device.allocate(dimensions().TotalSize() * sizeof(Scalar)));
+ evalTo(m_result);
+ return true;
+ }
+ }
+
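+ // evalTo dispatches to the derived evaluator's evalProduct(), selecting at compile time
+ // among the eight combinations of lhs/rhs inner-dimension contiguity and rhs reordering
+ // computed in the constructor above.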
+ EIGEN_DEVICE_FUNC void evalTo(Scalar* buffer) const {
+ if (this->m_lhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_reordered) {
+ static_cast<const Derived*>(this)->template evalProduct<true, true, true, Unaligned>(buffer);
+ }
+ else {
+ static_cast<const Derived*>(this)->template evalProduct<true, true, false, Unaligned>(buffer);
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_reordered) {
+ static_cast<const Derived*>(this)->template evalProduct<true, false, true, Unaligned>(buffer);
+ }
+ else {
+ static_cast<const Derived*>(this)->template evalProduct<true, false, false, Unaligned>(buffer);
+ }
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_reordered) {
+ static_cast<const Derived*>(this)->template evalProduct<false, true, true, Unaligned>(buffer);
+ }
+ else {
+ static_cast<const Derived*>(this)->template evalProduct<false, true, false, Unaligned>(buffer);
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_reordered) {
+ static_cast<const Derived*>(this)->template evalProduct<false, false, true, Unaligned>(buffer);
+ }
+ else {
+ static_cast<const Derived*>(this)->template evalProduct<false, false, false, Unaligned>(buffer);
+ }
+ }
+ }
+ }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalGemv(Scalar* buffer) const {
+ const Index rows = m_i_size;
+ const Index cols = m_k_size;
+
+ typedef typename internal::remove_const<typename EvalLeftArgType::Scalar>::type LhsScalar;
+ typedef typename internal::remove_const<typename EvalRightArgType::Scalar>::type RhsScalar;
+ typedef TensorEvaluator<EvalLeftArgType, Device> LeftEvaluator;
+ typedef TensorEvaluator<EvalRightArgType, Device> RightEvaluator;
+ const int lhs_packet_size = PacketType<LhsScalar, Device>::size;
+ const int rhs_packet_size = PacketType<RhsScalar, Device>::size;
+ typedef internal::TensorContractionInputMapper<LhsScalar, Index, internal::Lhs,
+ LeftEvaluator, left_nocontract_t,
+ contract_t, lhs_packet_size,
+ lhs_inner_dim_contiguous,
+ false, Unaligned> LhsMapper;
+
+ typedef internal::TensorContractionInputMapper<RhsScalar, Index, internal::Rhs,
+ RightEvaluator, right_nocontract_t,
+ contract_t, rhs_packet_size,
+ rhs_inner_dim_contiguous,
+ rhs_inner_dim_reordered, Unaligned> RhsMapper;
+
+ LhsMapper lhs(m_leftImpl, m_left_nocontract_strides, m_i_strides,
+ m_left_contracting_strides, m_k_strides);
+ RhsMapper rhs(m_rightImpl, m_right_nocontract_strides, m_j_strides,
+ m_right_contracting_strides, m_k_strides);
+
+ const RhsScalar alpha(1);
+ const Index resIncr(1);
+
+ // Zero out the result buffer (which must be of size at least rows * sizeof(Scalar)).
+ m_device.memset(buffer, 0, rows * sizeof(Scalar));
+
+ internal::general_matrix_vector_product<Index,LhsScalar,LhsMapper,ColMajor,false,RhsScalar,RhsMapper,false>::run(
+ rows, cols, lhs, rhs,
+ buffer, resIncr, alpha);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_leftImpl.cleanup();
+ m_rightImpl.cleanup();
+
+ if (m_result != NULL) {
+ m_device.deallocate(m_result);
+ m_result = NULL;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ return m_result[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_result + index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_result; }
+
+ protected:
+ // Note: nvcc doesn't like implicit copy constructor. If this is needed anywhere,
+ // then we'll have to write an explicit copy constructor...
+ //TensorContractionEvaluatorBase(const TensorContractionEvaluatorBase&);
+
+ TensorContractionEvaluatorBase& operator = (const TensorContractionEvaluatorBase&);
+ Dimensions m_dimensions;
+
+ contract_t m_k_strides;
+ contract_t m_left_contracting_strides;
+ contract_t m_right_contracting_strides;
+
+ bool m_lhs_inner_dim_contiguous;
+ bool m_rhs_inner_dim_contiguous;
+ bool m_rhs_inner_dim_reordered;
+
+ left_nocontract_t m_i_strides;
+ right_nocontract_t m_j_strides;
+ left_nocontract_t m_left_nocontract_strides;
+ right_nocontract_t m_right_nocontract_strides;
+
+ Index m_i_size;
+ Index m_j_size;
+ Index m_k_size;
+
+ TensorEvaluator<EvalLeftArgType, Device> m_leftImpl;
+ TensorEvaluator<EvalRightArgType, Device> m_rightImpl;
+ const Device& m_device;
+ Scalar* m_result;
+};
+
+
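+// A minimal usage sketch (an illustrative assumption, relying on the IndexPair-based
+// contract() API of this Tensor module version):
+//
+//   Eigen::Tensor<float, 2> a(30, 7), b(7, 20);
+//   a.setRandom(); b.setRandom();
+//   Eigen::array<Eigen::IndexPair<int>, 1> dims = { Eigen::IndexPair<int>(1, 0) };
+//   Eigen::Tensor<float, 2> c = a.contract(b, dims);  // 30 x 20 matrix product
+//
+// Such an expression is ultimately evaluated by the evaluator below on the
+// default (CPU) device.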
+// evaluator for default device
+template<typename Indices, typename LeftArgType, typename RightArgType, typename Device>
+struct TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, Device> :
+ public TensorContractionEvaluatorBase<
+ TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, Device> > {
+ typedef TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, Device> Self;
+ typedef TensorContractionEvaluatorBase<Self> Base;
+
+ typedef TensorContractionOp<Indices, LeftArgType, RightArgType> XprType;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ enum {
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ // Most of the code is assuming that both input tensors are ColMajor. If the
+ // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS:
+ // If we want to compute A * B = C, where A is LHS and B is RHS, the code
+ // will pretend B is LHS and A is RHS.
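+  // (Why the swap works: reading a RowMajor array as if it were ColMajor yields
+  //  the transpose of the matrix, and C^T = B^T * A^T, so running the ColMajor
+  //  kernel on the swapped operands produces C laid out in RowMajor order.)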
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType;
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType;
+
+ static const int LDims =
+ internal::array_size<typename TensorEvaluator<EvalLeftArgType, Device>::Dimensions>::value;
+ static const int RDims =
+ internal::array_size<typename TensorEvaluator<EvalRightArgType, Device>::Dimensions>::value;
+ static const int ContractDims = internal::array_size<Indices>::value;
+
+ typedef array<Index, LDims> left_dim_mapper_t;
+ typedef array<Index, RDims> right_dim_mapper_t;
+
+ typedef array<Index, ContractDims> contract_t;
+ typedef array<Index, LDims - ContractDims> left_nocontract_t;
+ typedef array<Index, RDims - ContractDims> right_nocontract_t;
+
+ static const int NumDims = LDims + RDims - 2 * ContractDims;
+
+ // Could we use NumDimensions here?
+ typedef DSizes<Index, NumDims> Dimensions;
+
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device) :
+ Base(op, device) { }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalProduct(Scalar* buffer) const {
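+    // A contraction whose output has a single column (m_j_size == 1) is just a
+    // matrix * vector product, so it is dispatched to the GEMV path; everything
+    // else goes through the blocked GEMM below.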
+ if (this->m_j_size == 1) {
+ this->template evalGemv<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Alignment>(buffer);
+ return;
+ }
+
+ evalGemm<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Alignment>(buffer);
+ }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ EIGEN_DEVICE_FUNC void evalGemm(Scalar* buffer) const {
+ // columns in left side, rows in right side
+ const Index k = this->m_k_size;
+
+ // rows in left side
+ const Index m = this->m_i_size;
+
+ // columns in right side
+ const Index n = this->m_j_size;
+
+    // zero out the result buffer (which must be of size at least m * n * sizeof(Scalar))
+ this->m_device.memset(buffer, 0, m * n * sizeof(Scalar));
+
+ // define mr, nr, and all of my data mapper types
+ typedef typename internal::remove_const<typename EvalLeftArgType::Scalar>::type LhsScalar;
+ typedef typename internal::remove_const<typename EvalRightArgType::Scalar>::type RhsScalar;
+ typedef typename internal::gebp_traits<LhsScalar, RhsScalar> Traits;
+
+ const Index nr = Traits::nr;
+ const Index mr = Traits::mr;
+
+ typedef TensorEvaluator<EvalLeftArgType, Device> LeftEvaluator;
+ typedef TensorEvaluator<EvalRightArgType, Device> RightEvaluator;
+
+ const int lhs_packet_size = internal::packet_traits<LhsScalar>::size;
+ const int rhs_packet_size = internal::packet_traits<RhsScalar>::size;
+
+ typedef internal::TensorContractionInputMapper<LhsScalar, Index, internal::Lhs,
+ LeftEvaluator, left_nocontract_t,
+ contract_t, lhs_packet_size,
+ lhs_inner_dim_contiguous,
+ false, Unaligned> LhsMapper;
+
+ typedef internal::TensorContractionInputMapper<RhsScalar, Index, internal::Rhs,
+ RightEvaluator, right_nocontract_t,
+ contract_t, rhs_packet_size,
+ rhs_inner_dim_contiguous,
+ rhs_inner_dim_reordered, Unaligned> RhsMapper;
+
+ typedef internal::blas_data_mapper<Scalar, Index, ColMajor> OutputMapper;
+
+ // declare GEBP packing and kernel structs
+ // TODO: packing could be faster sometimes if we supported row major tensor mappers
+ internal::gemm_pack_lhs<LhsScalar, Index, typename LhsMapper::SubMapper, mr, Traits::LhsProgress, ColMajor> pack_lhs;
+ internal::gemm_pack_rhs<RhsScalar, Index, typename RhsMapper::SubMapper, nr, ColMajor> pack_rhs;
+
+ // TODO: replace false, false with conjugate values?
+ internal::gebp_kernel<LhsScalar, RhsScalar, Index, OutputMapper, mr, nr, false, false> gebp;
+
+ // initialize data mappers
+ LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides,
+ this->m_left_contracting_strides, this->m_k_strides);
+
+ RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides,
+ this->m_right_contracting_strides, this->m_k_strides);
+
+ OutputMapper output(buffer, m);
+
+ // TODO: refine arguments here (am I row or col major, etc)
+ typedef typename internal::gemm_blocking_space<ColMajor, LhsScalar, RhsScalar, Dynamic, Dynamic, Dynamic> BlockingType;
+
+ // compute block sizes (which depend on number of threads)
+
+    // The last parameter is true to use L3 blocking; the second-to-last parameter
+    // is 1 to indicate a single thread.
+ BlockingType blocking(m, n, k, 1, true);
+
+ const Index kc = blocking.kc();
+ const Index mc = (std::min<Index>)(m, blocking.mc());
+ const Index nc = (std::min<Index>)(n, blocking.nc());
+
+ // sizes of submatrices to live in cache. see Goto paper.
+ int sizeA = blocking.mc() * kc;
+ int sizeB = kc * blocking.nc();
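+    // sizeA and sizeB are element counts (not bytes): one packed LHS panel of
+    // blocking.mc() x kc elements and one packed RHS panel of kc x blocking.nc()
+    // elements. They are scaled by sizeof(*) when the buffers are allocated below.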
+
+    // note: m_device.allocate should return 16-byte-aligned pointers, but if blockA and blockB
+    // aren't 16-byte aligned, segfaults will happen due to SIMD instructions
+ LhsScalar* blockA = static_cast<LhsScalar *>(this->m_device.allocate(sizeA * sizeof(LhsScalar)));
+ RhsScalar* blockB = static_cast<RhsScalar *>(this->m_device.allocate(sizeB * sizeof(RhsScalar)));
+
+ for(Index i2=0; i2<m; i2+=mc)
+ {
+ const Index actual_mc = numext::mini(i2+mc,m)-i2;
+ for (Index k2 = 0; k2 < k; k2 += kc) {
+ // make sure we don't overshoot right edge of left matrix, then pack vertical panel
+ const Index actual_kc = numext::mini(k2 + kc, k) - k2;
+ pack_lhs(blockA, lhs.getSubMapper(i2, k2), actual_kc, actual_mc, 0, 0);
+
+ // series of horizontal blocks
+ for (Index j2 = 0; j2 < n; j2 += nc) {
+ // make sure we don't overshoot right edge of right matrix, then pack block
+ const Index actual_nc = numext::mini(j2 + nc, n) - j2;
+ pack_rhs(blockB, rhs.getSubMapper(k2, j2), actual_kc, actual_nc, 0, 0);
+
+ // call gebp (matrix kernel)
+ // The parameters here are copied from Eigen's GEMM implementation
+ gebp(output.getSubMapper(i2, j2), blockA, blockB, actual_mc, actual_kc, actual_nc, Scalar(1), -1, -1, 0, 0);
+ }
+ }
+ }
+
+ this->m_device.deallocate(blockA);
+ this->m_device.deallocate(blockB);
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h
new file mode 100644
index 0000000000..f05746f298
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h
@@ -0,0 +1,1387 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Eric Martin <eric@ericmart.in>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_CUDA_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_CUDA_H
+
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+
+namespace Eigen {
+
+template<typename Scalar, typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper, bool needs_edge_check>
+__device__ EIGEN_STRONG_INLINE void
+EigenContractionKernelInternal(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output, volatile Scalar* lhs_shmem, volatile Scalar* rhs_shmem,
+ const Index m_size, const Index n_size, const Index k_size) {
+
+ const Index m_block_idx = blockIdx.x;
+ const Index n_block_idx = blockIdx.y;
+
+ const Index base_m = 64 * m_block_idx;
+ const Index base_n = 64 * n_block_idx;
+
+ // declare and initialize 64 registers for output 8x8 block
+
+ // prefetch registers
+ Scalar lhs_pf0;
+ Scalar lhs_pf1;
+ Scalar lhs_pf2;
+ Scalar lhs_pf3;
+ Scalar lhs_pf4;
+ Scalar lhs_pf5;
+ Scalar lhs_pf6;
+ Scalar lhs_pf7;
+
+ Scalar rhs_pf0;
+ Scalar rhs_pf1;
+ Scalar rhs_pf2;
+ Scalar rhs_pf3;
+ Scalar rhs_pf4;
+ Scalar rhs_pf5;
+ Scalar rhs_pf6;
+ Scalar rhs_pf7;
+
+ // shared memory is formatted
+ // (contract idx in block, nocontract idx in block, block idx)
+ // where block idx is column major. This transposition limits the number of
+ // bank conflicts when reading the LHS. The core idea is that since the contracting
+  // index is shared by both sides, it should be mapped to threadIdx.x.
+
+ // On the LHS, we pad each row inside of each block with an extra element. This makes
+ // each block 8 rows of 9 elements, which is 72 elements. This gives no bank conflicts
+ // on writes and very few 2-way conflicts on reads. There is an 8x8 grid of these blocks.
+
+ // On the RHS we just add 8 padding elements to the end of each block. This gives no bank
+ // conflicts on writes and also none on reads.
+
+ // storage indices
+ const Index lhs_store_idx_base = threadIdx.y * 72 + threadIdx.x * 9 + threadIdx.z;
+ const Index rhs_store_idx_base = threadIdx.y * 72 + threadIdx.z * 8 + threadIdx.x;
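+  // For example, thread (x=3, y=2, z=5) gets
+  //   lhs_store_idx_base = 2 * 72 + 3 * 9 + 5 = 176,
+  // i.e. row 3, column 5 of padded 8x9 LHS sub-block number 2. The stride of 576
+  // between the eight stores below is 8 * 72 elements, i.e. eight sub-blocks.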
+
+ const Index lhs_store_idx_0 = lhs_store_idx_base + 576 * 0;
+ const Index lhs_store_idx_1 = lhs_store_idx_base + 576 * 1;
+ const Index lhs_store_idx_2 = lhs_store_idx_base + 576 * 2;
+ const Index lhs_store_idx_3 = lhs_store_idx_base + 576 * 3;
+ const Index lhs_store_idx_4 = lhs_store_idx_base + 576 * 4;
+ const Index lhs_store_idx_5 = lhs_store_idx_base + 576 * 5;
+ const Index lhs_store_idx_6 = lhs_store_idx_base + 576 * 6;
+ const Index lhs_store_idx_7 = lhs_store_idx_base + 576 * 7;
+
+ const Index rhs_store_idx_0 = rhs_store_idx_base + 576 * 0;
+ const Index rhs_store_idx_1 = rhs_store_idx_base + 576 * 1;
+ const Index rhs_store_idx_2 = rhs_store_idx_base + 576 * 2;
+ const Index rhs_store_idx_3 = rhs_store_idx_base + 576 * 3;
+ const Index rhs_store_idx_4 = rhs_store_idx_base + 576 * 4;
+ const Index rhs_store_idx_5 = rhs_store_idx_base + 576 * 5;
+ const Index rhs_store_idx_6 = rhs_store_idx_base + 576 * 6;
+ const Index rhs_store_idx_7 = rhs_store_idx_base + 576 * 7;
+
+ // in the loading code, the following variables are important:
+ // threadIdx.x: the vertical position in an 8x8 block
+ // threadIdx.y: the vertical index of the 8x8 block in the grid
+ // threadIdx.z: the horizontal position in an 8x8 block
+ // k: the horizontal index of the 8x8 block in the grid
+ //
+  // The k parameter is implicit (it was the loop counter of a loop that ran
+  // from 0 to 7, but that loop is now unrolled in the code below).
+
+ const Index load_idx_vert = threadIdx.x + 8 * threadIdx.y;
+ const Index lhs_vert = base_m + load_idx_vert;
+
+#define prefetchIntoRegisters(base_k) \
+ { \
+ lhs_pf0 = Scalar(0); \
+ lhs_pf1 = Scalar(0); \
+ lhs_pf2 = Scalar(0); \
+ lhs_pf3 = Scalar(0); \
+ lhs_pf4 = Scalar(0); \
+ lhs_pf5 = Scalar(0); \
+ lhs_pf6 = Scalar(0); \
+ lhs_pf7 = Scalar(0); \
+ \
+ rhs_pf0 = Scalar(0); \
+ rhs_pf1 = Scalar(0); \
+ rhs_pf2 = Scalar(0); \
+ rhs_pf3 = Scalar(0); \
+ rhs_pf4 = Scalar(0); \
+ rhs_pf5 = Scalar(0); \
+ rhs_pf6 = Scalar(0); \
+ rhs_pf7 = Scalar(0); \
+ \
+ if (!needs_edge_check || lhs_vert < m_size) { \
+ const Index lhs_horiz_0 = base_k + threadIdx.z + 0 * 8; \
+ const Index lhs_horiz_1 = base_k + threadIdx.z + 1 * 8; \
+ const Index lhs_horiz_2 = base_k + threadIdx.z + 2 * 8; \
+ const Index lhs_horiz_3 = base_k + threadIdx.z + 3 * 8; \
+ const Index lhs_horiz_4 = base_k + threadIdx.z + 4 * 8; \
+ const Index lhs_horiz_5 = base_k + threadIdx.z + 5 * 8; \
+ const Index lhs_horiz_6 = base_k + threadIdx.z + 6 * 8; \
+ const Index lhs_horiz_7 = base_k + threadIdx.z + 7 * 8; \
+ \
+ if (!needs_edge_check || lhs_horiz_7 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \
+ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \
+ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \
+ lhs_pf6 = lhs(lhs_vert, lhs_horiz_6); \
+ lhs_pf7 = lhs(lhs_vert, lhs_horiz_7); \
+ } else if (lhs_horiz_6 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \
+ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \
+ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \
+ lhs_pf6 = lhs(lhs_vert, lhs_horiz_6); \
+ } else if (lhs_horiz_5 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \
+ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \
+ lhs_pf5 = lhs(lhs_vert, lhs_horiz_5); \
+ } else if (lhs_horiz_4 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \
+ lhs_pf4 = lhs(lhs_vert, lhs_horiz_4); \
+ } else if (lhs_horiz_3 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ lhs_pf3 = lhs(lhs_vert, lhs_horiz_3); \
+ } else if (lhs_horiz_2 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ lhs_pf2 = lhs(lhs_vert, lhs_horiz_2); \
+ } else if (lhs_horiz_1 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ lhs_pf1 = lhs(lhs_vert, lhs_horiz_1); \
+ } else if (lhs_horiz_0 < k_size) { \
+ lhs_pf0 = lhs(lhs_vert, lhs_horiz_0); \
+ } \
+ } \
+ \
+ const Index rhs_vert = base_k + load_idx_vert; \
+ if (!needs_edge_check || rhs_vert < k_size) { \
+ const Index rhs_horiz_0 = base_n + threadIdx.z + 0 * 8; \
+ const Index rhs_horiz_1 = base_n + threadIdx.z + 1 * 8; \
+ const Index rhs_horiz_2 = base_n + threadIdx.z + 2 * 8; \
+ const Index rhs_horiz_3 = base_n + threadIdx.z + 3 * 8; \
+ const Index rhs_horiz_4 = base_n + threadIdx.z + 4 * 8; \
+ const Index rhs_horiz_5 = base_n + threadIdx.z + 5 * 8; \
+ const Index rhs_horiz_6 = base_n + threadIdx.z + 6 * 8; \
+ const Index rhs_horiz_7 = base_n + threadIdx.z + 7 * 8; \
+ \
+ if (rhs_horiz_7 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \
+ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \
+ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \
+ rhs_pf6 = rhs(rhs_vert, rhs_horiz_6); \
+ rhs_pf7 = rhs(rhs_vert, rhs_horiz_7); \
+ } else if (rhs_horiz_6 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \
+ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \
+ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \
+ rhs_pf6 = rhs(rhs_vert, rhs_horiz_6); \
+ } else if (rhs_horiz_5 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \
+ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \
+ rhs_pf5 = rhs(rhs_vert, rhs_horiz_5); \
+ } else if (rhs_horiz_4 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \
+ rhs_pf4 = rhs(rhs_vert, rhs_horiz_4); \
+ } else if (rhs_horiz_3 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ rhs_pf3 = rhs(rhs_vert, rhs_horiz_3); \
+ } else if (rhs_horiz_2 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ rhs_pf2 = rhs(rhs_vert, rhs_horiz_2); \
+ } else if (rhs_horiz_1 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ rhs_pf1 = rhs(rhs_vert, rhs_horiz_1); \
+ } else if (rhs_horiz_0 < n_size) { \
+ rhs_pf0 = rhs(rhs_vert, rhs_horiz_0); \
+ } \
+ } \
+ } \
+
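+  // The else-if cascades above load however many of the eight LHS/RHS values are
+  // still inside the matrices; the remaining prefetch registers keep the zeros
+  // written at the top of the macro, so out-of-range elements contribute nothing.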
+#define writeRegToShmem(_) \
+ lhs_shmem[lhs_store_idx_0] = lhs_pf0; \
+ rhs_shmem[rhs_store_idx_0] = rhs_pf0; \
+ \
+ lhs_shmem[lhs_store_idx_1] = lhs_pf1; \
+ rhs_shmem[rhs_store_idx_1] = rhs_pf1; \
+ \
+ lhs_shmem[lhs_store_idx_2] = lhs_pf2; \
+ rhs_shmem[rhs_store_idx_2] = rhs_pf2; \
+ \
+ lhs_shmem[lhs_store_idx_3] = lhs_pf3; \
+ rhs_shmem[rhs_store_idx_3] = rhs_pf3; \
+ \
+ lhs_shmem[lhs_store_idx_4] = lhs_pf4; \
+ rhs_shmem[rhs_store_idx_4] = rhs_pf4; \
+ \
+ lhs_shmem[lhs_store_idx_5] = lhs_pf5; \
+ rhs_shmem[rhs_store_idx_5] = rhs_pf5; \
+ \
+ lhs_shmem[lhs_store_idx_6] = lhs_pf6; \
+ rhs_shmem[rhs_store_idx_6] = rhs_pf6; \
+ \
+ lhs_shmem[lhs_store_idx_7] = lhs_pf7; \
+ rhs_shmem[rhs_store_idx_7] = rhs_pf7; \
+
+ // declare and initialize result array
+#define res(i, j) _res_##i##j
+#define initResultRow(i) \
+ Scalar res(i, 0) = Scalar(0); \
+ Scalar res(i, 1) = Scalar(0); \
+ Scalar res(i, 2) = Scalar(0); \
+ Scalar res(i, 3) = Scalar(0); \
+ Scalar res(i, 4) = Scalar(0); \
+ Scalar res(i, 5) = Scalar(0); \
+ Scalar res(i, 6) = Scalar(0); \
+ Scalar res(i, 7) = Scalar(0); \
+
+ initResultRow(0);
+ initResultRow(1);
+ initResultRow(2);
+ initResultRow(3);
+ initResultRow(4);
+ initResultRow(5);
+ initResultRow(6);
+ initResultRow(7);
+#undef initResultRow
+
+ for (Index base_k = 0; base_k < k_size; base_k += 64) {
+ // wait for previous iteration to finish with shmem. Despite common sense,
+    // the code is a bit faster with this here than at the bottom of the loop
+ __syncthreads();
+
+ prefetchIntoRegisters(base_k);
+ writeRegToShmem();
+
+ #undef prefetchIntoRegisters
+ #undef writeRegToShmem
+
+ // wait for shared mem packing to be done before starting computation
+ __syncthreads();
+
+ // compute 8x8 matrix product by outer product. This involves packing one column
+ // of LHS and one row of RHS into registers (takes 16 registers).
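+    // Each computePass(p) below is a rank-1 update: it reads LHS column p and
+    // RHS row p of this thread's tile from shared memory and accumulates
+    // lhs_element(w, p) * rhs_element(p, j) into res(w, j) for all 0 <= w, j < 8.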
+
+#define lcol(i) _lcol##i
+ Scalar lcol(0);
+ Scalar lcol(1);
+ Scalar lcol(2);
+ Scalar lcol(3);
+ Scalar lcol(4);
+ Scalar lcol(5);
+ Scalar lcol(6);
+ Scalar lcol(7);
+
+#define rrow(j) _rrow##j
+ Scalar rrow(0);
+ Scalar rrow(1);
+ Scalar rrow(2);
+ Scalar rrow(3);
+ Scalar rrow(4);
+ Scalar rrow(5);
+ Scalar rrow(6);
+ Scalar rrow(7);
+
+ // Now x corresponds to k, y to m, and z to n
+ const volatile Scalar* lhs_block = &lhs_shmem[threadIdx.x + 9 * threadIdx.y];
+ const volatile Scalar* rhs_block = &rhs_shmem[threadIdx.x + 8 * threadIdx.z];
+
+#define lhs_element(i, j) lhs_block[72 * ((i) + 8 * (j))]
+#define rhs_element(i, j) rhs_block[72 * ((i) + 8 * (j))]
+
+#define loadData(i, j) \
+ lcol(0) = lhs_element(0, j); \
+ rrow(0) = rhs_element(i, 0); \
+ lcol(1) = lhs_element(1, j); \
+ rrow(1) = rhs_element(i, 1); \
+ lcol(2) = lhs_element(2, j); \
+ rrow(2) = rhs_element(i, 2); \
+ lcol(3) = lhs_element(3, j); \
+ rrow(3) = rhs_element(i, 3); \
+ lcol(4) = lhs_element(4, j); \
+ rrow(4) = rhs_element(i, 4); \
+ lcol(5) = lhs_element(5, j); \
+ rrow(5) = rhs_element(i, 5); \
+ lcol(6) = lhs_element(6, j); \
+ rrow(6) = rhs_element(i, 6); \
+ lcol(7) = lhs_element(7, j); \
+ rrow(7) = rhs_element(i, 7); \
+
+#define computeCol(j) \
+ res(0, j) += lcol(0) * rrow(j); \
+ res(1, j) += lcol(1) * rrow(j); \
+ res(2, j) += lcol(2) * rrow(j); \
+ res(3, j) += lcol(3) * rrow(j); \
+ res(4, j) += lcol(4) * rrow(j); \
+ res(5, j) += lcol(5) * rrow(j); \
+ res(6, j) += lcol(6) * rrow(j); \
+ res(7, j) += lcol(7) * rrow(j); \
+
+#define computePass(i) \
+ loadData(i, i); \
+ \
+ computeCol(0); \
+ computeCol(1); \
+ computeCol(2); \
+ computeCol(3); \
+ computeCol(4); \
+ computeCol(5); \
+ computeCol(6); \
+ computeCol(7); \
+
+ computePass(0);
+ computePass(1);
+ computePass(2);
+ computePass(3);
+ computePass(4);
+ computePass(5);
+ computePass(6);
+ computePass(7);
+
+#undef lcol
+#undef rrow
+#undef lhs_element
+#undef rhs_element
+#undef loadData
+#undef computeCol
+#undef computePass
+ } // end loop over k
+
+  // we've now iterated over all of the large (i.e. width-64) k blocks and
+ // accumulated results in registers. At this point thread (x, y, z) contains
+ // the sum across all big k blocks of the product of little k block of index (x, y)
+ // with block of index (y, z). To compute the final output, we need to reduce
+ // the 8 threads over y by summation.
+#define shuffleInc(i, j, mask) res(i, j) += __shfl_xor(res(i, j), mask)
+
+#define reduceRow(i, mask) \
+ shuffleInc(i, 0, mask); \
+ shuffleInc(i, 1, mask); \
+ shuffleInc(i, 2, mask); \
+ shuffleInc(i, 3, mask); \
+ shuffleInc(i, 4, mask); \
+ shuffleInc(i, 5, mask); \
+ shuffleInc(i, 6, mask); \
+ shuffleInc(i, 7, mask); \
+
+#define reduceMatrix(mask) \
+ reduceRow(0, mask); \
+ reduceRow(1, mask); \
+ reduceRow(2, mask); \
+ reduceRow(3, mask); \
+ reduceRow(4, mask); \
+ reduceRow(5, mask); \
+ reduceRow(6, mask); \
+ reduceRow(7, mask); \
+
+ // actually perform the reduction, now each thread of index (_, y, z)
+ // contains the correct values in its registers that belong in the output
+ // block
+ reduceMatrix(1);
+ reduceMatrix(2);
+ reduceMatrix(4);
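+  // The three rounds above form an xor-shuffle butterfly over masks 1, 2 and 4:
+  // each round adds the partner lane's partial sums, so afterwards every thread
+  // in an aligned group of eight lanes holds the complete sum for that group.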
+
+#undef shuffleInc
+#undef reduceRow
+#undef reduceMatrix
+
+ // now we need to copy the 64 values into main memory. We can't split work
+  // among threads because all variables are in registers. There are two ways
+  // to do this:
+ // (1) have 1 thread do 64 writes from registers into global memory
+ // (2) have 1 thread do 64 writes into shared memory, and then 8 threads
+ // each do 8 writes into global memory. We can just overwrite the shared
+ // memory from the problem we just solved.
+ // (2) is slightly faster than (1) due to less branching and more ILP
+
+ // TODO: won't yield much gain, but could just use currently unused shared mem
+ // and then we won't have to sync
+ // wait for shared mem to be out of use
+ __syncthreads();
+
+#define writeResultShmem(i, j) \
+ lhs_shmem[i + 8 * threadIdx.y + 64 * threadIdx.z + 512 * j] = res(i, j); \
+
+#define writeRow(i) \
+ writeResultShmem(i, 0); \
+ writeResultShmem(i, 1); \
+ writeResultShmem(i, 2); \
+ writeResultShmem(i, 3); \
+ writeResultShmem(i, 4); \
+ writeResultShmem(i, 5); \
+ writeResultShmem(i, 6); \
+ writeResultShmem(i, 7); \
+
+ if (threadIdx.x == 0) {
+ writeRow(0);
+ writeRow(1);
+ writeRow(2);
+ writeRow(3);
+ writeRow(4);
+ writeRow(5);
+ writeRow(6);
+ writeRow(7);
+ }
+#undef writeResultShmem
+#undef writeRow
+
+ const int max_i_write = (min)((int)((m_size - base_m - threadIdx.y + 7) / 8), 8);
+ const int max_j_write = (min)((int)((n_size - base_n - threadIdx.z + 7) / 8), 8);
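+  // max_i_write / max_j_write count how many of the eight rows
+  // (base_m + threadIdx.y + 8*i) and eight columns (base_n + threadIdx.z + 8*j)
+  // handled by this thread still fall inside the output, clamped to 8.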
+
+ if (threadIdx.x < max_i_write) {
+ if (max_j_write == 8) {
+      // TODO: can I trade bank conflicts for coalesced writes?
+ Scalar val0 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 0];
+ Scalar val1 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 1];
+ Scalar val2 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 2];
+ Scalar val3 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 3];
+ Scalar val4 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 4];
+ Scalar val5 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 5];
+ Scalar val6 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 6];
+ Scalar val7 = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * 7];
+
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 0) = val0;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 1) = val1;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 2) = val2;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 3) = val3;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 4) = val4;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 5) = val5;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 6) = val6;
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * 7) = val7;
+ } else {
+#pragma unroll 7
+ for (int j = 0; j < max_j_write; j++) {
+ Scalar val = lhs_shmem[threadIdx.x + 8 * threadIdx.y + 64 * threadIdx.z + 512 * j];
+ output(base_m + threadIdx.y + 8 * threadIdx.x, base_n + threadIdx.z + 8 * j) = val;
+ }
+ }
+ }
+#undef res
+}
+
+
+template<typename Scalar, typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper>
+__global__ void
+__launch_bounds__(512)
+EigenContractionKernel(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output,
+ const Index m_size, const Index n_size, const Index k_size) {
+ __shared__ volatile Scalar lhs_shmem[72 * 64];
+ __shared__ volatile Scalar rhs_shmem[72 * 64];
+
+ const Index m_block_idx = blockIdx.x;
+ const Index n_block_idx = blockIdx.y;
+
+ const Index base_m = 64 * m_block_idx;
+ const Index base_n = 64 * n_block_idx;
+
+ if (base_m + 63 < m_size && base_n + 63 < n_size) {
+ EigenContractionKernelInternal<Scalar, Index, LhsMapper, RhsMapper, OutputMapper, false>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size);
+ } else {
+ EigenContractionKernelInternal<Scalar, Index, LhsMapper, RhsMapper, OutputMapper, true>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size);
+ }
+}
+
+
+template<typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper, bool CHECK_LHS_BOUNDARY,
+ bool CHECK_RHS_BOUNDARY>
+__device__ EIGEN_STRONG_INLINE void
+EigenFloatContractionKernelInternal16x16(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output, float2 lhs_shmem2[][16],
+ float2 rhs_shmem2[][8], const Index m_size,
+ const Index n_size, const Index k_size,
+ const Index base_m, const Index base_n) {
+ typedef float Scalar;
+
+ // prefetch registers
+ float4 lhs_pf0, rhs_pf0;
+
+ float4 results[4];
+ for (int i = 0; i < 4; i++) {
+ results[i].x = results[i].y = results[i].z = results[i].w = 0;
+ }
+
+
+#define prefetch_lhs(reg, row, col) \
+ if (!CHECK_LHS_BOUNDARY) { \
+ if (col < k_size) { \
+ reg =lhs.loadPacket(row, col); \
+ } \
+ } else { \
+ if (col < k_size) { \
+ if (row + 3 < m_size) { \
+ reg =lhs.loadPacket(row, col); \
+ } else if (row + 2 < m_size) { \
+ reg.x =lhs(row + 0, col); \
+ reg.y =lhs(row + 1, col); \
+ reg.z =lhs(row + 2, col); \
+ } else if (row + 1 < m_size) { \
+ reg.x =lhs(row + 0, col); \
+ reg.y =lhs(row + 1, col); \
+ } else if (row < m_size) { \
+ reg.x =lhs(row + 0, col); \
+ } \
+ } \
+ } \
+
+
+ Index lhs_vert = base_m+threadIdx.x*4;
+
+ for (Index k = 0; k < k_size; k += 16) {
+ lhs_pf0 = internal::pset1<float4>(0);
+ rhs_pf0 = internal::pset1<float4>(0);
+
+ Index lhs_horiz = threadIdx.y+k;
+ prefetch_lhs(lhs_pf0, lhs_vert, lhs_horiz)
+
+ Index rhs_vert = k+(threadIdx.x%4)*4;
+ Index rhs_horiz0 = (threadIdx.x>>2)+threadIdx.y*4+base_n;
+
+ if (!CHECK_RHS_BOUNDARY) {
+ if ((rhs_vert + 3) < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0 = rhs.loadPacket(rhs_vert, rhs_horiz0);
+ } else if (rhs_vert + 2 < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0);
+ } else if (rhs_vert + 1 < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ } else if (rhs_vert < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ }
+ } else {
+ if (rhs_horiz0 < n_size) {
+ if ((rhs_vert + 3) < k_size) {
+ rhs_pf0 = rhs.loadPacket(rhs_vert, rhs_horiz0);
+ } else if ((rhs_vert + 2) < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0);
+ } else if ((rhs_vert + 1) < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ } else if (rhs_vert < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ }
+ }
+ }
+ float x1, x2 ;
+    // The following could be done as a bitwise operation some day.
+ if((threadIdx.x%8) < 4) {
+ x1 = rhs_pf0.y;
+ x2 = rhs_pf0.w;
+ } else {
+ x1 = rhs_pf0.x;
+ x2 = rhs_pf0.z;
+ }
+ x1 = __shfl_xor(x1, 4);
+ x2 = __shfl_xor(x2, 4);
+ if((threadIdx.x%8) < 4) {
+ rhs_pf0.y = x1;
+ rhs_pf0.w = x2;
+ } else {
+ rhs_pf0.x = x1;
+ rhs_pf0.z = x2;
+ }
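+    // The exchange above swaps half of each float4 with the thread four lanes
+    // away: a thread with (threadIdx.x % 8) < 4 trades its (y, w) components for
+    // its partner's (x, z) components, rearranging the rhs values before they are
+    // written to rhs_shmem2 below.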
+
+ // We have 64 features.
+ // Row 0 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 0, 1.
+ // Row 1 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 2, 3.
+ // ...
+ // Row 31 -> times (0, 4, 8, 12, 1, 5, 9, 13) for features 62, 63
+ // Row 32 -> times (2, 6, 10, 14, 3, 7, 11, 15) for features 0, 1
+ // ...
+ rhs_shmem2[(threadIdx.x>>3)+ threadIdx.y*2][threadIdx.x%8] = make_float2(rhs_pf0.x, rhs_pf0.y);
+ rhs_shmem2[(threadIdx.x>>3)+ threadIdx.y*2+32][threadIdx.x%8] = make_float2(rhs_pf0.z, rhs_pf0.w);
+
+ // Row 0 (time 0) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61)
+ // Row 1 (time 1) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61)
+ // ...
+ // Row 15 (time 15) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61)
+ // Row 16 (time 0) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63)
+ // ...
+
+ lhs_shmem2[threadIdx.y][threadIdx.x] = make_float2(lhs_pf0.x, lhs_pf0.y);
+ lhs_shmem2[threadIdx.y+16][threadIdx.x] = make_float2(lhs_pf0.z, lhs_pf0.w);
+
+
+#define add_vals(fl1, fl2, fr1, fr2)\
+ results[0].x += fl1.x * fr1.x;\
+ results[0].y += fl1.y * fr1.x;\
+ results[0].z += fl2.x * fr1.x;\
+ results[0].w += fl2.y * fr1.x;\
+\
+ results[1].x += fl1.x * fr1.y;\
+ results[1].y += fl1.y * fr1.y;\
+ results[1].z += fl2.x * fr1.y;\
+ results[1].w += fl2.y * fr1.y;\
+\
+ results[2].x += fl1.x * fr2.x;\
+ results[2].y += fl1.y * fr2.x;\
+ results[2].z += fl2.x * fr2.x;\
+ results[2].w += fl2.y * fr2.x;\
+\
+ results[3].x += fl1.x * fr2.y;\
+ results[3].y += fl1.y * fr2.y;\
+ results[3].z += fl2.x * fr2.y;\
+ results[3].w += fl2.y * fr2.y;\
+
+ __syncthreads();
+
+ // Do the multiplies.
+ #pragma unroll
+ for (int koff = 0; koff < 16; koff ++) {
+ // 32 x threads.
+ float2 fl1 = lhs_shmem2[koff][threadIdx.x];
+ float2 fl2 = lhs_shmem2[koff + 16][threadIdx.x];
+
+ int start_feature = threadIdx.y * 4;
+ float2 fr1 = rhs_shmem2[(start_feature>>1) + 32*((koff%4)/2)][koff/4 + (koff%2)*4];
+ float2 fr2 = rhs_shmem2[(start_feature>>1) + 1 + 32*((koff%4)/2)][koff/4 + (koff%2)*4];
+
+ add_vals(fl1, fl2, fr1, fr2)
+ }
+ __syncthreads();
+ }
+
+#undef prefetch_lhs
+#undef add_vals
+
+ Index horiz_base = threadIdx.y*4+base_n;
+ if (!CHECK_LHS_BOUNDARY && !CHECK_RHS_BOUNDARY) {
+ for (int i = 0; i < 4; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ } else if (!CHECK_RHS_BOUNDARY) {
+ // CHECK LHS
+ if (lhs_vert + 3 < m_size) {
+ for (int i = 0; i < 4; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ } else if (lhs_vert + 2 < m_size) {
+ for (int i = 0; i < 4; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ }
+ } else if (lhs_vert + 1 < m_size) {
+ for (int i = 0; i < 4; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ }
+ } else if (lhs_vert < m_size) {
+ for (int i = 0; i < 4; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ }
+ }
+ } else if (!CHECK_LHS_BOUNDARY) {
+ // CHECK RHS
+ /*
+ int ncols_rem = fminf(n_size- horiz_base, 4);
+ for (int i = 0; i < ncols_rem; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }*/
+ for (int i = 0; i < 4; i++) {
+ if (horiz_base+i < n_size) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ }
+ } else {
+ // CHECK both boundaries.
+ for (int i = 0; i < 4; i++) {
+ if (horiz_base+i < n_size) {
+ if (lhs_vert < m_size)
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ if (lhs_vert + 1 < m_size)
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ if (lhs_vert + 2 < m_size)
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ if (lhs_vert + 3 < m_size)
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ }
+ }
+}
+
+
+template<typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper, bool CHECK_LHS_BOUNDARY,
+ bool CHECK_RHS_BOUNDARY>
+__device__ EIGEN_ALWAYS_INLINE void
+EigenFloatContractionKernelInternal(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output, float2 lhs_shmem2[][32],
+ float2 rhs_shmem2[][8], const Index m_size,
+ const Index n_size, const Index k_size,
+ const Index base_m, const Index base_n) {
+ typedef float Scalar;
+
+ // prefetch registers
+ float4 lhs_pf0, lhs_pf1, lhs_pf2, lhs_pf3;
+ float4 rhs_pf0, rhs_pf1;
+
+ float4 results[8];
+ for (int i=0; i < 8; i++) {
+ results[i].x = results[i].y = results[i].z = results[i].w = 0;
+ }
+
+
+ Index lhs_vert = base_m+threadIdx.x*4+(threadIdx.y%4)*32;
+ for (Index k = 0; k < k_size; k += 32) {
+ lhs_pf0 = internal::pset1<float4>(0);
+ lhs_pf1 = internal::pset1<float4>(0);
+ lhs_pf2 = internal::pset1<float4>(0);
+ lhs_pf3 = internal::pset1<float4>(0);
+
+ rhs_pf0 = internal::pset1<float4>(0);
+ rhs_pf1 = internal::pset1<float4>(0);
+
+ if (!CHECK_LHS_BOUNDARY) {
+ if ((threadIdx.y/4+k+24) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ lhs_pf2 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+16));
+ lhs_pf3 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+24));
+ } else if ((threadIdx.y/4+k+16) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ lhs_pf2 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+16));
+ } else if ((threadIdx.y/4+k+8) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ } else if ((threadIdx.y/4+k) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ }
+ } else {
+ // just CHECK_LHS_BOUNDARY
+ if (lhs_vert + 3 < m_size) {
+ if ((threadIdx.y/4+k+24) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ lhs_pf2 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+16));
+ lhs_pf3 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+24));
+ } else if ((threadIdx.y/4+k+16) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ lhs_pf2 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+16));
+ } else if ((threadIdx.y/4+k+8) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ lhs_pf1 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k+8));
+ } else if ((threadIdx.y/4+k) < k_size) {
+ lhs_pf0 =lhs.loadPacket(lhs_vert, (threadIdx.y/4+k));
+ }
+ } else if (lhs_vert + 2 < m_size) {
+ if ((threadIdx.y/4+k+24) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16));
+ lhs_pf2.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+16));
+ lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24));
+ lhs_pf3.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+24));
+ lhs_pf3.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+24));
+ } else if ((threadIdx.y/4+k+16) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16));
+ lhs_pf2.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+16));
+ } else if ((threadIdx.y/4+k+8) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ lhs_pf1.z =lhs(lhs_vert + 2, (threadIdx.y/4+k+8));
+ } else if ((threadIdx.y/4+k) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf0.z =lhs(lhs_vert + 2, (threadIdx.y/4+k));
+ }
+ } else if (lhs_vert + 1 < m_size) {
+ if ((threadIdx.y/4+k+24) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16));
+ lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24));
+ lhs_pf3.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+24));
+ } else if ((threadIdx.y/4+k+16) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ lhs_pf2.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+16));
+ } else if ((threadIdx.y/4+k+8) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf1.y =lhs(lhs_vert + 1, (threadIdx.y/4+k+8));
+ } else if ((threadIdx.y/4+k) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf0.y =lhs(lhs_vert + 1, (threadIdx.y/4+k));
+ }
+ } else if (lhs_vert < m_size) {
+ if ((threadIdx.y/4+k+24) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ lhs_pf3.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+24));
+ } else if ((threadIdx.y/4+k+16) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ lhs_pf2.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+16));
+ } else if ((threadIdx.y/4+k+8) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ lhs_pf1.x =lhs(lhs_vert + 0, (threadIdx.y/4+k+8));
+ } else if ((threadIdx.y/4+k) < k_size) {
+ lhs_pf0.x =lhs(lhs_vert + 0, (threadIdx.y/4+k));
+ }
+ }
+ }
+ __syncthreads();
+ Index rhs_vert = k+threadIdx.x*4;
+ Index rhs_horiz0 = threadIdx.y*2+base_n;
+ Index rhs_horiz1 = threadIdx.y*2+1+base_n;
+ if (!CHECK_RHS_BOUNDARY) {
+ if ((rhs_vert + 3) < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0 = rhs.loadPacket(rhs_vert, rhs_horiz0);
+ rhs_pf1 = rhs.loadPacket(rhs_vert, rhs_horiz1);
+ } else if (rhs_vert + 2 < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1);
+ rhs_pf1.z = rhs(rhs_vert + 2, rhs_horiz1);
+ } else if (rhs_vert + 1 < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1);
+ } else if (rhs_vert < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ }
+ } else {
+ if (rhs_horiz1 < n_size) {
+ if ((rhs_vert + 3) < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0 = rhs.loadPacket(rhs_vert, rhs_horiz0);
+ rhs_pf1 = rhs.loadPacket(rhs_vert, rhs_horiz1);
+ } else if (rhs_vert + 2 < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1);
+ rhs_pf1.z = rhs(rhs_vert + 2, rhs_horiz1);
+ } else if (k+threadIdx.x*4 + 1 < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ rhs_pf1.y = rhs(rhs_vert + 1, rhs_horiz1);
+ } else if (k+threadIdx.x*4 < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf1.x = rhs(rhs_vert, rhs_horiz1);
+ }
+ } else if (rhs_horiz0 < n_size) {
+ if ((rhs_vert + 3) < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0 = rhs.loadPacket(rhs_vert, rhs_horiz0);
+ } else if ((rhs_vert + 2) < k_size) {
+ // just CHECK_RHS_BOUNDARY
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ rhs_pf0.z = rhs(rhs_vert + 2, rhs_horiz0);
+ } else if ((rhs_vert + 1) < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ rhs_pf0.y = rhs(rhs_vert + 1, rhs_horiz0);
+ } else if (rhs_vert < k_size) {
+ rhs_pf0.x = rhs(rhs_vert, rhs_horiz0);
+ }
+ }
+ }
+ __syncthreads();
+ // Loaded. Do computation
+ // Row 0 -> times (0, 4, 8, .. 28) for features 0, 1.
+ // Row 1 -> times (0, 4, 8, .. 28) for features 2, 3.
+ // ..
+ // Row 31 -> times (0, 4, 8, .. 28) for features 62, 63
+ rhs_shmem2[threadIdx.y][threadIdx.x] = make_float2(rhs_pf0.x, rhs_pf1.x);
+ // Row 32 -> times (1, 5, 9, .. 29) for features 0, 1.
+ // Row 33 -> times (1, 5, 9, .. 29) for features 2, 3.
+ // ..
+ rhs_shmem2[threadIdx.y+32][threadIdx.x] = make_float2(rhs_pf0.y, rhs_pf1.y);
+ // Row 64 -> times (2, 6, 10, .. 30) for features 0, 1.
+ // Row 65 -> times (2, 6, 10, .. 30) for features 2, 3.
+ rhs_shmem2[threadIdx.y+64][threadIdx.x] = make_float2(rhs_pf0.z, rhs_pf1.z);
+ // Row 96 -> times (3, 7, 11, .. 31) for features 0, 1.
+ // Row 97 -> times (3, 7, 11, .. 31) for features 2, 3.
+ rhs_shmem2[threadIdx.y+96][threadIdx.x] = make_float2(rhs_pf0.w, rhs_pf1.w);
+
+ // LHS.
+ // Row 0 (time 0) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) .. (124, 125)
+ // Row 1 (time 1) -> features (0, 1), (4, 5), .. (28, 29), (32, 33), .. (60, 61) .. (124, 125)
+ // ...
+ // Row 8 (time 0) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63) .. (126, 127)
+ // Row 15 (time 7) -> features (2, 3), (6, 7), .. (30, 31), (34, 35), .. (62, 63) .. (126, 127)
+
+
+#define add_vals(a_feat1, a_feat2, f1, f2, f3, f4)\
+ results[0].x += a_feat1.x * f1.x;\
+ results[1].x += a_feat1.x * f1.y;\
+ results[2].x += a_feat1.x * f2.x;\
+ results[3].x += a_feat1.x * f2.y;\
+ results[4].x += a_feat1.x * f3.x;\
+ results[5].x += a_feat1.x * f3.y;\
+ results[6].x += a_feat1.x * f4.x;\
+ results[7].x += a_feat1.x * f4.y;\
+\
+ results[0].y += a_feat1.y * f1.x;\
+ results[1].y += a_feat1.y * f1.y;\
+ results[2].y += a_feat1.y * f2.x;\
+ results[3].y += a_feat1.y * f2.y;\
+ results[4].y += a_feat1.y * f3.x;\
+ results[5].y += a_feat1.y * f3.y;\
+ results[6].y += a_feat1.y * f4.x;\
+ results[7].y += a_feat1.y * f4.y;\
+\
+ results[0].z += a_feat2.x * f1.x;\
+ results[1].z += a_feat2.x * f1.y;\
+ results[2].z += a_feat2.x * f2.x;\
+ results[3].z += a_feat2.x * f2.y;\
+ results[4].z += a_feat2.x * f3.x;\
+ results[5].z += a_feat2.x * f3.y;\
+ results[6].z += a_feat2.x * f4.x;\
+ results[7].z += a_feat2.x * f4.y;\
+\
+ results[0].w += a_feat2.y * f1.x;\
+ results[1].w += a_feat2.y * f1.y;\
+ results[2].w += a_feat2.y * f2.x;\
+ results[3].w += a_feat2.y * f2.y;\
+ results[4].w += a_feat2.y * f3.x;\
+ results[5].w += a_feat2.y * f3.y;\
+ results[6].w += a_feat2.y * f4.x;\
+ results[7].w += a_feat2.y * f4.y;\
+
+ lhs_shmem2[threadIdx.y/4][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf0.x, lhs_pf0.y);
+ lhs_shmem2[threadIdx.y/4+8][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf1.x, lhs_pf1.y);
+ lhs_shmem2[threadIdx.y/4+16][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf2.x, lhs_pf2.y);
+ lhs_shmem2[threadIdx.y/4+24][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf3.x, lhs_pf3.y);
+
+ lhs_shmem2[threadIdx.y/4 + 32][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf0.z, lhs_pf0.w);
+ lhs_shmem2[threadIdx.y/4 + 40][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf1.z, lhs_pf1.w);
+ lhs_shmem2[threadIdx.y/4 + 48][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf2.z, lhs_pf2.w);
+ lhs_shmem2[threadIdx.y/4 + 56][threadIdx.x+(threadIdx.y%4)*8] = make_float2(lhs_pf3.z, lhs_pf3.w);
+
+ __syncthreads();
+
+ // Do the multiplies.
+ #pragma unroll
+ for (int koff = 0; koff < 32; koff ++) {
+ float2 a3 = lhs_shmem2[koff][threadIdx.x + (threadIdx.y % 4) * 8];
+ float2 a4 = lhs_shmem2[koff + 32][threadIdx.x + (threadIdx.y % 4) * 8];
+
+      // first feature is at (threadIdx.y/4) * 8; last is at start + 8.
+ int start_feature = (threadIdx.y / 4) * 8;
+
+ float2 br1 = rhs_shmem2[start_feature/2 + (koff % 4) * 32][koff/4];
+ float2 br2 = rhs_shmem2[start_feature/2 + 1 + (koff % 4) * 32][koff/4];
+ float2 br3 = rhs_shmem2[start_feature/2 + 2 + (koff % 4) * 32][koff/4];
+ float2 br4 = rhs_shmem2[start_feature/2 + 3 + (koff % 4) * 32][koff/4];
+
+ add_vals(a3, a4, br1, br2, br3, br4)
+ }
+ __syncthreads();
+ } // end loop over k
+
+
+ __syncthreads();
+ Index horiz_base = (threadIdx.y/4)*8+base_n;
+ if (!CHECK_LHS_BOUNDARY && !CHECK_RHS_BOUNDARY) {
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ } else if (!CHECK_RHS_BOUNDARY) {
+ if (lhs_vert + 3 < m_size) {
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ } else if (lhs_vert + 2 < m_size) {
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ }
+ } else if (lhs_vert + 1 < m_size) {
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ }
+ } else if (lhs_vert < m_size) {
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ }
+ }
+ } else if (!CHECK_LHS_BOUNDARY) {
+ // CHECK BOUNDARY_B
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ if (horiz_base + i < n_size) {
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ }
+ } else {
+ // CHECK both boundaries.
+ #pragma unroll
+ for (int i = 0; i < 8; i++) {
+ if (horiz_base + i < n_size) {
+ if (lhs_vert < m_size)
+ output(lhs_vert, horiz_base + i) = results[i].x;
+ if (lhs_vert + 1 < m_size)
+ output(lhs_vert + 1, horiz_base + i) = results[i].y;
+ if (lhs_vert + 2 < m_size)
+ output(lhs_vert + 2, horiz_base + i) = results[i].z;
+ if (lhs_vert + 3 < m_size)
+ output(lhs_vert + 3, horiz_base + i) = results[i].w;
+ }
+ }
+ }
+}
+
+
+template<typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper>
+__global__ void
+__launch_bounds__(256)
+EigenFloatContractionKernel(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output,
+ const Index m_size, const Index n_size, const Index k_size) {
+ __shared__ float2 lhs_shmem[64*32];
+ __shared__ float2 rhs_shmem[128*8];
+
+ typedef float2 LHS_MEM[64][32];
+ typedef float2 RHS_MEM[128][8];
+
+ typedef float2 LHS_MEM16x16[32][16];
+ typedef float2 RHS_MEM16x16[64][8];
+
+ const Index m_block_idx = blockIdx.x;
+ const Index n_block_idx = blockIdx.y;
+
+ const Index base_m = 128 * m_block_idx;
+ const Index base_n = 64 * n_block_idx;
+
+ const bool check_rhs = (base_n + 63) >= n_size;
+ const bool check_lhs128 = (base_m + 127) >= m_size;
+
+ if (!check_rhs) {
+ if (!check_lhs128) {
+ // >= 128 rows left
+ EigenFloatContractionKernelInternal<Index, LhsMapper, RhsMapper, OutputMapper, false, false>(
+ lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n);
+ } else {
+ EigenFloatContractionKernelInternal<Index, LhsMapper, RhsMapper, OutputMapper, true, false>(
+ lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n);
+ }
+ } else {
+ if (!check_lhs128) {
+ // >= 128 rows left
+ EigenFloatContractionKernelInternal<Index, LhsMapper, RhsMapper, OutputMapper, false, true>(
+ lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n);
+ } else {
+ EigenFloatContractionKernelInternal<Index, LhsMapper, RhsMapper, OutputMapper, true, true>(
+ lhs, rhs, output, *((LHS_MEM *) lhs_shmem), *((RHS_MEM *) rhs_shmem), m_size, n_size, k_size, base_m, base_n);
+ }
+ }
+}
+
+template<typename Index, typename LhsMapper,
+ typename RhsMapper, typename OutputMapper>
+__global__ void
+__launch_bounds__(256)
+EigenFloatContractionKernel16x16(const LhsMapper lhs, const RhsMapper rhs,
+ const OutputMapper output,
+ const Index m_size, const Index n_size, const Index k_size) {
+ __shared__ float2 lhs_shmem[32][16];
+ __shared__ float2 rhs_shmem[64][8];
+
+ const Index m_block_idx = blockIdx.x;
+ const Index n_block_idx = blockIdx.y;
+
+ const Index base_m = 64 * m_block_idx;
+ const Index base_n = 64 * n_block_idx;
+
+ if (base_m + 63 < m_size) {
+ if (base_n + 63 < n_size) {
+ EigenFloatContractionKernelInternal16x16<Index, LhsMapper, RhsMapper, OutputMapper, false, false>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n);
+ } else {
+ EigenFloatContractionKernelInternal16x16<Index, LhsMapper, RhsMapper, OutputMapper, false, true>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n);
+ }
+ } else {
+ if (base_n + 63 < n_size) {
+ EigenFloatContractionKernelInternal16x16<Index, LhsMapper, RhsMapper, OutputMapper, true, false>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n);
+ } else {
+ EigenFloatContractionKernelInternal16x16<Index, LhsMapper, RhsMapper, OutputMapper, true, true>(lhs, rhs, output, lhs_shmem, rhs_shmem, m_size, n_size, k_size, base_m, base_n);
+ }
+ }
+}
+
+
+template<typename Indices, typename LeftArgType, typename RightArgType>
+struct TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, GpuDevice> :
+ public TensorContractionEvaluatorBase<TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, GpuDevice> > {
+
+ typedef GpuDevice Device;
+
+ typedef TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, Device> Self;
+ typedef TensorContractionEvaluatorBase<Self> Base;
+
+ typedef TensorContractionOp<Indices, LeftArgType, RightArgType> XprType;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, GpuDevice>::type PacketReturnType;
+
+ enum {
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ // Most of the code is assuming that both input tensors are ColMajor. If the
+ // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS:
+ // If we want to compute A * B = C, where A is LHS and B is RHS, the code
+ // will pretend B is LHS and A is RHS.
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType;
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType;
+
+ static const int LDims =
+ internal::array_size<typename TensorEvaluator<EvalLeftArgType, Device>::Dimensions>::value;
+ static const int RDims =
+ internal::array_size<typename TensorEvaluator<EvalRightArgType, Device>::Dimensions>::value;
+ static const int ContractDims = internal::array_size<Indices>::value;
+
+ typedef array<Index, LDims> left_dim_mapper_t;
+ typedef array<Index, RDims> right_dim_mapper_t;
+
+ typedef array<Index, ContractDims> contract_t;
+ typedef array<Index, LDims - ContractDims> left_nocontract_t;
+ typedef array<Index, RDims - ContractDims> right_nocontract_t;
+
+ static const int NumDims = LDims + RDims - 2 * ContractDims;
+
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ // typedefs needed in evalTo
+ typedef typename internal::remove_const<typename EvalLeftArgType::Scalar>::type LhsScalar;
+ typedef typename internal::remove_const<typename EvalRightArgType::Scalar>::type RhsScalar;
+
+ typedef TensorEvaluator<EvalLeftArgType, Device> LeftEvaluator;
+ typedef TensorEvaluator<EvalRightArgType, Device> RightEvaluator;
+
+ typedef typename LeftEvaluator::Dimensions LeftDimensions;
+ typedef typename RightEvaluator::Dimensions RightDimensions;
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device) :
+ Base(op, device) {}
+
+ // We need to redefine this method to make nvcc happy
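+  // (When a destination buffer is supplied the product is written straight into
+  //  it and false is returned; otherwise a temporary is allocated, filled, and
+  //  later served through coeff()/packet()/data().)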
+ EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) {
+ this->m_leftImpl.evalSubExprsIfNeeded(NULL);
+ this->m_rightImpl.evalSubExprsIfNeeded(NULL);
+ if (data) {
+ evalTo(data);
+ return false;
+ } else {
+ this->m_result = static_cast<Scalar *>(this->m_device.allocate(this->dimensions().TotalSize() * sizeof(Scalar)));
+ evalTo(this->m_result);
+ return true;
+ }
+ }
+
+ void evalTo(Scalar* buffer) const {
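+    // Forward the runtime contiguity/reordering flags into the compile-time
+    // template parameters of evalTyped.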
+ if (this->m_lhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_reordered) {
+ evalTyped<true, true, true, Unaligned>(buffer);
+ }
+ else {
+ evalTyped<true, true, false, Unaligned>(buffer);
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_reordered) {
+ evalTyped<true, false, true, Unaligned>(buffer);
+ }
+ else {
+ evalTyped<true, false, false, Unaligned>(buffer);
+ }
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_contiguous) {
+ if (this->m_rhs_inner_dim_reordered) {
+ evalTyped<false, true, true, Unaligned>(buffer);
+ }
+ else {
+ evalTyped<false, true, false, Unaligned>(buffer);
+ }
+ }
+ else {
+ if (this->m_rhs_inner_dim_reordered) {
+ evalTyped<false, false, true, Unaligned>(buffer);
+ }
+ else {
+ evalTyped<false, false, false, Unaligned>(buffer);
+ }
+ }
+ }
+ }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalTyped(Scalar* buffer) const {
+ // columns in left side, rows in right side
+ const Index k = this->m_k_size;
+
+ // rows in left side
+ const Index m = this->m_i_size;
+
+ // columns in right side
+ const Index n = this->m_j_size;
+
+ // zero out the result buffer (which must be of size at least m * n * sizeof(Scalar))
+ this->m_device.memset(buffer, 0, m * n * sizeof(Scalar));
+
+ typedef internal::TensorContractionInputMapper<LhsScalar, Index, internal::Lhs,
+ LeftEvaluator, left_nocontract_t,
+ contract_t, 4,
+ lhs_inner_dim_contiguous,
+ false, Unaligned> LhsMapper;
+
+ typedef internal::TensorContractionInputMapper<RhsScalar, Index, internal::Rhs,
+ RightEvaluator, right_nocontract_t,
+ contract_t, 4,
+ rhs_inner_dim_contiguous,
+ rhs_inner_dim_reordered, Unaligned> RhsMapper;
+
+ typedef internal::blas_data_mapper<Scalar, Index, ColMajor> OutputMapper;
+
+
+ // initialize data mappers
+ LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides,
+ this->m_left_contracting_strides, this->m_k_strides);
+
+ RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides,
+ this->m_right_contracting_strides, this->m_k_strides);
+
+ OutputMapper output(buffer, m);
+
+ setCudaSharedMemConfig(cudaSharedMemBankSizeEightByte);
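+ // Kernel selection example (illustrative only): for a 1000 x 500 float
+ // result, n < 768, so the 16x16-thread kernel is launched on a
+ // ceil(1000/64) x ceil(500/64) = 16 x 8 grid of blocks.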
+ if (internal::is_same<LhsScalar, float>::value &&
+ internal::is_same<RhsScalar, float>::value) {
+ if (m < 768 || n < 768) {
+ const Index m_blocks = (m + 63) / 64;
+ const Index n_blocks = (n + 63) / 64;
+ const dim3 num_blocks(m_blocks, n_blocks, 1);
+ const dim3 block_size(16, 16, 1);
+ LAUNCH_CUDA_KERNEL((EigenFloatContractionKernel16x16<Index, LhsMapper, RhsMapper, OutputMapper>), num_blocks, block_size, 0, this->m_device, lhs, rhs, output, m, n, k);
+ } else {
+ const Index m_blocks = (m + 127) / 128;
+ const Index n_blocks = (n + 63) / 64;
+ const dim3 num_blocks(m_blocks, n_blocks, 1);
+ const dim3 block_size(8, 32, 1);
+ LAUNCH_CUDA_KERNEL((EigenFloatContractionKernel<Index, LhsMapper, RhsMapper, OutputMapper>), num_blocks, block_size, 0, this->m_device, lhs, rhs, output, m, n, k);
+ }
+ } else {
+ const Index m_blocks = (m + 63) / 64;
+ const Index n_blocks = (n + 63) / 64;
+ const dim3 num_blocks(m_blocks, n_blocks, 1);
+ const dim3 block_size(8, 8, 8);
+ LAUNCH_CUDA_KERNEL((EigenContractionKernel<Scalar, Index, LhsMapper, RhsMapper, OutputMapper>), num_blocks, block_size, 0, this->m_device, lhs, rhs, output, m, n, k);
+ }
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_USE_GPU and __CUDACC__
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_CUDA_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMappers.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMappers.h
new file mode 100644
index 0000000000..b5b09bf41e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMappers.h
@@ -0,0 +1,383 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Eric Martin <eric@ericmart.in>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPERS_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPERS_H
+
+// NOTE: The file has strong column major bias/assumptions, which are pointed out
+// in the comments. As of right now, this code will only work with the column major
+// packing routines.
+
+/*
+ * A tensor contraction can be represented by a matrix multiplication. We don't
+ * want to actually reshape the tensor into a matrix (because this involves a
+ * full copy of the tensor), so the reshaping operation is implicit in a sense.
+ * This means we need a collection of methods that take a matrix index and return
+ * the element of the tensor that would be at that index if we had actually
+ * reshaped the tensor into a matrix. This file consists of these methods.
+ */
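+
+/*
+ * Illustrative example (assumed shapes, for exposition only): contracting a
+ * ColMajor tensor A of dimensions (I, J, K) with a matrix B of dimensions
+ * (K, L) over the last index of A implicitly reshapes A into an (I*J) x K
+ * matrix M. Entry M(row, col) is the tensor coefficient A(row % I, row / I, col),
+ * whose linear ColMajor index is (row % I) + (row / I) * I + col * I * J,
+ * i.e. row + col * I * J. The mappers below compute such linear indices
+ * directly from the matrix (row, col) pair without materializing M.
+ */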
+
+namespace Eigen {
+namespace internal {
+
+enum {
+ Rhs = 0,
+ Lhs = 1,
+};
+
+/*
+ * Used to lookup the tensor index when working with the left and right
+ * arguments to a tensor contraction.
+ */
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ size_t packet_size, bool inner_dim_contiguous>
+class SimpleTensorContractionMapper {
+ public:
+ EIGEN_DEVICE_FUNC
+ SimpleTensorContractionMapper(const Tensor& tensor,
+ const nocontract_t& nocontract_strides,
+ const nocontract_t& ij_strides,
+ const contract_t& contract_strides,
+ const contract_t& k_strides) :
+ m_tensor(tensor),
+ m_nocontract_strides(nocontract_strides),
+ m_ij_strides(ij_strides),
+ m_contract_strides(contract_strides),
+ m_k_strides(k_strides) { }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE void prefetch(int i) { }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar operator()(Index row) const {
+ // column major assumption
+ return operator()(row, 0);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar operator()(Index row, Index col) const {
+ return m_tensor.coeff(computeIndex(row, col));
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index computeIndex(Index row, Index col) const {
+ const bool left = (side == Lhs);
+ Index nocontract_val = left ? row : col;
+ Index linidx = 0;
+ for (int i = array_size<nocontract_t>::value - 1; i > 0; i--) {
+ const Index idx = nocontract_val / m_ij_strides[i];
+ linidx += idx * m_nocontract_strides[i];
+ nocontract_val -= idx * m_ij_strides[i];
+ }
+ if (array_size<typename Tensor::Dimensions>::value > array_size<contract_t>::value) {
+ if (side == Lhs && inner_dim_contiguous) {
+ eigen_assert(m_nocontract_strides[0] == 1);
+ linidx += nocontract_val;
+ } else {
+ linidx += nocontract_val * m_nocontract_strides[0];
+ }
+ }
+
+ Index contract_val = left ? col : row;
+ for (int i = array_size<contract_t>::value - 1; i > 0; i--) {
+ const Index idx = contract_val / m_k_strides[i];
+ linidx += idx * m_contract_strides[i];
+ contract_val -= idx * m_k_strides[i];
+ }
+ EIGEN_STATIC_ASSERT(array_size<contract_t>::value > 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ if (side == Rhs && inner_dim_contiguous) {
+ eigen_assert(m_contract_strides[0] == 1);
+ linidx += contract_val;
+ } else {
+ linidx += contract_val * m_contract_strides[0];
+ }
+
+ return linidx;
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE IndexPair<Index> computeIndexPair(Index row, Index col, const Index distance) const {
+ const bool left = (side == Lhs);
+ Index nocontract_val[2] = {left ? row : col, left ? row + distance : col};
+ Index linidx[2] = {0, 0};
+ for (int i = array_size<nocontract_t>::value - 1; i > 0; i--) {
+ const Index idx0 = nocontract_val[0] / m_ij_strides[i];
+ const Index idx1 = nocontract_val[1] / m_ij_strides[i];
+ linidx[0] += idx0 * m_nocontract_strides[i];
+ linidx[1] += idx1 * m_nocontract_strides[i];
+ nocontract_val[0] -= idx0 * m_ij_strides[i];
+ nocontract_val[1] -= idx1 * m_ij_strides[i];
+ }
+ if (array_size<typename Tensor::Dimensions>::value > array_size<contract_t>::value) {
+ if (side == Lhs && inner_dim_contiguous) {
+ eigen_assert(m_nocontract_strides[0] == 1);
+ linidx[0] += nocontract_val[0];
+ linidx[1] += nocontract_val[1];
+ } else {
+ linidx[0] += nocontract_val[0] * m_nocontract_strides[0];
+ linidx[1] += nocontract_val[1] * m_nocontract_strides[0];
+ }
+ }
+
+ Index contract_val[2] = {left ? col : row, left ? col : row + distance};
+ for (int i = array_size<contract_t>::value - 1; i > 0; i--) {
+ const Index idx0 = contract_val[0] / m_k_strides[i];
+ const Index idx1 = contract_val[1] / m_k_strides[i];
+ linidx[0] += idx0 * m_contract_strides[i];
+ linidx[1] += idx1 * m_contract_strides[i];
+ contract_val[0] -= idx0 * m_k_strides[i];
+ contract_val[1] -= idx1 * m_k_strides[i];
+ }
+ EIGEN_STATIC_ASSERT(array_size<contract_t>::value > 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ if (side == Rhs && inner_dim_contiguous) {
+ eigen_assert(m_contract_strides[0] == 1);
+ linidx[0] += contract_val[0];
+ linidx[1] += contract_val[1];
+ } else {
+ linidx[0] += contract_val[0] * m_contract_strides[0];
+ linidx[1] += contract_val[1] * m_contract_strides[0];
+ }
+ return IndexPair<Index>(linidx[0], linidx[1]);
+ }
+
+ Index firstAligned(Index size) const {
+ return size;
+ }
+ Index stride() const {
+ return 1;
+ }
+
+ protected:
+ const Tensor m_tensor;
+ const nocontract_t m_nocontract_strides;
+ const nocontract_t m_ij_strides;
+ const contract_t m_contract_strides;
+ const contract_t m_k_strides;
+};
+
+
+
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ size_t packet_size, bool inner_dim_contiguous,
+ bool inner_dim_reordered, int Alignment>
+ class BaseTensorContractionMapper : public SimpleTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous>
+{
+ public:
+ typedef SimpleTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous> ParentMapper;
+
+ EIGEN_DEVICE_FUNC
+ BaseTensorContractionMapper(const Tensor& tensor,
+ const nocontract_t& nocontract_strides,
+ const nocontract_t& ij_strides,
+ const contract_t& contract_strides,
+ const contract_t& k_strides) :
+ ParentMapper(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { }
+
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename packet_traits<Scalar>::half HalfPacket;
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Packet loadPacket(Index i, Index j) const {
+ // whole method makes column major assumption
+
+ // don't need to add offsets for now (because operator() handles that)
+ // the current code assumes the packet size is a multiple of 2
+ EIGEN_STATIC_ASSERT(packet_size % 2 == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ if (Tensor::PacketAccess && inner_dim_contiguous && !inner_dim_reordered) {
+ const Index index = this->computeIndex(i, j);
+ eigen_assert(this->computeIndex(i+packet_size-1, j) == index + packet_size-1);
+ return this->m_tensor.template packet<Alignment>(index);
+ }
+
+ const IndexPair<Index> indexPair = this->computeIndexPair(i, j, packet_size - 1);
+ const Index first = indexPair.first;
+ const Index last = indexPair.second;
+
+ // We can always do optimized packet reads from left hand side right now, because
+ // the vertical matrix dimension on the left hand side is never contracting.
+ // On the right hand side we need to check if the contracting dimensions may have
+ // been shuffled first.
+ if (Tensor::PacketAccess &&
+ (side == Lhs || internal::array_size<contract_t>::value <= 1 || !inner_dim_reordered) &&
+ (last - first) == (packet_size - 1)) {
+
+ return this->m_tensor.template packet<Alignment>(first);
+ }
+
+ EIGEN_ALIGN_DEFAULT Scalar data[packet_size];
+
+ data[0] = this->m_tensor.coeff(first);
+ for (Index k = 1; k < packet_size - 1; k += 2) {
+ const IndexPair<Index> internal_pair = this->computeIndexPair(i + k, j, 1);
+ data[k] = this->m_tensor.coeff(internal_pair.first);
+ data[k + 1] = this->m_tensor.coeff(internal_pair.second);
+ }
+ data[packet_size - 1] = this->m_tensor.coeff(last);
+
+ return pload<Packet>(data);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE HalfPacket loadHalfPacket(Index i, Index j) const {
+ // whole method makes column major assumption
+
+ // don't need to add offsets for now (because operator handles that)
+ const Index half_packet_size = unpacket_traits<HalfPacket>::size;
+ if (half_packet_size == packet_size) {
+ return loadPacket(i, j);
+ }
+ EIGEN_ALIGN_DEFAULT Scalar data[half_packet_size];
+ for (Index k = 0; k < half_packet_size; k++) {
+ data[k] = operator()(i + k, j);
+ }
+ return pload<HalfPacket>(data);
+ }
+};
+
+
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ bool inner_dim_contiguous,
+ bool inner_dim_reordered, int Alignment>
+class BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, 1, inner_dim_contiguous, inner_dim_reordered, Alignment> : public SimpleTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, 1, inner_dim_contiguous>
+{
+ public:
+ typedef SimpleTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, 1, inner_dim_contiguous> ParentMapper;
+
+ EIGEN_DEVICE_FUNC
+ BaseTensorContractionMapper(const Tensor& tensor,
+ const nocontract_t& nocontract_strides,
+ const nocontract_t& ij_strides,
+ const contract_t& contract_strides,
+ const contract_t& k_strides) :
+ ParentMapper(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { }
+
+ typedef typename packet_traits<Scalar>::type Packet;
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Packet loadPacket(Index i, Index j) const {
+ EIGEN_ALIGN_DEFAULT Scalar data[1];
+ data[0] = this->m_tensor.coeff(this->computeIndex(i, j));
+ return pload<typename packet_traits<Scalar>::type>(data);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Packet loadHalfPacket(Index i, Index j) const {
+ return loadPacket(i, j);
+ }
+};
+
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ size_t packet_size,
+ bool inner_dim_contiguous, bool inner_dim_reordered, int Alignment>
+class TensorContractionInputMapper;
+
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ size_t packet_size,
+ bool inner_dim_contiguous, bool inner_dim_reordered, int Alignment>
+class TensorContractionSubMapper {
+ public:
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename packet_traits<Scalar>::half HalfPacket;
+
+ typedef TensorContractionInputMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> ParentMapper;
+ typedef TensorContractionSubMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> Self;
+ typedef Self LinearMapper;
+
+ EIGEN_DEVICE_FUNC TensorContractionSubMapper(const ParentMapper& base_mapper, Index vert_offset, Index horiz_offset)
+ : m_base_mapper(base_mapper), m_vert_offset(vert_offset), m_horiz_offset(horiz_offset) { }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar operator()(Index i) const {
+ return m_base_mapper(i + m_vert_offset, m_horiz_offset);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Scalar operator()(Index i, Index j) const {
+ return m_base_mapper(i + m_vert_offset, j + m_horiz_offset);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet loadPacket(Index i) const {
+ return m_base_mapper.loadPacket(i + m_vert_offset, m_horiz_offset);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE Packet loadPacket(Index i, Index j) const {
+ return m_base_mapper.loadPacket(i + m_vert_offset, j + m_horiz_offset);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE HalfPacket loadHalfPacket(Index i) const {
+ return m_base_mapper.loadHalfPacket(i + m_vert_offset, m_horiz_offset);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE void storePacket(Index i, Packet p) const {
+ m_base_mapper.storePacket(i + m_vert_offset, m_horiz_offset, p);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE LinearMapper getLinearMapper(Index i, Index j) const {
+ return LinearMapper(m_base_mapper, i + m_vert_offset, j + m_horiz_offset);
+ }
+
+ template <typename PacketT, int AlignmentType>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketT load(Index i) const {
+ EIGEN_STATIC_ASSERT((internal::is_same<PacketT, Packet>::value), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT((AlignmentType == Aligned || Alignment == Unaligned), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return loadPacket(i);
+ }
+
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC bool aligned(Index i) const {
+ return false;
+ }
+
+ private:
+ const ParentMapper& m_base_mapper;
+ const Index m_vert_offset;
+ const Index m_horiz_offset;
+};
+
+
+template<typename Scalar, typename Index, int side,
+ typename Tensor,
+ typename nocontract_t, typename contract_t,
+ size_t packet_size,
+ bool inner_dim_contiguous, bool inner_dim_reordered, int Alignment>
+class TensorContractionInputMapper
+ : public BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> {
+
+ public:
+ typedef BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> Base;
+ typedef TensorContractionSubMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment> SubMapper;
+ typedef SubMapper VectorMapper;
+
+ EIGEN_DEVICE_FUNC TensorContractionInputMapper(const Tensor& tensor,
+ const nocontract_t& nocontract_strides,
+ const nocontract_t& ij_strides,
+ const contract_t& contract_strides,
+ const contract_t& k_strides)
+ : Base(tensor, nocontract_strides, ij_strides, contract_strides, k_strides) { }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE SubMapper getSubMapper(Index i, Index j) const {
+ return SubMapper(*this, i, j);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE VectorMapper getVectorMapper(Index i, Index j) const {
+ return VectorMapper(*this, i, j);
+ }
+};
+
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_MAPPERS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h
new file mode 100644
index 0000000000..c335086902
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h
@@ -0,0 +1,713 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H
+
+namespace Eigen {
+namespace internal {
+
+// Specify blocking strategy for thread pool by cols
+template<typename LhsScalar, typename RhsScalar, int KcFactor, typename Index>
+struct ComputeGemmByColBlockingSizes {
+ void operator()(Index& k, Index& m, Index& n, Index num_threads = 1)
+ {
+ computeProductBlockingSizes<LhsScalar,RhsScalar,1>(k, m, n, num_threads);
+ }
+};
+
+// Specify blocking strategy for thread pool by rows
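+// For instance (illustrative numbers), with m = 1000 rows and num_threads = 8
+// the row block size becomes ((1000 / 8 + 15) / 16) * 16 = 128, so each thread
+// works on a strip of at most 128 rows (the last strip may be smaller).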
+template<typename LhsScalar, typename RhsScalar, int KcFactor, typename Index>
+struct ComputeGemmByRowBlockingSizes {
+ void operator()(Index& k, Index& m, Index& n, Index num_threads = 1)
+ {
+ if (!k || !m || !n) {
+ return;
+ }
+ m = (((m / num_threads) + 15) / 16) * 16;
+ }
+};
+
+} // namespace internal
+} // namespace Eigen
+
+// evaluator for thread pool device
+#ifdef EIGEN_USE_THREADS
+
+namespace Eigen {
+namespace internal {
+
+template<typename LhsScalar, typename LhsMapper, typename Index>
+struct packLhsArg {
+ LhsScalar* blockA;
+ const LhsMapper& lhs;
+ const Index m_start;
+ const Index k_start;
+ const Index mc;
+ const Index kc;
+};
+
+template<typename LhsScalar, typename RhsScalar, typename RhsMapper, typename OutputMapper, typename Index>
+struct packRhsAndKernelArg {
+ const FixedSizeVector<LhsScalar*>* blockAs;
+ RhsScalar* blockB;
+ const RhsMapper& rhs;
+ OutputMapper& output;
+ const Index m;
+ const Index k;
+ const Index n;
+ const Index mc;
+ const Index kc;
+ const Index nc;
+ const Index num_threads;
+ const Index num_blockAs;
+ const Index max_m;
+ const Index k_block_idx;
+ const Index m_block_idx;
+ const Index n_block_idx;
+ const Index m_blocks;
+ const Index n_blocks;
+ FixedSizeVector<Notification*>* kernel_notifications;
+ const FixedSizeVector<Notification*>* lhs_notifications;
+ const bool need_to_pack;
+};
+
+template<typename RhsScalar, typename RhsMapper, typename Index>
+struct packRhsArg {
+ RhsScalar* blockB;
+ const RhsMapper& rhs;
+ const Index n_start;
+ const Index k_start;
+ const Index nc;
+ const Index kc;
+};
+
+template<typename LhsScalar, typename RhsScalar, typename LhsMapper, typename OutputMapper, typename Index>
+struct packLhsAndKernelArg {
+ const FixedSizeVector<RhsScalar*>* blockBs;
+ LhsScalar* blockA;
+ const LhsMapper& lhs;
+ OutputMapper& output;
+ const Index m;
+ const Index k;
+ const Index n;
+ const Index mc;
+ const Index kc;
+ const Index nc;
+ const Index num_threads;
+ const Index num_blockBs;
+ const Index max_n;
+ const Index k_block_idx;
+ const Index m_block_idx;
+ const Index n_block_idx;
+ const Index m_blocks;
+ const Index n_blocks;
+ FixedSizeVector<Notification*>* kernel_notifications;
+ const FixedSizeVector<Notification*>* rhs_notifications;
+ const bool need_to_pack;
+};
+
+} // end namespace internal
+
+
+template<typename Indices, typename LeftArgType, typename RightArgType>
+struct TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, ThreadPoolDevice> :
+ public TensorContractionEvaluatorBase<TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, ThreadPoolDevice> > {
+
+ typedef ThreadPoolDevice Device;
+
+ typedef TensorEvaluator<const TensorContractionOp<Indices, LeftArgType, RightArgType>, Device> Self;
+ typedef TensorContractionEvaluatorBase<Self> Base;
+
+ typedef TensorContractionOp<Indices, LeftArgType, RightArgType> XprType;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, ThreadPoolDevice>::type PacketReturnType;
+
+ enum {
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ };
+
+ // Most of the code is assuming that both input tensors are ColMajor. If the
+ // inputs are RowMajor, we will "cheat" by swapping the LHS and RHS:
+ // If we want to compute A * B = C, where A is LHS and B is RHS, the code
+ // will pretend B is LHS and A is RHS.
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), LeftArgType, RightArgType>::type EvalLeftArgType;
+ typedef typename internal::conditional<
+ static_cast<int>(Layout) == static_cast<int>(ColMajor), RightArgType, LeftArgType>::type EvalRightArgType;
+
+ static const int LDims =
+ internal::array_size<typename TensorEvaluator<EvalLeftArgType, Device>::Dimensions>::value;
+ static const int RDims =
+ internal::array_size<typename TensorEvaluator<EvalRightArgType, Device>::Dimensions>::value;
+ static const int ContractDims = internal::array_size<Indices>::value;
+
+ typedef array<Index, LDims> left_dim_mapper_t;
+ typedef array<Index, RDims> right_dim_mapper_t;
+
+ typedef array<Index, ContractDims> contract_t;
+ typedef array<Index, LDims - ContractDims> left_nocontract_t;
+ typedef array<Index, RDims - ContractDims> right_nocontract_t;
+
+ static const int NumDims = LDims + RDims - 2 * ContractDims;
+
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ // typedefs needed in evalTo
+ typedef typename internal::remove_const<typename EvalLeftArgType::Scalar>::type LhsScalar;
+ typedef typename internal::remove_const<typename EvalRightArgType::Scalar>::type RhsScalar;
+ typedef typename internal::gebp_traits<LhsScalar, RhsScalar> Traits;
+
+ typedef TensorEvaluator<EvalLeftArgType, Device> LeftEvaluator;
+ typedef TensorEvaluator<EvalRightArgType, Device> RightEvaluator;
+
+ TensorEvaluator(const XprType& op, const Device& device) :
+ Base(op, device) {}
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalProduct(Scalar* buffer) const {
+ // Disable Gemv on ARM/AVX or if multiple threads are in use
+#if !defined(EIGEN_VECTORIZE_NEON) && !defined(EIGEN_VECTORIZE_AVX)
+ if (this->m_j_size == 1 && this->m_device.numThreads() == 1) {
+ this->template evalGemv<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Alignment>(buffer);
+ return;
+ }
+#endif
+
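+ // Illustrative reading of the heuristic below: the row-blocked path
+ // (evalGemmByRows) is only taken when each thread would get fewer than
+ // Traits::nr columns of the result but at least Traits::mr rows, i.e. when
+ // the output is tall and narrow relative to the thread count; otherwise the
+ // column-blocked path is used. (Traits::mr and Traits::nr are, roughly, the
+ // GEBP register-level blocking sizes.)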
+ if (this->m_j_size / this->m_device.numThreads() < Traits::nr &&
+ this->m_i_size / this->m_device.numThreads() >= Traits::mr) {
+ evalGemmByRows<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Alignment>(buffer);
+ } else {
+ evalGemmByCols<lhs_inner_dim_contiguous, rhs_inner_dim_contiguous, rhs_inner_dim_reordered, Alignment>(buffer);
+ }
+ }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalGemmByCols(Scalar* buffer) const {
+ // columns in left side, rows in right side
+ const Index k = this->m_k_size;
+
+ // rows in left side
+ const Index m = this->m_i_size;
+
+ // columns in right side
+ const Index n = this->m_j_size;
+
+ // zero out the result buffer (which must be of size at least m * n * sizeof(Scalar))
+ this->m_device.memset(buffer, 0, m * n * sizeof(Scalar));
+
+
+ const int lhs_packet_size = PacketType<LhsScalar, Device>::size;
+ const int rhs_packet_size = PacketType<RhsScalar, Device>::size;
+
+ typedef internal::TensorContractionInputMapper<LhsScalar, Index, internal::Lhs,
+ LeftEvaluator, left_nocontract_t,
+ contract_t, lhs_packet_size,
+ lhs_inner_dim_contiguous,
+ false, Unaligned> LhsMapper;
+
+ typedef internal::TensorContractionInputMapper<RhsScalar, Index, internal::Rhs,
+ RightEvaluator, right_nocontract_t,
+ contract_t, rhs_packet_size,
+ rhs_inner_dim_contiguous,
+ rhs_inner_dim_reordered, Unaligned> RhsMapper;
+
+ typedef internal::blas_data_mapper<Scalar, Index, ColMajor> OutputMapper;
+
+ // TODO: packing could be faster sometimes if we supported row major tensor mappers
+ typedef internal::gemm_pack_lhs<LhsScalar, Index, typename LhsMapper::SubMapper, Traits::mr,
+ Traits::LhsProgress, ColMajor> LhsPacker;
+ typedef internal::gemm_pack_rhs<RhsScalar, Index, typename RhsMapper::SubMapper, Traits::nr, ColMajor> RhsPacker;
+
+ // TODO: replace false, false with conjugate values?
+ typedef internal::gebp_kernel<LhsScalar, RhsScalar, Index, OutputMapper,
+ Traits::mr, Traits::nr, false, false> GebpKernel;
+
+ typedef internal::packLhsArg<LhsScalar, LhsMapper, Index> packLArg;
+ typedef internal::packRhsAndKernelArg<LhsScalar, RhsScalar, RhsMapper, OutputMapper, Index> packRKArg;
+
+ // initialize data mappers
+ LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides,
+ this->m_left_contracting_strides, this->m_k_strides);
+
+ RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides,
+ this->m_right_contracting_strides, this->m_k_strides);
+
+ OutputMapper output(buffer, m);
+
+ LhsPacker pack_lhs;
+
+ // compute block sizes (which depend on number of threads)
+ const Index num_threads = this->m_device.numThreads();
+ Index mc = m;
+ Index nc = n;
+ Index kc = k;
+ internal::ComputeGemmByColBlockingSizes<LhsScalar,RhsScalar,1,Index> block;
+ block(kc, mc, nc, num_threads);
+ eigen_assert(mc <= m);
+ eigen_assert(nc <= n);
+ eigen_assert(kc <= k);
+
+#define CEIL_DIV(a, b) (((a) + (b) - 1) / (b))
+ const Index k_blocks = CEIL_DIV(k, kc);
+ const Index n_blocks = CEIL_DIV(n, nc);
+ const Index m_blocks = CEIL_DIV(m, mc);
+#undef CEIL_DIV
+
+ const int sizeA = mc * kc;
+ const int sizeB = kc * nc;
+
+ /* cout << "m: " << m << " n: " << n << " k: " << k << endl;
+ cout << "mc: " << mc << " nc: " << nc << " kc: " << kc << endl;
+ cout << "m_blocks: " << m_blocks << " n_blocks: " << n_blocks << " k_blocks: " << k_blocks << endl;
+ cout << "num threads: " << num_threads << endl;
+ */
+
+ // note: m_device.allocate should return 16 byte aligned pointers, but if blockA and blockB
+ // aren't 16 byte aligned segfaults will happen due to SIMD instructions
+ // note: You can get away with allocating just a single blockA and offsets and meet the
+ // alignment requirements with the assumption that
+ // (Traits::mr * sizeof(ResScalar)) % 16 == 0
+ const Index numBlockAs = (std::min)(num_threads, m_blocks);
+ FixedSizeVector<LhsScalar *> blockAs(num_threads);
+ for (int i = 0; i < num_threads; i++) {
+ blockAs.push_back(static_cast<LhsScalar *>(this->m_device.allocate(sizeA * sizeof(LhsScalar))));
+ }
+
+ // To circumvent alignment issues, I'm just going to separately allocate the memory for each thread
+ // TODO: is this too much memory to allocate? This simplifies coding a lot, but is wasteful.
+ // Other options: (1) reuse memory when a thread finishes. con: tricky
+ // (2) allocate block B memory in each thread. con: overhead
+ FixedSizeVector<RhsScalar *> blockBs(n_blocks);
+ for (int i = 0; i < n_blocks; i++) {
+ blockBs.push_back(static_cast<RhsScalar *>(this->m_device.allocate(sizeB * sizeof(RhsScalar))));
+ }
+
+ // lhs_notifications starts with all null Notifications
+ FixedSizeVector<Notification*> lhs_notifications(num_threads, nullptr);
+
+ // this should really be numBlockAs * n_blocks;
+ const Index num_kernel_notifications = num_threads * n_blocks;
+ FixedSizeVector<Notification*> kernel_notifications(num_kernel_notifications,
+ nullptr);
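+ // Indexing example (illustrative only): with num_threads = 4 and
+ // n_blocks = 3, the kernel that uses LHS slot blockAId = 2 for output
+ // column block n_block_idx = 1 signals
+ // kernel_notifications[2 * 3 + 1] = kernel_notifications[7].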
+
+ for (Index k_block_idx = 0; k_block_idx < k_blocks; k_block_idx++) {
+ const Index k_start = k_block_idx * kc;
+ // make sure we don't overshoot right edge of left matrix
+ const Index actual_kc = (std::min)(k_start + kc, k) - k_start;
+
+ for (Index m_block_idx = 0; m_block_idx < m_blocks; m_block_idx += numBlockAs) {
+ const int num_blocks = (std::min)(m_blocks-m_block_idx, numBlockAs);
+
+ for (Index mt_block_idx = m_block_idx; mt_block_idx < m_block_idx+num_blocks; mt_block_idx++) {
+ const Index m_start = mt_block_idx * mc;
+ const Index actual_mc = (std::min)(m_start + mc, m) - m_start;
+ eigen_assert(actual_mc > 0);
+
+ int blockAId = (k_block_idx * m_blocks + mt_block_idx) % num_threads;
+
+ // Wait for previous RHS kernels to complete.
+ for (int i = 0; i < n_blocks; ++i) {
+ int notification_id = (blockAId * n_blocks + i);
+
+ // Wait for any current kernels using this slot to complete
+ // before using it.
+ if (kernel_notifications[notification_id]) {
+ wait_until_ready(kernel_notifications[notification_id]);
+ delete kernel_notifications[notification_id];
+ }
+ kernel_notifications[notification_id] = new Notification();
+ }
+ const packLArg arg = {
+ blockAs[blockAId], // blockA
+ lhs, // lhs
+ m_start, // m
+ k_start, // k
+ actual_mc, // mc
+ actual_kc, // kc
+ };
+
+ // Delete any existing notification since we may be
+ // replacing it. The algorithm should ensure that there are
+ // no existing waiters on this notification.
+ delete lhs_notifications[blockAId];
+ lhs_notifications[blockAId] =
+ this->m_device.enqueue(&Self::packLhs<packLArg, LhsPacker>, arg);
+ }
+
+ // now start kernels.
+ const Index m_base_start = m_block_idx * mc;
+ const bool need_to_pack = m_block_idx == 0;
+
+ for (Index n_block_idx = 0; n_block_idx < n_blocks; n_block_idx++) {
+ const Index n_start = n_block_idx * nc;
+ const Index actual_nc = (std::min)(n_start + nc, n) - n_start;
+
+ // first make sure the previous kernels are all done before overwriting rhs. Also wait if
+ // we're going to start new k. In both cases need_to_pack is true.
+ if (need_to_pack) {
+ for (int i = num_blocks; i < num_threads; ++i) {
+ Index blockAId = (k_block_idx * m_blocks + i + m_block_idx) % num_threads;
+ Index future_id = (blockAId * n_blocks + n_block_idx);
+ wait_until_ready(kernel_notifications[future_id]);
+ }
+ }
+
+ packRKArg arg = {
+ &blockAs, // blockA
+ blockBs[n_block_idx], // blockB
+ rhs, // rhs
+ output, // output
+ m_base_start, // m
+ k_start, // k
+ n_start, // n
+ mc, // mc
+ actual_kc, // kc
+ actual_nc, // nc
+ num_threads,
+ numBlockAs,
+ m,
+ k_block_idx,
+ m_block_idx,
+ n_block_idx, // n_block_idx
+ m_blocks, // m_blocks
+ n_blocks, // n_blocks
+ &kernel_notifications, // kernel_notifications
+ &lhs_notifications, // lhs_notifications
+ need_to_pack, // need_to_pack
+ };
+
+ // We asynchronously kick off this function, which ends up
+ // notifying the appropriate kernel_notifications objects,
+ // which this thread waits on before exiting.
+ //
+ // The wait for kernel_notifications below ensures that we
+ // don't have to keep track of the launch of this work.
+ this->m_device.enqueue_and_forget(&Self::packRhsAndKernel<packRKArg, RhsPacker, GebpKernel>, arg);
+ }
+ }
+ }
+
+ // Make sure all the kernels are done.
+ for (int i = 0; i < kernel_notifications.size(); ++i) {
+ wait_until_ready(kernel_notifications[i]);
+ delete kernel_notifications[i];
+ }
+
+ // No need to wait for lhs notifications since they should have
+ // already been waited on. Just clean them up.
+ for (int i = 0; i < lhs_notifications.size(); ++i) {
+ delete lhs_notifications[i];
+ }
+
+ // deallocate all of the memory for both A and B's
+ for (int i = 0; i < blockAs.size(); i++) {
+ this->m_device.deallocate(blockAs[i]);
+ }
+ for (int i = 0; i < blockBs.size(); i++) {
+ this->m_device.deallocate(blockBs[i]);
+ }
+ }
+
+ /*
+ * Packs a LHS block of size (mt, kc) starting at lhs(m, k). Before packing
+ * the LHS block, check that all of the kernels that worked on the same
+ * mt_block_idx in the previous m_block are done.
+ */
+ template <typename packLArg, typename LhsPacker>
+ static void packLhs(const packLArg arg) {
+ // perform actual packing
+ LhsPacker pack_lhs;
+ pack_lhs(arg.blockA, arg.lhs.getSubMapper(arg.m_start, arg.k_start), arg.kc, arg.mc);
+ }
+
+ /*
+ * Packs a RHS block of size (kc, nc) starting at (k, n) after checking that
+ * all kernels in the previous block are done.
+ * Then for each LHS future, we wait on the future and then call GEBP
+ * on the area packed by the future (which starts at
+ * blockA + future_idx * mt * kc) on the LHS and with the full packed
+ * RHS block.
+ * The output of this GEBP is written to output(m + i * mt, n).
+ */
+ template <typename packRKArg, typename RhsPacker, typename GebpKernel>
+ static void packRhsAndKernel(packRKArg arg) {
+ if (arg.need_to_pack) {
+ RhsPacker pack_rhs;
+ pack_rhs(arg.blockB, arg.rhs.getSubMapper(arg.k, arg.n), arg.kc, arg.nc);
+ }
+
+ GebpKernel gebp;
+ for (Index mt_block_idx = 0; mt_block_idx < arg.num_blockAs; mt_block_idx++) {
+ const Index m_base_start = arg.m + arg.mc*mt_block_idx;
+ if (m_base_start < arg.max_m) {
+ int blockAId = (arg.k_block_idx * arg.m_blocks + mt_block_idx + arg.m_block_idx) % arg.num_threads;
+ wait_until_ready((*arg.lhs_notifications)[blockAId]);
+ const Index actual_mc = (std::min)(m_base_start + arg.mc, arg.max_m) - m_base_start;
+ gebp(arg.output.getSubMapper(m_base_start, arg.n),
+ (*arg.blockAs)[blockAId], arg.blockB,
+ actual_mc, arg.kc, arg.nc, Scalar(1), -1, -1, 0, 0);
+
+ // Notify that the kernel is done.
+ const Index set_idx = blockAId * arg.n_blocks + arg.n_block_idx;
+ (*arg.kernel_notifications)[set_idx]->Notify();
+ }
+ }
+ }
+
+ template <bool lhs_inner_dim_contiguous, bool rhs_inner_dim_contiguous, bool rhs_inner_dim_reordered, int Alignment>
+ void evalGemmByRows(Scalar* buffer) const {
+ // columns in left side, rows in right side
+ const Index k = this->m_k_size;
+
+ // rows in left side
+ const Index m = this->m_i_size;
+
+ // columns in right side
+ const Index n = this->m_j_size;
+
+ // zero out the result buffer (which must be of size at least m * n * sizeof(Scalar))
+ this->m_device.memset(buffer, 0, m * n * sizeof(Scalar));
+
+ const int lhs_packet_size = PacketType<LhsScalar, ThreadPoolDevice>::size;
+ const int rhs_packet_size = PacketType<RhsScalar, ThreadPoolDevice>::size;
+
+ typedef internal::TensorContractionInputMapper<LhsScalar, Index, internal::Lhs,
+ LeftEvaluator, left_nocontract_t,
+ contract_t, lhs_packet_size,
+ lhs_inner_dim_contiguous,
+ false, Unaligned> LhsMapper;
+
+ typedef internal::TensorContractionInputMapper<RhsScalar, Index, internal::Rhs,
+ RightEvaluator, right_nocontract_t,
+ contract_t, rhs_packet_size,
+ rhs_inner_dim_contiguous,
+ rhs_inner_dim_reordered, Unaligned> RhsMapper;
+
+ typedef internal::blas_data_mapper<Scalar, Index, ColMajor> OutputMapper;
+
+ // TODO: packing could be faster sometimes if we supported row major tensor mappers
+ typedef internal::gemm_pack_lhs<LhsScalar, Index, typename LhsMapper::SubMapper, Traits::mr,
+ Traits::LhsProgress, ColMajor> LhsPacker;
+ typedef internal::gemm_pack_rhs<RhsScalar, Index, typename RhsMapper::SubMapper, Traits::nr, ColMajor> RhsPacker;
+
+ // TODO: replace false, false with conjugate values?
+ typedef internal::gebp_kernel<LhsScalar, RhsScalar, Index, OutputMapper,
+ Traits::mr, Traits::nr, false, false> GebpKernel;
+
+ typedef internal::packRhsArg<RhsScalar, RhsMapper, Index> packRArg;
+ typedef internal::packLhsAndKernelArg<LhsScalar, RhsScalar, LhsMapper, OutputMapper, Index> packLKArg;
+
+ // initialize data mappers
+ LhsMapper lhs(this->m_leftImpl, this->m_left_nocontract_strides, this->m_i_strides,
+ this->m_left_contracting_strides, this->m_k_strides);
+
+ RhsMapper rhs(this->m_rightImpl, this->m_right_nocontract_strides, this->m_j_strides,
+ this->m_right_contracting_strides, this->m_k_strides);
+
+ OutputMapper output(buffer, m);
+
+ RhsPacker pack_rhs;
+
+ // compute block sizes (which depend on number of threads)
+ const Index num_threads = this->m_device.numThreads();
+ Index mc = m;
+ Index nc = n;
+ Index kc = k;
+ internal::ComputeGemmByRowBlockingSizes<LhsScalar,RhsScalar,1,Index> block;
+ block(kc, mc, nc, num_threads);
+ eigen_assert(mc <= m);
+ eigen_assert(nc <= n);
+ eigen_assert(kc <= k);
+
+#define CEIL_DIV(a, b) (((a) + (b) - 1) / (b))
+ const Index k_blocks = CEIL_DIV(k, kc);
+ const Index n_blocks = CEIL_DIV(n, nc);
+ const Index m_blocks = CEIL_DIV(m, mc);
+#undef CEIL_DIV
+
+
+ const int sizeA = mc * kc;
+ const int sizeB = kc * nc;
+
+ const Index numBlockBs = (std::min)(num_threads, n_blocks);
+ FixedSizeVector<RhsScalar *> blockBs(num_threads);
+ for (int i = 0; i < num_threads; i++) {
+ blockBs.push_back(static_cast<RhsScalar *>(this->m_device.allocate(sizeB * sizeof(RhsScalar))));
+ }
+
+ FixedSizeVector<LhsScalar *> blockAs(m_blocks);
+ for (int i = 0; i < m_blocks; i++) {
+ blockAs.push_back(static_cast<LhsScalar *>(this->m_device.allocate(sizeA * sizeof(LhsScalar))));
+ }
+
+ // rhs_notifications starts with all null Notifications
+ FixedSizeVector<Notification*> rhs_notifications(num_threads, nullptr);
+
+ // this should really be numBlockBs * m_blocks;
+ const Index num_kernel_notifications = num_threads * m_blocks;
+ FixedSizeVector<Notification*> kernel_notifications(num_kernel_notifications,
+ nullptr);
+
+ for (Index k_block_idx = 0; k_block_idx < k_blocks; k_block_idx++) {
+ const Index k_start = k_block_idx * kc;
+ // make sure we don't overshoot right edge of left matrix
+ const Index actual_kc = (std::min)(k_start + kc, k) - k_start;
+
+ for (Index n_block_idx = 0; n_block_idx < n_blocks; n_block_idx += numBlockBs) {
+ const int num_blocks = (std::min)(n_blocks-n_block_idx, numBlockBs);
+
+ for (Index nt_block_idx = n_block_idx; nt_block_idx < n_block_idx+num_blocks; nt_block_idx++) {
+ const Index n_start = nt_block_idx * nc;
+ const Index actual_nc = (std::min)(n_start + nc, n) - n_start;
+ eigen_assert(actual_nc > 0);
+
+ int blockBId = (k_block_idx * n_blocks + nt_block_idx) % num_threads;
+ // Wait for previous LHS kernels to complete.
+ for (int i = 0; i < m_blocks; ++i) {
+ int notification_id = (blockBId * m_blocks + i);
+
+ // Wait for any current kernels using this slot to complete
+ // before using it.
+ if (kernel_notifications[notification_id]) {
+ wait_until_ready(kernel_notifications[notification_id]);
+ delete kernel_notifications[notification_id];
+ }
+ kernel_notifications[notification_id] = new Notification();
+ }
+ const packRArg arg = {
+ blockBs[blockBId], // blockB
+ rhs, // rhs
+ n_start, // n
+ k_start, // k
+ actual_nc, // nc
+ actual_kc, // kc
+ };
+
+ // Delete any existing notification since we may be
+ // replacing it. The algorithm should ensure that there are
+ // no existing waiters on this notification.
+ delete rhs_notifications[blockBId];
+ rhs_notifications[blockBId] =
+ this->m_device.enqueue(&Self::packRhs<packRArg, RhsPacker>, arg);
+ }
+
+ // now start kernels.
+ const Index n_base_start = n_block_idx * nc;
+ const bool need_to_pack = n_block_idx == 0;
+
+ for (Index m_block_idx = 0; m_block_idx < m_blocks; m_block_idx++) {
+ const Index m_start = m_block_idx * mc;
+ const Index actual_mc = (std::min)(m_start + mc, m) - m_start;
+
+ // first make sure the previous kernels are all done before overwriting lhs. Also wait if
+ // we're going to start new k. In both cases need_to_pack is true.
+ if (need_to_pack) {
+ for (int i = num_blocks; i < num_threads; ++i) {
+ Index blockBId = (k_block_idx * n_blocks + i + n_block_idx) % num_threads;
+ Index future_id = (blockBId * m_blocks + m_block_idx);
+ wait_until_ready(kernel_notifications[future_id]);
+ }
+ }
+
+ packLKArg arg = {
+ &blockBs, // blockB
+ blockAs[m_block_idx], // blockA
+ lhs, // lhs
+ output, // output
+ m_start, // m
+ k_start, // k
+ n_base_start, // n
+ actual_mc, // mc
+ actual_kc, // kc
+ nc, // nc
+ num_threads,
+ numBlockBs,
+ n,
+ k_block_idx,
+ m_block_idx,
+ n_block_idx,
+ m_blocks,
+ n_blocks,
+ &kernel_notifications,
+ &rhs_notifications,
+ need_to_pack,
+ };
+
+ // We asynchronously kick off this function, which ends up
+ // notifying the appropriate kernel_notifications objects,
+ // which this thread waits on before exiting.
+ //
+ // The wait for kernel_notifications below ensures that we
+ // don't have to keep track of the launch of this work.
+ this->m_device.enqueue_and_forget(&Self::packLhsAndKernel<packLKArg, LhsPacker, GebpKernel>, arg);
+ }
+ }
+ }
+
+ // Make sure all the kernels are done.
+ for (int i = 0; i < kernel_notifications.size(); ++i) {
+ wait_until_ready(kernel_notifications[i]);
+ delete kernel_notifications[i];
+ }
+
+ // No need to wait for lhs notifications since they should have
+ // already been waited on. Just clean them up.
+ for (int i = 0; i < rhs_notifications.size(); ++i) {
+ delete rhs_notifications[i];
+ }
+
+ // deallocate all of the memory for both A and B's
+ for (int i = 0; i < blockAs.size(); i++) {
+ this->m_device.deallocate(blockAs[i]);
+ }
+ for (int i = 0; i < blockBs.size(); i++) {
+ this->m_device.deallocate(blockBs[i]);
+ }
+ }
+
+ template <typename packRArg, typename RhsPacker>
+ static void packRhs(const packRArg arg) {
+ // perform actual packing
+ RhsPacker pack_rhs;
+ pack_rhs(arg.blockB, arg.rhs.getSubMapper(arg.k_start, arg.n_start), arg.kc, arg.nc);
+ }
+
+ template <typename packLKArg, typename LhsPacker, typename GebpKernel>
+ static void packLhsAndKernel(packLKArg arg) {
+ if (arg.need_to_pack) {
+ LhsPacker pack_lhs;
+ pack_lhs(arg.blockA, arg.lhs.getSubMapper(arg.m, arg.k), arg.kc, arg.mc);
+ }
+
+ GebpKernel gebp;
+ for (Index nt_block_idx = 0; nt_block_idx < arg.num_blockBs; nt_block_idx++) {
+ const Index n_base_start = arg.n + arg.nc*nt_block_idx;
+ if (n_base_start < arg.max_n) {
+ int blockBId = (arg.k_block_idx * arg.n_blocks + nt_block_idx + arg.n_block_idx) % arg.num_threads;
+ wait_until_ready((*arg.rhs_notifications)[blockBId]);
+ const Index actual_nc = (std::min)(n_base_start + arg.nc, arg.max_n) - n_base_start;
+ gebp(arg.output.getSubMapper(arg.m, n_base_start),
+ arg.blockA, (*arg.blockBs)[blockBId],
+ arg.mc, arg.kc, actual_nc, Scalar(1), -1, -1, 0, 0);
+
+ // Notify that the kernel is done.
+ const Index set_idx = blockBId * arg.m_blocks + arg.m_block_idx;
+ (*arg.kernel_notifications)[set_idx]->Notify();
+ }
+ }
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_USE_THREADS
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONTRACTION_THREAD_POOL_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h
new file mode 100644
index 0000000000..d54091fa1c
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h
@@ -0,0 +1,226 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H
+
+namespace Eigen {
+
+/** \class TensorConversionOp
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor conversion class. This class makes it possible to vectorize
+ * type casting operations when the number of scalars per packet in the source
+ * and the destination types differ.
+ */
+namespace internal {
+template<typename TargetType, typename XprType>
+struct traits<TensorConversionOp<TargetType, XprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef TargetType Scalar;
+ typedef typename traits<XprType>::StorageKind StorageKind;
+ typedef typename traits<XprType>::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = traits<XprType>::NumDimensions;
+ static const int Layout = traits<XprType>::Layout;
+ enum { Flags = 0 };
+};
+
+template<typename TargetType, typename XprType>
+struct eval<TensorConversionOp<TargetType, XprType>, Eigen::Dense>
+{
+ typedef const TensorConversionOp<TargetType, XprType>& type;
+};
+
+template<typename TargetType, typename XprType>
+struct nested<TensorConversionOp<TargetType, XprType>, 1, typename eval<TensorConversionOp<TargetType, XprType> >::type>
+{
+ typedef TensorConversionOp<TargetType, XprType> type;
+};
+
+} // end namespace internal
+
+
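+// For example (assuming typical SIMD packet widths): casting double to float
+// on a target where a float packet holds twice as many scalars as a double
+// packet needs two source packets to fill one target packet. The
+// specializations below cover the 2:1, 4:1 and 1:2 source-to-target packet
+// ratios, the last one falling back to scalar loads near the end of the
+// buffer to avoid reading out of bounds.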
+template <typename TensorEvaluator, typename SrcPacket, typename TgtPacket, int SrcCoeffRatio, int TgtCoeffRatio>
+struct PacketConverter {
+ PacketConverter(const TensorEvaluator& impl)
+ : m_impl(impl) {}
+
+ template<int LoadMode, typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const {
+ return internal::pcast<SrcPacket, TgtPacket>(m_impl.template packet<LoadMode>(index));
+ }
+
+ private:
+ const TensorEvaluator& m_impl;
+};
+
+
+template <typename TensorEvaluator, typename SrcPacket, typename TgtPacket>
+struct PacketConverter<TensorEvaluator, SrcPacket, TgtPacket, 2, 1> {
+ PacketConverter(const TensorEvaluator& impl)
+ : m_impl(impl) {}
+
+ template<int LoadMode, typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const {
+ const int SrcPacketSize = internal::unpacket_traits<SrcPacket>::size;
+
+ SrcPacket src1 = m_impl.template packet<LoadMode>(index);
+ SrcPacket src2 = m_impl.template packet<LoadMode>(index + SrcPacketSize);
+ TgtPacket result = internal::pcast<SrcPacket, TgtPacket>(src1, src2);
+ return result;
+ }
+
+ private:
+ const TensorEvaluator& m_impl;
+};
+
+template <typename TensorEvaluator, typename SrcPacket, typename TgtPacket>
+struct PacketConverter<TensorEvaluator, SrcPacket, TgtPacket, 4, 1> {
+ PacketConverter(const TensorEvaluator& impl)
+ : m_impl(impl) {}
+
+ template<int LoadMode, typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const {
+ const int SrcPacketSize = internal::unpacket_traits<SrcPacket>::size;
+
+ SrcPacket src1 = m_impl.template packet<LoadMode>(index);
+ SrcPacket src2 = m_impl.template packet<LoadMode>(index + SrcPacketSize);
+ SrcPacket src3 = m_impl.template packet<LoadMode>(index + 2 * SrcPacketSize);
+ SrcPacket src4 = m_impl.template packet<LoadMode>(index + 3 * SrcPacketSize);
+ TgtPacket result = internal::pcast<SrcPacket, TgtPacket>(src1, src2, src3, src4);
+ return result;
+ }
+
+ private:
+ const TensorEvaluator& m_impl;
+};
+
+
+template <typename TensorEvaluator, typename SrcPacket, typename TgtPacket>
+struct PacketConverter<TensorEvaluator, SrcPacket, TgtPacket, 1, 2> {
+ PacketConverter(const TensorEvaluator& impl)
+ : m_impl(impl), m_maxIndex(impl.dimensions().TotalSize()) {}
+
+ template<int LoadMode, typename Index>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TgtPacket packet(Index index) const {
+ const int SrcPacketSize = internal::unpacket_traits<SrcPacket>::size;
+ if (index + SrcPacketSize < m_maxIndex) {
+ return internal::pcast<SrcPacket, TgtPacket>(m_impl.template packet<LoadMode>(index));
+ } else {
+ const int TgtPacketSize = internal::unpacket_traits<TgtPacket>::size;
+ EIGEN_ALIGN_DEFAULT typename internal::unpacket_traits<TgtPacket>::type values[TgtPacketSize];
+ for (int i = 0; i < TgtPacketSize; ++i) {
+ values[i] = m_impl.coeff(index+i);
+ }
+ TgtPacket rslt = internal::pload<TgtPacket>(values);
+ return rslt;
+ }
+ }
+
+ private:
+ const TensorEvaluator& m_impl;
+ const typename TensorEvaluator::Index m_maxIndex;
+};
+
+template<typename TargetType, typename XprType>
+class TensorConversionOp : public TensorBase<TensorConversionOp<TargetType, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename internal::traits<TensorConversionOp>::Scalar Scalar;
+ typedef typename internal::traits<TensorConversionOp>::StorageKind StorageKind;
+ typedef typename internal::traits<TensorConversionOp>::Index Index;
+ typedef typename internal::nested<TensorConversionOp>::type Nested;
+ typedef Scalar CoeffReturnType;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConversionOp(const XprType& xpr)
+ : m_xpr(xpr) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+};
+
+
+
+
+// Eval as rvalue
+template<typename TargetType, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorConversionOp<TargetType, ArgType>, Device>
+{
+ typedef TensorConversionOp<TargetType, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+ typedef TargetType Scalar;
+ typedef TargetType CoeffReturnType;
+ typedef typename internal::remove_all<typename internal::traits<ArgType>::Scalar>::type SrcType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename PacketType<SrcType, Device>::type PacketSourceType;
+
+ enum {
+ IsAligned = false,
+ PacketAccess =
+ TensorEvaluator<ArgType, Device>::PacketAccess &&
+ internal::type_casting_traits<SrcType, TargetType>::VectorizedCast,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_impl.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data)
+ {
+ if (internal::is_same<TargetType, SrcType>::value) {
+ return m_impl.evalSubExprsIfNeeded((SrcType*)data);
+ }
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup()
+ {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ internal::scalar_cast_op<SrcType, TargetType> converter;
+ return converter(m_impl.coeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int SrcCoeffRatio = internal::type_casting_traits<SrcType, TargetType>::SrcCoeffRatio;
+ const int TgtCoeffRatio = internal::type_casting_traits<SrcType, TargetType>::TgtCoeffRatio;
+ PacketConverter<TensorEvaluator<ArgType, Device>, PacketSourceType, PacketReturnType,
+ SrcCoeffRatio, TgtCoeffRatio> converter(m_impl);
+ return converter.template packet<LoadMode>(index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONVERSION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h
new file mode 100644
index 0000000000..58cae7162c
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h
@@ -0,0 +1,1076 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H
+
+namespace Eigen {
+
+/** \class TensorConvolution
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor convolution class.
+ *
+ *
+ */
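+// Note (derived from the index mapping below): the convolution computed here
+// is a "valid" convolution, i.e. for each convolved dimension the output size
+// is input_size - kernel_size + 1. For example, convolving a length-10 input
+// with a length-3 kernel along one dimension yields an output of length 8 in
+// that dimension.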
+namespace internal {
+
+template <typename Index, typename InputDims, size_t NumKernelDims, int Layout>
+class IndexMapper {
+ public:
+ IndexMapper(const InputDims& input_dims, const array<Index, NumKernelDims>& kernel_dims,
+ const array<Index, NumKernelDims>& indices) {
+
+ array<Index, NumDims> dimensions = input_dims;
+ for (int i = 0; i < NumKernelDims; ++i) {
+ const Index index = indices[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ dimensions[index] = result_dim;
+ }
+
+ array<Index, NumDims> inputStrides;
+ array<Index, NumDims> outputStrides;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputStrides[0] = 1;
+ outputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ inputStrides[i] = inputStrides[i-1] * input_dims[i-1];
+ outputStrides[i] = outputStrides[i-1] * dimensions[i-1];
+ }
+ } else {
+ inputStrides[NumDims - 1] = 1;
+ outputStrides[NumDims - 1] = 1;
+ for (int i = static_cast<int>(NumDims) - 2; i >= 0; --i) {
+ inputStrides[i] = inputStrides[i + 1] * input_dims[i + 1];
+ outputStrides[i] = outputStrides[i + 1] * dimensions[i + 1];
+ }
+ }
+
+ array<Index, NumDims> cudaInputDimensions;
+ array<Index, NumDims> cudaOutputDimensions;
+ array<Index, NumDims> tmp = dimensions;
+ array<Index, NumDims> ordering;
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ for (int i = 0; i < NumKernelDims; ++i) {
+ const Index index = i + offset;
+ ordering[index] = indices[i];
+ tmp[indices[i]] = -1;
+ cudaInputDimensions[index] = input_dims[indices[i]];
+ cudaOutputDimensions[index] = dimensions[indices[i]];
+ }
+
+ int written = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? NumKernelDims
+ : 0;
+ for (int i = 0; i < NumDims; ++i) {
+ if (tmp[i] >= 0) {
+ ordering[written] = i;
+ cudaInputDimensions[written] = input_dims[i];
+ cudaOutputDimensions[written] = dimensions[i];
+ ++written;
+ }
+ }
+
+ for (int i = 0; i < NumDims; ++i) {
+ m_inputStrides[i] = inputStrides[ordering[i]];
+ m_outputStrides[i] = outputStrides[ordering[i]];
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumDims; ++i) {
+ if (i > NumKernelDims) {
+ m_cudaInputStrides[i] =
+ m_cudaInputStrides[i - 1] * cudaInputDimensions[i - 1];
+ m_cudaOutputStrides[i] =
+ m_cudaOutputStrides[i - 1] * cudaOutputDimensions[i - 1];
+ } else {
+ m_cudaInputStrides[i] = 1;
+ m_cudaOutputStrides[i] = 1;
+ }
+ }
+ } else {
+ for (int i = NumDims - 1; i >= 0; --i) {
+ if (i + 1 < offset) {
+ m_cudaInputStrides[i] =
+ m_cudaInputStrides[i + 1] * cudaInputDimensions[i + 1];
+ m_cudaOutputStrides[i] =
+ m_cudaOutputStrides[i + 1] * cudaOutputDimensions[i + 1];
+ } else {
+ m_cudaInputStrides[i] = 1;
+ m_cudaOutputStrides[i] = 1;
+ }
+ }
+ }
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaInputPlaneToTensorInputOffset(Index p) const {
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int d = NumDims - 1; d > NumKernelDims; --d) {
+ const Index idx = p / m_cudaInputStrides[d];
+ inputIndex += idx * m_inputStrides[d];
+ p -= idx * m_cudaInputStrides[d];
+ }
+ inputIndex += p * m_inputStrides[NumKernelDims];
+ } else {
+ int limit = 0;
+ if (NumKernelDims < NumDims) {
+ limit = NumDims - NumKernelDims - 1;
+ }
+ for (int d = 0; d < limit; ++d) {
+ const Index idx = p / m_cudaInputStrides[d];
+ inputIndex += idx * m_inputStrides[d];
+ p -= idx * m_cudaInputStrides[d];
+ }
+ inputIndex += p * m_inputStrides[limit];
+ }
+ return inputIndex;
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaOutputPlaneToTensorOutputOffset(Index p) const {
+ Index outputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int d = NumDims - 1; d > NumKernelDims; --d) {
+ const Index idx = p / m_cudaOutputStrides[d];
+ outputIndex += idx * m_outputStrides[d];
+ p -= idx * m_cudaOutputStrides[d];
+ }
+ outputIndex += p * m_outputStrides[NumKernelDims];
+ } else {
+ int limit = 0;
+ if (NumKernelDims < NumDims) {
+ limit = NumDims - NumKernelDims - 1;
+ }
+ for (int d = 0; d < limit; ++d) {
+ const Index idx = p / m_cudaOutputStrides[d];
+ outputIndex += idx * m_outputStrides[d];
+ p -= idx * m_cudaOutputStrides[d];
+ }
+ outputIndex += p * m_outputStrides[limit];
+ }
+ return outputIndex;
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaInputKernelToTensorInputOffset(Index i) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_inputStrides[offset];
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaOutputKernelToTensorOutputOffset(Index i) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_outputStrides[offset];
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaInputKernelToTensorInputOffset(Index i, Index j) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_inputStrides[offset] + j * m_inputStrides[offset + 1];
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaOutputKernelToTensorOutputOffset(Index i, Index j) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_outputStrides[offset] + j * m_outputStrides[offset + 1];
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaInputKernelToTensorInputOffset(Index i, Index j, Index k) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_inputStrides[offset] + j * m_inputStrides[offset + 1] +
+ k * m_inputStrides[offset + 2];
+ }
+
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC Index mapCudaOutputKernelToTensorOutputOffset(Index i, Index j, Index k) const {
+ const size_t offset = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : NumDims - NumKernelDims;
+ return i * m_outputStrides[offset] + j * m_outputStrides[offset + 1] +
+ k * m_outputStrides[offset + 2];
+ }
+
+ private:
+ static const size_t NumDims = internal::array_size<InputDims>::value;
+ array<Index, NumDims> m_inputStrides;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_cudaInputStrides;
+ array<Index, NumDims> m_cudaOutputStrides;
+};
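+// Illustration of the mapping performed by IndexMapper above (a sketch with
+// hypothetical sizes): for a ColMajor input of dimensions (5, 6, 7) convolved
+// with a 1-D kernel of size 3 along dimension 0, the output has dimensions
+// (3, 6, 7). The convolved dimension is placed first in the CUDA indexing
+// space, a "plane" index p then enumerates the 6*7 = 42 non-convolved (y, z)
+// positions, and mapCudaInputPlaneToTensorInputOffset(p) returns the linear
+// offset of element (0, y, z) in the input tensor.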
+
+
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct traits<TensorConvolutionOp<Dimensions, InputXprType, KernelXprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef typename promote_storage_type<typename InputXprType::Scalar,
+ typename KernelXprType::Scalar>::ret Scalar;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename promote_storage_type<typename traits<InputXprType>::StorageKind,
+ typename traits<KernelXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<InputXprType>::Index,
+ typename traits<KernelXprType>::Index>::type Index;
+ typedef typename InputXprType::Nested LhsNested;
+ typedef typename KernelXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const int NumDimensions = traits<InputXprType>::NumDimensions;
+ static const int Layout = traits<InputXprType>::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct eval<TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, Eigen::Dense>
+{
+ typedef const TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>& type;
+};
+
+template<typename Dimensions, typename InputXprType, typename KernelXprType>
+struct nested<TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, 1, typename eval<TensorConvolutionOp<Dimensions, InputXprType, KernelXprType> >::type>
+{
+ typedef TensorConvolutionOp<Dimensions, InputXprType, KernelXprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Indices, typename InputXprType, typename KernelXprType>
+class TensorConvolutionOp : public TensorBase<TensorConvolutionOp<Indices, InputXprType, KernelXprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorConvolutionOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorConvolutionOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::promote_storage_type<typename InputXprType::CoeffReturnType,
+ typename KernelXprType::CoeffReturnType>::ret CoeffReturnType;
+ typedef typename internal::promote_storage_type<typename InputXprType::PacketReturnType,
+ typename KernelXprType::PacketReturnType>::ret PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorConvolutionOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorConvolutionOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorConvolutionOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorConvolutionOp(const InputXprType& input, const KernelXprType& kernel, const Indices& dims)
+ : m_input_xpr(input), m_kernel_xpr(kernel), m_indices(dims) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Indices& indices() const { return m_indices; }
+
+ /** \returns the nested expressions */
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const typename internal::remove_all<typename InputXprType::Nested>::type&
+ inputExpression() const { return m_input_xpr; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const typename internal::remove_all<typename KernelXprType::Nested>::type&
+ kernelExpression() const { return m_kernel_xpr; }
+
+ protected:
+ typename InputXprType::Nested m_input_xpr;
+ typename KernelXprType::Nested m_kernel_xpr;
+ const Indices m_indices;
+};
+
+
+template<typename Indices, typename InputArgType, typename KernelArgType, typename Device>
+struct TensorEvaluator<const TensorConvolutionOp<Indices, InputArgType, KernelArgType>, Device>
+{
+ typedef TensorConvolutionOp<Indices, InputArgType, KernelArgType> XprType;
+
+ static const int NumDims = internal::array_size<typename TensorEvaluator<InputArgType, Device>::Dimensions>::value;
+ static const int NumKernelDims = internal::array_size<Indices>::value;
+ typedef typename XprType::Index Index;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<InputArgType, Device>::IsAligned &
+ TensorEvaluator<KernelArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<InputArgType, Device>::PacketAccess &
+ TensorEvaluator<KernelArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<InputArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_inputImpl(op.inputExpression(), device), m_kernelImpl(op.kernelExpression(), device), m_kernelArg(op.kernelExpression()), m_kernel(NULL), m_local_kernel(false), m_device(device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<InputArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<KernelArgType, Device>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const typename TensorEvaluator<InputArgType, Device>::Dimensions& input_dims = m_inputImpl.dimensions();
+ const typename TensorEvaluator<KernelArgType, Device>::Dimensions& kernel_dims = m_kernelImpl.dimensions();
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStride[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_inputStride[i] = m_inputStride[i - 1] * input_dims[i - 1];
+ }
+ } else {
+ m_inputStride[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_inputStride[i] = m_inputStride[i + 1] * input_dims[i + 1];
+ }
+ }
+
+ m_dimensions = m_inputImpl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumKernelDims; ++i) {
+ const Index index = op.indices()[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ m_dimensions[index] = result_dim;
+ if (i > 0) {
+ m_kernelStride[i] = m_kernelStride[i - 1] * kernel_dims[i - 1];
+ } else {
+ m_kernelStride[0] = 1;
+ }
+ m_indexStride[i] = m_inputStride[index];
+ }
+
+ m_outputStride[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStride[i] = m_outputStride[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ for (int i = NumKernelDims - 1; i >= 0; --i) {
+ const Index index = op.indices()[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ m_dimensions[index] = result_dim;
+ if (i < NumKernelDims - 1) {
+ m_kernelStride[i] = m_kernelStride[i + 1] * kernel_dims[i + 1];
+ } else {
+ m_kernelStride[NumKernelDims - 1] = 1;
+ }
+ m_indexStride[i] = m_inputStride[index];
+ }
+
+ m_outputStride[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_outputStride[i] = m_outputStride[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ m_inputImpl.evalSubExprsIfNeeded(NULL);
+ preloadKernel();
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_inputImpl.cleanup();
+ if (m_local_kernel) {
+ m_device.deallocate((void*)m_kernel);
+ m_local_kernel = false;
+ }
+ m_kernel = NULL;
+ }
+
+ void evalTo(typename XprType::Scalar* buffer) {
+ evalSubExprsIfNeeded(NULL);
+ for (int i = 0; i < dimensions().TotalSize(); ++i) {
+ buffer[i] += coeff(i);
+ }
+ cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ CoeffReturnType result = CoeffReturnType(0);
+ convolve(firstInput(index), 0, NumKernelDims-1, result);
+ return result;
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(const Index index) const
+ {
+ const int PacketSize = internal::unpacket_traits<PacketReturnType>::size;
+ Index indices[2] = {index, index+PacketSize-1};
+ Index startInputs[2] = {0, 0};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx0 = indices[0] / m_outputStride[i];
+ const Index idx1 = indices[1] / m_outputStride[i];
+ startInputs[0] += idx0 * m_inputStride[i];
+ startInputs[1] += idx1 * m_inputStride[i];
+ indices[0] -= idx0 * m_outputStride[i];
+ indices[1] -= idx1 * m_outputStride[i];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx0 = indices[0] / m_outputStride[i];
+ const Index idx1 = indices[1] / m_outputStride[i];
+ startInputs[0] += idx0 * m_inputStride[i];
+ startInputs[1] += idx1 * m_inputStride[i];
+ indices[0] -= idx0 * m_outputStride[i];
+ indices[1] -= idx1 * m_outputStride[i];
+ }
+ }
+ startInputs[0] += indices[0];
+ startInputs[1] += indices[1];
+
+ if (startInputs[1]-startInputs[0] == PacketSize-1) {
+ PacketReturnType result = internal::pset1<PacketReturnType>(0);
+ convolvePacket(startInputs[0], 0, NumKernelDims-1, result);
+ return result;
+ } else {
+ EIGEN_ALIGN_DEFAULT Scalar data[PacketSize];
+ data[0] = Scalar(0);
+ convolve(startInputs[0], 0, NumKernelDims-1, data[0]);
+ for (int i = 1; i < PacketSize-1; ++i) {
+ data[i] = Scalar(0);
+ convolve(firstInput(index+i), 0, NumKernelDims-1, data[i]);
+ }
+ data[PacketSize-1] = Scalar(0);
+ convolve(startInputs[1], 0, NumKernelDims-1, data[PacketSize-1]);
+ return internal::pload<PacketReturnType>(data);
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ private:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index firstInput(Index index) const {
+ Index startInput = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStride[i];
+ startInput += idx * m_inputStride[i];
+ index -= idx * m_outputStride[i];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStride[i];
+ startInput += idx * m_inputStride[i];
+ index -= idx * m_outputStride[i];
+ }
+ }
+ startInput += index;
+ return startInput;
+ }
+
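+ // Accumulates all kernel taps for one output coefficient: DimIndex counts
+ // down from NumKernelDims - 1, each recursion level walks one convolved
+ // dimension, and the innermost level adds input * kernel products into accum.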
+ EIGEN_DEVICE_FUNC void convolve(Index firstIndex, Index firstKernel, int DimIndex, CoeffReturnType& accum) const {
+ for (int j = 0; j < m_kernelImpl.dimensions()[DimIndex]; ++j) {
+ const Index input = firstIndex + j * m_indexStride[DimIndex];
+ const Index kernel = firstKernel + j * m_kernelStride[DimIndex];
+ if (DimIndex > 0) {
+ convolve(input, kernel, DimIndex-1, accum);
+ } else {
+ accum += m_inputImpl.coeff(input) * m_kernel[kernel];
+ }
+ }
+ }
+
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC void convolvePacket(Index firstIndex, Index firstKernel, int DimIndex, Packet& accum) const {
+ for (int j = 0; j < m_kernelImpl.dimensions()[DimIndex]; ++j) {
+ const Index input = firstIndex + j * m_indexStride[DimIndex];
+ const Index kernel = firstKernel + j * m_kernelStride[DimIndex];
+ if (DimIndex > 0) {
+ convolvePacket(input, kernel, DimIndex-1, accum);
+ } else {
+ accum = internal::pmadd<Packet>(m_inputImpl.template packet<Unaligned>(input), internal::pset1<Packet>(m_kernel[kernel]), accum);
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void preloadKernel() {
+ // Don't make a local copy of the kernel unless we have to (i.e. it's an
+ // expression that needs to be evaluated)
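+ // When no in-place buffer is available, the kernel expression is
+ // materialized once into a temporary allocation through TensorEvalToOp,
+ // since its coefficients are read many times during the convolution.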
+ const Scalar* in_place = m_kernelImpl.data();
+ if (in_place) {
+ m_kernel = in_place;
+ m_local_kernel = false;
+ } else {
+ size_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar);
+ Scalar* local = (Scalar*)m_device.allocate(kernel_sz);
+ typedef TensorEvalToOp<const KernelArgType> EvalTo;
+ EvalTo evalToTmp(local, m_kernelArg);
+ const bool PacketAccess = internal::IsVectorizable<Device, KernelArgType>::value;
+ const bool BlockAccess = false;
+ internal::TensorExecutor<const EvalTo, Device, PacketAccess, BlockAccess>::run(evalToTmp, m_device);
+
+ m_kernel = local;
+ m_local_kernel = true;
+ }
+ }
+
+ array<Index, NumDims> m_inputStride;
+ array<Index, NumDims> m_outputStride;
+
+ array<Index, NumKernelDims> m_indexStride;
+ array<Index, NumKernelDims> m_kernelStride;
+ TensorEvaluator<InputArgType, Device> m_inputImpl;
+ TensorEvaluator<KernelArgType, Device> m_kernelImpl;
+ Dimensions m_dimensions;
+
+ KernelArgType m_kernelArg;
+ const Scalar* m_kernel;
+ bool m_local_kernel;
+ const Device& m_device;
+};
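+// Illustrative use of the evaluator above through the public API (a minimal
+// sketch; the tensor sizes are hypothetical):
+//   Eigen::Tensor<float, 3> input(30, 40, 50);
+//   Eigen::Tensor<float, 1> kernel(7);
+//   input.setRandom();
+//   kernel.setRandom();
+//   Eigen::array<ptrdiff_t, 1> dims{{1}};  // convolve along dimension 1
+//   Eigen::Tensor<float, 3> output = input.convolve(kernel, dims);
+//   // output dimensions: 30 x (40 - 7 + 1) x 50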
+
+
+
+
+// Use an optimized implementation of the evaluation code for GPUs whenever possible.
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+
+template <int StaticKernelSize>
+struct GetKernelSize {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int operator() (const int /*kernelSize*/) const {
+ return StaticKernelSize;
+ }
+};
+template <>
+struct GetKernelSize<Dynamic> {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int operator() (const int kernelSize) const {
+ return kernelSize;
+ }
+};
+
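+// The three kernels below share the same strategy: each block cooperatively
+// loads a tile of the input (output extent plus kernel extent minus 1 along
+// each convolved dimension) into shared memory, synchronizes, and then each
+// thread accumulates output coefficients from the cached tile. GetKernelSize
+// lets the inner loops be fully unrolled when the kernel size is known at
+// compile time.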
+template <typename InputEvaluator, typename Index, typename InputDims,
+ int StaticKernelSize>
+__global__ void EigenConvolutionKernel1D(
+ InputEvaluator eval,
+ const internal::IndexMapper<Index, InputDims, 1, InputEvaluator::Layout>
+ indexMapper,
+ const float* __restrict kernel, const int numPlanes, const int numX,
+ const int maxX, const int kernelSize, float* buffer) {
+ extern __shared__ float s[];
+
+ const int first_x = blockIdx.x * maxX;
+ const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1;
+ const int num_x_input = last_x - first_x + GetKernelSize<StaticKernelSize>()(kernelSize);
+ const int num_x_output = last_x - first_x + 1;
+
+ const int first_plane = blockIdx.y * blockDim.y;
+ const int plane_stride = blockDim.y * gridDim.y;
+
+ for (int p = first_plane + threadIdx.y; p < numPlanes; p += plane_stride) {
+ // Load inputs to shared memory
+ const int plane_input_offset = indexMapper.mapCudaInputPlaneToTensorInputOffset(p);
+ const int plane_kernel_offset = threadIdx.y * num_x_input;
+ #pragma unroll
+ for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) {
+ const int tensor_index = plane_input_offset + indexMapper.mapCudaInputKernelToTensorInputOffset(i+first_x);
+ s[i + plane_kernel_offset] = eval.coeff(tensor_index);
+ }
+
+ __syncthreads();
+
+ // Compute the convolution
+ const int plane_output_offset = indexMapper.mapCudaOutputPlaneToTensorOutputOffset(p);
+
+ #pragma unroll
+ for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) {
+ const int kernel_offset = plane_kernel_offset + i;
+ float result = 0.0f;
+ #pragma unroll
+ for (int k = 0; k < GetKernelSize<StaticKernelSize>()(kernelSize); ++k) {
+ result += s[k + kernel_offset] * kernel[k];
+ }
+ const int tensor_index = plane_output_offset + indexMapper.mapCudaOutputKernelToTensorOutputOffset(i+first_x);
+ buffer[tensor_index] = result;
+ }
+ __syncthreads();
+ }
+};
+
+template <typename InputEvaluator, typename Index, typename InputDims,
+ int StaticKernelSizeX, int StaticKernelSizeY>
+__global__ __launch_bounds__(1024, 1) void EigenConvolutionKernel2D(
+ InputEvaluator eval,
+ const internal::IndexMapper<Index, InputDims, 2, InputEvaluator::Layout>
+ indexMapper,
+ const float* __restrict kernel, const int numPlanes, const int numX,
+ const int maxX, const int numY, const int maxY, const int kernelSizeX,
+ const int kernelSizeY, float* buffer) {
+ extern __shared__ float s[];
+
+ const int first_x = blockIdx.x * maxX;
+ const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1;
+ const int num_x_input = last_x - first_x + GetKernelSize<StaticKernelSizeX>()(kernelSizeX);
+ const int num_x_output = last_x - first_x + 1;
+
+ const int first_y = blockIdx.y * maxY;
+ const int last_y = (first_y + maxY < numY ? first_y + maxY : numY) - 1;
+ const int num_y_input = last_y - first_y + GetKernelSize<StaticKernelSizeY>()(kernelSizeY);
+ const int num_y_output = last_y - first_y + 1;
+
+ const int first_plane = blockIdx.z * blockDim.z;
+ const int plane_stride = blockDim.z * gridDim.z;
+
+ for (int p = first_plane + threadIdx.z; p < numPlanes; p += plane_stride) {
+
+ const int plane_input_offset = indexMapper.mapCudaInputPlaneToTensorInputOffset(p);
+ const int plane_kernel_offset = threadIdx.z * num_y_input;
+
+ // Load inputs to shared memory
+ #pragma unroll
+ for (int j = threadIdx.y; j < num_y_input; j += blockDim.y) {
+ const int input_offset = num_x_input * (j + plane_kernel_offset);
+ #pragma unroll
+ for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) {
+ const int tensor_index = plane_input_offset + indexMapper.mapCudaInputKernelToTensorInputOffset(i+first_x, j+first_y);
+ s[i + input_offset] = eval.coeff(tensor_index);
+ }
+ }
+
+ __syncthreads();
+
+ // Convolution
+ const int plane_output_offset = indexMapper.mapCudaOutputPlaneToTensorOutputOffset(p);
+
+ #pragma unroll
+ for (int j = threadIdx.y; j < num_y_output; j += blockDim.y) {
+ #pragma unroll
+ for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) {
+ float result = 0.0f;
+ #pragma unroll
+ for (int l = 0; l < GetKernelSize<StaticKernelSizeY>()(kernelSizeY); ++l) {
+ const int kernel_offset = kernelSizeX * l;
+ const int input_offset = i + num_x_input * (j + l + plane_kernel_offset);
+ #pragma unroll
+ for (int k = 0; k < GetKernelSize<StaticKernelSizeX>()(kernelSizeX); ++k) {
+ result += s[k + input_offset] * kernel[k + kernel_offset];
+ }
+ }
+ const int tensor_index = plane_output_offset + indexMapper.mapCudaOutputKernelToTensorOutputOffset(i+first_x, j+first_y);
+ buffer[tensor_index] = result;
+ }
+ }
+
+ __syncthreads();
+ }
+};
+
+template <typename InputEvaluator, typename Index, typename InputDims>
+__global__ void EigenConvolutionKernel3D(
+ InputEvaluator eval,
+ const internal::IndexMapper<Index, InputDims, 3, InputEvaluator::Layout>
+ indexMapper,
+ const float* __restrict kernel, const size_t numPlanes, const size_t numX,
+ const size_t maxX, const size_t numY, const size_t maxY, const size_t numZ,
+ const size_t maxZ, const size_t kernelSizeX, const size_t kernelSizeY,
+ const size_t kernelSizeZ, float* buffer) {
+ extern __shared__ float s[];
+
+ // Load inputs to shared memory
+ const int first_x = blockIdx.x * maxX;
+ const int last_x = (first_x + maxX < numX ? first_x + maxX : numX) - 1;
+ const int num_x_input = last_x - first_x + kernelSizeX;
+
+ const int first_y = blockIdx.y * maxY;
+ const int last_y = (first_y + maxY < numY ? first_y + maxY : numY) - 1;
+ const int num_y_input = last_y - first_y + kernelSizeY;
+
+ const int first_z = blockIdx.z * maxZ;
+ const int last_z = (first_z + maxZ < numZ ? first_z + maxZ : numZ) - 1;
+ const int num_z_input = last_z - first_z + kernelSizeZ;
+
+ for (int p = 0; p < numPlanes; ++p) {
+
+ const int plane_input_offset = indexMapper.mapCudaInputPlaneToTensorInputOffset(p);
+ const int plane_kernel_offset = 0;
+
+ for (int k = threadIdx.z; k < num_z_input; k += blockDim.z) {
+ for (int j = threadIdx.y; j < num_y_input; j += blockDim.y) {
+ for (int i = threadIdx.x; i < num_x_input; i += blockDim.x) {
+ const int tensor_index = plane_input_offset + indexMapper.mapCudaInputKernelToTensorInputOffset(i+first_x, j+first_y, k+first_z);
+ s[i + num_x_input * (j + num_y_input * (k + plane_kernel_offset))] = eval.coeff(tensor_index);
+ }
+ }
+ }
+
+ __syncthreads();
+
+ // Convolution
+ const int num_z_output = last_z - first_z + 1;
+ const int num_y_output = last_y - first_y + 1;
+ const int num_x_output = last_x - first_x + 1;
+ const int plane_output_offset = indexMapper.mapCudaOutputPlaneToTensorOutputOffset(p);
+
+ for (int k = threadIdx.z; k < num_z_output; k += blockDim.z) {
+ for (int j = threadIdx.y; j < num_y_output; j += blockDim.y) {
+ for (int i = threadIdx.x; i < num_x_output; i += blockDim.x) {
+ float result = 0.0f;
+ for (int n = 0; n < kernelSizeZ; ++n) {
+ for (int m = 0; m < kernelSizeY; ++m) {
+ for (int l = 0; l < kernelSizeX; ++l) {
+ result += s[i + l + num_x_input * (j + m + num_y_input * (k + n + plane_kernel_offset))] * kernel[l + kernelSizeX * (m + kernelSizeY * n)];
+ }
+ }
+ }
+ const int tensor_index = plane_output_offset + indexMapper.mapCudaOutputKernelToTensorOutputOffset(i+first_x, j+first_y, k+first_z);
+ buffer[tensor_index] = result;
+ }
+ }
+ }
+ __syncthreads();
+ }
+};
+
+
+
+template<typename Indices, typename InputArgType, typename KernelArgType>
+struct TensorEvaluator<const TensorConvolutionOp<Indices, InputArgType, KernelArgType>, GpuDevice>
+{
+ typedef TensorConvolutionOp<Indices, InputArgType, KernelArgType> XprType;
+
+ static const int NumDims = internal::array_size<typename TensorEvaluator<InputArgType, GpuDevice>::Dimensions>::value;
+ static const int NumKernelDims = internal::array_size<Indices>::value;
+ typedef typename XprType::Index Index;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename TensorEvaluator<KernelArgType, GpuDevice>::Dimensions KernelDimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<InputArgType, GpuDevice>::IsAligned &
+ TensorEvaluator<KernelArgType, GpuDevice>::IsAligned,
+ PacketAccess = false,
+ BlockAccess = false,
+ Layout = TensorEvaluator<InputArgType, GpuDevice>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const GpuDevice& device)
+ : m_inputImpl(op.inputExpression(), device), m_kernelArg(op.kernelExpression()), m_kernelImpl(op.kernelExpression(), device), m_indices(op.indices()), m_buf(NULL), m_kernel(NULL), m_local_kernel(false), m_device(device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<InputArgType, GpuDevice>::Layout) == static_cast<int>(TensorEvaluator<KernelArgType, GpuDevice>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ const typename TensorEvaluator<InputArgType, GpuDevice>::Dimensions& input_dims = m_inputImpl.dimensions();
+ const typename TensorEvaluator<KernelArgType, GpuDevice>::Dimensions& kernel_dims = m_kernelImpl.dimensions();
+
+ m_dimensions = m_inputImpl.dimensions();
+ for (int i = 0; i < NumKernelDims; ++i) {
+ const Index index = op.indices()[i];
+ const Index input_dim = input_dims[index];
+ const Index kernel_dim = kernel_dims[i];
+ const Index result_dim = input_dim - kernel_dim + 1;
+ m_dimensions[index] = result_dim;
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename InputArgType::Scalar Scalar;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* data) {
+ preloadKernel();
+ m_inputImpl.evalSubExprsIfNeeded(NULL);
+ if (data) {
+ executeEval(data);
+ return false;
+ } else {
+ m_buf = (Scalar*)m_device.allocate(dimensions().TotalSize() * sizeof(Scalar));
+ executeEval(m_buf);
+ return true;
+ }
+ }
+
+ EIGEN_STRONG_INLINE void cleanup() {
+ m_inputImpl.cleanup();
+ if (m_buf) {
+ m_device.deallocate(m_buf);
+ m_buf = NULL;
+ }
+ if (m_local_kernel) {
+ m_device.deallocate((void*)m_kernel);
+ m_local_kernel = false;
+ }
+ m_kernel = NULL;
+ }
+
+ EIGEN_STRONG_INLINE void preloadKernel() {
+ // Don't make a local copy of the kernel unless we have to (i.e. it's an
+ // expression that needs to be evaluated)
+ const Scalar* in_place = m_kernelImpl.data();
+ if (in_place) {
+ m_kernel = in_place;
+ m_local_kernel = false;
+ } else {
+ size_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar);
+ Scalar* local = (Scalar*)m_device.allocate(kernel_sz);
+ typedef TensorEvalToOp<const KernelArgType> EvalTo;
+ EvalTo evalToTmp(local, m_kernelArg);
+ const bool PacketAccess = internal::IsVectorizable<GpuDevice, KernelArgType>::value;
+ const bool BlockAccess = false;
+ internal::TensorExecutor<const EvalTo, GpuDevice, PacketAccess, BlockAccess>::run(evalToTmp, m_device);
+
+ m_kernel = local;
+ m_local_kernel = true;
+ }
+ }
+
+ static unsigned int ceil(unsigned int num, unsigned int denom) {
+ const unsigned int rounded_toward_zero = num / denom;
+ if (num > rounded_toward_zero * denom) {
+ return rounded_toward_zero + 1;
+ }
+ return rounded_toward_zero;
+ }
+
+ void executeEval(Scalar* data) const {
+ typedef typename TensorEvaluator<InputArgType, GpuDevice>::Dimensions InputDims;
+
+ const int maxSharedMem = m_device.sharedMemPerBlock();
+ const int maxThreadsPerBlock = m_device.maxCudaThreadsPerBlock();
+ const int maxBlocksPerProcessor = m_device.maxCudaThreadsPerMultiProcessor() / maxThreadsPerBlock;
+ const int numMultiProcessors = m_device.getNumCudaMultiProcessors();
+ const int warpSize = 32;
+
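+ // The block and shared-memory sizes below are derived from the device
+ // limits queried above: the tile extents (maxX, maxY, ...) are clamped so
+ // that each per-block tile fits in shared memory, and the grid is sized to
+ // keep every multiprocessor occupied.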
+ switch (NumKernelDims) {
+ case 1: {
+ const int kernel_size = m_kernelImpl.dimensions().TotalSize();
+
+ const int numX = dimensions()[m_indices[0]];
+ const int numP = dimensions().TotalSize() / numX;
+ int maxX;
+ dim3 block_size;
+
+ const int single_stride_dim =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? 0
+ : m_inputImpl.dimensions().rank() - 1;
+ if (m_indices[0] == single_stride_dim) {
+ // Maximize the reuse of the data cached in shared memory
+ const int inner_dim = ((maxSharedMem / (sizeof(Scalar)) - kernel_size + 1 + 31) / 32) * 32;
+ maxX = (std::min<int>)(inner_dim, numX);
+ const int maxP = (std::min<int>)(maxSharedMem / ((kernel_size - 1 + maxX) * sizeof(Scalar)), numP);
+ block_size.x = numext::mini(maxThreadsPerBlock, maxX);
+ block_size.y = (std::min<int>)(maxThreadsPerBlock / block_size.x, maxP);
+ }
+ else {
+ // Read as much as possible along the innermost dimension, that is, the plane
+ const int inner_dim = maxSharedMem / ((warpSize + kernel_size) * sizeof(Scalar));
+ const int maxP = (std::min<int>)(inner_dim, numP);
+ maxX = (std::min<int>)(maxSharedMem / (inner_dim * sizeof(Scalar)) - kernel_size + 1, numX);
+
+ block_size.x = numext::mini(warpSize, maxX);
+ block_size.y = (std::min<int>)(maxThreadsPerBlock/block_size.x, maxP);
+ }
+
+ const int shared_mem = block_size.y * (maxX + kernel_size - 1) * sizeof(Scalar);
+ assert(shared_mem <= maxSharedMem);
+
+ const int num_x_blocks = ceil(numX, maxX);
+ const int blocksPerProcessor = numext::mini(maxBlocksPerProcessor, maxSharedMem / shared_mem);
+ const int num_y_blocks = ceil(numMultiProcessors * blocksPerProcessor, num_x_blocks);
+
+ dim3 num_blocks(num_x_blocks, std::min<int>(num_y_blocks, ceil(numP, block_size.y)));
+
+
+ //cout << "launching 1D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " maxX: " << maxX << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl;
+
+ const array<Index, 1> indices(m_indices[0]);
+ const array<Index, 1> kernel_dims(m_kernelImpl.dimensions()[0]);
+ internal::IndexMapper<Index, InputDims, 1, Layout> indexMapper(
+ m_inputImpl.dimensions(), kernel_dims, indices);
+ switch(kernel_size) {
+ case 4: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel1D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 4>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, 4, data);
+ break;
+ }
+ case 7: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel1D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 7>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, 7, data);
+ break;
+ }
+ default: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel1D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, kernel_size, data);
+ }
+ }
+ break;
+ }
+
+ case 2: {
+ const int idxX =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : 1;
+ const int idxY =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 1 : 0;
+ const int kernel_size_x = m_kernelImpl.dimensions()[idxX];
+ const int kernel_size_y = m_kernelImpl.dimensions()[idxY];
+
+ const int numX = dimensions()[m_indices[idxX]];
+ const int numY = dimensions()[m_indices[idxY]];
+ const int numP = dimensions().TotalSize() / (numX*numY);
+
+ const float scaling_factor = sqrtf(static_cast<float>(maxSharedMem) / (sizeof(Scalar) * kernel_size_y * kernel_size_x));
+
+ // Snap maxX to warp size
+ int inner_dim = ((static_cast<int>(scaling_factor * kernel_size_x) - kernel_size_x + 1 + 32) / 32) * 32;
+ const int maxX = (std::min<int>)(inner_dim, numX);
+ const int maxY = (std::min<int>)(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1)) - kernel_size_y + 1, numY);
+ const int maxP = (std::min<int>)(maxSharedMem / ((kernel_size_x - 1 + maxX) * (kernel_size_y - 1 + maxY) * sizeof(Scalar)), numP);
+
+ dim3 block_size;
+ block_size.x = numext::mini(1024, maxX);
+ block_size.y = (std::min<int>)(1024/block_size.x, maxY);
+ block_size.z = (std::min<int>)(1024/(block_size.x*block_size.y), maxP);
+
+ const int shared_mem = block_size.z * (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1) * sizeof(Scalar);
+ assert(shared_mem <= maxSharedMem);
+
+ const int num_x_blocks = ceil(numX, maxX);
+ const int num_y_blocks = ceil(numY, maxY);
+ const int blocksPerProcessor = numext::mini(maxBlocksPerProcessor, maxSharedMem / shared_mem);
+ const int num_z_blocks = ceil(numMultiProcessors * blocksPerProcessor, num_x_blocks * num_y_blocks);
+
+ dim3 num_blocks(num_x_blocks, num_y_blocks, std::min<int>(num_z_blocks, ceil(numP, block_size.z)));
+
+
+ //cout << "launching 2D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " block_size.z: " << block_size.z << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " num_blocks.z: " << num_blocks.z << " maxX: " << maxX << " maxY: " << maxY << " maxP: " << maxP << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl;
+
+ const array<Index, 2> indices(m_indices[idxX], m_indices[idxY]);
+ const array<Index, 2> kernel_dims(m_kernelImpl.dimensions()[idxX],
+ m_kernelImpl.dimensions()[idxY]);
+ internal::IndexMapper<Index, InputDims, 2, Layout> indexMapper(
+ m_inputImpl.dimensions(), kernel_dims, indices);
+ switch (kernel_size_x) {
+ case 4: {
+ switch (kernel_size_y) {
+ case 7: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel2D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 4, 7>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 4, 7, data);
+ break;
+ }
+ default: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel2D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 4, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 4, kernel_size_y, data);
+ break;
+ }
+ }
+ break;
+ }
+ case 7: {
+ switch (kernel_size_y) {
+ case 4: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel2D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 7, 4>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 7, 4, data);
+ break;
+ }
+ default: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel2D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, 7, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, 7, kernel_size_y, data);
+ break;
+ }
+ }
+ break;
+ }
+ default: {
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel2D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims, Dynamic, Dynamic>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, kernel_size_x, kernel_size_y, data);
+ break;
+ }
+ }
+ break;
+ }
+
+ case 3: {
+ const int idxX =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : 2;
+ const int idxY =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 1 : 1;
+ const int idxZ =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 2 : 0;
+
+ const int kernel_size_x = m_kernelImpl.dimensions()[idxX];
+ const int kernel_size_y = m_kernelImpl.dimensions()[idxY];
+ const int kernel_size_z = m_kernelImpl.dimensions()[idxZ];
+
+ const int numX = dimensions()[m_indices[idxX]];
+ const int numY = dimensions()[m_indices[idxY]];
+ const int numZ = dimensions()[m_indices[idxZ]];
+ const int numP = dimensions().TotalSize() / (numX*numY*numZ);
+
+ const int maxX = (std::min<int>)(128, (std::min<int>)(maxSharedMem / (sizeof(Scalar) * kernel_size_y * kernel_size_z) - kernel_size_x + 1, numX));
+ const int maxY = (std::min<int>)(128, (std::min<int>)(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1) * kernel_size_z) - kernel_size_y + 1, numY));
+ const int maxZ = (std::min<int>)(128, (std::min<int>)(maxSharedMem / (sizeof(Scalar) * (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1)) - kernel_size_z + 1, numZ));
+
+ dim3 block_size;
+ block_size.x = numext::mini(32, maxX);
+ block_size.y = numext::mini(32, maxY);
+ block_size.z = (std::min<int>)(1024/(block_size.x*block_size.y), maxZ);
+ dim3 num_blocks(ceil(numX, maxX), ceil(numY, maxY), ceil(numZ, maxZ));
+
+ const int shared_mem = (maxX + kernel_size_x - 1) * (maxY + kernel_size_y - 1) * (maxZ + kernel_size_z - 1) * sizeof(Scalar);
+ assert(shared_mem <= maxSharedMem);
+
+ //cout << "launching 3D kernel with block_size.x: " << block_size.x << " block_size.y: " << block_size.y << " block_size.z: " << block_size.z << " num_blocks.x: " << num_blocks.x << " num_blocks.y: " << num_blocks.y << " num_blocks.z: " << num_blocks.z << " shared_mem: " << shared_mem << " in stream " << m_device.stream() << endl;
+ const array<Index, 3> indices(m_indices[idxX], m_indices[idxY],
+ m_indices[idxZ]);
+ const array<Index, 3> kernel_dims(m_kernelImpl.dimensions()[idxX],
+ m_kernelImpl.dimensions()[idxY],
+ m_kernelImpl.dimensions()[idxZ]);
+ internal::IndexMapper<Index, InputDims, 3, Layout> indexMapper(
+ m_inputImpl.dimensions(), kernel_dims, indices);
+
+ LAUNCH_CUDA_KERNEL((EigenConvolutionKernel3D<TensorEvaluator<InputArgType, GpuDevice>, Index, InputDims>), num_blocks, block_size, shared_mem, m_device, m_inputImpl, indexMapper, m_kernel, numP, numX, maxX, numY, maxY, numZ, maxZ, kernel_size_x, kernel_size_y, kernel_size_z, data);
+ break;
+ }
+
+ default: {
+ EIGEN_STATIC_ASSERT((NumKernelDims >= 1 && NumKernelDims <= 3), THIS_METHOD_IS_ONLY_FOR_OBJECTS_OF_A_SPECIFIC_SIZE);
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ eigen_assert(m_buf);
+ eigen_assert(index < m_dimensions.TotalSize());
+ return m_buf[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(const Index index) const
+ {
+ eigen_assert(m_buf);
+ eigen_assert(index < m_dimensions.TotalSize());
+ return internal::ploadt<PacketReturnType, LoadMode>(m_buf+index);
+ }
+
+ private:
+ // No assignment (copies are needed by the kernels)
+ TensorEvaluator& operator = (const TensorEvaluator&);
+
+ TensorEvaluator<InputArgType, GpuDevice> m_inputImpl;
+ TensorEvaluator<KernelArgType, GpuDevice> m_kernelImpl;
+ KernelArgType m_kernelArg;
+ Indices m_indices;
+ Dimensions m_dimensions;
+ Scalar* m_buf;
+ const Scalar* m_kernel;
+ bool m_local_kernel;
+
+ const GpuDevice& m_device;
+};
+#endif
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CONVOLUTION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h
new file mode 100644
index 0000000000..dc39565d6b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorCustomOp.h
@@ -0,0 +1,302 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H
+#define EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H
+
+namespace Eigen {
+
+/** \class TensorCustomUnaryOp
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor custom unary operation: wraps a user-supplied functor that
+ * computes both the dimensions and the values of the result.
+ *
+ */
+namespace internal {
+template<typename CustomUnaryFunc, typename XprType>
+struct traits<TensorCustomUnaryOp<CustomUnaryFunc, XprType> >
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::StorageKind StorageKind;
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = traits<XprType>::NumDimensions;
+ static const int Layout = traits<XprType>::Layout;
+};
+
+template<typename CustomUnaryFunc, typename XprType>
+struct eval<TensorCustomUnaryOp<CustomUnaryFunc, XprType>, Eigen::Dense>
+{
+ typedef const TensorCustomUnaryOp<CustomUnaryFunc, XprType>& type;
+};
+
+template<typename CustomUnaryFunc, typename XprType>
+struct nested<TensorCustomUnaryOp<CustomUnaryFunc, XprType>, 1, typename eval<TensorCustomUnaryOp<CustomUnaryFunc, XprType> >::type>
+{
+ typedef TensorCustomUnaryOp<CustomUnaryFunc, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename CustomUnaryFunc, typename XprType>
+class TensorCustomUnaryOp : public TensorBase<TensorCustomUnaryOp<CustomUnaryFunc, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename internal::traits<TensorCustomUnaryOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename internal::nested<TensorCustomUnaryOp>::type Nested;
+ typedef typename internal::traits<TensorCustomUnaryOp>::StorageKind StorageKind;
+ typedef typename internal::traits<TensorCustomUnaryOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCustomUnaryOp(const XprType& expr, const CustomUnaryFunc& func)
+ : m_expr(expr), m_func(func) {}
+
+ EIGEN_DEVICE_FUNC
+ const CustomUnaryFunc& func() const { return m_func; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_expr; }
+
+ protected:
+ typename XprType::Nested m_expr;
+ const CustomUnaryFunc m_func;
+};
+
+
+// Eval as rvalue
+template<typename CustomUnaryFunc, typename XprType, typename Device>
+struct TensorEvaluator<const TensorCustomUnaryOp<CustomUnaryFunc, XprType>, Device>
+{
+ typedef TensorCustomUnaryOp<CustomUnaryFunc, XprType> ArgType;
+ typedef typename internal::traits<ArgType>::Index Index;
+ static const int NumDims = internal::traits<ArgType>::NumDimensions;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef
+ typename internal::remove_const<typename ArgType::Scalar>::type Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = TensorEvaluator<XprType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const ArgType& op, const Device& device)
+ : m_op(op), m_device(device), m_result(NULL)
+ {
+ m_dimensions = op.func().dimensions(op.expression());
+ }
+
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ if (data) {
+ evalTo(data);
+ return false;
+ } else {
+ m_result = static_cast<CoeffReturnType*>(
+ m_device.allocate(dimensions().TotalSize() * sizeof(Scalar)));
+ evalTo(m_result);
+ return true;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ if (m_result != NULL) {
+ m_device.deallocate(m_result);
+ m_result = NULL;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ return m_result[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_result + index);
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return m_result; }
+
+ protected:
+ EIGEN_DEVICE_FUNC void evalTo(Scalar* data) {
+ TensorMap<Tensor<CoeffReturnType, NumDims, Layout, Index> > result(
+ data, m_dimensions);
+ m_op.func().eval(m_op.expression(), result, m_device);
+ }
+
+ Dimensions m_dimensions;
+ const ArgType m_op;
+ const Device& m_device;
+ CoeffReturnType* m_result;
+};
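+// Sketch of a functor usable with TensorCustomUnaryOp (the functor, the tensor
+// sizes and the customOp() helper shown here are illustrative assumptions; the
+// required interface, dimensions() and eval(), is the one used by the
+// evaluator above):
+//   struct DoubleFunctor {
+//     template <typename Input>
+//     DSizes<ptrdiff_t, 2> dimensions(const Input& input) const {
+//       return input.dimensions();
+//     }
+//     template <typename Input, typename Output, typename Device>
+//     void eval(const Input& input, Output& output, const Device& device) const {
+//       output.device(device) = input * input.constant(2);
+//     }
+//   };
+//   Tensor<float, 2> t(3, 5);
+//   Tensor<float, 2> doubled = t.customOp(DoubleFunctor());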
+
+
+
+/** \class TensorCustomBinaryOp
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor custom binary operation: wraps a user-supplied functor that
+ * takes two tensor expressions and computes the dimensions and values of the
+ * result.
+ */
+namespace internal {
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType>
+struct traits<TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType> >
+{
+ typedef typename internal::promote_storage_type<typename LhsXprType::Scalar,
+ typename RhsXprType::Scalar>::ret Scalar;
+ typedef typename internal::promote_storage_type<typename LhsXprType::CoeffReturnType,
+ typename RhsXprType::CoeffReturnType>::ret CoeffReturnType;
+ typedef typename promote_storage_type<typename traits<LhsXprType>::StorageKind,
+ typename traits<RhsXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<LhsXprType>::Index,
+ typename traits<RhsXprType>::Index>::type Index;
+ typedef typename LhsXprType::Nested LhsNested;
+ typedef typename RhsXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const int NumDimensions = traits<LhsXprType>::NumDimensions;
+ static const int Layout = traits<LhsXprType>::Layout;
+};
+
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType>
+struct eval<TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType>, Eigen::Dense>
+{
+ typedef const TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType>& type;
+};
+
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType>
+struct nested<TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType>, 1, typename eval<TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType> >::type>
+{
+ typedef TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType>
+class TensorCustomBinaryOp : public TensorBase<TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename internal::traits<TensorCustomBinaryOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::traits<TensorCustomBinaryOp>::CoeffReturnType CoeffReturnType;
+ typedef typename internal::nested<TensorCustomBinaryOp>::type Nested;
+ typedef typename internal::traits<TensorCustomBinaryOp>::StorageKind StorageKind;
+ typedef typename internal::traits<TensorCustomBinaryOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCustomBinaryOp(const LhsXprType& lhs, const RhsXprType& rhs, const CustomBinaryFunc& func)
+ : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_func(func) {}
+
+ EIGEN_DEVICE_FUNC
+ const CustomBinaryFunc& func() const { return m_func; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename LhsXprType::Nested>::type&
+ lhsExpression() const { return m_lhs_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename RhsXprType::Nested>::type&
+ rhsExpression() const { return m_rhs_xpr; }
+
+ protected:
+ typename LhsXprType::Nested m_lhs_xpr;
+ typename RhsXprType::Nested m_rhs_xpr;
+ const CustomBinaryFunc m_func;
+};
+
+
+// Eval as rvalue
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType, typename Device>
+struct TensorEvaluator<const TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType>, Device>
+{
+ typedef TensorCustomBinaryOp<CustomBinaryFunc, LhsXprType, RhsXprType> XprType;
+ typedef typename internal::traits<XprType>::Index Index;
+ static const int NumDims = internal::traits<XprType>::NumDimensions;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = TensorEvaluator<LhsXprType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_op(op), m_device(device), m_result(NULL)
+ {
+ m_dimensions = op.func().dimensions(op.lhsExpression(), op.rhsExpression());
+ }
+
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ if (data) {
+ evalTo(data);
+ return false;
+ } else {
+ m_result = static_cast<Scalar *>(m_device.allocate(dimensions().TotalSize() * sizeof(Scalar)));
+ evalTo(m_result);
+ return true;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ if (m_result != NULL) {
+ m_device.deallocate(m_result);
+ m_result = NULL;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ return m_result[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_result + index);
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return m_result; }
+
+ protected:
+ EIGEN_DEVICE_FUNC void evalTo(Scalar* data) {
+ TensorMap<Tensor<Scalar, NumDims, Layout> > result(data, m_dimensions);
+ m_op.func().eval(m_op.lhsExpression(), m_op.rhsExpression(), result, m_device);
+ }
+
+ Dimensions m_dimensions;
+ const XprType m_op;
+ const Device& m_device;
+ CoeffReturnType* m_result;
+};
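+// The binary case mirrors the unary one: the functor passed to
+// TensorCustomBinaryOp must provide dimensions(lhs, rhs) and
+// eval(lhs, rhs, output, device), the two entry points used by the evaluator
+// above.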
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_CUSTOM_OP_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h
new file mode 100644
index 0000000000..3c33015bc4
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h
@@ -0,0 +1,154 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H
+#define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H
+
+namespace Eigen {
+
+/** \class TensorDevice
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Pseudo expression providing an operator = that will evaluate its argument
+ * on the specified computing 'device' (GPU, thread pool, ...)
+ *
+ * Example:
+ * C.device(EIGEN_GPU) = A + B;
+ *
+ * Specializations for ThreadPoolDevice and GpuDevice are provided below.
+ * Todo: operator *=, /= and the other compound assignments.
+ */
+
+template <typename ExpressionType, typename DeviceType> class TensorDevice {
+ public:
+ TensorDevice(const DeviceType& device, ExpressionType& expression) : m_device(device), m_expression(expression) {}
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator=(const OtherDerived& other) {
+ typedef TensorAssignOp<ExpressionType, const OtherDerived> Assign;
+ Assign assign(m_expression, other);
+ internal::TensorExecutor<const Assign, DeviceType>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator+=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_sum_op<Scalar>, const ExpressionType, const OtherDerived> Sum;
+ Sum sum(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Sum> Assign;
+ Assign assign(m_expression, sum);
+ internal::TensorExecutor<const Assign, DeviceType>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator-=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_difference_op<Scalar>, const ExpressionType, const OtherDerived> Difference;
+ Difference difference(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Difference> Assign;
+ Assign assign(m_expression, difference);
+ internal::TensorExecutor<const Assign, DeviceType>::run(assign, m_device);
+ return *this;
+ }
+
+ protected:
+ const DeviceType& m_device;
+ ExpressionType& m_expression;
+};
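+// Illustrative use of the generic TensorDevice above (a minimal sketch; the
+// tensor sizes are hypothetical):
+//   Eigen::Tensor<float, 2> A(64, 64), B(64, 64), C(64, 64);
+//   A.setRandom(); B.setRandom();
+//   Eigen::DefaultDevice dev;
+//   C.device(dev) = A + B;   // assignment evaluated through TensorExecutor
+//   C.device(dev) += A;      // compound assignment path defined above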
+
+
+#ifdef EIGEN_USE_THREADS
+template <typename ExpressionType> class TensorDevice<ExpressionType, ThreadPoolDevice> {
+ public:
+ TensorDevice(const ThreadPoolDevice& device, ExpressionType& expression) : m_device(device), m_expression(expression) {}
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator=(const OtherDerived& other) {
+ typedef TensorAssignOp<ExpressionType, const OtherDerived> Assign;
+ Assign assign(m_expression, other);
+ internal::TensorExecutor<const Assign, ThreadPoolDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator+=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_sum_op<Scalar>, const ExpressionType, const OtherDerived> Sum;
+ Sum sum(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Sum> Assign;
+ Assign assign(m_expression, sum);
+ internal::TensorExecutor<const Assign, ThreadPoolDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator-=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_difference_op<Scalar>, const ExpressionType, const OtherDerived> Difference;
+ Difference difference(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Difference> Assign;
+ Assign assign(m_expression, difference);
+ internal::TensorExecutor<const Assign, ThreadPoolDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ protected:
+ const ThreadPoolDevice& m_device;
+ ExpressionType& m_expression;
+};
+#endif
+
+#if defined(EIGEN_USE_GPU)
+template <typename ExpressionType> class TensorDevice<ExpressionType, GpuDevice>
+{
+ public:
+ TensorDevice(const GpuDevice& device, ExpressionType& expression) : m_device(device), m_expression(expression) {}
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator=(const OtherDerived& other) {
+ typedef TensorAssignOp<ExpressionType, const OtherDerived> Assign;
+ Assign assign(m_expression, other);
+ internal::TensorExecutor<const Assign, GpuDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator+=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_sum_op<Scalar>, const ExpressionType, const OtherDerived> Sum;
+ Sum sum(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Sum> Assign;
+ Assign assign(m_expression, sum);
+ internal::TensorExecutor<const Assign, GpuDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_STRONG_INLINE TensorDevice& operator-=(const OtherDerived& other) {
+ typedef typename OtherDerived::Scalar Scalar;
+ typedef TensorCwiseBinaryOp<internal::scalar_difference_op<Scalar>, const ExpressionType, const OtherDerived> Difference;
+ Difference difference(m_expression, other);
+ typedef TensorAssignOp<ExpressionType, const Difference> Assign;
+ Assign assign(m_expression, difference);
+ internal::TensorExecutor<const Assign, GpuDevice>::run(assign, m_device);
+ return *this;
+ }
+
+ protected:
+ const GpuDevice& m_device;
+ ExpressionType& m_expression;
+};
+#endif
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h
new file mode 100644
index 0000000000..b6eeb73832
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDeviceType.h
@@ -0,0 +1,920 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_DEVICE_TYPE_H
+#define EIGEN_CXX11_TENSOR_TENSOR_DEVICE_TYPE_H
+
+namespace Eigen {
+
+// Default device for the machine (typically a single cpu core)
+struct DefaultDevice {
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const {
+ return internal::aligned_malloc(num_bytes);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void deallocate(void* buffer) const {
+ internal::aligned_free(buffer);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const {
+ ::memcpy(dst, src, n);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const {
+ memcpy(dst, src, n);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const {
+ memcpy(dst, src, n);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const {
+ ::memset(buffer, c, n);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t numThreads() const {
+#ifndef __CUDA_ARCH__
+ // Running on the host CPU
+ return 1;
+#else
+ // Running on a CUDA device
+ return 32;
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t memcpyThreshold() const {
+ return 2 * numThreads();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const {
+#ifndef __CUDA_ARCH__
+ // Running on the host CPU
+ return l1CacheSize();
+#else
+ // Running on a CUDA device, return the amount of shared memory available.
+ return 48*1024;
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const {
+#ifndef __CUDA_ARCH__
+ // Running single threaded on the host CPU
+ return l3CacheSize();
+#else
+ // Running on a CUDA device
+ return firstLevelCacheSize();
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int majorDeviceVersion() const {
+#ifndef __CUDA_ARCH__
+ // Running single threaded on the host CPU
+ // Should return an enum that encodes the ISA supported by the CPU
+ return 1;
+#else
+ // Running on a CUDA device
+ return __CUDA_ARCH__ / 100;
+#endif
+ }
+};
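+// A minimal sketch of using DefaultDevice directly for host memory management;
+// tensor expressions fall back to it when no device is specified:
+//
+//   DefaultDevice dev;
+//   const size_t bytes = 128 * sizeof(float);
+//   float* dst = static_cast<float*>(dev.allocate(bytes));
+//   float src[128] = {};                   // illustrative source buffer
+//   dev.memcpy(dst, src, bytes);           // plain ::memcpy on the host
+//   dev.memset(dst, 0, bytes);
+//   dev.deallocate(dst);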
+
+// Multiple cpu cores
+#ifdef EIGEN_USE_THREADS
+
+#if __cplusplus > 199711
+// This defines an interface that ThreadPoolDevice can take to use
+// custom thread pools underneath.
+class ThreadPoolInterface {
+ public:
+ virtual void Schedule(std::function<void()> fn) = 0;
+
+ virtual ~ThreadPoolInterface() {}
+};
+#endif
+
+// The implementation of the ThreadPool type ensures that the Schedule method
+// runs the functions it is provided in FIFO order when the scheduling is done
+// by a single thread.
+#ifdef EIGEN_USE_CUSTOM_THREAD_POOL
+class ThreadPool : public ThreadPoolInterface {
+ public:
+ // Construct a pool that contains "num_threads" threads.
+ explicit ThreadPool(int num_threads) : threads_(num_threads), waiters_(num_threads) {
+ for (int i = 0; i < num_threads; i++) {
+ threads_.push_back(new std::thread([this]() { WorkerLoop(); }));
+ }
+ }
+
+ // Wait until all scheduled work has finished and then destroy the
+ // set of threads.
+ ~ThreadPool() {
+ {
+ // Wait for all work to get done.
+ std::unique_lock<std::mutex> l(mu_);
+ while (!pending_.empty()) {
+ empty_.wait(l);
+ }
+ exiting_ = true;
+
+ // Wakeup all waiters.
+ for (auto w : waiters_) {
+ w->ready = true;
+ w->work = nullptr;
+ w->cv.notify_one();
+ }
+ }
+
+ // Wait for threads to finish.
+ for (auto t : threads_) {
+ t->join();
+ delete t;
+ }
+ }
+
+ // Schedule fn() for execution in the pool of threads. The functions are
+ // executed in the order in which they are scheduled.
+ void Schedule(std::function<void()> fn) final {
+ std::unique_lock<std::mutex> l(mu_);
+ if (waiters_.empty()) {
+ pending_.push_back(fn);
+ } else {
+ Waiter* w = waiters_.back();
+ waiters_.pop_back();
+ w->ready = true;
+ w->work = fn;
+ w->cv.notify_one();
+ }
+ }
+
+ protected:
+ void WorkerLoop() {
+ std::unique_lock<std::mutex> l(mu_);
+ Waiter w;
+ while (!exiting_) {
+ std::function<void()> fn;
+ if (pending_.empty()) {
+ // Wait for work to be assigned to me
+ w.ready = false;
+ waiters_.push_back(&w);
+ while (!w.ready) {
+ w.cv.wait(l);
+ }
+ fn = w.work;
+ w.work = nullptr;
+ } else {
+ // Pick up pending work
+ fn = pending_.front();
+ pending_.pop_front();
+ if (pending_.empty()) {
+ empty_.notify_all();
+ }
+ }
+ if (fn) {
+ mu_.unlock();
+ fn();
+ mu_.lock();
+ }
+ }
+ }
+
+ private:
+ struct Waiter {
+ std::condition_variable cv;
+ std::function<void()> work;
+ bool ready;
+ };
+
+ std::mutex mu_;
+ FixedSizeVector<std::thread*> threads_; // All threads
+ FixedSizeVector<Waiter*> waiters_; // Stack of waiting threads.
+ std::deque<std::function<void()>> pending_; // Queue of pending work
+ std::condition_variable empty_; // Signaled on pending_.empty()
+ bool exiting_ = false;
+};
+
+
+// Notification is an object that allows a user to wait for another
+// thread to signal that an event has occurred.
+//
+// Multiple threads can wait on the same Notification object,
+// but only one caller may call Notify() on the object.
+class Notification {
+ public:
+ Notification() : notified_(false) {}
+ ~Notification() {}
+
+ void Notify() {
+ std::unique_lock<std::mutex> l(mu_);
+ eigen_assert(!notified_);
+ notified_ = true;
+ cv_.notify_all();
+ }
+
+ void WaitForNotification() {
+ std::unique_lock<std::mutex> l(mu_);
+ while (!notified_) {
+ cv_.wait(l);
+ }
+ }
+
+ private:
+ std::mutex mu_;
+ std::condition_variable cv_;
+ bool notified_;
+};
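+// A minimal usage sketch, assuming EIGEN_USE_CUSTOM_THREAD_POOL is defined:
+// schedule a closure on the pool and block until it has run.
+//
+//   ThreadPool pool(4);                  // four worker threads
+//   Notification done;
+//   pool.Schedule([&done]() {
+//     // ... do some work on a pool thread ...
+//     done.Notify();
+//   });
+//   done.WaitForNotification();          // returns once the closure has executed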
+
+#else
+
+// Notification is an object that allows a user to wait for another
+// thread to signal that an event has occurred.
+//
+// Multiple threads can wait on the same Notification object,
+// but only one caller may call Notify() on the object.
+class Notification {
+ public:
+ Notification() : notified_(false) {}
+ ~Notification() {}
+
+ void Notify() {
+ tensorflow::mutex_lock l(mu_);
+ eigen_assert(!notified_);
+ notified_ = true;
+ cv_.notify_all();
+ }
+
+ void WaitForNotification() {
+ tensorflow::mutex_lock l(mu_);
+ while (!notified_) {
+ cv_.wait(l);
+ }
+ }
+
+ private:
+ tensorflow::mutex mu_;
+ tensorflow::condition_variable cv_;
+ bool notified_;
+};
+#endif
+
+// Runs an arbitrary function and then calls Notify() on the passed in
+// Notification.
+template <typename Function, typename... Args> struct FunctionWrapper
+{
+ static void run(Notification* n, Function f, Args... args) {
+ f(args...);
+ n->Notify();
+ }
+};
+
+static EIGEN_STRONG_INLINE void wait_until_ready(Notification* n) {
+ if (n) {
+ n->WaitForNotification();
+ }
+}
+
+
+struct MemcpyExecutor {
+ typedef MemcpyExecutor Self;
+
+ MemcpyExecutor(void *dst, const void *src) :
+ m_dst(static_cast<char *>(dst)), m_src(static_cast<const char *>(src)) { }
+
+ static EIGEN_STRONG_INLINE void run(const MemcpyExecutor* exec, size_t idx, size_t block_size) {
+ ::memcpy(&(exec->m_dst[idx]), &(exec->m_src[idx]), block_size);
+ }
+
+ private:
+ char* m_dst;
+ const char* m_src;
+};
+
+struct MemsetExecutor {
+ typedef MemsetExecutor Self;
+
+ MemsetExecutor(void *buffer, int val) :
+ m_buffer(static_cast<char *>(buffer)), m_val(val) { }
+
+ static EIGEN_STRONG_INLINE void run(const MemsetExecutor* exec, size_t idx, size_t block_size) {
+ ::memset(&(exec->m_buffer[idx]), exec->m_val, block_size);
+ }
+
+ private:
+ char* m_buffer;
+ const int m_val;
+};
+
+
+struct ThreadPoolDevice {
+ // The ownership of the thread pool remains with the caller.
+ ThreadPoolDevice(ThreadPoolInterface* pool, size_t num_cores)
+ : pool_(pool), num_threads_(num_cores) {}
+
+ EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const {
+ return internal::aligned_malloc(num_bytes);
+ }
+
+ EIGEN_STRONG_INLINE void deallocate(void* buffer) const {
+ internal::aligned_free(buffer);
+ }
+
+ EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const {
+#ifdef __ANDROID__
+ ::memcpy(dst, src, n);
+#else
+ if (n <= 32768) {
+ ::memcpy(dst, src, n);
+ } else {
+ MemcpyExecutor memcpy_executor(dst, src);
+ execute(memcpy_executor, n);
+ }
+#endif
+ }
+
+ EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const {
+ memcpy(dst, src, n);
+ }
+
+ EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const {
+ memcpy(dst, src, n);
+ }
+
+ EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const {
+#ifdef __ANDROID__
+ ::memset(buffer, c, n);
+#else
+ if (n <= 32768) {
+ ::memset(buffer, c, n);
+ } else {
+ MemsetExecutor memset_executor(buffer, c);
+ execute(memset_executor, n);
+ }
+#endif
+ }
+
+ EIGEN_STRONG_INLINE size_t numThreads() const {
+ return num_threads_;
+ }
+
+ EIGEN_STRONG_INLINE size_t memcpyThreshold() const {
+ return 2 * numThreads();
+ }
+
+ EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const {
+ return l1CacheSize();
+ }
+
+ EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const {
+ // The l3 cache size is shared between all the cores.
+ return l3CacheSize() / num_threads_;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int majorDeviceVersion() const {
+ // Should return an enum that encodes the ISA supported by the CPU
+ return 1;
+ }
+
+ template <class Function, class... Args>
+ EIGEN_STRONG_INLINE Notification* enqueue(Function&& f, Args&&... args) const {
+ Notification* n = new Notification();
+ std::function<void()> func =
+ std::bind(&FunctionWrapper<Function, Args...>::run, n, f, args...);
+ pool_->Schedule(func);
+ return n;
+ }
+
+ template <class Function, class... Args>
+ EIGEN_STRONG_INLINE void enqueue_and_forget(Function&& f, Args&&... args) const {
+ std::function<void()> func = std::bind(f, args...);
+ pool_->Schedule(func);
+ }
+
+ private:
+ template<typename Executor>
+ EIGEN_STRONG_INLINE void execute(const Executor& exec, size_t n) const {
+    // Don't spawn a thread to process fewer than 1024 bytes (threshold chosen
+    // through a small amount of experimentation).
+ // TODO: make block_size a multiple of packet_size and align everything
+ const size_t block_size = numext::maxi(static_cast<size_t>(1024), n / numThreads());
+ const size_t block_count = n / block_size;
+ eigen_assert(block_count <= numThreads());
+
+ FixedSizeVector<Notification*> results(block_count);
+ for (size_t block_idx = 0; block_idx < block_count; block_idx++) {
+ results.push_back(enqueue(&Executor::run, &exec, block_idx * block_size, block_size));
+ }
+
+ if (block_count * block_size < n) {
+ Executor::run(&exec, block_count * block_size, n - block_count * block_size);
+ }
+
+ // wait for threads to finish
+ for (size_t block_idx = 0; block_idx < block_count; block_idx++) {
+ results[block_idx]->WaitForNotification();
+ delete results[block_idx];
+ }
+ }
+
+ // todo: NUMA, ...
+ size_t num_threads_;
+ ThreadPoolInterface* pool_;
+};
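+// A minimal usage sketch, assuming EIGEN_USE_THREADS is defined and the Tensor
+// module is included; ThreadPool here refers to the custom pool above, but any
+// ThreadPoolInterface implementation can be supplied instead:
+//
+//   ThreadPool pool(8);                    // owned by the caller, not the device
+//   ThreadPoolDevice device(&pool, 8);
+//   Eigen::Tensor<float, 2> a(256, 256), b(256, 256), c(256, 256);
+//   a.setRandom(); b.setRandom();
+//   c.device(device) = a + b;              // evaluated across the pool
+//
+//   // Arbitrary closures can also be enqueued; the caller owns the Notification.
+//   Notification* n = device.enqueue([]() { /* ... */ });
+//   wait_until_ready(n);
+//   delete n;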
+#endif
+
+
+// GPU offloading
+#ifdef EIGEN_USE_GPU
+
+// An interface abstracting away device specific memory allocator.
+class Allocator {
+ public:
+ virtual ~Allocator() {}
+ EIGEN_DEVICE_FUNC virtual void* allocate(size_t num_bytes) const = 0;
+ EIGEN_DEVICE_FUNC virtual void deallocate(void* buffer) const = 0;
+};
+
+#if !defined(__GCUDACC__) && !defined(__GCUDACC_HOST__)
+
+// This defines an interface that GPUDevice can take to use
+// CUDA streams underneath.
+class StreamInterface {
+ public:
+ virtual ~StreamInterface() {}
+
+ virtual const cudaStream_t& stream() const = 0;
+ virtual const cudaDeviceProp& deviceProperties() const = 0;
+
+ // Allocate memory on the actual device where the computation will run
+ virtual void* allocate(size_t num_bytes) const = 0;
+ virtual void deallocate(void* buffer) const = 0;
+};
+
+static cudaDeviceProp* m_deviceProperties;
+static bool m_devicePropInitialized = false;
+static tensorflow::mutex m_devicePropInitMutex(tensorflow::LINKER_INITIALIZED);
+
+static void initializeDeviceProp() {
+ if (!m_devicePropInitialized) {
+ tensorflow::mutex_lock l(m_devicePropInitMutex);
+ if (!m_devicePropInitialized) {
+ int num_devices;
+ cudaError_t status = cudaGetDeviceCount(&num_devices);
+ eigen_check(status == cudaSuccess);
+ m_deviceProperties = new cudaDeviceProp[num_devices];
+ for (int i = 0; i < num_devices; ++i) {
+ status = cudaGetDeviceProperties(&m_deviceProperties[i], i);
+ eigen_check(status == cudaSuccess);
+ }
+ m_devicePropInitialized = true;
+ }
+ }
+}
+
+static const cudaStream_t default_stream = cudaStreamDefault;
+
+class CudaStreamDevice : public StreamInterface {
+ public:
+ // Use the default stream on the current device
+ CudaStreamDevice() : stream_(&default_stream) {
+ cudaGetDevice(&device_);
+ initializeDeviceProp();
+ }
+ // Use the default stream on the specified device
+ CudaStreamDevice(int device) : stream_(&default_stream), device_(device) {
+ initializeDeviceProp();
+ }
+  // Use the specified stream. Note that it's the
+  // caller's responsibility to ensure that the stream can run on
+  // the specified device. If no device is specified, the code
+  // assumes that the stream is associated with the current GPU device.
+ CudaStreamDevice(const cudaStream_t* stream, int device = -1)
+ : stream_(stream), device_(device) {
+ if (device < 0) {
+ cudaGetDevice(&device_);
+ } else {
+ int num_devices;
+ cudaError_t err = cudaGetDeviceCount(&num_devices);
+ eigen_check(err == cudaSuccess);
+ eigen_check(device < num_devices);
+ device_ = device;
+ }
+ initializeDeviceProp();
+ }
+
+ const cudaStream_t& stream() const { return *stream_; }
+ const cudaDeviceProp& deviceProperties() const {
+ return m_deviceProperties[device_];
+ }
+ virtual void* allocate(size_t num_bytes) const {
+ cudaError_t err = cudaSetDevice(device_);
+ eigen_check(err == cudaSuccess);
+ void* result;
+ err = cudaMalloc(&result, num_bytes);
+ eigen_check(err == cudaSuccess);
+ eigen_check(result != NULL);
+ return result;
+ }
+ virtual void deallocate(void* buffer) const {
+ cudaError_t err = cudaSetDevice(device_);
+ eigen_check(err == cudaSuccess);
+ assert(buffer != NULL);
+ err = cudaFree(buffer);
+ assert(err == cudaSuccess);
+ }
+
+ private:
+ const cudaStream_t* stream_;
+ int device_;
+};
+
+static inline void setCudaSharedMemConfig(cudaSharedMemConfig config) {
+ cudaError_t status = cudaDeviceSetSharedMemConfig(config);
+ eigen_check(status == cudaSuccess);
+}
+
+struct GpuDevice {
+  // Neither the CUDA stream nor the allocator is owned: the caller is
+  // responsible for their initialization and eventual destruction.
+ explicit GpuDevice(const StreamInterface* stream) : stream_(stream) {
+ eigen_assert(stream);
+ }
+
+ // TODO(bsteiner): This is an internal API, we should not expose it.
+ EIGEN_STRONG_INLINE const cudaStream_t& stream() const {
+ return stream_->stream();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const {
+#ifndef __CUDA_ARCH__
+ return stream_->allocate(num_bytes);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+ return NULL;
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void deallocate(void* buffer) const {
+#ifndef __CUDA_ARCH__
+ stream_->deallocate(buffer);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src, size_t n) const {
+#ifndef __CUDA_ARCH__
+ cudaError_t err = cudaMemcpyAsync(dst, src, n, cudaMemcpyDeviceToDevice,
+ stream_->stream());
+ assert(err == cudaSuccess);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const {
+#ifndef __CUDA_ARCH__
+ cudaError_t err =
+ cudaMemcpyAsync(dst, src, n, cudaMemcpyHostToDevice, stream_->stream());
+ assert(err == cudaSuccess);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const {
+#ifndef __CUDA_ARCH__
+ cudaError_t err =
+ cudaMemcpyAsync(dst, src, n, cudaMemcpyDeviceToHost, stream_->stream());
+ assert(err == cudaSuccess);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const {
+#ifndef __CUDA_ARCH__
+ cudaError_t err = cudaMemsetAsync(buffer, c, n, stream_->stream());
+ assert(err == cudaSuccess);
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t numThreads() const {
+ // FIXME
+ return 32;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t memcpyThreshold() const {
+ return 4 * 1024 * 1024;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const {
+ // FIXME
+ return 48*1024;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const {
+ // We won't try to take advantage of the l2 cache for the time being, and
+ // there is no l3 cache on cuda devices.
+ return firstLevelCacheSize();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void synchronize() const {
+#ifndef __CUDA_ARCH__
+ cudaError_t err = cudaStreamSynchronize(stream_->stream());
+ assert(err == cudaSuccess);
+#else
+ assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ inline int getNumCudaMultiProcessors() const {
+ return stream_->deviceProperties().multiProcessorCount;
+ }
+ inline int maxCudaThreadsPerBlock() const {
+ return stream_->deviceProperties().maxThreadsPerBlock;
+ }
+ inline int maxCudaThreadsPerMultiProcessor() const {
+ return stream_->deviceProperties().maxThreadsPerMultiProcessor;
+ }
+ inline int sharedMemPerBlock() const {
+ return stream_->deviceProperties().sharedMemPerBlock;
+ }
+ inline int majorDeviceVersion() const {
+ return stream_->deviceProperties().major;
+ }
+
+ // This function checks if the CUDA runtime recorded an error for the
+ // underlying stream device.
+ inline bool ok() const {
+ cudaError_t error = cudaStreamQuery(stream_->stream());
+ return (error == cudaSuccess) || (error == cudaErrorNotReady);
+ }
+
+ private:
+ const StreamInterface* stream_;
+};
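+// A minimal usage sketch, assuming EIGEN_USE_GPU is defined and the code is
+// compiled with nvcc; the buffers live on the device and are wrapped in
+// TensorMaps (names below are illustrative):
+//
+//   Eigen::CudaStreamDevice stream;        // default stream on the current device
+//   Eigen::GpuDevice device(&stream);
+//   const int n = 1 << 20;
+//   float* d_in  = static_cast<float*>(device.allocate(n * sizeof(float)));
+//   float* d_out = static_cast<float*>(device.allocate(n * sizeof(float)));
+//   Eigen::TensorMap<Eigen::Tensor<float, 1> > in(d_in, n), out(d_out, n);
+//   out.device(device) = in * in;          // launches a kernel on the stream
+//   device.synchronize();
+//   device.deallocate(d_in);
+//   device.deallocate(d_out);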
+
+inline void assertCudaOk() {
+ cudaError_t err = cudaGetLastError();
+
+ assert(err != cudaErrorMissingConfiguration);
+ assert(err != cudaErrorMemoryAllocation);
+ assert(err != cudaErrorInitializationError);
+ assert(err != cudaErrorLaunchFailure);
+ assert(err != cudaErrorPriorLaunchFailure);
+ assert(err != cudaErrorLaunchTimeout);
+ assert(err != cudaErrorLaunchOutOfResources);
+ assert(err != cudaErrorInvalidDeviceFunction);
+ assert(err != cudaErrorInvalidConfiguration);
+ assert(err != cudaErrorInvalidDevice);
+ assert(err != cudaErrorInvalidValue);
+ assert(err != cudaErrorInvalidPitchValue);
+ assert(err != cudaErrorInvalidSymbol);
+ assert(err != cudaErrorMapBufferObjectFailed);
+ assert(err != cudaErrorUnmapBufferObjectFailed);
+ assert(err != cudaErrorInvalidHostPointer);
+ assert(err != cudaErrorInvalidDevicePointer);
+ assert(err != cudaErrorInvalidTexture);
+ assert(err != cudaErrorInvalidTextureBinding);
+ assert(err != cudaErrorInvalidChannelDescriptor);
+ assert(err != cudaErrorInvalidMemcpyDirection);
+ assert(err != cudaErrorAddressOfConstant);
+ assert(err != cudaErrorTextureFetchFailed);
+ assert(err != cudaErrorTextureNotBound);
+ assert(err != cudaErrorSynchronizationError);
+ assert(err != cudaErrorInvalidFilterSetting);
+ assert(err != cudaErrorInvalidNormSetting);
+ assert(err != cudaErrorMixedDeviceExecution);
+ assert(err != cudaErrorCudartUnloading);
+ assert(err != cudaErrorUnknown);
+ assert(err != cudaErrorNotYetImplemented);
+ assert(err != cudaErrorMemoryValueTooLarge);
+ assert(err != cudaErrorInvalidResourceHandle);
+ assert(err != cudaErrorNotReady);
+ assert(err != cudaErrorInsufficientDriver);
+ assert(err != cudaErrorSetOnActiveProcess);
+ assert(err != cudaErrorInvalidSurface);
+ assert(err != cudaErrorNoDevice);
+ assert(err != cudaErrorECCUncorrectable);
+ assert(err != cudaErrorSharedObjectSymbolNotFound);
+ assert(err != cudaErrorSharedObjectInitFailed);
+ assert(err != cudaErrorUnsupportedLimit);
+ assert(err != cudaErrorDuplicateVariableName);
+ assert(err != cudaErrorDuplicateTextureName);
+ assert(err != cudaErrorDuplicateSurfaceName);
+ assert(err != cudaErrorDevicesUnavailable);
+ assert(err != cudaErrorInvalidKernelImage);
+ assert(err != cudaErrorNoKernelImageForDevice);
+ assert(err != cudaErrorIncompatibleDriverContext);
+ assert(err != cudaErrorPeerAccessAlreadyEnabled);
+ assert(err != cudaErrorPeerAccessNotEnabled);
+ assert(err != cudaErrorDeviceAlreadyInUse);
+ assert(err != cudaErrorProfilerDisabled);
+ assert(err != cudaErrorProfilerNotInitialized);
+ assert(err != cudaErrorProfilerAlreadyStarted);
+ assert(err != cudaErrorProfilerAlreadyStopped);
+ assert(err != cudaErrorAssert);
+ assert(err != cudaErrorTooManyPeers);
+ assert(err != cudaErrorHostMemoryAlreadyRegistered);
+ assert(err != cudaErrorHostMemoryNotRegistered);
+ assert(err != cudaErrorOperatingSystem);
+ assert(err != cudaErrorStartupFailure);
+ assert(err != cudaErrorApiFailureBase);
+
+  // catch error types introduced after this function was written
+ assert(err == cudaSuccess);
+}
+
+#define LAUNCH_CUDA_KERNEL(kernel, gridsize, blocksize, sharedmem, device, \
+ ...) \
+ do { \
+ (kernel)<<<(gridsize), (blocksize), (sharedmem), (device).stream()>>>( \
+ __VA_ARGS__); \
+ assertCudaOk(); \
+ } while (false)
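+// A minimal launch sketch; ScaleKernel and gpu_device are illustrative names,
+// with gpu_device assumed to be a GpuDevice wrapping the target stream:
+//
+//   __global__ void ScaleKernel(float* data, int n, float s) {
+//     int i = blockIdx.x * blockDim.x + threadIdx.x;
+//     if (i < n) data[i] *= s;
+//   }
+//   ...
+//   const int threads = 256;
+//   const int blocks = (n + threads - 1) / threads;
+//   LAUNCH_CUDA_KERNEL(ScaleKernel, blocks, threads, 0, gpu_device, d_data, n, 2.0f);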
+
+#else // __GCUDACC__
+
+// The following is the version of GpuDevice for StreamExecutor
+// (go/gpuexecutor), a GPU runtime that supports both CUDA and OpenCL.
+// StreamExecutor is being developed as an open-source replacement for the CUDA
+// runtime and is the runtime used when compiling with gcudacc. Differences
+// between the CUDA runtime and StreamExecutor are abstracted away behind
+// GpuDevice.
+
+// TODO(jpienaar): Temporary workaround until b/18409724 is addressed.
+enum cudaSharedMemConfig
+{
+ cudaSharedMemBankSizeDefault = 0,
+ cudaSharedMemBankSizeFourByte = 1,
+ cudaSharedMemBankSizeEightByte = 2
+};
+
+static inline void setCudaSharedMemConfig(cudaSharedMemConfig cache_config) {
+ // TODO(jpienaar): fix when implemented (b/18409724)
+}
+
+struct GpuDevice {
+ GpuDevice()
+ : stream_(perftools::gputools::MachineManager::singleton()->stream_for_device(0)),
+ allocator_(nullptr),
+ stream_exec_(stream_->parent()) {}
+
+ GpuDevice(perftools::gputools::Stream* stream,
+ const Allocator* alloc = nullptr)
+ : stream_(stream), allocator_(alloc), stream_exec_(stream_->parent()) { }
+
+ EIGEN_STRONG_INLINE perftools::gputools::Stream* stream() const {
+ return stream_;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void* allocate(size_t num_bytes) const {
+ if (allocator_ != nullptr) return allocator_->allocate(num_bytes);
+#ifndef __CUDA_ARCH__
+ perftools::gputools::DeviceMemory<char> mem =
+ stream_exec_->AllocateArray<char>(num_bytes);
+ return mem.opaque();
+#else
+ assert(false &&
+ "The default device should be used instead to generate kernel code");
+ return nullptr;
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void deallocate(void* buffer) const {
+ if (allocator_ != nullptr) {
+ allocator_->deallocate(buffer);
+ return;
+ }
+#ifndef __CUDA_ARCH__
+ perftools::gputools::DeviceMemoryBase gpu_mem(buffer);
+ stream_exec_->Deallocate(&gpu_mem);
+#else
+ assert(false &&
+ "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpy(void* dst, const void* src,
+ size_t n) const {
+#ifndef __CUDA_ARCH__
+ perftools::gputools::DeviceMemoryBase gpu_to(dst);
+ if (!stream_->ThenMemcpy(&gpu_to, perftools::gputools::DeviceMemoryBase(
+ const_cast<void*>(src)),
+ n).ok()) {
+ assert(false &&
+ "failed during enqueue of 'copy perftools::gputools to "
+ "perftools::gputools'");
+ }
+#else
+ assert(false &&
+ "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyHostToDevice(void* dst, const void* src, size_t n) const {
+#ifndef __CUDA_ARCH__
+ perftools::gputools::DeviceMemoryBase gpu_to(dst);
+ if (!stream_->ThenMemcpy(&gpu_to, src, n).ok()) {
+ assert(false && "failed while enqueuing memcpy from host to device");
+ }
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void memcpyDeviceToHost(void* dst, const void* src, size_t n) const {
+#ifndef __CUDA_ARCH__
+ if (!stream_->ThenMemcpy(dst, perftools::gputools::DeviceMemoryBase(
+ const_cast<void*>(src)),
+ n).ok()) {
+ assert(false && "failed while enqueuing memcpy from device to host");
+ }
+#else
+ eigen_assert(false && "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_STRONG_INLINE void memset(void* buffer, int c, size_t n) const {
+#ifndef __CUDA_ARCH__
+ perftools::gputools::DeviceMemoryBase gpu_buffer{buffer};
+ if (!stream_exec_->Memset32(stream_, &gpu_buffer, c, n)) {
+ assert(false && "GPU memset failed.");
+ }
+#else
+ assert(false &&
+ "The default device should be used instead to generate kernel code");
+#endif
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t numThreads() const {
+ // FIXME
+ return 32;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t memcpyThreshold() const {
+ return 4 * 1024 * 1024;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t firstLevelCacheSize() const {
+ // FIXME
+ return 48*1024;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t lastLevelCacheSize() const {
+ // We won't try to take advantage of the l2 cache for the time being, and
+ // there is no l3 cache on cuda devices.
+ return firstLevelCacheSize();
+ }
+
+ EIGEN_STRONG_INLINE void synchronize() const {
+ stream_->BlockHostUntilDone();
+ }
+
+ // A gpu::DeviceDescription is cached inside a StreamExecutor, so these calls
+ // aren't expensive/wasteful.
+ EIGEN_DEVICE_FUNC inline int getNumCudaMultiProcessors() const {
+ return stream_exec_->GetDeviceDescription().core_count();
+ }
+
+ EIGEN_DEVICE_FUNC inline int maxCudaThreadsPerBlock() const {
+ return stream_exec_->GetDeviceDescription().threads_per_block_limit();
+ }
+
+ EIGEN_DEVICE_FUNC inline int maxCudaThreadsPerMultiProcessor() const {
+ return stream_exec_->GetDeviceDescription().threads_per_core_limit();
+ }
+
+ EIGEN_DEVICE_FUNC inline int sharedMemPerBlock() const {
+ return stream_exec_->GetDeviceDescription().shared_memory_per_block();
+ }
+
+ EIGEN_DEVICE_FUNC inline int majorDeviceVersion() const {
+ int major, minor;
+ if (stream_exec_->GetDeviceDescription().cuda_compute_capability(&major,
+ &minor)) {
+ return major;
+ } else {
+ return 0;
+ }
+ }
+
+ inline bool ok() const { return stream_->ok(); }
+
+ private:
+ perftools::gputools::Stream* stream_;
+ perftools::gputools::StreamExecutor* stream_exec_;
+ const Allocator* allocator_;
+};
+
+#define LAUNCH_CUDA_KERNEL(kernel, gridsize, blocksize, sharedmem, device, ...) \
+  do {                                                                          \
+    (kernel)<<<(gridsize), (blocksize), (sharedmem), (device).stream()>>>(      \
+        __VA_ARGS__);                                                           \
+    CHECK((device).stream()->ok());                                             \
+  } while (false)
+#endif // __GCUDACC__
+
+#endif // EIGEN_USE_GPU
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_DEVICE_TYPE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h
new file mode 100644
index 0000000000..19e922f92f
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensionList.h
@@ -0,0 +1,235 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H
+#define EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H
+
+namespace Eigen {
+
+/** \internal
+ *
+ * \class TensorDimensionList
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Special case of tensor index list used to list all the dimensions of a tensor of rank n.
+ *
+ * \sa Tensor
+ */
+
+template <typename Index, std::size_t Rank> struct DimensionList {
+ const Index operator[] (const Index i) const { return i; }
+};
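+// A minimal sketch: DimensionList<Index, Rank> stands in for the index list
+// [0, 1, ..., Rank-1] without storing it, e.g. as the dimension argument of a
+// full reduction (the tensor below is illustrative):
+//
+//   DimensionList<DenseIndex, 3> all_dims;
+//   // all_dims[0] == 0, all_dims[1] == 1, all_dims[2] == 2
+//   // Eigen::Tensor<float, 3> t(2, 3, 4);
+//   // Eigen::Tensor<float, 0> total = t.sum(all_dims);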
+
+namespace internal {
+
+template<typename Index, std::size_t Rank> struct array_size<DimensionList<Index, Rank> > {
+ static const size_t value = Rank;
+};
+template<typename Index, std::size_t Rank> struct array_size<const DimensionList<Index, Rank> > {
+ static const size_t value = Rank;
+};
+
+template<DenseIndex n, typename Index, std::size_t Rank> const Index array_get(DimensionList<Index, Rank>& a) {
+ return n;
+}
+template<DenseIndex n, typename Index, std::size_t Rank> const Index array_get(const DimensionList<Index, Rank>& a) {
+ return n;
+}
+
+
+#if defined(EIGEN_HAS_CONSTEXPR)
+template <typename Index, std::size_t Rank>
+struct index_known_statically<DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex) const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_known_statically<const DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex) const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct all_indices_known_statically<DimensionList<Index, Rank> > {
+ constexpr bool operator() () const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct all_indices_known_statically<const DimensionList<Index, Rank> > {
+ constexpr bool operator() () const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct indices_statically_known_to_increase<DimensionList<Index, Rank> > {
+ constexpr bool operator() () const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct indices_statically_known_to_increase<const DimensionList<Index, Rank> > {
+ constexpr bool operator() () const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_eq<DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i == value;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_eq<const DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i == value;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_ne<DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i != value;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_ne<const DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i != value;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_gt<DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i > value;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_gt<const DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i > value;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_lt<DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i < value;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_lt<const DimensionList<Index, Rank> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return i < value;
+ }
+};
+
+#else
+template <typename Index, std::size_t Rank>
+struct index_known_statically<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex) const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_known_statically<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex) const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct all_indices_known_statically<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() () const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct all_indices_known_statically<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() () const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct indices_statically_known_to_increase<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() () const {
+ return true;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct indices_statically_known_to_increase<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() () const {
+ return true;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_eq<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_eq<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_ne<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_ne<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_gt<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_gt<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+
+template <typename Index, std::size_t Rank>
+struct index_statically_lt<DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+template <typename Index, std::size_t Rank>
+struct index_statically_lt<const DimensionList<Index, Rank> > {
+ EIGEN_ALWAYS_INLINE bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return false;
+ }
+};
+#endif
+
+} // end namespace internal
+} // end namespace Eigen
+
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_DIMENSION_LIST_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h
new file mode 100644
index 0000000000..8bf5272ec8
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDimensions.h
@@ -0,0 +1,597 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H
+#define EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H
+
+
+namespace Eigen {
+
+/** \internal
+ *
+ * \class TensorDimensions
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Set of classes used to encode and store the dimensions of a Tensor.
+ *
+ * The Sizes class encodes as part of the type the number of dimensions and the
+ * sizes corresponding to each dimension. It uses no storage space since it is
+ * entirely known at compile time.
+ * The DSizes class is its dynamic sibling: the number of dimensions is known
+ * at compile time but the sizes are set during execution.
+ *
+ * \sa Tensor
+ */
+
+// Can't use std::pair on CUDA devices
+template <typename Index> struct IndexPair {
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE IndexPair() : first(0), second(0) { }
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE IndexPair(Index f, Index s) : first(f), second(s) { }
+ Index first;
+ Index second;
+};
+
+// Boilerplate code
+namespace internal {
+
+template<std::size_t n, typename Dimension> struct dget {
+ static const std::size_t value = get<n, typename Dimension::Base>::value;
+};
+
+
+template<typename Index, std::size_t NumIndices, std::size_t n, bool RowMajor>
+struct fixed_size_tensor_index_linearization_helper
+{
+ template <typename Dimensions> EIGEN_DEVICE_FUNC
+ static inline Index run(array<Index, NumIndices> const& indices,
+ const Dimensions& dimensions)
+ {
+ return array_get<RowMajor ? n - 1 : (NumIndices - n)>(indices) +
+ dget<RowMajor ? n - 1 : (NumIndices - n), Dimensions>::value *
+ fixed_size_tensor_index_linearization_helper<Index, NumIndices, n - 1, RowMajor>::run(indices, dimensions);
+ }
+};
+
+template<typename Index, std::size_t NumIndices, bool RowMajor>
+struct fixed_size_tensor_index_linearization_helper<Index, NumIndices, 0, RowMajor>
+{
+ template <typename Dimensions> EIGEN_DEVICE_FUNC
+ static inline Index run(array<Index, NumIndices> const& indices,
+ const Dimensions&)
+ {
+ return 0;
+ }
+};
+
+template<typename Index, std::size_t n>
+struct fixed_size_tensor_index_extraction_helper
+{
+ template <typename Dimensions> EIGEN_DEVICE_FUNC
+ static inline Index run(const Index index,
+ const Dimensions& dimensions)
+ {
+ const Index mult = (index == n) ? 1 : 0;
+ return array_get<n>(dimensions) * mult +
+ fixed_size_tensor_index_extraction_helper<Index, n - 1>::run(index, dimensions);
+ }
+};
+
+template<typename Index>
+struct fixed_size_tensor_index_extraction_helper<Index, 0>
+{
+ template <typename Dimensions> EIGEN_DEVICE_FUNC
+ static inline Index run(const Index index,
+ const Dimensions& dimensions)
+ {
+ const Index mult = (index == 0) ? 1 : 0;
+ return array_get<0>(dimensions) * mult;
+ }
+};
+
+} // end namespace internal
+
+
+// Fixed size
+#ifndef EIGEN_EMULATE_CXX11_META_H
+template <typename std::size_t... Indices>
+struct Sizes : internal::numeric_list<std::size_t, Indices...> {
+ typedef internal::numeric_list<std::size_t, Indices...> Base;
+ static const std::size_t total_size = internal::arg_prod(Indices...);
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t rank() const {
+ return Base::count;
+ }
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t TotalSize() {
+ return internal::arg_prod(Indices...);
+ }
+
+ Sizes() { }
+ template <typename DenseIndex>
+ explicit Sizes(const array<DenseIndex, Base::count>& /*indices*/) {
+ // todo: add assertion
+ }
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template <typename... DenseIndex> Sizes(DenseIndex...) { }
+ explicit Sizes(std::initializer_list<std::size_t> /*l*/) {
+ // todo: add assertion
+ }
+#endif
+
+ template <typename T> Sizes& operator = (const T& /*other*/) {
+ // add assertion failure if the size of other is different
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t operator[] (const int index) const {
+ return internal::fixed_size_tensor_index_extraction_helper<std::ptrdiff_t, Base::count - 1>::run(index, *this);
+ }
+
+ template <typename DenseIndex> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ size_t IndexOfColMajor(const array<DenseIndex, Base::count>& indices) const {
+ return internal::fixed_size_tensor_index_linearization_helper<DenseIndex, Base::count, Base::count, false>::run(indices, *static_cast<const Base*>(this));
+ }
+ template <typename DenseIndex> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ size_t IndexOfRowMajor(const array<DenseIndex, Base::count>& indices) const {
+ return internal::fixed_size_tensor_index_linearization_helper<DenseIndex, Base::count, Base::count, true>::run(indices, *static_cast<const Base*>(this));
+ }
+};
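+// A minimal sketch of the fixed-size variant; the rank and extents are all
+// encoded in the type, so no storage is used:
+//
+//   Sizes<2, 3, 4> s;
+//   // s.rank() == 3, Sizes<2, 3, 4>::TotalSize() == 24, s[1] == 3
+//   array<DenseIndex, 3> idx{{1, 0, 2}};
+//   // Column-major strides are (1, 2, 6): s.IndexOfColMajor(idx) == 1 + 0*2 + 2*6 == 13
+//   // Row-major strides are (12, 4, 1):   s.IndexOfRowMajor(idx) == 1*12 + 0*4 + 2 == 14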
+
+namespace internal {
+template <typename std::size_t... Indices>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t array_prod(const Sizes<Indices...>&) {
+ return Sizes<Indices...>::total_size;
+}
+}
+
+#else
+
+template <std::size_t n>
+struct non_zero_size {
+ typedef internal::type2val<std::size_t, n> type;
+};
+template <>
+struct non_zero_size<0> {
+ typedef internal::null_type type;
+};
+
+template <std::size_t V1=0, std::size_t V2=0, std::size_t V3=0, std::size_t V4=0, std::size_t V5=0> struct Sizes {
+ typedef typename internal::make_type_list<typename non_zero_size<V1>::type, typename non_zero_size<V2>::type, typename non_zero_size<V3>::type, typename non_zero_size<V4>::type, typename non_zero_size<V5>::type >::type Base;
+ static const size_t count = Base::count;
+ static const std::size_t total_size = internal::arg_prod<Base>::value;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t rank() const {
+ return count;
+ }
+
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t TotalSize() {
+ return internal::arg_prod<Base>::value;
+ }
+
+ Sizes() { }
+ template <typename DenseIndex>
+ explicit Sizes(const array<DenseIndex, Base::count>& indices) {
+ // todo: add assertion
+ }
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template <typename... DenseIndex> Sizes(DenseIndex... indices) { }
+ explicit Sizes(std::initializer_list<std::size_t> l) {
+ // todo: add assertion
+ }
+#else
+ EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex i0) {
+ }
+ EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex i0, const DenseIndex i1) {
+ }
+ EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2) {
+ }
+ EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3) {
+ }
+ EIGEN_DEVICE_FUNC explicit Sizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3, const DenseIndex i4) {
+ }
+#endif
+
+ template <typename T> Sizes& operator = (const T& other) {
+ // to do: check the size of other
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t operator[] (const int index) const {
+ switch (index) {
+ case 0:
+ return internal::get<0, Base>::value;
+ case 1:
+ return internal::get<1, Base>::value;
+ case 2:
+ return internal::get<2, Base>::value;
+ case 3:
+ return internal::get<3, Base>::value;
+ case 4:
+ return internal::get<4, Base>::value;
+ default:
+ eigen_assert(false && "index overflow");
+ return static_cast<std::size_t>(-1);
+ }
+ }
+
+ template <typename DenseIndex> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ size_t IndexOfColMajor(const array<DenseIndex, Base::count>& indices) const {
+ return internal::fixed_size_tensor_index_linearization_helper<DenseIndex, Base::count, Base::count, false>::run(indices, *this);
+ }
+ template <typename DenseIndex> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ size_t IndexOfRowMajor(const array<DenseIndex, Base::count>& indices) const {
+ return internal::fixed_size_tensor_index_linearization_helper<DenseIndex, Base::count, Base::count, true>::run(indices, *this);
+ }
+};
+
+namespace internal {
+template <std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::size_t array_prod(const Sizes<V1, V2, V3, V4, V5>&) {
+ return Sizes<V1, V2, V3, V4, V5>::total_size;
+}
+}
+
+#endif
+
+// Boilerplate
+namespace internal {
+template<typename Index, std::size_t NumIndices, std::size_t n, bool RowMajor>
+struct tensor_index_linearization_helper
+{
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Index run(array<Index, NumIndices> const& indices, array<Index, NumIndices> const& dimensions)
+ {
+ return array_get<RowMajor ? n : (NumIndices - n - 1)>(indices) +
+ array_get<RowMajor ? n : (NumIndices - n - 1)>(dimensions) *
+ tensor_index_linearization_helper<Index, NumIndices, n - 1, RowMajor>::run(indices, dimensions);
+ }
+};
+
+template<typename Index, std::size_t NumIndices, bool RowMajor>
+struct tensor_index_linearization_helper<Index, NumIndices, 0, RowMajor>
+{
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Index run(array<Index, NumIndices> const& indices, array<Index, NumIndices> const&)
+ {
+ return array_get<RowMajor ? 0 : NumIndices - 1>(indices);
+ }
+};
+} // end namespace internal
+
+
+
+// Dynamic size
+template <typename DenseIndex, std::size_t NumDims>
+struct DSizes : array<DenseIndex, NumDims> {
+ typedef array<DenseIndex, NumDims> Base;
+ static const std::size_t count = NumDims;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t rank() const {
+ return NumDims;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t TotalSize() const {
+ return internal::array_prod(*static_cast<const Base*>(this));
+ }
+
+ EIGEN_DEVICE_FUNC DSizes() {
+ for (int i = 0 ; i < NumDims; ++i) {
+ (*this)[i] = 0;
+ }
+ }
+ EIGEN_DEVICE_FUNC DSizes(const array<DenseIndex, NumDims>& a) : Base(a) { }
+
+ EIGEN_DEVICE_FUNC DSizes(const DimensionList<DenseIndex, NumDims>& a) {
+ for (int i = 0 ; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ }
+
+#ifndef EIGEN_EMULATE_CXX11_META_H
+ template <typename std::size_t... Indices>
+ EIGEN_DEVICE_FUNC DSizes(const Sizes<Indices...>& a) {
+ for (int i = 0 ; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ }
+#else
+ template <std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5>
+ EIGEN_DEVICE_FUNC DSizes(const Sizes<V1, V2, V3, V4, V5>& a) {
+ for (int i = 0 ; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ }
+#endif
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE explicit DSizes(DenseIndex firstDimension, IndexTypes... otherDimensions) {
+ EIGEN_STATIC_ASSERT(sizeof...(otherDimensions) + 1 == NumDims, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ (*this) = array<DenseIndex, NumDims>{{firstDimension, otherDimensions...}};
+ }
+#else
+ EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0) {
+ eigen_assert(NumDims == 1);
+ (*this)[0] = i0;
+ }
+ EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0, const DenseIndex i1) {
+ eigen_assert(NumDims == 2);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ }
+ EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2) {
+ eigen_assert(NumDims == 3);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ }
+ EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3) {
+ eigen_assert(NumDims == 4);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ (*this)[3] = i3;
+ }
+ EIGEN_DEVICE_FUNC explicit DSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3, const DenseIndex i4) {
+ eigen_assert(NumDims == 5);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ (*this)[3] = i3;
+ (*this)[4] = i4;
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC DSizes& operator = (const array<DenseIndex, NumDims>& other) {
+ *static_cast<Base*>(this) = other;
+ return *this;
+ }
+
+ // A constexpr would be so much better here
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t IndexOfColMajor(const array<DenseIndex, NumDims>& indices) const {
+ return internal::tensor_index_linearization_helper<DenseIndex, NumDims, NumDims - 1, false>::run(indices, *static_cast<const Base*>(this));
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t IndexOfRowMajor(const array<DenseIndex, NumDims>& indices) const {
+ return internal::tensor_index_linearization_helper<DenseIndex, NumDims, NumDims - 1, true>::run(indices, *static_cast<const Base*>(this));
+ }
+};
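+// A minimal sketch of the dynamic-size variant; the rank is fixed at compile
+// time but the extents are chosen at run time:
+//
+//   DSizes<DenseIndex, 3> d(2, 3, 4);
+//   // d.rank() == 3, d.TotalSize() == 24, d[2] == 4
+//   array<DenseIndex, 3> idx{{1, 0, 2}};
+//   // d.IndexOfColMajor(idx) == 13 and d.IndexOfRowMajor(idx) == 14,
+//   // matching the Sizes<2, 3, 4> example above.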
+
+
+
+
+// Boilerplate
+namespace internal {
+template<typename Index, std::size_t NumIndices, std::size_t n, bool RowMajor>
+struct tensor_vsize_index_linearization_helper
+{
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Index run(array<Index, NumIndices> const& indices, std::vector<DenseIndex> const& dimensions)
+ {
+ return array_get<RowMajor ? n : (NumIndices - n - 1)>(indices) +
+ array_get<RowMajor ? n : (NumIndices - n - 1)>(dimensions) *
+ tensor_vsize_index_linearization_helper<Index, NumIndices, n - 1, RowMajor>::run(indices, dimensions);
+ }
+};
+
+template<typename Index, std::size_t NumIndices, bool RowMajor>
+struct tensor_vsize_index_linearization_helper<Index, NumIndices, 0, RowMajor>
+{
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Index run(array<Index, NumIndices> const& indices, std::vector<DenseIndex> const&)
+ {
+ return array_get<RowMajor ? 0 : NumIndices - 1>(indices);
+ }
+};
+} // end namespace internal
+
+
+template <typename DenseIndex>
+struct VSizes : std::vector<DenseIndex> {
+ typedef std::vector<DenseIndex> Base;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t rank() const {
+ return Base::size();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t TotalSize() const {
+ return internal::array_prod(*static_cast<const Base*>(this));
+ }
+
+ EIGEN_DEVICE_FUNC VSizes() { }
+ EIGEN_DEVICE_FUNC explicit VSizes(const std::vector<DenseIndex>& a) : Base(a) { }
+
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC explicit VSizes(const array<DenseIndex, NumDims>& a) {
+ this->resize(NumDims);
+ for (int i = 0; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ }
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC explicit VSizes(const DSizes<DenseIndex, NumDims>& a) {
+ this->resize(NumDims);
+ for (int i = 0; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC explicit VSizes(const DenseIndex i0) {
+ this->resize(1);
+ (*this)[0] = i0;
+ }
+ EIGEN_DEVICE_FUNC explicit VSizes(const DenseIndex i0, const DenseIndex i1) {
+ this->resize(2);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ }
+ EIGEN_DEVICE_FUNC explicit VSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2) {
+ this->resize(3);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ }
+ EIGEN_DEVICE_FUNC explicit VSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3) {
+ this->resize(4);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ (*this)[3] = i3;
+ }
+ EIGEN_DEVICE_FUNC explicit VSizes(const DenseIndex i0, const DenseIndex i1, const DenseIndex i2, const DenseIndex i3, const DenseIndex i4) {
+ this->resize(5);
+ (*this)[0] = i0;
+ (*this)[1] = i1;
+ (*this)[2] = i2;
+ (*this)[3] = i3;
+ (*this)[4] = i4;
+ }
+
+ EIGEN_DEVICE_FUNC VSizes& operator = (const std::vector<DenseIndex>& other) {
+ *static_cast<Base*>(this) = other;
+ return *this;
+ }
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC VSizes& operator = (const array<DenseIndex, NumDims>& a) {
+ this->resize(NumDims);
+ for (int i = 0; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ return *this;
+ }
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC VSizes& operator = (const DSizes<DenseIndex, NumDims>& a) {
+ this->resize(NumDims);
+ for (int i = 0; i < NumDims; ++i) {
+ (*this)[i] = a[i];
+ }
+ return *this;
+ }
+
+ // A constexpr would be so much better here
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t IndexOfColMajor(const array<DenseIndex, NumDims>& indices) const {
+ return internal::tensor_vsize_index_linearization_helper<DenseIndex, NumDims, NumDims - 1, false>::run(indices, *static_cast<const Base*>(this));
+ }
+ template <std::size_t NumDims>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE size_t IndexOfRowMajor(const array<DenseIndex, NumDims>& indices) const {
+ return internal::tensor_vsize_index_linearization_helper<DenseIndex, NumDims, NumDims - 1, true>::run(indices, *static_cast<const Base*>(this));
+ }
+};
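+// A minimal sketch of the fully dynamic variant, where even the rank is a
+// run-time quantity (VSizes is backed by a std::vector of extents):
+//
+//   VSizes<DenseIndex> v(2, 3, 4);
+//   // v.rank() == 3, v.TotalSize() == 24
+//   v = DSizes<DenseIndex, 2>(5, 7);       // the rank can change on assignment
+//   // v.rank() == 2, v.TotalSize() == 35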
+
+
+// Boilerplate
+namespace internal {
+template <typename DenseIndex>
+EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex array_prod(const VSizes<DenseIndex>& sizes) {
+ DenseIndex total_size = 1;
+ for (int i = 0; i < sizes.size(); ++i) {
+ total_size *= sizes[i];
+ }
+ return total_size;
+}
+}
+
+namespace internal {
+
+template <typename DenseIndex, std::size_t NumDims> struct array_size<const DSizes<DenseIndex, NumDims> > {
+ static const size_t value = NumDims;
+};
+template <typename DenseIndex, std::size_t NumDims> struct array_size<DSizes<DenseIndex, NumDims> > {
+ static const size_t value = NumDims;
+};
+template <typename DenseIndex>
+struct array_size<VSizes<DenseIndex> > {
+ static const ptrdiff_t value = -1;
+};
+#ifndef EIGEN_EMULATE_CXX11_META_H
+template <typename std::size_t... Indices> struct array_size<const Sizes<Indices...> > {
+static const size_t value = Sizes<Indices...>::count;
+};
+template <typename std::size_t... Indices> struct array_size<Sizes<Indices...> > {
+static const size_t value = Sizes<Indices...>::count;
+};
+template <std::size_t n, typename std::size_t... Indices> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_get(const Sizes<Indices...>&) {
+ return get<n, internal::numeric_list<std::size_t, Indices...> >::value;
+}
+#else
+template <std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5> struct array_size<const Sizes<V1,V2,V3,V4,V5> > {
+ static const size_t value = Sizes<V1,V2,V3,V4,V5>::count;
+};
+template <std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5> struct array_size<Sizes<V1,V2,V3,V4,V5> > {
+ static const size_t value = Sizes<V1,V2,V3,V4,V5>::count;
+};
+template <std::size_t n, std::size_t V1, std::size_t V2, std::size_t V3, std::size_t V4, std::size_t V5> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE std::ptrdiff_t array_get(const Sizes<V1,V2,V3,V4,V5>& a) {
+ return get<n, typename Sizes<V1,V2,V3,V4,V5>::Base>::value;
+}
+
+#endif
+
+
+template <typename Dims1, typename Dims2, size_t n, size_t m>
+struct sizes_match_below_dim {
+ static inline bool run(Dims1& dims1, Dims2& dims2) {
+ return false;
+ }
+};
+template <typename Dims1, typename Dims2, size_t n>
+struct sizes_match_below_dim<Dims1, Dims2, n, n> {
+ static inline bool run(Dims1& dims1, Dims2& dims2) {
+ return (array_get<n-1>(dims1) == array_get<n-1>(dims2)) &
+ sizes_match_below_dim<Dims1, Dims2, n-1, n-1>::run(dims1, dims2);
+ }
+};
+template <typename Dims1, typename Dims2>
+struct sizes_match_below_dim<Dims1, Dims2, 0, 0> {
+ static inline bool run(Dims1& dims1, Dims2& dims2) {
+ return true;
+ }
+};
+
+} // end namespace internal
+
+
+template <typename Dims1, typename Dims2>
+bool dimensions_match(Dims1& dims1, Dims2& dims2) {
+ return internal::sizes_match_below_dim<Dims1, Dims2, internal::array_size<Dims1>::value, internal::array_size<Dims2>::value>::run(dims1, dims2);
+}
+
+template <typename IndexType, typename Dims2>
+bool dimensions_match(const VSizes<IndexType>& dims1, Dims2& dims2) {
+ if (dims1.size() != internal::array_size<Dims2>::value) {
+ return false;
+ }
+ for (int i = 0; i < internal::array_size<Dims2>::value; ++i) {
+ if (dims1[i] != dims2[i]) {
+ return false;
+ }
+ }
+ return true;
+}
+
+template <typename Dims1, typename IndexType>
+bool dimensions_match(Dims1& dims1, const VSizes<IndexType>& dims2) {
+ if (internal::array_size<Dims1>::value != dims2.size()) {
+ return false;
+ }
+ for (int i = 0; i < internal::array_size<Dims1>::value; ++i) {
+ if (dims1[i] != dims2[i]) {
+ return false;
+ }
+ }
+ return true;
+}
+
+template <typename IndexType>
+bool dimensions_match(const VSizes<IndexType>& dims1, const VSizes<IndexType>& dims2) {
+ return dims1 == dims2;
+}
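+// A minimal sketch: dimensions_match compares extents across the different
+// dimension representations, regardless of how they are stored:
+//
+//   Sizes<3, 4> fixed;
+//   DSizes<DenseIndex, 2> dyn(3, 4);
+//   DSizes<DenseIndex, 2> other(4, 3);
+//   // dimensions_match(dyn, fixed) == true
+//   // dimensions_match(dyn, other) == false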
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_DIMENSIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h
new file mode 100644
index 0000000000..4ad431abae
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvalTo.h
@@ -0,0 +1,151 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H
+#define EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H
+
+namespace Eigen {
+
+/** \class TensorEvalToOp
+  * \ingroup CXX11_Tensor_Module
+  *
+  * \brief Expression node that evaluates its argument into a user-supplied buffer.
+  *
+  */
+namespace internal {
+template<typename XprType>
+struct traits<TensorEvalToOp<XprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs are different.
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename XprType>
+struct eval<TensorEvalToOp<XprType>, Eigen::Dense>
+{
+ typedef const TensorEvalToOp<XprType>& type;
+};
+
+template<typename XprType>
+struct nested<TensorEvalToOp<XprType>, 1, typename eval<TensorEvalToOp<XprType> >::type>
+{
+ typedef TensorEvalToOp<XprType> type;
+};
+
+} // end namespace internal
+
+
+
+
+template<typename XprType>
+class TensorEvalToOp : public TensorBase<TensorEvalToOp<XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorEvalToOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorEvalToOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorEvalToOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorEvalToOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvalToOp(CoeffReturnType* buffer, const XprType& expr)
+ : m_xpr(expr), m_buffer(buffer) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* buffer() const { return m_buffer; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ CoeffReturnType* m_buffer;
+};
+
+
+
+template<typename ArgType, typename Device>
+struct TensorEvaluator<const TensorEvalToOp<ArgType>, Device>
+{
+ typedef TensorEvalToOp<ArgType> XprType;
+ typedef typename ArgType::Scalar Scalar;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_device(device), m_buffer(op.buffer())
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE ~TensorEvaluator() {
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_impl.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* scalar) {
+ assert(scalar == NULL);
+ return m_impl.evalSubExprsIfNeeded(m_buffer);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalScalar(Index i) {
+ m_buffer[i] = m_impl.coeff(i);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalPacket(Index i) {
+ internal::pstoret<CoeffReturnType, PacketReturnType, Aligned>(m_buffer + i, m_impl.template packet<TensorEvaluator<ArgType, Device>::IsAligned ? Aligned : Unaligned>(i));
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_buffer[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_buffer + index);
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return m_buffer; }
+
+ private:
+ TensorEvaluator<ArgType, Device> m_impl;
+ const Device& m_device;
+ CoeffReturnType* m_buffer;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_EVAL_TO_H
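
// A hedged sketch of where TensorEvalToOp fits in practice (illustrative code,
// assuming the unsupported Eigen Tensor module). User code normally reaches it
// indirectly: .eval() materializes a subexpression into a temporary buffer, and
// internally that is expressed as a TensorEvalToOp writing into the buffer.
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(2, 3), b(2, 3), c(2, 3);
  a.setRandom();
  b.setRandom();
  // Force (a + b) to be evaluated into a temporary before the rest of the expression.
  c = (a + b).eval() * 0.5f;
  return 0;
}
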
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h
new file mode 100644
index 0000000000..f2ef2d85c1
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h
@@ -0,0 +1,505 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H
+#define EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H
+
+namespace Eigen {
+
+/** \class TensorEvaluator
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The tensor evaluator classes.
+ *
+ * These classes are responsible for the evaluation of the tensor expression.
+ *
+ * TODO: add support for more types of expressions, in particular expressions
+ * leading to lvalues (slicing, reshaping, etc...)
+ */
+
+// Generic evaluator
+template<typename Derived, typename Device>
+struct TensorEvaluator
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename Derived::Dimensions Dimensions;
+
+ // NumDimensions is -1 for variable dim tensors
+ static const int NumCoords = internal::traits<Derived>::NumDimensions;
+ static const int SafeNumCoords = NumCoords >= 0 ? NumCoords : 0;
+
+ enum {
+ IsAligned = Derived::IsAligned,
+ PacketAccess = Derived::PacketAccess,
+ BlockAccess = internal::is_arithmetic<
+ typename internal::remove_const<Scalar>::type>::value &&
+ NumCoords >= 0,
+ Layout = Derived::Layout,
+ CoordAccess = NumCoords >= 0,
+ };
+
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, SafeNumCoords, Layout>
+ TensorBlock;
+ typedef typename internal::TensorBlockReader<
+ Index, typename internal::remove_const<Scalar>::type, SafeNumCoords, Layout,
+ PacketAccess> TensorBlockReader;
+ typedef typename internal::TensorBlockWriter<
+ Index, typename internal::remove_const<Scalar>::type, SafeNumCoords, Layout,
+ PacketAccess> TensorBlockWriter;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorEvaluator(const Derived& m, const Device& device)
+ : m_data(const_cast<Scalar*>(m.data())),
+ m_dims(m.dimensions()),
+ m_device(device) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dims; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* dest) {
+ if (dest) {
+ m_device.memcpy((void*)dest, m_data, sizeof(Scalar) * m_dims.TotalSize());
+ return false;
+ }
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ eigen_assert(m_data);
+ return m_data[index];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) {
+ eigen_assert(m_data);
+ return m_data[index];
+ }
+
+ template<int LoadMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ PacketReturnType packet(Index index) const
+ {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_data + index);
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ return internal::pstoret<Scalar, PacketReturnType, StoreMode>(m_data + index, x);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, SafeNumCoords>& coords) const {
+ eigen_assert(m_data);
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ return m_data[m_dims.IndexOfColMajor(coords)];
+ } else {
+ return m_data[m_dims.IndexOfRowMajor(coords)];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array<Index, SafeNumCoords>& coords) {
+ eigen_assert(m_data);
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ return m_data[m_dims.IndexOfColMajor(coords)];
+ } else {
+ return m_data[m_dims.IndexOfRowMajor(coords)];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(TensorBlock* block) const {
+ assert(m_data != NULL);
+ TensorBlockReader::Run(block, m_data);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock(
+ const TensorBlock& block) {
+ assert(m_data != NULL);
+ TensorBlockWriter::Run(block, m_data);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_data; }
+
+ protected:
+ Scalar* m_data;
+ Dimensions m_dims;
+ const Device& m_device;
+};
+
+
+namespace {
+template <typename T> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+T loadConstant(const T* address) {
+ return *address;
+}
+// Use the texture cache on CUDA devices whenever possible
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 350
+template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+float loadConstant(const float* address) {
+ return __ldg(address);
+}
+template <> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+double loadConstant(const double* address) {
+ return __ldg(address);
+}
+#endif
+}
+
+
+// Default evaluator for rvalues
+template<typename Derived, typename Device>
+struct TensorEvaluator<const Derived, Device>
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename Derived::Dimensions Dimensions;
+
+ // NumDimensions is -1 for variable dim tensors
+ static const int NumCoords = internal::traits<Derived>::NumDimensions;
+ static const int SafeNumCoords = NumCoords >= 0 ? NumCoords : 0;
+
+ enum {
+ IsAligned = Derived::IsAligned,
+ PacketAccess = Derived::PacketAccess,
+ BlockAccess = internal::is_arithmetic<
+ typename internal::remove_const<Scalar>::type>::value &&
+ NumCoords >= 0,
+ Layout = Derived::Layout,
+ CoordAccess = NumCoords >= 0,
+ };
+
+ // TODO(andydavis) Add block/writeBlock accessors to Tensor and TensorMap so
+ // we can default BlockAccess to true above.
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, SafeNumCoords, Layout>
+ TensorBlock;
+ typedef typename internal::TensorBlockReader<
+ Index, typename internal::remove_const<Scalar>::type, SafeNumCoords, Layout,
+ PacketAccess> TensorBlockReader;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const Derived& m, const Device& device)
+ : m_data(m.data()), m_dims(m.dimensions()), m_device(device)
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dims; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ if (internal::is_arithmetic<typename internal::remove_const<Scalar>::type>::value && data) {
+ m_device.memcpy((void*)data, m_data, m_dims.TotalSize() * sizeof(Scalar));
+ return false;
+ }
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ eigen_assert(m_data);
+ return loadConstant(m_data+index);
+ }
+
+ template<int LoadMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ PacketReturnType packet(Index index) const
+ {
+ return internal::ploadt_ro<PacketReturnType, LoadMode>(m_data + index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, SafeNumCoords>& coords) const {
+ eigen_assert(m_data);
+ const Index index = (static_cast<int>(Layout) == static_cast<int>(ColMajor)) ? m_dims.IndexOfColMajor(coords)
+ : m_dims.IndexOfRowMajor(coords);
+ return loadConstant(m_data+index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(TensorBlock* block) const {
+ assert(m_data != NULL);
+ TensorBlockReader::Run(block, m_data);
+ }
+
+ EIGEN_DEVICE_FUNC const Scalar* data() const { return m_data; }
+
+ protected:
+ const Scalar* m_data;
+ Dimensions m_dims;
+ const Device& m_device;
+};
+
+
+
+
+// -------------------- CwiseNullaryOp --------------------
+
+template<typename NullaryOp, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorCwiseNullaryOp<NullaryOp, ArgType>, Device>
+{
+ typedef TensorCwiseNullaryOp<NullaryOp, ArgType> XprType;
+
+ enum {
+ IsAligned = true,
+ PacketAccess = internal::functor_traits<NullaryOp>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC
+ TensorEvaluator(const XprType& op, const Device& device)
+ : m_functor(op.functor()), m_argImpl(op.nestedExpression(), device)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::traits<XprType>::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_argImpl.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType*) { return true; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() { }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(index);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return m_functor.packetOp(index);
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return NULL; }
+
+ private:
+ const NullaryOp m_functor;
+ TensorEvaluator<ArgType, Device> m_argImpl;
+};
+
+
+
+// -------------------- CwiseUnaryOp --------------------
+
+template<typename UnaryOp, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorCwiseUnaryOp<UnaryOp, ArgType>, Device>
+{
+ typedef TensorCwiseUnaryOp<UnaryOp, ArgType> XprType;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess &
+ internal::functor_traits<UnaryOp>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device)
+ : m_functor(op.functor()),
+ m_argImpl(op.nestedExpression(), device)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::traits<XprType>::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_argImpl.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ m_argImpl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_argImpl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(m_argImpl.coeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return m_functor.packetOp(m_argImpl.template packet<LoadMode>(index));
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return NULL; }
+
+ private:
+ const UnaryOp m_functor;
+ TensorEvaluator<ArgType, Device> m_argImpl;
+};
+
+
+// -------------------- CwiseBinaryOp --------------------
+
+template<typename BinaryOp, typename LeftArgType, typename RightArgType, typename Device>
+struct TensorEvaluator<const TensorCwiseBinaryOp<BinaryOp, LeftArgType, RightArgType>, Device>
+{
+ typedef TensorCwiseBinaryOp<BinaryOp, LeftArgType, RightArgType> XprType;
+
+ enum {
+ IsAligned = TensorEvaluator<LeftArgType, Device>::IsAligned &
+ TensorEvaluator<RightArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<LeftArgType, Device>::PacketAccess &
+ TensorEvaluator<RightArgType, Device>::PacketAccess &
+ internal::functor_traits<BinaryOp>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<LeftArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device)
+ : m_functor(op.functor()),
+ m_leftImpl(op.lhsExpression(), device),
+ m_rightImpl(op.rhsExpression(), device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<LeftArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<RightArgType, Device>::Layout) || internal::traits<XprType>::NumDimensions <= 1), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ eigen_assert(dimensions_match(m_leftImpl.dimensions(), m_rightImpl.dimensions()));
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::traits<XprType>::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename TensorEvaluator<LeftArgType, Device>::Dimensions Dimensions;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const
+ {
+ // TODO: use right impl instead if right impl dimensions are known at compile time.
+ return m_leftImpl.dimensions();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType*) {
+ m_leftImpl.evalSubExprsIfNeeded(NULL);
+ m_rightImpl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_leftImpl.cleanup();
+ m_rightImpl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const
+ {
+ return m_functor(m_leftImpl.coeff(index), m_rightImpl.coeff(index));
+ }
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return m_functor.packetOp(m_leftImpl.template packet<LoadMode>(index), m_rightImpl.template packet<LoadMode>(index));
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return NULL; }
+
+ private:
+ const BinaryOp m_functor;
+ TensorEvaluator<LeftArgType, Device> m_leftImpl;
+ TensorEvaluator<RightArgType, Device> m_rightImpl;
+};
+
+
+// -------------------- SelectOp --------------------
+
+template<typename IfArgType, typename ThenArgType, typename ElseArgType, typename Device>
+struct TensorEvaluator<const TensorSelectOp<IfArgType, ThenArgType, ElseArgType>, Device>
+{
+ typedef TensorSelectOp<IfArgType, ThenArgType, ElseArgType> XprType;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = TensorEvaluator<ThenArgType, Device>::IsAligned &
+ TensorEvaluator<ElseArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ThenArgType, Device>::PacketAccess &
+ TensorEvaluator<ElseArgType, Device>::PacketAccess &
+ internal::packet_traits<Scalar>::HasBlend,
+ BlockAccess = false,
+ Layout = TensorEvaluator<IfArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device)
+ : m_condImpl(op.ifExpression(), device),
+ m_thenImpl(op.thenExpression(), device),
+ m_elseImpl(op.elseExpression(), device)
+ {
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<IfArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<ThenArgType, Device>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ EIGEN_STATIC_ASSERT((static_cast<int>(TensorEvaluator<IfArgType, Device>::Layout) == static_cast<int>(TensorEvaluator<ElseArgType, Device>::Layout)), YOU_MADE_A_PROGRAMMING_MISTAKE);
+ eigen_assert(dimensions_match(m_condImpl.dimensions(), m_thenImpl.dimensions()));
+ eigen_assert(dimensions_match(m_thenImpl.dimensions(), m_elseImpl.dimensions()));
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename internal::traits<XprType>::Scalar CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+ typedef typename TensorEvaluator<IfArgType, Device>::Dimensions Dimensions;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const
+ {
+ // TODO: use then or else impl instead if they happen to be known at compile time.
+ return m_condImpl.dimensions();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType*) {
+ m_condImpl.evalSubExprsIfNeeded(NULL);
+ m_thenImpl.evalSubExprsIfNeeded(NULL);
+ m_elseImpl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_condImpl.cleanup();
+ m_thenImpl.cleanup();
+ m_elseImpl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType coeff(Index index) const
+ {
+ return m_condImpl.coeff(index) ? m_thenImpl.coeff(index) : m_elseImpl.coeff(index);
+ }
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC PacketReturnType packet(Index index) const
+ {
+ const int PacketSize = internal::unpacket_traits<PacketReturnType>::size;
+ internal::Selector<PacketSize> select;
+ for (Index i = 0; i < PacketSize; ++i) {
+ select.select[i] = m_condImpl.coeff(index+i);
+ }
+ return internal::pblend(select,
+ m_thenImpl.template packet<LoadMode>(index),
+ m_elseImpl.template packet<LoadMode>(index));
+ }
+
+ EIGEN_DEVICE_FUNC CoeffReturnType* data() const { return NULL; }
+
+ private:
+ TensorEvaluator<IfArgType, Device> m_condImpl;
+ TensorEvaluator<ThenArgType, Device> m_thenImpl;
+ TensorEvaluator<ElseArgType, Device> m_elseImpl;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_EVALUATOR_H
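
// A small sketch of how these evaluators are exercised (illustrative code, assuming
// the unsupported Eigen Tensor module). Each node of the expression tree gets a
// matching TensorEvaluator, and the assignment pulls the result coefficient by
// coefficient, or packet by packet when PacketAccess is available.
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 1> x(8), y(8), z(8);
  x.setConstant(1.0f);
  y.setConstant(4.0f);
  // TensorCwiseBinaryOp over a TensorCwiseUnaryOp: handled by the evaluators above.
  z = x.sqrt() + y;
  // TensorSelectOp: handled by the SelectOp evaluator above.
  z = (x > y).select(x, y);
  return 0;
}
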
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h
new file mode 100644
index 0000000000..863c28ab43
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h
@@ -0,0 +1,461 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H
+#define EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H
+
+namespace Eigen {
+
+/** \class TensorExecutor
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The tensor executor class.
+ *
+ * This class is responsible for launching the evaluation of the expression on
+ * the specified computing device.
+ */
+namespace internal {
+
+// Default strategy: the expression is evaluated with a single cpu thread.
+template <typename Expression, typename Device,
+ bool Vectorizable, bool Tileable>
+class TensorExecutor {
+ public:
+ typedef typename Expression::Index Index;
+ EIGEN_DEVICE_FUNC static inline void run(const Expression& expr, const Device& device = Device())
+ {
+ TensorEvaluator<Expression, Device> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign)
+ {
+ const Index size = array_prod(evaluator.dimensions());
+ for (Index i = 0; i < size; ++i) {
+ evaluator.evalScalar(i);
+ }
+ }
+ evaluator.cleanup();
+ }
+};
+
+template <typename Expression>
+class TensorExecutor<Expression, DefaultDevice, true, false> {
+ public:
+ typedef typename Expression::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(const Expression& expr, const DefaultDevice& device = DefaultDevice())
+ {
+ TensorEvaluator<Expression, DefaultDevice> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign)
+ {
+ const Index size = array_prod(evaluator.dimensions());
+ const int PacketSize = unpacket_traits<typename TensorEvaluator<Expression, DefaultDevice>::PacketReturnType>::size;
+
+ // Manually unroll this loop since compilers don't do it.
+ const Index UnrolledSize = (size / (4 * PacketSize)) * 4 * PacketSize;
+ for (Index i = 0; i < UnrolledSize; i += 4*PacketSize) {
+ evaluator.evalPacket(i);
+ evaluator.evalPacket(i+PacketSize);
+ evaluator.evalPacket(i+2*PacketSize);
+ evaluator.evalPacket(i+3*PacketSize);
+ }
+ const Index VectorizedSize = (size / PacketSize) * PacketSize;
+ for (Index i = UnrolledSize; i < VectorizedSize; i += PacketSize) {
+ evaluator.evalPacket(i);
+ }
+ for (Index i = VectorizedSize; i < size; ++i) {
+ evaluator.evalScalar(i);
+ }
+ }
+ evaluator.cleanup();
+ }
+};
+
+template <typename Expression, bool Vectorizable>
+class TensorExecutor<Expression, DefaultDevice, Vectorizable, true> {
+ public:
+ typedef typename Expression::Index Index;
+ EIGEN_DEVICE_FUNC
+ static inline void run(const Expression& expr,
+ const DefaultDevice& device = DefaultDevice()) {
+ typedef TensorEvaluator<Expression, DefaultDevice> Evaluator;
+ typedef typename traits<Expression>::Scalar Scalar;
+ typedef typename traits<Expression>::Index Index;
+ const std::size_t NumDims = traits<Expression>::NumDimensions;
+
+ typedef TensorBlockMapper<Index,
+ typename internal::remove_const<Scalar>::type,
+ NumDims, Evaluator::Layout> TensorBlockMapper;
+ typedef TensorBlock<Index, typename internal::remove_const<Scalar>::type,
+ NumDims, Evaluator::Layout> TensorBlock;
+
+ Evaluator evaluator(expr, device);
+ std::size_t total_size = array_prod(evaluator.dimensions());
+ std::size_t cache_size = device.firstLevelCacheSize() / sizeof(Scalar);
+ if (total_size < cache_size) {
+ // TODO(andydavis) Reduce block management overhead for small tensors.
+ internal::TensorExecutor<Expression, DefaultDevice, Vectorizable,
+ false>::run(expr, device);
+ return;
+ }
+
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ // Size tensor blocks to fit in cache (or requested target block size).
+ size_t block_total_size = numext::mini(cache_size, total_size);
+ TensorBlockShapeType block_shape = kUniformAllDims;
+ // Query expression tree for desired block size/shape.
+ std::vector<internal::TensorOpResourceRequirements> resources;
+ evaluator.getResourceRequirements(&resources);
+ if (!resources.empty()) {
+ // TODO(andydavis) Implement different policies (i.e. revert to a
+ // default policy if block shapes/sizes conflict).
+ block_shape = resources[0].block_shape;
+ block_total_size = resources[0].block_total_size;
+ }
+
+ TensorBlockMapper block_mapper(evaluator.dimensions(),
+ block_shape,
+ block_total_size);
+
+ Scalar* data = static_cast<Scalar*>(device.allocate(
+ block_total_size * sizeof(Scalar)));
+
+ const Index total_block_count = block_mapper.total_block_count();
+ for (Index i = 0; i < total_block_count; ++i) {
+ TensorBlock block = block_mapper.GetBlockForIndex(i, data);
+ evaluator.evalBlock(&block);
+ }
+ device.deallocate(data);
+ }
+ evaluator.cleanup();
+ }
+};
+
+// Multicore strategy: the index space is partitioned and each partition is executed on a single core
+#ifdef EIGEN_USE_THREADS
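+// Evaluates the flat index range [first, last) on a copy of the evaluator; the
+// specialization below (Vectorizable == true) processes whole packets first and
+// falls back to scalar evaluation for the tail.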
+template <typename Evaluator, typename Index, bool Vectorizable>
+struct EvalRange {
+ static void run(Evaluator evaluator, const Index first, const Index last) {
+ eigen_assert(last > first);
+ for (Index i = first; i < last; ++i) {
+ evaluator.evalScalar(i);
+ }
+ }
+};
+
+template <typename Evaluator, typename Index>
+struct EvalRange<Evaluator, Index, true> {
+ static void run(Evaluator evaluator, const Index first, const Index last) {
+ eigen_assert(last > first);
+
+ Index i = first;
+ static const int PacketSize = unpacket_traits<typename Evaluator::PacketReturnType>::size;
+ if (last - first >= PacketSize) {
+ eigen_assert(first % PacketSize == 0);
+ Index lastPacket = last - (last % PacketSize);
+ for (; i < lastPacket; i += PacketSize) {
+ evaluator.evalPacket(i);
+ }
+ }
+
+ for (; i < last; ++i) {
+ evaluator.evalScalar(i);
+ }
+ }
+};
+
+template <typename Expression, bool Vectorizable, bool Tileable>
+class TensorExecutor<Expression, ThreadPoolDevice, Vectorizable, Tileable> {
+ public:
+ typedef typename Expression::Index Index;
+ static inline void run(const Expression& expr, const ThreadPoolDevice& device)
+ {
+ if (device.numThreads() <= 1) {
+ DefaultDevice dd;
+ TensorExecutor<Expression, DefaultDevice, Vectorizable, Tileable>::run(expr, dd);
+ return;
+ }
+
+ typedef TensorEvaluator<Expression, ThreadPoolDevice> Evaluator;
+ Evaluator evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign)
+ {
+ const Index size = array_prod(evaluator.dimensions());
+
+ static const Index PacketSize = Vectorizable ? unpacket_traits<typename Evaluator::PacketReturnType>::size : 1;
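+ // Carve the flat index range into roughly one chunk per thread, each chunk a multiple of the packet size.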
+ Index blocksz = std::ceil<Index>(static_cast<float>(size)/device.numThreads()) + PacketSize - 1;
+ const Index blocksize = numext::maxi<Index>(PacketSize, (blocksz - (blocksz % PacketSize)));
+ const Index numblocks = size / blocksize;
+
+ Index i = 0;
+ FixedSizeVector<Notification*> results(numblocks);
+ for (int i = 0; i < numblocks; ++i) {
+ results.push_back(device.enqueue(&EvalRange<Evaluator, Index, Vectorizable>::run, evaluator, i*blocksize, (i+1)*blocksize));
+ }
+
+ if (numblocks * blocksize < size) {
+ EvalRange<Evaluator, Index, Vectorizable>::run(evaluator, numblocks * blocksize, size);
+ }
+
+ for (int i = 0; i < numblocks; ++i) {
+ wait_until_ready(results[i]);
+ delete results[i];
+ }
+ }
+ evaluator.cleanup();
+ }
+};
+
+template <typename Index, typename Scalar>
+struct BlockRange {
+ BlockRange(Index s, Index l, Scalar* d)
+ : index_start(s), index_limit(l), data(d) {}
+ const Index index_start;
+ const Index index_limit;
+ Scalar* data;
+};
+
+template <typename Evaluator, typename Index, typename Scalar,
+ std::size_t NumDims>
+struct EvalBlockRange {
+ typedef TensorBlockMapper<Index, Scalar, NumDims, Evaluator::Layout>
+ BlockMapper;
+
+ static void run(Evaluator evaluator, const BlockMapper& block_mapper,
+ BlockRange<Index, Scalar> block_range) {
+ typedef TensorBlock<Index, Scalar, NumDims, Evaluator::Layout>
+ TensorBlock;
+ eigen_assert(block_range.index_limit > block_range.index_start);
+
+ for (Index i = block_range.index_start; i < block_range.index_limit; ++i) {
+ TensorBlock block = block_mapper.GetBlockForIndex(i, block_range.data);
+ evaluator.evalBlock(&block);
+ }
+ }
+};
+
+template <typename Expression, bool Vectorizable>
+class TensorExecutor<Expression, ThreadPoolDevice, Vectorizable, true> {
+ public:
+ typedef typename Expression::Index Index;
+ static inline void run(const Expression& expr,
+ const ThreadPoolDevice& device) {
+ typedef TensorEvaluator<Expression, ThreadPoolDevice> Evaluator;
+ typedef typename internal::remove_const<
+ typename traits<Expression>::Scalar>::type Scalar;
+ typedef typename traits<Expression>::Index Index;
+ static const std::size_t NumDims = traits<Expression>::NumDimensions;
+ typedef TensorBlockMapper<Index, Scalar, NumDims, Evaluator::Layout>
+ TensorBlockMapper;
+ typedef TensorBlock<Index, Scalar, NumDims, Evaluator::Layout>
+ TensorBlock;
+ typedef BlockRange<Index, Scalar> BlockRange;
+
+ Evaluator evaluator(expr, device);
+ std::size_t total_size = array_prod(evaluator.dimensions());
+ std::size_t cache_size = device.firstLevelCacheSize() / sizeof(Scalar);
+ if (total_size < cache_size || device.numThreads() <= 1) {
+ // TODO(andydavis) Reduce block management overhead for small tensors.
+ DefaultDevice dd;
+ internal::TensorExecutor<Expression, DefaultDevice, Vectorizable, false>::run(expr, dd);
+ return;
+ }
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ TensorBlockShapeType block_shape = kUniformAllDims;
+ size_t block_total_size = 0;
+ // Query expression tree for desired block size/shape.
+ std::vector<internal::TensorOpResourceRequirements> resources;
+ evaluator.getResourceRequirements(&resources);
+ if (!resources.empty()) {
+ // TODO(andydavis) Implement different shape/size policies.
+ block_shape = resources[0].block_shape;
+ block_total_size = resources[0].block_total_size;
+ }
+
+ // Divide the tensor coefficients across the number of threads, subject
+ // to min/max block size constraints.
+ const size_t min_block_size =
+ device.firstLevelCacheSize() / sizeof(Scalar);
+ const size_t max_block_size = block_total_size > 0 ? block_total_size :
+ device.lastLevelCacheSize() / sizeof(Scalar);
+ const size_t target_block_size = numext::maxi(
+ min_block_size,
+ numext::mini(static_cast<size_t>(array_prod(evaluator.dimensions())) / device.numThreads(),
+ max_block_size));
+
+ TensorBlockMapper block_mapper(evaluator.dimensions(),
+ block_shape,
+ target_block_size);
+
+ const Index block_partition_size =
+ (block_mapper.total_block_count() + device.numThreads() - 1) /
+ device.numThreads();
+ const Index block_partition_count =
+ (block_mapper.total_block_count() + block_partition_size - 1) /
+ block_partition_size;
+
+ if (block_partition_count == 1) {
+ // Avoid thread hop if no parallelism is possible.
+ Scalar* data = static_cast<Scalar*>(
+ device.allocate(target_block_size * sizeof(Scalar)));
+ EvalBlockRange<Evaluator, Index, Scalar, NumDims>::run(
+ evaluator, block_mapper,
+ BlockRange(0, block_mapper.total_block_count(), data));
+ device.deallocate(data);
+ } else {
+ // Multi-threaded case.
+ struct ThreadState {
+ Notification* done;
+ Scalar* data;
+ };
+ FixedSizeVector<ThreadState> thread_state(block_partition_count,
+ ThreadState());
+
+ // Dispatch threads.
+ for (int i = 0; i < block_partition_count; ++i) {
+ thread_state[i].data = static_cast<Scalar*>(
+ device.allocate(target_block_size * sizeof(Scalar)));
+ thread_state[i].done = device.enqueue(
+ &EvalBlockRange<Evaluator, Index, Scalar, NumDims>::run,
+ evaluator, block_mapper,
+ BlockRange(i * block_partition_size,
+ numext::mini((i + 1) * block_partition_size,
+ block_mapper.total_block_count()),
+ thread_state[i].data));
+ }
+
+ // Join threads.
+ for (int i = 0; i < block_partition_count; ++i) {
+ wait_until_ready(thread_state[i].done);
+ delete thread_state[i].done;
+ device.deallocate(thread_state[i].data);
+ }
+ }
+ }
+ evaluator.cleanup();
+ }
+};
+
+#endif
+
+
+// GPU: the evaluation of the expression is offloaded to a GPU.
+#if defined(EIGEN_USE_GPU)
+
+template <typename Expression, bool Tileable>
+class TensorExecutor<Expression, GpuDevice, false, Tileable> {
+ public:
+ typedef typename Expression::Index Index;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+template <typename Expression, bool Tileable>
+class TensorExecutor<Expression, GpuDevice, true, Tileable> {
+ public:
+ typedef typename Expression::Index Index;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+#if defined(__CUDACC__)
+template <typename Evaluator, typename Index>
+__global__ void
+__launch_bounds__(1024)
+ EigenMetaKernel_NonVectorizable(Evaluator memcopied_eval, Index size) {
+
+ const Index first_index = blockIdx.x * blockDim.x + threadIdx.x;
+ const Index step_size = blockDim.x * gridDim.x;
+
+ // Cuda memcopies the kernel arguments. That's fine for POD, but for more
+ // complex types such as evaluators we should really conform to the C++
+ // standard and call a proper copy constructor.
+ Evaluator eval(memcopied_eval);
+
+ // Use the scalar path
+ for (Index i = first_index; i < size; i += step_size) {
+ eval.evalScalar(i);
+ }
+}
+
+template <typename Evaluator, typename Index>
+__global__ void
+__launch_bounds__(1024)
+ EigenMetaKernel_Vectorizable(Evaluator memcopied_eval, Index size) {
+
+ const Index first_index = blockIdx.x * blockDim.x + threadIdx.x;
+ const Index step_size = blockDim.x * gridDim.x;
+
+ // Cuda memcopies the kernel arguments. That's fine for POD, but for more
+ // complex types such as evaluators we should really conform to the C++
+ // standard and call a proper copy constructor.
+ Evaluator eval(memcopied_eval);
+
+ // Use the vector path
+ const Index PacketSize = unpacket_traits<typename Evaluator::PacketReturnType>::size;
+ const Index vectorized_step_size = step_size * PacketSize;
+ const Index vectorized_size = (size / PacketSize) * PacketSize;
+ for (Index i = first_index * PacketSize; i < vectorized_size;
+ i += vectorized_step_size) {
+ eval.evalPacket(i);
+ }
+ for (Index i = vectorized_size + first_index; i < size; i += step_size) {
+ eval.evalScalar(i);
+ }
+}
+
+/*static*/
+template <typename Expression, bool Tileable>
+inline void TensorExecutor<Expression, GpuDevice, false, Tileable>::run(
+ const Expression& expr, const GpuDevice& device) {
+ TensorEvaluator<Expression, GpuDevice> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ const int num_blocks = device.getNumCudaMultiProcessors() *
+ device.maxCudaThreadsPerMultiProcessor() /
+ device.maxCudaThreadsPerBlock();
+ const int block_size = device.maxCudaThreadsPerBlock();
+ const Index size = array_prod(evaluator.dimensions());
+ LAUNCH_CUDA_KERNEL(
+ (EigenMetaKernel_NonVectorizable<TensorEvaluator<Expression, GpuDevice>,
+ Index>),
+ num_blocks, block_size, 0, device, evaluator, size);
+ }
+ evaluator.cleanup();
+}
+
+/*static*/
+template <typename Expression, bool Tileable>
+inline void TensorExecutor<Expression, GpuDevice, true, Tileable>::run(
+ const Expression& expr, const GpuDevice& device) {
+ TensorEvaluator<Expression, GpuDevice> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ const int num_blocks = device.getNumCudaMultiProcessors() *
+ device.maxCudaThreadsPerMultiProcessor() /
+ device.maxCudaThreadsPerBlock();
+ const int block_size = device.maxCudaThreadsPerBlock();
+ const Index size = array_prod(evaluator.dimensions());
+ LAUNCH_CUDA_KERNEL(
+ (EigenMetaKernel_Vectorizable<TensorEvaluator<Expression, GpuDevice>,
+ Index>),
+ num_blocks, block_size, 0, device, evaluator, size);
+ }
+ evaluator.cleanup();
+}
+
+#endif // __CUDACC__
+#endif // EIGEN_USE_GPU
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_EXECUTOR_H
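
// A sketch of how assignments dispatch through TensorExecutor (illustrative code,
// assuming the unsupported Eigen Tensor module). A plain assignment uses the
// DefaultDevice strategies defined above; targeting a ThreadPoolDevice (with
// EIGEN_USE_THREADS) or a GpuDevice (with EIGEN_USE_GPU) selects the matching
// specialization instead.
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(256, 256), b(256, 256), c(256, 256);
  a.setRandom();
  b.setRandom();
  // Single-threaded scalar/vectorized/tiled execution on the default device.
  c = a + b;
  // Explicit device targeting; here still the default (single-threaded) device.
  Eigen::DefaultDevice dd;
  c.device(dd) = a * b + a;
  return 0;
}
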
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h
new file mode 100644
index 0000000000..49d849e233
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExpr.h
@@ -0,0 +1,291 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_EXPR_H
+#define EIGEN_CXX11_TENSOR_TENSOR_EXPR_H
+
+namespace Eigen {
+
+/** \class TensorExpr
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor expression classes.
+ *
+ * The TensorCwiseNullaryOp class applies a nullary operator to an expression.
+ * This is typically used to generate constants.
+ *
+ * The TensorCwiseUnaryOp class represents an expression where a unary operator
+ * (e.g. cwiseSqrt) is applied to an expression.
+ *
+ * The TensorCwiseBinaryOp class represents an expression where a binary
+ * operator (e.g. addition) is applied to a lhs and a rhs expression.
+ *
+ */
+namespace internal {
+template<typename NullaryOp, typename XprType>
+struct traits<TensorCwiseNullaryOp<NullaryOp, XprType> >
+ : traits<XprType>
+{
+ typedef traits<XprType> XprTraits;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::Nested XprTypeNested;
+ typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+} // end namespace internal
+
+
+
+template<typename NullaryOp, typename XprType>
+class TensorCwiseNullaryOp : public TensorBase<TensorCwiseNullaryOp<NullaryOp, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorCwiseNullaryOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef TensorCwiseNullaryOp<NullaryOp, XprType> Nested;
+ typedef typename Eigen::internal::traits<TensorCwiseNullaryOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorCwiseNullaryOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseNullaryOp(const XprType& xpr, const NullaryOp& func = NullaryOp())
+ : m_xpr(xpr), m_functor(func) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ nestedExpression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const NullaryOp& functor() const { return m_functor; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const NullaryOp m_functor;
+};
+
+
+
+namespace internal {
+template<typename UnaryOp, typename XprType>
+struct traits<TensorCwiseUnaryOp<UnaryOp, XprType> >
+ : traits<XprType>
+{
+ // TODO(phli): Add InputScalar, InputPacket. Check references to
+ // current Scalar/Packet to see if the intent is Input or Output.
+ typedef typename result_of<UnaryOp(typename XprType::Scalar)>::type Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename XprType::Nested XprTypeNested;
+ typedef typename remove_reference<XprTypeNested>::type _XprTypeNested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename UnaryOp, typename XprType>
+struct eval<TensorCwiseUnaryOp<UnaryOp, XprType>, Eigen::Dense>
+{
+ typedef const TensorCwiseUnaryOp<UnaryOp, XprType>& type;
+};
+
+template<typename UnaryOp, typename XprType>
+struct nested<TensorCwiseUnaryOp<UnaryOp, XprType>, 1, typename eval<TensorCwiseUnaryOp<UnaryOp, XprType> >::type>
+{
+ typedef TensorCwiseUnaryOp<UnaryOp, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename UnaryOp, typename XprType>
+class TensorCwiseUnaryOp : public TensorBase<TensorCwiseUnaryOp<UnaryOp, XprType>, ReadOnlyAccessors>
+{
+ public:
+ // TODO(phli): Add InputScalar, InputPacket. Check references to
+ // current Scalar/Packet to see if the intent is Input or Output.
+ typedef typename Eigen::internal::traits<TensorCwiseUnaryOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef Scalar CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorCwiseUnaryOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorCwiseUnaryOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorCwiseUnaryOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseUnaryOp(const XprType& xpr, const UnaryOp& func = UnaryOp())
+ : m_xpr(xpr), m_functor(func) {}
+
+ EIGEN_DEVICE_FUNC
+ const UnaryOp& functor() const { return m_functor; }
+
+ /** \returns the nested expression */
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ nestedExpression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const UnaryOp m_functor;
+};
+
+
+namespace internal {
+template<typename BinaryOp, typename LhsXprType, typename RhsXprType>
+struct traits<TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType> >
+{
+ // Type promotion to handle the case where the types of the lhs and the rhs
+ // are different.
+ // TODO(phli): Add Lhs/RhsScalar, Lhs/RhsPacket. Check references to
+ // current Scalar/Packet to see if the intent is Inputs or Output.
+ typedef typename result_of<
+ BinaryOp(typename LhsXprType::Scalar,
+ typename RhsXprType::Scalar)>::type Scalar;
+ typedef traits<LhsXprType> XprTraits;
+ typedef typename promote_storage_type<
+ typename traits<LhsXprType>::StorageKind,
+ typename traits<RhsXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<
+ typename traits<LhsXprType>::Index,
+ typename traits<RhsXprType>::Index>::type Index;
+ typedef typename LhsXprType::Nested LhsNested;
+ typedef typename RhsXprType::Nested RhsNested;
+ typedef typename remove_reference<LhsNested>::type _LhsNested;
+ typedef typename remove_reference<RhsNested>::type _RhsNested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename BinaryOp, typename LhsXprType, typename RhsXprType>
+struct eval<TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType>, Eigen::Dense>
+{
+ typedef const TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType>& type;
+};
+
+template<typename BinaryOp, typename LhsXprType, typename RhsXprType>
+struct nested<TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType>, 1, typename eval<TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType> >::type>
+{
+ typedef TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename BinaryOp, typename LhsXprType, typename RhsXprType>
+class TensorCwiseBinaryOp : public TensorBase<TensorCwiseBinaryOp<BinaryOp, LhsXprType, RhsXprType>, ReadOnlyAccessors>
+{
+ public:
+ // TODO(phli): Add Lhs/RhsScalar, Lhs/RhsPacket. Check references to
+ // current Scalar/Packet to see if the intent is Inputs or Output.
+ typedef typename Eigen::internal::traits<TensorCwiseBinaryOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef Scalar CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorCwiseBinaryOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorCwiseBinaryOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorCwiseBinaryOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorCwiseBinaryOp(const LhsXprType& lhs, const RhsXprType& rhs, const BinaryOp& func = BinaryOp())
+ : m_lhs_xpr(lhs), m_rhs_xpr(rhs), m_functor(func) {}
+
+ EIGEN_DEVICE_FUNC
+ const BinaryOp& functor() const { return m_functor; }
+
+ /** \returns the nested expressions */
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename LhsXprType::Nested>::type&
+ lhsExpression() const { return m_lhs_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename RhsXprType::Nested>::type&
+ rhsExpression() const { return m_rhs_xpr; }
+
+ protected:
+ typename LhsXprType::Nested m_lhs_xpr;
+ typename RhsXprType::Nested m_rhs_xpr;
+ const BinaryOp m_functor;
+};
+
+
+namespace internal {
+template<typename IfXprType, typename ThenXprType, typename ElseXprType>
+struct traits<TensorSelectOp<IfXprType, ThenXprType, ElseXprType> >
+ : traits<ThenXprType>
+{
+ typedef typename traits<ThenXprType>::Scalar Scalar;
+ typedef traits<ThenXprType> XprTraits;
+ typedef typename promote_storage_type<typename traits<ThenXprType>::StorageKind,
+ typename traits<ElseXprType>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename traits<ElseXprType>::Index,
+ typename traits<ThenXprType>::Index>::type Index;
+ typedef typename IfXprType::Nested IfNested;
+ typedef typename ThenXprType::Nested ThenNested;
+ typedef typename ElseXprType::Nested ElseNested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename IfXprType, typename ThenXprType, typename ElseXprType>
+struct eval<TensorSelectOp<IfXprType, ThenXprType, ElseXprType>, Eigen::Dense>
+{
+ typedef const TensorSelectOp<IfXprType, ThenXprType, ElseXprType>& type;
+};
+
+template<typename IfXprType, typename ThenXprType, typename ElseXprType>
+struct nested<TensorSelectOp<IfXprType, ThenXprType, ElseXprType>, 1, typename eval<TensorSelectOp<IfXprType, ThenXprType, ElseXprType> >::type>
+{
+ typedef TensorSelectOp<IfXprType, ThenXprType, ElseXprType> type;
+};
+
+} // end namespace internal
+
+
+template<typename IfXprType, typename ThenXprType, typename ElseXprType>
+class TensorSelectOp : public TensorBase<TensorSelectOp<IfXprType, ThenXprType, ElseXprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorSelectOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::promote_storage_type<typename ThenXprType::CoeffReturnType,
+ typename ElseXprType::CoeffReturnType>::ret CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorSelectOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorSelectOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorSelectOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC
+ TensorSelectOp(const IfXprType& a_condition,
+ const ThenXprType& a_then,
+ const ElseXprType& a_else)
+ : m_condition(a_condition), m_then(a_then), m_else(a_else)
+ { }
+
+ EIGEN_DEVICE_FUNC
+ const IfXprType& ifExpression() const { return m_condition; }
+
+ EIGEN_DEVICE_FUNC
+ const ThenXprType& thenExpression() const { return m_then; }
+
+ EIGEN_DEVICE_FUNC
+ const ElseXprType& elseExpression() const { return m_else; }
+
+ protected:
+ typename IfXprType::Nested m_condition;
+ typename ThenXprType::Nested m_then;
+ typename ElseXprType::Nested m_else;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_EXPR_H
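
// A sketch of the lazy expression trees these classes describe (illustrative code,
// assuming the unsupported Eigen Tensor module). Building the expression performs
// no computation; work happens only when the expression is assigned to a tensor.
#include <unsupported/Eigen/CXX11/Tensor>

int main() {
  Eigen::Tensor<float, 2> a(4, 4), b(4, 4);
  a.setRandom();
  b.setRandom();
  // 'expr' is a TensorCwiseBinaryOp wrapping two TensorCwiseUnaryOp nodes
  // (abs and scalar multiply); it merely records the operations to perform.
  auto expr = a.abs() + b * 2.0f;
  Eigen::Tensor<float, 2> c = expr;  // evaluation happens here
  return 0;
}
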
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h
new file mode 100644
index 0000000000..ac73366762
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFFT.h
@@ -0,0 +1,846 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Jianwei Cui <thucjw@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_FFT_H
+#define EIGEN_CXX11_TENSOR_TENSOR_FFT_H
+namespace Eigen {
+
+/** \class TensorFFT
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor FFT class.
+ *
+ * TODO:
+ * Vectorize the Cooley-Tukey and Bluestein algorithms
+ * Add support for multithreaded evaluation
+ * Improve the performance on GPU
+ */
+
+template <bool NeedUprade> struct MakeComplex {
+ template <typename T>
+ #if defined(EIGEN_USE_GPU) && defined(__CUDACC__) && !defined(__GCUDACC__)
+ EIGEN_DEVICE_FUNC
+ #endif
+ T operator() (const T& val) const { return val; }
+};
+
+template <> struct MakeComplex<true> {
+ template <typename T>
+ #if defined(EIGEN_USE_GPU) && defined(__CUDACC__) && !defined(__GCUDACC__)
+ EIGEN_DEVICE_FUNC
+ #endif
+ std::complex<T> operator() (const T& val) const { return std::complex<T>(val, 0); }
+};
+
+template <> struct MakeComplex<false> {
+ template <typename T>
+ #if defined(EIGEN_USE_GPU) && defined(__CUDACC__) && !defined(__GCUDACC__)
+ EIGEN_DEVICE_FUNC
+ #endif
+ std::complex<T> operator() (const std::complex<T>& val) const { return val; }
+};
+
+template <int ResultType> struct PartOf {
+ template <typename T> T operator() (const T& val) const { return val; }
+};
+
+template <> struct PartOf<RealPart> {
+ template <typename T> T operator() (const std::complex<T>& val) const { return val.real(); }
+};
+
+template <> struct PartOf<ImagPart> {
+ template <typename T> T operator() (const std::complex<T>& val) const { return val.imag(); }
+};
+
+namespace internal {
+template <typename FFT, typename XprType, int FFTResultType, int FFTDir>
+struct traits<TensorFFTOp<FFT, XprType, FFTResultType, FFTDir> > : public traits<XprType> {
+ typedef traits<XprType> XprTraits;
+ typedef typename NumTraits<typename XprTraits::Scalar>::Real RealScalar;
+ typedef typename std::complex<RealScalar> ComplexScalar;
+ typedef typename XprTraits::Scalar InputScalar;
+ typedef typename conditional<FFTResultType == RealPart || FFTResultType == ImagPart, RealScalar, ComplexScalar>::type OutputScalar;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template <typename FFT, typename XprType, int FFTResultType, int FFTDirection>
+struct eval<TensorFFTOp<FFT, XprType, FFTResultType, FFTDirection>, Eigen::Dense> {
+ typedef const TensorFFTOp<FFT, XprType, FFTResultType, FFTDirection>& type;
+};
+
+template <typename FFT, typename XprType, int FFTResultType, int FFTDirection>
+struct nested<TensorFFTOp<FFT, XprType, FFTResultType, FFTDirection>, 1, typename eval<TensorFFTOp<FFT, XprType, FFTResultType, FFTDirection> >::type> {
+ typedef TensorFFTOp<FFT, XprType, FFTResultType, FFTDirection> type;
+};
+
+} // end namespace internal
+
+template <typename FFT, typename XprType, int FFTResultType, int FFTDir>
+class TensorFFTOp : public TensorBase<TensorFFTOp<FFT, XprType, FFTResultType, FFTDir>, ReadOnlyAccessors> {
+ public:
+ typedef typename Eigen::internal::traits<TensorFFTOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename std::complex<RealScalar> ComplexScalar;
+ typedef typename internal::conditional<FFTResultType == RealPart || FFTResultType == ImagPart, RealScalar, ComplexScalar>::type OutputScalar;
+ typedef OutputScalar CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorFFTOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorFFTOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorFFTOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFFTOp(const XprType& expr, const FFT& fft)
+ : m_xpr(expr), m_fft(fft) {}
+
+ EIGEN_DEVICE_FUNC
+ const FFT& fft() const { return m_fft; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type& expression() const {
+ return m_xpr;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const FFT m_fft;
+};
+
+// Eval as rvalue
+template <typename FFT, typename ArgType, typename Device, int FFTResultType, int FFTDir>
+struct TensorEvaluator<const TensorFFTOp<FFT, ArgType, FFTResultType, FFTDir>, Device> {
+ typedef TensorFFTOp<FFT, ArgType, FFTResultType, FFTDir> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename std::complex<RealScalar> ComplexScalar;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions InputDimensions;
+ typedef internal::traits<XprType> XprTraits;
+ typedef typename XprTraits::Scalar InputScalar;
+ typedef typename internal::conditional<FFTResultType == RealPart || FFTResultType == ImagPart, RealScalar, ComplexScalar>::type OutputScalar;
+ typedef OutputScalar CoeffReturnType;
+ typedef typename PacketType<OutputScalar, Device>::type PacketReturnType;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = true,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device) : m_data(NULL), m_impl(op.expression(), device), m_fft(op.fft()), m_device(device) {
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ for (int i = 0; i < NumDims; ++i) {
+ eigen_assert(input_dims[i] > 0);
+ m_dimensions[i] = input_dims[i];
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_strides[i] = m_strides[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ m_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_strides[i] = m_strides[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ m_size = m_dimensions.TotalSize();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {
+ return m_dimensions;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(OutputScalar* data) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ if (data) {
+ evalToBuf(data);
+ return false;
+ } else {
+ m_data = (CoeffReturnType*)m_device.allocate(sizeof(CoeffReturnType) * m_size);
+ evalToBuf(m_data);
+ return true;
+ }
+ }
+
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ if (m_data) {
+ m_device.deallocate(m_data);
+ m_data = NULL;
+ }
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffReturnType coeff(Index index) const {
+ return m_data[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketReturnType packet(Index index) const {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_data + index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_data; }
+
+
+ private:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void evalToBuf(OutputScalar* data) {
+ const bool write_to_out = internal::is_same<OutputScalar, ComplexScalar>::value;
+ ComplexScalar* buf = write_to_out ? (ComplexScalar*)data : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * m_size);
+
+ for (int i = 0; i < m_size; ++i) {
+ buf[i] = MakeComplex<internal::is_same<InputScalar, RealScalar>::value>()(m_impl.coeff(i));
+ }
+
+ for (int i = 0; i < m_fft.size(); ++i) {
+ int dim = m_fft[i];
+ eigen_assert(dim >= 0 && dim < NumDims);
+ Index line_len = m_dimensions[dim];
+ eigen_assert(line_len >= 1);
+ ComplexScalar* line_buf = (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * line_len);
+ const bool is_power_of_two = isPowerOfTwo(line_len);
+ const int good_composite = is_power_of_two ? 0 : findGoodComposite(line_len);
+ const int log_len = is_power_of_two ? getLog2(line_len) : getLog2(good_composite);
+
+ ComplexScalar* a = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * good_composite);
+ ComplexScalar* b = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * good_composite);
+ ComplexScalar* pos_j_base_powered = is_power_of_two ? NULL : (ComplexScalar*)m_device.allocate(sizeof(ComplexScalar) * (line_len + 1));
+ if (!is_power_of_two) {
+ ComplexScalar pos_j_base = ComplexScalar(std::cos(M_PI/line_len), std::sin(M_PI/line_len));
+ for (int i = 0; i < line_len + 1; ++i) {
+ pos_j_base_powered[i] = std::pow(pos_j_base, i * i);
+ }
+ }
+
+ for (Index partial_index = 0; partial_index < m_size / line_len; ++partial_index) {
+ Index base_offset = getBaseOffsetFromIndex(partial_index, dim);
+
+ // get data into line_buf
+ for (int j = 0; j < line_len; ++j) {
+ Index offset = getIndexFromOffset(base_offset, dim, j);
+ line_buf[j] = buf[offset];
+ }
+
+ // process the line
+ if (is_power_of_two) {
+ processDataLineCooleyTukey(line_buf, line_len, log_len);
+ }
+ else {
+ processDataLineBluestein(line_buf, line_len, good_composite, log_len, a, b, pos_j_base_powered);
+ }
+
+ // write back
+ for (int j = 0; j < line_len; ++j) {
+ const ComplexScalar div_factor = (FFTDir == FFT_FORWARD) ? ComplexScalar(1, 0) : ComplexScalar(line_len, 0);
+ Index offset = getIndexFromOffset(base_offset, dim, j);
+ buf[offset] = line_buf[j] / div_factor;
+ }
+ }
+ m_device.deallocate(line_buf);
+ // Scratch buffers a, b and pos_j_base_powered are only allocated on the Bluestein path,
+ // so free them under the same condition.
+ if (!is_power_of_two) {
+ m_device.deallocate(a);
+ m_device.deallocate(b);
+ m_device.deallocate(pos_j_base_powered);
+ }
+ }
+
+ if(!write_to_out) {
+ for (int i = 0; i < m_size; ++i) {
+ data[i] = PartOf<FFTResultType>()(buf[i]);
+ }
+ m_device.deallocate(buf);
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static bool isPowerOfTwo(int x) {
+ eigen_assert(x > 0);
+ return !(x & (x - 1));
+ }
+
+ // The "good composite" used as the padding length in Bluestein's FFT algorithm: the smallest power of two that is at least 2 * n - 1.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static int findGoodComposite(int n) {
+ int i = 2;
+ while (i < 2 * n - 1) i *= 2;
+ return i;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static int getLog2(int m) {
+ int log2m = 0;
+ while (m >>= 1) log2m++;
+ return log2m;
+ }
+
+ // Call the Cooley-Tukey algorithm directly; the data length must be a power of 2.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void processDataLineCooleyTukey(ComplexScalar* line_buf, int line_len, int log_len) {
+ eigen_assert(isPowerOfTwo(line_len));
+ scramble_FFT(line_buf, line_len);
+ compute_1D_Butterfly<FFTDir>(line_buf, line_len, log_len);
+ }
+
+ // Call Bluestein's FFT algorithm; m is a good composite number greater than (2 * n - 1), used as the padding length.
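+ // The chirp-z identity used here: with c[j] = exp(i*pi*j*j/n) (pos_j_base_powered below),
+ //   X[k] = conj(c[k]) * sum_j (x[j] * conj(c[j])) * c[k-j]   (forward direction),
+ // so the length-n DFT becomes a length-m circular convolution, evaluated with the three
+ // power-of-two butterfly passes below (two forward, one inverse).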
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void processDataLineBluestein(ComplexScalar* line_buf, int line_len, int good_composite, int log_len, ComplexScalar* a, ComplexScalar* b, const ComplexScalar* pos_j_base_powered) {
+ int n = line_len;
+ int m = good_composite;
+ ComplexScalar* data = line_buf;
+
+ for (int i = 0; i < n; ++i) {
+ if(FFTDir == FFT_FORWARD) {
+ a[i] = data[i] * std::conj(pos_j_base_powered[i]);
+ }
+ else {
+ a[i] = data[i] * pos_j_base_powered[i];
+ }
+ }
+ for (int i = n; i < m; ++i) {
+ a[i] = ComplexScalar(0, 0);
+ }
+
+ for (int i = 0; i < n; ++i) {
+ if(FFTDir == FFT_FORWARD) {
+ b[i] = pos_j_base_powered[i];
+ }
+ else {
+ b[i] = std::conj(pos_j_base_powered[i]);
+ }
+ }
+ for (int i = n; i < m - n; ++i) {
+ b[i] = ComplexScalar(0, 0);
+ }
+ for (int i = m - n; i < m; ++i) {
+ if(FFTDir == FFT_FORWARD) {
+ b[i] = pos_j_base_powered[m-i];
+ }
+ else {
+ b[i] = std::conj(pos_j_base_powered[m-i]);
+ }
+ }
+
+ scramble_FFT(a, m);
+ compute_1D_Butterfly<FFT_FORWARD>(a, m, log_len);
+
+ scramble_FFT(b, m);
+ compute_1D_Butterfly<FFT_FORWARD>(b, m, log_len);
+
+ for (int i = 0; i < m; ++i) {
+ a[i] *= b[i];
+ }
+
+ scramble_FFT(a, m);
+ compute_1D_Butterfly<FFT_REVERSE>(a, m, log_len);
+
+ // Do the scaling after the inverse FFT.
+ for (int i = 0; i < m; ++i) {
+ a[i] /= m;
+ }
+
+ for (int i = 0; i < n; ++i) {
+ if(FFTDir == FFT_FORWARD) {
+ data[i] = a[i] * std::conj(pos_j_base_powered[i]);
+ }
+ else {
+ data[i] = a[i] * pos_j_base_powered[i];
+ }
+ }
+ }
+
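+ // Permutes the array into bit-reversed index order, the standard preparation step for
+ // the in-place radix-2 butterflies in compute_1D_Butterfly.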
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE static void scramble_FFT(ComplexScalar* data, int n) {
+ eigen_assert(isPowerOfTwo(n));
+ int j = 1;
+ for (int i = 1; i < n; ++i){
+ if (j > i) {
+ std::swap(data[j-1], data[i-1]);
+ }
+ int m = n >> 1;
+ while (m >= 2 && j > m) {
+ j -= m;
+ m >>= 1;
+ }
+ j += m;
+ }
+ }
+
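+ // Recursive radix-2 decimation-in-time butterflies (with unrolled base cases for n <= 8)
+ // applied to data that is already in bit-reversed order; n must equal 2^n_power_of_2.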
+ template<int Dir>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void compute_1D_Butterfly(ComplexScalar* data, int n, int n_power_of_2) {
+ eigen_assert(isPowerOfTwo(n));
+ if (n == 1) {
+ return;
+ }
+ else if (n == 2) {
+ ComplexScalar tmp = data[1];
+ data[1] = data[0] - data[1];
+ data[0] += tmp;
+ return;
+ }
+ else if (n == 4) {
+ ComplexScalar tmp[4];
+ tmp[0] = data[0] + data[1];
+ tmp[1] = data[0] - data[1];
+ tmp[2] = data[2] + data[3];
+ if(Dir == FFT_FORWARD) {
+ tmp[3] = ComplexScalar(0.0, -1.0) * (data[2] - data[3]);
+ }
+ else {
+ tmp[3] = ComplexScalar(0.0, 1.0) * (data[2] - data[3]);
+ }
+ data[0] = tmp[0] + tmp[2];
+ data[1] = tmp[1] + tmp[3];
+ data[2] = tmp[0] - tmp[2];
+ data[3] = tmp[1] - tmp[3];
+ return;
+ }
+ else if (n == 8) {
+ ComplexScalar tmp_1[8];
+ ComplexScalar tmp_2[8];
+
+ tmp_1[0] = data[0] + data[1];
+ tmp_1[1] = data[0] - data[1];
+ tmp_1[2] = data[2] + data[3];
+ if (Dir == FFT_FORWARD) {
+ tmp_1[3] = (data[2] - data[3]) * ComplexScalar(0, -1);
+ }
+ else {
+ tmp_1[3] = (data[2] - data[3]) * ComplexScalar(0, 1);
+ }
+ tmp_1[4] = data[4] + data[5];
+ tmp_1[5] = data[4] - data[5];
+ tmp_1[6] = data[6] + data[7];
+ if (Dir == FFT_FORWARD) {
+ tmp_1[7] = (data[6] - data[7]) * ComplexScalar(0, -1);
+ }
+ else {
+ tmp_1[7] = (data[6] - data[7]) * ComplexScalar(0, 1);
+ }
+ tmp_2[0] = tmp_1[0] + tmp_1[2];
+ tmp_2[1] = tmp_1[1] + tmp_1[3];
+ tmp_2[2] = tmp_1[0] - tmp_1[2];
+ tmp_2[3] = tmp_1[1] - tmp_1[3];
+ tmp_2[4] = tmp_1[4] + tmp_1[6];
+ // SQRT2DIV2 = sqrt(2)/2
+ #define SQRT2DIV2 0.7071067811865476
+ if (Dir == FFT_FORWARD) {
+ tmp_2[5] = (tmp_1[5] + tmp_1[7]) * ComplexScalar(SQRT2DIV2, -SQRT2DIV2);
+ tmp_2[6] = (tmp_1[4] - tmp_1[6]) * ComplexScalar(0, -1);
+ tmp_2[7] = (tmp_1[5] - tmp_1[7]) * ComplexScalar(-SQRT2DIV2, -SQRT2DIV2);
+ }
+ else {
+ tmp_2[5] = (tmp_1[5] + tmp_1[7]) * ComplexScalar(SQRT2DIV2, SQRT2DIV2);
+ tmp_2[6] = (tmp_1[4] - tmp_1[6]) * ComplexScalar(0, 1);
+ tmp_2[7] = (tmp_1[5] - tmp_1[7]) * ComplexScalar(-SQRT2DIV2, SQRT2DIV2);
+ }
+ data[0] = tmp_2[0] + tmp_2[4];
+ data[1] = tmp_2[1] + tmp_2[5];
+ data[2] = tmp_2[2] + tmp_2[6];
+ data[3] = tmp_2[3] + tmp_2[7];
+ data[4] = tmp_2[0] - tmp_2[4];
+ data[5] = tmp_2[1] - tmp_2[5];
+ data[6] = tmp_2[2] - tmp_2[6];
+ data[7] = tmp_2[3] - tmp_2[7];
+
+ return;
+ }
+ else {
+ compute_1D_Butterfly<Dir>(data, n/2, n_power_of_2 - 1);
+ compute_1D_Butterfly<Dir>(data + n/2, n/2, n_power_of_2 - 1);
+ //Original code:
+ //RealScalar wtemp = std::sin(M_PI/n);
+ //RealScalar wpi = -std::sin(2 * M_PI/n);
+ RealScalar wtemp = m_sin_PI_div_n_LUT[n_power_of_2];
+ RealScalar wpi;
+ if (Dir == FFT_FORWARD) {
+ wpi = m_minus_sin_2_PI_div_n_LUT[n_power_of_2];
+ }
+ else {
+ wpi = 0 - m_minus_sin_2_PI_div_n_LUT[n_power_of_2];
+ }
+
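+ // 1 + wp equals cos(2*pi/n) - i*sin(2*pi/n) in the forward direction (its conjugate in
+ // reverse), so "w += w * wp" advances the twiddle factor by exp(-/+ 2*pi*i/n) per
+ // iteration using only the two LUT entries read above.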
+ const ComplexScalar wp(wtemp, wpi);
+ ComplexScalar w(1.0, 0.0);
+ for(int i = 0; i < n/2; i++) {
+ ComplexScalar temp(data[i + n/2] * w);
+ data[i + n/2] = data[i] - temp;
+ data[i] += temp;
+ w += w * wp;
+ }
+ return;
+ }
+ }
+
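+ // Maps a "line index" (0 .. m_size / line_len - 1) to the linear offset of the first
+ // element of the corresponding 1-D line along omitted_dim; getIndexFromOffset then
+ // steps along that line.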
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index getBaseOffsetFromIndex(Index index, Index omitted_dim) const {
+ Index result = 0;
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > omitted_dim; --i) {
+ const Index partial_m_stride = m_strides[i] / m_dimensions[omitted_dim];
+ const Index idx = index / partial_m_stride;
+ index -= idx * partial_m_stride;
+ result += idx * m_strides[i];
+ }
+ result += index;
+ }
+ else {
+ for (int i = 0; i < omitted_dim; ++i) {
+ const Index partial_m_stride = m_strides[i] / m_dimensions[omitted_dim];
+ const Index idx = index / partial_m_stride;
+ index -= idx * partial_m_stride;
+ result += idx * m_strides[i];
+ }
+ result += index;
+ }
+ // The contribution of omitted_dim is intentionally left out of the returned base offset.
+ return result;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index getIndexFromOffset(Index base, Index omitted_dim, Index offset) const {
+ Index result = base + offset * m_strides[omitted_dim] ;
+ return result;
+ }
+
+ protected:
+ int m_size;
+ const FFT& m_fft;
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_strides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ CoeffReturnType* m_data;
+ const Device& m_device;
+
+ // This supports a maximum FFT size of 2^32 for each dimension.
+ // m_sin_PI_div_n_LUT[i] = -2 * pow(std::sin(M_PI / std::pow(2, i)), 2)
+ RealScalar m_sin_PI_div_n_LUT[32] = {
+ 0.0,
+ -2,
+ -0.999999999999999,
+ -0.292893218813453,
+ -0.0761204674887130,
+ -0.0192147195967696,
+ -0.00481527332780311,
+ -0.00120454379482761,
+ -3.01181303795779e-04,
+ -7.52981608554592e-05,
+ -1.88247173988574e-05,
+ -4.70619042382852e-06,
+ -1.17654829809007e-06,
+ -2.94137117780840e-07,
+ -7.35342821488550e-08,
+ -1.83835707061916e-08,
+ -4.59589268710903e-09,
+ -1.14897317243732e-09,
+ -2.87243293150586e-10,
+ -7.18108232902250e-11,
+ -1.79527058227174e-11,
+ -4.48817645568941e-12,
+ -1.12204411392298e-12,
+ -2.80511028480785e-13,
+ -7.01277571201985e-14,
+ -1.75319392800498e-14,
+ -4.38298482001247e-15,
+ -1.09574620500312e-15,
+ -2.73936551250781e-16,
+ -6.84841378126949e-17,
+ -1.71210344531737e-17,
+ -4.28025861329343e-18
+ };
+
+ // m_minus_sin_2_PI_div_n_LUT[i] = -std::sin(2 * M_PI / std::pow(2,i));
+ RealScalar m_minus_sin_2_PI_div_n_LUT[32] = {
+ 0.0,
+ 0.0,
+ -1.00000000000000e+00,
+ -7.07106781186547e-01,
+ -3.82683432365090e-01,
+ -1.95090322016128e-01,
+ -9.80171403295606e-02,
+ -4.90676743274180e-02,
+ -2.45412285229123e-02,
+ -1.22715382857199e-02,
+ -6.13588464915448e-03,
+ -3.06795676296598e-03,
+ -1.53398018628477e-03,
+ -7.66990318742704e-04,
+ -3.83495187571396e-04,
+ -1.91747597310703e-04,
+ -9.58737990959773e-05,
+ -4.79368996030669e-05,
+ -2.39684498084182e-05,
+ -1.19842249050697e-05,
+ -5.99211245264243e-06,
+ -2.99605622633466e-06,
+ -1.49802811316901e-06,
+ -7.49014056584716e-07,
+ -3.74507028292384e-07,
+ -1.87253514146195e-07,
+ -9.36267570730981e-08,
+ -4.68133785365491e-08,
+ -2.34066892682746e-08,
+ -1.17033446341373e-08,
+ -5.85167231706864e-09,
+ -2.92583615853432e-09
+ };
+};
+
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__) && !defined(__GCUDACC__)
+
+template<typename OutputScalar, typename RealScalar, typename ComplexScalar, int ResultType>
+struct writeToDeviceData {
+ void operator()(OutputScalar* d_data, ComplexScalar* data_buf, size_t size) {
+ }
+};
+
+template<typename OutputScalar, typename RealScalar, typename ComplexScalar>
+struct writeToDeviceData<OutputScalar, RealScalar, ComplexScalar, Eigen::BothParts> {
+ void operator()(OutputScalar* d_data, ComplexScalar* data_buf, size_t size) {
+ cudaMemcpy(d_data, data_buf, size * sizeof(ComplexScalar), cudaMemcpyDeviceToDevice);
+ }
+};
+
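+// The strided cudaMemcpy2D below copies every other RealScalar of the interleaved complex
+// buffer (source pitch of 2 * sizeof(RealScalar)): the real components here, and the
+// imaginary components in the ImagPart specialization that follows.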
+template<typename OutputScalar, typename RealScalar, typename ComplexScalar>
+struct writeToDeviceData<OutputScalar, RealScalar, ComplexScalar, Eigen::RealPart> {
+ void operator()(OutputScalar* d_data, ComplexScalar* data_buf, size_t size) {
+ cudaMemcpy2D(d_data, sizeof(RealScalar), (RealScalar*) data_buf, 2 * sizeof(RealScalar), sizeof(RealScalar), size, cudaMemcpyDeviceToDevice);
+ }
+};
+
+template<typename OutputScalar, typename RealScalar, typename ComplexScalar>
+struct writeToDeviceData<OutputScalar, RealScalar, ComplexScalar, Eigen::ImagPart> {
+ void operator()(OutputScalar* d_data, ComplexScalar* data_buf, size_t size) {
+ RealScalar* data_buf_offset = &(((RealScalar*) data_buf)[1]);
+ cudaMemcpy2D(d_data, sizeof(RealScalar), data_buf_offset, 2 * sizeof(RealScalar), sizeof(RealScalar), size, cudaMemcpyDeviceToDevice);
+ }
+};
+
+template <typename InputScalar, typename RealScalar, typename ComplexScalar, typename InputEvaluator>
+__global__ void copyValues(ComplexScalar* d_data, InputEvaluator eval, int total_size) {
+ int i = blockIdx.x * blockDim.x + threadIdx.x;
+ if (i < total_size) {
+ d_data[i] = MakeComplex<internal::is_same<InputScalar, RealScalar>::value>()(eval.coeff(i));
+ }
+}
+
+template<typename Scalar, typename Index, int NumDims>
+__global__ void fillLineBuf(Scalar* line_buf, Scalar* data_buf, int line_len,
+ array<Index, NumDims> coords, array<Index, NumDims> m_strides, int dim) {
+ int j = blockIdx.x * blockDim.x + threadIdx.x;
+ if(j < line_len) {
+ coords[dim] = j;
+ Index index = 0;
+ for (int i = 0; i < NumDims; ++i) {
+ index += coords[i] * m_strides[i];
+ }
+ line_buf[j] = data_buf[index];
+ }
+}
+
+template<typename ComplexScalar, typename RealScalar, typename Index, int NumDims>
+__global__ void writebackLineBuf(ComplexScalar* line_buf, ComplexScalar* data_buf, int line_len,
+ array<Index, NumDims> coords, array<Index, NumDims> m_strides, int dim, RealScalar div_factor) {
+ int j = blockIdx.x * blockDim.x + threadIdx.x;
+ if(j < line_len) {
+ coords[dim] = j;
+ Index index = 0;
+ for (int i = 0; i < NumDims; ++i) {
+ index += coords[i] * m_strides[i];
+ }
+
+ data_buf[index] = line_buf[j];
+ ((RealScalar*) data_buf)[2*index] /= div_factor;
+ ((RealScalar*) data_buf)[2*index + 1] /= div_factor;
+ }
+}
+
+template <typename FFT, typename ArgType, int FFTResultType, int FFTDir>
+struct TensorEvaluator<const TensorFFTOp<FFT, ArgType, FFTResultType, FFTDir>, GpuDevice> {
+ typedef TensorFFTOp<FFT, ArgType, FFTResultType, FFTDir> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, GpuDevice>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::Scalar InputScalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename std::complex<RealScalar> ComplexScalar;
+ typedef typename internal::conditional<FFTResultType == Eigen::BothParts, std::complex<RealScalar>, RealScalar>::type OutputScalar;
+ typedef typename TensorEvaluator<ArgType, GpuDevice>::Dimensions InputDimensions;
+ typedef OutputScalar CoeffReturnType;
+ typedef typename PacketType<OutputScalar, GpuDevice>::type PacketReturnType;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = false,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, GpuDevice>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const GpuDevice& device) : m_data_buf(NULL), m_impl(op.expression(), device), m_fft(op.fft()) {
+ const typename TensorEvaluator<ArgType, GpuDevice>::Dimensions& input_dims = m_impl.dimensions();
+ for (int i = 0; i < NumDims; ++i) {
+ eigen_assert(input_dims[i] > 0);
+ m_dimensions[i] = input_dims[i];
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_strides[i] = m_strides[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ m_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_strides[i] = m_strides[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ m_size = m_dimensions.TotalSize();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {
+ return m_dimensions;
+ }
+
+ EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(OutputScalar* d_data) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ if (d_data) {
+ evalToDeviceData(d_data);
+ return false;
+ } else {
+ evalToSelfDataBuf();
+ return true;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index getIndexFromCoords(const array<Index, NumDims> & coords) const {
+ Index result = 0;
+ for (int i = 0; i < NumDims; ++i) {
+ result += coords[i] * m_strides[i];
+ }
+ return result;
+ }
+
+ EIGEN_STRONG_INLINE array<Index, NumDims> getPartialCoordsFromIndex(Index index, Index omitted_dim) const {
+ array<Index, NumDims> partial_m_strides = m_strides;
+ array<Index, NumDims> index_coords;
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (Index i = omitted_dim + 1; i < NumDims; ++i) {
+ partial_m_strides[i] /= m_dimensions[omitted_dim];
+ }
+ for (int i = NumDims - 1; i > 0; --i) {
+ if (omitted_dim != i) {
+ const Index idx = index / partial_m_strides[i];
+ index -= idx * partial_m_strides[i];
+ index_coords[i] = idx;
+ }
+ }
+ index_coords[0] = index;
+ }
+ else {
+ for (Index i = omitted_dim - 1; i >= 0; --i) {
+ partial_m_strides[i] /= m_dimensions[omitted_dim];
+ }
+ for (int i = 0; i < NumDims - 1; ++i) {
+ if (omitted_dim != i) {
+ const Index idx = index / partial_m_strides[i];
+ index -= idx * partial_m_strides[i];
+ index_coords[i] = idx;
+ }
+ }
+ index_coords[NumDims - 1] = index;
+ }
+ // Value of index_coords[omitted_dim] is not determined at this step.
+ return index_coords;
+ }
+
+ void evalToSelfDataBuf() {
+ cudaMalloc((void**) &m_data_buf, sizeof(OutputScalar) * m_size);
+ evalToDeviceData(m_data_buf);
+ }
+
+ EIGEN_STRONG_INLINE void evalToDeviceData(OutputScalar* d_data) {
+ ComplexScalar* data_buf;
+ cudaMalloc((void**) &data_buf, sizeof(ComplexScalar) * m_size);
+
+ int block_size = 128;
+ int grid_size = m_size / block_size + 1;
+
+ copyValues<InputScalar, RealScalar, ComplexScalar, TensorEvaluator<ArgType, GpuDevice> > <<<grid_size, block_size>>>(data_buf, m_impl, m_size);
+
+ for (int i = 0; i < m_fft.size(); ++i) {
+ int dim = m_fft[i];
+ eigen_assert(dim >= 0 && dim < NumDims);
+ int line_len = m_dimensions[dim];
+ ComplexScalar* line_buf;
+ cudaMalloc((void**) &line_buf, sizeof(ComplexScalar) * line_len);
+
+ cufftHandle plan;
+ cufftPlan1d(&plan, line_len, CUFFT_C2C, 1);
+
+ for (Index partial_index = 0; partial_index < m_size/line_len; ++partial_index) {
+ array<Index, NumDims> coords = getPartialCoordsFromIndex(partial_index, dim);
+ // get data into line_buf
+ int block_size = 128;
+ int grid_size = line_len / block_size + 1;
+ fillLineBuf<ComplexScalar, Index, NumDims> <<<grid_size, block_size>>>(line_buf, data_buf, line_len, coords, m_strides, dim);
+
+ if(FFTDir == Eigen::FFT_FORWARD) {
+ cufftExecC2C(plan, reinterpret_cast<cufftComplex *>(line_buf), reinterpret_cast<cufftComplex*>(line_buf), CUFFT_FORWARD);
+ }
+ else {
+ cufftExecC2C(plan, reinterpret_cast<cufftComplex*>(line_buf), reinterpret_cast<cufftComplex*>(line_buf), CUFFT_INVERSE);
+ }
+ // write back
+ RealScalar div_factor = (FFTDir == FFT_FORWARD) ? 1.0 : line_len;
+ writebackLineBuf<ComplexScalar, RealScalar, Index, NumDims> <<<grid_size, block_size>>>(line_buf, data_buf, line_len, coords, m_strides, dim, div_factor);
+ cudaDeviceSynchronize();
+
+ }
+ cufftDestroy(plan);
+ cudaFree(line_buf);
+ }
+ writeToDeviceData<OutputScalar, RealScalar, ComplexScalar, FFTResultType>()(d_data, data_buf, m_size);
+ cudaFree(data_buf);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ if(m_data_buf != NULL) cudaFree(m_data_buf);
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE CoeffReturnType coeff(Index index) const {
+ return m_data_buf[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE PacketReturnType packet(Index index) const {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_data_buf + index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_data_buf; }
+
+ protected:
+ int m_size;
+ const FFT& m_fft;
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_strides;
+ TensorEvaluator<ArgType, GpuDevice> m_impl;
+ OutputScalar* m_data_buf;
+
+};
+#endif
+
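+// A minimal usage sketch of the expression evaluated in this file (illustrative only; it
+// assumes the fft<ResultType, Direction>(dims) helper on TensorBase that constructs a
+// TensorFFTOp, and the shapes are arbitrary):
+//
+//   Eigen::Tensor<float, 2> input(32, 64);
+//   input.setRandom();
+//   Eigen::array<int, 1> dims = {{1}};  // transform along dimension 1
+//   Eigen::Tensor<std::complex<float>, 2> spectrum =
+//       input.fft<Eigen::BothParts, Eigen::FFT_FORWARD>(dims);
+//   Eigen::Tensor<float, 2> roundtrip =
+//       spectrum.fft<Eigen::RealPart, Eigen::FFT_REVERSE>(dims);  // ~input: the reverse pass divides by the line length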
+} // end namespace Eigen
+#endif //EIGEN_CXX11_TENSOR_TENSOR_FFT_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h
new file mode 100644
index 0000000000..a7af67230f
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFixedSize.h
@@ -0,0 +1,277 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H
+#define EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H
+
+namespace Eigen {
+
+/** \class TensorFixedSize
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief The fixed sized version of the tensor class.
+ *
+ * The fixed sized equivalent of
+ * Eigen::Tensor<float, 3> t(3, 5, 7);
+ * is
+ * Eigen::TensorFixedSize<float, Sizes<3,5,7>> t;
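+ *
+ * A short usage sketch (illustrative; setZero() is assumed from TensorBase):
+ * t.setZero();
+ * t(1, 2, 3) = 4.0f;   // coefficient access; the rank is checked at compile time
+ * float v = t(1, 2, 3);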
+ */
+
+template<typename Scalar_, typename Dimensions_, int Options_, typename IndexType>
+class TensorFixedSize : public TensorBase<TensorFixedSize<Scalar_, Dimensions_, Options_, IndexType> >
+{
+ public:
+ typedef TensorFixedSize<Scalar_, Dimensions_, Options_, IndexType> Self;
+ typedef TensorBase<TensorFixedSize<Scalar_, Dimensions_, Options_, IndexType> > Base;
+ typedef typename Eigen::internal::nested<Self>::type Nested;
+ typedef typename internal::traits<Self>::StorageKind StorageKind;
+ typedef typename internal::traits<Self>::Index Index;
+ typedef Scalar_ Scalar;
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+
+ static const int Options = Options_;
+
+ enum {
+ IsAligned = bool(EIGEN_ALIGN),
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = Options_ & RowMajor ? RowMajor : ColMajor,
+ CoordAccess = true,
+ };
+
+ typedef Dimensions_ Dimensions;
+ static const std::size_t NumIndices = Dimensions::count;
+
+ protected:
+ TensorStorage<Scalar, Dimensions, Options> m_storage;
+
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return NumIndices; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(std::size_t n) const { return m_storage.dimensions()[n]; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_storage.dimensions(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_storage.size(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); }
+
+ // This makes EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ // work, because that uses base().coeffRef() - and we don't yet
+ // implement a similar class hierarchy
+ inline Self& base() { return *this; }
+ inline const Self& base() const { return *this; }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ EIGEN_DEVICE_FUNC inline const Scalar& coeff(Index firstIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeff(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeff(const array<Index, NumIndices>& indices) const
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeff() const
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return m_storage.data()[0];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& coeffRef(Index firstIndex, IndexTypes... otherIndices)
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeffRef(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(const array<Index, NumIndices>& indices)
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef()
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return m_storage.data()[0];
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return this->operator()(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(const array<Index, NumIndices>& indices) const
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeff(indices);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()() const
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeff();
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator[](Index index) const
+ {
+ // The bracket operator is only for vectors, use the parenthesis operator instead.
+ EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeff(index);
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT(sizeof...(otherIndices) + 1 == NumIndices, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return operator()(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(const array<Index, NumIndices>& indices)
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeffRef(indices);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()()
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return coeffRef();
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index index)
+ {
+ eigen_assert(index >= 0 && index < size());
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator[](Index index)
+ {
+ // The bracket operator is only for vectors, use the parenthesis operator instead
+ EIGEN_STATIC_ASSERT(NumIndices == 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorFixedSize() { }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorFixedSize(const Self& other)
+ : m_storage(other.m_storage)
+ {
+ }
+
+#ifdef EIGEN_HAVE_RVALUE_REFERENCES
+ inline TensorFixedSize(Self&& other)
+ : m_storage(other.m_storage)
+ {
+ }
+#endif
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorFixedSize(const TensorBase<OtherDerived, ReadOnlyAccessors>& other)
+ {
+ typedef TensorAssignOp<TensorFixedSize, const OtherDerived> Assign;
+ Assign assign(*this, other.derived());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ }
+
+ template<typename Other>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorFixedSize& operator=(const Other& other)
+ {
+ // FIXME: check that the dimensions of other match the dimensions of *this.
+ // Unfortunately this isn't possible yet when the rhs is an expression.
+ typedef TensorAssignOp<Self, const Other> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE bool checkIndexRange(const array<Index, NumIndices>& /*indices*/) const
+ {
+ using internal::array_apply_and_reduce;
+ using internal::array_zip_and_reduce;
+ using internal::greater_equal_zero_op;
+ using internal::logical_and_op;
+ using internal::lesser_op;
+
+ return true;
+ // check whether the indices are all >= 0
+ /* array_apply_and_reduce<logical_and_op, greater_equal_zero_op>(indices) &&
+ // check whether the indices fit in the dimensions
+ array_zip_and_reduce<logical_and_op, lesser_op>(indices, m_storage.dimensions());*/
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index linearizedIndex(const array<Index, NumIndices>& indices) const
+ {
+ if (Options&RowMajor) {
+ return m_storage.dimensions().IndexOfRowMajor(indices);
+ } else {
+ return m_storage.dimensions().IndexOfColMajor(indices);
+ }
+ }
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_FIXED_SIZE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h
new file mode 100644
index 0000000000..1d1ce47174
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForcedEval.h
@@ -0,0 +1,150 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H
+#define EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H
+
+namespace Eigen {
+
+/** \class TensorForcedEval
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor forced evaluation class.
+ *
+ * Forces its expression to be evaluated once into a temporary buffer; subsequent
+ * coefficient accesses read from that buffer instead of re-evaluating the expression.
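+ *
+ * A minimal usage sketch (illustrative; assumes the eval() accessor on TensorBase that
+ * wraps an expression in a TensorForcedEvalOp):
+ * Eigen::Tensor<float, 2> a(128, 128), b(128, 128);
+ * auto sum = (a + b).eval();              // a + b is evaluated once into a buffer
+ * Eigen::Tensor<float, 2> c = sum * sum;  // reads the buffer instead of recomputing a + b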
+ */
+namespace internal {
+template<typename XprType>
+struct traits<TensorForcedEvalOp<XprType> >
+{
+ // These traits are forwarded directly from the wrapped expression.
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename traits<XprType>::StorageKind StorageKind;
+ typedef typename traits<XprType>::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+
+ enum {
+ Flags = 0,
+ };
+};
+
+template<typename XprType>
+struct eval<TensorForcedEvalOp<XprType>, Eigen::Dense>
+{
+ typedef const TensorForcedEvalOp<XprType>& type;
+};
+
+template<typename XprType>
+struct nested<TensorForcedEvalOp<XprType>, 1, typename eval<TensorForcedEvalOp<XprType> >::type>
+{
+ typedef TensorForcedEvalOp<XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename XprType>
+class TensorForcedEvalOp : public TensorBase<TensorForcedEvalOp<XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorForcedEvalOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorForcedEvalOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorForcedEvalOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorForcedEvalOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorForcedEvalOp(const XprType& expr)
+ : m_xpr(expr) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+};
+
+
+template<typename ArgType, typename Device>
+struct TensorEvaluator<const TensorForcedEvalOp<ArgType>, Device>
+{
+ typedef TensorForcedEvalOp<ArgType> XprType;
+ typedef typename ArgType::Scalar Scalar;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+
+ enum {
+ IsAligned = true,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ };
+
+ EIGEN_DEVICE_FUNC TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_op(op.expression()), m_device(device), m_buffer(NULL)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_DEVICE_FUNC const Dimensions& dimensions() const { return m_impl.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType*) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ const Index numValues = m_impl.dimensions().TotalSize();
+ m_buffer = (CoeffReturnType*)m_device.allocate(numValues * sizeof(CoeffReturnType));
+ // Should initialize the memory in case we're dealing with non POD types.
+ if (!internal::is_arithmetic<CoeffReturnType>::value) {
+ for (Index i = 0; i < numValues; ++i) {
+ new(m_buffer+i) CoeffReturnType();
+ }
+ }
+ typedef TensorEvalToOp<const ArgType> EvalTo;
+ EvalTo evalToTmp(m_buffer, m_op);
+ const bool PacketAccess = internal::IsVectorizable<Device, ArgType>::value;
+ const bool BlockAccess = false;
+ internal::TensorExecutor<const EvalTo, Device, PacketAccess, BlockAccess>::run(evalToTmp, m_device);
+ m_impl.cleanup();
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_device.deallocate(m_buffer);
+ m_buffer = NULL;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_buffer[index];
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return internal::ploadt<PacketReturnType, LoadMode>(m_buffer + index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_buffer; }
+
+ private:
+ TensorEvaluator<ArgType, Device> m_impl;
+ const ArgType m_op;
+ const Device& m_device;
+ CoeffReturnType* m_buffer;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_FORCED_EVAL_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h
new file mode 100644
index 0000000000..e11d5ed22e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h
@@ -0,0 +1,104 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H
+#define EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H
+
+namespace Eigen {
+
+template<typename Scalar_, std::size_t NumIndices_, int Options_ = 0, typename IndexType = DenseIndex> class Tensor;
+template<typename Scalar_, typename Dimensions, int Options_ = 0, typename IndexType = DenseIndex> class TensorFixedSize;
+template<typename Scalar_, int Options_ = 0, typename IndexType = DenseIndex> class TensorVarDim;
+template<typename PlainObjectType, int Options_ = Unaligned> class TensorMap;
+template<typename PlainObjectType> class TensorRef;
+template<typename Derived, int AccessLevel = internal::accessors_level<Derived>::value> class TensorBase;
+
+template<typename NullaryOp, typename PlainObjectType> class TensorCwiseNullaryOp;
+template<typename UnaryOp, typename XprType> class TensorCwiseUnaryOp;
+template<typename BinaryOp, typename LeftXprType, typename RightXprType> class TensorCwiseBinaryOp;
+template<typename IfXprType, typename ThenXprType, typename ElseXprType> class TensorSelectOp;
+template<typename Op, typename Dims, typename XprType> class TensorReductionOp;
+template<typename XprType> class TensorIndexTupleOp;
+template<typename ReduceOp, typename Dims, typename XprType> class TensorTupleReducerOp;
+template<typename Axis, typename LeftXprType, typename RightXprType> class TensorConcatenationOp;
+template<typename Dimensions, typename LeftXprType, typename RightXprType> class TensorContractionOp;
+template<typename TargetType, typename XprType> class TensorConversionOp;
+template<typename Dimensions, typename InputXprType, typename KernelXprType> class TensorConvolutionOp;
+template<typename Dimensions, typename InputXprType, typename KernelXprType> class TensorConvolutionByFFTOp;
+template<typename FFT, typename XprType, int FFTDataType, int FFTDirection> class TensorFFTOp;
+template<typename IFFT, typename XprType, int ResultType> class TensorIFFTOp;
+template<typename DFT, typename XprType, int ResultType> class TensorDFTOp;
+template<typename IDFT, typename XprType, int ResultType> class TensorIDFTOp;
+template<typename PatchDim, typename XprType> class TensorPatchOp;
+template<DenseIndex Rows, DenseIndex Cols, typename XprType> class TensorImagePatchOp;
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename XprType> class TensorVolumePatchOp;
+template<typename Broadcast, typename XprType> class TensorBroadcastingOp;
+template<DenseIndex DimId, typename XprType> class TensorChippingOp;
+template<typename NewDimensions, typename XprType> class TensorReshapingOp;
+template<typename XprType> class TensorLayoutSwapOp;
+template<typename StartIndices, typename Sizes, typename XprType> class TensorSlicingOp;
+template<typename ReverseDimensions, typename XprType> class TensorReverseOp;
+template<typename XprType> class TensorTrueIndicesOp;
+template<typename PaddingDimensions, typename XprType> class TensorPaddingOp;
+template<typename Shuffle, typename XprType> class TensorShufflingOp;
+template<typename Strides, typename XprType> class TensorStridingOp;
+template<typename Strides, typename XprType> class TensorInflationOp;
+template<typename Generator, typename XprType> class TensorGeneratorOp;
+template<typename LeftXprType, typename RightXprType> class TensorAssignOp;
+
+template<typename CustomUnaryFunc, typename XprType> class TensorCustomUnaryOp;
+template<typename CustomBinaryFunc, typename LhsXprType, typename RhsXprType> class TensorCustomBinaryOp;
+
+template<typename XprType> class TensorEvalToOp;
+template<typename XprType> class TensorForcedEvalOp;
+
+template<typename ExpressionType, typename DeviceType> class TensorDevice;
+template<typename Derived, typename Device> struct TensorEvaluator;
+
+class DefaultDevice;
+class ThreadPoolDevice;
+class GpuDevice;
+
+enum DFTResultType {
+ RealPart = 0,
+ ImagPart = 1,
+ BothParts = 2
+};
+
+enum FFTDirection {
+ FFT_FORWARD = 0,
+ FFT_REVERSE = 1
+};
+
+namespace internal {
+template <typename Device, typename Expression>
+struct IsVectorizable {
+ static const bool value = TensorEvaluator<Expression, Device>::PacketAccess;
+};
+
+template <typename Expression>
+struct IsVectorizable<GpuDevice, Expression> {
+ static const bool value = TensorEvaluator<Expression, GpuDevice>::PacketAccess &&
+ TensorEvaluator<Expression, GpuDevice>::IsAligned;
+};
+
+template <typename Device, typename Expression>
+struct IsTileable {
+ static const bool value = TensorEvaluator<Expression, Device>::BlockAccess;
+};
+
+template <typename Expression, typename Device,
+ bool Vectorizable = IsVectorizable<Device, Expression>::value,
+ bool Tileable = IsTileable<Device, Expression>::value>
+class TensorExecutor;
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_FORWARD_DECLARATIONS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h
new file mode 100644
index 0000000000..526301ad5b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorFunctors.h
@@ -0,0 +1,706 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H
+#define EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H
+
+namespace Eigen {
+namespace internal {
+
+namespace {
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__) && defined(__CUDA_ARCH__)
+__device__ int get_random_seed() {
+ return clock();
+}
+#else
+int get_random_seed() {
+#ifdef _WIN32
+ SYSTEMTIME st;
+ GetSystemTime(&st);
+ return st.wSecond + 1000 * st.wMilliseconds;
+#elif __APPLE__
+ return mach_absolute_time();
+#else
+ timespec ts;
+ clock_gettime(CLOCK_REALTIME, &ts);
+ return ts.tv_nsec;
+#endif
+}
+#endif
+}
+
+
+// Standard reduction functors
+template <typename T> struct SumReducer
+{
+ static const bool PacketAccess = true;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const {
+ (*accum) += t;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const {
+ (*accum) = padd<Packet>(*accum, p);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return static_cast<T>(0);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(0);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ return accum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return vaccum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return saccum + predux(vaccum);
+ }
+};
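+// SumReducer and the reducers that follow all share the same protocol, driven by the
+// reduction evaluator (illustrative sketch, scalar path only):
+//   SumReducer<float> reducer;
+//   float acc = reducer.initialize();                      // 0
+//   for (int i = 0; i < 4; ++i) reducer.reduce(values[i], &acc);
+//   float total = reducer.finalize(acc);                   // sum of the four values
+// Packet-aware reducers additionally provide initializePacket/reducePacket/finalizeBoth.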
+
+template <typename T> struct MeanReducer
+{
+ static const bool PacketAccess = true;
+ static const bool IsStateful = true;
+
+ MeanReducer() : scalarCount_(0), packetCount_(0) { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) {
+ (*accum) += t;
+ scalarCount_++;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) {
+ (*accum) = padd<Packet>(*accum, p);
+ packetCount_++;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return static_cast<T>(0);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(0);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ return accum / scalarCount_;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return pdiv(vaccum, pset1<Packet>(packetCount_));
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return (saccum + predux(vaccum)) / (scalarCount_ + packetCount_ * unpacket_traits<Packet>::size);
+ }
+
+ protected:
+ int scalarCount_;
+ int packetCount_;
+};
+
+struct AndReducer
+{
+ static const bool PacketAccess = false;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(bool t, bool* accum) const {
+ *accum = *accum && t;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool initialize() const {
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool finalize(bool accum) const {
+ return accum;
+ }
+};
+
+struct OrReducer {
+ static const bool PacketAccess = false;
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(bool t, bool* accum) const {
+ *accum = *accum || t;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool initialize() const {
+ return false;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool finalize(bool accum) const {
+ return accum;
+ }
+};
+
+template <typename T> struct MaxReducer
+{
+ static const bool PacketAccess = true;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const {
+ if (t > *accum) { *accum = t; }
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const {
+ (*accum) = pmax<Packet>(*accum, p);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return Eigen::NumTraits<T>::lowest();
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(initialize());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ return accum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return vaccum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return numext::maxi(saccum, predux_max(vaccum));
+ }
+};
+
+template <typename T> struct MinReducer
+{
+ static const bool PacketAccess = true;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const {
+ if (t < *accum) { *accum = t; }
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const {
+ (*accum) = pmin<Packet>(*accum, p);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return Eigen::NumTraits<T>::highest();
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(initialize());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ return accum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return vaccum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return numext::mini(saccum, predux_min(vaccum));
+ }
+};
+
+
+template <typename T> struct ProdReducer
+{
+ static const bool PacketAccess = true;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const {
+ (*accum) *= t;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reducePacket(const Packet& p, Packet* accum) const {
+ (*accum) = pmul<Packet>(*accum, p);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return static_cast<T>(1);
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet initializePacket() const {
+ return pset1<Packet>(1);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T accum) const {
+ return accum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Packet finalizePacket(const Packet& vaccum) const {
+ return vaccum;
+ }
+ template <typename Packet>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalizeBoth(const T saccum, const Packet& vaccum) const {
+ return saccum * predux_mul(vaccum);
+ }
+};
+
+#if !defined (EIGEN_USE_GPU) || !defined(__CUDACC__) || !defined(__CUDA_ARCH__)
+// We're not compiling a cuda kernel
+template <typename T> class UniformRandomGenerator {
+
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ seed = seed ? seed : get_random_seed();
+ srand(seed);
+ }
+ UniformRandomGenerator(const UniformRandomGenerator& other) {
+ m_seed = other.m_seed;
+ }
+
+ template<typename Index>
+ T operator()(Index, Index = 0) const {
+ return random<T>();
+ }
+ template<typename Index>
+ typename internal::packet_traits<T>::type packetOp(Index i, Index j = 0) const {
+ const int packetSize = internal::packet_traits<T>::size;
+ EIGEN_ALIGN_DEFAULT T values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = random<T>();
+ }
+ return internal::pload<typename internal::packet_traits<T>::type>(values);
+ }
+
+ private:
+ unsigned int m_seed;
+};
+
+#if __cplusplus > 199711
+template <> class UniformRandomGenerator<float> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ seed = seed ? seed : get_random_seed();
+ m_generator.seed(seed);
+ }
+ UniformRandomGenerator(const UniformRandomGenerator<float>& other) {
+ m_generator.seed(other(0, 0) * UINT_MAX);
+ m_seed = other.m_seed;
+ }
+
+ template<typename Index>
+ float operator()(Index, Index = 0) const {
+ return m_distribution(m_generator);
+ }
+ template<typename Index>
+ typename internal::packet_traits<float>::type packetOp(Index i, Index j = 0) const {
+ const int packetSize = internal::packet_traits<float>::size;
+ EIGEN_ALIGN_DEFAULT float values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = this->operator()(i, j);
+ }
+ return internal::pload<typename internal::packet_traits<float>::type>(values);
+ }
+
+ private:
+ UniformRandomGenerator& operator = (const UniformRandomGenerator&);
+ // Make sure m_seed comes first to match the layout of the cpu
+ // version of the code.
+ unsigned int m_seed;
+ mutable std::mt19937 m_generator;
+ mutable std::uniform_real_distribution<float> m_distribution;
+};
+
+template <> class UniformRandomGenerator<double> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ seed = seed ? seed : get_random_seed();
+ m_generator.seed(seed);
+ }
+ UniformRandomGenerator(const UniformRandomGenerator<double>& other) {
+ m_generator.seed(other(0, 0) * UINT_MAX);
+ m_seed = other.m_seed;
+ }
+
+ template<typename Index>
+ double operator()(Index, Index = 0) const {
+ return m_distribution(m_generator);
+ }
+ template<typename Index>
+ typename internal::packet_traits<double>::type packetOp(Index i, Index j = 0) const {
+ const int packetSize = internal::packet_traits<double>::size;
+ EIGEN_ALIGN_DEFAULT double values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = this->operator()(i, j);
+ }
+ return internal::pload<typename internal::packet_traits<double>::type>(values);
+ }
+
+ private:
+ UniformRandomGenerator& operator = (const UniformRandomGenerator&);
+ // Make sure m_seed comes first to match the layout of the cpu
+ // version of the code.
+ unsigned int m_seed;
+ mutable std::mt19937 m_generator;
+ mutable std::uniform_real_distribution<double> m_distribution;
+};
+#endif
+
+#else
+
+// We're compiling a cuda kernel
+template <typename T> class UniformRandomGenerator;
+
+template <> class UniformRandomGenerator<float> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+
+ __device__ UniformRandomGenerator(const UniformRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+
+ template<typename Index>
+ __device__ float operator()(Index, Index = 0) const {
+ return curand_uniform(&m_state);
+ }
+ template<typename Index>
+ __device__ float4 packetOp(Index, Index = 0) const {
+ return curand_uniform4(&m_state);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+template <> class UniformRandomGenerator<double> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ UniformRandomGenerator(const UniformRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ double operator()(Index, Index = 0) const {
+ return curand_uniform_double(&m_state);
+ }
+ template<typename Index>
+ __device__ double2 packetOp(Index, Index = 0) const {
+ return curand_uniform2_double(&m_state);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+template <> class UniformRandomGenerator<std::complex<float> > {
+ public:
+ static const bool PacketAccess = false;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ UniformRandomGenerator(const UniformRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ std::complex<float> operator()(Index, Index = 0) const {
+ float4 vals = curand_uniform4(&m_state);
+ return std::complex<float>(vals.x, vals.y);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+template <> class UniformRandomGenerator<std::complex<double> > {
+ public:
+ static const bool PacketAccess = false;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ UniformRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ UniformRandomGenerator(const UniformRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ std::complex<double> operator()(Index, Index = 0) const {
+ double2 vals = curand_uniform2_double(&m_state);
+ return std::complex<double>(vals.x, vals.y);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+#endif
+
+
+#if (!defined (EIGEN_USE_GPU) || !defined(__CUDACC__) || !defined(__CUDA_ARCH__)) && __cplusplus > 199711
+// We're not compiling a cuda kernel
+template <typename T> class NormalRandomGenerator {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ NormalRandomGenerator(unsigned int seed = 0) : m_distribution(0, 1), m_seed(seed) {
+ seed = seed ? seed : get_random_seed();
+ m_generator.seed(seed);
+ }
+ NormalRandomGenerator(const NormalRandomGenerator& other)
+ : m_distribution(other.m_distribution), m_seed(other.m_seed) {
+ m_generator.seed(other(0, 0) * UINT_MAX);
+ }
+
+ template<typename Index>
+ T operator()(Index, Index = 0) const {
+ return m_distribution(m_generator);
+ }
+ template<typename Index>
+ typename internal::packet_traits<T>::type packetOp(Index, Index = 0) const {
+ const int packetSize = internal::packet_traits<T>::size;
+ EIGEN_ALIGN_DEFAULT T values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = m_distribution(m_generator);
+ }
+ return internal::pload<typename internal::packet_traits<T>::type>(values);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable std::normal_distribution<T> m_distribution;
+ mutable std::mt19937 m_generator;
+};
+
+#elif defined (EIGEN_USE_GPU) && defined(__CUDACC__) && defined(__CUDA_ARCH__)
+
+// We're compiling a cuda kernel
+template <typename T> class NormalRandomGenerator;
+
+template <> class NormalRandomGenerator<float> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ NormalRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ NormalRandomGenerator(const NormalRandomGenerator<float>& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ float operator()(Index, Index = 0) const {
+ return curand_normal(&m_state);
+ }
+ template<typename Index>
+ __device__ float4 packetOp(Index, Index = 0) const {
+ return curand_normal4(&m_state);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+template <> class NormalRandomGenerator<double> {
+ public:
+ static const bool PacketAccess = true;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ NormalRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ NormalRandomGenerator(const NormalRandomGenerator<double>& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ double operator()(Index, Index = 0) const {
+ return curand_normal_double(&m_state);
+ }
+ template<typename Index>
+ __device__ double2 packetOp(Index, Index = 0) const {
+ return curand_normal2_double(&m_state);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+
+template <> class NormalRandomGenerator<std::complex<float> > {
+ public:
+ static const bool PacketAccess = false;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ NormalRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ NormalRandomGenerator(const NormalRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ std::complex<float> operator()(Index, Index = 0) const {
+ float4 vals = curand_normal4(&m_state);
+ return std::complex<float>(vals.x, vals.y);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+
+template <> class NormalRandomGenerator<std::complex<double> > {
+ public:
+ static const bool PacketAccess = false;
+
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ __device__ NormalRandomGenerator(unsigned int seed = 0) : m_seed(seed) {
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ seed = seed ? seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ __device__ NormalRandomGenerator(const NormalRandomGenerator& other) {
+ m_seed = other.m_seed;
+ const int tid = blockIdx.x * blockDim.x + threadIdx.x;
+ const unsigned int seed = m_seed ? m_seed : get_random_seed();
+ curand_init(seed, tid, 0, &m_state);
+ }
+ template<typename Index>
+ __device__ std::complex<double> operator()(Index, Index = 0) const {
+ double2 vals = curand_normal2_double(&m_state);
+ return std::complex<double>(vals.x, vals.y);
+ }
+
+ private:
+ unsigned int m_seed;
+ mutable curandStatePhilox4_32_10_t m_state;
+};
+#else
+
+template <typename T> class NormalRandomGenerator {
+ public:
+ // Uses the given "seed" if non-zero, otherwise uses a random seed.
+ NormalRandomGenerator(unsigned int seed = 0) : m_seed(seed) {}
+
+ private:
+ unsigned int m_seed;
+};
+
+#endif
+
+
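+// Generator functor that produces an (unnormalized) Gaussian bump centered at
+// "means": for a coordinate vector x it returns
+// exp(-sum_i (x[i] - means[i])^2 / (2 * std_devs[i]^2)).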
+template <typename T, typename Index, size_t NumDims>
+class GaussianGenerator {
+ public:
+ static const bool PacketAccess = false;
+
+ EIGEN_DEVICE_FUNC GaussianGenerator(const array<T, NumDims>& means,
+ const array<T, NumDims>& std_devs)
+ : m_means(means) {
+ for (int i = 0; i < NumDims; ++i) {
+ m_two_sigmas[i] = std_devs[i] * std_devs[i] * 2;
+ }
+ }
+
+ T operator()(const array<Index, NumDims>& coordinates) const {
+ T tmp = T(0);
+ for (int i = 0; i < NumDims; ++i) {
+ T offset = coordinates[i] - m_means[i];
+ tmp += offset * offset / m_two_sigmas[i];
+ }
+ return std::exp(-tmp);
+ }
+
+ private:
+ array<T, NumDims> m_means;
+ array<T, NumDims> m_two_sigmas;
+};
+
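+// Reducer for (index, value) tuples that keeps the tuple whose second member
+// (the value) is the largest. The accumulator starts at the lowest
+// representable value so any real entry replaces it.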
+template <typename T> struct ArgMaxTupleReducer
+{
+ static const bool PacketAccess = false;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T t, T* accum) const {
+ if (t.second > accum->second) { *accum = t; }
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return T(0, NumTraits<typename T::second_type>::lowest());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T& accum) const {
+ return accum;
+ }
+};
+
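+// Counterpart of ArgMaxTupleReducer: keeps the (index, value) tuple with the
+// smallest value, starting from the highest representable value.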
+template <typename T> struct ArgMinTupleReducer
+{
+ static const bool PacketAccess = false;
+ static const bool IsStateful = false;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const T& t, T* accum) const {
+ if (t.second < accum->second) { *accum = t; }
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T initialize() const {
+ return T(0, NumTraits<typename T::second_type>::highest());
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T finalize(const T& accum) const {
+ return accum;
+ }
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_FUNCTORS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h
new file mode 100644
index 0000000000..91a73669a4
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorGenerator.h
@@ -0,0 +1,185 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H
+#define EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H
+
+namespace Eigen {
+
+/** \class TensorGenerator
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor generator class.
+ *
+ * Lazily computes each coefficient of the result by calling a user-supplied
+ * generator functor with that coefficient's coordinates.
+ *
+ */
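+//
+// A minimal usage sketch (an illustration only; it assumes the generate()
+// helper exposed by TensorBase, which wraps the TensorGeneratorOp defined
+// below and invokes the functor with the coordinates of each coefficient):
+//
+//   struct IotaGenerator {
+//     float operator()(const Eigen::array<Eigen::DenseIndex, 2>& coords) const {
+//       return static_cast<float>(coords[0] * 10 + coords[1]);
+//     }
+//   };
+//   Eigen::Tensor<float, 2> input(3, 4);
+//   Eigen::Tensor<float, 2> generated = input.generate(IotaGenerator());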
+namespace internal {
+template<typename Generator, typename XprType>
+struct traits<TensorGeneratorOp<Generator, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename Generator, typename XprType>
+struct eval<TensorGeneratorOp<Generator, XprType>, Eigen::Dense>
+{
+ typedef const TensorGeneratorOp<Generator, XprType>& type;
+};
+
+template<typename Generator, typename XprType>
+struct nested<TensorGeneratorOp<Generator, XprType>, 1, typename eval<TensorGeneratorOp<Generator, XprType> >::type>
+{
+ typedef TensorGeneratorOp<Generator, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Generator, typename XprType>
+class TensorGeneratorOp : public TensorBase<TensorGeneratorOp<Generator, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorGeneratorOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorGeneratorOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorGeneratorOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorGeneratorOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorGeneratorOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorGeneratorOp(const XprType& expr, const Generator& generator)
+ : m_xpr(expr), m_generator(generator) {}
+
+ EIGEN_DEVICE_FUNC
+ const Generator& generator() const { return m_generator; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Generator m_generator;
+};
+
+
+// Eval as rvalue
+template<typename Generator, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorGeneratorOp<Generator, ArgType>, Device>
+{
+ typedef TensorGeneratorOp<Generator, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions Dimensions;
+ static const int NumDims = internal::array_size<Dimensions>::value;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_generator(op.generator())
+ {
+ TensorEvaluator<ArgType, Device> impl(op.expression(), device);
+ m_dimensions = impl.dimensions();
+
+ if (NumDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_strides[i] = m_strides[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ m_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_strides[i] = m_strides[i + 1] * m_dimensions[i + 1];
+ }
+ }
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ array<Index, NumDims> coords;
+ extract_coordinates(index, coords);
+ return m_generator(coords);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
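+  // Maps a flat index back to per-dimension coordinates using the precomputed
+  // strides. Worked example (column-major dims {3, 4}, so m_strides = {1, 3}):
+  // index 7 -> idx = 7 / 3 = 2, remainder 1, hence coords = {1, 2}.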
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void extract_coordinates(Index index, array<Index, NumDims>& coords) const {
+ if (NumDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_strides[i];
+ index -= idx * m_strides[i];
+ coords[i] = idx;
+ }
+ coords[0] = index;
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_strides[i];
+ index -= idx * m_strides[i];
+ coords[i] = idx;
+ }
+ coords[NumDims-1] = index;
+ }
+ }
+ }
+
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_strides;
+ Generator m_generator;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_GENERATOR_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h
new file mode 100644
index 0000000000..53dc0b04aa
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIO.h
@@ -0,0 +1,56 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_IO_H
+#define EIGEN_CXX11_TENSOR_TENSOR_IO_H
+
+namespace Eigen {
+
+namespace internal {
+template<>
+struct significant_decimals_impl<std::string>
+ : significant_decimals_default_impl<std::string, true>
+{};
+}
+
+
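+// Prints a tensor expression to a stream: the expression is first forced into
+// a temporary buffer, then a rank-0 tensor is printed as a scalar, a rank-1
+// tensor as a column vector, and higher ranks as a 2d matrix of shape
+// first_dim x (total_size / first_dim).
+//
+// Example (a sketch): a rank-3 tensor of shape {2, 3, 4} prints as a 2 x 12
+// matrix.
+//
+//   Eigen::Tensor<float, 3> t(2, 3, 4);
+//   t.setConstant(1.0f);
+//   std::cout << t << std::endl;
+//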
+template <typename T>
+std::ostream& operator << (std::ostream& os, const TensorBase<T, ReadOnlyAccessors>& expr) {
+ // Evaluate the expression if needed
+ TensorForcedEvalOp<const T> eval = expr.eval();
+ TensorEvaluator<const TensorForcedEvalOp<const T>, DefaultDevice> tensor(eval, DefaultDevice());
+ tensor.evalSubExprsIfNeeded(NULL);
+
+ typedef typename internal::remove_const<typename T::Scalar>::type Scalar;
+ typedef typename T::Index Index;
+ typedef typename TensorEvaluator<const TensorForcedEvalOp<const T>, DefaultDevice>::Dimensions Dimensions;
+ const Index total_size = internal::array_prod(tensor.dimensions());
+
+ // Print the tensor as a 1d vector or a 2d matrix.
+ static const int rank = internal::array_size<Dimensions>::value;
+ if (rank == 0) {
+ os << tensor.coeff(0);
+ } else if (rank == 1) {
+ Map<const Array<Scalar, Dynamic, 1> > array(const_cast<Scalar*>(tensor.data()), total_size);
+ os << array;
+ } else {
+ const Index first_dim = tensor.dimensions()[0];
+ static const int layout = TensorEvaluator<const TensorForcedEvalOp<const T>, DefaultDevice>::Layout;
+ Map<const Array<Scalar, Dynamic, Dynamic, layout> > matrix(const_cast<Scalar*>(tensor.data()), first_dim, total_size/first_dim);
+ os << matrix;
+ }
+
+ // Cleanup.
+ tensor.cleanup();
+ return os;
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_IO_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h
new file mode 100644
index 0000000000..a1d33d964e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorImagePatch.h
@@ -0,0 +1,757 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H
+#define EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H
+
+namespace Eigen {
+
+/** \class TensorImagePatch
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Patch extraction specialized for image processing.
+ * This assumes that the input has at least 3 dimensions ordered as follows:
+ *  1st dimension: channels (of size d)
+ *  2nd dimension: rows (of size r)
+ *  3rd dimension: columns (of size c)
+ * There can be additional dimensions such as time (for video) or batch (for
+ * bulk processing) after the first 3.
+ * Calling the image patch code with patch_rows and patch_cols is equivalent
+ * to calling the regular patch extraction code with parameters d, patch_rows,
+ * patch_cols, and 1 for all the additional dimensions.
+ */
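+//
+// A hedged usage sketch (it assumes a float input laid out as
+// {depth, rows, cols, batch} and the extract_image_patches() helper declared
+// on TensorBase, which builds the TensorImagePatchOp defined below):
+//
+//   Eigen::Tensor<float, 4> input(3, 128, 128, 16);   // d, r, c, batch
+//   input.setRandom();
+//   // 5x5 patches, stride 1, PADDING_VALID: the result has dimensions
+//   // {3, 5, 5, 124 * 124, 16} in column-major order.
+//   Eigen::Tensor<float, 5> patches =
+//       input.extract_image_patches(5, 5, 1, 1, Eigen::PADDING_VALID);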
+namespace internal {
+template<DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct traits<TensorImagePatchOp<Rows, Cols, XprType> > : public traits<XprType>
+{
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions + 1;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct eval<TensorImagePatchOp<Rows, Cols, XprType>, Eigen::Dense>
+{
+ typedef const TensorImagePatchOp<Rows, Cols, XprType>& type;
+};
+
+template<DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct nested<TensorImagePatchOp<Rows, Cols, XprType>, 1, typename eval<TensorImagePatchOp<Rows, Cols, XprType> >::type>
+{
+ typedef TensorImagePatchOp<Rows, Cols, XprType> type;
+};
+
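+// Copies a contiguous run of num_coeff_to_copy coefficients from the input
+// evaluator (starting at src_index) into the destination buffer (starting at
+// dst_index). The partial specialization below vectorizes the copy when the
+// scalar type supports packet access.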
+template <typename Self, bool Vectorizable>
+struct ImagePatchCopyOp {
+ typedef typename Self::Index Index;
+ typedef typename Self::Scalar Scalar;
+ typedef typename Self::Impl Impl;
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const Self& self, const Index num_coeff_to_copy, const Index dst_index,
+ Scalar* dst_data, const Index src_index) {
+ const Impl& impl = self.impl();
+ for (Index i = 0; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i] = impl.coeff(src_index + i);
+ }
+ }
+};
+
+template <typename Self>
+struct ImagePatchCopyOp<Self, true> {
+ typedef typename Self::Index Index;
+ typedef typename Self::Scalar Scalar;
+ typedef typename Self::Impl Impl;
+ typedef typename packet_traits<Scalar>::type Packet;
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const Self& self, const Index num_coeff_to_copy, const Index dst_index,
+ Scalar* dst_data, const Index src_index) {
+ const Impl& impl = self.impl();
+ const Index packet_size = internal::unpacket_traits<Packet>::size;
+ const Index vectorized_size = (num_coeff_to_copy / packet_size) *
+ packet_size;
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ Packet p = impl.template packet<Unaligned>(src_index + i);
+ internal::pstoret<Scalar, Packet, Unaligned>(dst_data + dst_index + i, p);
+ }
+ for (Index i = vectorized_size; i < num_coeff_to_copy; ++i) {
+ dst_data[dst_index + i] = impl.coeff(src_index + i);
+ }
+ }
+};
+
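+// Fills num_coeff_to_pad destination coefficients with padding_value, using
+// packet stores for the vectorizable prefix and a scalar loop for the tail.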
+template <typename Self>
+struct ImagePatchPaddingOp {
+ typedef typename Self::Index Index;
+ typedef typename Self::Scalar Scalar;
+ typedef typename packet_traits<Scalar>::type Packet;
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void Run(
+ const Index num_coeff_to_pad, const Scalar padding_value,
+ const Index dst_index, Scalar* dst_data) {
+ const Index packet_size = internal::unpacket_traits<Packet>::size;
+ const Packet padded_packet = internal::pset1<Packet>(padding_value);
+ const Index vectorized_size = (num_coeff_to_pad / packet_size) *
+ packet_size;
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ internal::pstoret<Scalar, Packet, Unaligned>(dst_data + dst_index + i,
+ padded_packet);
+ }
+ for (Index i = vectorized_size; i < num_coeff_to_pad; ++i) {
+ dst_data[dst_index + i] = padding_value;
+ }
+ }
+};
+
+} // end namespace internal
+
+template<DenseIndex Rows, DenseIndex Cols, typename XprType>
+class TensorImagePatchOp : public TensorBase<TensorImagePatchOp<Rows, Cols, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorImagePatchOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorImagePatchOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorImagePatchOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorImagePatchOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorImagePatchOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorImagePatchOp(const XprType& expr, DenseIndex patch_rows, DenseIndex patch_cols,
+ DenseIndex row_strides, DenseIndex col_strides,
+ DenseIndex in_row_strides, DenseIndex in_col_strides,
+ DenseIndex row_inflate_strides, DenseIndex col_inflate_strides,
+ PaddingType padding_type, Scalar padding_value)
+ : m_xpr(expr), m_patch_rows(patch_rows), m_patch_cols(patch_cols),
+ m_row_strides(row_strides), m_col_strides(col_strides),
+ m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides),
+ m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides),
+ m_padding_explicit(false), m_padding_top(0), m_padding_bottom(0), m_padding_left(0), m_padding_right(0),
+ m_padding_type(padding_type), m_padding_value(padding_value) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorImagePatchOp(const XprType& expr, DenseIndex patch_rows, DenseIndex patch_cols,
+ DenseIndex row_strides, DenseIndex col_strides,
+ DenseIndex in_row_strides, DenseIndex in_col_strides,
+ DenseIndex row_inflate_strides, DenseIndex col_inflate_strides,
+ DenseIndex padding_top, DenseIndex padding_bottom,
+ DenseIndex padding_left, DenseIndex padding_right,
+ Scalar padding_value)
+ : m_xpr(expr), m_patch_rows(patch_rows), m_patch_cols(patch_cols),
+ m_row_strides(row_strides), m_col_strides(col_strides),
+ m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides),
+ m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides),
+ m_padding_explicit(true), m_padding_top(padding_top), m_padding_bottom(padding_bottom),
+ m_padding_left(padding_left), m_padding_right(padding_right),
+ m_padding_type(PADDING_VALID), m_padding_value(padding_value) {}
+
+ EIGEN_DEVICE_FUNC
+ DenseIndex patch_rows() const { return m_patch_rows; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex patch_cols() const { return m_patch_cols; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex row_strides() const { return m_row_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex col_strides() const { return m_col_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex in_row_strides() const { return m_in_row_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex in_col_strides() const { return m_in_col_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex row_inflate_strides() const { return m_row_inflate_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex col_inflate_strides() const { return m_col_inflate_strides; }
+ EIGEN_DEVICE_FUNC
+ bool padding_explicit() const { return m_padding_explicit; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_top() const { return m_padding_top; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_bottom() const { return m_padding_bottom; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_left() const { return m_padding_left; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_right() const { return m_padding_right; }
+ EIGEN_DEVICE_FUNC
+ PaddingType padding_type() const { return m_padding_type; }
+ EIGEN_DEVICE_FUNC
+ Scalar padding_value() const { return m_padding_value; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const DenseIndex m_patch_rows;
+ const DenseIndex m_patch_cols;
+ const DenseIndex m_row_strides;
+ const DenseIndex m_col_strides;
+ const DenseIndex m_in_row_strides;
+ const DenseIndex m_in_col_strides;
+ const DenseIndex m_row_inflate_strides;
+ const DenseIndex m_col_inflate_strides;
+ const bool m_padding_explicit;
+ const DenseIndex m_padding_top;
+ const DenseIndex m_padding_bottom;
+ const DenseIndex m_padding_left;
+ const DenseIndex m_padding_right;
+ const PaddingType m_padding_type;
+ const Scalar m_padding_value;
+};
+
+// Eval as rvalue
+template<DenseIndex Rows, DenseIndex Cols, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorImagePatchOp<Rows, Cols, ArgType>, Device>
+{
+ typedef TensorImagePatchOp<Rows, Cols, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumInputDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ static const int NumDims = NumInputDims + 1;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef TensorEvaluator<const TensorImagePatchOp<Rows, Cols, ArgType>,
+ Device> Self;
+ typedef TensorEvaluator<ArgType, Device> Impl;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = true,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = NumDims == 5,
+ };
+
+ typedef typename internal::TensorBlock<Index, Scalar, NumDims, Layout>
+ OutputTensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ EIGEN_STATIC_ASSERT(NumDims >= 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ m_paddingValue = op.padding_value();
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+
+ // Caches a few variables.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputDepth = input_dims[0];
+ m_inputRows = input_dims[1];
+ m_inputCols = input_dims[2];
+ } else {
+ m_inputDepth = input_dims[NumInputDims-1];
+ m_inputRows = input_dims[NumInputDims-2];
+ m_inputCols = input_dims[NumInputDims-3];
+ }
+
+ m_row_strides = op.row_strides();
+ m_col_strides = op.col_strides();
+
+ // Input strides and effective input/patch size
+ m_in_row_strides = op.in_row_strides();
+ m_in_col_strides = op.in_col_strides();
+ m_row_inflate_strides = op.row_inflate_strides();
+ m_col_inflate_strides = op.col_inflate_strides();
+ // The "effective" input rows and input cols are the input rows and cols
+ // after inflating them with zeros.
+    // For example, a 2x3 matrix with row_inflate_strides and
+    // col_inflate_strides of 2:
+    //   A B C
+    //   D E F
+    //
+    // is inflated into the following 3 x 5 matrix:
+ //
+ // A . B . C
+ // . . . . .
+ // D . E . F
+
+ m_input_rows_eff = (m_inputRows - 1) * m_row_inflate_strides + 1;
+ m_input_cols_eff = (m_inputCols - 1) * m_col_inflate_strides + 1;
+ m_patch_rows_eff = op.patch_rows() + (op.patch_rows() - 1) * (m_in_row_strides - 1);
+ m_patch_cols_eff = op.patch_cols() + (op.patch_cols() - 1) * (m_in_col_strides - 1);
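+    // For the 2x3 example above with inflate strides of 2:
+    // m_input_rows_eff = (2 - 1) * 2 + 1 = 3 and
+    // m_input_cols_eff = (3 - 1) * 2 + 1 = 5, matching the 3 x 5 picture.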
+
+ if (op.padding_explicit()) {
+ m_outputRows = ceil((m_input_rows_eff + op.padding_top() + op.padding_bottom() - m_patch_rows_eff + 1.f) / static_cast<float>(m_row_strides));
+ m_outputCols = ceil((m_input_cols_eff + op.padding_left() + op.padding_right() - m_patch_cols_eff + 1.f) / static_cast<float>(m_col_strides));
+ m_rowPaddingTop = op.padding_top();
+ m_colPaddingLeft = op.padding_left();
+ } else {
+ // Computing padding from the type
+ switch (op.padding_type()) {
+ case PADDING_VALID:
+ m_outputRows = ceil((m_input_rows_eff - m_patch_rows_eff + 1.f) / static_cast<float>(m_row_strides));
+ m_outputCols = ceil((m_input_cols_eff - m_patch_cols_eff + 1.f) / static_cast<float>(m_col_strides));
+ // Calculate the padding
+ m_rowPaddingTop = ((m_outputRows - 1) * m_row_strides + m_patch_rows_eff - m_input_rows_eff) / 2;
+ m_colPaddingLeft = ((m_outputCols - 1) * m_col_strides + m_patch_cols_eff - m_input_cols_eff) / 2;
+ break;
+ case PADDING_SAME:
+ m_outputRows = ceil(m_input_rows_eff / static_cast<float>(m_row_strides));
+ m_outputCols = ceil(m_input_cols_eff / static_cast<float>(m_col_strides));
+ // Calculate the padding
+ m_rowPaddingTop = ((m_outputRows - 1) * m_row_strides + m_patch_rows_eff - m_input_rows_eff) / 2;
+ m_colPaddingLeft = ((m_outputCols - 1) * m_col_strides + m_patch_cols_eff - m_input_cols_eff) / 2;
+ break;
+ default:
+ eigen_assert(false && "unexpected padding");
+ }
+ }
+ eigen_assert(m_outputRows > 0);
+ eigen_assert(m_outputCols > 0);
+
+ // Dimensions for result of extraction.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ // ColMajor
+ // 0: depth
+ // 1: patch_rows
+ // 2: patch_cols
+ // 3: number of patches
+ // 4 and beyond: anything else (such as batch).
+ m_dimensions[0] = input_dims[0];
+ m_dimensions[1] = op.patch_rows();
+ m_dimensions[2] = op.patch_cols();
+ m_dimensions[3] = m_outputRows * m_outputCols;
+ for (int i = 4; i < NumDims; ++i) {
+ m_dimensions[i] = input_dims[i-1];
+ }
+ } else {
+ // RowMajor
+ // NumDims-1: depth
+ // NumDims-2: patch_rows
+ // NumDims-3: patch_cols
+ // NumDims-4: number of patches
+ // NumDims-5 and beyond: anything else (such as batch).
+ m_dimensions[NumDims-1] = input_dims[NumInputDims-1];
+ m_dimensions[NumDims-2] = op.patch_rows();
+ m_dimensions[NumDims-3] = op.patch_cols();
+ m_dimensions[NumDims-4] = m_outputRows * m_outputCols;
+ for (int i = NumDims-5; i >= 0; --i) {
+ m_dimensions[i] = input_dims[i];
+ }
+ }
+
+ // Strides for moving the patch in various dimensions.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_colStride = m_dimensions[1];
+ m_patchStride = m_colStride * m_dimensions[2] * m_dimensions[0];
+ m_otherStride = m_patchStride * m_dimensions[3];
+ } else {
+ m_colStride = m_dimensions[NumDims-2];
+ m_patchStride = m_colStride * m_dimensions[NumDims-3] * m_dimensions[NumDims-1];
+ m_otherStride = m_patchStride * m_dimensions[NumDims-4];
+ }
+
+ // Strides for navigating through the input tensor.
+ m_rowInputStride = m_inputDepth;
+ m_colInputStride = m_inputDepth * m_inputRows;
+ m_patchInputStride = m_inputDepth * m_inputRows * m_inputCols;
+
+ // Fast representations of different variables.
+ m_fastOtherStride = internal::TensorIntDivisor<Index>(m_otherStride);
+ m_fastPatchStride = internal::TensorIntDivisor<Index>(m_patchStride);
+ m_fastColStride = internal::TensorIntDivisor<Index>(m_colStride);
+ m_fastInputRowStride = internal::TensorIntDivisor<Index>(m_row_inflate_strides);
+ m_fastInputColStride = internal::TensorIntDivisor<Index>(m_col_inflate_strides);
+ m_fastInputColsEff = internal::TensorIntDivisor<Index>(m_input_cols_eff);
+
+    // Fast divisor for the number of patch positions along the row dimension
+    // (m_outputRows), used to split a patch index into row and column indices.
+ m_fastOutputRows = internal::TensorIntDivisor<Index>(m_outputRows);
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_fastOutputDepth = internal::TensorIntDivisor<Index>(m_dimensions[0]);
+ } else {
+ m_fastOutputDepth = internal::TensorIntDivisor<Index>(m_dimensions[NumDims-1]);
+ }
+
+ m_block_total_size_max = numext::maxi(static_cast<std::size_t>(1),
+ device.lastLevelCacheSize() /
+ sizeof(Scalar));
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ // Patch index corresponding to the passed in index.
+ const Index patchIndex = index / m_fastPatchStride;
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchOffset = (index - patchIndex * m_patchStride) / m_fastOutputDepth;
+
+ // Other ways to index this element.
+ const Index otherIndex = (NumDims == 4) ? 0 : index / m_fastOtherStride;
+ const Index patch2DIndex = (NumDims == 4) ? patchIndex : (index - otherIndex * m_otherStride) / m_fastPatchStride;
+
+ // Calculate col index in the input original tensor.
+ const Index colIndex = patch2DIndex / m_fastOutputRows;
+ const Index colOffset = patchOffset / m_fastColStride;
+ const Index inputCol = colIndex * m_col_strides + colOffset * m_in_col_strides - m_colPaddingLeft;
+ const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0);
+ if (inputCol < 0 || inputCol >= m_input_cols_eff ||
+ ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ // Calculate row index in the original input tensor.
+ const Index rowIndex = patch2DIndex - colIndex * m_outputRows;
+ const Index rowOffset = patchOffset - colOffset * m_colStride;
+ const Index inputRow = rowIndex * m_row_strides + rowOffset * m_in_row_strides - m_rowPaddingTop;
+ const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0);
+ if (inputRow < 0 || inputRow >= m_input_rows_eff ||
+ ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ const int depth_index = static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : NumDims - 1;
+ const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index];
+
+ const Index inputIndex = depth + origInputRow * m_rowInputStride + origInputCol * m_colInputStride + otherIndex * m_patchInputStride;
+ return m_impl.coeff(inputIndex);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const Index packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ if (m_in_row_strides != 1 || m_in_col_strides != 1 || m_row_inflate_strides != 1 || m_col_inflate_strides != 1) {
+ return packetWithPossibleZero(index);
+ }
+
+ const Index indices[2] = {index, index + packetSize - 1};
+ const Index patchIndex = indices[0] / m_fastPatchStride;
+ if (patchIndex != indices[1] / m_fastPatchStride) {
+ return packetWithPossibleZero(index);
+ }
+ const Index otherIndex = (NumDims == 4) ? 0 : indices[0] / m_fastOtherStride;
+ eigen_assert(otherIndex == indices[1] / m_fastOtherStride);
+
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchOffsets[2] = {(indices[0] - patchIndex * m_patchStride) / m_fastOutputDepth,
+ (indices[1] - patchIndex * m_patchStride) / m_fastOutputDepth};
+
+ const Index patch2DIndex = (NumDims == 4) ? patchIndex : (indices[0] - otherIndex * m_otherStride) / m_fastPatchStride;
+ eigen_assert(patch2DIndex == (indices[1] - otherIndex * m_otherStride) / m_fastPatchStride);
+
+ const Index colIndex = patch2DIndex / m_fastOutputRows;
+ const Index colOffsets[2] = {patchOffsets[0] / m_fastColStride, patchOffsets[1] / m_fastColStride};
+
+ // Calculate col indices in the original input tensor.
+ const Index inputCols[2] = {colIndex * m_col_strides + colOffsets[0] -
+ m_colPaddingLeft, colIndex * m_col_strides + colOffsets[1] - m_colPaddingLeft};
+ if (inputCols[1] < 0 || inputCols[0] >= m_inputCols) {
+ return internal::pset1<PacketReturnType>(Scalar(m_paddingValue));
+ }
+
+ if (inputCols[0] == inputCols[1]) {
+ const Index rowIndex = patch2DIndex - colIndex * m_outputRows;
+ const Index rowOffsets[2] = {patchOffsets[0] - colOffsets[0]*m_colStride, patchOffsets[1] - colOffsets[1]*m_colStride};
+ eigen_assert(rowOffsets[0] <= rowOffsets[1]);
+ // Calculate col indices in the original input tensor.
+ const Index inputRows[2] = {rowIndex * m_row_strides + rowOffsets[0] -
+ m_rowPaddingTop, rowIndex * m_row_strides + rowOffsets[1] - m_rowPaddingTop};
+
+ if (inputRows[1] < 0 || inputRows[0] >= m_inputRows) {
+ return internal::pset1<PacketReturnType>(Scalar(m_paddingValue));
+ }
+
+ if (inputRows[0] >= 0 && inputRows[1] < m_inputRows) {
+ // no padding
+ const int depth_index = static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : NumDims - 1;
+ const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index];
+ const Index inputIndex = depth + inputRows[0] * m_rowInputStride + inputCols[0] * m_colInputStride + otherIndex * m_patchInputStride;
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+ }
+
+ return packetWithPossibleZero(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ resources->push_back(internal::TensorOpResourceRequirements(
+ internal::kSkewedInnerDims, m_block_total_size_max));
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ OutputTensorBlock* output_block) const {
+ typedef typename internal::ImagePatchCopyOp<Self, PacketAccess>
+ ImagePatchCopyOp;
+ typedef typename internal::ImagePatchPaddingOp<Self> ImagePatchPaddingOp;
+
+ // Calculate loop limits and various input/output dim sizes.
+ const DSizes<Index, NumDims>& block_sizes = output_block->block_sizes();
+ const bool col_major =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor);
+ const Index depth_dim_size = block_sizes[col_major ? 0 : NumDims - 1];
+ const Index output_depth_dim_size = m_dimensions[
+ col_major ? 0 : NumDims - 1];
+ const Index row_dim_size = block_sizes[col_major ? 1 : NumDims - 2];
+ const Index output_row_dim_size = m_dimensions[col_major ? 1 : NumDims - 2];
+ const Index col_dim_size = block_sizes[col_major ? 2 : NumDims - 3];
+ const Index block_col_stride = row_dim_size * depth_dim_size;
+ const Index patch_index_dim_size = block_sizes[col_major ? 3 : NumDims - 4];
+ const Index outer_dim_size = block_sizes.TotalSize() /
+ (depth_dim_size * row_dim_size * col_dim_size * patch_index_dim_size);
+
+ const Index patch_size = row_dim_size * col_dim_size * depth_dim_size;
+ const Index batch_size = patch_size * patch_index_dim_size;
+
+ Index output_index = output_block->first_coeff_index();
+
+ // Loop through outer dimensions.
+ for (Index outer_dim_index = 0;
+ outer_dim_index < outer_dim_size;
+ ++outer_dim_index) {
+ const Index outer_output_base_index = outer_dim_index * batch_size;
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchIndexStart = output_index / m_fastPatchStride;
+ const Index patchOffset =
+ (output_index - patchIndexStart * m_patchStride) / m_fastOutputDepth;
+ const Index colOffsetStart = patchOffset / m_fastColStride;
+ // Other ways to index this element.
+ const Index otherIndex = (NumDims == 4) ?
+ 0 : output_index / m_fastOtherStride;
+ const Index patch2DIndexStart = (NumDims == 4) ?
+ 0 : (output_index - otherIndex * m_otherStride) / m_fastPatchStride;
+ // Calculate starting depth index.
+ const Index depth = output_index - (output_index / m_fastOutputDepth) *
+ output_depth_dim_size;
+ const Index patch_input_base_index = depth + otherIndex *
+ m_patchInputStride;
+
+ // Loop through patches.
+ for (Index patch_index_dim_index = 0;
+ patch_index_dim_index < patch_index_dim_size;
+ ++patch_index_dim_index) {
+ const Index patch_output_base_index = outer_output_base_index +
+ patch_index_dim_index * patch_size;
+ // Patch index corresponding to the passed in index.
+ const Index patchIndex = patchIndexStart + patch_index_dim_index;
+ const Index patch2DIndex = (NumDims == 4) ?
+ patchIndex : patch2DIndexStart + patch_index_dim_index;
+ const Index colIndex = patch2DIndex / m_fastOutputRows;
+ const Index input_col_base = colIndex * m_col_strides;
+ const Index row_offset_base = (patch2DIndex - colIndex * m_outputRows) *
+ m_row_strides - m_rowPaddingTop;
+
+ // Loop through columns.
+ for (Index col_dim_index = 0;
+ col_dim_index < col_dim_size;
+ ++col_dim_index) {
+ const Index col_output_base_index = patch_output_base_index +
+ col_dim_index * block_col_stride;
+
+ // Calculate col index in the input original tensor.
+ Index colOffset = colOffsetStart + col_dim_index;
+ Index inputCol = input_col_base + colOffset * m_in_col_strides -
+ m_colPaddingLeft;
+ Index origInputCol = (m_col_inflate_strides == 1) ?
+ inputCol : ((inputCol >= 0) ?
+ (inputCol / m_fastInputColStride) : 0);
+
+ bool pad_column = false;
+ if (inputCol < 0 || inputCol >= m_input_cols_eff ||
+ ((m_col_inflate_strides != 1) &&
+ (inputCol != origInputCol * m_col_inflate_strides))) {
+ pad_column = true;
+ }
+
+ const Index col_input_base_index = patch_input_base_index +
+ origInputCol * m_colInputStride;
+ const Index input_row_base = row_offset_base +
+ ((patchOffset + col_dim_index * output_row_dim_size) -
+ colOffset * m_colStride) * m_in_row_strides;
+ // Loop through rows.
+ for (Index row_dim_index = 0;
+ row_dim_index < row_dim_size;
+ ++row_dim_index) {
+ const Index output_base_index = col_output_base_index +
+ row_dim_index * depth_dim_size;
+ bool pad_row = false;
+ Index inputIndex;
+ if (!pad_column) {
+ Index inputRow = input_row_base + row_dim_index *
+ m_in_row_strides;
+ Index origInputRow = (m_row_inflate_strides == 1) ?
+ inputRow : ((inputRow >= 0) ?
+ (inputRow / m_fastInputRowStride) : 0);
+ if (inputRow < 0 || inputRow >= m_input_rows_eff ||
+ ((m_row_inflate_strides != 1) &&
+ (inputRow != origInputRow * m_row_inflate_strides))) {
+ pad_row = true;
+ } else {
+ inputIndex = col_input_base_index + origInputRow *
+ m_rowInputStride;
+ }
+ }
+ // Copy (or pad) along depth dimension.
+ if (pad_column || pad_row) {
+ ImagePatchPaddingOp::Run(depth_dim_size, Scalar(m_paddingValue),
+ output_base_index, output_block->data());
+ } else {
+ ImagePatchCopyOp::Run(*this, depth_dim_size,
+ output_base_index, output_block->data(),
+ inputIndex);
+ }
+ }
+ }
+ }
+ output_index += m_otherStride;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ const TensorEvaluator<ArgType, Device>& impl() const { return m_impl; }
+
+ Index rowPaddingTop() const { return m_rowPaddingTop; }
+ Index colPaddingLeft() const { return m_colPaddingLeft; }
+ Index outputRows() const { return m_outputRows; }
+ Index outputCols() const { return m_outputCols; }
+ Index userRowStride() const { return m_row_strides; }
+ Index userColStride() const { return m_col_strides; }
+ Index userInRowStride() const { return m_in_row_strides; }
+ Index userInColStride() const { return m_in_col_strides; }
+ Index rowInflateStride() const { return m_row_inflate_strides; }
+ Index colInflateStride() const { return m_col_inflate_strides; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, NumDims>& coords) const
+ {
+ // Location of the first element of the patch.
+ // ColMajor
+ // 0: d, 1: patch_rows, 2: patch_cols, 3: number of patches, 4: number of batches
+ // RowMajor
+ // 0: number of batches, 1: number of patches, 2: patch_cols , 3: patch_rows, 4: d
+ const Index patch2DIndex = coords[static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 3 : 1];
+
+ array<Index, NumDims-1> inputCoords;
+ Index input_col_idx = patch2DIndex / m_fastInputColsEff;
+ Index inputCol = input_col_idx + coords[1] * m_in_row_strides - m_rowPaddingTop;
+ Index inputRow = patch2DIndex - input_col_idx * m_input_cols_eff + coords[2] * m_in_col_strides - m_colPaddingLeft;
+ const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0);
+ const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0);
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputCoords[0] = coords[0]; // depth
+ inputCoords[1] = origInputCol;
+ inputCoords[2] = origInputRow;
+ inputCoords[3] = coords[4]; // batch
+ } else {
+ inputCoords[3] = coords[4]; // depth
+ inputCoords[2] = origInputCol;
+ inputCoords[1] = origInputRow;
+ inputCoords[0] = coords[0]; // batch
+ }
+ // If the computed coordinates are outside the original image perimeter, return 0.
+ if (inputCol < 0 || inputCol >= m_input_cols_eff || inputRow < 0 || inputRow >= m_input_rows_eff ||
+ ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides)) ||
+ ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+ if (TensorEvaluator<ArgType, Device>::CoordAccess) {
+ return m_impl.coeff(inputCoords);
+ } else {
+ Index inputIndex;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputIndex =
+ inputCoords[3] * m_patchInputStride +
+ inputCoords[2] * m_colInputStride +
+ inputCoords[1] * m_rowInputStride +
+ inputCoords[0];
+ } else {
+ inputIndex =
+ inputCoords[1] * m_patchInputStride +
+ inputCoords[2] * m_colInputStride +
+ inputCoords[3] * m_rowInputStride +
+ inputCoords[4];
+ }
+ return m_impl.coeff(inputIndex);
+ }
+ }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ Dimensions m_dimensions;
+
+ Index m_otherStride;
+ Index m_patchStride;
+ Index m_colStride;
+ Index m_row_strides;
+ Index m_col_strides;
+
+ Index m_in_row_strides;
+ Index m_in_col_strides;
+ Index m_row_inflate_strides;
+ Index m_col_inflate_strides;
+
+ Index m_input_rows_eff;
+ Index m_input_cols_eff;
+ Index m_patch_rows_eff;
+ Index m_patch_cols_eff;
+
+ internal::TensorIntDivisor<Index> m_fastOtherStride;
+ internal::TensorIntDivisor<Index> m_fastPatchStride;
+ internal::TensorIntDivisor<Index> m_fastColStride;
+ internal::TensorIntDivisor<Index> m_fastInputRowStride;
+ internal::TensorIntDivisor<Index> m_fastInputColStride;
+ internal::TensorIntDivisor<Index> m_fastInputColsEff;
+
+ Index m_rowInputStride;
+ Index m_colInputStride;
+ Index m_patchInputStride;
+
+ Index m_inputDepth;
+ Index m_inputRows;
+ Index m_inputCols;
+
+ Index m_outputRows;
+ Index m_outputCols;
+
+ Index m_rowPaddingTop;
+ Index m_colPaddingLeft;
+
+ internal::TensorIntDivisor<Index> m_fastOutputRows;
+ internal::TensorIntDivisor<Index> m_fastOutputDepth;
+
+ Scalar m_paddingValue;
+ std::size_t m_block_total_size_max;
+
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_IMAGE_PATCH_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h
new file mode 100644
index 0000000000..7631b54f2f
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h
@@ -0,0 +1,421 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H
+#define EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H
+
+#if defined(EIGEN_HAS_CONSTEXPR) && defined(EIGEN_HAS_VARIADIC_TEMPLATES)
+
+#define EIGEN_HAS_INDEX_LIST
+
+namespace Eigen {
+
+/** \internal
+ *
+ * \class TensorIndexList
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Set of classes used to encode a set of Tensor dimensions/indices.
+ *
+ * The indices in the list can be known at compile time or at runtime. A mix
+ * of static and dynamic indices can also be provided if needed. The tensor
+ * code will attempt to take advantage of the indices that are known at
+ * compile time to optimize the code it generates.
+ *
+ * This functionality requires a C++11-compliant compiler. If your compiler
+ * is older you need to use arrays of indices instead.
+ *
+ * Several examples are provided in the cxx11_tensor_index_list.cpp file.
+ *
+ * \sa Tensor
+ */
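+//
+// A brief illustration (a sketch only; the full examples live in
+// cxx11_tensor_index_list.cpp as mentioned above, and "tensor" is assumed to
+// be a rank-3 float tensor):
+//
+//   // First reduction dimension fixed at compile time to 0, the second one
+//   // known only at runtime.
+//   Eigen::IndexList<Eigen::type2index<0>, int> reduction_dims;
+//   reduction_dims.set(1, 2);
+//   Eigen::Tensor<float, 1> sums = tensor.sum(reduction_dims);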
+
+template <DenseIndex n>
+struct type2index {
+ static const DenseIndex value = n;
+ constexpr operator DenseIndex() const { return n; }
+ void set(DenseIndex val) {
+ eigen_assert(val == n);
+ }
+};
+
+namespace internal {
+template <typename T>
+void update_value(T& val, DenseIndex new_val) {
+ val = new_val;
+}
+template <DenseIndex n>
+void update_value(type2index<n>& val, DenseIndex new_val) {
+ val.set(new_val);
+}
+
+template <typename T>
+struct is_compile_time_constant {
+ static constexpr bool value = false;
+};
+
+template <DenseIndex idx>
+struct is_compile_time_constant<type2index<idx> > {
+ static constexpr bool value = true;
+};
+template <DenseIndex idx>
+struct is_compile_time_constant<const type2index<idx> > {
+ static constexpr bool value = true;
+};
+template <DenseIndex idx>
+struct is_compile_time_constant<type2index<idx>& > {
+ static constexpr bool value = true;
+};
+template <DenseIndex idx>
+struct is_compile_time_constant<const type2index<idx>& > {
+ static constexpr bool value = true;
+};
+
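+// Recursive helper that gives runtime access to the elements of a std::tuple:
+// get/set dispatch on the runtime index i by walking the tuple from element
+// Idx down to 0, while the *_statically_* queries report what is known at
+// compile time.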
+template <DenseIndex Idx>
+struct tuple_coeff {
+ template <typename... T>
+ static constexpr DenseIndex get(const DenseIndex i, const std::tuple<T...>& t) {
+ return std::get<Idx>(t) * (i == Idx) + tuple_coeff<Idx-1>::get(i, t) * (i != Idx);
+ }
+ template <typename... T>
+ static void set(const DenseIndex i, std::tuple<T...>& t, const DenseIndex value) {
+ if (i == Idx) {
+ update_value(std::get<Idx>(t), value);
+ } else {
+ tuple_coeff<Idx-1>::set(i, t, value);
+ }
+ }
+
+ template <typename... T>
+ static constexpr bool value_known_statically(const DenseIndex i, const std::tuple<T...>& t) {
+ return ((i == Idx) & is_compile_time_constant<typename std::tuple_element<Idx, std::tuple<T...> >::type>::value) ||
+ tuple_coeff<Idx-1>::value_known_statically(i, t);
+ }
+
+ template <typename... T>
+ static constexpr bool values_up_to_known_statically(const std::tuple<T...>& t) {
+ return is_compile_time_constant<typename std::tuple_element<Idx, std::tuple<T...> >::type>::value &&
+ tuple_coeff<Idx-1>::values_up_to_known_statically(t);
+ }
+
+ template <typename... T>
+ static constexpr bool values_up_to_statically_known_to_increase(const std::tuple<T...>& t) {
+ return is_compile_time_constant<typename std::tuple_element<Idx, std::tuple<T...> >::type>::value &&
+ is_compile_time_constant<typename std::tuple_element<Idx-1, std::tuple<T...> >::type>::value &&
+ std::get<Idx>(t) > std::get<Idx-1>(t) &&
+ tuple_coeff<Idx-1>::values_up_to_statically_known_to_increase(t);
+ }
+};
+
+template <>
+struct tuple_coeff<0> {
+ template <typename... T>
+ static constexpr DenseIndex get(const DenseIndex i, const std::tuple<T...>& t) {
+ // eigen_assert (i == 0); // gcc fails to compile assertions in constexpr
+ return std::get<0>(t) * (i == 0);
+ }
+ template <typename... T>
+ static void set(const DenseIndex i, std::tuple<T...>& t, const DenseIndex value) {
+ eigen_assert (i == 0);
+ update_value(std::get<0>(t), value);
+ }
+ template <typename... T>
+ static constexpr bool value_known_statically(const DenseIndex i, const std::tuple<T...>& t) {
+ // eigen_assert (i == 0); // gcc fails to compile assertions in constexpr
+ return is_compile_time_constant<typename std::tuple_element<0, std::tuple<T...> >::type>::value & (i == 0);
+ }
+
+ template <typename... T>
+ static constexpr bool values_up_to_known_statically(const std::tuple<T...>& t) {
+ return is_compile_time_constant<typename std::tuple_element<0, std::tuple<T...> >::type>::value;
+ }
+
+ template <typename... T>
+ static constexpr bool values_up_to_statically_known_to_increase(const std::tuple<T...>& t) {
+ return true;
+ }
+};
+} // namespace internal
+
+
+template<typename FirstType, typename... OtherTypes>
+struct IndexList : std::tuple<FirstType, OtherTypes...> {
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC constexpr DenseIndex operator[] (const DenseIndex i) const {
+ return internal::tuple_coeff<std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value-1>::get(i, *this);
+ }
+ EIGEN_STRONG_INLINE EIGEN_DEVICE_FUNC void set(const DenseIndex i, const DenseIndex value) {
+ return internal::tuple_coeff<std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value-1>::set(i, *this, value);
+ }
+
+ constexpr IndexList(const std::tuple<FirstType, OtherTypes...>& other) : std::tuple<FirstType, OtherTypes...>(other) { }
+ constexpr IndexList() : std::tuple<FirstType, OtherTypes...>() { }
+
+ constexpr bool value_known_statically(const DenseIndex i) const {
+ return internal::tuple_coeff<std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value-1>::value_known_statically(i, *this);
+ }
+ constexpr bool all_values_known_statically() const {
+ return internal::tuple_coeff<std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value-1>::values_up_to_known_statically(*this);
+ }
+
+ constexpr bool values_statically_known_to_increase() const {
+ return internal::tuple_coeff<std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value-1>::values_up_to_statically_known_to_increase(*this);
+ }
+};
+
+
+template<typename FirstType, typename... OtherTypes>
+constexpr IndexList<FirstType, OtherTypes...> make_index_list(FirstType val1, OtherTypes... other_vals) {
+ return std::make_tuple(val1, other_vals...);
+}
+
+
+namespace internal {
+
+template<typename FirstType, typename... OtherTypes> size_t array_prod(const IndexList<FirstType, OtherTypes...>& sizes) {
+ size_t result = 1;
+ for (int i = 0; i < array_size<IndexList<FirstType, OtherTypes...> >::value; ++i) {
+ result *= sizes[i];
+ }
+ return result;
+}
+
+template<typename FirstType, typename... OtherTypes> struct array_size<IndexList<FirstType, OtherTypes...> > {
+ static const size_t value = std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value;
+};
+template<typename FirstType, typename... OtherTypes> struct array_size<const IndexList<FirstType, OtherTypes...> > {
+ static const size_t value = std::tuple_size<std::tuple<FirstType, OtherTypes...> >::value;
+};
+
+template<DenseIndex n, typename FirstType, typename... OtherTypes> constexpr DenseIndex array_get(IndexList<FirstType, OtherTypes...>& a) {
+ return std::get<n>(a);
+}
+template<DenseIndex n, typename FirstType, typename... OtherTypes> constexpr DenseIndex array_get(const IndexList<FirstType, OtherTypes...>& a) {
+ return std::get<n>(a);
+}
+
+template <typename T>
+struct index_known_statically {
+ constexpr bool operator() (DenseIndex) const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_known_statically<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i);
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_known_statically<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i);
+ }
+};
+
+template <typename T>
+struct all_indices_known_statically {
+ constexpr bool operator() () const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct all_indices_known_statically<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() () const {
+ return IndexList<FirstType, OtherTypes...>().all_values_known_statically();
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct all_indices_known_statically<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() () const {
+ return IndexList<FirstType, OtherTypes...>().all_values_known_statically();
+ }
+};
+
+template <typename T>
+struct indices_statically_known_to_increase {
+ constexpr bool operator() () const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct indices_statically_known_to_increase<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() () const {
+ return IndexList<FirstType, OtherTypes...>().values_statically_known_to_increase();
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct indices_statically_known_to_increase<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() () const {
+ return IndexList<FirstType, OtherTypes...>().values_statically_known_to_increase();
+ }
+};
+
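+// The index_statically_eq/ne/gt/lt helpers below return true only when the
+// i-th index is known at compile time and satisfies the comparison; for plain
+// index types they conservatively return false.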
+template <typename Tx>
+struct index_statically_eq {
+ constexpr bool operator() (DenseIndex, DenseIndex) const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_eq<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] == value;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_eq<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] == value;
+ }
+};
+
+template <typename T>
+struct index_statically_ne {
+ constexpr bool operator() (DenseIndex, DenseIndex) const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_ne<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] != value;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_ne<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] != value;
+ }
+};
+
+
+template <typename T>
+struct index_statically_gt {
+ constexpr bool operator() (DenseIndex, DenseIndex) const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_gt<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] > value;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_gt<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] > value;
+ }
+};
+
+template <typename T>
+struct index_statically_lt {
+ constexpr bool operator() (DenseIndex, DenseIndex) const {
+ return false;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_lt<IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] < value;
+ }
+};
+
+template <typename FirstType, typename... OtherTypes>
+struct index_statically_lt<const IndexList<FirstType, OtherTypes...> > {
+ constexpr bool operator() (const DenseIndex i, const DenseIndex value) const {
+ return IndexList<FirstType, OtherTypes...>().value_known_statically(i) &&
+ IndexList<FirstType, OtherTypes...>()[i] < value;
+ }
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#else
+
+namespace Eigen {
+namespace internal {
+
+// No C++11 support
+template <typename T>
+struct index_known_statically {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() (DenseIndex) const{
+ return false;
+ }
+};
+
+template <typename T>
+struct all_indices_known_statically {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() () const {
+ return false;
+ }
+};
+
+template <typename T>
+struct indices_statically_known_to_increase {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() () const {
+ return false;
+ }
+};
+
+template <typename T>
+struct index_statically_eq {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() (DenseIndex, DenseIndex) const{
+ return false;
+ }
+};
+
+template <typename T>
+struct index_statically_ne {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() (DenseIndex, DenseIndex) const{
+ return false;
+ }
+};
+
+template <typename T>
+struct index_statically_gt {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() (DenseIndex, DenseIndex) const{
+ return false;
+ }
+};
+
+template <typename T>
+struct index_statically_lt {
+ EIGEN_ALWAYS_INLINE EIGEN_DEVICE_FUNC bool operator() (DenseIndex, DenseIndex) const{
+ return false;
+ }
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_INDEX_LIST_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h
new file mode 100644
index 0000000000..40a50e4662
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInflation.h
@@ -0,0 +1,219 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Ke Yang <yangke@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H
+
+namespace Eigen {
+
+/** \class TensorInflation
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor inflation class.
+ *
+ *
+ */
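+// Illustrative note: inflation by a stride s inserts s-1 zero coefficients
+// between consecutive input coefficients along each dimension, so a dimension
+// of size d grows to (d-1)*s + 1. For example, inflating the 1-d tensor
+// [1, 2, 3] with stride 2 yields [1, 0, 2, 0, 3] under the mapping implemented
+// by the evaluator below.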
+namespace internal {
+template<typename Strides, typename XprType>
+struct traits<TensorInflationOp<Strides, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename Strides, typename XprType>
+struct eval<TensorInflationOp<Strides, XprType>, Eigen::Dense>
+{
+ typedef const TensorInflationOp<Strides, XprType>& type;
+};
+
+template<typename Strides, typename XprType>
+struct nested<TensorInflationOp<Strides, XprType>, 1, typename eval<TensorInflationOp<Strides, XprType> >::type>
+{
+ typedef TensorInflationOp<Strides, XprType> type;
+};
+
+} // end namespace internal
+
+template<typename Strides, typename XprType>
+class TensorInflationOp : public TensorBase<TensorInflationOp<Strides, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorInflationOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorInflationOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorInflationOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorInflationOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorInflationOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorInflationOp(const XprType& expr, const Strides& strides)
+ : m_xpr(expr), m_strides(strides) {}
+
+ EIGEN_DEVICE_FUNC
+ const Strides& strides() const { return m_strides; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Strides m_strides;
+};
+
+// Eval as rvalue
+template<typename Strides, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorInflationOp<Strides, ArgType>, Device>
+{
+ typedef TensorInflationOp<Strides, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_strides(op.strides())
+ {
+ m_dimensions = m_impl.dimensions();
+ // Expand each dimension to the inflated dimension.
+ for (int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] = (m_dimensions[i] - 1) * op.strides()[i] + 1;
+ }
+
+ // Remember the strides for fast division.
+ for (int i = 0; i < NumDims; ++i) {
+ m_fastStrides[i] = internal::TensorIntDivisor<Index>(m_strides[i]);
+ }
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_outputStrides[0] = 1;
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ }
+ } else { // RowMajor
+ m_outputStrides[NumDims-1] = 1;
+ m_inputStrides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1];
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ }
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ // Computes the input index given the output index. Returns true if the output
+ // index doesn't fall into a hole.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool getInputIndex(Index index, Index* inputIndex) const
+ {
+ eigen_assert(index < dimensions().TotalSize());
+ *inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ if (idx != idx / m_fastStrides[i] * m_strides[i]) {
+ return false;
+ }
+ *inputIndex += idx / m_strides[i] * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ if (index != index / m_fastStrides[0] * m_strides[0]) {
+ return false;
+ }
+ *inputIndex += index / m_strides[0];
+ return true;
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i];
+ if (idx != idx / m_fastStrides[i] * m_strides[i]) {
+ return false;
+ }
+ *inputIndex += idx / m_strides[i] * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ if (index != index / m_fastStrides[NumDims-1] * m_strides[NumDims-1]) {
+ return false;
+ }
+ *inputIndex += index / m_strides[NumDims - 1];
+ }
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ Index inputIndex = 0;
+ if (getInputIndex(index, &inputIndex)) {
+ return m_impl.coeff(inputIndex);
+ } else {
+ return Scalar(0);
+ }
+ }
+
+ // TODO(yangke): optimize this function so that we can detect and produce
+ // all-zero packets
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ const Strides m_strides;
+ array<internal::TensorIntDivisor<Index>, NumDims> m_fastStrides;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_INFLATION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h
new file mode 100644
index 0000000000..375c763152
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorInitializer.h
@@ -0,0 +1,82 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H
+#define EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+
+#include <initializer_list>
+
+namespace Eigen {
+
+/** \class TensorInitializer
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Helper template to initialize Tensors from std::initializer_lists.
+ */
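+// Illustrative sketch (the setValues() entry point is assumed here for
+// illustration): this helper backs initialization from nested
+// std::initializer_lists, one nesting level per tensor dimension, e.g.
+//
+//   Eigen::Tensor<float, 2> t(2, 2);
+//   t.setValues({{1.0f, 2.0f}, {3.0f, 4.0f}});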
+namespace internal {
+
+template <typename Derived, int N>
+struct Initializer {
+ typedef std::initializer_list<
+ typename Initializer<Derived, N - 1>::InitList> InitList;
+
+ static void run(TensorEvaluator<Derived, DefaultDevice>& tensor,
+ Eigen::array<typename traits<Derived>::Index, traits<Derived>::NumDimensions>* indices,
+ const InitList& vals) {
+ int i = 0;
+ for (auto v : vals) {
+ (*indices)[traits<Derived>::NumDimensions - N] = i++;
+ Initializer<Derived, N - 1>::run(tensor, indices, v);
+ }
+ }
+};
+
+template <typename Derived>
+struct Initializer<Derived, 1> {
+ typedef std::initializer_list<typename traits<Derived>::Scalar> InitList;
+
+ static void run(TensorEvaluator<Derived, DefaultDevice>& tensor,
+ Eigen::array<typename traits<Derived>::Index, traits<Derived>::NumDimensions>* indices,
+ const InitList& vals) {
+ int i = 0;
+ // There is likely a faster way to do that than iterating.
+ for (auto v : vals) {
+ (*indices)[traits<Derived>::NumDimensions - 1] = i++;
+ tensor.coeffRef(*indices) = v;
+ }
+ }
+};
+
+template <typename Derived>
+struct Initializer<Derived, Dynamic> {
+ typedef std::initializer_list<typename traits<Derived>::Scalar> InitList;
+
+ static void run(TensorEvaluator<Derived, DefaultDevice>& tensor,
+ Eigen::array<typename traits<Derived>::Index, traits<Derived>::NumDimensions>* indices,
+ const InitList& vals) {
+ // Static initialization not implemented for VarDims tensors.
+ eigen_assert(false);
+ }
+};
+
+template <typename Derived, int N>
+void initialize_tensor(TensorEvaluator<Derived, DefaultDevice>& tensor,
+ const typename Initializer<Derived, traits<Derived>::NumDimensions>::InitList& vals) {
+ Eigen::array<typename traits<Derived>::Index, traits<Derived>::NumDimensions> indices;
+ Initializer<Derived, traits<Derived>::NumDimensions>::run(tensor, &indices, vals);
+}
+
+} // namespace internal
+} // namespace Eigen
+
+#endif // EIGEN_HAS_VARIADIC_TEMPLATES
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_INITIALIZER_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h
new file mode 100644
index 0000000000..3e90b08c99
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIntDiv.h
@@ -0,0 +1,357 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H
+#define EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H
+
+
+namespace Eigen {
+
+/** \internal
+ *
+ * \class TensorIntDiv
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Fast integer division by a constant.
+ *
+ * See the paper from Granlund and Montgomery for explanation.
+ * (at http://dx.doi.org/10.1145/773473.178249)
+ *
+ * \sa Tensor
+ */
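+// Illustrative sketch: the divisor is precomputed once so that repeated
+// divisions by the same constant reduce to a multiply-high and a couple of
+// shifts instead of a hardware divide, e.g.
+//
+//   Eigen::internal::TensorIntDivisor<int> fast_div(7);
+//   int q = 100 / fast_div;  // uses the operator/ overload below, q == 14
+//
+// The tensor evaluators rely on this to turn linear indices into coordinates.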
+
+namespace internal {
+
+#if !defined(__GCUDACC__) && !defined(__GCUDACC_HOST__)
+
+namespace {
+ // Note: result is undefined if val == 0
+ template <typename T>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE int count_leading_zeros(const T val)
+ {
+#ifdef __CUDA_ARCH__
+ if (sizeof(T) == 8) {
+ return __clzll(val);
+ }
+ return __clz(val);
+#elif EIGEN_COMP_MSVC
+ // _BitScanReverse stores the bit index of the highest set bit; convert it
+ // to a leading-zero count before returning.
+ DWORD index = 0;
+ if (sizeof(T) == 8) {
+ _BitScanReverse64(&index, val);
+ }
+ else {
+ _BitScanReverse(&index, val);
+ }
+ return static_cast<int>(sizeof(T) * 8 - 1 - index);
+#else
+ if (sizeof(T) == 8) {
+ return __builtin_clzl(static_cast<uint64_t>(val));
+ }
+ return __builtin_clz(static_cast<uint32_t>(val));
+#endif
+ }
+
+
+ template <typename T>
+ struct DividerTraits {
+#if defined(__SIZEOF_INT128__) && !defined(__CUDACC__)
+ typedef typename conditional<sizeof(T) == 8, uint64_t, uint32_t>::type type;
+ static const int N = sizeof(T) * 8;
+#else
+ typedef uint32_t type;
+ static const int N = 32;
+#endif
+ };
+
+
+ template <typename T>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint32_t muluh(const uint32_t a, const T b) {
+#if defined(__CUDA_ARCH__)
+ return __umulhi(a, b);
+#else
+ return (static_cast<uint64_t>(a) * b) >> 32;
+#endif
+ }
+
+#if defined(__CUDA_ARCH__)
+ template <typename T>
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint64_t muluh(const uint64_t a, const T b) {
+ return __umul64hi(a, b);
+ }
+#else
+ template <typename T>
+ EIGEN_ALWAYS_INLINE uint64_t muluh(const uint64_t a, const T b) {
+#if defined(__SIZEOF_INT128__) && !defined(__CUDACC__)
+ __uint128_t v = static_cast<__uint128_t>(a) * static_cast<__uint128_t>(b);
+ return static_cast<uint64_t>(v >> 64);
+#else
+ EIGEN_STATIC_ASSERT(sizeof(T) == 4, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return (a * b) >> 32;
+#endif
+ }
+#endif
+
+ template <int N, typename T>
+ struct DividerHelper {
+ static EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE uint32_t computeMultiplier (const int log_div, const T divider) {
+ EIGEN_STATIC_ASSERT(N == 32, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ return (static_cast<uint64_t>(1) << (N+log_div)) / divider - (static_cast<uint64_t>(1) << N) + 1;
+ }
+ };
+
+#if defined(__SIZEOF_INT128__) && !defined(__CUDACC__)
+ template <typename T>
+ struct DividerHelper<64, T> {
+ static EIGEN_ALWAYS_INLINE uint64_t computeMultiplier(const int log_div, const T divider) {
+ return ((static_cast<__uint128_t>(1) << (64+log_div)) / static_cast<__uint128_t>(divider) - (static_cast<__uint128_t>(1) << 64) + 1);
+ }
+ };
+#endif
+}
+
+
+template <typename T>
+struct TensorIntDivisor {
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() {
+ multiplier = 0;
+ shift1 = 0;
+ shift2 = 0;
+ }
+
+ // Must have 0 < divider < 2^31. This is relaxed to
+ // 0 < divider < 2^63 when using 64-bit indices on platforms that support
+ // the __uint128_t type.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor(const T divider) {
+ const int N = DividerTraits<T>::N;
+ eigen_assert(divider < NumTraits<UnsignedType>::highest()/2);
+ eigen_assert(divider > 0);
+
+ // fast ln2
+ const int leading_zeros = count_leading_zeros(static_cast<UnsignedType>(divider));
+ int log_div = N - leading_zeros;
+ // if divider is a power of two then log_div is 1 more than it should be.
+ if ((1ull << (log_div-1)) == divider)
+ log_div--;
+
+ multiplier = DividerHelper<N, T>::computeMultiplier(log_div, divider);
+ shift1 = log_div > 1 ? 1 : log_div;
+ shift2 = log_div > 1 ? log_div-1 : 0;
+ }
+
+ // Must have 0 <= numerator. On platforms that don't support the __uint128_t
+ // type, the numerator should also be less than 2^32-1.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T divide(const T numerator) const {
+ eigen_assert(numerator < NumTraits<UnsignedType>::highest()/2);
+ eigen_assert(numerator >= 0);
+
+ UnsignedType t1 = muluh(multiplier, numerator);
+ UnsignedType t = (static_cast<UnsignedType>(numerator) - t1) >> shift1;
+ return (t1 + t) >> shift2;
+ }
+
+ private:
+ typedef typename DividerTraits<T>::type UnsignedType;
+ UnsignedType multiplier;
+ int32_t shift1;
+ int32_t shift2;
+};
+
+
+// Optimized version for signed 32 bit integers.
+// Derived from Hacker's Delight.
+template <>
+class TensorIntDivisor<int32_t> {
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() {
+ magic = 0;
+ shift = 0;
+ }
+ // Must have 2 <= divider
+ EIGEN_DEVICE_FUNC TensorIntDivisor(int32_t divider) {
+ eigen_assert(divider >= 2);
+ calcMagic(divider);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE int divide(const int32_t n) const {
+#ifdef __CUDA_ARCH__
+ return (__umulhi(magic, n) >> shift);
+#else
+ uint64_t v = static_cast<uint64_t>(magic) * static_cast<uint64_t>(n);
+ return (static_cast<uint32_t>(v >> 32) >> shift);
+#endif
+ }
+
+private:
+ // Compute the magic numbers. See Hacker's Delight section 10 for an in
+ // depth explanation.
+ EIGEN_DEVICE_FUNC void calcMagic(int32_t d) {
+ const unsigned two31 = 0x80000000; // 2**31.
+ unsigned ad = d;
+ unsigned t = two31 + (ad >> 31);
+ unsigned anc = t - 1 - t%ad; // Absolute value of nc.
+ int p = 31; // Init. p.
+ unsigned q1 = two31/anc; // Init. q1 = 2**p/|nc|.
+ unsigned r1 = two31 - q1*anc; // Init. r1 = rem(2**p, |nc|).
+ unsigned q2 = two31/ad; // Init. q2 = 2**p/|d|.
+ unsigned r2 = two31 - q2*ad; // Init. r2 = rem(2**p, |d|).
+ unsigned delta = 0;
+ do {
+ p = p + 1;
+ q1 = 2*q1; // Update q1 = 2**p/|nc|.
+ r1 = 2*r1; // Update r1 = rem(2**p, |nc|).
+ if (r1 >= anc) { // (Must be an unsigned
+ q1 = q1 + 1; // comparison here).
+ r1 = r1 - anc;}
+ q2 = 2*q2; // Update q2 = 2**p/|d|.
+ r2 = 2*r2; // Update r2 = rem(2**p, |d|).
+ if (r2 >= ad) { // (Must be an unsigned
+ q2 = q2 + 1; // comparison here).
+ r2 = r2 - ad;}
+ delta = ad - r2;
+ } while (q1 < delta || (q1 == delta && r1 == 0));
+
+ magic = (unsigned)(q2 + 1);
+ shift = p - 32;
+ }
+
+ uint32_t magic;
+ int32_t shift;
+};
+
+
+template <typename T>
+static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator / (const T& numerator, const TensorIntDivisor<T>& divisor) {
+ return divisor.divide(numerator);
+}
+
+
+#else
+// Fall back to the old code path, since gcudacc doesn't support the code above.
+template <typename T>
+struct TensorIntDivisor {
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() {
+ multiplier = 0;
+ shift1 = 0;
+ shift2 = 0;
+ }
+
+ // Must have 1 <= divider <= 2^31-1
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor(const T divider) {
+ const int N = 32;
+ eigen_assert(divider > 0);
+ eigen_assert(divider < (1ull<<(N-1)));
+
+ // fast ln2
+#ifndef __CUDA_ARCH__
+ const int leading_zeros = __builtin_clz(divider);
+#else
+ const int leading_zeros = __clz(divider);
+#endif
+ int log_div = N - leading_zeros;
+ // if divider is a power of two then log_div is 1 more than it should be.
+ if ((1ull << (log_div-1)) == divider)
+ log_div--;
+
+ multiplier = (static_cast<uint64_t>(1) << (N+log_div)) / divider - (static_cast<uint64_t>(1) << N) + 1;
+ shift1 = log_div > 1 ? 1 : log_div;
+ shift2 = log_div > 1 ? log_div-1 : 0;
+ }
+
+ // Must have 0 <= numerator <= 2^32-1
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T divide(const T numerator) const {
+ const int N = 32;
+ eigen_assert(numerator >= 0);
+ eigen_assert(static_cast<uint64_t>(numerator) < 1ull<<N);
+
+ uint32_t t1 = (multiplier * numerator) >> N;
+ uint32_t t = (static_cast<uint32_t>(numerator) - t1) >> shift1;
+ return (t1 + t) >> shift2;
+ }
+
+ private:
+ uint64_t multiplier;
+ int32_t shift1;
+ int32_t shift2;
+};
+
+
+// Optimized version for signed 32 bit integers.
+// Derived from Hacker's Delight.
+template <>
+class TensorIntDivisor<int> {
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorIntDivisor() {
+ magic = 0;
+ shift = 0;
+ }
+ // Must have 2 <= divider
+ EIGEN_DEVICE_FUNC TensorIntDivisor(int divider) {
+ eigen_assert(divider >= 2);
+ calcMagic(divider);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE int divide(const int n) const {
+#ifdef __CUDA_ARCH__
+ return (__umulhi(magic, n) >> shift);
+#else
+ uint64_t v = static_cast<uint64_t>(magic) * static_cast<uint64_t>(n);
+ return (static_cast<unsigned int>(v >> 32) >> shift);
+#endif
+ }
+
+private:
+ // Compute the magic numbers. See Hacker's Delight section 10 for an in
+ // depth explanation.
+ EIGEN_DEVICE_FUNC void calcMagic(int d) {
+ const unsigned two31 = 0x80000000; // 2**31.
+ unsigned ad = d;
+ unsigned t = two31 + (ad >> 31);
+ unsigned anc = t - 1 - t%ad; // Absolute value of nc.
+ int p = 31; // Init. p.
+ unsigned q1 = two31/anc; // Init. q1 = 2**p/|nc|.
+ unsigned r1 = two31 - q1*anc; // Init. r1 = rem(2**p, |nc|).
+ unsigned q2 = two31/ad; // Init. q2 = 2**p/|d|.
+ unsigned r2 = two31 - q2*ad; // Init. r2 = rem(2**p, |d|).
+ unsigned delta = 0;
+ do {
+ p = p + 1;
+ q1 = 2*q1; // Update q1 = 2**p/|nc|.
+ r1 = 2*r1; // Update r1 = rem(2**p, |nc|).
+ if (r1 >= anc) { // (Must be an unsigned
+ q1 = q1 + 1; // comparison here).
+ r1 = r1 - anc;}
+ q2 = 2*q2; // Update q2 = 2**p/|d|.
+ r2 = 2*r2; // Update r2 = rem(2**p, |d|).
+ if (r2 >= ad) { // (Must be an unsigned
+ q2 = q2 + 1; // comparison here).
+ r2 = r2 - ad;}
+ delta = ad - r2;
+ } while (q1 < delta || (q1 == delta && r1 == 0));
+
+ magic = (unsigned)(q2 + 1);
+ shift = p - 32;
+ }
+
+ unsigned int magic;
+ int shift;
+};
+
+
+template <typename T>
+static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T operator / (const T& numerator, const TensorIntDivisor<T>& divisor) {
+ return divisor.divide(numerator);
+}
+
+#endif
+
+} // end namespace internal
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_INTDIV_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h
new file mode 100644
index 0000000000..bd795d54b0
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorLayoutSwap.h
@@ -0,0 +1,217 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H
+#define EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H
+
+namespace Eigen {
+
+/** \class TensorLayoutSwap
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Swap the layout from col-major to row-major, or row-major
+ * to col-major, and invert the order of the dimensions.
+ *
+ * Beware: the dimensions are reversed by this operation. If you want to
+ * preserve the ordering of the dimensions, you need to combine this
+ * operation with a shuffle.
+ *
+ * Example:
+ * Tensor<float, 2, ColMajor> input(2, 4);
+ * Tensor<float, 2, RowMajor> output = input.swap_layout();
+ * eigen_assert(output.dimension(0) == 4);
+ * eigen_assert(output.dimension(1) == 2);
+ *
+ * array<int, 2> shuffle(1, 0);
+ * output = input.swap_layout().shuffle(shuffle);
+ * eigen_assert(output.dimension(0) == 2);
+ * eigen_assert(output.dimension(1) == 4);
+ *
+ */
+namespace internal {
+template<typename XprType>
+struct traits<TensorLayoutSwapOp<XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = traits<XprType>::NumDimensions;
+ static const int Layout = (static_cast<int>(traits<XprType>::Layout) == static_cast<int>(ColMajor)) ? RowMajor : ColMajor;
+};
+
+template<typename XprType>
+struct eval<TensorLayoutSwapOp<XprType>, Eigen::Dense>
+{
+ typedef const TensorLayoutSwapOp<XprType>& type;
+};
+
+template<typename XprType>
+struct nested<TensorLayoutSwapOp<XprType>, 1, typename eval<TensorLayoutSwapOp<XprType> >::type>
+{
+ typedef TensorLayoutSwapOp<XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename XprType>
+class TensorLayoutSwapOp : public TensorBase<TensorLayoutSwapOp<XprType>, WriteAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorLayoutSwapOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorLayoutSwapOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename internal::remove_const<typename XprType::PacketReturnType>::type PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorLayoutSwapOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorLayoutSwapOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorLayoutSwapOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorLayoutSwapOp(const XprType& expr)
+ : m_xpr(expr) {}
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorLayoutSwapOp& operator = (const TensorLayoutSwapOp& other)
+ {
+ typedef TensorAssignOp<TensorLayoutSwapOp, const TensorLayoutSwapOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorLayoutSwapOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorLayoutSwapOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+};
+
+
+// Eval as rvalue
+template<typename ArgType, typename Device>
+struct TensorEvaluator<const TensorLayoutSwapOp<ArgType>, Device>
+{
+ typedef TensorLayoutSwapOp<ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = (static_cast<int>(TensorEvaluator<ArgType, Device>::Layout) ==
+ static_cast<int>(ColMajor))
+ ? RowMajor
+ : ColMajor,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ for(int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] = m_impl.dimensions()[NumDims-1-i];
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ return m_impl.evalSubExprsIfNeeded(data);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(index);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return m_impl.template packet<LoadMode>(index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_impl.data(); }
+
+ const TensorEvaluator<ArgType, Device>& impl() const { return m_impl; }
+
+ protected:
+ TensorEvaluator<ArgType, Device> m_impl;
+ Dimensions m_dimensions;
+};
+
+
+// Eval as lvalue
+template<typename ArgType, typename Device>
+ struct TensorEvaluator<TensorLayoutSwapOp<ArgType>, Device>
+ : public TensorEvaluator<const TensorLayoutSwapOp<ArgType>, Device>
+{
+ typedef TensorEvaluator<const TensorLayoutSwapOp<ArgType>, Device> Base;
+ typedef TensorLayoutSwapOp<ArgType> XprType;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = (static_cast<int>(TensorEvaluator<ArgType, Device>::Layout) ==
+ static_cast<int>(ColMajor))
+ ? RowMajor
+ : ColMajor,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(index);
+ }
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ this->m_impl.template writePacket<StoreMode>(index, x);
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_LAYOUT_SWAP_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h
new file mode 100644
index 0000000000..908bdc38ad
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h
@@ -0,0 +1,320 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_MAP_H
+#define EIGEN_CXX11_TENSOR_TENSOR_MAP_H
+
+namespace Eigen {
+
+/** \class TensorMap
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief A tensor expression mapping an existing array of data.
+ *
+ */
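+// Illustrative sketch: TensorMap wraps an existing buffer without copying it,
+// e.g.
+//
+//   float storage[12];
+//   Eigen::TensorMap<Eigen::Tensor<float, 2> > m(storage, 3, 4);
+//   m(1, 2) = 0.5f;  // writes straight into 'storage'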
+
+template<typename PlainObjectType, int Options_> class TensorMap : public TensorBase<TensorMap<PlainObjectType, Options_> >
+{
+ public:
+ typedef TensorMap<PlainObjectType, Options_> Self;
+ typedef typename PlainObjectType::Base Base;
+ typedef typename Eigen::internal::nested<Self>::type Nested;
+ typedef typename internal::traits<PlainObjectType>::StorageKind StorageKind;
+ typedef typename internal::traits<PlainObjectType>::Index Index;
+ typedef typename internal::traits<PlainObjectType>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+
+ /* typedef typename internal::conditional<
+ bool(internal::is_lvalue<PlainObjectType>::value),
+ Scalar *,
+ const Scalar *>::type
+ PointerType;*/
+ typedef Scalar* PointerType;
+ typedef PointerType PointerArgType;
+
+ static const int Options = Options_;
+
+ static const Index NumIndices = PlainObjectType::NumIndices;
+ typedef typename PlainObjectType::Dimensions Dimensions;
+
+ enum {
+ IsAligned = ((int(Options_) & Aligned) == Aligned),
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = PlainObjectType::Layout,
+ CoordAccess = true,
+ };
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr) : m_data(dataPtr), m_dimensions() {
+ // The number of dimensions used to construct a tensor must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT((0 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index firstDimension, IndexTypes... otherDimensions) : m_data(dataPtr), m_dimensions(firstDimension, otherDimensions...) {
+ // The number of dimensions used to construct a tensor must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT((sizeof...(otherDimensions) + 1 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+#else
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index firstDimension) : m_data(dataPtr), m_dimensions(firstDimension) {
+ // The number of dimensions used to construct a tensor must be equal to the rank of the tensor.
+ EIGEN_STATIC_ASSERT((1 == NumIndices || NumIndices == Dynamic), YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index dim1, Index dim2) : m_data(dataPtr), m_dimensions(dim1, dim2) {
+ EIGEN_STATIC_ASSERT(2 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index dim1, Index dim2, Index dim3) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3) {
+ EIGEN_STATIC_ASSERT(3 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index dim1, Index dim2, Index dim3, Index dim4) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3, dim4) {
+ EIGEN_STATIC_ASSERT(4 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, Index dim1, Index dim2, Index dim3, Index dim4, Index dim5) : m_data(dataPtr), m_dimensions(dim1, dim2, dim3, dim4, dim5) {
+ EIGEN_STATIC_ASSERT(5 == NumIndices || NumIndices == Dynamic, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, const array<Index, NumIndices>& dimensions)
+ : m_data(dataPtr), m_dimensions(dimensions)
+ { }
+
+ template <typename Dimensions>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(PointerArgType dataPtr, const Dimensions& dimensions)
+ : m_data(dataPtr), m_dimensions(dimensions)
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorMap(PlainObjectType& tensor)
+ : m_data(tensor.data()), m_dimensions(tensor.dimensions())
+ { }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rank() const { return m_dimensions.rank(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index dimension(Index n) const { return m_dimensions[n]; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index size() const { return m_dimensions.TotalSize(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar* data() { return m_data; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar* data() const { return m_data; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(const array<Index, NumIndices>& indices) const
+ {
+ // eigen_assert(checkIndexRange(indices));
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = m_dimensions.IndexOfRowMajor(indices);
+ return m_data[index];
+ } else {
+ const Index index = m_dimensions.IndexOfColMajor(indices);
+ return m_data[index];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()() const
+ {
+ EIGEN_STATIC_ASSERT(NumIndices == 0 || NumIndices == Dynamic, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ eigen_assert(rank() == 0);
+ return m_data[0];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices) const
+ {
+ static_assert(sizeof...(otherIndices) + 1 == NumIndices, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = m_dimensions.IndexOfRowMajor(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ return m_data[index];
+ } else {
+ const Index index = m_dimensions.IndexOfColMajor(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ return m_data[index];
+ }
+ }
+#else
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_data[index];
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1) const
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i1 + i0 * m_dimensions[0];
+ return m_data[index];
+ } else {
+ const Index index = i0 + i1 * m_dimensions[0];
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2) const
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i2 + m_dimensions[1] * (i1 + m_dimensions[0] * i0);
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * i2);
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3) const
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0));
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * i3));
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i4 + m_dimensions[4] * (i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0)));
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * (i3 + m_dimensions[3] * i4)));
+ return m_data[index];
+ }
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(const array<Index, NumIndices>& indices)
+ {
+ // eigen_assert(checkIndexRange(indices));
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = m_dimensions.IndexOfRowMajor(indices);
+ return m_data[index];
+ } else {
+ const Index index = m_dimensions.IndexOfColMajor(indices);
+ return m_data[index];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()()
+ {
+ static_assert(NumIndices == 0 || NumIndices == Dynamic, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ eigen_internal_assert(rank() == 0);
+ return m_data[0];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
+ {
+ static_assert(sizeof...(otherIndices) + 1 == NumIndices || NumIndices == Dynamic, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ const std::size_t NumDims = sizeof...(otherIndices) + 1;
+ if (PlainObjectType::Options&RowMajor) {
+ const array<Index, NumDims> dims = {firstIndex, otherIndices...};
+ const Index index = m_dimensions.IndexOfRowMajor(dims);
+ return m_data[index];
+ } else {
+ const array<Index, NumDims> dims = {firstIndex, otherIndices...};
+ const Index index = m_dimensions.IndexOfColMajor(dims);
+ return m_data[index];
+ }
+ }
+#else
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index index)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_data[index];
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1)
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i1 + i0 * m_dimensions[0];
+ return m_data[index];
+ } else {
+ const Index index = i0 + i1 * m_dimensions[0];
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2)
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i2 + m_dimensions[1] * (i1 + m_dimensions[0] * i0);
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * i2);
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3)
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0));
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * i3));
+ return m_data[index];
+ }
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& operator()(Index i0, Index i1, Index i2, Index i3, Index i4)
+ {
+ if (PlainObjectType::Options&RowMajor) {
+ const Index index = i4 + m_dimensions[4] * (i3 + m_dimensions[3] * (i2 + m_dimensions[2] * (i1 + m_dimensions[1] * i0)));
+ return m_data[index];
+ } else {
+ const Index index = i0 + m_dimensions[0] * (i1 + m_dimensions[1] * (i2 + m_dimensions[2] * (i3 + m_dimensions[3] * i4)));
+ return m_data[index];
+ }
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Self& operator=(const Self& other)
+ {
+ typedef TensorAssignOp<Self, const Self> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Self& operator=(const OtherDerived& other)
+ {
+ typedef TensorAssignOp<Self, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+ private:
+ Scalar* m_data;
+ Dimensions m_dimensions;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_MAP_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h
new file mode 100644
index 0000000000..4dd9af6f92
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMeta.h
@@ -0,0 +1,103 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_META_H
+#define EIGEN_CXX11_TENSOR_TENSOR_META_H
+
+namespace Eigen {
+
+template<bool cond> struct Cond {};
+
+template<typename T1, typename T2> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+const T1& choose(Cond<true>, const T1& first, const T2&) {
+ return first;
+}
+
+template<typename T1, typename T2> EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE
+const T2& choose(Cond<false>, const T1&, const T2& second) {
+ return second;
+}
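+
+// Illustrative note: Cond/choose pick one of two values based on a
+// compile-time boolean without an if/else in the generated code, e.g.
+// choose(Cond<(sizeof(void*) == 8)>(), value_for_64_bit, value_for_32_bit);
+// the value names here are hypothetical.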
+
+
+// Default packet types
+template <typename Scalar, typename Device>
+struct PacketType {
+ typedef typename internal::packet_traits<Scalar>::type type;
+ static const int size = internal::unpacket_traits<type>::size;
+};
+
+// For CUDA packet types when using a GpuDevice
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+template <>
+struct PacketType<float, GpuDevice> {
+ typedef float4 type;
+ static const int size = 4;
+};
+template <>
+struct PacketType<double, GpuDevice> {
+ typedef double2 type;
+ static const int size = 2;
+};
+#endif
+
+
+#if defined(EIGEN_HAS_CONSTEXPR)
+#define EIGEN_CONSTEXPR constexpr
+#else
+#define EIGEN_CONSTEXPR
+#endif
+
+// Tuple mimics std::pair but works on e.g. nvcc.
+template <typename U, typename V> struct Tuple {
+ public:
+ U first;
+ V second;
+
+ typedef U first_type;
+ typedef V second_type;
+
+ EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Tuple() : first(), second() {}
+
+ EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Tuple(const U& f, const V& s) : first(f), second(s) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ Tuple& operator= (const Tuple& rhs) {
+ if (&rhs == this) return *this;
+ first = rhs.first;
+ second = rhs.second;
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void swap(Tuple& rhs) {
+ using numext::swap;
+ swap(first, rhs.first);
+ swap(second, rhs.second);
+ }
+};
+
+template <typename U, typename V>
+EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+bool operator==(const Tuple<U, V>& x, const Tuple<U, V>& y) {
+ return (x.first == y.first && x.second == y.second);
+}
+
+template <typename U, typename V>
+EIGEN_CONSTEXPR EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+bool operator!=(const Tuple<U, V>& x, const Tuple<U, V>& y) {
+ return !(x == y);
+}
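+
+// Illustrative note: Tuple is a minimal std::pair stand-in that is usable in
+// device code, e.g. Eigen::Tuple<int, float> t(3, 1.5f); gives t.first == 3
+// and t.second == 1.5f.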
+
+#undef EIGEN_CONSTEXPR
+
+} // namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_META_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h
new file mode 100644
index 0000000000..e67f3da31a
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h
@@ -0,0 +1,817 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H
+
+namespace Eigen {
+
+/** \class TensorReshaping
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor reshaping class.
+ *
+ *
+ */
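+// Illustrative sketch: reshape reinterprets the coefficients with new
+// dimensions of the same total size, e.g.
+//
+//   Eigen::Tensor<float, 2> t(4, 6);
+//   Eigen::array<Eigen::DenseIndex, 3> new_dims{{2, 3, 4}};
+//   auto reshaped = t.reshape(new_dims);  // 2x3x4 view over the same 24 values
+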
+namespace internal {
+template<typename NewDimensions, typename XprType>
+struct traits<TensorReshapingOp<NewDimensions, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = array_size<NewDimensions>::value;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename NewDimensions, typename XprType>
+struct eval<TensorReshapingOp<NewDimensions, XprType>, Eigen::Dense>
+{
+ typedef const TensorReshapingOp<NewDimensions, XprType>& type;
+};
+
+template<typename NewDimensions, typename XprType>
+struct nested<TensorReshapingOp<NewDimensions, XprType>, 1, typename eval<TensorReshapingOp<NewDimensions, XprType> >::type>
+{
+ typedef TensorReshapingOp<NewDimensions, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename NewDimensions, typename XprType>
+class TensorReshapingOp : public TensorBase<TensorReshapingOp<NewDimensions, XprType>, WriteAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorReshapingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorReshapingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename internal::remove_const<typename XprType::PacketReturnType>::type PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorReshapingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorReshapingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorReshapingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReshapingOp(const XprType& expr, const NewDimensions& dims)
+ : m_xpr(expr), m_dims(dims) {}
+
+ EIGEN_DEVICE_FUNC
+ const NewDimensions& dimensions() const { return m_dims; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorReshapingOp& operator = (const TensorReshapingOp& other)
+ {
+ typedef TensorAssignOp<TensorReshapingOp, const TensorReshapingOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorReshapingOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorReshapingOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const NewDimensions m_dims;
+};
+
+
+// Eval as rvalue
+template<typename NewDimensions, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorReshapingOp<NewDimensions, ArgType>, Device>
+{
+ typedef TensorReshapingOp<NewDimensions, ArgType> XprType;
+ typedef NewDimensions Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ // TODO(andydavis) Re-enable BlockAccess when the performance issue
+ // with block-based reshape is resolved.
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_dimensions(op.dimensions())
+ {
+ // The total size of the reshaped tensor must be equal to the total size
+ // of the input tensor.
+ eigen_assert(internal::array_prod(m_impl.dimensions()) == internal::array_prod(op.dimensions()));
+
+ if (BlockAccess) {
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims =
+ m_impl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumOutputDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1];
+ }
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumInputDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i - 1] * input_dims[i - 1];
+ }
+ } else {
+#ifdef __CUDACC__
+ // TODO(andydavis) Remove the following line of code when associated
+ // nvcc bug b/22973013 is fixed.
+ for (int i = 0; i < 1; ++i) {}
+#endif
+ m_outputStrides[NumOutputDims - 1] = 1;
+ for (int i = NumOutputDims - 2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1];
+ }
+ m_inputStrides[NumInputDims - 1] = 1;
+ for (int i = NumInputDims - 2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i + 1] * input_dims[i + 1];
+ }
+ }
+ }
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ static const std::size_t NumOutputDims =
+ internal::array_size<Dimensions>::value;
+ static const std::size_t NumInputDims = internal::array_size<
+ typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, NumOutputDims, Layout>
+ OutputTensorBlock;
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, NumInputDims, Layout>
+ InputTensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ return m_impl.evalSubExprsIfNeeded(data);
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(index);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ return m_impl.template packet<LoadMode>(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ m_impl.getResourceRequirements(resources);
+ }
+
+ // TODO(andydavis) Reduce the overhead of this function.
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ OutputTensorBlock* output_block) const {
+ // Calculate output block unit-stride inner dimension length.
+ const DSizes<Index, NumOutputDims>& output_block_sizes =
+ output_block->block_sizes();
+ Index output_inner_dim_size = 1;
+ Index output_outer_dim_start = NumOutputDims;
+ for (Index i = 0; i < NumOutputDims; ++i) {
+ const Index dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumOutputDims - i - 1;
+ output_inner_dim_size *= output_block_sizes[dim];
+ if (output_block_sizes[dim] < m_dimensions[dim]) {
+ output_outer_dim_start = i + 1;
+ break;
+ }
+ }
+
+ // Initialize output block iterator state.
+ struct BlockIteratorState {
+ Index stride;
+ Index span;
+ Index size;
+ Index count;
+ };
+ array<BlockIteratorState, NumOutputDims> block_iter_state;
+
+ for (Index i = 0; i < NumOutputDims; ++i) {
+ const Index dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumOutputDims - i - 1;
+ block_iter_state[i].size = output_block_sizes[dim];
+ block_iter_state[i].stride = m_outputStrides[dim];
+ block_iter_state[i].span =
+ block_iter_state[i].stride * (block_iter_state[i].size - 1);
+ block_iter_state[i].count = 0;
+ }
+
+ const Index output_outer_dim_size = output_block_sizes.TotalSize() /
+ output_inner_dim_size;
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims =
+ m_impl.dimensions();
+
+ Index index = output_block->first_coeff_index();
+ for (Index outer_idx = 0; outer_idx < output_outer_dim_size; ++outer_idx) {
+ Index inner_idx = 0;
+ while (inner_idx < output_inner_dim_size) {
+ // Calculate input coords based on 'index'.
+ array<Index, NumInputDims> input_coords;
+ Index idx = index;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumInputDims - 1; i > 0; --i) {
+ input_coords[i] = idx / m_inputStrides[i];
+ idx -= input_coords[i] * m_inputStrides[i];
+ }
+ input_coords[0] = idx;
+ } else {
+ for (int i = 0; i < NumInputDims - 1; ++i) {
+ input_coords[i] = idx / m_inputStrides[i];
+ idx -= input_coords[i] * m_inputStrides[i];
+ }
+ input_coords[NumInputDims - 1] = idx;
+ }
+
+ // Calculate target input block shape, using at most
+ // 'output_inner_dim_size' coefficients along the input block's inner
+ // dimensions.
+ DSizes<Index, NumInputDims> input_block_sizes;
+ Index num_to_allocate = output_inner_dim_size - inner_idx;
+ for (Index i = 0; i < NumInputDims; ++i) {
+ const Index dim =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumInputDims - i - 1;
+ input_block_sizes[dim] = numext::mini(
+ num_to_allocate, (static_cast<Index>(input_dims[dim]) -
+ input_coords[dim]));
+ if (input_coords[dim] == 0) {
+ num_to_allocate /= input_block_sizes[dim];
+ } else {
+ num_to_allocate = 1;
+ }
+ }
+
+ // Calculate input block strides.
+ DSizes<Index, NumInputDims> input_block_strides;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ input_block_strides[0] = 1;
+ for (int i = 1; i < NumInputDims; ++i) {
+ input_block_strides[i] = input_block_strides[i - 1] *
+ input_block_sizes[i - 1];
+ }
+ } else {
+ input_block_strides[NumInputDims - 1] = 1;
+ for (int i = NumInputDims - 2; i >= 0; --i) {
+ input_block_strides[i] = input_block_strides[i + 1] *
+ input_block_sizes[i + 1];
+ }
+ }
+
+ // Instantiate and read input block from input tensor.
+ InputTensorBlock input_block(index, input_block_sizes,
+ input_block_strides, m_inputStrides,
+ output_block->data() + outer_idx *
+ output_inner_dim_size + inner_idx);
+
+ m_impl.block(&input_block);
+
+ const Index input_block_total_size = input_block_sizes.TotalSize();
+ index += input_block_total_size;
+ inner_idx += input_block_total_size;
+ }
+ eigen_assert(inner_idx == output_inner_dim_size);
+ index -= output_inner_dim_size;
+ // Update index.
+ for (Index i = output_outer_dim_start; i < NumOutputDims; ++i) {
+ if (++block_iter_state[i].count < block_iter_state[i].size) {
+ index += block_iter_state[i].stride;
+ break;
+ }
+ block_iter_state[i].count = 0;
+ index -= block_iter_state[i].span;
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return const_cast<Scalar*>(m_impl.data()); }
+
+ EIGEN_DEVICE_FUNC const TensorEvaluator<ArgType, Device>& impl() const { return m_impl; }
+
+ protected:
+ TensorEvaluator<ArgType, Device> m_impl;
+ NewDimensions m_dimensions;
+ DSizes<Index, NumOutputDims> m_outputStrides;
+ DSizes<Index, NumInputDims> m_inputStrides;
+};
+
+
+// Eval as lvalue
+template<typename NewDimensions, typename ArgType, typename Device>
+ struct TensorEvaluator<TensorReshapingOp<NewDimensions, ArgType>, Device>
+ : public TensorEvaluator<const TensorReshapingOp<NewDimensions, ArgType>, Device>
+
+{
+ typedef TensorEvaluator<const TensorReshapingOp<NewDimensions, ArgType>, Device> Base;
+ typedef TensorReshapingOp<NewDimensions, ArgType> XprType;
+ typedef NewDimensions Dimensions;
+
+ enum {
+ IsAligned = TensorEvaluator<ArgType, Device>::IsAligned,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(index);
+ }
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ this->m_impl.template writePacket<StoreMode>(index, x);
+ }
+};
+
+
+/** \class TensorSlicing
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor slicing class.
+ *
+ *
+ */
+namespace internal {
+template<typename StartIndices, typename Sizes, typename XprType>
+struct traits<TensorSlicingOp<StartIndices, Sizes, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = array_size<StartIndices>::value;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename StartIndices, typename Sizes, typename XprType>
+struct eval<TensorSlicingOp<StartIndices, Sizes, XprType>, Eigen::Dense>
+{
+ typedef const TensorSlicingOp<StartIndices, Sizes, XprType>& type;
+};
+
+template<typename StartIndices, typename Sizes, typename XprType>
+struct nested<TensorSlicingOp<StartIndices, Sizes, XprType>, 1, typename eval<TensorSlicingOp<StartIndices, Sizes, XprType> >::type>
+{
+ typedef TensorSlicingOp<StartIndices, Sizes, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename StartIndices, typename Sizes, typename XprType>
+class TensorSlicingOp : public TensorBase<TensorSlicingOp<StartIndices, Sizes, XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorSlicingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorSlicingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorSlicingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorSlicingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorSlicingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorSlicingOp(const XprType& expr, const StartIndices& indices, const Sizes& sizes)
+ : m_xpr(expr), m_indices(indices), m_sizes(sizes) {}
+
+ EIGEN_DEVICE_FUNC
+ const StartIndices& startIndices() const { return m_indices; }
+ EIGEN_DEVICE_FUNC
+ const Sizes& sizes() const { return m_sizes; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorSlicingOp& operator = (const TensorSlicingOp& other)
+ {
+ typedef TensorAssignOp<TensorSlicingOp, const TensorSlicingOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorSlicingOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorSlicingOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const StartIndices m_indices;
+ const Sizes m_sizes;
+};
+
+
+// Eval as rvalue
+template<typename StartIndices, typename Sizes, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorSlicingOp<StartIndices, Sizes, ArgType>, Device>
+{
+ typedef TensorSlicingOp<StartIndices, Sizes, ArgType> XprType;
+ static const int NumDims = internal::array_size<Sizes>::value;
+
+ enum {
+ // Alignment can't be guaranteed at compile time since it depends on the
+ // slice offsets and sizes.
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = TensorEvaluator<ArgType, Device>::CoordAccess,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_device(device), m_dimensions(op.sizes()), m_offsets(op.startIndices())
+ {
+ for (int i = 0; i < internal::array_size<Dimensions>::value; ++i) {
+ eigen_assert(m_impl.dimensions()[i] >= op.sizes()[i] + op.startIndices()[i]);
+ }
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ const Sizes& output_dims = op.sizes();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ }
+
+ // Don't initialize m_fastOutputStrides[0] since it won't ever be accessed.
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i-1] * output_dims[i-1];
+ m_fastOutputStrides[i] = internal::TensorIntDivisor<Index>(m_outputStrides[i]);
+ }
+ } else {
+ m_inputStrides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ }
+
+ m_outputStrides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i+1] * output_dims[i+1];
+ m_fastOutputStrides[i] = internal::TensorIntDivisor<Index>(m_outputStrides[i]);
+ }
+ }
+
+ m_block_total_size_max = numext::maxi(static_cast<std::size_t>(1),
+ device.lastLevelCacheSize() /
+ sizeof(Scalar));
+ }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef Sizes Dimensions;
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumDims, Layout>
+ TensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ if (internal::is_arithmetic<typename internal::remove_const<Scalar>::type>::value && data && m_impl.data()) {
+ Index contiguous_values = 1;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumDims; ++i) {
+ contiguous_values *= dimensions()[i];
+ if (dimensions()[i] != m_impl.dimensions()[i]) {
+ break;
+ }
+ }
+ } else {
+ for (int i = NumDims-1; i >= 0; --i) {
+ contiguous_values *= dimensions()[i];
+ if (dimensions()[i] != m_impl.dimensions()[i]) {
+ break;
+ }
+ }
+ }
+ // Use memcpy if it's going to be faster than using the regular evaluation.
+ if (contiguous_values > m_device.memcpyThreshold()) {
+ Scalar* src = (Scalar*)m_impl.data();
+ for (int i = 0; i < internal::array_prod(dimensions()); i += contiguous_values) {
+ Index offset = srcCoeff(i);
+ m_device.memcpy((void*)(data+i), src+offset, contiguous_values * sizeof(Scalar));
+ }
+ return false;
+ }
+ }
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(srcCoeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < internal::array_prod(dimensions()));
+
+ Index inputIndices[] = {0, 0};
+ Index indices[] = {index, index + packetSize - 1};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx0 = indices[0] / m_fastOutputStrides[i];
+ const Index idx1 = indices[1] / m_fastOutputStrides[i];
+ inputIndices[0] += (idx0 + m_offsets[i]) * m_inputStrides[i];
+ inputIndices[1] += (idx1 + m_offsets[i]) * m_inputStrides[i];
+ indices[0] -= idx0 * m_outputStrides[i];
+ indices[1] -= idx1 * m_outputStrides[i];
+ }
+ inputIndices[0] += (indices[0] + m_offsets[0]);
+ inputIndices[1] += (indices[1] + m_offsets[0]);
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx0 = indices[0] / m_fastOutputStrides[i];
+ const Index idx1 = indices[1] / m_fastOutputStrides[i];
+ inputIndices[0] += (idx0 + m_offsets[i]) * m_inputStrides[i];
+ inputIndices[1] += (idx1 + m_offsets[i]) * m_inputStrides[i];
+ indices[0] -= idx0 * m_outputStrides[i];
+ indices[1] -= idx1 * m_outputStrides[i];
+ }
+ inputIndices[0] += (indices[0] + m_offsets[NumDims-1]);
+ inputIndices[1] += (indices[1] + m_offsets[NumDims-1]);
+ }
+ if (inputIndices[1] - inputIndices[0] == packetSize - 1) {
+ PacketReturnType rslt = m_impl.template packet<Unaligned>(inputIndices[0]);
+ return rslt;
+ }
+ else {
+ typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ values[0] = m_impl.coeff(inputIndices[0]);
+ values[packetSize-1] = m_impl.coeff(inputIndices[1]);
+ for (int i = 1; i < packetSize-1; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, NumDims>& coords)
+ {
+ array<Index, NumDims> inputCoords;
+ for (int i = 0; i < NumDims; ++i) {
+ inputCoords[i] = coords[i] + this->m_offsets[i];
+ }
+ return m_impl.coeff(inputCoords);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ resources->push_back(internal::TensorOpResourceRequirements(
+ internal::kSkewedInnerDims, m_block_total_size_max));
+ m_impl.getResourceRequirements(resources);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ TensorBlock* output_block) const {
+ TensorBlock input_block(srcCoeff(output_block->first_coeff_index()),
+ output_block->block_sizes(),
+ output_block->block_strides(),
+ m_inputStrides,
+ output_block->data());
+ m_impl.block(&input_block);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar* data() const {
+ Scalar* result = m_impl.data();
+ if (result) {
+ Index offset = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumDims; ++i) {
+ if (m_dimensions[i] != m_impl.dimensions()[i]) {
+ offset += m_offsets[i] * m_inputStrides[i];
+ for (int j = i+1; j < NumDims; ++j) {
+ if (m_dimensions[j] > 1) {
+ return NULL;
+ }
+ offset += m_offsets[j] * m_inputStrides[j];
+ }
+ break;
+ }
+ }
+ } else {
+ for (int i = NumDims - 1; i >= 0; --i) {
+ if (m_dimensions[i] != m_impl.dimensions()[i]) {
+ offset += m_offsets[i] * m_inputStrides[i];
+ for (int j = i-1; j >= 0; --j) {
+ if (m_dimensions[j] > 1) {
+ return NULL;
+ }
+ offset += m_offsets[j] * m_inputStrides[j];
+ }
+ break;
+ }
+ }
+ }
+ return result + offset;
+ }
+ return NULL;
+ }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const
+ {
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_fastOutputStrides[i];
+ inputIndex += (idx + m_offsets[i]) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ inputIndex += (index + m_offsets[0]);
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_fastOutputStrides[i];
+ inputIndex += (idx + m_offsets[i]) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ inputIndex += (index + m_offsets[NumDims-1]);
+ }
+ return inputIndex;
+ }
+
+ array<Index, NumDims> m_outputStrides;
+ array<internal::TensorIntDivisor<Index>, NumDims> m_fastOutputStrides;
+ array<Index, NumDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ const Device& m_device;
+ Dimensions m_dimensions;
+ const StartIndices m_offsets;
+ std::size_t m_block_total_size_max;
+};
+
+
+// Eval as lvalue
+template<typename StartIndices, typename Sizes, typename ArgType, typename Device>
+struct TensorEvaluator<TensorSlicingOp<StartIndices, Sizes, ArgType>, Device>
+ : public TensorEvaluator<const TensorSlicingOp<StartIndices, Sizes, ArgType>, Device>
+{
+ typedef TensorEvaluator<const TensorSlicingOp<StartIndices, Sizes, ArgType>, Device> Base;
+ typedef TensorSlicingOp<StartIndices, Sizes, ArgType> XprType;
+ static const int NumDims = internal::array_size<Sizes>::value;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = TensorEvaluator<ArgType, Device>::CoordAccess,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device)
+ { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef Sizes Dimensions;
+ typedef internal::TensorBlock<Index, ScalarNonConst, NumDims, Layout>
+ TensorBlock;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(this->srcCoeff(index));
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ Index inputIndices[] = {0, 0};
+ Index indices[] = {index, index + packetSize - 1};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx0 = indices[0] / this->m_fastOutputStrides[i];
+ const Index idx1 = indices[1] / this->m_fastOutputStrides[i];
+ inputIndices[0] += (idx0 + this->m_offsets[i]) * this->m_inputStrides[i];
+ inputIndices[1] += (idx1 + this->m_offsets[i]) * this->m_inputStrides[i];
+ indices[0] -= idx0 * this->m_outputStrides[i];
+ indices[1] -= idx1 * this->m_outputStrides[i];
+ }
+ inputIndices[0] += (indices[0] + this->m_offsets[0]);
+ inputIndices[1] += (indices[1] + this->m_offsets[0]);
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx0 = indices[0] / this->m_fastOutputStrides[i];
+ const Index idx1 = indices[1] / this->m_fastOutputStrides[i];
+ inputIndices[0] += (idx0 + this->m_offsets[i]) * this->m_inputStrides[i];
+ inputIndices[1] += (idx1 + this->m_offsets[i]) * this->m_inputStrides[i];
+ indices[0] -= idx0 * this->m_outputStrides[i];
+ indices[1] -= idx1 * this->m_outputStrides[i];
+ }
+ inputIndices[0] += (indices[0] + this->m_offsets[NumDims-1]);
+ inputIndices[1] += (indices[1] + this->m_offsets[NumDims-1]);
+ }
+ if (inputIndices[1] - inputIndices[0] == packetSize - 1) {
+ this->m_impl.template writePacket<StoreMode>(inputIndices[0], x);
+ }
+ else {
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ this->m_impl.coeffRef(inputIndices[0]) = values[0];
+ this->m_impl.coeffRef(inputIndices[1]) = values[packetSize-1];
+ for (int i = 1; i < packetSize-1; ++i) {
+ this->coeffRef(index+i) = values[i];
+ }
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(const array<Index, NumDims>& coords)
+ {
+ array<Index, NumDims> inputCoords;
+ for (int i = 0; i < NumDims; ++i) {
+ inputCoords[i] = coords[i] + this->m_offsets[i];
+ }
+ return this->m_impl.coeffRef(inputCoords);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock(
+ const TensorBlock& block) {
+ this->m_impl.writeBlock(
+ TensorBlock(this->srcCoeff(block.first_coeff_index()),
+ block.block_sizes(),
+ block.block_strides(),
+ this->m_inputStrides,
+ const_cast<ScalarNonConst*>(block.data())));
+
+ }
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_MORPHING_H
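For context, the TensorReshapingOp and TensorSlicingOp evaluators above back the reshape() and slice() expressions that TensorBase exposes elsewhere in this Tensor module. A minimal usage sketch (illustrative only, not part of the patch; it assumes the unsupported Eigen Tensor header is on the include path and that the reshape()/slice() entry points are available):

    // Sketch: reshape a 4x6 tensor to 2x12 and take a 2x3 slice.
    #include <unsupported/Eigen/CXX11/Tensor>
    #include <iostream>

    int main() {
      Eigen::Tensor<float, 2> input(4, 6);
      input.setRandom();

      // Reshape: the total number of coefficients (24) must be preserved,
      // which is what the eigen_assert in the evaluator constructor checks.
      Eigen::array<Eigen::DenseIndex, 2> new_dims{{2, 12}};
      Eigen::Tensor<float, 2> reshaped = input.reshape(new_dims);

      // Slice: a 2x3 block starting at row 1, column 2. The evaluator asserts
      // that start index + size stays within the input dimensions.
      Eigen::array<Eigen::DenseIndex, 2> offsets{{1, 2}};
      Eigen::array<Eigen::DenseIndex, 2> extents{{2, 3}};
      Eigen::Tensor<float, 2> block = input.slice(offsets, extents);

      std::cout << reshaped.dimension(1) << " " << block.dimension(1) << "\n";  // 12 3
      return 0;
    }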
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h
new file mode 100644
index 0000000000..d1dff3f38b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPadding.h
@@ -0,0 +1,388 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_PADDING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_PADDING_H
+
+namespace Eigen {
+
+/** \class TensorPadding
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor padding class.
+ * At the moment only padding with a constant value is supported.
+ *
+ */
+namespace internal {
+template<typename PaddingDimensions, typename XprType>
+struct traits<TensorPaddingOp<PaddingDimensions, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename PaddingDimensions, typename XprType>
+struct eval<TensorPaddingOp<PaddingDimensions, XprType>, Eigen::Dense>
+{
+ typedef const TensorPaddingOp<PaddingDimensions, XprType>& type;
+};
+
+template<typename PaddingDimensions, typename XprType>
+struct nested<TensorPaddingOp<PaddingDimensions, XprType>, 1, typename eval<TensorPaddingOp<PaddingDimensions, XprType> >::type>
+{
+ typedef TensorPaddingOp<PaddingDimensions, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename PaddingDimensions, typename XprType>
+class TensorPaddingOp : public TensorBase<TensorPaddingOp<PaddingDimensions, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorPaddingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorPaddingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorPaddingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorPaddingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorPaddingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorPaddingOp(const XprType& expr, const PaddingDimensions& padding_dims,
+ const Scalar padding_value)
+ : m_xpr(expr), m_padding_dims(padding_dims), m_padding_value(padding_value) {}
+
+ EIGEN_DEVICE_FUNC
+ const PaddingDimensions& padding() const { return m_padding_dims; }
+ EIGEN_DEVICE_FUNC
+ Scalar padding_value() const { return m_padding_value; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const PaddingDimensions m_padding_dims;
+ const Scalar m_padding_value;
+};
+
+
+// Eval as rvalue
+template<typename PaddingDimensions, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorPaddingOp<PaddingDimensions, ArgType>, Device>
+{
+ typedef TensorPaddingOp<PaddingDimensions, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<PaddingDimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = true,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_padding(op.padding()), m_paddingValue(op.padding_value())
+ {
+ // Compute dimensions
+ m_dimensions = m_impl.dimensions();
+ for (int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] += m_padding[i].first + m_padding[i].second;
+ }
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_outputStrides[0] = 1;
+ if (NumDims > 0) {
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ }
+ m_outputStrides[NumDims] = m_outputStrides[NumDims-1] * m_dimensions[NumDims-1];
+ }
+ } else {
+ m_outputStrides[NumDims] = 1;
+ if (NumDims > 0) {
+ m_inputStrides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ m_outputStrides[i+1] = m_outputStrides[i+2] * m_dimensions[i+1];
+ }
+ m_outputStrides[0] = m_outputStrides[1] * m_dimensions[0];
+ }
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ eigen_assert(index < dimensions().TotalSize());
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ if (idx < m_padding[i].first || idx >= m_dimensions[i] - m_padding[i].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ if (NumDims > 0) {
+ if (index < m_padding[0].first || index >= m_dimensions[0] - m_padding[0].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (index - m_padding[0].first);
+ }
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i+1];
+ if (idx < m_padding[i].first || idx >= m_dimensions[i] - m_padding[i].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i+1];
+ }
+ if (NumDims > 0) {
+ if (index < m_padding[NumDims-1].first ||
+ index >= m_dimensions[NumDims-1] - m_padding[NumDims-1].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (index - m_padding[NumDims-1].first);
+ }
+ }
+ return m_impl.coeff(inputIndex);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ return packetColMajor(index);
+ }
+ return packetRowMajor(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, NumDims>& coords) const
+ {
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ if (NumDims > 0) {
+ const Index idx = coords[0];
+ if (idx < m_padding[0].first || idx >= m_dimensions[0] - m_padding[0].second) {
+ return m_paddingValue;
+ }
+ inputIndex = idx - m_padding[0].first;
+ }
+ for (int i = 1; i < NumDims; ++i) {
+ const Index idx = coords[i];
+ if (idx < m_padding[i].first || idx >= m_dimensions[i] - m_padding[i].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ }
+ } else {
+ if (NumDims > 0) {
+ const Index idx = coords[NumDims-1];
+ if (idx < m_padding[NumDims-1].first || idx >= m_dimensions[NumDims-1] - m_padding[NumDims-1].second) {
+ return m_paddingValue;
+ }
+ inputIndex = idx - m_padding[NumDims-1].first;
+ }
+ for (int i = NumDims - 2; i >= 0; --i) {
+ const Index idx = coords[i];
+ if (idx < m_padding[i].first || idx >= m_dimensions[i] - m_padding[i].second) {
+ return m_paddingValue;
+ }
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ }
+ }
+ return m_impl.coeff(inputIndex);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetColMajor(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ const Index initialIndex = index;
+ Index inputIndex = 0;
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index first = index;
+ const Index last = index + packetSize - 1;
+ const Index lastPaddedLeft = m_padding[i].first * m_outputStrides[i];
+ const Index firstPaddedRight = (m_dimensions[i] - m_padding[i].second) * m_outputStrides[i];
+ const Index lastPaddedRight = m_outputStrides[i+1];
+
+ if (last < lastPaddedLeft) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= firstPaddedRight && last < lastPaddedRight) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= lastPaddedLeft && last < firstPaddedRight) {
+ // all the coefficients are between the 2 padding zones.
+ const Index idx = index / m_outputStrides[i];
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ else {
+ // Every other case
+ return packetWithPossibleZero(initialIndex);
+ }
+ }
+
+ const Index last = index + packetSize - 1;
+ const Index first = index;
+
+ if (NumDims > 0) {
+ const Index lastPaddedLeft = m_padding[0].first;
+ const Index firstPaddedRight = (m_dimensions[0] - m_padding[0].second);
+ const Index lastPaddedRight = m_outputStrides[1];
+
+ if (last < lastPaddedLeft) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= firstPaddedRight && last < lastPaddedRight) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= lastPaddedLeft && last < firstPaddedRight) {
+ // all the coefficients are between the 2 padding zones.
+ inputIndex += (index - m_padding[0].first);
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+ }
+
+ // Every other case
+ return packetWithPossibleZero(initialIndex);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetRowMajor(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ const Index initialIndex = index;
+ Index inputIndex = 0;
+
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index first = index;
+ const Index last = index + packetSize - 1;
+ const Index lastPaddedLeft = m_padding[i].first * m_outputStrides[i+1];
+ const Index firstPaddedRight = (m_dimensions[i] - m_padding[i].second) * m_outputStrides[i+1];
+ const Index lastPaddedRight = m_outputStrides[i];
+
+ if (last < lastPaddedLeft) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= firstPaddedRight && last < lastPaddedRight) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= lastPaddedLeft && last < firstPaddedRight) {
+ // all the coefficients are between the 2 padding zones.
+ const Index idx = index / m_outputStrides[i+1];
+ inputIndex += (idx - m_padding[i].first) * m_inputStrides[i];
+ index -= idx * m_outputStrides[i+1];
+ }
+ else {
+ // Every other case
+ return packetWithPossibleZero(initialIndex);
+ }
+ }
+
+ const Index last = index + packetSize - 1;
+ const Index first = index;
+
+ if (NumDims > 0) {
+ const Index lastPaddedLeft = m_padding[NumDims-1].first;
+ const Index firstPaddedRight = (m_dimensions[NumDims-1] - m_padding[NumDims-1].second);
+ const Index lastPaddedRight = m_outputStrides[NumDims-1];
+
+ if (last < lastPaddedLeft) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= firstPaddedRight && last < lastPaddedRight) {
+ // all the coefficients are in the padding zone.
+ return internal::pset1<PacketReturnType>(m_paddingValue);
+ }
+ else if (first >= lastPaddedLeft && last < firstPaddedRight) {
+ // all the coefficients are between the 2 padding zones.
+ inputIndex += (index - m_padding[NumDims-1].first);
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+ }
+
+ // Every other case
+ return packetWithPossibleZero(initialIndex);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ Dimensions m_dimensions;
+ array<Index, NumDims+1> m_outputStrides;
+ array<Index, NumDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ PaddingDimensions m_padding;
+
+ Scalar m_paddingValue;
+};
+
+
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_PADDING_H
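For context, the TensorPaddingOp evaluator above backs the pad() expression on TensorBase; coefficients that fall outside the original extents evaluate to the constant padding value (zero for the single-argument overload). A minimal sketch, not part of the patch and assuming that entry point is available:

    // Sketch: pad a 2x3 tensor with zeros to 5x5.
    #include <unsupported/Eigen/CXX11/Tensor>
    #include <iostream>
    #include <utility>

    int main() {
      Eigen::Tensor<float, 2> input(2, 3);
      input.setConstant(1.0f);

      // One row before and two rows after dimension 0; one column on each
      // side of dimension 1. The padded extent of each dimension is
      // input extent + padding.first + padding.second, matching the
      // dimension computation in the evaluator constructor.
      Eigen::array<std::pair<Eigen::DenseIndex, Eigen::DenseIndex>, 2> paddings;
      paddings[0] = std::make_pair(1, 2);
      paddings[1] = std::make_pair(1, 1);

      Eigen::Tensor<float, 2> padded = input.pad(paddings);
      std::cout << padded.dimension(0) << "x" << padded.dimension(1) << "\n";  // 5x5
      return 0;
    }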
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h
new file mode 100644
index 0000000000..c89022ab8e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorPatch.h
@@ -0,0 +1,314 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_PATCH_H
+#define EIGEN_CXX11_TENSOR_TENSOR_PATCH_H
+
+namespace Eigen {
+
+/** \class TensorPatch
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor patch class.
+ *
+ *
+ */
+namespace internal {
+template<typename PatchDim, typename XprType>
+struct traits<TensorPatchOp<PatchDim, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions + 1;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename PatchDim, typename XprType>
+struct eval<TensorPatchOp<PatchDim, XprType>, Eigen::Dense>
+{
+ typedef const TensorPatchOp<PatchDim, XprType>& type;
+};
+
+template<typename PatchDim, typename XprType>
+struct nested<TensorPatchOp<PatchDim, XprType>, 1, typename eval<TensorPatchOp<PatchDim, XprType> >::type>
+{
+ typedef TensorPatchOp<PatchDim, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename PatchDim, typename XprType>
+class TensorPatchOp : public TensorBase<TensorPatchOp<PatchDim, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorPatchOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorPatchOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorPatchOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorPatchOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorPatchOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorPatchOp(const XprType& expr, const PatchDim& patch_dims)
+ : m_xpr(expr), m_patch_dims(patch_dims) {}
+
+ EIGEN_DEVICE_FUNC
+ const PatchDim& patch_dims() const { return m_patch_dims; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const PatchDim m_patch_dims;
+};
+
+
+// Eval as rvalue
+template<typename PatchDim, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorPatchOp<PatchDim, ArgType>, Device>
+{
+ typedef TensorPatchOp<PatchDim, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value + 1;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = true,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ Index num_patches = 1;
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ const PatchDim& patch_dims = op.patch_dims();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = 0; i < NumDims-1; ++i) {
+ m_dimensions[i] = patch_dims[i];
+ num_patches *= (input_dims[i] - patch_dims[i] + 1);
+ }
+ m_dimensions[NumDims-1] = num_patches;
+
+ m_inputStrides[0] = 1;
+ m_patchStrides[0] = 1;
+ for (int i = 1; i < NumDims-1; ++i) {
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ m_patchStrides[i] = m_patchStrides[i-1] * (input_dims[i-1] - patch_dims[i-1] + 1);
+ }
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ }
+ } else {
+ for (int i = 0; i < NumDims-1; ++i) {
+ m_dimensions[i+1] = patch_dims[i];
+ num_patches *= (input_dims[i] - patch_dims[i] + 1);
+ }
+ m_dimensions[0] = num_patches;
+
+ m_inputStrides[NumDims-2] = 1;
+ m_patchStrides[NumDims-2] = 1;
+ for (int i = NumDims-3; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ m_patchStrides[i] = m_patchStrides[i+1] * (input_dims[i+1] - patch_dims[i+1] + 1);
+ }
+ m_outputStrides[NumDims-1] = 1;
+ for (int i = NumDims-2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1];
+ }
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ Index output_stride_index = (static_cast<int>(Layout) == static_cast<int>(ColMajor)) ? NumDims - 1 : 0;
+ // Find the location of the first element of the patch.
+ Index patchIndex = index / m_outputStrides[output_stride_index];
+ // Find the offset of the element wrt the location of the first element.
+ Index patchOffset = index - patchIndex * m_outputStrides[output_stride_index];
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 2; i > 0; --i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = patchOffset / m_outputStrides[i];
+ patchOffset -= offsetIdx * m_outputStrides[i];
+ inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 2; ++i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = patchOffset / m_outputStrides[i+1];
+ patchOffset -= offsetIdx * m_outputStrides[i+1];
+ inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i];
+ }
+ }
+ inputIndex += (patchIndex + patchOffset);
+ return m_impl.coeff(inputIndex);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ Index output_stride_index = (static_cast<int>(Layout) == static_cast<int>(ColMajor)) ? NumDims - 1 : 0;
+ Index indices[2] = {index, index + packetSize - 1};
+ Index patchIndices[2] = {indices[0] / m_outputStrides[output_stride_index],
+ indices[1] / m_outputStrides[output_stride_index]};
+ Index patchOffsets[2] = {indices[0] - patchIndices[0] * m_outputStrides[output_stride_index],
+ indices[1] - patchIndices[1] * m_outputStrides[output_stride_index]};
+
+ Index inputIndices[2] = {0, 0};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 2; i > 0; --i) {
+ const Index patchIdx[2] = {patchIndices[0] / m_patchStrides[i],
+ patchIndices[1] / m_patchStrides[i]};
+ patchIndices[0] -= patchIdx[0] * m_patchStrides[i];
+ patchIndices[1] -= patchIdx[1] * m_patchStrides[i];
+
+ const Index offsetIdx[2] = {patchOffsets[0] / m_outputStrides[i],
+ patchOffsets[1] / m_outputStrides[i]};
+ patchOffsets[0] -= offsetIdx[0] * m_outputStrides[i];
+ patchOffsets[1] -= offsetIdx[1] * m_outputStrides[i];
+
+ inputIndices[0] += (patchIdx[0] + offsetIdx[0]) * m_inputStrides[i];
+ inputIndices[1] += (patchIdx[1] + offsetIdx[1]) * m_inputStrides[i];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 2; ++i) {
+ const Index patchIdx[2] = {patchIndices[0] / m_patchStrides[i],
+ patchIndices[1] / m_patchStrides[i]};
+ patchIndices[0] -= patchIdx[0] * m_patchStrides[i];
+ patchIndices[1] -= patchIdx[1] * m_patchStrides[i];
+
+ const Index offsetIdx[2] = {patchOffsets[0] / m_outputStrides[i+1],
+ patchOffsets[1] / m_outputStrides[i+1]};
+ patchOffsets[0] -= offsetIdx[0] * m_outputStrides[i+1];
+ patchOffsets[1] -= offsetIdx[1] * m_outputStrides[i+1];
+
+ inputIndices[0] += (patchIdx[0] + offsetIdx[0]) * m_inputStrides[i];
+ inputIndices[1] += (patchIdx[1] + offsetIdx[1]) * m_inputStrides[i];
+ }
+ }
+ inputIndices[0] += (patchIndices[0] + patchOffsets[0]);
+ inputIndices[1] += (patchIndices[1] + patchOffsets[1]);
+
+ if (inputIndices[1] - inputIndices[0] == packetSize - 1) {
+ PacketReturnType rslt = m_impl.template packet<Unaligned>(inputIndices[0]);
+ return rslt;
+ }
+ else {
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packetSize];
+ values[0] = m_impl.coeff(inputIndices[0]);
+ values[packetSize-1] = m_impl.coeff(inputIndices[1]);
+ for (int i = 1; i < packetSize-1; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, NumDims>& coords) const
+ {
+ Index patch_coord_idx = Layout == ColMajor ? NumDims - 1 : 0;
+ // Location of the first element of the patch.
+ Index patchIndex = coords[patch_coord_idx];
+
+ if (TensorEvaluator<ArgType, Device>::CoordAccess) {
+ array<Index, NumDims-1> inputCoords;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 2; i > 0; --i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = coords[i];
+ inputCoords[i] = coords[i] + patchIdx;
+ }
+ } else {
+ for (int i = 0; i < NumDims - 2; ++i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = coords[i+1];
+ inputCoords[i] = coords[i+1] + patchIdx;
+ }
+ }
+ Index coords_idx = Layout == ColMajor ? 0 : NumDims - 1;
+ inputCoords[0] = (patchIndex + coords[coords_idx]);
+ return m_impl.coeff(inputCoords);
+ }
+ else {
+ Index inputIndex = 0;
+ if (Layout == ColMajor) {
+ for (int i = NumDims - 2; i > 0; --i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = coords[i];
+ inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i];
+ }
+ } else {
+ for (int i = 0; i < NumDims - 2; ++i) {
+ const Index patchIdx = patchIndex / m_patchStrides[i];
+ patchIndex -= patchIdx * m_patchStrides[i];
+ const Index offsetIdx = coords[i+1];
+ inputIndex += (patchIdx + offsetIdx) * m_inputStrides[i];
+ }
+ }
+ Index coords_idx = Layout == ColMajor ? 0 : NumDims - 1;
+ inputIndex += (patchIndex + coords[coords_idx]);
+ return m_impl.coeff(inputIndex);
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims-1> m_inputStrides;
+ array<Index, NumDims-1> m_patchStrides;
+
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_PATCH_H
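For context, the TensorPatchOp evaluator above backs the extract_patches() expression on TensorBase, which adds one dimension indexing the extracted patches (the last dimension for column-major layouts, the first for row-major, as set up in the evaluator constructor). A minimal sketch, not part of the patch and assuming that entry point is available:

    // Sketch: extract every 2x2 patch from a 4x5 tensor.
    #include <unsupported/Eigen/CXX11/Tensor>
    #include <iostream>

    int main() {
      Eigen::Tensor<float, 2> input(4, 5);
      input.setRandom();

      // (4-2+1) * (5-2+1) = 12 overlapping 2x2 patches.
      Eigen::array<Eigen::DenseIndex, 2> patch_dims{{2, 2}};
      Eigen::Tensor<float, 3> patches = input.extract_patches(patch_dims);

      // Default column-major layout: the patch index is the last dimension.
      std::cout << patches.dimension(0) << "x" << patches.dimension(1)
                << "x" << patches.dimension(2) << "\n";  // 2x2x12
      return 0;
    }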
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
new file mode 100644
index 0000000000..a70d5ae1f0
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h
@@ -0,0 +1,1141 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H
+#define EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H
+
+namespace Eigen {
+
+/** \class TensorReduction
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor reduction class.
+ *
+ */
+
+namespace internal {
+template<typename Op, typename Dims, typename XprType>
+struct traits<TensorReductionOp<Op, Dims, XprType> >
+ : traits<XprType>
+{
+ typedef typename traits<XprType>::Scalar Scalar;
+ typedef typename traits<XprType>::StorageKind StorageKind;
+ typedef typename traits<XprType>::Index Index;
+ typedef typename XprType::Nested Nested;
+};
+
+template<typename Op, typename Dims, typename XprType>
+struct eval<TensorReductionOp<Op, Dims, XprType>, Eigen::Dense>
+{
+ typedef const TensorReductionOp<Op, Dims, XprType>& type;
+};
+
+template<typename Op, typename Dims, typename XprType>
+struct nested<TensorReductionOp<Op, Dims, XprType>, 1, typename eval<TensorReductionOp<Op, Dims, XprType> >::type>
+{
+ typedef TensorReductionOp<Op, Dims, XprType> type;
+};
+
+
+
+template <typename InputDims, typename OutputDims, typename ReducedDims> EIGEN_DEVICE_FUNC
+static void partition_dims(const InputDims& input_dims,
+ const array<bool, internal::array_size<InputDims>::value>& reduced,
+ OutputDims* output_dims, ReducedDims* reduced_dims) {
+ const int NumInputDims = internal::array_size<InputDims>::value;
+ int outputIndex = 0;
+ int reduceIndex = 0;
+ for (int i = 0; i < NumInputDims; ++i) {
+ if (OutputDims::count == 0 || reduced[i]) {
+ (*reduced_dims)[reduceIndex] = input_dims[i];
+ ++reduceIndex;
+ } else {
+ (*output_dims)[outputIndex] = input_dims[i];
+ ++outputIndex;
+ }
+ }
+}
+
+
+
+template <typename ReducedDims, int NumTensorDims, int Layout>
+struct are_inner_most_dims {
+ static const bool value = false;
+};
+template <typename ReducedDims, int NumTensorDims, int Layout>
+struct preserve_inner_most_dims {
+ static const bool value = false;
+};
+
+#if defined(EIGEN_HAS_CONSTEXPR) && defined(EIGEN_HAS_VARIADIC_TEMPLATES)
+// The use of the tmp1, tmp2, tmp3 intermediate variables is needed for nvcc 7
+// to compile the code below. NVidia is working on a fix.
+template <typename ReducedDims, int NumTensorDims>
+struct are_inner_most_dims<ReducedDims, NumTensorDims, ColMajor>{
+ static const bool tmp1 = indices_statically_known_to_increase<ReducedDims>()();
+ static const bool tmp2 = index_statically_eq<ReducedDims>()(0, 0);
+ static const bool tmp3 = index_statically_eq<ReducedDims>()(array_size<ReducedDims>::value-1, array_size<ReducedDims>::value-1);
+ static const bool value = tmp1 & tmp2 & tmp3;
+};
+template <typename ReducedDims, int NumTensorDims>
+struct are_inner_most_dims<ReducedDims, NumTensorDims, RowMajor>{
+ static const bool tmp1 = indices_statically_known_to_increase<ReducedDims>()();
+ static const bool tmp2 = index_statically_eq<ReducedDims>()(0, NumTensorDims - array_size<ReducedDims>::value);
+ static const bool tmp3 = index_statically_eq<ReducedDims>()(array_size<ReducedDims>::value - 1, NumTensorDims - 1);
+ static const bool value = tmp1 & tmp2 & tmp3;
+
+};
+template <typename ReducedDims, int NumTensorDims>
+struct preserve_inner_most_dims<ReducedDims, NumTensorDims, ColMajor>{
+ static const bool tmp1 = indices_statically_known_to_increase<ReducedDims>()();
+ static const bool tmp2 = index_statically_gt<ReducedDims>()(0, 0);
+ static const bool value = tmp1 & tmp2;
+
+};
+template <typename ReducedDims, int NumTensorDims>
+struct preserve_inner_most_dims<ReducedDims, NumTensorDims, RowMajor>{
+ static const bool tmp1 = indices_statically_known_to_increase<ReducedDims>()();
+ static const bool tmp2 = index_statically_lt<ReducedDims>()(array_size<ReducedDims>::value - 1, NumTensorDims - 1);
+ static const bool value = tmp1 & tmp2;
+};
+#endif
+
+
+template <int DimIndex, typename Self, typename Op>
+struct GenericDimReducer {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::CoeffReturnType* accum) {
+ EIGEN_STATIC_ASSERT(DimIndex >= 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ for (int j = 0; j < self.m_reducedDims[DimIndex]; ++j) {
+ const typename Self::Index input = firstIndex + j * self.m_reducedStrides[DimIndex];
+ GenericDimReducer<DimIndex-1, Self, Op>::reduce(self, input, reducer, accum);
+ }
+ }
+};
+template <typename Self, typename Op>
+struct GenericDimReducer<-1, Self, Op> {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::CoeffReturnType* accum) {
+ reducer.reduce(self.m_impl.coeff(firstIndex), accum);
+ }
+};
+
+template <typename Self, typename Op, bool Vectorizable = (Self::InputPacketAccess & Op::PacketAccess)>
+struct InnerMostDimReducer {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) {
+ typename Self::CoeffReturnType accum = reducer.initialize();
+ for (typename Self::Index j = 0; j < numValuesToReduce; ++j) {
+ reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum);
+ }
+ return reducer.finalize(accum);
+ }
+};
+
+template <typename Self, typename Op>
+struct InnerMostDimReducer<Self, Op, true> {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename Self::CoeffReturnType reduce(const Self& self, typename Self::Index firstIndex, typename Self::Index numValuesToReduce, Op& reducer) {
+ const int packetSize = internal::unpacket_traits<typename Self::PacketReturnType>::size;
+ const typename Self::Index VectorizedSize = (numValuesToReduce / packetSize) * packetSize;
+ typename Self::PacketReturnType p = reducer.template initializePacket<typename Self::PacketReturnType>();
+ for (typename Self::Index j = 0; j < VectorizedSize; j += packetSize) {
+ reducer.reducePacket(self.m_impl.template packet<Unaligned>(firstIndex + j), &p);
+ }
+ typename Self::CoeffReturnType accum = reducer.initialize();
+ for (typename Self::Index j = VectorizedSize; j < numValuesToReduce; ++j) {
+ reducer.reduce(self.m_impl.coeff(firstIndex + j), &accum);
+ }
+ return reducer.finalizeBoth(accum, p);
+ }
+};
+
+template <int DimIndex, typename Self, typename Op, bool vectorizable = (Self::InputPacketAccess & Op::PacketAccess)>
+struct InnerMostDimPreserver {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::PacketReturnType* accum) {
+ eigen_assert(false && "should never be called");
+ }
+};
+
+template <int DimIndex, typename Self, typename Op>
+struct InnerMostDimPreserver<DimIndex, Self, Op, true> {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::PacketReturnType* accum) {
+ EIGEN_STATIC_ASSERT(DimIndex >= 0, YOU_MADE_A_PROGRAMMING_MISTAKE);
+ for (typename Self::Index j = 0; j < self.m_reducedDims[DimIndex]; ++j) {
+ const typename Self::Index input = firstIndex + j * self.m_reducedStrides[DimIndex];
+ InnerMostDimPreserver<DimIndex-1, Self, Op>::reduce(self, input, reducer, accum);
+ }
+ }
+};
+
+template <typename Self, typename Op>
+struct InnerMostDimPreserver<-1, Self, Op, true> {
+ static EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void reduce(const Self& self, typename Self::Index firstIndex, Op& reducer, typename Self::PacketReturnType* accum) {
+ reducer.reducePacket(self.m_impl.template packet<Unaligned>(firstIndex), accum);
+ }
+};
+
+// Default full reducer
+template <typename Self, typename Op, typename Device, bool Vectorizable = (Self::InputPacketAccess & Op::PacketAccess)>
+struct FullReducer {
+ static const bool HasOptimizedImplementation = false;
+
+ static EIGEN_DEVICE_FUNC void run(const Self& self, Op& reducer, const Device&, typename Self::CoeffReturnType* output) {
+ const typename Self::Index num_coeffs = array_prod(self.m_impl.dimensions());
+ *output = InnerMostDimReducer<Self, Op>::reduce(self, 0, num_coeffs, reducer);
+ }
+};
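(Editorial aside, not part of the patch.) The default FullReducer above handles the case where every dimension is reduced; the ThreadPoolDevice specializations that follow shard the coefficients across threads and then combine the per-shard accumulators. A minimal sketch of the reductions this machinery serves, assuming the sum() entry point on TensorBase:

    // Sketch: full and partial sum reductions.
    #include <unsupported/Eigen/CXX11/Tensor>
    #include <iostream>

    int main() {
      Eigen::Tensor<float, 2> input(3, 4);
      input.setConstant(2.0f);

      // Reducing over both dimensions yields a rank-0 tensor (full reduction).
      Eigen::array<Eigen::DenseIndex, 2> all_dims{{0, 1}};
      Eigen::Tensor<float, 0> total = input.sum(all_dims);

      // Reducing over dimension 0 only yields a rank-1 tensor of size 4.
      Eigen::array<Eigen::DenseIndex, 1> dim0{{0}};
      Eigen::Tensor<float, 1> column_sums = input.sum(dim0);

      std::cout << total() << " " << column_sums(0) << "\n";  // 24 6
      return 0;
    }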
+
+
+#ifdef EIGEN_USE_THREADS
+// Multithreaded full reducers
+template <typename Eval, typename Op, bool Vectorizable = (Eval::InputPacketAccess & Op::PacketAccess)>
+struct FullReducerShard {
+ static void run(const Eval& eval, typename Eval::Index firstIndex, typename Eval::Index numValuesToReduce, Op& reducer, FullReducerShard* shard) {
+ shard->saccum = reducer.initialize();
+ for (typename Eval::Index j = 0; j < numValuesToReduce; ++j) {
+ reducer.reduce(eval.m_impl.coeff(firstIndex + j), &shard->saccum);
+ }
+ }
+
+ typename Eval::CoeffReturnType saccum;
+};
+
+template <typename Eval, typename Op>
+struct FullReducerShard<Eval, Op, true> {
+ static void run(const Eval& eval, typename Eval::Index firstIndex, typename Eval::Index numValuesToReduce, Op& reducer, FullReducerShard* shard) {
+ const int packetSize = internal::unpacket_traits<typename Eval::PacketReturnType>::size;
+ const typename Eval::Index VectorizedSize = (numValuesToReduce / packetSize) * packetSize;
+
+ shard->paccum = reducer.template initializePacket<typename Eval::PacketReturnType>();
+ for (typename Eval::Index j = 0; j < VectorizedSize; j += packetSize) {
+ reducer.reducePacket(eval.m_impl.template packet<Unaligned>(firstIndex + j), &shard->paccum);
+ }
+ shard->saccum = reducer.initialize();
+ for (typename Eval::Index j = VectorizedSize; j < numValuesToReduce; ++j) {
+ reducer.reduce(eval.m_impl.coeff(firstIndex + j), &shard->saccum);
+ }
+ }
+
+ typename Eval::PacketReturnType paccum;
+ typename Eval::CoeffReturnType saccum;
+};
+
+
+template <typename Self, typename Op>
+struct FullReducer<Self, Op, ThreadPoolDevice, false> {
+ static const bool HasOptimizedImplementation = !Op::IsStateful;
+
+ // launch one reducer per thread and accumulate the result.
+ static void run(const Self& self, Op& reducer, const ThreadPoolDevice& device, typename Self::CoeffReturnType* output) {
+ typedef typename Self::Index Index;
+ const Index num_coeffs = array_prod(self.m_impl.dimensions());
+ const Index blocksize = std::floor<Index>(static_cast<float>(num_coeffs)/device.numThreads());
+ const Index numblocks = blocksize > 0 ? num_coeffs / blocksize : 0;
+ eigen_assert(num_coeffs >= numblocks * blocksize);
+
+ FixedSizeVector<Notification*> results(numblocks);
+ FixedSizeVector<FullReducerShard<Self, Op, false> > shards(numblocks, FullReducerShard<Self, Op, false>());
+ for (Index i = 0; i < numblocks; ++i) {
+ results.push_back(device.enqueue(&FullReducerShard<Self, Op, false>::run, self, i*blocksize, blocksize, reducer, &shards[i]));
+ }
+
+ FullReducerShard<Self, Op, false> finalShard;
+ if (numblocks * blocksize < num_coeffs) {
+ FullReducerShard<Self, Op, false>::run(self, numblocks * blocksize, num_coeffs - numblocks * blocksize, reducer, &finalShard);
+ } else {
+ finalShard.saccum = reducer.initialize();
+ }
+
+ for (Index i = 0; i < numblocks; ++i) {
+ wait_until_ready(results[i]);
+ delete results[i];
+ }
+
+ for (Index i = 0; i < numblocks; ++i) {
+ reducer.reduce(shards[i].saccum, &finalShard.saccum);
+ }
+ *output = reducer.finalize(finalShard.saccum);
+ }
+};
+
+template <typename Self, typename Op>
+struct FullReducer<Self, Op, ThreadPoolDevice, true> {
+ static const bool HasOptimizedImplementation = !Op::IsStateful;
+
+ // launch one reducer per thread and accumulate the result.
+ static void run(const Self& self, Op& reducer, const ThreadPoolDevice& device, typename Self::CoeffReturnType* output) {
+ typedef typename Self::Index Index;
+ const Index num_coeffs = array_prod(self.m_impl.dimensions());
+ const Index blocksize = std::floor<Index>(static_cast<float>(num_coeffs)/device.numThreads());
+ const Index numblocks = blocksize > 0 ? num_coeffs / blocksize : 0;
+ eigen_assert(num_coeffs >= numblocks * blocksize);
+
+ FixedSizeVector<Notification*> results(numblocks);
+ FixedSizeVector<FullReducerShard<Self, Op, true> > shards(numblocks, FullReducerShard<Self, Op, true>());
+ for (Index i = 0; i < numblocks; ++i) {
+ results.push_back(device.enqueue(&FullReducerShard<Self, Op, true>::run, self, i*blocksize, blocksize, reducer, &shards[i]));
+ }
+
+ FullReducerShard<Self, Op, true> finalShard;
+ if (numblocks * blocksize < num_coeffs) {
+ FullReducerShard<Self, Op, true>::run(self, numblocks * blocksize, num_coeffs - numblocks * blocksize, reducer, &finalShard);
+ } else {
+ finalShard.paccum = reducer.template initializePacket<typename Self::PacketReturnType>();
+ finalShard.saccum = reducer.initialize();
+ }
+
+ for (Index i = 0; i < numblocks; ++i) {
+ wait_until_ready(results[i]);
+ delete results[i];
+ }
+
+ for (Index i = 0; i < numblocks; ++i) {
+ reducer.reducePacket(shards[i].paccum, &finalShard.paccum);
+ reducer.reduce(shards[i].saccum, &finalShard.saccum);
+ }
+
+ *output = reducer.finalizeBoth(finalShard.saccum, finalShard.paccum);
+ }
+};
+#endif
+
+
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+// Full reducers for GPU; they don't vectorize for now.
+
+// Reducer function that enables multiple CUDA threads to safely accumulate into the
+// same output address. It reads the current value of the output variable and
+// attempts to update it with the new value. If another CUDA thread updated the
+// content of the output address in the meantime, it tries again.
+template <typename T, typename R>
+__device__ EIGEN_ALWAYS_INLINE void atomicReduce(T* output, T accum, R& reducer) {
+#if __CUDA_ARCH__ >= 300
+ if (sizeof(T) == 4)
+ {
+ unsigned int oldval = *reinterpret_cast<unsigned int*>(output);
+ unsigned int newval = oldval;
+ reducer.reduce(accum, reinterpret_cast<T*>(&newval));
+ if (newval == oldval) {
+ return;
+ }
+ unsigned int readback;
+ while ((readback = atomicCAS((unsigned int*)output, oldval, newval)) != oldval) {
+ oldval = readback;
+ newval = oldval;
+ reducer.reduce(accum, reinterpret_cast<T*>(&newval));
+ if (newval == oldval) {
+ return;
+ }
+ }
+ }
+ else if (sizeof(T) == 8) {
+ unsigned long long oldval = *reinterpret_cast<unsigned long long*>(output);
+ unsigned long long newval = oldval;
+ reducer.reduce(accum, reinterpret_cast<T*>(&newval));
+ if (newval == oldval) {
+ return;
+ }
+ unsigned long long readback;
+ while ((readback = atomicCAS((unsigned long long*)output, oldval, newval)) != oldval) {
+ oldval = readback;
+ newval = oldval;
+ reducer.reduce(accum, reinterpret_cast<T*>(&newval));
+ if (newval == oldval) {
+ return;
+ }
+ }
+ }
+ else {
+ assert(0 && "Wordsize not supported");
+ }
+#else
+ assert(0 && "Shouldn't be called on unsupported device");
+#endif
+}
+
+template <typename T>
+__device__ inline void atomicReduce(T* output, T accum, SumReducer<T>&) {
+#if __CUDA_ARCH__ >= 300
+ atomicAdd(output, accum);
+#else
+ assert(0 && "Shouldn't be called on unsupported device");
+#endif
+}
+
+template <int BlockSize, int NumPerThread, typename Self,
+ typename Reducer, typename Index>
+__global__ void FullReductionKernel(Reducer reducer, const Self input, Index num_coeffs,
+ typename Self::CoeffReturnType* output) {
+ const Index first_index = blockIdx.x * BlockSize * NumPerThread + threadIdx.x;
+
+ if (first_index == 0) {
+ *output = reducer.initialize();
+ }
+
+ typename Self::CoeffReturnType accum = reducer.initialize();
+ for (Index i = 0; i < NumPerThread; ++i) {
+ const Index index = first_index + i * BlockSize;
+ if (index >= num_coeffs) {
+ break;
+ }
+ typename Self::CoeffReturnType val = input.m_impl.coeff(index);
+ reducer.reduce(val, &accum);
+ }
+
+ for (int offset = warpSize/2; offset > 0; offset /= 2) {
+ reducer.reduce(__shfl_down(accum, offset), &accum);
+ }
+
+ if ((threadIdx.x & (warpSize - 1)) == 0) {
+ atomicReduce(output, accum, reducer);
+ }
+}
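+// The __shfl_down() loop above performs a tree reduction within each warp:
+// with warpSize == 32, the offsets 16, 8, 4, 2 and 1 successively fold the
+// other lanes' partial results into lane 0, so only one atomicReduce() per
+// warp is needed to combine warps and blocks into *output.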
+
+
+template <typename Self, typename Op, bool Vectorizable>
+struct FullReducer<Self, Op, GpuDevice, Vectorizable> {
+ // Unfortunately NVIDIA doesn't handle exotic types such as complex well,
+ // so we restrict the optimized version of the code to the simple case of
+ // floats.
+ static const bool HasOptimizedImplementation = !Op::IsStateful &&
+ internal::is_same<typename Self::CoeffReturnType, float>::value;
+
+ template <typename OutputType>
+ static void run(const Self& self, Op& reducer, const GpuDevice& device, OutputType* output) {
+ assert(false && "Should only be called on floats");
+ }
+
+ static void run(const Self& self, Op& reducer, const GpuDevice& device, float* output) {
+ typedef typename Self::Index Index;
+
+ const Index num_coeffs = array_prod(self.m_impl.dimensions());
+ const int block_size = 256;
+ const int num_per_thread = 128;
+ const int num_blocks = std::ceil(static_cast<float>(num_coeffs) / (block_size * num_per_thread));
+ LAUNCH_CUDA_KERNEL((FullReductionKernel<block_size, num_per_thread>),
+ num_blocks, block_size, 0, device, reducer, self, num_coeffs, output);
+ }
+};
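+// Launch-shape example for the run() method above: for an input with 2^20
+// coefficients, each thread accumulates up to num_per_thread (128) values
+// strided by block_size (256), so each block covers 32768 coefficients and
+// num_blocks = ceil(1048576 / 32768) = 32 blocks are launched before the
+// per-warp atomicReduce() combines the partial results.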
+
+#endif
+
+
+template <typename Self, typename Op,
+ bool Vectorizable = (Self::InputPacketAccess & Op::PacketAccess)>
+class BlockReducer {
+ public:
+ typedef typename Self::Index Index;
+ typedef typename Self::Scalar Scalar;
+ typedef typename Self::CoeffReturnType CoeffReturnType;
+ typedef typename Self::PacketReturnType PacketReturnType;
+ explicit BlockReducer(const Op& reducer) : op_(reducer) {
+ accum_ = op_.initialize();
+ }
+ void Reduce(Index index, Index num_values_to_reduce, Scalar* data) {
+ for (Index i = 0; i < num_values_to_reduce; ++i) {
+ op_.reduce(data[index + i], &accum_);
+ }
+ }
+ CoeffReturnType Finalize() {
+ return op_.finalize(accum_);
+ }
+ PacketReturnType FinalizePacket() {
+ // TODO(andydavis) This function should not be called for Scalar
+ // reductions: clean this up or add an assert here.
+ return PacketReturnType();
+ }
+
+ private:
+ CoeffReturnType accum_;
+ Op op_;
+};
+
+template <typename Self, typename Op>
+class BlockReducer<Self, Op, true> {
+ public:
+ typedef typename Self::Index Index;
+ typedef typename Self::Scalar Scalar;
+ typedef typename Self::CoeffReturnType CoeffReturnType;
+ typedef typename Self::PacketReturnType PacketReturnType;
+ explicit BlockReducer(const Op& reducer) : op_(reducer) {
+ vaccum_ = op_.template initializePacket<PacketReturnType>();
+ accum_ = op_.initialize();
+ }
+ void Reduce(Index index, Index num_values_to_reduce, Scalar* data) {
+ const int packet_size = internal::unpacket_traits<PacketReturnType>::size;
+ const Index vectorized_size = (num_values_to_reduce / packet_size) *
+ packet_size;
+ for (Index i = 0; i < vectorized_size; i += packet_size) {
+ op_.reducePacket(internal::ploadt<PacketReturnType, Unaligned>(
+ &data[index + i]), &vaccum_);
+ }
+ for (Index i = vectorized_size; i < num_values_to_reduce; ++i) {
+ op_.reduce(data[index + i], &accum_);
+ }
+ }
+ CoeffReturnType Finalize() {
+ return op_.finalizeBoth(accum_, vaccum_);
+ }
+ PacketReturnType FinalizePacket() {
+ return op_.finalizePacket(vaccum_);
+ }
+
+ private:
+ PacketReturnType vaccum_;
+ CoeffReturnType accum_;
+ Op op_;
+};
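+// Minimal usage sketch for the block reducers above (hypothetical data
+// pointer and sizes): construct one reducer per output coefficient, feed it
+// contiguous runs of input values, then finalize.
+//   internal::BlockReducer<Self, Op> r(op);
+//   r.Reduce(/*index=*/0, /*num_values_to_reduce=*/n, data);
+//   CoeffReturnType result = r.Finalize();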
+
+} // end namespace internal
+
+
+template <typename Op, typename Dims, typename XprType>
+class TensorReductionOp : public TensorBase<TensorReductionOp<Op, Dims, XprType>, ReadOnlyAccessors> {
+ public:
+ typedef typename Eigen::internal::traits<TensorReductionOp>::Scalar Scalar;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename Eigen::internal::nested<TensorReductionOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorReductionOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorReductionOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorReductionOp(const XprType& expr, const Dims& dims) : m_expr(expr), m_dims(dims)
+ { }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ TensorReductionOp(const XprType& expr, const Dims& dims, const Op& reducer) : m_expr(expr), m_dims(dims), m_reducer(reducer)
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const XprType& expression() const { return m_expr; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Dims& dims() const { return m_dims; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Op& reducer() const { return m_reducer; }
+
+ protected:
+ typename XprType::Nested m_expr;
+ const Dims m_dims;
+ const Op m_reducer;
+};
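+// TensorReductionOp nodes are normally not constructed directly; they are
+// returned by the reduction helpers on TensorBase. A typical sketch (assuming
+// a rank-3 float tensor `t`):
+//   Eigen::array<int, 1> dims{{1}};
+//   Eigen::Tensor<float, 2> reduced = t.sum(dims);   // reduce dimension 1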
+
+
+// Eval as rvalue
+template<typename Op, typename Dims, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorReductionOp<Op, Dims, ArgType>, Device>
+{
+ typedef TensorReductionOp<Op, Dims, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions InputDimensions;
+ static const int NumInputDims = internal::array_size<InputDimensions>::value;
+ static const int NumReducedDims = internal::array_size<Dims>::value;
+ EIGEN_STATIC_ASSERT(NumInputDims >= NumReducedDims, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ static const int NumOutputDims = NumInputDims - NumReducedDims;
+ typedef DSizes<Index, NumOutputDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+ typedef TensorEvaluator<const TensorReductionOp<Op, Dims, ArgType>, Device> Self;
+ static const bool InputPacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = Self::InputPacketAccess && Op::PacketAccess,
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ typedef typename internal::TensorBlock<Index, ScalarNonConst, NumOutputDims,
+ Layout> OutputTensorBlock;
+ typedef typename internal::TensorBlock<Index, ScalarNonConst, NumInputDims,
+ Layout> InputTensorBlock;
+
+ static const bool ReducingInnerMostDims = internal::are_inner_most_dims<Dims, NumInputDims, Layout>::value;
+ static const bool PreservingInnerMostDims = internal::preserve_inner_most_dims<Dims, NumInputDims, Layout>::value;
+ static const bool RunningFullReduction = (NumInputDims==NumReducedDims);
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device), m_reducer(op.reducer()), m_result(NULL), m_device(device)
+ {
+ EIGEN_STATIC_ASSERT((!ReducingInnerMostDims | !PreservingInnerMostDims | (NumReducedDims == NumInputDims)),
+ YOU_MADE_A_PROGRAMMING_MISTAKE);
+ for (int i = 0; i < NumInputDims; ++i) {
+ m_reduced_dim[i] = false;
+ }
+ for (int i = 0; i < NumReducedDims; ++i) {
+ eigen_assert(op.dims()[i] >= 0);
+ eigen_assert(op.dims()[i] < NumInputDims);
+ m_reduced_dim[op.dims()[i]] = true;
+ }
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ internal::partition_dims(input_dims, m_reduced_dim, &m_dimensions, &m_reducedDims);
+
+ // Precompute output strides.
+ if (NumOutputDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumOutputDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1];
+ m_fastOutputStrides[i] = internal::TensorIntDivisor<Index>(m_outputStrides[i]);
+ }
+ } else {
+ m_outputStrides[NumOutputDims - 1] = 1;
+ for (int i = NumOutputDims - 2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1];
+ m_fastOutputStrides[i] = internal::TensorIntDivisor<Index>(m_outputStrides[i]);
+ }
+ }
+ }
+
+ // Precompute input strides.
+ if (NumInputDims > 0) {
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumInputDims; ++i) {
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ }
+ } else {
+ m_inputStrides[NumInputDims - 1] = 1;
+ for (int i = NumInputDims - 2; i >= 0; --i) {
+ m_inputStrides[i] = m_inputStrides[i + 1] * input_dims[i + 1];
+ }
+ }
+ }
+
+ int outputIndex = 0;
+ int reduceIndex = 0;
+ for (int i = 0; i < NumInputDims; ++i) {
+ if (m_reduced_dim[i]) {
+ m_reducedStrides[reduceIndex] = m_inputStrides[i];
+ ++reduceIndex;
+ } else {
+ m_preservedStrides[outputIndex] = m_inputStrides[i];
+ m_output_to_input_dim_map[outputIndex] = i;
+ ++outputIndex;
+ }
+ }
+
+ m_numValuesToReduce
+ = NumOutputDims == 0 ? internal::array_prod(input_dims)
+ : (static_cast<int>(Layout) == static_cast<int>(ColMajor))
+ ? m_preservedStrides[0] : m_preservedStrides[NumOutputDims - 1];
+
+ m_block_total_size_max = numext::maxi(static_cast<std::size_t>(1),
+ device.lastLevelCacheSize() /
+ sizeof(Scalar));
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ typedef typename internal::remove_const<typename XprType::CoeffReturnType>::type CoeffReturnType;
+ typedef typename PacketType<CoeffReturnType, Device>::type PacketReturnType;
+
+ EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(CoeffReturnType* data) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+
+ // Use the FullReducer if possible.
+ if (RunningFullReduction && internal::FullReducer<Self, Op, Device>::HasOptimizedImplementation &&
+ ((RunningOnGPU && (m_device.majorDeviceVersion() >= 3)) ||
+ (internal::array_prod(m_impl.dimensions()) > 1024 * 1024))) {
+
+ bool need_assign = false;
+ if (!data) {
+ m_result = static_cast<CoeffReturnType*>(m_device.allocate(sizeof(CoeffReturnType)));
+ data = m_result;
+ need_assign = true;
+ }
+
+ Op reducer(m_reducer);
+ internal::FullReducer<Self, Op, Device>::run(*this, reducer, m_device, data);
+ return need_assign;
+ }
+
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+
+ if (m_result) {
+ m_device.deallocate(m_result);
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ if (RunningFullReduction && m_result) {
+ return *m_result;
+ }
+ Op reducer(m_reducer);
+ if (ReducingInnerMostDims) {
+ return internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstInput(index),
+ m_numValuesToReduce, reducer);
+ } else {
+ typename Self::CoeffReturnType accum = reducer.initialize();
+ internal::GenericDimReducer<NumReducedDims-1, Self, Op>::reduce(*this, firstInput(index), reducer, &accum);
+ return reducer.finalize(accum);
+ }
+ }
+
+ // TODO(bsteiner): provide a more efficient implementation.
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index + packetSize - 1 < dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ if (ReducingInnerMostDims) {
+ const Index num_values_to_reduce = m_numValuesToReduce;
+ const Index firstIndex = firstInput(index);
+ for (Index i = 0; i < packetSize; ++i) {
+ Op reducer(m_reducer);
+ values[i] = internal::InnerMostDimReducer<Self, Op>::reduce(*this, firstIndex + i * num_values_to_reduce,
+ num_values_to_reduce, reducer);
+ }
+ } else if (PreservingInnerMostDims) {
+ const Index firstIndex = firstInput(index);
+ const int innermost_dim = (static_cast<int>(Layout) == static_cast<int>(ColMajor)) ? 0 : NumOutputDims - 1;
+ // TBD: extend this to the n innermost dimensions that we preserve.
+ if (((firstIndex % m_dimensions[innermost_dim]) + packetSize - 1) < m_dimensions[innermost_dim]) {
+ Op reducer(m_reducer);
+ typename Self::PacketReturnType accum = reducer.template initializePacket<typename Self::PacketReturnType>();
+ internal::InnerMostDimPreserver<NumReducedDims-1, Self, Op>::reduce(*this, firstIndex, reducer, &accum);
+ return reducer.finalizePacket(accum);
+ } else {
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index + i);
+ }
+ }
+ } else {
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index + i);
+ }
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ resources->push_back(internal::TensorOpResourceRequirements(
+ internal::kSkewedInnerDims, m_block_total_size_max));
+ m_impl.getResourceRequirements(resources);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ OutputTensorBlock* output_block) const {
+ // Special case full reductions to avoid input block copy below.
+ if (NumInputDims == NumReducedDims) {
+ eigen_assert(output_block->first_coeff_index() == 0);
+ eigen_assert(output_block->block_sizes().TotalSize() == 1);
+ Op reducer(m_reducer);
+ output_block->data()[0] = internal::InnerMostDimReducer<Self, Op>::reduce(
+ *this, 0, m_numValuesToReduce, reducer);
+ return;
+ }
+
+ // Calculate input tensor 'slice' required to reduce output block coeffs.
+ DSizes<Index, NumInputDims> input_slice_sizes(m_impl.dimensions());
+ for (int i = 0; i < NumOutputDims; ++i) {
+ // Clip preserved input dimensions by output block size.
+ input_slice_sizes[m_output_to_input_dim_map[i]] =
+ output_block->block_sizes()[i];
+ }
+
+ // Shard input tensor slice into blocks (because it could be large if we
+ // need to reduce along several dimensions to calculate required output
+ // coefficients).
+ const Index max_coeff_count =
+ numext::mini(((m_device.firstLevelCacheSize()) / sizeof(Scalar)),
+ input_slice_sizes.TotalSize());
+
+ // Calculate max output shard size needed to keep working set of reducers
+ // in L1, while leaving enough space for reducer overhead and 'packet_size'
+ // reductions.
+ DSizes<Index, NumInputDims> target_input_block_sizes;
+ CalculateTargetInputBlockShape(max_coeff_count, input_slice_sizes,
+ &target_input_block_sizes);
+ // Calculate indices for first preserved dimension.
+ const Index first_preserved_dim_output_index =
+ static_cast<int>(Layout) == static_cast<int>(ColMajor) ?
+ 0 : NumOutputDims - 1;
+ const Index first_preserved_dim_input_index = m_output_to_input_dim_map[
+ first_preserved_dim_output_index];
+ const bool inner_most_dim_preserved = first_preserved_dim_input_index ==
+ (static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 :
+ NumInputDims - 1) | PreservingInnerMostDims;
+
+ // Calculate output block inner/outer dimension sizes.
+ const Index output_block_inner_dim_size = output_block->block_sizes()[
+ first_preserved_dim_output_index];
+ const Index output_block_outer_dim_size =
+ output_block->block_sizes().TotalSize() / output_block_inner_dim_size;
+ // Calculate shard size for first preserved dimension.
+ const Index output_shard_size = target_input_block_sizes[
+ first_preserved_dim_input_index];
+ const Index num_output_shards =
+ (output_block_inner_dim_size + output_shard_size - 1) /
+ output_shard_size;
+
+ // Initialize 'tensor_slice_offsets' from input coords of output index.
+ DSizes<Index, NumInputDims> tensor_slice_offsets;
+ GetInputCoordsForOutputIndex(output_block->first_coeff_index(),
+ &tensor_slice_offsets);
+
+ // Store tensor slice offset in first preserved dimension to be used
+ // to update tensor slice extents in loop below.
+ const Index first_preserved_dim_offset_start = tensor_slice_offsets[
+ first_preserved_dim_input_index];
+
+ array<BlockIteratorState, NumOutputDims> block_iter_state;
+
+ // Initialize state used to iterate through output coefficients
+ // and update 'tensor_slice_offsets' in outer preserved dims.
+ for (int i = 0; i < NumOutputDims - 1; ++i) {
+ const int dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i + 1 : NumOutputDims - i - 2;
+ block_iter_state[i].input_dim = m_output_to_input_dim_map[dim];
+ block_iter_state[i].output_size = output_block->block_sizes()[dim];
+ block_iter_state[i].output_count = 0;
+ }
+
+ // Allocate input block memory.
+ ScalarNonConst* input_block_data = static_cast<ScalarNonConst*>(
+ m_device.allocate(max_coeff_count * sizeof(Scalar)));
+ // Allocate reducer memory.
+ const bool packet_reductions_enabled = (Self::InputPacketAccess &
+ Op::PacketAccess);
+ const Index packet_size = internal::unpacket_traits<PacketReturnType>::size;
+ const Index num_reducers =
+ (inner_most_dim_preserved && packet_reductions_enabled) ?
+ (output_shard_size / packet_size + output_shard_size % packet_size +
+ packet_size) : output_shard_size;
+ typedef internal::BlockReducer<Self, Op> BlockReducer;
+ BlockReducer* reducers = static_cast<BlockReducer*>(
+ m_device.allocate(num_reducers * sizeof(BlockReducer)));
+
+ InputDimensions input_tensor_dims(m_impl.dimensions());
+ for (Index output_outer_index = 0;
+ output_outer_index < output_block_outer_dim_size;
+ ++output_outer_index) {
+ for (Index output_shard_index = 0;
+ output_shard_index < num_output_shards;
+ ++output_shard_index) {
+ // Initialize 'tensor_slice_extents' for this output shard.
+ DSizes<Index, NumInputDims> tensor_slice_extents(input_slice_sizes);
+ for (int i = 0; i < NumInputDims; ++i) {
+ if (i == first_preserved_dim_input_index) {
+ // Clip first preserved dim size to output shard size.
+ tensor_slice_extents[i] = numext::mini(
+ output_shard_size,
+ input_slice_sizes[i] - (tensor_slice_offsets[i] -
+ first_preserved_dim_offset_start));
+
+ } else if (!m_reduced_dim[i]) {
+ // Clip outer preserved dims to size 1, so that we reduce a
+ // contiguous set of output coefficients.
+ tensor_slice_extents[i] = 1;
+ }
+ }
+
+ // Initialize output coefficient reducers.
+ for (int i = 0; i < num_reducers; ++i) {
+ new (&reducers[i]) BlockReducer(m_reducer);
+ }
+
+ typedef internal::TensorSliceBlockMapper<
+ Index, ScalarNonConst, NumInputDims, Layout> TensorSliceBlockMapper;
+
+ // TODO(andydavis) Consider removing 'input_block_stride_order' if we
+ // find that scattered reads are not worth supporting in
+ // TensorSliceBlockMapper.
+ TensorSliceBlockMapper block_mapper(
+ input_tensor_dims, tensor_slice_offsets, tensor_slice_extents,
+ target_input_block_sizes, DimensionList<Index, NumInputDims>());
+
+ const Index num_outputs_to_update = tensor_slice_extents[
+ first_preserved_dim_input_index];
+ const Index preserved_dim_vector_reducer_count =
+ (inner_most_dim_preserved && packet_reductions_enabled) ?
+ num_outputs_to_update / packet_size: 0;
+ const Index preserved_dim_vector_coeff_count =
+ inner_most_dim_preserved ? preserved_dim_vector_reducer_count *
+ packet_size : 0;
+ const Index preserved_dim_reducer_limit =
+ (inner_most_dim_preserved && packet_reductions_enabled) ?
+ (preserved_dim_vector_reducer_count +
+ num_outputs_to_update % packet_size) : num_outputs_to_update;
+
+ const Index total_block_count = block_mapper.total_block_count();
+ for (Index b = 0; b < total_block_count; ++b) {
+ InputTensorBlock input_block = block_mapper.GetBlockForIndex(
+ b, input_block_data);
+ // Read.
+ m_impl.block(&input_block);
+
+ Index num_values_to_reduce = 1;
+ for (Index i = 0; i < NumInputDims; ++i) {
+ if (m_reduced_dim[i]) {
+ num_values_to_reduce *= input_block.block_sizes()[i];
+ }
+ }
+ // Reduce.
+ if (inner_most_dim_preserved) {
+ const Index input_outer_dim_size =
+ input_block.block_sizes().TotalSize() / num_outputs_to_update;
+ for (Index input_outer_dim_index = 0;
+ input_outer_dim_index < input_outer_dim_size;
+ ++input_outer_dim_index) {
+ const Index input_outer_dim_base = input_outer_dim_index *
+ num_outputs_to_update;
+ for (Index i = 0; i < preserved_dim_vector_reducer_count; ++i) {
+ reducers[i].Reduce(input_outer_dim_base + i * packet_size,
+ packet_size, input_block.data());
+ }
+ const Index scalar_reducer_base = input_outer_dim_base +
+ preserved_dim_vector_coeff_count;
+ for (Index i = preserved_dim_vector_reducer_count;
+ i < preserved_dim_reducer_limit; ++i) {
+ reducers[i].Reduce(scalar_reducer_base + i -
+ preserved_dim_vector_reducer_count,
+ 1,
+ input_block.data());
+ }
+ }
+ } else {
+ for (Index i = 0; i < num_outputs_to_update; ++i) {
+ reducers[i].Reduce(i * num_values_to_reduce,
+ num_values_to_reduce,
+ input_block.data());
+ }
+ }
+ }
+
+ // Finalize all reducers for this output shard.
+ const Index output_base_index =
+ output_outer_index * output_block_inner_dim_size +
+ output_shard_index * output_shard_size;
+ if (inner_most_dim_preserved) {
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packet_size];
+ for (Index i = 0; i < preserved_dim_vector_reducer_count; ++i) {
+ const Index reducer_base = output_base_index + i * packet_size;
+ internal::pstore<CoeffReturnType, PacketReturnType>(
+ values, reducers[i].FinalizePacket());
+ for (Index j = 0; j < packet_size; ++j) {
+ output_block->data()[reducer_base + j] = values[j];
+ }
+ }
+ const Index scalar_reducer_base = output_base_index +
+ preserved_dim_vector_coeff_count;
+
+ for (Index i = preserved_dim_vector_reducer_count;
+ i < preserved_dim_reducer_limit; ++i) {
+ output_block->data()[
+ scalar_reducer_base + i - preserved_dim_vector_reducer_count] =
+ reducers[i].Finalize();
+ }
+ } else {
+ for (int i = 0; i < num_outputs_to_update; ++i) {
+ output_block->data()[output_base_index + i] =
+ reducers[i].Finalize();
+ }
+ }
+
+ // Update 'tensor_slice_offsets' by num outputs for this output shard.
+ tensor_slice_offsets[first_preserved_dim_input_index] +=
+ num_outputs_to_update;
+ }
+ // Update slice offset for inner preserved dim.
+ tensor_slice_offsets[first_preserved_dim_input_index] -=
+ output_block_inner_dim_size;
+ // Update slice offsets for remaining output dims.
+ for (int i = 0; i < NumOutputDims - 1; ++i) {
+ BlockIteratorState& b = block_iter_state[i];
+ if (++b.output_count < b.output_size) {
+ ++tensor_slice_offsets[b.input_dim];
+ break;
+ }
+ b.output_count = 0;
+ tensor_slice_offsets[b.input_dim] -= b.output_size - 1;
+ }
+ }
+
+ // Free memory.
+ m_device.deallocate(input_block_data);
+ m_device.deallocate(reducers);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ private:
+ template <int, typename, typename> friend struct internal::GenericDimReducer;
+ template <typename, typename, bool> friend struct internal::InnerMostDimReducer;
+ template <int, typename, typename, bool> friend struct internal::InnerMostDimPreserver;
+ template <typename S, typename O, typename D, bool V> friend struct internal::FullReducer;
+#ifdef EIGEN_USE_THREADS
+ template <typename S, typename O, bool V> friend struct internal::FullReducerShard;
+#endif
+#if defined(EIGEN_USE_GPU) && defined(__CUDACC__)
+ template <int B, int N, typename S, typename R, typename I> friend void internal::FullReductionKernel(R, const S, I, typename S::CoeffReturnType*);
+#endif
+
+ struct BlockIteratorState {
+ Index input_dim;
+ Index output_size;
+ Index output_count;
+ };
+
+ // Returns the Index in the input tensor of the first value that needs to be
+ // used to compute the reduction at output index "index".
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index firstInput(Index index) const {
+ if (ReducingInnerMostDims) {
+ return index * m_numValuesToReduce;
+ }
+ Index startInput = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumOutputDims - 1; i > 0; --i) {
+ // This is index_i in the output tensor.
+ const Index idx = index / m_fastOutputStrides[i];
+ startInput += idx * m_preservedStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ } else {
+ for (int i = 0; i < NumOutputDims - 1; ++i) {
+ // This is index_i in the output tensor.
+ const Index idx = index / m_fastOutputStrides[i];
+ startInput += idx * m_preservedStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ }
+ if (PreservingInnerMostDims) {
+ eigen_assert(m_numValuesToReduce == 1);
+ startInput += index;
+ } else {
+ startInput += index * m_numValuesToReduce;
+ }
+ return startInput;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void GetInputCoordsForOutputIndex(
+ Index index,
+ DSizes<Index, NumInputDims>* coords) const {
+ for (int i = 0; i < NumInputDims; ++i) {
+ (*coords)[i] = 0;
+ }
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumOutputDims - 1; i > 0; --i) {
+ const Index idx = index / m_fastOutputStrides[i];
+ (*coords)[m_output_to_input_dim_map[i]] = idx;
+ index -= idx * m_outputStrides[i];
+ }
+ (*coords)[m_output_to_input_dim_map[0]] = index;
+ } else {
+ for (int i = 0; i < NumOutputDims - 1; ++i) {
+ const Index idx = index / m_fastOutputStrides[i];
+ (*coords)[m_output_to_input_dim_map[i]] = idx;
+ index -= idx * m_outputStrides[i];
+ }
+ (*coords)[m_output_to_input_dim_map[NumOutputDims-1]] = index;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void CalculateTargetInputBlockShape(
+ const Index max_coeff_count,
+ const DSizes<Index, NumInputDims>& input_slice_sizes,
+ DSizes<Index, NumInputDims>* target_input_block_sizes) const {
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ const Index packet_size = internal::unpacket_traits<Packet>::size;
+ typedef internal::BlockReducer<Self, Op> BlockReducer;
+ // TODO(andydavis) Compute reducer overhead correctly for the case where
+ // we are preserving the inner most dimension, and a single reducer
+ // reduces a packet's worth of output coefficients.
+ const Index reducer_overhead = sizeof(BlockReducer) / sizeof(Scalar);
+
+ Index coeff_to_allocate = max_coeff_count;
+ bool first_preserved_dim_allocated = false;
+ bool first_reduced_dim_allocated = false;
+ for (int i = 0; i < NumInputDims; ++i) {
+ const int dim = static_cast<int>(Layout) == static_cast<int>(ColMajor)
+ ? i : NumInputDims - i - 1;
+ (*target_input_block_sizes)[dim] = 1;
+ if (m_reduced_dim[dim]) {
+ // TODO(andydavis) Consider allocating to multiple reduced dimensions.
+ // Watch out for cases where reduced dimensions are not contiguous,
+ // which induces scattered reads.
+ if (!first_reduced_dim_allocated) {
+ (*target_input_block_sizes)[dim] = numext::mini(input_slice_sizes[dim],
+ coeff_to_allocate);
+ coeff_to_allocate /= (*target_input_block_sizes)[dim];
+ first_reduced_dim_allocated = true;
+ }
+ } else if (!first_preserved_dim_allocated) {
+ // TODO(andydavis) Include output block size in this L1 working set
+ // calculation.
+ const Index allocated = max_coeff_count - coeff_to_allocate;
+ const Index alloc_size = numext::maxi(static_cast<Index>(1),
+ coeff_to_allocate /
+ reducer_overhead);
+ (*target_input_block_sizes)[dim] = numext::mini(input_slice_sizes[dim],
+ alloc_size);
+ coeff_to_allocate = numext::maxi(
+ static_cast<Index>(1),
+ coeff_to_allocate / ((*target_input_block_sizes)[dim] *
+ reducer_overhead));
+ first_preserved_dim_allocated = true;
+ }
+ }
+ }
+
+ // Bitmap indicating if an input dimension is reduced or not.
+ array<bool, NumInputDims> m_reduced_dim;
+ // Dimensions of the output of the operation.
+ Dimensions m_dimensions;
+ // Precomputed strides for the input tensor.
+ array<Index, NumInputDims> m_inputStrides;
+ // Precomputed strides for the output tensor.
+ array<Index, NumOutputDims> m_outputStrides;
+ array<internal::TensorIntDivisor<Index>, NumOutputDims> m_fastOutputStrides;
+ // Subset of strides of the input tensor for the non-reduced dimensions.
+ // Indexed by output dimensions.
+ array<Index, NumOutputDims> m_preservedStrides;
+ // Map from output to input dimension index.
+ array<Index, NumOutputDims> m_output_to_input_dim_map;
+ // How many values go into each reduction
+ Index m_numValuesToReduce;
+
+ // Subset of strides of the input tensor for the reduced dimensions.
+ // Indexed by reduced dimensions.
+ array<Index, NumReducedDims> m_reducedStrides;
+ // Size of the input dimensions that are reduced.
+ // Indexed by reduced dimensions.
+ array<Index, NumReducedDims> m_reducedDims;
+
+ // Evaluator for the input expression.
+ TensorEvaluator<ArgType, Device> m_impl;
+
+ // Operation to apply for computing the reduction.
+ Op m_reducer;
+
+ // For full reductions
+#ifdef EIGEN_USE_GPU
+ static const bool RunningOnGPU = internal::is_same<Device, Eigen::GpuDevice>::value;
+#else
+ static const bool RunningOnGPU = false;
+#endif
+ CoeffReturnType* m_result;
+ std::size_t m_block_total_size_max;
+
+ const Device& m_device;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h
new file mode 100644
index 0000000000..d052dcdf69
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReductionCuda.h
@@ -0,0 +1,642 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Manjunath Kudlur <keveman@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_CUDA_H
+#define EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_CUDA_H
+
+#if defined(EIGEN_USE_GPU)
+
+namespace Eigen {
+namespace internal {
+
+template <typename OutExpr, typename InExpr, typename Op, typename Indices,
+ bool Tileable>
+class TensorExecutor<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, false, Tileable> {
+ public:
+ typedef const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>
+ Expression;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+template <typename OutExpr, typename InExpr, typename Op, typename Indices,
+ bool Tileable>
+class TensorExecutor<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, true, Tileable> {
+ public:
+ typedef const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>
+ Expression;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+template <typename InExpr, typename Op, typename Indices, bool Tileable>
+class TensorExecutor<const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> >,
+ GpuDevice, false, Tileable> {
+ public:
+ typedef const TensorEvalToOp<
+ const TensorReductionOp<Op, const Indices, const InExpr> > Expression;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+template <typename InExpr, typename Op, typename Indices, bool Tileable>
+class TensorExecutor<const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> >,
+ GpuDevice, true, Tileable> {
+ public:
+ typedef const TensorEvalToOp<
+ const TensorReductionOp<Op, const Indices, const InExpr> > Expression;
+ static void run(const Expression& expr, const GpuDevice& device);
+};
+
+} // end namespace internal
+} // end namespace Eigen
+
+#if defined(__CUDACC__)
+
+namespace Eigen {
+
+namespace internal {
+
+namespace {
+
+#define DIVUP(x, y) (((x) + (y)-1) / (y))
+
+// Initialize output[0..size-1] with val
+template <typename Output>
+__global__ void InitVector(const float val, int size, Output output) {
+ int idx = blockIdx.x * blockDim.x + threadIdx.x;
+ for (int i = idx; i < size; i += gridDim.x * blockDim.x) {
+ output.coeffRef(i) = val;
+ }
+}
+
+// -----------------------------------------------------------------------------
+// Column Reduction kernels
+// -----------------------------------------------------------------------------
+template <int GRID_DIM, int BLOCK_DIM, int NUM_PER_THREAD, typename Input,
+ typename Output, typename Reducer>
+__global__ void ColumnReduceKernel(Reducer reducer, const Input input, int rows,
+ int cols, Output output) {
+ assert(blockDim.x == BLOCK_DIM);
+ assert(blockDim.y == 1);
+ assert(blockDim.z == 1);
+
+ assert(gridDim.x == GRID_DIM);
+ assert(gridDim.y == 1);
+ assert(gridDim.z == 1);
+
+ typedef typename Input::Index Index;
+
+ const Index num_input_points = DIVUP(rows, NUM_PER_THREAD) * cols;
+ const int bx = blockIdx.x;
+ const int tx = threadIdx.x;
+
+ for (Index i = bx * BLOCK_DIM + tx; i < num_input_points;
+ i += BLOCK_DIM * GRID_DIM) {
+ const Index input_col = i % cols;
+ const Index input_row_begin =
+ ((i / cols) % DIVUP(rows, NUM_PER_THREAD)) * NUM_PER_THREAD;
+ float reduced_val = reducer.bottom_value();
+ for (int j = 0; j < NUM_PER_THREAD; ++j) {
+ float val = ((input_col < cols) && (input_row_begin + j < rows))
+ ? input.coeff((input_row_begin + j) * cols + input_col)
+ : reducer.bottom_value();
+ reduced_val = reducer(reduced_val, val);
+ }
+#if __CUDA_ARCH__ >= 300
+ reducer.atomic_reduce(&output.coeffRef(input_col), reduced_val);
+#endif
+ }
+}
+
+// -----------------------------------------------------------------------------
+// Row Reduction kernels
+// -----------------------------------------------------------------------------
+template <int GRID_DIM, int BLOCK_DIM, int NUM_PER_THREAD, typename Input,
+ typename Output, typename Reducer>
+__global__ void RowReduceKernel(Reducer reducer, const Input input, int rows,
+ int cols, Output output) {
+ assert(BLOCK_DIM % 32 == 0);
+ assert(blockDim.x == BLOCK_DIM);
+ assert(blockDim.y == 1);
+ assert(blockDim.z == 1);
+
+ assert(gridDim.x == GRID_DIM);
+ assert(gridDim.y == 1);
+ assert(gridDim.z == 1);
+
+ const int unroll_times = 16;
+ assert(NUM_PER_THREAD % unroll_times == 0);
+
+ typedef typename Input::Index Index;
+
+ __shared__ float temp[BLOCK_DIM];
+
+ const Index input_col_blocks = DIVUP(cols, BLOCK_DIM * NUM_PER_THREAD);
+ const Index num_input_blocks = input_col_blocks * rows;
+
+ const int bx = blockIdx.x;
+ const int tx = threadIdx.x;
+
+ for (Index i = bx; i < num_input_blocks; i += GRID_DIM) {
+ const Index col_block = i % input_col_blocks;
+ const Index row_block = i / input_col_blocks;
+ const Index col_begin = col_block * BLOCK_DIM * NUM_PER_THREAD + tx;
+ const Index row = row_block;
+ float reduced_val = reducer.bottom_value();
+ if (row < rows) {
+ for (Index j = 0; j < NUM_PER_THREAD; j += unroll_times) {
+ const Index last_col = col_begin + BLOCK_DIM * (j + unroll_times - 1);
+ if (last_col >= cols) {
+ // We can skip the last iteration of the loop since we know
+ // that col >= cols there.
+#pragma unroll
+ for (int k = 0; k < unroll_times - 1; ++k) {
+ const Index col = col_begin + BLOCK_DIM * (j + k);
+ const float val = (col < cols ? input.coeff(row * cols + col)
+ : reducer.bottom_value());
+ reduced_val = reducer(reduced_val, val);
+ }
+ break; // col < cols for all later iterations.
+ } else {
+ // Faster version of the loop with no branches after unrolling.
+#pragma unroll
+ for (int k = 0; k < unroll_times; ++k) {
+ const Index col = col_begin + BLOCK_DIM * (j + k);
+ reduced_val = reducer(reduced_val, input.coeff(row * cols + col));
+ }
+ }
+ }
+ }
+ temp[tx] = reduced_val;
+
+ __syncthreads();
+ const int warp_id = tx & 31;
+ if (warp_id < 16) temp[tx] = reducer(temp[tx], temp[tx + 16]);
+ if (warp_id < 8) temp[tx] = reducer(temp[tx], temp[tx + 8]);
+ if (warp_id < 4) temp[tx] = reducer(temp[tx], temp[tx + 4]);
+ if (warp_id < 2) temp[tx] = reducer(temp[tx], temp[tx + 2]);
+ if (warp_id < 1) temp[tx] = reducer(temp[tx], temp[tx + 1]);
+
+ if (warp_id == 0) {
+ if (row < rows) {
+#if __CUDA_ARCH__ >= 300
+ reducer.atomic_reduce(&output.coeffRef(row), temp[tx]);
+#endif
+ }
+ }
+
+ __syncthreads();
+ }
+}
+
+template <typename Input, typename Output, typename Reducer>
+void ColumnReduceCuda(Reducer reducer, const GpuDevice& device,
+ const Input input, int rows, int cols, Output output) {
+ const int block_size = 256;
+ const int grid_size = 128;
+ const int num_per_thread = 16;
+ LAUNCH_CUDA_KERNEL(InitVector, 32, 1024, 0, device, reducer.bottom_value(),
+ cols, output);
+ LAUNCH_CUDA_KERNEL(
+ (ColumnReduceKernel<grid_size, block_size, num_per_thread>), grid_size,
+ block_size, 0, device, reducer, input, rows, cols, output);
+}
+
+template <typename Input, typename Output, typename Reducer>
+void RowReduceCuda(Reducer reducer, const GpuDevice& device, const Input input,
+ int rows, int cols, Output output) {
+ const int block_size = 256;
+ const int grid_size = 32;
+ const int num_per_thread = 128;
+ LAUNCH_CUDA_KERNEL(InitVector, 32, 1024, 0, device, reducer.bottom_value(),
+ rows, output);
+ LAUNCH_CUDA_KERNEL((RowReduceKernel<grid_size, block_size, num_per_thread>),
+ grid_size, block_size, 0, device, reducer, input, rows,
+ cols, output);
+}
+
+// Provides arbitrary sum reductions, applying a function f to the right-hand
+// argument before it is added to the running sum.
+template <typename F>
+struct FnSumReducer {
+ __host__ __device__ FnSumReducer(F f) : f_(f) {}
+ __host__ __device__ float bottom_value() { return 0.0f; }
+ __device__ float operator()(float x, float y) const { return x + f_(y); }
+ __device__ void atomic_reduce(float* x, float y) const { atomicAdd(x, y); }
+
+ F f_;
+};
+
+// Identity is used for the basic SumReduction
+struct Identity {
+ __device__ float operator()(float x) const { return x; }
+};
+
+struct CudaSumReducer : FnSumReducer<Identity> {
+ __host__ __device__ CudaSumReducer() : FnSumReducer(Identity()) {}
+};
+
+struct CudaMaxReducer {
+ // nvcc doesn't recognize numeric_limits<float>::lowest for some reason.
+ CudaMaxReducer() {
+ bottom_value_ = -3.40282347E+38F; // std::numeric_limits<float>::lowest();
+ }
+ __host__ __device__ float bottom_value() { return bottom_value_; }
+ __device__ float operator()(float x, float y) const { return fmax(x, y); }
+
+ // This is equivalent to atomicMax(x, y), but CUDA does not provide atomicMax
+ // for the float data type. Instead, this atomically compares-and-swaps y into
+ // x. If the value observed at x is already at least y, there is nothing to do
+ // and it finishes; if another thread updated x between the read and the CAS,
+ // it retries with the freshly returned value.
+ __device__ void atomic_reduce(float* x, float y) {
+ unsigned int old_val = *reinterpret_cast<unsigned int*>(x);
+ while (*reinterpret_cast<float*>(&old_val) < y) {
+ unsigned int current_val =
+ atomicCAS(reinterpret_cast<unsigned int*>(x), old_val,
+ *reinterpret_cast<unsigned int*>(&y));
+ if (old_val == current_val) {
+ break;
+ }
+ old_val = current_val;
+ }
+ }
+ float bottom_value_;
+};
+
+} // end namespace
+
+template <typename Op>
+struct IsFloatSumReduction {
+ static const bool value = false;
+};
+
+template <>
+struct IsFloatSumReduction<SumReducer<float> > {
+ static const bool value = true;
+};
+
+template <typename Op>
+struct IsFloatMaxReduction {
+ static const bool value = false;
+};
+
+template <>
+struct IsFloatMaxReduction<MaxReducer<float> > {
+ static const bool value = true;
+};
+
+template <typename Op>
+struct SumOrMaxOfFloat {
+ static const bool value =
+ IsFloatSumReduction<Op>::value || IsFloatMaxReduction<Op>::value;
+};
+
+enum ReductionType { ROW_REDUCE, COL_REDUCE, UNOPTIMIZED };
+
+template <typename Op, typename Expr, typename ReductionExpr>
+ReductionType GetReductionType(const Expr& expr,
+ const ReductionExpr& reduction_expr,
+ const GpuDevice& device, std::size_t* rows,
+ std::size_t* cols) {
+ typedef TensorEvaluator<const Expr, GpuDevice> EvalExpr;
+ typedef TensorEvaluator<const ReductionExpr, GpuDevice> ReductionEvalExpr;
+
+ if (device.majorDeviceVersion() < 3) {
+ return UNOPTIMIZED;
+ }
+ const EvalExpr eval_expr(expr, device);
+
+ // We only have fast reductions for sum/max of float.
+ if (!SumOrMaxOfFloat<Op>::value) {
+ return UNOPTIMIZED;
+ }
+
+ // For sum/max of float, if we are doing a full reduction, we can
+ // use the ROW_REDUCE optimization.
+ if (ReductionEvalExpr::NumReducedDims == ReductionEvalExpr::NumInputDims) {
+ *rows = 1;
+ *cols = array_prod(eval_expr.dimensions());
+ return ROW_REDUCE;
+ }
+
+ if (ReductionEvalExpr::NumReducedDims > 1) {
+ return UNOPTIMIZED;
+ }
+
+ const int dim = reduction_expr.dims()[0];
+ if (static_cast<int>(ReductionEvalExpr::Layout) ==
+ static_cast<int>(RowMajor)) {
+ if (dim == ReductionEvalExpr::NumInputDims - 1) {
+ *rows = array_prod(eval_expr.dimensions()) /
+ eval_expr.dimensions()[ReductionEvalExpr::NumInputDims - 1];
+ *cols = eval_expr.dimensions()[ReductionEvalExpr::NumInputDims - 1];
+ if (*cols < 32) return UNOPTIMIZED;
+ return ROW_REDUCE;
+ } else if (dim == 0) {
+ *rows = eval_expr.dimensions()[0];
+ *cols = array_prod(eval_expr.dimensions()) / eval_expr.dimensions()[0];
+ if (*rows < 32) return UNOPTIMIZED;
+ return COL_REDUCE;
+ }
+ } else if (static_cast<int>(ReductionEvalExpr::Layout) ==
+ static_cast<int>(ColMajor)) {
+ if (dim == ReductionEvalExpr::NumInputDims - 1) {
+ *rows = eval_expr.dimensions()[ReductionEvalExpr::NumInputDims - 1];
+ *cols = array_prod(eval_expr.dimensions()) /
+ eval_expr.dimensions()[ReductionEvalExpr::NumInputDims - 1];
+ if (*rows < 32) return UNOPTIMIZED;
+ return COL_REDUCE;
+ } else if (dim == 0) {
+ *rows = array_prod(eval_expr.dimensions()) / eval_expr.dimensions()[0];
+ *cols = eval_expr.dimensions()[0];
+ if (*cols < 32) return UNOPTIMIZED;
+ return ROW_REDUCE;
+ }
+ }
+ return UNOPTIMIZED;
+}
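+// Classification example for GetReductionType(): reducing the last dimension
+// of a RowMajor 1024 x 512 float tensor yields rows = 1024, cols = 512 and is
+// handled as a ROW_REDUCE; reducing dimension 0 of a 64 x 4096 tensor yields
+// rows = 64, cols = 4096 and is handled as a COL_REDUCE. Anything with more
+// than one reduced dimension (short of a full reduction) falls back to
+// UNOPTIMIZED.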
+
+template <typename Expression, typename Index, bool Vectorizable>
+struct LaunchKernel;
+
+template <typename Expression, typename Index>
+struct LaunchKernel<Expression, Index, true> {
+ static void launch(int num_blocks, int block_size, const GpuDevice& device,
+ const TensorEvaluator<Expression, GpuDevice>& evaluator,
+ Index size) {
+ LAUNCH_CUDA_KERNEL(
+ (EigenMetaKernel_Vectorizable<TensorEvaluator<Expression, GpuDevice>,
+ Index>),
+ num_blocks, block_size, 0, device, evaluator, size);
+ }
+};
+
+template <typename Expression, typename Index>
+struct LaunchKernel<Expression, Index, false> {
+ static void launch(int num_blocks, int block_size, const GpuDevice& device,
+ const TensorEvaluator<Expression, GpuDevice>& evaluator,
+ Index size) {
+ LAUNCH_CUDA_KERNEL(
+ (EigenMetaKernel_NonVectorizable<TensorEvaluator<Expression, GpuDevice>,
+ Index>),
+ num_blocks, block_size, 0, device, evaluator, size);
+ }
+};
+
+template <typename F, typename LHS, typename RHS, bool Compatible>
+struct LaunchRowReduce;
+
+template <typename F, typename LHS, typename RHS>
+struct LaunchRowReduce<F, LHS, RHS, true> {
+ static void launch(const GpuDevice& device, RHS input, std::size_t rows,
+ std::size_t cols, LHS output) {
+ RowReduceCuda(F(), device, input, rows, cols, output);
+ }
+};
+
+template <typename F, typename LHS, typename RHS>
+struct LaunchRowReduce<F, LHS, RHS, false> {
+ static void launch(const GpuDevice& device, RHS input, std::size_t rows,
+ std::size_t cols, LHS output) {}
+};
+
+template <typename F, typename LHS, typename RHS, bool Compatible>
+struct LaunchColReduce;
+
+template <typename F, typename LHS, typename RHS>
+struct LaunchColReduce<F, LHS, RHS, true> {
+ static void launch(const GpuDevice& device, RHS input, std::size_t rows,
+ std::size_t cols, LHS output) {
+ ColumnReduceCuda(F(), device, input, rows, cols, output);
+ }
+};
+
+template <typename F, typename LHS, typename RHS>
+struct LaunchColReduce<F, LHS, RHS, false> {
+ static void launch(const GpuDevice& device, RHS input, std::size_t rows,
+ std::size_t cols, LHS output) {}
+};
+
+template <typename Expression, typename Device, bool Vectorizable>
+class TensorAssignExecutorHelper;
+
+template <typename OutExpr, typename InExpr, typename Op, typename Indices,
+ bool Vectorizable>
+class TensorAssignExecutorHelper<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, Vectorizable> {
+ public:
+ typedef const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>
+ Expression;
+
+ typedef typename Expression::Index Index;
+ typedef TensorEvaluator<OutExpr, GpuDevice> LHSEval;
+ typedef TensorEvaluator<const InExpr, GpuDevice> RHSEval;
+ static inline void run(const Expression& expr, const GpuDevice& device) {
+ std::size_t rows, cols;
+ const ReductionType reduction_type =
+ GetReductionType<Op>(expr.rhsExpression().expression(),
+ expr.rhsExpression(), device, &rows, &cols);
+ if (reduction_type == UNOPTIMIZED) {
+ TensorEvaluator<Expression, GpuDevice> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ const int num_blocks = device.getNumCudaMultiProcessors() *
+ device.maxCudaThreadsPerMultiProcessor() /
+ device.maxCudaThreadsPerBlock();
+ const int block_size = device.maxCudaThreadsPerBlock();
+ const Index size = array_prod(evaluator.dimensions());
+ LaunchKernel<Expression, Index, Vectorizable>::launch(
+ num_blocks, block_size, device, evaluator, size);
+ }
+ evaluator.cleanup();
+ } else {
+ LHSEval output(expr.lhsExpression(), device);
+ RHSEval input(expr.rhsExpression().expression(), device);
+ bool lhs_needs_assign = output.evalSubExprsIfNeeded(NULL);
+ bool rhs_needs_assign = input.evalSubExprsIfNeeded(NULL);
+ if (lhs_needs_assign && rhs_needs_assign) {
+ const bool Compatible =
+ IsFloatSumReduction<Op>::value || IsFloatMaxReduction<Op>::value;
+ if (reduction_type == ROW_REDUCE) {
+ if (IsFloatSumReduction<Op>::value) {
+ LaunchRowReduce<CudaSumReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else if (IsFloatMaxReduction<Op>::value) {
+ LaunchRowReduce<CudaMaxReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else {
+ // Unsupported reduction type
+ assert(false && "Unsupported reduction function for ROW_REDUCE");
+ }
+ } else {
+ if (IsFloatSumReduction<Op>::value) {
+ LaunchColReduce<CudaSumReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else if (IsFloatMaxReduction<Op>::value) {
+ LaunchColReduce<CudaMaxReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else {
+ // Unsupported reduction type
+ assert(false && "Unsupported reduction function for COL_REDUCE");
+ }
+ }
+ }
+ input.cleanup();
+ output.cleanup();
+ }
+ }
+};
+
+template <typename OutExpr, typename InExpr, typename Op, typename Indices,
+ bool Tileable>
+inline void TensorExecutor<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, false, Tileable>::run(const Expression& expr,
+ const GpuDevice& device) {
+ TensorAssignExecutorHelper<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, false>::run(expr, device);
+}
+
+template <typename OutExpr, typename InExpr, typename Op, typename Indices,
+ bool Tileable>
+inline void TensorExecutor<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, true, Tileable>::run(const Expression& expr,
+ const GpuDevice& device) {
+ TensorAssignExecutorHelper<
+ const TensorAssignOp<
+ OutExpr, TensorReductionOp<Op, Indices const, InExpr const> const>,
+ GpuDevice, true>::run(expr, device);
+}
+
+template <typename T, typename Index>
+struct PtrWrapper {
+ EIGEN_DEVICE_FUNC PtrWrapper(T* ptr) : m_ptr(ptr) {}
+ EIGEN_DEVICE_FUNC T& coeffRef(Index i) { return *(m_ptr + i); }
+ T* m_ptr;
+};
+
+template <typename Expression, typename Device, bool Vectorizable>
+class TensorEvalToExecutorHelper;
+
+template <typename InExpr, typename Op, typename Indices, bool Vectorizable>
+class TensorEvalToExecutorHelper<const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> >,
+ GpuDevice, Vectorizable> {
+ public:
+ typedef const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> > Expression;
+ typedef typename Expression::Index Index;
+ typedef TensorEvaluator<const InExpr, GpuDevice> RHSEval;
+
+ static inline void run(const Expression& expr, const GpuDevice& device) {
+ std::size_t rows, cols;
+ const ReductionType reduction_type =
+ GetReductionType<Op>(expr.expression().expression(), expr.expression(),
+ device, &rows, &cols);
+ if (reduction_type == UNOPTIMIZED) {
+ TensorEvaluator<Expression, GpuDevice> evaluator(expr, device);
+ const bool needs_assign = evaluator.evalSubExprsIfNeeded(NULL);
+ if (needs_assign) {
+ const int num_blocks = device.getNumCudaMultiProcessors() *
+ device.maxCudaThreadsPerMultiProcessor() /
+ device.maxCudaThreadsPerBlock();
+ const int block_size = device.maxCudaThreadsPerBlock();
+ const Index size = array_prod(evaluator.dimensions());
+ LaunchKernel<Expression, Index, Vectorizable>::launch(
+ num_blocks, block_size, device, evaluator, size);
+ }
+ evaluator.cleanup();
+ } else {
+ typedef typename internal::remove_const<typename Expression::Scalar>::type Scalar;
+ PtrWrapper<Scalar, Index> output(expr.buffer());
+ TensorEvaluator<const InExpr, GpuDevice> input(
+ expr.expression().expression(), device);
+ typedef PtrWrapper<Scalar, Index> LHSEval;
+ typedef TensorEvaluator<const InExpr, GpuDevice> RHSEval;
+ bool rhs_needs_assign = input.evalSubExprsIfNeeded(NULL);
+ if (rhs_needs_assign) {
+ const bool Compatible =
+ IsFloatSumReduction<Op>::value || IsFloatMaxReduction<Op>::value;
+ if (reduction_type == ROW_REDUCE) {
+ if (IsFloatSumReduction<Op>::value) {
+ LaunchRowReduce<CudaSumReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else if (IsFloatMaxReduction<Op>::value) {
+ LaunchRowReduce<CudaMaxReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ }
+ } else {
+ if (IsFloatSumReduction<Op>::value) {
+ LaunchColReduce<CudaSumReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ } else if (IsFloatMaxReduction<Op>::value) {
+ LaunchColReduce<CudaMaxReducer, LHSEval, RHSEval,
+ Compatible>::launch(device, input, rows, cols,
+ output);
+ }
+ }
+ }
+ input.cleanup();
+ }
+ }
+};
+
+template <typename InExpr, typename Op, typename Indices, bool Tileable>
+inline void
+TensorExecutor<const TensorEvalToOp<
+ const TensorReductionOp<Op, const Indices, const InExpr> >,
+ GpuDevice, false, Tileable>::run(const Expression& expr,
+ const GpuDevice& device) {
+ TensorEvalToExecutorHelper<const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> >,
+ GpuDevice, false>::run(expr, device);
+}
+
+template <typename InExpr, typename Op, typename Indices, bool Tileable>
+inline void
+TensorExecutor<const TensorEvalToOp<
+ const TensorReductionOp<Op, const Indices, const InExpr> >,
+ GpuDevice, true, Tileable>::run(const Expression& expr,
+ const GpuDevice& device) {
+ TensorEvalToExecutorHelper<const TensorEvalToOp<const TensorReductionOp<
+ Op, const Indices, const InExpr> >,
+ GpuDevice, true>::run(expr, device);
+}
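+// Editorial note (not part of the original header): the specializations above
+// intercept float sum/max reductions of a two-dimensional expression, e.g.
+// something of the shape
+//
+//   out.device(gpu_device) = in.sum(reduce_dims);  // placeholder names
+//
+// and, when GetReductionType() recognizes a row or column reduction, dispatch
+// it to the dedicated CUDA kernels; every other reduction takes the generic
+// UNOPTIMIZED path and is evaluated by the default kernel launch.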
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // __CUDACC__
+#endif // EIGEN_USE_GPU
+#endif // EIGEN_CXX11_TENSOR_TENSOR_REDUCTION_CUDA_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h
new file mode 100644
index 0000000000..fb8ba09dd3
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorRef.h
@@ -0,0 +1,442 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_REF_H
+#define EIGEN_CXX11_TENSOR_TENSOR_REF_H
+
+namespace Eigen {
+
+namespace internal {
+
+template <typename Dimensions, typename Scalar>
+class TensorLazyBaseEvaluator {
+ public:
+ TensorLazyBaseEvaluator() : m_refcount(0) { }
+ virtual ~TensorLazyBaseEvaluator() { }
+
+ EIGEN_DEVICE_FUNC virtual const Dimensions& dimensions() const = 0;
+ EIGEN_DEVICE_FUNC virtual const Scalar* data() const = 0;
+
+ EIGEN_DEVICE_FUNC virtual const Scalar coeff(DenseIndex index) const = 0;
+ EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex index) = 0;
+
+ void incrRefCount() { ++m_refcount; }
+ void decrRefCount() { --m_refcount; }
+ int refCount() const { return m_refcount; }
+
+ private:
+  // No copy, no assignment.
+ TensorLazyBaseEvaluator(const TensorLazyBaseEvaluator& other);
+ TensorLazyBaseEvaluator& operator = (const TensorLazyBaseEvaluator& other);
+
+ int m_refcount;
+};
+
+
+template <typename Dimensions, typename Expr, typename Device>
+class TensorLazyEvaluatorReadOnly : public TensorLazyBaseEvaluator<Dimensions, typename TensorEvaluator<Expr, Device>::Scalar> {
+ public:
+ // typedef typename TensorEvaluator<Expr, Device>::Dimensions Dimensions;
+ typedef typename TensorEvaluator<Expr, Device>::Scalar Scalar;
+
+ TensorLazyEvaluatorReadOnly(const Expr& expr, const Device& device) : m_impl(expr, device), m_dummy(Scalar(0)) {
+ m_dims = m_impl.dimensions();
+ m_impl.evalSubExprsIfNeeded(NULL);
+ }
+ virtual ~TensorLazyEvaluatorReadOnly() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC virtual const Dimensions& dimensions() const {
+ return m_dims;
+ }
+ EIGEN_DEVICE_FUNC virtual const Scalar* data() const {
+ return m_impl.data();
+ }
+
+ EIGEN_DEVICE_FUNC virtual const Scalar coeff(DenseIndex index) const {
+ return m_impl.coeff(index);
+ }
+  EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex /*index*/) {
+    eigen_assert(false && "can't reference the coefficient of an rvalue");
+    return m_dummy;
+  }
+
+ protected:
+ TensorEvaluator<Expr, Device> m_impl;
+ Dimensions m_dims;
+ Scalar m_dummy;
+};
+
+template <typename Dimensions, typename Expr, typename Device>
+class TensorLazyEvaluatorWritable : public TensorLazyEvaluatorReadOnly<Dimensions, Expr, Device> {
+ public:
+ typedef TensorLazyEvaluatorReadOnly<Dimensions, Expr, Device> Base;
+ typedef typename Base::Scalar Scalar;
+
+ TensorLazyEvaluatorWritable(const Expr& expr, const Device& device) : Base(expr, device) {
+ }
+ virtual ~TensorLazyEvaluatorWritable() {
+ }
+
+ EIGEN_DEVICE_FUNC virtual Scalar& coeffRef(DenseIndex index) {
+ return this->m_impl.coeffRef(index);
+ }
+};
+
+template <typename Dimensions, typename Expr, typename Device>
+class TensorLazyEvaluator : public internal::conditional<bool(internal::is_lvalue<Expr>::value),
+ TensorLazyEvaluatorWritable<Dimensions, Expr, Device>,
+ TensorLazyEvaluatorReadOnly<Dimensions, const Expr, Device> >::type {
+ public:
+ typedef typename internal::conditional<bool(internal::is_lvalue<Expr>::value),
+ TensorLazyEvaluatorWritable<Dimensions, Expr, Device>,
+ TensorLazyEvaluatorReadOnly<Dimensions, const Expr, Device> >::type Base;
+ typedef typename Base::Scalar Scalar;
+
+ TensorLazyEvaluator(const Expr& expr, const Device& device) : Base(expr, device) {
+ }
+ virtual ~TensorLazyEvaluator() {
+ }
+};
+
+} // namespace internal
+
+
+/** \class TensorRef
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief A reference to a tensor expression
+ * The expression will be evaluated lazily (as much as possible).
+ *
+ */
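+// Usage sketch (editorial illustration, not part of the original header),
+// assuming a rank-3 float Tensor named `t`:
+//
+//   Eigen::Tensor<float, 3> t(2, 3, 4);
+//   t.setRandom();
+//   Eigen::TensorRef<Eigen::Tensor<float, 3> > ref = t * 2.0f;
+//   // Nothing has been evaluated yet; coefficients are computed on demand
+//   // through the lazy evaluator owned by the TensorRef.
+//   float v = ref(1, 2, 3);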
+template<typename PlainObjectType> class TensorRef : public TensorBase<TensorRef<PlainObjectType> >
+{
+ public:
+ typedef TensorRef<PlainObjectType> Self;
+ typedef typename PlainObjectType::Base Base;
+ typedef typename Eigen::internal::nested<Self>::type Nested;
+ typedef typename internal::traits<PlainObjectType>::StorageKind StorageKind;
+ typedef typename internal::traits<PlainObjectType>::Index Index;
+ typedef typename internal::traits<PlainObjectType>::Scalar Scalar;
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+ typedef Scalar* PointerType;
+ typedef PointerType PointerArgType;
+
+ static const Index NumIndices = PlainObjectType::NumIndices;
+ typedef typename PlainObjectType::Dimensions Dimensions;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = false,
+ BlockAccess = false,
+ Layout = PlainObjectType::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_STRONG_INLINE TensorRef() : m_evaluator(NULL) {
+ }
+
+ template <typename Expression>
+ EIGEN_STRONG_INLINE TensorRef(Expression& expr) : m_evaluator(new internal::TensorLazyEvaluator<Dimensions, Expression, DefaultDevice>(expr, DefaultDevice())) {
+ m_evaluator->incrRefCount();
+ }
+
+ template <typename Expression>
+ EIGEN_STRONG_INLINE TensorRef(const Expression& expr) : m_evaluator(new internal::TensorLazyEvaluator<Dimensions, const Expression, DefaultDevice>(expr, DefaultDevice())) {
+ m_evaluator->incrRefCount();
+ }
+
+ template <typename Expression>
+ EIGEN_STRONG_INLINE TensorRef& operator = (const Expression& expr) {
+ unrefEvaluator();
+ m_evaluator = new internal::TensorLazyEvaluator<Dimensions, Expression, DefaultDevice>(expr, DefaultDevice());
+ m_evaluator->incrRefCount();
+ return *this;
+ }
+
+ ~TensorRef() {
+ unrefEvaluator();
+ }
+
+ TensorRef(const TensorRef& other) : m_evaluator(other.m_evaluator) {
+ eigen_assert(m_evaluator->refCount() > 0);
+ m_evaluator->incrRefCount();
+ }
+
+ TensorRef(TensorRef& other) : m_evaluator(other.m_evaluator) {
+ eigen_assert(m_evaluator->refCount() > 0);
+ m_evaluator->incrRefCount();
+ }
+
+ TensorRef& operator = (const TensorRef& other) {
+ if (this != &other) {
+ unrefEvaluator();
+ m_evaluator = other.m_evaluator;
+ eigen_assert(m_evaluator->refCount() > 0);
+ m_evaluator->incrRefCount();
+ }
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index rank() const { return m_evaluator->dimensions().size(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index dimension(Index n) const { return m_evaluator->dimensions()[n]; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_evaluator->dimensions(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Index size() const { return m_evaluator->dimensions().TotalSize(); }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar* data() const { return m_evaluator->data(); }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index index) const
+ {
+ return m_evaluator->coeff(index);
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index firstIndex, IndexTypes... otherIndices) const
+ {
+ const std::size_t NumIndices = (sizeof...(otherIndices) + 1);
+ const array<Index, NumIndices> indices{{firstIndex, otherIndices...}};
+ return coeff(indices);
+ }
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index firstIndex, IndexTypes... otherIndices)
+ {
+ const std::size_t NumIndices = (sizeof...(otherIndices) + 1);
+ const array<Index, NumIndices> indices{{firstIndex, otherIndices...}};
+ return coeffRef(indices);
+ }
+#else
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1) const
+ {
+ array<Index, 2> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ return coeff(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2) const
+ {
+ array<Index, 3> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ return coeff(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2, Index i3) const
+ {
+ array<Index, 4> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ indices[3] = i3;
+ return coeff(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar operator()(Index i0, Index i1, Index i2, Index i3, Index i4) const
+ {
+ array<Index, 5> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ indices[3] = i3;
+ indices[4] = i4;
+ return coeff(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1)
+ {
+ array<Index, 2> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ return coeffRef(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1, Index i2)
+ {
+ array<Index, 3> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ return coeffRef(indices);
+ }
+ EIGEN_DEVICE_FUNC
+  EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1, Index i2, Index i3)
+ {
+ array<Index, 4> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ indices[3] = i3;
+ return coeffRef(indices);
+ }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index i0, Index i1, Index i2, Index i3, Index i4)
+ {
+ array<Index, 5> indices;
+ indices[0] = i0;
+ indices[1] = i1;
+ indices[2] = i2;
+ indices[3] = i3;
+ indices[4] = i4;
+ return coeffRef(indices);
+ }
+#endif
+
+ template <std::size_t NumIndices> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(const array<Index, NumIndices>& indices) const
+ {
+ const Dimensions& dims = this->dimensions();
+ Index index = 0;
+ if (PlainObjectType::Options & RowMajor) {
+ index += indices[0];
+ for (int i = 1; i < NumIndices; ++i) {
+ index = index * dims[i] + indices[i];
+ }
+ } else {
+ index += indices[NumIndices-1];
+ for (int i = NumIndices-2; i >= 0; --i) {
+ index = index * dims[i] + indices[i];
+ }
+ }
+ return m_evaluator->coeff(index);
+ }
+ template <std::size_t NumIndices> EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(const array<Index, NumIndices>& indices)
+ {
+ const Dimensions& dims = this->dimensions();
+ Index index = 0;
+ if (PlainObjectType::Options & RowMajor) {
+ index += indices[0];
+ for (int i = 1; i < NumIndices; ++i) {
+ index = index * dims[i] + indices[i];
+ }
+ } else {
+ index += indices[NumIndices-1];
+ for (int i = NumIndices-2; i >= 0; --i) {
+ index = index * dims[i] + indices[i];
+ }
+ }
+ return m_evaluator->coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const Scalar coeff(Index index) const
+ {
+ return m_evaluator->coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ return m_evaluator->coeffRef(index);
+ }
+
+ private:
+ EIGEN_STRONG_INLINE void unrefEvaluator() {
+ if (m_evaluator) {
+ m_evaluator->decrRefCount();
+ if (m_evaluator->refCount() == 0) {
+ delete m_evaluator;
+ }
+ }
+ }
+
+ internal::TensorLazyBaseEvaluator<Dimensions, Scalar>* m_evaluator;
+};
+
+
+// evaluator for rvalues
+template<typename Derived, typename Device>
+struct TensorEvaluator<const TensorRef<Derived>, Device>
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Packet Packet;
+ typedef typename Derived::Scalar CoeffReturnType;
+ typedef typename Derived::Packet PacketReturnType;
+ typedef typename Derived::Dimensions Dimensions;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = false,
+ BlockAccess = false,
+ Layout = TensorRef<Derived>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const TensorRef<Derived>& m, const Device&)
+ : m_ref(m)
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_ref.dimensions(); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const {
+ return m_ref.coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) {
+ return m_ref.coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return m_ref.data(); }
+
+ protected:
+ TensorRef<Derived> m_ref;
+};
+
+
+// evaluator for lvalues
+template<typename Derived, typename Device>
+struct TensorEvaluator<TensorRef<Derived>, Device> : public TensorEvaluator<const TensorRef<Derived>, Device>
+{
+ typedef typename Derived::Index Index;
+ typedef typename Derived::Scalar Scalar;
+ typedef typename Derived::Packet Packet;
+ typedef typename Derived::Scalar CoeffReturnType;
+ typedef typename Derived::Packet PacketReturnType;
+ typedef typename Derived::Dimensions Dimensions;
+
+ typedef TensorEvaluator<const TensorRef<Derived>, Device> Base;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = false,
+ BlockAccess = false,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(TensorRef<Derived>& m, const Device& d) : Base(m, d)
+ { }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) {
+ return this->m_ref.coeffRef(index);
+ }
+};
+
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_REF_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h
new file mode 100644
index 0000000000..44e147de3e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReverse.h
@@ -0,0 +1,278 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Navdeep Jaitly <ndjaitly@google.com>
+// Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H
+#define EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H
+namespace Eigen {
+
+/** \class TensorReverse
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor reverse elements class.
+ *
+ */
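+// Usage sketch (editorial note, not part of the original header): reversal is
+// normally requested through TensorBase::reverse(), which builds a
+// TensorReverseOp. The names `in` and `out` are placeholders.
+//
+//   Eigen::Tensor<float, 2> in(3, 4);
+//   in.setRandom();
+//   Eigen::array<bool, 2> rev{{true, false}};  // flip dimension 0 only
+//   Eigen::Tensor<float, 2> out = in.reverse(rev);
+//   // out(i, j) == in(2 - i, j)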
+namespace internal {
+template<typename ReverseDimensions, typename XprType>
+struct traits<TensorReverseOp<ReverseDimensions,
+ XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename ReverseDimensions, typename XprType>
+struct eval<TensorReverseOp<ReverseDimensions, XprType>, Eigen::Dense>
+{
+ typedef const TensorReverseOp<ReverseDimensions, XprType>& type;
+};
+
+template<typename ReverseDimensions, typename XprType>
+struct nested<TensorReverseOp<ReverseDimensions, XprType>, 1,
+ typename eval<TensorReverseOp<ReverseDimensions, XprType> >::type>
+{
+ typedef TensorReverseOp<ReverseDimensions, XprType> type;
+};
+
+} // end namespace internal
+
+template<typename ReverseDimensions, typename XprType>
+class TensorReverseOp : public TensorBase<TensorReverseOp<ReverseDimensions,
+ XprType>, WriteAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorReverseOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorReverseOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorReverseOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorReverseOp>::StorageKind
+ StorageKind;
+ typedef typename Eigen::internal::traits<TensorReverseOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorReverseOp(
+ const XprType& expr, const ReverseDimensions& reverse_dims)
+ : m_xpr(expr), m_reverse_dims(reverse_dims) {}
+
+ EIGEN_DEVICE_FUNC
+ const ReverseDimensions& reverse() const { return m_reverse_dims; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorReverseOp& operator = (const TensorReverseOp& other)
+ {
+ typedef TensorAssignOp<TensorReverseOp, const TensorReverseOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorReverseOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorReverseOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const ReverseDimensions m_reverse_dims;
+};
+
+// Eval as rvalue
+template<typename ReverseDimensions, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorReverseOp<ReverseDimensions, ArgType>, Device>
+{
+ typedef TensorReverseOp<ReverseDimensions, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<ReverseDimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op,
+ const Device& device)
+ : m_impl(op.expression(), device), m_reverse(op.reverse())
+ {
+ // Compute strides
+ m_dimensions = m_impl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_strides[i] = m_strides[i-1] * m_dimensions[i-1];
+ }
+ } else {
+ m_strides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_strides[i] = m_strides[i+1] * m_dimensions[i+1];
+ }
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index reverseIndex(
+ Index index) const {
+ eigen_assert(index < dimensions().TotalSize());
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ Index idx = index / m_strides[i];
+ index -= idx * m_strides[i];
+ if (m_reverse[i]) {
+ idx = m_dimensions[i] - idx - 1;
+ }
+        inputIndex += idx * m_strides[i];
+ }
+ if (m_reverse[0]) {
+ inputIndex += (m_dimensions[0] - index - 1);
+ } else {
+ inputIndex += index;
+ }
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ Index idx = index / m_strides[i];
+ index -= idx * m_strides[i];
+ if (m_reverse[i]) {
+ idx = m_dimensions[i] - idx - 1;
+ }
+        inputIndex += idx * m_strides[i];
+ }
+ if (m_reverse[NumDims-1]) {
+ inputIndex += (m_dimensions[NumDims-1] - index - 1);
+ } else {
+ inputIndex += index;
+ }
+ }
+ return inputIndex;
+ }
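+  // Worked example (editorial note, not in the original header): for a
+  // col-major 2x3 tensor with m_reverse = {true, false}, m_strides = {1, 2}.
+  // Output index 1 is coordinate (1, 0); only dimension 0 is reversed, so it
+  // maps to coordinate (0, 0), i.e. input index 0 -- exactly what the
+  // arithmetic above computes (idx = 0 for dim 1, then 2 - 1 - 1 = 0 for dim 0).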
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(
+ Index index) const {
+ return m_impl.coeff(reverseIndex(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ // TODO(ndjaitly): write a better packing routine that uses
+ // local structure.
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type
+ values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_strides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ ReverseDimensions m_reverse;
+};
+
+// Eval as lvalue
+template <typename ReverseDimensions, typename ArgType, typename Device>
+struct TensorEvaluator<TensorReverseOp<ReverseDimensions, ArgType>, Device>
+ : public TensorEvaluator<const TensorReverseOp<ReverseDimensions, ArgType>,
+ Device> {
+ typedef TensorEvaluator<const TensorReverseOp<ReverseDimensions, ArgType>,
+ Device> Base;
+ typedef TensorReverseOp<ReverseDimensions, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<ReverseDimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op,
+ const Device& device)
+ : Base(op, device) {}
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Dimensions& dimensions() const { return this->m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index) {
+ return this->m_impl.coeffRef(Base::reverseIndex(index));
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x) {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ // This code is pilfered from TensorMorphing.h
+ EIGEN_ALIGN_DEFAULT CoeffReturnType values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ for (int i = 0; i < packetSize; ++i) {
+ this->coeffRef(index+i) = values[i];
+ }
+  }
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_REVERSE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h
new file mode 100644
index 0000000000..2e59a147bc
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorShuffling.h
@@ -0,0 +1,412 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H
+
+namespace Eigen {
+
+/** \class TensorShuffling
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor shuffling class.
+ *
+ *
+ */
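+// Usage sketch (editorial note, not part of the original header): a shuffle
+// permutes the order of a tensor's dimensions. `in` is a placeholder tensor.
+//
+//   Eigen::Tensor<float, 3> in(2, 3, 4);
+//   in.setRandom();
+//   Eigen::array<int, 3> perm{{1, 2, 0}};
+//   Eigen::Tensor<float, 3> out = in.shuffle(perm);
+//   // out has dimensions (3, 4, 2) and out(i, j, k) == in(k, i, j)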
+namespace internal {
+template<typename Shuffle, typename XprType>
+struct traits<TensorShufflingOp<Shuffle, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename Shuffle, typename XprType>
+struct eval<TensorShufflingOp<Shuffle, XprType>, Eigen::Dense>
+{
+ typedef const TensorShufflingOp<Shuffle, XprType>& type;
+};
+
+template<typename Shuffle, typename XprType>
+struct nested<TensorShufflingOp<Shuffle, XprType>, 1, typename eval<TensorShufflingOp<Shuffle, XprType> >::type>
+{
+ typedef TensorShufflingOp<Shuffle, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Shuffle, typename XprType>
+class TensorShufflingOp : public TensorBase<TensorShufflingOp<Shuffle, XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorShufflingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorShufflingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorShufflingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorShufflingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorShufflingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorShufflingOp(const XprType& expr, const Shuffle& shuffle)
+ : m_xpr(expr), m_shuffle(shuffle) {}
+
+ EIGEN_DEVICE_FUNC
+ const Shuffle& shufflePermutation() const { return m_shuffle; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorShufflingOp& operator = (const TensorShufflingOp& other)
+ {
+ typedef TensorAssignOp<TensorShufflingOp, const TensorShufflingOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorShufflingOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorShufflingOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Shuffle m_shuffle;
+};
+
+
+// Eval as rvalue
+template<typename Shuffle, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorShufflingOp<Shuffle, ArgType>, Device>
+{
+ typedef TensorShufflingOp<Shuffle, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename internal::remove_const<Scalar>::type ScalarNonConst;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, NumDims,
+ TensorEvaluator<ArgType, Device>::Layout> TensorBlock;
+ typedef typename internal::TensorBlockReader<
+ Index, typename internal::remove_const<Scalar>::type, NumDims,
+ TensorEvaluator<ArgType, Device>::Layout, PacketAccess> TensorBlockReader;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_shuffle(op.shufflePermutation()), m_impl(op.expression(), device)
+ {
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ for (int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] = input_dims[m_shuffle[i]];
+ m_inverseShuffle[m_shuffle[i]] = i;
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_unshuffledInputStrides[0] = 1;
+ m_outputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_unshuffledInputStrides[i] =
+ m_unshuffledInputStrides[i - 1] * input_dims[i - 1];
+ m_outputStrides[i] = m_outputStrides[i - 1] * m_dimensions[i - 1];
+ }
+ } else {
+ m_unshuffledInputStrides[NumDims - 1] = 1;
+ m_outputStrides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_unshuffledInputStrides[i] =
+ m_unshuffledInputStrides[i + 1] * input_dims[i + 1];
+ m_outputStrides[i] = m_outputStrides[i + 1] * m_dimensions[i + 1];
+ }
+ }
+
+ for (int i = 0; i < NumDims; ++i) {
+ m_inputStrides[i] = m_unshuffledInputStrides[m_shuffle[i]];
+ }
+
+ m_block_total_size_max = numext::maxi(static_cast<std::size_t>(1),
+ device.firstLevelCacheSize() /
+ sizeof(Scalar));
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(srcCoeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void getResourceRequirements(
+ std::vector<internal::TensorOpResourceRequirements>* resources) const {
+ resources->push_back(internal::TensorOpResourceRequirements(
+ internal::kUniformAllDims, m_block_total_size_max));
+ m_impl.getResourceRequirements(resources);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void block(
+ TensorBlock* output_block) const {
+ if (m_impl.data() != NULL) {
+ // Fast path: we have direct access to the data, so shuffle as we read.
+ TensorBlockReader::Run(output_block,
+ srcCoeff(output_block->first_coeff_index()),
+ m_inverseShuffle,
+ m_unshuffledInputStrides,
+ m_impl.data());
+ return;
+ }
+
+ // Slow path: read unshuffled block from the input and shuffle in-place.
+ // Initialize input block sizes using input-to-output shuffle map.
+ DSizes<Index, NumDims> input_block_sizes;
+ for (Index i = 0; i < NumDims; ++i) {
+ input_block_sizes[i] = output_block->block_sizes()[m_inverseShuffle[i]];
+ }
+
+ // Calculate input block strides.
+ DSizes<Index, NumDims> input_block_strides;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ input_block_strides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ input_block_strides[i] = input_block_strides[i - 1] *
+ input_block_sizes[i - 1];
+ }
+ } else {
+ input_block_strides[NumDims - 1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ input_block_strides[i] = input_block_strides[i + 1] *
+ input_block_sizes[i + 1];
+ }
+ }
+
+ // Read input block.
+ TensorBlock input_block(srcCoeff(output_block->first_coeff_index()),
+ input_block_sizes,
+ input_block_strides,
+ m_unshuffledInputStrides,
+ output_block->data());
+
+ m_impl.block(&input_block);
+
+    // Naive in-place shuffle: random I/O, but the block size is O(L1 cache size).
+ // TODO(andydavis) Improve the performance of this in-place shuffle.
+ const Index total_size = input_block_sizes.TotalSize();
+ std::vector<bool> bitmap(total_size, false);
+ ScalarNonConst* data = const_cast<ScalarNonConst*>(output_block->data());
+ const DSizes<Index, NumDims>& output_block_strides =
+ output_block->block_strides();
+ for (Index input_index = 0; input_index < total_size; ++input_index) {
+ if (bitmap[input_index]) {
+ // Coefficient at this index has already been shuffled.
+ continue;
+ }
+
+ Index output_index = GetBlockOutputIndex(input_index,
+ input_block_strides,
+ output_block_strides);
+ if (output_index == input_index) {
+ // Coefficient already in place.
+ bitmap[output_index] = true;
+ continue;
+ }
+
+      // The following loop starts at 'input_index' and moves coefficients
+      // into their shuffled locations at 'output_index'. It walks the array
+      // by following the shuffle cycle that starts and ends at 'input_index'.
+ ScalarNonConst evicted_value;
+ ScalarNonConst shuffled_value = data[input_index];
+ do {
+ evicted_value = data[output_index];
+ data[output_index] = shuffled_value;
+ shuffled_value = evicted_value;
+ bitmap[output_index] = true;
+ output_index = GetBlockOutputIndex(output_index,
+ input_block_strides,
+ output_block_strides);
+ } while (output_index != input_index);
+
+ data[output_index] = shuffled_value;
+ bitmap[output_index] = true;
+ }
+ }
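+  // Editorial note (not in the original header): the loop above applies the
+  // permutation as a set of disjoint cycles. E.g. for the cycle 0 -> 2 -> 1 -> 0
+  // it moves the value at 0 into 2, the old value at 2 into 1, and the old
+  // value at 1 back into 0, carrying a single evicted value, so every
+  // coefficient is moved exactly once and the bitmap lets later iterations
+  // skip cycles that are already in place.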
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index GetBlockOutputIndex(
+ Index input_index,
+ const DSizes<Index, NumDims>& input_block_strides,
+ const DSizes<Index, NumDims>& output_block_strides) const {
+ Index output_index = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = input_index / input_block_strides[i];
+ output_index += idx * output_block_strides[m_inverseShuffle[i]];
+ input_index -= idx * input_block_strides[i];
+ }
+ return output_index + input_index *
+ output_block_strides[m_inverseShuffle[0]];
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = input_index / input_block_strides[i];
+ output_index += idx * output_block_strides[m_inverseShuffle[i]];
+ input_index -= idx * input_block_strides[i];
+ }
+ return output_index + input_index *
+ output_block_strides[m_inverseShuffle[NumDims - 1]];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const {
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ inputIndex += idx * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ return inputIndex + index * m_inputStrides[0];
+ } else {
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i];
+ inputIndex += idx * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ return inputIndex + index * m_inputStrides[NumDims - 1];
+ }
+ }
+
+ const Shuffle& m_shuffle;
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_inverseShuffle;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_inputStrides;
+ array<Index, NumDims> m_unshuffledInputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+ std::size_t m_block_total_size_max;
+};
+
+
+// Eval as lvalue
+template<typename Shuffle, typename ArgType, typename Device>
+struct TensorEvaluator<TensorShufflingOp<Shuffle, ArgType>, Device>
+ : public TensorEvaluator<const TensorShufflingOp<Shuffle, ArgType>, Device>
+{
+ typedef TensorEvaluator<const TensorShufflingOp<Shuffle, ArgType>, Device> Base;
+
+ typedef TensorShufflingOp<Shuffle, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename XprType::Scalar Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = TensorEvaluator<ArgType, Device>::BlockAccess,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ };
+
+ typedef typename internal::TensorBlock<
+ Index, typename internal::remove_const<Scalar>::type, NumDims,
+ TensorEvaluator<ArgType, Device>::Layout> TensorBlock;
+ typedef typename internal::TensorBlockWriter<
+ Index, typename internal::remove_const<Scalar>::type, NumDims,
+ TensorEvaluator<ArgType, Device>::Layout, PacketAccess> TensorBlockWriter;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device)
+ { }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(this->srcCoeff(index));
+ }
+
+ template <int StoreMode> EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ static const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ internal::pstore<CoeffReturnType, PacketReturnType>(values, x);
+ for (int i = 0; i < packetSize; ++i) {
+ this->coeffRef(index+i) = values[i];
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void writeBlock(
+ const TensorBlock& block) {
+ eigen_assert(this->m_impl.data() != NULL);
+ TensorBlockWriter::Run(block, this->srcCoeff(block.first_coeff_index()),
+ this->m_inverseShuffle,
+ this->m_unshuffledInputStrides, this->m_impl.data());
+ }
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_SHUFFLING_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h
new file mode 100644
index 0000000000..cfde4fdc72
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h
@@ -0,0 +1,247 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSORSTORAGE_H
+#define EIGEN_CXX11_TENSOR_TENSORSTORAGE_H
+
+#ifdef EIGEN_TENSOR_STORAGE_CTOR_PLUGIN
+ #define EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN EIGEN_TENSOR_STORAGE_CTOR_PLUGIN;
+#else
+ #define EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN
+#endif
+
+namespace Eigen {
+
+/** \internal
+ *
+ * \class TensorStorage
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Stores the data of a tensor
+ *
+ * This class stores the data of fixed-size, dynamic-size or mixed tensors
+ * as compactly as possible.
+ *
+ * \sa Tensor
+ */
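+// Editorial sketch (not part of the original header), showing which
+// specialization a given Dimensions type selects:
+//
+//   // fixed-size dimensions: coefficients live inline, no heap allocation
+//   TensorStorage<float, Sizes<2, 3>, 0> fixed_storage;
+//
+//   // dynamic dimensions: coefficients are heap-allocated and resizable
+//   TensorStorage<float, DSizes<DenseIndex, 2>, 0> dynamic_storage;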
+template<typename T, typename Dimensions, int Options_> class TensorStorage;
+
+
+// Pure fixed-size storage
+template<typename T, int Options_, typename FixedDimensions>
+class TensorStorage<T, FixedDimensions, Options_>
+{
+ private:
+ static const std::size_t Size = FixedDimensions::total_size;
+
+ EIGEN_ALIGN_DEFAULT T m_data[Size];
+ FixedDimensions m_dimensions;
+
+ public:
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorStorage() {
+ EIGEN_STATIC_ASSERT(Size == FixedDimensions::total_size, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE T *data() { return m_data; }
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const T *data() const { return m_data; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE const FixedDimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE DenseIndex size() const { return m_dimensions.TotalSize(); }
+};
+
+
+// pure dynamic
+template<typename T, int Options_, typename IndexType, std::size_t NumIndices_>
+class TensorStorage<T, DSizes<IndexType, NumIndices_>, Options_>
+{
+ public:
+ typedef IndexType Index;
+ typedef DSizes<IndexType, NumIndices_> Dimensions;
+ typedef TensorStorage<T, DSizes<IndexType, NumIndices_>, Options_> Self;
+
+ EIGEN_DEVICE_FUNC TensorStorage()
+ : m_data(NumIndices_ ? 0 : internal::conditional_aligned_new_auto<T,(Options_&DontAlign)==0>(1))
+ , m_dimensions() {}
+
+ EIGEN_DEVICE_FUNC TensorStorage(internal::constructor_without_unaligned_array_assert)
+ : m_data(NumIndices_ ? 0 : internal::conditional_aligned_new_auto<T,(Options_&DontAlign)==0>(1))
+ , m_dimensions(internal::template repeat<NumIndices_, Index>(0)) {}
+
+ EIGEN_DEVICE_FUNC TensorStorage(Index size, const array<Index, NumIndices_>& dimensions)
+ : m_data(internal::conditional_aligned_new_auto<T,(Options_&DontAlign)==0>(size)), m_dimensions(dimensions)
+ { EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN }
+
+ EIGEN_DEVICE_FUNC TensorStorage(const Self& other)
+ : m_data(internal::conditional_aligned_new_auto<T,(Options_&DontAlign)==0>(internal::array_prod(other.m_dimensions)))
+ , m_dimensions(other.m_dimensions)
+ {
+ internal::smart_copy(other.m_data, other.m_data+internal::array_prod(other.m_dimensions), m_data);
+ }
+ EIGEN_DEVICE_FUNC Self& operator=(const Self& other)
+ {
+ if (this != &other) {
+ Self tmp(other);
+ this->swap(tmp);
+ }
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC ~TensorStorage() { internal::conditional_aligned_delete_auto<T,(Options_&DontAlign)==0>(m_data, internal::array_prod(m_dimensions)); }
+ EIGEN_DEVICE_FUNC void swap(Self& other)
+ { numext::swap(m_data,other.m_data); numext::swap(m_dimensions,other.m_dimensions); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const {return m_dimensions;}
+
+ EIGEN_DEVICE_FUNC void resize(Index size, const array<Index, NumIndices_>& nbDimensions)
+ {
+ const Index currentSz = internal::array_prod(m_dimensions);
+ if(size != currentSz)
+ {
+ internal::conditional_aligned_delete_auto<T,(Options_&DontAlign)==0>(m_data, currentSz);
+ if (size)
+ m_data = internal::conditional_aligned_new_auto<T,(Options_&DontAlign)==0>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_dimensions = nbDimensions;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T *data() { return m_data; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T *data() const { return m_data; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_dimensions.TotalSize(); }
+
+ private:
+ T *m_data;
+ Dimensions m_dimensions;
+};
+
+
+// pure dynamic
+template<typename T, int Options_>
+class TensorStorage<T, VSizes<DenseIndex>, Options_>
+{
+ T* m_data;
+ VSizes<DenseIndex> m_dimensions;
+ typedef TensorStorage<T, VSizes<DenseIndex>, Options_> Self_;
+
+ public:
+ EIGEN_DEVICE_FUNC TensorStorage() : m_data(0), m_dimensions() {}
+
+ template <DenseIndex NumDims>
+ EIGEN_DEVICE_FUNC TensorStorage(const array<DenseIndex, NumDims>& dimensions)
+ {
+ m_dimensions.resize(NumDims);
+ for (int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] = dimensions[i];
+ }
+ const DenseIndex size = array_prod(dimensions);
+ m_data = internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(size);
+ EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN
+ }
+
+ EIGEN_DEVICE_FUNC TensorStorage(const std::vector<DenseIndex>& dimensions)
+ : m_dimensions(dimensions)
+ {
+ const DenseIndex size = internal::array_prod(dimensions);
+ m_data = internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(size);
+ EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes> EIGEN_DEVICE_FUNC
+ TensorStorage(IndexTypes... dimensions) {
+ const int NumDims = sizeof...(dimensions);
+ m_dimensions.resize(NumDims);
+ const array<DenseIndex, NumDims> dim{{dimensions...}};
+ DenseIndex size = 1;
+ for (int i = 0; i < NumDims; ++i) {
+ size *= dim[i];
+ m_dimensions[i] = dim[i];
+ }
+ m_data = internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(size);
+ EIGEN_INTERNAL_TENSOR_STORAGE_CTOR_PLUGIN
+ }
+#endif
+
+ EIGEN_DEVICE_FUNC TensorStorage(const Self_& other)
+ : m_data(internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(internal::array_prod(other.m_dimensions)))
+ , m_dimensions(other.m_dimensions)
+ {
+ internal::smart_copy(other.m_data, other.m_data+internal::array_prod(other.m_dimensions), m_data);
+ }
+
+ EIGEN_DEVICE_FUNC Self_& operator=(const Self_& other)
+ {
+ if (this != &other) {
+ Self_ tmp(other);
+ this->swap(tmp);
+ }
+ return *this;
+ }
+
+ EIGEN_DEVICE_FUNC ~TensorStorage()
+ {
+ internal::conditional_managed_delete_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(m_data, internal::array_prod(m_dimensions));
+ }
+
+ EIGEN_DEVICE_FUNC void swap(Self_& other)
+ { std::swap(m_data,other.m_data); std::swap(m_dimensions,other.m_dimensions); }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const VSizes<DenseIndex>& dimensions() const { return m_dimensions; }
+
+ template <typename NewDimensions> EIGEN_DEVICE_FUNC
+ void resize(DenseIndex size, const NewDimensions& nbDimensions)
+ {
+ const DenseIndex currentSz = internal::array_prod(m_dimensions);
+ if(size != currentSz)
+ {
+ internal::conditional_managed_delete_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(m_data, currentSz);
+ if (size)
+ m_data = internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_dimensions.resize(internal::array_size<NewDimensions>::value);
+ for (int i = 0; i < internal::array_size<NewDimensions>::value; ++i) {
+ m_dimensions[i] = nbDimensions[i];
+ }
+ }
+ EIGEN_DEVICE_FUNC void resize(DenseIndex size, const std::vector<DenseIndex>& nbDimensions)
+ {
+ const DenseIndex currentSz = internal::array_prod(m_dimensions);
+ if(size != currentSz)
+ {
+ internal::conditional_managed_delete_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(m_data, currentSz);
+ if (size)
+ m_data = internal::conditional_managed_new_auto<T,(Options_&DontAlign)==0,(Options_&AllocateUVM)>(size);
+ else
+ m_data = 0;
+ EIGEN_INTERNAL_DENSE_STORAGE_CTOR_PLUGIN
+ }
+ m_dimensions = nbDimensions;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE T *data() { return m_data; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const T *data() const { return m_data; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE DenseIndex size() const { return m_dimensions.TotalSize(); }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSORSTORAGE_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h
new file mode 100644
index 0000000000..8abe5ea8e4
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorStriding.h
@@ -0,0 +1,329 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H
+#define EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H
+
+namespace Eigen {
+
+/** \class TensorStriding
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor striding class.
+ *
+ *
+ */
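+// Usage sketch (editorial note, not part of the original header): striding
+// keeps every n-th coefficient along each dimension. `in` is a placeholder.
+//
+//   Eigen::Tensor<float, 2> in(4, 6);
+//   in.setRandom();
+//   Eigen::array<Eigen::DenseIndex, 2> strides{{2, 3}};
+//   Eigen::Tensor<float, 2> out = in.stride(strides);
+//   // out has dimensions (2, 2) and out(i, j) == in(2 * i, 3 * j)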
+namespace internal {
+template<typename Strides, typename XprType>
+struct traits<TensorStridingOp<Strides, XprType> > : public traits<XprType>
+{
+ typedef typename XprType::Scalar Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename Strides, typename XprType>
+struct eval<TensorStridingOp<Strides, XprType>, Eigen::Dense>
+{
+ typedef const TensorStridingOp<Strides, XprType>& type;
+};
+
+template<typename Strides, typename XprType>
+struct nested<TensorStridingOp<Strides, XprType>, 1, typename eval<TensorStridingOp<Strides, XprType> >::type>
+{
+ typedef TensorStridingOp<Strides, XprType> type;
+};
+
+} // end namespace internal
+
+
+
+template<typename Strides, typename XprType>
+class TensorStridingOp : public TensorBase<TensorStridingOp<Strides, XprType> >
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorStridingOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorStridingOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorStridingOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorStridingOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorStridingOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorStridingOp(const XprType& expr, const Strides& dims)
+ : m_xpr(expr), m_dims(dims) {}
+
+ EIGEN_DEVICE_FUNC
+ const Strides& strides() const { return m_dims; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorStridingOp& operator = (const TensorStridingOp& other)
+ {
+ typedef TensorAssignOp<TensorStridingOp, const TensorStridingOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorStridingOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorStridingOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const Strides m_dims;
+};
+
+
+// Eval as rvalue
+template<typename Strides, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorStridingOp<Strides, ArgType>, Device>
+{
+ typedef TensorStridingOp<Strides, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ m_dimensions = m_impl.dimensions();
+ for (int i = 0; i < NumDims; ++i) {
+ m_dimensions[i] = ceilf(static_cast<float>(m_dimensions[i]) / op.strides()[i]);
+ }
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_outputStrides[0] = 1;
+ m_inputStrides[0] = 1;
+ for (int i = 1; i < NumDims; ++i) {
+ m_outputStrides[i] = m_outputStrides[i-1] * m_dimensions[i-1];
+ m_inputStrides[i] = m_inputStrides[i-1] * input_dims[i-1];
+ m_inputStrides[i-1] *= op.strides()[i-1];
+ }
+ m_inputStrides[NumDims-1] *= op.strides()[NumDims-1];
+ } else { // RowMajor
+ m_outputStrides[NumDims-1] = 1;
+ m_inputStrides[NumDims-1] = 1;
+ for (int i = NumDims - 2; i >= 0; --i) {
+ m_outputStrides[i] = m_outputStrides[i+1] * m_dimensions[i+1];
+ m_inputStrides[i] = m_inputStrides[i+1] * input_dims[i+1];
+ m_inputStrides[i+1] *= op.strides()[i+1];
+ }
+ m_inputStrides[0] *= op.strides()[0];
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ return m_impl.coeff(srcCoeff(index));
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ Index inputIndices[] = {0, 0};
+ Index indices[] = {index, index + packetSize - 1};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx0 = indices[0] / m_outputStrides[i];
+ const Index idx1 = indices[1] / m_outputStrides[i];
+ inputIndices[0] += idx0 * m_inputStrides[i];
+ inputIndices[1] += idx1 * m_inputStrides[i];
+ indices[0] -= idx0 * m_outputStrides[i];
+ indices[1] -= idx1 * m_outputStrides[i];
+ }
+ inputIndices[0] += indices[0] * m_inputStrides[0];
+ inputIndices[1] += indices[1] * m_inputStrides[0];
+ } else { // RowMajor
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx0 = indices[0] / m_outputStrides[i];
+ const Index idx1 = indices[1] / m_outputStrides[i];
+ inputIndices[0] += idx0 * m_inputStrides[i];
+ inputIndices[1] += idx1 * m_inputStrides[i];
+ indices[0] -= idx0 * m_outputStrides[i];
+ indices[1] -= idx1 * m_outputStrides[i];
+ }
+ inputIndices[0] += indices[0] * m_inputStrides[NumDims-1];
+ inputIndices[1] += indices[1] * m_inputStrides[NumDims-1];
+ }
+ if (inputIndices[1] - inputIndices[0] == packetSize - 1) {
+ PacketReturnType rslt = m_impl.template packet<Unaligned>(inputIndices[0]);
+ return rslt;
+ }
+ else {
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ values[0] = m_impl.coeff(inputIndices[0]);
+ values[packetSize-1] = m_impl.coeff(inputIndices[1]);
+ for (int i = 1; i < packetSize-1; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index srcCoeff(Index index) const
+ {
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx = index / m_outputStrides[i];
+ inputIndex += idx * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ inputIndex += index * m_inputStrides[0];
+ } else { // RowMajor
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx = index / m_outputStrides[i];
+ inputIndex += idx * m_inputStrides[i];
+ index -= idx * m_outputStrides[i];
+ }
+ inputIndex += index * m_inputStrides[NumDims-1];
+ }
+ return inputIndex;
+ }
+
+ Dimensions m_dimensions;
+ array<Index, NumDims> m_outputStrides;
+ array<Index, NumDims> m_inputStrides;
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+
+// Eval as lvalue
+template<typename Strides, typename ArgType, typename Device>
+struct TensorEvaluator<TensorStridingOp<Strides, ArgType>, Device>
+ : public TensorEvaluator<const TensorStridingOp<Strides, ArgType>, Device>
+{
+ typedef TensorStridingOp<Strides, ArgType> XprType;
+ typedef TensorEvaluator<const XprType, Device> Base;
+ // typedef typename XprType::Index Index;
+ static const int NumDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ // typedef DSizes<Index, NumDims> Dimensions;
+
+ enum {
+ IsAligned = /*TensorEvaluator<ArgType, Device>::IsAligned*/ false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : Base(op, device) { }
+
+ typedef typename XprType::Index Index;
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ return this->m_impl.coeffRef(this->srcCoeff(index));
+ }
+
+ template <int StoreMode> EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ void writePacket(Index index, const PacketReturnType& x)
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < this->dimensions().TotalSize());
+
+ Index inputIndices[] = {0, 0};
+ Index indices[] = {index, index + packetSize - 1};
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumDims - 1; i > 0; --i) {
+ const Index idx0 = indices[0] / this->m_outputStrides[i];
+ const Index idx1 = indices[1] / this->m_outputStrides[i];
+ inputIndices[0] += idx0 * this->m_inputStrides[i];
+ inputIndices[1] += idx1 * this->m_inputStrides[i];
+ indices[0] -= idx0 * this->m_outputStrides[i];
+ indices[1] -= idx1 * this->m_outputStrides[i];
+ }
+ inputIndices[0] += indices[0] * this->m_inputStrides[0];
+ inputIndices[1] += indices[1] * this->m_inputStrides[0];
+ } else { // RowMajor
+ for (int i = 0; i < NumDims - 1; ++i) {
+ const Index idx0 = indices[0] / this->m_outputStrides[i];
+ const Index idx1 = indices[1] / this->m_outputStrides[i];
+ inputIndices[0] += idx0 * this->m_inputStrides[i];
+ inputIndices[1] += idx1 * this->m_inputStrides[i];
+ indices[0] -= idx0 * this->m_outputStrides[i];
+ indices[1] -= idx1 * this->m_outputStrides[i];
+ }
+ inputIndices[0] += indices[0] * this->m_inputStrides[NumDims-1];
+ inputIndices[1] += indices[1] * this->m_inputStrides[NumDims-1];
+ }
+ if (inputIndices[1] - inputIndices[0] == packetSize - 1) {
+ this->m_impl.template writePacket<Unaligned>(inputIndices[0], x);
+ }
+ else {
+ EIGEN_ALIGN_DEFAULT Scalar values[packetSize];
+ internal::pstore<Scalar, PacketReturnType>(values, x);
+ this->m_impl.coeffRef(inputIndices[0]) = values[0];
+ this->m_impl.coeffRef(inputIndices[1]) = values[packetSize-1];
+ for (int i = 1; i < packetSize-1; ++i) {
+ this->coeffRef(index+i) = values[i];
+ }
+ }
+ }
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_STRIDING_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h
new file mode 100644
index 0000000000..b8c1eadfc3
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTraits.h
@@ -0,0 +1,294 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H
+#define EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H
+
+namespace Eigen {
+namespace internal {
+
+
+template<typename Scalar, int Options>
+class compute_tensor_flags
+{
+ enum {
+ is_dynamic_size_storage = 1,
+
+ aligned_bit =
+ (
+ ((Options&DontAlign)==0) && (
+#if EIGEN_ALIGN_STATICALLY
+ (!is_dynamic_size_storage)
+#else
+ 0
+#endif
+ ||
+#if EIGEN_ALIGN
+ is_dynamic_size_storage
+#else
+ 0
+#endif
+ )
+ ) ? AlignedBit : 0,
+ packet_access_bit = packet_traits<Scalar>::Vectorizable && aligned_bit ? PacketAccessBit : 0
+ };
+
+ public:
+ enum { ret = packet_access_bit | aligned_bit};
+};
+
+
+template<typename Scalar_, std::size_t NumIndices_, int Options_, typename IndexType_>
+struct traits<Tensor<Scalar_, NumIndices_, Options_, IndexType_> >
+{
+ typedef Scalar_ Scalar;
+ typedef Dense StorageKind;
+ typedef IndexType_ Index;
+ static const int NumDimensions = NumIndices_;
+ static const int Layout = Options_ & RowMajor ? RowMajor : ColMajor;
+ enum {
+ Options = Options_,
+ Flags = compute_tensor_flags<Scalar_, Options_>::ret | (is_const<Scalar_>::value ? 0 : LvalueBit),
+ };
+};
+
+
+template<typename Scalar_, typename Dimensions, int Options_, typename IndexType_>
+struct traits<TensorFixedSize<Scalar_, Dimensions, Options_, IndexType_> >
+{
+ typedef Scalar_ Scalar;
+ typedef Dense StorageKind;
+ typedef IndexType_ Index;
+ static const int NumDimensions = array_size<Dimensions>::value;
+ static const int Layout = Options_ & RowMajor ? RowMajor : ColMajor;
+ enum {
+ Options = Options_,
+    Flags = compute_tensor_flags<Scalar_, Options_>::ret | (is_const<Scalar_>::value ? 0 : LvalueBit),
+ };
+};
+
+
+template<typename Scalar_, int Options_, typename IndexType_>
+struct traits<TensorVarDim<Scalar_, Options_, IndexType_> >
+{
+ typedef Scalar_ Scalar;
+ typedef Dense StorageKind;
+ typedef IndexType_ Index;
+ static const int NumDimensions = -1;
+ static const int Layout = Options_ & RowMajor ? RowMajor : ColMajor;
+ enum {
+ Options = Options_,
+ Flags = compute_tensor_flags<Scalar_, Options_>::ret | (is_const<Scalar_>::value ? 0 : LvalueBit),
+ };
+};
+
+template<typename PlainObjectType, int Options_>
+struct traits<TensorMap<PlainObjectType, Options_> >
+ : public traits<PlainObjectType>
+{
+ typedef traits<PlainObjectType> BaseTraits;
+ typedef typename BaseTraits::Scalar Scalar;
+ typedef typename BaseTraits::StorageKind StorageKind;
+ typedef typename BaseTraits::Index Index;
+ static const int NumDimensions = BaseTraits::NumDimensions;
+ static const int Layout = BaseTraits::Layout;
+ enum {
+ Options = Options_,
+ Flags = (BaseTraits::Flags & ~AlignedBit) | (Options&Aligned ? AlignedBit : 0),
+ };
+};
+
+template<typename PlainObjectType>
+struct traits<TensorRef<PlainObjectType> >
+ : public traits<PlainObjectType>
+{
+ typedef traits<PlainObjectType> BaseTraits;
+ typedef typename BaseTraits::Scalar Scalar;
+ typedef typename BaseTraits::StorageKind StorageKind;
+ typedef typename BaseTraits::Index Index;
+ static const int NumDimensions = BaseTraits::NumDimensions;
+ static const int Layout = BaseTraits::Layout;
+ enum {
+ Options = BaseTraits::Options,
+ Flags = (BaseTraits::Flags & ~AlignedBit) | (Options&Aligned ? AlignedBit : 0),
+ };
+};
+
+
+template<typename _Scalar, std::size_t NumIndices_, int Options, typename IndexType_>
+struct eval<Tensor<_Scalar, NumIndices_, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const Tensor<_Scalar, NumIndices_, Options, IndexType_>& type;
+};
+
+template<typename _Scalar, std::size_t NumIndices_, int Options, typename IndexType_>
+struct eval<const Tensor<_Scalar, NumIndices_, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const Tensor<_Scalar, NumIndices_, Options, IndexType_>& type;
+};
+
+template<typename Scalar_, typename Dimensions, int Options, typename IndexType_>
+struct eval<TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>& type;
+};
+
+template<typename Scalar_, typename Dimensions, int Options, typename IndexType_>
+struct eval<const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>& type;
+};
+
+template<typename Scalar_, int Options, typename IndexType_>
+struct eval<TensorVarDim<Scalar_, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const TensorVarDim<Scalar_, Options, IndexType_>& type;
+};
+
+template<typename Scalar_, int Options, typename IndexType_>
+struct eval<const TensorVarDim<Scalar_, Options, IndexType_>, Eigen::Dense>
+{
+ typedef const TensorVarDim<Scalar_, Options, IndexType_>& type;
+};
+
+template<typename PlainObjectType, int Options>
+struct eval<TensorMap<PlainObjectType, Options>, Eigen::Dense>
+{
+ typedef const TensorMap<PlainObjectType, Options>& type;
+};
+
+template<typename PlainObjectType, int Options>
+struct eval<const TensorMap<PlainObjectType, Options>, Eigen::Dense>
+{
+ typedef const TensorMap<PlainObjectType, Options>& type;
+};
+
+template<typename PlainObjectType>
+struct eval<TensorRef<PlainObjectType>, Eigen::Dense>
+{
+ typedef const TensorRef<PlainObjectType>& type;
+};
+
+template<typename PlainObjectType>
+struct eval<const TensorRef<PlainObjectType>, Eigen::Dense>
+{
+ typedef const TensorRef<PlainObjectType>& type;
+};
+
+
+template <typename Scalar_, std::size_t NumIndices_, int Options_, typename IndexType_>
+struct nested<Tensor<Scalar_, NumIndices_, Options_, IndexType_>, 1, typename eval<Tensor<Scalar_, NumIndices_, Options_, IndexType_> >::type>
+{
+ typedef const Tensor<Scalar_, NumIndices_, Options_, IndexType_>& type;
+};
+
+template <typename Scalar_, std::size_t NumIndices_, int Options_, typename IndexType_>
+struct nested<const Tensor<Scalar_, NumIndices_, Options_, IndexType_>, 1, typename eval<const Tensor<Scalar_, NumIndices_, Options_, IndexType_> >::type>
+{
+ typedef const Tensor<Scalar_, NumIndices_, Options_, IndexType_>& type;
+};
+
+template <typename Scalar_, typename Dimensions, int Options, typename IndexType_>
+struct nested<TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>, 1, typename eval<TensorFixedSize<Scalar_, Dimensions, Options, IndexType_> >::type>
+{
+ typedef const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>& type;
+};
+
+template <typename Scalar_, typename Dimensions, int Options, typename IndexType_>
+struct nested<const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>, 1, typename eval<const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_> >::type>
+{
+ typedef const TensorFixedSize<Scalar_, Dimensions, Options, IndexType_>& type;
+};
+
+template <typename Scalar_, int Options>
+struct nested<TensorVarDim<Scalar_, Options>, 1, typename eval<TensorVarDim<Scalar_, Options> >::type>
+{
+ typedef const TensorVarDim<Scalar_, Options>& type;
+};
+
+template <typename Scalar_, int Options>
+struct nested<const TensorVarDim<Scalar_, Options>, 1, typename eval<const TensorVarDim<Scalar_, Options> >::type>
+{
+ typedef const TensorVarDim<Scalar_, Options>& type;
+};
+
+
+template <typename PlainObjectType, int Options>
+struct nested<TensorMap<PlainObjectType, Options>, 1, typename eval<TensorMap<PlainObjectType, Options> >::type>
+{
+ typedef const TensorMap<PlainObjectType, Options>& type;
+};
+
+template <typename PlainObjectType, int Options>
+struct nested<const TensorMap<PlainObjectType, Options>, 1, typename eval<TensorMap<PlainObjectType, Options> >::type>
+{
+ typedef const TensorMap<PlainObjectType, Options>& type;
+};
+
+template <typename PlainObjectType>
+struct nested<TensorRef<PlainObjectType>, 1, typename eval<TensorRef<PlainObjectType> >::type>
+{
+ typedef const TensorRef<PlainObjectType>& type;
+};
+
+template <typename PlainObjectType>
+struct nested<const TensorRef<PlainObjectType>, 1, typename eval<TensorRef<PlainObjectType> >::type>
+{
+ typedef const TensorRef<PlainObjectType>& type;
+};
+
+} // end namespace internal
+
+// Convolutional layers take in an input tensor of shape (D, R, C, B), or (D, C,
+// R, B), and convolve it with a set of filters, which can also be presented as
+// a tensor (D, K, K, M), where M is the number of filters, K is the filter
+// size, and each 3-dimensional tensor of size (D, K, K) is a filter. For
+// simplicity we assume that we always use square filters (which is usually the
+// case in images), hence the two Ks in the tensor dimension. It also takes in
+// a few additional parameters:
+// Stride (S): The convolution stride is the offset between locations where we
+// apply the filters. A larger stride means that the output will be
+// spatially smaller.
+// Padding (P): The padding we apply to the input tensor along the R and C
+// dimensions. This is usually used to make sure that the spatial
+// dimensions of the output match our intent.
+//
+// Two types of padding are often used:
+// SAME: The amount of padding is computed so that the output will have size
+// ceil(R/S) and ceil(C/S).
+// VALID: no padding is carried out.
+// When we do padding, the padded values at the padded locations are usually
+// zero.
+//
+// The output dimensions for convolution, when given all the parameters above,
+// are as follows:
+// When Padding = SAME: the output size is (B, R', C', M), where
+// R' = ceil(float(R) / float(S))
+// C' = ceil(float(C) / float(S))
+// where ceil is the ceiling function. The input tensor is padded with 0 as
+// needed. The number of padded rows and columns are computed as:
+// Pr = ((R' - 1) * S + K - R) / 2
+// Pc = ((C' - 1) * S + K - C) / 2
+// When the stride is 1, we have the simplified case R'=R, C'=C, Pr=Pc=(K-1)/2.
+// This is where the name SAME comes from: the output has the same size as the input.
+// When Padding = VALID: the output size is computed as
+// R' = ceil(float(R - K + 1) / float(S))
+// C' = ceil(float(C - K + 1) / float(S))
+// and the number of padded rows and columns is computed in the same way as in
+// the SAME case.
+// When the stride is 1, we have the simplified case R'=R-K+1, C'=C-K+1, Pr=0,
+// Pc=0.
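+//
+// A purely illustrative worked example of the formulas above (not used by the
+// code): with R = C = 5, K = 3 and S = 2,
+//   SAME:  R' = C' = ceil(5/2) = 3 and Pr = Pc = ((3-1)*2 + 3 - 5)/2 = 1,
+//   VALID: R' = C' = ceil((5-3+1)/2) = 2 and Pr = Pc = 0.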
+typedef enum {
+ PADDING_VALID = 1,
+ PADDING_SAME = 2,
+} PaddingType;
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_TRAITS_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTrueIndices.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTrueIndices.h
new file mode 100644
index 0000000000..ec1d44e6a6
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorTrueIndices.h
@@ -0,0 +1,250 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2015 Eugene Brevdo <ebrevdo@google.com>
+// Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_TRUE_INDICES_H
+#define EIGEN_CXX11_TENSOR_TENSOR_TRUE_INDICES_H
+namespace Eigen {
+
+/** \class TensorTrueIndices
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Tensor expression that returns the indices of the true values in its input.
+ *
+ */
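+// Illustrative summary of the evaluator below (an informal note, not an API
+// guarantee): the result is a rank-2 tensor of DenseIndex of shape
+// (input.size(), input rank); entry (i, d) is the d-th coordinate of input
+// element i when that element evaluates to true, and not_found (default -1)
+// otherwise.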
+namespace internal {
+template<typename XprType>
+struct traits<TensorTrueIndicesOp<XprType> > : public traits<XprType>
+{
+ typedef DenseIndex Scalar;
+ typedef DenseIndex CoeffReturnType;
+ typedef traits<XprType> XprTraits;
+ //typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = 2; // XprTraits::NumDimensions;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<typename XprType>
+struct eval<TensorTrueIndicesOp<XprType>, Eigen::Dense>
+{
+ typedef const TensorTrueIndicesOp<XprType>& type;
+};
+
+template<typename XprType>
+struct nested<TensorTrueIndicesOp<XprType>, 1,
+ typename eval<TensorTrueIndicesOp<XprType> >::type>
+{
+ typedef TensorTrueIndicesOp<XprType> type;
+};
+
+} // end namespace internal
+
+template<typename XprType>
+class TensorTrueIndicesOp : public TensorBase<TensorTrueIndicesOp<XprType>, WriteAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorTrueIndicesOp>::Scalar Scalar;
+ //typedef typename Eigen::internal::traits<TensorTrueIndicesOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename Eigen::internal::traits<TensorTrueIndicesOp>::CoeffReturnType CoeffReturnType;
+ typedef typename internal::packet_traits<CoeffReturnType>::type PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorTrueIndicesOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorTrueIndicesOp>::StorageKind
+ StorageKind;
+ typedef typename Eigen::internal::traits<TensorTrueIndicesOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorTrueIndicesOp(
+ const XprType& expr, const CoeffReturnType& not_found = -1)
+ : m_xpr(expr), m_not_found(not_found) {
+ }
+
+ EIGEN_DEVICE_FUNC
+ const CoeffReturnType& not_found() const { return m_not_found; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorTrueIndicesOp& operator = (const TensorTrueIndicesOp& other)
+ {
+ typedef TensorAssignOp<TensorTrueIndicesOp, const TensorTrueIndicesOp> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorTrueIndicesOp& operator = (const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorTrueIndicesOp, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(
+ assign, DefaultDevice());
+ return *this;
+ }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ CoeffReturnType m_not_found;
+};
+
+// Eval as rvalue
+template<typename ArgType, typename Device>
+struct TensorEvaluator<const TensorTrueIndicesOp<ArgType>, Device>
+{
+ typedef TensorTrueIndicesOp<ArgType> XprType;
+ typedef typename XprType::Index InputIndex;
+ typedef typename XprType::Index Index;
+ static const int NumDims = 2;
+ typedef DSizes<Index, 2> Dimensions;
+ typedef typename TensorEvaluator<ArgType, Device>::Dimensions InputDimensions;
+ static const int NumInputDims = internal::array_size<InputDimensions>::value;
+
+ enum {
+ IsAligned = true,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = false, // to be implemented
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op,
+ const Device& device)
+ : m_impl(op.expression(), device), m_not_found(op.not_found())
+ {
+ // Store original dimensions
+ m_orig_dimensions = m_impl.dimensions();
+
+ // Calculate output dimensions
+ m_dimensions[0] = m_orig_dimensions.TotalSize();
+ m_dimensions[1] = NumInputDims;
+
+ // Calculate strides of input expression
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_strides[0] = 1;
+ for (int i = 1; i < NumInputDims; ++i) {
+ m_strides[i] = m_strides[i-1] * m_orig_dimensions[i-1];
+ }
+ } else {
+ m_strides[NumInputDims-1] = 1;
+ for (int i = NumInputDims - 2; i >= 0; --i) {
+ m_strides[i] = m_strides[i+1] * m_orig_dimensions[i+1];
+ }
+ }
+ }
+
+ typedef typename XprType::Scalar Scalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar*) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE InputIndex origIndices(
+ Index index) const {
+ eigen_assert(index < dimensions().TotalSize());
+ Index inputIndex = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputIndex = index % m_dimensions[0];
+ } else {
+ inputIndex = index / m_dimensions[1];
+ }
+ return inputIndex;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE int whichDim(
+ Index index) const {
+ eigen_assert(index < dimensions().TotalSize());
+ int inputDim = 0;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputDim = index / m_dimensions[0];
+ } else {
+ inputDim = index % m_dimensions[1];
+ }
+ return inputDim;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType origDim(
+ int dim, InputIndex index) const {
+ eigen_assert(index < m_orig_dimensions.TotalSize());
+ eigen_assert(dim > -1 && dim < m_orig_dimensions.size());
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ for (int i = NumInputDims - 1; i > 0; --i) {
+ Index idx = index / m_strides[i];
+ if (i == dim) return idx; // Found our dimension
+ index -= idx * m_strides[i];
+ }
+ return index;
+ } else {
+ for (int i = 0; i < NumInputDims - 1; ++i) {
+ Index idx = index / m_strides[i];
+ if (i == dim) return idx; // Found our dimension
+ index -= idx * m_strides[i];
+ }
+ return index;
+ }
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(
+ Index index) const {
+ InputIndex orig_index = origIndices(index);
+ if (m_impl.coeff(orig_index))
+ return origDim(whichDim(index), orig_index);
+ else {
+ return m_not_found;
+ }
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE
+ PacketReturnType packet(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ // TODO(ndjaitly): write a better packing routine that uses
+ // local structure.
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type
+ values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ protected:
+ InputDimensions m_orig_dimensions;
+ Dimensions m_dimensions;
+ TensorEvaluator<ArgType, Device> m_impl;
+ array<Index, NumInputDims> m_strides;
+ CoeffReturnType m_not_found;
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_TRUE_INDICES_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVarDim.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVarDim.h
new file mode 100644
index 0000000000..49954b955e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVarDim.h
@@ -0,0 +1,315 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2014 Benoit Steiner <benoit.steiner.goog@gmail.com>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_VAR_DIM_H
+#define EIGEN_CXX11_TENSOR_TENSOR_VAR_DIM_H
+
+namespace Eigen {
+
+/** \class TensorVarDim
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief A version of the tensor class that supports a variable number of dimensions.
+ *
+ * The variable-dimension equivalent of
+ * Eigen::Tensor<float, 3> t(3, 5, 7);
+ * is
+ * Eigen::TensorVarDim<float> t(3, 5, 7);
+ */
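+// A minimal usage sketch (illustrative only, assuming variadic template
+// support and the default DenseIndex index type):
+//   Eigen::TensorVarDim<float> t(3, 5, 7);            // rank chosen at runtime
+//   t(1, 2, 3) = 1.0f;                                 // coefficient access
+//   t.resize(std::vector<Eigen::DenseIndex>{4, 4});    // the rank may change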
+
+template<typename Scalar_, int Options_, typename IndexType_>
+class TensorVarDim : public TensorBase<TensorVarDim<Scalar_, Options_, IndexType_> >
+{
+ public:
+ typedef TensorVarDim<Scalar_, Options_, IndexType_> Self;
+ typedef TensorBase<TensorVarDim<Scalar_, Options_, IndexType_> > Base;
+ typedef typename Eigen::internal::nested<Self>::type Nested;
+ typedef typename internal::traits<Self>::StorageKind StorageKind;
+ typedef typename internal::traits<Self>::Index Index;
+ typedef Scalar_ Scalar;
+ typedef typename internal::packet_traits<Scalar>::type Packet;
+ typedef typename NumTraits<Scalar>::Real RealScalar;
+ typedef typename Base::CoeffReturnType CoeffReturnType;
+ typedef typename Base::PacketReturnType PacketReturnType;
+
+ enum {
+ IsAligned = bool(EIGEN_ALIGN) & !(Options_ & DontAlign),
+ PacketAccess = (internal::packet_traits<Scalar>::size > 1),
+ BlockAccess = false,
+ Layout = Options_ & RowMajor ? RowMajor : ColMajor,
+ // disabled for now as the number of coefficients is not known by the
+ // caller at compile time.
+ CoordAccess = false,
+ };
+
+ static const int Options = Options_;
+
+ static const Index NumIndices = Dynamic;
+
+ typedef VSizes<Index> Dimensions;
+
+ protected:
+ TensorStorage<Scalar, VSizes<Index>, Options_> m_storage;
+
+ public:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index rank() const { return m_storage.dimensions().size(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index dimension(std::size_t n) const { return m_storage.dimensions()[n]; }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_storage.dimensions(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index size() const { return m_storage.size(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar *data() { return m_storage.data(); }
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar *data() const { return m_storage.data(); }
+
+ // This makes EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ // work, because that uses base().coeffRef() - and we don't yet
+ // implement a similar class hierarchy
+ inline Self& base() { return *this; }
+ inline const Self& base() const { return *this; }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ EIGEN_DEVICE_FUNC inline const Scalar& coeff(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ static const std::size_t NumIndices = sizeof...(otherIndices) + 2;
+ return coeff(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#endif
+
+ template <std::size_t NumIndices>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(const array<Index, NumIndices>& indices) const
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& coeff(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& coeffRef(Index firstIndex, Index secondIndex, IndexTypes... otherIndices)
+ {
+ static const std::size_t NumIndices = sizeof...(otherIndices) + 2;
+ return coeffRef(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#endif
+
+ template <std::size_t NumIndices>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(const array<Index, NumIndices>& indices)
+ {
+ eigen_internal_assert(checkIndexRange(indices));
+ return m_storage.data()[linearizedIndex(indices)];
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& coeffRef(Index index)
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return m_storage.data()[index];
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline const Scalar& operator()(Index firstIndex, Index secondIndex, IndexTypes... otherIndices) const
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ static const std::size_t NumIndices = sizeof...(otherIndices) + 2;
+ return this->operator()(array<Index, NumIndices>{{firstIndex, secondIndex, otherIndices...}});
+ }
+#endif
+
+ template <std::size_t NumIndices>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(const array<Index, NumIndices>& indices) const
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeff(indices);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator()(Index index) const
+ {
+ eigen_internal_assert(index >= 0 && index < size());
+ return coeff(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Scalar& operator[](Index index) const
+ {
+ return coeff(index);
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ inline Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
+ {
+ // The number of indices used to access a tensor coefficient must be equal to the rank of the tensor.
+ static const size_t NumIndices = sizeof...(otherIndices) + 1;
+ return operator()(array<Index, NumIndices>{{firstIndex, otherIndices...}});
+ }
+#endif
+
+ template <std::size_t NumIndices>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(const array<Index, NumIndices>& indices)
+ {
+ eigen_assert(checkIndexRange(indices));
+ return coeffRef(indices);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator()(Index index)
+ {
+ eigen_assert(index >= 0 && index < size());
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Scalar& operator[](Index index)
+ {
+ return coeffRef(index);
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim()
+ : m_storage()
+ {
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim(const Self& other)
+ : m_storage(other.m_storage)
+ {
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ EIGEN_STRONG_INLINE TensorVarDim(Index firstDimension, IndexTypes... otherDimensions)
+ : m_storage(firstDimension, otherDimensions...)
+ {
+ }
+#endif
+
+ EIGEN_STRONG_INLINE explicit TensorVarDim(const std::vector<Index>& dimensions)
+ : m_storage(dimensions)
+ {
+ EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ }
+
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim(const TensorBase<OtherDerived, ReadOnlyAccessors>& other)
+ {
+ typedef TensorAssignOp<TensorVarDim, const OtherDerived> Assign;
+ Assign assign(*this, other.derived());
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ }
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim(const TensorBase<OtherDerived, WriteAccessors>& other)
+ {
+ typedef TensorAssignOp<TensorVarDim, const OtherDerived> Assign;
+ Assign assign(*this, other.derived());
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ }
+
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim& operator=(const TensorVarDim& other)
+ {
+ typedef TensorAssignOp<TensorVarDim, const TensorVarDim> Assign;
+ Assign assign(*this, other);
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+ template<typename OtherDerived>
+ EIGEN_DEVICE_FUNC
+ EIGEN_STRONG_INLINE TensorVarDim& operator=(const OtherDerived& other)
+ {
+ typedef TensorAssignOp<TensorVarDim, const OtherDerived> Assign;
+ Assign assign(*this, other);
+ resize(TensorEvaluator<const Assign, DefaultDevice>(assign, DefaultDevice()).dimensions());
+ internal::TensorExecutor<const Assign, DefaultDevice>::run(assign, DefaultDevice());
+ return *this;
+ }
+
+#ifdef EIGEN_HAS_VARIADIC_TEMPLATES
+ template<typename... IndexTypes>
+ void resize(Index firstDimension, IndexTypes... otherDimensions)
+ {
+    // The rank of a TensorVarDim is dynamic, so resize() accepts any number of
+    // dimensions; the fixed-rank static check used by Tensor does not apply here.
+    static const std::size_t NumResizeIndices = sizeof...(otherDimensions) + 1;
+    resize(array<Index, NumResizeIndices>{{firstDimension, otherDimensions...}});
+ }
+#endif
+
+ template <size_t NumIndices>
+ void resize(const array<Index, NumIndices>& dimensions)
+ {
+ Index size = Index(1);
+ for (std::size_t i = 0; i < NumIndices; i++) {
+ internal::check_rows_cols_for_overflow<Dynamic>::run(size, dimensions[i]);
+ size *= dimensions[i];
+ }
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ bool size_changed = size != this->size();
+ m_storage.resize(size, dimensions);
+ if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ #else
+ m_storage.resize(size, dimensions);
+ #endif
+ }
+ void resize(const std::vector<Index>& dimensions)
+ {
+ Index size = Index(1);
+ for (std::size_t i = 0; i < dimensions.size(); i++) {
+ internal::check_rows_cols_for_overflow<Dynamic>::run(size, dimensions[i]);
+ size *= dimensions[i];
+ }
+ #ifdef EIGEN_INITIALIZE_COEFFS
+ bool size_changed = size != this->size();
+ m_storage.resize(size, dimensions);
+ if(size_changed) EIGEN_INITIALIZE_COEFFS_IF_THAT_OPTION_IS_ENABLED
+ #else
+ m_storage.resize(size, dimensions);
+ #endif
+ }
+
+ protected:
+ template <std::size_t NumIndices>
+ bool checkIndexRange(const array<Index, NumIndices>& indices) const
+ {
+ /* using internal::array_apply_and_reduce;
+ using internal::array_zip_and_reduce;
+ using internal::greater_equal_zero_op;
+ using internal::logical_and_op;
+ using internal::lesser_op;
+
+ return
+ // check whether the indices are all >= 0
+ array_apply_and_reduce<logical_and_op, greater_equal_zero_op>(indices) &&
+ // check whether the indices fit in the dimensions
+ array_zip_and_reduce<logical_and_op, lesser_op>(indices, m_storage.dimensions());
+ */
+ return true;
+ }
+
+ template <std::size_t NumIndices>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Index linearizedIndex(const array<Index, NumIndices>& indices) const
+ {
+ if (Options&RowMajor) {
+ return m_storage.dimensions().IndexOfRowMajor(indices);
+ } else {
+ return m_storage.dimensions().IndexOfColMajor(indices);
+ }
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_VAR_DIM_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h
new file mode 100644
index 0000000000..de86c57f11
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorVolumePatch.h
@@ -0,0 +1,677 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+
+#ifndef EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H
+#define EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H
+
+namespace Eigen {
+
+/** \class TensorVolumePatch
+ * \ingroup CXX11_Tensor_Module
+ *
+ * \brief Patch extraction specialized for processing of volumetric data.
+ * This assumes that the input has at least 4 dimensions ordered as follows:
+ * - channels
+ * - planes
+ * - rows
+ * - columns
+ * - (optional) additional dimensions such as time or batch size.
+ * Calling the volume patch code with patch_planes, patch_rows, and patch_cols
+ * is equivalent to calling the regular patch extraction code with parameters
+ * d, patch_planes, patch_rows, patch_cols, and 1 for all the additional
+ * dimensions.
+ */
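+// For orientation (an informal restatement of what the evaluator below
+// computes): in ColMajor order the result dimensions are
+//   {depth, patch_planes, patch_rows, patch_cols, number of patches, ...}
+// followed by any remaining input dimensions; in RowMajor order the same
+// entries appear in reverse.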
+namespace internal {
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct traits<TensorVolumePatchOp<Planes, Rows, Cols, XprType> > : public traits<XprType>
+{
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+ typedef traits<XprType> XprTraits;
+ typedef typename packet_traits<Scalar>::type Packet;
+ typedef typename XprTraits::StorageKind StorageKind;
+ typedef typename XprTraits::Index Index;
+ typedef typename XprType::Nested Nested;
+ typedef typename remove_reference<Nested>::type _Nested;
+ static const int NumDimensions = XprTraits::NumDimensions + 1;
+ static const int Layout = XprTraits::Layout;
+};
+
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct eval<TensorVolumePatchOp<Planes, Rows, Cols, XprType>, Eigen::Dense>
+{
+ typedef const TensorVolumePatchOp<Planes, Rows, Cols, XprType>& type;
+};
+
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename XprType>
+struct nested<TensorVolumePatchOp<Planes, Rows, Cols, XprType>, 1, typename eval<TensorVolumePatchOp<Planes, Rows, Cols, XprType> >::type>
+{
+ typedef TensorVolumePatchOp<Planes, Rows, Cols, XprType> type;
+};
+
+} // end namespace internal
+
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename XprType>
+class TensorVolumePatchOp : public TensorBase<TensorVolumePatchOp<Planes, Rows, Cols, XprType>, ReadOnlyAccessors>
+{
+ public:
+ typedef typename Eigen::internal::traits<TensorVolumePatchOp>::Scalar Scalar;
+ typedef typename Eigen::internal::traits<TensorVolumePatchOp>::Packet Packet;
+ typedef typename Eigen::NumTraits<Scalar>::Real RealScalar;
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+ typedef typename Eigen::internal::nested<TensorVolumePatchOp>::type Nested;
+ typedef typename Eigen::internal::traits<TensorVolumePatchOp>::StorageKind StorageKind;
+ typedef typename Eigen::internal::traits<TensorVolumePatchOp>::Index Index;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorVolumePatchOp(const XprType& expr, DenseIndex patch_planes, DenseIndex patch_rows, DenseIndex patch_cols,
+ DenseIndex plane_strides, DenseIndex row_strides, DenseIndex col_strides,
+ DenseIndex in_plane_strides, DenseIndex in_row_strides, DenseIndex in_col_strides,
+ DenseIndex plane_inflate_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides,
+ PaddingType padding_type, Scalar padding_value)
+ : m_xpr(expr), m_patch_planes(patch_planes), m_patch_rows(patch_rows), m_patch_cols(patch_cols),
+ m_plane_strides(plane_strides), m_row_strides(row_strides), m_col_strides(col_strides),
+ m_in_plane_strides(in_plane_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides),
+ m_plane_inflate_strides(plane_inflate_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides),
+ m_padding_explicit(false), m_padding_top_z(0), m_padding_bottom_z(0), m_padding_top(0), m_padding_bottom(0), m_padding_left(0), m_padding_right(0),
+ m_padding_type(padding_type), m_padding_value(padding_value) {}
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorVolumePatchOp(const XprType& expr, DenseIndex patch_planes, DenseIndex patch_rows, DenseIndex patch_cols,
+ DenseIndex plane_strides, DenseIndex row_strides, DenseIndex col_strides,
+ DenseIndex in_plane_strides, DenseIndex in_row_strides, DenseIndex in_col_strides,
+ DenseIndex plane_inflate_strides, DenseIndex row_inflate_strides, DenseIndex col_inflate_strides,
+ DenseIndex padding_top_z, DenseIndex padding_bottom_z,
+ DenseIndex padding_top, DenseIndex padding_bottom,
+ DenseIndex padding_left, DenseIndex padding_right,
+ Scalar padding_value)
+ : m_xpr(expr), m_patch_planes(patch_planes), m_patch_rows(patch_rows), m_patch_cols(patch_cols),
+ m_plane_strides(plane_strides), m_row_strides(row_strides), m_col_strides(col_strides),
+ m_in_plane_strides(in_plane_strides), m_in_row_strides(in_row_strides), m_in_col_strides(in_col_strides),
+ m_plane_inflate_strides(plane_inflate_strides), m_row_inflate_strides(row_inflate_strides), m_col_inflate_strides(col_inflate_strides),
+ m_padding_explicit(true), m_padding_top_z(padding_top_z), m_padding_bottom_z(padding_bottom_z), m_padding_top(padding_top), m_padding_bottom(padding_bottom),
+ m_padding_left(padding_left), m_padding_right(padding_right),
+ m_padding_type(PADDING_VALID), m_padding_value(padding_value) {}
+
+ EIGEN_DEVICE_FUNC
+ DenseIndex patch_planes() const { return m_patch_planes; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex patch_rows() const { return m_patch_rows; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex patch_cols() const { return m_patch_cols; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex plane_strides() const { return m_plane_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex row_strides() const { return m_row_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex col_strides() const { return m_col_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex in_plane_strides() const { return m_in_plane_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex in_row_strides() const { return m_in_row_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex in_col_strides() const { return m_in_col_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex plane_inflate_strides() const { return m_plane_inflate_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex row_inflate_strides() const { return m_row_inflate_strides; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex col_inflate_strides() const { return m_col_inflate_strides; }
+ EIGEN_DEVICE_FUNC
+ bool padding_explicit() const { return m_padding_explicit; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_top_z() const { return m_padding_top_z; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_bottom_z() const { return m_padding_bottom_z; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_top() const { return m_padding_top; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_bottom() const { return m_padding_bottom; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_left() const { return m_padding_left; }
+ EIGEN_DEVICE_FUNC
+ DenseIndex padding_right() const { return m_padding_right; }
+ EIGEN_DEVICE_FUNC
+ PaddingType padding_type() const { return m_padding_type; }
+ EIGEN_DEVICE_FUNC
+ Scalar padding_value() const { return m_padding_value; }
+
+ EIGEN_DEVICE_FUNC
+ const typename internal::remove_all<typename XprType::Nested>::type&
+ expression() const { return m_xpr; }
+
+ protected:
+ typename XprType::Nested m_xpr;
+ const DenseIndex m_patch_planes;
+ const DenseIndex m_patch_rows;
+ const DenseIndex m_patch_cols;
+ const DenseIndex m_plane_strides;
+ const DenseIndex m_row_strides;
+ const DenseIndex m_col_strides;
+ const DenseIndex m_in_plane_strides;
+ const DenseIndex m_in_row_strides;
+ const DenseIndex m_in_col_strides;
+ const DenseIndex m_plane_inflate_strides;
+ const DenseIndex m_row_inflate_strides;
+ const DenseIndex m_col_inflate_strides;
+ const bool m_padding_explicit;
+ const DenseIndex m_padding_top_z;
+ const DenseIndex m_padding_bottom_z;
+ const DenseIndex m_padding_top;
+ const DenseIndex m_padding_bottom;
+ const DenseIndex m_padding_left;
+ const DenseIndex m_padding_right;
+ const PaddingType m_padding_type;
+ const Scalar m_padding_value;
+};
+
+
+// Eval as rvalue
+template<DenseIndex Planes, DenseIndex Rows, DenseIndex Cols, typename ArgType, typename Device>
+struct TensorEvaluator<const TensorVolumePatchOp<Planes, Rows, Cols, ArgType>, Device>
+{
+ typedef TensorVolumePatchOp<Planes, Rows, Cols, ArgType> XprType;
+ typedef typename XprType::Index Index;
+ static const int NumInputDims = internal::array_size<typename TensorEvaluator<ArgType, Device>::Dimensions>::value;
+ static const int NumDims = NumInputDims + 1;
+ typedef DSizes<Index, NumDims> Dimensions;
+ typedef typename internal::remove_const<typename XprType::Scalar>::type Scalar;
+
+ enum {
+ IsAligned = false,
+ PacketAccess = TensorEvaluator<ArgType, Device>::PacketAccess,
+ BlockAccess = false,
+ Layout = TensorEvaluator<ArgType, Device>::Layout,
+ CoordAccess = NumDims == 6,
+ };
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE TensorEvaluator(const XprType& op, const Device& device)
+ : m_impl(op.expression(), device)
+ {
+ EIGEN_STATIC_ASSERT(NumDims >= 5, YOU_MADE_A_PROGRAMMING_MISTAKE);
+
+ m_paddingValue = op.padding_value();
+
+ const typename TensorEvaluator<ArgType, Device>::Dimensions& input_dims = m_impl.dimensions();
+
+ // Cache a few variables.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_inputDepth = input_dims[0];
+ m_inputPlanes = input_dims[1];
+ m_inputRows = input_dims[2];
+ m_inputCols = input_dims[3];
+ } else {
+ m_inputDepth = input_dims[NumInputDims-1];
+ m_inputPlanes = input_dims[NumInputDims-2];
+ m_inputRows = input_dims[NumInputDims-3];
+ m_inputCols = input_dims[NumInputDims-4];
+ }
+
+ m_plane_strides = op.plane_strides();
+ m_row_strides = op.row_strides();
+ m_col_strides = op.col_strides();
+
+ // Input strides and effective input/patch size
+ m_in_plane_strides = op.in_plane_strides();
+ m_in_row_strides = op.in_row_strides();
+ m_in_col_strides = op.in_col_strides();
+ m_plane_inflate_strides = op.plane_inflate_strides();
+ m_row_inflate_strides = op.row_inflate_strides();
+ m_col_inflate_strides = op.col_inflate_strides();
+
+ // The "effective" spatial size after inflating data with zeros.
+ m_input_planes_eff = (m_inputPlanes - 1) * m_plane_inflate_strides + 1;
+ m_input_rows_eff = (m_inputRows - 1) * m_row_inflate_strides + 1;
+ m_input_cols_eff = (m_inputCols - 1) * m_col_inflate_strides + 1;
+ m_patch_planes_eff = op.patch_planes() + (op.patch_planes() - 1) * (m_in_plane_strides - 1);
+ m_patch_rows_eff = op.patch_rows() + (op.patch_rows() - 1) * (m_in_row_strides - 1);
+ m_patch_cols_eff = op.patch_cols() + (op.patch_cols() - 1) * (m_in_col_strides - 1);
+
+ if (op.padding_explicit()) {
+ m_outputPlanes = ceil((m_input_planes_eff + op.padding_top_z() + op.padding_bottom_z() - m_patch_planes_eff + 1.f) / static_cast<float>(m_plane_strides));
+ m_outputRows = ceil((m_input_rows_eff + op.padding_top() + op.padding_bottom() - m_patch_rows_eff + 1.f) / static_cast<float>(m_row_strides));
+ m_outputCols = ceil((m_input_cols_eff + op.padding_left() + op.padding_right() - m_patch_cols_eff + 1.f) / static_cast<float>(m_col_strides));
+ m_planePaddingTop = op.padding_top_z();
+ m_rowPaddingTop = op.padding_top();
+ m_colPaddingLeft = op.padding_left();
+ } else {
+ // Computing padding from the type
+ switch (op.padding_type()) {
+ case PADDING_VALID:
+ m_outputPlanes = ceil((m_input_planes_eff - m_patch_planes_eff + 1.f) / static_cast<float>(m_plane_strides));
+ m_outputRows = ceil((m_input_rows_eff - m_patch_rows_eff + 1.f) / static_cast<float>(m_row_strides));
+ m_outputCols = ceil((m_input_cols_eff - m_patch_cols_eff + 1.f) / static_cast<float>(m_col_strides));
+ m_planePaddingTop = 0;
+ m_rowPaddingTop = 0;
+ m_colPaddingLeft = 0;
+ break;
+ case PADDING_SAME: {
+ m_outputPlanes = ceil(m_input_planes_eff / static_cast<float>(m_plane_strides));
+ m_outputRows = ceil(m_input_rows_eff / static_cast<float>(m_row_strides));
+ m_outputCols = ceil(m_input_cols_eff / static_cast<float>(m_col_strides));
+ const Index dz = m_outputPlanes * m_plane_strides + m_patch_planes_eff - 1 - m_input_planes_eff;
+ const Index dy = m_outputRows * m_row_strides + m_patch_rows_eff - 1 - m_input_rows_eff;
+ const Index dx = m_outputCols * m_col_strides + m_patch_cols_eff - 1 - m_input_cols_eff;
+ m_planePaddingTop = dz - dz / 2;
+ m_rowPaddingTop = dy - dy / 2;
+ m_colPaddingLeft = dx - dx / 2;
+ break;
+ }
+ default:
+ eigen_assert(false && "unexpected padding");
+ }
+ }
+ eigen_assert(m_outputRows > 0);
+ eigen_assert(m_outputCols > 0);
+ eigen_assert(m_outputPlanes > 0);
+
+ // Dimensions for result of extraction.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ // ColMajor
+ // 0: depth
+ // 1: patch_planes
+ // 2: patch_rows
+ // 3: patch_cols
+ // 4: number of patches
+ // 5 and beyond: anything else (such as batch).
+ m_dimensions[0] = input_dims[0];
+ m_dimensions[1] = op.patch_planes();
+ m_dimensions[2] = op.patch_rows();
+ m_dimensions[3] = op.patch_cols();
+ m_dimensions[4] = m_outputPlanes * m_outputRows * m_outputCols;
+ for (int i = 5; i < NumDims; ++i) {
+ m_dimensions[i] = input_dims[i-1];
+ }
+ } else {
+ // RowMajor
+ // NumDims-1: depth
+ // NumDims-2: patch_planes
+ // NumDims-3: patch_rows
+ // NumDims-4: patch_cols
+ // NumDims-5: number of patches
+ // NumDims-6 and beyond: anything else (such as batch).
+ m_dimensions[NumDims-1] = input_dims[NumInputDims-1];
+ m_dimensions[NumDims-2] = op.patch_planes();
+ m_dimensions[NumDims-3] = op.patch_rows();
+ m_dimensions[NumDims-4] = op.patch_cols();
+ m_dimensions[NumDims-5] = m_outputPlanes * m_outputRows * m_outputCols;
+ for (int i = NumDims-6; i >= 0; --i) {
+ m_dimensions[i] = input_dims[i];
+ }
+ }
+
+ // Strides for the output tensor.
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_rowStride = m_dimensions[1];
+ m_colStride = m_dimensions[2] * m_rowStride;
+ m_patchStride = m_colStride * m_dimensions[3] * m_dimensions[0];
+ m_otherStride = m_patchStride * m_dimensions[4];
+ } else {
+ m_rowStride = m_dimensions[NumDims-2];
+ m_colStride = m_dimensions[NumDims-3] * m_rowStride;
+ m_patchStride = m_colStride * m_dimensions[NumDims-4] * m_dimensions[NumDims-1];
+ m_otherStride = m_patchStride * m_dimensions[NumDims-5];
+ }
+
+ // Strides for navigating through the input tensor.
+ m_planeInputStride = m_inputDepth;
+ m_rowInputStride = m_inputDepth * m_inputPlanes;
+ m_colInputStride = m_inputDepth * m_inputRows * m_inputPlanes;
+ m_otherInputStride = m_inputDepth * m_inputRows * m_inputCols * m_inputPlanes;
+
+ m_outputPlanesRows = m_outputPlanes * m_outputRows;
+
+ // Fast representations of different variables.
+ m_fastOtherStride = internal::TensorIntDivisor<Index>(m_otherStride);
+ m_fastPatchStride = internal::TensorIntDivisor<Index>(m_patchStride);
+ m_fastColStride = internal::TensorIntDivisor<Index>(m_colStride);
+ m_fastRowStride = internal::TensorIntDivisor<Index>(m_rowStride);
+ m_fastInputRowStride = internal::TensorIntDivisor<Index>(m_row_inflate_strides);
+ m_fastInputColStride = internal::TensorIntDivisor<Index>(m_col_inflate_strides);
+ m_fastInputPlaneStride = internal::TensorIntDivisor<Index>(m_plane_inflate_strides);
+ m_fastInputColsEff = internal::TensorIntDivisor<Index>(m_input_cols_eff);
+ m_fastOutputPlanes = internal::TensorIntDivisor<Index>(m_outputPlanes);
+ m_fastOutputPlanesRows = internal::TensorIntDivisor<Index>(m_outputPlanesRows);
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ m_fastOutputDepth = internal::TensorIntDivisor<Index>(m_dimensions[0]);
+ } else {
+ m_fastOutputDepth = internal::TensorIntDivisor<Index>(m_dimensions[NumDims-1]);
+ }
+ }
+
+ typedef typename XprType::CoeffReturnType CoeffReturnType;
+ typedef typename XprType::PacketReturnType PacketReturnType;
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const Dimensions& dimensions() const { return m_dimensions; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE bool evalSubExprsIfNeeded(Scalar* /*data*/) {
+ m_impl.evalSubExprsIfNeeded(NULL);
+ return true;
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE void cleanup() {
+ m_impl.cleanup();
+ }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(Index index) const
+ {
+ // Patch index corresponding to the passed in index.
+ const Index patchIndex = index / m_fastPatchStride;
+
+ // Spatial offset within the patch. This has to be translated into 3D
+ // coordinates within the patch.
+ const Index patchOffset = (index - patchIndex * m_patchStride) / m_fastOutputDepth;
+
+ // Batch, etc.
+ const Index otherIndex = (NumDims == 5) ? 0 : index / m_fastOtherStride;
+ const Index patch3DIndex = (NumDims == 5) ? patchIndex : (index - otherIndex * m_otherStride) / m_fastPatchStride;
+
+    // Calculate column index in the original input tensor.
+ const Index colIndex = patch3DIndex / m_fastOutputPlanesRows;
+ const Index colOffset = patchOffset / m_fastColStride;
+ const Index inputCol = colIndex * m_col_strides + colOffset * m_in_col_strides - m_colPaddingLeft;
+ const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0);
+ if (inputCol < 0 || inputCol >= m_input_cols_eff ||
+ ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ // Calculate row index in the original input tensor.
+ const Index rowIndex = (patch3DIndex - colIndex * m_outputPlanesRows) / m_fastOutputPlanes;
+ const Index rowOffset = (patchOffset - colOffset * m_colStride) / m_fastRowStride;
+ const Index inputRow = rowIndex * m_row_strides + rowOffset * m_in_row_strides - m_rowPaddingTop;
+ const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0);
+ if (inputRow < 0 || inputRow >= m_input_rows_eff ||
+ ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ // Calculate plane index in the original input tensor.
+ const Index planeIndex = (patch3DIndex - m_outputPlanes * (colIndex * m_outputRows + rowIndex));
+ const Index planeOffset = patchOffset - colOffset * m_colStride - rowOffset * m_rowStride;
+ const Index inputPlane = planeIndex * m_plane_strides + planeOffset * m_in_plane_strides - m_planePaddingTop;
+ const Index origInputPlane = (m_plane_inflate_strides == 1) ? inputPlane : ((inputPlane >= 0) ? (inputPlane / m_fastInputPlaneStride) : 0);
+ if (inputPlane < 0 || inputPlane >= m_input_planes_eff ||
+ ((m_plane_inflate_strides != 1) && (inputPlane != origInputPlane * m_plane_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ const int depth_index = static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : NumDims - 1;
+ const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index];
+
+ const Index inputIndex = depth +
+ origInputRow * m_rowInputStride +
+ origInputCol * m_colInputStride +
+ origInputPlane * m_planeInputStride +
+ otherIndex * m_otherInputStride;
+
+ return m_impl.coeff(inputIndex);
+ }
+
+ template<int LoadMode>
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packet(Index index) const
+ {
+ const Index packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
+ eigen_assert(index+packetSize-1 < dimensions().TotalSize());
+
+ if (m_in_row_strides != 1 || m_in_col_strides != 1 || m_row_inflate_strides != 1 || m_col_inflate_strides != 1 ||
+ m_in_plane_strides != 1 || m_plane_inflate_strides != 1) {
+ return packetWithPossibleZero(index);
+ }
+
+ const Index indices[2] = {index, index + packetSize - 1};
+ const Index patchIndex = indices[0] / m_fastPatchStride;
+ if (patchIndex != indices[1] / m_fastPatchStride) {
+ return packetWithPossibleZero(index);
+ }
+ const Index otherIndex = (NumDims == 5) ? 0 : indices[0] / m_fastOtherStride;
+ eigen_assert(otherIndex == indices[1] / m_fastOtherStride);
+
+ // Find the offset of the element wrt the location of the first element.
+ const Index patchOffsets[2] = {(indices[0] - patchIndex * m_patchStride) / m_fastOutputDepth,
+ (indices[1] - patchIndex * m_patchStride) / m_fastOutputDepth};
+
+ const Index patch3DIndex = (NumDims == 5) ? patchIndex : (indices[0] - otherIndex * m_otherStride) / m_fastPatchStride;
+ eigen_assert(patch3DIndex == (indices[1] - otherIndex * m_otherStride) / m_fastPatchStride);
+
+ const Index colIndex = patch3DIndex / m_fastOutputPlanesRows;
+ const Index colOffsets[2] = {
+ patchOffsets[0] / m_fastColStride,
+ patchOffsets[1] / m_fastColStride};
+
+ // Calculate col indices in the original input tensor.
+ const Index inputCols[2] = {
+ colIndex * m_col_strides + colOffsets[0] - m_colPaddingLeft,
+ colIndex * m_col_strides + colOffsets[1] - m_colPaddingLeft};
+ if (inputCols[1] < 0 || inputCols[0] >= m_inputCols) {
+ return internal::pset1<PacketReturnType>(Scalar(m_paddingValue));
+ }
+
+ if (inputCols[0] != inputCols[1]) {
+ return packetWithPossibleZero(index);
+ }
+
+ const Index rowIndex = (patch3DIndex - colIndex * m_outputPlanesRows) / m_fastOutputPlanes;
+ const Index rowOffsets[2] = {
+ (patchOffsets[0] - colOffsets[0] * m_colStride) / m_fastRowStride,
+ (patchOffsets[1] - colOffsets[1] * m_colStride) / m_fastRowStride};
+ eigen_assert(rowOffsets[0] <= rowOffsets[1]);
+    // Calculate row indices in the original input tensor.
+ const Index inputRows[2] = {
+ rowIndex * m_row_strides + rowOffsets[0] - m_rowPaddingTop,
+ rowIndex * m_row_strides + rowOffsets[1] - m_rowPaddingTop};
+
+ if (inputRows[1] < 0 || inputRows[0] >= m_inputRows) {
+ return internal::pset1<PacketReturnType>(Scalar(m_paddingValue));
+ }
+
+ if (inputRows[0] != inputRows[1]) {
+ return packetWithPossibleZero(index);
+ }
+
+ const Index planeIndex = (patch3DIndex - m_outputPlanes * (colIndex * m_outputRows + rowIndex));
+ const Index planeOffsets[2] = {
+ patchOffsets[0] - colOffsets[0] * m_colStride - rowOffsets[0] * m_rowStride,
+ patchOffsets[1] - colOffsets[1] * m_colStride - rowOffsets[1] * m_rowStride};
+ eigen_assert(planeOffsets[0] <= planeOffsets[1]);
+ const Index inputPlanes[2] = {
+ planeIndex * m_plane_strides + planeOffsets[0] - m_planePaddingTop,
+ planeIndex * m_plane_strides + planeOffsets[1] - m_planePaddingTop};
+
+ if (inputPlanes[1] < 0 || inputPlanes[0] >= m_inputPlanes) {
+ return internal::pset1<PacketReturnType>(Scalar(m_paddingValue));
+ }
+
+ if (inputPlanes[0] >= 0 && inputPlanes[1] < m_inputPlanes) {
+ // no padding
+ const int depth_index = static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 0 : NumDims - 1;
+ const Index depth = index - (index / m_fastOutputDepth) * m_dimensions[depth_index];
+ const Index inputIndex = depth +
+ inputRows[0] * m_rowInputStride +
+ inputCols[0] * m_colInputStride +
+ m_planeInputStride * inputPlanes[0] +
+ otherIndex * m_otherInputStride;
+ return m_impl.template packet<Unaligned>(inputIndex);
+ }
+
+ return packetWithPossibleZero(index);
+ }
+
+ EIGEN_DEVICE_FUNC Scalar* data() const { return NULL; }
+
+ const TensorEvaluator<ArgType, Device>& impl() const { return m_impl; }
+
+ Index planePaddingTop() const { return m_planePaddingTop; }
+ Index rowPaddingTop() const { return m_rowPaddingTop; }
+ Index colPaddingLeft() const { return m_colPaddingLeft; }
+ Index outputPlanes() const { return m_outputPlanes; }
+ Index outputRows() const { return m_outputRows; }
+ Index outputCols() const { return m_outputCols; }
+ Index userPlaneStride() const { return m_plane_strides; }
+ Index userRowStride() const { return m_row_strides; }
+ Index userColStride() const { return m_col_strides; }
+ Index userInPlaneStride() const { return m_in_plane_strides; }
+ Index userInRowStride() const { return m_in_row_strides; }
+ Index userInColStride() const { return m_in_col_strides; }
+ Index planeInflateStride() const { return m_plane_inflate_strides; }
+ Index rowInflateStride() const { return m_row_inflate_strides; }
+ Index colInflateStride() const { return m_col_inflate_strides; }
+
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE CoeffReturnType coeff(const array<Index, NumDims>& coords) const
+ {
+ // ColMajor
+ // 0: depth, 1: patch_planes, 2: patch_rows, 3: patch_cols, 4: number of patches, 5: batches
+ // RowMajor
+ // 0: batches, 1: number of patches, 2: patch_cols , 3: patch_rows, 4: patch_planes, 5: depth
+ const Index patch3DIndex = coords[static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 4 : 1];
+ const Index colOffset = coords[static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 3 : 2];
+    const Index rowOffset = coords[static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 2 : 3];
+ const Index planeOffset = coords[static_cast<int>(Layout) == static_cast<int>(ColMajor) ? 1 : 4];
+
+ array<Index, NumDims-1> inputCoords;
+
+ const Index colIndex = patch3DIndex / m_fastOutputPlanesRows;
+ const Index inputCol = colIndex * m_col_strides + colOffset * m_in_col_strides - m_colPaddingLeft;
+ const Index origInputCol = (m_col_inflate_strides == 1) ? inputCol : ((inputCol >= 0) ? (inputCol / m_fastInputColStride) : 0);
+ if (inputCol < 0 || inputCol >= m_input_cols_eff ||
+ ((m_col_inflate_strides != 1) && (inputCol != origInputCol * m_col_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ const Index rowIndex = (patch3DIndex - colIndex * m_outputPlanesRows) / m_fastOutputPlanes;
+ const Index inputRow = rowIndex * m_row_strides + rowOffset * m_in_row_strides - m_rowPaddingTop;
+ const Index origInputRow = (m_row_inflate_strides == 1) ? inputRow : ((inputRow >= 0) ? (inputRow / m_fastInputRowStride) : 0);
+ if (inputRow < 0 || inputRow >= m_input_rows_eff ||
+ ((m_row_inflate_strides != 1) && (inputRow != origInputRow * m_row_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ const Index planeIndex = patch3DIndex - colIndex * m_outputPlanesRows - rowIndex * m_outputRows;
+ const Index inputPlane = planeIndex * m_plane_strides + planeOffset * m_in_plane_strides - m_planePaddingTop;
+ const Index origInputPlane = (m_plane_inflate_strides == 1) ? inputPlane : ((inputPlane >= 0) ? (inputPlane / m_fastInputPlaneStride) : 0);
+ if (inputPlane < 0 || inputPlane >= m_input_planes_eff ||
+ ((m_plane_inflate_strides != 1) && (inputPlane != origInputPlane * m_plane_inflate_strides))) {
+ return Scalar(m_paddingValue);
+ }
+
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputCoords[0] = coords[0]; // depth
+ inputCoords[1] = origInputPlane;
+ inputCoords[2] = origInputRow;
+ inputCoords[3] = origInputCol;
+ inputCoords[4] = coords[5]; // batch
+ } else {
+ inputCoords[4] = coords[5]; // depth
+ inputCoords[3] = origInputPlane;
+ inputCoords[2] = origInputRow;
+ inputCoords[1] = origInputCol;
+ inputCoords[0] = coords[0]; // batch
+ }
+ if (TensorEvaluator<ArgType, Device>::CoordAccess) {
+ return m_impl.coeff(inputCoords);
+ } else {
+ Index inputIndex;
+ if (static_cast<int>(Layout) == static_cast<int>(ColMajor)) {
+ inputIndex =
+ inputCoords[4] * m_otherInputStride +
+ inputCoords[3] * m_colInputStride +
+ inputCoords[2] * m_rowInputStride +
+ inputCoords[1] * m_planeInputStride +
+ inputCoords[0];
+ } else {
+ inputIndex =
+ inputCoords[0] * m_otherInputStride +
+ inputCoords[1] * m_colInputStride +
+ inputCoords[2] * m_rowInputStride +
+ inputCoords[3] * m_planeInputStride +
+ inputCoords[4];
+ }
+ return m_impl.coeff(inputIndex);
+ }
+ }
+
+ protected:
+ EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE PacketReturnType packetWithPossibleZero(Index index) const
+ {
+ const int packetSize = internal::unpacket_traits<PacketReturnType>::size;
+ EIGEN_ALIGN_DEFAULT typename internal::remove_const<CoeffReturnType>::type values[packetSize];
+ for (int i = 0; i < packetSize; ++i) {
+ values[i] = coeff(index+i);
+ }
+ PacketReturnType rslt = internal::pload<PacketReturnType>(values);
+ return rslt;
+ }
+
+ Dimensions m_dimensions;
+
+  // Parameters passed to the constructor.
+ Index m_plane_strides;
+ Index m_row_strides;
+ Index m_col_strides;
+
+ Index m_outputPlanes;
+ Index m_outputRows;
+ Index m_outputCols;
+
+ Index m_planePaddingTop;
+ Index m_rowPaddingTop;
+ Index m_colPaddingLeft;
+
+ Index m_in_plane_strides;
+ Index m_in_row_strides;
+ Index m_in_col_strides;
+
+ Index m_plane_inflate_strides;
+ Index m_row_inflate_strides;
+ Index m_col_inflate_strides;
+
+ // Cached input size.
+ Index m_inputDepth;
+ Index m_inputPlanes;
+ Index m_inputRows;
+ Index m_inputCols;
+
+ // Other cached variables.
+ Index m_outputPlanesRows;
+
+ // Effective input/patch post-inflation size.
+ Index m_input_planes_eff;
+ Index m_input_rows_eff;
+ Index m_input_cols_eff;
+ Index m_patch_planes_eff;
+ Index m_patch_rows_eff;
+ Index m_patch_cols_eff;
+
+ // Strides for the output tensor.
+ Index m_otherStride;
+ Index m_patchStride;
+ Index m_rowStride;
+ Index m_colStride;
+
+ // Strides for the input tensor.
+ Index m_planeInputStride;
+ Index m_rowInputStride;
+ Index m_colInputStride;
+ Index m_otherInputStride;
+
+ internal::TensorIntDivisor<Index> m_fastOtherStride;
+ internal::TensorIntDivisor<Index> m_fastPatchStride;
+ internal::TensorIntDivisor<Index> m_fastColStride;
+ internal::TensorIntDivisor<Index> m_fastRowStride;
+ internal::TensorIntDivisor<Index> m_fastInputPlaneStride;
+ internal::TensorIntDivisor<Index> m_fastInputRowStride;
+ internal::TensorIntDivisor<Index> m_fastInputColStride;
+ internal::TensorIntDivisor<Index> m_fastInputColsEff;
+ internal::TensorIntDivisor<Index> m_fastOutputPlanesRows;
+ internal::TensorIntDivisor<Index> m_fastOutputPlanes;
+ internal::TensorIntDivisor<Index> m_fastOutputDepth;
+
+ Scalar m_paddingValue;
+
+ TensorEvaluator<ArgType, Device> m_impl;
+};
+
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSOR_TENSOR_VOLUME_PATCH_H
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/g3doc/README.md b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/g3doc/README.md
new file mode 100644
index 0000000000..1c3fe32f9b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/Tensor/g3doc/README.md
@@ -0,0 +1,1792 @@
+# Eigen Tensors
+
+Tensors are multidimensional arrays of elements. Elements are typically scalars,
+but more complex types such as strings are also supported.
+
+[TOC]
+
+## Tensor Classes
+
+You can manipulate a tensor with one of the following classes. They are all in
+the namespace ```::Eigen```.
+
+
+### Class Tensor&lt;data_type, rank&gt;
+
+This is the class to use to create a tensor and allocate memory for it. The
+class is templatized with the tensor datatype, such as float or int, and the
+tensor rank. The rank is the number of dimensions, for example rank 2 is a
+matrix.
+
+Tensors of this class are resizable. For example, if you assign a tensor of a
+different size to a Tensor, that tensor is resized to match its new value.
+
+#### Constructor Tensor&lt;data_type, rank&gt;(size0, size1, ...)
+
+Constructor for a Tensor. The constructor must be passed ```rank``` integers
+indicating the sizes of the instance along each of the ```rank```
+dimensions.
+
+ // Create a tensor of rank 3 of sizes 2, 3, 4. This tensor owns
+ // memory to hold 24 floating point values (24 = 2 x 3 x 4).
+ Tensor<float, 3> t_3d(2, 3, 4);
+
+ // Resize t_3d by assigning a tensor of different sizes, but same rank.
+ t_3d = Tensor<float, 3>(3, 4, 3);
+
+#### Constructor Tensor&lt;data_type, rank&gt;(size_array)
+
+Constructor where the sizes are specified as an array of values instead of an
+explicit list of parameters. The array type to use is
+```Eigen::array<Eigen::Index>```. The array can be constructed automatically
+from an initializer list.
+
+ // Create a tensor of strings of rank 2 with sizes 5, 7.
+ Tensor<string, 2> t_2d({5, 7});
+
+
+### Class TensorFixedSize&lt;data_type, Sizes&lt;size0, size1, ...&gt;&gt;
+
+Class to use for tensors of fixed size, where the size is known at compile
+time. Fixed sized tensors can provide very fast computations because all their
+dimensions are known by the compiler. FixedSize tensors are not resizable.
+
+If the total number of elements in a fixed size tensor is small enough the
+tensor data is held onto the stack and does not cause heap allocation and free.
+
+ // Create a 4 x 3 tensor of floats.
+ TensorFixedSize<float, Sizes<4, 3>> t_4x3;
+
+### Class TensorMap&lt;Tensor&lt;data_type, rank&gt;&gt;
+
+This is the class to use to create a tensor on top of memory allocated and
+owned by another part of your code. It lets you view any piece of allocated
+memory as a Tensor. Instances of this class do not own the memory where the
+data are stored.
+
+A TensorMap is not resizable because it does not own the memory where its data
+are stored.
+
+#### Constructor TensorMap&lt;Tensor&lt;data_type, rank&gt;&gt;(data, size0, size1, ...)
+
+Constructor for a Tensor. The constructor must be passed a pointer to the
+storage for the data, and "rank" size attributes. The storage has to be
+large enough to hold all the data.
+
+    // Map a tensor of ints on top of stack-allocated storage.
+    int storage[128]; // 2 x 4 x 2 x 8 = 128
+    TensorMap<Tensor<int, 4>> t_4d(storage, 2, 4, 2, 8);
+
+    // The same storage can be viewed as a different tensor.
+    // You can also pass the sizes as an array.
+    TensorMap<Tensor<int, 2>> t_2d(storage, 16, 8);
+
+    // You can also map fixed-size tensors. Here we get a 1d view of
+    // the 2d fixed-size tensor.
+    TensorFixedSize<float, Sizes<4, 3>> t_4x3;
+    TensorMap<Tensor<float, 1>> t_12(t_4x3.data(), 12);
+
+
+#### Class TensorRef
+
+See Assigning to a TensorRef below.
+
+## Accessing Tensor Elements
+
+#### &lt;data_type&gt; tensor(index0, index1...)
+
+Return the element at position ```(index0, index1...)``` in tensor
+```tensor```. You must pass as many parameters as the rank of ```tensor```.
+The expression can be used as an l-value to set the value of the element at the
+specified position. The value returned is of the datatype of the tensor.
+
+ // Set the value of the element at position (0, 1, 0);
+ Tensor<float, 3> t_3d(2, 3, 4);
+ t_3d(0, 1, 0) = 12.0f;
+
+ // Initialize all elements to random values.
+ for (int i = 0; i < 2; ++i) {
+ for (int j = 0; j < 3; ++j) {
+ for (int k = 0; k < 4; ++k) {
+ t_3d(i, j, k) = ...some random value...;
+ }
+ }
+ }
+
+ // Print elements of a tensor.
+ for (int i = 0; i < 2; ++i) {
+ LOG(INFO) << t_3d(i, 0, 0);
+ }
+
+
+## TensorLayout
+
+The tensor library supports 2 layouts: ```ColMajor``` (the default) and
+```RowMajor```. Only the default column major layout is currently fully
+supported, and it is therefore not recommended to attempt to use the row major
+layout at the moment.
+
+The layout of a tensor is optionally specified as part of its type. If not
+specified explicitly column major is assumed.
+
+ Tensor<float, 3, ColMajor> col_major; // equivalent to Tensor<float, 3>
+ TensorMap<Tensor<float, 3, RowMajor> > row_major(data, ...);
+
+All the arguments to an expression must use the same layout. Attempting to mix
+different layouts will result in a compilation error.
+
+It is possible to change the layout of a tensor or an expression using the
+```swap_layout()``` method. Note that this will also reverse the order of the
+dimensions.
+
+ Tensor<float, 2, ColMajor> col_major(2, 4);
+ Tensor<float, 2, RowMajor> row_major(2, 4);
+
+ Tensor<float, 2> col_major_result = col_major; // ok, layouts match
+ Tensor<float, 2> col_major_result = row_major; // will not compile
+
+ // Simple layout swap
+ col_major_result = row_major.swap_layout();
+ eigen_assert(col_major_result.dimension(0) == 4);
+ eigen_assert(col_major_result.dimension(1) == 2);
+
+ // Swap the layout and preserve the order of the dimensions
+ array<int, 2> shuffle(1, 0);
+ col_major_result = row_major.swap_layout().shuffle(shuffle);
+ eigen_assert(col_major_result.dimension(0) == 2);
+ eigen_assert(col_major_result.dimension(1) == 4);
+
+
+## Tensor Operations
+
+The Eigen Tensor library provides a vast library of operations on Tensors:
+numerical operations such as addition and multiplication, geometry operations
+such as slicing and shuffling, etc. These operations are available as methods
+of the Tensor classes, and in some cases as operator overloads. For example
+the following code computes the elementwise addition of two tensors:
+
+ Tensor<float, 3> t1(2, 3, 4);
+ ...set some values in t1...
+ Tensor<float, 3> t2(2, 3, 4);
+ ...set some values in t2...
+ // Set t3 to the element wise sum of t1 and t2
+ Tensor<float, 3> t3 = t1 + t2;
+
+While the code above looks easy enough, it is important to understand that the
+expression ```t1 + t2``` is not actually adding the values of the tensors. The
+expression instead constructs a "tensor operator" object of the class
+TensorCwiseBinaryOp&lt;scalar_sum&gt;, which has references to the tensors
+```t1``` and ```t2```. This is a small C++ object that knows how to add
+```t1``` and ```t2```. It is only when the value of the expression is assigned
+to the tensor ```t3``` that the addition is actually performed. Technically,
+this happens through the overloading of ```operator=()``` in the Tensor class.
+
+This mechanism for computing tensor expressions allows for lazy evaluation and
+optimizations which are what make the tensor library very fast.
+
+Of course, the tensor operators do nest, and the expression ```t1 + t2 *
+0.3f``` is actually represented with the (approximate) tree of operators:
+
+ TensorCwiseBinaryOp<scalar_sum>(t1, TensorCwiseUnaryOp<scalar_mul>(t2, 0.3f))
+
+
+### Tensor Operations and C++ "auto"
+
+Because Tensor operations create tensor operators, the C++ ```auto``` keyword
+does not have its intuitive meaning. Consider these 2 lines of code:
+
+ Tensor<float, 3> t3 = t1 + t2;
+ auto t4 = t1 + t2;
+
+In the first line we allocate the tensor ```t3``` and it will contain the
+result of the addition of ```t1``` and ```t2```. In the second line, ```t4```
+is actually the tree of tensor operators that will compute the addition of
+```t1``` and ```t2```. In fact, ```t4``` is *not* a tensor and you cannot get
+the values of its elements:
+
+ Tensor<float, 3> t3 = t1 + t2;
+ cout << t3(0, 0, 0); // OK prints the value of t1(0, 0, 0) + t2(0, 0, 0)
+
+ auto t4 = t1 + t2;
+ cout << t4(0, 0, 0); // Compilation error!
+
+When you use ```auto``` you do not get a Tensor as a result but instead a
+non-evaluated expression. So only use ```auto``` to delay evaluation.
+
+Unfortunately, there is no single underlying concrete type for holding
+non-evaluated expressions, hence you have to use ```auto``` when you want to
+hold a non-evaluated expression.
+
+When you need the results of a set of tensor computations you have to assign the
+result to a Tensor that will be capable of holding them. This can be
+either a normal Tensor, a fixed size Tensor, or a TensorMap on an existing
+piece of memory. All the following will work:
+
+ auto t4 = t1 + t2;
+
+ Tensor<float, 3> result = t4; // Could also be: result(t4);
+ cout << result(0, 0, 0);
+
+    TensorMap<Tensor<float, 3>> result(<a float* with enough space>, <size0>, ...) = t4;
+ cout << result(0, 0, 0);
+
+ TensorFixedSize<float, Sizes<size0, ...>> result = t4;
+ cout << result(0, 0, 0);
+
+Until you need the results, you can keep the operation around, and even reuse
+it for additional operations. As long as you keep the expression as an
+operation, no computation is performed.
+
+ // One way to compute exp((t1 + t2) * 0.2f);
+ auto t3 = t1 + t2;
+ auto t4 = t3 * 0.2f;
+ auto t5 = t4.exp();
+ Tensor<float, 3> result = t5;
+
+ // Another way, exactly as efficient as the previous one:
+ Tensor<float, 3> result = ((t1 + t2) * 0.2f).exp();
+
+### Controlling When Expression are Evaluated
+
+There are several ways to control when expressions are evaluated:
+
+* Assignment to a Tensor, TensorFixedSize, or TensorMap.
+* Use of the eval() method.
+* Assignment to a TensorRef.
+
+#### Assigning to a Tensor, TensorFixedSize, or TensorMap.
+
+The most common way to evaluate an expression is to assign it to a Tensor. In
+the example below, the ```auto``` declarations make the intermediate values
+"Operations", not Tensors, and do not cause the expressions to be evaluated.
+The assignment to the Tensor ```result``` causes the evaluation of all the
+operations.
+
+ auto t3 = t1 + t2; // t3 is an Operation.
+ auto t4 = t3 * 0.2f; // t4 is an Operation.
+ auto t5 = t4.exp(); // t5 is an Operation.
+ Tensor<float, 3> result = t5; // The operations are evaluated.
+
+If you know the ranks and sizes of the Operation value you can assign the
+Operation to a TensorFixedSize instead of a Tensor, which is a bit more
+efficient.
+
+ // We know that the result is a 4x4x2 tensor!
+ TensorFixedSize<float, Sizes<4, 4, 2>> result = t5;
+
+Similarly, assigning an expression to a TensorMap causes its evaluation. Like
+tensors of type TensorFixedSize, TensorMaps cannot be resized so they have to
+have the rank and sizes of the expression that are assigned to them.
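+
+For example, evaluating an expression directly into caller-owned memory (a
+brief sketch, reusing the ```t5``` Operation and the 4x4x2 shape from above):
+
+    float buffer[4 * 4 * 2];
+    Eigen::TensorMap<Eigen::Tensor<float, 3>> result(buffer, 4, 4, 2);
+    result = t5;  // The operations are evaluated directly into "buffer".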
+
+#### Calling eval().
+
+When you compute large composite expressions, you sometimes want to tell Eigen
+that an intermediate value in the expression tree is worth evaluating ahead of
+time. This is done by inserting a call to the ```eval()``` method of the
+expression Operation.
+
+ // The previous example could have been written:
+ Tensor<float, 3> result = ((t1 + t2) * 0.2f).exp();
+
+ // If you want to compute (t1 + t2) once ahead of time you can write:
+ Tensor<float, 3> result = ((t1 + t2).eval() * 0.2f).exp();
+
+Semantically, calling ```eval()``` is equivalent to materializing the value of
+the expression in a temporary Tensor of the right size. The code above in
+effect does:
+
+ // .eval() knows the size!
+ TensorFixedSize<float, Sizes<4, 4, 2>> tmp = t1 + t2;
+ Tensor<float, 3> result = (tmp * 0.2f).exp();
+
+Note that the return value of ```eval()``` is itself an Operation, so the
+following code does not do what you may think:
+
+ // Here t3 is an evaluation Operation. t3 has not been evaluated yet.
+ auto t3 = (t1 + t2).eval();
+
+ // You can use t3 in another expression. Still no evaluation.
+ auto t4 = (t3 * 0.2f).exp();
+
+ // The value is evaluated when you assign the Operation to a Tensor, using
+    // an intermediate tensor to represent t3.
+ Tensor<float, 3> result = t4;
+
+While in the examples above calling ```eval()``` does not make a difference in
+performance, in other cases it can make a huge difference. In the expression
+below the ```broadcast()``` expression causes the ```X.maximum()``` expression
+to be evaluated many times:
+
+ Tensor<...> X ...;
+ Tensor<...> Y = ((X - X.maximum(depth_dim).reshape(dims2d).broadcast(bcast))
+ * beta).exp();
+
+Inserting a call to ```eval()``` between the ```maximum()``` and
+```reshape()``` calls guarantees that maximum() is only computed once and
+greatly speeds-up execution:
+
+ Tensor<...> Y =
+ ((X - X.maximum(depth_dim).eval().reshape(dims2d).broadcast(bcast))
+ * beta).exp();
+
+In the other example below, the tensor ```Y``` is both used in the expression
+and its assignment. This is an aliasing problem and if the evaluation is not
+done in the right order Y will be updated incrementally during the evaluation
+resulting in bogus results:
+
+ Tensor<...> Y ...;
+ Y = Y / (Y.sum(depth_dim).reshape(dims2d).broadcast(bcast));
+
+Inserting a call to ```eval()``` between the ```sum()``` and ```reshape()```
+expressions ensures that the sum is computed before any updates to ```Y``` are
+done.
+
+ Y = Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast));
+
+Note that an eval around the full right hand side expression is not needed
+because the generated code has to compute the i-th value of the right hand side
+before assigning it to the left hand side.
+
+However, if you were assigning the expression value to a shuffle of ```Y```
+then you would need to force an eval for correctness by adding an ```eval()```
+call for the right hand side:
+
+ Y.shuffle(...) =
+ (Y / (Y.sum(depth_dim).eval().reshape(dims2d).broadcast(bcast))).eval();
+
+
+#### Assigning to a TensorRef.
+
+If you need to access only a few elements from the value of an expression you
+can avoid materializing the value in a full tensor by using a TensorRef.
+
+A TensorRef is a small wrapper class for any Eigen Operation. It provides
+overloads for the ```()``` operator that let you access individual values in
+the expression. TensorRef is convenient, because Operations themselves do
+not provide a way to access individual elements.
+
+ // Create a TensorRef for the expression. The expression is not
+ // evaluated yet.
+ TensorRef<Tensor<float, 3> > ref = ((t1 + t2) * 0.2f).exp();
+
+ // Use "ref" to access individual elements. The expression is evaluated
+ // on the fly.
+ float at_0 = ref(0, 0, 0);
+ cout << ref(0, 1, 0);
+
+Only use TensorRef when you need a subset of the values of the expression.
+TensorRef only computes the values you access. However note that if you are
+going to access all the values it will be much faster to materialize the
+results in a Tensor first.
+
+In some cases, if the full Tensor result would be very large, you may save
+memory by accessing it as a TensorRef. But not always. So don't count on it.
+
+
+### Controlling How Expressions Are Evaluated
+
+The tensor library provides several implementations of the various operations
+such as contractions and convolutions. The implementations are optimized for
+different environments: single threaded on CPU, multi threaded on CPU, or on a
+GPU using CUDA. Additional implementations may be added later.
+
+You can choose which implementation to use with the ```device()``` call. If
+you do not choose an implementation explicitly the default implementation that
+uses a single thread on the CPU is used.
+
+The default implementation has been optimized for recent Intel CPUs, taking
+advantage of SSE, AVX, and FMA instructions. Work is ongoing to tune the
+library on ARM CPUs. Note that you need to pass compiler-dependent flags
+to enable the use of SSE, AVX, and other instructions.
+
+For example, the following code adds two tensors using the default
+single-threaded CPU implementation:
+
+ Tensor<float, 2> a(30, 40);
+ Tensor<float, 2> b(30, 40);
+ Tensor<float, 2> c = a + b;
+
+To choose a different implementation you have to insert a ```device()``` call
+before the assignment of the result. For technical C++ reasons this requires
+that the Tensor for the result be declared on its own. This means that you
+have to know the size of the result.
+
+ Eigen::Tensor<float, 2> c(30, 40);
+ c.device(...) = a + b;
+
+The call to ```device()``` must be the last call on the left of the operator=.
+
+You must pass to the ```device()``` call an Eigen device object. There are
+presently three devices you can use: DefaultDevice, ThreadPoolDevice and
+GpuDevice.
+
+
+#### Evaluating With the DefaultDevice
+
+This is exactly the same as not inserting a ```device()``` call.
+
+ DefaultDevice my_device;
+ c.device(my_device) = a + b;
+
+#### Evaluating with a Thread Pool
+
+ #include "thread/threadpool.h"
+
+ // Create a threadpool and start the threads. This is the Google way,
+    // other environments use different mechanisms to create a thread pool.
+ ThreadPool my_pool(4 /* number of threads in the pool */);
+ my_pool.StartWorkers();
+
+ // Create the Eigen ThreadPoolDevice.
+ // You typically use up to all the available threads in the pool.
+ Eigen::ThreadPoolDevice my_device(&my_pool, 4 /* number of threads to use */);
+
+ // Now just use the device when evaluating expressions.
+ Eigen::Tensor<float, 2> c(30, 50);
+ c.device(my_device) = a.contract(b, dot_product_dims);
+
+
+#### Evaluating On GPU
+
+This is presently a bit more complicated than just using a thread pool device.
+You need to create a GPU device but you also need to explicitly allocate the
+memory for tensors with CUDA.
+
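+As a rough sketch, re-using the host tensors ```a```, ```b```, and ```c``` from
+the earlier example and assuming the CUDA runtime is available (note that the
+exact way to construct an ```Eigen::GpuDevice``` differs between Eigen
+versions; here it is assumed to take a pointer to a ```cudaStream_t```):
+
+    // Allocate device memory with CUDA and copy the inputs over.
+    const std::size_t bytes = 30 * 40 * sizeof(float);
+    float *d_a, *d_b, *d_c;
+    cudaMalloc(&d_a, bytes);
+    cudaMalloc(&d_b, bytes);
+    cudaMalloc(&d_c, bytes);
+    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
+    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
+
+    // View the device buffers as tensors and evaluate on the GPU.
+    Eigen::TensorMap<Eigen::Tensor<float, 2>> gpu_a(d_a, 30, 40);
+    Eigen::TensorMap<Eigen::Tensor<float, 2>> gpu_b(d_b, 30, 40);
+    Eigen::TensorMap<Eigen::Tensor<float, 2>> gpu_c(d_c, 30, 40);
+
+    cudaStream_t stream;
+    cudaStreamCreate(&stream);
+    Eigen::GpuDevice gpu_device(&stream);
+    gpu_c.device(gpu_device) = gpu_a + gpu_b;
+
+    // Wait for the computation to finish and copy the result back to the host.
+    cudaStreamSynchronize(stream);
+    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);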
+
+## API Reference
+
+### Datatypes
+
+In the documentation of the tensor methods and Operations we mention datatypes
+that are tensor-type specific:
+
+#### &lt;Tensor-Type&gt;::Dimensions
+
+Acts like an array of ints. Has an ```int size``` attribute, and can be
+indexed like an array to access individual values. Used to represent the
+dimensions of a tensor. See ```dimensions()```.
+
+#### &lt;Tensor-Type&gt;::Index
+
+Acts like an ```int```. Used for indexing tensors along their dimensions. See
+```operator()```, ```dimension()```, and ```size()```.
+
+#### &lt;Tensor-Type&gt;::Scalar
+
+Represents the datatype of individual tensor elements. For example, for a
+```Tensor<float>```, ```Scalar``` is the type ```float```. See
+```setConstant()```.
+
+#### &lt;Operation&gt;
+
+We use this pseudo type to indicate that a tensor Operation is returned by a
+method. We indicate in the text the type and dimensions of the tensor that the
+Operation returns after evaluation.
+
+The Operation will have to be evaluated, for example by assigning it to a
+tensor, before you can access the values of the resulting tensor. You can also
+access the values through a TensorRef.
+
+
+## Built-in Tensor Methods
+
+These are ordinary C++ methods that act on tensors immediately. They are not
+Operations, which provide delayed evaluation of their results. Unless specified
+otherwise, all the methods listed below are available on all tensor classes:
+Tensor, TensorFixedSize, and TensorMap.
+
+## Metadata
+
+### int NumDimensions
+
+Constant value indicating the number of dimensions of a Tensor. This is also
+known as the tensor "rank".
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ cout << "Dims " << a.NumDimensions;
+ => Dims 2
+
+### Dimensions dimensions()
+
+Returns an array-like object representing the dimensions of the tensor.
+The actual type of the ```dimensions()``` result is ```<Tensor-Type>::Dimensions```.
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ const Eigen::Tensor<float, 2>::Dimensions& d = a.dimensions();
+ cout << "Dim size: " << d.size << ", dim 0: " << d[0]
+ << ", dim 1: " << d[1];
+ => Dim size: 2, dim 0: 3, dim 1: 4
+
+If you use a C++11 compiler, you can use ```auto``` to simplify the code:
+
+ const auto& d = a.dimensions();
+ cout << "Dim size: " << d.size << ", dim 0: " << d[0]
+ << ", dim 1: " << d[1];
+ => Dim size: 2, dim 0: 3, dim 1: 4
+
+### Index dimension(Index n)
+
+Returns the n-th dimension of the tensor. The actual type of the
+```dimension()``` result is ```<Tensor-Type>::Index```, but you can
+always use it like an int.
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ int dim1 = a.dimension(1);
+ cout << "Dim 1: " << dim1;
+ => Dim 1: 4
+
+### Index size()
+
+Returns the total number of elements in the tensor. This is the product of all
+the tensor dimensions. The actual type of the ```size()``` result is
+```<Tensor-Type>::Index```, but you can always use it like an int.
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ cout << "Size: " << a.size();
+ => Size: 12
+
+
+### Getting Dimensions From An Operation
+
+A few operations provide ```dimensions()``` directly,
+e.g. ```TensorReslicingOp```. Most operations defer calculating dimensions
+until the operation is being evaluated. If you need access to the dimensions
+of a deferred operation, you can wrap it in a TensorRef (see Assigning to a
+TensorRef above), which provides ```dimensions()``` and ```dimension()``` as
+above.
+
+TensorRef can also wrap the plain Tensor types, so this is a useful idiom in
+templated contexts where the underlying object could be either a raw Tensor
+or some deferred operation (e.g. a slice of a Tensor). In this case, the
+template code can wrap the object in a TensorRef and reason about its
+dimensionality while remaining agnostic to the underlying type.
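+
+For example, wrapping a deferred element-wise Operation (a brief sketch; the
+names are purely illustrative):
+
+    Eigen::Tensor<float, 3> t(2, 3, 4);
+    t.setRandom();
+
+    // Wrap the un-evaluated Operation in a TensorRef to query its
+    // dimensions without materializing the result.
+    TensorRef<Tensor<float, 3> > ref = t * 2.0f;
+    cout << "dims: " << ref.dimension(0) << " " << ref.dimension(1)
+         << " " << ref.dimension(2);
+    => dims: 2 3 4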
+
+
+## Constructors
+
+### Tensor
+
+Creates a tensor of the specified size. The number of arguments must be equal
+to the rank of the tensor. The content of the tensor is not initialized.
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
+ => NumRows: 3 NumCols: 4
+
+### TensorFixedSize
+
+Creates a tensor of the specified size. The number of arguments in the Sizes<>
+template parameter determines the rank of the tensor. The content of the tensor
+is not initialized.
+
+ Eigen::TensorFixedSize<float, Sizes<3, 4>> a;
+ cout << "Rank: " << a.rank() << endl;
+ => Rank: 2
+ cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
+ => NumRows: 3 NumCols: 4
+
+### TensorMap
+
+Creates a tensor mapping an existing array of data. The data must not be freed
+until the TensorMap is discarded, and the size of the data must be large enough
+to accommodate the coefficients of the tensor.
+
+ float data[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
+    Eigen::TensorMap<Eigen::Tensor<float, 2>> a(data, 3, 4);
+    cout << "NumRows: " << a.dimension(0) << " NumCols: " << a.dimension(1) << endl;
+    => NumRows: 3 NumCols: 4
+    cout << "a(1, 2): " << a(1, 2) << endl;
+    => a(1, 2): 7
+
+
+## Contents Initialization
+
+When a new Tensor or a new TensorFixedSize are created, memory is allocated to
+hold all the tensor elements, but the memory is not initialized. Similarly,
+when a new TensorMap is created on top of non-initialized memory, its
+contents are not initialized.
+
+You can use one of the methods below to initialize the tensor memory. These
+have an immediate effect on the tensor and return the tensor itself as a
+result. These are not tensor Operations which delay evaluation.
+
+### &lt;Tensor-Type&gt; setConstant(const Scalar& val)
+
+Sets all elements of the tensor to the constant value ```val```. ```Scalar```
+is the type of data stored in the tensor. You can pass any value that is
+convertible to that type.
+
+Returns the tensor itself in case you want to chain another call.
+
+ a.setConstant(12.3f);
+ cout << "Constant: " << endl << a << endl << endl;
+ =>
+ Constant:
+ 12.3 12.3 12.3 12.3
+ 12.3 12.3 12.3 12.3
+ 12.3 12.3 12.3 12.3
+
+Note that ```setConstant()``` can be used on any tensor where the element type
+has a copy constructor and an ```operator=()```:
+
+ Eigen::Tensor<string, 2> a(2, 3);
+ a.setConstant("yolo");
+ cout << "String tensor: " << endl << a << endl << endl;
+ =>
+ String tensor:
+ yolo yolo yolo
+ yolo yolo yolo
+
+
+### &lt;Tensor-Type&gt; setZero()
+
+Fills the tensor with zeros. Equivalent to ```setConstant(Scalar(0))```.
+Returns the tensor itself in case you want to chain another call.
+
+ a.setZero();
+ cout << "Zeros: " << endl << a << endl << endl;
+ =>
+ Zeros:
+ 0 0 0 0
+ 0 0 0 0
+ 0 0 0 0
+
+
+### &lt;Tensor-Type&gt; setValues({..initializer_list})
+
+Fills the tensor with explicit values specified in a std::initializer_list.
+The type of the initializer list depends on the type and rank of the tensor.
+
+If the tensor has rank N, the initializer list must be nested N times. The
+most deeply nested lists must contain P scalars of the Tensor type where P is
+the size of the last dimension of the Tensor.
+
+For example, for a ```TensorFixedSize<float, Sizes<2, 3>>``` the initializer list must
+contain 2 lists of 3 floats each.
+
+```setValues()``` returns the tensor itself in case you want to chain another
+call.
+
+ Eigen::Tensor<float, 2> a(2, 3);
+ a.setValues({{0.0f, 1.0f, 2.0f}, {3.0f, 4.0f, 5.0f}});
+ cout << "a" << endl << a << endl << endl;
+ =>
+ a
+ 0 1 2
+ 3 4 5
+
+If a list is too short, the corresponding elements of the tensor will not be
+changed. This is valid at each level of nesting. For example the following
+code only sets the values of the first row of the tensor.
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setConstant(1000);
+ a.setValues({{10, 20, 30}});
+ cout << "a" << endl << a << endl << endl;
+ =>
+ a
+ 10 20 30
+ 1000 1000 1000
+
+### &lt;Tensor-Type&gt; setRandom()
+
+Fills the tensor with random values. Returns the tensor itself in case you
+want to chain another call.
+
+ a.setRandom();
+ cout << "Random: " << endl << a << endl << endl;
+ =>
+ Random:
+ 0.680375 0.59688 -0.329554 0.10794
+ -0.211234 0.823295 0.536459 -0.0452059
+ 0.566198 -0.604897 -0.444451 0.257742
+
+You can customize ```setRandom()``` by providing your own random number
+generator as a template argument:
+
+ a.setRandom<MyRandomGenerator>();
+
+Here, ```MyRandomGenerator``` must be a struct with the following member
+functions, where Scalar and Index are the same as ```<Tensor-Type>::Scalar```
+and ```<Tensor-Type>::Index```.
+
+See ```struct UniformRandomGenerator``` in TensorFunctors.h for an example.
+
+ // Custom number generator for use with setRandom().
+ struct MyRandomGenerator {
+ // Default and copy constructors. Both are needed
+ MyRandomGenerator() { }
+ MyRandomGenerator(const MyRandomGenerator& ) { }
+
+ // Return a random value to be used. "element_location" is the
+ // location of the entry to set in the tensor, it can typically
+ // be ignored.
+ Scalar operator()(Eigen::DenseIndex element_location,
+ Eigen::DenseIndex /*unused*/ = 0) const {
+ return <randomly generated value of type T>;
+ }
+
+ // Same as above but generates several numbers at a time.
+ typename internal::packet_traits<Scalar>::type packetOp(
+ Eigen::DenseIndex packet_location, Eigen::DenseIndex /*unused*/ = 0) const {
+ return <a packet of randomly generated values>;
+ }
+ };
+
+You can also use one of the 2 random number generators that are part of the
+tensor library:
+* UniformRandomGenerator
+* NormalRandomGenerator
+
+
+## Data Access
+
+The Tensor, TensorFixedSize, and TensorRef classes provide the following
+accessors to access the tensor coefficients:
+
+ const Scalar& operator()(const array<Index, NumIndices>& indices)
+ const Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
+ Scalar& operator()(const array<Index, NumIndices>& indices)
+ Scalar& operator()(Index firstIndex, IndexTypes... otherIndices)
+
+The number of indices must be equal to the rank of the tensor. Moreover, these
+accessors are not available on tensor expressions. In order to access the
+values of a tensor expression, the expression must either be evaluated or
+wrapped in a TensorRef.
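+
+For example, the array-of-indices form addresses the same element as the
+variadic form (a brief sketch):
+
+    Eigen::Tensor<float, 2> a(3, 4);
+    a.setZero();
+    Eigen::array<Eigen::DenseIndex, 2> indices({1, 2});
+    a(indices) = 7.0f; // Same element as a(1, 2).
+    cout << "a(1, 2): " << a(1, 2);
+    => a(1, 2): 7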
+
+
+### Scalar* data() and const Scalar* data() const
+
+Returns a pointer to the storage for the tensor. The pointer is const if the
+tensor was const. This allows direct access to the data. The layout of the
+data depends on the tensor layout: RowMajor or ColMajor.
+
+This access is usually only needed for special cases, for example when mixing
+Eigen Tensor code with other libraries.
+
+Scalar is the type of data stored in the tensor.
+
+ Eigen::Tensor<float, 2> a(3, 4);
+ float* a_data = a.data();
+ a_data[0] = 123.45f;
+ cout << "a(0, 0): " << a(0, 0);
+ => a(0, 0): 123.45
+
+
+## Tensor Operations
+
+All the methods documented below return non-evaluated tensor ```Operations```.
+These can be chained: you can apply another Tensor Operation to the value
+returned by the method.
+
+The chain of Operations is evaluated lazily, typically when it is assigned to a
+tensor. See "Controlling when Expressions are Evaluated" for more details about
+their evaluation.
+
+### &lt;Operation&gt; constant(const Scalar& val)
+
+Returns a tensor of the same type and dimensions as the original tensor but
+where all elements have the value ```val```.
+
+This is useful, for example, when you want to add or subtract a constant from a
+tensor, or multiply every element of a tensor by a scalar.
+
+ Eigen::Tensor<float, 2> a(2, 3);
+ a.setConstant(1.0f);
+ Eigen::Tensor<float, 2> b = a + a.constant(2.0f);
+ Eigen::Tensor<float, 2> c = b * b.constant(0.2f);
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ cout << "c" << endl << c << endl << endl;
+ =>
+ a
+ 1 1 1
+ 1 1 1
+
+ b
+ 3 3 3
+ 3 3 3
+
+ c
+ 0.6 0.6 0.6
+ 0.6 0.6 0.6
+
+### &lt;Operation&gt; random()
+
+Returns a tensor of the same type and dimensions as the current tensor
+but where all elements have random values.
+
+This is for example useful to add random values to an existing tensor.
+The generation of random values can be customized in the same manner
+as for ```setRandom()```.
+
+ Eigen::Tensor<float, 2> a(2, 3);
+ a.setConstant(1.0f);
+ Eigen::Tensor<float, 2> b = a + a.random();
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ =>
+ a
+ 1 1 1
+ 1 1 1
+
+ b
+ 1.68038 1.5662 1.82329
+ 0.788766 1.59688 0.395103
+
+
+## Unary Element Wise Operations
+
+All these operations take a single input tensor as argument and return a tensor
+of the same type and dimensions as the tensor to which they are applied. The
+requested operations are applied to each element independently.
+
+### &lt;Operation&gt; operator-()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the opposite values of the original tensor.
+
+ Eigen::Tensor<float, 2> a(2, 3);
+ a.setConstant(1.0f);
+ Eigen::Tensor<float, 2> b = -a;
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ =>
+ a
+ 1 1 1
+ 1 1 1
+
+ b
+ -1 -1 -1
+ -1 -1 -1
+
+### &lt;Operation&gt; sqrt()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the square roots of the original tensor.
+
+### &lt;Operation&gt; rsqrt()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the inverse square roots of the original tensor.
+
+### &lt;Operation&gt; square()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the squares of the original tensor values.
+
+### &lt;Operation&gt; inverse()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the inverse of the original tensor values.
+
+### &lt;Operation&gt; exp()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the exponential of the original tensor.
+
+### &lt;Operation&gt; log()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the natural logarithms of the original tensor.
+
+### &lt;Operation&gt; abs()
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the absolute values of the original tensor.
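+
+These unary Operations can be chained. For example (a brief sketch):
+
+    Eigen::Tensor<float, 2> a(2, 2);
+    a.setValues({{-1.0f, -4.0f}, {9.0f, 16.0f}});
+    // Element-wise absolute value followed by element-wise square root.
+    Eigen::Tensor<float, 2> b = a.abs().sqrt();
+    cout << "b" << endl << b << endl;
+    =>
+    b
+    1 2
+    3 4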
+
+### &lt;Operation&gt; pow(Scalar exponent)
+
+Returns a tensor of the same type and dimensions as the original tensor
+containing the coefficients of the original tensor to the power of the
+exponent.
+
+The type of the exponent, Scalar, is always the same as the type of the
+tensor coefficients. For example, only integer exponents can be used in
+conjunction with tensors of integer values.
+
+You can use cast() to lift this restriction. For example this computes
+cubic roots of an int Tensor:
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{0, 1, 8}, {27, 64, 125}});
+ Eigen::Tensor<double, 2> b = a.cast<double>().pow(1.0 / 3.0);
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ =>
+ a
+ 0 1 8
+ 27 64 125
+
+ b
+ 0 1 2
+ 3 4 5
+
+### &lt;Operation&gt; operator * (Scalar scale)
+
+Multiplies all the coefficients of the input tensor by the provided scale.
+
+### &lt;Operation&gt; cwiseMax(Scalar threshold)
+TODO
+
+### &lt;Operation&gt; cwiseMin(Scalar threshold)
+TODO
+
+### &lt;Operation&gt; unaryExpr(const CustomUnaryOp& func)
+TODO
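+
+As a brief sketch, ```unaryExpr()``` applies a user-supplied functor to every
+coefficient. The ```Clamp01``` functor below is hypothetical and not part of
+the library:
+
+    // Hypothetical functor that clamps every coefficient to [0, 1].
+    struct Clamp01 {
+      float operator()(float x) const {
+        return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
+      }
+    };
+
+    Eigen::Tensor<float, 2> a(2, 3);
+    a.setRandom();
+    Eigen::Tensor<float, 2> b = a.unaryExpr(Clamp01());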
+
+
+## Binary Element Wise Operations
+
+These operations take two input tensors as arguments. The 2 input tensors should
+be of the same type and dimensions. The result is a tensor of the same
+dimensions as the tensors to which they are applied, and unless otherwise
+specified it is also of the same type. The requested operations are applied to
+each pair of elements independently.
+
+### &lt;Operation&gt; operator+(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise sums of the inputs.
+
+### &lt;Operation&gt; operator-(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise differences of the inputs.
+
+### &lt;Operation&gt; operator*(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise products of the inputs.
+
+### &lt;Operation&gt; operator/(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise quotients of the inputs.
+
+This operator is not supported for integer types.
+
+### &lt;Operation&gt; cwiseMax(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise maximums of the inputs.
+
+### &lt;Operation&gt; cwiseMin(const OtherDerived& other)
+
+Returns a tensor of the same type and dimensions as the input tensors
+containing the coefficient wise minimums of the inputs.
+
+### &lt;Operation&gt; Logical operators
+
+The following logical operators are supported as well:
+
+* operator&&(const OtherDerived& other)
+* operator||(const OtherDerived& other)
+* operator<(const OtherDerived& other)
+* operator<=(const OtherDerived& other)
+* operator>(const OtherDerived& other)
+* operator>=(const OtherDerived& other)
+* operator==(const OtherDerived& other)
+* operator!=(const OtherDerived& other)
+
+They all return a tensor of boolean values.
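+
+For example (a brief sketch):
+
+    Eigen::Tensor<int, 2> a(2, 3);
+    Eigen::Tensor<int, 2> b(2, 3);
+    a.setValues({{1, 2, 3}, {4, 5, 6}});
+    b.setConstant(3);
+    // Element-wise comparison; the result is a tensor of booleans.
+    Eigen::Tensor<bool, 2> greater = a > b;
+    // greater(0, 0) is false (1 > 3), greater(1, 0) is true (4 > 3).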
+
+
+## Selection (select(const ThenDerived& thenTensor, const ElseDerived& elseTensor))
+
+Selection is a coefficient-wise ternary operator that is the tensor equivalent
+to the if-then-else operation.
+
+    Tensor<bool, 3> if_tensor = ...;
+    Tensor<float, 3> then_tensor = ...;
+    Tensor<float, 3> else_tensor = ...;
+    Tensor<float, 3> result = if_tensor.select(then_tensor, else_tensor);
+
+The 3 arguments must be of the same dimensions, which will also be the dimension
+of the result. The 'if' tensor must be of type boolean, the 'then' and the
+'else' tensor must be of the same type, which will also be the type of the
+result.
+
+Each coefficient in the result is equal to the corresponding coefficient in the
+'then' tensor if the corresponding value in the 'if' tensor is true. If not, the
+resulting coefficient will come from the 'else' tensor.
+
+
+## Contraction
+
+Tensor *contractions* are a generalization of the matrix product to the
+multidimensional case.
+
+ // Create 2 matrices using tensors of rank 2
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{1, 2, 3}, {6, 5, 4}});
+ Eigen::Tensor<int, 2> b(3, 2);
+    b.setValues({{1, 2}, {4, 5}, {5, 6}});
+
+ // Compute the traditional matrix product
+    array<IndexPair<int>, 1> product_dims = { IndexPair<int>(1, 0) };
+ Eigen::Tensor<int, 2> AB = a.contract(b, product_dims);
+
+ // Compute the product of the transpose of the matrices
+    array<IndexPair<int>, 1> transposed_product_dims = { IndexPair<int>(0, 1) };
+    Eigen::Tensor<int, 2> AtBt = a.contract(b, transposed_product_dims);
+
+
+## Reduction Operations
+
+A *Reduction* operation returns a tensor with fewer dimensions than the
+original tensor. The values in the returned tensor are computed by applying a
+*reduction operator* to slices of values from the original tensor. You specify
+the dimensions along which the slices are made.
+
+The Eigen Tensor library provides a set of predefined reduction operators such
+as ```maximum()``` and ```sum()``` and lets you define additional operators by
+implementing a few methods from a reductor template.
+
+### Reduction Dimensions
+
+All reduction operations take a single parameter of type
+```<TensorType>::Dimensions``` which can always be specified as an array of
+ints. These are called the "reduction dimensions." The values are the indices
+of the dimensions of the input tensor over which the reduction is done. The
+parameter can have at most as many elements as the rank of the input tensor;
+each element must be less than the tensor rank, as it indicates one of the
+dimensions to reduce.
+
+Each dimension of the input tensor should occur at most once in the reduction
+dimensions as the implementation does not remove duplicates.
+
+The order of the values in the reduction dimensions does not affect the
+results, but the code may execute faster if you list the dimensions in
+increasing order.
+
+Example: Reduction along one dimension.
+
+ // Create a tensor of 2 dimensions
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{1, 2, 3}, {6, 5, 4}});
+ // Reduce it along the second dimension (1)...
+ Eigen::array<int, 1> dims({1 /* dimension to reduce */});
+ // ...using the "maximum" operator.
+ // The result is a tensor with one dimension. The size of
+ // that dimension is the same as the first (non-reduced) dimension of a.
+ Eigen::Tensor<int, 1> b = a.maximum(dims);
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ =>
+ a
+ 1 2 3
+ 6 5 4
+
+ b
+ 3
+ 6
+
+Example: Reduction along two dimensions.
+
+ Eigen::Tensor<float, 3, Eigen::ColMajor> a(2, 3, 4);
+ a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f},
+ {7.0f, 6.0f, 5.0f, 4.0f},
+ {8.0f, 9.0f, 10.0f, 11.0f}},
+ {{12.0f, 13.0f, 14.0f, 15.0f},
+ {19.0f, 18.0f, 17.0f, 16.0f},
+ {20.0f, 21.0f, 22.0f, 23.0f}}});
+ // The tensor a has 3 dimensions. We reduce along the
+ // first 2, resulting in a tensor with a single dimension
+ // of size 4 (the last dimension of a.)
+ // Note that we pass the array of reduction dimensions
+ // directly to the maximum() call.
+ Eigen::Tensor<float, 1, Eigen::ColMajor> b =
+ a.maximum(Eigen::array<int, 2>({0, 1}));
+ cout << "b" << endl << b << endl << endl;
+ =>
+ b
+ 20
+ 21
+ 22
+ 23
+
+#### Reduction along all dimensions
+
+As a special case, if you pass no parameter to a reduction operation the
+original tensor is reduced along *all* its dimensions. The result is a
+one-dimension tensor with a single value.
+
+ Eigen::Tensor<float, 3> a(2, 3, 4);
+ a.setValues({{{0.0f, 1.0f, 2.0f, 3.0f},
+ {7.0f, 6.0f, 5.0f, 4.0f},
+ {8.0f, 9.0f, 10.0f, 11.0f}},
+ {{12.0f, 13.0f, 14.0f, 15.0f},
+ {19.0f, 18.0f, 17.0f, 16.0f},
+ {20.0f, 21.0f, 22.0f, 23.0f}}});
+ // Reduce along all dimensions using the sum() operator.
+ Eigen::Tensor<float, 1> b = a.sum();
+ cout << "b" << endl << b << endl << endl;
+ =>
+ b
+ 276
+
+
+### &lt;Operation&gt; sum(const Dimensions& new_dims)
+### &lt;Operation&gt; sum()
+
+Reduce a tensor using the sum() operator. The resulting values
+are the sum of the reduced values.
+
+### &lt;Operation&gt; mean(const Dimensions& new_dims)
+### &lt;Operation&gt; mean()
+
+Reduce a tensor using the mean() operator. The resulting values
+are the mean of the reduced values.
+
+### &lt;Operation&gt; maximum(const Dimensions& new_dims)
+### &lt;Operation&gt; maximum()
+
+Reduce a tensor using the maximum() operator. The resulting values are the
+largest of the reduced values.
+
+### &lt;Operation&gt; minimum(const Dimensions& new_dims)
+### &lt;Operation&gt; minimum()
+
+Reduce a tensor using the minimum() operator. The resulting values
+are the smallest of the reduced values.
+
+### &lt;Operation&gt; prod(const Dimensions& new_dims)
+### &lt;Operation&gt; prod()
+
+Reduce a tensor using the prod() operator. The resulting values
+are the product of the reduced values.
+
+### &lt;Operation&gt; all(const Dimensions& new_dims)
+### &lt;Operation&gt; all()
+Reduce a tensor using the all() operator. Casts tensor to bool and then checks
+whether all elements are true. Runs through all elements rather than
+short-circuiting, so may be significantly inefficient.
+
+### &lt;Operation&gt; any(const Dimensions& new_dims)
+### &lt;Operation&gt; any()
+Reduce a tensor using the any() operator. Casts tensor to bool and then checks
+whether any element is true. Runs through all elements rather than
+short-circuiting, so may be significantly inefficient.
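+
+For example, reducing with ```any()``` along the second dimension tells you,
+for each row, whether the row contains a nonzero value (a brief sketch):
+
+    Eigen::Tensor<int, 2> a(2, 3);
+    a.setValues({{0, 0, 0}, {0, 7, 0}});
+    Eigen::array<int, 1> dims({1});
+    // Each row is reduced to a single boolean.
+    Eigen::Tensor<bool, 1> row_has_value = a.any(dims);
+    // row_has_value(0) is false, row_has_value(1) is true.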
+
+### &lt;Operation&gt; reduce(const Dimensions& new_dims, const Reducer& reducer)
+
+Reduce a tensor using a user-defined reduction operator. See ```SumReducer```
+in TensorFunctors.h for information on how to implement a reduction operator.
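+
+As a rough sketch of what such a reducer can look like (modeled on
+```SumReducer```; the exact set of required methods, including the packet
+variants, is defined in TensorFunctors.h and may differ between versions, so a
+scalar-only reducer like this one may need extending):
+
+    // Hypothetical reducer that multiplies the reduced values together.
+    template <typename T> struct MyProdReducer {
+      // Initial value of the accumulator.
+      T initialize() const { return T(1); }
+      // Fold one coefficient into the accumulator.
+      void reduce(const T t, T* accum) const { *accum *= t; }
+      // Produce the final result from the accumulator.
+      T finalize(const T accum) const { return accum; }
+    };
+
+    Eigen::Tensor<float, 2> a(2, 3);
+    a.setRandom();
+    Eigen::array<int, 1> dims({1});
+    // Equivalent to a.prod(dims).
+    Eigen::Tensor<float, 1> b = a.reduce(dims, MyProdReducer<float>());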
+
+
+## Convolutions
+
+### &lt;Operation&gt; convolve(const KernelDerived& kernel, const Dimensions& dims)
+
+Returns a tensor that is the output of the convolution of the input tensor with the kernel,
+along the specified dimensions of the input tensor. The dimension size for dimensions of the output tensor
+which were part of the convolution will be reduced by the formula:
+output_dim_size = input_dim_size - kernel_dim_size + 1 (requires: input_dim_size >= kernel_dim_size).
+The dimension sizes for dimensions that were not part of the convolution will remain the same.
+Performance of the convolution can depend on the length of the stride(s) of the input tensor dimension(s) along which the
+convolution is computed (the first dimension has the shortest stride for ColMajor, whereas RowMajor's shortest stride is
+for the last dimension).
+
+ // Compute convolution along the second and third dimension.
+ Tensor<float, 4, DataLayout> input(3, 3, 7, 11);
+ Tensor<float, 2, DataLayout> kernel(2, 2);
+ Tensor<float, 4, DataLayout> output(3, 2, 6, 11);
+ input.setRandom();
+ kernel.setRandom();
+
+ Eigen::array<ptrdiff_t, 2> dims({1, 2}); // Specify second and third dimension for convolution.
+ output = input.convolve(kernel, dims);
+
+ for (int i = 0; i < 3; ++i) {
+ for (int j = 0; j < 2; ++j) {
+ for (int k = 0; k < 6; ++k) {
+ for (int l = 0; l < 11; ++l) {
+ const float result = output(i,j,k,l);
+ const float expected = input(i,j+0,k+0,l) * kernel(0,0) +
+ input(i,j+1,k+0,l) * kernel(1,0) +
+ input(i,j+0,k+1,l) * kernel(0,1) +
+ input(i,j+1,k+1,l) * kernel(1,1);
+ VERIFY_IS_APPROX(result, expected);
+ }
+ }
+ }
+ }
+
+
+
+## Geometrical Operations
+
+These operations return a Tensor with different dimensions than the original
+Tensor. They can be used to access slices of tensors, see them with different
+dimensions, or pad tensors with additional data.
+
+### &lt;Operation&gt; reshape(const Dimensions& new_dims)
+
+Returns a view of the input tensor that has been reshaped to the specified
+new dimensions. The argument new_dims is an array of Index values. The
+rank of the resulting tensor is equal to the number of elements in new_dims.
+
+The product of all the sizes in the new dimension array must be equal to
+the number of elements in the input tensor.
+
+ // Increase the rank of the input tensor by introducing a new dimension
+ // of size 1.
+ Tensor<float, 2> input(7, 11);
+ array<int, 3> three_dims{{7, 11, 1}};
+ Tensor<float, 3> result = input.reshape(three_dims);
+
+ // Decrease the rank of the input tensor by merging 2 dimensions;
+ array<int, 1> one_dim{{7 * 11}};
+ Tensor<float, 1> result = input.reshape(one_dim);
+
+This operation does not move any data in the input tensor, so the resulting
+contents of a reshaped Tensor depend on the data layout of the original Tensor.
+
+For example this is what happens when you ```reshape()``` a 2D ColMajor tensor
+to one dimension:
+
+ Eigen::Tensor<float, 2, Eigen::ColMajor> a(2, 3);
+ a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
+ Eigen::array<Eigen::DenseIndex, 1> one_dim({3 * 2});
+ Eigen::Tensor<float, 1, Eigen::ColMajor> b = a.reshape(one_dim);
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 0
+ 300
+ 100
+ 400
+ 200
+ 500
+
+This is what happens when the 2D Tensor is RowMajor:
+
+ Eigen::Tensor<float, 2, Eigen::RowMajor> a(2, 3);
+ a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
+ Eigen::array<Eigen::DenseIndex, 1> one_dim({3 * 2});
+ Eigen::Tensor<float, 1, Eigen::RowMajor> b = a.reshape(one_dim);
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 0
+ 100
+ 200
+ 300
+ 400
+ 500
+
+The reshape operation is an lvalue. In other words, it can be used on the left
+side of the assignment operator.
+
+The previous example can be rewritten as follows:
+
+ Eigen::Tensor<float, 2, Eigen::ColMajor> a(2, 3);
+ a.setValues({{0.0f, 100.0f, 200.0f}, {300.0f, 400.0f, 500.0f}});
+ Eigen::array<Eigen::DenseIndex, 2> two_dim({2, 3});
+    Eigen::Tensor<float, 1, Eigen::ColMajor> b(6);
+ b.reshape(two_dim) = a;
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 0
+ 300
+ 100
+ 400
+ 200
+ 500
+
+Note that "b" itself was not reshaped but that instead the assignment is done to
+the reshape view of b.
+
+
+### &lt;Operation&gt; shuffle(const Shuffle& shuffle)
+
+Returns a copy of the input tensor whose dimensions have been
+reordered according to the specified permutation. The argument shuffle
+is an array of Index values. Its size is the rank of the input
+tensor. It must contain a permutation of 0, 1, ..., rank - 1. The i-th
+dimension of the output tensor equals the size of the shuffle[i]-th
+dimension of the input tensor. For example:
+
+ // Shuffle all dimensions to the left by 1.
+ Tensor<float, 3> input(20, 30, 50);
+ // ... set some values in input.
+    Tensor<float, 3> output = input.shuffle({1, 2, 0});
+
+ eigen_assert(output.dimension(0) == 30);
+ eigen_assert(output.dimension(1) == 50);
+ eigen_assert(output.dimension(2) == 20);
+
+Indices into the output tensor are shuffled accordingly to formulate
+indices into the input tensor. For example, one can assert in the above
+code snippet that:
+
+ eigen_assert(output(3, 7, 11) == input(11, 3, 7));
+
+In general, one can assert that
+
+ eigen_assert(output(..., indices[shuffle[i]], ...) ==
+ input(..., indices[i], ...))
+
+The shuffle operation results in an lvalue, which means that it can be assigned
+to. In other words, it can be used on the left side of the assignment operator.
+
+Let's rewrite the previous example to take advantage of this feature:
+
+ // Shuffle all dimensions to the left by 1.
+ Tensor<float, 3> input(20, 30, 50);
+ // ... set some values in input.
+ Tensor<float, 3> output(30, 50, 20);
+ output.shuffle({2, 0, 1}) = input;
+
+
+### &lt;Operation&gt; stride(const Strides& strides)
+
+Returns a view of the input tensor that strides (skips stride-1
+elements) along each of the dimensions. The argument strides is an
+array of Index values. The dimensions of the resulting tensor are
+ceil(input_dimensions[i] / strides[i]).
+
+For example this is what happens when you ```stride()``` a 2D tensor:
+
+ Eigen::Tensor<int, 2> a(4, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500}, {600, 700, 800}, {900, 1000, 1100}});
+ Eigen::array<Eigen::DenseIndex, 2> strides({3, 2});
+ Eigen::Tensor<int, 2> b = a.stride(strides);
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 0 200
+ 900 1100
+
+It is possible to assign a tensor to a stride:
+
+ Tensor<float, 3> input(20, 30, 50);
+ // ... set some values in input.
+ Tensor<float, 3> output(40, 90, 200);
+ output.stride({2, 3, 4}) = input;
+
+### &lt;Operation&gt; inflate(const Strides& strides)
+
+Returns a view of an "inflated" tensor of the input tensor by inserting zeros
+between the original elements in the input tensor. The argument strides is an
+array of Index values, indicating how much "inflation" there is. The dimensions
+ of the resulting tensor are (input_dimensions[i] - 1) * strides[i] + 1. In
+some sense it is the inverse of the ```stride()``` operation.
+
+For example this is what happens when you ```inflate()``` a 2D tensor:
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500}});
+    Eigen::array<Eigen::DenseIndex, 2> strides({2, 3});
+ Eigen::Tensor<int, 2> b = a.inflate(strides);
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 0 0 0 100 0 0 200
+ 0 0 0 0 0 0 0
+ 300 0 0 400 0 0 500
+
+The ```inflate()``` operation is an r-value only operation as it doesn't make
+sense to assign a value to an inflated tensor in positions where the values are
+hardwired to zero.
+
+### &lt;Operation&gt; slice(const StartIndices& offsets, const Sizes& extents)
+
+Returns a sub-tensor of the given tensor. For each dimension i, the slice is
+made of the coefficients stored between offsets[i] and offsets[i] + extents[i]
+in the input tensor.
+
+ Eigen::Tensor<int, 2> a(4, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500},
+ {600, 700, 800}, {900, 1000, 1100}});
+ Eigen::array<int, 2> offsets = {1, 0};
+ Eigen::array<int, 2> extents = {2, 2};
+    Eigen::Tensor<int, 2> slice = a.slice(offsets, extents);
+ cout << "a" << endl << a << endl;
+ =>
+ a
+ 0 100 200
+ 300 400 500
+ 600 700 800
+ 900 1000 1100
+ cout << "slice" << endl << slice << endl;
+ =>
+ slice
+ 300 400
+ 600 700
+
+
+### &lt;Operation&gt; chip(const Index offset, const Index dim)
+
+A chip is a special kind of slice. It is the subtensor at the given offset in
+the dimension dim. The returned tensor has one fewer dimension than the input
+tensor: the dimension dim is removed.
+
+For example, a matrix chip would be either a row or a column of the input
+matrix.
+
+ Eigen::Tensor<int, 2> a(4, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500},
+ {600, 700, 800}, {900, 1000, 1100}});
+ Eigen::Tensor<int, 1> row_3 = a.chip(2, 0);
+ Eigen::Tensor<int, 1> col_2 = a.chip(1, 1);
+ cout << "a" << endl << a << endl;
+ =>
+ a
+ 0 100 200
+ 300 400 500
+ 600 700 800
+ 900 1000 1100
+ cout << "row_3" << endl << row_3 << endl;
+ =>
+ row_3
+ 600 700 800
+ cout << "col_2" << endl << col_2 << endl;
+ =>
+ col_2
+ 100 400 700 1000
+
+It is possible to assign values to a tensor chip since the chip operation is an
+lvalue. For example:
+
+ Eigen::Tensor<int, 1> a(3);
+    a.setValues({100, 200, 300});
+ Eigen::Tensor<int, 2> b(2, 3);
+ b.setZero();
+ b.chip(0, 0) = a;
+ cout << "a" << endl << a << endl;
+ =>
+ a
+ 100
+ 200
+ 300
+ cout << "b" << endl << b << endl;
+ =>
+ b
+ 100 200 300
+ 0 0 0
+
+
+### &lt;Operation&gt; reverse(const ReverseDimensions& reverse)
+
+Returns a view of the input tensor that reverses the order of the coefficients
+along a subset of the dimensions. The argument reverse is an array of boolean
+values that indicate whether or not the order of the coefficients should be
+reversed along each of the dimensions. This operation preserves the dimensions
+of the input tensor.
+
+For example this is what happens when you ```reverse()``` the first dimension
+of a 2D tensor:
+
+ Eigen::Tensor<int, 2> a(4, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500},
+ {600, 700, 800}, {900, 1000, 1100}});
+ Eigen::array<bool, 2> reverse({true, false});
+ Eigen::Tensor<int, 2> b = a.reverse(reverse);
+ cout << "a" << endl << a << endl << "b" << endl << b << endl;
+ =>
+ a
+ 0 100 200
+ 300 400 500
+ 600 700 800
+ 900 1000 1100
+ b
+ 900 1000 1100
+ 600 700 800
+ 300 400 500
+ 0 100 200
+
+
+### &lt;Operation&gt; broadcast(const Broadcast& broadcast)
+
+Returns a view of the input tensor in which the input is replicated one to many
+times.
+The broadcast argument specifies how many copies of the input tensor need to be
+made in each of the dimensions.
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500}});
+ Eigen::array<int, 2> bcast({3, 2});
+ Eigen::Tensor<int, 2> b = a.broadcast(bcast);
+ cout << "a" << endl << a << endl << "b" << endl << b << endl;
+ =>
+ a
+ 0 100 200
+ 300 400 500
+ b
+ 0 100 200 0 100 200
+ 300 400 500 300 400 500
+ 0 100 200 0 100 200
+ 300 400 500 300 400 500
+ 0 100 200 0 100 200
+ 300 400 500 300 400 500
+
+### &lt;Operation&gt; concatenate(const OtherDerived& other, Axis axis)
+
+TODO
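+
+In the meantime, here is a rough sketch of the expected usage, assuming the
+usual concatenation semantics: both tensors must have the same rank and
+matching dimensions on every axis except the concatenation axis.
+
+    Eigen::Tensor<int, 2> a(2, 3);
+    Eigen::Tensor<int, 2> b(2, 3);
+    a.setValues({{0, 100, 200}, {300, 400, 500}});
+    b.setValues({{1000, 1100, 1200}, {1300, 1400, 1500}});
+    // Concatenate along dimension 0: c has dimensions (4, 3).
+    Eigen::Tensor<int, 2> c = a.concatenate(b, 0);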
+
+### &lt;Operation&gt; pad(const PaddingDimensions& padding)
+
+Returns a view of the input tensor in which the input is padded with zeros.
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{0, 100, 200}, {300, 400, 500}});
+ Eigen::array<std::pair<int, int>, 2> paddings;
+    paddings[0] = make_pair(2, 3);
+    paddings[1] = make_pair(0, 1);
+ Eigen::Tensor<int, 2> b = a.pad(paddings);
+ cout << "a" << endl << a << endl << "b" << endl << b << endl;
+ =>
+ a
+ 0 100 200
+ 300 400 500
+ b
+ 0 0 0 0
+ 0 0 0 0
+ 0 100 200 0
+ 300 400 500 0
+ 0 0 0 0
+ 0 0 0 0
+ 0 0 0 0
+
+
+### &lt;Operation&gt; extract_patches(const PatchDims& patch_dims)
+
+Returns a tensor of coefficient patches extracted from the input tensor, where
+each patch is of dimension specified by 'patch_dims'. The returned tensor has
+one greater dimension than the input tensor, which is used to index each patch.
+The patch index in the output tensor depends on the data layout of the input
+tensor: the patch index is the last dimension in ColMajor layout, and the first
+dimension in RowMajor layout.
+
+For example, given the following input tensor:
+
+ Eigen::Tensor<float, 2, DataLayout> tensor(3,4);
+ tensor.setValues({{0.0f, 1.0f, 2.0f, 3.0f},
+ {4.0f, 5.0f, 6.0f, 7.0f},
+ {8.0f, 9.0f, 10.0f, 11.0f}});
+
+ cout << "tensor: " << endl << tensor << endl;
+ =>
+ tensor:
+ 0 1 2 3
+ 4 5 6 7
+ 8 9 10 11
+
+Six 2x2 patches can be extracted and indexed using the following code:
+
+ Eigen::Tensor<float, 3, DataLayout> patch;
+ Eigen::array<ptrdiff_t, 2> patch_dims;
+ patch_dims[0] = 2;
+ patch_dims[1] = 2;
+ patch = tensor.extract_patches(patch_dims);
+ for (int k = 0; k < 6; ++k) {
+ cout << "patch index: " << k << endl;
+ for (int i = 0; i < 2; ++i) {
+ for (int j = 0; j < 2; ++j) {
+ if (DataLayout == ColMajor) {
+ cout << patch(i, j, k) << " ";
+ } else {
+ cout << patch(k, i, j) << " ";
+ }
+ }
+ cout << endl;
+ }
+ }
+
+This code results in the following output when the data layout is ColMajor:
+
+ patch index: 0
+ 0 1
+ 4 5
+ patch index: 1
+ 4 5
+ 8 9
+ patch index: 2
+ 1 2
+ 5 6
+ patch index: 3
+ 5 6
+ 9 10
+ patch index: 4
+ 2 3
+ 6 7
+ patch index: 5
+ 6 7
+ 10 11
+
+This code results in the following output when the data layout is RowMajor:
+(NOTE: the set of patches is the same as in ColMajor, but they are indexed differently).
+
+ patch index: 0
+ 0 1
+ 4 5
+ patch index: 1
+ 1 2
+ 5 6
+ patch index: 2
+ 2 3
+ 6 7
+ patch index: 3
+ 4 5
+ 8 9
+ patch index: 4
+ 5 6
+ 9 10
+ patch index: 5
+ 6 7
+ 10 11
+
+### &lt;Operation&gt; extract_image_patches(const Index patch_rows, const Index patch_cols,
+ const Index row_stride, const Index col_stride,
+ const Index in_row_stride, const Index in_col_stride,
+ const Index row_inflate_stride, const Index col_inflate_stride,
+ const PaddingType padding_type, const Scalar padding_value)
+
+Returns a tensor of coefficient image patches extracted from the input tensor,
+which is expected to have dimensions ordered as follows (depending on the data
+layout of the input tensor, and the number of additional dimensions 'N'):
+
+* ColMajor
+ * 1st dimension: channels (of size d)
+ * 2nd dimension: rows (of size r)
+ * 3rd dimension: columns (of size c)
+ * 4th-Nth dimension: time (for video) or batch (for bulk processing).
+
+* RowMajor (reverse order of ColMajor)
+ * 1st-Nth dimension: time (for video) or batch (for bulk processing).
+ * N+1'th dimension: columns (of size c)
+ * N+2'th dimension: rows (of size r)
+ * N+3'th dimension: channels (of size d)
+
+The returned tensor has one greater dimension than the input tensor, which is
+used to index each patch. The patch index in the output tensor depends on the
+data layout of the input tensor: the patch index is the 4'th dimension in
+ColMajor layout, and the 4'th from the last dimension in RowMajor layout.
+
+For example, given the following input tensor with the following dimension
+sizes:
+
+* depth: 2
+* rows: 3
+* columns: 5
+* batch: 7
+
+ Tensor<float, 4> tensor(2,3,5,7);
+ Tensor<float, 4, RowMajor> tensor_row_major = tensor.swap_layout();
+
+2x2 image patches can be extracted and indexed using the following code:
+
+* 2D patch: ColMajor (patch indexed by second-to-last dimension)
+
+ Tensor<float, 5> twod_patch;
+ twod_patch = tensor.extract_image_patches<2, 2>();
+ // twod_patch.dimension(0) == 2
+ // twod_patch.dimension(1) == 2
+ // twod_patch.dimension(2) == 2
+ // twod_patch.dimension(3) == 3*5
+ // twod_patch.dimension(4) == 7
+
+* 2D patch: RowMajor (patch indexed by the second dimension)
+
+ Tensor<float, 5, RowMajor> twod_patch_row_major;
+ twod_patch_row_major = tensor_row_major.extract_image_patches<2, 2>();
+ // twod_patch_row_major.dimension(0) == 7
+ // twod_patch_row_major.dimension(1) == 3*5
+ // twod_patch_row_major.dimension(2) == 2
+ // twod_patch_row_major.dimension(3) == 2
+ // twod_patch_row_major.dimension(4) == 2
+
+Input parameters (a usage sketch follows the list):
+
+* patch_rows, patch_cols: Spatial extent of the extracted patches.
+* row_stride, col_stride: Image displacement (in pixels) between the
+ upper-left coordinates of consecutive patches.
+* in_row_stride, in_col_stride: Image displacement (in pixels) between
+ two consecutive patch samples. If larger than 1 (default), they allow
+ for sparsely sampling the input image.
+* row_inflate_stride, col_inflate_stride: If larger than 1 (default), "inflates"
+ the inputs by inserting zeros between the original elements. This is useful
+ for backward convolution.
+* padding_type: Boundary conditions. Either PADDING_SAME (default)
+ or PADDING_VALID.
+* padding_value: the value used in padding, defaults to 0.
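+
+As a rough sketch (assuming the argument order of the signature above and the
+batch-of-images layout from the previous example), extracting 3x3 patches with
+a stride of 2 and VALID padding could look like:
+
+    Tensor<float, 4> tensor(2, 3, 5, 7);
+    // ... set some values in tensor.
+    Tensor<float, 5> patches = tensor.extract_image_patches(
+        3, 3,    // patch_rows, patch_cols
+        2, 2,    // row_stride, col_stride
+        1, 1,    // in_row_stride, in_col_stride
+        1, 1,    // row_inflate_stride, col_inflate_stride
+        PADDING_VALID, 0.0f);
+    // patches has one more dimension than tensor; in ColMajor layout the
+    // patch index is the 4th dimension.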
+
+## Special Operations
+
+### &lt;Operation&gt; cast&lt;T&gt;()
+
+Returns a tensor of type T with the same dimensions as the original tensor.
+The returned tensor contains the values of the original tensor converted to
+type T.
+
+ Eigen::Tensor<float, 2> a(2, 3);
+ Eigen::Tensor<int, 2> b = a.cast<int>();
+
+This can be useful for example if you need to do element-wise division of
+Tensors of integers. This is not currently supported by the Tensor library
+but you can easily cast the tensors to floats to do the division:
+
+ Eigen::Tensor<int, 2> a(2, 3);
+ a.setValues({{0, 1, 2}, {3, 4, 5}});
+ Eigen::Tensor<int, 2> b =
+ (a.cast<float>() / a.constant(2).cast<float>()).cast<int>();
+ cout << "a" << endl << a << endl << endl;
+ cout << "b" << endl << b << endl << endl;
+ =>
+ a
+ 0 1 2
+ 3 4 5
+
+ b
+ 0 0 1
+ 1 2 2
+
+
+### &lt;Operation&gt; eval()
+
+TODO
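+
+In the meantime, a rough sketch of the typical use (assuming the usual
+forced-evaluation semantics): ```eval()``` materializes an intermediate
+expression into a temporary instead of leaving it to be evaluated lazily as
+part of a larger expression.
+
+    Eigen::Tensor<float, 2> a(2, 3);
+    Eigen::Tensor<float, 2> b(2, 3);
+    a.setRandom();
+    b.setRandom();
+    // Force "a + b" into a temporary before it is used in the product.
+    Eigen::Tensor<float, 2> c = (a + b).eval() * a;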
+
+
+## Representation of scalar values
+
+Scalar values are often represented by tensors of size 1 and rank 1. It would be
+more logical and user friendly to use tensors of rank 0 instead. For example
+Tensor&lt;T, N&gt;::maximum() currently returns a Tensor&lt;T, 1&gt;. Similarly, the inner
+product of two 1d tensors (through contractions) returns a 1d tensor. In the
+future these operations might be updated to return 0d tensors instead.
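+
+For example, if the current rank-1 convention applies, reading a full reduction
+back as a plain scalar looks like this (a sketch only):
+
+    Eigen::Tensor<float, 2> a(2, 3);
+    a.setValues({{1.0f, 2.0f, 3.0f}, {4.0f, 5.0f, 6.0f}});
+    Eigen::Tensor<float, 1> m = a.maximum();  // rank-1 tensor holding one coefficient
+    float max_value = m(0);                   // == 6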
+
+## GPU Support
+
+NVidia GPU support can be enabled using:
+
+ #define EIGEN_USE_GPU
+
+To speed up operations on the GPU, it is also recommended to use 32 bit indices. This
+prevents Eigen from using 64 bit loop indices, which have to be emulated in
+software and make any operation extremely slow.
+
+This can be achieved globally by using the EIGEN_DEFAULT_DENSE_INDEX_TYPE define
+as follows:
+
+ #define EIGEN_DEFAULT_DENSE_INDEX_TYPE int
+
+This can also be done individually for each tensor by using the Index32Bit
+option as follows:
+
+ Eigen::Tensor<DataType, Rank, Eigen::Index32Bit> t;
+ Eigen::TensorMap<Eigen::Tensor<DataType, Rank, Eigen::Index32Bit> > t_map;
+
+
+## Limitations
+
+* The number of tensor dimensions is currently limited to 250 when using a
+ compiler that supports cxx11. It is limited to only 5 for older compilers.
+* The IndexList class requires a cxx11 compliant compiler. You can use an
+ array of indices instead if you don't have access to a modern compiler.
+* TensorVarDims are only partially supported.
+* On GPUs only floating point values are properly tested and optimized for.
+* Complex and integer values are known to be broken on GPUs. If you try to use
+ them you'll most likely end up triggering a static assertion failure such as
+ EIGEN_STATIC_ASSERT(packetSize > 1, YOU_MADE_A_PROGRAMMING_MISTAKE)
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h
new file mode 100644
index 0000000000..13cb2157f2
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/DynamicSymmetry.h
@@ -0,0 +1,293 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H
+#define EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H
+
+namespace Eigen {
+
+class DynamicSGroup
+{
+ public:
+ inline explicit DynamicSGroup() : m_numIndices(1), m_elements(), m_generators(), m_globalFlags(0) { m_elements.push_back(ge(Generator(0, 0, 0))); }
+ inline DynamicSGroup(const DynamicSGroup& o) : m_numIndices(o.m_numIndices), m_elements(o.m_elements), m_generators(o.m_generators), m_globalFlags(o.m_globalFlags) { }
+ inline DynamicSGroup(DynamicSGroup&& o) : m_numIndices(o.m_numIndices), m_elements(), m_generators(o.m_generators), m_globalFlags(o.m_globalFlags) { std::swap(m_elements, o.m_elements); }
+ inline DynamicSGroup& operator=(const DynamicSGroup& o) { m_numIndices = o.m_numIndices; m_elements = o.m_elements; m_generators = o.m_generators; m_globalFlags = o.m_globalFlags; return *this; }
+ inline DynamicSGroup& operator=(DynamicSGroup&& o) { m_numIndices = o.m_numIndices; std::swap(m_elements, o.m_elements); m_generators = o.m_generators; m_globalFlags = o.m_globalFlags; return *this; }
+
+ void add(int one, int two, int flags = 0);
+
+ template<typename Gen_>
+ inline void add(Gen_) { add(Gen_::One, Gen_::Two, Gen_::Flags); }
+ inline void addSymmetry(int one, int two) { add(one, two, 0); }
+ inline void addAntiSymmetry(int one, int two) { add(one, two, NegationFlag); }
+ inline void addHermiticity(int one, int two) { add(one, two, ConjugationFlag); }
+ inline void addAntiHermiticity(int one, int two) { add(one, two, NegationFlag | ConjugationFlag); }
+
+ template<typename Op, typename RV, typename Index, std::size_t N, typename... Args>
+ inline RV apply(const std::array<Index, N>& idx, RV initial, Args&&... args) const
+ {
+ eigen_assert(N >= m_numIndices && "Can only apply symmetry group to objects that have at least the required number of indices.");
+ for (std::size_t i = 0; i < size(); i++)
+ initial = Op::run(h_permute(i, idx, typename internal::gen_numeric_list<int, N>::type()), m_elements[i].flags, initial, std::forward<Args>(args)...);
+ return initial;
+ }
+
+ template<typename Op, typename RV, typename Index, typename... Args>
+ inline RV apply(const std::vector<Index>& idx, RV initial, Args&&... args) const
+ {
+ eigen_assert(idx.size() >= m_numIndices && "Can only apply symmetry group to objects that have at least the required number of indices.");
+ for (std::size_t i = 0; i < size(); i++)
+ initial = Op::run(h_permute(i, idx), m_elements[i].flags, initial, std::forward<Args>(args)...);
+ return initial;
+ }
+
+ inline int globalFlags() const { return m_globalFlags; }
+ inline std::size_t size() const { return m_elements.size(); }
+
+ template<typename Tensor_, typename... IndexTypes>
+ inline internal::tensor_symmetry_value_setter<Tensor_, DynamicSGroup> operator()(Tensor_& tensor, typename Tensor_::Index firstIndex, IndexTypes... otherIndices) const
+ {
+ static_assert(sizeof...(otherIndices) + 1 == Tensor_::NumIndices, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ return operator()(tensor, std::array<typename Tensor_::Index, Tensor_::NumIndices>{{firstIndex, otherIndices...}});
+ }
+
+ template<typename Tensor_>
+ inline internal::tensor_symmetry_value_setter<Tensor_, DynamicSGroup> operator()(Tensor_& tensor, std::array<typename Tensor_::Index, Tensor_::NumIndices> const& indices) const
+ {
+ return internal::tensor_symmetry_value_setter<Tensor_, DynamicSGroup>(tensor, *this, indices);
+ }
+ private:
+ struct GroupElement {
+ std::vector<int> representation;
+ int flags;
+ bool isId() const
+ {
+ for (std::size_t i = 0; i < representation.size(); i++)
+ if (i != (size_t)representation[i])
+ return false;
+ return true;
+ }
+ };
+ struct Generator {
+ int one;
+ int two;
+ int flags;
+ constexpr inline Generator(int one_, int two_, int flags_) : one(one_), two(two_), flags(flags_) {}
+ };
+
+ std::size_t m_numIndices;
+ std::vector<GroupElement> m_elements;
+ std::vector<Generator> m_generators;
+ int m_globalFlags;
+
+ template<typename Index, std::size_t N, int... n>
+ inline std::array<Index, N> h_permute(std::size_t which, const std::array<Index, N>& idx, internal::numeric_list<int, n...>) const
+ {
+ return std::array<Index, N>{{ idx[n >= m_numIndices ? n : m_elements[which].representation[n]]... }};
+ }
+
+ template<typename Index>
+ inline std::vector<Index> h_permute(std::size_t which, std::vector<Index> idx) const
+ {
+ std::vector<Index> result;
+ result.reserve(idx.size());
+ for (auto k : m_elements[which].representation)
+ result.push_back(idx[k]);
+ for (std::size_t i = m_numIndices; i < idx.size(); i++)
+ result.push_back(idx[i]);
+ return result;
+ }
+
+ inline GroupElement ge(Generator const& g) const
+ {
+ GroupElement result;
+ result.representation.reserve(m_numIndices);
+ result.flags = g.flags;
+ for (std::size_t k = 0; k < m_numIndices; k++) {
+ if (k == (std::size_t)g.one)
+ result.representation.push_back(g.two);
+ else if (k == (std::size_t)g.two)
+ result.representation.push_back(g.one);
+ else
+ result.representation.push_back(int(k));
+ }
+ return result;
+ }
+
+ GroupElement mul(GroupElement, GroupElement) const;
+ inline GroupElement mul(Generator g1, GroupElement g2) const
+ {
+ return mul(ge(g1), g2);
+ }
+
+ inline GroupElement mul(GroupElement g1, Generator g2) const
+ {
+ return mul(g1, ge(g2));
+ }
+
+ inline GroupElement mul(Generator g1, Generator g2) const
+ {
+ return mul(ge(g1), ge(g2));
+ }
+
+ inline int findElement(GroupElement e) const
+ {
+ for (auto ee : m_elements) {
+ if (ee.representation == e.representation)
+ return ee.flags ^ e.flags;
+ }
+ return -1;
+ }
+
+ void updateGlobalFlags(int flagDiffOfSameGenerator);
+};
+
+// dynamic symmetry group that auto-adds the template parameters in the constructor
+template<typename... Gen>
+class DynamicSGroupFromTemplateArgs : public DynamicSGroup
+{
+ public:
+ inline DynamicSGroupFromTemplateArgs() : DynamicSGroup()
+ {
+ add_all(internal::type_list<Gen...>());
+ }
+ inline DynamicSGroupFromTemplateArgs(DynamicSGroupFromTemplateArgs const& other) : DynamicSGroup(other) { }
+ inline DynamicSGroupFromTemplateArgs(DynamicSGroupFromTemplateArgs&& other) : DynamicSGroup(other) { }
+ inline DynamicSGroupFromTemplateArgs<Gen...>& operator=(const DynamicSGroupFromTemplateArgs<Gen...>& o) { DynamicSGroup::operator=(o); return *this; }
+ inline DynamicSGroupFromTemplateArgs<Gen...>& operator=(DynamicSGroupFromTemplateArgs<Gen...>&& o) { DynamicSGroup::operator=(o); return *this; }
+
+ private:
+ template<typename Gen1, typename... GenNext>
+ inline void add_all(internal::type_list<Gen1, GenNext...>)
+ {
+ add(Gen1());
+ add_all(internal::type_list<GenNext...>());
+ }
+
+ inline void add_all(internal::type_list<>)
+ {
+ }
+};
+
+inline DynamicSGroup::GroupElement DynamicSGroup::mul(GroupElement g1, GroupElement g2) const
+{
+ eigen_internal_assert(g1.representation.size() == m_numIndices);
+ eigen_internal_assert(g2.representation.size() == m_numIndices);
+
+ GroupElement result;
+ result.representation.reserve(m_numIndices);
+ for (std::size_t i = 0; i < m_numIndices; i++) {
+ int v = g2.representation[g1.representation[i]];
+ eigen_assert(v >= 0);
+ result.representation.push_back(v);
+ }
+ result.flags = g1.flags ^ g2.flags;
+ return result;
+}
+
+inline void DynamicSGroup::add(int one, int two, int flags)
+{
+ eigen_assert(one >= 0);
+ eigen_assert(two >= 0);
+ eigen_assert(one != two);
+
+ if ((std::size_t)one >= m_numIndices || (std::size_t)two >= m_numIndices) {
+    std::size_t newNumIndices = ((one > two) ? one : two) + 1;
+ for (auto& gelem : m_elements) {
+ gelem.representation.reserve(newNumIndices);
+ for (std::size_t i = m_numIndices; i < newNumIndices; i++)
+ gelem.representation.push_back(i);
+ }
+ m_numIndices = newNumIndices;
+ }
+
+ Generator g{one, two, flags};
+ GroupElement e = ge(g);
+
+ /* special case for first generator */
+ if (m_elements.size() == 1) {
+ while (!e.isId()) {
+ m_elements.push_back(e);
+ e = mul(e, g);
+ }
+
+ if (e.flags > 0)
+ updateGlobalFlags(e.flags);
+
+ // only add in case we didn't have identity
+ if (m_elements.size() > 1)
+ m_generators.push_back(g);
+ return;
+ }
+
+ int p = findElement(e);
+ if (p >= 0) {
+ updateGlobalFlags(p);
+ return;
+ }
+
+ std::size_t coset_order = m_elements.size();
+ m_elements.push_back(e);
+ for (std::size_t i = 1; i < coset_order; i++)
+ m_elements.push_back(mul(m_elements[i], e));
+ m_generators.push_back(g);
+
+ std::size_t coset_rep = coset_order;
+ do {
+ for (auto g : m_generators) {
+ e = mul(m_elements[coset_rep], g);
+ p = findElement(e);
+ if (p < 0) {
+ // element not yet in group
+ m_elements.push_back(e);
+ for (std::size_t i = 1; i < coset_order; i++)
+ m_elements.push_back(mul(m_elements[i], e));
+ } else if (p > 0) {
+ updateGlobalFlags(p);
+ }
+ }
+ coset_rep += coset_order;
+ } while (coset_rep < m_elements.size());
+}
+
+inline void DynamicSGroup::updateGlobalFlags(int flagDiffOfSameGenerator)
+{
+ switch (flagDiffOfSameGenerator) {
+ case 0:
+ default:
+ // nothing happened
+ break;
+ case NegationFlag:
+        // every element is its own negative => whole tensor is zero
+ m_globalFlags |= GlobalZeroFlag;
+ break;
+ case ConjugationFlag:
+        // every element is its own conjugate => whole tensor is real
+ m_globalFlags |= GlobalRealFlag;
+ break;
+ case (NegationFlag | ConjugationFlag):
+        // every element is its own negative conjugate => whole tensor is imaginary
+ m_globalFlags |= GlobalImagFlag;
+ break;
+ /* NOTE:
+ * since GlobalZeroFlag == GlobalRealFlag | GlobalImagFlag, if one generator
+ * causes the tensor to be real and the next one to be imaginary, this will
+ * trivially give the correct result
+ */
+ }
+}
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSORSYMMETRY_DYNAMICSYMMETRY_H
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h
new file mode 100644
index 0000000000..942293bd71
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/StaticSymmetry.h
@@ -0,0 +1,236 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H
+#define EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H
+
+namespace Eigen {
+
+namespace internal {
+
+template<typename list> struct tensor_static_symgroup_permutate;
+
+template<int... nn>
+struct tensor_static_symgroup_permutate<numeric_list<int, nn...>>
+{
+ constexpr static std::size_t N = sizeof...(nn);
+
+ template<typename T>
+ constexpr static inline std::array<T, N> run(const std::array<T, N>& indices)
+ {
+ return {{indices[nn]...}};
+ }
+};
+
+template<typename indices_, int flags_>
+struct tensor_static_symgroup_element
+{
+ typedef indices_ indices;
+ constexpr static int flags = flags_;
+};
+
+template<typename Gen, int N>
+struct tensor_static_symgroup_element_ctor
+{
+ typedef tensor_static_symgroup_element<
+ typename gen_numeric_list_swapped_pair<int, N, Gen::One, Gen::Two>::type,
+ Gen::Flags
+ > type;
+};
+
+template<int N>
+struct tensor_static_symgroup_identity_ctor
+{
+ typedef tensor_static_symgroup_element<
+ typename gen_numeric_list<int, N>::type,
+ 0
+ > type;
+};
+
+template<typename iib>
+struct tensor_static_symgroup_multiply_helper
+{
+ template<int... iia>
+ constexpr static inline numeric_list<int, get<iia, iib>::value...> helper(numeric_list<int, iia...>) {
+ return numeric_list<int, get<iia, iib>::value...>();
+ }
+};
+
+template<typename A, typename B>
+struct tensor_static_symgroup_multiply
+{
+ private:
+ typedef typename A::indices iia;
+ typedef typename B::indices iib;
+ constexpr static int ffa = A::flags;
+ constexpr static int ffb = B::flags;
+
+ public:
+ static_assert(iia::count == iib::count, "Cannot multiply symmetry elements with different number of indices.");
+
+ typedef tensor_static_symgroup_element<
+ decltype(tensor_static_symgroup_multiply_helper<iib>::helper(iia())),
+ ffa ^ ffb
+ > type;
+};
+
+template<typename A, typename B>
+struct tensor_static_symgroup_equality
+{
+ typedef typename A::indices iia;
+ typedef typename B::indices iib;
+ constexpr static int ffa = A::flags;
+ constexpr static int ffb = B::flags;
+ static_assert(iia::count == iib::count, "Cannot compare symmetry elements with different number of indices.");
+
+ constexpr static bool value = is_same<iia, iib>::value;
+
+ private:
+ /* this should be zero if they are identical, or else the tensor
+ * will be forced to be pure real, pure imaginary or even pure zero
+ */
+ constexpr static int flags_cmp_ = ffa ^ ffb;
+
+ /* either they are not equal, then we don't care whether the flags
+ * match, or they are equal, and then we have to check
+ */
+ constexpr static bool is_zero = value && flags_cmp_ == NegationFlag;
+ constexpr static bool is_real = value && flags_cmp_ == ConjugationFlag;
+ constexpr static bool is_imag = value && flags_cmp_ == (NegationFlag | ConjugationFlag);
+
+ public:
+ constexpr static int global_flags =
+ (is_real ? GlobalRealFlag : 0) |
+ (is_imag ? GlobalImagFlag : 0) |
+ (is_zero ? GlobalZeroFlag : 0);
+};
+
+template<std::size_t NumIndices, typename... Gen>
+struct tensor_static_symgroup
+{
+ typedef StaticSGroup<Gen...> type;
+ constexpr static std::size_t size = type::static_size;
+};
+
+template<typename Index, std::size_t N, int... ii, int... jj>
+constexpr static inline std::array<Index, N> tensor_static_symgroup_index_permute(std::array<Index, N> idx, internal::numeric_list<int, ii...>, internal::numeric_list<int, jj...>)
+{
+ return {{ idx[ii]..., idx[jj]... }};
+}
+
+template<typename Index, int... ii>
+static inline std::vector<Index> tensor_static_symgroup_index_permute(std::vector<Index> idx, internal::numeric_list<int, ii...>)
+{
+ std::vector<Index> result{{ idx[ii]... }};
+ std::size_t target_size = idx.size();
+ for (std::size_t i = result.size(); i < target_size; i++)
+ result.push_back(idx[i]);
+ return result;
+}
+
+template<typename T> struct tensor_static_symgroup_do_apply;
+
+template<typename first, typename... next>
+struct tensor_static_symgroup_do_apply<internal::type_list<first, next...>>
+{
+ template<typename Op, typename RV, std::size_t SGNumIndices, typename Index, std::size_t NumIndices, typename... Args>
+ static inline RV run(const std::array<Index, NumIndices>& idx, RV initial, Args&&... args)
+ {
+ static_assert(NumIndices >= SGNumIndices, "Can only apply symmetry group to objects that have at least the required amount of indices.");
+ typedef typename internal::gen_numeric_list<int, NumIndices - SGNumIndices, SGNumIndices>::type remaining_indices;
+ initial = Op::run(tensor_static_symgroup_index_permute(idx, typename first::indices(), remaining_indices()), first::flags, initial, std::forward<Args>(args)...);
+ return tensor_static_symgroup_do_apply<internal::type_list<next...>>::template run<Op, RV, SGNumIndices>(idx, initial, args...);
+ }
+
+ template<typename Op, typename RV, std::size_t SGNumIndices, typename Index, typename... Args>
+ static inline RV run(const std::vector<Index>& idx, RV initial, Args&&... args)
+ {
+ eigen_assert(idx.size() >= SGNumIndices && "Can only apply symmetry group to objects that have at least the required amount of indices.");
+ initial = Op::run(tensor_static_symgroup_index_permute(idx, typename first::indices()), first::flags, initial, std::forward<Args>(args)...);
+ return tensor_static_symgroup_do_apply<internal::type_list<next...>>::template run<Op, RV, SGNumIndices>(idx, initial, args...);
+ }
+};
+
+template<EIGEN_TPL_PP_SPEC_HACK_DEF(typename, empty)>
+struct tensor_static_symgroup_do_apply<internal::type_list<EIGEN_TPL_PP_SPEC_HACK_USE(empty)>>
+{
+ template<typename Op, typename RV, std::size_t SGNumIndices, typename Index, std::size_t NumIndices, typename... Args>
+ static inline RV run(const std::array<Index, NumIndices>&, RV initial, Args&&...)
+ {
+ // do nothing
+ return initial;
+ }
+
+ template<typename Op, typename RV, std::size_t SGNumIndices, typename Index, typename... Args>
+ static inline RV run(const std::vector<Index>&, RV initial, Args&&...)
+ {
+ // do nothing
+ return initial;
+ }
+};
+
+} // end namespace internal
+
+template<typename... Gen>
+class StaticSGroup
+{
+ constexpr static std::size_t NumIndices = internal::tensor_symmetry_num_indices<Gen...>::value;
+ typedef internal::group_theory::enumerate_group_elements<
+ internal::tensor_static_symgroup_multiply,
+ internal::tensor_static_symgroup_equality,
+ typename internal::tensor_static_symgroup_identity_ctor<NumIndices>::type,
+ internal::type_list<typename internal::tensor_static_symgroup_element_ctor<Gen, NumIndices>::type...>
+ > group_elements;
+ typedef typename group_elements::type ge;
+ public:
+ constexpr inline StaticSGroup() {}
+ constexpr inline StaticSGroup(const StaticSGroup<Gen...>&) {}
+ constexpr inline StaticSGroup(StaticSGroup<Gen...>&&) {}
+
+ template<typename Op, typename RV, typename Index, std::size_t N, typename... Args>
+ static inline RV apply(const std::array<Index, N>& idx, RV initial, Args&&... args)
+ {
+ return internal::tensor_static_symgroup_do_apply<ge>::template run<Op, RV, NumIndices>(idx, initial, args...);
+ }
+
+ template<typename Op, typename RV, typename Index, typename... Args>
+ static inline RV apply(const std::vector<Index>& idx, RV initial, Args&&... args)
+ {
+ eigen_assert(idx.size() == NumIndices);
+ return internal::tensor_static_symgroup_do_apply<ge>::template run<Op, RV, NumIndices>(idx, initial, args...);
+ }
+
+ constexpr static std::size_t static_size = ge::count;
+
+ constexpr static inline std::size_t size() {
+ return ge::count;
+ }
+ constexpr static inline int globalFlags() { return group_elements::global_flags; }
+
+ template<typename Tensor_, typename... IndexTypes>
+ inline internal::tensor_symmetry_value_setter<Tensor_, StaticSGroup<Gen...>> operator()(Tensor_& tensor, typename Tensor_::Index firstIndex, IndexTypes... otherIndices) const
+ {
+ static_assert(sizeof...(otherIndices) + 1 == Tensor_::NumIndices, "Number of indices used to access a tensor coefficient must be equal to the rank of the tensor.");
+ return operator()(tensor, std::array<typename Tensor_::Index, Tensor_::NumIndices>{{firstIndex, otherIndices...}});
+ }
+
+ template<typename Tensor_>
+ inline internal::tensor_symmetry_value_setter<Tensor_, StaticSGroup<Gen...>> operator()(Tensor_& tensor, std::array<typename Tensor_::Index, Tensor_::NumIndices> const& indices) const
+ {
+ return internal::tensor_symmetry_value_setter<Tensor_, StaticSGroup<Gen...>>(tensor, *this, indices);
+ }
+};
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSORSYMMETRY_STATICSYMMETRY_H
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h
new file mode 100644
index 0000000000..879d6cd77b
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/Symmetry.h
@@ -0,0 +1,338 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H
+#define EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H
+
+namespace Eigen {
+
+enum {
+ NegationFlag = 0x01,
+ ConjugationFlag = 0x02
+};
+
+enum {
+ GlobalRealFlag = 0x01,
+ GlobalImagFlag = 0x02,
+ GlobalZeroFlag = 0x03
+};
+
+namespace internal {
+
+template<std::size_t NumIndices, typename... Sym> struct tensor_symmetry_pre_analysis;
+template<std::size_t NumIndices, typename... Sym> struct tensor_static_symgroup;
+template<bool instantiate, std::size_t NumIndices, typename... Sym> struct tensor_static_symgroup_if;
+template<typename Tensor_> struct tensor_symmetry_calculate_flags;
+template<typename Tensor_> struct tensor_symmetry_assign_value;
+template<typename... Sym> struct tensor_symmetry_num_indices;
+
+} // end namespace internal
+
+template<int One_, int Two_>
+struct Symmetry
+{
+ static_assert(One_ != Two_, "Symmetries must cover distinct indices.");
+ constexpr static int One = One_;
+ constexpr static int Two = Two_;
+ constexpr static int Flags = 0;
+};
+
+template<int One_, int Two_>
+struct AntiSymmetry
+{
+ static_assert(One_ != Two_, "Symmetries must cover distinct indices.");
+ constexpr static int One = One_;
+ constexpr static int Two = Two_;
+ constexpr static int Flags = NegationFlag;
+};
+
+template<int One_, int Two_>
+struct Hermiticity
+{
+ static_assert(One_ != Two_, "Symmetries must cover distinct indices.");
+ constexpr static int One = One_;
+ constexpr static int Two = Two_;
+ constexpr static int Flags = ConjugationFlag;
+};
+
+template<int One_, int Two_>
+struct AntiHermiticity
+{
+ static_assert(One_ != Two_, "Symmetries must cover distinct indices.");
+ constexpr static int One = One_;
+ constexpr static int Two = Two_;
+ constexpr static int Flags = ConjugationFlag | NegationFlag;
+};
+
+/** \class DynamicSGroup
+ * \ingroup TensorSymmetry_Module
+ *
+ * \brief Dynamic symmetry group
+ *
+ * The %DynamicSGroup class represents a symmetry group that need not be known at
+ * compile time. It is useful if one wants to support arbitrary run-time definable
+ * symmetries for tensors, but it is also instantiated if a symmetry group is defined
+ * at compile time that would be either too large for the compiler to reasonably
+ * generate (using templates to calculate this at compile time is very inefficient)
+ * or that the compiler could generate the group but that it wouldn't make sense to
+ * unroll the loop for setting coefficients anymore.
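+ *
+ * A rough usage sketch (the tensor, its rank and the particular generators are
+ * only illustrative):
+ * \code
+ * DynamicSGroup sym;
+ * sym.addSymmetry(0, 1);
+ * sym.addAntiSymmetry(2, 3);
+ * Eigen::Tensor<double, 4> t(10, 10, 10, 10);
+ * sym(t, 0, 1, 2, 3) = 42.0;  // also writes the coefficients related by the group
+ * \endcode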
+ */
+class DynamicSGroup;
+
+/** \internal
+ *
+ * \class DynamicSGroupFromTemplateArgs
+ * \ingroup TensorSymmetry_Module
+ *
+ * \brief Dynamic symmetry group, initialized from template arguments
+ *
+ * This class is a child class of DynamicSGroup. It uses the template arguments
+ * specified to initialize itself.
+ */
+template<typename... Gen>
+class DynamicSGroupFromTemplateArgs;
+
+/** \class StaticSGroup
+ * \ingroup TensorSymmetry_Module
+ *
+ * \brief Static symmetry group
+ *
+ * This class represents a symmetry group that is known and resolved completely
+ * at compile time. Ideally, no run-time penalty is incurred compared to the
+ * manual unrolling of the symmetry.
+ *
+ * <b><i>CAUTION:</i></b>
+ *
+ * Do not use this class directly for large symmetry groups. The compiler
+ * may run into a limit, or segfault or in the very least will take a very,
+ * very, very long time to compile the code. Use the SGroup class instead
+ * if you want a static group. That class contains logic that will
+ * automatically select the DynamicSGroup class instead if the symmetry
+ * group becomes too large. (In that case, unrolling may not even be
+ * beneficial.)
+ */
+template<typename... Gen>
+class StaticSGroup;
+
+/** \class SGroup
+ * \ingroup TensorSymmetry_Module
+ *
+ * \brief Symmetry group, initialized from template arguments
+ *
+ * This class represents a symmetry group whose generators are already
+ * known at compile time. It may or may not be resolved at compile time,
+ * depending on the estimated size of the group.
+ *
+ * \sa StaticSGroup
+ * \sa DynamicSGroup
+ */
+template<typename... Gen>
+class SGroup : public internal::tensor_symmetry_pre_analysis<internal::tensor_symmetry_num_indices<Gen...>::value, Gen...>::root_type
+{
+ public:
+ constexpr static std::size_t NumIndices = internal::tensor_symmetry_num_indices<Gen...>::value;
+ typedef typename internal::tensor_symmetry_pre_analysis<NumIndices, Gen...>::root_type Base;
+
+ // make standard constructors + assignment operators public
+ inline SGroup() : Base() { }
+ inline SGroup(const SGroup<Gen...>& other) : Base(other) { }
+ inline SGroup(SGroup<Gen...>&& other) : Base(other) { }
+ inline SGroup<Gen...>& operator=(const SGroup<Gen...>& other) { Base::operator=(other); return *this; }
+ inline SGroup<Gen...>& operator=(SGroup<Gen...>&& other) { Base::operator=(other); return *this; }
+
+ // all else is defined in the base class
+};
+
+namespace internal {
+
+template<typename... Sym> struct tensor_symmetry_num_indices
+{
+ constexpr static std::size_t value = 1;
+};
+
+template<int One_, int Two_, typename... Sym> struct tensor_symmetry_num_indices<Symmetry<One_, Two_>, Sym...>
+{
+private:
+ constexpr static std::size_t One = static_cast<std::size_t>(One_);
+ constexpr static std::size_t Two = static_cast<std::size_t>(Two_);
+ constexpr static std::size_t Three = tensor_symmetry_num_indices<Sym...>::value;
+
+ // don't use std::max, since it's not constexpr until C++14...
+ constexpr static std::size_t maxOneTwoPlusOne = ((One > Two) ? One : Two) + 1;
+public:
+ constexpr static std::size_t value = (maxOneTwoPlusOne > Three) ? maxOneTwoPlusOne : Three;
+};
+
+template<int One_, int Two_, typename... Sym> struct tensor_symmetry_num_indices<AntiSymmetry<One_, Two_>, Sym...>
+ : public tensor_symmetry_num_indices<Symmetry<One_, Two_>, Sym...> {};
+template<int One_, int Two_, typename... Sym> struct tensor_symmetry_num_indices<Hermiticity<One_, Two_>, Sym...>
+ : public tensor_symmetry_num_indices<Symmetry<One_, Two_>, Sym...> {};
+template<int One_, int Two_, typename... Sym> struct tensor_symmetry_num_indices<AntiHermiticity<One_, Two_>, Sym...>
+ : public tensor_symmetry_num_indices<Symmetry<One_, Two_>, Sym...> {};
+
+/** \internal
+ *
+ * \class tensor_symmetry_pre_analysis
+ * \ingroup TensorSymmetry_Module
+ *
+ * \brief Pre-select whether to use a static or dynamic symmetry group
+ *
+ * When a symmetry group could in principle be determined at compile time,
+ * this template implements the logic whether to actually do that or whether
+ * to rather defer that to runtime.
+ *
+ * The logic is as follows:
+ * <dl>
+ * <dt><b>No generators (trivial symmetry):</b></dt>
+ * <dd>Use a trivial static group. Ideally, this has no performance impact
+ * compared to not using symmetry at all. In practice, this might not
+ * be the case.</dd>
+ * <dt><b>More than 4 generators:</b></dt>
+ * <dd>Calculate the group at run time, it is likely far too large for the
+ * compiler to be able to properly generate it in a realistic time.</dd>
+ * <dt><b>Up to and including 4 generators:</b></dt>
+ * <dd>Actually enumerate all group elements, but then check how many there
+ * are. If there are more than 16, it is unlikely that unrolling the
+ * loop (as is done in the static compile-time case) is sensible, so
+ * use a dynamic group instead. If there are at most 16 elements, actually
+ * use that static group. Note that the largest group with 4 generators
+ * still compiles with reasonable resources.</dd>
+ * </dl>
+ *
+ * Note: Example compile time performance with g++-4.6 on an Intel Core i5-3470
+ * with 16 GiB RAM (all generators non-redundant and the subgroups don't
+ * factorize):
+ *
+ * # Generators -O0 -ggdb -O2
+ * -------------------------------------------------------------------
+ * 1 0.5 s / 250 MiB 0.45s / 230 MiB
+ * 2 0.5 s / 260 MiB 0.5 s / 250 MiB
+ * 3 0.65s / 310 MiB 0.62s / 310 MiB
+ * 4 2.2 s / 860 MiB 1.7 s / 770 MiB
+ * 5 130 s / 13000 MiB 120 s / 11000 MiB
+ *
+ * It is clear that everything is still very efficient up to 4 generators, then
+ * the memory and CPU requirements become unreasonable. Thus we only instantiate
+ * the template group theory logic if the number of generators supplied is 4 or
+ * lower, otherwise this will be forced to be done during runtime, where the
+ * algorithm is reasonably fast.
+ */
+template<std::size_t NumIndices>
+struct tensor_symmetry_pre_analysis<NumIndices>
+{
+ typedef StaticSGroup<> root_type;
+};
+
+template<std::size_t NumIndices, typename Gen_, typename... Gens_>
+struct tensor_symmetry_pre_analysis<NumIndices, Gen_, Gens_...>
+{
+ constexpr static std::size_t max_static_generators = 4;
+ constexpr static std::size_t max_static_elements = 16;
+ typedef tensor_static_symgroup_if<(sizeof...(Gens_) + 1 <= max_static_generators), NumIndices, Gen_, Gens_...> helper;
+ constexpr static std::size_t possible_size = helper::size;
+
+ typedef typename conditional<
+ possible_size == 0 || possible_size >= max_static_elements,
+ DynamicSGroupFromTemplateArgs<Gen_, Gens_...>,
+ typename helper::type
+ >::type root_type;
+};
+
+template<bool instantiate, std::size_t NumIndices, typename... Gens>
+struct tensor_static_symgroup_if
+{
+ constexpr static std::size_t size = 0;
+ typedef void type;
+};
+
+template<std::size_t NumIndices, typename... Gens>
+struct tensor_static_symgroup_if<true, NumIndices, Gens...> : tensor_static_symgroup<NumIndices, Gens...> {};
+
+template<typename Tensor_>
+struct tensor_symmetry_assign_value
+{
+ typedef typename Tensor_::Index Index;
+ typedef typename Tensor_::Scalar Scalar;
+ constexpr static std::size_t NumIndices = Tensor_::NumIndices;
+
+ static inline int run(const std::array<Index, NumIndices>& transformed_indices, int transformation_flags, int dummy, Tensor_& tensor, const Scalar& value_)
+ {
+ Scalar value(value_);
+ if (transformation_flags & ConjugationFlag)
+ value = numext::conj(value);
+ if (transformation_flags & NegationFlag)
+ value = -value;
+ tensor.coeffRef(transformed_indices) = value;
+ return dummy;
+ }
+};
+
+template<typename Tensor_>
+struct tensor_symmetry_calculate_flags
+{
+ typedef typename Tensor_::Index Index;
+ constexpr static std::size_t NumIndices = Tensor_::NumIndices;
+
+ static inline int run(const std::array<Index, NumIndices>& transformed_indices, int transform_flags, int current_flags, const std::array<Index, NumIndices>& orig_indices)
+ {
+ if (transformed_indices == orig_indices) {
+ if (transform_flags & (ConjugationFlag | NegationFlag))
+ return current_flags | GlobalImagFlag; // anti-hermitian diagonal
+ else if (transform_flags & ConjugationFlag)
+ return current_flags | GlobalRealFlag; // hermitian diagonal
+ else if (transform_flags & NegationFlag)
+ return current_flags | GlobalZeroFlag; // anti-symmetric diagonal
+ }
+ return current_flags;
+ }
+};
+
+template<typename Tensor_, typename Symmetry_, int Flags = 0>
+class tensor_symmetry_value_setter
+{
+ public:
+ typedef typename Tensor_::Index Index;
+ typedef typename Tensor_::Scalar Scalar;
+ constexpr static std::size_t NumIndices = Tensor_::NumIndices;
+
+ inline tensor_symmetry_value_setter(Tensor_& tensor, Symmetry_ const& symmetry, std::array<Index, NumIndices> const& indices)
+ : m_tensor(tensor), m_symmetry(symmetry), m_indices(indices) { }
+
+ inline tensor_symmetry_value_setter<Tensor_, Symmetry_, Flags>& operator=(Scalar const& value)
+ {
+ doAssign(value);
+ return *this;
+ }
+ private:
+ Tensor_& m_tensor;
+ Symmetry_ m_symmetry;
+ std::array<Index, NumIndices> m_indices;
+
+ inline void doAssign(Scalar const& value)
+ {
+ #ifdef EIGEN_TENSOR_SYMMETRY_CHECK_VALUES
+ int value_flags = m_symmetry.template apply<internal::tensor_symmetry_calculate_flags<Tensor_>, int>(m_indices, m_symmetry.globalFlags(), m_indices);
+ if (value_flags & GlobalRealFlag)
+ eigen_assert(numext::imag(value) == 0);
+ if (value_flags & GlobalImagFlag)
+ eigen_assert(numext::real(value) == 0);
+ #endif
+ m_symmetry.template apply<internal::tensor_symmetry_assign_value<Tensor_>, int>(m_indices, 0, m_tensor, value);
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSORSYMMETRY_SYMMETRY_H
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
diff --git a/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h
new file mode 100644
index 0000000000..0fe0b7c46d
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/CXX11/src/TensorSymmetry/util/TemplateGroupTheory.h
@@ -0,0 +1,666 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2013 Christian Seiler <christian@iwakd.de>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H
+#define EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H
+
+namespace Eigen {
+
+namespace internal {
+
+namespace group_theory {
+
+/** \internal
+ * \file CXX11/Tensor/util/TemplateGroupTheory.h
+ * This file contains C++ templates that implement group theory algorithms.
+ *
+ * The algorithms allow for a compile-time analysis of finite groups.
+ *
+ * Currently only Dimino's algorithm is implemented, which returns a list
+ * of all elements in a group given a set of (possibly redundant) generators.
+ * (One could also do that with the so-called orbital algorithm, but that
+ * is much more expensive and usually has no advantages.)
+ */
+
+/**********************************************************************
+ * "Ok kid, here is where it gets complicated."
+ * - Amelia Pond in the "Doctor Who" episode
+ * "The Big Bang"
+ *
+ * Dimino's algorithm
+ * ==================
+ *
+ * The following is Dimino's algorithm in sequential form:
+ *
+ * Input: identity element, list of generators, equality check,
+ * multiplication operation
+ * Output: list of group elements
+ *
+ * 1. add identity element
+ * 2. remove identities from list of generators
+ * 3. add all powers of first generator that aren't the
+ * identity element
+ * 4. go through all remaining generators:
+ * a. if generator is already in the list of elements
+ * -> do nothing
+ * b. otherwise
+ * i. remember current # of elements
+ * (i.e. the size of the current subgroup)
+ * ii. add all current elements (which includes
+ * the identity) each multiplied from right
+ * with the current generator to the group
+ * iii. add all remaining cosets that are generated
+ * by products of the new generator with itself
+ * and all other generators seen so far
+ *
+ * In functional form, this is implemented as a long set of recursive
+ * templates that have a complicated relationship.
+ *
+ * The main interface for Dimino's algorithm is the template
+ * enumerate_group_elements. All lists are implemented as variadic
+ * type_list<typename...> and numeric_list<typename = int, int...>
+ * templates.
+ *
+ * 'Calling' templates is usually done via typedefs.
+ *
+ * This algorithm is an extended version of the basic version. The
+ * extension consists in the fact that each group element has a set
+ * of flags associated with it. Multiplication of two group elements
+ * with each other results in a group element whose flags are the
+ * XOR of the flags of the previous elements. Each time the algorithm
+ * notices that a group element it just calculated is already in the
+ * list of current elements, the flags of both will be compared and
+ * added to the so-called 'global flags' of the group.
+ *
+ * The rationale behind this extension is that this allows not only
+ * for the description of symmetries between tensor indices, but
+ * also allows for the description of hermiticity, antisymmetry and
+ * antihermiticity. Negation and conjugation each are specific bit
+ * in the flags value and if two different ways to reach a group
+ * element lead to two different flags, this poses a constraint on
+ * the allowed values of the resulting tensor. For example, if a
+ * group element is reached both with and without the conjugation
+ * flag, it is clear that the resulting tensor has to be real.
+ *
+ * Note that this flag mechanism is quite generic and may have other
+ * uses beyond tensor properties.
+ *
+ * IMPORTANT:
+ * This algorithm assumes the group to be finite. If you try to
+ * run it with a group that's infinite, the algorithm will only
+ * terminate once you hit a compiler limit (max template depth).
+ * Also note that trying to use this implementation to create a
+ * very large group will probably either make you hit the same
+ * limit, cause the compiler to segfault or at the very least
+ * take a *really* long time (hours, days, weeks - sic!) to
+ * compile. It is not recommended to plug in more than 4
+ * generators, unless they are independent of each other.
+ */
+
+/** \internal
+ *
+ * \class strip_identities
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Cleanse a list of group elements of the identity element
+ *
+ * This template is used to make a first pass through all initial
+ * generators of Dimino's algorithm and remove the identity
+ * elements.
+ *
+ * \sa enumerate_group_elements
+ */
+template<template<typename, typename> class Equality, typename id, typename L> struct strip_identities;
+
+template<
+ template<typename, typename> class Equality,
+ typename id,
+ typename t,
+ typename... ts
+>
+struct strip_identities<Equality, id, type_list<t, ts...>>
+{
+ typedef typename conditional<
+ Equality<id, t>::value,
+ typename strip_identities<Equality, id, type_list<ts...>>::type,
+ typename concat<type_list<t>, typename strip_identities<Equality, id, type_list<ts...>>::type>::type
+ >::type type;
+ constexpr static int global_flags = Equality<id, t>::global_flags | strip_identities<Equality, id, type_list<ts...>>::global_flags;
+};
+
+template<
+ template<typename, typename> class Equality,
+ typename id
+ EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, ts)
+>
+struct strip_identities<Equality, id, type_list<EIGEN_TPL_PP_SPEC_HACK_USE(ts)>>
+{
+ typedef type_list<> type;
+ constexpr static int global_flags = 0;
+};
+
+/** \internal
+ *
+ * \class dimino_first_step_elements_helper
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Recursive template that adds powers of the first generator to the list of group elements
+ *
+ * This template calls itself recursively to add powers of the first
+ * generator to the list of group elements. It stops if it reaches
+ * the identity element again.
+ *
+ * \sa enumerate_group_elements, dimino_first_step_elements
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename g,
+ typename current_element,
+ typename elements,
+ bool dont_add_current_element // = false
+>
+struct dimino_first_step_elements_helper :
+ public dimino_first_step_elements_helper<
+ Multiply,
+ Equality,
+ id,
+ g,
+ typename Multiply<current_element, g>::type,
+ typename concat<elements, type_list<current_element>>::type,
+ Equality<typename Multiply<current_element, g>::type, id>::value
+ > {};
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename g,
+ typename current_element,
+ typename elements
+>
+struct dimino_first_step_elements_helper<Multiply, Equality, id, g, current_element, elements, true>
+{
+ typedef elements type;
+ constexpr static int global_flags = Equality<current_element, id>::global_flags;
+};
+
+/** \internal
+ *
+ * \class dimino_first_step_elements
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Add all powers of the first generator to the list of group elements
+ *
+ * This template takes the first non-identity generator and generates the initial
+ * list of elements which consists of all powers of that generator. For a group
+ * with just one generator, it would be completely enumerated after this.
+ *
+ * \sa enumerate_group_elements
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename generators
+>
+struct dimino_first_step_elements
+{
+ typedef typename get<0, generators>::type first_generator;
+ typedef typename skip<1, generators>::type next_generators;
+ typedef type_list<first_generator> generators_done;
+
+ typedef dimino_first_step_elements_helper<
+ Multiply,
+ Equality,
+ id,
+ first_generator,
+ first_generator,
+ type_list<id>,
+ false
+ > helper;
+ typedef typename helper::type type;
+ constexpr static int global_flags = helper::global_flags;
+};
+
+/** \internal
+ *
+ * \class dimino_get_coset_elements
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Generate all elements of a specific coset
+ *
+ * This template generates all the elements of a specific coset by
+ * multiplying all elements in the given subgroup with the new
+ * coset representative. Note that the first element of the
+ * subgroup is always the identity element, so the first element of
+ * the result of this template is going to be the coset
+ * representative itself.
+ *
+ * Note that this template accepts an additional boolean parameter
+ * that specifies whether to actually generate the coset (true) or
+ * just return an empty list (false).
+ *
+ * \sa enumerate_group_elements, dimino_add_cosets_for_rep
+ */
+template<
+ template<typename, typename> class Multiply,
+ typename sub_group_elements,
+ typename new_coset_rep,
+ bool generate_coset // = true
+>
+struct dimino_get_coset_elements
+{
+ typedef typename apply_op_from_right<Multiply, new_coset_rep, sub_group_elements>::type type;
+};
+
+template<
+ template<typename, typename> class Multiply,
+ typename sub_group_elements,
+ typename new_coset_rep
+>
+struct dimino_get_coset_elements<Multiply, sub_group_elements, new_coset_rep, false>
+{
+ typedef type_list<> type;
+};
+
+/** \internal
+ *
+ * \class dimino_add_cosets_for_rep
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Recursive template for adding coset spaces
+ *
+ * This template multiplies the coset representative with a generator
+ * from the list of previous generators. If the new element is not in
+ * the group already, it adds the corresponding coset. Finally it
+ * proceeds to call itself with the next generator from the list.
+ *
+ * \sa enumerate_group_elements, dimino_add_all_coset_spaces
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename sub_group_elements,
+ typename elements,
+ typename generators,
+ typename rep_element,
+ int sub_group_size
+>
+struct dimino_add_cosets_for_rep;
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename sub_group_elements,
+ typename elements,
+ typename g,
+ typename... gs,
+ typename rep_element,
+ int sub_group_size
+>
+struct dimino_add_cosets_for_rep<Multiply, Equality, id, sub_group_elements, elements, type_list<g, gs...>, rep_element, sub_group_size>
+{
+ typedef typename Multiply<rep_element, g>::type new_coset_rep;
+ typedef contained_in_list_gf<Equality, new_coset_rep, elements> _cil;
+ constexpr static bool add_coset = !_cil::value;
+
+ typedef typename dimino_get_coset_elements<
+ Multiply,
+ sub_group_elements,
+ new_coset_rep,
+ add_coset
+ >::type coset_elements;
+
+ typedef dimino_add_cosets_for_rep<
+ Multiply,
+ Equality,
+ id,
+ sub_group_elements,
+ typename concat<elements, coset_elements>::type,
+ type_list<gs...>,
+ rep_element,
+ sub_group_size
+ > _helper;
+
+ typedef typename _helper::type type;
+ constexpr static int global_flags = _cil::global_flags | _helper::global_flags;
+
+ /* Note that we don't have to update global flags here, since
+ * we will only add these elements if they are not part of
+ * the group already. But that only happens if the coset rep
+ * is not already in the group, so the check for the coset rep
+ * will catch this.
+ */
+};
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename sub_group_elements,
+ typename elements
+ EIGEN_TPL_PP_SPEC_HACK_DEFC(typename, empty),
+ typename rep_element,
+ int sub_group_size
+>
+struct dimino_add_cosets_for_rep<Multiply, Equality, id, sub_group_elements, elements, type_list<EIGEN_TPL_PP_SPEC_HACK_USE(empty)>, rep_element, sub_group_size>
+{
+ typedef elements type;
+ constexpr static int global_flags = 0;
+};
+
+/** \internal
+ *
+ * \class dimino_add_all_coset_spaces
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Recursive template for adding all coset spaces for a new generator
+ *
+ * This template goes through the list of generators (with
+ * the help of the dimino_add_cosets_for_rep template) for as long as
+ * it still finds elements that are not yet part of the group, and adds
+ * the corresponding cosets.
+ *
+ * \sa enumerate_group_elements, dimino_add_cosets_for_rep
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename sub_group_elements,
+ typename elements,
+ typename generators,
+ int sub_group_size,
+ int rep_pos,
+ bool stop_condition // = false
+>
+struct dimino_add_all_coset_spaces
+{
+ typedef typename get<rep_pos, elements>::type rep_element;
+ typedef dimino_add_cosets_for_rep<
+ Multiply,
+ Equality,
+ id,
+ sub_group_elements,
+ elements,
+ generators,
+ rep_element,
+ sub_group_elements::count
+ > _ac4r;
+ typedef typename _ac4r::type new_elements;
+
+ constexpr static int new_rep_pos = rep_pos + sub_group_elements::count;
+ constexpr static bool new_stop_condition = new_rep_pos >= new_elements::count;
+
+ typedef dimino_add_all_coset_spaces<
+ Multiply,
+ Equality,
+ id,
+ sub_group_elements,
+ new_elements,
+ generators,
+ sub_group_size,
+ new_rep_pos,
+ new_stop_condition
+ > _helper;
+
+ typedef typename _helper::type type;
+ constexpr static int global_flags = _helper::global_flags | _ac4r::global_flags;
+};
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename sub_group_elements,
+ typename elements,
+ typename generators,
+ int sub_group_size,
+ int rep_pos
+>
+struct dimino_add_all_coset_spaces<Multiply, Equality, id, sub_group_elements, elements, generators, sub_group_size, rep_pos, true>
+{
+ typedef elements type;
+ constexpr static int global_flags = 0;
+};
+
+/** \internal
+ *
+ * \class dimino_add_generator
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Enlarge the group by adding a new generator.
+ *
+ * It accepts a boolean parameter that determines if the generator is redundant,
+ * i.e. was already seen in the group. In that case, it reduces to a no-op.
+ *
+ * \sa enumerate_group_elements, dimino_add_all_coset_spaces
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename elements,
+ typename generators_done,
+ typename current_generator,
+ bool redundant // = false
+>
+struct dimino_add_generator
+{
+ /* this template is only called if the generator is not redundant
+ * => all elements of the group multiplied with the new generator
+ * are going to be new elements of the most trivial coset space
+ */
+ typedef typename apply_op_from_right<Multiply, current_generator, elements>::type multiplied_elements;
+ typedef typename concat<elements, multiplied_elements>::type new_elements;
+
+ constexpr static int rep_pos = elements::count;
+
+ typedef dimino_add_all_coset_spaces<
+ Multiply,
+ Equality,
+ id,
+ elements, // elements of previous subgroup
+ new_elements,
+ typename concat<generators_done, type_list<current_generator>>::type,
+ elements::count, // size of previous subgroup
+ rep_pos,
+ false // don't stop (because rep_pos >= new_elements::count is always false at this point)
+ > _helper;
+ typedef typename _helper::type type;
+ constexpr static int global_flags = _helper::global_flags;
+};
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename elements,
+ typename generators_done,
+ typename current_generator
+>
+struct dimino_add_generator<Multiply, Equality, id, elements, generators_done, current_generator, true>
+{
+ // redundant case
+ typedef elements type;
+ constexpr static int global_flags = 0;
+};
+
+/** \internal
+ *
+ * \class dimino_add_remaining_generators
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Recursive template that adds all remaining generators to a group
+ *
+ * Loop through the list of generators that remain and successively
+ * add them to the group.
+ *
+ * \sa enumerate_group_elements, dimino_add_generator
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename generators_done,
+ typename remaining_generators,
+ typename elements
+>
+struct dimino_add_remaining_generators
+{
+ typedef typename get<0, remaining_generators>::type first_generator;
+ typedef typename skip<1, remaining_generators>::type next_generators;
+
+ typedef contained_in_list_gf<Equality, first_generator, elements> _cil;
+
+ typedef dimino_add_generator<
+ Multiply,
+ Equality,
+ id,
+ elements,
+ generators_done,
+ first_generator,
+ _cil::value
+ > _helper;
+
+ typedef typename _helper::type new_elements;
+
+ typedef dimino_add_remaining_generators<
+ Multiply,
+ Equality,
+ id,
+ typename concat<generators_done, type_list<first_generator>>::type,
+ next_generators,
+ new_elements
+ > _next_iter;
+
+ typedef typename _next_iter::type type;
+ constexpr static int global_flags =
+ _cil::global_flags |
+ _helper::global_flags |
+ _next_iter::global_flags;
+};
+
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename generators_done,
+ typename elements
+>
+struct dimino_add_remaining_generators<Multiply, Equality, id, generators_done, type_list<>, elements>
+{
+ typedef elements type;
+ constexpr static int global_flags = 0;
+};
+
+/** \internal
+ *
+ * \class enumerate_group_elements_noid
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Helper template that implements group element enumeration
+ *
+ * This is a helper template that implements the actual enumeration
+ * of group elements. This has been split so that the list of
+ * generators can be cleansed of the identity element before
+ * performing the actual operation.
+ *
+ * \sa enumerate_group_elements
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename generators,
+ int initial_global_flags = 0
+>
+struct enumerate_group_elements_noid
+{
+ typedef dimino_first_step_elements<Multiply, Equality, id, generators> first_step;
+ typedef typename first_step::type first_step_elements;
+
+ typedef dimino_add_remaining_generators<
+ Multiply,
+ Equality,
+ id,
+ typename first_step::generators_done,
+ typename first_step::next_generators, // remaining_generators
+ typename first_step::type // first_step elements
+ > _helper;
+
+ typedef typename _helper::type type;
+ constexpr static int global_flags =
+ initial_global_flags |
+ first_step::global_flags |
+ _helper::global_flags;
+};
+
+// in case when no generators are specified
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ int initial_global_flags
+>
+struct enumerate_group_elements_noid<Multiply, Equality, id, type_list<>, initial_global_flags>
+{
+ typedef type_list<id> type;
+ constexpr static int global_flags = initial_global_flags;
+};
+
+/** \internal
+ *
+ * \class enumerate_group_elements
+ * \ingroup CXX11_TensorSymmetry_Module
+ *
+ * \brief Enumerate all elements in a finite group
+ *
+ * This template enumerates all elements in a finite group. It accepts
+ * the following template parameters:
+ *
+ * \tparam Multiply The multiplication operation that multiplies two group elements
+ * with each other.
+ * \tparam Equality The equality check operation that checks if two group elements
+ * are equal to each other.
+ * \tparam id The identity element
+ * \tparam _generators A list of (possibly redundant) generators of the group
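+ *
+ * A minimal compile-time usage sketch, for illustration only: the element
+ * type and the Multiply/Equality functors below are hypothetical and merely
+ * assume the ::type, ::value and ::global_flags members used by the helpers
+ * in this file. It enumerates the cyclic group Z/3 from the single
+ * generator "+1 mod 3":
+ * \code
+ * template<int N> struct zmod3 { constexpr static int value = N; };
+ * template<typename A, typename B> struct zmod3_add {
+ *   typedef zmod3<(A::value + B::value) % 3> type;
+ *   constexpr static int global_flags = 0;
+ * };
+ * template<typename A, typename B> struct zmod3_eq {
+ *   constexpr static bool value = (A::value == B::value);
+ *   constexpr static int global_flags = 0;
+ * };
+ * // expected result: type_list<zmod3<0>, zmod3<1>, zmod3<2>> (up to ordering)
+ * typedef enumerate_group_elements<
+ *   zmod3_add, zmod3_eq, zmod3<0>, type_list<zmod3<1>>
+ * >::type z3_elements;
+ * \endcode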
+ */
+template<
+ template<typename, typename> class Multiply,
+ template<typename, typename> class Equality,
+ typename id,
+ typename _generators
+>
+struct enumerate_group_elements
+ : public enumerate_group_elements_noid<
+ Multiply,
+ Equality,
+ id,
+ typename strip_identities<Equality, id, _generators>::type,
+ strip_identities<Equality, id, _generators>::global_flags
+ >
+{
+};
+
+} // end namespace group_theory
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+#endif // EIGEN_CXX11_TENSORSYMMETRY_TEMPLATEGROUPTHEORY_H
+
+/*
+ * kate: space-indent on; indent-width 2; mixedindent off; indent-mode cstyle;
+ */
diff --git a/third_party/eigen3/unsupported/Eigen/FFT b/third_party/eigen3/unsupported/Eigen/FFT
new file mode 100644
index 0000000000..2c45b3999e
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/FFT
@@ -0,0 +1,418 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Mark Borgerding mark a borgerding net
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_FFT_H
+#define EIGEN_FFT_H
+
+#include <complex>
+#include <vector>
+#include <map>
+#include <Eigen/Core>
+
+
+/**
+ * \defgroup FFT_Module Fast Fourier Transform module
+ *
+ * \code
+ * #include <unsupported/Eigen/FFT>
+ * \endcode
+ *
+ * This module provides Fast Fourier transformation, with a configurable backend
+ * implementation.
+ *
+ * The default implementation is based on kissfft. It is a small, free, and
+ * reasonably efficient default.
+ *
+ * There are currently two alternative backend implementations:
+ *
+ * - fftw (http://www.fftw.org) : faster, GPL -- incompatible with Eigen in LGPL form, bigger code size.
+ * - MKL (http://en.wikipedia.org/wiki/Math_Kernel_Library) : fastest, commercial -- may be incompatible with Eigen in GPL form.
+ *
+ * \section FFTDesign Design
+ *
+ * The following design decisions were made concerning scaling and
+ * half-spectrum for real FFT.
+ *
+ * The intent is to facilitate generic programming and ease migrating code
+ * from Matlab/octave.
+ * We think the default behavior of Eigen/FFT should favor correctness and
+ * generality over speed. Of course, the caller should be able to "opt-out" from this
+ * behavior and get the speed increase if they want it.
+ *
+ * 1) %Scaling:
+ * Other libraries (FFTW, IMKL, KISSFFT) do not perform scaling, so there
+ * is a constant gain incurred after the forward & inverse transforms, i.e.
+ * IFFT(FFT(x)) = Kx; this is done to avoid a vector-by-scalar multiply.
+ * The downside is that algorithms that worked correctly in Matlab/octave
+ * don't behave the same way once implemented in C++.
+ *
+ * How Eigen/FFT differs: invertible scaling is performed so IFFT( FFT(x) ) = x.
+ *
+ * 2) Real FFT half-spectrum
+ * Other libraries use only half the frequency spectrum (plus one extra
+ * sample for the Nyquist bin) for a real FFT, since the other half is the
+ * conjugate-symmetric mirror of the first half. This saves them a copy and
+ * some memory. The downside is that the caller needs special logic for the
+ * number of bins in the complex vs. the real case.
+ *
+ * How Eigen/FFT differs: The full spectrum is returned from the forward
+ * transform. This facilitates generic template programming by obviating
+ * separate specializations for real vs complex. On the inverse
+ * transform, only half the spectrum is actually used if the output type is real.
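+ *
+ * A minimal usage sketch of the default (scaled, full-spectrum) behavior;
+ * the variable names are illustrative only:
+ * \code
+ * Eigen::FFT<float> fft;
+ * std::vector<float> timevec(16, 1.f);
+ * std::vector<std::complex<float> > freqvec;
+ * fft.fwd(freqvec, timevec);   // real-to-complex forward transform
+ * std::vector<float> timevec2;
+ * fft.inv(timevec2, freqvec);  // recovers timevec thanks to the default scaling
+ * // fft.SetFlag(Eigen::FFT<float>::Unscaled); // opt out of the scaling described above
+ * \endcode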
+ */
+
+
+#ifdef EIGEN_FFTW_DEFAULT
+// FFTW: faster, GPL -- incompatible with Eigen in LGPL form, bigger code size
+# include <fftw3.h>
+# include "src/FFT/ei_fftw_impl.h"
+ namespace Eigen {
+ // Note: a template typedef such as "template <typename T> typedef internal::fftw_impl<T> default_fft_impl;" is not valid C++, so inherit instead:
+ template <typename T> struct default_fft_impl : public internal::fftw_impl<T> {};
+ }
+#elif defined EIGEN_MKL_DEFAULT
+// TODO
+// intel Math Kernel Library: fastest, commercial -- may be incompatible with Eigen in GPL form
+# include "src/FFT/ei_imklfft_impl.h"
+ namespace Eigen {
+ template <typename T> struct default_fft_impl : public internal::imklfft_impl {};
+ }
+#else
+// internal::kissfft_impl: small, free, reasonably efficient default, derived from kissfft
+//
+# include "src/FFT/ei_kissfft_impl.h"
+ namespace Eigen {
+ template <typename T>
+ struct default_fft_impl : public internal::kissfft_impl<T> {};
+ }
+#endif
+
+namespace Eigen {
+
+
+//
+template<typename T_SrcMat,typename T_FftIfc> struct fft_fwd_proxy;
+template<typename T_SrcMat,typename T_FftIfc> struct fft_inv_proxy;
+
+namespace internal {
+template<typename T_SrcMat,typename T_FftIfc>
+struct traits< fft_fwd_proxy<T_SrcMat,T_FftIfc> >
+{
+ typedef typename T_SrcMat::PlainObject ReturnType;
+};
+template<typename T_SrcMat,typename T_FftIfc>
+struct traits< fft_inv_proxy<T_SrcMat,T_FftIfc> >
+{
+ typedef typename T_SrcMat::PlainObject ReturnType;
+};
+}
+
+template<typename T_SrcMat,typename T_FftIfc>
+struct fft_fwd_proxy
+ : public ReturnByValue<fft_fwd_proxy<T_SrcMat,T_FftIfc> >
+{
+ typedef DenseIndex Index;
+
+ fft_fwd_proxy(const T_SrcMat& src,T_FftIfc & fft, Index nfft) : m_src(src),m_ifc(fft), m_nfft(nfft) {}
+
+ template<typename T_DestMat> void evalTo(T_DestMat& dst) const;
+
+ Index rows() const { return m_src.rows(); }
+ Index cols() const { return m_src.cols(); }
+protected:
+ const T_SrcMat & m_src;
+ T_FftIfc & m_ifc;
+ Index m_nfft;
+private:
+ fft_fwd_proxy& operator=(const fft_fwd_proxy&);
+};
+
+template<typename T_SrcMat,typename T_FftIfc>
+struct fft_inv_proxy
+ : public ReturnByValue<fft_inv_proxy<T_SrcMat,T_FftIfc> >
+{
+ typedef DenseIndex Index;
+
+ fft_inv_proxy(const T_SrcMat& src,T_FftIfc & fft, Index nfft) : m_src(src),m_ifc(fft), m_nfft(nfft) {}
+
+ template<typename T_DestMat> void evalTo(T_DestMat& dst) const;
+
+ Index rows() const { return m_src.rows(); }
+ Index cols() const { return m_src.cols(); }
+protected:
+ const T_SrcMat & m_src;
+ T_FftIfc & m_ifc;
+ Index m_nfft;
+private:
+ fft_inv_proxy& operator=(const fft_inv_proxy&);
+};
+
+
+template <typename T_Scalar,
+ typename T_Impl=default_fft_impl<T_Scalar> >
+class FFT
+{
+ public:
+ typedef T_Impl impl_type;
+ typedef DenseIndex Index;
+ typedef typename impl_type::Scalar Scalar;
+ typedef typename impl_type::Complex Complex;
+
+ enum Flag {
+ Default=0, // goof proof
+ Unscaled=1,
+ HalfSpectrum=2,
+ // SomeOtherSpeedOptimization=4
+ Speedy=32767
+ };
+
+ FFT( const impl_type & impl=impl_type() , Flag flags=Default ) :m_impl(impl),m_flag(flags) { }
+
+ inline
+ bool HasFlag(Flag f) const { return (m_flag & (int)f) == f;}
+
+ inline
+ void SetFlag(Flag f) { m_flag |= (int)f;}
+
+ inline
+ void ClearFlag(Flag f) { m_flag &= (~(int)f);}
+
+ inline
+ void fwd( Complex * dst, const Scalar * src, Index nfft)
+ {
+ m_impl.fwd(dst,src,static_cast<int>(nfft));
+ if ( HasFlag(HalfSpectrum) == false)
+ ReflectSpectrum(dst,nfft);
+ }
+
+ inline
+ void fwd( Complex * dst, const Complex * src, Index nfft)
+ {
+ m_impl.fwd(dst,src,static_cast<int>(nfft));
+ }
+
+ /*
+ inline
+ void fwd2(Complex * dst, const Complex * src, int n0,int n1)
+ {
+ m_impl.fwd2(dst,src,n0,n1);
+ }
+ */
+
+ template <typename _Input>
+ inline
+ void fwd( std::vector<Complex> & dst, const std::vector<_Input> & src)
+ {
+ if ( NumTraits<_Input>::IsComplex == 0 && HasFlag(HalfSpectrum) )
+ dst.resize( (src.size()>>1)+1); // half the bins + Nyquist bin
+ else
+ dst.resize(src.size());
+ fwd(&dst[0],&src[0],src.size());
+ }
+
+ template<typename InputDerived, typename ComplexDerived>
+ inline
+ void fwd( MatrixBase<ComplexDerived> & dst, const MatrixBase<InputDerived> & src, Index nfft=-1)
+ {
+ typedef typename ComplexDerived::Scalar dst_type;
+ typedef typename InputDerived::Scalar src_type;
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(InputDerived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(ComplexDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(ComplexDerived,InputDerived) // size at compile-time
+ EIGEN_STATIC_ASSERT((internal::is_same<dst_type, Complex>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ EIGEN_STATIC_ASSERT(int(InputDerived::Flags)&int(ComplexDerived::Flags)&DirectAccessBit,
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES)
+
+ if (nfft<1)
+ nfft = src.size();
+
+ if ( NumTraits< src_type >::IsComplex == 0 && HasFlag(HalfSpectrum) )
+ dst.derived().resize( (nfft>>1)+1);
+ else
+ dst.derived().resize(nfft);
+
+ if ( src.innerStride() != 1 || src.size() < nfft ) {
+ Matrix<src_type,1,Dynamic> tmp;
+ if (src.size()<nfft) {
+ tmp.setZero(nfft);
+ tmp.block(0,0,src.size(),1 ) = src;
+ }else{
+ tmp = src;
+ }
+ fwd( &dst[0],&tmp[0],nfft );
+ }else{
+ fwd( &dst[0],&src[0],nfft );
+ }
+ }
+
+ template<typename InputDerived>
+ inline
+ fft_fwd_proxy< MatrixBase<InputDerived>, FFT<T_Scalar,T_Impl> >
+ fwd( const MatrixBase<InputDerived> & src, Index nfft=-1)
+ {
+ return fft_fwd_proxy< MatrixBase<InputDerived> ,FFT<T_Scalar,T_Impl> >( src, *this,nfft );
+ }
+
+ template<typename InputDerived>
+ inline
+ fft_inv_proxy< MatrixBase<InputDerived>, FFT<T_Scalar,T_Impl> >
+ inv( const MatrixBase<InputDerived> & src, Index nfft=-1)
+ {
+ return fft_inv_proxy< MatrixBase<InputDerived> ,FFT<T_Scalar,T_Impl> >( src, *this,nfft );
+ }
+
+ inline
+ void inv( Complex * dst, const Complex * src, Index nfft)
+ {
+ m_impl.inv( dst,src,static_cast<int>(nfft) );
+ if ( HasFlag( Unscaled ) == false)
+ scale(dst,Scalar(1./nfft),nfft); // scale the time series
+ }
+
+ inline
+ void inv( Scalar * dst, const Complex * src, Index nfft)
+ {
+ m_impl.inv( dst,src,static_cast<int>(nfft) );
+ if ( HasFlag( Unscaled ) == false)
+ scale(dst,Scalar(1./nfft),nfft); // scale the time series
+ }
+
+ template<typename OutputDerived, typename ComplexDerived>
+ inline
+ void inv( MatrixBase<OutputDerived> & dst, const MatrixBase<ComplexDerived> & src, Index nfft=-1)
+ {
+ typedef typename ComplexDerived::Scalar src_type;
+ typedef typename OutputDerived::Scalar dst_type;
+ const bool realfft= (NumTraits<dst_type>::IsComplex == 0);
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(OutputDerived)
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(ComplexDerived)
+ EIGEN_STATIC_ASSERT_SAME_VECTOR_SIZE(ComplexDerived,OutputDerived) // size at compile-time
+ EIGEN_STATIC_ASSERT((internal::is_same<src_type, Complex>::value),
+ YOU_MIXED_DIFFERENT_NUMERIC_TYPES__YOU_NEED_TO_USE_THE_CAST_METHOD_OF_MATRIXBASE_TO_CAST_NUMERIC_TYPES_EXPLICITLY)
+ EIGEN_STATIC_ASSERT(int(OutputDerived::Flags)&int(ComplexDerived::Flags)&DirectAccessBit,
+ THIS_METHOD_IS_ONLY_FOR_EXPRESSIONS_WITH_DIRECT_MEMORY_ACCESS_SUCH_AS_MAP_OR_PLAIN_MATRICES)
+
+ if (nfft<1) { //automatic FFT size determination
+ if ( realfft && HasFlag(HalfSpectrum) )
+ nfft = 2*(src.size()-1); //assume even fft size
+ else
+ nfft = src.size();
+ }
+ dst.derived().resize( nfft );
+
+ // check for nfft that does not fit the input data size
+ Index resize_input= ( realfft && HasFlag(HalfSpectrum) )
+ ? ( (nfft/2+1) - src.size() )
+ : ( nfft - src.size() );
+
+ if ( src.innerStride() != 1 || resize_input ) {
+ // if the vector is strided, then we need to copy it to a packed temporary
+ Matrix<src_type,1,Dynamic> tmp;
+ if ( resize_input ) {
+ size_t ncopy = (std::min)(src.size(),src.size() + resize_input);
+ tmp.setZero(src.size() + resize_input);
+ if ( realfft && HasFlag(HalfSpectrum) ) {
+ // pad at the Nyquist bin
+ tmp.head(ncopy) = src.head(ncopy);
+ tmp(ncopy-1) = real(tmp(ncopy-1)); // enforce real-only Nyquist bin
+ }else{
+ size_t nhead,ntail;
+ nhead = 1+ncopy/2-1; // range [0:pi)
+ ntail = ncopy/2-1; // range (-pi:0)
+ tmp.head(nhead) = src.head(nhead);
+ tmp.tail(ntail) = src.tail(ntail);
+ if (resize_input<0) { //shrinking -- create the Nyquist bin as the average of the two bins that fold into it
+ tmp(nhead) = ( src(nfft/2) + src( src.size() - nfft/2 ) )*src_type(.5);
+ }else{ // expanding -- split the old Nyquist bin into two halves
+ tmp(nhead) = src(nhead) * src_type(.5);
+ tmp(tmp.size()-nhead) = tmp(nhead);
+ }
+ }
+ }else{
+ tmp = src;
+ }
+ inv( &dst[0],&tmp[0], nfft);
+ }else{
+ inv( &dst[0],&src[0], nfft);
+ }
+ }
+
+ template <typename _Output>
+ inline
+ void inv( std::vector<_Output> & dst, const std::vector<Complex> & src,Index nfft=-1)
+ {
+ if (nfft<1)
+ nfft = ( NumTraits<_Output>::IsComplex == 0 && HasFlag(HalfSpectrum) ) ? 2*(src.size()-1) : src.size();
+ dst.resize( nfft );
+ inv( &dst[0],&src[0],nfft);
+ }
+
+
+ /*
+ // TODO: multi-dimensional FFTs
+ inline
+ void inv2(Complex * dst, const Complex * src, int n0,int n1)
+ {
+ m_impl.inv2(dst,src,n0,n1);
+ if ( HasFlag( Unscaled ) == false)
+ scale(dst,1./(n0*n1),n0*n1);
+ }
+ */
+
+ inline
+ impl_type & impl() {return m_impl;}
+ private:
+
+ template <typename T_Data>
+ inline
+ void scale(T_Data * x,Scalar s,Index nx)
+ {
+#if 1
+ for (int k=0;k<nx;++k)
+ *x++ *= s;
+#else
+ if ( ((ptrdiff_t)x) & 15 )
+ Matrix<T_Data, Dynamic, 1>::Map(x,nx) *= s;
+ else
+ Matrix<T_Data, Dynamic, 1>::MapAligned(x,nx) *= s;
+ //Matrix<T_Data, Dynamic, Dynamic>::Map(x,nx) * s;
+#endif
+ }
+
+ inline
+ void ReflectSpectrum(Complex * freq, Index nfft)
+ {
+ // create the implicit right-half spectrum (conjugate-mirror of the left-half)
+ Index nhbins=(nfft>>1)+1;
+ for (Index k=nhbins;k < nfft; ++k )
+ freq[k] = conj(freq[nfft-k]);
+ }
+
+ impl_type m_impl;
+ int m_flag;
+};
+
+template<typename T_SrcMat,typename T_FftIfc>
+template<typename T_DestMat> inline
+void fft_fwd_proxy<T_SrcMat,T_FftIfc>::evalTo(T_DestMat& dst) const
+{
+ m_ifc.fwd( dst, m_src, m_nfft);
+}
+
+template<typename T_SrcMat,typename T_FftIfc>
+template<typename T_DestMat> inline
+void fft_inv_proxy<T_SrcMat,T_FftIfc>::evalTo(T_DestMat& dst) const
+{
+ m_ifc.inv( dst, m_src, m_nfft);
+}
+
+}
+#endif
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
diff --git a/third_party/eigen3/unsupported/Eigen/KroneckerProduct b/third_party/eigen3/unsupported/Eigen/KroneckerProduct
new file mode 100644
index 0000000000..c932c06a6d
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/KroneckerProduct
@@ -0,0 +1,34 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef EIGEN_KRONECKER_PRODUCT_MODULE_H
+#define EIGEN_KRONECKER_PRODUCT_MODULE_H
+
+#include "../../Eigen/Core"
+
+#include "../../Eigen/src/Core/util/DisableStupidWarnings.h"
+
+namespace Eigen {
+
+/**
+ * \defgroup KroneckerProduct_Module KroneckerProduct module
+ *
+ * This module contains an experimental Kronecker product implementation.
+ *
+ * \code
+ * #include <Eigen/KroneckerProduct>
+ * \endcode
+ */
+
+} // namespace Eigen
+
+#include "src/KroneckerProduct/KroneckerTensorProduct.h"
+
+#include "../../Eigen/src/Core/util/ReenableStupidWarnings.h"
+
+#endif // EIGEN_KRONECKER_PRODUCT_MODULE_H
diff --git a/third_party/eigen3/unsupported/Eigen/src/FFT/CMakeLists.txt b/third_party/eigen3/unsupported/Eigen/src/FFT/CMakeLists.txt
new file mode 100644
index 0000000000..edcffcb189
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/src/FFT/CMakeLists.txt
@@ -0,0 +1,6 @@
+FILE(GLOB Eigen_FFT_SRCS "*.h")
+
+INSTALL(FILES
+ ${Eigen_FFT_SRCS}
+ DESTINATION ${INCLUDE_INSTALL_DIR}/unsupported/Eigen/src/FFT COMPONENT Devel
+ )
diff --git a/third_party/eigen3/unsupported/Eigen/src/FFT/ei_fftw_impl.h b/third_party/eigen3/unsupported/Eigen/src/FFT/ei_fftw_impl.h
new file mode 100644
index 0000000000..d49aa17f51
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/src/FFT/ei_fftw_impl.h
@@ -0,0 +1,261 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Mark Borgerding mark a borgerding net
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+namespace Eigen {
+
+namespace internal {
+
+ // FFTW uses non-const arguments
+ // so we must use ugly const_cast calls for all the args it uses
+ //
+ // This should be safe as long as
+ // 1. we use FFTW_ESTIMATE for all our planning
+ // see the FFTW docs section 4.3.2 "Planner Flags"
+ // 2. fftw_complex is compatible with std::complex
+ // This assumes std::complex<T> layout is array of size 2 with real,imag
+ template <typename T>
+ inline
+ T * fftw_cast(const T* p)
+ {
+ return const_cast<T*>( p);
+ }
+
+ inline
+ fftw_complex * fftw_cast( const std::complex<double> * p)
+ {
+ return const_cast<fftw_complex*>( reinterpret_cast<const fftw_complex*>(p) );
+ }
+
+ inline
+ fftwf_complex * fftw_cast( const std::complex<float> * p)
+ {
+ return const_cast<fftwf_complex*>( reinterpret_cast<const fftwf_complex*>(p) );
+ }
+
+ inline
+ fftwl_complex * fftw_cast( const std::complex<long double> * p)
+ {
+ return const_cast<fftwl_complex*>( reinterpret_cast<const fftwl_complex*>(p) );
+ }
+
+ template <typename T>
+ struct fftw_plan {};
+
+ template <>
+ struct fftw_plan<float>
+ {
+ typedef float scalar_type;
+ typedef fftwf_complex complex_type;
+ fftwf_plan m_plan;
+ fftw_plan() :m_plan(NULL) {}
+ ~fftw_plan() {if (m_plan) fftwf_destroy_plan(m_plan);}
+
+ inline
+ void fwd(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwf_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwf_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void fwd(complex_type * dst,scalar_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwf_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft_r2c( m_plan,src,dst);
+ }
+ inline
+ void inv(scalar_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL)
+ m_plan = fftwf_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft_c2r( m_plan, src,dst);
+ }
+
+ inline
+ void fwd2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftwf_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftwf_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwf_execute_dft( m_plan, src,dst);
+ }
+
+ };
+ template <>
+ struct fftw_plan<double>
+ {
+ typedef double scalar_type;
+ typedef fftw_complex complex_type;
+ ::fftw_plan m_plan;
+ fftw_plan() :m_plan(NULL) {}
+ ~fftw_plan() {if (m_plan) fftw_destroy_plan(m_plan);}
+
+ inline
+ void fwd(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftw_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftw_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void fwd(complex_type * dst,scalar_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftw_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft_r2c( m_plan,src,dst);
+ }
+ inline
+ void inv(scalar_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL)
+ m_plan = fftw_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft_c2r( m_plan, src,dst);
+ }
+ inline
+ void fwd2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftw_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftw_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftw_execute_dft( m_plan, src,dst);
+ }
+ };
+ template <>
+ struct fftw_plan<long double>
+ {
+ typedef long double scalar_type;
+ typedef fftwl_complex complex_type;
+ fftwl_plan m_plan;
+ fftw_plan() :m_plan(NULL) {}
+ ~fftw_plan() {if (m_plan) fftwl_destroy_plan(m_plan);}
+
+ inline
+ void fwd(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwl_plan_dft_1d(nfft,src,dst, FFTW_FORWARD, FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv(complex_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwl_plan_dft_1d(nfft,src,dst, FFTW_BACKWARD , FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void fwd(complex_type * dst,scalar_type * src,int nfft) {
+ if (m_plan==NULL) m_plan = fftwl_plan_dft_r2c_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft_r2c( m_plan,src,dst);
+ }
+ inline
+ void inv(scalar_type * dst,complex_type * src,int nfft) {
+ if (m_plan==NULL)
+ m_plan = fftwl_plan_dft_c2r_1d(nfft,src,dst,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft_c2r( m_plan, src,dst);
+ }
+ inline
+ void fwd2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftwl_plan_dft_2d(n0,n1,src,dst,FFTW_FORWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft( m_plan, src,dst);
+ }
+ inline
+ void inv2( complex_type * dst,complex_type * src,int n0,int n1) {
+ if (m_plan==NULL) m_plan = fftwl_plan_dft_2d(n0,n1,src,dst,FFTW_BACKWARD,FFTW_ESTIMATE|FFTW_PRESERVE_INPUT);
+ fftwl_execute_dft( m_plan, src,dst);
+ }
+ };
+
+ template <typename _Scalar>
+ struct fftw_impl
+ {
+ typedef _Scalar Scalar;
+ typedef std::complex<Scalar> Complex;
+
+ inline
+ void clear()
+ {
+ m_plans.clear();
+ }
+
+ // complex-to-complex forward FFT
+ inline
+ void fwd( Complex * dst,const Complex *src,int nfft)
+ {
+ get_plan(nfft,false,dst,src).fwd(fftw_cast(dst), fftw_cast(src),nfft );
+ }
+
+ // real-to-complex forward FFT
+ inline
+ void fwd( Complex * dst,const Scalar * src,int nfft)
+ {
+ get_plan(nfft,false,dst,src).fwd(fftw_cast(dst), fftw_cast(src) ,nfft);
+ }
+
+ // 2-d complex-to-complex
+ inline
+ void fwd2(Complex * dst, const Complex * src, int n0,int n1)
+ {
+ get_plan(n0,n1,false,dst,src).fwd2(fftw_cast(dst), fftw_cast(src) ,n0,n1);
+ }
+
+ // inverse complex-to-complex
+ inline
+ void inv(Complex * dst,const Complex *src,int nfft)
+ {
+ get_plan(nfft,true,dst,src).inv(fftw_cast(dst), fftw_cast(src),nfft );
+ }
+
+ // half-complex to scalar
+ inline
+ void inv( Scalar * dst,const Complex * src,int nfft)
+ {
+ get_plan(nfft,true,dst,src).inv(fftw_cast(dst), fftw_cast(src),nfft );
+ }
+
+ // 2-d complex-to-complex
+ inline
+ void inv2(Complex * dst, const Complex * src, int n0,int n1)
+ {
+ get_plan(n0,n1,true,dst,src).inv2(fftw_cast(dst), fftw_cast(src) ,n0,n1);
+ }
+
+
+ protected:
+ typedef fftw_plan<Scalar> PlanData;
+
+ typedef std::map<int64_t,PlanData> PlanMap;
+
+ PlanMap m_plans;
+
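+ // The cache key packs (nfft, inverse, inplace, aligned); the lowest bit
+ // distinguishes 1-D plans (0) from 2-D plans (1).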
+ inline
+ PlanData & get_plan(int nfft,bool inverse,void * dst,const void * src)
+ {
+ bool inplace = (dst==src);
+ bool aligned = ( (reinterpret_cast<size_t>(src)&15) | (reinterpret_cast<size_t>(dst)&15) ) == 0;
+ int64_t key = ( (nfft<<3 ) | (inverse<<2) | (inplace<<1) | aligned ) << 1;
+ return m_plans[key];
+ }
+
+ inline
+ PlanData & get_plan(int n0,int n1,bool inverse,void * dst,const void * src)
+ {
+ bool inplace = (dst==src);
+ bool aligned = ( (reinterpret_cast<size_t>(src)&15) | (reinterpret_cast<size_t>(dst)&15) ) == 0;
+ int64_t key = ( ( (((int64_t)n0) << 30)|(n1<<3 ) | (inverse<<2) | (inplace<<1) | aligned ) << 1 ) + 1;
+ return m_plans[key];
+ }
+ };
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
diff --git a/third_party/eigen3/unsupported/Eigen/src/FFT/ei_kissfft_impl.h b/third_party/eigen3/unsupported/Eigen/src/FFT/ei_kissfft_impl.h
new file mode 100644
index 0000000000..be51b4e6fe
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/src/FFT/ei_kissfft_impl.h
@@ -0,0 +1,420 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2009 Mark Borgerding mark a borgerding net
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+namespace Eigen {
+
+namespace internal {
+
+ // This FFT implementation was derived from kissfft http://sourceforge.net/projects/kissfft
+ // Copyright 2003-2009 Mark Borgerding
+
+template <typename _Scalar>
+struct kiss_cpx_fft
+{
+ typedef _Scalar Scalar;
+ typedef std::complex<Scalar> Complex;
+ std::vector<Complex> m_twiddles;
+ std::vector<int> m_stageRadix;
+ std::vector<int> m_stageRemainder;
+ std::vector<Complex> m_scratchBuf;
+ bool m_inverse;
+
+ inline
+ void make_twiddles(int nfft,bool inverse)
+ {
+ using std::acos;
+ m_inverse = inverse;
+ m_twiddles.resize(nfft);
+ Scalar phinc = (inverse?2:-2)* acos( (Scalar) -1) / nfft;
+ for (int i=0;i<nfft;++i)
+ m_twiddles[i] = exp( Complex(0,i*phinc) );
+ }
+
+ void factorize(int nfft)
+ {
+ //start factoring out 4's, then 2's, then 3,5,7,9,...
+ int n= nfft;
+ int p=4;
+ do {
+ while (n % p) {
+ switch (p) {
+ case 4: p = 2; break;
+ case 2: p = 3; break;
+ default: p += 2; break;
+ }
+ if (p*p>n)
+ p=n;// impossible to have a factor > sqrt(n)
+ }
+ n /= p;
+ m_stageRadix.push_back(p);
+ m_stageRemainder.push_back(n);
+ if ( p > 5 )
+ m_scratchBuf.resize(p); // scratchbuf will be needed in bfly_generic
+ }while(n>1);
+ }
+
+ template <typename _Src>
+ inline
+ void work( int stage,Complex * xout, const _Src * xin, size_t fstride,size_t in_stride)
+ {
+ int p = m_stageRadix[stage];
+ int m = m_stageRemainder[stage];
+ Complex * Fout_beg = xout;
+ Complex * Fout_end = xout + p*m;
+
+ if (m>1) {
+ do{
+ // recursive call:
+ // DFT of size m*p performed by doing
+ // p instances of smaller DFTs of size m,
+ // each one takes a decimated version of the input
+ work(stage+1, xout , xin, fstride*p,in_stride);
+ xin += fstride*in_stride;
+ }while( (xout += m) != Fout_end );
+ }else{
+ do{
+ *xout = *xin;
+ xin += fstride*in_stride;
+ }while(++xout != Fout_end );
+ }
+ xout=Fout_beg;
+
+ // recombine the p smaller DFTs
+ switch (p) {
+ case 2: bfly2(xout,fstride,m); break;
+ case 3: bfly3(xout,fstride,m); break;
+ case 4: bfly4(xout,fstride,m); break;
+ case 5: bfly5(xout,fstride,m); break;
+ default: bfly_generic(xout,fstride,m,p); break;
+ }
+ }
+
+ inline
+ void bfly2( Complex * Fout, const size_t fstride, int m)
+ {
+ for (int k=0;k<m;++k) {
+ Complex t = Fout[m+k] * m_twiddles[k*fstride];
+ Fout[m+k] = Fout[k] - t;
+ Fout[k] += t;
+ }
+ }
+
+ inline
+ void bfly4( Complex * Fout, const size_t fstride, const size_t m)
+ {
+ Complex scratch[6];
+ int negative_if_inverse = m_inverse * -2 + 1; // -1 for an inverse transform, +1 for a forward transform
+ for (size_t k=0;k<m;++k) {
+ scratch[0] = Fout[k+m] * m_twiddles[k*fstride];
+ scratch[1] = Fout[k+2*m] * m_twiddles[k*fstride*2];
+ scratch[2] = Fout[k+3*m] * m_twiddles[k*fstride*3];
+ scratch[5] = Fout[k] - scratch[1];
+
+ Fout[k] += scratch[1];
+ scratch[3] = scratch[0] + scratch[2];
+ scratch[4] = scratch[0] - scratch[2];
+ scratch[4] = Complex( scratch[4].imag()*negative_if_inverse , -scratch[4].real()* negative_if_inverse );
+
+ Fout[k+2*m] = Fout[k] - scratch[3];
+ Fout[k] += scratch[3];
+ Fout[k+m] = scratch[5] + scratch[4];
+ Fout[k+3*m] = scratch[5] - scratch[4];
+ }
+ }
+
+ inline
+ void bfly3( Complex * Fout, const size_t fstride, const size_t m)
+ {
+ size_t k=m;
+ const size_t m2 = 2*m;
+ Complex *tw1,*tw2;
+ Complex scratch[5];
+ Complex epi3;
+ epi3 = m_twiddles[fstride*m];
+
+ tw1=tw2=&m_twiddles[0];
+
+ do{
+ scratch[1]=Fout[m] * *tw1;
+ scratch[2]=Fout[m2] * *tw2;
+
+ scratch[3]=scratch[1]+scratch[2];
+ scratch[0]=scratch[1]-scratch[2];
+ tw1 += fstride;
+ tw2 += fstride*2;
+ Fout[m] = Complex( Fout->real() - Scalar(.5)*scratch[3].real() , Fout->imag() - Scalar(.5)*scratch[3].imag() );
+ scratch[0] *= epi3.imag();
+ *Fout += scratch[3];
+ Fout[m2] = Complex( Fout[m].real() + scratch[0].imag() , Fout[m].imag() - scratch[0].real() );
+ Fout[m] += Complex( -scratch[0].imag(),scratch[0].real() );
+ ++Fout;
+ }while(--k);
+ }
+
+ inline
+ void bfly5( Complex * Fout, const size_t fstride, const size_t m)
+ {
+ Complex *Fout0,*Fout1,*Fout2,*Fout3,*Fout4;
+ size_t u;
+ Complex scratch[13];
+ Complex * twiddles = &m_twiddles[0];
+ Complex *tw;
+ Complex ya,yb;
+ ya = twiddles[fstride*m];
+ yb = twiddles[fstride*2*m];
+
+ Fout0=Fout;
+ Fout1=Fout0+m;
+ Fout2=Fout0+2*m;
+ Fout3=Fout0+3*m;
+ Fout4=Fout0+4*m;
+
+ tw=twiddles;
+ for ( u=0; u<m; ++u ) {
+ scratch[0] = *Fout0;
+
+ scratch[1] = *Fout1 * tw[u*fstride];
+ scratch[2] = *Fout2 * tw[2*u*fstride];
+ scratch[3] = *Fout3 * tw[3*u*fstride];
+ scratch[4] = *Fout4 * tw[4*u*fstride];
+
+ scratch[7] = scratch[1] + scratch[4];
+ scratch[10] = scratch[1] - scratch[4];
+ scratch[8] = scratch[2] + scratch[3];
+ scratch[9] = scratch[2] - scratch[3];
+
+ *Fout0 += scratch[7];
+ *Fout0 += scratch[8];
+
+ scratch[5] = scratch[0] + Complex(
+ (scratch[7].real()*ya.real() ) + (scratch[8].real() *yb.real() ),
+ (scratch[7].imag()*ya.real()) + (scratch[8].imag()*yb.real())
+ );
+
+ scratch[6] = Complex(
+ (scratch[10].imag()*ya.imag()) + (scratch[9].imag()*yb.imag()),
+ -(scratch[10].real()*ya.imag()) - (scratch[9].real()*yb.imag())
+ );
+
+ *Fout1 = scratch[5] - scratch[6];
+ *Fout4 = scratch[5] + scratch[6];
+
+ scratch[11] = scratch[0] +
+ Complex(
+ (scratch[7].real()*yb.real()) + (scratch[8].real()*ya.real()),
+ (scratch[7].imag()*yb.real()) + (scratch[8].imag()*ya.real())
+ );
+
+ scratch[12] = Complex(
+ -(scratch[10].imag()*yb.imag()) + (scratch[9].imag()*ya.imag()),
+ (scratch[10].real()*yb.imag()) - (scratch[9].real()*ya.imag())
+ );
+
+ *Fout2=scratch[11]+scratch[12];
+ *Fout3=scratch[11]-scratch[12];
+
+ ++Fout0;++Fout1;++Fout2;++Fout3;++Fout4;
+ }
+ }
+
+ /* perform the butterfly for one stage of a mixed radix FFT */
+ inline
+ void bfly_generic(
+ Complex * Fout,
+ const size_t fstride,
+ int m,
+ int p
+ )
+ {
+ int u,k,q1,q;
+ Complex * twiddles = &m_twiddles[0];
+ Complex t;
+ int Norig = static_cast<int>(m_twiddles.size());
+ Complex * scratchbuf = &m_scratchBuf[0];
+
+ for ( u=0; u<m; ++u ) {
+ k=u;
+ for ( q1=0 ; q1<p ; ++q1 ) {
+ scratchbuf[q1] = Fout[ k ];
+ k += m;
+ }
+
+ k=u;
+ for ( q1=0 ; q1<p ; ++q1 ) {
+ int twidx=0;
+ Fout[ k ] = scratchbuf[0];
+ for (q=1;q<p;++q ) {
+ twidx += static_cast<int>(fstride) * k;
+ if (twidx>=Norig) twidx-=Norig;
+ t=scratchbuf[q] * twiddles[twidx];
+ Fout[ k ] += t;
+ }
+ k += m;
+ }
+ }
+ }
+};
+
+template <typename _Scalar>
+struct kissfft_impl
+{
+ typedef _Scalar Scalar;
+ typedef std::complex<Scalar> Complex;
+
+ void clear()
+ {
+ m_plans.clear();
+ m_realTwiddles.clear();
+ }
+
+ inline
+ void fwd( Complex * dst,const Complex *src,int nfft)
+ {
+ get_plan(nfft,false).work(0, dst, src, 1,1);
+ }
+
+ inline
+ void fwd2( Complex * dst,const Complex *src,int n0,int n1)
+ {
+ EIGEN_UNUSED_VARIABLE(dst);
+ EIGEN_UNUSED_VARIABLE(src);
+ EIGEN_UNUSED_VARIABLE(n0);
+ EIGEN_UNUSED_VARIABLE(n1);
+ }
+
+ inline
+ void inv2( Complex * dst,const Complex *src,int n0,int n1)
+ {
+ EIGEN_UNUSED_VARIABLE(dst);
+ EIGEN_UNUSED_VARIABLE(src);
+ EIGEN_UNUSED_VARIABLE(n0);
+ EIGEN_UNUSED_VARIABLE(n1);
+ }
+
+ // real-to-complex forward FFT
+ // perform two FFTs of src even and src odd
+ // then twiddle to recombine them into the half-spectrum format
+ // then fill in the conjugate symmetric half
+ inline
+ void fwd( Complex * dst,const Scalar * src,int nfft)
+ {
+ if ( nfft&3 ) {
+ // use generic mode for odd
+ m_tmpBuf1.resize(nfft);
+ get_plan(nfft,false).work(0, &m_tmpBuf1[0], src, 1,1);
+ std::copy(m_tmpBuf1.begin(),m_tmpBuf1.begin()+(nfft>>1)+1,dst );
+ }else{
+ int ncfft = nfft>>1;
+ int ncfft2 = nfft>>2;
+ Complex * rtw = real_twiddles(ncfft2);
+
+ // use optimized mode for even real
+ fwd( dst, reinterpret_cast<const Complex*> (src), ncfft);
+ Complex dc = dst[0].real() + dst[0].imag();
+ Complex nyquist = dst[0].real() - dst[0].imag();
+ int k;
+ for ( k=1;k <= ncfft2 ; ++k ) {
+ Complex fpk = dst[k];
+ Complex fpnk = conj(dst[ncfft-k]);
+ Complex f1k = fpk + fpnk;
+ Complex f2k = fpk - fpnk;
+ Complex tw= f2k * rtw[k-1];
+ dst[k] = (f1k + tw) * Scalar(.5);
+ dst[ncfft-k] = conj(f1k -tw)*Scalar(.5);
+ }
+ dst[0] = dc;
+ dst[ncfft] = nyquist;
+ }
+ }
+
+ // inverse complex-to-complex
+ inline
+ void inv(Complex * dst,const Complex *src,int nfft)
+ {
+ get_plan(nfft,true).work(0, dst, src, 1,1);
+ }
+
+ // half-complex to scalar
+ inline
+ void inv( Scalar * dst,const Complex * src,int nfft)
+ {
+ if (nfft&3) {
+ m_tmpBuf1.resize(nfft);
+ m_tmpBuf2.resize(nfft);
+ std::copy(src,src+(nfft>>1)+1,m_tmpBuf1.begin() );
+ for (int k=1;k<(nfft>>1)+1;++k)
+ m_tmpBuf1[nfft-k] = conj(m_tmpBuf1[k]);
+ inv(&m_tmpBuf2[0],&m_tmpBuf1[0],nfft);
+ for (int k=0;k<nfft;++k)
+ dst[k] = m_tmpBuf2[k].real();
+ }else{
+ // optimized version for multiple of 4
+ int ncfft = nfft>>1;
+ int ncfft2 = nfft>>2;
+ Complex * rtw = real_twiddles(ncfft2);
+ m_tmpBuf1.resize(ncfft);
+ m_tmpBuf1[0] = Complex( src[0].real() + src[ncfft].real(), src[0].real() - src[ncfft].real() );
+ for (int k = 1; k <= ncfft / 2; ++k) {
+ Complex fk = src[k];
+ Complex fnkc = conj(src[ncfft-k]);
+ Complex fek = fk + fnkc;
+ Complex tmp = fk - fnkc;
+ Complex fok = tmp * conj(rtw[k-1]);
+ m_tmpBuf1[k] = fek + fok;
+ m_tmpBuf1[ncfft-k] = conj(fek - fok);
+ }
+ get_plan(ncfft,true).work(0, reinterpret_cast<Complex*>(dst), &m_tmpBuf1[0], 1,1);
+ }
+ }
+
+ protected:
+ typedef kiss_cpx_fft<Scalar> PlanData;
+ typedef std::map<int,PlanData> PlanMap;
+
+ PlanMap m_plans;
+ std::map<int, std::vector<Complex> > m_realTwiddles;
+ std::vector<Complex> m_tmpBuf1;
+ std::vector<Complex> m_tmpBuf2;
+
+ inline
+ int PlanKey(int nfft, bool isinverse) const { return (nfft<<1) | int(isinverse); }
+
+ inline
+ PlanData & get_plan(int nfft, bool inverse)
+ {
+ // TODO look for PlanKey(nfft, ! inverse) and conjugate the twiddles
+ PlanData & pd = m_plans[ PlanKey(nfft,inverse) ];
+ if ( pd.m_twiddles.size() == 0 ) {
+ pd.make_twiddles(nfft,inverse);
+ pd.factorize(nfft);
+ }
+ return pd;
+ }
+
+ inline
+ Complex * real_twiddles(int ncfft2)
+ {
+ using std::acos;
+ std::vector<Complex> & twidref = m_realTwiddles[ncfft2];// creates new if not there
+ if ( (int)twidref.size() != ncfft2 ) {
+ twidref.resize(ncfft2);
+ int ncfft= ncfft2<<1;
+ Scalar pi = acos( Scalar(-1) );
+ for (int k=1;k<=ncfft2;++k)
+ twidref[k-1] = exp( Complex(0,-pi * (Scalar(k) / ncfft + Scalar(.5)) ) );
+ }
+ return &twidref[0];
+ }
+};
+
+} // end namespace internal
+
+} // end namespace Eigen
+
+/* vim: set filetype=cpp et sw=2 ts=2 ai: */
diff --git a/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/CMakeLists.txt b/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/CMakeLists.txt
new file mode 100644
index 0000000000..4daefebee6
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/CMakeLists.txt
@@ -0,0 +1,6 @@
+FILE(GLOB Eigen_KroneckerProduct_SRCS "*.h")
+
+INSTALL(FILES
+ ${Eigen_KroneckerProduct_SRCS}
+ DESTINATION ${INCLUDE_INSTALL_DIR}/unsupported/Eigen/src/KroneckerProduct COMPONENT Devel
+ )
diff --git a/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h b/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h
new file mode 100644
index 0000000000..b8f2cba173
--- /dev/null
+++ b/third_party/eigen3/unsupported/Eigen/src/KroneckerProduct/KroneckerTensorProduct.h
@@ -0,0 +1,297 @@
+// This file is part of Eigen, a lightweight C++ template library
+// for linear algebra.
+//
+// Copyright (C) 2011 Kolja Brix <brix@igpm.rwth-aachen.de>
+// Copyright (C) 2011 Andreas Platen <andiplaten@gmx.de>
+// Copyright (C) 2012 Chen-Pang He <jdh8@ms63.hinet.net>
+//
+// This Source Code Form is subject to the terms of the Mozilla
+// Public License v. 2.0. If a copy of the MPL was not distributed
+// with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
+
+#ifndef KRONECKER_TENSOR_PRODUCT_H
+#define KRONECKER_TENSOR_PRODUCT_H
+
+namespace Eigen {
+
+/*!
+ * \ingroup KroneckerProduct_Module
+ *
+ * \brief The base class of dense and sparse Kronecker product.
+ *
+ * \tparam Derived is the derived type.
+ */
+template<typename Derived>
+class KroneckerProductBase : public ReturnByValue<Derived>
+{
+ private:
+ typedef typename internal::traits<Derived> Traits;
+ typedef typename Traits::Scalar Scalar;
+
+ protected:
+ typedef typename Traits::Lhs Lhs;
+ typedef typename Traits::Rhs Rhs;
+ typedef typename Traits::Index Index;
+
+ public:
+ /*! \brief Constructor. */
+ KroneckerProductBase(const Lhs& A, const Rhs& B)
+ : m_A(A), m_B(B)
+ {}
+
+ inline Index rows() const { return m_A.rows() * m_B.rows(); }
+ inline Index cols() const { return m_A.cols() * m_B.cols(); }
+
+ /*!
+ * This overrides ReturnByValue::coeff because this function is
+ * efficient enough.
+ */
+ Scalar coeff(Index row, Index col) const
+ {
+ return m_A.coeff(row / m_B.rows(), col / m_B.cols()) *
+ m_B.coeff(row % m_B.rows(), col % m_B.cols());
+ }
+
+ /*!
+ * This overrides ReturnByValue::coeff because this function is
+ * efficient enough.
+ */
+ Scalar coeff(Index i) const
+ {
+ EIGEN_STATIC_ASSERT_VECTOR_ONLY(Derived);
+ return m_A.coeff(i / m_A.size()) * m_B.coeff(i % m_A.size());
+ }
+
+ protected:
+ typename Lhs::Nested m_A;
+ typename Rhs::Nested m_B;
+};
+
+/*!
+ * \ingroup KroneckerProduct_Module
+ *
+ * \brief Kronecker tensor product helper class for dense matrices
+ *
+ * This class is the return value of kroneckerProduct(MatrixBase,
+ * MatrixBase). Use the function rather than constructing this class
+ * directly, to avoid specifying template parameters.
+ *
+ * \tparam Lhs Type of the left-hand side, a matrix expression.
+ * \tparam Rhs Type of the right-hand side, a matrix expression.
+ */
+template<typename Lhs, typename Rhs>
+class KroneckerProduct : public KroneckerProductBase<KroneckerProduct<Lhs,Rhs> >
+{
+ private:
+ typedef KroneckerProductBase<KroneckerProduct> Base;
+ using Base::m_A;
+ using Base::m_B;
+
+ public:
+ /*! \brief Constructor. */
+ KroneckerProduct(const Lhs& A, const Rhs& B)
+ : Base(A, B)
+ {}
+
+ /*! \brief Evaluate the Kronecker tensor product. */
+ template<typename Dest> void evalTo(Dest& dst) const;
+};
+
+/*!
+ * \ingroup KroneckerProduct_Module
+ *
+ * \brief Kronecker tensor product helper class for sparse matrices
+ *
+ * If at least one of the operands is a sparse matrix expression,
+ * then this class is returned and evaluates into a sparse matrix.
+ *
+ * This class is the return value of kroneckerProduct(EigenBase,
+ * EigenBase). Use the function rather than constructing this class
+ * directly, to avoid specifying template parameters.
+ *
+ * \tparam Lhs Type of the left-hand side, a matrix expression.
+ * \tparam Rhs Type of the right-hand side, a matrix expression.
+ */
+template<typename Lhs, typename Rhs>
+class KroneckerProductSparse : public KroneckerProductBase<KroneckerProductSparse<Lhs,Rhs> >
+{
+ private:
+ typedef KroneckerProductBase<KroneckerProductSparse> Base;
+ using Base::m_A;
+ using Base::m_B;
+
+ public:
+ /*! \brief Constructor. */
+ KroneckerProductSparse(const Lhs& A, const Rhs& B)
+ : Base(A, B)
+ {}
+
+ /*! \brief Evaluate the Kronecker tensor product. */
+ template<typename Dest> void evalTo(Dest& dst) const;
+};
+
+template<typename Lhs, typename Rhs>
+template<typename Dest>
+void KroneckerProduct<Lhs,Rhs>::evalTo(Dest& dst) const
+{
+ typedef typename Base::Index Index;
+ const int BlockRows = Rhs::RowsAtCompileTime,
+ BlockCols = Rhs::ColsAtCompileTime;
+ const Index Br = m_B.rows(),
+ Bc = m_B.cols();
+ for (Index i=0; i < m_A.rows(); ++i)
+ for (Index j=0; j < m_A.cols(); ++j)
+ Block<Dest,BlockRows,BlockCols>(dst,i*Br,j*Bc,Br,Bc) = m_A.coeff(i,j) * m_B;
+}
+
+template<typename Lhs, typename Rhs>
+template<typename Dest>
+void KroneckerProductSparse<Lhs,Rhs>::evalTo(Dest& dst) const
+{
+ typedef typename Base::Index Index;
+ const Index Br = m_B.rows(),
+ Bc = m_B.cols();
+ dst.resize(this->rows(), this->cols());
+ dst.resizeNonZeros(0);
+
+ // compute the number of non-zeros per inner vector of dst
+ {
+ VectorXi nnzA = VectorXi::Zero(Dest::IsRowMajor ? m_A.rows() : m_A.cols());
+ for (Index kA=0; kA < m_A.outerSize(); ++kA)
+ for (typename Lhs::InnerIterator itA(m_A,kA); itA; ++itA)
+ nnzA(Dest::IsRowMajor ? itA.row() : itA.col())++;
+
+ VectorXi nnzB = VectorXi::Zero(Dest::IsRowMajor ? m_B.rows() : m_B.cols());
+ for (Index kB=0; kB < m_B.outerSize(); ++kB)
+ for (typename Rhs::InnerIterator itB(m_B,kB); itB; ++itB)
+ nnzB(Dest::IsRowMajor ? itB.row() : itB.col())++;
+
+ Matrix<int,Dynamic,Dynamic,ColMajor> nnzAB = nnzB * nnzA.transpose();
+ dst.reserve(VectorXi::Map(nnzAB.data(), nnzAB.size()));
+ }
+
+ for (Index kA=0; kA < m_A.outerSize(); ++kA)
+ {
+ for (Index kB=0; kB < m_B.outerSize(); ++kB)
+ {
+ for (typename Lhs::InnerIterator itA(m_A,kA); itA; ++itA)
+ {
+ for (typename Rhs::InnerIterator itB(m_B,kB); itB; ++itB)
+ {
+ const Index i = itA.row() * Br + itB.row(),
+ j = itA.col() * Bc + itB.col();
+ dst.insert(i,j) = itA.value() * itB.value();
+ }
+ }
+ }
+ }
+}
+
+namespace internal {
+
+template<typename _Lhs, typename _Rhs>
+struct traits<KroneckerProduct<_Lhs,_Rhs> >
+{
+ typedef typename remove_all<_Lhs>::type Lhs;
+ typedef typename remove_all<_Rhs>::type Rhs;
+ typedef typename scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType Scalar;
+ typedef typename promote_index_type<typename Lhs::Index, typename Rhs::Index>::type Index;
+
+ enum {
+ Rows = size_at_compile_time<traits<Lhs>::RowsAtCompileTime, traits<Rhs>::RowsAtCompileTime>::ret,
+ Cols = size_at_compile_time<traits<Lhs>::ColsAtCompileTime, traits<Rhs>::ColsAtCompileTime>::ret,
+ MaxRows = size_at_compile_time<traits<Lhs>::MaxRowsAtCompileTime, traits<Rhs>::MaxRowsAtCompileTime>::ret,
+ MaxCols = size_at_compile_time<traits<Lhs>::MaxColsAtCompileTime, traits<Rhs>::MaxColsAtCompileTime>::ret,
+ CoeffReadCost = Lhs::CoeffReadCost + Rhs::CoeffReadCost + NumTraits<Scalar>::MulCost
+ };
+
+ typedef Matrix<Scalar,Rows,Cols> ReturnType;
+};
+
+template<typename _Lhs, typename _Rhs>
+struct traits<KroneckerProductSparse<_Lhs,_Rhs> >
+{
+ typedef MatrixXpr XprKind;
+ typedef typename remove_all<_Lhs>::type Lhs;
+ typedef typename remove_all<_Rhs>::type Rhs;
+ typedef typename scalar_product_traits<typename Lhs::Scalar, typename Rhs::Scalar>::ReturnType Scalar;
+ typedef typename promote_storage_type<typename traits<Lhs>::StorageKind, typename traits<Rhs>::StorageKind>::ret StorageKind;
+ typedef typename promote_index_type<typename Lhs::Index, typename Rhs::Index>::type Index;
+
+ enum {
+ LhsFlags = Lhs::Flags,
+ RhsFlags = Rhs::Flags,
+
+ RowsAtCompileTime = size_at_compile_time<traits<Lhs>::RowsAtCompileTime, traits<Rhs>::RowsAtCompileTime>::ret,
+ ColsAtCompileTime = size_at_compile_time<traits<Lhs>::ColsAtCompileTime, traits<Rhs>::ColsAtCompileTime>::ret,
+ MaxRowsAtCompileTime = size_at_compile_time<traits<Lhs>::MaxRowsAtCompileTime, traits<Rhs>::MaxRowsAtCompileTime>::ret,
+ MaxColsAtCompileTime = size_at_compile_time<traits<Lhs>::MaxColsAtCompileTime, traits<Rhs>::MaxColsAtCompileTime>::ret,
+
+ EvalToRowMajor = (LhsFlags & RhsFlags & RowMajorBit),
+ RemovedBits = ~(EvalToRowMajor ? 0 : RowMajorBit),
+
+ Flags = ((LhsFlags | RhsFlags) & HereditaryBits & RemovedBits)
+ | EvalBeforeNestingBit | EvalBeforeAssigningBit,
+ CoeffReadCost = Dynamic
+ };
+
+ typedef SparseMatrix<Scalar> ReturnType;
+};
+
+} // end namespace internal
+
+/*!
+ * \ingroup KroneckerProduct_Module
+ *
+ * Computes Kronecker tensor product of two dense matrices
+ *
+ * \warning If you want to replace a matrix by its Kronecker product
+ * with some matrix, do \b NOT do this:
+ * \code
+ * A = kroneckerProduct(A,B); // bug!!! caused by aliasing effect
+ * \endcode
+ * instead, use eval() to work around this:
+ * \code
+ * A = kroneckerProduct(A,B).eval();
+ * \endcode
+ *
+ * \param a Dense matrix a
+ * \param b Dense matrix b
+ * \return Kronecker tensor product of a and b
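+ *
+ * A minimal usage sketch (variable names are illustrative only):
+ * \code
+ * Eigen::MatrixXd A(2,2), B(3,3);
+ * A.setRandom();
+ * B.setRandom();
+ * Eigen::MatrixXd C = Eigen::kroneckerProduct(A, B); // C is 6x6
+ * \endcode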
+ */
+template<typename A, typename B>
+KroneckerProduct<A,B> kroneckerProduct(const MatrixBase<A>& a, const MatrixBase<B>& b)
+{
+ return KroneckerProduct<A, B>(a.derived(), b.derived());
+}
+
+/*!
+ * \ingroup KroneckerProduct_Module
+ *
+ * Computes Kronecker tensor product of two matrices, at least one of
+ * which is sparse
+ *
+ * \warning If you want to replace a matrix by its Kronecker product
+ * with some matrix, do \b NOT do this:
+ * \code
+ * A = kroneckerProduct(A,B); // bug!!! caused by aliasing effect
+ * \endcode
+ * instead, use eval() to work around this:
+ * \code
+ * A = kroneckerProduct(A,B).eval();
+ * \endcode
+ *
+ * \param a Dense/sparse matrix a
+ * \param b Dense/sparse matrix b
+ * \return Kronecker tensor product of a and b, stored in a sparse
+ * matrix
+ */
+template<typename A, typename B>
+KroneckerProductSparse<A,B> kroneckerProduct(const EigenBase<A>& a, const EigenBase<B>& b)
+{
+ return KroneckerProductSparse<A,B>(a.derived(), b.derived());
+}
+
+} // end namespace Eigen
+
+#endif // KRONECKER_TENSOR_PRODUCT_H
diff --git a/third_party/gpus/crosstool/BUILD b/third_party/gpus/crosstool/BUILD
new file mode 100644
index 0000000000..eac4dc7fad
--- /dev/null
+++ b/third_party/gpus/crosstool/BUILD
@@ -0,0 +1,28 @@
+licenses(["restricted"])
+
+package(default_visibility = ["//visibility:public"])
+
+filegroup(
+ name = "crosstool",
+ srcs = ["CROSSTOOL"],
+ output_licenses = ["unencumbered"],
+)
+
+cc_toolchain(
+ name = "cc-compiler-local",
+ all_files = ":empty",
+ compiler_files = ":empty",
+ cpu = "local",
+ dwp_files = ":empty",
+ dynamic_runtime_libs = [":empty"],
+ linker_files = ":empty",
+ objcopy_files = ":empty",
+ static_runtime_libs = [":empty"],
+ strip_files = ":empty",
+ supports_param_files = 0,
+)
+
+filegroup(
+ name = "empty",
+ srcs = [],
+)
diff --git a/third_party/gpus/crosstool/CROSSTOOL b/third_party/gpus/crosstool/CROSSTOOL
new file mode 100644
index 0000000000..3570c5c8a2
--- /dev/null
+++ b/third_party/gpus/crosstool/CROSSTOOL
@@ -0,0 +1,146 @@
+major_version: "local"
+minor_version: ""
+default_target_cpu: "same_as_host"
+
+default_toolchain {
+ cpu: "k8"
+ toolchain_identifier: "local_linux"
+}
+default_toolchain {
+ cpu: "piii"
+ toolchain_identifier: "local_linux"
+}
+default_toolchain {
+ cpu: "darwin"
+ toolchain_identifier: "local_darwin"
+}
+
+toolchain {
+ abi_version: "local"
+ abi_libc_version: "local"
+ builtin_sysroot: ""
+ compiler: "compiler"
+ host_system_name: "local"
+ needsPic: true
+ supports_gold_linker: false
+ supports_incremental_linker: false
+ supports_fission: false
+ supports_interface_shared_objects: false
+ supports_normalizing_ar: false
+ supports_start_end_lib: false
+ supports_thin_archives: false
+ target_libc: "local"
+ target_cpu: "local"
+ target_system_name: "local"
+ toolchain_identifier: "local_linux"
+
+ tool_path { name: "ar" path: "/usr/bin/ar" }
+ tool_path { name: "compat-ld" path: "/usr/bin/ld" }
+ tool_path { name: "cpp" path: "/usr/bin/cpp" }
+ tool_path { name: "dwp" path: "/usr/bin/dwp" }
+ # As part of the TensorFlow release, we place some CUDA-related compilation
+ # files in third_party/gpus/crosstool/clang/bin, and this relative
+ # path, combined with the rest of our Bazel configuration, causes our
+ # compilation to use those files.
+ tool_path { name: "gcc" path: "clang/bin/crosstool_wrapper_driver_is_not_gcc" }
+ # Use "-std=c++11" for nvcc. For consistency, force both the host compiler
+ # and the device compiler to use "-std=c++11".
+ cxx_flag: "-std=c++11"
+ linker_flag: "-lstdc++"
+ linker_flag: "-B/usr/bin/"
+
+ # TODO(bazel-team): In theory, the path here ought to exactly match the path
+ # used by gcc. That works because bazel currently doesn't track files at
+ # absolute locations and has no remote execution, yet. However, this will need
+ # to be fixed, maybe with auto-detection?
+ cxx_builtin_include_directory: "/usr/lib/gcc/"
+ cxx_builtin_include_directory: "/usr/local/include"
+ cxx_builtin_include_directory: "/usr/include"
+ tool_path { name: "gcov" path: "/usr/bin/gcov" }
+
+ # C(++) compiles invoke the compiler (as that is the one knowing where
+ # to find libraries), but we provide LD so other rules can invoke the linker.
+ tool_path { name: "ld" path: "/usr/bin/ld" }
+
+ tool_path { name: "nm" path: "/usr/bin/nm" }
+ tool_path { name: "objcopy" path: "/usr/bin/objcopy" }
+ objcopy_embed_flag: "-I"
+ objcopy_embed_flag: "binary"
+ tool_path { name: "objdump" path: "/usr/bin/objdump" }
+ tool_path { name: "strip" path: "/usr/bin/strip" }
+
+ # Anticipated future default.
+ unfiltered_cxx_flag: "-no-canonical-prefixes"
+
+ # Make C++ compilation deterministic. Use linkstamping instead of these
+ # compiler symbols.
+ unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
+ unfiltered_cxx_flag: "-D__DATE__=\"redacted\""
+ unfiltered_cxx_flag: "-D__TIMESTAMP__=\"redacted\""
+ unfiltered_cxx_flag: "-D__TIME__=\"redacted\""
+
+ # Security hardening on by default.
+ # Conservative choice; -D_FORTIFY_SOURCE=2 may be unsafe in some cases.
+ # We need to undef it before redefining it as some distributions now have
+ # it enabled by default.
+ compiler_flag: "-U_FORTIFY_SOURCE"
+ compiler_flag: "-D_FORTIFY_SOURCE=1"
+ compiler_flag: "-fstack-protector"
+ compiler_flag: "-fPIE"
+ linker_flag: "-pie"
+ linker_flag: "-Wl,-z,relro,-z,now"
+
+ # Enable coloring even if there's no attached terminal. Bazel removes the
+ # escape sequences if --nocolor is specified. This isn't supported by gcc
+ # on Ubuntu 14.04.
+ # compiler_flag: "-fcolor-diagnostics"
+
+ # All warnings are enabled. Maybe enable -Werror as well?
+ compiler_flag: "-Wall"
+ # Enable a few more warnings that aren't part of -Wall.
+ compiler_flag: "-Wunused-but-set-parameter"
+ # But disable some that are problematic.
+ compiler_flag: "-Wno-free-nonheap-object" # has false positives
+
+ # Keep stack frames for debugging, even in opt mode.
+ compiler_flag: "-fno-omit-frame-pointer"
+
+ # Anticipated future default.
+ linker_flag: "-no-canonical-prefixes"
+ # Have gcc return the exit code from ld.
+ linker_flag: "-pass-exit-codes"
+ # Stamp the binary with a unique identifier.
+ linker_flag: "-Wl,--build-id=md5"
+ linker_flag: "-Wl,--hash-style=gnu"
+ # Gold linker only? Can we enable this by default?
+ # linker_flag: "-Wl,--warn-execstack"
+ # linker_flag: "-Wl,--detect-odr-violations"
+
+ compilation_mode_flags {
+ mode: DBG
+ # Enable debug symbols.
+ compiler_flag: "-g"
+ }
+ compilation_mode_flags {
+ mode: OPT
+
+ # No debug symbols.
+ # Maybe we should enable https://gcc.gnu.org/wiki/DebugFission for opt or
+ # even generally? However, that can't happen here, as it requires special
+ # handling in Bazel.
+ compiler_flag: "-g0"
+
+ # Conservative choice for -O
+ # -O3 can increase binary size and even slow down the resulting binaries.
+ # Profile first and / or use FDO if you need better performance than this.
+ compiler_flag: "-O2"
+
+ # Disable assertions
+ compiler_flag: "-DNDEBUG"
+
+ # Removal of unused code and data at link time (can this increase binary size in some cases?).
+ compiler_flag: "-ffunction-sections"
+ compiler_flag: "-fdata-sections"
+ linker_flag: "-Wl,--gc-sections"
+ }
+}
diff --git a/third_party/gpus/crosstool/LICENSE b/third_party/gpus/crosstool/LICENSE
new file mode 100644
index 0000000000..d3da228420
--- /dev/null
+++ b/third_party/gpus/crosstool/LICENSE
@@ -0,0 +1,203 @@
+Copyright 2015 The TensorFlow Authors. All rights reserved.
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2015, The TensorFlow Authors.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc b/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
new file mode 100755
index 0000000000..0f419a8b61
--- /dev/null
+++ b/third_party/gpus/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc
@@ -0,0 +1,316 @@
+#!/usr/bin/env python2
+
+"""Crosstool wrapper for compiling CUDA programs.
+
+SYNOPSIS:
+ crosstool_wrapper_driver_is_not_gcc [options passed in by cc_library()
+ or cc_binary() rule]
+
+DESCRIPTION:
+ This script is expected to be called by the cc_library() or cc_binary() bazel
+ rules. When the option "-x cuda" is present in the list of arguments passed
+ to this script, it invokes the nvcc CUDA compiler. Most arguments are passed
+ through unchanged, as a single string, via nvcc's --compiler-options. When
+ "-x cuda" is not present, this wrapper invokes the host CPU compiler with
+ the input arguments unchanged.
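+
+ For example (the command line below is illustrative only), a CUDA compile
+ is invoked roughly as
+   crosstool_wrapper_driver_is_not_gcc -x cuda -c foo.cu.cc -o foo.o -O2
+ whereas a compile without "-x cuda" falls through to the host compiler with
+ its arguments untouched.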
+
+NOTES:
+ Changes to the contents of this file must be propagated from
+ //third_party/gpus/crosstool/crosstool_wrapper_is_not_gcc to
+ //third_party/gpus/crosstool/v*/*/clang/bin/crosstool_wrapper_is_not_gcc
+"""
+
+__author__ = 'keveman@google.com (Manjunath Kudlur)'
+
+from argparse import ArgumentParser
+import os
+import subprocess
+import re
+import sys
+import pipes
+
+CURRENT_DIR = os.path.dirname(sys.argv[0])
+CPU_COMPILER = ('/usr/bin/gcc')
+NVCC_PATH = CURRENT_DIR + '/../../../cuda/bin/nvcc'
+GCC_HOST_COMPILER_PATH = ('/usr/bin/gcc')
+LLVM_HOST_COMPILER_PATH = ('/usr/bin/gcc')
+PREFIX_DIR = os.path.dirname(GCC_HOST_COMPILER_PATH)
+
+
+def Log(s):
+ print 'gpus/crosstool: {0}'.format(s)
+
+
+def GetOptionValue(argv, option):
+ """Extract the list of values for option from the argv list.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+ option: The option whose value to extract, without the leading '-'.
+
+ Returns:
+ A list of values, either directly following the option,
+ (e.g., -opt val1 val2) or values collected from multiple occurrences of
+ the option (e.g., -opt val1 -opt val2).
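+
+ For example (illustrative only), GetOptionValue(['-I', 'a', '-I', 'b'], 'I')
+ returns ['a', 'b'].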
+ """
+
+ parser = ArgumentParser()
+ parser.add_argument('-' + option, nargs='*', action='append')
+ args, _ = parser.parse_known_args(argv)
+ if not args or not vars(args)[option]:
+ return []
+ else:
+ return sum(vars(args)[option], [])
+
+
+def GetHostCompilerOptions(argv):
+ """Collect the -isystem, -iquote, and --sysroot option values from argv.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+
+ Returns:
+ The string that can be used as the --compiler-options to nvcc.
+ """
+
+ parser = ArgumentParser()
+ parser.add_argument('-isystem', nargs='*', action='append')
+ parser.add_argument('-iquote', nargs='*', action='append')
+ parser.add_argument('--sysroot', nargs=1)
+ parser.add_argument('-g', nargs='*', action='append')
+
+ args, _ = parser.parse_known_args(argv)
+
+ opts = ''
+ # This is a temporary workaround for b/12960069.
+ # NVIDIA is going to fix this in CUDA 6.5, but until then this workaround
+ # will let us compile Thrust with the cuda crosstool.
+ # bazel passes all include directories as '-isystem dir' to the crosstool.
+ # This causes nvcc to think that there are kernel launches from system
+ # directories (which apparently is not supported by the compiler). This
+ # workaround changes '-isystem third_party/gpus/cuda/include' to
+ # '-iquote third_party/gpus/cuda/include'.
+ isystem_args = [x for x in args.isystem
+ if 'third_party/gpus/cuda/include' not in x]
+ iquote_args = (args.iquote +
+ [x for x in args.isystem
+ if 'third_party/gpus/cuda/include' in x])
+ # This hack is needed so that we can compile eigen3. We need to include
+ # third_party/eigen3 with -I. Some Eigen files use the
+ # #include <Eigen/Core> syntax, and -iquote doesn't work for that.
+ has_eigen = ['third_party/eigen3'] in isystem_args
+ if has_eigen:
+ isystem_args.remove(['third_party/eigen3'])
+
+ if isystem_args:
+ opts += '-isystem ' + ' -isystem '.join(sum(isystem_args, []))
+ if iquote_args:
+ opts += ' -iquote ' + ' -iquote '.join(sum(iquote_args, []))
+ if args.g:
+ opts += ' -g' + ' -g'.join(sum(args.g, []))
+ if args.sysroot:
+ opts += ' --sysroot ' + args.sysroot[0]
+ if has_eigen:
+ opts += ' -I third_party/eigen3'
+
+ return opts
+
+def GetNvccOptions(argv):
+ """Collect the -nvcc_options values from argv.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+
+ Returns:
+ The string that can be passed directly to nvcc.
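+
+ For example (illustrative only), an argv containing
+ ['-nvcc_options', 'maxrregcount=32'] yields the string '--maxrregcount=32'.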
+ """
+
+ parser = ArgumentParser()
+ parser.add_argument('-nvcc_options', nargs='*', action='append')
+
+ args, _ = parser.parse_known_args(argv)
+
+ if args.nvcc_options:
+ return ' '.join(['--'+a for a in sum(args.nvcc_options, [])])
+ return ''
+
+
+def StripAndTransformNvccOptions(argv):
+ """Strips the -nvcc_options values from argv and transforms define-macros.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+
+ Returns:
+ A list of strings that can be passed directly to gcudacc.
+ """
+ parser = ArgumentParser()
+ parser.add_argument('-nvcc_options', nargs='*', action='store')
+ args, leftover = parser.parse_known_args(argv)
+ if args.nvcc_options:
+ for option in args.nvcc_options:
+ (flag, _, value) = option.partition('=')
+ if 'define-macro' in flag:
+ leftover.append('-D' + value)
+ return leftover
+
+
+def InvokeGcudacc(argv, gcudacc_version, gcudacc_flags, log=False):
+ """Call gcudacc with arguments assembled from argv.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+ gcudacc_version: The version of gcudacc; this is a subdirectory name under
+ the gcudacc bin/ directory.
+ gcudacc_flags: A list of extra arguments passed just for gcudacc.
+ log: True if logging is requested.
+
+ Returns:
+ The return value of calling os.system('gcudacc ' + args)
+ """
+
+ gcudacc_cmd = os.path.join(GCUDACC_PATH_BASE, gcudacc_version, 'gcudacc.par')
+ gcudacc_cmd = (
+ gcudacc_cmd +
+ ' --google_host_compiler={0} '.format(LLVM_HOST_COMPILER_PATH) +
+ ' '.join(sum(gcudacc_flags, [])) +
+ ' -- ' +
+ ' '.join(StripAndTransformNvccOptions(argv)))
+ if log: Log(gcudacc_cmd)
+ return os.system(gcudacc_cmd)
+
+
+def InvokeNvcc(argv, log=False):
+ """Call nvcc with arguments assembled from argv.
+
+ Args:
+ argv: A list of strings, possibly the argv passed to main().
+ log: True if logging is requested.
+
+ Returns:
+ The return value of calling os.system('nvcc ' + args)
+ """
+
+ host_compiler_options = GetHostCompilerOptions(argv)
+ nvcc_compiler_options = GetNvccOptions(argv)
+ opt_option = GetOptionValue(argv, 'O')
+ m_options = GetOptionValue(argv, 'm')
+ m_options = ''.join([' -m' + m for m in m_options if m in ['32', '64']])
+ include_options = GetOptionValue(argv, 'I')
+ out_file = GetOptionValue(argv, 'o')
+ depfiles = GetOptionValue(argv, 'MF')
+ defines = GetOptionValue(argv, 'D')
+ defines = ''.join([' -D' + define for define in defines])
+ undefines = GetOptionValue(argv, 'U')
+ undefines = ''.join([' -U' + define for define in undefines])
+ std_options = GetOptionValue(argv, 'std')
+ # Currently, only c++11 is supported by the Cuda 7.0 -std argument.
+ nvcc_allowed_std_options = ["c++11"]
+ std_options = ''.join([' -std=' + define
+ for define in std_options if define in nvcc_allowed_std_options])
+
+ # The list of source files gets passed after the -c option. I don't know of
+ # any other reliable way to just get the list of source files to be compiled.
+ src_files = GetOptionValue(argv, 'c')
+
+ if len(src_files) == 0:
+ return 1
+ if len(out_file) != 1:
+ return 1
+
+ opt = (' -O2' if (len(opt_option) > 0 and int(opt_option[0]) > 0)
+ else ' -g -G')
+
+ includes = (' -I ' + ' -I '.join(include_options)
+ if len(include_options) > 0
+ else '')
+
+ # Unfortunately, there are other options that have the -c prefix too.
+ # So allow only the ones that look like C/C++ files.
+ src_files = [f for f in src_files if
+ re.search('\.cpp$|\.cc$|\.c$|\.cxx$|\.C$', f)]
+ srcs = ' '.join(src_files)
+ out = ' -o ' + out_file[0]
+
+ nvccopts = ' '.join([
+ r'-gencode=arch=compute_35,\"code=sm_35,compute_35\"',
+ r'-gencode=arch=compute_52,\"code=sm_52,compute_52\"',])
+ nvccopts += ' ' + nvcc_compiler_options
+ nvccopts += undefines
+ nvccopts += defines
+ nvccopts += std_options
+ nvccopts += m_options
+
+ if depfiles:
+ # Generate the dependency file
+ depfile = depfiles[0]
+ cmd = (NVCC_PATH + ' ' + nvccopts +
+ ' --compiler-options "' + host_compiler_options + '"' +
+ ' --compiler-bindir=' + GCC_HOST_COMPILER_PATH +
+ ' -I .' +
+ ' -x cu ' + includes + ' ' + srcs + ' -M -o ' + depfile)
+ if log: Log(cmd)
+ exit_status = os.system(cmd)
+ if exit_status != 0:
+ return exit_status
+
+ cmd = (NVCC_PATH + ' ' + nvccopts +
+ ' --compiler-options "' + host_compiler_options + ' -fPIC"' +
+ ' --compiler-bindir=' + GCC_HOST_COMPILER_PATH +
+ ' -I .' +
+ ' -x cu ' + opt + includes + ' -c ' + srcs + out)
+
+ # TODO(zhengxq): for some reason, 'gcc' needs this help to find 'as'.
+ # Need to investigate and fix.
+ cmd = 'PATH=' + PREFIX_DIR + ' ' + cmd
+ if log: Log(cmd)
+ return os.system(cmd)
+
+
+def main():
+ parser = ArgumentParser()
+ parser.add_argument('-x', nargs=1)
+ parser.add_argument('--cuda_log', action='store_true')
+ parser.add_argument('--use_gcudacc', action='store_true')
+ parser.add_argument('--gcudacc_version', action='store', default='v8')
+ parser.add_argument('--gcudacc_flag', nargs='*', action='append', default=[])
+ args, leftover = parser.parse_known_args(sys.argv[1:])
+
+ if args.x and args.x[0] == 'cuda':
+ if args.cuda_log: Log('-x cuda')
+ leftover = [pipes.quote(s) for s in leftover]
+ if args.use_gcudacc:
+ if args.cuda_log: Log('using gcudacc')
+ return InvokeGcudacc(argv=leftover,
+ gcudacc_version=args.gcudacc_version,
+ gcudacc_flags=args.gcudacc_flag,
+ log=args.cuda_log)
+ if args.cuda_log: Log('using nvcc')
+ return InvokeNvcc(leftover, log=args.cuda_log)
+
+ # Strip our flags before passing through to the CPU compiler for files which
+ # are not -x cuda. We can't just pass 'leftover' because it also strips -x.
+ # We not only want to pass -x to the CPU compiler, but also keep it in its
+ # relative location in the argv list (the compiler is actually sensitive to
+ # this).
+ cpu_compiler_flags = [flag for flag in sys.argv[1:]
+ if not flag.startswith(('--cuda_log',
+ '--use_gcudacc',
+ '--gcudacc_version',
+ '--gcudacc_flag'))]
+ if args.use_gcudacc:
+ # This macro is defined for TUs that are not marked with "-x cuda" but are
+ # built as part of a -config=cuda --use_gcudacc compilation. They are
+ # compiled with the default CPU compiler. Since the objects built from
+ # these TUs are later linked with objects that come from gcudacc, some
+ # parts of the code need to be marked for these special cases. For example,
+ # some types have to be defined similarly for gcudacc-compiled TUs and
+ # default CPU compiler-compiled TUs linked with them, but differently when
+ # nvcc is used.
+ # TODO(eliben): rename to a more descriptive name.
+ cpu_compiler_flags.append('-D__GCUDACC_HOST__')
+
+ return subprocess.call([CPU_COMPILER] + cpu_compiler_flags)
+
+if __name__ == '__main__':
+ sys.exit(main())
diff --git a/third_party/gpus/cuda/BUILD b/third_party/gpus/cuda/BUILD
new file mode 100644
index 0000000000..40c048661c
--- /dev/null
+++ b/third_party/gpus/cuda/BUILD
@@ -0,0 +1,158 @@
+licenses(["restricted"]) # MPL2, portions GPL v3, LGPL v3, BSD-like
+
+load("/tensorflow/tensorflow", "if_cuda")
+
+package(default_visibility = ["//visibility:public"])
+
+config_setting(
+ name = "cuda_crosstool_condition",
+ values = {"crosstool_top": "//third_party/gpus/crosstool"},
+ visibility = ["//visibility:public"],
+)
+
+config_setting(
+ name = "using_gcudacc",
+ values = {
+ "crosstool_top": "//third_party/gpus/crosstool",
+ "copt": "--use_gcudacc",
+ },
+ visibility = ["//visibility:public"],
+)
+
+config_setting(
+ name = "using_nvcc",
+ values = {
+ "crosstool_top": "//third_party/gpus/crosstool",
+ "copt": "--use_nvcc",
+ },
+)
+
+cc_library(
+ name = "cuda_headers",
+ hdrs = glob([
+ "**/*.h",
+ ]),
+ includes = [".", "include"],
+ visibility = ["//visibility:public"],
+)
+
+cc_library(
+ name = "cudart_static",
+ srcs = [
+ "lib64/libcudart_static.a",
+ ],
+ includes = ["include/"],
+ linkopts = [
+ "-ldl",
+ "-lrt",
+ "-lpthread",
+ ],
+ visibility = ["//visibility:public"],
+)
+
+cc_library(
+ name = "cudart",
+ srcs = [
+ "lib64/libcudart.so.7.0",
+ ],
+ data = [
+ "lib64/libcudart.so.7.0",
+ ],
+ includes = ["include/"],
+ visibility = ["//visibility:public"],
+ linkstatic = 1,
+)
+
+cc_library(
+ name = "cublas",
+ srcs = [
+ "lib64/libcublas.so.7.0",
+ ],
+ data = [
+ "lib64/libcublas.so.7.0",
+ ],
+ includes = ["include/"],
+ visibility = ["//visibility:public"],
+ linkstatic = 1,
+)
+
+cc_library(
+ name = "cudnn",
+ srcs = [
+ "lib64/libcudnn.so.6.5",
+ ],
+ data = [
+ "lib64/libcudnn.so.6.5",
+ ],
+ includes = ["include/"],
+ visibility = ["//visibility:public"],
+ linkstatic = 1,
+)
+
+cc_library(
+ name = "cuda",
+ deps = [
+ ":cuda_headers",
+ ":cudart",
+ ":cublas",
+ ":cudnn",
+ ],
+ visibility = ["//visibility:public"],
+)
+
+# TODO(opensource): for now, we have to invoke cuda_config.sh manually in the source tree.
+# This rule checks whether the Cuda libraries in the source tree have been properly configured.
+# The output list makes bazel run this rule first if the Cuda files are missing.
+# This gives us an opportunity to check and print a meaningful error message.
+# But we still need to create the output file list to make bazel happy in a successful run.
+genrule(
+ name = "cuda_check",
+ srcs = [
+ "cuda.config",
+ "cuda_config.sh",
+ ],
+ outs = [
+ "include/cuda.h",
+ "include/cublas.h",
+ "include/cudnn.h",
+ "lib64/libcudart_static.a",
+ "lib64/libcublas.so.7.0",
+ "lib64/libcudnn.so.6.5",
+ "lib64/libcudart.so.7.0",
+ ],
+ cmd = if_cuda(
+ # Under cuda config, create all the symbolic links to the actual cuda files
+ "OUTPUTDIR=`readlink -f $(@D)/../../..`; cd third_party/gpus/cuda; OUTPUTDIR=$$OUTPUTDIR ./cuda_config.sh --check;",
+
+ # Under non-cuda config, create all dummy files to make the build go through
+ ";".join([
+ "mkdir -p $(@D)/include",
+ "mkdir -p $(@D)/lib64",
+ "touch $(@D)/include/cuda.h",
+ "touch $(@D)/include/cublas.h",
+ "touch $(@D)/include/cudnn.h",
+ "touch $(@D)/lib64/libcudart_static.a",
+ "touch $(@D)/lib64/libcublas.so.7.0",
+ "touch $(@D)/lib64/libcudnn.so.6.5",
+ "touch $(@D)/lib64/libcudart.so.7.0"
+ ]),
+ ),
+ local = 1,
+)
+
+genrule(
+ name = "cuda_config_check",
+ outs = [
+ "cuda.config",
+ ],
+ cmd = if_cuda(
+ # Under cuda config, create the symbolic link to the actual cuda.config
+ "ln -sf `readlink -f third_party/gpus/cuda/cuda.config` $(@D)/;",
+
+ # Under non-cuda config, create the dummy file
+ ";".join([
+ "touch $(@D)/cuda.config",
+ ]),
+ ),
+ local = 1,
+)
diff --git a/third_party/gpus/cuda/LICENSE b/third_party/gpus/cuda/LICENSE
new file mode 100644
index 0000000000..d3da228420
--- /dev/null
+++ b/third_party/gpus/cuda/LICENSE
@@ -0,0 +1,203 @@
+Copyright 2015 The TensorFlow Authors. All rights reserved.
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2015, The TensorFlow Authors.
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/third_party/gpus/cuda/cuda_config.sh b/third_party/gpus/cuda/cuda_config.sh
new file mode 100755
index 0000000000..034298590b
--- /dev/null
+++ b/third_party/gpus/cuda/cuda_config.sh
@@ -0,0 +1,169 @@
+#!/bin/bash
+
+# A simple script to configure the Cuda tree needed for the TensorFlow GPU
+# build. We need both Cuda toolkit 7.0 and Cudnn 6.5.
+# Usage:
+# * Edit cuda.config to point the Cuda toolkit and Cudnn libraries to their local paths.
+# * Run cuda_config.sh to generate symbolic links in the source tree that
+#   reflect the file organization needed by TensorFlow.
+
+print_usage() {
+cat << EOF
+Usage: $0 [--check]
+ Configure TensorFlow's canonical view of Cuda libraries using cuda.config.
+Arguments:
+ --check: Only check that the proper Cuda dependencies have already been
+ configured in the source tree. It also creates symbolic links to
+ the files in the gen-tree to make bazel happy.
+EOF
+}
+
+CHECK_ONLY=0
+# Parse the arguments. Add more arguments as the "case" line when needed.
+while [[ $# -gt 0 ]]; do
+ argument="$1"
+ shift
+ case $argument in
+ --check)
+ CHECK_ONLY=1
+ ;;
+ *)
+ echo "Error: unknown arguments"
+ print_usage
+ exit -1
+ ;;
+ esac
+done
+
+source cuda.config || exit -1
+
+OUTPUTDIR=${OUTPUTDIR:-../../..}
+CUDA_TOOLKIT_PATH=${CUDA_TOOLKIT_PATH:-/usr/local/cuda}
+CUDNN_INSTALL_PATH=${CUDNN_INSTALL_PATH:-/usr/local/cuda}
+
+# An error message when the Cuda toolkit is not found
+function CudaError {
+ echo ERROR: $1
+cat << EOF
+##############################################################################
+##############################################################################
+Cuda 7.0 toolkit is missing.
+1. Download and install the CUDA 7.0 toolkit and CUDNN 6.5 library;
+2. Run configure from the root of the source tree, before rerunning bazel;
+Please refer to README.md for more details.
+##############################################################################
+##############################################################################
+EOF
+ exit -1
+}
+
+# An error message when CUDNN is not found
+function CudnnError {
+ echo ERROR: $1
+cat << EOF
+##############################################################################
+##############################################################################
+Cudnn 6.5 is missing.
+1. Download and install the CUDA 7.0 toolkit and CUDNN 6.5 library;
+2. Run configure from the root of the source tree, before rerunning bazel;
+Please refer to README.md for more details.
+##############################################################################
+##############################################################################
+EOF
+ exit -1
+}
+
+# Check that the Cuda libraries have already been properly configured in the source tree.
+# We still need to create links to the gen-tree to make bazel happy.
+function CheckAndLinkToSrcTree {
+ ERROR_FUNC=$1
+ FILE=$2
+ if test ! -e $FILE; then
+ $ERROR_FUNC "$PWD/$FILE cannot be found"
+ fi
+
+ # Link the output file to the source tree, avoiding self links if they are
+ # the same. This could happen if invoked from the source tree by accident.
+ if [ ! `readlink -f $PWD` == `readlink -f $OUTPUTDIR/third_party/gpus/cuda` ]; then
+ mkdir -p `dirname $OUTPUTDIR/third_party/gpus/cuda/$FILE`
+ ln -sf $PWD/$FILE $OUTPUTDIR/third_party/gpus/cuda/$FILE
+ fi
+}
+
+if [ "$CHECK_ONLY" == "1" ]; then
+ CheckAndLinkToSrcTree CudaError include/cuda.h
+ CheckAndLinkToSrcTree CudaError include/cublas.h
+ CheckAndLinkToSrcTree CudnnError include/cudnn.h
+ CheckAndLinkToSrcTree CudaError lib64/libcudart_static.a
+ CheckAndLinkToSrcTree CudaError lib64/libcublas.so.7.0
+ CheckAndLinkToSrcTree CudnnError lib64/libcudnn.so.6.5
+ CheckAndLinkToSrcTree CudaError lib64/libcudart.so.7.0
+ exit 0
+fi
+
+# Actually configure the source tree for TensorFlow's canonical view of Cuda
+# libraries.
+
+if test ! -e ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.7.0; then
+ CudaError "cannot find ${CUDA_TOOLKIT_PATH}/lib64/libcudart.so.7.0"
+fi
+
+if test ! -d ${CUDNN_INSTALL_PATH}; then
+ CudnnError "cannot find dir: ${CUDNN_INSTALL_PATH}"
+fi
+
+# Locate cudnn.h
+if test -e ${CUDNN_INSTALL_PATH}/cudnn.h; then
+ CUDNN_HEADER_PATH=${CUDNN_INSTALL_PATH}
+elif test -e ${CUDNN_INSTALL_PATH}/include/cudnn.h; then
+ CUDNN_HEADER_PATH=${CUDNN_INSTALL_PATH}/include
+else
+ CudnnError "cannot find cudnn.h under: ${CUDNN_INSTALL_PATH}"
+fi
+
+# Locate libcudnn.so.6.5
+if test -e ${CUDNN_INSTALL_PATH}/libcudnn.so.6.5; then
+ CUDNN_LIB_PATH=${CUDNN_INSTALL_PATH}
+elif test -e ${CUDNN_INSTALL_PATH}/lib64/libcudnn.so.6.5; then
+ CUDNN_LIB_PATH=${CUDNN_INSTALL_PATH}/lib64
+else
+ CudnnError "cannot find libcudnn.so.6.5 under: ${CUDNN_INSTALL_PATH}"
+fi
+
+# Helper function to build symbolic links for all files under a directory.
+function LinkOneDir {
+ SRC_PREFIX=$1
+ DST_PREFIX=$2
+ SRC_DIR=$3
+ DST_DIR=`echo $SRC_DIR | sed "s,^$SRC_PREFIX,$DST_PREFIX,"`
+ mkdir -p $DST_DIR
+ FILE_LIST=`find -L $SRC_DIR -maxdepth 1 -type f`
+ if test "$FILE_LIST" != ""; then
+ ln -sf $FILE_LIST $DST_DIR/ || exit -1
+ fi
+}
+export -f LinkOneDir
+
+# Build links for all files under the directory, including subdirectories.
+function LinkAllFiles {
+ SRC_DIR=$1
+ DST_DIR=$2
+ find -L $SRC_DIR -type d | xargs -I {} bash -c "LinkOneDir $SRC_DIR $DST_DIR {}" || exit -1
+}
+
+# Set up the symbolic links for the cuda toolkit. We link at the individual
+# file level, not at the directory level, because the external library may
+# have a different file layout from our desired structure.
+mkdir -p $OUTPUTDIR/third_party/gpus/cuda
+echo "Setting up Cuda include"
+LinkAllFiles ${CUDA_TOOLKIT_PATH}/include $OUTPUTDIR/third_party/gpus/cuda/include || exit -1
+echo "Setting up Cuda lib64"
+LinkAllFiles ${CUDA_TOOLKIT_PATH}/lib64 $OUTPUTDIR/third_party/gpus/cuda/lib64 || exit -1
+echo "Setting up Cuda bin"
+LinkAllFiles ${CUDA_TOOLKIT_PATH}/bin $OUTPUTDIR/third_party/gpus/cuda/bin || exit -1
+echo "Setting up Cuda nvvm"
+LinkAllFiles ${CUDA_TOOLKIT_PATH}/nvvm $OUTPUTDIR/third_party/gpus/cuda/nvvm || exit -1
+
+# Set up symbolic link for cudnn
+ln -sf $CUDNN_HEADER_PATH/cudnn.h $OUTPUTDIR/third_party/gpus/cuda/include/cudnn.h || exit -1
+ln -sf $CUDNN_LIB_PATH/libcudnn.so.6.5 $OUTPUTDIR/third_party/gpus/cuda/lib64/libcudnn.so.6.5 || exit -1
diff --git a/tools/bazel.rc b/tools/bazel.rc
new file mode 100644
index 0000000000..742f59fee1
--- /dev/null
+++ b/tools/bazel.rc
@@ -0,0 +1 @@
+build:cuda --crosstool_top=//third_party/gpus/crosstool
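+
+# With the config above, a GPU build can then be invoked as follows
+# (the target pattern is illustrative only):
+#   bazel build --config=cuda //tensorflow/...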